From: "Mike Pagano" <mpagano@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Subject: [gentoo-commits] proj/linux-patches:3.14 commit in: /
Date: Fri,  9 Jan 2015 16:18:50 +0000 (UTC)
Message-ID: <1420820324.0b8b3489401caaf8a95a16b8636504a3179aa437.mpagano@gentoo>

commit:     0b8b3489401caaf8a95a16b8636504a3179aa437
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Fri Jan  9 16:18:44 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Fri Jan  9 16:18:44 2015 +0000
URL:        http://sources.gentoo.org/gitweb/?p=proj/linux-patches.git;a=commit;h=0b8b3489

BFQ upgrade to v7r7

---
 0000_README                                        |   12 +-
 ...oups-kconfig-build-bits-for-BFQ-v7r7-3.14.patch |    6 +-
 ...ntroduce-the-BFQ-v7r7-I-O-sched-for-3.14.patch1 | 2023 ++++++++++++++------
 ...ly-Queue-Merge-EQM-to-BFQ-v7r7-for-3.14.0.patch |  530 +++--
 4 files changed, 1829 insertions(+), 742 deletions(-)

diff --git a/0000_README b/0000_README
index 9766ed5..0f5f3a0 100644
--- a/0000_README
+++ b/0000_README
@@ -190,15 +190,15 @@ Patch:  5000_enable-additional-cpu-optimizations-for-gcc.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc optimizations for additional CPUs.
 
-Patch:  5001_BFQ-1-block-cgroups-kconfig-build-bits-for-v7r2-3.14.patch
+Patch:  5001_BFQ-1-block-cgroups-kconfig-build-bits-for-v7r7-3.14.patch
 From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc:   BFQ v7r2 patch 1 for 3.14: Build, cgroups and kconfig bits
+Desc:   BFQ v7r7 patch 1 for 3.14: Build, cgroups and kconfig bits
 
-Patch:  5002_BFQ-2-block-introduce-the-v7r2-I-O-sched-for-3.14.patch1
+Patch:  5002_BFQ-2-block-introduce-the-v7r7-I-O-sched-for-3.14.patch1
 From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc:   BFQ v7r2 patch 2 for 3.14: BFQ Scheduler
+Desc:   BFQ v7r7 patch 2 for 3.14: BFQ Scheduler
 
-Patch:  5003_BFQ-3-block-add-Early-Queue-Merge-EQM-v7r2-for-3.14.0.patch
+Patch:  5003_BFQ-3-block-add-Early-Queue-Merge-EQM-v7r7-for-3.14.0.patch
 From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc:   BFQ v7r2 patch 3 for 3.14: Early Queue Merge (EQM)
+Desc:   BFQ v7r7 patch 3 for 3.14: Early Queue Merge (EQM)
 

diff --git a/5001_BFQ-1-block-cgroups-kconfig-build-bits-for-BFQ-v7r2-3.14.patch b/5001_BFQ-1-block-cgroups-kconfig-build-bits-for-BFQ-v7r7-3.14.patch
similarity index 97%
rename from 5001_BFQ-1-block-cgroups-kconfig-build-bits-for-BFQ-v7r2-3.14.patch
rename to 5001_BFQ-1-block-cgroups-kconfig-build-bits-for-BFQ-v7r7-3.14.patch
index 2572376..064b032 100644
--- a/5001_BFQ-1-block-cgroups-kconfig-build-bits-for-BFQ-v7r2-3.14.patch
+++ b/5001_BFQ-1-block-cgroups-kconfig-build-bits-for-BFQ-v7r7-3.14.patch
@@ -1,7 +1,7 @@
-From c3280db98437c9520f04ecacfdf1a868d7a4b7b3 Mon Sep 17 00:00:00 2001
+From 5a63dcf91cb3c8f1550016ab34d4aaa6ebbb82fb Mon Sep 17 00:00:00 2001
 From: Paolo Valente <paolo.valente@unimore.it>
 Date: Tue, 3 Sep 2013 16:50:42 +0200
-Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r2-3.14
+Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r7-3.14
 
 Update Kconfig.iosched and do the related Makefile changes to include
 kernel configuration options for BFQ. Also add the bfqio controller
@@ -100,5 +100,5 @@ index 7b99d71..4e8c0ff 100644
  SUBSYS(perf)
  #endif
 -- 
-1.9.0
+2.1.3
 

diff --git a/5002_BFQ-2-block-introduce-the-BFQ-v7r2-I-O-sched-for-3.14.patch1 b/5002_BFQ-2-block-introduce-the-BFQ-v7r7-I-O-sched-for-3.14.patch1
similarity index 72%
rename from 5002_BFQ-2-block-introduce-the-BFQ-v7r2-I-O-sched-for-3.14.patch1
rename to 5002_BFQ-2-block-introduce-the-BFQ-v7r7-I-O-sched-for-3.14.patch1
index 133602c..65374b9 100644
--- a/5002_BFQ-2-block-introduce-the-BFQ-v7r2-I-O-sched-for-3.14.patch1
+++ b/5002_BFQ-2-block-introduce-the-BFQ-v7r7-I-O-sched-for-3.14.patch1
@@ -1,9 +1,9 @@
-From 5055277df59d9280da6b60cf90bed8e5e57dc44d Mon Sep 17 00:00:00 2001
+From ba938955e241de008540551e3a16e43850ecdd66 Mon Sep 17 00:00:00 2001
 From: Paolo Valente <paolo.valente@unimore.it>
 Date: Thu, 9 May 2013 19:10:02 +0200
-Subject: [PATCH 2/3] block: introduce the BFQ-v7r2 I/O sched for 3.14
+Subject: [PATCH 2/3] block: introduce the BFQ-v7r7 I/O sched for 3.14
 
-Add the BFQ-v7r2 I/O scheduler to 3.14.
+Add the BFQ-v7r7 I/O scheduler to 3.14.
 The general structure is borrowed from CFQ, as much of the code for
 handling I/O contexts. Over time, several useful features have been
 ported from CFQ as well (details in the changelog in README.BFQ). A
@@ -56,12 +56,12 @@ until it expires.
 Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
 Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
 ---
- block/bfq-cgroup.c  |  926 +++++++++++++++
+ block/bfq-cgroup.c  |  938 +++++++++++++
  block/bfq-ioc.c     |   36 +
- block/bfq-iosched.c | 3300 +++++++++++++++++++++++++++++++++++++++++++++++++++
- block/bfq-sched.c   | 1078 +++++++++++++++++
- block/bfq.h         |  622 ++++++++++
- 5 files changed, 5962 insertions(+)
+ block/bfq-iosched.c | 3902 +++++++++++++++++++++++++++++++++++++++++++++++++++
+ block/bfq-sched.c   | 1214 ++++++++++++++++
+ block/bfq.h         |  775 ++++++++++
+ 5 files changed, 6865 insertions(+)
  create mode 100644 block/bfq-cgroup.c
  create mode 100644 block/bfq-ioc.c
  create mode 100644 block/bfq-iosched.c
@@ -70,10 +70,10 @@ Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
 
 diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
 new file mode 100644
-index 0000000..bcecdb4
+index 0000000..798bb17
 --- /dev/null
 +++ b/block/bfq-cgroup.c
-@@ -0,0 +1,926 @@
+@@ -0,0 +1,938 @@
 +/*
 + * BFQ: CGROUPS support.
 + *
@@ -85,7 +85,8 @@ index 0000000..bcecdb4
 + *
 + * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
 + *
-+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ file.
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
 + */
 +
 +#ifdef CONFIG_CGROUP_BFQIO
@@ -153,6 +154,12 @@ index 0000000..bcecdb4
 +		entity->new_weight = bfq_ioprio_to_weight(bgrp->ioprio);
 +		entity->new_ioprio = bgrp->ioprio;
 +	} else {
++		if (bgrp->weight < BFQ_MIN_WEIGHT ||
++		    bgrp->weight > BFQ_MAX_WEIGHT) {
++			printk(KERN_CRIT "bfq_group_init_entity: "
++					 "bgrp->weight %d\n", bgrp->weight);
++			BUG();
++		}
 +		entity->new_weight = bgrp->weight;
 +		entity->new_ioprio = bfq_weight_to_ioprio(bgrp->weight);
 +	}
@@ -160,6 +167,7 @@ index 0000000..bcecdb4
 +	entity->ioprio = entity->new_ioprio;
 +	entity->ioprio_class = entity->new_ioprio_class = bgrp->ioprio_class;
 +	entity->my_sched_data = &bfqg->sched_data;
++	bfqg->active_entities = 0;
 +}
 +
 +static inline void bfq_group_set_parent(struct bfq_group *bfqg,
@@ -217,8 +225,9 @@ index 0000000..bcecdb4
 +			bfq_group_set_parent(prev, bfqg);
 +			/*
 +			 * Build a list of allocated nodes using the bfqd
-+			 * filed, that is still unused and will be initialized
-+			 * only after the node will be connected.
++			 * filed, that is still unused and will be
++			 * initialized only after the node will be
++			 * connected.
 +			 */
 +			prev->bfqd = bfqg;
 +			prev = bfqg;
@@ -238,7 +247,8 @@ index 0000000..bcecdb4
 +}
 +
 +/**
-+ * bfq_group_chain_link - link an allocated group chain to a cgroup hierarchy.
++ * bfq_group_chain_link - link an allocated group chain to a cgroup
++ *                        hierarchy.
 + * @bfqd: the queue descriptor.
 + * @css: the leaf cgroup_subsys_state to start from.
 + * @leaf: the leaf group (to be associated to @cgroup).
@@ -512,7 +522,8 @@ index 0000000..bcecdb4
 +}
 +
 +/**
-+ * bfq_reparent_active_entities - move to the root group all active entities.
++ * bfq_reparent_active_entities - move to the root group all active
++ *                                entities.
 + * @bfqd: the device data structure with the root group.
 + * @bfqg: the group to move from.
 + * @st: the service tree with the entities.
@@ -557,8 +568,8 @@ index 0000000..bcecdb4
 +	hlist_del(&bfqg->group_node);
 +
 +	/*
-+	 * Empty all service_trees belonging to this group before deactivating
-+	 * the group itself.
++	 * Empty all service_trees belonging to this group before
++	 * deactivating the group itself.
 +	 */
 +	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
 +		st = bfqg->sched_data.service_tree + i;
@@ -578,7 +589,7 @@ index 0000000..bcecdb4
 +		 * all the leaf entities corresponding to these queues
 +		 * to the root_group.
 +		 * Also, it may happen that the group has an entity
-+		 * under service, which is disconnected from the active
++		 * in service, which is disconnected from the active
 +		 * tree: it must be moved, too.
 +		 * There is no need to put the sync queues, as the
 +		 * scheduler has taken no reference.
@@ -616,14 +627,14 @@ index 0000000..bcecdb4
 +	kfree(bfqg);
 +}
 +
-+static void bfq_end_raising_async(struct bfq_data *bfqd)
++static void bfq_end_wr_async(struct bfq_data *bfqd)
 +{
 +	struct hlist_node *tmp;
 +	struct bfq_group *bfqg;
 +
 +	hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
-+		bfq_end_raising_async_queues(bfqd, bfqg);
-+	bfq_end_raising_async_queues(bfqd, bfqd->root_group);
++		bfq_end_wr_async_queues(bfqd, bfqg);
++	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
 +}
 +
 +/**
@@ -849,10 +860,11 @@ index 0000000..bcecdb4
 +		ioc = task->io_context;
 +		if (ioc != NULL && atomic_read(&ioc->nr_tasks) > 1)
 +			/*
-+			 * ioc == NULL means that the task is either too young
-+			 * or exiting: if it has still no ioc the ioc can't be
-+			 * shared, if the task is exiting the attach will fail
-+			 * anyway, no matter what we return here.
++			 * ioc == NULL means that the task is either too
++			 * young or exiting: if it has still no ioc the
++			 * ioc can't be shared, if the task is exiting the
++			 * attach will fail anyway, no matter what we
++			 * return here.
 +			 */
 +			ret = -EINVAL;
 +		task_unlock(task);
@@ -970,9 +982,9 @@ index 0000000..bcecdb4
 +{
 +}
 +
-+static void bfq_end_raising_async(struct bfq_data *bfqd)
++static void bfq_end_wr_async(struct bfq_data *bfqd)
 +{
-+	bfq_end_raising_async_queues(bfqd, bfqd->root_group);
++	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
 +}
 +
 +static inline void bfq_disconnect_groups(struct bfq_data *bfqd)
@@ -1044,10 +1056,10 @@ index 0000000..7f6b000
 +}
 diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
 new file mode 100644
-index 0000000..f5f71e4
+index 0000000..5aa5f09
 --- /dev/null
 +++ b/block/bfq-iosched.c
-@@ -0,0 +1,3300 @@
+@@ -0,0 +1,3902 @@
 +/*
 + * Budget Fair Queueing (BFQ) disk scheduler.
 + *
@@ -1059,28 +1071,32 @@ index 0000000..f5f71e4
 + *
 + * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
 + *
-+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ file.
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
 + *
-+ * BFQ is a proportional share disk scheduling algorithm based on the
-+ * slice-by-slice service scheme of CFQ. But BFQ assigns budgets, measured in
-+ * number of sectors, to tasks instead of time slices. The disk is not granted
-+ * to the in-service task for a given time slice, but until it has exhausted
-+ * its assigned budget. This change from the time to the service domain allows
-+ * BFQ to distribute the disk bandwidth among tasks as desired, without any
-+ * distortion due to ZBR, workload fluctuations or other factors. BFQ uses an
-+ * ad hoc internal scheduler, called B-WF2Q+, to schedule tasks according to
-+ * their budgets (more precisely BFQ schedules queues associated to tasks).
-+ * Thanks to this accurate scheduler, BFQ can afford to assign high budgets to
-+ * disk-bound non-seeky tasks (to boost the throughput), and yet guarantee low
-+ * latencies to interactive and soft real-time applications.
++ * BFQ is a proportional-share storage-I/O scheduling algorithm based on
++ * the slice-by-slice service scheme of CFQ. But BFQ assigns budgets,
++ * measured in number of sectors, to processes instead of time slices. The
++ * device is not granted to the in-service process for a given time slice,
++ * but until it has exhausted its assigned budget. This change from the time
++ * to the service domain allows BFQ to distribute the device throughput
++ * among processes as desired, without any distortion due to ZBR, workload
++ * fluctuations or other factors. BFQ uses an ad hoc internal scheduler,
++ * called B-WF2Q+, to schedule processes according to their budgets. More
++ * precisely, BFQ schedules queues associated to processes. Thanks to the
++ * accurate policy of B-WF2Q+, BFQ can afford to assign high budgets to
++ * I/O-bound processes issuing sequential requests (to boost the
++ * throughput), and yet guarantee a low latency to interactive and soft
++ * real-time applications.
 + *
 + * BFQ is described in [1], where also a reference to the initial, more
-+ * theoretical paper on BFQ can be found. The interested reader can find in
-+ * the latter paper full details on the main algorithm as well as formulas of
-+ * the guarantees, plus formal proofs of all the properties. With respect to
-+ * the version of BFQ presented in these papers, this implementation adds a
-+ * few more heuristics, such as the one that guarantees a low latency to soft
-+ * real-time applications, and a hierarchical extension based on H-WF2Q+.
++ * theoretical paper on BFQ can be found. The interested reader can find
++ * in the latter paper full details on the main algorithm, as well as
++ * formulas of the guarantees and formal proofs of all the properties.
++ * With respect to the version of BFQ presented in these papers, this
++ * implementation adds a few more heuristics, such as the one that
++ * guarantees a low latency to soft real-time applications, and a
++ * hierarchical extension based on H-WF2Q+.
 + *
 + * B-WF2Q+ is based on WF2Q+, that is described in [2], together with
 + * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
@@ -1165,21 +1181,46 @@ index 0000000..f5f71e4
 +#define BFQ_RATE_SHIFT		16
 +
 +/*
-+ * The duration of the weight raising for interactive applications is
-+ * computed automatically (as default behaviour), using the following
-+ * formula: duration = (R / r) * T, where r is the peak rate of the
-+ * disk, and R and T are two reference parameters. In particular, R is
-+ * the peak rate of a reference disk, and T is about the maximum time
-+ * for starting popular large applications on that disk, under BFQ and
-+ * while reading two files in parallel. Finally, BFQ uses two
-+ * different pairs (R, T) depending on whether the disk is rotational
-+ * or non-rotational.
++ * By default, BFQ computes the duration of the weight raising for
++ * interactive applications automatically, using the following formula:
++ * duration = (R / r) * T, where r is the peak rate of the device, and
++ * R and T are two reference parameters.
++ * In particular, R is the peak rate of the reference device (see below),
++ * and T is a reference time: given the systems that are likely to be
++ * installed on the reference device according to its speed class, T is
++ * about the maximum time needed, under BFQ and while reading two files in
++ * parallel, to load typical large applications on these systems.
++ * In practice, the slower/faster the device at hand is, the more/less it
++ * takes to load applications with respect to the reference device.
++ * Accordingly, the longer/shorter BFQ grants weight raising to interactive
++ * applications.
++ *
++ * BFQ uses four different reference pairs (R, T), depending on:
++ * . whether the device is rotational or non-rotational;
++ * . whether the device is slow, such as old or portable HDDs, as well as
++ *   SD cards, or fast, such as newer HDDs and SSDs.
++ *
++ * The device's speed class is dynamically (re)detected in
++ * bfq_update_peak_rate() every time the estimated peak rate is updated.
++ *
++ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
++ * are the reference values for a slow/fast rotational device, whereas
++ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
++ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
++ * thresholds used to switch between speed classes.
++ * Both the reference peak rates and the thresholds are measured in
++ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
++ */
++static int R_slow[2] = {1536, 10752};
++static int R_fast[2] = {17415, 34791};
++/*
++ * To improve readability, a conversion function is used to initialize the
++ * following arrays, which entails that they can be initialized only in a
++ * function.
 + */
-+#define T_rot			(msecs_to_jiffies(5500))
-+#define T_nonrot		(msecs_to_jiffies(2000))
-+/* Next two quantities are in sectors/usec, left-shifted by BFQ_RATE_SHIFT */
-+#define R_rot			17415
-+#define R_nonrot		34791
++static int T_slow[2];
++static int T_fast[2];
++static int device_speed_thresh[2];
 +
 +#define BFQ_SERVICE_TREE_INIT	((struct bfq_service_tree)		\
 +				{ RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
@@ -1385,6 +1426,125 @@ index 0000000..f5f71e4
 +		bfqq->pos_root = NULL;
 +}
 +
++/*
++ * Tell whether there are active queues or groups with differentiated weights.
++ */
++static inline bool bfq_differentiated_weights(struct bfq_data *bfqd)
++{
++	BUG_ON(!bfqd->hw_tag);
++	/*
++	 * For weights to differ, at least one of the trees must contain
++	 * at least two nodes.
++	 */
++	return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
++		(bfqd->queue_weights_tree.rb_node->rb_left ||
++		 bfqd->queue_weights_tree.rb_node->rb_right)
++#ifdef CONFIG_CGROUP_BFQIO
++	       ) ||
++	       (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
++		(bfqd->group_weights_tree.rb_node->rb_left ||
++		 bfqd->group_weights_tree.rb_node->rb_right)
++#endif
++	       );
++}
++
++/*
++ * If the weight-counter tree passed as input contains no counter for
++ * the weight of the input entity, then add that counter; otherwise just
++ * increment the existing counter.
++ *
++ * Note that weight-counter trees contain few nodes in mostly symmetric
++ * scenarios. For example, if all queues have the same weight, then the
++ * weight-counter tree for the queues may contain at most one node.
++ * This holds even if low_latency is on, because weight-raised queues
++ * are not inserted in the tree.
++ * In most scenarios, the rate at which nodes are created/destroyed
++ * should be low too.
++ */
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++				 struct bfq_entity *entity,
++				 struct rb_root *root)
++{
++	struct rb_node **new = &(root->rb_node), *parent = NULL;
++
++	/*
++	 * Do not insert if:
++	 * - the device does not support queueing;
++	 * - the entity is already associated with a counter, which happens if:
++	 *   1) the entity is associated with a queue, 2) a request arrival
++	 *   has caused the queue to become both non-weight-raised, and hence
++	 *   change its weight, and backlogged; in this respect, each
++	 *   of the two events causes an invocation of this function,
++	 *   3) this is the invocation of this function caused by the second
++	 *   event. This second invocation is actually useless, and we handle
++	 *   this fact by exiting immediately. More efficient or clearer
++	 *   solutions might possibly be adopted.
++	 */
++	if (!bfqd->hw_tag || entity->weight_counter)
++		return;
++
++	while (*new) {
++		struct bfq_weight_counter *__counter = container_of(*new,
++						struct bfq_weight_counter,
++						weights_node);
++		parent = *new;
++
++		if (entity->weight == __counter->weight) {
++			entity->weight_counter = __counter;
++			goto inc_counter;
++		}
++		if (entity->weight < __counter->weight)
++			new = &((*new)->rb_left);
++		else
++			new = &((*new)->rb_right);
++	}
++
++	entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
++					 GFP_ATOMIC);
++	entity->weight_counter->weight = entity->weight;
++	rb_link_node(&entity->weight_counter->weights_node, parent, new);
++	rb_insert_color(&entity->weight_counter->weights_node, root);
++
++inc_counter:
++	entity->weight_counter->num_active++;
++}
++
++/*
++ * Decrement the weight counter associated with the entity, and, if the
++ * counter reaches 0, remove the counter from the tree.
++ * See the comments to the function bfq_weights_tree_add() for considerations
++ * about overhead.
++ */
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++				    struct bfq_entity *entity,
++				    struct rb_root *root)
++{
++	/*
++	 * Check whether the entity is actually associated with a counter.
++	 * In fact, the device may not be considered NCQ-capable for a while,
++	 * which implies that no insertion in the weight trees is performed,
++	 * after which the device may start to be deemed NCQ-capable, and hence
++	 * this function may start to be invoked. This may cause the function
++	 * to be invoked for entities that are not associated with any counter.
++	 */
++	if (!entity->weight_counter)
++		return;
++
++	BUG_ON(RB_EMPTY_ROOT(root));
++	BUG_ON(entity->weight_counter->weight != entity->weight);
++
++	BUG_ON(!entity->weight_counter->num_active);
++	entity->weight_counter->num_active--;
++	if (entity->weight_counter->num_active > 0)
++		goto reset_entity_pointer;
++
++	rb_erase(&entity->weight_counter->weights_node, root);
++	kfree(entity->weight_counter);
++
++reset_entity_pointer:
++	entity->weight_counter = NULL;
++}
++
 +static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
 +					struct bfq_queue *bfqq,
 +					struct request *last)
@@ -1409,37 +1569,12 @@ index 0000000..f5f71e4
 +	return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
 +}
 +
-+static void bfq_del_rq_rb(struct request *rq)
-+{
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+	const int sync = rq_is_sync(rq);
-+
-+	BUG_ON(bfqq->queued[sync] == 0);
-+	bfqq->queued[sync]--;
-+	bfqd->queued--;
-+
-+	elv_rb_del(&bfqq->sort_list, rq);
-+
-+	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
-+		if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
-+			bfq_del_bfqq_busy(bfqd, bfqq, 1);
-+		/*
-+		 * Remove queue from request-position tree as it is empty.
-+		 */
-+		if (bfqq->pos_root != NULL) {
-+			rb_erase(&bfqq->pos_node, bfqq->pos_root);
-+			bfqq->pos_root = NULL;
-+		}
-+	}
-+}
-+
 +/* see the definition of bfq_async_charge_factor for details */
 +static inline unsigned long bfq_serv_to_charge(struct request *rq,
 +					       struct bfq_queue *bfqq)
 +{
 +	return blk_rq_sectors(rq) *
-+		(1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->raising_coeff == 1) *
++		(1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
 +		bfq_async_charge_factor));
 +}
 +
@@ -1477,17 +1612,20 @@ index 0000000..f5f71e4
 +
 +	new_budget = max_t(unsigned long, bfqq->max_budget,
 +			   bfq_serv_to_charge(next_rq, bfqq));
-+	entity->budget = new_budget;
-+	bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu", new_budget);
-+	bfq_activate_bfqq(bfqd, bfqq);
++	if (entity->budget != new_budget) {
++		entity->budget = new_budget;
++		bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
++					 new_budget);
++		bfq_activate_bfqq(bfqd, bfqq);
++	}
 +}
 +
-+static inline unsigned int bfq_wrais_duration(struct bfq_data *bfqd)
++static inline unsigned int bfq_wr_duration(struct bfq_data *bfqd)
 +{
 +	u64 dur;
 +
-+	if (bfqd->bfq_raising_max_time > 0)
-+		return bfqd->bfq_raising_max_time;
++	if (bfqd->bfq_wr_max_time > 0)
++		return bfqd->bfq_wr_max_time;
 +
 +	dur = bfqd->RT_prod;
 +	do_div(dur, bfqd->peak_rate);
@@ -1495,16 +1633,230 @@ index 0000000..f5f71e4
 +	return dur;
 +}
 +
-+static void bfq_add_rq_rb(struct request *rq)
++/* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
++static inline void bfq_reset_burst_list(struct bfq_data *bfqd,
++					struct bfq_queue *bfqq)
++{
++	struct bfq_queue *item;
++	struct hlist_node *n;
++
++	hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
++		hlist_del_init(&item->burst_list_node);
++	hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++	bfqd->burst_size = 1;
++}
++
++/* Add bfqq to the list of queues in current burst (see bfq_handle_burst) */
++static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	/* Increment burst size to take into account also bfqq */
++	bfqd->burst_size++;
++
++	if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
++		struct bfq_queue *pos, *bfqq_item;
++		struct hlist_node *n;
++
++		/*
++		 * Enough queues have been activated shortly after each
++		 * other to consider this burst as large.
++		 */
++		bfqd->large_burst = true;
++
++		/*
++		 * We can now mark all queues in the burst list as
++		 * belonging to a large burst.
++		 */
++		hlist_for_each_entry(bfqq_item, &bfqd->burst_list,
++				     burst_list_node)
++		        bfq_mark_bfqq_in_large_burst(bfqq_item);
++		bfq_mark_bfqq_in_large_burst(bfqq);
++
++		/*
++		 * From now on, and until the current burst finishes, any
++		 * new queue being activated shortly after the last queue
++		 * was inserted in the burst can be immediately marked as
++		 * belonging to a large burst. So the burst list is not
++		 * needed any more. Remove it.
++		 */
++		hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
++					  burst_list_node)
++			hlist_del_init(&pos->burst_list_node);
++	} else /* burst not yet large: add bfqq to the burst list */
++		hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++}
++
++/*
++ * If many queues happen to become active shortly after each other, then,
++ * to help the processes associated to these queues get their job done as
++ * soon as possible, it is usually better to not grant either weight-raising
++ * or device idling to these queues. In this comment we describe, firstly,
++ * the reasons why this fact holds, and, secondly, the next function, which
++ * implements the main steps needed to properly mark these queues so that
++ * they can then be treated in a different way.
++ *
++ * As for the terminology, we say that a queue becomes active, i.e.,
++ * switches from idle to backlogged, either when it is created (as a
++ * consequence of the arrival of an I/O request), or, if already existing,
++ * when a new request for the queue arrives while the queue is idle.
++ * Bursts of activations, i.e., activations of different queues occurring
++ * shortly after each other, are typically caused by services or applications
++ * that spawn or reactivate many parallel threads/processes. Examples are
++ * systemd during boot or git grep.
++ *
++ * These services or applications benefit mostly from a high throughput:
++ * the quicker the requests of the activated queues are cumulatively served,
++ * the sooner the target job of these queues gets completed. As a consequence,
++ * weight-raising any of these queues, which also implies idling the device
++ * for it, is almost always counterproductive: in most cases it just lowers
++ * throughput.
++ *
++ * On the other hand, a burst of activations may be also caused by the start
++ * of an application that does not consist in a lot of parallel I/O-bound
++ * threads. In fact, with a complex application, the burst may be just a
++ * consequence of the fact that several processes need to be executed to
++ * start-up the application. To start an application as quickly as possible,
++ * the best thing to do is to privilege the I/O related to the application
++ * with respect to all other I/O. Therefore, the best strategy to start as
++ * quickly as possible an application that causes a burst of activations is
++ * to weight-raise all the queues activated during the burst. This is the
++ * exact opposite of the best strategy for the other type of bursts.
++ *
++ * In the end, to take the best action for each of the two cases, the two
++ * types of bursts need to be distinguished. Fortunately, this seems
++ * relatively easy to do, by looking at the sizes of the bursts. In
++ * particular, we found a threshold such that bursts with a larger size
++ * than that threshold are apparently caused only by services or commands
++ * such as systemd or git grep. For brevity, hereafter we call just 'large'
++ * these bursts. BFQ *does not* weight-raise queues whose activations occur
++ * in a large burst. In addition, for each of these queues BFQ performs or
++ * does not perform idling depending on which choice boosts the throughput
++ * most. The exact choice depends on the device and request pattern at
++ * hand.
++ *
++ * Turning back to the next function, it implements all the steps needed
++ * to detect the occurrence of a large burst and to properly mark all the
++ * queues belonging to it (so that they can then be treated in a different
++ * way). This goal is achieved by maintaining a special "burst list" that
++ * holds, temporarily, the queues that belong to the burst in progress. The
++ * list is then used to mark these queues as belonging to a large burst if
++ * the burst does become large. The main steps are the following.
++ *
++ * . when the very first queue is activated, the queue is inserted into the
++ *   list (as it could be the first queue in a possible burst)
++ *
++ * . if the current burst has not yet become large, and a queue Q that does
++ *   not yet belong to the burst is activated shortly after the last time
++ *   at which a new queue entered the burst list, then the function appends
++ *   Q to the burst list
++ *
++ * . if, as a consequence of the previous step, the burst size reaches
++ *   the large-burst threshold, then
++ *
++ *     . all the queues in the burst list are marked as belonging to a
++ *       large burst
++ *
++ *     . the burst list is deleted; in fact, the burst list already served
++ *       its purpose (keeping temporarily track of the queues in a burst,
++ *       so as to be able to mark them as belonging to a large burst in the
++ *       previous sub-step), and now is not needed any more
++ *
++ *     . the device enters a large-burst mode
++ *
++ * . if a queue Q that does not belong to the burst is activated while
++ *   the device is in large-burst mode and shortly after the last time
++ *   at which a queue either entered the burst list or was marked as
++ *   belonging to the current large burst, then Q is immediately marked
++ *   as belonging to a large burst.
++ *
++ * . if a queue Q that does not belong to the burst is activated a while
++ *   later, i.e., not shortly after, than the last time at which a queue
++ *   either entered the burst list or was marked as belonging to the
++ *   current large burst, then the current burst is deemed as finished and:
++ *
++ *        . the large-burst mode is reset if set
++ *
++ *        . the burst list is emptied
++ *
++ *        . Q is inserted in the burst list, as Q may be the first queue
++ *          in a possible new burst (then the burst list contains just Q
++ *          after this step).
++ */
++static void bfq_handle_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			     bool idle_for_long_time)
++{
++	/*
++	 * If bfqq happened to be activated in a burst, but has been idle
++	 * for at least as long as an interactive queue, then we assume
++	 * that, in the overall I/O initiated in the burst, the I/O
++	 * associated to bfqq is finished. So bfqq does not need to be
++	 * treated as a queue belonging to a burst anymore. Accordingly,
++	 * we reset bfqq's in_large_burst flag if set, and remove bfqq
++	 * from the burst list if it's there. We do not decrement instead
++	 * burst_size, because the fact that bfqq does not need to belong
++	 * to the burst list any more does not invalidate the fact that
++	 * bfqq may have been activated during the current burst.
++	 */
++	if (idle_for_long_time) {
++		hlist_del_init(&bfqq->burst_list_node);
++		bfq_clear_bfqq_in_large_burst(bfqq);
++	}
++
++	/*
++	 * If bfqq is already in the burst list or is part of a large
++	 * burst, then there is nothing else to do.
++	 */
++	if (!hlist_unhashed(&bfqq->burst_list_node) ||
++	    bfq_bfqq_in_large_burst(bfqq))
++		return;
++
++	/*
++	 * If bfqq's activation happens late enough, then the current
++	 * burst is finished, and related data structures must be reset.
++	 *
++	 * In this respect, consider the special case where bfqq is the very
++	 * first queue being activated. In this case, last_ins_in_burst is
++	 * not yet significant when we get here. But it is easy to verify
++	 * that, whether or not the following condition is true, bfqq will
++	 * end up being inserted into the burst list. In particular the
++	 * list will happen to contain only bfqq. And this is exactly what
++	 * has to happen, as bfqq may be the first queue in a possible
++	 * burst.
++	 */
++	if (time_is_before_jiffies(bfqd->last_ins_in_burst +
++	    bfqd->bfq_burst_interval)) {
++		bfqd->large_burst = false;
++		bfq_reset_burst_list(bfqd, bfqq);
++		return;
++	}
++
++	/*
++	 * If we get here, then bfqq is being activated shortly after the
++	 * last queue. So, if the current burst is also large, we can mark
++	 * bfqq as belonging to this large burst immediately.
++	 */
++	if (bfqd->large_burst) {
++		bfq_mark_bfqq_in_large_burst(bfqq);
++		return;
++	}
++
++	/*
++	 * If we get here, then a large-burst state has not yet been
++	 * reached, but bfqq is being activated shortly after the last
++	 * queue. Then we add bfqq to the burst.
++	 */
++	bfq_add_to_burst(bfqd, bfqq);
++}
++
++static void bfq_add_request(struct request *rq)
 +{
 +	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 +	struct bfq_entity *entity = &bfqq->entity;
 +	struct bfq_data *bfqd = bfqq->bfqd;
 +	struct request *next_rq, *prev;
-+	unsigned long old_raising_coeff = bfqq->raising_coeff;
-+	int idle_for_long_time = 0;
++	unsigned long old_wr_coeff = bfqq->wr_coeff;
++	bool interactive = false;
 +
-+	bfq_log_bfqq(bfqd, bfqq, "add_rq_rb %d", rq_is_sync(rq));
++	bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
 +	bfqq->queued[rq_is_sync(rq)]++;
 +	bfqd->queued++;
 +
@@ -1525,14 +1877,50 @@ index 0000000..f5f71e4
 +		bfq_rq_pos_tree_add(bfqd, bfqq);
 +
 +	if (!bfq_bfqq_busy(bfqq)) {
-+		int soft_rt = bfqd->bfq_raising_max_softrt_rate > 0 &&
++		bool soft_rt,
++		     idle_for_long_time = time_is_before_jiffies(
++						bfqq->budget_timeout +
++						bfqd->bfq_wr_min_idle_time);
++
++		if (bfq_bfqq_sync(bfqq)) {
++			bool already_in_burst =
++			   !hlist_unhashed(&bfqq->burst_list_node) ||
++			   bfq_bfqq_in_large_burst(bfqq);
++			bfq_handle_burst(bfqd, bfqq, idle_for_long_time);
++			/*
++			 * If bfqq was not already in the current burst,
++			 * then, at this point, bfqq either has been
++			 * added to the current burst or has caused the
++			 * current burst to terminate. In particular, in
++			 * the second case, bfqq has become the first
++			 * queue in a possible new burst.
++			 * In both cases last_ins_in_burst needs to be
++			 * moved forward.
++			 */
++			if (!already_in_burst)
++				bfqd->last_ins_in_burst = jiffies;
++		}
++
++		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
++			!bfq_bfqq_in_large_burst(bfqq) &&
 +			time_is_before_jiffies(bfqq->soft_rt_next_start);
-+		idle_for_long_time = time_is_before_jiffies(
-+			bfqq->budget_timeout +
-+			bfqd->bfq_raising_min_idle_time);
++		interactive = !bfq_bfqq_in_large_burst(bfqq) &&
++			      idle_for_long_time;
 +		entity->budget = max_t(unsigned long, bfqq->max_budget,
 +				       bfq_serv_to_charge(next_rq, bfqq));
 +
++		if (!bfq_bfqq_IO_bound(bfqq)) {
++			if (time_before(jiffies,
++					RQ_BIC(rq)->ttime.last_end_request +
++					bfqd->bfq_slice_idle)) {
++				bfqq->requests_within_timer++;
++				if (bfqq->requests_within_timer >=
++				    bfqd->bfq_requests_within_timer)
++					bfq_mark_bfqq_IO_bound(bfqq);
++			} else
++				bfqq->requests_within_timer = 0;
++		}
++
 +		if (!bfqd->low_latency)
 +			goto add_bfqq_busy;
 +
@@ -1540,45 +1928,40 @@ index 0000000..f5f71e4
 +		 * If the queue is not being boosted and has been idle
 +		 * for enough time, start a weight-raising period
 +		 */
-+		if (old_raising_coeff == 1 &&
-+		    (idle_for_long_time || soft_rt)) {
-+			bfqq->raising_coeff = bfqd->bfq_raising_coeff;
-+			if (idle_for_long_time)
-+				bfqq->raising_cur_max_time =
-+					bfq_wrais_duration(bfqd);
++		if (old_wr_coeff == 1 && (interactive || soft_rt)) {
++			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++			if (interactive)
++				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
 +			else
-+				bfqq->raising_cur_max_time =
-+					bfqd->bfq_raising_rt_max_time;
++				bfqq->wr_cur_max_time =
++					bfqd->bfq_wr_rt_max_time;
 +			bfq_log_bfqq(bfqd, bfqq,
-+				     "wrais starting at %lu, "
-+				     "rais_max_time %u",
++				     "wrais starting at %lu, rais_max_time %u",
 +				     jiffies,
-+				     jiffies_to_msecs(bfqq->
-+					raising_cur_max_time));
-+		} else if (old_raising_coeff > 1) {
-+			if (idle_for_long_time)
-+				bfqq->raising_cur_max_time =
-+					bfq_wrais_duration(bfqd);
-+			else if (bfqq->raising_cur_max_time ==
-+				 bfqd->bfq_raising_rt_max_time &&
-+				 !soft_rt) {
-+				bfqq->raising_coeff = 1;
++				     jiffies_to_msecs(bfqq->wr_cur_max_time));
++		} else if (old_wr_coeff > 1) {
++			if (interactive)
++				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++			else if (bfq_bfqq_in_large_burst(bfqq) ||
++				 (bfqq->wr_cur_max_time ==
++				  bfqd->bfq_wr_rt_max_time &&
++				  !soft_rt)) {
++				bfqq->wr_coeff = 1;
 +				bfq_log_bfqq(bfqd, bfqq,
-+					     "wrais ending at %lu, "
-+					     "rais_max_time %u",
-+					     jiffies,
-+					     jiffies_to_msecs(bfqq->
-+						raising_cur_max_time));
++					"wrais ending at %lu, rais_max_time %u",
++					jiffies,
++					jiffies_to_msecs(bfqq->
++						wr_cur_max_time));
 +			} else if (time_before(
-+					bfqq->last_rais_start_finish +
-+					bfqq->raising_cur_max_time,
++					bfqq->last_wr_start_finish +
++					bfqq->wr_cur_max_time,
 +					jiffies +
-+					bfqd->bfq_raising_rt_max_time) &&
++					bfqd->bfq_wr_rt_max_time) &&
 +				   soft_rt) {
 +				/*
 +				 *
 +				 * The remaining weight-raising time is lower
-+				 * than bfqd->bfq_raising_rt_max_time, which
++				 * than bfqd->bfq_wr_rt_max_time, which
 +				 * means that the application is enjoying
 +				 * weight raising either because deemed soft-
 +				 * rt in the near past, or because deemed
@@ -1619,12 +2002,12 @@ index 0000000..f5f71e4
 +				 *    latency because the application is not
 +				 *    weight-raised while they are pending.
 +				 */
-+				bfqq->last_rais_start_finish = jiffies;
-+				bfqq->raising_cur_max_time =
-+					bfqd->bfq_raising_rt_max_time;
++				bfqq->last_wr_start_finish = jiffies;
++				bfqq->wr_cur_max_time =
++					bfqd->bfq_wr_rt_max_time;
 +			}
 +		}
-+		if (old_raising_coeff != bfqq->raising_coeff)
++		if (old_wr_coeff != bfqq->wr_coeff)
 +			entity->ioprio_changed = 1;
 +add_bfqq_busy:
 +		bfqq->last_idle_bklogged = jiffies;
@@ -1632,38 +2015,27 @@ index 0000000..f5f71e4
 +		bfq_clear_bfqq_softrt_update(bfqq);
 +		bfq_add_bfqq_busy(bfqd, bfqq);
 +	} else {
-+		if (bfqd->low_latency && old_raising_coeff == 1 &&
-+			!rq_is_sync(rq) &&
-+			time_is_before_jiffies(
-+				bfqq->last_rais_start_finish +
-+				bfqd->bfq_raising_min_inter_arr_async)) {
-+			bfqq->raising_coeff = bfqd->bfq_raising_coeff;
-+			bfqq->raising_cur_max_time = bfq_wrais_duration(bfqd);
-+
-+			bfqd->raised_busy_queues++;
++		if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
++		    time_is_before_jiffies(
++				bfqq->last_wr_start_finish +
++				bfqd->bfq_wr_min_inter_arr_async)) {
++			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++
++			bfqd->wr_busy_queues++;
 +			entity->ioprio_changed = 1;
 +			bfq_log_bfqq(bfqd, bfqq,
-+				     "non-idle wrais starting at %lu, "
-+				     "rais_max_time %u",
-+				     jiffies,
-+				     jiffies_to_msecs(bfqq->
-+					raising_cur_max_time));
++			    "non-idle wrais starting at %lu, rais_max_time %u",
++			    jiffies,
++			    jiffies_to_msecs(bfqq->wr_cur_max_time));
 +		}
-+		bfq_updated_next_req(bfqd, bfqq);
++		if (prev != bfqq->next_rq)
++			bfq_updated_next_req(bfqd, bfqq);
 +	}
 +
 +	if (bfqd->low_latency &&
-+		(old_raising_coeff == 1 || bfqq->raising_coeff == 1 ||
-+		 idle_for_long_time))
-+		bfqq->last_rais_start_finish = jiffies;
-+}
-+
-+static void bfq_reposition_rq_rb(struct bfq_queue *bfqq, struct request *rq)
-+{
-+	elv_rb_del(&bfqq->sort_list, rq);
-+	bfqq->queued[rq_is_sync(rq)]--;
-+	bfqq->bfqd->queued--;
-+	bfq_add_rq_rb(rq);
++		(old_wr_coeff == 1 || bfqq->wr_coeff == 1 || interactive))
++		bfqq->last_wr_start_finish = jiffies;
 +}
 +
 +static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
@@ -1694,11 +2066,12 @@ index 0000000..f5f71e4
 +		(long long unsigned)bfqd->last_position);
 +}
 +
-+static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
++static inline void bfq_deactivate_request(struct request_queue *q,
++					  struct request *rq)
 +{
 +	struct bfq_data *bfqd = q->elevator->elevator_data;
 +
-+	WARN_ON(bfqd->rq_in_driver == 0);
++	BUG_ON(bfqd->rq_in_driver == 0);
 +	bfqd->rq_in_driver--;
 +}
 +
@@ -1706,6 +2079,7 @@ index 0000000..f5f71e4
 +{
 +	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 +	struct bfq_data *bfqd = bfqq->bfqd;
++	const int sync = rq_is_sync(rq);
 +
 +	if (bfqq->next_rq == rq) {
 +		bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
@@ -1713,10 +2087,25 @@ index 0000000..f5f71e4
 +	}
 +
 +	list_del_init(&rq->queuelist);
-+	bfq_del_rq_rb(rq);
++	BUG_ON(bfqq->queued[sync] == 0);
++	bfqq->queued[sync]--;
++	bfqd->queued--;
++	elv_rb_del(&bfqq->sort_list, rq);
++
++	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++		if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
++			bfq_del_bfqq_busy(bfqd, bfqq, 1);
++		/*
++		 * Remove queue from request-position tree as it is empty.
++		 */
++		if (bfqq->pos_root != NULL) {
++			rb_erase(&bfqq->pos_node, bfqq->pos_root);
++			bfqq->pos_root = NULL;
++		}
++	}
 +
 +	if (rq->cmd_flags & REQ_META) {
-+		WARN_ON(bfqq->meta_pending == 0);
++		BUG_ON(bfqq->meta_pending == 0);
 +		bfqq->meta_pending--;
 +	}
 +}
@@ -1739,10 +2128,33 @@ index 0000000..f5f71e4
 +static void bfq_merged_request(struct request_queue *q, struct request *req,
 +			       int type)
 +{
-+	if (type == ELEVATOR_FRONT_MERGE) {
++	if (type == ELEVATOR_FRONT_MERGE &&
++	    rb_prev(&req->rb_node) &&
++	    blk_rq_pos(req) <
++	    blk_rq_pos(container_of(rb_prev(&req->rb_node),
++				    struct request, rb_node))) {
 +		struct bfq_queue *bfqq = RQ_BFQQ(req);
-+
-+		bfq_reposition_rq_rb(bfqq, req);
++		struct bfq_data *bfqd = bfqq->bfqd;
++		struct request *prev, *next_rq;
++
++		/* Reposition request in its sort_list */
++		elv_rb_del(&bfqq->sort_list, req);
++		elv_rb_add(&bfqq->sort_list, req);
++		/* Choose next request to be served for bfqq */
++		prev = bfqq->next_rq;
++		next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
++					 bfqd->last_position);
++		BUG_ON(next_rq == NULL);
++		bfqq->next_rq = next_rq;
++		/*
++		 * If next_rq changes, update both the queue's budget to
++		 * fit the new request and the queue's position in its
++		 * rq_pos_tree.
++		 */
++		if (prev != bfqq->next_rq) {
++			bfq_updated_next_req(bfqd, bfqq);
++			bfq_rq_pos_tree_add(bfqd, bfqq);
++		}
 +	}
 +}
 +
@@ -1767,41 +2179,41 @@ index 0000000..f5f71e4
 +}
 +
 +/* Must be called with bfqq != NULL */
-+static inline void bfq_bfqq_end_raising(struct bfq_queue *bfqq)
++static inline void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
 +{
 +	BUG_ON(bfqq == NULL);
 +	if (bfq_bfqq_busy(bfqq))
-+		bfqq->bfqd->raised_busy_queues--;
-+	bfqq->raising_coeff = 1;
-+	bfqq->raising_cur_max_time = 0;
++		bfqq->bfqd->wr_busy_queues--;
++	bfqq->wr_coeff = 1;
++	bfqq->wr_cur_max_time = 0;
 +	/* Trigger a weight change on the next activation of the queue */
 +	bfqq->entity.ioprio_changed = 1;
 +}
 +
-+static void bfq_end_raising_async_queues(struct bfq_data *bfqd,
-+					struct bfq_group *bfqg)
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++				    struct bfq_group *bfqg)
 +{
 +	int i, j;
 +
 +	for (i = 0; i < 2; i++)
 +		for (j = 0; j < IOPRIO_BE_NR; j++)
 +			if (bfqg->async_bfqq[i][j] != NULL)
-+				bfq_bfqq_end_raising(bfqg->async_bfqq[i][j]);
++				bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
 +	if (bfqg->async_idle_bfqq != NULL)
-+		bfq_bfqq_end_raising(bfqg->async_idle_bfqq);
++		bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
 +}
 +
-+static void bfq_end_raising(struct bfq_data *bfqd)
++static void bfq_end_wr(struct bfq_data *bfqd)
 +{
 +	struct bfq_queue *bfqq;
 +
 +	spin_lock_irq(bfqd->queue->queue_lock);
 +
 +	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
-+		bfq_bfqq_end_raising(bfqq);
++		bfq_bfqq_end_wr(bfqq);
 +	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
-+		bfq_bfqq_end_raising(bfqq);
-+	bfq_end_raising_async(bfqd);
++		bfq_bfqq_end_wr(bfqq);
++	bfq_end_wr_async(bfqd);
 +
 +	spin_unlock_irq(bfqd->queue->queue_lock);
 +}
@@ -1904,8 +2316,8 @@ index 0000000..f5f71e4
 +
 +	/*
 +	 * If the exact sector wasn't found, the parent of the NULL leaf
-+	 * will contain the closest sector (rq_pos_tree sorted by next_request
-+	 * position).
++	 * will contain the closest sector (rq_pos_tree sorted by
++	 * next_request position).
 +	 */
 +	__bfqq = rb_entry(parent, struct bfq_queue, pos_node);
 +	if (bfq_rq_close(bfqd, __bfqq->next_rq))
@@ -2007,35 +2419,15 @@ index 0000000..f5f71e4
 +		return bfqd->bfq_max_budget / 32;
 +}
 +
-+/*
-+ * Decides whether idling should be done for given device and
-+ * given in-service queue.
-+ */
-+static inline bool bfq_queue_nonrot_noidle(struct bfq_data *bfqd,
-+					   struct bfq_queue *in_service_bfqq)
-+{
-+	if (in_service_bfqq == NULL)
-+		return false;
-+	/*
-+	 * If the device is non-rotational, and hence has no seek penalty,
-+	 * disable idling; but do so only if:
-+	 * - device does not support queuing, otherwise we still have
-+	 *   a problem with sync vs async workloads;
-+	 * - the queue is not weight-raised, to preserve guarantees.
-+	 */
-+	return blk_queue_nonrot(bfqd->queue) && bfqd->hw_tag &&
-+	       (in_service_bfqq->raising_coeff == 1);
-+}
-+
 +static void bfq_arm_slice_timer(struct bfq_data *bfqd)
 +{
 +	struct bfq_queue *bfqq = bfqd->in_service_queue;
 +	struct bfq_io_cq *bic;
 +	unsigned long sl;
 +
-+	WARN_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
 +
-+	/* Tasks have exited, don't wait. */
++	/* Processes have exited, don't wait. */
 +	bic = bfqd->in_service_bic;
 +	if (bic == NULL || atomic_read(&bic->icq.ioc->active_ref) == 0)
 +		return;
@@ -2053,11 +2445,17 @@ index 0000000..f5f71e4
 +	 * BFQ_MIN_TT. This happened to help reduce latency.
 +	 */
 +	sl = bfqd->bfq_slice_idle;
-+	if (bfq_sample_valid(bfqq->seek_samples) && BFQQ_SEEKY(bfqq) &&
-+	    bfqq->entity.service > bfq_max_budget(bfqd) / 8 &&
-+	    bfqq->raising_coeff == 1)
++	/*
++	 * Unless the queue is being weight-raised, grant only minimum idle
++	 * time if the queue either has been seeky for long enough or has
++	 * already proved to be constantly seeky.
++	 */
++	if (bfq_sample_valid(bfqq->seek_samples) &&
++	    ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
++				  bfq_max_budget(bfqq->bfqd) / 8) ||
++	      bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1)
 +		sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
-+	else if (bfqq->raising_coeff > 1)
++	else if (bfqq->wr_coeff > 1)
 +		sl = sl * 3;
 +	bfqd->last_idling_start = ktime_get();
 +	mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
@@ -2074,7 +2472,7 @@ index 0000000..f5f71e4
 +{
 +	struct bfq_queue *bfqq = bfqd->in_service_queue;
 +	unsigned int timeout_coeff;
-+	if (bfqq->raising_cur_max_time == bfqd->bfq_raising_rt_max_time)
++	if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
 +		timeout_coeff = 1;
 +	else
 +		timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
@@ -2098,8 +2496,18 @@ index 0000000..f5f71e4
 +	struct bfq_data *bfqd = q->elevator->elevator_data;
 +	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 +
-+	bfq_remove_request(rq);
++	/*
++	 * For consistency, the next instruction should have been executed
++	 * after removing the request from the queue and dispatching it.
++	 * We execute instead this instruction before bfq_remove_request()
++	 * (and hence introduce a temporary inconsistency), for efficiency.
++	 * In fact, in a forced_dispatch, this prevents two counters related
++	 * to bfqq->dispatched to risk to be uselessly decremented if bfqq
++	 * is not in service, and then to be incremented again after
++	 * incrementing bfqq->dispatched.
++	 */
 +	bfqq->dispatched++;
++	bfq_remove_request(rq);
 +	elv_dispatch_sort(q, rq);
 +
 +	if (bfq_bfqq_sync(bfqq))
@@ -2129,9 +2537,7 @@ index 0000000..f5f71e4
 +	return rq;
 +}
 +
-+/*
-+ * Must be called with the queue_lock held.
-+ */
++/* Must be called with the queue_lock held. */
 +static int bfqq_process_refs(struct bfq_queue *bfqq)
 +{
 +	int process_refs, io_refs;
@@ -2209,9 +2615,9 @@ index 0000000..f5f71e4
 +
 +	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
 +		/*
-+		 * overloading budget_timeout field to store when
-+		 * the queue remains with no backlog, used by
-+		 * the weight-raising mechanism
++		 * Overloading budget_timeout field to store the time
++		 * at which the queue remains with no backlog; used by
++		 * the weight-raising mechanism.
 +		 */
 +		bfqq->budget_timeout = jiffies;
 +		bfq_del_bfqq_busy(bfqd, bfqq, 1);
@@ -2439,11 +2845,26 @@ index 0000000..f5f71e4
 +			bfqd->peak_rate_samples++;
 +
 +		if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
-+		    update && bfqd->bfq_user_max_budget == 0) {
-+			bfqd->bfq_max_budget =
-+				bfq_calc_max_budget(bfqd->peak_rate, timeout);
-+			bfq_log(bfqd, "new max_budget=%lu",
-+				bfqd->bfq_max_budget);
++		    update) {
++			int dev_type = blk_queue_nonrot(bfqd->queue);
++			if (bfqd->bfq_user_max_budget == 0) {
++				bfqd->bfq_max_budget =
++					bfq_calc_max_budget(bfqd->peak_rate,
++							    timeout);
++				bfq_log(bfqd, "new max_budget=%lu",
++					bfqd->bfq_max_budget);
++			}
++			if (bfqd->device_speed == BFQ_BFQD_FAST &&
++			    bfqd->peak_rate < device_speed_thresh[dev_type]) {
++				bfqd->device_speed = BFQ_BFQD_SLOW;
++				bfqd->RT_prod = R_slow[dev_type] *
++						T_slow[dev_type];
++			} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
++			    bfqd->peak_rate > device_speed_thresh[dev_type]) {
++				bfqd->device_speed = BFQ_BFQD_FAST;
++				bfqd->RT_prod = R_fast[dev_type] *
++						T_fast[dev_type];
++			}
 +		}
 +	}
 +
@@ -2479,10 +2900,10 @@ index 0000000..f5f71e4
 +}
 +
 +/*
-+ * To be deemed as soft real-time, an application must meet two requirements.
-+ * First, the application must not require an average bandwidth higher than
-+ * the approximate bandwidth required to playback or record a compressed high-
-+ * definition video.
++ * To be deemed as soft real-time, an application must meet two
++ * requirements. First, the application must not require an average
++ * bandwidth higher than the approximate bandwidth required to playback or
++ * record a compressed high-definition video.
 + * The next function is invoked on the completion of the last request of a
 + * batch, to compute the next-start time instant, soft_rt_next_start, such
 + * that, if the next request of the application does not arrive before
@@ -2493,30 +2914,31 @@ index 0000000..f5f71e4
 + * the application stops issuing new requests until all its pending requests
 + * have been completed. After that, the application may issue a new batch,
 + * and so on.
-+ * For this reason the next function is invoked to compute soft_rt_next_start
-+ * only for applications that meet this requirement, whereas soft_rt_next_start
-+ * is set to infinity for applications that do not.
++ * For this reason the next function is invoked to compute
++ * soft_rt_next_start only for applications that meet this requirement,
++ * whereas soft_rt_next_start is set to infinity for applications that do
++ * not.
 + *
 + * Unfortunately, even a greedy application may happen to behave in an
-+ * isochronous way if the CPU load is high. In fact, the application may stop
-+ * issuing requests while the CPUs are busy serving other processes, then
-+ * restart, then stop again for a while, and so on. In addition, if the disk
-+ * achieves a low enough throughput with the request pattern issued by the
-+ * application (e.g., because the request pattern is random and/or the device
-+ * is slow), then the application may meet the above bandwidth requirement too.
-+ * To prevent such a greedy application to be deemed as soft real-time, a
-+ * further rule is used in the computation of soft_rt_next_start:
-+ * soft_rt_next_start must be higher than the current time plus the maximum
-+ * time for which the arrival of a request is waited for when a sync queue
-+ * becomes idle, namely bfqd->bfq_slice_idle.
-+ * This filters out greedy applications, as the latter issue instead their next
-+ * request as soon as possible after the last one has been completed (in
-+ * contrast, when a batch of requests is completed, a soft real-time application
-+ * spends some time processing data).
++ * isochronous way if the CPU load is high. In fact, the application may
++ * stop issuing requests while the CPUs are busy serving other processes,
++ * then restart, then stop again for a while, and so on. In addition, if
++ * the disk achieves a low enough throughput with the request pattern
++ * issued by the application (e.g., because the request pattern is random
++ * and/or the device is slow), then the application may meet the above
++ * bandwidth requirement too. To prevent such a greedy application to be
++ * deemed as soft real-time, a further rule is used in the computation of
++ * soft_rt_next_start: soft_rt_next_start must be higher than the current
++ * time plus the maximum time for which the arrival of a request is waited
++ * for when a sync queue becomes idle, namely bfqd->bfq_slice_idle.
++ * This filters out greedy applications, as the latter issue instead their
++ * next request as soon as possible after the last one has been completed
++ * (in contrast, when a batch of requests is completed, a soft real-time
++ * application spends some time processing data).
 + *
-+ * Unfortunately, the last filter may easily generate false positives if only
-+ * bfqd->bfq_slice_idle is used as a reference time interval and one or both
-+ * the following cases occur:
++ * Unfortunately, the last filter may easily generate false positives if
++ * only bfqd->bfq_slice_idle is used as a reference time interval and one
++ * or both the following cases occur:
 + * 1) HZ is so low that the duration of a jiffy is comparable to or higher
 + *    than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
 + *    HZ=100.
@@ -2525,15 +2947,16 @@ index 0000000..f5f71e4
 + *    increments. This seems to happen, e.g., inside virtual machines.
 + * To address this issue, we do not use as a reference time interval just
 + * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
-+ * particular we add the minimum number of jiffies for which the filter seems
-+ * to be quite precise also in embedded systems and KVM/QEMU virtual machines.
++ * particular we add the minimum number of jiffies for which the filter
++ * seems to be quite precise also in embedded systems and KVM/QEMU virtual
++ * machines.
 + */
 +static inline unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
 +						       struct bfq_queue *bfqq)
 +{
 +	return max(bfqq->last_idle_bklogged +
 +		   HZ * bfqq->service_from_backlogged /
-+		   bfqd->bfq_raising_max_softrt_rate,
++		   bfqd->bfq_wr_max_softrt_rate,
 +		   jiffies + bfqq->bfqd->bfq_slice_idle + 4);
 +}
 +
@@ -2594,7 +3017,7 @@ index 0000000..f5f71e4
 +	 * As above explained, 'punish' slow (i.e., seeky), timed-out
 +	 * and async queues, to favor sequential sync workloads.
 +	 *
-+	 * Processes doing IO in the slower disk zones will tend to be
++	 * Processes doing I/O in the slower disk zones will tend to be
 +	 * slow(er) even if not seeky. Hence, since the estimated peak
 +	 * rate is actually an average over the disk surface, these
 +	 * processes may timeout just for bad luck. To avoid punishing
@@ -2607,19 +3030,31 @@ index 0000000..f5f71e4
 +
 +	bfqq->service_from_backlogged += bfqq->entity.service;
 +
-+	if (bfqd->low_latency && bfqq->raising_coeff == 1)
-+		bfqq->last_rais_start_finish = jiffies;
++	if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++	    !bfq_bfqq_constantly_seeky(bfqq)) {
++		bfq_mark_bfqq_constantly_seeky(bfqq);
++		if (!blk_queue_nonrot(bfqd->queue))
++			bfqd->const_seeky_busy_in_flight_queues++;
++	}
++
++	if (reason == BFQ_BFQQ_TOO_IDLE &&
++	    bfqq->entity.service <= 2 * bfqq->entity.budget / 10 )
++		bfq_clear_bfqq_IO_bound(bfqq);
++
++	if (bfqd->low_latency && bfqq->wr_coeff == 1)
++		bfqq->last_wr_start_finish = jiffies;
 +
-+	if (bfqd->low_latency && bfqd->bfq_raising_max_softrt_rate > 0 &&
++	if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
 +	    RB_EMPTY_ROOT(&bfqq->sort_list)) {
 +		/*
 +		 * If we get here, and there are no outstanding requests,
 +		 * then the request pattern is isochronous (see the comments
-+		 * to the function bfq_bfqq_softrt_next_start()). Hence we can
-+		 * compute soft_rt_next_start. If, instead, the queue still
-+		 * has outstanding requests, then we have to wait for the
-+		 * completion of all the outstanding requests to discover
-+		 * whether the request pattern is actually isochronous.
++		 * to the function bfq_bfqq_softrt_next_start()). Hence we
++		 * can compute soft_rt_next_start. If, instead, the queue
++		 * still has outstanding requests, then we have to wait
++		 * for the completion of all the outstanding requests to
++		 * discover whether the request pattern is actually
++		 * isochronous.
 +		 */
 +		if (bfqq->dispatched == 0)
 +			bfqq->soft_rt_next_start =
@@ -2651,10 +3086,13 @@ index 0000000..f5f71e4
 +	}
 +
 +	bfq_log_bfqq(bfqd, bfqq,
-+		"expire (%d, slow %d, num_disp %d, idle_win %d)", reason, slow,
-+		bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
++		"expire (%d, slow %d, num_disp %d, idle_win %d)", reason,
++		slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
 +
-+	/* Increase, decrease or leave budget unchanged according to reason */
++	/*
++	 * Increase, decrease or leave budget unchanged according to
++	 * reason.
++	 */
 +	__bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
 +	__bfq_bfqq_expire(bfqd, bfqq);
 +}
@@ -2666,12 +3104,9 @@ index 0000000..f5f71e4
 + */
 +static int bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
 +{
-+	if (bfq_bfqq_budget_new(bfqq))
-+		return 0;
-+
-+	if (time_before(jiffies, bfqq->budget_timeout))
++	if (bfq_bfqq_budget_new(bfqq) ||
++	    time_before(jiffies, bfqq->budget_timeout))
 +		return 0;
-+
 +	return 1;
 +}
 +
@@ -2686,7 +3121,7 @@ index 0000000..f5f71e4
 +static inline int bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
 +{
 +	bfq_log_bfqq(bfqq->bfqd, bfqq,
-+		"may_budget_timeout: wr %d left %d timeout %d",
++		"may_budget_timeout: wait_request %d left %d timeout %d",
 +		bfq_bfqq_wait_request(bfqq),
 +			bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3,
 +		bfq_bfqq_budget_timeout(bfqq));
@@ -2698,79 +3133,190 @@ index 0000000..f5f71e4
 +}
 +
 +/*
-+ * For weight-raised queues issuing sync requests, idling is always performed,
-+ * as this is instrumental in guaranteeing a high fraction of the throughput
-+ * to these queues, and hence in guaranteeing a lower latency for their
-+ * requests. See [1] for details.
++ * Device idling is allowed only for the queues for which this function
++ * returns true. For this reason, the return value of this function plays a
++ * critical role for both throughput boosting and service guarantees. The
++ * return value is computed through a logical expression. In this rather
++ * long comment, we try to briefly describe all the details and motivations
++ * behind the components of this logical expression.
++ *
++ * First, the expression is false if bfqq is not sync, or if: bfqq happened
++ * to become active during a large burst of queue activations, and the
++ * pattern of requests bfqq contains boosts the throughput if bfqq is
++ * expired. In fact, queues that became active during a large burst benefit
++ * only from throughput, as discussed in the comments to bfq_handle_burst.
++ * In this respect, expiring bfqq certainly boosts the throughput on NCQ-
++ * capable flash-based devices, whereas, on rotational devices, it boosts
++ * the throughput only if bfqq contains random requests.
++ *
++ * On the opposite end, if (a) bfqq is sync, (b) the above burst-related
++ * condition does not hold, and (c) bfqq is being weight-raised, then the
++ * expression always evaluates to true, as device idling is instrumental
++ * for preserving low-latency guarantees (see [1]). If, instead, conditions
++ * (a) and (b) do hold, but (c) does not, then the expression evaluates to
++ * true only if: (1) bfqq is I/O-bound and has a non-null idle window, and
++ * (2) at least one of the following two conditions holds.
++ * The first condition is that the device is not performing NCQ, because
++ * idling the device most certainly boosts the throughput if this condition
++ * holds and bfqq is I/O-bound and has been granted a non-null idle window.
++ * The second compound condition is made of the logical AND of two components.
++ *
++ * The first component is true only if there is no weight-raised busy
++ * queue. This guarantees that the device is not idled for a sync non-
++ * weight-raised queue when there are busy weight-raised queues. Such a
++ * non-weight-raised queue is then expired immediately if empty. Combined
++ * with the timestamping
++ * rules of BFQ (see [1] for details), this causes sync non-weight-raised
++ * queues to get a lower number of requests served, and hence to ask for a
++ * lower number of requests from the request pool, before the busy weight-
++ * raised queues get served again.
++ *
++ * This is beneficial for the processes associated with weight-raised
++ * queues, when the request pool is saturated (e.g., in the presence of
++ * write hogs). In fact, if the processes associated with the other queues
++ * ask for requests at a lower rate, then weight-raised processes have a
++ * higher probability to get a request from the pool immediately (or at
++ * least soon) when they need one. Hence they have a higher probability to
++ * actually get a fraction of the disk throughput proportional to their
++ * high weight. This is especially true with NCQ-capable drives, which
++ * enqueue several requests in advance and further reorder internally-
++ * queued requests.
++ *
++ * In the end, mistreating non-weight-raised queues when there are busy
++ * weight-raised queues seems to mitigate starvation problems in the
++ * presence of heavy write workloads and NCQ, and hence to guarantee a
++ * higher application and system responsiveness in these hostile scenarios.
++ *
++ * If the first component of the compound condition is instead true, i.e.,
++ * there is no weight-raised busy queue, then the second component of the
++ * compound condition takes into account service-guarantee and throughput
++ * issues related to NCQ (recall that the compound condition is evaluated
++ * only if the device is detected as supporting NCQ).
++ *
++ * As for service guarantees, allowing the drive to enqueue more than one
++ * request at a time, and hence delegating de facto final scheduling
++ * decisions to the drive's internal scheduler, causes loss of control on
++ * the actual request service order. In this respect, when the drive is
++ * allowed to enqueue more than one request at a time, the service
++ * distribution enforced by the drive's internal scheduler is likely to
++ * coincide with the desired device-throughput distribution only in the
++ * following, perfectly symmetric, scenario:
++ * 1) all active queues have the same weight,
++ * 2) all active groups at the same level in the groups tree have the same
++ *    weight,
++ * 3) all active groups at the same level in the groups tree have the same
++ *    number of children.
++ *
++ * Even in such a scenario, sequential I/O may still receive a preferential
++ * treatment, but this is not likely to be a big issue with flash-based
++ * devices, because of their non-dramatic loss of throughput with random
++ * I/O. Things do differ with HDDs, for which additional care is taken, as
++ * explained after completing the discussion for flash-based devices.
 + *
-+ * For non-weight-raised queues, idling is instead disabled if the device is
-+ * NCQ-enabled and non-rotational, as this boosts the throughput on such
-+ * devices.
++ * Unfortunately, keeping the necessary state for evaluating exactly the
++ * above symmetry conditions would be quite complex and time-consuming.
++ * Therefore BFQ evaluates instead the following stronger sub-conditions,
++ * for which it is much easier to maintain the needed state:
++ * 1) all active queues have the same weight,
++ * 2) all active groups have the same weight,
++ * 3) all active groups have at most one active child each.
++ * In particular, the last two conditions are always true if hierarchical
++ * support and the cgroups interface are not enabled, hence no state needs
++ * to be maintained in this case.
++ *
++ * According to the above considerations, the second component of the
++ * compound condition evaluates to true if any of the above symmetry
++ * sub-conditions does not hold, or the device is not flash-based. Therefore,
++ * if also the first component is true, then idling is allowed for a sync
++ * queue. These are the only sub-conditions considered if the device is
++ * flash-based, as, for such a device, it is sensible to force idling only
++ * for service-guarantee issues. In fact, as for throughput, idling
++ * NCQ-capable flash-based devices would not boost the throughput even
++ * with sequential I/O; rather it would lower the throughput in proportion
++ * to how fast the device is. In the end, (only) if all the three
++ * sub-conditions hold and the device is flash-based, the compound
++ * condition evaluates to false and therefore no idling is performed.
++ *
++ * As already said, things change with a rotational device, where idling
++ * boosts the throughput with sequential I/O (even with NCQ). Hence, for
++ * such a device the second component of the compound condition evaluates
++ * to true also if the following additional sub-condition does not hold:
++ * the queue is constantly seeky. Unfortunately, this different behavior
++ * with respect to flash-based devices causes an additional asymmetry: if
++ * some sync queues enjoy idling and some other sync queues do not, then
++ * the latter get a low share of the device throughput, simply because the
++ * former get many requests served after being set as in service, whereas
++ * the latter do not. As a consequence, to guarantee the desired throughput
++ * distribution, on HDDs the compound expression evaluates to true (and
++ * hence device idling is performed) also if the following last symmetry
++ * condition does not hold: no other queue is benefiting from idling. Also
++ * this last condition is actually replaced with a simpler-to-maintain and
++ * stronger condition: there is no busy queue which is not constantly seeky
++ * (and hence may also benefit from idling).
++ *
++ * To sum up, when all the required symmetry and throughput-boosting
++ * sub-conditions hold, the second component of the compound condition
++ * evaluates to false, and hence no idling is performed. This helps to
++ * keep the drives' internal queues full on NCQ-capable devices, and hence
++ * to boost the throughput, without causing 'almost' any loss of service
++ * guarantees. The 'almost' follows from the fact that, if the internal
++ * queue of one such device is filled while all the sub-conditions hold,
++ * but at some point in time some sub-condition ceases to hold, then it may
++ * become impossible to let requests be served in the new desired order
++ * until all the requests already queued in the device have been served.
 + */
 +static inline bool bfq_bfqq_must_not_expire(struct bfq_queue *bfqq)
 +{
 +	struct bfq_data *bfqd = bfqq->bfqd;
++#ifdef CONFIG_CGROUP_BFQIO
++#define symmetric_scenario	  (!bfqd->active_numerous_groups && \
++				   !bfq_differentiated_weights(bfqd))
++#else
++#define symmetric_scenario	  (!bfq_differentiated_weights(bfqd))
++#endif
++#define cond_for_seeky_on_ncq_hdd (bfq_bfqq_constantly_seeky(bfqq) && \
++				   bfqd->busy_in_flight_queues == \
++				   bfqd->const_seeky_busy_in_flight_queues)
++
++#define cond_for_expiring_in_burst	(bfq_bfqq_in_large_burst(bfqq) && \
++					 bfqd->hw_tag && \
++					 (blk_queue_nonrot(bfqd->queue) || \
++					  bfq_bfqq_constantly_seeky(bfqq)))
 +
-+	return bfq_bfqq_sync(bfqq) && (
-+		bfqq->raising_coeff > 1 ||
-+		(bfq_bfqq_idle_window(bfqq) &&
-+		 !(bfqd->hw_tag &&
-+		   (blk_queue_nonrot(bfqd->queue) ||
-+		 /*
-+		  * If there are weight-raised busy queues, then do not idle
-+		  * the disk for a sync non-weight-raised queue, and hence
-+		  * expire the queue immediately if empty. Combined with the
-+		  * timestamping rules of BFQ (see [1] for details), this
-+		  * causes sync non-weight-raised queues to get a lower
-+		  * fraction of the disk throughput, and hence reduces the rate
-+		  * at which the processes associated to these queues ask for
-+		  * requests from the request pool.
-+		  *
-+		  * This is beneficial for weight-raised processes, when the
-+		  * system operates in request-pool saturation conditions
-+		  * (e.g., in the presence of write hogs). In fact, if
-+		  * non-weight-raised processes ask for requests at a lower
-+		  * rate, then weight-raised processes have a higher
-+		  * probability to get a request from the pool immediately
-+		  * (or at least soon) when they need one. Hence they have a
-+		  * higher probability to actually get a fraction of the disk
-+		  * throughput proportional to their high weight. This is
-+		  * especially true with NCQ-enabled drives, which enqueue
-+		  * several requests in advance and further reorder
-+		  * internally-queued requests.
-+		  *
-+		  * Mistreating non-weight-raised queues in the above-described
-+		  * way, when there are busy weight-raised queues, seems to
-+		  * mitigate starvation problems in the presence of heavy write
-+		  * workloads and NCQ, and hence to guarantee a higher
-+		  * application and system responsiveness in these hostile
-+		  * scenarios.
-+		  */
-+		    bfqd->raised_busy_queues > 0)
-+		  )
-+		)
++/*
++ * Condition for expiring a non-weight-raised queue (and hence not idling
++ * the device).
++ */
++#define cond_for_expiring_non_wr  (bfqd->hw_tag && \
++				   (bfqd->wr_busy_queues > 0 || \
++				    (symmetric_scenario && \
++				     (blk_queue_nonrot(bfqd->queue) || \
++				      cond_for_seeky_on_ncq_hdd))))
++
++	return bfq_bfqq_sync(bfqq) &&
++		!cond_for_expiring_in_burst &&
++		(bfqq->wr_coeff > 1 ||
++		 (bfq_bfqq_IO_bound(bfqq) && bfq_bfqq_idle_window(bfqq) &&
++		  !cond_for_expiring_non_wr)
 +	);
 +}
 +
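The macros above are a direct encoding of the logical expression walked through
in the long comment. The following self-contained sketch, separate from the
patch, restates that expression with every sub-condition reduced to a plain
flag; the kernel of course derives these flags from bfqd/bfqq state rather
than taking them as inputs.

/*
 * Simplified model of the idling decision: same boolean structure as the
 * code above, with every input reduced to a flag. Purely illustrative.
 */
#include <stdbool.h>
#include <stdio.h>

struct idling_inputs {
	bool sync, in_large_burst, constantly_seeky, io_bound, idle_window;
	bool weight_raised;		/* wr_coeff > 1 */
	bool hw_tag, nonrot, wr_busy_queues, symmetric_scenario;
	bool all_busy_const_seeky;	/* busy == constantly-seeky busy */
};

static bool must_not_expire(const struct idling_inputs *i)
{
	bool seeky_on_ncq_hdd = i->constantly_seeky && i->all_busy_const_seeky;
	bool expiring_in_burst = i->in_large_burst && i->hw_tag &&
				 (i->nonrot || i->constantly_seeky);
	bool expiring_non_wr = i->hw_tag &&
			       (i->wr_busy_queues ||
				(i->symmetric_scenario &&
				 (i->nonrot || seeky_on_ncq_hdd)));

	return i->sync && !expiring_in_burst &&
	       (i->weight_raised ||
		(i->io_bound && i->idle_window && !expiring_non_wr));
}

int main(void)
{
	/*
	 * Sync, I/O-bound queue on an NCQ-capable flash device in a
	 * symmetric scenario: idling buys nothing, so expiring is allowed.
	 */
	struct idling_inputs in = {
		.sync = true, .io_bound = true, .idle_window = true,
		.hw_tag = true, .nonrot = true, .symmetric_scenario = true,
	};

	printf("flash, symmetric: must_not_expire = %d\n",
	       must_not_expire(&in));

	in.weight_raised = true;	/* idling preserved for wr queues */
	printf("weight-raised:    must_not_expire = %d\n",
	       must_not_expire(&in));
	return 0;
}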
 +/*
-+ * If the in-service queue is empty, but it is sync and either of the following
-+ * conditions holds, then: 1) the queue must remain in service and cannot be
-+ * expired, and 2) the disk must be idled to wait for the possible arrival
-+ * of a new request for the queue. The conditions are:
-+ * - the device is rotational and not performing NCQ, and the queue has its
-+ *   idle window set (in this case, waiting for a new request for the queue
-+ *   is likely to boost the disk throughput);
-+ * - the queue is weight-raised (waiting for the request is necessary to
-+ *   provide the queue with fairness and latency guarantees, see [1] for
-+ *   details).
++ * If the in-service queue is empty but sync, and the function
++ * bfq_bfqq_must_not_expire returns true, then:
++ * 1) the queue must remain in service and cannot be expired, and
++ * 2) the disk must be idled to wait for the possible arrival of a new
++ *    request for the queue.
++ * See the comments to the function bfq_bfqq_must_not_expire for the reasons
++ * why performing device idling is the best choice to boost the throughput
++ * and preserve service guarantees when bfq_bfqq_must_not_expire itself
++ * returns true.
 + */
 +static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
 +{
 +	struct bfq_data *bfqd = bfqq->bfqd;
 +
 +	return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
-+	       bfq_bfqq_must_not_expire(bfqq) &&
-+	       !bfq_queue_nonrot_noidle(bfqd, bfqq);
++	       bfq_bfqq_must_not_expire(bfqq);
 +}
 +
 +/*
@@ -2817,16 +3363,18 @@ index 0000000..f5f71e4
 +			goto expire;
 +		} else {
 +			/*
-+			 * The idle timer may be pending because we may not
-+			 * disable disk idling even when a new request arrives
++			 * The idle timer may be pending because we may
++			 * not disable disk idling even when a new request
++			 * arrives.
 +			 */
 +			if (timer_pending(&bfqd->idle_slice_timer)) {
 +				/*
 +				 * If we get here: 1) at least a new request
 +				 * has arrived but we have not disabled the
 +				 * timer because the request was too small,
-+				 * 2) then the block layer has unplugged the
-+				 * device, causing the dispatch to be invoked.
++				 * 2) then the block layer has unplugged
++				 * the device, causing the dispatch to be
++				 * invoked.
 +				 *
 +				 * Since the device is unplugged, now the
 +				 * requests are probably large enough to
@@ -2844,9 +3392,9 @@ index 0000000..f5f71e4
 +	}
 +
 +	/*
-+	 * No requests pending.  If the in-service queue has no cooperator and
-+	 * still has requests in flight (possibly waiting for a completion)
-+	 * or is idling for a new request, then keep it.
++	 * No requests pending.  If the in-service queue still has requests
++	 * in flight (possibly waiting for a completion) or is idling for a
++	 * new request, then keep it.
 +	 */
 +	if (new_bfqq == NULL && (timer_pending(&bfqd->idle_slice_timer) ||
 +	    (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq)))) {
@@ -2872,40 +3420,38 @@ index 0000000..f5f71e4
 +	return bfqq;
 +}
 +
-+static void bfq_update_raising_data(struct bfq_data *bfqd,
-+				    struct bfq_queue *bfqq)
++static void bfq_update_wr_data(struct bfq_data *bfqd,
++			       struct bfq_queue *bfqq)
 +{
-+	if (bfqq->raising_coeff > 1) { /* queue is being boosted */
++	if (bfqq->wr_coeff > 1) { /* queue is being boosted */
 +		struct bfq_entity *entity = &bfqq->entity;
 +
 +		bfq_log_bfqq(bfqd, bfqq,
-+			"raising period dur %u/%u msec, "
-+			"old raising coeff %u, w %d(%d)",
++			"raising period dur %u/%u msec, old coeff %u, w %d(%d)",
 +			jiffies_to_msecs(jiffies -
-+				bfqq->last_rais_start_finish),
-+			jiffies_to_msecs(bfqq->raising_cur_max_time),
-+			bfqq->raising_coeff,
++				bfqq->last_wr_start_finish),
++			jiffies_to_msecs(bfqq->wr_cur_max_time),
++			bfqq->wr_coeff,
 +			bfqq->entity.weight, bfqq->entity.orig_weight);
 +
 +		BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
-+			entity->orig_weight * bfqq->raising_coeff);
++		       entity->orig_weight * bfqq->wr_coeff);
 +		if (entity->ioprio_changed)
-+			bfq_log_bfqq(bfqd, bfqq,
-+			"WARN: pending prio change");
++			bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
 +		/*
-+		 * If too much time has elapsed from the beginning
-+		 * of this weight-raising, stop it.
++		 * If the queue was activated in a burst, or
++		 * too much time has elapsed from the beginning
++		 * of this weight-raising, then end weight raising.
 +		 */
-+		if (time_is_before_jiffies(bfqq->last_rais_start_finish +
-+					   bfqq->raising_cur_max_time)) {
-+			bfqq->last_rais_start_finish = jiffies;
++		if (bfq_bfqq_in_large_burst(bfqq) ||
++		    time_is_before_jiffies(bfqq->last_wr_start_finish +
++					   bfqq->wr_cur_max_time)) {
++			bfqq->last_wr_start_finish = jiffies;
 +			bfq_log_bfqq(bfqd, bfqq,
-+				     "wrais ending at %lu, "
-+				     "rais_max_time %u",
-+				     bfqq->last_rais_start_finish,
-+				     jiffies_to_msecs(bfqq->
-+					raising_cur_max_time));
-+			bfq_bfqq_end_raising(bfqq);
++				     "wrais ending at %lu, rais_max_time %u",
++				     bfqq->last_wr_start_finish,
++				     jiffies_to_msecs(bfqq->wr_cur_max_time));
++			bfq_bfqq_end_wr(bfqq);
 +			__bfq_entity_update_weight_prio(
 +				bfq_entity_service_tree(entity),
 +				entity);
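The end-of-raising test above relies on the kernel's wrap-safe jiffies helpers:
time_is_before_jiffies(x) is time_after(jiffies, x), which compares a signed
difference so that counter wraparound is harmless. A minimal userspace model of
that check, with invented values, follows; it is not part of the patch.

/*
 * Model of the wrap-safe comparison that ends weight raising. The helper
 * bodies mirror the standard jiffies macros; the numbers are invented.
 */
#include <stdio.h>

static unsigned long jiffies;

static int time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;	/* true if a is later than b */
}

static int time_is_before_jiffies(unsigned long x)
{
	return time_after(jiffies, x);
}

int main(void)
{
	unsigned long last_wr_start_finish = 5000;
	unsigned long wr_cur_max_time = 7500;	/* invented duration */

	jiffies = 12000;	/* still inside the raising period */
	printf("at %lu: end weight raising? %d\n", jiffies,
	       time_is_before_jiffies(last_wr_start_finish +
				      wr_cur_max_time));

	jiffies = 13000;	/* period has elapsed */
	printf("at %lu: end weight raising? %d\n", jiffies,
	       time_is_before_jiffies(last_wr_start_finish +
				      wr_cur_max_time));
	return 0;
}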
@@ -2934,20 +3480,18 @@ index 0000000..f5f71e4
 +
 +	if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
 +		/*
-+		 * This may happen if the next rq is chosen
-+		 * in fifo order instead of sector order.
-+		 * The budget is properly dimensioned
-+		 * to be always sufficient to serve the next request
-+		 * only if it is chosen in sector order. The reason is
-+		 * that it would be quite inefficient and little useful
-+		 * to always make sure that the budget is large enough
-+		 * to serve even the possible next rq in fifo order.
++		 * This may happen if the next rq is chosen in fifo order
++		 * instead of sector order. The budget is properly
++		 * dimensioned to be always sufficient to serve the next
++		 * request only if it is chosen in sector order. The reason
++		 * is that it would be quite inefficient and little useful
++		 * to always make sure that the budget is large enough to
++		 * serve even the possible next rq in fifo order.
 +		 * In fact, requests are seldom served in fifo order.
 +		 *
-+		 * Expire the queue for budget exhaustion, and
-+		 * make sure that the next act_budget is enough
-+		 * to serve the next request, even if it comes
-+		 * from the fifo expired path.
++		 * Expire the queue for budget exhaustion, and make sure
++		 * that the next act_budget is enough to serve the next
++		 * request, even if it comes from the fifo expired path.
 +		 */
 +		bfqq->next_rq = rq;
 +		/*
@@ -2963,7 +3507,7 @@ index 0000000..f5f71e4
 +	bfq_bfqq_served(bfqq, service_to_charge);
 +	bfq_dispatch_insert(bfqd->queue, rq);
 +
-+	bfq_update_raising_data(bfqd, bfqq);
++	bfq_update_wr_data(bfqd, bfqq);
 +
 +	bfq_log_bfqq(bfqd, bfqq,
 +			"dispatched %u sec req (%llu), budg left %lu",
@@ -3004,8 +3548,8 @@ index 0000000..f5f71e4
 +}
 +
 +/*
-+ * Drain our current requests.  Used for barriers and when switching
-+ * io schedulers on-the-fly.
++ * Drain our current requests.
++ * Used for barriers and when switching io schedulers on-the-fly.
 + */
 +static int bfq_forced_dispatch(struct bfq_data *bfqd)
 +{
@@ -3105,6 +3649,17 @@ index 0000000..f5f71e4
 +	BUG_ON(bfq_bfqq_busy(bfqq));
 +	BUG_ON(bfqd->in_service_queue == bfqq);
 +
++	if (bfq_bfqq_sync(bfqq))
++		/*
++		 * The fact that this queue is being destroyed does not
++		 * invalidate the fact that this queue may have been
++		 * activated during the current burst. As a consequence,
++		 * although the queue no longer exists, and hence
++		 * needs to be removed from the burst list if present,
++		 * the burst size must not be decremented.
++		 */
++		hlist_del_init(&bfqq->burst_list_node);
++
 +	bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
 +
 +	kmem_cache_free(bfq_pool, bfqq);
@@ -3121,10 +3676,8 @@ index 0000000..f5f71e4
 +	 */
 +	__bfqq = bfqq->new_bfqq;
 +	while (__bfqq) {
-+		if (__bfqq == bfqq) {
-+			WARN(1, "bfqq->new_bfqq loop detected.\n");
++		if (__bfqq == bfqq)
 +			break;
-+		}
 +		next = __bfqq->new_bfqq;
 +		bfq_put_queue(__bfqq);
 +		__bfqq = next;
@@ -3146,7 +3699,7 @@ index 0000000..f5f71e4
 +	bfq_put_queue(bfqq);
 +}
 +
-+static void bfq_init_icq(struct io_cq *icq)
++static inline void bfq_init_icq(struct io_cq *icq)
 +{
 +	struct bfq_io_cq *bic = icq_to_bic(icq);
 +
@@ -3185,7 +3738,7 @@ index 0000000..f5f71e4
 +	switch (ioprio_class) {
 +	default:
 +		dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
-+			"bfq: bad prio %x\n", ioprio_class);
++			"bfq: bad prio class %d\n", ioprio_class);
 +	case IOPRIO_CLASS_NONE:
 +		/*
 +		 * No prio set, inherit CPU scheduling settings.
@@ -3208,13 +3761,15 @@ index 0000000..f5f71e4
 +		break;
 +	}
 +
++	if (bfqq->entity.new_ioprio < 0 ||
++	    bfqq->entity.new_ioprio >= IOPRIO_BE_NR) {
++		printk(KERN_CRIT "bfq_init_prio_data: new_ioprio %d\n",
++				 bfqq->entity.new_ioprio);
++		BUG();
++	}
++
 +	bfqq->entity.ioprio_changed = 1;
 +
-+	/*
-+	 * Keep track of original prio settings in case we have to temporarily
-+	 * elevate the priority of this queue.
-+	 */
-+	bfqq->org_ioprio = bfqq->entity.new_ioprio;
 +	bfq_clear_bfqq_prio_changed(bfqq);
 +}
 +
@@ -3229,8 +3784,8 @@ index 0000000..f5f71e4
 +	bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
 +				   &flags);
 +	/*
-+	 * This condition may trigger on a newly created bic, be sure to drop
-+	 * the lock before returning.
++	 * This condition may trigger on a newly created bic, be sure to
++	 * drop the lock before returning.
 +	 */
 +	if (unlikely(bfqd == NULL) || likely(bic->ioprio == ioprio))
 +		goto out;
@@ -3265,6 +3820,7 @@ index 0000000..f5f71e4
 +{
 +	RB_CLEAR_NODE(&bfqq->entity.rb_node);
 +	INIT_LIST_HEAD(&bfqq->fifo);
++	INIT_HLIST_NODE(&bfqq->burst_list_node);
 +
 +	atomic_set(&bfqq->ref, 0);
 +	bfqq->bfqd = bfqd;
@@ -3276,13 +3832,14 @@ index 0000000..f5f71e4
 +			bfq_mark_bfqq_idle_window(bfqq);
 +		bfq_mark_bfqq_sync(bfqq);
 +	}
++	bfq_mark_bfqq_IO_bound(bfqq);
 +
 +	/* Tentative initial value to trade off between thr and lat */
 +	bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
 +	bfqq->pid = pid;
 +
-+	bfqq->raising_coeff = 1;
-+	bfqq->last_rais_start_finish = 0;
++	bfqq->wr_coeff = 1;
++	bfqq->last_wr_start_finish = 0;
 +	/*
 +	 * Set to the value for which bfqq will not be deemed as
 +	 * soft rt when it becomes backlogged.
@@ -3327,14 +3884,13 @@ index 0000000..f5f71e4
 +
 +		if (bfqq != NULL) {
 +			bfq_init_bfqq(bfqd, bfqq, current->pid, is_sync);
++			bfq_init_prio_data(bfqq, bic);
++			bfq_init_entity(&bfqq->entity, bfqg);
 +			bfq_log_bfqq(bfqd, bfqq, "allocated");
 +		} else {
 +			bfqq = &bfqd->oom_bfqq;
 +			bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
 +		}
-+
-+		bfq_init_prio_data(bfqq, bic);
-+		bfq_init_entity(&bfqq->entity, bfqg);
 +	}
 +
 +	if (new_bfqq != NULL)
@@ -3381,7 +3937,8 @@ index 0000000..f5f71e4
 +		bfqq = bfq_find_alloc_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
 +
 +	/*
-+	 * Pin the queue now that it's allocated, scheduler exit will prune it.
++	 * Pin the queue now that it's allocated, scheduler exit will
++	 * prune it.
 +	 */
 +	if (!is_sync && *async_bfqq == NULL) {
 +		atomic_inc(&bfqq->ref);
@@ -3460,11 +4017,11 @@ index 0000000..f5f71e4
 +	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
 +	    bfqd->bfq_slice_idle == 0 ||
 +		(bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
-+			bfqq->raising_coeff == 1))
++			bfqq->wr_coeff == 1))
 +		enable_idle = 0;
 +	else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
 +		if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
-+			bfqq->raising_coeff == 1)
++			bfqq->wr_coeff == 1)
 +			enable_idle = 0;
 +		else
 +			enable_idle = 1;
@@ -3492,6 +4049,13 @@ index 0000000..f5f71e4
 +
 +	bfq_update_io_thinktime(bfqd, bic);
 +	bfq_update_io_seektime(bfqd, bfqq, rq);
++	if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
++		bfq_clear_bfqq_constantly_seeky(bfqq);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
++			bfqd->const_seeky_busy_in_flight_queues--;
++		}
++	}
 +	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
 +	    !BFQQ_SEEKY(bfqq))
 +		bfq_update_idle_window(bfqd, bfqq, bic);
@@ -3560,7 +4124,7 @@ index 0000000..f5f71e4
 +	assert_spin_locked(bfqd->queue->queue_lock);
 +	bfq_init_prio_data(bfqq, RQ_BIC(rq));
 +
-+	bfq_add_rq_rb(rq);
++	bfq_add_request(rq);
 +
 +	rq_set_fifo_time(rq, jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)]);
 +	list_add_tail(&rq->queuelist, &bfqq->fifo);
@@ -3597,23 +4161,36 @@ index 0000000..f5f71e4
 +{
 +	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 +	struct bfq_data *bfqd = bfqq->bfqd;
-+	const int sync = rq_is_sync(rq);
++	bool sync = bfq_bfqq_sync(bfqq);
 +
-+	bfq_log_bfqq(bfqd, bfqq, "completed %u sects req (%d)",
-+			blk_rq_sectors(rq), sync);
++	bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
++		     blk_rq_sectors(rq), sync);
 +
 +	bfq_update_hw_tag(bfqd);
 +
-+	WARN_ON(!bfqd->rq_in_driver);
-+	WARN_ON(!bfqq->dispatched);
++	BUG_ON(!bfqd->rq_in_driver);
++	BUG_ON(!bfqq->dispatched);
 +	bfqd->rq_in_driver--;
 +	bfqq->dispatched--;
 +
-+	if (bfq_bfqq_sync(bfqq))
-+		bfqd->sync_flight--;
++	if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
++		bfq_weights_tree_remove(bfqd, &bfqq->entity,
++					&bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->busy_in_flight_queues);
++			bfqd->busy_in_flight_queues--;
++			if (bfq_bfqq_constantly_seeky(bfqq)) {
++				BUG_ON(!bfqd->
++					const_seeky_busy_in_flight_queues);
++				bfqd->const_seeky_busy_in_flight_queues--;
++			}
++		}
++	}
 +
-+	if (sync)
++	if (sync) {
++		bfqd->sync_flight--;
 +		RQ_BIC(rq)->ttime.last_end_request = jiffies;
++	}
 +
 +	/*
 +	 * If we are waiting to discover whether the request pattern of the
@@ -3673,9 +4250,9 @@ index 0000000..f5f71e4
 +
 +	/*
 +	 * Don't force setup of a queue from here, as a call to may_queue
-+	 * does not necessarily imply that a request actually will be queued.
-+	 * So just lookup a possibly existing queue, or return 'may queue'
-+	 * if that fails.
++	 * does not necessarily imply that a request actually will be
++	 * queued. So just lookup a possibly existing queue, or return
++	 * 'may queue' if that fails.
 +	 */
 +	bic = bfq_bic_lookup(bfqd, tsk->io_context);
 +	if (bic == NULL)
@@ -3844,12 +4421,12 @@ index 0000000..f5f71e4
 +
 +	bfqq = bfqd->in_service_queue;
 +	/*
-+	 * Theoretical race here: the in-service queue can be NULL or different
-+	 * from the queue that was idling if the timer handler spins on
-+	 * the queue_lock and a new request arrives for the current
-+	 * queue and there is a full dispatch cycle that changes the
-+	 * in-service queue.  This can hardly happen, but in the worst case
-+	 * we just expire a queue too early.
++	 * Theoretical race here: the in-service queue can be NULL or
++	 * different from the queue that was idling if the timer handler
++	 * spins on the queue_lock and a new request arrives for the
++	 * current queue and there is a full dispatch cycle that changes
++	 * the in-service queue.  This can hardly happen, but in the worst
++	 * case we just expire a queue too early.
 +	 */
 +	if (bfqq != NULL) {
 +		bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
@@ -3863,9 +4440,9 @@ index 0000000..f5f71e4
 +		else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
 +			/*
 +			 * The queue may not be empty upon timer expiration,
-+			 * because we may not disable the timer when the first
-+			 * request of the in-service queue arrives during
-+			 * disk idling
++			 * because we may not disable the timer when the
++			 * first request of the in-service queue arrives
++			 * during disk idling.
 +			 */
 +			reason = BFQ_BFQQ_TOO_IDLE;
 +		else
@@ -3970,6 +4547,14 @@ index 0000000..f5f71e4
 +	 */
 +	bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, 1, 0);
 +	atomic_inc(&bfqd->oom_bfqq.ref);
++	bfqd->oom_bfqq.entity.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
++	bfqd->oom_bfqq.entity.new_ioprio_class = IOPRIO_CLASS_BE;
++	/*
++	 * Trigger weight initialization, according to ioprio, at the
++	 * oom_bfqq's first activation. The oom_bfqq's ioprio and ioprio
++	 * class won't be changed any more.
++	 */
++	bfqd->oom_bfqq.entity.ioprio_changed = 1;
 +
 +	bfqd->queue = q;
 +
@@ -3985,17 +4570,24 @@ index 0000000..f5f71e4
 +	}
 +
 +	bfqd->root_group = bfqg;
++	bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
++#ifdef CONFIG_CGROUP_BFQIO
++	bfqd->active_numerous_groups = 0;
++#endif
 +
 +	init_timer(&bfqd->idle_slice_timer);
 +	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
 +	bfqd->idle_slice_timer.data = (unsigned long)bfqd;
 +
 +	bfqd->rq_pos_tree = RB_ROOT;
++	bfqd->queue_weights_tree = RB_ROOT;
++	bfqd->group_weights_tree = RB_ROOT;
 +
 +	INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
 +
 +	INIT_LIST_HEAD(&bfqd->active_list);
 +	INIT_LIST_HEAD(&bfqd->idle_list);
++	INIT_HLIST_HEAD(&bfqd->burst_list);
 +
 +	bfqd->hw_tag = -1;
 +
@@ -4012,29 +4604,38 @@ index 0000000..f5f71e4
 +	bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
 +	bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
 +
++	bfqd->bfq_coop_thresh = 2;
++	bfqd->bfq_failed_cooperations = 7000;
++	bfqd->bfq_requests_within_timer = 120;
++
++	bfqd->bfq_large_burst_thresh = 11;
++	bfqd->bfq_burst_interval = msecs_to_jiffies(500);
++
 +	bfqd->low_latency = true;
 +
-+	bfqd->bfq_raising_coeff = 20;
-+	bfqd->bfq_raising_rt_max_time = msecs_to_jiffies(300);
-+	bfqd->bfq_raising_max_time = 0;
-+	bfqd->bfq_raising_min_idle_time = msecs_to_jiffies(2000);
-+	bfqd->bfq_raising_min_inter_arr_async = msecs_to_jiffies(500);
-+	bfqd->bfq_raising_max_softrt_rate = 7000; /*
-+						   * Approximate rate required
-+						   * to playback or record a
-+						   * high-definition compressed
-+						   * video.
-+						   */
-+	bfqd->raised_busy_queues = 0;
-+
-+	/* Initially estimate the device's peak rate as the reference rate */
-+	if (blk_queue_nonrot(bfqd->queue)) {
-+		bfqd->RT_prod = R_nonrot * T_nonrot;
-+		bfqd->peak_rate = R_nonrot;
-+	} else {
-+		bfqd->RT_prod = R_rot * T_rot;
-+		bfqd->peak_rate = R_rot;
-+	}
++	bfqd->bfq_wr_coeff = 20;
++	bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
++	bfqd->bfq_wr_max_time = 0;
++	bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
++	bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
++	bfqd->bfq_wr_max_softrt_rate = 7000; /*
++					      * Approximate rate required
++					      * to playback or record a
++					      * high-definition compressed
++					      * video.
++					      */
++	bfqd->wr_busy_queues = 0;
++	bfqd->busy_in_flight_queues = 0;
++	bfqd->const_seeky_busy_in_flight_queues = 0;
++
++	/*
++	 * Begin by assuming, optimistically, that the device peak rate is
++	 * equal to the highest reference rate.
++	 */
++	bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
++			T_fast[blk_queue_nonrot(bfqd->queue)];
++	bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
++	bfqd->device_speed = BFQ_BFQD_FAST;
 +
 +	return 0;
 +}
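The 7000 above is expressed in sectors per second. Assuming the block layer's
512-byte sector unit, a quick conversion (a sketch, not part of the patch)
shows why the comment equates it with high-definition compressed video:

/* Convert the default bfq_wr_max_softrt_rate into more familiar units,
 * assuming 512-byte block-layer sectors. */
#include <stdio.h>

int main(void)
{
	unsigned long sectors_per_sec = 7000;	/* default set above */
	unsigned long bytes_per_sec = sectors_per_sec * 512;

	printf("%lu sectors/s = %lu B/s (~%.1f MiB/s, ~%.1f Mbit/s)\n",
	       sectors_per_sec, bytes_per_sec,
	       bytes_per_sec / (1024.0 * 1024.0),
	       bytes_per_sec * 8 / 1e6);
	return 0;
}

That is roughly 28 Mbit/s, in the ballpark of high-bitrate HD streams, so only
applications doing I/O at no more than about that rate can pass the soft
real-time detection described earlier.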
@@ -4058,7 +4659,8 @@ index 0000000..f5f71e4
 +	return sprintf(page, "%d\n", var);
 +}
 +
-+static ssize_t bfq_var_store(unsigned long *var, const char *page, size_t count)
++static ssize_t bfq_var_store(unsigned long *var, const char *page,
++			     size_t count)
 +{
 +	unsigned long new_val;
 +	int ret = kstrtoul(page, 10, &new_val);
@@ -4069,12 +4671,12 @@ index 0000000..f5f71e4
 +	return count;
 +}
 +
-+static ssize_t bfq_raising_max_time_show(struct elevator_queue *e, char *page)
++static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
 +{
 +	struct bfq_data *bfqd = e->elevator_data;
-+	return sprintf(page, "%d\n", bfqd->bfq_raising_max_time > 0 ?
-+		       jiffies_to_msecs(bfqd->bfq_raising_max_time) :
-+		       jiffies_to_msecs(bfq_wrais_duration(bfqd)));
++	return sprintf(page, "%d\n", bfqd->bfq_wr_max_time > 0 ?
++		       jiffies_to_msecs(bfqd->bfq_wr_max_time) :
++		       jiffies_to_msecs(bfq_wr_duration(bfqd)));
 +}
 +
 +static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
@@ -4091,15 +4693,13 @@ index 0000000..f5f71e4
 +	num_char += sprintf(page + num_char, "Active:\n");
 +	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
 +	  num_char += sprintf(page + num_char,
-+			      "pid%d: weight %hu, nr_queued %d %d,"
-+			      " dur %d/%u\n",
++			      "pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
 +			      bfqq->pid,
 +			      bfqq->entity.weight,
 +			      bfqq->queued[0],
 +			      bfqq->queued[1],
-+			jiffies_to_msecs(jiffies -
-+				bfqq->last_rais_start_finish),
-+			jiffies_to_msecs(bfqq->raising_cur_max_time));
++			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
++			jiffies_to_msecs(bfqq->wr_cur_max_time));
 +	}
 +
 +	num_char += sprintf(page + num_char, "Idle:\n");
@@ -4109,8 +4709,8 @@ index 0000000..f5f71e4
 +				bfqq->pid,
 +				bfqq->entity.weight,
 +				jiffies_to_msecs(jiffies -
-+					bfqq->last_rais_start_finish),
-+				jiffies_to_msecs(bfqq->raising_cur_max_time));
++					bfqq->last_wr_start_finish),
++				jiffies_to_msecs(bfqq->wr_cur_max_time));
 +	}
 +
 +	spin_unlock_irq(bfqd->queue->queue_lock);
@@ -4134,19 +4734,17 @@ index 0000000..f5f71e4
 +SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
 +SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
 +SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
-+SHOW_FUNCTION(bfq_max_budget_async_rq_show, bfqd->bfq_max_budget_async_rq, 0);
++SHOW_FUNCTION(bfq_max_budget_async_rq_show,
++	      bfqd->bfq_max_budget_async_rq, 0);
 +SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
 +SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
 +SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
-+SHOW_FUNCTION(bfq_raising_coeff_show, bfqd->bfq_raising_coeff, 0);
-+SHOW_FUNCTION(bfq_raising_rt_max_time_show, bfqd->bfq_raising_rt_max_time, 1);
-+SHOW_FUNCTION(bfq_raising_min_idle_time_show, bfqd->bfq_raising_min_idle_time,
++SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
++SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
++SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
++SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
 +	1);
-+SHOW_FUNCTION(bfq_raising_min_inter_arr_async_show,
-+	bfqd->bfq_raising_min_inter_arr_async,
-+	1);
-+SHOW_FUNCTION(bfq_raising_max_softrt_rate_show,
-+	bfqd->bfq_raising_max_softrt_rate, 0);
++SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
 +#undef SHOW_FUNCTION
 +
 +#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV)			\
@@ -4179,18 +4777,16 @@ index 0000000..f5f71e4
 +		1, INT_MAX, 0);
 +STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
 +		INT_MAX, 1);
-+STORE_FUNCTION(bfq_raising_coeff_store, &bfqd->bfq_raising_coeff, 1,
-+		INT_MAX, 0);
-+STORE_FUNCTION(bfq_raising_max_time_store, &bfqd->bfq_raising_max_time, 0,
-+		INT_MAX, 1);
-+STORE_FUNCTION(bfq_raising_rt_max_time_store, &bfqd->bfq_raising_rt_max_time, 0,
++STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
++STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
++		1);
++STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
 +		INT_MAX, 1);
-+STORE_FUNCTION(bfq_raising_min_idle_time_store,
-+	       &bfqd->bfq_raising_min_idle_time, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_raising_min_inter_arr_async_store,
-+		&bfqd->bfq_raising_min_inter_arr_async, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_raising_max_softrt_rate_store,
-+	       &bfqd->bfq_raising_max_softrt_rate, 0, INT_MAX, 0);
++STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
++		&bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
++		INT_MAX, 0);
 +#undef STORE_FUNCTION
 +
 +/* do nothing for the moment */
@@ -4259,7 +4855,7 @@ index 0000000..f5f71e4
 +	if (__data > 1)
 +		__data = 1;
 +	if (__data == 0 && bfqd->low_latency != 0)
-+		bfq_end_raising(bfqd);
++		bfq_end_wr(bfqd);
 +	bfqd->low_latency = __data;
 +
 +	return ret;
@@ -4280,12 +4876,12 @@ index 0000000..f5f71e4
 +	BFQ_ATTR(timeout_sync),
 +	BFQ_ATTR(timeout_async),
 +	BFQ_ATTR(low_latency),
-+	BFQ_ATTR(raising_coeff),
-+	BFQ_ATTR(raising_max_time),
-+	BFQ_ATTR(raising_rt_max_time),
-+	BFQ_ATTR(raising_min_idle_time),
-+	BFQ_ATTR(raising_min_inter_arr_async),
-+	BFQ_ATTR(raising_max_softrt_rate),
++	BFQ_ATTR(wr_coeff),
++	BFQ_ATTR(wr_max_time),
++	BFQ_ATTR(wr_rt_max_time),
++	BFQ_ATTR(wr_min_idle_time),
++	BFQ_ATTR(wr_min_inter_arr_async),
++	BFQ_ATTR(wr_max_softrt_rate),
 +	BFQ_ATTR(weights),
 +	__ATTR_NULL
 +};
@@ -4332,8 +4928,25 @@ index 0000000..f5f71e4
 +	if (bfq_slab_setup())
 +		return -ENOMEM;
 +
++	/*
++	 * Times to load large popular applications for the typical systems
++	 * installed on the reference devices (see the comments before the
++	 * definitions of the two arrays).
++	 */
++	T_slow[0] = msecs_to_jiffies(2600);
++	T_slow[1] = msecs_to_jiffies(1000);
++	T_fast[0] = msecs_to_jiffies(5500);
++	T_fast[1] = msecs_to_jiffies(2000);
++
++	/*
++	 * Thresholds that determine the switch between speed classes (see
++	 * the comments before the definition of the array).
++	 */
++	device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
++	device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
++
 +	elv_register(&iosched_bfq);
-+	pr_info("BFQ I/O-scheduler version: v7r2");
++	pr_info("BFQ I/O-scheduler version: v7r7");
 +
 +	return 0;
 +}
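The thresholds computed above are simply the midpoints between the slow and
fast reference rates of each device class (index 0 rotational, index 1
non-rotational). The sketch below is separate from the patch and uses invented
reference rates, since R_slow[] and R_fast[] are defined outside this hunk; it
also assumes that a measured peak rate at or above the threshold classifies the
device as fast, which is the natural reading of BFQ_BFQD_FAST/SLOW but is not
shown in this hunk.

/*
 * Model of the speed-class threshold set up in bfq_init(). The reference
 * rates are invented and the >= comparison is an assumption; the real
 * update logic lives elsewhere in the scheduler.
 */
#include <stdio.h>

enum bfq_device_speed { BFQ_BFQD_FAST, BFQ_BFQD_SLOW };

int main(void)
{
	/* index 0: rotational, index 1: non-rotational */
	unsigned long R_slow[2] = { 1000, 10000 };	/* invented */
	unsigned long R_fast[2] = { 7000, 70000 };	/* invented */
	unsigned long device_speed_thresh[2];
	unsigned long measured_peak_rate = 3500;	/* invented sample */
	int nonrot = 0;					/* rotational disk */
	enum bfq_device_speed speed;
	int i;

	for (i = 0; i < 2; i++)
		device_speed_thresh[i] = (R_fast[i] + R_slow[i]) / 2;

	speed = measured_peak_rate >= device_speed_thresh[nonrot] ?
		BFQ_BFQD_FAST : BFQ_BFQD_SLOW;

	printf("thresholds: rot %lu, nonrot %lu -> device classified %s\n",
	       device_speed_thresh[0], device_speed_thresh[1],
	       speed == BFQ_BFQD_FAST ? "fast" : "slow");
	return 0;
}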
@@ -4348,12 +4961,13 @@ index 0000000..f5f71e4
 +module_exit(bfq_exit);
 +
 +MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
++MODULE_LICENSE("GPL");
 diff --git a/block/bfq-sched.c b/block/bfq-sched.c
 new file mode 100644
-index 0000000..999b475
+index 0000000..2931563
 --- /dev/null
 +++ b/block/bfq-sched.c
-@@ -0,0 +1,1078 @@
+@@ -0,0 +1,1214 @@
 +/*
 + * BFQ: Hierarchical B-WF2Q+ scheduler.
 + *
@@ -4453,7 +5067,8 @@ index 0000000..999b475
 + * Shift for timestamp calculations.  This actually limits the maximum
 + * service allowed in one timestamp delta (small shift values increase it),
 + * the maximum total weight that can be used for the queues in the system
-+ * (big shift values increase it), and the period of virtual time wraparounds.
++ * (big shift values increase it), and the period of virtual time
++ * wraparounds.
 + */
 +#define WFQ_SERVICE_SHIFT	22
 +
@@ -4685,8 +5300,18 @@ index 0000000..999b475
 +	goto up;
 +}
 +
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++				 struct bfq_entity *entity,
++				 struct rb_root *root);
++
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++				    struct bfq_entity *entity,
++				    struct rb_root *root);
++
++
 +/**
-+ * bfq_active_insert - insert an entity in the active tree of its group/device.
++ * bfq_active_insert - insert an entity in the active tree of its
++ *                     group/device.
 + * @st: the service tree of the entity.
 + * @entity: the entity being inserted.
 + *
@@ -4700,6 +5325,11 @@ index 0000000..999b475
 +{
 +	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
 +	struct rb_node *node = &entity->rb_node;
++#ifdef CONFIG_CGROUP_BFQIO
++	struct bfq_sched_data *sd = NULL;
++	struct bfq_group *bfqg = NULL;
++	struct bfq_data *bfqd = NULL;
++#endif
 +
 +	bfq_insert(&st->active, entity);
 +
@@ -4710,17 +5340,36 @@ index 0000000..999b475
 +
 +	bfq_update_active_tree(node);
 +
++#ifdef CONFIG_CGROUP_BFQIO
++	sd = entity->sched_data;
++	bfqg = container_of(sd, struct bfq_group, sched_data);
++	BUG_ON(!bfqg);
++	bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
 +	if (bfqq != NULL)
 +		list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
++#ifdef CONFIG_CGROUP_BFQIO
++	else { /* bfq_group */
++		BUG_ON(!bfqd);
++		bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
++	}
++	if (bfqg != bfqd->root_group) {
++		BUG_ON(!bfqg);
++		BUG_ON(!bfqd);
++		bfqg->active_entities++;
++		if (bfqg->active_entities == 2)
++			bfqd->active_numerous_groups++;
++	}
++#endif
 +}
 +
 +/**
 + * bfq_ioprio_to_weight - calc a weight from an ioprio.
 + * @ioprio: the ioprio value to convert.
 + */
-+static unsigned short bfq_ioprio_to_weight(int ioprio)
++static inline unsigned short bfq_ioprio_to_weight(int ioprio)
 +{
-+	WARN_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
++	BUG_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
 +	return IOPRIO_BE_NR - ioprio;
 +}
 +
@@ -4732,19 +5381,17 @@ index 0000000..999b475
 + * 0 is used as an escape ioprio value for weights (numerically) equal or
 + * larger than IOPRIO_BE_NR
 + */
-+static unsigned short bfq_weight_to_ioprio(int weight)
++static inline unsigned short bfq_weight_to_ioprio(int weight)
 +{
-+	WARN_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
++	BUG_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
 +	return IOPRIO_BE_NR - weight < 0 ? 0 : IOPRIO_BE_NR - weight;
 +}
 +
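Since ioprio levels and weights are converted back and forth in several places,
a quick standalone check of the two helpers above may be useful. It assumes
IOPRIO_BE_NR == 8, the value used by the kernel, and reimplements the same
arithmetic outside the kernel:

/* Userspace illustration of the ioprio <-> weight mapping used above. */
#include <stdio.h>

#define IOPRIO_BE_NR 8	/* value used by the kernel */

static unsigned short ioprio_to_weight(int ioprio)
{
	return IOPRIO_BE_NR - ioprio;
}

static unsigned short weight_to_ioprio(int weight)
{
	/* Weights >= IOPRIO_BE_NR collapse to the escape ioprio 0. */
	return IOPRIO_BE_NR - weight < 0 ? 0 : IOPRIO_BE_NR - weight;
}

int main(void)
{
	int ioprio;

	for (ioprio = 0; ioprio < IOPRIO_BE_NR; ioprio++)
		printf("ioprio %d -> weight %hu -> ioprio %hu\n",
		       ioprio, ioprio_to_weight(ioprio),
		       weight_to_ioprio(ioprio_to_weight(ioprio)));
	printf("weight 100 (e.g. raised) -> ioprio %hu\n",
	       weight_to_ioprio(100));
	return 0;
}

For instance the oom queue's default ioprio of 4, set in bfq_init_queue()
above, corresponds to weight 4; any raised weight above IOPRIO_BE_NR maps back
to the escape value 0.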
 +static inline void bfq_get_entity(struct bfq_entity *entity)
 +{
 +	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	struct bfq_sched_data *sd;
 +
 +	if (bfqq != NULL) {
-+		sd = entity->sched_data;
 +		atomic_inc(&bfqq->ref);
 +		bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
 +			     bfqq, atomic_read(&bfqq->ref));
@@ -4791,6 +5438,11 @@ index 0000000..999b475
 +{
 +	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
 +	struct rb_node *node;
++#ifdef CONFIG_CGROUP_BFQIO
++	struct bfq_sched_data *sd = NULL;
++	struct bfq_group *bfqg = NULL;
++	struct bfq_data *bfqd = NULL;
++#endif
 +
 +	node = bfq_find_deepest(&entity->rb_node);
 +	bfq_extract(&st->active, entity);
@@ -4798,8 +5450,31 @@ index 0000000..999b475
 +	if (node != NULL)
 +		bfq_update_active_tree(node);
 +
++#ifdef CONFIG_CGROUP_BFQIO
++	sd = entity->sched_data;
++	bfqg = container_of(sd, struct bfq_group, sched_data);
++	BUG_ON(!bfqg);
++	bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
 +	if (bfqq != NULL)
 +		list_del(&bfqq->bfqq_list);
++#ifdef CONFIG_CGROUP_BFQIO
++	else { /* bfq_group */
++		BUG_ON(!bfqd);
++		bfq_weights_tree_remove(bfqd, entity,
++					&bfqd->group_weights_tree);
++	}
++	if (bfqg != bfqd->root_group) {
++		BUG_ON(!bfqg);
++		BUG_ON(!bfqd);
++		BUG_ON(!bfqg->active_entities);
++		bfqg->active_entities--;
++		if (bfqg->active_entities == 1) {
++			BUG_ON(!bfqd->active_numerous_groups);
++			bfqd->active_numerous_groups--;
++		}
++	}
++#endif
 +}
 +
 +/**
@@ -4897,11 +5572,37 @@ index 0000000..999b475
 +
 +	if (entity->ioprio_changed) {
 +		struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++		unsigned short prev_weight, new_weight;
++		struct bfq_data *bfqd = NULL;
++		struct rb_root *root;
++#ifdef CONFIG_CGROUP_BFQIO
++		struct bfq_sched_data *sd;
++		struct bfq_group *bfqg;
++#endif
++
++		if (bfqq != NULL)
++			bfqd = bfqq->bfqd;
++#ifdef CONFIG_CGROUP_BFQIO
++		else {
++			sd = entity->my_sched_data;
++			bfqg = container_of(sd, struct bfq_group, sched_data);
++			BUG_ON(!bfqg);
++			bfqd = (struct bfq_data *)bfqg->bfqd;
++			BUG_ON(!bfqd);
++		}
++#endif
 +
 +		BUG_ON(old_st->wsum < entity->weight);
 +		old_st->wsum -= entity->weight;
 +
 +		if (entity->new_weight != entity->orig_weight) {
++			if (entity->new_weight < BFQ_MIN_WEIGHT ||
++			    entity->new_weight > BFQ_MAX_WEIGHT) {
++				printk(KERN_CRIT "update_weight_prio: "
++						 "new_weight %d\n",
++					entity->new_weight);
++				BUG();
++			}
 +			entity->orig_weight = entity->new_weight;
 +			entity->ioprio =
 +				bfq_weight_to_ioprio(entity->orig_weight);
@@ -4924,8 +5625,31 @@ index 0000000..999b475
 +		 * when entity->finish <= old_st->vtime).
 +		 */
 +		new_st = bfq_entity_service_tree(entity);
-+		entity->weight = entity->orig_weight *
-+			(bfqq != NULL ? bfqq->raising_coeff : 1);
++
++		prev_weight = entity->weight;
++		new_weight = entity->orig_weight *
++			     (bfqq != NULL ? bfqq->wr_coeff : 1);
++		/*
++		 * If the weight of the entity changes, remove the entity
++		 * from its old weight counter (if there is a counter
++		 * associated with the entity), and add it to the counter
++		 * associated with its new weight.
++		 */
++		if (prev_weight != new_weight) {
++			root = bfqq ? &bfqd->queue_weights_tree :
++				      &bfqd->group_weights_tree;
++			bfq_weights_tree_remove(bfqd, entity, root);
++		}
++		entity->weight = new_weight;
++		/*
++		 * Add the entity to its weights tree only if it is
++		 * not associated with a weight-raised queue.
++		 */
++		if (prev_weight != new_weight &&
++		    (bfqq ? bfqq->wr_coeff == 1 : 1))
++			/* If we get here, root has been initialized. */
++			bfq_weights_tree_add(bfqd, entity, root);
++
 +		new_st->wsum += entity->weight;
 +
 +		if (new_st != old_st)
@@ -4936,7 +5660,8 @@ index 0000000..999b475
 +}
 +
 +/**
-+ * bfq_bfqq_served - update the scheduler status after selection for service.
++ * bfq_bfqq_served - update the scheduler status after selection for
++ *                   service.
 + * @bfqq: the queue being served.
 + * @served: bytes to transfer.
 + *
@@ -5075,7 +5800,7 @@ index 0000000..999b475
 + * and if the caller did not specify @requeue, put it on the idle tree.
 + *
 + * Return %1 if the caller should update the entity hierarchy, i.e.,
-+ * if the entity was under service or if it was the next_in_service for
++ * if the entity was in service or if it was the next_in_service for
 + * its sched_data; return %0 otherwise.
 + */
 +static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
@@ -5131,7 +5856,7 @@ index 0000000..999b475
 +			/*
 +			 * The parent entity is still backlogged, and
 +			 * we don't need to update it as it is still
-+			 * under service.
++			 * in service.
 +			 */
 +			break;
 +
@@ -5172,7 +5897,7 @@ index 0000000..999b475
 + * active tree of the device is not empty.
 + *
 + * NOTE: this hierarchical implementation updates vtimes quite often,
-+ * we may end up with reactivated tasks getting timestamps after a
++ * we may end up with reactivated processes getting timestamps after a
 + * vtime skip done because we needed a ->first_active entity on some
 + * intermediate node.
 + */
@@ -5195,8 +5920,8 @@ index 0000000..999b475
 + *
 + * This function searches the first schedulable entity, starting from the
 + * root of the tree and going on the left every time on this side there is
-+ * a subtree with at least one eligible (start >= vtime) entity.  The path
-+ * on the right is followed only if a) the left subtree contains no eligible
++ * a subtree with at least one eligible (start >= vtime) entity. The path on
++ * the right is followed only if a) the left subtree contains no eligible
 + * entities and b) no eligible entity has been found yet.
 + */
 +static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
@@ -5356,8 +6081,8 @@ index 0000000..999b475
 +		bfq_update_budget(entity);
 +		bfq_update_vtime(bfq_entity_service_tree(entity));
 +		bfq_active_extract(bfq_entity_service_tree(entity), entity);
-+		sd->active_entity = entity;
-+		sd->next_active = NULL;
++		sd->in_service_entity = entity;
++		sd->next_in_service = NULL;
 +		entity->service = 0;
 +	}
 +
@@ -5409,8 +6134,22 @@ index 0000000..999b475
 +
 +	BUG_ON(bfqd->busy_queues == 0);
 +	bfqd->busy_queues--;
-+	if (bfqq->raising_coeff > 1)
-+		bfqd->raised_busy_queues--;
++
++	if (!bfqq->dispatched) {
++		bfq_weights_tree_remove(bfqd, &bfqq->entity,
++					&bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->busy_in_flight_queues);
++			bfqd->busy_in_flight_queues--;
++			if (bfq_bfqq_constantly_seeky(bfqq)) {
++				BUG_ON(!bfqd->
++					const_seeky_busy_in_flight_queues);
++				bfqd->const_seeky_busy_in_flight_queues--;
++			}
++		}
++	}
++	if (bfqq->wr_coeff > 1)
++		bfqd->wr_busy_queues--;
 +
 +	bfq_deactivate_bfqq(bfqd, bfqq, requeue);
 +}
@@ -5429,17 +6168,28 @@ index 0000000..999b475
 +
 +	bfq_mark_bfqq_busy(bfqq);
 +	bfqd->busy_queues++;
-+	if (bfqq->raising_coeff > 1)
-+		bfqd->raised_busy_queues++;
++
++	if (!bfqq->dispatched) {
++		if (bfqq->wr_coeff == 1)
++			bfq_weights_tree_add(bfqd, &bfqq->entity,
++					     &bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			bfqd->busy_in_flight_queues++;
++			if (bfq_bfqq_constantly_seeky(bfqq))
++				bfqd->const_seeky_busy_in_flight_queues++;
++		}
++	}
++	if (bfqq->wr_coeff > 1)
++		bfqd->wr_busy_queues++;
 +}
 diff --git a/block/bfq.h b/block/bfq.h
 new file mode 100644
-index 0000000..3ca8482
+index 0000000..050eeaf
 --- /dev/null
 +++ b/block/bfq.h
-@@ -0,0 +1,622 @@
+@@ -0,0 +1,775 @@
 +/*
-+ * BFQ-v7r2 for 3.14.0: data structures and common functions prototypes.
++ * BFQ-v7r7 for 3.14.0: data structures and common functions prototypes.
 + *
 + * Based on ideas and code from CFQ:
 + * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
@@ -5464,6 +6214,8 @@ index 0000000..3ca8482
 +#define BFQ_MIN_WEIGHT	1
 +#define BFQ_MAX_WEIGHT	1000
 +
++#define BFQ_DEFAULT_QUEUE_IOPRIO	4
++
 +#define BFQ_DEFAULT_GRP_WEIGHT	10
 +#define BFQ_DEFAULT_GRP_IOPRIO	0
 +#define BFQ_DEFAULT_GRP_CLASS	IOPRIO_CLASS_BE
@@ -5497,7 +6249,7 @@ index 0000000..3ca8482
 +
 +/**
 + * struct bfq_sched_data - multi-class scheduler.
-+ * @in_service_entity: entity under service.
++ * @in_service_entity: entity in service.
 + * @next_in_service: head-of-the-line entity in the scheduler.
 + * @service_tree: array of service trees, one per ioprio_class.
 + *
@@ -5521,8 +6273,23 @@ index 0000000..3ca8482
 +};
 +
 +/**
++ * struct bfq_weight_counter - counter of the number of all active entities
++ *                             with a given weight.
++ * @weight: weight of the entities that this counter refers to.
++ * @num_active: number of active entities with this weight.
++ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
++ *                and @group_weights_tree).
++ */
++struct bfq_weight_counter {
++	short int weight;
++	unsigned int num_active;
++	struct rb_node weights_node;
++};
++
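These counters exist so that the scheduler can cheaply answer one question: is
more than one distinct weight currently active? That answer feeds the
symmetric_scenario check used when deciding whether to idle. Below is a
simplified model, separate from the patch, that uses a flat array where the
kernel uses the rbtrees declared in bfq_data; the invariant it demonstrates is
the same.

/*
 * Flat-array stand-in for the weights trees: one counter per distinct
 * weight of the active queues. Weights are "differentiated" as soon as
 * two distinct weights are active at the same time.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_WEIGHTS 16

struct weight_counter {
	short int weight;
	unsigned int num_active;
};

static struct weight_counter counters[MAX_WEIGHTS];
static int nr_counters;

static void weights_add(short int weight)
{
	int i;

	for (i = 0; i < nr_counters; i++)
		if (counters[i].weight == weight) {
			counters[i].num_active++;
			return;
		}
	if (nr_counters == MAX_WEIGHTS)
		return;		/* sketch only: ignore overflow */
	counters[nr_counters].weight = weight;
	counters[nr_counters].num_active = 1;
	nr_counters++;
}

static void weights_remove(short int weight)
{
	int i;

	for (i = 0; i < nr_counters; i++)
		if (counters[i].weight == weight) {
			if (--counters[i].num_active == 0)
				counters[i] = counters[--nr_counters];
			return;
		}
}

static bool differentiated_weights(void)
{
	return nr_counters > 1;
}

int main(void)
{
	weights_add(4);		/* two queues with the same weight */
	weights_add(4);
	printf("differentiated? %d\n", differentiated_weights());

	weights_add(8);		/* a third queue with a different weight */
	printf("differentiated? %d\n", differentiated_weights());

	weights_remove(8);
	printf("differentiated? %d\n", differentiated_weights());
	return 0;
}

The kernel keeps these counters in rbtrees so that insertion and removal stay
logarithmic, but the property it checks is the one shown here.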
++/**
 + * struct bfq_entity - schedulable entity.
 + * @rb_node: service_tree member.
++ * @weight_counter: pointer to the weight counter associated with this entity.
 + * @on_st: flag, true if the entity is on a tree (either the active or
 + *         the idle one of its service_tree).
 + * @finish: B-WF2Q+ finish timestamp (aka F_i).
@@ -5573,6 +6340,7 @@ index 0000000..3ca8482
 + */
 +struct bfq_entity {
 +	struct rb_node rb_node;
++	struct bfq_weight_counter *weight_counter;
 +
 +	int on_st;
 +
@@ -5618,28 +6386,34 @@ index 0000000..3ca8482
 + * @max_budget: maximum budget allowed from the feedback mechanism.
 + * @budget_timeout: budget expiration (in jiffies).
 + * @dispatched: number of requests on the dispatch list or inside driver.
-+ * @org_ioprio: saved ioprio during boosted periods.
 + * @flags: status flags.
 + * @bfqq_list: node for active/idle bfqq list inside our bfqd.
++ * @burst_list_node: node for the device's burst list.
 + * @seek_samples: number of seeks sampled
 + * @seek_total: sum of the distances of the seeks sampled
 + * @seek_mean: mean seek distance
 + * @last_request_pos: position of the last request enqueued
++ * @requests_within_timer: number of consecutive pairs of request completion
++ *                         and arrival, such that the queue becomes idle
++ *                         after the completion, but the next request arrives
++ *                         within an idle time slice; used only if the queue's
++ *                         IO_bound has been cleared.
 + * @pid: pid of the process owning the queue, used for logging purposes.
-+ * @last_rais_start_finish: start time of the current weight-raising period if
-+ *                          the @bfq-queue is being weight-raised, otherwise
-+ *                          finish time of the last weight-raising period
-+ * @raising_cur_max_time: current max raising time for this queue
-+ * @soft_rt_next_start: minimum time instant such that, only if a new request
-+ *                      is enqueued after this time instant in an idle
-+ *                      @bfq_queue with no outstanding requests, then the
-+ *                      task associated with the queue it is deemed as soft
-+ *                      real-time (see the comments to the function
-+ *                      bfq_bfqq_softrt_next_start())
++ * @last_wr_start_finish: start time of the current weight-raising period if
++ *                        the @bfq_queue is being weight-raised, otherwise
++ *                        finish time of the last weight-raising period
++ * @wr_cur_max_time: current max raising time for this queue
++ * @soft_rt_next_start: minimum time instant such that, only if a new
++ *                      request is enqueued after this time instant in an
++ *                      idle @bfq_queue with no outstanding requests, then
++ *                      the task associated with the queue is deemed as
++ *                      soft real-time (see the comments to the function
++ *                      bfq_bfqq_softrt_next_start()).
 + * @last_idle_bklogged: time of the last transition of the @bfq_queue from
 + *                      idle to backlogged
 + * @service_from_backlogged: cumulative service received from the @bfq_queue
-+ *                           since the last transition from idle to backlogged
++ *                           since the last transition from idle to
++ *                           backlogged
 + *
 + * A bfq_queue is a leaf request queue; it can be associated with an io_context
 + * or more, if it is async or shared between cooperating processes. @cgroup
@@ -5671,24 +6445,26 @@ index 0000000..3ca8482
 +
 +	int dispatched;
 +
-+	unsigned short org_ioprio;
-+
 +	unsigned int flags;
 +
 +	struct list_head bfqq_list;
 +
++	struct hlist_node burst_list_node;
++
 +	unsigned int seek_samples;
 +	u64 seek_total;
 +	sector_t seek_mean;
 +	sector_t last_request_pos;
 +
++	unsigned int requests_within_timer;
++
 +	pid_t pid;
 +
 +	/* weight-raising fields */
-+	unsigned long raising_cur_max_time;
++	unsigned long wr_cur_max_time;
 +	unsigned long soft_rt_next_start;
-+	unsigned long last_rais_start_finish;
-+	unsigned int raising_coeff;
++	unsigned long last_wr_start_finish;
++	unsigned int wr_coeff;
 +	unsigned long last_idle_bklogged;
 +	unsigned long service_from_backlogged;
 +};
@@ -5720,35 +6496,82 @@ index 0000000..3ca8482
 +	int ioprio;
 +};
 +
++enum bfq_device_speed {
++	BFQ_BFQD_FAST,
++	BFQ_BFQD_SLOW,
++};
++
 +/**
 + * struct bfq_data - per device data structure.
 + * @queue: request queue for the managed device.
 + * @root_group: root bfq_group for the device.
-+ * @rq_pos_tree: rbtree sorted by next_request position,
-+ *		used when determining if two or more queues
-+ *		have interleaving requests (see bfq_close_cooperator).
++ * @rq_pos_tree: rbtree sorted by next_request position, used when
++ *               determining if two or more queues have interleaving
++ *               requests (see bfq_close_cooperator()).
++ * @active_numerous_groups: number of bfq_groups containing more than one
++ *                          active @bfq_entity.
++ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
++ *                      weight. Used to keep track of whether all @bfq_queues
++ *                     have the same weight. The tree contains one counter
++ *                     for each distinct weight associated to some active
++ *                     and not weight-raised @bfq_queue (see the comments to
++ *                      the functions bfq_weights_tree_[add|remove] for
++ *                     further details).
++ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
++ *                      by weight. Used to keep track of whether all
++ *                     @bfq_groups have the same weight. The tree contains
++ *                     one counter for each distinct weight associated to
++ *                     some active @bfq_group (see the comments to the
++ *                     functions bfq_weights_tree_[add|remove] for further
++ *                     details).
 + * @busy_queues: number of bfq_queues containing requests (including the
-+ *		 queue under service, even if it is idling).
-+ * @raised_busy_queues: number of weight-raised busy bfq_queues.
++ *		 queue in service, even if it is idling).
++ * @busy_in_flight_queues: number of @bfq_queues containing pending or
++ *                         in-flight requests, plus the @bfq_queue in
++ *                         service, even if idle but waiting for the
++ *                         possible arrival of its next sync request. This
++ *                         field is updated only if the device is rotational,
++ *                         but used only if the device is also NCQ-capable.
++ *                         The reason why the field is updated also for non-
++ *                         NCQ-capable rotational devices is related to the
++ *                         fact that the value of @hw_tag may be set also
++ *                         later than when busy_in_flight_queues may need to
++ *                         be incremented for the first time(s). Taking also
++ *                         this possibility into account, to avoid unbalanced
++ *                         increments/decrements, would imply more overhead
++ *                         than just updating busy_in_flight_queues
++ *                         regardless of the value of @hw_tag.
++ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
++ *                                     (that is, seeky queues that expired
++ *                                     for budget timeout at least once)
++ *                                     containing pending or in-flight
++ *                                     requests, including the in-service
++ *                                     @bfq_queue if constantly seeky. This
++ *                                     field is updated only if the device
++ *                                     is rotational, but used only if the
++ *                                     device is also NCQ-capable (see the
++ *                                     comments to @busy_in_flight_queues).
++ * @wr_busy_queues: number of weight-raised busy @bfq_queues.
 + * @queued: number of queued requests.
 + * @rq_in_driver: number of requests dispatched and waiting for completion.
 + * @sync_flight: number of sync requests in the driver.
-+ * @max_rq_in_driver: max number of reqs in driver in the last @hw_tag_samples
-+ *		      completed requests .
++ * @max_rq_in_driver: max number of reqs in driver in the last
++ *                    @hw_tag_samples completed requests.
 + * @hw_tag_samples: nr of samples used to calculate hw_tag.
 + * @hw_tag: flag set to one if the driver is showing a queueing behavior.
 + * @budgets_assigned: number of budgets assigned.
 + * @idle_slice_timer: timer set when idling for the next sequential request
-+ *                    from the queue under service.
++ *                    from the queue in service.
 + * @unplug_work: delayed work to restart dispatching on the request queue.
-+ * @in_service_queue: bfq_queue under service.
++ * @in_service_queue: bfq_queue in service.
 + * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
 + * @last_position: on-disk position of the last served request.
 + * @last_budget_start: beginning of the last budget.
 + * @last_idling_start: beginning of the last idle slice.
 + * @peak_rate: peak transfer rate observed for a budget.
 + * @peak_rate_samples: number of samples used to calculate @peak_rate.
-+ * @bfq_max_budget: maximum budget allotted to a bfq_queue before rescheduling.
++ * @bfq_max_budget: maximum budget allotted to a bfq_queue before
++ *                  rescheduling.
 + * @group_list: list of all the bfq_groups active on the device.
 + * @active_list: list of all the bfq_queues active on the device.
 + * @idle_list: list of all the bfq_queues idle on the device.
@@ -5758,7 +6581,8 @@ index 0000000..3ca8482
 + * @bfq_back_penalty: weight of backward seeks wrt forward ones.
 + * @bfq_back_max: maximum allowed backward seek.
 + * @bfq_slice_idle: maximum idling time.
-+ * @bfq_user_max_budget: user-configured max budget value (0 for auto-tuning).
++ * @bfq_user_max_budget: user-configured max budget value
++ *                       (0 for auto-tuning).
 + * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
 + *                           async queues.
 + * @bfq_timeout: timeout for bfq_queues to consume their budget; used to
@@ -5768,21 +6592,49 @@ index 0000000..3ca8482
 + *               they are charged for the whole allocated budget, to try
 + *               to preserve a behavior reasonably fair among them, but
 + *               without service-domain guarantees).
-+ * @bfq_raising_coeff: Maximum factor by which the weight of a boosted
-+ *                            queue is multiplied
-+ * @bfq_raising_max_time: maximum duration of a weight-raising period (jiffies)
-+ * @bfq_raising_rt_max_time: maximum duration for soft real-time processes
-+ * @bfq_raising_min_idle_time: minimum idle period after which weight-raising
-+ *			       may be reactivated for a queue (in jiffies)
-+ * @bfq_raising_min_inter_arr_async: minimum period between request arrivals
-+ *				     after which weight-raising may be
-+ *				     reactivated for an already busy queue
-+ *				     (in jiffies)
-+ * @bfq_raising_max_softrt_rate: max service-rate for a soft real-time queue,
-+ *			         sectors per seconds
++ * @bfq_coop_thresh: number of queue merges after which a @bfq_queue is
++ *                   no more granted any weight-raising.
++ * @bfq_failed_cooperations: number of consecutive failed cooperation
++ *                           chances after which weight-raising is restored
++ *                           to a queue subject to more than bfq_coop_thresh
++ *                           queue merges.
++ * @bfq_requests_within_timer: number of consecutive requests that must be
++ *                             issued within the idle time slice to set
++ *                             again idling to a queue which was marked as
++ *                             non-I/O-bound (see the definition of the
++ *                             IO_bound flag for further details).
++ * @last_ins_in_burst: last time at which a queue entered the current
++ *                     burst of queues being activated shortly after
++ *                     each other; for more details about this and the
++ *                     following parameters related to a burst of
++ *                     activations, see the comments to the function
++ *                     @bfq_handle_burst.
++ * @bfq_burst_interval: reference time interval used to decide whether a
++ *                      queue has been activated shortly after
++ *                      @last_ins_in_burst.
++ * @burst_size: number of queues in the current burst of queue activations.
++ * @bfq_large_burst_thresh: maximum burst size above which the current
++ * 			    queue-activation burst is deemed as 'large'.
++ * @large_burst: true if a large queue-activation burst is in progress.
++ * @burst_list: head of the burst list (as for the above fields, more details
++ * 		in the comments to the function bfq_handle_burst).
++ * @low_latency: if set to true, low-latency heuristics are enabled.
++ * @bfq_wr_coeff: maximum factor by which the weight of a weight-raised
++ *                queue is multiplied.
++ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies).
++ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes.
++ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
++ *			  may be reactivated for a queue (in jiffies).
++ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
++ *				after which weight-raising may be
++ *				reactivated for an already busy queue
++ *				(in jiffies).
++ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
++ *			    sectors per seconds.
 + * @RT_prod: cached value of the product R*T used for computing the maximum
-+ *	     duration of the weight raising automatically
-+ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions
++ *	     duration of the weight raising automatically.
++ * @device_speed: device-speed class for the low-latency heuristic.
++ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions.
 + *
 + * All the fields are protected by the @queue lock.
 + */
@@ -5790,11 +6642,19 @@ index 0000000..3ca8482
 +	struct request_queue *queue;
 +
 +	struct bfq_group *root_group;
-+
 +	struct rb_root rq_pos_tree;
 +
++#ifdef CONFIG_CGROUP_BFQIO
++	int active_numerous_groups;
++#endif
++
++	struct rb_root queue_weights_tree;
++	struct rb_root group_weights_tree;
++
 +	int busy_queues;
-+	int raised_busy_queues;
++	int busy_in_flight_queues;
++	int const_seeky_busy_in_flight_queues;
++	int wr_busy_queues;
 +	int queued;
 +	int rq_in_driver;
 +	int sync_flight;
@@ -5834,22 +6694,34 @@ index 0000000..3ca8482
 +	unsigned int bfq_max_budget_async_rq;
 +	unsigned int bfq_timeout[2];
 +
++	unsigned int bfq_coop_thresh;
++	unsigned int bfq_failed_cooperations;
++	unsigned int bfq_requests_within_timer;
++
++	unsigned long last_ins_in_burst;
++	unsigned long bfq_burst_interval;
++	int burst_size;
++	unsigned long bfq_large_burst_thresh;
++	bool large_burst;
++	struct hlist_head burst_list;
++
 +	bool low_latency;
 +
 +	/* parameters of the low_latency heuristics */
-+	unsigned int bfq_raising_coeff;
-+	unsigned int bfq_raising_max_time;
-+	unsigned int bfq_raising_rt_max_time;
-+	unsigned int bfq_raising_min_idle_time;
-+	unsigned long bfq_raising_min_inter_arr_async;
-+	unsigned int bfq_raising_max_softrt_rate;
++	unsigned int bfq_wr_coeff;
++	unsigned int bfq_wr_max_time;
++	unsigned int bfq_wr_rt_max_time;
++	unsigned int bfq_wr_min_idle_time;
++	unsigned long bfq_wr_min_inter_arr_async;
++	unsigned int bfq_wr_max_softrt_rate;
 +	u64 RT_prod;
++	enum bfq_device_speed device_speed;
 +
 +	struct bfq_queue oom_bfqq;
 +};
 +
 +enum bfqq_state_flags {
-+	BFQ_BFQQ_FLAG_busy = 0,		/* has requests or is under service */
++	BFQ_BFQQ_FLAG_busy = 0,		/* has requests or is in service */
 +	BFQ_BFQQ_FLAG_wait_request,	/* waiting for a request */
 +	BFQ_BFQQ_FLAG_must_alloc,	/* must be allowed rq alloc */
 +	BFQ_BFQQ_FLAG_fifo_expire,	/* FIFO checked in this slice */
@@ -5857,9 +6729,25 @@ index 0000000..3ca8482
 +	BFQ_BFQQ_FLAG_prio_changed,	/* task priority has changed */
 +	BFQ_BFQQ_FLAG_sync,		/* synchronous queue */
 +	BFQ_BFQQ_FLAG_budget_new,	/* no completion with this budget */
++	BFQ_BFQQ_FLAG_IO_bound,         /*
++					 * bfqq has timed-out at least once
++					 * having consumed at most 2/10 of
++					 * its budget
++					 */
++	BFQ_BFQQ_FLAG_in_large_burst,	/*
++					 * bfqq activated in a large burst,
++					 * see comments to bfq_handle_burst.
++					 */
++	BFQ_BFQQ_FLAG_constantly_seeky,	/*
++					 * bfqq has proved to be slow and
++					 * seeky until budget timeout
++					 */
++	BFQ_BFQQ_FLAG_softrt_update,    /*
++					 * may need softrt-next-start
++					 * update
++					 */
 +	BFQ_BFQQ_FLAG_coop,		/* bfqq is shared */
 +	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be splitted */
-+	BFQ_BFQQ_FLAG_softrt_update,	/* needs softrt-next-start update */
 +};
 +
 +#define BFQ_BFQQ_FNS(name)						\
@@ -5884,6 +6772,9 @@ index 0000000..3ca8482
 +BFQ_BFQQ_FNS(prio_changed);
 +BFQ_BFQQ_FNS(sync);
 +BFQ_BFQQ_FNS(budget_new);
++BFQ_BFQQ_FNS(IO_bound);
++BFQ_BFQQ_FNS(in_large_burst);
++BFQ_BFQQ_FNS(constantly_seeky);
 +BFQ_BFQQ_FNS(coop);
 +BFQ_BFQQ_FNS(split_coop);
 +BFQ_BFQQ_FNS(softrt_update);
@@ -5898,7 +6789,10 @@ index 0000000..3ca8482
 +
 +/* Expiration reasons. */
 +enum bfqq_expiration {
-+	BFQ_BFQQ_TOO_IDLE = 0,		/* queue has been idling for too long */
++	BFQ_BFQQ_TOO_IDLE = 0,		/*
++					 * queue has been idling for
++					 * too long
++					 */
 +	BFQ_BFQQ_BUDGET_TIMEOUT,	/* budget took too long to be used */
 +	BFQ_BFQQ_BUDGET_EXHAUSTED,	/* budget consumed */
 +	BFQ_BFQQ_NO_MORE_REQUESTS,	/* the queue has no more requests */
@@ -5920,7 +6814,13 @@ index 0000000..3ca8482
 + *              except for the idle class that has only one queue.
 + * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
 + * @my_entity: pointer to @entity, %NULL for the toplevel group; used
-+ *             to avoid too many special cases during group creation/migration.
++ *             to avoid too many special cases during group creation/
++ *             migration.
++ * @active_entities: number of active entities belonging to the group;
++ *                   unused for the root group. Used to know whether there
++ *                   are groups with more than one active @bfq_entity
++ *                   (see the comments to the function
++ *                   bfq_bfqq_must_not_expire()).
 + *
 + * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
 + * there is a set of bfq_groups, each one collecting the lower-level
@@ -5946,6 +6846,8 @@ index 0000000..3ca8482
 +	struct bfq_queue *async_idle_bfqq;
 +
 +	struct bfq_entity *my_entity;
++
++	int active_entities;
 +};
 +
 +/**
@@ -5992,15 +6894,15 @@ index 0000000..3ca8482
 +}
 +
 +static inline struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic,
-+					    int is_sync)
++					    bool is_sync)
 +{
-+	return bic->bfqq[!!is_sync];
++	return bic->bfqq[is_sync];
 +}
 +
 +static inline void bic_set_bfqq(struct bfq_io_cq *bic,
-+				struct bfq_queue *bfqq, int is_sync)
++				struct bfq_queue *bfqq, bool is_sync)
 +{
-+	bic->bfqq[!!is_sync] = bfqq;
++	bic->bfqq[is_sync] = bfqq;
 +}
 +
 +static inline struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
@@ -6055,11 +6957,12 @@ index 0000000..3ca8482
 +static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
 +				       struct bfq_group *bfqg, int is_sync,
 +				       struct bfq_io_cq *bic, gfp_t gfp_mask);
-+static void bfq_end_raising_async_queues(struct bfq_data *bfqd,
-+					 struct bfq_group *bfqg);
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++				    struct bfq_group *bfqg);
 +static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
 +static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
-+#endif
++
++#endif /* _BFQ_H */
 -- 
-1.9.0
+2.1.3
 

diff --git a/5003_BFQ-3-block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r2-for-3.14.0.patch b/5003_BFQ-3-block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r7-for-3.14.0.patch
similarity index 63%
rename from 5003_BFQ-3-block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r2-for-3.14.0.patch
rename to 5003_BFQ-3-block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r7-for-3.14.0.patch
index 010450b..4ff37ea 100644
--- a/5003_BFQ-3-block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r2-for-3.14.0.patch
+++ b/5003_BFQ-3-block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r7-for-3.14.0.patch
@@ -1,15 +1,15 @@
-From 4fbeda28a90d7fccd05d28a89d9fc409b2344e0a Mon Sep 17 00:00:00 2001
+From c1e7cf789b3812aaa71fbef0adce65aac3c28de7 Mon Sep 17 00:00:00 2001
 From: Mauro Andreolini <mauro.andreolini@unimore.it>
-Date: Fri, 14 Feb 2014 12:52:49 +0100
-Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r2 for
+Date: Thu, 18 Dec 2014 22:03:16 +0100
+Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r7 for
  3.14.0
 
-A set of processes may happen  to  perform interleaved reads, i.e., requests
+A set of processes may happen  to  perform interleaved reads, i.e.,requests
 whose union would give rise to a  sequential read  pattern.  There are two
 typical  cases: in the first  case,   processes  read  fixed-size chunks of
 data at a fixed distance from each other, while in the second case processes
 may read variable-size chunks at  variable distances. The latter case occurs
-for  example with  KVM, which  splits the  I/O generated  by the  guest into
+for  example with  QEMU, which  splits the  I/O generated  by the  guest into
 multiple chunks,  and lets these chunks  be served by a  pool of cooperating
 processes,  iteratively  assigning  the  next  chunk of  I/O  to  the first
 available  process. CFQ  uses actual  queue merging  for the  first type of
@@ -32,21 +32,27 @@ a non-merged state.
 
 Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
 Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
-Reviewed-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
 ---
- block/bfq-iosched.c | 657 ++++++++++++++++++++++++++++++++++++----------------
- block/bfq-sched.c   |  28 ---
- block/bfq.h         |  20 +-
- 3 files changed, 476 insertions(+), 229 deletions(-)
+ block/bfq-iosched.c | 751 +++++++++++++++++++++++++++++++++++++---------------
+ block/bfq-sched.c   |  28 --
+ block/bfq.h         |  54 +++-
+ 3 files changed, 581 insertions(+), 252 deletions(-)
 
 diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
-index f5f71e4..0d3503d 100644
+index 5aa5f09..c133e23 100644
 --- a/block/bfq-iosched.c
 +++ b/block/bfq-iosched.c
-@@ -445,6 +445,46 @@ static inline unsigned int bfq_wrais_duration(struct bfq_data *bfqd)
+@@ -571,6 +571,57 @@ static inline unsigned int bfq_wr_duration(struct bfq_data *bfqd)
  	return dur;
  }
  
++static inline unsigned
++bfq_bfqq_cooperations(struct bfq_queue *bfqq)
++{
++	return bfqq->bic ? bfqq->bic->cooperations : 0;
++}
++
 +static inline void
 +bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
 +{
@@ -54,29 +60,34 @@ index f5f71e4..0d3503d 100644
 +		bfq_mark_bfqq_idle_window(bfqq);
 +	else
 +		bfq_clear_bfqq_idle_window(bfqq);
-+	if (bic->raising_time_left && bfqq->bfqd->low_latency) {
++	if (bic->saved_IO_bound)
++		bfq_mark_bfqq_IO_bound(bfqq);
++	else
++		bfq_clear_bfqq_IO_bound(bfqq);
++	/* Assuming that the flag in_large_burst is already correctly set */
++	if (bic->wr_time_left && bfqq->bfqd->low_latency &&
++	    !bfq_bfqq_in_large_burst(bfqq) &&
++	    bic->cooperations < bfqq->bfqd->bfq_coop_thresh) {
 +		/*
 +		 * Start a weight raising period with the duration given by
 +		 * the raising_time_left snapshot.
 +		 */
 +		if (bfq_bfqq_busy(bfqq))
-+			bfqq->bfqd->raised_busy_queues++;
-+		bfqq->raising_coeff = bfqq->bfqd->bfq_raising_coeff;
-+		bfqq->raising_cur_max_time = bic->raising_time_left;
-+		bfqq->last_rais_start_finish = jiffies;
++			bfqq->bfqd->wr_busy_queues++;
++		bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
++		bfqq->wr_cur_max_time = bic->wr_time_left;
++		bfqq->last_wr_start_finish = jiffies;
 +		bfqq->entity.ioprio_changed = 1;
 +	}
 +	/*
-+	 * Clear raising_time_left to prevent bfq_bfqq_save_state() from
++	 * Clear wr_time_left to prevent bfq_bfqq_save_state() from
 +	 * getting confused about the queue's need of a weight-raising
 +	 * period.
 +	 */
-+	bic->raising_time_left = 0;
++	bic->wr_time_left = 0;
 +}
 +
-+/*
-+ * Must be called with the queue_lock held.
-+ */
++/* Must be called with the queue_lock held. */
 +static int bfqq_process_refs(struct bfq_queue *bfqq)
 +{
 +	int process_refs, io_refs;
@@ -87,10 +98,35 @@ index f5f71e4..0d3503d 100644
 +	return process_refs;
 +}
 +
- static void bfq_add_rq_rb(struct request *rq)
- {
- 	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-@@ -486,12 +526,20 @@ static void bfq_add_rq_rb(struct request *rq)
+ /* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
+ static inline void bfq_reset_burst_list(struct bfq_data *bfqd,
+ 					struct bfq_queue *bfqq)
+@@ -815,7 +866,7 @@ static void bfq_add_request(struct request *rq)
+ 		bfq_rq_pos_tree_add(bfqd, bfqq);
+ 
+ 	if (!bfq_bfqq_busy(bfqq)) {
+-		bool soft_rt,
++		bool soft_rt, coop_or_in_burst,
+ 		     idle_for_long_time = time_is_before_jiffies(
+ 						bfqq->budget_timeout +
+ 						bfqd->bfq_wr_min_idle_time);
+@@ -839,11 +890,12 @@ static void bfq_add_request(struct request *rq)
+ 				bfqd->last_ins_in_burst = jiffies;
+ 		}
+ 
++		coop_or_in_burst = bfq_bfqq_in_large_burst(bfqq) ||
++			bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh;
+ 		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
+-			!bfq_bfqq_in_large_burst(bfqq) &&
++			!coop_or_in_burst &&
+ 			time_is_before_jiffies(bfqq->soft_rt_next_start);
+-		interactive = !bfq_bfqq_in_large_burst(bfqq) &&
+-			      idle_for_long_time;
++		interactive = !coop_or_in_burst && idle_for_long_time;
+ 		entity->budget = max_t(unsigned long, bfqq->max_budget,
+ 				       bfq_serv_to_charge(next_rq, bfqq));
+ 
+@@ -862,11 +914,20 @@ static void bfq_add_request(struct request *rq)
  		if (!bfqd->low_latency)
  			goto add_bfqq_busy;
  
@@ -108,28 +144,68 @@ index f5f71e4..0d3503d 100644
 +		 *   requests have not been redirected to a shared queue)
 +		 * start a weight-raising period.
  		 */
--		if (old_raising_coeff == 1 &&
--		    (idle_for_long_time || soft_rt)) {
-+		if (old_raising_coeff == 1 && (idle_for_long_time || soft_rt) &&
+-		if (old_wr_coeff == 1 && (interactive || soft_rt)) {
++		if (old_wr_coeff == 1 && (interactive || soft_rt) &&
 +		    (!bfq_bfqq_sync(bfqq) || bfqq->bic != NULL)) {
- 			bfqq->raising_coeff = bfqd->bfq_raising_coeff;
- 			if (idle_for_long_time)
- 				bfqq->raising_cur_max_time =
-@@ -574,6 +622,7 @@ static void bfq_add_rq_rb(struct request *rq)
- 					bfqd->bfq_raising_rt_max_time;
+ 			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
+ 			if (interactive)
+ 				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+@@ -880,7 +941,7 @@ static void bfq_add_request(struct request *rq)
+ 		} else if (old_wr_coeff > 1) {
+ 			if (interactive)
+ 				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+-			else if (bfq_bfqq_in_large_burst(bfqq) ||
++			else if (coop_or_in_burst ||
+ 				 (bfqq->wr_cur_max_time ==
+ 				  bfqd->bfq_wr_rt_max_time &&
+ 				  !soft_rt)) {
+@@ -899,18 +960,18 @@ static void bfq_add_request(struct request *rq)
+ 				/*
+ 				 *
+ 				 * The remaining weight-raising time is lower
+-				 * than bfqd->bfq_wr_rt_max_time, which
+-				 * means that the application is enjoying
+-				 * weight raising either because deemed soft-
+-				 * rt in the near past, or because deemed
+-				 * interactive a long ago. In both cases,
+-				 * resetting now the current remaining weight-
+-				 * raising time for the application to the
+-				 * weight-raising duration for soft rt
+-				 * applications would not cause any latency
+-				 * increase for the application (as the new
+-				 * duration would be higher than the remaining
+-				 * time).
++				 * than bfqd->bfq_wr_rt_max_time, which means
++				 * that the application is enjoying weight
++				 * raising either because deemed soft-rt in
++				 * the near past, or because deemed interactive
++				 * a long ago.
++				 * In both cases, resetting now the current
++				 * remaining weight-raising time for the
++				 * application to the weight-raising duration
++				 * for soft rt applications would not cause any
++				 * latency increase for the application (as the
++				 * new duration would be higher than the
++				 * remaining time).
+ 				 *
+ 				 * In addition, the application is now meeting
+ 				 * the requirements for being deemed soft rt.
+@@ -945,6 +1006,7 @@ static void bfq_add_request(struct request *rq)
+ 					bfqd->bfq_wr_rt_max_time;
  			}
  		}
 +set_ioprio_changed:
- 		if (old_raising_coeff != bfqq->raising_coeff)
+ 		if (old_wr_coeff != bfqq->wr_coeff)
  			entity->ioprio_changed = 1;
  add_bfqq_busy:
-@@ -756,90 +805,35 @@ static void bfq_end_raising(struct bfq_data *bfqd)
+@@ -1156,90 +1218,35 @@ static void bfq_end_wr(struct bfq_data *bfqd)
  	spin_unlock_irq(bfqd->queue->queue_lock);
  }
  
 -static int bfq_allow_merge(struct request_queue *q, struct request *rq,
 -			   struct bio *bio)
--{
++static inline sector_t bfq_io_struct_pos(void *io_struct, bool request)
+ {
 -	struct bfq_data *bfqd = q->elevator->elevator_data;
 -	struct bfq_io_cq *bic;
 -	struct bfq_queue *bfqq;
@@ -176,8 +252,7 @@ index f5f71e4..0d3503d 100644
 - */
 -static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd,
 -						  struct bfq_queue *bfqq)
-+static inline sector_t bfq_io_struct_pos(void *io_struct, bool request)
- {
+-{
 -	if (!bfqq)
 -		bfqq = bfq_get_next_queue(bfqd);
 +	if (request)
@@ -228,8 +303,8 @@ index f5f71e4..0d3503d 100644
  
  	if (RB_EMPTY_ROOT(root))
  		return NULL;
-@@ -858,7 +852,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
- 	 * position).
+@@ -1258,7 +1265,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+ 	 * next_request position).
  	 */
  	__bfqq = rb_entry(parent, struct bfq_queue, pos_node);
 -	if (bfq_rq_close(bfqd, __bfqq->next_rq))
@@ -237,7 +312,7 @@ index f5f71e4..0d3503d 100644
  		return __bfqq;
  
  	if (blk_rq_pos(__bfqq->next_rq) < sector)
-@@ -869,7 +863,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+@@ -1269,7 +1276,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
  		return NULL;
  
  	__bfqq = rb_entry(node, struct bfq_queue, pos_node);
@@ -246,7 +321,7 @@ index f5f71e4..0d3503d 100644
  		return __bfqq;
  
  	return NULL;
-@@ -878,14 +872,12 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+@@ -1278,14 +1285,12 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
  /*
   * bfqd - obvious
   * cur_bfqq - passed in so that we don't decide that the current queue
@@ -265,7 +340,7 @@ index f5f71e4..0d3503d 100644
  {
  	struct bfq_queue *bfqq;
  
-@@ -905,7 +897,7 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+@@ -1305,7 +1310,7 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
  	 * working closely on the same area of the disk. In that case,
  	 * we can group them together and don't waste time idling.
  	 */
@@ -274,7 +349,7 @@ index f5f71e4..0d3503d 100644
  	if (bfqq == NULL || bfqq == cur_bfqq)
  		return NULL;
  
-@@ -932,6 +924,282 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+@@ -1332,6 +1337,315 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
  	return bfqq;
  }
  
@@ -313,24 +388,26 @@ index f5f71e4..0d3503d 100644
 +		new_bfqq->pid);
 +
 +	/*
-+	 * Merging is just a redirection: the requests of the process owning
-+	 * one of the two queues are redirected to the other queue. The latter
-+	 * queue, in its turn, is set as shared if this is the first time that
-+	 * the requests of some process are redirected to it.
++	 * Merging is just a redirection: the requests of the process
++	 * owning one of the two queues are redirected to the other queue.
++	 * The latter queue, in its turn, is set as shared if this is the
++	 * first time that the requests of some process are redirected to
++	 * it.
 +	 *
 +	 * We redirect bfqq to new_bfqq and not the opposite, because we
-+	 * are in the context of the process owning bfqq, hence we have the
-+	 * io_cq of this process. So we can immediately configure this io_cq
-+	 * to redirect the requests of the process to new_bfqq.
++	 * are in the context of the process owning bfqq, hence we have
++	 * the io_cq of this process. So we can immediately configure this
++	 * io_cq to redirect the requests of the process to new_bfqq.
 +	 *
 +	 * NOTE, even if new_bfqq coincides with the in-service queue, the
-+	 * io_cq of new_bfqq is not available, because, if the in-service queue
-+	 * is shared, bfqd->in_service_bic may not point to the io_cq of the
-+	 * in-service queue.
-+	 * Redirecting the requests of the process owning bfqq to the currently
-+	 * in-service queue is in any case the best option, as we feed the
-+	 * in-service queue with new requests close to the last request served
-+	 * and, by doing so, hopefully increase the throughput.
++	 * io_cq of new_bfqq is not available, because, if the in-service
++	 * queue is shared, bfqd->in_service_bic may not point to the
++	 * io_cq of the in-service queue.
++	 * Redirecting the requests of the process owning bfqq to the
++	 * currently in-service queue is in any case the best option, as
++	 * we feed the in-service queue with new requests close to the
++	 * last request served and, by doing so, hopefully increase the
++	 * throughput.
 +	 */
 +	bfqq->new_bfqq = new_bfqq;
 +	atomic_add(process_refs, &new_bfqq->ref);
@@ -338,10 +415,17 @@ index f5f71e4..0d3503d 100644
 +}
 +
 +/*
-+ * Attempt to schedule a merge of bfqq with the currently in-service queue or
-+ * with a close queue among the scheduled queues.
++ * Attempt to schedule a merge of bfqq with the currently in-service queue
++ * or with a close queue among the scheduled queues.
 + * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
 + * structure otherwise.
++ *
++ * The OOM queue is not allowed to participate to cooperation: in fact, since
++ * the requests temporarily redirected to the OOM queue could be redirected
++ * again to dedicated queues at any time, the state needed to correctly
++ * handle merging with the OOM queue would be quite complex and expensive
++ * to maintain. Besides, in such a critical condition as an out of memory,
++ * the benefits of queue merging may be little relevant, or even negligible.
 + */
 +static struct bfq_queue *
 +bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
@@ -352,13 +436,14 @@ index f5f71e4..0d3503d 100644
 +	if (bfqq->new_bfqq)
 +		return bfqq->new_bfqq;
 +
-+	if (!io_struct)
++	if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
 +		return NULL;
 +
 +	in_service_bfqq = bfqd->in_service_queue;
 +
 +	if (in_service_bfqq == NULL || in_service_bfqq == bfqq ||
-+	    !bfqd->in_service_bic)
++	    !bfqd->in_service_bic ||
++	    unlikely(in_service_bfqq == &bfqd->oom_bfqq))
 +		goto check_scheduled;
 +
 +	if (bfq_class_idle(in_service_bfqq) || bfq_class_idle(bfqq))
@@ -374,7 +459,7 @@ index f5f71e4..0d3503d 100644
 +	    bfq_bfqq_sync(in_service_bfqq) && bfq_bfqq_sync(bfqq)) {
 +		new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
 +		if (new_bfqq != NULL)
-+			return new_bfqq; /* Merge with the in-service queue */
++			return new_bfqq; /* Merge with in-service queue */
 +	}
 +
 +	/*
@@ -385,7 +470,7 @@ index f5f71e4..0d3503d 100644
 +check_scheduled:
 +	new_bfqq = bfq_close_cooperator(bfqd, bfqq,
 +					bfq_io_struct_pos(io_struct, request));
-+	if (new_bfqq)
++	if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq))
 +		return bfq_setup_merge(bfqq, new_bfqq);
 +
 +	return NULL;
@@ -401,41 +486,46 @@ index f5f71e4..0d3503d 100644
 +	 */
 +	if (bfqq->bic == NULL)
 +		return;
-+	if (bfqq->bic->raising_time_left)
++	if (bfqq->bic->wr_time_left)
 +		/*
 +		 * This is the queue of a just-started process, and would
-+		 * deserve weight raising: we set raising_time_left to the full
-+		 * weight-raising duration to trigger weight-raising when and
-+		 * if the queue is split and the first request of the queue
-+		 * is enqueued.
++		 * deserve weight raising: we set wr_time_left to the full
++		 * weight-raising duration to trigger weight-raising when
++		 * and if the queue is split and the first request of the
++		 * queue is enqueued.
 +		 */
-+		bfqq->bic->raising_time_left = bfq_wrais_duration(bfqq->bfqd);
-+	else if (bfqq->raising_coeff > 1) {
-+		unsigned long wrais_duration =
-+			jiffies - bfqq->last_rais_start_finish;
++		bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
++	else if (bfqq->wr_coeff > 1) {
++		unsigned long wr_duration =
++			jiffies - bfqq->last_wr_start_finish;
 +		/*
 +		 * It may happen that a queue's weight raising period lasts
-+		 * longer than its raising_cur_max_time, as weight raising is
++		 * longer than its wr_cur_max_time, as weight raising is
 +		 * handled only when a request is enqueued or dispatched (it
 +		 * does not use any timer). If the weight raising period is
 +		 * about to end, don't save it.
 +		 */
-+		if (bfqq->raising_cur_max_time <= wrais_duration)
-+			bfqq->bic->raising_time_left = 0;
++		if (bfqq->wr_cur_max_time <= wr_duration)
++			bfqq->bic->wr_time_left = 0;
 +		else
-+			bfqq->bic->raising_time_left =
-+				bfqq->raising_cur_max_time - wrais_duration;
++			bfqq->bic->wr_time_left =
++				bfqq->wr_cur_max_time - wr_duration;
 +		/*
 +		 * The bfq_queue is becoming shared or the requests of the
 +		 * process owning the queue are being redirected to a shared
 +		 * queue. Stop the weight raising period of the queue, as in
-+		 * both cases it should not be owned by an interactive or soft
-+		 * real-time application.
++		 * both cases it should not be owned by an interactive or
++		 * soft real-time application.
 +		 */
-+		bfq_bfqq_end_raising(bfqq);
++		bfq_bfqq_end_wr(bfqq);
 +	} else
-+		bfqq->bic->raising_time_left = 0;
++		bfqq->bic->wr_time_left = 0;
 +	bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
++	bfqq->bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
++	bfqq->bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
++	bfqq->bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
++	bfqq->bic->cooperations++;
++	bfqq->bic->failed_cooperations = 0;
 +}
 +
 +static inline void
@@ -458,23 +548,28 @@ index f5f71e4..0d3503d 100644
 +	/* Save weight raising and idle window of the merged queues */
 +	bfq_bfqq_save_state(bfqq);
 +	bfq_bfqq_save_state(new_bfqq);
++	if (bfq_bfqq_IO_bound(bfqq))
++		bfq_mark_bfqq_IO_bound(new_bfqq);
++	bfq_clear_bfqq_IO_bound(bfqq);
 +	/*
 +	 * Grab a reference to the bic, to prevent it from being destroyed
 +	 * before being possibly touched by a bfq_split_bfqq().
 +	 */
 +	bfq_get_bic_reference(bfqq);
 +	bfq_get_bic_reference(new_bfqq);
-+	/* Merge queues (that is, let bic redirect its requests to new_bfqq) */
++	/*
++	 * Merge queues (that is, let bic redirect its requests to new_bfqq)
++	 */
 +	bic_set_bfqq(bic, new_bfqq, 1);
 +	bfq_mark_bfqq_coop(new_bfqq);
 +	/*
-+	 * new_bfqq now belongs to at least two bics (it is a shared queue): set
-+	 * new_bfqq->bic to NULL. bfqq either:
++	 * new_bfqq now belongs to at least two bics (it is a shared queue):
++	 * set new_bfqq->bic to NULL. bfqq either:
 +	 * - does not belong to any bic any more, and hence bfqq->bic must
 +	 *   be set to NULL, or
 +	 * - is a queue whose owning bics have already been redirected to a
-+	 *   different queue, hence the queue is destined to not belong to any
-+	 *   bic soon and bfqq->bic is already NULL (therefore the next
++	 *   different queue, hence the queue is destined to not belong to
++	 *   any bic soon and bfqq->bic is already NULL (therefore the next
 +	 *   assignment causes no harm).
 +	 */
 +	new_bfqq->bic = NULL;
@@ -482,6 +577,18 @@ index f5f71e4..0d3503d 100644
 +	bfq_put_queue(bfqq);
 +}
 +
++static inline void bfq_bfqq_increase_failed_cooperations(struct bfq_queue *bfqq)
++{
++	struct bfq_io_cq *bic = bfqq->bic;
++	struct bfq_data *bfqd = bfqq->bfqd;
++
++	if (bic && bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh) {
++		bic->failed_cooperations++;
++		if (bic->failed_cooperations >= bfqd->bfq_failed_cooperations)
++			bic->cooperations = 0;
++	}
++}
++
 +static int bfq_allow_merge(struct request_queue *q, struct request *rq,
 +			   struct bio *bio)
 +{
@@ -514,12 +621,13 @@ index f5f71e4..0d3503d 100644
 +		if (new_bfqq != NULL) {
 +			bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
 +			/*
-+			 * If we get here, the bio will be queued in the shared
-+			 * queue, i.e., new_bfqq, so use new_bfqq to decide
-+			 * whether bio and rq can be merged.
++			 * If we get here, the bio will be queued in the
++			 * shared queue, i.e., new_bfqq, so use new_bfqq
++			 * to decide whether bio and rq can be merged.
 +			 */
 +			bfqq = new_bfqq;
-+		}
++		} else
++			bfq_bfqq_increase_failed_cooperations(bfqq);
 +	}
 +
 +	return bfqq == RQ_BFQQ(rq);
@@ -557,13 +665,11 @@ index f5f71e4..0d3503d 100644
  /*
   * If enough samples have been computed, return the current max budget
   * stored in bfqd, which is dynamically updated according to the
-@@ -1079,63 +1347,6 @@ static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
+@@ -1475,61 +1789,6 @@ static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
  	return rq;
  }
  
--/*
-- * Must be called with the queue_lock held.
-- */
+-/* Must be called with the queue_lock held. */
 -static int bfqq_process_refs(struct bfq_queue *bfqq)
 -{
 -	int process_refs, io_refs;
@@ -621,7 +727,7 @@ index f5f71e4..0d3503d 100644
  static inline unsigned long bfq_bfqq_budget_left(struct bfq_queue *bfqq)
  {
  	struct bfq_entity *entity = &bfqq->entity;
-@@ -1729,7 +1940,7 @@ static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
+@@ -2263,7 +2522,7 @@ static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
   */
  static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
  {
@@ -630,7 +736,7 @@ index f5f71e4..0d3503d 100644
  	struct request *next_rq;
  	enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
  
-@@ -1739,17 +1950,6 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+@@ -2273,17 +2532,6 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
  
  	bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
  
@@ -648,7 +754,7 @@ index f5f71e4..0d3503d 100644
  	if (bfq_may_expire_for_budg_timeout(bfqq) &&
  	    !timer_pending(&bfqd->idle_slice_timer) &&
  	    !bfq_bfqq_must_idle(bfqq))
-@@ -1786,36 +1986,26 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+@@ -2322,10 +2570,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
  				bfq_clear_bfqq_wait_request(bfqq);
  				del_timer(&bfqd->idle_slice_timer);
  			}
@@ -660,13 +766,9 @@ index f5f71e4..0d3503d 100644
  		}
  	}
  
- 	/*
--	 * No requests pending.  If the in-service queue has no cooperator and
--	 * still has requests in flight (possibly waiting for a completion)
--	 * or is idling for a new request, then keep it.
-+	 * No requests pending.  If the in-service queue still has requests in
-+	 * flight (possibly waiting for a completion) or is idling for a new
-+	 * request, then keep it.
+@@ -2334,40 +2579,30 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ 	 * in flight (possibly waiting for a completion) or is idling for a
+ 	 * new request, then keep it.
  	 */
 -	if (new_bfqq == NULL && (timer_pending(&bfqd->idle_slice_timer) ||
 -	    (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq)))) {
@@ -692,44 +794,62 @@ index f5f71e4..0d3503d 100644
  	bfq_log(bfqd, "select_queue: new queue %d returned",
  		bfqq != NULL ? bfqq->pid : 0);
  keep_queue:
-@@ -1825,9 +2015,8 @@ keep_queue:
- static void bfq_update_raising_data(struct bfq_data *bfqd,
- 				    struct bfq_queue *bfqq)
+ 	return bfqq;
+ }
+ 
+-static void bfq_update_wr_data(struct bfq_data *bfqd,
+-			       struct bfq_queue *bfqq)
++static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
  {
-+	struct bfq_entity *entity = &bfqq->entity;
- 	if (bfqq->raising_coeff > 1) { /* queue is being boosted */
+-	if (bfqq->wr_coeff > 1) { /* queue is being boosted */
 -		struct bfq_entity *entity = &bfqq->entity;
 -
++	struct bfq_entity *entity = &bfqq->entity;
++	if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
  		bfq_log_bfqq(bfqd, bfqq,
- 			"raising period dur %u/%u msec, "
- 			"old raising coeff %u, w %d(%d)",
-@@ -1844,7 +2033,7 @@ static void bfq_update_raising_data(struct bfq_data *bfqd,
- 			"WARN: pending prio change");
+ 			"raising period dur %u/%u msec, old coeff %u, w %d(%d)",
+-			jiffies_to_msecs(jiffies -
+-				bfqq->last_wr_start_finish),
++			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
+ 			jiffies_to_msecs(bfqq->wr_cur_max_time),
+ 			bfqq->wr_coeff,
+ 			bfqq->entity.weight, bfqq->entity.orig_weight);
+@@ -2376,12 +2611,16 @@ static void bfq_update_wr_data(struct bfq_data *bfqd,
+ 		       entity->orig_weight * bfqq->wr_coeff);
+ 		if (entity->ioprio_changed)
+ 			bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
++
  		/*
- 		 * If too much time has elapsed from the beginning
--		 * of this weight-raising, stop it.
-+		 * of this weight-raising period, stop it.
+ 		 * If the queue was activated in a burst, or
+ 		 * too much time has elapsed from the beginning
+-		 * of this weight-raising, then end weight raising.
++		 * of this weight-raising period, or the queue has
++		 * exceeded the acceptable number of cooperations,
++		 * then end weight raising.
  		 */
- 		if (time_is_before_jiffies(bfqq->last_rais_start_finish +
- 					   bfqq->raising_cur_max_time)) {
-@@ -1856,11 +2045,13 @@ static void bfq_update_raising_data(struct bfq_data *bfqd,
- 				     jiffies_to_msecs(bfqq->
- 					raising_cur_max_time));
- 			bfq_bfqq_end_raising(bfqq);
+ 		if (bfq_bfqq_in_large_burst(bfqq) ||
++		    bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh ||
+ 		    time_is_before_jiffies(bfqq->last_wr_start_finish +
+ 					   bfqq->wr_cur_max_time)) {
+ 			bfqq->last_wr_start_finish = jiffies;
+@@ -2390,11 +2629,13 @@ static void bfq_update_wr_data(struct bfq_data *bfqd,
+ 				     bfqq->last_wr_start_finish,
+ 				     jiffies_to_msecs(bfqq->wr_cur_max_time));
+ 			bfq_bfqq_end_wr(bfqq);
 -			__bfq_entity_update_weight_prio(
 -				bfq_entity_service_tree(entity),
 -				entity);
  		}
  	}
 +	/* Update weight both if it must be raised and if it must be lowered */
-+	if ((entity->weight > entity->orig_weight) != (bfqq->raising_coeff > 1))
++	if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
 +		__bfq_entity_update_weight_prio(
 +			bfq_entity_service_tree(entity),
 +			entity);
  }
  
  /*
-@@ -2101,6 +2292,25 @@ static void bfq_init_icq(struct io_cq *icq)
+@@ -2642,6 +2883,25 @@ static inline void bfq_init_icq(struct io_cq *icq)
  	struct bfq_io_cq *bic = icq_to_bic(icq);
  
  	bic->ttime.last_end_request = jiffies;
@@ -751,11 +871,11 @@ index f5f71e4..0d3503d 100644
 +	 * the field raising_time_left and assign 1 to it, to mark the queue
 +	 * as needing weight raising.
 +	 */
-+	bic->raising_time_left = 1;
++	bic->wr_time_left = 1;
  }
  
  static void bfq_exit_icq(struct io_cq *icq)
-@@ -2114,6 +2324,13 @@ static void bfq_exit_icq(struct io_cq *icq)
+@@ -2655,6 +2915,13 @@ static void bfq_exit_icq(struct io_cq *icq)
  	}
  
  	if (bic->bfqq[BLK_RW_SYNC]) {
@@ -769,7 +889,7 @@ index f5f71e4..0d3503d 100644
  		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
  		bic->bfqq[BLK_RW_SYNC] = NULL;
  	}
-@@ -2405,6 +2622,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
+@@ -2950,6 +3217,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
  	if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
  		return;
  
@@ -780,7 +900,7 @@ index f5f71e4..0d3503d 100644
  	enable_idle = bfq_bfqq_idle_window(bfqq);
  
  	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
-@@ -2445,6 +2666,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+@@ -2997,6 +3268,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
  	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
  	    !BFQQ_SEEKY(bfqq))
  		bfq_update_idle_window(bfqd, bfqq, bic);
@@ -788,7 +908,7 @@ index f5f71e4..0d3503d 100644
  
  	bfq_log_bfqq(bfqd, bfqq,
  		     "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
-@@ -2505,13 +2727,48 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+@@ -3057,13 +3329,49 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
  static void bfq_insert_request(struct request_queue *q, struct request *rq)
  {
  	struct bfq_data *bfqd = q->elevator->elevator_data;
@@ -820,12 +940,13 @@ index f5f71e4..0d3503d 100644
 +						bfqq, new_bfqq);
 +			rq->elv.priv[1] = new_bfqq;
 +			bfqq = new_bfqq;
-+		}
++		} else
++			bfq_bfqq_increase_failed_cooperations(bfqq);
 +	}
 +
  	bfq_init_prio_data(bfqq, RQ_BIC(rq));
  
- 	bfq_add_rq_rb(rq);
+ 	bfq_add_request(rq);
  
 +	/*
 +	 * Here a newly-created bfq_queue has already started a weight-raising
@@ -834,11 +955,11 @@ index f5f71e4..0d3503d 100644
 +	 * comments about this field in bfq_init_icq().
 +	 */
 +	if (bfqq->bic != NULL)
-+		bfqq->bic->raising_time_left = 0;
++		bfqq->bic->wr_time_left = 0;
  	rq_set_fifo_time(rq, jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)]);
  	list_add_tail(&rq->queuelist, &bfqq->fifo);
  
-@@ -2663,18 +2920,6 @@ static void bfq_put_request(struct request *rq)
+@@ -3228,18 +3536,6 @@ static void bfq_put_request(struct request *rq)
  	}
  }
  
@@ -857,7 +978,7 @@ index f5f71e4..0d3503d 100644
  /*
   * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
   * was the last process referring to said bfqq.
-@@ -2683,6 +2928,9 @@ static struct bfq_queue *
+@@ -3248,6 +3544,9 @@ static struct bfq_queue *
  bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
  {
  	bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
@@ -867,7 +988,7 @@ index f5f71e4..0d3503d 100644
  	if (bfqq_process_refs(bfqq) == 1) {
  		bfqq->pid = current->pid;
  		bfq_clear_bfqq_coop(bfqq);
-@@ -2711,6 +2959,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+@@ -3276,6 +3575,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
  	struct bfq_queue *bfqq;
  	struct bfq_group *bfqg;
  	unsigned long flags;
@@ -875,9 +996,21 @@ index f5f71e4..0d3503d 100644
  
  	might_sleep_if(gfp_mask & __GFP_WAIT);
  
-@@ -2729,24 +2978,14 @@ new_queue:
+@@ -3293,25 +3593,26 @@ new_queue:
+ 	if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
  		bfqq = bfq_get_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
  		bic_set_bfqq(bic, bfqq, is_sync);
++		if (split && is_sync) {
++			if ((bic->was_in_burst_list && bfqd->large_burst) ||
++			    bic->saved_in_large_burst)
++				bfq_mark_bfqq_in_large_burst(bfqq);
++			else {
++			    bfq_clear_bfqq_in_large_burst(bfqq);
++			    if (bic->was_in_burst_list)
++			       hlist_add_head(&bfqq->burst_list_node,
++				              &bfqd->burst_list);
++			}
++		}
  	} else {
 -		/*
 -		 * If the queue was seeky for too long, break it apart.
@@ -902,7 +1035,7 @@ index f5f71e4..0d3503d 100644
  	}
  
  	bfqq->allocated[rw]++;
-@@ -2757,6 +2996,26 @@ new_queue:
+@@ -3322,6 +3623,26 @@ new_queue:
  	rq->elv.priv[0] = bic;
  	rq->elv.priv[1] = bfqq;
  
@@ -913,14 +1046,14 @@ index f5f71e4..0d3503d 100644
 +	 * queue has just been split, mark a flag so that the
 +	 * information is available to the other scheduler hooks.
 +	 */
-+	if (bfqq_process_refs(bfqq) == 1) {
++	if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
 +		bfqq->bic = bic;
 +		if (split) {
 +			bfq_mark_bfqq_just_split(bfqq);
 +			/*
-+			 * If the queue has just been split from a shared queue,
-+			 * restore the idle window and the possible weight
-+			 * raising period.
++			 * If the queue has just been split from a shared
++			 * queue, restore the idle window and the possible
++			 * weight raising period.
 +			 */
 +			bfq_bfqq_resume_state(bfqq, bic);
 +		}
@@ -930,10 +1063,10 @@ index f5f71e4..0d3503d 100644
  
  	return 0;
 diff --git a/block/bfq-sched.c b/block/bfq-sched.c
-index 999b475..e54ea33 100644
+index 2931563..6764a7e 100644
 --- a/block/bfq-sched.c
 +++ b/block/bfq-sched.c
-@@ -980,34 +980,6 @@ static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
+@@ -1091,34 +1091,6 @@ static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
  	return bfqq;
  }
  
@@ -957,8 +1090,8 @@ index 999b475..e54ea33 100644
 -		bfq_update_budget(entity);
 -		bfq_update_vtime(bfq_entity_service_tree(entity));
 -		bfq_active_extract(bfq_entity_service_tree(entity), entity);
--		sd->active_entity = entity;
--		sd->next_active = NULL;
+-		sd->in_service_entity = entity;
+-		sd->next_in_service = NULL;
 -		entity->service = 0;
 -	}
 -
@@ -969,38 +1102,69 @@ index 999b475..e54ea33 100644
  {
  	if (bfqd->in_service_bic != NULL) {
 diff --git a/block/bfq.h b/block/bfq.h
-index 3ca8482..c278796 100644
+index 050eeaf..a2022d0 100644
 --- a/block/bfq.h
 +++ b/block/bfq.h
-@@ -200,6 +200,8 @@ struct bfq_group;
+@@ -218,18 +218,21 @@ struct bfq_group;
+  *                      idle @bfq_queue with no outstanding requests, then
+  *                      the task associated with the queue it is deemed as
+  *                      soft real-time (see the comments to the function
+- *                      bfq_bfqq_softrt_next_start()).
++ *                      bfq_bfqq_softrt_next_start())
+  * @last_idle_bklogged: time of the last transition of the @bfq_queue from
   *                      idle to backlogged
   * @service_from_backlogged: cumulative service received from the @bfq_queue
-  *                           since the last transition from idle to backlogged
+  *                           since the last transition from idle to
+  *                           backlogged
 + * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
 + *	 queue is shared
   *
-  * A bfq_queue is a leaf request queue; it can be associated with an io_context
-  * or more, if it is async or shared between cooperating processes. @cgroup
-@@ -243,6 +245,7 @@ struct bfq_queue {
- 	sector_t last_request_pos;
+- * A bfq_queue is a leaf request queue; it can be associated with an io_context
+- * or more, if it is async or shared between cooperating processes. @cgroup
+- * holds a reference to the cgroup, to be sure that it does not disappear while
+- * a bfqq still references it (mostly to avoid races between request issuing and
+- * task migration followed by cgroup destruction).
++ * A bfq_queue is a leaf request queue; it can be associated with an
++ * io_context or more, if it  is  async or shared  between  cooperating
++ * processes. @cgroup holds a reference to the cgroup, to be sure that it
++ * does not disappear while a bfqq still references it (mostly to avoid
++ * races between request issuing and task migration followed by cgroup
++ * destruction).
+  * All the fields are protected by the queue lock of the containing bfqd.
+  */
+ struct bfq_queue {
+@@ -269,6 +272,7 @@ struct bfq_queue {
+ 	unsigned int requests_within_timer;
  
  	pid_t pid;
 +	struct bfq_io_cq *bic;
  
  	/* weight-raising fields */
- 	unsigned long raising_cur_max_time;
-@@ -272,12 +275,23 @@ struct bfq_ttime {
+ 	unsigned long wr_cur_max_time;
+@@ -298,12 +302,42 @@ struct bfq_ttime {
   * @icq: associated io_cq structure
   * @bfqq: array of two process queues, the sync and the async
   * @ttime: associated @bfq_ttime struct
-+ * @raising_time_left: snapshot of the time left before weight raising ends
-+ *		       for the sync queue associated to this process; this
-+ *		       snapshot is taken to remember this value while the weight
-+ *		       raising is suspended because the queue is merged with a
-+ *		       shared queue, and is used to set @raising_cur_max_time
-+ *		       when the queue is split from the shared queue and its
-+ *		       weight is raised again
-+ * @saved_idle_window: same purpose as the previous field for the idle window
++ * @wr_time_left: snapshot of the time left before weight raising ends
++ *                for the sync queue associated to this process; this
++ *		  snapshot is taken to remember this value while the weight
++ *		  raising is suspended because the queue is merged with a
++ *		  shared queue, and is used to set @raising_cur_max_time
++ *		  when the queue is split from the shared queue and its
++ *		  weight is raised again
++ * @saved_idle_window: same purpose as the previous field for the idle
++ *                     window
++ * @saved_IO_bound: same purpose as the previous two fields for the I/O
++ *                  bound classification of a queue
++ * @saved_in_large_burst: same purpose as the previous fields for the
++ *                        value of the field keeping the queue's belonging
++ *                        to a large burst
++ * @was_in_burst_list: true if the queue belonged to a burst list
++ *                     before its merge with another cooperating queue
++ * @cooperations: counter of consecutive successful queue merges underwent
++ *                by any of the process' @bfq_queues
++ * @failed_cooperations: counter of consecutive failed queue merges of any
++ *                       of the process' @bfq_queues
   */
  struct bfq_io_cq {
  	struct io_cq icq; /* must be the first member */
@@ -1008,25 +1172,45 @@ index 3ca8482..c278796 100644
  	struct bfq_ttime ttime;
  	int ioprio;
 +
-+	unsigned int raising_time_left;
-+	unsigned int saved_idle_window;
++	unsigned int wr_time_left;
++	bool saved_idle_window;
++	bool saved_IO_bound;
++
++	bool saved_in_large_burst;
++	bool was_in_burst_list;
++
++	unsigned int cooperations;
++	unsigned int failed_cooperations;
  };
  
- /**
-@@ -418,8 +432,9 @@ enum bfqq_state_flags {
+ enum bfq_device_speed {
+@@ -539,7 +573,7 @@ enum bfqq_state_flags {
+ 	BFQ_BFQQ_FLAG_prio_changed,	/* task priority has changed */
  	BFQ_BFQQ_FLAG_sync,		/* synchronous queue */
  	BFQ_BFQQ_FLAG_budget_new,	/* no completion with this budget */
+-	BFQ_BFQQ_FLAG_IO_bound,         /*
++	BFQ_BFQQ_FLAG_IO_bound,		/*
+ 					 * bfqq has timed-out at least once
+ 					 * having consumed at most 2/10 of
+ 					 * its budget
+@@ -552,12 +586,13 @@ enum bfqq_state_flags {
+ 					 * bfqq has proved to be slow and
+ 					 * seeky until budget timeout
+ 					 */
+-	BFQ_BFQQ_FLAG_softrt_update,    /*
++	BFQ_BFQQ_FLAG_softrt_update,	/*
+ 					 * may need softrt-next-start
+ 					 * update
+ 					 */
  	BFQ_BFQQ_FLAG_coop,		/* bfqq is shared */
 -	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be splitted */
--	BFQ_BFQQ_FLAG_softrt_update,	/* needs softrt-next-start update */
 +	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be split */
 +	BFQ_BFQQ_FLAG_just_split,	/* queue has just been split */
-+	BFQ_BFQQ_FLAG_softrt_update,	/* may need softrt-next-start update */
  };
  
  #define BFQ_BFQQ_FNS(name)						\
-@@ -446,6 +461,7 @@ BFQ_BFQQ_FNS(sync);
- BFQ_BFQQ_FNS(budget_new);
+@@ -587,6 +622,7 @@ BFQ_BFQQ_FNS(in_large_burst);
+ BFQ_BFQQ_FNS(constantly_seeky);
  BFQ_BFQQ_FNS(coop);
  BFQ_BFQQ_FNS(split_coop);
 +BFQ_BFQQ_FNS(just_split);
@@ -1034,5 +1218,5 @@ index 3ca8482..c278796 100644
  #undef BFQ_BFQQ_FNS
  
 -- 
-1.9.0
+2.1.3
 

