* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
From: Florian Schmaus @ 2022-07-12 19:34 UTC
To: gentoo-commits
commit: 203664c0242012d9e1b0c78d4f4909186d243787
Author: Florian Schmaus <flow <AT> gentoo <DOT> org>
AuthorDate: Tue Jul 12 19:32:32 2022 +0000
Commit: Florian Schmaus <flow <AT> gentoo <DOT> org>
CommitDate: Tue Jul 12 19:32:32 2022 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=203664c0
Xen 4.16.2-pre-patchset-1
Signed-off-by: Florian Schmaus <flow <AT> gentoo.org>
0001-update-Xen-version-to-4.16.2-pre.patch | 2 +-
...p-unmap_domain_pirq-XSM-during-destructio.patch | 2 +-
...xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch | 2 +-
...e-to-use-IOMMU-with-reserved-CAP.ND-value.patch | 2 +-
...d-inadvertently-degrading-a-TLB-flush-to-.patch | 2 +-
...xen-build-Fix-dependency-for-the-MAP-rule.patch | 2 +-
...evtchn-don-t-set-errno-to-negative-values.patch | 2 +-
...-ctrl-don-t-set-errno-to-a-negative-value.patch | 2 +-
...guest-don-t-set-errno-to-a-negative-value.patch | 2 +-
...light-don-t-set-errno-to-a-negative-value.patch | 2 +-
...mmu-cleanup-iommu-related-domctl-handling.patch | 2 +-
...-make-domctl-handler-tolerate-NULL-domain.patch | 2 +-
...-disallow-device-assignment-to-PoD-guests.patch | 2 +-
...-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch | 2 +-
0015-kconfig-detect-LD-implementation.patch | 2 +-
...-lld-do-not-generate-quoted-section-names.patch | 2 +-
...race-between-sending-an-I-O-and-domain-sh.patch | 2 +-
...ess-GNU-ld-warning-about-RWX-load-segment.patch | 2 +-
...ce-GNU-ld-warning-about-executable-stacks.patch | 2 +-
...0-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch | 2 +-
...llow-pci-phantom-to-mark-real-devices-as-.patch | 2 +-
0022-x86-pv-Clean-up-_get_page_type.patch | 2 +-
...v-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch | 2 +-
...troduce-_PAGE_-constants-for-memory-types.patch | 2 +-
...-change-the-cacheability-of-the-directmap.patch | 2 +-
...-Split-cache_flush-out-of-cache_writeback.patch | 2 +-
...rk-around-CLFLUSH-ordering-on-older-parts.patch | 2 +-
...ck-and-flush-non-coherent-mappings-of-RAM.patch | 2 +-
...unt-for-PGT_pae_xen_l2-in-recently-added-.patch | 2 +-
...rl-Make-VERW-flushing-runtime-conditional.patch | 2 +-
...rl-Enumeration-for-MMIO-Stale-Data-contro.patch | 2 +-
0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch | 2 +-
...ork-around-bogus-gcc12-warning-in-hvm_gsi.patch | 52 ++++
...i-dbgp-fix-selecting-n-th-ehci-controller.patch | 36 +++
0035-tools-xenstored-Harden-corrupt.patch | 44 +++
...rl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch | 93 +++++++
...rl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch | 234 ++++++++++++++++
0038-libxc-fix-compilation-error-with-gcc13.patch | 33 +++
...rl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch | 32 +++
...-Extend-parse_boolean-to-signal-a-name-ma.patch | 87 ++++++
...rl-Add-fine-grained-cmdline-suboptions-fo.patch | 137 +++++++++
...rs-fix-build-of-xen-init-dom0-with-Werror.patch | 28 ++
...-return-value-of-libxl__xs_directory-in-n.patch | 38 +++
...rl-Rework-spec_ctrl_flags-context-switchi.patch | 167 +++++++++++
...rl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch | 110 ++++++++
...rl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch | 97 +++++++
...ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch | 106 +++++++
0048-x86-spec-ctrl-Support-IBPB-on-entry.patch | 300 ++++++++++++++++++++
0049-x86-cpuid-Enumeration-for-BTC_NO.patch | 106 +++++++
0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch | 106 +++++++
...rl-Mitigate-Branch-Type-Confusion-when-po.patch | 305 +++++++++++++++++++++
info.txt | 4 +-
52 files changed, 2145 insertions(+), 34 deletions(-)
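Note on the diff that follows: for the 32 pre-existing patch files the only change is the Subject renumbering from [PATCH NN/32] to [PATCH NN/51]; new content is confined to 0033-0051 and info.txt. Such wholesale renumbering is the usual side effect of regenerating the whole queue with git format-patch, roughly as in this sketch (the branch and tag names are illustrative assumptions, not taken from this commit):

    # Sketch only; branch/tag names are assumptions.
    git checkout xen-4.16-backports                # local queue rebased on top of RELEASE-4.16.1
    git format-patch -o . RELEASE-4.16.1..HEAD     # writes 0001-*.patch ... 0051-*.patch
    # With multiple commits, format-patch numbers each mail as [PATCH n/51],
    # which is why adding patches 33-51 also rewrites every existing Subject: line.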
diff --git a/0001-update-Xen-version-to-4.16.2-pre.patch b/0001-update-Xen-version-to-4.16.2-pre.patch
index 30411de..2e62c21 100644
--- a/0001-update-Xen-version-to-4.16.2-pre.patch
+++ b/0001-update-Xen-version-to-4.16.2-pre.patch
@@ -1,7 +1,7 @@
From 5be9edb482ab20cf3e7acb05b511465294d1e19b Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 7 Jun 2022 13:55:17 +0200
-Subject: [PATCH 01/32] update Xen version to 4.16.2-pre
+Subject: [PATCH 01/51] update Xen version to 4.16.2-pre
---
xen/Makefile | 2 +-
diff --git a/0002-x86-irq-skip-unmap_domain_pirq-XSM-during-destructio.patch b/0002-x86-irq-skip-unmap_domain_pirq-XSM-during-destructio.patch
index fc6c2e1..0ba090e 100644
--- a/0002-x86-irq-skip-unmap_domain_pirq-XSM-during-destructio.patch
+++ b/0002-x86-irq-skip-unmap_domain_pirq-XSM-during-destructio.patch
@@ -1,7 +1,7 @@
From b58fb6e81bd55b6bd946abc3070770f7994c9ef9 Mon Sep 17 00:00:00 2001
From: Jason Andryuk <jandryuk@gmail.com>
Date: Tue, 7 Jun 2022 13:55:39 +0200
-Subject: [PATCH 02/32] x86/irq: skip unmap_domain_pirq XSM during destruction
+Subject: [PATCH 02/51] x86/irq: skip unmap_domain_pirq XSM during destruction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0003-xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch b/0003-xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch
index 905993b..fa1443c 100644
--- a/0003-xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch
+++ b/0003-xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch
@@ -1,7 +1,7 @@
From 6c6bbfdff9374ef41f84c4ebed7b8a7a40767ef6 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 7 Jun 2022 13:56:54 +0200
-Subject: [PATCH 03/32] xen: fix XEN_DOMCTL_gdbsx_guestmemio crash
+Subject: [PATCH 03/51] xen: fix XEN_DOMCTL_gdbsx_guestmemio crash
A hypervisor built without CONFIG_GDBSX will crash in case the
XEN_DOMCTL_gdbsx_guestmemio domctl is being called, as the call will
diff --git a/0004-VT-d-refuse-to-use-IOMMU-with-reserved-CAP.ND-value.patch b/0004-VT-d-refuse-to-use-IOMMU-with-reserved-CAP.ND-value.patch
index c566888..a4d229a 100644
--- a/0004-VT-d-refuse-to-use-IOMMU-with-reserved-CAP.ND-value.patch
+++ b/0004-VT-d-refuse-to-use-IOMMU-with-reserved-CAP.ND-value.patch
@@ -1,7 +1,7 @@
From b378ee56c7e0bb5eeb35dcc55b3d29e5f50eb566 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 7 Jun 2022 13:58:16 +0200
-Subject: [PATCH 04/32] VT-d: refuse to use IOMMU with reserved CAP.ND value
+Subject: [PATCH 04/51] VT-d: refuse to use IOMMU with reserved CAP.ND value
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0005-x86-mm-avoid-inadvertently-degrading-a-TLB-flush-to-.patch b/0005-x86-mm-avoid-inadvertently-degrading-a-TLB-flush-to-.patch
index 6410aaa..45a1825 100644
--- a/0005-x86-mm-avoid-inadvertently-degrading-a-TLB-flush-to-.patch
+++ b/0005-x86-mm-avoid-inadvertently-degrading-a-TLB-flush-to-.patch
@@ -1,7 +1,7 @@
From 7c003ab4a398ff4ddd54d15d4158cffb463134cc Mon Sep 17 00:00:00 2001
From: David Vrabel <dvrabel@amazon.co.uk>
Date: Tue, 7 Jun 2022 13:59:31 +0200
-Subject: [PATCH 05/32] x86/mm: avoid inadvertently degrading a TLB flush to
+Subject: [PATCH 05/51] x86/mm: avoid inadvertently degrading a TLB flush to
local only
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
diff --git a/0006-xen-build-Fix-dependency-for-the-MAP-rule.patch b/0006-xen-build-Fix-dependency-for-the-MAP-rule.patch
index 6489cba..7eb13cd 100644
--- a/0006-xen-build-Fix-dependency-for-the-MAP-rule.patch
+++ b/0006-xen-build-Fix-dependency-for-the-MAP-rule.patch
@@ -1,7 +1,7 @@
From 4bb8c34ba4241c2bf7845cd8b80c17530dbfb085 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue, 7 Jun 2022 14:00:09 +0200
-Subject: [PATCH 06/32] xen/build: Fix dependency for the MAP rule
+Subject: [PATCH 06/51] xen/build: Fix dependency for the MAP rule
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
diff --git a/0007-tools-libs-evtchn-don-t-set-errno-to-negative-values.patch b/0007-tools-libs-evtchn-don-t-set-errno-to-negative-values.patch
index 2f02fcc..ed98922 100644
--- a/0007-tools-libs-evtchn-don-t-set-errno-to-negative-values.patch
+++ b/0007-tools-libs-evtchn-don-t-set-errno-to-negative-values.patch
@@ -1,7 +1,7 @@
From 13a29f3756bc4cab96c59f46c3875b483553fb8f Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 7 Jun 2022 14:00:31 +0200
-Subject: [PATCH 07/32] tools/libs/evtchn: don't set errno to negative values
+Subject: [PATCH 07/51] tools/libs/evtchn: don't set errno to negative values
Setting errno to a negative value makes no sense.
diff --git a/0008-tools-libs-ctrl-don-t-set-errno-to-a-negative-value.patch b/0008-tools-libs-ctrl-don-t-set-errno-to-a-negative-value.patch
index acd7955..166f0ff 100644
--- a/0008-tools-libs-ctrl-don-t-set-errno-to-a-negative-value.patch
+++ b/0008-tools-libs-ctrl-don-t-set-errno-to-a-negative-value.patch
@@ -1,7 +1,7 @@
From ba62afdbc31a8cfe897191efd25ed4449d9acd94 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 7 Jun 2022 14:01:03 +0200
-Subject: [PATCH 08/32] tools/libs/ctrl: don't set errno to a negative value
+Subject: [PATCH 08/51] tools/libs/ctrl: don't set errno to a negative value
The claimed reason for setting errno to -1 is wrong. On x86
xc_domain_pod_target() will set errno to a sane value in the error
diff --git a/0009-tools-libs-guest-don-t-set-errno-to-a-negative-value.patch b/0009-tools-libs-guest-don-t-set-errno-to-a-negative-value.patch
index 41eb1f1..5d035f6 100644
--- a/0009-tools-libs-guest-don-t-set-errno-to-a-negative-value.patch
+++ b/0009-tools-libs-guest-don-t-set-errno-to-a-negative-value.patch
@@ -1,7 +1,7 @@
From a2cf30eec08db5df974a9e8bb7366fee8fc7fcd9 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 7 Jun 2022 14:01:27 +0200
-Subject: [PATCH 09/32] tools/libs/guest: don't set errno to a negative value
+Subject: [PATCH 09/51] tools/libs/guest: don't set errno to a negative value
Setting errno to a negative error value makes no sense.
diff --git a/0010-tools-libs-light-don-t-set-errno-to-a-negative-value.patch b/0010-tools-libs-light-don-t-set-errno-to-a-negative-value.patch
index a83e1cc..ac900ae 100644
--- a/0010-tools-libs-light-don-t-set-errno-to-a-negative-value.patch
+++ b/0010-tools-libs-light-don-t-set-errno-to-a-negative-value.patch
@@ -1,7 +1,7 @@
From 15391de8e2bb6153eadd483154c53044ab53d98d Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 7 Jun 2022 14:01:44 +0200
-Subject: [PATCH 10/32] tools/libs/light: don't set errno to a negative value
+Subject: [PATCH 10/51] tools/libs/light: don't set errno to a negative value
Setting errno to a negative value makes no sense.
diff --git a/0011-xen-iommu-cleanup-iommu-related-domctl-handling.patch b/0011-xen-iommu-cleanup-iommu-related-domctl-handling.patch
index b62ae9b..3c60de4 100644
--- a/0011-xen-iommu-cleanup-iommu-related-domctl-handling.patch
+++ b/0011-xen-iommu-cleanup-iommu-related-domctl-handling.patch
@@ -1,7 +1,7 @@
From a6c32abd144ec6443c6a433b5a2ac00e2615aa86 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 7 Jun 2022 14:02:08 +0200
-Subject: [PATCH 11/32] xen/iommu: cleanup iommu related domctl handling
+Subject: [PATCH 11/51] xen/iommu: cleanup iommu related domctl handling
Today iommu_do_domctl() is being called from arch_do_domctl() in the
"default:" case of a switch statement. This has led already to crashes
diff --git a/0012-IOMMU-make-domctl-handler-tolerate-NULL-domain.patch b/0012-IOMMU-make-domctl-handler-tolerate-NULL-domain.patch
index ff26651..37b9005 100644
--- a/0012-IOMMU-make-domctl-handler-tolerate-NULL-domain.patch
+++ b/0012-IOMMU-make-domctl-handler-tolerate-NULL-domain.patch
@@ -1,7 +1,7 @@
From 4cf9a7c7bdb9d544fbac81105bbc1059ba3dd932 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 7 Jun 2022 14:02:30 +0200
-Subject: [PATCH 12/32] IOMMU: make domctl handler tolerate NULL domain
+Subject: [PATCH 12/51] IOMMU: make domctl handler tolerate NULL domain
Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX,
XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed
diff --git a/0013-IOMMU-x86-disallow-device-assignment-to-PoD-guests.patch b/0013-IOMMU-x86-disallow-device-assignment-to-PoD-guests.patch
index efadef6..8416c96 100644
--- a/0013-IOMMU-x86-disallow-device-assignment-to-PoD-guests.patch
+++ b/0013-IOMMU-x86-disallow-device-assignment-to-PoD-guests.patch
@@ -1,7 +1,7 @@
From 838f6c211f7f05f107e1acdfb0977ab61ec0bf2e Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 7 Jun 2022 14:03:20 +0200
-Subject: [PATCH 13/32] IOMMU/x86: disallow device assignment to PoD guests
+Subject: [PATCH 13/51] IOMMU/x86: disallow device assignment to PoD guests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0014-x86-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch b/0014-x86-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch
index 09f56f5..69049f1 100644
--- a/0014-x86-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch
+++ b/0014-x86-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch
@@ -1,7 +1,7 @@
From 9ebe2ba83644ec6cd33a93c68dab5f551adcbea0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 7 Jun 2022 14:04:16 +0200
-Subject: [PATCH 14/32] x86/msr: handle reads to MSR_P5_MC_{ADDR,TYPE}
+Subject: [PATCH 14/51] x86/msr: handle reads to MSR_P5_MC_{ADDR,TYPE}
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0015-kconfig-detect-LD-implementation.patch b/0015-kconfig-detect-LD-implementation.patch
index f2fc24a..4507bc7 100644
--- a/0015-kconfig-detect-LD-implementation.patch
+++ b/0015-kconfig-detect-LD-implementation.patch
@@ -1,7 +1,7 @@
From 3754bd128d1a6b3d5864d1a3ee5d27b67d35387a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 7 Jun 2022 14:05:06 +0200
-Subject: [PATCH 15/32] kconfig: detect LD implementation
+Subject: [PATCH 15/51] kconfig: detect LD implementation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0016-linker-lld-do-not-generate-quoted-section-names.patch b/0016-linker-lld-do-not-generate-quoted-section-names.patch
index a42083e..5b3a8cd 100644
--- a/0016-linker-lld-do-not-generate-quoted-section-names.patch
+++ b/0016-linker-lld-do-not-generate-quoted-section-names.patch
@@ -1,7 +1,7 @@
From 88b653f73928117461dc250acd1e830a47a14c2b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 7 Jun 2022 14:05:24 +0200
-Subject: [PATCH 16/32] linker/lld: do not generate quoted section names
+Subject: [PATCH 16/51] linker/lld: do not generate quoted section names
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0017-xen-io-Fix-race-between-sending-an-I-O-and-domain-sh.patch b/0017-xen-io-Fix-race-between-sending-an-I-O-and-domain-sh.patch
index d226e97..bc48a84 100644
--- a/0017-xen-io-Fix-race-between-sending-an-I-O-and-domain-sh.patch
+++ b/0017-xen-io-Fix-race-between-sending-an-I-O-and-domain-sh.patch
@@ -1,7 +1,7 @@
From 982a314bd3000a16c3128afadb36a8ff41029adc Mon Sep 17 00:00:00 2001
From: Julien Grall <jgrall@amazon.com>
Date: Tue, 7 Jun 2022 14:06:11 +0200
-Subject: [PATCH 17/32] xen: io: Fix race between sending an I/O and domain
+Subject: [PATCH 17/51] xen: io: Fix race between sending an I/O and domain
shutdown
Xen provides hypercalls to shutdown (SCHEDOP_shutdown{,_code}) and
diff --git a/0018-build-suppress-GNU-ld-warning-about-RWX-load-segment.patch b/0018-build-suppress-GNU-ld-warning-about-RWX-load-segment.patch
index 87a0873..b20a99a 100644
--- a/0018-build-suppress-GNU-ld-warning-about-RWX-load-segment.patch
+++ b/0018-build-suppress-GNU-ld-warning-about-RWX-load-segment.patch
@@ -1,7 +1,7 @@
From 4890031d224262a6cf43d3bef1af4a16c13db306 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 7 Jun 2022 14:06:51 +0200
-Subject: [PATCH 18/32] build: suppress GNU ld warning about RWX load segments
+Subject: [PATCH 18/51] build: suppress GNU ld warning about RWX load segments
We cannot really avoid such and we're also not really at risk because of
them, as we control page table permissions ourselves rather than relying
diff --git a/0019-build-silence-GNU-ld-warning-about-executable-stacks.patch b/0019-build-silence-GNU-ld-warning-about-executable-stacks.patch
index 75e9f7e..e4d739b 100644
--- a/0019-build-silence-GNU-ld-warning-about-executable-stacks.patch
+++ b/0019-build-silence-GNU-ld-warning-about-executable-stacks.patch
@@ -1,7 +1,7 @@
From 1bc669a568a9f4bdab9e9ddb95823ba370dc0baf Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 7 Jun 2022 14:07:11 +0200
-Subject: [PATCH 19/32] build: silence GNU ld warning about executable stacks
+Subject: [PATCH 19/51] build: silence GNU ld warning about executable stacks
While for C files the compiler is supposed to arrange for emitting
respective information, for assembly sources we're responsible ourselves.
diff --git a/0020-ns16550-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch b/0020-ns16550-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch
index b83be9a..baa1e15 100644
--- a/0020-ns16550-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch
+++ b/0020-ns16550-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch
@@ -2,7 +2,7 @@ From f1be0b62a03b90a40a03e21f965e4cbb89809bb1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
<marmarek@invisiblethingslab.com>
Date: Tue, 7 Jun 2022 14:07:34 +0200
-Subject: [PATCH 20/32] ns16550: use poll mode if INTERRUPT_LINE is 0xff
+Subject: [PATCH 20/51] ns16550: use poll mode if INTERRUPT_LINE is 0xff
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0021-PCI-don-t-allow-pci-phantom-to-mark-real-devices-as-.patch b/0021-PCI-don-t-allow-pci-phantom-to-mark-real-devices-as-.patch
index 1264578..1312bda 100644
--- a/0021-PCI-don-t-allow-pci-phantom-to-mark-real-devices-as-.patch
+++ b/0021-PCI-don-t-allow-pci-phantom-to-mark-real-devices-as-.patch
@@ -1,7 +1,7 @@
From 8e11ec8fbf6f933f8854f4bc54226653316903f2 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 7 Jun 2022 14:08:06 +0200
-Subject: [PATCH 21/32] PCI: don't allow "pci-phantom=" to mark real devices as
+Subject: [PATCH 21/51] PCI: don't allow "pci-phantom=" to mark real devices as
phantom functions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
diff --git a/0022-x86-pv-Clean-up-_get_page_type.patch b/0022-x86-pv-Clean-up-_get_page_type.patch
index a6008b0..0270beb 100644
--- a/0022-x86-pv-Clean-up-_get_page_type.patch
+++ b/0022-x86-pv-Clean-up-_get_page_type.patch
@@ -1,7 +1,7 @@
From b152dfbc3ad71a788996440b18174d995c3bffc9 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 9 Jun 2022 15:27:19 +0200
-Subject: [PATCH 22/32] x86/pv: Clean up _get_page_type()
+Subject: [PATCH 22/51] x86/pv: Clean up _get_page_type()
Various fixes for clarity, ahead of making complicated changes.
diff --git a/0023-x86-pv-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch b/0023-x86-pv-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch
index 2f4b734..1e3febd 100644
--- a/0023-x86-pv-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch
+++ b/0023-x86-pv-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch
@@ -1,7 +1,7 @@
From 8dab3f79b122e69cbcdebca72cdc14f004ee2193 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 9 Jun 2022 15:27:37 +0200
-Subject: [PATCH 23/32] x86/pv: Fix ABAC cmpxchg() race in _get_page_type()
+Subject: [PATCH 23/51] x86/pv: Fix ABAC cmpxchg() race in _get_page_type()
_get_page_type() suffers from a race condition where it incorrectly assumes
that because 'x' was read and a subsequent a cmpxchg() succeeds, the type
diff --git a/0024-x86-page-Introduce-_PAGE_-constants-for-memory-types.patch b/0024-x86-page-Introduce-_PAGE_-constants-for-memory-types.patch
index c8c2dda..409b72f 100644
--- a/0024-x86-page-Introduce-_PAGE_-constants-for-memory-types.patch
+++ b/0024-x86-page-Introduce-_PAGE_-constants-for-memory-types.patch
@@ -1,7 +1,7 @@
From 9cfd796ae05421ded8e4f70b2c55352491cfa841 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 9 Jun 2022 15:27:53 +0200
-Subject: [PATCH 24/32] x86/page: Introduce _PAGE_* constants for memory types
+Subject: [PATCH 24/51] x86/page: Introduce _PAGE_* constants for memory types
... rather than opencoding the PAT/PCD/PWT attributes in __PAGE_HYPERVISOR_*
constants. These are going to be needed by forthcoming logic.
diff --git a/0025-x86-Don-t-change-the-cacheability-of-the-directmap.patch b/0025-x86-Don-t-change-the-cacheability-of-the-directmap.patch
index 582fc74..0a24a0a 100644
--- a/0025-x86-Don-t-change-the-cacheability-of-the-directmap.patch
+++ b/0025-x86-Don-t-change-the-cacheability-of-the-directmap.patch
@@ -1,7 +1,7 @@
From 74193f4292d9cfc2874866e941d9939d8f33fcef Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 9 Jun 2022 15:28:23 +0200
-Subject: [PATCH 25/32] x86: Don't change the cacheability of the directmap
+Subject: [PATCH 25/51] x86: Don't change the cacheability of the directmap
Changeset 55f97f49b7ce ("x86: Change cache attributes of Xen 1:1 page mappings
in response to guest mapping requests") attempted to keep the cacheability
diff --git a/0026-x86-Split-cache_flush-out-of-cache_writeback.patch b/0026-x86-Split-cache_flush-out-of-cache_writeback.patch
index ffd8d7c..50f70f4 100644
--- a/0026-x86-Split-cache_flush-out-of-cache_writeback.patch
+++ b/0026-x86-Split-cache_flush-out-of-cache_writeback.patch
@@ -1,7 +1,7 @@
From 8eafa2d871ae51d461256e4a14175e24df330c70 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 9 Jun 2022 15:28:48 +0200
-Subject: [PATCH 26/32] x86: Split cache_flush() out of cache_writeback()
+Subject: [PATCH 26/51] x86: Split cache_flush() out of cache_writeback()
Subsequent changes will want a fully flushing version.
diff --git a/0027-x86-amd-Work-around-CLFLUSH-ordering-on-older-parts.patch b/0027-x86-amd-Work-around-CLFLUSH-ordering-on-older-parts.patch
index a3ab379..060bc99 100644
--- a/0027-x86-amd-Work-around-CLFLUSH-ordering-on-older-parts.patch
+++ b/0027-x86-amd-Work-around-CLFLUSH-ordering-on-older-parts.patch
@@ -1,7 +1,7 @@
From c4815be949aae6583a9a22897beb96b095b4f1a2 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 9 Jun 2022 15:29:13 +0200
-Subject: [PATCH 27/32] x86/amd: Work around CLFLUSH ordering on older parts
+Subject: [PATCH 27/51] x86/amd: Work around CLFLUSH ordering on older parts
On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakely ordered with everything,
including reads and writes to the address, and LFENCE/SFENCE instructions.
diff --git a/0028-x86-pv-Track-and-flush-non-coherent-mappings-of-RAM.patch b/0028-x86-pv-Track-and-flush-non-coherent-mappings-of-RAM.patch
index 66cd741..af60348 100644
--- a/0028-x86-pv-Track-and-flush-non-coherent-mappings-of-RAM.patch
+++ b/0028-x86-pv-Track-and-flush-non-coherent-mappings-of-RAM.patch
@@ -1,7 +1,7 @@
From dc020d8d1ba420e2dd0e7a40f5045db897f3c4f4 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 9 Jun 2022 15:29:38 +0200
-Subject: [PATCH 28/32] x86/pv: Track and flush non-coherent mappings of RAM
+Subject: [PATCH 28/51] x86/pv: Track and flush non-coherent mappings of RAM
There are legitimate uses of WC mappings of RAM, e.g. for DMA buffers with
devices that make non-coherent writes. The Linux sound subsystem makes
diff --git a/0029-x86-mm-account-for-PGT_pae_xen_l2-in-recently-added-.patch b/0029-x86-mm-account-for-PGT_pae_xen_l2-in-recently-added-.patch
index 0076984..90ce4cf 100644
--- a/0029-x86-mm-account-for-PGT_pae_xen_l2-in-recently-added-.patch
+++ b/0029-x86-mm-account-for-PGT_pae_xen_l2-in-recently-added-.patch
@@ -1,7 +1,7 @@
From 0b4e62847c5af1a59eea8d17093feccd550d1c26 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Fri, 10 Jun 2022 10:28:28 +0200
-Subject: [PATCH 29/32] x86/mm: account for PGT_pae_xen_l2 in recently added
+Subject: [PATCH 29/51] x86/mm: account for PGT_pae_xen_l2 in recently added
assertion
While PGT_pae_xen_l2 will be zapped once the type refcount of an L2 page
diff --git a/0030-x86-spec-ctrl-Make-VERW-flushing-runtime-conditional.patch b/0030-x86-spec-ctrl-Make-VERW-flushing-runtime-conditional.patch
index 8556452..af25b5c 100644
--- a/0030-x86-spec-ctrl-Make-VERW-flushing-runtime-conditional.patch
+++ b/0030-x86-spec-ctrl-Make-VERW-flushing-runtime-conditional.patch
@@ -1,7 +1,7 @@
From 0e80f9f61168d4e4f008da75762cee0118f802ed Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Mon, 13 Jun 2022 16:19:01 +0100
-Subject: [PATCH 30/32] x86/spec-ctrl: Make VERW flushing runtime conditional
+Subject: [PATCH 30/51] x86/spec-ctrl: Make VERW flushing runtime conditional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0031-x86-spec-ctrl-Enumeration-for-MMIO-Stale-Data-contro.patch b/0031-x86-spec-ctrl-Enumeration-for-MMIO-Stale-Data-contro.patch
index 6934800..3b91fb5 100644
--- a/0031-x86-spec-ctrl-Enumeration-for-MMIO-Stale-Data-contro.patch
+++ b/0031-x86-spec-ctrl-Enumeration-for-MMIO-Stale-Data-contro.patch
@@ -1,7 +1,7 @@
From a83108736db0ddaa5855f5abda6dcc8ae4fe25e9 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Mon, 20 Sep 2021 18:47:49 +0100
-Subject: [PATCH 31/32] x86/spec-ctrl: Enumeration for MMIO Stale Data controls
+Subject: [PATCH 31/51] x86/spec-ctrl: Enumeration for MMIO Stale Data controls
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch b/0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch
index a5ac3e9..c63891a 100644
--- a/0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch
+++ b/0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch
@@ -1,7 +1,7 @@
From 2e82446cb252f6c8ac697e81f4155872c69afde4 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Mon, 13 Jun 2022 19:18:32 +0100
-Subject: [PATCH 32/32] x86/spec-ctrl: Add spec-ctrl=unpriv-mmio
+Subject: [PATCH 32/51] x86/spec-ctrl: Add spec-ctrl=unpriv-mmio
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0033-IOMMU-x86-work-around-bogus-gcc12-warning-in-hvm_gsi.patch b/0033-IOMMU-x86-work-around-bogus-gcc12-warning-in-hvm_gsi.patch
new file mode 100644
index 0000000..07f488d
--- /dev/null
+++ b/0033-IOMMU-x86-work-around-bogus-gcc12-warning-in-hvm_gsi.patch
@@ -0,0 +1,52 @@
+From 460b08d6c6c16b3f32aa138e772b759ae02a4479 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 12 Jul 2022 11:10:34 +0200
+Subject: [PATCH 33/51] IOMMU/x86: work around bogus gcc12 warning in
+ hvm_gsi_eoi()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+As per [1] the expansion of the pirq_dpci() macro causes a -Waddress
+controlled warning (enabled implicitly in our builds, if not by default)
+tying the middle part of the involved conditional expression to the
+surrounding boolean context. Work around this by introducing a local
+inline function in the affected source file.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+
+[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102967
+master commit: 80ad8db8a4d9bb24952f0aea788ce6f47566fa76
+master date: 2022-06-15 10:19:32 +0200
+---
+ xen/drivers/passthrough/x86/hvm.c | 12 ++++++++++++
+ 1 file changed, 12 insertions(+)
+
+diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
+index 0b37cd145b60..ba0f6c53d742 100644
+--- a/xen/drivers/passthrough/x86/hvm.c
++++ b/xen/drivers/passthrough/x86/hvm.c
+@@ -25,6 +25,18 @@
+ #include <asm/hvm/support.h>
+ #include <asm/io_apic.h>
+
++/*
++ * Gcc12 takes issue with pirq_dpci() being used in boolean context (see gcc
++ * bug 102967). While we can't replace the macro definition in the header by an
++ * inline function, we can do so here.
++ */
++static inline struct hvm_pirq_dpci *_pirq_dpci(struct pirq *pirq)
++{
++ return pirq_dpci(pirq);
++}
++#undef pirq_dpci
++#define pirq_dpci(pirq) _pirq_dpci(pirq)
++
+ static DEFINE_PER_CPU(struct list_head, dpci_list);
+
+ /*
+--
+2.35.1
+
diff --git a/0034-ehci-dbgp-fix-selecting-n-th-ehci-controller.patch b/0034-ehci-dbgp-fix-selecting-n-th-ehci-controller.patch
new file mode 100644
index 0000000..ac71ab8
--- /dev/null
+++ b/0034-ehci-dbgp-fix-selecting-n-th-ehci-controller.patch
@@ -0,0 +1,36 @@
+From 5cb8142076ce1ce53eafd7e00acb4d0eac4e7784 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
+ <marmarek@invisiblethingslab.com>
+Date: Tue, 12 Jul 2022 11:11:35 +0200
+Subject: [PATCH 34/51] ehci-dbgp: fix selecting n-th ehci controller
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The ehci<n> number was parsed but ignored.
+
+Fixes: 322ecbe4ac85 ("console: add EHCI debug port based serial console")
+Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: d6d0cb659fda64430d4649f8680c5cead32da8fd
+master date: 2022-06-16 14:23:37 +0100
+---
+ xen/drivers/char/ehci-dbgp.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/drivers/char/ehci-dbgp.c b/xen/drivers/char/ehci-dbgp.c
+index c893d246defa..66b4811af24a 100644
+--- a/xen/drivers/char/ehci-dbgp.c
++++ b/xen/drivers/char/ehci-dbgp.c
+@@ -1478,7 +1478,7 @@ void __init ehci_dbgp_init(void)
+ unsigned int num = 0;
+
+ if ( opt_dbgp[4] )
+- simple_strtoul(opt_dbgp + 4, &e, 10);
++ num = simple_strtoul(opt_dbgp + 4, &e, 10);
+
+ dbgp->cap = find_dbgp(dbgp, num);
+ if ( !dbgp->cap )
+--
+2.35.1
+
diff --git a/0035-tools-xenstored-Harden-corrupt.patch b/0035-tools-xenstored-Harden-corrupt.patch
new file mode 100644
index 0000000..bb0f7f1
--- /dev/null
+++ b/0035-tools-xenstored-Harden-corrupt.patch
@@ -0,0 +1,44 @@
+From 81ee3d08351be1ef2a14d371993604098d6a4673 Mon Sep 17 00:00:00 2001
+From: Julien Grall <jgrall@amazon.com>
+Date: Tue, 12 Jul 2022 11:12:13 +0200
+Subject: [PATCH 35/51] tools/xenstored: Harden corrupt()
+
+At the moment, corrupt() is neither checking for allocation failure
+nor freeing the allocated memory.
+
+Harden the code by printing ENOMEM if the allocation failed and
+free 'str' after the last use.
+
+This is not considered to be a security issue because corrupt() should
+only be called when Xenstored thinks the database is corrupted. Note
+that the trigger (i.e. a guest reliably provoking the call) would be
+a security issue.
+
+Fixes: 06d17943f0cd ("Added a basic integrity checker, and some basic ability to recover from store")
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: db3382dd4f468c763512d6bf91c96773395058fb
+master date: 2022-06-23 13:44:10 +0100
+---
+ tools/xenstore/xenstored_core.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 91d093a12ea6..0c8ee276f837 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -2087,7 +2087,10 @@ void corrupt(struct connection *conn, const char *fmt, ...)
+ va_end(arglist);
+
+ log("corruption detected by connection %i: err %s: %s",
+- conn ? (int)conn->id : -1, strerror(saved_errno), str);
++ conn ? (int)conn->id : -1, strerror(saved_errno),
++ str ?: "ENOMEM");
++
++ talloc_free(str);
+
+ check_store();
+ }
+--
+2.35.1
+
diff --git a/0036-x86-spec-ctrl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch b/0036-x86-spec-ctrl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch
new file mode 100644
index 0000000..8bc0768
--- /dev/null
+++ b/0036-x86-spec-ctrl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch
@@ -0,0 +1,93 @@
+From 09d533f4c80b7eaf9fb4e36ebba8259580857a9d Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 12 Jul 2022 11:12:46 +0200
+Subject: [PATCH 36/51] x86/spec-ctrl: Only adjust MSR_SPEC_CTRL for idle with
+ legacy IBRS
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Back at the time of the original Spectre-v2 fixes, it was recommended to clear
+MSR_SPEC_CTRL when going idle. This is because of the side effects on the
+sibling thread caused by the microcode IBRS and STIBP implementations which
+were retrofitted to existing CPUs.
+
+However, there are no relevant cross-thread impacts for the hardware
+IBRS/STIBP implementations, so this logic should not be used on Intel CPUs
+supporting eIBRS, or any AMD CPUs; doing so only adds unnecessary latency to
+the idle path.
+
+Furthermore, there's no point playing with MSR_SPEC_CTRL in the idle paths if
+SMT is disabled for other reasons.
+
+Fixes: 8d03080d2a33 ("x86/spec-ctrl: Cease using thunk=lfence on AMD")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: ffc7694e0c99eea158c32aa164b7d1e1bb1dc46b
+master date: 2022-06-30 18:07:13 +0100
+---
+ xen/arch/x86/spec_ctrl.c | 10 ++++++++--
+ xen/include/asm-x86/cpufeatures.h | 2 +-
+ xen/include/asm-x86/spec_ctrl.h | 5 +++--
+ 3 files changed, 12 insertions(+), 5 deletions(-)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 099113ba41e6..1ed5ceda8b46 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -1150,8 +1150,14 @@ void __init init_speculation_mitigations(void)
+ /* (Re)init BSP state now that default_spec_ctrl_flags has been calculated. */
+ init_shadow_spec_ctrl_state();
+
+- /* If Xen is using any MSR_SPEC_CTRL settings, adjust the idle path. */
+- if ( default_xen_spec_ctrl )
++ /*
++ * For microcoded IBRS only (i.e. Intel, pre eIBRS), it is recommended to
++ * clear MSR_SPEC_CTRL before going idle, to avoid impacting sibling
++ * threads. Activate this if SMT is enabled, and Xen is using a non-zero
++ * MSR_SPEC_CTRL setting.
++ */
++ if ( boot_cpu_has(X86_FEATURE_IBRSB) && !(caps & ARCH_CAPS_IBRS_ALL) &&
++ hw_smt_enabled && default_xen_spec_ctrl )
+ setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);
+
+ xpti_init_default(caps);
+diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
+index bd45a144ee78..493d338a085e 100644
+--- a/xen/include/asm-x86/cpufeatures.h
++++ b/xen/include/asm-x86/cpufeatures.h
+@@ -33,7 +33,7 @@ XEN_CPUFEATURE(SC_MSR_HVM, X86_SYNTH(17)) /* MSR_SPEC_CTRL used by Xen fo
+ XEN_CPUFEATURE(SC_RSB_PV, X86_SYNTH(18)) /* RSB overwrite needed for PV */
+ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM */
+ XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
+-XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
++XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
+ XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
+ /* Bits 23,24 unused. */
+ XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index 751355f471f4..7e83e0179fb9 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -78,7 +78,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
+ uint32_t val = 0;
+
+ /*
+- * Branch Target Injection:
++ * It is recommended in some cases to clear MSR_SPEC_CTRL when going idle,
++ * to avoid impacting sibling threads.
+ *
+ * Latch the new shadow value, then enable shadowing, then update the MSR.
+ * There are no SMP issues here; only local processor ordering concerns.
+@@ -114,7 +115,7 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
+ uint32_t val = info->xen_spec_ctrl;
+
+ /*
+- * Branch Target Injection:
++ * Restore MSR_SPEC_CTRL on exit from idle.
+ *
+ * Disable shadowing before updating the MSR. There are no SMP issues
+ * here; only local processor ordering concerns.
+--
+2.35.1
+
diff --git a/0037-x86-spec-ctrl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch b/0037-x86-spec-ctrl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch
new file mode 100644
index 0000000..156aa58
--- /dev/null
+++ b/0037-x86-spec-ctrl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch
@@ -0,0 +1,234 @@
+From db6ca8176ccc4ff7dfe3c06969af9ebfab0d7b04 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 12 Jul 2022 11:13:33 +0200
+Subject: [PATCH 37/51] x86/spec-ctrl: Knobs for STIBP and PSFD, and follow
+ hardware STIBP hint
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+STIBP and PSFD are slightly weird bits, because they're both implied by other
+bits in MSR_SPEC_CTRL. Add fine grain controls for them, and take the
+implications into account when setting IBRS/SSBD.
+
+Rearrange the IBPB text/variables/logic to keep all the MSR_SPEC_CTRL bits
+together, for consistency.
+
+However, AMD have a hardware hint CPUID bit recommending that STIBP be set
+unilaterally. This is advertised on Zen3, so follow the recommendation.
+Furthermore, in such cases, set STIBP behind the guest's back for now. This
+has negligible overhead for the guest, but saves a WRMSR on vmentry. This is
+the only default change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: fef244b179c06fcdfa581f7d57fa6e578c49ff50
+master date: 2022-06-30 18:07:13 +0100
+---
+ docs/misc/xen-command-line.pandoc | 21 +++++++---
+ xen/arch/x86/hvm/svm/vmcb.c | 9 +++++
+ xen/arch/x86/spec_ctrl.c | 67 ++++++++++++++++++++++++++-----
+ 3 files changed, 82 insertions(+), 15 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index a642e43476a2..46e9c58d35cd 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2234,8 +2234,9 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
+
+ ### spec-ctrl (x86)
+ > `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
+-> bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
+-> l1d-flush,branch-harden,srb-lock,unpriv-mmio}=<bool> ]`
++> bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
++> eager-fpu,l1d-flush,branch-harden,srb-lock,
++> unpriv-mmio}=<bool> ]`
+
+ Controls for speculative execution sidechannel mitigations. By default, Xen
+ will pick the most appropriate mitigations based on compiled in support,
+@@ -2285,9 +2286,10 @@ On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
+ If Xen is not using IBRS itself, functionality is still set up so IBRS can be
+ virtualised for guests.
+
+-On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
+-option can be used to force (the default) or prevent Xen from issuing branch
+-prediction barriers on vcpu context switches.
++On hardware supporting STIBP (Single Thread Indirect Branch Predictors), the
++`stibp=` option can be used to force or prevent Xen using the feature itself.
++By default, Xen will use STIBP when IBRS is in use (IBRS implies STIBP), and
++when hardware hints recommend using it as a blanket setting.
+
+ On hardware supporting SSBD (Speculative Store Bypass Disable), the `ssbd=`
+ option can be used to force or prevent Xen using the feature itself. On AMD
+@@ -2295,6 +2297,15 @@ hardware, this is a global option applied at boot, and not virtualised for
+ guest use. On Intel hardware, the feature is virtualised for guests,
+ independently of Xen's choice of setting.
+
++On hardware supporting PSFD (Predictive Store Forwarding Disable), the `psfd=`
++option can be used to force or prevent Xen using the feature itself. By
++default, Xen will not use PSFD. PSFD is implied by SSBD, and SSBD is off by
++default.
++
++On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
++option can be used to force (the default) or prevent Xen from issuing branch
++prediction barriers on vcpu context switches.
++
+ On all hardware, the `eager-fpu=` option can be used to force or prevent Xen
+ from using fully eager FPU context switches. This is currently implemented as
+ a global control. By default, Xen will choose to use fully eager context
+diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
+index 565e997155f2..ef7224eb5dd7 100644
+--- a/xen/arch/x86/hvm/svm/vmcb.c
++++ b/xen/arch/x86/hvm/svm/vmcb.c
+@@ -29,6 +29,7 @@
+ #include <asm/hvm/support.h>
+ #include <asm/hvm/svm/svm.h>
+ #include <asm/hvm/svm/svmdebug.h>
++#include <asm/spec_ctrl.h>
+
+ struct vmcb_struct *alloc_vmcb(void)
+ {
+@@ -176,6 +177,14 @@ static int construct_vmcb(struct vcpu *v)
+ vmcb->_pause_filter_thresh = SVM_PAUSETHRESH_INIT;
+ }
+
++ /*
++ * When default_xen_spec_ctrl simply SPEC_CTRL_STIBP, default this behind
++ * the back of the VM too. Our SMT topology isn't accurate, the overhead
++ * is neglegable, and doing this saves a WRMSR on the vmentry path.
++ */
++ if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
++ v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
++
+ return 0;
+ }
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 1ed5ceda8b46..dfdd45c358c4 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -48,9 +48,13 @@ static enum ind_thunk {
+ THUNK_LFENCE,
+ THUNK_JMP,
+ } opt_thunk __initdata = THUNK_DEFAULT;
++
+ static int8_t __initdata opt_ibrs = -1;
++int8_t __initdata opt_stibp = -1;
++bool __read_mostly opt_ssbd;
++int8_t __initdata opt_psfd = -1;
++
+ bool __read_mostly opt_ibpb = true;
+-bool __read_mostly opt_ssbd = false;
+ int8_t __read_mostly opt_eager_fpu = -1;
+ int8_t __read_mostly opt_l1d_flush = -1;
+ static bool __initdata opt_branch_harden = true;
+@@ -172,12 +176,20 @@ static int __init parse_spec_ctrl(const char *s)
+ else
+ rc = -EINVAL;
+ }
++
++ /* Bits in MSR_SPEC_CTRL. */
+ else if ( (val = parse_boolean("ibrs", s, ss)) >= 0 )
+ opt_ibrs = val;
+- else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+- opt_ibpb = val;
++ else if ( (val = parse_boolean("stibp", s, ss)) >= 0 )
++ opt_stibp = val;
+ else if ( (val = parse_boolean("ssbd", s, ss)) >= 0 )
+ opt_ssbd = val;
++ else if ( (val = parse_boolean("psfd", s, ss)) >= 0 )
++ opt_psfd = val;
++
++ /* Misc settings. */
++ else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
++ opt_ibpb = val;
+ else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
+ opt_eager_fpu = val;
+ else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
+@@ -376,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ "\n");
+
+ /* Settings for Xen's protection, irrespective of guests. */
+- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s, Other:%s%s%s%s%s\n",
++ printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s\n",
+ thunk == THUNK_NONE ? "N/A" :
+ thunk == THUNK_RETPOLINE ? "RETPOLINE" :
+ thunk == THUNK_LFENCE ? "LFENCE" :
+@@ -390,6 +402,9 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ (!boot_cpu_has(X86_FEATURE_SSBD) &&
+ !boot_cpu_has(X86_FEATURE_AMD_SSBD)) ? "" :
+ (default_xen_spec_ctrl & SPEC_CTRL_SSBD) ? " SSBD+" : " SSBD-",
++ (!boot_cpu_has(X86_FEATURE_PSFD) &&
++ !boot_cpu_has(X86_FEATURE_INTEL_PSFD)) ? "" :
++ (default_xen_spec_ctrl & SPEC_CTRL_PSFD) ? " PSFD+" : " PSFD-",
+ !(caps & ARCH_CAPS_TSX_CTRL) ? "" :
+ (opt_tsx & 1) ? " TSX+" : " TSX-",
+ !cpu_has_srbds_ctrl ? "" :
+@@ -979,10 +994,7 @@ void __init init_speculation_mitigations(void)
+ if ( !has_spec_ctrl )
+ printk(XENLOG_WARNING "?!? CET active, but no MSR_SPEC_CTRL?\n");
+ else if ( opt_ibrs == -1 )
+- {
+ opt_ibrs = ibrs = true;
+- default_xen_spec_ctrl |= SPEC_CTRL_IBRS | SPEC_CTRL_STIBP;
+- }
+
+ if ( opt_thunk == THUNK_DEFAULT || opt_thunk == THUNK_RETPOLINE )
+ thunk = THUNK_JMP;
+@@ -1086,14 +1098,49 @@ void __init init_speculation_mitigations(void)
+ setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
+ }
+
+- /* If we have IBRS available, see whether we should use it. */
++ /* Figure out default_xen_spec_ctrl. */
+ if ( has_spec_ctrl && ibrs )
+- default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
++ {
++ /* IBRS implies STIBP. */
++ if ( opt_stibp == -1 )
++ opt_stibp = 1;
++
++ default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
++ }
++
++ /*
++ * Use STIBP by default if the hardware hint is set. Otherwise, leave it
++ * off as it a severe performance pentalty on pre-eIBRS Intel hardware
++ * where it was retrofitted in microcode.
++ */
++ if ( opt_stibp == -1 )
++ opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
++
++ if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
++ boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
++ default_xen_spec_ctrl |= SPEC_CTRL_STIBP;
+
+- /* If we have SSBD available, see whether we should use it. */
+ if ( opt_ssbd && (boot_cpu_has(X86_FEATURE_SSBD) ||
+ boot_cpu_has(X86_FEATURE_AMD_SSBD)) )
++ {
++ /* SSBD implies PSFD */
++ if ( opt_psfd == -1 )
++ opt_psfd = 1;
++
+ default_xen_spec_ctrl |= SPEC_CTRL_SSBD;
++ }
++
++ /*
++ * Don't use PSFD by default. AMD designed the predictor to
++ * auto-clear on privilege change. PSFD is implied by SSBD, which is
++ * off by default.
++ */
++ if ( opt_psfd == -1 )
++ opt_psfd = 0;
++
++ if ( opt_psfd && (boot_cpu_has(X86_FEATURE_PSFD) ||
++ boot_cpu_has(X86_FEATURE_INTEL_PSFD)) )
++ default_xen_spec_ctrl |= SPEC_CTRL_PSFD;
+
+ /*
+ * PV guests can create RSB entries for any linear address they control,
+--
+2.35.1
+
diff --git a/0038-libxc-fix-compilation-error-with-gcc13.patch b/0038-libxc-fix-compilation-error-with-gcc13.patch
new file mode 100644
index 0000000..8056742
--- /dev/null
+++ b/0038-libxc-fix-compilation-error-with-gcc13.patch
@@ -0,0 +1,33 @@
+From cd3d6b4cd46cd05590805b4a6c0b6654af60106e Mon Sep 17 00:00:00 2001
+From: Charles Arnold <carnold@suse.com>
+Date: Tue, 12 Jul 2022 11:14:07 +0200
+Subject: [PATCH 38/51] libxc: fix compilation error with gcc13
+
+xc_psr.c:161:5: error: conflicting types for 'xc_psr_cmt_get_data'
+due to enum/integer mismatch;
+
+Signed-off-by: Charles Arnold <carnold@suse.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 8eeae8c2b4efefda8e946461e86cf2ae9c18e5a9
+master date: 2022-07-06 13:06:40 +0200
+---
+ tools/include/xenctrl.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
+index 07b96e6671a5..893ae39e4a95 100644
+--- a/tools/include/xenctrl.h
++++ b/tools/include/xenctrl.h
+@@ -2516,7 +2516,7 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
+ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
+ uint32_t *l3_cache_size);
+ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
+- uint32_t psr_cmt_type, uint64_t *monitor_data,
++ xc_psr_cmt_type type, uint64_t *monitor_data,
+ uint64_t *tsc);
+ int xc_psr_cmt_enabled(xc_interface *xch);
+
+--
+2.35.1
+
diff --git a/0039-x86-spec-ctrl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch b/0039-x86-spec-ctrl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch
new file mode 100644
index 0000000..1797a8f
--- /dev/null
+++ b/0039-x86-spec-ctrl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch
@@ -0,0 +1,32 @@
+From 61b9c2ceeb94b0cdaff01023cc5523b1f13e66e2 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 12 Jul 2022 11:14:34 +0200
+Subject: [PATCH 39/51] x86/spec-ctrl: Honour spec-ctrl=0 for unpriv-mmio
+ sub-option
+
+This was an oversight from when unpriv-mmio was introduced.
+
+Fixes: 8c24b70fedcb ("x86/spec-ctrl: Add spec-ctrl=unpriv-mmio")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 4cdb519d797c19ebb8fadc5938cdb47479d5a21b
+master date: 2022-07-11 15:21:35 +0100
+---
+ xen/arch/x86/spec_ctrl.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index dfdd45c358c4..ae74943c1053 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -122,6 +122,7 @@ static int __init parse_spec_ctrl(const char *s)
+ opt_l1d_flush = 0;
+ opt_branch_harden = false;
+ opt_srb_lock = 0;
++ opt_unpriv_mmio = false;
+ }
+ else if ( val > 0 )
+ rc = -EINVAL;
+--
+2.35.1
+
diff --git a/0040-xen-cmdline-Extend-parse_boolean-to-signal-a-name-ma.patch b/0040-xen-cmdline-Extend-parse_boolean-to-signal-a-name-ma.patch
new file mode 100644
index 0000000..3512590
--- /dev/null
+++ b/0040-xen-cmdline-Extend-parse_boolean-to-signal-a-name-ma.patch
@@ -0,0 +1,87 @@
+From eec5b02403a9df2523527caad24f17af5060fbe7 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 12 Jul 2022 11:15:03 +0200
+Subject: [PATCH 40/51] xen/cmdline: Extend parse_boolean() to signal a name
+ match
+
+This will help parsing a sub-option which has boolean and non-boolean options
+available.
+
+First, rework 'int val' into 'bool has_neg_prefix'. This inverts it's value,
+but the resulting logic is far easier to follow.
+
+Second, reject anything of the form 'no-$FOO=' which excludes ambiguous
+constructs such as 'no-$foo=yes' which have never been valid.
+
+This just leaves the case where everything is otherwise fine, but parse_bool()
+can't interpret the provided string.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 382326cac528dd1eb0d04efd5c05363c453e29f4
+master date: 2022-07-11 15:21:35 +0100
+---
+ xen/common/kernel.c | 20 ++++++++++++++++----
+ xen/include/xen/lib.h | 3 ++-
+ 2 files changed, 18 insertions(+), 5 deletions(-)
+
+diff --git a/xen/common/kernel.c b/xen/common/kernel.c
+index e119e5401f9d..7ed96521f97a 100644
+--- a/xen/common/kernel.c
++++ b/xen/common/kernel.c
+@@ -272,9 +272,9 @@ int parse_bool(const char *s, const char *e)
+ int parse_boolean(const char *name, const char *s, const char *e)
+ {
+ size_t slen, nlen;
+- int val = !!strncmp(s, "no-", 3);
++ bool has_neg_prefix = !strncmp(s, "no-", 3);
+
+- if ( !val )
++ if ( has_neg_prefix )
+ s += 3;
+
+ slen = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s);
+@@ -286,11 +286,23 @@ int parse_boolean(const char *name, const char *s, const char *e)
+
+ /* Exact, unadorned name? Result depends on the 'no-' prefix. */
+ if ( slen == nlen )
+- return val;
++ return !has_neg_prefix;
++
++ /* Inexact match with a 'no-' prefix? Not valid. */
++ if ( has_neg_prefix )
++ return -1;
+
+ /* =$SOMETHING? Defer to the regular boolean parsing. */
+ if ( s[nlen] == '=' )
+- return parse_bool(&s[nlen + 1], e);
++ {
++ int b = parse_bool(&s[nlen + 1], e);
++
++ if ( b >= 0 )
++ return b;
++
++ /* Not a boolean, but the name matched. Signal specially. */
++ return -2;
++ }
+
+ /* Unrecognised. Give up. */
+ return -1;
+diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
+index c6987973bf88..2296044caf79 100644
+--- a/xen/include/xen/lib.h
++++ b/xen/include/xen/lib.h
+@@ -80,7 +80,8 @@ int parse_bool(const char *s, const char *e);
+ /**
+ * Given a specific name, parses a string of the form:
+ * [no-]$NAME[=...]
+- * returning 0 or 1 for a recognised boolean, or -1 for an error.
++ * returning 0 or 1 for a recognised boolean. Returns -1 for general errors,
++ * and -2 for "not a boolean, but $NAME= matches".
+ */
+ int parse_boolean(const char *name, const char *s, const char *e);
+
+--
+2.35.1
+
diff --git a/0041-x86-spec-ctrl-Add-fine-grained-cmdline-suboptions-fo.patch b/0041-x86-spec-ctrl-Add-fine-grained-cmdline-suboptions-fo.patch
new file mode 100644
index 0000000..9964bb9
--- /dev/null
+++ b/0041-x86-spec-ctrl-Add-fine-grained-cmdline-suboptions-fo.patch
@@ -0,0 +1,137 @@
+From f066c8bb3e5686141cef6fa1dc86ea9f37c5388a Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 12 Jul 2022 11:15:37 +0200
+Subject: [PATCH 41/51] x86/spec-ctrl: Add fine-grained cmdline suboptions for
+ primitives
+
+Support controling the PV/HVM suboption of msr-sc/rsb/md-clear, which
+previously wasn't possible.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 27357c394ba6e1571a89105b840ce1c6f026485c
+master date: 2022-07-11 15:21:35 +0100
+---
+ docs/misc/xen-command-line.pandoc | 12 ++++--
+ xen/arch/x86/spec_ctrl.c | 66 ++++++++++++++++++++++++++-----
+ 2 files changed, 66 insertions(+), 12 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index 46e9c58d35cd..1bbdb55129cc 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2233,7 +2233,8 @@ not be able to control the state of the mitigation.
+ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
+
+ ### spec-ctrl (x86)
+-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
++> `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
++> {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
+ > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
+ > eager-fpu,l1d-flush,branch-harden,srb-lock,
+ > unpriv-mmio}=<bool> ]`
+@@ -2258,12 +2259,17 @@ in place for guests to use.
+
+ Use of a positive boolean value for either of these options is invalid.
+
+-The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
++The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
+ grained control over the primitives by Xen. These impact Xen's ability to
+-protect itself, and Xen's ability to virtualise support for guests to use.
++protect itself, and/or Xen's ability to virtualise support for guests to use.
+
+ * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
+ respectively.
++* Each other option can be used either as a plain boolean
++ (e.g. `spec-ctrl=rsb` to control both the PV and HVM sub-options), or with
++ `pv=` or `hvm=` subsuboptions (e.g. `spec-ctrl=rsb=no-hvm` to disable HVM
++ RSB only).
++
+ * `msr-sc=` offers control over Xen's support for manipulating `MSR_SPEC_CTRL`
+ on entry and exit. These blocks are necessary to virtualise support for
+ guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index ae74943c1053..9507e5da60a9 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -147,20 +147,68 @@ static int __init parse_spec_ctrl(const char *s)
+ opt_rsb_hvm = val;
+ opt_md_clear_hvm = val;
+ }
+- else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
++ else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
+ {
+- opt_msr_sc_pv = val;
+- opt_msr_sc_hvm = val;
++ switch ( val )
++ {
++ case 0:
++ case 1:
++ opt_msr_sc_pv = opt_msr_sc_hvm = val;
++ break;
++
++ case -2:
++ s += strlen("msr-sc=");
++ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
++ opt_msr_sc_pv = val;
++ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
++ opt_msr_sc_hvm = val;
++ else
++ default:
++ rc = -EINVAL;
++ break;
++ }
+ }
+- else if ( (val = parse_boolean("rsb", s, ss)) >= 0 )
++ else if ( (val = parse_boolean("rsb", s, ss)) != -1 )
+ {
+- opt_rsb_pv = val;
+- opt_rsb_hvm = val;
++ switch ( val )
++ {
++ case 0:
++ case 1:
++ opt_rsb_pv = opt_rsb_hvm = val;
++ break;
++
++ case -2:
++ s += strlen("rsb=");
++ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
++ opt_rsb_pv = val;
++ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
++ opt_rsb_hvm = val;
++ else
++ default:
++ rc = -EINVAL;
++ break;
++ }
+ }
+- else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
++ else if ( (val = parse_boolean("md-clear", s, ss)) != -1 )
+ {
+- opt_md_clear_pv = val;
+- opt_md_clear_hvm = val;
++ switch ( val )
++ {
++ case 0:
++ case 1:
++ opt_md_clear_pv = opt_md_clear_hvm = val;
++ break;
++
++ case -2:
++ s += strlen("md-clear=");
++ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
++ opt_md_clear_pv = val;
++ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
++ opt_md_clear_hvm = val;
++ else
++ default:
++ rc = -EINVAL;
++ break;
++ }
+ }
+
+ /* Xen's speculative sidechannel mitigation settings. */
+--
+2.35.1
+
diff --git a/0042-tools-helpers-fix-build-of-xen-init-dom0-with-Werror.patch b/0042-tools-helpers-fix-build-of-xen-init-dom0-with-Werror.patch
new file mode 100644
index 0000000..eea790a
--- /dev/null
+++ b/0042-tools-helpers-fix-build-of-xen-init-dom0-with-Werror.patch
@@ -0,0 +1,28 @@
+From 14fd97e3de939a63a6e467f240efb49fe226a5dc Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 12 Jul 2022 11:16:10 +0200
+Subject: [PATCH 42/51] tools/helpers: fix build of xen-init-dom0 with -Werror
+
+Missing prototype of asprintf() without _GNU_SOURCE.
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Reviewed-by: Henry Wang <Henry.Wang@arm.com>
+master commit: d693b22733044d68e9974766b5c9e6259c9b1708
+master date: 2022-07-12 08:38:35 +0200
+---
+ tools/helpers/xen-init-dom0.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c
+index c99224a4b607..b4861c9e8041 100644
+--- a/tools/helpers/xen-init-dom0.c
++++ b/tools/helpers/xen-init-dom0.c
+@@ -1,3 +1,5 @@
++#define _GNU_SOURCE
++
+ #include <stdlib.h>
+ #include <stdint.h>
+ #include <string.h>
+--
+2.35.1
+
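The failure being fixed is generic glibc behaviour: asprintf() is only declared by <stdio.h> when _GNU_SOURCE is defined before the first include, so the missing prototype surfaces as an implicit-declaration warning that -Werror turns into a build failure. A minimal stand-alone sketch (not part of the patch; the format string is arbitrary) of the working arrangement:

    #define _GNU_SOURCE          /* must precede the first libc include */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *s;

        /* asprintf() has a visible prototype only because of _GNU_SOURCE. */
        if (asprintf(&s, "dom%u", 0u) < 0)
            return 1;

        puts(s);
        free(s);
        return 0;
    }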
diff --git a/0043-libxl-check-return-value-of-libxl__xs_directory-in-n.patch b/0043-libxl-check-return-value-of-libxl__xs_directory-in-n.patch
new file mode 100644
index 0000000..0c2470a
--- /dev/null
+++ b/0043-libxl-check-return-value-of-libxl__xs_directory-in-n.patch
@@ -0,0 +1,38 @@
+From 744accad1b73223b3261e3e678e16e030d83b179 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 12 Jul 2022 11:16:30 +0200
+Subject: [PATCH 43/51] libxl: check return value of libxl__xs_directory in
+ name2bdf
+
+libxl__xs_directory() can potentially return NULL without setting `n`.
+As `n` isn't initialised, we need to check libxl__xs_directory()
+return value before checking `n`. Otherwise, `n` might be non-zero
+with `bdfs` NULL which would lead to a segv.
+
+Fixes: 57bff091f4 ("libxl: add 'name' field to 'libxl_device_pci' in the IDL...")
+Reported-by: "G.R." <firemeteor@users.sourceforge.net>
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Tested-by: "G.R." <firemeteor@users.sourceforge.net>
+master commit: d778089ac70e5b8e3bdea0c85fc8c0b9ed0eaf2f
+master date: 2022-07-12 08:38:51 +0200
+---
+ tools/libs/light/libxl_pci.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
+index 4bbbfe9f168f..ce3bf7c0ae81 100644
+--- a/tools/libs/light/libxl_pci.c
++++ b/tools/libs/light/libxl_pci.c
+@@ -859,7 +859,7 @@ static int name2bdf(libxl__gc *gc, libxl_device_pci *pci)
+ int rc = ERROR_NOTFOUND;
+
+ bdfs = libxl__xs_directory(gc, XBT_NULL, PCI_INFO_PATH, &n);
+- if (!n)
++ if (!bdfs || !n)
+ goto out;
+
+ for (i = 0; i < n; i++) {
+--
+2.35.1
+
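The underlying pattern is worth spelling out: when a listing helper can fail by returning NULL without touching its count out-parameter, the pointer must be tested before the count is trusted. A minimal sketch with hypothetical stand-ins (list_dir(), process(); not libxl API):

    extern char **list_dir(const char *path, unsigned int *n); /* may return NULL, leaving n untouched */
    extern void process(const char *entry);

    static int walk(const char *path)
    {
        unsigned int n;                      /* deliberately uninitialised, as in name2bdf() */
        char **entries = list_dir(path, &n);

        /* Pointer first: a stale non-zero 'n' alongside a NULL 'entries'
         * would otherwise send the loop into a NULL dereference. */
        if (!entries || !n)
            return -1;

        for (unsigned int i = 0; i < n; i++)
            process(entries[i]);

        return 0;
    }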
diff --git a/0044-x86-spec-ctrl-Rework-spec_ctrl_flags-context-switchi.patch b/0044-x86-spec-ctrl-Rework-spec_ctrl_flags-context-switchi.patch
new file mode 100644
index 0000000..d8517f8
--- /dev/null
+++ b/0044-x86-spec-ctrl-Rework-spec_ctrl_flags-context-switchi.patch
@@ -0,0 +1,167 @@
+From 3a280cbae7022b83af91c27a8e2211ba3b1234f5 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 1 Jul 2022 15:59:40 +0100
+Subject: [PATCH 44/51] x86/spec-ctrl: Rework spec_ctrl_flags context switching
+
+We are shortly going to need to context switch new bits in both the vcpu and
+S3 paths. Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
+into d->arch.spec_ctrl_flags to accommodate.
+
+No functional change.
+
+This is part of XSA-407.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 5796912f7279d9348a3166655588d30eae9f72cc)
+---
+ xen/arch/x86/acpi/power.c | 8 ++++----
+ xen/arch/x86/domain.c | 8 ++++----
+ xen/arch/x86/spec_ctrl.c | 9 ++++++---
+ xen/include/asm-x86/domain.h | 3 +--
+ xen/include/asm-x86/spec_ctrl.h | 30 ++++++++++++++++++++++++++++-
+ xen/include/asm-x86/spec_ctrl_asm.h | 3 ---
+ 6 files changed, 44 insertions(+), 17 deletions(-)
+
+diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
+index 5eaa77f66a28..dd397f713067 100644
+--- a/xen/arch/x86/acpi/power.c
++++ b/xen/arch/x86/acpi/power.c
+@@ -248,8 +248,8 @@ static int enter_state(u32 state)
+ error = 0;
+
+ ci = get_cpu_info();
+- /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
+- ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
++ /* Avoid NMI/#MC using unsafe MSRs until we've reloaded microcode. */
++ ci->spec_ctrl_flags &= ~SCF_IST_MASK;
+
+ ACPI_FLUSH_CPU_CACHE();
+
+@@ -292,8 +292,8 @@ static int enter_state(u32 state)
+ if ( !recheck_cpu_features(0) )
+ panic("Missing previously available feature(s)\n");
+
+- /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
+- ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
++ /* Re-enabled default NMI/#MC use of MSRs now microcode is loaded. */
++ ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_IST_MASK);
+
+ if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
+ {
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index 1fe6644a71ae..82a0b73cf6ef 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -2092,10 +2092,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
+ }
+ }
+
+- /* Update the top-of-stack block with the VERW disposition. */
+- info->spec_ctrl_flags &= ~SCF_verw;
+- if ( nextd->arch.verw )
+- info->spec_ctrl_flags |= SCF_verw;
++ /* Update the top-of-stack block with the new spec_ctrl settings. */
++ info->spec_ctrl_flags =
++ (info->spec_ctrl_flags & ~SCF_DOM_MASK) |
++ (nextd->arch.spec_ctrl_flags & SCF_DOM_MASK);
+ }
+
+ sched_context_switched(prev, next);
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 9507e5da60a9..7e646680f1c7 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -1010,9 +1010,12 @@ void spec_ctrl_init_domain(struct domain *d)
+ {
+ bool pv = is_pv_domain(d);
+
+- d->arch.verw =
+- (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+- (opt_fb_clear_mmio && is_iommu_enabled(d));
++ bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
++ (opt_fb_clear_mmio && is_iommu_enabled(d)));
++
++ d->arch.spec_ctrl_flags =
++ (verw ? SCF_verw : 0) |
++ 0;
+ }
+
+ void __init init_speculation_mitigations(void)
+diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
+index 2398a1d99da9..e4c099262cb7 100644
+--- a/xen/include/asm-x86/domain.h
++++ b/xen/include/asm-x86/domain.h
+@@ -319,8 +319,7 @@ struct arch_domain
+ uint32_t pci_cf8;
+ uint8_t cmos_idx;
+
+- /* Use VERW on return-to-guest for its flushing side effect. */
+- bool verw;
++ uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
+
+ union {
+ struct pv_domain pv;
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index 7e83e0179fb9..3cd72e40305f 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -20,12 +20,40 @@
+ #ifndef __X86_SPEC_CTRL_H__
+ #define __X86_SPEC_CTRL_H__
+
+-/* Encoding of cpuinfo.spec_ctrl_flags */
++/*
++ * Encoding of:
++ * cpuinfo.spec_ctrl_flags
++ * default_spec_ctrl_flags
++ * domain.spec_ctrl_flags
++ *
++ * Live settings are in the top-of-stack block, because they need to be
++ * accessible when XPTI is active. Some settings are fixed from boot, some
++ * context switched per domain, and some inhibited in the S3 path.
++ */
+ #define SCF_use_shadow (1 << 0)
+ #define SCF_ist_wrmsr (1 << 1)
+ #define SCF_ist_rsb (1 << 2)
+ #define SCF_verw (1 << 3)
+
++/*
++ * The IST paths (NMI/#MC) can interrupt any arbitrary context. Some
++ * functionality requires updated microcode to work.
++ *
++ * On boot, this is easy; we load microcode before figuring out which
++ * speculative protections to apply. However, on the S3 resume path, we must
++ * be able to disable the configured mitigations until microcode is reloaded.
++ *
++ * These are the controls to inhibit on the S3 resume path until microcode has
++ * been reloaded.
++ */
++#define SCF_IST_MASK (SCF_ist_wrmsr)
++
++/*
++ * Some speculative protections are per-domain. These settings are merged
++ * into the top-of-stack block in the context switch path.
++ */
++#define SCF_DOM_MASK (SCF_verw)
++
+ #ifndef __ASSEMBLY__
+
+ #include <asm/alternative.h>
+diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
+index 5a590bac44aa..66b00d511fc6 100644
+--- a/xen/include/asm-x86/spec_ctrl_asm.h
++++ b/xen/include/asm-x86/spec_ctrl_asm.h
+@@ -248,9 +248,6 @@
+
+ /*
+ * Use in IST interrupt/exception context. May interrupt Xen or PV context.
+- * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
+- * path to avoid using MSR_SPEC_CTRL before the microcode introducing it has
+- * been reloaded.
+ */
+ .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
+ /*
+--
+2.35.1
+
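The context-switch merge introduced above boils down to one mask operation; a small self-contained sketch of the intended semantics (constant values copied from the patch, the helper itself is illustrative rather than a Xen function):

    #include <stdint.h>

    #define SCF_verw     (1u << 3)
    #define SCF_DOM_MASK (SCF_verw)    /* per-domain bits, per the patch above */

    /* Bits outside SCF_DOM_MASK (boot-time and IST controls) stay as they are in
     * the per-CPU block; only the per-domain bits follow the incoming domain. */
    static inline uint8_t merge_spec_ctrl_flags(uint8_t cpu_flags, uint8_t dom_flags)
    {
        return (uint8_t)((cpu_flags & ~SCF_DOM_MASK) | (dom_flags & SCF_DOM_MASK));
    }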
diff --git a/0045-x86-spec-ctrl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch b/0045-x86-spec-ctrl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch
new file mode 100644
index 0000000..5b841a6
--- /dev/null
+++ b/0045-x86-spec-ctrl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch
@@ -0,0 +1,110 @@
+From 31aa2a20bfefc3a8a200da54a56471bf99f9630e Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 28 Jun 2022 14:36:56 +0100
+Subject: [PATCH 45/51] x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr
+
+We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
+ambiguous.
+
+No functional change.
+
+This is part of XSA-407.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 76d6a36f645dfdbad8830559d4d52caf36efc75e)
+---
+ xen/arch/x86/spec_ctrl.c | 6 +++---
+ xen/include/asm-x86/spec_ctrl.h | 4 ++--
+ xen/include/asm-x86/spec_ctrl_asm.h | 8 ++++----
+ 3 files changed, 9 insertions(+), 9 deletions(-)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 7e646680f1c7..89f95c083e1b 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -1115,7 +1115,7 @@ void __init init_speculation_mitigations(void)
+ {
+ if ( opt_msr_sc_pv )
+ {
+- default_spec_ctrl_flags |= SCF_ist_wrmsr;
++ default_spec_ctrl_flags |= SCF_ist_sc_msr;
+ setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
+ }
+
+@@ -1126,7 +1126,7 @@ void __init init_speculation_mitigations(void)
+ * Xen's value is not restored atomically. An early NMI hitting
+ * the VMExit path needs to restore Xen's value for safety.
+ */
+- default_spec_ctrl_flags |= SCF_ist_wrmsr;
++ default_spec_ctrl_flags |= SCF_ist_sc_msr;
+ setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
+ }
+ }
+@@ -1139,7 +1139,7 @@ void __init init_speculation_mitigations(void)
+ * on real hardware matches the availability of MSR_SPEC_CTRL in the
+ * first place.
+ *
+- * No need for SCF_ist_wrmsr because Xen's value is restored
++ * No need for SCF_ist_sc_msr because Xen's value is restored
+ * atomically WRT NMIs in the VMExit path.
+ *
+ * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index 3cd72e40305f..f8f0ac47e759 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -31,7 +31,7 @@
+ * context switched per domain, and some inhibited in the S3 path.
+ */
+ #define SCF_use_shadow (1 << 0)
+-#define SCF_ist_wrmsr (1 << 1)
++#define SCF_ist_sc_msr (1 << 1)
+ #define SCF_ist_rsb (1 << 2)
+ #define SCF_verw (1 << 3)
+
+@@ -46,7 +46,7 @@
+ * These are the controls to inhibit on the S3 resume path until microcode has
+ * been reloaded.
+ */
+-#define SCF_IST_MASK (SCF_ist_wrmsr)
++#define SCF_IST_MASK (SCF_ist_sc_msr)
+
+ /*
+ * Some speculative protections are per-domain. These settings are merged
+diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
+index 66b00d511fc6..0ff1b118f882 100644
+--- a/xen/include/asm-x86/spec_ctrl_asm.h
++++ b/xen/include/asm-x86/spec_ctrl_asm.h
+@@ -266,8 +266,8 @@
+
+ .L\@_skip_rsb:
+
+- test $SCF_ist_wrmsr, %al
+- jz .L\@_skip_wrmsr
++ test $SCF_ist_sc_msr, %al
++ jz .L\@_skip_msr_spec_ctrl
+
+ xor %edx, %edx
+ testb $3, UREGS_cs(%rsp)
+@@ -290,7 +290,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ * to speculate around the WRMSR. As a result, we need a dispatch
+ * serialising instruction in the else clause.
+ */
+-.L\@_skip_wrmsr:
++.L\@_skip_msr_spec_ctrl:
+ lfence
+ UNLIKELY_END(\@_serialise)
+ .endm
+@@ -301,7 +301,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ * Requires %rbx=stack_end
+ * Clobbers %rax, %rcx, %rdx
+ */
+- testb $SCF_ist_wrmsr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
++ testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+ jz .L\@_skip
+
+ DO_SPEC_CTRL_EXIT_TO_XEN
+--
+2.35.1
+
diff --git a/0046-x86-spec-ctrl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch b/0046-x86-spec-ctrl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch
new file mode 100644
index 0000000..a950639
--- /dev/null
+++ b/0046-x86-spec-ctrl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch
@@ -0,0 +1,97 @@
+From e7671561c84322860875745e57b228a7a310f2bf Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Mon, 4 Jul 2022 21:32:17 +0100
+Subject: [PATCH 46/51] x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch
+
+We are about to introduce the use of IBPB at different points in Xen, making
+opt_ibpb ambiguous. Rename it to opt_ibpb_ctxt_switch.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit a8e5ef079d6f5c88c472e3e620db5a8d1402a50d)
+---
+ xen/arch/x86/domain.c | 2 +-
+ xen/arch/x86/spec_ctrl.c | 10 +++++-----
+ xen/include/asm-x86/spec_ctrl.h | 2 +-
+ 3 files changed, 7 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index 82a0b73cf6ef..0d39981550ca 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -2064,7 +2064,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
+
+ ctxt_switch_levelling(next);
+
+- if ( opt_ibpb && !is_idle_domain(nextd) )
++ if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
+ {
+ static DEFINE_PER_CPU(unsigned int, last);
+ unsigned int *last_id = &this_cpu(last);
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 89f95c083e1b..f4ae36eae2d0 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -54,7 +54,7 @@ int8_t __initdata opt_stibp = -1;
+ bool __read_mostly opt_ssbd;
+ int8_t __initdata opt_psfd = -1;
+
+-bool __read_mostly opt_ibpb = true;
++bool __read_mostly opt_ibpb_ctxt_switch = true;
+ int8_t __read_mostly opt_eager_fpu = -1;
+ int8_t __read_mostly opt_l1d_flush = -1;
+ static bool __initdata opt_branch_harden = true;
+@@ -117,7 +117,7 @@ static int __init parse_spec_ctrl(const char *s)
+
+ opt_thunk = THUNK_JMP;
+ opt_ibrs = 0;
+- opt_ibpb = false;
++ opt_ibpb_ctxt_switch = false;
+ opt_ssbd = false;
+ opt_l1d_flush = 0;
+ opt_branch_harden = false;
+@@ -238,7 +238,7 @@ static int __init parse_spec_ctrl(const char *s)
+
+ /* Misc settings. */
+ else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+- opt_ibpb = val;
++ opt_ibpb_ctxt_switch = val;
+ else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
+ opt_eager_fpu = val;
+ else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
+@@ -458,7 +458,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ (opt_tsx & 1) ? " TSX+" : " TSX-",
+ !cpu_has_srbds_ctrl ? "" :
+ opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-",
+- opt_ibpb ? " IBPB" : "",
++ opt_ibpb_ctxt_switch ? " IBPB-ctxt" : "",
+ opt_l1d_flush ? " L1D_FLUSH" : "",
+ opt_md_clear_pv || opt_md_clear_hvm ||
+ opt_fb_clear_mmio ? " VERW" : "",
+@@ -1240,7 +1240,7 @@ void __init init_speculation_mitigations(void)
+
+ /* Check we have hardware IBPB support before using it... */
+ if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+- opt_ibpb = false;
++ opt_ibpb_ctxt_switch = false;
+
+ /* Check whether Eager FPU should be enabled by default. */
+ if ( opt_eager_fpu == -1 )
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index f8f0ac47e759..fb4365575620 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -63,7 +63,7 @@
+ void init_speculation_mitigations(void);
+ void spec_ctrl_init_domain(struct domain *d);
+
+-extern bool opt_ibpb;
++extern bool opt_ibpb_ctxt_switch;
+ extern bool opt_ssbd;
+ extern int8_t opt_eager_fpu;
+ extern int8_t opt_l1d_flush;
+--
+2.35.1
+
diff --git a/0047-x86-spec-ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch b/0047-x86-spec-ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch
new file mode 100644
index 0000000..3ce9fd9
--- /dev/null
+++ b/0047-x86-spec-ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch
@@ -0,0 +1,106 @@
+From 2a9e690a0ad5d54dca4166e089089a07bbe7fc85 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 1 Jul 2022 15:59:40 +0100
+Subject: [PATCH 47/51] x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST
+
+We are shortly going to add a conditional IBPB in this path.
+
+Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering
+it after we're done with its contents. %rbx is available for use, and the
+more normal register to hold preserved information in.
+
+With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
+the adjustment to spec_ctrl_flags.
+
+This leaves no use of %rdx, except as 0 for the upper half of WRMSR. In
+practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
+the foreseeable future, so update the macro entry requirements to state this
+dependency. This marginal optimisation can be revisited if circumstances
+change.
+
+No practical change.
+
+This is part of XSA-407.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit e9b8d31981f184c6539f91ec54bd9cae29cdae36)
+---
+ xen/arch/x86/x86_64/entry.S | 4 ++--
+ xen/include/asm-x86/spec_ctrl_asm.h | 21 ++++++++++-----------
+ 2 files changed, 12 insertions(+), 13 deletions(-)
+
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index 2a86938f1f32..a1810bf4d311 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -932,7 +932,7 @@ ENTRY(double_fault)
+
+ GET_STACK_END(14)
+
+- SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
+@@ -968,7 +968,7 @@ handle_ist_exception:
+
+ GET_STACK_END(14)
+
+- SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
+diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
+index 0ff1b118f882..15e24cde00d1 100644
+--- a/xen/include/asm-x86/spec_ctrl_asm.h
++++ b/xen/include/asm-x86/spec_ctrl_asm.h
+@@ -251,34 +251,33 @@
+ */
+ .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
+ /*
+- * Requires %rsp=regs, %r14=stack_end
+- * Clobbers %rax, %rcx, %rdx
++ * Requires %rsp=regs, %r14=stack_end, %rdx=0
++ * Clobbers %rax, %rbx, %rcx, %rdx
+ *
+ * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
+ * maybexen=1, but with conditionals rather than alternatives.
+ */
+- movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %eax
++ movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
+
+- test $SCF_ist_rsb, %al
++ test $SCF_ist_rsb, %bl
+ jz .L\@_skip_rsb
+
+- DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */
++ DO_OVERWRITE_RSB /* Clobbers %rax/%rcx */
+
+ .L\@_skip_rsb:
+
+- test $SCF_ist_sc_msr, %al
++ test $SCF_ist_sc_msr, %bl
+ jz .L\@_skip_msr_spec_ctrl
+
+- xor %edx, %edx
++ xor %eax, %eax
+ testb $3, UREGS_cs(%rsp)
+- setnz %dl
+- not %edx
+- and %dl, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
++ setnz %al
++ not %eax
++ and %al, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+
+ /* Load Xen's intended value. */
+ mov $MSR_SPEC_CTRL, %ecx
+ movzbl STACK_CPUINFO_FIELD(xen_spec_ctrl)(%r14), %eax
+- xor %edx, %edx
+ wrmsr
+
+ /* Opencoded UNLIKELY_START() with no condition. */
+--
+2.35.1
+
diff --git a/0048-x86-spec-ctrl-Support-IBPB-on-entry.patch b/0048-x86-spec-ctrl-Support-IBPB-on-entry.patch
new file mode 100644
index 0000000..d5ad043
--- /dev/null
+++ b/0048-x86-spec-ctrl-Support-IBPB-on-entry.patch
@@ -0,0 +1,300 @@
+From 76c5fcee9027fb8823dd501086f0ff3ee3c4231c Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 24 Feb 2022 13:44:33 +0000
+Subject: [PATCH 48/51] x86/spec-ctrl: Support IBPB-on-entry
+
+We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
+but as we've talked about using it in other cases too, arrange to support it
+generally. However, this is also very expensive in some cases, so we're going
+to want per-domain controls.
+
+Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
+DOM masks as appropriate. Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
+patch the code blocks.
+
+For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
+so no "else lfence" is necessary. VT-x will use use the MSR host load list,
+so doesn't need any code in the VMExit path.
+
+For the IST path, we can't safely check CPL==0 to skip a flush, as we might
+have hit an entry path before it's IBPB. As IST hitting Xen is rare, flush
+irrespective of CPL. A later path, SCF_ist_sc_msr, provides Spectre-v1
+safety.
+
+For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
+we can safely check CPL==0. Only flush when interrupting guest context.
+
+An "else lfence" is needed for safety, but we want to be able to skip it on
+unaffected CPUs, so the block wants to be an alternative, which means the
+lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
+have displacements fixed up for anything other than the first instruction).
+
+As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
+shrink the logic marginally. Update the comments to specify this new
+dependency.
+
+This is part of XSA-407.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 53a570b285694947776d5190f591a0d5b9b18de7)
+---
+ xen/arch/x86/hvm/svm/entry.S | 18 ++++++++++-
+ xen/arch/x86/hvm/vmx/vmcs.c | 4 +++
+ xen/arch/x86/x86_64/compat/entry.S | 2 +-
+ xen/arch/x86/x86_64/entry.S | 12 +++----
+ xen/include/asm-x86/cpufeatures.h | 2 ++
+ xen/include/asm-x86/spec_ctrl.h | 6 ++--
+ xen/include/asm-x86/spec_ctrl_asm.h | 49 +++++++++++++++++++++++++++--
+ 7 files changed, 81 insertions(+), 12 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
+index 4ae55a2ef605..0ff4008060fa 100644
+--- a/xen/arch/x86/hvm/svm/entry.S
++++ b/xen/arch/x86/hvm/svm/entry.S
+@@ -97,7 +97,19 @@ __UNLIKELY_END(nsvm_hap)
+
+ GET_CURRENT(bx)
+
+- /* SPEC_CTRL_ENTRY_FROM_SVM Req: %rsp=regs/cpuinfo Clob: acd */
++ /* SPEC_CTRL_ENTRY_FROM_SVM Req: %rsp=regs/cpuinfo, %rdx=0 Clob: acd */
++
++ .macro svm_vmexit_cond_ibpb
++ testb $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
++ jz .L_skip_ibpb
++
++ mov $MSR_PRED_CMD, %ecx
++ mov $PRED_CMD_IBPB, %eax
++ wrmsr
++.L_skip_ibpb:
++ .endm
++ ALTERNATIVE "", svm_vmexit_cond_ibpb, X86_FEATURE_IBPB_ENTRY_HVM
++
+ ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
+
+ .macro svm_vmexit_spec_ctrl
+@@ -114,6 +126,10 @@ __UNLIKELY_END(nsvm_hap)
+ ALTERNATIVE "", svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
++ /*
++ * STGI is executed unconditionally, and is sufficiently serialising
++ * to safely resolve any Spectre-v1 concerns in the above logic.
++ */
+ stgi
+ GLOBAL(svm_stgi_label)
+ mov %rsp,%rdi
+diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
+index f9f9bc18cdbc..dd817cee4e69 100644
+--- a/xen/arch/x86/hvm/vmx/vmcs.c
++++ b/xen/arch/x86/hvm/vmx/vmcs.c
+@@ -1345,6 +1345,10 @@ static int construct_vmcs(struct vcpu *v)
+ rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
+ VMX_MSR_GUEST_LOADONLY);
+
++ if ( !rc && (d->arch.spec_ctrl_flags & SCF_entry_ibpb) )
++ rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
++ VMX_MSR_HOST);
++
+ out:
+ vmx_vmcs_exit(v);
+
+diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
+index 5fd6dbbd4513..b86d38d1c50d 100644
+--- a/xen/arch/x86/x86_64/compat/entry.S
++++ b/xen/arch/x86/x86_64/compat/entry.S
+@@ -18,7 +18,7 @@ ENTRY(entry_int82)
+ movl $HYPERCALL_VECTOR, 4(%rsp)
+ SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
+
+- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ CR4_PV32_RESTORE
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index a1810bf4d311..fba8ae498f74 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -260,7 +260,7 @@ ENTRY(lstar_enter)
+ movl $TRAP_syscall, 4(%rsp)
+ SAVE_ALL
+
+- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ GET_STACK_END(bx)
+@@ -298,7 +298,7 @@ ENTRY(cstar_enter)
+ movl $TRAP_syscall, 4(%rsp)
+ SAVE_ALL
+
+- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ GET_STACK_END(bx)
+@@ -338,7 +338,7 @@ GLOBAL(sysenter_eflags_saved)
+ movl $TRAP_syscall, 4(%rsp)
+ SAVE_ALL
+
+- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ GET_STACK_END(bx)
+@@ -392,7 +392,7 @@ ENTRY(int80_direct_trap)
+ movl $0x80, 4(%rsp)
+ SAVE_ALL
+
+- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ GET_STACK_END(bx)
+@@ -674,7 +674,7 @@ ENTRY(common_interrupt)
+
+ GET_STACK_END(14)
+
+- SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
+@@ -708,7 +708,7 @@ GLOBAL(handle_exception)
+
+ GET_STACK_END(14)
+
+- SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
++ SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
+
+ mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
+diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
+index 493d338a085e..672c9ee22ba2 100644
+--- a/xen/include/asm-x86/cpufeatures.h
++++ b/xen/include/asm-x86/cpufeatures.h
+@@ -39,6 +39,8 @@ XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
+ XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
+ XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
+ XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
++XEN_CPUFEATURE(IBPB_ENTRY_PV, X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
++XEN_CPUFEATURE(IBPB_ENTRY_HVM, X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
+
+ /* Bug words follow the synthetic words. */
+ #define X86_NR_BUG 1
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index fb4365575620..3fc599a817c4 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -34,6 +34,8 @@
+ #define SCF_ist_sc_msr (1 << 1)
+ #define SCF_ist_rsb (1 << 2)
+ #define SCF_verw (1 << 3)
++#define SCF_ist_ibpb (1 << 4)
++#define SCF_entry_ibpb (1 << 5)
+
+ /*
+ * The IST paths (NMI/#MC) can interrupt any arbitrary context. Some
+@@ -46,13 +48,13 @@
+ * These are the controls to inhibit on the S3 resume path until microcode has
+ * been reloaded.
+ */
+-#define SCF_IST_MASK (SCF_ist_sc_msr)
++#define SCF_IST_MASK (SCF_ist_sc_msr | SCF_ist_ibpb)
+
+ /*
+ * Some speculative protections are per-domain. These settings are merged
+ * into the top-of-stack block in the context switch path.
+ */
+-#define SCF_DOM_MASK (SCF_verw)
++#define SCF_DOM_MASK (SCF_verw | SCF_entry_ibpb)
+
+ #ifndef __ASSEMBLY__
+
+diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
+index 15e24cde00d1..9eb4ad9ab71d 100644
+--- a/xen/include/asm-x86/spec_ctrl_asm.h
++++ b/xen/include/asm-x86/spec_ctrl_asm.h
+@@ -88,6 +88,35 @@
+ * - SPEC_CTRL_EXIT_TO_{SVM,VMX}
+ */
+
++.macro DO_SPEC_CTRL_COND_IBPB maybexen:req
++/*
++ * Requires %rsp=regs (also cpuinfo if !maybexen)
++ * Requires %r14=stack_end (if maybexen), %rdx=0
++ * Clobbers %rax, %rcx, %rdx
++ *
++ * Conditionally issue IBPB if SCF_entry_ibpb is active. In the maybexen
++ * case, we can safely look at UREGS_cs to skip taking the hit when
++ * interrupting Xen.
++ */
++ .if \maybexen
++ testb $SCF_entry_ibpb, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
++ jz .L\@_skip
++ testb $3, UREGS_cs(%rsp)
++ .else
++ testb $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
++ .endif
++ jz .L\@_skip
++
++ mov $MSR_PRED_CMD, %ecx
++ mov $PRED_CMD_IBPB, %eax
++ wrmsr
++ jmp .L\@_done
++
++.L\@_skip:
++ lfence
++.L\@_done:
++.endm
++
+ .macro DO_OVERWRITE_RSB tmp=rax
+ /*
+ * Requires nothing
+@@ -225,12 +254,16 @@
+
+ /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
+ #define SPEC_CTRL_ENTRY_FROM_PV \
++ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0), \
++ X86_FEATURE_IBPB_ENTRY_PV; \
+ ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0), \
+ X86_FEATURE_SC_MSR_PV
+
+ /* Use in interrupt/exception context. May interrupt Xen or PV context. */
+ #define SPEC_CTRL_ENTRY_FROM_INTR \
++ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1), \
++ X86_FEATURE_IBPB_ENTRY_PV; \
+ ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
+ X86_FEATURE_SC_MSR_PV
+@@ -254,11 +287,23 @@
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+ * Clobbers %rax, %rbx, %rcx, %rdx
+ *
+- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
+- * maybexen=1, but with conditionals rather than alternatives.
++ * This is logical merge of:
++ * DO_SPEC_CTRL_COND_IBPB maybexen=0
++ * DO_OVERWRITE_RSB
++ * DO_SPEC_CTRL_ENTRY maybexen=1
++ * but with conditionals rather than alternatives.
+ */
+ movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
+
++ test $SCF_ist_ibpb, %bl
++ jz .L\@_skip_ibpb
++
++ mov $MSR_PRED_CMD, %ecx
++ mov $PRED_CMD_IBPB, %eax
++ wrmsr
++
++.L\@_skip_ibpb:
++
+ test $SCF_ist_rsb, %bl
+ jz .L\@_skip_rsb
+
+--
+2.35.1
+
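For readers more comfortable in C than assembly, the decision encoded by DO_SPEC_CTRL_COND_IBPB above corresponds roughly to the sketch below (helper and argument names are hypothetical; the real implementation is the asm macro, which additionally issues an lfence on the skip path for Spectre-v1 safety):

    #include <stdbool.h>
    #include <stdint.h>

    #define SCF_entry_ibpb (1u << 5)           /* from the patch above */

    extern void wrmsr_pred_cmd_ibpb(void);     /* stand-in for the MSR_PRED_CMD write */

    static void cond_ibpb_on_entry(uint8_t spec_ctrl_flags, bool maybexen,
                                   unsigned int interrupted_cpl)
    {
        if ( !(spec_ctrl_flags & SCF_entry_ibpb) )
            return;                            /* not active for this context */

        if ( maybexen && interrupted_cpl == 0 )
            return;                            /* interrupted Xen itself: skip the barrier */

        wrmsr_pred_cmd_ibpb();                 /* flush the branch predictor (IBPB) */
    }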
diff --git a/0049-x86-cpuid-Enumeration-for-BTC_NO.patch b/0049-x86-cpuid-Enumeration-for-BTC_NO.patch
new file mode 100644
index 0000000..0e5d119
--- /dev/null
+++ b/0049-x86-cpuid-Enumeration-for-BTC_NO.patch
@@ -0,0 +1,106 @@
+From 0826c7596d35c887b3b7858137c7ac374d9ef17a Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Mon, 16 May 2022 15:48:24 +0100
+Subject: [PATCH 49/51] x86/cpuid: Enumeration for BTC_NO
+
+BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.
+
+Zen3 CPUs don't suffer BTC.
+
+This is part of XSA-407.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 76cb04ad64f3ab9ae785988c40655a71dde9c319)
+---
+ tools/libs/light/libxl_cpuid.c | 1 +
+ tools/misc/xen-cpuid.c | 2 +-
+ xen/arch/x86/cpu/amd.c | 10 ++++++++++
+ xen/arch/x86/spec_ctrl.c | 5 +++--
+ xen/include/public/arch-x86/cpufeatureset.h | 1 +
+ 5 files changed, 16 insertions(+), 3 deletions(-)
+
+diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
+index d462f9e421ed..bf6fdee360a9 100644
+--- a/tools/libs/light/libxl_cpuid.c
++++ b/tools/libs/light/libxl_cpuid.c
+@@ -288,6 +288,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
+ {"virt-ssbd", 0x80000008, NA, CPUID_REG_EBX, 25, 1},
+ {"ssb-no", 0x80000008, NA, CPUID_REG_EBX, 26, 1},
+ {"psfd", 0x80000008, NA, CPUID_REG_EBX, 28, 1},
++ {"btc-no", 0x80000008, NA, CPUID_REG_EBX, 29, 1},
+
+ {"nc", 0x80000008, NA, CPUID_REG_ECX, 0, 8},
+ {"apicidsize", 0x80000008, NA, CPUID_REG_ECX, 12, 4},
+diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
+index bc7dcf55757a..fe22f5f5b68b 100644
+--- a/tools/misc/xen-cpuid.c
++++ b/tools/misc/xen-cpuid.c
+@@ -158,7 +158,7 @@ static const char *const str_e8b[32] =
+ /* [22] */ [23] = "ppin",
+ [24] = "amd-ssbd", [25] = "virt-ssbd",
+ [26] = "ssb-no",
+- [28] = "psfd",
++ [28] = "psfd", [29] = "btc-no",
+ };
+
+ static const char *const str_7d0[32] =
+diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
+index b3b9a0df5fed..b158e3acb5c7 100644
+--- a/xen/arch/x86/cpu/amd.c
++++ b/xen/arch/x86/cpu/amd.c
+@@ -847,6 +847,16 @@ static void init_amd(struct cpuinfo_x86 *c)
+ warning_add(text);
+ }
+ break;
++
++ case 0x19:
++ /*
++ * Zen3 (Fam19h model < 0x10) parts are not susceptible to
++ * Branch Type Confusion, but predate the allocation of the
++ * BTC_NO bit. Fill it back in if we're not virtualised.
++ */
++ if (!cpu_has_hypervisor && !cpu_has(c, X86_FEATURE_BTC_NO))
++ __set_bit(X86_FEATURE_BTC_NO, c->x86_capability);
++ break;
+ }
+
+ display_cacheinfo(c);
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index f4ae36eae2d0..0f101c057f3e 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -388,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ * Hardware read-only information, stating immunity to certain issues, or
+ * suggestions of which mitigation to use.
+ */
+- printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
++ printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+ (caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
+ (caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "",
+ (caps & ARCH_CAPS_RSBA) ? " RSBA" : "",
+@@ -403,7 +403,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ (e8b & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS)) ? " IBRS_ALWAYS" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS)) ? " STIBP_ALWAYS" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_IBRS_FAST)) ? " IBRS_FAST" : "",
+- (e8b & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
++ (e8b & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
++ (e8b & cpufeat_mask(X86_FEATURE_BTC_NO)) ? " BTC_NO" : "");
+
+ /* Hardware features which need driving to mitigate issues. */
+ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
+diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
+index 743b857dcd5c..e7b8167800a2 100644
+--- a/xen/include/public/arch-x86/cpufeatureset.h
++++ b/xen/include/public/arch-x86/cpufeatureset.h
+@@ -266,6 +266,7 @@ XEN_CPUFEATURE(AMD_SSBD, 8*32+24) /*S MSR_SPEC_CTRL.SSBD available */
+ XEN_CPUFEATURE(VIRT_SSBD, 8*32+25) /* MSR_VIRT_SPEC_CTRL.SSBD */
+ XEN_CPUFEATURE(SSB_NO, 8*32+26) /*A Hardware not vulnerable to SSB */
+ XEN_CPUFEATURE(PSFD, 8*32+28) /*S MSR_SPEC_CTRL.PSFD */
++XEN_CPUFEATURE(BTC_NO, 8*32+29) /*A Hardware not vulnerable to Branch Type Confusion */
+
+ /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
+ XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A AVX512 Neural Network Instructions */
+--
+2.35.1
+
diff --git a/0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch b/0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch
new file mode 100644
index 0000000..c83844d
--- /dev/null
+++ b/0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch
@@ -0,0 +1,106 @@
+From 5457a6870eb1369b868f7b8e833966ed43a773ad Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 15 Mar 2022 18:30:25 +0000
+Subject: [PATCH 50/51] x86/spec-ctrl: Enable Zen2 chickenbit
+
+... as instructed in the Branch Type Confusion whitepaper.
+
+This is part of XSA-407.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit 9deaf2d932f08c16c6b96a1c426e4b1142c0cdbe)
+---
+ xen/arch/x86/cpu/amd.c | 28 ++++++++++++++++++++++++++++
+ xen/arch/x86/cpu/cpu.h | 1 +
+ xen/arch/x86/cpu/hygon.c | 6 ++++++
+ xen/include/asm-x86/msr-index.h | 1 +
+ 4 files changed, 36 insertions(+)
+
+diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
+index b158e3acb5c7..37ac84ddd74d 100644
+--- a/xen/arch/x86/cpu/amd.c
++++ b/xen/arch/x86/cpu/amd.c
+@@ -731,6 +731,31 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
+ printk_once(XENLOG_ERR "No SSBD controls available\n");
+ }
+
++/*
++ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
++ *
++ * Refer to the AMD Branch Type Confusion whitepaper:
++ * https://XXX
++ *
++ * Setting this unnamed bit supposedly causes prediction information on
++ * non-branch instructions to be ignored. It is to be set unilaterally in
++ * newer microcode.
++ *
++ * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
++ * simple model number comparison, so use STIBP as a heuristic to separate the
++ * two uarches in Fam17h(AMD)/18h(Hygon).
++ */
++void amd_init_spectral_chicken(void)
++{
++ uint64_t val, chickenbit = 1 << 1;
++
++ if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
++ return;
++
++ if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
++ wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
++}
++
+ void __init detect_zen2_null_seg_behaviour(void)
+ {
+ uint64_t base;
+@@ -796,6 +821,9 @@ static void init_amd(struct cpuinfo_x86 *c)
+
+ amd_init_ssbd(c);
+
++ if (c->x86 == 0x17)
++ amd_init_spectral_chicken();
++
+ /* Probe for NSCB on Zen2 CPUs when not virtualised */
+ if (!cpu_has_hypervisor && !cpu_has_nscb && c == &boot_cpu_data &&
+ c->x86 == 0x17)
+diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
+index b593bd85f04f..145bc5156a86 100644
+--- a/xen/arch/x86/cpu/cpu.h
++++ b/xen/arch/x86/cpu/cpu.h
+@@ -22,4 +22,5 @@ void early_init_amd(struct cpuinfo_x86 *c);
+ void amd_log_freq(const struct cpuinfo_x86 *c);
+ void amd_init_lfence(struct cpuinfo_x86 *c);
+ void amd_init_ssbd(const struct cpuinfo_x86 *c);
++void amd_init_spectral_chicken(void);
+ void detect_zen2_null_seg_behaviour(void);
+diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
+index cdc94130dd2e..6f8d491297e8 100644
+--- a/xen/arch/x86/cpu/hygon.c
++++ b/xen/arch/x86/cpu/hygon.c
+@@ -40,6 +40,12 @@ static void init_hygon(struct cpuinfo_x86 *c)
+ c->x86 == 0x18)
+ detect_zen2_null_seg_behaviour();
+
++ /*
++ * TODO: Check heuristic safety with Hygon first
++ if (c->x86 == 0x18)
++ amd_init_spectral_chicken();
++ */
++
+ /*
+ * Hygon CPUs before Zen2 don't clear segment bases/limits when
+ * loading a NULL selector.
+diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
+index 72bc32ba04ff..d3735e499e0f 100644
+--- a/xen/include/asm-x86/msr-index.h
++++ b/xen/include/asm-x86/msr-index.h
+@@ -361,6 +361,7 @@
+ #define MSR_AMD64_DE_CFG 0xc0011029
+ #define AMD64_DE_CFG_LFENCE_SERIALISE (_AC(1, ULL) << 1)
+ #define MSR_AMD64_EX_CFG 0xc001102c
++#define MSR_AMD64_DE_CFG2 0xc00110e3
+
+ #define MSR_AMD64_DR0_ADDRESS_MASK 0xc0011027
+ #define MSR_AMD64_DR1_ADDRESS_MASK 0xc0011019
+--
+2.35.1
+
diff --git a/0051-x86-spec-ctrl-Mitigate-Branch-Type-Confusion-when-po.patch b/0051-x86-spec-ctrl-Mitigate-Branch-Type-Confusion-when-po.patch
new file mode 100644
index 0000000..e313ede
--- /dev/null
+++ b/0051-x86-spec-ctrl-Mitigate-Branch-Type-Confusion-when-po.patch
@@ -0,0 +1,305 @@
+From 0a5387a01165b46c8c85e7f7e2ddbe60a7f5db44 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Mon, 27 Jun 2022 19:29:40 +0100
+Subject: [PATCH 51/51] x86/spec-ctrl: Mitigate Branch Type Confusion when
+ possible
+
+Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier. To
+mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue
+an IBPB on each entry to Xen, to flush the BTB.
+
+Due to performance concerns, dom0 (which is trusted in most configurations) is
+excluded from protections by default.
+
+Therefore:
+ * Use STIBP by default on Zen2 too, which now means we want it on by default
+ on all hardware supporting STIBP.
+ * Break the current IBPB logic out into a new function, extending it with
+ IBPB-at-entry logic.
+ * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
+ it by default when IBPB-at-entry is providing sufficient safety.
+
+If all PV guests on the system are trusted, then it is recommended to boot
+with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
+perf improvement.
+
+This is part of XSA-407 / CVE-2022-23825.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit d8cb7e0f069e0f106d24941355b59b45a731eabe)
+---
+ docs/misc/xen-command-line.pandoc | 14 ++--
+ xen/arch/x86/spec_ctrl.c | 113 ++++++++++++++++++++++++++----
+ xen/include/asm-x86/spec_ctrl.h | 2 +-
+ 3 files changed, 112 insertions(+), 17 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index 1bbdb55129cc..bd6826d0ae05 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2234,7 +2234,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
+
+ ### spec-ctrl (x86)
+ > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
+-> {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
++> {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
+ > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
+ > eager-fpu,l1d-flush,branch-harden,srb-lock,
+ > unpriv-mmio}=<bool> ]`
+@@ -2259,9 +2259,10 @@ in place for guests to use.
+
+ Use of a positive boolean value for either of these options is invalid.
+
+-The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
+-grained control over the primitives by Xen. These impact Xen's ability to
+-protect itself, and/or Xen's ability to virtualise support for guests to use.
++The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
++offer fine grained control over the primitives by Xen. These impact Xen's
++ability to protect itself, and/or Xen's ability to virtualise support for
++guests to use.
+
+ * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
+ respectively.
+@@ -2280,6 +2281,11 @@ protect itself, and/or Xen's ability to virtualise support for guests to use.
+ compatibility with development versions of this fix, `mds=` is also accepted
+ on Xen 4.12 and earlier as an alias. Consult vendor documentation in
+ preference to here.*
++* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
++ Barrier) is used on entry to Xen. This is used by default on hardware
++ vulnerable to Branch Type Confusion, but for performance reasons, dom0 is
++ unprotected by default. If it necessary to protect dom0 too, boot with
++ `spec-ctrl=ibpb-entry`.
+
+ If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
+ select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 0f101c057f3e..1d9796c34d71 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -39,6 +39,10 @@ static bool __initdata opt_rsb_hvm = true;
+ static int8_t __read_mostly opt_md_clear_pv = -1;
+ static int8_t __read_mostly opt_md_clear_hvm = -1;
+
++static int8_t __read_mostly opt_ibpb_entry_pv = -1;
++static int8_t __read_mostly opt_ibpb_entry_hvm = -1;
++static bool __read_mostly opt_ibpb_entry_dom0;
++
+ /* Cmdline controls for Xen's speculative settings. */
+ static enum ind_thunk {
+ THUNK_DEFAULT, /* Decide which thunk to use at boot time. */
+@@ -54,7 +58,7 @@ int8_t __initdata opt_stibp = -1;
+ bool __read_mostly opt_ssbd;
+ int8_t __initdata opt_psfd = -1;
+
+-bool __read_mostly opt_ibpb_ctxt_switch = true;
++int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
+ int8_t __read_mostly opt_eager_fpu = -1;
+ int8_t __read_mostly opt_l1d_flush = -1;
+ static bool __initdata opt_branch_harden = true;
+@@ -114,6 +118,9 @@ static int __init parse_spec_ctrl(const char *s)
+ opt_rsb_hvm = false;
+ opt_md_clear_pv = 0;
+ opt_md_clear_hvm = 0;
++ opt_ibpb_entry_pv = 0;
++ opt_ibpb_entry_hvm = 0;
++ opt_ibpb_entry_dom0 = false;
+
+ opt_thunk = THUNK_JMP;
+ opt_ibrs = 0;
+@@ -140,12 +147,14 @@ static int __init parse_spec_ctrl(const char *s)
+ opt_msr_sc_pv = val;
+ opt_rsb_pv = val;
+ opt_md_clear_pv = val;
++ opt_ibpb_entry_pv = val;
+ }
+ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+ {
+ opt_msr_sc_hvm = val;
+ opt_rsb_hvm = val;
+ opt_md_clear_hvm = val;
++ opt_ibpb_entry_hvm = val;
+ }
+ else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
+ {
+@@ -210,6 +219,28 @@ static int __init parse_spec_ctrl(const char *s)
+ break;
+ }
+ }
++ else if ( (val = parse_boolean("ibpb-entry", s, ss)) != -1 )
++ {
++ switch ( val )
++ {
++ case 0:
++ case 1:
++ opt_ibpb_entry_pv = opt_ibpb_entry_hvm =
++ opt_ibpb_entry_dom0 = val;
++ break;
++
++ case -2:
++ s += strlen("ibpb-entry=");
++ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
++ opt_ibpb_entry_pv = val;
++ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
++ opt_ibpb_entry_hvm = val;
++ else
++ default:
++ rc = -EINVAL;
++ break;
++ }
++ }
+
+ /* Xen's speculative sidechannel mitigation settings. */
+ else if ( !strncmp(s, "bti-thunk=", 10) )
+@@ -477,27 +508,31 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ * mitigation support for guests.
+ */
+ #ifdef CONFIG_HVM
+- printk(" Support for HVM VMs:%s%s%s%s%s\n",
++ printk(" Support for HVM VMs:%s%s%s%s%s%s\n",
+ (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
+ boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
+ boot_cpu_has(X86_FEATURE_MD_CLEAR) ||
++ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
+ opt_eager_fpu) ? "" : " None",
+ boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ? " MSR_SPEC_CTRL" : "",
+ boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ? " RSB" : "",
+ opt_eager_fpu ? " EAGER_FPU" : "",
+- boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "");
++ boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "",
++ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ? " IBPB-entry" : "");
+
+ #endif
+ #ifdef CONFIG_PV
+- printk(" Support for PV VMs:%s%s%s%s%s\n",
++ printk(" Support for PV VMs:%s%s%s%s%s%s\n",
+ (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
+ boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
+ boot_cpu_has(X86_FEATURE_MD_CLEAR) ||
++ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
+ opt_eager_fpu) ? "" : " None",
+ boot_cpu_has(X86_FEATURE_SC_MSR_PV) ? " MSR_SPEC_CTRL" : "",
+ boot_cpu_has(X86_FEATURE_SC_RSB_PV) ? " RSB" : "",
+ opt_eager_fpu ? " EAGER_FPU" : "",
+- boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "");
++ boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "",
++ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ? " IBPB-entry" : "");
+
+ printk(" XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
+ opt_xpti_hwdom ? "enabled" : "disabled",
+@@ -759,6 +794,55 @@ static bool __init should_use_eager_fpu(void)
+ }
+ }
+
++static void __init ibpb_calculations(void)
++{
++ /* Check we have hardware IBPB support before using it... */
++ if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
++ {
++ opt_ibpb_entry_hvm = opt_ibpb_entry_pv = opt_ibpb_ctxt_switch = 0;
++ opt_ibpb_entry_dom0 = false;
++ return;
++ }
++
++ /*
++ * IBPB-on-entry mitigations for Branch Type Confusion.
++ *
++ * IBPB && !BTC_NO selects all AMD/Hygon hardware, not known to be safe,
++ * that we can provide some form of mitigation on.
++ */
++ if ( opt_ibpb_entry_pv == -1 )
++ opt_ibpb_entry_pv = (IS_ENABLED(CONFIG_PV) &&
++ boot_cpu_has(X86_FEATURE_IBPB) &&
++ !boot_cpu_has(X86_FEATURE_BTC_NO));
++ if ( opt_ibpb_entry_hvm == -1 )
++ opt_ibpb_entry_hvm = (IS_ENABLED(CONFIG_HVM) &&
++ boot_cpu_has(X86_FEATURE_IBPB) &&
++ !boot_cpu_has(X86_FEATURE_BTC_NO));
++
++ if ( opt_ibpb_entry_pv )
++ {
++ setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_PV);
++
++ /*
++ * We only need to flush in IST context if we're protecting against PV
++ * guests. HVM IBPB-on-entry protections are both atomic with
++ * NMI/#MC, so can't interrupt Xen ahead of having already flushed the
++ * BTB.
++ */
++ default_spec_ctrl_flags |= SCF_ist_ibpb;
++ }
++ if ( opt_ibpb_entry_hvm )
++ setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_HVM);
++
++ /*
++ * If we're using IBPB-on-entry to protect against PV and HVM guests
++ * (ignoring dom0 if trusted), then there's no need to also issue IBPB on
++ * context switch too.
++ */
++ if ( opt_ibpb_ctxt_switch == -1 )
++ opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
++}
++
+ /* Calculate whether this CPU is vulnerable to L1TF. */
+ static __init void l1tf_calculations(uint64_t caps)
+ {
+@@ -1014,8 +1098,12 @@ void spec_ctrl_init_domain(struct domain *d)
+ bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+ (opt_fb_clear_mmio && is_iommu_enabled(d)));
+
++ bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
++ (d->domain_id != 0 || opt_ibpb_entry_dom0));
++
+ d->arch.spec_ctrl_flags =
+ (verw ? SCF_verw : 0) |
++ (ibpb ? SCF_entry_ibpb : 0) |
+ 0;
+ }
+
+@@ -1162,12 +1250,15 @@ void __init init_speculation_mitigations(void)
+ }
+
+ /*
+- * Use STIBP by default if the hardware hint is set. Otherwise, leave it
+- * off as it a severe performance pentalty on pre-eIBRS Intel hardware
+- * where it was retrofitted in microcode.
++ * Use STIBP by default on all AMD systems. Zen3 and later enumerate
++ * STIBP_ALWAYS, but STIBP is needed on Zen2 as part of the mitigations
++ * for Branch Type Confusion.
++ *
++ * Leave STIBP off by default on Intel. Pre-eIBRS systems suffer a
++ * substantial perf hit when it was implemented in microcode.
+ */
+ if ( opt_stibp == -1 )
+- opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
++ opt_stibp = !!boot_cpu_has(X86_FEATURE_AMD_STIBP);
+
+ if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
+ boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
+@@ -1239,9 +1330,7 @@ void __init init_speculation_mitigations(void)
+ if ( opt_rsb_hvm )
+ setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
+
+- /* Check we have hardware IBPB support before using it... */
+- if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+- opt_ibpb_ctxt_switch = false;
++ ibpb_calculations();
+
+ /* Check whether Eager FPU should be enabled by default. */
+ if ( opt_eager_fpu == -1 )
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index 3fc599a817c4..9403b81dc7af 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -65,7 +65,7 @@
+ void init_speculation_mitigations(void);
+ void spec_ctrl_init_domain(struct domain *d);
+
+-extern bool opt_ibpb_ctxt_switch;
++extern int8_t opt_ibpb_ctxt_switch;
+ extern bool opt_ssbd;
+ extern int8_t opt_eager_fpu;
+ extern int8_t opt_l1d_flush;
+--
+2.35.1
+
diff --git a/info.txt b/info.txt
index 2310ace..e830829 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen Upstream Patchset #0 for 4.16.2-pre
+Xen upstream patchset #1 for 4.16.2-pre
Containing patches from
RELEASE-4.16.1 (13fee86475f3831d7a1ecf6d7e0acbc2ac779f7e)
to
-staging-4.16 (2e82446cb252f6c8ac697e81f4155872c69afde4)
+staging-4.16 (0a5387a01165b46c8c85e7f7e2ddbe60a7f5db44)
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2022-07-14 8:16 Florian Schmaus
0 siblings, 0 replies; 11+ messages in thread
From: Florian Schmaus @ 2022-07-14 8:16 UTC (permalink / raw
To: gentoo-commits
commit: 61be85e19a48da7c6e8387cd46901b2852cb2e2c
Author: Florian Schmaus <flow <AT> gentoo <DOT> org>
AuthorDate: Thu Jul 14 08:14:22 2022 +0000
Commit: Florian Schmaus <flow <AT> gentoo <DOT> org>
CommitDate: Thu Jul 14 08:16:37 2022 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=61be85e1
Correctly obtain the array length
Signed-off-by: Florian Schmaus <flow <AT> gentoo.org>
create-patches | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/create-patches b/create-patches
index 8e8c9fa..5a73699 100755
--- a/create-patches
+++ b/create-patches
@@ -18,7 +18,7 @@ XEN_MAJOR_MINOR_VERSION="${XEN_VER_COMPONENTS[0]}.${XEN_VER_COMPONENTS[1]}"
git -C "${XEN_REPO_DIR}" fetch origin
readarray -d '' CURRENT_PATCHES < <(find . -maxdepth 1 -type f -name "*.patch" -print0)
-if [[ ${CURRENT_PATCHES[@]} -gt 0 ]]; then
+if [[ ${#CURRENT_PATCHES[@]} -gt 0 ]]; then
git rm -f *.patch
fi
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2022-10-19 9:04 Florian Schmaus
0 siblings, 0 replies; 11+ messages in thread
From: Florian Schmaus @ 2022-10-19 9:04 UTC (permalink / raw
To: gentoo-commits
commit: a0ef09913a37dcad16d28e9f5fa1e4f6a7cc5da7
Author: Florian Schmaus <flow <AT> gentoo <DOT> org>
AuthorDate: Wed Oct 19 09:03:58 2022 +0000
Commit: Florian Schmaus <flow <AT> gentoo <DOT> org>
CommitDate: Wed Oct 19 09:03:58 2022 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=a0ef0991
Xen 4.16.3-pre-patchset-0
Signed-off-by: Florian Schmaus <flow <AT> gentoo.org>
... => 0001-update-Xen-version-to-4.16.3-pre.patch | 14 +-
...p-unmap_domain_pirq-XSM-during-destructio.patch | 50 ----
...-Prevent-adding-mapping-when-domain-is-dy.patch | 62 +++++
...-Handle-preemption-when-freeing-intermedi.patch | 167 +++++++++++
...xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch | 63 -----
...e-to-use-IOMMU-with-reserved-CAP.ND-value.patch | 49 ----
...-option-to-skip-root-pagetable-removal-in.patch | 138 +++++++++
...just-monitor-table-related-error-handling.patch | 77 ++++++
...d-inadvertently-degrading-a-TLB-flush-to-.patch | 116 --------
...tolerate-failure-of-sh_set_toplevel_shado.patch | 76 +++++
...xen-build-Fix-dependency-for-the-MAP-rule.patch | 29 --
...evtchn-don-t-set-errno-to-negative-values.patch | 74 -----
...hadow-tolerate-failure-in-shadow_prealloc.patch | 279 +++++++++++++++++++
...-ctrl-don-t-set-errno-to-a-negative-value.patch | 36 ---
...-refuse-new-allocations-for-dying-domains.patch | 100 +++++++
...guest-don-t-set-errno-to-a-negative-value.patch | 32 ---
...ly-free-paging-pool-memory-for-dying-doma.patch | 115 ++++++++
...light-don-t-set-errno-to-a-negative-value.patch | 32 ---
...-free-the-paging-memory-pool-preemptively.patch | 181 ++++++++++++
...mmu-cleanup-iommu-related-domctl-handling.patch | 112 --------
...en-x86-p2m-Add-preemption-in-p2m_teardown.patch | 197 +++++++++++++
...-make-domctl-handler-tolerate-NULL-domain.patch | 36 ---
...s-Use-arch-specific-default-paging-memory.patch | 149 ++++++++++
...-disallow-device-assignment-to-PoD-guests.patch | 229 ---------------
...m-Construct-the-P2M-pages-pool-for-guests.patch | 189 +++++++++++++
...-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch | 121 --------
...xl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch | 108 ++++++++
0015-kconfig-detect-LD-implementation.patch | 46 ---
...ocate-and-free-P2M-pages-from-the-P2M-poo.patch | 289 +++++++++++++++++++
...ect-locking-on-transitive-grant-copy-erro.patch | 66 +++++
...-lld-do-not-generate-quoted-section-names.patch | 54 ----
...-Replace-deprecated-soundhw-on-QEMU-comma.patch | 112 ++++++++
...race-between-sending-an-I-O-and-domain-sh.patch | 142 ----------
...ess-GNU-ld-warning-about-RWX-load-segment.patch | 35 ---
...urface-suitable-value-in-EBX-of-XSTATE-su.patch | 44 +++
...ce-GNU-ld-warning-about-executable-stacks.patch | 35 ---
...ed-introduce-cpupool_update_node_affinity.patch | 257 +++++++++++++++++
...0-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch | 50 ----
...arve-out-memory-allocation-and-freeing-fr.patch | 263 ++++++++++++++++++
...llow-pci-phantom-to-mark-real-devices-as-.patch | 56 ----
0021-xen-sched-fix-cpu-hotplug.patch | 307 +++++++++++++++++++++
...orrect-PIE-related-option-s-in-EMBEDDED_E.patch | 58 ++++
0022-x86-pv-Clean-up-_get_page_type.patch | 180 ------------
...ore-minor-fix-of-the-migration-stream-doc.patch | 41 +++
...v-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch | 201 --------------
...troduce-_PAGE_-constants-for-memory-types.patch | 53 ----
0024-xen-gnttab-fix-gnttab_acquire_resource.patch | 69 +++++
...-change-the-cacheability-of-the-directmap.patch | 223 ---------------
...-VCPUOP_register_vcpu_time_memory_area-fo.patch | 59 ++++
...-Split-cache_flush-out-of-cache_writeback.patch | 294 --------------------
...-x86-vpmu-Fix-race-condition-in-vpmu_load.patch | 97 +++++++
...rk-around-CLFLUSH-ordering-on-older-parts.patch | 95 -------
...ck-and-flush-non-coherent-mappings-of-RAM.patch | 160 -----------
...unt-for-PGT_pae_xen_l2-in-recently-added-.patch | 37 ---
...rl-Make-VERW-flushing-runtime-conditional.patch | 258 -----------------
...rl-Enumeration-for-MMIO-Stale-Data-contro.patch | 98 -------
0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch | 187 -------------
...ork-around-bogus-gcc12-warning-in-hvm_gsi.patch | 52 ----
...i-dbgp-fix-selecting-n-th-ehci-controller.patch | 36 ---
0035-tools-xenstored-Harden-corrupt.patch | 44 ---
...rl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch | 93 -------
...rl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch | 234 ----------------
0038-libxc-fix-compilation-error-with-gcc13.patch | 33 ---
...rl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch | 32 ---
...-Extend-parse_boolean-to-signal-a-name-ma.patch | 87 ------
...rl-Add-fine-grained-cmdline-suboptions-fo.patch | 137 ---------
...rs-fix-build-of-xen-init-dom0-with-Werror.patch | 28 --
...-return-value-of-libxl__xs_directory-in-n.patch | 38 ---
...rl-Rework-spec_ctrl_flags-context-switchi.patch | 167 -----------
...rl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch | 110 --------
...rl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch | 97 -------
...ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch | 106 -------
0048-x86-spec-ctrl-Support-IBPB-on-entry.patch | 300 --------------------
0049-x86-cpuid-Enumeration-for-BTC_NO.patch | 106 -------
0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch | 106 -------
...rl-Mitigate-Branch-Type-Confusion-when-po.patch | 305 --------------------
info.txt | 6 +-
77 files changed, 3510 insertions(+), 5304 deletions(-)
diff --git a/0001-update-Xen-version-to-4.16.2-pre.patch b/0001-update-Xen-version-to-4.16.3-pre.patch
similarity index 59%
rename from 0001-update-Xen-version-to-4.16.2-pre.patch
rename to 0001-update-Xen-version-to-4.16.3-pre.patch
index 2e62c21..6ae690c 100644
--- a/0001-update-Xen-version-to-4.16.2-pre.patch
+++ b/0001-update-Xen-version-to-4.16.3-pre.patch
@@ -1,25 +1,25 @@
-From 5be9edb482ab20cf3e7acb05b511465294d1e19b Mon Sep 17 00:00:00 2001
+From 4aa32912ebeda8cb94d1c3941e7f1f0a2d4f921b Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Jun 2022 13:55:17 +0200
-Subject: [PATCH 01/51] update Xen version to 4.16.2-pre
+Date: Tue, 11 Oct 2022 14:49:41 +0200
+Subject: [PATCH 01/26] update Xen version to 4.16.3-pre
---
xen/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/Makefile b/xen/Makefile
-index 8abc71cf73aa..90a29782dbf4 100644
+index 76d0a3ff253f..8a403ee896cd 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -2,7 +2,7 @@
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
export XEN_SUBVERSION = 16
--export XEN_EXTRAVERSION ?= .1$(XEN_VENDORVERSION)
-+export XEN_EXTRAVERSION ?= .2-pre$(XEN_VENDORVERSION)
+-export XEN_EXTRAVERSION ?= .2$(XEN_VENDORVERSION)
++export XEN_EXTRAVERSION ?= .3-pre$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
--
-2.35.1
+2.37.3
diff --git a/0002-x86-irq-skip-unmap_domain_pirq-XSM-during-destructio.patch b/0002-x86-irq-skip-unmap_domain_pirq-XSM-during-destructio.patch
deleted file mode 100644
index 0ba090e..0000000
--- a/0002-x86-irq-skip-unmap_domain_pirq-XSM-during-destructio.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From b58fb6e81bd55b6bd946abc3070770f7994c9ef9 Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jandryuk@gmail.com>
-Date: Tue, 7 Jun 2022 13:55:39 +0200
-Subject: [PATCH 02/51] x86/irq: skip unmap_domain_pirq XSM during destruction
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
-complete_domain_destroy as an RCU callback. The source context was an
-unexpected, random domain. Since this is a xen-internal operation,
-going through the XSM hook is inappropriate.
-
-Check d->is_dying and skip the XSM hook when set since this is a cleanup
-operation for a domain being destroyed.
-
-Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
-Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 2e6f95a942d1927a53f077c301db0b799c54c05a
-master date: 2022-04-08 14:51:52 +0200
----
- xen/arch/x86/irq.c | 10 ++++++++--
- 1 file changed, 8 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index 67cbf6b979dc..47b86af5dce9 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -2342,8 +2342,14 @@ int unmap_domain_pirq(struct domain *d, int pirq)
- nr = msi_desc->msi.nvec;
- }
-
-- ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
-- msi_desc ? msi_desc->dev : NULL);
-+ /*
-+ * When called by complete_domain_destroy via RCU, current is a random
-+ * domain. Skip the XSM check since this is a Xen-initiated action.
-+ */
-+ if ( !d->is_dying )
-+ ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
-+ msi_desc ? msi_desc->dev : NULL);
-+
- if ( ret )
- goto done;
-
---
-2.35.1
-
diff --git a/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch b/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch
new file mode 100644
index 0000000..fecc260
--- /dev/null
+++ b/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch
@@ -0,0 +1,62 @@
+From 8d9531a3421dad2b0012e09e6f41d5274e162064 Mon Sep 17 00:00:00 2001
+From: Julien Grall <jgrall@amazon.com>
+Date: Tue, 11 Oct 2022 14:52:13 +0200
+Subject: [PATCH 02/26] xen/arm: p2m: Prevent adding mapping when domain is
+ dying
+
+During the domain destroy process, the domain will still be accessible
+until it is fully destroyed. So is the P2M, because we don't bail
+out early if is_dying is non-zero. If a domain has permission to
+modify the other domain's P2M (i.e. dom0, or a stubdomain), then
+foreign mappings can be added past relinquish_p2m_mapping().
+
+Therefore, we need to prevent mappings from being added when the domain
+is dying. This commit prevents such additions by adding the
+d->is_dying check to p2m_set_entry(). This commit also enhances the
+check in relinquish_p2m_mapping() to make sure that no mappings can
+be added in the P2M after the P2M lock is released.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Signed-off-by: Henry Wang <Henry.Wang@arm.com>
+Tested-by: Henry Wang <Henry.Wang@arm.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+master commit: 3ebe773293e3b945460a3d6f54f3b91915397bab
+master date: 2022-10-11 14:20:18 +0200
+---
+ xen/arch/arm/p2m.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
+index 3349b464a39e..1affdafadbeb 100644
+--- a/xen/arch/arm/p2m.c
++++ b/xen/arch/arm/p2m.c
+@@ -1093,6 +1093,15 @@ int p2m_set_entry(struct p2m_domain *p2m,
+ {
+ int rc = 0;
+
++ /*
++ * Any reference taken by the P2M mappings (e.g. foreign mapping) will
++ * be dropped in relinquish_p2m_mapping(). As the P2M will still
++ * be accessible after, we need to prevent mappings from being added when the
++ * domain is dying.
++ */
++ if ( unlikely(p2m->domain->is_dying) )
++ return -ENOMEM;
++
+ while ( nr )
+ {
+ unsigned long mask;
+@@ -1610,6 +1619,8 @@ int relinquish_p2m_mapping(struct domain *d)
+ unsigned int order;
+ gfn_t start, end;
+
++ BUG_ON(!d->is_dying);
++ /* No mappings can be added in the P2M after the P2M lock is released. */
+ p2m_write_lock(p2m);
+
+ start = p2m->lowest_mapped_gfn;
+--
+2.37.3
+
diff --git a/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch b/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch
new file mode 100644
index 0000000..3190db8
--- /dev/null
+++ b/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch
@@ -0,0 +1,167 @@
+From 937fdbad5180440888f1fcee46299103327efa90 Mon Sep 17 00:00:00 2001
+From: Julien Grall <jgrall@amazon.com>
+Date: Tue, 11 Oct 2022 14:52:27 +0200
+Subject: [PATCH 03/26] xen/arm: p2m: Handle preemption when freeing
+ intermediate page tables
+
+At the moment the P2M page tables will be freed when the domain structure
+is freed without any preemption. As the P2M is quite large, iterating
+through this may take more time than is reasonable without intermediate
+preemption (to run softirqs and perhaps the scheduler).
+
+Split p2m_teardown() in two parts: one preemptible and called when
+relinquishing the resources, the other one non-preemptible and called
+when freeing the domain structure.
+
+As we are now freeing the P2M pages early, we also need to prevent
+further allocation if someone calls p2m_set_entry() past p2m_teardown()
+(I wasn't able to prove this will never happen). This is done by
+the domain->is_dying check added to p2m_set_entry() in the previous patch.
+
+Similarly, we want to make sure that no one can access the freed
+pages. Therefore the root is cleared before freeing the pages.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Signed-off-by: Henry Wang <Henry.Wang@arm.com>
+Tested-by: Henry Wang <Henry.Wang@arm.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+master commit: 3202084566bba0ef0c45caf8c24302f83d92f9c8
+master date: 2022-10-11 14:20:56 +0200
+---
+ xen/arch/arm/domain.c | 10 +++++++--
+ xen/arch/arm/p2m.c | 47 ++++++++++++++++++++++++++++++++++++---
+ xen/include/asm-arm/p2m.h | 13 +++++++++--
+ 3 files changed, 63 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
+index 96e1b235501d..2694c39127c5 100644
+--- a/xen/arch/arm/domain.c
++++ b/xen/arch/arm/domain.c
+@@ -789,10 +789,10 @@ fail:
+ void arch_domain_destroy(struct domain *d)
+ {
+ /* IOMMU page table is shared with P2M, always call
+- * iommu_domain_destroy() before p2m_teardown().
++ * iommu_domain_destroy() before p2m_final_teardown().
+ */
+ iommu_domain_destroy(d);
+- p2m_teardown(d);
++ p2m_final_teardown(d);
+ domain_vgic_free(d);
+ domain_vuart_free(d);
+ free_xenheap_page(d->shared_info);
+@@ -996,6 +996,7 @@ enum {
+ PROG_xen,
+ PROG_page,
+ PROG_mapping,
++ PROG_p2m,
+ PROG_done,
+ };
+
+@@ -1056,6 +1057,11 @@ int domain_relinquish_resources(struct domain *d)
+ if ( ret )
+ return ret;
+
++ PROGRESS(p2m):
++ ret = p2m_teardown(d);
++ if ( ret )
++ return ret;
++
+ PROGRESS(done):
+ break;
+
+diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
+index 1affdafadbeb..27418ee5ee98 100644
+--- a/xen/arch/arm/p2m.c
++++ b/xen/arch/arm/p2m.c
+@@ -1527,17 +1527,58 @@ static void p2m_free_vmid(struct domain *d)
+ spin_unlock(&vmid_alloc_lock);
+ }
+
+-void p2m_teardown(struct domain *d)
++int p2m_teardown(struct domain *d)
+ {
+ struct p2m_domain *p2m = p2m_get_hostp2m(d);
++ unsigned long count = 0;
+ struct page_info *pg;
++ unsigned int i;
++ int rc = 0;
++
++ p2m_write_lock(p2m);
++
++ /*
++ * We are about to free the intermediate page-tables, so clear the
++ * root to prevent any walk to use them.
++ */
++ for ( i = 0; i < P2M_ROOT_PAGES; i++ )
++ clear_and_clean_page(p2m->root + i);
++
++ /*
++ * The domain will not be scheduled anymore, so in theory we should
++ * not need to flush the TLBs. Do it for safety purpose.
++ *
++ * Note that all the devices have already been de-assigned. So we don't
++ * need to flush the IOMMU TLB here.
++ */
++ p2m_force_tlb_flush_sync(p2m);
++
++ while ( (pg = page_list_remove_head(&p2m->pages)) )
++ {
++ free_domheap_page(pg);
++ count++;
++ /* Arbitrarily preempt every 512 iterations */
++ if ( !(count % 512) && hypercall_preempt_check() )
++ {
++ rc = -ERESTART;
++ break;
++ }
++ }
++
++ p2m_write_unlock(p2m);
++
++ return rc;
++}
++
++void p2m_final_teardown(struct domain *d)
++{
++ struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+ /* p2m not actually initialized */
+ if ( !p2m->domain )
+ return;
+
+- while ( (pg = page_list_remove_head(&p2m->pages)) )
+- free_domheap_page(pg);
++ ASSERT(page_list_empty(&p2m->pages));
+
+ if ( p2m->root )
+ free_domheap_pages(p2m->root, P2M_ROOT_ORDER);
+diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
+index 8f11d9c97b5d..b3ba83283e11 100644
+--- a/xen/include/asm-arm/p2m.h
++++ b/xen/include/asm-arm/p2m.h
+@@ -192,8 +192,17 @@ void setup_virt_paging(void);
+ /* Init the datastructures for later use by the p2m code */
+ int p2m_init(struct domain *d);
+
+-/* Return all the p2m resources to Xen. */
+-void p2m_teardown(struct domain *d);
++/*
++ * The P2M resources are freed in two parts:
++ * - p2m_teardown() will be called when relinquishing the resources. It
++ *   will free large resources (e.g. intermediate page-tables) that
++ *   require preemption.
++ * - p2m_final_teardown() will be called when the domain struct is
++ *   freed. This *cannot* be preempted and therefore only small
++ *   resources should be freed here.
++ */
++int p2m_teardown(struct domain *d);
++void p2m_final_teardown(struct domain *d);
+
+ /*
+ * Remove mapping refcount on each mapping page in the p2m
+--
+2.37.3
+
diff --git a/0003-xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch b/0003-xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch
deleted file mode 100644
index fa1443c..0000000
--- a/0003-xen-fix-XEN_DOMCTL_gdbsx_guestmemio-crash.patch
+++ /dev/null
@@ -1,63 +0,0 @@
-From 6c6bbfdff9374ef41f84c4ebed7b8a7a40767ef6 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 7 Jun 2022 13:56:54 +0200
-Subject: [PATCH 03/51] xen: fix XEN_DOMCTL_gdbsx_guestmemio crash
-
-A hypervisor built without CONFIG_GDBSX will crash in case the
-XEN_DOMCTL_gdbsx_guestmemio domctl is being called, as the call will
-end up in iommu_do_domctl() with d == NULL:
-
- (XEN) CPU: 6
- (XEN) RIP: e008:[<ffff82d040269984>] iommu_do_domctl+0x4/0x30
- (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v0)
- (XEN) rax: 00000000000003e8 rbx: ffff830856277ef8 rcx: ffff830856277fff
- ...
- (XEN) Xen call trace:
- (XEN) [<ffff82d040269984>] R iommu_do_domctl+0x4/0x30
- (XEN) [<ffff82d04035cd5f>] S arch_do_domctl+0x7f/0x2330
- (XEN) [<ffff82d040239e46>] S do_domctl+0xe56/0x1930
- (XEN) [<ffff82d040238ff0>] S do_domctl+0/0x1930
- (XEN) [<ffff82d0402f8c59>] S pv_hypercall+0x99/0x110
- (XEN) [<ffff82d0402f5161>] S arch/x86/pv/domain.c#_toggle_guest_pt+0x11/0x90
- (XEN) [<ffff82d040366288>] S lstar_enter+0x128/0x130
- (XEN)
- (XEN) Pagetable walk from 0000000000000144:
- (XEN) L4[0x000] = 0000000000000000 ffffffffffffffff
- (XEN)
- (XEN) ****************************************
- (XEN) Panic on CPU 6:
- (XEN) FATAL PAGE FAULT
- (XEN) [error_code=0000]
- (XEN) Faulting linear address: 0000000000000144
- (XEN) ****************************************
-
-It used to be permitted to pass DOMID_IDLE to dbg_rw_mem(), which is why the
-special case skipping the domid checks exists. Now that it is only permitted
-to pass proper domids, remove the special case, making 'd' always valid.
-
-Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com>
-Fixes: e726a82ca0dc ("xen: make gdbsx support configurable")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: f00daf1fb3213a9b0335d9dcd90fe9cb5c02b7a9
-master date: 2022-04-19 17:07:08 +0100
----
- xen/common/domctl.c | 1 -
- 1 file changed, 1 deletion(-)
-
-diff --git a/xen/common/domctl.c b/xen/common/domctl.c
-index 271862ae587f..419e4070f59d 100644
---- a/xen/common/domctl.c
-+++ b/xen/common/domctl.c
-@@ -304,7 +304,6 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
- if ( op->domain == DOMID_INVALID )
- {
- case XEN_DOMCTL_createdomain:
-- case XEN_DOMCTL_gdbsx_guestmemio:
- d = NULL;
- break;
- }
---
-2.35.1
-
diff --git a/0004-VT-d-refuse-to-use-IOMMU-with-reserved-CAP.ND-value.patch b/0004-VT-d-refuse-to-use-IOMMU-with-reserved-CAP.ND-value.patch
deleted file mode 100644
index a4d229a..0000000
--- a/0004-VT-d-refuse-to-use-IOMMU-with-reserved-CAP.ND-value.patch
+++ /dev/null
@@ -1,49 +0,0 @@
-From b378ee56c7e0bb5eeb35dcc55b3d29e5f50eb566 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Jun 2022 13:58:16 +0200
-Subject: [PATCH 04/51] VT-d: refuse to use IOMMU with reserved CAP.ND value
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The field taking the value 7 (resulting in 18-bit DIDs when using the
-calculation in cap_ndoms(), when the DID fields are only 16 bits wide)
-is reserved. Instead of misbehaving in case we would encounter such an
-IOMMU, refuse to use it.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: a1545fbf45c689aff39ce76a6eaa609d32ef72a7
-master date: 2022-04-20 10:54:26 +0200
----
- xen/drivers/passthrough/vtd/iommu.c | 4 +++-
- 1 file changed, 3 insertions(+), 1 deletion(-)
-
-diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
-index 93dd8aa643aa..8975c1de61bc 100644
---- a/xen/drivers/passthrough/vtd/iommu.c
-+++ b/xen/drivers/passthrough/vtd/iommu.c
-@@ -1279,8 +1279,11 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
-
- quirk_iommu_caps(iommu);
-
-+ nr_dom = cap_ndoms(iommu->cap);
-+
- if ( cap_fault_reg_offset(iommu->cap) +
- cap_num_fault_regs(iommu->cap) * PRIMARY_FAULT_REG_LEN >= PAGE_SIZE ||
-+ ((nr_dom - 1) >> 16) /* I.e. cap.nd > 6 */ ||
- ecap_iotlb_offset(iommu->ecap) >= PAGE_SIZE )
- {
- printk(XENLOG_ERR VTDPREFIX "IOMMU: unsupported\n");
-@@ -1305,7 +1308,6 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
- vtd_ops.sync_cache = sync_cache;
-
- /* allocate domain id bitmap */
-- nr_dom = cap_ndoms(iommu->cap);
- iommu->domid_bitmap = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_dom));
- if ( !iommu->domid_bitmap )
- return -ENOMEM;
---
-2.35.1
-
diff --git a/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch b/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch
new file mode 100644
index 0000000..b3edbd9
--- /dev/null
+++ b/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch
@@ -0,0 +1,138 @@
+From 8fc19c143b8aa563077f3d5c46fcc0a54dc04f35 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 11 Oct 2022 14:52:39 +0200
+Subject: [PATCH 04/26] x86/p2m: add option to skip root pagetable removal in
+ p2m_teardown()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Add a new parameter to p2m_teardown() in order to select whether the
+root page table should also be freed. Note that all users are
+adjusted to pass the parameter to remove the root page tables, so
+behavior is not modified.
+
+No functional change intended.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Suggested-by: Julien Grall <julien@xen.org>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Tim Deegan <tim@xen.org>
+master commit: 1df52a270225527ae27bfa2fc40347bf93b78357
+master date: 2022-10-11 14:21:23 +0200
+---
+ xen/arch/x86/mm/hap/hap.c | 6 +++---
+ xen/arch/x86/mm/p2m.c | 20 ++++++++++++++++----
+ xen/arch/x86/mm/shadow/common.c | 4 ++--
+ xen/include/asm-x86/p2m.h | 2 +-
+ 4 files changed, 22 insertions(+), 10 deletions(-)
+
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index 47a7487fa7a3..a8f5a19da917 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -541,18 +541,18 @@ void hap_final_teardown(struct domain *d)
+ }
+
+ for ( i = 0; i < MAX_ALTP2M; i++ )
+- p2m_teardown(d->arch.altp2m_p2m[i]);
++ p2m_teardown(d->arch.altp2m_p2m[i], true);
+ }
+
+ /* Destroy nestedp2m's first */
+ for (i = 0; i < MAX_NESTEDP2M; i++) {
+- p2m_teardown(d->arch.nested_p2m[i]);
++ p2m_teardown(d->arch.nested_p2m[i], true);
+ }
+
+ if ( d->arch.paging.hap.total_pages != 0 )
+ hap_teardown(d, NULL);
+
+- p2m_teardown(p2m_get_hostp2m(d));
++ p2m_teardown(p2m_get_hostp2m(d), true);
+ /* Free any memory that the p2m teardown released */
+ paging_lock(d);
+ hap_set_allocation(d, 0, NULL);
+diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
+index def1695cf00b..aba4f17cbe12 100644
+--- a/xen/arch/x86/mm/p2m.c
++++ b/xen/arch/x86/mm/p2m.c
+@@ -749,11 +749,11 @@ int p2m_alloc_table(struct p2m_domain *p2m)
+ * hvm fixme: when adding support for pvh non-hardware domains, this path must
+ * cleanup any foreign p2m types (release refcnts on them).
+ */
+-void p2m_teardown(struct p2m_domain *p2m)
++void p2m_teardown(struct p2m_domain *p2m, bool remove_root)
+ /* Return all the p2m pages to Xen.
+ * We know we don't have any extra mappings to these pages */
+ {
+- struct page_info *pg;
++ struct page_info *pg, *root_pg = NULL;
+ struct domain *d;
+
+ if (p2m == NULL)
+@@ -763,10 +763,22 @@ void p2m_teardown(struct p2m_domain *p2m)
+
+ p2m_lock(p2m);
+ ASSERT(atomic_read(&d->shr_pages) == 0);
+- p2m->phys_table = pagetable_null();
++
++ if ( remove_root )
++ p2m->phys_table = pagetable_null();
++ else if ( !pagetable_is_null(p2m->phys_table) )
++ {
++ root_pg = pagetable_get_page(p2m->phys_table);
++ clear_domain_page(pagetable_get_mfn(p2m->phys_table));
++ }
+
+ while ( (pg = page_list_remove_head(&p2m->pages)) )
+- d->arch.paging.free_page(d, pg);
++ if ( pg != root_pg )
++ d->arch.paging.free_page(d, pg);
++
++ if ( root_pg )
++ page_list_add(root_pg, &p2m->pages);
++
+ p2m_unlock(p2m);
+ }
+
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 8c1b041f7135..8c5baba9544d 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -2701,7 +2701,7 @@ int shadow_enable(struct domain *d, u32 mode)
+ paging_unlock(d);
+ out_unlocked:
+ if ( rv != 0 && !pagetable_is_null(p2m_get_pagetable(p2m)) )
+- p2m_teardown(p2m);
++ p2m_teardown(p2m, true);
+ if ( rv != 0 && pg != NULL )
+ {
+ pg->count_info &= ~PGC_count_mask;
+@@ -2866,7 +2866,7 @@ void shadow_final_teardown(struct domain *d)
+ shadow_teardown(d, NULL);
+
+ /* It is now safe to pull down the p2m map. */
+- p2m_teardown(p2m_get_hostp2m(d));
++ p2m_teardown(p2m_get_hostp2m(d), true);
+ /* Free any shadow memory that the p2m teardown released */
+ paging_lock(d);
+ shadow_set_allocation(d, 0, NULL);
+diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
+index f2af7a746ced..c3c16748e7d5 100644
+--- a/xen/include/asm-x86/p2m.h
++++ b/xen/include/asm-x86/p2m.h
+@@ -574,7 +574,7 @@ int p2m_init(struct domain *d);
+ int p2m_alloc_table(struct p2m_domain *p2m);
+
+ /* Return all the p2m resources to Xen. */
+-void p2m_teardown(struct p2m_domain *p2m);
++void p2m_teardown(struct p2m_domain *p2m, bool remove_root);
+ void p2m_final_teardown(struct domain *d);
+
+ /* Add a page to a domain's p2m table */
+--
+2.37.3
+
diff --git a/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch b/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch
new file mode 100644
index 0000000..33ab1ad
--- /dev/null
+++ b/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch
@@ -0,0 +1,77 @@
+From 3422c19d85a3d23a9d798eafb739ffb8865522d2 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 11 Oct 2022 14:52:59 +0200
+Subject: [PATCH 05/26] x86/HAP: adjust monitor table related error handling
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+hap_make_monitor_table() will return INVALID_MFN if it encounters an
+error condition, but hap_update_paging_modes() wasn’t handling this
+value, resulting in an inappropriate value being stored in
+monitor_table. This would subsequently misguide at least
+hap_vcpu_teardown(). Avoid this by bailing early.
+
+Further, when a domain has/was already crashed or (perhaps less
+important as there's no such path known to lead here) is already dying,
+avoid calling domain_crash() on it again - that's at best confusing.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 5b44a61180f4f2e4f490a28400c884dd357ff45d
+master date: 2022-10-11 14:21:56 +0200
+---
+ xen/arch/x86/mm/hap/hap.c | 14 ++++++++++++--
+ 1 file changed, 12 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index a8f5a19da917..d75dc2b9ed3d 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -39,6 +39,7 @@
+ #include <asm/domain.h>
+ #include <xen/numa.h>
+ #include <asm/hvm/nestedhvm.h>
++#include <public/sched.h>
+
+ #include "private.h"
+
+@@ -405,8 +406,13 @@ static mfn_t hap_make_monitor_table(struct vcpu *v)
+ return m4mfn;
+
+ oom:
+- printk(XENLOG_G_ERR "out of memory building monitor pagetable\n");
+- domain_crash(d);
++ if ( !d->is_dying &&
++ (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
++ {
++ printk(XENLOG_G_ERR "%pd: out of memory building monitor pagetable\n",
++ d);
++ domain_crash(d);
++ }
+ return INVALID_MFN;
+ }
+
+@@ -766,6 +772,9 @@ static void hap_update_paging_modes(struct vcpu *v)
+ if ( pagetable_is_null(v->arch.hvm.monitor_table) )
+ {
+ mfn_t mmfn = hap_make_monitor_table(v);
++
++ if ( mfn_eq(mmfn, INVALID_MFN) )
++ goto unlock;
+ v->arch.hvm.monitor_table = pagetable_from_mfn(mmfn);
+ make_cr3(v, mmfn);
+ hvm_update_host_cr3(v);
+@@ -774,6 +783,7 @@ static void hap_update_paging_modes(struct vcpu *v)
+ /* CR3 is effectively updated by a mode change. Flush ASIDs, etc. */
+ hap_update_cr3(v, 0, false);
+
++ unlock:
+ paging_unlock(d);
+ put_gfn(d, cr3_gfn);
+ }
+--
+2.37.3
+
diff --git a/0005-x86-mm-avoid-inadvertently-degrading-a-TLB-flush-to-.patch b/0005-x86-mm-avoid-inadvertently-degrading-a-TLB-flush-to-.patch
deleted file mode 100644
index 45a1825..0000000
--- a/0005-x86-mm-avoid-inadvertently-degrading-a-TLB-flush-to-.patch
+++ /dev/null
@@ -1,116 +0,0 @@
-From 7c003ab4a398ff4ddd54d15d4158cffb463134cc Mon Sep 17 00:00:00 2001
-From: David Vrabel <dvrabel@amazon.co.uk>
-Date: Tue, 7 Jun 2022 13:59:31 +0200
-Subject: [PATCH 05/51] x86/mm: avoid inadvertently degrading a TLB flush to
- local only
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-If the direct map is incorrectly modified with interrupts disabled,
-the required TLB flushes are degraded to flushing the local CPU only.
-
-This could lead to very hard to diagnose problems as different CPUs will
-end up with different views of memory. Although, no such issues have yet
-been identified.
-
-Change the check in the flush_area() macro to look at system_state
-instead. This defers the switch from local to all later in the boot
-(see xen/arch/x86/setup.c:__start_xen()). This is fine because
-additional PCPUs are not brought up until after the system state is
-SYS_STATE_smp_boot.
-
-Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-
-x86/flushtlb: remove flush_area check on system state
-
-Booting with Shadow Stacks leads to the following assert on a debug
-hypervisor:
-
-Assertion 'local_irq_is_enabled()' failed at arch/x86/smp.c:265
-----[ Xen-4.17.0-10.24-d x86_64 debug=y Not tainted ]----
-CPU: 0
-RIP: e008:[<ffff82d040345300>] flush_area_mask+0x40/0x13e
-[...]
-Xen call trace:
- [<ffff82d040345300>] R flush_area_mask+0x40/0x13e
- [<ffff82d040338a40>] F modify_xen_mappings+0xc5/0x958
- [<ffff82d0404474f9>] F arch/x86/alternative.c#_alternative_instructions+0xb7/0xb9
- [<ffff82d0404476cc>] F alternative_branches+0xf/0x12
- [<ffff82d04044e37d>] F __start_xen+0x1ef4/0x2776
- [<ffff82d040203344>] F __high_start+0x94/0xa0
-
-This is due to SYS_STATE_smp_boot being set before calling
-alternative_branches(), and the flush in modify_xen_mappings() then
-using flush_area_all() with interrupts disabled. Note that
-alternative_branches() is called before APs are started, so the flush
-must be a local one (and indeed the cpumask passed to
-flush_area_mask() just contains one CPU).
-
-Take the opportunity to simplify a bit the logic and make flush_area()
-an alias of flush_area_all() in mm.c, taking into account that
-cpu_online_map just contains the BSP before APs are started. This
-requires widening the assert in flush_area_mask() to allow being
-called with interrupts disabled as long as it's strictly a local only
-flush.
-
-The overall result is that a conditional can be removed from
-flush_area().
-
-While there also introduce an ASSERT to check that a vCPU state flush
-is not issued for the local CPU only.
-
-Fixes: 78e072bc37 ('x86/mm: avoid inadvertently degrading a TLB flush to local only')
-Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 78e072bc375043e81691a59454e09f0b38241ddd
-master date: 2022-04-20 10:55:01 +0200
-master commit: 9f735ee4903f1b9f1966bb4ba5b5616b03ae08b5
-master date: 2022-05-25 11:09:46 +0200
----
- xen/arch/x86/mm.c | 10 ++--------
- xen/arch/x86/smp.c | 5 ++++-
- 2 files changed, 6 insertions(+), 9 deletions(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index 4d799032dc82..e222d9aa98ee 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -5051,14 +5051,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
- #define l1f_to_lNf(f) (((f) & _PAGE_PRESENT) ? ((f) | _PAGE_PSE) : (f))
- #define lNf_to_l1f(f) (((f) & _PAGE_PRESENT) ? ((f) & ~_PAGE_PSE) : (f))
-
--/*
-- * map_pages_to_xen() can be called with interrupts disabled during
-- * early bootstrap. In this case it is safe to use flush_area_local()
-- * and avoid locking because only the local CPU is online.
-- */
--#define flush_area(v,f) (!local_irq_is_enabled() ? \
-- flush_area_local((const void *)v, f) : \
-- flush_area_all((const void *)v, f))
-+/* flush_area_all() can be used prior to any other CPU being online. */
-+#define flush_area(v, f) flush_area_all((const void *)(v), f)
-
- #define L3T_INIT(page) (page) = ZERO_BLOCK_PTR
-
-diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
-index eef0f9c6cbf4..3556ec116608 100644
---- a/xen/arch/x86/smp.c
-+++ b/xen/arch/x86/smp.c
-@@ -262,7 +262,10 @@ void flush_area_mask(const cpumask_t *mask, const void *va, unsigned int flags)
- {
- unsigned int cpu = smp_processor_id();
-
-- ASSERT(local_irq_is_enabled());
-+ /* Local flushes can be performed with interrupts disabled. */
-+ ASSERT(local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu)));
-+ /* Exclude use of FLUSH_VCPU_STATE for the local CPU. */
-+ ASSERT(!cpumask_test_cpu(cpu, mask) || !(flags & FLUSH_VCPU_STATE));
-
- if ( (flags & ~(FLUSH_VCPU_STATE | FLUSH_ORDER_MASK)) &&
- cpumask_test_cpu(cpu, mask) )
---
-2.35.1
-
diff --git a/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch b/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch
new file mode 100644
index 0000000..bbae48b
--- /dev/null
+++ b/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch
@@ -0,0 +1,76 @@
+From 40e9daf6b56ae49bda3ba4e254ccf0e998e52a8c Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 11 Oct 2022 14:53:12 +0200
+Subject: [PATCH 06/26] x86/shadow: tolerate failure of
+ sh_set_toplevel_shadow()
+
+Subsequently sh_set_toplevel_shadow() will be adjusted to install a
+blank entry in case prealloc fails. There are, in fact, pre-existing
+error paths which would put in place a blank entry. The 4- and 2-level
+code in sh_update_cr3(), however, assume the top level entry to be
+valid.
+
+Hence bail from the function in the unlikely event that it's not. Note
+that 3-level logic works differently: In particular a guest is free to
+supply a PDPTR pointing at 4 non-present (or otherwise deemed invalid)
+entries. The guest will crash, but we already cope with that.
+
+Really mfn_valid() is likely wrong to use in sh_set_toplevel_shadow(),
+and it should instead be !mfn_eq(gmfn, INVALID_MFN). Avoid such a change
+in security context, but add a respective assertion.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Tim Deegan <tim@xen.org>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: eac000978c1feb5a9ee3236ab0c0da9a477e5336
+master date: 2022-10-11 14:22:24 +0200
+---
+ xen/arch/x86/mm/shadow/common.c | 1 +
+ xen/arch/x86/mm/shadow/multi.c | 10 ++++++++++
+ 2 files changed, 11 insertions(+)
+
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 8c5baba9544d..00e520cbd05b 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -2516,6 +2516,7 @@ void sh_set_toplevel_shadow(struct vcpu *v,
+ /* Now figure out the new contents: is this a valid guest MFN? */
+ if ( !mfn_valid(gmfn) )
+ {
++ ASSERT(mfn_eq(gmfn, INVALID_MFN));
+ new_entry = pagetable_null();
+ goto install_new_entry;
+ }
+diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
+index 7b8f4dd13b03..2ff78fe3362c 100644
+--- a/xen/arch/x86/mm/shadow/multi.c
++++ b/xen/arch/x86/mm/shadow/multi.c
+@@ -3312,6 +3312,11 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
+ if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
+ guest_flush_tlb_mask(d, d->dirty_cpumask);
+ sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
++ if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
++ {
++ ASSERT(d->is_dying || d->is_shutting_down);
++ return;
++ }
+ if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
+ {
+ mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
+@@ -3370,6 +3375,11 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
+ if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
+ guest_flush_tlb_mask(d, d->dirty_cpumask);
+ sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
++ if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
++ {
++ ASSERT(d->is_dying || d->is_shutting_down);
++ return;
++ }
+ #else
+ #error This should never happen
+ #endif
+--
+2.37.3
+
diff --git a/0006-xen-build-Fix-dependency-for-the-MAP-rule.patch b/0006-xen-build-Fix-dependency-for-the-MAP-rule.patch
deleted file mode 100644
index 7eb13cd..0000000
--- a/0006-xen-build-Fix-dependency-for-the-MAP-rule.patch
+++ /dev/null
@@ -1,29 +0,0 @@
-From 4bb8c34ba4241c2bf7845cd8b80c17530dbfb085 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 7 Jun 2022 14:00:09 +0200
-Subject: [PATCH 06/51] xen/build: Fix dependency for the MAP rule
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: e1e72198213b80b7a82bdc90f96ed05ae4f53e20
-master date: 2022-04-20 19:10:59 +0100
----
- xen/Makefile | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 90a29782dbf4..ce4eca3ee4d7 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -507,7 +507,7 @@ cscope:
- cscope -k -b -q
-
- .PHONY: _MAP
--_MAP:
-+_MAP: $(TARGET)
- $(NM) -n $(TARGET)-syms | grep -v '\(compiled\)\|\(\.o$$\)\|\( [aUw] \)\|\(\.\.ng$$\)\|\(LASH[RL]DI\)' > System.map
-
- %.o %.i %.s: %.c FORCE
---
-2.35.1
-
diff --git a/0007-tools-libs-evtchn-don-t-set-errno-to-negative-values.patch b/0007-tools-libs-evtchn-don-t-set-errno-to-negative-values.patch
deleted file mode 100644
index ed98922..0000000
--- a/0007-tools-libs-evtchn-don-t-set-errno-to-negative-values.patch
+++ /dev/null
@@ -1,74 +0,0 @@
-From 13a29f3756bc4cab96c59f46c3875b483553fb8f Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 7 Jun 2022 14:00:31 +0200
-Subject: [PATCH 07/51] tools/libs/evtchn: don't set errno to negative values
-
-Setting errno to a negative value makes no sense.
-
-Fixes: 6b6500b3cbaa ("tools/libs/evtchn: Add support for restricting a handle")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 60245b71c1cd001686fa7b7a26869cbcb80d074c
-master date: 2022-04-22 20:39:34 +0100
----
- tools/libs/evtchn/freebsd.c | 2 +-
- tools/libs/evtchn/minios.c | 2 +-
- tools/libs/evtchn/netbsd.c | 2 +-
- tools/libs/evtchn/solaris.c | 2 +-
- 4 files changed, 4 insertions(+), 4 deletions(-)
-
-diff --git a/tools/libs/evtchn/freebsd.c b/tools/libs/evtchn/freebsd.c
-index 7427ab240860..fa17a0f8dbb5 100644
---- a/tools/libs/evtchn/freebsd.c
-+++ b/tools/libs/evtchn/freebsd.c
-@@ -58,7 +58,7 @@ int osdep_evtchn_close(xenevtchn_handle *xce)
-
- int osdep_evtchn_restrict(xenevtchn_handle *xce, domid_t domid)
- {
-- errno = -EOPNOTSUPP;
-+ errno = EOPNOTSUPP;
-
- return -1;
- }
-diff --git a/tools/libs/evtchn/minios.c b/tools/libs/evtchn/minios.c
-index e5dfdc5ef52e..c0bd5429eea2 100644
---- a/tools/libs/evtchn/minios.c
-+++ b/tools/libs/evtchn/minios.c
-@@ -97,7 +97,7 @@ int osdep_evtchn_close(xenevtchn_handle *xce)
-
- int osdep_evtchn_restrict(xenevtchn_handle *xce, domid_t domid)
- {
-- errno = -EOPNOTSUPP;
-+ errno = EOPNOTSUPP;
-
- return -1;
- }
-diff --git a/tools/libs/evtchn/netbsd.c b/tools/libs/evtchn/netbsd.c
-index 1cebc21ffce0..56409513bc23 100644
---- a/tools/libs/evtchn/netbsd.c
-+++ b/tools/libs/evtchn/netbsd.c
-@@ -53,7 +53,7 @@ int osdep_evtchn_close(xenevtchn_handle *xce)
-
- int osdep_evtchn_restrict(xenevtchn_handle *xce, domid_t domid)
- {
-- errno = -EOPNOTSUPP;
-+ errno = EOPNOTSUPP;
-
- return -1;
- }
-diff --git a/tools/libs/evtchn/solaris.c b/tools/libs/evtchn/solaris.c
-index df9579df1778..beaa7721425f 100644
---- a/tools/libs/evtchn/solaris.c
-+++ b/tools/libs/evtchn/solaris.c
-@@ -53,7 +53,7 @@ int osdep_evtchn_close(xenevtchn_handle *xce)
-
- int osdep_evtchn_restrict(xenevtchn_handle *xce, domid_t domid)
- {
-- errno = -EOPNOTSUPP;
-+ errno = EOPNOTSUPP;
- return -1;
- }
-
---
-2.35.1
-
diff --git a/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch b/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch
new file mode 100644
index 0000000..5e2f8ab
--- /dev/null
+++ b/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch
@@ -0,0 +1,279 @@
+From 28d3f677ec97c98154311f64871ac48762cf980a Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 11 Oct 2022 14:53:27 +0200
+Subject: [PATCH 07/26] x86/shadow: tolerate failure in shadow_prealloc()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Prevent _shadow_prealloc() from calling BUG() when unable to fulfill
+the pre-allocation and instead return true/false. Modify
+shadow_prealloc() to crash the domain on allocation failure (if the
+domain is not already dying), as shadow cannot operate normally after
+that. Modify callers to also gracefully handle {_,}shadow_prealloc()
+failing to fulfill the request.
+
+Note this in turn requires adjusting the callers of
+sh_make_monitor_table() also to handle it returning INVALID_MFN.
+sh_update_paging_modes() is also modified to add additional error
+paths in case of allocation failure, some of those will return with
+null monitor page tables (and the domain likely crashed). This is no
+different than current error paths, but the newly introduced ones are
+more likely to trigger.
+
+The now added failure points in sh_update_paging_modes() also require
+that on some error return paths the previous structures are cleared,
+and thus monitor table is null.
+
+While there adjust the 'type' parameter type of shadow_prealloc() to
+unsigned int rather than u32.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Tim Deegan <tim@xen.org>
+master commit: b7f93c6afb12b6061e2d19de2f39ea09b569ac68
+master date: 2022-10-11 14:22:53 +0200
+---
+ xen/arch/x86/mm/shadow/common.c | 69 ++++++++++++++++++++++++--------
+ xen/arch/x86/mm/shadow/hvm.c | 4 +-
+ xen/arch/x86/mm/shadow/multi.c | 11 +++--
+ xen/arch/x86/mm/shadow/private.h | 3 +-
+ 4 files changed, 66 insertions(+), 21 deletions(-)
+
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 00e520cbd05b..2067c7d16bb4 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -36,6 +36,7 @@
+ #include <asm/flushtlb.h>
+ #include <asm/shadow.h>
+ #include <xen/numa.h>
++#include <public/sched.h>
+ #include "private.h"
+
+ DEFINE_PER_CPU(uint32_t,trace_shadow_path_flags);
+@@ -928,14 +929,15 @@ static inline void trace_shadow_prealloc_unpin(struct domain *d, mfn_t smfn)
+
+ /* Make sure there are at least count order-sized pages
+ * available in the shadow page pool. */
+-static void _shadow_prealloc(struct domain *d, unsigned int pages)
++static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
+ {
+ struct vcpu *v;
+ struct page_info *sp, *t;
+ mfn_t smfn;
+ int i;
+
+- if ( d->arch.paging.shadow.free_pages >= pages ) return;
++ if ( d->arch.paging.shadow.free_pages >= pages )
++ return true;
+
+ /* Shouldn't have enabled shadows if we've no vcpus. */
+ ASSERT(d->vcpu && d->vcpu[0]);
+@@ -951,7 +953,8 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
+ sh_unpin(d, smfn);
+
+ /* See if that freed up enough space */
+- if ( d->arch.paging.shadow.free_pages >= pages ) return;
++ if ( d->arch.paging.shadow.free_pages >= pages )
++ return true;
+ }
+
+ /* Stage two: all shadow pages are in use in hierarchies that are
+@@ -974,7 +977,7 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
+ if ( d->arch.paging.shadow.free_pages >= pages )
+ {
+ guest_flush_tlb_mask(d, d->dirty_cpumask);
+- return;
++ return true;
+ }
+ }
+ }
+@@ -987,7 +990,12 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
+ d->arch.paging.shadow.total_pages,
+ d->arch.paging.shadow.free_pages,
+ d->arch.paging.shadow.p2m_pages);
+- BUG();
++
++ ASSERT(d->is_dying);
++
++ guest_flush_tlb_mask(d, d->dirty_cpumask);
++
++ return false;
+ }
+
+ /* Make sure there are at least count pages of the order according to
+@@ -995,9 +1003,19 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
+ * This must be called before any calls to shadow_alloc(). Since this
+ * will free existing shadows to make room, it must be called early enough
+ * to avoid freeing shadows that the caller is currently working on. */
+-void shadow_prealloc(struct domain *d, u32 type, unsigned int count)
++bool shadow_prealloc(struct domain *d, unsigned int type, unsigned int count)
+ {
+- return _shadow_prealloc(d, shadow_size(type) * count);
++ bool ret = _shadow_prealloc(d, shadow_size(type) * count);
++
++ if ( !ret && !d->is_dying &&
++ (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
++ /*
++ * Failing to allocate memory required for shadow usage can only result in
++ * a domain crash, do it here rather that relying on every caller to do it.
++ */
++ domain_crash(d);
++
++ return ret;
+ }
+
+ /* Deliberately free all the memory we can: this will tear down all of
+@@ -1218,7 +1236,7 @@ void shadow_free(struct domain *d, mfn_t smfn)
+ static struct page_info *
+ shadow_alloc_p2m_page(struct domain *d)
+ {
+- struct page_info *pg;
++ struct page_info *pg = NULL;
+
+ /* This is called both from the p2m code (which never holds the
+ * paging lock) and the log-dirty code (which always does). */
+@@ -1236,16 +1254,18 @@ shadow_alloc_p2m_page(struct domain *d)
+ d->arch.paging.shadow.p2m_pages,
+ shadow_min_acceptable_pages(d));
+ }
+- paging_unlock(d);
+- return NULL;
++ goto out;
+ }
+
+- shadow_prealloc(d, SH_type_p2m_table, 1);
++ if ( !shadow_prealloc(d, SH_type_p2m_table, 1) )
++ goto out;
++
+ pg = mfn_to_page(shadow_alloc(d, SH_type_p2m_table, 0));
+ d->arch.paging.shadow.p2m_pages++;
+ d->arch.paging.shadow.total_pages--;
+ ASSERT(!page_get_owner(pg) && !(pg->count_info & PGC_count_mask));
+
++ out:
+ paging_unlock(d);
+
+ return pg;
+@@ -1336,7 +1356,9 @@ int shadow_set_allocation(struct domain *d, unsigned int pages, bool *preempted)
+ else if ( d->arch.paging.shadow.total_pages > pages )
+ {
+ /* Need to return memory to domheap */
+- _shadow_prealloc(d, 1);
++ if ( !_shadow_prealloc(d, 1) )
++ return -ENOMEM;
++
+ sp = page_list_remove_head(&d->arch.paging.shadow.freelist);
+ ASSERT(sp);
+ /*
+@@ -2334,12 +2356,13 @@ static void sh_update_paging_modes(struct vcpu *v)
+ if ( mfn_eq(v->arch.paging.shadow.oos_snapshot[0], INVALID_MFN) )
+ {
+ int i;
++
++ if ( !shadow_prealloc(d, SH_type_oos_snapshot, SHADOW_OOS_PAGES) )
++ return;
++
+ for(i = 0; i < SHADOW_OOS_PAGES; i++)
+- {
+- shadow_prealloc(d, SH_type_oos_snapshot, 1);
+ v->arch.paging.shadow.oos_snapshot[i] =
+ shadow_alloc(d, SH_type_oos_snapshot, 0);
+- }
+ }
+ #endif /* OOS */
+
+@@ -2403,6 +2426,9 @@ static void sh_update_paging_modes(struct vcpu *v)
+ mfn_t mmfn = sh_make_monitor_table(
+ v, v->arch.paging.mode->shadow.shadow_levels);
+
++ if ( mfn_eq(mmfn, INVALID_MFN) )
++ return;
++
+ v->arch.hvm.monitor_table = pagetable_from_mfn(mmfn);
+ make_cr3(v, mmfn);
+ hvm_update_host_cr3(v);
+@@ -2441,6 +2467,12 @@ static void sh_update_paging_modes(struct vcpu *v)
+ v->arch.hvm.monitor_table = pagetable_null();
+ new_mfn = sh_make_monitor_table(
+ v, v->arch.paging.mode->shadow.shadow_levels);
++ if ( mfn_eq(new_mfn, INVALID_MFN) )
++ {
++ sh_destroy_monitor_table(v, old_mfn,
++ old_mode->shadow.shadow_levels);
++ return;
++ }
+ v->arch.hvm.monitor_table = pagetable_from_mfn(new_mfn);
+ SHADOW_PRINTK("new monitor table %"PRI_mfn "\n",
+ mfn_x(new_mfn));
+@@ -2526,7 +2558,12 @@ void sh_set_toplevel_shadow(struct vcpu *v,
+ if ( !mfn_valid(smfn) )
+ {
+ /* Make sure there's enough free shadow memory. */
+- shadow_prealloc(d, root_type, 1);
++ if ( !shadow_prealloc(d, root_type, 1) )
++ {
++ new_entry = pagetable_null();
++ goto install_new_entry;
++ }
++
+ /* Shadow the page. */
+ smfn = make_shadow(v, gmfn, root_type);
+ }
+diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
+index d5f42102a0bd..a0878d9ad71a 100644
+--- a/xen/arch/x86/mm/shadow/hvm.c
++++ b/xen/arch/x86/mm/shadow/hvm.c
+@@ -700,7 +700,9 @@ mfn_t sh_make_monitor_table(const struct vcpu *v, unsigned int shadow_levels)
+ ASSERT(!pagetable_get_pfn(v->arch.hvm.monitor_table));
+
+ /* Guarantee we can get the memory we need */
+- shadow_prealloc(d, SH_type_monitor_table, CONFIG_PAGING_LEVELS);
++ if ( !shadow_prealloc(d, SH_type_monitor_table, CONFIG_PAGING_LEVELS) )
++ return INVALID_MFN;
++
+ m4mfn = shadow_alloc(d, SH_type_monitor_table, 0);
+ mfn_to_page(m4mfn)->shadow_flags = 4;
+
+diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
+index 2ff78fe3362c..c07af0bd99da 100644
+--- a/xen/arch/x86/mm/shadow/multi.c
++++ b/xen/arch/x86/mm/shadow/multi.c
+@@ -2440,9 +2440,14 @@ static int sh_page_fault(struct vcpu *v,
+ * Preallocate shadow pages *before* removing writable accesses
+ * otherwhise an OOS L1 might be demoted and promoted again with
+ * writable mappings. */
+- shadow_prealloc(d,
+- SH_type_l1_shadow,
+- GUEST_PAGING_LEVELS < 4 ? 1 : GUEST_PAGING_LEVELS - 1);
++ if ( !shadow_prealloc(d, SH_type_l1_shadow,
++ GUEST_PAGING_LEVELS < 4
++ ? 1 : GUEST_PAGING_LEVELS - 1) )
++ {
++ paging_unlock(d);
++ put_gfn(d, gfn_x(gfn));
++ return 0;
++ }
+
+ rc = gw_remove_write_accesses(v, va, &gw);
+
+diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
+index 35efb1b984fb..738214f75e8d 100644
+--- a/xen/arch/x86/mm/shadow/private.h
++++ b/xen/arch/x86/mm/shadow/private.h
+@@ -383,7 +383,8 @@ void shadow_promote(struct domain *d, mfn_t gmfn, u32 type);
+ void shadow_demote(struct domain *d, mfn_t gmfn, u32 type);
+
+ /* Shadow page allocation functions */
+-void shadow_prealloc(struct domain *d, u32 shadow_type, unsigned int count);
++bool __must_check shadow_prealloc(struct domain *d, unsigned int shadow_type,
++ unsigned int count);
+ mfn_t shadow_alloc(struct domain *d,
+ u32 shadow_type,
+ unsigned long backpointer);
+--
+2.37.3
+
diff --git a/0008-tools-libs-ctrl-don-t-set-errno-to-a-negative-value.patch b/0008-tools-libs-ctrl-don-t-set-errno-to-a-negative-value.patch
deleted file mode 100644
index 166f0ff..0000000
--- a/0008-tools-libs-ctrl-don-t-set-errno-to-a-negative-value.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From ba62afdbc31a8cfe897191efd25ed4449d9acd94 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 7 Jun 2022 14:01:03 +0200
-Subject: [PATCH 08/51] tools/libs/ctrl: don't set errno to a negative value
-
-The claimed reason for setting errno to -1 is wrong. On x86
-xc_domain_pod_target() will set errno to a sane value in the error
-case.
-
-Fixes: ff1745d5882b ("tools: libxl: do not set the PoD target on ARM")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: a0fb7e0e73483ed042d5ca34861a891a51ad337b
-master date: 2022-04-22 20:39:34 +0100
----
- tools/libs/ctrl/xc_domain.c | 4 +---
- 1 file changed, 1 insertion(+), 3 deletions(-)
-
-diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
-index b155d6afd2ef..9d675c8f21e1 100644
---- a/tools/libs/ctrl/xc_domain.c
-+++ b/tools/libs/ctrl/xc_domain.c
-@@ -1297,9 +1297,7 @@ int xc_domain_get_pod_target(xc_interface *xch,
- uint64_t *pod_cache_pages,
- uint64_t *pod_entries)
- {
-- /* On x86 (above) xc_domain_pod_target will incorrectly return -1
-- * with errno==-1 on error. Do the same for least surprise. */
-- errno = -1;
-+ errno = EOPNOTSUPP;
- return -1;
- }
- #endif
---
-2.35.1
-
diff --git a/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch b/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch
new file mode 100644
index 0000000..70b5cc9
--- /dev/null
+++ b/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch
@@ -0,0 +1,100 @@
+From 745e0b300dc3f5000e6d48c273b405d4bcc29ba7 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 11 Oct 2022 14:53:41 +0200
+Subject: [PATCH 08/26] x86/p2m: refuse new allocations for dying domains
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This will in particular prevent any attempts to add entries to the p2m,
+once - in a subsequent change - non-root entries have been removed.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Tim Deegan <tim@xen.org>
+master commit: ff600a8cf8e36f8ecbffecf96a035952e022ab87
+master date: 2022-10-11 14:23:22 +0200
+---
+ xen/arch/x86/mm/hap/hap.c | 5 ++++-
+ xen/arch/x86/mm/shadow/common.c | 18 ++++++++++++++----
+ 2 files changed, 18 insertions(+), 5 deletions(-)
+
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index d75dc2b9ed3d..787991233e53 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -245,6 +245,9 @@ static struct page_info *hap_alloc(struct domain *d)
+
+ ASSERT(paging_locked_by_me(d));
+
++ if ( unlikely(d->is_dying) )
++ return NULL;
++
+ pg = page_list_remove_head(&d->arch.paging.hap.freelist);
+ if ( unlikely(!pg) )
+ return NULL;
+@@ -281,7 +284,7 @@ static struct page_info *hap_alloc_p2m_page(struct domain *d)
+ d->arch.paging.hap.p2m_pages++;
+ ASSERT(!page_get_owner(pg) && !(pg->count_info & PGC_count_mask));
+ }
+- else if ( !d->arch.paging.p2m_alloc_failed )
++ else if ( !d->arch.paging.p2m_alloc_failed && !d->is_dying )
+ {
+ d->arch.paging.p2m_alloc_failed = 1;
+ dprintk(XENLOG_ERR, "d%i failed to allocate from HAP pool\n",
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 2067c7d16bb4..9807f6ec6c00 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -939,6 +939,10 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
+ if ( d->arch.paging.shadow.free_pages >= pages )
+ return true;
+
++ if ( unlikely(d->is_dying) )
++ /* No reclaim when the domain is dying, teardown will take care of it. */
++ return false;
++
+ /* Shouldn't have enabled shadows if we've no vcpus. */
+ ASSERT(d->vcpu && d->vcpu[0]);
+
+@@ -991,7 +995,7 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
+ d->arch.paging.shadow.free_pages,
+ d->arch.paging.shadow.p2m_pages);
+
+- ASSERT(d->is_dying);
++ ASSERT_UNREACHABLE();
+
+ guest_flush_tlb_mask(d, d->dirty_cpumask);
+
+@@ -1005,10 +1009,13 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
+ * to avoid freeing shadows that the caller is currently working on. */
+ bool shadow_prealloc(struct domain *d, unsigned int type, unsigned int count)
+ {
+- bool ret = _shadow_prealloc(d, shadow_size(type) * count);
++ bool ret;
+
+- if ( !ret && !d->is_dying &&
+- (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
++ if ( unlikely(d->is_dying) )
++ return false;
++
++ ret = _shadow_prealloc(d, shadow_size(type) * count);
++ if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
+ /*
+ * Failing to allocate memory required for shadow usage can only result in
+ * a domain crash, do it here rather that relying on every caller to do it.
+@@ -1238,6 +1245,9 @@ shadow_alloc_p2m_page(struct domain *d)
+ {
+ struct page_info *pg = NULL;
+
++ if ( unlikely(d->is_dying) )
++ return NULL;
++
+ /* This is called both from the p2m code (which never holds the
+ * paging lock) and the log-dirty code (which always does). */
+ paging_lock_recursive(d);
+--
+2.37.3
+
diff --git a/0009-tools-libs-guest-don-t-set-errno-to-a-negative-value.patch b/0009-tools-libs-guest-don-t-set-errno-to-a-negative-value.patch
deleted file mode 100644
index 5d035f6..0000000
--- a/0009-tools-libs-guest-don-t-set-errno-to-a-negative-value.patch
+++ /dev/null
@@ -1,32 +0,0 @@
-From a2cf30eec08db5df974a9e8bb7366fee8fc7fcd9 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 7 Jun 2022 14:01:27 +0200
-Subject: [PATCH 09/51] tools/libs/guest: don't set errno to a negative value
-
-Setting errno to a negative error value makes no sense.
-
-Fixes: cb99a64029c9 ("libxc: arm: allow passing a device tree blob to the guest")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 438e96ab479495a932391a22e219ee62fa8c4f47
-master date: 2022-04-22 20:39:34 +0100
----
- tools/libs/guest/xg_dom_core.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/libs/guest/xg_dom_core.c b/tools/libs/guest/xg_dom_core.c
-index 2e4c1330ea6b..65975a75da37 100644
---- a/tools/libs/guest/xg_dom_core.c
-+++ b/tools/libs/guest/xg_dom_core.c
-@@ -856,7 +856,7 @@ int xc_dom_devicetree_file(struct xc_dom_image *dom, const char *filename)
- return -1;
- return 0;
- #else
-- errno = -EINVAL;
-+ errno = EINVAL;
- return -1;
- #endif
- }
---
-2.35.1
-
diff --git a/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch b/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch
new file mode 100644
index 0000000..07e63ac
--- /dev/null
+++ b/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch
@@ -0,0 +1,115 @@
+From 943635d8f8486209e4e48966507ad57963e96284 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 11 Oct 2022 14:54:00 +0200
+Subject: [PATCH 09/26] x86/p2m: truly free paging pool memory for dying
+ domains
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Modify {hap,shadow}_free to free the page immediately if the domain is
+dying, so that pages don't accumulate in the pool when
+{shadow,hap}_final_teardown() get called. This is to limit the amount of
+work which needs to be done there (in a non-preemptable manner).
+
+Note the call to shadow_free() in shadow_free_p2m_page() is moved after
+increasing total_pages, so that the decrease done in shadow_free() in
+case the domain is dying doesn't underflow the counter, even if just for
+a short interval.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Tim Deegan <tim@xen.org>
+master commit: f50a2c0e1d057c00d6061f40ae24d068226052ad
+master date: 2022-10-11 14:23:51 +0200
+---
+ xen/arch/x86/mm/hap/hap.c | 12 ++++++++++++
+ xen/arch/x86/mm/shadow/common.c | 28 +++++++++++++++++++++++++---
+ 2 files changed, 37 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index 787991233e53..aef2297450e1 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -265,6 +265,18 @@ static void hap_free(struct domain *d, mfn_t mfn)
+
+ ASSERT(paging_locked_by_me(d));
+
++ /*
++ * For dying domains, actually free the memory here. This way less work is
++ * left to hap_final_teardown(), which cannot easily have preemption checks
++ * added.
++ */
++ if ( unlikely(d->is_dying) )
++ {
++ free_domheap_page(pg);
++ d->arch.paging.hap.total_pages--;
++ return;
++ }
++
+ d->arch.paging.hap.free_pages++;
+ page_list_add_tail(pg, &d->arch.paging.hap.freelist);
+ }
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 9807f6ec6c00..9eb33eafc7f7 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -1187,6 +1187,7 @@ mfn_t shadow_alloc(struct domain *d,
+ void shadow_free(struct domain *d, mfn_t smfn)
+ {
+ struct page_info *next = NULL, *sp = mfn_to_page(smfn);
++ bool dying = ACCESS_ONCE(d->is_dying);
+ struct page_list_head *pin_list;
+ unsigned int pages;
+ u32 shadow_type;
+@@ -1229,11 +1230,32 @@ void shadow_free(struct domain *d, mfn_t smfn)
+ * just before the allocator hands the page out again. */
+ page_set_tlbflush_timestamp(sp);
+ perfc_decr(shadow_alloc_count);
+- page_list_add_tail(sp, &d->arch.paging.shadow.freelist);
++
++ /*
++ * For dying domains, actually free the memory here. This way less
++ * work is left to shadow_final_teardown(), which cannot easily have
++ * preemption checks added.
++ */
++ if ( unlikely(dying) )
++ {
++ /*
++ * The backpointer field (sh.back) used by shadow code aliases the
++ * domain owner field, unconditionally clear it here to avoid
++ * free_domheap_page() attempting to parse it.
++ */
++ page_set_owner(sp, NULL);
++ free_domheap_page(sp);
++ }
++ else
++ page_list_add_tail(sp, &d->arch.paging.shadow.freelist);
++
+ sp = next;
+ }
+
+- d->arch.paging.shadow.free_pages += pages;
++ if ( unlikely(dying) )
++ d->arch.paging.shadow.total_pages -= pages;
++ else
++ d->arch.paging.shadow.free_pages += pages;
+ }
+
+ /* Divert a page from the pool to be used by the p2m mapping.
+@@ -1303,9 +1325,9 @@ shadow_free_p2m_page(struct domain *d, struct page_info *pg)
+ * paging lock) and the log-dirty code (which always does). */
+ paging_lock_recursive(d);
+
+- shadow_free(d, page_to_mfn(pg));
+ d->arch.paging.shadow.p2m_pages--;
+ d->arch.paging.shadow.total_pages++;
++ shadow_free(d, page_to_mfn(pg));
+
+ paging_unlock(d);
+ }
+--
+2.37.3
+
diff --git a/0010-tools-libs-light-don-t-set-errno-to-a-negative-value.patch b/0010-tools-libs-light-don-t-set-errno-to-a-negative-value.patch
deleted file mode 100644
index ac900ae..0000000
--- a/0010-tools-libs-light-don-t-set-errno-to-a-negative-value.patch
+++ /dev/null
@@ -1,32 +0,0 @@
-From 15391de8e2bb6153eadd483154c53044ab53d98d Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 7 Jun 2022 14:01:44 +0200
-Subject: [PATCH 10/51] tools/libs/light: don't set errno to a negative value
-
-Setting errno to a negative value makes no sense.
-
-Fixes: e78e8b9bb649 ("libxl: Add interface for querying hypervisor about PCI topology")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 2419a159fb943c24a6f2439604b9fdb1478fcd08
-master date: 2022-04-22 20:39:34 +0100
----
- tools/libs/light/libxl_linux.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/libs/light/libxl_linux.c b/tools/libs/light/libxl_linux.c
-index 8d62dfd255cb..27f2bce71837 100644
---- a/tools/libs/light/libxl_linux.c
-+++ b/tools/libs/light/libxl_linux.c
-@@ -288,7 +288,7 @@ int libxl__pci_topology_init(libxl__gc *gc,
- if (i == num_devs) {
- LOG(ERROR, "Too many devices");
- err = ERROR_FAIL;
-- errno = -ENOSPC;
-+ errno = ENOSPC;
- goto out;
- }
-
---
-2.35.1
-
diff --git a/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch b/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch
new file mode 100644
index 0000000..59c6940
--- /dev/null
+++ b/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch
@@ -0,0 +1,181 @@
+From f5959ed715e19cf2844656477dbf74c2f576c9d4 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 11 Oct 2022 14:54:21 +0200
+Subject: [PATCH 10/26] x86/p2m: free the paging memory pool preemptively
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The paging memory pool is currently freed in two different places:
+from {shadow,hap}_teardown() via domain_relinquish_resources() and
+from {shadow,hap}_final_teardown() via complete_domain_destroy().
+While the former does handle preemption, the latter doesn't.
+
+Attempt to move as much p2m related freeing as possible to happen
+before the call to {shadow,hap}_teardown(), so that most memory can be
+freed in a preemptive way. In order to avoid causing issues to
+existing callers, leave the root p2m page tables set and free them in
+{hap,shadow}_final_teardown(). Also modify {hap,shadow}_free to free
+the page immediately if the domain is dying, so that pages don't
+accumulate in the pool when {shadow,hap}_final_teardown() get called.
+
+Move altp2m_vcpu_disable_ve() to be done in hap_teardown(), as that's
+the place where altp2m_active gets disabled now.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Reported-by: Julien Grall <jgrall@amazon.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Tim Deegan <tim@xen.org>
+master commit: e7aa55c0aab36d994bf627c92bd5386ae167e16e
+master date: 2022-10-11 14:24:21 +0200
+---
+ xen/arch/x86/domain.c | 7 ------
+ xen/arch/x86/mm/hap/hap.c | 42 ++++++++++++++++++++-------------
+ xen/arch/x86/mm/shadow/common.c | 12 ++++++++++
+ 3 files changed, 38 insertions(+), 23 deletions(-)
+
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index 0d39981550ca..a4356893bdbc 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -38,7 +38,6 @@
+ #include <xen/livepatch.h>
+ #include <public/sysctl.h>
+ #include <public/hvm/hvm_vcpu.h>
+-#include <asm/altp2m.h>
+ #include <asm/regs.h>
+ #include <asm/mc146818rtc.h>
+ #include <asm/system.h>
+@@ -2381,12 +2380,6 @@ int domain_relinquish_resources(struct domain *d)
+ vpmu_destroy(v);
+ }
+
+- if ( altp2m_active(d) )
+- {
+- for_each_vcpu ( d, v )
+- altp2m_vcpu_disable_ve(v);
+- }
+-
+ if ( is_pv_domain(d) )
+ {
+ for_each_vcpu ( d, v )
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index aef2297450e1..a44fcfd95e1e 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -28,6 +28,7 @@
+ #include <xen/domain_page.h>
+ #include <xen/guest_access.h>
+ #include <xen/keyhandler.h>
++#include <asm/altp2m.h>
+ #include <asm/event.h>
+ #include <asm/page.h>
+ #include <asm/current.h>
+@@ -546,24 +547,8 @@ void hap_final_teardown(struct domain *d)
+ unsigned int i;
+
+ if ( hvm_altp2m_supported() )
+- {
+- d->arch.altp2m_active = 0;
+-
+- if ( d->arch.altp2m_eptp )
+- {
+- free_xenheap_page(d->arch.altp2m_eptp);
+- d->arch.altp2m_eptp = NULL;
+- }
+-
+- if ( d->arch.altp2m_visible_eptp )
+- {
+- free_xenheap_page(d->arch.altp2m_visible_eptp);
+- d->arch.altp2m_visible_eptp = NULL;
+- }
+-
+ for ( i = 0; i < MAX_ALTP2M; i++ )
+ p2m_teardown(d->arch.altp2m_p2m[i], true);
+- }
+
+ /* Destroy nestedp2m's first */
+ for (i = 0; i < MAX_NESTEDP2M; i++) {
+@@ -578,6 +563,8 @@ void hap_final_teardown(struct domain *d)
+ paging_lock(d);
+ hap_set_allocation(d, 0, NULL);
+ ASSERT(d->arch.paging.hap.p2m_pages == 0);
++ ASSERT(d->arch.paging.hap.free_pages == 0);
++ ASSERT(d->arch.paging.hap.total_pages == 0);
+ paging_unlock(d);
+ }
+
+@@ -603,6 +590,7 @@ void hap_vcpu_teardown(struct vcpu *v)
+ void hap_teardown(struct domain *d, bool *preempted)
+ {
+ struct vcpu *v;
++ unsigned int i;
+
+ ASSERT(d->is_dying);
+ ASSERT(d != current->domain);
+@@ -611,6 +599,28 @@ void hap_teardown(struct domain *d, bool *preempted)
+ for_each_vcpu ( d, v )
+ hap_vcpu_teardown(v);
+
++ /* Leave the root pt in case we get further attempts to modify the p2m. */
++ if ( hvm_altp2m_supported() )
++ {
++ if ( altp2m_active(d) )
++ for_each_vcpu ( d, v )
++ altp2m_vcpu_disable_ve(v);
++
++ d->arch.altp2m_active = 0;
++
++ FREE_XENHEAP_PAGE(d->arch.altp2m_eptp);
++ FREE_XENHEAP_PAGE(d->arch.altp2m_visible_eptp);
++
++ for ( i = 0; i < MAX_ALTP2M; i++ )
++ p2m_teardown(d->arch.altp2m_p2m[i], false);
++ }
++
++ /* Destroy nestedp2m's after altp2m. */
++ for ( i = 0; i < MAX_NESTEDP2M; i++ )
++ p2m_teardown(d->arch.nested_p2m[i], false);
++
++ p2m_teardown(p2m_get_hostp2m(d), false);
++
+ paging_lock(d); /* Keep various asserts happy */
+
+ if ( d->arch.paging.hap.total_pages != 0 )
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 9eb33eafc7f7..ac9a1ae07808 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -2824,8 +2824,17 @@ void shadow_teardown(struct domain *d, bool *preempted)
+ for_each_vcpu ( d, v )
+ shadow_vcpu_teardown(v);
+
++ p2m_teardown(p2m_get_hostp2m(d), false);
++
+ paging_lock(d);
+
++ /*
++ * Reclaim all shadow memory so that shadow_set_allocation() doesn't find
++ * in-use pages, as _shadow_prealloc() will no longer try to reclaim pages
++ * because the domain is dying.
++ */
++ shadow_blow_tables(d);
++
+ #if (SHADOW_OPTIMIZATIONS & (SHOPT_VIRTUAL_TLB|SHOPT_OUT_OF_SYNC))
+ /* Free the virtual-TLB array attached to each vcpu */
+ for_each_vcpu(d, v)
+@@ -2946,6 +2955,9 @@ void shadow_final_teardown(struct domain *d)
+ d->arch.paging.shadow.total_pages,
+ d->arch.paging.shadow.free_pages,
+ d->arch.paging.shadow.p2m_pages);
++ ASSERT(!d->arch.paging.shadow.total_pages);
++ ASSERT(!d->arch.paging.shadow.free_pages);
++ ASSERT(!d->arch.paging.shadow.p2m_pages);
+ paging_unlock(d);
+ }
+
+--
+2.37.3
+
diff --git a/0011-xen-iommu-cleanup-iommu-related-domctl-handling.patch b/0011-xen-iommu-cleanup-iommu-related-domctl-handling.patch
deleted file mode 100644
index 3c60de4..0000000
--- a/0011-xen-iommu-cleanup-iommu-related-domctl-handling.patch
+++ /dev/null
@@ -1,112 +0,0 @@
-From a6c32abd144ec6443c6a433b5a2ac00e2615aa86 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 7 Jun 2022 14:02:08 +0200
-Subject: [PATCH 11/51] xen/iommu: cleanup iommu related domctl handling
-
-Today iommu_do_domctl() is being called from arch_do_domctl() in the
-"default:" case of a switch statement. This has led already to crashes
-due to unvalidated parameters.
-
-Fix that by moving the call of iommu_do_domctl() to the main switch
-statement of do_domctl().
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> # Arm
-master commit: 9cd7e31b3f584e97a138a770cfb031a91a867936
-master date: 2022-04-26 10:23:58 +0200
----
- xen/arch/arm/domctl.c | 11 +----------
- xen/arch/x86/domctl.c | 2 +-
- xen/common/domctl.c | 7 +++++++
- xen/include/xen/iommu.h | 12 +++++++++---
- 4 files changed, 18 insertions(+), 14 deletions(-)
-
-diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
-index 6245af6d0bab..1baf25c3d98b 100644
---- a/xen/arch/arm/domctl.c
-+++ b/xen/arch/arm/domctl.c
-@@ -176,16 +176,7 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
- return rc;
- }
- default:
-- {
-- int rc;
--
-- rc = subarch_do_domctl(domctl, d, u_domctl);
--
-- if ( rc == -ENOSYS )
-- rc = iommu_do_domctl(domctl, d, u_domctl);
--
-- return rc;
-- }
-+ return subarch_do_domctl(domctl, d, u_domctl);
- }
- }
-
-diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
-index 7d102e0647ec..0fa51f2ebd10 100644
---- a/xen/arch/x86/domctl.c
-+++ b/xen/arch/x86/domctl.c
-@@ -1380,7 +1380,7 @@ long arch_do_domctl(
- break;
-
- default:
-- ret = iommu_do_domctl(domctl, d, u_domctl);
-+ ret = -ENOSYS;
- break;
- }
-
-diff --git a/xen/common/domctl.c b/xen/common/domctl.c
-index 419e4070f59d..65d2a4588b71 100644
---- a/xen/common/domctl.c
-+++ b/xen/common/domctl.c
-@@ -870,6 +870,13 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
- copyback = 1;
- break;
-
-+ case XEN_DOMCTL_assign_device:
-+ case XEN_DOMCTL_test_assign_device:
-+ case XEN_DOMCTL_deassign_device:
-+ case XEN_DOMCTL_get_device_group:
-+ ret = iommu_do_domctl(op, d, u_domctl);
-+ break;
-+
- default:
- ret = arch_do_domctl(op, d, u_domctl);
- break;
-diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
-index 92b2d23f0ba2..861579562e8a 100644
---- a/xen/include/xen/iommu.h
-+++ b/xen/include/xen/iommu.h
-@@ -342,8 +342,17 @@ struct domain_iommu {
- /* Does the IOMMU pagetable need to be kept synchronized with the P2M */
- #ifdef CONFIG_HAS_PASSTHROUGH
- #define need_iommu_pt_sync(d) (dom_iommu(d)->need_sync)
-+
-+int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
-+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl);
- #else
- #define need_iommu_pt_sync(d) ({ (void)(d); false; })
-+
-+static inline int iommu_do_domctl(struct xen_domctl *domctl, struct domain *d,
-+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
-+{
-+ return -ENOSYS;
-+}
- #endif
-
- int __must_check iommu_suspend(void);
-@@ -357,9 +366,6 @@ int iommu_do_pci_domctl(struct xen_domctl *, struct domain *d,
- XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
- #endif
-
--int iommu_do_domctl(struct xen_domctl *, struct domain *d,
-- XEN_GUEST_HANDLE_PARAM(xen_domctl_t));
--
- void iommu_dev_iotlb_flush_timeout(struct domain *d, struct pci_dev *pdev);
-
- /*
---
-2.35.1
-
diff --git a/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch b/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch
new file mode 100644
index 0000000..5520627
--- /dev/null
+++ b/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch
@@ -0,0 +1,197 @@
+From a603386b422f5cb4c5e2639a7e20a1d99dba2175 Mon Sep 17 00:00:00 2001
+From: Julien Grall <jgrall@amazon.com>
+Date: Tue, 11 Oct 2022 14:54:44 +0200
+Subject: [PATCH 11/26] xen/x86: p2m: Add preemption in p2m_teardown()
+
+The list p2m->pages contains all the pages used by the P2M. On large
+instances this can be quite large, and the time spent calling
+d->arch.paging.free_page() will exceed 1ms for an 80GB guest
+on a Xen running in a nested environment on a c5.metal.
+
+By extrapolation, it would take > 100ms for an 8TB guest (what we
+currently security support). So add some preemption in p2m_teardown()
+and propagate to the callers. Note there are 3 places where
+the preemption is not enabled:
+ - hap_final_teardown()/shadow_final_teardown(): We are
+ preventing updates to the P2M once the domain is dying (so
+ no more pages could be allocated) and most of the P2M pages
+ will be freed in a preemptive manner when relinquishing the
+ resources. So it is fine to disable preemption here.
+ - shadow_enable(): This is fine because it will undo the allocation
+ that may have been made by p2m_alloc_table() (so only the root
+ page table).
+
+The preemption is arbitrarily checked every 1024 iterations.
+
+We now need to include <xen/event.h> in p2m-basic in order to
+import the definition for local_events_need_delivery() used by
+general_preempt_check(). Ideally, the inclusion should happen in
+xen/sched.h but it opened a can of worms.
+
+Note that with the current approach, Xen doesn't keep track of whether
+the alt/nested P2Ms have been cleared. So there is some redundant work.
+However, this is not expected to incur too much overhead (the P2M lock
+shouldn't be contended during teardown). So this optimization is
+left outside of the security event.
+
+This is part of CVE-2022-33746 / XSA-410.
+
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+master commit: 8a2111250b424edc49c65c4d41b276766d30635c
+master date: 2022-10-11 14:24:48 +0200
+---
+ xen/arch/x86/mm/hap/hap.c | 22 ++++++++++++++++------
+ xen/arch/x86/mm/p2m.c | 18 +++++++++++++++---
+ xen/arch/x86/mm/shadow/common.c | 12 +++++++++---
+ xen/include/asm-x86/p2m.h | 2 +-
+ 4 files changed, 41 insertions(+), 13 deletions(-)
+
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index a44fcfd95e1e..1f9a157a0c34 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -548,17 +548,17 @@ void hap_final_teardown(struct domain *d)
+
+ if ( hvm_altp2m_supported() )
+ for ( i = 0; i < MAX_ALTP2M; i++ )
+- p2m_teardown(d->arch.altp2m_p2m[i], true);
++ p2m_teardown(d->arch.altp2m_p2m[i], true, NULL);
+
+ /* Destroy nestedp2m's first */
+ for (i = 0; i < MAX_NESTEDP2M; i++) {
+- p2m_teardown(d->arch.nested_p2m[i], true);
++ p2m_teardown(d->arch.nested_p2m[i], true, NULL);
+ }
+
+ if ( d->arch.paging.hap.total_pages != 0 )
+ hap_teardown(d, NULL);
+
+- p2m_teardown(p2m_get_hostp2m(d), true);
++ p2m_teardown(p2m_get_hostp2m(d), true, NULL);
+ /* Free any memory that the p2m teardown released */
+ paging_lock(d);
+ hap_set_allocation(d, 0, NULL);
+@@ -612,14 +612,24 @@ void hap_teardown(struct domain *d, bool *preempted)
+ FREE_XENHEAP_PAGE(d->arch.altp2m_visible_eptp);
+
+ for ( i = 0; i < MAX_ALTP2M; i++ )
+- p2m_teardown(d->arch.altp2m_p2m[i], false);
++ {
++ p2m_teardown(d->arch.altp2m_p2m[i], false, preempted);
++ if ( preempted && *preempted )
++ return;
++ }
+ }
+
+ /* Destroy nestedp2m's after altp2m. */
+ for ( i = 0; i < MAX_NESTEDP2M; i++ )
+- p2m_teardown(d->arch.nested_p2m[i], false);
++ {
++ p2m_teardown(d->arch.nested_p2m[i], false, preempted);
++ if ( preempted && *preempted )
++ return;
++ }
+
+- p2m_teardown(p2m_get_hostp2m(d), false);
++ p2m_teardown(p2m_get_hostp2m(d), false, preempted);
++ if ( preempted && *preempted )
++ return;
+
+ paging_lock(d); /* Keep various asserts happy */
+
+diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
+index aba4f17cbe12..8781df9dda8d 100644
+--- a/xen/arch/x86/mm/p2m.c
++++ b/xen/arch/x86/mm/p2m.c
+@@ -749,12 +749,13 @@ int p2m_alloc_table(struct p2m_domain *p2m)
+ * hvm fixme: when adding support for pvh non-hardware domains, this path must
+ * cleanup any foreign p2m types (release refcnts on them).
+ */
+-void p2m_teardown(struct p2m_domain *p2m, bool remove_root)
++void p2m_teardown(struct p2m_domain *p2m, bool remove_root, bool *preempted)
+ /* Return all the p2m pages to Xen.
+ * We know we don't have any extra mappings to these pages */
+ {
+ struct page_info *pg, *root_pg = NULL;
+ struct domain *d;
++ unsigned int i = 0;
+
+ if (p2m == NULL)
+ return;
+@@ -773,8 +774,19 @@ void p2m_teardown(struct p2m_domain *p2m, bool remove_root)
+ }
+
+ while ( (pg = page_list_remove_head(&p2m->pages)) )
+- if ( pg != root_pg )
+- d->arch.paging.free_page(d, pg);
++ {
++ if ( pg == root_pg )
++ continue;
++
++ d->arch.paging.free_page(d, pg);
++
++ /* Arbitrarily check preemption every 1024 iterations */
++ if ( preempted && !(++i % 1024) && general_preempt_check() )
++ {
++ *preempted = true;
++ break;
++ }
++ }
+
+ if ( root_pg )
+ page_list_add(root_pg, &p2m->pages);
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index ac9a1ae07808..3b0d781991b5 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -2770,8 +2770,12 @@ int shadow_enable(struct domain *d, u32 mode)
+ out_locked:
+ paging_unlock(d);
+ out_unlocked:
++ /*
++ * This is fine to ignore the preemption here because only the root
++ * will be allocated by p2m_alloc_table().
++ */
+ if ( rv != 0 && !pagetable_is_null(p2m_get_pagetable(p2m)) )
+- p2m_teardown(p2m, true);
++ p2m_teardown(p2m, true, NULL);
+ if ( rv != 0 && pg != NULL )
+ {
+ pg->count_info &= ~PGC_count_mask;
+@@ -2824,7 +2828,9 @@ void shadow_teardown(struct domain *d, bool *preempted)
+ for_each_vcpu ( d, v )
+ shadow_vcpu_teardown(v);
+
+- p2m_teardown(p2m_get_hostp2m(d), false);
++ p2m_teardown(p2m_get_hostp2m(d), false, preempted);
++ if ( preempted && *preempted )
++ return;
+
+ paging_lock(d);
+
+@@ -2945,7 +2951,7 @@ void shadow_final_teardown(struct domain *d)
+ shadow_teardown(d, NULL);
+
+ /* It is now safe to pull down the p2m map. */
+- p2m_teardown(p2m_get_hostp2m(d), true);
++ p2m_teardown(p2m_get_hostp2m(d), true, NULL);
+ /* Free any shadow memory that the p2m teardown released */
+ paging_lock(d);
+ shadow_set_allocation(d, 0, NULL);
+diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
+index c3c16748e7d5..2db9ab0122f2 100644
+--- a/xen/include/asm-x86/p2m.h
++++ b/xen/include/asm-x86/p2m.h
+@@ -574,7 +574,7 @@ int p2m_init(struct domain *d);
+ int p2m_alloc_table(struct p2m_domain *p2m);
+
+ /* Return all the p2m resources to Xen. */
+-void p2m_teardown(struct p2m_domain *p2m, bool remove_root);
++void p2m_teardown(struct p2m_domain *p2m, bool remove_root, bool *preempted);
+ void p2m_final_teardown(struct domain *d);
+
+ /* Add a page to a domain's p2m table */
+--
+2.37.3
+
diff --git a/0012-IOMMU-make-domctl-handler-tolerate-NULL-domain.patch b/0012-IOMMU-make-domctl-handler-tolerate-NULL-domain.patch
deleted file mode 100644
index 37b9005..0000000
--- a/0012-IOMMU-make-domctl-handler-tolerate-NULL-domain.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From 4cf9a7c7bdb9d544fbac81105bbc1059ba3dd932 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Jun 2022 14:02:30 +0200
-Subject: [PATCH 12/51] IOMMU: make domctl handler tolerate NULL domain
-
-Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX,
-XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed
-here, when the domctl was passed DOMID_INVALID.
-
-Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...")
-Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Paul Durrant <paul@xen.org>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: fa4d84e6dd3c3bfd23a525b75a5483d4ce15adbb
-master date: 2022-04-26 10:25:54 +0200
----
- xen/drivers/passthrough/iommu.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
-index caaba62c8865..287f63fc736f 100644
---- a/xen/drivers/passthrough/iommu.c
-+++ b/xen/drivers/passthrough/iommu.c
-@@ -535,7 +535,7 @@ int iommu_do_domctl(
- {
- int ret = -ENODEV;
-
-- if ( !is_iommu_enabled(d) )
-+ if ( !(d ? is_iommu_enabled(d) : iommu_enabled) )
- return -EOPNOTSUPP;
-
- #ifdef CONFIG_HAS_PCI
---
-2.35.1
-
diff --git a/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch b/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch
new file mode 100644
index 0000000..9390500
--- /dev/null
+++ b/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch
@@ -0,0 +1,149 @@
+From 755a9b52844de3e1e47aa1fc9991a4240ccfbf35 Mon Sep 17 00:00:00 2001
+From: Henry Wang <Henry.Wang@arm.com>
+Date: Tue, 11 Oct 2022 14:55:08 +0200
+Subject: [PATCH 12/26] libxl, docs: Use arch-specific default paging memory
+
+The default paging memory (described in the `shadow_memory` entry in xl
+config) in libxl is used to determine the memory pool size for xl
+guests. Currently this size is only used for x86, and contains a part
+of RAM to shadow the resident processes. Since there are no shadow mode
+guests on Arm, the part of RAM to shadow the resident processes
+is not necessary. Therefore, this commit splits the function
+`libxl_get_required_shadow_memory()` into arch-specific helpers and
+renames the helper to `libxl__arch_get_required_paging_memory()`.
+
+On x86, this helper returns the original value from
+`libxl_get_required_shadow_memory()`, so no functional change is intended.
+
+On Arm, this helper returns 1MB per vcpu plus 4KB per MiB of RAM
+for the P2M map and an additional 512KB.
+
+Also update the xl.cfg documentation to add Arm documentation
+reflecting the code changes, and correct the comment style to follow
+the Xen coding style.
+
+This is part of CVE-2022-33747 / XSA-409.
+
+Suggested-by: Julien Grall <jgrall@amazon.com>
+Signed-off-by: Henry Wang <Henry.Wang@arm.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 156a239ea288972425f967ac807b3cb5b5e14874
+master date: 2022-10-11 14:28:37 +0200
+---
+ docs/man/xl.cfg.5.pod.in | 5 +++++
+ tools/libs/light/libxl_arch.h | 4 ++++
+ tools/libs/light/libxl_arm.c | 14 ++++++++++++++
+ tools/libs/light/libxl_utils.c | 9 ++-------
+ tools/libs/light/libxl_x86.c | 13 +++++++++++++
+ 5 files changed, 38 insertions(+), 7 deletions(-)
+
+diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
+index b98d1613987e..eda1e77ebd06 100644
+--- a/docs/man/xl.cfg.5.pod.in
++++ b/docs/man/xl.cfg.5.pod.in
+@@ -1768,6 +1768,11 @@ are not using hardware assisted paging (i.e. you are using shadow
+ mode) and your guest workload consists of a very large number of
+ similar processes then increasing this value may improve performance.
+
++On Arm, this field is used to determine the size of the guest P2M pages
++pool, and the default value is 1MB per vCPU plus 4KB per MB of RAM for
++the P2M map and additional 512KB for extended regions. Users should
++adjust this value if bigger P2M pool size is needed.
++
+ =back
+
+ =head3 Processor and Platform Features
+diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
+index 1522ecb97f72..5a060c2c3033 100644
+--- a/tools/libs/light/libxl_arch.h
++++ b/tools/libs/light/libxl_arch.h
+@@ -90,6 +90,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
+ libxl_domain_config *dst,
+ const libxl_domain_config *src);
+
++_hidden
++unsigned long libxl__arch_get_required_paging_memory(unsigned long maxmem_kb,
++ unsigned int smp_cpus);
++
+ #if defined(__i386__) || defined(__x86_64__)
+
+ #define LAPIC_BASE_ADDRESS 0xfee00000
+diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
+index eef1de093914..73a95e83af24 100644
+--- a/tools/libs/light/libxl_arm.c
++++ b/tools/libs/light/libxl_arm.c
+@@ -154,6 +154,20 @@ out:
+ return rc;
+ }
+
++unsigned long libxl__arch_get_required_paging_memory(unsigned long maxmem_kb,
++ unsigned int smp_cpus)
++{
++ /*
++ * 256 pages (1MB) per vcpu,
++ * plus 1 page per MiB of RAM for the P2M map,
++ * plus 1 page per MiB of extended region. This default value is 128 MiB
++ * which should be enough for domains that are not running backend.
++ * This is higher than the minimum that Xen would allocate if no value
++ * were given (but the Xen minimum is for safety, not performance).
++ */
++ return 4 * (256 * smp_cpus + maxmem_kb / 1024 + 128);
++}
++
+ static struct arch_info {
+ const char *guest_type;
+ const char *timer_compat;
+diff --git a/tools/libs/light/libxl_utils.c b/tools/libs/light/libxl_utils.c
+index 4699c4a0a36f..e276c0ee9cc3 100644
+--- a/tools/libs/light/libxl_utils.c
++++ b/tools/libs/light/libxl_utils.c
+@@ -18,6 +18,7 @@
+ #include <ctype.h>
+
+ #include "libxl_internal.h"
++#include "libxl_arch.h"
+ #include "_paths.h"
+
+ #ifndef LIBXL_HAVE_NONCONST_LIBXL_BASENAME_RETURN_VALUE
+@@ -39,13 +40,7 @@ char *libxl_basename(const char *name)
+
+ unsigned long libxl_get_required_shadow_memory(unsigned long maxmem_kb, unsigned int smp_cpus)
+ {
+- /* 256 pages (1MB) per vcpu,
+- plus 1 page per MiB of RAM for the P2M map,
+- plus 1 page per MiB of RAM to shadow the resident processes.
+- This is higher than the minimum that Xen would allocate if no value
+- were given (but the Xen minimum is for safety, not performance).
+- */
+- return 4 * (256 * smp_cpus + 2 * (maxmem_kb / 1024));
++ return libxl__arch_get_required_paging_memory(maxmem_kb, smp_cpus);
+ }
+
+ char *libxl_domid_to_name(libxl_ctx *ctx, uint32_t domid)
+diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
+index 1feadebb1852..51362893cf98 100644
+--- a/tools/libs/light/libxl_x86.c
++++ b/tools/libs/light/libxl_x86.c
+@@ -882,6 +882,19 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
+ libxl_defbool_val(src->b_info.arch_x86.msr_relaxed));
+ }
+
++unsigned long libxl__arch_get_required_paging_memory(unsigned long maxmem_kb,
++ unsigned int smp_cpus)
++{
++ /*
++ * 256 pages (1MB) per vcpu,
++ * plus 1 page per MiB of RAM for the P2M map,
++ * plus 1 page per MiB of RAM to shadow the resident processes.
++ * This is higher than the minimum that Xen would allocate if no value
++ * were given (but the Xen minimum is for safety, not performance).
++ */
++ return 4 * (256 * smp_cpus + 2 * (maxmem_kb / 1024));
++}
++
+ /*
+ * Local variables:
+ * mode: C
+--
+2.37.3
+
diff --git a/0013-IOMMU-x86-disallow-device-assignment-to-PoD-guests.patch b/0013-IOMMU-x86-disallow-device-assignment-to-PoD-guests.patch
deleted file mode 100644
index 8416c96..0000000
--- a/0013-IOMMU-x86-disallow-device-assignment-to-PoD-guests.patch
+++ /dev/null
@@ -1,229 +0,0 @@
-From 838f6c211f7f05f107e1acdfb0977ab61ec0bf2e Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Jun 2022 14:03:20 +0200
-Subject: [PATCH 13/51] IOMMU/x86: disallow device assignment to PoD guests
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-While it is okay for IOMMU page tables to be set up for guests starting
-in PoD mode, actual device assignment may only occur once all PoD
-entries have been removed from the P2M. So far this was enforced only
-for boot-time assignment, and only in the tool stack.
-
-Also use the new function to replace p2m_pod_entry_count(): Its unlocked
-access to p2m->pod.entry_count wasn't really okay (irrespective of the
-result being stale by the time the caller gets to see it). Nor was the
-use of that function in line with the immediately preceding comment: A
-PoD guest isn't just one with a non-zero entry count, but also one with
-a non-empty cache (e.g. prior to actually launching the guest).
-
-To allow the tool stack to see a consistent snapshot of PoD state, move
-the tail of XENMEM_{get,set}_pod_target handling into a function, adding
-proper locking there.
-
-In libxl take the liberty to use the new local variable r also for a
-pre-existing call into libxc.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: ad4312d764e8b40a1e45b64aac6d840a60c59f13
-master date: 2022-05-02 08:48:02 +0200
----
- xen/arch/x86/mm.c | 6 +---
- xen/arch/x86/mm/p2m-pod.c | 43 ++++++++++++++++++++++++++++-
- xen/common/vm_event.c | 2 +-
- xen/drivers/passthrough/x86/iommu.c | 3 +-
- xen/include/asm-x86/p2m.h | 21 +++++++-------
- 5 files changed, 57 insertions(+), 18 deletions(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index e222d9aa98ee..4ee2de11051d 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -4777,7 +4777,6 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
- {
- xen_pod_target_t target;
- struct domain *d;
-- struct p2m_domain *p2m;
-
- if ( copy_from_guest(&target, arg, 1) )
- return -EFAULT;
-@@ -4812,10 +4811,7 @@ long arch_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
- }
- else if ( rc >= 0 )
- {
-- p2m = p2m_get_hostp2m(d);
-- target.tot_pages = domain_tot_pages(d);
-- target.pod_cache_pages = p2m->pod.count;
-- target.pod_entries = p2m->pod.entry_count;
-+ p2m_pod_get_mem_target(d, &target);
-
- if ( __copy_to_guest(arg, &target, 1) )
- {
-diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
-index d8d1a0ce7ed7..a3c9d8a97423 100644
---- a/xen/arch/x86/mm/p2m-pod.c
-+++ b/xen/arch/x86/mm/p2m-pod.c
-@@ -20,6 +20,7 @@
- */
-
- #include <xen/event.h>
-+#include <xen/iocap.h>
- #include <xen/ioreq.h>
- #include <xen/mm.h>
- #include <xen/sched.h>
-@@ -362,7 +363,10 @@ p2m_pod_set_mem_target(struct domain *d, unsigned long target)
-
- ASSERT( pod_target >= p2m->pod.count );
-
-- ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
-+ if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
-+ ret = -ENOTEMPTY;
-+ else
-+ ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
-
- out:
- pod_unlock(p2m);
-@@ -370,6 +374,23 @@ out:
- return ret;
- }
-
-+void p2m_pod_get_mem_target(const struct domain *d, xen_pod_target_t *target)
-+{
-+ struct p2m_domain *p2m = p2m_get_hostp2m(d);
-+
-+ ASSERT(is_hvm_domain(d));
-+
-+ pod_lock(p2m);
-+ lock_page_alloc(p2m);
-+
-+ target->tot_pages = domain_tot_pages(d);
-+ target->pod_cache_pages = p2m->pod.count;
-+ target->pod_entries = p2m->pod.entry_count;
-+
-+ unlock_page_alloc(p2m);
-+ pod_unlock(p2m);
-+}
-+
- int p2m_pod_empty_cache(struct domain *d)
- {
- struct p2m_domain *p2m = p2m_get_hostp2m(d);
-@@ -1387,6 +1408,9 @@ guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
- if ( !paging_mode_translate(d) )
- return -EINVAL;
-
-+ if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
-+ return -ENOTEMPTY;
-+
- do {
- rc = mark_populate_on_demand(d, gfn, chunk_order);
-
-@@ -1408,3 +1432,20 @@ void p2m_pod_init(struct p2m_domain *p2m)
- for ( i = 0; i < ARRAY_SIZE(p2m->pod.mrp.list); ++i )
- p2m->pod.mrp.list[i] = gfn_x(INVALID_GFN);
- }
-+
-+bool p2m_pod_active(const struct domain *d)
-+{
-+ struct p2m_domain *p2m;
-+ bool res;
-+
-+ if ( !is_hvm_domain(d) )
-+ return false;
-+
-+ p2m = p2m_get_hostp2m(d);
-+
-+ pod_lock(p2m);
-+ res = p2m->pod.entry_count | p2m->pod.count;
-+ pod_unlock(p2m);
-+
-+ return res;
-+}
-diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
-index 70ab3ba406ff..21d2f0edf727 100644
---- a/xen/common/vm_event.c
-+++ b/xen/common/vm_event.c
-@@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, struct xen_domctl_vm_event_op *vec)
-
- rc = -EXDEV;
- /* Disallow paging in a PoD guest */
-- if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
-+ if ( p2m_pod_active(d) )
- break;
-
- /* domain_pause() not required here, see XSA-99 */
-diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
-index a36a6bd4b249..dc9936e16930 100644
---- a/xen/drivers/passthrough/x86/iommu.c
-+++ b/xen/drivers/passthrough/x86/iommu.c
-@@ -502,11 +502,12 @@ bool arch_iommu_use_permitted(const struct domain *d)
- {
- /*
- * Prevent device assign if mem paging, mem sharing or log-dirty
-- * have been enabled for this domain.
-+ * have been enabled for this domain, or if PoD is still in active use.
- */
- return d == dom_io ||
- (likely(!mem_sharing_enabled(d)) &&
- likely(!mem_paging_enabled(d)) &&
-+ likely(!p2m_pod_active(d)) &&
- likely(!p2m_get_hostp2m(d)->global_logdirty));
- }
-
-diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
-index 357a8087481e..f2af7a746ced 100644
---- a/xen/include/asm-x86/p2m.h
-+++ b/xen/include/asm-x86/p2m.h
-@@ -661,6 +661,12 @@ int p2m_pod_empty_cache(struct domain *d);
- * domain matches target */
- int p2m_pod_set_mem_target(struct domain *d, unsigned long target);
-
-+/* Obtain a consistent snapshot of PoD related domain state. */
-+void p2m_pod_get_mem_target(const struct domain *d, xen_pod_target_t *target);
-+
-+/* Check whether PoD is (still) active in a domain. */
-+bool p2m_pod_active(const struct domain *d);
-+
- /* Scan pod cache when offline/broken page triggered */
- int
- p2m_pod_offline_or_broken_hit(struct page_info *p);
-@@ -669,11 +675,6 @@ p2m_pod_offline_or_broken_hit(struct page_info *p);
- void
- p2m_pod_offline_or_broken_replace(struct page_info *p);
-
--static inline long p2m_pod_entry_count(const struct p2m_domain *p2m)
--{
-- return p2m->pod.entry_count;
--}
--
- void p2m_pod_init(struct p2m_domain *p2m);
-
- #else
-@@ -689,6 +690,11 @@ static inline int p2m_pod_empty_cache(struct domain *d)
- return 0;
- }
-
-+static inline bool p2m_pod_active(const struct domain *d)
-+{
-+ return false;
-+}
-+
- static inline int p2m_pod_offline_or_broken_hit(struct page_info *p)
- {
- return 0;
-@@ -699,11 +705,6 @@ static inline void p2m_pod_offline_or_broken_replace(struct page_info *p)
- ASSERT_UNREACHABLE();
- }
-
--static inline long p2m_pod_entry_count(const struct p2m_domain *p2m)
--{
-- return 0;
--}
--
- static inline void p2m_pod_init(struct p2m_domain *p2m) {}
-
- #endif
---
-2.35.1
-
diff --git a/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch b/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch
new file mode 100644
index 0000000..dee9d9c
--- /dev/null
+++ b/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch
@@ -0,0 +1,189 @@
+From 914fc8e8b4cc003e90d51bee0aef54687358530a Mon Sep 17 00:00:00 2001
+From: Henry Wang <Henry.Wang@arm.com>
+Date: Tue, 11 Oct 2022 14:55:21 +0200
+Subject: [PATCH 13/26] xen/arm: Construct the P2M pages pool for guests
+
+This commit constructs the p2m pages pool for guests from the
+data structure and helper perspective.
+
+This is implemented by:
+
+- Adding a `struct paging_domain` which contains a freelist, a
+counter variable and a spinlock to `struct arch_domain` to
+indicate the free p2m pages and the total number of p2m pages in
+the p2m pages pool.
+
+- Adding a helper `p2m_get_allocation` to get the p2m pool size.
+
+- Adding a helper `p2m_set_allocation` to set the p2m pages pool
+size. This helper should be called before allocating memory for
+a guest.
+
+- Adding a helper `p2m_teardown_allocation` to free the p2m pages
+pool. This helper should be called during xl domain destruction.
+
+This is part of CVE-2022-33747 / XSA-409.
+
+Signed-off-by: Henry Wang <Henry.Wang@arm.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+master commit: 55914f7fc91a468649b8a3ec3f53ae1c4aca6670
+master date: 2022-10-11 14:28:39 +0200
+---
+ xen/arch/arm/p2m.c | 88 ++++++++++++++++++++++++++++++++++++
+ xen/include/asm-arm/domain.h | 10 ++++
+ xen/include/asm-arm/p2m.h | 4 ++
+ 3 files changed, 102 insertions(+)
+
+diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
+index 27418ee5ee98..d8957dd8727c 100644
+--- a/xen/arch/arm/p2m.c
++++ b/xen/arch/arm/p2m.c
+@@ -50,6 +50,92 @@ static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
+ return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
+ }
+
++/* Return the size of the pool, rounded up to the nearest MB */
++unsigned int p2m_get_allocation(struct domain *d)
++{
++ unsigned long nr_pages = ACCESS_ONCE(d->arch.paging.p2m_total_pages);
++
++ return ROUNDUP(nr_pages, 1 << (20 - PAGE_SHIFT)) >> (20 - PAGE_SHIFT);
++}
++
++/*
++ * Set the pool of pages to the required number of pages.
++ * Returns 0 for success, non-zero for failure.
++ * Call with d->arch.paging.lock held.
++ */
++int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
++{
++ struct page_info *pg;
++
++ ASSERT(spin_is_locked(&d->arch.paging.lock));
++
++ for ( ; ; )
++ {
++ if ( d->arch.paging.p2m_total_pages < pages )
++ {
++ /* Need to allocate more memory from domheap */
++ pg = alloc_domheap_page(NULL, 0);
++ if ( pg == NULL )
++ {
++ printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
++ return -ENOMEM;
++ }
++ ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
++ d->arch.paging.p2m_total_pages + 1;
++ page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
++ }
++ else if ( d->arch.paging.p2m_total_pages > pages )
++ {
++ /* Need to return memory to domheap */
++ pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
++ if( pg )
++ {
++ ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
++ d->arch.paging.p2m_total_pages - 1;
++ free_domheap_page(pg);
++ }
++ else
++ {
++ printk(XENLOG_ERR
++ "Failed to free P2M pages, P2M freelist is empty.\n");
++ return -ENOMEM;
++ }
++ }
++ else
++ break;
++
++ /* Check to see if we need to yield and try again */
++ if ( preempted && general_preempt_check() )
++ {
++ *preempted = true;
++ return -ERESTART;
++ }
++ }
++
++ return 0;
++}
++
++int p2m_teardown_allocation(struct domain *d)
++{
++ int ret = 0;
++ bool preempted = false;
++
++ spin_lock(&d->arch.paging.lock);
++ if ( d->arch.paging.p2m_total_pages != 0 )
++ {
++ ret = p2m_set_allocation(d, 0, &preempted);
++ if ( preempted )
++ {
++ spin_unlock(&d->arch.paging.lock);
++ return -ERESTART;
++ }
++ ASSERT(d->arch.paging.p2m_total_pages == 0);
++ }
++ spin_unlock(&d->arch.paging.lock);
++
++ return ret;
++}
++
+ /* Unlock the flush and do a P2M TLB flush if necessary */
+ void p2m_write_unlock(struct p2m_domain *p2m)
+ {
+@@ -1599,7 +1685,9 @@ int p2m_init(struct domain *d)
+ unsigned int cpu;
+
+ rwlock_init(&p2m->lock);
++ spin_lock_init(&d->arch.paging.lock);
+ INIT_PAGE_LIST_HEAD(&p2m->pages);
++ INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
+
+ p2m->vmid = INVALID_VMID;
+
+diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
+index 7f8ddd3f5c3b..2f31795ab96d 100644
+--- a/xen/include/asm-arm/domain.h
++++ b/xen/include/asm-arm/domain.h
+@@ -40,6 +40,14 @@ struct vtimer {
+ uint64_t cval;
+ };
+
++struct paging_domain {
++ spinlock_t lock;
++ /* Free P2M pages from the pre-allocated P2M pool */
++ struct page_list_head p2m_freelist;
++ /* Number of pages from the pre-allocated P2M pool */
++ unsigned long p2m_total_pages;
++};
++
+ struct arch_domain
+ {
+ #ifdef CONFIG_ARM_64
+@@ -51,6 +59,8 @@ struct arch_domain
+
+ struct hvm_domain hvm;
+
++ struct paging_domain paging;
++
+ struct vmmio vmmio;
+
+ /* Continuable domain_relinquish_resources(). */
+diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
+index b3ba83283e11..c9598740bd02 100644
+--- a/xen/include/asm-arm/p2m.h
++++ b/xen/include/asm-arm/p2m.h
+@@ -218,6 +218,10 @@ void p2m_restore_state(struct vcpu *n);
+ /* Print debugging/statistial info about a domain's p2m */
+ void p2m_dump_info(struct domain *d);
+
++unsigned int p2m_get_allocation(struct domain *d);
++int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted);
++int p2m_teardown_allocation(struct domain *d);
++
+ static inline void p2m_write_lock(struct p2m_domain *p2m)
+ {
+ write_lock(&p2m->lock);
+--
+2.37.3
+
diff --git a/0014-x86-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch b/0014-x86-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch
deleted file mode 100644
index 69049f1..0000000
--- a/0014-x86-msr-handle-reads-to-MSR_P5_MC_-ADDR-TYPE.patch
+++ /dev/null
@@ -1,121 +0,0 @@
-From 9ebe2ba83644ec6cd33a93c68dab5f551adcbea0 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 7 Jun 2022 14:04:16 +0200
-Subject: [PATCH 14/51] x86/msr: handle reads to MSR_P5_MC_{ADDR,TYPE}
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Windows Server 2019 Essentials will unconditionally attempt to read
-P5_MC_ADDR MSR at boot and throw a BSOD if injected a #GP.
-
-Fix this by mapping MSR_P5_MC_{ADDR,TYPE} to
-MSR_IA32_MCi_{ADDR,STATUS}, as reported also done by hardware in Intel
-SDM "Mapping of the Pentium Processor Machine-Check Errors to the
-Machine-Check Architecture" section.
-
-Reported-by: Steffen Einsle <einsle@phptrix.de>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: ce59e472b581e4923f6892172dde62b88c8aa8b7
-master date: 2022-05-02 08:49:12 +0200
----
- xen/arch/x86/cpu/mcheck/mce.h | 6 ++++++
- xen/arch/x86/cpu/mcheck/mce_intel.c | 19 +++++++++++++++++++
- xen/arch/x86/cpu/mcheck/vmce.c | 2 ++
- xen/arch/x86/msr.c | 2 ++
- xen/include/asm-x86/msr-index.h | 3 +++
- 5 files changed, 32 insertions(+)
-
-diff --git a/xen/arch/x86/cpu/mcheck/mce.h b/xen/arch/x86/cpu/mcheck/mce.h
-index 195362691904..192315ecfa3d 100644
---- a/xen/arch/x86/cpu/mcheck/mce.h
-+++ b/xen/arch/x86/cpu/mcheck/mce.h
-@@ -169,6 +169,12 @@ static inline int mce_vendor_bank_msr(const struct vcpu *v, uint32_t msr)
- if (msr >= MSR_IA32_MC0_CTL2 &&
- msr < MSR_IA32_MCx_CTL2(v->arch.vmce.mcg_cap & MCG_CAP_COUNT) )
- return 1;
-+ fallthrough;
-+
-+ case X86_VENDOR_CENTAUR:
-+ case X86_VENDOR_SHANGHAI:
-+ if (msr == MSR_P5_MC_ADDR || msr == MSR_P5_MC_TYPE)
-+ return 1;
- break;
-
- case X86_VENDOR_AMD:
-diff --git a/xen/arch/x86/cpu/mcheck/mce_intel.c b/xen/arch/x86/cpu/mcheck/mce_intel.c
-index bb9f3a3ff795..d364e9bf5ad1 100644
---- a/xen/arch/x86/cpu/mcheck/mce_intel.c
-+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c
-@@ -1001,8 +1001,27 @@ int vmce_intel_wrmsr(struct vcpu *v, uint32_t msr, uint64_t val)
-
- int vmce_intel_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
- {
-+ const struct cpuid_policy *cp = v->domain->arch.cpuid;
- unsigned int bank = msr - MSR_IA32_MC0_CTL2;
-
-+ switch ( msr )
-+ {
-+ case MSR_P5_MC_ADDR:
-+ /*
-+ * Bank 0 is used for the 'bank 0 quirk' on older processors.
-+ * See vcpu_fill_mc_msrs() for reference.
-+ */
-+ *val = v->arch.vmce.bank[1].mci_addr;
-+ return 1;
-+
-+ case MSR_P5_MC_TYPE:
-+ *val = v->arch.vmce.bank[1].mci_status;
-+ return 1;
-+ }
-+
-+ if ( !(cp->x86_vendor & X86_VENDOR_INTEL) )
-+ return 0;
-+
- if ( bank < GUEST_MC_BANK_NUM )
- {
- *val = v->arch.vmce.bank[bank].mci_ctl2;
-diff --git a/xen/arch/x86/cpu/mcheck/vmce.c b/xen/arch/x86/cpu/mcheck/vmce.c
-index eb6434a3ba20..0899df58bcbf 100644
---- a/xen/arch/x86/cpu/mcheck/vmce.c
-+++ b/xen/arch/x86/cpu/mcheck/vmce.c
-@@ -150,6 +150,8 @@ static int bank_mce_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
- default:
- switch ( boot_cpu_data.x86_vendor )
- {
-+ case X86_VENDOR_CENTAUR:
-+ case X86_VENDOR_SHANGHAI:
- case X86_VENDOR_INTEL:
- ret = vmce_intel_rdmsr(v, msr, val);
- break;
-diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
-index aaedb2c31287..da305c7aa4c9 100644
---- a/xen/arch/x86/msr.c
-+++ b/xen/arch/x86/msr.c
-@@ -282,6 +282,8 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
- *val = msrs->misc_features_enables.raw;
- break;
-
-+ case MSR_P5_MC_ADDR:
-+ case MSR_P5_MC_TYPE:
- case MSR_IA32_MCG_CAP ... MSR_IA32_MCG_CTL: /* 0x179 -> 0x17b */
- case MSR_IA32_MCx_CTL2(0) ... MSR_IA32_MCx_CTL2(31): /* 0x280 -> 0x29f */
- case MSR_IA32_MCx_CTL(0) ... MSR_IA32_MCx_MISC(31): /* 0x400 -> 0x47f */
-diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
-index 3e038db618ff..31964b88af7a 100644
---- a/xen/include/asm-x86/msr-index.h
-+++ b/xen/include/asm-x86/msr-index.h
-@@ -15,6 +15,9 @@
- * abbreviated name. Exceptions will be considered on a case-by-case basis.
- */
-
-+#define MSR_P5_MC_ADDR 0
-+#define MSR_P5_MC_TYPE 0x00000001
-+
- #define MSR_APIC_BASE 0x0000001b
- #define APIC_BASE_BSP (_AC(1, ULL) << 8)
- #define APIC_BASE_EXTD (_AC(1, ULL) << 10)
---
-2.35.1
-
diff --git a/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch b/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch
new file mode 100644
index 0000000..fe24269
--- /dev/null
+++ b/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch
@@ -0,0 +1,108 @@
+From 3a16da801e14b8ff996b6f7408391ce488abd925 Mon Sep 17 00:00:00 2001
+From: Henry Wang <Henry.Wang@arm.com>
+Date: Tue, 11 Oct 2022 14:55:40 +0200
+Subject: [PATCH 14/26] xen/arm, libxl: Implement XEN_DOMCTL_shadow_op for Arm
+
+This commit implements the `XEN_DOMCTL_shadow_op` support in Xen
+for Arm. The p2m pages pool size for xl guests is supposed to be
+determined by `XEN_DOMCTL_shadow_op`. Hence, this commit:
+
+- Introduces a function `p2m_domctl` and implements the subops
+`XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION` and
+`XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION` of `XEN_DOMCTL_shadow_op`.
+
+- Adds the `XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION` support in libxl.
+
+This therefore enables setting the shadow memory pool size
+when creating a guest from xl, and getting the shadow memory pool size
+from Xen.
+
+Note that the `XEN_DOMCTL_shadow_op` added in this commit is only
+a dummy op, and the functionality of setting/getting p2m memory pool
+size for xl guests will be added in following commits.
+
+This is part of CVE-2022-33747 / XSA-409.
+
+Signed-off-by: Henry Wang <Henry.Wang@arm.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+master commit: cf2a68d2ffbc3ce95e01449d46180bddb10d24a0
+master date: 2022-10-11 14:28:42 +0200
+---
+ tools/libs/light/libxl_arm.c | 12 ++++++++++++
+ xen/arch/arm/domctl.c | 32 ++++++++++++++++++++++++++++++++
+ 2 files changed, 44 insertions(+)
+
+diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
+index 73a95e83af24..22a0c561bbc6 100644
+--- a/tools/libs/light/libxl_arm.c
++++ b/tools/libs/light/libxl_arm.c
+@@ -131,6 +131,18 @@ int libxl__arch_domain_create(libxl__gc *gc,
+ libxl__domain_build_state *state,
+ uint32_t domid)
+ {
++ libxl_ctx *ctx = libxl__gc_owner(gc);
++ unsigned int shadow_mb = DIV_ROUNDUP(d_config->b_info.shadow_memkb, 1024);
++
++ int r = xc_shadow_control(ctx->xch, domid,
++ XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
++ &shadow_mb, 0);
++ if (r) {
++ LOGED(ERROR, domid,
++ "Failed to set %u MiB shadow allocation", shadow_mb);
++ return ERROR_FAIL;
++ }
++
+ return 0;
+ }
+
+diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
+index 1baf25c3d98b..9bf72e693019 100644
+--- a/xen/arch/arm/domctl.c
++++ b/xen/arch/arm/domctl.c
+@@ -47,11 +47,43 @@ static int handle_vuart_init(struct domain *d,
+ return rc;
+ }
+
++static long p2m_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
++ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
++{
++ if ( unlikely(d == current->domain) )
++ {
++ printk(XENLOG_ERR "Tried to do a p2m domctl op on itself.\n");
++ return -EINVAL;
++ }
++
++ if ( unlikely(d->is_dying) )
++ {
++ printk(XENLOG_ERR "Tried to do a p2m domctl op on dying domain %u\n",
++ d->domain_id);
++ return -EINVAL;
++ }
++
++ switch ( sc->op )
++ {
++ case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
++ return 0;
++ case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
++ return 0;
++ default:
++ {
++ printk(XENLOG_ERR "Bad p2m domctl op %u\n", sc->op);
++ return -EINVAL;
++ }
++ }
++}
++
+ long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+ {
+ switch ( domctl->cmd )
+ {
++ case XEN_DOMCTL_shadow_op:
++ return p2m_domctl(d, &domctl->u.shadow_op, u_domctl);
+ case XEN_DOMCTL_cacheflush:
+ {
+ gfn_t s = _gfn(domctl->u.cacheflush.start_pfn);
+--
+2.37.3
+
diff --git a/0015-kconfig-detect-LD-implementation.patch b/0015-kconfig-detect-LD-implementation.patch
deleted file mode 100644
index 4507bc7..0000000
--- a/0015-kconfig-detect-LD-implementation.patch
+++ /dev/null
@@ -1,46 +0,0 @@
-From 3754bd128d1a6b3d5864d1a3ee5d27b67d35387a Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 7 Jun 2022 14:05:06 +0200
-Subject: [PATCH 15/51] kconfig: detect LD implementation
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Detect GNU and LLVM ld implementations. This is required for further
-patches that will introduce diverging behaviour depending on the
-linker implementation in use.
-
-Note that LLVM ld returns "compatible with GNU linkers" as part of the
-version string, so be on the safe side and use '^' to only match at
-the start of the line in case LLVM ever decides to change the text to
-use "compatible with GNU ld" instead.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Michal Orzel <michal.orzel@arm.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-master commit: c70c4b624f85f7d4e28c70a804a0a3f20d73092b
-master date: 2022-05-02 08:50:39 +0200
----
- xen/Kconfig | 6 ++++++
- 1 file changed, 6 insertions(+)
-
-diff --git a/xen/Kconfig b/xen/Kconfig
-index bcbd2758e5d3..0c89afd50fcf 100644
---- a/xen/Kconfig
-+++ b/xen/Kconfig
-@@ -23,6 +23,12 @@ config CLANG_VERSION
- int
- default $(shell,$(BASEDIR)/scripts/clang-version.sh $(CC))
-
-+config LD_IS_GNU
-+ def_bool $(success,$(LD) --version | head -n 1 | grep -q "^GNU ld")
-+
-+config LD_IS_LLVM
-+ def_bool $(success,$(LD) --version | head -n 1 | grep -q "^LLD")
-+
- # -fvisibility=hidden reduces -fpic cost, if it's available
- config CC_HAS_VISIBILITY_ATTRIBUTE
- def_bool $(cc-option,-fvisibility=hidden)
---
-2.35.1
-
diff --git a/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch b/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch
new file mode 100644
index 0000000..704543a
--- /dev/null
+++ b/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch
@@ -0,0 +1,289 @@
+From 44e9dcc48b81bca202a5b31926125a6a59a4c72e Mon Sep 17 00:00:00 2001
+From: Henry Wang <Henry.Wang@arm.com>
+Date: Tue, 11 Oct 2022 14:55:53 +0200
+Subject: [PATCH 15/26] xen/arm: Allocate and free P2M pages from the P2M pool
+
+This commit sets up/tears down the p2m pages pool for non-privileged Arm
+guests by calling `p2m_set_allocation` and `p2m_teardown_allocation`.
+
+- For dom0, P2M pages should come from heap directly instead of p2m
+pool, so that the kernel may take advantage of the extended regions.
+
+- For xl guests, the setting of the p2m pool is called in
+`XEN_DOMCTL_shadow_op` and the p2m pool is destroyed in
+`domain_relinquish_resources`. Note that domctl->u.shadow_op.mb is
+updated with the new size when setting the p2m pool.
+
+- For dom0less domUs, the setting of the p2m pool is called before
+allocating memory during domain creation. Users can specify the p2m
+pool size by `xen,domain-p2m-mem-mb` dts property.
+
+To actually allocate/free pages from the p2m pool, this commit adds
+two helper functions namely `p2m_alloc_page` and `p2m_free_page` to
+`struct p2m_domain`. By replacing the `alloc_domheap_page` and
+`free_domheap_page` with these two helper functions, p2m pages can
+be added/removed from the list of p2m pool rather than from the heap.
+
+Since the page from `p2m_alloc_page` is cleaned, take the opportunity
+to remove the redundant `clean_page` in `p2m_create_table`.
+
+This is part of CVE-2022-33747 / XSA-409.
+
+Signed-off-by: Henry Wang <Henry.Wang@arm.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+master commit: cbea5a1149ca7fd4b7cdbfa3ec2e4f109b601ff7
+master date: 2022-10-11 14:28:44 +0200
+---
+ docs/misc/arm/device-tree/booting.txt | 8 ++++
+ xen/arch/arm/domain.c | 6 +++
+ xen/arch/arm/domain_build.c | 29 ++++++++++++++
+ xen/arch/arm/domctl.c | 23 ++++++++++-
+ xen/arch/arm/p2m.c | 57 +++++++++++++++++++++++++--
+ 5 files changed, 118 insertions(+), 5 deletions(-)
+
+diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
+index 71895663a4de..d92ccc56ffe0 100644
+--- a/docs/misc/arm/device-tree/booting.txt
++++ b/docs/misc/arm/device-tree/booting.txt
+@@ -182,6 +182,14 @@ with the following properties:
+ Both #address-cells and #size-cells need to be specified because
+ both sub-nodes (described shortly) have reg properties.
+
++- xen,domain-p2m-mem-mb
++
++ Optional. A 32-bit integer specifying the amount of megabytes of RAM
++ used for the domain P2M pool. This is in-sync with the shadow_memory
++ option in xl.cfg. Leaving this field empty in device tree will lead to
++ the default size of domain P2M pool, i.e. 1MB per guest vCPU plus 4KB
++ per MB of guest RAM plus 512KB for guest extended regions.
++
+ Under the "xen,domain" compatible node, one or more sub-nodes are present
+ for the DomU kernel and ramdisk.
+
+diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
+index 2694c39127c5..a818f33a1afa 100644
+--- a/xen/arch/arm/domain.c
++++ b/xen/arch/arm/domain.c
+@@ -997,6 +997,7 @@ enum {
+ PROG_page,
+ PROG_mapping,
+ PROG_p2m,
++ PROG_p2m_pool,
+ PROG_done,
+ };
+
+@@ -1062,6 +1063,11 @@ int domain_relinquish_resources(struct domain *d)
+ if ( ret )
+ return ret;
+
++ PROGRESS(p2m_pool):
++ ret = p2m_teardown_allocation(d);
++ if( ret )
++ return ret;
++
+ PROGRESS(done):
+ break;
+
+diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
+index d02bacbcd1ed..8aec3755ca5d 100644
+--- a/xen/arch/arm/domain_build.c
++++ b/xen/arch/arm/domain_build.c
+@@ -2833,6 +2833,21 @@ static void __init find_gnttab_region(struct domain *d,
+ kinfo->gnttab_start, kinfo->gnttab_start + kinfo->gnttab_size);
+ }
+
++static unsigned long __init domain_p2m_pages(unsigned long maxmem_kb,
++ unsigned int smp_cpus)
++{
++ /*
++ * Keep in sync with libxl__get_required_paging_memory().
++ * 256 pages (1MB) per vcpu, plus 1 page per MiB of RAM for the P2M map,
++ * plus 128 pages to cover extended regions.
++ */
++ unsigned long memkb = 4 * (256 * smp_cpus + (maxmem_kb / 1024) + 128);
++
++ BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
++
++ return DIV_ROUND_UP(memkb, 1024) << (20 - PAGE_SHIFT);
++}
++
+ static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
+ {
+ unsigned int i;
+@@ -2924,6 +2939,8 @@ static int __init construct_domU(struct domain *d,
+ struct kernel_info kinfo = {};
+ int rc;
+ u64 mem;
++ u32 p2m_mem_mb;
++ unsigned long p2m_pages;
+
+ rc = dt_property_read_u64(node, "memory", &mem);
+ if ( !rc )
+@@ -2933,6 +2950,18 @@ static int __init construct_domU(struct domain *d,
+ }
+ kinfo.unassigned_mem = (paddr_t)mem * SZ_1K;
+
++ rc = dt_property_read_u32(node, "xen,domain-p2m-mem-mb", &p2m_mem_mb);
++ /* If xen,domain-p2m-mem-mb is not specified, use the default value. */
++ p2m_pages = rc ?
++ p2m_mem_mb << (20 - PAGE_SHIFT) :
++ domain_p2m_pages(mem, d->max_vcpus);
++
++ spin_lock(&d->arch.paging.lock);
++ rc = p2m_set_allocation(d, p2m_pages, NULL);
++ spin_unlock(&d->arch.paging.lock);
++ if ( rc != 0 )
++ return rc;
++
+ printk("*** LOADING DOMU cpus=%u memory=%"PRIx64"KB ***\n", d->max_vcpus, mem);
+
+ kinfo.vpl011 = dt_property_read_bool(node, "vpl011");
+diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
+index 9bf72e693019..c8fdeb124084 100644
+--- a/xen/arch/arm/domctl.c
++++ b/xen/arch/arm/domctl.c
+@@ -50,6 +50,9 @@ static int handle_vuart_init(struct domain *d,
+ static long p2m_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+ {
++ long rc;
++ bool preempted = false;
++
+ if ( unlikely(d == current->domain) )
+ {
+ printk(XENLOG_ERR "Tried to do a p2m domctl op on itself.\n");
+@@ -66,9 +69,27 @@ static long p2m_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
+ switch ( sc->op )
+ {
+ case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
+- return 0;
++ {
++ /* Allow and handle preemption */
++ spin_lock(&d->arch.paging.lock);
++ rc = p2m_set_allocation(d, sc->mb << (20 - PAGE_SHIFT), &preempted);
++ spin_unlock(&d->arch.paging.lock);
++
++ if ( preempted )
++ /* Not finished. Set up to re-run the call. */
++ rc = hypercall_create_continuation(__HYPERVISOR_domctl, "h",
++ u_domctl);
++ else
++ /* Finished. Return the new allocation. */
++ sc->mb = p2m_get_allocation(d);
++
++ return rc;
++ }
+ case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
++ {
++ sc->mb = p2m_get_allocation(d);
+ return 0;
++ }
+ default:
+ {
+ printk(XENLOG_ERR "Bad p2m domctl op %u\n", sc->op);
+diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
+index d8957dd8727c..b2d856a801af 100644
+--- a/xen/arch/arm/p2m.c
++++ b/xen/arch/arm/p2m.c
+@@ -50,6 +50,54 @@ static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
+ return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
+ }
+
++static struct page_info *p2m_alloc_page(struct domain *d)
++{
++ struct page_info *pg;
++
++ spin_lock(&d->arch.paging.lock);
++ /*
++ * For hardware domain, there should be no limit in the number of pages that
++ * can be allocated, so that the kernel may take advantage of the extended
++ * regions. Hence, allocate p2m pages for hardware domains from heap.
++ */
++ if ( is_hardware_domain(d) )
++ {
++ pg = alloc_domheap_page(NULL, 0);
++ if ( pg == NULL )
++ {
++ printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
++ spin_unlock(&d->arch.paging.lock);
++ return NULL;
++ }
++ }
++ else
++ {
++ pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
++ if ( unlikely(!pg) )
++ {
++ spin_unlock(&d->arch.paging.lock);
++ return NULL;
++ }
++ d->arch.paging.p2m_total_pages--;
++ }
++ spin_unlock(&d->arch.paging.lock);
++
++ return pg;
++}
++
++static void p2m_free_page(struct domain *d, struct page_info *pg)
++{
++ spin_lock(&d->arch.paging.lock);
++ if ( is_hardware_domain(d) )
++ free_domheap_page(pg);
++ else
++ {
++ d->arch.paging.p2m_total_pages++;
++ page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
++ }
++ spin_unlock(&d->arch.paging.lock);
++}
++
+ /* Return the size of the pool, rounded up to the nearest MB */
+ unsigned int p2m_get_allocation(struct domain *d)
+ {
+@@ -751,7 +799,7 @@ static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry)
+
+ ASSERT(!p2m_is_valid(*entry));
+
+- page = alloc_domheap_page(NULL, 0);
++ page = p2m_alloc_page(p2m->domain);
+ if ( page == NULL )
+ return -ENOMEM;
+
+@@ -878,7 +926,7 @@ static void p2m_free_entry(struct p2m_domain *p2m,
+ pg = mfn_to_page(mfn);
+
+ page_list_del(pg, &p2m->pages);
+- free_domheap_page(pg);
++ p2m_free_page(p2m->domain, pg);
+ }
+
+ static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
+@@ -902,7 +950,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
+ ASSERT(level < target);
+ ASSERT(p2m_is_superpage(*entry, level));
+
+- page = alloc_domheap_page(NULL, 0);
++ page = p2m_alloc_page(p2m->domain);
+ if ( !page )
+ return false;
+
+@@ -1641,7 +1689,7 @@ int p2m_teardown(struct domain *d)
+
+ while ( (pg = page_list_remove_head(&p2m->pages)) )
+ {
+- free_domheap_page(pg);
++ p2m_free_page(p2m->domain, pg);
+ count++;
+ /* Arbitrarily preempt every 512 iterations */
+ if ( !(count % 512) && hypercall_preempt_check() )
+@@ -1665,6 +1713,7 @@ void p2m_final_teardown(struct domain *d)
+ return;
+
+ ASSERT(page_list_empty(&p2m->pages));
++ ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
+
+ if ( p2m->root )
+ free_domheap_pages(p2m->root, P2M_ROOT_ORDER);
+--
+2.37.3
+
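As an illustration (not part of the patch above): the default P2M pool sizing described in that commit message (1MB per vCPU, 4KB per MB of guest RAM, 512KB for extended regions) can be checked with a small standalone sketch in plain C. The helper name and types here are hypothetical and merely mirror domain_p2m_pages(), assuming 4 KiB pages.

#include <stdio.h>

/* Hypothetical sketch mirroring domain_p2m_pages(); assumes 4 KiB pages. */
static unsigned long p2m_pool_pages(unsigned long maxmem_kb, unsigned int vcpus)
{
    /* 256 pages (1MB) per vCPU + 1 page per MB of RAM + 128 pages (512KB). */
    unsigned long memkb = 4 * (256UL * vcpus + maxmem_kb / 1024 + 128);

    /* Round up to whole MB, then convert MB to 4 KiB pages. */
    return ((memkb + 1023) / 1024) << (20 - 12);
}

int main(void)
{
    /* Example: 4 vCPUs, 1 GiB RAM -> 4MB + 4MB + 0.5MB, rounded up to 9MB. */
    printf("%lu pages\n", p2m_pool_pages(1024 * 1024, 4));  /* prints 2304 */
    return 0;
}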
diff --git a/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch b/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch
new file mode 100644
index 0000000..6283d47
--- /dev/null
+++ b/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch
@@ -0,0 +1,66 @@
+From 32cb81501c8b858fe9a451650804ec3024a8b364 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 11 Oct 2022 14:56:29 +0200
+Subject: [PATCH 16/26] gnttab: correct locking on transitive grant copy error
+ path
+
+While the comment next to the lock dropping in preparation for
+recursively calling acquire_grant_for_copy() mistakenly talks about the
+rd == td case (excluded a few lines further up), the same concerns apply
+to the calling of release_grant_for_copy() on a subsequent error path.
+
+This is CVE-2022-33748 / XSA-411.
+
+Fixes: ad48fb963dbf ("gnttab: fix transitive grant handling")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+master commit: 6e3aab858eef614a21a782a3b73acc88e74690ea
+master date: 2022-10-11 14:29:30 +0200
+---
+ xen/common/grant_table.c | 19 ++++++++++++++++---
+ 1 file changed, 16 insertions(+), 3 deletions(-)
+
+diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
+index 4c742cd8fe81..d8ca645b96ff 100644
+--- a/xen/common/grant_table.c
++++ b/xen/common/grant_table.c
+@@ -2613,9 +2613,8 @@ acquire_grant_for_copy(
+ trans_domid);
+
+ /*
+- * acquire_grant_for_copy() could take the lock on the
+- * remote table (if rd == td), so we have to drop the lock
+- * here and reacquire.
++ * acquire_grant_for_copy() will take the lock on the remote table,
++ * so we have to drop the lock here and reacquire.
+ */
+ active_entry_release(act);
+ grant_read_unlock(rgt);
+@@ -2652,11 +2651,25 @@ acquire_grant_for_copy(
+ act->trans_gref != trans_gref ||
+ !act->is_sub_page)) )
+ {
++ /*
++ * Like above for acquire_grant_for_copy() we need to drop and then
++ * re-acquire the locks here to prevent lock order inversion issues.
++ * Unlike for acquire_grant_for_copy() we don't need to re-check
++ * anything, as release_grant_for_copy() doesn't depend on the grant
++ * table entry: It only updates internal state and the status flags.
++ */
++ active_entry_release(act);
++ grant_read_unlock(rgt);
++
+ release_grant_for_copy(td, trans_gref, readonly);
+ rcu_unlock_domain(td);
++
++ grant_read_lock(rgt);
++ act = active_entry_acquire(rgt, gref);
+ reduce_status_for_pin(rd, act, status, readonly);
+ active_entry_release(act);
+ grant_read_unlock(rgt);
++
+ put_page(*page);
+ *page = NULL;
+ return ERESTART;
+--
+2.37.3
+
diff --git a/0016-linker-lld-do-not-generate-quoted-section-names.patch b/0016-linker-lld-do-not-generate-quoted-section-names.patch
deleted file mode 100644
index 5b3a8cd..0000000
--- a/0016-linker-lld-do-not-generate-quoted-section-names.patch
+++ /dev/null
@@ -1,54 +0,0 @@
-From 88b653f73928117461dc250acd1e830a47a14c2b Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 7 Jun 2022 14:05:24 +0200
-Subject: [PATCH 16/51] linker/lld: do not generate quoted section names
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-LLVM LD doesn't strip the quotes from the section names, and so the
-resulting binary ends up with section names like:
-
- [ 1] ".text" PROGBITS ffff82d040200000 00008000
- 000000000018cbc1 0000000000000000 AX 0 0 4096
-
-This confuses some tools (like gdb) and prevents proper parsing of the
-binary.
-
-The issue has already been reported and is being fixed in LLD. In
-order to workaround this issue and keep the GNU ld support define
-different DECL_SECTION macros depending on the used ld
-implementation.
-
-Drop the quotes from the definitions of the debug sections in
-DECL_DEBUG{2}, as those quotes are not required for GNU ld either.
-
-Fixes: 6254920587c3 ('x86: quote section names when defining them in linker script')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 702c9a800eb3ecd4b8595998d37a769d470c5bb0
-master date: 2022-05-02 08:51:45 +0200
----
- xen/arch/x86/xen.lds.S | 6 +++++-
- 1 file changed, 5 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
-index 4c58f3209c3d..bc9b9651b192 100644
---- a/xen/arch/x86/xen.lds.S
-+++ b/xen/arch/x86/xen.lds.S
-@@ -18,7 +18,11 @@ ENTRY(efi_start)
- #else /* !EFI */
-
- #define FORMAT "elf64-x86-64"
--#define DECL_SECTION(x) #x : AT(ADDR(#x) - __XEN_VIRT_START)
-+#ifdef CONFIG_LD_IS_GNU
-+# define DECL_SECTION(x) x : AT(ADDR(#x) - __XEN_VIRT_START)
-+#else
-+# define DECL_SECTION(x) x : AT(ADDR(x) - __XEN_VIRT_START)
-+#endif
-
- ENTRY(start_pa)
-
---
-2.35.1
-
diff --git a/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch b/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch
new file mode 100644
index 0000000..ffbc311
--- /dev/null
+++ b/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch
@@ -0,0 +1,112 @@
+From e85e2a3c17b6cd38de041cdaf14d9efdcdabad1a Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 11 Oct 2022 14:59:10 +0200
+Subject: [PATCH 17/26] tools/libxl: Replace deprecated -soundhw on QEMU
+ command line
+
+-soundhw has been deprecated since 825ff02911c9 ("audio: add soundhw
+deprecation notice"), QEMU v5.1, and has been removed for the upcoming
+v7.1 by 039a68373c45 ("introduce -audio as a replacement for -soundhw").
+
+Instead we can just add the sound card with "-device", for most options
+that "-soundhw" could handle. "-device" is an option that existed
+before QEMU 1.0, and could already be used to add audio hardware.
+
+The list of possible options for libxl's "soundhw" is taken from the
+list in QEMU 7.0.
+
+The options for "soundhw" are listed in order of preference in the
+manual. The first three (hda, ac97, es1370) are PCI devices and easy
+to test on Linux, while the last four are ISA devices which don't seem
+to work out of the box on Linux.
+
+The sound card 'pcspk' isn't listed even though it used to be accepted
+by '-soundhw', because QEMU crashes when trying to add it to a Xen
+domain. Also, it wouldn't work with "-device"; it might need to be
+"-machine pcspk-audiodev=default" instead.
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
+master commit: 62ca138c2c052187783aca3957d3f47c4dcfd683
+master date: 2022-08-18 09:25:50 +0200
+---
+ docs/man/xl.cfg.5.pod.in | 6 +++---
+ tools/libs/light/libxl_dm.c | 19 ++++++++++++++++++-
+ tools/libs/light/libxl_types_internal.idl | 10 ++++++++++
+ 3 files changed, 31 insertions(+), 4 deletions(-)
+
+diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
+index eda1e77ebd06..ab7541f22c3e 100644
+--- a/docs/man/xl.cfg.5.pod.in
++++ b/docs/man/xl.cfg.5.pod.in
+@@ -2545,9 +2545,9 @@ The form serial=DEVICE is also accepted for backwards compatibility.
+
+ =item B<soundhw="DEVICE">
+
+-Select the virtual sound card to expose to the guest. The valid
+-devices are defined by the device model configuration, please see the
+-B<qemu(1)> manpage for details. The default is not to export any sound
++Select the virtual sound card to expose to the guest. The valid devices are
++B<hda>, B<ac97>, B<es1370>, B<adlib>, B<cs4231a>, B<gus>, B<sb16>, if they are
++available with the device model QEMU. The default is not to export any sound
+ device.
+
+ =item B<vkb_device=BOOLEAN>
+diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
+index 04bf5d85632e..fc264a3a13a6 100644
+--- a/tools/libs/light/libxl_dm.c
++++ b/tools/libs/light/libxl_dm.c
+@@ -1204,6 +1204,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
+ uint64_t ram_size;
+ const char *path, *chardev;
+ bool is_stubdom = libxl_defbool_val(b_info->device_model_stubdomain);
++ int rc;
+
+ dm_args = flexarray_make(gc, 16, 1);
+ dm_envs = flexarray_make(gc, 16, 1);
+@@ -1531,7 +1532,23 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
+ }
+ }
+ if (b_info->u.hvm.soundhw) {
+- flexarray_vappend(dm_args, "-soundhw", b_info->u.hvm.soundhw, NULL);
++ libxl__qemu_soundhw soundhw;
++
++ rc = libxl__qemu_soundhw_from_string(b_info->u.hvm.soundhw, &soundhw);
++ if (rc) {
++ LOGD(ERROR, guest_domid, "Unknown soundhw option '%s'", b_info->u.hvm.soundhw);
++ return ERROR_INVAL;
++ }
++
++ switch (soundhw) {
++ case LIBXL__QEMU_SOUNDHW_HDA:
++ flexarray_vappend(dm_args, "-device", "intel-hda",
++ "-device", "hda-duplex", NULL);
++ break;
++ default:
++ flexarray_append_pair(dm_args, "-device",
++ (char*)libxl__qemu_soundhw_to_string(soundhw));
++ }
+ }
+ if (!libxl__acpi_defbool_val(b_info)) {
+ flexarray_append(dm_args, "-no-acpi");
+diff --git a/tools/libs/light/libxl_types_internal.idl b/tools/libs/light/libxl_types_internal.idl
+index 3593e21dbb64..caa08d3229cd 100644
+--- a/tools/libs/light/libxl_types_internal.idl
++++ b/tools/libs/light/libxl_types_internal.idl
+@@ -55,3 +55,13 @@ libxl__device_action = Enumeration("device_action", [
+ (1, "ADD"),
+ (2, "REMOVE"),
+ ])
++
++libxl__qemu_soundhw = Enumeration("qemu_soundhw", [
++ (1, "ac97"),
++ (2, "adlib"),
++ (3, "cs4231a"),
++ (4, "es1370"),
++ (5, "gus"),
++ (6, "hda"),
++ (7, "sb16"),
++ ])
+--
+2.37.3
+
diff --git a/0017-xen-io-Fix-race-between-sending-an-I-O-and-domain-sh.patch b/0017-xen-io-Fix-race-between-sending-an-I-O-and-domain-sh.patch
deleted file mode 100644
index bc48a84..0000000
--- a/0017-xen-io-Fix-race-between-sending-an-I-O-and-domain-sh.patch
+++ /dev/null
@@ -1,142 +0,0 @@
-From 982a314bd3000a16c3128afadb36a8ff41029adc Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Tue, 7 Jun 2022 14:06:11 +0200
-Subject: [PATCH 17/51] xen: io: Fix race between sending an I/O and domain
- shutdown
-
-Xen provides hypercalls to shutdown (SCHEDOP_shutdown{,_code}) and
-resume a domain (XEN_DOMCTL_resumedomain). They can be used for checkpoint
-where the expectation is the domain should continue as nothing happened
-afterwards.
-
-hvmemul_do_io() and handle_pio() will act differently if the return
-code of hvm_send_ioreq() (resp. hvmemul_do_pio_buffer()) is X86EMUL_RETRY.
-
-In this case, the I/O state will be reset to STATE_IOREQ_NONE (i.e
-no I/O is pending) and/or the PC will not be advanced.
-
-If the shutdown request happens right after the I/O was sent to the
-IOREQ, then emulation code will end up to re-execute the instruction
-and therefore forward again the same I/O (at least when reading IO port).
-
-This would be problem if the access has a side-effect. A dumb example,
-is a device implementing a counter which is incremented by one for every
-access. When running shutdown/resume in a loop, the value read by the
-OS may not be the old value + 1.
-
-Add an extra boolean in the structure hvm_vcpu_io to indicate whether
-the I/O was suspended. This is then used in place of checking the domain
-is shutting down in hvmemul_do_io() and handle_pio() as they should
-act on suspend (i.e. vcpu_start_shutdown_deferral() returns false) rather
-than shutdown.
-
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Paul Durrant <paul@xen.org>
-master commit: b7e0d8978810b534725e94a321736496928f00a5
-master date: 2022-05-06 17:16:22 +0100
----
- xen/arch/arm/ioreq.c | 3 ++-
- xen/arch/x86/hvm/emulate.c | 3 ++-
- xen/arch/x86/hvm/io.c | 7 ++++---
- xen/common/ioreq.c | 4 ++++
- xen/include/xen/sched.h | 5 +++++
- 5 files changed, 17 insertions(+), 5 deletions(-)
-
-diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
-index 308650b40051..fbccef212bf1 100644
---- a/xen/arch/arm/ioreq.c
-+++ b/xen/arch/arm/ioreq.c
-@@ -80,9 +80,10 @@ enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
- return IO_ABORT;
-
- vio->req = p;
-+ vio->suspended = false;
-
- rc = ioreq_send(s, &p, 0);
-- if ( rc != IO_RETRY || v->domain->is_shutting_down )
-+ if ( rc != IO_RETRY || vio->suspended )
- vio->req.state = STATE_IOREQ_NONE;
- else if ( !ioreq_needs_completion(&vio->req) )
- rc = IO_HANDLED;
-diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
-index 76a2ccfafe23..7da348b5d486 100644
---- a/xen/arch/x86/hvm/emulate.c
-+++ b/xen/arch/x86/hvm/emulate.c
-@@ -239,6 +239,7 @@ static int hvmemul_do_io(
- ASSERT(p.count);
-
- vio->req = p;
-+ vio->suspended = false;
-
- rc = hvm_io_intercept(&p);
-
-@@ -334,7 +335,7 @@ static int hvmemul_do_io(
- else
- {
- rc = ioreq_send(s, &p, 0);
-- if ( rc != X86EMUL_RETRY || currd->is_shutting_down )
-+ if ( rc != X86EMUL_RETRY || vio->suspended )
- vio->req.state = STATE_IOREQ_NONE;
- else if ( !ioreq_needs_completion(&vio->req) )
- rc = X86EMUL_OKAY;
-diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
-index 93f1d1503fa6..80915f27e488 100644
---- a/xen/arch/x86/hvm/io.c
-+++ b/xen/arch/x86/hvm/io.c
-@@ -138,10 +138,11 @@ bool handle_pio(uint16_t port, unsigned int size, int dir)
-
- case X86EMUL_RETRY:
- /*
-- * We should not advance RIP/EIP if the domain is shutting down or
-- * if X86EMUL_RETRY has been returned by an internal handler.
-+ * We should not advance RIP/EIP if the vio was suspended (e.g.
-+ * because the domain is shutting down) or if X86EMUL_RETRY has
-+ * been returned by an internal handler.
- */
-- if ( curr->domain->is_shutting_down || !vcpu_ioreq_pending(curr) )
-+ if ( vio->suspended || !vcpu_ioreq_pending(curr) )
- return false;
- break;
-
-diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
-index d732dc045df9..42414b750bef 100644
---- a/xen/common/ioreq.c
-+++ b/xen/common/ioreq.c
-@@ -1256,6 +1256,7 @@ int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p,
- struct vcpu *curr = current;
- struct domain *d = curr->domain;
- struct ioreq_vcpu *sv;
-+ struct vcpu_io *vio = &curr->io;
-
- ASSERT(s);
-
-@@ -1263,7 +1264,10 @@ int ioreq_send(struct ioreq_server *s, ioreq_t *proto_p,
- return ioreq_send_buffered(s, proto_p);
-
- if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-+ {
-+ vio->suspended = true;
- return IOREQ_STATUS_RETRY;
-+ }
-
- list_for_each_entry ( sv,
- &s->ioreq_vcpu_list,
-diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
-index 28146ee404e6..9671062360ac 100644
---- a/xen/include/xen/sched.h
-+++ b/xen/include/xen/sched.h
-@@ -159,6 +159,11 @@ enum vio_completion {
- struct vcpu_io {
- /* I/O request in flight to device model. */
- enum vio_completion completion;
-+ /*
-+ * Indicate whether the I/O was not handled because the domain
-+ * is about to be paused.
-+ */
-+ bool suspended;
- ioreq_t req;
- };
-
---
-2.35.1
-
diff --git a/0018-build-suppress-GNU-ld-warning-about-RWX-load-segment.patch b/0018-build-suppress-GNU-ld-warning-about-RWX-load-segment.patch
deleted file mode 100644
index b20a99a..0000000
--- a/0018-build-suppress-GNU-ld-warning-about-RWX-load-segment.patch
+++ /dev/null
@@ -1,35 +0,0 @@
-From 4890031d224262a6cf43d3bef1af4a16c13db306 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Jun 2022 14:06:51 +0200
-Subject: [PATCH 18/51] build: suppress GNU ld warning about RWX load segments
-
-We cannot really avoid such and we're also not really at risk because of
-them, as we control page table permissions ourselves rather than relying
-on a loader of some sort. Present GNU ld master started warning about
-such, and hence 2.39 is anticipated to have this warning.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-master commit: 68f5aac012b9ae36ce9b65d9ca9cc9f232191ad3
-master date: 2022-05-18 11:17:19 +0200
----
- xen/Makefile | 2 ++
- 1 file changed, 2 insertions(+)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index ce4eca3ee4d7..4d9abe704628 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -260,6 +260,8 @@ endif
-
- AFLAGS += -D__ASSEMBLY__
-
-+LDFLAGS-$(call ld-option,--warn-rwx-segments) += --no-warn-rwx-segments
-+
- CFLAGS += $(CFLAGS-y)
- # allow extra CFLAGS externally via EXTRA_CFLAGS_XEN_CORE
- CFLAGS += $(EXTRA_CFLAGS_XEN_CORE)
---
-2.35.1
-
diff --git a/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch b/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch
new file mode 100644
index 0000000..d6ade98
--- /dev/null
+++ b/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch
@@ -0,0 +1,44 @@
+From e8882bcfe35520e950ba60acd6e67e65f1ce90a8 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 11 Oct 2022 14:59:26 +0200
+Subject: [PATCH 18/26] x86/CPUID: surface suitable value in EBX of XSTATE
+ subleaf 1
+
+While the SDM isn't very clear about this, our present behavior makes
+Linux 5.19 unhappy. As of commit 8ad7e8f69695 ("x86/fpu/xsave: Support
+XSAVEC in the kernel") they're using this CPUID output also to size
+the compacted area used by XSAVEC. Getting back zero there isn't really
+liked, yet for PV that's the default on capable hardware: XSAVES isn't
+exposed to PV domains.
+
+Considering that the size reported is that of the compacted save area,
+I view Linux's assumption as appropriate (short of the SDM properly
+considering the case). Therefore we need to populate the field also when
+only XSAVEC is supported for a guest.
+
+Fixes: 460b9a4b3630 ("x86/xsaves: enable xsaves/xrstors for hvm guest")
+Fixes: 8d050ed1097c ("x86: don't expose XSAVES capability to PV guests")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: c3bd0b83ea5b7c0da6542687436042eeea1e7909
+master date: 2022-08-24 14:23:59 +0200
+---
+ xen/arch/x86/cpuid.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
+index ff335f16390d..a647331f4793 100644
+--- a/xen/arch/x86/cpuid.c
++++ b/xen/arch/x86/cpuid.c
+@@ -1060,7 +1060,7 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
+ switch ( subleaf )
+ {
+ case 1:
+- if ( p->xstate.xsaves )
++ if ( p->xstate.xsavec || p->xstate.xsaves )
+ {
+ /*
+ * TODO: Figure out what to do for XSS state. VT-x manages
+--
+2.37.3
+
diff --git a/0019-build-silence-GNU-ld-warning-about-executable-stacks.patch b/0019-build-silence-GNU-ld-warning-about-executable-stacks.patch
deleted file mode 100644
index e4d739b..0000000
--- a/0019-build-silence-GNU-ld-warning-about-executable-stacks.patch
+++ /dev/null
@@ -1,35 +0,0 @@
-From 1bc669a568a9f4bdab9e9ddb95823ba370dc0baf Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Jun 2022 14:07:11 +0200
-Subject: [PATCH 19/51] build: silence GNU ld warning about executable stacks
-
-While for C files the compiler is supposed to arrange for emitting
-respective information, for assembly sources we're responsible ourselves.
-Present GNU ld master started warning about such, and hence 2.39 is
-anticipated to have this warning.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-master commit: 62d22296a95d259c934ca2f39ac511d729cfbb68
-master date: 2022-05-18 11:18:45 +0200
----
- xen/Makefile | 2 ++
- 1 file changed, 2 insertions(+)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 4d9abe704628..971028eda240 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -260,6 +260,8 @@ endif
-
- AFLAGS += -D__ASSEMBLY__
-
-+$(call cc-option-add,AFLAGS,CC,-Wa$(comma)--noexecstack)
-+
- LDFLAGS-$(call ld-option,--warn-rwx-segments) += --no-warn-rwx-segments
-
- CFLAGS += $(CFLAGS-y)
---
-2.35.1
-
diff --git a/0019-xen-sched-introduce-cpupool_update_node_affinity.patch b/0019-xen-sched-introduce-cpupool_update_node_affinity.patch
new file mode 100644
index 0000000..957d0fe
--- /dev/null
+++ b/0019-xen-sched-introduce-cpupool_update_node_affinity.patch
@@ -0,0 +1,257 @@
+From d4e971ad12dd27913dffcf96b5de378ea7b476e1 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 11 Oct 2022 14:59:40 +0200
+Subject: [PATCH 19/26] xen/sched: introduce cpupool_update_node_affinity()
+
+For updating the node affinities of all domains in a cpupool, add a
+new function cpupool_update_node_affinity().
+
+In order to avoid multiple allocations of cpumasks, carve out memory
+allocation and freeing from domain_update_node_affinity() into new
+helpers, which can be used by cpupool_update_node_affinity().
+
+Modify domain_update_node_affinity() to take an additional parameter
+for passing the allocated memory in and to allocate and free the memory
+via the new helpers in case NULL was passed.
+
+This will help later to pre-allocate the cpumasks in order to avoid
+allocations in stop-machine context.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: a83fa1e2b96ace65b45dde6954d67012633a082b
+master date: 2022-09-05 11:42:30 +0100
+---
+ xen/common/sched/core.c | 54 ++++++++++++++++++++++++++------------
+ xen/common/sched/cpupool.c | 39 +++++++++++++++------------
+ xen/common/sched/private.h | 7 +++++
+ xen/include/xen/sched.h | 9 ++++++-
+ 4 files changed, 74 insertions(+), 35 deletions(-)
+
+diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
+index f07bd2681fcb..065a83eca912 100644
+--- a/xen/common/sched/core.c
++++ b/xen/common/sched/core.c
+@@ -1824,9 +1824,28 @@ int vcpu_affinity_domctl(struct domain *d, uint32_t cmd,
+ return ret;
+ }
+
+-void domain_update_node_affinity(struct domain *d)
++bool alloc_affinity_masks(struct affinity_masks *affinity)
+ {
+- cpumask_var_t dom_cpumask, dom_cpumask_soft;
++ if ( !alloc_cpumask_var(&affinity->hard) )
++ return false;
++ if ( !alloc_cpumask_var(&affinity->soft) )
++ {
++ free_cpumask_var(affinity->hard);
++ return false;
++ }
++
++ return true;
++}
++
++void free_affinity_masks(struct affinity_masks *affinity)
++{
++ free_cpumask_var(affinity->soft);
++ free_cpumask_var(affinity->hard);
++}
++
++void domain_update_node_aff(struct domain *d, struct affinity_masks *affinity)
++{
++ struct affinity_masks masks;
+ cpumask_t *dom_affinity;
+ const cpumask_t *online;
+ struct sched_unit *unit;
+@@ -1836,14 +1855,16 @@ void domain_update_node_affinity(struct domain *d)
+ if ( !d->vcpu || !d->vcpu[0] )
+ return;
+
+- if ( !zalloc_cpumask_var(&dom_cpumask) )
+- return;
+- if ( !zalloc_cpumask_var(&dom_cpumask_soft) )
++ if ( !affinity )
+ {
+- free_cpumask_var(dom_cpumask);
+- return;
++ affinity = &masks;
++ if ( !alloc_affinity_masks(affinity) )
++ return;
+ }
+
++ cpumask_clear(affinity->hard);
++ cpumask_clear(affinity->soft);
++
+ online = cpupool_domain_master_cpumask(d);
+
+ spin_lock(&d->node_affinity_lock);
+@@ -1864,22 +1885,21 @@ void domain_update_node_affinity(struct domain *d)
+ */
+ for_each_sched_unit ( d, unit )
+ {
+- cpumask_or(dom_cpumask, dom_cpumask, unit->cpu_hard_affinity);
+- cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
+- unit->cpu_soft_affinity);
++ cpumask_or(affinity->hard, affinity->hard, unit->cpu_hard_affinity);
++ cpumask_or(affinity->soft, affinity->soft, unit->cpu_soft_affinity);
+ }
+ /* Filter out non-online cpus */
+- cpumask_and(dom_cpumask, dom_cpumask, online);
+- ASSERT(!cpumask_empty(dom_cpumask));
++ cpumask_and(affinity->hard, affinity->hard, online);
++ ASSERT(!cpumask_empty(affinity->hard));
+ /* And compute the intersection between hard, online and soft */
+- cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask);
++ cpumask_and(affinity->soft, affinity->soft, affinity->hard);
+
+ /*
+ * If not empty, the intersection of hard, soft and online is the
+ * narrowest set we want. If empty, we fall back to hard&online.
+ */
+- dom_affinity = cpumask_empty(dom_cpumask_soft) ?
+- dom_cpumask : dom_cpumask_soft;
++ dom_affinity = cpumask_empty(affinity->soft) ? affinity->hard
++ : affinity->soft;
+
+ nodes_clear(d->node_affinity);
+ for_each_cpu ( cpu, dom_affinity )
+@@ -1888,8 +1908,8 @@ void domain_update_node_affinity(struct domain *d)
+
+ spin_unlock(&d->node_affinity_lock);
+
+- free_cpumask_var(dom_cpumask_soft);
+- free_cpumask_var(dom_cpumask);
++ if ( affinity == &masks )
++ free_affinity_masks(affinity);
+ }
+
+ typedef long ret_t;
+diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
+index 8c6e6eb9ccd5..45b6ff99561a 100644
+--- a/xen/common/sched/cpupool.c
++++ b/xen/common/sched/cpupool.c
+@@ -401,6 +401,25 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
+ return ret;
+ }
+
++/* Update affinities of all domains in a cpupool. */
++static void cpupool_update_node_affinity(const struct cpupool *c)
++{
++ struct affinity_masks masks;
++ struct domain *d;
++
++ if ( !alloc_affinity_masks(&masks) )
++ return;
++
++ rcu_read_lock(&domlist_read_lock);
++
++ for_each_domain_in_cpupool(d, c)
++ domain_update_node_aff(d, &masks);
++
++ rcu_read_unlock(&domlist_read_lock);
++
++ free_affinity_masks(&masks);
++}
++
+ /*
+ * assign a specific cpu to a cpupool
+ * cpupool_lock must be held
+@@ -408,7 +427,6 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
+ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
+ {
+ int ret;
+- struct domain *d;
+ const cpumask_t *cpus;
+
+ cpus = sched_get_opt_cpumask(c->gran, cpu);
+@@ -433,12 +451,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
+
+ rcu_read_unlock(&sched_res_rculock);
+
+- rcu_read_lock(&domlist_read_lock);
+- for_each_domain_in_cpupool(d, c)
+- {
+- domain_update_node_affinity(d);
+- }
+- rcu_read_unlock(&domlist_read_lock);
++ cpupool_update_node_affinity(c);
+
+ return 0;
+ }
+@@ -447,18 +460,14 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
+ {
+ int cpu = cpupool_moving_cpu;
+ const cpumask_t *cpus;
+- struct domain *d;
+ int ret;
+
+ if ( c != cpupool_cpu_moving )
+ return -EADDRNOTAVAIL;
+
+- /*
+- * We need this for scanning the domain list, both in
+- * cpu_disable_scheduler(), and at the bottom of this function.
+- */
+ rcu_read_lock(&domlist_read_lock);
+ ret = cpu_disable_scheduler(cpu);
++ rcu_read_unlock(&domlist_read_lock);
+
+ rcu_read_lock(&sched_res_rculock);
+ cpus = get_sched_res(cpu)->cpus;
+@@ -485,11 +494,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
+ }
+ rcu_read_unlock(&sched_res_rculock);
+
+- for_each_domain_in_cpupool(d, c)
+- {
+- domain_update_node_affinity(d);
+- }
+- rcu_read_unlock(&domlist_read_lock);
++ cpupool_update_node_affinity(c);
+
+ return ret;
+ }
+diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
+index a870320146ef..2b04b01a0c0a 100644
+--- a/xen/common/sched/private.h
++++ b/xen/common/sched/private.h
+@@ -593,6 +593,13 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step,
+ cpumask_copy(mask, unit->cpu_hard_affinity);
+ }
+
++struct affinity_masks {
++ cpumask_var_t hard;
++ cpumask_var_t soft;
++};
++
++bool alloc_affinity_masks(struct affinity_masks *affinity);
++void free_affinity_masks(struct affinity_masks *affinity);
+ void sched_rm_cpu(unsigned int cpu);
+ const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu);
+ void schedule_dump(struct cpupool *c);
+diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
+index 9671062360ac..3f4225738a40 100644
+--- a/xen/include/xen/sched.h
++++ b/xen/include/xen/sched.h
+@@ -655,8 +655,15 @@ static inline void get_knownalive_domain(struct domain *d)
+ ASSERT(!(atomic_read(&d->refcnt) & DOMAIN_DESTROYED));
+ }
+
++struct affinity_masks;
++
+ int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity);
+-void domain_update_node_affinity(struct domain *d);
++void domain_update_node_aff(struct domain *d, struct affinity_masks *affinity);
++
++static inline void domain_update_node_affinity(struct domain *d)
++{
++ domain_update_node_aff(d, NULL);
++}
+
+ /*
+ * To be implemented by each architecture, sanity checking the configuration
+--
+2.37.3
+
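As an aside (not part of the patch above): the shape introduced here, where the caller may pass in pre-allocated scratch masks and NULL means "allocate and free locally", is a general pattern. A minimal, hypothetical C sketch of the same idea, with made-up names and plain calloc() standing in for Xen's cpumask allocators:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct scratch { unsigned long *hard, *soft; };

static bool scratch_alloc(struct scratch *s, size_t words)
{
    s->hard = calloc(words, sizeof(*s->hard));
    s->soft = calloc(words, sizeof(*s->soft));
    if ( s->hard && s->soft )
        return true;
    free(s->hard);
    free(s->soft);
    return false;
}

static void scratch_free(struct scratch *s)
{
    free(s->soft);
    free(s->hard);
}

/* NULL scratch means "allocate here"; otherwise reuse the caller's buffers. */
static void do_update(struct scratch *scratch, size_t words)
{
    struct scratch local;

    if ( !scratch )
    {
        if ( !scratch_alloc(&local, words) )
            return;
        scratch = &local;
    }

    memset(scratch->hard, 0, words * sizeof(*scratch->hard));
    memset(scratch->soft, 0, words * sizeof(*scratch->soft));
    /* ... the real work would accumulate into scratch->hard / ->soft ... */

    if ( scratch == &local )
        scratch_free(scratch);
}

int main(void)
{
    struct scratch pre;

    do_update(NULL, 8);              /* allocates and frees internally */

    if ( scratch_alloc(&pre, 8) )    /* pre-allocate once, reuse in a loop */
    {
        for ( int i = 0; i < 3; i++ )
            do_update(&pre, 8);
        scratch_free(&pre);
    }
    return 0;
}

This mirrors why cpupool_update_node_affinity() can allocate the masks once and hand them to domain_update_node_aff() for every domain in the pool.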
diff --git a/0020-ns16550-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch b/0020-ns16550-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch
deleted file mode 100644
index baa1e15..0000000
--- a/0020-ns16550-use-poll-mode-if-INTERRUPT_LINE-is-0xff.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From f1be0b62a03b90a40a03e21f965e4cbb89809bb1 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
- <marmarek@invisiblethingslab.com>
-Date: Tue, 7 Jun 2022 14:07:34 +0200
-Subject: [PATCH 20/51] ns16550: use poll mode if INTERRUPT_LINE is 0xff
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Intel LPSS has INTERRUPT_LINE set to 0xff by default, that is declared
-by the PCI Local Bus Specification Revision 3.0 (from 2004) as
-"unknown"/"no connection". Fallback to poll mode in this case.
-The 0xff handling is x86-specific, the surrounding code is guarded with
-CONFIG_X86 anyway.
-
-Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 6a2ea1a2370a0c8a0210accac0ae62e68c185134
-master date: 2022-05-20 12:19:45 +0200
----
- xen/drivers/char/ns16550.c | 13 +++++++++++++
- 1 file changed, 13 insertions(+)
-
-diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
-index 30596d60d4ed..2d2bd2a02469 100644
---- a/xen/drivers/char/ns16550.c
-+++ b/xen/drivers/char/ns16550.c
-@@ -1221,6 +1221,19 @@ pci_uart_config(struct ns16550 *uart, bool_t skip_amt, unsigned int idx)
- pci_conf_read8(PCI_SBDF(0, b, d, f),
- PCI_INTERRUPT_LINE) : 0;
-
-+#ifdef CONFIG_X86
-+ /*
-+ * PCI Local Bus Specification Revision 3.0 defines 0xff value
-+ * as special only for X86.
-+ */
-+ if ( uart->irq == 0xff )
-+ uart->irq = 0;
-+#endif
-+ if ( !uart->irq )
-+ printk(XENLOG_INFO
-+ "ns16550: %pp: no legacy IRQ, using poll mode\n",
-+ &PCI_SBDF(0, b, d, f));
-+
- return 0;
- }
- }
---
-2.35.1
-
diff --git a/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch b/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch
new file mode 100644
index 0000000..30784c3
--- /dev/null
+++ b/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch
@@ -0,0 +1,263 @@
+From c377ceab0a007690a1e71c81a5232613c99e944d Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 11 Oct 2022 15:00:05 +0200
+Subject: [PATCH 20/26] xen/sched: carve out memory allocation and freeing from
+ schedule_cpu_rm()
+
+In order to prepare not allocating or freeing memory from
+schedule_cpu_rm(), move this functionality to dedicated functions.
+
+For now call those functions from schedule_cpu_rm().
+
+No change of behavior expected.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: d42be6f83480b3ada286dc18444331a816be88a3
+master date: 2022-09-05 11:42:30 +0100
+---
+ xen/common/sched/core.c | 143 ++++++++++++++++++++++---------------
+ xen/common/sched/private.h | 11 +++
+ 2 files changed, 98 insertions(+), 56 deletions(-)
+
+diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
+index 065a83eca912..2decb1161a63 100644
+--- a/xen/common/sched/core.c
++++ b/xen/common/sched/core.c
+@@ -3221,6 +3221,75 @@ out:
+ return ret;
+ }
+
++/*
++ * Allocate all memory needed for free_cpu_rm_data(), as allocations cannot
++ * be made in stop_machine() context.
++ *
++ * Between alloc_cpu_rm_data() and the real cpu removal action the relevant
++ * contents of struct sched_resource can't change, as the cpu in question is
++ * locked against any other movement to or from cpupools, and the data copied
++ * by alloc_cpu_rm_data() is modified only in case the cpu in question is
++ * being moved from or to a cpupool.
++ */
++struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
++{
++ struct cpu_rm_data *data;
++ const struct sched_resource *sr;
++ unsigned int idx;
++
++ rcu_read_lock(&sched_res_rculock);
++
++ sr = get_sched_res(cpu);
++ data = xmalloc_flex_struct(struct cpu_rm_data, sr, sr->granularity - 1);
++ if ( !data )
++ goto out;
++
++ data->old_ops = sr->scheduler;
++ data->vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
++ data->ppriv_old = sr->sched_priv;
++
++ for ( idx = 0; idx < sr->granularity - 1; idx++ )
++ {
++ data->sr[idx] = sched_alloc_res();
++ if ( data->sr[idx] )
++ {
++ data->sr[idx]->sched_unit_idle = sched_alloc_unit_mem();
++ if ( !data->sr[idx]->sched_unit_idle )
++ {
++ sched_res_free(&data->sr[idx]->rcu);
++ data->sr[idx] = NULL;
++ }
++ }
++ if ( !data->sr[idx] )
++ {
++ while ( idx > 0 )
++ sched_res_free(&data->sr[--idx]->rcu);
++ XFREE(data);
++ goto out;
++ }
++
++ data->sr[idx]->curr = data->sr[idx]->sched_unit_idle;
++ data->sr[idx]->scheduler = &sched_idle_ops;
++ data->sr[idx]->granularity = 1;
++
++ /* We want the lock not to change when replacing the resource. */
++ data->sr[idx]->schedule_lock = sr->schedule_lock;
++ }
++
++ out:
++ rcu_read_unlock(&sched_res_rculock);
++
++ return data;
++}
++
++void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu)
++{
++ sched_free_udata(mem->old_ops, mem->vpriv_old);
++ sched_free_pdata(mem->old_ops, mem->ppriv_old, cpu);
++
++ xfree(mem);
++}
++
+ /*
+ * Remove a pCPU from its cpupool. Its scheduler becomes &sched_idle_ops
+ * (the idle scheduler).
+@@ -3229,53 +3298,23 @@ out:
+ */
+ int schedule_cpu_rm(unsigned int cpu)
+ {
+- void *ppriv_old, *vpriv_old;
+- struct sched_resource *sr, **sr_new = NULL;
++ struct sched_resource *sr;
++ struct cpu_rm_data *data;
+ struct sched_unit *unit;
+- struct scheduler *old_ops;
+ spinlock_t *old_lock;
+ unsigned long flags;
+- int idx, ret = -ENOMEM;
++ int idx = 0;
+ unsigned int cpu_iter;
+
++ data = alloc_cpu_rm_data(cpu);
++ if ( !data )
++ return -ENOMEM;
++
+ rcu_read_lock(&sched_res_rculock);
+
+ sr = get_sched_res(cpu);
+- old_ops = sr->scheduler;
+
+- if ( sr->granularity > 1 )
+- {
+- sr_new = xmalloc_array(struct sched_resource *, sr->granularity - 1);
+- if ( !sr_new )
+- goto out;
+- for ( idx = 0; idx < sr->granularity - 1; idx++ )
+- {
+- sr_new[idx] = sched_alloc_res();
+- if ( sr_new[idx] )
+- {
+- sr_new[idx]->sched_unit_idle = sched_alloc_unit_mem();
+- if ( !sr_new[idx]->sched_unit_idle )
+- {
+- sched_res_free(&sr_new[idx]->rcu);
+- sr_new[idx] = NULL;
+- }
+- }
+- if ( !sr_new[idx] )
+- {
+- for ( idx--; idx >= 0; idx-- )
+- sched_res_free(&sr_new[idx]->rcu);
+- goto out;
+- }
+- sr_new[idx]->curr = sr_new[idx]->sched_unit_idle;
+- sr_new[idx]->scheduler = &sched_idle_ops;
+- sr_new[idx]->granularity = 1;
+-
+- /* We want the lock not to change when replacing the resource. */
+- sr_new[idx]->schedule_lock = sr->schedule_lock;
+- }
+- }
+-
+- ret = 0;
++ ASSERT(sr->granularity);
+ ASSERT(sr->cpupool != NULL);
+ ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
+ ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid));
+@@ -3283,10 +3322,6 @@ int schedule_cpu_rm(unsigned int cpu)
+ /* See comment in schedule_cpu_add() regarding lock switching. */
+ old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+
+- vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
+- ppriv_old = sr->sched_priv;
+-
+- idx = 0;
+ for_each_cpu ( cpu_iter, sr->cpus )
+ {
+ per_cpu(sched_res_idx, cpu_iter) = 0;
+@@ -3300,27 +3335,27 @@ int schedule_cpu_rm(unsigned int cpu)
+ else
+ {
+ /* Initialize unit. */
+- unit = sr_new[idx]->sched_unit_idle;
+- unit->res = sr_new[idx];
++ unit = data->sr[idx]->sched_unit_idle;
++ unit->res = data->sr[idx];
+ unit->is_running = true;
+ sched_unit_add_vcpu(unit, idle_vcpu[cpu_iter]);
+ sched_domain_insert_unit(unit, idle_vcpu[cpu_iter]->domain);
+
+ /* Adjust cpu masks of resources (old and new). */
+ cpumask_clear_cpu(cpu_iter, sr->cpus);
+- cpumask_set_cpu(cpu_iter, sr_new[idx]->cpus);
++ cpumask_set_cpu(cpu_iter, data->sr[idx]->cpus);
+ cpumask_set_cpu(cpu_iter, &sched_res_mask);
+
+ /* Init timer. */
+- init_timer(&sr_new[idx]->s_timer, s_timer_fn, NULL, cpu_iter);
++ init_timer(&data->sr[idx]->s_timer, s_timer_fn, NULL, cpu_iter);
+
+ /* Last resource initializations and insert resource pointer. */
+- sr_new[idx]->master_cpu = cpu_iter;
+- set_sched_res(cpu_iter, sr_new[idx]);
++ data->sr[idx]->master_cpu = cpu_iter;
++ set_sched_res(cpu_iter, data->sr[idx]);
+
+ /* Last action: set the new lock pointer. */
+ smp_mb();
+- sr_new[idx]->schedule_lock = &sched_free_cpu_lock;
++ data->sr[idx]->schedule_lock = &sched_free_cpu_lock;
+
+ idx++;
+ }
+@@ -3336,16 +3371,12 @@ int schedule_cpu_rm(unsigned int cpu)
+ /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+ spin_unlock_irqrestore(old_lock, flags);
+
+- sched_deinit_pdata(old_ops, ppriv_old, cpu);
++ sched_deinit_pdata(data->old_ops, data->ppriv_old, cpu);
+
+- sched_free_udata(old_ops, vpriv_old);
+- sched_free_pdata(old_ops, ppriv_old, cpu);
+-
+-out:
+ rcu_read_unlock(&sched_res_rculock);
+- xfree(sr_new);
++ free_cpu_rm_data(data, cpu);
+
+- return ret;
++ return 0;
+ }
+
+ struct scheduler *scheduler_get_default(void)
+diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
+index 2b04b01a0c0a..e286849a1312 100644
+--- a/xen/common/sched/private.h
++++ b/xen/common/sched/private.h
+@@ -600,6 +600,15 @@ struct affinity_masks {
+
+ bool alloc_affinity_masks(struct affinity_masks *affinity);
+ void free_affinity_masks(struct affinity_masks *affinity);
++
++/* Memory allocation related data for schedule_cpu_rm(). */
++struct cpu_rm_data {
++ const struct scheduler *old_ops;
++ void *ppriv_old;
++ void *vpriv_old;
++ struct sched_resource *sr[];
++};
++
+ void sched_rm_cpu(unsigned int cpu);
+ const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu);
+ void schedule_dump(struct cpupool *c);
+@@ -608,6 +617,8 @@ struct scheduler *scheduler_alloc(unsigned int sched_id);
+ void scheduler_free(struct scheduler *sched);
+ int cpu_disable_scheduler(unsigned int cpu);
+ int schedule_cpu_add(unsigned int cpu, struct cpupool *c);
++struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu);
++void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu);
+ int schedule_cpu_rm(unsigned int cpu);
+ int sched_move_domain(struct domain *d, struct cpupool *c);
+ struct cpupool *cpupool_get_by_id(unsigned int poolid);
+--
+2.37.3
+
diff --git a/0021-PCI-don-t-allow-pci-phantom-to-mark-real-devices-as-.patch b/0021-PCI-don-t-allow-pci-phantom-to-mark-real-devices-as-.patch
deleted file mode 100644
index 1312bda..0000000
--- a/0021-PCI-don-t-allow-pci-phantom-to-mark-real-devices-as-.patch
+++ /dev/null
@@ -1,56 +0,0 @@
-From 8e11ec8fbf6f933f8854f4bc54226653316903f2 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Jun 2022 14:08:06 +0200
-Subject: [PATCH 21/51] PCI: don't allow "pci-phantom=" to mark real devices as
- phantom functions
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-IOMMU code mapping / unmapping devices and interrupts will misbehave if
-a wrong command line option declared a function "phantom" when there's a
-real device at that position. Warn about this and adjust the specified
-stride (in the worst case ignoring the option altogether).
-
-Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 444b555dc9e09fa3ce90f066e0c88dec9b47f422
-master date: 2022-05-20 12:20:35 +0200
----
- xen/drivers/passthrough/pci.c | 19 ++++++++++++++++++-
- 1 file changed, 18 insertions(+), 1 deletion(-)
-
-diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
-index 395958698e6a..e0491c908f10 100644
---- a/xen/drivers/passthrough/pci.c
-+++ b/xen/drivers/passthrough/pci.c
-@@ -382,7 +382,24 @@ static struct pci_dev *alloc_pdev(struct pci_seg *pseg, u8 bus, u8 devfn)
- phantom_devs[i].slot == PCI_SLOT(devfn) &&
- phantom_devs[i].stride > PCI_FUNC(devfn) )
- {
-- pdev->phantom_stride = phantom_devs[i].stride;
-+ pci_sbdf_t sbdf = pdev->sbdf;
-+ unsigned int stride = phantom_devs[i].stride;
-+
-+ while ( (sbdf.fn += stride) > PCI_FUNC(devfn) )
-+ {
-+ if ( pci_conf_read16(sbdf, PCI_VENDOR_ID) == 0xffff &&
-+ pci_conf_read16(sbdf, PCI_DEVICE_ID) == 0xffff )
-+ continue;
-+ stride <<= 1;
-+ printk(XENLOG_WARNING
-+ "%pp looks to be a real device; bumping %04x:%02x:%02x stride to %u\n",
-+ &sbdf, phantom_devs[i].seg,
-+ phantom_devs[i].bus, phantom_devs[i].slot,
-+ stride);
-+ sbdf = pdev->sbdf;
-+ }
-+ if ( PCI_FUNC(stride) )
-+ pdev->phantom_stride = stride;
- break;
- }
- }
---
-2.35.1
-
diff --git a/0021-xen-sched-fix-cpu-hotplug.patch b/0021-xen-sched-fix-cpu-hotplug.patch
new file mode 100644
index 0000000..ea0b732
--- /dev/null
+++ b/0021-xen-sched-fix-cpu-hotplug.patch
@@ -0,0 +1,307 @@
+From 4f3204c2bc66db18c61600dd3e08bf1fd9584a1b Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 11 Oct 2022 15:00:19 +0200
+Subject: [PATCH 21/26] xen/sched: fix cpu hotplug
+
+Cpu unplugging is calling schedule_cpu_rm() via stop_machine_run() with
+interrupts disabled, thus any memory allocation or freeing must be
+avoided.
+
+Since commit 5047cd1d5dea ("xen/common: Use enhanced
+ASSERT_ALLOC_CONTEXT in xmalloc()") this restriction is being enforced
+via an assertion, which will now fail.
+
+Fix this by allocating needed memory before entering stop_machine_run()
+and freeing any memory only after having finished stop_machine_run().
+
+Fixes: 1ec410112cdd ("xen/sched: support differing granularity in schedule_cpu_[add/rm]()")
+Reported-by: Gao Ruifeng <ruifeng.gao@intel.com>
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: d84473689611eed32fd90b27e614f28af767fa3f
+master date: 2022-09-05 11:42:30 +0100
+---
+ xen/common/sched/core.c | 25 +++++++++++---
+ xen/common/sched/cpupool.c | 69 +++++++++++++++++++++++++++++---------
+ xen/common/sched/private.h | 5 +--
+ 3 files changed, 77 insertions(+), 22 deletions(-)
+
+diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
+index 2decb1161a63..900aab8f66a7 100644
+--- a/xen/common/sched/core.c
++++ b/xen/common/sched/core.c
+@@ -3231,7 +3231,7 @@ out:
+ * by alloc_cpu_rm_data() is modified only in case the cpu in question is
+ * being moved from or to a cpupool.
+ */
+-struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
++struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu, bool aff_alloc)
+ {
+ struct cpu_rm_data *data;
+ const struct sched_resource *sr;
+@@ -3244,6 +3244,17 @@ struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
+ if ( !data )
+ goto out;
+
++ if ( aff_alloc )
++ {
++ if ( !alloc_affinity_masks(&data->affinity) )
++ {
++ XFREE(data);
++ goto out;
++ }
++ }
++ else
++ memset(&data->affinity, 0, sizeof(data->affinity));
++
+ data->old_ops = sr->scheduler;
+ data->vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
+ data->ppriv_old = sr->sched_priv;
+@@ -3264,6 +3275,7 @@ struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
+ {
+ while ( idx > 0 )
+ sched_res_free(&data->sr[--idx]->rcu);
++ free_affinity_masks(&data->affinity);
+ XFREE(data);
+ goto out;
+ }
+@@ -3286,6 +3298,7 @@ void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu)
+ {
+ sched_free_udata(mem->old_ops, mem->vpriv_old);
+ sched_free_pdata(mem->old_ops, mem->ppriv_old, cpu);
++ free_affinity_masks(&mem->affinity);
+
+ xfree(mem);
+ }
+@@ -3296,17 +3309,18 @@ void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu)
+ * The cpu is already marked as "free" and not valid any longer for its
+ * cpupool.
+ */
+-int schedule_cpu_rm(unsigned int cpu)
++int schedule_cpu_rm(unsigned int cpu, struct cpu_rm_data *data)
+ {
+ struct sched_resource *sr;
+- struct cpu_rm_data *data;
+ struct sched_unit *unit;
+ spinlock_t *old_lock;
+ unsigned long flags;
+ int idx = 0;
+ unsigned int cpu_iter;
++ bool free_data = !data;
+
+- data = alloc_cpu_rm_data(cpu);
++ if ( !data )
++ data = alloc_cpu_rm_data(cpu, false);
+ if ( !data )
+ return -ENOMEM;
+
+@@ -3374,7 +3388,8 @@ int schedule_cpu_rm(unsigned int cpu)
+ sched_deinit_pdata(data->old_ops, data->ppriv_old, cpu);
+
+ rcu_read_unlock(&sched_res_rculock);
+- free_cpu_rm_data(data, cpu);
++ if ( free_data )
++ free_cpu_rm_data(data, cpu);
+
+ return 0;
+ }
+diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
+index 45b6ff99561a..b5a948639aad 100644
+--- a/xen/common/sched/cpupool.c
++++ b/xen/common/sched/cpupool.c
+@@ -402,22 +402,28 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
+ }
+
+ /* Update affinities of all domains in a cpupool. */
+-static void cpupool_update_node_affinity(const struct cpupool *c)
++static void cpupool_update_node_affinity(const struct cpupool *c,
++ struct affinity_masks *masks)
+ {
+- struct affinity_masks masks;
++ struct affinity_masks local_masks;
+ struct domain *d;
+
+- if ( !alloc_affinity_masks(&masks) )
+- return;
++ if ( !masks )
++ {
++ if ( !alloc_affinity_masks(&local_masks) )
++ return;
++ masks = &local_masks;
++ }
+
+ rcu_read_lock(&domlist_read_lock);
+
+ for_each_domain_in_cpupool(d, c)
+- domain_update_node_aff(d, &masks);
++ domain_update_node_aff(d, masks);
+
+ rcu_read_unlock(&domlist_read_lock);
+
+- free_affinity_masks(&masks);
++ if ( masks == &local_masks )
++ free_affinity_masks(masks);
+ }
+
+ /*
+@@ -451,15 +457,17 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
+
+ rcu_read_unlock(&sched_res_rculock);
+
+- cpupool_update_node_affinity(c);
++ cpupool_update_node_affinity(c, NULL);
+
+ return 0;
+ }
+
+-static int cpupool_unassign_cpu_finish(struct cpupool *c)
++static int cpupool_unassign_cpu_finish(struct cpupool *c,
++ struct cpu_rm_data *mem)
+ {
+ int cpu = cpupool_moving_cpu;
+ const cpumask_t *cpus;
++ struct affinity_masks *masks = mem ? &mem->affinity : NULL;
+ int ret;
+
+ if ( c != cpupool_cpu_moving )
+@@ -482,7 +490,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
+ */
+ if ( !ret )
+ {
+- ret = schedule_cpu_rm(cpu);
++ ret = schedule_cpu_rm(cpu, mem);
+ if ( ret )
+ cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus);
+ else
+@@ -494,7 +502,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
+ }
+ rcu_read_unlock(&sched_res_rculock);
+
+- cpupool_update_node_affinity(c);
++ cpupool_update_node_affinity(c, masks);
+
+ return ret;
+ }
+@@ -558,7 +566,7 @@ static long cpupool_unassign_cpu_helper(void *info)
+ cpupool_cpu_moving->cpupool_id, cpupool_moving_cpu);
+ spin_lock(&cpupool_lock);
+
+- ret = cpupool_unassign_cpu_finish(c);
++ ret = cpupool_unassign_cpu_finish(c, NULL);
+
+ spin_unlock(&cpupool_lock);
+ debugtrace_printk("cpupool_unassign_cpu ret=%ld\n", ret);
+@@ -701,7 +709,7 @@ static int cpupool_cpu_add(unsigned int cpu)
+ * This function is called in stop_machine context, so we can be sure no
+ * non-idle vcpu is active on the system.
+ */
+-static void cpupool_cpu_remove(unsigned int cpu)
++static void cpupool_cpu_remove(unsigned int cpu, struct cpu_rm_data *mem)
+ {
+ int ret;
+
+@@ -709,7 +717,7 @@ static void cpupool_cpu_remove(unsigned int cpu)
+
+ if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
+ {
+- ret = cpupool_unassign_cpu_finish(cpupool0);
++ ret = cpupool_unassign_cpu_finish(cpupool0, mem);
+ BUG_ON(ret);
+ }
+ cpumask_clear_cpu(cpu, &cpupool_free_cpus);
+@@ -775,7 +783,7 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
+ {
+ ret = cpupool_unassign_cpu_start(c, master_cpu);
+ BUG_ON(ret);
+- ret = cpupool_unassign_cpu_finish(c);
++ ret = cpupool_unassign_cpu_finish(c, NULL);
+ BUG_ON(ret);
+ }
+ }
+@@ -993,12 +1001,24 @@ void dump_runq(unsigned char key)
+ static int cpu_callback(
+ struct notifier_block *nfb, unsigned long action, void *hcpu)
+ {
++ static struct cpu_rm_data *mem;
++
+ unsigned int cpu = (unsigned long)hcpu;
+ int rc = 0;
+
+ switch ( action )
+ {
+ case CPU_DOWN_FAILED:
++ if ( system_state <= SYS_STATE_active )
++ {
++ if ( mem )
++ {
++ free_cpu_rm_data(mem, cpu);
++ mem = NULL;
++ }
++ rc = cpupool_cpu_add(cpu);
++ }
++ break;
+ case CPU_ONLINE:
+ if ( system_state <= SYS_STATE_active )
+ rc = cpupool_cpu_add(cpu);
+@@ -1006,12 +1026,31 @@ static int cpu_callback(
+ case CPU_DOWN_PREPARE:
+ /* Suspend/Resume don't change assignments of cpus to cpupools. */
+ if ( system_state <= SYS_STATE_active )
++ {
+ rc = cpupool_cpu_remove_prologue(cpu);
++ if ( !rc )
++ {
++ ASSERT(!mem);
++ mem = alloc_cpu_rm_data(cpu, true);
++ rc = mem ? 0 : -ENOMEM;
++ }
++ }
+ break;
+ case CPU_DYING:
+ /* Suspend/Resume don't change assignments of cpus to cpupools. */
+ if ( system_state <= SYS_STATE_active )
+- cpupool_cpu_remove(cpu);
++ {
++ ASSERT(mem);
++ cpupool_cpu_remove(cpu, mem);
++ }
++ break;
++ case CPU_DEAD:
++ if ( system_state <= SYS_STATE_active )
++ {
++ ASSERT(mem);
++ free_cpu_rm_data(mem, cpu);
++ mem = NULL;
++ }
+ break;
+ case CPU_RESUME_FAILED:
+ cpupool_cpu_remove_forced(cpu);
+diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
+index e286849a1312..0126a4bb9ed3 100644
+--- a/xen/common/sched/private.h
++++ b/xen/common/sched/private.h
+@@ -603,6 +603,7 @@ void free_affinity_masks(struct affinity_masks *affinity);
+
+ /* Memory allocation related data for schedule_cpu_rm(). */
+ struct cpu_rm_data {
++ struct affinity_masks affinity;
+ const struct scheduler *old_ops;
+ void *ppriv_old;
+ void *vpriv_old;
+@@ -617,9 +618,9 @@ struct scheduler *scheduler_alloc(unsigned int sched_id);
+ void scheduler_free(struct scheduler *sched);
+ int cpu_disable_scheduler(unsigned int cpu);
+ int schedule_cpu_add(unsigned int cpu, struct cpupool *c);
+-struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu);
++struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu, bool aff_alloc);
+ void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu);
+-int schedule_cpu_rm(unsigned int cpu);
++int schedule_cpu_rm(unsigned int cpu, struct cpu_rm_data *mem);
+ int sched_move_domain(struct domain *d, struct cpupool *c);
+ struct cpupool *cpupool_get_by_id(unsigned int poolid);
+ void cpupool_put(struct cpupool *pool);
+--
+2.37.3
+
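As an aside (not part of the patches above): the hotplug fix amounts to "allocate before entering the context where allocation is forbidden, free only after leaving it". A simplified, hypothetical sketch of that notifier flow in plain C, with invented names standing in for the real callback and the stop_machine phases:

#include <stdlib.h>

enum phase { DOWN_PREPARE, DYING, DEAD, DOWN_FAILED };

struct rm_data { int placeholder; };   /* stands in for struct cpu_rm_data */

static struct rm_data *mem;            /* pre-allocated teardown state */

static int cpu_callback(enum phase phase)
{
    switch ( phase )
    {
    case DOWN_PREPARE:   /* ordinary context: allocation is still allowed */
        mem = malloc(sizeof(*mem));
        return mem ? 0 : -1;

    case DYING:          /* stop_machine context: only consume, never allocate */
        /* ... tear the CPU down using *mem ... */
        return 0;

    case DEAD:           /* back in ordinary context: release what was set up */
    case DOWN_FAILED:
        free(mem);
        mem = NULL;
        return 0;
    }
    return 0;
}

int main(void)
{
    if ( cpu_callback(DOWN_PREPARE) == 0 )
    {
        cpu_callback(DYING);
        cpu_callback(DEAD);
    }
    return 0;
}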
diff --git a/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch b/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch
new file mode 100644
index 0000000..03f485a
--- /dev/null
+++ b/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch
@@ -0,0 +1,58 @@
+From 2b694dd2932be78431b14257f23b738f2fc8f6a1 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 11 Oct 2022 15:00:33 +0200
+Subject: [PATCH 22/26] Config.mk: correct PIE-related option(s) in
+ EMBEDDED_EXTRA_CFLAGS
+
+I haven't been able to find evidence of "-nopie" ever having been a
+supported compiler option. The correct spelling is "-no-pie".
+Furthermore like "-pie" this is an option which is solely passed to the
+linker. The compiler only recognizes "-fpie" / "-fPIE" / "-fno-pie", and
+it doesn't infer these options from "-pie" / "-no-pie".
+
+Add the compiler recognized form, but for the possible case of the
+variable also being used somewhere for linking keep the linker option as
+well (with corrected spelling).
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+
+Build: Drop -no-pie from EMBEDDED_EXTRA_CFLAGS
+
+This breaks all Clang builds, as demonstrated by Gitlab CI.
+
+Contrary to the description in ecd6b9759919, -no-pie is not even an option
+passed to the linker. GCC's actual behaviour is to inhibit the passing of
+-pie to the linker, as well as selecting different crt0 artefacts to be linked.
+
+EMBEDDED_EXTRA_CFLAGS is not used for $(CC)-doing-linking, and not liable to
+gain such a usecase.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+Tested-by: Stefano Stabellini <sstabellini@kernel.org>
+Fixes: ecd6b9759919 ("Config.mk: correct PIE-related option(s) in EMBEDDED_EXTRA_CFLAGS")
+master commit: ecd6b9759919fa6335b0be1b5fc5cce29a30c4f1
+master date: 2022-09-08 09:25:26 +0200
+master commit: 13a7c0074ac8fb31f6c0485429b7a20a1946cb22
+master date: 2022-09-27 15:40:42 -0700
+---
+ Config.mk | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/Config.mk b/Config.mk
+index 46de3cd1e0e1..6f95067b8de6 100644
+--- a/Config.mk
++++ b/Config.mk
+@@ -197,7 +197,7 @@ endif
+ APPEND_LDFLAGS += $(foreach i, $(APPEND_LIB), -L$(i))
+ APPEND_CFLAGS += $(foreach i, $(APPEND_INCLUDES), -I$(i))
+
+-EMBEDDED_EXTRA_CFLAGS := -nopie -fno-stack-protector -fno-stack-protector-all
++EMBEDDED_EXTRA_CFLAGS := -fno-pie -fno-stack-protector -fno-stack-protector-all
+ EMBEDDED_EXTRA_CFLAGS += -fno-exceptions -fno-asynchronous-unwind-tables
+
+ XEN_EXTFILES_URL ?= http://xenbits.xen.org/xen-extfiles
+--
+2.37.3
+
diff --git a/0022-x86-pv-Clean-up-_get_page_type.patch b/0022-x86-pv-Clean-up-_get_page_type.patch
deleted file mode 100644
index 0270beb..0000000
--- a/0022-x86-pv-Clean-up-_get_page_type.patch
+++ /dev/null
@@ -1,180 +0,0 @@
-From b152dfbc3ad71a788996440b18174d995c3bffc9 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 9 Jun 2022 15:27:19 +0200
-Subject: [PATCH 22/51] x86/pv: Clean up _get_page_type()
-
-Various fixes for clarity, ahead of making complicated changes.
-
- * Split the overflow check out of the if/else chain for type handling, as
- it's somewhat unrelated.
- * Comment the main if/else chain to explain what is going on. Adjust one
- ASSERT() and state the bit layout for validate-locked and partial states.
- * Correct the comment about TLB flushing, as it's backwards. The problem
- case is when writeable mappings are retained to a page becoming read-only,
- as it allows the guest to bypass Xen's safety checks for updates.
- * Reduce the scope of 'y'. It is an artefact of the cmpxchg loop and not
- valid for use by subsequent logic. Switch to using ACCESS_ONCE() to treat
- all reads as explicitly volatile. The only thing preventing the validated
- wait-loop being infinite is the compiler barrier hidden in cpu_relax().
- * Replace one page_get_owner(page) with the already-calculated 'd' already in
- scope.
-
-No functional change.
-
-This is part of XSA-401 / CVE-2022-26362.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: George Dunlap <george.dunlap@citrix.com>
-master commit: 9186e96b199e4f7e52e033b238f9fe869afb69c7
-master date: 2022-06-09 14:20:36 +0200
----
- xen/arch/x86/mm.c | 72 +++++++++++++++++++++++++++++++++++++++--------
- 1 file changed, 61 insertions(+), 11 deletions(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index 4ee2de11051d..79ad7fdd2b82 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -2906,16 +2906,17 @@ static int _put_page_type(struct page_info *page, unsigned int flags,
- static int _get_page_type(struct page_info *page, unsigned long type,
- bool preemptible)
- {
-- unsigned long nx, x, y = page->u.inuse.type_info;
-+ unsigned long nx, x;
- int rc = 0;
-
- ASSERT(!(type & ~(PGT_type_mask | PGT_pae_xen_l2)));
- ASSERT(!in_irq());
-
-- for ( ; ; )
-+ for ( unsigned long y = ACCESS_ONCE(page->u.inuse.type_info); ; )
- {
- x = y;
- nx = x + 1;
-+
- if ( unlikely((nx & PGT_count_mask) == 0) )
- {
- gdprintk(XENLOG_WARNING,
-@@ -2923,8 +2924,15 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- mfn_x(page_to_mfn(page)));
- return -EINVAL;
- }
-- else if ( unlikely((x & PGT_count_mask) == 0) )
-+
-+ if ( unlikely((x & PGT_count_mask) == 0) )
- {
-+ /*
-+ * Typeref 0 -> 1.
-+ *
-+ * Type changes are permitted when the typeref is 0. If the type
-+ * actually changes, the page needs re-validating.
-+ */
- struct domain *d = page_get_owner(page);
-
- if ( d && shadow_mode_enabled(d) )
-@@ -2935,8 +2943,8 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- {
- /*
- * On type change we check to flush stale TLB entries. It is
-- * vital that no other CPUs are left with mappings of a frame
-- * which is about to become writeable to the guest.
-+ * vital that no other CPUs are left with writeable mappings
-+ * to a frame which is intending to become pgtable/segdesc.
- */
- cpumask_t *mask = this_cpu(scratch_cpumask);
-
-@@ -2948,7 +2956,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
-
- if ( unlikely(!cpumask_empty(mask)) &&
- /* Shadow mode: track only writable pages. */
-- (!shadow_mode_enabled(page_get_owner(page)) ||
-+ (!shadow_mode_enabled(d) ||
- ((nx & PGT_type_mask) == PGT_writable_page)) )
- {
- perfc_incr(need_flush_tlb_flush);
-@@ -2979,7 +2987,14 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- }
- else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) )
- {
-- /* Don't log failure if it could be a recursive-mapping attempt. */
-+ /*
-+ * else, we're trying to take a new reference, of the wrong type.
-+ *
-+ * This (being able to prohibit use of the wrong type) is what the
-+ * typeref system exists for, but skip printing the failure if it
-+ * looks like a recursive mapping, as subsequent logic might
-+ * ultimately permit the attempt.
-+ */
- if ( ((x & PGT_type_mask) == PGT_l2_page_table) &&
- (type == PGT_l1_page_table) )
- return -EINVAL;
-@@ -2998,18 +3013,46 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- }
- else if ( unlikely(!(x & PGT_validated)) )
- {
-+ /*
-+ * else, the count is non-zero, and we're grabbing the right type;
-+ * but the page hasn't been validated yet.
-+ *
-+ * The page is in one of two states (depending on PGT_partial),
-+ * and should have exactly one reference.
-+ */
-+ ASSERT((x & (PGT_type_mask | PGT_count_mask)) == (type | 1));
-+
- if ( !(x & PGT_partial) )
- {
-- /* Someone else is updating validation of this page. Wait... */
-+ /*
-+ * The page has been left in the "validate locked" state
-+ * (i.e. PGT_[type] | 1) which means that a concurrent caller
-+ * of _get_page_type() is in the middle of validation.
-+ *
-+ * Spin waiting for the concurrent user to complete (partial
-+ * or fully validated), then restart our attempt to acquire a
-+ * type reference.
-+ */
- do {
- if ( preemptible && hypercall_preempt_check() )
- return -EINTR;
- cpu_relax();
-- } while ( (y = page->u.inuse.type_info) == x );
-+ } while ( (y = ACCESS_ONCE(page->u.inuse.type_info)) == x );
- continue;
- }
-- /* Type ref count was left at 1 when PGT_partial got set. */
-- ASSERT((x & PGT_count_mask) == 1);
-+
-+ /*
-+ * The page has been left in the "partial" state
-+ * (i.e., PGT_[type] | PGT_partial | 1).
-+ *
-+ * Rather than bumping the type count, we need to try to grab the
-+ * validation lock; if we succeed, we need to validate the page,
-+ * then drop the general ref associated with the PGT_partial bit.
-+ *
-+ * We grab the validation lock by setting nx to (PGT_[type] | 1)
-+ * (i.e., non-zero type count, neither PGT_validated nor
-+ * PGT_partial set).
-+ */
- nx = x & ~PGT_partial;
- }
-
-@@ -3058,6 +3101,13 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- }
-
- out:
-+ /*
-+ * Did we drop the PGT_partial bit when acquiring the typeref? If so,
-+ * drop the general reference that went along with it.
-+ *
-+ * N.B. validate_page() may have have re-set PGT_partial, not reflected in
-+ * nx, but will have taken an extra ref when doing so.
-+ */
- if ( (x & PGT_partial) && !(nx & PGT_partial) )
- put_page(page);
-
---
-2.35.1
-
diff --git a/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch b/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch
new file mode 100644
index 0000000..45f7509
--- /dev/null
+++ b/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch
@@ -0,0 +1,41 @@
+From 49510071ee93905378e54664778760ed3908d447 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 11 Oct 2022 15:00:59 +0200
+Subject: [PATCH 23/26] tools/xenstore: minor fix of the migration stream doc
+
+Drop mentioning the non-existent read-only socket in the migration
+stream description document.
+
+The related record field was removed in commit 8868a0e3f674 ("docs:
+update the xenstore migration stream documentation").
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+master commit: ace1d2eff80d3d66c37ae765dae3e3cb5697e5a4
+master date: 2022-09-08 09:25:58 +0200
+---
+ docs/designs/xenstore-migration.md | 8 +++-----
+ 1 file changed, 3 insertions(+), 5 deletions(-)
+
+diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
+index 5f1155273ec3..78530bbb0ef4 100644
+--- a/docs/designs/xenstore-migration.md
++++ b/docs/designs/xenstore-migration.md
+@@ -129,11 +129,9 @@ xenstored state that needs to be restored.
+ | `evtchn-fd` | The file descriptor used to communicate with |
+ | | the event channel driver |
+
+-xenstored will resume in the original process context. Hence `rw-socket-fd` and
+-`ro-socket-fd` simply specify the file descriptors of the sockets. Sockets
+-are not always used, however, and so -1 will be used to denote an unused
+-socket.
+-
++xenstored will resume in the original process context. Hence `rw-socket-fd`
++simply specifies the file descriptor of the socket. Sockets are not always
++used, however, and so -1 will be used to denote an unused socket.
+
+ \pagebreak
+
+--
+2.37.3
+
diff --git a/0023-x86-pv-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch b/0023-x86-pv-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch
deleted file mode 100644
index 1e3febd..0000000
--- a/0023-x86-pv-Fix-ABAC-cmpxchg-race-in-_get_page_type.patch
+++ /dev/null
@@ -1,201 +0,0 @@
-From 8dab3f79b122e69cbcdebca72cdc14f004ee2193 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 9 Jun 2022 15:27:37 +0200
-Subject: [PATCH 23/51] x86/pv: Fix ABAC cmpxchg() race in _get_page_type()
-
-_get_page_type() suffers from a race condition where it incorrectly assumes
-that because 'x' was read and a subsequent a cmpxchg() succeeds, the type
-cannot have changed in-between. Consider:
-
-CPU A:
- 1. Creates an L2e referencing pg
- `-> _get_page_type(pg, PGT_l1_page_table), sees count 0, type PGT_writable_page
- 2. Issues flush_tlb_mask()
-CPU B:
- 3. Creates a writeable mapping of pg
- `-> _get_page_type(pg, PGT_writable_page), count increases to 1
- 4. Writes into new mapping, creating a TLB entry for pg
- 5. Removes the writeable mapping of pg
- `-> _put_page_type(pg), count goes back down to 0
-CPU A:
- 7. Issues cmpxchg(), setting count 1, type PGT_l1_page_table
-
-CPU B now has a writeable mapping to pg, which Xen believes is a pagetable and
-suitably protected (i.e. read-only). The TLB flush in step 2 must be deferred
-until after the guest is prohibited from creating new writeable mappings,
-which is after step 7.
-
-Defer all safety actions until after the cmpxchg() has successfully taken the
-intended typeref, because that is what prevents concurrent users from using
-the old type.
-
-Also remove the early validation for writeable and shared pages. This removes
-race conditions where one half of a parallel mapping attempt can return
-successfully before:
- * The IOMMU pagetables are in sync with the new page type
- * Writeable mappings to shared pages have been torn down
-
-This is part of XSA-401 / CVE-2022-26362.
-
-Reported-by: Jann Horn <jannh@google.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: George Dunlap <george.dunlap@citrix.com>
-master commit: 8cc5036bc385112a82f1faff27a0970e6440dfed
-master date: 2022-06-09 14:21:04 +0200
----
- xen/arch/x86/mm.c | 116 ++++++++++++++++++++++++++--------------------
- 1 file changed, 67 insertions(+), 49 deletions(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index 79ad7fdd2b82..c6429b0f749a 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -2933,56 +2933,12 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- * Type changes are permitted when the typeref is 0. If the type
- * actually changes, the page needs re-validating.
- */
-- struct domain *d = page_get_owner(page);
--
-- if ( d && shadow_mode_enabled(d) )
-- shadow_prepare_page_type_change(d, page, type);
-
- ASSERT(!(x & PGT_pae_xen_l2));
- if ( (x & PGT_type_mask) != type )
- {
-- /*
-- * On type change we check to flush stale TLB entries. It is
-- * vital that no other CPUs are left with writeable mappings
-- * to a frame which is intending to become pgtable/segdesc.
-- */
-- cpumask_t *mask = this_cpu(scratch_cpumask);
--
-- BUG_ON(in_irq());
-- cpumask_copy(mask, d->dirty_cpumask);
--
-- /* Don't flush if the timestamp is old enough */
-- tlbflush_filter(mask, page->tlbflush_timestamp);
--
-- if ( unlikely(!cpumask_empty(mask)) &&
-- /* Shadow mode: track only writable pages. */
-- (!shadow_mode_enabled(d) ||
-- ((nx & PGT_type_mask) == PGT_writable_page)) )
-- {
-- perfc_incr(need_flush_tlb_flush);
-- /*
-- * If page was a page table make sure the flush is
-- * performed using an IPI in order to avoid changing the
-- * type of a page table page under the feet of
-- * spurious_page_fault().
-- */
-- flush_mask(mask,
-- (x & PGT_type_mask) &&
-- (x & PGT_type_mask) <= PGT_root_page_table
-- ? FLUSH_TLB | FLUSH_FORCE_IPI
-- : FLUSH_TLB);
-- }
--
-- /* We lose existing type and validity. */
- nx &= ~(PGT_type_mask | PGT_validated);
- nx |= type;
--
-- /*
-- * No special validation needed for writable pages.
-- * Page tables and GDT/LDT need to be scanned for validity.
-- */
-- if ( type == PGT_writable_page || type == PGT_shared_page )
-- nx |= PGT_validated;
- }
- }
- else if ( unlikely((x & (PGT_type_mask|PGT_pae_xen_l2)) != type) )
-@@ -3063,6 +3019,56 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- return -EINTR;
- }
-
-+ /*
-+ * One typeref has been taken and is now globally visible.
-+ *
-+ * The page is either in the "validate locked" state (PGT_[type] | 1) or
-+ * fully validated (PGT_[type] | PGT_validated | >0).
-+ */
-+
-+ if ( unlikely((x & PGT_count_mask) == 0) )
-+ {
-+ struct domain *d = page_get_owner(page);
-+
-+ if ( d && shadow_mode_enabled(d) )
-+ shadow_prepare_page_type_change(d, page, type);
-+
-+ if ( (x & PGT_type_mask) != type )
-+ {
-+ /*
-+ * On type change we check to flush stale TLB entries. It is
-+ * vital that no other CPUs are left with writeable mappings
-+ * to a frame which is intending to become pgtable/segdesc.
-+ */
-+ cpumask_t *mask = this_cpu(scratch_cpumask);
-+
-+ BUG_ON(in_irq());
-+ cpumask_copy(mask, d->dirty_cpumask);
-+
-+ /* Don't flush if the timestamp is old enough */
-+ tlbflush_filter(mask, page->tlbflush_timestamp);
-+
-+ if ( unlikely(!cpumask_empty(mask)) &&
-+ /* Shadow mode: track only writable pages. */
-+ (!shadow_mode_enabled(d) ||
-+ ((nx & PGT_type_mask) == PGT_writable_page)) )
-+ {
-+ perfc_incr(need_flush_tlb_flush);
-+ /*
-+ * If page was a page table make sure the flush is
-+ * performed using an IPI in order to avoid changing the
-+ * type of a page table page under the feet of
-+ * spurious_page_fault().
-+ */
-+ flush_mask(mask,
-+ (x & PGT_type_mask) &&
-+ (x & PGT_type_mask) <= PGT_root_page_table
-+ ? FLUSH_TLB | FLUSH_FORCE_IPI
-+ : FLUSH_TLB);
-+ }
-+ }
-+ }
-+
- if ( unlikely(((x & PGT_type_mask) == PGT_writable_page) !=
- (type == PGT_writable_page)) )
- {
-@@ -3091,13 +3097,25 @@ static int _get_page_type(struct page_info *page, unsigned long type,
-
- if ( unlikely(!(nx & PGT_validated)) )
- {
-- if ( !(x & PGT_partial) )
-+ /*
-+ * No special validation needed for writable or shared pages. Page
-+ * tables and GDT/LDT need to have their contents audited.
-+ *
-+ * per validate_page(), non-atomic updates are fine here.
-+ */
-+ if ( type == PGT_writable_page || type == PGT_shared_page )
-+ page->u.inuse.type_info |= PGT_validated;
-+ else
- {
-- page->nr_validated_ptes = 0;
-- page->partial_flags = 0;
-- page->linear_pt_count = 0;
-+ if ( !(x & PGT_partial) )
-+ {
-+ page->nr_validated_ptes = 0;
-+ page->partial_flags = 0;
-+ page->linear_pt_count = 0;
-+ }
-+
-+ rc = validate_page(page, type, preemptible);
- }
-- rc = validate_page(page, type, preemptible);
- }
-
- out:
---
-2.35.1
-
diff --git a/0024-x86-page-Introduce-_PAGE_-constants-for-memory-types.patch b/0024-x86-page-Introduce-_PAGE_-constants-for-memory-types.patch
deleted file mode 100644
index 409b72f..0000000
--- a/0024-x86-page-Introduce-_PAGE_-constants-for-memory-types.patch
+++ /dev/null
@@ -1,53 +0,0 @@
-From 9cfd796ae05421ded8e4f70b2c55352491cfa841 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 9 Jun 2022 15:27:53 +0200
-Subject: [PATCH 24/51] x86/page: Introduce _PAGE_* constants for memory types
-
-... rather than opencoding the PAT/PCD/PWT attributes in __PAGE_HYPERVISOR_*
-constants. These are going to be needed by forthcoming logic.
-
-No functional change.
-
-This is part of XSA-402.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 1be8707c75bf4ba68447c74e1618b521dd432499
-master date: 2022-06-09 14:21:38 +0200
----
- xen/include/asm-x86/page.h | 12 ++++++++++--
- 1 file changed, 10 insertions(+), 2 deletions(-)
-
-diff --git a/xen/include/asm-x86/page.h b/xen/include/asm-x86/page.h
-index 1d080cffbe84..2e542050f65a 100644
---- a/xen/include/asm-x86/page.h
-+++ b/xen/include/asm-x86/page.h
-@@ -331,6 +331,14 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
-
- #define PAGE_CACHE_ATTRS (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
-
-+/* Memory types, encoded under Xen's choice of MSR_PAT. */
-+#define _PAGE_WB ( 0)
-+#define _PAGE_WT ( _PAGE_PWT)
-+#define _PAGE_UCM ( _PAGE_PCD )
-+#define _PAGE_UC ( _PAGE_PCD | _PAGE_PWT)
-+#define _PAGE_WC (_PAGE_PAT )
-+#define _PAGE_WP (_PAGE_PAT | _PAGE_PWT)
-+
- /*
- * Debug option: Ensure that granted mappings are not implicitly unmapped.
- * WARNING: This will need to be disabled to run OSes that use the spare PTE
-@@ -349,8 +357,8 @@ void efi_update_l4_pgtable(unsigned int l4idx, l4_pgentry_t);
- #define __PAGE_HYPERVISOR_RX (_PAGE_PRESENT | _PAGE_ACCESSED)
- #define __PAGE_HYPERVISOR (__PAGE_HYPERVISOR_RX | \
- _PAGE_DIRTY | _PAGE_RW)
--#define __PAGE_HYPERVISOR_UCMINUS (__PAGE_HYPERVISOR | _PAGE_PCD)
--#define __PAGE_HYPERVISOR_UC (__PAGE_HYPERVISOR | _PAGE_PCD | _PAGE_PWT)
-+#define __PAGE_HYPERVISOR_UCMINUS (__PAGE_HYPERVISOR | _PAGE_UCM)
-+#define __PAGE_HYPERVISOR_UC (__PAGE_HYPERVISOR | _PAGE_UC)
- #define __PAGE_HYPERVISOR_SHSTK (__PAGE_HYPERVISOR_RO | _PAGE_DIRTY)
-
- #define MAP_SMALL_PAGES _PAGE_AVAIL0 /* don't use superpages mappings */
---
-2.35.1
-
diff --git a/0024-xen-gnttab-fix-gnttab_acquire_resource.patch b/0024-xen-gnttab-fix-gnttab_acquire_resource.patch
new file mode 100644
index 0000000..898503f
--- /dev/null
+++ b/0024-xen-gnttab-fix-gnttab_acquire_resource.patch
@@ -0,0 +1,69 @@
+From b9560762392c01b3ee84148c07be8017cb42dbc9 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 11 Oct 2022 15:01:22 +0200
+Subject: [PATCH 24/26] xen/gnttab: fix gnttab_acquire_resource()
+
+Commit 9dc46386d89d ("gnttab: work around "may be used uninitialized"
+warning") was wrong, as vaddrs can legitimately be NULL in case
+XENMEM_resource_grant_table_id_status was specified for a grant table
+v1. This would result in crashes in debug builds due to
+ASSERT_UNREACHABLE() triggering.
+
+Check for vaddrs being NULL only in the rc == 0 case.
+
+Expand the tests in tools/tests/resource to tickle this path, and verify that
+using XENMEM_resource_grant_table_id_status on a v1 grant table fails.
+
+Fixes: 9dc46386d89d ("gnttab: work around "may be used uninitialized" warning")
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com> # xen
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 52daa6a8483e4fbd6757c9d1b791e23931791608
+master date: 2022-09-09 16:28:38 +0100
+---
+ tools/tests/resource/test-resource.c | 15 +++++++++++++++
+ xen/common/grant_table.c | 2 +-
+ 2 files changed, 16 insertions(+), 1 deletion(-)
+
+diff --git a/tools/tests/resource/test-resource.c b/tools/tests/resource/test-resource.c
+index 0557f8a1b585..37dfff4dcd20 100644
+--- a/tools/tests/resource/test-resource.c
++++ b/tools/tests/resource/test-resource.c
+@@ -106,6 +106,21 @@ static void test_gnttab(uint32_t domid, unsigned int nr_frames,
+ if ( rc )
+ return fail(" Fail: Unmap grant table %d - %s\n",
+ errno, strerror(errno));
++
++ /*
++ * Verify that an attempt to map the status frames fails, as the domain is
++ * in gnttab v1 mode.
++ */
++ res = xenforeignmemory_map_resource(
++ fh, domid, XENMEM_resource_grant_table,
++ XENMEM_resource_grant_table_id_status, 0, 1,
++ (void **)&gnttab, PROT_READ | PROT_WRITE, 0);
++
++ if ( res )
++ {
++ fail(" Fail: Managed to map gnttab v2 status frames in v1 mode\n");
++ xenforeignmemory_unmap_resource(fh, res);
++ }
+ }
+
+ static void test_domain_configurations(void)
+diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
+index d8ca645b96ff..76272b3c8add 100644
+--- a/xen/common/grant_table.c
++++ b/xen/common/grant_table.c
+@@ -4142,7 +4142,7 @@ int gnttab_acquire_resource(
+ * on non-error paths, and hence it needs setting to NULL at the top of the
+ * function. Leave some runtime safety.
+ */
+- if ( !vaddrs )
++ if ( !rc && !vaddrs )
+ {
+ ASSERT_UNREACHABLE();
+ rc = -ENODATA;
+--
+2.37.3
+
diff --git a/0025-x86-Don-t-change-the-cacheability-of-the-directmap.patch b/0025-x86-Don-t-change-the-cacheability-of-the-directmap.patch
deleted file mode 100644
index 0a24a0a..0000000
--- a/0025-x86-Don-t-change-the-cacheability-of-the-directmap.patch
+++ /dev/null
@@ -1,223 +0,0 @@
-From 74193f4292d9cfc2874866e941d9939d8f33fcef Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 9 Jun 2022 15:28:23 +0200
-Subject: [PATCH 25/51] x86: Don't change the cacheability of the directmap
-
-Changeset 55f97f49b7ce ("x86: Change cache attributes of Xen 1:1 page mappings
-in response to guest mapping requests") attempted to keep the cacheability
-consistent between different mappings of the same page.
-
-The reason wasn't described in the changelog, but it is understood to be in
-regards to a concern over machine check exceptions, owing to errata when using
-mixed cacheabilities. It did this primarily by updating Xen's mapping of the
-page in the direct map when the guest mapped a page with reduced cacheability.
-
-Unfortunately, the logic didn't actually prevent mixed cacheability from
-occurring:
- * A guest could map a page normally, and then map the same page with
- different cacheability; nothing prevented this.
- * The cacheability of the directmap was always latest-takes-precedence in
- terms of guest requests.
- * Grant-mapped frames with lesser cacheability didn't adjust the page's
- cacheattr settings.
- * The map_domain_page() function still unconditionally created WB mappings,
- irrespective of the page's cacheattr settings.
-
-Additionally, update_xen_mappings() had a bug where the alias calculation was
-wrong for mfn's which were .init content, which should have been treated as
-fully guest pages, not Xen pages.
-
-Worse yet, the logic introduced a vulnerability whereby necessary
-pagetable/segdesc adjustments made by Xen in the validation logic could become
-non-coherent between the cache and main memory. The CPU could subsequently
-operate on the stale value in the cache, rather than the safe value in main
-memory.
-
-The directmap contains primarily mappings of RAM. PAT/MTRR conflict
-resolution is asymmetric, and generally for MTRR=WB ranges, PAT of lesser
-cacheability resolves to being coherent. The special case is WC mappings,
-which are non-coherent against MTRR=WB regions (except for fully-coherent
-CPUs).
-
-Xen must not have any WC cacheability in the directmap, to prevent Xen's
-actions from creating non-coherency. (Guest actions creating non-coherency is
-dealt with in subsequent patches.) As all memory types for MTRR=WB ranges
-inter-operate coherently, so leave Xen's directmap mappings as WB.
-
-Only PV guests with access to devices can use reduced-cacheability mappings to
-begin with, and they're trusted not to mount DoSs against the system anyway.
-
-Drop PGC_cacheattr_{base,mask} entirely, and the logic to manipulate them.
-Shift the later PGC_* constants up, to gain 3 extra bits in the main reference
-count. Retain the check in get_page_from_l1e() for special_pages() because a
-guest has no business using reduced cacheability on these.
-
-This reverts changeset 55f97f49b7ce6c3520c555d19caac6cf3f9a5df0
-
-This is CVE-2022-26363, part of XSA-402.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: George Dunlap <george.dunlap@citrix.com>
-master commit: ae09597da34aee6bc5b76475c5eea6994457e854
-master date: 2022-06-09 14:22:08 +0200
----
- xen/arch/x86/mm.c | 84 ++++------------------------------------
- xen/include/asm-x86/mm.h | 23 +++++------
- 2 files changed, 17 insertions(+), 90 deletions(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index c6429b0f749a..ab32d13a1a0d 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -783,28 +783,6 @@ bool is_iomem_page(mfn_t mfn)
- return (page_get_owner(page) == dom_io);
- }
-
--static int update_xen_mappings(unsigned long mfn, unsigned int cacheattr)
--{
-- int err = 0;
-- bool alias = mfn >= PFN_DOWN(xen_phys_start) &&
-- mfn < PFN_UP(xen_phys_start + xen_virt_end - XEN_VIRT_START);
-- unsigned long xen_va =
-- XEN_VIRT_START + ((mfn - PFN_DOWN(xen_phys_start)) << PAGE_SHIFT);
--
-- if ( boot_cpu_has(X86_FEATURE_XEN_SELFSNOOP) )
-- return 0;
--
-- if ( unlikely(alias) && cacheattr )
-- err = map_pages_to_xen(xen_va, _mfn(mfn), 1, 0);
-- if ( !err )
-- err = map_pages_to_xen((unsigned long)mfn_to_virt(mfn), _mfn(mfn), 1,
-- PAGE_HYPERVISOR | cacheattr_to_pte_flags(cacheattr));
-- if ( unlikely(alias) && !cacheattr && !err )
-- err = map_pages_to_xen(xen_va, _mfn(mfn), 1, PAGE_HYPERVISOR);
--
-- return err;
--}
--
- #ifndef NDEBUG
- struct mmio_emul_range_ctxt {
- const struct domain *d;
-@@ -1009,47 +987,14 @@ get_page_from_l1e(
- goto could_not_pin;
- }
-
-- if ( pte_flags_to_cacheattr(l1f) !=
-- ((page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base) )
-+ if ( (l1f & PAGE_CACHE_ATTRS) != _PAGE_WB && is_special_page(page) )
- {
-- unsigned long x, nx, y = page->count_info;
-- unsigned long cacheattr = pte_flags_to_cacheattr(l1f);
-- int err;
--
-- if ( is_special_page(page) )
-- {
-- if ( write )
-- put_page_type(page);
-- put_page(page);
-- gdprintk(XENLOG_WARNING,
-- "Attempt to change cache attributes of Xen heap page\n");
-- return -EACCES;
-- }
--
-- do {
-- x = y;
-- nx = (x & ~PGC_cacheattr_mask) | (cacheattr << PGC_cacheattr_base);
-- } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
--
-- err = update_xen_mappings(mfn, cacheattr);
-- if ( unlikely(err) )
-- {
-- cacheattr = y & PGC_cacheattr_mask;
-- do {
-- x = y;
-- nx = (x & ~PGC_cacheattr_mask) | cacheattr;
-- } while ( (y = cmpxchg(&page->count_info, x, nx)) != x );
--
-- if ( write )
-- put_page_type(page);
-- put_page(page);
--
-- gdprintk(XENLOG_WARNING, "Error updating mappings for mfn %" PRI_mfn
-- " (pfn %" PRI_pfn ", from L1 entry %" PRIpte ") for d%d\n",
-- mfn, get_gpfn_from_mfn(mfn),
-- l1e_get_intpte(l1e), l1e_owner->domain_id);
-- return err;
-- }
-+ if ( write )
-+ put_page_type(page);
-+ put_page(page);
-+ gdprintk(XENLOG_WARNING,
-+ "Attempt to change cache attributes of Xen heap page\n");
-+ return -EACCES;
- }
-
- return 0;
-@@ -2467,24 +2412,9 @@ static int mod_l4_entry(l4_pgentry_t *pl4e,
- */
- static int cleanup_page_mappings(struct page_info *page)
- {
-- unsigned int cacheattr =
-- (page->count_info & PGC_cacheattr_mask) >> PGC_cacheattr_base;
- int rc = 0;
- unsigned long mfn = mfn_x(page_to_mfn(page));
-
-- /*
-- * If we've modified xen mappings as a result of guest cache
-- * attributes, restore them to the "normal" state.
-- */
-- if ( unlikely(cacheattr) )
-- {
-- page->count_info &= ~PGC_cacheattr_mask;
--
-- BUG_ON(is_special_page(page));
--
-- rc = update_xen_mappings(mfn, 0);
-- }
--
- /*
- * If this may be in a PV domain's IOMMU, remove it.
- *
-diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
-index cb9052749963..8a9a43bb0a9d 100644
---- a/xen/include/asm-x86/mm.h
-+++ b/xen/include/asm-x86/mm.h
-@@ -69,25 +69,22 @@
- /* Set when is using a page as a page table */
- #define _PGC_page_table PG_shift(3)
- #define PGC_page_table PG_mask(1, 3)
-- /* 3-bit PAT/PCD/PWT cache-attribute hint. */
--#define PGC_cacheattr_base PG_shift(6)
--#define PGC_cacheattr_mask PG_mask(7, 6)
- /* Page is broken? */
--#define _PGC_broken PG_shift(7)
--#define PGC_broken PG_mask(1, 7)
-+#define _PGC_broken PG_shift(4)
-+#define PGC_broken PG_mask(1, 4)
- /* Mutually-exclusive page states: { inuse, offlining, offlined, free }. */
--#define PGC_state PG_mask(3, 9)
--#define PGC_state_inuse PG_mask(0, 9)
--#define PGC_state_offlining PG_mask(1, 9)
--#define PGC_state_offlined PG_mask(2, 9)
--#define PGC_state_free PG_mask(3, 9)
-+#define PGC_state PG_mask(3, 6)
-+#define PGC_state_inuse PG_mask(0, 6)
-+#define PGC_state_offlining PG_mask(1, 6)
-+#define PGC_state_offlined PG_mask(2, 6)
-+#define PGC_state_free PG_mask(3, 6)
- #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
- /* Page is not reference counted (see below for caveats) */
--#define _PGC_extra PG_shift(10)
--#define PGC_extra PG_mask(1, 10)
-+#define _PGC_extra PG_shift(7)
-+#define PGC_extra PG_mask(1, 7)
-
- /* Count of references to this frame. */
--#define PGC_count_width PG_shift(10)
-+#define PGC_count_width PG_shift(7)
- #define PGC_count_mask ((1UL<<PGC_count_width)-1)
-
- /*
---
-2.35.1
-
diff --git a/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch b/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch
new file mode 100644
index 0000000..849ef60
--- /dev/null
+++ b/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch
@@ -0,0 +1,59 @@
+From 3f4da85ca8816f6617529c80850eaddd80ea0f1f Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 11 Oct 2022 15:01:36 +0200
+Subject: [PATCH 25/26] x86: wire up VCPUOP_register_vcpu_time_memory_area for
+ 32-bit guests
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Ever since its introduction, VCPUOP_register_vcpu_time_memory_area has
+been available only to native domains. Linux, for example, would attempt
+to use it irrespective of guest bitness (including in its so called
+PVHVM mode) as long as it finds XEN_PVCLOCK_TSC_STABLE_BIT set (which we
+set only for clocksource=tsc, which in turn needs engaging via command
+line option).
+
+Fixes: a5d39947cb89 ("Allow guests to register secondary vcpu_time_info")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: b726541d94bd0a80b5864d17a2cd2e6d73a3fe0a
+master date: 2022-09-29 14:47:45 +0200
+---
+ xen/arch/x86/x86_64/domain.c | 20 ++++++++++++++++++++
+ 1 file changed, 20 insertions(+)
+
+diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c
+index c46dccc25a54..d51d99344796 100644
+--- a/xen/arch/x86/x86_64/domain.c
++++ b/xen/arch/x86/x86_64/domain.c
+@@ -54,6 +54,26 @@ arch_compat_vcpu_op(
+ break;
+ }
+
++ case VCPUOP_register_vcpu_time_memory_area:
++ {
++ struct compat_vcpu_register_time_memory_area area = { .addr.p = 0 };
++
++ rc = -EFAULT;
++ if ( copy_from_guest(&area.addr.h, arg, 1) )
++ break;
++
++ if ( area.addr.h.c != area.addr.p ||
++ !compat_handle_okay(area.addr.h, 1) )
++ break;
++
++ rc = 0;
++ guest_from_compat_handle(v->arch.time_info_guest, area.addr.h);
++
++ force_update_vcpu_system_time(v);
++
++ break;
++ }
++
+ case VCPUOP_get_physid:
+ rc = arch_do_vcpu_op(cmd, v, arg);
+ break;
+--
+2.37.3
+
diff --git a/0026-x86-Split-cache_flush-out-of-cache_writeback.patch b/0026-x86-Split-cache_flush-out-of-cache_writeback.patch
deleted file mode 100644
index 50f70f4..0000000
--- a/0026-x86-Split-cache_flush-out-of-cache_writeback.patch
+++ /dev/null
@@ -1,294 +0,0 @@
-From 8eafa2d871ae51d461256e4a14175e24df330c70 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 9 Jun 2022 15:28:48 +0200
-Subject: [PATCH 26/51] x86: Split cache_flush() out of cache_writeback()
-
-Subsequent changes will want a fully flushing version.
-
-Use the new helper rather than opencoding it in flush_area_local(). This
-resolves an outstanding issue where the conditional sfence is on the wrong
-side of the clflushopt loop. clflushopt is ordered with respect to older
-stores, not to younger stores.
-
-Rename gnttab_cache_flush()'s helper to avoid colliding in name.
-grant_table.c can see the prototype from cache.h so the build fails
-otherwise.
-
-This is part of XSA-402.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 9a67ffee3371506e1cbfdfff5b90658d4828f6a2
-master date: 2022-06-09 14:22:38 +0200
----
- xen/arch/x86/flushtlb.c | 84 ++++++++++++++++++++++++---
- xen/common/grant_table.c | 4 +-
- xen/drivers/passthrough/vtd/extern.h | 1 -
- xen/drivers/passthrough/vtd/iommu.c | 53 +----------------
- xen/drivers/passthrough/vtd/x86/vtd.c | 5 --
- xen/include/asm-x86/cache.h | 7 +++
- 6 files changed, 88 insertions(+), 66 deletions(-)
-
-diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
-index 25798df50f54..0c912b8669f8 100644
---- a/xen/arch/x86/flushtlb.c
-+++ b/xen/arch/x86/flushtlb.c
-@@ -234,7 +234,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
- if ( flags & FLUSH_CACHE )
- {
- const struct cpuinfo_x86 *c = ¤t_cpu_data;
-- unsigned long i, sz = 0;
-+ unsigned long sz = 0;
-
- if ( order < (BITS_PER_LONG - PAGE_SHIFT) )
- sz = 1UL << (order + PAGE_SHIFT);
-@@ -244,13 +244,7 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
- c->x86_clflush_size && c->x86_cache_size && sz &&
- ((sz >> 10) < c->x86_cache_size) )
- {
-- alternative("", "sfence", X86_FEATURE_CLFLUSHOPT);
-- for ( i = 0; i < sz; i += c->x86_clflush_size )
-- alternative_input(".byte " __stringify(NOP_DS_PREFIX) ";"
-- " clflush %0",
-- "data16 clflush %0", /* clflushopt */
-- X86_FEATURE_CLFLUSHOPT,
-- "m" (((const char *)va)[i]));
-+ cache_flush(va, sz);
- flags &= ~FLUSH_CACHE;
- }
- else
-@@ -265,6 +259,80 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
- return flags;
- }
-
-+void cache_flush(const void *addr, unsigned int size)
-+{
-+ /*
-+ * This function may be called before current_cpu_data is established.
-+ * Hence a fallback is needed to prevent the loop below becoming infinite.
-+ */
-+ unsigned int clflush_size = current_cpu_data.x86_clflush_size ?: 16;
-+ const void *end = addr + size;
-+
-+ addr -= (unsigned long)addr & (clflush_size - 1);
-+ for ( ; addr < end; addr += clflush_size )
-+ {
-+ /*
-+ * Note regarding the "ds" prefix use: it's faster to do a clflush
-+ * + prefix than a clflush + nop, and hence the prefix is added instead
-+ * of letting the alternative framework fill the gap by appending nops.
-+ */
-+ alternative_io("ds; clflush %[p]",
-+ "data16 clflush %[p]", /* clflushopt */
-+ X86_FEATURE_CLFLUSHOPT,
-+ /* no outputs */,
-+ [p] "m" (*(const char *)(addr)));
-+ }
-+
-+ alternative("", "sfence", X86_FEATURE_CLFLUSHOPT);
-+}
-+
-+void cache_writeback(const void *addr, unsigned int size)
-+{
-+ unsigned int clflush_size;
-+ const void *end = addr + size;
-+
-+ /* Fall back to CLFLUSH{,OPT} when CLWB isn't available. */
-+ if ( !boot_cpu_has(X86_FEATURE_CLWB) )
-+ return cache_flush(addr, size);
-+
-+ /*
-+ * This function may be called before current_cpu_data is established.
-+ * Hence a fallback is needed to prevent the loop below becoming infinite.
-+ */
-+ clflush_size = current_cpu_data.x86_clflush_size ?: 16;
-+ addr -= (unsigned long)addr & (clflush_size - 1);
-+ for ( ; addr < end; addr += clflush_size )
-+ {
-+/*
-+ * The arguments to a macro must not include preprocessor directives. Doing so
-+ * results in undefined behavior, so we have to create some defines here in
-+ * order to avoid it.
-+ */
-+#if defined(HAVE_AS_CLWB)
-+# define CLWB_ENCODING "clwb %[p]"
-+#elif defined(HAVE_AS_XSAVEOPT)
-+# define CLWB_ENCODING "data16 xsaveopt %[p]" /* clwb */
-+#else
-+# define CLWB_ENCODING ".byte 0x66, 0x0f, 0xae, 0x30" /* clwb (%%rax) */
-+#endif
-+
-+#define BASE_INPUT(addr) [p] "m" (*(const char *)(addr))
-+#if defined(HAVE_AS_CLWB) || defined(HAVE_AS_XSAVEOPT)
-+# define INPUT BASE_INPUT
-+#else
-+# define INPUT(addr) "a" (addr), BASE_INPUT(addr)
-+#endif
-+
-+ asm volatile (CLWB_ENCODING :: INPUT(addr));
-+
-+#undef INPUT
-+#undef BASE_INPUT
-+#undef CLWB_ENCODING
-+ }
-+
-+ asm volatile ("sfence" ::: "memory");
-+}
-+
- unsigned int guest_flush_tlb_flags(const struct domain *d)
- {
- bool shadow = paging_mode_shadow(d);
-diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
-index 66f8ce71741c..4c742cd8fe81 100644
---- a/xen/common/grant_table.c
-+++ b/xen/common/grant_table.c
-@@ -3431,7 +3431,7 @@ gnttab_swap_grant_ref(XEN_GUEST_HANDLE_PARAM(gnttab_swap_grant_ref_t) uop,
- return 0;
- }
-
--static int cache_flush(const gnttab_cache_flush_t *cflush, grant_ref_t *cur_ref)
-+static int _cache_flush(const gnttab_cache_flush_t *cflush, grant_ref_t *cur_ref)
- {
- struct domain *d, *owner;
- struct page_info *page;
-@@ -3525,7 +3525,7 @@ gnttab_cache_flush(XEN_GUEST_HANDLE_PARAM(gnttab_cache_flush_t) uop,
- return -EFAULT;
- for ( ; ; )
- {
-- int ret = cache_flush(&op, cur_ref);
-+ int ret = _cache_flush(&op, cur_ref);
-
- if ( ret < 0 )
- return ret;
-diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
-index 01e010a10d61..401079299725 100644
---- a/xen/drivers/passthrough/vtd/extern.h
-+++ b/xen/drivers/passthrough/vtd/extern.h
-@@ -76,7 +76,6 @@ int __must_check qinval_device_iotlb_sync(struct vtd_iommu *iommu,
- struct pci_dev *pdev,
- u16 did, u16 size, u64 addr);
-
--unsigned int get_cache_line_size(void);
- void flush_all_cache(void);
-
- uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node);
-diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
-index 8975c1de61bc..bc377c9bcfa4 100644
---- a/xen/drivers/passthrough/vtd/iommu.c
-+++ b/xen/drivers/passthrough/vtd/iommu.c
-@@ -31,6 +31,7 @@
- #include <xen/pci.h>
- #include <xen/pci_regs.h>
- #include <xen/keyhandler.h>
-+#include <asm/cache.h>
- #include <asm/msi.h>
- #include <asm/nops.h>
- #include <asm/irq.h>
-@@ -206,54 +207,6 @@ static void check_cleanup_domid_map(const struct domain *d,
- }
- }
-
--static void sync_cache(const void *addr, unsigned int size)
--{
-- static unsigned long clflush_size = 0;
-- const void *end = addr + size;
--
-- if ( clflush_size == 0 )
-- clflush_size = get_cache_line_size();
--
-- addr -= (unsigned long)addr & (clflush_size - 1);
-- for ( ; addr < end; addr += clflush_size )
--/*
-- * The arguments to a macro must not include preprocessor directives. Doing so
-- * results in undefined behavior, so we have to create some defines here in
-- * order to avoid it.
-- */
--#if defined(HAVE_AS_CLWB)
--# define CLWB_ENCODING "clwb %[p]"
--#elif defined(HAVE_AS_XSAVEOPT)
--# define CLWB_ENCODING "data16 xsaveopt %[p]" /* clwb */
--#else
--# define CLWB_ENCODING ".byte 0x66, 0x0f, 0xae, 0x30" /* clwb (%%rax) */
--#endif
--
--#define BASE_INPUT(addr) [p] "m" (*(const char *)(addr))
--#if defined(HAVE_AS_CLWB) || defined(HAVE_AS_XSAVEOPT)
--# define INPUT BASE_INPUT
--#else
--# define INPUT(addr) "a" (addr), BASE_INPUT(addr)
--#endif
-- /*
-- * Note regarding the use of NOP_DS_PREFIX: it's faster to do a clflush
-- * + prefix than a clflush + nop, and hence the prefix is added instead
-- * of letting the alternative framework fill the gap by appending nops.
-- */
-- alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %[p]",
-- "data16 clflush %[p]", /* clflushopt */
-- X86_FEATURE_CLFLUSHOPT,
-- CLWB_ENCODING,
-- X86_FEATURE_CLWB, /* no outputs */,
-- INPUT(addr));
--#undef INPUT
--#undef BASE_INPUT
--#undef CLWB_ENCODING
--
-- alternative_2("", "sfence", X86_FEATURE_CLFLUSHOPT,
-- "sfence", X86_FEATURE_CLWB);
--}
--
- /* Allocate page table, return its machine address */
- uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node)
- {
-@@ -273,7 +226,7 @@ uint64_t alloc_pgtable_maddr(unsigned long npages, nodeid_t node)
- clear_page(vaddr);
-
- if ( (iommu_ops.init ? &iommu_ops : &vtd_ops)->sync_cache )
-- sync_cache(vaddr, PAGE_SIZE);
-+ cache_writeback(vaddr, PAGE_SIZE);
- unmap_domain_page(vaddr);
- cur_pg++;
- }
-@@ -1305,7 +1258,7 @@ int __init iommu_alloc(struct acpi_drhd_unit *drhd)
- iommu->nr_pt_levels = agaw_to_level(agaw);
-
- if ( !ecap_coherent(iommu->ecap) )
-- vtd_ops.sync_cache = sync_cache;
-+ vtd_ops.sync_cache = cache_writeback;
-
- /* allocate domain id bitmap */
- iommu->domid_bitmap = xzalloc_array(unsigned long, BITS_TO_LONGS(nr_dom));
-diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
-index 6681dccd6970..55f0faa521cb 100644
---- a/xen/drivers/passthrough/vtd/x86/vtd.c
-+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
-@@ -47,11 +47,6 @@ void unmap_vtd_domain_page(const void *va)
- unmap_domain_page(va);
- }
-
--unsigned int get_cache_line_size(void)
--{
-- return ((cpuid_ebx(1) >> 8) & 0xff) * 8;
--}
--
- void flush_all_cache()
- {
- wbinvd();
-diff --git a/xen/include/asm-x86/cache.h b/xen/include/asm-x86/cache.h
-index 1f7173d8c72c..e4770efb22b9 100644
---- a/xen/include/asm-x86/cache.h
-+++ b/xen/include/asm-x86/cache.h
-@@ -11,4 +11,11 @@
-
- #define __read_mostly __section(".data.read_mostly")
-
-+#ifndef __ASSEMBLY__
-+
-+void cache_flush(const void *addr, unsigned int size);
-+void cache_writeback(const void *addr, unsigned int size);
-+
-+#endif
-+
- #endif
---
-2.35.1
-
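[Editor's note, not part of the patch queue] The cache_flush() helper described in the patch above centres on one ordering detail: CLFLUSHOPT is ordered against older stores to the flushed line, not younger ones, so the fence belongs after the flush loop rather than before it. A minimal userspace sketch of that pattern, using Intel intrinsics and an assumed 64-byte line size purely for illustration (needs -mclflushopt):

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Sketch only: flush [addr, addr + size) one cache line at a time. */
    static void flush_range(const void *addr, size_t size)
    {
        const size_t line = 64;                        /* assumed line size */
        uintptr_t p   = (uintptr_t)addr & ~(line - 1); /* align down */
        uintptr_t end = (uintptr_t)addr + size;

        for ( ; p < end; p += line )
            _mm_clflushopt((void *)p);

        _mm_sfence();   /* fence after the loop, not before it */
    }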
diff --git a/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch b/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch
new file mode 100644
index 0000000..0f33747
--- /dev/null
+++ b/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch
@@ -0,0 +1,97 @@
+From 1bce7fb1f702da4f7a749c6f1457ecb20bf74fca Mon Sep 17 00:00:00 2001
+From: Tamas K Lengyel <tamas.lengyel@intel.com>
+Date: Tue, 11 Oct 2022 15:01:48 +0200
+Subject: [PATCH 26/26] x86/vpmu: Fix race-condition in vpmu_load
+
+The vPMU code base attempts to perform an optimization on saving/reloading the
+PMU context by keeping track of which vCPU ran on each pCPU. When a pCPU is
+being scheduled, it checks whether the previous vCPU is the current one; if
+not, it attempts a call to vpmu_save_force. Unfortunately, if the previous vCPU
+is already being scheduled to run on another pCPU, its state will already be
+runnable, which results in an ASSERT failure.
+
+Fix this by always performing a PMU context save in vpmu_save when called from
+vpmu_switch_from, and a vpmu_load when called from vpmu_switch_to.
+
+While this presents a minimal overhead in case the same vCPU is getting
+rescheduled on the same pCPU, the ASSERT failure is avoided and the code is a
+lot easier to reason about.
+
+Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: defa4e51d20a143bdd4395a075bf0933bb38a9a4
+master date: 2022-09-30 09:53:49 +0200
+---
+ xen/arch/x86/cpu/vpmu.c | 42 ++++-------------------------------------
+ 1 file changed, 4 insertions(+), 38 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
+index 16e91a3694fe..b6c2ec3cd047 100644
+--- a/xen/arch/x86/cpu/vpmu.c
++++ b/xen/arch/x86/cpu/vpmu.c
+@@ -368,58 +368,24 @@ void vpmu_save(struct vcpu *v)
+ vpmu->last_pcpu = pcpu;
+ per_cpu(last_vcpu, pcpu) = v;
+
++ vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
++
+ if ( vpmu->arch_vpmu_ops )
+ if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v, 0) )
+ vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+
++ vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
++
+ apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
+ }
+
+ int vpmu_load(struct vcpu *v, bool_t from_guest)
+ {
+ struct vpmu_struct *vpmu = vcpu_vpmu(v);
+- int pcpu = smp_processor_id();
+- struct vcpu *prev = NULL;
+
+ if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
+ return 0;
+
+- /* First time this VCPU is running here */
+- if ( vpmu->last_pcpu != pcpu )
+- {
+- /*
+- * Get the context from last pcpu that we ran on. Note that if another
+- * VCPU is running there it must have saved this VPCU's context before
+- * startig to run (see below).
+- * There should be no race since remote pcpu will disable interrupts
+- * before saving the context.
+- */
+- if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
+- {
+- on_selected_cpus(cpumask_of(vpmu->last_pcpu),
+- vpmu_save_force, (void *)v, 1);
+- vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+- }
+- }
+-
+- /* Prevent forced context save from remote CPU */
+- local_irq_disable();
+-
+- prev = per_cpu(last_vcpu, pcpu);
+-
+- if ( prev != v && prev )
+- {
+- vpmu = vcpu_vpmu(prev);
+-
+- /* Someone ran here before us */
+- vpmu_save_force(prev);
+- vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
+-
+- vpmu = vcpu_vpmu(v);
+- }
+-
+- local_irq_enable();
+-
+ /* Only when PMU is counting, we load PMU context immediately. */
+ if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
+ (!has_vlapic(vpmu_vcpu(vpmu)->domain) &&
+--
+2.37.3
+
diff --git a/0027-x86-amd-Work-around-CLFLUSH-ordering-on-older-parts.patch b/0027-x86-amd-Work-around-CLFLUSH-ordering-on-older-parts.patch
deleted file mode 100644
index 060bc99..0000000
--- a/0027-x86-amd-Work-around-CLFLUSH-ordering-on-older-parts.patch
+++ /dev/null
@@ -1,95 +0,0 @@
-From c4815be949aae6583a9a22897beb96b095b4f1a2 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 9 Jun 2022 15:29:13 +0200
-Subject: [PATCH 27/51] x86/amd: Work around CLFLUSH ordering on older parts
-
-On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakely ordered with everything,
-including reads and writes to the address, and LFENCE/SFENCE instructions.
-
-This creates a multitude of problematic corner cases, laid out in the manual.
-Arrange to use MFENCE on both sides of the CLFLUSH to force proper ordering.
-
-This is part of XSA-402.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 062868a5a8b428b85db589fa9a6d6e43969ffeb9
-master date: 2022-06-09 14:23:07 +0200
----
- xen/arch/x86/cpu/amd.c | 8 ++++++++
- xen/arch/x86/flushtlb.c | 13 ++++++++++++-
- xen/include/asm-x86/cpufeatures.h | 1 +
- 3 files changed, 21 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
-index a8e37dbb1f5c..b3b9a0df5fed 100644
---- a/xen/arch/x86/cpu/amd.c
-+++ b/xen/arch/x86/cpu/amd.c
-@@ -812,6 +812,14 @@ static void init_amd(struct cpuinfo_x86 *c)
- if (!cpu_has_lfence_dispatch)
- __set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
-
-+ /*
-+ * On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakly ordered with
-+ * everything, including reads and writes to address, and
-+ * LFENCE/SFENCE instructions.
-+ */
-+ if (!cpu_has_clflushopt)
-+ setup_force_cpu_cap(X86_BUG_CLFLUSH_MFENCE);
-+
- switch(c->x86)
- {
- case 0xf ... 0x11:
-diff --git a/xen/arch/x86/flushtlb.c b/xen/arch/x86/flushtlb.c
-index 0c912b8669f8..dcbb4064012e 100644
---- a/xen/arch/x86/flushtlb.c
-+++ b/xen/arch/x86/flushtlb.c
-@@ -259,6 +259,13 @@ unsigned int flush_area_local(const void *va, unsigned int flags)
- return flags;
- }
-
-+/*
-+ * On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakly ordered with everything,
-+ * including reads and writes to address, and LFENCE/SFENCE instructions.
-+ *
-+ * This function only works safely after alternatives have run. Luckily, at
-+ * the time of writing, we don't flush the caches that early.
-+ */
- void cache_flush(const void *addr, unsigned int size)
- {
- /*
-@@ -268,6 +275,8 @@ void cache_flush(const void *addr, unsigned int size)
- unsigned int clflush_size = current_cpu_data.x86_clflush_size ?: 16;
- const void *end = addr + size;
-
-+ alternative("", "mfence", X86_BUG_CLFLUSH_MFENCE);
-+
- addr -= (unsigned long)addr & (clflush_size - 1);
- for ( ; addr < end; addr += clflush_size )
- {
-@@ -283,7 +292,9 @@ void cache_flush(const void *addr, unsigned int size)
- [p] "m" (*(const char *)(addr)));
- }
-
-- alternative("", "sfence", X86_FEATURE_CLFLUSHOPT);
-+ alternative_2("",
-+ "sfence", X86_FEATURE_CLFLUSHOPT,
-+ "mfence", X86_BUG_CLFLUSH_MFENCE);
- }
-
- void cache_writeback(const void *addr, unsigned int size)
-diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
-index 7413febd7ad8..ff3157d52d13 100644
---- a/xen/include/asm-x86/cpufeatures.h
-+++ b/xen/include/asm-x86/cpufeatures.h
-@@ -47,6 +47,7 @@ XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch
-
- #define X86_BUG_FPU_PTRS X86_BUG( 0) /* (F)X{SAVE,RSTOR} doesn't save/restore FOP/FIP/FDP. */
- #define X86_BUG_NULL_SEG X86_BUG( 1) /* NULL-ing a selector preserves the base and limit. */
-+#define X86_BUG_CLFLUSH_MFENCE X86_BUG( 2) /* MFENCE needed to serialise CLFLUSH */
-
- /* Total number of capability words, inc synth and bug words. */
- #define NCAPINTS (FSCAPINTS + X86_NR_SYNTH + X86_NR_BUG) /* N 32-bit words worth of info */
---
-2.35.1
-
diff --git a/0028-x86-pv-Track-and-flush-non-coherent-mappings-of-RAM.patch b/0028-x86-pv-Track-and-flush-non-coherent-mappings-of-RAM.patch
deleted file mode 100644
index af60348..0000000
--- a/0028-x86-pv-Track-and-flush-non-coherent-mappings-of-RAM.patch
+++ /dev/null
@@ -1,160 +0,0 @@
-From dc020d8d1ba420e2dd0e7a40f5045db897f3c4f4 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 9 Jun 2022 15:29:38 +0200
-Subject: [PATCH 28/51] x86/pv: Track and flush non-coherent mappings of RAM
-
-There are legitimate uses of WC mappings of RAM, e.g. for DMA buffers with
-devices that make non-coherent writes. The Linux sound subsystem makes
-extensive use of this technique.
-
-For such usecases, the guest's DMA buffer is mapped and consistently used as
-WC, and Xen doesn't interact with the buffer.
-
-However, a mischevious guest can use WC mappings to deliberately create
-non-coherency between the cache and RAM, and use this to trick Xen into
-validating a pagetable which isn't actually safe.
-
-Allocate a new PGT_non_coherent to track the non-coherency of mappings. Set
-it whenever a non-coherent writeable mapping is created. If the page is used
-as anything other than PGT_writable_page, force a cache flush before
-validation. Also force a cache flush before the page is returned to the heap.
-
-This is CVE-2022-26364, part of XSA-402.
-
-Reported-by: Jann Horn <jannh@google.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: George Dunlap <george.dunlap@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: c1c9cae3a9633054b177c5de21ad7268162b2f2c
-master date: 2022-06-09 14:23:37 +0200
----
- xen/arch/x86/mm.c | 38 +++++++++++++++++++++++++++++++++++
- xen/arch/x86/pv/grant_table.c | 21 +++++++++++++++++++
- xen/include/asm-x86/mm.h | 6 +++++-
- 3 files changed, 64 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index ab32d13a1a0d..bab9624fabb7 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -997,6 +997,15 @@ get_page_from_l1e(
- return -EACCES;
- }
-
-+ /*
-+ * Track writeable non-coherent mappings to RAM pages, to trigger a cache
-+ * flush later if the target is used as anything but a PGT_writeable page.
-+ * We care about all writeable mappings, including foreign mappings.
-+ */
-+ if ( !boot_cpu_has(X86_FEATURE_XEN_SELFSNOOP) &&
-+ (l1f & (PAGE_CACHE_ATTRS | _PAGE_RW)) == (_PAGE_WC | _PAGE_RW) )
-+ set_bit(_PGT_non_coherent, &page->u.inuse.type_info);
-+
- return 0;
-
- could_not_pin:
-@@ -2454,6 +2463,19 @@ static int cleanup_page_mappings(struct page_info *page)
- }
- }
-
-+ /*
-+ * Flush the cache if there were previously non-coherent writeable
-+ * mappings of this page. This forces the page to be coherent before it
-+ * is freed back to the heap.
-+ */
-+ if ( __test_and_clear_bit(_PGT_non_coherent, &page->u.inuse.type_info) )
-+ {
-+ void *addr = __map_domain_page(page);
-+
-+ cache_flush(addr, PAGE_SIZE);
-+ unmap_domain_page(addr);
-+ }
-+
- return rc;
- }
-
-@@ -3027,6 +3049,22 @@ static int _get_page_type(struct page_info *page, unsigned long type,
-
- if ( unlikely(!(nx & PGT_validated)) )
- {
-+ /*
-+ * Flush the cache if there were previously non-coherent mappings of
-+ * this page, and we're trying to use it as anything other than a
-+ * writeable page. This forces the page to be coherent before we
-+ * validate its contents for safety.
-+ */
-+ if ( (nx & PGT_non_coherent) && type != PGT_writable_page )
-+ {
-+ void *addr = __map_domain_page(page);
-+
-+ cache_flush(addr, PAGE_SIZE);
-+ unmap_domain_page(addr);
-+
-+ page->u.inuse.type_info &= ~PGT_non_coherent;
-+ }
-+
- /*
- * No special validation needed for writable or shared pages. Page
- * tables and GDT/LDT need to have their contents audited.
-diff --git a/xen/arch/x86/pv/grant_table.c b/xen/arch/x86/pv/grant_table.c
-index 0325618c9883..81c72e61ed55 100644
---- a/xen/arch/x86/pv/grant_table.c
-+++ b/xen/arch/x86/pv/grant_table.c
-@@ -109,7 +109,17 @@ int create_grant_pv_mapping(uint64_t addr, mfn_t frame,
-
- ol1e = *pl1e;
- if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr, 0) )
-+ {
-+ /*
-+ * We always create mappings in this path. However, our caller,
-+ * map_grant_ref(), only passes potentially non-zero cache_flags for
-+ * MMIO frames, so this path doesn't create non-coherent mappings of
-+ * RAM frames and there's no need to calculate PGT_non_coherent.
-+ */
-+ ASSERT(!cache_flags || is_iomem_page(frame));
-+
- rc = GNTST_okay;
-+ }
-
- out_unlock:
- page_unlock(page);
-@@ -294,7 +304,18 @@ int replace_grant_pv_mapping(uint64_t addr, mfn_t frame,
- l1e_get_flags(ol1e), addr, grant_pte_flags);
-
- if ( UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, curr, 0) )
-+ {
-+ /*
-+ * Generally, replace_grant_pv_mapping() is used to destroy mappings
-+ * (n1le = l1e_empty()), but it can be a present mapping on the
-+ * GNTABOP_unmap_and_replace path.
-+ *
-+ * In such cases, the PTE is fully transplanted from its old location
-+ * via steal_linear_addr(), so we need not perform PGT_non_coherent
-+ * checking here.
-+ */
- rc = GNTST_okay;
-+ }
-
- out_unlock:
- page_unlock(page);
-diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
-index 8a9a43bb0a9d..7464167ae192 100644
---- a/xen/include/asm-x86/mm.h
-+++ b/xen/include/asm-x86/mm.h
-@@ -53,8 +53,12 @@
- #define _PGT_partial PG_shift(8)
- #define PGT_partial PG_mask(1, 8)
-
-+/* Has this page been mapped writeable with a non-coherent memory type? */
-+#define _PGT_non_coherent PG_shift(9)
-+#define PGT_non_coherent PG_mask(1, 9)
-+
- /* Count of uses of this frame as its current type. */
--#define PGT_count_width PG_shift(8)
-+#define PGT_count_width PG_shift(9)
- #define PGT_count_mask ((1UL<<PGT_count_width)-1)
-
- /* Are the 'type mask' bits identical? */
---
-2.35.1
-
diff --git a/0029-x86-mm-account-for-PGT_pae_xen_l2-in-recently-added-.patch b/0029-x86-mm-account-for-PGT_pae_xen_l2-in-recently-added-.patch
deleted file mode 100644
index 90ce4cf..0000000
--- a/0029-x86-mm-account-for-PGT_pae_xen_l2-in-recently-added-.patch
+++ /dev/null
@@ -1,37 +0,0 @@
-From 0b4e62847c5af1a59eea8d17093feccd550d1c26 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 10 Jun 2022 10:28:28 +0200
-Subject: [PATCH 29/51] x86/mm: account for PGT_pae_xen_l2 in recently added
- assertion
-
-While PGT_pae_xen_l2 will be zapped once the type refcount of an L2 page
-reaches zero, it'll be retained as long as the type refcount is non-
-zero. Hence any checking against the requested type needs to either zap
-the bit from the type or include it in the used mask.
-
-Fixes: 9186e96b199e ("x86/pv: Clean up _get_page_type()")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: c2095ac76be0f4a1940346c9ffb49fb967345060
-master date: 2022-06-10 10:21:06 +0200
----
- xen/arch/x86/mm.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index bab9624fabb7..c1b9a3bb102a 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -2928,7 +2928,8 @@ static int _get_page_type(struct page_info *page, unsigned long type,
- * The page is in one of two states (depending on PGT_partial),
- * and should have exactly one reference.
- */
-- ASSERT((x & (PGT_type_mask | PGT_count_mask)) == (type | 1));
-+ ASSERT((x & (PGT_type_mask | PGT_pae_xen_l2 | PGT_count_mask)) ==
-+ (type | 1));
-
- if ( !(x & PGT_partial) )
- {
---
-2.35.1
-
diff --git a/0030-x86-spec-ctrl-Make-VERW-flushing-runtime-conditional.patch b/0030-x86-spec-ctrl-Make-VERW-flushing-runtime-conditional.patch
deleted file mode 100644
index af25b5c..0000000
--- a/0030-x86-spec-ctrl-Make-VERW-flushing-runtime-conditional.patch
+++ /dev/null
@@ -1,258 +0,0 @@
-From 0e80f9f61168d4e4f008da75762cee0118f802ed Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Mon, 13 Jun 2022 16:19:01 +0100
-Subject: [PATCH 30/51] x86/spec-ctrl: Make VERW flushing runtime conditional
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Currently, VERW flushing to mitigate MDS is boot time conditional per domain
-type. However, to provide mitigations for DRPW (CVE-2022-21166), we need to
-conditionally use VERW based on the trustworthiness of the guest, and the
-devices passed through.
-
-Remove the PV/HVM alternatives and instead issue a VERW on the return-to-guest
-path depending on the SCF_verw bit in cpuinfo spec_ctrl_flags.
-
-Introduce spec_ctrl_init_domain() and d->arch.verw to calculate the VERW
-disposition at domain creation time, and context switch the SCF_verw bit.
-
-For now, VERW flushing is used and controlled exactly as before, but later
-patches will add per-domain cases too.
-
-No change in behaviour.
-
-This is part of XSA-404.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-(cherry picked from commit e06b95c1d44ab80da255219fc9f1e2fc423edcb6)
----
- docs/misc/xen-command-line.pandoc | 5 ++---
- xen/arch/x86/domain.c | 12 ++++++++++--
- xen/arch/x86/hvm/vmx/entry.S | 2 +-
- xen/arch/x86/spec_ctrl.c | 30 +++++++++++++++++------------
- xen/include/asm-x86/cpufeatures.h | 3 +--
- xen/include/asm-x86/domain.h | 3 +++
- xen/include/asm-x86/spec_ctrl.h | 2 ++
- xen/include/asm-x86/spec_ctrl_asm.h | 16 +++++++++++++--
- 8 files changed, 51 insertions(+), 22 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 1d08fb7e9aa6..d5cb09f86541 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2258,9 +2258,8 @@ in place for guests to use.
- Use of a positive boolean value for either of these options is invalid.
-
- The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
--grained control over the alternative blocks used by Xen. These impact Xen's
--ability to protect itself, and Xen's ability to virtualise support for guests
--to use.
-+grained control over the primitives by Xen. These impact Xen's ability to
-+protect itself, and Xen's ability to virtualise support for guests to use.
-
- * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
- respectively.
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index ef1812dc1402..1fe6644a71ae 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -863,6 +863,8 @@ int arch_domain_create(struct domain *d,
-
- d->arch.msr_relaxed = config->arch.misc_flags & XEN_X86_MSR_RELAXED;
-
-+ spec_ctrl_init_domain(d);
-+
- return 0;
-
- fail:
-@@ -2017,14 +2019,15 @@ static void __context_switch(void)
- void context_switch(struct vcpu *prev, struct vcpu *next)
- {
- unsigned int cpu = smp_processor_id();
-+ struct cpu_info *info = get_cpu_info();
- const struct domain *prevd = prev->domain, *nextd = next->domain;
- unsigned int dirty_cpu = read_atomic(&next->dirty_cpu);
-
- ASSERT(prev != next);
- ASSERT(local_irq_is_enabled());
-
-- get_cpu_info()->use_pv_cr3 = false;
-- get_cpu_info()->xen_cr3 = 0;
-+ info->use_pv_cr3 = false;
-+ info->xen_cr3 = 0;
-
- if ( unlikely(dirty_cpu != cpu) && dirty_cpu != VCPU_CPU_CLEAN )
- {
-@@ -2088,6 +2091,11 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
- *last_id = next_id;
- }
- }
-+
-+ /* Update the top-of-stack block with the VERW disposition. */
-+ info->spec_ctrl_flags &= ~SCF_verw;
-+ if ( nextd->arch.verw )
-+ info->spec_ctrl_flags |= SCF_verw;
- }
-
- sched_context_switched(prev, next);
-diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
-index 49651f3c435a..5f5de45a1309 100644
---- a/xen/arch/x86/hvm/vmx/entry.S
-+++ b/xen/arch/x86/hvm/vmx/entry.S
-@@ -87,7 +87,7 @@ UNLIKELY_END(realmode)
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
- /* SPEC_CTRL_EXIT_TO_VMX Req: %rsp=regs/cpuinfo Clob: */
-- ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), X86_FEATURE_SC_VERW_HVM
-+ DO_SPEC_CTRL_COND_VERW
-
- mov VCPU_hvm_guest_cr2(%rbx),%rax
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index c19464da70ce..21730aa03071 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -36,8 +36,8 @@ static bool __initdata opt_msr_sc_pv = true;
- static bool __initdata opt_msr_sc_hvm = true;
- static int8_t __initdata opt_rsb_pv = -1;
- static bool __initdata opt_rsb_hvm = true;
--static int8_t __initdata opt_md_clear_pv = -1;
--static int8_t __initdata opt_md_clear_hvm = -1;
-+static int8_t __read_mostly opt_md_clear_pv = -1;
-+static int8_t __read_mostly opt_md_clear_hvm = -1;
-
- /* Cmdline controls for Xen's speculative settings. */
- static enum ind_thunk {
-@@ -932,6 +932,13 @@ static __init void mds_calculations(uint64_t caps)
- }
- }
-
-+void spec_ctrl_init_domain(struct domain *d)
-+{
-+ bool pv = is_pv_domain(d);
-+
-+ d->arch.verw = pv ? opt_md_clear_pv : opt_md_clear_hvm;
-+}
-+
- void __init init_speculation_mitigations(void)
- {
- enum ind_thunk thunk = THUNK_DEFAULT;
-@@ -1196,21 +1203,20 @@ void __init init_speculation_mitigations(void)
- boot_cpu_has(X86_FEATURE_MD_CLEAR));
-
- /*
-- * Enable MDS defences as applicable. The PV blocks need using all the
-- * time, and the Idle blocks need using if either PV or HVM defences are
-- * used.
-+ * Enable MDS defences as applicable. The Idle blocks need using if
-+ * either PV or HVM defences are used.
- *
- * HVM is more complicated. The MD_CLEAR microcode extends L1D_FLUSH with
-- * equivelent semantics to avoid needing to perform both flushes on the
-- * HVM path. The HVM blocks don't need activating if our hypervisor told
-- * us it was handling L1D_FLUSH, or we are using L1D_FLUSH ourselves.
-+ * equivalent semantics to avoid needing to perform both flushes on the
-+ * HVM path. Therefore, we don't need VERW in addition to L1D_FLUSH.
-+ *
-+ * After calculating the appropriate idle setting, simplify
-+ * opt_md_clear_hvm to mean just "should we VERW on the way into HVM
-+ * guests", so spec_ctrl_init_domain() can calculate suitable settings.
- */
-- if ( opt_md_clear_pv )
-- setup_force_cpu_cap(X86_FEATURE_SC_VERW_PV);
- if ( opt_md_clear_pv || opt_md_clear_hvm )
- setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
-- if ( opt_md_clear_hvm && !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush )
-- setup_force_cpu_cap(X86_FEATURE_SC_VERW_HVM);
-+ opt_md_clear_hvm &= !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush;
-
- /*
- * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
-diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
-index ff3157d52d13..bd45a144ee78 100644
---- a/xen/include/asm-x86/cpufeatures.h
-+++ b/xen/include/asm-x86/cpufeatures.h
-@@ -35,8 +35,7 @@ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM
- XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
- XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
- XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
--XEN_CPUFEATURE(SC_VERW_PV, X86_SYNTH(23)) /* VERW used by Xen for PV */
--XEN_CPUFEATURE(SC_VERW_HVM, X86_SYNTH(24)) /* VERW used by Xen for HVM */
-+/* Bits 23,24 unused. */
- XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
- XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
- XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
-diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
-index 92d54de0b9a1..2398a1d99da9 100644
---- a/xen/include/asm-x86/domain.h
-+++ b/xen/include/asm-x86/domain.h
-@@ -319,6 +319,9 @@ struct arch_domain
- uint32_t pci_cf8;
- uint8_t cmos_idx;
-
-+ /* Use VERW on return-to-guest for its flushing side effect. */
-+ bool verw;
-+
- union {
- struct pv_domain pv;
- struct hvm_domain hvm;
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index f76029523610..751355f471f4 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -24,6 +24,7 @@
- #define SCF_use_shadow (1 << 0)
- #define SCF_ist_wrmsr (1 << 1)
- #define SCF_ist_rsb (1 << 2)
-+#define SCF_verw (1 << 3)
-
- #ifndef __ASSEMBLY__
-
-@@ -32,6 +33,7 @@
- #include <asm/msr-index.h>
-
- void init_speculation_mitigations(void);
-+void spec_ctrl_init_domain(struct domain *d);
-
- extern bool opt_ibpb;
- extern bool opt_ssbd;
-diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
-index 02b3b18ce69f..5a590bac44aa 100644
---- a/xen/include/asm-x86/spec_ctrl_asm.h
-+++ b/xen/include/asm-x86/spec_ctrl_asm.h
-@@ -136,6 +136,19 @@
- #endif
- .endm
-
-+.macro DO_SPEC_CTRL_COND_VERW
-+/*
-+ * Requires %rsp=cpuinfo
-+ *
-+ * Issue a VERW for its flushing side effect, if indicated. This is a Spectre
-+ * v1 gadget, but the IRET/VMEntry is serialising.
-+ */
-+ testb $SCF_verw, CPUINFO_spec_ctrl_flags(%rsp)
-+ jz .L\@_verw_skip
-+ verw CPUINFO_verw_sel(%rsp)
-+.L\@_verw_skip:
-+.endm
-+
- .macro DO_SPEC_CTRL_ENTRY maybexen:req
- /*
- * Requires %rsp=regs (also cpuinfo if !maybexen)
-@@ -231,8 +244,7 @@
- #define SPEC_CTRL_EXIT_TO_PV \
- ALTERNATIVE "", \
- DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV; \
-- ALTERNATIVE "", __stringify(verw CPUINFO_verw_sel(%rsp)), \
-- X86_FEATURE_SC_VERW_PV
-+ DO_SPEC_CTRL_COND_VERW
-
- /*
- * Use in IST interrupt/exception context. May interrupt Xen or PV context.
---
-2.35.1
-
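The mechanism introduced above reduces to a simple pattern: decide a per-domain VERW disposition once at domain creation, then copy it into the per-CPU spec_ctrl_flags word on every context switch so the exit-to-guest assembly only has to test one bit. A minimal C sketch of that pattern (field and function names are simplified stand-ins, not the Xen ones):

#include <stdbool.h>
#include <stdint.h>

#define SCF_verw (1u << 3)                   /* matches the flag added above */

struct domain   { bool verw; };              /* stand-in for d->arch.verw */
struct cpu_info { uint32_t spec_ctrl_flags; };

/* Decided once, at domain creation time. */
static void spec_ctrl_init_domain_sketch(struct domain *d, bool is_pv,
                                         bool md_clear_pv, bool md_clear_hvm)
{
    d->verw = is_pv ? md_clear_pv : md_clear_hvm;
}

/* Refreshed on every context switch; DO_SPEC_CTRL_COND_VERW then only needs
 * to test SCF_verw in the top-of-stack block. */
static void context_switch_sketch(struct cpu_info *info,
                                  const struct domain *next)
{
    info->spec_ctrl_flags &= ~SCF_verw;
    if ( next->verw )
        info->spec_ctrl_flags |= SCF_verw;
}
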
diff --git a/0031-x86-spec-ctrl-Enumeration-for-MMIO-Stale-Data-contro.patch b/0031-x86-spec-ctrl-Enumeration-for-MMIO-Stale-Data-contro.patch
deleted file mode 100644
index 3b91fb5..0000000
--- a/0031-x86-spec-ctrl-Enumeration-for-MMIO-Stale-Data-contro.patch
+++ /dev/null
@@ -1,98 +0,0 @@
-From a83108736db0ddaa5855f5abda6dcc8ae4fe25e9 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Mon, 20 Sep 2021 18:47:49 +0100
-Subject: [PATCH 31/51] x86/spec-ctrl: Enumeration for MMIO Stale Data controls
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The three *_NO bits indicate non-susceptibility to the SSDP, FBSDP and PSDP
-data movement primitives.
-
-FB_CLEAR indicates that the VERW instruction has re-gained its Fill Buffer
-flushing side effect. This is only enumerated on parts where VERW had
-previously lost its flushing side effect due to the MDS/TAA vulnerabilities
-being fixed in hardware.
-
-FB_CLEAR_CTRL is available on a subset of FB_CLEAR parts where the Fill Buffer
-clearing side effect of VERW can be turned off for performance reasons.
-
-This is part of XSA-404.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-(cherry picked from commit 2ebe8fe9b7e0d36e9ec3cfe4552b2b197ef0dcec)
----
- xen/arch/x86/spec_ctrl.c | 11 ++++++++---
- xen/include/asm-x86/msr-index.h | 6 ++++++
- 2 files changed, 14 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 21730aa03071..d285538bde9f 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -323,7 +323,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- * Hardware read-only information, stating immunity to certain issues, or
- * suggestions of which mitigation to use.
- */
-- printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s\n",
-+ printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
- (caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
- (caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "",
- (caps & ARCH_CAPS_RSBA) ? " RSBA" : "",
-@@ -332,13 +332,16 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- (caps & ARCH_CAPS_SSB_NO) ? " SSB_NO" : "",
- (caps & ARCH_CAPS_MDS_NO) ? " MDS_NO" : "",
- (caps & ARCH_CAPS_TAA_NO) ? " TAA_NO" : "",
-+ (caps & ARCH_CAPS_SBDR_SSDP_NO) ? " SBDR_SSDP_NO" : "",
-+ (caps & ARCH_CAPS_FBSDP_NO) ? " FBSDP_NO" : "",
-+ (caps & ARCH_CAPS_PSDP_NO) ? " PSDP_NO" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS)) ? " IBRS_ALWAYS" : "",
- (e8b & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS)) ? " STIBP_ALWAYS" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_FAST)) ? " IBRS_FAST" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
-
- /* Hardware features which need driving to mitigate issues. */
-- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
-+ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
- (e8b & cpufeat_mask(X86_FEATURE_IBPB)) ||
- (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBPB" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS)) ||
-@@ -353,7 +356,9 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
- (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL)) ? " SRBDS_CTRL" : "",
- (e8b & cpufeat_mask(X86_FEATURE_VIRT_SSBD)) ? " VIRT_SSBD" : "",
-- (caps & ARCH_CAPS_TSX_CTRL) ? " TSX_CTRL" : "");
-+ (caps & ARCH_CAPS_TSX_CTRL) ? " TSX_CTRL" : "",
-+ (caps & ARCH_CAPS_FB_CLEAR) ? " FB_CLEAR" : "",
-+ (caps & ARCH_CAPS_FB_CLEAR_CTRL) ? " FB_CLEAR_CTRL" : "");
-
- /* Compiled-in support which pertains to mitigations. */
- if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
-diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
-index 31964b88af7a..72bc32ba04ff 100644
---- a/xen/include/asm-x86/msr-index.h
-+++ b/xen/include/asm-x86/msr-index.h
-@@ -66,6 +66,11 @@
- #define ARCH_CAPS_IF_PSCHANGE_MC_NO (_AC(1, ULL) << 6)
- #define ARCH_CAPS_TSX_CTRL (_AC(1, ULL) << 7)
- #define ARCH_CAPS_TAA_NO (_AC(1, ULL) << 8)
-+#define ARCH_CAPS_SBDR_SSDP_NO (_AC(1, ULL) << 13)
-+#define ARCH_CAPS_FBSDP_NO (_AC(1, ULL) << 14)
-+#define ARCH_CAPS_PSDP_NO (_AC(1, ULL) << 15)
-+#define ARCH_CAPS_FB_CLEAR (_AC(1, ULL) << 17)
-+#define ARCH_CAPS_FB_CLEAR_CTRL (_AC(1, ULL) << 18)
-
- #define MSR_FLUSH_CMD 0x0000010b
- #define FLUSH_CMD_L1D (_AC(1, ULL) << 0)
-@@ -83,6 +88,7 @@
- #define MCU_OPT_CTRL_RNGDS_MITG_DIS (_AC(1, ULL) << 0)
- #define MCU_OPT_CTRL_RTM_ALLOW (_AC(1, ULL) << 1)
- #define MCU_OPT_CTRL_RTM_LOCKED (_AC(1, ULL) << 2)
-+#define MCU_OPT_CTRL_FB_CLEAR_DIS (_AC(1, ULL) << 3)
-
- #define MSR_RTIT_OUTPUT_BASE 0x00000560
- #define MSR_RTIT_OUTPUT_MASK 0x00000561
---
-2.35.1
-
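As an aid to reading the enumeration above: the three *_NO bits each rule out one data-movement primitive, so a part can only be considered unaffected by MMIO Stale Data when all three are set, while FB_CLEAR says whether VERW flushes the fill buffers again on that part. A small sketch of one plausible way to combine them (an interpretation, not code from the patch):

#include <stdbool.h>
#include <stdint.h>

/* Bit positions as added to msr-index.h above. */
#define ARCH_CAPS_SBDR_SSDP_NO  (1ULL << 13)
#define ARCH_CAPS_FBSDP_NO      (1ULL << 14)
#define ARCH_CAPS_PSDP_NO       (1ULL << 15)
#define ARCH_CAPS_FB_CLEAR      (1ULL << 17)

#define MMIO_SD_NO_MASK (ARCH_CAPS_SBDR_SSDP_NO | ARCH_CAPS_FBSDP_NO | \
                         ARCH_CAPS_PSDP_NO)

/* Unaffected only if immune to SSDP, FBSDP and PSDP alike. */
static bool mmio_stale_data_unaffected(uint64_t caps)
{
    return (caps & MMIO_SD_NO_MASK) == MMIO_SD_NO_MASK;
}

/* VERW has regained its Fill Buffer flushing side effect on this part. */
static bool verw_flushes_fill_buffers(uint64_t caps)
{
    return caps & ARCH_CAPS_FB_CLEAR;
}
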
diff --git a/0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch b/0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch
deleted file mode 100644
index c63891a..0000000
--- a/0032-x86-spec-ctrl-Add-spec-ctrl-unpriv-mmio.patch
+++ /dev/null
@@ -1,187 +0,0 @@
-From 2e82446cb252f6c8ac697e81f4155872c69afde4 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Mon, 13 Jun 2022 19:18:32 +0100
-Subject: [PATCH 32/51] x86/spec-ctrl: Add spec-ctrl=unpriv-mmio
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Per Xen's support statement, PCI passthrough should be to trusted domains
-because the overall system security depends on factors outside of Xen's
-control.
-
-As such, Xen, in a supported configuration, is not vulnerable to DRPW/SBDR.
-
-However, users who have risk assessed their configuration may be happy with
-the risk of DoS, but unhappy with the risk of cross-domain data leakage. Such
-users should enable this option.
-
-On CPUs vulnerable to MDS, the existing mitigations are the best we can do to
-mitigate MMIO cross-domain data leakage.
-
-On CPUs fixed against MDS but vulnerable to MMIO stale data leakage, this option:
-
- * On CPUs susceptible to FBSDP, mitigates cross-domain fill buffer leakage
- using FB_CLEAR.
- * On CPUs susceptible to SBDR, mitigates RNG data recovery by engaging the
- srb-lock, previously used to mitigate SRBDS.
-
-Both mitigations require microcode from IPU 2022.1, May 2022.
-
-This is part of XSA-404.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-(cherry picked from commit 8c24b70fedcb52633b2370f834d8a2be3f7fa38e)
----
- docs/misc/xen-command-line.pandoc | 14 +++++++--
- xen/arch/x86/spec_ctrl.c | 48 ++++++++++++++++++++++++-------
- 2 files changed, 48 insertions(+), 14 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index d5cb09f86541..a642e43476a2 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2235,7 +2235,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
- ### spec-ctrl (x86)
- > `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
- > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
--> l1d-flush,branch-harden,srb-lock}=<bool> ]`
-+> l1d-flush,branch-harden,srb-lock,unpriv-mmio}=<bool> ]`
-
- Controls for speculative execution sidechannel mitigations. By default, Xen
- will pick the most appropriate mitigations based on compiled in support,
-@@ -2314,8 +2314,16 @@ Xen will enable this mitigation.
- On hardware supporting SRBDS_CTRL, the `srb-lock=` option can be used to force
- or prevent Xen from protecting the Special Register Buffer from leaking stale
- data. By default, Xen will enable this mitigation, except on parts where MDS
--is fixed and TAA is fixed/mitigated (in which case, there is believed to be no
--way for an attacker to obtain the stale data).
-+is fixed and TAA is fixed/mitigated and there are no unprivileged MMIO
-+mappings (in which case, there is believed to be no way for an attacker to
-+obtain stale data).
-+
-+The `unpriv-mmio=` boolean indicates whether the system has (or will have)
-+less than fully privileged domains granted access to MMIO devices. By
-+default, this option is disabled. If enabled, Xen will use the `FB_CLEAR`
-+and/or `SRBDS_CTRL` functionality available in the Intel May 2022 microcode
-+release to mitigate cross-domain leakage of data via the MMIO Stale Data
-+vulnerabilities.
-
- ### sync_console
- > `= <boolean>`
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index d285538bde9f..099113ba41e6 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -67,6 +67,8 @@ static bool __initdata cpu_has_bug_msbds_only; /* => minimal HT impact. */
- static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination. */
-
- static int8_t __initdata opt_srb_lock = -1;
-+static bool __initdata opt_unpriv_mmio;
-+static bool __read_mostly opt_fb_clear_mmio;
-
- static int __init parse_spec_ctrl(const char *s)
- {
-@@ -184,6 +186,8 @@ static int __init parse_spec_ctrl(const char *s)
- opt_branch_harden = val;
- else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
- opt_srb_lock = val;
-+ else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
-+ opt_unpriv_mmio = val;
- else
- rc = -EINVAL;
-
-@@ -392,7 +396,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-",
- opt_ibpb ? " IBPB" : "",
- opt_l1d_flush ? " L1D_FLUSH" : "",
-- opt_md_clear_pv || opt_md_clear_hvm ? " VERW" : "",
-+ opt_md_clear_pv || opt_md_clear_hvm ||
-+ opt_fb_clear_mmio ? " VERW" : "",
- opt_branch_harden ? " BRANCH_HARDEN" : "");
-
- /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
-@@ -941,7 +946,9 @@ void spec_ctrl_init_domain(struct domain *d)
- {
- bool pv = is_pv_domain(d);
-
-- d->arch.verw = pv ? opt_md_clear_pv : opt_md_clear_hvm;
-+ d->arch.verw =
-+ (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-+ (opt_fb_clear_mmio && is_iommu_enabled(d));
- }
-
- void __init init_speculation_mitigations(void)
-@@ -1195,6 +1202,18 @@ void __init init_speculation_mitigations(void)
-
- mds_calculations(caps);
-
-+ /*
-+ * Parts which enumerate FB_CLEAR are those which are post-MDS_NO and have
-+ * reintroduced the VERW fill buffer flushing side effect because of a
-+ * susceptibility to FBSDP.
-+ *
-+ * If unprivileged guests have (or will have) MMIO mappings, we can
-+ * mitigate cross-domain leakage of fill buffer data by issuing VERW on
-+ * the return-to-guest path.
-+ */
-+ if ( opt_unpriv_mmio )
-+ opt_fb_clear_mmio = caps & ARCH_CAPS_FB_CLEAR;
-+
- /*
- * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
- * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
-@@ -1208,18 +1227,20 @@ void __init init_speculation_mitigations(void)
- boot_cpu_has(X86_FEATURE_MD_CLEAR));
-
- /*
-- * Enable MDS defences as applicable. The Idle blocks need using if
-- * either PV or HVM defences are used.
-+ * Enable MDS/MMIO defences as applicable. The Idle blocks need using if
-+ * either the PV or HVM MDS defences are used, or if we may give MMIO
-+ * access to untrusted guests.
- *
- * HVM is more complicated. The MD_CLEAR microcode extends L1D_FLUSH with
- * equivalent semantics to avoid needing to perform both flushes on the
-- * HVM path. Therefore, we don't need VERW in addition to L1D_FLUSH.
-+ * HVM path. Therefore, we don't need VERW in addition to L1D_FLUSH (for
-+ * MDS mitigations. L1D_FLUSH is not safe for MMIO mitigations.)
- *
- * After calculating the appropriate idle setting, simplify
- * opt_md_clear_hvm to mean just "should we VERW on the way into HVM
- * guests", so spec_ctrl_init_domain() can calculate suitable settings.
- */
-- if ( opt_md_clear_pv || opt_md_clear_hvm )
-+ if ( opt_md_clear_pv || opt_md_clear_hvm || opt_fb_clear_mmio )
- setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
- opt_md_clear_hvm &= !(caps & ARCH_CAPS_SKIP_L1DFL) && !opt_l1d_flush;
-
-@@ -1284,14 +1305,19 @@ void __init init_speculation_mitigations(void)
- * On some SRBDS-affected hardware, it may be safe to relax srb-lock by
- * default.
- *
-- * On parts which enumerate MDS_NO and not TAA_NO, TSX is the only known
-- * way to access the Fill Buffer. If TSX isn't available (inc. SKU
-- * reasons on some models), or TSX is explicitly disabled, then there is
-- * no need for the extra overhead to protect RDRAND/RDSEED.
-+ * All parts with SRBDS_CTRL suffer SSDP, the mechanism by which stale RNG
-+ * data becomes available to other contexts. To recover the data, an
-+ * attacker needs to use:
-+ * - SBDS (MDS or TAA to sample the core's fill buffer)
-+ * - SBDR (Architecturally retrieve stale transaction buffer contents)
-+ * - DRPW (Architecturally latch stale fill buffer data)
-+ *
-+ * On MDS_NO parts, and with TAA_NO or TSX unavailable/disabled, and there
-+ * is no unprivileged MMIO access, the RNG data doesn't need protecting.
- */
- if ( cpu_has_srbds_ctrl )
- {
-- if ( opt_srb_lock == -1 &&
-+ if ( opt_srb_lock == -1 && !opt_unpriv_mmio &&
- (caps & (ARCH_CAPS_MDS_NO|ARCH_CAPS_TAA_NO)) == ARCH_CAPS_MDS_NO &&
- (!cpu_has_hle || ((caps & ARCH_CAPS_TSX_CTRL) && rtm_disabled)) )
- opt_srb_lock = 0;
---
-2.35.1
-
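In deployment terms, an administrator who does assign PCI devices to guests that are not fully trusted would add `spec-ctrl=unpriv-mmio` to the hypervisor command line, as described in the documentation hunk above. The resulting per-domain VERW decision can be paraphrased by the following sketch (plain C with illustrative parameter names, not the Xen code):

#include <stdbool.h>

/*
 * VERW is wanted on return to a domain if either the MDS defences apply to
 * its type, or the FB_CLEAR-based MMIO defence is active and the domain can
 * actually reach device MMIO (approximated here by "has an IOMMU context").
 */
static bool domain_wants_verw(bool is_pv, bool md_clear_pv, bool md_clear_hvm,
                              bool fb_clear_mmio, bool domain_uses_iommu)
{
    return (is_pv ? md_clear_pv : md_clear_hvm) ||
           (fb_clear_mmio && domain_uses_iommu);
}
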
diff --git a/0033-IOMMU-x86-work-around-bogus-gcc12-warning-in-hvm_gsi.patch b/0033-IOMMU-x86-work-around-bogus-gcc12-warning-in-hvm_gsi.patch
deleted file mode 100644
index 07f488d..0000000
--- a/0033-IOMMU-x86-work-around-bogus-gcc12-warning-in-hvm_gsi.patch
+++ /dev/null
@@ -1,52 +0,0 @@
-From 460b08d6c6c16b3f32aa138e772b759ae02a4479 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 12 Jul 2022 11:10:34 +0200
-Subject: [PATCH 33/51] IOMMU/x86: work around bogus gcc12 warning in
- hvm_gsi_eoi()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-As per [1] the expansion of the pirq_dpci() macro causes a -Waddress
-controlled warning (enabled implicitly in our builds, if not by default)
-tying the middle part of the involved conditional expression to the
-surrounding boolean context. Work around this by introducing a local
-inline function in the affected source file.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-
-[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102967
-master commit: 80ad8db8a4d9bb24952f0aea788ce6f47566fa76
-master date: 2022-06-15 10:19:32 +0200
----
- xen/drivers/passthrough/x86/hvm.c | 12 ++++++++++++
- 1 file changed, 12 insertions(+)
-
-diff --git a/xen/drivers/passthrough/x86/hvm.c b/xen/drivers/passthrough/x86/hvm.c
-index 0b37cd145b60..ba0f6c53d742 100644
---- a/xen/drivers/passthrough/x86/hvm.c
-+++ b/xen/drivers/passthrough/x86/hvm.c
-@@ -25,6 +25,18 @@
- #include <asm/hvm/support.h>
- #include <asm/io_apic.h>
-
-+/*
-+ * Gcc12 takes issue with pirq_dpci() being used in boolean context (see gcc
-+ * bug 102967). While we can't replace the macro definition in the header by an
-+ * inline function, we can do so here.
-+ */
-+static inline struct hvm_pirq_dpci *_pirq_dpci(struct pirq *pirq)
-+{
-+ return pirq_dpci(pirq);
-+}
-+#undef pirq_dpci
-+#define pirq_dpci(pirq) _pirq_dpci(pirq)
-+
- static DEFINE_PER_CPU(struct list_head, dpci_list);
-
- /*
---
-2.35.1
-
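The workaround above is a reusable pattern: keep the problematic header macro for every other translation unit, but in the one file that trips the diagnostic, route it through a real inline function and re-point the macro at that. A generic, self-contained sketch of the same trick (the widget names are invented purely for illustration):

#include <stddef.h>

struct widget { int payload; };

/* Imagine this comes from a shared header; gcc 12 can object (-Waddress) when
 * an expansion like this ends up being tested in a boolean context. */
#define widget_payload(w) ((w) ? &(w)->payload : NULL)

/* Local wrapper: identical semantics, but the compiler now sees a function
 * call rather than the raw macro expansion. */
static inline int *_widget_payload(struct widget *w)
{
    return widget_payload(w);
}
#undef widget_payload
#define widget_payload(w) _widget_payload(w)

static int has_payload(struct widget *w)
{
    return widget_payload(w) != NULL;    /* expands to the wrapper, no warning */
}
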
diff --git a/0034-ehci-dbgp-fix-selecting-n-th-ehci-controller.patch b/0034-ehci-dbgp-fix-selecting-n-th-ehci-controller.patch
deleted file mode 100644
index ac71ab8..0000000
--- a/0034-ehci-dbgp-fix-selecting-n-th-ehci-controller.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From 5cb8142076ce1ce53eafd7e00acb4d0eac4e7784 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
- <marmarek@invisiblethingslab.com>
-Date: Tue, 12 Jul 2022 11:11:35 +0200
-Subject: [PATCH 34/51] ehci-dbgp: fix selecting n-th ehci controller
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The ehci<n> number was parsed but ignored.
-
-Fixes: 322ecbe4ac85 ("console: add EHCI debug port based serial console")
-Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: d6d0cb659fda64430d4649f8680c5cead32da8fd
-master date: 2022-06-16 14:23:37 +0100
----
- xen/drivers/char/ehci-dbgp.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/drivers/char/ehci-dbgp.c b/xen/drivers/char/ehci-dbgp.c
-index c893d246defa..66b4811af24a 100644
---- a/xen/drivers/char/ehci-dbgp.c
-+++ b/xen/drivers/char/ehci-dbgp.c
-@@ -1478,7 +1478,7 @@ void __init ehci_dbgp_init(void)
- unsigned int num = 0;
-
- if ( opt_dbgp[4] )
-- simple_strtoul(opt_dbgp + 4, &e, 10);
-+ num = simple_strtoul(opt_dbgp + 4, &e, 10);
-
- dbgp->cap = find_dbgp(dbgp, num);
- if ( !dbgp->cap )
---
-2.35.1
-
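The bug class here is a classic one: the string-to-integer conversion was performed but its result discarded, so the controller index silently stayed at its initial value. A standalone illustration using the standard strtoul() (Xen's code uses simple_strtoul(), but the shape is identical):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *opt_dbgp = "ehci3";
    unsigned int num = 0;
    char *end;

    strtoul(opt_dbgp + 4, &end, 10);        /* buggy: result discarded, num == 0 */
    num = strtoul(opt_dbgp + 4, &end, 10);  /* fixed: the n-th controller is kept */

    printf("using EHCI controller %u\n", num);
    return 0;
}
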
diff --git a/0035-tools-xenstored-Harden-corrupt.patch b/0035-tools-xenstored-Harden-corrupt.patch
deleted file mode 100644
index bb0f7f1..0000000
--- a/0035-tools-xenstored-Harden-corrupt.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From 81ee3d08351be1ef2a14d371993604098d6a4673 Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Tue, 12 Jul 2022 11:12:13 +0200
-Subject: [PATCH 35/51] tools/xenstored: Harden corrupt()
-
-At the moment, corrupt() is neither checking for allocation failure
-nor freeing the allocated memory.
-
-Harden the code by printing ENOMEM if the allocation failed and
-free 'str' after the last use.
-
-This is not considered to be a security issue because corrupt() should
-only be called when Xenstored thinks the database is corrupted. Note
-that the trigger (i.e. a guest reliably provoking the call) would be
-a security issue.
-
-Fixes: 06d17943f0cd ("Added a basic integrity checker, and some basic ability to recover from store")
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: db3382dd4f468c763512d6bf91c96773395058fb
-master date: 2022-06-23 13:44:10 +0100
----
- tools/xenstore/xenstored_core.c | 5 ++++-
- 1 file changed, 4 insertions(+), 1 deletion(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 91d093a12ea6..0c8ee276f837 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -2087,7 +2087,10 @@ void corrupt(struct connection *conn, const char *fmt, ...)
- va_end(arglist);
-
- log("corruption detected by connection %i: err %s: %s",
-- conn ? (int)conn->id : -1, strerror(saved_errno), str);
-+ conn ? (int)conn->id : -1, strerror(saved_errno),
-+ str ?: "ENOMEM");
-+
-+ talloc_free(str);
-
- check_store();
- }
---
-2.35.1
-
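The same hardening pattern written against plain glibc rather than xenstored's talloc-based helpers (a sketch of the idea, not the actual code): if the formatting allocation fails, log a recognisable placeholder instead of handing NULL to a %s conversion, and release the buffer once it has been used.

#define _GNU_SOURCE
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

static void log_corruption(const char *fmt, ...)
{
    char *str = NULL;
    va_list ap;

    va_start(ap, fmt);
    if ( vasprintf(&str, fmt, ap) < 0 )
        str = NULL;                          /* allocation failed */
    va_end(ap);

    fprintf(stderr, "corruption detected: %s\n", str ? str : "ENOMEM");
    free(str);                               /* free(NULL) is harmless */
}

int main(void)
{
    log_corruption("node %s has a bad parent", "/local/domain/1/name");
    return 0;
}
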
diff --git a/0036-x86-spec-ctrl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch b/0036-x86-spec-ctrl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch
deleted file mode 100644
index 8bc0768..0000000
--- a/0036-x86-spec-ctrl-Only-adjust-MSR_SPEC_CTRL-for-idle-wit.patch
+++ /dev/null
@@ -1,93 +0,0 @@
-From 09d533f4c80b7eaf9fb4e36ebba8259580857a9d Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 12 Jul 2022 11:12:46 +0200
-Subject: [PATCH 36/51] x86/spec-ctrl: Only adjust MSR_SPEC_CTRL for idle with
- legacy IBRS
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Back at the time of the original Spectre-v2 fixes, it was recommended to clear
-MSR_SPEC_CTRL when going idle. This is because of the side effects on the
-sibling thread caused by the microcode IBRS and STIBP implementations which
-were retrofitted to existing CPUs.
-
-However, there are no relevant cross-thread impacts for the hardware
-IBRS/STIBP implementations, so this logic should not be used on Intel CPUs
-supporting eIBRS, or any AMD CPUs; doing so only adds unnecessary latency to
-the idle path.
-
-Furthermore, there's no point playing with MSR_SPEC_CTRL in the idle paths if
-SMT is disabled for other reasons.
-
-Fixes: 8d03080d2a33 ("x86/spec-ctrl: Cease using thunk=lfence on AMD")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: ffc7694e0c99eea158c32aa164b7d1e1bb1dc46b
-master date: 2022-06-30 18:07:13 +0100
----
- xen/arch/x86/spec_ctrl.c | 10 ++++++++--
- xen/include/asm-x86/cpufeatures.h | 2 +-
- xen/include/asm-x86/spec_ctrl.h | 5 +++--
- 3 files changed, 12 insertions(+), 5 deletions(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 099113ba41e6..1ed5ceda8b46 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -1150,8 +1150,14 @@ void __init init_speculation_mitigations(void)
- /* (Re)init BSP state now that default_spec_ctrl_flags has been calculated. */
- init_shadow_spec_ctrl_state();
-
-- /* If Xen is using any MSR_SPEC_CTRL settings, adjust the idle path. */
-- if ( default_xen_spec_ctrl )
-+ /*
-+ * For microcoded IBRS only (i.e. Intel, pre eIBRS), it is recommended to
-+ * clear MSR_SPEC_CTRL before going idle, to avoid impacting sibling
-+ * threads. Activate this if SMT is enabled, and Xen is using a non-zero
-+ * MSR_SPEC_CTRL setting.
-+ */
-+ if ( boot_cpu_has(X86_FEATURE_IBRSB) && !(caps & ARCH_CAPS_IBRS_ALL) &&
-+ hw_smt_enabled && default_xen_spec_ctrl )
- setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);
-
- xpti_init_default(caps);
-diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
-index bd45a144ee78..493d338a085e 100644
---- a/xen/include/asm-x86/cpufeatures.h
-+++ b/xen/include/asm-x86/cpufeatures.h
-@@ -33,7 +33,7 @@ XEN_CPUFEATURE(SC_MSR_HVM, X86_SYNTH(17)) /* MSR_SPEC_CTRL used by Xen fo
- XEN_CPUFEATURE(SC_RSB_PV, X86_SYNTH(18)) /* RSB overwrite needed for PV */
- XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM */
- XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
--XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
-+XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
- XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
- /* Bits 23,24 unused. */
- XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index 751355f471f4..7e83e0179fb9 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -78,7 +78,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
- uint32_t val = 0;
-
- /*
-- * Branch Target Injection:
-+ * It is recommended in some cases to clear MSR_SPEC_CTRL when going idle,
-+ * to avoid impacting sibling threads.
- *
- * Latch the new shadow value, then enable shadowing, then update the MSR.
- * There are no SMP issues here; only local processor ordering concerns.
-@@ -114,7 +115,7 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
- uint32_t val = info->xen_spec_ctrl;
-
- /*
-- * Branch Target Injection:
-+ * Restore MSR_SPEC_CTRL on exit from idle.
- *
- * Disable shadowing before updating the MSR. There are no SMP issues
- * here; only local processor ordering concerns.
---
-2.35.1
-
diff --git a/0037-x86-spec-ctrl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch b/0037-x86-spec-ctrl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch
deleted file mode 100644
index 156aa58..0000000
--- a/0037-x86-spec-ctrl-Knobs-for-STIBP-and-PSFD-and-follow-ha.patch
+++ /dev/null
@@ -1,234 +0,0 @@
-From db6ca8176ccc4ff7dfe3c06969af9ebfab0d7b04 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 12 Jul 2022 11:13:33 +0200
-Subject: [PATCH 37/51] x86/spec-ctrl: Knobs for STIBP and PSFD, and follow
- hardware STIBP hint
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-STIBP and PSFD are slightly weird bits, because they're both implied by other
-bits in MSR_SPEC_CTRL. Add fine grain controls for them, and take the
-implications into account when setting IBRS/SSBD.
-
-Rearrange the IBPB text/variables/logic to keep all the MSR_SPEC_CTRL bits
-together, for consistency.
-
-However, AMD have a hardware hint CPUID bit recommending that STIBP be set
-unilaterally. This is advertised on Zen3, so follow the recommendation.
-Furthermore, in such cases, set STIBP behind the guest's back for now. This
-has negligible overhead for the guest, but saves a WRMSR on vmentry. This is
-the only default change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: fef244b179c06fcdfa581f7d57fa6e578c49ff50
-master date: 2022-06-30 18:07:13 +0100
----
- docs/misc/xen-command-line.pandoc | 21 +++++++---
- xen/arch/x86/hvm/svm/vmcb.c | 9 +++++
- xen/arch/x86/spec_ctrl.c | 67 ++++++++++++++++++++++++++-----
- 3 files changed, 82 insertions(+), 15 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index a642e43476a2..46e9c58d35cd 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2234,8 +2234,9 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
-
- ### spec-ctrl (x86)
- > `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
--> bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
--> l1d-flush,branch-harden,srb-lock,unpriv-mmio}=<bool> ]`
-+> bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
-+> eager-fpu,l1d-flush,branch-harden,srb-lock,
-+> unpriv-mmio}=<bool> ]`
-
- Controls for speculative execution sidechannel mitigations. By default, Xen
- will pick the most appropriate mitigations based on compiled in support,
-@@ -2285,9 +2286,10 @@ On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
- If Xen is not using IBRS itself, functionality is still set up so IBRS can be
- virtualised for guests.
-
--On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
--option can be used to force (the default) or prevent Xen from issuing branch
--prediction barriers on vcpu context switches.
-+On hardware supporting STIBP (Single Thread Indirect Branch Predictors), the
-+`stibp=` option can be used to force or prevent Xen using the feature itself.
-+By default, Xen will use STIBP when IBRS is in use (IBRS implies STIBP), and
-+when hardware hints recommend using it as a blanket setting.
-
- On hardware supporting SSBD (Speculative Store Bypass Disable), the `ssbd=`
- option can be used to force or prevent Xen using the feature itself. On AMD
-@@ -2295,6 +2297,15 @@ hardware, this is a global option applied at boot, and not virtualised for
- guest use. On Intel hardware, the feature is virtualised for guests,
- independently of Xen's choice of setting.
-
-+On hardware supporting PSFD (Predictive Store Forwarding Disable), the `psfd=`
-+option can be used to force or prevent Xen using the feature itself. By
-+default, Xen will not use PSFD. PSFD is implied by SSBD, and SSBD is off by
-+default.
-+
-+On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
-+option can be used to force (the default) or prevent Xen from issuing branch
-+prediction barriers on vcpu context switches.
-+
- On all hardware, the `eager-fpu=` option can be used to force or prevent Xen
- from using fully eager FPU context switches. This is currently implemented as
- a global control. By default, Xen will choose to use fully eager context
-diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
-index 565e997155f2..ef7224eb5dd7 100644
---- a/xen/arch/x86/hvm/svm/vmcb.c
-+++ b/xen/arch/x86/hvm/svm/vmcb.c
-@@ -29,6 +29,7 @@
- #include <asm/hvm/support.h>
- #include <asm/hvm/svm/svm.h>
- #include <asm/hvm/svm/svmdebug.h>
-+#include <asm/spec_ctrl.h>
-
- struct vmcb_struct *alloc_vmcb(void)
- {
-@@ -176,6 +177,14 @@ static int construct_vmcb(struct vcpu *v)
- vmcb->_pause_filter_thresh = SVM_PAUSETHRESH_INIT;
- }
-
-+ /*
-+ * When default_xen_spec_ctrl is simply SPEC_CTRL_STIBP, default this behind
-+ * the back of the VM too. Our SMT topology isn't accurate, the overhead
-+ * is negligible, and doing this saves a WRMSR on the vmentry path.
-+ */
-+ if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
-+ v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
-+
- return 0;
- }
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 1ed5ceda8b46..dfdd45c358c4 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -48,9 +48,13 @@ static enum ind_thunk {
- THUNK_LFENCE,
- THUNK_JMP,
- } opt_thunk __initdata = THUNK_DEFAULT;
-+
- static int8_t __initdata opt_ibrs = -1;
-+int8_t __initdata opt_stibp = -1;
-+bool __read_mostly opt_ssbd;
-+int8_t __initdata opt_psfd = -1;
-+
- bool __read_mostly opt_ibpb = true;
--bool __read_mostly opt_ssbd = false;
- int8_t __read_mostly opt_eager_fpu = -1;
- int8_t __read_mostly opt_l1d_flush = -1;
- static bool __initdata opt_branch_harden = true;
-@@ -172,12 +176,20 @@ static int __init parse_spec_ctrl(const char *s)
- else
- rc = -EINVAL;
- }
-+
-+ /* Bits in MSR_SPEC_CTRL. */
- else if ( (val = parse_boolean("ibrs", s, ss)) >= 0 )
- opt_ibrs = val;
-- else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-- opt_ibpb = val;
-+ else if ( (val = parse_boolean("stibp", s, ss)) >= 0 )
-+ opt_stibp = val;
- else if ( (val = parse_boolean("ssbd", s, ss)) >= 0 )
- opt_ssbd = val;
-+ else if ( (val = parse_boolean("psfd", s, ss)) >= 0 )
-+ opt_psfd = val;
-+
-+ /* Misc settings. */
-+ else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-+ opt_ibpb = val;
- else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
- opt_eager_fpu = val;
- else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
-@@ -376,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- "\n");
-
- /* Settings for Xen's protection, irrespective of guests. */
-- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s, Other:%s%s%s%s%s\n",
-+ printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s\n",
- thunk == THUNK_NONE ? "N/A" :
- thunk == THUNK_RETPOLINE ? "RETPOLINE" :
- thunk == THUNK_LFENCE ? "LFENCE" :
-@@ -390,6 +402,9 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- (!boot_cpu_has(X86_FEATURE_SSBD) &&
- !boot_cpu_has(X86_FEATURE_AMD_SSBD)) ? "" :
- (default_xen_spec_ctrl & SPEC_CTRL_SSBD) ? " SSBD+" : " SSBD-",
-+ (!boot_cpu_has(X86_FEATURE_PSFD) &&
-+ !boot_cpu_has(X86_FEATURE_INTEL_PSFD)) ? "" :
-+ (default_xen_spec_ctrl & SPEC_CTRL_PSFD) ? " PSFD+" : " PSFD-",
- !(caps & ARCH_CAPS_TSX_CTRL) ? "" :
- (opt_tsx & 1) ? " TSX+" : " TSX-",
- !cpu_has_srbds_ctrl ? "" :
-@@ -979,10 +994,7 @@ void __init init_speculation_mitigations(void)
- if ( !has_spec_ctrl )
- printk(XENLOG_WARNING "?!? CET active, but no MSR_SPEC_CTRL?\n");
- else if ( opt_ibrs == -1 )
-- {
- opt_ibrs = ibrs = true;
-- default_xen_spec_ctrl |= SPEC_CTRL_IBRS | SPEC_CTRL_STIBP;
-- }
-
- if ( opt_thunk == THUNK_DEFAULT || opt_thunk == THUNK_RETPOLINE )
- thunk = THUNK_JMP;
-@@ -1086,14 +1098,49 @@ void __init init_speculation_mitigations(void)
- setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
- }
-
-- /* If we have IBRS available, see whether we should use it. */
-+ /* Figure out default_xen_spec_ctrl. */
- if ( has_spec_ctrl && ibrs )
-- default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
-+ {
-+ /* IBRS implies STIBP. */
-+ if ( opt_stibp == -1 )
-+ opt_stibp = 1;
-+
-+ default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
-+ }
-+
-+ /*
-+ * Use STIBP by default if the hardware hint is set. Otherwise, leave it
-+ * off as it is a severe performance penalty on pre-eIBRS Intel hardware
-+ * where it was retrofitted in microcode.
-+ */
-+ if ( opt_stibp == -1 )
-+ opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
-+
-+ if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
-+ boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
-+ default_xen_spec_ctrl |= SPEC_CTRL_STIBP;
-
-- /* If we have SSBD available, see whether we should use it. */
- if ( opt_ssbd && (boot_cpu_has(X86_FEATURE_SSBD) ||
- boot_cpu_has(X86_FEATURE_AMD_SSBD)) )
-+ {
-+ /* SSBD implies PSFD */
-+ if ( opt_psfd == -1 )
-+ opt_psfd = 1;
-+
- default_xen_spec_ctrl |= SPEC_CTRL_SSBD;
-+ }
-+
-+ /*
-+ * Don't use PSFD by default. AMD designed the predictor to
-+ * auto-clear on privilege change. PSFD is implied by SSBD, which is
-+ * off by default.
-+ */
-+ if ( opt_psfd == -1 )
-+ opt_psfd = 0;
-+
-+ if ( opt_psfd && (boot_cpu_has(X86_FEATURE_PSFD) ||
-+ boot_cpu_has(X86_FEATURE_INTEL_PSFD)) )
-+ default_xen_spec_ctrl |= SPEC_CTRL_PSFD;
-
- /*
- * PV guests can create RSB entries for any linear address they control,
---
-2.35.1
-
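The way the new tri-state knobs resolve can be summarised in a small sketch (an illustrative helper, not the Xen code): an explicit `spec-ctrl=stibp` / `spec-ctrl=no-stibp` choice wins, otherwise the setting is inherited from the bit that implies it (IBRS implies STIBP, SSBD implies PSFD), and failing that the hardware STIBP_ALWAYS hint decides.

#include <stdbool.h>

/* opt_stibp: -1 = no explicit command-line choice, 0/1 = forced off/on. */
static bool resolve_stibp(int opt_stibp, bool xen_uses_ibrs,
                          bool stibp_always_hint)
{
    if ( opt_stibp == -1 && xen_uses_ibrs )
        opt_stibp = 1;                       /* IBRS implies STIBP */
    if ( opt_stibp == -1 )
        opt_stibp = stibp_always_hint;       /* follow the hardware hint */
    return opt_stibp;
}
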
diff --git a/0038-libxc-fix-compilation-error-with-gcc13.patch b/0038-libxc-fix-compilation-error-with-gcc13.patch
deleted file mode 100644
index 8056742..0000000
--- a/0038-libxc-fix-compilation-error-with-gcc13.patch
+++ /dev/null
@@ -1,33 +0,0 @@
-From cd3d6b4cd46cd05590805b4a6c0b6654af60106e Mon Sep 17 00:00:00 2001
-From: Charles Arnold <carnold@suse.com>
-Date: Tue, 12 Jul 2022 11:14:07 +0200
-Subject: [PATCH 38/51] libxc: fix compilation error with gcc13
-
-xc_psr.c:161:5: error: conflicting types for 'xc_psr_cmt_get_data'
-due to enum/integer mismatch;
-
-Signed-off-by: Charles Arnold <carnold@suse.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 8eeae8c2b4efefda8e946461e86cf2ae9c18e5a9
-master date: 2022-07-06 13:06:40 +0200
----
- tools/include/xenctrl.h | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
-index 07b96e6671a5..893ae39e4a95 100644
---- a/tools/include/xenctrl.h
-+++ b/tools/include/xenctrl.h
-@@ -2516,7 +2516,7 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
- int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
- uint32_t *l3_cache_size);
- int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
-- uint32_t psr_cmt_type, uint64_t *monitor_data,
-+ xc_psr_cmt_type type, uint64_t *monitor_data,
- uint64_t *tsc);
- int xc_psr_cmt_enabled(xc_interface *xch);
-
---
-2.35.1
-
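The underlying issue is a declaration/definition disagreement between an enum and a plain integer type, which GCC 13 diagnoses as "conflicting types ... due to enum/integer mismatch" and which -Werror turns into a build failure. A minimal, invented example of the same mismatch:

#include <stdint.h>

enum shape_kind { SHAPE_CIRCLE, SHAPE_SQUARE };

/* The header says uint32_t ... */
int describe_shape(uint32_t kind);

/* ... but the definition uses the enum: GCC 13 flags the conflicting types. */
int describe_shape(enum shape_kind kind)
{
    return kind == SHAPE_CIRCLE;
}
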
diff --git a/0039-x86-spec-ctrl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch b/0039-x86-spec-ctrl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch
deleted file mode 100644
index 1797a8f..0000000
--- a/0039-x86-spec-ctrl-Honour-spec-ctrl-0-for-unpriv-mmio-sub.patch
+++ /dev/null
@@ -1,32 +0,0 @@
-From 61b9c2ceeb94b0cdaff01023cc5523b1f13e66e2 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 12 Jul 2022 11:14:34 +0200
-Subject: [PATCH 39/51] x86/spec-ctrl: Honour spec-ctrl=0 for unpriv-mmio
- sub-option
-
-This was an oversight from when unpriv-mmio was introduced.
-
-Fixes: 8c24b70fedcb ("x86/spec-ctrl: Add spec-ctrl=unpriv-mmio")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 4cdb519d797c19ebb8fadc5938cdb47479d5a21b
-master date: 2022-07-11 15:21:35 +0100
----
- xen/arch/x86/spec_ctrl.c | 1 +
- 1 file changed, 1 insertion(+)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index dfdd45c358c4..ae74943c1053 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -122,6 +122,7 @@ static int __init parse_spec_ctrl(const char *s)
- opt_l1d_flush = 0;
- opt_branch_harden = false;
- opt_srb_lock = 0;
-+ opt_unpriv_mmio = false;
- }
- else if ( val > 0 )
- rc = -EINVAL;
---
-2.35.1
-
diff --git a/0040-xen-cmdline-Extend-parse_boolean-to-signal-a-name-ma.patch b/0040-xen-cmdline-Extend-parse_boolean-to-signal-a-name-ma.patch
deleted file mode 100644
index 3512590..0000000
--- a/0040-xen-cmdline-Extend-parse_boolean-to-signal-a-name-ma.patch
+++ /dev/null
@@ -1,87 +0,0 @@
-From eec5b02403a9df2523527caad24f17af5060fbe7 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 12 Jul 2022 11:15:03 +0200
-Subject: [PATCH 40/51] xen/cmdline: Extend parse_boolean() to signal a name
- match
-
-This will help parsing a sub-option which has boolean and non-boolean options
-available.
-
-First, rework 'int val' into 'bool has_neg_prefix'. This inverts its value,
-but the resulting logic is far easier to follow.
-
-Second, reject anything of the form 'no-$FOO=' which excludes ambiguous
-constructs such as 'no-$foo=yes' which have never been valid.
-
-This just leaves the case where everything is otherwise fine, but parse_bool()
-can't interpret the provided string.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 382326cac528dd1eb0d04efd5c05363c453e29f4
-master date: 2022-07-11 15:21:35 +0100
----
- xen/common/kernel.c | 20 ++++++++++++++++----
- xen/include/xen/lib.h | 3 ++-
- 2 files changed, 18 insertions(+), 5 deletions(-)
-
-diff --git a/xen/common/kernel.c b/xen/common/kernel.c
-index e119e5401f9d..7ed96521f97a 100644
---- a/xen/common/kernel.c
-+++ b/xen/common/kernel.c
-@@ -272,9 +272,9 @@ int parse_bool(const char *s, const char *e)
- int parse_boolean(const char *name, const char *s, const char *e)
- {
- size_t slen, nlen;
-- int val = !!strncmp(s, "no-", 3);
-+ bool has_neg_prefix = !strncmp(s, "no-", 3);
-
-- if ( !val )
-+ if ( has_neg_prefix )
- s += 3;
-
- slen = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s);
-@@ -286,11 +286,23 @@ int parse_boolean(const char *name, const char *s, const char *e)
-
- /* Exact, unadorned name? Result depends on the 'no-' prefix. */
- if ( slen == nlen )
-- return val;
-+ return !has_neg_prefix;
-+
-+ /* Inexact match with a 'no-' prefix? Not valid. */
-+ if ( has_neg_prefix )
-+ return -1;
-
- /* =$SOMETHING? Defer to the regular boolean parsing. */
- if ( s[nlen] == '=' )
-- return parse_bool(&s[nlen + 1], e);
-+ {
-+ int b = parse_bool(&s[nlen + 1], e);
-+
-+ if ( b >= 0 )
-+ return b;
-+
-+ /* Not a boolean, but the name matched. Signal specially. */
-+ return -2;
-+ }
-
- /* Unrecognised. Give up. */
- return -1;
-diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
-index c6987973bf88..2296044caf79 100644
---- a/xen/include/xen/lib.h
-+++ b/xen/include/xen/lib.h
-@@ -80,7 +80,8 @@ int parse_bool(const char *s, const char *e);
- /**
- * Given a specific name, parses a string of the form:
- * [no-]$NAME[=...]
-- * returning 0 or 1 for a recognised boolean, or -1 for an error.
-+ * returning 0 or 1 for a recognised boolean. Returns -1 for general errors,
-+ * and -2 for "not a boolean, but $NAME= matches".
- */
- int parse_boolean(const char *name, const char *s, const char *e);
-
---
-2.35.1
-
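To make the new contract concrete, here is a self-contained model of the revised semantics together with the values a caller should now expect. This is a simplified re-implementation for illustration (NUL-terminated strings only), not the Xen function itself:

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static int parse_bool_model(const char *v)
{
    if ( !strcmp(v, "yes") || !strcmp(v, "on") || !strcmp(v, "1") )
        return 1;
    if ( !strcmp(v, "no") || !strcmp(v, "off") || !strcmp(v, "0") )
        return 0;
    return -1;
}

/* 1/0 = recognised boolean, -1 = error, -2 = name matched but not a boolean. */
static int parse_boolean_model(const char *name, const char *s)
{
    bool has_neg_prefix = !strncmp(s, "no-", 3);
    size_t nlen = strlen(name);

    if ( has_neg_prefix )
        s += 3;

    if ( strncmp(s, name, nlen) )
        return -1;                           /* different name */

    if ( s[nlen] == '\0' )                   /* exact, unadorned name */
        return !has_neg_prefix;

    if ( has_neg_prefix || s[nlen] != '=' )  /* 'no-$FOO=...' is now rejected */
        return -1;

    int b = parse_bool_model(&s[nlen + 1]);

    return b >= 0 ? b : -2;                  /* -2: name matched, value didn't */
}

int main(void)
{
    printf("%d %d %d %d %d\n",
           parse_boolean_model("rsb", "rsb"),       /*  1 */
           parse_boolean_model("rsb", "no-rsb"),    /*  0 */
           parse_boolean_model("rsb", "rsb=off"),   /*  0 */
           parse_boolean_model("rsb", "no-rsb=on"), /* -1 */
           parse_boolean_model("rsb", "rsb=pv"));   /* -2 */
    return 0;
}
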
diff --git a/0041-x86-spec-ctrl-Add-fine-grained-cmdline-suboptions-fo.patch b/0041-x86-spec-ctrl-Add-fine-grained-cmdline-suboptions-fo.patch
deleted file mode 100644
index 9964bb9..0000000
--- a/0041-x86-spec-ctrl-Add-fine-grained-cmdline-suboptions-fo.patch
+++ /dev/null
@@ -1,137 +0,0 @@
-From f066c8bb3e5686141cef6fa1dc86ea9f37c5388a Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 12 Jul 2022 11:15:37 +0200
-Subject: [PATCH 41/51] x86/spec-ctrl: Add fine-grained cmdline suboptions for
- primitives
-
-Support controlling the PV/HVM suboption of msr-sc/rsb/md-clear, which
-previously wasn't possible.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 27357c394ba6e1571a89105b840ce1c6f026485c
-master date: 2022-07-11 15:21:35 +0100
----
- docs/misc/xen-command-line.pandoc | 12 ++++--
- xen/arch/x86/spec_ctrl.c | 66 ++++++++++++++++++++++++++-----
- 2 files changed, 66 insertions(+), 12 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 46e9c58d35cd..1bbdb55129cc 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2233,7 +2233,8 @@ not be able to control the state of the mitigation.
- By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
-
- ### spec-ctrl (x86)
--> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
-+> `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
-+> {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
- > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
- > eager-fpu,l1d-flush,branch-harden,srb-lock,
- > unpriv-mmio}=<bool> ]`
-@@ -2258,12 +2259,17 @@ in place for guests to use.
-
- Use of a positive boolean value for either of these options is invalid.
-
--The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
-+The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
- grained control over the primitives by Xen. These impact Xen's ability to
--protect itself, and Xen's ability to virtualise support for guests to use.
-+protect itself, and/or Xen's ability to virtualise support for guests to use.
-
- * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
- respectively.
-+* Each other option can be used either as a plain boolean
-+ (e.g. `spec-ctrl=rsb` to control both the PV and HVM sub-options), or with
-+ `pv=` or `hvm=` subsuboptions (e.g. `spec-ctrl=rsb=no-hvm` to disable HVM
-+ RSB only).
-+
- * `msr-sc=` offers control over Xen's support for manipulating `MSR_SPEC_CTRL`
- on entry and exit. These blocks are necessary to virtualise support for
- guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index ae74943c1053..9507e5da60a9 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -147,20 +147,68 @@ static int __init parse_spec_ctrl(const char *s)
- opt_rsb_hvm = val;
- opt_md_clear_hvm = val;
- }
-- else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
-+ else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
- {
-- opt_msr_sc_pv = val;
-- opt_msr_sc_hvm = val;
-+ switch ( val )
-+ {
-+ case 0:
-+ case 1:
-+ opt_msr_sc_pv = opt_msr_sc_hvm = val;
-+ break;
-+
-+ case -2:
-+ s += strlen("msr-sc=");
-+ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
-+ opt_msr_sc_pv = val;
-+ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
-+ opt_msr_sc_hvm = val;
-+ else
-+ default:
-+ rc = -EINVAL;
-+ break;
-+ }
- }
-- else if ( (val = parse_boolean("rsb", s, ss)) >= 0 )
-+ else if ( (val = parse_boolean("rsb", s, ss)) != -1 )
- {
-- opt_rsb_pv = val;
-- opt_rsb_hvm = val;
-+ switch ( val )
-+ {
-+ case 0:
-+ case 1:
-+ opt_rsb_pv = opt_rsb_hvm = val;
-+ break;
-+
-+ case -2:
-+ s += strlen("rsb=");
-+ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
-+ opt_rsb_pv = val;
-+ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
-+ opt_rsb_hvm = val;
-+ else
-+ default:
-+ rc = -EINVAL;
-+ break;
-+ }
- }
-- else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
-+ else if ( (val = parse_boolean("md-clear", s, ss)) != -1 )
- {
-- opt_md_clear_pv = val;
-- opt_md_clear_hvm = val;
-+ switch ( val )
-+ {
-+ case 0:
-+ case 1:
-+ opt_md_clear_pv = opt_md_clear_hvm = val;
-+ break;
-+
-+ case -2:
-+ s += strlen("md-clear=");
-+ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
-+ opt_md_clear_pv = val;
-+ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
-+ opt_md_clear_hvm = val;
-+ else
-+ default:
-+ rc = -EINVAL;
-+ break;
-+ }
- }
-
- /* Xen's speculative sidechannel mitigation settings. */
---
-2.35.1
-
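In practice this means the examples from the documentation hunk above now work as advertised: `spec-ctrl=rsb` still controls the PV and HVM RSB blocks together, `spec-ctrl=rsb=no-hvm` switches off only the HVM side, and the same `pv=`/`hvm=` sub-suboptions apply equally to `msr-sc=` and `md-clear=`.
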
diff --git a/0042-tools-helpers-fix-build-of-xen-init-dom0-with-Werror.patch b/0042-tools-helpers-fix-build-of-xen-init-dom0-with-Werror.patch
deleted file mode 100644
index eea790a..0000000
--- a/0042-tools-helpers-fix-build-of-xen-init-dom0-with-Werror.patch
+++ /dev/null
@@ -1,28 +0,0 @@
-From 14fd97e3de939a63a6e467f240efb49fe226a5dc Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 12 Jul 2022 11:16:10 +0200
-Subject: [PATCH 42/51] tools/helpers: fix build of xen-init-dom0 with -Werror
-
-Missing prototype of asprintf() without _GNU_SOURCE.
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Reviewed-by: Henry Wang <Henry.Wang@arm.com>
-master commit: d693b22733044d68e9974766b5c9e6259c9b1708
-master date: 2022-07-12 08:38:35 +0200
----
- tools/helpers/xen-init-dom0.c | 2 ++
- 1 file changed, 2 insertions(+)
-
-diff --git a/tools/helpers/xen-init-dom0.c b/tools/helpers/xen-init-dom0.c
-index c99224a4b607..b4861c9e8041 100644
---- a/tools/helpers/xen-init-dom0.c
-+++ b/tools/helpers/xen-init-dom0.c
-@@ -1,3 +1,5 @@
-+#define _GNU_SOURCE
-+
- #include <stdlib.h>
- #include <stdint.h>
- #include <string.h>
---
-2.35.1
-
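For reference, the reason this one-line change fixes the build: asprintf() is a GNU extension, so glibc only declares it when _GNU_SOURCE is defined before the first header is included, and without the declaration a -Werror build fails on the implicit declaration. A minimal standalone equivalent:

#define _GNU_SOURCE            /* must precede the includes */

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *path = NULL;

    if ( asprintf(&path, "/local/domain/%u/name", 0u) < 0 )
        return 1;

    puts(path);
    free(path);
    return 0;
}
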
diff --git a/0043-libxl-check-return-value-of-libxl__xs_directory-in-n.patch b/0043-libxl-check-return-value-of-libxl__xs_directory-in-n.patch
deleted file mode 100644
index 0c2470a..0000000
--- a/0043-libxl-check-return-value-of-libxl__xs_directory-in-n.patch
+++ /dev/null
@@ -1,38 +0,0 @@
-From 744accad1b73223b3261e3e678e16e030d83b179 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 12 Jul 2022 11:16:30 +0200
-Subject: [PATCH 43/51] libxl: check return value of libxl__xs_directory in
- name2bdf
-
-libxl__xs_directory() can potentially return NULL without setting `n`.
-As `n` isn't initialised, we need to check libxl__xs_directory()
-return value before checking `n`. Otherwise, `n` might be non-zero
-with `bdfs` NULL which would lead to a segv.
-
-Fixes: 57bff091f4 ("libxl: add 'name' field to 'libxl_device_pci' in the IDL...")
-Reported-by: "G.R." <firemeteor@users.sourceforge.net>
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Tested-by: "G.R." <firemeteor@users.sourceforge.net>
-master commit: d778089ac70e5b8e3bdea0c85fc8c0b9ed0eaf2f
-master date: 2022-07-12 08:38:51 +0200
----
- tools/libs/light/libxl_pci.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
-index 4bbbfe9f168f..ce3bf7c0ae81 100644
---- a/tools/libs/light/libxl_pci.c
-+++ b/tools/libs/light/libxl_pci.c
-@@ -859,7 +859,7 @@ static int name2bdf(libxl__gc *gc, libxl_device_pci *pci)
- int rc = ERROR_NOTFOUND;
-
- bdfs = libxl__xs_directory(gc, XBT_NULL, PCI_INFO_PATH, &n);
-- if (!n)
-+ if (!bdfs || !n)
- goto out;
-
- for (i = 0; i < n; i++) {
---
-2.35.1
-
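
The hazard fixed above is a general one: when a helper returns an array plus an out-parameter count, the count is only meaningful if the array itself is non-NULL. Below is a standalone sketch of the defensive pattern, using a hypothetical list_entries() stand-in rather than libxl__xs_directory(); the path is illustrative.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed array or NULL on failure;
 * on failure *count is deliberately left untouched, mirroring the bug. */
static char **list_entries(const char *path, unsigned int *count)
{
    (void)path;
    (void)count;
    return NULL;                       /* simulate the failure case */
}

static int scan(const char *path)
{
    unsigned int i, n;                 /* n is uninitialised on purpose */
    char **entries = list_entries(path, &n);

    if ( !entries || !n )              /* check the pointer before the count */
        return -1;

    for ( i = 0; i < n; i++ )
        printf("%s\n", entries[i]);

    free(entries);
    return 0;
}

int main(void)
{
    return scan("/vm") ? 1 : 0;
}
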
diff --git a/0044-x86-spec-ctrl-Rework-spec_ctrl_flags-context-switchi.patch b/0044-x86-spec-ctrl-Rework-spec_ctrl_flags-context-switchi.patch
deleted file mode 100644
index d8517f8..0000000
--- a/0044-x86-spec-ctrl-Rework-spec_ctrl_flags-context-switchi.patch
+++ /dev/null
@@ -1,167 +0,0 @@
-From 3a280cbae7022b83af91c27a8e2211ba3b1234f5 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 1 Jul 2022 15:59:40 +0100
-Subject: [PATCH 44/51] x86/spec-ctrl: Rework spec_ctrl_flags context switching
-
-We are shortly going to need to context switch new bits in both the vcpu and
-S3 paths. Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
-into d->arch.spec_ctrl_flags to accommodate.
-
-No functional change.
-
-This is part of XSA-407.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 5796912f7279d9348a3166655588d30eae9f72cc)
----
- xen/arch/x86/acpi/power.c | 8 ++++----
- xen/arch/x86/domain.c | 8 ++++----
- xen/arch/x86/spec_ctrl.c | 9 ++++++---
- xen/include/asm-x86/domain.h | 3 +--
- xen/include/asm-x86/spec_ctrl.h | 30 ++++++++++++++++++++++++++++-
- xen/include/asm-x86/spec_ctrl_asm.h | 3 ---
- 6 files changed, 44 insertions(+), 17 deletions(-)
-
-diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
-index 5eaa77f66a28..dd397f713067 100644
---- a/xen/arch/x86/acpi/power.c
-+++ b/xen/arch/x86/acpi/power.c
-@@ -248,8 +248,8 @@ static int enter_state(u32 state)
- error = 0;
-
- ci = get_cpu_info();
-- /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
-- ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
-+ /* Avoid NMI/#MC using unsafe MSRs until we've reloaded microcode. */
-+ ci->spec_ctrl_flags &= ~SCF_IST_MASK;
-
- ACPI_FLUSH_CPU_CACHE();
-
-@@ -292,8 +292,8 @@ static int enter_state(u32 state)
- if ( !recheck_cpu_features(0) )
- panic("Missing previously available feature(s)\n");
-
-- /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
-- ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
-+ /* Re-enabled default NMI/#MC use of MSRs now microcode is loaded. */
-+ ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_IST_MASK);
-
- if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
- {
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index 1fe6644a71ae..82a0b73cf6ef 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -2092,10 +2092,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
- }
- }
-
-- /* Update the top-of-stack block with the VERW disposition. */
-- info->spec_ctrl_flags &= ~SCF_verw;
-- if ( nextd->arch.verw )
-- info->spec_ctrl_flags |= SCF_verw;
-+ /* Update the top-of-stack block with the new spec_ctrl settings. */
-+ info->spec_ctrl_flags =
-+ (info->spec_ctrl_flags & ~SCF_DOM_MASK) |
-+ (nextd->arch.spec_ctrl_flags & SCF_DOM_MASK);
- }
-
- sched_context_switched(prev, next);
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 9507e5da60a9..7e646680f1c7 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -1010,9 +1010,12 @@ void spec_ctrl_init_domain(struct domain *d)
- {
- bool pv = is_pv_domain(d);
-
-- d->arch.verw =
-- (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-- (opt_fb_clear_mmio && is_iommu_enabled(d));
-+ bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-+ (opt_fb_clear_mmio && is_iommu_enabled(d)));
-+
-+ d->arch.spec_ctrl_flags =
-+ (verw ? SCF_verw : 0) |
-+ 0;
- }
-
- void __init init_speculation_mitigations(void)
-diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
-index 2398a1d99da9..e4c099262cb7 100644
---- a/xen/include/asm-x86/domain.h
-+++ b/xen/include/asm-x86/domain.h
-@@ -319,8 +319,7 @@ struct arch_domain
- uint32_t pci_cf8;
- uint8_t cmos_idx;
-
-- /* Use VERW on return-to-guest for its flushing side effect. */
-- bool verw;
-+ uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
-
- union {
- struct pv_domain pv;
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index 7e83e0179fb9..3cd72e40305f 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -20,12 +20,40 @@
- #ifndef __X86_SPEC_CTRL_H__
- #define __X86_SPEC_CTRL_H__
-
--/* Encoding of cpuinfo.spec_ctrl_flags */
-+/*
-+ * Encoding of:
-+ * cpuinfo.spec_ctrl_flags
-+ * default_spec_ctrl_flags
-+ * domain.spec_ctrl_flags
-+ *
-+ * Live settings are in the top-of-stack block, because they need to be
-+ * accessible when XPTI is active. Some settings are fixed from boot, some
-+ * context switched per domain, and some inhibited in the S3 path.
-+ */
- #define SCF_use_shadow (1 << 0)
- #define SCF_ist_wrmsr (1 << 1)
- #define SCF_ist_rsb (1 << 2)
- #define SCF_verw (1 << 3)
-
-+/*
-+ * The IST paths (NMI/#MC) can interrupt any arbitrary context. Some
-+ * functionality requires updated microcode to work.
-+ *
-+ * On boot, this is easy; we load microcode before figuring out which
-+ * speculative protections to apply. However, on the S3 resume path, we must
-+ * be able to disable the configured mitigations until microcode is reloaded.
-+ *
-+ * These are the controls to inhibit on the S3 resume path until microcode has
-+ * been reloaded.
-+ */
-+#define SCF_IST_MASK (SCF_ist_wrmsr)
-+
-+/*
-+ * Some speculative protections are per-domain. These settings are merged
-+ * into the top-of-stack block in the context switch path.
-+ */
-+#define SCF_DOM_MASK (SCF_verw)
-+
- #ifndef __ASSEMBLY__
-
- #include <asm/alternative.h>
-diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
-index 5a590bac44aa..66b00d511fc6 100644
---- a/xen/include/asm-x86/spec_ctrl_asm.h
-+++ b/xen/include/asm-x86/spec_ctrl_asm.h
-@@ -248,9 +248,6 @@
-
- /*
- * Use in IST interrupt/exception context. May interrupt Xen or PV context.
-- * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
-- * path to avoid using MSR_SPEC_CTRL before the microcode introducing it has
-- * been reloaded.
- */
- .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
- /*
---
-2.35.1
-
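
The mechanism introduced in the patch above is a plain bitmask merge: the domain-controlled bits (SCF_DOM_MASK) of the per-CPU spec_ctrl_flags are replaced on context switch, while the S3-sensitive bits (SCF_IST_MASK) are cleared across suspend and restored once microcode is reloaded. A standalone sketch of the context-switch half; the SCF_* values mirror the hunk above, everything else is illustrative.

#include <stdint.h>
#include <stdio.h>

#define SCF_use_shadow (1u << 0)
#define SCF_ist_wrmsr  (1u << 1)
#define SCF_ist_rsb    (1u << 2)
#define SCF_verw       (1u << 3)

#define SCF_IST_MASK   (SCF_ist_wrmsr)   /* inhibited on the S3 resume path */
#define SCF_DOM_MASK   (SCF_verw)        /* merged per domain on ctxt switch */

struct toy_domain { uint8_t spec_ctrl_flags; };

static uint8_t live_flags;               /* stands in for the top-of-stack block */

static void context_switch_to(const struct toy_domain *next)
{
    /* Keep the non-domain bits, replace the domain-controlled ones. */
    live_flags = (live_flags & ~SCF_DOM_MASK) |
                 (next->spec_ctrl_flags & SCF_DOM_MASK);
}

int main(void)
{
    struct toy_domain d = { .spec_ctrl_flags = SCF_verw };

    live_flags = SCF_ist_wrmsr;          /* e.g. the boot-time default */
    context_switch_to(&d);
    printf("spec_ctrl_flags = %#x\n", live_flags);   /* prints 0xa */
    return 0;
}
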
diff --git a/0045-x86-spec-ctrl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch b/0045-x86-spec-ctrl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch
deleted file mode 100644
index 5b841a6..0000000
--- a/0045-x86-spec-ctrl-Rename-SCF_ist_wrmsr-to-SCF_ist_sc_msr.patch
+++ /dev/null
@@ -1,110 +0,0 @@
-From 31aa2a20bfefc3a8a200da54a56471bf99f9630e Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 28 Jun 2022 14:36:56 +0100
-Subject: [PATCH 45/51] x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr
-
-We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
-ambiguous.
-
-No functional change.
-
-This is part of XSA-407.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 76d6a36f645dfdbad8830559d4d52caf36efc75e)
----
- xen/arch/x86/spec_ctrl.c | 6 +++---
- xen/include/asm-x86/spec_ctrl.h | 4 ++--
- xen/include/asm-x86/spec_ctrl_asm.h | 8 ++++----
- 3 files changed, 9 insertions(+), 9 deletions(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 7e646680f1c7..89f95c083e1b 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -1115,7 +1115,7 @@ void __init init_speculation_mitigations(void)
- {
- if ( opt_msr_sc_pv )
- {
-- default_spec_ctrl_flags |= SCF_ist_wrmsr;
-+ default_spec_ctrl_flags |= SCF_ist_sc_msr;
- setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
- }
-
-@@ -1126,7 +1126,7 @@ void __init init_speculation_mitigations(void)
- * Xen's value is not restored atomically. An early NMI hitting
- * the VMExit path needs to restore Xen's value for safety.
- */
-- default_spec_ctrl_flags |= SCF_ist_wrmsr;
-+ default_spec_ctrl_flags |= SCF_ist_sc_msr;
- setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
- }
- }
-@@ -1139,7 +1139,7 @@ void __init init_speculation_mitigations(void)
- * on real hardware matches the availability of MSR_SPEC_CTRL in the
- * first place.
- *
-- * No need for SCF_ist_wrmsr because Xen's value is restored
-+ * No need for SCF_ist_sc_msr because Xen's value is restored
- * atomically WRT NMIs in the VMExit path.
- *
- * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index 3cd72e40305f..f8f0ac47e759 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -31,7 +31,7 @@
- * context switched per domain, and some inhibited in the S3 path.
- */
- #define SCF_use_shadow (1 << 0)
--#define SCF_ist_wrmsr (1 << 1)
-+#define SCF_ist_sc_msr (1 << 1)
- #define SCF_ist_rsb (1 << 2)
- #define SCF_verw (1 << 3)
-
-@@ -46,7 +46,7 @@
- * These are the controls to inhibit on the S3 resume path until microcode has
- * been reloaded.
- */
--#define SCF_IST_MASK (SCF_ist_wrmsr)
-+#define SCF_IST_MASK (SCF_ist_sc_msr)
-
- /*
- * Some speculative protections are per-domain. These settings are merged
-diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
-index 66b00d511fc6..0ff1b118f882 100644
---- a/xen/include/asm-x86/spec_ctrl_asm.h
-+++ b/xen/include/asm-x86/spec_ctrl_asm.h
-@@ -266,8 +266,8 @@
-
- .L\@_skip_rsb:
-
-- test $SCF_ist_wrmsr, %al
-- jz .L\@_skip_wrmsr
-+ test $SCF_ist_sc_msr, %al
-+ jz .L\@_skip_msr_spec_ctrl
-
- xor %edx, %edx
- testb $3, UREGS_cs(%rsp)
-@@ -290,7 +290,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- * to speculate around the WRMSR. As a result, we need a dispatch
- * serialising instruction in the else clause.
- */
--.L\@_skip_wrmsr:
-+.L\@_skip_msr_spec_ctrl:
- lfence
- UNLIKELY_END(\@_serialise)
- .endm
-@@ -301,7 +301,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- * Requires %rbx=stack_end
- * Clobbers %rax, %rcx, %rdx
- */
-- testb $SCF_ist_wrmsr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-+ testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
- jz .L\@_skip
-
- DO_SPEC_CTRL_EXIT_TO_XEN
---
-2.35.1
-
diff --git a/0046-x86-spec-ctrl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch b/0046-x86-spec-ctrl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch
deleted file mode 100644
index a950639..0000000
--- a/0046-x86-spec-ctrl-Rename-opt_ibpb-to-opt_ibpb_ctxt_switc.patch
+++ /dev/null
@@ -1,97 +0,0 @@
-From e7671561c84322860875745e57b228a7a310f2bf Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Mon, 4 Jul 2022 21:32:17 +0100
-Subject: [PATCH 46/51] x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch
-
-We are about to introduce the use of IBPB at different points in Xen, making
-opt_ibpb ambiguous. Rename it to opt_ibpb_ctxt_switch.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit a8e5ef079d6f5c88c472e3e620db5a8d1402a50d)
----
- xen/arch/x86/domain.c | 2 +-
- xen/arch/x86/spec_ctrl.c | 10 +++++-----
- xen/include/asm-x86/spec_ctrl.h | 2 +-
- 3 files changed, 7 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index 82a0b73cf6ef..0d39981550ca 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -2064,7 +2064,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
-
- ctxt_switch_levelling(next);
-
-- if ( opt_ibpb && !is_idle_domain(nextd) )
-+ if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
- {
- static DEFINE_PER_CPU(unsigned int, last);
- unsigned int *last_id = &this_cpu(last);
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 89f95c083e1b..f4ae36eae2d0 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -54,7 +54,7 @@ int8_t __initdata opt_stibp = -1;
- bool __read_mostly opt_ssbd;
- int8_t __initdata opt_psfd = -1;
-
--bool __read_mostly opt_ibpb = true;
-+bool __read_mostly opt_ibpb_ctxt_switch = true;
- int8_t __read_mostly opt_eager_fpu = -1;
- int8_t __read_mostly opt_l1d_flush = -1;
- static bool __initdata opt_branch_harden = true;
-@@ -117,7 +117,7 @@ static int __init parse_spec_ctrl(const char *s)
-
- opt_thunk = THUNK_JMP;
- opt_ibrs = 0;
-- opt_ibpb = false;
-+ opt_ibpb_ctxt_switch = false;
- opt_ssbd = false;
- opt_l1d_flush = 0;
- opt_branch_harden = false;
-@@ -238,7 +238,7 @@ static int __init parse_spec_ctrl(const char *s)
-
- /* Misc settings. */
- else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-- opt_ibpb = val;
-+ opt_ibpb_ctxt_switch = val;
- else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
- opt_eager_fpu = val;
- else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
-@@ -458,7 +458,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- (opt_tsx & 1) ? " TSX+" : " TSX-",
- !cpu_has_srbds_ctrl ? "" :
- opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-",
-- opt_ibpb ? " IBPB" : "",
-+ opt_ibpb_ctxt_switch ? " IBPB-ctxt" : "",
- opt_l1d_flush ? " L1D_FLUSH" : "",
- opt_md_clear_pv || opt_md_clear_hvm ||
- opt_fb_clear_mmio ? " VERW" : "",
-@@ -1240,7 +1240,7 @@ void __init init_speculation_mitigations(void)
-
- /* Check we have hardware IBPB support before using it... */
- if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-- opt_ibpb = false;
-+ opt_ibpb_ctxt_switch = false;
-
- /* Check whether Eager FPU should be enabled by default. */
- if ( opt_eager_fpu == -1 )
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index f8f0ac47e759..fb4365575620 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -63,7 +63,7 @@
- void init_speculation_mitigations(void);
- void spec_ctrl_init_domain(struct domain *d);
-
--extern bool opt_ibpb;
-+extern bool opt_ibpb_ctxt_switch;
- extern bool opt_ssbd;
- extern int8_t opt_eager_fpu;
- extern int8_t opt_l1d_flush;
---
-2.35.1
-
diff --git a/0047-x86-spec-ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch b/0047-x86-spec-ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch
deleted file mode 100644
index 3ce9fd9..0000000
--- a/0047-x86-spec-ctrl-Rework-SPEC_CTRL_ENTRY_FROM_INTR_IST.patch
+++ /dev/null
@@ -1,106 +0,0 @@
-From 2a9e690a0ad5d54dca4166e089089a07bbe7fc85 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 1 Jul 2022 15:59:40 +0100
-Subject: [PATCH 47/51] x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST
-
-We are shortly going to add a conditional IBPB in this path.
-
-Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering
-it after we're done with its contents. %rbx is available for use, and the
-more normal register to hold preserved information in.
-
-With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
-the adjustment to spec_ctrl_flags.
-
-This leaves no use of %rdx, except as 0 for the upper half of WRMSR. In
-practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
-the foreseeable future, so update the macro entry requirements to state this
-dependency. This marginal optimisation can be revisited if circumstances
-change.
-
-No practical change.
-
-This is part of XSA-407.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit e9b8d31981f184c6539f91ec54bd9cae29cdae36)
----
- xen/arch/x86/x86_64/entry.S | 4 ++--
- xen/include/asm-x86/spec_ctrl_asm.h | 21 ++++++++++-----------
- 2 files changed, 12 insertions(+), 13 deletions(-)
-
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index 2a86938f1f32..a1810bf4d311 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -932,7 +932,7 @@ ENTRY(double_fault)
-
- GET_STACK_END(14)
-
-- SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
-@@ -968,7 +968,7 @@ handle_ist_exception:
-
- GET_STACK_END(14)
-
-- SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
-diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
-index 0ff1b118f882..15e24cde00d1 100644
---- a/xen/include/asm-x86/spec_ctrl_asm.h
-+++ b/xen/include/asm-x86/spec_ctrl_asm.h
-@@ -251,34 +251,33 @@
- */
- .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
- /*
-- * Requires %rsp=regs, %r14=stack_end
-- * Clobbers %rax, %rcx, %rdx
-+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
-+ * Clobbers %rax, %rbx, %rcx, %rdx
- *
- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
- * maybexen=1, but with conditionals rather than alternatives.
- */
-- movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %eax
-+ movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
-
-- test $SCF_ist_rsb, %al
-+ test $SCF_ist_rsb, %bl
- jz .L\@_skip_rsb
-
-- DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */
-+ DO_OVERWRITE_RSB /* Clobbers %rax/%rcx */
-
- .L\@_skip_rsb:
-
-- test $SCF_ist_sc_msr, %al
-+ test $SCF_ist_sc_msr, %bl
- jz .L\@_skip_msr_spec_ctrl
-
-- xor %edx, %edx
-+ xor %eax, %eax
- testb $3, UREGS_cs(%rsp)
-- setnz %dl
-- not %edx
-- and %dl, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
-+ setnz %al
-+ not %eax
-+ and %al, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
-
- /* Load Xen's intended value. */
- mov $MSR_SPEC_CTRL, %ecx
- movzbl STACK_CPUINFO_FIELD(xen_spec_ctrl)(%r14), %eax
-- xor %edx, %edx
- wrmsr
-
- /* Opencoded UNLIKELY_START() with no condition. */
---
-2.35.1
-
diff --git a/0048-x86-spec-ctrl-Support-IBPB-on-entry.patch b/0048-x86-spec-ctrl-Support-IBPB-on-entry.patch
deleted file mode 100644
index d5ad043..0000000
--- a/0048-x86-spec-ctrl-Support-IBPB-on-entry.patch
+++ /dev/null
@@ -1,300 +0,0 @@
-From 76c5fcee9027fb8823dd501086f0ff3ee3c4231c Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 24 Feb 2022 13:44:33 +0000
-Subject: [PATCH 48/51] x86/spec-ctrl: Support IBPB-on-entry
-
-We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
-but as we've talked about using it in other cases too, arrange to support it
-generally. However, this is also very expensive in some cases, so we're going
-to want per-domain controls.
-
-Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
-DOM masks as appropriate. Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
-to patch the code blocks.
-
-For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
-so no "else lfence" is necessary. VT-x will use use the MSR host load list,
-so doesn't need any code in the VMExit path.
-
-For the IST path, we can't safely check CPL==0 to skip a flush, as we might
-have hit an entry path before its IBPB. As IST hitting Xen is rare, flush
-irrespective of CPL. A later path, SCF_ist_sc_msr, provides Spectre-v1
-safety.
-
-For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
-we can safely check CPL==0. Only flush when interrupting guest context.
-
-An "else lfence" is needed for safety, but we want to be able to skip it on
-unaffected CPUs, so the block wants to be an alternative, which means the
-lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
-have displacements fixed up for anything other than the first instruction).
-
-As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
-shrink the logic marginally. Update the comments to specify this new
-dependency.
-
-This is part of XSA-407.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 53a570b285694947776d5190f591a0d5b9b18de7)
----
- xen/arch/x86/hvm/svm/entry.S | 18 ++++++++++-
- xen/arch/x86/hvm/vmx/vmcs.c | 4 +++
- xen/arch/x86/x86_64/compat/entry.S | 2 +-
- xen/arch/x86/x86_64/entry.S | 12 +++----
- xen/include/asm-x86/cpufeatures.h | 2 ++
- xen/include/asm-x86/spec_ctrl.h | 6 ++--
- xen/include/asm-x86/spec_ctrl_asm.h | 49 +++++++++++++++++++++++++++--
- 7 files changed, 81 insertions(+), 12 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
-index 4ae55a2ef605..0ff4008060fa 100644
---- a/xen/arch/x86/hvm/svm/entry.S
-+++ b/xen/arch/x86/hvm/svm/entry.S
-@@ -97,7 +97,19 @@ __UNLIKELY_END(nsvm_hap)
-
- GET_CURRENT(bx)
-
-- /* SPEC_CTRL_ENTRY_FROM_SVM Req: %rsp=regs/cpuinfo Clob: acd */
-+ /* SPEC_CTRL_ENTRY_FROM_SVM Req: %rsp=regs/cpuinfo, %rdx=0 Clob: acd */
-+
-+ .macro svm_vmexit_cond_ibpb
-+ testb $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
-+ jz .L_skip_ibpb
-+
-+ mov $MSR_PRED_CMD, %ecx
-+ mov $PRED_CMD_IBPB, %eax
-+ wrmsr
-+.L_skip_ibpb:
-+ .endm
-+ ALTERNATIVE "", svm_vmexit_cond_ibpb, X86_FEATURE_IBPB_ENTRY_HVM
-+
- ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
-
- .macro svm_vmexit_spec_ctrl
-@@ -114,6 +126,10 @@ __UNLIKELY_END(nsvm_hap)
- ALTERNATIVE "", svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
-+ /*
-+ * STGI is executed unconditionally, and is sufficiently serialising
-+ * to safely resolve any Spectre-v1 concerns in the above logic.
-+ */
- stgi
- GLOBAL(svm_stgi_label)
- mov %rsp,%rdi
-diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
-index f9f9bc18cdbc..dd817cee4e69 100644
---- a/xen/arch/x86/hvm/vmx/vmcs.c
-+++ b/xen/arch/x86/hvm/vmx/vmcs.c
-@@ -1345,6 +1345,10 @@ static int construct_vmcs(struct vcpu *v)
- rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
- VMX_MSR_GUEST_LOADONLY);
-
-+ if ( !rc && (d->arch.spec_ctrl_flags & SCF_entry_ibpb) )
-+ rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
-+ VMX_MSR_HOST);
-+
- out:
- vmx_vmcs_exit(v);
-
-diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
-index 5fd6dbbd4513..b86d38d1c50d 100644
---- a/xen/arch/x86/x86_64/compat/entry.S
-+++ b/xen/arch/x86/x86_64/compat/entry.S
-@@ -18,7 +18,7 @@ ENTRY(entry_int82)
- movl $HYPERCALL_VECTOR, 4(%rsp)
- SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
-
-- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- CR4_PV32_RESTORE
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index a1810bf4d311..fba8ae498f74 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -260,7 +260,7 @@ ENTRY(lstar_enter)
- movl $TRAP_syscall, 4(%rsp)
- SAVE_ALL
-
-- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- GET_STACK_END(bx)
-@@ -298,7 +298,7 @@ ENTRY(cstar_enter)
- movl $TRAP_syscall, 4(%rsp)
- SAVE_ALL
-
-- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- GET_STACK_END(bx)
-@@ -338,7 +338,7 @@ GLOBAL(sysenter_eflags_saved)
- movl $TRAP_syscall, 4(%rsp)
- SAVE_ALL
-
-- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- GET_STACK_END(bx)
-@@ -392,7 +392,7 @@ ENTRY(int80_direct_trap)
- movl $0x80, 4(%rsp)
- SAVE_ALL
-
-- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- GET_STACK_END(bx)
-@@ -674,7 +674,7 @@ ENTRY(common_interrupt)
-
- GET_STACK_END(14)
-
-- SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
-@@ -708,7 +708,7 @@ GLOBAL(handle_exception)
-
- GET_STACK_END(14)
-
-- SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
-+ SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
-
- mov STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
-diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
-index 493d338a085e..672c9ee22ba2 100644
---- a/xen/include/asm-x86/cpufeatures.h
-+++ b/xen/include/asm-x86/cpufeatures.h
-@@ -39,6 +39,8 @@ XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
- XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
- XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
- XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
-+XEN_CPUFEATURE(IBPB_ENTRY_PV, X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
-+XEN_CPUFEATURE(IBPB_ENTRY_HVM, X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
-
- /* Bug words follow the synthetic words. */
- #define X86_NR_BUG 1
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index fb4365575620..3fc599a817c4 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -34,6 +34,8 @@
- #define SCF_ist_sc_msr (1 << 1)
- #define SCF_ist_rsb (1 << 2)
- #define SCF_verw (1 << 3)
-+#define SCF_ist_ibpb (1 << 4)
-+#define SCF_entry_ibpb (1 << 5)
-
- /*
- * The IST paths (NMI/#MC) can interrupt any arbitrary context. Some
-@@ -46,13 +48,13 @@
- * These are the controls to inhibit on the S3 resume path until microcode has
- * been reloaded.
- */
--#define SCF_IST_MASK (SCF_ist_sc_msr)
-+#define SCF_IST_MASK (SCF_ist_sc_msr | SCF_ist_ibpb)
-
- /*
- * Some speculative protections are per-domain. These settings are merged
- * into the top-of-stack block in the context switch path.
- */
--#define SCF_DOM_MASK (SCF_verw)
-+#define SCF_DOM_MASK (SCF_verw | SCF_entry_ibpb)
-
- #ifndef __ASSEMBLY__
-
-diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
-index 15e24cde00d1..9eb4ad9ab71d 100644
---- a/xen/include/asm-x86/spec_ctrl_asm.h
-+++ b/xen/include/asm-x86/spec_ctrl_asm.h
-@@ -88,6 +88,35 @@
- * - SPEC_CTRL_EXIT_TO_{SVM,VMX}
- */
-
-+.macro DO_SPEC_CTRL_COND_IBPB maybexen:req
-+/*
-+ * Requires %rsp=regs (also cpuinfo if !maybexen)
-+ * Requires %r14=stack_end (if maybexen), %rdx=0
-+ * Clobbers %rax, %rcx, %rdx
-+ *
-+ * Conditionally issue IBPB if SCF_entry_ibpb is active. In the maybexen
-+ * case, we can safely look at UREGS_cs to skip taking the hit when
-+ * interrupting Xen.
-+ */
-+ .if \maybexen
-+ testb $SCF_entry_ibpb, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
-+ jz .L\@_skip
-+ testb $3, UREGS_cs(%rsp)
-+ .else
-+ testb $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
-+ .endif
-+ jz .L\@_skip
-+
-+ mov $MSR_PRED_CMD, %ecx
-+ mov $PRED_CMD_IBPB, %eax
-+ wrmsr
-+ jmp .L\@_done
-+
-+.L\@_skip:
-+ lfence
-+.L\@_done:
-+.endm
-+
- .macro DO_OVERWRITE_RSB tmp=rax
- /*
- * Requires nothing
-@@ -225,12 +254,16 @@
-
- /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
- #define SPEC_CTRL_ENTRY_FROM_PV \
-+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0), \
-+ X86_FEATURE_IBPB_ENTRY_PV; \
- ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
- ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0), \
- X86_FEATURE_SC_MSR_PV
-
- /* Use in interrupt/exception context. May interrupt Xen or PV context. */
- #define SPEC_CTRL_ENTRY_FROM_INTR \
-+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1), \
-+ X86_FEATURE_IBPB_ENTRY_PV; \
- ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
- ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
- X86_FEATURE_SC_MSR_PV
-@@ -254,11 +287,23 @@
- * Requires %rsp=regs, %r14=stack_end, %rdx=0
- * Clobbers %rax, %rbx, %rcx, %rdx
- *
-- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
-- * maybexen=1, but with conditionals rather than alternatives.
-+ * This is logical merge of:
-+ * DO_SPEC_CTRL_COND_IBPB maybexen=0
-+ * DO_OVERWRITE_RSB
-+ * DO_SPEC_CTRL_ENTRY maybexen=1
-+ * but with conditionals rather than alternatives.
- */
- movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
-
-+ test $SCF_ist_ibpb, %bl
-+ jz .L\@_skip_ibpb
-+
-+ mov $MSR_PRED_CMD, %ecx
-+ mov $PRED_CMD_IBPB, %eax
-+ wrmsr
-+
-+.L\@_skip_ibpb:
-+
- test $SCF_ist_rsb, %bl
- jz .L\@_skip_rsb
-
---
-2.35.1
-
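
The assembly added above reduces, at the C level, to a conditional MSR write: if SCF_entry_ibpb is set, write PRED_CMD_IBPB to MSR_PRED_CMD, but on paths that may have interrupted Xen itself, skip the barrier when the saved CS shows CPL 0. The sketch below only models that decision; msr_write() is a hypothetical stand-in for the privileged wrmsr in the real entry macros, and only the low two CPL bits of the selector argument matter.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MSR_PRED_CMD    0x00000049u
#define PRED_CMD_IBPB   (1u << 0)
#define SCF_entry_ibpb  (1u << 5)        /* matches the value in the hunk above */

static void msr_write(uint32_t msr, uint64_t val)
{
    /* In Xen this would be a ring-0 wrmsr; here we just trace the decision. */
    printf("wrmsr(%#x, %#llx)\n", msr, (unsigned long long)val);
}

static void cond_ibpb_on_entry(uint8_t spec_ctrl_flags, uint16_t saved_cs,
                               bool maybexen)
{
    if ( !(spec_ctrl_flags & SCF_entry_ibpb) )
        return;

    /* Interrupt/exception paths may interrupt Xen; CPL 0 in the saved CS
     * means no guest code ran since the last flush, so skip the hit. */
    if ( maybexen && (saved_cs & 3) == 0 )
        return;

    msr_write(MSR_PRED_CMD, PRED_CMD_IBPB);
}

int main(void)
{
    cond_ibpb_on_entry(SCF_entry_ibpb, 3 /* CPL 3: interrupted guest */, true);
    cond_ibpb_on_entry(SCF_entry_ibpb, 0 /* CPL 0: interrupted Xen   */, true);
    return 0;
}
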
diff --git a/0049-x86-cpuid-Enumeration-for-BTC_NO.patch b/0049-x86-cpuid-Enumeration-for-BTC_NO.patch
deleted file mode 100644
index 0e5d119..0000000
--- a/0049-x86-cpuid-Enumeration-for-BTC_NO.patch
+++ /dev/null
@@ -1,106 +0,0 @@
-From 0826c7596d35c887b3b7858137c7ac374d9ef17a Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Mon, 16 May 2022 15:48:24 +0100
-Subject: [PATCH 49/51] x86/cpuid: Enumeration for BTC_NO
-
-BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.
-
-Zen3 CPUs don't suffer BTC.
-
-This is part of XSA-407.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 76cb04ad64f3ab9ae785988c40655a71dde9c319)
----
- tools/libs/light/libxl_cpuid.c | 1 +
- tools/misc/xen-cpuid.c | 2 +-
- xen/arch/x86/cpu/amd.c | 10 ++++++++++
- xen/arch/x86/spec_ctrl.c | 5 +++--
- xen/include/public/arch-x86/cpufeatureset.h | 1 +
- 5 files changed, 16 insertions(+), 3 deletions(-)
-
-diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
-index d462f9e421ed..bf6fdee360a9 100644
---- a/tools/libs/light/libxl_cpuid.c
-+++ b/tools/libs/light/libxl_cpuid.c
-@@ -288,6 +288,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
- {"virt-ssbd", 0x80000008, NA, CPUID_REG_EBX, 25, 1},
- {"ssb-no", 0x80000008, NA, CPUID_REG_EBX, 26, 1},
- {"psfd", 0x80000008, NA, CPUID_REG_EBX, 28, 1},
-+ {"btc-no", 0x80000008, NA, CPUID_REG_EBX, 29, 1},
-
- {"nc", 0x80000008, NA, CPUID_REG_ECX, 0, 8},
- {"apicidsize", 0x80000008, NA, CPUID_REG_ECX, 12, 4},
-diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index bc7dcf55757a..fe22f5f5b68b 100644
---- a/tools/misc/xen-cpuid.c
-+++ b/tools/misc/xen-cpuid.c
-@@ -158,7 +158,7 @@ static const char *const str_e8b[32] =
- /* [22] */ [23] = "ppin",
- [24] = "amd-ssbd", [25] = "virt-ssbd",
- [26] = "ssb-no",
-- [28] = "psfd",
-+ [28] = "psfd", [29] = "btc-no",
- };
-
- static const char *const str_7d0[32] =
-diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
-index b3b9a0df5fed..b158e3acb5c7 100644
---- a/xen/arch/x86/cpu/amd.c
-+++ b/xen/arch/x86/cpu/amd.c
-@@ -847,6 +847,16 @@ static void init_amd(struct cpuinfo_x86 *c)
- warning_add(text);
- }
- break;
-+
-+ case 0x19:
-+ /*
-+ * Zen3 (Fam19h model < 0x10) parts are not susceptible to
-+ * Branch Type Confusion, but predate the allocation of the
-+ * BTC_NO bit. Fill it back in if we're not virtualised.
-+ */
-+ if (!cpu_has_hypervisor && !cpu_has(c, X86_FEATURE_BTC_NO))
-+ __set_bit(X86_FEATURE_BTC_NO, c->x86_capability);
-+ break;
- }
-
- display_cacheinfo(c);
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index f4ae36eae2d0..0f101c057f3e 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -388,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- * Hardware read-only information, stating immunity to certain issues, or
- * suggestions of which mitigation to use.
- */
-- printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
-+ printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
- (caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
- (caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "",
- (caps & ARCH_CAPS_RSBA) ? " RSBA" : "",
-@@ -403,7 +403,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS)) ? " IBRS_ALWAYS" : "",
- (e8b & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS)) ? " STIBP_ALWAYS" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_FAST)) ? " IBRS_FAST" : "",
-- (e8b & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
-+ (e8b & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
-+ (e8b & cpufeat_mask(X86_FEATURE_BTC_NO)) ? " BTC_NO" : "");
-
- /* Hardware features which need driving to mitigate issues. */
- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
-diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index 743b857dcd5c..e7b8167800a2 100644
---- a/xen/include/public/arch-x86/cpufeatureset.h
-+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -266,6 +266,7 @@ XEN_CPUFEATURE(AMD_SSBD, 8*32+24) /*S MSR_SPEC_CTRL.SSBD available */
- XEN_CPUFEATURE(VIRT_SSBD, 8*32+25) /* MSR_VIRT_SPEC_CTRL.SSBD */
- XEN_CPUFEATURE(SSB_NO, 8*32+26) /*A Hardware not vulnerable to SSB */
- XEN_CPUFEATURE(PSFD, 8*32+28) /*S MSR_SPEC_CTRL.PSFD */
-+XEN_CPUFEATURE(BTC_NO, 8*32+29) /*A Hardware not vulnerable to Branch Type Confusion */
-
- /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
- XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A AVX512 Neural Network Instructions */
---
-2.35.1
-
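
For reference on the enumeration above: BTC_NO is CPUID leaf 0x80000008, EBX bit 29. A small userspace probe using the GCC/clang <cpuid.h> helper could read it as below; note that, as the commit message explains, unvirtualised Zen3 parts predate the bit, so Xen synthesises it and raw CPUID on such hardware may still report 0.

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx) )
    {
        puts("CPUID leaf 0x80000008 not available");
        return 1;
    }

    printf("BTC_NO: %s\n", (ebx & (1u << 29)) ? "yes" : "no");
    return 0;
}
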
diff --git a/0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch b/0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch
deleted file mode 100644
index c83844d..0000000
--- a/0050-x86-spec-ctrl-Enable-Zen2-chickenbit.patch
+++ /dev/null
@@ -1,106 +0,0 @@
-From 5457a6870eb1369b868f7b8e833966ed43a773ad Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 15 Mar 2022 18:30:25 +0000
-Subject: [PATCH 50/51] x86/spec-ctrl: Enable Zen2 chickenbit
-
-... as instructed in the Branch Type Confusion whitepaper.
-
-This is part of XSA-407.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit 9deaf2d932f08c16c6b96a1c426e4b1142c0cdbe)
----
- xen/arch/x86/cpu/amd.c | 28 ++++++++++++++++++++++++++++
- xen/arch/x86/cpu/cpu.h | 1 +
- xen/arch/x86/cpu/hygon.c | 6 ++++++
- xen/include/asm-x86/msr-index.h | 1 +
- 4 files changed, 36 insertions(+)
-
-diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
-index b158e3acb5c7..37ac84ddd74d 100644
---- a/xen/arch/x86/cpu/amd.c
-+++ b/xen/arch/x86/cpu/amd.c
-@@ -731,6 +731,31 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
- printk_once(XENLOG_ERR "No SSBD controls available\n");
- }
-
-+/*
-+ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
-+ *
-+ * Refer to the AMD Branch Type Confusion whitepaper:
-+ * https://XXX
-+ *
-+ * Setting this unnamed bit supposedly causes prediction information on
-+ * non-branch instructions to be ignored. It is to be set unilaterally in
-+ * newer microcode.
-+ *
-+ * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
-+ * simple model number comparison, so use STIBP as a heuristic to separate the
-+ * two uarches in Fam17h(AMD)/18h(Hygon).
-+ */
-+void amd_init_spectral_chicken(void)
-+{
-+ uint64_t val, chickenbit = 1 << 1;
-+
-+ if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
-+ return;
-+
-+ if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
-+ wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
-+}
-+
- void __init detect_zen2_null_seg_behaviour(void)
- {
- uint64_t base;
-@@ -796,6 +821,9 @@ static void init_amd(struct cpuinfo_x86 *c)
-
- amd_init_ssbd(c);
-
-+ if (c->x86 == 0x17)
-+ amd_init_spectral_chicken();
-+
- /* Probe for NSCB on Zen2 CPUs when not virtualised */
- if (!cpu_has_hypervisor && !cpu_has_nscb && c == &boot_cpu_data &&
- c->x86 == 0x17)
-diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
-index b593bd85f04f..145bc5156a86 100644
---- a/xen/arch/x86/cpu/cpu.h
-+++ b/xen/arch/x86/cpu/cpu.h
-@@ -22,4 +22,5 @@ void early_init_amd(struct cpuinfo_x86 *c);
- void amd_log_freq(const struct cpuinfo_x86 *c);
- void amd_init_lfence(struct cpuinfo_x86 *c);
- void amd_init_ssbd(const struct cpuinfo_x86 *c);
-+void amd_init_spectral_chicken(void);
- void detect_zen2_null_seg_behaviour(void);
-diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
-index cdc94130dd2e..6f8d491297e8 100644
---- a/xen/arch/x86/cpu/hygon.c
-+++ b/xen/arch/x86/cpu/hygon.c
-@@ -40,6 +40,12 @@ static void init_hygon(struct cpuinfo_x86 *c)
- c->x86 == 0x18)
- detect_zen2_null_seg_behaviour();
-
-+ /*
-+ * TODO: Check heuristic safety with Hygon first
-+ if (c->x86 == 0x18)
-+ amd_init_spectral_chicken();
-+ */
-+
- /*
- * Hygon CPUs before Zen2 don't clear segment bases/limits when
- * loading a NULL selector.
-diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
-index 72bc32ba04ff..d3735e499e0f 100644
---- a/xen/include/asm-x86/msr-index.h
-+++ b/xen/include/asm-x86/msr-index.h
-@@ -361,6 +361,7 @@
- #define MSR_AMD64_DE_CFG 0xc0011029
- #define AMD64_DE_CFG_LFENCE_SERIALISE (_AC(1, ULL) << 1)
- #define MSR_AMD64_EX_CFG 0xc001102c
-+#define MSR_AMD64_DE_CFG2 0xc00110e3
-
- #define MSR_AMD64_DR0_ADDRESS_MASK 0xc0011027
- #define MSR_AMD64_DR1_ADDRESS_MASK 0xc0011019
---
-2.35.1
-
diff --git a/0051-x86-spec-ctrl-Mitigate-Branch-Type-Confusion-when-po.patch b/0051-x86-spec-ctrl-Mitigate-Branch-Type-Confusion-when-po.patch
deleted file mode 100644
index e313ede..0000000
--- a/0051-x86-spec-ctrl-Mitigate-Branch-Type-Confusion-when-po.patch
+++ /dev/null
@@ -1,305 +0,0 @@
-From 0a5387a01165b46c8c85e7f7e2ddbe60a7f5db44 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Mon, 27 Jun 2022 19:29:40 +0100
-Subject: [PATCH 51/51] x86/spec-ctrl: Mitigate Branch Type Confusion when
- possible
-
-Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier. To
-mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue
-an IBPB on each entry to Xen, to flush the BTB.
-
-Due to performance concerns, dom0 (which is trusted in most configurations) is
-excluded from protections by default.
-
-Therefore:
- * Use STIBP by default on Zen2 too, which now means we want it on by default
- on all hardware supporting STIBP.
- * Break the current IBPB logic out into a new function, extending it with
- IBPB-at-entry logic.
- * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
- it by default when IBPB-at-entry is providing sufficient safety.
-
-If all PV guests on the system are trusted, then it is recommended to boot
-with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
-perf improvement.
-
-This is part of XSA-407 / CVE-2022-23825.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit d8cb7e0f069e0f106d24941355b59b45a731eabe)
----
- docs/misc/xen-command-line.pandoc | 14 ++--
- xen/arch/x86/spec_ctrl.c | 113 ++++++++++++++++++++++++++----
- xen/include/asm-x86/spec_ctrl.h | 2 +-
- 3 files changed, 112 insertions(+), 17 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 1bbdb55129cc..bd6826d0ae05 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2234,7 +2234,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
-
- ### spec-ctrl (x86)
- > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
--> {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
-+> {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
- > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
- > eager-fpu,l1d-flush,branch-harden,srb-lock,
- > unpriv-mmio}=<bool> ]`
-@@ -2259,9 +2259,10 @@ in place for guests to use.
-
- Use of a positive boolean value for either of these options is invalid.
-
--The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
--grained control over the primitives by Xen. These impact Xen's ability to
--protect itself, and/or Xen's ability to virtualise support for guests to use.
-+The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
-+offer fine grained control over the primitives by Xen. These impact Xen's
-+ability to protect itself, and/or Xen's ability to virtualise support for
-+guests to use.
-
- * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
- respectively.
-@@ -2280,6 +2281,11 @@ protect itself, and/or Xen's ability to virtualise support for guests to use.
- compatibility with development versions of this fix, `mds=` is also accepted
- on Xen 4.12 and earlier as an alias. Consult vendor documentation in
- preference to here.*
-+* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
-+ Barrier) is used on entry to Xen. This is used by default on hardware
-+ vulnerable to Branch Type Confusion, but for performance reasons, dom0 is
-+ unprotected by default. If it is necessary to protect dom0 too, boot with
-+ `spec-ctrl=ibpb-entry`.
-
- If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
- select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 0f101c057f3e..1d9796c34d71 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -39,6 +39,10 @@ static bool __initdata opt_rsb_hvm = true;
- static int8_t __read_mostly opt_md_clear_pv = -1;
- static int8_t __read_mostly opt_md_clear_hvm = -1;
-
-+static int8_t __read_mostly opt_ibpb_entry_pv = -1;
-+static int8_t __read_mostly opt_ibpb_entry_hvm = -1;
-+static bool __read_mostly opt_ibpb_entry_dom0;
-+
- /* Cmdline controls for Xen's speculative settings. */
- static enum ind_thunk {
- THUNK_DEFAULT, /* Decide which thunk to use at boot time. */
-@@ -54,7 +58,7 @@ int8_t __initdata opt_stibp = -1;
- bool __read_mostly opt_ssbd;
- int8_t __initdata opt_psfd = -1;
-
--bool __read_mostly opt_ibpb_ctxt_switch = true;
-+int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
- int8_t __read_mostly opt_eager_fpu = -1;
- int8_t __read_mostly opt_l1d_flush = -1;
- static bool __initdata opt_branch_harden = true;
-@@ -114,6 +118,9 @@ static int __init parse_spec_ctrl(const char *s)
- opt_rsb_hvm = false;
- opt_md_clear_pv = 0;
- opt_md_clear_hvm = 0;
-+ opt_ibpb_entry_pv = 0;
-+ opt_ibpb_entry_hvm = 0;
-+ opt_ibpb_entry_dom0 = false;
-
- opt_thunk = THUNK_JMP;
- opt_ibrs = 0;
-@@ -140,12 +147,14 @@ static int __init parse_spec_ctrl(const char *s)
- opt_msr_sc_pv = val;
- opt_rsb_pv = val;
- opt_md_clear_pv = val;
-+ opt_ibpb_entry_pv = val;
- }
- else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
- {
- opt_msr_sc_hvm = val;
- opt_rsb_hvm = val;
- opt_md_clear_hvm = val;
-+ opt_ibpb_entry_hvm = val;
- }
- else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
- {
-@@ -210,6 +219,28 @@ static int __init parse_spec_ctrl(const char *s)
- break;
- }
- }
-+ else if ( (val = parse_boolean("ibpb-entry", s, ss)) != -1 )
-+ {
-+ switch ( val )
-+ {
-+ case 0:
-+ case 1:
-+ opt_ibpb_entry_pv = opt_ibpb_entry_hvm =
-+ opt_ibpb_entry_dom0 = val;
-+ break;
-+
-+ case -2:
-+ s += strlen("ibpb-entry=");
-+ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
-+ opt_ibpb_entry_pv = val;
-+ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
-+ opt_ibpb_entry_hvm = val;
-+ else
-+ default:
-+ rc = -EINVAL;
-+ break;
-+ }
-+ }
-
- /* Xen's speculative sidechannel mitigation settings. */
- else if ( !strncmp(s, "bti-thunk=", 10) )
-@@ -477,27 +508,31 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- * mitigation support for guests.
- */
- #ifdef CONFIG_HVM
-- printk(" Support for HVM VMs:%s%s%s%s%s\n",
-+ printk(" Support for HVM VMs:%s%s%s%s%s%s\n",
- (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
- boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
- boot_cpu_has(X86_FEATURE_MD_CLEAR) ||
-+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
- opt_eager_fpu) ? "" : " None",
- boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ? " MSR_SPEC_CTRL" : "",
- boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ? " RSB" : "",
- opt_eager_fpu ? " EAGER_FPU" : "",
-- boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "");
-+ boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "",
-+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ? " IBPB-entry" : "");
-
- #endif
- #ifdef CONFIG_PV
-- printk(" Support for PV VMs:%s%s%s%s%s\n",
-+ printk(" Support for PV VMs:%s%s%s%s%s%s\n",
- (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
- boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
- boot_cpu_has(X86_FEATURE_MD_CLEAR) ||
-+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
- opt_eager_fpu) ? "" : " None",
- boot_cpu_has(X86_FEATURE_SC_MSR_PV) ? " MSR_SPEC_CTRL" : "",
- boot_cpu_has(X86_FEATURE_SC_RSB_PV) ? " RSB" : "",
- opt_eager_fpu ? " EAGER_FPU" : "",
-- boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "");
-+ boot_cpu_has(X86_FEATURE_MD_CLEAR) ? " MD_CLEAR" : "",
-+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ? " IBPB-entry" : "");
-
- printk(" XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
- opt_xpti_hwdom ? "enabled" : "disabled",
-@@ -759,6 +794,55 @@ static bool __init should_use_eager_fpu(void)
- }
- }
-
-+static void __init ibpb_calculations(void)
-+{
-+ /* Check we have hardware IBPB support before using it... */
-+ if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-+ {
-+ opt_ibpb_entry_hvm = opt_ibpb_entry_pv = opt_ibpb_ctxt_switch = 0;
-+ opt_ibpb_entry_dom0 = false;
-+ return;
-+ }
-+
-+ /*
-+ * IBPB-on-entry mitigations for Branch Type Confusion.
-+ *
-+ * IBPB && !BTC_NO selects all AMD/Hygon hardware, not known to be safe,
-+ * that we can provide some form of mitigation on.
-+ */
-+ if ( opt_ibpb_entry_pv == -1 )
-+ opt_ibpb_entry_pv = (IS_ENABLED(CONFIG_PV) &&
-+ boot_cpu_has(X86_FEATURE_IBPB) &&
-+ !boot_cpu_has(X86_FEATURE_BTC_NO));
-+ if ( opt_ibpb_entry_hvm == -1 )
-+ opt_ibpb_entry_hvm = (IS_ENABLED(CONFIG_HVM) &&
-+ boot_cpu_has(X86_FEATURE_IBPB) &&
-+ !boot_cpu_has(X86_FEATURE_BTC_NO));
-+
-+ if ( opt_ibpb_entry_pv )
-+ {
-+ setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_PV);
-+
-+ /*
-+ * We only need to flush in IST context if we're protecting against PV
-+ * guests. HVM IBPB-on-entry protections are both atomic with
-+ * NMI/#MC, so can't interrupt Xen ahead of having already flushed the
-+ * BTB.
-+ */
-+ default_spec_ctrl_flags |= SCF_ist_ibpb;
-+ }
-+ if ( opt_ibpb_entry_hvm )
-+ setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_HVM);
-+
-+ /*
-+ * If we're using IBPB-on-entry to protect against PV and HVM guests
-+ * (ignoring dom0 if trusted), then there's no need to also issue IBPB on
-+ * context switch too.
-+ */
-+ if ( opt_ibpb_ctxt_switch == -1 )
-+ opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
-+}
-+
- /* Calculate whether this CPU is vulnerable to L1TF. */
- static __init void l1tf_calculations(uint64_t caps)
- {
-@@ -1014,8 +1098,12 @@ void spec_ctrl_init_domain(struct domain *d)
- bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
- (opt_fb_clear_mmio && is_iommu_enabled(d)));
-
-+ bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
-+ (d->domain_id != 0 || opt_ibpb_entry_dom0));
-+
- d->arch.spec_ctrl_flags =
- (verw ? SCF_verw : 0) |
-+ (ibpb ? SCF_entry_ibpb : 0) |
- 0;
- }
-
-@@ -1162,12 +1250,15 @@ void __init init_speculation_mitigations(void)
- }
-
- /*
-- * Use STIBP by default if the hardware hint is set. Otherwise, leave it
-- * off as it a severe performance pentalty on pre-eIBRS Intel hardware
-- * where it was retrofitted in microcode.
-+ * Use STIBP by default on all AMD systems. Zen3 and later enumerate
-+ * STIBP_ALWAYS, but STIBP is needed on Zen2 as part of the mitigations
-+ * for Branch Type Confusion.
-+ *
-+ * Leave STIBP off by default on Intel. Pre-eIBRS systems suffer a
-+ * substantial perf hit when it was implemented in microcode.
- */
- if ( opt_stibp == -1 )
-- opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
-+ opt_stibp = !!boot_cpu_has(X86_FEATURE_AMD_STIBP);
-
- if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
- boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
-@@ -1239,9 +1330,7 @@ void __init init_speculation_mitigations(void)
- if ( opt_rsb_hvm )
- setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
-
-- /* Check we have hardware IBPB support before using it... */
-- if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-- opt_ibpb_ctxt_switch = false;
-+ ibpb_calculations();
-
- /* Check whether Eager FPU should be enabled by default. */
- if ( opt_eager_fpu == -1 )
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index 3fc599a817c4..9403b81dc7af 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -65,7 +65,7 @@
- void init_speculation_mitigations(void);
- void spec_ctrl_init_domain(struct domain *d);
-
--extern bool opt_ibpb_ctxt_switch;
-+extern int8_t opt_ibpb_ctxt_switch;
- extern bool opt_ssbd;
- extern int8_t opt_eager_fpu;
- extern int8_t opt_l1d_flush;
---
-2.35.1
-
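
As a usage note for the new `ibpb-entry=` control documented in the patch above: on a host where every PV guest is trusted, the setting recommended in the commit message would typically be appended to Xen's boot options. How that is done depends on the bootloader; with a GRUB2 setup that sources /etc/default/grub it might look like the following (the variable name and file are distribution conventions, not part of the patch).

# /etc/default/grub (illustrative)
# Keep IBPB-on-entry for HVM guests, skip it for (trusted) PV guests:
GRUB_CMDLINE_XEN_DEFAULT="spec-ctrl=ibpb-entry=no-pv"

After editing, regenerate grub.cfg (e.g. grub-mkconfig -o /boot/grub/grub.cfg) and reboot for the new mitigation selection to take effect.
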
diff --git a/info.txt b/info.txt
index e830829..d2c53b1 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #1 for 4.16.2-pre
+Xen upstream patchset #0 for 4.16.3-pre
Containing patches from
-RELEASE-4.16.1 (13fee86475f3831d7a1ecf6d7e0acbc2ac779f7e)
+RELEASE-4.16.2 (1871bd1c9eb934f0ffd039f3d68e42fd0097f322)
to
-staging-4.16 (0a5387a01165b46c8c85e7f7e2ddbe60a7f5db44)
+staging-4.16 (1bce7fb1f702da4f7a749c6f1457ecb20bf74fca)
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2022-11-09 8:53 Florian Schmaus
0 siblings, 0 replies; 11+ messages in thread
From: Florian Schmaus @ 2022-11-09 8:53 UTC (permalink / raw
To: gentoo-commits
commit: fac86a27853d2f21c62fefcba9cca32e3b9bdcdc
Author: Florian Schmaus <flow <AT> gentoo <DOT> org>
AuthorDate: Wed Nov 9 08:53:02 2022 +0000
Commit: Florian Schmaus <flow <AT> gentoo <DOT> org>
CommitDate: Wed Nov 9 08:53:02 2022 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=fac86a27
Xen 4.16.3-pre-patchset-1
Signed-off-by: Florian Schmaus <flow <AT> gentoo.org>
0001-update-Xen-version-to-4.16.3-pre.patch | 4 +-
...-Prevent-adding-mapping-when-domain-is-dy.patch | 4 +-
...-Handle-preemption-when-freeing-intermedi.patch | 4 +-
...-option-to-skip-root-pagetable-removal-in.patch | 4 +-
...just-monitor-table-related-error-handling.patch | 4 +-
...tolerate-failure-of-sh_set_toplevel_shado.patch | 4 +-
...hadow-tolerate-failure-in-shadow_prealloc.patch | 4 +-
...-refuse-new-allocations-for-dying-domains.patch | 4 +-
...ly-free-paging-pool-memory-for-dying-doma.patch | 4 +-
...-free-the-paging-memory-pool-preemptively.patch | 4 +-
...en-x86-p2m-Add-preemption-in-p2m_teardown.patch | 4 +-
...s-Use-arch-specific-default-paging-memory.patch | 4 +-
...m-Construct-the-P2M-pages-pool-for-guests.patch | 4 +-
...xl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch | 4 +-
...ocate-and-free-P2M-pages-from-the-P2M-poo.patch | 4 +-
...ect-locking-on-transitive-grant-copy-erro.patch | 4 +-
...-Replace-deprecated-soundhw-on-QEMU-comma.patch | 4 +-
...urface-suitable-value-in-EBX-of-XSTATE-su.patch | 4 +-
...ed-introduce-cpupool_update_node_affinity.patch | 4 +-
...arve-out-memory-allocation-and-freeing-fr.patch | 4 +-
0021-xen-sched-fix-cpu-hotplug.patch | 4 +-
...orrect-PIE-related-option-s-in-EMBEDDED_E.patch | 4 +-
...ore-minor-fix-of-the-migration-stream-doc.patch | 4 +-
0024-xen-gnttab-fix-gnttab_acquire_resource.patch | 4 +-
...-VCPUOP_register_vcpu_time_memory_area-fo.patch | 4 +-
...-x86-vpmu-Fix-race-condition-in-vpmu_load.patch | 4 +-
0027-arm-p2m-Rework-p2m_init.patch | 88 ++
...-Populate-pages-for-GICv2-mapping-in-p2m_.patch | 169 ++++
0029-x86emul-respect-NSCB.patch | 40 +
...correct-error-handling-in-vmx_create_vmcs.patch | 38 +
...-argo-Remove-reachable-ASSERT_UNREACHABLE.patch | 41 +
...onvert-memory-marked-for-runtime-use-to-o.patch | 64 ++
0033-xen-sched-fix-race-in-RTDS-scheduler.patch | 42 +
...-fix-restore_vcpu_affinity-by-removing-it.patch | 158 ++++
...-x86-shadow-drop-replace-bogus-assertions.patch | 71 ++
...assume-that-vpci-per-device-data-exists-u.patch | 60 ++
...pci-msix-remove-from-table-list-on-detach.patch | 47 ++
...p-secondary-time-area-handles-during-soft.patch | 49 ++
...vcpu_info-wants-to-unshare-the-underlying.patch | 41 +
...-correctly-ignore-empty-onlining-requests.patch | 43 +
...m-correct-ballooning-up-for-compat-guests.patch | 55 ++
...-correct-ballooning-down-for-compat-guest.patch | 72 ++
...ert-VMX-use-a-single-global-APIC-access-p.patch | 259 ++++++
...ore-create_node-Don-t-defer-work-to-undo-.patch | 120 +++
...ore-Fail-a-transaction-if-it-is-not-possi.patch | 145 ++++
0046-tools-xenstore-split-up-send_reply.patch | 213 +++++
...ore-add-helpers-to-free-struct-buffered_d.patch | 117 +++
...ls-xenstore-reduce-number-of-watch-events.patch | 201 +++++
...xenstore-let-unread-watch-events-time-out.patch | 309 +++++++
...tools-xenstore-limit-outstanding-requests.patch | 453 +++++++++++
...ore-don-t-buffer-multiple-identical-watch.patch | 93 +++
0052-tools-xenstore-fix-connection-id-usage.patch | 61 ++
...ore-simplify-and-fix-per-domain-node-acco.patch | 336 ++++++++
...ore-limit-max-number-of-nodes-accessed-in.patch | 255 ++++++
...ore-move-the-call-of-setup_structure-to-d.patch | 96 +++
...ore-add-infrastructure-to-keep-track-of-p.patch | 289 +++++++
...store-add-memory-accounting-for-responses.patch | 82 ++
...enstore-add-memory-accounting-for-watches.patch | 96 +++
...-xenstore-add-memory-accounting-for-nodes.patch | 342 ++++++++
...-xenstore-add-exports-for-quota-variables.patch | 62 ++
...ore-add-control-command-for-setting-and-s.patch | 248 ++++++
...-xenstored-Synchronise-defaults-with-oxen.patch | 63 ++
...-xenstored-Check-for-maxrequests-before-p.patch | 101 +++
0064-tools-ocaml-GC-parameter-tuning.patch | 126 +++
0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch | 92 +++
...-Change-Xb.input-to-return-Packet.t-optio.patch | 224 ++++++
0067-tools-ocaml-xb-Add-BoundedQueue.patch | 133 +++
...-Limit-maximum-in-flight-requests-outstan.patch | 888 +++++++++++++++++++++
...clarify-support-of-untrusted-driver-domai.patch | 55 ++
...ore-don-t-use-conn-in-as-context-for-temp.patch | 718 +++++++++++++++++
...ls-xenstore-fix-checking-node-permissions.patch | 143 ++++
...tore-remove-recursion-from-construct_node.patch | 125 +++
...ore-don-t-let-remove_child_entry-call-cor.patch | 110 +++
...ls-xenstore-add-generic-treewalk-function.patch | 250 ++++++
0075-tools-xenstore-simplify-check_store.patch | 114 +++
...ols-xenstore-use-treewalk-for-check_store.patch | 172 ++++
...-xenstore-use-treewalk-for-deleting-nodes.patch | 180 +++++
...ore-use-treewalk-for-creating-node-record.patch | 169 ++++
...ore-remove-nodes-owned-by-destroyed-domai.patch | 298 +++++++
...ore-make-the-internal-memory-data-base-th.patch | 101 +++
...e-xenstore.txt-with-permissions-descripti.patch | 50 ++
...-xenstored-Fix-quota-bypass-on-domain-shu.patch | 93 +++
...caml-Ensure-packet-size-is-never-negative.patch | 75 ++
...xenstore-fix-deleting-node-in-transaction.patch | 46 ++
...ore-harden-transaction-finalization-again.patch | 410 ++++++++++
0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch | 82 ++
...rl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch | 113 +++
info.txt | 4 +-
88 files changed, 9840 insertions(+), 54 deletions(-)
diff --git a/0001-update-Xen-version-to-4.16.3-pre.patch b/0001-update-Xen-version-to-4.16.3-pre.patch
index 6ae690c..d04dd34 100644
--- a/0001-update-Xen-version-to-4.16.3-pre.patch
+++ b/0001-update-Xen-version-to-4.16.3-pre.patch
@@ -1,7 +1,7 @@
From 4aa32912ebeda8cb94d1c3941e7f1f0a2d4f921b Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 11 Oct 2022 14:49:41 +0200
-Subject: [PATCH 01/26] update Xen version to 4.16.3-pre
+Subject: [PATCH 01/87] update Xen version to 4.16.3-pre
---
xen/Makefile | 2 +-
@@ -21,5 +21,5 @@ index 76d0a3ff253f..8a403ee896cd 100644
-include xen-version
--
-2.37.3
+2.37.4
diff --git a/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch b/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch
index fecc260..63aa293 100644
--- a/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch
+++ b/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch
@@ -1,7 +1,7 @@
From 8d9531a3421dad2b0012e09e6f41d5274e162064 Mon Sep 17 00:00:00 2001
From: Julien Grall <jgrall@amazon.com>
Date: Tue, 11 Oct 2022 14:52:13 +0200
-Subject: [PATCH 02/26] xen/arm: p2m: Prevent adding mapping when domain is
+Subject: [PATCH 02/87] xen/arm: p2m: Prevent adding mapping when domain is
dying
During the domain destroy process, the domain will still be accessible
@@ -58,5 +58,5 @@ index 3349b464a39e..1affdafadbeb 100644
start = p2m->lowest_mapped_gfn;
--
-2.37.3
+2.37.4
diff --git a/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch b/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch
index 3190db8..0b33b0a 100644
--- a/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch
+++ b/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch
@@ -1,7 +1,7 @@
From 937fdbad5180440888f1fcee46299103327efa90 Mon Sep 17 00:00:00 2001
From: Julien Grall <jgrall@amazon.com>
Date: Tue, 11 Oct 2022 14:52:27 +0200
-Subject: [PATCH 03/26] xen/arm: p2m: Handle preemption when freeing
+Subject: [PATCH 03/87] xen/arm: p2m: Handle preemption when freeing
intermediate page tables
At the moment the P2M page tables will be freed when the domain structure
@@ -163,5 +163,5 @@ index 8f11d9c97b5d..b3ba83283e11 100644
/*
* Remove mapping refcount on each mapping page in the p2m
--
-2.37.3
+2.37.4
diff --git a/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch b/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch
index b3edbd9..04c002b 100644
--- a/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch
+++ b/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch
@@ -1,7 +1,7 @@
From 8fc19c143b8aa563077f3d5c46fcc0a54dc04f35 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 11 Oct 2022 14:52:39 +0200
-Subject: [PATCH 04/26] x86/p2m: add option to skip root pagetable removal in
+Subject: [PATCH 04/87] x86/p2m: add option to skip root pagetable removal in
p2m_teardown()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -134,5 +134,5 @@ index f2af7a746ced..c3c16748e7d5 100644
/* Add a page to a domain's p2m table */
--
-2.37.3
+2.37.4
diff --git a/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch b/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch
index 33ab1ad..0f48084 100644
--- a/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch
+++ b/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch
@@ -1,7 +1,7 @@
From 3422c19d85a3d23a9d798eafb739ffb8865522d2 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 11 Oct 2022 14:52:59 +0200
-Subject: [PATCH 05/26] x86/HAP: adjust monitor table related error handling
+Subject: [PATCH 05/87] x86/HAP: adjust monitor table related error handling
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -73,5 +73,5 @@ index a8f5a19da917..d75dc2b9ed3d 100644
put_gfn(d, cr3_gfn);
}
--
-2.37.3
+2.37.4
diff --git a/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch b/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch
index bbae48b..b9439ca 100644
--- a/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch
+++ b/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch
@@ -1,7 +1,7 @@
From 40e9daf6b56ae49bda3ba4e254ccf0e998e52a8c Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 11 Oct 2022 14:53:12 +0200
-Subject: [PATCH 06/26] x86/shadow: tolerate failure of
+Subject: [PATCH 06/87] x86/shadow: tolerate failure of
sh_set_toplevel_shadow()
Subsequently sh_set_toplevel_shadow() will be adjusted to install a
@@ -72,5 +72,5 @@ index 7b8f4dd13b03..2ff78fe3362c 100644
#error This should never happen
#endif
--
-2.37.3
+2.37.4
diff --git a/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch b/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch
index 5e2f8ab..d288a0b 100644
--- a/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch
+++ b/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch
@@ -1,7 +1,7 @@
From 28d3f677ec97c98154311f64871ac48762cf980a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 11 Oct 2022 14:53:27 +0200
-Subject: [PATCH 07/26] x86/shadow: tolerate failure in shadow_prealloc()
+Subject: [PATCH 07/87] x86/shadow: tolerate failure in shadow_prealloc()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -275,5 +275,5 @@ index 35efb1b984fb..738214f75e8d 100644
u32 shadow_type,
unsigned long backpointer);
--
-2.37.3
+2.37.4
diff --git a/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch b/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch
index 70b5cc9..d89d5b9 100644
--- a/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch
+++ b/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch
@@ -1,7 +1,7 @@
From 745e0b300dc3f5000e6d48c273b405d4bcc29ba7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 11 Oct 2022 14:53:41 +0200
-Subject: [PATCH 08/26] x86/p2m: refuse new allocations for dying domains
+Subject: [PATCH 08/87] x86/p2m: refuse new allocations for dying domains
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -96,5 +96,5 @@ index 2067c7d16bb4..9807f6ec6c00 100644
* paging lock) and the log-dirty code (which always does). */
paging_lock_recursive(d);
--
-2.37.3
+2.37.4
diff --git a/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch b/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch
index 07e63ac..57620cd 100644
--- a/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch
+++ b/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch
@@ -1,7 +1,7 @@
From 943635d8f8486209e4e48966507ad57963e96284 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 11 Oct 2022 14:54:00 +0200
-Subject: [PATCH 09/26] x86/p2m: truly free paging pool memory for dying
+Subject: [PATCH 09/87] x86/p2m: truly free paging pool memory for dying
domains
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -111,5 +111,5 @@ index 9807f6ec6c00..9eb33eafc7f7 100644
paging_unlock(d);
}
--
-2.37.3
+2.37.4
diff --git a/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch b/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch
index 59c6940..8c80e31 100644
--- a/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch
+++ b/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch
@@ -1,7 +1,7 @@
From f5959ed715e19cf2844656477dbf74c2f576c9d4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 11 Oct 2022 14:54:21 +0200
-Subject: [PATCH 10/26] x86/p2m: free the paging memory pool preemptively
+Subject: [PATCH 10/87] x86/p2m: free the paging memory pool preemptively
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -177,5 +177,5 @@ index 9eb33eafc7f7..ac9a1ae07808 100644
}
--
-2.37.3
+2.37.4
diff --git a/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch b/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch
index 5520627..096656a 100644
--- a/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch
+++ b/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch
@@ -1,7 +1,7 @@
From a603386b422f5cb4c5e2639a7e20a1d99dba2175 Mon Sep 17 00:00:00 2001
From: Julien Grall <jgrall@amazon.com>
Date: Tue, 11 Oct 2022 14:54:44 +0200
-Subject: [PATCH 11/26] xen/x86: p2m: Add preemption in p2m_teardown()
+Subject: [PATCH 11/87] xen/x86: p2m: Add preemption in p2m_teardown()
The list p2m->pages contain all the pages used by the P2M. On large
instance this can be quite large and the time spent to call
@@ -193,5 +193,5 @@ index c3c16748e7d5..2db9ab0122f2 100644
/* Add a page to a domain's p2m table */
--
-2.37.3
+2.37.4
diff --git a/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch b/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch
index 9390500..d1aeae9 100644
--- a/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch
+++ b/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch
@@ -1,7 +1,7 @@
From 755a9b52844de3e1e47aa1fc9991a4240ccfbf35 Mon Sep 17 00:00:00 2001
From: Henry Wang <Henry.Wang@arm.com>
Date: Tue, 11 Oct 2022 14:55:08 +0200
-Subject: [PATCH 12/26] libxl, docs: Use arch-specific default paging memory
+Subject: [PATCH 12/87] libxl, docs: Use arch-specific default paging memory
The default paging memory (descibed in `shadow_memory` entry in xl
config) in libxl is used to determine the memory pool size for xl
@@ -145,5 +145,5 @@ index 1feadebb1852..51362893cf98 100644
* Local variables:
* mode: C
--
-2.37.3
+2.37.4
diff --git a/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch b/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch
index dee9d9c..7ab3212 100644
--- a/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch
+++ b/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch
@@ -1,7 +1,7 @@
From 914fc8e8b4cc003e90d51bee0aef54687358530a Mon Sep 17 00:00:00 2001
From: Henry Wang <Henry.Wang@arm.com>
Date: Tue, 11 Oct 2022 14:55:21 +0200
-Subject: [PATCH 13/26] xen/arm: Construct the P2M pages pool for guests
+Subject: [PATCH 13/87] xen/arm: Construct the P2M pages pool for guests
This commit constructs the p2m pages pool for guests from the
data structure and helper perspective.
@@ -185,5 +185,5 @@ index b3ba83283e11..c9598740bd02 100644
{
write_lock(&p2m->lock);
--
-2.37.3
+2.37.4
diff --git a/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch b/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch
index fe24269..0c19560 100644
--- a/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch
+++ b/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch
@@ -1,7 +1,7 @@
From 3a16da801e14b8ff996b6f7408391ce488abd925 Mon Sep 17 00:00:00 2001
From: Henry Wang <Henry.Wang@arm.com>
Date: Tue, 11 Oct 2022 14:55:40 +0200
-Subject: [PATCH 14/26] xen/arm, libxl: Implement XEN_DOMCTL_shadow_op for Arm
+Subject: [PATCH 14/87] xen/arm, libxl: Implement XEN_DOMCTL_shadow_op for Arm
This commit implements the `XEN_DOMCTL_shadow_op` support in Xen
for Arm. The p2m pages pool size for xl guests is supposed to be
@@ -104,5 +104,5 @@ index 1baf25c3d98b..9bf72e693019 100644
{
gfn_t s = _gfn(domctl->u.cacheflush.start_pfn);
--
-2.37.3
+2.37.4
diff --git a/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch b/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch
index 704543a..7472b4b 100644
--- a/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch
+++ b/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch
@@ -1,7 +1,7 @@
From 44e9dcc48b81bca202a5b31926125a6a59a4c72e Mon Sep 17 00:00:00 2001
From: Henry Wang <Henry.Wang@arm.com>
Date: Tue, 11 Oct 2022 14:55:53 +0200
-Subject: [PATCH 15/26] xen/arm: Allocate and free P2M pages from the P2M pool
+Subject: [PATCH 15/87] xen/arm: Allocate and free P2M pages from the P2M pool
This commit sets/tearsdown of p2m pages pool for non-privileged Arm
guests by calling `p2m_set_allocation` and `p2m_teardown_allocation`.
@@ -285,5 +285,5 @@ index d8957dd8727c..b2d856a801af 100644
if ( p2m->root )
free_domheap_pages(p2m->root, P2M_ROOT_ORDER);
--
-2.37.3
+2.37.4
diff --git a/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch b/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch
index 6283d47..dfb46a9 100644
--- a/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch
+++ b/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch
@@ -1,7 +1,7 @@
From 32cb81501c8b858fe9a451650804ec3024a8b364 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 11 Oct 2022 14:56:29 +0200
-Subject: [PATCH 16/26] gnttab: correct locking on transitive grant copy error
+Subject: [PATCH 16/87] gnttab: correct locking on transitive grant copy error
path
While the comment next to the lock dropping in preparation of
@@ -62,5 +62,5 @@ index 4c742cd8fe81..d8ca645b96ff 100644
*page = NULL;
return ERESTART;
--
-2.37.3
+2.37.4
diff --git a/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch b/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch
index ffbc311..8133c53 100644
--- a/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch
+++ b/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch
@@ -1,7 +1,7 @@
From e85e2a3c17b6cd38de041cdaf14d9efdcdabad1a Mon Sep 17 00:00:00 2001
From: Anthony PERARD <anthony.perard@citrix.com>
Date: Tue, 11 Oct 2022 14:59:10 +0200
-Subject: [PATCH 17/26] tools/libxl: Replace deprecated -soundhw on QEMU
+Subject: [PATCH 17/87] tools/libxl: Replace deprecated -soundhw on QEMU
command line
-soundhw is deprecated since 825ff02911c9 ("audio: add soundhw
@@ -108,5 +108,5 @@ index 3593e21dbb64..caa08d3229cd 100644
+ (7, "sb16"),
+ ])
--
-2.37.3
+2.37.4
diff --git a/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch b/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch
index d6ade98..5fc8919 100644
--- a/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch
+++ b/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch
@@ -1,7 +1,7 @@
From e8882bcfe35520e950ba60acd6e67e65f1ce90a8 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 11 Oct 2022 14:59:26 +0200
-Subject: [PATCH 18/26] x86/CPUID: surface suitable value in EBX of XSTATE
+Subject: [PATCH 18/87] x86/CPUID: surface suitable value in EBX of XSTATE
subleaf 1
While the SDM isn't very clear about this, our present behavior make
@@ -40,5 +40,5 @@ index ff335f16390d..a647331f4793 100644
/*
* TODO: Figure out what to do for XSS state. VT-x manages
--
-2.37.3
+2.37.4
diff --git a/0019-xen-sched-introduce-cpupool_update_node_affinity.patch b/0019-xen-sched-introduce-cpupool_update_node_affinity.patch
index 957d0fe..badb8c3 100644
--- a/0019-xen-sched-introduce-cpupool_update_node_affinity.patch
+++ b/0019-xen-sched-introduce-cpupool_update_node_affinity.patch
@@ -1,7 +1,7 @@
From d4e971ad12dd27913dffcf96b5de378ea7b476e1 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 11 Oct 2022 14:59:40 +0200
-Subject: [PATCH 19/26] xen/sched: introduce cpupool_update_node_affinity()
+Subject: [PATCH 19/87] xen/sched: introduce cpupool_update_node_affinity()
For updating the node affinities of all domains in a cpupool add a new
function cpupool_update_node_affinity().
@@ -253,5 +253,5 @@ index 9671062360ac..3f4225738a40 100644
/*
* To be implemented by each architecture, sanity checking the configuration
--
-2.37.3
+2.37.4
diff --git a/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch b/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch
index 30784c3..0a04620 100644
--- a/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch
+++ b/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch
@@ -1,7 +1,7 @@
From c377ceab0a007690a1e71c81a5232613c99e944d Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 11 Oct 2022 15:00:05 +0200
-Subject: [PATCH 20/26] xen/sched: carve out memory allocation and freeing from
+Subject: [PATCH 20/87] xen/sched: carve out memory allocation and freeing from
schedule_cpu_rm()
In order to prepare not allocating or freeing memory from
@@ -259,5 +259,5 @@ index 2b04b01a0c0a..e286849a1312 100644
int sched_move_domain(struct domain *d, struct cpupool *c);
struct cpupool *cpupool_get_by_id(unsigned int poolid);
--
-2.37.3
+2.37.4
diff --git a/0021-xen-sched-fix-cpu-hotplug.patch b/0021-xen-sched-fix-cpu-hotplug.patch
index ea0b732..ac3b1d7 100644
--- a/0021-xen-sched-fix-cpu-hotplug.patch
+++ b/0021-xen-sched-fix-cpu-hotplug.patch
@@ -1,7 +1,7 @@
From 4f3204c2bc66db18c61600dd3e08bf1fd9584a1b Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 11 Oct 2022 15:00:19 +0200
-Subject: [PATCH 21/26] xen/sched: fix cpu hotplug
+Subject: [PATCH 21/87] xen/sched: fix cpu hotplug
Cpu unplugging is calling schedule_cpu_rm() via stop_machine_run() with
interrupts disabled, thus any memory allocation or freeing must be
@@ -303,5 +303,5 @@ index e286849a1312..0126a4bb9ed3 100644
struct cpupool *cpupool_get_by_id(unsigned int poolid);
void cpupool_put(struct cpupool *pool);
--
-2.37.3
+2.37.4
diff --git a/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch b/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch
index 03f485a..5432b3c 100644
--- a/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch
+++ b/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch
@@ -1,7 +1,7 @@
From 2b694dd2932be78431b14257f23b738f2fc8f6a1 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 11 Oct 2022 15:00:33 +0200
-Subject: [PATCH 22/26] Config.mk: correct PIE-related option(s) in
+Subject: [PATCH 22/87] Config.mk: correct PIE-related option(s) in
EMBEDDED_EXTRA_CFLAGS
I haven't been able to find evidence of "-nopie" ever having been a
@@ -54,5 +54,5 @@ index 46de3cd1e0e1..6f95067b8de6 100644
XEN_EXTFILES_URL ?= http://xenbits.xen.org/xen-extfiles
--
-2.37.3
+2.37.4
diff --git a/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch b/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch
index 45f7509..724d1d8 100644
--- a/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch
+++ b/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch
@@ -1,7 +1,7 @@
From 49510071ee93905378e54664778760ed3908d447 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 11 Oct 2022 15:00:59 +0200
-Subject: [PATCH 23/26] tools/xenstore: minor fix of the migration stream doc
+Subject: [PATCH 23/87] tools/xenstore: minor fix of the migration stream doc
Drop mentioning the non-existent read-only socket in the migration
stream description document.
@@ -37,5 +37,5 @@ index 5f1155273ec3..78530bbb0ef4 100644
\pagebreak
--
-2.37.3
+2.37.4
diff --git a/0024-xen-gnttab-fix-gnttab_acquire_resource.patch b/0024-xen-gnttab-fix-gnttab_acquire_resource.patch
index 898503f..49c0b7a 100644
--- a/0024-xen-gnttab-fix-gnttab_acquire_resource.patch
+++ b/0024-xen-gnttab-fix-gnttab_acquire_resource.patch
@@ -1,7 +1,7 @@
From b9560762392c01b3ee84148c07be8017cb42dbc9 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgross@suse.com>
Date: Tue, 11 Oct 2022 15:01:22 +0200
-Subject: [PATCH 24/26] xen/gnttab: fix gnttab_acquire_resource()
+Subject: [PATCH 24/87] xen/gnttab: fix gnttab_acquire_resource()
Commit 9dc46386d89d ("gnttab: work around "may be used uninitialized"
warning") was wrong, as vaddrs can legitimately be NULL in case
@@ -65,5 +65,5 @@ index d8ca645b96ff..76272b3c8add 100644
ASSERT_UNREACHABLE();
rc = -ENODATA;
--
-2.37.3
+2.37.4
diff --git a/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch b/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch
index 849ef60..489a9c8 100644
--- a/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch
+++ b/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch
@@ -1,7 +1,7 @@
From 3f4da85ca8816f6617529c80850eaddd80ea0f1f Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 11 Oct 2022 15:01:36 +0200
-Subject: [PATCH 25/26] x86: wire up VCPUOP_register_vcpu_time_memory_area for
+Subject: [PATCH 25/87] x86: wire up VCPUOP_register_vcpu_time_memory_area for
32-bit guests
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -55,5 +55,5 @@ index c46dccc25a54..d51d99344796 100644
rc = arch_do_vcpu_op(cmd, v, arg);
break;
--
-2.37.3
+2.37.4
diff --git a/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch b/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch
index 0f33747..910f573 100644
--- a/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch
+++ b/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch
@@ -1,7 +1,7 @@
From 1bce7fb1f702da4f7a749c6f1457ecb20bf74fca Mon Sep 17 00:00:00 2001
From: Tamas K Lengyel <tamas.lengyel@intel.com>
Date: Tue, 11 Oct 2022 15:01:48 +0200
-Subject: [PATCH 26/26] x86/vpmu: Fix race-condition in vpmu_load
+Subject: [PATCH 26/87] x86/vpmu: Fix race-condition in vpmu_load
The vPMU code-bases attempts to perform an optimization on saving/reloading the
PMU context by keeping track of what vCPU ran on each pCPU. When a pCPU is
@@ -93,5 +93,5 @@ index 16e91a3694fe..b6c2ec3cd047 100644
if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
(!has_vlapic(vpmu_vcpu(vpmu)->domain) &&
--
-2.37.3
+2.37.4
diff --git a/0027-arm-p2m-Rework-p2m_init.patch b/0027-arm-p2m-Rework-p2m_init.patch
new file mode 100644
index 0000000..0668899
--- /dev/null
+++ b/0027-arm-p2m-Rework-p2m_init.patch
@@ -0,0 +1,88 @@
+From 86cb37447548420e41ff953a7372972f6154d6d1 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 25 Oct 2022 09:21:11 +0000
+Subject: [PATCH 27/87] arm/p2m: Rework p2m_init()
+
+p2m_init() is mostly trivial initialisation, but has two fallible operations
+which are on either side of the backpointer trigger for teardown to take
+actions.
+
+p2m_free_vmid() is idempotent with a failed p2m_alloc_vmid(), so rearrange
+p2m_init() to perform all trivial setup, then set the backpointer, then
+perform all fallible setup.
+
+This will simplify a future bugfix which needs to add a third fallible
+operation.
+
+No practical change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
+(cherry picked from commit: 3783e583319fa1ce75e414d851f0fde191a14753)
+---
+ xen/arch/arm/p2m.c | 24 ++++++++++++------------
+ 1 file changed, 12 insertions(+), 12 deletions(-)
+
+diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
+index b2d856a801af..4f7d923ad9f8 100644
+--- a/xen/arch/arm/p2m.c
++++ b/xen/arch/arm/p2m.c
+@@ -1730,7 +1730,7 @@ void p2m_final_teardown(struct domain *d)
+ int p2m_init(struct domain *d)
+ {
+ struct p2m_domain *p2m = p2m_get_hostp2m(d);
+- int rc = 0;
++ int rc;
+ unsigned int cpu;
+
+ rwlock_init(&p2m->lock);
+@@ -1739,11 +1739,6 @@ int p2m_init(struct domain *d)
+ INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
+
+ p2m->vmid = INVALID_VMID;
+-
+- rc = p2m_alloc_vmid(d);
+- if ( rc != 0 )
+- return rc;
+-
+ p2m->max_mapped_gfn = _gfn(0);
+ p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
+
+@@ -1759,8 +1754,6 @@ int p2m_init(struct domain *d)
+ p2m->clean_pte = is_iommu_enabled(d) &&
+ !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
+
+- rc = p2m_alloc_table(d);
+-
+ /*
+ * Make sure that the type chosen to is able to store the an vCPU ID
+ * between 0 and the maximum of virtual CPUS supported as long as
+@@ -1773,13 +1766,20 @@ int p2m_init(struct domain *d)
+ p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
+
+ /*
+- * Besides getting a domain when we only have the p2m in hand,
+- * the back pointer to domain is also used in p2m_teardown()
+- * as an end-of-initialization indicator.
++ * "Trivial" initialisation is now complete. Set the backpointer so
++ * p2m_teardown() and friends know to do something.
+ */
+ p2m->domain = d;
+
+- return rc;
++ rc = p2m_alloc_vmid(d);
++ if ( rc )
++ return rc;
++
++ rc = p2m_alloc_table(d);
++ if ( rc )
++ return rc;
++
++ return 0;
+ }
+
+ /*
+--
+2.37.4
+
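The ordering rule this commit describes can be restated as a minimal standalone sketch, using hypothetical names (obj_init, alloc_a, alloc_b) rather than the real Xen functions: do every infallible step first, publish the back pointer that arms teardown, then attempt the fallible allocations, so each failure path can simply return and rely on teardown.

    /* Sketch only; hypothetical types and helpers, not Xen code. */
    struct obj {
        void *owner;     /* back pointer; teardown keys off this being set */
        int   a_ready, b_ready;
    };

    static int alloc_a(struct obj *o) { o->a_ready = 1; return 0; } /* fallible */
    static int alloc_b(struct obj *o) { o->b_ready = 1; return 0; } /* fallible */

    int obj_init(struct obj *o, void *owner)
    {
        int rc;

        /* 1. Trivial, infallible initialisation. */
        o->a_ready = 0;
        o->b_ready = 0;

        /* 2. Set the back pointer: from here on teardown knows to act. */
        o->owner = owner;

        /* 3. Fallible steps last; a failure can just return. */
        rc = alloc_a(o);
        if ( rc )
            return rc;

        rc = alloc_b(o);
        if ( rc )
            return rc;

        return 0;
    }

Arranged this way, adding a third fallible operation (as the commit anticipates) is a copy of the last stanza rather than a new unwind path.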
diff --git a/0028-xen-arm-p2m-Populate-pages-for-GICv2-mapping-in-p2m_.patch b/0028-xen-arm-p2m-Populate-pages-for-GICv2-mapping-in-p2m_.patch
new file mode 100644
index 0000000..7bc6c36
--- /dev/null
+++ b/0028-xen-arm-p2m-Populate-pages-for-GICv2-mapping-in-p2m_.patch
@@ -0,0 +1,169 @@
+From e5a5bdeba6a0c3eacd2ba39c1ee36b3c54e77dca Mon Sep 17 00:00:00 2001
+From: Henry Wang <Henry.Wang@arm.com>
+Date: Tue, 25 Oct 2022 09:21:12 +0000
+Subject: [PATCH 28/87] xen/arm: p2m: Populate pages for GICv2 mapping in
+ p2m_init()
+
+Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
+when the domain is created. Considering the worst case of page tables
+which requires 6 P2M pages as the two pages will be consecutive but not
+necessarily in the same L3 page table and keep a buffer, populate 16
+pages as the default value to the P2M pages pool in p2m_init() at the
+domain creation stage to satisfy the GICv2 requirement. For GICv3, the
+above-mentioned P2M mapping is not necessary, but since the allocated
+16 pages here would not be lost, hence populate these pages
+unconditionally.
+
+With the default 16 P2M pages populated, there would be a case that
+failures would happen in the domain creation with P2M pages already in
+use. To properly free the P2M for this case, firstly support the
+optionally preemption of p2m_teardown(), then call p2m_teardown() and
+p2m_set_allocation(d, 0, NULL) non-preemptively in p2m_final_teardown().
+As non-preemptive p2m_teardown() should only return 0, use a
+BUG_ON to confirm that.
+
+Since p2m_final_teardown() is called either after
+domain_relinquish_resources() where relinquish_p2m_mapping() has been
+called, or from failure path of domain_create()/arch_domain_create()
+where mappings that require p2m_put_l3_page() should never be created,
+relinquish_p2m_mapping() is not added in p2m_final_teardown(), add
+in-code comments to refer this.
+
+Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool")
+Suggested-by: Julien Grall <jgrall@amazon.com>
+Signed-off-by: Henry Wang <Henry.Wang@arm.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
+(cherry picked from commit: c7cff1188802646eaa38e918e5738da0e84949be)
+---
+ xen/arch/arm/domain.c | 2 +-
+ xen/arch/arm/p2m.c | 34 ++++++++++++++++++++++++++++++++--
+ xen/include/asm-arm/p2m.h | 14 ++++++++++----
+ 3 files changed, 43 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
+index a818f33a1afa..c7feaa323ad1 100644
+--- a/xen/arch/arm/domain.c
++++ b/xen/arch/arm/domain.c
+@@ -1059,7 +1059,7 @@ int domain_relinquish_resources(struct domain *d)
+ return ret;
+
+ PROGRESS(p2m):
+- ret = p2m_teardown(d);
++ ret = p2m_teardown(d, true);
+ if ( ret )
+ return ret;
+
+diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
+index 4f7d923ad9f8..6f87e17c1d08 100644
+--- a/xen/arch/arm/p2m.c
++++ b/xen/arch/arm/p2m.c
+@@ -1661,7 +1661,7 @@ static void p2m_free_vmid(struct domain *d)
+ spin_unlock(&vmid_alloc_lock);
+ }
+
+-int p2m_teardown(struct domain *d)
++int p2m_teardown(struct domain *d, bool allow_preemption)
+ {
+ struct p2m_domain *p2m = p2m_get_hostp2m(d);
+ unsigned long count = 0;
+@@ -1669,6 +1669,9 @@ int p2m_teardown(struct domain *d)
+ unsigned int i;
+ int rc = 0;
+
++ if ( page_list_empty(&p2m->pages) )
++ return 0;
++
+ p2m_write_lock(p2m);
+
+ /*
+@@ -1692,7 +1695,7 @@ int p2m_teardown(struct domain *d)
+ p2m_free_page(p2m->domain, pg);
+ count++;
+ /* Arbitrarily preempt every 512 iterations */
+- if ( !(count % 512) && hypercall_preempt_check() )
++ if ( allow_preemption && !(count % 512) && hypercall_preempt_check() )
+ {
+ rc = -ERESTART;
+ break;
+@@ -1712,7 +1715,20 @@ void p2m_final_teardown(struct domain *d)
+ if ( !p2m->domain )
+ return;
+
++ /*
++ * No need to call relinquish_p2m_mapping() here because
++ * p2m_final_teardown() is called either after domain_relinquish_resources()
++ * where relinquish_p2m_mapping() has been called, or from failure path of
++ * domain_create()/arch_domain_create() where mappings that require
++ * p2m_put_l3_page() should never be created. For the latter case, also see
++ * comment on top of the p2m_set_entry() for more info.
++ */
++
++ BUG_ON(p2m_teardown(d, false));
+ ASSERT(page_list_empty(&p2m->pages));
++
++ while ( p2m_teardown_allocation(d) == -ERESTART )
++ continue; /* No preemption support here */
+ ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
+
+ if ( p2m->root )
+@@ -1779,6 +1795,20 @@ int p2m_init(struct domain *d)
+ if ( rc )
+ return rc;
+
++ /*
++ * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
++ * when the domain is created. Considering the worst case for page
++ * tables and keep a buffer, populate 16 pages to the P2M pages pool here.
++ * For GICv3, the above-mentioned P2M mapping is not necessary, but since
++ * the allocated 16 pages here would not be lost, hence populate these
++ * pages unconditionally.
++ */
++ spin_lock(&d->arch.paging.lock);
++ rc = p2m_set_allocation(d, 16, NULL);
++ spin_unlock(&d->arch.paging.lock);
++ if ( rc )
++ return rc;
++
+ return 0;
+ }
+
+diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
+index c9598740bd02..b2725206e8de 100644
+--- a/xen/include/asm-arm/p2m.h
++++ b/xen/include/asm-arm/p2m.h
+@@ -194,14 +194,18 @@ int p2m_init(struct domain *d);
+
+ /*
+ * The P2M resources are freed in two parts:
+- * - p2m_teardown() will be called when relinquish the resources. It
+- * will free large resources (e.g. intermediate page-tables) that
+- * requires preemption.
++ * - p2m_teardown() will be called preemptively when relinquish the
++ * resources, in which case it will free large resources (e.g. intermediate
++ * page-tables) that requires preemption.
+ * - p2m_final_teardown() will be called when domain struct is been
+ * freed. This *cannot* be preempted and therefore one small
+ * resources should be freed here.
++ * Note that p2m_final_teardown() will also call p2m_teardown(), to properly
++ * free the P2M when failures happen in the domain creation with P2M pages
++ * already in use. In this case p2m_teardown() is called non-preemptively and
++ * p2m_teardown() will always return 0.
+ */
+-int p2m_teardown(struct domain *d);
++int p2m_teardown(struct domain *d, bool allow_preemption);
+ void p2m_final_teardown(struct domain *d);
+
+ /*
+@@ -266,6 +270,8 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+ /*
+ * Direct set a p2m entry: only for use by the P2M code.
+ * The P2M write lock should be taken.
++ * TODO: Add a check in __p2m_set_entry() to avoid creating a mapping in
++ * arch_domain_create() that requires p2m_put_l3_page() to be called.
+ */
+ int p2m_set_entry(struct p2m_domain *p2m,
+ gfn_t sgfn,
+--
+2.37.4
+
diff --git a/0029-x86emul-respect-NSCB.patch b/0029-x86emul-respect-NSCB.patch
new file mode 100644
index 0000000..08785b7
--- /dev/null
+++ b/0029-x86emul-respect-NSCB.patch
@@ -0,0 +1,40 @@
+From 5dae06578cd5dcc312175b00ed6836a85732438d Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 31 Oct 2022 13:19:35 +0100
+Subject: [PATCH 29/87] x86emul: respect NSCB
+
+protmode_load_seg() would better adhere to that "feature" of clearing
+base (and limit) during NULL selector loads.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 87a20c98d9f0f422727fe9b4b9e22c2c43a5cd9c
+master date: 2022-10-11 14:30:41 +0200
+---
+ xen/arch/x86/x86_emulate/x86_emulate.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
+index 441086ea861d..847f8f37719f 100644
+--- a/xen/arch/x86/x86_emulate/x86_emulate.c
++++ b/xen/arch/x86/x86_emulate/x86_emulate.c
+@@ -1970,6 +1970,7 @@ amd_like(const struct x86_emulate_ctxt *ctxt)
+ #define vcpu_has_tbm() (ctxt->cpuid->extd.tbm)
+ #define vcpu_has_clzero() (ctxt->cpuid->extd.clzero)
+ #define vcpu_has_wbnoinvd() (ctxt->cpuid->extd.wbnoinvd)
++#define vcpu_has_nscb() (ctxt->cpuid->extd.nscb)
+
+ #define vcpu_has_bmi1() (ctxt->cpuid->feat.bmi1)
+ #define vcpu_has_hle() (ctxt->cpuid->feat.hle)
+@@ -2102,7 +2103,7 @@ protmode_load_seg(
+ case x86_seg_tr:
+ goto raise_exn;
+ }
+- if ( !_amd_like(cp) || !ops->read_segment ||
++ if ( !_amd_like(cp) || vcpu_has_nscb() || !ops->read_segment ||
+ ops->read_segment(seg, sreg, ctxt) != X86EMUL_OKAY )
+ memset(sreg, 0, sizeof(*sreg));
+ else
+--
+2.37.4
+
diff --git a/0030-VMX-correct-error-handling-in-vmx_create_vmcs.patch b/0030-VMX-correct-error-handling-in-vmx_create_vmcs.patch
new file mode 100644
index 0000000..e1b618d
--- /dev/null
+++ b/0030-VMX-correct-error-handling-in-vmx_create_vmcs.patch
@@ -0,0 +1,38 @@
+From 02ab5e97c41d275ccea0910b1d8bce41ed1be5bf Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 31 Oct 2022 13:20:40 +0100
+Subject: [PATCH 30/87] VMX: correct error handling in vmx_create_vmcs()
+
+With the addition of vmx_add_msr() calls to construct_vmcs() there are
+now cases where simply freeing the VMCS isn't enough: The MSR bitmap
+page as well as one of the MSR area ones (if it's the 2nd vmx_add_msr()
+which fails) may also need freeing. Switch to using vmx_destroy_vmcs()
+instead.
+
+Fixes: 3bd36952dab6 ("x86/spec-ctrl: Introduce an option to control L1D_FLUSH for HVM HAP guests")
+Fixes: 53a570b28569 ("x86/spec-ctrl: Support IBPB-on-entry")
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: 448d28309f1a966bdc850aff1a637e0b79a03e43
+master date: 2022-10-12 17:57:56 +0200
+---
+ xen/arch/x86/hvm/vmx/vmcs.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
+index dd817cee4e69..237b13459d4f 100644
+--- a/xen/arch/x86/hvm/vmx/vmcs.c
++++ b/xen/arch/x86/hvm/vmx/vmcs.c
+@@ -1831,7 +1831,7 @@ int vmx_create_vmcs(struct vcpu *v)
+
+ if ( (rc = construct_vmcs(v)) != 0 )
+ {
+- vmx_free_vmcs(vmx->vmcs_pa);
++ vmx_destroy_vmcs(v);
+ return rc;
+ }
+
+--
+2.37.4
+
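The fix above follows a general cleanup rule: once construction involves several allocations, the error path should invoke the full, idempotent destructor rather than free only the first resource. A compact sketch of that rule, with hypothetical names (ctx_create, ctx_construct, ctx_destroy) standing in for the VMX functions:

    /* Sketch only; hypothetical names, not the Xen VMX code. */
    #include <stdlib.h>

    struct ctx {
        void *vmcs;        /* allocated by ctx_create()          */
        void *msr_bitmap;  /* allocated later by ctx_construct() */
    };

    static void ctx_destroy(struct ctx *c)
    {
        free(c->msr_bitmap);   /* free(NULL) is a no-op, so this is safe */
        free(c->vmcs);         /* even when construction failed part-way */
        c->msr_bitmap = c->vmcs = NULL;
    }

    static int ctx_construct(struct ctx *c)
    {
        c->msr_bitmap = malloc(4096);
        return c->msr_bitmap ? 0 : -1;
    }

    int ctx_create(struct ctx *c)
    {
        c->msr_bitmap = NULL;
        c->vmcs = malloc(4096);
        if ( !c->vmcs )
            return -1;

        if ( ctx_construct(c) != 0 )
        {
            ctx_destroy(c);   /* not just free(c->vmcs): later steps may
                               * have allocated more, as in the fix above */
            return -1;
        }
        return 0;
    }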
diff --git a/0031-argo-Remove-reachable-ASSERT_UNREACHABLE.patch b/0031-argo-Remove-reachable-ASSERT_UNREACHABLE.patch
new file mode 100644
index 0000000..e89709d
--- /dev/null
+++ b/0031-argo-Remove-reachable-ASSERT_UNREACHABLE.patch
@@ -0,0 +1,41 @@
+From d4a11d6a22cf73ac7441750e5e8113779348885e Mon Sep 17 00:00:00 2001
+From: Jason Andryuk <jandryuk@gmail.com>
+Date: Mon, 31 Oct 2022 13:21:31 +0100
+Subject: [PATCH 31/87] argo: Remove reachable ASSERT_UNREACHABLE
+
+I observed this ASSERT_UNREACHABLE in partner_rings_remove consistently
+trip. It was in OpenXT with the viptables patch applied.
+
+dom10 shuts down.
+dom7 is REJECTED sending to dom10.
+dom7 shuts down and this ASSERT trips for dom10.
+
+The argo_send_info has a domid, but there is no refcount taken on
+the domain. Therefore it's not appropriate to ASSERT that the domain
+can be looked up via domid. Replace with a debug message.
+
+Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
+Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com>
+master commit: 197f612b77c5afe04e60df2100a855370d720ad7
+master date: 2022-10-14 14:45:41 +0100
+---
+ xen/common/argo.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/common/argo.c b/xen/common/argo.c
+index eaea7ba8885a..80f3275092af 100644
+--- a/xen/common/argo.c
++++ b/xen/common/argo.c
+@@ -1298,7 +1298,8 @@ partner_rings_remove(struct domain *src_d)
+ ASSERT_UNREACHABLE();
+ }
+ else
+- ASSERT_UNREACHABLE();
++ argo_dprintk("%pd has entry for stale partner d%u\n",
++ src_d, send_info->id.domain_id);
+
+ if ( dst_d )
+ rcu_unlock_domain(dst_d);
+--
+2.37.4
+
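The guideline behind this change is worth stating on its own: ASSERT() documents invariants the code itself guarantees, whereas a state that another (possibly already departed) party can legitimately create should only produce a diagnostic. A hedged sketch with hypothetical names (lookup_peer, drop_link), not the argo code:

    /* Sketch only; hypothetical names. */
    #include <stdio.h>

    struct peer { int id; };

    /* May return NULL: nothing pins the peer's lifetime to this entry. */
    static struct peer *lookup_peer(int id) { (void)id; return NULL; }

    void drop_link(int peer_id)
    {
        struct peer *p = lookup_peer(peer_id);

        if ( !p )
        {
            /* Reachable in normal operation (peer already shut down),
             * so log it instead of asserting. */
            fprintf(stderr, "stale entry for departed peer %d\n", peer_id);
            return;
        }

        /* ... tear down the link to p ... */
    }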
diff --git a/0032-EFI-don-t-convert-memory-marked-for-runtime-use-to-o.patch b/0032-EFI-don-t-convert-memory-marked-for-runtime-use-to-o.patch
new file mode 100644
index 0000000..33b98df
--- /dev/null
+++ b/0032-EFI-don-t-convert-memory-marked-for-runtime-use-to-o.patch
@@ -0,0 +1,64 @@
+From 54f8ed80c8308e65c3f57ae6cbd130f43f5ecbbd Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 31 Oct 2022 13:22:17 +0100
+Subject: [PATCH 32/87] EFI: don't convert memory marked for runtime use to
+ ordinary RAM
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
+higher priority than the type of the range. To avoid accessing memory at
+runtime which was re-used for other purposes, make
+efi_arch_process_memory_map() follow suit. While in theory the same would
+apply to EfiACPIReclaimMemory, we don't actually "reclaim" or clobber
+that memory (converted to E820_ACPI on x86) there (and it would be a bug
+if the Dom0 kernel tried to reclaim the range, bypassing Xen's memory
+management, plus it would be at least bogus if it clobbered that space),
+hence that type's handling can be left alone.
+
+Fixes: bf6501a62e80 ("x86-64: EFI boot code")
+Fixes: facac0af87ef ("x86-64: EFI runtime code")
+Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+master commit: f324300c8347b6aa6f9c0b18e0a90bbf44011a9a
+master date: 2022-10-21 12:30:24 +0200
+---
+ xen/arch/arm/efi/efi-boot.h | 3 ++-
+ xen/arch/x86/efi/efi-boot.h | 4 +++-
+ 2 files changed, 5 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
+index 9f267982397b..849071fe5308 100644
+--- a/xen/arch/arm/efi/efi-boot.h
++++ b/xen/arch/arm/efi/efi-boot.h
+@@ -194,7 +194,8 @@ static EFI_STATUS __init efi_process_memory_map_bootinfo(EFI_MEMORY_DESCRIPTOR *
+
+ for ( Index = 0; Index < (mmap_size / desc_size); Index++ )
+ {
+- if ( desc_ptr->Attribute & EFI_MEMORY_WB &&
++ if ( !(desc_ptr->Attribute & EFI_MEMORY_RUNTIME) &&
++ (desc_ptr->Attribute & EFI_MEMORY_WB) &&
+ (desc_ptr->Type == EfiConventionalMemory ||
+ desc_ptr->Type == EfiLoaderCode ||
+ desc_ptr->Type == EfiLoaderData ||
+diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
+index 4ee77fb9bfa2..d99601622310 100644
+--- a/xen/arch/x86/efi/efi-boot.h
++++ b/xen/arch/x86/efi/efi-boot.h
+@@ -185,7 +185,9 @@ static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
+ /* fall through */
+ case EfiLoaderCode:
+ case EfiLoaderData:
+- if ( desc->Attribute & EFI_MEMORY_WB )
++ if ( desc->Attribute & EFI_MEMORY_RUNTIME )
++ type = E820_RESERVED;
++ else if ( desc->Attribute & EFI_MEMORY_WB )
+ type = E820_RAM;
+ else
+ case EfiUnusableMemory:
+--
+2.37.4
+
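The ordering of checks is the point here: the attribute marking a region as still needed by firmware at runtime has to be tested before the region's type, otherwise a loader or boot-services region carrying that attribute would be reclassified as ordinary RAM. A minimal sketch of that precedence, with invented names (classify, ATTR_RUNTIME) rather than the real EFI definitions:

    /* Sketch only; invented constants, not the EFI/Xen definitions. */
    #define ATTR_RUNTIME   (1u << 0)  /* firmware still uses the region */
    #define ATTR_CACHEABLE (1u << 1)  /* write-back cacheable           */

    enum kind { KIND_RESERVED, KIND_RAM };

    enum kind classify(unsigned int attr, int reusable_type)
    {
        if ( attr & ATTR_RUNTIME )
            return KIND_RESERVED;      /* checked first, as in the fix */

        if ( reusable_type && (attr & ATTR_CACHEABLE) )
            return KIND_RAM;           /* safe to hand out as normal memory */

        return KIND_RESERVED;
    }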
diff --git a/0033-xen-sched-fix-race-in-RTDS-scheduler.patch b/0033-xen-sched-fix-race-in-RTDS-scheduler.patch
new file mode 100644
index 0000000..93ee04b
--- /dev/null
+++ b/0033-xen-sched-fix-race-in-RTDS-scheduler.patch
@@ -0,0 +1,42 @@
+From 481465f35da1bcec0b2a4dfd6fc51d86cac28547 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Mon, 31 Oct 2022 13:22:54 +0100
+Subject: [PATCH 33/87] xen/sched: fix race in RTDS scheduler
+
+When a domain gets paused the unit runnable state can change to "not
+runnable" without the scheduling lock being involved. This means that
+a specific scheduler isn't involved in this change of runnable state.
+
+In the RTDS scheduler this can result in an inconsistency in case a
+unit is losing its "runnable" capability while the RTDS scheduler's
+scheduling function is active. RTDS will remove the unit from the run
+queue, but doesn't do so for the replenish queue, leading to hitting
+an ASSERT() in replq_insert() later when the domain is unpaused again.
+
+Fix that by removing the unit from the replenish queue as well in this
+case.
+
+Fixes: 7c7b407e7772 ("xen/sched: introduce unit_runnable_state()")
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Dario Faggioli <dfaggioli@suse.com>
+master commit: 73c62927f64ecb48f27d06176befdf76b879f340
+master date: 2022-10-21 12:32:23 +0200
+---
+ xen/common/sched/rt.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
+index c24cd2ac3200..ec2ca1bebc26 100644
+--- a/xen/common/sched/rt.c
++++ b/xen/common/sched/rt.c
+@@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
+ else if ( !unit_runnable_state(snext->unit) )
+ {
+ q_remove(snext);
++ replq_remove(ops, snext);
+ snext = rt_unit(sched_idle_unit(sched_cpu));
+ }
+
+--
+2.37.4
+
diff --git a/0034-xen-sched-fix-restore_vcpu_affinity-by-removing-it.patch b/0034-xen-sched-fix-restore_vcpu_affinity-by-removing-it.patch
new file mode 100644
index 0000000..eecec07
--- /dev/null
+++ b/0034-xen-sched-fix-restore_vcpu_affinity-by-removing-it.patch
@@ -0,0 +1,158 @@
+From 88f2bf5de9ad789e1c61b5d5ecf118909eed6917 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Mon, 31 Oct 2022 13:23:50 +0100
+Subject: [PATCH 34/87] xen/sched: fix restore_vcpu_affinity() by removing it
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When the system is coming up after having been suspended,
+restore_vcpu_affinity() is called for each domain in order to adjust
+the vcpu's affinity settings in case a cpu didn't come to live again.
+
+The way restore_vcpu_affinity() is doing that is wrong, because the
+specific scheduler isn't being informed about a possible migration of
+the vcpu to another cpu. Additionally the migration is often even
+happening if all cpus are running again, as it is done without check
+whether it is really needed.
+
+As cpupool management is already calling cpu_disable_scheduler() for
+cpus not having come up again, and cpu_disable_scheduler() is taking
+care of eventually needed vcpu migration in the proper way, there is
+simply no need for restore_vcpu_affinity().
+
+So just remove restore_vcpu_affinity() completely, together with the
+no longer used sched_reset_affinity_broken().
+
+Fixes: 8a04eaa8ea83 ("xen/sched: move some per-vcpu items to struct sched_unit")
+Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Dario Faggioli <dfaggioli@suse.com>
+Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+master commit: fce1f381f7388daaa3e96dbb0d67d7a3e4bb2d2d
+master date: 2022-10-24 11:16:27 +0100
+---
+ xen/arch/x86/acpi/power.c | 3 --
+ xen/common/sched/core.c | 78 ---------------------------------------
+ xen/include/xen/sched.h | 1 -
+ 3 files changed, 82 deletions(-)
+
+diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
+index dd397f713067..1a7baeebe6d0 100644
+--- a/xen/arch/x86/acpi/power.c
++++ b/xen/arch/x86/acpi/power.c
+@@ -159,10 +159,7 @@ static void thaw_domains(void)
+
+ rcu_read_lock(&domlist_read_lock);
+ for_each_domain ( d )
+- {
+- restore_vcpu_affinity(d);
+ domain_unpause(d);
+- }
+ rcu_read_unlock(&domlist_read_lock);
+ }
+
+diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
+index 900aab8f66a7..9173cf690c72 100644
+--- a/xen/common/sched/core.c
++++ b/xen/common/sched/core.c
+@@ -1188,84 +1188,6 @@ static bool sched_check_affinity_broken(const struct sched_unit *unit)
+ return false;
+ }
+
+-static void sched_reset_affinity_broken(const struct sched_unit *unit)
+-{
+- struct vcpu *v;
+-
+- for_each_sched_unit_vcpu ( unit, v )
+- v->affinity_broken = false;
+-}
+-
+-void restore_vcpu_affinity(struct domain *d)
+-{
+- unsigned int cpu = smp_processor_id();
+- struct sched_unit *unit;
+-
+- ASSERT(system_state == SYS_STATE_resume);
+-
+- rcu_read_lock(&sched_res_rculock);
+-
+- for_each_sched_unit ( d, unit )
+- {
+- spinlock_t *lock;
+- unsigned int old_cpu = sched_unit_master(unit);
+- struct sched_resource *res;
+-
+- ASSERT(!unit_runnable(unit));
+-
+- /*
+- * Re-assign the initial processor as after resume we have no
+- * guarantee the old processor has come back to life again.
+- *
+- * Therefore, here, before actually unpausing the domains, we should
+- * set v->processor of each of their vCPUs to something that will
+- * make sense for the scheduler of the cpupool in which they are in.
+- */
+- lock = unit_schedule_lock_irq(unit);
+-
+- cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
+- cpupool_domain_master_cpumask(d));
+- if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+- {
+- if ( sched_check_affinity_broken(unit) )
+- {
+- sched_set_affinity(unit, unit->cpu_hard_affinity_saved, NULL);
+- sched_reset_affinity_broken(unit);
+- cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
+- cpupool_domain_master_cpumask(d));
+- }
+-
+- if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
+- {
+- /* Affinity settings of one vcpu are for the complete unit. */
+- printk(XENLOG_DEBUG "Breaking affinity for %pv\n",
+- unit->vcpu_list);
+- sched_set_affinity(unit, &cpumask_all, NULL);
+- cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
+- cpupool_domain_master_cpumask(d));
+- }
+- }
+-
+- res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
+- sched_set_res(unit, res);
+-
+- spin_unlock_irq(lock);
+-
+- /* v->processor might have changed, so reacquire the lock. */
+- lock = unit_schedule_lock_irq(unit);
+- res = sched_pick_resource(unit_scheduler(unit), unit);
+- sched_set_res(unit, res);
+- spin_unlock_irq(lock);
+-
+- if ( old_cpu != sched_unit_master(unit) )
+- sched_move_irqs(unit);
+- }
+-
+- rcu_read_unlock(&sched_res_rculock);
+-
+- domain_update_node_affinity(d);
+-}
+-
+ /*
+ * This function is used by cpu_hotplug code via cpu notifier chain
+ * and from cpupools to switch schedulers on a cpu.
+diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
+index 3f4225738a40..1a1fab5239ec 100644
+--- a/xen/include/xen/sched.h
++++ b/xen/include/xen/sched.h
+@@ -999,7 +999,6 @@ void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value);
+ void sched_setup_dom0_vcpus(struct domain *d);
+ int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu, uint8_t reason);
+ int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity);
+-void restore_vcpu_affinity(struct domain *d);
+ int vcpu_affinity_domctl(struct domain *d, uint32_t cmd,
+ struct xen_domctl_vcpuaffinity *vcpuaff);
+
+--
+2.37.4
+
diff --git a/0035-x86-shadow-drop-replace-bogus-assertions.patch b/0035-x86-shadow-drop-replace-bogus-assertions.patch
new file mode 100644
index 0000000..55e9f62
--- /dev/null
+++ b/0035-x86-shadow-drop-replace-bogus-assertions.patch
@@ -0,0 +1,71 @@
+From 9fdb4f17656f74b35af0882b558e44832ff00b5f Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 31 Oct 2022 13:24:33 +0100
+Subject: [PATCH 35/87] x86/shadow: drop (replace) bogus assertions
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The addition of a call to shadow_blow_tables() from shadow_teardown()
+has resulted in the "no vcpus" related assertion becoming triggerable:
+If domain_create() fails with at least one page successfully allocated
+in the course of shadow_enable(), or if domain_create() succeeds and
+the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
+Note that in-tree tests (test-resource and test-tsx) do exactly the
+latter of these two.
+
+The assertion's comment was bogus anyway: Shadow mode has been getting
+enabled before allocation of vCPU-s for quite some time. Convert the
+assertion to a conditional: As long as there are no vCPU-s, there's
+nothing to blow away.
+
+Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+
+A similar assertion/comment pair exists in _shadow_prealloc(); the
+comment is similarly bogus, and the assertion could in principle trigger
+e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
+at the same time by a similar early return, here indicating failure to
+the caller (which will generally lead to the domain being crashed in
+shadow_prealloc()).
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: a92dc2bb30ba65ae25d2f417677eb7ef9a6a0fef
+master date: 2022-10-24 15:46:11 +0200
+---
+ xen/arch/x86/mm/shadow/common.c | 10 ++++++----
+ 1 file changed, 6 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 3b0d781991b5..1de0139742f7 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -943,8 +943,9 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
+ /* No reclaim when the domain is dying, teardown will take care of it. */
+ return false;
+
+- /* Shouldn't have enabled shadows if we've no vcpus. */
+- ASSERT(d->vcpu && d->vcpu[0]);
++ /* Nothing to reclaim when there are no vcpus yet. */
++ if ( !d->vcpu[0] )
++ return false;
+
+ /* Stage one: walk the list of pinned pages, unpinning them */
+ perfc_incr(shadow_prealloc_1);
+@@ -1034,8 +1035,9 @@ void shadow_blow_tables(struct domain *d)
+ mfn_t smfn;
+ int i;
+
+- /* Shouldn't have enabled shadows if we've no vcpus. */
+- ASSERT(d->vcpu && d->vcpu[0]);
++ /* Nothing to do when there are no vcpus yet. */
++ if ( !d->vcpu[0] )
++ return;
+
+ /* Pass one: unpin all pinned pages */
+ foreach_pinned_shadow(d, sp, t)
+--
+2.37.4
+
diff --git a/0036-vpci-don-t-assume-that-vpci-per-device-data-exists-u.patch b/0036-vpci-don-t-assume-that-vpci-per-device-data-exists-u.patch
new file mode 100644
index 0000000..ab8f792
--- /dev/null
+++ b/0036-vpci-don-t-assume-that-vpci-per-device-data-exists-u.patch
@@ -0,0 +1,60 @@
+From 96d26f11f56e83b98ec184f4e0d17161efe3a927 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 31 Oct 2022 13:25:13 +0100
+Subject: [PATCH 36/87] vpci: don't assume that vpci per-device data exists
+ unconditionally
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+It's possible for a device to be assigned to a domain but have no
+vpci structure if vpci_process_pending() failed and called
+vpci_remove_device() as a result. The unconditional accesses done by
+vpci_{read,write}() and vpci_remove_device() to pdev->vpci would
+then trigger a NULL pointer dereference.
+
+Add checks for pdev->vpci presence in the affected functions.
+
+Fixes: 9c244fdef7 ('vpci: add header handlers')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 6ccb5e308ceeb895fbccd87a528a8bd24325aa39
+master date: 2022-10-26 14:55:30 +0200
+---
+ xen/drivers/vpci/vpci.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
+index dfc8136ffb95..53d78d53911d 100644
+--- a/xen/drivers/vpci/vpci.c
++++ b/xen/drivers/vpci/vpci.c
+@@ -37,7 +37,7 @@ extern vpci_register_init_t *const __end_vpci_array[];
+
+ void vpci_remove_device(struct pci_dev *pdev)
+ {
+- if ( !has_vpci(pdev->domain) )
++ if ( !has_vpci(pdev->domain) || !pdev->vpci )
+ return;
+
+ spin_lock(&pdev->vpci->lock);
+@@ -326,7 +326,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
+
+ /* Find the PCI dev matching the address. */
+ pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
+- if ( !pdev )
++ if ( !pdev || !pdev->vpci )
+ return vpci_read_hw(sbdf, reg, size);
+
+ spin_lock(&pdev->vpci->lock);
+@@ -436,7 +436,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
+ * Passthrough everything that's not trapped.
+ */
+ pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
+- if ( !pdev )
++ if ( !pdev || !pdev->vpci )
+ {
+ vpci_write_hw(sbdf, reg, size, data);
+ return;
+--
+2.37.4
+
diff --git a/0037-vpci-msix-remove-from-table-list-on-detach.patch b/0037-vpci-msix-remove-from-table-list-on-detach.patch
new file mode 100644
index 0000000..2bae0a2
--- /dev/null
+++ b/0037-vpci-msix-remove-from-table-list-on-detach.patch
@@ -0,0 +1,47 @@
+From 8f3f8f20de5cea704671d4ca83f2dceb93ab98d8 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 31 Oct 2022 13:25:40 +0100
+Subject: [PATCH 37/87] vpci/msix: remove from table list on detach
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Teardown of MSIX vPCI related data doesn't currently remove the MSIX
+device data from the list of MSIX tables handled by the domain,
+leading to a use-after-free of the data in the msix structure.
+
+Remove the structure from the list before freeing in order to solve
+it.
+
+Reported-by: Jan Beulich <jbeulich@suse.com>
+Fixes: d6281be9d0 ('vpci/msix: add MSI-X handlers')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: c14aea137eab29eb9c30bfad745a00c65ad21066
+master date: 2022-10-26 14:56:58 +0200
+---
+ xen/drivers/vpci/vpci.c | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
+index 53d78d53911d..b9339f8f3e43 100644
+--- a/xen/drivers/vpci/vpci.c
++++ b/xen/drivers/vpci/vpci.c
+@@ -51,8 +51,12 @@ void vpci_remove_device(struct pci_dev *pdev)
+ xfree(r);
+ }
+ spin_unlock(&pdev->vpci->lock);
+- if ( pdev->vpci->msix && pdev->vpci->msix->pba )
+- iounmap(pdev->vpci->msix->pba);
++ if ( pdev->vpci->msix )
++ {
++ list_del(&pdev->vpci->msix->next);
++ if ( pdev->vpci->msix->pba )
++ iounmap(pdev->vpci->msix->pba);
++ }
+ xfree(pdev->vpci->msix);
+ xfree(pdev->vpci->msi);
+ xfree(pdev->vpci);
+--
+2.37.4
+
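The use-after-free described above is the classic consequence of freeing an object that is still linked on a global list. A small sketch of the unlink-before-free rule, using a hand-rolled singly linked list and hypothetical names (node_free, list_unlink) instead of Xen's list API:

    /* Sketch only; plain C list, not Xen's <xen/list.h>. */
    #include <stdlib.h>

    struct node {
        struct node *next;
    };

    static struct node head;            /* global list others keep walking */

    static void list_unlink(struct node *n)
    {
        struct node *p;

        for ( p = &head; p->next; p = p->next )
            if ( p->next == n )
            {
                p->next = n->next;
                break;
            }
    }

    void node_free(struct node *n)
    {
        list_unlink(n);  /* unlink first ...                             */
        free(n);         /* ... then free; the reverse order leaves the
                          * list holding a dangling pointer              */
    }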
diff --git a/0038-x86-also-zap-secondary-time-area-handles-during-soft.patch b/0038-x86-also-zap-secondary-time-area-handles-during-soft.patch
new file mode 100644
index 0000000..286661a
--- /dev/null
+++ b/0038-x86-also-zap-secondary-time-area-handles-during-soft.patch
@@ -0,0 +1,49 @@
+From aac108509055e5f5ff293e1fb44614f96a0996c6 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 31 Oct 2022 13:26:08 +0100
+Subject: [PATCH 38/87] x86: also zap secondary time area handles during soft
+ reset
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Just like domain_soft_reset() properly zaps runstate area handles, the
+secondary time area ones also need discarding to prevent guest memory
+corruption once the guest is re-started.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: b80d4f8d2ea6418e32fb4f20d1304ace6d6566e3
+master date: 2022-10-27 11:49:09 +0200
+---
+ xen/arch/x86/domain.c | 6 ++++++
+ 1 file changed, 6 insertions(+)
+
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index a4356893bdbc..3fab2364be8d 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -929,6 +929,7 @@ int arch_domain_soft_reset(struct domain *d)
+ struct page_info *page = virt_to_page(d->shared_info), *new_page;
+ int ret = 0;
+ struct domain *owner;
++ struct vcpu *v;
+ mfn_t mfn;
+ gfn_t gfn;
+ p2m_type_t p2mt;
+@@ -1008,7 +1009,12 @@ int arch_domain_soft_reset(struct domain *d)
+ "Failed to add a page to replace %pd's shared_info frame %"PRI_gfn"\n",
+ d, gfn_x(gfn));
+ free_domheap_page(new_page);
++ goto exit_put_gfn;
+ }
++
++ for_each_vcpu ( d, v )
++ set_xen_guest_handle(v->arch.time_info_guest, NULL);
++
+ exit_put_gfn:
+ put_gfn(d, gfn_x(gfn));
+ exit_put_page:
+--
+2.37.4
+
diff --git a/0039-common-map_vcpu_info-wants-to-unshare-the-underlying.patch b/0039-common-map_vcpu_info-wants-to-unshare-the-underlying.patch
new file mode 100644
index 0000000..cea8bb5
--- /dev/null
+++ b/0039-common-map_vcpu_info-wants-to-unshare-the-underlying.patch
@@ -0,0 +1,41 @@
+From 426a8346c01075ec5eba4aadefab03a96b6ece6a Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 31 Oct 2022 13:26:33 +0100
+Subject: [PATCH 39/87] common: map_vcpu_info() wants to unshare the underlying
+ page
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Not passing P2M_UNSHARE to get_page_from_gfn() means there won't even be
+an attempt to unshare the referenced page, without any indication to the
+caller (e.g. -EAGAIN). Note that guests have no direct control over
+which of their pages are shared (or paged out), and hence they have no
+way to make sure all on their own that the subsequent obtaining of a
+writable type reference can actually succeed.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+master commit: 48980cf24d5cf41fd644600f99c753419505e735
+master date: 2022-10-28 11:38:32 +0200
+---
+ xen/common/domain.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/common/domain.c b/xen/common/domain.c
+index 56d47dd66478..e3afcacb6cae 100644
+--- a/xen/common/domain.c
++++ b/xen/common/domain.c
+@@ -1471,7 +1471,7 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
+ if ( (v != current) && !(v->pause_flags & VPF_down) )
+ return -EINVAL;
+
+- page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
++ page = get_page_from_gfn(d, gfn, NULL, P2M_UNSHARE);
+ if ( !page )
+ return -EINVAL;
+
+--
+2.37.4
+
diff --git a/0040-x86-pv-shim-correctly-ignore-empty-onlining-requests.patch b/0040-x86-pv-shim-correctly-ignore-empty-onlining-requests.patch
new file mode 100644
index 0000000..d242cb2
--- /dev/null
+++ b/0040-x86-pv-shim-correctly-ignore-empty-onlining-requests.patch
@@ -0,0 +1,43 @@
+From 08f6c88405a4406cac5b90e8d9873258dc445006 Mon Sep 17 00:00:00 2001
+From: Igor Druzhinin <igor.druzhinin@citrix.com>
+Date: Mon, 31 Oct 2022 13:26:59 +0100
+Subject: [PATCH 40/87] x86/pv-shim: correctly ignore empty onlining requests
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Mem-op requests may have zero extents. Such requests need treating as
+no-ops. pv_shim_online_memory(), however, would have tried to take 2³²-1
+order-sized pages from its balloon list (to then populate them),
+typically ending when the entire set of ballooned pages of this order
+was consumed.
+
+Note that pv_shim_offline_memory() does not have such an issue.
+
+Fixes: b2245acc60c3 ("xen/pvshim: memory hotplug")
+Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 9272225ca72801fd9fa5b268a2d1c5adebd19cd9
+master date: 2022-10-28 15:47:59 +0200
+---
+ xen/arch/x86/pv/shim.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
+index d9704121a739..4146ee3f9ce8 100644
+--- a/xen/arch/x86/pv/shim.c
++++ b/xen/arch/x86/pv/shim.c
+@@ -944,6 +944,9 @@ void pv_shim_online_memory(unsigned int nr, unsigned int order)
+ struct page_info *page, *tmp;
+ PAGE_LIST_HEAD(list);
+
++ if ( !nr )
++ return;
++
+ spin_lock(&balloon_lock);
+ page_list_for_each_safe ( page, tmp, &balloon )
+ {
+--
+2.37.4
+
diff --git a/0041-x86-pv-shim-correct-ballooning-up-for-compat-guests.patch b/0041-x86-pv-shim-correct-ballooning-up-for-compat-guests.patch
new file mode 100644
index 0000000..5c77bbf
--- /dev/null
+++ b/0041-x86-pv-shim-correct-ballooning-up-for-compat-guests.patch
@@ -0,0 +1,55 @@
+From 2f75e3654f00a62bd1f446a7424ccd56750a2e15 Mon Sep 17 00:00:00 2001
+From: Igor Druzhinin <igor.druzhinin@citrix.com>
+Date: Mon, 31 Oct 2022 13:28:15 +0100
+Subject: [PATCH 41/87] x86/pv-shim: correct ballooning up for compat guests
+
+The compat layer for multi-extent memory ops may need to split incoming
+requests. Since the guest handles in the interface structures may not be
+altered, it does so by leveraging do_memory_op()'s continuation
+handling: It hands on non-initial requests with a non-zero start extent,
+with the (native) handle suitably adjusted down. As a result
+do_memory_op() sees only the first of potentially several requests with
+start extent being zero. It's only that case when the function would
+issue a call to pv_shim_online_memory(), yet the range then covers only
+the first sub-range that results from the split.
+
+Address that breakage by making a complementary call to
+pv_shim_online_memory() in the compat layer.
+
+Fixes: b2245acc60c3 ("xen/pvshim: memory hotplug")
+Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: a0bfdd201ea12aa5679bb8944d63a4e0d3c23160
+master date: 2022-10-28 15:48:50 +0200
+---
+ xen/common/compat/memory.c | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
+index c43fa97cf15f..a0e0562a4033 100644
+--- a/xen/common/compat/memory.c
++++ b/xen/common/compat/memory.c
+@@ -7,6 +7,7 @@ EMIT_FILE;
+ #include <xen/event.h>
+ #include <xen/mem_access.h>
+ #include <asm/current.h>
++#include <asm/guest.h>
+ #include <compat/memory.h>
+
+ #define xen_domid_t domid_t
+@@ -146,7 +147,10 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
+ nat.rsrv->nr_extents = end_extent;
+ ++split;
+ }
+-
++ /* Avoid calling pv_shim_online_memory() when in a continuation. */
++ if ( pv_shim && op != XENMEM_decrease_reservation && !start_extent )
++ pv_shim_online_memory(cmp.rsrv.nr_extents - nat.rsrv->nr_extents,
++ cmp.rsrv.extent_order);
+ break;
+
+ case XENMEM_exchange:
+--
+2.37.4
+
diff --git a/0042-x86-pv-shim-correct-ballooning-down-for-compat-guest.patch b/0042-x86-pv-shim-correct-ballooning-down-for-compat-guest.patch
new file mode 100644
index 0000000..dd044e4
--- /dev/null
+++ b/0042-x86-pv-shim-correct-ballooning-down-for-compat-guest.patch
@@ -0,0 +1,72 @@
+From c229b16ba3eb5579a9a5d470ab16dd9ad55e57d6 Mon Sep 17 00:00:00 2001
+From: Igor Druzhinin <igor.druzhinin@citrix.com>
+Date: Mon, 31 Oct 2022 13:28:46 +0100
+Subject: [PATCH 42/87] x86/pv-shim: correct ballooning down for compat guests
+
+The compat layer for multi-extent memory ops may need to split incoming
+requests. Since the guest handles in the interface structures may not be
+altered, it does so by leveraging do_memory_op()'s continuation
+handling: It hands on non-initial requests with a non-zero start extent,
+with the (native) handle suitably adjusted down. As a result
+do_memory_op() sees only the first of potentially several requests with
+start extent being zero. In order to be usable as overall result, the
+function accumulates args.nr_done, i.e. it initializes the field with
+the start extent. Therefore non-initial requests resulting from the
+split would pass too large a number into pv_shim_offline_memory().
+
+Address that breakage by always calling pv_shim_offline_memory()
+regardless of current hypercall preemption status, with a suitably
+adjusted first argument. Note that this is correct also for the native
+guest case: We now simply "commit" what was completed right away, rather
+than at the end of a series of preemption/re-start cycles. In fact this
+improves overall preemption behavior: There's no longer a potentially
+big chunk of work done non-preemptively at the end of the last
+"iteration".
+
+Fixes: b2245acc60c3 ("xen/pvshim: memory hotplug")
+Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 1d7fbc535d1d37bdc2cc53ede360b0f6651f7de1
+master date: 2022-10-28 15:49:33 +0200
+---
+ xen/common/memory.c | 19 +++++++------------
+ 1 file changed, 7 insertions(+), 12 deletions(-)
+
+diff --git a/xen/common/memory.c b/xen/common/memory.c
+index 064de4ad8d66..76f8858cc379 100644
+--- a/xen/common/memory.c
++++ b/xen/common/memory.c
+@@ -1420,22 +1420,17 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
+
+ rc = args.nr_done;
+
+- if ( args.preempted )
+- return hypercall_create_continuation(
+- __HYPERVISOR_memory_op, "lh",
+- op | (rc << MEMOP_EXTENT_SHIFT), arg);
+-
+ #ifdef CONFIG_X86
+ if ( pv_shim && op == XENMEM_decrease_reservation )
+- /*
+- * Only call pv_shim_offline_memory when the hypercall has
+- * finished. Note that nr_done is used to cope in case the
+- * hypercall has failed and only part of the extents where
+- * processed.
+- */
+- pv_shim_offline_memory(args.nr_done, args.extent_order);
++ pv_shim_offline_memory(args.nr_done - start_extent,
++ args.extent_order);
+ #endif
+
++ if ( args.preempted )
++ return hypercall_create_continuation(
++ __HYPERVISOR_memory_op, "lh",
++ op | (rc << MEMOP_EXTENT_SHIFT), arg);
++
+ break;
+
+ case XENMEM_exchange:
+--
+2.37.4
+
diff --git a/0043-x86-vmx-Revert-VMX-use-a-single-global-APIC-access-p.patch b/0043-x86-vmx-Revert-VMX-use-a-single-global-APIC-access-p.patch
new file mode 100644
index 0000000..92b3bf1
--- /dev/null
+++ b/0043-x86-vmx-Revert-VMX-use-a-single-global-APIC-access-p.patch
@@ -0,0 +1,259 @@
+From 62e7fb702db4adaa9415ac87d95e0f461e32d9ca Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 24 Aug 2022 14:16:44 +0100
+Subject: [PATCH 43/87] x86/vmx: Revert "VMX: use a single, global APIC access
+ page"
+
+The claim "No accesses would ever go to this page." is false. A consequence
+of how Intel's APIC Acceleration works, and Xen's choice to have per-domain
+P2Ms (rather than per-vCPU P2Ms) means that the APIC page is fully read-write
+to any vCPU which is not in xAPIC mode.
+
+This reverts commit 58850b9074d3e7affdf3bc94c84e417ecfa4d165.
+
+This is XSA-412 / CVE-2022-42327.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 3b5beaf49033cddf4b2cc4e4d391b966f4203471)
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 59 ++++++++++++++++++++++--------
+ xen/arch/x86/mm/shadow/set.c | 8 ----
+ xen/arch/x86/mm/shadow/types.h | 7 ----
+ xen/include/asm-x86/hvm/vmx/vmcs.h | 1 +
+ xen/include/asm-x86/mm.h | 20 +---------
+ 5 files changed, 46 insertions(+), 49 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index d429d76c18c9..3f4276531322 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -66,7 +66,8 @@ boolean_param("force-ept", opt_force_ept);
+ static void vmx_ctxt_switch_from(struct vcpu *v);
+ static void vmx_ctxt_switch_to(struct vcpu *v);
+
+-static int alloc_vlapic_mapping(void);
++static int vmx_alloc_vlapic_mapping(struct domain *d);
++static void vmx_free_vlapic_mapping(struct domain *d);
+ static void vmx_install_vlapic_mapping(struct vcpu *v);
+ static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
+ unsigned int flags);
+@@ -77,8 +78,6 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
+ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
+ static void vmx_invlpg(struct vcpu *v, unsigned long linear);
+
+-static mfn_t __read_mostly apic_access_mfn = INVALID_MFN_INITIALIZER;
+-
+ /* Values for domain's ->arch.hvm_domain.pi_ops.flags. */
+ #define PI_CSW_FROM (1u << 0)
+ #define PI_CSW_TO (1u << 1)
+@@ -402,6 +401,7 @@ static int vmx_domain_initialise(struct domain *d)
+ .to = vmx_ctxt_switch_to,
+ .tail = vmx_do_resume,
+ };
++ int rc;
+
+ d->arch.ctxt_switch = &csw;
+
+@@ -411,15 +411,24 @@ static int vmx_domain_initialise(struct domain *d)
+ */
+ d->arch.hvm.vmx.exec_sp = is_hardware_domain(d) || opt_ept_exec_sp;
+
++ if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
++ return rc;
++
+ return 0;
+ }
+
++static void vmx_domain_relinquish_resources(struct domain *d)
++{
++ vmx_free_vlapic_mapping(d);
++}
++
+ static void domain_creation_finished(struct domain *d)
+ {
+ gfn_t gfn = gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE);
++ mfn_t apic_access_mfn = d->arch.hvm.vmx.apic_access_mfn;
+ bool ipat;
+
+- if ( !has_vlapic(d) || mfn_eq(apic_access_mfn, INVALID_MFN) )
++ if ( mfn_eq(apic_access_mfn, _mfn(0)) )
+ return;
+
+ ASSERT(epte_get_entry_emt(d, gfn, apic_access_mfn, 0, &ipat,
+@@ -2481,6 +2490,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
+ .cpu_up_prepare = vmx_cpu_up_prepare,
+ .cpu_dead = vmx_cpu_dead,
+ .domain_initialise = vmx_domain_initialise,
++ .domain_relinquish_resources = vmx_domain_relinquish_resources,
+ .domain_creation_finished = domain_creation_finished,
+ .vcpu_initialise = vmx_vcpu_initialise,
+ .vcpu_destroy = vmx_vcpu_destroy,
+@@ -2731,7 +2741,7 @@ const struct hvm_function_table * __init start_vmx(void)
+ {
+ set_in_cr4(X86_CR4_VMXE);
+
+- if ( vmx_vmcs_init() || alloc_vlapic_mapping() )
++ if ( vmx_vmcs_init() )
+ {
+ printk("VMX: failed to initialise.\n");
+ return NULL;
+@@ -3305,36 +3315,55 @@ gp_fault:
+ return X86EMUL_EXCEPTION;
+ }
+
+-static int __init alloc_vlapic_mapping(void)
++static int vmx_alloc_vlapic_mapping(struct domain *d)
+ {
+ struct page_info *pg;
+ mfn_t mfn;
+
+- if ( !cpu_has_vmx_virtualize_apic_accesses )
++ if ( !has_vlapic(d) || !cpu_has_vmx_virtualize_apic_accesses )
+ return 0;
+
+- pg = alloc_domheap_page(NULL, 0);
++ pg = alloc_domheap_page(d, MEMF_no_refcount);
+ if ( !pg )
+ return -ENOMEM;
+
+- /*
+- * Signal to shadow code that this page cannot be refcounted. This also
+- * makes epte_get_entry_emt() recognize this page as "special".
+- */
+- page_suppress_refcounting(pg);
++ if ( !get_page_and_type(pg, d, PGT_writable_page) )
++ {
++ /*
++ * The domain can't possibly know about this page yet, so failure
++ * here is a clear indication of something fishy going on.
++ */
++ domain_crash(d);
++ return -ENODATA;
++ }
+
+ mfn = page_to_mfn(pg);
+ clear_domain_page(mfn);
+- apic_access_mfn = mfn;
++ d->arch.hvm.vmx.apic_access_mfn = mfn;
+
+ return 0;
+ }
+
++static void vmx_free_vlapic_mapping(struct domain *d)
++{
++ mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
++
++ d->arch.hvm.vmx.apic_access_mfn = _mfn(0);
++ if ( !mfn_eq(mfn, _mfn(0)) )
++ {
++ struct page_info *pg = mfn_to_page(mfn);
++
++ put_page_alloc_ref(pg);
++ put_page_and_type(pg);
++ }
++}
++
+ static void vmx_install_vlapic_mapping(struct vcpu *v)
+ {
++ mfn_t apic_access_mfn = v->domain->arch.hvm.vmx.apic_access_mfn;
+ paddr_t virt_page_ma, apic_page_ma;
+
+- if ( !has_vlapic(v->domain) || mfn_eq(apic_access_mfn, INVALID_MFN) )
++ if ( mfn_eq(apic_access_mfn, _mfn(0)) )
+ return;
+
+ ASSERT(cpu_has_vmx_virtualize_apic_accesses);
+diff --git a/xen/arch/x86/mm/shadow/set.c b/xen/arch/x86/mm/shadow/set.c
+index 87e9c6eeb219..bd6c68b547c9 100644
+--- a/xen/arch/x86/mm/shadow/set.c
++++ b/xen/arch/x86/mm/shadow/set.c
+@@ -101,14 +101,6 @@ shadow_get_page_from_l1e(shadow_l1e_t sl1e, struct domain *d, p2m_type_t type)
+ owner = page_get_owner(pg);
+ }
+
+- /*
+- * Check whether refcounting is suppressed on this page. For example,
+- * VMX'es APIC access MFN is just a surrogate page. It doesn't actually
+- * get accessed, and hence there's no need to refcount it.
+- */
+- if ( pg && page_refcounting_suppressed(pg) )
+- return 0;
+-
+ if ( owner == dom_io )
+ owner = NULL;
+
+diff --git a/xen/arch/x86/mm/shadow/types.h b/xen/arch/x86/mm/shadow/types.h
+index 6970e7d6ea4a..814a4018535a 100644
+--- a/xen/arch/x86/mm/shadow/types.h
++++ b/xen/arch/x86/mm/shadow/types.h
+@@ -276,16 +276,9 @@ int shadow_set_l4e(struct domain *d, shadow_l4e_t *sl4e,
+ static void inline
+ shadow_put_page_from_l1e(shadow_l1e_t sl1e, struct domain *d)
+ {
+- mfn_t mfn = shadow_l1e_get_mfn(sl1e);
+-
+ if ( !shadow_mode_refcounts(d) )
+ return;
+
+- if ( mfn_valid(mfn) &&
+- /* See the respective comment in shadow_get_page_from_l1e(). */
+- page_refcounting_suppressed(mfn_to_page(mfn)) )
+- return;
+-
+ put_page_from_l1e(sl1e, d);
+ }
+
+diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
+index 03c9ccf627ab..8073af323b96 100644
+--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
++++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
+@@ -58,6 +58,7 @@ struct ept_data {
+ #define _VMX_DOMAIN_PML_ENABLED 0
+ #define VMX_DOMAIN_PML_ENABLED (1ul << _VMX_DOMAIN_PML_ENABLED)
+ struct vmx_domain {
++ mfn_t apic_access_mfn;
+ /* VMX_DOMAIN_* */
+ unsigned int status;
+
+diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
+index 7bdf9c2290d8..e1bcea57a8f5 100644
+--- a/xen/include/asm-x86/mm.h
++++ b/xen/include/asm-x86/mm.h
+@@ -83,7 +83,7 @@
+ #define PGC_state_offlined PG_mask(2, 6)
+ #define PGC_state_free PG_mask(3, 6)
+ #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
+-/* Page is not reference counted (see below for caveats) */
++/* Page is not reference counted */
+ #define _PGC_extra PG_shift(7)
+ #define PGC_extra PG_mask(1, 7)
+
+@@ -375,24 +375,6 @@ void zap_ro_mpt(mfn_t mfn);
+
+ bool is_iomem_page(mfn_t mfn);
+
+-/*
+- * Pages with no owner which may get passed to functions wanting to
+- * refcount them can be marked PGC_extra to bypass this refcounting (which
+- * would fail due to the lack of an owner).
+- *
+- * (For pages with owner PGC_extra has different meaning.)
+- */
+-static inline void page_suppress_refcounting(struct page_info *pg)
+-{
+- ASSERT(!page_get_owner(pg));
+- pg->count_info |= PGC_extra;
+-}
+-
+-static inline bool page_refcounting_suppressed(const struct page_info *pg)
+-{
+- return !page_get_owner(pg) && (pg->count_info & PGC_extra);
+-}
+-
+ struct platform_bad_page {
+ unsigned long mfn;
+ unsigned int order;
+--
+2.37.4
+
diff --git a/0044-tools-xenstore-create_node-Don-t-defer-work-to-undo-.patch b/0044-tools-xenstore-create_node-Don-t-defer-work-to-undo-.patch
new file mode 100644
index 0000000..8b9ff53
--- /dev/null
+++ b/0044-tools-xenstore-create_node-Don-t-defer-work-to-undo-.patch
@@ -0,0 +1,120 @@
+From 28ea39a4eb476f9105e1021bef1367c075feaa0b Mon Sep 17 00:00:00 2001
+From: Julien Grall <jgrall@amazon.com>
+Date: Tue, 13 Sep 2022 07:35:06 +0200
+Subject: [PATCH 44/87] tools/xenstore: create_node: Don't defer work to undo
+ any changes on failure
+
+XSA-115 extended destroy_node() to update the node accounting for the
+connection. The implementation is assuming the connection is the parent
+of the node; however, all the nodes are allocated using a separate context
+(see process_message()). This will result in crashing (or corrupting) xenstored
+as the pointer is wrongly used.
+
+In case of an error, any changes to the database or update to the
+accounting will now be reverted in create_node() by calling
+destroy_node() directly. This has the nice advantage of removing the
+loop that unsets the destructors in case of success.
+
+Take the opportunity to free the nodes right now as they are not
+going to be reachable (the function returns NULL) and are just wasting
+resources.
+
+This is XSA-414 / CVE-2022-42309.
+
+Fixes: 0bfb2101f243 ("tools/xenstore: fix node accounting after failed node creation")
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+(cherry picked from commit 1cd3cc7ea27cda7640a8d895e09617b61c265697)
+---
+ tools/xenstore/xenstored_core.c | 47 ++++++++++++++++++++++-----------
+ 1 file changed, 32 insertions(+), 15 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 0c8ee276f837..29947c3020c3 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1088,9 +1088,8 @@ nomem:
+ return NULL;
+ }
+
+-static int destroy_node(void *_node)
++static int destroy_node(struct connection *conn, struct node *node)
+ {
+- struct node *node = _node;
+ TDB_DATA key;
+
+ if (streq(node->name, "/"))
+@@ -1099,7 +1098,7 @@ static int destroy_node(void *_node)
+ set_tdb_key(node->name, &key);
+ tdb_delete(tdb_ctx, key);
+
+- domain_entry_dec(talloc_parent(node), node);
++ domain_entry_dec(conn, node);
+
+ return 0;
+ }
+@@ -1108,7 +1107,8 @@ static struct node *create_node(struct connection *conn, const void *ctx,
+ const char *name,
+ void *data, unsigned int datalen)
+ {
+- struct node *node, *i;
++ struct node *node, *i, *j;
++ int ret;
+
+ node = construct_node(conn, ctx, name);
+ if (!node)
+@@ -1130,23 +1130,40 @@ static struct node *create_node(struct connection *conn, const void *ctx,
+ /* i->parent is set for each new node, so check quota. */
+ if (i->parent &&
+ domain_entry(conn) >= quota_nb_entry_per_domain) {
+- errno = ENOSPC;
+- return NULL;
++ ret = ENOSPC;
++ goto err;
+ }
+- if (write_node(conn, i, false))
+- return NULL;
+
+- /* Account for new node, set destructor for error case. */
+- if (i->parent) {
++ ret = write_node(conn, i, false);
++ if (ret)
++ goto err;
++
++ /* Account for new node */
++ if (i->parent)
+ domain_entry_inc(conn, i);
+- talloc_set_destructor(i, destroy_node);
+- }
+ }
+
+- /* OK, now remove destructors so they stay around */
+- for (i = node; i->parent; i = i->parent)
+- talloc_set_destructor(i, NULL);
+ return node;
++
++err:
++ /*
++ * We failed to update TDB for some of the nodes. Undo any work that
++ * have already been done.
++ */
++ for (j = node; j != i; j = j->parent)
++ destroy_node(conn, j);
++
++ /* We don't need to keep the nodes around, so free them. */
++ i = node;
++ while (i) {
++ j = i;
++ i = i->parent;
++ talloc_free(j);
++ }
++
++ errno = ret;
++
++ return NULL;
+ }
+
+ /* path, data... */
+--
+2.37.4
+
diff --git a/0045-tools-xenstore-Fail-a-transaction-if-it-is-not-possi.patch b/0045-tools-xenstore-Fail-a-transaction-if-it-is-not-possi.patch
new file mode 100644
index 0000000..4ca6c93
--- /dev/null
+++ b/0045-tools-xenstore-Fail-a-transaction-if-it-is-not-possi.patch
@@ -0,0 +1,145 @@
+From 427e86b48836a9511f57004ca367283cd85cd30f Mon Sep 17 00:00:00 2001
+From: Julien Grall <jgrall@amazon.com>
+Date: Tue, 13 Sep 2022 07:35:06 +0200
+Subject: [PATCH 45/87] tools/xenstore: Fail a transaction if it is not
+ possible to create a node
+
+Commit f2bebf72c4d5 "xenstore: rework of transaction handling" moved
+out from copying the entire database everytime a new transaction is
+opened to track the list of nodes changed.
+
+The content of all the nodes accessed during a transaction will be
+temporarily stored in TDB using a different key.
+
+The function create_node() may write/update multiple nodes if the child
+doesn't exist. In case of a failure, the function will revert any
+changes (this includes any update to TDB). Unfortunately, the function
+which reverts the changes (i.e. destroy_node()) will not use the correct
+key to delete any update or even request the transaction to fail.
+
+This means that if a client decides to go ahead with committing the
+transaction, orphan nodes will be created because they were not linked
+to an existing node (create_node() will write the nodes backwards).
+
+Once some nodes have been partially updated in a transaction, it is not
+easily possible to undo any changes. So rather than continuing and hit
+weird issue while committing, it is much saner to fail the transaction.
+
+This will have an impact on any client that decides to commit even if it
+can't write a node, although it is not clear why a normal client would
+want to do that...
+
+Lastly, update destroy_node() to use the correct key for deleting the
+node. Rather than recreating it (this will allocate memory and
+therefore fail), stash the key in the structure node.
+
+This is XSA-415 / CVE-2022-42310.
+
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+(cherry picked from commit 5d71766bd1a4a3a8b2fe952ca2be80e02fe48f34)
+---
+ tools/xenstore/xenstored_core.c | 23 +++++++++++++++--------
+ tools/xenstore/xenstored_core.h | 2 ++
+ tools/xenstore/xenstored_transaction.c | 5 +++++
+ tools/xenstore/xenstored_transaction.h | 3 +++
+ 4 files changed, 25 insertions(+), 8 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 29947c3020c3..e9c9695fd16e 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -566,15 +566,17 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+ return 0;
+ }
+
++/*
++ * Write the node. If the node is written, caller can find the key used in
++ * node->key. This can later be used if the change needs to be reverted.
++ */
+ static int write_node(struct connection *conn, struct node *node,
+ bool no_quota_check)
+ {
+- TDB_DATA key;
+-
+- if (access_node(conn, node, NODE_ACCESS_WRITE, &key))
++ if (access_node(conn, node, NODE_ACCESS_WRITE, &node->key))
+ return errno;
+
+- return write_node_raw(conn, &key, node, no_quota_check);
++ return write_node_raw(conn, &node->key, node, no_quota_check);
+ }
+
+ unsigned int perm_for_conn(struct connection *conn,
+@@ -1090,16 +1092,21 @@ nomem:
+
+ static int destroy_node(struct connection *conn, struct node *node)
+ {
+- TDB_DATA key;
+-
+ if (streq(node->name, "/"))
+ corrupt(NULL, "Destroying root node!");
+
+- set_tdb_key(node->name, &key);
+- tdb_delete(tdb_ctx, key);
++ tdb_delete(tdb_ctx, node->key);
+
+ domain_entry_dec(conn, node);
+
++ /*
++ * It is not possible to easily revert the changes in a transaction.
++ * So if the failure happens in a transaction, mark it as fail to
++ * prevent any commit.
++ */
++ if ( conn->transaction )
++ fail_transaction(conn->transaction);
++
+ return 0;
+ }
+
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 07d861d92499..0004fa848c83 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -155,6 +155,8 @@ struct node_perms {
+
+ struct node {
+ const char *name;
++ /* Key used to update TDB */
++ TDB_DATA key;
+
+ /* Parent (optional) */
+ struct node *parent;
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index cd07fb0f218b..faf6c930e42a 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -580,6 +580,11 @@ void transaction_entry_dec(struct transaction *trans, unsigned int domid)
+ list_add_tail(&d->list, &trans->changed_domains);
+ }
+
++void fail_transaction(struct transaction *trans)
++{
++ trans->fail = true;
++}
++
+ void conn_delete_all_transactions(struct connection *conn)
+ {
+ struct transaction *trans;
+diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
+index 43a162bea3f3..14062730e3c9 100644
+--- a/tools/xenstore/xenstored_transaction.h
++++ b/tools/xenstore/xenstored_transaction.h
+@@ -46,6 +46,9 @@ int access_node(struct connection *conn, struct node *node,
+ int transaction_prepend(struct connection *conn, const char *name,
+ TDB_DATA *key);
+
++/* Mark the transaction as failed. This will prevent it to be committed. */
++void fail_transaction(struct transaction *trans);
++
+ void conn_delete_all_transactions(struct connection *conn);
+ int check_transactions(struct hashtable *hash);
+
+--
+2.37.4
+
diff --git a/0046-tools-xenstore-split-up-send_reply.patch b/0046-tools-xenstore-split-up-send_reply.patch
new file mode 100644
index 0000000..7af249a
--- /dev/null
+++ b/0046-tools-xenstore-split-up-send_reply.patch
@@ -0,0 +1,213 @@
+From ce6aea73f6c4c90fab2500933b3a488e2f30334b Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:07 +0200
+Subject: [PATCH 46/87] tools/xenstore: split up send_reply()
+
+Today send_reply() is used for both normal request replies and watch
+events.
+
+Split it up into send_reply() and send_event(). This will be used to
+add some event specific handling.
+
+add_event() can be merged into send_event(), removing the need for an
+intermediate memory allocation.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 9bfde319dbac2a1321898d2f75a3f075c3eb7b32)
+---
+ tools/xenstore/xenstored_core.c | 74 +++++++++++++++++++-------------
+ tools/xenstore/xenstored_core.h | 1 +
+ tools/xenstore/xenstored_watch.c | 39 +++--------------
+ 3 files changed, 52 insertions(+), 62 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index e9c9695fd16e..249ad5ec6fb1 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -767,49 +767,32 @@ static void send_error(struct connection *conn, int error)
+ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ const void *data, unsigned int len)
+ {
+- struct buffered_data *bdata;
++ struct buffered_data *bdata = conn->in;
++
++ assert(type != XS_WATCH_EVENT);
+
+ if ( len > XENSTORE_PAYLOAD_MAX ) {
+ send_error(conn, E2BIG);
+ return;
+ }
+
+- /* Replies reuse the request buffer, events need a new one. */
+- if (type != XS_WATCH_EVENT) {
+- bdata = conn->in;
+- /* Drop asynchronous responses, e.g. errors for watch events. */
+- if (!bdata)
+- return;
+- bdata->inhdr = true;
+- bdata->used = 0;
+- conn->in = NULL;
+- } else {
+- /* Message is a child of the connection for auto-cleanup. */
+- bdata = new_buffer(conn);
++ if (!bdata)
++ return;
++ bdata->inhdr = true;
++ bdata->used = 0;
+
+- /*
+- * Allocation failure here is unfortunate: we have no way to
+- * tell anybody about it.
+- */
+- if (!bdata)
+- return;
+- }
+ if (len <= DEFAULT_BUFFER_SIZE)
+ bdata->buffer = bdata->default_buffer;
+- else
++ else {
+ bdata->buffer = talloc_array(bdata, char, len);
+- if (!bdata->buffer) {
+- if (type == XS_WATCH_EVENT) {
+- /* Same as above: no way to tell someone. */
+- talloc_free(bdata);
++ if (!bdata->buffer) {
++ send_error(conn, ENOMEM);
+ return;
+ }
+- /* re-establish request buffer for sending ENOMEM. */
+- conn->in = bdata;
+- send_error(conn, ENOMEM);
+- return;
+ }
+
++ conn->in = NULL;
++
+ /* Update relevant header fields and fill in the message body. */
+ bdata->hdr.msg.type = type;
+ bdata->hdr.msg.len = len;
+@@ -817,8 +800,39 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+
+ /* Queue for later transmission. */
+ list_add_tail(&bdata->list, &conn->out_list);
++}
+
+- return;
++/*
++ * Send a watch event.
++ * As this is not directly related to the current command, errors can't be
++ * reported.
++ */
++void send_event(struct connection *conn, const char *path, const char *token)
++{
++ struct buffered_data *bdata;
++ unsigned int len;
++
++ len = strlen(path) + 1 + strlen(token) + 1;
++ /* Don't try to send over-long events. */
++ if (len > XENSTORE_PAYLOAD_MAX)
++ return;
++
++ bdata = new_buffer(conn);
++ if (!bdata)
++ return;
++
++ bdata->buffer = talloc_array(bdata, char, len);
++ if (!bdata->buffer) {
++ talloc_free(bdata);
++ return;
++ }
++ strcpy(bdata->buffer, path);
++ strcpy(bdata->buffer + strlen(path) + 1, token);
++ bdata->hdr.msg.type = XS_WATCH_EVENT;
++ bdata->hdr.msg.len = len;
++
++ /* Queue for later transmission. */
++ list_add_tail(&bdata->list, &conn->out_list);
+ }
+
+ /* Some routines (write, mkdir, etc) just need a non-error return */
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 0004fa848c83..9af9af4390bd 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -187,6 +187,7 @@ unsigned int get_string(const struct buffered_data *data, unsigned int offset);
+
+ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ const void *data, unsigned int len);
++void send_event(struct connection *conn, const char *path, const char *token);
+
+ /* Some routines (write, mkdir, etc) just need a non-error return */
+ void send_ack(struct connection *conn, enum xsd_sockmsg_type type);
+diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
+index aca0a71bada1..99a2c266b28a 100644
+--- a/tools/xenstore/xenstored_watch.c
++++ b/tools/xenstore/xenstored_watch.c
+@@ -85,35 +85,6 @@ static const char *get_watch_path(const struct watch *watch, const char *name)
+ return path;
+ }
+
+-/*
+- * Send a watch event.
+- * Temporary memory allocations are done with ctx.
+- */
+-static void add_event(struct connection *conn,
+- const void *ctx,
+- struct watch *watch,
+- const char *name)
+-{
+- /* Data to send (node\0token\0). */
+- unsigned int len;
+- char *data;
+-
+- name = get_watch_path(watch, name);
+-
+- len = strlen(name) + 1 + strlen(watch->token) + 1;
+- /* Don't try to send over-long events. */
+- if (len > XENSTORE_PAYLOAD_MAX)
+- return;
+-
+- data = talloc_array(ctx, char, len);
+- if (!data)
+- return;
+- strcpy(data, name);
+- strcpy(data + strlen(name) + 1, watch->token);
+- send_reply(conn, XS_WATCH_EVENT, data, len);
+- talloc_free(data);
+-}
+-
+ /*
+ * Check permissions of a specific watch to fire:
+ * Either the node itself or its parent have to be readable by the connection
+@@ -190,10 +161,14 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
+ list_for_each_entry(watch, &i->watches, list) {
+ if (exact) {
+ if (streq(name, watch->node))
+- add_event(i, ctx, watch, name);
++ send_event(i,
++ get_watch_path(watch, name),
++ watch->token);
+ } else {
+ if (is_child(name, watch->node))
+- add_event(i, ctx, watch, name);
++ send_event(i,
++ get_watch_path(watch, name),
++ watch->token);
+ }
+ }
+ }
+@@ -292,7 +267,7 @@ int do_watch(struct connection *conn, struct buffered_data *in)
+ send_ack(conn, XS_WATCH);
+
+ /* We fire once up front: simplifies clients and restart. */
+- add_event(conn, in, watch, watch->node);
++ send_event(conn, get_watch_path(watch, watch->node), watch->token);
+
+ return 0;
+ }
+--
+2.37.4
+
diff --git a/0047-tools-xenstore-add-helpers-to-free-struct-buffered_d.patch b/0047-tools-xenstore-add-helpers-to-free-struct-buffered_d.patch
new file mode 100644
index 0000000..96ba7bd
--- /dev/null
+++ b/0047-tools-xenstore-add-helpers-to-free-struct-buffered_d.patch
@@ -0,0 +1,117 @@
+From f8af1a27b00e373bfb5f5e61b14c51165a740fa4 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:07 +0200
+Subject: [PATCH 47/87] tools/xenstore: add helpers to free struct
+ buffered_data
+
+Add two helpers for freeing struct buffered_data: free_buffered_data()
+for freeing one instance and conn_free_buffered_data() for freeing all
+instances for a connection.
+
+This is avoiding duplicated code and will help later when more actions
+are needed when freeing a struct buffered_data.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit ead062a68a9c201a95488e84750a70a107f7b317)
+---
+ tools/xenstore/xenstored_core.c | 26 +++++++++++++++++---------
+ tools/xenstore/xenstored_core.h | 2 ++
+ tools/xenstore/xenstored_domain.c | 7 +------
+ 3 files changed, 20 insertions(+), 15 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 249ad5ec6fb1..527a1ebdeded 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -211,6 +211,21 @@ void reopen_log(void)
+ }
+ }
+
++static void free_buffered_data(struct buffered_data *out,
++ struct connection *conn)
++{
++ list_del(&out->list);
++ talloc_free(out);
++}
++
++void conn_free_buffered_data(struct connection *conn)
++{
++ struct buffered_data *out;
++
++ while ((out = list_top(&conn->out_list, struct buffered_data, list)))
++ free_buffered_data(out, conn);
++}
++
+ static bool write_messages(struct connection *conn)
+ {
+ int ret;
+@@ -254,8 +269,7 @@ static bool write_messages(struct connection *conn)
+
+ trace_io(conn, out, 1);
+
+- list_del(&out->list);
+- talloc_free(out);
++ free_buffered_data(out, conn);
+
+ return true;
+ }
+@@ -1506,18 +1520,12 @@ static struct {
+ */
+ void ignore_connection(struct connection *conn)
+ {
+- struct buffered_data *out, *tmp;
+-
+ trace("CONN %p ignored\n", conn);
+
+ conn->is_ignored = true;
+ conn_delete_all_watches(conn);
+ conn_delete_all_transactions(conn);
+-
+- list_for_each_entry_safe(out, tmp, &conn->out_list, list) {
+- list_del(&out->list);
+- talloc_free(out);
+- }
++ conn_free_buffered_data(conn);
+
+ talloc_free(conn->in);
+ conn->in = NULL;
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 9af9af4390bd..e7ee87825c3b 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -276,6 +276,8 @@ int remember_string(struct hashtable *hash, const char *str);
+
+ void set_tdb_key(const char *name, TDB_DATA *key);
+
++void conn_free_buffered_data(struct connection *conn);
++
+ const char *dump_state_global(FILE *fp);
+ const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
+ struct xs_state_connection *sc);
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index d03c7d93a9e7..93c4c1edcdd1 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -411,15 +411,10 @@ static struct domain *find_domain_by_domid(unsigned int domid)
+ static void domain_conn_reset(struct domain *domain)
+ {
+ struct connection *conn = domain->conn;
+- struct buffered_data *out;
+
+ conn_delete_all_watches(conn);
+ conn_delete_all_transactions(conn);
+-
+- while ((out = list_top(&conn->out_list, struct buffered_data, list))) {
+- list_del(&out->list);
+- talloc_free(out);
+- }
++ conn_free_buffered_data(conn);
+
+ talloc_free(conn->in);
+
+--
+2.37.4
+
diff --git a/0048-tools-xenstore-reduce-number-of-watch-events.patch b/0048-tools-xenstore-reduce-number-of-watch-events.patch
new file mode 100644
index 0000000..3a080fb
--- /dev/null
+++ b/0048-tools-xenstore-reduce-number-of-watch-events.patch
@@ -0,0 +1,201 @@
+From e26d6f4d1b389b859fb5a6570421e80e0213f92b Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:07 +0200
+Subject: [PATCH 48/87] tools/xenstore: reduce number of watch events
+
+When removing a watched node outside of a transaction, two watch events
+are being produced instead of just a single one.
+
+When finalizing a transaction watch events can be generated for each
+node which is being modified, even if outside a transaction such
+modifications might not have resulted in a watch event.
+
+This happens e.g.:
+
+- for nodes which are only modified due to added/removed child entries
+- for nodes being removed or created implicitly (e.g. creation of a/b/c
+ is implicitly creating a/b, resulting in watch events for a, a/b and
+ a/b/c instead of a/b/c only)
+
+Avoid these additional watch events, in order to reduce the needed
+memory inside Xenstore for queueing them.
+
+This is being achieved by adding event flags to struct accessed_node
+specifying whether an event should be triggered, and whether it should
+be an exact match of the modified path. Both flags can be set from
+fire_watches() instead of implying them only.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 3a96013a3e17baa07410b1b9776225d1d9a74297)
+---
+ tools/xenstore/xenstored_core.c | 19 ++++++------
+ tools/xenstore/xenstored_transaction.c | 41 +++++++++++++++++++++-----
+ tools/xenstore/xenstored_transaction.h | 3 ++
+ tools/xenstore/xenstored_watch.c | 7 +++--
+ 4 files changed, 51 insertions(+), 19 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 527a1ebdeded..bf2243873901 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1295,7 +1295,7 @@ static void delete_child(struct connection *conn,
+ }
+
+ static int delete_node(struct connection *conn, const void *ctx,
+- struct node *parent, struct node *node)
++ struct node *parent, struct node *node, bool watch_exact)
+ {
+ char *name;
+
+@@ -1307,7 +1307,7 @@ static int delete_node(struct connection *conn, const void *ctx,
+ node->children);
+ child = name ? read_node(conn, node, name) : NULL;
+ if (child) {
+- if (delete_node(conn, ctx, node, child))
++ if (delete_node(conn, ctx, node, child, true))
+ return errno;
+ } else {
+ trace("delete_node: Error deleting child '%s/%s'!\n",
+@@ -1319,7 +1319,12 @@ static int delete_node(struct connection *conn, const void *ctx,
+ talloc_free(name);
+ }
+
+- fire_watches(conn, ctx, node->name, node, true, NULL);
++ /*
++ * Fire the watches now, when we can still see the node permissions.
++ * This fine as we are single threaded and the next possible read will
++ * be handled only after the node has been really removed.
++ */
++ fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
+ delete_node_single(conn, node);
+ delete_child(conn, parent, basename(node->name));
+ talloc_free(node);
+@@ -1345,13 +1350,7 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
+ return (errno == ENOMEM) ? ENOMEM : EINVAL;
+ node->parent = parent;
+
+- /*
+- * Fire the watches now, when we can still see the node permissions.
+- * This fine as we are single threaded and the next possible read will
+- * be handled only after the node has been really removed.
+- */
+- fire_watches(conn, ctx, name, node, false, NULL);
+- return delete_node(conn, ctx, parent, node);
++ return delete_node(conn, ctx, parent, node, false);
+ }
+
+
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index faf6c930e42a..54432907fc76 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -130,6 +130,10 @@ struct accessed_node
+
+ /* Transaction node in data base? */
+ bool ta_node;
++
++ /* Watch event flags. */
++ bool fire_watch;
++ bool watch_exact;
+ };
+
+ struct changed_domain
+@@ -323,6 +327,29 @@ err:
+ return ret;
+ }
+
++/*
++ * A watch event should be fired for a node modified inside a transaction.
++ * Set the corresponding information. A non-exact event is replacing an exact
++ * one, but not the other way round.
++ */
++void queue_watches(struct connection *conn, const char *name, bool watch_exact)
++{
++ struct accessed_node *i;
++
++ i = find_accessed_node(conn->transaction, name);
++ if (!i) {
++ conn->transaction->fail = true;
++ return;
++ }
++
++ if (!i->fire_watch) {
++ i->fire_watch = true;
++ i->watch_exact = watch_exact;
++ } else if (!watch_exact) {
++ i->watch_exact = false;
++ }
++}
++
+ /*
+ * Finalize transaction:
+ * Walk through accessed nodes and check generation against global data.
+@@ -377,15 +404,15 @@ static int finalize_transaction(struct connection *conn,
+ ret = tdb_store(tdb_ctx, key, data,
+ TDB_REPLACE);
+ talloc_free(data.dptr);
+- if (ret)
+- goto err;
+- fire_watches(conn, trans, i->node, NULL, false,
+- i->perms.p ? &i->perms : NULL);
+ } else {
+- fire_watches(conn, trans, i->node, NULL, false,
++ ret = tdb_delete(tdb_ctx, key);
++ }
++ if (ret)
++ goto err;
++ if (i->fire_watch) {
++ fire_watches(conn, trans, i->node, NULL,
++ i->watch_exact,
+ i->perms.p ? &i->perms : NULL);
+- if (tdb_delete(tdb_ctx, key))
+- goto err;
+ }
+ }
+
+diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
+index 14062730e3c9..0093cac807e3 100644
+--- a/tools/xenstore/xenstored_transaction.h
++++ b/tools/xenstore/xenstored_transaction.h
+@@ -42,6 +42,9 @@ void transaction_entry_dec(struct transaction *trans, unsigned int domid);
+ int access_node(struct connection *conn, struct node *node,
+ enum node_access_type type, TDB_DATA *key);
+
++/* Queue watches for a modified node. */
++void queue_watches(struct connection *conn, const char *name, bool watch_exact);
++
+ /* Prepend the transaction to name if appropriate. */
+ int transaction_prepend(struct connection *conn, const char *name,
+ TDB_DATA *key);
+diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
+index 99a2c266b28a..205d9d8ea116 100644
+--- a/tools/xenstore/xenstored_watch.c
++++ b/tools/xenstore/xenstored_watch.c
+@@ -29,6 +29,7 @@
+ #include "xenstore_lib.h"
+ #include "utils.h"
+ #include "xenstored_domain.h"
++#include "xenstored_transaction.h"
+
+ extern int quota_nb_watch_per_domain;
+
+@@ -143,9 +144,11 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
+ struct connection *i;
+ struct watch *watch;
+
+- /* During transactions, don't fire watches. */
+- if (conn && conn->transaction)
++ /* During transactions, don't fire watches, but queue them. */
++ if (conn && conn->transaction) {
++ queue_watches(conn, name, exact);
+ return;
++ }
+
+ /* Create an event for each watch. */
+ list_for_each_entry(i, &connections, list) {
+--
+2.37.4
+
diff --git a/0049-tools-xenstore-let-unread-watch-events-time-out.patch b/0049-tools-xenstore-let-unread-watch-events-time-out.patch
new file mode 100644
index 0000000..dab0861
--- /dev/null
+++ b/0049-tools-xenstore-let-unread-watch-events-time-out.patch
@@ -0,0 +1,309 @@
+From d08cdf0b19daf948a6b9754e90de9bc304bcd262 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:07 +0200
+Subject: [PATCH 49/87] tools/xenstore: let unread watch events time out
+
+A future modification will limit the number of outstanding requests
+for a domain, where "outstanding" means that the response of the
+request or any resulting watch event hasn't been consumed yet.
+
+In order to avoid a malicious guest being able to block other guests
+by not reading watch events, add a timeout for watch events. In case a
+watch event hasn't been consumed after this timeout, it is being
+deleted. Set the default timeout to 20 seconds (a random value being
+not too high).
+
+In order to support to specify other timeout values in future, use a
+generic command line option for that purpose:
+
+--timeout|-w watch-event=<seconds>
+
+This is part of XSA-326 / CVE-2022-42311.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 5285dcb1a5c01695c11e6397c95d906b5e765c98)
+---
+ tools/xenstore/xenstored_core.c | 133 +++++++++++++++++++++++++++++++-
+ tools/xenstore/xenstored_core.h | 6 ++
+ 2 files changed, 138 insertions(+), 1 deletion(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index bf2243873901..45244c021cd3 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -108,6 +108,8 @@ int quota_max_transaction = 10;
+ int quota_nb_perms_per_node = 5;
+ int quota_max_path_len = XENSTORE_REL_PATH_MAX;
+
++unsigned int timeout_watch_event_msec = 20000;
++
+ void trace(const char *fmt, ...)
+ {
+ va_list arglist;
+@@ -211,19 +213,92 @@ void reopen_log(void)
+ }
+ }
+
++static uint64_t get_now_msec(void)
++{
++ struct timespec now_ts;
++
++ if (clock_gettime(CLOCK_MONOTONIC, &now_ts))
++ barf_perror("Could not find time (clock_gettime failed)");
++
++ return now_ts.tv_sec * 1000 + now_ts.tv_nsec / 1000000;
++}
++
+ static void free_buffered_data(struct buffered_data *out,
+ struct connection *conn)
+ {
++ struct buffered_data *req;
++
+ list_del(&out->list);
++
++ /*
++ * Update conn->timeout_msec with the next found timeout value in the
++ * queued pending requests.
++ */
++ if (out->timeout_msec) {
++ conn->timeout_msec = 0;
++ list_for_each_entry(req, &conn->out_list, list) {
++ if (req->timeout_msec) {
++ conn->timeout_msec = req->timeout_msec;
++ break;
++ }
++ }
++ }
++
+ talloc_free(out);
+ }
+
++static void check_event_timeout(struct connection *conn, uint64_t msecs,
++ int *ptimeout)
++{
++ uint64_t delta;
++ struct buffered_data *out, *tmp;
++
++ if (!conn->timeout_msec)
++ return;
++
++ delta = conn->timeout_msec - msecs;
++ if (conn->timeout_msec <= msecs) {
++ delta = 0;
++ list_for_each_entry_safe(out, tmp, &conn->out_list, list) {
++ /*
++ * Only look at buffers with timeout and no data
++ * already written to the ring.
++ */
++ if (out->timeout_msec && out->inhdr && !out->used) {
++ if (out->timeout_msec > msecs) {
++ conn->timeout_msec = out->timeout_msec;
++ delta = conn->timeout_msec - msecs;
++ break;
++ }
++
++ /*
++ * Free out without updating conn->timeout_msec,
++ * as the update is done in this loop already.
++ */
++ out->timeout_msec = 0;
++ trace("watch event path %s for domain %u timed out\n",
++ out->buffer, conn->id);
++ free_buffered_data(out, conn);
++ }
++ }
++ if (!delta) {
++ conn->timeout_msec = 0;
++ return;
++ }
++ }
++
++ if (*ptimeout == -1 || *ptimeout > delta)
++ *ptimeout = delta;
++}
++
+ void conn_free_buffered_data(struct connection *conn)
+ {
+ struct buffered_data *out;
+
+ while ((out = list_top(&conn->out_list, struct buffered_data, list)))
+ free_buffered_data(out, conn);
++
++ conn->timeout_msec = 0;
+ }
+
+ static bool write_messages(struct connection *conn)
+@@ -411,6 +486,7 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
+ {
+ struct connection *conn;
+ struct wrl_timestampt now;
++ uint64_t msecs;
+
+ if (fds)
+ memset(fds, 0, sizeof(struct pollfd) * current_array_size);
+@@ -431,10 +507,12 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
+
+ wrl_gettime_now(&now);
+ wrl_log_periodic(now);
++ msecs = get_now_msec();
+
+ list_for_each_entry(conn, &connections, list) {
+ if (conn->domain) {
+ wrl_check_timeout(conn->domain, now, ptimeout);
++ check_event_timeout(conn, msecs, ptimeout);
+ if (conn_can_read(conn) ||
+ (conn_can_write(conn) &&
+ !list_empty(&conn->out_list)))
+@@ -794,6 +872,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ return;
+ bdata->inhdr = true;
+ bdata->used = 0;
++ bdata->timeout_msec = 0;
+
+ if (len <= DEFAULT_BUFFER_SIZE)
+ bdata->buffer = bdata->default_buffer;
+@@ -845,6 +924,12 @@ void send_event(struct connection *conn, const char *path, const char *token)
+ bdata->hdr.msg.type = XS_WATCH_EVENT;
+ bdata->hdr.msg.len = len;
+
++ if (timeout_watch_event_msec && domain_is_unprivileged(conn)) {
++ bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
++ if (!conn->timeout_msec)
++ conn->timeout_msec = bdata->timeout_msec;
++ }
++
+ /* Queue for later transmission. */
+ list_add_tail(&bdata->list, &conn->out_list);
+ }
+@@ -2201,6 +2286,9 @@ static void usage(void)
+ " -t, --transaction <nb> limit the number of transaction allowed per domain,\n"
+ " -A, --perm-nb <nb> limit the number of permissions per node,\n"
+ " -M, --path-max <chars> limit the allowed Xenstore node path length,\n"
++" -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
++" allowed timeout candidates are:\n"
++" watch-event: time a watch-event is kept pending\n"
+ " -R, --no-recovery to request that no recovery should be attempted when\n"
+ " the store is corrupted (debug only),\n"
+ " -I, --internal-db store database in memory, not on disk\n"
+@@ -2223,6 +2311,7 @@ static struct option options[] = {
+ { "transaction", 1, NULL, 't' },
+ { "perm-nb", 1, NULL, 'A' },
+ { "path-max", 1, NULL, 'M' },
++ { "timeout", 1, NULL, 'w' },
+ { "no-recovery", 0, NULL, 'R' },
+ { "internal-db", 0, NULL, 'I' },
+ { "verbose", 0, NULL, 'V' },
+@@ -2236,6 +2325,39 @@ int dom0_domid = 0;
+ int dom0_event = 0;
+ int priv_domid = 0;
+
++static int get_optval_int(const char *arg)
++{
++ char *end;
++ long val;
++
++ val = strtol(arg, &end, 10);
++ if (!*arg || *end || val < 0 || val > INT_MAX)
++ barf("invalid parameter value \"%s\"\n", arg);
++
++ return val;
++}
++
++static bool what_matches(const char *arg, const char *what)
++{
++ unsigned int what_len = strlen(what);
++
++ return !strncmp(arg, what, what_len) && arg[what_len] == '=';
++}
++
++static void set_timeout(const char *arg)
++{
++ const char *eq = strchr(arg, '=');
++ int val;
++
++ if (!eq)
++ barf("quotas must be specified via <what>=<seconds>\n");
++ val = get_optval_int(eq + 1);
++ if (what_matches(arg, "watch-event"))
++ timeout_watch_event_msec = val * 1000;
++ else
++ barf("unknown timeout \"%s\"\n", arg);
++}
++
+ int main(int argc, char *argv[])
+ {
+ int opt;
+@@ -2250,7 +2372,7 @@ int main(int argc, char *argv[])
+ orig_argc = argc;
+ orig_argv = argv;
+
+- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:T:RVW:U", options,
++ while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:T:RVW:w:U", options,
+ NULL)) != -1) {
+ switch (opt) {
+ case 'D':
+@@ -2300,6 +2422,9 @@ int main(int argc, char *argv[])
+ quota_max_path_len = min(XENSTORE_REL_PATH_MAX,
+ quota_max_path_len);
+ break;
++ case 'w':
++ set_timeout(optarg);
++ break;
+ case 'e':
+ dom0_event = strtol(optarg, NULL, 10);
+ break;
+@@ -2741,6 +2866,12 @@ static void add_buffered_data(struct buffered_data *bdata,
+ barf("error restoring buffered data");
+
+ memcpy(bdata->buffer, data, len);
++ if (bdata->hdr.msg.type == XS_WATCH_EVENT && timeout_watch_event_msec &&
++ domain_is_unprivileged(conn)) {
++ bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
++ if (!conn->timeout_msec)
++ conn->timeout_msec = bdata->timeout_msec;
++ }
+
+ /* Queue for later transmission. */
+ list_add_tail(&bdata->list, &conn->out_list);
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index e7ee87825c3b..8a81fc693f01 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -27,6 +27,7 @@
+ #include <fcntl.h>
+ #include <stdbool.h>
+ #include <stdint.h>
++#include <time.h>
+ #include <errno.h>
+
+ #include "xenstore_lib.h"
+@@ -67,6 +68,8 @@ struct buffered_data
+ char raw[sizeof(struct xsd_sockmsg)];
+ } hdr;
+
++ uint64_t timeout_msec;
++
+ /* The actual data. */
+ char *buffer;
+ char default_buffer[DEFAULT_BUFFER_SIZE];
+@@ -118,6 +121,7 @@ struct connection
+
+ /* Buffered output data */
+ struct list_head out_list;
++ uint64_t timeout_msec;
+
+ /* Transaction context for current request (NULL if none). */
+ struct transaction *transaction;
+@@ -244,6 +248,8 @@ extern int dom0_event;
+ extern int priv_domid;
+ extern int quota_nb_entry_per_domain;
+
++extern unsigned int timeout_watch_event_msec;
++
+ /* Map the kernel's xenstore page. */
+ void *xenbus_map(void);
+ void unmap_xenbus(void *interface);
+--
+2.37.4
+
diff --git a/0050-tools-xenstore-limit-outstanding-requests.patch b/0050-tools-xenstore-limit-outstanding-requests.patch
new file mode 100644
index 0000000..bb10180
--- /dev/null
+++ b/0050-tools-xenstore-limit-outstanding-requests.patch
@@ -0,0 +1,453 @@
+From 49344fb86ff040bae1107e236592c2d4dc4607f3 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:08 +0200
+Subject: [PATCH 50/87] tools/xenstore: limit outstanding requests
+
+Add another quota for limiting the number of outstanding requests of a
+guest. As the way to specify quotas on the command line is becoming
+rather nasty, switch to a new scheme using [--quota|-Q] <what>=<val>
+allowing more quotas to be added easily in the future.
+
+Set the default value to 20 (basically a random value not seeming to
+be too high or too low).
+
+A request is said to be outstanding if any message generated by this
+request (the direct response plus potential watch events) is not yet
+completely stored into a ring buffer. The initial watch event sent as
+a result of registering a watch is an exception.
+
+Note that across a live update the relation to buffered watch events
+for other domains is lost.
+
+Use talloc_zero() for allocating the domain structure in order to have
+all per-domain quota zeroed initially.
+
+This is part of XSA-326 / CVE-2022-42312.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 36de433a273f55d614c83b89c9a8972287a1e475)
+---
+ tools/xenstore/xenstored_core.c | 88 +++++++++++++++++++++++++++++--
+ tools/xenstore/xenstored_core.h | 20 ++++++-
+ tools/xenstore/xenstored_domain.c | 38 ++++++++++---
+ tools/xenstore/xenstored_domain.h | 3 ++
+ tools/xenstore/xenstored_watch.c | 15 ++++--
+ 5 files changed, 150 insertions(+), 14 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 45244c021cd3..488d540f3a32 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -107,6 +107,7 @@ int quota_max_entry_size = 2048; /* 2K */
+ int quota_max_transaction = 10;
+ int quota_nb_perms_per_node = 5;
+ int quota_max_path_len = XENSTORE_REL_PATH_MAX;
++int quota_req_outstanding = 20;
+
+ unsigned int timeout_watch_event_msec = 20000;
+
+@@ -223,12 +224,24 @@ static uint64_t get_now_msec(void)
+ return now_ts.tv_sec * 1000 + now_ts.tv_nsec / 1000000;
+ }
+
++/*
++ * Remove a struct buffered_data from the list of outgoing data.
++ * A struct buffered_data related to a request having caused watch events to be
++ * sent is kept until all those events have been written out.
++ * Each watch event is referencing the related request via pend.req, while the
++ * number of watch events caused by a request is kept in pend.ref.event_cnt
++ * (those two cases are mutually exclusive, so the two fields can share memory
++ * via a union).
++ * The struct buffered_data is freed only if no related watch event is
++ * referencing it. The related return data can be freed right away.
++ */
+ static void free_buffered_data(struct buffered_data *out,
+ struct connection *conn)
+ {
+ struct buffered_data *req;
+
+ list_del(&out->list);
++ out->on_out_list = false;
+
+ /*
+ * Update conn->timeout_msec with the next found timeout value in the
+@@ -244,6 +257,30 @@ static void free_buffered_data(struct buffered_data *out,
+ }
+ }
+
++ if (out->hdr.msg.type == XS_WATCH_EVENT) {
++ req = out->pend.req;
++ if (req) {
++ req->pend.ref.event_cnt--;
++ if (!req->pend.ref.event_cnt && !req->on_out_list) {
++ if (req->on_ref_list) {
++ domain_outstanding_domid_dec(
++ req->pend.ref.domid);
++ list_del(&req->list);
++ }
++ talloc_free(req);
++ }
++ }
++ } else if (out->pend.ref.event_cnt) {
++ /* Hang out off from conn. */
++ talloc_steal(NULL, out);
++ if (out->buffer != out->default_buffer)
++ talloc_free(out->buffer);
++ list_add(&out->list, &conn->ref_list);
++ out->on_ref_list = true;
++ return;
++ } else
++ domain_outstanding_dec(conn);
++
+ talloc_free(out);
+ }
+
+@@ -405,6 +442,7 @@ int delay_request(struct connection *conn, struct buffered_data *in,
+ static int destroy_conn(void *_conn)
+ {
+ struct connection *conn = _conn;
++ struct buffered_data *req;
+
+ /* Flush outgoing if possible, but don't block. */
+ if (!conn->domain) {
+@@ -418,6 +456,11 @@ static int destroy_conn(void *_conn)
+ break;
+ close(conn->fd);
+ }
++
++ conn_free_buffered_data(conn);
++ list_for_each_entry(req, &conn->ref_list, list)
++ req->on_ref_list = false;
++
+ if (conn->target)
+ talloc_unlink(conn, conn->target);
+ list_del(&conn->list);
+@@ -893,6 +936,8 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+
+ /* Queue for later transmission. */
+ list_add_tail(&bdata->list, &conn->out_list);
++ bdata->on_out_list = true;
++ domain_outstanding_inc(conn);
+ }
+
+ /*
+@@ -900,7 +945,8 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ * As this is not directly related to the current command, errors can't be
+ * reported.
+ */
+-void send_event(struct connection *conn, const char *path, const char *token)
++void send_event(struct buffered_data *req, struct connection *conn,
++ const char *path, const char *token)
+ {
+ struct buffered_data *bdata;
+ unsigned int len;
+@@ -930,8 +976,13 @@ void send_event(struct connection *conn, const char *path, const char *token)
+ conn->timeout_msec = bdata->timeout_msec;
+ }
+
++ bdata->pend.req = req;
++ if (req)
++ req->pend.ref.event_cnt++;
++
+ /* Queue for later transmission. */
+ list_add_tail(&bdata->list, &conn->out_list);
++ bdata->on_out_list = true;
+ }
+
+ /* Some routines (write, mkdir, etc) just need a non-error return */
+@@ -1740,6 +1791,7 @@ static void handle_input(struct connection *conn)
+ return;
+ }
+ in = conn->in;
++ in->pend.ref.domid = conn->id;
+
+ /* Not finished header yet? */
+ if (in->inhdr) {
+@@ -1808,6 +1860,7 @@ struct connection *new_connection(const struct interface_funcs *funcs)
+ new->is_stalled = false;
+ new->transaction_started = 0;
+ INIT_LIST_HEAD(&new->out_list);
++ INIT_LIST_HEAD(&new->ref_list);
+ INIT_LIST_HEAD(&new->watches);
+ INIT_LIST_HEAD(&new->transaction_list);
+ INIT_LIST_HEAD(&new->delayed);
+@@ -2286,6 +2339,9 @@ static void usage(void)
+ " -t, --transaction <nb> limit the number of transaction allowed per domain,\n"
+ " -A, --perm-nb <nb> limit the number of permissions per node,\n"
+ " -M, --path-max <chars> limit the allowed Xenstore node path length,\n"
++" -Q, --quota <what>=<nb> set the quota <what> to the value <nb>, allowed\n"
++" quotas are:\n"
++" outstanding: number of outstanding requests\n"
+ " -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
+ " allowed timeout candidates are:\n"
+ " watch-event: time a watch-event is kept pending\n"
+@@ -2311,6 +2367,7 @@ static struct option options[] = {
+ { "transaction", 1, NULL, 't' },
+ { "perm-nb", 1, NULL, 'A' },
+ { "path-max", 1, NULL, 'M' },
++ { "quota", 1, NULL, 'Q' },
+ { "timeout", 1, NULL, 'w' },
+ { "no-recovery", 0, NULL, 'R' },
+ { "internal-db", 0, NULL, 'I' },
+@@ -2358,6 +2415,20 @@ static void set_timeout(const char *arg)
+ barf("unknown timeout \"%s\"\n", arg);
+ }
+
++static void set_quota(const char *arg)
++{
++ const char *eq = strchr(arg, '=');
++ int val;
++
++ if (!eq)
++ barf("quotas must be specified via <what>=<nb>\n");
++ val = get_optval_int(eq + 1);
++ if (what_matches(arg, "outstanding"))
++ quota_req_outstanding = val;
++ else
++ barf("unknown quota \"%s\"\n", arg);
++}
++
+ int main(int argc, char *argv[])
+ {
+ int opt;
+@@ -2372,8 +2443,8 @@ int main(int argc, char *argv[])
+ orig_argc = argc;
+ orig_argv = argv;
+
+- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:T:RVW:w:U", options,
+- NULL)) != -1) {
++ while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:T:RVW:w:U",
++ options, NULL)) != -1) {
+ switch (opt) {
+ case 'D':
+ no_domain_init = true;
+@@ -2422,6 +2493,9 @@ int main(int argc, char *argv[])
+ quota_max_path_len = min(XENSTORE_REL_PATH_MAX,
+ quota_max_path_len);
+ break;
++ case 'Q':
++ set_quota(optarg);
++ break;
+ case 'w':
+ set_timeout(optarg);
+ break;
+@@ -2875,6 +2949,14 @@ static void add_buffered_data(struct buffered_data *bdata,
+
+ /* Queue for later transmission. */
+ list_add_tail(&bdata->list, &conn->out_list);
++ bdata->on_out_list = true;
++ /*
++ * Watch events are never "outstanding", but the request causing them
++ * are instead kept "outstanding" until all watch events caused by that
++ * request have been delivered.
++ */
++ if (bdata->hdr.msg.type != XS_WATCH_EVENT)
++ domain_outstanding_inc(conn);
+ }
+
+ void read_state_buffered_data(const void *ctx, struct connection *conn,
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 8a81fc693f01..db09f463a657 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -56,6 +56,8 @@ struct xs_state_connection;
+ struct buffered_data
+ {
+ struct list_head list;
++ bool on_out_list;
++ bool on_ref_list;
+
+ /* Are we still doing the header? */
+ bool inhdr;
+@@ -63,6 +65,17 @@ struct buffered_data
+ /* How far are we? */
+ unsigned int used;
+
++ /* Outstanding request accounting. */
++ union {
++ /* ref is being used for requests. */
++ struct {
++ unsigned int event_cnt; /* # of outstanding events. */
++ unsigned int domid; /* domid of request. */
++ } ref;
++ /* req is being used for watch events. */
++ struct buffered_data *req; /* request causing event. */
++ } pend;
++
+ union {
+ struct xsd_sockmsg msg;
+ char raw[sizeof(struct xsd_sockmsg)];
+@@ -123,6 +136,9 @@ struct connection
+ struct list_head out_list;
+ uint64_t timeout_msec;
+
++ /* Referenced requests no longer pending. */
++ struct list_head ref_list;
++
+ /* Transaction context for current request (NULL if none). */
+ struct transaction *transaction;
+
+@@ -191,7 +207,8 @@ unsigned int get_string(const struct buffered_data *data, unsigned int offset);
+
+ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ const void *data, unsigned int len);
+-void send_event(struct connection *conn, const char *path, const char *token);
++void send_event(struct buffered_data *req, struct connection *conn,
++ const char *path, const char *token);
+
+ /* Some routines (write, mkdir, etc) just need a non-error return */
+ void send_ack(struct connection *conn, enum xsd_sockmsg_type type);
+@@ -247,6 +264,7 @@ extern int dom0_domid;
+ extern int dom0_event;
+ extern int priv_domid;
+ extern int quota_nb_entry_per_domain;
++extern int quota_req_outstanding;
+
+ extern unsigned int timeout_watch_event_msec;
+
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index 93c4c1edcdd1..850085a92c76 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -78,6 +78,9 @@ struct domain
+ /* number of watch for this domain */
+ int nbwatch;
+
++ /* Number of outstanding requests. */
++ int nboutstanding;
++
+ /* write rate limit */
+ wrl_creditt wrl_credit; /* [ -wrl_config_writecost, +_dburst ] */
+ struct wrl_timestampt wrl_timestamp;
+@@ -183,8 +186,12 @@ static bool domain_can_read(struct connection *conn)
+ {
+ struct xenstore_domain_interface *intf = conn->domain->interface;
+
+- if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
+- return false;
++ if (domain_is_unprivileged(conn)) {
++ if (conn->domain->wrl_credit < 0)
++ return false;
++ if (conn->domain->nboutstanding >= quota_req_outstanding)
++ return false;
++ }
+
+ return (intf->req_cons != intf->req_prod);
+ }
+@@ -331,7 +338,7 @@ static struct domain *alloc_domain(const void *context, unsigned int domid)
+ {
+ struct domain *domain;
+
+- domain = talloc(context, struct domain);
++ domain = talloc_zero(context, struct domain);
+ if (!domain) {
+ errno = ENOMEM;
+ return NULL;
+@@ -392,9 +399,6 @@ static int new_domain(struct domain *domain, int port, bool restore)
+ domain->conn->domain = domain;
+ domain->conn->id = domain->domid;
+
+- domain->nbentry = 0;
+- domain->nbwatch = 0;
+-
+ return 0;
+ }
+
+@@ -938,6 +942,28 @@ int domain_watch(struct connection *conn)
+ : 0;
+ }
+
++void domain_outstanding_inc(struct connection *conn)
++{
++ if (!conn || !conn->domain)
++ return;
++ conn->domain->nboutstanding++;
++}
++
++void domain_outstanding_dec(struct connection *conn)
++{
++ if (!conn || !conn->domain)
++ return;
++ conn->domain->nboutstanding--;
++}
++
++void domain_outstanding_domid_dec(unsigned int domid)
++{
++ struct domain *d = find_domain_by_domid(domid);
++
++ if (d)
++ d->nboutstanding--;
++}
++
+ static wrl_creditt wrl_config_writecost = WRL_FACTOR;
+ static wrl_creditt wrl_config_rate = WRL_RATE * WRL_FACTOR;
+ static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
+diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
+index 1e929b8f8c6f..4f51b005291a 100644
+--- a/tools/xenstore/xenstored_domain.h
++++ b/tools/xenstore/xenstored_domain.h
+@@ -64,6 +64,9 @@ int domain_entry(struct connection *conn);
+ void domain_watch_inc(struct connection *conn);
+ void domain_watch_dec(struct connection *conn);
+ int domain_watch(struct connection *conn);
++void domain_outstanding_inc(struct connection *conn);
++void domain_outstanding_dec(struct connection *conn);
++void domain_outstanding_domid_dec(unsigned int domid);
+
+ /* Special node permission handling. */
+ int set_perms_special(struct connection *conn, const char *name,
+diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
+index 205d9d8ea116..0755ffa375ba 100644
+--- a/tools/xenstore/xenstored_watch.c
++++ b/tools/xenstore/xenstored_watch.c
+@@ -142,6 +142,7 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
+ struct node *node, bool exact, struct node_perms *perms)
+ {
+ struct connection *i;
++ struct buffered_data *req;
+ struct watch *watch;
+
+ /* During transactions, don't fire watches, but queue them. */
+@@ -150,6 +151,8 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
+ return;
+ }
+
++ req = domain_is_unprivileged(conn) ? conn->in : NULL;
++
+ /* Create an event for each watch. */
+ list_for_each_entry(i, &connections, list) {
+ /* introduce/release domain watches */
+@@ -164,12 +167,12 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
+ list_for_each_entry(watch, &i->watches, list) {
+ if (exact) {
+ if (streq(name, watch->node))
+- send_event(i,
++ send_event(req, i,
+ get_watch_path(watch, name),
+ watch->token);
+ } else {
+ if (is_child(name, watch->node))
+- send_event(i,
++ send_event(req, i,
+ get_watch_path(watch, name),
+ watch->token);
+ }
+@@ -269,8 +272,12 @@ int do_watch(struct connection *conn, struct buffered_data *in)
+ trace_create(watch, "watch");
+ send_ack(conn, XS_WATCH);
+
+- /* We fire once up front: simplifies clients and restart. */
+- send_event(conn, get_watch_path(watch, watch->node), watch->token);
++ /*
++ * We fire once up front: simplifies clients and restart.
++ * This event will not be linked to the XS_WATCH request.
++ */
++ send_event(NULL, conn, get_watch_path(watch, watch->node),
++ watch->token);
+
+ return 0;
+ }
+--
+2.37.4
+
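A minimal, self-contained sketch of the outstanding-request throttling idea
from the patch above: every queued message bumps a per-domain counter,
delivery drops it again, and request processing for an unprivileged domain
is paused while the counter is at or above the quota. The struct and helper
names below are invented for the example and are not the xenstored types.

#include <stdbool.h>
#include <stdio.h>

static int quota_req_outstanding = 20;  /* default chosen by the patch */

struct dom {
        unsigned int domid;
        int nboutstanding;              /* queued but not yet delivered */
        bool unprivileged;
};

static void request_queued(struct dom *d)
{
        d->nboutstanding++;
}

static void response_delivered(struct dom *d)
{
        d->nboutstanding--;
}

/* Counterpart of the extra check the patch adds to domain_can_read(). */
static bool can_process_requests(const struct dom *d)
{
        return !(d->unprivileged &&
                 d->nboutstanding >= quota_req_outstanding);
}

int main(void)
{
        struct dom d = { .domid = 3, .unprivileged = true };
        int i;

        for (i = 0; i < 25; i++)
                request_queued(&d);
        printf("can read: %d\n", can_process_requests(&d)); /* 0: stalled */

        for (i = 0; i < 10; i++)
                response_delivered(&d);
        printf("can read: %d\n", can_process_requests(&d)); /* 1: resumed */
        return 0;
}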
diff --git a/0051-tools-xenstore-don-t-buffer-multiple-identical-watch.patch b/0051-tools-xenstore-don-t-buffer-multiple-identical-watch.patch
new file mode 100644
index 0000000..2c2dfd6
--- /dev/null
+++ b/0051-tools-xenstore-don-t-buffer-multiple-identical-watch.patch
@@ -0,0 +1,93 @@
+From b270ad4a7ebe3409337bf3730317af6977c38197 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:08 +0200
+Subject: [PATCH 51/87] tools/xenstore: don't buffer multiple identical watch
+ events
+
+A guest not reading its Xenstore response buffer fast enough might cause
+lots of Xenstore watch events to pile up. Reduce the generated
+load by dropping new events which already have an identical copy
+pending.
+
+The special events "@..." are excluded from that handling as there are
+known use cases where the handler relies on each event being sent
+individually.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit b5c0bdb96d33e18c324c13d8e33c08732d77eaa2)
+---
+ tools/xenstore/xenstored_core.c | 20 +++++++++++++++++++-
+ tools/xenstore/xenstored_core.h | 3 +++
+ 2 files changed, 22 insertions(+), 1 deletion(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 488d540f3a32..f1fa97b8cf50 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -916,6 +916,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ bdata->inhdr = true;
+ bdata->used = 0;
+ bdata->timeout_msec = 0;
++ bdata->watch_event = false;
+
+ if (len <= DEFAULT_BUFFER_SIZE)
+ bdata->buffer = bdata->default_buffer;
+@@ -948,7 +949,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ void send_event(struct buffered_data *req, struct connection *conn,
+ const char *path, const char *token)
+ {
+- struct buffered_data *bdata;
++ struct buffered_data *bdata, *bd;
+ unsigned int len;
+
+ len = strlen(path) + 1 + strlen(token) + 1;
+@@ -970,12 +971,29 @@ void send_event(struct buffered_data *req, struct connection *conn,
+ bdata->hdr.msg.type = XS_WATCH_EVENT;
+ bdata->hdr.msg.len = len;
+
++ /*
++ * Check whether an identical event is pending already.
++ * Special events are excluded from that check.
++ */
++ if (path[0] != '@') {
++ list_for_each_entry(bd, &conn->out_list, list) {
++ if (bd->watch_event && bd->hdr.msg.len == len &&
++ !memcmp(bdata->buffer, bd->buffer, len)) {
++ trace("dropping duplicate watch %s %s for domain %u\n",
++ path, token, conn->id);
++ talloc_free(bdata);
++ return;
++ }
++ }
++ }
++
+ if (timeout_watch_event_msec && domain_is_unprivileged(conn)) {
+ bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
+ if (!conn->timeout_msec)
+ conn->timeout_msec = bdata->timeout_msec;
+ }
+
++ bdata->watch_event = true;
+ bdata->pend.req = req;
+ if (req)
+ req->pend.ref.event_cnt++;
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index db09f463a657..b9b50e81c7b4 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -62,6 +62,9 @@ struct buffered_data
+ /* Are we still doing the header? */
+ bool inhdr;
+
++ /* Is this a watch event? */
++ bool watch_event;
++
+ /* How far are we? */
+ unsigned int used;
+
+--
+2.37.4
+
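A rough sketch of the de-duplication done in send_event() above: before
queueing a new watch event, scan the pending queue and drop the new one if
an identical (path, token) pair is already waiting, except for the special
"@..." events. The list type here is made up for the example; the real code
walks conn->out_list and compares the serialized message buffers.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct pending_event {
        struct pending_event *next;
        const char *path;       /* the real code compares raw buffers */
        const char *token;
};

/* Returns true if queued, false if dropped (duplicate or out of memory). */
static bool queue_event(struct pending_event **head,
                        const char *path, const char *token)
{
        struct pending_event *e;

        /* Special "@..." events are always delivered individually. */
        if (path[0] != '@') {
                for (e = *head; e; e = e->next)
                        if (!strcmp(e->path, path) &&
                            !strcmp(e->token, token))
                                return false;   /* already pending */
        }

        e = malloc(sizeof(*e));
        if (!e)
                return false;
        e->path = path;
        e->token = token;
        e->next = *head;
        *head = e;
        return true;
}

int main(void)
{
        struct pending_event *q = NULL;

        printf("%d\n", queue_event(&q, "/local/domain/1/foo", "tok")); /* 1 */
        printf("%d\n", queue_event(&q, "/local/domain/1/foo", "tok")); /* 0 */
        printf("%d\n", queue_event(&q, "@releaseDomain", "t"));        /* 1 */
        printf("%d\n", queue_event(&q, "@releaseDomain", "t"));        /* 1 */
        return 0;
}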
diff --git a/0052-tools-xenstore-fix-connection-id-usage.patch b/0052-tools-xenstore-fix-connection-id-usage.patch
new file mode 100644
index 0000000..5eac10f
--- /dev/null
+++ b/0052-tools-xenstore-fix-connection-id-usage.patch
@@ -0,0 +1,61 @@
+From 787241f55216d34ca025c835c6a2096d7664d711 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:08 +0200
+Subject: [PATCH 52/87] tools/xenstore: fix connection->id usage
+
+Don't use conn->id for privilege checks, but domain_is_unprivileged().
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 3047df38e1991510bc295e3e1bb6b6b6c4a97831)
+---
+ tools/xenstore/xenstored_control.c | 2 +-
+ tools/xenstore/xenstored_core.h | 2 +-
+ tools/xenstore/xenstored_transaction.c | 3 ++-
+ 3 files changed, 4 insertions(+), 3 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
+index 7b4300ef7777..adb8d51b043b 100644
+--- a/tools/xenstore/xenstored_control.c
++++ b/tools/xenstore/xenstored_control.c
+@@ -891,7 +891,7 @@ int do_control(struct connection *conn, struct buffered_data *in)
+ unsigned int cmd, num, off;
+ char **vec = NULL;
+
+- if (conn->id != 0)
++ if (domain_is_unprivileged(conn))
+ return EACCES;
+
+ off = get_string(in, 0);
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index b9b50e81c7b4..b1a70488b989 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -123,7 +123,7 @@ struct connection
+ /* The index of pollfd in global pollfd array */
+ int pollfd_idx;
+
+- /* Who am I? 0 for socket connections. */
++ /* Who am I? Domid of connection. */
+ unsigned int id;
+
+ /* Is this connection ignored? */
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index 54432907fc76..ee1b09031a3b 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -477,7 +477,8 @@ int do_transaction_start(struct connection *conn, struct buffered_data *in)
+ if (conn->transaction)
+ return EBUSY;
+
+- if (conn->id && conn->transaction_started > quota_max_transaction)
++ if (domain_is_unprivileged(conn) &&
++ conn->transaction_started > quota_max_transaction)
+ return ENOSPC;
+
+ /* Attach transaction to input for autofree until it's complete */
+--
+2.37.4
+
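The point of the change, sketched with invented names: privilege decisions
go through a single helper instead of hard-coding "id != 0" at every call
site, so a configuration where the privileged domain is not domid 0 (or
where a connection has to be ignored) only needs handling in one place.

#include <stdbool.h>
#include <stdio.h>

struct conn {
        unsigned int id;                /* domid of the connection */
};

static unsigned int priv_domid;         /* may be non-zero in some setups */

static bool conn_is_unprivileged(const struct conn *c)
{
        return c->id != priv_domid;
}

int main(void)
{
        struct conn dom0 = { .id = 0 }, domU = { .id = 7 };

        printf("dom0 unprivileged: %d\n", conn_is_unprivileged(&dom0));
        printf("domU unprivileged: %d\n", conn_is_unprivileged(&domU));
        return 0;
}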
diff --git a/0053-tools-xenstore-simplify-and-fix-per-domain-node-acco.patch b/0053-tools-xenstore-simplify-and-fix-per-domain-node-acco.patch
new file mode 100644
index 0000000..1bd3051
--- /dev/null
+++ b/0053-tools-xenstore-simplify-and-fix-per-domain-node-acco.patch
@@ -0,0 +1,336 @@
+From 717460e062dfe13a69cb01f518dd7b65d39376ef Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:08 +0200
+Subject: [PATCH 53/87] tools/xenstore: simplify and fix per domain node
+ accounting
+
+The accounting of nodes can be simplified now that each connection
+holds the associated domid.
+
+Fix the node accounting to cover nodes created for a domain before it
+has been introduced. This requires reacting properly to an allocation
+failure inside domain_entry_inc() by returning an error code.
+
+In some cases, especially in error paths, the node accounting has to be
+fixed up.
+
+This is part of XSA-326 / CVE-2022-42313.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit dbef1f7482894c572d90cd73d99ed689c891e863)
+---
+ tools/xenstore/xenstored_core.c | 43 ++++++++--
+ tools/xenstore/xenstored_domain.c | 105 ++++++++++++++++---------
+ tools/xenstore/xenstored_domain.h | 4 +-
+ tools/xenstore/xenstored_transaction.c | 8 +-
+ 4 files changed, 109 insertions(+), 51 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index f1fa97b8cf50..692d863fce35 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -638,7 +638,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
+
+ /* Permissions are struct xs_permissions. */
+ node->perms.p = hdr->perms;
+- if (domain_adjust_node_perms(node)) {
++ if (domain_adjust_node_perms(conn, node)) {
+ talloc_free(node);
+ return NULL;
+ }
+@@ -660,7 +660,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+ void *p;
+ struct xs_tdb_record_hdr *hdr;
+
+- if (domain_adjust_node_perms(node))
++ if (domain_adjust_node_perms(conn, node))
+ return errno;
+
+ data.dsize = sizeof(*hdr)
+@@ -1272,13 +1272,17 @@ nomem:
+ return NULL;
+ }
+
+-static int destroy_node(struct connection *conn, struct node *node)
++static void destroy_node_rm(struct node *node)
+ {
+ if (streq(node->name, "/"))
+ corrupt(NULL, "Destroying root node!");
+
+ tdb_delete(tdb_ctx, node->key);
++}
+
++static int destroy_node(struct connection *conn, struct node *node)
++{
++ destroy_node_rm(node);
+ domain_entry_dec(conn, node);
+
+ /*
+@@ -1328,8 +1332,12 @@ static struct node *create_node(struct connection *conn, const void *ctx,
+ goto err;
+
+ /* Account for new node */
+- if (i->parent)
+- domain_entry_inc(conn, i);
++ if (i->parent) {
++ if (domain_entry_inc(conn, i)) {
++ destroy_node_rm(i);
++ return NULL;
++ }
++ }
+ }
+
+ return node;
+@@ -1614,10 +1622,27 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
+ old_perms = node->perms;
+ domain_entry_dec(conn, node);
+ node->perms = perms;
+- domain_entry_inc(conn, node);
++ if (domain_entry_inc(conn, node)) {
++ node->perms = old_perms;
++ /*
++ * This should never fail because we had a reference on the
++ * domain before and Xenstored is single-threaded.
++ */
++ domain_entry_inc(conn, node);
++ return ENOMEM;
++ }
+
+- if (write_node(conn, node, false))
++ if (write_node(conn, node, false)) {
++ int saved_errno = errno;
++
++ domain_entry_dec(conn, node);
++ node->perms = old_perms;
++ /* No failure possible as above. */
++ domain_entry_inc(conn, node);
++
++ errno = saved_errno;
+ return errno;
++ }
+
+ fire_watches(conn, in, name, node, false, &old_perms);
+ send_ack(conn, XS_SET_PERMS);
+@@ -3122,7 +3147,9 @@ void read_state_node(const void *ctx, const void *state)
+ set_tdb_key(name, &key);
+ if (write_node_raw(NULL, &key, node, true))
+ barf("write node error restoring node");
+- domain_entry_inc(&conn, node);
++
++ if (domain_entry_inc(&conn, node))
++ barf("node accounting error restoring node");
+
+ talloc_free(node);
+ }
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index 850085a92c76..260952e09096 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -16,6 +16,7 @@
+ along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
++#include <assert.h>
+ #include <stdio.h>
+ #include <sys/mman.h>
+ #include <unistd.h>
+@@ -363,6 +364,18 @@ static struct domain *find_or_alloc_domain(const void *ctx, unsigned int domid)
+ return domain ? : alloc_domain(ctx, domid);
+ }
+
++static struct domain *find_or_alloc_existing_domain(unsigned int domid)
++{
++ struct domain *domain;
++ xc_dominfo_t dominfo;
++
++ domain = find_domain_struct(domid);
++ if (!domain && get_domain_info(domid, &dominfo))
++ domain = alloc_domain(NULL, domid);
++
++ return domain;
++}
++
+ static int new_domain(struct domain *domain, int port, bool restore)
+ {
+ int rc;
+@@ -782,30 +795,28 @@ void domain_deinit(void)
+ xenevtchn_unbind(xce_handle, virq_port);
+ }
+
+-void domain_entry_inc(struct connection *conn, struct node *node)
++int domain_entry_inc(struct connection *conn, struct node *node)
+ {
+ struct domain *d;
++ unsigned int domid;
+
+ if (!conn)
+- return;
++ return 0;
+
+- if (node->perms.p && node->perms.p[0].id != conn->id) {
+- if (conn->transaction) {
+- transaction_entry_inc(conn->transaction,
+- node->perms.p[0].id);
+- } else {
+- d = find_domain_by_domid(node->perms.p[0].id);
+- if (d)
+- d->nbentry++;
+- }
+- } else if (conn->domain) {
+- if (conn->transaction) {
+- transaction_entry_inc(conn->transaction,
+- conn->domain->domid);
+- } else {
+- conn->domain->nbentry++;
+- }
++ domid = node->perms.p ? node->perms.p[0].id : conn->id;
++
++ if (conn->transaction) {
++ transaction_entry_inc(conn->transaction, domid);
++ } else {
++ d = (domid == conn->id && conn->domain) ? conn->domain
++ : find_or_alloc_existing_domain(domid);
++ if (d)
++ d->nbentry++;
++ else
++ return ENOMEM;
+ }
++
++ return 0;
+ }
+
+ /*
+@@ -841,7 +852,7 @@ static int chk_domain_generation(unsigned int domid, uint64_t gen)
+ * Remove permissions for no longer existing domains in order to avoid a new
+ * domain with the same domid inheriting the permissions.
+ */
+-int domain_adjust_node_perms(struct node *node)
++int domain_adjust_node_perms(struct connection *conn, struct node *node)
+ {
+ unsigned int i;
+ int ret;
+@@ -851,8 +862,14 @@ int domain_adjust_node_perms(struct node *node)
+ return errno;
+
+ /* If the owner doesn't exist any longer give it to priv domain. */
+- if (!ret)
++ if (!ret) {
++ /*
++ * In theory we'd need to update the number of dom0 nodes here,
++ * but we could be called for a read of the node. So better
++ * avoid the risk to overflow the node count of dom0.
++ */
+ node->perms.p[0].id = priv_domid;
++ }
+
+ for (i = 1; i < node->perms.num; i++) {
+ if (node->perms.p[i].perms & XS_PERM_IGNORE)
+@@ -871,25 +888,25 @@ int domain_adjust_node_perms(struct node *node)
+ void domain_entry_dec(struct connection *conn, struct node *node)
+ {
+ struct domain *d;
++ unsigned int domid;
+
+ if (!conn)
+ return;
+
+- if (node->perms.p && node->perms.p[0].id != conn->id) {
+- if (conn->transaction) {
+- transaction_entry_dec(conn->transaction,
+- node->perms.p[0].id);
++ domid = node->perms.p ? node->perms.p[0].id : conn->id;
++
++ if (conn->transaction) {
++ transaction_entry_dec(conn->transaction, domid);
++ } else {
++ d = (domid == conn->id && conn->domain) ? conn->domain
++ : find_domain_struct(domid);
++ if (d) {
++ d->nbentry--;
+ } else {
+- d = find_domain_by_domid(node->perms.p[0].id);
+- if (d && d->nbentry)
+- d->nbentry--;
+- }
+- } else if (conn->domain && conn->domain->nbentry) {
+- if (conn->transaction) {
+- transaction_entry_dec(conn->transaction,
+- conn->domain->domid);
+- } else {
+- conn->domain->nbentry--;
++ errno = ENOENT;
++ corrupt(conn,
++ "Node \"%s\" owned by non-existing domain %u\n",
++ node->name, domid);
+ }
+ }
+ }
+@@ -899,13 +916,23 @@ int domain_entry_fix(unsigned int domid, int num, bool update)
+ struct domain *d;
+ int cnt;
+
+- d = find_domain_by_domid(domid);
+- if (!d)
+- return 0;
++ if (update) {
++ d = find_domain_struct(domid);
++ assert(d);
++ } else {
++ /*
++ * We are called first with update == false in order to catch
++ * any error. So do a possible allocation and check for error
++ * only in this case, as in the case of update == true nothing
++ * can go wrong anymore as the allocation already happened.
++ */
++ d = find_or_alloc_existing_domain(domid);
++ if (!d)
++ return -1;
++ }
+
+ cnt = d->nbentry + num;
+- if (cnt < 0)
+- cnt = 0;
++ assert(cnt >= 0);
+
+ if (update)
+ d->nbentry = cnt;
+diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
+index 4f51b005291a..d6519904d831 100644
+--- a/tools/xenstore/xenstored_domain.h
++++ b/tools/xenstore/xenstored_domain.h
+@@ -54,10 +54,10 @@ const char *get_implicit_path(const struct connection *conn);
+ bool domain_is_unprivileged(struct connection *conn);
+
+ /* Remove node permissions for no longer existing domains. */
+-int domain_adjust_node_perms(struct node *node);
++int domain_adjust_node_perms(struct connection *conn, struct node *node);
+
+ /* Quota manipulation */
+-void domain_entry_inc(struct connection *conn, struct node *);
++int domain_entry_inc(struct connection *conn, struct node *);
+ void domain_entry_dec(struct connection *conn, struct node *);
+ int domain_entry_fix(unsigned int domid, int num, bool update);
+ int domain_entry(struct connection *conn);
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index ee1b09031a3b..86caf6c398be 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -519,8 +519,12 @@ static int transaction_fix_domains(struct transaction *trans, bool update)
+
+ list_for_each_entry(d, &trans->changed_domains, list) {
+ cnt = domain_entry_fix(d->domid, d->nbentry, update);
+- if (!update && cnt >= quota_nb_entry_per_domain)
+- return ENOSPC;
++ if (!update) {
++ if (cnt >= quota_nb_entry_per_domain)
++ return ENOSPC;
++ if (cnt < 0)
++ return ENOMEM;
++ }
+ }
+
+ return 0;
+--
+2.37.4
+
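A simplified, stand-alone sketch of the accounting pattern this patch moves
to: the increment side may have to allocate a per-domain tracking structure
and can therefore fail (so the caller has to undo the node operation), while
the decrement side must find an existing entry or treat the store as
corrupt. All names below are invented for the example.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct dom_acct {
        struct dom_acct *next;
        unsigned int domid;
        int nbentry;                    /* nodes owned by this domain */
};

static struct dom_acct *doms;

static struct dom_acct *find_or_alloc(unsigned int domid)
{
        struct dom_acct *d;

        for (d = doms; d; d = d->next)
                if (d->domid == domid)
                        return d;
        d = calloc(1, sizeof(*d));
        if (!d)
                return NULL;
        d->domid = domid;
        d->next = doms;
        doms = d;
        return d;
}

static int entry_inc(unsigned int domid)
{
        struct dom_acct *d = find_or_alloc(domid);

        if (!d)
                return ENOMEM;          /* caller must undo node creation */
        d->nbentry++;
        return 0;
}

static int entry_dec(unsigned int domid)
{
        struct dom_acct *d;

        for (d = doms; d; d = d->next)
                if (d->domid == domid) {
                        d->nbentry--;
                        return 0;
                }
        return ENOENT;                  /* owner vanished: data corrupt */
}

int main(void)
{
        struct dom_acct *d;

        entry_inc(3);
        entry_inc(3);
        entry_dec(3);
        for (d = doms; d; d = d->next)
                printf("dom%u: %d nodes\n", d->domid, d->nbentry);
        return 0;
}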
diff --git a/0054-tools-xenstore-limit-max-number-of-nodes-accessed-in.patch b/0054-tools-xenstore-limit-max-number-of-nodes-accessed-in.patch
new file mode 100644
index 0000000..0a84c6c
--- /dev/null
+++ b/0054-tools-xenstore-limit-max-number-of-nodes-accessed-in.patch
@@ -0,0 +1,255 @@
+From 7017cfefc455db535054ebc09124af8101746a4a Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:09 +0200
+Subject: [PATCH 54/87] tools/xenstore: limit max number of nodes accessed in a
+ transaction
+
+Today a guest is free to access as many nodes in a single transaction
+as it wants. This can lead to unbounded memory consumption in Xenstore
+as Xenstore needs to keep track of all nodes having been accessed
+during a transaction.
+
+In oxenstored the number of requests in a transaction is limited
+via a quota maxrequests (default is 1024). As multiple accesses of a
+node are not problematic in C Xenstore, limit the number of accessed
+nodes.
+
+In order to let read_node() detect a quota error in case too many nodes
+are being accessed, check the return value of access_node() and return
+NULL in case an error has been seen. Introduce __must_check and add it
+to the access_node() prototype.
+
+This is part of XSA-326 / CVE-2022-42314.
+
+Suggested-by: Julien Grall <julien@xen.org>
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 268369d8e322d227a74a899009c5748d7b0ea142)
+---
+ tools/include/xen-tools/libs.h | 4 +++
+ tools/xenstore/xenstored_core.c | 50 ++++++++++++++++++--------
+ tools/xenstore/xenstored_core.h | 1 +
+ tools/xenstore/xenstored_transaction.c | 9 +++++
+ tools/xenstore/xenstored_transaction.h | 4 +--
+ 5 files changed, 52 insertions(+), 16 deletions(-)
+
+diff --git a/tools/include/xen-tools/libs.h b/tools/include/xen-tools/libs.h
+index a16e0c380709..bafc90e2f603 100644
+--- a/tools/include/xen-tools/libs.h
++++ b/tools/include/xen-tools/libs.h
+@@ -63,4 +63,8 @@
+ #define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))
+ #endif
+
++#ifndef __must_check
++#define __must_check __attribute__((__warn_unused_result__))
++#endif
++
+ #endif /* __XEN_TOOLS_LIBS__ */
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 692d863fce35..f835aa1b2f1f 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -106,6 +106,7 @@ int quota_nb_watch_per_domain = 128;
+ int quota_max_entry_size = 2048; /* 2K */
+ int quota_max_transaction = 10;
+ int quota_nb_perms_per_node = 5;
++int quota_trans_nodes = 1024;
+ int quota_max_path_len = XENSTORE_REL_PATH_MAX;
+ int quota_req_outstanding = 20;
+
+@@ -595,6 +596,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
+ TDB_DATA key, data;
+ struct xs_tdb_record_hdr *hdr;
+ struct node *node;
++ int err;
+
+ node = talloc(ctx, struct node);
+ if (!node) {
+@@ -616,14 +618,13 @@ struct node *read_node(struct connection *conn, const void *ctx,
+ if (data.dptr == NULL) {
+ if (tdb_error(tdb_ctx) == TDB_ERR_NOEXIST) {
+ node->generation = NO_GENERATION;
+- access_node(conn, node, NODE_ACCESS_READ, NULL);
+- errno = ENOENT;
++ err = access_node(conn, node, NODE_ACCESS_READ, NULL);
++ errno = err ? : ENOENT;
+ } else {
+ log("TDB error on read: %s", tdb_errorstr(tdb_ctx));
+ errno = EIO;
+ }
+- talloc_free(node);
+- return NULL;
++ goto error;
+ }
+
+ node->parent = NULL;
+@@ -638,19 +639,36 @@ struct node *read_node(struct connection *conn, const void *ctx,
+
+ /* Permissions are struct xs_permissions. */
+ node->perms.p = hdr->perms;
+- if (domain_adjust_node_perms(conn, node)) {
+- talloc_free(node);
+- return NULL;
+- }
++ if (domain_adjust_node_perms(conn, node))
++ goto error;
+
+ /* Data is binary blob (usually ascii, no nul). */
+ node->data = node->perms.p + hdr->num_perms;
+ /* Children is strings, nul separated. */
+ node->children = node->data + node->datalen;
+
+- access_node(conn, node, NODE_ACCESS_READ, NULL);
++ if (access_node(conn, node, NODE_ACCESS_READ, NULL))
++ goto error;
+
+ return node;
++
++ error:
++ err = errno;
++ talloc_free(node);
++ errno = err;
++ return NULL;
++}
++
++static bool read_node_can_propagate_errno(void)
++{
++ /*
++ * 2 error cases for read_node() can always be propagated up:
++ * ENOMEM, because this has nothing to do with the node being in the
++ * data base or not, but is caused by a general lack of memory.
++ * ENOSPC, because this is related to hitting quota limits which need
++ * to be respected.
++ */
++ return errno == ENOMEM || errno == ENOSPC;
+ }
+
+ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+@@ -767,7 +785,7 @@ static int ask_parents(struct connection *conn, const void *ctx,
+ node = read_node(conn, ctx, name);
+ if (node)
+ break;
+- if (errno == ENOMEM)
++ if (read_node_can_propagate_errno())
+ return errno;
+ } while (!streq(name, "/"));
+
+@@ -829,7 +847,7 @@ static struct node *get_node(struct connection *conn,
+ }
+ }
+ /* Clean up errno if they weren't supposed to know. */
+- if (!node && errno != ENOMEM)
++ if (!node && !read_node_can_propagate_errno())
+ errno = errno_from_parents(conn, ctx, name, errno, perm);
+ return node;
+ }
+@@ -1235,7 +1253,7 @@ static struct node *construct_node(struct connection *conn, const void *ctx,
+
+ /* If parent doesn't exist, create it. */
+ parent = read_node(conn, parentname, parentname);
+- if (!parent)
++ if (!parent && errno == ENOENT)
+ parent = construct_node(conn, ctx, parentname);
+ if (!parent)
+ return NULL;
+@@ -1509,7 +1527,7 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
+
+ parent = read_node(conn, ctx, parentname);
+ if (!parent)
+- return (errno == ENOMEM) ? ENOMEM : EINVAL;
++ return read_node_can_propagate_errno() ? errno : EINVAL;
+ node->parent = parent;
+
+ return delete_node(conn, ctx, parent, node, false);
+@@ -1539,7 +1557,7 @@ static int do_rm(struct connection *conn, struct buffered_data *in)
+ return 0;
+ }
+ /* Restore errno, just in case. */
+- if (errno != ENOMEM)
++ if (!read_node_can_propagate_errno())
+ errno = ENOENT;
+ }
+ return errno;
+@@ -2384,6 +2402,8 @@ static void usage(void)
+ " -M, --path-max <chars> limit the allowed Xenstore node path length,\n"
+ " -Q, --quota <what>=<nb> set the quota <what> to the value <nb>, allowed\n"
+ " quotas are:\n"
++" transaction-nodes: number of accessed node per\n"
++" transaction\n"
+ " outstanding: number of outstanding requests\n"
+ " -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
+ " allowed timeout candidates are:\n"
+@@ -2468,6 +2488,8 @@ static void set_quota(const char *arg)
+ val = get_optval_int(eq + 1);
+ if (what_matches(arg, "outstanding"))
+ quota_req_outstanding = val;
++ else if (what_matches(arg, "transaction-nodes"))
++ quota_trans_nodes = val;
+ else
+ barf("unknown quota \"%s\"\n", arg);
+ }
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index b1a70488b989..245f9258235f 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -268,6 +268,7 @@ extern int dom0_event;
+ extern int priv_domid;
+ extern int quota_nb_entry_per_domain;
+ extern int quota_req_outstanding;
++extern int quota_trans_nodes;
+
+ extern unsigned int timeout_watch_event_msec;
+
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index 86caf6c398be..7bd41eb475e3 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -156,6 +156,9 @@ struct transaction
+ /* Connection-local identifier for this transaction. */
+ uint32_t id;
+
++ /* Node counter. */
++ unsigned int nodes;
++
+ /* Generation when transaction started. */
+ uint64_t generation;
+
+@@ -260,6 +263,11 @@ int access_node(struct connection *conn, struct node *node,
+
+ i = find_accessed_node(trans, node->name);
+ if (!i) {
++ if (trans->nodes >= quota_trans_nodes &&
++ domain_is_unprivileged(conn)) {
++ ret = ENOSPC;
++ goto err;
++ }
+ i = talloc_zero(trans, struct accessed_node);
+ if (!i)
+ goto nomem;
+@@ -297,6 +305,7 @@ int access_node(struct connection *conn, struct node *node,
+ i->ta_node = true;
+ }
+ }
++ trans->nodes++;
+ list_add_tail(&i->list, &trans->accessed);
+ }
+
+diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
+index 0093cac807e3..e3cbd6b23095 100644
+--- a/tools/xenstore/xenstored_transaction.h
++++ b/tools/xenstore/xenstored_transaction.h
+@@ -39,8 +39,8 @@ void transaction_entry_inc(struct transaction *trans, unsigned int domid);
+ void transaction_entry_dec(struct transaction *trans, unsigned int domid);
+
+ /* This node was accessed. */
+-int access_node(struct connection *conn, struct node *node,
+- enum node_access_type type, TDB_DATA *key);
++int __must_check access_node(struct connection *conn, struct node *node,
++ enum node_access_type type, TDB_DATA *key);
+
+ /* Queue watches for a modified node. */
+ void queue_watches(struct connection *conn, const char *name, bool watch_exact);
+--
+2.37.4
+
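A small sketch of the per-transaction node quota together with the
__must_check annotation the patch introduces; the transaction structure and
the "unprivileged" flag are simplified stand-ins for the xenstored types.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#ifndef __must_check
#define __must_check __attribute__((__warn_unused_result__))
#endif

static unsigned int quota_trans_nodes = 1024;

struct transaction {
        unsigned int nodes;             /* nodes accessed so far */
};

static __must_check int access_node(struct transaction *trans, bool unpriv)
{
        if (unpriv && trans->nodes >= quota_trans_nodes)
                return ENOSPC;          /* quota hit: fail this access */
        trans->nodes++;
        return 0;
}

int main(void)
{
        struct transaction t = { .nodes = 0 };
        unsigned int i;
        int err = 0;

        for (i = 0; i < 2000 && !err; i++)
                err = access_node(&t, true);
        printf("stopped after %u accessed nodes (err %d)\n", t.nodes, err);
        return 0;
}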
diff --git a/0055-tools-xenstore-move-the-call-of-setup_structure-to-d.patch b/0055-tools-xenstore-move-the-call-of-setup_structure-to-d.patch
new file mode 100644
index 0000000..5a8abbd
--- /dev/null
+++ b/0055-tools-xenstore-move-the-call-of-setup_structure-to-d.patch
@@ -0,0 +1,96 @@
+From 2d39cf77d70b44b70f970da90187f48d2c0b3e96 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:09 +0200
+Subject: [PATCH 55/87] tools/xenstore: move the call of setup_structure() to
+ dom0 introduction
+
+Setting up the basic structure when introducing dom0 has the advantage
+that proper node memory accounting can be added for those nodes
+later.
+
+This makes it possible to do proper node accounting, too.
+
+An additional requirement to make that work fine is to correct the
+owner of the created nodes to be dom0_domid instead of domid 0.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 60e2f6020dea7f616857b8fc1141b1c085d88761)
+---
+ tools/xenstore/xenstored_core.c | 9 ++++-----
+ tools/xenstore/xenstored_core.h | 1 +
+ tools/xenstore/xenstored_domain.c | 3 +++
+ 3 files changed, 8 insertions(+), 5 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index f835aa1b2f1f..5171d34c947e 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -2039,7 +2039,8 @@ static int tdb_flags;
+ static void manual_node(const char *name, const char *child)
+ {
+ struct node *node;
+- struct xs_permissions perms = { .id = 0, .perms = XS_PERM_NONE };
++ struct xs_permissions perms = { .id = dom0_domid,
++ .perms = XS_PERM_NONE };
+
+ node = talloc_zero(NULL, struct node);
+ if (!node)
+@@ -2078,7 +2079,7 @@ static void tdb_logger(TDB_CONTEXT *tdb, int level, const char * fmt, ...)
+ }
+ }
+
+-static void setup_structure(bool live_update)
++void setup_structure(bool live_update)
+ {
+ char *tdbname;
+
+@@ -2101,6 +2102,7 @@ static void setup_structure(bool live_update)
+ manual_node("/", "tool");
+ manual_node("/tool", "xenstored");
+ manual_node("/tool/xenstored", NULL);
++ domain_entry_fix(dom0_domid, 3, true);
+ }
+
+ check_store();
+@@ -2614,9 +2616,6 @@ int main(int argc, char *argv[])
+
+ init_pipe(reopen_log_pipe);
+
+- /* Setup the database */
+- setup_structure(live_update);
+-
+ /* Listen to hypervisor. */
+ if (!no_domain_init && !live_update) {
+ domain_init(-1);
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 245f9258235f..2c77ec7ee0f4 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -231,6 +231,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+ struct node *read_node(struct connection *conn, const void *ctx,
+ const char *name);
+
++void setup_structure(bool live_update);
+ struct connection *new_connection(const struct interface_funcs *funcs);
+ struct connection *get_connection_by_id(unsigned int conn_id);
+ void ignore_connection(struct connection *conn);
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index 260952e09096..f04b7aae8a32 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -470,6 +470,9 @@ static struct domain *introduce_domain(const void *ctx,
+ }
+ domain->interface = interface;
+
++ if (is_master_domain)
++ setup_structure(restore);
++
+ /* Now domain belongs to its connection. */
+ talloc_steal(domain->conn, domain);
+
+--
+2.37.4
+
diff --git a/0056-tools-xenstore-add-infrastructure-to-keep-track-of-p.patch b/0056-tools-xenstore-add-infrastructure-to-keep-track-of-p.patch
new file mode 100644
index 0000000..b92c61c
--- /dev/null
+++ b/0056-tools-xenstore-add-infrastructure-to-keep-track-of-p.patch
@@ -0,0 +1,289 @@
+From 2e406cf5fbb817341dc860473158382057e13de5 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:09 +0200
+Subject: [PATCH 56/87] tools/xenstore: add infrastructure to keep track of per
+ domain memory usage
+
+The amount of memory a domain can consume in Xenstore is limited by
+various quotas today, but even with sane quotas a domain can still
+consume rather large memory quantities.
+
+Add the infrastructure for keeping track of the amount of memory a
+domain is consuming in Xenstore. Note that this is only the memory a
+domain has direct control over, so any internal administration data
+needed only by Xenstore itself is not accounted for.
+
+There are two quotas defined: a soft quota which will result in a
+warning issued via syslog() when it is exceeded, and a hard quota
+resulting in a stop of accepting further requests or watch events as
+long as the hard quota would be violated by accepting those.
+
+Setting any of those quotas to 0 will disable it.
+
+As default values use 2MB per domain for the soft limit (this basically
+covers the allowed case of creating 1000 nodes needing 2kB each), and
+2.5MB for the hard limit.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 0d4a8ec7a93faedbe54fd197db146de628459e77)
+---
+ tools/xenstore/xenstored_core.c | 30 ++++++++--
+ tools/xenstore/xenstored_core.h | 2 +
+ tools/xenstore/xenstored_domain.c | 93 +++++++++++++++++++++++++++++++
+ tools/xenstore/xenstored_domain.h | 20 +++++++
+ 4 files changed, 139 insertions(+), 6 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 5171d34c947e..b2bf6740d430 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -109,6 +109,8 @@ int quota_nb_perms_per_node = 5;
+ int quota_trans_nodes = 1024;
+ int quota_max_path_len = XENSTORE_REL_PATH_MAX;
+ int quota_req_outstanding = 20;
++int quota_memory_per_domain_soft = 2 * 1024 * 1024; /* 2 MB */
++int quota_memory_per_domain_hard = 2 * 1024 * 1024 + 512 * 1024; /* 2.5 MB */
+
+ unsigned int timeout_watch_event_msec = 20000;
+
+@@ -2406,7 +2408,14 @@ static void usage(void)
+ " quotas are:\n"
+ " transaction-nodes: number of accessed node per\n"
+ " transaction\n"
++" memory: total used memory per domain for nodes,\n"
++" transactions, watches and requests, above\n"
++" which Xenstore will stop talking to domain\n"
+ " outstanding: number of outstanding requests\n"
++" -q, --quota-soft <what>=<nb> set a soft quota <what> to the value <nb>,\n"
++" causing a warning to be issued via syslog() if the\n"
++" limit is violated, allowed quotas are:\n"
++" memory: see above\n"
+ " -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
+ " allowed timeout candidates are:\n"
+ " watch-event: time a watch-event is kept pending\n"
+@@ -2433,6 +2442,7 @@ static struct option options[] = {
+ { "perm-nb", 1, NULL, 'A' },
+ { "path-max", 1, NULL, 'M' },
+ { "quota", 1, NULL, 'Q' },
++ { "quota-soft", 1, NULL, 'q' },
+ { "timeout", 1, NULL, 'w' },
+ { "no-recovery", 0, NULL, 'R' },
+ { "internal-db", 0, NULL, 'I' },
+@@ -2480,7 +2490,7 @@ static void set_timeout(const char *arg)
+ barf("unknown timeout \"%s\"\n", arg);
+ }
+
+-static void set_quota(const char *arg)
++static void set_quota(const char *arg, bool soft)
+ {
+ const char *eq = strchr(arg, '=');
+ int val;
+@@ -2488,11 +2498,16 @@ static void set_quota(const char *arg)
+ if (!eq)
+ barf("quotas must be specified via <what>=<nb>\n");
+ val = get_optval_int(eq + 1);
+- if (what_matches(arg, "outstanding"))
++ if (what_matches(arg, "outstanding") && !soft)
+ quota_req_outstanding = val;
+- else if (what_matches(arg, "transaction-nodes"))
++ else if (what_matches(arg, "transaction-nodes") && !soft)
+ quota_trans_nodes = val;
+- else
++ else if (what_matches(arg, "memory")) {
++ if (soft)
++ quota_memory_per_domain_soft = val;
++ else
++ quota_memory_per_domain_hard = val;
++ } else
+ barf("unknown quota \"%s\"\n", arg);
+ }
+
+@@ -2510,7 +2525,7 @@ int main(int argc, char *argv[])
+ orig_argc = argc;
+ orig_argv = argv;
+
+- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:T:RVW:w:U",
++ while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:q:T:RVW:w:U",
+ options, NULL)) != -1) {
+ switch (opt) {
+ case 'D':
+@@ -2561,7 +2576,10 @@ int main(int argc, char *argv[])
+ quota_max_path_len);
+ break;
+ case 'Q':
+- set_quota(optarg);
++ set_quota(optarg, false);
++ break;
++ case 'q':
++ set_quota(optarg, true);
+ break;
+ case 'w':
+ set_timeout(optarg);
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 2c77ec7ee0f4..373af18297bf 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -270,6 +270,8 @@ extern int priv_domid;
+ extern int quota_nb_entry_per_domain;
+ extern int quota_req_outstanding;
+ extern int quota_trans_nodes;
++extern int quota_memory_per_domain_soft;
++extern int quota_memory_per_domain_hard;
+
+ extern unsigned int timeout_watch_event_msec;
+
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index f04b7aae8a32..94fd561e9de4 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -76,6 +76,13 @@ struct domain
+ /* number of entry from this domain in the store */
+ int nbentry;
+
++ /* Amount of memory allocated for this domain. */
++ int memory;
++ bool soft_quota_reported;
++ bool hard_quota_reported;
++ time_t mem_last_msg;
++#define MEM_WARN_MINTIME_SEC 10
++
+ /* number of watch for this domain */
+ int nbwatch;
+
+@@ -192,6 +199,9 @@ static bool domain_can_read(struct connection *conn)
+ return false;
+ if (conn->domain->nboutstanding >= quota_req_outstanding)
+ return false;
++ if (conn->domain->memory >= quota_memory_per_domain_hard &&
++ quota_memory_per_domain_hard)
++ return false;
+ }
+
+ return (intf->req_cons != intf->req_prod);
+@@ -950,6 +960,89 @@ int domain_entry(struct connection *conn)
+ : 0;
+ }
+
++static bool domain_chk_quota(struct domain *domain, int mem)
++{
++ time_t now;
++
++ if (!domain || !domid_is_unprivileged(domain->domid) ||
++ (domain->conn && domain->conn->is_ignored))
++ return false;
++
++ now = time(NULL);
++
++ if (mem >= quota_memory_per_domain_hard &&
++ quota_memory_per_domain_hard) {
++ if (domain->hard_quota_reported)
++ return true;
++ syslog(LOG_ERR, "Domain %u exceeds hard memory quota, Xenstore interface to domain stalled\n",
++ domain->domid);
++ domain->mem_last_msg = now;
++ domain->hard_quota_reported = true;
++ return true;
++ }
++
++ if (now - domain->mem_last_msg >= MEM_WARN_MINTIME_SEC) {
++ if (domain->hard_quota_reported) {
++ domain->mem_last_msg = now;
++ domain->hard_quota_reported = false;
++ syslog(LOG_INFO, "Domain %u below hard memory quota again\n",
++ domain->domid);
++ }
++ if (mem >= quota_memory_per_domain_soft &&
++ quota_memory_per_domain_soft &&
++ !domain->soft_quota_reported) {
++ domain->mem_last_msg = now;
++ domain->soft_quota_reported = true;
++ syslog(LOG_WARNING, "Domain %u exceeds soft memory quota\n",
++ domain->domid);
++ }
++ if (mem < quota_memory_per_domain_soft &&
++ domain->soft_quota_reported) {
++ domain->mem_last_msg = now;
++ domain->soft_quota_reported = false;
++ syslog(LOG_INFO, "Domain %u below soft memory quota again\n",
++ domain->domid);
++ }
++
++ }
++
++ return false;
++}
++
++int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
++{
++ struct domain *domain;
++
++ domain = find_domain_struct(domid);
++ if (domain) {
++ /*
++ * domain_chk_quota() will print warning and also store whether
++ * the soft/hard quota has been hit. So check no_quota_check
++ * *after*.
++ */
++ if (domain_chk_quota(domain, domain->memory + mem) &&
++ !no_quota_check)
++ return ENOMEM;
++ domain->memory += mem;
++ } else {
++ /*
++ * The domain the memory is to be accounted for should always
++ * exist, as accounting is done either for a domain related to
++ * the current connection, or for the domain owning a node
++ * (which is always existing, as the owner of the node is
++ * tested to exist and replaced by domid 0 if not).
++ * So not finding the related domain MUST be an error in the
++ * data base.
++ */
++ errno = ENOENT;
++ corrupt(NULL, "Accounting called for non-existing domain %u\n",
++ domid);
++ return ENOENT;
++ }
++
++ return 0;
++}
++
+ void domain_watch_inc(struct connection *conn)
+ {
+ if (!conn || !conn->domain)
+diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
+index d6519904d831..633c9a0a0a1f 100644
+--- a/tools/xenstore/xenstored_domain.h
++++ b/tools/xenstore/xenstored_domain.h
+@@ -61,6 +61,26 @@ int domain_entry_inc(struct connection *conn, struct node *);
+ void domain_entry_dec(struct connection *conn, struct node *);
+ int domain_entry_fix(unsigned int domid, int num, bool update);
+ int domain_entry(struct connection *conn);
++int domain_memory_add(unsigned int domid, int mem, bool no_quota_check);
++
++/*
++ * domain_memory_add_chk(): to be used when memory quota should be checked.
++ * Not to be used when specifying a negative mem value, as lowering the used
++ * memory should always be allowed.
++ */
++static inline int domain_memory_add_chk(unsigned int domid, int mem)
++{
++ return domain_memory_add(domid, mem, false);
++}
++/*
++ * domain_memory_add_nochk(): to be used when memory quota should not be
++ * checked, e.g. when lowering memory usage, or in an error case for undoing
++ * a previous memory adjustment.
++ */
++static inline void domain_memory_add_nochk(unsigned int domid, int mem)
++{
++ domain_memory_add(domid, mem, true);
++}
+ void domain_watch_inc(struct connection *conn);
+ void domain_watch_dec(struct connection *conn);
+ int domain_watch(struct connection *conn);
+--
+2.37.4
+
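A condensed sketch of the soft/hard quota logic added above: exceeding the
hard limit stalls the domain, exceeding the soft limit only warns, and the
"reported" flags plus a minimum interval keep the log from being flooded.
Resetting of the hard-quota flag and the connection checks are left out;
stderr stands in for syslog(), and the names are simplified.

#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define MEM_WARN_MINTIME_SEC 10

static int quota_soft = 2 * 1024 * 1024;                /* 2 MB   */
static int quota_hard = 2 * 1024 * 1024 + 512 * 1024;   /* 2.5 MB */

struct dom {
        unsigned int domid;
        int memory;
        bool soft_reported, hard_reported;
        time_t last_msg;
};

/* Returns true if the domain has to be stalled (hard quota exceeded). */
static bool chk_quota(struct dom *d, int mem)
{
        time_t now = time(NULL);

        if (quota_hard && mem >= quota_hard) {
                if (!d->hard_reported) {
                        fprintf(stderr, "dom%u over hard quota\n", d->domid);
                        d->hard_reported = true;
                        d->last_msg = now;
                }
                return true;
        }

        if (now - d->last_msg >= MEM_WARN_MINTIME_SEC) {
                if (quota_soft && mem >= quota_soft && !d->soft_reported) {
                        fprintf(stderr, "dom%u over soft quota\n", d->domid);
                        d->soft_reported = true;
                        d->last_msg = now;
                } else if (mem < quota_soft && d->soft_reported) {
                        fprintf(stderr, "dom%u below soft quota again\n",
                                d->domid);
                        d->soft_reported = false;
                        d->last_msg = now;
                }
        }
        return false;
}

int main(void)
{
        struct dom d = { .domid = 5 };

        d.memory = 3 * 1024 * 1024;
        printf("stalled: %d\n", chk_quota(&d, d.memory));
        return 0;
}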
diff --git a/0057-tools-xenstore-add-memory-accounting-for-responses.patch b/0057-tools-xenstore-add-memory-accounting-for-responses.patch
new file mode 100644
index 0000000..9dd565d
--- /dev/null
+++ b/0057-tools-xenstore-add-memory-accounting-for-responses.patch
@@ -0,0 +1,82 @@
+From 30c8e752f66f681b5c731a637c26510ae5f35965 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:09 +0200
+Subject: [PATCH 57/87] tools/xenstore: add memory accounting for responses
+
+Add the memory accounting for queued responses.
+
+In case adding a watch event for a guest would cause the hard memory
+quota of that guest to be violated, the event is dropped. This will
+ensure that it is impossible to drive another guest past its memory
+quota by generating insane amounts of events for that guest. This is
+especially important for protecting driver domains from that attack
+vector.
+
+This is part of XSA-326 / CVE-2022-42315.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit f6d00133643a524d2138c9e3f192bbde719050ba)
+---
+ tools/xenstore/xenstored_core.c | 22 +++++++++++++++++++---
+ 1 file changed, 19 insertions(+), 3 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index b2bf6740d430..ecab6cfbbe15 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -260,6 +260,8 @@ static void free_buffered_data(struct buffered_data *out,
+ }
+ }
+
++ domain_memory_add_nochk(conn->id, -out->hdr.msg.len - sizeof(out->hdr));
++
+ if (out->hdr.msg.type == XS_WATCH_EVENT) {
+ req = out->pend.req;
+ if (req) {
+@@ -938,11 +940,14 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
+ bdata->timeout_msec = 0;
+ bdata->watch_event = false;
+
+- if (len <= DEFAULT_BUFFER_SIZE)
++ if (len <= DEFAULT_BUFFER_SIZE) {
+ bdata->buffer = bdata->default_buffer;
+- else {
++ /* Don't check quota, path might be used for returning error. */
++ domain_memory_add_nochk(conn->id, len + sizeof(bdata->hdr));
++ } else {
+ bdata->buffer = talloc_array(bdata, char, len);
+- if (!bdata->buffer) {
++ if (!bdata->buffer ||
++ domain_memory_add_chk(conn->id, len + sizeof(bdata->hdr))) {
+ send_error(conn, ENOMEM);
+ return;
+ }
+@@ -1007,6 +1012,11 @@ void send_event(struct buffered_data *req, struct connection *conn,
+ }
+ }
+
++ if (domain_memory_add_chk(conn->id, len + sizeof(bdata->hdr))) {
++ talloc_free(bdata);
++ return;
++ }
++
+ if (timeout_watch_event_msec && domain_is_unprivileged(conn)) {
+ bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
+ if (!conn->timeout_msec)
+@@ -3039,6 +3049,12 @@ static void add_buffered_data(struct buffered_data *bdata,
+ */
+ if (bdata->hdr.msg.type != XS_WATCH_EVENT)
+ domain_outstanding_inc(conn);
++ /*
++ * We are restoring the state after Live-Update and the new quota may
++ * be smaller. So ignore it. The limit will be applied for any resource
++ * after the state has been fully restored.
++ */
++ domain_memory_add_nochk(conn->id, len + sizeof(bdata->hdr));
+ }
+
+ void read_state_buffered_data(const void *ctx, struct connection *conn,
+--
+2.37.4
+
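A toy illustration of the charging policy in this patch: replies that fit
the default buffer are charged without a quota check (they may carry an
error that must still reach the guest), while watch events are charged with
the check and simply dropped when the hard quota would be exceeded. The
single counter and the helpers are stand-ins for the per-domain accounting.

#include <stdbool.h>
#include <stdio.h>

static int used, hard_quota = 100;

static int memory_add(int len, bool no_quota_check)
{
        if (!no_quota_check && used + len > hard_quota)
                return -1;              /* would exceed the hard quota */
        used += len;
        return 0;
}

static void queue_reply(int len)
{
        memory_add(len, true);          /* never rejected */
        printf("reply queued (%d bytes, %d used)\n", len, used);
}

static void queue_watch_event(int len)
{
        if (memory_add(len, false)) {
                printf("event dropped (would exceed quota)\n");
                return;
        }
        printf("event queued (%d bytes, %d used)\n", len, used);
}

int main(void)
{
        queue_reply(60);
        queue_watch_event(30);
        queue_watch_event(30);          /* dropped: 90 + 30 > 100 */
        return 0;
}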
diff --git a/0058-tools-xenstore-add-memory-accounting-for-watches.patch b/0058-tools-xenstore-add-memory-accounting-for-watches.patch
new file mode 100644
index 0000000..dc6b80c
--- /dev/null
+++ b/0058-tools-xenstore-add-memory-accounting-for-watches.patch
@@ -0,0 +1,96 @@
+From bce985745cde48a339954759677b77d3eeec41f3 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:10 +0200
+Subject: [PATCH 58/87] tools/xenstore: add memory accounting for watches
+
+Add the memory accounting for registered watches.
+
+When a socket connection is destroyed, the associated watches are
+removed, too. In order to keep memory accounting correct the watches
+must be removed explicitly via a call to conn_delete_all_watches() from
+destroy_conn().
+
+This is part of XSA-326 / CVE-2022-42315.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 7f9978a2cc37aaffab2fb09593bc598c0712a69b)
+---
+ tools/xenstore/xenstored_core.c | 1 +
+ tools/xenstore/xenstored_watch.c | 13 ++++++++++---
+ 2 files changed, 11 insertions(+), 3 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index ecab6cfbbe15..d86942f5aa77 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -463,6 +463,7 @@ static int destroy_conn(void *_conn)
+ }
+
+ conn_free_buffered_data(conn);
++ conn_delete_all_watches(conn);
+ list_for_each_entry(req, &conn->ref_list, list)
+ req->on_ref_list = false;
+
+diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
+index 0755ffa375ba..fdf9b2d653a0 100644
+--- a/tools/xenstore/xenstored_watch.c
++++ b/tools/xenstore/xenstored_watch.c
+@@ -211,7 +211,7 @@ static int check_watch_path(struct connection *conn, const void *ctx,
+ }
+
+ static struct watch *add_watch(struct connection *conn, char *path, char *token,
+- bool relative)
++ bool relative, bool no_quota_check)
+ {
+ struct watch *watch;
+
+@@ -222,6 +222,9 @@ static struct watch *add_watch(struct connection *conn, char *path, char *token,
+ watch->token = talloc_strdup(watch, token);
+ if (!watch->node || !watch->token)
+ goto nomem;
++ if (domain_memory_add(conn->id, strlen(path) + strlen(token),
++ no_quota_check))
++ goto nomem;
+
+ if (relative)
+ watch->relative_path = get_implicit_path(conn);
+@@ -265,7 +268,7 @@ int do_watch(struct connection *conn, struct buffered_data *in)
+ if (domain_watch(conn) > quota_nb_watch_per_domain)
+ return E2BIG;
+
+- watch = add_watch(conn, vec[0], vec[1], relative);
++ watch = add_watch(conn, vec[0], vec[1], relative, false);
+ if (!watch)
+ return errno;
+
+@@ -296,6 +299,8 @@ int do_unwatch(struct connection *conn, struct buffered_data *in)
+ list_for_each_entry(watch, &conn->watches, list) {
+ if (streq(watch->node, node) && streq(watch->token, vec[1])) {
+ list_del(&watch->list);
++ domain_memory_add_nochk(conn->id, -strlen(watch->node) -
++ strlen(watch->token));
+ talloc_free(watch);
+ domain_watch_dec(conn);
+ send_ack(conn, XS_UNWATCH);
+@@ -311,6 +316,8 @@ void conn_delete_all_watches(struct connection *conn)
+
+ while ((watch = list_top(&conn->watches, struct watch, list))) {
+ list_del(&watch->list);
++ domain_memory_add_nochk(conn->id, -strlen(watch->node) -
++ strlen(watch->token));
+ talloc_free(watch);
+ domain_watch_dec(conn);
+ }
+@@ -373,7 +380,7 @@ void read_state_watch(const void *ctx, const void *state)
+ if (!path)
+ barf("allocation error for read watch");
+
+- if (!add_watch(conn, path, token, relative))
++ if (!add_watch(conn, path, token, relative, true))
+ barf("error adding watch");
+ }
+
+--
+2.37.4
+
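
The accounting introduced above is deliberately symmetric: whatever add_watch() charges via domain_memory_add() for a watch's path and token, do_unwatch() and conn_delete_all_watches() credit back via domain_memory_add_nochk() with the negated size, so teardown is never blocked by the quota. The following is a minimal, self-contained C sketch of that charge/credit invariant; the quota value and helper names are stand-ins for illustration, not the real xenstored interfaces.

    /* Illustrative only: one per-domain counter modelling the
     * charge-on-create / credit-on-destroy pattern used for watches. */
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    #define QUOTA 64                /* hypothetical per-domain byte quota */

    static long charged;            /* bytes currently accounted */

    /* Checked add: refuses to exceed the quota unless told not to check. */
    static int memory_add(long len, int no_quota_check)
    {
        if (!no_quota_check && charged + len > QUOTA)
            return -1;
        charged += len;
        return 0;
    }

    /* Unchecked add: used on removal and error paths, may subtract. */
    static void memory_add_nochk(long len)
    {
        charged += len;
    }

    static int add_watch(const char *path, const char *token)
    {
        /* charge the watch's footprint when it is registered ... */
        return memory_add(strlen(path) + strlen(token), 0);
    }

    static void del_watch(const char *path, const char *token)
    {
        /* ... and credit exactly the same amount when it goes away */
        memory_add_nochk(-(long)(strlen(path) + strlen(token)));
    }

    int main(void)
    {
        if (add_watch("/local/domain/1/device", "tok"))
            return 1;               /* over quota */
        del_watch("/local/domain/1/device", "tok");
        assert(charged == 0);       /* balanced on teardown, nothing leaks */
        printf("outstanding bytes: %ld\n", charged);
        return 0;
    }

Keeping removal on the unchecked path matters: freeing resources must never fail on quota grounds, only allocation is policed.
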
diff --git a/0059-tools-xenstore-add-memory-accounting-for-nodes.patch b/0059-tools-xenstore-add-memory-accounting-for-nodes.patch
new file mode 100644
index 0000000..a1ab308
--- /dev/null
+++ b/0059-tools-xenstore-add-memory-accounting-for-nodes.patch
@@ -0,0 +1,342 @@
+From 578d422af0b444a9e437dd0ceddf2049364f1a40 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:10 +0200
+Subject: [PATCH 59/87] tools/xenstore: add memory accounting for nodes
+
+Add the memory accounting for Xenstore nodes. In order to make this
+not too complicated, allow for some sloppiness when writing nodes. Any
+hard quota violation will result in no further requests being accepted.
+
+This is part of XSA-326 / CVE-2022-42315.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 00e9e32d022be1afc144b75acdaeba8393e63315)
+---
+ tools/xenstore/xenstored_core.c | 140 ++++++++++++++++++++++---
+ tools/xenstore/xenstored_core.h | 12 +++
+ tools/xenstore/xenstored_transaction.c | 16 ++-
+ 3 files changed, 151 insertions(+), 17 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index d86942f5aa77..16504de42017 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -591,6 +591,117 @@ void set_tdb_key(const char *name, TDB_DATA *key)
+ key->dsize = strlen(name);
+ }
+
++static void get_acc_data(TDB_DATA *key, struct node_account_data *acc)
++{
++ TDB_DATA old_data;
++ struct xs_tdb_record_hdr *hdr;
++
++ if (acc->memory < 0) {
++ old_data = tdb_fetch(tdb_ctx, *key);
++ /* No check for error, as the node might not exist. */
++ if (old_data.dptr == NULL) {
++ acc->memory = 0;
++ } else {
++ hdr = (void *)old_data.dptr;
++ acc->memory = old_data.dsize;
++ acc->domid = hdr->perms[0].id;
++ }
++ talloc_free(old_data.dptr);
++ }
++}
++
++/*
++ * Per-transaction nodes need to be accounted for the transaction owner.
++ * Those nodes are stored in the data base with the transaction generation
++ * count prepended (e.g. 123/local/domain/...). So testing for the node's
++ * key not to start with "/" is sufficient.
++ */
++static unsigned int get_acc_domid(struct connection *conn, TDB_DATA *key,
++ unsigned int domid)
++{
++ return (!conn || key->dptr[0] == '/') ? domid : conn->id;
++}
++
++int do_tdb_write(struct connection *conn, TDB_DATA *key, TDB_DATA *data,
++ struct node_account_data *acc, bool no_quota_check)
++{
++ struct xs_tdb_record_hdr *hdr = (void *)data->dptr;
++ struct node_account_data old_acc = {};
++ unsigned int old_domid, new_domid;
++ int ret;
++
++ if (!acc)
++ old_acc.memory = -1;
++ else
++ old_acc = *acc;
++
++ get_acc_data(key, &old_acc);
++ old_domid = get_acc_domid(conn, key, old_acc.domid);
++ new_domid = get_acc_domid(conn, key, hdr->perms[0].id);
++
++ /*
++ * Don't check for ENOENT, as we want to be able to switch orphaned
++ * nodes to new owners.
++ */
++ if (old_acc.memory)
++ domain_memory_add_nochk(old_domid,
++ -old_acc.memory - key->dsize);
++ ret = domain_memory_add(new_domid, data->dsize + key->dsize,
++ no_quota_check);
++ if (ret) {
++ /* Error path, so no quota check. */
++ if (old_acc.memory)
++ domain_memory_add_nochk(old_domid,
++ old_acc.memory + key->dsize);
++ return ret;
++ }
++
++ /* TDB should set errno, but doesn't even set ecode AFAICT. */
++ if (tdb_store(tdb_ctx, *key, *data, TDB_REPLACE) != 0) {
++ domain_memory_add_nochk(new_domid, -data->dsize - key->dsize);
++ /* Error path, so no quota check. */
++ if (old_acc.memory)
++ domain_memory_add_nochk(old_domid,
++ old_acc.memory + key->dsize);
++ errno = EIO;
++ return errno;
++ }
++
++ if (acc) {
++ /* Don't use new_domid, as it might be a transaction node. */
++ acc->domid = hdr->perms[0].id;
++ acc->memory = data->dsize;
++ }
++
++ return 0;
++}
++
++int do_tdb_delete(struct connection *conn, TDB_DATA *key,
++ struct node_account_data *acc)
++{
++ struct node_account_data tmp_acc;
++ unsigned int domid;
++
++ if (!acc) {
++ acc = &tmp_acc;
++ acc->memory = -1;
++ }
++
++ get_acc_data(key, acc);
++
++ if (tdb_delete(tdb_ctx, *key)) {
++ errno = EIO;
++ return errno;
++ }
++
++ if (acc->memory) {
++ domid = get_acc_domid(conn, key, acc->domid);
++ domain_memory_add_nochk(domid, -acc->memory - key->dsize);
++ }
++
++ return 0;
++}
++
+ /*
+ * If it fails, returns NULL and sets errno.
+ * Temporary memory allocations will be done with ctx.
+@@ -644,9 +755,15 @@ struct node *read_node(struct connection *conn, const void *ctx,
+
+ /* Permissions are struct xs_permissions. */
+ node->perms.p = hdr->perms;
++ node->acc.domid = node->perms.p[0].id;
++ node->acc.memory = data.dsize;
+ if (domain_adjust_node_perms(conn, node))
+ goto error;
+
++ /* If owner is gone reset currently accounted memory size. */
++ if (node->acc.domid != node->perms.p[0].id)
++ node->acc.memory = 0;
++
+ /* Data is binary blob (usually ascii, no nul). */
+ node->data = node->perms.p + hdr->num_perms;
+ /* Children is strings, nul separated. */
+@@ -715,12 +832,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+ p += node->datalen;
+ memcpy(p, node->children, node->childlen);
+
+- /* TDB should set errno, but doesn't even set ecode AFAICT. */
+- if (tdb_store(tdb_ctx, *key, data, TDB_REPLACE) != 0) {
+- corrupt(conn, "Write of %s failed", key->dptr);
+- errno = EIO;
+- return errno;
+- }
++ if (do_tdb_write(conn, key, &data, &node->acc, no_quota_check))
++ return EIO;
++
+ return 0;
+ }
+
+@@ -1222,7 +1336,7 @@ static void delete_node_single(struct connection *conn, struct node *node)
+ if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
+ return;
+
+- if (tdb_delete(tdb_ctx, key) != 0) {
++ if (do_tdb_delete(conn, &key, &node->acc) != 0) {
+ corrupt(conn, "Could not delete '%s'", node->name);
+ return;
+ }
+@@ -1295,6 +1409,7 @@ static struct node *construct_node(struct connection *conn, const void *ctx,
+ /* No children, no data */
+ node->children = node->data = NULL;
+ node->childlen = node->datalen = 0;
++ node->acc.memory = 0;
+ node->parent = parent;
+ return node;
+
+@@ -1303,17 +1418,17 @@ nomem:
+ return NULL;
+ }
+
+-static void destroy_node_rm(struct node *node)
++static void destroy_node_rm(struct connection *conn, struct node *node)
+ {
+ if (streq(node->name, "/"))
+ corrupt(NULL, "Destroying root node!");
+
+- tdb_delete(tdb_ctx, node->key);
++ do_tdb_delete(conn, &node->key, &node->acc);
+ }
+
+ static int destroy_node(struct connection *conn, struct node *node)
+ {
+- destroy_node_rm(node);
++ destroy_node_rm(conn, node);
+ domain_entry_dec(conn, node);
+
+ /*
+@@ -1365,7 +1480,7 @@ static struct node *create_node(struct connection *conn, const void *ctx,
+ /* Account for new node */
+ if (i->parent) {
+ if (domain_entry_inc(conn, i)) {
+- destroy_node_rm(i);
++ destroy_node_rm(conn, i);
+ return NULL;
+ }
+ }
+@@ -2291,7 +2406,7 @@ static int clean_store_(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA val,
+ if (!hashtable_search(reachable, name)) {
+ log("clean_store: '%s' is orphaned!", name);
+ if (recovery) {
+- tdb_delete(tdb, key);
++ do_tdb_delete(NULL, &key, NULL);
+ }
+ }
+
+@@ -3149,6 +3264,7 @@ void read_state_node(const void *ctx, const void *state)
+ if (!node)
+ barf("allocation error restoring node");
+
++ node->acc.memory = 0;
+ node->name = name;
+ node->generation = ++generation;
+ node->datalen = sn->data_len;
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 373af18297bf..da9ecce67f31 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -176,6 +176,11 @@ struct node_perms {
+ struct xs_permissions *p;
+ };
+
++struct node_account_data {
++ unsigned int domid;
++ int memory; /* -1 if unknown */
++};
++
+ struct node {
+ const char *name;
+ /* Key used to update TDB */
+@@ -198,6 +203,9 @@ struct node {
+ /* Children, each nul-terminated. */
+ unsigned int childlen;
+ char *children;
++
++ /* Allocation information for node currently in store. */
++ struct node_account_data acc;
+ };
+
+ /* Return the only argument in the input. */
+@@ -306,6 +314,10 @@ extern xengnttab_handle **xgt_handle;
+ int remember_string(struct hashtable *hash, const char *str);
+
+ void set_tdb_key(const char *name, TDB_DATA *key);
++int do_tdb_write(struct connection *conn, TDB_DATA *key, TDB_DATA *data,
++ struct node_account_data *acc, bool no_quota_check);
++int do_tdb_delete(struct connection *conn, TDB_DATA *key,
++ struct node_account_data *acc);
+
+ void conn_free_buffered_data(struct connection *conn);
+
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index 7bd41eb475e3..ace9a11d77bb 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -153,6 +153,9 @@ struct transaction
+ /* List of all transactions active on this connection. */
+ struct list_head list;
+
++ /* Connection this transaction is associated with. */
++ struct connection *conn;
++
+ /* Connection-local identifier for this transaction. */
+ uint32_t id;
+
+@@ -286,6 +289,8 @@ int access_node(struct connection *conn, struct node *node,
+
+ introduce = true;
+ i->ta_node = false;
++ /* acc.memory < 0 means "unknown, get size from TDB". */
++ node->acc.memory = -1;
+
+ /*
+ * Additional transaction-specific node for read type. We only
+@@ -410,11 +415,11 @@ static int finalize_transaction(struct connection *conn,
+ goto err;
+ hdr = (void *)data.dptr;
+ hdr->generation = ++generation;
+- ret = tdb_store(tdb_ctx, key, data,
+- TDB_REPLACE);
++ ret = do_tdb_write(conn, &key, &data, NULL,
++ true);
+ talloc_free(data.dptr);
+ } else {
+- ret = tdb_delete(tdb_ctx, key);
++ ret = do_tdb_delete(conn, &key, NULL);
+ }
+ if (ret)
+ goto err;
+@@ -425,7 +430,7 @@ static int finalize_transaction(struct connection *conn,
+ }
+ }
+
+- if (i->ta_node && tdb_delete(tdb_ctx, ta_key))
++ if (i->ta_node && do_tdb_delete(conn, &ta_key, NULL))
+ goto err;
+ list_del(&i->list);
+ talloc_free(i);
+@@ -453,7 +458,7 @@ static int destroy_transaction(void *_transaction)
+ i->node);
+ if (trans_name) {
+ set_tdb_key(trans_name, &key);
+- tdb_delete(tdb_ctx, key);
++ do_tdb_delete(trans->conn, &key, NULL);
+ }
+ }
+ list_del(&i->list);
+@@ -497,6 +502,7 @@ int do_transaction_start(struct connection *conn, struct buffered_data *in)
+
+ INIT_LIST_HEAD(&trans->accessed);
+ INIT_LIST_HEAD(&trans->changed_domains);
++ trans->conn = conn;
+ trans->fail = false;
+ trans->generation = ++generation;
+
+--
+2.37.4
+
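
One detail worth spelling out in the node accounting: node_account_data.memory uses -1 as a sentinel for "size not known yet", and get_acc_data() lazily fills it from the record currently stored in TDB, so that do_tdb_write() can credit the previous owner before charging the new one. The fragment below reproduces that lazy-fill-and-transfer pattern in isolation; the one-record "store" and the counters are hypothetical stand-ins for TDB and the real domain accounting.

    /* Illustrative only: "-1 means unknown, look the old record up" plus
     * the credit-old-owner / charge-new-owner transfer on a write. */
    #include <assert.h>
    #include <stdio.h>

    struct node_account_data {
        unsigned int domid;
        int memory;                   /* -1 if unknown */
    };

    /* fake one-record "store": an owner and a stored size */
    static unsigned int stored_domid = 1;
    static int stored_size = 40;

    static long charged[8];           /* per-domain accounted bytes */

    static void get_acc_data(struct node_account_data *acc)
    {
        if (acc->memory < 0) {        /* unknown: read old record's size/owner */
            acc->memory = stored_size;
            acc->domid = stored_domid;
        }
    }

    static void write_node(unsigned int new_domid, int new_size,
                           struct node_account_data *acc)
    {
        struct node_account_data old =
            acc ? *acc : (struct node_account_data){ .memory = -1 };

        get_acc_data(&old);
        if (old.memory)
            charged[old.domid] -= old.memory;   /* credit previous owner */
        charged[new_domid] += new_size;         /* charge the new owner */

        stored_domid = new_domid;               /* "write" the record */
        stored_size = new_size;
        if (acc) {
            acc->domid = new_domid;
            acc->memory = new_size;
        }
    }

    int main(void)
    {
        charged[1] = 40;                        /* domain 1 owns the old record */
        write_node(2, 64, NULL);                /* owner and size both change */
        assert(charged[1] == 0 && charged[2] == 64);
        printf("dom1=%ld dom2=%ld\n", charged[1], charged[2]);
        return 0;
    }

The real code additionally has to undo the charge if the TDB write itself fails, which is why its error paths go through the unchecked accounting variant.
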
diff --git a/0060-tools-xenstore-add-exports-for-quota-variables.patch b/0060-tools-xenstore-add-exports-for-quota-variables.patch
new file mode 100644
index 0000000..79ca465
--- /dev/null
+++ b/0060-tools-xenstore-add-exports-for-quota-variables.patch
@@ -0,0 +1,62 @@
+From 0a67b4eef104c36bef52990e413ef361acc8183c Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:10 +0200
+Subject: [PATCH 60/87] tools/xenstore: add exports for quota variables
+
+Some quota variables are not exported via header files.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 1da16d5990b5f7752657fca3e948f735177ea9ad)
+---
+ tools/xenstore/xenstored_core.h | 5 +++++
+ tools/xenstore/xenstored_transaction.c | 1 -
+ tools/xenstore/xenstored_watch.c | 2 --
+ 3 files changed, 5 insertions(+), 3 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index da9ecce67f31..bfd3fc1e9df3 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -275,6 +275,11 @@ extern TDB_CONTEXT *tdb_ctx;
+ extern int dom0_domid;
+ extern int dom0_event;
+ extern int priv_domid;
++extern int quota_nb_watch_per_domain;
++extern int quota_max_transaction;
++extern int quota_max_entry_size;
++extern int quota_nb_perms_per_node;
++extern int quota_max_path_len;
+ extern int quota_nb_entry_per_domain;
+ extern int quota_req_outstanding;
+ extern int quota_trans_nodes;
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index ace9a11d77bb..28774813de83 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -175,7 +175,6 @@ struct transaction
+ bool fail;
+ };
+
+-extern int quota_max_transaction;
+ uint64_t generation;
+
+ static struct accessed_node *find_accessed_node(struct transaction *trans,
+diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
+index fdf9b2d653a0..85362bcce314 100644
+--- a/tools/xenstore/xenstored_watch.c
++++ b/tools/xenstore/xenstored_watch.c
+@@ -31,8 +31,6 @@
+ #include "xenstored_domain.h"
+ #include "xenstored_transaction.h"
+
+-extern int quota_nb_watch_per_domain;
+-
+ struct watch
+ {
+ /* Watches on this connection */
+--
+2.37.4
+
diff --git a/0061-tools-xenstore-add-control-command-for-setting-and-s.patch b/0061-tools-xenstore-add-control-command-for-setting-and-s.patch
new file mode 100644
index 0000000..5adcd35
--- /dev/null
+++ b/0061-tools-xenstore-add-control-command-for-setting-and-s.patch
@@ -0,0 +1,248 @@
+From b584b9b95687655f4f9f5c37fea3b1eea3f32886 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:10 +0200
+Subject: [PATCH 61/87] tools/xenstore: add control command for setting and
+ showing quota
+
+Add a xenstore-control command "quota" to:
+- show current quota settings
+- change quota settings
+- show current quota related values of a domain
+
+Note that if the new quota is lower than the existing one, Xenstored may
+continue to handle requests from a domain already exceeding the new
+limit (depending on which quota has been exceeded), and the amount of
+resources in use will not change. However, the domain will not be able
+to create more of the resource governed by the quota until it is back
+below the limit.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 9c484bef83496b683b0087e3bd2a560da4aa37af)
+---
+ docs/misc/xenstore.txt | 11 +++
+ tools/xenstore/xenstored_control.c | 111 +++++++++++++++++++++++++++++
+ tools/xenstore/xenstored_domain.c | 33 +++++++++
+ tools/xenstore/xenstored_domain.h | 2 +
+ 4 files changed, 157 insertions(+)
+
+diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
+index 334dc8b6fdf5..a7d006519ae8 100644
+--- a/docs/misc/xenstore.txt
++++ b/docs/misc/xenstore.txt
+@@ -366,6 +366,17 @@ CONTROL <command>|[<parameters>|]
+ print|<string>
+ print <string> to syslog (xenstore runs as daemon) or
+ to console (xenstore runs as stubdom)
++ quota|[set <name> <val>|<domid>]
++ without parameters: print the current quota settings
++ with "set <name> <val>": set the quota <name> to new value
++ <val> (The admin should make sure all the domain usage is
++ below the quota. If it is not, then Xenstored may continue to
++ handle requests from the domain as long as the resource
++ violating the new quota setting isn't increased further)
++ with "<domid>": print quota related accounting data for
++ the domain <domid>
++ quota-soft|[set <name> <val>]
++ like the "quota" command, but for soft-quota.
+ help <supported-commands>
+ return list of supported commands for CONTROL
+
+diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
+index adb8d51b043b..1031a81c3874 100644
+--- a/tools/xenstore/xenstored_control.c
++++ b/tools/xenstore/xenstored_control.c
+@@ -196,6 +196,115 @@ static int do_control_log(void *ctx, struct connection *conn,
+ return 0;
+ }
+
++struct quota {
++ const char *name;
++ int *quota;
++ const char *descr;
++};
++
++static const struct quota hard_quotas[] = {
++ { "nodes", &quota_nb_entry_per_domain, "Nodes per domain" },
++ { "watches", &quota_nb_watch_per_domain, "Watches per domain" },
++ { "transactions", &quota_max_transaction, "Transactions per domain" },
++ { "outstanding", &quota_req_outstanding,
++ "Outstanding requests per domain" },
++ { "transaction-nodes", &quota_trans_nodes,
++ "Max. number of accessed nodes per transaction" },
++ { "memory", &quota_memory_per_domain_hard,
++ "Total Xenstore memory per domain (error level)" },
++ { "node-size", &quota_max_entry_size, "Max. size of a node" },
++ { "path-max", &quota_max_path_len, "Max. length of a node path" },
++ { "permissions", &quota_nb_perms_per_node,
++ "Max. number of permissions per node" },
++ { NULL, NULL, NULL }
++};
++
++static const struct quota soft_quotas[] = {
++ { "memory", &quota_memory_per_domain_soft,
++ "Total Xenstore memory per domain (warning level)" },
++ { NULL, NULL, NULL }
++};
++
++static int quota_show_current(const void *ctx, struct connection *conn,
++ const struct quota *quotas)
++{
++ char *resp;
++ unsigned int i;
++
++ resp = talloc_strdup(ctx, "Quota settings:\n");
++ if (!resp)
++ return ENOMEM;
++
++ for (i = 0; quotas[i].quota; i++) {
++ resp = talloc_asprintf_append(resp, "%-17s: %8d %s\n",
++ quotas[i].name, *quotas[i].quota,
++ quotas[i].descr);
++ if (!resp)
++ return ENOMEM;
++ }
++
++ send_reply(conn, XS_CONTROL, resp, strlen(resp) + 1);
++
++ return 0;
++}
++
++static int quota_set(const void *ctx, struct connection *conn,
++ char **vec, int num, const struct quota *quotas)
++{
++ unsigned int i;
++ int val;
++
++ if (num != 2)
++ return EINVAL;
++
++ val = atoi(vec[1]);
++ if (val < 1)
++ return EINVAL;
++
++ for (i = 0; quotas[i].quota; i++) {
++ if (!strcmp(vec[0], quotas[i].name)) {
++ *quotas[i].quota = val;
++ send_ack(conn, XS_CONTROL);
++ return 0;
++ }
++ }
++
++ return EINVAL;
++}
++
++static int quota_get(const void *ctx, struct connection *conn,
++ char **vec, int num)
++{
++ if (num != 1)
++ return EINVAL;
++
++ return domain_get_quota(ctx, conn, atoi(vec[0]));
++}
++
++static int do_control_quota(void *ctx, struct connection *conn,
++ char **vec, int num)
++{
++ if (num == 0)
++ return quota_show_current(ctx, conn, hard_quotas);
++
++ if (!strcmp(vec[0], "set"))
++ return quota_set(ctx, conn, vec + 1, num - 1, hard_quotas);
++
++ return quota_get(ctx, conn, vec, num);
++}
++
++static int do_control_quota_s(void *ctx, struct connection *conn,
++ char **vec, int num)
++{
++ if (num == 0)
++ return quota_show_current(ctx, conn, soft_quotas);
++
++ if (!strcmp(vec[0], "set"))
++ return quota_set(ctx, conn, vec + 1, num - 1, soft_quotas);
++
++ return EINVAL;
++}
++
+ #ifdef __MINIOS__
+ static int do_control_memreport(void *ctx, struct connection *conn,
+ char **vec, int num)
+@@ -847,6 +956,8 @@ static struct cmd_s cmds[] = {
+ { "memreport", do_control_memreport, "[<file>]" },
+ #endif
+ { "print", do_control_print, "<string>" },
++ { "quota", do_control_quota, "[set <name> <val>|<domid>]" },
++ { "quota-soft", do_control_quota_s, "[set <name> <val>]" },
+ { "help", do_control_help, "" },
+ };
+
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index 94fd561e9de4..e7c6886ccf47 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -31,6 +31,7 @@
+ #include "xenstored_domain.h"
+ #include "xenstored_transaction.h"
+ #include "xenstored_watch.h"
++#include "xenstored_control.h"
+
+ #include <xenevtchn.h>
+ #include <xenctrl.h>
+@@ -345,6 +346,38 @@ static struct domain *find_domain_struct(unsigned int domid)
+ return NULL;
+ }
+
++int domain_get_quota(const void *ctx, struct connection *conn,
++ unsigned int domid)
++{
++ struct domain *d = find_domain_struct(domid);
++ char *resp;
++ int ta;
++
++ if (!d)
++ return ENOENT;
++
++ ta = d->conn ? d->conn->transaction_started : 0;
++ resp = talloc_asprintf(ctx, "Domain %u:\n", domid);
++ if (!resp)
++ return ENOMEM;
++
++#define ent(t, e) \
++ resp = talloc_asprintf_append(resp, "%-16s: %8d\n", #t, e); \
++ if (!resp) return ENOMEM
++
++ ent(nodes, d->nbentry);
++ ent(watches, d->nbwatch);
++ ent(transactions, ta);
++ ent(outstanding, d->nboutstanding);
++ ent(memory, d->memory);
++
++#undef ent
++
++ send_reply(conn, XS_CONTROL, resp, strlen(resp) + 1);
++
++ return 0;
++}
++
+ static struct domain *alloc_domain(const void *context, unsigned int domid)
+ {
+ struct domain *domain;
+diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
+index 633c9a0a0a1f..904faa923afb 100644
+--- a/tools/xenstore/xenstored_domain.h
++++ b/tools/xenstore/xenstored_domain.h
+@@ -87,6 +87,8 @@ int domain_watch(struct connection *conn);
+ void domain_outstanding_inc(struct connection *conn);
+ void domain_outstanding_dec(struct connection *conn);
+ void domain_outstanding_domid_dec(unsigned int domid);
++int domain_get_quota(const void *ctx, struct connection *conn,
++ unsigned int domid);
+
+ /* Special node permission handling. */
+ int set_perms_special(struct connection *conn, const char *name,
+--
+2.37.4
+
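
The new control command is entirely table-driven: "set <name> <val>" is resolved by scanning hard_quotas (or soft_quotas for quota-soft) for a matching name and updating the int the entry points at, while a bare <domid> argument falls through to domain_get_quota(). The standalone C sketch below mirrors that lookup with a two-entry table; the initial values are made up. On the admin side this corresponds to something like "xenstore-control quota set nodes 2000", or "xenstore-control quota 5" to dump the accounting of domain 5, per the xenstore.txt hunk above.

    /* Illustrative only: the table-driven "set <name> <val>" lookup,
     * reduced to a standalone program with invented default values. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct quota {
        const char *name;
        int *quota;
        const char *descr;
    };

    static int quota_nb_entry_per_domain = 1000;
    static int quota_nb_watch_per_domain = 128;

    static const struct quota hard_quotas[] = {
        { "nodes",   &quota_nb_entry_per_domain, "Nodes per domain" },
        { "watches", &quota_nb_watch_per_domain, "Watches per domain" },
        { NULL, NULL, NULL }
    };

    /* scan the table and update the matching quota variable */
    static int quota_set(const char *name, const char *val)
    {
        int v = atoi(val);
        unsigned int i;

        if (v < 1)
            return -1;                       /* EINVAL in the real code */
        for (i = 0; hard_quotas[i].quota; i++) {
            if (!strcmp(name, hard_quotas[i].name)) {
                *hard_quotas[i].quota = v;
                return 0;
            }
        }
        return -1;                           /* unknown quota name */
    }

    int main(void)
    {
        unsigned int i;

        quota_set("nodes", "2000");
        for (i = 0; hard_quotas[i].quota; i++)
            printf("%-17s: %8d %s\n", hard_quotas[i].name,
                   *hard_quotas[i].quota, hard_quotas[i].descr);
        return 0;
    }
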
diff --git a/0062-tools-ocaml-xenstored-Synchronise-defaults-with-oxen.patch b/0062-tools-ocaml-xenstored-Synchronise-defaults-with-oxen.patch
new file mode 100644
index 0000000..b9f5b18
--- /dev/null
+++ b/0062-tools-ocaml-xenstored-Synchronise-defaults-with-oxen.patch
@@ -0,0 +1,63 @@
+From b0e95b451225de4db99bbe0b8dc79fdf08873e9e Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Wed, 12 Oct 2022 19:13:01 +0100
+Subject: [PATCH 62/87] tools/ocaml/xenstored: Synchronise defaults with
+ oxenstore.conf.in
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+We currently have 2 different sets of defaults in the upstream Xen git tree:
+* defined in the source code, only used if there is no config file
+* defined in the oxenstored.conf.in shipped with upstream Xen
+
+An oxenstored.conf file is not mandatory, and if missing, maxrequests in
+particular has an unsafe default.
+
+Resync the defaults from oxenstored.conf.in into the source code.
+
+This is part of XSA-326 / CVE-2022-42316.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 84734955d4bf629ba459a74773afcde50a52236f)
+---
+ tools/ocaml/xenstored/define.ml | 6 +++---
+ tools/ocaml/xenstored/quota.ml | 4 ++--
+ 2 files changed, 5 insertions(+), 5 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
+index ebe18b8e312c..6b06f808595b 100644
+--- a/tools/ocaml/xenstored/define.ml
++++ b/tools/ocaml/xenstored/define.ml
+@@ -21,9 +21,9 @@ let xs_daemon_socket = Paths.xen_run_stored ^ "/socket"
+
+ let default_config_dir = Paths.xen_config_dir
+
+-let maxwatch = ref (50)
+-let maxtransaction = ref (20)
+-let maxrequests = ref (-1) (* maximum requests per transaction *)
++let maxwatch = ref (100)
++let maxtransaction = ref (10)
++let maxrequests = ref (1024) (* maximum requests per transaction *)
+
+ let conflict_burst_limit = ref 5.0
+ let conflict_max_history_seconds = ref 0.05
+diff --git a/tools/ocaml/xenstored/quota.ml b/tools/ocaml/xenstored/quota.ml
+index abcac912805a..6e3d6401ae89 100644
+--- a/tools/ocaml/xenstored/quota.ml
++++ b/tools/ocaml/xenstored/quota.ml
+@@ -20,8 +20,8 @@ exception Transaction_opened
+
+ let warn fmt = Logging.warn "quota" fmt
+ let activate = ref true
+-let maxent = ref (10000)
+-let maxsize = ref (4096)
++let maxent = ref (1000)
++let maxsize = ref (2048)
+
+ type t = {
+ maxent: int; (* max entities per domU *)
+--
+2.37.4
+
diff --git a/0063-tools-ocaml-xenstored-Check-for-maxrequests-before-p.patch b/0063-tools-ocaml-xenstored-Check-for-maxrequests-before-p.patch
new file mode 100644
index 0000000..5b3b646
--- /dev/null
+++ b/0063-tools-ocaml-xenstored-Check-for-maxrequests-before-p.patch
@@ -0,0 +1,101 @@
+From ab21bb1971a7fa9308053b0686f43277f6e8a6c9 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Thu, 28 Jul 2022 17:08:15 +0100
+Subject: [PATCH 63/87] tools/ocaml/xenstored: Check for maxrequests before
+ performing operations
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Previously we'd perform the operation, record the updated tree in the
+transaction record, then try to insert a watchop path and the reply packet.
+
+If we exceeded max requests we would've returned EQUOTA, but still:
+* have performed the operation on the transaction's tree
+* have recorded the watchop, making this queue effectively unbounded
+
+It is better if we check whether we'd have room to store the operation before
+performing the transaction, and raise EQUOTA there. Then the transaction
+record won't grow.
+
+This is part of XSA-326 / CVE-2022-42317.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 329f4d1a6535c6c5a34025ca0d03fc5c7228fcff)
+---
+ tools/ocaml/xenstored/process.ml | 4 +++-
+ tools/ocaml/xenstored/transaction.ml | 16 ++++++++++++----
+ 2 files changed, 15 insertions(+), 5 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
+index 27790d4a5c41..dd58e6979cf9 100644
+--- a/tools/ocaml/xenstored/process.ml
++++ b/tools/ocaml/xenstored/process.ml
+@@ -389,6 +389,7 @@ let input_handle_error ~cons ~doms ~fct ~con ~t ~req =
+ let reply_error e =
+ Packet.Error e in
+ try
++ Transaction.check_quota_exn ~perm:(Connection.get_perm con) t;
+ fct con t doms cons req.Packet.data
+ with
+ | Define.Invalid_path -> reply_error "EINVAL"
+@@ -681,9 +682,10 @@ let process_packet ~store ~cons ~doms ~con ~req =
+ in
+
+ let response = try
++ Transaction.check_quota_exn ~perm:(Connection.get_perm con) t;
+ if tid <> Transaction.none then
+ (* Remember the request and response for this operation in case we need to replay the transaction *)
+- Transaction.add_operation ~perm:(Connection.get_perm con) t req response;
++ Transaction.add_operation t req response;
+ response
+ with Quota.Limit_reached ->
+ Packet.Error "EQUOTA"
+diff --git a/tools/ocaml/xenstored/transaction.ml b/tools/ocaml/xenstored/transaction.ml
+index 17b1bdf2eaf9..294143e2335b 100644
+--- a/tools/ocaml/xenstored/transaction.ml
++++ b/tools/ocaml/xenstored/transaction.ml
+@@ -85,6 +85,7 @@ type t = {
+ oldroot: Store.Node.t;
+ mutable paths: (Xenbus.Xb.Op.operation * Store.Path.t) list;
+ mutable operations: (Packet.request * Packet.response) list;
++ mutable quota_reached: bool;
+ mutable read_lowpath: Store.Path.t option;
+ mutable write_lowpath: Store.Path.t option;
+ }
+@@ -127,6 +128,7 @@ let make ?(internal=false) id store =
+ oldroot = Store.get_root store;
+ paths = [];
+ operations = [];
++ quota_reached = false;
+ read_lowpath = None;
+ write_lowpath = None;
+ } in
+@@ -143,13 +145,19 @@ let get_root t = Store.get_root t.store
+
+ let is_read_only t = t.paths = []
+ let add_wop t ty path = t.paths <- (ty, path) :: t.paths
+-let add_operation ~perm t request response =
++let get_operations t = List.rev t.operations
++
++let check_quota_exn ~perm t =
+ if !Define.maxrequests >= 0
+ && not (Perms.Connection.is_dom0 perm)
+- && List.length t.operations >= !Define.maxrequests
+- then raise Quota.Limit_reached;
++ && (t.quota_reached || List.length t.operations >= !Define.maxrequests)
++ then begin
++ t.quota_reached <- true;
++ raise Quota.Limit_reached;
++ end
++
++let add_operation t request response =
+ t.operations <- (request, response) :: t.operations
+-let get_operations t = List.rev t.operations
+ let set_read_lowpath t path = t.read_lowpath <- get_lowest path t.read_lowpath
+ let set_write_lowpath t path = t.write_lowpath <- get_lowest path t.write_lowpath
+
+--
+2.37.4
+
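
The essential point of this change is ordering: the quota check now runs before the operation mutates the transaction, so a rejected request can no longer leave a partially grown transaction record behind, and the sticky quota_reached flag keeps later operations in the same transaction cheap to reject. The patch itself is OCaml; the snippet below is only a small C analogy of that check-then-act pattern, with invented names and limits.

    /* Illustrative analogy in C: reject a request before applying it, so
     * hitting the quota cannot grow the recorded per-transaction state. */
    #include <assert.h>
    #include <stdio.h>

    #define MAXREQUESTS 4              /* hypothetical per-transaction limit */

    struct txn {
        int n_ops;                     /* operations recorded so far */
        int quota_reached;             /* sticky, as in the OCaml version */
    };

    static int check_quota(struct txn *t)
    {
        if (t->quota_reached || t->n_ops >= MAXREQUESTS) {
            t->quota_reached = 1;      /* stays set for the rest of the txn */
            return -1;                 /* EQUOTA */
        }
        return 0;
    }

    static int do_op(struct txn *t)
    {
        if (check_quota(t))            /* check BEFORE touching any state */
            return -1;
        t->n_ops++;                    /* only now record the operation */
        return 0;
    }

    int main(void)
    {
        struct txn t = { 0, 0 };
        int i, rejected = 0;

        for (i = 0; i < 10; i++)
            if (do_op(&t))
                rejected++;
        assert(t.n_ops == MAXREQUESTS);   /* nothing recorded past the limit */
        printf("recorded=%d rejected=%d\n", t.n_ops, rejected);
        return 0;
    }
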
diff --git a/0064-tools-ocaml-GC-parameter-tuning.patch b/0064-tools-ocaml-GC-parameter-tuning.patch
new file mode 100644
index 0000000..6c80e2d
--- /dev/null
+++ b/0064-tools-ocaml-GC-parameter-tuning.patch
@@ -0,0 +1,126 @@
+From a63bbcf5318b487ca86574d7fcf916958af5ed02 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Wed, 12 Oct 2022 19:13:07 +0100
+Subject: [PATCH 64/87] tools/ocaml: GC parameter tuning
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+By default the OCaml garbage collector would return memory to the OS only
+after unused memory is 5x live memory. Tweak this to 120% instead, which
+would match the major GC speed.
+
+This is part of XSA-326.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 4a8bacff20b857ca0d628ef5525877ade11f2a42)
+---
+ tools/ocaml/xenstored/define.ml | 1 +
+ tools/ocaml/xenstored/xenstored.ml | 64 ++++++++++++++++++++++++++++++
+ 2 files changed, 65 insertions(+)
+
+diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
+index 6b06f808595b..ba63a8147e09 100644
+--- a/tools/ocaml/xenstored/define.ml
++++ b/tools/ocaml/xenstored/define.ml
+@@ -25,6 +25,7 @@ let maxwatch = ref (100)
+ let maxtransaction = ref (10)
+ let maxrequests = ref (1024) (* maximum requests per transaction *)
+
++let gc_max_overhead = ref 120 (* 120% see comment in xenstored.ml *)
+ let conflict_burst_limit = ref 5.0
+ let conflict_max_history_seconds = ref 0.05
+ let conflict_rate_limit_is_aggregate = ref true
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index d44ae673c42a..3b57ad016dfb 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -104,6 +104,7 @@ let parse_config filename =
+ ("quota-maxsize", Config.Set_int Quota.maxsize);
+ ("quota-maxrequests", Config.Set_int Define.maxrequests);
+ ("quota-path-max", Config.Set_int Define.path_max);
++ ("gc-max-overhead", Config.Set_int Define.gc_max_overhead);
+ ("test-eagain", Config.Set_bool Transaction.test_eagain);
+ ("persistent", Config.Set_bool Disk.enable);
+ ("xenstored-log-file", Config.String Logging.set_xenstored_log_destination);
+@@ -265,6 +266,67 @@ let to_file store cons fds file =
+ (fun () -> close_out channel)
+ end
+
++(*
++ By default OCaml's GC only returns memory to the OS when it exceeds a
++ configurable 'max overhead' setting.
++ The default is 500%, that is 5/6th of the OCaml heap needs to be free
++ and only 1/6th live for a compaction to be triggerred that would
++ release memory back to the OS.
++ If the limit is not hit then the OCaml process can reuse that memory
++ for its own purposes, but other processes won't be able to use it.
++
++ There is also a 'space overhead' setting that controls how much work
++ each major GC slice does, and by default aims at having no more than
++ 80% or 120% (depending on version) garbage values compared to live
++ values.
++ This doesn't have as much relevance to memory returned to the OS as
++ long as space_overhead <= max_overhead, because compaction is only
++ triggerred at the end of major GC cycles.
++
++ The defaults are too large once the program starts using ~100MiB of
++ memory, at which point ~500MiB would be unavailable to other processes
++ (which would be fine if this was the main process in this VM, but it is
++ not).
++
++ Max overhead can also be set to 0, however this is for testing purposes
++ only (setting it lower than 'space overhead' wouldn't help because the
++ major GC wouldn't run fast enough, and compaction does have a
++ performance cost: we can only compact contiguous regions, so memory has
++ to be moved around).
++
++ Max overhead controls how often the heap is compacted, which is useful
++ if there are burst of activity followed by long periods of idle state,
++ or if a domain quits, etc. Compaction returns memory to the OS.
++
++ wasted = live * space_overhead / 100
++
++ For globally overriding the GC settings one can use OCAMLRUNPARAM,
++ however we provide a config file override to be consistent with other
++ oxenstored settings.
++
++ One might want to dynamically adjust the overhead setting based on used
++ memory, i.e. to use a fixed upper bound in bytes, not percentage. However
++ measurements show that such adjustments increase GC overhead massively,
++ while still not guaranteeing that memory is returned any more quickly
++ than with a percentage based setting.
++
++ The allocation policy could also be tweaked, e.g. first fit would reduce
++ fragmentation and thus memory usage, but the documentation warns that it
++ can be sensibly slower, and indeed one of our own testcases can trigger
++ such a corner case where it is multiple times slower, so it is best to keep
++ the default allocation policy (next-fit/best-fit depending on version).
++
++ There are other tweaks that can be attempted in the future, e.g. setting
++ 'ulimit -v' to 75% of RAM, however getting the kernel to actually return
++ NULL from allocations is difficult even with that setting, and without a
++ NULL the emergency GC won't be triggerred.
++ Perhaps cgroup limits could help, but for now tweak the safest only.
++*)
++
++let tweak_gc () =
++ Gc.set { (Gc.get ()) with Gc.max_overhead = !Define.gc_max_overhead }
++
++
+ let _ =
+ let cf = do_argv in
+ let pidfile =
+@@ -274,6 +336,8 @@ let _ =
+ default_pidfile
+ in
+
++ tweak_gc ();
++
+ (try
+ Unixext.mkdir_rec (Filename.dirname pidfile) 0o755
+ with _ ->
+--
+2.37.4
+
diff --git a/0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch b/0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch
new file mode 100644
index 0000000..4c1bcbe
--- /dev/null
+++ b/0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch
@@ -0,0 +1,92 @@
+From 8b60ad49b46f2e020e0f0847df80c768d669cdb2 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Fri, 29 Jul 2022 18:53:29 +0100
+Subject: [PATCH 65/87] tools/ocaml/libs/xb: hide type of Xb.t
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Hiding the type will make it easier to change the implementation
+in the future without breaking code that relies on it.
+
+No functional change.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 7ade30a1451734d041363c750a65d322e25b47ba)
+---
+ tools/ocaml/libs/xb/xb.ml | 3 +++
+ tools/ocaml/libs/xb/xb.mli | 9 ++-------
+ tools/ocaml/xenstored/connection.ml | 8 ++------
+ 3 files changed, 7 insertions(+), 13 deletions(-)
+
+diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
+index 104d319d7747..8404ddd8a682 100644
+--- a/tools/ocaml/libs/xb/xb.ml
++++ b/tools/ocaml/libs/xb/xb.ml
+@@ -196,6 +196,9 @@ let peek_output con = Queue.peek con.pkt_out
+ let input_len con = Queue.length con.pkt_in
+ let has_in_packet con = Queue.length con.pkt_in > 0
+ let get_in_packet con = Queue.pop con.pkt_in
++let has_partial_input con = match con.partial_in with
++ | HaveHdr _ -> true
++ | NoHdr (n, _) -> n < Partial.header_size ()
+ let has_more_input con =
+ match con.backend with
+ | Fd _ -> false
+diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
+index 3a00da6cddc1..794e35bb343e 100644
+--- a/tools/ocaml/libs/xb/xb.mli
++++ b/tools/ocaml/libs/xb/xb.mli
+@@ -66,13 +66,7 @@ type backend_mmap = {
+ type backend_fd = { fd : Unix.file_descr; }
+ type backend = Fd of backend_fd | Xenmmap of backend_mmap
+ type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
+-type t = {
+- backend : backend;
+- pkt_in : Packet.t Queue.t;
+- pkt_out : Packet.t Queue.t;
+- mutable partial_in : partial_buf;
+- mutable partial_out : string;
+-}
++type t
+ val init_partial_in : unit -> partial_buf
+ val reconnect : t -> unit
+ val queue : t -> Packet.t -> unit
+@@ -97,6 +91,7 @@ val has_output : t -> bool
+ val peek_output : t -> Packet.t
+ val input_len : t -> int
+ val has_in_packet : t -> bool
++val has_partial_input : t -> bool
+ val get_in_packet : t -> Packet.t
+ val has_more_input : t -> bool
+ val is_selectable : t -> bool
+diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
+index 65f99ea6f28a..38b47363a173 100644
+--- a/tools/ocaml/xenstored/connection.ml
++++ b/tools/ocaml/xenstored/connection.ml
+@@ -125,9 +125,7 @@ let get_perm con =
+ let set_target con target_domid =
+ con.perm <- Perms.Connection.set_target (get_perm con) ~perms:[Perms.READ; Perms.WRITE] target_domid
+
+-let is_backend_mmap con = match con.xb.Xenbus.Xb.backend with
+- | Xenbus.Xb.Xenmmap _ -> true
+- | _ -> false
++let is_backend_mmap con = Xenbus.Xb.is_mmap con.xb
+
+ let send_reply con tid rid ty data =
+ if (String.length data) > xenstore_payload_max && (is_backend_mmap con) then
+@@ -280,9 +278,7 @@ let get_transaction con tid =
+
+ let do_input con = Xenbus.Xb.input con.xb
+ let has_input con = Xenbus.Xb.has_in_packet con.xb
+-let has_partial_input con = match con.xb.Xenbus.Xb.partial_in with
+- | HaveHdr _ -> true
+- | NoHdr (n, _) -> n < Xenbus.Partial.header_size ()
++let has_partial_input con = Xenbus.Xb.has_partial_input con.xb
+ let pop_in con = Xenbus.Xb.get_in_packet con.xb
+ let has_more_input con = Xenbus.Xb.has_more_input con.xb
+
+--
+2.37.4
+
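
Hiding the representation behind accessors is the classic opaque-pointer idiom when translated to C: the header exposes only a forward declaration, so the layout can change, as the following patches indeed do, without touching callers. The sketch below is hypothetical C, not code from this series; the file boundaries are indicated by comments and the whole thing compiles as one unit for illustration.

    /* Hypothetical C parallel to hiding Xb.t: callers only ever see an
     * opaque pointer plus accessors, never the struct layout. */

    /* --- xb.h (public): forward declaration and accessors only --- */
    struct xb;                              /* layout not visible to users */
    struct xb *xb_open(int fd);
    int xb_has_partial_input(const struct xb *xb);

    /* --- xb.c (private): the real layout, free to change later --- */
    #include <stdlib.h>
    struct xb {
        int fd;
        int partial_in;                     /* bytes of a half-read header */
    };
    struct xb *xb_open(int fd)
    {
        struct xb *xb = calloc(1, sizeof(*xb));
        if (xb)
            xb->fd = fd;
        return xb;
    }
    int xb_has_partial_input(const struct xb *xb)
    {
        return xb->partial_in > 0;          /* accessor instead of field access */
    }

    /* --- user.c: compiles against xb.h only --- */
    #include <stdio.h>
    int main(void)
    {
        struct xb *xb = xb_open(0);
        if (!xb)
            return 1;
        printf("partial input pending: %d\n", xb_has_partial_input(xb));
        return 0;
    }
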
diff --git a/0066-tools-ocaml-Change-Xb.input-to-return-Packet.t-optio.patch b/0066-tools-ocaml-Change-Xb.input-to-return-Packet.t-optio.patch
new file mode 100644
index 0000000..0fa056d
--- /dev/null
+++ b/0066-tools-ocaml-Change-Xb.input-to-return-Packet.t-optio.patch
@@ -0,0 +1,224 @@
+From 59981b08c8ef6eed37b1171656c2a5f3b4b74012 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Wed, 12 Oct 2022 19:13:02 +0100
+Subject: [PATCH 66/87] tools/ocaml: Change Xb.input to return Packet.t option
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The queue here would only ever hold at most one element. This will simplify
+follow-up patches.
+
+This is part of XSA-326.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit c0a86a462721008eca5ff733660de094d3c34bc7)
+---
+ tools/ocaml/libs/xb/xb.ml | 18 +++++-------------
+ tools/ocaml/libs/xb/xb.mli | 5 +----
+ tools/ocaml/libs/xs/xsraw.ml | 20 ++++++--------------
+ tools/ocaml/xenstored/connection.ml | 4 +---
+ tools/ocaml/xenstored/process.ml | 15 +++++++--------
+ 5 files changed, 20 insertions(+), 42 deletions(-)
+
+diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
+index 8404ddd8a682..165fd4a1edf4 100644
+--- a/tools/ocaml/libs/xb/xb.ml
++++ b/tools/ocaml/libs/xb/xb.ml
+@@ -45,7 +45,6 @@ type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
+ type t =
+ {
+ backend: backend;
+- pkt_in: Packet.t Queue.t;
+ pkt_out: Packet.t Queue.t;
+ mutable partial_in: partial_buf;
+ mutable partial_out: string;
+@@ -62,7 +61,6 @@ let reconnect t = match t.backend with
+ Xs_ring.close backend.mmap;
+ backend.eventchn_notify ();
+ (* Clear our old connection state *)
+- Queue.clear t.pkt_in;
+ Queue.clear t.pkt_out;
+ t.partial_in <- init_partial_in ();
+ t.partial_out <- ""
+@@ -124,7 +122,6 @@ let output con =
+
+ (* NB: can throw Reconnect *)
+ let input con =
+- let newpacket = ref false in
+ let to_read =
+ match con.partial_in with
+ | HaveHdr partial_pkt -> Partial.to_complete partial_pkt
+@@ -143,21 +140,19 @@ let input con =
+ if Partial.to_complete partial_pkt = 0 then (
+ let pkt = Packet.of_partialpkt partial_pkt in
+ con.partial_in <- init_partial_in ();
+- Queue.push pkt con.pkt_in;
+- newpacket := true
+- )
++ Some pkt
++ ) else None
+ | NoHdr (i, buf) ->
+ (* we complete the partial header *)
+ if sz > 0 then
+ Bytes.blit b 0 buf (Partial.header_size () - i) sz;
+ con.partial_in <- if sz = i then
+- HaveHdr (Partial.of_string (Bytes.to_string buf)) else NoHdr (i - sz, buf)
+- );
+- !newpacket
++ HaveHdr (Partial.of_string (Bytes.to_string buf)) else NoHdr (i - sz, buf);
++ None
++ )
+
+ let newcon backend = {
+ backend = backend;
+- pkt_in = Queue.create ();
+ pkt_out = Queue.create ();
+ partial_in = init_partial_in ();
+ partial_out = "";
+@@ -193,9 +188,6 @@ let has_output con = has_new_output con || has_old_output con
+
+ let peek_output con = Queue.peek con.pkt_out
+
+-let input_len con = Queue.length con.pkt_in
+-let has_in_packet con = Queue.length con.pkt_in > 0
+-let get_in_packet con = Queue.pop con.pkt_in
+ let has_partial_input con = match con.partial_in with
+ | HaveHdr _ -> true
+ | NoHdr (n, _) -> n < Partial.header_size ()
+diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
+index 794e35bb343e..91c682162cea 100644
+--- a/tools/ocaml/libs/xb/xb.mli
++++ b/tools/ocaml/libs/xb/xb.mli
+@@ -77,7 +77,7 @@ val write_fd : backend_fd -> 'a -> string -> int -> int
+ val write_mmap : backend_mmap -> 'a -> string -> int -> int
+ val write : t -> string -> int -> int
+ val output : t -> bool
+-val input : t -> bool
++val input : t -> Packet.t option
+ val newcon : backend -> t
+ val open_fd : Unix.file_descr -> t
+ val open_mmap : Xenmmap.mmap_interface -> (unit -> unit) -> t
+@@ -89,10 +89,7 @@ val has_new_output : t -> bool
+ val has_old_output : t -> bool
+ val has_output : t -> bool
+ val peek_output : t -> Packet.t
+-val input_len : t -> int
+-val has_in_packet : t -> bool
+ val has_partial_input : t -> bool
+-val get_in_packet : t -> Packet.t
+ val has_more_input : t -> bool
+ val is_selectable : t -> bool
+ val get_fd : t -> Unix.file_descr
+diff --git a/tools/ocaml/libs/xs/xsraw.ml b/tools/ocaml/libs/xs/xsraw.ml
+index d982fb24dbb1..451f8b38dbcc 100644
+--- a/tools/ocaml/libs/xs/xsraw.ml
++++ b/tools/ocaml/libs/xs/xsraw.ml
+@@ -94,26 +94,18 @@ let pkt_send con =
+ done
+
+ (* receive one packet - can sleep *)
+-let pkt_recv con =
+- let workdone = ref false in
+- while not !workdone
+- do
+- workdone := Xb.input con.xb
+- done;
+- Xb.get_in_packet con.xb
++let rec pkt_recv con =
++ match Xb.input con.xb with
++ | Some packet -> packet
++ | None -> pkt_recv con
+
+ let pkt_recv_timeout con timeout =
+ let fd = Xb.get_fd con.xb in
+ let r, _, _ = Unix.select [ fd ] [] [] timeout in
+ if r = [] then
+ true, None
+- else (
+- let workdone = Xb.input con.xb in
+- if workdone then
+- false, (Some (Xb.get_in_packet con.xb))
+- else
+- false, None
+- )
++ else
++ false, Xb.input con.xb
+
+ let queue_watchevent con data =
+ let ls = split_string ~limit:2 '\000' data in
+diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
+index 38b47363a173..cc20e047d2b9 100644
+--- a/tools/ocaml/xenstored/connection.ml
++++ b/tools/ocaml/xenstored/connection.ml
+@@ -277,9 +277,7 @@ let get_transaction con tid =
+ Hashtbl.find con.transactions tid
+
+ let do_input con = Xenbus.Xb.input con.xb
+-let has_input con = Xenbus.Xb.has_in_packet con.xb
+ let has_partial_input con = Xenbus.Xb.has_partial_input con.xb
+-let pop_in con = Xenbus.Xb.get_in_packet con.xb
+ let has_more_input con = Xenbus.Xb.has_more_input con.xb
+
+ let has_output con = Xenbus.Xb.has_output con.xb
+@@ -307,7 +305,7 @@ let is_bad con = match con.dom with None -> false | Some dom -> Domain.is_bad_do
+ Restrictions below can be relaxed once xenstored learns to dump more
+ of its live state in a safe way *)
+ let has_extra_connection_data con =
+- let has_in = has_input con || has_partial_input con in
++ let has_in = has_partial_input con in
+ let has_out = has_output con in
+ let has_socket = con.dom = None in
+ let has_nondefault_perms = make_perm con.dom <> con.perm in
+diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
+index dd58e6979cf9..cbf708213796 100644
+--- a/tools/ocaml/xenstored/process.ml
++++ b/tools/ocaml/xenstored/process.ml
+@@ -195,10 +195,9 @@ let parse_live_update args =
+ | _ when Unix.gettimeofday () < t.deadline -> false
+ | l ->
+ warn "timeout reached: have to wait, migrate or shutdown %d domains:" (List.length l);
+- let msgs = List.rev_map (fun con -> Printf.sprintf "%s: %d tx, in: %b, out: %b, perm: %s"
++ let msgs = List.rev_map (fun con -> Printf.sprintf "%s: %d tx, out: %b, perm: %s"
+ (Connection.get_domstr con)
+ (Connection.number_of_transactions con)
+- (Connection.has_input con)
+ (Connection.has_output con)
+ (Connection.get_perm con |> Perms.Connection.to_string)
+ ) l in
+@@ -706,16 +705,17 @@ let do_input store cons doms con =
+ info "%s requests a reconnect" (Connection.get_domstr con);
+ History.reconnect con;
+ info "%s reconnection complete" (Connection.get_domstr con);
+- false
++ None
+ | Failure exp ->
+ error "caught exception %s" exp;
+ error "got a bad client %s" (sprintf "%-8s" (Connection.get_domstr con));
+ Connection.mark_as_bad con;
+- false
++ None
+ in
+
+- if newpacket then (
+- let packet = Connection.pop_in con in
++ match newpacket with
++ | None -> ()
++ | Some packet ->
+ let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
+ let req = {Packet.tid=tid; Packet.rid=rid; Packet.ty=ty; Packet.data=data} in
+
+@@ -725,8 +725,7 @@ let do_input store cons doms con =
+ (Xenbus.Xb.Op.to_string ty) (sanitize_data data); *)
+ process_packet ~store ~cons ~doms ~con ~req;
+ write_access_log ~ty ~tid ~con:(Connection.get_domstr con) ~data;
+- Connection.incr_ops con;
+- )
++ Connection.incr_ops con
+
+ let do_output _store _cons _doms con =
+ if Connection.has_output con then (
+--
+2.37.4
+
diff --git a/0067-tools-ocaml-xb-Add-BoundedQueue.patch b/0067-tools-ocaml-xb-Add-BoundedQueue.patch
new file mode 100644
index 0000000..9a141a3
--- /dev/null
+++ b/0067-tools-ocaml-xb-Add-BoundedQueue.patch
@@ -0,0 +1,133 @@
+From ea1567893b05df03fe65657f0a25211a6a9ff7ec Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Wed, 12 Oct 2022 19:13:03 +0100
+Subject: [PATCH 67/87] tools/ocaml/xb: Add BoundedQueue
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Ensures we cannot store more than [capacity] elements in a [Queue]. Replacing
+all uses of Queue with this module will then ensure at compile time that all
+queues are correctly bounds-checked.
+
+Each element in the queue has a class with its own limits. This, in a
+subsequent change, will ensure that command responses can proceed during a
+flood of watch events.
+
+No functional change.
+
+This is part of XSA-326.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 19171fb5d888b4467a7073e8febc5e05540956e9)
+---
+ tools/ocaml/libs/xb/xb.ml | 92 +++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 92 insertions(+)
+
+diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
+index 165fd4a1edf4..4197a3888a68 100644
+--- a/tools/ocaml/libs/xb/xb.ml
++++ b/tools/ocaml/libs/xb/xb.ml
+@@ -17,6 +17,98 @@
+ module Op = struct include Op end
+ module Packet = struct include Packet end
+
++module BoundedQueue : sig
++ type ('a, 'b) t
++
++ (** [create ~capacity ~classify ~limit] creates a queue with maximum [capacity] elements.
++ This is burst capacity, each element is further classified according to [classify],
++ and each class can have its own [limit].
++ [capacity] is enforced as an overall limit.
++ The [limit] can be dynamic, and can be smaller than the number of elements already queued of that class,
++ in which case those elements are considered to use "burst capacity".
++ *)
++ val create: capacity:int -> classify:('a -> 'b) -> limit:('b -> int) -> ('a, 'b) t
++
++ (** [clear q] discards all elements from [q] *)
++ val clear: ('a, 'b) t -> unit
++
++ (** [can_push q] when [length q < capacity]. *)
++ val can_push: ('a, 'b) t -> 'b -> bool
++
++ (** [push e q] adds [e] at the end of queue [q] if [can_push q], or returns [None]. *)
++ val push: 'a -> ('a, 'b) t -> unit option
++
++ (** [pop q] removes and returns first element in [q], or raises [Queue.Empty]. *)
++ val pop: ('a, 'b) t -> 'a
++
++ (** [peek q] returns the first element in [q], or raises [Queue.Empty]. *)
++ val peek : ('a, 'b) t -> 'a
++
++ (** [length q] returns the current number of elements in [q] *)
++ val length: ('a, 'b) t -> int
++
++ (** [debug string_of_class q] prints queue usage statistics in an unspecified internal format. *)
++ val debug: ('b -> string) -> (_, 'b) t -> string
++end = struct
++ type ('a, 'b) t =
++ { q: 'a Queue.t
++ ; capacity: int
++ ; classify: 'a -> 'b
++ ; limit: 'b -> int
++ ; class_count: ('b, int) Hashtbl.t
++ }
++
++ let create ~capacity ~classify ~limit =
++ { capacity; q = Queue.create (); classify; limit; class_count = Hashtbl.create 3 }
++
++ let get_count t classification = try Hashtbl.find t.class_count classification with Not_found -> 0
++
++ let can_push_internal t classification class_count =
++ Queue.length t.q < t.capacity && class_count < t.limit classification
++
++ let ok = Some ()
++
++ let push e t =
++ let classification = t.classify e in
++ let class_count = get_count t classification in
++ if can_push_internal t classification class_count then begin
++ Queue.push e t.q;
++ Hashtbl.replace t.class_count classification (class_count + 1);
++ ok
++ end
++ else
++ None
++
++ let can_push t classification =
++ can_push_internal t classification @@ get_count t classification
++
++ let clear t =
++ Queue.clear t.q;
++ Hashtbl.reset t.class_count
++
++ let pop t =
++ let e = Queue.pop t.q in
++ let classification = t.classify e in
++ let () = match get_count t classification - 1 with
++ | 0 -> Hashtbl.remove t.class_count classification (* reduces memusage *)
++ | n -> Hashtbl.replace t.class_count classification n
++ in
++ e
++
++ let peek t = Queue.peek t.q
++ let length t = Queue.length t.q
++
++ let debug string_of_class t =
++ let b = Buffer.create 128 in
++ Printf.bprintf b "BoundedQueue capacity: %d, used: {" t.capacity;
++ Hashtbl.iter (fun packet_class count ->
++ Printf.bprintf b " %s: %d" (string_of_class packet_class) count
++ ) t.class_count;
++ Printf.bprintf b "}";
++ Buffer.contents b
++end
++
++
+ exception End_of_file
+ exception Eagain
+ exception Noent
+--
+2.37.4
+
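
BoundedQueue combines an overall burst capacity with an independent per-class limit, which is what later lets command replies keep flowing while watch events are throttled, and push() refusing an element instead of blocking is the hook the next patch uses for back-pressure. The OCaml interface above is authoritative; purely as an illustration, a rough C analogue of the same policy could look like this (capacity, limits and classes invented):

    /* Illustrative C analogue of the bounded, per-class queue: an overall
     * capacity plus a separate limit per packet class, so one kind of
     * traffic cannot starve the other. */
    #include <assert.h>
    #include <stdio.h>

    enum pkt_class { COMMAND_REPLY, WATCH_EVENT, NR_CLASSES };

    #define CAPACITY 8                          /* overall burst capacity */
    static const int class_limit[NR_CLASSES] = { 4, 6 };

    struct bounded_queue {
        int items[CAPACITY];                    /* payload: just the class */
        int head, len;
        int class_count[NR_CLASSES];
    };

    static int can_push(const struct bounded_queue *q, enum pkt_class c)
    {
        return q->len < CAPACITY && q->class_count[c] < class_limit[c];
    }

    static int push(struct bounded_queue *q, enum pkt_class c)
    {
        if (!can_push(q, c))
            return -1;                          /* caller applies back-pressure */
        q->items[(q->head + q->len++) % CAPACITY] = c;
        q->class_count[c]++;
        return 0;
    }

    static int pop(struct bounded_queue *q)
    {
        int c;

        assert(q->len > 0);
        c = q->items[q->head];
        q->head = (q->head + 1) % CAPACITY;
        q->len--;
        q->class_count[c]--;
        return c;
    }

    int main(void)
    {
        struct bounded_queue q = { .head = 0 };
        int i, ok, next, dropped = 0;

        for (i = 0; i < 10; i++)                /* flood of watch events ... */
            if (push(&q, WATCH_EVENT))
                dropped++;
        ok = push(&q, COMMAND_REPLY);           /* ... a reply still fits */
        next = pop(&q);
        printf("reply queued: %s, dropped watches: %d, next class: %d\n",
               ok == 0 ? "yes" : "no", dropped, next);
        return 0;
    }
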
diff --git a/0068-tools-ocaml-Limit-maximum-in-flight-requests-outstan.patch b/0068-tools-ocaml-Limit-maximum-in-flight-requests-outstan.patch
new file mode 100644
index 0000000..0572fa1
--- /dev/null
+++ b/0068-tools-ocaml-Limit-maximum-in-flight-requests-outstan.patch
@@ -0,0 +1,888 @@
+From cec3c52c287f5aee7de061b40765aca5301cf9ca Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Wed, 12 Oct 2022 19:13:04 +0100
+Subject: [PATCH 68/87] tools/ocaml: Limit maximum in-flight requests /
+ outstanding replies
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Introduce a limit on the number of outstanding reply packets in the xenbus
+queue. This limits the number of in-flight requests: when the output queue is
+full we'll stop processing inputs until the output queue has room again.
+
+To avoid a busy loop on the Unix socket we only add it to the watched input
+file descriptor set if we'd be able to call `input` on it. Even though Dom0
+is trusted and exempt from quotas a flood of events might cause a backlog
+where events are produced faster than daemons in Dom0 can consume them, which
+could lead to an unbounded queue size and OOM.
+
+Therefore the xenbus queue limit must apply to all connections, Dom0 is not
+exempt from it, although if everything works correctly it will eventually
+catch up.
+
+This prevents a malicious guest from sending more commands while it has
+outstanding watch events or command replies in its input ring. However if it
+can cause the generation of watch events by other means (e.g. by Dom0, or
+another cooperative guest) and stop reading its own ring then watch events
+would've queued up without limit.
+
+The xenstore protocol doesn't have a back-pressure mechanism, and doesn't
+allow dropping watch events. In fact, dropping watch events is known to break
+some pieces of normal functionality. This leaves little choice to safely
+implement the xenstore protocol without exposing the xenstore daemon to
+out-of-memory attacks.
+
+Implement the fix as pipes with bounded buffers:
+* Use a bounded buffer for watch events
+* The watch structure will have a bounded receiving pipe of watch events
+* The source will have an "overflow" pipe of pending watch events it couldn't
+ deliver
+
+Items are queued up on one end and are sent as far along the pipe as possible:
+
+ source domain -> watch -> xenbus of target -> xenstore ring/socket of target
+
+If the pipe is "full" at any point then back-pressure is applied and we prevent
+more items from being queued up. For the source domain this means that we'll
+stop accepting new commands as long as its pipe buffer is not empty.
+
+Before we try to enqueue an item we first check whether it is possible to send
+it further down the pipe, by attempting to recursively flush the pipes. This
+ensures that we retain the order of events as much as possible.
+
+We might break causality of watch events if the target domain's queue is full
+and we need to start using the watch's queue. This is a breaking change in
+the xenstore protocol, but only for domains which are not processing their
+incoming ring as expected.
+
+When a watch is deleted its entire pending queue is dropped (no code is needed
+for that, because it is part of the 'watch' type).
+
+There is a cache of watches that have pending events that we attempt to flush
+at every cycle if possible.
+
+Introduce 3 limits here:
+* quota-maxwatchevents on watch event destination: when this is hit the
+ source will not be allowed to queue up more watch events.
+* quota-maxoustanding which is the number of responses not read from the ring:
+ once exceeded, no more inputs are processed until all outstanding replies
+ are consumed by the client.
+* overflow queue on the watch event source: all watches that cannot be stored
+ on destination are queued up here, a single command can trigger multiple
+ watches (e.g. due to recursion).
+
+The overflow queue currently doesn't have an upper bound, it is difficult to
+accurately calculate one as it depends on whether you are Dom0 and how many
+watches each path has registered and how many watch events you can trigger
+with a single command (e.g. a commit). However these events were already
+using memory, this just moves them elsewhere, and as long as we correctly
+block a domain it shouldn't result in unbounded memory usage.
+
+Note that Dom0 is not excluded from these checks, it is important that Dom0 is
+especially not excluded when it is the source, since there are many ways in
+which a guest could trigger Dom0 to send it watch events.
+
+This should protect against malicious frontends as long as the backend follows
+the PV xenstore protocol and only exposes paths needed by the frontend, and
+changes those paths at most once as a reaction to guest events, or protocol
+state.
+
+The queue limits are per watch, and per domain-pair, so even if one
+communication channel would be "blocked", others would keep working, and the
+domain itself won't get blocked as long as it doesn't overflow the queue of
+watch events.
+
+Similarly a malicious backend could cause the frontend to get blocked, but
+this watch queue protects the frontend as well as long as it follows the PV
+protocol. (Although note that protection against malicious backends is only a
+best effort at the moment)
+
+This is part of XSA-326 / CVE-2022-42318.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 9284ae0c40fb5b9606947eaaec23dc71d0540e96)
+---
+ tools/ocaml/libs/xb/xb.ml | 61 +++++++--
+ tools/ocaml/libs/xb/xb.mli | 11 +-
+ tools/ocaml/libs/xs/queueop.ml | 25 ++--
+ tools/ocaml/libs/xs/xsraw.ml | 4 +-
+ tools/ocaml/xenstored/connection.ml | 155 +++++++++++++++++++++--
+ tools/ocaml/xenstored/connections.ml | 57 +++++++--
+ tools/ocaml/xenstored/define.ml | 7 +
+ tools/ocaml/xenstored/oxenstored.conf.in | 2 +
+ tools/ocaml/xenstored/process.ml | 31 ++++-
+ tools/ocaml/xenstored/xenstored.ml | 2 +
+ 10 files changed, 296 insertions(+), 59 deletions(-)
+
+diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
+index 4197a3888a68..b292ed7a874d 100644
+--- a/tools/ocaml/libs/xb/xb.ml
++++ b/tools/ocaml/libs/xb/xb.ml
+@@ -134,14 +134,44 @@ type backend = Fd of backend_fd | Xenmmap of backend_mmap
+
+ type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
+
++(*
++ separate capacity reservation for replies and watch events:
++ this allows a domain to keep working even when under a constant flood of
++ watch events
++*)
++type capacity = { maxoutstanding: int; maxwatchevents: int }
++
++module Queue = BoundedQueue
++
++type packet_class =
++ | CommandReply
++ | Watchevent
++
++let string_of_packet_class = function
++ | CommandReply -> "command_reply"
++ | Watchevent -> "watch_event"
++
+ type t =
+ {
+ backend: backend;
+- pkt_out: Packet.t Queue.t;
++ pkt_out: (Packet.t, packet_class) Queue.t;
+ mutable partial_in: partial_buf;
+ mutable partial_out: string;
++ capacity: capacity
+ }
+
++let to_read con =
++ match con.partial_in with
++ | HaveHdr partial_pkt -> Partial.to_complete partial_pkt
++ | NoHdr (i, _) -> i
++
++let debug t =
++ Printf.sprintf "XenBus state: partial_in: %d needed, partial_out: %d bytes, pkt_out: %d packets, %s"
++ (to_read t)
++ (String.length t.partial_out)
++ (Queue.length t.pkt_out)
++ (BoundedQueue.debug string_of_packet_class t.pkt_out)
++
+ let init_partial_in () = NoHdr
+ (Partial.header_size (), Bytes.make (Partial.header_size()) '\000')
+
+@@ -199,7 +229,8 @@ let output con =
+ let s = if String.length con.partial_out > 0 then
+ con.partial_out
+ else if Queue.length con.pkt_out > 0 then
+- Packet.to_string (Queue.pop con.pkt_out)
++ let pkt = Queue.pop con.pkt_out in
++ Packet.to_string pkt
+ else
+ "" in
+ (* send data from s, and save the unsent data to partial_out *)
+@@ -212,12 +243,15 @@ let output con =
+ (* after sending one packet, partial is empty *)
+ con.partial_out = ""
+
++(* we can only process an input packet if we're guaranteed to have room
++ to store the response packet *)
++let can_input con = Queue.can_push con.pkt_out CommandReply
++
+ (* NB: can throw Reconnect *)
+ let input con =
+- let to_read =
+- match con.partial_in with
+- | HaveHdr partial_pkt -> Partial.to_complete partial_pkt
+- | NoHdr (i, _) -> i in
++ if not (can_input con) then None
++ else
++ let to_read = to_read con in
+
+ (* try to get more data from input stream *)
+ let b = Bytes.make to_read '\000' in
+@@ -243,11 +277,22 @@ let input con =
+ None
+ )
+
+-let newcon backend = {
++let classify t =
++ match t.Packet.ty with
++ | Op.Watchevent -> Watchevent
++ | _ -> CommandReply
++
++let newcon ~capacity backend =
++ let limit = function
++ | CommandReply -> capacity.maxoutstanding
++ | Watchevent -> capacity.maxwatchevents
++ in
++ {
+ backend = backend;
+- pkt_out = Queue.create ();
++ pkt_out = Queue.create ~capacity:(capacity.maxoutstanding + capacity.maxwatchevents) ~classify ~limit;
+ partial_in = init_partial_in ();
+ partial_out = "";
++ capacity = capacity;
+ }
+
+ let open_fd fd = newcon (Fd { fd = fd; })
+diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
+index 91c682162cea..71b2754ca788 100644
+--- a/tools/ocaml/libs/xb/xb.mli
++++ b/tools/ocaml/libs/xb/xb.mli
+@@ -66,10 +66,11 @@ type backend_mmap = {
+ type backend_fd = { fd : Unix.file_descr; }
+ type backend = Fd of backend_fd | Xenmmap of backend_mmap
+ type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
++type capacity = { maxoutstanding: int; maxwatchevents: int }
+ type t
+ val init_partial_in : unit -> partial_buf
+ val reconnect : t -> unit
+-val queue : t -> Packet.t -> unit
++val queue : t -> Packet.t -> unit option
+ val read_fd : backend_fd -> 'a -> bytes -> int -> int
+ val read_mmap : backend_mmap -> 'a -> bytes -> int -> int
+ val read : t -> bytes -> int -> int
+@@ -78,13 +79,14 @@ val write_mmap : backend_mmap -> 'a -> string -> int -> int
+ val write : t -> string -> int -> int
+ val output : t -> bool
+ val input : t -> Packet.t option
+-val newcon : backend -> t
+-val open_fd : Unix.file_descr -> t
+-val open_mmap : Xenmmap.mmap_interface -> (unit -> unit) -> t
++val newcon : capacity:capacity -> backend -> t
++val open_fd : Unix.file_descr -> capacity:capacity -> t
++val open_mmap : Xenmmap.mmap_interface -> (unit -> unit) -> capacity:capacity -> t
+ val close : t -> unit
+ val is_fd : t -> bool
+ val is_mmap : t -> bool
+ val output_len : t -> int
++val can_input: t -> bool
+ val has_new_output : t -> bool
+ val has_old_output : t -> bool
+ val has_output : t -> bool
+@@ -93,3 +95,4 @@ val has_partial_input : t -> bool
+ val has_more_input : t -> bool
+ val is_selectable : t -> bool
+ val get_fd : t -> Unix.file_descr
++val debug: t -> string
+diff --git a/tools/ocaml/libs/xs/queueop.ml b/tools/ocaml/libs/xs/queueop.ml
+index 9ff5bbd529ce..4e532cdaeacb 100644
+--- a/tools/ocaml/libs/xs/queueop.ml
++++ b/tools/ocaml/libs/xs/queueop.ml
+@@ -16,9 +16,10 @@
+ open Xenbus
+
+ let data_concat ls = (String.concat "\000" ls) ^ "\000"
++let queue con pkt = let r = Xb.queue con pkt in assert (r <> None)
+ let queue_path ty (tid: int) (path: string) con =
+ let data = data_concat [ path; ] in
+- Xb.queue con (Xb.Packet.create tid 0 ty data)
++ queue con (Xb.Packet.create tid 0 ty data)
+
+ (* operations *)
+ let directory tid path con = queue_path Xb.Op.Directory tid path con
+@@ -27,48 +28,48 @@ let read tid path con = queue_path Xb.Op.Read tid path con
+ let getperms tid path con = queue_path Xb.Op.Getperms tid path con
+
+ let debug commands con =
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Debug (data_concat commands))
++ queue con (Xb.Packet.create 0 0 Xb.Op.Debug (data_concat commands))
+
+ let watch path data con =
+ let data = data_concat [ path; data; ] in
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Watch data)
++ queue con (Xb.Packet.create 0 0 Xb.Op.Watch data)
+
+ let unwatch path data con =
+ let data = data_concat [ path; data; ] in
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Unwatch data)
++ queue con (Xb.Packet.create 0 0 Xb.Op.Unwatch data)
+
+ let transaction_start con =
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Transaction_start (data_concat []))
++ queue con (Xb.Packet.create 0 0 Xb.Op.Transaction_start (data_concat []))
+
+ let transaction_end tid commit con =
+ let data = data_concat [ (if commit then "T" else "F"); ] in
+- Xb.queue con (Xb.Packet.create tid 0 Xb.Op.Transaction_end data)
++ queue con (Xb.Packet.create tid 0 Xb.Op.Transaction_end data)
+
+ let introduce domid mfn port con =
+ let data = data_concat [ Printf.sprintf "%u" domid;
+ Printf.sprintf "%nu" mfn;
+ string_of_int port; ] in
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Introduce data)
++ queue con (Xb.Packet.create 0 0 Xb.Op.Introduce data)
+
+ let release domid con =
+ let data = data_concat [ Printf.sprintf "%u" domid; ] in
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Release data)
++ queue con (Xb.Packet.create 0 0 Xb.Op.Release data)
+
+ let resume domid con =
+ let data = data_concat [ Printf.sprintf "%u" domid; ] in
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Resume data)
++ queue con (Xb.Packet.create 0 0 Xb.Op.Resume data)
+
+ let getdomainpath domid con =
+ let data = data_concat [ Printf.sprintf "%u" domid; ] in
+- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Getdomainpath data)
++ queue con (Xb.Packet.create 0 0 Xb.Op.Getdomainpath data)
+
+ let write tid path value con =
+ let data = path ^ "\000" ^ value (* no NULL at the end *) in
+- Xb.queue con (Xb.Packet.create tid 0 Xb.Op.Write data)
++ queue con (Xb.Packet.create tid 0 Xb.Op.Write data)
+
+ let mkdir tid path con = queue_path Xb.Op.Mkdir tid path con
+ let rm tid path con = queue_path Xb.Op.Rm tid path con
+
+ let setperms tid path perms con =
+ let data = data_concat [ path; perms ] in
+- Xb.queue con (Xb.Packet.create tid 0 Xb.Op.Setperms data)
++ queue con (Xb.Packet.create tid 0 Xb.Op.Setperms data)
+diff --git a/tools/ocaml/libs/xs/xsraw.ml b/tools/ocaml/libs/xs/xsraw.ml
+index 451f8b38dbcc..cbd17280600c 100644
+--- a/tools/ocaml/libs/xs/xsraw.ml
++++ b/tools/ocaml/libs/xs/xsraw.ml
+@@ -36,8 +36,10 @@ type con = {
+ let close con =
+ Xb.close con.xb
+
++let capacity = { Xb.maxoutstanding = 1; maxwatchevents = 0; }
++
+ let open_fd fd = {
+- xb = Xb.open_fd fd;
++ xb = Xb.open_fd ~capacity fd;
+ watchevents = Queue.create ();
+ }
+
+diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
+index cc20e047d2b9..9624a5f9da2c 100644
+--- a/tools/ocaml/xenstored/connection.ml
++++ b/tools/ocaml/xenstored/connection.ml
+@@ -20,12 +20,84 @@ open Stdext
+
+ let xenstore_payload_max = 4096 (* xen/include/public/io/xs_wire.h *)
+
++type 'a bounded_sender = 'a -> unit option
++(** a bounded sender accepts an ['a] item and returns:
++ None - if there is no room to accept the item
++ Some () - if it has successfully accepted/sent the item
++ *)
++
++module BoundedPipe : sig
++ type 'a t
++
++ (** [create ~capacity ~destination] creates a bounded pipe with a
++ local buffer holding at most [capacity] items. Once the buffer is
++ full it will not accept further items. items from the pipe are
++ flushed into [destination] as long as it accepts items. The
++ destination could be another pipe.
++ *)
++ val create: capacity:int -> destination:'a bounded_sender -> 'a t
++
++ (** [is_empty t] returns whether the local buffer of [t] is empty. *)
++ val is_empty : _ t -> bool
++
++ (** [length t] the number of items in the internal buffer *)
++ val length: _ t -> int
++
++ (** [flush_pipe t] sends as many items from the local buffer as possible,
++ which could be none. *)
++ val flush_pipe: _ t -> unit
++
++ (** [push t item] tries to [flush_pipe] and then push [item]
++ into the pipe if its [capacity] allows.
++ Returns [None] if there is no more room
++ *)
++ val push : 'a t -> 'a bounded_sender
++end = struct
++ (* items are enqueued in [q], and then flushed to [connect_to] *)
++ type 'a t =
++ { q: 'a Queue.t
++ ; destination: 'a bounded_sender
++ ; capacity: int
++ }
++
++ let create ~capacity ~destination =
++ { q = Queue.create (); capacity; destination }
++
++ let rec flush_pipe t =
++ if not Queue.(is_empty t.q) then
++ let item = Queue.peek t.q in
++ match t.destination item with
++ | None -> () (* no room *)
++ | Some () ->
++ (* successfully sent item to next stage *)
++ let _ = Queue.pop t.q in
++ (* continue trying to send more items *)
++ flush_pipe t
++
++ let push t item =
++ (* first try to flush as many items from this pipe as possible to make room,
++ it is important to do this first to preserve the order of the items
++ *)
++ flush_pipe t;
++ if Queue.length t.q < t.capacity then begin
++ (* enqueue, instead of sending directly.
++ this ensures that [out] sees the items in the same order as we receive them
++ *)
++ Queue.push item t.q;
++ Some (flush_pipe t)
++ end else None
++
++ let is_empty t = Queue.is_empty t.q
++ let length t = Queue.length t.q
++end
++
+ type watch = {
+ con: t;
+ token: string;
+ path: string;
+ base: string;
+ is_relative: bool;
++ pending_watchevents: Xenbus.Xb.Packet.t BoundedPipe.t;
+ }
+
+ and t = {
+@@ -38,8 +110,36 @@ and t = {
+ anonid: int;
+ mutable stat_nb_ops: int;
+ mutable perm: Perms.Connection.t;
++ pending_source_watchevents: (watch * Xenbus.Xb.Packet.t) BoundedPipe.t
+ }
+
++module Watch = struct
++ module T = struct
++ type t = watch
++
++ let compare w1 w2 =
++ (* cannot compare watches from different connections *)
++ assert (w1.con == w2.con);
++ match String.compare w1.token w2.token with
++ | 0 -> String.compare w1.path w2.path
++ | n -> n
++ end
++ module Set = Set.Make(T)
++
++ let flush_events t =
++ BoundedPipe.flush_pipe t.pending_watchevents;
++ not (BoundedPipe.is_empty t.pending_watchevents)
++
++ let pending_watchevents t =
++ BoundedPipe.length t.pending_watchevents
++end
++
++let source_flush_watchevents t =
++ BoundedPipe.flush_pipe t.pending_source_watchevents
++
++let source_pending_watchevents t =
++ BoundedPipe.length t.pending_source_watchevents
++
+ let mark_as_bad con =
+ match con.dom with
+ |None -> ()
+@@ -67,7 +167,8 @@ let watch_create ~con ~path ~token = {
+ token = token;
+ path = path;
+ base = get_path con;
+- is_relative = path.[0] <> '/' && path.[0] <> '@'
++ is_relative = path.[0] <> '/' && path.[0] <> '@';
++ pending_watchevents = BoundedPipe.create ~capacity:!Define.maxwatchevents ~destination:(Xenbus.Xb.queue con.xb)
+ }
+
+ let get_con w = w.con
+@@ -93,6 +194,9 @@ let make_perm dom =
+ Perms.Connection.create ~perms:[Perms.READ; Perms.WRITE] domid
+
+ let create xbcon dom =
++ let destination (watch, pkt) =
++ BoundedPipe.push watch.pending_watchevents pkt
++ in
+ let id =
+ match dom with
+ | None -> let old = !anon_id_next in incr anon_id_next; old
+@@ -109,6 +213,16 @@ let create xbcon dom =
+ anonid = id;
+ stat_nb_ops = 0;
+ perm = make_perm dom;
++
++ (* the actual capacity will be lower, this is used as an overflow
++ buffer: anything that doesn't fit elsewhere gets put here, only
++ limited by the amount of watches that you can generate with a
++ single xenstore command (which is finite, although possibly very
++ large in theory for Dom0). Once the pipe here has any contents the
++ domain is blocked from sending more commands until it is empty
++ again though.
++ *)
++ pending_source_watchevents = BoundedPipe.create ~capacity:Sys.max_array_length ~destination
+ }
+ in
+ Logging.new_connection ~tid:Transaction.none ~con:(get_domstr con);
+@@ -127,11 +241,17 @@ let set_target con target_domid =
+
+ let is_backend_mmap con = Xenbus.Xb.is_mmap con.xb
+
+-let send_reply con tid rid ty data =
++let packet_of con tid rid ty data =
+ if (String.length data) > xenstore_payload_max && (is_backend_mmap con) then
+- Xenbus.Xb.queue con.xb (Xenbus.Xb.Packet.create tid rid Xenbus.Xb.Op.Error "E2BIG\000")
++ Xenbus.Xb.Packet.create tid rid Xenbus.Xb.Op.Error "E2BIG\000"
+ else
+- Xenbus.Xb.queue con.xb (Xenbus.Xb.Packet.create tid rid ty data)
++ Xenbus.Xb.Packet.create tid rid ty data
++
++let send_reply con tid rid ty data =
++ let result = Xenbus.Xb.queue con.xb (packet_of con tid rid ty data) in
++ (* should never happen: we only process an input packet when there is room for an output packet *)
++ (* and the limit for replies is different from the limit for watch events *)
++ assert (result <> None)
+
+ let send_error con tid rid err = send_reply con tid rid Xenbus.Xb.Op.Error (err ^ "\000")
+ let send_ack con tid rid ty = send_reply con tid rid ty "OK\000"
+@@ -181,11 +301,11 @@ let del_watch con path token =
+ apath, w
+
+ let del_watches con =
+- Hashtbl.clear con.watches;
++ Hashtbl.reset con.watches;
+ con.nb_watches <- 0
+
+ let del_transactions con =
+- Hashtbl.clear con.transactions
++ Hashtbl.reset con.transactions
+
+ let list_watches con =
+ let ll = Hashtbl.fold
+@@ -208,21 +328,29 @@ let lookup_watch_perm path = function
+ let lookup_watch_perms oldroot root path =
+ lookup_watch_perm path oldroot @ lookup_watch_perm path (Some root)
+
+-let fire_single_watch_unchecked watch =
++let fire_single_watch_unchecked source watch =
+ let data = Utils.join_by_null [watch.path; watch.token; ""] in
+- send_reply watch.con Transaction.none 0 Xenbus.Xb.Op.Watchevent data
++ let pkt = packet_of watch.con Transaction.none 0 Xenbus.Xb.Op.Watchevent data in
+
+-let fire_single_watch (oldroot, root) watch =
++ match BoundedPipe.push source.pending_source_watchevents (watch, pkt) with
++ | Some () -> () (* packet queued *)
++ | None ->
++ (* a well behaved Dom0 shouldn't be able to trigger this,
++ if it happens it is likely a Dom0 bug causing runaway memory usage
++ *)
++ failwith "watch event overflow, cannot happen"
++
++let fire_single_watch source (oldroot, root) watch =
+ let abspath = get_watch_path watch.con watch.path |> Store.Path.of_string in
+ let perms = lookup_watch_perms oldroot root abspath in
+ if Perms.can_fire_watch watch.con.perm perms then
+- fire_single_watch_unchecked watch
++ fire_single_watch_unchecked source watch
+ else
+ let perms = perms |> List.map (Perms.Node.to_string ~sep:" ") |> String.concat ", " in
+ let con = get_domstr watch.con in
+ Logging.watch_not_fired ~con perms (Store.Path.to_string abspath)
+
+-let fire_watch roots watch path =
++let fire_watch source roots watch path =
+ let new_path =
+ if watch.is_relative && path.[0] = '/'
+ then begin
+@@ -232,7 +360,7 @@ let fire_watch roots watch path =
+ end else
+ path
+ in
+- fire_single_watch roots { watch with path = new_path }
++ fire_single_watch source roots { watch with path = new_path }
+
+ (* Search for a valid unused transaction id. *)
+ let rec valid_transaction_id con proposed_id =
+@@ -280,6 +408,7 @@ let do_input con = Xenbus.Xb.input con.xb
+ let has_partial_input con = Xenbus.Xb.has_partial_input con.xb
+ let has_more_input con = Xenbus.Xb.has_more_input con.xb
+
++let can_input con = Xenbus.Xb.can_input con.xb && BoundedPipe.is_empty con.pending_source_watchevents
+ let has_output con = Xenbus.Xb.has_output con.xb
+ let has_old_output con = Xenbus.Xb.has_old_output con.xb
+ let has_new_output con = Xenbus.Xb.has_new_output con.xb
+@@ -323,7 +452,7 @@ let prevents_live_update con = not (is_bad con)
+ && (has_extra_connection_data con || has_transaction_data con)
+
+ let has_more_work con =
+- has_more_input con || not (has_old_output con) && has_new_output con
++ (has_more_input con && can_input con) || not (has_old_output con) && has_new_output con
+
+ let incr_ops con = con.stat_nb_ops <- con.stat_nb_ops + 1
+
+diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml
+index 3c7429fe7f61..7d68c583b43a 100644
+--- a/tools/ocaml/xenstored/connections.ml
++++ b/tools/ocaml/xenstored/connections.ml
+@@ -22,22 +22,30 @@ type t = {
+ domains: (int, Connection.t) Hashtbl.t;
+ ports: (Xeneventchn.t, Connection.t) Hashtbl.t;
+ mutable watches: Connection.watch list Trie.t;
++ mutable has_pending_watchevents: Connection.Watch.Set.t
+ }
+
+ let create () = {
+ anonymous = Hashtbl.create 37;
+ domains = Hashtbl.create 37;
+ ports = Hashtbl.create 37;
+- watches = Trie.create ()
++ watches = Trie.create ();
++ has_pending_watchevents = Connection.Watch.Set.empty;
+ }
+
++let get_capacity () =
++ (* not multiplied by maxwatch on purpose: 2nd queue in watch itself! *)
++ { Xenbus.Xb.maxoutstanding = !Define.maxoutstanding; maxwatchevents = !Define.maxwatchevents }
++
+ let add_anonymous cons fd =
+- let xbcon = Xenbus.Xb.open_fd fd in
++ let capacity = get_capacity () in
++ let xbcon = Xenbus.Xb.open_fd fd ~capacity in
+ let con = Connection.create xbcon None in
+ Hashtbl.add cons.anonymous (Xenbus.Xb.get_fd xbcon) con
+
+ let add_domain cons dom =
+- let xbcon = Xenbus.Xb.open_mmap (Domain.get_interface dom) (fun () -> Domain.notify dom) in
++ let capacity = get_capacity () in
++ let xbcon = Xenbus.Xb.open_mmap ~capacity (Domain.get_interface dom) (fun () -> Domain.notify dom) in
+ let con = Connection.create xbcon (Some dom) in
+ Hashtbl.add cons.domains (Domain.get_id dom) con;
+ match Domain.get_port dom with
+@@ -48,7 +56,9 @@ let select ?(only_if = (fun _ -> true)) cons =
+ Hashtbl.fold (fun _ con (ins, outs) ->
+ if (only_if con) then (
+ let fd = Connection.get_fd con in
+- (fd :: ins, if Connection.has_output con then fd :: outs else outs)
++ let in_fds = if Connection.can_input con then fd :: ins else ins in
++ let out_fds = if Connection.has_output con then fd :: outs else outs in
++ in_fds, out_fds
+ ) else (ins, outs)
+ )
+ cons.anonymous ([], [])
+@@ -67,10 +77,17 @@ let del_watches_of_con con watches =
+ | [] -> None
+ | ws -> Some ws
+
++let del_watches cons con =
++ Connection.del_watches con;
++ cons.watches <- Trie.map (del_watches_of_con con) cons.watches;
++ cons.has_pending_watchevents <-
++ cons.has_pending_watchevents |> Connection.Watch.Set.filter @@ fun w ->
++ Connection.get_con w != con
++
+ let del_anonymous cons con =
+ try
+ Hashtbl.remove cons.anonymous (Connection.get_fd con);
+- cons.watches <- Trie.map (del_watches_of_con con) cons.watches;
++ del_watches cons con;
+ Connection.close con
+ with exn ->
+ debug "del anonymous %s" (Printexc.to_string exn)
+@@ -85,7 +102,7 @@ let del_domain cons id =
+ | Some p -> Hashtbl.remove cons.ports p
+ | None -> ())
+ | None -> ());
+- cons.watches <- Trie.map (del_watches_of_con con) cons.watches;
++ del_watches cons con;
+ Connection.close con
+ with exn ->
+ debug "del domain %u: %s" id (Printexc.to_string exn)
+@@ -136,31 +153,33 @@ let del_watch cons con path token =
+ cons.watches <- Trie.set cons.watches key watches;
+ watch
+
+-let del_watches cons con =
+- Connection.del_watches con;
+- cons.watches <- Trie.map (del_watches_of_con con) cons.watches
+-
+ (* path is absolute *)
+-let fire_watches ?oldroot root cons path recurse =
++let fire_watches ?oldroot source root cons path recurse =
+ let key = key_of_path path in
+ let path = Store.Path.to_string path in
+ let roots = oldroot, root in
+ let fire_watch _ = function
+ | None -> ()
+- | Some watches -> List.iter (fun w -> Connection.fire_watch roots w path) watches
++ | Some watches -> List.iter (fun w -> Connection.fire_watch source roots w path) watches
+ in
+ let fire_rec _x = function
+ | None -> ()
+ | Some watches ->
+- List.iter (Connection.fire_single_watch roots) watches
++ List.iter (Connection.fire_single_watch source roots) watches
+ in
+ Trie.iter_path fire_watch cons.watches key;
+ if recurse then
+ Trie.iter fire_rec (Trie.sub cons.watches key)
+
++let send_watchevents cons con =
++ cons.has_pending_watchevents <-
++ cons.has_pending_watchevents |> Connection.Watch.Set.filter Connection.Watch.flush_events;
++ Connection.source_flush_watchevents con
++
+ let fire_spec_watches root cons specpath =
++ let source = find_domain cons 0 in
+ iter cons (fun con ->
+- List.iter (Connection.fire_single_watch (None, root)) (Connection.get_watches con specpath))
++ List.iter (Connection.fire_single_watch source (None, root)) (Connection.get_watches con specpath))
+
+ let set_target cons domain target_domain =
+ let con = find_domain cons domain in
+@@ -197,6 +216,16 @@ let debug cons =
+ let domains = Hashtbl.fold (fun _ con accu -> Connection.debug con :: accu) cons.domains [] in
+ String.concat "" (domains @ anonymous)
+
++let debug_watchevents cons con =
++ (* == (physical equality)
++ has to be used here because w.con.xb.backend might contain a [unit->unit] value causing regular
++ comparison to fail due to having a 'functional value' which cannot be compared.
++ *)
++ let s = cons.has_pending_watchevents |> Connection.Watch.Set.filter (fun w -> w.con == con) in
++ let pending = s |> Connection.Watch.Set.elements
++ |> List.map (fun w -> Connection.Watch.pending_watchevents w) |> List.fold_left (+) 0 in
++ Printf.sprintf "Watches with pending events: %d, pending events total: %d" (Connection.Watch.Set.cardinal s) pending
++
+ let filter ~f cons =
+ let fold _ v acc = if f v then v :: acc else acc in
+ []
+diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
+index ba63a8147e09..327b6d795ec7 100644
+--- a/tools/ocaml/xenstored/define.ml
++++ b/tools/ocaml/xenstored/define.ml
+@@ -24,6 +24,13 @@ let default_config_dir = Paths.xen_config_dir
+ let maxwatch = ref (100)
+ let maxtransaction = ref (10)
+ let maxrequests = ref (1024) (* maximum requests per transaction *)
++let maxoutstanding = ref (1024) (* maximum outstanding requests, i.e. in-flight requests / domain *)
++let maxwatchevents = ref (1024)
++(*
++ maximum outstanding watch events per watch,
++ recommended >= maxoutstanding to avoid blocking backend transactions due to
++ malicious frontends
++ *)
+
+ let gc_max_overhead = ref 120 (* 120% see comment in xenstored.ml *)
+ let conflict_burst_limit = ref 5.0
+diff --git a/tools/ocaml/xenstored/oxenstored.conf.in b/tools/ocaml/xenstored/oxenstored.conf.in
+index 4ae48e42d47d..9d034e744b4b 100644
+--- a/tools/ocaml/xenstored/oxenstored.conf.in
++++ b/tools/ocaml/xenstored/oxenstored.conf.in
+@@ -62,6 +62,8 @@ quota-maxwatch = 100
+ quota-transaction = 10
+ quota-maxrequests = 1024
+ quota-path-max = 1024
++quota-maxoutstanding = 1024
++quota-maxwatchevents = 1024
+
+ # Activate filed base backend
+ persistent = false
+diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
+index cbf708213796..ce39ce28b5f3 100644
+--- a/tools/ocaml/xenstored/process.ml
++++ b/tools/ocaml/xenstored/process.ml
+@@ -57,7 +57,7 @@ let split_one_path data con =
+ | path :: "" :: [] -> Store.Path.create path (Connection.get_path con)
+ | _ -> raise Invalid_Cmd_Args
+
+-let process_watch t cons =
++let process_watch source t cons =
+ let oldroot = t.Transaction.oldroot in
+ let newroot = Store.get_root t.store in
+ let ops = Transaction.get_paths t |> List.rev in
+@@ -67,8 +67,9 @@ let process_watch t cons =
+ | Xenbus.Xb.Op.Rm -> true, None, oldroot
+ | Xenbus.Xb.Op.Setperms -> false, Some oldroot, newroot
+ | _ -> raise (Failure "huh ?") in
+- Connections.fire_watches ?oldroot root cons (snd op) recurse in
+- List.iter (fun op -> do_op_watch op cons) ops
++ Connections.fire_watches ?oldroot source root cons (snd op) recurse in
++ List.iter (fun op -> do_op_watch op cons) ops;
++ Connections.send_watchevents cons source
+
+ let create_implicit_path t perm path =
+ let dirname = Store.Path.get_parent path in
+@@ -234,6 +235,20 @@ let do_debug con t _domains cons data =
+ | "watches" :: _ ->
+ let watches = Connections.debug cons in
+ Some (watches ^ "\000")
++ | "xenbus" :: domid :: _ ->
++ let domid = int_of_string domid in
++ let con = Connections.find_domain cons domid in
++ let s = Printf.sprintf "xenbus: %s; overflow queue length: %d, can_input: %b, has_more_input: %b, has_old_output: %b, has_new_output: %b, has_more_work: %b. pending: %s"
++ (Xenbus.Xb.debug con.xb)
++ (Connection.source_pending_watchevents con)
++ (Connection.can_input con)
++ (Connection.has_more_input con)
++ (Connection.has_old_output con)
++ (Connection.has_new_output con)
++ (Connection.has_more_work con)
++ (Connections.debug_watchevents cons con)
++ in
++ Some s
+ | "mfn" :: domid :: _ ->
+ let domid = int_of_string domid in
+ let con = Connections.find_domain cons domid in
+@@ -342,7 +357,7 @@ let reply_ack fct con t doms cons data =
+ fct con t doms cons data;
+ Packet.Ack (fun () ->
+ if Transaction.get_id t = Transaction.none then
+- process_watch t cons
++ process_watch con t cons
+ )
+
+ let reply_data fct con t doms cons data =
+@@ -501,7 +516,7 @@ let do_watch con t _domains cons data =
+ Packet.Ack (fun () ->
+ (* xenstore.txt says this watch is fired immediately,
+ implying even if path doesn't exist or is unreadable *)
+- Connection.fire_single_watch_unchecked watch)
++ Connection.fire_single_watch_unchecked con watch)
+
+ let do_unwatch con _t _domains cons data =
+ let (node, token) =
+@@ -532,7 +547,7 @@ let do_transaction_end con t domains cons data =
+ if not success then
+ raise Transaction_again;
+ if commit then begin
+- process_watch t cons;
++ process_watch con t cons;
+ match t.Transaction.ty with
+ | Transaction.No ->
+ () (* no need to record anything *)
+@@ -700,7 +715,8 @@ let process_packet ~store ~cons ~doms ~con ~req =
+ let do_input store cons doms con =
+ let newpacket =
+ try
+- Connection.do_input con
++ if Connection.can_input con then Connection.do_input con
++ else None
+ with Xenbus.Xb.Reconnect ->
+ info "%s requests a reconnect" (Connection.get_domstr con);
+ History.reconnect con;
+@@ -728,6 +744,7 @@ let do_input store cons doms con =
+ Connection.incr_ops con
+
+ let do_output _store _cons _doms con =
++ Connection.source_flush_watchevents con;
+ if Connection.has_output con then (
+ if Connection.has_new_output con then (
+ let packet = Connection.peek_output con in
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index 3b57ad016dfb..c799e20f1145 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -103,6 +103,8 @@ let parse_config filename =
+ ("quota-maxentity", Config.Set_int Quota.maxent);
+ ("quota-maxsize", Config.Set_int Quota.maxsize);
+ ("quota-maxrequests", Config.Set_int Define.maxrequests);
++ ("quota-maxoutstanding", Config.Set_int Define.maxoutstanding);
++ ("quota-maxwatchevents", Config.Set_int Define.maxwatchevents);
+ ("quota-path-max", Config.Set_int Define.path_max);
+ ("gc-max-overhead", Config.Set_int Define.gc_max_overhead);
+ ("test-eagain", Config.Set_bool Transaction.test_eagain);
+--
+2.37.4
+
diff --git a/0069-SUPPORT.md-clarify-support-of-untrusted-driver-domai.patch b/0069-SUPPORT.md-clarify-support-of-untrusted-driver-domai.patch
new file mode 100644
index 0000000..5660b02
--- /dev/null
+++ b/0069-SUPPORT.md-clarify-support-of-untrusted-driver-domai.patch
@@ -0,0 +1,55 @@
+From a026fddf89420dd25c5a9574d88aeab7c5711f6c Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Thu, 29 Sep 2022 13:07:35 +0200
+Subject: [PATCH 69/87] SUPPORT.md: clarify support of untrusted driver domains
+ with oxenstored
+
+Add a support statement for the scope of support regarding different
+Xenstore variants. In particular, oxenstored does not (yet) have security
+support for untrusted driver domains, as those might drive oxenstored
+out of memory by creating lots of watch events for the guests they are
+servicing.
+
+Add a statement regarding Live Update support of oxenstored.
+
+This is part of XSA-326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: George Dunlap <george.dunlap@citrix.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit c7bc20d8d123851a468402bbfc9e3330efff21ec)
+---
+ SUPPORT.md | 13 +++++++++----
+ 1 file changed, 9 insertions(+), 4 deletions(-)
+
+diff --git a/SUPPORT.md b/SUPPORT.md
+index 85726102eab8..7d0cb34c8f6f 100644
+--- a/SUPPORT.md
++++ b/SUPPORT.md
+@@ -179,13 +179,18 @@ Support for running qemu-xen device model in a linux stubdomain.
+
+ Status: Tech Preview
+
+-## Liveupdate of C xenstored daemon
++## Xenstore
+
+- Status: Tech Preview
++### C xenstored daemon
+
+-## Liveupdate of OCaml xenstored daemon
++ Status: Supported
++ Status, Liveupdate: Tech Preview
+
+- Status: Tech Preview
++### OCaml xenstored daemon
++
++ Status: Supported
++ Status, untrusted driver domains: Supported, not security supported
++ Status, Liveupdate: Not functional
+
+ ## Toolstack/3rd party
+
+--
+2.37.4
+
diff --git a/0070-tools-xenstore-don-t-use-conn-in-as-context-for-temp.patch b/0070-tools-xenstore-don-t-use-conn-in-as-context-for-temp.patch
new file mode 100644
index 0000000..434ad0c
--- /dev/null
+++ b/0070-tools-xenstore-don-t-use-conn-in-as-context-for-temp.patch
@@ -0,0 +1,718 @@
+From c758765e464e166b5495c76466facc79584bbe1e Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:10 +0200
+Subject: [PATCH 70/87] tools/xenstore: don't use conn->in as context for
+ temporary allocations
+
+Using the struct buffered_data pointer of the currently processed request
+for temporary data allocations has a major drawback: that area (and with it
+the temporary data) is freed only after the response to the request has been
+written to the ring page or has been read via the socket. This can happen
+much later if a guest isn't reading its responses fast enough.
+
+As the temporary data can be safely freed after creating the response,
+add a temporary context for that purpose and use that for allocating
+the temporary memory, as it was already the case before commit
+cc0612464896 ("xenstore: add small default data buffer to internal
+struct").
+
+Some sub-functions need to gain the "const" attribute for the talloc
+context.
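+
+As a standalone illustration of the pattern (a minimal sketch assuming
+libtalloc is available; the helper names are made up, the real change is in
+the process_message() hunk below):
+
+    #include <talloc.h>
+    #include <stdio.h>
+
+    /* Build a reply using only allocations tied to a short-lived context,
+     * so the temporary memory is released as soon as the reply has been
+     * produced, no matter how slowly the client reads its responses. */
+    static const char *handle_request(const void *ctx, const char *path)
+    {
+        char *tmp = talloc_asprintf(ctx, "/local/domain/0/%s", path);
+        if (!tmp)
+            return NULL;
+        printf("would process node %s\n", tmp);
+        return tmp; /* freed together with ctx by the caller */
+    }
+
+    int main(void)
+    {
+        void *ctx = talloc_new(NULL); /* temporary per-request context */
+        if (!ctx)
+            return 1;
+        handle_request(ctx, "data");
+        talloc_free(ctx); /* all temporary allocations gone here */
+        return 0;
+    }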
+
+This is XSA-416 / CVE-2022-42319.
+
+Fixes: cc0612464896 ("xenstore: add small default data buffer to internal struct")
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 2a587de219cc0765330fbf9fac6827bfaf29e29b)
+---
+ tools/xenstore/xenstored_control.c | 31 ++++++-----
+ tools/xenstore/xenstored_control.h | 3 +-
+ tools/xenstore/xenstored_core.c | 76 ++++++++++++++++----------
+ tools/xenstore/xenstored_domain.c | 29 ++++++----
+ tools/xenstore/xenstored_domain.h | 21 ++++---
+ tools/xenstore/xenstored_transaction.c | 14 +++--
+ tools/xenstore/xenstored_transaction.h | 6 +-
+ tools/xenstore/xenstored_watch.c | 9 +--
+ tools/xenstore/xenstored_watch.h | 6 +-
+ 9 files changed, 118 insertions(+), 77 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
+index 1031a81c3874..d0350c6ad861 100644
+--- a/tools/xenstore/xenstored_control.c
++++ b/tools/xenstore/xenstored_control.c
+@@ -155,7 +155,7 @@ bool lu_is_pending(void)
+
+ struct cmd_s {
+ char *cmd;
+- int (*func)(void *, struct connection *, char **, int);
++ int (*func)(const void *, struct connection *, char **, int);
+ char *pars;
+ /*
+ * max_pars can be used to limit the size of the parameter vector,
+@@ -167,7 +167,7 @@ struct cmd_s {
+ unsigned int max_pars;
+ };
+
+-static int do_control_check(void *ctx, struct connection *conn,
++static int do_control_check(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ if (num)
+@@ -179,7 +179,7 @@ static int do_control_check(void *ctx, struct connection *conn,
+ return 0;
+ }
+
+-static int do_control_log(void *ctx, struct connection *conn,
++static int do_control_log(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ if (num != 1)
+@@ -281,7 +281,7 @@ static int quota_get(const void *ctx, struct connection *conn,
+ return domain_get_quota(ctx, conn, atoi(vec[0]));
+ }
+
+-static int do_control_quota(void *ctx, struct connection *conn,
++static int do_control_quota(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ if (num == 0)
+@@ -293,7 +293,7 @@ static int do_control_quota(void *ctx, struct connection *conn,
+ return quota_get(ctx, conn, vec, num);
+ }
+
+-static int do_control_quota_s(void *ctx, struct connection *conn,
++static int do_control_quota_s(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ if (num == 0)
+@@ -306,7 +306,7 @@ static int do_control_quota_s(void *ctx, struct connection *conn,
+ }
+
+ #ifdef __MINIOS__
+-static int do_control_memreport(void *ctx, struct connection *conn,
++static int do_control_memreport(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ if (num)
+@@ -318,7 +318,7 @@ static int do_control_memreport(void *ctx, struct connection *conn,
+ return 0;
+ }
+ #else
+-static int do_control_logfile(void *ctx, struct connection *conn,
++static int do_control_logfile(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ if (num != 1)
+@@ -333,7 +333,7 @@ static int do_control_logfile(void *ctx, struct connection *conn,
+ return 0;
+ }
+
+-static int do_control_memreport(void *ctx, struct connection *conn,
++static int do_control_memreport(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ FILE *fp;
+@@ -373,7 +373,7 @@ static int do_control_memreport(void *ctx, struct connection *conn,
+ }
+ #endif
+
+-static int do_control_print(void *ctx, struct connection *conn,
++static int do_control_print(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ if (num != 1)
+@@ -875,7 +875,7 @@ static const char *lu_start(const void *ctx, struct connection *conn,
+ return NULL;
+ }
+
+-static int do_control_lu(void *ctx, struct connection *conn,
++static int do_control_lu(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ const char *ret = NULL;
+@@ -922,7 +922,7 @@ static int do_control_lu(void *ctx, struct connection *conn,
+ }
+ #endif
+
+-static int do_control_help(void *, struct connection *, char **, int);
++static int do_control_help(const void *, struct connection *, char **, int);
+
+ static struct cmd_s cmds[] = {
+ { "check", do_control_check, "" },
+@@ -961,7 +961,7 @@ static struct cmd_s cmds[] = {
+ { "help", do_control_help, "" },
+ };
+
+-static int do_control_help(void *ctx, struct connection *conn,
++static int do_control_help(const void *ctx, struct connection *conn,
+ char **vec, int num)
+ {
+ int cmd, len = 0;
+@@ -997,7 +997,8 @@ static int do_control_help(void *ctx, struct connection *conn,
+ return 0;
+ }
+
+-int do_control(struct connection *conn, struct buffered_data *in)
++int do_control(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ unsigned int cmd, num, off;
+ char **vec = NULL;
+@@ -1017,11 +1018,11 @@ int do_control(struct connection *conn, struct buffered_data *in)
+ num = xs_count_strings(in->buffer, in->used);
+ if (cmds[cmd].max_pars)
+ num = min(num, cmds[cmd].max_pars);
+- vec = talloc_array(in, char *, num);
++ vec = talloc_array(ctx, char *, num);
+ if (!vec)
+ return ENOMEM;
+ if (get_strings(in, vec, num) < num)
+ return EIO;
+
+- return cmds[cmd].func(in, conn, vec + 1, num - 1);
++ return cmds[cmd].func(ctx, conn, vec + 1, num - 1);
+ }
+diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
+index 98b6fbcea2b1..a8cb76559ba1 100644
+--- a/tools/xenstore/xenstored_control.h
++++ b/tools/xenstore/xenstored_control.h
+@@ -16,7 +16,8 @@
+ along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+-int do_control(struct connection *conn, struct buffered_data *in);
++int do_control(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+ void lu_read_state(void);
+
+ struct connection *lu_get_connection(void);
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 16504de42017..411cc0e44714 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1248,11 +1248,13 @@ static struct node *get_node_canonicalized(struct connection *conn,
+ return get_node(conn, ctx, *canonical_name, perm);
+ }
+
+-static int send_directory(struct connection *conn, struct buffered_data *in)
++static int send_directory(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct node *node;
+
+- node = get_node_canonicalized(conn, in, onearg(in), NULL, XS_PERM_READ);
++ node = get_node_canonicalized(conn, ctx, onearg(in), NULL,
++ XS_PERM_READ);
+ if (!node)
+ return errno;
+
+@@ -1261,7 +1263,7 @@ static int send_directory(struct connection *conn, struct buffered_data *in)
+ return 0;
+ }
+
+-static int send_directory_part(struct connection *conn,
++static int send_directory_part(const void *ctx, struct connection *conn,
+ struct buffered_data *in)
+ {
+ unsigned int off, len, maxlen, genlen;
+@@ -1273,7 +1275,8 @@ static int send_directory_part(struct connection *conn,
+ return EINVAL;
+
+ /* First arg is node name. */
+- node = get_node_canonicalized(conn, in, in->buffer, NULL, XS_PERM_READ);
++ node = get_node_canonicalized(conn, ctx, in->buffer, NULL,
++ XS_PERM_READ);
+ if (!node)
+ return errno;
+
+@@ -1300,7 +1303,7 @@ static int send_directory_part(struct connection *conn,
+ break;
+ }
+
+- data = talloc_array(in, char, genlen + len + 1);
++ data = talloc_array(ctx, char, genlen + len + 1);
+ if (!data)
+ return ENOMEM;
+
+@@ -1316,11 +1319,13 @@ static int send_directory_part(struct connection *conn,
+ return 0;
+ }
+
+-static int do_read(struct connection *conn, struct buffered_data *in)
++static int do_read(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct node *node;
+
+- node = get_node_canonicalized(conn, in, onearg(in), NULL, XS_PERM_READ);
++ node = get_node_canonicalized(conn, ctx, onearg(in), NULL,
++ XS_PERM_READ);
+ if (!node)
+ return errno;
+
+@@ -1510,7 +1515,8 @@ err:
+ }
+
+ /* path, data... */
+-static int do_write(struct connection *conn, struct buffered_data *in)
++static int do_write(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ unsigned int offset, datalen;
+ struct node *node;
+@@ -1524,12 +1530,12 @@ static int do_write(struct connection *conn, struct buffered_data *in)
+ offset = strlen(vec[0]) + 1;
+ datalen = in->used - offset;
+
+- node = get_node_canonicalized(conn, in, vec[0], &name, XS_PERM_WRITE);
++ node = get_node_canonicalized(conn, ctx, vec[0], &name, XS_PERM_WRITE);
+ if (!node) {
+ /* No permissions, invalid input? */
+ if (errno != ENOENT)
+ return errno;
+- node = create_node(conn, in, name, in->buffer + offset,
++ node = create_node(conn, ctx, name, in->buffer + offset,
+ datalen);
+ if (!node)
+ return errno;
+@@ -1540,18 +1546,19 @@ static int do_write(struct connection *conn, struct buffered_data *in)
+ return errno;
+ }
+
+- fire_watches(conn, in, name, node, false, NULL);
++ fire_watches(conn, ctx, name, node, false, NULL);
+ send_ack(conn, XS_WRITE);
+
+ return 0;
+ }
+
+-static int do_mkdir(struct connection *conn, struct buffered_data *in)
++static int do_mkdir(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct node *node;
+ char *name;
+
+- node = get_node_canonicalized(conn, in, onearg(in), &name,
++ node = get_node_canonicalized(conn, ctx, onearg(in), &name,
+ XS_PERM_WRITE);
+
+ /* If it already exists, fine. */
+@@ -1561,10 +1568,10 @@ static int do_mkdir(struct connection *conn, struct buffered_data *in)
+ return errno;
+ if (!name)
+ return ENOMEM;
+- node = create_node(conn, in, name, NULL, 0);
++ node = create_node(conn, ctx, name, NULL, 0);
+ if (!node)
+ return errno;
+- fire_watches(conn, in, name, node, false, NULL);
++ fire_watches(conn, ctx, name, node, false, NULL);
+ }
+ send_ack(conn, XS_MKDIR);
+
+@@ -1662,24 +1669,25 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
+ }
+
+
+-static int do_rm(struct connection *conn, struct buffered_data *in)
++static int do_rm(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct node *node;
+ int ret;
+ char *name;
+ char *parentname;
+
+- node = get_node_canonicalized(conn, in, onearg(in), &name,
++ node = get_node_canonicalized(conn, ctx, onearg(in), &name,
+ XS_PERM_WRITE);
+ if (!node) {
+ /* Didn't exist already? Fine, if parent exists. */
+ if (errno == ENOENT) {
+ if (!name)
+ return ENOMEM;
+- parentname = get_parent(in, name);
++ parentname = get_parent(ctx, name);
+ if (!parentname)
+ return errno;
+- node = read_node(conn, in, parentname);
++ node = read_node(conn, ctx, parentname);
+ if (node) {
+ send_ack(conn, XS_RM);
+ return 0;
+@@ -1694,7 +1702,7 @@ static int do_rm(struct connection *conn, struct buffered_data *in)
+ if (streq(name, "/"))
+ return EINVAL;
+
+- ret = _rm(conn, in, node, name);
++ ret = _rm(conn, ctx, node, name);
+ if (ret)
+ return ret;
+
+@@ -1704,13 +1712,15 @@ static int do_rm(struct connection *conn, struct buffered_data *in)
+ }
+
+
+-static int do_get_perms(struct connection *conn, struct buffered_data *in)
++static int do_get_perms(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct node *node;
+ char *strings;
+ unsigned int len;
+
+- node = get_node_canonicalized(conn, in, onearg(in), NULL, XS_PERM_READ);
++ node = get_node_canonicalized(conn, ctx, onearg(in), NULL,
++ XS_PERM_READ);
+ if (!node)
+ return errno;
+
+@@ -1723,7 +1733,8 @@ static int do_get_perms(struct connection *conn, struct buffered_data *in)
+ return 0;
+ }
+
+-static int do_set_perms(struct connection *conn, struct buffered_data *in)
++static int do_set_perms(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct node_perms perms, old_perms;
+ char *name, *permstr;
+@@ -1740,7 +1751,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
+
+ permstr = in->buffer + strlen(in->buffer) + 1;
+
+- perms.p = talloc_array(in, struct xs_permissions, perms.num);
++ perms.p = talloc_array(ctx, struct xs_permissions, perms.num);
+ if (!perms.p)
+ return ENOMEM;
+ if (!xs_strings_to_perms(perms.p, perms.num, permstr))
+@@ -1755,7 +1766,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
+ }
+
+ /* We must own node to do this (tools can do this too). */
+- node = get_node_canonicalized(conn, in, in->buffer, &name,
++ node = get_node_canonicalized(conn, ctx, in->buffer, &name,
+ XS_PERM_WRITE | XS_PERM_OWNER);
+ if (!node)
+ return errno;
+@@ -1790,7 +1801,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
+ return errno;
+ }
+
+- fire_watches(conn, in, name, node, false, &old_perms);
++ fire_watches(conn, ctx, name, node, false, &old_perms);
+ send_ack(conn, XS_SET_PERMS);
+
+ return 0;
+@@ -1798,7 +1809,8 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
+
+ static struct {
+ const char *str;
+- int (*func)(struct connection *conn, struct buffered_data *in);
++ int (*func)(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+ unsigned int flags;
+ #define XS_FLAG_NOTID (1U << 0) /* Ignore transaction id. */
+ #define XS_FLAG_PRIV (1U << 1) /* Privileged domain only. */
+@@ -1874,6 +1886,7 @@ static void process_message(struct connection *conn, struct buffered_data *in)
+ struct transaction *trans;
+ enum xsd_sockmsg_type type = in->hdr.msg.type;
+ int ret;
++ void *ctx;
+
+ /* At least send_error() and send_reply() expects conn->in == in */
+ assert(conn->in == in);
+@@ -1898,10 +1911,17 @@ static void process_message(struct connection *conn, struct buffered_data *in)
+ return;
+ }
+
++ ctx = talloc_new(NULL);
++ if (!ctx) {
++ send_error(conn, ENOMEM);
++ return;
++ }
++
+ assert(conn->transaction == NULL);
+ conn->transaction = trans;
+
+- ret = wire_funcs[type].func(conn, in);
++ ret = wire_funcs[type].func(ctx, conn, in);
++ talloc_free(ctx);
+ if (ret)
+ send_error(conn, ret);
+
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index e7c6886ccf47..fb732d0a14c3 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -330,7 +330,7 @@ bool domain_is_unprivileged(struct connection *conn)
+ domid_is_unprivileged(conn->domain->domid);
+ }
+
+-static char *talloc_domain_path(void *context, unsigned int domid)
++static char *talloc_domain_path(const void *context, unsigned int domid)
+ {
+ return talloc_asprintf(context, "/local/domain/%u", domid);
+ }
+@@ -534,7 +534,8 @@ static struct domain *introduce_domain(const void *ctx,
+ }
+
+ /* domid, gfn, evtchn, path */
+-int do_introduce(struct connection *conn, struct buffered_data *in)
++int do_introduce(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct domain *domain;
+ char *vec[3];
+@@ -552,7 +553,7 @@ int do_introduce(struct connection *conn, struct buffered_data *in)
+ if (port <= 0)
+ return EINVAL;
+
+- domain = introduce_domain(in, domid, port, false);
++ domain = introduce_domain(ctx, domid, port, false);
+ if (!domain)
+ return errno;
+
+@@ -575,7 +576,8 @@ static struct domain *find_connected_domain(unsigned int domid)
+ return domain;
+ }
+
+-int do_set_target(struct connection *conn, struct buffered_data *in)
++int do_set_target(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ char *vec[2];
+ unsigned int domid, tdomid;
+@@ -619,7 +621,8 @@ static struct domain *onearg_domain(struct connection *conn,
+ }
+
+ /* domid */
+-int do_release(struct connection *conn, struct buffered_data *in)
++int do_release(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct domain *domain;
+
+@@ -634,7 +637,8 @@ int do_release(struct connection *conn, struct buffered_data *in)
+ return 0;
+ }
+
+-int do_resume(struct connection *conn, struct buffered_data *in)
++int do_resume(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct domain *domain;
+
+@@ -649,7 +653,8 @@ int do_resume(struct connection *conn, struct buffered_data *in)
+ return 0;
+ }
+
+-int do_get_domain_path(struct connection *conn, struct buffered_data *in)
++int do_get_domain_path(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ char *path;
+ const char *domid_str = onearg(in);
+@@ -657,18 +662,17 @@ int do_get_domain_path(struct connection *conn, struct buffered_data *in)
+ if (!domid_str)
+ return EINVAL;
+
+- path = talloc_domain_path(conn, atoi(domid_str));
++ path = talloc_domain_path(ctx, atoi(domid_str));
+ if (!path)
+ return errno;
+
+ send_reply(conn, XS_GET_DOMAIN_PATH, path, strlen(path) + 1);
+
+- talloc_free(path);
+-
+ return 0;
+ }
+
+-int do_is_domain_introduced(struct connection *conn, struct buffered_data *in)
++int do_is_domain_introduced(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ int result;
+ unsigned int domid;
+@@ -689,7 +693,8 @@ int do_is_domain_introduced(struct connection *conn, struct buffered_data *in)
+ }
+
+ /* Allow guest to reset all watches */
+-int do_reset_watches(struct connection *conn, struct buffered_data *in)
++int do_reset_watches(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ conn_delete_all_watches(conn);
+ conn_delete_all_transactions(conn);
+diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
+index 904faa923afb..b9e152890149 100644
+--- a/tools/xenstore/xenstored_domain.h
++++ b/tools/xenstore/xenstored_domain.h
+@@ -24,25 +24,32 @@ void handle_event(void);
+ void check_domains(void);
+
+ /* domid, mfn, eventchn, path */
+-int do_introduce(struct connection *conn, struct buffered_data *in);
++int do_introduce(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ /* domid */
+-int do_is_domain_introduced(struct connection *conn, struct buffered_data *in);
++int do_is_domain_introduced(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ /* domid */
+-int do_release(struct connection *conn, struct buffered_data *in);
++int do_release(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ /* domid */
+-int do_resume(struct connection *conn, struct buffered_data *in);
++int do_resume(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ /* domid, target */
+-int do_set_target(struct connection *conn, struct buffered_data *in);
++int do_set_target(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ /* domid */
+-int do_get_domain_path(struct connection *conn, struct buffered_data *in);
++int do_get_domain_path(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ /* Allow guest to reset all watches */
+-int do_reset_watches(struct connection *conn, struct buffered_data *in);
++int do_reset_watches(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ void domain_init(int evtfd);
+ void dom0_init(void);
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index 28774813de83..3e3eb47326cc 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -481,7 +481,8 @@ struct transaction *transaction_lookup(struct connection *conn, uint32_t id)
+ return ERR_PTR(-ENOENT);
+ }
+
+-int do_transaction_start(struct connection *conn, struct buffered_data *in)
++int do_transaction_start(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct transaction *trans, *exists;
+ char id_str[20];
+@@ -494,8 +495,8 @@ int do_transaction_start(struct connection *conn, struct buffered_data *in)
+ conn->transaction_started > quota_max_transaction)
+ return ENOSPC;
+
+- /* Attach transaction to input for autofree until it's complete */
+- trans = talloc_zero(in, struct transaction);
++ /* Attach transaction to ctx for autofree until it's complete */
++ trans = talloc_zero(ctx, struct transaction);
+ if (!trans)
+ return ENOMEM;
+
+@@ -544,7 +545,8 @@ static int transaction_fix_domains(struct transaction *trans, bool update)
+ return 0;
+ }
+
+-int do_transaction_end(struct connection *conn, struct buffered_data *in)
++int do_transaction_end(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ const char *arg = onearg(in);
+ struct transaction *trans;
+@@ -562,8 +564,8 @@ int do_transaction_end(struct connection *conn, struct buffered_data *in)
+ if (!conn->transaction_started)
+ conn->ta_start_time = 0;
+
+- /* Attach transaction to in for auto-cleanup */
+- talloc_steal(in, trans);
++ /* Attach transaction to ctx for auto-cleanup */
++ talloc_steal(ctx, trans);
+
+ if (streq(arg, "T")) {
+ if (trans->fail)
+diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
+index e3cbd6b23095..39d7f81c5127 100644
+--- a/tools/xenstore/xenstored_transaction.h
++++ b/tools/xenstore/xenstored_transaction.h
+@@ -29,8 +29,10 @@ struct transaction;
+
+ extern uint64_t generation;
+
+-int do_transaction_start(struct connection *conn, struct buffered_data *node);
+-int do_transaction_end(struct connection *conn, struct buffered_data *in);
++int do_transaction_start(const void *ctx, struct connection *conn,
++ struct buffered_data *node);
++int do_transaction_end(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ struct transaction *transaction_lookup(struct connection *conn, uint32_t id);
+
+diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
+index 85362bcce314..316c08b7f754 100644
+--- a/tools/xenstore/xenstored_watch.c
++++ b/tools/xenstore/xenstored_watch.c
+@@ -243,7 +243,7 @@ static struct watch *add_watch(struct connection *conn, char *path, char *token,
+ return NULL;
+ }
+
+-int do_watch(struct connection *conn, struct buffered_data *in)
++int do_watch(const void *ctx, struct connection *conn, struct buffered_data *in)
+ {
+ struct watch *watch;
+ char *vec[2];
+@@ -252,7 +252,7 @@ int do_watch(struct connection *conn, struct buffered_data *in)
+ if (get_strings(in, vec, ARRAY_SIZE(vec)) != ARRAY_SIZE(vec))
+ return EINVAL;
+
+- errno = check_watch_path(conn, in, &(vec[0]), &relative);
++ errno = check_watch_path(conn, ctx, &(vec[0]), &relative);
+ if (errno)
+ return errno;
+
+@@ -283,7 +283,8 @@ int do_watch(struct connection *conn, struct buffered_data *in)
+ return 0;
+ }
+
+-int do_unwatch(struct connection *conn, struct buffered_data *in)
++int do_unwatch(const void *ctx, struct connection *conn,
++ struct buffered_data *in)
+ {
+ struct watch *watch;
+ char *node, *vec[2];
+@@ -291,7 +292,7 @@ int do_unwatch(struct connection *conn, struct buffered_data *in)
+ if (get_strings(in, vec, ARRAY_SIZE(vec)) != ARRAY_SIZE(vec))
+ return EINVAL;
+
+- node = canonicalize(conn, in, vec[0]);
++ node = canonicalize(conn, ctx, vec[0]);
+ if (!node)
+ return ENOMEM;
+ list_for_each_entry(watch, &conn->watches, list) {
+diff --git a/tools/xenstore/xenstored_watch.h b/tools/xenstore/xenstored_watch.h
+index 0e693f0839cd..091890edca96 100644
+--- a/tools/xenstore/xenstored_watch.h
++++ b/tools/xenstore/xenstored_watch.h
+@@ -21,8 +21,10 @@
+
+ #include "xenstored_core.h"
+
+-int do_watch(struct connection *conn, struct buffered_data *in);
+-int do_unwatch(struct connection *conn, struct buffered_data *in);
++int do_watch(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
++int do_unwatch(const void *ctx, struct connection *conn,
++ struct buffered_data *in);
+
+ /* Fire all watches: !exact means all the children are affected (ie. rm). */
+ void fire_watches(struct connection *conn, const void *tmp, const char *name,
+--
+2.37.4
+
diff --git a/0071-tools-xenstore-fix-checking-node-permissions.patch b/0071-tools-xenstore-fix-checking-node-permissions.patch
new file mode 100644
index 0000000..7cfb08b
--- /dev/null
+++ b/0071-tools-xenstore-fix-checking-node-permissions.patch
@@ -0,0 +1,143 @@
+From 036fa8717b316a10b67ea8cf4d5dd200ac2b29af Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:10 +0200
+Subject: [PATCH 71/87] tools/xenstore: fix checking node permissions
+
+Today chk_domain_generation() is used to check whether a node permission
+entry is still valid or whether it refers to a domain that no longer exists.
+This is done by comparing the node's and the domain's generation counts.
+
+If no struct domain exists for a checked domain, but the domain itself is
+valid, chk_domain_generation() assumes it is being called for the first node
+created for a new domain, and it returns success.
+
+This can be wrong if the checked permission relates to an old domain that
+has just been replaced with a new domain using the same domid.
+
+Fix that by letting chk_domain_generation() fail if a struct domain isn't
+found. To cover the case of the first node for a new domain, try to allocate
+the needed struct domain explicitly when processing the related SET_PERMS
+command. If a referenced domain doesn't exist, flag the related permission
+to be ignored right away.
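+
+A self-contained sketch of that idea (all names hypothetical; the real code
+is domain_alloc_permrefs() in the hunk below, which additionally allocates
+the struct domain for domains that do exist):
+
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    #define PERM_IGNORE 0x80u
+
+    struct perm_entry {
+        unsigned int domid;
+        uint8_t perms;
+    };
+
+    typedef bool (*domain_exists_fn)(unsigned int domid);
+
+    /* Resolve each referenced domid before storing the permission set;
+     * entries whose domain no longer exists are flagged to be ignored
+     * instead of silently matching a later domain reusing the domid. */
+    void mark_stale_permrefs(struct perm_entry *p, unsigned int num,
+                             domain_exists_fn exists)
+    {
+        for (unsigned int i = 0; i < num; i++)
+            if (!exists(p[i].domid))
+                p[i].perms |= PERM_IGNORE;
+    }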
+
+This is XSA-417 / CVE-2022-42320.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit ab128218225d3542596ca3a02aee80d55494bef8)
+---
+ tools/xenstore/xenstored_core.c | 5 +++++
+ tools/xenstore/xenstored_domain.c | 37 +++++++++++++++++++++----------
+ tools/xenstore/xenstored_domain.h | 1 +
+ 3 files changed, 31 insertions(+), 12 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 411cc0e44714..c676ee4e4e4f 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1757,6 +1757,11 @@ static int do_set_perms(const void *ctx, struct connection *conn,
+ if (!xs_strings_to_perms(perms.p, perms.num, permstr))
+ return errno;
+
++ if (domain_alloc_permrefs(&perms) < 0)
++ return ENOMEM;
++ if (perms.p[0].perms & XS_PERM_IGNORE)
++ return ENOENT;
++
+ /* First arg is node name. */
+ if (strstarts(in->buffer, "@")) {
+ if (set_perms_special(conn, in->buffer, &perms))
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index fb732d0a14c3..e2f1b09c6037 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -875,7 +875,6 @@ int domain_entry_inc(struct connection *conn, struct node *node)
+ * count (used for testing whether a node permission is older than a domain).
+ *
+ * Return values:
+- * -1: error
+ * 0: domain has higher generation count (it is younger than a node with the
+ * given count), or domain isn't existing any longer
+ * 1: domain is older than the node
+@@ -883,20 +882,38 @@ int domain_entry_inc(struct connection *conn, struct node *node)
+ static int chk_domain_generation(unsigned int domid, uint64_t gen)
+ {
+ struct domain *d;
+- xc_dominfo_t dominfo;
+
+ if (!xc_handle && domid == 0)
+ return 1;
+
+ d = find_domain_struct(domid);
+- if (d)
+- return (d->generation <= gen) ? 1 : 0;
+
+- if (!get_domain_info(domid, &dominfo))
+- return 0;
++ return (d && d->generation <= gen) ? 1 : 0;
++}
+
+- d = alloc_domain(NULL, domid);
+- return d ? 1 : -1;
++/*
++ * Allocate all missing struct domain referenced by a permission set.
++ * Any permission entries for not existing domains will be marked to be
++ * ignored.
++ */
++int domain_alloc_permrefs(struct node_perms *perms)
++{
++ unsigned int i, domid;
++ struct domain *d;
++ xc_dominfo_t dominfo;
++
++ for (i = 0; i < perms->num; i++) {
++ domid = perms->p[i].id;
++ d = find_domain_struct(domid);
++ if (!d) {
++ if (!get_domain_info(domid, &dominfo))
++ perms->p[i].perms |= XS_PERM_IGNORE;
++ else if (!alloc_domain(NULL, domid))
++ return ENOMEM;
++ }
++ }
++
++ return 0;
+ }
+
+ /*
+@@ -909,8 +926,6 @@ int domain_adjust_node_perms(struct connection *conn, struct node *node)
+ int ret;
+
+ ret = chk_domain_generation(node->perms.p[0].id, node->generation);
+- if (ret < 0)
+- return errno;
+
+ /* If the owner doesn't exist any longer give it to priv domain. */
+ if (!ret) {
+@@ -927,8 +942,6 @@ int domain_adjust_node_perms(struct connection *conn, struct node *node)
+ continue;
+ ret = chk_domain_generation(node->perms.p[i].id,
+ node->generation);
+- if (ret < 0)
+- return errno;
+ if (!ret)
+ node->perms.p[i].perms |= XS_PERM_IGNORE;
+ }
+diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
+index b9e152890149..40fe5f690900 100644
+--- a/tools/xenstore/xenstored_domain.h
++++ b/tools/xenstore/xenstored_domain.h
+@@ -62,6 +62,7 @@ bool domain_is_unprivileged(struct connection *conn);
+
+ /* Remove node permissions for no longer existing domains. */
+ int domain_adjust_node_perms(struct connection *conn, struct node *node);
++int domain_alloc_permrefs(struct node_perms *perms);
+
+ /* Quota manipulation */
+ int domain_entry_inc(struct connection *conn, struct node *);
+--
+2.37.4
+
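The generation-count comparison described in the patch above can be pictured with a small standalone sketch. Nothing below is taken from the Xen sources: the struct names, the PERM_IGNORE flag and the sample values are made up for illustration. The idea is that a permission entry is only kept if a struct for the owning domain exists and is no younger than the node; otherwise the entry is flagged to be ignored, which is what prevents a reused domid from inheriting old permissions.

/*
 * Illustrative sketch only (not from the Xen patch): compare a node's
 * generation against the owning domain's generation to spot stale
 * permission entries left behind by a reused domid.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PERM_IGNORE 0x10            /* hypothetical "ignore this entry" flag */

struct domain_rec {                 /* hypothetical stand-in for struct domain */
    unsigned int domid;
    uint64_t generation;            /* bumped each time the domid is (re)introduced */
    bool exists;
};

struct perm_rec {                   /* hypothetical permission entry */
    unsigned int domid;
    unsigned int perms;
};

/* Returns true if the domain is older than (or as old as) the node. */
static bool domain_older_than_node(const struct domain_rec *d, uint64_t node_gen)
{
    return d && d->exists && d->generation <= node_gen;
}

int main(void)
{
    /* Node created at generation 5, with two permission entries. */
    uint64_t node_gen = 5;
    struct domain_rec doms[] = {
        { .domid = 7, .generation = 3, .exists = true },  /* valid owner  */
        { .domid = 9, .generation = 8, .exists = true },  /* reused domid */
    };
    struct perm_rec perms[] = { { 7, 0 }, { 9, 0 } };

    for (unsigned int i = 0; i < 2; i++) {
        if (!domain_older_than_node(&doms[i], node_gen))
            perms[i].perms |= PERM_IGNORE;   /* stale: domain is newer than the node */
        printf("domid %u: %s\n", perms[i].domid,
               perms[i].perms & PERM_IGNORE ? "ignored" : "kept");
    }
    return 0;
}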
diff --git a/0072-tools-xenstore-remove-recursion-from-construct_node.patch b/0072-tools-xenstore-remove-recursion-from-construct_node.patch
new file mode 100644
index 0000000..72aebfd
--- /dev/null
+++ b/0072-tools-xenstore-remove-recursion-from-construct_node.patch
@@ -0,0 +1,125 @@
+From 074b32e47174a30bb751f2e2c07628eb56117eb8 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:11 +0200
+Subject: [PATCH 72/87] tools/xenstore: remove recursion from construct_node()
+
+In order to reduce stack usage due to recursion, switch
+construct_node() to use a loop instead.
+
+This is part of XSA-418 / CVE-2022-42321.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit da8ee25d02a5447ba39a9800ee2a710ae1f54222)
+---
+ tools/xenstore/xenstored_core.c | 86 +++++++++++++++++++++------------
+ 1 file changed, 55 insertions(+), 31 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index c676ee4e4e4f..3907c35643e9 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1377,45 +1377,69 @@ static int add_child(const void *ctx, struct node *parent, const char *name)
+ static struct node *construct_node(struct connection *conn, const void *ctx,
+ const char *name)
+ {
+- struct node *parent, *node;
+- char *parentname = get_parent(ctx, name);
++ const char **names = NULL;
++ unsigned int levels = 0;
++ struct node *node = NULL;
++ struct node *parent = NULL;
++ const char *parentname = talloc_strdup(ctx, name);
+
+ if (!parentname)
+ return NULL;
+
+- /* If parent doesn't exist, create it. */
+- parent = read_node(conn, parentname, parentname);
+- if (!parent && errno == ENOENT)
+- parent = construct_node(conn, ctx, parentname);
+- if (!parent)
+- return NULL;
++ /* Walk the path up until an existing node is found. */
++ while (!parent) {
++ names = talloc_realloc(ctx, names, const char *, levels + 1);
++ if (!names)
++ goto nomem;
+
+- /* Add child to parent. */
+- if (add_child(ctx, parent, name))
+- goto nomem;
++ /*
++ * names[0] is the name of the node to construct initially,
++ * names[1] is its parent, and so on.
++ */
++ names[levels] = parentname;
++ parentname = get_parent(ctx, parentname);
++ if (!parentname)
++ return NULL;
+
+- /* Allocate node */
+- node = talloc(ctx, struct node);
+- if (!node)
+- goto nomem;
+- node->name = talloc_strdup(node, name);
+- if (!node->name)
+- goto nomem;
++ /* Try to read parent node until we found an existing one. */
++ parent = read_node(conn, ctx, parentname);
++ if (!parent && (errno != ENOENT || !strcmp(parentname, "/")))
++ return NULL;
+
+- /* Inherit permissions, except unprivileged domains own what they create */
+- node->perms.num = parent->perms.num;
+- node->perms.p = talloc_memdup(node, parent->perms.p,
+- node->perms.num * sizeof(*node->perms.p));
+- if (!node->perms.p)
+- goto nomem;
+- if (domain_is_unprivileged(conn))
+- node->perms.p[0].id = conn->id;
++ levels++;
++ }
++
++ /* Walk the path down again constructing the missing nodes. */
++ for (; levels > 0; levels--) {
++ /* Add child to parent. */
++ if (add_child(ctx, parent, names[levels - 1]))
++ goto nomem;
++
++ /* Allocate node */
++ node = talloc(ctx, struct node);
++ if (!node)
++ goto nomem;
++ node->name = talloc_steal(node, names[levels - 1]);
++
++ /* Inherit permissions, unpriv domains own what they create. */
++ node->perms.num = parent->perms.num;
++ node->perms.p = talloc_memdup(node, parent->perms.p,
++ node->perms.num *
++ sizeof(*node->perms.p));
++ if (!node->perms.p)
++ goto nomem;
++ if (domain_is_unprivileged(conn))
++ node->perms.p[0].id = conn->id;
++
++ /* No children, no data */
++ node->children = node->data = NULL;
++ node->childlen = node->datalen = 0;
++ node->acc.memory = 0;
++ node->parent = parent;
++
++ parent = node;
++ }
+
+- /* No children, no data */
+- node->children = node->data = NULL;
+- node->childlen = node->datalen = 0;
+- node->acc.memory = 0;
+- node->parent = parent;
+ return node;
+
+ nomem:
+--
+2.37.4
+
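The loop-based construct_node() above follows a common two-phase pattern. The sketch below shows the same shape on plain path strings, independent of xenstored; exists(), create() and the toy in-memory store are stand-ins invented for the example, not real xenstored functions. Phase one climbs towards the root recording missing components, phase two creates them top-down, so no call recursion is needed however deep the path is.

/*
 * Illustrative sketch only (not from the Xen patch): the two-phase loop
 * that replaces recursion when constructing a missing path.
 */
#include <stdio.h>
#include <string.h>

static const char *store[16] = { "/", "/local" };    /* toy "database"          */
static int store_len = 2;

static int exists(const char *path)
{
    for (int i = 0; i < store_len; i++)
        if (!strcmp(store[i], path))
            return 1;
    return 0;
}

static void create(const char *path)
{
    store[store_len++] = path;                        /* pretend to write a node */
    printf("created %s\n", path);
}

int main(void)
{
    char bufs[8][64];                                 /* one slot per missing level */
    int levels = 0;
    char path[64] = "/local/domain/3/data";

    /* Phase 1: climb towards "/" until an existing ancestor is found,
     * remembering each missing node on the way (deepest one first). */
    while (!exists(path)) {
        snprintf(bufs[levels++], sizeof(bufs[0]), "%s", path);
        *strrchr(path, '/') = '\0';                   /* drop the last component */
        if (!path[0])
            strcpy(path, "/");
    }

    /* Phase 2: create the missing nodes top-down, no recursion needed. */
    for (int i = levels - 1; i >= 0; i--)
        create(bufs[i]);

    return 0;
}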
diff --git a/0073-tools-xenstore-don-t-let-remove_child_entry-call-cor.patch b/0073-tools-xenstore-don-t-let-remove_child_entry-call-cor.patch
new file mode 100644
index 0000000..3c01eb5
--- /dev/null
+++ b/0073-tools-xenstore-don-t-let-remove_child_entry-call-cor.patch
@@ -0,0 +1,110 @@
+From 32ff913afed898e6aef61626a58dc0bf5c6309ef Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:11 +0200
+Subject: [PATCH 73/87] tools/xenstore: don't let remove_child_entry() call
+ corrupt()
+
+In case of write_node() returning an error, remove_child_entry() will
+call corrupt() today. This could result in an endless recursion, as
+remove_child_entry() is called by corrupt(), too:
+
+corrupt()
+ check_store()
+ check_store_()
+ remove_child_entry()
+
+Fix that by letting remove_child_entry() return an error instead and
+let the caller decide what to do.
+
+This is part of XSA-418 / CVE-2022-42321.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 0c00c51f3bc8206c7f9cf87d014650157bee2bf4)
+---
+ tools/xenstore/xenstored_core.c | 36 ++++++++++++++++++---------------
+ 1 file changed, 20 insertions(+), 16 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 3907c35643e9..f433a45dc217 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1608,15 +1608,15 @@ static void memdel(void *mem, unsigned off, unsigned len, unsigned total)
+ memmove(mem + off, mem + off + len, total - off - len);
+ }
+
+-static void remove_child_entry(struct connection *conn, struct node *node,
+- size_t offset)
++static int remove_child_entry(struct connection *conn, struct node *node,
++ size_t offset)
+ {
+ size_t childlen = strlen(node->children + offset);
+
+ memdel(node->children, offset, childlen + 1, node->childlen);
+ node->childlen -= childlen + 1;
+- if (write_node(conn, node, true))
+- corrupt(conn, "Can't update parent node '%s'", node->name);
++
++ return write_node(conn, node, true);
+ }
+
+ static void delete_child(struct connection *conn,
+@@ -1626,7 +1626,9 @@ static void delete_child(struct connection *conn,
+
+ for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
+ if (streq(node->children+i, childname)) {
+- remove_child_entry(conn, node, i);
++ if (remove_child_entry(conn, node, i))
++ corrupt(conn, "Can't update parent node '%s'",
++ node->name);
+ return;
+ }
+ }
+@@ -2325,6 +2327,17 @@ int remember_string(struct hashtable *hash, const char *str)
+ return hashtable_insert(hash, k, (void *)1);
+ }
+
++static int rm_child_entry(struct node *node, size_t off, size_t len)
++{
++ if (!recovery)
++ return off;
++
++ if (remove_child_entry(NULL, node, off))
++ log("check_store: child entry could not be removed from '%s'",
++ node->name);
++
++ return off - len - 1;
++}
+
+ /**
+ * A node has a children field that names the children of the node, separated
+@@ -2377,12 +2390,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
+ if (hashtable_search(children, childname)) {
+ log("check_store: '%s' is duplicated!",
+ childname);
+-
+- if (recovery) {
+- remove_child_entry(NULL, node,
+- i);
+- i -= childlen + 1;
+- }
++ i = rm_child_entry(node, i, childlen);
+ }
+ else {
+ if (!remember_string(children,
+@@ -2399,11 +2407,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
+ } else if (errno != ENOMEM) {
+ log("check_store: No child '%s' found!\n",
+ childname);
+-
+- if (recovery) {
+- remove_child_entry(NULL, node, i);
+- i -= childlen + 1;
+- }
++ i = rm_child_entry(node, i, childlen);
+ } else {
+ log("check_store: ENOMEM");
+ ret = ENOMEM;
+--
+2.37.4
+
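The refactoring above is essentially "report, don't recover" at the lower layer. A minimal sketch of that shape, with invented helper names and no relation to the real xenstored code, looks like this: only the outermost caller may trigger recovery, so a corrupt() -> check_store() -> remove_child_entry() -> corrupt() style cycle can no longer form.

/*
 * Illustrative sketch only (not from the Xen patch): the low-level helper
 * returns an error instead of starting recovery itself, breaking the
 * potential recursion between recovery and the helper.
 */
#include <stdio.h>

static int write_entry(int fail)
{
    return fail ? -1 : 0;            /* pretend the on-disk update failed */
}

/* Low-level: report the problem, never start recovery on its own. */
static int remove_entry(int fail)
{
    return write_entry(fail);
}

/* Top-level: the only place allowed to trigger recovery. */
static void recover(void)
{
    puts("running store recovery (exactly once, no recursion)");
}

int main(void)
{
    if (remove_entry(1))
        recover();
    return 0;
}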
diff --git a/0074-tools-xenstore-add-generic-treewalk-function.patch b/0074-tools-xenstore-add-generic-treewalk-function.patch
new file mode 100644
index 0000000..d84439c
--- /dev/null
+++ b/0074-tools-xenstore-add-generic-treewalk-function.patch
@@ -0,0 +1,250 @@
+From 01ab4910229696e51c59a80eb86d0fedeeccb54b Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:11 +0200
+Subject: [PATCH 74/87] tools/xenstore: add generic treewalk function
+
+Add a generic function to walk the complete node tree. It will start
+at "/" and descend recursively into each child, calling a function
+specified by the caller. Depending on the return value of the user
+specified function the walk will be aborted, continued, or the current
+child will be skipped by not descending into its children.
+
+This is part of XSA-418 / CVE-2022-42321.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 0d7c5d19bc27492360196e7dad2b227908564fff)
+---
+ tools/xenstore/xenstored_core.c | 143 +++++++++++++++++++++++++++++---
+ tools/xenstore/xenstored_core.h | 40 +++++++++
+ 2 files changed, 170 insertions(+), 13 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index f433a45dc217..2cda3ee375ab 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1838,6 +1838,135 @@ static int do_set_perms(const void *ctx, struct connection *conn,
+ return 0;
+ }
+
++static char *child_name(const void *ctx, const char *s1, const char *s2)
++{
++ if (strcmp(s1, "/"))
++ return talloc_asprintf(ctx, "%s/%s", s1, s2);
++ return talloc_asprintf(ctx, "/%s", s2);
++}
++
++static int rm_from_parent(struct connection *conn, struct node *parent,
++ const char *name)
++{
++ size_t off;
++
++ if (!parent)
++ return WALK_TREE_ERROR_STOP;
++
++ for (off = parent->childoff - 1; off && parent->children[off - 1];
++ off--);
++ if (remove_child_entry(conn, parent, off)) {
++ log("treewalk: child entry could not be removed from '%s'",
++ parent->name);
++ return WALK_TREE_ERROR_STOP;
++ }
++ parent->childoff = off;
++
++ return WALK_TREE_OK;
++}
++
++static int walk_call_func(const void *ctx, struct connection *conn,
++ struct node *node, struct node *parent, void *arg,
++ int (*func)(const void *ctx, struct connection *conn,
++ struct node *node, void *arg))
++{
++ int ret;
++
++ if (!func)
++ return WALK_TREE_OK;
++
++ ret = func(ctx, conn, node, arg);
++ if (ret == WALK_TREE_RM_CHILDENTRY && parent)
++ ret = rm_from_parent(conn, parent, node->name);
++
++ return ret;
++}
++
++int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
++ struct walk_funcs *funcs, void *arg)
++{
++ int ret = 0;
++ void *tmpctx;
++ char *name;
++ struct node *node = NULL;
++ struct node *parent = NULL;
++
++ tmpctx = talloc_new(ctx);
++ if (!tmpctx) {
++ errno = ENOMEM;
++ return WALK_TREE_ERROR_STOP;
++ }
++ name = talloc_strdup(tmpctx, root);
++ if (!name) {
++ errno = ENOMEM;
++ talloc_free(tmpctx);
++ return WALK_TREE_ERROR_STOP;
++ }
++
++ /* Continue the walk until an error is returned. */
++ while (ret >= 0) {
++ /* node == NULL possible only for the initial loop iteration. */
++ if (node) {
++ /* Go one step up if ret or if last child finished. */
++ if (ret || node->childoff >= node->childlen) {
++ parent = node->parent;
++ /* Call function AFTER processing a node. */
++ ret = walk_call_func(ctx, conn, node, parent,
++ arg, funcs->exit);
++ /* Last node, so exit loop. */
++ if (!parent)
++ break;
++ talloc_free(node);
++ /* Continue with parent. */
++ node = parent;
++ continue;
++ }
++ /* Get next child of current node. */
++ name = child_name(tmpctx, node->name,
++ node->children + node->childoff);
++ if (!name) {
++ ret = WALK_TREE_ERROR_STOP;
++ break;
++ }
++ /* Point to next child. */
++ node->childoff += strlen(node->children +
++ node->childoff) + 1;
++ /* Descent into children. */
++ parent = node;
++ }
++ /* Read next node (root node or next child). */
++ node = read_node(conn, tmpctx, name);
++ if (!node) {
++ /* Child not found - should not happen! */
++ /* ENOENT case can be handled by supplied function. */
++ if (errno == ENOENT && funcs->enoent)
++ ret = funcs->enoent(ctx, conn, parent, name,
++ arg);
++ else
++ ret = WALK_TREE_ERROR_STOP;
++ if (!parent)
++ break;
++ if (ret == WALK_TREE_RM_CHILDENTRY)
++ ret = rm_from_parent(conn, parent, name);
++ if (ret < 0)
++ break;
++ talloc_free(name);
++ node = parent;
++ continue;
++ }
++ talloc_free(name);
++ node->parent = parent;
++ node->childoff = 0;
++ /* Call function BEFORE processing a node. */
++ ret = walk_call_func(ctx, conn, node, parent, arg,
++ funcs->enter);
++ }
++
++ talloc_free(tmpctx);
++
++ return ret < 0 ? ret : WALK_TREE_OK;
++}
++
+ static struct {
+ const char *str;
+ int (*func)(const void *ctx, struct connection *conn,
+@@ -2305,18 +2434,6 @@ static int keys_equal_fn(void *key1, void *key2)
+ return 0 == strcmp((char *)key1, (char *)key2);
+ }
+
+-
+-static char *child_name(const char *s1, const char *s2)
+-{
+- if (strcmp(s1, "/")) {
+- return talloc_asprintf(NULL, "%s/%s", s1, s2);
+- }
+- else {
+- return talloc_asprintf(NULL, "/%s", s2);
+- }
+-}
+-
+-
+ int remember_string(struct hashtable *hash, const char *str)
+ {
+ char *k = malloc(strlen(str) + 1);
+@@ -2376,7 +2493,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
+ while (i < node->childlen && !ret) {
+ struct node *childnode;
+ size_t childlen = strlen(node->children + i);
+- char * childname = child_name(node->name,
++ char * childname = child_name(NULL, node->name,
+ node->children + i);
+
+ if (!childname) {
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index bfd3fc1e9df3..2d9942171d92 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -202,6 +202,7 @@ struct node {
+
+ /* Children, each nul-terminated. */
+ unsigned int childlen;
++ unsigned int childoff; /* Used by walk_node_tree() internally. */
+ char *children;
+
+ /* Allocation information for node currently in store. */
+@@ -338,6 +339,45 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
+ const struct xs_state_connection *sc);
+ void read_state_node(const void *ctx, const void *state);
+
++/*
++ * Walk the node tree below root calling funcs->enter() and funcs->exit() for
++ * each node. funcs->enter() is being called when entering a node, so before
++ * any of the children of the node is processed. funcs->exit() is being
++ * called when leaving the node, so after all children have been processed.
++ * funcs->enoent() is being called when a node isn't existing.
++ * funcs->*() return values:
++ * < 0: tree walk is stopped, walk_node_tree() returns funcs->*() return value
++ * in case WALK_TREE_ERROR_STOP is returned, errno should be set
++ * WALK_TREE_OK: tree walk is continuing
++ * WALK_TREE_SKIP_CHILDREN: tree walk won't descend below current node, but
++ * walk continues
++ * WALK_TREE_RM_CHILDENTRY: Remove the child entry from its parent and write
++ * the modified parent node back to the data base, implies to not descend
++ * below the current node, but to continue the walk
++ * funcs->*() is allowed to modify the node it is called for in the data base.
++ * In case funcs->enter() is deleting the node, it must not return WALK_TREE_OK
++ * in order to avoid descending into no longer existing children.
++ */
++/* Return values for funcs->*() and walk_node_tree(). */
++#define WALK_TREE_SUCCESS_STOP -100 /* Stop walk early, no error. */
++#define WALK_TREE_ERROR_STOP -1 /* Stop walk due to error. */
++#define WALK_TREE_OK 0 /* No error. */
++/* Return value for funcs->*() only. */
++#define WALK_TREE_SKIP_CHILDREN 1 /* Don't recurse below current node. */
++#define WALK_TREE_RM_CHILDENTRY 2 /* Remove child entry from parent. */
++
++struct walk_funcs {
++ int (*enter)(const void *ctx, struct connection *conn,
++ struct node *node, void *arg);
++ int (*exit)(const void *ctx, struct connection *conn,
++ struct node *node, void *arg);
++ int (*enoent)(const void *ctx, struct connection *conn,
++ struct node *parent, char *name, void *arg);
++};
++
++int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
++ struct walk_funcs *funcs, void *arg);
++
+ #endif /* _XENSTORED_CORE_H */
+
+ /*
+--
+2.37.4
+
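A compact way to picture walk_node_tree()'s control flow is an iterative depth-first walk with enter/exit callbacks. The sketch below is an assumption-laden illustration, not the patch's implementation: it walks a tiny hard-coded tree and uses an explicit stack array with a per-node cursor, whereas the real code reuses node->parent pointers and the childoff field to the same effect.

/*
 * Illustrative sketch only (not from the Xen patch): iterative DFS with
 * "enter" and "exit" callbacks, driven by a per-node child cursor instead
 * of call recursion.
 */
#include <stdio.h>

struct tnode {
    const char *name;
    struct tnode *child[4];
    int nchild;
    int cursor;                       /* next child to visit (like childoff) */
};

static void enter(struct tnode *n) { printf("enter %s\n", n->name); }
static void leave(struct tnode *n) { printf("exit  %s\n", n->name); }

static void walk(struct tnode *root)
{
    struct tnode *stack[16];
    int depth = 0;

    root->cursor = 0;
    stack[depth++] = root;
    enter(root);

    while (depth) {
        struct tnode *n = stack[depth - 1];

        if (n->cursor < n->nchild) {
            /* Descend into the next unvisited child. */
            struct tnode *c = n->child[n->cursor++];
            c->cursor = 0;
            stack[depth++] = c;
            enter(c);
        } else {
            /* All children done: call the exit hook and go back up. */
            leave(n);
            depth--;
        }
    }
}

int main(void)
{
    struct tnode d = { "d" }, c = { "c" }, b = { "b", { &c, &d }, 2 };
    struct tnode a = { "a", { &b }, 1 };

    walk(&a);
    return 0;
}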
diff --git a/0075-tools-xenstore-simplify-check_store.patch b/0075-tools-xenstore-simplify-check_store.patch
new file mode 100644
index 0000000..5d0348f
--- /dev/null
+++ b/0075-tools-xenstore-simplify-check_store.patch
@@ -0,0 +1,114 @@
+From c5a76df793c638423e1388528dc679a3e020a477 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:12 +0200
+Subject: [PATCH 75/87] tools/xenstore: simplify check_store()
+
+check_store() is using a hash table for storing all node names it has
+found via walking the tree. Additionally it using another hash table
+for all children of a node to detect duplicate child names.
+
+Simplify that by dropping the second hash table as the first one is
+already holding all the needed information.
+
+This is part of XSA-418 / CVE-2022-42321.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 70f719f52a220bc5bc987e4dd28e14a7039a176b)
+---
+ tools/xenstore/xenstored_core.c | 47 +++++++++++----------------------
+ 1 file changed, 15 insertions(+), 32 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 2cda3ee375ab..760f3c16c794 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -2477,50 +2477,34 @@ static int check_store_(const char *name, struct hashtable *reachable)
+ if (node) {
+ size_t i = 0;
+
+- struct hashtable * children =
+- create_hashtable(16, hash_from_key_fn, keys_equal_fn);
+- if (!children) {
+- log("check_store create table: ENOMEM");
+- return ENOMEM;
+- }
+-
+ if (!remember_string(reachable, name)) {
+- hashtable_destroy(children, 0);
+ log("check_store: ENOMEM");
+ return ENOMEM;
+ }
+
+ while (i < node->childlen && !ret) {
+- struct node *childnode;
++ struct node *childnode = NULL;
+ size_t childlen = strlen(node->children + i);
+- char * childname = child_name(NULL, node->name,
+- node->children + i);
++ char *childname = child_name(NULL, node->name,
++ node->children + i);
+
+ if (!childname) {
+ log("check_store: ENOMEM");
+ ret = ENOMEM;
+ break;
+ }
++
++ if (hashtable_search(reachable, childname)) {
++ log("check_store: '%s' is duplicated!",
++ childname);
++ i = rm_child_entry(node, i, childlen);
++ goto next;
++ }
++
+ childnode = read_node(NULL, childname, childname);
+-
++
+ if (childnode) {
+- if (hashtable_search(children, childname)) {
+- log("check_store: '%s' is duplicated!",
+- childname);
+- i = rm_child_entry(node, i, childlen);
+- }
+- else {
+- if (!remember_string(children,
+- childname)) {
+- log("check_store: ENOMEM");
+- talloc_free(childnode);
+- talloc_free(childname);
+- ret = ENOMEM;
+- break;
+- }
+- ret = check_store_(childname,
+- reachable);
+- }
++ ret = check_store_(childname, reachable);
+ } else if (errno != ENOMEM) {
+ log("check_store: No child '%s' found!\n",
+ childname);
+@@ -2530,19 +2514,18 @@ static int check_store_(const char *name, struct hashtable *reachable)
+ ret = ENOMEM;
+ }
+
++ next:
+ talloc_free(childnode);
+ talloc_free(childname);
+ i += childlen + 1;
+ }
+
+- hashtable_destroy(children, 0 /* Don't free values (they are
+- all (void *)1) */);
+ talloc_free(node);
+ } else if (errno != ENOMEM) {
+ /* Impossible, because no database should ever be without the
+ root, and otherwise, we've just checked in our caller
+ (which made a recursive call to get here). */
+-
++
+ log("check_store: No child '%s' found: impossible!", name);
+ } else {
+ log("check_store: ENOMEM");
+--
+2.37.4
+
diff --git a/0076-tools-xenstore-use-treewalk-for-check_store.patch b/0076-tools-xenstore-use-treewalk-for-check_store.patch
new file mode 100644
index 0000000..b965eb0
--- /dev/null
+++ b/0076-tools-xenstore-use-treewalk-for-check_store.patch
@@ -0,0 +1,172 @@
+From f5a4c26b2efc55a5267840fcb31f95c00cc25d10 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:12 +0200
+Subject: [PATCH 76/87] tools/xenstore: use treewalk for check_store()
+
+Instead of doing an open tree walk using call recursion, use
+walk_node_tree() when checking the store for inconsistencies.
+
+This will reduce code size and avoid many nesting levels of function
+calls which could potentially exhaust the stack.
+
+This is part of XSA-418 / CVE-2022-42321.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit a07cc0ec60612f414bedf2bafb26ec38d2602e95)
+---
+ tools/xenstore/xenstored_core.c | 109 +++++++++-----------------------
+ 1 file changed, 30 insertions(+), 79 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 760f3c16c794..efdd1888fd78 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -2444,18 +2444,6 @@ int remember_string(struct hashtable *hash, const char *str)
+ return hashtable_insert(hash, k, (void *)1);
+ }
+
+-static int rm_child_entry(struct node *node, size_t off, size_t len)
+-{
+- if (!recovery)
+- return off;
+-
+- if (remove_child_entry(NULL, node, off))
+- log("check_store: child entry could not be removed from '%s'",
+- node->name);
+-
+- return off - len - 1;
+-}
+-
+ /**
+ * A node has a children field that names the children of the node, separated
+ * by NULs. We check whether there are entries in there that are duplicated
+@@ -2469,70 +2457,29 @@ static int rm_child_entry(struct node *node, size_t off, size_t len)
+ * As we go, we record each node in the given reachable hashtable. These
+ * entries will be used later in clean_store.
+ */
+-static int check_store_(const char *name, struct hashtable *reachable)
++static int check_store_step(const void *ctx, struct connection *conn,
++ struct node *node, void *arg)
+ {
+- struct node *node = read_node(NULL, name, name);
+- int ret = 0;
++ struct hashtable *reachable = arg;
+
+- if (node) {
+- size_t i = 0;
+-
+- if (!remember_string(reachable, name)) {
+- log("check_store: ENOMEM");
+- return ENOMEM;
+- }
+-
+- while (i < node->childlen && !ret) {
+- struct node *childnode = NULL;
+- size_t childlen = strlen(node->children + i);
+- char *childname = child_name(NULL, node->name,
+- node->children + i);
+-
+- if (!childname) {
+- log("check_store: ENOMEM");
+- ret = ENOMEM;
+- break;
+- }
+-
+- if (hashtable_search(reachable, childname)) {
+- log("check_store: '%s' is duplicated!",
+- childname);
+- i = rm_child_entry(node, i, childlen);
+- goto next;
+- }
+-
+- childnode = read_node(NULL, childname, childname);
+-
+- if (childnode) {
+- ret = check_store_(childname, reachable);
+- } else if (errno != ENOMEM) {
+- log("check_store: No child '%s' found!\n",
+- childname);
+- i = rm_child_entry(node, i, childlen);
+- } else {
+- log("check_store: ENOMEM");
+- ret = ENOMEM;
+- }
+-
+- next:
+- talloc_free(childnode);
+- talloc_free(childname);
+- i += childlen + 1;
+- }
+-
+- talloc_free(node);
+- } else if (errno != ENOMEM) {
+- /* Impossible, because no database should ever be without the
+- root, and otherwise, we've just checked in our caller
+- (which made a recursive call to get here). */
+-
+- log("check_store: No child '%s' found: impossible!", name);
+- } else {
+- log("check_store: ENOMEM");
+- ret = ENOMEM;
++ if (hashtable_search(reachable, (void *)node->name)) {
++ log("check_store: '%s' is duplicated!", node->name);
++ return recovery ? WALK_TREE_RM_CHILDENTRY
++ : WALK_TREE_SKIP_CHILDREN;
+ }
+
+- return ret;
++ if (!remember_string(reachable, node->name))
++ return WALK_TREE_ERROR_STOP;
++
++ return WALK_TREE_OK;
++}
++
++static int check_store_enoent(const void *ctx, struct connection *conn,
++ struct node *parent, char *name, void *arg)
++{
++ log("check_store: node '%s' not found", name);
++
++ return recovery ? WALK_TREE_RM_CHILDENTRY : WALK_TREE_OK;
+ }
+
+
+@@ -2581,24 +2528,28 @@ static void clean_store(struct hashtable *reachable)
+
+ void check_store(void)
+ {
+- char * root = talloc_strdup(NULL, "/");
+- struct hashtable * reachable =
+- create_hashtable(16, hash_from_key_fn, keys_equal_fn);
+-
++ struct hashtable *reachable;
++ struct walk_funcs walkfuncs = {
++ .enter = check_store_step,
++ .enoent = check_store_enoent,
++ };
++
++ reachable = create_hashtable(16, hash_from_key_fn, keys_equal_fn);
+ if (!reachable) {
+ log("check_store: ENOMEM");
+ return;
+ }
+
+ log("Checking store ...");
+- if (!check_store_(root, reachable) &&
+- !check_transactions(reachable))
++ if (walk_node_tree(NULL, NULL, "/", &walkfuncs, reachable)) {
++ if (errno == ENOMEM)
++ log("check_store: ENOMEM");
++ } else if (!check_transactions(reachable))
+ clean_store(reachable);
+ log("Checking store complete.");
+
+ hashtable_destroy(reachable, 0 /* Don't free values (they are all
+ (void *)1) */);
+- talloc_free(root);
+ }
+
+
+--
+2.37.4
+
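The reworked check_store() boils down to an "enter" callback that records each node name in a reachability set and reports duplicates. Below is a self-contained sketch of that callback shape; a linear array stands in for xenstored's hashtable and the WALK_* constants are local re-definitions for illustration only.

/*
 * Illustrative sketch only (not from the Xen patch): a check_store()-style
 * per-node step that flags names seen twice and skips their children.
 */
#include <stdio.h>
#include <string.h>

#define WALK_OK            0
#define WALK_SKIP_CHILDREN 1

struct seen_set {
    const char *names[32];
    int count;
};

static int seen(struct seen_set *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (!strcmp(s->names[i], name))
            return 1;
    return 0;
}

/* Called once per node while walking the tree. */
static int check_step(struct seen_set *s, const char *name)
{
    if (seen(s, name)) {
        printf("'%s' is duplicated!\n", name);
        return WALK_SKIP_CHILDREN;     /* don't descend below a duplicate */
    }
    s->names[s->count++] = name;
    return WALK_OK;
}

int main(void)
{
    struct seen_set s = { { 0 }, 0 };
    const char *visit[] = { "/", "/a", "/a/b", "/a/b", "/c" };

    for (unsigned int i = 0; i < sizeof(visit) / sizeof(visit[0]); i++)
        check_step(&s, visit[i]);
    return 0;
}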
diff --git a/0077-tools-xenstore-use-treewalk-for-deleting-nodes.patch b/0077-tools-xenstore-use-treewalk-for-deleting-nodes.patch
new file mode 100644
index 0000000..6d80a4d
--- /dev/null
+++ b/0077-tools-xenstore-use-treewalk-for-deleting-nodes.patch
@@ -0,0 +1,180 @@
+From 1514de3a5f23aef451133367d8dc04a26b88052f Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:12 +0200
+Subject: [PATCH 77/87] tools/xenstore: use treewalk for deleting nodes
+
+Instead of doing an open tree walk using call recursion, use
+walk_node_tree() when deleting a sub-tree of nodes.
+
+This will reduce code size and avoid many nesting levels of function
+calls which could potentially exhaust the stack.
+
+This is part of XSA-418 / CVE-2022-42321.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit ea16962053a6849a6e7cada549ba7f8c586d85c6)
+---
+ tools/xenstore/xenstored_core.c | 99 ++++++++++++++-------------------
+ 1 file changed, 43 insertions(+), 56 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index efdd1888fd78..58fb651542ec 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -1334,21 +1334,6 @@ static int do_read(const void *ctx, struct connection *conn,
+ return 0;
+ }
+
+-static void delete_node_single(struct connection *conn, struct node *node)
+-{
+- TDB_DATA key;
+-
+- if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
+- return;
+-
+- if (do_tdb_delete(conn, &key, &node->acc) != 0) {
+- corrupt(conn, "Could not delete '%s'", node->name);
+- return;
+- }
+-
+- domain_entry_dec(conn, node);
+-}
+-
+ /* Must not be / */
+ static char *basename(const char *name)
+ {
+@@ -1619,69 +1604,59 @@ static int remove_child_entry(struct connection *conn, struct node *node,
+ return write_node(conn, node, true);
+ }
+
+-static void delete_child(struct connection *conn,
+- struct node *node, const char *childname)
++static int delete_child(struct connection *conn,
++ struct node *node, const char *childname)
+ {
+ unsigned int i;
+
+ for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
+ if (streq(node->children+i, childname)) {
+- if (remove_child_entry(conn, node, i))
+- corrupt(conn, "Can't update parent node '%s'",
+- node->name);
+- return;
++ errno = remove_child_entry(conn, node, i) ? EIO : 0;
++ return errno;
+ }
+ }
+ corrupt(conn, "Can't find child '%s' in %s", childname, node->name);
++
++ errno = EIO;
++ return errno;
+ }
+
+-static int delete_node(struct connection *conn, const void *ctx,
+- struct node *parent, struct node *node, bool watch_exact)
++static int delnode_sub(const void *ctx, struct connection *conn,
++ struct node *node, void *arg)
+ {
+- char *name;
++ const char *root = arg;
++ bool watch_exact;
++ int ret;
++ TDB_DATA key;
+
+- /* Delete children. */
+- while (node->childlen) {
+- struct node *child;
++ /* Any error here will probably be repeated for all following calls. */
++ ret = access_node(conn, node, NODE_ACCESS_DELETE, &key);
++ if (ret > 0)
++ return WALK_TREE_SUCCESS_STOP;
+
+- name = talloc_asprintf(node, "%s/%s", node->name,
+- node->children);
+- child = name ? read_node(conn, node, name) : NULL;
+- if (child) {
+- if (delete_node(conn, ctx, node, child, true))
+- return errno;
+- } else {
+- trace("delete_node: Error deleting child '%s/%s'!\n",
+- node->name, node->children);
+- /* Quit deleting. */
+- errno = ENOMEM;
+- return errno;
+- }
+- talloc_free(name);
+- }
++ /* In case of error stop the walk. */
++ if (!ret && do_tdb_delete(conn, &key, &node->acc))
++ return WALK_TREE_SUCCESS_STOP;
+
+ /*
+ * Fire the watches now, when we can still see the node permissions.
+ * This fine as we are single threaded and the next possible read will
+ * be handled only after the node has been really removed.
+- */
++ */
++ watch_exact = strcmp(root, node->name);
+ fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
+- delete_node_single(conn, node);
+- delete_child(conn, parent, basename(node->name));
+- talloc_free(node);
+
+- return 0;
++ domain_entry_dec(conn, node);
++
++ return WALK_TREE_RM_CHILDENTRY;
+ }
+
+-static int _rm(struct connection *conn, const void *ctx, struct node *node,
+- const char *name)
++static int _rm(struct connection *conn, const void *ctx, const char *name)
+ {
+- /*
+- * Deleting node by node, so the result is always consistent even in
+- * case of a failure.
+- */
+ struct node *parent;
+ char *parentname = get_parent(ctx, name);
++ struct walk_funcs walkfuncs = { .exit = delnode_sub };
++ int ret;
+
+ if (!parentname)
+ return errno;
+@@ -1689,9 +1664,21 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
+ parent = read_node(conn, ctx, parentname);
+ if (!parent)
+ return read_node_can_propagate_errno() ? errno : EINVAL;
+- node->parent = parent;
+
+- return delete_node(conn, ctx, parent, node, false);
++ ret = walk_node_tree(ctx, conn, name, &walkfuncs, (void *)name);
++ if (ret < 0) {
++ if (ret == WALK_TREE_ERROR_STOP) {
++ corrupt(conn, "error when deleting sub-nodes of %s\n",
++ name);
++ errno = EIO;
++ }
++ return errno;
++ }
++
++ if (delete_child(conn, parent, basename(name)))
++ return errno;
++
++ return 0;
+ }
+
+
+@@ -1728,7 +1715,7 @@ static int do_rm(const void *ctx, struct connection *conn,
+ if (streq(name, "/"))
+ return EINVAL;
+
+- ret = _rm(conn, ctx, node, name);
++ ret = _rm(conn, ctx, name);
+ if (ret)
+ return ret;
+
+--
+2.37.4
+
diff --git a/0078-tools-xenstore-use-treewalk-for-creating-node-record.patch b/0078-tools-xenstore-use-treewalk-for-creating-node-record.patch
new file mode 100644
index 0000000..d5ed8c1
--- /dev/null
+++ b/0078-tools-xenstore-use-treewalk-for-creating-node-record.patch
@@ -0,0 +1,169 @@
+From 7682de61a49f7692cbd31a62f12c0ca12e069575 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:12 +0200
+Subject: [PATCH 78/87] tools/xenstore: use treewalk for creating node records
+
+Instead of doing an open tree walk using call recursion, use
+walk_node_tree() when creating the node records during a live update.
+
+This will reduce code size and avoid many nesting levels of function
+calls which could potentially exhaust the stack.
+
+This is part of XSA-418 / CVE-2022-42321.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 297ac246a5d8ed656b349641288f3402dcc0251e)
+---
+ tools/xenstore/xenstored_core.c | 105 ++++++++++++--------------------
+ 1 file changed, 40 insertions(+), 65 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 58fb651542ec..05d349778bb4 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -3120,101 +3120,76 @@ const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
+ return NULL;
+ }
+
+-static const char *dump_state_node_tree(FILE *fp, char *path,
+- unsigned int path_max_len)
++struct dump_node_data {
++ FILE *fp;
++ const char *err;
++};
++
++static int dump_state_node_err(struct dump_node_data *data, const char *err)
+ {
+- unsigned int pathlen, childlen, p = 0;
++ data->err = err;
++ return WALK_TREE_ERROR_STOP;
++}
++
++static int dump_state_node(const void *ctx, struct connection *conn,
++ struct node *node, void *arg)
++{
++ struct dump_node_data *data = arg;
++ FILE *fp = data->fp;
++ unsigned int pathlen;
+ struct xs_state_record_header head;
+ struct xs_state_node sn;
+- TDB_DATA key, data;
+- const struct xs_tdb_record_hdr *hdr;
+- const char *child;
+ const char *ret;
+
+- pathlen = strlen(path) + 1;
+-
+- set_tdb_key(path, &key);
+- data = tdb_fetch(tdb_ctx, key);
+- if (data.dptr == NULL)
+- return "Error reading node";
+-
+- /* Clean up in case of failure. */
+- talloc_steal(path, data.dptr);
+-
+- hdr = (void *)data.dptr;
++ pathlen = strlen(node->name) + 1;
+
+ head.type = XS_STATE_TYPE_NODE;
+ head.length = sizeof(sn);
+ sn.conn_id = 0;
+ sn.ta_id = 0;
+ sn.ta_access = 0;
+- sn.perm_n = hdr->num_perms;
++ sn.perm_n = node->perms.num;
+ sn.path_len = pathlen;
+- sn.data_len = hdr->datalen;
+- head.length += hdr->num_perms * sizeof(*sn.perms);
++ sn.data_len = node->datalen;
++ head.length += node->perms.num * sizeof(*sn.perms);
+ head.length += pathlen;
+- head.length += hdr->datalen;
++ head.length += node->datalen;
+ head.length = ROUNDUP(head.length, 3);
+
+ if (fwrite(&head, sizeof(head), 1, fp) != 1)
+- return "Dump node state error";
++ return dump_state_node_err(data, "Dump node head error");
+ if (fwrite(&sn, sizeof(sn), 1, fp) != 1)
+- return "Dump node state error";
++ return dump_state_node_err(data, "Dump node state error");
+
+- ret = dump_state_node_perms(fp, hdr->perms, hdr->num_perms);
++ ret = dump_state_node_perms(fp, node->perms.p, node->perms.num);
+ if (ret)
+- return ret;
++ return dump_state_node_err(data, ret);
+
+- if (fwrite(path, pathlen, 1, fp) != 1)
+- return "Dump node path error";
+- if (hdr->datalen &&
+- fwrite(hdr->perms + hdr->num_perms, hdr->datalen, 1, fp) != 1)
+- return "Dump node data error";
++ if (fwrite(node->name, pathlen, 1, fp) != 1)
++ return dump_state_node_err(data, "Dump node path error");
++
++ if (node->datalen && fwrite(node->data, node->datalen, 1, fp) != 1)
++ return dump_state_node_err(data, "Dump node data error");
+
+ ret = dump_state_align(fp);
+ if (ret)
+- return ret;
++ return dump_state_node_err(data, ret);
+
+- child = (char *)(hdr->perms + hdr->num_perms) + hdr->datalen;
+-
+- /*
+- * Use path for constructing children paths.
+- * As we don't write out nodes without having written their parent
+- * already we will never clobber a part of the path we'll need later.
+- */
+- pathlen--;
+- if (path[pathlen - 1] != '/') {
+- path[pathlen] = '/';
+- pathlen++;
+- }
+- while (p < hdr->childlen) {
+- childlen = strlen(child) + 1;
+- if (pathlen + childlen > path_max_len)
+- return "Dump node path length error";
+- strcpy(path + pathlen, child);
+- ret = dump_state_node_tree(fp, path, path_max_len);
+- if (ret)
+- return ret;
+- p += childlen;
+- child += childlen;
+- }
+-
+- talloc_free(data.dptr);
+-
+- return NULL;
++ return WALK_TREE_OK;
+ }
+
+ const char *dump_state_nodes(FILE *fp, const void *ctx)
+ {
+- char *path;
++ struct dump_node_data data = {
++ .fp = fp,
++ .err = "Dump node walk error"
++ };
++ struct walk_funcs walkfuncs = { .enter = dump_state_node };
+
+- path = talloc_size(ctx, XENSTORE_ABS_PATH_MAX + 1);
+- if (!path)
+- return "Path buffer allocation error";
++ if (walk_node_tree(ctx, NULL, "/", &walkfuncs, &data))
++ return data.err;
+
+- strcpy(path, "/");
+-
+- return dump_state_node_tree(fp, path, XENSTORE_ABS_PATH_MAX + 1);
++ return NULL;
+ }
+
+ void read_state_global(const void *ctx, const void *state)
+--
+2.37.4
+
diff --git a/0079-tools-xenstore-remove-nodes-owned-by-destroyed-domai.patch b/0079-tools-xenstore-remove-nodes-owned-by-destroyed-domai.patch
new file mode 100644
index 0000000..f6ba349
--- /dev/null
+++ b/0079-tools-xenstore-remove-nodes-owned-by-destroyed-domai.patch
@@ -0,0 +1,298 @@
+From 825332daeac9fc3ac1e482e805ac4a3bc1e1ab34 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:12 +0200
+Subject: [PATCH 79/87] tools/xenstore: remove nodes owned by destroyed domain
+
+In case a domain is removed from Xenstore, remove all nodes owned by
+it per default.
+
+This tackles the problem that nodes might be created by a domain
+outside its home path in Xenstore, leading to Xenstore hogging more
+and more memory. Domain quota don't work in this case if the guest is
+rebooting in between.
+
+Since XSA-322 ownership of such stale nodes is transferred to dom0,
+which is helping against unintended access, but not against OOM of
+Xenstore.
+
+As a fallback for weird cases add a Xenstore start parameter for
+keeping today's way to handle stale nodes, adding the risk of Xenstore
+hitting an OOM situation.
+
+This is part of XSA-419 / CVE-2022-42322.
+
+Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 755d3f9debf8879448211fffb018f556136f6a79)
+---
+ tools/xenstore/xenstored_core.c | 17 +++++--
+ tools/xenstore/xenstored_core.h | 4 ++
+ tools/xenstore/xenstored_domain.c | 84 +++++++++++++++++++++++--------
+ tools/xenstore/xenstored_domain.h | 2 +-
+ 4 files changed, 80 insertions(+), 27 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 05d349778bb4..0ca1a5a19ac2 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -80,6 +80,7 @@ static bool verbose = false;
+ LIST_HEAD(connections);
+ int tracefd = -1;
+ static bool recovery = true;
++bool keep_orphans = false;
+ static int reopen_log_pipe[2];
+ static int reopen_log_pipe0_pollfd_idx = -1;
+ char *tracefile = NULL;
+@@ -757,7 +758,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
+ node->perms.p = hdr->perms;
+ node->acc.domid = node->perms.p[0].id;
+ node->acc.memory = data.dsize;
+- if (domain_adjust_node_perms(conn, node))
++ if (domain_adjust_node_perms(node))
+ goto error;
+
+ /* If owner is gone reset currently accounted memory size. */
+@@ -800,7 +801,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+ void *p;
+ struct xs_tdb_record_hdr *hdr;
+
+- if (domain_adjust_node_perms(conn, node))
++ if (domain_adjust_node_perms(node))
+ return errno;
+
+ data.dsize = sizeof(*hdr)
+@@ -1651,7 +1652,7 @@ static int delnode_sub(const void *ctx, struct connection *conn,
+ return WALK_TREE_RM_CHILDENTRY;
+ }
+
+-static int _rm(struct connection *conn, const void *ctx, const char *name)
++int rm_node(struct connection *conn, const void *ctx, const char *name)
+ {
+ struct node *parent;
+ char *parentname = get_parent(ctx, name);
+@@ -1715,7 +1716,7 @@ static int do_rm(const void *ctx, struct connection *conn,
+ if (streq(name, "/"))
+ return EINVAL;
+
+- ret = _rm(conn, ctx, name);
++ ret = rm_node(conn, ctx, name);
+ if (ret)
+ return ret;
+
+@@ -2639,6 +2640,8 @@ static void usage(void)
+ " -R, --no-recovery to request that no recovery should be attempted when\n"
+ " the store is corrupted (debug only),\n"
+ " -I, --internal-db store database in memory, not on disk\n"
++" -K, --keep-orphans don't delete nodes owned by a domain when the\n"
++" domain is deleted (this is a security risk!)\n"
+ " -V, --verbose to request verbose execution.\n");
+ }
+
+@@ -2663,6 +2666,7 @@ static struct option options[] = {
+ { "timeout", 1, NULL, 'w' },
+ { "no-recovery", 0, NULL, 'R' },
+ { "internal-db", 0, NULL, 'I' },
++ { "keep-orphans", 0, NULL, 'K' },
+ { "verbose", 0, NULL, 'V' },
+ { "watch-nb", 1, NULL, 'W' },
+ #ifndef NO_LIVE_UPDATE
+@@ -2742,7 +2746,7 @@ int main(int argc, char *argv[])
+ orig_argc = argc;
+ orig_argv = argv;
+
+- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:q:T:RVW:w:U",
++ while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
+ options, NULL)) != -1) {
+ switch (opt) {
+ case 'D':
+@@ -2778,6 +2782,9 @@ int main(int argc, char *argv[])
+ case 'I':
+ tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
+ break;
++ case 'K':
++ keep_orphans = true;
++ break;
+ case 'V':
+ verbose = true;
+ break;
+diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
+index 2d9942171d92..725793257e4a 100644
+--- a/tools/xenstore/xenstored_core.h
++++ b/tools/xenstore/xenstored_core.h
+@@ -240,6 +240,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+ struct node *read_node(struct connection *conn, const void *ctx,
+ const char *name);
+
++/* Remove a node and its children. */
++int rm_node(struct connection *conn, const void *ctx, const char *name);
++
+ void setup_structure(bool live_update);
+ struct connection *new_connection(const struct interface_funcs *funcs);
+ struct connection *get_connection_by_id(unsigned int conn_id);
+@@ -286,6 +289,7 @@ extern int quota_req_outstanding;
+ extern int quota_trans_nodes;
+ extern int quota_memory_per_domain_soft;
+ extern int quota_memory_per_domain_hard;
++extern bool keep_orphans;
+
+ extern unsigned int timeout_watch_event_msec;
+
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index e2f1b09c6037..8b134017a27a 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -227,10 +227,64 @@ static void unmap_interface(void *interface)
+ xengnttab_unmap(*xgt_handle, interface, 1);
+ }
+
++static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
++ struct node *node, void *arg)
++{
++ struct domain *domain = arg;
++ TDB_DATA key;
++ int ret = WALK_TREE_OK;
++
++ if (node->perms.p[0].id != domain->domid)
++ return WALK_TREE_OK;
++
++ if (keep_orphans) {
++ set_tdb_key(node->name, &key);
++ domain->nbentry--;
++ node->perms.p[0].id = priv_domid;
++ node->acc.memory = 0;
++ domain_entry_inc(NULL, node);
++ if (write_node_raw(NULL, &key, node, true)) {
++ /* That's unfortunate. We only can try to continue. */
++ syslog(LOG_ERR,
++ "error when moving orphaned node %s to dom0\n",
++ node->name);
++ } else
++ trace("orphaned node %s moved to dom0\n", node->name);
++ } else {
++ if (rm_node(NULL, ctx, node->name)) {
++ /* That's unfortunate. We only can try to continue. */
++ syslog(LOG_ERR,
++ "error when deleting orphaned node %s\n",
++ node->name);
++ } else
++ trace("orphaned node %s deleted\n", node->name);
++
++ /* Skip children in all cases in order to avoid more errors. */
++ ret = WALK_TREE_SKIP_CHILDREN;
++ }
++
++ return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
++}
++
++static void domain_tree_remove(struct domain *domain)
++{
++ int ret;
++ struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
++
++ if (domain->nbentry > 0) {
++ ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
++ if (ret == WALK_TREE_ERROR_STOP)
++ syslog(LOG_ERR,
++ "error when looking for orphaned nodes\n");
++ }
++}
++
+ static int destroy_domain(void *_domain)
+ {
+ struct domain *domain = _domain;
+
++ domain_tree_remove(domain);
++
+ list_del(&domain->list);
+
+ if (!domain->introduced)
+@@ -851,15 +905,15 @@ int domain_entry_inc(struct connection *conn, struct node *node)
+ struct domain *d;
+ unsigned int domid;
+
+- if (!conn)
++ if (!node->perms.p)
+ return 0;
+
+- domid = node->perms.p ? node->perms.p[0].id : conn->id;
++ domid = node->perms.p[0].id;
+
+- if (conn->transaction) {
++ if (conn && conn->transaction) {
+ transaction_entry_inc(conn->transaction, domid);
+ } else {
+- d = (domid == conn->id && conn->domain) ? conn->domain
++ d = (conn && domid == conn->id && conn->domain) ? conn->domain
+ : find_or_alloc_existing_domain(domid);
+ if (d)
+ d->nbentry++;
+@@ -920,23 +974,11 @@ int domain_alloc_permrefs(struct node_perms *perms)
+ * Remove permissions for no longer existing domains in order to avoid a new
+ * domain with the same domid inheriting the permissions.
+ */
+-int domain_adjust_node_perms(struct connection *conn, struct node *node)
++int domain_adjust_node_perms(struct node *node)
+ {
+ unsigned int i;
+ int ret;
+
+- ret = chk_domain_generation(node->perms.p[0].id, node->generation);
+-
+- /* If the owner doesn't exist any longer give it to priv domain. */
+- if (!ret) {
+- /*
+- * In theory we'd need to update the number of dom0 nodes here,
+- * but we could be called for a read of the node. So better
+- * avoid the risk to overflow the node count of dom0.
+- */
+- node->perms.p[0].id = priv_domid;
+- }
+-
+ for (i = 1; i < node->perms.num; i++) {
+ if (node->perms.p[i].perms & XS_PERM_IGNORE)
+ continue;
+@@ -954,15 +996,15 @@ void domain_entry_dec(struct connection *conn, struct node *node)
+ struct domain *d;
+ unsigned int domid;
+
+- if (!conn)
++ if (!node->perms.p)
+ return;
+
+ domid = node->perms.p ? node->perms.p[0].id : conn->id;
+
+- if (conn->transaction) {
++ if (conn && conn->transaction) {
+ transaction_entry_dec(conn->transaction, domid);
+ } else {
+- d = (domid == conn->id && conn->domain) ? conn->domain
++ d = (conn && domid == conn->id && conn->domain) ? conn->domain
+ : find_domain_struct(domid);
+ if (d) {
+ d->nbentry--;
+@@ -1081,7 +1123,7 @@ int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
+ * exist, as accounting is done either for a domain related to
+ * the current connection, or for the domain owning a node
+ * (which is always existing, as the owner of the node is
+- * tested to exist and replaced by domid 0 if not).
++ * tested to exist and deleted or replaced by domid 0 if not).
+ * So not finding the related domain MUST be an error in the
+ * data base.
+ */
+diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
+index 40fe5f690900..5454e925ad15 100644
+--- a/tools/xenstore/xenstored_domain.h
++++ b/tools/xenstore/xenstored_domain.h
+@@ -61,7 +61,7 @@ const char *get_implicit_path(const struct connection *conn);
+ bool domain_is_unprivileged(struct connection *conn);
+
+ /* Remove node permissions for no longer existing domains. */
+-int domain_adjust_node_perms(struct connection *conn, struct node *node);
++int domain_adjust_node_perms(struct node *node);
+ int domain_alloc_permrefs(struct node_perms *perms);
+
+ /* Quota manipulation */
+--
+2.37.4
+
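The policy introduced above, deleting a dead domain's nodes by default or handing them to the privileged domain when --keep-orphans is given, can be sketched independently of the tree walk. Everything below (the node list, domid values and helper name) is invented for illustration.

/*
 * Illustrative sketch only (not from the Xen patch): per-node handling
 * when a domain disappears.
 */
#include <stdbool.h>
#include <stdio.h>

#define PRIV_DOMID 0

struct orphan_node {
    const char *name;
    unsigned int owner;
    bool deleted;
};

static void handle_dead_domain(struct orphan_node *nodes, int n,
                               unsigned int dead, bool keep_orphans)
{
    for (int i = 0; i < n; i++) {
        if (nodes[i].owner != dead)
            continue;
        if (keep_orphans) {
            nodes[i].owner = PRIV_DOMID;         /* old behaviour: give to dom0 */
            printf("%s moved to dom0\n", nodes[i].name);
        } else {
            nodes[i].deleted = true;             /* new default: drop the node  */
            printf("%s deleted\n", nodes[i].name);
        }
    }
}

int main(void)
{
    struct orphan_node nodes[] = {
        { "/local/domain/5/data", 5, false },
        { "/tool/foo",            5, false },    /* created outside its home path */
        { "/local/domain/7",      7, false },
    };

    handle_dead_domain(nodes, 3, 5, false);
    return 0;
}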
diff --git a/0080-tools-xenstore-make-the-internal-memory-data-base-th.patch b/0080-tools-xenstore-make-the-internal-memory-data-base-th.patch
new file mode 100644
index 0000000..53d6227
--- /dev/null
+++ b/0080-tools-xenstore-make-the-internal-memory-data-base-th.patch
@@ -0,0 +1,101 @@
+From 8b81fc185ab13feca2f63eda3792189e5ac11a97 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:13 +0200
+Subject: [PATCH 80/87] tools/xenstore: make the internal memory data base the
+ default
+
+Having a file backed data base has the only advantage of being capable
+to dump the contents of it while Xenstore is running, and potentially
+using less swap space in case the data base can't be kept in memory.
+
+It has the major disadvantage of a huge performance overhead: switching
+to keep the data base in memory only speeds up live update of xenstored
+with 120000 nodes from 20 minutes to 11 seconds. A complete tree walk
+of this configuration will be reduced from 7 seconds to 280 msecs
+(measured by "xenstore-control check").
+
+So make the internal memory data base the default and enhance the
+"--internal-db" command line parameter to take an optional parameter
+allowing to switch the internal data base back to the file based one.
+
+This is part of XSA-419.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit d174fefa90487ddd25ebc618028f67b2e8a1f795)
+---
+ tools/helpers/init-xenstore-domain.c | 4 ++--
+ tools/xenstore/xenstored_core.c | 13 ++++++++-----
+ 2 files changed, 10 insertions(+), 7 deletions(-)
+
+diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
+index 11ebf79e6d26..8d1d1a4f1e3a 100644
+--- a/tools/helpers/init-xenstore-domain.c
++++ b/tools/helpers/init-xenstore-domain.c
+@@ -223,9 +223,9 @@ static int build(xc_interface *xch)
+ }
+
+ if ( param )
+- snprintf(cmdline, 512, "--event %d --internal-db %s", rv, param);
++ snprintf(cmdline, 512, "--event %d %s", rv, param);
+ else
+- snprintf(cmdline, 512, "--event %d --internal-db", rv);
++ snprintf(cmdline, 512, "--event %d", rv);
+
+ dom->guest_domid = domid;
+ dom->cmdline = xc_dom_strdup(dom, cmdline);
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 0ca1a5a19ac2..041124d8b7a5 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -2329,7 +2329,7 @@ static void accept_connection(int sock)
+ }
+ #endif
+
+-static int tdb_flags;
++static int tdb_flags = TDB_INTERNAL | TDB_NOLOCK;
+
+ /* We create initial nodes manually. */
+ static void manual_node(const char *name, const char *child)
+@@ -2639,7 +2639,8 @@ static void usage(void)
+ " watch-event: time a watch-event is kept pending\n"
+ " -R, --no-recovery to request that no recovery should be attempted when\n"
+ " the store is corrupted (debug only),\n"
+-" -I, --internal-db store database in memory, not on disk\n"
++" -I, --internal-db [on|off] store database in memory, not on disk, default is\n"
++" memory, with \"--internal-db off\" it is on disk\n"
+ " -K, --keep-orphans don't delete nodes owned by a domain when the\n"
+ " domain is deleted (this is a security risk!)\n"
+ " -V, --verbose to request verbose execution.\n");
+@@ -2665,7 +2666,7 @@ static struct option options[] = {
+ { "quota-soft", 1, NULL, 'q' },
+ { "timeout", 1, NULL, 'w' },
+ { "no-recovery", 0, NULL, 'R' },
+- { "internal-db", 0, NULL, 'I' },
++ { "internal-db", 2, NULL, 'I' },
+ { "keep-orphans", 0, NULL, 'K' },
+ { "verbose", 0, NULL, 'V' },
+ { "watch-nb", 1, NULL, 'W' },
+@@ -2746,7 +2747,8 @@ int main(int argc, char *argv[])
+ orig_argc = argc;
+ orig_argv = argv;
+
+- while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
++ while ((opt = getopt_long(argc, argv,
++ "DE:F:HI::KNPS:t:A:M:Q:q:T:RVW:w:U",
+ options, NULL)) != -1) {
+ switch (opt) {
+ case 'D':
+@@ -2780,7 +2782,8 @@ int main(int argc, char *argv[])
+ tracefile = optarg;
+ break;
+ case 'I':
+- tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
++ if (optarg && !strcmp(optarg, "off"))
++ tdb_flags = 0;
+ break;
+ case 'K':
+ keep_orphans = true;
+--
+2.37.4
+
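The optional argument added to --internal-db relies on getopt_long()'s optional-argument handling: "I::" in the short-option string and has_arg = 2 (optional_argument) in the long-option table. The standalone sketch below shows that wiring outside xenstored; the option name and flag are reused only for illustration. One practical detail worth noting is that with GNU getopt an optional argument has to be attached, i.e. "--internal-db=off" or "-Ioff", otherwise optarg stays NULL.

/*
 * Illustrative sketch only (not from the Xen patch): parsing an option
 * whose argument is optional with getopt_long().
 */
#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    bool internal_db = true;                     /* in-memory database is the default */
    static const struct option options[] = {
        { "internal-db", optional_argument, NULL, 'I' },
        { NULL, 0, NULL, 0 },
    };
    int opt;

    while ((opt = getopt_long(argc, argv, "I::", options, NULL)) != -1) {
        switch (opt) {
        case 'I':
            /* No argument (or anything but "off") keeps the in-memory db. */
            if (optarg && !strcmp(optarg, "off"))
                internal_db = false;
            break;
        default:
            fprintf(stderr, "usage: %s [--internal-db[=off]]\n", argv[0]);
            return 1;
        }
    }

    printf("database backend: %s\n", internal_db ? "memory" : "file");
    return 0;
}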
diff --git a/0081-docs-enhance-xenstore.txt-with-permissions-descripti.patch b/0081-docs-enhance-xenstore.txt-with-permissions-descripti.patch
new file mode 100644
index 0000000..c0b9c4a
--- /dev/null
+++ b/0081-docs-enhance-xenstore.txt-with-permissions-descripti.patch
@@ -0,0 +1,50 @@
+From 1f5b394d6ed0ee26b5878bd0cdf4a698bbc4294f Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:13 +0200
+Subject: [PATCH 81/87] docs: enhance xenstore.txt with permissions description
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The permission scheme of Xenstore nodes is not really covered by
+docs/misc/xenstore.txt, other than referring to the Xen wiki.
+
+Add a paragraph explaining the permissions of nodes, and especially
+mentioning removal of nodes when a domain has been removed from
+Xenstore.
+
+This is part of XSA-419.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit d084d2c6dff7044956ebdf83a259ad6081a1d921)
+---
+ docs/misc/xenstore.txt | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
+
+diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
+index a7d006519ae8..eccd596ee38c 100644
+--- a/docs/misc/xenstore.txt
++++ b/docs/misc/xenstore.txt
+@@ -43,6 +43,17 @@ bytes are forbidden; clients specifying relative paths should keep
+ them to within 2048 bytes. (See XENSTORE_*_PATH_MAX in xs_wire.h.)
+
+
++Each node has one or multiple permission entries. Permissions are
++granted by domain-id, the first permission entry of each node specifies
++the owner of the node. Permissions of a node can be changed by the
++owner of the node, the owner can only be modified by the control
++domain (usually domain id 0). The owner always has the right to read
++and write the node, while other permissions can be setup to allow
++read and/or write access. When a domain is being removed from Xenstore
++nodes owned by that domain will be removed together with all of those
++nodes' children.
++
++
+ Communication with xenstore is via either sockets, or event channel
+ and shared memory, as specified in io/xs_wire.h: each message in
+ either direction is a header formatted as a struct xsd_sockmsg
+--
+2.37.4
+
diff --git a/0082-tools-ocaml-xenstored-Fix-quota-bypass-on-domain-shu.patch b/0082-tools-ocaml-xenstored-Fix-quota-bypass-on-domain-shu.patch
new file mode 100644
index 0000000..1cdc2b2
--- /dev/null
+++ b/0082-tools-ocaml-xenstored-Fix-quota-bypass-on-domain-shu.patch
@@ -0,0 +1,93 @@
+From 5b0919f2c0e5060f6e0bc328f100abae0a9f07b8 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Wed, 12 Oct 2022 19:13:06 +0100
+Subject: [PATCH 82/87] tools/ocaml/xenstored: Fix quota bypass on domain
+ shutdown
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+XSA-322 fixed a domid reuse vulnerability by assigning Dom0 as the owner of
+any nodes left after a domain is shutdown (e.g. outside its /local/domain/N
+tree).
+
+However Dom0 has no quota on purpose, so this opened up another potential
+attack vector. Avoid it by deleting these nodes instead of assigning them to
+Dom0.
+
+This is part of XSA-419 / CVE-2022-42323.
+
+Fixes: c46eff921209 ("tools/ocaml/xenstored: clean up permissions for dead domains")
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit db471408edd46af403b8bd44d180a928ad7fbb80)
+---
+ tools/ocaml/xenstored/perms.ml | 3 +--
+ tools/ocaml/xenstored/store.ml | 29 +++++++++++++++++++++--------
+ 2 files changed, 22 insertions(+), 10 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml
+index e8a16221f8fa..84f2503e8e29 100644
+--- a/tools/ocaml/xenstored/perms.ml
++++ b/tools/ocaml/xenstored/perms.ml
+@@ -64,8 +64,7 @@ let get_owner perm = perm.owner
+ * *)
+ let remove_domid ~domid perm =
+ let acl = List.filter (fun (acl_domid, _) -> acl_domid <> domid) perm.acl in
+- let owner = if perm.owner = domid then 0 else perm.owner in
+- { perm with acl; owner }
++ if perm.owner = domid then None else Some { perm with acl; owner = perm.owner }
+
+ let default0 = create 0 NONE []
+
+diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
+index 20e67b142746..70f0c83de404 100644
+--- a/tools/ocaml/xenstored/store.ml
++++ b/tools/ocaml/xenstored/store.ml
+@@ -87,10 +87,21 @@ let check_owner node connection =
+
+ let rec recurse fct node = fct node; SymbolMap.iter (fun _ -> recurse fct) node.children
+
+-(** [recurse_map f tree] applies [f] on each node in the tree recursively *)
+-let recurse_map f =
++(** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
++ possibly removing some nodes.
++ Note that the nodes removed this way won't generate watch events.
++*)
++let recurse_filter_map f =
++ let invalid = -1 in
++ let is_valid _ node = node.perms.owner <> invalid in
+ let rec walk node =
+- f { node with children = SymbolMap.map walk node.children }
++ (* Map.filter_map is Ocaml 4.11+ only *)
++ let node =
++ { node with children =
++ SymbolMap.map walk node.children |> SymbolMap.filter is_valid } in
++ match f node with
++ | Some keep -> keep
++ | None -> { node with perms = {node.perms with owner = invalid } }
+ in
+ walk
+
+@@ -444,11 +455,13 @@ let setperms store perm path nperms =
+
+ let reset_permissions store domid =
+ Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
+- store.root <- Node.recurse_map (fun node ->
+- let perms = Perms.Node.remove_domid ~domid node.perms in
+- if perms <> node.perms then
+- Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
+- { node with perms }
++ store.root <- Node.recurse_filter_map (fun node ->
++ match Perms.Node.remove_domid ~domid node.perms with
++ | None -> None
++ | Some perms ->
++ if perms <> node.perms then
++ Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
++ Some { node with perms }
+ ) store.root
+
+ type ops = {
+--
+2.37.4
+
diff --git a/0083-tools-ocaml-Ensure-packet-size-is-never-negative.patch b/0083-tools-ocaml-Ensure-packet-size-is-never-negative.patch
new file mode 100644
index 0000000..5fc3c77
--- /dev/null
+++ b/0083-tools-ocaml-Ensure-packet-size-is-never-negative.patch
@@ -0,0 +1,75 @@
+From 635390415f4a9c0621330f0b40f8c7e914c4523f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Wed, 12 Oct 2022 19:13:05 +0100
+Subject: [PATCH 83/87] tools/ocaml: Ensure packet size is never negative
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Integers in Ocaml have 63 or 31 bits of signed precision.
+
+On 64-bit builds of Ocaml, this is fine because a C uint32_t always fits
+within a 63-bit signed integer.
+
+In 32-bit builds of Ocaml, this goes wrong. The C uint32_t is truncated
+first (loses the top bit), then has an unsigned/signed mismatch.
+
+A "negative" value (i.e. a packet on the ring of between 1G and 2G in size)
+will trigger an exception later in Bytes.make in xb.ml, and because the packet
+is not removed from the ring, the exception re-triggers on every subsequent
+query, creating a livelock.
+
+Fix both the source of the exception in Xb, and as defence in depth, mark the
+domain as bad for any Invalid_argument exceptions to avoid the risk of
+livelock.
+
+This is XSA-420 / CVE-2022-42324.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit ae34df4d82636f4c82700b447ea2c93b9f82b3f3)
+---
+ tools/ocaml/libs/xb/partial.ml | 6 +++---
+ tools/ocaml/xenstored/process.ml | 2 +-
+ 2 files changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/tools/ocaml/libs/xb/partial.ml b/tools/ocaml/libs/xb/partial.ml
+index b6e2a716e263..3aa8927eb7f0 100644
+--- a/tools/ocaml/libs/xb/partial.ml
++++ b/tools/ocaml/libs/xb/partial.ml
+@@ -36,7 +36,7 @@ let of_string s =
+   This will leave the guest connection in a bad state and will
+ be hard to recover from without restarting the connection
+ (ie rebooting the guest) *)
+- let dlen = min xenstore_payload_max dlen in
++ let dlen = max 0 (min xenstore_payload_max dlen) in
+ {
+ tid = tid;
+ rid = rid;
+@@ -46,8 +46,8 @@ let of_string s =
+ }
+
+ let append pkt s sz =
+- if pkt.len > 4096 then failwith "Buffer.add: cannot grow buffer";
+- Buffer.add_string pkt.buf (String.sub s 0 sz)
++ if Buffer.length pkt.buf + sz > xenstore_payload_max then failwith "Buffer.add: cannot grow buffer";
++ Buffer.add_substring pkt.buf s 0 sz
+
+ let to_complete pkt =
+ pkt.len - (Buffer.length pkt.buf)
+diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
+index ce39ce28b5f3..6cb990ee7fb2 100644
+--- a/tools/ocaml/xenstored/process.ml
++++ b/tools/ocaml/xenstored/process.ml
+@@ -722,7 +722,7 @@ let do_input store cons doms con =
+ History.reconnect con;
+ info "%s reconnection complete" (Connection.get_domstr con);
+ None
+- | Failure exp ->
++ | Invalid_argument exp | Failure exp ->
+ error "caught exception %s" exp;
+ error "got a bad client %s" (sprintf "%-8s" (Connection.get_domstr con));
+ Connection.mark_as_bad con;
+--
+2.37.4
+
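The clamp added in partial.ml (max 0 (min xenstore_payload_max dlen)) is the
usual defence for a length field read from a shared ring. A hedged C sketch
of the same idea (the 4096 limit matches XENSTORE_PAYLOAD_MAX from
io/xs_wire.h; the helper itself is illustrative, not xenstored code):

    #include <stdint.h>

    #define XENSTORE_PAYLOAD_MAX 4096

    /* Never trust a guest-supplied length: clamp to [0, max]. */
    static uint32_t clamp_payload_len(int64_t dlen)
    {
        if (dlen < 0)
            return 0;
        if (dlen > XENSTORE_PAYLOAD_MAX)
            return XENSTORE_PAYLOAD_MAX;
        return (uint32_t)dlen;
    }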
diff --git a/0084-tools-xenstore-fix-deleting-node-in-transaction.patch b/0084-tools-xenstore-fix-deleting-node-in-transaction.patch
new file mode 100644
index 0000000..4ab044c
--- /dev/null
+++ b/0084-tools-xenstore-fix-deleting-node-in-transaction.patch
@@ -0,0 +1,46 @@
+From 4305807dfdc183f4acd170fe00eb66b338fa6430 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:13 +0200
+Subject: [PATCH 84/87] tools/xenstore: fix deleting node in transaction
+
+In case a node has been created in a transaction and it is later
+deleted in the same transaction, the transaction will be terminated
+with an error.
+
+As this error is encountered only when handling the deleted node at
+transaction finalization, the transaction will have been performed
+partially and without updating the accounting information. This will
+enable a malicious guest to create an arbitrary number of nodes.
+
+This is part of XSA-421 / CVE-2022-42325.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Tested-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 13ac37f1416cae88d97f7baf6cf2a827edb9a187)
+---
+ tools/xenstore/xenstored_transaction.c | 8 +++++++-
+ 1 file changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index 3e3eb47326cc..7ffe21bb5285 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -418,7 +418,13 @@ static int finalize_transaction(struct connection *conn,
+ true);
+ talloc_free(data.dptr);
+ } else {
+- ret = do_tdb_delete(conn, &key, NULL);
++ /*
++ * A node having been created and later deleted
++ * in this transaction will have no generation
++ * information stored.
++ */
++ ret = (i->generation == NO_GENERATION)
++ ? 0 : do_tdb_delete(conn, &key, NULL);
+ }
+ if (ret)
+ goto err;
+--
+2.37.4
+
diff --git a/0085-tools-xenstore-harden-transaction-finalization-again.patch b/0085-tools-xenstore-harden-transaction-finalization-again.patch
new file mode 100644
index 0000000..6718ae7
--- /dev/null
+++ b/0085-tools-xenstore-harden-transaction-finalization-again.patch
@@ -0,0 +1,410 @@
+From 1bdd7c438b399e2ecce9e3c72bd7c1ae56df60f8 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 13 Sep 2022 07:35:14 +0200
+Subject: [PATCH 85/87] tools/xenstore: harden transaction finalization against
+ errors
+
+When finalizing a transaction, any error occurring after checking for
+conflicts will result in the transaction being performed only
+partially today. Additionally accounting data will not be updated at
+the end of the transaction, which might result in further problems
+later.
+
+Avoid those problems by multiple modifications:
+
+- free any transaction specific nodes which don't need to be committed
+ as they haven't been written during the transaction as soon as their
+  generation count has been verified; this will reduce the risk of
+ out-of-memory situations
+
+- store the transaction specific node name in struct accessed_node in
+ order to avoid the need to allocate additional memory for it when
+ finalizing the transaction
+
+- don't stop the transaction finalization when hitting an error
+ condition, but try to continue to handle all modified nodes
+
+- in case of a detected error do the accounting update as needed and
+ call the data base checking only after that
+
+- if writing a node in a transaction fails (e.g. due to a failed
+ quota check), fail the transaction, as prior changes to struct
+ accessed_node can't easily be undone in that case
+
+This is part of XSA-421 / CVE-2022-42326.
+
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+Tested-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 2dd823ca7237e7fb90c890642d6a3b357a26fcff)
+---
+ tools/xenstore/xenstored_core.c | 16 ++-
+ tools/xenstore/xenstored_transaction.c | 171 +++++++++++--------------
+ tools/xenstore/xenstored_transaction.h | 4 +-
+ 3 files changed, 92 insertions(+), 99 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 041124d8b7a5..ccb7f0a92578 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -727,8 +727,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
+ return NULL;
+ }
+
+- if (transaction_prepend(conn, name, &key))
+- return NULL;
++ transaction_prepend(conn, name, &key);
+
+ data = tdb_fetch(tdb_ctx, key);
+
+@@ -846,10 +845,21 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
+ static int write_node(struct connection *conn, struct node *node,
+ bool no_quota_check)
+ {
++ int ret;
++
+ if (access_node(conn, node, NODE_ACCESS_WRITE, &node->key))
+ return errno;
+
+- return write_node_raw(conn, &node->key, node, no_quota_check);
++ ret = write_node_raw(conn, &node->key, node, no_quota_check);
++ if (ret && conn && conn->transaction) {
++ /*
++ * Reverting access_node() is hard, so just fail the
++ * transaction.
++ */
++ fail_transaction(conn->transaction);
++ }
++
++ return ret;
+ }
+
+ unsigned int perm_for_conn(struct connection *conn,
+diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
+index 7ffe21bb5285..ac854197cadb 100644
+--- a/tools/xenstore/xenstored_transaction.c
++++ b/tools/xenstore/xenstored_transaction.c
+@@ -114,7 +114,8 @@ struct accessed_node
+ struct list_head list;
+
+ /* The name of the node. */
+- char *node;
++ char *trans_name; /* Transaction specific name. */
++ char *node; /* Main data base name. */
+
+ /* Generation count (or NO_GENERATION) for conflict checking. */
+ uint64_t generation;
+@@ -199,25 +200,20 @@ static char *transaction_get_node_name(void *ctx, struct transaction *trans,
+ * Prepend the transaction to name if node has been modified in the current
+ * transaction.
+ */
+-int transaction_prepend(struct connection *conn, const char *name,
+- TDB_DATA *key)
++void transaction_prepend(struct connection *conn, const char *name,
++ TDB_DATA *key)
+ {
+- char *tdb_name;
++ struct accessed_node *i;
+
+- if (!conn || !conn->transaction ||
+- !find_accessed_node(conn->transaction, name)) {
+- set_tdb_key(name, key);
+- return 0;
++ if (conn && conn->transaction) {
++ i = find_accessed_node(conn->transaction, name);
++ if (i) {
++ set_tdb_key(i->trans_name, key);
++ return;
++ }
+ }
+
+- tdb_name = transaction_get_node_name(conn->transaction,
+- conn->transaction, name);
+- if (!tdb_name)
+- return errno;
+-
+- set_tdb_key(tdb_name, key);
+-
+- return 0;
++ set_tdb_key(name, key);
+ }
+
+ /*
+@@ -240,7 +236,6 @@ int access_node(struct connection *conn, struct node *node,
+ struct accessed_node *i = NULL;
+ struct transaction *trans;
+ TDB_DATA local_key;
+- const char *trans_name = NULL;
+ int ret;
+ bool introduce = false;
+
+@@ -259,10 +254,6 @@ int access_node(struct connection *conn, struct node *node,
+
+ trans = conn->transaction;
+
+- trans_name = transaction_get_node_name(node, trans, node->name);
+- if (!trans_name)
+- goto nomem;
+-
+ i = find_accessed_node(trans, node->name);
+ if (!i) {
+ if (trans->nodes >= quota_trans_nodes &&
+@@ -273,9 +264,10 @@ int access_node(struct connection *conn, struct node *node,
+ i = talloc_zero(trans, struct accessed_node);
+ if (!i)
+ goto nomem;
+- i->node = talloc_strdup(i, node->name);
+- if (!i->node)
++ i->trans_name = transaction_get_node_name(i, trans, node->name);
++ if (!i->trans_name)
+ goto nomem;
++ i->node = strchr(i->trans_name, '/') + 1;
+ if (node->generation != NO_GENERATION && node->perms.num) {
+ i->perms.p = talloc_array(i, struct xs_permissions,
+ node->perms.num);
+@@ -302,7 +294,7 @@ int access_node(struct connection *conn, struct node *node,
+ i->generation = node->generation;
+ i->check_gen = true;
+ if (node->generation != NO_GENERATION) {
+- set_tdb_key(trans_name, &local_key);
++ set_tdb_key(i->trans_name, &local_key);
+ ret = write_node_raw(conn, &local_key, node, true);
+ if (ret)
+ goto err;
+@@ -321,7 +313,7 @@ int access_node(struct connection *conn, struct node *node,
+ return -1;
+
+ if (key) {
+- set_tdb_key(trans_name, key);
++ set_tdb_key(i->trans_name, key);
+ if (type == NODE_ACCESS_WRITE)
+ i->ta_node = true;
+ if (type == NODE_ACCESS_DELETE)
+@@ -333,7 +325,6 @@ int access_node(struct connection *conn, struct node *node,
+ nomem:
+ ret = ENOMEM;
+ err:
+- talloc_free((void *)trans_name);
+ talloc_free(i);
+ trans->fail = true;
+ errno = ret;
+@@ -371,100 +362,90 @@ void queue_watches(struct connection *conn, const char *name, bool watch_exact)
+ * base.
+ */
+ static int finalize_transaction(struct connection *conn,
+- struct transaction *trans)
++ struct transaction *trans, bool *is_corrupt)
+ {
+- struct accessed_node *i;
++ struct accessed_node *i, *n;
+ TDB_DATA key, ta_key, data;
+ struct xs_tdb_record_hdr *hdr;
+ uint64_t gen;
+- char *trans_name;
+- int ret;
+
+- list_for_each_entry(i, &trans->accessed, list) {
+- if (!i->check_gen)
+- continue;
++ list_for_each_entry_safe(i, n, &trans->accessed, list) {
++ if (i->check_gen) {
++ set_tdb_key(i->node, &key);
++ data = tdb_fetch(tdb_ctx, key);
++ hdr = (void *)data.dptr;
++ if (!data.dptr) {
++ if (tdb_error(tdb_ctx) != TDB_ERR_NOEXIST)
++ return EIO;
++ gen = NO_GENERATION;
++ } else
++ gen = hdr->generation;
++ talloc_free(data.dptr);
++ if (i->generation != gen)
++ return EAGAIN;
++ }
+
+- set_tdb_key(i->node, &key);
+- data = tdb_fetch(tdb_ctx, key);
+- hdr = (void *)data.dptr;
+- if (!data.dptr) {
+- if (tdb_error(tdb_ctx) != TDB_ERR_NOEXIST)
+- return EIO;
+- gen = NO_GENERATION;
+- } else
+- gen = hdr->generation;
+- talloc_free(data.dptr);
+- if (i->generation != gen)
+- return EAGAIN;
++ /* Entries for unmodified nodes can be removed early. */
++ if (!i->modified) {
++ if (i->ta_node) {
++ set_tdb_key(i->trans_name, &ta_key);
++ if (do_tdb_delete(conn, &ta_key, NULL))
++ return EIO;
++ }
++ list_del(&i->list);
++ talloc_free(i);
++ }
+ }
+
+ while ((i = list_top(&trans->accessed, struct accessed_node, list))) {
+- trans_name = transaction_get_node_name(i, trans, i->node);
+- if (!trans_name)
+- /* We are doomed: the transaction is only partial. */
+- goto err;
+-
+- set_tdb_key(trans_name, &ta_key);
+-
+- if (i->modified) {
+- set_tdb_key(i->node, &key);
+- if (i->ta_node) {
+- data = tdb_fetch(tdb_ctx, ta_key);
+- if (!data.dptr)
+- goto err;
++ set_tdb_key(i->node, &key);
++ if (i->ta_node) {
++ set_tdb_key(i->trans_name, &ta_key);
++ data = tdb_fetch(tdb_ctx, ta_key);
++ if (data.dptr) {
+ hdr = (void *)data.dptr;
+ hdr->generation = ++generation;
+- ret = do_tdb_write(conn, &key, &data, NULL,
+- true);
++ *is_corrupt |= do_tdb_write(conn, &key, &data,
++ NULL, true);
+ talloc_free(data.dptr);
++ if (do_tdb_delete(conn, &ta_key, NULL))
++ *is_corrupt = true;
+ } else {
+- /*
+- * A node having been created and later deleted
+- * in this transaction will have no generation
+- * information stored.
+- */
+- ret = (i->generation == NO_GENERATION)
+- ? 0 : do_tdb_delete(conn, &key, NULL);
+- }
+- if (ret)
+- goto err;
+- if (i->fire_watch) {
+- fire_watches(conn, trans, i->node, NULL,
+- i->watch_exact,
+- i->perms.p ? &i->perms : NULL);
++ *is_corrupt = true;
+ }
++ } else {
++ /*
++ * A node having been created and later deleted
++ * in this transaction will have no generation
++ * information stored.
++ */
++ *is_corrupt |= (i->generation == NO_GENERATION)
++ ? false
++ : do_tdb_delete(conn, &key, NULL);
+ }
++ if (i->fire_watch)
++ fire_watches(conn, trans, i->node, NULL, i->watch_exact,
++ i->perms.p ? &i->perms : NULL);
+
+- if (i->ta_node && do_tdb_delete(conn, &ta_key, NULL))
+- goto err;
+ list_del(&i->list);
+ talloc_free(i);
+ }
+
+ return 0;
+-
+-err:
+- corrupt(conn, "Partial transaction");
+- return EIO;
+ }
+
+ static int destroy_transaction(void *_transaction)
+ {
+ struct transaction *trans = _transaction;
+ struct accessed_node *i;
+- char *trans_name;
+ TDB_DATA key;
+
+ wrl_ntransactions--;
+ trace_destroy(trans, "transaction");
+ while ((i = list_top(&trans->accessed, struct accessed_node, list))) {
+ if (i->ta_node) {
+- trans_name = transaction_get_node_name(i, trans,
+- i->node);
+- if (trans_name) {
+- set_tdb_key(trans_name, &key);
+- do_tdb_delete(trans->conn, &key, NULL);
+- }
++ set_tdb_key(i->trans_name, &key);
++ do_tdb_delete(trans->conn, &key, NULL);
+ }
+ list_del(&i->list);
+ talloc_free(i);
+@@ -556,6 +537,7 @@ int do_transaction_end(const void *ctx, struct connection *conn,
+ {
+ const char *arg = onearg(in);
+ struct transaction *trans;
++ bool is_corrupt = false;
+ int ret;
+
+ if (!arg || (!streq(arg, "T") && !streq(arg, "F")))
+@@ -579,13 +561,17 @@ int do_transaction_end(const void *ctx, struct connection *conn,
+ ret = transaction_fix_domains(trans, false);
+ if (ret)
+ return ret;
+- if (finalize_transaction(conn, trans))
+- return EAGAIN;
++ ret = finalize_transaction(conn, trans, &is_corrupt);
++ if (ret)
++ return ret;
+
+ wrl_apply_debit_trans_commit(conn);
+
+ /* fix domain entry for each changed domain */
+ transaction_fix_domains(trans, true);
++
++ if (is_corrupt)
++ corrupt(conn, "transaction inconsistency");
+ }
+ send_ack(conn, XS_TRANSACTION_END);
+
+@@ -660,7 +646,7 @@ int check_transactions(struct hashtable *hash)
+ struct connection *conn;
+ struct transaction *trans;
+ struct accessed_node *i;
+- char *tname, *tnode;
++ char *tname;
+
+ list_for_each_entry(conn, &connections, list) {
+ list_for_each_entry(trans, &conn->transaction_list, list) {
+@@ -672,11 +658,8 @@ int check_transactions(struct hashtable *hash)
+ list_for_each_entry(i, &trans->accessed, list) {
+ if (!i->ta_node)
+ continue;
+- tnode = transaction_get_node_name(tname, trans,
+- i->node);
+- if (!tnode || !remember_string(hash, tnode))
++ if (!remember_string(hash, i->trans_name))
+ goto nomem;
+- talloc_free(tnode);
+ }
+
+ talloc_free(tname);
+diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
+index 39d7f81c5127..3417303f9427 100644
+--- a/tools/xenstore/xenstored_transaction.h
++++ b/tools/xenstore/xenstored_transaction.h
+@@ -48,8 +48,8 @@ int __must_check access_node(struct connection *conn, struct node *node,
+ void queue_watches(struct connection *conn, const char *name, bool watch_exact);
+
+ /* Prepend the transaction to name if appropriate. */
+-int transaction_prepend(struct connection *conn, const char *name,
+- TDB_DATA *key);
++void transaction_prepend(struct connection *conn, const char *name,
++ TDB_DATA *key);
+
+ /* Mark the transaction as failed. This will prevent it to be committed. */
+ void fail_transaction(struct transaction *trans);
+--
+2.37.4
+
diff --git a/0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch b/0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch
new file mode 100644
index 0000000..c15c285
--- /dev/null
+++ b/0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch
@@ -0,0 +1,82 @@
+From b1a1df345aaf359f305d6d041e571929c9252645 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 14 Jun 2022 16:18:36 +0100
+Subject: [PATCH 86/87] x86/spec-ctrl: Enumeration for IBPB_RET
+
+The IBPB_RET bit indicates that the CPU's implementation of MSR_PRED_CMD.IBPB
+does flush the RSB/RAS too.
+
+This is part of XSA-422 / CVE-2022-23824.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 24496558e650535bdbd22cc04731e82276cd1b3f)
+---
+ tools/libs/light/libxl_cpuid.c | 1 +
+ tools/misc/xen-cpuid.c | 1 +
+ xen/arch/x86/spec_ctrl.c | 5 +++--
+ xen/include/public/arch-x86/cpufeatureset.h | 1 +
+ 4 files changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
+index bf6fdee360a9..691d5c6b2a68 100644
+--- a/tools/libs/light/libxl_cpuid.c
++++ b/tools/libs/light/libxl_cpuid.c
+@@ -289,6 +289,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
+ {"ssb-no", 0x80000008, NA, CPUID_REG_EBX, 26, 1},
+ {"psfd", 0x80000008, NA, CPUID_REG_EBX, 28, 1},
+ {"btc-no", 0x80000008, NA, CPUID_REG_EBX, 29, 1},
++ {"ibpb-ret", 0x80000008, NA, CPUID_REG_EBX, 30, 1},
+
+ {"nc", 0x80000008, NA, CPUID_REG_ECX, 0, 8},
+ {"apicidsize", 0x80000008, NA, CPUID_REG_ECX, 12, 4},
+diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
+index fe22f5f5b68b..cd094427dd4c 100644
+--- a/tools/misc/xen-cpuid.c
++++ b/tools/misc/xen-cpuid.c
+@@ -159,6 +159,7 @@ static const char *const str_e8b[32] =
+ [24] = "amd-ssbd", [25] = "virt-ssbd",
+ [26] = "ssb-no",
+ [28] = "psfd", [29] = "btc-no",
++ [30] = "ibpb-ret",
+ };
+
+ static const char *const str_7d0[32] =
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 0f4bad3d3abb..16a562d3a172 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -419,7 +419,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ * Hardware read-only information, stating immunity to certain issues, or
+ * suggestions of which mitigation to use.
+ */
+- printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
++ printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+ (caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
+ (caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "",
+ (caps & ARCH_CAPS_RSBA) ? " RSBA" : "",
+@@ -436,7 +436,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
+ (e8b & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS)) ? " STIBP_ALWAYS" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_IBRS_FAST)) ? " IBRS_FAST" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
+- (e8b & cpufeat_mask(X86_FEATURE_BTC_NO)) ? " BTC_NO" : "");
++ (e8b & cpufeat_mask(X86_FEATURE_BTC_NO)) ? " BTC_NO" : "",
++ (e8b & cpufeat_mask(X86_FEATURE_IBPB_RET)) ? " IBPB_RET" : "");
+
+ /* Hardware features which need driving to mitigate issues. */
+ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
+diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
+index e7b8167800a2..e0731221404c 100644
+--- a/xen/include/public/arch-x86/cpufeatureset.h
++++ b/xen/include/public/arch-x86/cpufeatureset.h
+@@ -267,6 +267,7 @@ XEN_CPUFEATURE(VIRT_SSBD, 8*32+25) /* MSR_VIRT_SPEC_CTRL.SSBD */
+ XEN_CPUFEATURE(SSB_NO, 8*32+26) /*A Hardware not vulnerable to SSB */
+ XEN_CPUFEATURE(PSFD, 8*32+28) /*S MSR_SPEC_CTRL.PSFD */
+ XEN_CPUFEATURE(BTC_NO, 8*32+29) /*A Hardware not vulnerable to Branch Type Confusion */
++XEN_CPUFEATURE(IBPB_RET, 8*32+30) /*A IBPB clears RSB/RAS too. */
+
+ /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
+ XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A AVX512 Neural Network Instructions */
+--
+2.37.4
+
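The new "ibpb-ret" entry above places the bit in CPUID leaf 0x80000008, EBX,
bit 30. A small user-space probe for that bit, assuming GCC/Clang's cpuid.h
(a sketch for illustration, not part of the Xen tools):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 0x80000008, EBX bit 30 == IBPB_RET, as listed above. */
        if (__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx) &&
            (ebx & (1u << 30)))
            puts("IBPB_RET enumerated");
        else
            puts("IBPB_RET not enumerated");
        return 0;
    }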
diff --git a/0087-x86-spec-ctrl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch b/0087-x86-spec-ctrl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch
new file mode 100644
index 0000000..9bcb4d3
--- /dev/null
+++ b/0087-x86-spec-ctrl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch
@@ -0,0 +1,113 @@
+From c1e196ab490b47ce42037c2fef8184a19d96922b Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 14 Jun 2022 16:18:36 +0100
+Subject: [PATCH 87/87] x86/spec-ctrl: Mitigate IBPB not flushing the RSB/RAS
+
+Introduce spec_ctrl_new_guest_context() to encapsulate all logic pertaining to
+using MSR_PRED_CMD for a new guest context, even if it only has one user
+presently.
+
+Introduce X86_BUG_IBPB_NO_RET, and use it to extend spec_ctrl_new_guest_context()
+with a manual fixup for hardware which mis-implements IBPB.
+
+This is part of XSA-422 / CVE-2022-23824.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 2b27967fb89d7904a1571a2fb963b1c9cac548db)
+---
+ xen/arch/x86/asm-macros.c | 1 +
+ xen/arch/x86/domain.c | 2 +-
+ xen/arch/x86/spec_ctrl.c | 8 ++++++++
+ xen/include/asm-x86/cpufeatures.h | 1 +
+ xen/include/asm-x86/spec_ctrl.h | 22 ++++++++++++++++++++++
+ 5 files changed, 33 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/asm-macros.c b/xen/arch/x86/asm-macros.c
+index 7e536b0d82f5..891d86c7655c 100644
+--- a/xen/arch/x86/asm-macros.c
++++ b/xen/arch/x86/asm-macros.c
+@@ -1,2 +1,3 @@
+ #include <asm/asm-defns.h>
+ #include <asm/alternative-asm.h>
++#include <asm/spec_ctrl_asm.h>
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index 3fab2364be8d..3080cde62b5b 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -2092,7 +2092,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
+ */
+ if ( *last_id != next_id )
+ {
+- wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
++ spec_ctrl_new_guest_context();
+ *last_id = next_id;
+ }
+ }
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 16a562d3a172..90d86fe5cb47 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -804,6 +804,14 @@ static void __init ibpb_calculations(void)
+ return;
+ }
+
++ /*
++ * AMD/Hygon CPUs to date (June 2022) don't flush the RAS. Future
++ * CPUs are expected to enumerate IBPB_RET when this has been fixed.
++ * Until then, cover the difference with the software sequence.
++ */
++ if ( boot_cpu_has(X86_FEATURE_IBPB) && !boot_cpu_has(X86_FEATURE_IBPB_RET) )
++ setup_force_cpu_cap(X86_BUG_IBPB_NO_RET);
++
+ /*
+ * IBPB-on-entry mitigations for Branch Type Confusion.
+ *
+diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
+index 672c9ee22ba2..ecc1bb09505a 100644
+--- a/xen/include/asm-x86/cpufeatures.h
++++ b/xen/include/asm-x86/cpufeatures.h
+@@ -49,6 +49,7 @@ XEN_CPUFEATURE(IBPB_ENTRY_HVM, X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for
+ #define X86_BUG_FPU_PTRS X86_BUG( 0) /* (F)X{SAVE,RSTOR} doesn't save/restore FOP/FIP/FDP. */
+ #define X86_BUG_NULL_SEG X86_BUG( 1) /* NULL-ing a selector preserves the base and limit. */
+ #define X86_BUG_CLFLUSH_MFENCE X86_BUG( 2) /* MFENCE needed to serialise CLFLUSH */
++#define X86_BUG_IBPB_NO_RET X86_BUG( 3) /* IBPB doesn't flush the RSB/RAS */
+
+ /* Total number of capability words, inc synth and bug words. */
+ #define NCAPINTS (FSCAPINTS + X86_NR_SYNTH + X86_NR_BUG) /* N 32-bit words worth of info */
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index 9403b81dc7af..6a77c3937844 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -65,6 +65,28 @@
+ void init_speculation_mitigations(void);
+ void spec_ctrl_init_domain(struct domain *d);
+
++/*
++ * Switch to a new guest prediction context.
++ *
++ * This flushes all indirect branch predictors (BTB, RSB/RAS), so guest code
++ * which has previously run on this CPU can't attack subsequent guest code.
++ *
++ * As this flushes the RSB/RAS, it destroys the predictions of the calling
++ * context. For best performance, arrange for this to be used when we're going
++ * to jump out of the current context, e.g. with reset_stack_and_jump().
++ *
++ * For hardware which mis-implements IBPB, fix up by flushing the RSB/RAS
++ * manually.
++ */
++static always_inline void spec_ctrl_new_guest_context(void)
++{
++ wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
++
++ /* (ab)use alternative_input() to specify clobbers. */
++ alternative_input("", "DO_OVERWRITE_RSB", X86_BUG_IBPB_NO_RET,
++ : "rax", "rcx");
++}
++
+ extern int8_t opt_ibpb_ctxt_switch;
+ extern bool opt_ssbd;
+ extern int8_t opt_eager_fpu;
+--
+2.37.4
+
diff --git a/info.txt b/info.txt
index d2c53b1..a70e606 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #0 for 4.16.3-pre
+Xen upstream patchset #1 for 4.16.3-pre
Containing patches from
RELEASE-4.16.2 (1871bd1c9eb934f0ffd039f3d68e42fd0097f322)
to
-staging-4.16 (1bce7fb1f702da4f7a749c6f1457ecb20bf74fca)
+staging-4.16 (c1e196ab490b47ce42037c2fef8184a19d96922b)
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2023-04-14 16:04 Tomáš Mózes
0 siblings, 0 replies; 11+ messages in thread
From: Tomáš Mózes @ 2023-04-14 16:04 UTC (permalink / raw
To: gentoo-commits
commit: 83c3b8b1a5486948521768ca2f2ef6de59357210
Author: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
AuthorDate: Fri Apr 14 15:59:27 2023 +0000
Commit: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
CommitDate: Fri Apr 14 15:59:27 2023 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=83c3b8b1
Xen 4.16.4-pre-patchset-0
Signed-off-by: Tomáš Mózes <hydrapolic <AT> gmail.com>
... => 0001-update-Xen-version-to-4.16.4-pre.patch | 14 +-
...roadcast-accept-partial-broadcast-success.patch | 34 +
...-Prevent-adding-mapping-when-domain-is-dy.patch | 62 --
...prevent-overflow-with-high-frequency-TSCs.patch | 34 +
...-Handle-preemption-when-freeing-intermedi.patch | 167 ----
...-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch | 36 +
...-option-to-skip-root-pagetable-removal-in.patch | 138 ----
...uild-with-recent-QEMU-use-enable-trace-ba.patch | 50 ++
...just-monitor-table-related-error-handling.patch | 77 --
...tolerate-failure-of-sh_set_toplevel_shado.patch | 76 --
...culate-model-specific-LBRs-once-at-start-.patch | 342 ++++++++
...hadow-tolerate-failure-in-shadow_prealloc.patch | 279 -------
...pport-for-CPUs-without-model-specific-LBR.patch | 83 ++
...-refuse-new-allocations-for-dying-domains.patch | 100 ---
...fix-PAE-check-for-top-level-table-unshado.patch | 39 +
...x-an-incorrect-assignment-to-uart-io_size.patch | 34 +
...ly-free-paging-pool-memory-for-dying-doma.patch | 115 ---
0010-libxl-fix-guest-kexec-skip-cpuid-policy.patch | 72 ++
...-free-the-paging-memory-pool-preemptively.patch | 181 -----
...-xenctrl-Make-domain_getinfolist-tail-rec.patch | 71 ++
...en-x86-p2m-Add-preemption-in-p2m_teardown.patch | 197 -----
...s-Use-arch-specific-default-paging-memory.patch | 149 ----
...-xenctrl-Use-larger-chunksize-in-domain_g.patch | 41 +
...aml-xb-mmap-Use-Data_abstract_val-wrapper.patch | 75 ++
...m-Construct-the-P2M-pages-pool-for-guests.patch | 189 -----
0014-tools-ocaml-xb-Drop-Xs_ring.write.patch | 62 ++
...xl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch | 108 ---
...tored-validate-config-file-before-live-up.patch | 131 +++
...ocate-and-free-P2M-pages-from-the-P2M-poo.patch | 289 -------
...ect-locking-on-transitive-grant-copy-erro.patch | 66 --
...l-libs-Don-t-declare-stubs-as-taking-void.patch | 61 ++
...-Replace-deprecated-soundhw-on-QEMU-comma.patch | 112 ---
...-libs-Allocate-the-correct-amount-of-memo.patch | 80 ++
...-evtchn-Don-t-reference-Custom-objects-wi.patch | 213 +++++
...urface-suitable-value-in-EBX-of-XSTATE-su.patch | 44 -
...-xc-Fix-binding-for-xc_domain_assign_devi.patch | 70 ++
...ed-introduce-cpupool_update_node_affinity.patch | 257 ------
...-xc-Don-t-reference-Abstract_Tag-objects-.patch | 76 ++
...arve-out-memory-allocation-and-freeing-fr.patch | 263 ------
...-libs-Fix-memory-resource-leaks-with-caml.patch | 61 ++
0021-xen-sched-fix-cpu-hotplug.patch | 307 -------
...orrect-PIE-related-option-s-in-EMBEDDED_E.patch | 58 --
...rl-Mitigate-Cross-Thread-Return-Address-P.patch | 120 +++
...Remove-clang-8-from-Debian-unstable-conta.patch | 84 ++
...ore-minor-fix-of-the-migration-stream-doc.patch | 41 -
...ix-parallel-build-between-flex-bison-and-.patch | 50 ++
0024-xen-gnttab-fix-gnttab_acquire_resource.patch | 69 --
...uid-Infrastructure-for-leaves-7-1-ecx-edx.patch | 128 +++
...-VCPUOP_register_vcpu_time_memory_area-fo.patch | 59 --
...isable-CET-SS-on-parts-susceptible-to-fra.patch | 191 +++++
...-x86-vpmu-Fix-race-condition-in-vpmu_load.patch | 97 ---
0027-arm-p2m-Rework-p2m_init.patch | 88 --
...pect-credit2_runqueue-all-when-arranging-.patch | 69 ++
...MD-apply-the-patch-early-on-every-logical.patch | 152 ++++
...-Populate-pages-for-GICv2-mapping-in-p2m_.patch | 169 ----
...-mem_sharing-teardown-before-paging-teard.patch | 111 +++
0029-x86emul-respect-NSCB.patch | 40 -
...correct-error-handling-in-vmx_create_vmcs.patch | 38 -
...Work-around-Clang-IAS-macro-expansion-bug.patch | 115 +++
...-argo-Remove-reachable-ASSERT_UNREACHABLE.patch | 41 -
...ng-Wunicode-diagnostic-when-building-asm-.patch | 83 ++
...onvert-memory-marked-for-runtime-use-to-o.patch | 64 --
...KG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch | 98 +++
...Fix-resource-leaks-in-xc_core_arch_map_p2.patch | 65 ++
0033-xen-sched-fix-race-in-RTDS-scheduler.patch | 42 -
...Fix-leak-on-realloc-failure-in-backup_pte.patch | 56 ++
...-fix-restore_vcpu_affinity-by-removing-it.patch | 158 ----
...-x86-shadow-drop-replace-bogus-assertions.patch | 71 --
...MD-late-load-the-patch-on-every-logical-t.patch | 90 +++
...assume-that-vpci-per-device-data-exists-u.patch | 60 --
...account-for-log-dirty-mode-when-pre-alloc.patch | 92 +++
...pci-msix-remove-from-table-list-on-detach.patch | 47 --
...nd-number-of-pinned-cache-attribute-regio.patch | 50 ++
...ialize-pinned-cache-attribute-list-manipu.patch | 126 +++
...p-secondary-time-area-handles-during-soft.patch | 49 --
...vcpu_info-wants-to-unshare-the-underlying.patch | 41 -
...rl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch | 56 ++
...python-change-s-size-type-for-Python-3.10.patch | 72 ++
...-correctly-ignore-empty-onlining-requests.patch | 43 -
...s-xenmon-Fix-xenmon.py-for-with-python3.x.patch | 54 ++
...m-correct-ballooning-up-for-compat-guests.patch | 55 --
...arking-fix-build-with-gcc12-and-NR_CPUS-1.patch | 95 +++
...-correct-ballooning-down-for-compat-guest.patch | 72 --
...help-gcc13-to-avoid-it-emitting-a-warning.patch | 129 +++
...ert-VMX-use-a-single-global-APIC-access-p.patch | 259 ------
0044-VT-d-constrain-IGD-check.patch | 44 +
...ore-create_node-Don-t-defer-work-to-undo-.patch | 120 ---
0045-bunzip-work-around-gcc13-warning.patch | 42 +
...ore-Fail-a-transaction-if-it-is-not-possi.patch | 145 ----
0046-libacpi-fix-PCI-hotplug-AML.patch | 57 ++
0046-tools-xenstore-split-up-send_reply.patch | 213 -----
...ithout-XT-x2APIC-needs-to-be-forced-into-.patch | 42 +
...ore-add-helpers-to-free-struct-buffered_d.patch | 117 ---
...mmu-no-igfx-if-the-IOMMU-scope-contains-f.patch | 44 +
...ls-xenstore-reduce-number-of-watch-events.patch | 201 -----
...xenstore-let-unread-watch-events-time-out.patch | 309 -------
...fix-and-improve-sh_page_has_multiple_shad.patch | 47 ++
...tools-xenstore-limit-outstanding-requests.patch | 453 -----------
...Fix-evaluate_nospec-code-generation-under.patch | 101 +++
...ore-don-t-buffer-multiple-identical-watch.patch | 93 ---
...x86-shadow-Fix-build-with-no-PG_log_dirty.patch | 56 ++
0052-tools-xenstore-fix-connection-id-usage.patch | 61 --
...-t-spuriously-crash-the-domain-when-INIT-.patch | 51 ++
...ore-simplify-and-fix-per-domain-node-acco.patch | 336 --------
...6-ucode-Fix-error-paths-control_thread_fn.patch | 56 ++
...ore-limit-max-number-of-nodes-accessed-in.patch | 255 ------
...andle-accesses-adjacent-to-the-MSI-X-tabl.patch | 543 +++++++++++++
...rect-name-value-pair-parsing-for-PCI-port.patch | 59 ++
...ore-move-the-call-of-setup_structure-to-d.patch | 96 ---
0056-bump-default-SeaBIOS-version-to-1.16.0.patch | 28 +
...ore-add-infrastructure-to-keep-track-of-p.patch | 289 -------
0057-CI-Drop-automation-configs.patch | 87 ++
...store-add-memory-accounting-for-responses.patch | 82 --
...Switch-arm32-cross-builds-to-run-on-arm64.patch | 87 ++
...enstore-add-memory-accounting-for-watches.patch | 96 ---
...n-Remove-CentOS-7.2-containers-and-builds.patch | 145 ++++
...-xenstore-add-memory-accounting-for-nodes.patch | 342 --------
...mation-Remove-non-debug-x86_32-build-jobs.patch | 67 ++
...-xenstore-add-exports-for-quota-variables.patch | 62 --
...-llvm-8-from-the-Debian-Stretch-container.patch | 103 +++
...ore-add-control-command-for-setting-and-s.patch | 248 ------
...-xenstored-Synchronise-defaults-with-oxen.patch | 63 --
...-xenstored-Check-for-maxrequests-before-p.patch | 101 ---
0064-tools-ocaml-GC-parameter-tuning.patch | 126 ---
0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch | 92 ---
...-Change-Xb.input-to-return-Packet.t-optio.patch | 224 ------
0067-tools-ocaml-xb-Add-BoundedQueue.patch | 133 ---
...-Limit-maximum-in-flight-requests-outstan.patch | 888 ---------------------
...clarify-support-of-untrusted-driver-domai.patch | 55 --
...ore-don-t-use-conn-in-as-context-for-temp.patch | 718 -----------------
...ls-xenstore-fix-checking-node-permissions.patch | 143 ----
...tore-remove-recursion-from-construct_node.patch | 125 ---
...ore-don-t-let-remove_child_entry-call-cor.patch | 110 ---
...ls-xenstore-add-generic-treewalk-function.patch | 250 ------
0075-tools-xenstore-simplify-check_store.patch | 114 ---
...ols-xenstore-use-treewalk-for-check_store.patch | 172 ----
...-xenstore-use-treewalk-for-deleting-nodes.patch | 180 -----
...ore-use-treewalk-for-creating-node-record.patch | 169 ----
...ore-remove-nodes-owned-by-destroyed-domai.patch | 298 -------
...ore-make-the-internal-memory-data-base-th.patch | 101 ---
...e-xenstore.txt-with-permissions-descripti.patch | 50 --
...-xenstored-Fix-quota-bypass-on-domain-shu.patch | 93 ---
...caml-Ensure-packet-size-is-never-negative.patch | 75 --
...xenstore-fix-deleting-node-in-transaction.patch | 46 --
...ore-harden-transaction-finalization-again.patch | 410 ----------
0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch | 82 --
...rl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch | 113 ---
info.txt | 6 +-
148 files changed, 5420 insertions(+), 13296 deletions(-)
diff --git a/0001-update-Xen-version-to-4.16.3-pre.patch b/0001-update-Xen-version-to-4.16.4-pre.patch
similarity index 59%
rename from 0001-update-Xen-version-to-4.16.3-pre.patch
rename to 0001-update-Xen-version-to-4.16.4-pre.patch
index d04dd34..961358a 100644
--- a/0001-update-Xen-version-to-4.16.3-pre.patch
+++ b/0001-update-Xen-version-to-4.16.4-pre.patch
@@ -1,25 +1,25 @@
-From 4aa32912ebeda8cb94d1c3941e7f1f0a2d4f921b Mon Sep 17 00:00:00 2001
+From e3396cd8be5ee99d363a23f30c680e42fb2757bd Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 11 Oct 2022 14:49:41 +0200
-Subject: [PATCH 01/87] update Xen version to 4.16.3-pre
+Date: Tue, 20 Dec 2022 13:50:16 +0100
+Subject: [PATCH 01/61] update Xen version to 4.16.4-pre
---
xen/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/Makefile b/xen/Makefile
-index 76d0a3ff253f..8a403ee896cd 100644
+index 06dde1e03c..67c5551ffd 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -2,7 +2,7 @@
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
export XEN_SUBVERSION = 16
--export XEN_EXTRAVERSION ?= .2$(XEN_VENDORVERSION)
-+export XEN_EXTRAVERSION ?= .3-pre$(XEN_VENDORVERSION)
+-export XEN_EXTRAVERSION ?= .3$(XEN_VENDORVERSION)
++export XEN_EXTRAVERSION ?= .4-pre$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
--
-2.37.4
+2.40.0
diff --git a/0002-ioreq_broadcast-accept-partial-broadcast-success.patch b/0002-ioreq_broadcast-accept-partial-broadcast-success.patch
new file mode 100644
index 0000000..1b0ae9c
--- /dev/null
+++ b/0002-ioreq_broadcast-accept-partial-broadcast-success.patch
@@ -0,0 +1,34 @@
+From f2edbd79f5d5ce3b633885469852e1215dc0d4b5 Mon Sep 17 00:00:00 2001
+From: Per Bilse <per.bilse@citrix.com>
+Date: Tue, 20 Dec 2022 13:50:47 +0100
+Subject: [PATCH 02/61] ioreq_broadcast(): accept partial broadcast success
+
+Avoid incorrectly triggering an error when a broadcast buffered ioreq
+is not handled by all registered clients, as long as the failure is
+strictly because the client doesn't handle buffered ioreqs.
+
+Signed-off-by: Per Bilse <per.bilse@citrix.com>
+Reviewed-by: Paul Durrant <paul@xen.org>
+master commit: a44734df6c24fadbdb001f051cc5580c467caf7d
+master date: 2022-12-07 12:17:30 +0100
+---
+ xen/common/ioreq.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
+index 42414b750b..2a8d8de2d5 100644
+--- a/xen/common/ioreq.c
++++ b/xen/common/ioreq.c
+@@ -1322,7 +1322,8 @@ unsigned int ioreq_broadcast(ioreq_t *p, bool buffered)
+
+ FOR_EACH_IOREQ_SERVER(d, id, s)
+ {
+- if ( !s->enabled )
++ if ( !s->enabled ||
++ (buffered && s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_OFF) )
+ continue;
+
+ if ( ioreq_send(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
+--
+2.40.0
+
diff --git a/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch b/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch
deleted file mode 100644
index 63aa293..0000000
--- a/0002-xen-arm-p2m-Prevent-adding-mapping-when-domain-is-dy.patch
+++ /dev/null
@@ -1,62 +0,0 @@
-From 8d9531a3421dad2b0012e09e6f41d5274e162064 Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Tue, 11 Oct 2022 14:52:13 +0200
-Subject: [PATCH 02/87] xen/arm: p2m: Prevent adding mapping when domain is
- dying
-
-During the domain destroy process, the domain will still be accessible
-until it is fully destroyed. So does the P2M because we don't bail
-out early if is_dying is non-zero. If a domain has permission to
-modify the other domain's P2M (i.e. dom0, or a stubdomain), then
-foreign mapping can be added past relinquish_p2m_mapping().
-
-Therefore, we need to prevent mapping to be added when the domain
-is dying. This commit prevents such adding of mapping by adding the
-d->is_dying check to p2m_set_entry(). Also this commit enhances the
-check in relinquish_p2m_mapping() to make sure that no mappings can
-be added in the P2M after the P2M lock is released.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Signed-off-by: Henry Wang <Henry.Wang@arm.com>
-Tested-by: Henry Wang <Henry.Wang@arm.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-master commit: 3ebe773293e3b945460a3d6f54f3b91915397bab
-master date: 2022-10-11 14:20:18 +0200
----
- xen/arch/arm/p2m.c | 11 +++++++++++
- 1 file changed, 11 insertions(+)
-
-diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
-index 3349b464a39e..1affdafadbeb 100644
---- a/xen/arch/arm/p2m.c
-+++ b/xen/arch/arm/p2m.c
-@@ -1093,6 +1093,15 @@ int p2m_set_entry(struct p2m_domain *p2m,
- {
- int rc = 0;
-
-+ /*
-+ * Any reference taken by the P2M mappings (e.g. foreign mapping) will
-+ * be dropped in relinquish_p2m_mapping(). As the P2M will still
-+ * be accessible after, we need to prevent mapping to be added when the
-+ * domain is dying.
-+ */
-+ if ( unlikely(p2m->domain->is_dying) )
-+ return -ENOMEM;
-+
- while ( nr )
- {
- unsigned long mask;
-@@ -1610,6 +1619,8 @@ int relinquish_p2m_mapping(struct domain *d)
- unsigned int order;
- gfn_t start, end;
-
-+ BUG_ON(!d->is_dying);
-+ /* No mappings can be added in the P2M after the P2M lock is released. */
- p2m_write_lock(p2m);
-
- start = p2m->lowest_mapped_gfn;
---
-2.37.4
-
diff --git a/0003-x86-time-prevent-overflow-with-high-frequency-TSCs.patch b/0003-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
new file mode 100644
index 0000000..a031317
--- /dev/null
+++ b/0003-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
@@ -0,0 +1,34 @@
+From 65bf12135f618614bbf44626fba1c20ca8d1a127 Mon Sep 17 00:00:00 2001
+From: Neowutran <xen@neowutran.ovh>
+Date: Tue, 20 Dec 2022 13:51:42 +0100
+Subject: [PATCH 03/61] x86/time: prevent overflow with high frequency TSCs
+
+Make sure tsc_khz is promoted to a 64-bit type before multiplying by
+1000 to avoid an 'overflow before widen' bug. Otherwise just above
+4.294GHz the value will overflow. Processors with clocks this high are
+now in production and require this to work correctly.
+
+Signed-off-by: Neowutran <xen@neowutran.ovh>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: ad15a0a8ca2515d8ac58edfc0bc1d3719219cb77
+master date: 2022-12-19 11:34:16 +0100
+---
+ xen/arch/x86/time.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
+index 1daff92dca..db0b149ec6 100644
+--- a/xen/arch/x86/time.c
++++ b/xen/arch/x86/time.c
+@@ -2490,7 +2490,7 @@ int tsc_set_info(struct domain *d,
+ case TSC_MODE_ALWAYS_EMULATE:
+ d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
+ d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
+- set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000);
++ set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000UL);
+
+ /*
+ * In default mode use native TSC if the host has safe TSC and
+--
+2.40.0
+
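The "overflow before widen" bug fixed above is easy to reproduce in
isolation: the multiplication is done in 32 bits and only the already
wrapped result is widened to 64 bits. A minimal standalone sketch
(independent of the Xen code; the 4.4 GHz figure is just an example):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t tsc_khz = 4400000;          /* a 4.4 GHz TSC, in kHz */

        uint64_t wrong = tsc_khz * 1000;     /* 32-bit multiply, wraps */
        uint64_t right = tsc_khz * 1000ULL;  /* widened first, correct */

        printf("wrong: %llu\nright: %llu\n",
               (unsigned long long)wrong, (unsigned long long)right);
        return 0;
    }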
diff --git a/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch b/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch
deleted file mode 100644
index 0b33b0a..0000000
--- a/0003-xen-arm-p2m-Handle-preemption-when-freeing-intermedi.patch
+++ /dev/null
@@ -1,167 +0,0 @@
-From 937fdbad5180440888f1fcee46299103327efa90 Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Tue, 11 Oct 2022 14:52:27 +0200
-Subject: [PATCH 03/87] xen/arm: p2m: Handle preemption when freeing
- intermediate page tables
-
-At the moment the P2M page tables will be freed when the domain structure
-is freed without any preemption. As the P2M is quite large, iterating
-through this may take more time than it is reasonable without intermediate
-preemption (to run softirqs and perhaps scheduler).
-
-Split p2m_teardown() in two parts: one preemptible and called when
-relinquishing the resources, the other one non-preemptible and called
-when freeing the domain structure.
-
-As we are now freeing the P2M pages early, we also need to prevent
-further allocation if someone call p2m_set_entry() past p2m_teardown()
-(I wasn't able to prove this will never happen). This is done by
-the checking domain->is_dying from previous patch in p2m_set_entry().
-
-Similarly, we want to make sure that no-one can accessed the free
-pages. Therefore the root is cleared before freeing pages.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Signed-off-by: Henry Wang <Henry.Wang@arm.com>
-Tested-by: Henry Wang <Henry.Wang@arm.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-master commit: 3202084566bba0ef0c45caf8c24302f83d92f9c8
-master date: 2022-10-11 14:20:56 +0200
----
- xen/arch/arm/domain.c | 10 +++++++--
- xen/arch/arm/p2m.c | 47 ++++++++++++++++++++++++++++++++++++---
- xen/include/asm-arm/p2m.h | 13 +++++++++--
- 3 files changed, 63 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
-index 96e1b235501d..2694c39127c5 100644
---- a/xen/arch/arm/domain.c
-+++ b/xen/arch/arm/domain.c
-@@ -789,10 +789,10 @@ fail:
- void arch_domain_destroy(struct domain *d)
- {
- /* IOMMU page table is shared with P2M, always call
-- * iommu_domain_destroy() before p2m_teardown().
-+ * iommu_domain_destroy() before p2m_final_teardown().
- */
- iommu_domain_destroy(d);
-- p2m_teardown(d);
-+ p2m_final_teardown(d);
- domain_vgic_free(d);
- domain_vuart_free(d);
- free_xenheap_page(d->shared_info);
-@@ -996,6 +996,7 @@ enum {
- PROG_xen,
- PROG_page,
- PROG_mapping,
-+ PROG_p2m,
- PROG_done,
- };
-
-@@ -1056,6 +1057,11 @@ int domain_relinquish_resources(struct domain *d)
- if ( ret )
- return ret;
-
-+ PROGRESS(p2m):
-+ ret = p2m_teardown(d);
-+ if ( ret )
-+ return ret;
-+
- PROGRESS(done):
- break;
-
-diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
-index 1affdafadbeb..27418ee5ee98 100644
---- a/xen/arch/arm/p2m.c
-+++ b/xen/arch/arm/p2m.c
-@@ -1527,17 +1527,58 @@ static void p2m_free_vmid(struct domain *d)
- spin_unlock(&vmid_alloc_lock);
- }
-
--void p2m_teardown(struct domain *d)
-+int p2m_teardown(struct domain *d)
- {
- struct p2m_domain *p2m = p2m_get_hostp2m(d);
-+ unsigned long count = 0;
- struct page_info *pg;
-+ unsigned int i;
-+ int rc = 0;
-+
-+ p2m_write_lock(p2m);
-+
-+ /*
-+ * We are about to free the intermediate page-tables, so clear the
-+ * root to prevent any walk to use them.
-+ */
-+ for ( i = 0; i < P2M_ROOT_PAGES; i++ )
-+ clear_and_clean_page(p2m->root + i);
-+
-+ /*
-+ * The domain will not be scheduled anymore, so in theory we should
-+ * not need to flush the TLBs. Do it for safety purpose.
-+ *
-+ * Note that all the devices have already been de-assigned. So we don't
-+ * need to flush the IOMMU TLB here.
-+ */
-+ p2m_force_tlb_flush_sync(p2m);
-+
-+ while ( (pg = page_list_remove_head(&p2m->pages)) )
-+ {
-+ free_domheap_page(pg);
-+ count++;
-+ /* Arbitrarily preempt every 512 iterations */
-+ if ( !(count % 512) && hypercall_preempt_check() )
-+ {
-+ rc = -ERESTART;
-+ break;
-+ }
-+ }
-+
-+ p2m_write_unlock(p2m);
-+
-+ return rc;
-+}
-+
-+void p2m_final_teardown(struct domain *d)
-+{
-+ struct p2m_domain *p2m = p2m_get_hostp2m(d);
-
- /* p2m not actually initialized */
- if ( !p2m->domain )
- return;
-
-- while ( (pg = page_list_remove_head(&p2m->pages)) )
-- free_domheap_page(pg);
-+ ASSERT(page_list_empty(&p2m->pages));
-
- if ( p2m->root )
- free_domheap_pages(p2m->root, P2M_ROOT_ORDER);
-diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
-index 8f11d9c97b5d..b3ba83283e11 100644
---- a/xen/include/asm-arm/p2m.h
-+++ b/xen/include/asm-arm/p2m.h
-@@ -192,8 +192,17 @@ void setup_virt_paging(void);
- /* Init the datastructures for later use by the p2m code */
- int p2m_init(struct domain *d);
-
--/* Return all the p2m resources to Xen. */
--void p2m_teardown(struct domain *d);
-+/*
-+ * The P2M resources are freed in two parts:
-+ * - p2m_teardown() will be called when relinquish the resources. It
-+ * will free large resources (e.g. intermediate page-tables) that
-+ * requires preemption.
-+ * - p2m_final_teardown() will be called when domain struct is been
-+ * freed. This *cannot* be preempted and therefore one small
-+ * resources should be freed here.
-+ */
-+int p2m_teardown(struct domain *d);
-+void p2m_final_teardown(struct domain *d);
-
- /*
- * Remove mapping refcount on each mapping page in the p2m
---
-2.37.4
-
diff --git a/0004-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch b/0004-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
new file mode 100644
index 0000000..3d1c089
--- /dev/null
+++ b/0004-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
@@ -0,0 +1,36 @@
+From 7b1b9849e8a0d7791866d6d21c45993dfe27836c Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 7 Feb 2023 17:03:09 +0100
+Subject: [PATCH 04/61] x86/S3: Restore Xen's MSR_PAT value on S3 resume
+
+There are two paths in the trampoline, and Xen's PAT needs setting up in both,
+not just the boot path.
+
+Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 4d975798e11579fdf405b348543061129e01b0fb
+master date: 2023-01-10 21:21:30 +0000
+---
+ xen/arch/x86/boot/wakeup.S | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
+index c17d613b61..08447e1934 100644
+--- a/xen/arch/x86/boot/wakeup.S
++++ b/xen/arch/x86/boot/wakeup.S
+@@ -130,6 +130,11 @@ wakeup_32:
+ and %edi, %edx
+ wrmsr
+ 1:
++ /* Set up PAT before enabling paging. */
++ mov $XEN_MSR_PAT & 0xffffffff, %eax
++ mov $XEN_MSR_PAT >> 32, %edx
++ mov $MSR_IA32_CR_PAT, %ecx
++ wrmsr
+
+ /* Set up EFER (Extended Feature Enable Register). */
+ movl $MSR_EFER,%ecx
+--
+2.40.0
+
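The wakeup.S hunk follows the standard WRMSR calling convention: the 64-bit
value goes in EDX:EAX and the MSR index in ECX. A hedged C-level sketch of
that split (ring-0 only; the helper name is made up for illustration and is
not a Xen interface):

    #include <stdint.h>

    static inline void wrmsr_u64(uint32_t msr, uint64_t val)
    {
        uint32_t lo = (uint32_t)val;          /* low 32 bits  -> EAX */
        uint32_t hi = (uint32_t)(val >> 32);  /* high 32 bits -> EDX */

        asm volatile ("wrmsr" :: "c" (msr), "a" (lo), "d" (hi));
    }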
diff --git a/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch b/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch
deleted file mode 100644
index 04c002b..0000000
--- a/0004-x86-p2m-add-option-to-skip-root-pagetable-removal-in.patch
+++ /dev/null
@@ -1,138 +0,0 @@
-From 8fc19c143b8aa563077f3d5c46fcc0a54dc04f35 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 11 Oct 2022 14:52:39 +0200
-Subject: [PATCH 04/87] x86/p2m: add option to skip root pagetable removal in
- p2m_teardown()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Add a new parameter to p2m_teardown() in order to select whether the
-root page table should also be freed. Note that all users are
-adjusted to pass the parameter to remove the root page tables, so
-behavior is not modified.
-
-No functional change intended.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Suggested-by: Julien Grall <julien@xen.org>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Tim Deegan <tim@xen.org>
-master commit: 1df52a270225527ae27bfa2fc40347bf93b78357
-master date: 2022-10-11 14:21:23 +0200
----
- xen/arch/x86/mm/hap/hap.c | 6 +++---
- xen/arch/x86/mm/p2m.c | 20 ++++++++++++++++----
- xen/arch/x86/mm/shadow/common.c | 4 ++--
- xen/include/asm-x86/p2m.h | 2 +-
- 4 files changed, 22 insertions(+), 10 deletions(-)
-
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index 47a7487fa7a3..a8f5a19da917 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -541,18 +541,18 @@ void hap_final_teardown(struct domain *d)
- }
-
- for ( i = 0; i < MAX_ALTP2M; i++ )
-- p2m_teardown(d->arch.altp2m_p2m[i]);
-+ p2m_teardown(d->arch.altp2m_p2m[i], true);
- }
-
- /* Destroy nestedp2m's first */
- for (i = 0; i < MAX_NESTEDP2M; i++) {
-- p2m_teardown(d->arch.nested_p2m[i]);
-+ p2m_teardown(d->arch.nested_p2m[i], true);
- }
-
- if ( d->arch.paging.hap.total_pages != 0 )
- hap_teardown(d, NULL);
-
-- p2m_teardown(p2m_get_hostp2m(d));
-+ p2m_teardown(p2m_get_hostp2m(d), true);
- /* Free any memory that the p2m teardown released */
- paging_lock(d);
- hap_set_allocation(d, 0, NULL);
-diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
-index def1695cf00b..aba4f17cbe12 100644
---- a/xen/arch/x86/mm/p2m.c
-+++ b/xen/arch/x86/mm/p2m.c
-@@ -749,11 +749,11 @@ int p2m_alloc_table(struct p2m_domain *p2m)
- * hvm fixme: when adding support for pvh non-hardware domains, this path must
- * cleanup any foreign p2m types (release refcnts on them).
- */
--void p2m_teardown(struct p2m_domain *p2m)
-+void p2m_teardown(struct p2m_domain *p2m, bool remove_root)
- /* Return all the p2m pages to Xen.
- * We know we don't have any extra mappings to these pages */
- {
-- struct page_info *pg;
-+ struct page_info *pg, *root_pg = NULL;
- struct domain *d;
-
- if (p2m == NULL)
-@@ -763,10 +763,22 @@ void p2m_teardown(struct p2m_domain *p2m)
-
- p2m_lock(p2m);
- ASSERT(atomic_read(&d->shr_pages) == 0);
-- p2m->phys_table = pagetable_null();
-+
-+ if ( remove_root )
-+ p2m->phys_table = pagetable_null();
-+ else if ( !pagetable_is_null(p2m->phys_table) )
-+ {
-+ root_pg = pagetable_get_page(p2m->phys_table);
-+ clear_domain_page(pagetable_get_mfn(p2m->phys_table));
-+ }
-
- while ( (pg = page_list_remove_head(&p2m->pages)) )
-- d->arch.paging.free_page(d, pg);
-+ if ( pg != root_pg )
-+ d->arch.paging.free_page(d, pg);
-+
-+ if ( root_pg )
-+ page_list_add(root_pg, &p2m->pages);
-+
- p2m_unlock(p2m);
- }
-
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 8c1b041f7135..8c5baba9544d 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -2701,7 +2701,7 @@ int shadow_enable(struct domain *d, u32 mode)
- paging_unlock(d);
- out_unlocked:
- if ( rv != 0 && !pagetable_is_null(p2m_get_pagetable(p2m)) )
-- p2m_teardown(p2m);
-+ p2m_teardown(p2m, true);
- if ( rv != 0 && pg != NULL )
- {
- pg->count_info &= ~PGC_count_mask;
-@@ -2866,7 +2866,7 @@ void shadow_final_teardown(struct domain *d)
- shadow_teardown(d, NULL);
-
- /* It is now safe to pull down the p2m map. */
-- p2m_teardown(p2m_get_hostp2m(d));
-+ p2m_teardown(p2m_get_hostp2m(d), true);
- /* Free any shadow memory that the p2m teardown released */
- paging_lock(d);
- shadow_set_allocation(d, 0, NULL);
-diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
-index f2af7a746ced..c3c16748e7d5 100644
---- a/xen/include/asm-x86/p2m.h
-+++ b/xen/include/asm-x86/p2m.h
-@@ -574,7 +574,7 @@ int p2m_init(struct domain *d);
- int p2m_alloc_table(struct p2m_domain *p2m);
-
- /* Return all the p2m resources to Xen. */
--void p2m_teardown(struct p2m_domain *p2m);
-+void p2m_teardown(struct p2m_domain *p2m, bool remove_root);
- void p2m_final_teardown(struct domain *d);
-
- /* Add a page to a domain's p2m table */
---
-2.37.4
-
diff --git a/0005-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch b/0005-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
new file mode 100644
index 0000000..ff66a43
--- /dev/null
+++ b/0005-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
@@ -0,0 +1,50 @@
+From 998c03b2abfbf17ff96bccad1512de1ea18d0d75 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 7 Feb 2023 17:03:51 +0100
+Subject: [PATCH 05/61] tools: Fix build with recent QEMU, use
+ "--enable-trace-backends"
+
+The configure option "--enable-trace-backend" isn't accepted anymore
+and we should use "--enable-trace-backends" instead, which was
+introduced in 2014 and allows multiple backends.
+
+"--enable-trace-backends" was introduced by:
+ 5b808275f3bb ("trace: Multi-backend tracing")
+The backward-compatible option "--enable-trace-backend" was removed by
+ 10229ec3b0ff ("configure: remove backwards-compatibility and obsolete options")
+
+As we already use ./configure options that wouldn't be accepted by
+older versions of QEMU's configure, we will simply use the new spelling
+for the option and avoid trying to detect which spelling to use.
+
+We already make use of "--firmwarepath=" which was introduced by
+ 3d5eecab4a5a ("Add --firmwarepath to configure")
+which already includes the new spelling for "--enable-trace-backends".
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
+master commit: e66d450b6e0ffec635639df993ab43ce28b3383f
+master date: 2023-01-11 10:45:29 +0100
+---
+ tools/Makefile | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/tools/Makefile b/tools/Makefile
+index 757a560be0..9b6b605ec9 100644
+--- a/tools/Makefile
++++ b/tools/Makefile
+@@ -218,9 +218,9 @@ subdir-all-qemu-xen-dir: qemu-xen-dir-find
+ mkdir -p qemu-xen-build; \
+ cd qemu-xen-build; \
+ if $$source/scripts/tracetool.py --check-backend --backend log ; then \
+- enable_trace_backend='--enable-trace-backend=log'; \
++ enable_trace_backend="--enable-trace-backends=log"; \
+ elif $$source/scripts/tracetool.py --check-backend --backend stderr ; then \
+- enable_trace_backend='--enable-trace-backend=stderr'; \
++ enable_trace_backend='--enable-trace-backends=stderr'; \
+ else \
+ enable_trace_backend='' ; \
+ fi ; \
+--
+2.40.0
+
diff --git a/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch b/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch
deleted file mode 100644
index 0f48084..0000000
--- a/0005-x86-HAP-adjust-monitor-table-related-error-handling.patch
+++ /dev/null
@@ -1,77 +0,0 @@
-From 3422c19d85a3d23a9d798eafb739ffb8865522d2 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 11 Oct 2022 14:52:59 +0200
-Subject: [PATCH 05/87] x86/HAP: adjust monitor table related error handling
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-hap_make_monitor_table() will return INVALID_MFN if it encounters an
-error condition, but hap_update_paging_modes() wasn’t handling this
-value, resulting in an inappropriate value being stored in
-monitor_table. This would subsequently misguide at least
-hap_vcpu_teardown(). Avoid this by bailing early.
-
-Further, when a domain has/was already crashed or (perhaps less
-important as there's no such path known to lead here) is already dying,
-avoid calling domain_crash() on it again - that's at best confusing.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 5b44a61180f4f2e4f490a28400c884dd357ff45d
-master date: 2022-10-11 14:21:56 +0200
----
- xen/arch/x86/mm/hap/hap.c | 14 ++++++++++++--
- 1 file changed, 12 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index a8f5a19da917..d75dc2b9ed3d 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -39,6 +39,7 @@
- #include <asm/domain.h>
- #include <xen/numa.h>
- #include <asm/hvm/nestedhvm.h>
-+#include <public/sched.h>
-
- #include "private.h"
-
-@@ -405,8 +406,13 @@ static mfn_t hap_make_monitor_table(struct vcpu *v)
- return m4mfn;
-
- oom:
-- printk(XENLOG_G_ERR "out of memory building monitor pagetable\n");
-- domain_crash(d);
-+ if ( !d->is_dying &&
-+ (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
-+ {
-+ printk(XENLOG_G_ERR "%pd: out of memory building monitor pagetable\n",
-+ d);
-+ domain_crash(d);
-+ }
- return INVALID_MFN;
- }
-
-@@ -766,6 +772,9 @@ static void hap_update_paging_modes(struct vcpu *v)
- if ( pagetable_is_null(v->arch.hvm.monitor_table) )
- {
- mfn_t mmfn = hap_make_monitor_table(v);
-+
-+ if ( mfn_eq(mmfn, INVALID_MFN) )
-+ goto unlock;
- v->arch.hvm.monitor_table = pagetable_from_mfn(mmfn);
- make_cr3(v, mmfn);
- hvm_update_host_cr3(v);
-@@ -774,6 +783,7 @@ static void hap_update_paging_modes(struct vcpu *v)
- /* CR3 is effectively updated by a mode change. Flush ASIDs, etc. */
- hap_update_cr3(v, 0, false);
-
-+ unlock:
- paging_unlock(d);
- put_gfn(d, cr3_gfn);
- }
---
-2.37.4
-
diff --git a/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch b/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch
deleted file mode 100644
index b9439ca..0000000
--- a/0006-x86-shadow-tolerate-failure-of-sh_set_toplevel_shado.patch
+++ /dev/null
@@ -1,76 +0,0 @@
-From 40e9daf6b56ae49bda3ba4e254ccf0e998e52a8c Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 11 Oct 2022 14:53:12 +0200
-Subject: [PATCH 06/87] x86/shadow: tolerate failure of
- sh_set_toplevel_shadow()
-
-Subsequently sh_set_toplevel_shadow() will be adjusted to install a
-blank entry in case prealloc fails. There are, in fact, pre-existing
-error paths which would put in place a blank entry. The 4- and 2-level
-code in sh_update_cr3(), however, assume the top level entry to be
-valid.
-
-Hence bail from the function in the unlikely event that it's not. Note
-that 3-level logic works differently: In particular a guest is free to
-supply a PDPTR pointing at 4 non-present (or otherwise deemed invalid)
-entries. The guest will crash, but we already cope with that.
-
-Really mfn_valid() is likely wrong to use in sh_set_toplevel_shadow(),
-and it should instead be !mfn_eq(gmfn, INVALID_MFN). Avoid such a change
-in security context, but add a respective assertion.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Tim Deegan <tim@xen.org>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: eac000978c1feb5a9ee3236ab0c0da9a477e5336
-master date: 2022-10-11 14:22:24 +0200
----
- xen/arch/x86/mm/shadow/common.c | 1 +
- xen/arch/x86/mm/shadow/multi.c | 10 ++++++++++
- 2 files changed, 11 insertions(+)
-
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 8c5baba9544d..00e520cbd05b 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -2516,6 +2516,7 @@ void sh_set_toplevel_shadow(struct vcpu *v,
- /* Now figure out the new contents: is this a valid guest MFN? */
- if ( !mfn_valid(gmfn) )
- {
-+ ASSERT(mfn_eq(gmfn, INVALID_MFN));
- new_entry = pagetable_null();
- goto install_new_entry;
- }
-diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
-index 7b8f4dd13b03..2ff78fe3362c 100644
---- a/xen/arch/x86/mm/shadow/multi.c
-+++ b/xen/arch/x86/mm/shadow/multi.c
-@@ -3312,6 +3312,11 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
- if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
- guest_flush_tlb_mask(d, d->dirty_cpumask);
- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
-+ if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
-+ {
-+ ASSERT(d->is_dying || d->is_shutting_down);
-+ return;
-+ }
- if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
- {
- mfn_t smfn = pagetable_get_mfn(v->arch.paging.shadow.shadow_table[0]);
-@@ -3370,6 +3375,11 @@ sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
- if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
- guest_flush_tlb_mask(d, d->dirty_cpumask);
- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
-+ if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
-+ {
-+ ASSERT(d->is_dying || d->is_shutting_down);
-+ return;
-+ }
- #else
- #error This should never happen
- #endif
---
-2.37.4
-
diff --git a/0006-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch b/0006-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
new file mode 100644
index 0000000..c010110
--- /dev/null
+++ b/0006-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
@@ -0,0 +1,342 @@
+From 401e9e33a04c2a9887636ef58490c764543f0538 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 7 Feb 2023 17:04:18 +0100
+Subject: [PATCH 06/61] x86/vmx: Calculate model-specific LBRs once at start of
+ day
+
+There is no point repeating this calculation at runtime, especially as it is
+in the fallback path of the WRMSR/RDMSR handlers.
+
+Move the infrastructure higher in vmx.c to avoid forward declarations,
+renaming last_branch_msr_get() to get_model_specific_lbr() to highlight that
+these are model-specific only.
+
+No practical change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: e94af0d58f86c3a914b9cbbf4d9ed3d43b974771
+master date: 2023-01-12 18:42:00 +0000
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 276 +++++++++++++++++++------------------
+ 1 file changed, 139 insertions(+), 137 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index 3f42765313..bc308d9df2 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -394,6 +394,142 @@ void vmx_pi_hooks_deassign(struct domain *d)
+ domain_unpause(d);
+ }
+
++static const struct lbr_info {
++ u32 base, count;
++} p4_lbr[] = {
++ { MSR_P4_LER_FROM_LIP, 1 },
++ { MSR_P4_LER_TO_LIP, 1 },
++ { MSR_P4_LASTBRANCH_TOS, 1 },
++ { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
++ { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
++ { 0, 0 }
++}, c2_lbr[] = {
++ { MSR_IA32_LASTINTFROMIP, 1 },
++ { MSR_IA32_LASTINTTOIP, 1 },
++ { MSR_C2_LASTBRANCH_TOS, 1 },
++ { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
++ { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
++ { 0, 0 }
++}, nh_lbr[] = {
++ { MSR_IA32_LASTINTFROMIP, 1 },
++ { MSR_IA32_LASTINTTOIP, 1 },
++ { MSR_NHL_LBR_SELECT, 1 },
++ { MSR_NHL_LASTBRANCH_TOS, 1 },
++ { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
++ { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
++ { 0, 0 }
++}, sk_lbr[] = {
++ { MSR_IA32_LASTINTFROMIP, 1 },
++ { MSR_IA32_LASTINTTOIP, 1 },
++ { MSR_NHL_LBR_SELECT, 1 },
++ { MSR_NHL_LASTBRANCH_TOS, 1 },
++ { MSR_SKL_LASTBRANCH_0_FROM_IP, NUM_MSR_SKL_LASTBRANCH },
++ { MSR_SKL_LASTBRANCH_0_TO_IP, NUM_MSR_SKL_LASTBRANCH },
++ { MSR_SKL_LASTBRANCH_0_INFO, NUM_MSR_SKL_LASTBRANCH },
++ { 0, 0 }
++}, at_lbr[] = {
++ { MSR_IA32_LASTINTFROMIP, 1 },
++ { MSR_IA32_LASTINTTOIP, 1 },
++ { MSR_C2_LASTBRANCH_TOS, 1 },
++ { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
++ { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
++ { 0, 0 }
++}, sm_lbr[] = {
++ { MSR_IA32_LASTINTFROMIP, 1 },
++ { MSR_IA32_LASTINTTOIP, 1 },
++ { MSR_SM_LBR_SELECT, 1 },
++ { MSR_SM_LASTBRANCH_TOS, 1 },
++ { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
++ { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
++ { 0, 0 }
++}, gm_lbr[] = {
++ { MSR_IA32_LASTINTFROMIP, 1 },
++ { MSR_IA32_LASTINTTOIP, 1 },
++ { MSR_SM_LBR_SELECT, 1 },
++ { MSR_SM_LASTBRANCH_TOS, 1 },
++ { MSR_GM_LASTBRANCH_0_FROM_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
++ { MSR_GM_LASTBRANCH_0_TO_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
++ { 0, 0 }
++};
++static const struct lbr_info *__read_mostly model_specific_lbr;
++
++static const struct lbr_info *__init get_model_specific_lbr(void)
++{
++ switch ( boot_cpu_data.x86 )
++ {
++ case 6:
++ switch ( boot_cpu_data.x86_model )
++ {
++ /* Core2 Duo */
++ case 0x0f:
++ /* Enhanced Core */
++ case 0x17:
++ /* Xeon 7400 */
++ case 0x1d:
++ return c2_lbr;
++ /* Nehalem */
++ case 0x1a: case 0x1e: case 0x1f: case 0x2e:
++ /* Westmere */
++ case 0x25: case 0x2c: case 0x2f:
++ /* Sandy Bridge */
++ case 0x2a: case 0x2d:
++ /* Ivy Bridge */
++ case 0x3a: case 0x3e:
++ /* Haswell */
++ case 0x3c: case 0x3f: case 0x45: case 0x46:
++ /* Broadwell */
++ case 0x3d: case 0x47: case 0x4f: case 0x56:
++ return nh_lbr;
++ /* Skylake */
++ case 0x4e: case 0x5e:
++ /* Xeon Scalable */
++ case 0x55:
++ /* Cannon Lake */
++ case 0x66:
++ /* Goldmont Plus */
++ case 0x7a:
++ /* Ice Lake */
++ case 0x6a: case 0x6c: case 0x7d: case 0x7e:
++ /* Tiger Lake */
++ case 0x8c: case 0x8d:
++ /* Tremont */
++ case 0x86:
++ /* Kaby Lake */
++ case 0x8e: case 0x9e:
++ /* Comet Lake */
++ case 0xa5: case 0xa6:
++ return sk_lbr;
++ /* Atom */
++ case 0x1c: case 0x26: case 0x27: case 0x35: case 0x36:
++ return at_lbr;
++ /* Silvermont */
++ case 0x37: case 0x4a: case 0x4d: case 0x5a: case 0x5d:
++ /* Xeon Phi Knights Landing */
++ case 0x57:
++ /* Xeon Phi Knights Mill */
++ case 0x85:
++ /* Airmont */
++ case 0x4c:
++ return sm_lbr;
++ /* Goldmont */
++ case 0x5c: case 0x5f:
++ return gm_lbr;
++ }
++ break;
++
++ case 15:
++ switch ( boot_cpu_data.x86_model )
++ {
++ /* Pentium4/Xeon with em64t */
++ case 3: case 4: case 6:
++ return p4_lbr;
++ }
++ break;
++ }
++
++ return NULL;
++}
++
+ static int vmx_domain_initialise(struct domain *d)
+ {
+ static const struct arch_csw csw = {
+@@ -2812,6 +2948,7 @@ const struct hvm_function_table * __init start_vmx(void)
+ vmx_function_table.get_guest_bndcfgs = vmx_get_guest_bndcfgs;
+ }
+
++ model_specific_lbr = get_model_specific_lbr();
+ lbr_tsx_fixup_check();
+ ler_to_fixup_check();
+
+@@ -2958,141 +3095,6 @@ static int vmx_cr_access(cr_access_qual_t qual)
+ return X86EMUL_OKAY;
+ }
+
+-static const struct lbr_info {
+- u32 base, count;
+-} p4_lbr[] = {
+- { MSR_P4_LER_FROM_LIP, 1 },
+- { MSR_P4_LER_TO_LIP, 1 },
+- { MSR_P4_LASTBRANCH_TOS, 1 },
+- { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
+- { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
+- { 0, 0 }
+-}, c2_lbr[] = {
+- { MSR_IA32_LASTINTFROMIP, 1 },
+- { MSR_IA32_LASTINTTOIP, 1 },
+- { MSR_C2_LASTBRANCH_TOS, 1 },
+- { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
+- { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
+- { 0, 0 }
+-}, nh_lbr[] = {
+- { MSR_IA32_LASTINTFROMIP, 1 },
+- { MSR_IA32_LASTINTTOIP, 1 },
+- { MSR_NHL_LBR_SELECT, 1 },
+- { MSR_NHL_LASTBRANCH_TOS, 1 },
+- { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
+- { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
+- { 0, 0 }
+-}, sk_lbr[] = {
+- { MSR_IA32_LASTINTFROMIP, 1 },
+- { MSR_IA32_LASTINTTOIP, 1 },
+- { MSR_NHL_LBR_SELECT, 1 },
+- { MSR_NHL_LASTBRANCH_TOS, 1 },
+- { MSR_SKL_LASTBRANCH_0_FROM_IP, NUM_MSR_SKL_LASTBRANCH },
+- { MSR_SKL_LASTBRANCH_0_TO_IP, NUM_MSR_SKL_LASTBRANCH },
+- { MSR_SKL_LASTBRANCH_0_INFO, NUM_MSR_SKL_LASTBRANCH },
+- { 0, 0 }
+-}, at_lbr[] = {
+- { MSR_IA32_LASTINTFROMIP, 1 },
+- { MSR_IA32_LASTINTTOIP, 1 },
+- { MSR_C2_LASTBRANCH_TOS, 1 },
+- { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
+- { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
+- { 0, 0 }
+-}, sm_lbr[] = {
+- { MSR_IA32_LASTINTFROMIP, 1 },
+- { MSR_IA32_LASTINTTOIP, 1 },
+- { MSR_SM_LBR_SELECT, 1 },
+- { MSR_SM_LASTBRANCH_TOS, 1 },
+- { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
+- { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
+- { 0, 0 }
+-}, gm_lbr[] = {
+- { MSR_IA32_LASTINTFROMIP, 1 },
+- { MSR_IA32_LASTINTTOIP, 1 },
+- { MSR_SM_LBR_SELECT, 1 },
+- { MSR_SM_LASTBRANCH_TOS, 1 },
+- { MSR_GM_LASTBRANCH_0_FROM_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
+- { MSR_GM_LASTBRANCH_0_TO_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
+- { 0, 0 }
+-};
+-
+-static const struct lbr_info *last_branch_msr_get(void)
+-{
+- switch ( boot_cpu_data.x86 )
+- {
+- case 6:
+- switch ( boot_cpu_data.x86_model )
+- {
+- /* Core2 Duo */
+- case 0x0f:
+- /* Enhanced Core */
+- case 0x17:
+- /* Xeon 7400 */
+- case 0x1d:
+- return c2_lbr;
+- /* Nehalem */
+- case 0x1a: case 0x1e: case 0x1f: case 0x2e:
+- /* Westmere */
+- case 0x25: case 0x2c: case 0x2f:
+- /* Sandy Bridge */
+- case 0x2a: case 0x2d:
+- /* Ivy Bridge */
+- case 0x3a: case 0x3e:
+- /* Haswell */
+- case 0x3c: case 0x3f: case 0x45: case 0x46:
+- /* Broadwell */
+- case 0x3d: case 0x47: case 0x4f: case 0x56:
+- return nh_lbr;
+- /* Skylake */
+- case 0x4e: case 0x5e:
+- /* Xeon Scalable */
+- case 0x55:
+- /* Cannon Lake */
+- case 0x66:
+- /* Goldmont Plus */
+- case 0x7a:
+- /* Ice Lake */
+- case 0x6a: case 0x6c: case 0x7d: case 0x7e:
+- /* Tiger Lake */
+- case 0x8c: case 0x8d:
+- /* Tremont */
+- case 0x86:
+- /* Kaby Lake */
+- case 0x8e: case 0x9e:
+- /* Comet Lake */
+- case 0xa5: case 0xa6:
+- return sk_lbr;
+- /* Atom */
+- case 0x1c: case 0x26: case 0x27: case 0x35: case 0x36:
+- return at_lbr;
+- /* Silvermont */
+- case 0x37: case 0x4a: case 0x4d: case 0x5a: case 0x5d:
+- /* Xeon Phi Knights Landing */
+- case 0x57:
+- /* Xeon Phi Knights Mill */
+- case 0x85:
+- /* Airmont */
+- case 0x4c:
+- return sm_lbr;
+- /* Goldmont */
+- case 0x5c: case 0x5f:
+- return gm_lbr;
+- }
+- break;
+-
+- case 15:
+- switch ( boot_cpu_data.x86_model )
+- {
+- /* Pentium4/Xeon with em64t */
+- case 3: case 4: case 6:
+- return p4_lbr;
+- }
+- break;
+- }
+-
+- return NULL;
+-}
+-
+ enum
+ {
+ LBR_FORMAT_32 = 0x0, /* 32-bit record format */
+@@ -3199,7 +3201,7 @@ static void __init ler_to_fixup_check(void)
+
+ static int is_last_branch_msr(u32 ecx)
+ {
+- const struct lbr_info *lbr = last_branch_msr_get();
++ const struct lbr_info *lbr = model_specific_lbr;
+
+ if ( lbr == NULL )
+ return 0;
+@@ -3536,7 +3538,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+ if ( !(v->arch.hvm.vmx.lbr_flags & LBR_MSRS_INSERTED) &&
+ (msr_content & IA32_DEBUGCTLMSR_LBR) )
+ {
+- const struct lbr_info *lbr = last_branch_msr_get();
++ const struct lbr_info *lbr = model_specific_lbr;
+
+ if ( unlikely(!lbr) )
+ {
+--
+2.40.0
+
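The pattern in the patch above, resolving the model-dependent LBR table once at start of day so that the hot MSR-intercept paths only read a cached pointer, can be sketched in plain C as follows. The table contents, the family/model match in the probe, and the function names are illustrative stand-ins rather than the exact Xen symbols.

#include <stddef.h>
#include <stdint.h>

struct lbr_info { uint32_t base, count; };

/* Hypothetical per-model table; real contents come from the CPU documentation. */
static const struct lbr_info example_lbr[] = { { 0x1db, 1 }, { 0, 0 } };

/* Cached result of the model probe; written once, read from hot paths. */
static const struct lbr_info *model_specific_lbr;

/* Illustrative probe: map family/model to a table, or NULL if unknown. */
static const struct lbr_info *get_model_specific_lbr(uint8_t family, uint8_t model)
{
    if ( family == 6 && model == 0x55 )   /* placeholder match */
        return example_lbr;

    return NULL;
}

/* Called once at start of day ... */
void lbr_init(uint8_t family, uint8_t model)
{
    model_specific_lbr = get_model_specific_lbr(family, model);
}

/* ... so the MSR-intercept fallback path only dereferences the cached pointer. */
int is_last_branch_msr(uint32_t msr)
{
    const struct lbr_info *lbr = model_specific_lbr;

    for ( ; lbr && lbr->count; lbr++ )
        if ( msr >= lbr->base && msr < lbr->base + lbr->count )
            return 1;

    return 0;
}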
diff --git a/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch b/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch
deleted file mode 100644
index d288a0b..0000000
--- a/0007-x86-shadow-tolerate-failure-in-shadow_prealloc.patch
+++ /dev/null
@@ -1,279 +0,0 @@
-From 28d3f677ec97c98154311f64871ac48762cf980a Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 11 Oct 2022 14:53:27 +0200
-Subject: [PATCH 07/87] x86/shadow: tolerate failure in shadow_prealloc()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Prevent _shadow_prealloc() from calling BUG() when unable to fulfill
-the pre-allocation and instead return true/false. Modify
-shadow_prealloc() to crash the domain on allocation failure (if the
-domain is not already dying), as shadow cannot operate normally after
-that. Modify callers to also gracefully handle {_,}shadow_prealloc()
-failing to fulfill the request.
-
-Note this in turn requires adjusting the callers of
-sh_make_monitor_table() also to handle it returning INVALID_MFN.
-sh_update_paging_modes() is also modified to add additional error
-paths in case of allocation failure, some of those will return with
-null monitor page tables (and the domain likely crashed). This is no
-different that current error paths, but the newly introduced ones are
-more likely to trigger.
-
-The now added failure points in sh_update_paging_modes() also require
-that on some error return paths the previous structures are cleared,
-and thus monitor table is null.
-
-While there adjust the 'type' parameter type of shadow_prealloc() to
-unsigned int rather than u32.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Tim Deegan <tim@xen.org>
-master commit: b7f93c6afb12b6061e2d19de2f39ea09b569ac68
-master date: 2022-10-11 14:22:53 +0200
----
- xen/arch/x86/mm/shadow/common.c | 69 ++++++++++++++++++++++++--------
- xen/arch/x86/mm/shadow/hvm.c | 4 +-
- xen/arch/x86/mm/shadow/multi.c | 11 +++--
- xen/arch/x86/mm/shadow/private.h | 3 +-
- 4 files changed, 66 insertions(+), 21 deletions(-)
-
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 00e520cbd05b..2067c7d16bb4 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -36,6 +36,7 @@
- #include <asm/flushtlb.h>
- #include <asm/shadow.h>
- #include <xen/numa.h>
-+#include <public/sched.h>
- #include "private.h"
-
- DEFINE_PER_CPU(uint32_t,trace_shadow_path_flags);
-@@ -928,14 +929,15 @@ static inline void trace_shadow_prealloc_unpin(struct domain *d, mfn_t smfn)
-
- /* Make sure there are at least count order-sized pages
- * available in the shadow page pool. */
--static void _shadow_prealloc(struct domain *d, unsigned int pages)
-+static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
- {
- struct vcpu *v;
- struct page_info *sp, *t;
- mfn_t smfn;
- int i;
-
-- if ( d->arch.paging.shadow.free_pages >= pages ) return;
-+ if ( d->arch.paging.shadow.free_pages >= pages )
-+ return true;
-
- /* Shouldn't have enabled shadows if we've no vcpus. */
- ASSERT(d->vcpu && d->vcpu[0]);
-@@ -951,7 +953,8 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
- sh_unpin(d, smfn);
-
- /* See if that freed up enough space */
-- if ( d->arch.paging.shadow.free_pages >= pages ) return;
-+ if ( d->arch.paging.shadow.free_pages >= pages )
-+ return true;
- }
-
- /* Stage two: all shadow pages are in use in hierarchies that are
-@@ -974,7 +977,7 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
- if ( d->arch.paging.shadow.free_pages >= pages )
- {
- guest_flush_tlb_mask(d, d->dirty_cpumask);
-- return;
-+ return true;
- }
- }
- }
-@@ -987,7 +990,12 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
- d->arch.paging.shadow.total_pages,
- d->arch.paging.shadow.free_pages,
- d->arch.paging.shadow.p2m_pages);
-- BUG();
-+
-+ ASSERT(d->is_dying);
-+
-+ guest_flush_tlb_mask(d, d->dirty_cpumask);
-+
-+ return false;
- }
-
- /* Make sure there are at least count pages of the order according to
-@@ -995,9 +1003,19 @@ static void _shadow_prealloc(struct domain *d, unsigned int pages)
- * This must be called before any calls to shadow_alloc(). Since this
- * will free existing shadows to make room, it must be called early enough
- * to avoid freeing shadows that the caller is currently working on. */
--void shadow_prealloc(struct domain *d, u32 type, unsigned int count)
-+bool shadow_prealloc(struct domain *d, unsigned int type, unsigned int count)
- {
-- return _shadow_prealloc(d, shadow_size(type) * count);
-+ bool ret = _shadow_prealloc(d, shadow_size(type) * count);
-+
-+ if ( !ret && !d->is_dying &&
-+ (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
-+ /*
-+ * Failing to allocate memory required for shadow usage can only result in
-+ * a domain crash, do it here rather that relying on every caller to do it.
-+ */
-+ domain_crash(d);
-+
-+ return ret;
- }
-
- /* Deliberately free all the memory we can: this will tear down all of
-@@ -1218,7 +1236,7 @@ void shadow_free(struct domain *d, mfn_t smfn)
- static struct page_info *
- shadow_alloc_p2m_page(struct domain *d)
- {
-- struct page_info *pg;
-+ struct page_info *pg = NULL;
-
- /* This is called both from the p2m code (which never holds the
- * paging lock) and the log-dirty code (which always does). */
-@@ -1236,16 +1254,18 @@ shadow_alloc_p2m_page(struct domain *d)
- d->arch.paging.shadow.p2m_pages,
- shadow_min_acceptable_pages(d));
- }
-- paging_unlock(d);
-- return NULL;
-+ goto out;
- }
-
-- shadow_prealloc(d, SH_type_p2m_table, 1);
-+ if ( !shadow_prealloc(d, SH_type_p2m_table, 1) )
-+ goto out;
-+
- pg = mfn_to_page(shadow_alloc(d, SH_type_p2m_table, 0));
- d->arch.paging.shadow.p2m_pages++;
- d->arch.paging.shadow.total_pages--;
- ASSERT(!page_get_owner(pg) && !(pg->count_info & PGC_count_mask));
-
-+ out:
- paging_unlock(d);
-
- return pg;
-@@ -1336,7 +1356,9 @@ int shadow_set_allocation(struct domain *d, unsigned int pages, bool *preempted)
- else if ( d->arch.paging.shadow.total_pages > pages )
- {
- /* Need to return memory to domheap */
-- _shadow_prealloc(d, 1);
-+ if ( !_shadow_prealloc(d, 1) )
-+ return -ENOMEM;
-+
- sp = page_list_remove_head(&d->arch.paging.shadow.freelist);
- ASSERT(sp);
- /*
-@@ -2334,12 +2356,13 @@ static void sh_update_paging_modes(struct vcpu *v)
- if ( mfn_eq(v->arch.paging.shadow.oos_snapshot[0], INVALID_MFN) )
- {
- int i;
-+
-+ if ( !shadow_prealloc(d, SH_type_oos_snapshot, SHADOW_OOS_PAGES) )
-+ return;
-+
- for(i = 0; i < SHADOW_OOS_PAGES; i++)
-- {
-- shadow_prealloc(d, SH_type_oos_snapshot, 1);
- v->arch.paging.shadow.oos_snapshot[i] =
- shadow_alloc(d, SH_type_oos_snapshot, 0);
-- }
- }
- #endif /* OOS */
-
-@@ -2403,6 +2426,9 @@ static void sh_update_paging_modes(struct vcpu *v)
- mfn_t mmfn = sh_make_monitor_table(
- v, v->arch.paging.mode->shadow.shadow_levels);
-
-+ if ( mfn_eq(mmfn, INVALID_MFN) )
-+ return;
-+
- v->arch.hvm.monitor_table = pagetable_from_mfn(mmfn);
- make_cr3(v, mmfn);
- hvm_update_host_cr3(v);
-@@ -2441,6 +2467,12 @@ static void sh_update_paging_modes(struct vcpu *v)
- v->arch.hvm.monitor_table = pagetable_null();
- new_mfn = sh_make_monitor_table(
- v, v->arch.paging.mode->shadow.shadow_levels);
-+ if ( mfn_eq(new_mfn, INVALID_MFN) )
-+ {
-+ sh_destroy_monitor_table(v, old_mfn,
-+ old_mode->shadow.shadow_levels);
-+ return;
-+ }
- v->arch.hvm.monitor_table = pagetable_from_mfn(new_mfn);
- SHADOW_PRINTK("new monitor table %"PRI_mfn "\n",
- mfn_x(new_mfn));
-@@ -2526,7 +2558,12 @@ void sh_set_toplevel_shadow(struct vcpu *v,
- if ( !mfn_valid(smfn) )
- {
- /* Make sure there's enough free shadow memory. */
-- shadow_prealloc(d, root_type, 1);
-+ if ( !shadow_prealloc(d, root_type, 1) )
-+ {
-+ new_entry = pagetable_null();
-+ goto install_new_entry;
-+ }
-+
- /* Shadow the page. */
- smfn = make_shadow(v, gmfn, root_type);
- }
-diff --git a/xen/arch/x86/mm/shadow/hvm.c b/xen/arch/x86/mm/shadow/hvm.c
-index d5f42102a0bd..a0878d9ad71a 100644
---- a/xen/arch/x86/mm/shadow/hvm.c
-+++ b/xen/arch/x86/mm/shadow/hvm.c
-@@ -700,7 +700,9 @@ mfn_t sh_make_monitor_table(const struct vcpu *v, unsigned int shadow_levels)
- ASSERT(!pagetable_get_pfn(v->arch.hvm.monitor_table));
-
- /* Guarantee we can get the memory we need */
-- shadow_prealloc(d, SH_type_monitor_table, CONFIG_PAGING_LEVELS);
-+ if ( !shadow_prealloc(d, SH_type_monitor_table, CONFIG_PAGING_LEVELS) )
-+ return INVALID_MFN;
-+
- m4mfn = shadow_alloc(d, SH_type_monitor_table, 0);
- mfn_to_page(m4mfn)->shadow_flags = 4;
-
-diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
-index 2ff78fe3362c..c07af0bd99da 100644
---- a/xen/arch/x86/mm/shadow/multi.c
-+++ b/xen/arch/x86/mm/shadow/multi.c
-@@ -2440,9 +2440,14 @@ static int sh_page_fault(struct vcpu *v,
- * Preallocate shadow pages *before* removing writable accesses
- * otherwhise an OOS L1 might be demoted and promoted again with
- * writable mappings. */
-- shadow_prealloc(d,
-- SH_type_l1_shadow,
-- GUEST_PAGING_LEVELS < 4 ? 1 : GUEST_PAGING_LEVELS - 1);
-+ if ( !shadow_prealloc(d, SH_type_l1_shadow,
-+ GUEST_PAGING_LEVELS < 4
-+ ? 1 : GUEST_PAGING_LEVELS - 1) )
-+ {
-+ paging_unlock(d);
-+ put_gfn(d, gfn_x(gfn));
-+ return 0;
-+ }
-
- rc = gw_remove_write_accesses(v, va, &gw);
-
-diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
-index 35efb1b984fb..738214f75e8d 100644
---- a/xen/arch/x86/mm/shadow/private.h
-+++ b/xen/arch/x86/mm/shadow/private.h
-@@ -383,7 +383,8 @@ void shadow_promote(struct domain *d, mfn_t gmfn, u32 type);
- void shadow_demote(struct domain *d, mfn_t gmfn, u32 type);
-
- /* Shadow page allocation functions */
--void shadow_prealloc(struct domain *d, u32 shadow_type, unsigned int count);
-+bool __must_check shadow_prealloc(struct domain *d, unsigned int shadow_type,
-+ unsigned int count);
- mfn_t shadow_alloc(struct domain *d,
- u32 shadow_type,
- unsigned long backpointer);
---
-2.37.4
-
diff --git a/0007-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch b/0007-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
new file mode 100644
index 0000000..fc81a17
--- /dev/null
+++ b/0007-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
@@ -0,0 +1,83 @@
+From 9f425039ca50e8cc8db350ec54d8a7cd4175f417 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 7 Feb 2023 17:04:49 +0100
+Subject: [PATCH 07/61] x86/vmx: Support for CPUs without model-specific LBR
+
+Ice Lake (server at least) has both architectural LBR and model-specific LBR.
+Sapphire Rapids does not have model-specific LBR at all. I.e. on SPR and
+later, model_specific_lbr will always be NULL, so we must make changes to
+avoid reliably hitting the domain_crash().
+
+The Arch LBR spec states that CPUs without model-specific LBR implement
+MSR_DBG_CTL.LBR by discarding writes and always returning 0.
+
+Do this for any CPU for which we lack model-specific LBR information.
+
+Adjust the now-stale comment, now that the Arch LBR spec has created a way to
+signal "no model specific LBR" to guests.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: 3edca52ce736297d7fcf293860cd94ef62638052
+master date: 2023-01-12 18:42:00 +0000
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 31 ++++++++++++++++---------------
+ 1 file changed, 16 insertions(+), 15 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index bc308d9df2..094141be9a 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -3518,18 +3518,26 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+ if ( msr_content & rsvd )
+ goto gp_fault;
+
++ /*
++ * The Arch LBR spec (new in Ice Lake) states that CPUs with no
++ * model-specific LBRs implement MSR_DBG_CTL.LBR by discarding writes
++ * and always returning 0.
++ *
++ * Use this property in all cases where we don't know any
++ * model-specific LBR information, as it matches real hardware
++ * behaviour on post-Ice Lake systems.
++ */
++ if ( !model_specific_lbr )
++ msr_content &= ~IA32_DEBUGCTLMSR_LBR;
++
+ /*
+ * When a guest first enables LBR, arrange to save and restore the LBR
+ * MSRs and allow the guest direct access.
+ *
+- * MSR_DEBUGCTL and LBR has existed almost as long as MSRs have
+- * existed, and there is no architectural way to hide the feature, or
+- * fail the attempt to enable LBR.
+- *
+- * Unknown host LBR MSRs or hitting -ENOSPC with the guest load/save
+- * list are definitely hypervisor bugs, whereas -ENOMEM for allocating
+- * the load/save list is simply unlucky (and shouldn't occur with
+- * sensible management by the toolstack).
++ * Hitting -ENOSPC with the guest load/save list is definitely a
++ * hypervisor bug, whereas -ENOMEM for allocating the load/save list
++ * is simply unlucky (and shouldn't occur with sensible management by
++ * the toolstack).
+ *
+ * Either way, there is nothing we can do right now to recover, and
+ * the guest won't execute correctly either. Simply crash the domain
+@@ -3540,13 +3548,6 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+ {
+ const struct lbr_info *lbr = model_specific_lbr;
+
+- if ( unlikely(!lbr) )
+- {
+- gprintk(XENLOG_ERR, "Unknown Host LBR MSRs\n");
+- domain_crash(v->domain);
+- return X86EMUL_OKAY;
+- }
+-
+ for ( ; lbr->count; lbr++ )
+ {
+ unsigned int i;
+--
+2.40.0
+
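The behaviour this patch adopts for CPUs with no known model-specific LBRs, silently dropping the LBR enable bit on writes to MSR_DEBUGCTL, reduces to a single mask in the write intercept. A tiny sketch of that idea; the constant and helper name here are assumptions for illustration, not the exact Xen definitions.

#include <stdint.h>

#define IA32_DEBUGCTLMSR_LBR (1u << 0)   /* DEBUGCTL bit 0: enable last-branch recording */

/*
 * Sketch of the write-intercept behaviour described above: with no
 * model-specific LBR table available, the LBR enable bit is discarded,
 * matching hardware that lacks model-specific LBRs (writes ignored,
 * reads of this bit return 0).
 */
static uint64_t filter_debugctl_write(uint64_t msr_content, const void *model_specific_lbr)
{
    if ( !model_specific_lbr )
        msr_content &= ~(uint64_t)IA32_DEBUGCTLMSR_LBR;

    return msr_content;
}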
diff --git a/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch b/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch
deleted file mode 100644
index d89d5b9..0000000
--- a/0008-x86-p2m-refuse-new-allocations-for-dying-domains.patch
+++ /dev/null
@@ -1,100 +0,0 @@
-From 745e0b300dc3f5000e6d48c273b405d4bcc29ba7 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 11 Oct 2022 14:53:41 +0200
-Subject: [PATCH 08/87] x86/p2m: refuse new allocations for dying domains
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-This will in particular prevent any attempts to add entries to the p2m,
-once - in a subsequent change - non-root entries have been removed.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Tim Deegan <tim@xen.org>
-master commit: ff600a8cf8e36f8ecbffecf96a035952e022ab87
-master date: 2022-10-11 14:23:22 +0200
----
- xen/arch/x86/mm/hap/hap.c | 5 ++++-
- xen/arch/x86/mm/shadow/common.c | 18 ++++++++++++++----
- 2 files changed, 18 insertions(+), 5 deletions(-)
-
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index d75dc2b9ed3d..787991233e53 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -245,6 +245,9 @@ static struct page_info *hap_alloc(struct domain *d)
-
- ASSERT(paging_locked_by_me(d));
-
-+ if ( unlikely(d->is_dying) )
-+ return NULL;
-+
- pg = page_list_remove_head(&d->arch.paging.hap.freelist);
- if ( unlikely(!pg) )
- return NULL;
-@@ -281,7 +284,7 @@ static struct page_info *hap_alloc_p2m_page(struct domain *d)
- d->arch.paging.hap.p2m_pages++;
- ASSERT(!page_get_owner(pg) && !(pg->count_info & PGC_count_mask));
- }
-- else if ( !d->arch.paging.p2m_alloc_failed )
-+ else if ( !d->arch.paging.p2m_alloc_failed && !d->is_dying )
- {
- d->arch.paging.p2m_alloc_failed = 1;
- dprintk(XENLOG_ERR, "d%i failed to allocate from HAP pool\n",
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 2067c7d16bb4..9807f6ec6c00 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -939,6 +939,10 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
- if ( d->arch.paging.shadow.free_pages >= pages )
- return true;
-
-+ if ( unlikely(d->is_dying) )
-+ /* No reclaim when the domain is dying, teardown will take care of it. */
-+ return false;
-+
- /* Shouldn't have enabled shadows if we've no vcpus. */
- ASSERT(d->vcpu && d->vcpu[0]);
-
-@@ -991,7 +995,7 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
- d->arch.paging.shadow.free_pages,
- d->arch.paging.shadow.p2m_pages);
-
-- ASSERT(d->is_dying);
-+ ASSERT_UNREACHABLE();
-
- guest_flush_tlb_mask(d, d->dirty_cpumask);
-
-@@ -1005,10 +1009,13 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
- * to avoid freeing shadows that the caller is currently working on. */
- bool shadow_prealloc(struct domain *d, unsigned int type, unsigned int count)
- {
-- bool ret = _shadow_prealloc(d, shadow_size(type) * count);
-+ bool ret;
-
-- if ( !ret && !d->is_dying &&
-- (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
-+ if ( unlikely(d->is_dying) )
-+ return false;
-+
-+ ret = _shadow_prealloc(d, shadow_size(type) * count);
-+ if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
- /*
- * Failing to allocate memory required for shadow usage can only result in
- * a domain crash, do it here rather that relying on every caller to do it.
-@@ -1238,6 +1245,9 @@ shadow_alloc_p2m_page(struct domain *d)
- {
- struct page_info *pg = NULL;
-
-+ if ( unlikely(d->is_dying) )
-+ return NULL;
-+
- /* This is called both from the p2m code (which never holds the
- * paging lock) and the log-dirty code (which always does). */
- paging_lock_recursive(d);
---
-2.37.4
-
diff --git a/0008-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch b/0008-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
new file mode 100644
index 0000000..ab7862b
--- /dev/null
+++ b/0008-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
@@ -0,0 +1,39 @@
+From 1550835b381a18fc0e972e5d04925e02fab31553 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 7 Feb 2023 17:05:22 +0100
+Subject: [PATCH 08/61] x86/shadow: fix PAE check for top-level table
+ unshadowing
+
+Clearly within the for_each_vcpu() the vCPU of this loop is meant, not
+the (loop invariant) one the fault occurred on.
+
+Fixes: 3d5e6a3ff383 ("x86 hvm: implement HVMOP_pagetable_dying")
+Fixes: ef3b0d8d2c39 ("x86/shadow: shadow_table[] needs only one entry for PV-only configs")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: f8fdceefbb1193ec81667eb40b83bc525cb71204
+master date: 2023-01-20 09:23:42 +0100
+---
+ xen/arch/x86/mm/shadow/multi.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
+index c07af0bd99..f7acd18a36 100644
+--- a/xen/arch/x86/mm/shadow/multi.c
++++ b/xen/arch/x86/mm/shadow/multi.c
+@@ -2665,10 +2665,10 @@ static int sh_page_fault(struct vcpu *v,
+ #if GUEST_PAGING_LEVELS == 3
+ unsigned int i;
+
+- for_each_shadow_table(v, i)
++ for_each_shadow_table(tmp, i)
+ {
+ mfn_t smfn = pagetable_get_mfn(
+- v->arch.paging.shadow.shadow_table[i]);
++ tmp->arch.paging.shadow.shadow_table[i]);
+
+ if ( mfn_valid(smfn) && (mfn_x(smfn) != 0) )
+ {
+--
+2.40.0
+
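The one-line nature of the fix can obscure the bug class: inside a loop over every vCPU, state was read from the loop-invariant faulting vCPU instead of the per-iteration one. A simplified C sketch of the corrected pattern, using stand-in types and field names rather than the hypervisor's structures:

#include <stddef.h>

struct sketch_vcpu { unsigned long shadow_table[4]; };

/* Walk every vCPU's own shadow_table[]; the inner access must use the
 * per-iteration vCPU (vcpus[i]), not the vCPU the fault occurred on. */
static int any_shadow_table_populated(const struct sketch_vcpu *vcpus, size_t nr_vcpus)
{
    for ( size_t i = 0; i < nr_vcpus; i++ )        /* for_each_shadow_table()-style loop */
        for ( size_t j = 0; j < 4; j++ )
            if ( vcpus[i].shadow_table[j] != 0 )
                return 1;

    return 0;
}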
diff --git a/0009-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch b/0009-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
new file mode 100644
index 0000000..83e46c7
--- /dev/null
+++ b/0009-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
@@ -0,0 +1,34 @@
+From 0fd9ad2b9c0c9d9c4879a566f1788d3e9cd38ef6 Mon Sep 17 00:00:00 2001
+From: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
+Date: Tue, 7 Feb 2023 17:05:56 +0100
+Subject: [PATCH 09/61] ns16550: fix an incorrect assignment to uart->io_size
+
+uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
+is assigned to it, it should be converted to size in bytes.
+
+Fixes: 17b516196c ("ns16550: add ACPI support for ARM only")
+Reported-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+master commit: 352c89f72ddb67b8d9d4e492203f8c77f85c8df1
+master date: 2023-01-24 16:54:38 +0100
+---
+ xen/drivers/char/ns16550.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
+index 2d2bd2a024..5dd4d723f5 100644
+--- a/xen/drivers/char/ns16550.c
++++ b/xen/drivers/char/ns16550.c
+@@ -1780,7 +1780,7 @@ static int __init ns16550_acpi_uart_init(const void *data)
+ uart->parity = spcr->parity;
+ uart->stop_bits = spcr->stop_bits;
+ uart->io_base = spcr->serial_port.address;
+- uart->io_size = spcr->serial_port.bit_width;
++ uart->io_size = DIV_ROUND_UP(spcr->serial_port.bit_width, BITS_PER_BYTE);
+ uart->reg_shift = spcr->serial_port.bit_offset;
+ uart->reg_width = spcr->serial_port.access_width;
+
+--
+2.40.0
+
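The conversion the ns16550 fix performs is just a round-up from the SPCR-reported register bit width to whole bytes. A standalone sketch, with macro definitions that mirror common kernel-style helpers rather than Xen's exact headers:

#include <stdint.h>

#define BITS_PER_BYTE      8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* io_size is a byte count, while ACPI SPCR reports a bit width, so round
 * up to whole bytes (e.g. 32 bits -> 4 bytes, 12 bits -> 2 bytes). */
static unsigned int spcr_bit_width_to_bytes(uint8_t bit_width)
{
    return DIV_ROUND_UP((unsigned int)bit_width, BITS_PER_BYTE);
}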
diff --git a/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch b/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch
deleted file mode 100644
index 57620cd..0000000
--- a/0009-x86-p2m-truly-free-paging-pool-memory-for-dying-doma.patch
+++ /dev/null
@@ -1,115 +0,0 @@
-From 943635d8f8486209e4e48966507ad57963e96284 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 11 Oct 2022 14:54:00 +0200
-Subject: [PATCH 09/87] x86/p2m: truly free paging pool memory for dying
- domains
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Modify {hap,shadow}_free to free the page immediately if the domain is
-dying, so that pages don't accumulate in the pool when
-{shadow,hap}_final_teardown() get called. This is to limit the amount of
-work which needs to be done there (in a non-preemptable manner).
-
-Note the call to shadow_free() in shadow_free_p2m_page() is moved after
-increasing total_pages, so that the decrease done in shadow_free() in
-case the domain is dying doesn't underflow the counter, even if just for
-a short interval.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Tim Deegan <tim@xen.org>
-master commit: f50a2c0e1d057c00d6061f40ae24d068226052ad
-master date: 2022-10-11 14:23:51 +0200
----
- xen/arch/x86/mm/hap/hap.c | 12 ++++++++++++
- xen/arch/x86/mm/shadow/common.c | 28 +++++++++++++++++++++++++---
- 2 files changed, 37 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index 787991233e53..aef2297450e1 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -265,6 +265,18 @@ static void hap_free(struct domain *d, mfn_t mfn)
-
- ASSERT(paging_locked_by_me(d));
-
-+ /*
-+ * For dying domains, actually free the memory here. This way less work is
-+ * left to hap_final_teardown(), which cannot easily have preemption checks
-+ * added.
-+ */
-+ if ( unlikely(d->is_dying) )
-+ {
-+ free_domheap_page(pg);
-+ d->arch.paging.hap.total_pages--;
-+ return;
-+ }
-+
- d->arch.paging.hap.free_pages++;
- page_list_add_tail(pg, &d->arch.paging.hap.freelist);
- }
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 9807f6ec6c00..9eb33eafc7f7 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -1187,6 +1187,7 @@ mfn_t shadow_alloc(struct domain *d,
- void shadow_free(struct domain *d, mfn_t smfn)
- {
- struct page_info *next = NULL, *sp = mfn_to_page(smfn);
-+ bool dying = ACCESS_ONCE(d->is_dying);
- struct page_list_head *pin_list;
- unsigned int pages;
- u32 shadow_type;
-@@ -1229,11 +1230,32 @@ void shadow_free(struct domain *d, mfn_t smfn)
- * just before the allocator hands the page out again. */
- page_set_tlbflush_timestamp(sp);
- perfc_decr(shadow_alloc_count);
-- page_list_add_tail(sp, &d->arch.paging.shadow.freelist);
-+
-+ /*
-+ * For dying domains, actually free the memory here. This way less
-+ * work is left to shadow_final_teardown(), which cannot easily have
-+ * preemption checks added.
-+ */
-+ if ( unlikely(dying) )
-+ {
-+ /*
-+ * The backpointer field (sh.back) used by shadow code aliases the
-+ * domain owner field, unconditionally clear it here to avoid
-+ * free_domheap_page() attempting to parse it.
-+ */
-+ page_set_owner(sp, NULL);
-+ free_domheap_page(sp);
-+ }
-+ else
-+ page_list_add_tail(sp, &d->arch.paging.shadow.freelist);
-+
- sp = next;
- }
-
-- d->arch.paging.shadow.free_pages += pages;
-+ if ( unlikely(dying) )
-+ d->arch.paging.shadow.total_pages -= pages;
-+ else
-+ d->arch.paging.shadow.free_pages += pages;
- }
-
- /* Divert a page from the pool to be used by the p2m mapping.
-@@ -1303,9 +1325,9 @@ shadow_free_p2m_page(struct domain *d, struct page_info *pg)
- * paging lock) and the log-dirty code (which always does). */
- paging_lock_recursive(d);
-
-- shadow_free(d, page_to_mfn(pg));
- d->arch.paging.shadow.p2m_pages--;
- d->arch.paging.shadow.total_pages++;
-+ shadow_free(d, page_to_mfn(pg));
-
- paging_unlock(d);
- }
---
-2.37.4
-
diff --git a/0010-libxl-fix-guest-kexec-skip-cpuid-policy.patch b/0010-libxl-fix-guest-kexec-skip-cpuid-policy.patch
new file mode 100644
index 0000000..6150286
--- /dev/null
+++ b/0010-libxl-fix-guest-kexec-skip-cpuid-policy.patch
@@ -0,0 +1,72 @@
+From 6e081438bf8ef616d0123aab7a743476d8114ef6 Mon Sep 17 00:00:00 2001
+From: Jason Andryuk <jandryuk@gmail.com>
+Date: Tue, 7 Feb 2023 17:06:47 +0100
+Subject: [PATCH 10/61] libxl: fix guest kexec - skip cpuid policy
+
+When a domain performs a kexec (soft reset), libxl__build_pre() is
+called with the existing domid. Calling libxl__cpuid_legacy() on the
+existing domain fails since the cpuid policy has already been set, and
+the guest isn't rebuilt and doesn't kexec.
+
+xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
+libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
+libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
+libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
+libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM
+
+During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
+issue. Before commit 34990446ca91, the libxl__cpuid_legacy() failure
+would have been ignored, so kexec would continue.
+
+Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
+Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 1e454c2b5b1172e0fc7457e411ebaba61db8fc87
+master date: 2023-01-26 10:58:23 +0100
+---
+ tools/libs/light/libxl_create.c | 2 ++
+ tools/libs/light/libxl_dom.c | 2 +-
+ tools/libs/light/libxl_internal.h | 1 +
+ 3 files changed, 4 insertions(+), 1 deletion(-)
+
+diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
+index 885675591f..2e6357a9d7 100644
+--- a/tools/libs/light/libxl_create.c
++++ b/tools/libs/light/libxl_create.c
+@@ -2176,6 +2176,8 @@ static int do_domain_soft_reset(libxl_ctx *ctx,
+ aop_console_how);
+ cdcs->domid_out = &domid_out;
+
++ state->soft_reset = true;
++
+ dom_path = libxl__xs_get_dompath(gc, domid);
+ if (!dom_path) {
+ LOGD(ERROR, domid, "failed to read domain path");
+diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
+index 73fccd9243..a2bd2395fa 100644
+--- a/tools/libs/light/libxl_dom.c
++++ b/tools/libs/light/libxl_dom.c
+@@ -384,7 +384,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
+ /* Construct a CPUID policy, but only for brand new domains. Domains
+ * being migrated-in/restored have CPUID handled during the
+ * static_data_done() callback. */
+- if (!state->restore)
++ if (!state->restore && !state->soft_reset)
+ rc = libxl__cpuid_legacy(ctx, domid, false, info);
+
+ out:
+diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
+index 0b4671318c..ee6a251700 100644
+--- a/tools/libs/light/libxl_internal.h
++++ b/tools/libs/light/libxl_internal.h
+@@ -1407,6 +1407,7 @@ typedef struct {
+ /* Whether this domain is being migrated/restored, or booting fresh. Only
+ * applicable to the primary domain, not support domains (e.g. stub QEMU). */
+ bool restore;
++ bool soft_reset;
+ } libxl__domain_build_state;
+
+ _hidden void libxl__domain_build_state_init(libxl__domain_build_state *s);
+--
+2.40.0
+
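The soft-reset fix amounts to widening the "only build a CPUID policy for brand-new domains" guard with one more flag. A minimal C sketch of that guard, using simplified stand-in types rather than libxl's real ones:

/* Simplified stand-in for libxl__domain_build_state. */
struct build_state {
    int restore;      /* domain is being migrated-in / restored */
    int soft_reset;   /* domain is being soft-reset (kexec) */
};

/* Apply the CPUID policy only for brand-new domains; restored and
 * soft-reset domains already carry one, and re-applying it fails
 * with EEXIST as shown in the log excerpt above. */
static int maybe_apply_cpuid_policy(const struct build_state *state, int (*apply)(void))
{
    if ( !state->restore && !state->soft_reset )
        return apply();

    return 0;   /* nothing to do for this domid */
}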
diff --git a/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch b/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch
deleted file mode 100644
index 8c80e31..0000000
--- a/0010-x86-p2m-free-the-paging-memory-pool-preemptively.patch
+++ /dev/null
@@ -1,181 +0,0 @@
-From f5959ed715e19cf2844656477dbf74c2f576c9d4 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 11 Oct 2022 14:54:21 +0200
-Subject: [PATCH 10/87] x86/p2m: free the paging memory pool preemptively
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The paging memory pool is currently freed in two different places:
-from {shadow,hap}_teardown() via domain_relinquish_resources() and
-from {shadow,hap}_final_teardown() via complete_domain_destroy().
-While the former does handle preemption, the later doesn't.
-
-Attempt to move as much p2m related freeing as possible to happen
-before the call to {shadow,hap}_teardown(), so that most memory can be
-freed in a preemptive way. In order to avoid causing issues to
-existing callers leave the root p2m page tables set and free them in
-{hap,shadow}_final_teardown(). Also modify {hap,shadow}_free to free
-the page immediately if the domain is dying, so that pages don't
-accumulate in the pool when {shadow,hap}_final_teardown() get called.
-
-Move altp2m_vcpu_disable_ve() to be done in hap_teardown(), as that's
-the place where altp2m_active gets disabled now.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Reported-by: Julien Grall <jgrall@amazon.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Tim Deegan <tim@xen.org>
-master commit: e7aa55c0aab36d994bf627c92bd5386ae167e16e
-master date: 2022-10-11 14:24:21 +0200
----
- xen/arch/x86/domain.c | 7 ------
- xen/arch/x86/mm/hap/hap.c | 42 ++++++++++++++++++++-------------
- xen/arch/x86/mm/shadow/common.c | 12 ++++++++++
- 3 files changed, 38 insertions(+), 23 deletions(-)
-
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index 0d39981550ca..a4356893bdbc 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -38,7 +38,6 @@
- #include <xen/livepatch.h>
- #include <public/sysctl.h>
- #include <public/hvm/hvm_vcpu.h>
--#include <asm/altp2m.h>
- #include <asm/regs.h>
- #include <asm/mc146818rtc.h>
- #include <asm/system.h>
-@@ -2381,12 +2380,6 @@ int domain_relinquish_resources(struct domain *d)
- vpmu_destroy(v);
- }
-
-- if ( altp2m_active(d) )
-- {
-- for_each_vcpu ( d, v )
-- altp2m_vcpu_disable_ve(v);
-- }
--
- if ( is_pv_domain(d) )
- {
- for_each_vcpu ( d, v )
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index aef2297450e1..a44fcfd95e1e 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -28,6 +28,7 @@
- #include <xen/domain_page.h>
- #include <xen/guest_access.h>
- #include <xen/keyhandler.h>
-+#include <asm/altp2m.h>
- #include <asm/event.h>
- #include <asm/page.h>
- #include <asm/current.h>
-@@ -546,24 +547,8 @@ void hap_final_teardown(struct domain *d)
- unsigned int i;
-
- if ( hvm_altp2m_supported() )
-- {
-- d->arch.altp2m_active = 0;
--
-- if ( d->arch.altp2m_eptp )
-- {
-- free_xenheap_page(d->arch.altp2m_eptp);
-- d->arch.altp2m_eptp = NULL;
-- }
--
-- if ( d->arch.altp2m_visible_eptp )
-- {
-- free_xenheap_page(d->arch.altp2m_visible_eptp);
-- d->arch.altp2m_visible_eptp = NULL;
-- }
--
- for ( i = 0; i < MAX_ALTP2M; i++ )
- p2m_teardown(d->arch.altp2m_p2m[i], true);
-- }
-
- /* Destroy nestedp2m's first */
- for (i = 0; i < MAX_NESTEDP2M; i++) {
-@@ -578,6 +563,8 @@ void hap_final_teardown(struct domain *d)
- paging_lock(d);
- hap_set_allocation(d, 0, NULL);
- ASSERT(d->arch.paging.hap.p2m_pages == 0);
-+ ASSERT(d->arch.paging.hap.free_pages == 0);
-+ ASSERT(d->arch.paging.hap.total_pages == 0);
- paging_unlock(d);
- }
-
-@@ -603,6 +590,7 @@ void hap_vcpu_teardown(struct vcpu *v)
- void hap_teardown(struct domain *d, bool *preempted)
- {
- struct vcpu *v;
-+ unsigned int i;
-
- ASSERT(d->is_dying);
- ASSERT(d != current->domain);
-@@ -611,6 +599,28 @@ void hap_teardown(struct domain *d, bool *preempted)
- for_each_vcpu ( d, v )
- hap_vcpu_teardown(v);
-
-+ /* Leave the root pt in case we get further attempts to modify the p2m. */
-+ if ( hvm_altp2m_supported() )
-+ {
-+ if ( altp2m_active(d) )
-+ for_each_vcpu ( d, v )
-+ altp2m_vcpu_disable_ve(v);
-+
-+ d->arch.altp2m_active = 0;
-+
-+ FREE_XENHEAP_PAGE(d->arch.altp2m_eptp);
-+ FREE_XENHEAP_PAGE(d->arch.altp2m_visible_eptp);
-+
-+ for ( i = 0; i < MAX_ALTP2M; i++ )
-+ p2m_teardown(d->arch.altp2m_p2m[i], false);
-+ }
-+
-+ /* Destroy nestedp2m's after altp2m. */
-+ for ( i = 0; i < MAX_NESTEDP2M; i++ )
-+ p2m_teardown(d->arch.nested_p2m[i], false);
-+
-+ p2m_teardown(p2m_get_hostp2m(d), false);
-+
- paging_lock(d); /* Keep various asserts happy */
-
- if ( d->arch.paging.hap.total_pages != 0 )
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 9eb33eafc7f7..ac9a1ae07808 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -2824,8 +2824,17 @@ void shadow_teardown(struct domain *d, bool *preempted)
- for_each_vcpu ( d, v )
- shadow_vcpu_teardown(v);
-
-+ p2m_teardown(p2m_get_hostp2m(d), false);
-+
- paging_lock(d);
-
-+ /*
-+ * Reclaim all shadow memory so that shadow_set_allocation() doesn't find
-+ * in-use pages, as _shadow_prealloc() will no longer try to reclaim pages
-+ * because the domain is dying.
-+ */
-+ shadow_blow_tables(d);
-+
- #if (SHADOW_OPTIMIZATIONS & (SHOPT_VIRTUAL_TLB|SHOPT_OUT_OF_SYNC))
- /* Free the virtual-TLB array attached to each vcpu */
- for_each_vcpu(d, v)
-@@ -2946,6 +2955,9 @@ void shadow_final_teardown(struct domain *d)
- d->arch.paging.shadow.total_pages,
- d->arch.paging.shadow.free_pages,
- d->arch.paging.shadow.p2m_pages);
-+ ASSERT(!d->arch.paging.shadow.total_pages);
-+ ASSERT(!d->arch.paging.shadow.free_pages);
-+ ASSERT(!d->arch.paging.shadow.p2m_pages);
- paging_unlock(d);
- }
-
---
-2.37.4
-
diff --git a/0011-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch b/0011-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
new file mode 100644
index 0000000..1d4455f
--- /dev/null
+++ b/0011-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
@@ -0,0 +1,71 @@
+From c6a3d14df051bae0323af539e34cf5a65fba1112 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Tue, 1 Nov 2022 17:59:16 +0000
+Subject: [PATCH 11/61] tools/ocaml/xenctrl: Make domain_getinfolist tail
+ recursive
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+domain_getinfolist() is quadratic with the number of domains, because of the
+behaviour of the underlying hypercall. xenopsd was further observed to be
+wasting excessive quantities of time manipulating the list of already-obtained
+domains.
+
+Implement a tail recursive `rev_concat` equivalent to `concat |> rev`, and use
+it instead of calling `@` multiple times.
+
+An incidental benefit is that the list of domains will now be in domid order,
+instead of having pairs of 2 domains changing direction every time.
+
+In a scalability testing scenario with ~1000 VMs, a combination of this and
+the subsequent change takes xenopsd's wallclock time in domain_getinfolist()
+down from 88% to 0.02%
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit c3b6be714c64aa62b56d0bce96f4b6a10b5c2078)
+---
+ tools/ocaml/libs/xc/xenctrl.ml | 23 +++++++++++++++++------
+ 1 file changed, 17 insertions(+), 6 deletions(-)
+
+diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
+index 7503031d8f..f10b686215 100644
+--- a/tools/ocaml/libs/xc/xenctrl.ml
++++ b/tools/ocaml/libs/xc/xenctrl.ml
+@@ -212,14 +212,25 @@ external domain_shutdown: handle -> domid -> shutdown_reason -> unit
+ external _domain_getinfolist: handle -> domid -> int -> domaininfo list
+ = "stub_xc_domain_getinfolist"
+
++let rev_append_fold acc e = List.rev_append e acc
++
++(**
++ * [rev_concat lst] is equivalent to [lst |> List.concat |> List.rev]
++ * except it is tail recursive, whereas [List.concat] isn't.
++ * Example:
++ * rev_concat [[10;9;8];[7;6];[5]]] = [5; 6; 7; 8; 9; 10]
++ *)
++let rev_concat lst = List.fold_left rev_append_fold [] lst
++
+ let domain_getinfolist handle first_domain =
+ let nb = 2 in
+- let last_domid l = (List.hd l).domid + 1 in
+- let rec __getlist from =
+- let l = _domain_getinfolist handle from nb in
+- (if List.length l = nb then __getlist (last_domid l) else []) @ l
+- in
+- List.rev (__getlist first_domain)
++ let rec __getlist lst from =
++ (* _domain_getinfolist returns domains in reverse order, largest first *)
++ match _domain_getinfolist handle from nb with
++ | [] -> rev_concat lst
++ | (hd :: _) as l -> __getlist (l :: lst) (hd.domid + 1)
++ in
++ __getlist [] first_domain
+
+ external domain_getinfo: handle -> domid -> domaininfo= "stub_xc_domain_getinfo"
+
+--
+2.40.0
+
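For illustration, the helper introduced by this patch can be exercised on its own. The following OCaml sketch reuses the names from the patch and checks, with a throwaway test harness, that rev_concat behaves like List.concat followed by List.rev on the example from the code comment:

let rev_append_fold acc e = List.rev_append e acc

(* [rev_concat lst] = [lst |> List.concat |> List.rev], but tail recursive. *)
let rev_concat lst = List.fold_left rev_append_fold [] lst

let () =
  (* Chunks arrive in reverse order, largest domid first. *)
  let chunks = [ [10; 9; 8]; [7; 6]; [5] ] in
  assert (rev_concat chunks = [5; 6; 7; 8; 9; 10]);
  assert (rev_concat chunks = List.rev (List.concat chunks));
  print_endline "rev_concat behaves like List.concat |> List.rev"
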
diff --git a/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch b/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch
deleted file mode 100644
index 096656a..0000000
--- a/0011-xen-x86-p2m-Add-preemption-in-p2m_teardown.patch
+++ /dev/null
@@ -1,197 +0,0 @@
-From a603386b422f5cb4c5e2639a7e20a1d99dba2175 Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Tue, 11 Oct 2022 14:54:44 +0200
-Subject: [PATCH 11/87] xen/x86: p2m: Add preemption in p2m_teardown()
-
-The list p2m->pages contains all the pages used by the P2M. On a large
-instance this can be quite long, and the time spent calling
-d->arch.paging.free_page() will exceed 1ms for an 80GB guest
-on a Xen running in a nested environment on a c5.metal.
-
-By extrapolation, it would take > 100ms for an 8TB guest (which we
-currently security support). So add some preemption in p2m_teardown()
-and propagate it to the callers. Note there are 3 places where
-the preemption is not enabled:
- - hap_final_teardown()/shadow_final_teardown(): We are
- preventing updates to the P2M once the domain is dying (so
- no more pages can be allocated) and most of the P2M pages
- will be freed in a preemptive manner when relinquishing the
- resources. So it is fine to disable preemption.
- - shadow_enable(): This is fine because it will undo the allocation
- that may have been made by p2m_alloc_table() (so only the root
- page table).
-
-The preemption is arbitrarily checked every 1024 iterations.
-
-We now need to include <xen/event.h> in p2m-basic in order to
-import the definition for local_events_need_delivery() used by
-general_preempt_check(). Ideally, the inclusion should happen in
-xen/sched.h but it opened a can of worms.
-
-Note that with the current approach, Xen doesn't keep track of whether
-the alt/nested P2Ms have been cleared. So there is some redundant work.
-However, this is not expected to incur too much overhead (the P2M lock
-shouldn't be contended during teardown). So this optimization is
-left outside of the security event.
-
-This is part of CVE-2022-33746 / XSA-410.
-
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-master commit: 8a2111250b424edc49c65c4d41b276766d30635c
-master date: 2022-10-11 14:24:48 +0200
----
- xen/arch/x86/mm/hap/hap.c | 22 ++++++++++++++++------
- xen/arch/x86/mm/p2m.c | 18 +++++++++++++++---
- xen/arch/x86/mm/shadow/common.c | 12 +++++++++---
- xen/include/asm-x86/p2m.h | 2 +-
- 4 files changed, 41 insertions(+), 13 deletions(-)
-
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index a44fcfd95e1e..1f9a157a0c34 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -548,17 +548,17 @@ void hap_final_teardown(struct domain *d)
-
- if ( hvm_altp2m_supported() )
- for ( i = 0; i < MAX_ALTP2M; i++ )
-- p2m_teardown(d->arch.altp2m_p2m[i], true);
-+ p2m_teardown(d->arch.altp2m_p2m[i], true, NULL);
-
- /* Destroy nestedp2m's first */
- for (i = 0; i < MAX_NESTEDP2M; i++) {
-- p2m_teardown(d->arch.nested_p2m[i], true);
-+ p2m_teardown(d->arch.nested_p2m[i], true, NULL);
- }
-
- if ( d->arch.paging.hap.total_pages != 0 )
- hap_teardown(d, NULL);
-
-- p2m_teardown(p2m_get_hostp2m(d), true);
-+ p2m_teardown(p2m_get_hostp2m(d), true, NULL);
- /* Free any memory that the p2m teardown released */
- paging_lock(d);
- hap_set_allocation(d, 0, NULL);
-@@ -612,14 +612,24 @@ void hap_teardown(struct domain *d, bool *preempted)
- FREE_XENHEAP_PAGE(d->arch.altp2m_visible_eptp);
-
- for ( i = 0; i < MAX_ALTP2M; i++ )
-- p2m_teardown(d->arch.altp2m_p2m[i], false);
-+ {
-+ p2m_teardown(d->arch.altp2m_p2m[i], false, preempted);
-+ if ( preempted && *preempted )
-+ return;
-+ }
- }
-
- /* Destroy nestedp2m's after altp2m. */
- for ( i = 0; i < MAX_NESTEDP2M; i++ )
-- p2m_teardown(d->arch.nested_p2m[i], false);
-+ {
-+ p2m_teardown(d->arch.nested_p2m[i], false, preempted);
-+ if ( preempted && *preempted )
-+ return;
-+ }
-
-- p2m_teardown(p2m_get_hostp2m(d), false);
-+ p2m_teardown(p2m_get_hostp2m(d), false, preempted);
-+ if ( preempted && *preempted )
-+ return;
-
- paging_lock(d); /* Keep various asserts happy */
-
-diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
-index aba4f17cbe12..8781df9dda8d 100644
---- a/xen/arch/x86/mm/p2m.c
-+++ b/xen/arch/x86/mm/p2m.c
-@@ -749,12 +749,13 @@ int p2m_alloc_table(struct p2m_domain *p2m)
- * hvm fixme: when adding support for pvh non-hardware domains, this path must
- * cleanup any foreign p2m types (release refcnts on them).
- */
--void p2m_teardown(struct p2m_domain *p2m, bool remove_root)
-+void p2m_teardown(struct p2m_domain *p2m, bool remove_root, bool *preempted)
- /* Return all the p2m pages to Xen.
- * We know we don't have any extra mappings to these pages */
- {
- struct page_info *pg, *root_pg = NULL;
- struct domain *d;
-+ unsigned int i = 0;
-
- if (p2m == NULL)
- return;
-@@ -773,8 +774,19 @@ void p2m_teardown(struct p2m_domain *p2m, bool remove_root)
- }
-
- while ( (pg = page_list_remove_head(&p2m->pages)) )
-- if ( pg != root_pg )
-- d->arch.paging.free_page(d, pg);
-+ {
-+ if ( pg == root_pg )
-+ continue;
-+
-+ d->arch.paging.free_page(d, pg);
-+
-+ /* Arbitrarily check preemption every 1024 iterations */
-+ if ( preempted && !(++i % 1024) && general_preempt_check() )
-+ {
-+ *preempted = true;
-+ break;
-+ }
-+ }
-
- if ( root_pg )
- page_list_add(root_pg, &p2m->pages);
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index ac9a1ae07808..3b0d781991b5 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -2770,8 +2770,12 @@ int shadow_enable(struct domain *d, u32 mode)
- out_locked:
- paging_unlock(d);
- out_unlocked:
-+ /*
-+ * This is fine to ignore the preemption here because only the root
-+ * will be allocated by p2m_alloc_table().
-+ */
- if ( rv != 0 && !pagetable_is_null(p2m_get_pagetable(p2m)) )
-- p2m_teardown(p2m, true);
-+ p2m_teardown(p2m, true, NULL);
- if ( rv != 0 && pg != NULL )
- {
- pg->count_info &= ~PGC_count_mask;
-@@ -2824,7 +2828,9 @@ void shadow_teardown(struct domain *d, bool *preempted)
- for_each_vcpu ( d, v )
- shadow_vcpu_teardown(v);
-
-- p2m_teardown(p2m_get_hostp2m(d), false);
-+ p2m_teardown(p2m_get_hostp2m(d), false, preempted);
-+ if ( preempted && *preempted )
-+ return;
-
- paging_lock(d);
-
-@@ -2945,7 +2951,7 @@ void shadow_final_teardown(struct domain *d)
- shadow_teardown(d, NULL);
-
- /* It is now safe to pull down the p2m map. */
-- p2m_teardown(p2m_get_hostp2m(d), true);
-+ p2m_teardown(p2m_get_hostp2m(d), true, NULL);
- /* Free any shadow memory that the p2m teardown released */
- paging_lock(d);
- shadow_set_allocation(d, 0, NULL);
-diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
-index c3c16748e7d5..2db9ab0122f2 100644
---- a/xen/include/asm-x86/p2m.h
-+++ b/xen/include/asm-x86/p2m.h
-@@ -574,7 +574,7 @@ int p2m_init(struct domain *d);
- int p2m_alloc_table(struct p2m_domain *p2m);
-
- /* Return all the p2m resources to Xen. */
--void p2m_teardown(struct p2m_domain *p2m, bool remove_root);
-+void p2m_teardown(struct p2m_domain *p2m, bool remove_root, bool *preempted);
- void p2m_final_teardown(struct domain *d);
-
- /* Add a page to a domain's p2m table */
---
-2.37.4
-
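The preemption scheme described in the patch removed above follows a common pattern: do the work in batches and poll for a pending preemption request every N items. A rough OCaml sketch of that pattern (preempt_requested and free_page are illustrative stand-ins for general_preempt_check() and d->arch.paging.free_page(); the real implementation is the C code in the patch):

(* Free every page in [pages], checking for preemption every 1024 items.
   Returns either `Done or the remaining work if we had to stop early. *)
let teardown_pages ~preempt_requested ~free_page pages =
  let rec loop i = function
    | [] -> `Done
    | pg :: rest ->
        free_page pg;
        if i mod 1024 = 0 && preempt_requested () then `Preempted rest
        else loop (i + 1) rest
  in
  loop 1 pages

let () =
  match teardown_pages ~preempt_requested:(fun () -> false)
          ~free_page:ignore (List.init 5000 Fun.id) with
  | `Done -> print_endline "teardown completed without preemption"
  | `Preempted _ -> print_endline "teardown preempted; caller retries later"
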
diff --git a/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch b/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch
deleted file mode 100644
index d1aeae9..0000000
--- a/0012-libxl-docs-Use-arch-specific-default-paging-memory.patch
+++ /dev/null
@@ -1,149 +0,0 @@
-From 755a9b52844de3e1e47aa1fc9991a4240ccfbf35 Mon Sep 17 00:00:00 2001
-From: Henry Wang <Henry.Wang@arm.com>
-Date: Tue, 11 Oct 2022 14:55:08 +0200
-Subject: [PATCH 12/87] libxl, docs: Use arch-specific default paging memory
-
-The default paging memory (described in the `shadow_memory` entry in the
-xl config) in libxl is used to determine the memory pool size for xl
-guests. Currently this size is only used for x86, and contains a part
-of RAM to shadow the resident processes. Since there are no shadow mode
-guests on Arm, the part of RAM to shadow the resident processes is not
-necessary. Therefore, this commit splits the function
-`libxl_get_required_shadow_memory()` into arch-specific helpers and
-renames the helper to `libxl__arch_get_required_paging_memory()`.
-
-On x86, this helper returns the original value from
-`libxl_get_required_shadow_memory()`, so no functional change is intended.
-
-On Arm, this helper returns 1MB per vcpu plus 4KB per MiB of RAM
-for the P2M map and additional 512KB.
-
-Also update the xl.cfg documentation to add Arm documentation
-according to code changes and correct the comment style following Xen
-coding style.
-
-This is part of CVE-2022-33747 / XSA-409.
-
-Suggested-by: Julien Grall <jgrall@amazon.com>
-Signed-off-by: Henry Wang <Henry.Wang@arm.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 156a239ea288972425f967ac807b3cb5b5e14874
-master date: 2022-10-11 14:28:37 +0200
----
- docs/man/xl.cfg.5.pod.in | 5 +++++
- tools/libs/light/libxl_arch.h | 4 ++++
- tools/libs/light/libxl_arm.c | 14 ++++++++++++++
- tools/libs/light/libxl_utils.c | 9 ++-------
- tools/libs/light/libxl_x86.c | 13 +++++++++++++
- 5 files changed, 38 insertions(+), 7 deletions(-)
-
-diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
-index b98d1613987e..eda1e77ebd06 100644
---- a/docs/man/xl.cfg.5.pod.in
-+++ b/docs/man/xl.cfg.5.pod.in
-@@ -1768,6 +1768,11 @@ are not using hardware assisted paging (i.e. you are using shadow
- mode) and your guest workload consists of a very large number of
- similar processes then increasing this value may improve performance.
-
-+On Arm, this field is used to determine the size of the guest P2M pages
-+pool, and the default value is 1MB per vCPU plus 4KB per MB of RAM for
-+the P2M map and additional 512KB for extended regions. Users should
-+adjust this value if bigger P2M pool size is needed.
-+
- =back
-
- =head3 Processor and Platform Features
-diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
-index 1522ecb97f72..5a060c2c3033 100644
---- a/tools/libs/light/libxl_arch.h
-+++ b/tools/libs/light/libxl_arch.h
-@@ -90,6 +90,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
- libxl_domain_config *dst,
- const libxl_domain_config *src);
-
-+_hidden
-+unsigned long libxl__arch_get_required_paging_memory(unsigned long maxmem_kb,
-+ unsigned int smp_cpus);
-+
- #if defined(__i386__) || defined(__x86_64__)
-
- #define LAPIC_BASE_ADDRESS 0xfee00000
-diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
-index eef1de093914..73a95e83af24 100644
---- a/tools/libs/light/libxl_arm.c
-+++ b/tools/libs/light/libxl_arm.c
-@@ -154,6 +154,20 @@ out:
- return rc;
- }
-
-+unsigned long libxl__arch_get_required_paging_memory(unsigned long maxmem_kb,
-+ unsigned int smp_cpus)
-+{
-+ /*
-+ * 256 pages (1MB) per vcpu,
-+ * plus 1 page per MiB of RAM for the P2M map,
-+ * plus 1 page per MiB of extended region. This default value is 128 MiB
-+ * which should be enough for domains that are not running backend.
-+ * This is higher than the minimum that Xen would allocate if no value
-+ * were given (but the Xen minimum is for safety, not performance).
-+ */
-+ return 4 * (256 * smp_cpus + maxmem_kb / 1024 + 128);
-+}
-+
- static struct arch_info {
- const char *guest_type;
- const char *timer_compat;
-diff --git a/tools/libs/light/libxl_utils.c b/tools/libs/light/libxl_utils.c
-index 4699c4a0a36f..e276c0ee9cc3 100644
---- a/tools/libs/light/libxl_utils.c
-+++ b/tools/libs/light/libxl_utils.c
-@@ -18,6 +18,7 @@
- #include <ctype.h>
-
- #include "libxl_internal.h"
-+#include "libxl_arch.h"
- #include "_paths.h"
-
- #ifndef LIBXL_HAVE_NONCONST_LIBXL_BASENAME_RETURN_VALUE
-@@ -39,13 +40,7 @@ char *libxl_basename(const char *name)
-
- unsigned long libxl_get_required_shadow_memory(unsigned long maxmem_kb, unsigned int smp_cpus)
- {
-- /* 256 pages (1MB) per vcpu,
-- plus 1 page per MiB of RAM for the P2M map,
-- plus 1 page per MiB of RAM to shadow the resident processes.
-- This is higher than the minimum that Xen would allocate if no value
-- were given (but the Xen minimum is for safety, not performance).
-- */
-- return 4 * (256 * smp_cpus + 2 * (maxmem_kb / 1024));
-+ return libxl__arch_get_required_paging_memory(maxmem_kb, smp_cpus);
- }
-
- char *libxl_domid_to_name(libxl_ctx *ctx, uint32_t domid)
-diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
-index 1feadebb1852..51362893cf98 100644
---- a/tools/libs/light/libxl_x86.c
-+++ b/tools/libs/light/libxl_x86.c
-@@ -882,6 +882,19 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
- libxl_defbool_val(src->b_info.arch_x86.msr_relaxed));
- }
-
-+unsigned long libxl__arch_get_required_paging_memory(unsigned long maxmem_kb,
-+ unsigned int smp_cpus)
-+{
-+ /*
-+ * 256 pages (1MB) per vcpu,
-+ * plus 1 page per MiB of RAM for the P2M map,
-+ * plus 1 page per MiB of RAM to shadow the resident processes.
-+ * This is higher than the minimum that Xen would allocate if no value
-+ * were given (but the Xen minimum is for safety, not performance).
-+ */
-+ return 4 * (256 * smp_cpus + 2 * (maxmem_kb / 1024));
-+}
-+
- /*
- * Local variables:
- * mode: C
---
-2.37.4
-
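The two sizing formulas quoted in the patch removed above are easy to compare numerically. A small OCaml sketch using the constants from the patch (the function names are illustrative, not libxl API):

(* Default paging/P2M pool size, in 4KiB pages:
     x86: 4 * (256 * vcpus + 2 * (maxmem_kb / 1024))
     Arm: 4 * (256 * vcpus + maxmem_kb / 1024 + 128) *)
let x86_paging_pages ~maxmem_kb ~vcpus = 4 * (256 * vcpus + 2 * (maxmem_kb / 1024))
let arm_paging_pages ~maxmem_kb ~vcpus = 4 * (256 * vcpus + maxmem_kb / 1024 + 128)

let mib_of_pages pages = pages * 4096 / (1024 * 1024)

let () =
  (* Example: 4 vCPUs and 4 GiB of RAM. *)
  let maxmem_kb = 4 * 1024 * 1024 and vcpus = 4 in
  Printf.printf "x86 default: %d MiB, Arm default: %d MiB\n"
    (mib_of_pages (x86_paging_pages ~maxmem_kb ~vcpus))
    (mib_of_pages (arm_paging_pages ~maxmem_kb ~vcpus))
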
diff --git a/0012-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch b/0012-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
new file mode 100644
index 0000000..fc352ad
--- /dev/null
+++ b/0012-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
@@ -0,0 +1,41 @@
+From 8c66a2d88a9f17e5b5099fcb83231b7a1169ca25 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Tue, 1 Nov 2022 17:59:17 +0000
+Subject: [PATCH 12/61] tools/ocaml/xenctrl: Use larger chunksize in
+ domain_getinfolist
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+domain_getinfolist() is quadratic with the number of domains, because of the
+behaviour of the underlying hypercall. Nevertheless, getting domain info in
+blocks of 1024 is far more efficient than blocks of 2.
+
+In a scalability testing scenario with ~1000 VMs, a combination of this and
+the previous change takes xenopsd's wallclock time in domain_getinfolist()
+down from 88% to 0.02%
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 95db09b1b154fb72fad861815ceae1f3fa49fc4e)
+---
+ tools/ocaml/libs/xc/xenctrl.ml | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
+index f10b686215..b40c70d33f 100644
+--- a/tools/ocaml/libs/xc/xenctrl.ml
++++ b/tools/ocaml/libs/xc/xenctrl.ml
+@@ -223,7 +223,7 @@ let rev_append_fold acc e = List.rev_append e acc
+ let rev_concat lst = List.fold_left rev_append_fold [] lst
+
+ let domain_getinfolist handle first_domain =
+- let nb = 2 in
++ let nb = 1024 in
+ let rec __getlist lst from =
+ (* _domain_getinfolist returns domains in reverse order, largest first *)
+ match _domain_getinfolist handle from nb with
+--
+2.40.0
+
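To see why the chunk size matters, a rough back-of-the-envelope sketch in OCaml (the call-count formula only approximates the loop shown in the previous patch):

(* Listing [n] domains in chunks of [nb] needs roughly ceil(n / nb)
   hypercalls, plus one final call that returns an empty list. *)
let calls_needed ~n ~nb = (n + nb - 1) / nb + 1

let () =
  let n = 1000 in
  Printf.printf "nb=2:    ~%d hypercalls for %d domains\n" (calls_needed ~n ~nb:2) n;
  Printf.printf "nb=1024: ~%d hypercalls for %d domains\n" (calls_needed ~n ~nb:1024) n
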
diff --git a/0013-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch b/0013-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
new file mode 100644
index 0000000..a999dd8
--- /dev/null
+++ b/0013-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
@@ -0,0 +1,75 @@
+From 049d16c8ce900dfc8f4b657849aeb82b95ed857c Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Fri, 16 Dec 2022 18:25:10 +0000
+Subject: [PATCH 13/61] tools/ocaml/xb,mmap: Use Data_abstract_val wrapper
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This is not strictly necessary since it is essentially a no-op currently: a
+cast to void * and value *, even in OCaml 5.0.
+
+However it does make it clearer that what we have here is not a regular OCaml
+value, but one allocated with Abstract_tag or Custom_tag, and follows the
+example from the manual more closely:
+https://v2.ocaml.org/manual/intfc.html#ss:c-outside-head
+
+It also makes it clearer that these modules have been reviewed for
+compat with OCaml 5.0.
+
+We cannot use OCaml finalizers here, because we want exact control over when
+to unmap these pages from remote domains.
+
+No functional change.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit d2ccc637111d6dbcf808aaffeec7a46f0b1e1c81)
+---
+ tools/ocaml/libs/mmap/mmap_stubs.h | 4 ++++
+ tools/ocaml/libs/mmap/xenmmap_stubs.c | 2 +-
+ tools/ocaml/libs/xb/xs_ring_stubs.c | 2 +-
+ 3 files changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/libs/mmap/mmap_stubs.h b/tools/ocaml/libs/mmap/mmap_stubs.h
+index 65e4239890..f4784e4715 100644
+--- a/tools/ocaml/libs/mmap/mmap_stubs.h
++++ b/tools/ocaml/libs/mmap/mmap_stubs.h
+@@ -30,4 +30,8 @@ struct mmap_interface
+ int len;
+ };
+
++#ifndef Data_abstract_val
++#define Data_abstract_val(x) ((void *)Op_val(x))
++#endif
++
+ #endif
+diff --git a/tools/ocaml/libs/mmap/xenmmap_stubs.c b/tools/ocaml/libs/mmap/xenmmap_stubs.c
+index e2ce088e25..e03951d781 100644
+--- a/tools/ocaml/libs/mmap/xenmmap_stubs.c
++++ b/tools/ocaml/libs/mmap/xenmmap_stubs.c
+@@ -28,7 +28,7 @@
+ #include <caml/fail.h>
+ #include <caml/callback.h>
+
+-#define Intf_val(a) ((struct mmap_interface *) a)
++#define Intf_val(a) ((struct mmap_interface *)Data_abstract_val(a))
+
+ static int mmap_interface_init(struct mmap_interface *intf,
+ int fd, int pflag, int mflag,
+diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
+index 7a91fdee75..1f58524535 100644
+--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
++++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
+@@ -35,7 +35,7 @@
+ #include <sys/mman.h>
+ #include "mmap_stubs.h"
+
+-#define GET_C_STRUCT(a) ((struct mmap_interface *) a)
++#define GET_C_STRUCT(a) ((struct mmap_interface *)Data_abstract_val(a))
+
+ /*
+ * Bytes_val has been introduced by Ocaml 4.06.1. So define our own version
+--
+2.40.0
+
diff --git a/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch b/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch
deleted file mode 100644
index 7ab3212..0000000
--- a/0013-xen-arm-Construct-the-P2M-pages-pool-for-guests.patch
+++ /dev/null
@@ -1,189 +0,0 @@
-From 914fc8e8b4cc003e90d51bee0aef54687358530a Mon Sep 17 00:00:00 2001
-From: Henry Wang <Henry.Wang@arm.com>
-Date: Tue, 11 Oct 2022 14:55:21 +0200
-Subject: [PATCH 13/87] xen/arm: Construct the P2M pages pool for guests
-
-This commit constructs the p2m pages pool for guests from the
-data structure and helper perspective.
-
-This is implemented by:
-
-- Adding a `struct paging_domain` which contains a freelist, a
-counter variable and a spinlock to `struct arch_domain` to
-indicate the free p2m pages and the number of p2m total pages in
-the p2m pages pool.
-
-- Adding a helper `p2m_get_allocation` to get the p2m pool size.
-
-- Adding a helper `p2m_set_allocation` to set the p2m pages pool
-size. This helper should be called before allocating memory for
-a guest.
-
-- Adding a helper `p2m_teardown_allocation` to free the p2m pages
-pool. This helper should be called during xl domain destruction.
-
-This is part of CVE-2022-33747 / XSA-409.
-
-Signed-off-by: Henry Wang <Henry.Wang@arm.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-master commit: 55914f7fc91a468649b8a3ec3f53ae1c4aca6670
-master date: 2022-10-11 14:28:39 +0200
----
- xen/arch/arm/p2m.c | 88 ++++++++++++++++++++++++++++++++++++
- xen/include/asm-arm/domain.h | 10 ++++
- xen/include/asm-arm/p2m.h | 4 ++
- 3 files changed, 102 insertions(+)
-
-diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
-index 27418ee5ee98..d8957dd8727c 100644
---- a/xen/arch/arm/p2m.c
-+++ b/xen/arch/arm/p2m.c
-@@ -50,6 +50,92 @@ static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
- return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
- }
-
-+/* Return the size of the pool, rounded up to the nearest MB */
-+unsigned int p2m_get_allocation(struct domain *d)
-+{
-+ unsigned long nr_pages = ACCESS_ONCE(d->arch.paging.p2m_total_pages);
-+
-+ return ROUNDUP(nr_pages, 1 << (20 - PAGE_SHIFT)) >> (20 - PAGE_SHIFT);
-+}
-+
-+/*
-+ * Set the pool of pages to the required number of pages.
-+ * Returns 0 for success, non-zero for failure.
-+ * Call with d->arch.paging.lock held.
-+ */
-+int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted)
-+{
-+ struct page_info *pg;
-+
-+ ASSERT(spin_is_locked(&d->arch.paging.lock));
-+
-+ for ( ; ; )
-+ {
-+ if ( d->arch.paging.p2m_total_pages < pages )
-+ {
-+ /* Need to allocate more memory from domheap */
-+ pg = alloc_domheap_page(NULL, 0);
-+ if ( pg == NULL )
-+ {
-+ printk(XENLOG_ERR "Failed to allocate P2M pages.\n");
-+ return -ENOMEM;
-+ }
-+ ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
-+ d->arch.paging.p2m_total_pages + 1;
-+ page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
-+ }
-+ else if ( d->arch.paging.p2m_total_pages > pages )
-+ {
-+ /* Need to return memory to domheap */
-+ pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
-+ if( pg )
-+ {
-+ ACCESS_ONCE(d->arch.paging.p2m_total_pages) =
-+ d->arch.paging.p2m_total_pages - 1;
-+ free_domheap_page(pg);
-+ }
-+ else
-+ {
-+ printk(XENLOG_ERR
-+ "Failed to free P2M pages, P2M freelist is empty.\n");
-+ return -ENOMEM;
-+ }
-+ }
-+ else
-+ break;
-+
-+ /* Check to see if we need to yield and try again */
-+ if ( preempted && general_preempt_check() )
-+ {
-+ *preempted = true;
-+ return -ERESTART;
-+ }
-+ }
-+
-+ return 0;
-+}
-+
-+int p2m_teardown_allocation(struct domain *d)
-+{
-+ int ret = 0;
-+ bool preempted = false;
-+
-+ spin_lock(&d->arch.paging.lock);
-+ if ( d->arch.paging.p2m_total_pages != 0 )
-+ {
-+ ret = p2m_set_allocation(d, 0, &preempted);
-+ if ( preempted )
-+ {
-+ spin_unlock(&d->arch.paging.lock);
-+ return -ERESTART;
-+ }
-+ ASSERT(d->arch.paging.p2m_total_pages == 0);
-+ }
-+ spin_unlock(&d->arch.paging.lock);
-+
-+ return ret;
-+}
-+
- /* Unlock the flush and do a P2M TLB flush if necessary */
- void p2m_write_unlock(struct p2m_domain *p2m)
- {
-@@ -1599,7 +1685,9 @@ int p2m_init(struct domain *d)
- unsigned int cpu;
-
- rwlock_init(&p2m->lock);
-+ spin_lock_init(&d->arch.paging.lock);
- INIT_PAGE_LIST_HEAD(&p2m->pages);
-+ INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
-
- p2m->vmid = INVALID_VMID;
-
-diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
-index 7f8ddd3f5c3b..2f31795ab96d 100644
---- a/xen/include/asm-arm/domain.h
-+++ b/xen/include/asm-arm/domain.h
-@@ -40,6 +40,14 @@ struct vtimer {
- uint64_t cval;
- };
-
-+struct paging_domain {
-+ spinlock_t lock;
-+ /* Free P2M pages from the pre-allocated P2M pool */
-+ struct page_list_head p2m_freelist;
-+ /* Number of pages from the pre-allocated P2M pool */
-+ unsigned long p2m_total_pages;
-+};
-+
- struct arch_domain
- {
- #ifdef CONFIG_ARM_64
-@@ -51,6 +59,8 @@ struct arch_domain
-
- struct hvm_domain hvm;
-
-+ struct paging_domain paging;
-+
- struct vmmio vmmio;
-
- /* Continuable domain_relinquish_resources(). */
-diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
-index b3ba83283e11..c9598740bd02 100644
---- a/xen/include/asm-arm/p2m.h
-+++ b/xen/include/asm-arm/p2m.h
-@@ -218,6 +218,10 @@ void p2m_restore_state(struct vcpu *n);
- /* Print debugging/statistial info about a domain's p2m */
- void p2m_dump_info(struct domain *d);
-
-+unsigned int p2m_get_allocation(struct domain *d);
-+int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted);
-+int p2m_teardown_allocation(struct domain *d);
-+
- static inline void p2m_write_lock(struct p2m_domain *p2m)
- {
- write_lock(&p2m->lock);
---
-2.37.4
-
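The grow/shrink loop in p2m_set_allocation() from the patch removed above can be modelled compactly. The following OCaml toy uses a plain counter instead of a page list and an injected preemption callback; it only illustrates the control flow (allocate or free one page at a time until the target is reached, bailing out with -ERESTART when preempted) and is not Xen code:

type pool = { mutable total : int }
type outcome = Done | Preempted

let rec set_allocation pool target ~preempt_requested =
  if pool.total = target then Done
  else begin
    if pool.total < target
    then pool.total <- pool.total + 1   (* would call alloc_domheap_page() *)
    else pool.total <- pool.total - 1;  (* would call free_domheap_page() *)
    if preempt_requested () then Preempted
    else set_allocation pool target ~preempt_requested
  end

let () =
  let pool = { total = 0 } in
  match set_allocation pool 256 ~preempt_requested:(fun () -> false) with
  | Done -> Printf.printf "pool now holds %d pages\n" pool.total
  | Preempted -> print_endline "caller would see -ERESTART and retry"
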
diff --git a/0014-tools-ocaml-xb-Drop-Xs_ring.write.patch b/0014-tools-ocaml-xb-Drop-Xs_ring.write.patch
new file mode 100644
index 0000000..813f041
--- /dev/null
+++ b/0014-tools-ocaml-xb-Drop-Xs_ring.write.patch
@@ -0,0 +1,62 @@
+From f7c4fab9b50af74d0e1170fbf35367ced48d8209 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Fri, 16 Dec 2022 18:25:20 +0000
+Subject: [PATCH 14/61] tools/ocaml/xb: Drop Xs_ring.write
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This function is unused (only Xs_ring.write_substring is used), and the
+bytes/string conversion here is backwards: the C stub implements the bytes
+version and then we use a Bytes.unsafe_of_string to convert a string into
+bytes.
+
+However the operation here really is read-only: we read from the string and
+write it to the ring, so the C stub should implement the read-only string
+version, and if needed we could use Bytes.unsafe_to_string to be able to send
+'bytes'. However that is not necessary as the 'bytes' version is dropped above.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 01f139215e678c2dc7d4bb3f9f2777069bb1b091)
+---
+ tools/ocaml/libs/xb/xs_ring.ml | 5 +----
+ tools/ocaml/libs/xb/xs_ring_stubs.c | 2 +-
+ 2 files changed, 2 insertions(+), 5 deletions(-)
+
+diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
+index db7f86bd27..dd5e014a33 100644
+--- a/tools/ocaml/libs/xb/xs_ring.ml
++++ b/tools/ocaml/libs/xb/xs_ring.ml
+@@ -25,14 +25,11 @@ module Server_features = Set.Make(struct
+ end)
+
+ external read: Xenmmap.mmap_interface -> bytes -> int -> int = "ml_interface_read"
+-external write: Xenmmap.mmap_interface -> bytes -> int -> int = "ml_interface_write"
++external write_substring: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
+
+ external _internal_set_server_features: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_features" [@@noalloc]
+ external _internal_get_server_features: Xenmmap.mmap_interface -> int = "ml_interface_get_server_features" [@@noalloc]
+
+-let write_substring mmap buff len =
+- write mmap (Bytes.unsafe_of_string buff) len
+-
+ let get_server_features mmap =
+ (* NB only one feature currently defined above *)
+ let x = _internal_get_server_features mmap in
+diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
+index 1f58524535..1243c63f03 100644
+--- a/tools/ocaml/libs/xb/xs_ring_stubs.c
++++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
+@@ -112,7 +112,7 @@ CAMLprim value ml_interface_write(value ml_interface,
+ CAMLlocal1(ml_result);
+
+ struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
+- const unsigned char *buffer = Bytes_val(ml_buffer);
++ const char *buffer = String_val(ml_buffer);
+ int len = Int_val(ml_len);
+ int result;
+
+--
+2.40.0
+
diff --git a/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch b/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch
deleted file mode 100644
index 0c19560..0000000
--- a/0014-xen-arm-libxl-Implement-XEN_DOMCTL_shadow_op-for-Arm.patch
+++ /dev/null
@@ -1,108 +0,0 @@
-From 3a16da801e14b8ff996b6f7408391ce488abd925 Mon Sep 17 00:00:00 2001
-From: Henry Wang <Henry.Wang@arm.com>
-Date: Tue, 11 Oct 2022 14:55:40 +0200
-Subject: [PATCH 14/87] xen/arm, libxl: Implement XEN_DOMCTL_shadow_op for Arm
-
-This commit implements the `XEN_DOMCTL_shadow_op` support in Xen
-for Arm. The p2m pages pool size for xl guests is supposed to be
-determined by `XEN_DOMCTL_shadow_op`. Hence, this commit:
-
-- Introduces a function `p2m_domctl` and implements the subops
-`XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION` and
-`XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION` of `XEN_DOMCTL_shadow_op`.
-
-- Adds the `XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION` support in libxl.
-
-Therefore this enables setting the shadow memory pool size when
-creating a guest from xl and getting the shadow memory pool size
-from Xen.
-
-Note that the `XEN_DOMCTL_shadow_op` added in this commit is only
-a dummy op, and the functionality of setting/getting p2m memory pool
-size for xl guests will be added in following commits.
-
-This is part of CVE-2022-33747 / XSA-409.
-
-Signed-off-by: Henry Wang <Henry.Wang@arm.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-master commit: cf2a68d2ffbc3ce95e01449d46180bddb10d24a0
-master date: 2022-10-11 14:28:42 +0200
----
- tools/libs/light/libxl_arm.c | 12 ++++++++++++
- xen/arch/arm/domctl.c | 32 ++++++++++++++++++++++++++++++++
- 2 files changed, 44 insertions(+)
-
-diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
-index 73a95e83af24..22a0c561bbc6 100644
---- a/tools/libs/light/libxl_arm.c
-+++ b/tools/libs/light/libxl_arm.c
-@@ -131,6 +131,18 @@ int libxl__arch_domain_create(libxl__gc *gc,
- libxl__domain_build_state *state,
- uint32_t domid)
- {
-+ libxl_ctx *ctx = libxl__gc_owner(gc);
-+ unsigned int shadow_mb = DIV_ROUNDUP(d_config->b_info.shadow_memkb, 1024);
-+
-+ int r = xc_shadow_control(ctx->xch, domid,
-+ XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION,
-+ &shadow_mb, 0);
-+ if (r) {
-+ LOGED(ERROR, domid,
-+ "Failed to set %u MiB shadow allocation", shadow_mb);
-+ return ERROR_FAIL;
-+ }
-+
- return 0;
- }
-
-diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
-index 1baf25c3d98b..9bf72e693019 100644
---- a/xen/arch/arm/domctl.c
-+++ b/xen/arch/arm/domctl.c
-@@ -47,11 +47,43 @@ static int handle_vuart_init(struct domain *d,
- return rc;
- }
-
-+static long p2m_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
-+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
-+{
-+ if ( unlikely(d == current->domain) )
-+ {
-+ printk(XENLOG_ERR "Tried to do a p2m domctl op on itself.\n");
-+ return -EINVAL;
-+ }
-+
-+ if ( unlikely(d->is_dying) )
-+ {
-+ printk(XENLOG_ERR "Tried to do a p2m domctl op on dying domain %u\n",
-+ d->domain_id);
-+ return -EINVAL;
-+ }
-+
-+ switch ( sc->op )
-+ {
-+ case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
-+ return 0;
-+ case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
-+ return 0;
-+ default:
-+ {
-+ printk(XENLOG_ERR "Bad p2m domctl op %u\n", sc->op);
-+ return -EINVAL;
-+ }
-+ }
-+}
-+
- long arch_do_domctl(struct xen_domctl *domctl, struct domain *d,
- XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
- {
- switch ( domctl->cmd )
- {
-+ case XEN_DOMCTL_shadow_op:
-+ return p2m_domctl(d, &domctl->u.shadow_op, u_domctl);
- case XEN_DOMCTL_cacheflush:
- {
- gfn_t s = _gfn(domctl->u.cacheflush.start_pfn);
---
-2.37.4
-
diff --git a/0015-tools-oxenstored-validate-config-file-before-live-up.patch b/0015-tools-oxenstored-validate-config-file-before-live-up.patch
new file mode 100644
index 0000000..f65fbd6
--- /dev/null
+++ b/0015-tools-oxenstored-validate-config-file-before-live-up.patch
@@ -0,0 +1,131 @@
+From fd1c70442d3aa962be4d041d5f8fce9d2fa72ce1 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Tue, 11 May 2021 15:56:50 +0000
+Subject: [PATCH 15/61] tools/oxenstored: validate config file before live
+ update
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The configuration file can contain typos or various errors that could prevent
+live update from succeeding (e.g. a flag only valid on a different version).
+Unknown entries in the config file would normally be ignored on startup, so
+add a strict --config-test that live-update can use to check that the config file
+is valid *for the new binary*.
+
+For compatibility with old code running during live update, recognize
+--live --help as an equivalent to --config-test.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit e6f07052ce4a0f0b7d4dc522d87465efb2d9ee86)
+---
+ tools/ocaml/xenstored/parse_arg.ml | 26 ++++++++++++++++++++++++++
+ tools/ocaml/xenstored/xenstored.ml | 11 +++++++++--
+ 2 files changed, 35 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/parse_arg.ml b/tools/ocaml/xenstored/parse_arg.ml
+index 7c0478e76a..5e4ca6f1f7 100644
+--- a/tools/ocaml/xenstored/parse_arg.ml
++++ b/tools/ocaml/xenstored/parse_arg.ml
+@@ -26,8 +26,14 @@ type config =
+ restart: bool;
+ live_reload: bool;
+ disable_socket: bool;
++ config_test: bool;
+ }
+
++let get_config_filename config_file =
++ match config_file with
++ | Some name -> name
++ | None -> Define.default_config_dir ^ "/oxenstored.conf"
++
+ let do_argv =
+ let pidfile = ref "" and tracefile = ref "" (* old xenstored compatibility *)
+ and domain_init = ref true
+@@ -38,6 +44,8 @@ let do_argv =
+ and restart = ref false
+ and live_reload = ref false
+ and disable_socket = ref false
++ and config_test = ref false
++ and help = ref false
+ in
+
+ let speclist =
+@@ -55,10 +63,27 @@ let do_argv =
+ ("-T", Arg.Set_string tracefile, ""); (* for compatibility *)
+ ("--restart", Arg.Set restart, "Read database on starting");
+ ("--live", Arg.Set live_reload, "Read live dump on startup");
++ ("--config-test", Arg.Set config_test, "Test validity of config file");
+ ("--disable-socket", Arg.Unit (fun () -> disable_socket := true), "Disable socket");
++ ("--help", Arg.Set help, "Display this list of options")
+ ] in
+ let usage_msg = "usage : xenstored [--config-file <filename>] [--no-domain-init] [--help] [--no-fork] [--reraise-top-level] [--restart] [--disable-socket]" in
+ Arg.parse speclist (fun _ -> ()) usage_msg;
++ let () =
++ if !help then begin
++ if !live_reload then
++ (*
++ * Transform --live --help into --config-test for backward compat with
++ * running code during live update.
++ * Caller will validate config and exit
++ *)
++ config_test := true
++ else begin
++ Arg.usage_string speclist usage_msg |> print_endline;
++ exit 0
++ end
++ end
++ in
+ {
+ domain_init = !domain_init;
+ activate_access_log = !activate_access_log;
+@@ -70,4 +95,5 @@ let do_argv =
+ restart = !restart;
+ live_reload = !live_reload;
+ disable_socket = !disable_socket;
++ config_test = !config_test;
+ }
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index 4d5851c5cb..e2638a5af2 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -88,7 +88,7 @@ let default_pidfile = Paths.xen_run_dir ^ "/xenstored.pid"
+
+ let ring_scan_interval = ref 20
+
+-let parse_config filename =
++let parse_config ?(strict=false) filename =
+ let pidfile = ref default_pidfile in
+ let options = [
+ ("merge-activate", Config.Set_bool Transaction.do_coalesce);
+@@ -129,11 +129,12 @@ let parse_config filename =
+ ("xenstored-port", Config.Set_string Domains.xenstored_port); ] in
+ begin try Config.read filename options (fun _ _ -> raise Not_found)
+ with
+- | Config.Error err -> List.iter (fun (k, e) ->
++ | Config.Error err as e -> List.iter (fun (k, e) ->
+ match e with
+ | "unknown key" -> eprintf "config: unknown key %s\n" k
+ | _ -> eprintf "config: %s: %s\n" k e
+ ) err;
++ if strict then raise e
+ | Sys_error m -> eprintf "error: config: %s\n" m;
+ end;
+ !pidfile
+@@ -358,6 +359,12 @@ let tweak_gc () =
+ let () =
+ Printexc.set_uncaught_exception_handler Logging.fallback_exception_handler;
+ let cf = do_argv in
++ if cf.config_test then begin
++ let path = config_filename cf in
++ let _pidfile:string = parse_config ~strict:true path in
++ Printf.printf "Configuration valid at %s\n%!" path;
++ exit 0
++ end;
+ let pidfile =
+ if Sys.file_exists (config_filename cf) then
+ parse_config (config_filename cf)
+--
+2.40.0
+
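A self-contained OCaml sketch of the flag handling this patch introduces, with the config validation replaced by a placeholder print (the real code calls parse_config ~strict:true on the configured file):

let () =
  let live_reload = ref false
  and config_test = ref false
  and help = ref false in
  let speclist = [
    ("--live", Arg.Set live_reload, "Read live dump on startup");
    ("--config-test", Arg.Set config_test, "Test validity of config file");
    ("--help", Arg.Set help, "Display this list of options");
  ] in
  let usage_msg = "usage : xenstored [--config-test] [--live] [--help]" in
  Arg.parse speclist (fun _ -> ()) usage_msg;
  if !help then begin
    if !live_reload then
      (* Old code being live-updated may pass --live --help; treat it as
         --config-test so the new binary simply validates its config. *)
      config_test := true
    else begin
      Arg.usage_string speclist usage_msg |> print_endline;
      exit 0
    end
  end;
  if !config_test then begin
    (* Placeholder: oxenstored runs parse_config ~strict:true here. *)
    print_endline "Configuration valid";
    exit 0
  end
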
diff --git a/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch b/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch
deleted file mode 100644
index 7472b4b..0000000
--- a/0015-xen-arm-Allocate-and-free-P2M-pages-from-the-P2M-poo.patch
+++ /dev/null
@@ -1,289 +0,0 @@
-From 44e9dcc48b81bca202a5b31926125a6a59a4c72e Mon Sep 17 00:00:00 2001
-From: Henry Wang <Henry.Wang@arm.com>
-Date: Tue, 11 Oct 2022 14:55:53 +0200
-Subject: [PATCH 15/87] xen/arm: Allocate and free P2M pages from the P2M pool
-
-This commit sets up and tears down the p2m pages pool for non-privileged
-Arm guests by calling `p2m_set_allocation` and `p2m_teardown_allocation`.
-
-- For dom0, P2M pages should come from heap directly instead of p2m
-pool, so that the kernel may take advantage of the extended regions.
-
-- For xl guests, the setting of the p2m pool is called in
-`XEN_DOMCTL_shadow_op` and the p2m pool is destroyed in
-`domain_relinquish_resources`. Note that domctl->u.shadow_op.mb is
-updated with the new size when setting the p2m pool.
-
-- For dom0less domUs, the setting of the p2m pool is called before
-allocating memory during domain creation. Users can specify the p2m
-pool size by `xen,domain-p2m-mem-mb` dts property.
-
-To actually allocate/free pages from the p2m pool, this commit adds
-two helper functions namely `p2m_alloc_page` and `p2m_free_page` to
-`struct p2m_domain`. By replacing the `alloc_domheap_page` and
-`free_domheap_page` with these two helper functions, p2m pages can
-be added/removed from the list of p2m pool rather than from the heap.
-
-Since the page from `p2m_alloc_page` is already cleaned, take the
-opportunity to remove the redundant `clean_page` in `p2m_create_table`.
-
-This is part of CVE-2022-33747 / XSA-409.
-
-Signed-off-by: Henry Wang <Henry.Wang@arm.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-master commit: cbea5a1149ca7fd4b7cdbfa3ec2e4f109b601ff7
-master date: 2022-10-11 14:28:44 +0200
----
- docs/misc/arm/device-tree/booting.txt | 8 ++++
- xen/arch/arm/domain.c | 6 +++
- xen/arch/arm/domain_build.c | 29 ++++++++++++++
- xen/arch/arm/domctl.c | 23 ++++++++++-
- xen/arch/arm/p2m.c | 57 +++++++++++++++++++++++++--
- 5 files changed, 118 insertions(+), 5 deletions(-)
-
-diff --git a/docs/misc/arm/device-tree/booting.txt b/docs/misc/arm/device-tree/booting.txt
-index 71895663a4de..d92ccc56ffe0 100644
---- a/docs/misc/arm/device-tree/booting.txt
-+++ b/docs/misc/arm/device-tree/booting.txt
-@@ -182,6 +182,14 @@ with the following properties:
- Both #address-cells and #size-cells need to be specified because
- both sub-nodes (described shortly) have reg properties.
-
-+- xen,domain-p2m-mem-mb
-+
-+ Optional. A 32-bit integer specifying the amount of megabytes of RAM
-+ used for the domain P2M pool. This is in-sync with the shadow_memory
-+ option in xl.cfg. Leaving this field empty in device tree will lead to
-+ the default size of domain P2M pool, i.e. 1MB per guest vCPU plus 4KB
-+ per MB of guest RAM plus 512KB for guest extended regions.
-+
- Under the "xen,domain" compatible node, one or more sub-nodes are present
- for the DomU kernel and ramdisk.
-
-diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
-index 2694c39127c5..a818f33a1afa 100644
---- a/xen/arch/arm/domain.c
-+++ b/xen/arch/arm/domain.c
-@@ -997,6 +997,7 @@ enum {
- PROG_page,
- PROG_mapping,
- PROG_p2m,
-+ PROG_p2m_pool,
- PROG_done,
- };
-
-@@ -1062,6 +1063,11 @@ int domain_relinquish_resources(struct domain *d)
- if ( ret )
- return ret;
-
-+ PROGRESS(p2m_pool):
-+ ret = p2m_teardown_allocation(d);
-+ if( ret )
-+ return ret;
-+
- PROGRESS(done):
- break;
-
-diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
-index d02bacbcd1ed..8aec3755ca5d 100644
---- a/xen/arch/arm/domain_build.c
-+++ b/xen/arch/arm/domain_build.c
-@@ -2833,6 +2833,21 @@ static void __init find_gnttab_region(struct domain *d,
- kinfo->gnttab_start, kinfo->gnttab_start + kinfo->gnttab_size);
- }
-
-+static unsigned long __init domain_p2m_pages(unsigned long maxmem_kb,
-+ unsigned int smp_cpus)
-+{
-+ /*
-+ * Keep in sync with libxl__get_required_paging_memory().
-+ * 256 pages (1MB) per vcpu, plus 1 page per MiB of RAM for the P2M map,
-+ * plus 128 pages to cover extended regions.
-+ */
-+ unsigned long memkb = 4 * (256 * smp_cpus + (maxmem_kb / 1024) + 128);
-+
-+ BUILD_BUG_ON(PAGE_SIZE != SZ_4K);
-+
-+ return DIV_ROUND_UP(memkb, 1024) << (20 - PAGE_SHIFT);
-+}
-+
- static int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
- {
- unsigned int i;
-@@ -2924,6 +2939,8 @@ static int __init construct_domU(struct domain *d,
- struct kernel_info kinfo = {};
- int rc;
- u64 mem;
-+ u32 p2m_mem_mb;
-+ unsigned long p2m_pages;
-
- rc = dt_property_read_u64(node, "memory", &mem);
- if ( !rc )
-@@ -2933,6 +2950,18 @@ static int __init construct_domU(struct domain *d,
- }
- kinfo.unassigned_mem = (paddr_t)mem * SZ_1K;
-
-+ rc = dt_property_read_u32(node, "xen,domain-p2m-mem-mb", &p2m_mem_mb);
-+ /* If xen,domain-p2m-mem-mb is not specified, use the default value. */
-+ p2m_pages = rc ?
-+ p2m_mem_mb << (20 - PAGE_SHIFT) :
-+ domain_p2m_pages(mem, d->max_vcpus);
-+
-+ spin_lock(&d->arch.paging.lock);
-+ rc = p2m_set_allocation(d, p2m_pages, NULL);
-+ spin_unlock(&d->arch.paging.lock);
-+ if ( rc != 0 )
-+ return rc;
-+
- printk("*** LOADING DOMU cpus=%u memory=%"PRIx64"KB ***\n", d->max_vcpus, mem);
-
- kinfo.vpl011 = dt_property_read_bool(node, "vpl011");
-diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
-index 9bf72e693019..c8fdeb124084 100644
---- a/xen/arch/arm/domctl.c
-+++ b/xen/arch/arm/domctl.c
-@@ -50,6 +50,9 @@ static int handle_vuart_init(struct domain *d,
- static long p2m_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
- XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
- {
-+ long rc;
-+ bool preempted = false;
-+
- if ( unlikely(d == current->domain) )
- {
- printk(XENLOG_ERR "Tried to do a p2m domctl op on itself.\n");
-@@ -66,9 +69,27 @@ static long p2m_domctl(struct domain *d, struct xen_domctl_shadow_op *sc,
- switch ( sc->op )
- {
- case XEN_DOMCTL_SHADOW_OP_SET_ALLOCATION:
-- return 0;
-+ {
-+ /* Allow and handle preemption */
-+ spin_lock(&d->arch.paging.lock);
-+ rc = p2m_set_allocation(d, sc->mb << (20 - PAGE_SHIFT), &preempted);
-+ spin_unlock(&d->arch.paging.lock);
-+
-+ if ( preempted )
-+ /* Not finished. Set up to re-run the call. */
-+ rc = hypercall_create_continuation(__HYPERVISOR_domctl, "h",
-+ u_domctl);
-+ else
-+ /* Finished. Return the new allocation. */
-+ sc->mb = p2m_get_allocation(d);
-+
-+ return rc;
-+ }
- case XEN_DOMCTL_SHADOW_OP_GET_ALLOCATION:
-+ {
-+ sc->mb = p2m_get_allocation(d);
- return 0;
-+ }
- default:
- {
- printk(XENLOG_ERR "Bad p2m domctl op %u\n", sc->op);
-diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
-index d8957dd8727c..b2d856a801af 100644
---- a/xen/arch/arm/p2m.c
-+++ b/xen/arch/arm/p2m.c
-@@ -50,6 +50,54 @@ static uint64_t generate_vttbr(uint16_t vmid, mfn_t root_mfn)
- return (mfn_to_maddr(root_mfn) | ((uint64_t)vmid << 48));
- }
-
-+static struct page_info *p2m_alloc_page(struct domain *d)
-+{
-+ struct page_info *pg;
-+
-+ spin_lock(&d->arch.paging.lock);
-+ /*
-+ * For hardware domain, there should be no limit in the number of pages that
-+ * can be allocated, so that the kernel may take advantage of the extended
-+ * regions. Hence, allocate p2m pages for hardware domains from heap.
-+ */
-+ if ( is_hardware_domain(d) )
-+ {
-+ pg = alloc_domheap_page(NULL, 0);
-+ if ( pg == NULL )
-+ {
-+ printk(XENLOG_G_ERR "Failed to allocate P2M pages for hwdom.\n");
-+ spin_unlock(&d->arch.paging.lock);
-+ return NULL;
-+ }
-+ }
-+ else
-+ {
-+ pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
-+ if ( unlikely(!pg) )
-+ {
-+ spin_unlock(&d->arch.paging.lock);
-+ return NULL;
-+ }
-+ d->arch.paging.p2m_total_pages--;
-+ }
-+ spin_unlock(&d->arch.paging.lock);
-+
-+ return pg;
-+}
-+
-+static void p2m_free_page(struct domain *d, struct page_info *pg)
-+{
-+ spin_lock(&d->arch.paging.lock);
-+ if ( is_hardware_domain(d) )
-+ free_domheap_page(pg);
-+ else
-+ {
-+ d->arch.paging.p2m_total_pages++;
-+ page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
-+ }
-+ spin_unlock(&d->arch.paging.lock);
-+}
-+
- /* Return the size of the pool, rounded up to the nearest MB */
- unsigned int p2m_get_allocation(struct domain *d)
- {
-@@ -751,7 +799,7 @@ static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry)
-
- ASSERT(!p2m_is_valid(*entry));
-
-- page = alloc_domheap_page(NULL, 0);
-+ page = p2m_alloc_page(p2m->domain);
- if ( page == NULL )
- return -ENOMEM;
-
-@@ -878,7 +926,7 @@ static void p2m_free_entry(struct p2m_domain *p2m,
- pg = mfn_to_page(mfn);
-
- page_list_del(pg, &p2m->pages);
-- free_domheap_page(pg);
-+ p2m_free_page(p2m->domain, pg);
- }
-
- static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
-@@ -902,7 +950,7 @@ static bool p2m_split_superpage(struct p2m_domain *p2m, lpae_t *entry,
- ASSERT(level < target);
- ASSERT(p2m_is_superpage(*entry, level));
-
-- page = alloc_domheap_page(NULL, 0);
-+ page = p2m_alloc_page(p2m->domain);
- if ( !page )
- return false;
-
-@@ -1641,7 +1689,7 @@ int p2m_teardown(struct domain *d)
-
- while ( (pg = page_list_remove_head(&p2m->pages)) )
- {
-- free_domheap_page(pg);
-+ p2m_free_page(p2m->domain, pg);
- count++;
- /* Arbitrarily preempt every 512 iterations */
- if ( !(count % 512) && hypercall_preempt_check() )
-@@ -1665,6 +1713,7 @@ void p2m_final_teardown(struct domain *d)
- return;
-
- ASSERT(page_list_empty(&p2m->pages));
-+ ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
-
- if ( p2m->root )
- free_domheap_pages(p2m->root, P2M_ROOT_ORDER);
---
-2.37.4
-
diff --git a/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch b/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch
deleted file mode 100644
index dfb46a9..0000000
--- a/0016-gnttab-correct-locking-on-transitive-grant-copy-erro.patch
+++ /dev/null
@@ -1,66 +0,0 @@
-From 32cb81501c8b858fe9a451650804ec3024a8b364 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 11 Oct 2022 14:56:29 +0200
-Subject: [PATCH 16/87] gnttab: correct locking on transitive grant copy error
- path
-
-While the comment next to the lock dropping in preparation for
-recursively calling acquire_grant_for_copy() mistakenly talks about the
-rd == td case (excluded a few lines further up), the same concerns apply
-to the calling of release_grant_for_copy() on a subsequent error path.
-
-This is CVE-2022-33748 / XSA-411.
-
-Fixes: ad48fb963dbf ("gnttab: fix transitive grant handling")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-master commit: 6e3aab858eef614a21a782a3b73acc88e74690ea
-master date: 2022-10-11 14:29:30 +0200
----
- xen/common/grant_table.c | 19 ++++++++++++++++---
- 1 file changed, 16 insertions(+), 3 deletions(-)
-
-diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
-index 4c742cd8fe81..d8ca645b96ff 100644
---- a/xen/common/grant_table.c
-+++ b/xen/common/grant_table.c
-@@ -2613,9 +2613,8 @@ acquire_grant_for_copy(
- trans_domid);
-
- /*
-- * acquire_grant_for_copy() could take the lock on the
-- * remote table (if rd == td), so we have to drop the lock
-- * here and reacquire.
-+ * acquire_grant_for_copy() will take the lock on the remote table,
-+ * so we have to drop the lock here and reacquire.
- */
- active_entry_release(act);
- grant_read_unlock(rgt);
-@@ -2652,11 +2651,25 @@ acquire_grant_for_copy(
- act->trans_gref != trans_gref ||
- !act->is_sub_page)) )
- {
-+ /*
-+ * Like above for acquire_grant_for_copy() we need to drop and then
-+ * re-acquire the locks here to prevent lock order inversion issues.
-+ * Unlike for acquire_grant_for_copy() we don't need to re-check
-+ * anything, as release_grant_for_copy() doesn't depend on the grant
-+ * table entry: It only updates internal state and the status flags.
-+ */
-+ active_entry_release(act);
-+ grant_read_unlock(rgt);
-+
- release_grant_for_copy(td, trans_gref, readonly);
- rcu_unlock_domain(td);
-+
-+ grant_read_lock(rgt);
-+ act = active_entry_acquire(rgt, gref);
- reduce_status_for_pin(rd, act, status, readonly);
- active_entry_release(act);
- grant_read_unlock(rgt);
-+
- put_page(*page);
- *page = NULL;
- return ERESTART;
---
-2.37.4
-
diff --git a/0016-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch b/0016-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
new file mode 100644
index 0000000..a64d657
--- /dev/null
+++ b/0016-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
@@ -0,0 +1,61 @@
+From 552e5f28d411c1a1a92f2fd3592a76e74f47610b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
+Date: Thu, 12 Jan 2023 11:28:29 +0000
+Subject: [PATCH 16/61] tools/ocaml/libs: Don't declare stubs as taking void
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There is no such thing as an Ocaml function (C stub or otherwise) taking no
+parameters. In the absence of any other parameters, unit is still passed.
+
+This doesn't explode with any ABI we care about, but would malfunction for an
+ABI environment such as stdcall.
+
+Fixes: c3afd398ba7f ("ocaml: Add XS bindings.")
+Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
+Signed-off-by: Edwin Török <edwin.torok@cloud.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit ff8b560be80b9211c303d74df7e4b3921d2bb8ca)
+---
+ tools/ocaml/libs/xb/xenbus_stubs.c | 5 ++---
+ tools/ocaml/libs/xc/xenctrl_stubs.c | 4 ++--
+ 2 files changed, 4 insertions(+), 5 deletions(-)
+
+diff --git a/tools/ocaml/libs/xb/xenbus_stubs.c b/tools/ocaml/libs/xb/xenbus_stubs.c
+index 3065181a55..97116b0782 100644
+--- a/tools/ocaml/libs/xb/xenbus_stubs.c
++++ b/tools/ocaml/libs/xb/xenbus_stubs.c
+@@ -30,10 +30,9 @@
+ #include <xenctrl.h>
+ #include <xen/io/xs_wire.h>
+
+-CAMLprim value stub_header_size(void)
++CAMLprim value stub_header_size(value unit)
+ {
+- CAMLparam0();
+- CAMLreturn(Val_int(sizeof(struct xsd_sockmsg)));
++ return Val_int(sizeof(struct xsd_sockmsg));
+ }
+
+ CAMLprim value stub_header_of_string(value s)
+diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
+index 5b4fe72c8d..434fc0345b 100644
+--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
++++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
+@@ -67,9 +67,9 @@ static void Noreturn failwith_xc(xc_interface *xch)
+ caml_raise_with_string(*caml_named_value("xc.error"), error_str);
+ }
+
+-CAMLprim value stub_xc_interface_open(void)
++CAMLprim value stub_xc_interface_open(value unit)
+ {
+- CAMLparam0();
++ CAMLparam1(unit);
+ xc_interface *xch;
+
+ /* Don't assert XC_OPENFLAG_NON_REENTRANT because these bindings
+--
+2.40.0
+
diff --git a/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch b/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch
deleted file mode 100644
index 8133c53..0000000
--- a/0017-tools-libxl-Replace-deprecated-soundhw-on-QEMU-comma.patch
+++ /dev/null
@@ -1,112 +0,0 @@
-From e85e2a3c17b6cd38de041cdaf14d9efdcdabad1a Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 11 Oct 2022 14:59:10 +0200
-Subject: [PATCH 17/87] tools/libxl: Replace deprecated -soundhw on QEMU
- command line
-
--soundhw has been deprecated since 825ff02911c9 ("audio: add soundhw
-deprecation notice"), QEMU v5.1, and is being removed in the upcoming v7.1
-by 039a68373c45 ("introduce -audio as a replacement for -soundhw").
-
-Instead we can just add the sound card with "-device", for most options
-that "-soundhw" could handle. "-device" is an option that existed
-before QEMU 1.0, and could already be used to add audio hardware.
-
-The list of possible options for libxl's "soundhw" is taken from the
-list in QEMU 7.0.
-
-The options for "soundhw" are listed in order of preference in
-the manual. The first three (hda, ac97, es1370) are PCI devices and
-easy to test on Linux, and the last four are ISA devices which don't
-seem to work out of the box on Linux.
-
-The sound card 'pcspk' isn't listed even though it used to be accepted by
-'-soundhw', because QEMU crashes when trying to add it to a Xen domain.
-Also, it wouldn't work with "-device"; it might need to be "-machine
-pcspk-audiodev=default" instead.
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
-master commit: 62ca138c2c052187783aca3957d3f47c4dcfd683
-master date: 2022-08-18 09:25:50 +0200
----
- docs/man/xl.cfg.5.pod.in | 6 +++---
- tools/libs/light/libxl_dm.c | 19 ++++++++++++++++++-
- tools/libs/light/libxl_types_internal.idl | 10 ++++++++++
- 3 files changed, 31 insertions(+), 4 deletions(-)
-
-diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
-index eda1e77ebd06..ab7541f22c3e 100644
---- a/docs/man/xl.cfg.5.pod.in
-+++ b/docs/man/xl.cfg.5.pod.in
-@@ -2545,9 +2545,9 @@ The form serial=DEVICE is also accepted for backwards compatibility.
-
- =item B<soundhw="DEVICE">
-
--Select the virtual sound card to expose to the guest. The valid
--devices are defined by the device model configuration, please see the
--B<qemu(1)> manpage for details. The default is not to export any sound
-+Select the virtual sound card to expose to the guest. The valid devices are
-+B<hda>, B<ac97>, B<es1370>, B<adlib>, B<cs4231a>, B<gus>, B<sb16> if there are
-+available with the device model QEMU. The default is not to export any sound
- device.
-
- =item B<vkb_device=BOOLEAN>
-diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
-index 04bf5d85632e..fc264a3a13a6 100644
---- a/tools/libs/light/libxl_dm.c
-+++ b/tools/libs/light/libxl_dm.c
-@@ -1204,6 +1204,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
- uint64_t ram_size;
- const char *path, *chardev;
- bool is_stubdom = libxl_defbool_val(b_info->device_model_stubdomain);
-+ int rc;
-
- dm_args = flexarray_make(gc, 16, 1);
- dm_envs = flexarray_make(gc, 16, 1);
-@@ -1531,7 +1532,23 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
- }
- }
- if (b_info->u.hvm.soundhw) {
-- flexarray_vappend(dm_args, "-soundhw", b_info->u.hvm.soundhw, NULL);
-+ libxl__qemu_soundhw soundhw;
-+
-+ rc = libxl__qemu_soundhw_from_string(b_info->u.hvm.soundhw, &soundhw);
-+ if (rc) {
-+ LOGD(ERROR, guest_domid, "Unknown soundhw option '%s'", b_info->u.hvm.soundhw);
-+ return ERROR_INVAL;
-+ }
-+
-+ switch (soundhw) {
-+ case LIBXL__QEMU_SOUNDHW_HDA:
-+ flexarray_vappend(dm_args, "-device", "intel-hda",
-+ "-device", "hda-duplex", NULL);
-+ break;
-+ default:
-+ flexarray_append_pair(dm_args, "-device",
-+ (char*)libxl__qemu_soundhw_to_string(soundhw));
-+ }
- }
- if (!libxl__acpi_defbool_val(b_info)) {
- flexarray_append(dm_args, "-no-acpi");
-diff --git a/tools/libs/light/libxl_types_internal.idl b/tools/libs/light/libxl_types_internal.idl
-index 3593e21dbb64..caa08d3229cd 100644
---- a/tools/libs/light/libxl_types_internal.idl
-+++ b/tools/libs/light/libxl_types_internal.idl
-@@ -55,3 +55,13 @@ libxl__device_action = Enumeration("device_action", [
- (1, "ADD"),
- (2, "REMOVE"),
- ])
-+
-+libxl__qemu_soundhw = Enumeration("qemu_soundhw", [
-+ (1, "ac97"),
-+ (2, "adlib"),
-+ (3, "cs4231a"),
-+ (4, "es1370"),
-+ (5, "gus"),
-+ (6, "hda"),
-+ (7, "sb16"),
-+ ])
---
-2.37.4
-
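For reference, a minimal illustration of the soundhw handling described by the
(removed) patch above, reconstructed from its libxl_dm.c hunk; the values shown
are examples only:

    # xl domain configuration
    soundhw = "hda"

    # device-model arguments libxl emits for "hda" (per the hunk above)
    -device intel-hda -device hda-duplex

    # every other accepted value is passed through as-is, e.g. "ac97" becomes
    -device ac97
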
diff --git a/0017-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch b/0017-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
new file mode 100644
index 0000000..9fa8d08
--- /dev/null
+++ b/0017-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
@@ -0,0 +1,80 @@
+From 6d66fb984cc768406158353cabf9a55652b0dea7 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 31 Jan 2023 10:59:42 +0000
+Subject: [PATCH 17/61] tools/ocaml/libs: Allocate the correct amount of memory
+ for Abstract_tag
+
+caml_alloc() takes units of Wsize (word size), not bytes. As a consequence,
+we're allocating 4 or 8 times too much memory.
+
+Ocaml has a helper, Wsize_bsize(), but it truncates cases which aren't an
+exact multiple. Use a BUILD_BUG_ON() to cover the potential for truncation,
+as there's no rounding-up form of the helper.
+
+Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
+Fixes: d3e649277a13 ("ocaml: add mmap bindings implementation.")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 36eb2de31b6ecb8787698fb1a701bd708c8971b2)
+---
+ tools/ocaml/libs/mmap/Makefile | 2 ++
+ tools/ocaml/libs/mmap/xenmmap_stubs.c | 6 +++++-
+ tools/ocaml/libs/xc/xenctrl_stubs.c | 5 ++++-
+ 3 files changed, 11 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/libs/mmap/Makefile b/tools/ocaml/libs/mmap/Makefile
+index df45819df5..a3bd75e33a 100644
+--- a/tools/ocaml/libs/mmap/Makefile
++++ b/tools/ocaml/libs/mmap/Makefile
+@@ -2,6 +2,8 @@ TOPLEVEL=$(CURDIR)/../..
+ XEN_ROOT=$(TOPLEVEL)/../..
+ include $(TOPLEVEL)/common.make
+
++CFLAGS += $(CFLAGS_xeninclude)
++
+ OBJS = xenmmap
+ INTF = $(foreach obj, $(OBJS),$(obj).cmi)
+ LIBS = xenmmap.cma xenmmap.cmxa
+diff --git a/tools/ocaml/libs/mmap/xenmmap_stubs.c b/tools/ocaml/libs/mmap/xenmmap_stubs.c
+index e03951d781..d623ad390e 100644
+--- a/tools/ocaml/libs/mmap/xenmmap_stubs.c
++++ b/tools/ocaml/libs/mmap/xenmmap_stubs.c
+@@ -21,6 +21,8 @@
+ #include <errno.h>
+ #include "mmap_stubs.h"
+
++#include <xen-tools/libs.h>
++
+ #include <caml/mlvalues.h>
+ #include <caml/memory.h>
+ #include <caml/alloc.h>
+@@ -59,7 +61,9 @@ CAMLprim value stub_mmap_init(value fd, value pflag, value mflag,
+ default: caml_invalid_argument("maptype");
+ }
+
+- result = caml_alloc(sizeof(struct mmap_interface), Abstract_tag);
++ BUILD_BUG_ON((sizeof(struct mmap_interface) % sizeof(value)) != 0);
++ result = caml_alloc(Wsize_bsize(sizeof(struct mmap_interface)),
++ Abstract_tag);
+
+ if (mmap_interface_init(Intf_val(result), Int_val(fd),
+ c_pflag, c_mflag,
+diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
+index 434fc0345b..ec64341a9a 100644
+--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
++++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
+@@ -940,7 +940,10 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
+ uint32_t c_dom;
+ unsigned long c_mfn;
+
+- result = caml_alloc(sizeof(struct mmap_interface), Abstract_tag);
++ BUILD_BUG_ON((sizeof(struct mmap_interface) % sizeof(value)) != 0);
++ result = caml_alloc(Wsize_bsize(sizeof(struct mmap_interface)),
++ Abstract_tag);
++
+ intf = (struct mmap_interface *) result;
+
+ intf->len = Int_val(size);
+--
+2.40.0
+
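As a reading aid for the patch above, a condensed sketch of the sizing rule it
relies on. It assumes the same two-field mmap_interface layout used by the
stubs; the helper functions are illustrative, not part of the patch:

    #include <caml/mlvalues.h>
    #include <caml/alloc.h>

    struct mmap_interface { void *addr; int len; };

    /* Buggy form: passes a byte count where caml_alloc() expects a word
     * count, so the block ends up sizeof(value) times too large. */
    static value alloc_intf_bytes(void)
    {
        return caml_alloc(sizeof(struct mmap_interface), Abstract_tag);
    }

    /* Fixed form: Wsize_bsize() converts bytes to words.  It truncates,
     * so the structure size must be an exact multiple of sizeof(value),
     * which is exactly what the BUILD_BUG_ON() in the patch asserts. */
    static value alloc_intf_words(void)
    {
        return caml_alloc(Wsize_bsize(sizeof(struct mmap_interface)),
                          Abstract_tag);
    }
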
diff --git a/0018-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch b/0018-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
new file mode 100644
index 0000000..8e1c860
--- /dev/null
+++ b/0018-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
@@ -0,0 +1,213 @@
+From e18faeb91e620624106b94c8821f8c9574eddb17 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
+Date: Thu, 12 Jan 2023 17:48:29 +0000
+Subject: [PATCH 18/61] tools/ocaml/evtchn: Don't reference Custom objects with
+ the GC lock released
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The modification to the _H() macro for Ocaml 5 support introduced a subtle
+bug. From the manual:
+
+ https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code
+
+"After caml_release_runtime_system() was called and until
+caml_acquire_runtime_system() is called, the C code must not access any OCaml
+data, nor call any function of the run-time system, nor call back into OCaml
+code."
+
+Previously, the value was a naked C pointer, so dereferencing it wasn't
+"accessing any Ocaml data", but the fix to avoid naked C pointers added a
+layer of indirection through an Ocaml Custom object, meaning that the common
+pattern of using _H() in a blocking section is unsafe.
+
+In order to fix:
+
+ * Drop the _H() macro and replace it with a static inline xce_of_val().
+ * Opencode the assignment into Data_custom_val() in the two constructors.
+ * Rename "value xce" parameters to "value xce_val" so we can consistently
+ have "xenevtchn_handle *xce" on the stack, and obtain the pointer with the
+ GC lock still held.
+
+Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
+Signed-off-by: Edwin Török <edwin.torok@cloud.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 2636d8ff7a670c4d2485757dbe966e36c259a960)
+---
+ tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 60 +++++++++++--------
+ 1 file changed, 35 insertions(+), 25 deletions(-)
+
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+index aa8a69cc1e..d7881ca95f 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
++++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+@@ -33,11 +33,14 @@
+ #include <caml/fail.h>
+ #include <caml/signals.h>
+
+-#define _H(__h) (*((xenevtchn_handle **)Data_custom_val(__h)))
++static inline xenevtchn_handle *xce_of_val(value v)
++{
++ return *(xenevtchn_handle **)Data_custom_val(v);
++}
+
+ static void stub_evtchn_finalize(value v)
+ {
+- xenevtchn_close(_H(v));
++ xenevtchn_close(xce_of_val(v));
+ }
+
+ static struct custom_operations xenevtchn_ops = {
+@@ -68,7 +71,7 @@ CAMLprim value stub_eventchn_init(value cloexec)
+ caml_failwith("open failed");
+
+ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
+- _H(result) = xce;
++ *(xenevtchn_handle **)Data_custom_val(result) = xce;
+
+ CAMLreturn(result);
+ }
+@@ -87,18 +90,19 @@ CAMLprim value stub_eventchn_fdopen(value fdval)
+ caml_failwith("evtchn fdopen failed");
+
+ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
+- _H(result) = xce;
++ *(xenevtchn_handle **)Data_custom_val(result) = xce;
+
+ CAMLreturn(result);
+ }
+
+-CAMLprim value stub_eventchn_fd(value xce)
++CAMLprim value stub_eventchn_fd(value xce_val)
+ {
+- CAMLparam1(xce);
++ CAMLparam1(xce_val);
+ CAMLlocal1(result);
++ xenevtchn_handle *xce = xce_of_val(xce_val);
+ int fd;
+
+- fd = xenevtchn_fd(_H(xce));
++ fd = xenevtchn_fd(xce);
+ if (fd == -1)
+ caml_failwith("evtchn fd failed");
+
+@@ -107,13 +111,14 @@ CAMLprim value stub_eventchn_fd(value xce)
+ CAMLreturn(result);
+ }
+
+-CAMLprim value stub_eventchn_notify(value xce, value port)
++CAMLprim value stub_eventchn_notify(value xce_val, value port)
+ {
+- CAMLparam2(xce, port);
++ CAMLparam2(xce_val, port);
++ xenevtchn_handle *xce = xce_of_val(xce_val);
+ int rc;
+
+ caml_enter_blocking_section();
+- rc = xenevtchn_notify(_H(xce), Int_val(port));
++ rc = xenevtchn_notify(xce, Int_val(port));
+ caml_leave_blocking_section();
+
+ if (rc == -1)
+@@ -122,15 +127,16 @@ CAMLprim value stub_eventchn_notify(value xce, value port)
+ CAMLreturn(Val_unit);
+ }
+
+-CAMLprim value stub_eventchn_bind_interdomain(value xce, value domid,
++CAMLprim value stub_eventchn_bind_interdomain(value xce_val, value domid,
+ value remote_port)
+ {
+- CAMLparam3(xce, domid, remote_port);
++ CAMLparam3(xce_val, domid, remote_port);
+ CAMLlocal1(port);
++ xenevtchn_handle *xce = xce_of_val(xce_val);
+ xenevtchn_port_or_error_t rc;
+
+ caml_enter_blocking_section();
+- rc = xenevtchn_bind_interdomain(_H(xce), Int_val(domid), Int_val(remote_port));
++ rc = xenevtchn_bind_interdomain(xce, Int_val(domid), Int_val(remote_port));
+ caml_leave_blocking_section();
+
+ if (rc == -1)
+@@ -140,14 +146,15 @@ CAMLprim value stub_eventchn_bind_interdomain(value xce, value domid,
+ CAMLreturn(port);
+ }
+
+-CAMLprim value stub_eventchn_bind_virq(value xce, value virq_type)
++CAMLprim value stub_eventchn_bind_virq(value xce_val, value virq_type)
+ {
+- CAMLparam2(xce, virq_type);
++ CAMLparam2(xce_val, virq_type);
+ CAMLlocal1(port);
++ xenevtchn_handle *xce = xce_of_val(xce_val);
+ xenevtchn_port_or_error_t rc;
+
+ caml_enter_blocking_section();
+- rc = xenevtchn_bind_virq(_H(xce), Int_val(virq_type));
++ rc = xenevtchn_bind_virq(xce, Int_val(virq_type));
+ caml_leave_blocking_section();
+
+ if (rc == -1)
+@@ -157,13 +164,14 @@ CAMLprim value stub_eventchn_bind_virq(value xce, value virq_type)
+ CAMLreturn(port);
+ }
+
+-CAMLprim value stub_eventchn_unbind(value xce, value port)
++CAMLprim value stub_eventchn_unbind(value xce_val, value port)
+ {
+- CAMLparam2(xce, port);
++ CAMLparam2(xce_val, port);
++ xenevtchn_handle *xce = xce_of_val(xce_val);
+ int rc;
+
+ caml_enter_blocking_section();
+- rc = xenevtchn_unbind(_H(xce), Int_val(port));
++ rc = xenevtchn_unbind(xce, Int_val(port));
+ caml_leave_blocking_section();
+
+ if (rc == -1)
+@@ -172,14 +180,15 @@ CAMLprim value stub_eventchn_unbind(value xce, value port)
+ CAMLreturn(Val_unit);
+ }
+
+-CAMLprim value stub_eventchn_pending(value xce)
++CAMLprim value stub_eventchn_pending(value xce_val)
+ {
+- CAMLparam1(xce);
++ CAMLparam1(xce_val);
+ CAMLlocal1(result);
++ xenevtchn_handle *xce = xce_of_val(xce_val);
+ xenevtchn_port_or_error_t port;
+
+ caml_enter_blocking_section();
+- port = xenevtchn_pending(_H(xce));
++ port = xenevtchn_pending(xce);
+ caml_leave_blocking_section();
+
+ if (port == -1)
+@@ -189,16 +198,17 @@ CAMLprim value stub_eventchn_pending(value xce)
+ CAMLreturn(result);
+ }
+
+-CAMLprim value stub_eventchn_unmask(value xce, value _port)
++CAMLprim value stub_eventchn_unmask(value xce_val, value _port)
+ {
+- CAMLparam2(xce, _port);
++ CAMLparam2(xce_val, _port);
++ xenevtchn_handle *xce = xce_of_val(xce_val);
+ evtchn_port_t port;
+ int rc;
+
+ port = Int_val(_port);
+
+ caml_enter_blocking_section();
+- rc = xenevtchn_unmask(_H(xce), port);
++ rc = xenevtchn_unmask(xce, port);
+ caml_leave_blocking_section();
+
+ if (rc)
+--
+2.40.0
+
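The rule enforced by the patch above reduces to a small pattern, sketched here
with illustrative names (xce_of_val mirrors the accessor the patch introduces;
this is not a drop-in stub):

    #include <caml/mlvalues.h>
    #include <caml/memory.h>
    #include <caml/custom.h>
    #include <caml/fail.h>
    #include <caml/signals.h>
    #include <xenevtchn.h>

    static inline xenevtchn_handle *xce_of_val(value v)
    {
        return *(xenevtchn_handle **)Data_custom_val(v);
    }

    CAMLprim value stub_notify_sketch(value xce_val, value port)
    {
        CAMLparam2(xce_val, port);
        /* Read everything needed from OCaml values while the runtime
         * lock is still held ... */
        xenevtchn_handle *xce = xce_of_val(xce_val);
        int p = Int_val(port);
        int rc;

        /* ... then drop the lock.  Until it is re-acquired, no OCaml
         * data may be touched: the GC may move or collect it. */
        caml_enter_blocking_section();
        rc = xenevtchn_notify(xce, p);
        caml_leave_blocking_section();

        if (rc == -1)
            caml_failwith("notify failed");

        CAMLreturn(Val_unit);
    }
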
diff --git a/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch b/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch
deleted file mode 100644
index 5fc8919..0000000
--- a/0018-x86-CPUID-surface-suitable-value-in-EBX-of-XSTATE-su.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From e8882bcfe35520e950ba60acd6e67e65f1ce90a8 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 11 Oct 2022 14:59:26 +0200
-Subject: [PATCH 18/87] x86/CPUID: surface suitable value in EBX of XSTATE
- subleaf 1
-
-While the SDM isn't very clear about this, our present behavior makes
-Linux 5.19 unhappy. As of commit 8ad7e8f69695 ("x86/fpu/xsave: Support
-XSAVEC in the kernel") they're using this CPUID output also to size
-the compacted area used by XSAVEC. Getting back zero there isn't really
-liked, yet for PV that's the default on capable hardware: XSAVES isn't
-exposed to PV domains.
-
-Considering that the size reported is that of the compacted save area,
-I view Linux's assumption as appropriate (short of the SDM properly
-considering the case). Therefore we need to populate the field also when
-only XSAVEC is supported for a guest.
-
-Fixes: 460b9a4b3630 ("x86/xsaves: enable xsaves/xrstors for hvm guest")
-Fixes: 8d050ed1097c ("x86: don't expose XSAVES capability to PV guests")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: c3bd0b83ea5b7c0da6542687436042eeea1e7909
-master date: 2022-08-24 14:23:59 +0200
----
- xen/arch/x86/cpuid.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
-index ff335f16390d..a647331f4793 100644
---- a/xen/arch/x86/cpuid.c
-+++ b/xen/arch/x86/cpuid.c
-@@ -1060,7 +1060,7 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
- switch ( subleaf )
- {
- case 1:
-- if ( p->xstate.xsaves )
-+ if ( p->xstate.xsavec || p->xstate.xsaves )
- {
- /*
- * TODO: Figure out what to do for XSS state. VT-x manages
---
-2.37.4
-
diff --git a/0019-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch b/0019-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
new file mode 100644
index 0000000..5571446
--- /dev/null
+++ b/0019-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
@@ -0,0 +1,70 @@
+From 854013084e2c6267af7787df8b35d85646f79a54 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
+Date: Thu, 12 Jan 2023 11:38:38 +0000
+Subject: [PATCH 19/61] tools/ocaml/xc: Fix binding for
+ xc_domain_assign_device()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The patch adding this binding was plain broken, and unreviewed. It modified
+the C stub to add a 4th parameter without an equivalent adjustment in the
+Ocaml side of the bindings.
+
+In 64bit builds, this causes us to dereference whatever dead value is in %rcx
+when trying to interpret the rflags parameter.
+
+This has gone unnoticed because Xapi doesn't use this binding (it has its
+own), but unbreak the binding by passing RDM_RELAXED unconditionally for
+now (matching the libxl default behaviour).
+
+Fixes: 9b34056cb4 ("tools: extend xc_assign_device() to support rdm reservation policy")
+Signed-off-by: Edwin Török <edwin.torok@cloud.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 4250683842104f02996428f93927a035c8e19266)
+---
+ tools/ocaml/libs/xc/xenctrl_stubs.c | 17 +++++------------
+ 1 file changed, 5 insertions(+), 12 deletions(-)
+
+diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
+index ec64341a9a..e2efcbe182 100644
+--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
++++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
+@@ -1123,17 +1123,12 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
+ CAMLreturn(Val_bool(ret == 0));
+ }
+
+-static int domain_assign_device_rdm_flag_table[] = {
+- XEN_DOMCTL_DEV_RDM_RELAXED,
+-};
+-
+-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+- value rflag)
++CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+ {
+- CAMLparam4(xch, domid, desc, rflag);
++ CAMLparam3(xch, domid, desc);
+ int ret;
+ int domain, bus, dev, func;
+- uint32_t sbdf, flag;
++ uint32_t sbdf;
+
+ domain = Int_val(Field(desc, 0));
+ bus = Int_val(Field(desc, 1));
+@@ -1141,10 +1136,8 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+ func = Int_val(Field(desc, 3));
+ sbdf = encode_sbdf(domain, bus, dev, func);
+
+- ret = Int_val(Field(rflag, 0));
+- flag = domain_assign_device_rdm_flag_table[ret];
+-
+- ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
++ ret = xc_assign_device(_H(xch), _D(domid), sbdf,
++ XEN_DOMCTL_DEV_RDM_RELAXED);
+
+ if (ret < 0)
+ failwith_xc(_H(xch));
+--
+2.40.0
+
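As background for the encode_sbdf() call kept by the patch above, a sketch of
the conventional seg:bus:dev.fn packing that xc_assign_device() consumes; the
helper below is an illustration, not the stubs' actual code:

    #include <stdint.h>

    /* Segment in bits 31-16, bus in 15-8, device in 7-3, function in 2-0. */
    static uint32_t encode_sbdf_sketch(int segment, int bus, int dev, int func)
    {
        return ((uint32_t)(segment & 0xffff) << 16) |
               ((uint32_t)(bus     &   0xff) <<  8) |
               ((uint32_t)(dev     &   0x1f) <<  3) |
                (uint32_t)(func    &    0x7);
    }
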
diff --git a/0019-xen-sched-introduce-cpupool_update_node_affinity.patch b/0019-xen-sched-introduce-cpupool_update_node_affinity.patch
deleted file mode 100644
index badb8c3..0000000
--- a/0019-xen-sched-introduce-cpupool_update_node_affinity.patch
+++ /dev/null
@@ -1,257 +0,0 @@
-From d4e971ad12dd27913dffcf96b5de378ea7b476e1 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 11 Oct 2022 14:59:40 +0200
-Subject: [PATCH 19/87] xen/sched: introduce cpupool_update_node_affinity()
-
-For updating the node affinities of all domains in a cpupool add a new
-function cpupool_update_node_affinity().
-
-In order to avoid multiple allocations of cpumasks carve out memory
-allocation and freeing from domain_update_node_affinity() into new
-helpers, which can be used by cpupool_update_node_affinity().
-
-Modify domain_update_node_affinity() to take an additional parameter
-for passing the allocated memory in and to allocate and free the memory
-via the new helpers in case NULL was passed.
-
-This will help later to pre-allocate the cpumasks in order to avoid
-allocations in stop-machine context.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: a83fa1e2b96ace65b45dde6954d67012633a082b
-master date: 2022-09-05 11:42:30 +0100
----
- xen/common/sched/core.c | 54 ++++++++++++++++++++++++++------------
- xen/common/sched/cpupool.c | 39 +++++++++++++++------------
- xen/common/sched/private.h | 7 +++++
- xen/include/xen/sched.h | 9 ++++++-
- 4 files changed, 74 insertions(+), 35 deletions(-)
-
-diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
-index f07bd2681fcb..065a83eca912 100644
---- a/xen/common/sched/core.c
-+++ b/xen/common/sched/core.c
-@@ -1824,9 +1824,28 @@ int vcpu_affinity_domctl(struct domain *d, uint32_t cmd,
- return ret;
- }
-
--void domain_update_node_affinity(struct domain *d)
-+bool alloc_affinity_masks(struct affinity_masks *affinity)
- {
-- cpumask_var_t dom_cpumask, dom_cpumask_soft;
-+ if ( !alloc_cpumask_var(&affinity->hard) )
-+ return false;
-+ if ( !alloc_cpumask_var(&affinity->soft) )
-+ {
-+ free_cpumask_var(affinity->hard);
-+ return false;
-+ }
-+
-+ return true;
-+}
-+
-+void free_affinity_masks(struct affinity_masks *affinity)
-+{
-+ free_cpumask_var(affinity->soft);
-+ free_cpumask_var(affinity->hard);
-+}
-+
-+void domain_update_node_aff(struct domain *d, struct affinity_masks *affinity)
-+{
-+ struct affinity_masks masks;
- cpumask_t *dom_affinity;
- const cpumask_t *online;
- struct sched_unit *unit;
-@@ -1836,14 +1855,16 @@ void domain_update_node_affinity(struct domain *d)
- if ( !d->vcpu || !d->vcpu[0] )
- return;
-
-- if ( !zalloc_cpumask_var(&dom_cpumask) )
-- return;
-- if ( !zalloc_cpumask_var(&dom_cpumask_soft) )
-+ if ( !affinity )
- {
-- free_cpumask_var(dom_cpumask);
-- return;
-+ affinity = &masks;
-+ if ( !alloc_affinity_masks(affinity) )
-+ return;
- }
-
-+ cpumask_clear(affinity->hard);
-+ cpumask_clear(affinity->soft);
-+
- online = cpupool_domain_master_cpumask(d);
-
- spin_lock(&d->node_affinity_lock);
-@@ -1864,22 +1885,21 @@ void domain_update_node_affinity(struct domain *d)
- */
- for_each_sched_unit ( d, unit )
- {
-- cpumask_or(dom_cpumask, dom_cpumask, unit->cpu_hard_affinity);
-- cpumask_or(dom_cpumask_soft, dom_cpumask_soft,
-- unit->cpu_soft_affinity);
-+ cpumask_or(affinity->hard, affinity->hard, unit->cpu_hard_affinity);
-+ cpumask_or(affinity->soft, affinity->soft, unit->cpu_soft_affinity);
- }
- /* Filter out non-online cpus */
-- cpumask_and(dom_cpumask, dom_cpumask, online);
-- ASSERT(!cpumask_empty(dom_cpumask));
-+ cpumask_and(affinity->hard, affinity->hard, online);
-+ ASSERT(!cpumask_empty(affinity->hard));
- /* And compute the intersection between hard, online and soft */
-- cpumask_and(dom_cpumask_soft, dom_cpumask_soft, dom_cpumask);
-+ cpumask_and(affinity->soft, affinity->soft, affinity->hard);
-
- /*
- * If not empty, the intersection of hard, soft and online is the
- * narrowest set we want. If empty, we fall back to hard&online.
- */
-- dom_affinity = cpumask_empty(dom_cpumask_soft) ?
-- dom_cpumask : dom_cpumask_soft;
-+ dom_affinity = cpumask_empty(affinity->soft) ? affinity->hard
-+ : affinity->soft;
-
- nodes_clear(d->node_affinity);
- for_each_cpu ( cpu, dom_affinity )
-@@ -1888,8 +1908,8 @@ void domain_update_node_affinity(struct domain *d)
-
- spin_unlock(&d->node_affinity_lock);
-
-- free_cpumask_var(dom_cpumask_soft);
-- free_cpumask_var(dom_cpumask);
-+ if ( affinity == &masks )
-+ free_affinity_masks(affinity);
- }
-
- typedef long ret_t;
-diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
-index 8c6e6eb9ccd5..45b6ff99561a 100644
---- a/xen/common/sched/cpupool.c
-+++ b/xen/common/sched/cpupool.c
-@@ -401,6 +401,25 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
- return ret;
- }
-
-+/* Update affinities of all domains in a cpupool. */
-+static void cpupool_update_node_affinity(const struct cpupool *c)
-+{
-+ struct affinity_masks masks;
-+ struct domain *d;
-+
-+ if ( !alloc_affinity_masks(&masks) )
-+ return;
-+
-+ rcu_read_lock(&domlist_read_lock);
-+
-+ for_each_domain_in_cpupool(d, c)
-+ domain_update_node_aff(d, &masks);
-+
-+ rcu_read_unlock(&domlist_read_lock);
-+
-+ free_affinity_masks(&masks);
-+}
-+
- /*
- * assign a specific cpu to a cpupool
- * cpupool_lock must be held
-@@ -408,7 +427,6 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
- static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
- {
- int ret;
-- struct domain *d;
- const cpumask_t *cpus;
-
- cpus = sched_get_opt_cpumask(c->gran, cpu);
-@@ -433,12 +451,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
-
- rcu_read_unlock(&sched_res_rculock);
-
-- rcu_read_lock(&domlist_read_lock);
-- for_each_domain_in_cpupool(d, c)
-- {
-- domain_update_node_affinity(d);
-- }
-- rcu_read_unlock(&domlist_read_lock);
-+ cpupool_update_node_affinity(c);
-
- return 0;
- }
-@@ -447,18 +460,14 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
- {
- int cpu = cpupool_moving_cpu;
- const cpumask_t *cpus;
-- struct domain *d;
- int ret;
-
- if ( c != cpupool_cpu_moving )
- return -EADDRNOTAVAIL;
-
-- /*
-- * We need this for scanning the domain list, both in
-- * cpu_disable_scheduler(), and at the bottom of this function.
-- */
- rcu_read_lock(&domlist_read_lock);
- ret = cpu_disable_scheduler(cpu);
-+ rcu_read_unlock(&domlist_read_lock);
-
- rcu_read_lock(&sched_res_rculock);
- cpus = get_sched_res(cpu)->cpus;
-@@ -485,11 +494,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
- }
- rcu_read_unlock(&sched_res_rculock);
-
-- for_each_domain_in_cpupool(d, c)
-- {
-- domain_update_node_affinity(d);
-- }
-- rcu_read_unlock(&domlist_read_lock);
-+ cpupool_update_node_affinity(c);
-
- return ret;
- }
-diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
-index a870320146ef..2b04b01a0c0a 100644
---- a/xen/common/sched/private.h
-+++ b/xen/common/sched/private.h
-@@ -593,6 +593,13 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step,
- cpumask_copy(mask, unit->cpu_hard_affinity);
- }
-
-+struct affinity_masks {
-+ cpumask_var_t hard;
-+ cpumask_var_t soft;
-+};
-+
-+bool alloc_affinity_masks(struct affinity_masks *affinity);
-+void free_affinity_masks(struct affinity_masks *affinity);
- void sched_rm_cpu(unsigned int cpu);
- const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu);
- void schedule_dump(struct cpupool *c);
-diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
-index 9671062360ac..3f4225738a40 100644
---- a/xen/include/xen/sched.h
-+++ b/xen/include/xen/sched.h
-@@ -655,8 +655,15 @@ static inline void get_knownalive_domain(struct domain *d)
- ASSERT(!(atomic_read(&d->refcnt) & DOMAIN_DESTROYED));
- }
-
-+struct affinity_masks;
-+
- int domain_set_node_affinity(struct domain *d, const nodemask_t *affinity);
--void domain_update_node_affinity(struct domain *d);
-+void domain_update_node_aff(struct domain *d, struct affinity_masks *affinity);
-+
-+static inline void domain_update_node_affinity(struct domain *d)
-+{
-+ domain_update_node_aff(d, NULL);
-+}
-
- /*
- * To be implemented by each architecture, sanity checking the configuration
---
-2.37.4
-
diff --git a/0020-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch b/0020-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
new file mode 100644
index 0000000..a829d36
--- /dev/null
+++ b/0020-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
@@ -0,0 +1,76 @@
+From 1fdff77e26290ae1ed40e8253959d12a0c4b3d3f Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 31 Jan 2023 17:19:30 +0000
+Subject: [PATCH 20/61] tools/ocaml/xc: Don't reference Abstract_Tag objects
+ with the GC lock released
+
+The intf->{addr,len} references in the xc_map_foreign_range() call are unsafe.
+From the manual:
+
+ https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code
+
+"After caml_release_runtime_system() was called and until
+caml_acquire_runtime_system() is called, the C code must not access any OCaml
+data, nor call any function of the run-time system, nor call back into OCaml
+code."
+
+More than what the manual says, the intf pointer is (potentially) invalidated
+by caml_enter_blocking_section() if another thread happens to perform garbage
+collection at just the right (wrong) moment.
+
+Rewrite the logic. There's no need to stash data in the Ocaml object until
+the success path at the very end.
+
+Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 9e7c74e6f9fd2e44df1212643b80af9032b45b07)
+---
+ tools/ocaml/libs/xc/xenctrl_stubs.c | 23 +++++++++++------------
+ 1 file changed, 11 insertions(+), 12 deletions(-)
+
+diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
+index e2efcbe182..0a0fe45c54 100644
+--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
++++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
+@@ -937,26 +937,25 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
+ CAMLparam4(xch, dom, size, mfn);
+ CAMLlocal1(result);
+ struct mmap_interface *intf;
+- uint32_t c_dom;
+- unsigned long c_mfn;
++ unsigned long c_mfn = Nativeint_val(mfn);
++ int len = Int_val(size);
++ void *ptr;
+
+ BUILD_BUG_ON((sizeof(struct mmap_interface) % sizeof(value)) != 0);
+ result = caml_alloc(Wsize_bsize(sizeof(struct mmap_interface)),
+ Abstract_tag);
+
+- intf = (struct mmap_interface *) result;
+-
+- intf->len = Int_val(size);
+-
+- c_dom = _D(dom);
+- c_mfn = Nativeint_val(mfn);
+ caml_enter_blocking_section();
+- intf->addr = xc_map_foreign_range(_H(xch), c_dom,
+- intf->len, PROT_READ|PROT_WRITE,
+- c_mfn);
++ ptr = xc_map_foreign_range(_H(xch), _D(dom), len,
++ PROT_READ|PROT_WRITE, c_mfn);
+ caml_leave_blocking_section();
+- if (!intf->addr)
++
++ if (!ptr)
+ caml_failwith("xc_map_foreign_range error");
++
++ intf = Data_abstract_val(result);
++ *intf = (struct mmap_interface){ ptr, len };
++
+ CAMLreturn(result);
+ }
+
+--
+2.40.0
+
diff --git a/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch b/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch
deleted file mode 100644
index 0a04620..0000000
--- a/0020-xen-sched-carve-out-memory-allocation-and-freeing-fr.patch
+++ /dev/null
@@ -1,263 +0,0 @@
-From c377ceab0a007690a1e71c81a5232613c99e944d Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 11 Oct 2022 15:00:05 +0200
-Subject: [PATCH 20/87] xen/sched: carve out memory allocation and freeing from
- schedule_cpu_rm()
-
-In order to prepare not allocating or freeing memory from
-schedule_cpu_rm(), move this functionality to dedicated functions.
-
-For now call those functions from schedule_cpu_rm().
-
-No change of behavior expected.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: d42be6f83480b3ada286dc18444331a816be88a3
-master date: 2022-09-05 11:42:30 +0100
----
- xen/common/sched/core.c | 143 ++++++++++++++++++++++---------------
- xen/common/sched/private.h | 11 +++
- 2 files changed, 98 insertions(+), 56 deletions(-)
-
-diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
-index 065a83eca912..2decb1161a63 100644
---- a/xen/common/sched/core.c
-+++ b/xen/common/sched/core.c
-@@ -3221,6 +3221,75 @@ out:
- return ret;
- }
-
-+/*
-+ * Allocate all memory needed for free_cpu_rm_data(), as allocations cannot
-+ * be made in stop_machine() context.
-+ *
-+ * Between alloc_cpu_rm_data() and the real cpu removal action the relevant
-+ * contents of struct sched_resource can't change, as the cpu in question is
-+ * locked against any other movement to or from cpupools, and the data copied
-+ * by alloc_cpu_rm_data() is modified only in case the cpu in question is
-+ * being moved from or to a cpupool.
-+ */
-+struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
-+{
-+ struct cpu_rm_data *data;
-+ const struct sched_resource *sr;
-+ unsigned int idx;
-+
-+ rcu_read_lock(&sched_res_rculock);
-+
-+ sr = get_sched_res(cpu);
-+ data = xmalloc_flex_struct(struct cpu_rm_data, sr, sr->granularity - 1);
-+ if ( !data )
-+ goto out;
-+
-+ data->old_ops = sr->scheduler;
-+ data->vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
-+ data->ppriv_old = sr->sched_priv;
-+
-+ for ( idx = 0; idx < sr->granularity - 1; idx++ )
-+ {
-+ data->sr[idx] = sched_alloc_res();
-+ if ( data->sr[idx] )
-+ {
-+ data->sr[idx]->sched_unit_idle = sched_alloc_unit_mem();
-+ if ( !data->sr[idx]->sched_unit_idle )
-+ {
-+ sched_res_free(&data->sr[idx]->rcu);
-+ data->sr[idx] = NULL;
-+ }
-+ }
-+ if ( !data->sr[idx] )
-+ {
-+ while ( idx > 0 )
-+ sched_res_free(&data->sr[--idx]->rcu);
-+ XFREE(data);
-+ goto out;
-+ }
-+
-+ data->sr[idx]->curr = data->sr[idx]->sched_unit_idle;
-+ data->sr[idx]->scheduler = &sched_idle_ops;
-+ data->sr[idx]->granularity = 1;
-+
-+ /* We want the lock not to change when replacing the resource. */
-+ data->sr[idx]->schedule_lock = sr->schedule_lock;
-+ }
-+
-+ out:
-+ rcu_read_unlock(&sched_res_rculock);
-+
-+ return data;
-+}
-+
-+void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu)
-+{
-+ sched_free_udata(mem->old_ops, mem->vpriv_old);
-+ sched_free_pdata(mem->old_ops, mem->ppriv_old, cpu);
-+
-+ xfree(mem);
-+}
-+
- /*
- * Remove a pCPU from its cpupool. Its scheduler becomes &sched_idle_ops
- * (the idle scheduler).
-@@ -3229,53 +3298,23 @@ out:
- */
- int schedule_cpu_rm(unsigned int cpu)
- {
-- void *ppriv_old, *vpriv_old;
-- struct sched_resource *sr, **sr_new = NULL;
-+ struct sched_resource *sr;
-+ struct cpu_rm_data *data;
- struct sched_unit *unit;
-- struct scheduler *old_ops;
- spinlock_t *old_lock;
- unsigned long flags;
-- int idx, ret = -ENOMEM;
-+ int idx = 0;
- unsigned int cpu_iter;
-
-+ data = alloc_cpu_rm_data(cpu);
-+ if ( !data )
-+ return -ENOMEM;
-+
- rcu_read_lock(&sched_res_rculock);
-
- sr = get_sched_res(cpu);
-- old_ops = sr->scheduler;
-
-- if ( sr->granularity > 1 )
-- {
-- sr_new = xmalloc_array(struct sched_resource *, sr->granularity - 1);
-- if ( !sr_new )
-- goto out;
-- for ( idx = 0; idx < sr->granularity - 1; idx++ )
-- {
-- sr_new[idx] = sched_alloc_res();
-- if ( sr_new[idx] )
-- {
-- sr_new[idx]->sched_unit_idle = sched_alloc_unit_mem();
-- if ( !sr_new[idx]->sched_unit_idle )
-- {
-- sched_res_free(&sr_new[idx]->rcu);
-- sr_new[idx] = NULL;
-- }
-- }
-- if ( !sr_new[idx] )
-- {
-- for ( idx--; idx >= 0; idx-- )
-- sched_res_free(&sr_new[idx]->rcu);
-- goto out;
-- }
-- sr_new[idx]->curr = sr_new[idx]->sched_unit_idle;
-- sr_new[idx]->scheduler = &sched_idle_ops;
-- sr_new[idx]->granularity = 1;
--
-- /* We want the lock not to change when replacing the resource. */
-- sr_new[idx]->schedule_lock = sr->schedule_lock;
-- }
-- }
--
-- ret = 0;
-+ ASSERT(sr->granularity);
- ASSERT(sr->cpupool != NULL);
- ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
- ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid));
-@@ -3283,10 +3322,6 @@ int schedule_cpu_rm(unsigned int cpu)
- /* See comment in schedule_cpu_add() regarding lock switching. */
- old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
-
-- vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
-- ppriv_old = sr->sched_priv;
--
-- idx = 0;
- for_each_cpu ( cpu_iter, sr->cpus )
- {
- per_cpu(sched_res_idx, cpu_iter) = 0;
-@@ -3300,27 +3335,27 @@ int schedule_cpu_rm(unsigned int cpu)
- else
- {
- /* Initialize unit. */
-- unit = sr_new[idx]->sched_unit_idle;
-- unit->res = sr_new[idx];
-+ unit = data->sr[idx]->sched_unit_idle;
-+ unit->res = data->sr[idx];
- unit->is_running = true;
- sched_unit_add_vcpu(unit, idle_vcpu[cpu_iter]);
- sched_domain_insert_unit(unit, idle_vcpu[cpu_iter]->domain);
-
- /* Adjust cpu masks of resources (old and new). */
- cpumask_clear_cpu(cpu_iter, sr->cpus);
-- cpumask_set_cpu(cpu_iter, sr_new[idx]->cpus);
-+ cpumask_set_cpu(cpu_iter, data->sr[idx]->cpus);
- cpumask_set_cpu(cpu_iter, &sched_res_mask);
-
- /* Init timer. */
-- init_timer(&sr_new[idx]->s_timer, s_timer_fn, NULL, cpu_iter);
-+ init_timer(&data->sr[idx]->s_timer, s_timer_fn, NULL, cpu_iter);
-
- /* Last resource initializations and insert resource pointer. */
-- sr_new[idx]->master_cpu = cpu_iter;
-- set_sched_res(cpu_iter, sr_new[idx]);
-+ data->sr[idx]->master_cpu = cpu_iter;
-+ set_sched_res(cpu_iter, data->sr[idx]);
-
- /* Last action: set the new lock pointer. */
- smp_mb();
-- sr_new[idx]->schedule_lock = &sched_free_cpu_lock;
-+ data->sr[idx]->schedule_lock = &sched_free_cpu_lock;
-
- idx++;
- }
-@@ -3336,16 +3371,12 @@ int schedule_cpu_rm(unsigned int cpu)
- /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
- spin_unlock_irqrestore(old_lock, flags);
-
-- sched_deinit_pdata(old_ops, ppriv_old, cpu);
-+ sched_deinit_pdata(data->old_ops, data->ppriv_old, cpu);
-
-- sched_free_udata(old_ops, vpriv_old);
-- sched_free_pdata(old_ops, ppriv_old, cpu);
--
--out:
- rcu_read_unlock(&sched_res_rculock);
-- xfree(sr_new);
-+ free_cpu_rm_data(data, cpu);
-
-- return ret;
-+ return 0;
- }
-
- struct scheduler *scheduler_get_default(void)
-diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
-index 2b04b01a0c0a..e286849a1312 100644
---- a/xen/common/sched/private.h
-+++ b/xen/common/sched/private.h
-@@ -600,6 +600,15 @@ struct affinity_masks {
-
- bool alloc_affinity_masks(struct affinity_masks *affinity);
- void free_affinity_masks(struct affinity_masks *affinity);
-+
-+/* Memory allocation related data for schedule_cpu_rm(). */
-+struct cpu_rm_data {
-+ const struct scheduler *old_ops;
-+ void *ppriv_old;
-+ void *vpriv_old;
-+ struct sched_resource *sr[];
-+};
-+
- void sched_rm_cpu(unsigned int cpu);
- const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu);
- void schedule_dump(struct cpupool *c);
-@@ -608,6 +617,8 @@ struct scheduler *scheduler_alloc(unsigned int sched_id);
- void scheduler_free(struct scheduler *sched);
- int cpu_disable_scheduler(unsigned int cpu);
- int schedule_cpu_add(unsigned int cpu, struct cpupool *c);
-+struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu);
-+void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu);
- int schedule_cpu_rm(unsigned int cpu);
- int sched_move_domain(struct domain *d, struct cpupool *c);
- struct cpupool *cpupool_get_by_id(unsigned int poolid);
---
-2.37.4
-
diff --git a/0021-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch b/0021-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
new file mode 100644
index 0000000..8ed7dfa
--- /dev/null
+++ b/0021-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
@@ -0,0 +1,61 @@
+From 1b6acdeeb2323c53d841356da50440e274e7bf9a Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 1 Feb 2023 11:27:42 +0000
+Subject: [PATCH 21/61] tools/ocaml/libs: Fix memory/resource leaks with
+ caml_alloc_custom()
+
+All caml_alloc_*() functions can throw exceptions, and longjump out of
+context. If this happens, we leak the xch/xce handle.
+
+Reorder the logic to allocate the Ocaml object first.
+
+Fixes: 8b3c06a3e545 ("tools/ocaml/xenctrl: OCaml 5 support, fix use-after-free")
+Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit d69ccf52ad467ccc22029172a8e61dc621187889)
+---
+ tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+index d7881ca95f..de2fc29292 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
++++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+@@ -63,6 +63,8 @@ CAMLprim value stub_eventchn_init(value cloexec)
+ if ( !Bool_val(cloexec) )
+ flags |= XENEVTCHN_NO_CLOEXEC;
+
++ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
++
+ caml_enter_blocking_section();
+ xce = xenevtchn_open(NULL, flags);
+ caml_leave_blocking_section();
+@@ -70,7 +72,6 @@ CAMLprim value stub_eventchn_init(value cloexec)
+ if (xce == NULL)
+ caml_failwith("open failed");
+
+- result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
+ *(xenevtchn_handle **)Data_custom_val(result) = xce;
+
+ CAMLreturn(result);
+@@ -82,6 +83,8 @@ CAMLprim value stub_eventchn_fdopen(value fdval)
+ CAMLlocal1(result);
+ xenevtchn_handle *xce;
+
++ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
++
+ caml_enter_blocking_section();
+ xce = xenevtchn_fdopen(NULL, Int_val(fdval), 0);
+ caml_leave_blocking_section();
+@@ -89,7 +92,6 @@ CAMLprim value stub_eventchn_fdopen(value fdval)
+ if (xce == NULL)
+ caml_failwith("evtchn fdopen failed");
+
+- result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
+ *(xenevtchn_handle **)Data_custom_val(result) = xce;
+
+ CAMLreturn(result);
+--
+2.40.0
+
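The ordering rule applied by the patch above, condensed into one illustrative
stub (the function name and error string are examples; xenevtchn_ops is the
custom-operations table defined in the stubs file):

    #include <caml/mlvalues.h>
    #include <caml/memory.h>
    #include <caml/custom.h>
    #include <caml/fail.h>
    #include <caml/signals.h>
    #include <xenevtchn.h>

    extern struct custom_operations xenevtchn_ops;

    CAMLprim value stub_open_sketch(value unit)
    {
        CAMLparam1(unit);
        CAMLlocal1(result);
        xenevtchn_handle *xce;

        /* Allocate the OCaml wrapper first: caml_alloc_custom() may raise
         * Out_of_memory and longjmp away, and at this point there is no
         * C resource yet to leak. */
        result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);

        caml_enter_blocking_section();
        xce = xenevtchn_open(NULL, 0);
        caml_leave_blocking_section();

        if (xce == NULL)
            caml_failwith("open failed");

        *(xenevtchn_handle **)Data_custom_val(result) = xce;

        CAMLreturn(result);
    }
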
diff --git a/0021-xen-sched-fix-cpu-hotplug.patch b/0021-xen-sched-fix-cpu-hotplug.patch
deleted file mode 100644
index ac3b1d7..0000000
--- a/0021-xen-sched-fix-cpu-hotplug.patch
+++ /dev/null
@@ -1,307 +0,0 @@
-From 4f3204c2bc66db18c61600dd3e08bf1fd9584a1b Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 11 Oct 2022 15:00:19 +0200
-Subject: [PATCH 21/87] xen/sched: fix cpu hotplug
-
-Cpu unplugging is calling schedule_cpu_rm() via stop_machine_run() with
-interrupts disabled, thus any memory allocation or freeing must be
-avoided.
-
-Since commit 5047cd1d5dea ("xen/common: Use enhanced
-ASSERT_ALLOC_CONTEXT in xmalloc()") this restriction is being enforced
-via an assertion, which will now fail.
-
-Fix this by allocating needed memory before entering stop_machine_run()
-and freeing any memory only after having finished stop_machine_run().
-
-Fixes: 1ec410112cdd ("xen/sched: support differing granularity in schedule_cpu_[add/rm]()")
-Reported-by: Gao Ruifeng <ruifeng.gao@intel.com>
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: d84473689611eed32fd90b27e614f28af767fa3f
-master date: 2022-09-05 11:42:30 +0100
----
- xen/common/sched/core.c | 25 +++++++++++---
- xen/common/sched/cpupool.c | 69 +++++++++++++++++++++++++++++---------
- xen/common/sched/private.h | 5 +--
- 3 files changed, 77 insertions(+), 22 deletions(-)
-
-diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
-index 2decb1161a63..900aab8f66a7 100644
---- a/xen/common/sched/core.c
-+++ b/xen/common/sched/core.c
-@@ -3231,7 +3231,7 @@ out:
- * by alloc_cpu_rm_data() is modified only in case the cpu in question is
- * being moved from or to a cpupool.
- */
--struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
-+struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu, bool aff_alloc)
- {
- struct cpu_rm_data *data;
- const struct sched_resource *sr;
-@@ -3244,6 +3244,17 @@ struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
- if ( !data )
- goto out;
-
-+ if ( aff_alloc )
-+ {
-+ if ( !alloc_affinity_masks(&data->affinity) )
-+ {
-+ XFREE(data);
-+ goto out;
-+ }
-+ }
-+ else
-+ memset(&data->affinity, 0, sizeof(data->affinity));
-+
- data->old_ops = sr->scheduler;
- data->vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
- data->ppriv_old = sr->sched_priv;
-@@ -3264,6 +3275,7 @@ struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu)
- {
- while ( idx > 0 )
- sched_res_free(&data->sr[--idx]->rcu);
-+ free_affinity_masks(&data->affinity);
- XFREE(data);
- goto out;
- }
-@@ -3286,6 +3298,7 @@ void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu)
- {
- sched_free_udata(mem->old_ops, mem->vpriv_old);
- sched_free_pdata(mem->old_ops, mem->ppriv_old, cpu);
-+ free_affinity_masks(&mem->affinity);
-
- xfree(mem);
- }
-@@ -3296,17 +3309,18 @@ void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu)
- * The cpu is already marked as "free" and not valid any longer for its
- * cpupool.
- */
--int schedule_cpu_rm(unsigned int cpu)
-+int schedule_cpu_rm(unsigned int cpu, struct cpu_rm_data *data)
- {
- struct sched_resource *sr;
-- struct cpu_rm_data *data;
- struct sched_unit *unit;
- spinlock_t *old_lock;
- unsigned long flags;
- int idx = 0;
- unsigned int cpu_iter;
-+ bool free_data = !data;
-
-- data = alloc_cpu_rm_data(cpu);
-+ if ( !data )
-+ data = alloc_cpu_rm_data(cpu, false);
- if ( !data )
- return -ENOMEM;
-
-@@ -3374,7 +3388,8 @@ int schedule_cpu_rm(unsigned int cpu)
- sched_deinit_pdata(data->old_ops, data->ppriv_old, cpu);
-
- rcu_read_unlock(&sched_res_rculock);
-- free_cpu_rm_data(data, cpu);
-+ if ( free_data )
-+ free_cpu_rm_data(data, cpu);
-
- return 0;
- }
-diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
-index 45b6ff99561a..b5a948639aad 100644
---- a/xen/common/sched/cpupool.c
-+++ b/xen/common/sched/cpupool.c
-@@ -402,22 +402,28 @@ int cpupool_move_domain(struct domain *d, struct cpupool *c)
- }
-
- /* Update affinities of all domains in a cpupool. */
--static void cpupool_update_node_affinity(const struct cpupool *c)
-+static void cpupool_update_node_affinity(const struct cpupool *c,
-+ struct affinity_masks *masks)
- {
-- struct affinity_masks masks;
-+ struct affinity_masks local_masks;
- struct domain *d;
-
-- if ( !alloc_affinity_masks(&masks) )
-- return;
-+ if ( !masks )
-+ {
-+ if ( !alloc_affinity_masks(&local_masks) )
-+ return;
-+ masks = &local_masks;
-+ }
-
- rcu_read_lock(&domlist_read_lock);
-
- for_each_domain_in_cpupool(d, c)
-- domain_update_node_aff(d, &masks);
-+ domain_update_node_aff(d, masks);
-
- rcu_read_unlock(&domlist_read_lock);
-
-- free_affinity_masks(&masks);
-+ if ( masks == &local_masks )
-+ free_affinity_masks(masks);
- }
-
- /*
-@@ -451,15 +457,17 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu)
-
- rcu_read_unlock(&sched_res_rculock);
-
-- cpupool_update_node_affinity(c);
-+ cpupool_update_node_affinity(c, NULL);
-
- return 0;
- }
-
--static int cpupool_unassign_cpu_finish(struct cpupool *c)
-+static int cpupool_unassign_cpu_finish(struct cpupool *c,
-+ struct cpu_rm_data *mem)
- {
- int cpu = cpupool_moving_cpu;
- const cpumask_t *cpus;
-+ struct affinity_masks *masks = mem ? &mem->affinity : NULL;
- int ret;
-
- if ( c != cpupool_cpu_moving )
-@@ -482,7 +490,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
- */
- if ( !ret )
- {
-- ret = schedule_cpu_rm(cpu);
-+ ret = schedule_cpu_rm(cpu, mem);
- if ( ret )
- cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus);
- else
-@@ -494,7 +502,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c)
- }
- rcu_read_unlock(&sched_res_rculock);
-
-- cpupool_update_node_affinity(c);
-+ cpupool_update_node_affinity(c, masks);
-
- return ret;
- }
-@@ -558,7 +566,7 @@ static long cpupool_unassign_cpu_helper(void *info)
- cpupool_cpu_moving->cpupool_id, cpupool_moving_cpu);
- spin_lock(&cpupool_lock);
-
-- ret = cpupool_unassign_cpu_finish(c);
-+ ret = cpupool_unassign_cpu_finish(c, NULL);
-
- spin_unlock(&cpupool_lock);
- debugtrace_printk("cpupool_unassign_cpu ret=%ld\n", ret);
-@@ -701,7 +709,7 @@ static int cpupool_cpu_add(unsigned int cpu)
- * This function is called in stop_machine context, so we can be sure no
- * non-idle vcpu is active on the system.
- */
--static void cpupool_cpu_remove(unsigned int cpu)
-+static void cpupool_cpu_remove(unsigned int cpu, struct cpu_rm_data *mem)
- {
- int ret;
-
-@@ -709,7 +717,7 @@ static void cpupool_cpu_remove(unsigned int cpu)
-
- if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) )
- {
-- ret = cpupool_unassign_cpu_finish(cpupool0);
-+ ret = cpupool_unassign_cpu_finish(cpupool0, mem);
- BUG_ON(ret);
- }
- cpumask_clear_cpu(cpu, &cpupool_free_cpus);
-@@ -775,7 +783,7 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
- {
- ret = cpupool_unassign_cpu_start(c, master_cpu);
- BUG_ON(ret);
-- ret = cpupool_unassign_cpu_finish(c);
-+ ret = cpupool_unassign_cpu_finish(c, NULL);
- BUG_ON(ret);
- }
- }
-@@ -993,12 +1001,24 @@ void dump_runq(unsigned char key)
- static int cpu_callback(
- struct notifier_block *nfb, unsigned long action, void *hcpu)
- {
-+ static struct cpu_rm_data *mem;
-+
- unsigned int cpu = (unsigned long)hcpu;
- int rc = 0;
-
- switch ( action )
- {
- case CPU_DOWN_FAILED:
-+ if ( system_state <= SYS_STATE_active )
-+ {
-+ if ( mem )
-+ {
-+ free_cpu_rm_data(mem, cpu);
-+ mem = NULL;
-+ }
-+ rc = cpupool_cpu_add(cpu);
-+ }
-+ break;
- case CPU_ONLINE:
- if ( system_state <= SYS_STATE_active )
- rc = cpupool_cpu_add(cpu);
-@@ -1006,12 +1026,31 @@ static int cpu_callback(
- case CPU_DOWN_PREPARE:
- /* Suspend/Resume don't change assignments of cpus to cpupools. */
- if ( system_state <= SYS_STATE_active )
-+ {
- rc = cpupool_cpu_remove_prologue(cpu);
-+ if ( !rc )
-+ {
-+ ASSERT(!mem);
-+ mem = alloc_cpu_rm_data(cpu, true);
-+ rc = mem ? 0 : -ENOMEM;
-+ }
-+ }
- break;
- case CPU_DYING:
- /* Suspend/Resume don't change assignments of cpus to cpupools. */
- if ( system_state <= SYS_STATE_active )
-- cpupool_cpu_remove(cpu);
-+ {
-+ ASSERT(mem);
-+ cpupool_cpu_remove(cpu, mem);
-+ }
-+ break;
-+ case CPU_DEAD:
-+ if ( system_state <= SYS_STATE_active )
-+ {
-+ ASSERT(mem);
-+ free_cpu_rm_data(mem, cpu);
-+ mem = NULL;
-+ }
- break;
- case CPU_RESUME_FAILED:
- cpupool_cpu_remove_forced(cpu);
-diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
-index e286849a1312..0126a4bb9ed3 100644
---- a/xen/common/sched/private.h
-+++ b/xen/common/sched/private.h
-@@ -603,6 +603,7 @@ void free_affinity_masks(struct affinity_masks *affinity);
-
- /* Memory allocation related data for schedule_cpu_rm(). */
- struct cpu_rm_data {
-+ struct affinity_masks affinity;
- const struct scheduler *old_ops;
- void *ppriv_old;
- void *vpriv_old;
-@@ -617,9 +618,9 @@ struct scheduler *scheduler_alloc(unsigned int sched_id);
- void scheduler_free(struct scheduler *sched);
- int cpu_disable_scheduler(unsigned int cpu);
- int schedule_cpu_add(unsigned int cpu, struct cpupool *c);
--struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu);
-+struct cpu_rm_data *alloc_cpu_rm_data(unsigned int cpu, bool aff_alloc);
- void free_cpu_rm_data(struct cpu_rm_data *mem, unsigned int cpu);
--int schedule_cpu_rm(unsigned int cpu);
-+int schedule_cpu_rm(unsigned int cpu, struct cpu_rm_data *mem);
- int sched_move_domain(struct domain *d, struct cpupool *c);
- struct cpupool *cpupool_get_by_id(unsigned int poolid);
- void cpupool_put(struct cpupool *pool);
---
-2.37.4
-
diff --git a/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch b/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch
deleted file mode 100644
index 5432b3c..0000000
--- a/0022-Config.mk-correct-PIE-related-option-s-in-EMBEDDED_E.patch
+++ /dev/null
@@ -1,58 +0,0 @@
-From 2b694dd2932be78431b14257f23b738f2fc8f6a1 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 11 Oct 2022 15:00:33 +0200
-Subject: [PATCH 22/87] Config.mk: correct PIE-related option(s) in
- EMBEDDED_EXTRA_CFLAGS
-
-I haven't been able to find evidence of "-nopie" ever having been a
-supported compiler option. The correct spelling is "-no-pie".
-Furthermore like "-pie" this is an option which is solely passed to the
-linker. The compiler only recognizes "-fpie" / "-fPIE" / "-fno-pie", and
-it doesn't infer these options from "-pie" / "-no-pie".
-
-Add the compiler recognized form, but for the possible case of the
-variable also being used somewhere for linking keep the linker option as
-well (with corrected spelling).
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-
-Build: Drop -no-pie from EMBEDDED_EXTRA_CFLAGS
-
-This breaks all Clang builds, as demonstrated by Gitlab CI.
-
-Contrary to the description in ecd6b9759919, -no-pie is not even an option
-passed to the linker. GCC's actual behaviour is to inhibit the passing of
--pie to the linker, as well as selecting different crt0 artefacts to be linked.
-
-EMBEDDED_EXTRA_CFLAGS is not used for $(CC)-doing-linking, and not liable to
-gain such a usecase.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-Tested-by: Stefano Stabellini <sstabellini@kernel.org>
-Fixes: ecd6b9759919 ("Config.mk: correct PIE-related option(s) in EMBEDDED_EXTRA_CFLAGS")
-master commit: ecd6b9759919fa6335b0be1b5fc5cce29a30c4f1
-master date: 2022-09-08 09:25:26 +0200
-master commit: 13a7c0074ac8fb31f6c0485429b7a20a1946cb22
-master date: 2022-09-27 15:40:42 -0700
----
- Config.mk | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/Config.mk b/Config.mk
-index 46de3cd1e0e1..6f95067b8de6 100644
---- a/Config.mk
-+++ b/Config.mk
-@@ -197,7 +197,7 @@ endif
- APPEND_LDFLAGS += $(foreach i, $(APPEND_LIB), -L$(i))
- APPEND_CFLAGS += $(foreach i, $(APPEND_INCLUDES), -I$(i))
-
--EMBEDDED_EXTRA_CFLAGS := -nopie -fno-stack-protector -fno-stack-protector-all
-+EMBEDDED_EXTRA_CFLAGS := -fno-pie -fno-stack-protector -fno-stack-protector-all
- EMBEDDED_EXTRA_CFLAGS += -fno-exceptions -fno-asynchronous-unwind-tables
-
- XEN_EXTFILES_URL ?= http://xenbits.xen.org/xen-extfiles
---
-2.37.4
-
diff --git a/0022-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch b/0022-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
new file mode 100644
index 0000000..1d1edb0
--- /dev/null
+++ b/0022-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
@@ -0,0 +1,120 @@
+From d4e286db89d80c862b4a24bf971dd71008c8b53e Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 8 Sep 2022 21:27:58 +0100
+Subject: [PATCH 22/61] x86/spec-ctrl: Mitigate Cross-Thread Return Address
+ Predictions
+
+This is XSA-426 / CVE-2022-27672
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 63305e5392ec2d17b85e7996a97462744425db80)
+---
+ docs/misc/xen-command-line.pandoc | 2 +-
+ xen/arch/x86/spec_ctrl.c | 31 ++++++++++++++++++++++++++++---
+ xen/include/asm-x86/cpufeatures.h | 3 ++-
+ xen/include/asm-x86/spec_ctrl.h | 15 +++++++++++++++
+ 4 files changed, 46 insertions(+), 5 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index bd6826d0ae..b3f60cd923 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2275,7 +2275,7 @@ guests to use.
+ on entry and exit. These blocks are necessary to virtualise support for
+ guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
+ * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
+- Return Address Stack on entry to Xen.
++ Return Address Stack on entry to Xen and on idle.
+ * `md-clear=` offers control over whether to use VERW to flush
+ microarchitectural buffers on idle and exit from Xen. *Note: For
+ compatibility with development versions of this fix, `mds=` is also accepted
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 90d86fe5cb..14649d92f5 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -1317,13 +1317,38 @@ void __init init_speculation_mitigations(void)
+ * 3) Some CPUs have RSBs which are not full width, which allow the
+ * attacker's entries to alias Xen addresses.
+ *
++ * 4) Some CPUs have RSBs which are re-partitioned based on thread
++ * idleness, which allows an attacker to inject entries into the other
++ * thread. We still activate the optimisation in this case, and mitigate
++ * in the idle path which has lower overhead.
++ *
+ * It is safe to turn off RSB stuffing when Xen is using SMEP itself, and
+ * 32bit PV guests are disabled, and when the RSB is full width.
+ */
+ BUILD_BUG_ON(RO_MPT_VIRT_START != PML4_ADDR(256));
+- if ( opt_rsb_pv == -1 && boot_cpu_has(X86_FEATURE_XEN_SMEP) &&
+- !opt_pv32 && rsb_is_full_width() )
+- opt_rsb_pv = 0;
++ if ( opt_rsb_pv == -1 )
++ {
++ opt_rsb_pv = (opt_pv32 || !boot_cpu_has(X86_FEATURE_XEN_SMEP) ||
++ !rsb_is_full_width());
++
++ /*
++ * Cross-Thread Return Address Predictions.
++ *
++ * Vulnerable systems are Zen1/Zen2 uarch, which is AMD Fam17 / Hygon
++ * Fam18, when SMT is active.
++ *
++ * To mitigate, we must flush the RSB/RAS/RAP once between entering
++ * Xen and going idle.
++ *
++ * Most cases flush on entry to Xen anyway. The one case where we
++ * don't is when using the SMEP optimisation for PV guests. Flushing
++ * before going idle is less overhead than flushing on PV entry.
++ */
++ if ( !opt_rsb_pv && hw_smt_enabled &&
++ (boot_cpu_data.x86_vendor & (X86_VENDOR_AMD|X86_VENDOR_HYGON)) &&
++ (boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) )
++ setup_force_cpu_cap(X86_FEATURE_SC_RSB_IDLE);
++ }
+
+ if ( opt_rsb_pv )
+ {
+diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
+index ecc1bb0950..ccf9d7287c 100644
+--- a/xen/include/asm-x86/cpufeatures.h
++++ b/xen/include/asm-x86/cpufeatures.h
+@@ -35,7 +35,8 @@ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM
+ XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
+ XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
+ XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
+-/* Bits 23,24 unused. */
++/* Bits 23 unused. */
++XEN_CPUFEATURE(SC_RSB_IDLE, X86_SYNTH(24)) /* RSB overwrite needed for idle. */
+ XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
+ XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
+ XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index 6a77c39378..391973ef6a 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -159,6 +159,21 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
+ */
+ alternative_input("", "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+ [sel] "m" (info->verw_sel));
++
++ /*
++ * Cross-Thread Return Address Predictions:
++ *
++ * On vulnerable systems, the return predictions (RSB/RAS) are statically
++ * partitioned between active threads. When entering idle, our entries
++ * are re-partitioned to allow the other threads to use them.
++ *
++ * In some cases, we might still have guest entries in the RAS, so flush
++ * them before injecting them sideways to our sibling thread.
++ *
++ * (ab)use alternative_input() to specify clobbers.
++ */
++ alternative_input("", "DO_OVERWRITE_RSB", X86_FEATURE_SC_RSB_IDLE,
++ : "rax", "rcx");
+ }
+
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
+--
+2.40.0
+
diff --git a/0023-automation-Remove-clang-8-from-Debian-unstable-conta.patch b/0023-automation-Remove-clang-8-from-Debian-unstable-conta.patch
new file mode 100644
index 0000000..36dfb4f
--- /dev/null
+++ b/0023-automation-Remove-clang-8-from-Debian-unstable-conta.patch
@@ -0,0 +1,84 @@
+From 0802504627453a54b1ab408b6e9dc8b5c561172d Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 21 Feb 2023 16:55:38 +0000
+Subject: [PATCH 23/61] automation: Remove clang-8 from Debian unstable
+ container
+
+First, apt complains that it isn't the right way to add keys anymore,
+but hopefully that's just a warning.
+
+Second, we can't install clang-8:
+The following packages have unmet dependencies:
+ clang-8 : Depends: libstdc++-8-dev but it is not installable
+ Depends: libgcc-8-dev but it is not installable
+ Depends: libobjc-8-dev but it is not installable
+ Recommends: llvm-8-dev but it is not going to be installed
+ Recommends: libomp-8-dev but it is not going to be installed
+ libllvm8 : Depends: libffi7 (>= 3.3~20180313) but it is not installable
+E: Unable to correct problems, you have held broken packages.
+
+clang on Debian unstable is now version 14.0.6.
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit a6b1e2b80fe2053b1c9c9843fb086a668513ea36)
+---
+ automation/build/debian/unstable-llvm-8.list | 3 ---
+ automation/build/debian/unstable.dockerfile | 12 ------------
+ automation/gitlab-ci/build.yaml | 10 ----------
+ 3 files changed, 25 deletions(-)
+ delete mode 100644 automation/build/debian/unstable-llvm-8.list
+
+diff --git a/automation/build/debian/unstable-llvm-8.list b/automation/build/debian/unstable-llvm-8.list
+deleted file mode 100644
+index dc119fa0b4..0000000000
+--- a/automation/build/debian/unstable-llvm-8.list
++++ /dev/null
+@@ -1,3 +0,0 @@
+-# Unstable LLVM 8 repos
+-deb http://apt.llvm.org/unstable/ llvm-toolchain-8 main
+-deb-src http://apt.llvm.org/unstable/ llvm-toolchain-8 main
+diff --git a/automation/build/debian/unstable.dockerfile b/automation/build/debian/unstable.dockerfile
+index bd61cd12c2..828afa2e1e 100644
+--- a/automation/build/debian/unstable.dockerfile
++++ b/automation/build/debian/unstable.dockerfile
+@@ -52,15 +52,3 @@ RUN apt-get update && \
+ apt-get autoremove -y && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
+-
+-RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|apt-key add -
+-COPY unstable-llvm-8.list /etc/apt/sources.list.d/
+-
+-RUN apt-get update && \
+- apt-get --quiet --yes install \
+- clang-8 \
+- lld-8 \
+- && \
+- apt-get autoremove -y && \
+- apt-get clean && \
+- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
+diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
+index fdd5c76582..06a75a8c5a 100644
+--- a/automation/gitlab-ci/build.yaml
++++ b/automation/gitlab-ci/build.yaml
+@@ -304,16 +304,6 @@ debian-unstable-clang-debug:
+ variables:
+ CONTAINER: debian:unstable
+
+-debian-unstable-clang-8:
+- extends: .clang-8-x86-64-build
+- variables:
+- CONTAINER: debian:unstable
+-
+-debian-unstable-clang-8-debug:
+- extends: .clang-8-x86-64-build-debug
+- variables:
+- CONTAINER: debian:unstable
+-
+ debian-unstable-gcc:
+ extends: .gcc-x86-64-build
+ variables:
+--
+2.40.0
+
diff --git a/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch b/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch
deleted file mode 100644
index 724d1d8..0000000
--- a/0023-tools-xenstore-minor-fix-of-the-migration-stream-doc.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From 49510071ee93905378e54664778760ed3908d447 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 11 Oct 2022 15:00:59 +0200
-Subject: [PATCH 23/87] tools/xenstore: minor fix of the migration stream doc
-
-Drop mentioning the non-existent read-only socket in the migration
-stream description document.
-
-The related record field was removed in commit 8868a0e3f674 ("docs:
-update the xenstore migration stream documentation).
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-master commit: ace1d2eff80d3d66c37ae765dae3e3cb5697e5a4
-master date: 2022-09-08 09:25:58 +0200
----
- docs/designs/xenstore-migration.md | 8 +++-----
- 1 file changed, 3 insertions(+), 5 deletions(-)
-
-diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md
-index 5f1155273ec3..78530bbb0ef4 100644
---- a/docs/designs/xenstore-migration.md
-+++ b/docs/designs/xenstore-migration.md
-@@ -129,11 +129,9 @@ xenstored state that needs to be restored.
- | `evtchn-fd` | The file descriptor used to communicate with |
- | | the event channel driver |
-
--xenstored will resume in the original process context. Hence `rw-socket-fd` and
--`ro-socket-fd` simply specify the file descriptors of the sockets. Sockets
--are not always used, however, and so -1 will be used to denote an unused
--socket.
--
-+xenstored will resume in the original process context. Hence `rw-socket-fd`
-+simply specifies the file descriptor of the socket. Sockets are not always
-+used, however, and so -1 will be used to denote an unused socket.
-
- \pagebreak
-
---
-2.37.4
-
diff --git a/0024-libs-util-Fix-parallel-build-between-flex-bison-and-.patch b/0024-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
new file mode 100644
index 0000000..6164878
--- /dev/null
+++ b/0024-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
@@ -0,0 +1,50 @@
+From e4b5dff3d06421847761669a3676bef1f23e705a Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Fri, 3 Mar 2023 08:06:23 +0100
+Subject: [PATCH 24/61] libs/util: Fix parallel build between flex/bison and CC
+ rules
+
+flex/bison generate two targets, and when those targets are
+prerequisites of other rules, they are considered independently by make.
+
+We can have a situation where the .c file is out-of-date but not the
+.h, after a git checkout for example. In this case, if a rule only has
+the .h file as a prerequisite, make will proceed and start to build the
+object. In parallel, another target can have the .c file as a
+prerequisite, and make will find out it needs re-generating and do so,
+changing the .h at the same time. This parallel task breaks the first
+one.
+
+To avoid this scenario, we put both the header and the source as
+prerequisites for all objects even if they only need the header.
+
+Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: bf652a50fb3bb3b1b3d93db6fb79bc28f978fe75
+master date: 2023-02-09 18:26:17 +0000
+---
+ tools/libs/util/Makefile | 8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+diff --git a/tools/libs/util/Makefile b/tools/libs/util/Makefile
+index b739360be7..977849c056 100644
+--- a/tools/libs/util/Makefile
++++ b/tools/libs/util/Makefile
+@@ -41,6 +41,14 @@ include $(XEN_ROOT)/tools/libs/libs.mk
+
+ $(LIB_OBJS) $(PIC_OBJS): $(AUTOINCS) _paths.h
+
++# Adding the .c counterparts of the headers generated by flex/bison as
++# prerequisites of all objects.
++# This is to tell make that if only the .c file is out-of-date but not the
++# header, it should still wait for the .c file to be rebuilt.
++# Otherwise, make doesn't consider "%.c %.h" as grouped targets, and will run
++# the flex/bison rules in parallel with the CC rules which only need the header.
++$(LIB_OBJS) $(PIC_OBJS): libxlu_cfg_l.c libxlu_cfg_y.c libxlu_disk_l.c
++
+ %.c %.h:: %.y
+ @rm -f $*.[ch]
+ $(BISON) --output=$*.c $<
+--
+2.40.0
+
diff --git a/0024-xen-gnttab-fix-gnttab_acquire_resource.patch b/0024-xen-gnttab-fix-gnttab_acquire_resource.patch
deleted file mode 100644
index 49c0b7a..0000000
--- a/0024-xen-gnttab-fix-gnttab_acquire_resource.patch
+++ /dev/null
@@ -1,69 +0,0 @@
-From b9560762392c01b3ee84148c07be8017cb42dbc9 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 11 Oct 2022 15:01:22 +0200
-Subject: [PATCH 24/87] xen/gnttab: fix gnttab_acquire_resource()
-
-Commit 9dc46386d89d ("gnttab: work around "may be used uninitialized"
-warning") was wrong, as vaddrs can legitimately be NULL in case
-XENMEM_resource_grant_table_id_status was specified for a grant table
-v1. This would result in crashes in debug builds due to
-ASSERT_UNREACHABLE() triggering.
-
-Check vaddrs only to be NULL in the rc == 0 case.
-
-Expand the tests in tools/tests/resource to tickle this path, and verify that
-using XENMEM_resource_grant_table_id_status on a v1 grant table fails.
-
-Fixes: 9dc46386d89d ("gnttab: work around "may be used uninitialized" warning")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com> # xen
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 52daa6a8483e4fbd6757c9d1b791e23931791608
-master date: 2022-09-09 16:28:38 +0100
----
- tools/tests/resource/test-resource.c | 15 +++++++++++++++
- xen/common/grant_table.c | 2 +-
- 2 files changed, 16 insertions(+), 1 deletion(-)
-
-diff --git a/tools/tests/resource/test-resource.c b/tools/tests/resource/test-resource.c
-index 0557f8a1b585..37dfff4dcd20 100644
---- a/tools/tests/resource/test-resource.c
-+++ b/tools/tests/resource/test-resource.c
-@@ -106,6 +106,21 @@ static void test_gnttab(uint32_t domid, unsigned int nr_frames,
- if ( rc )
- return fail(" Fail: Unmap grant table %d - %s\n",
- errno, strerror(errno));
-+
-+ /*
-+ * Verify that an attempt to map the status frames fails, as the domain is
-+ * in gnttab v1 mode.
-+ */
-+ res = xenforeignmemory_map_resource(
-+ fh, domid, XENMEM_resource_grant_table,
-+ XENMEM_resource_grant_table_id_status, 0, 1,
-+ (void **)&gnttab, PROT_READ | PROT_WRITE, 0);
-+
-+ if ( res )
-+ {
-+ fail(" Fail: Managed to map gnttab v2 status frames in v1 mode\n");
-+ xenforeignmemory_unmap_resource(fh, res);
-+ }
- }
-
- static void test_domain_configurations(void)
-diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
-index d8ca645b96ff..76272b3c8add 100644
---- a/xen/common/grant_table.c
-+++ b/xen/common/grant_table.c
-@@ -4142,7 +4142,7 @@ int gnttab_acquire_resource(
- * on non-error paths, and hence it needs setting to NULL at the top of the
- * function. Leave some runtime safety.
- */
-- if ( !vaddrs )
-+ if ( !rc && !vaddrs )
- {
- ASSERT_UNREACHABLE();
- rc = -ENODATA;
---
-2.37.4
-
diff --git a/0025-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch b/0025-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
new file mode 100644
index 0000000..e73f62d
--- /dev/null
+++ b/0025-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
@@ -0,0 +1,128 @@
+From 2094f834b85d32233c76763b014bc8764c3e36b1 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 3 Mar 2023 08:06:44 +0100
+Subject: [PATCH 25/61] x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
+
+We don't actually need ecx yet, but adding it in now will reduce the extent to
+which leaf 7 is out of order in a featureset.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: b4a23bf6293aadecfd03bf9e83974443e2eac9cb
+master date: 2023-02-09 18:26:17 +0000
+---
+ tools/misc/xen-cpuid.c | 10 ++++++++++
+ xen/arch/x86/cpu/common.c | 3 ++-
+ xen/include/public/arch-x86/cpufeatureset.h | 4 ++++
+ xen/include/xen/lib/x86/cpuid.h | 17 +++++++++++++++--
+ 4 files changed, 31 insertions(+), 3 deletions(-)
+
+diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
+index cd094427dd..3cfbbf043f 100644
+--- a/tools/misc/xen-cpuid.c
++++ b/tools/misc/xen-cpuid.c
+@@ -198,6 +198,14 @@ static const char *const str_7b1[32] =
+ {
+ };
+
++static const char *const str_7c1[32] =
++{
++};
++
++static const char *const str_7d1[32] =
++{
++};
++
+ static const char *const str_7d2[32] =
+ {
+ [ 0] = "intel-psfd",
+@@ -223,6 +231,8 @@ static const struct {
+ { "0x80000021.eax", "e21a", str_e21a },
+ { "0x00000007:1.ebx", "7b1", str_7b1 },
+ { "0x00000007:2.edx", "7d2", str_7d2 },
++ { "0x00000007:1.ecx", "7c1", str_7c1 },
++ { "0x00000007:1.edx", "7d1", str_7d1 },
+ };
+
+ #define COL_ALIGN "18"
+diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
+index 9ce148a666..8222de6461 100644
+--- a/xen/arch/x86/cpu/common.c
++++ b/xen/arch/x86/cpu/common.c
+@@ -448,7 +448,8 @@ static void generic_identify(struct cpuinfo_x86 *c)
+ cpuid_count(7, 1,
+ &c->x86_capability[FEATURESET_7a1],
+ &c->x86_capability[FEATURESET_7b1],
+- &tmp, &tmp);
++ &c->x86_capability[FEATURESET_7c1],
++ &c->x86_capability[FEATURESET_7d1]);
+ if (max_subleaf >= 2)
+ cpuid_count(7, 2,
+ &tmp, &tmp, &tmp,
+diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
+index e073122140..0b01ca5e8f 100644
+--- a/xen/include/public/arch-x86/cpufeatureset.h
++++ b/xen/include/public/arch-x86/cpufeatureset.h
+@@ -304,6 +304,10 @@ XEN_CPUFEATURE(NSCB, 11*32+ 6) /*A Null Selector Clears Base (and
+ /* Intel-defined CPU features, CPUID level 0x00000007:2.edx, word 13 */
+ XEN_CPUFEATURE(INTEL_PSFD, 13*32+ 0) /*A MSR_SPEC_CTRL.PSFD */
+
++/* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */
++
++/* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */
++
+ #endif /* XEN_CPUFEATURE */
+
+ /* Clean up from a default include. Close the enum (for C). */
+diff --git a/xen/include/xen/lib/x86/cpuid.h b/xen/include/xen/lib/x86/cpuid.h
+index 50be07c0eb..fa98b371ee 100644
+--- a/xen/include/xen/lib/x86/cpuid.h
++++ b/xen/include/xen/lib/x86/cpuid.h
+@@ -17,7 +17,9 @@
+ #define FEATURESET_7a1 10 /* 0x00000007:1.eax */
+ #define FEATURESET_e21a 11 /* 0x80000021.eax */
+ #define FEATURESET_7b1 12 /* 0x00000007:1.ebx */
+-#define FEATURESET_7d2 13 /* 0x80000007:2.edx */
++#define FEATURESET_7d2 13 /* 0x00000007:2.edx */
++#define FEATURESET_7c1 14 /* 0x00000007:1.ecx */
++#define FEATURESET_7d1 15 /* 0x00000007:1.edx */
+
+ struct cpuid_leaf
+ {
+@@ -194,7 +196,14 @@ struct cpuid_policy
+ uint32_t _7b1;
+ struct { DECL_BITFIELD(7b1); };
+ };
+- uint32_t /* c */:32, /* d */:32;
++ union {
++ uint32_t _7c1;
++ struct { DECL_BITFIELD(7c1); };
++ };
++ union {
++ uint32_t _7d1;
++ struct { DECL_BITFIELD(7d1); };
++ };
+
+ /* Subleaf 2. */
+ uint32_t /* a */:32, /* b */:32, /* c */:32;
+@@ -343,6 +352,8 @@ static inline void cpuid_policy_to_featureset(
+ fs[FEATURESET_e21a] = p->extd.e21a;
+ fs[FEATURESET_7b1] = p->feat._7b1;
+ fs[FEATURESET_7d2] = p->feat._7d2;
++ fs[FEATURESET_7c1] = p->feat._7c1;
++ fs[FEATURESET_7d1] = p->feat._7d1;
+ }
+
+ /* Fill in a CPUID policy from a featureset bitmap. */
+@@ -363,6 +374,8 @@ static inline void cpuid_featureset_to_policy(
+ p->extd.e21a = fs[FEATURESET_e21a];
+ p->feat._7b1 = fs[FEATURESET_7b1];
+ p->feat._7d2 = fs[FEATURESET_7d2];
++ p->feat._7c1 = fs[FEATURESET_7c1];
++ p->feat._7d1 = fs[FEATURESET_7d1];
+ }
+
+ static inline uint64_t cpuid_policy_xcr0_max(const struct cpuid_policy *p)
+--
+2.40.0
+
diff --git a/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch b/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch
deleted file mode 100644
index 489a9c8..0000000
--- a/0025-x86-wire-up-VCPUOP_register_vcpu_time_memory_area-fo.patch
+++ /dev/null
@@ -1,59 +0,0 @@
-From 3f4da85ca8816f6617529c80850eaddd80ea0f1f Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 11 Oct 2022 15:01:36 +0200
-Subject: [PATCH 25/87] x86: wire up VCPUOP_register_vcpu_time_memory_area for
- 32-bit guests
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Forever sinced its introduction VCPUOP_register_vcpu_time_memory_area
-was available only to native domains. Linux, for example, would attempt
-to use it irrespective of guest bitness (including in its so called
-PVHVM mode) as long as it finds XEN_PVCLOCK_TSC_STABLE_BIT set (which we
-set only for clocksource=tsc, which in turn needs engaging via command
-line option).
-
-Fixes: a5d39947cb89 ("Allow guests to register secondary vcpu_time_info")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: b726541d94bd0a80b5864d17a2cd2e6d73a3fe0a
-master date: 2022-09-29 14:47:45 +0200
----
- xen/arch/x86/x86_64/domain.c | 20 ++++++++++++++++++++
- 1 file changed, 20 insertions(+)
-
-diff --git a/xen/arch/x86/x86_64/domain.c b/xen/arch/x86/x86_64/domain.c
-index c46dccc25a54..d51d99344796 100644
---- a/xen/arch/x86/x86_64/domain.c
-+++ b/xen/arch/x86/x86_64/domain.c
-@@ -54,6 +54,26 @@ arch_compat_vcpu_op(
- break;
- }
-
-+ case VCPUOP_register_vcpu_time_memory_area:
-+ {
-+ struct compat_vcpu_register_time_memory_area area = { .addr.p = 0 };
-+
-+ rc = -EFAULT;
-+ if ( copy_from_guest(&area.addr.h, arg, 1) )
-+ break;
-+
-+ if ( area.addr.h.c != area.addr.p ||
-+ !compat_handle_okay(area.addr.h, 1) )
-+ break;
-+
-+ rc = 0;
-+ guest_from_compat_handle(v->arch.time_info_guest, area.addr.h);
-+
-+ force_update_vcpu_system_time(v);
-+
-+ break;
-+ }
-+
- case VCPUOP_get_physid:
- rc = arch_do_vcpu_op(cmd, v, arg);
- break;
---
-2.37.4
-
diff --git a/0026-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch b/0026-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
new file mode 100644
index 0000000..7fd4031
--- /dev/null
+++ b/0026-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
@@ -0,0 +1,191 @@
+From 5857cc632b884711c172c5766b8fbba59f990b47 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 3 Mar 2023 08:12:24 +0100
+Subject: [PATCH 26/61] x86/shskt: Disable CET-SS on parts susceptible to
+ fractured updates
+
+Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack
+Token".
+
+Architecturally, an event delivery which starts in CPL<3 and switches shadow
+stack will first validate the Supervisor Shadow Stack Token (setting the busy
+bit), then pushes CS/LIP/SSP. One example of this is an NMI interrupting Xen.
+
+Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc
+between setting the busy bit and completing the event injection renders the
+action non-restartable, because when it comes time to restart, the busy bit is
+found to be already set.
+
+This is far more easily encountered under virt, yet it is not the fault of the
+hypervisor, nor the fault of the guest kernel. The fault lies somewhere
+between the architectural specification, and the uarch behaviour.
+
+Intel have allocated CPUID.7[1].edx[18] CET_SSS to enumerate that supervisor
+shadow stacks are safe to use. Because of how Xen lays out its shadow stacks,
+fracturing is not expected to be a problem on native.
+
+Detect this case on boot and default to not using shstk if virtualised.
+Specifying `cet=shstk` on the command line will override this heuristic and
+enable shadow stacks irrespective.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 01e7477d1b081cff4288ff9f51ec59ee94c03ee0
+master date: 2023-02-09 18:26:17 +0000
+---
+ docs/misc/xen-command-line.pandoc | 7 +++-
+ tools/libs/light/libxl_cpuid.c | 2 +
+ tools/misc/xen-cpuid.c | 1 +
+ xen/arch/x86/cpu/common.c | 8 +++-
+ xen/arch/x86/setup.c | 46 +++++++++++++++++----
+ xen/include/public/arch-x86/cpufeatureset.h | 1 +
+ 6 files changed, 55 insertions(+), 10 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index b3f60cd923..a6018fd5c3 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -287,10 +287,15 @@ can be maintained with the pv-shim mechanism.
+ protection.
+
+ The option is available when `CONFIG_XEN_SHSTK` is compiled in, and
+- defaults to `true` on hardware supporting CET-SS. Specifying
++ generally defaults to `true` on hardware supporting CET-SS. Specifying
+ `cet=no-shstk` will cause Xen not to use Shadow Stacks even when support
+ is available in hardware.
+
++ Some hardware suffers from an issue known as Supervisor Shadow Stack
++ Fracturing. On such hardware, Xen will default to not using Shadow Stacks
++ when virtualised. Specifying `cet=shstk` will override this heuristic and
++ enable Shadow Stacks unilaterally.
++
+ * The `ibt=` boolean controls whether Xen uses Indirect Branch Tracking for
+ its own protection.
+
+diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
+index 691d5c6b2a..b4eacc2bd5 100644
+--- a/tools/libs/light/libxl_cpuid.c
++++ b/tools/libs/light/libxl_cpuid.c
+@@ -234,6 +234,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
+ {"fsrs", 0x00000007, 1, CPUID_REG_EAX, 11, 1},
+ {"fsrcs", 0x00000007, 1, CPUID_REG_EAX, 12, 1},
+
++ {"cet-sss", 0x00000007, 1, CPUID_REG_EDX, 18, 1},
++
+ {"intel-psfd", 0x00000007, 2, CPUID_REG_EDX, 0, 1},
+
+ {"lahfsahf", 0x80000001, NA, CPUID_REG_ECX, 0, 1},
+diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
+index 3cfbbf043f..db9c4ed8fc 100644
+--- a/tools/misc/xen-cpuid.c
++++ b/tools/misc/xen-cpuid.c
+@@ -204,6 +204,7 @@ static const char *const str_7c1[32] =
+
+ static const char *const str_7d1[32] =
+ {
++ [18] = "cet-sss",
+ };
+
+ static const char *const str_7d2[32] =
+diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
+index 8222de6461..e1fc034ce6 100644
+--- a/xen/arch/x86/cpu/common.c
++++ b/xen/arch/x86/cpu/common.c
+@@ -344,9 +344,15 @@ void __init early_cpu_init(void)
+ c->x86_model, c->x86_model, c->x86_mask, eax);
+
+ if (c->cpuid_level >= 7) {
+- cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
++ uint32_t max_subleaf;
++
++ cpuid_count(7, 0, &max_subleaf, &ebx, &ecx, &edx);
+ c->x86_capability[cpufeat_word(X86_FEATURE_CET_SS)] = ecx;
+ c->x86_capability[cpufeat_word(X86_FEATURE_CET_IBT)] = edx;
++
++ if (max_subleaf >= 1)
++ cpuid_count(7, 1, &eax, &ebx, &ecx,
++ &c->x86_capability[FEATURESET_7d1]);
+ }
+
+ eax = cpuid_eax(0x80000000);
+diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
+index 70b37d8afe..f0de805780 100644
+--- a/xen/arch/x86/setup.c
++++ b/xen/arch/x86/setup.c
+@@ -98,11 +98,7 @@ unsigned long __initdata highmem_start;
+ size_param("highmem-start", highmem_start);
+ #endif
+
+-#ifdef CONFIG_XEN_SHSTK
+-static bool __initdata opt_xen_shstk = true;
+-#else
+-#define opt_xen_shstk false
+-#endif
++static int8_t __initdata opt_xen_shstk = -IS_ENABLED(CONFIG_XEN_SHSTK);
+
+ #ifdef CONFIG_XEN_IBT
+ static bool __initdata opt_xen_ibt = true;
+@@ -1113,11 +1109,45 @@ void __init noreturn __start_xen(unsigned long mbi_p)
+ early_cpu_init();
+
+ /* Choose shadow stack early, to set infrastructure up appropriately. */
+- if ( opt_xen_shstk && boot_cpu_has(X86_FEATURE_CET_SS) )
++ if ( !boot_cpu_has(X86_FEATURE_CET_SS) )
++ opt_xen_shstk = 0;
++
++ if ( opt_xen_shstk )
+ {
+- printk("Enabling Supervisor Shadow Stacks\n");
++ /*
++ * Some CPUs suffer from Shadow Stack Fracturing, an issue whereby a
++ * fault/VMExit/etc between setting a Supervisor Busy bit and the
++ * event delivery completing renders the operation non-restartable.
++ * On restart, event delivery will find the Busy bit already set.
++ *
++ * This is a problem on bare metal, but outside of synthetic cases or
++ * a very badly timed #MC, it's not believed to be a problem. It is a
++ * much bigger problem under virt, because we can VMExit for a number
++ * of legitimate reasons and tickle this bug.
++ *
++ * CPUs with this addressed enumerate CET-SSS to indicate that
++ * supervisor shadow stacks are now safe to use.
++ */
++ bool cpu_has_bug_shstk_fracture =
++ boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
++ !boot_cpu_has(X86_FEATURE_CET_SSS);
+
+- setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK);
++ /*
++ * On bare metal, assume that Xen won't be impacted by shstk
++ * fracturing problems. Under virt, be more conservative and disable
++ * shstk by default.
++ */
++ if ( opt_xen_shstk == -1 )
++ opt_xen_shstk =
++ cpu_has_hypervisor ? !cpu_has_bug_shstk_fracture
++ : true;
++
++ if ( opt_xen_shstk )
++ {
++ printk("Enabling Supervisor Shadow Stacks\n");
++
++ setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK);
++ }
+ }
+
+ if ( opt_xen_ibt && boot_cpu_has(X86_FEATURE_CET_IBT) )
+diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
+index 0b01ca5e8f..4832ad09df 100644
+--- a/xen/include/public/arch-x86/cpufeatureset.h
++++ b/xen/include/public/arch-x86/cpufeatureset.h
+@@ -307,6 +307,7 @@ XEN_CPUFEATURE(INTEL_PSFD, 13*32+ 0) /*A MSR_SPEC_CTRL.PSFD */
+ /* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */
+
+ /* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */
++XEN_CPUFEATURE(CET_SSS, 15*32+18) /* CET Supervisor Shadow Stacks safe to use */
+
+ #endif /* XEN_CPUFEATURE */
+
+--
+2.40.0
+
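A minimal sketch of the tri-state default used above may help when reading the
setup.c hunk; the helper below is illustrative only (made-up name and
parameters, not the real Xen code), assuming -1 means "no explicit cet= choice".

    #include <stdbool.h>
    #include <stdio.h>

    /*
     * Sketch of the decision only: -1 = no explicit choice (heuristic decides),
     * 0 = off, 1 = explicitly on, mirroring
     * "static int8_t opt_xen_shstk = -IS_ENABLED(CONFIG_XEN_SHSTK)".
     */
    static int decide_shstk(int opt, bool has_cet_ss,
                            bool intel_without_sss, bool virtualised)
    {
        if ( !has_cet_ss )
            return 0;                 /* hardware lacks CET-SS entirely */

        if ( opt == -1 )
            /* Bare metal: assume safe.  Under virt: require the CET-SSS fix. */
            opt = virtualised ? !intel_without_sss : 1;

        return opt;
    }

    int main(void)
    {
        printf("virt, fracturing part : %d\n", decide_shstk(-1, true, true, true));
        printf("bare metal, same part : %d\n", decide_shstk(-1, true, true, false));
        printf("cet=shstk override    : %d\n", decide_shstk(1, true, true, true));
        return 0;
    }
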
diff --git a/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch b/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch
deleted file mode 100644
index 910f573..0000000
--- a/0026-x86-vpmu-Fix-race-condition-in-vpmu_load.patch
+++ /dev/null
@@ -1,97 +0,0 @@
-From 1bce7fb1f702da4f7a749c6f1457ecb20bf74fca Mon Sep 17 00:00:00 2001
-From: Tamas K Lengyel <tamas.lengyel@intel.com>
-Date: Tue, 11 Oct 2022 15:01:48 +0200
-Subject: [PATCH 26/87] x86/vpmu: Fix race-condition in vpmu_load
-
-The vPMU code-bases attempts to perform an optimization on saving/reloading the
-PMU context by keeping track of what vCPU ran on each pCPU. When a pCPU is
-getting scheduled, checks if the previous vCPU isn't the current one. If so,
-attempts a call to vpmu_save_force. Unfortunately if the previous vCPU is
-already getting scheduled to run on another pCPU its state will be already
-runnable, which results in an ASSERT failure.
-
-Fix this by always performing a pmu context save in vpmu_save when called from
-vpmu_switch_from, and do a vpmu_load when called from vpmu_switch_to.
-
-While this presents a minimal overhead in case the same vCPU is getting
-rescheduled on the same pCPU, the ASSERT failure is avoided and the code is a
-lot easier to reason about.
-
-Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: defa4e51d20a143bdd4395a075bf0933bb38a9a4
-master date: 2022-09-30 09:53:49 +0200
----
- xen/arch/x86/cpu/vpmu.c | 42 ++++-------------------------------------
- 1 file changed, 4 insertions(+), 38 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
-index 16e91a3694fe..b6c2ec3cd047 100644
---- a/xen/arch/x86/cpu/vpmu.c
-+++ b/xen/arch/x86/cpu/vpmu.c
-@@ -368,58 +368,24 @@ void vpmu_save(struct vcpu *v)
- vpmu->last_pcpu = pcpu;
- per_cpu(last_vcpu, pcpu) = v;
-
-+ vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
-+
- if ( vpmu->arch_vpmu_ops )
- if ( vpmu->arch_vpmu_ops->arch_vpmu_save(v, 0) )
- vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-
-+ vpmu_reset(vpmu, VPMU_CONTEXT_SAVE);
-+
- apic_write(APIC_LVTPC, PMU_APIC_VECTOR | APIC_LVT_MASKED);
- }
-
- int vpmu_load(struct vcpu *v, bool_t from_guest)
- {
- struct vpmu_struct *vpmu = vcpu_vpmu(v);
-- int pcpu = smp_processor_id();
-- struct vcpu *prev = NULL;
-
- if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) )
- return 0;
-
-- /* First time this VCPU is running here */
-- if ( vpmu->last_pcpu != pcpu )
-- {
-- /*
-- * Get the context from last pcpu that we ran on. Note that if another
-- * VCPU is running there it must have saved this VPCU's context before
-- * startig to run (see below).
-- * There should be no race since remote pcpu will disable interrupts
-- * before saving the context.
-- */
-- if ( vpmu_is_set(vpmu, VPMU_CONTEXT_LOADED) )
-- {
-- on_selected_cpus(cpumask_of(vpmu->last_pcpu),
-- vpmu_save_force, (void *)v, 1);
-- vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
-- }
-- }
--
-- /* Prevent forced context save from remote CPU */
-- local_irq_disable();
--
-- prev = per_cpu(last_vcpu, pcpu);
--
-- if ( prev != v && prev )
-- {
-- vpmu = vcpu_vpmu(prev);
--
-- /* Someone ran here before us */
-- vpmu_save_force(prev);
-- vpmu_reset(vpmu, VPMU_CONTEXT_LOADED);
--
-- vpmu = vcpu_vpmu(v);
-- }
--
-- local_irq_enable();
--
- /* Only when PMU is counting, we load PMU context immediately. */
- if ( !vpmu_is_set(vpmu, VPMU_RUNNING) ||
- (!has_vlapic(vpmu_vcpu(vpmu)->domain) &&
---
-2.37.4
-
diff --git a/0027-arm-p2m-Rework-p2m_init.patch b/0027-arm-p2m-Rework-p2m_init.patch
deleted file mode 100644
index 0668899..0000000
--- a/0027-arm-p2m-Rework-p2m_init.patch
+++ /dev/null
@@ -1,88 +0,0 @@
-From 86cb37447548420e41ff953a7372972f6154d6d1 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 25 Oct 2022 09:21:11 +0000
-Subject: [PATCH 27/87] arm/p2m: Rework p2m_init()
-
-p2m_init() is mostly trivial initialisation, but has two fallible operations
-which are on either side of the backpointer trigger for teardown to take
-actions.
-
-p2m_free_vmid() is idempotent with a failed p2m_alloc_vmid(), so rearrange
-p2m_init() to perform all trivial setup, then set the backpointer, then
-perform all fallible setup.
-
-This will simplify a future bugfix which needs to add a third fallible
-operation.
-
-No practical change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
-(cherry picked from commit: 3783e583319fa1ce75e414d851f0fde191a14753)
----
- xen/arch/arm/p2m.c | 24 ++++++++++++------------
- 1 file changed, 12 insertions(+), 12 deletions(-)
-
-diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
-index b2d856a801af..4f7d923ad9f8 100644
---- a/xen/arch/arm/p2m.c
-+++ b/xen/arch/arm/p2m.c
-@@ -1730,7 +1730,7 @@ void p2m_final_teardown(struct domain *d)
- int p2m_init(struct domain *d)
- {
- struct p2m_domain *p2m = p2m_get_hostp2m(d);
-- int rc = 0;
-+ int rc;
- unsigned int cpu;
-
- rwlock_init(&p2m->lock);
-@@ -1739,11 +1739,6 @@ int p2m_init(struct domain *d)
- INIT_PAGE_LIST_HEAD(&d->arch.paging.p2m_freelist);
-
- p2m->vmid = INVALID_VMID;
--
-- rc = p2m_alloc_vmid(d);
-- if ( rc != 0 )
-- return rc;
--
- p2m->max_mapped_gfn = _gfn(0);
- p2m->lowest_mapped_gfn = _gfn(ULONG_MAX);
-
-@@ -1759,8 +1754,6 @@ int p2m_init(struct domain *d)
- p2m->clean_pte = is_iommu_enabled(d) &&
- !iommu_has_feature(d, IOMMU_FEAT_COHERENT_WALK);
-
-- rc = p2m_alloc_table(d);
--
- /*
- * Make sure that the type chosen to is able to store the an vCPU ID
- * between 0 and the maximum of virtual CPUS supported as long as
-@@ -1773,13 +1766,20 @@ int p2m_init(struct domain *d)
- p2m->last_vcpu_ran[cpu] = INVALID_VCPU_ID;
-
- /*
-- * Besides getting a domain when we only have the p2m in hand,
-- * the back pointer to domain is also used in p2m_teardown()
-- * as an end-of-initialization indicator.
-+ * "Trivial" initialisation is now complete. Set the backpointer so
-+ * p2m_teardown() and friends know to do something.
- */
- p2m->domain = d;
-
-- return rc;
-+ rc = p2m_alloc_vmid(d);
-+ if ( rc )
-+ return rc;
-+
-+ rc = p2m_alloc_table(d);
-+ if ( rc )
-+ return rc;
-+
-+ return 0;
- }
-
- /*
---
-2.37.4
-
diff --git a/0027-credit2-respect-credit2_runqueue-all-when-arranging-.patch b/0027-credit2-respect-credit2_runqueue-all-when-arranging-.patch
new file mode 100644
index 0000000..6c8ab5c
--- /dev/null
+++ b/0027-credit2-respect-credit2_runqueue-all-when-arranging-.patch
@@ -0,0 +1,69 @@
+From 366693226ce025e8721626609b4b43b9061b55f5 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
+ <marmarek@invisiblethingslab.com>
+Date: Fri, 3 Mar 2023 08:13:20 +0100
+Subject: [PATCH 27/61] credit2: respect credit2_runqueue=all when arranging
+ runqueues
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Documentation for credit2_runqueue=all says it should create one queue
+for all pCPUs on the host. But since the introduction of
+sched_credit2_max_cpus_runqueue, it actually created a separate runqueue
+per socket, even if the CPU count is below
+sched_credit2_max_cpus_runqueue.
+
+Adjust the condition to skip the sibling check in the case of
+credit2_runqueue=all.
+
+Fixes: 8e2aa76dc167 ("xen: credit2: limit the max number of CPUs in a runqueue")
+Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: 1f5747ee929fbbcae58d7234c6c38a77495d0cfe
+master date: 2023-02-15 16:12:42 +0100
+---
+ docs/misc/xen-command-line.pandoc | 5 +++++
+ xen/common/sched/credit2.c | 9 +++++++--
+ 2 files changed, 12 insertions(+), 2 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index a6018fd5c3..7b7a619c1b 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -724,6 +724,11 @@ Available alternatives, with their meaning, are:
+ * `all`: just one runqueue shared by all the logical pCPUs of
+ the host
+
++Regardless of the above choice, Xen attempts to respect the
++`sched_credit2_max_cpus_runqueue` limit, which may mean more than one runqueue
++for the `all` value. If that isn't intended, raise
++the `sched_credit2_max_cpus_runqueue` value.
++
+ ### dbgp
+ > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
+
+diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
+index 6396b38e04..1a240f417a 100644
+--- a/xen/common/sched/credit2.c
++++ b/xen/common/sched/credit2.c
+@@ -996,9 +996,14 @@ cpu_add_to_runqueue(const struct scheduler *ops, unsigned int cpu)
+ *
+ * Otherwise, let's try to make sure that siblings stay in the
+ * same runqueue, pretty much under any cinrcumnstances.
++ *
++ * Furthermore, try to respect credit2_runqueue=all, as long as
++ * max_cpus_runq isn't violated.
+ */
+- if ( rqd->refcnt < max_cpus_runq && (ops->cpupool->gran != SCHED_GRAN_cpu ||
+- cpu_runqueue_siblings_match(rqd, cpu, max_cpus_runq)) )
++ if ( rqd->refcnt < max_cpus_runq &&
++ (ops->cpupool->gran != SCHED_GRAN_cpu ||
++ cpu_runqueue_siblings_match(rqd, cpu, max_cpus_runq) ||
++ opt_runqueue == OPT_RUNQUEUE_ALL) )
+ {
+ /*
+ * This runqueue is ok, but as we said, we also want an even
+--
+2.40.0
+
diff --git a/0028-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch b/0028-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
new file mode 100644
index 0000000..55df5d0
--- /dev/null
+++ b/0028-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
@@ -0,0 +1,152 @@
+From d1c6934b41f8288ea3169e63bce8a7eea9d9c549 Mon Sep 17 00:00:00 2001
+From: Sergey Dyasli <sergey.dyasli@citrix.com>
+Date: Fri, 3 Mar 2023 08:14:01 +0100
+Subject: [PATCH 28/61] x86/ucode/AMD: apply the patch early on every logical
+ thread
+
+The original issue has been reported on AMD Bulldozer-based CPUs where
+ucode loading loses the LWP feature bit in order to gain the IBPB bit.
+LWP disabling is a per-SMT/CMT core modification and needs to happen on
+each sibling thread despite the shared microcode engine. Otherwise,
+logical CPUs will end up with different cpuid capabilities.
+Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
+
+Guests running under Xen happen not to be affected because of the levelling
+logic for the feature masking/override MSRs, which causes the LWP bit to
+fall out and hides the issue. The latest recommendation from AMD, after
+discussing this bug, is to load ucode on every logical CPU.
+
+In the Linux kernel this issue has been addressed by e7ad18d1169c
+("x86/microcode/AMD: Apply the patch early on every logical thread").
+Follow the same approach in Xen.
+
+Introduce SAME_UCODE match result and use it for early AMD ucode
+loading. Take this opportunity and move opt_ucode_allow_same out of
+compare_revisions() to the relevant callers and also modify the warning
+message based on it. Intel's side of things is modified for consistency
+but provides no functional change.
+
+Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: f4ef8a41b80831db2136bdaff9f946a1a4b051e7
+master date: 2023-02-21 15:08:05 +0100
+---
+ xen/arch/x86/cpu/microcode/amd.c | 11 ++++++++---
+ xen/arch/x86/cpu/microcode/core.c | 24 ++++++++++++++++--------
+ xen/arch/x86/cpu/microcode/intel.c | 10 +++++++---
+ xen/arch/x86/cpu/microcode/private.h | 3 ++-
+ 4 files changed, 33 insertions(+), 15 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
+index fe92e594f1..52182c1a23 100644
+--- a/xen/arch/x86/cpu/microcode/amd.c
++++ b/xen/arch/x86/cpu/microcode/amd.c
+@@ -176,8 +176,8 @@ static enum microcode_match_result compare_revisions(
+ if ( new_rev > old_rev )
+ return NEW_UCODE;
+
+- if ( opt_ucode_allow_same && new_rev == old_rev )
+- return NEW_UCODE;
++ if ( new_rev == old_rev )
++ return SAME_UCODE;
+
+ return OLD_UCODE;
+ }
+@@ -220,8 +220,13 @@ static int apply_microcode(const struct microcode_patch *patch)
+ unsigned int cpu = smp_processor_id();
+ struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
+ uint32_t rev, old_rev = sig->rev;
++ enum microcode_match_result result = microcode_fits(patch);
+
+- if ( microcode_fits(patch) != NEW_UCODE )
++ /*
++ * Allow application of the same revision to pick up SMT-specific changes
++ * even if the revision of the other SMT thread is already up-to-date.
++ */
++ if ( result != NEW_UCODE && result != SAME_UCODE )
+ return -EINVAL;
+
+ if ( check_final_patch_levels(sig) )
+diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
+index ac3ceb567c..ceec1f1edc 100644
+--- a/xen/arch/x86/cpu/microcode/core.c
++++ b/xen/arch/x86/cpu/microcode/core.c
+@@ -608,16 +608,24 @@ static long microcode_update_helper(void *data)
+ * that ucode revision.
+ */
+ spin_lock(µcode_mutex);
+- if ( microcode_cache &&
+- microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
++ if ( microcode_cache )
+ {
+- spin_unlock(µcode_mutex);
+- printk(XENLOG_WARNING "microcode: couldn't find any newer revision "
+- "in the provided blob!\n");
+- microcode_free_patch(patch);
+- ret = -ENOENT;
++ enum microcode_match_result result;
+
+- goto put;
++ result = microcode_ops->compare_patch(patch, microcode_cache);
++
++ if ( result != NEW_UCODE &&
++ !(opt_ucode_allow_same && result == SAME_UCODE) )
++ {
++ spin_unlock(µcode_mutex);
++ printk(XENLOG_WARNING
++ "microcode: couldn't find any newer%s revision in the provided blob!\n",
++ opt_ucode_allow_same ? " (or the same)" : "");
++ microcode_free_patch(patch);
++ ret = -ENOENT;
++
++ goto put;
++ }
+ }
+ spin_unlock(µcode_mutex);
+
+diff --git a/xen/arch/x86/cpu/microcode/intel.c b/xen/arch/x86/cpu/microcode/intel.c
+index f6d01490e0..c26fbb8cc7 100644
+--- a/xen/arch/x86/cpu/microcode/intel.c
++++ b/xen/arch/x86/cpu/microcode/intel.c
+@@ -232,8 +232,8 @@ static enum microcode_match_result compare_revisions(
+ if ( new_rev > old_rev )
+ return NEW_UCODE;
+
+- if ( opt_ucode_allow_same && new_rev == old_rev )
+- return NEW_UCODE;
++ if ( new_rev == old_rev )
++ return SAME_UCODE;
+
+ /*
+ * Treat pre-production as always applicable - anyone using pre-production
+@@ -290,8 +290,12 @@ static int apply_microcode(const struct microcode_patch *patch)
+ unsigned int cpu = smp_processor_id();
+ struct cpu_signature *sig = &this_cpu(cpu_sig);
+ uint32_t rev, old_rev = sig->rev;
++ enum microcode_match_result result;
++
++ result = microcode_update_match(patch);
+
+- if ( microcode_update_match(patch) != NEW_UCODE )
++ if ( result != NEW_UCODE &&
++ !(opt_ucode_allow_same && result == SAME_UCODE) )
+ return -EINVAL;
+
+ wbinvd();
+diff --git a/xen/arch/x86/cpu/microcode/private.h b/xen/arch/x86/cpu/microcode/private.h
+index c085a10268..feafab0677 100644
+--- a/xen/arch/x86/cpu/microcode/private.h
++++ b/xen/arch/x86/cpu/microcode/private.h
+@@ -6,7 +6,8 @@
+ extern bool opt_ucode_allow_same;
+
+ enum microcode_match_result {
+- OLD_UCODE, /* signature matched, but revision id is older or equal */
++ OLD_UCODE, /* signature matched, but revision id is older */
++ SAME_UCODE, /* signature matched, but revision id is the same */
+ NEW_UCODE, /* signature matched, but revision id is newer */
+ MIS_UCODE, /* signature mismatched */
+ };
+--
+2.40.0
+
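The revision-gating rules above are spread over three files, so a compact
sketch may be easier to follow; everything below is illustrative (simplified,
invented names), not the actual microcode driver code.

    #include <stdbool.h>
    #include <stdint.h>

    enum ucode_match { OLD_UCODE, SAME_UCODE, NEW_UCODE, MIS_UCODE };

    static bool opt_allow_same;    /* stand-in for opt_ucode_allow_same */

    /* Pure comparison, no policy: mirrors the refactored compare_revisions(). */
    static enum ucode_match compare_revs(uint32_t new_rev, uint32_t old_rev)
    {
        if ( new_rev > old_rev )
            return NEW_UCODE;
        if ( new_rev == old_rev )
            return SAME_UCODE;
        return OLD_UCODE;
    }

    /* Per-thread apply: accept the same revision to pick up SMT-local changes. */
    static bool apply_ok(uint32_t patch_rev, uint32_t cpu_rev)
    {
        enum ucode_match r = compare_revs(patch_rev, cpu_rev);

        return r == NEW_UCODE || r == SAME_UCODE;
    }

    /* Admin-initiated update: the same revision only if explicitly allowed. */
    static bool update_ok(uint32_t patch_rev, uint32_t cached_rev)
    {
        enum ucode_match r = compare_revs(patch_rev, cached_rev);

        return r == NEW_UCODE || (opt_allow_same && r == SAME_UCODE);
    }
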
diff --git a/0028-xen-arm-p2m-Populate-pages-for-GICv2-mapping-in-p2m_.patch b/0028-xen-arm-p2m-Populate-pages-for-GICv2-mapping-in-p2m_.patch
deleted file mode 100644
index 7bc6c36..0000000
--- a/0028-xen-arm-p2m-Populate-pages-for-GICv2-mapping-in-p2m_.patch
+++ /dev/null
@@ -1,169 +0,0 @@
-From e5a5bdeba6a0c3eacd2ba39c1ee36b3c54e77dca Mon Sep 17 00:00:00 2001
-From: Henry Wang <Henry.Wang@arm.com>
-Date: Tue, 25 Oct 2022 09:21:12 +0000
-Subject: [PATCH 28/87] xen/arm: p2m: Populate pages for GICv2 mapping in
- p2m_init()
-
-Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
-when the domain is created. Considering the worst case of page tables
-which requires 6 P2M pages as the two pages will be consecutive but not
-necessarily in the same L3 page table and keep a buffer, populate 16
-pages as the default value to the P2M pages pool in p2m_init() at the
-domain creation stage to satisfy the GICv2 requirement. For GICv3, the
-above-mentioned P2M mapping is not necessary, but since the allocated
-16 pages here would not be lost, hence populate these pages
-unconditionally.
-
-With the default 16 P2M pages populated, there would be a case that
-failures would happen in the domain creation with P2M pages already in
-use. To properly free the P2M for this case, firstly support the
-optionally preemption of p2m_teardown(), then call p2m_teardown() and
-p2m_set_allocation(d, 0, NULL) non-preemptively in p2m_final_teardown().
-As non-preemptive p2m_teardown() should only return 0, use a
-BUG_ON to confirm that.
-
-Since p2m_final_teardown() is called either after
-domain_relinquish_resources() where relinquish_p2m_mapping() has been
-called, or from failure path of domain_create()/arch_domain_create()
-where mappings that require p2m_put_l3_page() should never be created,
-relinquish_p2m_mapping() is not added in p2m_final_teardown(), add
-in-code comments to refer this.
-
-Fixes: cbea5a1149ca ("xen/arm: Allocate and free P2M pages from the P2M pool")
-Suggested-by: Julien Grall <jgrall@amazon.com>
-Signed-off-by: Henry Wang <Henry.Wang@arm.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
-(cherry picked from commit: c7cff1188802646eaa38e918e5738da0e84949be)
----
- xen/arch/arm/domain.c | 2 +-
- xen/arch/arm/p2m.c | 34 ++++++++++++++++++++++++++++++++--
- xen/include/asm-arm/p2m.h | 14 ++++++++++----
- 3 files changed, 43 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
-index a818f33a1afa..c7feaa323ad1 100644
---- a/xen/arch/arm/domain.c
-+++ b/xen/arch/arm/domain.c
-@@ -1059,7 +1059,7 @@ int domain_relinquish_resources(struct domain *d)
- return ret;
-
- PROGRESS(p2m):
-- ret = p2m_teardown(d);
-+ ret = p2m_teardown(d, true);
- if ( ret )
- return ret;
-
-diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
-index 4f7d923ad9f8..6f87e17c1d08 100644
---- a/xen/arch/arm/p2m.c
-+++ b/xen/arch/arm/p2m.c
-@@ -1661,7 +1661,7 @@ static void p2m_free_vmid(struct domain *d)
- spin_unlock(&vmid_alloc_lock);
- }
-
--int p2m_teardown(struct domain *d)
-+int p2m_teardown(struct domain *d, bool allow_preemption)
- {
- struct p2m_domain *p2m = p2m_get_hostp2m(d);
- unsigned long count = 0;
-@@ -1669,6 +1669,9 @@ int p2m_teardown(struct domain *d)
- unsigned int i;
- int rc = 0;
-
-+ if ( page_list_empty(&p2m->pages) )
-+ return 0;
-+
- p2m_write_lock(p2m);
-
- /*
-@@ -1692,7 +1695,7 @@ int p2m_teardown(struct domain *d)
- p2m_free_page(p2m->domain, pg);
- count++;
- /* Arbitrarily preempt every 512 iterations */
-- if ( !(count % 512) && hypercall_preempt_check() )
-+ if ( allow_preemption && !(count % 512) && hypercall_preempt_check() )
- {
- rc = -ERESTART;
- break;
-@@ -1712,7 +1715,20 @@ void p2m_final_teardown(struct domain *d)
- if ( !p2m->domain )
- return;
-
-+ /*
-+ * No need to call relinquish_p2m_mapping() here because
-+ * p2m_final_teardown() is called either after domain_relinquish_resources()
-+ * where relinquish_p2m_mapping() has been called, or from failure path of
-+ * domain_create()/arch_domain_create() where mappings that require
-+ * p2m_put_l3_page() should never be created. For the latter case, also see
-+ * comment on top of the p2m_set_entry() for more info.
-+ */
-+
-+ BUG_ON(p2m_teardown(d, false));
- ASSERT(page_list_empty(&p2m->pages));
-+
-+ while ( p2m_teardown_allocation(d) == -ERESTART )
-+ continue; /* No preemption support here */
- ASSERT(page_list_empty(&d->arch.paging.p2m_freelist));
-
- if ( p2m->root )
-@@ -1779,6 +1795,20 @@ int p2m_init(struct domain *d)
- if ( rc )
- return rc;
-
-+ /*
-+ * Hardware using GICv2 needs to create a P2M mapping of 8KB GICv2 area
-+ * when the domain is created. Considering the worst case for page
-+ * tables and keep a buffer, populate 16 pages to the P2M pages pool here.
-+ * For GICv3, the above-mentioned P2M mapping is not necessary, but since
-+ * the allocated 16 pages here would not be lost, hence populate these
-+ * pages unconditionally.
-+ */
-+ spin_lock(&d->arch.paging.lock);
-+ rc = p2m_set_allocation(d, 16, NULL);
-+ spin_unlock(&d->arch.paging.lock);
-+ if ( rc )
-+ return rc;
-+
- return 0;
- }
-
-diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
-index c9598740bd02..b2725206e8de 100644
---- a/xen/include/asm-arm/p2m.h
-+++ b/xen/include/asm-arm/p2m.h
-@@ -194,14 +194,18 @@ int p2m_init(struct domain *d);
-
- /*
- * The P2M resources are freed in two parts:
-- * - p2m_teardown() will be called when relinquish the resources. It
-- * will free large resources (e.g. intermediate page-tables) that
-- * requires preemption.
-+ * - p2m_teardown() will be called preemptively when relinquish the
-+ * resources, in which case it will free large resources (e.g. intermediate
-+ * page-tables) that requires preemption.
- * - p2m_final_teardown() will be called when domain struct is been
- * freed. This *cannot* be preempted and therefore one small
- * resources should be freed here.
-+ * Note that p2m_final_teardown() will also call p2m_teardown(), to properly
-+ * free the P2M when failures happen in the domain creation with P2M pages
-+ * already in use. In this case p2m_teardown() is called non-preemptively and
-+ * p2m_teardown() will always return 0.
- */
--int p2m_teardown(struct domain *d);
-+int p2m_teardown(struct domain *d, bool allow_preemption);
- void p2m_final_teardown(struct domain *d);
-
- /*
-@@ -266,6 +270,8 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
- /*
- * Direct set a p2m entry: only for use by the P2M code.
- * The P2M write lock should be taken.
-+ * TODO: Add a check in __p2m_set_entry() to avoid creating a mapping in
-+ * arch_domain_create() that requires p2m_put_l3_page() to be called.
- */
- int p2m_set_entry(struct p2m_domain *p2m,
- gfn_t sgfn,
---
-2.37.4
-
diff --git a/0029-x86-perform-mem_sharing-teardown-before-paging-teard.patch b/0029-x86-perform-mem_sharing-teardown-before-paging-teard.patch
new file mode 100644
index 0000000..c96f44e
--- /dev/null
+++ b/0029-x86-perform-mem_sharing-teardown-before-paging-teard.patch
@@ -0,0 +1,111 @@
+From 700320a79297fb5087f7dd540424c468b2d2cffe Mon Sep 17 00:00:00 2001
+From: Tamas K Lengyel <tamas@tklengyel.com>
+Date: Fri, 3 Mar 2023 08:14:25 +0100
+Subject: [PATCH 29/61] x86: perform mem_sharing teardown before paging
+ teardown
+
+An assert failure has been observed in p2m_teardown when performing vm
+forking and then destroying the forked VM (p2m-basic.c:173). The assert
+checks whether the domain's shared pages counter is 0. According to the
+patch that originally added the assert (7bedbbb5c31), p2m_teardown
+should only happen after mem_sharing has already relinquished all shared
+pages.
+
+In this patch we flip the order in which relinquish ops are called to avoid
+tripping the assert. Conceptually, it also makes sense for sharing to be torn
+down before paging is torn down.
+
+Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
+Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 2869349f0cb3a89dcbf1f1b30371f58df6309312
+master date: 2023-02-23 12:35:48 +0100
+---
+ xen/arch/x86/domain.c | 56 ++++++++++++++++++++++---------------------
+ 1 file changed, 29 insertions(+), 27 deletions(-)
+
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index 3080cde62b..6eeb248908 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -2343,9 +2343,9 @@ int domain_relinquish_resources(struct domain *d)
+
+ enum {
+ PROG_iommu_pagetables = 1,
++ PROG_shared,
+ PROG_paging,
+ PROG_vcpu_pagetables,
+- PROG_shared,
+ PROG_xen,
+ PROG_l4,
+ PROG_l3,
+@@ -2364,6 +2364,34 @@ int domain_relinquish_resources(struct domain *d)
+ if ( ret )
+ return ret;
+
++#ifdef CONFIG_MEM_SHARING
++ PROGRESS(shared):
++
++ if ( is_hvm_domain(d) )
++ {
++ /*
++ * If the domain has shared pages, relinquish them allowing
++ * for preemption.
++ */
++ ret = relinquish_shared_pages(d);
++ if ( ret )
++ return ret;
++
++ /*
++ * If the domain is forked, decrement the parent's pause count
++ * and release the domain.
++ */
++ if ( mem_sharing_is_fork(d) )
++ {
++ struct domain *parent = d->parent;
++
++ d->parent = NULL;
++ domain_unpause(parent);
++ put_domain(parent);
++ }
++ }
++#endif
++
+ PROGRESS(paging):
+
+ /* Tear down paging-assistance stuff. */
+@@ -2404,32 +2432,6 @@ int domain_relinquish_resources(struct domain *d)
+ d->arch.auto_unmask = 0;
+ }
+
+-#ifdef CONFIG_MEM_SHARING
+- PROGRESS(shared):
+-
+- if ( is_hvm_domain(d) )
+- {
+- /* If the domain has shared pages, relinquish them allowing
+- * for preemption. */
+- ret = relinquish_shared_pages(d);
+- if ( ret )
+- return ret;
+-
+- /*
+- * If the domain is forked, decrement the parent's pause count
+- * and release the domain.
+- */
+- if ( mem_sharing_is_fork(d) )
+- {
+- struct domain *parent = d->parent;
+-
+- d->parent = NULL;
+- domain_unpause(parent);
+- put_domain(parent);
+- }
+- }
+-#endif
+-
+ spin_lock(&d->page_alloc_lock);
+ page_list_splice(&d->arch.relmem_list, &d->page_list);
+ INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);
+--
+2.40.0
+
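For readers unfamiliar with the PROGRESS() pattern the hunk above reorders,
here is a stand-alone sketch of a preemptible relinquish sequence; the step
names and types are invented for illustration, only the ordering idea carries
over from the real domain_relinquish_resources().

    /* Illustrative only: the real steps live in domain_relinquish_resources(). */
    enum relinquish_step {
        STEP_iommu_pagetables,
        STEP_shared,          /* now ordered before paging teardown */
        STEP_paging,
        STEP_done,
    };

    struct dom {
        enum relinquish_step progress;   /* survives across preemptions */
    };

    /* Each step returns 0 when finished, or a "retry later" code if preempted. */
    static int do_step(struct dom *d, enum relinquish_step s)
    {
        (void)d; (void)s;
        return 0;
    }

    static int relinquish_resources(struct dom *d)
    {
        /* Resume from wherever the previously preempted call stopped. */
        for ( ; d->progress < STEP_done; d->progress++ )
        {
            int rc = do_step(d, d->progress);

            if ( rc )          /* preempted: the caller re-invokes us later */
                return rc;
        }

        return 0;
    }
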
diff --git a/0029-x86emul-respect-NSCB.patch b/0029-x86emul-respect-NSCB.patch
deleted file mode 100644
index 08785b7..0000000
--- a/0029-x86emul-respect-NSCB.patch
+++ /dev/null
@@ -1,40 +0,0 @@
-From 5dae06578cd5dcc312175b00ed6836a85732438d Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 31 Oct 2022 13:19:35 +0100
-Subject: [PATCH 29/87] x86emul: respect NSCB
-
-protmode_load_seg() would better adhere to that "feature" of clearing
-base (and limit) during NULL selector loads.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 87a20c98d9f0f422727fe9b4b9e22c2c43a5cd9c
-master date: 2022-10-11 14:30:41 +0200
----
- xen/arch/x86/x86_emulate/x86_emulate.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
-index 441086ea861d..847f8f37719f 100644
---- a/xen/arch/x86/x86_emulate/x86_emulate.c
-+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
-@@ -1970,6 +1970,7 @@ amd_like(const struct x86_emulate_ctxt *ctxt)
- #define vcpu_has_tbm() (ctxt->cpuid->extd.tbm)
- #define vcpu_has_clzero() (ctxt->cpuid->extd.clzero)
- #define vcpu_has_wbnoinvd() (ctxt->cpuid->extd.wbnoinvd)
-+#define vcpu_has_nscb() (ctxt->cpuid->extd.nscb)
-
- #define vcpu_has_bmi1() (ctxt->cpuid->feat.bmi1)
- #define vcpu_has_hle() (ctxt->cpuid->feat.hle)
-@@ -2102,7 +2103,7 @@ protmode_load_seg(
- case x86_seg_tr:
- goto raise_exn;
- }
-- if ( !_amd_like(cp) || !ops->read_segment ||
-+ if ( !_amd_like(cp) || vcpu_has_nscb() || !ops->read_segment ||
- ops->read_segment(seg, sreg, ctxt) != X86EMUL_OKAY )
- memset(sreg, 0, sizeof(*sreg));
- else
---
-2.37.4
-
diff --git a/0030-VMX-correct-error-handling-in-vmx_create_vmcs.patch b/0030-VMX-correct-error-handling-in-vmx_create_vmcs.patch
deleted file mode 100644
index e1b618d..0000000
--- a/0030-VMX-correct-error-handling-in-vmx_create_vmcs.patch
+++ /dev/null
@@ -1,38 +0,0 @@
-From 02ab5e97c41d275ccea0910b1d8bce41ed1be5bf Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 31 Oct 2022 13:20:40 +0100
-Subject: [PATCH 30/87] VMX: correct error handling in vmx_create_vmcs()
-
-With the addition of vmx_add_msr() calls to construct_vmcs() there are
-now cases where simply freeing the VMCS isn't enough: The MSR bitmap
-page as well as one of the MSR area ones (if it's the 2nd vmx_add_msr()
-which fails) may also need freeing. Switch to using vmx_destroy_vmcs()
-instead.
-
-Fixes: 3bd36952dab6 ("x86/spec-ctrl: Introduce an option to control L1D_FLUSH for HVM HAP guests")
-Fixes: 53a570b28569 ("x86/spec-ctrl: Support IBPB-on-entry")
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: 448d28309f1a966bdc850aff1a637e0b79a03e43
-master date: 2022-10-12 17:57:56 +0200
----
- xen/arch/x86/hvm/vmx/vmcs.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
-index dd817cee4e69..237b13459d4f 100644
---- a/xen/arch/x86/hvm/vmx/vmcs.c
-+++ b/xen/arch/x86/hvm/vmx/vmcs.c
-@@ -1831,7 +1831,7 @@ int vmx_create_vmcs(struct vcpu *v)
-
- if ( (rc = construct_vmcs(v)) != 0 )
- {
-- vmx_free_vmcs(vmx->vmcs_pa);
-+ vmx_destroy_vmcs(v);
- return rc;
- }
-
---
-2.37.4
-
diff --git a/0030-xen-Work-around-Clang-IAS-macro-expansion-bug.patch b/0030-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
new file mode 100644
index 0000000..a92f2f0
--- /dev/null
+++ b/0030-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
@@ -0,0 +1,115 @@
+From 2b8f72a6b40dafc3fb40bce100cd62c4a377535a Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 3 Mar 2023 08:14:57 +0100
+Subject: [PATCH 30/61] xen: Work around Clang-IAS macro \@ expansion bug
+
+https://github.com/llvm/llvm-project/issues/60792
+
+It turns out that Clang-IAS does not expand \@ uniquely in a translation
+unit, and the XSA-426 change tickles this bug:
+
+ <instantiation>:4:1: error: invalid symbol redefinition
+ .L1_fill_rsb_loop:
+ ^
+ make[3]: *** [Rules.mk:247: arch/x86/acpi/cpu_idle.o] Error 1
+
+Extend DO_OVERWRITE_RSB with an optional parameter so C callers can mix %= in
+too, which Clang does seem to expand properly.
+
+Fixes: 63305e5392ec ("x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: a2adacff0b91cc7b977abb209dc419a2ef15963f
+master date: 2023-02-24 17:44:29 +0000
+---
+ xen/include/asm-x86/spec_ctrl.h | 4 ++--
+ xen/include/asm-x86/spec_ctrl_asm.h | 23 ++++++++++++++---------
+ 2 files changed, 16 insertions(+), 11 deletions(-)
+
+diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+index 391973ef6a..a431fea587 100644
+--- a/xen/include/asm-x86/spec_ctrl.h
++++ b/xen/include/asm-x86/spec_ctrl.h
+@@ -83,7 +83,7 @@ static always_inline void spec_ctrl_new_guest_context(void)
+ wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
+
+ /* (ab)use alternative_input() to specify clobbers. */
+- alternative_input("", "DO_OVERWRITE_RSB", X86_BUG_IBPB_NO_RET,
++ alternative_input("", "DO_OVERWRITE_RSB xu=%=", X86_BUG_IBPB_NO_RET,
+ : "rax", "rcx");
+ }
+
+@@ -172,7 +172,7 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
+ *
+ * (ab)use alternative_input() to specify clobbers.
+ */
+- alternative_input("", "DO_OVERWRITE_RSB", X86_FEATURE_SC_RSB_IDLE,
++ alternative_input("", "DO_OVERWRITE_RSB xu=%=", X86_FEATURE_SC_RSB_IDLE,
+ : "rax", "rcx");
+ }
+
+diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
+index 9eb4ad9ab7..b61a5571ae 100644
+--- a/xen/include/asm-x86/spec_ctrl_asm.h
++++ b/xen/include/asm-x86/spec_ctrl_asm.h
+@@ -117,11 +117,16 @@
+ .L\@_done:
+ .endm
+
+-.macro DO_OVERWRITE_RSB tmp=rax
++.macro DO_OVERWRITE_RSB tmp=rax xu
+ /*
+ * Requires nothing
+ * Clobbers \tmp (%rax by default), %rcx
+ *
++ * xu is an optional parameter to add eXtra Uniqueness. It is intended for
++ * passing %= in from an asm() block, in order to work around
++ * https://github.com/llvm/llvm-project/issues/60792 where Clang-IAS doesn't
++ * expand \@ uniquely.
++ *
+ * Requires 256 bytes of {,shadow}stack space, but %rsp/SSP has no net
+ * change. Based on Google's performance numbers, the loop is unrolled to 16
+ * iterations and two calls per iteration.
+@@ -137,31 +142,31 @@
+ mov $16, %ecx /* 16 iterations, two calls per loop */
+ mov %rsp, %\tmp /* Store the current %rsp */
+
+-.L\@_fill_rsb_loop:
++.L\@_fill_rsb_loop\xu:
+
+ .irp n, 1, 2 /* Unrolled twice. */
+- call .L\@_insert_rsb_entry_\n /* Create an RSB entry. */
++ call .L\@_insert_rsb_entry\xu\n /* Create an RSB entry. */
+
+-.L\@_capture_speculation_\n:
++.L\@_capture_speculation\xu\n:
+ pause
+ lfence
+- jmp .L\@_capture_speculation_\n /* Capture rogue speculation. */
++ jmp .L\@_capture_speculation\xu\n /* Capture rogue speculation. */
+
+-.L\@_insert_rsb_entry_\n:
++.L\@_insert_rsb_entry\xu\n:
+ .endr
+
+ sub $1, %ecx
+- jnz .L\@_fill_rsb_loop
++ jnz .L\@_fill_rsb_loop\xu
+ mov %\tmp, %rsp /* Restore old %rsp */
+
+ #ifdef CONFIG_XEN_SHSTK
+ mov $1, %ecx
+ rdsspd %ecx
+ cmp $1, %ecx
+- je .L\@_shstk_done
++ je .L\@_shstk_done\xu
+ mov $64, %ecx /* 64 * 4 bytes, given incsspd */
+ incsspd %ecx /* Restore old SSP */
+-.L\@_shstk_done:
++.L\@_shstk_done\xu:
+ #endif
+ .endm
+
+--
+2.40.0
+
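As background to the "xu=%=" argument the patch above passes into DO_OVERWRITE_RSB, the following stand-alone sketch (an illustration, not code from the patch queue) shows how GCC and Clang expand %= to a number that is unique per asm() instance, so labels derived from it cannot collide even when the compiler emits the block several times:

    /*
     * Minimal sketch: %= expands to a unique number for each asm() instance,
     * which is what the "xu=%=" parameter above relies on.  The label names
     * here are invented for illustration.
     */
    static inline void unique_label_example(void)
    {
        asm volatile ( ".L_example_start%=:\n\t"
                       "nop\n\t"
                       "jmp .L_example_done%=\n\t"
                       ".L_example_done%=:"
                       ::: "memory" );
    }
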
diff --git a/0031-argo-Remove-reachable-ASSERT_UNREACHABLE.patch b/0031-argo-Remove-reachable-ASSERT_UNREACHABLE.patch
deleted file mode 100644
index e89709d..0000000
--- a/0031-argo-Remove-reachable-ASSERT_UNREACHABLE.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From d4a11d6a22cf73ac7441750e5e8113779348885e Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jandryuk@gmail.com>
-Date: Mon, 31 Oct 2022 13:21:31 +0100
-Subject: [PATCH 31/87] argo: Remove reachable ASSERT_UNREACHABLE
-
-I observed this ASSERT_UNREACHABLE in partner_rings_remove consistently
-trip. It was in OpenXT with the viptables patch applied.
-
-dom10 shuts down.
-dom7 is REJECTED sending to dom10.
-dom7 shuts down and this ASSERT trips for dom10.
-
-The argo_send_info has a domid, but there is no refcount taken on
-the domain. Therefore it's not appropriate to ASSERT that the domain
-can be looked up via domid. Replace with a debug message.
-
-Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
-Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com>
-master commit: 197f612b77c5afe04e60df2100a855370d720ad7
-master date: 2022-10-14 14:45:41 +0100
----
- xen/common/argo.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/common/argo.c b/xen/common/argo.c
-index eaea7ba8885a..80f3275092af 100644
---- a/xen/common/argo.c
-+++ b/xen/common/argo.c
-@@ -1298,7 +1298,8 @@ partner_rings_remove(struct domain *src_d)
- ASSERT_UNREACHABLE();
- }
- else
-- ASSERT_UNREACHABLE();
-+ argo_dprintk("%pd has entry for stale partner d%u\n",
-+ src_d, send_info->id.domain_id);
-
- if ( dst_d )
- rcu_unlock_domain(dst_d);
---
-2.37.4
-
diff --git a/0031-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch b/0031-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
new file mode 100644
index 0000000..bad0316
--- /dev/null
+++ b/0031-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
@@ -0,0 +1,83 @@
+From f073db0a07c5f6800a70c91819c4b8c2ba359451 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 3 Mar 2023 08:15:50 +0100
+Subject: [PATCH 31/61] xen: Fix Clang -Wunicode diagnostic when building
+ asm-macros
+
+While trying to work around a different Clang-IAS bug (parent changeset), I
+stumbled onto:
+
+ In file included from arch/x86/asm-macros.c:3:
+ ./arch/x86/include/asm/spec_ctrl_asm.h:144:19: error: \u used with
+ no following hex digits; treating as '\' followed by identifier [-Werror,-Wunicode]
+ .L\@_fill_rsb_loop\uniq:
+ ^
+
+It turns out that Clang -E is sensitive to the file extension of the source
+file it is processing. Furthermore, C explicitly permits the use of \u
+escapes in identifier names, so the diagnostic would be reasonable in
+principle if we were trying to compile the result.
+
+asm-macros should really have been .S from the outset, as it is ultimately
+generating assembly, not C. Rename it, which causes Clang not to complain.
+
+We need to introduce rules for generating a .i file from .S, and substituting
+c_flags for a_flags lets us drop the now-redundant -D__ASSEMBLY__.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 53f0d02040b1df08f0589f162790ca376e1c2040
+master date: 2023-02-24 17:44:29 +0000
+---
+ xen/Rules.mk | 6 ++++++
+ xen/arch/x86/Makefile | 2 +-
+ xen/arch/x86/{asm-macros.c => asm-macros.S} | 0
+ 3 files changed, 7 insertions(+), 1 deletion(-)
+ rename xen/arch/x86/{asm-macros.c => asm-macros.S} (100%)
+
+diff --git a/xen/Rules.mk b/xen/Rules.mk
+index 5e0699e58b..1f171f88e2 100644
+--- a/xen/Rules.mk
++++ b/xen/Rules.mk
+@@ -223,6 +223,9 @@ $(filter %.init.o,$(obj-y) $(obj-bin-y) $(extra-y)): %.init.o: %.o FORCE
+ quiet_cmd_cpp_i_c = CPP $@
+ cmd_cpp_i_c = $(CPP) $(call cpp_flags,$(c_flags)) -MQ $@ -o $@ $<
+
++quiet_cmd_cpp_i_S = CPP $@
++cmd_cpp_i_S = $(CPP) $(call cpp_flags,$(a_flags)) -MQ $@ -o $@ $<
++
+ quiet_cmd_cc_s_c = CC $@
+ cmd_cc_s_c = $(CC) $(filter-out -Wa$(comma)%,$(c_flags)) -S $< -o $@
+
+@@ -232,6 +235,9 @@ cmd_cpp_s_S = $(CPP) $(call cpp_flags,$(a_flags)) -MQ $@ -o $@ $<
+ %.i: %.c FORCE
+ $(call if_changed,cpp_i_c)
+
++%.i: %.S FORCE
++ $(call if_changed,cpp_i_S)
++
+ %.s: %.c FORCE
+ $(call if_changed,cc_s_c)
+
+diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
+index 69b6cfaded..8e975f472d 100644
+--- a/xen/arch/x86/Makefile
++++ b/xen/arch/x86/Makefile
+@@ -273,7 +273,7 @@ efi/buildid.o efi/relocs-dummy.o: ;
+ .PHONY: include
+ include: $(BASEDIR)/include/asm-x86/asm-macros.h
+
+-asm-macros.i: CFLAGS-y += -D__ASSEMBLY__ -P
++asm-macros.i: CFLAGS-y += -P
+
+ $(BASEDIR)/include/asm-x86/asm-macros.h: asm-macros.i Makefile
+ echo '#if 0' >$@.new
+diff --git a/xen/arch/x86/asm-macros.c b/xen/arch/x86/asm-macros.S
+similarity index 100%
+rename from xen/arch/x86/asm-macros.c
+rename to xen/arch/x86/asm-macros.S
+--
+2.40.0
+
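As context for the -Wunicode diagnostic quoted above: ISO C really does permit universal character names in identifiers, which is why a bare "\u" with no hex digits is something a C front end may reasonably warn about. A tiny program, given here purely as an assumed illustration and not as part of the patch, that GCC and Clang accept in C11 mode:

    #include <stdio.h>

    int main(void)
    {
        int caf\u00e9 = 42;          /* identifier "café" spelled with a UCN */

        printf("%d\n", caf\u00e9);   /* prints 42 */
        return 0;
    }
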
diff --git a/0032-EFI-don-t-convert-memory-marked-for-runtime-use-to-o.patch b/0032-EFI-don-t-convert-memory-marked-for-runtime-use-to-o.patch
deleted file mode 100644
index 33b98df..0000000
--- a/0032-EFI-don-t-convert-memory-marked-for-runtime-use-to-o.patch
+++ /dev/null
@@ -1,64 +0,0 @@
-From 54f8ed80c8308e65c3f57ae6cbd130f43f5ecbbd Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 31 Oct 2022 13:22:17 +0100
-Subject: [PATCH 32/87] EFI: don't convert memory marked for runtime use to
- ordinary RAM
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
-higher priority than the type of the range. To avoid accessing memory at
-runtime which was re-used for other purposes, make
-efi_arch_process_memory_map() follow suit. While in theory the same would
-apply to EfiACPIReclaimMemory, we don't actually "reclaim" or clobber
-that memory (converted to E820_ACPI on x86) there (and it would be a bug
-if the Dom0 kernel tried to reclaim the range, bypassing Xen's memory
-management, plus it would be at least bogus if it clobbered that space),
-hence that type's handling can be left alone.
-
-Fixes: bf6501a62e80 ("x86-64: EFI boot code")
-Fixes: facac0af87ef ("x86-64: EFI runtime code")
-Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-master commit: f324300c8347b6aa6f9c0b18e0a90bbf44011a9a
-master date: 2022-10-21 12:30:24 +0200
----
- xen/arch/arm/efi/efi-boot.h | 3 ++-
- xen/arch/x86/efi/efi-boot.h | 4 +++-
- 2 files changed, 5 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/arm/efi/efi-boot.h b/xen/arch/arm/efi/efi-boot.h
-index 9f267982397b..849071fe5308 100644
---- a/xen/arch/arm/efi/efi-boot.h
-+++ b/xen/arch/arm/efi/efi-boot.h
-@@ -194,7 +194,8 @@ static EFI_STATUS __init efi_process_memory_map_bootinfo(EFI_MEMORY_DESCRIPTOR *
-
- for ( Index = 0; Index < (mmap_size / desc_size); Index++ )
- {
-- if ( desc_ptr->Attribute & EFI_MEMORY_WB &&
-+ if ( !(desc_ptr->Attribute & EFI_MEMORY_RUNTIME) &&
-+ (desc_ptr->Attribute & EFI_MEMORY_WB) &&
- (desc_ptr->Type == EfiConventionalMemory ||
- desc_ptr->Type == EfiLoaderCode ||
- desc_ptr->Type == EfiLoaderData ||
-diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
-index 4ee77fb9bfa2..d99601622310 100644
---- a/xen/arch/x86/efi/efi-boot.h
-+++ b/xen/arch/x86/efi/efi-boot.h
-@@ -185,7 +185,9 @@ static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable,
- /* fall through */
- case EfiLoaderCode:
- case EfiLoaderData:
-- if ( desc->Attribute & EFI_MEMORY_WB )
-+ if ( desc->Attribute & EFI_MEMORY_RUNTIME )
-+ type = E820_RESERVED;
-+ else if ( desc->Attribute & EFI_MEMORY_WB )
- type = E820_RAM;
- else
- case EfiUnusableMemory:
---
-2.37.4
-
diff --git a/0032-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch b/0032-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
new file mode 100644
index 0000000..bfcdd26
--- /dev/null
+++ b/0032-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
@@ -0,0 +1,98 @@
+From a2adc7fcc22405e81dc11290416e6140bb0244ca Mon Sep 17 00:00:00 2001
+From: Bertrand Marquis <bertrand.marquis@arm.com>
+Date: Fri, 3 Mar 2023 08:16:45 +0100
+Subject: [PATCH 32/61] tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG
+ variable
+
+Replace PKG_CONFIG variable name with PKG_CONFIG_FILE for the name of
+the pkg-config file.
+This is preventing a conflict in some build systems where PKG_CONFIG
+actually contains the path to the pkg-config executable to use, as the
+default assignment in libs.mk is using a weak assignment (?=).
+
+This problem has been found when trying to build the latest version of
+Xen tools using buildroot.
+
+Fixes: d400dc5729e4 ("tools: tweak tools/libs/libs.mk for being able to support libxenctrl")
+Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: b97e2fe7b9e1f4706693552697239ac2b71efee4
+master date: 2023-02-24 17:44:29 +0000
+---
+ tools/libs/ctrl/Makefile | 2 +-
+ tools/libs/libs.mk | 13 +++++++------
+ 2 files changed, 8 insertions(+), 7 deletions(-)
+
+diff --git a/tools/libs/ctrl/Makefile b/tools/libs/ctrl/Makefile
+index 6ff5918798..d3666ae7ff 100644
+--- a/tools/libs/ctrl/Makefile
++++ b/tools/libs/ctrl/Makefile
+@@ -47,7 +47,7 @@ CFLAGS += -include $(XEN_ROOT)/tools/config.h
+ CFLAGS-$(CONFIG_Linux) += -D_GNU_SOURCE
+
+ LIBHEADER := xenctrl.h xenctrl_compat.h
+-PKG_CONFIG := xencontrol.pc
++PKG_CONFIG_FILE := xencontrol.pc
+ PKG_CONFIG_NAME := Xencontrol
+
+ NO_HEADERS_CHK := y
+diff --git a/tools/libs/libs.mk b/tools/libs/libs.mk
+index f1554462fb..0e005218e2 100644
+--- a/tools/libs/libs.mk
++++ b/tools/libs/libs.mk
+@@ -1,7 +1,7 @@
+ # Common Makefile for building a lib.
+ #
+ # Variables taken as input:
+-# PKG_CONFIG: name of pkg-config file (xen$(LIBNAME).pc if empty)
++# PKG_CONFIG_FILE: name of pkg-config file (xen$(LIBNAME).pc if empty)
+ # MAJOR: major version of lib (Xen version if empty)
+ # MINOR: minor version of lib (0 if empty)
+
+@@ -29,7 +29,8 @@ endif
+ comma:= ,
+ empty:=
+ space:= $(empty) $(empty)
+-PKG_CONFIG ?= $(LIB_FILE_NAME).pc
++
++PKG_CONFIG_FILE ?= $(LIB_FILE_NAME).pc
+ PKG_CONFIG_NAME ?= Xen$(LIBNAME)
+ PKG_CONFIG_DESC ?= The $(PKG_CONFIG_NAME) library for Xen hypervisor
+ PKG_CONFIG_VERSION := $(MAJOR).$(MINOR)
+@@ -38,13 +39,13 @@ PKG_CONFIG_LIB := $(LIB_FILE_NAME)
+ PKG_CONFIG_REQPRIV := $(subst $(space),$(comma),$(strip $(foreach lib,$(patsubst ctrl,control,$(USELIBS_$(LIBNAME))),xen$(lib))))
+
+ ifneq ($(CONFIG_LIBXC_MINIOS),y)
+-PKG_CONFIG_INST := $(PKG_CONFIG)
++PKG_CONFIG_INST := $(PKG_CONFIG_FILE)
+ $(PKG_CONFIG_INST): PKG_CONFIG_PREFIX = $(prefix)
+ $(PKG_CONFIG_INST): PKG_CONFIG_INCDIR = $(includedir)
+ $(PKG_CONFIG_INST): PKG_CONFIG_LIBDIR = $(libdir)
+ endif
+
+-PKG_CONFIG_LOCAL := $(PKG_CONFIG_DIR)/$(PKG_CONFIG)
++PKG_CONFIG_LOCAL := $(PKG_CONFIG_DIR)/$(PKG_CONFIG_FILE)
+
+ LIBHEADER ?= $(LIB_FILE_NAME).h
+ LIBHEADERS = $(foreach h, $(LIBHEADER), $(XEN_INCLUDE)/$(h))
+@@ -114,7 +115,7 @@ install: build
+ $(SYMLINK_SHLIB) lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR) $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR)
+ $(SYMLINK_SHLIB) lib$(LIB_FILE_NAME).so.$(MAJOR) $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so
+ for i in $(LIBHEADERS); do $(INSTALL_DATA) $$i $(DESTDIR)$(includedir); done
+- $(INSTALL_DATA) $(PKG_CONFIG) $(DESTDIR)$(PKG_INSTALLDIR)
++ $(INSTALL_DATA) $(PKG_CONFIG_FILE) $(DESTDIR)$(PKG_INSTALLDIR)
+
+ .PHONY: uninstall
+ uninstall:
+@@ -134,7 +135,7 @@ clean:
+ rm -rf *.rpm $(LIB) *~ $(DEPS_RM) $(LIB_OBJS) $(PIC_OBJS)
+ rm -f lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR) lib$(LIB_FILE_NAME).so.$(MAJOR)
+ rm -f headers.chk headers.lst
+- rm -f $(PKG_CONFIG)
++ rm -f $(PKG_CONFIG_FILE)
+ rm -f _paths.h
+
+ .PHONY: distclean
+--
+2.40.0
+
diff --git a/0033-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch b/0033-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
new file mode 100644
index 0000000..5caa850
--- /dev/null
+++ b/0033-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
@@ -0,0 +1,65 @@
+From b181a3a5532574d2163408284bcd785ec87fe046 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 3 Mar 2023 08:17:04 +0100
+Subject: [PATCH 33/61] libs/guest: Fix resource leaks in
+ xc_core_arch_map_p2m_tree_rw()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Edwin, with the help of GCC's -fanalyzer, identified that p2m_frame_list_list
+gets leaked. What fanalyzer can't see is that the live_p2m_frame_list_list
+and live_p2m_frame_list foreign mappings are leaked too.
+
+Rework the logic so the out path is executed unconditionally, which cleans up
+all the intermediate allocations/mappings appropriately.
+
+Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table")
+Reported-by: Edwin Török <edwin.torok@cloud.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: 1868d7f22660c8980bd0a7e53f044467e8b63bb5
+master date: 2023-02-27 15:51:23 +0000
+---
+ tools/libs/guest/xg_core_x86.c | 8 +++-----
+ 1 file changed, 3 insertions(+), 5 deletions(-)
+
+diff --git a/tools/libs/guest/xg_core_x86.c b/tools/libs/guest/xg_core_x86.c
+index 61106b98b8..c5e4542ccc 100644
+--- a/tools/libs/guest/xg_core_x86.c
++++ b/tools/libs/guest/xg_core_x86.c
+@@ -229,11 +229,11 @@ xc_core_arch_map_p2m_tree_rw(xc_interface *xch, struct domain_info_context *dinf
+ uint32_t dom, shared_info_any_t *live_shinfo)
+ {
+ /* Double and single indirect references to the live P2M table */
+- xen_pfn_t *live_p2m_frame_list_list;
++ xen_pfn_t *live_p2m_frame_list_list = NULL;
+ xen_pfn_t *live_p2m_frame_list = NULL;
+ /* Copies of the above. */
+ xen_pfn_t *p2m_frame_list_list = NULL;
+- xen_pfn_t *p2m_frame_list;
++ xen_pfn_t *p2m_frame_list = NULL;
+
+ int err;
+ int i;
+@@ -297,8 +297,6 @@ xc_core_arch_map_p2m_tree_rw(xc_interface *xch, struct domain_info_context *dinf
+
+ dinfo->p2m_frames = P2M_FL_ENTRIES;
+
+- return p2m_frame_list;
+-
+ out:
+ err = errno;
+
+@@ -312,7 +310,7 @@ xc_core_arch_map_p2m_tree_rw(xc_interface *xch, struct domain_info_context *dinf
+
+ errno = err;
+
+- return NULL;
++ return p2m_frame_list;
+ }
+
+ static int
+--
+2.40.0
+
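The rework above is an instance of the common single-exit-path idiom: initialise every resource pointer to NULL, let success and failure both fall through the same label, and release whatever is non-NULL there. A generic sketch under those assumptions (names are illustrative, not libxenguest API):

    #include <stdlib.h>
    #include <string.h>

    static char *build_buffer(size_t len)
    {
        char *tmp = NULL, *result = NULL;

        tmp = malloc(len);
        if ( !tmp )
            goto out;

        result = malloc(len);
        if ( !result )
            goto out;

        memset(tmp, 0, len);
        memcpy(result, tmp, len);

     out:
        free(tmp);          /* intermediate allocation released on all paths */
        return result;      /* NULL on failure, the buffer on success */
    }
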
diff --git a/0033-xen-sched-fix-race-in-RTDS-scheduler.patch b/0033-xen-sched-fix-race-in-RTDS-scheduler.patch
deleted file mode 100644
index 93ee04b..0000000
--- a/0033-xen-sched-fix-race-in-RTDS-scheduler.patch
+++ /dev/null
@@ -1,42 +0,0 @@
-From 481465f35da1bcec0b2a4dfd6fc51d86cac28547 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Mon, 31 Oct 2022 13:22:54 +0100
-Subject: [PATCH 33/87] xen/sched: fix race in RTDS scheduler
-
-When a domain gets paused the unit runnable state can change to "not
-runnable" without the scheduling lock being involved. This means that
-a specific scheduler isn't involved in this change of runnable state.
-
-In the RTDS scheduler this can result in an inconsistency in case a
-unit is losing its "runnable" capability while the RTDS scheduler's
-scheduling function is active. RTDS will remove the unit from the run
-queue, but doesn't do so for the replenish queue, leading to hitting
-an ASSERT() in replq_insert() later when the domain is unpaused again.
-
-Fix that by removing the unit from the replenish queue as well in this
-case.
-
-Fixes: 7c7b407e7772 ("xen/sched: introduce unit_runnable_state()")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Dario Faggioli <dfaggioli@suse.com>
-master commit: 73c62927f64ecb48f27d06176befdf76b879f340
-master date: 2022-10-21 12:32:23 +0200
----
- xen/common/sched/rt.c | 1 +
- 1 file changed, 1 insertion(+)
-
-diff --git a/xen/common/sched/rt.c b/xen/common/sched/rt.c
-index c24cd2ac3200..ec2ca1bebc26 100644
---- a/xen/common/sched/rt.c
-+++ b/xen/common/sched/rt.c
-@@ -1087,6 +1087,7 @@ rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
- else if ( !unit_runnable_state(snext->unit) )
- {
- q_remove(snext);
-+ replq_remove(ops, snext);
- snext = rt_unit(sched_idle_unit(sched_cpu));
- }
-
---
-2.37.4
-
diff --git a/0034-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch b/0034-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
new file mode 100644
index 0000000..4be16a3
--- /dev/null
+++ b/0034-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
@@ -0,0 +1,56 @@
+From 25d103f2eb59f021cce61f07a0bf0bfa696b4416 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
+Date: Fri, 3 Mar 2023 08:17:23 +0100
+Subject: [PATCH 34/61] libs/guest: Fix leak on realloc failure in
+ backup_ptes()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From `man 2 realloc`:
+
+ If realloc() fails, the original block is left untouched; it is not freed or moved.
+
+Found using GCC -fanalyzer:
+
+ | 184 | backup->entries = realloc(backup->entries,
+ | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ | | | | |
+ | | | | (91) when ‘realloc’ fails
+ | | | (92) ‘old_ptes.entries’ leaks here; was allocated at (44)
+ | | (90) ...to here
+
+Signed-off-by: Edwin Török <edwin.torok@cloud.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 275d13184cfa52ebe4336ed66526ce93716adbe0
+master date: 2023-02-27 15:51:23 +0000
+---
+ tools/libs/guest/xg_offline_page.c | 10 ++++++++--
+ 1 file changed, 8 insertions(+), 2 deletions(-)
+
+diff --git a/tools/libs/guest/xg_offline_page.c b/tools/libs/guest/xg_offline_page.c
+index cfe0e2d537..c42b973363 100644
+--- a/tools/libs/guest/xg_offline_page.c
++++ b/tools/libs/guest/xg_offline_page.c
+@@ -181,10 +181,16 @@ static int backup_ptes(xen_pfn_t table_mfn, int offset,
+
+ if (backup->max == backup->cur)
+ {
+- backup->entries = realloc(backup->entries,
+- backup->max * 2 * sizeof(struct pte_backup_entry));
++ void *orig = backup->entries;
++
++ backup->entries = realloc(
++ orig, backup->max * 2 * sizeof(struct pte_backup_entry));
++
+ if (backup->entries == NULL)
++ {
++ free(orig);
+ return -1;
++ }
+ else
+ backup->max *= 2;
+ }
+--
+2.40.0
+
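The fix above follows the usual safe-realloc idiom: keep the original pointer so it can still be freed (or reused) when realloc() fails, instead of overwriting the only reference to it. A hedged, generic sketch of the same pattern, not the libxenguest code itself:

    #include <stdlib.h>

    static int grow_array(int **entries, size_t *max)
    {
        void *orig = *entries;
        void *bigger = realloc(orig, *max * 2 * sizeof(**entries));

        if ( !bigger )
        {
            free(orig);          /* or keep it, if the caller can continue */
            *entries = NULL;
            return -1;
        }

        *entries = bigger;
        *max *= 2;
        return 0;
    }
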
diff --git a/0034-xen-sched-fix-restore_vcpu_affinity-by-removing-it.patch b/0034-xen-sched-fix-restore_vcpu_affinity-by-removing-it.patch
deleted file mode 100644
index eecec07..0000000
--- a/0034-xen-sched-fix-restore_vcpu_affinity-by-removing-it.patch
+++ /dev/null
@@ -1,158 +0,0 @@
-From 88f2bf5de9ad789e1c61b5d5ecf118909eed6917 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Mon, 31 Oct 2022 13:23:50 +0100
-Subject: [PATCH 34/87] xen/sched: fix restore_vcpu_affinity() by removing it
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-When the system is coming up after having been suspended,
-restore_vcpu_affinity() is called for each domain in order to adjust
-the vcpu's affinity settings in case a cpu didn't come to live again.
-
-The way restore_vcpu_affinity() is doing that is wrong, because the
-specific scheduler isn't being informed about a possible migration of
-the vcpu to another cpu. Additionally the migration is often even
-happening if all cpus are running again, as it is done without check
-whether it is really needed.
-
-As cpupool management is already calling cpu_disable_scheduler() for
-cpus not having come up again, and cpu_disable_scheduler() is taking
-care of eventually needed vcpu migration in the proper way, there is
-simply no need for restore_vcpu_affinity().
-
-So just remove restore_vcpu_affinity() completely, together with the
-no longer used sched_reset_affinity_broken().
-
-Fixes: 8a04eaa8ea83 ("xen/sched: move some per-vcpu items to struct sched_unit")
-Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Dario Faggioli <dfaggioli@suse.com>
-Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-master commit: fce1f381f7388daaa3e96dbb0d67d7a3e4bb2d2d
-master date: 2022-10-24 11:16:27 +0100
----
- xen/arch/x86/acpi/power.c | 3 --
- xen/common/sched/core.c | 78 ---------------------------------------
- xen/include/xen/sched.h | 1 -
- 3 files changed, 82 deletions(-)
-
-diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
-index dd397f713067..1a7baeebe6d0 100644
---- a/xen/arch/x86/acpi/power.c
-+++ b/xen/arch/x86/acpi/power.c
-@@ -159,10 +159,7 @@ static void thaw_domains(void)
-
- rcu_read_lock(&domlist_read_lock);
- for_each_domain ( d )
-- {
-- restore_vcpu_affinity(d);
- domain_unpause(d);
-- }
- rcu_read_unlock(&domlist_read_lock);
- }
-
-diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
-index 900aab8f66a7..9173cf690c72 100644
---- a/xen/common/sched/core.c
-+++ b/xen/common/sched/core.c
-@@ -1188,84 +1188,6 @@ static bool sched_check_affinity_broken(const struct sched_unit *unit)
- return false;
- }
-
--static void sched_reset_affinity_broken(const struct sched_unit *unit)
--{
-- struct vcpu *v;
--
-- for_each_sched_unit_vcpu ( unit, v )
-- v->affinity_broken = false;
--}
--
--void restore_vcpu_affinity(struct domain *d)
--{
-- unsigned int cpu = smp_processor_id();
-- struct sched_unit *unit;
--
-- ASSERT(system_state == SYS_STATE_resume);
--
-- rcu_read_lock(&sched_res_rculock);
--
-- for_each_sched_unit ( d, unit )
-- {
-- spinlock_t *lock;
-- unsigned int old_cpu = sched_unit_master(unit);
-- struct sched_resource *res;
--
-- ASSERT(!unit_runnable(unit));
--
-- /*
-- * Re-assign the initial processor as after resume we have no
-- * guarantee the old processor has come back to life again.
-- *
-- * Therefore, here, before actually unpausing the domains, we should
-- * set v->processor of each of their vCPUs to something that will
-- * make sense for the scheduler of the cpupool in which they are in.
-- */
-- lock = unit_schedule_lock_irq(unit);
--
-- cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-- cpupool_domain_master_cpumask(d));
-- if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
-- {
-- if ( sched_check_affinity_broken(unit) )
-- {
-- sched_set_affinity(unit, unit->cpu_hard_affinity_saved, NULL);
-- sched_reset_affinity_broken(unit);
-- cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-- cpupool_domain_master_cpumask(d));
-- }
--
-- if ( cpumask_empty(cpumask_scratch_cpu(cpu)) )
-- {
-- /* Affinity settings of one vcpu are for the complete unit. */
-- printk(XENLOG_DEBUG "Breaking affinity for %pv\n",
-- unit->vcpu_list);
-- sched_set_affinity(unit, &cpumask_all, NULL);
-- cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity,
-- cpupool_domain_master_cpumask(d));
-- }
-- }
--
-- res = get_sched_res(cpumask_any(cpumask_scratch_cpu(cpu)));
-- sched_set_res(unit, res);
--
-- spin_unlock_irq(lock);
--
-- /* v->processor might have changed, so reacquire the lock. */
-- lock = unit_schedule_lock_irq(unit);
-- res = sched_pick_resource(unit_scheduler(unit), unit);
-- sched_set_res(unit, res);
-- spin_unlock_irq(lock);
--
-- if ( old_cpu != sched_unit_master(unit) )
-- sched_move_irqs(unit);
-- }
--
-- rcu_read_unlock(&sched_res_rculock);
--
-- domain_update_node_affinity(d);
--}
--
- /*
- * This function is used by cpu_hotplug code via cpu notifier chain
- * and from cpupools to switch schedulers on a cpu.
-diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
-index 3f4225738a40..1a1fab5239ec 100644
---- a/xen/include/xen/sched.h
-+++ b/xen/include/xen/sched.h
-@@ -999,7 +999,6 @@ void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value);
- void sched_setup_dom0_vcpus(struct domain *d);
- int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu, uint8_t reason);
- int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity);
--void restore_vcpu_affinity(struct domain *d);
- int vcpu_affinity_domctl(struct domain *d, uint32_t cmd,
- struct xen_domctl_vcpuaffinity *vcpuaff);
-
---
-2.37.4
-
diff --git a/0035-x86-shadow-drop-replace-bogus-assertions.patch b/0035-x86-shadow-drop-replace-bogus-assertions.patch
deleted file mode 100644
index 55e9f62..0000000
--- a/0035-x86-shadow-drop-replace-bogus-assertions.patch
+++ /dev/null
@@ -1,71 +0,0 @@
-From 9fdb4f17656f74b35af0882b558e44832ff00b5f Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 31 Oct 2022 13:24:33 +0100
-Subject: [PATCH 35/87] x86/shadow: drop (replace) bogus assertions
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The addition of a call to shadow_blow_tables() from shadow_teardown()
-has resulted in the "no vcpus" related assertion becoming triggerable:
-If domain_create() fails with at least one page successfully allocated
-in the course of shadow_enable(), or if domain_create() succeeds and
-the domain is then killed without ever invoking XEN_DOMCTL_max_vcpus.
-Note that in-tree tests (test-resource and test-tsx) do exactly the
-latter of these two.
-
-The assertion's comment was bogus anyway: Shadow mode has been getting
-enabled before allocation of vCPU-s for quite some time. Convert the
-assertion to a conditional: As long as there are no vCPU-s, there's
-nothing to blow away.
-
-Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-
-A similar assertion/comment pair exists in _shadow_prealloc(); the
-comment is similarly bogus, and the assertion could in principle trigger
-e.g. when shadow_alloc_p2m_page() is called early enough. Replace those
-at the same time by a similar early return, here indicating failure to
-the caller (which will generally lead to the domain being crashed in
-shadow_prealloc()).
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: a92dc2bb30ba65ae25d2f417677eb7ef9a6a0fef
-master date: 2022-10-24 15:46:11 +0200
----
- xen/arch/x86/mm/shadow/common.c | 10 ++++++----
- 1 file changed, 6 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 3b0d781991b5..1de0139742f7 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -943,8 +943,9 @@ static bool __must_check _shadow_prealloc(struct domain *d, unsigned int pages)
- /* No reclaim when the domain is dying, teardown will take care of it. */
- return false;
-
-- /* Shouldn't have enabled shadows if we've no vcpus. */
-- ASSERT(d->vcpu && d->vcpu[0]);
-+ /* Nothing to reclaim when there are no vcpus yet. */
-+ if ( !d->vcpu[0] )
-+ return false;
-
- /* Stage one: walk the list of pinned pages, unpinning them */
- perfc_incr(shadow_prealloc_1);
-@@ -1034,8 +1035,9 @@ void shadow_blow_tables(struct domain *d)
- mfn_t smfn;
- int i;
-
-- /* Shouldn't have enabled shadows if we've no vcpus. */
-- ASSERT(d->vcpu && d->vcpu[0]);
-+ /* Nothing to do when there are no vcpus yet. */
-+ if ( !d->vcpu[0] )
-+ return;
-
- /* Pass one: unpin all pinned pages */
- foreach_pinned_shadow(d, sp, t)
---
-2.37.4
-
diff --git a/0035-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch b/0035-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
new file mode 100644
index 0000000..931d93f
--- /dev/null
+++ b/0035-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
@@ -0,0 +1,90 @@
+From 84dfe7a56f04a7412fa4869b3e756c49e1cfbe75 Mon Sep 17 00:00:00 2001
+From: Sergey Dyasli <sergey.dyasli@citrix.com>
+Date: Fri, 3 Mar 2023 08:17:40 +0100
+Subject: [PATCH 35/61] x86/ucode/AMD: late load the patch on every logical
+ thread
+
+Currently late ucode loading is performed only on the first core of CPU
+siblings. But according to the latest recommendation from AMD, late
+ucode loading should happen on every logical thread/core on AMD CPUs.
+
+To achieve that, introduce is_cpu_primary() helper which will consider
+every logical cpu as "primary" when running on AMD CPUs. Also include
+Hygon in the check for future-proofing.
+
+Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: f1315e48a03a42f78f9b03c0a384165baf02acae
+master date: 2023-02-28 14:51:28 +0100
+---
+ xen/arch/x86/cpu/microcode/core.c | 24 +++++++++++++++++++-----
+ 1 file changed, 19 insertions(+), 5 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
+index ceec1f1edc..ee7df9a591 100644
+--- a/xen/arch/x86/cpu/microcode/core.c
++++ b/xen/arch/x86/cpu/microcode/core.c
+@@ -273,6 +273,20 @@ static bool microcode_update_cache(struct microcode_patch *patch)
+ return true;
+ }
+
++/* Returns true if ucode should be loaded on a given cpu */
++static bool is_cpu_primary(unsigned int cpu)
++{
++ if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
++ /* Load ucode on every logical thread/core */
++ return true;
++
++ /* Intel CPUs should load ucode only on the first core of SMT siblings */
++ if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
++ return true;
++
++ return false;
++}
++
+ /* Wait for a condition to be met with a timeout (us). */
+ static int wait_for_condition(bool (*func)(unsigned int data),
+ unsigned int data, unsigned int timeout)
+@@ -378,7 +392,7 @@ static int primary_thread_work(const struct microcode_patch *patch)
+
+ static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+ {
+- unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
++ bool primary_cpu = is_cpu_primary(cpu);
+ int ret;
+
+ /* System-generated NMI, leave to main handler */
+@@ -391,10 +405,10 @@ static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+ * ucode_in_nmi.
+ */
+ if ( cpu == cpumask_first(&cpu_online_map) ||
+- (!ucode_in_nmi && cpu == primary) )
++ (!ucode_in_nmi && primary_cpu) )
+ return 0;
+
+- if ( cpu == primary )
++ if ( primary_cpu )
+ ret = primary_thread_work(nmi_patch);
+ else
+ ret = secondary_nmi_work();
+@@ -545,7 +559,7 @@ static int do_microcode_update(void *patch)
+ */
+ if ( cpu == cpumask_first(&cpu_online_map) )
+ ret = control_thread_fn(patch);
+- else if ( cpu == cpumask_first(this_cpu(cpu_sibling_mask)) )
++ else if ( is_cpu_primary(cpu) )
+ ret = primary_thread_fn(patch);
+ else
+ ret = secondary_thread_fn();
+@@ -637,7 +651,7 @@ static long microcode_update_helper(void *data)
+ /* Calculate the number of online CPU core */
+ nr_cores = 0;
+ for_each_online_cpu(cpu)
+- if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
++ if ( is_cpu_primary(cpu) )
+ nr_cores++;
+
+ printk(XENLOG_INFO "%u cores are to update their microcode\n", nr_cores);
+--
+2.40.0
+
diff --git a/0036-vpci-don-t-assume-that-vpci-per-device-data-exists-u.patch b/0036-vpci-don-t-assume-that-vpci-per-device-data-exists-u.patch
deleted file mode 100644
index ab8f792..0000000
--- a/0036-vpci-don-t-assume-that-vpci-per-device-data-exists-u.patch
+++ /dev/null
@@ -1,60 +0,0 @@
-From 96d26f11f56e83b98ec184f4e0d17161efe3a927 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 31 Oct 2022 13:25:13 +0100
-Subject: [PATCH 36/87] vpci: don't assume that vpci per-device data exists
- unconditionally
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-It's possible for a device to be assigned to a domain but have no
-vpci structure if vpci_process_pending() failed and called
-vpci_remove_device() as a result. The unconditional accesses done by
-vpci_{read,write}() and vpci_remove_device() to pdev->vpci would
-then trigger a NULL pointer dereference.
-
-Add checks for pdev->vpci presence in the affected functions.
-
-Fixes: 9c244fdef7 ('vpci: add header handlers')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 6ccb5e308ceeb895fbccd87a528a8bd24325aa39
-master date: 2022-10-26 14:55:30 +0200
----
- xen/drivers/vpci/vpci.c | 6 +++---
- 1 file changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
-index dfc8136ffb95..53d78d53911d 100644
---- a/xen/drivers/vpci/vpci.c
-+++ b/xen/drivers/vpci/vpci.c
-@@ -37,7 +37,7 @@ extern vpci_register_init_t *const __end_vpci_array[];
-
- void vpci_remove_device(struct pci_dev *pdev)
- {
-- if ( !has_vpci(pdev->domain) )
-+ if ( !has_vpci(pdev->domain) || !pdev->vpci )
- return;
-
- spin_lock(&pdev->vpci->lock);
-@@ -326,7 +326,7 @@ uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg, unsigned int size)
-
- /* Find the PCI dev matching the address. */
- pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
-- if ( !pdev )
-+ if ( !pdev || !pdev->vpci )
- return vpci_read_hw(sbdf, reg, size);
-
- spin_lock(&pdev->vpci->lock);
-@@ -436,7 +436,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
- * Passthrough everything that's not trapped.
- */
- pdev = pci_get_pdev_by_domain(d, sbdf.seg, sbdf.bus, sbdf.devfn);
-- if ( !pdev )
-+ if ( !pdev || !pdev->vpci )
- {
- vpci_write_hw(sbdf, reg, size, data);
- return;
---
-2.37.4
-
diff --git a/0036-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch b/0036-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
new file mode 100644
index 0000000..38629a4
--- /dev/null
+++ b/0036-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
@@ -0,0 +1,92 @@
+From b0d6684ee58f7252940f5a62e4b85bdc56307eef Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 Mar 2023 11:59:44 +0000
+Subject: [PATCH 36/61] x86/shadow: account for log-dirty mode when
+ pre-allocating
+
+Pre-allocation is intended to ensure that in the course of constructing
+or updating shadows there won't be any risk of just made shadows or
+shadows being acted upon can disappear under our feet. The amount of
+pages pre-allocated then, however, needs to account for all possible
+subsequent allocations. While the use in sh_page_fault() accounts for
+all shadows which may need making, so far it didn't account for
+allocations coming from log-dirty tracking (which piggybacks onto the
+P2M allocation functions).
+
+Since shadow_prealloc() takes a count of shadows (or other data
+structures) rather than a count of pages, putting the adjustment at the
+call site of this function won't work very well: We simply can't express
+the correct count that way in all cases. Instead take care of this in
+the function itself, by "snooping" for L1 type requests. (While not
+applicable right now, future new request sites of L1 tables would then
+also be covered right away.)
+
+It is relevant to note here that pre-allocations like the one done from
+shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
+earlier pre-alloc which already included that count: The inner call will
+simply find enough pages available then; it'll bail right away.
+
+This is CVE-2022-42332 / XSA-427.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Tim Deegan <tim@xen.org>
+(cherry picked from commit 91767a71061035ae42be93de495cd976f863a41a)
+---
+ xen/arch/x86/mm/paging.c | 1 +
+ xen/arch/x86/mm/shadow/common.c | 12 +++++++++++-
+ xen/include/asm-x86/paging.h | 4 ++++
+ 3 files changed, 16 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
+index 97ac9ccf59..9fb66e65cd 100644
+--- a/xen/arch/x86/mm/paging.c
++++ b/xen/arch/x86/mm/paging.c
+@@ -280,6 +280,7 @@ void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+ if ( unlikely(!VALID_M2P(pfn_x(pfn))) )
+ return;
+
++ BUILD_BUG_ON(paging_logdirty_levels() != 4);
+ i1 = L1_LOGDIRTY_IDX(pfn);
+ i2 = L2_LOGDIRTY_IDX(pfn);
+ i3 = L3_LOGDIRTY_IDX(pfn);
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index 1de0139742..c14a269935 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -1015,7 +1015,17 @@ bool shadow_prealloc(struct domain *d, unsigned int type, unsigned int count)
+ if ( unlikely(d->is_dying) )
+ return false;
+
+- ret = _shadow_prealloc(d, shadow_size(type) * count);
++ count *= shadow_size(type);
++ /*
++ * Log-dirty handling may result in allocations when populating its
++ * tracking structures. Tie this to the caller requesting space for L1
++ * shadows.
++ */
++ if ( paging_mode_log_dirty(d) &&
++ ((SHF_L1_ANY | SHF_FL1_ANY) & (1u << type)) )
++ count += paging_logdirty_levels();
++
++ ret = _shadow_prealloc(d, count);
+ if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
+ /*
+ * Failing to allocate memory required for shadow usage can only result in
+diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
+index 27890791d8..c6b429c691 100644
+--- a/xen/include/asm-x86/paging.h
++++ b/xen/include/asm-x86/paging.h
+@@ -192,6 +192,10 @@ int paging_mfn_is_dirty(struct domain *d, mfn_t gmfn);
+ #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
+ (LOGDIRTY_NODE_ENTRIES-1))
+
++#define paging_logdirty_levels() \
++ (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
++ PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
++
+ #ifdef CONFIG_HVM
+ /* VRAM dirty tracking support */
+ struct sh_dirty_vram {
+--
+2.40.0
+
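To see where the BUILD_BUG_ON(paging_logdirty_levels() != 4) above comes from, the macro can be evaluated with the usual x86-64 values (assumed here: PADDR_BITS = 52, PAGE_SHIFT = 12, sizeof(mfn_t) = 8, and the +3 accounting for the 8 bits per byte of the dirty bitmap):

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
        unsigned int paddr_bits = 52, page_shift = 12, mfn_order = 3;
        unsigned int levels =
            DIV_ROUND_UP(paddr_bits - page_shift - (page_shift + 3),
                         page_shift - mfn_order) + 1;  /* (52-12-15)/9 -> 3, +1 */

        printf("log-dirty levels = %u\n", levels);     /* prints 4 */
        return 0;
    }
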
diff --git a/0037-vpci-msix-remove-from-table-list-on-detach.patch b/0037-vpci-msix-remove-from-table-list-on-detach.patch
deleted file mode 100644
index 2bae0a2..0000000
--- a/0037-vpci-msix-remove-from-table-list-on-detach.patch
+++ /dev/null
@@ -1,47 +0,0 @@
-From 8f3f8f20de5cea704671d4ca83f2dceb93ab98d8 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 31 Oct 2022 13:25:40 +0100
-Subject: [PATCH 37/87] vpci/msix: remove from table list on detach
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Teardown of MSIX vPCI related data doesn't currently remove the MSIX
-device data from the list of MSIX tables handled by the domain,
-leading to a use-after-free of the data in the msix structure.
-
-Remove the structure from the list before freeing in order to solve
-it.
-
-Reported-by: Jan Beulich <jbeulich@suse.com>
-Fixes: d6281be9d0 ('vpci/msix: add MSI-X handlers')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: c14aea137eab29eb9c30bfad745a00c65ad21066
-master date: 2022-10-26 14:56:58 +0200
----
- xen/drivers/vpci/vpci.c | 8 ++++++--
- 1 file changed, 6 insertions(+), 2 deletions(-)
-
-diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
-index 53d78d53911d..b9339f8f3e43 100644
---- a/xen/drivers/vpci/vpci.c
-+++ b/xen/drivers/vpci/vpci.c
-@@ -51,8 +51,12 @@ void vpci_remove_device(struct pci_dev *pdev)
- xfree(r);
- }
- spin_unlock(&pdev->vpci->lock);
-- if ( pdev->vpci->msix && pdev->vpci->msix->pba )
-- iounmap(pdev->vpci->msix->pba);
-+ if ( pdev->vpci->msix )
-+ {
-+ list_del(&pdev->vpci->msix->next);
-+ if ( pdev->vpci->msix->pba )
-+ iounmap(pdev->vpci->msix->pba);
-+ }
- xfree(pdev->vpci->msix);
- xfree(pdev->vpci->msi);
- xfree(pdev->vpci);
---
-2.37.4
-
diff --git a/0037-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch b/0037-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
new file mode 100644
index 0000000..6730b2d
--- /dev/null
+++ b/0037-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
@@ -0,0 +1,50 @@
+From 2fe1517a00e088f6b1f1aff7d4ea1b477b288987 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 Mar 2023 12:01:01 +0000
+Subject: [PATCH 37/61] x86/HVM: bound number of pinned cache attribute regions
+
+This is exposed via DMOP, i.e. to potentially not fully privileged
+device models. With that we may not permit registration of an (almost)
+unbounded amount of such regions.
+
+This is CVE-2022-42333 / part of XSA-428.
+
+Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit a5e768640f786b681063f4e08af45d0c4e91debf)
+---
+ xen/arch/x86/hvm/mtrr.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
+index 4a9f3177ed..98e55bbdbd 100644
+--- a/xen/arch/x86/hvm/mtrr.c
++++ b/xen/arch/x86/hvm/mtrr.c
+@@ -595,6 +595,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
+ uint64_t gfn_end, uint32_t type)
+ {
+ struct hvm_mem_pinned_cacheattr_range *range;
++ unsigned int nr = 0;
+ int rc = 1;
+
+ if ( !is_hvm_domain(d) )
+@@ -666,11 +667,15 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
+ rc = -EBUSY;
+ break;
+ }
++ ++nr;
+ }
+ rcu_read_unlock(&pinned_cacheattr_rcu_lock);
+ if ( rc <= 0 )
+ return rc;
+
++ if ( nr >= 64 /* The limit is arbitrary. */ )
++ return -ENOSPC;
++
+ range = xzalloc(struct hvm_mem_pinned_cacheattr_range);
+ if ( range == NULL )
+ return -ENOMEM;
+--
+2.40.0
+
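The pattern used above, counting existing entries while walking the list and refusing to grow past an arbitrary cap, generalises to any list that a less privileged component can extend. A small illustrative sketch with invented names and constants:

    #include <errno.h>
    #include <stddef.h>

    struct range { struct range *next; unsigned long start, end; };

    #define MAX_RANGES 64   /* the limit is arbitrary, as in the patch */

    static int add_range(struct range **list, struct range *newr)
    {
        unsigned int nr = 0;
        struct range *r;

        for ( r = *list; r; r = r->next )
            if ( ++nr >= MAX_RANGES )
                return -ENOSPC;      /* refuse unbounded growth */

        newr->next = *list;
        *list = newr;
        return 0;
    }
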
diff --git a/0038-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch b/0038-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
new file mode 100644
index 0000000..ca8528f
--- /dev/null
+++ b/0038-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
@@ -0,0 +1,126 @@
+From 564de020d29fbc4efd20ef8052051e86b2465a1a Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 Mar 2023 12:01:01 +0000
+Subject: [PATCH 38/61] x86/HVM: serialize pinned cache attribute list
+ manipulation
+
+While the RCU variants of list insertion and removal allow lockless list
+traversal (with RCU just read-locked), insertions and removals still
+need serializing amongst themselves. To keep things simple, use the
+domain lock for this purpose.
+
+This is CVE-2022-42334 / part of XSA-428.
+
+Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 829ec245cf66560e3b50d140ccb3168e7fb7c945)
+---
+ xen/arch/x86/hvm/mtrr.c | 51 +++++++++++++++++++++++++----------------
+ 1 file changed, 31 insertions(+), 20 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
+index 98e55bbdbd..9b3b33012b 100644
+--- a/xen/arch/x86/hvm/mtrr.c
++++ b/xen/arch/x86/hvm/mtrr.c
+@@ -594,7 +594,7 @@ static void free_pinned_cacheattr_entry(struct rcu_head *rcu)
+ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
+ uint64_t gfn_end, uint32_t type)
+ {
+- struct hvm_mem_pinned_cacheattr_range *range;
++ struct hvm_mem_pinned_cacheattr_range *range, *newr;
+ unsigned int nr = 0;
+ int rc = 1;
+
+@@ -608,14 +608,15 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
+ {
+ case XEN_DOMCTL_DELETE_MEM_CACHEATTR:
+ /* Remove the requested range. */
+- rcu_read_lock(&pinned_cacheattr_rcu_lock);
+- list_for_each_entry_rcu ( range,
+- &d->arch.hvm.pinned_cacheattr_ranges,
+- list )
++ domain_lock(d);
++ list_for_each_entry ( range,
++ &d->arch.hvm.pinned_cacheattr_ranges,
++ list )
+ if ( range->start == gfn_start && range->end == gfn_end )
+ {
+- rcu_read_unlock(&pinned_cacheattr_rcu_lock);
+ list_del_rcu(&range->list);
++ domain_unlock(d);
++
+ type = range->type;
+ call_rcu(&range->rcu, free_pinned_cacheattr_entry);
+ p2m_memory_type_changed(d);
+@@ -636,7 +637,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
+ }
+ return 0;
+ }
+- rcu_read_unlock(&pinned_cacheattr_rcu_lock);
++ domain_unlock(d);
+ return -ENOENT;
+
+ case PAT_TYPE_UC_MINUS:
+@@ -651,7 +652,10 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
+ return -EINVAL;
+ }
+
+- rcu_read_lock(&pinned_cacheattr_rcu_lock);
++ newr = xzalloc(struct hvm_mem_pinned_cacheattr_range);
++
++ domain_lock(d);
++
+ list_for_each_entry_rcu ( range,
+ &d->arch.hvm.pinned_cacheattr_ranges,
+ list )
+@@ -669,27 +673,34 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
+ }
+ ++nr;
+ }
+- rcu_read_unlock(&pinned_cacheattr_rcu_lock);
++
+ if ( rc <= 0 )
+- return rc;
++ /* nothing */;
++ else if ( nr >= 64 /* The limit is arbitrary. */ )
++ rc = -ENOSPC;
++ else if ( !newr )
++ rc = -ENOMEM;
++ else
++ {
++ newr->start = gfn_start;
++ newr->end = gfn_end;
++ newr->type = type;
+
+- if ( nr >= 64 /* The limit is arbitrary. */ )
+- return -ENOSPC;
++ list_add_rcu(&newr->list, &d->arch.hvm.pinned_cacheattr_ranges);
+
+- range = xzalloc(struct hvm_mem_pinned_cacheattr_range);
+- if ( range == NULL )
+- return -ENOMEM;
++ newr = NULL;
++ rc = 0;
++ }
++
++ domain_unlock(d);
+
+- range->start = gfn_start;
+- range->end = gfn_end;
+- range->type = type;
++ xfree(newr);
+
+- list_add_rcu(&range->list, &d->arch.hvm.pinned_cacheattr_ranges);
+ p2m_memory_type_changed(d);
+ if ( type != PAT_TYPE_WRBACK )
+ flush_all(FLUSH_CACHE);
+
+- return 0;
++ return rc;
+ }
+
+ static int hvm_save_mtrr_msr(struct vcpu *v, hvm_domain_context_t *h)
+--
+2.40.0
+
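The point made in the patch above, that RCU only covers readers while concurrent writers must still serialise amongst themselves, can be illustrated outside Xen with ordinary pthreads: two head-insertions racing without a writer lock can lose an update, so the writers take a mutex even though readers never do. A user-space sketch under those assumptions (not Xen code; real RCU would use rcu_assign_pointer() for the publishing store):

    #include <pthread.h>
    #include <stdlib.h>

    struct node { int v; struct node *next; };

    static struct node *head;
    static pthread_mutex_t writer_lock = PTHREAD_MUTEX_INITIALIZER;

    static int insert(int v)
    {
        struct node *n = malloc(sizeof(*n));

        if ( !n )
            return -1;
        n->v = v;

        pthread_mutex_lock(&writer_lock);              /* serialise writers only */
        n->next = head;
        __atomic_store_n(&head, n, __ATOMIC_RELEASE);  /* publish to readers */
        pthread_mutex_unlock(&writer_lock);
        return 0;
    }
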
diff --git a/0038-x86-also-zap-secondary-time-area-handles-during-soft.patch b/0038-x86-also-zap-secondary-time-area-handles-during-soft.patch
deleted file mode 100644
index 286661a..0000000
--- a/0038-x86-also-zap-secondary-time-area-handles-during-soft.patch
+++ /dev/null
@@ -1,49 +0,0 @@
-From aac108509055e5f5ff293e1fb44614f96a0996c6 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 31 Oct 2022 13:26:08 +0100
-Subject: [PATCH 38/87] x86: also zap secondary time area handles during soft
- reset
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Just like domain_soft_reset() properly zaps runstate area handles, the
-secondary time area ones also need discarding to prevent guest memory
-corruption once the guest is re-started.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: b80d4f8d2ea6418e32fb4f20d1304ace6d6566e3
-master date: 2022-10-27 11:49:09 +0200
----
- xen/arch/x86/domain.c | 6 ++++++
- 1 file changed, 6 insertions(+)
-
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index a4356893bdbc..3fab2364be8d 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -929,6 +929,7 @@ int arch_domain_soft_reset(struct domain *d)
- struct page_info *page = virt_to_page(d->shared_info), *new_page;
- int ret = 0;
- struct domain *owner;
-+ struct vcpu *v;
- mfn_t mfn;
- gfn_t gfn;
- p2m_type_t p2mt;
-@@ -1008,7 +1009,12 @@ int arch_domain_soft_reset(struct domain *d)
- "Failed to add a page to replace %pd's shared_info frame %"PRI_gfn"\n",
- d, gfn_x(gfn));
- free_domheap_page(new_page);
-+ goto exit_put_gfn;
- }
-+
-+ for_each_vcpu ( d, v )
-+ set_xen_guest_handle(v->arch.time_info_guest, NULL);
-+
- exit_put_gfn:
- put_gfn(d, gfn_x(gfn));
- exit_put_page:
---
-2.37.4
-
diff --git a/0039-common-map_vcpu_info-wants-to-unshare-the-underlying.patch b/0039-common-map_vcpu_info-wants-to-unshare-the-underlying.patch
deleted file mode 100644
index cea8bb5..0000000
--- a/0039-common-map_vcpu_info-wants-to-unshare-the-underlying.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From 426a8346c01075ec5eba4aadefab03a96b6ece6a Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 31 Oct 2022 13:26:33 +0100
-Subject: [PATCH 39/87] common: map_vcpu_info() wants to unshare the underlying
- page
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Not passing P2M_UNSHARE to get_page_from_gfn() means there won't even be
-an attempt to unshare the referenced page, without any indication to the
-caller (e.g. -EAGAIN). Note that guests have no direct control over
-which of their pages are shared (or paged out), and hence they have no
-way to make sure all on their own that the subsequent obtaining of a
-writable type reference can actually succeed.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-master commit: 48980cf24d5cf41fd644600f99c753419505e735
-master date: 2022-10-28 11:38:32 +0200
----
- xen/common/domain.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/common/domain.c b/xen/common/domain.c
-index 56d47dd66478..e3afcacb6cae 100644
---- a/xen/common/domain.c
-+++ b/xen/common/domain.c
-@@ -1471,7 +1471,7 @@ int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset)
- if ( (v != current) && !(v->pause_flags & VPF_down) )
- return -EINVAL;
-
-- page = get_page_from_gfn(d, gfn, NULL, P2M_ALLOC);
-+ page = get_page_from_gfn(d, gfn, NULL, P2M_UNSHARE);
- if ( !page )
- return -EINVAL;
-
---
-2.37.4
-
diff --git a/0039-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch b/0039-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
new file mode 100644
index 0000000..74bcf67
--- /dev/null
+++ b/0039-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
@@ -0,0 +1,56 @@
+From 3c924fe46b455834b5c04268db6b528b549668d1 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 10 Feb 2023 21:11:14 +0000
+Subject: [PATCH 39/61] x86/spec-ctrl: Defer CR4_PV32_RESTORE on the
+ cstar_enter path
+
+As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
+the two hunks visible in the patch, RET's are not safe prior to this point.
+
+CR4_PV32_RESTORE hides a CALL/RET pair in certain configurations (PV32
+compiled in, SMEP or SMAP active), and the RET can be attacked with one of
+several known speculative issues.
+
+Furthermore, CR4_PV32_RESTORE also hides a reference to the cr4_pv32_mask
+global variable, which is not safe when XPTI is active before restoring Xen's
+full pagetables.
+
+This crash has gone unnoticed because it is only AMD CPUs which permit the
+SYSCALL instruction in compatibility mode, and these are not vulnerable to
+Meltdown so don't activate XPTI by default.
+
+This is XSA-429 / CVE-2022-42331
+
+Fixes: 5e7962901131 ("x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point")
+Fixes: 5784de3e2067 ("x86: Meltdown band-aid against malicious 64-bit PV guests")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit df5b055b12116d9e63ced59ae5389e69a2a3de48)
+---
+ xen/arch/x86/x86_64/entry.S | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index fba8ae498f..db2ea7871e 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -288,7 +288,6 @@ ENTRY(cstar_enter)
+ ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
+ #endif
+ push %rax /* Guest %rsp */
+- CR4_PV32_RESTORE
+ movq 8(%rsp), %rax /* Restore guest %rax. */
+ movq $FLAT_USER_SS32, 8(%rsp) /* Assume a 64bit domain. Compat handled lower. */
+ pushq %r11
+@@ -312,6 +311,8 @@ ENTRY(cstar_enter)
+ .Lcstar_cr3_okay:
+ sti
+
++ CR4_PV32_RESTORE
++
+ movq STACK_CPUINFO_FIELD(current_vcpu)(%rbx), %rbx
+
+ #ifdef CONFIG_PV32
+--
+2.40.0
+
diff --git a/0040-tools-python-change-s-size-type-for-Python-3.10.patch b/0040-tools-python-change-s-size-type-for-Python-3.10.patch
new file mode 100644
index 0000000..979fd6f
--- /dev/null
+++ b/0040-tools-python-change-s-size-type-for-Python-3.10.patch
@@ -0,0 +1,72 @@
+From 0cbffc6099db7fd01041910a98b99ccad50af11b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
+ <marmarek@invisiblethingslab.com>
+Date: Tue, 21 Mar 2023 13:49:28 +0100
+Subject: [PATCH 40/61] tools/python: change 's#' size type for Python >= 3.10
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Python < 3.10 by default uses 'int' type for data+size string types
+(s#), unless PY_SSIZE_T_CLEAN is defined - in which case it uses
+Py_ssize_t. The former behavior was removed in Python 3.10 and now it's
+required to define PY_SSIZE_T_CLEAN before including Python.h, and using
+Py_ssize_t for the length argument. The PY_SSIZE_T_CLEAN behavior is
+supported since Python 2.5.
+
+Adjust bindings accordingly.
+
+Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 897257ba49d0a6ddcf084960fd792ccce9c40f94
+master date: 2023-02-06 08:50:13 +0100
+---
+ tools/python/xen/lowlevel/xc/xc.c | 3 ++-
+ tools/python/xen/lowlevel/xs/xs.c | 3 ++-
+ 2 files changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
+index fd00861032..cfb2734a99 100644
+--- a/tools/python/xen/lowlevel/xc/xc.c
++++ b/tools/python/xen/lowlevel/xc/xc.c
+@@ -4,6 +4,7 @@
+ * Copyright (c) 2003-2004, K A Fraser (University of Cambridge)
+ */
+
++#define PY_SSIZE_T_CLEAN
+ #include <Python.h>
+ #define XC_WANT_COMPAT_MAP_FOREIGN_API
+ #include <xenctrl.h>
+@@ -1774,7 +1775,7 @@ static PyObject *pyflask_load(PyObject *self, PyObject *args, PyObject *kwds)
+ {
+ xc_interface *xc_handle;
+ char *policy;
+- uint32_t len;
++ Py_ssize_t len;
+ int ret;
+
+ static char *kwd_list[] = { "policy", NULL };
+diff --git a/tools/python/xen/lowlevel/xs/xs.c b/tools/python/xen/lowlevel/xs/xs.c
+index 0dad7fa5f2..3ba5a8b893 100644
+--- a/tools/python/xen/lowlevel/xs/xs.c
++++ b/tools/python/xen/lowlevel/xs/xs.c
+@@ -18,6 +18,7 @@
+ * Copyright (C) 2005 XenSource Ltd.
+ */
+
++#define PY_SSIZE_T_CLEAN
+ #include <Python.h>
+
+ #include <stdbool.h>
+@@ -141,7 +142,7 @@ static PyObject *xspy_write(XsHandle *self, PyObject *args)
+ char *thstr;
+ char *path;
+ char *data;
+- int data_n;
++ Py_ssize_t data_n;
+ bool result;
+
+ if (!xh)
+--
+2.40.0
+
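For context on the PY_SSIZE_T_CLEAN change above, a minimal CPython extension function (illustrative only, not from the Xen tree) shows the combination that Python >= 3.10 requires: define the macro before including Python.h and receive the "s#" length into a Py_ssize_t:

    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    static PyObject *take_blob(PyObject *self, PyObject *args)
    {
        const char *data;
        Py_ssize_t len;                  /* an int here breaks on Python 3.10+ */

        if ( !PyArg_ParseTuple(args, "s#", &data, &len) )
            return NULL;

        return PyLong_FromSsize_t(len);
    }
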
diff --git a/0040-x86-pv-shim-correctly-ignore-empty-onlining-requests.patch b/0040-x86-pv-shim-correctly-ignore-empty-onlining-requests.patch
deleted file mode 100644
index d242cb2..0000000
--- a/0040-x86-pv-shim-correctly-ignore-empty-onlining-requests.patch
+++ /dev/null
@@ -1,43 +0,0 @@
-From 08f6c88405a4406cac5b90e8d9873258dc445006 Mon Sep 17 00:00:00 2001
-From: Igor Druzhinin <igor.druzhinin@citrix.com>
-Date: Mon, 31 Oct 2022 13:26:59 +0100
-Subject: [PATCH 40/87] x86/pv-shim: correctly ignore empty onlining requests
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Mem-op requests may have zero extents. Such requests need treating as
-no-ops. pv_shim_online_memory(), however, would have tried to take 2³²-1
-order-sized pages from its balloon list (to then populate them),
-typically ending when the entire set of ballooned pages of this order
-was consumed.
-
-Note that pv_shim_offline_memory() does not have such an issue.
-
-Fixes: b2245acc60c3 ("xen/pvshim: memory hotplug")
-Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 9272225ca72801fd9fa5b268a2d1c5adebd19cd9
-master date: 2022-10-28 15:47:59 +0200
----
- xen/arch/x86/pv/shim.c | 3 +++
- 1 file changed, 3 insertions(+)
-
-diff --git a/xen/arch/x86/pv/shim.c b/xen/arch/x86/pv/shim.c
-index d9704121a739..4146ee3f9ce8 100644
---- a/xen/arch/x86/pv/shim.c
-+++ b/xen/arch/x86/pv/shim.c
-@@ -944,6 +944,9 @@ void pv_shim_online_memory(unsigned int nr, unsigned int order)
- struct page_info *page, *tmp;
- PAGE_LIST_HEAD(list);
-
-+ if ( !nr )
-+ return;
-+
- spin_lock(&balloon_lock);
- page_list_for_each_safe ( page, tmp, &balloon )
- {
---
-2.37.4
-
diff --git a/0041-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch b/0041-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
new file mode 100644
index 0000000..ff97af6
--- /dev/null
+++ b/0041-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
@@ -0,0 +1,54 @@
+From 5ce8d2aef85f590e4fb42d18784512203069d0c0 Mon Sep 17 00:00:00 2001
+From: Bernhard Kaindl <bernhard.kaindl@citrix.com>
+Date: Tue, 21 Mar 2023 13:49:47 +0100
+Subject: [PATCH 41/61] tools/xenmon: Fix xenmon.py for with python3.x
+
+Fixes for Py3:
+* class Delayed(): file not defined; also an error for pylint -E. Inherit
+ object instead for Py2 compatibility. Fix DomainInfo() too.
+* Inconsistent use of tabs and spaces for indentation (in one block)
+
+Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 3a59443c1d5ae0677a792c660ccd3796ce036732
+master date: 2023-02-06 10:22:12 +0000
+---
+ tools/xenmon/xenmon.py | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/tools/xenmon/xenmon.py b/tools/xenmon/xenmon.py
+index 175eacd2cb..977ada6887 100644
+--- a/tools/xenmon/xenmon.py
++++ b/tools/xenmon/xenmon.py
+@@ -117,7 +117,7 @@ def setup_cmdline_parser():
+ return parser
+
+ # encapsulate information about a domain
+-class DomainInfo:
++class DomainInfo(object):
+ def __init__(self):
+ self.allocated_sum = 0
+ self.gotten_sum = 0
+@@ -533,7 +533,7 @@ def show_livestats(cpu):
+ # simple functions to allow initialization of log files without actually
+ # physically creating files that are never used; only on the first real
+ # write does the file get created
+-class Delayed(file):
++class Delayed(object):
+ def __init__(self, filename, mode):
+ self.filename = filename
+ self.saved_mode = mode
+@@ -677,8 +677,8 @@ def main():
+
+ if os.uname()[0] == "SunOS":
+ xenbaked_cmd = "/usr/lib/xenbaked"
+- stop_cmd = "/usr/bin/pkill -INT -z global xenbaked"
+- kill_cmd = "/usr/bin/pkill -KILL -z global xenbaked"
++ stop_cmd = "/usr/bin/pkill -INT -z global xenbaked"
++ kill_cmd = "/usr/bin/pkill -KILL -z global xenbaked"
+ else:
+ # assumes that xenbaked is in your path
+ xenbaked_cmd = "xenbaked"
+--
+2.40.0
+
diff --git a/0041-x86-pv-shim-correct-ballooning-up-for-compat-guests.patch b/0041-x86-pv-shim-correct-ballooning-up-for-compat-guests.patch
deleted file mode 100644
index 5c77bbf..0000000
--- a/0041-x86-pv-shim-correct-ballooning-up-for-compat-guests.patch
+++ /dev/null
@@ -1,55 +0,0 @@
-From 2f75e3654f00a62bd1f446a7424ccd56750a2e15 Mon Sep 17 00:00:00 2001
-From: Igor Druzhinin <igor.druzhinin@citrix.com>
-Date: Mon, 31 Oct 2022 13:28:15 +0100
-Subject: [PATCH 41/87] x86/pv-shim: correct ballooning up for compat guests
-
-The compat layer for multi-extent memory ops may need to split incoming
-requests. Since the guest handles in the interface structures may not be
-altered, it does so by leveraging do_memory_op()'s continuation
-handling: It hands on non-initial requests with a non-zero start extent,
-with the (native) handle suitably adjusted down. As a result
-do_memory_op() sees only the first of potentially several requests with
-start extent being zero. It's only that case when the function would
-issue a call to pv_shim_online_memory(), yet the range then covers only
-the first sub-range that results from the split.
-
-Address that breakage by making a complementary call to
-pv_shim_online_memory() in compat layer.
-
-Fixes: b2245acc60c3 ("xen/pvshim: memory hotplug")
-Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: a0bfdd201ea12aa5679bb8944d63a4e0d3c23160
-master date: 2022-10-28 15:48:50 +0200
----
- xen/common/compat/memory.c | 6 +++++-
- 1 file changed, 5 insertions(+), 1 deletion(-)
-
-diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
-index c43fa97cf15f..a0e0562a4033 100644
---- a/xen/common/compat/memory.c
-+++ b/xen/common/compat/memory.c
-@@ -7,6 +7,7 @@ EMIT_FILE;
- #include <xen/event.h>
- #include <xen/mem_access.h>
- #include <asm/current.h>
-+#include <asm/guest.h>
- #include <compat/memory.h>
-
- #define xen_domid_t domid_t
-@@ -146,7 +147,10 @@ int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
- nat.rsrv->nr_extents = end_extent;
- ++split;
- }
--
-+ /* Avoid calling pv_shim_online_memory() when in a continuation. */
-+ if ( pv_shim && op != XENMEM_decrease_reservation && !start_extent )
-+ pv_shim_online_memory(cmp.rsrv.nr_extents - nat.rsrv->nr_extents,
-+ cmp.rsrv.extent_order);
- break;
-
- case XENMEM_exchange:
---
-2.37.4
-
diff --git a/0042-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch b/0042-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
new file mode 100644
index 0000000..c425c43
--- /dev/null
+++ b/0042-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
@@ -0,0 +1,95 @@
+From 4a6bedefe589dab12182d6b974de8ea3b2fcc681 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 Mar 2023 13:50:18 +0100
+Subject: [PATCH 42/61] core-parking: fix build with gcc12 and NR_CPUS=1
+
+Gcc12 takes issue with core_parking_remove()'s
+
+ for ( ; i < cur_idle_nums; ++i )
+ core_parking_cpunum[i] = core_parking_cpunum[i + 1];
+
+complaining that the right hand side array access is past the bounds of
+1. Clearly the compiler can't know that cur_idle_nums can only ever be
+zero in this case (as the sole CPU cannot be parked).
+
+Arrange for core_parking.c's contents to not be needed altogether, and
+then disable its building when NR_CPUS == 1.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 4b0422f70feb4b1cd04598ffde805fc224f3812e
+master date: 2023-03-13 15:15:42 +0100
+---
+ xen/arch/x86/Kconfig | 2 +-
+ xen/arch/x86/platform_hypercall.c | 11 ++++++++---
+ xen/arch/x86/sysctl.c | 3 +++
+ xen/common/Kconfig | 1 +
+ 4 files changed, 13 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
+index 3c14096c80..8e2b504923 100644
+--- a/xen/arch/x86/Kconfig
++++ b/xen/arch/x86/Kconfig
+@@ -8,7 +8,7 @@ config X86
+ select ACPI_LEGACY_TABLES_LOOKUP
+ select ALTERNATIVE_CALL
+ select ARCH_SUPPORTS_INT128
+- select CORE_PARKING
++ imply CORE_PARKING
+ select HAS_ALTERNATIVE
+ select HAS_COMPAT
+ select HAS_CPUFREQ
+diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
+index bf4090c942..c35e5669a4 100644
+--- a/xen/arch/x86/platform_hypercall.c
++++ b/xen/arch/x86/platform_hypercall.c
+@@ -725,12 +725,17 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
+ case XEN_CORE_PARKING_SET:
+ idle_nums = min_t(uint32_t,
+ op->u.core_parking.idle_nums, num_present_cpus() - 1);
+- ret = continue_hypercall_on_cpu(
+- 0, core_parking_helper, (void *)(unsigned long)idle_nums);
++ if ( CONFIG_NR_CPUS > 1 )
++ ret = continue_hypercall_on_cpu(
++ 0, core_parking_helper,
++ (void *)(unsigned long)idle_nums);
++ else if ( idle_nums )
++ ret = -EINVAL;
+ break;
+
+ case XEN_CORE_PARKING_GET:
+- op->u.core_parking.idle_nums = get_cur_idle_nums();
++ op->u.core_parking.idle_nums = CONFIG_NR_CPUS > 1
++ ? get_cur_idle_nums() : 0;
+ ret = __copy_field_to_guest(u_xenpf_op, op, u.core_parking) ?
+ -EFAULT : 0;
+ break;
+diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
+index aff52a13f3..ff843eaee2 100644
+--- a/xen/arch/x86/sysctl.c
++++ b/xen/arch/x86/sysctl.c
+@@ -179,6 +179,9 @@ long arch_do_sysctl(
+ ret = -EBUSY;
+ break;
+ }
++ if ( CONFIG_NR_CPUS <= 1 )
++ /* Mimic behavior of smt_up_down_helper(). */
++ return 0;
+ plug = op == XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE;
+ fn = smt_up_down_helper;
+ hcpu = _p(plug);
+diff --git a/xen/common/Kconfig b/xen/common/Kconfig
+index 6443943889..c9f4b7f492 100644
+--- a/xen/common/Kconfig
++++ b/xen/common/Kconfig
+@@ -10,6 +10,7 @@ config COMPAT
+
+ config CORE_PARKING
+ bool
++ depends on NR_CPUS > 1
+
+ config GRANT_TABLE
+ bool "Grant table support" if EXPERT
+--
+2.40.0
+
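The construct gcc12 objects to reduces to a shift-down loop over a one-element array; the sketch below is an illustration only (cpus[] and remove_parked() are placeholder names, not Xen code) of why the right-hand read looks out of bounds to the compiler even though it is dead at run time:

    /* Stand-alone illustration of the warning pattern.  The one-element
     * array plays the role of core_parking_cpunum[] in an NR_CPUS=1
     * build; at run time the count can only be 0 here, so the loop body
     * never executes, but the compiler cannot prove that. */
    static unsigned int cpus[1];

    void remove_parked(unsigned int idx, unsigned int count)
    {
        for ( unsigned int i = idx; i < count; ++i )
            cpus[i] = cpus[i + 1];   /* flagged: subscript 1 outside unsigned int[1] */
    }
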
diff --git a/0042-x86-pv-shim-correct-ballooning-down-for-compat-guest.patch b/0042-x86-pv-shim-correct-ballooning-down-for-compat-guest.patch
deleted file mode 100644
index dd044e4..0000000
--- a/0042-x86-pv-shim-correct-ballooning-down-for-compat-guest.patch
+++ /dev/null
@@ -1,72 +0,0 @@
-From c229b16ba3eb5579a9a5d470ab16dd9ad55e57d6 Mon Sep 17 00:00:00 2001
-From: Igor Druzhinin <igor.druzhinin@citrix.com>
-Date: Mon, 31 Oct 2022 13:28:46 +0100
-Subject: [PATCH 42/87] x86/pv-shim: correct ballooning down for compat guests
-
-The compat layer for multi-extent memory ops may need to split incoming
-requests. Since the guest handles in the interface structures may not be
-altered, it does so by leveraging do_memory_op()'s continuation
-handling: It hands on non-initial requests with a non-zero start extent,
-with the (native) handle suitably adjusted down. As a result
-do_memory_op() sees only the first of potentially several requests with
-start extent being zero. In order to be usable as overall result, the
-function accumulates args.nr_done, i.e. it initialized the field with
-the start extent. Therefore non-initial requests resulting from the
-split would pass too large a number into pv_shim_offline_memory().
-
-Address that breakage by always calling pv_shim_offline_memory()
-regardless of current hypercall preemption status, with a suitably
-adjusted first argument. Note that this is correct also for the native
-guest case: We now simply "commit" what was completed right away, rather
-than at the end of a series of preemption/re-start cycles. In fact this
-improves overall preemption behavior: There's no longer a potentially
-big chunk of work done non-preemptively at the end of the last
-"iteration".
-
-Fixes: b2245acc60c3 ("xen/pvshim: memory hotplug")
-Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 1d7fbc535d1d37bdc2cc53ede360b0f6651f7de1
-master date: 2022-10-28 15:49:33 +0200
----
- xen/common/memory.c | 19 +++++++------------
- 1 file changed, 7 insertions(+), 12 deletions(-)
-
-diff --git a/xen/common/memory.c b/xen/common/memory.c
-index 064de4ad8d66..76f8858cc379 100644
---- a/xen/common/memory.c
-+++ b/xen/common/memory.c
-@@ -1420,22 +1420,17 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
-
- rc = args.nr_done;
-
-- if ( args.preempted )
-- return hypercall_create_continuation(
-- __HYPERVISOR_memory_op, "lh",
-- op | (rc << MEMOP_EXTENT_SHIFT), arg);
--
- #ifdef CONFIG_X86
- if ( pv_shim && op == XENMEM_decrease_reservation )
-- /*
-- * Only call pv_shim_offline_memory when the hypercall has
-- * finished. Note that nr_done is used to cope in case the
-- * hypercall has failed and only part of the extents where
-- * processed.
-- */
-- pv_shim_offline_memory(args.nr_done, args.extent_order);
-+ pv_shim_offline_memory(args.nr_done - start_extent,
-+ args.extent_order);
- #endif
-
-+ if ( args.preempted )
-+ return hypercall_create_continuation(
-+ __HYPERVISOR_memory_op, "lh",
-+ op | (rc << MEMOP_EXTENT_SHIFT), arg);
-+
- break;
-
- case XENMEM_exchange:
---
-2.37.4
-
diff --git a/0043-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch b/0043-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
new file mode 100644
index 0000000..0e040ad
--- /dev/null
+++ b/0043-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
@@ -0,0 +1,129 @@
+From cdde3171a2a932a6836b094c4387412e27414ec9 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 Mar 2023 13:51:42 +0100
+Subject: [PATCH 43/61] x86/altp2m: help gcc13 to avoid it emitting a warning
+
+Switches of altp2m-s always expect a valid altp2m to be in place (and
+indeed altp2m_vcpu_initialise() sets the active one to be at index 0).
+The compiler, however, cannot know that, and hence it cannot eliminate
+p2m_get_altp2m()'s case of returning (literal) NULL. If then the compiler
+decides to special case that code path in the caller, the dereference in
+instances of
+
+ atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
+
+can, to the code generator, appear to be NULL dereferences, leading to
+
+In function 'atomic_dec',
+ inlined from '...' at ...:
+./arch/x86/include/asm/atomic.h:182:5: error: array subscript 0 is outside array bounds of 'int[0]' [-Werror=array-bounds=]
+
+Aid the compiler by adding a BUG_ON() checking the return value of the
+problematic p2m_get_altp2m(). Since with the use of the local variable
+the 2nd p2m_get_altp2m() each will look questionable at the first glance
+(Why is the local variable not used here?), open-code the only relevant
+piece of p2m_get_altp2m() there.
+
+To avoid repeatedly doing these transformations, and also to limit how
+"bad" the open-coding really is, convert the entire operation to an
+inline helper, used by all three instances (and accepting the redundant
+BUG_ON(idx >= MAX_ALTP2M) in two of the three cases).
+
+Reported-by: Charles Arnold <carnold@suse.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: be62b1fc2aa7375d553603fca07299da765a89fe
+master date: 2023-03-13 15:16:21 +0100
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 8 +-------
+ xen/arch/x86/mm/p2m.c | 14 ++------------
+ xen/include/asm-x86/p2m.h | 20 ++++++++++++++++++++
+ 3 files changed, 23 insertions(+), 19 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index 094141be9a..c8a839cd5e 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -4036,13 +4036,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ }
+ }
+
+- if ( idx != vcpu_altp2m(v).p2midx )
+- {
+- BUG_ON(idx >= MAX_ALTP2M);
+- atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
+- vcpu_altp2m(v).p2midx = idx;
+- atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
+- }
++ p2m_set_altp2m(v, idx);
+ }
+
+ /* XXX: This looks ugly, but we need a mechanism to ensure
+diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
+index 8781df9dda..2d41446a69 100644
+--- a/xen/arch/x86/mm/p2m.c
++++ b/xen/arch/x86/mm/p2m.c
+@@ -2194,13 +2194,8 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx)
+
+ if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
+ {
+- if ( idx != vcpu_altp2m(v).p2midx )
+- {
+- atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
+- vcpu_altp2m(v).p2midx = idx;
+- atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
++ if ( p2m_set_altp2m(v, idx) )
+ altp2m_vcpu_update_p2m(v);
+- }
+ rc = 1;
+ }
+
+@@ -2471,13 +2466,8 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx)
+ if ( d->arch.altp2m_visible_eptp[idx] != mfn_x(INVALID_MFN) )
+ {
+ for_each_vcpu( d, v )
+- if ( idx != vcpu_altp2m(v).p2midx )
+- {
+- atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
+- vcpu_altp2m(v).p2midx = idx;
+- atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
++ if ( p2m_set_altp2m(v, idx) )
+ altp2m_vcpu_update_p2m(v);
+- }
+
+ rc = 0;
+ }
+diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
+index 2db9ab0122..f92bb97394 100644
+--- a/xen/include/asm-x86/p2m.h
++++ b/xen/include/asm-x86/p2m.h
+@@ -841,6 +841,26 @@ static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
+ return v->domain->arch.altp2m_p2m[index];
+ }
+
++/* set current alternate p2m table */
++static inline bool p2m_set_altp2m(struct vcpu *v, unsigned int idx)
++{
++ struct p2m_domain *orig;
++
++ BUG_ON(idx >= MAX_ALTP2M);
++
++ if ( idx == vcpu_altp2m(v).p2midx )
++ return false;
++
++ orig = p2m_get_altp2m(v);
++ BUG_ON(!orig);
++ atomic_dec(&orig->active_vcpus);
++
++ vcpu_altp2m(v).p2midx = idx;
++ atomic_inc(&v->domain->arch.altp2m_p2m[idx]->active_vcpus);
++
++ return true;
++}
++
+ /* Switch alternate p2m for a single vcpu */
+ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx);
+
+--
+2.40.0
+
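The pattern generalises beyond altp2m: when a lookup can formally return NULL but the caller knows it cannot, checking the pointer before the dereference both documents the invariant and removes the NULL path the optimiser would otherwise warn about. A stand-alone sketch (plain assert() standing in for BUG_ON(); all names illustrative):

    #include <assert.h>
    #include <stddef.h>

    struct ctx { int active_vcpus; };

    static struct ctx table[4];

    /* Formally nullable lookup, standing in for p2m_get_altp2m(). */
    static struct ctx *lookup(unsigned int idx)
    {
        return idx < 4 ? &table[idx] : NULL;
    }

    void switch_ctx(unsigned int idx)
    {
        struct ctx *cur = lookup(idx);

        assert(cur);             /* plays the role of BUG_ON(!orig) above */
        cur->active_vcpus--;     /* no NULL path left for the optimiser to warn about */
    }
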
diff --git a/0043-x86-vmx-Revert-VMX-use-a-single-global-APIC-access-p.patch b/0043-x86-vmx-Revert-VMX-use-a-single-global-APIC-access-p.patch
deleted file mode 100644
index 92b3bf1..0000000
--- a/0043-x86-vmx-Revert-VMX-use-a-single-global-APIC-access-p.patch
+++ /dev/null
@@ -1,259 +0,0 @@
-From 62e7fb702db4adaa9415ac87d95e0f461e32d9ca Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 24 Aug 2022 14:16:44 +0100
-Subject: [PATCH 43/87] x86/vmx: Revert "VMX: use a single, global APIC access
- page"
-
-The claim "No accesses would ever go to this page." is false. A consequence
-of how Intel's APIC Acceleration works, and Xen's choice to have per-domain
-P2Ms (rather than per-vCPU P2Ms) means that the APIC page is fully read-write
-to any vCPU which is not in xAPIC mode.
-
-This reverts commit 58850b9074d3e7affdf3bc94c84e417ecfa4d165.
-
-This is XSA-412 / CVE-2022-42327.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 3b5beaf49033cddf4b2cc4e4d391b966f4203471)
----
- xen/arch/x86/hvm/vmx/vmx.c | 59 ++++++++++++++++++++++--------
- xen/arch/x86/mm/shadow/set.c | 8 ----
- xen/arch/x86/mm/shadow/types.h | 7 ----
- xen/include/asm-x86/hvm/vmx/vmcs.h | 1 +
- xen/include/asm-x86/mm.h | 20 +---------
- 5 files changed, 46 insertions(+), 49 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index d429d76c18c9..3f4276531322 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -66,7 +66,8 @@ boolean_param("force-ept", opt_force_ept);
- static void vmx_ctxt_switch_from(struct vcpu *v);
- static void vmx_ctxt_switch_to(struct vcpu *v);
-
--static int alloc_vlapic_mapping(void);
-+static int vmx_alloc_vlapic_mapping(struct domain *d);
-+static void vmx_free_vlapic_mapping(struct domain *d);
- static void vmx_install_vlapic_mapping(struct vcpu *v);
- static void vmx_update_guest_cr(struct vcpu *v, unsigned int cr,
- unsigned int flags);
-@@ -77,8 +78,6 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
- static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
- static void vmx_invlpg(struct vcpu *v, unsigned long linear);
-
--static mfn_t __read_mostly apic_access_mfn = INVALID_MFN_INITIALIZER;
--
- /* Values for domain's ->arch.hvm_domain.pi_ops.flags. */
- #define PI_CSW_FROM (1u << 0)
- #define PI_CSW_TO (1u << 1)
-@@ -402,6 +401,7 @@ static int vmx_domain_initialise(struct domain *d)
- .to = vmx_ctxt_switch_to,
- .tail = vmx_do_resume,
- };
-+ int rc;
-
- d->arch.ctxt_switch = &csw;
-
-@@ -411,15 +411,24 @@ static int vmx_domain_initialise(struct domain *d)
- */
- d->arch.hvm.vmx.exec_sp = is_hardware_domain(d) || opt_ept_exec_sp;
-
-+ if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
-+ return rc;
-+
- return 0;
- }
-
-+static void vmx_domain_relinquish_resources(struct domain *d)
-+{
-+ vmx_free_vlapic_mapping(d);
-+}
-+
- static void domain_creation_finished(struct domain *d)
- {
- gfn_t gfn = gaddr_to_gfn(APIC_DEFAULT_PHYS_BASE);
-+ mfn_t apic_access_mfn = d->arch.hvm.vmx.apic_access_mfn;
- bool ipat;
-
-- if ( !has_vlapic(d) || mfn_eq(apic_access_mfn, INVALID_MFN) )
-+ if ( mfn_eq(apic_access_mfn, _mfn(0)) )
- return;
-
- ASSERT(epte_get_entry_emt(d, gfn, apic_access_mfn, 0, &ipat,
-@@ -2481,6 +2490,7 @@ static struct hvm_function_table __initdata vmx_function_table = {
- .cpu_up_prepare = vmx_cpu_up_prepare,
- .cpu_dead = vmx_cpu_dead,
- .domain_initialise = vmx_domain_initialise,
-+ .domain_relinquish_resources = vmx_domain_relinquish_resources,
- .domain_creation_finished = domain_creation_finished,
- .vcpu_initialise = vmx_vcpu_initialise,
- .vcpu_destroy = vmx_vcpu_destroy,
-@@ -2731,7 +2741,7 @@ const struct hvm_function_table * __init start_vmx(void)
- {
- set_in_cr4(X86_CR4_VMXE);
-
-- if ( vmx_vmcs_init() || alloc_vlapic_mapping() )
-+ if ( vmx_vmcs_init() )
- {
- printk("VMX: failed to initialise.\n");
- return NULL;
-@@ -3305,36 +3315,55 @@ gp_fault:
- return X86EMUL_EXCEPTION;
- }
-
--static int __init alloc_vlapic_mapping(void)
-+static int vmx_alloc_vlapic_mapping(struct domain *d)
- {
- struct page_info *pg;
- mfn_t mfn;
-
-- if ( !cpu_has_vmx_virtualize_apic_accesses )
-+ if ( !has_vlapic(d) || !cpu_has_vmx_virtualize_apic_accesses )
- return 0;
-
-- pg = alloc_domheap_page(NULL, 0);
-+ pg = alloc_domheap_page(d, MEMF_no_refcount);
- if ( !pg )
- return -ENOMEM;
-
-- /*
-- * Signal to shadow code that this page cannot be refcounted. This also
-- * makes epte_get_entry_emt() recognize this page as "special".
-- */
-- page_suppress_refcounting(pg);
-+ if ( !get_page_and_type(pg, d, PGT_writable_page) )
-+ {
-+ /*
-+ * The domain can't possibly know about this page yet, so failure
-+ * here is a clear indication of something fishy going on.
-+ */
-+ domain_crash(d);
-+ return -ENODATA;
-+ }
-
- mfn = page_to_mfn(pg);
- clear_domain_page(mfn);
-- apic_access_mfn = mfn;
-+ d->arch.hvm.vmx.apic_access_mfn = mfn;
-
- return 0;
- }
-
-+static void vmx_free_vlapic_mapping(struct domain *d)
-+{
-+ mfn_t mfn = d->arch.hvm.vmx.apic_access_mfn;
-+
-+ d->arch.hvm.vmx.apic_access_mfn = _mfn(0);
-+ if ( !mfn_eq(mfn, _mfn(0)) )
-+ {
-+ struct page_info *pg = mfn_to_page(mfn);
-+
-+ put_page_alloc_ref(pg);
-+ put_page_and_type(pg);
-+ }
-+}
-+
- static void vmx_install_vlapic_mapping(struct vcpu *v)
- {
-+ mfn_t apic_access_mfn = v->domain->arch.hvm.vmx.apic_access_mfn;
- paddr_t virt_page_ma, apic_page_ma;
-
-- if ( !has_vlapic(v->domain) || mfn_eq(apic_access_mfn, INVALID_MFN) )
-+ if ( mfn_eq(apic_access_mfn, _mfn(0)) )
- return;
-
- ASSERT(cpu_has_vmx_virtualize_apic_accesses);
-diff --git a/xen/arch/x86/mm/shadow/set.c b/xen/arch/x86/mm/shadow/set.c
-index 87e9c6eeb219..bd6c68b547c9 100644
---- a/xen/arch/x86/mm/shadow/set.c
-+++ b/xen/arch/x86/mm/shadow/set.c
-@@ -101,14 +101,6 @@ shadow_get_page_from_l1e(shadow_l1e_t sl1e, struct domain *d, p2m_type_t type)
- owner = page_get_owner(pg);
- }
-
-- /*
-- * Check whether refcounting is suppressed on this page. For example,
-- * VMX'es APIC access MFN is just a surrogate page. It doesn't actually
-- * get accessed, and hence there's no need to refcount it.
-- */
-- if ( pg && page_refcounting_suppressed(pg) )
-- return 0;
--
- if ( owner == dom_io )
- owner = NULL;
-
-diff --git a/xen/arch/x86/mm/shadow/types.h b/xen/arch/x86/mm/shadow/types.h
-index 6970e7d6ea4a..814a4018535a 100644
---- a/xen/arch/x86/mm/shadow/types.h
-+++ b/xen/arch/x86/mm/shadow/types.h
-@@ -276,16 +276,9 @@ int shadow_set_l4e(struct domain *d, shadow_l4e_t *sl4e,
- static void inline
- shadow_put_page_from_l1e(shadow_l1e_t sl1e, struct domain *d)
- {
-- mfn_t mfn = shadow_l1e_get_mfn(sl1e);
--
- if ( !shadow_mode_refcounts(d) )
- return;
-
-- if ( mfn_valid(mfn) &&
-- /* See the respective comment in shadow_get_page_from_l1e(). */
-- page_refcounting_suppressed(mfn_to_page(mfn)) )
-- return;
--
- put_page_from_l1e(sl1e, d);
- }
-
-diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
-index 03c9ccf627ab..8073af323b96 100644
---- a/xen/include/asm-x86/hvm/vmx/vmcs.h
-+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
-@@ -58,6 +58,7 @@ struct ept_data {
- #define _VMX_DOMAIN_PML_ENABLED 0
- #define VMX_DOMAIN_PML_ENABLED (1ul << _VMX_DOMAIN_PML_ENABLED)
- struct vmx_domain {
-+ mfn_t apic_access_mfn;
- /* VMX_DOMAIN_* */
- unsigned int status;
-
-diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
-index 7bdf9c2290d8..e1bcea57a8f5 100644
---- a/xen/include/asm-x86/mm.h
-+++ b/xen/include/asm-x86/mm.h
-@@ -83,7 +83,7 @@
- #define PGC_state_offlined PG_mask(2, 6)
- #define PGC_state_free PG_mask(3, 6)
- #define page_state_is(pg, st) (((pg)->count_info&PGC_state) == PGC_state_##st)
--/* Page is not reference counted (see below for caveats) */
-+/* Page is not reference counted */
- #define _PGC_extra PG_shift(7)
- #define PGC_extra PG_mask(1, 7)
-
-@@ -375,24 +375,6 @@ void zap_ro_mpt(mfn_t mfn);
-
- bool is_iomem_page(mfn_t mfn);
-
--/*
-- * Pages with no owner which may get passed to functions wanting to
-- * refcount them can be marked PGC_extra to bypass this refcounting (which
-- * would fail due to the lack of an owner).
-- *
-- * (For pages with owner PGC_extra has different meaning.)
-- */
--static inline void page_suppress_refcounting(struct page_info *pg)
--{
-- ASSERT(!page_get_owner(pg));
-- pg->count_info |= PGC_extra;
--}
--
--static inline bool page_refcounting_suppressed(const struct page_info *pg)
--{
-- return !page_get_owner(pg) && (pg->count_info & PGC_extra);
--}
--
- struct platform_bad_page {
- unsigned long mfn;
- unsigned int order;
---
-2.37.4
-
diff --git a/0044-VT-d-constrain-IGD-check.patch b/0044-VT-d-constrain-IGD-check.patch
new file mode 100644
index 0000000..13ca74e
--- /dev/null
+++ b/0044-VT-d-constrain-IGD-check.patch
@@ -0,0 +1,44 @@
+From 4d42cc4d25c35ca381370a1fa0b45350723d1308 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 Mar 2023 13:52:20 +0100
+Subject: [PATCH 44/61] VT-d: constrain IGD check
+
+Marking a DRHD as controlling an IGD isn't very sensible without
+checking that at the very least it's a graphics device that lives at
+0000:00:02.0. Re-use the reading of the class-code to control both the
+clearing of "gfx_only" and the setting of "igd_drhd_address".
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: f8c4317295fa1cde1a81779b7e362651c084efb8
+master date: 2023-03-14 10:44:08 +0100
+---
+ xen/drivers/passthrough/vtd/dmar.c | 9 +++------
+ 1 file changed, 3 insertions(+), 6 deletions(-)
+
+diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
+index 33a12b2ae9..9ec49936b8 100644
+--- a/xen/drivers/passthrough/vtd/dmar.c
++++ b/xen/drivers/passthrough/vtd/dmar.c
+@@ -391,15 +391,12 @@ static int __init acpi_parse_dev_scope(
+
+ if ( drhd )
+ {
+- if ( (seg == 0) && (bus == 0) && (path->dev == 2) &&
+- (path->fn == 0) )
+- igd_drhd_address = drhd->address;
+-
+- if ( gfx_only &&
+- pci_conf_read8(PCI_SBDF(seg, bus, path->dev, path->fn),
++ if ( pci_conf_read8(PCI_SBDF(seg, bus, path->dev, path->fn),
+ PCI_CLASS_DEVICE + 1) != 0x03
+ /* PCI_BASE_CLASS_DISPLAY */ )
+ gfx_only = false;
++ else if ( !seg && !bus && path->dev == 2 && !path->fn )
++ igd_drhd_address = drhd->address;
+ }
+
+ break;
+--
+2.40.0
+
diff --git a/0044-tools-xenstore-create_node-Don-t-defer-work-to-undo-.patch b/0044-tools-xenstore-create_node-Don-t-defer-work-to-undo-.patch
deleted file mode 100644
index 8b9ff53..0000000
--- a/0044-tools-xenstore-create_node-Don-t-defer-work-to-undo-.patch
+++ /dev/null
@@ -1,120 +0,0 @@
-From 28ea39a4eb476f9105e1021bef1367c075feaa0b Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Tue, 13 Sep 2022 07:35:06 +0200
-Subject: [PATCH 44/87] tools/xenstore: create_node: Don't defer work to undo
- any changes on failure
-
-XSA-115 extended destroy_node() to update the node accounting for the
-connection. The implementation is assuming the connection is the parent
-of the node, however all the nodes are allocated using a separate context
-(see process_message()). This will result to crash (or corrupt) xenstored
-as the pointer is wrongly used.
-
-In case of an error, any changes to the database or update to the
-accounting will now be reverted in create_node() by calling directly
-destroy_node(). This has the nice advantage to remove the loop to unset
-the destructors in case of success.
-
-Take the opportunity to free the nodes right now as they are not
-going to be reachable (the function returns NULL) and are just wasting
-resources.
-
-This is XSA-414 / CVE-2022-42309.
-
-Fixes: 0bfb2101f243 ("tools/xenstore: fix node accounting after failed node creation")
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-(cherry picked from commit 1cd3cc7ea27cda7640a8d895e09617b61c265697)
----
- tools/xenstore/xenstored_core.c | 47 ++++++++++++++++++++++-----------
- 1 file changed, 32 insertions(+), 15 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 0c8ee276f837..29947c3020c3 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1088,9 +1088,8 @@ nomem:
- return NULL;
- }
-
--static int destroy_node(void *_node)
-+static int destroy_node(struct connection *conn, struct node *node)
- {
-- struct node *node = _node;
- TDB_DATA key;
-
- if (streq(node->name, "/"))
-@@ -1099,7 +1098,7 @@ static int destroy_node(void *_node)
- set_tdb_key(node->name, &key);
- tdb_delete(tdb_ctx, key);
-
-- domain_entry_dec(talloc_parent(node), node);
-+ domain_entry_dec(conn, node);
-
- return 0;
- }
-@@ -1108,7 +1107,8 @@ static struct node *create_node(struct connection *conn, const void *ctx,
- const char *name,
- void *data, unsigned int datalen)
- {
-- struct node *node, *i;
-+ struct node *node, *i, *j;
-+ int ret;
-
- node = construct_node(conn, ctx, name);
- if (!node)
-@@ -1130,23 +1130,40 @@ static struct node *create_node(struct connection *conn, const void *ctx,
- /* i->parent is set for each new node, so check quota. */
- if (i->parent &&
- domain_entry(conn) >= quota_nb_entry_per_domain) {
-- errno = ENOSPC;
-- return NULL;
-+ ret = ENOSPC;
-+ goto err;
- }
-- if (write_node(conn, i, false))
-- return NULL;
-
-- /* Account for new node, set destructor for error case. */
-- if (i->parent) {
-+ ret = write_node(conn, i, false);
-+ if (ret)
-+ goto err;
-+
-+ /* Account for new node */
-+ if (i->parent)
- domain_entry_inc(conn, i);
-- talloc_set_destructor(i, destroy_node);
-- }
- }
-
-- /* OK, now remove destructors so they stay around */
-- for (i = node; i->parent; i = i->parent)
-- talloc_set_destructor(i, NULL);
- return node;
-+
-+err:
-+ /*
-+ * We failed to update TDB for some of the nodes. Undo any work that
-+ * have already been done.
-+ */
-+ for (j = node; j != i; j = j->parent)
-+ destroy_node(conn, j);
-+
-+ /* We don't need to keep the nodes around, so free them. */
-+ i = node;
-+ while (i) {
-+ j = i;
-+ i = i->parent;
-+ talloc_free(j);
-+ }
-+
-+ errno = ret;
-+
-+ return NULL;
- }
-
- /* path, data... */
---
-2.37.4
-
diff --git a/0045-bunzip-work-around-gcc13-warning.patch b/0045-bunzip-work-around-gcc13-warning.patch
new file mode 100644
index 0000000..9b26011
--- /dev/null
+++ b/0045-bunzip-work-around-gcc13-warning.patch
@@ -0,0 +1,42 @@
+From 49116b2101094c3d6658928f03db88d035ba97be Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 Mar 2023 13:52:58 +0100
+Subject: [PATCH 45/61] bunzip: work around gcc13 warning
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+While provable that length[0] is always initialized (because symCount
+cannot be zero), upcoming gcc13 fails to recognize this and warns about
+the unconditional use of the value immediately following the loop.
+
+See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511.
+
+Reported-by: Martin Liška <martin.liska@suse.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 402195e56de0aacf97e05c80ed367d464ca6938b
+master date: 2023-03-14 10:45:28 +0100
+---
+ xen/common/bunzip2.c | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/xen/common/bunzip2.c b/xen/common/bunzip2.c
+index 2087cfbbed..5108e570ed 100644
+--- a/xen/common/bunzip2.c
++++ b/xen/common/bunzip2.c
+@@ -233,6 +233,11 @@ static int __init get_next_block(struct bunzip_data *bd)
+ becomes negative, so an unsigned inequality catches
+ it.) */
+ t = get_bits(bd, 5)-1;
++ /* GCC 13 has apparently improved use-before-set detection, but
++ it can't figure out that length[0] is always intialized by
++ virtue of symCount always being positive when making it here.
++ See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511. */
++ length[0] = 0;
+ for (i = 0; i < symCount; i++) {
+ for (;;) {
+ if (((unsigned)t) > (MAX_HUFCODE_BITS-1))
+--
+2.40.0
+
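The shape of the false positive is easy to state in isolation: an array element that is written whenever a runtime count is positive, followed by an unconditional read of it. A sketch of that shape with the same style of workaround (illustrative names only, not the bunzip2 code):

    /* The caller guarantees 0 < symCount <= 258, so length[0] is always
     * written by the loop, but the compiler cannot see that invariant. */
    static int min_code_length(const unsigned char *bits, unsigned int symCount)
    {
        int length[258];

        length[0] = 0;   /* same workaround as the hunk above */
        for (unsigned int i = 0; i < symCount; i++)
            length[i] = bits[i];

        return length[0];
    }
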
diff --git a/0045-tools-xenstore-Fail-a-transaction-if-it-is-not-possi.patch b/0045-tools-xenstore-Fail-a-transaction-if-it-is-not-possi.patch
deleted file mode 100644
index 4ca6c93..0000000
--- a/0045-tools-xenstore-Fail-a-transaction-if-it-is-not-possi.patch
+++ /dev/null
@@ -1,145 +0,0 @@
-From 427e86b48836a9511f57004ca367283cd85cd30f Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Tue, 13 Sep 2022 07:35:06 +0200
-Subject: [PATCH 45/87] tools/xenstore: Fail a transaction if it is not
- possible to create a node
-
-Commit f2bebf72c4d5 "xenstore: rework of transaction handling" moved
-out from copying the entire database everytime a new transaction is
-opened to track the list of nodes changed.
-
-The content of all the nodes accessed during a transaction will be
-temporarily stored in TDB using a different key.
-
-The function create_node() may write/update multiple nodes if the child
-doesn't exist. In case of a failure, the function will revert any
-changes (this include any update to TDB). Unfortunately, the function
-which reverts the changes (i.e. destroy_node()) will not use the correct
-key to delete any update or even request the transaction to fail.
-
-This means that if a client decide to go ahead with committing the
-transaction, orphan nodes will be created because they were not linked
-to an existing node (create_node() will write the nodes backwards).
-
-Once some nodes have been partially updated in a transaction, it is not
-easily possible to undo any changes. So rather than continuing and hit
-weird issue while committing, it is much saner to fail the transaction.
-
-This will have an impact on any client that decides to commit even if it
-can't write a node. Although, it is not clear why a normal client would
-want to do that...
-
-Lastly, update destroy_node() to use the correct key for deleting the
-node. Rather than recreating it (this will allocate memory and
-therefore fail), stash the key in the structure node.
-
-This is XSA-415 / CVE-2022-42310.
-
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-(cherry picked from commit 5d71766bd1a4a3a8b2fe952ca2be80e02fe48f34)
----
- tools/xenstore/xenstored_core.c | 23 +++++++++++++++--------
- tools/xenstore/xenstored_core.h | 2 ++
- tools/xenstore/xenstored_transaction.c | 5 +++++
- tools/xenstore/xenstored_transaction.h | 3 +++
- 4 files changed, 25 insertions(+), 8 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 29947c3020c3..e9c9695fd16e 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -566,15 +566,17 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
- return 0;
- }
-
-+/*
-+ * Write the node. If the node is written, caller can find the key used in
-+ * node->key. This can later be used if the change needs to be reverted.
-+ */
- static int write_node(struct connection *conn, struct node *node,
- bool no_quota_check)
- {
-- TDB_DATA key;
--
-- if (access_node(conn, node, NODE_ACCESS_WRITE, &key))
-+ if (access_node(conn, node, NODE_ACCESS_WRITE, &node->key))
- return errno;
-
-- return write_node_raw(conn, &key, node, no_quota_check);
-+ return write_node_raw(conn, &node->key, node, no_quota_check);
- }
-
- unsigned int perm_for_conn(struct connection *conn,
-@@ -1090,16 +1092,21 @@ nomem:
-
- static int destroy_node(struct connection *conn, struct node *node)
- {
-- TDB_DATA key;
--
- if (streq(node->name, "/"))
- corrupt(NULL, "Destroying root node!");
-
-- set_tdb_key(node->name, &key);
-- tdb_delete(tdb_ctx, key);
-+ tdb_delete(tdb_ctx, node->key);
-
- domain_entry_dec(conn, node);
-
-+ /*
-+ * It is not possible to easily revert the changes in a transaction.
-+ * So if the failure happens in a transaction, mark it as fail to
-+ * prevent any commit.
-+ */
-+ if ( conn->transaction )
-+ fail_transaction(conn->transaction);
-+
- return 0;
- }
-
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 07d861d92499..0004fa848c83 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -155,6 +155,8 @@ struct node_perms {
-
- struct node {
- const char *name;
-+ /* Key used to update TDB */
-+ TDB_DATA key;
-
- /* Parent (optional) */
- struct node *parent;
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index cd07fb0f218b..faf6c930e42a 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -580,6 +580,11 @@ void transaction_entry_dec(struct transaction *trans, unsigned int domid)
- list_add_tail(&d->list, &trans->changed_domains);
- }
-
-+void fail_transaction(struct transaction *trans)
-+{
-+ trans->fail = true;
-+}
-+
- void conn_delete_all_transactions(struct connection *conn)
- {
- struct transaction *trans;
-diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
-index 43a162bea3f3..14062730e3c9 100644
---- a/tools/xenstore/xenstored_transaction.h
-+++ b/tools/xenstore/xenstored_transaction.h
-@@ -46,6 +46,9 @@ int access_node(struct connection *conn, struct node *node,
- int transaction_prepend(struct connection *conn, const char *name,
- TDB_DATA *key);
-
-+/* Mark the transaction as failed. This will prevent it to be committed. */
-+void fail_transaction(struct transaction *trans);
-+
- void conn_delete_all_transactions(struct connection *conn);
- int check_transactions(struct hashtable *hash);
-
---
-2.37.4
-
diff --git a/0046-libacpi-fix-PCI-hotplug-AML.patch b/0046-libacpi-fix-PCI-hotplug-AML.patch
new file mode 100644
index 0000000..b1c79f5
--- /dev/null
+++ b/0046-libacpi-fix-PCI-hotplug-AML.patch
@@ -0,0 +1,57 @@
+From 54102e428ba3f677904278479f8110c8eef6fedc Mon Sep 17 00:00:00 2001
+From: David Woodhouse <dwmw@amazon.co.uk>
+Date: Tue, 21 Mar 2023 13:53:25 +0100
+Subject: [PATCH 46/61] libacpi: fix PCI hotplug AML
+
+The emulated PIIX3 uses a nybble for the status of each PCI function,
+so the status for e.g. slot 0 functions 0 and 1 respectively can be
+read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).
+
+The AML that Xen gives to a guest gets the operand order for the odd-
+numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
+instead.
+
+As far as I can tell, this was the wrong way round in Xen from the
+moment that PCI hotplug was first introduced in commit 83d82e6f35a8:
+
++ ShiftRight (0x4, \_GPE.PH00, Local1)
++ Return (Local1) /* IN status as the _STA */
+
+Or maybe there's bizarre AML operand ordering going on there, like
+Intel's wrong-way-round assembler, and it only broke later when it was
+changed to being generated?
+
+Either way, it's definitely wrong now, and instrumenting a Linux guest
+shows that it correctly sees _STA being 0x00 in function 0 of an empty
+slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
+look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
+an adapter is present in every slot in /sys/bus/pci/slots/*
+
+Quite why Linux wants to look for function 1 being physically present
+when function 0 isn't... I don't want to think about right now.
+
+Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
+Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: b190af7d3e90f58da5f58044b8dea7261b8b483d
+master date: 2023-03-20 17:12:34 +0100
+---
+ tools/libacpi/mk_dsdt.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
+index c5ba4c0b2f..250a50b7eb 100644
+--- a/tools/libacpi/mk_dsdt.c
++++ b/tools/libacpi/mk_dsdt.c
+@@ -431,7 +431,7 @@ int main(int argc, char **argv)
+ stmt("Store", "0x89, \\_GPE.DPT2");
+ }
+ if ( slot & 1 )
+- stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
++ stmt("ShiftRight", "\\_GPE.PH%02X, 0x04, Local1", slot & ~1);
+ else
+ stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
+ stmt("Return", "Local1"); /* IN status as the _STA */
+--
+2.40.0
+
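For reference, the selection the corrected AML performs, rendered in plain C (illustration only; the guest-visible fix is the ShiftRight operand swap above). Each \_GPE.PHxx byte packs two status nybbles, the low one for the even-numbered function and the high one for the odd-numbered one:

    #include <stdint.h>

    /* Plain-C rendering of the corrected _STA selection; names illustrative. */
    static unsigned int hotplug_sta(uint8_t ph, unsigned int fn)
    {
        return (fn & 1) ? (ph >> 4)     /* odd function: high nybble */
                        : (ph & 0x0f);  /* even function: low nybble */
    }
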
diff --git a/0046-tools-xenstore-split-up-send_reply.patch b/0046-tools-xenstore-split-up-send_reply.patch
deleted file mode 100644
index 7af249a..0000000
--- a/0046-tools-xenstore-split-up-send_reply.patch
+++ /dev/null
@@ -1,213 +0,0 @@
-From ce6aea73f6c4c90fab2500933b3a488e2f30334b Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:07 +0200
-Subject: [PATCH 46/87] tools/xenstore: split up send_reply()
-
-Today send_reply() is used for both, normal request replies and watch
-events.
-
-Split it up into send_reply() and send_event(). This will be used to
-add some event specific handling.
-
-add_event() can be merged into send_event(), removing the need for an
-intermediate memory allocation.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 9bfde319dbac2a1321898d2f75a3f075c3eb7b32)
----
- tools/xenstore/xenstored_core.c | 74 +++++++++++++++++++-------------
- tools/xenstore/xenstored_core.h | 1 +
- tools/xenstore/xenstored_watch.c | 39 +++--------------
- 3 files changed, 52 insertions(+), 62 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index e9c9695fd16e..249ad5ec6fb1 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -767,49 +767,32 @@ static void send_error(struct connection *conn, int error)
- void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- const void *data, unsigned int len)
- {
-- struct buffered_data *bdata;
-+ struct buffered_data *bdata = conn->in;
-+
-+ assert(type != XS_WATCH_EVENT);
-
- if ( len > XENSTORE_PAYLOAD_MAX ) {
- send_error(conn, E2BIG);
- return;
- }
-
-- /* Replies reuse the request buffer, events need a new one. */
-- if (type != XS_WATCH_EVENT) {
-- bdata = conn->in;
-- /* Drop asynchronous responses, e.g. errors for watch events. */
-- if (!bdata)
-- return;
-- bdata->inhdr = true;
-- bdata->used = 0;
-- conn->in = NULL;
-- } else {
-- /* Message is a child of the connection for auto-cleanup. */
-- bdata = new_buffer(conn);
-+ if (!bdata)
-+ return;
-+ bdata->inhdr = true;
-+ bdata->used = 0;
-
-- /*
-- * Allocation failure here is unfortunate: we have no way to
-- * tell anybody about it.
-- */
-- if (!bdata)
-- return;
-- }
- if (len <= DEFAULT_BUFFER_SIZE)
- bdata->buffer = bdata->default_buffer;
-- else
-+ else {
- bdata->buffer = talloc_array(bdata, char, len);
-- if (!bdata->buffer) {
-- if (type == XS_WATCH_EVENT) {
-- /* Same as above: no way to tell someone. */
-- talloc_free(bdata);
-+ if (!bdata->buffer) {
-+ send_error(conn, ENOMEM);
- return;
- }
-- /* re-establish request buffer for sending ENOMEM. */
-- conn->in = bdata;
-- send_error(conn, ENOMEM);
-- return;
- }
-
-+ conn->in = NULL;
-+
- /* Update relevant header fields and fill in the message body. */
- bdata->hdr.msg.type = type;
- bdata->hdr.msg.len = len;
-@@ -817,8 +800,39 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
-
- /* Queue for later transmission. */
- list_add_tail(&bdata->list, &conn->out_list);
-+}
-
-- return;
-+/*
-+ * Send a watch event.
-+ * As this is not directly related to the current command, errors can't be
-+ * reported.
-+ */
-+void send_event(struct connection *conn, const char *path, const char *token)
-+{
-+ struct buffered_data *bdata;
-+ unsigned int len;
-+
-+ len = strlen(path) + 1 + strlen(token) + 1;
-+ /* Don't try to send over-long events. */
-+ if (len > XENSTORE_PAYLOAD_MAX)
-+ return;
-+
-+ bdata = new_buffer(conn);
-+ if (!bdata)
-+ return;
-+
-+ bdata->buffer = talloc_array(bdata, char, len);
-+ if (!bdata->buffer) {
-+ talloc_free(bdata);
-+ return;
-+ }
-+ strcpy(bdata->buffer, path);
-+ strcpy(bdata->buffer + strlen(path) + 1, token);
-+ bdata->hdr.msg.type = XS_WATCH_EVENT;
-+ bdata->hdr.msg.len = len;
-+
-+ /* Queue for later transmission. */
-+ list_add_tail(&bdata->list, &conn->out_list);
- }
-
- /* Some routines (write, mkdir, etc) just need a non-error return */
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 0004fa848c83..9af9af4390bd 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -187,6 +187,7 @@ unsigned int get_string(const struct buffered_data *data, unsigned int offset);
-
- void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- const void *data, unsigned int len);
-+void send_event(struct connection *conn, const char *path, const char *token);
-
- /* Some routines (write, mkdir, etc) just need a non-error return */
- void send_ack(struct connection *conn, enum xsd_sockmsg_type type);
-diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
-index aca0a71bada1..99a2c266b28a 100644
---- a/tools/xenstore/xenstored_watch.c
-+++ b/tools/xenstore/xenstored_watch.c
-@@ -85,35 +85,6 @@ static const char *get_watch_path(const struct watch *watch, const char *name)
- return path;
- }
-
--/*
-- * Send a watch event.
-- * Temporary memory allocations are done with ctx.
-- */
--static void add_event(struct connection *conn,
-- const void *ctx,
-- struct watch *watch,
-- const char *name)
--{
-- /* Data to send (node\0token\0). */
-- unsigned int len;
-- char *data;
--
-- name = get_watch_path(watch, name);
--
-- len = strlen(name) + 1 + strlen(watch->token) + 1;
-- /* Don't try to send over-long events. */
-- if (len > XENSTORE_PAYLOAD_MAX)
-- return;
--
-- data = talloc_array(ctx, char, len);
-- if (!data)
-- return;
-- strcpy(data, name);
-- strcpy(data + strlen(name) + 1, watch->token);
-- send_reply(conn, XS_WATCH_EVENT, data, len);
-- talloc_free(data);
--}
--
- /*
- * Check permissions of a specific watch to fire:
- * Either the node itself or its parent have to be readable by the connection
-@@ -190,10 +161,14 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
- list_for_each_entry(watch, &i->watches, list) {
- if (exact) {
- if (streq(name, watch->node))
-- add_event(i, ctx, watch, name);
-+ send_event(i,
-+ get_watch_path(watch, name),
-+ watch->token);
- } else {
- if (is_child(name, watch->node))
-- add_event(i, ctx, watch, name);
-+ send_event(i,
-+ get_watch_path(watch, name),
-+ watch->token);
- }
- }
- }
-@@ -292,7 +267,7 @@ int do_watch(struct connection *conn, struct buffered_data *in)
- send_ack(conn, XS_WATCH);
-
- /* We fire once up front: simplifies clients and restart. */
-- add_event(conn, in, watch, watch->node);
-+ send_event(conn, get_watch_path(watch, watch->node), watch->token);
-
- return 0;
- }
---
-2.37.4
-
diff --git a/0047-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch b/0047-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
new file mode 100644
index 0000000..54940ba
--- /dev/null
+++ b/0047-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
@@ -0,0 +1,42 @@
+From 8e9690a2252eda09537275a951ee0af0b3b330f2 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Fri, 31 Mar 2023 08:36:59 +0200
+Subject: [PATCH 47/61] AMD/IOMMU: without XT, x2APIC needs to be forced into
+ physical mode
+
+An earlier change with the same title (commit 1ba66a870eba) altered only
+the path where x2apic_phys was already set to false (perhaps from the
+command line). The same of course needs applying when the variable
+wasn't modified yet from its initial value.
+
+Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 0d2686f6b66b4b1b3c72c3525083b0ce02830054
+master date: 2023-03-21 09:23:25 +0100
+---
+ xen/arch/x86/genapic/x2apic.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
+index 628b441da5..247364af58 100644
+--- a/xen/arch/x86/genapic/x2apic.c
++++ b/xen/arch/x86/genapic/x2apic.c
+@@ -239,11 +239,11 @@ const struct genapic *__init apic_x2apic_probe(void)
+ if ( x2apic_phys < 0 )
+ {
+ /*
+- * Force physical mode if there's no interrupt remapping support: The
+- * ID in clustered mode requires a 32 bit destination field due to
++ * Force physical mode if there's no (full) interrupt remapping support:
++ * The ID in clustered mode requires a 32 bit destination field due to
+ * the usage of the high 16 bits to hold the cluster ID.
+ */
+- x2apic_phys = !iommu_intremap ||
++ x2apic_phys = iommu_intremap != iommu_intremap_full ||
+ (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
+ }
+ else if ( !x2apic_phys )
+--
+2.40.0
+
diff --git a/0047-tools-xenstore-add-helpers-to-free-struct-buffered_d.patch b/0047-tools-xenstore-add-helpers-to-free-struct-buffered_d.patch
deleted file mode 100644
index 96ba7bd..0000000
--- a/0047-tools-xenstore-add-helpers-to-free-struct-buffered_d.patch
+++ /dev/null
@@ -1,117 +0,0 @@
-From f8af1a27b00e373bfb5f5e61b14c51165a740fa4 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:07 +0200
-Subject: [PATCH 47/87] tools/xenstore: add helpers to free struct
- buffered_data
-
-Add two helpers for freeing struct buffered_data: free_buffered_data()
-for freeing one instance and conn_free_buffered_data() for freeing all
-instances for a connection.
-
-This is avoiding duplicated code and will help later when more actions
-are needed when freeing a struct buffered_data.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit ead062a68a9c201a95488e84750a70a107f7b317)
----
- tools/xenstore/xenstored_core.c | 26 +++++++++++++++++---------
- tools/xenstore/xenstored_core.h | 2 ++
- tools/xenstore/xenstored_domain.c | 7 +------
- 3 files changed, 20 insertions(+), 15 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 249ad5ec6fb1..527a1ebdeded 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -211,6 +211,21 @@ void reopen_log(void)
- }
- }
-
-+static void free_buffered_data(struct buffered_data *out,
-+ struct connection *conn)
-+{
-+ list_del(&out->list);
-+ talloc_free(out);
-+}
-+
-+void conn_free_buffered_data(struct connection *conn)
-+{
-+ struct buffered_data *out;
-+
-+ while ((out = list_top(&conn->out_list, struct buffered_data, list)))
-+ free_buffered_data(out, conn);
-+}
-+
- static bool write_messages(struct connection *conn)
- {
- int ret;
-@@ -254,8 +269,7 @@ static bool write_messages(struct connection *conn)
-
- trace_io(conn, out, 1);
-
-- list_del(&out->list);
-- talloc_free(out);
-+ free_buffered_data(out, conn);
-
- return true;
- }
-@@ -1506,18 +1520,12 @@ static struct {
- */
- void ignore_connection(struct connection *conn)
- {
-- struct buffered_data *out, *tmp;
--
- trace("CONN %p ignored\n", conn);
-
- conn->is_ignored = true;
- conn_delete_all_watches(conn);
- conn_delete_all_transactions(conn);
--
-- list_for_each_entry_safe(out, tmp, &conn->out_list, list) {
-- list_del(&out->list);
-- talloc_free(out);
-- }
-+ conn_free_buffered_data(conn);
-
- talloc_free(conn->in);
- conn->in = NULL;
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 9af9af4390bd..e7ee87825c3b 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -276,6 +276,8 @@ int remember_string(struct hashtable *hash, const char *str);
-
- void set_tdb_key(const char *name, TDB_DATA *key);
-
-+void conn_free_buffered_data(struct connection *conn);
-+
- const char *dump_state_global(FILE *fp);
- const char *dump_state_buffered_data(FILE *fp, const struct connection *c,
- struct xs_state_connection *sc);
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index d03c7d93a9e7..93c4c1edcdd1 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -411,15 +411,10 @@ static struct domain *find_domain_by_domid(unsigned int domid)
- static void domain_conn_reset(struct domain *domain)
- {
- struct connection *conn = domain->conn;
-- struct buffered_data *out;
-
- conn_delete_all_watches(conn);
- conn_delete_all_transactions(conn);
--
-- while ((out = list_top(&conn->out_list, struct buffered_data, list))) {
-- list_del(&out->list);
-- talloc_free(out);
-- }
-+ conn_free_buffered_data(conn);
-
- talloc_free(conn->in);
-
---
-2.37.4
-
diff --git a/0048-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch b/0048-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
new file mode 100644
index 0000000..4c480b0
--- /dev/null
+++ b/0048-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
@@ -0,0 +1,44 @@
+From 07e8f5b3d1300327a9f2e67b03dead0e2138b92f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
+ <marmarek@invisiblethingslab.com>
+Date: Fri, 31 Mar 2023 08:38:07 +0200
+Subject: [PATCH 48/61] VT-d: fix iommu=no-igfx if the IOMMU scope contains
+ fake device(s)
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+If the scope for IGD's IOMMU contains an additional device that doesn't
+actually exist, iommu=no-igfx would not disable that IOMMU. In this
+particular case (Thinkpad x230) it included 00:02.1, but there is no
+such device on this platform. Consider only existing devices for the
+"gfx only" check as well as the establishing of IGD DRHD address
+(underlying is_igd_drhd(), which is used to determine applicability of
+two workarounds).
+
+Fixes: 2d7f191b392e ("VT-d: generalize and correct "iommu=no-igfx" handling")
+Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: 49de6749baa8d0addc3048defd4ef3e85cb135e9
+master date: 2023-03-23 09:16:41 +0100
+---
+ xen/drivers/passthrough/vtd/dmar.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
+index 9ec49936b8..bfec40f47d 100644
+--- a/xen/drivers/passthrough/vtd/dmar.c
++++ b/xen/drivers/passthrough/vtd/dmar.c
+@@ -389,7 +389,7 @@ static int __init acpi_parse_dev_scope(
+ printk(VTDPREFIX " endpoint: %pp\n",
+ &PCI_SBDF(seg, bus, path->dev, path->fn));
+
+- if ( drhd )
++ if ( drhd && pci_device_detect(seg, bus, path->dev, path->fn) )
+ {
+ if ( pci_conf_read8(PCI_SBDF(seg, bus, path->dev, path->fn),
+ PCI_CLASS_DEVICE + 1) != 0x03
+--
+2.40.0
+
diff --git a/0048-tools-xenstore-reduce-number-of-watch-events.patch b/0048-tools-xenstore-reduce-number-of-watch-events.patch
deleted file mode 100644
index 3a080fb..0000000
--- a/0048-tools-xenstore-reduce-number-of-watch-events.patch
+++ /dev/null
@@ -1,201 +0,0 @@
-From e26d6f4d1b389b859fb5a6570421e80e0213f92b Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:07 +0200
-Subject: [PATCH 48/87] tools/xenstore: reduce number of watch events
-
-When removing a watched node outside of a transaction, two watch events
-are being produced instead of just a single one.
-
-When finalizing a transaction watch events can be generated for each
-node which is being modified, even if outside a transaction such
-modifications might not have resulted in a watch event.
-
-This happens e.g.:
-
-- for nodes which are only modified due to added/removed child entries
-- for nodes being removed or created implicitly (e.g. creation of a/b/c
- is implicitly creating a/b, resulting in watch events for a, a/b and
- a/b/c instead of a/b/c only)
-
-Avoid these additional watch events, in order to reduce the needed
-memory inside Xenstore for queueing them.
-
-This is being achieved by adding event flags to struct accessed_node
-specifying whether an event should be triggered, and whether it should
-be an exact match of the modified path. Both flags can be set from
-fire_watches() instead of implying them only.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 3a96013a3e17baa07410b1b9776225d1d9a74297)
----
- tools/xenstore/xenstored_core.c | 19 ++++++------
- tools/xenstore/xenstored_transaction.c | 41 +++++++++++++++++++++-----
- tools/xenstore/xenstored_transaction.h | 3 ++
- tools/xenstore/xenstored_watch.c | 7 +++--
- 4 files changed, 51 insertions(+), 19 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 527a1ebdeded..bf2243873901 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1295,7 +1295,7 @@ static void delete_child(struct connection *conn,
- }
-
- static int delete_node(struct connection *conn, const void *ctx,
-- struct node *parent, struct node *node)
-+ struct node *parent, struct node *node, bool watch_exact)
- {
- char *name;
-
-@@ -1307,7 +1307,7 @@ static int delete_node(struct connection *conn, const void *ctx,
- node->children);
- child = name ? read_node(conn, node, name) : NULL;
- if (child) {
-- if (delete_node(conn, ctx, node, child))
-+ if (delete_node(conn, ctx, node, child, true))
- return errno;
- } else {
- trace("delete_node: Error deleting child '%s/%s'!\n",
-@@ -1319,7 +1319,12 @@ static int delete_node(struct connection *conn, const void *ctx,
- talloc_free(name);
- }
-
-- fire_watches(conn, ctx, node->name, node, true, NULL);
-+ /*
-+ * Fire the watches now, when we can still see the node permissions.
-+ * This fine as we are single threaded and the next possible read will
-+ * be handled only after the node has been really removed.
-+ */
-+ fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
- delete_node_single(conn, node);
- delete_child(conn, parent, basename(node->name));
- talloc_free(node);
-@@ -1345,13 +1350,7 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
- return (errno == ENOMEM) ? ENOMEM : EINVAL;
- node->parent = parent;
-
-- /*
-- * Fire the watches now, when we can still see the node permissions.
-- * This fine as we are single threaded and the next possible read will
-- * be handled only after the node has been really removed.
-- */
-- fire_watches(conn, ctx, name, node, false, NULL);
-- return delete_node(conn, ctx, parent, node);
-+ return delete_node(conn, ctx, parent, node, false);
- }
-
-
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index faf6c930e42a..54432907fc76 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -130,6 +130,10 @@ struct accessed_node
-
- /* Transaction node in data base? */
- bool ta_node;
-+
-+ /* Watch event flags. */
-+ bool fire_watch;
-+ bool watch_exact;
- };
-
- struct changed_domain
-@@ -323,6 +327,29 @@ err:
- return ret;
- }
-
-+/*
-+ * A watch event should be fired for a node modified inside a transaction.
-+ * Set the corresponding information. A non-exact event is replacing an exact
-+ * one, but not the other way round.
-+ */
-+void queue_watches(struct connection *conn, const char *name, bool watch_exact)
-+{
-+ struct accessed_node *i;
-+
-+ i = find_accessed_node(conn->transaction, name);
-+ if (!i) {
-+ conn->transaction->fail = true;
-+ return;
-+ }
-+
-+ if (!i->fire_watch) {
-+ i->fire_watch = true;
-+ i->watch_exact = watch_exact;
-+ } else if (!watch_exact) {
-+ i->watch_exact = false;
-+ }
-+}
-+
- /*
- * Finalize transaction:
- * Walk through accessed nodes and check generation against global data.
-@@ -377,15 +404,15 @@ static int finalize_transaction(struct connection *conn,
- ret = tdb_store(tdb_ctx, key, data,
- TDB_REPLACE);
- talloc_free(data.dptr);
-- if (ret)
-- goto err;
-- fire_watches(conn, trans, i->node, NULL, false,
-- i->perms.p ? &i->perms : NULL);
- } else {
-- fire_watches(conn, trans, i->node, NULL, false,
-+ ret = tdb_delete(tdb_ctx, key);
-+ }
-+ if (ret)
-+ goto err;
-+ if (i->fire_watch) {
-+ fire_watches(conn, trans, i->node, NULL,
-+ i->watch_exact,
- i->perms.p ? &i->perms : NULL);
-- if (tdb_delete(tdb_ctx, key))
-- goto err;
- }
- }
-
-diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
-index 14062730e3c9..0093cac807e3 100644
---- a/tools/xenstore/xenstored_transaction.h
-+++ b/tools/xenstore/xenstored_transaction.h
-@@ -42,6 +42,9 @@ void transaction_entry_dec(struct transaction *trans, unsigned int domid);
- int access_node(struct connection *conn, struct node *node,
- enum node_access_type type, TDB_DATA *key);
-
-+/* Queue watches for a modified node. */
-+void queue_watches(struct connection *conn, const char *name, bool watch_exact);
-+
- /* Prepend the transaction to name if appropriate. */
- int transaction_prepend(struct connection *conn, const char *name,
- TDB_DATA *key);
-diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
-index 99a2c266b28a..205d9d8ea116 100644
---- a/tools/xenstore/xenstored_watch.c
-+++ b/tools/xenstore/xenstored_watch.c
-@@ -29,6 +29,7 @@
- #include "xenstore_lib.h"
- #include "utils.h"
- #include "xenstored_domain.h"
-+#include "xenstored_transaction.h"
-
- extern int quota_nb_watch_per_domain;
-
-@@ -143,9 +144,11 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
- struct connection *i;
- struct watch *watch;
-
-- /* During transactions, don't fire watches. */
-- if (conn && conn->transaction)
-+ /* During transactions, don't fire watches, but queue them. */
-+ if (conn && conn->transaction) {
-+ queue_watches(conn, name, exact);
- return;
-+ }
-
- /* Create an event for each watch. */
- list_for_each_entry(i, &connections, list) {
---
-2.37.4
-
diff --git a/0049-tools-xenstore-let-unread-watch-events-time-out.patch b/0049-tools-xenstore-let-unread-watch-events-time-out.patch
deleted file mode 100644
index dab0861..0000000
--- a/0049-tools-xenstore-let-unread-watch-events-time-out.patch
+++ /dev/null
@@ -1,309 +0,0 @@
-From d08cdf0b19daf948a6b9754e90de9bc304bcd262 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:07 +0200
-Subject: [PATCH 49/87] tools/xenstore: let unread watch events time out
-
-A future modification will limit the number of outstanding requests
-for a domain, where "outstanding" means that the response of the
-request or any resulting watch event hasn't been consumed yet.
-
-In order to avoid a malicious guest being capable to block other guests
-by not reading watch events, add a timeout for watch events. In case a
-watch event hasn't been consumed after this timeout, it is being
-deleted. Set the default timeout to 20 seconds (a random value being
-not too high).
-
-In order to support to specify other timeout values in future, use a
-generic command line option for that purpose:
-
---timeout|-w watch-event=<seconds>
-
-This is part of XSA-326 / CVE-2022-42311.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 5285dcb1a5c01695c11e6397c95d906b5e765c98)
----
- tools/xenstore/xenstored_core.c | 133 +++++++++++++++++++++++++++++++-
- tools/xenstore/xenstored_core.h | 6 ++
- 2 files changed, 138 insertions(+), 1 deletion(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index bf2243873901..45244c021cd3 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -108,6 +108,8 @@ int quota_max_transaction = 10;
- int quota_nb_perms_per_node = 5;
- int quota_max_path_len = XENSTORE_REL_PATH_MAX;
-
-+unsigned int timeout_watch_event_msec = 20000;
-+
- void trace(const char *fmt, ...)
- {
- va_list arglist;
-@@ -211,19 +213,92 @@ void reopen_log(void)
- }
- }
-
-+static uint64_t get_now_msec(void)
-+{
-+ struct timespec now_ts;
-+
-+ if (clock_gettime(CLOCK_MONOTONIC, &now_ts))
-+ barf_perror("Could not find time (clock_gettime failed)");
-+
-+ return now_ts.tv_sec * 1000 + now_ts.tv_nsec / 1000000;
-+}
-+
- static void free_buffered_data(struct buffered_data *out,
- struct connection *conn)
- {
-+ struct buffered_data *req;
-+
- list_del(&out->list);
-+
-+ /*
-+ * Update conn->timeout_msec with the next found timeout value in the
-+ * queued pending requests.
-+ */
-+ if (out->timeout_msec) {
-+ conn->timeout_msec = 0;
-+ list_for_each_entry(req, &conn->out_list, list) {
-+ if (req->timeout_msec) {
-+ conn->timeout_msec = req->timeout_msec;
-+ break;
-+ }
-+ }
-+ }
-+
- talloc_free(out);
- }
-
-+static void check_event_timeout(struct connection *conn, uint64_t msecs,
-+ int *ptimeout)
-+{
-+ uint64_t delta;
-+ struct buffered_data *out, *tmp;
-+
-+ if (!conn->timeout_msec)
-+ return;
-+
-+ delta = conn->timeout_msec - msecs;
-+ if (conn->timeout_msec <= msecs) {
-+ delta = 0;
-+ list_for_each_entry_safe(out, tmp, &conn->out_list, list) {
-+ /*
-+ * Only look at buffers with timeout and no data
-+ * already written to the ring.
-+ */
-+ if (out->timeout_msec && out->inhdr && !out->used) {
-+ if (out->timeout_msec > msecs) {
-+ conn->timeout_msec = out->timeout_msec;
-+ delta = conn->timeout_msec - msecs;
-+ break;
-+ }
-+
-+ /*
-+ * Free out without updating conn->timeout_msec,
-+ * as the update is done in this loop already.
-+ */
-+ out->timeout_msec = 0;
-+ trace("watch event path %s for domain %u timed out\n",
-+ out->buffer, conn->id);
-+ free_buffered_data(out, conn);
-+ }
-+ }
-+ if (!delta) {
-+ conn->timeout_msec = 0;
-+ return;
-+ }
-+ }
-+
-+ if (*ptimeout == -1 || *ptimeout > delta)
-+ *ptimeout = delta;
-+}
-+
- void conn_free_buffered_data(struct connection *conn)
- {
- struct buffered_data *out;
-
- while ((out = list_top(&conn->out_list, struct buffered_data, list)))
- free_buffered_data(out, conn);
-+
-+ conn->timeout_msec = 0;
- }
-
- static bool write_messages(struct connection *conn)
-@@ -411,6 +486,7 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
- {
- struct connection *conn;
- struct wrl_timestampt now;
-+ uint64_t msecs;
-
- if (fds)
- memset(fds, 0, sizeof(struct pollfd) * current_array_size);
-@@ -431,10 +507,12 @@ static void initialize_fds(int *p_sock_pollfd_idx, int *ptimeout)
-
- wrl_gettime_now(&now);
- wrl_log_periodic(now);
-+ msecs = get_now_msec();
-
- list_for_each_entry(conn, &connections, list) {
- if (conn->domain) {
- wrl_check_timeout(conn->domain, now, ptimeout);
-+ check_event_timeout(conn, msecs, ptimeout);
- if (conn_can_read(conn) ||
- (conn_can_write(conn) &&
- !list_empty(&conn->out_list)))
-@@ -794,6 +872,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- return;
- bdata->inhdr = true;
- bdata->used = 0;
-+ bdata->timeout_msec = 0;
-
- if (len <= DEFAULT_BUFFER_SIZE)
- bdata->buffer = bdata->default_buffer;
-@@ -845,6 +924,12 @@ void send_event(struct connection *conn, const char *path, const char *token)
- bdata->hdr.msg.type = XS_WATCH_EVENT;
- bdata->hdr.msg.len = len;
-
-+ if (timeout_watch_event_msec && domain_is_unprivileged(conn)) {
-+ bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
-+ if (!conn->timeout_msec)
-+ conn->timeout_msec = bdata->timeout_msec;
-+ }
-+
- /* Queue for later transmission. */
- list_add_tail(&bdata->list, &conn->out_list);
- }
-@@ -2201,6 +2286,9 @@ static void usage(void)
- " -t, --transaction <nb> limit the number of transaction allowed per domain,\n"
- " -A, --perm-nb <nb> limit the number of permissions per node,\n"
- " -M, --path-max <chars> limit the allowed Xenstore node path length,\n"
-+" -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
-+" allowed timeout candidates are:\n"
-+" watch-event: time a watch-event is kept pending\n"
- " -R, --no-recovery to request that no recovery should be attempted when\n"
- " the store is corrupted (debug only),\n"
- " -I, --internal-db store database in memory, not on disk\n"
-@@ -2223,6 +2311,7 @@ static struct option options[] = {
- { "transaction", 1, NULL, 't' },
- { "perm-nb", 1, NULL, 'A' },
- { "path-max", 1, NULL, 'M' },
-+ { "timeout", 1, NULL, 'w' },
- { "no-recovery", 0, NULL, 'R' },
- { "internal-db", 0, NULL, 'I' },
- { "verbose", 0, NULL, 'V' },
-@@ -2236,6 +2325,39 @@ int dom0_domid = 0;
- int dom0_event = 0;
- int priv_domid = 0;
-
-+static int get_optval_int(const char *arg)
-+{
-+ char *end;
-+ long val;
-+
-+ val = strtol(arg, &end, 10);
-+ if (!*arg || *end || val < 0 || val > INT_MAX)
-+ barf("invalid parameter value \"%s\"\n", arg);
-+
-+ return val;
-+}
-+
-+static bool what_matches(const char *arg, const char *what)
-+{
-+ unsigned int what_len = strlen(what);
-+
-+ return !strncmp(arg, what, what_len) && arg[what_len] == '=';
-+}
-+
-+static void set_timeout(const char *arg)
-+{
-+ const char *eq = strchr(arg, '=');
-+ int val;
-+
-+ if (!eq)
-+ barf("quotas must be specified via <what>=<seconds>\n");
-+ val = get_optval_int(eq + 1);
-+ if (what_matches(arg, "watch-event"))
-+ timeout_watch_event_msec = val * 1000;
-+ else
-+ barf("unknown timeout \"%s\"\n", arg);
-+}
-+
- int main(int argc, char *argv[])
- {
- int opt;
-@@ -2250,7 +2372,7 @@ int main(int argc, char *argv[])
- orig_argc = argc;
- orig_argv = argv;
-
-- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:T:RVW:U", options,
-+ while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:T:RVW:w:U", options,
- NULL)) != -1) {
- switch (opt) {
- case 'D':
-@@ -2300,6 +2422,9 @@ int main(int argc, char *argv[])
- quota_max_path_len = min(XENSTORE_REL_PATH_MAX,
- quota_max_path_len);
- break;
-+ case 'w':
-+ set_timeout(optarg);
-+ break;
- case 'e':
- dom0_event = strtol(optarg, NULL, 10);
- break;
-@@ -2741,6 +2866,12 @@ static void add_buffered_data(struct buffered_data *bdata,
- barf("error restoring buffered data");
-
- memcpy(bdata->buffer, data, len);
-+ if (bdata->hdr.msg.type == XS_WATCH_EVENT && timeout_watch_event_msec &&
-+ domain_is_unprivileged(conn)) {
-+ bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
-+ if (!conn->timeout_msec)
-+ conn->timeout_msec = bdata->timeout_msec;
-+ }
-
- /* Queue for later transmission. */
- list_add_tail(&bdata->list, &conn->out_list);
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index e7ee87825c3b..8a81fc693f01 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -27,6 +27,7 @@
- #include <fcntl.h>
- #include <stdbool.h>
- #include <stdint.h>
-+#include <time.h>
- #include <errno.h>
-
- #include "xenstore_lib.h"
-@@ -67,6 +68,8 @@ struct buffered_data
- char raw[sizeof(struct xsd_sockmsg)];
- } hdr;
-
-+ uint64_t timeout_msec;
-+
- /* The actual data. */
- char *buffer;
- char default_buffer[DEFAULT_BUFFER_SIZE];
-@@ -118,6 +121,7 @@ struct connection
-
- /* Buffered output data */
- struct list_head out_list;
-+ uint64_t timeout_msec;
-
- /* Transaction context for current request (NULL if none). */
- struct transaction *transaction;
-@@ -244,6 +248,8 @@ extern int dom0_event;
- extern int priv_domid;
- extern int quota_nb_entry_per_domain;
-
-+extern unsigned int timeout_watch_event_msec;
-+
- /* Map the kernel's xenstore page. */
- void *xenbus_map(void);
- void unmap_xenbus(void *interface);
---
-2.37.4
-
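The dropped patch above keys its expiry logic on a millisecond monotonic timestamp plus a per-event deadline. A minimal standalone sketch of that deadline arithmetic follows; it is an illustration only, not xenstored code, and the helper name now_msec() is invented here:

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    /* Millisecond monotonic clock, mirroring the removed get_now_msec(). */
    static uint64_t now_msec(void)
    {
        struct timespec ts;

        if (clock_gettime(CLOCK_MONOTONIC, &ts))
            return 0;
        return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    int main(void)
    {
        uint64_t timeout_msec = 20000;                /* the 20s default */
        uint64_t deadline = now_msec() + timeout_msec;

        /* An unread watch event whose deadline has passed would be dropped. */
        printf("expired: %d\n", now_msec() >= deadline);
        return 0;
    }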
diff --git a/0049-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch b/0049-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
new file mode 100644
index 0000000..0abf7e9
--- /dev/null
+++ b/0049-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
@@ -0,0 +1,47 @@
+From cab866ee62d860e9ff4abe701163972d4e9f896d Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Fri, 31 Mar 2023 08:38:42 +0200
+Subject: [PATCH 49/61] x86/shadow: fix and improve
+ sh_page_has_multiple_shadows()
+
+While no caller currently invokes the function without first making sure
+there is at least one shadow [1], we'd better eliminate UB here:
+find_first_set_bit() requires input to be non-zero to return a well-
+defined result.
+
+Further, using find_first_set_bit() isn't very efficient in the first
+place for the intended purpose.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+
+[1] The function has exactly two uses, and both are from OOS code, which
+ is HVM-only. For HVM (but not for PV) sh_mfn_is_a_page_table(),
+ guarding the call to sh_unsync(), guarantees at least one shadow.
+ Hence even if sh_page_has_multiple_shadows() returned a bogus value
+ when invoked for a PV domain, the subsequent is_hvm_vcpu() and
+ oos_active checks (the former being redundant with the latter) will
+ compensate. (Arguably that oos_active check should come first, for
+ both clarity and efficiency reasons.)
+master commit: 2896224a4e294652c33f487b603d20bd30955f21
+master date: 2023-03-24 11:07:08 +0100
+---
+ xen/arch/x86/mm/shadow/private.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
+index 738214f75e..762214f73c 100644
+--- a/xen/arch/x86/mm/shadow/private.h
++++ b/xen/arch/x86/mm/shadow/private.h
+@@ -324,7 +324,7 @@ static inline int sh_page_has_multiple_shadows(struct page_info *pg)
+ return 0;
+ shadows = pg->shadow_flags & SHF_page_type_mask;
+ /* More than one type bit set in shadow-flags? */
+- return ( (shadows & ~(1UL << find_first_set_bit(shadows))) != 0 );
++ return shadows && (shadows & (shadows - 1));
+ }
+
+ #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
+--
+2.40.0
+
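The one-line change above swaps a find_first_set_bit()-based test for the classic "more than one bit set" idiom. A minimal standalone illustration of that idiom (not Xen code; the helper name is invented for this sketch):

    #include <assert.h>
    #include <stdint.h>

    /*
     * x & (x - 1) clears the lowest set bit, so the result is non-zero
     * exactly when two or more bits are set; the leading "x &&" keeps
     * x == 0 well defined instead of relying on find_first_set_bit(0).
     */
    static int multiple_bits_set(uint32_t x)
    {
        return x && (x & (x - 1));
    }

    int main(void)
    {
        assert(!multiple_bits_set(0x00));   /* no shadows */
        assert(!multiple_bits_set(0x08));   /* exactly one type bit */
        assert( multiple_bits_set(0x0a));   /* two type bits */
        return 0;
    }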
diff --git a/0050-tools-xenstore-limit-outstanding-requests.patch b/0050-tools-xenstore-limit-outstanding-requests.patch
deleted file mode 100644
index bb10180..0000000
--- a/0050-tools-xenstore-limit-outstanding-requests.patch
+++ /dev/null
@@ -1,453 +0,0 @@
-From 49344fb86ff040bae1107e236592c2d4dc4607f3 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:08 +0200
-Subject: [PATCH 50/87] tools/xenstore: limit outstanding requests
-
-Add another quota for limiting the number of outstanding requests of a
-guest. As the way to specify quotas on the command line is becoming
-rather nasty, switch to a new scheme using [--quota|-Q] <what>=<val>
-allowing to add more quotas in future easily.
-
-Set the default value to 20 (basically a random value not seeming to
-be too high or too low).
-
-A request is said to be outstanding if any message generated by this
-request (the direct response plus potential watch events) is not yet
-completely stored into a ring buffer. The initial watch event sent as
-a result of registering a watch is an exception.
-
-Note that across a live update the relation to buffered watch events
-for other domains is lost.
-
-Use talloc_zero() for allocating the domain structure in order to have
-all per-domain quota zeroed initially.
-
-This is part of XSA-326 / CVE-2022-42312.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 36de433a273f55d614c83b89c9a8972287a1e475)
----
- tools/xenstore/xenstored_core.c | 88 +++++++++++++++++++++++++++++--
- tools/xenstore/xenstored_core.h | 20 ++++++-
- tools/xenstore/xenstored_domain.c | 38 ++++++++++---
- tools/xenstore/xenstored_domain.h | 3 ++
- tools/xenstore/xenstored_watch.c | 15 ++++--
- 5 files changed, 150 insertions(+), 14 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 45244c021cd3..488d540f3a32 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -107,6 +107,7 @@ int quota_max_entry_size = 2048; /* 2K */
- int quota_max_transaction = 10;
- int quota_nb_perms_per_node = 5;
- int quota_max_path_len = XENSTORE_REL_PATH_MAX;
-+int quota_req_outstanding = 20;
-
- unsigned int timeout_watch_event_msec = 20000;
-
-@@ -223,12 +224,24 @@ static uint64_t get_now_msec(void)
- return now_ts.tv_sec * 1000 + now_ts.tv_nsec / 1000000;
- }
-
-+/*
-+ * Remove a struct buffered_data from the list of outgoing data.
-+ * A struct buffered_data related to a request having caused watch events to be
-+ * sent is kept until all those events have been written out.
-+ * Each watch event is referencing the related request via pend.req, while the
-+ * number of watch events caused by a request is kept in pend.ref.event_cnt
-+ * (those two cases are mutually exclusive, so the two fields can share memory
-+ * via a union).
-+ * The struct buffered_data is freed only if no related watch event is
-+ * referencing it. The related return data can be freed right away.
-+ */
- static void free_buffered_data(struct buffered_data *out,
- struct connection *conn)
- {
- struct buffered_data *req;
-
- list_del(&out->list);
-+ out->on_out_list = false;
-
- /*
- * Update conn->timeout_msec with the next found timeout value in the
-@@ -244,6 +257,30 @@ static void free_buffered_data(struct buffered_data *out,
- }
- }
-
-+ if (out->hdr.msg.type == XS_WATCH_EVENT) {
-+ req = out->pend.req;
-+ if (req) {
-+ req->pend.ref.event_cnt--;
-+ if (!req->pend.ref.event_cnt && !req->on_out_list) {
-+ if (req->on_ref_list) {
-+ domain_outstanding_domid_dec(
-+ req->pend.ref.domid);
-+ list_del(&req->list);
-+ }
-+ talloc_free(req);
-+ }
-+ }
-+ } else if (out->pend.ref.event_cnt) {
-+ /* Hang out off from conn. */
-+ talloc_steal(NULL, out);
-+ if (out->buffer != out->default_buffer)
-+ talloc_free(out->buffer);
-+ list_add(&out->list, &conn->ref_list);
-+ out->on_ref_list = true;
-+ return;
-+ } else
-+ domain_outstanding_dec(conn);
-+
- talloc_free(out);
- }
-
-@@ -405,6 +442,7 @@ int delay_request(struct connection *conn, struct buffered_data *in,
- static int destroy_conn(void *_conn)
- {
- struct connection *conn = _conn;
-+ struct buffered_data *req;
-
- /* Flush outgoing if possible, but don't block. */
- if (!conn->domain) {
-@@ -418,6 +456,11 @@ static int destroy_conn(void *_conn)
- break;
- close(conn->fd);
- }
-+
-+ conn_free_buffered_data(conn);
-+ list_for_each_entry(req, &conn->ref_list, list)
-+ req->on_ref_list = false;
-+
- if (conn->target)
- talloc_unlink(conn, conn->target);
- list_del(&conn->list);
-@@ -893,6 +936,8 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
-
- /* Queue for later transmission. */
- list_add_tail(&bdata->list, &conn->out_list);
-+ bdata->on_out_list = true;
-+ domain_outstanding_inc(conn);
- }
-
- /*
-@@ -900,7 +945,8 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- * As this is not directly related to the current command, errors can't be
- * reported.
- */
--void send_event(struct connection *conn, const char *path, const char *token)
-+void send_event(struct buffered_data *req, struct connection *conn,
-+ const char *path, const char *token)
- {
- struct buffered_data *bdata;
- unsigned int len;
-@@ -930,8 +976,13 @@ void send_event(struct connection *conn, const char *path, const char *token)
- conn->timeout_msec = bdata->timeout_msec;
- }
-
-+ bdata->pend.req = req;
-+ if (req)
-+ req->pend.ref.event_cnt++;
-+
- /* Queue for later transmission. */
- list_add_tail(&bdata->list, &conn->out_list);
-+ bdata->on_out_list = true;
- }
-
- /* Some routines (write, mkdir, etc) just need a non-error return */
-@@ -1740,6 +1791,7 @@ static void handle_input(struct connection *conn)
- return;
- }
- in = conn->in;
-+ in->pend.ref.domid = conn->id;
-
- /* Not finished header yet? */
- if (in->inhdr) {
-@@ -1808,6 +1860,7 @@ struct connection *new_connection(const struct interface_funcs *funcs)
- new->is_stalled = false;
- new->transaction_started = 0;
- INIT_LIST_HEAD(&new->out_list);
-+ INIT_LIST_HEAD(&new->ref_list);
- INIT_LIST_HEAD(&new->watches);
- INIT_LIST_HEAD(&new->transaction_list);
- INIT_LIST_HEAD(&new->delayed);
-@@ -2286,6 +2339,9 @@ static void usage(void)
- " -t, --transaction <nb> limit the number of transaction allowed per domain,\n"
- " -A, --perm-nb <nb> limit the number of permissions per node,\n"
- " -M, --path-max <chars> limit the allowed Xenstore node path length,\n"
-+" -Q, --quota <what>=<nb> set the quota <what> to the value <nb>, allowed\n"
-+" quotas are:\n"
-+" outstanding: number of outstanding requests\n"
- " -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
- " allowed timeout candidates are:\n"
- " watch-event: time a watch-event is kept pending\n"
-@@ -2311,6 +2367,7 @@ static struct option options[] = {
- { "transaction", 1, NULL, 't' },
- { "perm-nb", 1, NULL, 'A' },
- { "path-max", 1, NULL, 'M' },
-+ { "quota", 1, NULL, 'Q' },
- { "timeout", 1, NULL, 'w' },
- { "no-recovery", 0, NULL, 'R' },
- { "internal-db", 0, NULL, 'I' },
-@@ -2358,6 +2415,20 @@ static void set_timeout(const char *arg)
- barf("unknown timeout \"%s\"\n", arg);
- }
-
-+static void set_quota(const char *arg)
-+{
-+ const char *eq = strchr(arg, '=');
-+ int val;
-+
-+ if (!eq)
-+ barf("quotas must be specified via <what>=<nb>\n");
-+ val = get_optval_int(eq + 1);
-+ if (what_matches(arg, "outstanding"))
-+ quota_req_outstanding = val;
-+ else
-+ barf("unknown quota \"%s\"\n", arg);
-+}
-+
- int main(int argc, char *argv[])
- {
- int opt;
-@@ -2372,8 +2443,8 @@ int main(int argc, char *argv[])
- orig_argc = argc;
- orig_argv = argv;
-
-- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:T:RVW:w:U", options,
-- NULL)) != -1) {
-+ while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:T:RVW:w:U",
-+ options, NULL)) != -1) {
- switch (opt) {
- case 'D':
- no_domain_init = true;
-@@ -2422,6 +2493,9 @@ int main(int argc, char *argv[])
- quota_max_path_len = min(XENSTORE_REL_PATH_MAX,
- quota_max_path_len);
- break;
-+ case 'Q':
-+ set_quota(optarg);
-+ break;
- case 'w':
- set_timeout(optarg);
- break;
-@@ -2875,6 +2949,14 @@ static void add_buffered_data(struct buffered_data *bdata,
-
- /* Queue for later transmission. */
- list_add_tail(&bdata->list, &conn->out_list);
-+ bdata->on_out_list = true;
-+ /*
-+ * Watch events are never "outstanding", but the request causing them
-+ * are instead kept "outstanding" until all watch events caused by that
-+ * request have been delivered.
-+ */
-+ if (bdata->hdr.msg.type != XS_WATCH_EVENT)
-+ domain_outstanding_inc(conn);
- }
-
- void read_state_buffered_data(const void *ctx, struct connection *conn,
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 8a81fc693f01..db09f463a657 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -56,6 +56,8 @@ struct xs_state_connection;
- struct buffered_data
- {
- struct list_head list;
-+ bool on_out_list;
-+ bool on_ref_list;
-
- /* Are we still doing the header? */
- bool inhdr;
-@@ -63,6 +65,17 @@ struct buffered_data
- /* How far are we? */
- unsigned int used;
-
-+ /* Outstanding request accounting. */
-+ union {
-+ /* ref is being used for requests. */
-+ struct {
-+ unsigned int event_cnt; /* # of outstanding events. */
-+ unsigned int domid; /* domid of request. */
-+ } ref;
-+ /* req is being used for watch events. */
-+ struct buffered_data *req; /* request causing event. */
-+ } pend;
-+
- union {
- struct xsd_sockmsg msg;
- char raw[sizeof(struct xsd_sockmsg)];
-@@ -123,6 +136,9 @@ struct connection
- struct list_head out_list;
- uint64_t timeout_msec;
-
-+ /* Referenced requests no longer pending. */
-+ struct list_head ref_list;
-+
- /* Transaction context for current request (NULL if none). */
- struct transaction *transaction;
-
-@@ -191,7 +207,8 @@ unsigned int get_string(const struct buffered_data *data, unsigned int offset);
-
- void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- const void *data, unsigned int len);
--void send_event(struct connection *conn, const char *path, const char *token);
-+void send_event(struct buffered_data *req, struct connection *conn,
-+ const char *path, const char *token);
-
- /* Some routines (write, mkdir, etc) just need a non-error return */
- void send_ack(struct connection *conn, enum xsd_sockmsg_type type);
-@@ -247,6 +264,7 @@ extern int dom0_domid;
- extern int dom0_event;
- extern int priv_domid;
- extern int quota_nb_entry_per_domain;
-+extern int quota_req_outstanding;
-
- extern unsigned int timeout_watch_event_msec;
-
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index 93c4c1edcdd1..850085a92c76 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -78,6 +78,9 @@ struct domain
- /* number of watch for this domain */
- int nbwatch;
-
-+ /* Number of outstanding requests. */
-+ int nboutstanding;
-+
- /* write rate limit */
- wrl_creditt wrl_credit; /* [ -wrl_config_writecost, +_dburst ] */
- struct wrl_timestampt wrl_timestamp;
-@@ -183,8 +186,12 @@ static bool domain_can_read(struct connection *conn)
- {
- struct xenstore_domain_interface *intf = conn->domain->interface;
-
-- if (domain_is_unprivileged(conn) && conn->domain->wrl_credit < 0)
-- return false;
-+ if (domain_is_unprivileged(conn)) {
-+ if (conn->domain->wrl_credit < 0)
-+ return false;
-+ if (conn->domain->nboutstanding >= quota_req_outstanding)
-+ return false;
-+ }
-
- return (intf->req_cons != intf->req_prod);
- }
-@@ -331,7 +338,7 @@ static struct domain *alloc_domain(const void *context, unsigned int domid)
- {
- struct domain *domain;
-
-- domain = talloc(context, struct domain);
-+ domain = talloc_zero(context, struct domain);
- if (!domain) {
- errno = ENOMEM;
- return NULL;
-@@ -392,9 +399,6 @@ static int new_domain(struct domain *domain, int port, bool restore)
- domain->conn->domain = domain;
- domain->conn->id = domain->domid;
-
-- domain->nbentry = 0;
-- domain->nbwatch = 0;
--
- return 0;
- }
-
-@@ -938,6 +942,28 @@ int domain_watch(struct connection *conn)
- : 0;
- }
-
-+void domain_outstanding_inc(struct connection *conn)
-+{
-+ if (!conn || !conn->domain)
-+ return;
-+ conn->domain->nboutstanding++;
-+}
-+
-+void domain_outstanding_dec(struct connection *conn)
-+{
-+ if (!conn || !conn->domain)
-+ return;
-+ conn->domain->nboutstanding--;
-+}
-+
-+void domain_outstanding_domid_dec(unsigned int domid)
-+{
-+ struct domain *d = find_domain_by_domid(domid);
-+
-+ if (d)
-+ d->nboutstanding--;
-+}
-+
- static wrl_creditt wrl_config_writecost = WRL_FACTOR;
- static wrl_creditt wrl_config_rate = WRL_RATE * WRL_FACTOR;
- static wrl_creditt wrl_config_dburst = WRL_DBURST * WRL_FACTOR;
-diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
-index 1e929b8f8c6f..4f51b005291a 100644
---- a/tools/xenstore/xenstored_domain.h
-+++ b/tools/xenstore/xenstored_domain.h
-@@ -64,6 +64,9 @@ int domain_entry(struct connection *conn);
- void domain_watch_inc(struct connection *conn);
- void domain_watch_dec(struct connection *conn);
- int domain_watch(struct connection *conn);
-+void domain_outstanding_inc(struct connection *conn);
-+void domain_outstanding_dec(struct connection *conn);
-+void domain_outstanding_domid_dec(unsigned int domid);
-
- /* Special node permission handling. */
- int set_perms_special(struct connection *conn, const char *name,
-diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
-index 205d9d8ea116..0755ffa375ba 100644
---- a/tools/xenstore/xenstored_watch.c
-+++ b/tools/xenstore/xenstored_watch.c
-@@ -142,6 +142,7 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
- struct node *node, bool exact, struct node_perms *perms)
- {
- struct connection *i;
-+ struct buffered_data *req;
- struct watch *watch;
-
- /* During transactions, don't fire watches, but queue them. */
-@@ -150,6 +151,8 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
- return;
- }
-
-+ req = domain_is_unprivileged(conn) ? conn->in : NULL;
-+
- /* Create an event for each watch. */
- list_for_each_entry(i, &connections, list) {
- /* introduce/release domain watches */
-@@ -164,12 +167,12 @@ void fire_watches(struct connection *conn, const void *ctx, const char *name,
- list_for_each_entry(watch, &i->watches, list) {
- if (exact) {
- if (streq(name, watch->node))
-- send_event(i,
-+ send_event(req, i,
- get_watch_path(watch, name),
- watch->token);
- } else {
- if (is_child(name, watch->node))
-- send_event(i,
-+ send_event(req, i,
- get_watch_path(watch, name),
- watch->token);
- }
-@@ -269,8 +272,12 @@ int do_watch(struct connection *conn, struct buffered_data *in)
- trace_create(watch, "watch");
- send_ack(conn, XS_WATCH);
-
-- /* We fire once up front: simplifies clients and restart. */
-- send_event(conn, get_watch_path(watch, watch->node), watch->token);
-+ /*
-+ * We fire once up front: simplifies clients and restart.
-+ * This event will not be linked to the XS_WATCH request.
-+ */
-+ send_event(NULL, conn, get_watch_path(watch, watch->node),
-+ watch->token);
-
- return 0;
- }
---
-2.37.4
-
diff --git a/0050-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch b/0050-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
new file mode 100644
index 0000000..14a8e14
--- /dev/null
+++ b/0050-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
@@ -0,0 +1,101 @@
+From 90320fd05991d7817cea85e1d45674b757abf03c Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 31 Mar 2023 08:39:32 +0200
+Subject: [PATCH 50/61] x86/nospec: Fix evaluate_nospec() code generation under
+ Clang
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+It turns out that evaluate_nospec() code generation is not safe under Clang.
+Given:
+
+ void eval_nospec_test(int x)
+ {
+ if ( evaluate_nospec(x) )
+ asm volatile ("nop #true" ::: "memory");
+ else
+ asm volatile ("nop #false" ::: "memory");
+ }
+
+Clang emits:
+
+ <eval_nospec_test>:
+ 0f ae e8 lfence
+ 85 ff test %edi,%edi
+ 74 02 je <eval_nospec_test+0x9>
+ 90 nop
+ c3 ret
+ 90 nop
+ c3 ret
+
+which is not safe because the lfence has been hoisted above the conditional
+jump. Clang concludes that both barrier_nospec_true()'s have identical side
+effects and can safely be merged.
+
+Clang can be persuaded that the side effects are different if there are
+different comments in the asm blocks. This is fragile, but no more fragile
+that other aspects of this construct.
+
+Introduce barrier_nospec_false() with a separate internal comment to prevent
+Clang merging it with barrier_nospec_true() despite the otherwise-identical
+content. The generated code now becomes:
+
+ <eval_nospec_test>:
+ 85 ff test %edi,%edi
+ 74 05 je <eval_nospec_test+0x9>
+ 0f ae e8 lfence
+ 90 nop
+ c3 ret
+ 0f ae e8 lfence
+ 90 nop
+ c3 ret
+
+which has the correct number of lfence's, and in the correct place.
+
+Link: https://github.com/llvm/llvm-project/issues/55084
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: bc3c133841435829ba5c0a48427e2a77633502ab
+master date: 2023-03-24 12:16:31 +0000
+---
+ xen/include/asm-x86/nospec.h | 15 +++++++++++++--
+ 1 file changed, 13 insertions(+), 2 deletions(-)
+
+diff --git a/xen/include/asm-x86/nospec.h b/xen/include/asm-x86/nospec.h
+index 5312ae4c6f..7150e76b87 100644
+--- a/xen/include/asm-x86/nospec.h
++++ b/xen/include/asm-x86/nospec.h
+@@ -10,15 +10,26 @@
+ static always_inline bool barrier_nospec_true(void)
+ {
+ #ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
+- alternative("lfence", "", X86_FEATURE_SC_NO_BRANCH_HARDEN);
++ alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_BRANCH_HARDEN);
+ #endif
+ return true;
+ }
+
++static always_inline bool barrier_nospec_false(void)
++{
++#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
++ alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_BRANCH_HARDEN);
++#endif
++ return false;
++}
++
+ /* Allow to protect evaluation of conditionals with respect to speculation */
+ static always_inline bool evaluate_nospec(bool condition)
+ {
+- return condition ? barrier_nospec_true() : !barrier_nospec_true();
++ if ( condition )
++ return barrier_nospec_true();
++ else
++ return barrier_nospec_false();
+ }
+
+ /* Allow to block speculative execution in generic code */
+--
+2.40.0
+
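The construct is easy to experiment with outside Xen. The sketch below is a self-contained approximation (x86 only; it uses a bare lfence instead of Xen's alternative() machinery, and the helper names are invented), so compiling it with something like "clang -O2 -S" lets one check whether the fences stay behind the conditional branch:

    /* Bare lfence stands in for Xen's alternative()-patched barrier. */
    static inline __attribute__((always_inline)) int barrier_true(void)
    {
        asm volatile ("lfence # nospec-true" ::: "memory");
        return 1;
    }

    static inline __attribute__((always_inline)) int barrier_false(void)
    {
        asm volatile ("lfence # nospec-false" ::: "memory");
        return 0;
    }

    int eval_nospec_test(int x)
    {
        /* With identical asm bodies, Clang may merge the two barriers and
         * hoist a single lfence above the branch; the distinct comment
         * strings keep one fence per arm. */
        return x ? barrier_true() : barrier_false();
    }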
diff --git a/0051-tools-xenstore-don-t-buffer-multiple-identical-watch.patch b/0051-tools-xenstore-don-t-buffer-multiple-identical-watch.patch
deleted file mode 100644
index 2c2dfd6..0000000
--- a/0051-tools-xenstore-don-t-buffer-multiple-identical-watch.patch
+++ /dev/null
@@ -1,93 +0,0 @@
-From b270ad4a7ebe3409337bf3730317af6977c38197 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:08 +0200
-Subject: [PATCH 51/87] tools/xenstore: don't buffer multiple identical watch
- events
-
-A guest not reading its Xenstore response buffer fast enough might
-pile up lots of Xenstore watch events buffered. Reduce the generated
-load by dropping new events which already have an identical copy
-pending.
-
-The special events "@..." are excluded from that handling as there are
-known use cases where the handler is relying on each event to be sent
-individually.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit b5c0bdb96d33e18c324c13d8e33c08732d77eaa2)
----
- tools/xenstore/xenstored_core.c | 20 +++++++++++++++++++-
- tools/xenstore/xenstored_core.h | 3 +++
- 2 files changed, 22 insertions(+), 1 deletion(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 488d540f3a32..f1fa97b8cf50 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -916,6 +916,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- bdata->inhdr = true;
- bdata->used = 0;
- bdata->timeout_msec = 0;
-+ bdata->watch_event = false;
-
- if (len <= DEFAULT_BUFFER_SIZE)
- bdata->buffer = bdata->default_buffer;
-@@ -948,7 +949,7 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- void send_event(struct buffered_data *req, struct connection *conn,
- const char *path, const char *token)
- {
-- struct buffered_data *bdata;
-+ struct buffered_data *bdata, *bd;
- unsigned int len;
-
- len = strlen(path) + 1 + strlen(token) + 1;
-@@ -970,12 +971,29 @@ void send_event(struct buffered_data *req, struct connection *conn,
- bdata->hdr.msg.type = XS_WATCH_EVENT;
- bdata->hdr.msg.len = len;
-
-+ /*
-+ * Check whether an identical event is pending already.
-+ * Special events are excluded from that check.
-+ */
-+ if (path[0] != '@') {
-+ list_for_each_entry(bd, &conn->out_list, list) {
-+ if (bd->watch_event && bd->hdr.msg.len == len &&
-+ !memcmp(bdata->buffer, bd->buffer, len)) {
-+ trace("dropping duplicate watch %s %s for domain %u\n",
-+ path, token, conn->id);
-+ talloc_free(bdata);
-+ return;
-+ }
-+ }
-+ }
-+
- if (timeout_watch_event_msec && domain_is_unprivileged(conn)) {
- bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
- if (!conn->timeout_msec)
- conn->timeout_msec = bdata->timeout_msec;
- }
-
-+ bdata->watch_event = true;
- bdata->pend.req = req;
- if (req)
- req->pend.ref.event_cnt++;
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index db09f463a657..b9b50e81c7b4 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -62,6 +62,9 @@ struct buffered_data
- /* Are we still doing the header? */
- bool inhdr;
-
-+ /* Is this a watch event? */
-+ bool watch_event;
-+
- /* How far are we? */
- unsigned int used;
-
---
-2.37.4
-
diff --git a/0051-x86-shadow-Fix-build-with-no-PG_log_dirty.patch b/0051-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
new file mode 100644
index 0000000..ef2a137
--- /dev/null
+++ b/0051-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
@@ -0,0 +1,56 @@
+From 7e1fe95c79d55a1c1a65f71a078b8e31c69ffe94 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 31 Mar 2023 08:39:49 +0200
+Subject: [PATCH 51/61] x86/shadow: Fix build with no PG_log_dirty
+
+Gitlab Randconfig found:
+
+ arch/x86/mm/shadow/common.c: In function 'shadow_prealloc':
+ arch/x86/mm/shadow/common.c:1023:18: error: implicit declaration of function
+ 'paging_logdirty_levels'; did you mean 'paging_log_dirty_init'? [-Werror=implicit-function-declaration]
+ 1023 | count += paging_logdirty_levels();
+ | ^~~~~~~~~~~~~~~~~~~~~~
+ | paging_log_dirty_init
+ arch/x86/mm/shadow/common.c:1023:18: error: nested extern declaration of 'paging_logdirty_levels' [-Werror=nested-externs]
+
+The '#if PG_log_dirty' expression is currently SHADOW_PAGING && !HVM &&
+PV_SHIM_EXCLUSIVE. Move the declaration outside.
+
+Fixes: 33fb3a661223 ("x86/shadow: account for log-dirty mode when pre-allocating")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 6d14cb105b1c54ad7b4228d858ae85aa8a672bbd
+master date: 2023-03-24 12:16:31 +0000
+---
+ xen/include/asm-x86/paging.h | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
+index c6b429c691..43abaa5bd1 100644
+--- a/xen/include/asm-x86/paging.h
++++ b/xen/include/asm-x86/paging.h
+@@ -154,6 +154,10 @@ struct paging_mode {
+ /*****************************************************************************
+ * Log dirty code */
+
++#define paging_logdirty_levels() \
++ (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
++ PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
++
+ #if PG_log_dirty
+
+ /* get the dirty bitmap for a specific range of pfns */
+@@ -192,10 +196,6 @@ int paging_mfn_is_dirty(struct domain *d, mfn_t gmfn);
+ #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
+ (LOGDIRTY_NODE_ENTRIES-1))
+
+-#define paging_logdirty_levels() \
+- (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
+- PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
+-
+ #ifdef CONFIG_HVM
+ /* VRAM dirty tracking support */
+ struct sh_dirty_vram {
+--
+2.40.0
+
diff --git a/0052-tools-xenstore-fix-connection-id-usage.patch b/0052-tools-xenstore-fix-connection-id-usage.patch
deleted file mode 100644
index 5eac10f..0000000
--- a/0052-tools-xenstore-fix-connection-id-usage.patch
+++ /dev/null
@@ -1,61 +0,0 @@
-From 787241f55216d34ca025c835c6a2096d7664d711 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:08 +0200
-Subject: [PATCH 52/87] tools/xenstore: fix connection->id usage
-
-Don't use conn->id for privilege checks, but domain_is_unprivileged().
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 3047df38e1991510bc295e3e1bb6b6b6c4a97831)
----
- tools/xenstore/xenstored_control.c | 2 +-
- tools/xenstore/xenstored_core.h | 2 +-
- tools/xenstore/xenstored_transaction.c | 3 ++-
- 3 files changed, 4 insertions(+), 3 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
-index 7b4300ef7777..adb8d51b043b 100644
---- a/tools/xenstore/xenstored_control.c
-+++ b/tools/xenstore/xenstored_control.c
-@@ -891,7 +891,7 @@ int do_control(struct connection *conn, struct buffered_data *in)
- unsigned int cmd, num, off;
- char **vec = NULL;
-
-- if (conn->id != 0)
-+ if (domain_is_unprivileged(conn))
- return EACCES;
-
- off = get_string(in, 0);
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index b9b50e81c7b4..b1a70488b989 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -123,7 +123,7 @@ struct connection
- /* The index of pollfd in global pollfd array */
- int pollfd_idx;
-
-- /* Who am I? 0 for socket connections. */
-+ /* Who am I? Domid of connection. */
- unsigned int id;
-
- /* Is this connection ignored? */
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index 54432907fc76..ee1b09031a3b 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -477,7 +477,8 @@ int do_transaction_start(struct connection *conn, struct buffered_data *in)
- if (conn->transaction)
- return EBUSY;
-
-- if (conn->id && conn->transaction_started > quota_max_transaction)
-+ if (domain_is_unprivileged(conn) &&
-+ conn->transaction_started > quota_max_transaction)
- return ENOSPC;
-
- /* Attach transaction to input for autofree until it's complete */
---
-2.37.4
-
diff --git a/0052-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch b/0052-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
new file mode 100644
index 0000000..c408fbb
--- /dev/null
+++ b/0052-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
@@ -0,0 +1,51 @@
+From b1022b65de59828d40d9d71cc734a42c1c30c972 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 31 Mar 2023 08:40:27 +0200
+Subject: [PATCH 52/61] x86/vmx: Don't spuriously crash the domain when INIT is
+ received
+
+In VMX operation, the handling of INIT IPIs is changed. Instead of the CPU
+resetting, the next VMEntry fails with EXIT_REASON_INIT. From the TXT spec,
+the intent of this behaviour is so that an entity which cares can scrub
+secrets from RAM before participating in an orderly shutdown.
+
+Right now, Xen's behaviour is that when an INIT arrives, the HVM VM which
+schedules next is killed (citing an unknown VMExit), *and* we ignore the INIT
+and continue blindly onwards anyway.
+
+This patch addresses only the first of these two problems by ignoring the INIT
+and continuing without crashing the VM in question.
+
+The second wants addressing too, just as soon as we've figured out something
+better to do...
+
+Discovered as collateral damage from when an AP triple faults on S3 resume on
+Intel TigerLake platforms.
+
+Link: https://github.com/QubesOS/qubes-issues/issues/7283
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: b1f11273d5a774cc88a3685c96c2e7cf6385e3b6
+master date: 2023-03-24 22:49:58 +0000
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index c8a839cd5e..cebe46ef6a 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -4002,6 +4002,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ case EXIT_REASON_MCE_DURING_VMENTRY:
+ do_machine_check(regs);
+ break;
++
++ case EXIT_REASON_INIT:
++ printk(XENLOG_ERR "Error: INIT received - ignoring\n");
++ return; /* Renter the guest without further processing */
+ }
+
+ /* Now enable interrupts so it's safe to take locks. */
+--
+2.40.0
+
diff --git a/0053-tools-xenstore-simplify-and-fix-per-domain-node-acco.patch b/0053-tools-xenstore-simplify-and-fix-per-domain-node-acco.patch
deleted file mode 100644
index 1bd3051..0000000
--- a/0053-tools-xenstore-simplify-and-fix-per-domain-node-acco.patch
+++ /dev/null
@@ -1,336 +0,0 @@
-From 717460e062dfe13a69cb01f518dd7b65d39376ef Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:08 +0200
-Subject: [PATCH 53/87] tools/xenstore: simplify and fix per domain node
- accounting
-
-The accounting of nodes can be simplified now that each connection
-holds the associated domid.
-
-Fix the node accounting to cover nodes created for a domain before it
-has been introduced. This requires to react properly to an allocation
-failure inside domain_entry_inc() by returning an error code.
-
-Especially in error paths the node accounting has to be fixed in some
-cases.
-
-This is part of XSA-326 / CVE-2022-42313.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit dbef1f7482894c572d90cd73d99ed689c891e863)
----
- tools/xenstore/xenstored_core.c | 43 ++++++++--
- tools/xenstore/xenstored_domain.c | 105 ++++++++++++++++---------
- tools/xenstore/xenstored_domain.h | 4 +-
- tools/xenstore/xenstored_transaction.c | 8 +-
- 4 files changed, 109 insertions(+), 51 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index f1fa97b8cf50..692d863fce35 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -638,7 +638,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
-
- /* Permissions are struct xs_permissions. */
- node->perms.p = hdr->perms;
-- if (domain_adjust_node_perms(node)) {
-+ if (domain_adjust_node_perms(conn, node)) {
- talloc_free(node);
- return NULL;
- }
-@@ -660,7 +660,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
- void *p;
- struct xs_tdb_record_hdr *hdr;
-
-- if (domain_adjust_node_perms(node))
-+ if (domain_adjust_node_perms(conn, node))
- return errno;
-
- data.dsize = sizeof(*hdr)
-@@ -1272,13 +1272,17 @@ nomem:
- return NULL;
- }
-
--static int destroy_node(struct connection *conn, struct node *node)
-+static void destroy_node_rm(struct node *node)
- {
- if (streq(node->name, "/"))
- corrupt(NULL, "Destroying root node!");
-
- tdb_delete(tdb_ctx, node->key);
-+}
-
-+static int destroy_node(struct connection *conn, struct node *node)
-+{
-+ destroy_node_rm(node);
- domain_entry_dec(conn, node);
-
- /*
-@@ -1328,8 +1332,12 @@ static struct node *create_node(struct connection *conn, const void *ctx,
- goto err;
-
- /* Account for new node */
-- if (i->parent)
-- domain_entry_inc(conn, i);
-+ if (i->parent) {
-+ if (domain_entry_inc(conn, i)) {
-+ destroy_node_rm(i);
-+ return NULL;
-+ }
-+ }
- }
-
- return node;
-@@ -1614,10 +1622,27 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
- old_perms = node->perms;
- domain_entry_dec(conn, node);
- node->perms = perms;
-- domain_entry_inc(conn, node);
-+ if (domain_entry_inc(conn, node)) {
-+ node->perms = old_perms;
-+ /*
-+ * This should never fail because we had a reference on the
-+ * domain before and Xenstored is single-threaded.
-+ */
-+ domain_entry_inc(conn, node);
-+ return ENOMEM;
-+ }
-
-- if (write_node(conn, node, false))
-+ if (write_node(conn, node, false)) {
-+ int saved_errno = errno;
-+
-+ domain_entry_dec(conn, node);
-+ node->perms = old_perms;
-+ /* No failure possible as above. */
-+ domain_entry_inc(conn, node);
-+
-+ errno = saved_errno;
- return errno;
-+ }
-
- fire_watches(conn, in, name, node, false, &old_perms);
- send_ack(conn, XS_SET_PERMS);
-@@ -3122,7 +3147,9 @@ void read_state_node(const void *ctx, const void *state)
- set_tdb_key(name, &key);
- if (write_node_raw(NULL, &key, node, true))
- barf("write node error restoring node");
-- domain_entry_inc(&conn, node);
-+
-+ if (domain_entry_inc(&conn, node))
-+ barf("node accounting error restoring node");
-
- talloc_free(node);
- }
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index 850085a92c76..260952e09096 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -16,6 +16,7 @@
- along with this program; If not, see <http://www.gnu.org/licenses/>.
- */
-
-+#include <assert.h>
- #include <stdio.h>
- #include <sys/mman.h>
- #include <unistd.h>
-@@ -363,6 +364,18 @@ static struct domain *find_or_alloc_domain(const void *ctx, unsigned int domid)
- return domain ? : alloc_domain(ctx, domid);
- }
-
-+static struct domain *find_or_alloc_existing_domain(unsigned int domid)
-+{
-+ struct domain *domain;
-+ xc_dominfo_t dominfo;
-+
-+ domain = find_domain_struct(domid);
-+ if (!domain && get_domain_info(domid, &dominfo))
-+ domain = alloc_domain(NULL, domid);
-+
-+ return domain;
-+}
-+
- static int new_domain(struct domain *domain, int port, bool restore)
- {
- int rc;
-@@ -782,30 +795,28 @@ void domain_deinit(void)
- xenevtchn_unbind(xce_handle, virq_port);
- }
-
--void domain_entry_inc(struct connection *conn, struct node *node)
-+int domain_entry_inc(struct connection *conn, struct node *node)
- {
- struct domain *d;
-+ unsigned int domid;
-
- if (!conn)
-- return;
-+ return 0;
-
-- if (node->perms.p && node->perms.p[0].id != conn->id) {
-- if (conn->transaction) {
-- transaction_entry_inc(conn->transaction,
-- node->perms.p[0].id);
-- } else {
-- d = find_domain_by_domid(node->perms.p[0].id);
-- if (d)
-- d->nbentry++;
-- }
-- } else if (conn->domain) {
-- if (conn->transaction) {
-- transaction_entry_inc(conn->transaction,
-- conn->domain->domid);
-- } else {
-- conn->domain->nbentry++;
-- }
-+ domid = node->perms.p ? node->perms.p[0].id : conn->id;
-+
-+ if (conn->transaction) {
-+ transaction_entry_inc(conn->transaction, domid);
-+ } else {
-+ d = (domid == conn->id && conn->domain) ? conn->domain
-+ : find_or_alloc_existing_domain(domid);
-+ if (d)
-+ d->nbentry++;
-+ else
-+ return ENOMEM;
- }
-+
-+ return 0;
- }
-
- /*
-@@ -841,7 +852,7 @@ static int chk_domain_generation(unsigned int domid, uint64_t gen)
- * Remove permissions for no longer existing domains in order to avoid a new
- * domain with the same domid inheriting the permissions.
- */
--int domain_adjust_node_perms(struct node *node)
-+int domain_adjust_node_perms(struct connection *conn, struct node *node)
- {
- unsigned int i;
- int ret;
-@@ -851,8 +862,14 @@ int domain_adjust_node_perms(struct node *node)
- return errno;
-
- /* If the owner doesn't exist any longer give it to priv domain. */
-- if (!ret)
-+ if (!ret) {
-+ /*
-+ * In theory we'd need to update the number of dom0 nodes here,
-+ * but we could be called for a read of the node. So better
-+ * avoid the risk to overflow the node count of dom0.
-+ */
- node->perms.p[0].id = priv_domid;
-+ }
-
- for (i = 1; i < node->perms.num; i++) {
- if (node->perms.p[i].perms & XS_PERM_IGNORE)
-@@ -871,25 +888,25 @@ int domain_adjust_node_perms(struct node *node)
- void domain_entry_dec(struct connection *conn, struct node *node)
- {
- struct domain *d;
-+ unsigned int domid;
-
- if (!conn)
- return;
-
-- if (node->perms.p && node->perms.p[0].id != conn->id) {
-- if (conn->transaction) {
-- transaction_entry_dec(conn->transaction,
-- node->perms.p[0].id);
-+ domid = node->perms.p ? node->perms.p[0].id : conn->id;
-+
-+ if (conn->transaction) {
-+ transaction_entry_dec(conn->transaction, domid);
-+ } else {
-+ d = (domid == conn->id && conn->domain) ? conn->domain
-+ : find_domain_struct(domid);
-+ if (d) {
-+ d->nbentry--;
- } else {
-- d = find_domain_by_domid(node->perms.p[0].id);
-- if (d && d->nbentry)
-- d->nbentry--;
-- }
-- } else if (conn->domain && conn->domain->nbentry) {
-- if (conn->transaction) {
-- transaction_entry_dec(conn->transaction,
-- conn->domain->domid);
-- } else {
-- conn->domain->nbentry--;
-+ errno = ENOENT;
-+ corrupt(conn,
-+ "Node \"%s\" owned by non-existing domain %u\n",
-+ node->name, domid);
- }
- }
- }
-@@ -899,13 +916,23 @@ int domain_entry_fix(unsigned int domid, int num, bool update)
- struct domain *d;
- int cnt;
-
-- d = find_domain_by_domid(domid);
-- if (!d)
-- return 0;
-+ if (update) {
-+ d = find_domain_struct(domid);
-+ assert(d);
-+ } else {
-+ /*
-+ * We are called first with update == false in order to catch
-+ * any error. So do a possible allocation and check for error
-+ * only in this case, as in the case of update == true nothing
-+ * can go wrong anymore as the allocation already happened.
-+ */
-+ d = find_or_alloc_existing_domain(domid);
-+ if (!d)
-+ return -1;
-+ }
-
- cnt = d->nbentry + num;
-- if (cnt < 0)
-- cnt = 0;
-+ assert(cnt >= 0);
-
- if (update)
- d->nbentry = cnt;
-diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
-index 4f51b005291a..d6519904d831 100644
---- a/tools/xenstore/xenstored_domain.h
-+++ b/tools/xenstore/xenstored_domain.h
-@@ -54,10 +54,10 @@ const char *get_implicit_path(const struct connection *conn);
- bool domain_is_unprivileged(struct connection *conn);
-
- /* Remove node permissions for no longer existing domains. */
--int domain_adjust_node_perms(struct node *node);
-+int domain_adjust_node_perms(struct connection *conn, struct node *node);
-
- /* Quota manipulation */
--void domain_entry_inc(struct connection *conn, struct node *);
-+int domain_entry_inc(struct connection *conn, struct node *);
- void domain_entry_dec(struct connection *conn, struct node *);
- int domain_entry_fix(unsigned int domid, int num, bool update);
- int domain_entry(struct connection *conn);
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index ee1b09031a3b..86caf6c398be 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -519,8 +519,12 @@ static int transaction_fix_domains(struct transaction *trans, bool update)
-
- list_for_each_entry(d, &trans->changed_domains, list) {
- cnt = domain_entry_fix(d->domid, d->nbentry, update);
-- if (!update && cnt >= quota_nb_entry_per_domain)
-- return ENOSPC;
-+ if (!update) {
-+ if (cnt >= quota_nb_entry_per_domain)
-+ return ENOSPC;
-+ if (cnt < 0)
-+ return ENOMEM;
-+ }
- }
-
- return 0;
---
-2.37.4
-
diff --git a/0053-x86-ucode-Fix-error-paths-control_thread_fn.patch b/0053-x86-ucode-Fix-error-paths-control_thread_fn.patch
new file mode 100644
index 0000000..7bb2c27
--- /dev/null
+++ b/0053-x86-ucode-Fix-error-paths-control_thread_fn.patch
@@ -0,0 +1,56 @@
+From 0f81c5a2c8e0432d5af3d9f4e6398376cd514516 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 31 Mar 2023 08:40:56 +0200
+Subject: [PATCH 53/61] x86/ucode: Fix error paths control_thread_fn()
+
+These two early exits skipped re-enabling the watchdog, restoring the NMI
+callback, and clearing the nmi_patch global pointer. Always execute the tail
+of the function on the way out.
+
+Fixes: 8dd4dfa92d62 ("x86/microcode: Synchronize late microcode loading")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: fc2e1f3aad602a66c14b8285a1bd38a82f8fd02d
+master date: 2023-03-28 11:57:56 +0100
+---
+ xen/arch/x86/cpu/microcode/core.c | 9 +++------
+ 1 file changed, 3 insertions(+), 6 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
+index ee7df9a591..ad150e5963 100644
+--- a/xen/arch/x86/cpu/microcode/core.c
++++ b/xen/arch/x86/cpu/microcode/core.c
+@@ -488,10 +488,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
+ ret = wait_for_condition(wait_cpu_callin, num_online_cpus(),
+ MICROCODE_CALLIN_TIMEOUT_US);
+ if ( ret )
+- {
+- set_state(LOADING_EXIT);
+- return ret;
+- }
++ goto out;
+
+ /* Control thread loads ucode first while others are in NMI handler. */
+ ret = microcode_ops->apply_microcode(patch);
+@@ -503,8 +500,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
+ {
+ printk(XENLOG_ERR
+ "Late loading aborted: CPU%u failed to update ucode\n", cpu);
+- set_state(LOADING_EXIT);
+- return ret;
++ goto out;
+ }
+
+ /* Let primary threads load the given ucode update */
+@@ -535,6 +531,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
+ }
+ }
+
++ out:
+ /* Mark loading is done to unblock other threads */
+ set_state(LOADING_EXIT);
+
+--
+2.40.0
+
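A minimal sketch of the error-path pattern the ucode fix above converges on:
every early exit jumps to a single label so the common tail runs exactly once.
The names below (do_step, set_state, LOADING_EXIT) are placeholders for the
real Xen symbols, not the actual microcode code:

    #include <stdio.h>

    enum load_state { LOADING_ENTER, LOADING_EXIT };

    static void set_state(enum load_state s)
    {
        printf("state -> %d\n", s);
    }

    /* Stand-in for a step that may fail. */
    static int do_step(int fail)
    {
        return fail ? -1 : 0;
    }

    static int control_path(int fail_first, int fail_second)
    {
        int ret;

        set_state(LOADING_ENTER);

        ret = do_step(fail_first);
        if (ret)
            goto out;                /* early exit, tail still runs */

        ret = do_step(fail_second);
        if (ret)
            goto out;

     out:
        /* Common tail: executed exactly once on every path. */
        set_state(LOADING_EXIT);
        return ret;
    }

    int main(void)
    {
        return control_path(0, 1) ? 1 : 0;
    }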
diff --git a/0054-tools-xenstore-limit-max-number-of-nodes-accessed-in.patch b/0054-tools-xenstore-limit-max-number-of-nodes-accessed-in.patch
deleted file mode 100644
index 0a84c6c..0000000
--- a/0054-tools-xenstore-limit-max-number-of-nodes-accessed-in.patch
+++ /dev/null
@@ -1,255 +0,0 @@
-From 7017cfefc455db535054ebc09124af8101746a4a Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:09 +0200
-Subject: [PATCH 54/87] tools/xenstore: limit max number of nodes accessed in a
- transaction
-
-Today a guest is free to access as many nodes in a single transaction
-as it wants. This can lead to unbounded memory consumption in Xenstore
-as Xenstore needs to keep track of all nodes accessed during a
-transaction.
-
-In oxenstored the number of requests in a transaction is being limited
-via a quota maxrequests (default is 1024). As multiple accesses of a
-node are not problematic in C Xenstore, limit the number of accessed
-nodes.
-
-In order to let read_node() detect a quota error in case too many nodes
-are being accessed, check the return value of access_node() and return
-NULL in case an error has been seen. Introduce __must_check and add it
-to the access_node() prototype.
-
-This is part of XSA-326 / CVE-2022-42314.
-
-Suggested-by: Julien Grall <julien@xen.org>
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 268369d8e322d227a74a899009c5748d7b0ea142)
----
- tools/include/xen-tools/libs.h | 4 +++
- tools/xenstore/xenstored_core.c | 50 ++++++++++++++++++--------
- tools/xenstore/xenstored_core.h | 1 +
- tools/xenstore/xenstored_transaction.c | 9 +++++
- tools/xenstore/xenstored_transaction.h | 4 +--
- 5 files changed, 52 insertions(+), 16 deletions(-)
-
-diff --git a/tools/include/xen-tools/libs.h b/tools/include/xen-tools/libs.h
-index a16e0c380709..bafc90e2f603 100644
---- a/tools/include/xen-tools/libs.h
-+++ b/tools/include/xen-tools/libs.h
-@@ -63,4 +63,8 @@
- #define ROUNDUP(_x,_w) (((unsigned long)(_x)+(1UL<<(_w))-1) & ~((1UL<<(_w))-1))
- #endif
-
-+#ifndef __must_check
-+#define __must_check __attribute__((__warn_unused_result__))
-+#endif
-+
- #endif /* __XEN_TOOLS_LIBS__ */
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 692d863fce35..f835aa1b2f1f 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -106,6 +106,7 @@ int quota_nb_watch_per_domain = 128;
- int quota_max_entry_size = 2048; /* 2K */
- int quota_max_transaction = 10;
- int quota_nb_perms_per_node = 5;
-+int quota_trans_nodes = 1024;
- int quota_max_path_len = XENSTORE_REL_PATH_MAX;
- int quota_req_outstanding = 20;
-
-@@ -595,6 +596,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
- TDB_DATA key, data;
- struct xs_tdb_record_hdr *hdr;
- struct node *node;
-+ int err;
-
- node = talloc(ctx, struct node);
- if (!node) {
-@@ -616,14 +618,13 @@ struct node *read_node(struct connection *conn, const void *ctx,
- if (data.dptr == NULL) {
- if (tdb_error(tdb_ctx) == TDB_ERR_NOEXIST) {
- node->generation = NO_GENERATION;
-- access_node(conn, node, NODE_ACCESS_READ, NULL);
-- errno = ENOENT;
-+ err = access_node(conn, node, NODE_ACCESS_READ, NULL);
-+ errno = err ? : ENOENT;
- } else {
- log("TDB error on read: %s", tdb_errorstr(tdb_ctx));
- errno = EIO;
- }
-- talloc_free(node);
-- return NULL;
-+ goto error;
- }
-
- node->parent = NULL;
-@@ -638,19 +639,36 @@ struct node *read_node(struct connection *conn, const void *ctx,
-
- /* Permissions are struct xs_permissions. */
- node->perms.p = hdr->perms;
-- if (domain_adjust_node_perms(conn, node)) {
-- talloc_free(node);
-- return NULL;
-- }
-+ if (domain_adjust_node_perms(conn, node))
-+ goto error;
-
- /* Data is binary blob (usually ascii, no nul). */
- node->data = node->perms.p + hdr->num_perms;
- /* Children is strings, nul separated. */
- node->children = node->data + node->datalen;
-
-- access_node(conn, node, NODE_ACCESS_READ, NULL);
-+ if (access_node(conn, node, NODE_ACCESS_READ, NULL))
-+ goto error;
-
- return node;
-+
-+ error:
-+ err = errno;
-+ talloc_free(node);
-+ errno = err;
-+ return NULL;
-+}
-+
-+static bool read_node_can_propagate_errno(void)
-+{
-+ /*
-+ * 2 error cases for read_node() can always be propagated up:
-+ * ENOMEM, because this has nothing to do with the node being in the
-+ * data base or not, but is caused by a general lack of memory.
-+ * ENOSPC, because this is related to hitting quota limits which need
-+ * to be respected.
-+ */
-+ return errno == ENOMEM || errno == ENOSPC;
- }
-
- int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
-@@ -767,7 +785,7 @@ static int ask_parents(struct connection *conn, const void *ctx,
- node = read_node(conn, ctx, name);
- if (node)
- break;
-- if (errno == ENOMEM)
-+ if (read_node_can_propagate_errno())
- return errno;
- } while (!streq(name, "/"));
-
-@@ -829,7 +847,7 @@ static struct node *get_node(struct connection *conn,
- }
- }
- /* Clean up errno if they weren't supposed to know. */
-- if (!node && errno != ENOMEM)
-+ if (!node && !read_node_can_propagate_errno())
- errno = errno_from_parents(conn, ctx, name, errno, perm);
- return node;
- }
-@@ -1235,7 +1253,7 @@ static struct node *construct_node(struct connection *conn, const void *ctx,
-
- /* If parent doesn't exist, create it. */
- parent = read_node(conn, parentname, parentname);
-- if (!parent)
-+ if (!parent && errno == ENOENT)
- parent = construct_node(conn, ctx, parentname);
- if (!parent)
- return NULL;
-@@ -1509,7 +1527,7 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
-
- parent = read_node(conn, ctx, parentname);
- if (!parent)
-- return (errno == ENOMEM) ? ENOMEM : EINVAL;
-+ return read_node_can_propagate_errno() ? errno : EINVAL;
- node->parent = parent;
-
- return delete_node(conn, ctx, parent, node, false);
-@@ -1539,7 +1557,7 @@ static int do_rm(struct connection *conn, struct buffered_data *in)
- return 0;
- }
- /* Restore errno, just in case. */
-- if (errno != ENOMEM)
-+ if (!read_node_can_propagate_errno())
- errno = ENOENT;
- }
- return errno;
-@@ -2384,6 +2402,8 @@ static void usage(void)
- " -M, --path-max <chars> limit the allowed Xenstore node path length,\n"
- " -Q, --quota <what>=<nb> set the quota <what> to the value <nb>, allowed\n"
- " quotas are:\n"
-+" transaction-nodes: number of accessed node per\n"
-+" transaction\n"
- " outstanding: number of outstanding requests\n"
- " -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
- " allowed timeout candidates are:\n"
-@@ -2468,6 +2488,8 @@ static void set_quota(const char *arg)
- val = get_optval_int(eq + 1);
- if (what_matches(arg, "outstanding"))
- quota_req_outstanding = val;
-+ else if (what_matches(arg, "transaction-nodes"))
-+ quota_trans_nodes = val;
- else
- barf("unknown quota \"%s\"\n", arg);
- }
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index b1a70488b989..245f9258235f 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -268,6 +268,7 @@ extern int dom0_event;
- extern int priv_domid;
- extern int quota_nb_entry_per_domain;
- extern int quota_req_outstanding;
-+extern int quota_trans_nodes;
-
- extern unsigned int timeout_watch_event_msec;
-
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index 86caf6c398be..7bd41eb475e3 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -156,6 +156,9 @@ struct transaction
- /* Connection-local identifier for this transaction. */
- uint32_t id;
-
-+ /* Node counter. */
-+ unsigned int nodes;
-+
- /* Generation when transaction started. */
- uint64_t generation;
-
-@@ -260,6 +263,11 @@ int access_node(struct connection *conn, struct node *node,
-
- i = find_accessed_node(trans, node->name);
- if (!i) {
-+ if (trans->nodes >= quota_trans_nodes &&
-+ domain_is_unprivileged(conn)) {
-+ ret = ENOSPC;
-+ goto err;
-+ }
- i = talloc_zero(trans, struct accessed_node);
- if (!i)
- goto nomem;
-@@ -297,6 +305,7 @@ int access_node(struct connection *conn, struct node *node,
- i->ta_node = true;
- }
- }
-+ trans->nodes++;
- list_add_tail(&i->list, &trans->accessed);
- }
-
-diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
-index 0093cac807e3..e3cbd6b23095 100644
---- a/tools/xenstore/xenstored_transaction.h
-+++ b/tools/xenstore/xenstored_transaction.h
-@@ -39,8 +39,8 @@ void transaction_entry_inc(struct transaction *trans, unsigned int domid);
- void transaction_entry_dec(struct transaction *trans, unsigned int domid);
-
- /* This node was accessed. */
--int access_node(struct connection *conn, struct node *node,
-- enum node_access_type type, TDB_DATA *key);
-+int __must_check access_node(struct connection *conn, struct node *node,
-+ enum node_access_type type, TDB_DATA *key);
-
- /* Queue watches for a modified node. */
- void queue_watches(struct connection *conn, const char *name, bool watch_exact);
---
-2.37.4
-
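A simplified sketch of the quota idea the dropped patch above implemented:
the real xenstored hangs the counter off its transaction structure, while
quota_trans_nodes and the helper below are illustrative assumptions only:

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    static unsigned int quota_trans_nodes = 1024;

    struct transaction {
        unsigned int nodes;   /* nodes accessed so far in this transaction */
    };

    /* Account one node access; unprivileged callers hit ENOSPC at the limit. */
    static int account_node_access(struct transaction *trans, bool unprivileged)
    {
        if (unprivileged && trans->nodes >= quota_trans_nodes)
            return ENOSPC;

        trans->nodes++;
        return 0;
    }

    int main(void)
    {
        struct transaction t = { .nodes = 0 };
        unsigned int i;
        int rc = 0;

        for (i = 0; i < 2000 && !rc; i++)
            rc = account_node_access(&t, true);

        printf("stopped with rc=%d after %u accounted nodes\n", rc, t.nodes);
        return 0;
    }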
diff --git a/0054-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch b/0054-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
new file mode 100644
index 0000000..4973ae7
--- /dev/null
+++ b/0054-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
@@ -0,0 +1,543 @@
+From d080287c2a8dce11baee1d7bbf9276757e8572e4 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Fri, 31 Mar 2023 08:41:27 +0200
+Subject: [PATCH 54/61] vpci/msix: handle accesses adjacent to the MSI-X table
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The handling of the MSI-X table accesses by Xen requires that any
+pages part of the MSI-X related tables are not mapped into the domain
+physmap. As a result, any device registers in the same pages as the
+start or the end of the MSIX or PBA tables is not currently
+accessible, as the accesses are just dropped.
+
+Note the spec forbids such placing of registers, as the MSIX and PBA
+tables must be 4K isolated from any other registers:
+
+"If a Base Address register that maps address space for the MSI-X
+Table or MSI-X PBA also maps other usable address space that is not
+associated with MSI-X structures, locations (e.g., for CSRs) used in
+the other address space must not share any naturally aligned 4-KB
+address range with one where either MSI-X structure resides."
+
+Yet the 'Intel Wi-Fi 6 AX201' device on one of my boxes has registers
+in the same page as the MSIX tables, and thus won't work on a PVH dom0
+without this fix.
+
+In order to cope with such devices, pass any accesses that fall on the
+same page as the MSIX tables (but not inside them) through to the
+underlying hardware. Such forwarding also takes care of the PBA
+accesses, so it allows removing the code doing this handling in
+msix_{read,write}. Note that as a result accesses to the PBA array
+are no longer limited to 4 and 8 byte sizes, as there's no access size
+restriction for PBA accesses documented in the specification.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+
+vpci/msix: restore PBA access length and alignment restrictions
+
+Accesses to the PBA array have the same length and alignment
+limitations as accesses to the MSI-X table:
+
+"For all accesses to MSI-X Table and MSI-X PBA fields, software must
+use aligned full DWORD or aligned full QWORD transactions; otherwise,
+the result is undefined."
+
+Introduce such length and alignment checks into the handling of PBA
+accesses for vPCI. This was a mistake of mine for not reading the
+specification correctly.
+
+Note that accesses must now be aligned, and hence there's no longer a
+need to check that the end of the access falls into the PBA region as
+both the access and the region addresses must be aligned.
+
+Fixes: b177892d2d ('vpci/msix: handle accesses adjacent to the MSI-X table')
+Reported-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: b177892d2d0e8a31122c218989f43130aeba5282
+master date: 2023-03-28 14:20:35 +0200
+master commit: 7a502b4fbc339e9d3d3d45fb37f09da06bc3081c
+master date: 2023-03-29 14:56:33 +0200
+---
+ xen/drivers/vpci/msix.c | 357 +++++++++++++++++++++++++++++-----------
+ xen/drivers/vpci/vpci.c | 7 +-
+ xen/include/xen/vpci.h | 8 +-
+ 3 files changed, 275 insertions(+), 97 deletions(-)
+
+diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
+index ea5d73a02a..7e1bfb2f0a 100644
+--- a/xen/drivers/vpci/msix.c
++++ b/xen/drivers/vpci/msix.c
+@@ -27,6 +27,11 @@
+ ((addr) >= vmsix_table_addr(vpci, nr) && \
+ (addr) < vmsix_table_addr(vpci, nr) + vmsix_table_size(vpci, nr))
+
++#define VMSIX_ADDR_SAME_PAGE(addr, vpci, nr) \
++ (PFN_DOWN(addr) >= PFN_DOWN(vmsix_table_addr(vpci, nr)) && \
++ PFN_DOWN(addr) <= PFN_DOWN(vmsix_table_addr(vpci, nr) + \
++ vmsix_table_size(vpci, nr) - 1))
++
+ static uint32_t control_read(const struct pci_dev *pdev, unsigned int reg,
+ void *data)
+ {
+@@ -149,7 +154,7 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
+
+ for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
+ if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
+- VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
++ VMSIX_ADDR_SAME_PAGE(addr, msix->pdev->vpci, i) )
+ return msix;
+ }
+
+@@ -182,36 +187,172 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
+ return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
+ }
+
+-static void __iomem *get_pba(struct vpci *vpci)
++static void __iomem *get_table(struct vpci *vpci, unsigned int slot)
+ {
+ struct vpci_msix *msix = vpci->msix;
++ paddr_t addr = 0;
++
++ ASSERT(spin_is_locked(&vpci->lock));
++
++ if ( likely(msix->table[slot]) )
++ return msix->table[slot];
++
++ switch ( slot )
++ {
++ case VPCI_MSIX_TBL_TAIL:
++ addr = vmsix_table_size(vpci, VPCI_MSIX_TABLE);
++ fallthrough;
++ case VPCI_MSIX_TBL_HEAD:
++ addr += vmsix_table_addr(vpci, VPCI_MSIX_TABLE);
++ break;
++
++ case VPCI_MSIX_PBA_TAIL:
++ addr = vmsix_table_size(vpci, VPCI_MSIX_PBA);
++ fallthrough;
++ case VPCI_MSIX_PBA_HEAD:
++ addr += vmsix_table_addr(vpci, VPCI_MSIX_PBA);
++ break;
++
++ default:
++ ASSERT_UNREACHABLE();
++ return NULL;
++ }
++
++ msix->table[slot] = ioremap(round_pgdown(addr), PAGE_SIZE);
++
++ return msix->table[slot];
++}
++
++unsigned int get_slot(const struct vpci *vpci, unsigned long addr)
++{
++ unsigned long pfn = PFN_DOWN(addr);
++
+ /*
+- * PBA will only be unmapped when the device is deassigned, so access it
+- * without holding the vpci lock.
++ * The logic below relies on having the tables identity mapped to the guest
++ * address space, or for the `addr` parameter to be translated into its
++ * host physical memory address equivalent.
+ */
+- void __iomem *pba = read_atomic(&msix->pba);
+
+- if ( likely(pba) )
+- return pba;
++ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_TABLE)) )
++ return VPCI_MSIX_TBL_HEAD;
++ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_TABLE) +
++ vmsix_table_size(vpci, VPCI_MSIX_TABLE) - 1) )
++ return VPCI_MSIX_TBL_TAIL;
++ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_PBA)) )
++ return VPCI_MSIX_PBA_HEAD;
++ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_PBA) +
++ vmsix_table_size(vpci, VPCI_MSIX_PBA) - 1) )
++ return VPCI_MSIX_PBA_TAIL;
++
++ ASSERT_UNREACHABLE();
++ return -1;
++}
++
++static bool adjacent_handle(const struct vpci_msix *msix, unsigned long addr)
++{
++ unsigned int i;
++
++ if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
++ return true;
++
++ if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_TABLE) )
++ return false;
++
++ for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
++ if ( VMSIX_ADDR_SAME_PAGE(addr, msix->pdev->vpci, i) )
++ return true;
++
++ return false;
++}
++
++static int adjacent_read(const struct domain *d, const struct vpci_msix *msix,
++ unsigned long addr, unsigned int len,
++ unsigned long *data)
++{
++ const void __iomem *mem;
++ struct vpci *vpci = msix->pdev->vpci;
++ unsigned int slot;
++
++ *data = ~0ul;
++
++ if ( !adjacent_handle(msix, addr + len - 1) )
++ return X86EMUL_OKAY;
++
++ if ( VMSIX_ADDR_IN_RANGE(addr, vpci, VPCI_MSIX_PBA) &&
++ !access_allowed(msix->pdev, addr, len) )
++ /* PBA accesses must be aligned and 4 or 8 bytes in size. */
++ return X86EMUL_OKAY;
++
++ slot = get_slot(vpci, addr);
++ if ( slot >= ARRAY_SIZE(msix->table) )
++ return X86EMUL_OKAY;
++
++ if ( unlikely(!IS_ALIGNED(addr, len)) )
++ {
++ unsigned int i;
+
+- pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
+- vmsix_table_size(vpci, VPCI_MSIX_PBA));
+- if ( !pba )
+- return read_atomic(&msix->pba);
++ gprintk(XENLOG_DEBUG, "%pp: unaligned read to MSI-X related page\n",
++ &msix->pdev->sbdf);
++
++ /*
++ * Split unaligned accesses into byte sized ones. Shouldn't happen in
++ * the first place, but devices shouldn't have registers in the same 4K
++ * page as the MSIX tables either.
++ *
++ * It's unclear whether this could cause issues if a guest expects
++ * registers to be accessed atomically, it better use an aligned access
++ * if it has such expectations.
++ */
++ for ( i = 0; i < len; i++ )
++ {
++ unsigned long partial = ~0ul;
++ int rc = adjacent_read(d, msix, addr + i, 1, &partial);
++
++ if ( rc != X86EMUL_OKAY )
++ return rc;
++
++ *data &= ~(0xfful << (i * 8));
++ *data |= (partial & 0xff) << (i * 8);
++ }
++
++ return X86EMUL_OKAY;
++ }
+
+ spin_lock(&vpci->lock);
+- if ( !msix->pba )
++ mem = get_table(vpci, slot);
++ if ( !mem )
+ {
+- write_atomic(&msix->pba, pba);
+ spin_unlock(&vpci->lock);
++ gprintk(XENLOG_WARNING,
++ "%pp: unable to map MSI-X page, returning all bits set\n",
++ &msix->pdev->sbdf);
++ return X86EMUL_OKAY;
+ }
+- else
++
++ switch ( len )
+ {
+- spin_unlock(&vpci->lock);
+- iounmap(pba);
++ case 1:
++ *data = readb(mem + PAGE_OFFSET(addr));
++ break;
++
++ case 2:
++ *data = readw(mem + PAGE_OFFSET(addr));
++ break;
++
++ case 4:
++ *data = readl(mem + PAGE_OFFSET(addr));
++ break;
++
++ case 8:
++ *data = readq(mem + PAGE_OFFSET(addr));
++ break;
++
++ default:
++ ASSERT_UNREACHABLE();
+ }
++ spin_unlock(&vpci->lock);
+
+- return read_atomic(&msix->pba);
++ return X86EMUL_OKAY;
+ }
+
+ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
+@@ -227,47 +368,11 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
+ if ( !msix )
+ return X86EMUL_RETRY;
+
+- if ( !access_allowed(msix->pdev, addr, len) )
+- return X86EMUL_OKAY;
+-
+- if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
+- {
+- struct vpci *vpci = msix->pdev->vpci;
+- unsigned int idx = addr - vmsix_table_addr(vpci, VPCI_MSIX_PBA);
+- const void __iomem *pba = get_pba(vpci);
+-
+- /*
+- * Access to PBA.
+- *
+- * TODO: note that this relies on having the PBA identity mapped to the
+- * guest address space. If this changes the address will need to be
+- * translated.
+- */
+- if ( !pba )
+- {
+- gprintk(XENLOG_WARNING,
+- "%pp: unable to map MSI-X PBA, report all pending\n",
+- &msix->pdev->sbdf);
+- return X86EMUL_OKAY;
+- }
+-
+- switch ( len )
+- {
+- case 4:
+- *data = readl(pba + idx);
+- break;
+-
+- case 8:
+- *data = readq(pba + idx);
+- break;
+-
+- default:
+- ASSERT_UNREACHABLE();
+- break;
+- }
++ if ( adjacent_handle(msix, addr) )
++ return adjacent_read(d, msix, addr, len, data);
+
++ if ( !access_allowed(msix->pdev, addr, len) )
+ return X86EMUL_OKAY;
+- }
+
+ spin_lock(&msix->pdev->vpci->lock);
+ entry = get_entry(msix, addr);
+@@ -303,57 +408,103 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
+ return X86EMUL_OKAY;
+ }
+
+-static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
+- unsigned long data)
++static int adjacent_write(const struct domain *d, const struct vpci_msix *msix,
++ unsigned long addr, unsigned int len,
++ unsigned long data)
+ {
+- const struct domain *d = v->domain;
+- struct vpci_msix *msix = msix_find(d, addr);
+- struct vpci_msix_entry *entry;
+- unsigned int offset;
++ void __iomem *mem;
++ struct vpci *vpci = msix->pdev->vpci;
++ unsigned int slot;
+
+- if ( !msix )
+- return X86EMUL_RETRY;
++ if ( !adjacent_handle(msix, addr + len - 1) )
++ return X86EMUL_OKAY;
+
+- if ( !access_allowed(msix->pdev, addr, len) )
++ /*
++ * Only check start and end of the access because the size of the PBA is
++ * assumed to be equal or bigger (8 bytes) than the length of any access
++ * handled here.
++ */
++ if ( VMSIX_ADDR_IN_RANGE(addr, vpci, VPCI_MSIX_PBA) &&
++ (!access_allowed(msix->pdev, addr, len) || !is_hardware_domain(d)) )
++ /* Ignore writes to PBA for DomUs, it's undefined behavior. */
+ return X86EMUL_OKAY;
+
+- if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
+- {
+- /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
+- if ( is_hardware_domain(d) )
+- {
+- struct vpci *vpci = msix->pdev->vpci;
+- unsigned int idx = addr - vmsix_table_addr(vpci, VPCI_MSIX_PBA);
+- const void __iomem *pba = get_pba(vpci);
++ slot = get_slot(vpci, addr);
++ if ( slot >= ARRAY_SIZE(msix->table) )
++ return X86EMUL_OKAY;
+
+- if ( !pba )
+- {
+- /* Unable to map the PBA, ignore write. */
+- gprintk(XENLOG_WARNING,
+- "%pp: unable to map MSI-X PBA, write ignored\n",
+- &msix->pdev->sbdf);
+- return X86EMUL_OKAY;
+- }
++ if ( unlikely(!IS_ALIGNED(addr, len)) )
++ {
++ unsigned int i;
+
+- switch ( len )
+- {
+- case 4:
+- writel(data, pba + idx);
+- break;
++ gprintk(XENLOG_DEBUG, "%pp: unaligned write to MSI-X related page\n",
++ &msix->pdev->sbdf);
+
+- case 8:
+- writeq(data, pba + idx);
+- break;
++ for ( i = 0; i < len; i++ )
++ {
++ int rc = adjacent_write(d, msix, addr + i, 1, data >> (i * 8));
+
+- default:
+- ASSERT_UNREACHABLE();
+- break;
+- }
++ if ( rc != X86EMUL_OKAY )
++ return rc;
+ }
+
+ return X86EMUL_OKAY;
+ }
+
++ spin_lock(&vpci->lock);
++ mem = get_table(vpci, slot);
++ if ( !mem )
++ {
++ spin_unlock(&vpci->lock);
++ gprintk(XENLOG_WARNING,
++ "%pp: unable to map MSI-X page, dropping write\n",
++ &msix->pdev->sbdf);
++ return X86EMUL_OKAY;
++ }
++
++ switch ( len )
++ {
++ case 1:
++ writeb(data, mem + PAGE_OFFSET(addr));
++ break;
++
++ case 2:
++ writew(data, mem + PAGE_OFFSET(addr));
++ break;
++
++ case 4:
++ writel(data, mem + PAGE_OFFSET(addr));
++ break;
++
++ case 8:
++ writeq(data, mem + PAGE_OFFSET(addr));
++ break;
++
++ default:
++ ASSERT_UNREACHABLE();
++ }
++ spin_unlock(&vpci->lock);
++
++ return X86EMUL_OKAY;
++}
++
++static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
++ unsigned long data)
++{
++ const struct domain *d = v->domain;
++ struct vpci_msix *msix = msix_find(d, addr);
++ struct vpci_msix_entry *entry;
++ unsigned int offset;
++
++ if ( !msix )
++ return X86EMUL_RETRY;
++
++ if ( adjacent_handle(msix, addr) )
++ return adjacent_write(d, msix, addr, len, data);
++
++ if ( !access_allowed(msix->pdev, addr, len) )
++ return X86EMUL_OKAY;
++
+ spin_lock(&msix->pdev->vpci->lock);
+ entry = get_entry(msix, addr);
+ offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
+@@ -482,6 +633,26 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
+ }
+ }
+
++ if ( is_hardware_domain(d) )
++ {
++ /*
++ * For dom0 only: remove any hypervisor mappings of the MSIX or PBA
++ * related areas, as dom0 is capable of moving the position of the BARs
++ * in the host address space.
++ *
++ * We rely on being called with the vPCI lock held once the domain is
++ * running, so the maps are not in use.
++ */
++ for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->table); i++ )
++ if ( pdev->vpci->msix->table[i] )
++ {
++ /* If there are any maps, the domain must be running. */
++ ASSERT(spin_is_locked(&pdev->vpci->lock));
++ iounmap(pdev->vpci->msix->table[i]);
++ pdev->vpci->msix->table[i] = NULL;
++ }
++ }
++
+ return 0;
+ }
+
+diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
+index b9339f8f3e..60b5f45cd1 100644
+--- a/xen/drivers/vpci/vpci.c
++++ b/xen/drivers/vpci/vpci.c
+@@ -53,9 +53,12 @@ void vpci_remove_device(struct pci_dev *pdev)
+ spin_unlock(&pdev->vpci->lock);
+ if ( pdev->vpci->msix )
+ {
++ unsigned int i;
++
+ list_del(&pdev->vpci->msix->next);
+- if ( pdev->vpci->msix->pba )
+- iounmap(pdev->vpci->msix->pba);
++ for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->table); i++ )
++ if ( pdev->vpci->msix->table[i] )
++ iounmap(pdev->vpci->msix->table[i]);
+ }
+ xfree(pdev->vpci->msix);
+ xfree(pdev->vpci->msi);
+diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
+index 755b4fd5c8..3326d9026e 100644
+--- a/xen/include/xen/vpci.h
++++ b/xen/include/xen/vpci.h
+@@ -129,8 +129,12 @@ struct vpci {
+ bool enabled : 1;
+ /* Masked? */
+ bool masked : 1;
+- /* PBA map */
+- void __iomem *pba;
++ /* Partial table map. */
++#define VPCI_MSIX_TBL_HEAD 0
++#define VPCI_MSIX_TBL_TAIL 1
++#define VPCI_MSIX_PBA_HEAD 2
++#define VPCI_MSIX_PBA_TAIL 3
++ void __iomem *table[4];
+ /* Entries. */
+ struct vpci_msix_entry {
+ uint64_t addr;
+--
+2.40.0
+
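A rough illustration of the byte-splitting approach the vpci/msix patch above
uses for unaligned accesses to the page shared with the MSI-X tables;
read_one_byte() is a stand-in for the real per-byte MMIO handler, not a Xen
interface:

    #include <stdint.h>
    #include <stdio.h>

    static const uint8_t backing[16] = {
        0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88,
    };

    static uint8_t read_one_byte(unsigned long addr)
    {
        return backing[addr & 0xf];
    }

    /* Split an unaligned read into byte reads and reassemble the result. */
    static uint64_t split_unaligned_read(unsigned long addr, unsigned int len)
    {
        uint64_t data = ~0ULL;
        unsigned int i;

        for (i = 0; i < len; i++) {
            uint64_t partial = read_one_byte(addr + i);

            data &= ~(0xffULL << (i * 8));  /* clear the byte's slot */
            data |= partial << (i * 8);     /* merge the byte back in */
        }

        return data;
    }

    int main(void)
    {
        /* A 4-byte read starting at an odd, i.e. unaligned, offset. */
        printf("0x%llx\n",
               (unsigned long long)split_unaligned_read(1, 4));
        return 0;
    }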
diff --git a/0055-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch b/0055-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
new file mode 100644
index 0000000..9c05f3a
--- /dev/null
+++ b/0055-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
@@ -0,0 +1,59 @@
+From 06264af090ac69a95cdadbc261cc82d964dcb568 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Fri, 31 Mar 2023 08:42:02 +0200
+Subject: [PATCH 55/61] ns16550: correct name/value pair parsing for PCI
+ port/bridge
+
+First of all, these were inverted: "bridge=" caused the port coordinates
+to be established, while "port=" controlled the bridge coordinates. And
+then the error messages being identical also wasn't helpful. While
+correcting this, also move both case blocks close together.
+
+Fixes: 97fd49a7e074 ("ns16550: add support for UART parameters to be specifed with name-value pairs")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: e692b22230b411d762ac9e278a398e28df474eae
+master date: 2023-03-29 14:55:37 +0200
+---
+ xen/drivers/char/ns16550.c | 16 ++++++++--------
+ 1 file changed, 8 insertions(+), 8 deletions(-)
+
+diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
+index 5dd4d723f5..3651e0c0d4 100644
+--- a/xen/drivers/char/ns16550.c
++++ b/xen/drivers/char/ns16550.c
+@@ -1536,13 +1536,6 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
+ break;
+
+ #ifdef CONFIG_HAS_PCI
+- case bridge_bdf:
+- if ( !parse_pci(param_value, NULL, &uart->ps_bdf[0],
+- &uart->ps_bdf[1], &uart->ps_bdf[2]) )
+- PARSE_ERR_RET("Bad port PCI coordinates\n");
+- uart->ps_bdf_enable = true;
+- break;
+-
+ case device:
+ if ( strncmp(param_value, "pci", 3) == 0 )
+ {
+@@ -1557,9 +1550,16 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
+ break;
+
+ case port_bdf:
++ if ( !parse_pci(param_value, NULL, &uart->ps_bdf[0],
++ &uart->ps_bdf[1], &uart->ps_bdf[2]) )
++ PARSE_ERR_RET("Bad port PCI coordinates\n");
++ uart->ps_bdf_enable = true;
++ break;
++
++ case bridge_bdf:
+ if ( !parse_pci(param_value, NULL, &uart->pb_bdf[0],
+ &uart->pb_bdf[1], &uart->pb_bdf[2]) )
+- PARSE_ERR_RET("Bad port PCI coordinates\n");
++ PARSE_ERR_RET("Bad bridge PCI coordinates\n");
+ uart->pb_bdf_enable = true;
+ break;
+ #endif
+--
+2.40.0
+
diff --git a/0055-tools-xenstore-move-the-call-of-setup_structure-to-d.patch b/0055-tools-xenstore-move-the-call-of-setup_structure-to-d.patch
deleted file mode 100644
index 5a8abbd..0000000
--- a/0055-tools-xenstore-move-the-call-of-setup_structure-to-d.patch
+++ /dev/null
@@ -1,96 +0,0 @@
-From 2d39cf77d70b44b70f970da90187f48d2c0b3e96 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:09 +0200
-Subject: [PATCH 55/87] tools/xenstore: move the call of setup_structure() to
- dom0 introduction
-
-Setting up the basic structure when introducing dom0 has the advantage
-of being able to add proper node memory accounting for the added nodes
-later.
-
-This makes it possible to do proper node accounting, too.
-
-An additional requirement to make that work fine is to correct the
-owner of the created nodes to be dom0_domid instead of domid 0.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 60e2f6020dea7f616857b8fc1141b1c085d88761)
----
- tools/xenstore/xenstored_core.c | 9 ++++-----
- tools/xenstore/xenstored_core.h | 1 +
- tools/xenstore/xenstored_domain.c | 3 +++
- 3 files changed, 8 insertions(+), 5 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index f835aa1b2f1f..5171d34c947e 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -2039,7 +2039,8 @@ static int tdb_flags;
- static void manual_node(const char *name, const char *child)
- {
- struct node *node;
-- struct xs_permissions perms = { .id = 0, .perms = XS_PERM_NONE };
-+ struct xs_permissions perms = { .id = dom0_domid,
-+ .perms = XS_PERM_NONE };
-
- node = talloc_zero(NULL, struct node);
- if (!node)
-@@ -2078,7 +2079,7 @@ static void tdb_logger(TDB_CONTEXT *tdb, int level, const char * fmt, ...)
- }
- }
-
--static void setup_structure(bool live_update)
-+void setup_structure(bool live_update)
- {
- char *tdbname;
-
-@@ -2101,6 +2102,7 @@ static void setup_structure(bool live_update)
- manual_node("/", "tool");
- manual_node("/tool", "xenstored");
- manual_node("/tool/xenstored", NULL);
-+ domain_entry_fix(dom0_domid, 3, true);
- }
-
- check_store();
-@@ -2614,9 +2616,6 @@ int main(int argc, char *argv[])
-
- init_pipe(reopen_log_pipe);
-
-- /* Setup the database */
-- setup_structure(live_update);
--
- /* Listen to hypervisor. */
- if (!no_domain_init && !live_update) {
- domain_init(-1);
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 245f9258235f..2c77ec7ee0f4 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -231,6 +231,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
- struct node *read_node(struct connection *conn, const void *ctx,
- const char *name);
-
-+void setup_structure(bool live_update);
- struct connection *new_connection(const struct interface_funcs *funcs);
- struct connection *get_connection_by_id(unsigned int conn_id);
- void ignore_connection(struct connection *conn);
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index 260952e09096..f04b7aae8a32 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -470,6 +470,9 @@ static struct domain *introduce_domain(const void *ctx,
- }
- domain->interface = interface;
-
-+ if (is_master_domain)
-+ setup_structure(restore);
-+
- /* Now domain belongs to its connection. */
- talloc_steal(domain->conn, domain);
-
---
-2.37.4
-
diff --git a/0056-bump-default-SeaBIOS-version-to-1.16.0.patch b/0056-bump-default-SeaBIOS-version-to-1.16.0.patch
new file mode 100644
index 0000000..37d9b67
--- /dev/null
+++ b/0056-bump-default-SeaBIOS-version-to-1.16.0.patch
@@ -0,0 +1,28 @@
+From 2a4d327387601b60c9844a5b0cc44de28792ea52 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Fri, 6 May 2022 14:46:52 +0200
+Subject: [PATCH 56/61] bump default SeaBIOS version to 1.16.0
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit 944e389daa133dd310d87c4eebacba9f6da76018)
+---
+ Config.mk | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/Config.mk b/Config.mk
+index 1215c2725b..073715c28d 100644
+--- a/Config.mk
++++ b/Config.mk
+@@ -241,7 +241,7 @@ OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5
+ QEMU_UPSTREAM_REVISION ?= qemu-xen-4.16.3
+ MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.16.3
+
+-SEABIOS_UPSTREAM_REVISION ?= rel-1.14.0
++SEABIOS_UPSTREAM_REVISION ?= rel-1.16.0
+
+ ETHERBOOT_NICS ?= rtl8139 8086100e
+
+--
+2.40.0
+
diff --git a/0056-tools-xenstore-add-infrastructure-to-keep-track-of-p.patch b/0056-tools-xenstore-add-infrastructure-to-keep-track-of-p.patch
deleted file mode 100644
index b92c61c..0000000
--- a/0056-tools-xenstore-add-infrastructure-to-keep-track-of-p.patch
+++ /dev/null
@@ -1,289 +0,0 @@
-From 2e406cf5fbb817341dc860473158382057e13de5 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:09 +0200
-Subject: [PATCH 56/87] tools/xenstore: add infrastructure to keep track of per
- domain memory usage
-
-The amount of memory a domain can consume in Xenstore is limited by
-various quotas today, but even with sane quotas a domain can still
-consume rather large memory quantities.
-
-Add the infrastructure for keeping track of the amount of memory a
-domain is consuming in Xenstore. Note that this is only the memory a
-domain has direct control over, so any internal administration data
-needed by Xenstore only is not being accounted for.
-
-There are two quotas defined: a soft quota which will result in a
-warning issued via syslog() when it is exceeded, and a hard quota
-which stops further requests or watch events from being accepted as
-long as accepting them would violate the hard quota.
-
-Setting any of those quotas to 0 will disable it.
-
-As default values use 2MB per domain for the soft limit (this basically
-covers the allowed case to create 1000 nodes needing 2kB each), and
-2.5MB for the hard limit.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 0d4a8ec7a93faedbe54fd197db146de628459e77)
----
- tools/xenstore/xenstored_core.c | 30 ++++++++--
- tools/xenstore/xenstored_core.h | 2 +
- tools/xenstore/xenstored_domain.c | 93 +++++++++++++++++++++++++++++++
- tools/xenstore/xenstored_domain.h | 20 +++++++
- 4 files changed, 139 insertions(+), 6 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 5171d34c947e..b2bf6740d430 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -109,6 +109,8 @@ int quota_nb_perms_per_node = 5;
- int quota_trans_nodes = 1024;
- int quota_max_path_len = XENSTORE_REL_PATH_MAX;
- int quota_req_outstanding = 20;
-+int quota_memory_per_domain_soft = 2 * 1024 * 1024; /* 2 MB */
-+int quota_memory_per_domain_hard = 2 * 1024 * 1024 + 512 * 1024; /* 2.5 MB */
-
- unsigned int timeout_watch_event_msec = 20000;
-
-@@ -2406,7 +2408,14 @@ static void usage(void)
- " quotas are:\n"
- " transaction-nodes: number of accessed node per\n"
- " transaction\n"
-+" memory: total used memory per domain for nodes,\n"
-+" transactions, watches and requests, above\n"
-+" which Xenstore will stop talking to domain\n"
- " outstanding: number of outstanding requests\n"
-+" -q, --quota-soft <what>=<nb> set a soft quota <what> to the value <nb>,\n"
-+" causing a warning to be issued via syslog() if the\n"
-+" limit is violated, allowed quotas are:\n"
-+" memory: see above\n"
- " -w, --timeout <what>=<seconds> set the timeout in seconds for <what>,\n"
- " allowed timeout candidates are:\n"
- " watch-event: time a watch-event is kept pending\n"
-@@ -2433,6 +2442,7 @@ static struct option options[] = {
- { "perm-nb", 1, NULL, 'A' },
- { "path-max", 1, NULL, 'M' },
- { "quota", 1, NULL, 'Q' },
-+ { "quota-soft", 1, NULL, 'q' },
- { "timeout", 1, NULL, 'w' },
- { "no-recovery", 0, NULL, 'R' },
- { "internal-db", 0, NULL, 'I' },
-@@ -2480,7 +2490,7 @@ static void set_timeout(const char *arg)
- barf("unknown timeout \"%s\"\n", arg);
- }
-
--static void set_quota(const char *arg)
-+static void set_quota(const char *arg, bool soft)
- {
- const char *eq = strchr(arg, '=');
- int val;
-@@ -2488,11 +2498,16 @@ static void set_quota(const char *arg)
- if (!eq)
- barf("quotas must be specified via <what>=<nb>\n");
- val = get_optval_int(eq + 1);
-- if (what_matches(arg, "outstanding"))
-+ if (what_matches(arg, "outstanding") && !soft)
- quota_req_outstanding = val;
-- else if (what_matches(arg, "transaction-nodes"))
-+ else if (what_matches(arg, "transaction-nodes") && !soft)
- quota_trans_nodes = val;
-- else
-+ else if (what_matches(arg, "memory")) {
-+ if (soft)
-+ quota_memory_per_domain_soft = val;
-+ else
-+ quota_memory_per_domain_hard = val;
-+ } else
- barf("unknown quota \"%s\"\n", arg);
- }
-
-@@ -2510,7 +2525,7 @@ int main(int argc, char *argv[])
- orig_argc = argc;
- orig_argv = argv;
-
-- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:T:RVW:w:U",
-+ while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:q:T:RVW:w:U",
- options, NULL)) != -1) {
- switch (opt) {
- case 'D':
-@@ -2561,7 +2576,10 @@ int main(int argc, char *argv[])
- quota_max_path_len);
- break;
- case 'Q':
-- set_quota(optarg);
-+ set_quota(optarg, false);
-+ break;
-+ case 'q':
-+ set_quota(optarg, true);
- break;
- case 'w':
- set_timeout(optarg);
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 2c77ec7ee0f4..373af18297bf 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -270,6 +270,8 @@ extern int priv_domid;
- extern int quota_nb_entry_per_domain;
- extern int quota_req_outstanding;
- extern int quota_trans_nodes;
-+extern int quota_memory_per_domain_soft;
-+extern int quota_memory_per_domain_hard;
-
- extern unsigned int timeout_watch_event_msec;
-
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index f04b7aae8a32..94fd561e9de4 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -76,6 +76,13 @@ struct domain
- /* number of entry from this domain in the store */
- int nbentry;
-
-+ /* Amount of memory allocated for this domain. */
-+ int memory;
-+ bool soft_quota_reported;
-+ bool hard_quota_reported;
-+ time_t mem_last_msg;
-+#define MEM_WARN_MINTIME_SEC 10
-+
- /* number of watch for this domain */
- int nbwatch;
-
-@@ -192,6 +199,9 @@ static bool domain_can_read(struct connection *conn)
- return false;
- if (conn->domain->nboutstanding >= quota_req_outstanding)
- return false;
-+ if (conn->domain->memory >= quota_memory_per_domain_hard &&
-+ quota_memory_per_domain_hard)
-+ return false;
- }
-
- return (intf->req_cons != intf->req_prod);
-@@ -950,6 +960,89 @@ int domain_entry(struct connection *conn)
- : 0;
- }
-
-+static bool domain_chk_quota(struct domain *domain, int mem)
-+{
-+ time_t now;
-+
-+ if (!domain || !domid_is_unprivileged(domain->domid) ||
-+ (domain->conn && domain->conn->is_ignored))
-+ return false;
-+
-+ now = time(NULL);
-+
-+ if (mem >= quota_memory_per_domain_hard &&
-+ quota_memory_per_domain_hard) {
-+ if (domain->hard_quota_reported)
-+ return true;
-+ syslog(LOG_ERR, "Domain %u exceeds hard memory quota, Xenstore interface to domain stalled\n",
-+ domain->domid);
-+ domain->mem_last_msg = now;
-+ domain->hard_quota_reported = true;
-+ return true;
-+ }
-+
-+ if (now - domain->mem_last_msg >= MEM_WARN_MINTIME_SEC) {
-+ if (domain->hard_quota_reported) {
-+ domain->mem_last_msg = now;
-+ domain->hard_quota_reported = false;
-+ syslog(LOG_INFO, "Domain %u below hard memory quota again\n",
-+ domain->domid);
-+ }
-+ if (mem >= quota_memory_per_domain_soft &&
-+ quota_memory_per_domain_soft &&
-+ !domain->soft_quota_reported) {
-+ domain->mem_last_msg = now;
-+ domain->soft_quota_reported = true;
-+ syslog(LOG_WARNING, "Domain %u exceeds soft memory quota\n",
-+ domain->domid);
-+ }
-+ if (mem < quota_memory_per_domain_soft &&
-+ domain->soft_quota_reported) {
-+ domain->mem_last_msg = now;
-+ domain->soft_quota_reported = false;
-+ syslog(LOG_INFO, "Domain %u below soft memory quota again\n",
-+ domain->domid);
-+ }
-+
-+ }
-+
-+ return false;
-+}
-+
-+int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
-+{
-+ struct domain *domain;
-+
-+ domain = find_domain_struct(domid);
-+ if (domain) {
-+ /*
-+ * domain_chk_quota() will print warning and also store whether
-+ * the soft/hard quota has been hit. So check no_quota_check
-+ * *after*.
-+ */
-+ if (domain_chk_quota(domain, domain->memory + mem) &&
-+ !no_quota_check)
-+ return ENOMEM;
-+ domain->memory += mem;
-+ } else {
-+ /*
-+ * The domain the memory is to be accounted for should always
-+ * exist, as accounting is done either for a domain related to
-+ * the current connection, or for the domain owning a node
-+ * (which is always existing, as the owner of the node is
-+ * tested to exist and replaced by domid 0 if not).
-+ * So not finding the related domain MUST be an error in the
-+ * data base.
-+ */
-+ errno = ENOENT;
-+ corrupt(NULL, "Accounting called for non-existing domain %u\n",
-+ domid);
-+ return ENOENT;
-+ }
-+
-+ return 0;
-+}
-+
- void domain_watch_inc(struct connection *conn)
- {
- if (!conn || !conn->domain)
-diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
-index d6519904d831..633c9a0a0a1f 100644
---- a/tools/xenstore/xenstored_domain.h
-+++ b/tools/xenstore/xenstored_domain.h
-@@ -61,6 +61,26 @@ int domain_entry_inc(struct connection *conn, struct node *);
- void domain_entry_dec(struct connection *conn, struct node *);
- int domain_entry_fix(unsigned int domid, int num, bool update);
- int domain_entry(struct connection *conn);
-+int domain_memory_add(unsigned int domid, int mem, bool no_quota_check);
-+
-+/*
-+ * domain_memory_add_chk(): to be used when memory quota should be checked.
-+ * Not to be used when specifying a negative mem value, as lowering the used
-+ * memory should always be allowed.
-+ */
-+static inline int domain_memory_add_chk(unsigned int domid, int mem)
-+{
-+ return domain_memory_add(domid, mem, false);
-+}
-+/*
-+ * domain_memory_add_nochk(): to be used when memory quota should not be
-+ * checked, e.g. when lowering memory usage, or in an error case for undoing
-+ * a previous memory adjustment.
-+ */
-+static inline void domain_memory_add_nochk(unsigned int domid, int mem)
-+{
-+ domain_memory_add(domid, mem, true);
-+}
- void domain_watch_inc(struct connection *conn);
- void domain_watch_dec(struct connection *conn);
- int domain_watch(struct connection *conn);
---
-2.37.4
-
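A condensed sketch of the soft/hard quota policy described in the dropped
patch above; the quota defaults and MEM_WARN_MINTIME_SEC mirror the patch,
while the structure and logging below are simplified assumptions rather than
the actual xenstored implementation:

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    #define MEM_WARN_MINTIME_SEC 10

    static int quota_memory_soft = 2 * 1024 * 1024;              /* warn */
    static int quota_memory_hard = 2 * 1024 * 1024 + 512 * 1024; /* stall */

    struct dom_acct {
        int memory;
        time_t last_msg;
        bool soft_reported;
        bool hard_reported;
    };

    /* Return true when the domain should be stalled (hard quota exceeded). */
    static bool check_quota(struct dom_acct *d, int new_mem)
    {
        time_t now = time(NULL);

        if (quota_memory_hard && new_mem >= quota_memory_hard) {
            if (!d->hard_reported) {
                fprintf(stderr, "hard memory quota exceeded, stalling\n");
                d->hard_reported = true;
                d->last_msg = now;
            }
            return true;
        }

        if (now - d->last_msg >= MEM_WARN_MINTIME_SEC &&
            quota_memory_soft && new_mem >= quota_memory_soft &&
            !d->soft_reported) {
            fprintf(stderr, "soft memory quota exceeded\n");
            d->soft_reported = true;
            d->last_msg = now;
        }

        return false;
    }

    int main(void)
    {
        struct dom_acct d = { 0 };

        printf("stall=%d\n", check_quota(&d, 3 * 1024 * 1024));
        return 0;
    }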
diff --git a/0057-CI-Drop-automation-configs.patch b/0057-CI-Drop-automation-configs.patch
new file mode 100644
index 0000000..d726468
--- /dev/null
+++ b/0057-CI-Drop-automation-configs.patch
@@ -0,0 +1,87 @@
+From 657dc5f5f6269008fd7484ca7cca723e21455483 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 29 Dec 2022 15:39:13 +0000
+Subject: [PATCH 57/61] CI: Drop automation/configs/
+
+Having 3 extra hypervisor builds at the end of a full build is deeply
+confusing to debug if one of them fails, because the .config file presented in
+the artefacts is not the one which caused a build failure. Also, the log
+tends to be truncated in the UI.
+
+PV-only is tested as part of PV-Shim in a full build anyway, so it doesn't
+need repeating. The HVM-only and neither-PV-nor-HVM configurations appear
+frequently in randconfig, so drop all the logic here to simplify things.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Michal Orzel <michal.orzel@amd.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+(cherry picked from commit 7b20009a812f26e74bdbde2ab96165376b3dad34)
+---
+ automation/configs/x86/hvm_only_config | 3 ---
+ automation/configs/x86/no_hvm_pv_config | 3 ---
+ automation/configs/x86/pv_only_config | 3 ---
+ automation/scripts/build | 21 ---------------------
+ 4 files changed, 30 deletions(-)
+ delete mode 100644 automation/configs/x86/hvm_only_config
+ delete mode 100644 automation/configs/x86/no_hvm_pv_config
+ delete mode 100644 automation/configs/x86/pv_only_config
+
+diff --git a/automation/configs/x86/hvm_only_config b/automation/configs/x86/hvm_only_config
+deleted file mode 100644
+index 9efbddd535..0000000000
+--- a/automation/configs/x86/hvm_only_config
++++ /dev/null
+@@ -1,3 +0,0 @@
+-CONFIG_HVM=y
+-# CONFIG_PV is not set
+-# CONFIG_DEBUG is not set
+diff --git a/automation/configs/x86/no_hvm_pv_config b/automation/configs/x86/no_hvm_pv_config
+deleted file mode 100644
+index 0bf6a8e468..0000000000
+--- a/automation/configs/x86/no_hvm_pv_config
++++ /dev/null
+@@ -1,3 +0,0 @@
+-# CONFIG_HVM is not set
+-# CONFIG_PV is not set
+-# CONFIG_DEBUG is not set
+diff --git a/automation/configs/x86/pv_only_config b/automation/configs/x86/pv_only_config
+deleted file mode 100644
+index e9d8b4a7c7..0000000000
+--- a/automation/configs/x86/pv_only_config
++++ /dev/null
+@@ -1,3 +0,0 @@
+-CONFIG_PV=y
+-# CONFIG_HVM is not set
+-# CONFIG_DEBUG is not set
+diff --git a/automation/scripts/build b/automation/scripts/build
+index 281f8b1fcc..2c807fa397 100755
+--- a/automation/scripts/build
++++ b/automation/scripts/build
+@@ -73,24 +73,3 @@ if [[ "${XEN_TARGET_ARCH}" != "x86_32" ]]; then
+ cp -r dist binaries/
+ fi
+ fi
+-
+-if [[ "${hypervisor_only}" == "y" ]]; then
+- # If we are build testing a specific Kconfig exit now, there's no point in
+- # testing all the possible configs.
+- exit 0
+-fi
+-
+-# Build all the configs we care about
+-case ${XEN_TARGET_ARCH} in
+- x86_64) arch=x86 ;;
+- *) exit 0 ;;
+-esac
+-
+-cfg_dir="automation/configs/${arch}"
+-for cfg in `ls ${cfg_dir}`; do
+- echo "Building $cfg"
+- make -j$(nproc) -C xen clean
+- rm -f xen/.config
+- make -C xen KBUILD_DEFCONFIG=../../../../${cfg_dir}/${cfg} XEN_CONFIG_EXPERT=y defconfig
+- make -j$(nproc) -C xen XEN_CONFIG_EXPERT=y
+-done
+--
+2.40.0
+
diff --git a/0057-tools-xenstore-add-memory-accounting-for-responses.patch b/0057-tools-xenstore-add-memory-accounting-for-responses.patch
deleted file mode 100644
index 9dd565d..0000000
--- a/0057-tools-xenstore-add-memory-accounting-for-responses.patch
+++ /dev/null
@@ -1,82 +0,0 @@
-From 30c8e752f66f681b5c731a637c26510ae5f35965 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:09 +0200
-Subject: [PATCH 57/87] tools/xenstore: add memory accounting for responses
-
-Add the memory accounting for queued responses.
-
-In case adding a watch event for a guest is causing the hard memory
-quota of that guest to be violated, the event is dropped. This will
-ensure that it is impossible to drive another guest past its memory
-quota by generating insane amounts of events for that guest. This is
-especially important for protecting driver domains from that attack
-vector.
-
-This is part of XSA-326 / CVE-2022-42315.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit f6d00133643a524d2138c9e3f192bbde719050ba)
----
- tools/xenstore/xenstored_core.c | 22 +++++++++++++++++++---
- 1 file changed, 19 insertions(+), 3 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index b2bf6740d430..ecab6cfbbe15 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -260,6 +260,8 @@ static void free_buffered_data(struct buffered_data *out,
- }
- }
-
-+ domain_memory_add_nochk(conn->id, -out->hdr.msg.len - sizeof(out->hdr));
-+
- if (out->hdr.msg.type == XS_WATCH_EVENT) {
- req = out->pend.req;
- if (req) {
-@@ -938,11 +940,14 @@ void send_reply(struct connection *conn, enum xsd_sockmsg_type type,
- bdata->timeout_msec = 0;
- bdata->watch_event = false;
-
-- if (len <= DEFAULT_BUFFER_SIZE)
-+ if (len <= DEFAULT_BUFFER_SIZE) {
- bdata->buffer = bdata->default_buffer;
-- else {
-+ /* Don't check quota, path might be used for returning error. */
-+ domain_memory_add_nochk(conn->id, len + sizeof(bdata->hdr));
-+ } else {
- bdata->buffer = talloc_array(bdata, char, len);
-- if (!bdata->buffer) {
-+ if (!bdata->buffer ||
-+ domain_memory_add_chk(conn->id, len + sizeof(bdata->hdr))) {
- send_error(conn, ENOMEM);
- return;
- }
-@@ -1007,6 +1012,11 @@ void send_event(struct buffered_data *req, struct connection *conn,
- }
- }
-
-+ if (domain_memory_add_chk(conn->id, len + sizeof(bdata->hdr))) {
-+ talloc_free(bdata);
-+ return;
-+ }
-+
- if (timeout_watch_event_msec && domain_is_unprivileged(conn)) {
- bdata->timeout_msec = get_now_msec() + timeout_watch_event_msec;
- if (!conn->timeout_msec)
-@@ -3039,6 +3049,12 @@ static void add_buffered_data(struct buffered_data *bdata,
- */
- if (bdata->hdr.msg.type != XS_WATCH_EVENT)
- domain_outstanding_inc(conn);
-+ /*
-+ * We are restoring the state after Live-Update and the new quota may
-+ * be smaller. So ignore it. The limit will be applied for any resource
-+ * after the state has been fully restored.
-+ */
-+ domain_memory_add_nochk(conn->id, len + sizeof(bdata->hdr));
- }
-
- void read_state_buffered_data(const void *ctx, struct connection *conn,
---
-2.37.4
-
diff --git a/0058-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch b/0058-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
new file mode 100644
index 0000000..92d65ec
--- /dev/null
+++ b/0058-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
@@ -0,0 +1,87 @@
+From 37800cf8ab7806e506b96a13cad0fb395d86663a Mon Sep 17 00:00:00 2001
+From: Michal Orzel <michal.orzel@amd.com>
+Date: Tue, 14 Feb 2023 16:38:38 +0100
+Subject: [PATCH 58/61] automation: Switch arm32 cross builds to run on arm64
+
+Due to the limited x86 CI resources slowing down the whole pipeline,
+switch the arm32 cross builds to be executed on arm64, which is much more
+capable. For that, rename the existing debian container dockerfile
+from unstable-arm32-gcc to unstable-arm64v8-arm32-gcc and use
+arm64v8/debian:unstable as an image. Note that we cannot use the same
+container name as we have to keep backwards compatibility.
+Take the opportunity to remove an extra empty line at the end of the file.
+
+Modify the tag of .arm32-cross-build-tmpl to arm64 and update the build
+jobs accordingly.
+
+Signed-off-by: Michal Orzel <michal.orzel@amd.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+(cherry picked from commit a35fccc8df93de7154dba87db6e7bcf391e9d51c)
+---
+ ...ockerfile => unstable-arm64v8-arm32-gcc.dockerfile} | 3 +--
+ automation/gitlab-ci/build.yaml | 10 +++++-----
+ 2 files changed, 6 insertions(+), 7 deletions(-)
+ rename automation/build/debian/{unstable-arm32-gcc.dockerfile => unstable-arm64v8-arm32-gcc.dockerfile} (94%)
+
+diff --git a/automation/build/debian/unstable-arm32-gcc.dockerfile b/automation/build/debian/unstable-arm64v8-arm32-gcc.dockerfile
+similarity index 94%
+rename from automation/build/debian/unstable-arm32-gcc.dockerfile
+rename to automation/build/debian/unstable-arm64v8-arm32-gcc.dockerfile
+index b41a57f197..11860425a6 100644
+--- a/automation/build/debian/unstable-arm32-gcc.dockerfile
++++ b/automation/build/debian/unstable-arm64v8-arm32-gcc.dockerfile
+@@ -1,4 +1,4 @@
+-FROM debian:unstable
++FROM arm64v8/debian:unstable
+ LABEL maintainer.name="The Xen Project" \
+ maintainer.email="xen-devel@lists.xenproject.org"
+
+@@ -21,4 +21,3 @@ RUN apt-get update && \
+ apt-get autoremove -y && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
+-
+diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
+index 06a75a8c5a..f66fbca8a7 100644
+--- a/automation/gitlab-ci/build.yaml
++++ b/automation/gitlab-ci/build.yaml
+@@ -123,7 +123,7 @@
+ variables:
+ XEN_TARGET_ARCH: arm32
+ tags:
+- - x86_64
++ - arm64
+
+ .arm32-cross-build:
+ extends: .arm32-cross-build-tmpl
+@@ -497,23 +497,23 @@ alpine-3.12-clang-debug:
+ debian-unstable-gcc-arm32:
+ extends: .gcc-arm32-cross-build
+ variables:
+- CONTAINER: debian:unstable-arm32-gcc
++ CONTAINER: debian:unstable-arm64v8-arm32-gcc
+
+ debian-unstable-gcc-arm32-debug:
+ extends: .gcc-arm32-cross-build-debug
+ variables:
+- CONTAINER: debian:unstable-arm32-gcc
++ CONTAINER: debian:unstable-arm64v8-arm32-gcc
+
+ debian-unstable-gcc-arm32-randconfig:
+ extends: .gcc-arm32-cross-build
+ variables:
+- CONTAINER: debian:unstable-arm32-gcc
++ CONTAINER: debian:unstable-arm64v8-arm32-gcc
+ RANDCONFIG: y
+
+ debian-unstable-gcc-arm32-debug-randconfig:
+ extends: .gcc-arm32-cross-build-debug
+ variables:
+- CONTAINER: debian:unstable-arm32-gcc
++ CONTAINER: debian:unstable-arm64v8-arm32-gcc
+ RANDCONFIG: y
+
+ # Arm builds
+--
+2.40.0
+
diff --git a/0058-tools-xenstore-add-memory-accounting-for-watches.patch b/0058-tools-xenstore-add-memory-accounting-for-watches.patch
deleted file mode 100644
index dc6b80c..0000000
--- a/0058-tools-xenstore-add-memory-accounting-for-watches.patch
+++ /dev/null
@@ -1,96 +0,0 @@
-From bce985745cde48a339954759677b77d3eeec41f3 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:10 +0200
-Subject: [PATCH 58/87] tools/xenstore: add memory accounting for watches
-
-Add the memory accounting for registered watches.
-
-When a socket connection is destroyed, the associated watches are
-removed, too. In order to keep memory accounting correct the watches
-must be removed explicitly via a call of conn_delete_all_watches() from
-destroy_conn().
-
-This is part of XSA-326 / CVE-2022-42315.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 7f9978a2cc37aaffab2fb09593bc598c0712a69b)
----
- tools/xenstore/xenstored_core.c | 1 +
- tools/xenstore/xenstored_watch.c | 13 ++++++++++---
- 2 files changed, 11 insertions(+), 3 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index ecab6cfbbe15..d86942f5aa77 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -463,6 +463,7 @@ static int destroy_conn(void *_conn)
- }
-
- conn_free_buffered_data(conn);
-+ conn_delete_all_watches(conn);
- list_for_each_entry(req, &conn->ref_list, list)
- req->on_ref_list = false;
-
-diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
-index 0755ffa375ba..fdf9b2d653a0 100644
---- a/tools/xenstore/xenstored_watch.c
-+++ b/tools/xenstore/xenstored_watch.c
-@@ -211,7 +211,7 @@ static int check_watch_path(struct connection *conn, const void *ctx,
- }
-
- static struct watch *add_watch(struct connection *conn, char *path, char *token,
-- bool relative)
-+ bool relative, bool no_quota_check)
- {
- struct watch *watch;
-
-@@ -222,6 +222,9 @@ static struct watch *add_watch(struct connection *conn, char *path, char *token,
- watch->token = talloc_strdup(watch, token);
- if (!watch->node || !watch->token)
- goto nomem;
-+ if (domain_memory_add(conn->id, strlen(path) + strlen(token),
-+ no_quota_check))
-+ goto nomem;
-
- if (relative)
- watch->relative_path = get_implicit_path(conn);
-@@ -265,7 +268,7 @@ int do_watch(struct connection *conn, struct buffered_data *in)
- if (domain_watch(conn) > quota_nb_watch_per_domain)
- return E2BIG;
-
-- watch = add_watch(conn, vec[0], vec[1], relative);
-+ watch = add_watch(conn, vec[0], vec[1], relative, false);
- if (!watch)
- return errno;
-
-@@ -296,6 +299,8 @@ int do_unwatch(struct connection *conn, struct buffered_data *in)
- list_for_each_entry(watch, &conn->watches, list) {
- if (streq(watch->node, node) && streq(watch->token, vec[1])) {
- list_del(&watch->list);
-+ domain_memory_add_nochk(conn->id, -strlen(watch->node) -
-+ strlen(watch->token));
- talloc_free(watch);
- domain_watch_dec(conn);
- send_ack(conn, XS_UNWATCH);
-@@ -311,6 +316,8 @@ void conn_delete_all_watches(struct connection *conn)
-
- while ((watch = list_top(&conn->watches, struct watch, list))) {
- list_del(&watch->list);
-+ domain_memory_add_nochk(conn->id, -strlen(watch->node) -
-+ strlen(watch->token));
- talloc_free(watch);
- domain_watch_dec(conn);
- }
-@@ -373,7 +380,7 @@ void read_state_watch(const void *ctx, const void *state)
- if (!path)
- barf("allocation error for read watch");
-
-- if (!add_watch(conn, path, token, relative))
-+ if (!add_watch(conn, path, token, relative, true))
- barf("error adding watch");
- }
-
---
-2.37.4
-
diff --git a/0059-automation-Remove-CentOS-7.2-containers-and-builds.patch b/0059-automation-Remove-CentOS-7.2-containers-and-builds.patch
new file mode 100644
index 0000000..8d58eea
--- /dev/null
+++ b/0059-automation-Remove-CentOS-7.2-containers-and-builds.patch
@@ -0,0 +1,145 @@
+From a4d901580b2ab3133bca13159b790914c217b0e2 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 21 Feb 2023 16:55:36 +0000
+Subject: [PATCH 59/61] automation: Remove CentOS 7.2 containers and builds
+
+We already have a container which tracks the latest CentOS 7, so there
+is no need for this one as well.
+
+Also, 7.2 has an outdated root certificate which prevents connections
+to websites which use Let's Encrypt.
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit ba512629f76dfddb39ea9133ee51cdd9e392a927)
+---
+ automation/build/centos/7.2.dockerfile | 52 -------------------------
+ automation/build/centos/CentOS-7.2.repo | 35 -----------------
+ automation/gitlab-ci/build.yaml | 10 -----
+ 3 files changed, 97 deletions(-)
+ delete mode 100644 automation/build/centos/7.2.dockerfile
+ delete mode 100644 automation/build/centos/CentOS-7.2.repo
+
+diff --git a/automation/build/centos/7.2.dockerfile b/automation/build/centos/7.2.dockerfile
+deleted file mode 100644
+index 4baa097e31..0000000000
+--- a/automation/build/centos/7.2.dockerfile
++++ /dev/null
+@@ -1,52 +0,0 @@
+-FROM centos:7.2.1511
+-LABEL maintainer.name="The Xen Project" \
+- maintainer.email="xen-devel@lists.xenproject.org"
+-
+-# ensure we only get bits from the vault for
+-# the version we want
+-COPY CentOS-7.2.repo /etc/yum.repos.d/CentOS-Base.repo
+-
+-# install EPEL for dev86, xz-devel and possibly other packages
+-RUN yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && \
+- yum clean all
+-
+-RUN mkdir /build
+-WORKDIR /build
+-
+-# work around https://github.com/moby/moby/issues/10180
+-# and install Xen depends
+-RUN rpm --rebuilddb && \
+- yum -y install \
+- yum-plugin-ovl \
+- gcc \
+- gcc-c++ \
+- ncurses-devel \
+- zlib-devel \
+- openssl-devel \
+- python-devel \
+- libuuid-devel \
+- pkgconfig \
+- # gettext for Xen < 4.13
+- gettext \
+- flex \
+- bison \
+- libaio-devel \
+- glib2-devel \
+- yajl-devel \
+- pixman-devel \
+- glibc-devel \
+- # glibc-devel.i686 for Xen < 4.15
+- glibc-devel.i686 \
+- make \
+- binutils \
+- git \
+- wget \
+- acpica-tools \
+- python-markdown \
+- patch \
+- checkpolicy \
+- dev86 \
+- xz-devel \
+- bzip2 \
+- nasm \
+- && yum clean all
+diff --git a/automation/build/centos/CentOS-7.2.repo b/automation/build/centos/CentOS-7.2.repo
+deleted file mode 100644
+index 4da27faeb5..0000000000
+--- a/automation/build/centos/CentOS-7.2.repo
++++ /dev/null
+@@ -1,35 +0,0 @@
+-# CentOS-Base.repo
+-#
+-# This is a replacement file that pins things to just use CentOS 7.2
+-# from the CentOS Vault.
+-#
+-
+-[base]
+-name=CentOS-7.2.1511 - Base
+-baseurl=http://vault.centos.org/7.2.1511/os/$basearch/
+-gpgcheck=1
+-gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
+-
+-#released updates
+-[updates]
+-name=CentOS-7.2.1511 - Updates
+-baseurl=http://vault.centos.org/7.2.1511/updates/$basearch/
+-gpgcheck=1
+-gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
+-
+-#additional packages that may be useful
+-[extras]
+-name=CentOS-7.2.1511 - Extras
+-baseurl=http://vault.centos.org/7.2.1511/extras/$basearch/
+-gpgcheck=1
+-gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
+-
+-#additional packages that extend functionality of existing packages
+-[centosplus]
+-name=CentOS-7.2.1511 - Plus
+-baseurl=http://vault.centos.org/7.2.1511/centosplus/$basearch/
+-gpgcheck=1
+-gpgcheck=1
+-enabled=0
+-gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
+-
+diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
+index f66fbca8a7..bc1a732069 100644
+--- a/automation/gitlab-ci/build.yaml
++++ b/automation/gitlab-ci/build.yaml
+@@ -184,16 +184,6 @@ archlinux-gcc-debug:
+ variables:
+ CONTAINER: archlinux:current
+
+-centos-7-2-gcc:
+- extends: .gcc-x86-64-build
+- variables:
+- CONTAINER: centos:7.2
+-
+-centos-7-2-gcc-debug:
+- extends: .gcc-x86-64-build-debug
+- variables:
+- CONTAINER: centos:7.2
+-
+ centos-7-gcc:
+ extends: .gcc-x86-64-build
+ variables:
+--
+2.40.0
+
diff --git a/0059-tools-xenstore-add-memory-accounting-for-nodes.patch b/0059-tools-xenstore-add-memory-accounting-for-nodes.patch
deleted file mode 100644
index a1ab308..0000000
--- a/0059-tools-xenstore-add-memory-accounting-for-nodes.patch
+++ /dev/null
@@ -1,342 +0,0 @@
-From 578d422af0b444a9e437dd0ceddf2049364f1a40 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:10 +0200
-Subject: [PATCH 59/87] tools/xenstore: add memory accounting for nodes
-
-Add the memory accounting for Xenstore nodes. In order to keep this
-not too complicated, allow for some sloppiness when writing nodes. Any
-hard quota violation will result in no further requests being accepted.
-
-This is part of XSA-326 / CVE-2022-42315.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 00e9e32d022be1afc144b75acdaeba8393e63315)
----
- tools/xenstore/xenstored_core.c | 140 ++++++++++++++++++++++---
- tools/xenstore/xenstored_core.h | 12 +++
- tools/xenstore/xenstored_transaction.c | 16 ++-
- 3 files changed, 151 insertions(+), 17 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index d86942f5aa77..16504de42017 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -591,6 +591,117 @@ void set_tdb_key(const char *name, TDB_DATA *key)
- key->dsize = strlen(name);
- }
-
-+static void get_acc_data(TDB_DATA *key, struct node_account_data *acc)
-+{
-+ TDB_DATA old_data;
-+ struct xs_tdb_record_hdr *hdr;
-+
-+ if (acc->memory < 0) {
-+ old_data = tdb_fetch(tdb_ctx, *key);
-+ /* No check for error, as the node might not exist. */
-+ if (old_data.dptr == NULL) {
-+ acc->memory = 0;
-+ } else {
-+ hdr = (void *)old_data.dptr;
-+ acc->memory = old_data.dsize;
-+ acc->domid = hdr->perms[0].id;
-+ }
-+ talloc_free(old_data.dptr);
-+ }
-+}
-+
-+/*
-+ * Per-transaction nodes need to be accounted for the transaction owner.
-+ * Those nodes are stored in the data base with the transaction generation
-+ * count prepended (e.g. 123/local/domain/...). So testing for the node's
-+ * key not to start with "/" is sufficient.
-+ */
-+static unsigned int get_acc_domid(struct connection *conn, TDB_DATA *key,
-+ unsigned int domid)
-+{
-+ return (!conn || key->dptr[0] == '/') ? domid : conn->id;
-+}
-+
-+int do_tdb_write(struct connection *conn, TDB_DATA *key, TDB_DATA *data,
-+ struct node_account_data *acc, bool no_quota_check)
-+{
-+ struct xs_tdb_record_hdr *hdr = (void *)data->dptr;
-+ struct node_account_data old_acc = {};
-+ unsigned int old_domid, new_domid;
-+ int ret;
-+
-+ if (!acc)
-+ old_acc.memory = -1;
-+ else
-+ old_acc = *acc;
-+
-+ get_acc_data(key, &old_acc);
-+ old_domid = get_acc_domid(conn, key, old_acc.domid);
-+ new_domid = get_acc_domid(conn, key, hdr->perms[0].id);
-+
-+ /*
-+ * Don't check for ENOENT, as we want to be able to switch orphaned
-+ * nodes to new owners.
-+ */
-+ if (old_acc.memory)
-+ domain_memory_add_nochk(old_domid,
-+ -old_acc.memory - key->dsize);
-+ ret = domain_memory_add(new_domid, data->dsize + key->dsize,
-+ no_quota_check);
-+ if (ret) {
-+ /* Error path, so no quota check. */
-+ if (old_acc.memory)
-+ domain_memory_add_nochk(old_domid,
-+ old_acc.memory + key->dsize);
-+ return ret;
-+ }
-+
-+ /* TDB should set errno, but doesn't even set ecode AFAICT. */
-+ if (tdb_store(tdb_ctx, *key, *data, TDB_REPLACE) != 0) {
-+ domain_memory_add_nochk(new_domid, -data->dsize - key->dsize);
-+ /* Error path, so no quota check. */
-+ if (old_acc.memory)
-+ domain_memory_add_nochk(old_domid,
-+ old_acc.memory + key->dsize);
-+ errno = EIO;
-+ return errno;
-+ }
-+
-+ if (acc) {
-+ /* Don't use new_domid, as it might be a transaction node. */
-+ acc->domid = hdr->perms[0].id;
-+ acc->memory = data->dsize;
-+ }
-+
-+ return 0;
-+}
-+
-+int do_tdb_delete(struct connection *conn, TDB_DATA *key,
-+ struct node_account_data *acc)
-+{
-+ struct node_account_data tmp_acc;
-+ unsigned int domid;
-+
-+ if (!acc) {
-+ acc = &tmp_acc;
-+ acc->memory = -1;
-+ }
-+
-+ get_acc_data(key, acc);
-+
-+ if (tdb_delete(tdb_ctx, *key)) {
-+ errno = EIO;
-+ return errno;
-+ }
-+
-+ if (acc->memory) {
-+ domid = get_acc_domid(conn, key, acc->domid);
-+ domain_memory_add_nochk(domid, -acc->memory - key->dsize);
-+ }
-+
-+ return 0;
-+}
-+
- /*
- * If it fails, returns NULL and sets errno.
- * Temporary memory allocations will be done with ctx.
-@@ -644,9 +755,15 @@ struct node *read_node(struct connection *conn, const void *ctx,
-
- /* Permissions are struct xs_permissions. */
- node->perms.p = hdr->perms;
-+ node->acc.domid = node->perms.p[0].id;
-+ node->acc.memory = data.dsize;
- if (domain_adjust_node_perms(conn, node))
- goto error;
-
-+ /* If owner is gone reset currently accounted memory size. */
-+ if (node->acc.domid != node->perms.p[0].id)
-+ node->acc.memory = 0;
-+
- /* Data is binary blob (usually ascii, no nul). */
- node->data = node->perms.p + hdr->num_perms;
- /* Children is strings, nul separated. */
-@@ -715,12 +832,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
- p += node->datalen;
- memcpy(p, node->children, node->childlen);
-
-- /* TDB should set errno, but doesn't even set ecode AFAICT. */
-- if (tdb_store(tdb_ctx, *key, data, TDB_REPLACE) != 0) {
-- corrupt(conn, "Write of %s failed", key->dptr);
-- errno = EIO;
-- return errno;
-- }
-+ if (do_tdb_write(conn, key, &data, &node->acc, no_quota_check))
-+ return EIO;
-+
- return 0;
- }
-
-@@ -1222,7 +1336,7 @@ static void delete_node_single(struct connection *conn, struct node *node)
- if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
- return;
-
-- if (tdb_delete(tdb_ctx, key) != 0) {
-+ if (do_tdb_delete(conn, &key, &node->acc) != 0) {
- corrupt(conn, "Could not delete '%s'", node->name);
- return;
- }
-@@ -1295,6 +1409,7 @@ static struct node *construct_node(struct connection *conn, const void *ctx,
- /* No children, no data */
- node->children = node->data = NULL;
- node->childlen = node->datalen = 0;
-+ node->acc.memory = 0;
- node->parent = parent;
- return node;
-
-@@ -1303,17 +1418,17 @@ nomem:
- return NULL;
- }
-
--static void destroy_node_rm(struct node *node)
-+static void destroy_node_rm(struct connection *conn, struct node *node)
- {
- if (streq(node->name, "/"))
- corrupt(NULL, "Destroying root node!");
-
-- tdb_delete(tdb_ctx, node->key);
-+ do_tdb_delete(conn, &node->key, &node->acc);
- }
-
- static int destroy_node(struct connection *conn, struct node *node)
- {
-- destroy_node_rm(node);
-+ destroy_node_rm(conn, node);
- domain_entry_dec(conn, node);
-
- /*
-@@ -1365,7 +1480,7 @@ static struct node *create_node(struct connection *conn, const void *ctx,
- /* Account for new node */
- if (i->parent) {
- if (domain_entry_inc(conn, i)) {
-- destroy_node_rm(i);
-+ destroy_node_rm(conn, i);
- return NULL;
- }
- }
-@@ -2291,7 +2406,7 @@ static int clean_store_(TDB_CONTEXT *tdb, TDB_DATA key, TDB_DATA val,
- if (!hashtable_search(reachable, name)) {
- log("clean_store: '%s' is orphaned!", name);
- if (recovery) {
-- tdb_delete(tdb, key);
-+ do_tdb_delete(NULL, &key, NULL);
- }
- }
-
-@@ -3149,6 +3264,7 @@ void read_state_node(const void *ctx, const void *state)
- if (!node)
- barf("allocation error restoring node");
-
-+ node->acc.memory = 0;
- node->name = name;
- node->generation = ++generation;
- node->datalen = sn->data_len;
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 373af18297bf..da9ecce67f31 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -176,6 +176,11 @@ struct node_perms {
- struct xs_permissions *p;
- };
-
-+struct node_account_data {
-+ unsigned int domid;
-+ int memory; /* -1 if unknown */
-+};
-+
- struct node {
- const char *name;
- /* Key used to update TDB */
-@@ -198,6 +203,9 @@ struct node {
- /* Children, each nul-terminated. */
- unsigned int childlen;
- char *children;
-+
-+ /* Allocation information for node currently in store. */
-+ struct node_account_data acc;
- };
-
- /* Return the only argument in the input. */
-@@ -306,6 +314,10 @@ extern xengnttab_handle **xgt_handle;
- int remember_string(struct hashtable *hash, const char *str);
-
- void set_tdb_key(const char *name, TDB_DATA *key);
-+int do_tdb_write(struct connection *conn, TDB_DATA *key, TDB_DATA *data,
-+ struct node_account_data *acc, bool no_quota_check);
-+int do_tdb_delete(struct connection *conn, TDB_DATA *key,
-+ struct node_account_data *acc);
-
- void conn_free_buffered_data(struct connection *conn);
-
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index 7bd41eb475e3..ace9a11d77bb 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -153,6 +153,9 @@ struct transaction
- /* List of all transactions active on this connection. */
- struct list_head list;
-
-+ /* Connection this transaction is associated with. */
-+ struct connection *conn;
-+
- /* Connection-local identifier for this transaction. */
- uint32_t id;
-
-@@ -286,6 +289,8 @@ int access_node(struct connection *conn, struct node *node,
-
- introduce = true;
- i->ta_node = false;
-+ /* acc.memory < 0 means "unknown, get size from TDB". */
-+ node->acc.memory = -1;
-
- /*
- * Additional transaction-specific node for read type. We only
-@@ -410,11 +415,11 @@ static int finalize_transaction(struct connection *conn,
- goto err;
- hdr = (void *)data.dptr;
- hdr->generation = ++generation;
-- ret = tdb_store(tdb_ctx, key, data,
-- TDB_REPLACE);
-+ ret = do_tdb_write(conn, &key, &data, NULL,
-+ true);
- talloc_free(data.dptr);
- } else {
-- ret = tdb_delete(tdb_ctx, key);
-+ ret = do_tdb_delete(conn, &key, NULL);
- }
- if (ret)
- goto err;
-@@ -425,7 +430,7 @@ static int finalize_transaction(struct connection *conn,
- }
- }
-
-- if (i->ta_node && tdb_delete(tdb_ctx, ta_key))
-+ if (i->ta_node && do_tdb_delete(conn, &ta_key, NULL))
- goto err;
- list_del(&i->list);
- talloc_free(i);
-@@ -453,7 +458,7 @@ static int destroy_transaction(void *_transaction)
- i->node);
- if (trans_name) {
- set_tdb_key(trans_name, &key);
-- tdb_delete(tdb_ctx, key);
-+ do_tdb_delete(trans->conn, &key, NULL);
- }
- }
- list_del(&i->list);
-@@ -497,6 +502,7 @@ int do_transaction_start(struct connection *conn, struct buffered_data *in)
-
- INIT_LIST_HEAD(&trans->accessed);
- INIT_LIST_HEAD(&trans->changed_domains);
-+ trans->conn = conn;
- trans->fail = false;
- trans->generation = ++generation;
-
---
-2.37.4
-
diff --git a/0060-automation-Remove-non-debug-x86_32-build-jobs.patch b/0060-automation-Remove-non-debug-x86_32-build-jobs.patch
new file mode 100644
index 0000000..c5516be
--- /dev/null
+++ b/0060-automation-Remove-non-debug-x86_32-build-jobs.patch
@@ -0,0 +1,67 @@
+From 27974fde92850419e385ad0355997c54d78046f2 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Fri, 24 Feb 2023 17:29:15 +0000
+Subject: [PATCH 60/61] automation: Remove non-debug x86_32 build jobs
+
+In the interest of having fewer jobs, we remove the x86_32 build jobs
+that do release builds. A debug build is very likely to be enough to
+find 32-bit build issues.
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit 7b66792ea7f77fb9e587e1e9c530a7c869eecba1)
+---
+ automation/gitlab-ci/build.yaml | 20 --------------------
+ 1 file changed, 20 deletions(-)
+
+diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
+index bc1a732069..4b51ad9e34 100644
+--- a/automation/gitlab-ci/build.yaml
++++ b/automation/gitlab-ci/build.yaml
+@@ -264,21 +264,11 @@ debian-stretch-gcc-debug:
+ variables:
+ CONTAINER: debian:stretch
+
+-debian-stretch-32-clang:
+- extends: .clang-x86-32-build
+- variables:
+- CONTAINER: debian:stretch-i386
+-
+ debian-stretch-32-clang-debug:
+ extends: .clang-x86-32-build-debug
+ variables:
+ CONTAINER: debian:stretch-i386
+
+-debian-stretch-32-gcc:
+- extends: .gcc-x86-32-build
+- variables:
+- CONTAINER: debian:stretch-i386
+-
+ debian-stretch-32-gcc-debug:
+ extends: .gcc-x86-32-build-debug
+ variables:
+@@ -316,21 +306,11 @@ debian-unstable-gcc-debug-randconfig:
+ CONTAINER: debian:unstable
+ RANDCONFIG: y
+
+-debian-unstable-32-clang:
+- extends: .clang-x86-32-build
+- variables:
+- CONTAINER: debian:unstable-i386
+-
+ debian-unstable-32-clang-debug:
+ extends: .clang-x86-32-build-debug
+ variables:
+ CONTAINER: debian:unstable-i386
+
+-debian-unstable-32-gcc:
+- extends: .gcc-x86-32-build
+- variables:
+- CONTAINER: debian:unstable-i386
+-
+ debian-unstable-32-gcc-debug:
+ extends: .gcc-x86-32-build-debug
+ variables:
+--
+2.40.0
+
diff --git a/0060-tools-xenstore-add-exports-for-quota-variables.patch b/0060-tools-xenstore-add-exports-for-quota-variables.patch
deleted file mode 100644
index 79ca465..0000000
--- a/0060-tools-xenstore-add-exports-for-quota-variables.patch
+++ /dev/null
@@ -1,62 +0,0 @@
-From 0a67b4eef104c36bef52990e413ef361acc8183c Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:10 +0200
-Subject: [PATCH 60/87] tools/xenstore: add exports for quota variables
-
-Some quota variables are not exported via header files.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 1da16d5990b5f7752657fca3e948f735177ea9ad)
----
- tools/xenstore/xenstored_core.h | 5 +++++
- tools/xenstore/xenstored_transaction.c | 1 -
- tools/xenstore/xenstored_watch.c | 2 --
- 3 files changed, 5 insertions(+), 3 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index da9ecce67f31..bfd3fc1e9df3 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -275,6 +275,11 @@ extern TDB_CONTEXT *tdb_ctx;
- extern int dom0_domid;
- extern int dom0_event;
- extern int priv_domid;
-+extern int quota_nb_watch_per_domain;
-+extern int quota_max_transaction;
-+extern int quota_max_entry_size;
-+extern int quota_nb_perms_per_node;
-+extern int quota_max_path_len;
- extern int quota_nb_entry_per_domain;
- extern int quota_req_outstanding;
- extern int quota_trans_nodes;
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index ace9a11d77bb..28774813de83 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -175,7 +175,6 @@ struct transaction
- bool fail;
- };
-
--extern int quota_max_transaction;
- uint64_t generation;
-
- static struct accessed_node *find_accessed_node(struct transaction *trans,
-diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
-index fdf9b2d653a0..85362bcce314 100644
---- a/tools/xenstore/xenstored_watch.c
-+++ b/tools/xenstore/xenstored_watch.c
-@@ -31,8 +31,6 @@
- #include "xenstored_domain.h"
- #include "xenstored_transaction.h"
-
--extern int quota_nb_watch_per_domain;
--
- struct watch
- {
- /* Watches on this connection */
---
-2.37.4
-
diff --git a/0061-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch b/0061-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
new file mode 100644
index 0000000..9170382
--- /dev/null
+++ b/0061-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
@@ -0,0 +1,103 @@
+From 31627a059c2e186f4ad12d171d964b09abe8a4a9 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 24 Mar 2023 17:59:56 +0000
+Subject: [PATCH 61/61] CI: Remove llvm-8 from the Debian Stretch container
+
+For similar reasons to c/s a6b1e2b80fe20. While this container is still
+buildable for now, all the other problems with explicitly-versioned
+compilers remain.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+(cherry picked from commit 7a298375721636290a57f31bb0f7c2a5a38956a4)
+---
+ automation/build/debian/stretch-llvm-8.list | 3 ---
+ automation/build/debian/stretch.dockerfile | 12 ---------
+ automation/gitlab-ci/build.yaml | 27 ---------------------
+ 3 files changed, 42 deletions(-)
+ delete mode 100644 automation/build/debian/stretch-llvm-8.list
+
+diff --git a/automation/build/debian/stretch-llvm-8.list b/automation/build/debian/stretch-llvm-8.list
+deleted file mode 100644
+index 09fe843fb2..0000000000
+--- a/automation/build/debian/stretch-llvm-8.list
++++ /dev/null
+@@ -1,3 +0,0 @@
+-# Strech LLVM 8 repos
+-deb http://apt.llvm.org/stretch/ llvm-toolchain-stretch-8 main
+-deb-src http://apt.llvm.org/stretch/ llvm-toolchain-stretch-8 main
+diff --git a/automation/build/debian/stretch.dockerfile b/automation/build/debian/stretch.dockerfile
+index da6aa874dd..9861acbcc3 100644
+--- a/automation/build/debian/stretch.dockerfile
++++ b/automation/build/debian/stretch.dockerfile
+@@ -53,15 +53,3 @@ RUN apt-get update && \
+ apt-get autoremove -y && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
+-
+-RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
+-COPY stretch-llvm-8.list /etc/apt/sources.list.d/
+-
+-RUN apt-get update && \
+- apt-get --quiet --yes install \
+- clang-8 \
+- lld-8 \
+- && \
+- apt-get autoremove -y && \
+- apt-get clean && \
+- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
+diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
+index 4b51ad9e34..fd8034b429 100644
+--- a/automation/gitlab-ci/build.yaml
++++ b/automation/gitlab-ci/build.yaml
+@@ -27,13 +27,6 @@
+ CXX: clang++
+ clang: y
+
+-.clang-8-tmpl:
+- variables: &clang-8
+- CC: clang-8
+- CXX: clang++-8
+- LD: ld.lld-8
+- clang: y
+-
+ .x86-64-build-tmpl:
+ <<: *build
+ variables:
+@@ -98,16 +91,6 @@
+ variables:
+ <<: *clang
+
+-.clang-8-x86-64-build:
+- extends: .x86-64-build
+- variables:
+- <<: *clang-8
+-
+-.clang-8-x86-64-build-debug:
+- extends: .x86-64-build-debug
+- variables:
+- <<: *clang-8
+-
+ .clang-x86-32-build:
+ extends: .x86-32-build
+ variables:
+@@ -244,16 +227,6 @@ debian-stretch-clang-debug:
+ variables:
+ CONTAINER: debian:stretch
+
+-debian-stretch-clang-8:
+- extends: .clang-8-x86-64-build
+- variables:
+- CONTAINER: debian:stretch
+-
+-debian-stretch-clang-8-debug:
+- extends: .clang-8-x86-64-build-debug
+- variables:
+- CONTAINER: debian:stretch
+-
+ debian-stretch-gcc:
+ extends: .gcc-x86-64-build
+ variables:
+--
+2.40.0
+
diff --git a/0061-tools-xenstore-add-control-command-for-setting-and-s.patch b/0061-tools-xenstore-add-control-command-for-setting-and-s.patch
deleted file mode 100644
index 5adcd35..0000000
--- a/0061-tools-xenstore-add-control-command-for-setting-and-s.patch
+++ /dev/null
@@ -1,248 +0,0 @@
-From b584b9b95687655f4f9f5c37fea3b1eea3f32886 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:10 +0200
-Subject: [PATCH 61/87] tools/xenstore: add control command for setting and
- showing quota
-
-Add a xenstore-control command "quota" to:
-- show current quota settings
-- change quota settings
-- show current quota related values of a domain
-
-Note that in case the new quota is lower than the existing one,
-Xenstored may continue to handle requests from a domain exceeding the
-new limit (depending on which quota has been exceeded) and the amount
-of the resource used will not change. However, the domain will not be
-able to create more of the resource (associated with the quota) until
-it is back below the limit.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 9c484bef83496b683b0087e3bd2a560da4aa37af)
----
- docs/misc/xenstore.txt | 11 +++
- tools/xenstore/xenstored_control.c | 111 +++++++++++++++++++++++++++++
- tools/xenstore/xenstored_domain.c | 33 +++++++++
- tools/xenstore/xenstored_domain.h | 2 +
- 4 files changed, 157 insertions(+)
-
-diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
-index 334dc8b6fdf5..a7d006519ae8 100644
---- a/docs/misc/xenstore.txt
-+++ b/docs/misc/xenstore.txt
-@@ -366,6 +366,17 @@ CONTROL <command>|[<parameters>|]
- print|<string>
- print <string> to syslog (xenstore runs as daemon) or
- to console (xenstore runs as stubdom)
-+ quota|[set <name> <val>|<domid>]
-+ without parameters: print the current quota settings
-+ with "set <name> <val>": set the quota <name> to new value
-+ <val> (The admin should make sure all the domain usage is
-+ below the quota. If it is not, then Xenstored may continue to
-+ handle requests from the domain as long as the resource
-+ violating the new quota setting isn't increased further)
-+ with "<domid>": print quota related accounting data for
-+ the domain <domid>
-+ quota-soft|[set <name> <val>]
-+ like the "quota" command, but for soft-quota.
- help <supported-commands>
- return list of supported commands for CONTROL
-
-diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
-index adb8d51b043b..1031a81c3874 100644
---- a/tools/xenstore/xenstored_control.c
-+++ b/tools/xenstore/xenstored_control.c
-@@ -196,6 +196,115 @@ static int do_control_log(void *ctx, struct connection *conn,
- return 0;
- }
-
-+struct quota {
-+ const char *name;
-+ int *quota;
-+ const char *descr;
-+};
-+
-+static const struct quota hard_quotas[] = {
-+ { "nodes", &quota_nb_entry_per_domain, "Nodes per domain" },
-+ { "watches", &quota_nb_watch_per_domain, "Watches per domain" },
-+ { "transactions", &quota_max_transaction, "Transactions per domain" },
-+ { "outstanding", &quota_req_outstanding,
-+ "Outstanding requests per domain" },
-+ { "transaction-nodes", &quota_trans_nodes,
-+ "Max. number of accessed nodes per transaction" },
-+ { "memory", &quota_memory_per_domain_hard,
-+ "Total Xenstore memory per domain (error level)" },
-+ { "node-size", &quota_max_entry_size, "Max. size of a node" },
-+ { "path-max", &quota_max_path_len, "Max. length of a node path" },
-+ { "permissions", &quota_nb_perms_per_node,
-+ "Max. number of permissions per node" },
-+ { NULL, NULL, NULL }
-+};
-+
-+static const struct quota soft_quotas[] = {
-+ { "memory", &quota_memory_per_domain_soft,
-+ "Total Xenstore memory per domain (warning level)" },
-+ { NULL, NULL, NULL }
-+};
-+
-+static int quota_show_current(const void *ctx, struct connection *conn,
-+ const struct quota *quotas)
-+{
-+ char *resp;
-+ unsigned int i;
-+
-+ resp = talloc_strdup(ctx, "Quota settings:\n");
-+ if (!resp)
-+ return ENOMEM;
-+
-+ for (i = 0; quotas[i].quota; i++) {
-+ resp = talloc_asprintf_append(resp, "%-17s: %8d %s\n",
-+ quotas[i].name, *quotas[i].quota,
-+ quotas[i].descr);
-+ if (!resp)
-+ return ENOMEM;
-+ }
-+
-+ send_reply(conn, XS_CONTROL, resp, strlen(resp) + 1);
-+
-+ return 0;
-+}
-+
-+static int quota_set(const void *ctx, struct connection *conn,
-+ char **vec, int num, const struct quota *quotas)
-+{
-+ unsigned int i;
-+ int val;
-+
-+ if (num != 2)
-+ return EINVAL;
-+
-+ val = atoi(vec[1]);
-+ if (val < 1)
-+ return EINVAL;
-+
-+ for (i = 0; quotas[i].quota; i++) {
-+ if (!strcmp(vec[0], quotas[i].name)) {
-+ *quotas[i].quota = val;
-+ send_ack(conn, XS_CONTROL);
-+ return 0;
-+ }
-+ }
-+
-+ return EINVAL;
-+}
-+
-+static int quota_get(const void *ctx, struct connection *conn,
-+ char **vec, int num)
-+{
-+ if (num != 1)
-+ return EINVAL;
-+
-+ return domain_get_quota(ctx, conn, atoi(vec[0]));
-+}
-+
-+static int do_control_quota(void *ctx, struct connection *conn,
-+ char **vec, int num)
-+{
-+ if (num == 0)
-+ return quota_show_current(ctx, conn, hard_quotas);
-+
-+ if (!strcmp(vec[0], "set"))
-+ return quota_set(ctx, conn, vec + 1, num - 1, hard_quotas);
-+
-+ return quota_get(ctx, conn, vec, num);
-+}
-+
-+static int do_control_quota_s(void *ctx, struct connection *conn,
-+ char **vec, int num)
-+{
-+ if (num == 0)
-+ return quota_show_current(ctx, conn, soft_quotas);
-+
-+ if (!strcmp(vec[0], "set"))
-+ return quota_set(ctx, conn, vec + 1, num - 1, soft_quotas);
-+
-+ return EINVAL;
-+}
-+
- #ifdef __MINIOS__
- static int do_control_memreport(void *ctx, struct connection *conn,
- char **vec, int num)
-@@ -847,6 +956,8 @@ static struct cmd_s cmds[] = {
- { "memreport", do_control_memreport, "[<file>]" },
- #endif
- { "print", do_control_print, "<string>" },
-+ { "quota", do_control_quota, "[set <name> <val>|<domid>]" },
-+ { "quota-soft", do_control_quota_s, "[set <name> <val>]" },
- { "help", do_control_help, "" },
- };
-
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index 94fd561e9de4..e7c6886ccf47 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -31,6 +31,7 @@
- #include "xenstored_domain.h"
- #include "xenstored_transaction.h"
- #include "xenstored_watch.h"
-+#include "xenstored_control.h"
-
- #include <xenevtchn.h>
- #include <xenctrl.h>
-@@ -345,6 +346,38 @@ static struct domain *find_domain_struct(unsigned int domid)
- return NULL;
- }
-
-+int domain_get_quota(const void *ctx, struct connection *conn,
-+ unsigned int domid)
-+{
-+ struct domain *d = find_domain_struct(domid);
-+ char *resp;
-+ int ta;
-+
-+ if (!d)
-+ return ENOENT;
-+
-+ ta = d->conn ? d->conn->transaction_started : 0;
-+ resp = talloc_asprintf(ctx, "Domain %u:\n", domid);
-+ if (!resp)
-+ return ENOMEM;
-+
-+#define ent(t, e) \
-+ resp = talloc_asprintf_append(resp, "%-16s: %8d\n", #t, e); \
-+ if (!resp) return ENOMEM
-+
-+ ent(nodes, d->nbentry);
-+ ent(watches, d->nbwatch);
-+ ent(transactions, ta);
-+ ent(outstanding, d->nboutstanding);
-+ ent(memory, d->memory);
-+
-+#undef ent
-+
-+ send_reply(conn, XS_CONTROL, resp, strlen(resp) + 1);
-+
-+ return 0;
-+}
-+
- static struct domain *alloc_domain(const void *context, unsigned int domid)
- {
- struct domain *domain;
-diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
-index 633c9a0a0a1f..904faa923afb 100644
---- a/tools/xenstore/xenstored_domain.h
-+++ b/tools/xenstore/xenstored_domain.h
-@@ -87,6 +87,8 @@ int domain_watch(struct connection *conn);
- void domain_outstanding_inc(struct connection *conn);
- void domain_outstanding_dec(struct connection *conn);
- void domain_outstanding_domid_dec(unsigned int domid);
-+int domain_get_quota(const void *ctx, struct connection *conn,
-+ unsigned int domid);
-
- /* Special node permission handling. */
- int set_perms_special(struct connection *conn, const char *name,
---
-2.37.4
-
diff --git a/0062-tools-ocaml-xenstored-Synchronise-defaults-with-oxen.patch b/0062-tools-ocaml-xenstored-Synchronise-defaults-with-oxen.patch
deleted file mode 100644
index b9f5b18..0000000
--- a/0062-tools-ocaml-xenstored-Synchronise-defaults-with-oxen.patch
+++ /dev/null
@@ -1,63 +0,0 @@
-From b0e95b451225de4db99bbe0b8dc79fdf08873e9e Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Wed, 12 Oct 2022 19:13:01 +0100
-Subject: [PATCH 62/87] tools/ocaml/xenstored: Synchronise defaults with
- oxenstore.conf.in
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-We currently have 2 different sets of defaults in the upstream Xen git tree:
-* defined in the source code, only used if there is no config file
-* defined in oxenstored.conf.in in upstream Xen
-
-An oxenstored.conf file is not mandatory, and if missing, maxrequests in
-particular has an unsafe default.
-
-Resync the defaults from oxenstored.conf.in into the source code.
-
-This is part of XSA-326 / CVE-2022-42316.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 84734955d4bf629ba459a74773afcde50a52236f)
----
- tools/ocaml/xenstored/define.ml | 6 +++---
- tools/ocaml/xenstored/quota.ml | 4 ++--
- 2 files changed, 5 insertions(+), 5 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
-index ebe18b8e312c..6b06f808595b 100644
---- a/tools/ocaml/xenstored/define.ml
-+++ b/tools/ocaml/xenstored/define.ml
-@@ -21,9 +21,9 @@ let xs_daemon_socket = Paths.xen_run_stored ^ "/socket"
-
- let default_config_dir = Paths.xen_config_dir
-
--let maxwatch = ref (50)
--let maxtransaction = ref (20)
--let maxrequests = ref (-1) (* maximum requests per transaction *)
-+let maxwatch = ref (100)
-+let maxtransaction = ref (10)
-+let maxrequests = ref (1024) (* maximum requests per transaction *)
-
- let conflict_burst_limit = ref 5.0
- let conflict_max_history_seconds = ref 0.05
-diff --git a/tools/ocaml/xenstored/quota.ml b/tools/ocaml/xenstored/quota.ml
-index abcac912805a..6e3d6401ae89 100644
---- a/tools/ocaml/xenstored/quota.ml
-+++ b/tools/ocaml/xenstored/quota.ml
-@@ -20,8 +20,8 @@ exception Transaction_opened
-
- let warn fmt = Logging.warn "quota" fmt
- let activate = ref true
--let maxent = ref (10000)
--let maxsize = ref (4096)
-+let maxent = ref (1000)
-+let maxsize = ref (2048)
-
- type t = {
- maxent: int; (* max entities per domU *)
---
-2.37.4
-
diff --git a/0063-tools-ocaml-xenstored-Check-for-maxrequests-before-p.patch b/0063-tools-ocaml-xenstored-Check-for-maxrequests-before-p.patch
deleted file mode 100644
index 5b3b646..0000000
--- a/0063-tools-ocaml-xenstored-Check-for-maxrequests-before-p.patch
+++ /dev/null
@@ -1,101 +0,0 @@
-From ab21bb1971a7fa9308053b0686f43277f6e8a6c9 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Thu, 28 Jul 2022 17:08:15 +0100
-Subject: [PATCH 63/87] tools/ocaml/xenstored: Check for maxrequests before
- performing operations
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Previously we'd perform the operation, record the updated tree in the
-transaction record, then try to insert a watchop path and the reply packet.
-
-If we exceeded max requests we would've returned EQUOTA, but still:
-* have performed the operation on the transaction's tree
-* have recorded the watchop, making this queue effectively unbounded
-
-It is better if we check whether we'd have room to store the operation before
-performing the transaction, and raise EQUOTA there. Then the transaction
-record won't grow.
-
-This is part of XSA-326 / CVE-2022-42317.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 329f4d1a6535c6c5a34025ca0d03fc5c7228fcff)
----
- tools/ocaml/xenstored/process.ml | 4 +++-
- tools/ocaml/xenstored/transaction.ml | 16 ++++++++++++----
- 2 files changed, 15 insertions(+), 5 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
-index 27790d4a5c41..dd58e6979cf9 100644
---- a/tools/ocaml/xenstored/process.ml
-+++ b/tools/ocaml/xenstored/process.ml
-@@ -389,6 +389,7 @@ let input_handle_error ~cons ~doms ~fct ~con ~t ~req =
- let reply_error e =
- Packet.Error e in
- try
-+ Transaction.check_quota_exn ~perm:(Connection.get_perm con) t;
- fct con t doms cons req.Packet.data
- with
- | Define.Invalid_path -> reply_error "EINVAL"
-@@ -681,9 +682,10 @@ let process_packet ~store ~cons ~doms ~con ~req =
- in
-
- let response = try
-+ Transaction.check_quota_exn ~perm:(Connection.get_perm con) t;
- if tid <> Transaction.none then
- (* Remember the request and response for this operation in case we need to replay the transaction *)
-- Transaction.add_operation ~perm:(Connection.get_perm con) t req response;
-+ Transaction.add_operation t req response;
- response
- with Quota.Limit_reached ->
- Packet.Error "EQUOTA"
-diff --git a/tools/ocaml/xenstored/transaction.ml b/tools/ocaml/xenstored/transaction.ml
-index 17b1bdf2eaf9..294143e2335b 100644
---- a/tools/ocaml/xenstored/transaction.ml
-+++ b/tools/ocaml/xenstored/transaction.ml
-@@ -85,6 +85,7 @@ type t = {
- oldroot: Store.Node.t;
- mutable paths: (Xenbus.Xb.Op.operation * Store.Path.t) list;
- mutable operations: (Packet.request * Packet.response) list;
-+ mutable quota_reached: bool;
- mutable read_lowpath: Store.Path.t option;
- mutable write_lowpath: Store.Path.t option;
- }
-@@ -127,6 +128,7 @@ let make ?(internal=false) id store =
- oldroot = Store.get_root store;
- paths = [];
- operations = [];
-+ quota_reached = false;
- read_lowpath = None;
- write_lowpath = None;
- } in
-@@ -143,13 +145,19 @@ let get_root t = Store.get_root t.store
-
- let is_read_only t = t.paths = []
- let add_wop t ty path = t.paths <- (ty, path) :: t.paths
--let add_operation ~perm t request response =
-+let get_operations t = List.rev t.operations
-+
-+let check_quota_exn ~perm t =
- if !Define.maxrequests >= 0
- && not (Perms.Connection.is_dom0 perm)
-- && List.length t.operations >= !Define.maxrequests
-- then raise Quota.Limit_reached;
-+ && (t.quota_reached || List.length t.operations >= !Define.maxrequests)
-+ then begin
-+ t.quota_reached <- true;
-+ raise Quota.Limit_reached;
-+ end
-+
-+let add_operation t request response =
- t.operations <- (request, response) :: t.operations
--let get_operations t = List.rev t.operations
- let set_read_lowpath t path = t.read_lowpath <- get_lowest path t.read_lowpath
- let set_write_lowpath t path = t.write_lowpath <- get_lowest path t.write_lowpath
-
---
-2.37.4
-
diff --git a/0064-tools-ocaml-GC-parameter-tuning.patch b/0064-tools-ocaml-GC-parameter-tuning.patch
deleted file mode 100644
index 6c80e2d..0000000
--- a/0064-tools-ocaml-GC-parameter-tuning.patch
+++ /dev/null
@@ -1,126 +0,0 @@
-From a63bbcf5318b487ca86574d7fcf916958af5ed02 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Wed, 12 Oct 2022 19:13:07 +0100
-Subject: [PATCH 64/87] tools/ocaml: GC parameter tuning
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-By default the OCaml garbage collector would return memory to the OS only
-after unused memory is 5x live memory. Tweak this to 120% instead, which
-would match the major GC speed.
-
-This is part of XSA-326.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 4a8bacff20b857ca0d628ef5525877ade11f2a42)
----
- tools/ocaml/xenstored/define.ml | 1 +
- tools/ocaml/xenstored/xenstored.ml | 64 ++++++++++++++++++++++++++++++
- 2 files changed, 65 insertions(+)
-
-diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
-index 6b06f808595b..ba63a8147e09 100644
---- a/tools/ocaml/xenstored/define.ml
-+++ b/tools/ocaml/xenstored/define.ml
-@@ -25,6 +25,7 @@ let maxwatch = ref (100)
- let maxtransaction = ref (10)
- let maxrequests = ref (1024) (* maximum requests per transaction *)
-
-+let gc_max_overhead = ref 120 (* 120% see comment in xenstored.ml *)
- let conflict_burst_limit = ref 5.0
- let conflict_max_history_seconds = ref 0.05
- let conflict_rate_limit_is_aggregate = ref true
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index d44ae673c42a..3b57ad016dfb 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -104,6 +104,7 @@ let parse_config filename =
- ("quota-maxsize", Config.Set_int Quota.maxsize);
- ("quota-maxrequests", Config.Set_int Define.maxrequests);
- ("quota-path-max", Config.Set_int Define.path_max);
-+ ("gc-max-overhead", Config.Set_int Define.gc_max_overhead);
- ("test-eagain", Config.Set_bool Transaction.test_eagain);
- ("persistent", Config.Set_bool Disk.enable);
- ("xenstored-log-file", Config.String Logging.set_xenstored_log_destination);
-@@ -265,6 +266,67 @@ let to_file store cons fds file =
- (fun () -> close_out channel)
- end
-
-+(*
-+ By default OCaml's GC only returns memory to the OS when it exceeds a
-+ configurable 'max overhead' setting.
-+ The default is 500%, that is 5/6th of the OCaml heap needs to be free
-+ and only 1/6th live for a compaction to be triggerred that would
-+ release memory back to the OS.
-+ If the limit is not hit then the OCaml process can reuse that memory
-+ for its own purposes, but other processes won't be able to use it.
-+
-+ There is also a 'space overhead' setting that controls how much work
-+ each major GC slice does, and by default aims at having no more than
-+ 80% or 120% (depending on version) garbage values compared to live
-+ values.
-+ This doesn't have as much relevance to memory returned to the OS as
-+ long as space_overhead <= max_overhead, because compaction is only
-+ triggerred at the end of major GC cycles.
-+
-+ The defaults are too large once the program starts using ~100MiB of
-+ memory, at which point ~500MiB would be unavailable to other processes
-+ (which would be fine if this was the main process in this VM, but it is
-+ not).
-+
-+ Max overhead can also be set to 0, however this is for testing purposes
-+ only (setting it lower than 'space overhead' wouldn't help because the
-+ major GC wouldn't run fast enough, and compaction does have a
-+ performance cost: we can only compact contiguous regions, so memory has
-+ to be moved around).
-+
-+ Max overhead controls how often the heap is compacted, which is useful
-+ if there are burst of activity followed by long periods of idle state,
-+ or if a domain quits, etc. Compaction returns memory to the OS.
-+
-+ wasted = live * space_overhead / 100
-+
-+ For globally overriding the GC settings one can use OCAMLRUNPARAM,
-+ however we provide a config file override to be consistent with other
-+ oxenstored settings.
-+
-+ One might want to dynamically adjust the overhead setting based on used
-+ memory, i.e. to use a fixed upper bound in bytes, not percentage. However
-+ measurements show that such adjustments increase GC overhead massively,
-+ while still not guaranteeing that memory is returned any more quickly
-+ than with a percentage based setting.
-+
-+ The allocation policy could also be tweaked, e.g. first fit would reduce
-+ fragmentation and thus memory usage, but the documentation warns that it
-+ can be sensibly slower, and indeed one of our own testcases can trigger
-+ such a corner case where it is multiple times slower, so it is best to keep
-+ the default allocation policy (next-fit/best-fit depending on version).
-+
-+ There are other tweaks that can be attempted in the future, e.g. setting
-+ 'ulimit -v' to 75% of RAM, however getting the kernel to actually return
-+ NULL from allocations is difficult even with that setting, and without a
-+ NULL the emergency GC won't be triggerred.
-+ Perhaps cgroup limits could help, but for now tweak the safest only.
-+*)
-+
-+let tweak_gc () =
-+ Gc.set { (Gc.get ()) with Gc.max_overhead = !Define.gc_max_overhead }
-+
-+
- let _ =
- let cf = do_argv in
- let pidfile =
-@@ -274,6 +336,8 @@ let _ =
- default_pidfile
- in
-
-+ tweak_gc ();
-+
- (try
- Unixext.mkdir_rec (Filename.dirname pidfile) 0o755
- with _ ->
---
-2.37.4
-
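For readers unfamiliar with the OCaml Gc module used by the patch above, here is a
minimal standalone sketch of the same tuning. Gc.get/Gc.set are standard-library
calls; the value 120 simply mirrors the gc-max-overhead default introduced above
and is not the only reasonable choice:

    (* Minimal sketch, not oxenstored code: apply and report the max_overhead
       tuning described in the patch above. *)
    let () =
      Gc.set { (Gc.get ()) with Gc.max_overhead = 120 };
      Printf.printf "Gc.max_overhead is now %d%%\n" (Gc.get ()).Gc.max_overhead

The patch itself reads the value from the gc-max-overhead config entry instead of
hard-coding it, and its long comment explains why 120% was preferred over the
library's much larger default.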
diff --git a/0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch b/0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch
deleted file mode 100644
index 4c1bcbe..0000000
--- a/0065-tools-ocaml-libs-xb-hide-type-of-Xb.t.patch
+++ /dev/null
@@ -1,92 +0,0 @@
-From 8b60ad49b46f2e020e0f0847df80c768d669cdb2 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Fri, 29 Jul 2022 18:53:29 +0100
-Subject: [PATCH 65/87] tools/ocaml/libs/xb: hide type of Xb.t
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Hiding the type will make it easier to change the implementation
-in the future without breaking code that relies on it.
-
-No functional change.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 7ade30a1451734d041363c750a65d322e25b47ba)
----
- tools/ocaml/libs/xb/xb.ml | 3 +++
- tools/ocaml/libs/xb/xb.mli | 9 ++-------
- tools/ocaml/xenstored/connection.ml | 8 ++------
- 3 files changed, 7 insertions(+), 13 deletions(-)
-
-diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
-index 104d319d7747..8404ddd8a682 100644
---- a/tools/ocaml/libs/xb/xb.ml
-+++ b/tools/ocaml/libs/xb/xb.ml
-@@ -196,6 +196,9 @@ let peek_output con = Queue.peek con.pkt_out
- let input_len con = Queue.length con.pkt_in
- let has_in_packet con = Queue.length con.pkt_in > 0
- let get_in_packet con = Queue.pop con.pkt_in
-+let has_partial_input con = match con.partial_in with
-+ | HaveHdr _ -> true
-+ | NoHdr (n, _) -> n < Partial.header_size ()
- let has_more_input con =
- match con.backend with
- | Fd _ -> false
-diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
-index 3a00da6cddc1..794e35bb343e 100644
---- a/tools/ocaml/libs/xb/xb.mli
-+++ b/tools/ocaml/libs/xb/xb.mli
-@@ -66,13 +66,7 @@ type backend_mmap = {
- type backend_fd = { fd : Unix.file_descr; }
- type backend = Fd of backend_fd | Xenmmap of backend_mmap
- type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
--type t = {
-- backend : backend;
-- pkt_in : Packet.t Queue.t;
-- pkt_out : Packet.t Queue.t;
-- mutable partial_in : partial_buf;
-- mutable partial_out : string;
--}
-+type t
- val init_partial_in : unit -> partial_buf
- val reconnect : t -> unit
- val queue : t -> Packet.t -> unit
-@@ -97,6 +91,7 @@ val has_output : t -> bool
- val peek_output : t -> Packet.t
- val input_len : t -> int
- val has_in_packet : t -> bool
-+val has_partial_input : t -> bool
- val get_in_packet : t -> Packet.t
- val has_more_input : t -> bool
- val is_selectable : t -> bool
-diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
-index 65f99ea6f28a..38b47363a173 100644
---- a/tools/ocaml/xenstored/connection.ml
-+++ b/tools/ocaml/xenstored/connection.ml
-@@ -125,9 +125,7 @@ let get_perm con =
- let set_target con target_domid =
- con.perm <- Perms.Connection.set_target (get_perm con) ~perms:[Perms.READ; Perms.WRITE] target_domid
-
--let is_backend_mmap con = match con.xb.Xenbus.Xb.backend with
-- | Xenbus.Xb.Xenmmap _ -> true
-- | _ -> false
-+let is_backend_mmap con = Xenbus.Xb.is_mmap con.xb
-
- let send_reply con tid rid ty data =
- if (String.length data) > xenstore_payload_max && (is_backend_mmap con) then
-@@ -280,9 +278,7 @@ let get_transaction con tid =
-
- let do_input con = Xenbus.Xb.input con.xb
- let has_input con = Xenbus.Xb.has_in_packet con.xb
--let has_partial_input con = match con.xb.Xenbus.Xb.partial_in with
-- | HaveHdr _ -> true
-- | NoHdr (n, _) -> n < Xenbus.Partial.header_size ()
-+let has_partial_input con = Xenbus.Xb.has_partial_input con.xb
- let pop_in con = Xenbus.Xb.get_in_packet con.xb
- let has_more_input con = Xenbus.Xb.has_more_input con.xb
-
---
-2.37.4
-
diff --git a/0066-tools-ocaml-Change-Xb.input-to-return-Packet.t-optio.patch b/0066-tools-ocaml-Change-Xb.input-to-return-Packet.t-optio.patch
deleted file mode 100644
index 0fa056d..0000000
--- a/0066-tools-ocaml-Change-Xb.input-to-return-Packet.t-optio.patch
+++ /dev/null
@@ -1,224 +0,0 @@
-From 59981b08c8ef6eed37b1171656c2a5f3b4b74012 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Wed, 12 Oct 2022 19:13:02 +0100
-Subject: [PATCH 66/87] tools/ocaml: Change Xb.input to return Packet.t option
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The queue here would only ever hold at most one element. This will simplify
-follow-up patches.
-
-This is part of XSA-326.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit c0a86a462721008eca5ff733660de094d3c34bc7)
----
- tools/ocaml/libs/xb/xb.ml | 18 +++++-------------
- tools/ocaml/libs/xb/xb.mli | 5 +----
- tools/ocaml/libs/xs/xsraw.ml | 20 ++++++--------------
- tools/ocaml/xenstored/connection.ml | 4 +---
- tools/ocaml/xenstored/process.ml | 15 +++++++--------
- 5 files changed, 20 insertions(+), 42 deletions(-)
-
-diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
-index 8404ddd8a682..165fd4a1edf4 100644
---- a/tools/ocaml/libs/xb/xb.ml
-+++ b/tools/ocaml/libs/xb/xb.ml
-@@ -45,7 +45,6 @@ type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
- type t =
- {
- backend: backend;
-- pkt_in: Packet.t Queue.t;
- pkt_out: Packet.t Queue.t;
- mutable partial_in: partial_buf;
- mutable partial_out: string;
-@@ -62,7 +61,6 @@ let reconnect t = match t.backend with
- Xs_ring.close backend.mmap;
- backend.eventchn_notify ();
- (* Clear our old connection state *)
-- Queue.clear t.pkt_in;
- Queue.clear t.pkt_out;
- t.partial_in <- init_partial_in ();
- t.partial_out <- ""
-@@ -124,7 +122,6 @@ let output con =
-
- (* NB: can throw Reconnect *)
- let input con =
-- let newpacket = ref false in
- let to_read =
- match con.partial_in with
- | HaveHdr partial_pkt -> Partial.to_complete partial_pkt
-@@ -143,21 +140,19 @@ let input con =
- if Partial.to_complete partial_pkt = 0 then (
- let pkt = Packet.of_partialpkt partial_pkt in
- con.partial_in <- init_partial_in ();
-- Queue.push pkt con.pkt_in;
-- newpacket := true
-- )
-+ Some pkt
-+ ) else None
- | NoHdr (i, buf) ->
- (* we complete the partial header *)
- if sz > 0 then
- Bytes.blit b 0 buf (Partial.header_size () - i) sz;
- con.partial_in <- if sz = i then
-- HaveHdr (Partial.of_string (Bytes.to_string buf)) else NoHdr (i - sz, buf)
-- );
-- !newpacket
-+ HaveHdr (Partial.of_string (Bytes.to_string buf)) else NoHdr (i - sz, buf);
-+ None
-+ )
-
- let newcon backend = {
- backend = backend;
-- pkt_in = Queue.create ();
- pkt_out = Queue.create ();
- partial_in = init_partial_in ();
- partial_out = "";
-@@ -193,9 +188,6 @@ let has_output con = has_new_output con || has_old_output con
-
- let peek_output con = Queue.peek con.pkt_out
-
--let input_len con = Queue.length con.pkt_in
--let has_in_packet con = Queue.length con.pkt_in > 0
--let get_in_packet con = Queue.pop con.pkt_in
- let has_partial_input con = match con.partial_in with
- | HaveHdr _ -> true
- | NoHdr (n, _) -> n < Partial.header_size ()
-diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
-index 794e35bb343e..91c682162cea 100644
---- a/tools/ocaml/libs/xb/xb.mli
-+++ b/tools/ocaml/libs/xb/xb.mli
-@@ -77,7 +77,7 @@ val write_fd : backend_fd -> 'a -> string -> int -> int
- val write_mmap : backend_mmap -> 'a -> string -> int -> int
- val write : t -> string -> int -> int
- val output : t -> bool
--val input : t -> bool
-+val input : t -> Packet.t option
- val newcon : backend -> t
- val open_fd : Unix.file_descr -> t
- val open_mmap : Xenmmap.mmap_interface -> (unit -> unit) -> t
-@@ -89,10 +89,7 @@ val has_new_output : t -> bool
- val has_old_output : t -> bool
- val has_output : t -> bool
- val peek_output : t -> Packet.t
--val input_len : t -> int
--val has_in_packet : t -> bool
- val has_partial_input : t -> bool
--val get_in_packet : t -> Packet.t
- val has_more_input : t -> bool
- val is_selectable : t -> bool
- val get_fd : t -> Unix.file_descr
-diff --git a/tools/ocaml/libs/xs/xsraw.ml b/tools/ocaml/libs/xs/xsraw.ml
-index d982fb24dbb1..451f8b38dbcc 100644
---- a/tools/ocaml/libs/xs/xsraw.ml
-+++ b/tools/ocaml/libs/xs/xsraw.ml
-@@ -94,26 +94,18 @@ let pkt_send con =
- done
-
- (* receive one packet - can sleep *)
--let pkt_recv con =
-- let workdone = ref false in
-- while not !workdone
-- do
-- workdone := Xb.input con.xb
-- done;
-- Xb.get_in_packet con.xb
-+let rec pkt_recv con =
-+ match Xb.input con.xb with
-+ | Some packet -> packet
-+ | None -> pkt_recv con
-
- let pkt_recv_timeout con timeout =
- let fd = Xb.get_fd con.xb in
- let r, _, _ = Unix.select [ fd ] [] [] timeout in
- if r = [] then
- true, None
-- else (
-- let workdone = Xb.input con.xb in
-- if workdone then
-- false, (Some (Xb.get_in_packet con.xb))
-- else
-- false, None
-- )
-+ else
-+ false, Xb.input con.xb
-
- let queue_watchevent con data =
- let ls = split_string ~limit:2 '\000' data in
-diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
-index 38b47363a173..cc20e047d2b9 100644
---- a/tools/ocaml/xenstored/connection.ml
-+++ b/tools/ocaml/xenstored/connection.ml
-@@ -277,9 +277,7 @@ let get_transaction con tid =
- Hashtbl.find con.transactions tid
-
- let do_input con = Xenbus.Xb.input con.xb
--let has_input con = Xenbus.Xb.has_in_packet con.xb
- let has_partial_input con = Xenbus.Xb.has_partial_input con.xb
--let pop_in con = Xenbus.Xb.get_in_packet con.xb
- let has_more_input con = Xenbus.Xb.has_more_input con.xb
-
- let has_output con = Xenbus.Xb.has_output con.xb
-@@ -307,7 +305,7 @@ let is_bad con = match con.dom with None -> false | Some dom -> Domain.is_bad_do
- Restrictions below can be relaxed once xenstored learns to dump more
- of its live state in a safe way *)
- let has_extra_connection_data con =
-- let has_in = has_input con || has_partial_input con in
-+ let has_in = has_partial_input con in
- let has_out = has_output con in
- let has_socket = con.dom = None in
- let has_nondefault_perms = make_perm con.dom <> con.perm in
-diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
-index dd58e6979cf9..cbf708213796 100644
---- a/tools/ocaml/xenstored/process.ml
-+++ b/tools/ocaml/xenstored/process.ml
-@@ -195,10 +195,9 @@ let parse_live_update args =
- | _ when Unix.gettimeofday () < t.deadline -> false
- | l ->
- warn "timeout reached: have to wait, migrate or shutdown %d domains:" (List.length l);
-- let msgs = List.rev_map (fun con -> Printf.sprintf "%s: %d tx, in: %b, out: %b, perm: %s"
-+ let msgs = List.rev_map (fun con -> Printf.sprintf "%s: %d tx, out: %b, perm: %s"
- (Connection.get_domstr con)
- (Connection.number_of_transactions con)
-- (Connection.has_input con)
- (Connection.has_output con)
- (Connection.get_perm con |> Perms.Connection.to_string)
- ) l in
-@@ -706,16 +705,17 @@ let do_input store cons doms con =
- info "%s requests a reconnect" (Connection.get_domstr con);
- History.reconnect con;
- info "%s reconnection complete" (Connection.get_domstr con);
-- false
-+ None
- | Failure exp ->
- error "caught exception %s" exp;
- error "got a bad client %s" (sprintf "%-8s" (Connection.get_domstr con));
- Connection.mark_as_bad con;
-- false
-+ None
- in
-
-- if newpacket then (
-- let packet = Connection.pop_in con in
-+ match newpacket with
-+ | None -> ()
-+ | Some packet ->
- let tid, rid, ty, data = Xenbus.Xb.Packet.unpack packet in
- let req = {Packet.tid=tid; Packet.rid=rid; Packet.ty=ty; Packet.data=data} in
-
-@@ -725,8 +725,7 @@ let do_input store cons doms con =
- (Xenbus.Xb.Op.to_string ty) (sanitize_data data); *)
- process_packet ~store ~cons ~doms ~con ~req;
- write_access_log ~ty ~tid ~con:(Connection.get_domstr con) ~data;
-- Connection.incr_ops con;
-- )
-+ Connection.incr_ops con
-
- let do_output _store _cons _doms con =
- if Connection.has_output con then (
---
-2.37.4
-
diff --git a/0067-tools-ocaml-xb-Add-BoundedQueue.patch b/0067-tools-ocaml-xb-Add-BoundedQueue.patch
deleted file mode 100644
index 9a141a3..0000000
--- a/0067-tools-ocaml-xb-Add-BoundedQueue.patch
+++ /dev/null
@@ -1,133 +0,0 @@
-From ea1567893b05df03fe65657f0a25211a6a9ff7ec Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Wed, 12 Oct 2022 19:13:03 +0100
-Subject: [PATCH 67/87] tools/ocaml/xb: Add BoundedQueue
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Ensures we cannot store more than [capacity] elements in a [Queue]. Replacing
-all Queue uses with this module will then ensure at compile time that all
-queues are correctly bounds-checked.
-
-Each element in the queue has a class with its own limits. This, in a
-subsequent change, will ensure that command responses can proceed during a
-flood of watch events.
-
-No functional change.
-
-This is part of XSA-326.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 19171fb5d888b4467a7073e8febc5e05540956e9)
----
- tools/ocaml/libs/xb/xb.ml | 92 +++++++++++++++++++++++++++++++++++++++
- 1 file changed, 92 insertions(+)
-
-diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
-index 165fd4a1edf4..4197a3888a68 100644
---- a/tools/ocaml/libs/xb/xb.ml
-+++ b/tools/ocaml/libs/xb/xb.ml
-@@ -17,6 +17,98 @@
- module Op = struct include Op end
- module Packet = struct include Packet end
-
-+module BoundedQueue : sig
-+ type ('a, 'b) t
-+
-+ (** [create ~capacity ~classify ~limit] creates a queue with maximum [capacity] elements.
-+ This is burst capacity, each element is further classified according to [classify],
-+ and each class can have its own [limit].
-+ [capacity] is enforced as an overall limit.
-+ The [limit] can be dynamic, and can be smaller than the number of elements already queued of that class,
-+ in which case those elements are considered to use "burst capacity".
-+ *)
-+ val create: capacity:int -> classify:('a -> 'b) -> limit:('b -> int) -> ('a, 'b) t
-+
-+ (** [clear q] discards all elements from [q] *)
-+ val clear: ('a, 'b) t -> unit
-+
-+ (** [can_push q] when [length q < capacity]. *)
-+ val can_push: ('a, 'b) t -> 'b -> bool
-+
-+ (** [push e q] adds [e] at the end of queue [q] if [can_push q], or returns [None]. *)
-+ val push: 'a -> ('a, 'b) t -> unit option
-+
-+ (** [pop q] removes and returns first element in [q], or raises [Queue.Empty]. *)
-+ val pop: ('a, 'b) t -> 'a
-+
-+ (** [peek q] returns the first element in [q], or raises [Queue.Empty]. *)
-+ val peek : ('a, 'b) t -> 'a
-+
-+ (** [length q] returns the current number of elements in [q] *)
-+ val length: ('a, 'b) t -> int
-+
-+ (** [debug string_of_class q] prints queue usage statistics in an unspecified internal format. *)
-+ val debug: ('b -> string) -> (_, 'b) t -> string
-+end = struct
-+ type ('a, 'b) t =
-+ { q: 'a Queue.t
-+ ; capacity: int
-+ ; classify: 'a -> 'b
-+ ; limit: 'b -> int
-+ ; class_count: ('b, int) Hashtbl.t
-+ }
-+
-+ let create ~capacity ~classify ~limit =
-+ { capacity; q = Queue.create (); classify; limit; class_count = Hashtbl.create 3 }
-+
-+ let get_count t classification = try Hashtbl.find t.class_count classification with Not_found -> 0
-+
-+ let can_push_internal t classification class_count =
-+ Queue.length t.q < t.capacity && class_count < t.limit classification
-+
-+ let ok = Some ()
-+
-+ let push e t =
-+ let classification = t.classify e in
-+ let class_count = get_count t classification in
-+ if can_push_internal t classification class_count then begin
-+ Queue.push e t.q;
-+ Hashtbl.replace t.class_count classification (class_count + 1);
-+ ok
-+ end
-+ else
-+ None
-+
-+ let can_push t classification =
-+ can_push_internal t classification @@ get_count t classification
-+
-+ let clear t =
-+ Queue.clear t.q;
-+ Hashtbl.reset t.class_count
-+
-+ let pop t =
-+ let e = Queue.pop t.q in
-+ let classification = t.classify e in
-+ let () = match get_count t classification - 1 with
-+ | 0 -> Hashtbl.remove t.class_count classification (* reduces memusage *)
-+ | n -> Hashtbl.replace t.class_count classification n
-+ in
-+ e
-+
-+ let peek t = Queue.peek t.q
-+ let length t = Queue.length t.q
-+
-+ let debug string_of_class t =
-+ let b = Buffer.create 128 in
-+ Printf.bprintf b "BoundedQueue capacity: %d, used: {" t.capacity;
-+ Hashtbl.iter (fun packet_class count ->
-+ Printf.bprintf b " %s: %d" (string_of_class packet_class) count
-+ ) t.class_count;
-+ Printf.bprintf b "}";
-+ Buffer.contents b
-+end
-+
-+
- exception End_of_file
- exception Eagain
- exception Noent
---
-2.37.4
-
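The removed 0067 patch adds the BoundedQueue module to tools/ocaml/libs/xb/xb.ml: an overall [capacity] plus a per-class [limit], with [push] returning [unit option] so callers see back-pressure. A small self-contained sketch of the same idea, assuming illustrative names and a fixed string classification instead of the real packet_class type:

(* Standalone sketch, not the xb.ml module: per-class counts tracked in a
   hashtable, refusal expressed as [None] just like the patched Xb.queue. *)
type 'a bq = {
  q : ('a * string) Queue.t;        (* element together with its class *)
  capacity : int;                   (* overall burst capacity *)
  limit : string -> int;            (* per-class limit *)
  counts : (string, int) Hashtbl.t;
}

let create ~capacity ~limit =
  { q = Queue.create (); capacity; limit; counts = Hashtbl.create 3 }

let count t cls = try Hashtbl.find t.counts cls with Not_found -> 0

(* [push] refuses the element (returns [None]) when either the overall
   capacity or the per-class limit is reached. *)
let push t cls x =
  let c = count t cls in
  if Queue.length t.q >= t.capacity || c >= t.limit cls then None
  else begin
    Queue.push (x, cls) t.q;
    Hashtbl.replace t.counts cls (c + 1);
    Some ()
  end

let () =
  let t = create ~capacity:4 ~limit:(function "watch" -> 1 | _ -> 4) in
  assert (push t "reply" "r1" = Some ());
  assert (push t "watch" "w1" = Some ());
  assert (push t "watch" "w2" = None)   (* per-class limit of 1 reached *)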
diff --git a/0068-tools-ocaml-Limit-maximum-in-flight-requests-outstan.patch b/0068-tools-ocaml-Limit-maximum-in-flight-requests-outstan.patch
deleted file mode 100644
index 0572fa1..0000000
--- a/0068-tools-ocaml-Limit-maximum-in-flight-requests-outstan.patch
+++ /dev/null
@@ -1,888 +0,0 @@
-From cec3c52c287f5aee7de061b40765aca5301cf9ca Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Wed, 12 Oct 2022 19:13:04 +0100
-Subject: [PATCH 68/87] tools/ocaml: Limit maximum in-flight requests /
- outstanding replies
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Introduce a limit on the number of outstanding reply packets in the xenbus
-queue. This limits the number of in-flight requests: when the output queue is
-full we'll stop processing inputs until the output queue has room again.
-
-To avoid a busy loop on the Unix socket we only add it to the watched input
-file descriptor set if we'd be able to call `input` on it. Even though Dom0
-is trusted and exempt from quotas, a flood of events might cause a backlog
-where events are produced faster than daemons in Dom0 can consume them, which
-could lead to an unbounded queue size and OOM.
-
-Therefore the xenbus queue limit must apply to all connections, Dom0 is not
-exempt from it, although if everything works correctly it will eventually
-catch up.
-
-This prevents a malicious guest from sending more commands while it has
-outstanding watch events or command replies in its input ring. However if it
-can cause the generation of watch events by other means (e.g. by Dom0, or
-another cooperative guest) and stop reading its own ring then watch events
-would've queued up without limit.
-
-The xenstore protocol doesn't have a back-pressure mechanism, and doesn't
-allow dropping watch events. In fact, dropping watch events is known to break
-some pieces of normal functionality. This leaves little choice to safely
-implement the xenstore protocol without exposing the xenstore daemon to
-out-of-memory attacks.
-
-Implement the fix as pipes with bounded buffers:
-* Use a bounded buffer for watch events
-* The watch structure will have a bounded receiving pipe of watch events
-* The source will have an "overflow" pipe of pending watch events it couldn't
- deliver
-
-Items are queued up on one end and are sent as far along the pipe as possible:
-
- source domain -> watch -> xenbus of target -> xenstore ring/socket of target
-
-If the pipe is "full" at any point then back-pressure is applied and we prevent
-more items from being queued up. For the source domain this means that we'll
-stop accepting new commands as long as its pipe buffer is not empty.
-
-Before we try to enqueue an item we first check whether it is possible to send
-it further down the pipe, by attempting to recursively flush the pipes. This
-ensures that we retain the order of events as much as possible.
-
-We might break causality of watch events if the target domain's queue is full
-and we need to start using the watch's queue. This is a breaking change in
-the xenstore protocol, but only for domains which are not processing their
-incoming ring as expected.
-
-When a watch is deleted its entire pending queue is dropped (no code is needed
-for that, because it is part of the 'watch' type).
-
-There is a cache of watches that have pending events that we attempt to flush
-at every cycle if possible.
-
-Introduce 3 limits here:
-* quota-maxwatchevents on watch event destination: when this is hit the
- source will not be allowed to queue up more watch events.
-* quota-maxoutstanding, which is the number of responses not read from the ring:
- once exceeded, no more inputs are processed until all outstanding replies
- are consumed by the client.
-* overflow queue on the watch event source: all watches that cannot be stored
- on destination are queued up here, a single command can trigger multiple
- watches (e.g. due to recursion).
-
-The overflow queue currently doesn't have an upper bound; it is difficult to
-calculate one accurately, as it depends on whether you are Dom0, how many
-watches each path has registered, and how many watch events you can trigger
-with a single command (e.g. a commit). However, these events were already
-using memory; this just moves them elsewhere, and as long as we correctly
-block a domain it shouldn't result in unbounded memory usage.
-
-Note that Dom0 is not excluded from these checks; it is important that Dom0 is
-especially not excluded when it is the source, since there are many ways in
-which a guest could trigger Dom0 to send it watch events.
-
-This should protect against malicious frontends as long as the backend follows
-the PV xenstore protocol and only exposes paths needed by the frontend, and
-changes those paths at most once as a reaction to guest events, or protocol
-state.
-
-The queue limits are per watch, and per domain-pair, so even if one
-communication channel would be "blocked", others would keep working, and the
-domain itself won't get blocked as long as it doesn't overflow the queue of
-watch events.
-
-Similarly, a malicious backend could cause the frontend to get blocked, but
-this watch queue protects the frontend as well, as long as it follows the PV
-protocol. (Although note that protection against malicious backends is only a
-best effort at the moment)
-
-This is part of XSA-326 / CVE-2022-42318.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 9284ae0c40fb5b9606947eaaec23dc71d0540e96)
----
- tools/ocaml/libs/xb/xb.ml | 61 +++++++--
- tools/ocaml/libs/xb/xb.mli | 11 +-
- tools/ocaml/libs/xs/queueop.ml | 25 ++--
- tools/ocaml/libs/xs/xsraw.ml | 4 +-
- tools/ocaml/xenstored/connection.ml | 155 +++++++++++++++++++++--
- tools/ocaml/xenstored/connections.ml | 57 +++++++--
- tools/ocaml/xenstored/define.ml | 7 +
- tools/ocaml/xenstored/oxenstored.conf.in | 2 +
- tools/ocaml/xenstored/process.ml | 31 ++++-
- tools/ocaml/xenstored/xenstored.ml | 2 +
- 10 files changed, 296 insertions(+), 59 deletions(-)
-
-diff --git a/tools/ocaml/libs/xb/xb.ml b/tools/ocaml/libs/xb/xb.ml
-index 4197a3888a68..b292ed7a874d 100644
---- a/tools/ocaml/libs/xb/xb.ml
-+++ b/tools/ocaml/libs/xb/xb.ml
-@@ -134,14 +134,44 @@ type backend = Fd of backend_fd | Xenmmap of backend_mmap
-
- type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
-
-+(*
-+ separate capacity reservation for replies and watch events:
-+ this allows a domain to keep working even when under a constant flood of
-+ watch events
-+*)
-+type capacity = { maxoutstanding: int; maxwatchevents: int }
-+
-+module Queue = BoundedQueue
-+
-+type packet_class =
-+ | CommandReply
-+ | Watchevent
-+
-+let string_of_packet_class = function
-+ | CommandReply -> "command_reply"
-+ | Watchevent -> "watch_event"
-+
- type t =
- {
- backend: backend;
-- pkt_out: Packet.t Queue.t;
-+ pkt_out: (Packet.t, packet_class) Queue.t;
- mutable partial_in: partial_buf;
- mutable partial_out: string;
-+ capacity: capacity
- }
-
-+let to_read con =
-+ match con.partial_in with
-+ | HaveHdr partial_pkt -> Partial.to_complete partial_pkt
-+ | NoHdr (i, _) -> i
-+
-+let debug t =
-+ Printf.sprintf "XenBus state: partial_in: %d needed, partial_out: %d bytes, pkt_out: %d packets, %s"
-+ (to_read t)
-+ (String.length t.partial_out)
-+ (Queue.length t.pkt_out)
-+ (BoundedQueue.debug string_of_packet_class t.pkt_out)
-+
- let init_partial_in () = NoHdr
- (Partial.header_size (), Bytes.make (Partial.header_size()) '\000')
-
-@@ -199,7 +229,8 @@ let output con =
- let s = if String.length con.partial_out > 0 then
- con.partial_out
- else if Queue.length con.pkt_out > 0 then
-- Packet.to_string (Queue.pop con.pkt_out)
-+ let pkt = Queue.pop con.pkt_out in
-+ Packet.to_string pkt
- else
- "" in
- (* send data from s, and save the unsent data to partial_out *)
-@@ -212,12 +243,15 @@ let output con =
- (* after sending one packet, partial is empty *)
- con.partial_out = ""
-
-+(* we can only process an input packet if we're guaranteed to have room
-+ to store the response packet *)
-+let can_input con = Queue.can_push con.pkt_out CommandReply
-+
- (* NB: can throw Reconnect *)
- let input con =
-- let to_read =
-- match con.partial_in with
-- | HaveHdr partial_pkt -> Partial.to_complete partial_pkt
-- | NoHdr (i, _) -> i in
-+ if not (can_input con) then None
-+ else
-+ let to_read = to_read con in
-
- (* try to get more data from input stream *)
- let b = Bytes.make to_read '\000' in
-@@ -243,11 +277,22 @@ let input con =
- None
- )
-
--let newcon backend = {
-+let classify t =
-+ match t.Packet.ty with
-+ | Op.Watchevent -> Watchevent
-+ | _ -> CommandReply
-+
-+let newcon ~capacity backend =
-+ let limit = function
-+ | CommandReply -> capacity.maxoutstanding
-+ | Watchevent -> capacity.maxwatchevents
-+ in
-+ {
- backend = backend;
-- pkt_out = Queue.create ();
-+ pkt_out = Queue.create ~capacity:(capacity.maxoutstanding + capacity.maxwatchevents) ~classify ~limit;
- partial_in = init_partial_in ();
- partial_out = "";
-+ capacity = capacity;
- }
-
- let open_fd fd = newcon (Fd { fd = fd; })
-diff --git a/tools/ocaml/libs/xb/xb.mli b/tools/ocaml/libs/xb/xb.mli
-index 91c682162cea..71b2754ca788 100644
---- a/tools/ocaml/libs/xb/xb.mli
-+++ b/tools/ocaml/libs/xb/xb.mli
-@@ -66,10 +66,11 @@ type backend_mmap = {
- type backend_fd = { fd : Unix.file_descr; }
- type backend = Fd of backend_fd | Xenmmap of backend_mmap
- type partial_buf = HaveHdr of Partial.pkt | NoHdr of int * bytes
-+type capacity = { maxoutstanding: int; maxwatchevents: int }
- type t
- val init_partial_in : unit -> partial_buf
- val reconnect : t -> unit
--val queue : t -> Packet.t -> unit
-+val queue : t -> Packet.t -> unit option
- val read_fd : backend_fd -> 'a -> bytes -> int -> int
- val read_mmap : backend_mmap -> 'a -> bytes -> int -> int
- val read : t -> bytes -> int -> int
-@@ -78,13 +79,14 @@ val write_mmap : backend_mmap -> 'a -> string -> int -> int
- val write : t -> string -> int -> int
- val output : t -> bool
- val input : t -> Packet.t option
--val newcon : backend -> t
--val open_fd : Unix.file_descr -> t
--val open_mmap : Xenmmap.mmap_interface -> (unit -> unit) -> t
-+val newcon : capacity:capacity -> backend -> t
-+val open_fd : Unix.file_descr -> capacity:capacity -> t
-+val open_mmap : Xenmmap.mmap_interface -> (unit -> unit) -> capacity:capacity -> t
- val close : t -> unit
- val is_fd : t -> bool
- val is_mmap : t -> bool
- val output_len : t -> int
-+val can_input: t -> bool
- val has_new_output : t -> bool
- val has_old_output : t -> bool
- val has_output : t -> bool
-@@ -93,3 +95,4 @@ val has_partial_input : t -> bool
- val has_more_input : t -> bool
- val is_selectable : t -> bool
- val get_fd : t -> Unix.file_descr
-+val debug: t -> string
-diff --git a/tools/ocaml/libs/xs/queueop.ml b/tools/ocaml/libs/xs/queueop.ml
-index 9ff5bbd529ce..4e532cdaeacb 100644
---- a/tools/ocaml/libs/xs/queueop.ml
-+++ b/tools/ocaml/libs/xs/queueop.ml
-@@ -16,9 +16,10 @@
- open Xenbus
-
- let data_concat ls = (String.concat "\000" ls) ^ "\000"
-+let queue con pkt = let r = Xb.queue con pkt in assert (r <> None)
- let queue_path ty (tid: int) (path: string) con =
- let data = data_concat [ path; ] in
-- Xb.queue con (Xb.Packet.create tid 0 ty data)
-+ queue con (Xb.Packet.create tid 0 ty data)
-
- (* operations *)
- let directory tid path con = queue_path Xb.Op.Directory tid path con
-@@ -27,48 +28,48 @@ let read tid path con = queue_path Xb.Op.Read tid path con
- let getperms tid path con = queue_path Xb.Op.Getperms tid path con
-
- let debug commands con =
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Debug (data_concat commands))
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Debug (data_concat commands))
-
- let watch path data con =
- let data = data_concat [ path; data; ] in
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Watch data)
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Watch data)
-
- let unwatch path data con =
- let data = data_concat [ path; data; ] in
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Unwatch data)
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Unwatch data)
-
- let transaction_start con =
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Transaction_start (data_concat []))
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Transaction_start (data_concat []))
-
- let transaction_end tid commit con =
- let data = data_concat [ (if commit then "T" else "F"); ] in
-- Xb.queue con (Xb.Packet.create tid 0 Xb.Op.Transaction_end data)
-+ queue con (Xb.Packet.create tid 0 Xb.Op.Transaction_end data)
-
- let introduce domid mfn port con =
- let data = data_concat [ Printf.sprintf "%u" domid;
- Printf.sprintf "%nu" mfn;
- string_of_int port; ] in
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Introduce data)
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Introduce data)
-
- let release domid con =
- let data = data_concat [ Printf.sprintf "%u" domid; ] in
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Release data)
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Release data)
-
- let resume domid con =
- let data = data_concat [ Printf.sprintf "%u" domid; ] in
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Resume data)
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Resume data)
-
- let getdomainpath domid con =
- let data = data_concat [ Printf.sprintf "%u" domid; ] in
-- Xb.queue con (Xb.Packet.create 0 0 Xb.Op.Getdomainpath data)
-+ queue con (Xb.Packet.create 0 0 Xb.Op.Getdomainpath data)
-
- let write tid path value con =
- let data = path ^ "\000" ^ value (* no NULL at the end *) in
-- Xb.queue con (Xb.Packet.create tid 0 Xb.Op.Write data)
-+ queue con (Xb.Packet.create tid 0 Xb.Op.Write data)
-
- let mkdir tid path con = queue_path Xb.Op.Mkdir tid path con
- let rm tid path con = queue_path Xb.Op.Rm tid path con
-
- let setperms tid path perms con =
- let data = data_concat [ path; perms ] in
-- Xb.queue con (Xb.Packet.create tid 0 Xb.Op.Setperms data)
-+ queue con (Xb.Packet.create tid 0 Xb.Op.Setperms data)
-diff --git a/tools/ocaml/libs/xs/xsraw.ml b/tools/ocaml/libs/xs/xsraw.ml
-index 451f8b38dbcc..cbd17280600c 100644
---- a/tools/ocaml/libs/xs/xsraw.ml
-+++ b/tools/ocaml/libs/xs/xsraw.ml
-@@ -36,8 +36,10 @@ type con = {
- let close con =
- Xb.close con.xb
-
-+let capacity = { Xb.maxoutstanding = 1; maxwatchevents = 0; }
-+
- let open_fd fd = {
-- xb = Xb.open_fd fd;
-+ xb = Xb.open_fd ~capacity fd;
- watchevents = Queue.create ();
- }
-
-diff --git a/tools/ocaml/xenstored/connection.ml b/tools/ocaml/xenstored/connection.ml
-index cc20e047d2b9..9624a5f9da2c 100644
---- a/tools/ocaml/xenstored/connection.ml
-+++ b/tools/ocaml/xenstored/connection.ml
-@@ -20,12 +20,84 @@ open Stdext
-
- let xenstore_payload_max = 4096 (* xen/include/public/io/xs_wire.h *)
-
-+type 'a bounded_sender = 'a -> unit option
-+(** a bounded sender accepts an ['a] item and returns:
-+ None - if there is no room to accept the item
-+ Some () - if it has successfully accepted/sent the item
-+ *)
-+
-+module BoundedPipe : sig
-+ type 'a t
-+
-+ (** [create ~capacity ~destination] creates a bounded pipe with a
-+ local buffer holding at most [capacity] items. Once the buffer is
-+ full it will not accept further items. items from the pipe are
-+ flushed into [destination] as long as it accepts items. The
-+ destination could be another pipe.
-+ *)
-+ val create: capacity:int -> destination:'a bounded_sender -> 'a t
-+
-+ (** [is_empty t] returns whether the local buffer of [t] is empty. *)
-+ val is_empty : _ t -> bool
-+
-+ (** [length t] the number of items in the internal buffer *)
-+ val length: _ t -> int
-+
-+ (** [flush_pipe t] sends as many items from the local buffer as possible,
-+ which could be none. *)
-+ val flush_pipe: _ t -> unit
-+
-+ (** [push t item] tries to [flush_pipe] and then push [item]
-+ into the pipe if its [capacity] allows.
-+ Returns [None] if there is no more room
-+ *)
-+ val push : 'a t -> 'a bounded_sender
-+end = struct
-+ (* items are enqueued in [q], and then flushed to [connect_to] *)
-+ type 'a t =
-+ { q: 'a Queue.t
-+ ; destination: 'a bounded_sender
-+ ; capacity: int
-+ }
-+
-+ let create ~capacity ~destination =
-+ { q = Queue.create (); capacity; destination }
-+
-+ let rec flush_pipe t =
-+ if not Queue.(is_empty t.q) then
-+ let item = Queue.peek t.q in
-+ match t.destination item with
-+ | None -> () (* no room *)
-+ | Some () ->
-+ (* successfully sent item to next stage *)
-+ let _ = Queue.pop t.q in
-+ (* continue trying to send more items *)
-+ flush_pipe t
-+
-+ let push t item =
-+ (* first try to flush as many items from this pipe as possible to make room,
-+ it is important to do this first to preserve the order of the items
-+ *)
-+ flush_pipe t;
-+ if Queue.length t.q < t.capacity then begin
-+ (* enqueue, instead of sending directly.
-+ this ensures that [out] sees the items in the same order as we receive them
-+ *)
-+ Queue.push item t.q;
-+ Some (flush_pipe t)
-+ end else None
-+
-+ let is_empty t = Queue.is_empty t.q
-+ let length t = Queue.length t.q
-+end
-+
- type watch = {
- con: t;
- token: string;
- path: string;
- base: string;
- is_relative: bool;
-+ pending_watchevents: Xenbus.Xb.Packet.t BoundedPipe.t;
- }
-
- and t = {
-@@ -38,8 +110,36 @@ and t = {
- anonid: int;
- mutable stat_nb_ops: int;
- mutable perm: Perms.Connection.t;
-+ pending_source_watchevents: (watch * Xenbus.Xb.Packet.t) BoundedPipe.t
- }
-
-+module Watch = struct
-+ module T = struct
-+ type t = watch
-+
-+ let compare w1 w2 =
-+ (* cannot compare watches from different connections *)
-+ assert (w1.con == w2.con);
-+ match String.compare w1.token w2.token with
-+ | 0 -> String.compare w1.path w2.path
-+ | n -> n
-+ end
-+ module Set = Set.Make(T)
-+
-+ let flush_events t =
-+ BoundedPipe.flush_pipe t.pending_watchevents;
-+ not (BoundedPipe.is_empty t.pending_watchevents)
-+
-+ let pending_watchevents t =
-+ BoundedPipe.length t.pending_watchevents
-+end
-+
-+let source_flush_watchevents t =
-+ BoundedPipe.flush_pipe t.pending_source_watchevents
-+
-+let source_pending_watchevents t =
-+ BoundedPipe.length t.pending_source_watchevents
-+
- let mark_as_bad con =
- match con.dom with
- |None -> ()
-@@ -67,7 +167,8 @@ let watch_create ~con ~path ~token = {
- token = token;
- path = path;
- base = get_path con;
-- is_relative = path.[0] <> '/' && path.[0] <> '@'
-+ is_relative = path.[0] <> '/' && path.[0] <> '@';
-+ pending_watchevents = BoundedPipe.create ~capacity:!Define.maxwatchevents ~destination:(Xenbus.Xb.queue con.xb)
- }
-
- let get_con w = w.con
-@@ -93,6 +194,9 @@ let make_perm dom =
- Perms.Connection.create ~perms:[Perms.READ; Perms.WRITE] domid
-
- let create xbcon dom =
-+ let destination (watch, pkt) =
-+ BoundedPipe.push watch.pending_watchevents pkt
-+ in
- let id =
- match dom with
- | None -> let old = !anon_id_next in incr anon_id_next; old
-@@ -109,6 +213,16 @@ let create xbcon dom =
- anonid = id;
- stat_nb_ops = 0;
- perm = make_perm dom;
-+
-+ (* the actual capacity will be lower, this is used as an overflow
-+ buffer: anything that doesn't fit elsewhere gets put here, only
-+ limited by the amount of watches that you can generate with a
-+ single xenstore command (which is finite, although possibly very
-+ large in theory for Dom0). Once the pipe here has any contents the
-+ domain is blocked from sending more commands until it is empty
-+ again though.
-+ *)
-+ pending_source_watchevents = BoundedPipe.create ~capacity:Sys.max_array_length ~destination
- }
- in
- Logging.new_connection ~tid:Transaction.none ~con:(get_domstr con);
-@@ -127,11 +241,17 @@ let set_target con target_domid =
-
- let is_backend_mmap con = Xenbus.Xb.is_mmap con.xb
-
--let send_reply con tid rid ty data =
-+let packet_of con tid rid ty data =
- if (String.length data) > xenstore_payload_max && (is_backend_mmap con) then
-- Xenbus.Xb.queue con.xb (Xenbus.Xb.Packet.create tid rid Xenbus.Xb.Op.Error "E2BIG\000")
-+ Xenbus.Xb.Packet.create tid rid Xenbus.Xb.Op.Error "E2BIG\000"
- else
-- Xenbus.Xb.queue con.xb (Xenbus.Xb.Packet.create tid rid ty data)
-+ Xenbus.Xb.Packet.create tid rid ty data
-+
-+let send_reply con tid rid ty data =
-+ let result = Xenbus.Xb.queue con.xb (packet_of con tid rid ty data) in
-+ (* should never happen: we only process an input packet when there is room for an output packet *)
-+ (* and the limit for replies is different from the limit for watch events *)
-+ assert (result <> None)
-
- let send_error con tid rid err = send_reply con tid rid Xenbus.Xb.Op.Error (err ^ "\000")
- let send_ack con tid rid ty = send_reply con tid rid ty "OK\000"
-@@ -181,11 +301,11 @@ let del_watch con path token =
- apath, w
-
- let del_watches con =
-- Hashtbl.clear con.watches;
-+ Hashtbl.reset con.watches;
- con.nb_watches <- 0
-
- let del_transactions con =
-- Hashtbl.clear con.transactions
-+ Hashtbl.reset con.transactions
-
- let list_watches con =
- let ll = Hashtbl.fold
-@@ -208,21 +328,29 @@ let lookup_watch_perm path = function
- let lookup_watch_perms oldroot root path =
- lookup_watch_perm path oldroot @ lookup_watch_perm path (Some root)
-
--let fire_single_watch_unchecked watch =
-+let fire_single_watch_unchecked source watch =
- let data = Utils.join_by_null [watch.path; watch.token; ""] in
-- send_reply watch.con Transaction.none 0 Xenbus.Xb.Op.Watchevent data
-+ let pkt = packet_of watch.con Transaction.none 0 Xenbus.Xb.Op.Watchevent data in
-
--let fire_single_watch (oldroot, root) watch =
-+ match BoundedPipe.push source.pending_source_watchevents (watch, pkt) with
-+ | Some () -> () (* packet queued *)
-+ | None ->
-+ (* a well behaved Dom0 shouldn't be able to trigger this,
-+ if it happens it is likely a Dom0 bug causing runaway memory usage
-+ *)
-+ failwith "watch event overflow, cannot happen"
-+
-+let fire_single_watch source (oldroot, root) watch =
- let abspath = get_watch_path watch.con watch.path |> Store.Path.of_string in
- let perms = lookup_watch_perms oldroot root abspath in
- if Perms.can_fire_watch watch.con.perm perms then
-- fire_single_watch_unchecked watch
-+ fire_single_watch_unchecked source watch
- else
- let perms = perms |> List.map (Perms.Node.to_string ~sep:" ") |> String.concat ", " in
- let con = get_domstr watch.con in
- Logging.watch_not_fired ~con perms (Store.Path.to_string abspath)
-
--let fire_watch roots watch path =
-+let fire_watch source roots watch path =
- let new_path =
- if watch.is_relative && path.[0] = '/'
- then begin
-@@ -232,7 +360,7 @@ let fire_watch roots watch path =
- end else
- path
- in
-- fire_single_watch roots { watch with path = new_path }
-+ fire_single_watch source roots { watch with path = new_path }
-
- (* Search for a valid unused transaction id. *)
- let rec valid_transaction_id con proposed_id =
-@@ -280,6 +408,7 @@ let do_input con = Xenbus.Xb.input con.xb
- let has_partial_input con = Xenbus.Xb.has_partial_input con.xb
- let has_more_input con = Xenbus.Xb.has_more_input con.xb
-
-+let can_input con = Xenbus.Xb.can_input con.xb && BoundedPipe.is_empty con.pending_source_watchevents
- let has_output con = Xenbus.Xb.has_output con.xb
- let has_old_output con = Xenbus.Xb.has_old_output con.xb
- let has_new_output con = Xenbus.Xb.has_new_output con.xb
-@@ -323,7 +452,7 @@ let prevents_live_update con = not (is_bad con)
- && (has_extra_connection_data con || has_transaction_data con)
-
- let has_more_work con =
-- has_more_input con || not (has_old_output con) && has_new_output con
-+ (has_more_input con && can_input con) || not (has_old_output con) && has_new_output con
-
- let incr_ops con = con.stat_nb_ops <- con.stat_nb_ops + 1
-
-diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml
-index 3c7429fe7f61..7d68c583b43a 100644
---- a/tools/ocaml/xenstored/connections.ml
-+++ b/tools/ocaml/xenstored/connections.ml
-@@ -22,22 +22,30 @@ type t = {
- domains: (int, Connection.t) Hashtbl.t;
- ports: (Xeneventchn.t, Connection.t) Hashtbl.t;
- mutable watches: Connection.watch list Trie.t;
-+ mutable has_pending_watchevents: Connection.Watch.Set.t
- }
-
- let create () = {
- anonymous = Hashtbl.create 37;
- domains = Hashtbl.create 37;
- ports = Hashtbl.create 37;
-- watches = Trie.create ()
-+ watches = Trie.create ();
-+ has_pending_watchevents = Connection.Watch.Set.empty;
- }
-
-+let get_capacity () =
-+ (* not multiplied by maxwatch on purpose: 2nd queue in watch itself! *)
-+ { Xenbus.Xb.maxoutstanding = !Define.maxoutstanding; maxwatchevents = !Define.maxwatchevents }
-+
- let add_anonymous cons fd =
-- let xbcon = Xenbus.Xb.open_fd fd in
-+ let capacity = get_capacity () in
-+ let xbcon = Xenbus.Xb.open_fd fd ~capacity in
- let con = Connection.create xbcon None in
- Hashtbl.add cons.anonymous (Xenbus.Xb.get_fd xbcon) con
-
- let add_domain cons dom =
-- let xbcon = Xenbus.Xb.open_mmap (Domain.get_interface dom) (fun () -> Domain.notify dom) in
-+ let capacity = get_capacity () in
-+ let xbcon = Xenbus.Xb.open_mmap ~capacity (Domain.get_interface dom) (fun () -> Domain.notify dom) in
- let con = Connection.create xbcon (Some dom) in
- Hashtbl.add cons.domains (Domain.get_id dom) con;
- match Domain.get_port dom with
-@@ -48,7 +56,9 @@ let select ?(only_if = (fun _ -> true)) cons =
- Hashtbl.fold (fun _ con (ins, outs) ->
- if (only_if con) then (
- let fd = Connection.get_fd con in
-- (fd :: ins, if Connection.has_output con then fd :: outs else outs)
-+ let in_fds = if Connection.can_input con then fd :: ins else ins in
-+ let out_fds = if Connection.has_output con then fd :: outs else outs in
-+ in_fds, out_fds
- ) else (ins, outs)
- )
- cons.anonymous ([], [])
-@@ -67,10 +77,17 @@ let del_watches_of_con con watches =
- | [] -> None
- | ws -> Some ws
-
-+let del_watches cons con =
-+ Connection.del_watches con;
-+ cons.watches <- Trie.map (del_watches_of_con con) cons.watches;
-+ cons.has_pending_watchevents <-
-+ cons.has_pending_watchevents |> Connection.Watch.Set.filter @@ fun w ->
-+ Connection.get_con w != con
-+
- let del_anonymous cons con =
- try
- Hashtbl.remove cons.anonymous (Connection.get_fd con);
-- cons.watches <- Trie.map (del_watches_of_con con) cons.watches;
-+ del_watches cons con;
- Connection.close con
- with exn ->
- debug "del anonymous %s" (Printexc.to_string exn)
-@@ -85,7 +102,7 @@ let del_domain cons id =
- | Some p -> Hashtbl.remove cons.ports p
- | None -> ())
- | None -> ());
-- cons.watches <- Trie.map (del_watches_of_con con) cons.watches;
-+ del_watches cons con;
- Connection.close con
- with exn ->
- debug "del domain %u: %s" id (Printexc.to_string exn)
-@@ -136,31 +153,33 @@ let del_watch cons con path token =
- cons.watches <- Trie.set cons.watches key watches;
- watch
-
--let del_watches cons con =
-- Connection.del_watches con;
-- cons.watches <- Trie.map (del_watches_of_con con) cons.watches
--
- (* path is absolute *)
--let fire_watches ?oldroot root cons path recurse =
-+let fire_watches ?oldroot source root cons path recurse =
- let key = key_of_path path in
- let path = Store.Path.to_string path in
- let roots = oldroot, root in
- let fire_watch _ = function
- | None -> ()
-- | Some watches -> List.iter (fun w -> Connection.fire_watch roots w path) watches
-+ | Some watches -> List.iter (fun w -> Connection.fire_watch source roots w path) watches
- in
- let fire_rec _x = function
- | None -> ()
- | Some watches ->
-- List.iter (Connection.fire_single_watch roots) watches
-+ List.iter (Connection.fire_single_watch source roots) watches
- in
- Trie.iter_path fire_watch cons.watches key;
- if recurse then
- Trie.iter fire_rec (Trie.sub cons.watches key)
-
-+let send_watchevents cons con =
-+ cons.has_pending_watchevents <-
-+ cons.has_pending_watchevents |> Connection.Watch.Set.filter Connection.Watch.flush_events;
-+ Connection.source_flush_watchevents con
-+
- let fire_spec_watches root cons specpath =
-+ let source = find_domain cons 0 in
- iter cons (fun con ->
-- List.iter (Connection.fire_single_watch (None, root)) (Connection.get_watches con specpath))
-+ List.iter (Connection.fire_single_watch source (None, root)) (Connection.get_watches con specpath))
-
- let set_target cons domain target_domain =
- let con = find_domain cons domain in
-@@ -197,6 +216,16 @@ let debug cons =
- let domains = Hashtbl.fold (fun _ con accu -> Connection.debug con :: accu) cons.domains [] in
- String.concat "" (domains @ anonymous)
-
-+let debug_watchevents cons con =
-+ (* == (physical equality)
-+ has to be used here because w.con.xb.backend might contain a [unit->unit] value causing regular
-+ comparison to fail due to having a 'functional value' which cannot be compared.
-+ *)
-+ let s = cons.has_pending_watchevents |> Connection.Watch.Set.filter (fun w -> w.con == con) in
-+ let pending = s |> Connection.Watch.Set.elements
-+ |> List.map (fun w -> Connection.Watch.pending_watchevents w) |> List.fold_left (+) 0 in
-+ Printf.sprintf "Watches with pending events: %d, pending events total: %d" (Connection.Watch.Set.cardinal s) pending
-+
- let filter ~f cons =
- let fold _ v acc = if f v then v :: acc else acc in
- []
-diff --git a/tools/ocaml/xenstored/define.ml b/tools/ocaml/xenstored/define.ml
-index ba63a8147e09..327b6d795ec7 100644
---- a/tools/ocaml/xenstored/define.ml
-+++ b/tools/ocaml/xenstored/define.ml
-@@ -24,6 +24,13 @@ let default_config_dir = Paths.xen_config_dir
- let maxwatch = ref (100)
- let maxtransaction = ref (10)
- let maxrequests = ref (1024) (* maximum requests per transaction *)
-+let maxoutstanding = ref (1024) (* maximum outstanding requests, i.e. in-flight requests / domain *)
-+let maxwatchevents = ref (1024)
-+(*
-+ maximum outstanding watch events per watch,
-+ recommended >= maxoutstanding to avoid blocking backend transactions due to
-+ malicious frontends
-+ *)
-
- let gc_max_overhead = ref 120 (* 120% see comment in xenstored.ml *)
- let conflict_burst_limit = ref 5.0
-diff --git a/tools/ocaml/xenstored/oxenstored.conf.in b/tools/ocaml/xenstored/oxenstored.conf.in
-index 4ae48e42d47d..9d034e744b4b 100644
---- a/tools/ocaml/xenstored/oxenstored.conf.in
-+++ b/tools/ocaml/xenstored/oxenstored.conf.in
-@@ -62,6 +62,8 @@ quota-maxwatch = 100
- quota-transaction = 10
- quota-maxrequests = 1024
- quota-path-max = 1024
-+quota-maxoutstanding = 1024
-+quota-maxwatchevents = 1024
-
- # Activate filed base backend
- persistent = false
-diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
-index cbf708213796..ce39ce28b5f3 100644
---- a/tools/ocaml/xenstored/process.ml
-+++ b/tools/ocaml/xenstored/process.ml
-@@ -57,7 +57,7 @@ let split_one_path data con =
- | path :: "" :: [] -> Store.Path.create path (Connection.get_path con)
- | _ -> raise Invalid_Cmd_Args
-
--let process_watch t cons =
-+let process_watch source t cons =
- let oldroot = t.Transaction.oldroot in
- let newroot = Store.get_root t.store in
- let ops = Transaction.get_paths t |> List.rev in
-@@ -67,8 +67,9 @@ let process_watch t cons =
- | Xenbus.Xb.Op.Rm -> true, None, oldroot
- | Xenbus.Xb.Op.Setperms -> false, Some oldroot, newroot
- | _ -> raise (Failure "huh ?") in
-- Connections.fire_watches ?oldroot root cons (snd op) recurse in
-- List.iter (fun op -> do_op_watch op cons) ops
-+ Connections.fire_watches ?oldroot source root cons (snd op) recurse in
-+ List.iter (fun op -> do_op_watch op cons) ops;
-+ Connections.send_watchevents cons source
-
- let create_implicit_path t perm path =
- let dirname = Store.Path.get_parent path in
-@@ -234,6 +235,20 @@ let do_debug con t _domains cons data =
- | "watches" :: _ ->
- let watches = Connections.debug cons in
- Some (watches ^ "\000")
-+ | "xenbus" :: domid :: _ ->
-+ let domid = int_of_string domid in
-+ let con = Connections.find_domain cons domid in
-+ let s = Printf.sprintf "xenbus: %s; overflow queue length: %d, can_input: %b, has_more_input: %b, has_old_output: %b, has_new_output: %b, has_more_work: %b. pending: %s"
-+ (Xenbus.Xb.debug con.xb)
-+ (Connection.source_pending_watchevents con)
-+ (Connection.can_input con)
-+ (Connection.has_more_input con)
-+ (Connection.has_old_output con)
-+ (Connection.has_new_output con)
-+ (Connection.has_more_work con)
-+ (Connections.debug_watchevents cons con)
-+ in
-+ Some s
- | "mfn" :: domid :: _ ->
- let domid = int_of_string domid in
- let con = Connections.find_domain cons domid in
-@@ -342,7 +357,7 @@ let reply_ack fct con t doms cons data =
- fct con t doms cons data;
- Packet.Ack (fun () ->
- if Transaction.get_id t = Transaction.none then
-- process_watch t cons
-+ process_watch con t cons
- )
-
- let reply_data fct con t doms cons data =
-@@ -501,7 +516,7 @@ let do_watch con t _domains cons data =
- Packet.Ack (fun () ->
- (* xenstore.txt says this watch is fired immediately,
- implying even if path doesn't exist or is unreadable *)
-- Connection.fire_single_watch_unchecked watch)
-+ Connection.fire_single_watch_unchecked con watch)
-
- let do_unwatch con _t _domains cons data =
- let (node, token) =
-@@ -532,7 +547,7 @@ let do_transaction_end con t domains cons data =
- if not success then
- raise Transaction_again;
- if commit then begin
-- process_watch t cons;
-+ process_watch con t cons;
- match t.Transaction.ty with
- | Transaction.No ->
- () (* no need to record anything *)
-@@ -700,7 +715,8 @@ let process_packet ~store ~cons ~doms ~con ~req =
- let do_input store cons doms con =
- let newpacket =
- try
-- Connection.do_input con
-+ if Connection.can_input con then Connection.do_input con
-+ else None
- with Xenbus.Xb.Reconnect ->
- info "%s requests a reconnect" (Connection.get_domstr con);
- History.reconnect con;
-@@ -728,6 +744,7 @@ let do_input store cons doms con =
- Connection.incr_ops con
-
- let do_output _store _cons _doms con =
-+ Connection.source_flush_watchevents con;
- if Connection.has_output con then (
- if Connection.has_new_output con then (
- let packet = Connection.peek_output con in
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index 3b57ad016dfb..c799e20f1145 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -103,6 +103,8 @@ let parse_config filename =
- ("quota-maxentity", Config.Set_int Quota.maxent);
- ("quota-maxsize", Config.Set_int Quota.maxsize);
- ("quota-maxrequests", Config.Set_int Define.maxrequests);
-+ ("quota-maxoutstanding", Config.Set_int Define.maxoutstanding);
-+ ("quota-maxwatchevents", Config.Set_int Define.maxwatchevents);
- ("quota-path-max", Config.Set_int Define.path_max);
- ("gc-max-overhead", Config.Set_int Define.gc_max_overhead);
- ("test-eagain", Config.Set_bool Transaction.test_eagain);
---
-2.37.4
-
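The removed 0068 patch builds its flow control out of bounded pipes: items are buffered locally, flushed towards a destination that may refuse them, and once the local buffer is full the push itself fails, propagating back-pressure up to command processing. A self-contained sketch under those assumptions (toy names, a string payload, a destination modelled as a closure; not the BoundedPipe from connection.ml):

(* [None] means "no room", [Some ()] means "accepted", as in the patch. *)
type 'a sender = 'a -> unit option

type 'a pipe = { buf : 'a Queue.t; capacity : int; dest : 'a sender }

let create ~capacity ~dest = { buf = Queue.create (); capacity; dest }

(* Drain as much of the local buffer as the destination will take,
   preserving order. *)
let rec flush p =
  if not (Queue.is_empty p.buf) then
    match p.dest (Queue.peek p.buf) with
    | None -> ()                              (* destination full: stop *)
    | Some () -> ignore (Queue.pop p.buf); flush p

(* Flush first (to make room and keep ordering), then buffer the new item
   if capacity allows; [None] applies back-pressure to the caller. *)
let push p x =
  flush p;
  if Queue.length p.buf < p.capacity then begin
    Queue.push x p.buf;
    flush p;
    Some ()
  end else None

let () =
  (* a destination that accepts at most two items in total *)
  let accepted = ref 0 in
  let dest _ = if !accepted < 2 then (incr accepted; Some ()) else None in
  let p = create ~capacity:1 ~dest in
  assert (push p "a" = Some ());   (* goes straight through *)
  assert (push p "b" = Some ());   (* goes straight through *)
  assert (push p "c" = Some ());   (* buffered locally *)
  assert (push p "d" = None)       (* buffer full: back-pressure *)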
diff --git a/0069-SUPPORT.md-clarify-support-of-untrusted-driver-domai.patch b/0069-SUPPORT.md-clarify-support-of-untrusted-driver-domai.patch
deleted file mode 100644
index 5660b02..0000000
--- a/0069-SUPPORT.md-clarify-support-of-untrusted-driver-domai.patch
+++ /dev/null
@@ -1,55 +0,0 @@
-From a026fddf89420dd25c5a9574d88aeab7c5711f6c Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Thu, 29 Sep 2022 13:07:35 +0200
-Subject: [PATCH 69/87] SUPPORT.md: clarify support of untrusted driver domains
- with oxenstored
-
-Add a support statement for the scope of support regarding different
-Xenstore variants. Especially oxenstored does not (yet) have security
-support of untrusted driver domains, as those might drive oxenstored
-out of memory by creating lots of watch events for the guests they are
-servicing.
-
-Add a statement regarding Live Update support of oxenstored.
-
-This is part of XSA-326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: George Dunlap <george.dunlap@citrix.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit c7bc20d8d123851a468402bbfc9e3330efff21ec)
----
- SUPPORT.md | 13 +++++++++----
- 1 file changed, 9 insertions(+), 4 deletions(-)
-
-diff --git a/SUPPORT.md b/SUPPORT.md
-index 85726102eab8..7d0cb34c8f6f 100644
---- a/SUPPORT.md
-+++ b/SUPPORT.md
-@@ -179,13 +179,18 @@ Support for running qemu-xen device model in a linux stubdomain.
-
- Status: Tech Preview
-
--## Liveupdate of C xenstored daemon
-+## Xenstore
-
-- Status: Tech Preview
-+### C xenstored daemon
-
--## Liveupdate of OCaml xenstored daemon
-+ Status: Supported
-+ Status, Liveupdate: Tech Preview
-
-- Status: Tech Preview
-+### OCaml xenstored daemon
-+
-+ Status: Supported
-+ Status, untrusted driver domains: Supported, not security supported
-+ Status, Liveupdate: Not functional
-
- ## Toolstack/3rd party
-
---
-2.37.4
-
diff --git a/0070-tools-xenstore-don-t-use-conn-in-as-context-for-temp.patch b/0070-tools-xenstore-don-t-use-conn-in-as-context-for-temp.patch
deleted file mode 100644
index 434ad0c..0000000
--- a/0070-tools-xenstore-don-t-use-conn-in-as-context-for-temp.patch
+++ /dev/null
@@ -1,718 +0,0 @@
-From c758765e464e166b5495c76466facc79584bbe1e Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:10 +0200
-Subject: [PATCH 70/87] tools/xenstore: don't use conn->in as context for
- temporary allocations
-
-Using the struct buffered_data pointer of the currently processed request
-for temporary data allocations has a major drawback: the used area (and
-with that the temporary data) is freed only after the response of the
-request has been written to the ring page or has been read via the
-socket. This can happen much later in case a guest isn't reading its
-responses fast enough.
-
-As the temporary data can be safely freed after creating the response,
-add a temporary context for that purpose and use that for allocating
-the temporary memory, as it was already the case before commit
-cc0612464896 ("xenstore: add small default data buffer to internal
-struct").
-
-Some sub-functions need to gain the "const" attribute for the talloc
-context.
-
-This is XSA-416 / CVE-2022-42319.
-
-Fixes: cc0612464896 ("xenstore: add small default data buffer to internal struct")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 2a587de219cc0765330fbf9fac6827bfaf29e29b)
----
- tools/xenstore/xenstored_control.c | 31 ++++++-----
- tools/xenstore/xenstored_control.h | 3 +-
- tools/xenstore/xenstored_core.c | 76 ++++++++++++++++----------
- tools/xenstore/xenstored_domain.c | 29 ++++++----
- tools/xenstore/xenstored_domain.h | 21 ++++---
- tools/xenstore/xenstored_transaction.c | 14 +++--
- tools/xenstore/xenstored_transaction.h | 6 +-
- tools/xenstore/xenstored_watch.c | 9 +--
- tools/xenstore/xenstored_watch.h | 6 +-
- 9 files changed, 118 insertions(+), 77 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_control.c b/tools/xenstore/xenstored_control.c
-index 1031a81c3874..d0350c6ad861 100644
---- a/tools/xenstore/xenstored_control.c
-+++ b/tools/xenstore/xenstored_control.c
-@@ -155,7 +155,7 @@ bool lu_is_pending(void)
-
- struct cmd_s {
- char *cmd;
-- int (*func)(void *, struct connection *, char **, int);
-+ int (*func)(const void *, struct connection *, char **, int);
- char *pars;
- /*
- * max_pars can be used to limit the size of the parameter vector,
-@@ -167,7 +167,7 @@ struct cmd_s {
- unsigned int max_pars;
- };
-
--static int do_control_check(void *ctx, struct connection *conn,
-+static int do_control_check(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- if (num)
-@@ -179,7 +179,7 @@ static int do_control_check(void *ctx, struct connection *conn,
- return 0;
- }
-
--static int do_control_log(void *ctx, struct connection *conn,
-+static int do_control_log(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- if (num != 1)
-@@ -281,7 +281,7 @@ static int quota_get(const void *ctx, struct connection *conn,
- return domain_get_quota(ctx, conn, atoi(vec[0]));
- }
-
--static int do_control_quota(void *ctx, struct connection *conn,
-+static int do_control_quota(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- if (num == 0)
-@@ -293,7 +293,7 @@ static int do_control_quota(void *ctx, struct connection *conn,
- return quota_get(ctx, conn, vec, num);
- }
-
--static int do_control_quota_s(void *ctx, struct connection *conn,
-+static int do_control_quota_s(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- if (num == 0)
-@@ -306,7 +306,7 @@ static int do_control_quota_s(void *ctx, struct connection *conn,
- }
-
- #ifdef __MINIOS__
--static int do_control_memreport(void *ctx, struct connection *conn,
-+static int do_control_memreport(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- if (num)
-@@ -318,7 +318,7 @@ static int do_control_memreport(void *ctx, struct connection *conn,
- return 0;
- }
- #else
--static int do_control_logfile(void *ctx, struct connection *conn,
-+static int do_control_logfile(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- if (num != 1)
-@@ -333,7 +333,7 @@ static int do_control_logfile(void *ctx, struct connection *conn,
- return 0;
- }
-
--static int do_control_memreport(void *ctx, struct connection *conn,
-+static int do_control_memreport(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- FILE *fp;
-@@ -373,7 +373,7 @@ static int do_control_memreport(void *ctx, struct connection *conn,
- }
- #endif
-
--static int do_control_print(void *ctx, struct connection *conn,
-+static int do_control_print(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- if (num != 1)
-@@ -875,7 +875,7 @@ static const char *lu_start(const void *ctx, struct connection *conn,
- return NULL;
- }
-
--static int do_control_lu(void *ctx, struct connection *conn,
-+static int do_control_lu(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- const char *ret = NULL;
-@@ -922,7 +922,7 @@ static int do_control_lu(void *ctx, struct connection *conn,
- }
- #endif
-
--static int do_control_help(void *, struct connection *, char **, int);
-+static int do_control_help(const void *, struct connection *, char **, int);
-
- static struct cmd_s cmds[] = {
- { "check", do_control_check, "" },
-@@ -961,7 +961,7 @@ static struct cmd_s cmds[] = {
- { "help", do_control_help, "" },
- };
-
--static int do_control_help(void *ctx, struct connection *conn,
-+static int do_control_help(const void *ctx, struct connection *conn,
- char **vec, int num)
- {
- int cmd, len = 0;
-@@ -997,7 +997,8 @@ static int do_control_help(void *ctx, struct connection *conn,
- return 0;
- }
-
--int do_control(struct connection *conn, struct buffered_data *in)
-+int do_control(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- unsigned int cmd, num, off;
- char **vec = NULL;
-@@ -1017,11 +1018,11 @@ int do_control(struct connection *conn, struct buffered_data *in)
- num = xs_count_strings(in->buffer, in->used);
- if (cmds[cmd].max_pars)
- num = min(num, cmds[cmd].max_pars);
-- vec = talloc_array(in, char *, num);
-+ vec = talloc_array(ctx, char *, num);
- if (!vec)
- return ENOMEM;
- if (get_strings(in, vec, num) < num)
- return EIO;
-
-- return cmds[cmd].func(in, conn, vec + 1, num - 1);
-+ return cmds[cmd].func(ctx, conn, vec + 1, num - 1);
- }
-diff --git a/tools/xenstore/xenstored_control.h b/tools/xenstore/xenstored_control.h
-index 98b6fbcea2b1..a8cb76559ba1 100644
---- a/tools/xenstore/xenstored_control.h
-+++ b/tools/xenstore/xenstored_control.h
-@@ -16,7 +16,8 @@
- along with this program; If not, see <http://www.gnu.org/licenses/>.
- */
-
--int do_control(struct connection *conn, struct buffered_data *in);
-+int do_control(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
- void lu_read_state(void);
-
- struct connection *lu_get_connection(void);
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 16504de42017..411cc0e44714 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1248,11 +1248,13 @@ static struct node *get_node_canonicalized(struct connection *conn,
- return get_node(conn, ctx, *canonical_name, perm);
- }
-
--static int send_directory(struct connection *conn, struct buffered_data *in)
-+static int send_directory(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct node *node;
-
-- node = get_node_canonicalized(conn, in, onearg(in), NULL, XS_PERM_READ);
-+ node = get_node_canonicalized(conn, ctx, onearg(in), NULL,
-+ XS_PERM_READ);
- if (!node)
- return errno;
-
-@@ -1261,7 +1263,7 @@ static int send_directory(struct connection *conn, struct buffered_data *in)
- return 0;
- }
-
--static int send_directory_part(struct connection *conn,
-+static int send_directory_part(const void *ctx, struct connection *conn,
- struct buffered_data *in)
- {
- unsigned int off, len, maxlen, genlen;
-@@ -1273,7 +1275,8 @@ static int send_directory_part(struct connection *conn,
- return EINVAL;
-
- /* First arg is node name. */
-- node = get_node_canonicalized(conn, in, in->buffer, NULL, XS_PERM_READ);
-+ node = get_node_canonicalized(conn, ctx, in->buffer, NULL,
-+ XS_PERM_READ);
- if (!node)
- return errno;
-
-@@ -1300,7 +1303,7 @@ static int send_directory_part(struct connection *conn,
- break;
- }
-
-- data = talloc_array(in, char, genlen + len + 1);
-+ data = talloc_array(ctx, char, genlen + len + 1);
- if (!data)
- return ENOMEM;
-
-@@ -1316,11 +1319,13 @@ static int send_directory_part(struct connection *conn,
- return 0;
- }
-
--static int do_read(struct connection *conn, struct buffered_data *in)
-+static int do_read(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct node *node;
-
-- node = get_node_canonicalized(conn, in, onearg(in), NULL, XS_PERM_READ);
-+ node = get_node_canonicalized(conn, ctx, onearg(in), NULL,
-+ XS_PERM_READ);
- if (!node)
- return errno;
-
-@@ -1510,7 +1515,8 @@ err:
- }
-
- /* path, data... */
--static int do_write(struct connection *conn, struct buffered_data *in)
-+static int do_write(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- unsigned int offset, datalen;
- struct node *node;
-@@ -1524,12 +1530,12 @@ static int do_write(struct connection *conn, struct buffered_data *in)
- offset = strlen(vec[0]) + 1;
- datalen = in->used - offset;
-
-- node = get_node_canonicalized(conn, in, vec[0], &name, XS_PERM_WRITE);
-+ node = get_node_canonicalized(conn, ctx, vec[0], &name, XS_PERM_WRITE);
- if (!node) {
- /* No permissions, invalid input? */
- if (errno != ENOENT)
- return errno;
-- node = create_node(conn, in, name, in->buffer + offset,
-+ node = create_node(conn, ctx, name, in->buffer + offset,
- datalen);
- if (!node)
- return errno;
-@@ -1540,18 +1546,19 @@ static int do_write(struct connection *conn, struct buffered_data *in)
- return errno;
- }
-
-- fire_watches(conn, in, name, node, false, NULL);
-+ fire_watches(conn, ctx, name, node, false, NULL);
- send_ack(conn, XS_WRITE);
-
- return 0;
- }
-
--static int do_mkdir(struct connection *conn, struct buffered_data *in)
-+static int do_mkdir(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct node *node;
- char *name;
-
-- node = get_node_canonicalized(conn, in, onearg(in), &name,
-+ node = get_node_canonicalized(conn, ctx, onearg(in), &name,
- XS_PERM_WRITE);
-
- /* If it already exists, fine. */
-@@ -1561,10 +1568,10 @@ static int do_mkdir(struct connection *conn, struct buffered_data *in)
- return errno;
- if (!name)
- return ENOMEM;
-- node = create_node(conn, in, name, NULL, 0);
-+ node = create_node(conn, ctx, name, NULL, 0);
- if (!node)
- return errno;
-- fire_watches(conn, in, name, node, false, NULL);
-+ fire_watches(conn, ctx, name, node, false, NULL);
- }
- send_ack(conn, XS_MKDIR);
-
-@@ -1662,24 +1669,25 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
- }
-
-
--static int do_rm(struct connection *conn, struct buffered_data *in)
-+static int do_rm(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct node *node;
- int ret;
- char *name;
- char *parentname;
-
-- node = get_node_canonicalized(conn, in, onearg(in), &name,
-+ node = get_node_canonicalized(conn, ctx, onearg(in), &name,
- XS_PERM_WRITE);
- if (!node) {
- /* Didn't exist already? Fine, if parent exists. */
- if (errno == ENOENT) {
- if (!name)
- return ENOMEM;
-- parentname = get_parent(in, name);
-+ parentname = get_parent(ctx, name);
- if (!parentname)
- return errno;
-- node = read_node(conn, in, parentname);
-+ node = read_node(conn, ctx, parentname);
- if (node) {
- send_ack(conn, XS_RM);
- return 0;
-@@ -1694,7 +1702,7 @@ static int do_rm(struct connection *conn, struct buffered_data *in)
- if (streq(name, "/"))
- return EINVAL;
-
-- ret = _rm(conn, in, node, name);
-+ ret = _rm(conn, ctx, node, name);
- if (ret)
- return ret;
-
-@@ -1704,13 +1712,15 @@ static int do_rm(struct connection *conn, struct buffered_data *in)
- }
-
-
--static int do_get_perms(struct connection *conn, struct buffered_data *in)
-+static int do_get_perms(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct node *node;
- char *strings;
- unsigned int len;
-
-- node = get_node_canonicalized(conn, in, onearg(in), NULL, XS_PERM_READ);
-+ node = get_node_canonicalized(conn, ctx, onearg(in), NULL,
-+ XS_PERM_READ);
- if (!node)
- return errno;
-
-@@ -1723,7 +1733,8 @@ static int do_get_perms(struct connection *conn, struct buffered_data *in)
- return 0;
- }
-
--static int do_set_perms(struct connection *conn, struct buffered_data *in)
-+static int do_set_perms(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct node_perms perms, old_perms;
- char *name, *permstr;
-@@ -1740,7 +1751,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
-
- permstr = in->buffer + strlen(in->buffer) + 1;
-
-- perms.p = talloc_array(in, struct xs_permissions, perms.num);
-+ perms.p = talloc_array(ctx, struct xs_permissions, perms.num);
- if (!perms.p)
- return ENOMEM;
- if (!xs_strings_to_perms(perms.p, perms.num, permstr))
-@@ -1755,7 +1766,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
- }
-
- /* We must own node to do this (tools can do this too). */
-- node = get_node_canonicalized(conn, in, in->buffer, &name,
-+ node = get_node_canonicalized(conn, ctx, in->buffer, &name,
- XS_PERM_WRITE | XS_PERM_OWNER);
- if (!node)
- return errno;
-@@ -1790,7 +1801,7 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
- return errno;
- }
-
-- fire_watches(conn, in, name, node, false, &old_perms);
-+ fire_watches(conn, ctx, name, node, false, &old_perms);
- send_ack(conn, XS_SET_PERMS);
-
- return 0;
-@@ -1798,7 +1809,8 @@ static int do_set_perms(struct connection *conn, struct buffered_data *in)
-
- static struct {
- const char *str;
-- int (*func)(struct connection *conn, struct buffered_data *in);
-+ int (*func)(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
- unsigned int flags;
- #define XS_FLAG_NOTID (1U << 0) /* Ignore transaction id. */
- #define XS_FLAG_PRIV (1U << 1) /* Privileged domain only. */
-@@ -1874,6 +1886,7 @@ static void process_message(struct connection *conn, struct buffered_data *in)
- struct transaction *trans;
- enum xsd_sockmsg_type type = in->hdr.msg.type;
- int ret;
-+ void *ctx;
-
- /* At least send_error() and send_reply() expects conn->in == in */
- assert(conn->in == in);
-@@ -1898,10 +1911,17 @@ static void process_message(struct connection *conn, struct buffered_data *in)
- return;
- }
-
-+ ctx = talloc_new(NULL);
-+ if (!ctx) {
-+ send_error(conn, ENOMEM);
-+ return;
-+ }
-+
- assert(conn->transaction == NULL);
- conn->transaction = trans;
-
-- ret = wire_funcs[type].func(conn, in);
-+ ret = wire_funcs[type].func(ctx, conn, in);
-+ talloc_free(ctx);
- if (ret)
- send_error(conn, ret);
-
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index e7c6886ccf47..fb732d0a14c3 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -330,7 +330,7 @@ bool domain_is_unprivileged(struct connection *conn)
- domid_is_unprivileged(conn->domain->domid);
- }
-
--static char *talloc_domain_path(void *context, unsigned int domid)
-+static char *talloc_domain_path(const void *context, unsigned int domid)
- {
- return talloc_asprintf(context, "/local/domain/%u", domid);
- }
-@@ -534,7 +534,8 @@ static struct domain *introduce_domain(const void *ctx,
- }
-
- /* domid, gfn, evtchn, path */
--int do_introduce(struct connection *conn, struct buffered_data *in)
-+int do_introduce(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct domain *domain;
- char *vec[3];
-@@ -552,7 +553,7 @@ int do_introduce(struct connection *conn, struct buffered_data *in)
- if (port <= 0)
- return EINVAL;
-
-- domain = introduce_domain(in, domid, port, false);
-+ domain = introduce_domain(ctx, domid, port, false);
- if (!domain)
- return errno;
-
-@@ -575,7 +576,8 @@ static struct domain *find_connected_domain(unsigned int domid)
- return domain;
- }
-
--int do_set_target(struct connection *conn, struct buffered_data *in)
-+int do_set_target(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- char *vec[2];
- unsigned int domid, tdomid;
-@@ -619,7 +621,8 @@ static struct domain *onearg_domain(struct connection *conn,
- }
-
- /* domid */
--int do_release(struct connection *conn, struct buffered_data *in)
-+int do_release(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct domain *domain;
-
-@@ -634,7 +637,8 @@ int do_release(struct connection *conn, struct buffered_data *in)
- return 0;
- }
-
--int do_resume(struct connection *conn, struct buffered_data *in)
-+int do_resume(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct domain *domain;
-
-@@ -649,7 +653,8 @@ int do_resume(struct connection *conn, struct buffered_data *in)
- return 0;
- }
-
--int do_get_domain_path(struct connection *conn, struct buffered_data *in)
-+int do_get_domain_path(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- char *path;
- const char *domid_str = onearg(in);
-@@ -657,18 +662,17 @@ int do_get_domain_path(struct connection *conn, struct buffered_data *in)
- if (!domid_str)
- return EINVAL;
-
-- path = talloc_domain_path(conn, atoi(domid_str));
-+ path = talloc_domain_path(ctx, atoi(domid_str));
- if (!path)
- return errno;
-
- send_reply(conn, XS_GET_DOMAIN_PATH, path, strlen(path) + 1);
-
-- talloc_free(path);
--
- return 0;
- }
-
--int do_is_domain_introduced(struct connection *conn, struct buffered_data *in)
-+int do_is_domain_introduced(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- int result;
- unsigned int domid;
-@@ -689,7 +693,8 @@ int do_is_domain_introduced(struct connection *conn, struct buffered_data *in)
- }
-
- /* Allow guest to reset all watches */
--int do_reset_watches(struct connection *conn, struct buffered_data *in)
-+int do_reset_watches(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- conn_delete_all_watches(conn);
- conn_delete_all_transactions(conn);
-diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
-index 904faa923afb..b9e152890149 100644
---- a/tools/xenstore/xenstored_domain.h
-+++ b/tools/xenstore/xenstored_domain.h
-@@ -24,25 +24,32 @@ void handle_event(void);
- void check_domains(void);
-
- /* domid, mfn, eventchn, path */
--int do_introduce(struct connection *conn, struct buffered_data *in);
-+int do_introduce(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- /* domid */
--int do_is_domain_introduced(struct connection *conn, struct buffered_data *in);
-+int do_is_domain_introduced(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- /* domid */
--int do_release(struct connection *conn, struct buffered_data *in);
-+int do_release(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- /* domid */
--int do_resume(struct connection *conn, struct buffered_data *in);
-+int do_resume(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- /* domid, target */
--int do_set_target(struct connection *conn, struct buffered_data *in);
-+int do_set_target(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- /* domid */
--int do_get_domain_path(struct connection *conn, struct buffered_data *in);
-+int do_get_domain_path(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- /* Allow guest to reset all watches */
--int do_reset_watches(struct connection *conn, struct buffered_data *in);
-+int do_reset_watches(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- void domain_init(int evtfd);
- void dom0_init(void);
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index 28774813de83..3e3eb47326cc 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -481,7 +481,8 @@ struct transaction *transaction_lookup(struct connection *conn, uint32_t id)
- return ERR_PTR(-ENOENT);
- }
-
--int do_transaction_start(struct connection *conn, struct buffered_data *in)
-+int do_transaction_start(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct transaction *trans, *exists;
- char id_str[20];
-@@ -494,8 +495,8 @@ int do_transaction_start(struct connection *conn, struct buffered_data *in)
- conn->transaction_started > quota_max_transaction)
- return ENOSPC;
-
-- /* Attach transaction to input for autofree until it's complete */
-- trans = talloc_zero(in, struct transaction);
-+ /* Attach transaction to ctx for autofree until it's complete */
-+ trans = talloc_zero(ctx, struct transaction);
- if (!trans)
- return ENOMEM;
-
-@@ -544,7 +545,8 @@ static int transaction_fix_domains(struct transaction *trans, bool update)
- return 0;
- }
-
--int do_transaction_end(struct connection *conn, struct buffered_data *in)
-+int do_transaction_end(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- const char *arg = onearg(in);
- struct transaction *trans;
-@@ -562,8 +564,8 @@ int do_transaction_end(struct connection *conn, struct buffered_data *in)
- if (!conn->transaction_started)
- conn->ta_start_time = 0;
-
-- /* Attach transaction to in for auto-cleanup */
-- talloc_steal(in, trans);
-+ /* Attach transaction to ctx for auto-cleanup */
-+ talloc_steal(ctx, trans);
-
- if (streq(arg, "T")) {
- if (trans->fail)
-diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
-index e3cbd6b23095..39d7f81c5127 100644
---- a/tools/xenstore/xenstored_transaction.h
-+++ b/tools/xenstore/xenstored_transaction.h
-@@ -29,8 +29,10 @@ struct transaction;
-
- extern uint64_t generation;
-
--int do_transaction_start(struct connection *conn, struct buffered_data *node);
--int do_transaction_end(struct connection *conn, struct buffered_data *in);
-+int do_transaction_start(const void *ctx, struct connection *conn,
-+ struct buffered_data *node);
-+int do_transaction_end(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- struct transaction *transaction_lookup(struct connection *conn, uint32_t id);
-
-diff --git a/tools/xenstore/xenstored_watch.c b/tools/xenstore/xenstored_watch.c
-index 85362bcce314..316c08b7f754 100644
---- a/tools/xenstore/xenstored_watch.c
-+++ b/tools/xenstore/xenstored_watch.c
-@@ -243,7 +243,7 @@ static struct watch *add_watch(struct connection *conn, char *path, char *token,
- return NULL;
- }
-
--int do_watch(struct connection *conn, struct buffered_data *in)
-+int do_watch(const void *ctx, struct connection *conn, struct buffered_data *in)
- {
- struct watch *watch;
- char *vec[2];
-@@ -252,7 +252,7 @@ int do_watch(struct connection *conn, struct buffered_data *in)
- if (get_strings(in, vec, ARRAY_SIZE(vec)) != ARRAY_SIZE(vec))
- return EINVAL;
-
-- errno = check_watch_path(conn, in, &(vec[0]), &relative);
-+ errno = check_watch_path(conn, ctx, &(vec[0]), &relative);
- if (errno)
- return errno;
-
-@@ -283,7 +283,8 @@ int do_watch(struct connection *conn, struct buffered_data *in)
- return 0;
- }
-
--int do_unwatch(struct connection *conn, struct buffered_data *in)
-+int do_unwatch(const void *ctx, struct connection *conn,
-+ struct buffered_data *in)
- {
- struct watch *watch;
- char *node, *vec[2];
-@@ -291,7 +292,7 @@ int do_unwatch(struct connection *conn, struct buffered_data *in)
- if (get_strings(in, vec, ARRAY_SIZE(vec)) != ARRAY_SIZE(vec))
- return EINVAL;
-
-- node = canonicalize(conn, in, vec[0]);
-+ node = canonicalize(conn, ctx, vec[0]);
- if (!node)
- return ENOMEM;
- list_for_each_entry(watch, &conn->watches, list) {
-diff --git a/tools/xenstore/xenstored_watch.h b/tools/xenstore/xenstored_watch.h
-index 0e693f0839cd..091890edca96 100644
---- a/tools/xenstore/xenstored_watch.h
-+++ b/tools/xenstore/xenstored_watch.h
-@@ -21,8 +21,10 @@
-
- #include "xenstored_core.h"
-
--int do_watch(struct connection *conn, struct buffered_data *in);
--int do_unwatch(struct connection *conn, struct buffered_data *in);
-+int do_watch(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-+int do_unwatch(const void *ctx, struct connection *conn,
-+ struct buffered_data *in);
-
- /* Fire all watches: !exact means all the children are affected (ie. rm). */
- void fire_watches(struct connection *conn, const void *tmp, const char *name,
---
-2.37.4
-
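A side note on the hunk just above: process_message() now creates a short-lived
talloc context, hands it to the wire_funcs handler and frees it right after the
handler returns, so all temporary allocations made on behalf of one request
disappear together. A minimal stand-alone sketch of that pattern follows; it
uses the real libtalloc API, but handle_request() is an invented stand-in for
the actual handlers.

/* Sketch of the per-request temporary allocation context pattern.
 * Build with: cc sketch.c -ltalloc
 * handle_request() is a hypothetical stand-in for the wire_funcs handlers.
 */
#include <errno.h>
#include <stdio.h>
#include <talloc.h>

static int handle_request(const void *ctx, const char *req)
{
    /* Temporary allocations hang off ctx and vanish with it. */
    char *scratch = talloc_asprintf(ctx, "processed: %s", req);

    if (!scratch)
        return ENOMEM;
    puts(scratch);
    return 0;
}

int main(void)
{
    void *ctx = talloc_new(NULL);   /* one context per request */
    int ret;

    if (!ctx)
        return 1;
    ret = handle_request(ctx, "example");
    talloc_free(ctx);               /* releases scratch as well */
    return ret;
}
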
diff --git a/0071-tools-xenstore-fix-checking-node-permissions.patch b/0071-tools-xenstore-fix-checking-node-permissions.patch
deleted file mode 100644
index 7cfb08b..0000000
--- a/0071-tools-xenstore-fix-checking-node-permissions.patch
+++ /dev/null
@@ -1,143 +0,0 @@
-From 036fa8717b316a10b67ea8cf4d5dd200ac2b29af Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:10 +0200
-Subject: [PATCH 71/87] tools/xenstore: fix checking node permissions
-
-Today chk_domain_generation() is used to check whether a node
-permission entry is still valid or whether it refers to a domain that
-no longer exists. This is done by comparing the node's and the
-domain's generation counts.
-
-If no struct domain exists for a checked domain, but the domain itself
-is valid, chk_domain_generation() assumes it is being called for the
-first node created for a new domain and returns success.
-
-This can be wrong if the checked permission relates to an old domain
-which has just been replaced by a new domain using the same domid.
-
-Fix that by letting chk_domain_generation() fail if no struct domain
-is found. To cover the case of the first node for a new domain,
-allocate the needed struct domain explicitly when processing the
-related SET_PERMS command. If a referenced domain doesn't exist, flag
-the related permission to be ignored right away.
-
-This is XSA-417 / CVE-2022-42320.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit ab128218225d3542596ca3a02aee80d55494bef8)
----
- tools/xenstore/xenstored_core.c | 5 +++++
- tools/xenstore/xenstored_domain.c | 37 +++++++++++++++++++++----------
- tools/xenstore/xenstored_domain.h | 1 +
- 3 files changed, 31 insertions(+), 12 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 411cc0e44714..c676ee4e4e4f 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1757,6 +1757,11 @@ static int do_set_perms(const void *ctx, struct connection *conn,
- if (!xs_strings_to_perms(perms.p, perms.num, permstr))
- return errno;
-
-+ if (domain_alloc_permrefs(&perms) < 0)
-+ return ENOMEM;
-+ if (perms.p[0].perms & XS_PERM_IGNORE)
-+ return ENOENT;
-+
- /* First arg is node name. */
- if (strstarts(in->buffer, "@")) {
- if (set_perms_special(conn, in->buffer, &perms))
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index fb732d0a14c3..e2f1b09c6037 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -875,7 +875,6 @@ int domain_entry_inc(struct connection *conn, struct node *node)
- * count (used for testing whether a node permission is older than a domain).
- *
- * Return values:
-- * -1: error
- * 0: domain has higher generation count (it is younger than a node with the
- * given count), or domain isn't existing any longer
- * 1: domain is older than the node
-@@ -883,20 +882,38 @@ int domain_entry_inc(struct connection *conn, struct node *node)
- static int chk_domain_generation(unsigned int domid, uint64_t gen)
- {
- struct domain *d;
-- xc_dominfo_t dominfo;
-
- if (!xc_handle && domid == 0)
- return 1;
-
- d = find_domain_struct(domid);
-- if (d)
-- return (d->generation <= gen) ? 1 : 0;
-
-- if (!get_domain_info(domid, &dominfo))
-- return 0;
-+ return (d && d->generation <= gen) ? 1 : 0;
-+}
-
-- d = alloc_domain(NULL, domid);
-- return d ? 1 : -1;
-+/*
-+ * Allocate all missing struct domain referenced by a permission set.
-+ * Any permission entries for not existing domains will be marked to be
-+ * ignored.
-+ */
-+int domain_alloc_permrefs(struct node_perms *perms)
-+{
-+ unsigned int i, domid;
-+ struct domain *d;
-+ xc_dominfo_t dominfo;
-+
-+ for (i = 0; i < perms->num; i++) {
-+ domid = perms->p[i].id;
-+ d = find_domain_struct(domid);
-+ if (!d) {
-+ if (!get_domain_info(domid, &dominfo))
-+ perms->p[i].perms |= XS_PERM_IGNORE;
-+ else if (!alloc_domain(NULL, domid))
-+ return ENOMEM;
-+ }
-+ }
-+
-+ return 0;
- }
-
- /*
-@@ -909,8 +926,6 @@ int domain_adjust_node_perms(struct connection *conn, struct node *node)
- int ret;
-
- ret = chk_domain_generation(node->perms.p[0].id, node->generation);
-- if (ret < 0)
-- return errno;
-
- /* If the owner doesn't exist any longer give it to priv domain. */
- if (!ret) {
-@@ -927,8 +942,6 @@ int domain_adjust_node_perms(struct connection *conn, struct node *node)
- continue;
- ret = chk_domain_generation(node->perms.p[i].id,
- node->generation);
-- if (ret < 0)
-- return errno;
- if (!ret)
- node->perms.p[i].perms |= XS_PERM_IGNORE;
- }
-diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
-index b9e152890149..40fe5f690900 100644
---- a/tools/xenstore/xenstored_domain.h
-+++ b/tools/xenstore/xenstored_domain.h
-@@ -62,6 +62,7 @@ bool domain_is_unprivileged(struct connection *conn);
-
- /* Remove node permissions for no longer existing domains. */
- int domain_adjust_node_perms(struct connection *conn, struct node *node);
-+int domain_alloc_permrefs(struct node_perms *perms);
-
- /* Quota manipulation */
- int domain_entry_inc(struct connection *conn, struct node *);
---
-2.37.4
-
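For readers not familiar with the generation-count scheme the patch above
tightens, here is a self-contained sketch of the check it implements: a
permission entry is only trusted when a live domain record exists and that
record is not newer than the node. struct dom and the lookup table are
simplified stand-ins, not the xenstored data structures.

/* Self-contained illustration of the generation-count staleness test.
 * struct dom and the fixed table are simplified stand-ins for the
 * xenstored structures; they are not the real data model.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct dom {
    unsigned int domid;
    uint64_t generation;    /* when this domain was (re)introduced */
};

/* Tiny fixed table standing in for find_domain_struct(). */
static const struct dom doms[] = {
    { .domid = 1, .generation = 5 },
    { .domid = 7, .generation = 42 },
};

static const struct dom *find_dom(unsigned int domid)
{
    for (unsigned int i = 0; i < sizeof(doms) / sizeof(doms[0]); i++)
        if (doms[i].domid == domid)
            return &doms[i];
    return NULL;    /* no struct domain -> treat the entry as stale */
}

/* Valid only if the domain record exists and is not newer than the
 * node's generation count. */
static bool perm_entry_valid(unsigned int domid, uint64_t node_gen)
{
    const struct dom *d = find_dom(domid);

    return d && d->generation <= node_gen;
}

int main(void)
{
    printf("dom 1, node gen 10: %d\n", perm_entry_valid(1, 10)); /* 1 */
    printf("dom 1, node gen  2: %d\n", perm_entry_valid(1, 2));  /* 0 */
    printf("dom 9, node gen 10: %d\n", perm_entry_valid(9, 10)); /* 0 */
    return 0;
}
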
diff --git a/0072-tools-xenstore-remove-recursion-from-construct_node.patch b/0072-tools-xenstore-remove-recursion-from-construct_node.patch
deleted file mode 100644
index 72aebfd..0000000
--- a/0072-tools-xenstore-remove-recursion-from-construct_node.patch
+++ /dev/null
@@ -1,125 +0,0 @@
-From 074b32e47174a30bb751f2e2c07628eb56117eb8 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:11 +0200
-Subject: [PATCH 72/87] tools/xenstore: remove recursion from construct_node()
-
-In order to reduce stack usage due to recursion, switch
-construct_node() to use a loop instead.
-
-This is part of XSA-418 / CVE-2022-42321.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit da8ee25d02a5447ba39a9800ee2a710ae1f54222)
----
- tools/xenstore/xenstored_core.c | 86 +++++++++++++++++++++------------
- 1 file changed, 55 insertions(+), 31 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index c676ee4e4e4f..3907c35643e9 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1377,45 +1377,69 @@ static int add_child(const void *ctx, struct node *parent, const char *name)
- static struct node *construct_node(struct connection *conn, const void *ctx,
- const char *name)
- {
-- struct node *parent, *node;
-- char *parentname = get_parent(ctx, name);
-+ const char **names = NULL;
-+ unsigned int levels = 0;
-+ struct node *node = NULL;
-+ struct node *parent = NULL;
-+ const char *parentname = talloc_strdup(ctx, name);
-
- if (!parentname)
- return NULL;
-
-- /* If parent doesn't exist, create it. */
-- parent = read_node(conn, parentname, parentname);
-- if (!parent && errno == ENOENT)
-- parent = construct_node(conn, ctx, parentname);
-- if (!parent)
-- return NULL;
-+ /* Walk the path up until an existing node is found. */
-+ while (!parent) {
-+ names = talloc_realloc(ctx, names, const char *, levels + 1);
-+ if (!names)
-+ goto nomem;
-
-- /* Add child to parent. */
-- if (add_child(ctx, parent, name))
-- goto nomem;
-+ /*
-+ * names[0] is the name of the node to construct initially,
-+ * names[1] is its parent, and so on.
-+ */
-+ names[levels] = parentname;
-+ parentname = get_parent(ctx, parentname);
-+ if (!parentname)
-+ return NULL;
-
-- /* Allocate node */
-- node = talloc(ctx, struct node);
-- if (!node)
-- goto nomem;
-- node->name = talloc_strdup(node, name);
-- if (!node->name)
-- goto nomem;
-+ /* Try to read parent node until we found an existing one. */
-+ parent = read_node(conn, ctx, parentname);
-+ if (!parent && (errno != ENOENT || !strcmp(parentname, "/")))
-+ return NULL;
-
-- /* Inherit permissions, except unprivileged domains own what they create */
-- node->perms.num = parent->perms.num;
-- node->perms.p = talloc_memdup(node, parent->perms.p,
-- node->perms.num * sizeof(*node->perms.p));
-- if (!node->perms.p)
-- goto nomem;
-- if (domain_is_unprivileged(conn))
-- node->perms.p[0].id = conn->id;
-+ levels++;
-+ }
-+
-+ /* Walk the path down again constructing the missing nodes. */
-+ for (; levels > 0; levels--) {
-+ /* Add child to parent. */
-+ if (add_child(ctx, parent, names[levels - 1]))
-+ goto nomem;
-+
-+ /* Allocate node */
-+ node = talloc(ctx, struct node);
-+ if (!node)
-+ goto nomem;
-+ node->name = talloc_steal(node, names[levels - 1]);
-+
-+ /* Inherit permissions, unpriv domains own what they create. */
-+ node->perms.num = parent->perms.num;
-+ node->perms.p = talloc_memdup(node, parent->perms.p,
-+ node->perms.num *
-+ sizeof(*node->perms.p));
-+ if (!node->perms.p)
-+ goto nomem;
-+ if (domain_is_unprivileged(conn))
-+ node->perms.p[0].id = conn->id;
-+
-+ /* No children, no data */
-+ node->children = node->data = NULL;
-+ node->childlen = node->datalen = 0;
-+ node->acc.memory = 0;
-+ node->parent = parent;
-+
-+ parent = node;
-+ }
-
-- /* No children, no data */
-- node->children = node->data = NULL;
-- node->childlen = node->datalen = 0;
-- node->acc.memory = 0;
-- node->parent = parent;
- return node;
-
- nomem:
---
-2.37.4
-
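The patch above replaces recursion with two loops: walk the path upwards until
an existing ancestor is found, remembering the missing names, then walk back
down creating them in order. A stripped-down sketch of that shape, using plain
fixed-size buffers instead of talloc and the real node store:

/* Two-phase, loop-based creation of missing path components.
 * exists() and create() are trivial stand-ins for the real store
 * operations; only the up-then-down iteration shape mirrors the patch.
 * Absolute paths are assumed.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEVELS 16

static bool exists(const char *path)
{
    /* Pretend only "/" and "/local" are already present. */
    return !strcmp(path, "/") || !strcmp(path, "/local");
}

static void create(const char *path)
{
    printf("creating %s\n", path);
}

static int construct(const char *name)
{
    char missing[MAX_LEVELS][64];
    unsigned int levels = 0;
    char cur[64];

    snprintf(cur, sizeof(cur), "%s", name);

    /* Phase 1: walk up until an existing ancestor is found. */
    while (!exists(cur)) {
        if (levels == MAX_LEVELS)
            return -1;
        snprintf(missing[levels++], 64, "%s", cur);
        char *slash = strrchr(cur, '/');
        if (slash == cur)
            strcpy(cur, "/");      /* parent of "/x" is "/" */
        else
            *slash = '\0';
    }

    /* Phase 2: walk back down, creating the missing nodes in order. */
    while (levels--)
        create(missing[levels]);

    return 0;
}

int main(void)
{
    return construct("/local/domain/3/data");
}
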
diff --git a/0073-tools-xenstore-don-t-let-remove_child_entry-call-cor.patch b/0073-tools-xenstore-don-t-let-remove_child_entry-call-cor.patch
deleted file mode 100644
index 3c01eb5..0000000
--- a/0073-tools-xenstore-don-t-let-remove_child_entry-call-cor.patch
+++ /dev/null
@@ -1,110 +0,0 @@
-From 32ff913afed898e6aef61626a58dc0bf5c6309ef Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:11 +0200
-Subject: [PATCH 73/87] tools/xenstore: don't let remove_child_entry() call
- corrupt()
-
-If write_node() returns an error, remove_child_entry() will call
-corrupt() today. This could result in endless recursion, as
-remove_child_entry() is in turn called by corrupt():
-
-corrupt()
- check_store()
- check_store_()
- remove_child_entry()
-
-Fix that by letting remove_child_entry() return an error instead and
-letting the caller decide what to do.
-
-This is part of XSA-418 / CVE-2022-42321.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 0c00c51f3bc8206c7f9cf87d014650157bee2bf4)
----
- tools/xenstore/xenstored_core.c | 36 ++++++++++++++++++---------------
- 1 file changed, 20 insertions(+), 16 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 3907c35643e9..f433a45dc217 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1608,15 +1608,15 @@ static void memdel(void *mem, unsigned off, unsigned len, unsigned total)
- memmove(mem + off, mem + off + len, total - off - len);
- }
-
--static void remove_child_entry(struct connection *conn, struct node *node,
-- size_t offset)
-+static int remove_child_entry(struct connection *conn, struct node *node,
-+ size_t offset)
- {
- size_t childlen = strlen(node->children + offset);
-
- memdel(node->children, offset, childlen + 1, node->childlen);
- node->childlen -= childlen + 1;
-- if (write_node(conn, node, true))
-- corrupt(conn, "Can't update parent node '%s'", node->name);
-+
-+ return write_node(conn, node, true);
- }
-
- static void delete_child(struct connection *conn,
-@@ -1626,7 +1626,9 @@ static void delete_child(struct connection *conn,
-
- for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
- if (streq(node->children+i, childname)) {
-- remove_child_entry(conn, node, i);
-+ if (remove_child_entry(conn, node, i))
-+ corrupt(conn, "Can't update parent node '%s'",
-+ node->name);
- return;
- }
- }
-@@ -2325,6 +2327,17 @@ int remember_string(struct hashtable *hash, const char *str)
- return hashtable_insert(hash, k, (void *)1);
- }
-
-+static int rm_child_entry(struct node *node, size_t off, size_t len)
-+{
-+ if (!recovery)
-+ return off;
-+
-+ if (remove_child_entry(NULL, node, off))
-+ log("check_store: child entry could not be removed from '%s'",
-+ node->name);
-+
-+ return off - len - 1;
-+}
-
- /**
- * A node has a children field that names the children of the node, separated
-@@ -2377,12 +2390,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
- if (hashtable_search(children, childname)) {
- log("check_store: '%s' is duplicated!",
- childname);
--
-- if (recovery) {
-- remove_child_entry(NULL, node,
-- i);
-- i -= childlen + 1;
-- }
-+ i = rm_child_entry(node, i, childlen);
- }
- else {
- if (!remember_string(children,
-@@ -2399,11 +2407,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
- } else if (errno != ENOMEM) {
- log("check_store: No child '%s' found!\n",
- childname);
--
-- if (recovery) {
-- remove_child_entry(NULL, node, i);
-- i -= childlen + 1;
-- }
-+ i = rm_child_entry(node, i, childlen);
- } else {
- log("check_store: ENOMEM");
- ret = ENOMEM;
---
-2.37.4
-
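The fix above is an instance of a general pattern for breaking mutual recursion
between a low-level helper and a recovery routine: the helper only reports the
failure, and each caller decides whether recovery is appropriate. A minimal
sketch with invented names (write_parent(), run_recovery()) rather than the
xenstored ones:

/* The helper reports failure; only the caller decides whether to start
 * the (potentially recursive) recovery path. Names are hypothetical.
 */
#include <stdio.h>

static int write_parent(const char *name)
{
    fprintf(stderr, "cannot write '%s'\n", name);
    return -1;                      /* report, don't recover here */
}

static void run_recovery(const char *why)
{
    fprintf(stderr, "starting store recovery: %s\n", why);
    /* Recovery itself may call write_parent() again, but since
     * write_parent() never calls run_recovery(), no cycle can form. */
}

int main(void)
{
    if (write_parent("/local/domain/0"))   /* the caller decides */
        run_recovery("parent update failed");
    return 0;
}
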
diff --git a/0074-tools-xenstore-add-generic-treewalk-function.patch b/0074-tools-xenstore-add-generic-treewalk-function.patch
deleted file mode 100644
index d84439c..0000000
--- a/0074-tools-xenstore-add-generic-treewalk-function.patch
+++ /dev/null
@@ -1,250 +0,0 @@
-From 01ab4910229696e51c59a80eb86d0fedeeccb54b Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:11 +0200
-Subject: [PATCH 74/87] tools/xenstore: add generic treewalk function
-
-Add a generic function to walk the complete node tree. It will start
-at "/" and descend recursively into each child, calling a function
-specified by the caller. Depending on the return value of the
-user-specified function, the walk is aborted, continued, or the
-current child is skipped by not descending into its children.
-
-This is part of XSA-418 / CVE-2022-42321.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 0d7c5d19bc27492360196e7dad2b227908564fff)
----
- tools/xenstore/xenstored_core.c | 143 +++++++++++++++++++++++++++++---
- tools/xenstore/xenstored_core.h | 40 +++++++++
- 2 files changed, 170 insertions(+), 13 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index f433a45dc217..2cda3ee375ab 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1838,6 +1838,135 @@ static int do_set_perms(const void *ctx, struct connection *conn,
- return 0;
- }
-
-+static char *child_name(const void *ctx, const char *s1, const char *s2)
-+{
-+ if (strcmp(s1, "/"))
-+ return talloc_asprintf(ctx, "%s/%s", s1, s2);
-+ return talloc_asprintf(ctx, "/%s", s2);
-+}
-+
-+static int rm_from_parent(struct connection *conn, struct node *parent,
-+ const char *name)
-+{
-+ size_t off;
-+
-+ if (!parent)
-+ return WALK_TREE_ERROR_STOP;
-+
-+ for (off = parent->childoff - 1; off && parent->children[off - 1];
-+ off--);
-+ if (remove_child_entry(conn, parent, off)) {
-+ log("treewalk: child entry could not be removed from '%s'",
-+ parent->name);
-+ return WALK_TREE_ERROR_STOP;
-+ }
-+ parent->childoff = off;
-+
-+ return WALK_TREE_OK;
-+}
-+
-+static int walk_call_func(const void *ctx, struct connection *conn,
-+ struct node *node, struct node *parent, void *arg,
-+ int (*func)(const void *ctx, struct connection *conn,
-+ struct node *node, void *arg))
-+{
-+ int ret;
-+
-+ if (!func)
-+ return WALK_TREE_OK;
-+
-+ ret = func(ctx, conn, node, arg);
-+ if (ret == WALK_TREE_RM_CHILDENTRY && parent)
-+ ret = rm_from_parent(conn, parent, node->name);
-+
-+ return ret;
-+}
-+
-+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
-+ struct walk_funcs *funcs, void *arg)
-+{
-+ int ret = 0;
-+ void *tmpctx;
-+ char *name;
-+ struct node *node = NULL;
-+ struct node *parent = NULL;
-+
-+ tmpctx = talloc_new(ctx);
-+ if (!tmpctx) {
-+ errno = ENOMEM;
-+ return WALK_TREE_ERROR_STOP;
-+ }
-+ name = talloc_strdup(tmpctx, root);
-+ if (!name) {
-+ errno = ENOMEM;
-+ talloc_free(tmpctx);
-+ return WALK_TREE_ERROR_STOP;
-+ }
-+
-+ /* Continue the walk until an error is returned. */
-+ while (ret >= 0) {
-+ /* node == NULL possible only for the initial loop iteration. */
-+ if (node) {
-+ /* Go one step up if ret or if last child finished. */
-+ if (ret || node->childoff >= node->childlen) {
-+ parent = node->parent;
-+ /* Call function AFTER processing a node. */
-+ ret = walk_call_func(ctx, conn, node, parent,
-+ arg, funcs->exit);
-+ /* Last node, so exit loop. */
-+ if (!parent)
-+ break;
-+ talloc_free(node);
-+ /* Continue with parent. */
-+ node = parent;
-+ continue;
-+ }
-+ /* Get next child of current node. */
-+ name = child_name(tmpctx, node->name,
-+ node->children + node->childoff);
-+ if (!name) {
-+ ret = WALK_TREE_ERROR_STOP;
-+ break;
-+ }
-+ /* Point to next child. */
-+ node->childoff += strlen(node->children +
-+ node->childoff) + 1;
-+ /* Descent into children. */
-+ parent = node;
-+ }
-+ /* Read next node (root node or next child). */
-+ node = read_node(conn, tmpctx, name);
-+ if (!node) {
-+ /* Child not found - should not happen! */
-+ /* ENOENT case can be handled by supplied function. */
-+ if (errno == ENOENT && funcs->enoent)
-+ ret = funcs->enoent(ctx, conn, parent, name,
-+ arg);
-+ else
-+ ret = WALK_TREE_ERROR_STOP;
-+ if (!parent)
-+ break;
-+ if (ret == WALK_TREE_RM_CHILDENTRY)
-+ ret = rm_from_parent(conn, parent, name);
-+ if (ret < 0)
-+ break;
-+ talloc_free(name);
-+ node = parent;
-+ continue;
-+ }
-+ talloc_free(name);
-+ node->parent = parent;
-+ node->childoff = 0;
-+ /* Call function BEFORE processing a node. */
-+ ret = walk_call_func(ctx, conn, node, parent, arg,
-+ funcs->enter);
-+ }
-+
-+ talloc_free(tmpctx);
-+
-+ return ret < 0 ? ret : WALK_TREE_OK;
-+}
-+
- static struct {
- const char *str;
- int (*func)(const void *ctx, struct connection *conn,
-@@ -2305,18 +2434,6 @@ static int keys_equal_fn(void *key1, void *key2)
- return 0 == strcmp((char *)key1, (char *)key2);
- }
-
--
--static char *child_name(const char *s1, const char *s2)
--{
-- if (strcmp(s1, "/")) {
-- return talloc_asprintf(NULL, "%s/%s", s1, s2);
-- }
-- else {
-- return talloc_asprintf(NULL, "/%s", s2);
-- }
--}
--
--
- int remember_string(struct hashtable *hash, const char *str)
- {
- char *k = malloc(strlen(str) + 1);
-@@ -2376,7 +2493,7 @@ static int check_store_(const char *name, struct hashtable *reachable)
- while (i < node->childlen && !ret) {
- struct node *childnode;
- size_t childlen = strlen(node->children + i);
-- char * childname = child_name(node->name,
-+ char * childname = child_name(NULL, node->name,
- node->children + i);
-
- if (!childname) {
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index bfd3fc1e9df3..2d9942171d92 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -202,6 +202,7 @@ struct node {
-
- /* Children, each nul-terminated. */
- unsigned int childlen;
-+ unsigned int childoff; /* Used by walk_node_tree() internally. */
- char *children;
-
- /* Allocation information for node currently in store. */
-@@ -338,6 +339,45 @@ void read_state_buffered_data(const void *ctx, struct connection *conn,
- const struct xs_state_connection *sc);
- void read_state_node(const void *ctx, const void *state);
-
-+/*
-+ * Walk the node tree below root calling funcs->enter() and funcs->exit() for
-+ * each node. funcs->enter() is being called when entering a node, so before
-+ * any of the children of the node is processed. funcs->exit() is being
-+ * called when leaving the node, so after all children have been processed.
-+ * funcs->enoent() is being called when a node isn't existing.
-+ * funcs->*() return values:
-+ * < 0: tree walk is stopped, walk_node_tree() returns funcs->*() return value
-+ * in case WALK_TREE_ERROR_STOP is returned, errno should be set
-+ * WALK_TREE_OK: tree walk is continuing
-+ * WALK_TREE_SKIP_CHILDREN: tree walk won't descend below current node, but
-+ * walk continues
-+ * WALK_TREE_RM_CHILDENTRY: Remove the child entry from its parent and write
-+ * the modified parent node back to the data base, implies to not descend
-+ * below the current node, but to continue the walk
-+ * funcs->*() is allowed to modify the node it is called for in the data base.
-+ * In case funcs->enter() is deleting the node, it must not return WALK_TREE_OK
-+ * in order to avoid descending into no longer existing children.
-+ */
-+/* Return values for funcs->*() and walk_node_tree(). */
-+#define WALK_TREE_SUCCESS_STOP -100 /* Stop walk early, no error. */
-+#define WALK_TREE_ERROR_STOP -1 /* Stop walk due to error. */
-+#define WALK_TREE_OK 0 /* No error. */
-+/* Return value for funcs->*() only. */
-+#define WALK_TREE_SKIP_CHILDREN 1 /* Don't recurse below current node. */
-+#define WALK_TREE_RM_CHILDENTRY 2 /* Remove child entry from parent. */
-+
-+struct walk_funcs {
-+ int (*enter)(const void *ctx, struct connection *conn,
-+ struct node *node, void *arg);
-+ int (*exit)(const void *ctx, struct connection *conn,
-+ struct node *node, void *arg);
-+ int (*enoent)(const void *ctx, struct connection *conn,
-+ struct node *parent, char *name, void *arg);
-+};
-+
-+int walk_node_tree(const void *ctx, struct connection *conn, const char *root,
-+ struct walk_funcs *funcs, void *arg);
-+
- #endif /* _XENSTORED_CORE_H */
-
- /*
---
-2.37.4
-
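To make the control flow of walk_node_tree() easier to picture, here is a
self-contained toy version of an iterative tree walk with enter/exit callbacks:
a single loop, a per-node child cursor (the counterpart of childoff), and no
call recursion. The tnode type and the callbacks are invented for the example
and are not the xenstored interfaces.

/* Toy iterative tree walk with enter/exit callbacks, in the spirit of
 * walk_node_tree(). The tnode type is invented for this example.
 */
#include <stdio.h>

struct tnode {
    const char *name;
    struct tnode *parent;        /* filled in by the walker */
    struct tnode **children;     /* NULL-terminated array, or NULL */
    unsigned int next_child;     /* walker's cursor, like childoff */
};

static void enter(struct tnode *n) { printf("enter %s\n", n->name); }
static void leave(struct tnode *n) { printf("exit  %s\n", n->name); }

static void walk(struct tnode *root)
{
    struct tnode *node = root;

    node->parent = NULL;
    node->next_child = 0;
    enter(node);

    while (node) {
        struct tnode *child = node->children
                              ? node->children[node->next_child] : NULL;

        if (!child) {                  /* all children done: go up */
            leave(node);
            node = node->parent;
            continue;
        }
        node->next_child++;            /* advance cursor, then descend */
        child->parent = node;
        child->next_child = 0;
        node = child;
        enter(node);
    }
}

int main(void)
{
    static struct tnode leaf1 = { .name = "/a/x" };
    static struct tnode leaf2 = { .name = "/a/y" };
    static struct tnode *achild[] = { &leaf1, &leaf2, NULL };
    static struct tnode a = { .name = "/a", .children = achild };
    static struct tnode *rootchild[] = { &a, NULL };
    static struct tnode root = { .name = "/", .children = rootchild };

    walk(&root);
    return 0;
}

The real function additionally reads each node from the data base on entry and
supports the WALK_TREE_* return codes plus an enoent() hook, but the looping
structure is the same.
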
diff --git a/0075-tools-xenstore-simplify-check_store.patch b/0075-tools-xenstore-simplify-check_store.patch
deleted file mode 100644
index 5d0348f..0000000
--- a/0075-tools-xenstore-simplify-check_store.patch
+++ /dev/null
@@ -1,114 +0,0 @@
-From c5a76df793c638423e1388528dc679a3e020a477 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:12 +0200
-Subject: [PATCH 75/87] tools/xenstore: simplify check_store()
-
-check_store() uses a hash table to store all node names it has found
-while walking the tree. Additionally, it uses another hash table for
-all children of a node to detect duplicate child names.
-
-Simplify that by dropping the second hash table, as the first one
-already holds all the needed information.
-
-This is part of XSA-418 / CVE-2022-42321.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 70f719f52a220bc5bc987e4dd28e14a7039a176b)
----
- tools/xenstore/xenstored_core.c | 47 +++++++++++----------------------
- 1 file changed, 15 insertions(+), 32 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 2cda3ee375ab..760f3c16c794 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -2477,50 +2477,34 @@ static int check_store_(const char *name, struct hashtable *reachable)
- if (node) {
- size_t i = 0;
-
-- struct hashtable * children =
-- create_hashtable(16, hash_from_key_fn, keys_equal_fn);
-- if (!children) {
-- log("check_store create table: ENOMEM");
-- return ENOMEM;
-- }
--
- if (!remember_string(reachable, name)) {
-- hashtable_destroy(children, 0);
- log("check_store: ENOMEM");
- return ENOMEM;
- }
-
- while (i < node->childlen && !ret) {
-- struct node *childnode;
-+ struct node *childnode = NULL;
- size_t childlen = strlen(node->children + i);
-- char * childname = child_name(NULL, node->name,
-- node->children + i);
-+ char *childname = child_name(NULL, node->name,
-+ node->children + i);
-
- if (!childname) {
- log("check_store: ENOMEM");
- ret = ENOMEM;
- break;
- }
-+
-+ if (hashtable_search(reachable, childname)) {
-+ log("check_store: '%s' is duplicated!",
-+ childname);
-+ i = rm_child_entry(node, i, childlen);
-+ goto next;
-+ }
-+
- childnode = read_node(NULL, childname, childname);
--
-+
- if (childnode) {
-- if (hashtable_search(children, childname)) {
-- log("check_store: '%s' is duplicated!",
-- childname);
-- i = rm_child_entry(node, i, childlen);
-- }
-- else {
-- if (!remember_string(children,
-- childname)) {
-- log("check_store: ENOMEM");
-- talloc_free(childnode);
-- talloc_free(childname);
-- ret = ENOMEM;
-- break;
-- }
-- ret = check_store_(childname,
-- reachable);
-- }
-+ ret = check_store_(childname, reachable);
- } else if (errno != ENOMEM) {
- log("check_store: No child '%s' found!\n",
- childname);
-@@ -2530,19 +2514,18 @@ static int check_store_(const char *name, struct hashtable *reachable)
- ret = ENOMEM;
- }
-
-+ next:
- talloc_free(childnode);
- talloc_free(childname);
- i += childlen + 1;
- }
-
-- hashtable_destroy(children, 0 /* Don't free values (they are
-- all (void *)1) */);
- talloc_free(node);
- } else if (errno != ENOMEM) {
- /* Impossible, because no database should ever be without the
- root, and otherwise, we've just checked in our caller
- (which made a recursive call to get here). */
--
-+
- log("check_store: No child '%s' found: impossible!", name);
- } else {
- log("check_store: ENOMEM");
---
-2.37.4
-
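The simplification rests on a simple observation: a single set keyed by full
node paths catches duplicated child entries just as well as a per-node set,
because a duplicated child produces the same full path twice. A small sketch,
with a trivial linear "set" and invented paths standing in for the real
hashtable and store:

/* One set of full paths is enough to spot duplicated child entries:
 * the duplicate yields the same "<parent>/<child>" string twice.
 * The linear "set" below is a stand-in for the real hashtable.
 */
#include <stdio.h>
#include <string.h>

#define MAX_SEEN 64

static const char *seen[MAX_SEEN];
static unsigned int nseen;

static int remember(const char *path)
{
    for (unsigned int i = 0; i < nseen; i++)
        if (!strcmp(seen[i], path))
            return 0;               /* already known: duplicate */
    if (nseen < MAX_SEEN)
        seen[nseen++] = path;
    return 1;
}

int main(void)
{
    /* Child paths of "/a" with a duplicated entry, as check_store()
     * might encounter them. */
    static const char *paths[] = { "/a/x", "/a/y", "/a/x" };

    for (unsigned int i = 0; i < 3; i++)
        if (!remember(paths[i]))
            printf("check: '%s' is duplicated!\n", paths[i]);
    return 0;
}
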
diff --git a/0076-tools-xenstore-use-treewalk-for-check_store.patch b/0076-tools-xenstore-use-treewalk-for-check_store.patch
deleted file mode 100644
index b965eb0..0000000
--- a/0076-tools-xenstore-use-treewalk-for-check_store.patch
+++ /dev/null
@@ -1,172 +0,0 @@
-From f5a4c26b2efc55a5267840fcb31f95c00cc25d10 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:12 +0200
-Subject: [PATCH 76/87] tools/xenstore: use treewalk for check_store()
-
-Instead of doing an open tree walk using call recursion, use
-walk_node_tree() when checking the store for inconsistencies.
-
-This will reduce code size and avoid many nesting levels of function
-calls which could potentially exhaust the stack.
-
-This is part of XSA-418 / CVE-2022-42321.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit a07cc0ec60612f414bedf2bafb26ec38d2602e95)
----
- tools/xenstore/xenstored_core.c | 109 +++++++++-----------------------
- 1 file changed, 30 insertions(+), 79 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 760f3c16c794..efdd1888fd78 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -2444,18 +2444,6 @@ int remember_string(struct hashtable *hash, const char *str)
- return hashtable_insert(hash, k, (void *)1);
- }
-
--static int rm_child_entry(struct node *node, size_t off, size_t len)
--{
-- if (!recovery)
-- return off;
--
-- if (remove_child_entry(NULL, node, off))
-- log("check_store: child entry could not be removed from '%s'",
-- node->name);
--
-- return off - len - 1;
--}
--
- /**
- * A node has a children field that names the children of the node, separated
- * by NULs. We check whether there are entries in there that are duplicated
-@@ -2469,70 +2457,29 @@ static int rm_child_entry(struct node *node, size_t off, size_t len)
- * As we go, we record each node in the given reachable hashtable. These
- * entries will be used later in clean_store.
- */
--static int check_store_(const char *name, struct hashtable *reachable)
-+static int check_store_step(const void *ctx, struct connection *conn,
-+ struct node *node, void *arg)
- {
-- struct node *node = read_node(NULL, name, name);
-- int ret = 0;
-+ struct hashtable *reachable = arg;
-
-- if (node) {
-- size_t i = 0;
--
-- if (!remember_string(reachable, name)) {
-- log("check_store: ENOMEM");
-- return ENOMEM;
-- }
--
-- while (i < node->childlen && !ret) {
-- struct node *childnode = NULL;
-- size_t childlen = strlen(node->children + i);
-- char *childname = child_name(NULL, node->name,
-- node->children + i);
--
-- if (!childname) {
-- log("check_store: ENOMEM");
-- ret = ENOMEM;
-- break;
-- }
--
-- if (hashtable_search(reachable, childname)) {
-- log("check_store: '%s' is duplicated!",
-- childname);
-- i = rm_child_entry(node, i, childlen);
-- goto next;
-- }
--
-- childnode = read_node(NULL, childname, childname);
--
-- if (childnode) {
-- ret = check_store_(childname, reachable);
-- } else if (errno != ENOMEM) {
-- log("check_store: No child '%s' found!\n",
-- childname);
-- i = rm_child_entry(node, i, childlen);
-- } else {
-- log("check_store: ENOMEM");
-- ret = ENOMEM;
-- }
--
-- next:
-- talloc_free(childnode);
-- talloc_free(childname);
-- i += childlen + 1;
-- }
--
-- talloc_free(node);
-- } else if (errno != ENOMEM) {
-- /* Impossible, because no database should ever be without the
-- root, and otherwise, we've just checked in our caller
-- (which made a recursive call to get here). */
--
-- log("check_store: No child '%s' found: impossible!", name);
-- } else {
-- log("check_store: ENOMEM");
-- ret = ENOMEM;
-+ if (hashtable_search(reachable, (void *)node->name)) {
-+ log("check_store: '%s' is duplicated!", node->name);
-+ return recovery ? WALK_TREE_RM_CHILDENTRY
-+ : WALK_TREE_SKIP_CHILDREN;
- }
-
-- return ret;
-+ if (!remember_string(reachable, node->name))
-+ return WALK_TREE_ERROR_STOP;
-+
-+ return WALK_TREE_OK;
-+}
-+
-+static int check_store_enoent(const void *ctx, struct connection *conn,
-+ struct node *parent, char *name, void *arg)
-+{
-+ log("check_store: node '%s' not found", name);
-+
-+ return recovery ? WALK_TREE_RM_CHILDENTRY : WALK_TREE_OK;
- }
-
-
-@@ -2581,24 +2528,28 @@ static void clean_store(struct hashtable *reachable)
-
- void check_store(void)
- {
-- char * root = talloc_strdup(NULL, "/");
-- struct hashtable * reachable =
-- create_hashtable(16, hash_from_key_fn, keys_equal_fn);
--
-+ struct hashtable *reachable;
-+ struct walk_funcs walkfuncs = {
-+ .enter = check_store_step,
-+ .enoent = check_store_enoent,
-+ };
-+
-+ reachable = create_hashtable(16, hash_from_key_fn, keys_equal_fn);
- if (!reachable) {
- log("check_store: ENOMEM");
- return;
- }
-
- log("Checking store ...");
-- if (!check_store_(root, reachable) &&
-- !check_transactions(reachable))
-+ if (walk_node_tree(NULL, NULL, "/", &walkfuncs, reachable)) {
-+ if (errno == ENOMEM)
-+ log("check_store: ENOMEM");
-+ } else if (!check_transactions(reachable))
- clean_store(reachable);
- log("Checking store complete.");
-
- hashtable_destroy(reachable, 0 /* Don't free values (they are all
- (void *)1) */);
-- talloc_free(root);
- }
-
-
---
-2.37.4
-
diff --git a/0077-tools-xenstore-use-treewalk-for-deleting-nodes.patch b/0077-tools-xenstore-use-treewalk-for-deleting-nodes.patch
deleted file mode 100644
index 6d80a4d..0000000
--- a/0077-tools-xenstore-use-treewalk-for-deleting-nodes.patch
+++ /dev/null
@@ -1,180 +0,0 @@
-From 1514de3a5f23aef451133367d8dc04a26b88052f Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:12 +0200
-Subject: [PATCH 77/87] tools/xenstore: use treewalk for deleting nodes
-
-Instead of doing an open tree walk using call recursion, use
-walk_node_tree() when deleting a sub-tree of nodes.
-
-This will reduce code size and avoid many nesting levels of function
-calls which could potentially exhaust the stack.
-
-This is part of XSA-418 / CVE-2022-42321.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit ea16962053a6849a6e7cada549ba7f8c586d85c6)
----
- tools/xenstore/xenstored_core.c | 99 ++++++++++++++-------------------
- 1 file changed, 43 insertions(+), 56 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index efdd1888fd78..58fb651542ec 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -1334,21 +1334,6 @@ static int do_read(const void *ctx, struct connection *conn,
- return 0;
- }
-
--static void delete_node_single(struct connection *conn, struct node *node)
--{
-- TDB_DATA key;
--
-- if (access_node(conn, node, NODE_ACCESS_DELETE, &key))
-- return;
--
-- if (do_tdb_delete(conn, &key, &node->acc) != 0) {
-- corrupt(conn, "Could not delete '%s'", node->name);
-- return;
-- }
--
-- domain_entry_dec(conn, node);
--}
--
- /* Must not be / */
- static char *basename(const char *name)
- {
-@@ -1619,69 +1604,59 @@ static int remove_child_entry(struct connection *conn, struct node *node,
- return write_node(conn, node, true);
- }
-
--static void delete_child(struct connection *conn,
-- struct node *node, const char *childname)
-+static int delete_child(struct connection *conn,
-+ struct node *node, const char *childname)
- {
- unsigned int i;
-
- for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
- if (streq(node->children+i, childname)) {
-- if (remove_child_entry(conn, node, i))
-- corrupt(conn, "Can't update parent node '%s'",
-- node->name);
-- return;
-+ errno = remove_child_entry(conn, node, i) ? EIO : 0;
-+ return errno;
- }
- }
- corrupt(conn, "Can't find child '%s' in %s", childname, node->name);
-+
-+ errno = EIO;
-+ return errno;
- }
-
--static int delete_node(struct connection *conn, const void *ctx,
-- struct node *parent, struct node *node, bool watch_exact)
-+static int delnode_sub(const void *ctx, struct connection *conn,
-+ struct node *node, void *arg)
- {
-- char *name;
-+ const char *root = arg;
-+ bool watch_exact;
-+ int ret;
-+ TDB_DATA key;
-
-- /* Delete children. */
-- while (node->childlen) {
-- struct node *child;
-+ /* Any error here will probably be repeated for all following calls. */
-+ ret = access_node(conn, node, NODE_ACCESS_DELETE, &key);
-+ if (ret > 0)
-+ return WALK_TREE_SUCCESS_STOP;
-
-- name = talloc_asprintf(node, "%s/%s", node->name,
-- node->children);
-- child = name ? read_node(conn, node, name) : NULL;
-- if (child) {
-- if (delete_node(conn, ctx, node, child, true))
-- return errno;
-- } else {
-- trace("delete_node: Error deleting child '%s/%s'!\n",
-- node->name, node->children);
-- /* Quit deleting. */
-- errno = ENOMEM;
-- return errno;
-- }
-- talloc_free(name);
-- }
-+ /* In case of error stop the walk. */
-+ if (!ret && do_tdb_delete(conn, &key, &node->acc))
-+ return WALK_TREE_SUCCESS_STOP;
-
- /*
- * Fire the watches now, when we can still see the node permissions.
- * This fine as we are single threaded and the next possible read will
- * be handled only after the node has been really removed.
-- */
-+ */
-+ watch_exact = strcmp(root, node->name);
- fire_watches(conn, ctx, node->name, node, watch_exact, NULL);
-- delete_node_single(conn, node);
-- delete_child(conn, parent, basename(node->name));
-- talloc_free(node);
-
-- return 0;
-+ domain_entry_dec(conn, node);
-+
-+ return WALK_TREE_RM_CHILDENTRY;
- }
-
--static int _rm(struct connection *conn, const void *ctx, struct node *node,
-- const char *name)
-+static int _rm(struct connection *conn, const void *ctx, const char *name)
- {
-- /*
-- * Deleting node by node, so the result is always consistent even in
-- * case of a failure.
-- */
- struct node *parent;
- char *parentname = get_parent(ctx, name);
-+ struct walk_funcs walkfuncs = { .exit = delnode_sub };
-+ int ret;
-
- if (!parentname)
- return errno;
-@@ -1689,9 +1664,21 @@ static int _rm(struct connection *conn, const void *ctx, struct node *node,
- parent = read_node(conn, ctx, parentname);
- if (!parent)
- return read_node_can_propagate_errno() ? errno : EINVAL;
-- node->parent = parent;
-
-- return delete_node(conn, ctx, parent, node, false);
-+ ret = walk_node_tree(ctx, conn, name, &walkfuncs, (void *)name);
-+ if (ret < 0) {
-+ if (ret == WALK_TREE_ERROR_STOP) {
-+ corrupt(conn, "error when deleting sub-nodes of %s\n",
-+ name);
-+ errno = EIO;
-+ }
-+ return errno;
-+ }
-+
-+ if (delete_child(conn, parent, basename(name)))
-+ return errno;
-+
-+ return 0;
- }
-
-
-@@ -1728,7 +1715,7 @@ static int do_rm(const void *ctx, struct connection *conn,
- if (streq(name, "/"))
- return EINVAL;
-
-- ret = _rm(conn, ctx, node, name);
-+ ret = _rm(conn, ctx, name);
- if (ret)
- return ret;
-
---
-2.37.4
-
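The delete_child()/remove_child_entry() path shown above ends up compacting the
parent's packed, NUL-separated children buffer. A stand-alone sketch of that
compaction step (the equivalent of memdel() plus the matching-entry search),
with simplified error handling and none of the xenstored types:

/* Remove a child name from a packed, NUL-separated children buffer by
 * memmove()ing the tail over it, as memdel()/remove_child_entry() do.
 * Simplified stand-alone version, not the xenstored code.
 */
#include <stdio.h>
#include <string.h>

/* Remove "child" from buf (entries separated by '\0', total length
 * *len). Returns 0 on success, -1 if the child is not present. */
static int del_child(char *buf, unsigned int *len, const char *child)
{
    for (unsigned int i = 0; i < *len; i += strlen(buf + i) + 1) {
        unsigned int clen = strlen(buf + i) + 1;

        if (strcmp(buf + i, child))
            continue;
        memmove(buf + i, buf + i + clen, *len - i - clen);
        *len -= clen;
        return 0;
    }
    return -1;
}

int main(void)
{
    char children[] = "x\0y\0z";         /* three entries */
    unsigned int len = sizeof(children); /* includes trailing '\0' */

    if (del_child(children, &len, "y"))
        return 1;
    for (unsigned int i = 0; i < len; i += strlen(children + i) + 1)
        printf("child: %s\n", children + i);
    return 0;
}
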
diff --git a/0078-tools-xenstore-use-treewalk-for-creating-node-record.patch b/0078-tools-xenstore-use-treewalk-for-creating-node-record.patch
deleted file mode 100644
index d5ed8c1..0000000
--- a/0078-tools-xenstore-use-treewalk-for-creating-node-record.patch
+++ /dev/null
@@ -1,169 +0,0 @@
-From 7682de61a49f7692cbd31a62f12c0ca12e069575 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:12 +0200
-Subject: [PATCH 78/87] tools/xenstore: use treewalk for creating node records
-
-Instead of doing an open tree walk using call recursion, use
-walk_node_tree() when creating the node records during a live update.
-
-This will reduce code size and avoid many nesting levels of function
-calls which could potentially exhaust the stack.
-
-This is part of XSA-418 / CVE-2022-42321.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 297ac246a5d8ed656b349641288f3402dcc0251e)
----
- tools/xenstore/xenstored_core.c | 105 ++++++++++++--------------------
- 1 file changed, 40 insertions(+), 65 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 58fb651542ec..05d349778bb4 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -3120,101 +3120,76 @@ const char *dump_state_node_perms(FILE *fp, const struct xs_permissions *perms,
- return NULL;
- }
-
--static const char *dump_state_node_tree(FILE *fp, char *path,
-- unsigned int path_max_len)
-+struct dump_node_data {
-+ FILE *fp;
-+ const char *err;
-+};
-+
-+static int dump_state_node_err(struct dump_node_data *data, const char *err)
- {
-- unsigned int pathlen, childlen, p = 0;
-+ data->err = err;
-+ return WALK_TREE_ERROR_STOP;
-+}
-+
-+static int dump_state_node(const void *ctx, struct connection *conn,
-+ struct node *node, void *arg)
-+{
-+ struct dump_node_data *data = arg;
-+ FILE *fp = data->fp;
-+ unsigned int pathlen;
- struct xs_state_record_header head;
- struct xs_state_node sn;
-- TDB_DATA key, data;
-- const struct xs_tdb_record_hdr *hdr;
-- const char *child;
- const char *ret;
-
-- pathlen = strlen(path) + 1;
--
-- set_tdb_key(path, &key);
-- data = tdb_fetch(tdb_ctx, key);
-- if (data.dptr == NULL)
-- return "Error reading node";
--
-- /* Clean up in case of failure. */
-- talloc_steal(path, data.dptr);
--
-- hdr = (void *)data.dptr;
-+ pathlen = strlen(node->name) + 1;
-
- head.type = XS_STATE_TYPE_NODE;
- head.length = sizeof(sn);
- sn.conn_id = 0;
- sn.ta_id = 0;
- sn.ta_access = 0;
-- sn.perm_n = hdr->num_perms;
-+ sn.perm_n = node->perms.num;
- sn.path_len = pathlen;
-- sn.data_len = hdr->datalen;
-- head.length += hdr->num_perms * sizeof(*sn.perms);
-+ sn.data_len = node->datalen;
-+ head.length += node->perms.num * sizeof(*sn.perms);
- head.length += pathlen;
-- head.length += hdr->datalen;
-+ head.length += node->datalen;
- head.length = ROUNDUP(head.length, 3);
-
- if (fwrite(&head, sizeof(head), 1, fp) != 1)
-- return "Dump node state error";
-+ return dump_state_node_err(data, "Dump node head error");
- if (fwrite(&sn, sizeof(sn), 1, fp) != 1)
-- return "Dump node state error";
-+ return dump_state_node_err(data, "Dump node state error");
-
-- ret = dump_state_node_perms(fp, hdr->perms, hdr->num_perms);
-+ ret = dump_state_node_perms(fp, node->perms.p, node->perms.num);
- if (ret)
-- return ret;
-+ return dump_state_node_err(data, ret);
-
-- if (fwrite(path, pathlen, 1, fp) != 1)
-- return "Dump node path error";
-- if (hdr->datalen &&
-- fwrite(hdr->perms + hdr->num_perms, hdr->datalen, 1, fp) != 1)
-- return "Dump node data error";
-+ if (fwrite(node->name, pathlen, 1, fp) != 1)
-+ return dump_state_node_err(data, "Dump node path error");
-+
-+ if (node->datalen && fwrite(node->data, node->datalen, 1, fp) != 1)
-+ return dump_state_node_err(data, "Dump node data error");
-
- ret = dump_state_align(fp);
- if (ret)
-- return ret;
-+ return dump_state_node_err(data, ret);
-
-- child = (char *)(hdr->perms + hdr->num_perms) + hdr->datalen;
--
-- /*
-- * Use path for constructing children paths.
-- * As we don't write out nodes without having written their parent
-- * already we will never clobber a part of the path we'll need later.
-- */
-- pathlen--;
-- if (path[pathlen - 1] != '/') {
-- path[pathlen] = '/';
-- pathlen++;
-- }
-- while (p < hdr->childlen) {
-- childlen = strlen(child) + 1;
-- if (pathlen + childlen > path_max_len)
-- return "Dump node path length error";
-- strcpy(path + pathlen, child);
-- ret = dump_state_node_tree(fp, path, path_max_len);
-- if (ret)
-- return ret;
-- p += childlen;
-- child += childlen;
-- }
--
-- talloc_free(data.dptr);
--
-- return NULL;
-+ return WALK_TREE_OK;
- }
-
- const char *dump_state_nodes(FILE *fp, const void *ctx)
- {
-- char *path;
-+ struct dump_node_data data = {
-+ .fp = fp,
-+ .err = "Dump node walk error"
-+ };
-+ struct walk_funcs walkfuncs = { .enter = dump_state_node };
-
-- path = talloc_size(ctx, XENSTORE_ABS_PATH_MAX + 1);
-- if (!path)
-- return "Path buffer allocation error";
-+ if (walk_node_tree(ctx, NULL, "/", &walkfuncs, &data))
-+ return data.err;
-
-- strcpy(path, "/");
--
-- return dump_state_node_tree(fp, path, XENSTORE_ABS_PATH_MAX + 1);
-+ return NULL;
- }
-
- void read_state_global(const void *ctx, const void *state)
---
-2.37.4
-
diff --git a/0079-tools-xenstore-remove-nodes-owned-by-destroyed-domai.patch b/0079-tools-xenstore-remove-nodes-owned-by-destroyed-domai.patch
deleted file mode 100644
index f6ba349..0000000
--- a/0079-tools-xenstore-remove-nodes-owned-by-destroyed-domai.patch
+++ /dev/null
@@ -1,298 +0,0 @@
-From 825332daeac9fc3ac1e482e805ac4a3bc1e1ab34 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:12 +0200
-Subject: [PATCH 79/87] tools/xenstore: remove nodes owned by destroyed domain
-
-When a domain is removed from Xenstore, remove all nodes owned by it
-by default.
-
-This tackles the problem that a domain might create nodes outside its
-home path in Xenstore, leading to Xenstore hogging more and more
-memory. Domain quotas don't help in this case if the guest reboots in
-between.
-
-Since XSA-322, ownership of such stale nodes is transferred to dom0,
-which helps against unintended access, but not against Xenstore
-running out of memory.
-
-As a fallback for unusual cases, add a Xenstore start parameter for
-keeping today's way of handling stale nodes, at the risk of Xenstore
-hitting an OOM situation.
-
-This is part of XSA-419 / CVE-2022-42322.
-
-Fixes: 496306324d8d ("tools/xenstore: revoke access rights for removed domains")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 755d3f9debf8879448211fffb018f556136f6a79)
----
- tools/xenstore/xenstored_core.c | 17 +++++--
- tools/xenstore/xenstored_core.h | 4 ++
- tools/xenstore/xenstored_domain.c | 84 +++++++++++++++++++++++--------
- tools/xenstore/xenstored_domain.h | 2 +-
- 4 files changed, 80 insertions(+), 27 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 05d349778bb4..0ca1a5a19ac2 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -80,6 +80,7 @@ static bool verbose = false;
- LIST_HEAD(connections);
- int tracefd = -1;
- static bool recovery = true;
-+bool keep_orphans = false;
- static int reopen_log_pipe[2];
- static int reopen_log_pipe0_pollfd_idx = -1;
- char *tracefile = NULL;
-@@ -757,7 +758,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
- node->perms.p = hdr->perms;
- node->acc.domid = node->perms.p[0].id;
- node->acc.memory = data.dsize;
-- if (domain_adjust_node_perms(conn, node))
-+ if (domain_adjust_node_perms(node))
- goto error;
-
- /* If owner is gone reset currently accounted memory size. */
-@@ -800,7 +801,7 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
- void *p;
- struct xs_tdb_record_hdr *hdr;
-
-- if (domain_adjust_node_perms(conn, node))
-+ if (domain_adjust_node_perms(node))
- return errno;
-
- data.dsize = sizeof(*hdr)
-@@ -1651,7 +1652,7 @@ static int delnode_sub(const void *ctx, struct connection *conn,
- return WALK_TREE_RM_CHILDENTRY;
- }
-
--static int _rm(struct connection *conn, const void *ctx, const char *name)
-+int rm_node(struct connection *conn, const void *ctx, const char *name)
- {
- struct node *parent;
- char *parentname = get_parent(ctx, name);
-@@ -1715,7 +1716,7 @@ static int do_rm(const void *ctx, struct connection *conn,
- if (streq(name, "/"))
- return EINVAL;
-
-- ret = _rm(conn, ctx, name);
-+ ret = rm_node(conn, ctx, name);
- if (ret)
- return ret;
-
-@@ -2639,6 +2640,8 @@ static void usage(void)
- " -R, --no-recovery to request that no recovery should be attempted when\n"
- " the store is corrupted (debug only),\n"
- " -I, --internal-db store database in memory, not on disk\n"
-+" -K, --keep-orphans don't delete nodes owned by a domain when the\n"
-+" domain is deleted (this is a security risk!)\n"
- " -V, --verbose to request verbose execution.\n");
- }
-
-@@ -2663,6 +2666,7 @@ static struct option options[] = {
- { "timeout", 1, NULL, 'w' },
- { "no-recovery", 0, NULL, 'R' },
- { "internal-db", 0, NULL, 'I' },
-+ { "keep-orphans", 0, NULL, 'K' },
- { "verbose", 0, NULL, 'V' },
- { "watch-nb", 1, NULL, 'W' },
- #ifndef NO_LIVE_UPDATE
-@@ -2742,7 +2746,7 @@ int main(int argc, char *argv[])
- orig_argc = argc;
- orig_argv = argv;
-
-- while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:A:M:Q:q:T:RVW:w:U",
-+ while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
- options, NULL)) != -1) {
- switch (opt) {
- case 'D':
-@@ -2778,6 +2782,9 @@ int main(int argc, char *argv[])
- case 'I':
- tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
- break;
-+ case 'K':
-+ keep_orphans = true;
-+ break;
- case 'V':
- verbose = true;
- break;
-diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
-index 2d9942171d92..725793257e4a 100644
---- a/tools/xenstore/xenstored_core.h
-+++ b/tools/xenstore/xenstored_core.h
-@@ -240,6 +240,9 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
- struct node *read_node(struct connection *conn, const void *ctx,
- const char *name);
-
-+/* Remove a node and its children. */
-+int rm_node(struct connection *conn, const void *ctx, const char *name);
-+
- void setup_structure(bool live_update);
- struct connection *new_connection(const struct interface_funcs *funcs);
- struct connection *get_connection_by_id(unsigned int conn_id);
-@@ -286,6 +289,7 @@ extern int quota_req_outstanding;
- extern int quota_trans_nodes;
- extern int quota_memory_per_domain_soft;
- extern int quota_memory_per_domain_hard;
-+extern bool keep_orphans;
-
- extern unsigned int timeout_watch_event_msec;
-
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index e2f1b09c6037..8b134017a27a 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -227,10 +227,64 @@ static void unmap_interface(void *interface)
- xengnttab_unmap(*xgt_handle, interface, 1);
- }
-
-+static int domain_tree_remove_sub(const void *ctx, struct connection *conn,
-+ struct node *node, void *arg)
-+{
-+ struct domain *domain = arg;
-+ TDB_DATA key;
-+ int ret = WALK_TREE_OK;
-+
-+ if (node->perms.p[0].id != domain->domid)
-+ return WALK_TREE_OK;
-+
-+ if (keep_orphans) {
-+ set_tdb_key(node->name, &key);
-+ domain->nbentry--;
-+ node->perms.p[0].id = priv_domid;
-+ node->acc.memory = 0;
-+ domain_entry_inc(NULL, node);
-+ if (write_node_raw(NULL, &key, node, true)) {
-+ /* That's unfortunate. We only can try to continue. */
-+ syslog(LOG_ERR,
-+ "error when moving orphaned node %s to dom0\n",
-+ node->name);
-+ } else
-+ trace("orphaned node %s moved to dom0\n", node->name);
-+ } else {
-+ if (rm_node(NULL, ctx, node->name)) {
-+ /* That's unfortunate. We only can try to continue. */
-+ syslog(LOG_ERR,
-+ "error when deleting orphaned node %s\n",
-+ node->name);
-+ } else
-+ trace("orphaned node %s deleted\n", node->name);
-+
-+ /* Skip children in all cases in order to avoid more errors. */
-+ ret = WALK_TREE_SKIP_CHILDREN;
-+ }
-+
-+ return domain->nbentry > 0 ? ret : WALK_TREE_SUCCESS_STOP;
-+}
-+
-+static void domain_tree_remove(struct domain *domain)
-+{
-+ int ret;
-+ struct walk_funcs walkfuncs = { .enter = domain_tree_remove_sub };
-+
-+ if (domain->nbentry > 0) {
-+ ret = walk_node_tree(domain, NULL, "/", &walkfuncs, domain);
-+ if (ret == WALK_TREE_ERROR_STOP)
-+ syslog(LOG_ERR,
-+ "error when looking for orphaned nodes\n");
-+ }
-+}
-+
- static int destroy_domain(void *_domain)
- {
- struct domain *domain = _domain;
-
-+ domain_tree_remove(domain);
-+
- list_del(&domain->list);
-
- if (!domain->introduced)
-@@ -851,15 +905,15 @@ int domain_entry_inc(struct connection *conn, struct node *node)
- struct domain *d;
- unsigned int domid;
-
-- if (!conn)
-+ if (!node->perms.p)
- return 0;
-
-- domid = node->perms.p ? node->perms.p[0].id : conn->id;
-+ domid = node->perms.p[0].id;
-
-- if (conn->transaction) {
-+ if (conn && conn->transaction) {
- transaction_entry_inc(conn->transaction, domid);
- } else {
-- d = (domid == conn->id && conn->domain) ? conn->domain
-+ d = (conn && domid == conn->id && conn->domain) ? conn->domain
- : find_or_alloc_existing_domain(domid);
- if (d)
- d->nbentry++;
-@@ -920,23 +974,11 @@ int domain_alloc_permrefs(struct node_perms *perms)
- * Remove permissions for no longer existing domains in order to avoid a new
- * domain with the same domid inheriting the permissions.
- */
--int domain_adjust_node_perms(struct connection *conn, struct node *node)
-+int domain_adjust_node_perms(struct node *node)
- {
- unsigned int i;
- int ret;
-
-- ret = chk_domain_generation(node->perms.p[0].id, node->generation);
--
-- /* If the owner doesn't exist any longer give it to priv domain. */
-- if (!ret) {
-- /*
-- * In theory we'd need to update the number of dom0 nodes here,
-- * but we could be called for a read of the node. So better
-- * avoid the risk to overflow the node count of dom0.
-- */
-- node->perms.p[0].id = priv_domid;
-- }
--
- for (i = 1; i < node->perms.num; i++) {
- if (node->perms.p[i].perms & XS_PERM_IGNORE)
- continue;
-@@ -954,15 +996,15 @@ void domain_entry_dec(struct connection *conn, struct node *node)
- struct domain *d;
- unsigned int domid;
-
-- if (!conn)
-+ if (!node->perms.p)
- return;
-
- domid = node->perms.p ? node->perms.p[0].id : conn->id;
-
-- if (conn->transaction) {
-+ if (conn && conn->transaction) {
- transaction_entry_dec(conn->transaction, domid);
- } else {
-- d = (domid == conn->id && conn->domain) ? conn->domain
-+ d = (conn && domid == conn->id && conn->domain) ? conn->domain
- : find_domain_struct(domid);
- if (d) {
- d->nbentry--;
-@@ -1081,7 +1123,7 @@ int domain_memory_add(unsigned int domid, int mem, bool no_quota_check)
- * exist, as accounting is done either for a domain related to
- * the current connection, or for the domain owning a node
- * (which is always existing, as the owner of the node is
-- * tested to exist and replaced by domid 0 if not).
-+ * tested to exist and deleted or replaced by domid 0 if not).
- * So not finding the related domain MUST be an error in the
- * data base.
- */
-diff --git a/tools/xenstore/xenstored_domain.h b/tools/xenstore/xenstored_domain.h
-index 40fe5f690900..5454e925ad15 100644
---- a/tools/xenstore/xenstored_domain.h
-+++ b/tools/xenstore/xenstored_domain.h
-@@ -61,7 +61,7 @@ const char *get_implicit_path(const struct connection *conn);
- bool domain_is_unprivileged(struct connection *conn);
-
- /* Remove node permissions for no longer existing domains. */
--int domain_adjust_node_perms(struct connection *conn, struct node *node);
-+int domain_adjust_node_perms(struct node *node);
- int domain_alloc_permrefs(struct node_perms *perms);
-
- /* Quota manipulation */
---
-2.37.4
-
diff --git a/0080-tools-xenstore-make-the-internal-memory-data-base-th.patch b/0080-tools-xenstore-make-the-internal-memory-data-base-th.patch
deleted file mode 100644
index 53d6227..0000000
--- a/0080-tools-xenstore-make-the-internal-memory-data-base-th.patch
+++ /dev/null
@@ -1,101 +0,0 @@
-From 8b81fc185ab13feca2f63eda3792189e5ac11a97 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:13 +0200
-Subject: [PATCH 80/87] tools/xenstore: make the internal memory data base the
- default
-
-A file backed data base has only the advantages that its contents can be
-dumped while Xenstore is running, and that it potentially uses less swap
-space in case the data base can't be kept in memory.
-
-It has the major disadvantage of a huge performance overhead: switching
-to keep the data base in memory only speeds up live update of xenstored
-with 120000 nodes from 20 minutes to 11 seconds. A complete tree walk
-of this configuration will be reduced from 7 seconds to 280 msecs
-(measured by "xenstore-control check").
-
-So make the internal memory data base the default and enhance the
-"--internal-db" command line parameter to take an optional argument
-allowing the data base to be switched back to the file based one.
-
-This is part of XSA-419.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit d174fefa90487ddd25ebc618028f67b2e8a1f795)
----
- tools/helpers/init-xenstore-domain.c | 4 ++--
- tools/xenstore/xenstored_core.c | 13 ++++++++-----
- 2 files changed, 10 insertions(+), 7 deletions(-)
-
-diff --git a/tools/helpers/init-xenstore-domain.c b/tools/helpers/init-xenstore-domain.c
-index 11ebf79e6d26..8d1d1a4f1e3a 100644
---- a/tools/helpers/init-xenstore-domain.c
-+++ b/tools/helpers/init-xenstore-domain.c
-@@ -223,9 +223,9 @@ static int build(xc_interface *xch)
- }
-
- if ( param )
-- snprintf(cmdline, 512, "--event %d --internal-db %s", rv, param);
-+ snprintf(cmdline, 512, "--event %d %s", rv, param);
- else
-- snprintf(cmdline, 512, "--event %d --internal-db", rv);
-+ snprintf(cmdline, 512, "--event %d", rv);
-
- dom->guest_domid = domid;
- dom->cmdline = xc_dom_strdup(dom, cmdline);
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 0ca1a5a19ac2..041124d8b7a5 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -2329,7 +2329,7 @@ static void accept_connection(int sock)
- }
- #endif
-
--static int tdb_flags;
-+static int tdb_flags = TDB_INTERNAL | TDB_NOLOCK;
-
- /* We create initial nodes manually. */
- static void manual_node(const char *name, const char *child)
-@@ -2639,7 +2639,8 @@ static void usage(void)
- " watch-event: time a watch-event is kept pending\n"
- " -R, --no-recovery to request that no recovery should be attempted when\n"
- " the store is corrupted (debug only),\n"
--" -I, --internal-db store database in memory, not on disk\n"
-+" -I, --internal-db [on|off] store database in memory, not on disk, default is\n"
-+" memory, with \"--internal-db off\" it is on disk\n"
- " -K, --keep-orphans don't delete nodes owned by a domain when the\n"
- " domain is deleted (this is a security risk!)\n"
- " -V, --verbose to request verbose execution.\n");
-@@ -2665,7 +2666,7 @@ static struct option options[] = {
- { "quota-soft", 1, NULL, 'q' },
- { "timeout", 1, NULL, 'w' },
- { "no-recovery", 0, NULL, 'R' },
-- { "internal-db", 0, NULL, 'I' },
-+ { "internal-db", 2, NULL, 'I' },
- { "keep-orphans", 0, NULL, 'K' },
- { "verbose", 0, NULL, 'V' },
- { "watch-nb", 1, NULL, 'W' },
-@@ -2746,7 +2747,8 @@ int main(int argc, char *argv[])
- orig_argc = argc;
- orig_argv = argv;
-
-- while ((opt = getopt_long(argc, argv, "DE:F:HKNPS:t:A:M:Q:q:T:RVW:w:U",
-+ while ((opt = getopt_long(argc, argv,
-+ "DE:F:HI::KNPS:t:A:M:Q:q:T:RVW:w:U",
- options, NULL)) != -1) {
- switch (opt) {
- case 'D':
-@@ -2780,7 +2782,8 @@ int main(int argc, char *argv[])
- tracefile = optarg;
- break;
- case 'I':
-- tdb_flags = TDB_INTERNAL|TDB_NOLOCK;
-+ if (optarg && !strcmp(optarg, "off"))
-+ tdb_flags = 0;
- break;
- case 'K':
- keep_orphans = true;
---
-2.37.4
-
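As a side note to the patch just quoted: the "--internal-db [on|off]" handling rests on getopt_long()'s optional-argument support (has_arg set to 2 in the option table, "I::" in the short-option string). The standalone sketch below is not part of the patchset and uses illustrative names only; note that glibc only delivers an optional argument when it is attached to the option, e.g. "--internal-db=off" or "-Ioff".

#include <getopt.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool internal_db = true;    /* in-memory data base is the default */

int main(int argc, char *argv[])
{
    static const struct option opts[] = {
        /* has_arg == 2: the argument is optional */
        { "internal-db", 2, NULL, 'I' },
        { NULL, 0, NULL, 0 },
    };
    int opt;

    /* "I::" makes the short form take an optional argument as well */
    while ((opt = getopt_long(argc, argv, "I::", opts, NULL)) != -1) {
        switch (opt) {
        case 'I':
            if (optarg && !strcmp(optarg, "off"))
                internal_db = false;
            break;
        default:
            return 1;
        }
    }

    printf("data base backend: %s\n", internal_db ? "memory" : "file");
    return 0;
}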
diff --git a/0081-docs-enhance-xenstore.txt-with-permissions-descripti.patch b/0081-docs-enhance-xenstore.txt-with-permissions-descripti.patch
deleted file mode 100644
index c0b9c4a..0000000
--- a/0081-docs-enhance-xenstore.txt-with-permissions-descripti.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From 1f5b394d6ed0ee26b5878bd0cdf4a698bbc4294f Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:13 +0200
-Subject: [PATCH 81/87] docs: enhance xenstore.txt with permissions description
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The permission scheme of Xenstore nodes is not really covered by
-docs/misc/xenstore.txt, other than referring to the Xen wiki.
-
-Add a paragraph explaining the permissions of nodes, and especially
-mentioning removal of nodes when a domain has been removed from
-Xenstore.
-
-This is part of XSA-419.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit d084d2c6dff7044956ebdf83a259ad6081a1d921)
----
- docs/misc/xenstore.txt | 11 +++++++++++
- 1 file changed, 11 insertions(+)
-
-diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
-index a7d006519ae8..eccd596ee38c 100644
---- a/docs/misc/xenstore.txt
-+++ b/docs/misc/xenstore.txt
-@@ -43,6 +43,17 @@ bytes are forbidden; clients specifying relative paths should keep
- them to within 2048 bytes. (See XENSTORE_*_PATH_MAX in xs_wire.h.)
-
-
-+Each node has one or more permission entries. Permissions are granted
-+per domain-id; the first permission entry of each node specifies the
-+owner of the node. The permissions of a node can be changed by its
-+owner, while the owner itself can only be modified by the control
-+domain (usually domain id 0). The owner always has the right to read
-+and write the node, while other permissions can be set up to allow
-+read and/or write access. When a domain is removed from Xenstore,
-+nodes owned by that domain will be removed together with all of those
-+nodes' children.
-+
-+
- Communication with xenstore is via either sockets, or event channel
- and shared memory, as specified in io/xs_wire.h: each message in
- either direction is a header formatted as a struct xsd_sockmsg
---
-2.37.4
-
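The permission scheme documented in the patch above can be summarised in a few lines of C. This is a simplified illustration only: the types and names are invented for this note, and domains not listed in any entry simply get no access here, which glosses over further details of the real xenstored implementation.

#include <stdio.h>

enum perm { PERM_NONE, PERM_READ, PERM_WRITE, PERM_BOTH };

struct perm_entry {
    unsigned int domid;
    enum perm perm;
};

/* Entry 0 names the owner, which always has full access; later entries
 * grant read and/or write to other domains. */
static enum perm effective_perm(const struct perm_entry *p, unsigned int num,
                                unsigned int domid)
{
    unsigned int i;

    if (num && p[0].domid == domid)
        return PERM_BOTH;

    for (i = 1; i < num; i++)
        if (p[i].domid == domid)
            return p[i].perm;

    return PERM_NONE;   /* simplification: unlisted domains get nothing */
}

int main(void)
{
    const struct perm_entry acl[] = {
        { 5, PERM_BOTH },   /* owner: domain 5 */
        { 7, PERM_READ },   /* domain 7 may read */
    };

    printf("dom5: %d, dom7: %d, dom9: %d\n",
           effective_perm(acl, 2, 5),
           effective_perm(acl, 2, 7),
           effective_perm(acl, 2, 9));
    return 0;
}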
diff --git a/0082-tools-ocaml-xenstored-Fix-quota-bypass-on-domain-shu.patch b/0082-tools-ocaml-xenstored-Fix-quota-bypass-on-domain-shu.patch
deleted file mode 100644
index 1cdc2b2..0000000
--- a/0082-tools-ocaml-xenstored-Fix-quota-bypass-on-domain-shu.patch
+++ /dev/null
@@ -1,93 +0,0 @@
-From 5b0919f2c0e5060f6e0bc328f100abae0a9f07b8 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Wed, 12 Oct 2022 19:13:06 +0100
-Subject: [PATCH 82/87] tools/ocaml/xenstored: Fix quota bypass on domain
- shutdown
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-XSA-322 fixed a domid reuse vulnerability by assigning Dom0 as the owner of
-any nodes left after a domain is shutdown (e.g. outside its /local/domain/N
-tree).
-
-However Dom0 has no quota on purpose, so this opened up another potential
-attack vector. Avoid it by deleting these nodes instead of assigning them to
-Dom0.
-
-This is part of XSA-419 / CVE-2022-42323.
-
-Fixes: c46eff921209 ("tools/ocaml/xenstored: clean up permissions for dead domains")
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit db471408edd46af403b8bd44d180a928ad7fbb80)
----
- tools/ocaml/xenstored/perms.ml | 3 +--
- tools/ocaml/xenstored/store.ml | 29 +++++++++++++++++++++--------
- 2 files changed, 22 insertions(+), 10 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/perms.ml b/tools/ocaml/xenstored/perms.ml
-index e8a16221f8fa..84f2503e8e29 100644
---- a/tools/ocaml/xenstored/perms.ml
-+++ b/tools/ocaml/xenstored/perms.ml
-@@ -64,8 +64,7 @@ let get_owner perm = perm.owner
- * *)
- let remove_domid ~domid perm =
- let acl = List.filter (fun (acl_domid, _) -> acl_domid <> domid) perm.acl in
-- let owner = if perm.owner = domid then 0 else perm.owner in
-- { perm with acl; owner }
-+ if perm.owner = domid then None else Some { perm with acl; owner = perm.owner }
-
- let default0 = create 0 NONE []
-
-diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
-index 20e67b142746..70f0c83de404 100644
---- a/tools/ocaml/xenstored/store.ml
-+++ b/tools/ocaml/xenstored/store.ml
-@@ -87,10 +87,21 @@ let check_owner node connection =
-
- let rec recurse fct node = fct node; SymbolMap.iter (fun _ -> recurse fct) node.children
-
--(** [recurse_map f tree] applies [f] on each node in the tree recursively *)
--let recurse_map f =
-+(** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
-+ possibly removing some nodes.
-+ Note that the nodes removed this way won't generate watch events.
-+*)
-+let recurse_filter_map f =
-+ let invalid = -1 in
-+ let is_valid _ node = node.perms.owner <> invalid in
- let rec walk node =
-- f { node with children = SymbolMap.map walk node.children }
-+ (* Map.filter_map is Ocaml 4.11+ only *)
-+ let node =
-+ { node with children =
-+ SymbolMap.map walk node.children |> SymbolMap.filter is_valid } in
-+ match f node with
-+ | Some keep -> keep
-+ | None -> { node with perms = {node.perms with owner = invalid } }
- in
- walk
-
-@@ -444,11 +455,13 @@ let setperms store perm path nperms =
-
- let reset_permissions store domid =
- Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
-- store.root <- Node.recurse_map (fun node ->
-- let perms = Perms.Node.remove_domid ~domid node.perms in
-- if perms <> node.perms then
-- Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
-- { node with perms }
-+ store.root <- Node.recurse_filter_map (fun node ->
-+ match Perms.Node.remove_domid ~domid node.perms with
-+ | None -> None
-+ | Some perms ->
-+ if perms <> node.perms then
-+ Logging.debug "store|node" "Changed permissions for node %s" (Node.get_name node);
-+ Some { node with perms }
- ) store.root
-
- type ops = {
---
-2.37.4
-
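The patch above is OCaml, but the idea it implements -- drop every node owned by the dead domain, together with its whole subtree, rather than re-owning it to the quota-free Dom0 -- translates directly. The following C sketch is only an illustration with invented types, not code from the tree.

#include <stdlib.h>

struct tnode {
    unsigned int owner;
    struct tnode *child;     /* first child */
    struct tnode *sibling;   /* next sibling */
};

static void free_subtree(struct tnode *n)
{
    while (n) {
        struct tnode *next = n->sibling;

        free_subtree(n->child);
        free(n);
        n = next;
    }
}

/* Remove from the sibling list at *pp every node owned by domid; a
 * removed node takes its complete subtree with it, mirroring the
 * behaviour described in the commit message. */
static void drop_owned_by(struct tnode **pp, unsigned int domid)
{
    while (*pp) {
        struct tnode *n = *pp;

        if (n->owner == domid) {
            *pp = n->sibling;
            n->sibling = NULL;
            free_subtree(n);
        } else {
            drop_owned_by(&n->child, domid);
            pp = &n->sibling;
        }
    }
}

int main(void)
{
    struct tnode *root = calloc(1, sizeof(*root));
    struct tnode *kid = calloc(1, sizeof(*kid));

    root->owner = 0;
    kid->owner = 5;              /* pretend domain 5 just died */
    root->child = kid;

    drop_owned_by(&root->child, 5);
    free_subtree(root);
    return 0;
}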
diff --git a/0083-tools-ocaml-Ensure-packet-size-is-never-negative.patch b/0083-tools-ocaml-Ensure-packet-size-is-never-negative.patch
deleted file mode 100644
index 5fc3c77..0000000
--- a/0083-tools-ocaml-Ensure-packet-size-is-never-negative.patch
+++ /dev/null
@@ -1,75 +0,0 @@
-From 635390415f4a9c0621330f0b40f8c7e914c4523f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Wed, 12 Oct 2022 19:13:05 +0100
-Subject: [PATCH 83/87] tools/ocaml: Ensure packet size is never negative
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Integers in OCaml have 63 or 31 bits of signed precision.
-
-On 64-bit builds of OCaml, this is fine because a C uint32_t always fits
-within a 63-bit signed integer.
-
-In 32-bit builds of OCaml, this goes wrong. The C uint32_t is truncated
-first (loses the top bit), then has an unsigned/signed mismatch.
-
-A "negative" value (i.e. a packet on the ring of between 1G and 2G in size)
-will trigger an exception later in Bytes.make in xb.ml, and because the packet
-is not removed from the ring, the exception re-triggers on every subsequent
-query, creating a livelock.
-
-Fix both the source of the exception in Xb, and as defence in depth, mark the
-domain as bad for any Invalid_argument exceptions to avoid the risk of
-livelock.
-
-This is XSA-420 / CVE-2022-42324.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit ae34df4d82636f4c82700b447ea2c93b9f82b3f3)
----
- tools/ocaml/libs/xb/partial.ml | 6 +++---
- tools/ocaml/xenstored/process.ml | 2 +-
- 2 files changed, 4 insertions(+), 4 deletions(-)
-
-diff --git a/tools/ocaml/libs/xb/partial.ml b/tools/ocaml/libs/xb/partial.ml
-index b6e2a716e263..3aa8927eb7f0 100644
---- a/tools/ocaml/libs/xb/partial.ml
-+++ b/tools/ocaml/libs/xb/partial.ml
-@@ -36,7 +36,7 @@ let of_string s =
-   This will leave the guest connection in a bad state and will
- be hard to recover from without restarting the connection
- (ie rebooting the guest) *)
-- let dlen = min xenstore_payload_max dlen in
-+ let dlen = max 0 (min xenstore_payload_max dlen) in
- {
- tid = tid;
- rid = rid;
-@@ -46,8 +46,8 @@ let of_string s =
- }
-
- let append pkt s sz =
-- if pkt.len > 4096 then failwith "Buffer.add: cannot grow buffer";
-- Buffer.add_string pkt.buf (String.sub s 0 sz)
-+ if Buffer.length pkt.buf + sz > xenstore_payload_max then failwith "Buffer.add: cannot grow buffer";
-+ Buffer.add_substring pkt.buf s 0 sz
-
- let to_complete pkt =
- pkt.len - (Buffer.length pkt.buf)
-diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
-index ce39ce28b5f3..6cb990ee7fb2 100644
---- a/tools/ocaml/xenstored/process.ml
-+++ b/tools/ocaml/xenstored/process.ml
-@@ -722,7 +722,7 @@ let do_input store cons doms con =
- History.reconnect con;
- info "%s reconnection complete" (Connection.get_domstr con);
- None
-- | Failure exp ->
-+ | Invalid_argument exp | Failure exp ->
- error "caught exception %s" exp;
- error "got a bad client %s" (sprintf "%-8s" (Connection.get_domstr con));
- Connection.mark_as_bad con;
---
-2.37.4
-
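The arithmetic behind the fix above can be reproduced in a few lines of host-side C. This is purely illustrative: it models OCaml's 31-bit integer by hand and is not code from the patchset; only XENSTORE_PAYLOAD_MAX (4096, from io/xs_wire.h) is taken from the real headers.

#include <stdint.h>
#include <stdio.h>

#define XENSTORE_PAYLOAD_MAX 4096            /* as in io/xs_wire.h */

/* Model a 31-bit OCaml int on a 32-bit build: the top bit of the C
 * uint32_t is lost and bit 30 becomes the sign bit. */
static int32_t as_ocaml31(uint32_t v)
{
    v &= 0x7fffffffu;
    if (v & 0x40000000u)
        return (int32_t)((int64_t)v - 0x80000000LL);
    return (int32_t)v;
}

/* Mirror of the fix: max 0 (min xenstore_payload_max dlen) */
static int32_t clamp_len(int32_t dlen)
{
    if (dlen > XENSTORE_PAYLOAD_MAX)
        dlen = XENSTORE_PAYLOAD_MAX;
    return dlen < 0 ? 0 : dlen;
}

int main(void)
{
    uint32_t wire_len = 0x50000000u;          /* ~1.25G as seen on the ring */
    int32_t dlen = as_ocaml31(wire_len);

    printf("wire length %u becomes %d, clamped to %d\n",
           (unsigned int)wire_len, dlen, clamp_len(dlen));
    return 0;
}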
diff --git a/0084-tools-xenstore-fix-deleting-node-in-transaction.patch b/0084-tools-xenstore-fix-deleting-node-in-transaction.patch
deleted file mode 100644
index 4ab044c..0000000
--- a/0084-tools-xenstore-fix-deleting-node-in-transaction.patch
+++ /dev/null
@@ -1,46 +0,0 @@
-From 4305807dfdc183f4acd170fe00eb66b338fa6430 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:13 +0200
-Subject: [PATCH 84/87] tools/xenstore: fix deleting node in transaction
-
-In case a node has been created in a transaction and it is later
-deleted in the same transaction, the transaction will be terminated
-with an error.
-
-As this error is encountered only when handling the deleted node at
-transaction finalization, the transaction will have been performed
-partially and without updating the accounting information. This will
-enable a malicious guest to create an arbitrary number of nodes.
-
-This is part of XSA-421 / CVE-2022-42325.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Tested-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 13ac37f1416cae88d97f7baf6cf2a827edb9a187)
----
- tools/xenstore/xenstored_transaction.c | 8 +++++++-
- 1 file changed, 7 insertions(+), 1 deletion(-)
-
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index 3e3eb47326cc..7ffe21bb5285 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -418,7 +418,13 @@ static int finalize_transaction(struct connection *conn,
- true);
- talloc_free(data.dptr);
- } else {
-- ret = do_tdb_delete(conn, &key, NULL);
-+ /*
-+ * A node having been created and later deleted
-+ * in this transaction will have no generation
-+ * information stored.
-+ */
-+ ret = (i->generation == NO_GENERATION)
-+ ? 0 : do_tdb_delete(conn, &key, NULL);
- }
- if (ret)
- goto err;
---
-2.37.4
-
diff --git a/0085-tools-xenstore-harden-transaction-finalization-again.patch b/0085-tools-xenstore-harden-transaction-finalization-again.patch
deleted file mode 100644
index 6718ae7..0000000
--- a/0085-tools-xenstore-harden-transaction-finalization-again.patch
+++ /dev/null
@@ -1,410 +0,0 @@
-From 1bdd7c438b399e2ecce9e3c72bd7c1ae56df60f8 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 13 Sep 2022 07:35:14 +0200
-Subject: [PATCH 85/87] tools/xenstore: harden transaction finalization against
- errors
-
-When finalizing a transaction, any error occurring after checking for
-conflicts will result in the transaction being performed only
-partially today. Additionally accounting data will not be updated at
-the end of the transaction, which might result in further problems
-later.
-
-Avoid those problems by multiple modifications:
-
-- free any transaction specific nodes which don't need to be committed
- as they haven't been written during the transaction as soon as their
- generation count has been verified, this will reduce the risk of
-  generation count has been verified; this will reduce the risk of
-
-- store the transaction specific node name in struct accessed_node in
- order to avoid the need to allocate additional memory for it when
- finalizing the transaction
-
-- don't stop the transaction finalization when hitting an error
- condition, but try to continue to handle all modified nodes
-
-- in case of a detected error do the accounting update as needed and
- call the data base checking only after that
-
-- if writing a node in a transaction is failing (e.g. due to a failed
- quota check), fail the transaction, as prior changes to struct
- accessed_node can't easily be undone in that case
-
-This is part of XSA-421 / CVE-2022-42326.
-
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-Tested-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 2dd823ca7237e7fb90c890642d6a3b357a26fcff)
----
- tools/xenstore/xenstored_core.c | 16 ++-
- tools/xenstore/xenstored_transaction.c | 171 +++++++++++--------------
- tools/xenstore/xenstored_transaction.h | 4 +-
- 3 files changed, 92 insertions(+), 99 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 041124d8b7a5..ccb7f0a92578 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -727,8 +727,7 @@ struct node *read_node(struct connection *conn, const void *ctx,
- return NULL;
- }
-
-- if (transaction_prepend(conn, name, &key))
-- return NULL;
-+ transaction_prepend(conn, name, &key);
-
- data = tdb_fetch(tdb_ctx, key);
-
-@@ -846,10 +845,21 @@ int write_node_raw(struct connection *conn, TDB_DATA *key, struct node *node,
- static int write_node(struct connection *conn, struct node *node,
- bool no_quota_check)
- {
-+ int ret;
-+
- if (access_node(conn, node, NODE_ACCESS_WRITE, &node->key))
- return errno;
-
-- return write_node_raw(conn, &node->key, node, no_quota_check);
-+ ret = write_node_raw(conn, &node->key, node, no_quota_check);
-+ if (ret && conn && conn->transaction) {
-+ /*
-+ * Reverting access_node() is hard, so just fail the
-+ * transaction.
-+ */
-+ fail_transaction(conn->transaction);
-+ }
-+
-+ return ret;
- }
-
- unsigned int perm_for_conn(struct connection *conn,
-diff --git a/tools/xenstore/xenstored_transaction.c b/tools/xenstore/xenstored_transaction.c
-index 7ffe21bb5285..ac854197cadb 100644
---- a/tools/xenstore/xenstored_transaction.c
-+++ b/tools/xenstore/xenstored_transaction.c
-@@ -114,7 +114,8 @@ struct accessed_node
- struct list_head list;
-
- /* The name of the node. */
-- char *node;
-+ char *trans_name; /* Transaction specific name. */
-+ char *node; /* Main data base name. */
-
- /* Generation count (or NO_GENERATION) for conflict checking. */
- uint64_t generation;
-@@ -199,25 +200,20 @@ static char *transaction_get_node_name(void *ctx, struct transaction *trans,
- * Prepend the transaction to name if node has been modified in the current
- * transaction.
- */
--int transaction_prepend(struct connection *conn, const char *name,
-- TDB_DATA *key)
-+void transaction_prepend(struct connection *conn, const char *name,
-+ TDB_DATA *key)
- {
-- char *tdb_name;
-+ struct accessed_node *i;
-
-- if (!conn || !conn->transaction ||
-- !find_accessed_node(conn->transaction, name)) {
-- set_tdb_key(name, key);
-- return 0;
-+ if (conn && conn->transaction) {
-+ i = find_accessed_node(conn->transaction, name);
-+ if (i) {
-+ set_tdb_key(i->trans_name, key);
-+ return;
-+ }
- }
-
-- tdb_name = transaction_get_node_name(conn->transaction,
-- conn->transaction, name);
-- if (!tdb_name)
-- return errno;
--
-- set_tdb_key(tdb_name, key);
--
-- return 0;
-+ set_tdb_key(name, key);
- }
-
- /*
-@@ -240,7 +236,6 @@ int access_node(struct connection *conn, struct node *node,
- struct accessed_node *i = NULL;
- struct transaction *trans;
- TDB_DATA local_key;
-- const char *trans_name = NULL;
- int ret;
- bool introduce = false;
-
-@@ -259,10 +254,6 @@ int access_node(struct connection *conn, struct node *node,
-
- trans = conn->transaction;
-
-- trans_name = transaction_get_node_name(node, trans, node->name);
-- if (!trans_name)
-- goto nomem;
--
- i = find_accessed_node(trans, node->name);
- if (!i) {
- if (trans->nodes >= quota_trans_nodes &&
-@@ -273,9 +264,10 @@ int access_node(struct connection *conn, struct node *node,
- i = talloc_zero(trans, struct accessed_node);
- if (!i)
- goto nomem;
-- i->node = talloc_strdup(i, node->name);
-- if (!i->node)
-+ i->trans_name = transaction_get_node_name(i, trans, node->name);
-+ if (!i->trans_name)
- goto nomem;
-+ i->node = strchr(i->trans_name, '/') + 1;
- if (node->generation != NO_GENERATION && node->perms.num) {
- i->perms.p = talloc_array(i, struct xs_permissions,
- node->perms.num);
-@@ -302,7 +294,7 @@ int access_node(struct connection *conn, struct node *node,
- i->generation = node->generation;
- i->check_gen = true;
- if (node->generation != NO_GENERATION) {
-- set_tdb_key(trans_name, &local_key);
-+ set_tdb_key(i->trans_name, &local_key);
- ret = write_node_raw(conn, &local_key, node, true);
- if (ret)
- goto err;
-@@ -321,7 +313,7 @@ int access_node(struct connection *conn, struct node *node,
- return -1;
-
- if (key) {
-- set_tdb_key(trans_name, key);
-+ set_tdb_key(i->trans_name, key);
- if (type == NODE_ACCESS_WRITE)
- i->ta_node = true;
- if (type == NODE_ACCESS_DELETE)
-@@ -333,7 +325,6 @@ int access_node(struct connection *conn, struct node *node,
- nomem:
- ret = ENOMEM;
- err:
-- talloc_free((void *)trans_name);
- talloc_free(i);
- trans->fail = true;
- errno = ret;
-@@ -371,100 +362,90 @@ void queue_watches(struct connection *conn, const char *name, bool watch_exact)
- * base.
- */
- static int finalize_transaction(struct connection *conn,
-- struct transaction *trans)
-+ struct transaction *trans, bool *is_corrupt)
- {
-- struct accessed_node *i;
-+ struct accessed_node *i, *n;
- TDB_DATA key, ta_key, data;
- struct xs_tdb_record_hdr *hdr;
- uint64_t gen;
-- char *trans_name;
-- int ret;
-
-- list_for_each_entry(i, &trans->accessed, list) {
-- if (!i->check_gen)
-- continue;
-+ list_for_each_entry_safe(i, n, &trans->accessed, list) {
-+ if (i->check_gen) {
-+ set_tdb_key(i->node, &key);
-+ data = tdb_fetch(tdb_ctx, key);
-+ hdr = (void *)data.dptr;
-+ if (!data.dptr) {
-+ if (tdb_error(tdb_ctx) != TDB_ERR_NOEXIST)
-+ return EIO;
-+ gen = NO_GENERATION;
-+ } else
-+ gen = hdr->generation;
-+ talloc_free(data.dptr);
-+ if (i->generation != gen)
-+ return EAGAIN;
-+ }
-
-- set_tdb_key(i->node, &key);
-- data = tdb_fetch(tdb_ctx, key);
-- hdr = (void *)data.dptr;
-- if (!data.dptr) {
-- if (tdb_error(tdb_ctx) != TDB_ERR_NOEXIST)
-- return EIO;
-- gen = NO_GENERATION;
-- } else
-- gen = hdr->generation;
-- talloc_free(data.dptr);
-- if (i->generation != gen)
-- return EAGAIN;
-+ /* Entries for unmodified nodes can be removed early. */
-+ if (!i->modified) {
-+ if (i->ta_node) {
-+ set_tdb_key(i->trans_name, &ta_key);
-+ if (do_tdb_delete(conn, &ta_key, NULL))
-+ return EIO;
-+ }
-+ list_del(&i->list);
-+ talloc_free(i);
-+ }
- }
-
- while ((i = list_top(&trans->accessed, struct accessed_node, list))) {
-- trans_name = transaction_get_node_name(i, trans, i->node);
-- if (!trans_name)
-- /* We are doomed: the transaction is only partial. */
-- goto err;
--
-- set_tdb_key(trans_name, &ta_key);
--
-- if (i->modified) {
-- set_tdb_key(i->node, &key);
-- if (i->ta_node) {
-- data = tdb_fetch(tdb_ctx, ta_key);
-- if (!data.dptr)
-- goto err;
-+ set_tdb_key(i->node, &key);
-+ if (i->ta_node) {
-+ set_tdb_key(i->trans_name, &ta_key);
-+ data = tdb_fetch(tdb_ctx, ta_key);
-+ if (data.dptr) {
- hdr = (void *)data.dptr;
- hdr->generation = ++generation;
-- ret = do_tdb_write(conn, &key, &data, NULL,
-- true);
-+ *is_corrupt |= do_tdb_write(conn, &key, &data,
-+ NULL, true);
- talloc_free(data.dptr);
-+ if (do_tdb_delete(conn, &ta_key, NULL))
-+ *is_corrupt = true;
- } else {
-- /*
-- * A node having been created and later deleted
-- * in this transaction will have no generation
-- * information stored.
-- */
-- ret = (i->generation == NO_GENERATION)
-- ? 0 : do_tdb_delete(conn, &key, NULL);
-- }
-- if (ret)
-- goto err;
-- if (i->fire_watch) {
-- fire_watches(conn, trans, i->node, NULL,
-- i->watch_exact,
-- i->perms.p ? &i->perms : NULL);
-+ *is_corrupt = true;
- }
-+ } else {
-+ /*
-+ * A node having been created and later deleted
-+ * in this transaction will have no generation
-+ * information stored.
-+ */
-+ *is_corrupt |= (i->generation == NO_GENERATION)
-+ ? false
-+ : do_tdb_delete(conn, &key, NULL);
- }
-+ if (i->fire_watch)
-+ fire_watches(conn, trans, i->node, NULL, i->watch_exact,
-+ i->perms.p ? &i->perms : NULL);
-
-- if (i->ta_node && do_tdb_delete(conn, &ta_key, NULL))
-- goto err;
- list_del(&i->list);
- talloc_free(i);
- }
-
- return 0;
--
--err:
-- corrupt(conn, "Partial transaction");
-- return EIO;
- }
-
- static int destroy_transaction(void *_transaction)
- {
- struct transaction *trans = _transaction;
- struct accessed_node *i;
-- char *trans_name;
- TDB_DATA key;
-
- wrl_ntransactions--;
- trace_destroy(trans, "transaction");
- while ((i = list_top(&trans->accessed, struct accessed_node, list))) {
- if (i->ta_node) {
-- trans_name = transaction_get_node_name(i, trans,
-- i->node);
-- if (trans_name) {
-- set_tdb_key(trans_name, &key);
-- do_tdb_delete(trans->conn, &key, NULL);
-- }
-+ set_tdb_key(i->trans_name, &key);
-+ do_tdb_delete(trans->conn, &key, NULL);
- }
- list_del(&i->list);
- talloc_free(i);
-@@ -556,6 +537,7 @@ int do_transaction_end(const void *ctx, struct connection *conn,
- {
- const char *arg = onearg(in);
- struct transaction *trans;
-+ bool is_corrupt = false;
- int ret;
-
- if (!arg || (!streq(arg, "T") && !streq(arg, "F")))
-@@ -579,13 +561,17 @@ int do_transaction_end(const void *ctx, struct connection *conn,
- ret = transaction_fix_domains(trans, false);
- if (ret)
- return ret;
-- if (finalize_transaction(conn, trans))
-- return EAGAIN;
-+ ret = finalize_transaction(conn, trans, &is_corrupt);
-+ if (ret)
-+ return ret;
-
- wrl_apply_debit_trans_commit(conn);
-
- /* fix domain entry for each changed domain */
- transaction_fix_domains(trans, true);
-+
-+ if (is_corrupt)
-+ corrupt(conn, "transaction inconsistency");
- }
- send_ack(conn, XS_TRANSACTION_END);
-
-@@ -660,7 +646,7 @@ int check_transactions(struct hashtable *hash)
- struct connection *conn;
- struct transaction *trans;
- struct accessed_node *i;
-- char *tname, *tnode;
-+ char *tname;
-
- list_for_each_entry(conn, &connections, list) {
- list_for_each_entry(trans, &conn->transaction_list, list) {
-@@ -672,11 +658,8 @@ int check_transactions(struct hashtable *hash)
- list_for_each_entry(i, &trans->accessed, list) {
- if (!i->ta_node)
- continue;
-- tnode = transaction_get_node_name(tname, trans,
-- i->node);
-- if (!tnode || !remember_string(hash, tnode))
-+ if (!remember_string(hash, i->trans_name))
- goto nomem;
-- talloc_free(tnode);
- }
-
- talloc_free(tname);
-diff --git a/tools/xenstore/xenstored_transaction.h b/tools/xenstore/xenstored_transaction.h
-index 39d7f81c5127..3417303f9427 100644
---- a/tools/xenstore/xenstored_transaction.h
-+++ b/tools/xenstore/xenstored_transaction.h
-@@ -48,8 +48,8 @@ int __must_check access_node(struct connection *conn, struct node *node,
- void queue_watches(struct connection *conn, const char *name, bool watch_exact);
-
- /* Prepend the transaction to name if appropriate. */
--int transaction_prepend(struct connection *conn, const char *name,
-- TDB_DATA *key);
-+void transaction_prepend(struct connection *conn, const char *name,
-+ TDB_DATA *key);
-
- /* Mark the transaction as failed. This will prevent it to be committed. */
- void fail_transaction(struct transaction *trans);
---
-2.37.4
-
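The error-handling strategy listed in the commit message above -- keep processing every accessed node, remember that something went wrong, update the accounting regardless, and only then declare the data base inconsistent -- boils down to a simple record-and-continue pattern. A toy standalone sketch with invented names, not Xen code:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct entry { int id; bool will_fail; };

/* stand-in for the per-node commit step, which may fail */
static bool commit_entry(const struct entry *e)
{
    return !e->will_fail;
}

int main(void)
{
    struct entry entries[] = { { 1, false }, { 2, true }, { 3, false } };
    bool is_corrupt = false;
    size_t i;

    /* Do not bail out on the first failure: handle every entry so the
     * accounting fix-up below always runs over consistent data. */
    for (i = 0; i < sizeof(entries) / sizeof(entries[0]); i++)
        is_corrupt |= !commit_entry(&entries[i]);

    /* ... accounting update would happen here in either case ... */

    if (is_corrupt)
        fprintf(stderr, "transaction inconsistency\n");

    return 0;
}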
diff --git a/0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch b/0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch
deleted file mode 100644
index c15c285..0000000
--- a/0086-x86-spec-ctrl-Enumeration-for-IBPB_RET.patch
+++ /dev/null
@@ -1,82 +0,0 @@
-From b1a1df345aaf359f305d6d041e571929c9252645 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 14 Jun 2022 16:18:36 +0100
-Subject: [PATCH 86/87] x86/spec-ctrl: Enumeration for IBPB_RET
-
-The IBPB_RET bit indicates that the CPU's implementation of MSR_PRED_CMD.IBPB
-does flush the RSB/RAS too.
-
-This is part of XSA-422 / CVE-2022-23824.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 24496558e650535bdbd22cc04731e82276cd1b3f)
----
- tools/libs/light/libxl_cpuid.c | 1 +
- tools/misc/xen-cpuid.c | 1 +
- xen/arch/x86/spec_ctrl.c | 5 +++--
- xen/include/public/arch-x86/cpufeatureset.h | 1 +
- 4 files changed, 6 insertions(+), 2 deletions(-)
-
-diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
-index bf6fdee360a9..691d5c6b2a68 100644
---- a/tools/libs/light/libxl_cpuid.c
-+++ b/tools/libs/light/libxl_cpuid.c
-@@ -289,6 +289,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
- {"ssb-no", 0x80000008, NA, CPUID_REG_EBX, 26, 1},
- {"psfd", 0x80000008, NA, CPUID_REG_EBX, 28, 1},
- {"btc-no", 0x80000008, NA, CPUID_REG_EBX, 29, 1},
-+ {"ibpb-ret", 0x80000008, NA, CPUID_REG_EBX, 30, 1},
-
- {"nc", 0x80000008, NA, CPUID_REG_ECX, 0, 8},
- {"apicidsize", 0x80000008, NA, CPUID_REG_ECX, 12, 4},
-diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index fe22f5f5b68b..cd094427dd4c 100644
---- a/tools/misc/xen-cpuid.c
-+++ b/tools/misc/xen-cpuid.c
-@@ -159,6 +159,7 @@ static const char *const str_e8b[32] =
- [24] = "amd-ssbd", [25] = "virt-ssbd",
- [26] = "ssb-no",
- [28] = "psfd", [29] = "btc-no",
-+ [30] = "ibpb-ret",
- };
-
- static const char *const str_7d0[32] =
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 0f4bad3d3abb..16a562d3a172 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -419,7 +419,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- * Hardware read-only information, stating immunity to certain issues, or
- * suggestions of which mitigation to use.
- */
-- printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
-+ printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
- (caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
- (caps & ARCH_CAPS_IBRS_ALL) ? " IBRS_ALL" : "",
- (caps & ARCH_CAPS_RSBA) ? " RSBA" : "",
-@@ -436,7 +436,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
- (e8b & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS)) ? " STIBP_ALWAYS" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_FAST)) ? " IBRS_FAST" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
-- (e8b & cpufeat_mask(X86_FEATURE_BTC_NO)) ? " BTC_NO" : "");
-+ (e8b & cpufeat_mask(X86_FEATURE_BTC_NO)) ? " BTC_NO" : "",
-+ (e8b & cpufeat_mask(X86_FEATURE_IBPB_RET)) ? " IBPB_RET" : "");
-
- /* Hardware features which need driving to mitigate issues. */
- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
-diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index e7b8167800a2..e0731221404c 100644
---- a/xen/include/public/arch-x86/cpufeatureset.h
-+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -267,6 +267,7 @@ XEN_CPUFEATURE(VIRT_SSBD, 8*32+25) /* MSR_VIRT_SPEC_CTRL.SSBD */
- XEN_CPUFEATURE(SSB_NO, 8*32+26) /*A Hardware not vulnerable to SSB */
- XEN_CPUFEATURE(PSFD, 8*32+28) /*S MSR_SPEC_CTRL.PSFD */
- XEN_CPUFEATURE(BTC_NO, 8*32+29) /*A Hardware not vulnerable to Branch Type Confusion */
-+XEN_CPUFEATURE(IBPB_RET, 8*32+30) /*A IBPB clears RSB/RAS too. */
-
- /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
- XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A AVX512 Neural Network Instructions */
---
-2.37.4
-
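The enumeration removed above lives in CPUID leaf 0x80000008, EBX bit 30, as the libxl and xen-cpuid hunks show. A quick user-space check, sketched here for reference only (relies on GCC/Clang's <cpuid.h>, not part of the patchset):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 0x80000008 not available");
        return 1;
    }

    printf("IBPB_RET (leaf 0x80000008, EBX bit 30): %s\n",
           (ebx & (1u << 30)) ? "enumerated" : "not enumerated");
    return 0;
}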
diff --git a/0087-x86-spec-ctrl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch b/0087-x86-spec-ctrl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch
deleted file mode 100644
index 9bcb4d3..0000000
--- a/0087-x86-spec-ctrl-Mitigate-IBPB-not-flushing-the-RSB-RAS.patch
+++ /dev/null
@@ -1,113 +0,0 @@
-From c1e196ab490b47ce42037c2fef8184a19d96922b Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 14 Jun 2022 16:18:36 +0100
-Subject: [PATCH 87/87] x86/spec-ctrl: Mitigate IBPB not flushing the RSB/RAS
-
-Introduce spec_ctrl_new_guest_context() to encapsulate all logic pertaining to
-using MSR_PRED_CMD for a new guest context, even if it only has one user
-presently.
-
-Introduce X86_BUG_IBPB_NO_RET, and use it to extend spec_ctrl_new_guest_context()
-with a manual fixup for hardware which mis-implements IBPB.
-
-This is part of XSA-422 / CVE-2022-23824.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 2b27967fb89d7904a1571a2fb963b1c9cac548db)
----
- xen/arch/x86/asm-macros.c | 1 +
- xen/arch/x86/domain.c | 2 +-
- xen/arch/x86/spec_ctrl.c | 8 ++++++++
- xen/include/asm-x86/cpufeatures.h | 1 +
- xen/include/asm-x86/spec_ctrl.h | 22 ++++++++++++++++++++++
- 5 files changed, 33 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/asm-macros.c b/xen/arch/x86/asm-macros.c
-index 7e536b0d82f5..891d86c7655c 100644
---- a/xen/arch/x86/asm-macros.c
-+++ b/xen/arch/x86/asm-macros.c
-@@ -1,2 +1,3 @@
- #include <asm/asm-defns.h>
- #include <asm/alternative-asm.h>
-+#include <asm/spec_ctrl_asm.h>
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index 3fab2364be8d..3080cde62b5b 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -2092,7 +2092,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
- */
- if ( *last_id != next_id )
- {
-- wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
-+ spec_ctrl_new_guest_context();
- *last_id = next_id;
- }
- }
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 16a562d3a172..90d86fe5cb47 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -804,6 +804,14 @@ static void __init ibpb_calculations(void)
- return;
- }
-
-+ /*
-+ * AMD/Hygon CPUs to date (June 2022) don't flush the RAS. Future
-+ * CPUs are expected to enumerate IBPB_RET when this has been fixed.
-+ * Until then, cover the difference with the software sequence.
-+ */
-+ if ( boot_cpu_has(X86_FEATURE_IBPB) && !boot_cpu_has(X86_FEATURE_IBPB_RET) )
-+ setup_force_cpu_cap(X86_BUG_IBPB_NO_RET);
-+
- /*
- * IBPB-on-entry mitigations for Branch Type Confusion.
- *
-diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
-index 672c9ee22ba2..ecc1bb09505a 100644
---- a/xen/include/asm-x86/cpufeatures.h
-+++ b/xen/include/asm-x86/cpufeatures.h
-@@ -49,6 +49,7 @@ XEN_CPUFEATURE(IBPB_ENTRY_HVM, X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for
- #define X86_BUG_FPU_PTRS X86_BUG( 0) /* (F)X{SAVE,RSTOR} doesn't save/restore FOP/FIP/FDP. */
- #define X86_BUG_NULL_SEG X86_BUG( 1) /* NULL-ing a selector preserves the base and limit. */
- #define X86_BUG_CLFLUSH_MFENCE X86_BUG( 2) /* MFENCE needed to serialise CLFLUSH */
-+#define X86_BUG_IBPB_NO_RET X86_BUG( 3) /* IBPB doesn't flush the RSB/RAS */
-
- /* Total number of capability words, inc synth and bug words. */
- #define NCAPINTS (FSCAPINTS + X86_NR_SYNTH + X86_NR_BUG) /* N 32-bit words worth of info */
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index 9403b81dc7af..6a77c3937844 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -65,6 +65,28 @@
- void init_speculation_mitigations(void);
- void spec_ctrl_init_domain(struct domain *d);
-
-+/*
-+ * Switch to a new guest prediction context.
-+ *
-+ * This flushes all indirect branch predictors (BTB, RSB/RAS), so guest code
-+ * which has previously run on this CPU can't attack subsequent guest code.
-+ *
-+ * As this flushes the RSB/RAS, it destroys the predictions of the calling
-+ * context. For best performance, arrange for this to be used when we're going
-+ * to jump out of the current context, e.g. with reset_stack_and_jump().
-+ *
-+ * For hardware which mis-implements IBPB, fix up by flushing the RSB/RAS
-+ * manually.
-+ */
-+static always_inline void spec_ctrl_new_guest_context(void)
-+{
-+ wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
-+
-+ /* (ab)use alternative_input() to specify clobbers. */
-+ alternative_input("", "DO_OVERWRITE_RSB", X86_BUG_IBPB_NO_RET,
-+ : "rax", "rcx");
-+}
-+
- extern int8_t opt_ibpb_ctxt_switch;
- extern bool opt_ssbd;
- extern int8_t opt_eager_fpu;
---
-2.37.4
-
diff --git a/info.txt b/info.txt
index a70e606..c92b6d7 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #1 for 4.16.3-pre
+Xen upstream patchset #0 for 4.16.4-pre
Containing patches from
-RELEASE-4.16.2 (1871bd1c9eb934f0ffd039f3d68e42fd0097f322)
+RELEASE-4.16.3 (08c42cec2f3dbb8d1df62c2ad4945d127b418fd6)
to
-staging-4.16 (c1e196ab490b47ce42037c2fef8184a19d96922b)
+staging-4.16 (4ad5975d4e35635f03d2cb9e86292c0daeabd75f)
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2023-04-17 8:16 Florian Schmaus
0 siblings, 0 replies; 11+ messages in thread
From: Florian Schmaus @ 2023-04-17 8:16 UTC (permalink / raw
To: gentoo-commits
commit: 7e0f315531fdc3c24b6b9a0bb9d391b4cb52780e
Author: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
AuthorDate: Fri Apr 14 17:03:31 2023 +0000
Commit: Florian Schmaus <flow <AT> gentoo <DOT> org>
CommitDate: Fri Apr 14 17:03:31 2023 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=7e0f3155
Xen 4.17.1-pre-patchset-0
Signed-off-by: Tomáš Mózes <hydrapolic <AT> gmail.com>
0001-update-Xen-version-to-4.16.4-pre.patch | 25 --
0001-update-Xen-version-to-4.17.1-pre.patch | 136 ++++++++
...not-release-irq-until-all-cleanup-is-done.patch | 90 +++++
...not-forward-MADT-Local-APIC-NMI-structure.patch | 103 ++++++
...-t-mark-external-IRQs-as-pending-when-vLA.patch | 71 ++++
...n-don-t-mark-IRQ-vectors-as-pending-when-.patch | 60 ++++
...-t-mark-evtchn-upcall-vector-as-pending-w.patch | 70 ++++
...roadcast-accept-partial-broadcast-success.patch | 10 +-
...cate-the-ESRT-when-booting-via-multiboot2.patch | 195 +++++++++++
...prevent-overflow-with-high-frequency-TSCs.patch | 10 +-
...tored-Fix-incorrect-scope-after-an-if-sta.patch | 52 +++
...-evtchn-OCaml-5-support-fix-potential-res.patch | 68 ++++
...l-evtchn-Add-binding-for-xenevtchn_fdopen.patch | 81 +++++
...-evtchn-Extend-the-init-binding-with-a-cl.patch | 90 +++++
0014-tools-oxenstored-Style-fixes-to-Domain.patch | 64 ++++
...tored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch | 82 +++++
...tored-Rename-some-port-variables-to-remot.patch | 144 ++++++++
...oxenstored-Implement-Domain.rebind_evtchn.patch | 67 ++++
...tored-Rework-Domain-evtchn-handling-to-us.patch | 209 ++++++++++++
...tored-Keep-dev-xen-evtchn-open-across-liv.patch | 367 +++++++++++++++++++++
...tored-Log-live-update-issues-at-warning-l.patch | 42 +++
...oxenstored-Set-uncaught-exception-handler.patch | 83 +++++
...tored-syslog-Avoid-potential-NULL-derefer.patch | 55 +++
...tored-Render-backtraces-more-nicely-in-Sy.patch | 83 +++++
...s-xenstore-simplify-loop-handling-connect.patch | 136 ++++++++
...-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch | 6 +-
...uild-with-recent-QEMU-use-enable-trace-ba.patch | 8 +-
| 74 +++++
...culate-model-specific-LBRs-once-at-start-.patch | 24 +-
...pport-for-CPUs-without-model-specific-LBR.patch | 12 +-
...fix-PAE-check-for-top-level-table-unshado.patch | 10 +-
| 50 +++
...x-an-incorrect-assignment-to-uart-io_size.patch | 10 +-
...3-libxl-fix-guest-kexec-skip-cpuid-policy.patch | 18 +-
...-xenctrl-Make-domain_getinfolist-tail-rec.patch | 8 +-
...-xenctrl-Use-larger-chunksize-in-domain_g.patch | 8 +-
...aml-xb-mmap-Use-Data_abstract_val-wrapper.patch | 4 +-
...=> 0037-tools-ocaml-xb-Drop-Xs_ring.write.patch | 4 +-
...tored-validate-config-file-before-live-up.patch | 4 +-
...l-libs-Don-t-declare-stubs-as-taking-void.patch | 6 +-
...-libs-Allocate-the-correct-amount-of-memo.patch | 16 +-
...-evtchn-Don-t-reference-Custom-objects-wi.patch | 4 +-
...-xc-Fix-binding-for-xc_domain_assign_devi.patch | 10 +-
...-xc-Don-t-reference-Abstract_Tag-objects-.patch | 8 +-
...-libs-Fix-memory-resource-leaks-with-caml.patch | 4 +-
...rl-Mitigate-Cross-Thread-Return-Address-P.patch | 100 +++---
...Remove-clang-8-from-Debian-unstable-conta.patch | 12 +-
...ix-parallel-build-between-flex-bison-and-.patch | 14 +-
...uid-Infrastructure-for-leaves-7-1-ecx-edx.patch | 38 +--
...isable-CET-SS-on-parts-susceptible-to-fra.patch | 56 ++--
...pect-credit2_runqueue-all-when-arranging-.patch | 14 +-
0051-build-make-FILE-symbol-paths-consistent.patch | 42 +++
...MD-apply-the-patch-early-on-every-logical.patch | 28 +-
...-mem_sharing-teardown-before-paging-teard.patch | 14 +-
...Work-around-Clang-IAS-macro-expansion-bug.patch | 36 +-
...ng-Wunicode-diagnostic-when-building-asm-.patch | 38 +--
0056-bump-default-SeaBIOS-version-to-1.16.0.patch | 28 --
...KG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch | 55 ++-
...Fix-resource-leaks-in-xc_core_arch_map_p2.patch | 6 +-
...Fix-leak-on-realloc-failure-in-backup_pte.patch | 8 +-
...MD-late-load-the-patch-on-every-logical-t.patch | 22 +-
...account-for-log-dirty-mode-when-pre-alloc.patch | 48 +--
...nd-number-of-pinned-cache-attribute-regio.patch | 6 +-
...ialize-pinned-cache-attribute-list-manipu.patch | 10 +-
...rl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch | 6 +-
...lement-VMExit-based-guest-Bus-Lock-detect.patch | 175 ++++++++++
...troduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch | 102 ++++++
0066-x86-vmx-implement-Notify-VM-Exit.patch | 243 ++++++++++++++
...python-change-s-size-type-for-Python-3.10.patch | 6 +-
...s-xenmon-Fix-xenmon.py-for-with-python3.x.patch | 6 +-
...rl-Add-BHI-controls-to-userspace-componen.patch | 51 +++
...arking-fix-build-with-gcc12-and-NR_CPUS-1.patch | 20 +-
...help-gcc13-to-avoid-it-emitting-a-warning.patch | 86 ++---
...ck.patch => 0072-VT-d-constrain-IGD-check.patch | 8 +-
... => 0073-bunzip-work-around-gcc13-warning.patch | 8 +-
...patch => 0074-libacpi-fix-PCI-hotplug-AML.patch | 8 +-
...ithout-XT-x2APIC-needs-to-be-forced-into-.patch | 16 +-
...mmu-no-igfx-if-the-IOMMU-scope-contains-f.patch | 8 +-
...fix-and-improve-sh_page_has_multiple_shad.patch | 8 +-
...Fix-evaluate_nospec-code-generation-under.patch | 14 +-
...x86-shadow-Fix-build-with-no-PG_log_dirty.patch | 20 +-
...-t-spuriously-crash-the-domain-when-INIT-.patch | 10 +-
...6-ucode-Fix-error-paths-control_thread_fn.patch | 16 +-
| 37 +++
...andle-accesses-adjacent-to-the-MSI-X-tabl.patch | 123 ++++---
...rect-name-value-pair-parsing-for-PCI-port.patch | 12 +-
....patch => 0085-CI-Drop-automation-configs.patch | 12 +-
...Switch-arm32-cross-builds-to-run-on-arm64.patch | 8 +-
...n-Remove-CentOS-7.2-containers-and-builds.patch | 6 +-
...mation-Remove-non-debug-x86_32-build-jobs.patch | 8 +-
...-llvm-8-from-the-Debian-Stretch-container.patch | 6 +-
info.txt | 6 +-
92 files changed, 3671 insertions(+), 614 deletions(-)
diff --git a/0001-update-Xen-version-to-4.16.4-pre.patch b/0001-update-Xen-version-to-4.16.4-pre.patch
deleted file mode 100644
index 961358a..0000000
--- a/0001-update-Xen-version-to-4.16.4-pre.patch
+++ /dev/null
@@ -1,25 +0,0 @@
-From e3396cd8be5ee99d363a23f30c680e42fb2757bd Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 20 Dec 2022 13:50:16 +0100
-Subject: [PATCH 01/61] update Xen version to 4.16.4-pre
-
----
- xen/Makefile | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 06dde1e03c..67c5551ffd 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -2,7 +2,7 @@
- # All other places this is stored (eg. compile.h) should be autogenerated.
- export XEN_VERSION = 4
- export XEN_SUBVERSION = 16
--export XEN_EXTRAVERSION ?= .3$(XEN_VENDORVERSION)
-+export XEN_EXTRAVERSION ?= .4-pre$(XEN_VENDORVERSION)
- export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
- -include xen-version
-
---
-2.40.0
-
diff --git a/0001-update-Xen-version-to-4.17.1-pre.patch b/0001-update-Xen-version-to-4.17.1-pre.patch
new file mode 100644
index 0000000..1d1bb53
--- /dev/null
+++ b/0001-update-Xen-version-to-4.17.1-pre.patch
@@ -0,0 +1,136 @@
+From 0b999fa2eadaeff840a8331b87f1f73abf3b14eb Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 20 Dec 2022 13:40:38 +0100
+Subject: [PATCH 01/89] update Xen version to 4.17.1-pre
+
+---
+ MAINTAINERS | 92 +++++-----------------------------------------------
+ xen/Makefile | 2 +-
+ 2 files changed, 10 insertions(+), 84 deletions(-)
+
+diff --git a/MAINTAINERS b/MAINTAINERS
+index 175f10f33f..ebb908cc37 100644
+--- a/MAINTAINERS
++++ b/MAINTAINERS
+@@ -54,6 +54,15 @@ list. Remember to copy the appropriate stable branch maintainer who
+ will be listed in this section of the MAINTAINERS file in the
+ appropriate branch.
+
++The maintainer for this branch is:
++
++ Jan Beulich <jbeulich@suse.com>
++
++Tools backport requests should also be copied to:
++
++ Anthony Perard <anthony.perard@citrix.com>
++
++
+ Unstable Subsystem Maintainers
+ ==============================
+
+@@ -104,89 +113,6 @@ Descriptions of section entries:
+ xen-maintainers-<version format number of this file>
+
+
+- Check-in policy
+- ===============
+-
+-In order for a patch to be checked in, in general, several conditions
+-must be met:
+-
+-1. In order to get a change to a given file committed, it must have
+- the approval of at least one maintainer of that file.
+-
+- A patch of course needs Acks from the maintainers of each file that
+- it changes; so a patch which changes xen/arch/x86/traps.c,
+- xen/arch/x86/mm/p2m.c, and xen/arch/x86/mm/shadow/multi.c would
+- require an Ack from each of the three sets of maintainers.
+-
+- See below for rules on nested maintainership.
+-
+-2. It must have appropriate approval from someone other than the
+- submitter. This can be either:
+-
+- a. An Acked-by from a maintainer of the code being touched (a
+- co-maintainer if available, or a more general level maintainer if
+- not available; see the secton on nested maintainership)
+-
+- b. A Reviewed-by by anyone of suitable stature in the community
+-
+-3. Sufficient time must have been given for anyone to respond. This
+- depends in large part upon the urgency and nature of the patch.
+- For a straightforward uncontroversial patch, a day or two may be
+- sufficient; for a controversial patch, a week or two may be better.
+-
+-4. There must be no "open" objections.
+-
+-In a case where one person submits a patch and a maintainer gives an
+-Ack, the Ack stands in for both the approval requirement (#1) and the
+-Acked-by-non-submitter requirement (#2).
+-
+-In a case where a maintainer themselves submits a patch, the
+-Signed-off-by meets the approval requirement (#1); so a Review
+-from anyone in the community suffices for requirement #2.
+-
+-Before a maintainer checks in their own patch with another community
+-member's R-b but no co-maintainer Ack, it is especially important to
+-give their co-maintainer the opportunity to give feedback, perhaps
+-declaring their intention to check it in without their co-maintainer's
+-ack a day before doing so.
+-
+-Maintainers may choose to override non-maintainer objections in the
+-case that consensus can't be reached.
+-
+-As always, no policy can cover all possible situations. In
+-exceptional circumstances, committers may commit a patch in absence of
+-one or more of the above requirements, if they are reasonably
+-confident that the other maintainers will approve of their decision in
+-retrospect.
+-
+- The meaning of nesting
+- ======================
+-
+-Many maintainership areas are "nested": for example, there are entries
+-for xen/arch/x86 as well as xen/arch/x86/mm, and even
+-xen/arch/x86/mm/shadow; and there is a section at the end called "THE
+-REST" which lists all committers. The meaning of nesting is that:
+-
+-1. Under normal circumstances, the Ack of the most specific maintainer
+-is both necessary and sufficient to get a change to a given file
+-committed. So a change to xen/arch/x86/mm/shadow/multi.c requires the
+-Ack of the xen/arch/x86/mm/shadow maintainer for that part of the
+-patch, but would not require the Ack of the xen/arch/x86 maintainer or
+-the xen/arch/x86/mm maintainer.
+-
+-2. In unusual circumstances, a more general maintainer's Ack can stand
+-in for or even overrule a specific maintainer's Ack. Unusual
+-circumstances might include:
+- - The patch is fixing a high-priority issue causing immediate pain,
+- and the more specific maintainer is not available.
+- - The more specific maintainer has not responded either to the
+- original patch, nor to "pings", within a reasonable amount of time.
+- - The more general maintainer wants to overrule the more specific
+- maintainer on some issue. (This should be exceptional.)
+- - In the case of a disagreement between maintainers, THE REST can
+- settle the matter by majority vote. (This should be very exceptional
+- indeed.)
+-
+
+ Maintainers List (try to look for most precise areas first)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index d7102a3b47..dcedfbc38e 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
+ # All other places this is stored (eg. compile.h) should be autogenerated.
+ export XEN_VERSION = 4
+ export XEN_SUBVERSION = 17
+-export XEN_EXTRAVERSION ?= .0$(XEN_VENDORVERSION)
++export XEN_EXTRAVERSION ?= .1-pre$(XEN_VENDORVERSION)
+ export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
+ -include xen-version
+
+--
+2.40.0
+
diff --git a/0002-x86-irq-do-not-release-irq-until-all-cleanup-is-done.patch b/0002-x86-irq-do-not-release-irq-until-all-cleanup-is-done.patch
new file mode 100644
index 0000000..1c7a13d
--- /dev/null
+++ b/0002-x86-irq-do-not-release-irq-until-all-cleanup-is-done.patch
@@ -0,0 +1,90 @@
+From 9cbc04a95f8a7f7cc27901211cbe19a42850c4ed Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 20 Dec 2022 13:43:04 +0100
+Subject: [PATCH 02/89] x86/irq: do not release irq until all cleanup is done
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Current code in _clear_irq_vector() will mark the irq as unused before
+doing the cleanup required when move_in_progress is true.
+
+This can lead to races in create_irq() if the function picks an irq
+desc that's been marked as unused but has move_in_progress set, as the
+call to assign_irq_vector() in that function can then fail with
+-EAGAIN.
+
+Prevent that by only marking irq descs as unused when all the cleanup
+has been done. While there also use write_atomic() when setting
+IRQ_UNUSED in _clear_irq_vector() and add a barrier in order to
+prevent the setting of IRQ_UNUSED from being reordered by the compiler.
+
+The check for move_in_progress cannot be removed from
+_assign_irq_vector(), as other users (io_apic_set_pci_routing() and
+ioapic_guest_write()) can still pass active irq descs to
+assign_irq_vector().
+
+Note the trace point is not moved and is now set before the irq is
+marked as unused. This is done so that the CPU mask provided in the
+trace point is the one belonging to the current vector, not the old
+one.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: e267d11969a40f0aec33dbf966f5a6490b205f43
+master date: 2022-12-02 10:32:21 +0100
+---
+ xen/arch/x86/irq.c | 31 ++++++++++++++++---------------
+ 1 file changed, 16 insertions(+), 15 deletions(-)
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index cd0c8a30a8..20150b1c7f 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -220,27 +220,28 @@ static void _clear_irq_vector(struct irq_desc *desc)
+ clear_bit(vector, desc->arch.used_vectors);
+ }
+
+- desc->arch.used = IRQ_UNUSED;
+-
+ trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
+
+- if ( likely(!desc->arch.move_in_progress) )
+- return;
++ if ( unlikely(desc->arch.move_in_progress) )
++ {
++ /* If we were in motion, also clear desc->arch.old_vector */
++ old_vector = desc->arch.old_vector;
++ cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
+
+- /* If we were in motion, also clear desc->arch.old_vector */
+- old_vector = desc->arch.old_vector;
+- cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
++ for_each_cpu(cpu, tmp_mask)
++ {
++ ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
++ TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
++ per_cpu(vector_irq, cpu)[old_vector] = ~irq;
++ }
+
+- for_each_cpu(cpu, tmp_mask)
+- {
+- ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
+- TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
+- per_cpu(vector_irq, cpu)[old_vector] = ~irq;
+- }
++ release_old_vec(desc);
+
+- release_old_vec(desc);
++ desc->arch.move_in_progress = 0;
++ }
+
+- desc->arch.move_in_progress = 0;
++ smp_wmb();
++ write_atomic(&desc->arch.used, IRQ_UNUSED);
+ }
+
+ void __init clear_irq_vector(int irq)
+--
+2.40.0
+
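The ordering requirement described in the commit message above -- all cleanup stores must be visible before the descriptor is marked IRQ_UNUSED -- is the classic publish-after-cleanup pattern. A minimal C11 illustration follows; it is not Xen code (Xen uses smp_wmb()/write_atomic() rather than <stdatomic.h>) and the types are invented.

#include <stdatomic.h>
#include <stdbool.h>

enum { IRQ_UNUSED = 0, IRQ_USED = 1 };

struct fake_desc {
    _Atomic int used;
    bool move_in_progress;
    int old_vector;
};

static void clear_vector(struct fake_desc *desc)
{
    if (desc->move_in_progress) {
        /* ... per-CPU vector_irq cleanup would go here ... */
        desc->old_vector = 0;
        desc->move_in_progress = false;
    }

    /* Publish IRQ_UNUSED only after the cleanup, with release
     * semantics, so a racing allocator never observes an "unused"
     * descriptor that still has move_in_progress set. */
    atomic_store_explicit(&desc->used, IRQ_UNUSED, memory_order_release);
}

int main(void)
{
    struct fake_desc d = { .move_in_progress = true, .old_vector = 42 };

    atomic_store(&d.used, IRQ_USED);
    clear_vector(&d);
    return atomic_load_explicit(&d.used, memory_order_acquire);
}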
diff --git a/0003-x86-pvh-do-not-forward-MADT-Local-APIC-NMI-structure.patch b/0003-x86-pvh-do-not-forward-MADT-Local-APIC-NMI-structure.patch
new file mode 100644
index 0000000..47d6997
--- /dev/null
+++ b/0003-x86-pvh-do-not-forward-MADT-Local-APIC-NMI-structure.patch
@@ -0,0 +1,103 @@
+From b7b34bd66ac77326bb49b10130013b4a9f83e4a2 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 20 Dec 2022 13:43:37 +0100
+Subject: [PATCH 03/89] x86/pvh: do not forward MADT Local APIC NMI structures
+ to dom0
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Currently Xen will passthrough any Local APIC NMI Structure found in
+the native ACPI MADT table to a PVH dom0. This is wrong because PVH
+doesn't have access to the physical local APIC, and instead gets an
+emulated local APIC by Xen, that doesn't have the LINT0 or LINT1
+pins wired to anything. Furthermore, the ACPI Processor UIDs used in
+the APIC NMI Structures are unlikely to match the ones generated by
+Xen for the Local x2APIC Structures, which would confuse dom0.
+
+Fix this by removing the logic to passthrough the Local APIC NMI
+Structure for PVH dom0.
+
+Fixes: 1d74282c45 ('x86: setup PVHv2 Dom0 ACPI tables')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: b39e6385250ccef9509af0eab9003ad5c1478842
+master date: 2022-12-02 10:33:40 +0100
+---
+ xen/arch/x86/hvm/dom0_build.c | 34 +---------------------------------
+ 1 file changed, 1 insertion(+), 33 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
+index 1864d048a1..3ac6b7b423 100644
+--- a/xen/arch/x86/hvm/dom0_build.c
++++ b/xen/arch/x86/hvm/dom0_build.c
+@@ -58,9 +58,6 @@
+ static unsigned int __initdata acpi_intr_overrides;
+ static struct acpi_madt_interrupt_override __initdata *intsrcovr;
+
+-static unsigned int __initdata acpi_nmi_sources;
+-static struct acpi_madt_nmi_source __initdata *nmisrc;
+-
+ static unsigned int __initdata order_stats[MAX_ORDER + 1];
+
+ static void __init print_order_stats(const struct domain *d)
+@@ -763,25 +760,6 @@ static int __init cf_check acpi_set_intr_ovr(
+ return 0;
+ }
+
+-static int __init cf_check acpi_count_nmi_src(
+- struct acpi_subtable_header *header, const unsigned long end)
+-{
+- acpi_nmi_sources++;
+- return 0;
+-}
+-
+-static int __init cf_check acpi_set_nmi_src(
+- struct acpi_subtable_header *header, const unsigned long end)
+-{
+- const struct acpi_madt_nmi_source *src =
+- container_of(header, struct acpi_madt_nmi_source, header);
+-
+- *nmisrc = *src;
+- nmisrc++;
+-
+- return 0;
+-}
+-
+ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
+ {
+ struct acpi_table_madt *madt;
+@@ -797,16 +775,11 @@ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
+ acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
+ acpi_count_intr_ovr, UINT_MAX);
+
+- /* Count number of NMI sources in the MADT. */
+- acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_count_nmi_src,
+- UINT_MAX);
+-
+ max_vcpus = dom0_max_vcpus();
+ /* Calculate the size of the crafted MADT. */
+ size = sizeof(*madt);
+ size += sizeof(*io_apic) * nr_ioapics;
+ size += sizeof(*intsrcovr) * acpi_intr_overrides;
+- size += sizeof(*nmisrc) * acpi_nmi_sources;
+ size += sizeof(*x2apic) * max_vcpus;
+
+ madt = xzalloc_bytes(size);
+@@ -862,12 +835,7 @@ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
+ acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_set_intr_ovr,
+ acpi_intr_overrides);
+
+- /* Setup NMI sources. */
+- nmisrc = (void *)intsrcovr;
+- acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_set_nmi_src,
+- acpi_nmi_sources);
+-
+- ASSERT(((void *)nmisrc - (void *)madt) == size);
++ ASSERT(((void *)intsrcovr - (void *)madt) == size);
+ madt->header.length = size;
+ /*
+ * Calling acpi_tb_checksum here is a layering violation, but
+--
+2.40.0
+
diff --git a/0004-x86-HVM-don-t-mark-external-IRQs-as-pending-when-vLA.patch b/0004-x86-HVM-don-t-mark-external-IRQs-as-pending-when-vLA.patch
new file mode 100644
index 0000000..01dcba8
--- /dev/null
+++ b/0004-x86-HVM-don-t-mark-external-IRQs-as-pending-when-vLA.patch
@@ -0,0 +1,71 @@
+From 54bb56e12868100c5ce06e33b4f57b6b2b8f37b9 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 20 Dec 2022 13:44:07 +0100
+Subject: [PATCH 04/89] x86/HVM: don't mark external IRQs as pending when
+ vLAPIC is disabled
+
+In software-disabled state an LAPIC does not accept any interrupt
+requests and hence no IRR bit would newly become set while in this
+state. As a result it is also wrong for us to mark IO-APIC or MSI
+originating vectors as having a pending request when the vLAPIC is in
+this state. Such interrupts are simply lost.
+
+Introduce (IO-APIC) or re-use (MSI) a local variable to help
+readability.
+
+Fixes: 4fe21ad3712e ("This patch add virtual IOAPIC support for VMX guest")
+Fixes: 85715f4bc7c9 ("MSI 5/6: add MSI support to passthrough HVM domain")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: f1d7aac1e3c3cd164e17d41791a575a5c3e87121
+master date: 2022-12-02 10:35:01 +0100
+---
+ xen/arch/x86/hvm/vioapic.c | 9 +++++++--
+ xen/arch/x86/hvm/vmsi.c | 10 ++++++----
+ 2 files changed, 13 insertions(+), 6 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
+index cb7f440160..41e3c4d5e4 100644
+--- a/xen/arch/x86/hvm/vioapic.c
++++ b/xen/arch/x86/hvm/vioapic.c
+@@ -460,9 +460,14 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
+
+ case dest_Fixed:
+ for_each_vcpu ( d, v )
+- if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
+- ioapic_inj_irq(vioapic, vcpu_vlapic(v), vector, trig_mode,
++ {
++ struct vlapic *vlapic = vcpu_vlapic(v);
++
++ if ( vlapic_enabled(vlapic) &&
++ vlapic_match_dest(vlapic, NULL, 0, dest, dest_mode) )
++ ioapic_inj_irq(vioapic, vlapic, vector, trig_mode,
+ delivery_mode);
++ }
+ break;
+
+ case dest_NMI:
+diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
+index 75f92885dc..3cd4923060 100644
+--- a/xen/arch/x86/hvm/vmsi.c
++++ b/xen/arch/x86/hvm/vmsi.c
+@@ -87,10 +87,12 @@ int vmsi_deliver(
+
+ case dest_Fixed:
+ for_each_vcpu ( d, v )
+- if ( vlapic_match_dest(vcpu_vlapic(v), NULL,
+- 0, dest, dest_mode) )
+- vmsi_inj_irq(vcpu_vlapic(v), vector,
+- trig_mode, delivery_mode);
++ {
++ target = vcpu_vlapic(v);
++ if ( vlapic_enabled(target) &&
++ vlapic_match_dest(target, NULL, 0, dest, dest_mode) )
++ vmsi_inj_irq(target, vector, trig_mode, delivery_mode);
++ }
+ break;
+
+ default:
+--
+2.40.0
+
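As an aside, the guard added here boils down to a simple rule: never latch an
IRR bit while the vLAPIC is software-disabled. A toy C model of that rule
follows; the types and helper names are simplified assumptions, not Xen's
real vlapic interface.

    /* Toy model: a software-disabled LAPIC accepts no interrupt requests,
     * so the interrupt is dropped rather than left pending. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct vlapic {
        bool sw_enabled;        /* APIC software enable bit (SVR.8) */
        uint64_t irr[4];        /* 256-bit Interrupt Request Register */
    };

    static bool vlapic_enabled(const struct vlapic *v)
    {
        return v->sw_enabled;
    }

    static void vlapic_set_irq(struct vlapic *v, uint8_t vector)
    {
        v->irr[vector / 64] |= 1ull << (vector % 64);
    }

    static void deliver_fixed(struct vlapic *v, uint8_t vector)
    {
        /* The patch's rule: check enabled before setting an IRR bit. */
        if (vlapic_enabled(v))
            vlapic_set_irq(v, vector);
    }

    int main(void)
    {
        struct vlapic v = { .sw_enabled = false };
        deliver_fixed(&v, 0x30);        /* dropped */
        v.sw_enabled = true;
        deliver_fixed(&v, 0x30);        /* latched */
        printf("irr[0] = %#llx\n", (unsigned long long)v.irr[0]);
        return 0;
    }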
diff --git a/0005-x86-Viridian-don-t-mark-IRQ-vectors-as-pending-when-.patch b/0005-x86-Viridian-don-t-mark-IRQ-vectors-as-pending-when-.patch
new file mode 100644
index 0000000..3086285
--- /dev/null
+++ b/0005-x86-Viridian-don-t-mark-IRQ-vectors-as-pending-when-.patch
@@ -0,0 +1,60 @@
+From 5810edc049cd5828c2628a377ca8443610e54f82 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 20 Dec 2022 13:44:38 +0100
+Subject: [PATCH 05/89] x86/Viridian: don't mark IRQ vectors as pending when
+ vLAPIC is disabled
+
+In software-disabled state an LAPIC does not accept any interrupt
+requests and hence no IRR bit would newly become set while in this
+state. As a result it is also wrong for us to mark Viridian IPI or timer
+vectors as having a pending request when the vLAPIC is in this state.
+Such interrupts are simply lost.
+
+Introduce a local variable in send_ipi() to help readability.
+
+Fixes: fda96b7382ea ("viridian: add implementation of the HvSendSyntheticClusterIpi hypercall")
+Fixes: 26fba3c85571 ("viridian: add implementation of synthetic timers")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Paul Durrant <paul@xen.org>
+master commit: 831419f82913417dee4e5b0f80769c5db590540b
+master date: 2022-12-02 10:35:32 +0100
+---
+ xen/arch/x86/hvm/viridian/synic.c | 2 +-
+ xen/arch/x86/hvm/viridian/viridian.c | 7 ++++++-
+ 2 files changed, 7 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/viridian/synic.c b/xen/arch/x86/hvm/viridian/synic.c
+index e18538c60a..856bb898b8 100644
+--- a/xen/arch/x86/hvm/viridian/synic.c
++++ b/xen/arch/x86/hvm/viridian/synic.c
+@@ -359,7 +359,7 @@ bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx,
+ BUILD_BUG_ON(sizeof(payload) > sizeof(msg->u.payload));
+ memcpy(msg->u.payload, &payload, sizeof(payload));
+
+- if ( !vs->masked )
++ if ( !vs->masked && vlapic_enabled(vcpu_vlapic(v)) )
+ vlapic_set_irq(vcpu_vlapic(v), vs->vector, 0);
+
+ return true;
+diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
+index 25dca93e8b..2937ddd3a8 100644
+--- a/xen/arch/x86/hvm/viridian/viridian.c
++++ b/xen/arch/x86/hvm/viridian/viridian.c
+@@ -811,7 +811,12 @@ static void send_ipi(struct hypercall_vpmask *vpmask, uint8_t vector)
+ cpu_raise_softirq_batch_begin();
+
+ for_each_vp ( vpmask, vp )
+- vlapic_set_irq(vcpu_vlapic(currd->vcpu[vp]), vector, 0);
++ {
++ struct vlapic *vlapic = vcpu_vlapic(currd->vcpu[vp]);
++
++ if ( vlapic_enabled(vlapic) )
++ vlapic_set_irq(vlapic, vector, 0);
++ }
+
+ if ( nr > 1 )
+ cpu_raise_softirq_batch_finish();
+--
+2.40.0
+
diff --git a/0006-x86-HVM-don-t-mark-evtchn-upcall-vector-as-pending-w.patch b/0006-x86-HVM-don-t-mark-evtchn-upcall-vector-as-pending-w.patch
new file mode 100644
index 0000000..2577f20
--- /dev/null
+++ b/0006-x86-HVM-don-t-mark-evtchn-upcall-vector-as-pending-w.patch
@@ -0,0 +1,70 @@
+From 26f39b3d705b667aa21f368c252abffb0b4d3e5d Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 20 Dec 2022 13:45:07 +0100
+Subject: [PATCH 06/89] x86/HVM: don't mark evtchn upcall vector as pending
+ when vLAPIC is disabled
+
+Linux's relatively new use of HVMOP_set_evtchn_upcall_vector has
+exposed a problem with the marking of the respective vector as
+pending: For quite some time Linux has been checking whether any stale
+ISR or IRR bits would still be set while preparing the LAPIC for use.
+This check is now triggering on the upcall vector, as the registration,
+at least for APs, happens before the LAPIC is actually enabled.
+
+In software-disabled state an LAPIC would not accept any interrupt
+requests and hence no IRR bit would newly become set while in this
+state. As a result it is also wrong for us to mark the upcall vector as
+having a pending request when the vLAPIC is in this state.
+
+To compensate for the "enabled" check added to the assertion logic, add
+logic to (conditionally) mark the upcall vector as having a request
+pending at the time the LAPIC is being software-enabled by the guest.
+Note however that, like for the pt_may_unmask_irq() we already have
+there, long term we may need to find a different solution. This will be
+especially relevant if improved LAPIC acceleration were to eliminate
+notifications of guest writes to this and other registers.
+
+Fixes: 7b5b8ca7dffd ("x86/upcall: inject a spurious event after setting upcall vector")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: f5d0279839b58cb622f0995dbf9cff056f03082e
+master date: 2022-12-06 13:51:49 +0100
+---
+ xen/arch/x86/hvm/irq.c | 5 +++--
+ xen/arch/x86/hvm/vlapic.c | 3 +++
+ 2 files changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
+index 858ab5b248..d93ffe4546 100644
+--- a/xen/arch/x86/hvm/irq.c
++++ b/xen/arch/x86/hvm/irq.c
+@@ -321,9 +321,10 @@ void hvm_assert_evtchn_irq(struct vcpu *v)
+
+ if ( v->arch.hvm.evtchn_upcall_vector != 0 )
+ {
+- uint8_t vector = v->arch.hvm.evtchn_upcall_vector;
++ struct vlapic *vlapic = vcpu_vlapic(v);
+
+- vlapic_set_irq(vcpu_vlapic(v), vector, 0);
++ if ( vlapic_enabled(vlapic) )
++ vlapic_set_irq(vlapic, v->arch.hvm.evtchn_upcall_vector, 0);
+ }
+ else if ( is_hvm_pv_evtchn_domain(v->domain) )
+ vcpu_kick(v);
+diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
+index 257d3b6851..eb32f12e2d 100644
+--- a/xen/arch/x86/hvm/vlapic.c
++++ b/xen/arch/x86/hvm/vlapic.c
+@@ -829,6 +829,9 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
+ {
+ vlapic->hw.disabled &= ~VLAPIC_SW_DISABLED;
+ pt_may_unmask_irq(vlapic_domain(vlapic), &vlapic->pt);
++ if ( v->arch.hvm.evtchn_upcall_vector &&
++ vcpu_info(v, evtchn_upcall_pending) )
++ vlapic_set_irq(vlapic, v->arch.hvm.evtchn_upcall_vector, 0);
+ }
+ break;
+
+--
+2.40.0
+
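The compensation logic described in the message can be sketched as below. The
structure fields and helpers are hypothetical simplifications of the
hvm_assert_evtchn_irq()/vlapic_reg_write() pair the patch touches.

    /* Sketch: upcalls suppressed while the vLAPIC was software-disabled
     * are re-asserted when the guest software-enables it. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct vcpu {
        bool lapic_sw_enabled;
        uint8_t upcall_vector;          /* 0 == no upcall vector registered */
        bool evtchn_upcall_pending;
        bool irr_bit_set;               /* stands in for the vector's IRR bit */
    };

    static void assert_upcall(struct vcpu *v)
    {
        /* hvm_assert_evtchn_irq() side: only latch while enabled. */
        if (v->upcall_vector && v->lapic_sw_enabled)
            v->irr_bit_set = true;
    }

    static void lapic_sw_enable(struct vcpu *v)
    {
        v->lapic_sw_enabled = true;
        /* vlapic_reg_write() side: catch up on a suppressed upcall. */
        if (v->upcall_vector && v->evtchn_upcall_pending)
            v->irr_bit_set = true;
    }

    int main(void)
    {
        struct vcpu v = { .upcall_vector = 0xf3, .evtchn_upcall_pending = true };
        assert_upcall(&v);              /* dropped: LAPIC still disabled */
        lapic_sw_enable(&v);            /* re-asserted here */
        printf("pending in IRR: %d\n", v.irr_bit_set);
        return 0;
    }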
diff --git a/0002-ioreq_broadcast-accept-partial-broadcast-success.patch b/0007-ioreq_broadcast-accept-partial-broadcast-success.patch
similarity index 77%
rename from 0002-ioreq_broadcast-accept-partial-broadcast-success.patch
rename to 0007-ioreq_broadcast-accept-partial-broadcast-success.patch
index 1b0ae9c..654990b 100644
--- a/0002-ioreq_broadcast-accept-partial-broadcast-success.patch
+++ b/0007-ioreq_broadcast-accept-partial-broadcast-success.patch
@@ -1,7 +1,7 @@
-From f2edbd79f5d5ce3b633885469852e1215dc0d4b5 Mon Sep 17 00:00:00 2001
+From c3e37c60fbf8f8cd71db0f0846c9c7aeadf02963 Mon Sep 17 00:00:00 2001
From: Per Bilse <per.bilse@citrix.com>
-Date: Tue, 20 Dec 2022 13:50:47 +0100
-Subject: [PATCH 02/61] ioreq_broadcast(): accept partial broadcast success
+Date: Tue, 20 Dec 2022 13:45:38 +0100
+Subject: [PATCH 07/89] ioreq_broadcast(): accept partial broadcast success
Avoid incorrectly triggering an error when a broadcast buffered ioreq
is not handled by all registered clients, as long as the failure is
@@ -16,10 +16,10 @@ master date: 2022-12-07 12:17:30 +0100
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
-index 42414b750b..2a8d8de2d5 100644
+index 4617aef29b..ecb8f545e1 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
-@@ -1322,7 +1322,8 @@ unsigned int ioreq_broadcast(ioreq_t *p, bool buffered)
+@@ -1317,7 +1317,8 @@ unsigned int ioreq_broadcast(ioreq_t *p, bool buffered)
FOR_EACH_IOREQ_SERVER(d, id, s)
{
diff --git a/0008-EFI-relocate-the-ESRT-when-booting-via-multiboot2.patch b/0008-EFI-relocate-the-ESRT-when-booting-via-multiboot2.patch
new file mode 100644
index 0000000..d1acae6
--- /dev/null
+++ b/0008-EFI-relocate-the-ESRT-when-booting-via-multiboot2.patch
@@ -0,0 +1,195 @@
+From 1dcc9b6dfe528c7815a314f9b5581804b5e23750 Mon Sep 17 00:00:00 2001
+From: Demi Marie Obenour <demi@invisiblethingslab.com>
+Date: Tue, 20 Dec 2022 13:46:09 +0100
+Subject: [PATCH 08/89] EFI: relocate the ESRT when booting via multiboot2
+
+This was missed in the initial patchset.
+
+Move efi_relocate_esrt() up to avoid adding a forward declaration.
+
+Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 8d7acf3f7d8d2555c78421dced45bc49f79ae806
+master date: 2022-12-14 12:00:35 +0100
+---
+ xen/arch/x86/efi/efi-boot.h | 2 +
+ xen/common/efi/boot.c | 136 ++++++++++++++++++------------------
+ 2 files changed, 70 insertions(+), 68 deletions(-)
+
+diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
+index 27f928ed3c..c94e53d139 100644
+--- a/xen/arch/x86/efi/efi-boot.h
++++ b/xen/arch/x86/efi/efi-boot.h
+@@ -823,6 +823,8 @@ void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable
+ if ( gop )
+ efi_set_gop_mode(gop, gop_mode);
+
++ efi_relocate_esrt(SystemTable);
++
+ efi_exit_boot(ImageHandle, SystemTable);
+ }
+
+diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
+index b3de1011ee..d3c6b055ae 100644
+--- a/xen/common/efi/boot.c
++++ b/xen/common/efi/boot.c
+@@ -625,6 +625,74 @@ static size_t __init get_esrt_size(const EFI_MEMORY_DESCRIPTOR *desc)
+ return esrt_ptr->FwResourceCount * sizeof(esrt_ptr->Entries[0]);
+ }
+
++static EFI_GUID __initdata esrt_guid = EFI_SYSTEM_RESOURCE_TABLE_GUID;
++
++static void __init efi_relocate_esrt(EFI_SYSTEM_TABLE *SystemTable)
++{
++ EFI_STATUS status;
++ UINTN info_size = 0, map_key, mdesc_size;
++ void *memory_map = NULL;
++ UINT32 ver;
++ unsigned int i;
++
++ for ( ; ; )
++ {
++ status = efi_bs->GetMemoryMap(&info_size, memory_map, &map_key,
++ &mdesc_size, &ver);
++ if ( status == EFI_SUCCESS && memory_map != NULL )
++ break;
++ if ( status == EFI_BUFFER_TOO_SMALL || memory_map == NULL )
++ {
++ info_size += 8 * mdesc_size;
++ if ( memory_map != NULL )
++ efi_bs->FreePool(memory_map);
++ memory_map = NULL;
++ status = efi_bs->AllocatePool(EfiLoaderData, info_size, &memory_map);
++ if ( status == EFI_SUCCESS )
++ continue;
++ PrintErr(L"Cannot allocate memory to relocate ESRT\r\n");
++ }
++ else
++ PrintErr(L"Cannot obtain memory map to relocate ESRT\r\n");
++ return;
++ }
++
++ /* Try to obtain the ESRT. Errors are not fatal. */
++ for ( i = 0; i < info_size; i += mdesc_size )
++ {
++ /*
++ * ESRT needs to be moved to memory of type EfiACPIReclaimMemory
++ * so that the memory it is in will not be used for other purposes.
++ */
++ void *new_esrt = NULL;
++ const EFI_MEMORY_DESCRIPTOR *desc = memory_map + i;
++ size_t esrt_size = get_esrt_size(desc);
++
++ if ( !esrt_size )
++ continue;
++ if ( desc->Type == EfiRuntimeServicesData ||
++ desc->Type == EfiACPIReclaimMemory )
++ break; /* ESRT already safe from reuse */
++ status = efi_bs->AllocatePool(EfiACPIReclaimMemory, esrt_size,
++ &new_esrt);
++ if ( status == EFI_SUCCESS && new_esrt )
++ {
++ memcpy(new_esrt, (void *)esrt, esrt_size);
++ status = efi_bs->InstallConfigurationTable(&esrt_guid, new_esrt);
++ if ( status != EFI_SUCCESS )
++ {
++ PrintErr(L"Cannot install new ESRT\r\n");
++ efi_bs->FreePool(new_esrt);
++ }
++ }
++ else
++ PrintErr(L"Cannot allocate memory for ESRT\r\n");
++ break;
++ }
++
++ efi_bs->FreePool(memory_map);
++}
++
+ /*
+ * Include architecture specific implementation here, which references the
+ * static globals defined above.
+@@ -903,8 +971,6 @@ static UINTN __init efi_find_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
+ return gop_mode;
+ }
+
+-static EFI_GUID __initdata esrt_guid = EFI_SYSTEM_RESOURCE_TABLE_GUID;
+-
+ static void __init efi_tables(void)
+ {
+ unsigned int i;
+@@ -1113,72 +1179,6 @@ static void __init efi_set_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop, UINTN gop
+ #define INVALID_VIRTUAL_ADDRESS (0xBAAADUL << \
+ (EFI_PAGE_SHIFT + BITS_PER_LONG - 32))
+
+-static void __init efi_relocate_esrt(EFI_SYSTEM_TABLE *SystemTable)
+-{
+- EFI_STATUS status;
+- UINTN info_size = 0, map_key, mdesc_size;
+- void *memory_map = NULL;
+- UINT32 ver;
+- unsigned int i;
+-
+- for ( ; ; )
+- {
+- status = efi_bs->GetMemoryMap(&info_size, memory_map, &map_key,
+- &mdesc_size, &ver);
+- if ( status == EFI_SUCCESS && memory_map != NULL )
+- break;
+- if ( status == EFI_BUFFER_TOO_SMALL || memory_map == NULL )
+- {
+- info_size += 8 * mdesc_size;
+- if ( memory_map != NULL )
+- efi_bs->FreePool(memory_map);
+- memory_map = NULL;
+- status = efi_bs->AllocatePool(EfiLoaderData, info_size, &memory_map);
+- if ( status == EFI_SUCCESS )
+- continue;
+- PrintErr(L"Cannot allocate memory to relocate ESRT\r\n");
+- }
+- else
+- PrintErr(L"Cannot obtain memory map to relocate ESRT\r\n");
+- return;
+- }
+-
+- /* Try to obtain the ESRT. Errors are not fatal. */
+- for ( i = 0; i < info_size; i += mdesc_size )
+- {
+- /*
+- * ESRT needs to be moved to memory of type EfiACPIReclaimMemory
+- * so that the memory it is in will not be used for other purposes.
+- */
+- void *new_esrt = NULL;
+- const EFI_MEMORY_DESCRIPTOR *desc = memory_map + i;
+- size_t esrt_size = get_esrt_size(desc);
+-
+- if ( !esrt_size )
+- continue;
+- if ( desc->Type == EfiRuntimeServicesData ||
+- desc->Type == EfiACPIReclaimMemory )
+- break; /* ESRT already safe from reuse */
+- status = efi_bs->AllocatePool(EfiACPIReclaimMemory, esrt_size,
+- &new_esrt);
+- if ( status == EFI_SUCCESS && new_esrt )
+- {
+- memcpy(new_esrt, (void *)esrt, esrt_size);
+- status = efi_bs->InstallConfigurationTable(&esrt_guid, new_esrt);
+- if ( status != EFI_SUCCESS )
+- {
+- PrintErr(L"Cannot install new ESRT\r\n");
+- efi_bs->FreePool(new_esrt);
+- }
+- }
+- else
+- PrintErr(L"Cannot allocate memory for ESRT\r\n");
+- break;
+- }
+-
+- efi_bs->FreePool(memory_map);
+-}
+-
+ static void __init efi_exit_boot(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
+ {
+ EFI_STATUS status;
+--
+2.40.0
+
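The relocation code above uses the usual EFI grow-and-retry idiom for
GetMemoryMap(). Below is a standalone C model of that loop; the "firmware"
calls are fakes standing in for the real boot services, so only the control
flow should be read as meaningful.

    /* Model of the grow-and-retry loop: ask for the map, and on
     * BUFFER_TOO_SMALL enlarge the buffer with headroom and retry. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define SUCCESS          0
    #define BUFFER_TOO_SMALL 5

    static size_t fw_map_bytes = 400;   /* what the fake firmware needs */
    static size_t fw_desc_size = 48;

    static int get_memory_map(size_t *size, void *buf)
    {
        if (buf == NULL || *size < fw_map_bytes) {
            *size = fw_map_bytes;       /* report the required size */
            return BUFFER_TOO_SMALL;
        }
        memset(buf, 0, fw_map_bytes);   /* "fill in" the map */
        return SUCCESS;
    }

    int main(void)
    {
        size_t size = 0;
        void *map = NULL;

        for (;;) {
            int status = get_memory_map(&size, map);
            if (status == SUCCESS && map != NULL)
                break;
            if (status == BUFFER_TOO_SMALL || map == NULL) {
                /* Headroom: the allocation itself may add map entries. */
                size += 8 * fw_desc_size;
                free(map);
                map = malloc(size);
                if (map != NULL)
                    continue;
                fprintf(stderr, "cannot allocate memory map buffer\n");
            } else {
                fprintf(stderr, "cannot obtain memory map\n");
            }
            return 1;
        }

        printf("got memory map of %zu bytes\n", size);
        free(map);
        return 0;
    }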
diff --git a/0003-x86-time-prevent-overflow-with-high-frequency-TSCs.patch b/0009-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
similarity index 80%
rename from 0003-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
rename to 0009-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
index a031317..a9401d7 100644
--- a/0003-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
+++ b/0009-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
@@ -1,7 +1,7 @@
-From 65bf12135f618614bbf44626fba1c20ca8d1a127 Mon Sep 17 00:00:00 2001
+From a7a26da0b59da7233e6c6f63b180bab131398351 Mon Sep 17 00:00:00 2001
From: Neowutran <xen@neowutran.ovh>
-Date: Tue, 20 Dec 2022 13:51:42 +0100
-Subject: [PATCH 03/61] x86/time: prevent overflow with high frequency TSCs
+Date: Tue, 20 Dec 2022 13:46:38 +0100
+Subject: [PATCH 09/89] x86/time: prevent overflow with high frequency TSCs
Make sure tsc_khz is promoted to a 64-bit type before multiplying by
1000 to avoid an 'overflow before widen' bug. Otherwise just above
@@ -17,10 +17,10 @@ master date: 2022-12-19 11:34:16 +0100
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
-index 1daff92dca..db0b149ec6 100644
+index b01acd390d..d882b43cf0 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
-@@ -2490,7 +2490,7 @@ int tsc_set_info(struct domain *d,
+@@ -2585,7 +2585,7 @@ int tsc_set_info(struct domain *d,
case TSC_MODE_ALWAYS_EMULATE:
d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
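The bug class fixed here is easy to reproduce in isolation. A tiny standalone
C example of "overflow before widen" follows, using made-up variables rather
than the actual tsc_set_info() code.

    /* A 32-bit multiply wraps even though the result lands in a 64-bit
     * field; promoting one operand first avoids the truncation. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t tsc_khz = 4700000;          /* a 4.7 GHz TSC, in kHz */

        /* Buggy: the multiply is done in 32 bits, widened too late. */
        uint64_t hz_buggy = tsc_khz * 1000;

        /* Fixed: promote to 64 bits before multiplying. */
        uint64_t hz_fixed = (uint64_t)tsc_khz * 1000;

        printf("buggy: %" PRIu64 " Hz\nfixed: %" PRIu64 " Hz\n",
               hz_buggy, hz_fixed);
        return 0;
    }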
diff --git a/0010-tools-oxenstored-Fix-incorrect-scope-after-an-if-sta.patch b/0010-tools-oxenstored-Fix-incorrect-scope-after-an-if-sta.patch
new file mode 100644
index 0000000..a8c427d
--- /dev/null
+++ b/0010-tools-oxenstored-Fix-incorrect-scope-after-an-if-sta.patch
@@ -0,0 +1,52 @@
+From 2e8d7a08bcd111fe21569e9ace1a047df76da949 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 11 Nov 2022 18:50:34 +0000
+Subject: [PATCH 10/89] tools/oxenstored: Fix incorrect scope after an if
+ statement
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+A debug statement got inserted into a single-expression if statement.
+
+Insert brackets to give the intended meaning, rather than the actual meaning
+where the "let con = Connections..." is outside and executed unconditionally.
+
+This results in some unnecessary ring checks for domains which otherwise have
+IO credit.
+
+Fixes: 42f0581a91d4 ("tools/oxenstored: Implement live update for socket connections")
+Reported-by: Edwin Török <edvin.torok@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit ee36179371fd4215a43fb179be2165f65c1cd1cd)
+---
+ tools/ocaml/xenstored/xenstored.ml | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index ffd43a4eee..c5dc7a28d0 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -475,7 +475,7 @@ let _ =
+
+ let ring_scan_checker dom =
+ (* no need to scan domains already marked as for processing *)
+- if not (Domain.get_io_credit dom > 0) then
++ if not (Domain.get_io_credit dom > 0) then (
+ debug "Looking up domid %d" (Domain.get_id dom);
+ let con = Connections.find_domain cons (Domain.get_id dom) in
+ if not (Connection.has_more_work con) then (
+@@ -490,7 +490,8 @@ let _ =
+ let n = 32 + 2 * (Domains.number domains) in
+ info "found lazy domain %d, credit %d" (Domain.get_id dom) n;
+ Domain.set_io_credit ~n dom
+- ) in
++ )
++ ) in
+
+ let last_stat_time = ref 0. in
+ let last_scan_time = ref 0. in
+--
+2.40.0
+
diff --git a/0011-tools-ocaml-evtchn-OCaml-5-support-fix-potential-res.patch b/0011-tools-ocaml-evtchn-OCaml-5-support-fix-potential-res.patch
new file mode 100644
index 0000000..c9cf630
--- /dev/null
+++ b/0011-tools-ocaml-evtchn-OCaml-5-support-fix-potential-res.patch
@@ -0,0 +1,68 @@
+From d11528a993f80c6a86f4cb0c30578c026348e3e4 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Tue, 18 Jan 2022 15:04:48 +0000
+Subject: [PATCH 11/89] tools/ocaml/evtchn: OCaml 5 support, fix potential
+ resource leak
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There is no binding for xenevtchn_close(). In principle, this is a resource
+leak, but the typical usage is as a singleton that lives for the lifetime of
+the program.
+
+OCaml 5 no longer permits storing a naked C pointer in an OCaml value.
+
+Therefore, use a Custom block. This allows us to use the finaliser callback
+to call xenevtchn_close() if the OCaml object goes out of scope.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 22d5affdf0cecfa6faae46fbaec68b8018835220)
+---
+ tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 21 +++++++++++++++++--
+ 1 file changed, 19 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+index f889a7a2e4..37f1cc4e14 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
++++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+@@ -33,7 +33,22 @@
+ #include <caml/fail.h>
+ #include <caml/signals.h>
+
+-#define _H(__h) ((xenevtchn_handle *)(__h))
++#define _H(__h) (*((xenevtchn_handle **)Data_custom_val(__h)))
++
++static void stub_evtchn_finalize(value v)
++{
++ xenevtchn_close(_H(v));
++}
++
++static struct custom_operations xenevtchn_ops = {
++ .identifier = "xenevtchn",
++ .finalize = stub_evtchn_finalize,
++ .compare = custom_compare_default, /* Can't compare */
++ .hash = custom_hash_default, /* Can't hash */
++ .serialize = custom_serialize_default, /* Can't serialize */
++ .deserialize = custom_deserialize_default, /* Can't deserialize */
++ .compare_ext = custom_compare_ext_default, /* Can't compare */
++};
+
+ CAMLprim value stub_eventchn_init(void)
+ {
+@@ -48,7 +63,9 @@ CAMLprim value stub_eventchn_init(void)
+ if (xce == NULL)
+ caml_failwith("open failed");
+
+- result = (value)xce;
++ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
++ _H(result) = xce;
++
+ CAMLreturn(result);
+ }
+
+--
+2.40.0
+
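The stub change follows the standard OCaml C-interface pattern for wrapping a
C pointer in a Custom block with a finaliser. Here is a generic sketch of the
same pattern wrapping a plain FILE* instead of a xenevtchn handle; the
identifier and function names are invented for illustration.

    #include <stdio.h>
    #include <caml/mlvalues.h>
    #include <caml/memory.h>
    #include <caml/alloc.h>
    #include <caml/custom.h>
    #include <caml/fail.h>

    /* The pointer lives inside the Custom block, never as a naked value. */
    #define Handle_val(v) (*((FILE **)Data_custom_val(v)))

    static void handle_finalize(value v)
    {
        fclose(Handle_val(v));
    }

    static struct custom_operations handle_ops = {
        .identifier  = "example.file.handle",
        .finalize    = handle_finalize,
        .compare     = custom_compare_default,
        .hash        = custom_hash_default,
        .serialize   = custom_serialize_default,
        .deserialize = custom_deserialize_default,
        .compare_ext = custom_compare_ext_default,
    };

    CAMLprim value stub_handle_open(value path)
    {
        CAMLparam1(path);
        CAMLlocal1(result);
        FILE *fp = fopen(String_val(path), "r");

        if (fp == NULL)
            caml_failwith("open failed");

        /* Allocate the Custom block, then store the pointer inside it. */
        result = caml_alloc_custom(&handle_ops, sizeof(fp), 0, 1);
        Handle_val(result) = fp;

        CAMLreturn(result);
    }

When the OCaml value is collected, the runtime calls handle_finalize() and the
underlying resource is released, which is exactly the behaviour the patch adds
for the evtchn handle.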
diff --git a/0012-tools-ocaml-evtchn-Add-binding-for-xenevtchn_fdopen.patch b/0012-tools-ocaml-evtchn-Add-binding-for-xenevtchn_fdopen.patch
new file mode 100644
index 0000000..7e921fd
--- /dev/null
+++ b/0012-tools-ocaml-evtchn-Add-binding-for-xenevtchn_fdopen.patch
@@ -0,0 +1,81 @@
+From 24d9dc2ae2f88249fcf81f7b7e612cdfb7c73e4b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Mon, 14 Nov 2022 13:36:19 +0000
+Subject: [PATCH 12/89] tools/ocaml/evtchn: Add binding for xenevtchn_fdopen()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+For live update, the new oxenstored needs to reconstruct an evtchn object
+around an existing file descriptor.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 7ba68a6c558e1fd811c95cb7215a5cd07a3cc2ea)
+---
+ tools/ocaml/libs/eventchn/xeneventchn.ml | 1 +
+ tools/ocaml/libs/eventchn/xeneventchn.mli | 4 ++++
+ tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 19 +++++++++++++++++++
+ 3 files changed, 24 insertions(+)
+
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn.ml b/tools/ocaml/libs/eventchn/xeneventchn.ml
+index dd00a1f0ea..be4de82f46 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn.ml
++++ b/tools/ocaml/libs/eventchn/xeneventchn.ml
+@@ -17,6 +17,7 @@
+ type handle
+
+ external init: unit -> handle = "stub_eventchn_init"
++external fdopen: Unix.file_descr -> handle = "stub_eventchn_fdopen"
+ external fd: handle -> Unix.file_descr = "stub_eventchn_fd"
+
+ type t = int
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn.mli b/tools/ocaml/libs/eventchn/xeneventchn.mli
+index 08c7337643..98b3c86f37 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn.mli
++++ b/tools/ocaml/libs/eventchn/xeneventchn.mli
+@@ -47,6 +47,10 @@ val init: unit -> handle
+ (** Return an initialised event channel interface. On error it
+ will throw a Failure exception. *)
+
++val fdopen: Unix.file_descr -> handle
++(** Return an initialised event channel interface, from an already open evtchn
++ file descriptor. On error it will throw a Failure exception. *)
++
+ val fd: handle -> Unix.file_descr
+ (** Return a file descriptor suitable for Unix.select. When
+ the descriptor becomes readable, it is safe to call 'pending'.
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+index 37f1cc4e14..7bdf711bc1 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
++++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+@@ -69,6 +69,25 @@ CAMLprim value stub_eventchn_init(void)
+ CAMLreturn(result);
+ }
+
++CAMLprim value stub_eventchn_fdopen(value fdval)
++{
++ CAMLparam1(fdval);
++ CAMLlocal1(result);
++ xenevtchn_handle *xce;
++
++ caml_enter_blocking_section();
++ xce = xenevtchn_fdopen(NULL, Int_val(fdval), 0);
++ caml_leave_blocking_section();
++
++ if (xce == NULL)
++ caml_failwith("evtchn fdopen failed");
++
++ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
++ _H(result) = xce;
++
++ CAMLreturn(result);
++}
++
+ CAMLprim value stub_eventchn_fd(value xce)
+ {
+ CAMLparam1(xce);
+--
+2.40.0
+
diff --git a/0013-tools-ocaml-evtchn-Extend-the-init-binding-with-a-cl.patch b/0013-tools-ocaml-evtchn-Extend-the-init-binding-with-a-cl.patch
new file mode 100644
index 0000000..af889eb
--- /dev/null
+++ b/0013-tools-ocaml-evtchn-Extend-the-init-binding-with-a-cl.patch
@@ -0,0 +1,90 @@
+From c7cf603836e40de1b4a6ca7d1d52736eb4a10327 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Thu, 3 Nov 2022 14:50:38 +0000
+Subject: [PATCH 13/89] tools/ocaml/evtchn: Extend the init() binding with a
+ cloexec flag
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+For live update, oxenstored wants to clear CLOEXEC on the evtchn handle, so it
+survives the execve() into the new oxenstored.
+
+Have the new interface match how cloexec works in other OCaml standard
+libraries.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 9bafe4a53306e7aa2ce6ffc96f7477c6f329f7a7)
+---
+ tools/ocaml/libs/eventchn/xeneventchn.ml | 5 ++++-
+ tools/ocaml/libs/eventchn/xeneventchn.mli | 9 ++++++---
+ tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 10 +++++++---
+ 3 files changed, 17 insertions(+), 7 deletions(-)
+
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn.ml b/tools/ocaml/libs/eventchn/xeneventchn.ml
+index be4de82f46..c16fdd4674 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn.ml
++++ b/tools/ocaml/libs/eventchn/xeneventchn.ml
+@@ -16,7 +16,10 @@
+
+ type handle
+
+-external init: unit -> handle = "stub_eventchn_init"
++external _init: bool -> handle = "stub_eventchn_init"
++
++let init ?(cloexec=true) () = _init cloexec
++
+ external fdopen: Unix.file_descr -> handle = "stub_eventchn_fdopen"
+ external fd: handle -> Unix.file_descr = "stub_eventchn_fd"
+
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn.mli b/tools/ocaml/libs/eventchn/xeneventchn.mli
+index 98b3c86f37..870429b6b5 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn.mli
++++ b/tools/ocaml/libs/eventchn/xeneventchn.mli
+@@ -43,9 +43,12 @@ val to_int: t -> int
+
+ val of_int: int -> t
+
+-val init: unit -> handle
+-(** Return an initialised event channel interface. On error it
+- will throw a Failure exception. *)
++val init: ?cloexec:bool -> unit -> handle
++(** [init ?cloexec ()]
++ Return an initialised event channel interface.
++ The default is to close the underlying file descriptor
++ on [execve], which can be overridden with [~cloexec:false].
++ On error it will throw a Failure exception. *)
+
+ val fdopen: Unix.file_descr -> handle
+ (** Return an initialised event channel interface, from an already open evtchn
+diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+index 7bdf711bc1..aa8a69cc1e 100644
+--- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
++++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
+@@ -50,14 +50,18 @@ static struct custom_operations xenevtchn_ops = {
+ .compare_ext = custom_compare_ext_default, /* Can't compare */
+ };
+
+-CAMLprim value stub_eventchn_init(void)
++CAMLprim value stub_eventchn_init(value cloexec)
+ {
+- CAMLparam0();
++ CAMLparam1(cloexec);
+ CAMLlocal1(result);
+ xenevtchn_handle *xce;
++ unsigned int flags = 0;
++
++ if ( !Bool_val(cloexec) )
++ flags |= XENEVTCHN_NO_CLOEXEC;
+
+ caml_enter_blocking_section();
+- xce = xenevtchn_open(NULL, 0);
++ xce = xenevtchn_open(NULL, flags);
+ caml_leave_blocking_section();
+
+ if (xce == NULL)
+--
+2.40.0
+
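The cloexec plumbing amounts to translating a caller-visible boolean into a
flag on the underlying open call. A standalone C sketch of that translation
follows, using plain POSIX open() in place of xenevtchn_open(); the helper
name is hypothetical.

    #include <fcntl.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static int open_handle(const char *path, bool cloexec)
    {
        int flags = O_RDWR;

        /* Default is cloexec=true; only live-update style callers clear it
         * so the descriptor survives execve() into the new binary. */
        if (cloexec)
            flags |= O_CLOEXEC;

        return open(path, flags);
    }

    int main(void)
    {
        int fd = open_handle("/dev/null", false);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        printf("fd %d opened without O_CLOEXEC\n", fd);
        close(fd);
        return 0;
    }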
diff --git a/0014-tools-oxenstored-Style-fixes-to-Domain.patch b/0014-tools-oxenstored-Style-fixes-to-Domain.patch
new file mode 100644
index 0000000..aad4399
--- /dev/null
+++ b/0014-tools-oxenstored-Style-fixes-to-Domain.patch
@@ -0,0 +1,64 @@
+From 0929960173bc76b8d90df73c8ee665747c233e18 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 30 Nov 2022 14:56:43 +0000
+Subject: [PATCH 14/89] tools/oxenstored: Style fixes to Domain
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This file has some style problems so severe that they interfere with the
+readability of the subsequent bugfix patches.
+
+Fix these issues ahead of time, to make the subsequent changes more readable.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit b45bfaf359e4821b1bf98a4fcd194d7fd176f167)
+---
+ tools/ocaml/xenstored/domain.ml | 16 +++++++---------
+ 1 file changed, 7 insertions(+), 9 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
+index 81cb59b8f1..ab08dcf37f 100644
+--- a/tools/ocaml/xenstored/domain.ml
++++ b/tools/ocaml/xenstored/domain.ml
+@@ -57,17 +57,16 @@ let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
+ let is_free_to_conflict = is_dom0
+
+ let string_of_port = function
+-| None -> "None"
+-| Some x -> string_of_int (Xeneventchn.to_int x)
++ | None -> "None"
++ | Some x -> string_of_int (Xeneventchn.to_int x)
+
+ let dump d chan =
+ fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
+
+-let notify dom = match dom.port with
+-| None ->
+- warn "domain %d: attempt to notify on unknown port" dom.id
+-| Some port ->
+- Event.notify dom.eventchn port
++let notify dom =
++ match dom.port with
++ | None -> warn "domain %d: attempt to notify on unknown port" dom.id
++ | Some port -> Event.notify dom.eventchn port
+
+ let bind_interdomain dom =
+ begin match dom.port with
+@@ -84,8 +83,7 @@ let close dom =
+ | None -> ()
+ | Some port -> Event.unbind dom.eventchn port
+ end;
+- Xenmmap.unmap dom.interface;
+- ()
++ Xenmmap.unmap dom.interface
+
+ let make id mfn remote_port interface eventchn = {
+ id = id;
+--
+2.40.0
+
diff --git a/0015-tools-oxenstored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch b/0015-tools-oxenstored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch
new file mode 100644
index 0000000..8b83edf
--- /dev/null
+++ b/0015-tools-oxenstored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch
@@ -0,0 +1,82 @@
+From bc5cc00868ea29d814bb3d783e28b49d1acf63e9 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 29 Nov 2022 21:05:43 +0000
+Subject: [PATCH 15/89] tools/oxenstored: Bind the DOM_EXC VIRQ in
+ Event.init()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Xenstored always needs to bind the DOM_EXC VIRQ.
+
+Instead of doing it shortly after the call to Event.init(), do it in the
+constructor directly. This removes the need for the field to be a mutable
+option.
+
+It will also simplify a future change to support live update. Rename the
+field from virq_port (which could be any VIRQ) to its proper name.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 9804a5db435fe40c8ded8cf36c2d2b2281c56f1d)
+---
+ tools/ocaml/xenstored/event.ml | 9 ++++++---
+ tools/ocaml/xenstored/xenstored.ml | 4 +---
+ 2 files changed, 7 insertions(+), 6 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/event.ml b/tools/ocaml/xenstored/event.ml
+index ccca90b6fc..a3be296374 100644
+--- a/tools/ocaml/xenstored/event.ml
++++ b/tools/ocaml/xenstored/event.ml
+@@ -17,12 +17,15 @@
+ (**************** high level binding ****************)
+ type t = {
+ handle: Xeneventchn.handle;
+- mutable virq_port: Xeneventchn.t option;
++ domexc: Xeneventchn.t;
+ }
+
+-let init () = { handle = Xeneventchn.init (); virq_port = None; }
++let init () =
++ let handle = Xeneventchn.init () in
++ let domexc = Xeneventchn.bind_dom_exc_virq handle in
++ { handle; domexc }
++
+ let fd eventchn = Xeneventchn.fd eventchn.handle
+-let bind_dom_exc_virq eventchn = eventchn.virq_port <- Some (Xeneventchn.bind_dom_exc_virq eventchn.handle)
+ let bind_interdomain eventchn domid port = Xeneventchn.bind_interdomain eventchn.handle domid port
+ let unbind eventchn port = Xeneventchn.unbind eventchn.handle port
+ let notify eventchn port = Xeneventchn.notify eventchn.handle port
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index c5dc7a28d0..55071b49ec 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -397,7 +397,6 @@ let _ =
+ if cf.restart && Sys.file_exists Disk.xs_daemon_database then (
+ let rwro = DB.from_file store domains cons Disk.xs_daemon_database in
+ info "Live reload: database loaded";
+- Event.bind_dom_exc_virq eventchn;
+ Process.LiveUpdate.completed ();
+ rwro
+ ) else (
+@@ -413,7 +412,6 @@ let _ =
+
+ if cf.domain_init then (
+ Connections.add_domain cons (Domains.create0 domains);
+- Event.bind_dom_exc_virq eventchn
+ );
+ rw_sock
+ ) in
+@@ -451,7 +449,7 @@ let _ =
+ let port = Event.pending eventchn in
+ debug "pending port %d" (Xeneventchn.to_int port);
+ finally (fun () ->
+- if Some port = eventchn.Event.virq_port then (
++ if port = eventchn.Event.domexc then (
+ let (notify, deaddom) = Domains.cleanup domains in
+ List.iter (Store.reset_permissions store) deaddom;
+ List.iter (Connections.del_domain cons) deaddom;
+--
+2.40.0
+
diff --git a/0016-tools-oxenstored-Rename-some-port-variables-to-remot.patch b/0016-tools-oxenstored-Rename-some-port-variables-to-remot.patch
new file mode 100644
index 0000000..4f168d6
--- /dev/null
+++ b/0016-tools-oxenstored-Rename-some-port-variables-to-remot.patch
@@ -0,0 +1,144 @@
+From fd0d9b05970986545656c8f6f688f70f3e78a29b Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 30 Nov 2022 03:17:28 +0000
+Subject: [PATCH 16/89] tools/oxenstored: Rename some 'port' variables to
+ 'remote_port'
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This will make the logic clearer when we plumb local_port through these
+functions.
+
+While doing this, rearrange the construct in Domains.create0 to separate the
+remote port handling from the interface handling. (The interface logic is
+dubious in several ways, but not altered by this cleanup.)
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 31fbee749a75621039ca601eaee7222050a7dd83)
+---
+ tools/ocaml/xenstored/domains.ml | 26 ++++++++++++--------------
+ tools/ocaml/xenstored/process.ml | 12 ++++++------
+ tools/ocaml/xenstored/xenstored.ml | 8 ++++----
+ 3 files changed, 22 insertions(+), 24 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
+index 17fe2fa257..26018ac0dd 100644
+--- a/tools/ocaml/xenstored/domains.ml
++++ b/tools/ocaml/xenstored/domains.ml
+@@ -122,9 +122,9 @@ let cleanup doms =
+ let resume _doms _domid =
+ ()
+
+-let create doms domid mfn port =
++let create doms domid mfn remote_port =
+ let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
+- let dom = Domain.make domid mfn port interface doms.eventchn in
++ let dom = Domain.make domid mfn remote_port interface doms.eventchn in
+ Hashtbl.add doms.table domid dom;
+ Domain.bind_interdomain dom;
+ dom
+@@ -133,18 +133,16 @@ let xenstored_kva = ref ""
+ let xenstored_port = ref ""
+
+ let create0 doms =
+- let port, interface =
+- (
+- let port = Utils.read_file_single_integer !xenstored_port
+- and fd = Unix.openfile !xenstored_kva
+- [ Unix.O_RDWR ] 0o600 in
+- let interface = Xenmmap.mmap fd Xenmmap.RDWR Xenmmap.SHARED
+- (Xenmmap.getpagesize()) 0 in
+- Unix.close fd;
+- port, interface
+- )
+- in
+- let dom = Domain.make 0 Nativeint.zero port interface doms.eventchn in
++ let remote_port = Utils.read_file_single_integer !xenstored_port in
++
++ let interface =
++ let fd = Unix.openfile !xenstored_kva [ Unix.O_RDWR ] 0o600 in
++ let interface = Xenmmap.mmap fd Xenmmap.RDWR Xenmmap.SHARED (Xenmmap.getpagesize()) 0 in
++ Unix.close fd;
++ interface
++ in
++
++ let dom = Domain.make 0 Nativeint.zero remote_port interface doms.eventchn in
+ Hashtbl.add doms.table 0 dom;
+ Domain.bind_interdomain dom;
+ Domain.notify dom;
+diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
+index 72a79e9328..b2973aca2a 100644
+--- a/tools/ocaml/xenstored/process.ml
++++ b/tools/ocaml/xenstored/process.ml
+@@ -558,10 +558,10 @@ let do_transaction_end con t domains cons data =
+ let do_introduce con t domains cons data =
+ if not (Connection.is_dom0 con)
+ then raise Define.Permission_denied;
+- let (domid, mfn, port) =
++ let (domid, mfn, remote_port) =
+ match (split None '\000' data) with
+- | domid :: mfn :: port :: _ ->
+- int_of_string domid, Nativeint.of_string mfn, int_of_string port
++ | domid :: mfn :: remote_port :: _ ->
++ int_of_string domid, Nativeint.of_string mfn, int_of_string remote_port
+ | _ -> raise Invalid_Cmd_Args;
+ in
+ let dom =
+@@ -569,18 +569,18 @@ let do_introduce con t domains cons data =
+ let edom = Domains.find domains domid in
+ if (Domain.get_mfn edom) = mfn && (Connections.find_domain cons domid) != con then begin
+ (* Use XS_INTRODUCE for recreating the xenbus event-channel. *)
+- edom.remote_port <- port;
++ edom.remote_port <- remote_port;
+ Domain.bind_interdomain edom;
+ end;
+ edom
+ else try
+- let ndom = Domains.create domains domid mfn port in
++ let ndom = Domains.create domains domid mfn remote_port in
+ Connections.add_domain cons ndom;
+ Connections.fire_spec_watches (Transaction.get_root t) cons Store.Path.introduce_domain;
+ ndom
+ with _ -> raise Invalid_Cmd_Args
+ in
+- if (Domain.get_remote_port dom) <> port || (Domain.get_mfn dom) <> mfn then
++ if (Domain.get_remote_port dom) <> remote_port || (Domain.get_mfn dom) <> mfn then
+ raise Domain_not_match
+
+ let do_release con t domains cons data =
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index 55071b49ec..1f11f576b5 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -167,10 +167,10 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
+ global_f ~rw
+ | "socket" :: fd :: [] ->
+ socket_f ~fd:(int_of_string fd)
+- | "dom" :: domid :: mfn :: port :: []->
++ | "dom" :: domid :: mfn :: remote_port :: []->
+ domain_f (int_of_string domid)
+ (Nativeint.of_string mfn)
+- (int_of_string port)
++ (int_of_string remote_port)
+ | "watch" :: domid :: path :: token :: [] ->
+ watch_f (int_of_string domid)
+ (unhexify path) (unhexify token)
+@@ -209,10 +209,10 @@ let from_channel store cons doms chan =
+ else
+ warn "Ignoring invalid socket FD %d" fd
+ in
+- let domain_f domid mfn port =
++ let domain_f domid mfn remote_port =
+ let ndom =
+ if domid > 0 then
+- Domains.create doms domid mfn port
++ Domains.create doms domid mfn remote_port
+ else
+ Domains.create0 doms
+ in
+--
+2.40.0
+
diff --git a/0017-tools-oxenstored-Implement-Domain.rebind_evtchn.patch b/0017-tools-oxenstored-Implement-Domain.rebind_evtchn.patch
new file mode 100644
index 0000000..72bcae0
--- /dev/null
+++ b/0017-tools-oxenstored-Implement-Domain.rebind_evtchn.patch
@@ -0,0 +1,67 @@
+From a20daa7ffda7ccc0e65abe77532a5dc8059bf128 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 30 Nov 2022 11:55:58 +0000
+Subject: [PATCH 17/89] tools/oxenstored: Implement Domain.rebind_evtchn
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Generally speaking, the event channel local/remote port is fixed for the
+lifetime of the associated domain object. The exception to this is a
+secondary XS_INTRODUCE (defined to re-bind to a new event channel), which pokes
+around at the domain object's internal state.
+
+We need to refactor the evtchn handling to support live update, so start by
+moving the relevant manipulation into Domain.
+
+No practical change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit aecdc28d9538ca2a1028ef9bc6550cb171dbbed4)
+---
+ tools/ocaml/xenstored/domain.ml | 12 ++++++++++++
+ tools/ocaml/xenstored/process.ml | 3 +--
+ 2 files changed, 13 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
+index ab08dcf37f..d59a9401e2 100644
+--- a/tools/ocaml/xenstored/domain.ml
++++ b/tools/ocaml/xenstored/domain.ml
+@@ -63,6 +63,18 @@ let string_of_port = function
+ let dump d chan =
+ fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
+
++let rebind_evtchn d remote_port =
++ begin match d.port with
++ | None -> ()
++ | Some p -> Event.unbind d.eventchn p
++ end;
++ let local = Event.bind_interdomain d.eventchn d.id remote_port in
++ debug "domain %d rebind (l %s, r %d) => (l %d, r %d)"
++ d.id (string_of_port d.port) d.remote_port
++ (Xeneventchn.to_int local) remote_port;
++ d.remote_port <- remote_port;
++ d.port <- Some (local)
++
+ let notify dom =
+ match dom.port with
+ | None -> warn "domain %d: attempt to notify on unknown port" dom.id
+diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
+index b2973aca2a..1c80e7198d 100644
+--- a/tools/ocaml/xenstored/process.ml
++++ b/tools/ocaml/xenstored/process.ml
+@@ -569,8 +569,7 @@ let do_introduce con t domains cons data =
+ let edom = Domains.find domains domid in
+ if (Domain.get_mfn edom) = mfn && (Connections.find_domain cons domid) != con then begin
+ (* Use XS_INTRODUCE for recreating the xenbus event-channel. *)
+- edom.remote_port <- remote_port;
+- Domain.bind_interdomain edom;
++ Domain.rebind_evtchn edom remote_port;
+ end;
+ edom
+ else try
+--
+2.40.0
+
diff --git a/0018-tools-oxenstored-Rework-Domain-evtchn-handling-to-us.patch b/0018-tools-oxenstored-Rework-Domain-evtchn-handling-to-us.patch
new file mode 100644
index 0000000..1392b34
--- /dev/null
+++ b/0018-tools-oxenstored-Rework-Domain-evtchn-handling-to-us.patch
@@ -0,0 +1,209 @@
+From 4b418768ef4d75d0f70e4ce7cb5710404527bf47 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 30 Nov 2022 11:59:34 +0000
+Subject: [PATCH 18/89] tools/oxenstored: Rework Domain evtchn handling to use
+ port_pair
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Inter-domain event channels are always a pair of local and remote ports.
+Right now the handling is asymmetric, caused by the fact that the evtchn is
+bound after the associated Domain object is constructed.
+
+First, move binding of the event channel into the Domain.make() constructor.
+This means the local port no longer needs to be an option. It also removes
+the final callers of Domain.bind_interdomain.
+
+Next, introduce a new port_pair type to encapsulate the fact that these two
+should be updated together, and replace the previous port and remote_port
+fields. This refactoring also changes the Domain.get_port interface (removing
+an option) so take the opportunity to name it get_local_port instead.
+
+Also, this fixes a use-after-free risk with Domain.close. Once the evtchn has
+been unbound, the same local port number can be reused for a different
+purpose, so explicitly invalidate the ports to prevent their accidental misuse
+in the future.
+
+This also cleans up some of the debugging, to always print a port pair.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit df2db174b36eba67c218763ef621c67912202fc6)
+---
+ tools/ocaml/xenstored/connections.ml | 9 +---
+ tools/ocaml/xenstored/domain.ml | 75 ++++++++++++++--------------
+ tools/ocaml/xenstored/domains.ml | 2 -
+ 3 files changed, 39 insertions(+), 47 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml
+index 7d68c583b4..a80ae0bed2 100644
+--- a/tools/ocaml/xenstored/connections.ml
++++ b/tools/ocaml/xenstored/connections.ml
+@@ -48,9 +48,7 @@ let add_domain cons dom =
+ let xbcon = Xenbus.Xb.open_mmap ~capacity (Domain.get_interface dom) (fun () -> Domain.notify dom) in
+ let con = Connection.create xbcon (Some dom) in
+ Hashtbl.add cons.domains (Domain.get_id dom) con;
+- match Domain.get_port dom with
+- | Some p -> Hashtbl.add cons.ports p con;
+- | None -> ()
++ Hashtbl.add cons.ports (Domain.get_local_port dom) con
+
+ let select ?(only_if = (fun _ -> true)) cons =
+ Hashtbl.fold (fun _ con (ins, outs) ->
+@@ -97,10 +95,7 @@ let del_domain cons id =
+ let con = find_domain cons id in
+ Hashtbl.remove cons.domains id;
+ (match Connection.get_domain con with
+- | Some d ->
+- (match Domain.get_port d with
+- | Some p -> Hashtbl.remove cons.ports p
+- | None -> ())
++ | Some d -> Hashtbl.remove cons.ports (Domain.get_local_port d)
+ | None -> ());
+ del_watches cons con;
+ Connection.close con
+diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
+index d59a9401e2..481e10794d 100644
+--- a/tools/ocaml/xenstored/domain.ml
++++ b/tools/ocaml/xenstored/domain.ml
+@@ -19,14 +19,31 @@ open Printf
+ let debug fmt = Logging.debug "domain" fmt
+ let warn fmt = Logging.warn "domain" fmt
+
++(* A bound inter-domain event channel port pair. The remote port, and the
++ local port it is bound to. *)
++type port_pair =
++{
++ local: Xeneventchn.t;
++ remote: int;
++}
++
++(* Sentinel port_pair with both set to EVTCHN_INVALID *)
++let invalid_ports =
++{
++ local = Xeneventchn.of_int 0;
++ remote = 0
++}
++
++let string_of_port_pair p =
++ sprintf "(l %d, r %d)" (Xeneventchn.to_int p.local) p.remote
++
+ type t =
+ {
+ id: Xenctrl.domid;
+ mfn: nativeint;
+ interface: Xenmmap.mmap_interface;
+ eventchn: Event.t;
+- mutable remote_port: int;
+- mutable port: Xeneventchn.t option;
++ mutable ports: port_pair;
+ mutable bad_client: bool;
+ mutable io_credit: int; (* the rounds of ring process left to do, default is 0,
+ usually set to 1 when there is work detected, could
+@@ -41,8 +58,8 @@ let is_dom0 d = d.id = 0
+ let get_id domain = domain.id
+ let get_interface d = d.interface
+ let get_mfn d = d.mfn
+-let get_remote_port d = d.remote_port
+-let get_port d = d.port
++let get_remote_port d = d.ports.remote
++let get_local_port d = d.ports.local
+
+ let is_bad_domain domain = domain.bad_client
+ let mark_as_bad domain = domain.bad_client <- true
+@@ -56,54 +73,36 @@ let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
+
+ let is_free_to_conflict = is_dom0
+
+-let string_of_port = function
+- | None -> "None"
+- | Some x -> string_of_int (Xeneventchn.to_int x)
+-
+ let dump d chan =
+- fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
++ fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.ports.remote
+
+ let rebind_evtchn d remote_port =
+- begin match d.port with
+- | None -> ()
+- | Some p -> Event.unbind d.eventchn p
+- end;
++ Event.unbind d.eventchn d.ports.local;
+ let local = Event.bind_interdomain d.eventchn d.id remote_port in
+- debug "domain %d rebind (l %s, r %d) => (l %d, r %d)"
+- d.id (string_of_port d.port) d.remote_port
+- (Xeneventchn.to_int local) remote_port;
+- d.remote_port <- remote_port;
+- d.port <- Some (local)
++ let new_ports = { local; remote = remote_port } in
++ debug "domain %d rebind %s => %s"
++ d.id (string_of_port_pair d.ports) (string_of_port_pair new_ports);
++ d.ports <- new_ports
+
+ let notify dom =
+- match dom.port with
+- | None -> warn "domain %d: attempt to notify on unknown port" dom.id
+- | Some port -> Event.notify dom.eventchn port
+-
+-let bind_interdomain dom =
+- begin match dom.port with
+- | None -> ()
+- | Some port -> Event.unbind dom.eventchn port
+- end;
+- dom.port <- Some (Event.bind_interdomain dom.eventchn dom.id dom.remote_port);
+- debug "bound domain %d remote port %d to local port %s" dom.id dom.remote_port (string_of_port dom.port)
+-
++ Event.notify dom.eventchn dom.ports.local
+
+ let close dom =
+- debug "domain %d unbound port %s" dom.id (string_of_port dom.port);
+- begin match dom.port with
+- | None -> ()
+- | Some port -> Event.unbind dom.eventchn port
+- end;
++ debug "domain %d unbind %s" dom.id (string_of_port_pair dom.ports);
++ Event.unbind dom.eventchn dom.ports.local;
++ dom.ports <- invalid_ports;
+ Xenmmap.unmap dom.interface
+
+-let make id mfn remote_port interface eventchn = {
++let make id mfn remote_port interface eventchn =
++ let local = Event.bind_interdomain eventchn id remote_port in
++ let ports = { local; remote = remote_port } in
++ debug "domain %d bind %s" id (string_of_port_pair ports);
++{
+ id = id;
+ mfn = mfn;
+- remote_port = remote_port;
++ ports;
+ interface = interface;
+ eventchn = eventchn;
+- port = None;
+ bad_client = false;
+ io_credit = 0;
+ conflict_credit = !Define.conflict_burst_limit;
+diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
+index 26018ac0dd..2ab0c5f4d8 100644
+--- a/tools/ocaml/xenstored/domains.ml
++++ b/tools/ocaml/xenstored/domains.ml
+@@ -126,7 +126,6 @@ let create doms domid mfn remote_port =
+ let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
+ let dom = Domain.make domid mfn remote_port interface doms.eventchn in
+ Hashtbl.add doms.table domid dom;
+- Domain.bind_interdomain dom;
+ dom
+
+ let xenstored_kva = ref ""
+@@ -144,7 +143,6 @@ let create0 doms =
+
+ let dom = Domain.make 0 Nativeint.zero remote_port interface doms.eventchn in
+ Hashtbl.add doms.table 0 dom;
+- Domain.bind_interdomain dom;
+ Domain.notify dom;
+ dom
+
+--
+2.40.0
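The use-after-free risk called out above is handled by treating the two ports
as a unit and resetting them to a sentinel on close. A small C analogue of
that idea follows; the types and values are invented, the real code being the
OCaml port_pair introduced by the patch.

    /* Invalidate-on-close: once the local port is unbound it may be
     * recycled elsewhere, so the stored pair is reset to a sentinel
     * instead of being left pointing at a now-unrelated port. */
    #include <stdbool.h>
    #include <stdio.h>

    struct port_pair {
        int local;
        int remote;
    };

    static const struct port_pair invalid_ports = { .local = -1, .remote = -1 };

    static bool ports_valid(const struct port_pair *p)
    {
        return p->local >= 0 && p->remote >= 0;
    }

    static void domain_close(struct port_pair *p)
    {
        /* ... unbind p->local from the event channel driver ... */
        *p = invalid_ports;     /* prevent accidental reuse of a stale port */
    }

    int main(void)
    {
        struct port_pair d = { .local = 7, .remote = 3 };

        domain_close(&d);
        printf("still usable: %s\n", ports_valid(&d) ? "yes" : "no");
        return 0;
    }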
+
diff --git a/0019-tools-oxenstored-Keep-dev-xen-evtchn-open-across-liv.patch b/0019-tools-oxenstored-Keep-dev-xen-evtchn-open-across-liv.patch
new file mode 100644
index 0000000..f6ae3fe
--- /dev/null
+++ b/0019-tools-oxenstored-Keep-dev-xen-evtchn-open-across-liv.patch
@@ -0,0 +1,367 @@
+From f02171b663393e10d35123e5572c0f5b3e72c29d Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Thu, 3 Nov 2022 15:31:39 +0000
+Subject: [PATCH 19/89] tools/oxenstored: Keep /dev/xen/evtchn open across live
+ update
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Closing the evtchn handle will unbind and free all local ports. The new
+xenstored would need to rebind all evtchns, which is work that we don't want
+or need to be doing during the critical handover period.
+
+However, it turns out that the Windows PV drivers also rebind their local port
+across suspend/resume, leaving (o)xenstored with a stale idea of the
+remote port to use. In this case, reusing the established connection is the
+only robust option.
+
+Therefore:
+ * Have oxenstored open /dev/xen/evtchn without CLOEXEC at start of day.
+ * Extend the handover information with the evtchn fd, domexc virq local port,
+ and the local port number for each domain connection.
+ * Have (the new) oxenstored recover the open handle using Xeneventchn.fdopen,
+ and use the provided local ports rather than trying to rebind them.
+
+When this new information isn't present (i.e. live updating from an oxenstored
+prior to this change), the best-effort status quo will have to do.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 9b224c25293a53fcbe32da68052d861dda71a6f4)
+---
+ tools/ocaml/xenstored/domain.ml | 13 +++--
+ tools/ocaml/xenstored/domains.ml | 9 ++--
+ tools/ocaml/xenstored/event.ml | 20 +++++--
+ tools/ocaml/xenstored/process.ml | 2 +-
+ tools/ocaml/xenstored/xenstored.ml | 85 ++++++++++++++++++++----------
+ 5 files changed, 90 insertions(+), 39 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
+index 481e10794d..5c15752a37 100644
+--- a/tools/ocaml/xenstored/domain.ml
++++ b/tools/ocaml/xenstored/domain.ml
+@@ -74,7 +74,8 @@ let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
+ let is_free_to_conflict = is_dom0
+
+ let dump d chan =
+- fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.ports.remote
++ fprintf chan "dom,%d,%nd,%d,%d\n"
++ d.id d.mfn d.ports.remote (Xeneventchn.to_int d.ports.local)
+
+ let rebind_evtchn d remote_port =
+ Event.unbind d.eventchn d.ports.local;
+@@ -93,8 +94,14 @@ let close dom =
+ dom.ports <- invalid_ports;
+ Xenmmap.unmap dom.interface
+
+-let make id mfn remote_port interface eventchn =
+- let local = Event.bind_interdomain eventchn id remote_port in
++(* On clean start, local_port will be None, and we must bind the remote port
++ given. On Live Update, the event channel is already bound, and both the
++ local and remote port numbers come from the transfer record. *)
++let make ?local_port ~remote_port id mfn interface eventchn =
++ let local = match local_port with
++ | None -> Event.bind_interdomain eventchn id remote_port
++ | Some p -> Xeneventchn.of_int p
++ in
+ let ports = { local; remote = remote_port } in
+ debug "domain %d bind %s" id (string_of_port_pair ports);
+ {
+diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
+index 2ab0c5f4d8..b6c075c838 100644
+--- a/tools/ocaml/xenstored/domains.ml
++++ b/tools/ocaml/xenstored/domains.ml
+@@ -56,6 +56,7 @@ let exist doms id = Hashtbl.mem doms.table id
+ let find doms id = Hashtbl.find doms.table id
+ let number doms = Hashtbl.length doms.table
+ let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
++let eventchn doms = doms.eventchn
+
+ let rec is_empty_queue q =
+ Queue.is_empty q ||
+@@ -122,16 +123,16 @@ let cleanup doms =
+ let resume _doms _domid =
+ ()
+
+-let create doms domid mfn remote_port =
++let create doms ?local_port ~remote_port domid mfn =
+ let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
+- let dom = Domain.make domid mfn remote_port interface doms.eventchn in
++ let dom = Domain.make ?local_port ~remote_port domid mfn interface doms.eventchn in
+ Hashtbl.add doms.table domid dom;
+ dom
+
+ let xenstored_kva = ref ""
+ let xenstored_port = ref ""
+
+-let create0 doms =
++let create0 ?local_port doms =
+ let remote_port = Utils.read_file_single_integer !xenstored_port in
+
+ let interface =
+@@ -141,7 +142,7 @@ let create0 doms =
+ interface
+ in
+
+- let dom = Domain.make 0 Nativeint.zero remote_port interface doms.eventchn in
++ let dom = Domain.make ?local_port ~remote_port 0 Nativeint.zero interface doms.eventchn in
+ Hashtbl.add doms.table 0 dom;
+ Domain.notify dom;
+ dom
+diff --git a/tools/ocaml/xenstored/event.ml b/tools/ocaml/xenstored/event.ml
+index a3be296374..629dc6041b 100644
+--- a/tools/ocaml/xenstored/event.ml
++++ b/tools/ocaml/xenstored/event.ml
+@@ -20,9 +20,18 @@ type t = {
+ domexc: Xeneventchn.t;
+ }
+
+-let init () =
+- let handle = Xeneventchn.init () in
+- let domexc = Xeneventchn.bind_dom_exc_virq handle in
++(* On clean start, both parameters will be None, and we must open the evtchn
++ handle and bind the DOM_EXC VIRQ. On Live Update, the fd is preserved
++ across exec(), and the DOM_EXC VIRQ still bound. *)
++let init ?fd ?domexc_port () =
++ let handle = match fd with
++ | None -> Xeneventchn.init ~cloexec:false ()
++ | Some fd -> fd |> Utils.FD.of_int |> Xeneventchn.fdopen
++ in
++ let domexc = match domexc_port with
++ | None -> Xeneventchn.bind_dom_exc_virq handle
++ | Some p -> Xeneventchn.of_int p
++ in
+ { handle; domexc }
+
+ let fd eventchn = Xeneventchn.fd eventchn.handle
+@@ -31,3 +40,8 @@ let unbind eventchn port = Xeneventchn.unbind eventchn.handle port
+ let notify eventchn port = Xeneventchn.notify eventchn.handle port
+ let pending eventchn = Xeneventchn.pending eventchn.handle
+ let unmask eventchn port = Xeneventchn.unmask eventchn.handle port
++
++let dump e chan =
++ Printf.fprintf chan "evtchn-dev,%d,%d\n"
++ (Utils.FD.to_int @@ Xeneventchn.fd e.handle)
++ (Xeneventchn.to_int e.domexc)
+diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
+index 1c80e7198d..02bd0f7d80 100644
+--- a/tools/ocaml/xenstored/process.ml
++++ b/tools/ocaml/xenstored/process.ml
+@@ -573,7 +573,7 @@ let do_introduce con t domains cons data =
+ end;
+ edom
+ else try
+- let ndom = Domains.create domains domid mfn remote_port in
++ let ndom = Domains.create ~remote_port domains domid mfn in
+ Connections.add_domain cons ndom;
+ Connections.fire_spec_watches (Transaction.get_root t) cons Store.Path.introduce_domain;
+ ndom
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index 1f11f576b5..f526f4fb23 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -144,7 +144,7 @@ exception Bad_format of string
+
+ let dump_format_header = "$xenstored-dump-format"
+
+-let from_channel_f chan global_f socket_f domain_f watch_f store_f =
++let from_channel_f chan global_f evtchn_f socket_f domain_f watch_f store_f =
+ let unhexify s = Utils.unhexify s in
+ let getpath s =
+ let u = Utils.unhexify s in
+@@ -165,12 +165,19 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
+ (* there might be more parameters here,
+ e.g. a RO socket from a previous version: ignore it *)
+ global_f ~rw
++ | "evtchn-dev" :: fd :: domexc_port :: [] ->
++ evtchn_f ~fd:(int_of_string fd)
++ ~domexc_port:(int_of_string domexc_port)
+ | "socket" :: fd :: [] ->
+ socket_f ~fd:(int_of_string fd)
+- | "dom" :: domid :: mfn :: remote_port :: []->
+- domain_f (int_of_string domid)
+- (Nativeint.of_string mfn)
+- (int_of_string remote_port)
++ | "dom" :: domid :: mfn :: remote_port :: rest ->
++ let local_port = match rest with
++ | [] -> None (* backward compat: old version didn't have it *)
++ | local_port :: _ -> Some (int_of_string local_port) in
++ domain_f ?local_port
++ ~remote_port:(int_of_string remote_port)
++ (int_of_string domid)
++ (Nativeint.of_string mfn)
+ | "watch" :: domid :: path :: token :: [] ->
+ watch_f (int_of_string domid)
+ (unhexify path) (unhexify token)
+@@ -189,10 +196,21 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
+ done;
+ info "Completed loading xenstore dump"
+
+-let from_channel store cons doms chan =
++let from_channel store cons domains_init chan =
+ (* don't let the permission get on our way, full perm ! *)
+ let op = Store.get_ops store Perms.Connection.full_rights in
+ let rwro = ref (None) in
++ let doms = ref (None) in
++
++ let require_doms () =
++ match !doms with
++ | None ->
++ warn "No event channel file descriptor available in dump!";
++ let domains = domains_init @@ Event.init () in
++ doms := Some domains;
++ domains
++ | Some d -> d
++ in
+ let global_f ~rw =
+ let get_listen_sock sockfd =
+ let fd = sockfd |> int_of_string |> Utils.FD.of_int in
+@@ -201,6 +219,10 @@ let from_channel store cons doms chan =
+ in
+ rwro := get_listen_sock rw
+ in
++ let evtchn_f ~fd ~domexc_port =
++ let evtchn = Event.init ~fd ~domexc_port () in
++ doms := Some(domains_init evtchn)
++ in
+ let socket_f ~fd =
+ let ufd = Utils.FD.of_int fd in
+ let is_valid = try (Unix.fstat ufd).Unix.st_kind = Unix.S_SOCK with _ -> false in
+@@ -209,12 +231,13 @@ let from_channel store cons doms chan =
+ else
+ warn "Ignoring invalid socket FD %d" fd
+ in
+- let domain_f domid mfn remote_port =
++ let domain_f ?local_port ~remote_port domid mfn =
++ let doms = require_doms () in
+ let ndom =
+ if domid > 0 then
+- Domains.create doms domid mfn remote_port
++ Domains.create ?local_port ~remote_port doms domid mfn
+ else
+- Domains.create0 doms
++ Domains.create0 ?local_port doms
+ in
+ Connections.add_domain cons ndom;
+ in
+@@ -229,8 +252,8 @@ let from_channel store cons doms chan =
+ op.Store.write path value;
+ op.Store.setperms path perms
+ in
+- from_channel_f chan global_f socket_f domain_f watch_f store_f;
+- !rwro
++ from_channel_f chan global_f evtchn_f socket_f domain_f watch_f store_f;
++ !rwro, require_doms ()
+
+ let from_file store cons doms file =
+ info "Loading xenstore dump from %s" file;
+@@ -238,7 +261,7 @@ let from_file store cons doms file =
+ finally (fun () -> from_channel store doms cons channel)
+ (fun () -> close_in channel)
+
+-let to_channel store cons rw chan =
++let to_channel store cons (rw, evtchn) chan =
+ let hexify s = Utils.hexify s in
+
+ fprintf chan "%s\n" dump_format_header;
+@@ -248,6 +271,9 @@ let to_channel store cons rw chan =
+ Utils.FD.to_int fd in
+ fprintf chan "global,%d\n" (fdopt rw);
+
++ (* dump evtchn device info *)
++ Event.dump evtchn chan;
++
+ (* dump connections related to domains: domid, mfn, eventchn port/ sockets, and watches *)
+ Connections.iter cons (fun con -> Connection.dump con chan);
+
+@@ -367,7 +393,6 @@ let _ =
+ | None -> () end;
+
+ let store = Store.create () in
+- let eventchn = Event.init () in
+ let next_frequent_ops = ref 0. in
+ let advance_next_frequent_ops () =
+ next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
+@@ -375,16 +400,8 @@ let _ =
+ let delay_next_frequent_ops_by duration =
+ next_frequent_ops := !next_frequent_ops +. duration
+ in
+- let domains = Domains.init eventchn advance_next_frequent_ops in
++ let domains_init eventchn = Domains.init eventchn advance_next_frequent_ops in
+
+- (* For things that need to be done periodically but more often
+- * than the periodic_ops function *)
+- let frequent_ops () =
+- if Unix.gettimeofday () > !next_frequent_ops then (
+- History.trim ();
+- Domains.incr_conflict_credit domains;
+- advance_next_frequent_ops ()
+- ) in
+ let cons = Connections.create () in
+
+ let quit = ref false in
+@@ -393,14 +410,15 @@ let _ =
+ List.iter (fun path ->
+ Store.write store Perms.Connection.full_rights path "") Store.Path.specials;
+
+- let rw_sock =
++ let rw_sock, domains =
+ if cf.restart && Sys.file_exists Disk.xs_daemon_database then (
+- let rwro = DB.from_file store domains cons Disk.xs_daemon_database in
++ let rw, domains = DB.from_file store domains_init cons Disk.xs_daemon_database in
+ info "Live reload: database loaded";
+ Process.LiveUpdate.completed ();
+- rwro
++ rw, domains
+ ) else (
+ info "No live reload: regular startup";
++ let domains = domains_init @@ Event.init () in
+ if !Disk.enable then (
+ info "reading store from disk";
+ Disk.read store
+@@ -413,9 +431,18 @@ let _ =
+ if cf.domain_init then (
+ Connections.add_domain cons (Domains.create0 domains);
+ );
+- rw_sock
++ rw_sock, domains
+ ) in
+
++ (* For things that need to be done periodically but more often
++ * than the periodic_ops function *)
++ let frequent_ops () =
++ if Unix.gettimeofday () > !next_frequent_ops then (
++ History.trim ();
++ Domains.incr_conflict_credit domains;
++ advance_next_frequent_ops ()
++ ) in
++
+ (* required for xenstore-control to detect availability of live-update *)
+ let tool_path = Store.Path.of_string "/tool" in
+ if not (Store.path_exists store tool_path) then
+@@ -430,8 +457,10 @@ let _ =
+ Sys.set_signal Sys.sigusr1 (Sys.Signal_handle (fun _ -> sigusr1_handler store));
+ Sys.set_signal Sys.sigpipe Sys.Signal_ignore;
+
++ let eventchn = Domains.eventchn domains in
++
+ if cf.activate_access_log then begin
+- let post_rotate () = DB.to_file store cons (None) Disk.xs_daemon_database in
++ let post_rotate () = DB.to_file store cons (None, eventchn) Disk.xs_daemon_database in
+ Logging.init_access_log post_rotate
+ end;
+
+@@ -593,7 +622,7 @@ let _ =
+ live_update := Process.LiveUpdate.should_run cons;
+ if !live_update || !quit then begin
+ (* don't initiate live update if saving state fails *)
+- DB.to_file store cons (rw_sock) Disk.xs_daemon_database;
++ DB.to_file store cons (rw_sock, eventchn) Disk.xs_daemon_database;
+ quit := true;
+ end
+ with exc ->
+--
+2.40.0
+
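For reference, the extended handover record described above is just a comma-separated line with an optional trailing local port. Below is a minimal, self-contained sketch of the backward-compatible parsing, written against the OCaml standard library only; the dom_record type and parse_dom_record are illustrative names, not identifiers from the patch:

    type dom_record = {
      domid : int;
      mfn : nativeint;
      remote_port : int;
      local_port : int option;  (* None when the dump predates this change *)
    }

    let parse_dom_record line =
      match String.split_on_char ',' line with
      | "dom" :: domid :: mfn :: remote :: rest ->
        let local_port = match rest with
          | [] -> None                      (* old dump: rebind the event channel later *)
          | local :: _ -> Some (int_of_string local)
        in
        Some { domid = int_of_string domid;
               mfn = Nativeint.of_string mfn;
               remote_port = int_of_string remote;
               local_port }
      | _ -> None

For example, "dom,3,4660,5,17" yields local_port = Some 17 and the port can be reused directly, while a pre-change "dom,3,4660,5" yields local_port = None and falls back to rebinding.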
diff --git a/0020-tools-oxenstored-Log-live-update-issues-at-warning-l.patch b/0020-tools-oxenstored-Log-live-update-issues-at-warning-l.patch
new file mode 100644
index 0000000..533e3e7
--- /dev/null
+++ b/0020-tools-oxenstored-Log-live-update-issues-at-warning-l.patch
@@ -0,0 +1,42 @@
+From 991b512f5f69dde3c923804f887be9df56b03a74 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Tue, 8 Nov 2022 08:57:47 +0000
+Subject: [PATCH 20/89] tools/oxenstored: Log live update issues at warning
+ level
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+During live update, oxenstored tries a best-effort approach to recover as many
+domains and as much information as possible, even if it encounters errors
+restoring some domains.
+
+However, logging about misunderstood input is more severe than simply info.
+Log it at warning instead.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit 3f02e0a70fe9f8143454b742563433958d4a87f8)
+---
+ tools/ocaml/xenstored/xenstored.ml | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index f526f4fb23..35b8cbd43f 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -186,9 +186,9 @@ let from_channel_f chan global_f evtchn_f socket_f domain_f watch_f store_f =
+ (Perms.Node.of_string (unhexify perms ^ "\000"))
+ (unhexify value)
+ | _ ->
+- info "restoring: ignoring unknown line: %s" line
++ warn "restoring: ignoring unknown line: %s" line
+ with exn ->
+- info "restoring: ignoring unknown line: %s (exception: %s)"
++ warn "restoring: ignoring unknown line: %s (exception: %s)"
+ line (Printexc.to_string exn);
+ ()
+ with End_of_file ->
+--
+2.40.0
+
diff --git a/0021-tools-oxenstored-Set-uncaught-exception-handler.patch b/0021-tools-oxenstored-Set-uncaught-exception-handler.patch
new file mode 100644
index 0000000..8a42fcc
--- /dev/null
+++ b/0021-tools-oxenstored-Set-uncaught-exception-handler.patch
@@ -0,0 +1,83 @@
+From e13a9a2146952859c21c0a0c7b8b07757c2aba9d Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Mon, 7 Nov 2022 17:41:36 +0000
+Subject: [PATCH 21/89] tools/oxenstored: Set uncaught exception handler
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Unhandled exceptions go to stderr by default, but this doesn't typically work
+for oxenstored because:
+ * daemonize reopens stderr as /dev/null
+ * systemd redirects stderr to /dev/null too
+
+Debugging an unhandled exception requires reproducing the issue locally when
+using --no-fork, and is not conducive to figuring out what went wrong on a
+remote system.
+
+Install a custom handler which also tries to render the backtrace to the
+configured syslog facility, and DAEMON|ERR otherwise.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit ee7815f49faf743e960dac9e72809eb66393bc6d)
+---
+ tools/ocaml/xenstored/logging.ml | 29 +++++++++++++++++++++++++++++
+ tools/ocaml/xenstored/xenstored.ml | 3 ++-
+ 2 files changed, 31 insertions(+), 1 deletion(-)
+
+diff --git a/tools/ocaml/xenstored/logging.ml b/tools/ocaml/xenstored/logging.ml
+index 39c3036155..255051437d 100644
+--- a/tools/ocaml/xenstored/logging.ml
++++ b/tools/ocaml/xenstored/logging.ml
+@@ -342,3 +342,32 @@ let xb_answer ~tid ~con ~ty data =
+ let watch_not_fired ~con perms path =
+ let data = Printf.sprintf "EPERM perms=[%s] path=%s" perms path in
+ access_logging ~tid:0 ~con ~data Watch_not_fired ~level:Info
++
++let msg_of exn bt =
++ Printf.sprintf "Fatal exception: %s\n%s\n" (Printexc.to_string exn)
++ (Printexc.raw_backtrace_to_string bt)
++
++let fallback_exception_handler exn bt =
++ (* stderr goes to /dev/null, so use the logger where possible,
++ but always print to stderr too, in case everything else fails,
++ e.g. this can be used to debug with --no-fork
++
++ this function should try not to raise exceptions, but if it does
++ the ocaml runtime should still print the exception, both the original,
++ and the one from this function, but to stderr this time
++ *)
++ let msg = msg_of exn bt in
++ prerr_endline msg;
++ (* See Printexc.set_uncaught_exception_handler, need to flush,
++ so has to call stop and flush *)
++ match !xenstored_logger with
++ | Some l -> error "xenstored-fallback" "%s" msg; l.stop ()
++ | None ->
++ (* Too early, no logger set yet.
++ We normally try to use the configured logger so we don't flood syslog
++ during development for example, or if the user has a file set
++ *)
++ try Syslog.log Syslog.Daemon Syslog.Err msg
++ with e ->
++ let bt = Printexc.get_raw_backtrace () in
++ prerr_endline @@ msg_of e bt
+diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
+index 35b8cbd43f..4d5851c5cb 100644
+--- a/tools/ocaml/xenstored/xenstored.ml
++++ b/tools/ocaml/xenstored/xenstored.ml
+@@ -355,7 +355,8 @@ let tweak_gc () =
+ Gc.set { (Gc.get ()) with Gc.max_overhead = !Define.gc_max_overhead }
+
+
+-let _ =
++let () =
++ Printexc.set_uncaught_exception_handler Logging.fallback_exception_handler;
+ let cf = do_argv in
+ let pidfile =
+ if Sys.file_exists (config_filename cf) then
+--
+2.40.0
+
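As a side note, the core of the wiring above needs nothing beyond the OCaml standard library; the patch additionally routes the message through oxenstored's configured logger first. A stripped-down sketch follows, where log_fatal is a hypothetical stand-in for that logger rather than a real oxenstored function:

    let log_fatal msg = prerr_endline msg

    let fallback_handler exn bt =
      log_fatal (Printf.sprintf "Fatal exception: %s\n%s"
                   (Printexc.to_string exn)
                   (Printexc.raw_backtrace_to_string bt))

    let () = Printexc.set_uncaught_exception_handler fallback_handler

With this installed, an exception escaping the main loop is rendered together with its backtrace instead of being lost when stderr points at /dev/null.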
diff --git a/0022-tools-oxenstored-syslog-Avoid-potential-NULL-derefer.patch b/0022-tools-oxenstored-syslog-Avoid-potential-NULL-derefer.patch
new file mode 100644
index 0000000..eb6d42e
--- /dev/null
+++ b/0022-tools-oxenstored-syslog-Avoid-potential-NULL-derefer.patch
@@ -0,0 +1,55 @@
+From 91a9ac6e9be5aa94020f5c482e6c51b581e2ea39 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
+Date: Tue, 8 Nov 2022 14:24:19 +0000
+Subject: [PATCH 22/89] tools/oxenstored/syslog: Avoid potential NULL
+ dereference
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+strdup() may return NULL. Check for this before passing to syslog().
+
+Drop const from c_msg. It is bogus, as demonstrated by the need to cast to
+void * in order to free the memory.
+
+Signed-off-by: Edwin Török <edvin.torok@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit acd3fb6d65905f8a185dcb9fe6a330a591b96203)
+---
+ tools/ocaml/xenstored/syslog_stubs.c | 7 +++++--
+ 1 file changed, 5 insertions(+), 2 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/syslog_stubs.c b/tools/ocaml/xenstored/syslog_stubs.c
+index 875d48ad57..e16c3a9491 100644
+--- a/tools/ocaml/xenstored/syslog_stubs.c
++++ b/tools/ocaml/xenstored/syslog_stubs.c
+@@ -14,6 +14,7 @@
+
+ #include <syslog.h>
+ #include <string.h>
++#include <caml/fail.h>
+ #include <caml/mlvalues.h>
+ #include <caml/memory.h>
+ #include <caml/alloc.h>
+@@ -35,14 +36,16 @@ static int __syslog_facility_table[] = {
+ value stub_syslog(value facility, value level, value msg)
+ {
+ CAMLparam3(facility, level, msg);
+- const char *c_msg = strdup(String_val(msg));
++ char *c_msg = strdup(String_val(msg));
+ int c_facility = __syslog_facility_table[Int_val(facility)]
+ | __syslog_level_table[Int_val(level)];
+
++ if ( !c_msg )
++ caml_raise_out_of_memory();
+ caml_enter_blocking_section();
+ syslog(c_facility, "%s", c_msg);
+ caml_leave_blocking_section();
+
+- free((void*)c_msg);
++ free(c_msg);
+ CAMLreturn(Val_unit);
+ }
+--
+2.40.0
+
diff --git a/0023-tools-oxenstored-Render-backtraces-more-nicely-in-Sy.patch b/0023-tools-oxenstored-Render-backtraces-more-nicely-in-Sy.patch
new file mode 100644
index 0000000..c0343d0
--- /dev/null
+++ b/0023-tools-oxenstored-Render-backtraces-more-nicely-in-Sy.patch
@@ -0,0 +1,83 @@
+From c4972a4272690384b15d5706f2a833aed636895e Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 1 Dec 2022 21:06:25 +0000
+Subject: [PATCH 23/89] tools/oxenstored: Render backtraces more nicely in
+ Syslog
+
+fallback_exception_handler feeds a string with embedded newlines directly into
+syslog(). While this is an improvement on getting nothing, syslogd escapes
+all control characters it gets, and emits one (long) log line.
+
+Fix the problem generally in the syslog stub. As we already have a local copy
+of the string, split it in place and emit one syslog() call per line.
+
+Also tweak Logging.msg_of to avoid putting an extra newline on a string which
+already ends with one.
+
+Fixes: ee7815f49faf ("tools/oxenstored: Set uncaught exception handler")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Christian Lindig <christian.lindig@citrix.com>
+(cherry picked from commit d2162d884cba0ff7b2ac0d832f4e044444bda2e1)
+---
+ tools/ocaml/xenstored/logging.ml | 2 +-
+ tools/ocaml/xenstored/syslog_stubs.c | 26 +++++++++++++++++++++++---
+ 2 files changed, 24 insertions(+), 4 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/logging.ml b/tools/ocaml/xenstored/logging.ml
+index 255051437d..f233bc9a39 100644
+--- a/tools/ocaml/xenstored/logging.ml
++++ b/tools/ocaml/xenstored/logging.ml
+@@ -344,7 +344,7 @@ let watch_not_fired ~con perms path =
+ access_logging ~tid:0 ~con ~data Watch_not_fired ~level:Info
+
+ let msg_of exn bt =
+- Printf.sprintf "Fatal exception: %s\n%s\n" (Printexc.to_string exn)
++ Printf.sprintf "Fatal exception: %s\n%s" (Printexc.to_string exn)
+ (Printexc.raw_backtrace_to_string bt)
+
+ let fallback_exception_handler exn bt =
+diff --git a/tools/ocaml/xenstored/syslog_stubs.c b/tools/ocaml/xenstored/syslog_stubs.c
+index e16c3a9491..760e78ff73 100644
+--- a/tools/ocaml/xenstored/syslog_stubs.c
++++ b/tools/ocaml/xenstored/syslog_stubs.c
+@@ -37,14 +37,34 @@ value stub_syslog(value facility, value level, value msg)
+ {
+ CAMLparam3(facility, level, msg);
+ char *c_msg = strdup(String_val(msg));
++ char *s = c_msg, *ss;
+ int c_facility = __syslog_facility_table[Int_val(facility)]
+ | __syslog_level_table[Int_val(level)];
+
+ if ( !c_msg )
+ caml_raise_out_of_memory();
+- caml_enter_blocking_section();
+- syslog(c_facility, "%s", c_msg);
+- caml_leave_blocking_section();
++
++ /*
++ * syslog() doesn't like embedded newlines, and c_msg generally
++ * contains them.
++ *
++ * Split the message in place by converting \n to \0, and issue one
++ * syslog() call per line, skipping the final iteration if c_msg ends
++ * with a newline anyway.
++ */
++ do {
++ ss = strchr(s, '\n');
++ if ( ss )
++ *ss = '\0';
++ else if ( *s == '\0' )
++ break;
++
++ caml_enter_blocking_section();
++ syslog(c_facility, "%s", s);
++ caml_leave_blocking_section();
++
++ s = ss + 1;
++ } while ( ss );
+
+ free(c_msg);
+ CAMLreturn(Val_unit);
+--
+2.40.0
+
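The same line-splitting policy could also be sketched on the OCaml side, should a caller ever want to pre-split before the message reaches the stub; emit_line below is a hypothetical sink standing in for the Syslog binding:

    let emit_line line = prerr_endline line

    let syslog_multiline msg =
      String.split_on_char '\n' msg
      |> List.filter (fun l -> l <> "")  (* drop the empty piece left by a trailing newline *)
      |> List.iter emit_line

Splitting in the C stub, as the patch does, keeps the behaviour uniform for every caller, which is why the fix lives there rather than in each OCaml call site.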
diff --git a/0024-Revert-tools-xenstore-simplify-loop-handling-connect.patch b/0024-Revert-tools-xenstore-simplify-loop-handling-connect.patch
new file mode 100644
index 0000000..81481fc
--- /dev/null
+++ b/0024-Revert-tools-xenstore-simplify-loop-handling-connect.patch
@@ -0,0 +1,136 @@
+From 2f8851c37f88e4eb4858e16626fcb2379db71a4f Mon Sep 17 00:00:00 2001
+From: Jason Andryuk <jandryuk@gmail.com>
+Date: Thu, 26 Jan 2023 11:00:24 +0100
+Subject: [PATCH 24/89] Revert "tools/xenstore: simplify loop handling
+ connection I/O"
+
+I'm observing guest kexec trigger xenstored to abort on a double free.
+
+gdb output:
+Program received signal SIGABRT, Aborted.
+__pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) at ./nptl/pthread_kill.c:44
+44 ./nptl/pthread_kill.c: No such file or directory.
+(gdb) bt
+ at ./nptl/pthread_kill.c:44
+ at ./nptl/pthread_kill.c:78
+ at ./nptl/pthread_kill.c:89
+ at ../sysdeps/posix/raise.c:26
+ at talloc.c:119
+ ptr=ptr@entry=0x559fae724290) at talloc.c:232
+ at xenstored_core.c:2945
+(gdb) frame 5
+ at talloc.c:119
+119 TALLOC_ABORT("Bad talloc magic value - double free");
+(gdb) frame 7
+ at xenstored_core.c:2945
+2945 talloc_increase_ref_count(conn);
+(gdb) p conn
+$1 = (struct connection *) 0x559fae724290
+
+Looking at a xenstore trace, we have:
+IN 0x559fae71f250 20230120 17:40:53 READ (/local/domain/3/image/device-model-dom
+id )
+wrl: dom 0 1 msec 10000 credit 1000000 reserve 100 disc
+ard
+wrl: dom 3 1 msec 10000 credit 1000000 reserve 100 disc
+ard
+wrl: dom 0 0 msec 10000 credit 1000000 reserve 0 disc
+ard
+wrl: dom 3 0 msec 10000 credit 1000000 reserve 0 disc
+ard
+OUT 0x559fae71f250 20230120 17:40:53 ERROR (ENOENT )
+wrl: dom 0 1 msec 10000 credit 1000000 reserve 100 disc
+ard
+wrl: dom 3 1 msec 10000 credit 1000000 reserve 100 disc
+ard
+IN 0x559fae71f250 20230120 17:40:53 RELEASE (3 )
+DESTROY watch 0x559fae73f630
+DESTROY watch 0x559fae75ddf0
+DESTROY watch 0x559fae75ec30
+DESTROY watch 0x559fae75ea60
+DESTROY watch 0x559fae732c00
+DESTROY watch 0x559fae72cea0
+DESTROY watch 0x559fae728fc0
+DESTROY watch 0x559fae729570
+DESTROY connection 0x559fae724290
+orphaned node /local/domain/3/device/suspend/event-channel deleted
+orphaned node /local/domain/3/device/vbd/51712 deleted
+orphaned node /local/domain/3/device/vkbd/0 deleted
+orphaned node /local/domain/3/device/vif/0 deleted
+orphaned node /local/domain/3/control/shutdown deleted
+orphaned node /local/domain/3/control/feature-poweroff deleted
+orphaned node /local/domain/3/control/feature-reboot deleted
+orphaned node /local/domain/3/control/feature-suspend deleted
+orphaned node /local/domain/3/control/feature-s3 deleted
+orphaned node /local/domain/3/control/feature-s4 deleted
+orphaned node /local/domain/3/control/sysrq deleted
+orphaned node /local/domain/3/data deleted
+orphaned node /local/domain/3/drivers deleted
+orphaned node /local/domain/3/feature deleted
+orphaned node /local/domain/3/attr deleted
+orphaned node /local/domain/3/error deleted
+orphaned node /local/domain/3/console/backend-id deleted
+
+and no further output.
+
+The trace shows that DESTROY was called for connection 0x559fae724290,
+but that is the same pointer (conn) main() was looping through from
+connections. So it wasn't actually removed from the connections list?
+
+Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling
+connection I/O" fixes the abort/double free. I think the use of
+list_for_each_entry_safe is incorrect. list_for_each_entry_safe makes
+traversal safe for deleting the current iterator, but RELEASE/do_release
+will delete some other entry in the connections list. I think the
+observed abort is because list_for_each_entry has next pointing to the
+deleted connection, and it is used in the subsequent iteration.
+
+Add a comment explaining the unsuitability of list_for_each_entry_safe.
+Also notice that the old code takes a reference on next which would
+prevent a use-after-free.
+
+This reverts commit e8e6e42279a5723239c5c40ba4c7f579a979465d.
+
+This is XSA-425/CVE-2022-42330.
+
+Fixes: e8e6e42279a5 ("tools/xenstore: simplify loop handling connection I/O")
+Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Julien Grall <jgrall@amazon.com>
+---
+ tools/xenstore/xenstored_core.c | 19 +++++++++++++++++--
+ 1 file changed, 17 insertions(+), 2 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
+index 476d5c6d51..56dbdc2530 100644
+--- a/tools/xenstore/xenstored_core.c
++++ b/tools/xenstore/xenstored_core.c
+@@ -2935,8 +2935,23 @@ int main(int argc, char *argv[])
+ }
+ }
+
+- list_for_each_entry_safe(conn, next, &connections, list) {
+- talloc_increase_ref_count(conn);
++ /*
++ * list_for_each_entry_safe is not suitable here because
++ * handle_input may delete entries besides the current one, but
++ * those may be in the temporary next which would trigger a
++ * use-after-free. list_for_each_entry_safe is only safe for
++ * deleting the current entry.
++ */
++ next = list_entry(connections.next, typeof(*conn), list);
++ if (&next->list != &connections)
++ talloc_increase_ref_count(next);
++ while (&next->list != &connections) {
++ conn = next;
++
++ next = list_entry(conn->list.next,
++ typeof(*conn), list);
++ if (&next->list != &connections)
++ talloc_increase_ref_count(next);
+
+ if (conn_can_read(conn))
+ handle_input(conn);
+--
+2.40.0
+
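The hazard described above is iterating a container while a callback may delete entries other than the current one. Purely as an illustration in OCaml terms (this is not the C fix, which instead takes a talloc reference on the saved next entry), the usual guard is to iterate over a snapshot and re-check membership before handling each element; connections and handle_input here are hypothetical stand-ins for the daemon's state:

    let connections : (int, string) Hashtbl.t = Hashtbl.create 16

    let handle_input id _conn =
      Hashtbl.remove connections (id + 1)  (* may drop an entry other than [id] *)

    let process_all () =
      Hashtbl.fold (fun id conn acc -> (id, conn) :: acc) connections []
      |> List.iter (fun (id, conn) ->
           if Hashtbl.mem connections id then handle_input id conn)

The membership re-check plays the role the extra talloc reference plays in the C code: a handler dropping some other connection cannot leave the loop holding a dangling element.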
diff --git a/0004-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch b/0025-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
similarity index 86%
rename from 0004-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
rename to 0025-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
index 3d1c089..142280f 100644
--- a/0004-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
+++ b/0025-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
@@ -1,7 +1,7 @@
-From 7b1b9849e8a0d7791866d6d21c45993dfe27836c Mon Sep 17 00:00:00 2001
+From a470a83c36c07b56d90957ae1e6e9ebc458d3686 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 7 Feb 2023 17:03:09 +0100
-Subject: [PATCH 04/61] x86/S3: Restore Xen's MSR_PAT value on S3 resume
+Date: Tue, 7 Feb 2023 16:56:14 +0100
+Subject: [PATCH 25/89] x86/S3: Restore Xen's MSR_PAT value on S3 resume
There are two paths in the trampoline, and Xen's PAT needs setting up in both,
not just the boot path.
diff --git a/0005-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch b/0026-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
similarity index 90%
rename from 0005-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
rename to 0026-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
index ff66a43..5d937d5 100644
--- a/0005-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
+++ b/0026-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
@@ -1,7 +1,7 @@
-From 998c03b2abfbf17ff96bccad1512de1ea18d0d75 Mon Sep 17 00:00:00 2001
+From 1d7a388e7b9711cbd7e14b2020b168b6789772af Mon Sep 17 00:00:00 2001
From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 7 Feb 2023 17:03:51 +0100
-Subject: [PATCH 05/61] tools: Fix build with recent QEMU, use
+Date: Tue, 7 Feb 2023 16:57:22 +0100
+Subject: [PATCH 26/89] tools: Fix build with recent QEMU, use
"--enable-trace-backends"
The configure option "--enable-trace-backend" isn't accepted anymore
@@ -30,7 +30,7 @@ master date: 2023-01-11 10:45:29 +0100
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/Makefile b/tools/Makefile
-index 757a560be0..9b6b605ec9 100644
+index 9e28027835..4906fdbc23 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -218,9 +218,9 @@ subdir-all-qemu-xen-dir: qemu-xen-dir-find
diff --git a/0027-include-compat-produce-stubs-for-headers-not-otherwi.patch b/0027-include-compat-produce-stubs-for-headers-not-otherwi.patch
new file mode 100644
index 0000000..3528bd6
--- /dev/null
+++ b/0027-include-compat-produce-stubs-for-headers-not-otherwi.patch
@@ -0,0 +1,74 @@
+From c871e05e138aae2ac75e9b4ccebe6cf3fd1a775b Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 7 Feb 2023 16:57:52 +0100
+Subject: [PATCH 27/89] include/compat: produce stubs for headers not otherwise
+ generated
+
+Public headers can include other public headers. Such interdependencies
+are retained in their compat counterparts. Since some compat headers are
+generated only in certain configurations, the referenced headers still
+need to exist. The lack thereof was observed with hvm/hvm_op.h needing
+trace.h, where generation of the latter depends on TRACEBUFFER=y. Make
+empty stubs in such cases (as generating the extra headers is relatively
+slow and hence better to avoid). Changes to .config and incremental
+(re-)building are covered by the respective .*.cmd then no longer
+matching the command to be used, resulting in the necessary re-creation
+of the (possibly stub) header.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 6bec713f871f21c6254a5783c1e39867ea828256
+master date: 2023-01-12 16:17:54 +0100
+---
+ xen/include/Makefile | 14 +++++++++++++-
+ 1 file changed, 13 insertions(+), 1 deletion(-)
+
+diff --git a/xen/include/Makefile b/xen/include/Makefile
+index 65be310eca..cfd7851614 100644
+--- a/xen/include/Makefile
++++ b/xen/include/Makefile
+@@ -34,6 +34,8 @@ headers-$(CONFIG_TRACEBUFFER) += compat/trace.h
+ headers-$(CONFIG_XENOPROF) += compat/xenoprof.h
+ headers-$(CONFIG_XSM_FLASK) += compat/xsm/flask_op.h
+
++headers-n := $(filter-out $(headers-y),$(headers-n) $(headers-))
++
+ cppflags-y := -include public/xen-compat.h -DXEN_GENERATING_COMPAT_HEADERS
+ cppflags-$(CONFIG_X86) += -m32
+
+@@ -43,13 +45,16 @@ public-$(CONFIG_X86) := $(wildcard $(srcdir)/public/arch-x86/*.h $(srcdir)/publi
+ public-$(CONFIG_ARM) := $(wildcard $(srcdir)/public/arch-arm/*.h $(srcdir)/public/arch-arm/*/*.h)
+
+ .PHONY: all
+-all: $(addprefix $(obj)/,$(headers-y))
++all: $(addprefix $(obj)/,$(headers-y) $(headers-n))
+
+ quiet_cmd_compat_h = GEN $@
+ cmd_compat_h = \
+ $(PYTHON) $(srctree)/tools/compat-build-header.py <$< $(patsubst $(obj)/%,%,$@) >>$@.new; \
+ mv -f $@.new $@
+
++quiet_cmd_stub_h = GEN $@
++cmd_stub_h = echo '/* empty */' >$@
++
+ quiet_cmd_compat_i = CPP $@
+ cmd_compat_i = $(CPP) $(filter-out -Wa$(comma)% -include %/include/xen/config.h,$(XEN_CFLAGS)) $(cppflags-y) -o $@ $<
+
+@@ -69,6 +74,13 @@ targets += $(headers-y)
+ $(obj)/compat/%.h: $(obj)/compat/%.i $(srctree)/tools/compat-build-header.py FORCE
+ $(call if_changed,compat_h)
+
++# Placeholders may be needed in case files in $(headers-y) include files we
++# don't otherwise generate. Real dependencies would need spelling out explicitly,
++# for them to appear in $(headers-y) instead.
++targets += $(headers-n)
++$(addprefix $(obj)/,$(headers-n)): FORCE
++ $(call if_changed,stub_h)
++
+ .PRECIOUS: $(obj)/compat/%.i
+ targets += $(patsubst %.h, %.i, $(headers-y))
+ $(obj)/compat/%.i: $(obj)/compat/%.c FORCE
+--
+2.40.0
+
diff --git a/0006-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch b/0028-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
similarity index 93%
rename from 0006-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
rename to 0028-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
index c010110..8185bee 100644
--- a/0006-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
+++ b/0028-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
@@ -1,7 +1,7 @@
-From 401e9e33a04c2a9887636ef58490c764543f0538 Mon Sep 17 00:00:00 2001
+From 5e3250258afbace3e5dc3f31ac99c1eebf60f238 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 7 Feb 2023 17:04:18 +0100
-Subject: [PATCH 06/61] x86/vmx: Calculate model-specific LBRs once at start of
+Date: Tue, 7 Feb 2023 16:58:25 +0100
+Subject: [PATCH 28/89] x86/vmx: Calculate model-specific LBRs once at start of
day
There is no point repeating this calculation at runtime, especially as it is
@@ -23,10 +23,10 @@ master date: 2023-01-12 18:42:00 +0000
1 file changed, 139 insertions(+), 137 deletions(-)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index 3f42765313..bc308d9df2 100644
+index 7c81b80710..ad91464103 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -394,6 +394,142 @@ void vmx_pi_hooks_deassign(struct domain *d)
+@@ -396,6 +396,142 @@ void vmx_pi_hooks_deassign(struct domain *d)
domain_unpause(d);
}
@@ -87,7 +87,7 @@ index 3f42765313..bc308d9df2 100644
+ { MSR_GM_LASTBRANCH_0_TO_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
+ { 0, 0 }
+};
-+static const struct lbr_info *__read_mostly model_specific_lbr;
++static const struct lbr_info *__ro_after_init model_specific_lbr;
+
+static const struct lbr_info *__init get_model_specific_lbr(void)
+{
@@ -166,18 +166,18 @@ index 3f42765313..bc308d9df2 100644
+ return NULL;
+}
+
- static int vmx_domain_initialise(struct domain *d)
+ static int cf_check vmx_domain_initialise(struct domain *d)
{
static const struct arch_csw csw = {
-@@ -2812,6 +2948,7 @@ const struct hvm_function_table * __init start_vmx(void)
- vmx_function_table.get_guest_bndcfgs = vmx_get_guest_bndcfgs;
+@@ -2837,6 +2973,7 @@ const struct hvm_function_table * __init start_vmx(void)
+ vmx_function_table.tsc_scaling.setup = vmx_setup_tsc_scaling;
}
+ model_specific_lbr = get_model_specific_lbr();
lbr_tsx_fixup_check();
ler_to_fixup_check();
-@@ -2958,141 +3095,6 @@ static int vmx_cr_access(cr_access_qual_t qual)
+@@ -2983,141 +3120,6 @@ static int vmx_cr_access(cr_access_qual_t qual)
return X86EMUL_OKAY;
}
@@ -319,7 +319,7 @@ index 3f42765313..bc308d9df2 100644
enum
{
LBR_FORMAT_32 = 0x0, /* 32-bit record format */
-@@ -3199,7 +3201,7 @@ static void __init ler_to_fixup_check(void)
+@@ -3224,7 +3226,7 @@ static void __init ler_to_fixup_check(void)
static int is_last_branch_msr(u32 ecx)
{
@@ -328,7 +328,7 @@ index 3f42765313..bc308d9df2 100644
if ( lbr == NULL )
return 0;
-@@ -3536,7 +3538,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+@@ -3563,7 +3565,7 @@ static int cf_check vmx_msr_write_intercept(
if ( !(v->arch.hvm.vmx.lbr_flags & LBR_MSRS_INSERTED) &&
(msr_content & IA32_DEBUGCTLMSR_LBR) )
{
diff --git a/0007-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch b/0029-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
similarity index 88%
rename from 0007-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
rename to 0029-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
index fc81a17..2f87b83 100644
--- a/0007-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
+++ b/0029-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
@@ -1,7 +1,7 @@
-From 9f425039ca50e8cc8db350ec54d8a7cd4175f417 Mon Sep 17 00:00:00 2001
+From e904d8ae01a0be53368c8c388f13bf4ffcbcdf6c Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 7 Feb 2023 17:04:49 +0100
-Subject: [PATCH 07/61] x86/vmx: Support for CPUs without model-specific LBR
+Date: Tue, 7 Feb 2023 16:59:14 +0100
+Subject: [PATCH 29/89] x86/vmx: Support for CPUs without model-specific LBR
Ice Lake (server at least) has both architectural LBR and model-specific LBR.
Sapphire Rapids does not have model-specific LBR at all. I.e. On SPR and
@@ -26,10 +26,10 @@ master date: 2023-01-12 18:42:00 +0000
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index bc308d9df2..094141be9a 100644
+index ad91464103..861f91f2af 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -3518,18 +3518,26 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+@@ -3545,18 +3545,26 @@ static int cf_check vmx_msr_write_intercept(
if ( msr_content & rsvd )
goto gp_fault;
@@ -64,7 +64,7 @@ index bc308d9df2..094141be9a 100644
*
* Either way, there is nothing we can do right now to recover, and
* the guest won't execute correctly either. Simply crash the domain
-@@ -3540,13 +3548,6 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content)
+@@ -3567,13 +3575,6 @@ static int cf_check vmx_msr_write_intercept(
{
const struct lbr_info *lbr = model_specific_lbr;
diff --git a/0008-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch b/0030-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
similarity index 82%
rename from 0008-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
rename to 0030-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
index ab7862b..e2bb8df 100644
--- a/0008-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
+++ b/0030-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
@@ -1,7 +1,7 @@
-From 1550835b381a18fc0e972e5d04925e02fab31553 Mon Sep 17 00:00:00 2001
+From 2d74e7035bd060d662f1c4f8522377be8021be92 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Feb 2023 17:05:22 +0100
-Subject: [PATCH 08/61] x86/shadow: fix PAE check for top-level table
+Date: Tue, 7 Feb 2023 16:59:54 +0100
+Subject: [PATCH 30/89] x86/shadow: fix PAE check for top-level table
unshadowing
Clearly within the for_each_vcpu() the vCPU of this loop is meant, not
@@ -18,10 +18,10 @@ master date: 2023-01-20 09:23:42 +0100
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
-index c07af0bd99..f7acd18a36 100644
+index 2370b30602..671bf8c228 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
-@@ -2665,10 +2665,10 @@ static int sh_page_fault(struct vcpu *v,
+@@ -2672,10 +2672,10 @@ static int cf_check sh_page_fault(
#if GUEST_PAGING_LEVELS == 3
unsigned int i;
diff --git a/0031-build-fix-building-flask-headers-before-descending-i.patch b/0031-build-fix-building-flask-headers-before-descending-i.patch
new file mode 100644
index 0000000..273e795
--- /dev/null
+++ b/0031-build-fix-building-flask-headers-before-descending-i.patch
@@ -0,0 +1,50 @@
+From 819a5d4ed8b79e21843d5960a7ab8fbd16f28233 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 7 Feb 2023 17:00:29 +0100
+Subject: [PATCH 31/89] build: fix building flask headers before descending in
+ flask/ss/
+
+Unfortunately, adding a prerequisite to "$(obj)/ss/built_in.o" doesn't
+work because we have "$(obj)/%/built_in.o: $(obj)/% ;" in Rules.mk.
+So make is allowed to try to build objects in "xsm/flask/ss/" before
+generating the headers.
+
+Adding a prerequisite on "$(obj)/ss" instead will fix the issue as
+that's the target used to run make in this subdirectory.
+
+Unfortunately, that target is also used when running `make clean`, so
+we want to ignore it in this case. $(MAKECMDGOALS) can't be used in
+this case as it is empty, but we can guess which operation is done by
+looking at the list of loaded makefiles.
+
+Fixes: 7a3bcd2babcc ("build: build everything from the root dir, use obj=$subdir")
+Reported-by: "Daniel P. Smith" <dpsmith@apertussolutions.com>
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: d60324d8af9404014cfcc37bba09e9facfd02fcf
+master date: 2023-01-23 15:03:58 +0100
+---
+ xen/xsm/flask/Makefile | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/xen/xsm/flask/Makefile b/xen/xsm/flask/Makefile
+index d25312f4fa..3fdcf7727e 100644
+--- a/xen/xsm/flask/Makefile
++++ b/xen/xsm/flask/Makefile
+@@ -16,7 +16,11 @@ FLASK_H_FILES := flask.h class_to_string.h initial_sid_to_string.h
+ AV_H_FILES := av_perm_to_string.h av_permissions.h
+ ALL_H_FILES := $(addprefix include/,$(FLASK_H_FILES) $(AV_H_FILES))
+
+-$(addprefix $(obj)/,$(obj-y)) $(obj)/ss/built_in.o: $(addprefix $(obj)/,$(ALL_H_FILES))
++# Adding prerequisite to descending into ss/ folder only when not running
++# `make *clean`.
++ifeq ($(filter %/Makefile.clean,$(MAKEFILE_LIST)),)
++$(addprefix $(obj)/,$(obj-y)) $(obj)/ss: $(addprefix $(obj)/,$(ALL_H_FILES))
++endif
+ extra-y += $(ALL_H_FILES)
+
+ mkflask := $(srcdir)/policy/mkflask.sh
+--
+2.40.0
+
diff --git a/0009-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch b/0032-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
similarity index 81%
rename from 0009-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
rename to 0032-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
index 83e46c7..8b3a410 100644
--- a/0009-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
+++ b/0032-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
@@ -1,7 +1,7 @@
-From 0fd9ad2b9c0c9d9c4879a566f1788d3e9cd38ef6 Mon Sep 17 00:00:00 2001
+From d0127881376baeea1e4eb71d0f7b56d942147124 Mon Sep 17 00:00:00 2001
From: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
-Date: Tue, 7 Feb 2023 17:05:56 +0100
-Subject: [PATCH 09/61] ns16550: fix an incorrect assignment to uart->io_size
+Date: Tue, 7 Feb 2023 17:00:47 +0100
+Subject: [PATCH 32/89] ns16550: fix an incorrect assignment to uart->io_size
uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
is assigned to it, it should be converted to size in bytes.
@@ -17,10 +17,10 @@ master date: 2023-01-24 16:54:38 +0100
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
-index 2d2bd2a024..5dd4d723f5 100644
+index 01a05c9aa8..ce013fb6a5 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
-@@ -1780,7 +1780,7 @@ static int __init ns16550_acpi_uart_init(const void *data)
+@@ -1875,7 +1875,7 @@ static int __init ns16550_acpi_uart_init(const void *data)
uart->parity = spcr->parity;
uart->stop_bits = spcr->stop_bits;
uart->io_base = spcr->serial_port.address;
diff --git a/0010-libxl-fix-guest-kexec-skip-cpuid-policy.patch b/0033-libxl-fix-guest-kexec-skip-cpuid-policy.patch
similarity index 86%
rename from 0010-libxl-fix-guest-kexec-skip-cpuid-policy.patch
rename to 0033-libxl-fix-guest-kexec-skip-cpuid-policy.patch
index 6150286..7eb3779 100644
--- a/0010-libxl-fix-guest-kexec-skip-cpuid-policy.patch
+++ b/0033-libxl-fix-guest-kexec-skip-cpuid-policy.patch
@@ -1,7 +1,7 @@
-From 6e081438bf8ef616d0123aab7a743476d8114ef6 Mon Sep 17 00:00:00 2001
+From 3dae50283d9819c691a97f15b133124c00d39a2f Mon Sep 17 00:00:00 2001
From: Jason Andryuk <jandryuk@gmail.com>
-Date: Tue, 7 Feb 2023 17:06:47 +0100
-Subject: [PATCH 10/61] libxl: fix guest kexec - skip cpuid policy
+Date: Tue, 7 Feb 2023 17:01:49 +0100
+Subject: [PATCH 33/89] libxl: fix guest kexec - skip cpuid policy
When a domain performs a kexec (soft reset), libxl__build_pre() is
called with the existing domid. Calling libxl__cpuid_legacy() on the
@@ -30,10 +30,10 @@ master date: 2023-01-26 10:58:23 +0100
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
-index 885675591f..2e6357a9d7 100644
+index 612eacfc7f..dbee32b7b7 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
-@@ -2176,6 +2176,8 @@ static int do_domain_soft_reset(libxl_ctx *ctx,
+@@ -2203,6 +2203,8 @@ static int do_domain_soft_reset(libxl_ctx *ctx,
aop_console_how);
cdcs->domid_out = &domid_out;
@@ -43,10 +43,10 @@ index 885675591f..2e6357a9d7 100644
if (!dom_path) {
LOGD(ERROR, domid, "failed to read domain path");
diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
-index 73fccd9243..a2bd2395fa 100644
+index b454f988fb..f6311eea6e 100644
--- a/tools/libs/light/libxl_dom.c
+++ b/tools/libs/light/libxl_dom.c
-@@ -384,7 +384,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
+@@ -382,7 +382,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
/* Construct a CPUID policy, but only for brand new domains. Domains
* being migrated-in/restored have CPUID handled during the
* static_data_done() callback. */
@@ -56,10 +56,10 @@ index 73fccd9243..a2bd2395fa 100644
out:
diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
-index 0b4671318c..ee6a251700 100644
+index a7c447c10e..cae160351f 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
-@@ -1407,6 +1407,7 @@ typedef struct {
+@@ -1406,6 +1406,7 @@ typedef struct {
/* Whether this domain is being migrated/restored, or booting fresh. Only
* applicable to the primary domain, not support domains (e.g. stub QEMU). */
bool restore;
diff --git a/0011-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch b/0034-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
similarity index 91%
rename from 0011-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
rename to 0034-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
index 1d4455f..8f57d4e 100644
--- a/0011-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
+++ b/0034-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
@@ -1,7 +1,7 @@
-From c6a3d14df051bae0323af539e34cf5a65fba1112 Mon Sep 17 00:00:00 2001
+From 03f545b6cf3220b4647677b588e5525a781a4813 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Tue, 1 Nov 2022 17:59:16 +0000
-Subject: [PATCH 11/61] tools/ocaml/xenctrl: Make domain_getinfolist tail
+Subject: [PATCH 34/89] tools/ocaml/xenctrl: Make domain_getinfolist tail
recursive
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -31,10 +31,10 @@ Acked-by: Christian Lindig <christian.lindig@citrix.com>
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
-index 7503031d8f..f10b686215 100644
+index 83e39a8616..85b73a7f6f 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
-@@ -212,14 +212,25 @@ external domain_shutdown: handle -> domid -> shutdown_reason -> unit
+@@ -222,14 +222,25 @@ external domain_shutdown: handle -> domid -> shutdown_reason -> unit
external _domain_getinfolist: handle -> domid -> int -> domaininfo list
= "stub_xc_domain_getinfolist"
diff --git a/0012-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch b/0035-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
similarity index 86%
rename from 0012-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
rename to 0035-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
index fc352ad..6c64355 100644
--- a/0012-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
+++ b/0035-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
@@ -1,7 +1,7 @@
-From 8c66a2d88a9f17e5b5099fcb83231b7a1169ca25 Mon Sep 17 00:00:00 2001
+From 5d8f9cfa166c55a308856e7b021d778350edbd6c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Tue, 1 Nov 2022 17:59:17 +0000
-Subject: [PATCH 12/61] tools/ocaml/xenctrl: Use larger chunksize in
+Subject: [PATCH 35/89] tools/ocaml/xenctrl: Use larger chunksize in
domain_getinfolist
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -24,10 +24,10 @@ Acked-by: Christian Lindig <christian.lindig@citrix.com>
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
-index f10b686215..b40c70d33f 100644
+index 85b73a7f6f..aa650533f7 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
-@@ -223,7 +223,7 @@ let rev_append_fold acc e = List.rev_append e acc
+@@ -233,7 +233,7 @@ let rev_append_fold acc e = List.rev_append e acc
let rev_concat lst = List.fold_left rev_append_fold [] lst
let domain_getinfolist handle first_domain =
diff --git a/0013-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch b/0036-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
similarity index 95%
rename from 0013-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
rename to 0036-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
index a999dd8..d6a324a 100644
--- a/0013-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
+++ b/0036-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
@@ -1,7 +1,7 @@
-From 049d16c8ce900dfc8f4b657849aeb82b95ed857c Mon Sep 17 00:00:00 2001
+From 7d516fc87637dc551494f8eca08f106f578f7112 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Fri, 16 Dec 2022 18:25:10 +0000
-Subject: [PATCH 13/61] tools/ocaml/xb,mmap: Use Data_abstract_val wrapper
+Subject: [PATCH 36/89] tools/ocaml/xb,mmap: Use Data_abstract_val wrapper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0014-tools-ocaml-xb-Drop-Xs_ring.write.patch b/0037-tools-ocaml-xb-Drop-Xs_ring.write.patch
similarity index 95%
rename from 0014-tools-ocaml-xb-Drop-Xs_ring.write.patch
rename to 0037-tools-ocaml-xb-Drop-Xs_ring.write.patch
index 813f041..226ae52 100644
--- a/0014-tools-ocaml-xb-Drop-Xs_ring.write.patch
+++ b/0037-tools-ocaml-xb-Drop-Xs_ring.write.patch
@@ -1,7 +1,7 @@
-From f7c4fab9b50af74d0e1170fbf35367ced48d8209 Mon Sep 17 00:00:00 2001
+From f0e653fb4aea77210b8096c170e82de3c2039d89 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Fri, 16 Dec 2022 18:25:20 +0000
-Subject: [PATCH 14/61] tools/ocaml/xb: Drop Xs_ring.write
+Subject: [PATCH 37/89] tools/ocaml/xb: Drop Xs_ring.write
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0015-tools-oxenstored-validate-config-file-before-live-up.patch b/0038-tools-oxenstored-validate-config-file-before-live-up.patch
similarity index 97%
rename from 0015-tools-oxenstored-validate-config-file-before-live-up.patch
rename to 0038-tools-oxenstored-validate-config-file-before-live-up.patch
index f65fbd6..5b7f58a 100644
--- a/0015-tools-oxenstored-validate-config-file-before-live-up.patch
+++ b/0038-tools-oxenstored-validate-config-file-before-live-up.patch
@@ -1,7 +1,7 @@
-From fd1c70442d3aa962be4d041d5f8fce9d2fa72ce1 Mon Sep 17 00:00:00 2001
+From e74d868b48d55dfb20f5a41ec20fbec93d8e5deb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
Date: Tue, 11 May 2021 15:56:50 +0000
-Subject: [PATCH 15/61] tools/oxenstored: validate config file before live
+Subject: [PATCH 38/89] tools/oxenstored: validate config file before live
update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
diff --git a/0016-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch b/0039-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
similarity index 92%
rename from 0016-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
rename to 0039-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
index a64d657..c967391 100644
--- a/0016-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
+++ b/0039-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
@@ -1,7 +1,7 @@
-From 552e5f28d411c1a1a92f2fd3592a76e74f47610b Mon Sep 17 00:00:00 2001
+From 2c21e1bee6d62cbd523069e839086addf35da9f2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
Date: Thu, 12 Jan 2023 11:28:29 +0000
-Subject: [PATCH 16/61] tools/ocaml/libs: Don't declare stubs as taking void
+Subject: [PATCH 39/89] tools/ocaml/libs: Don't declare stubs as taking void
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -41,7 +41,7 @@ index 3065181a55..97116b0782 100644
CAMLprim value stub_header_of_string(value s)
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index 5b4fe72c8d..434fc0345b 100644
+index f37848ae0b..6eb0ea69da 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -67,9 +67,9 @@ static void Noreturn failwith_xc(xc_interface *xch)
diff --git a/0017-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch b/0040-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
similarity index 87%
rename from 0017-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
rename to 0040-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
index 9fa8d08..5a26683 100644
--- a/0017-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
+++ b/0040-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
@@ -1,7 +1,7 @@
-From 6d66fb984cc768406158353cabf9a55652b0dea7 Mon Sep 17 00:00:00 2001
+From 5797b798a542a7e5be34698463152cb92f18776f Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue, 31 Jan 2023 10:59:42 +0000
-Subject: [PATCH 17/61] tools/ocaml/libs: Allocate the correct amount of memory
+Subject: [PATCH 40/89] tools/ocaml/libs: Allocate the correct amount of memory
for Abstract_tag
caml_alloc() takes units of Wsize (word size), not bytes. As a consequence,
@@ -23,12 +23,12 @@ Acked-by: Christian Lindig <christian.lindig@citrix.com>
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/tools/ocaml/libs/mmap/Makefile b/tools/ocaml/libs/mmap/Makefile
-index df45819df5..a3bd75e33a 100644
+index a621537135..855b8b2c98 100644
--- a/tools/ocaml/libs/mmap/Makefile
+++ b/tools/ocaml/libs/mmap/Makefile
-@@ -2,6 +2,8 @@ TOPLEVEL=$(CURDIR)/../..
- XEN_ROOT=$(TOPLEVEL)/../..
- include $(TOPLEVEL)/common.make
+@@ -2,6 +2,8 @@ OCAML_TOPLEVEL=$(CURDIR)/../..
+ XEN_ROOT=$(OCAML_TOPLEVEL)/../..
+ include $(OCAML_TOPLEVEL)/common.make
+CFLAGS += $(CFLAGS_xeninclude)
+
@@ -60,10 +60,10 @@ index e03951d781..d623ad390e 100644
if (mmap_interface_init(Intf_val(result), Int_val(fd),
c_pflag, c_mflag,
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index 434fc0345b..ec64341a9a 100644
+index 6eb0ea69da..e25367531b 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
-@@ -940,7 +940,10 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
+@@ -956,7 +956,10 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
uint32_t c_dom;
unsigned long c_mfn;
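
The allocation fix above hinges on caml_alloc() counting words, not bytes. A hedged standalone illustration of sizing an Abstract_tag block for a C struct follows; the struct fields and stub name are made up, and the real patch may size things differently:

    #include <caml/mlvalues.h>
    #include <caml/alloc.h>
    #include <caml/memory.h>

    struct mmap_interface { void *addr; int len; };

    CAMLprim value stub_alloc_interface(value unit)
    {
        CAMLparam1(unit);
        CAMLlocal1(result);
        /* Round the byte size up to whole words; passing sizeof() directly
         * would over-allocate by a factor of the word size. */
        mlsize_t words =
            (sizeof(struct mmap_interface) + sizeof(value) - 1) / sizeof(value);

        result = caml_alloc(words, Abstract_tag);
        CAMLreturn(result);
    }
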
diff --git a/0018-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch b/0041-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
similarity index 98%
rename from 0018-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
rename to 0041-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
index 8e1c860..cabcdd0 100644
--- a/0018-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
+++ b/0041-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
@@ -1,7 +1,7 @@
-From e18faeb91e620624106b94c8821f8c9574eddb17 Mon Sep 17 00:00:00 2001
+From 021b82cc0c71ba592439f175c1ededa800b172a9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
Date: Thu, 12 Jan 2023 17:48:29 +0000
-Subject: [PATCH 18/61] tools/ocaml/evtchn: Don't reference Custom objects with
+Subject: [PATCH 41/89] tools/ocaml/evtchn: Don't reference Custom objects with
the GC lock released
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
diff --git a/0019-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch b/0042-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
similarity index 89%
rename from 0019-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
rename to 0042-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
index 5571446..ac3e86d 100644
--- a/0019-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
+++ b/0042-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
@@ -1,7 +1,7 @@
-From 854013084e2c6267af7787df8b35d85646f79a54 Mon Sep 17 00:00:00 2001
+From afdcc108566e5a4ee352b6427c98ebad6885a81d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
Date: Thu, 12 Jan 2023 11:38:38 +0000
-Subject: [PATCH 19/61] tools/ocaml/xc: Fix binding for
+Subject: [PATCH 42/89] tools/ocaml/xc: Fix binding for
xc_domain_assign_device()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -28,10 +28,10 @@ Acked-by: Christian Lindig <christian.lindig@citrix.com>
1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index ec64341a9a..e2efcbe182 100644
+index e25367531b..f376d94334 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
-@@ -1123,17 +1123,12 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
+@@ -1139,17 +1139,12 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
}
@@ -52,7 +52,7 @@ index ec64341a9a..e2efcbe182 100644
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
-@@ -1141,10 +1136,8 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+@@ -1157,10 +1152,8 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
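
The binding fix above passes a PCI segment:bus:device.function tuple down as one integer. As general background, the conventional SBDF packing looks like the sketch below; this is a generic illustration, and the exact helper used by the patch may lay the bits out differently:

    #include <stdint.h>

    /* Pack segment(16) | bus(8) | device(5) | function(3) into 32 bits. */
    static uint32_t encode_sbdf_example(unsigned int seg, unsigned int bus,
                                        unsigned int dev, unsigned int func)
    {
        return ((seg & 0xffff) << 16) | ((bus & 0xff) << 8) |
               ((dev & 0x1f) << 3) | (func & 0x7);
    }
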
diff --git a/0020-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch b/0043-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
similarity index 91%
rename from 0020-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
rename to 0043-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
index a829d36..b7fec46 100644
--- a/0020-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
+++ b/0043-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
@@ -1,7 +1,7 @@
-From 1fdff77e26290ae1ed40e8253959d12a0c4b3d3f Mon Sep 17 00:00:00 2001
+From bf935b1ff7cc76b2d25f877e56a359afaafcac1f Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue, 31 Jan 2023 17:19:30 +0000
-Subject: [PATCH 20/61] tools/ocaml/xc: Don't reference Abstract_Tag objects
+Subject: [PATCH 43/89] tools/ocaml/xc: Don't reference Abstract_Tag objects
with the GC lock released
The intf->{addr,len} references in the xc_map_foreign_range() call are unsafe.
@@ -30,10 +30,10 @@ Acked-by: Christian Lindig <christian.lindig@citrix.com>
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index e2efcbe182..0a0fe45c54 100644
+index f376d94334..facb561577 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
-@@ -937,26 +937,25 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
+@@ -953,26 +953,25 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
CAMLparam4(xch, dom, size, mfn);
CAMLlocal1(result);
struct mmap_interface *intf;
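
Both GC-lock patches above follow the same rule: copy whatever is needed out of OCaml-managed memory while the runtime lock is still held, because once it is released the GC may run concurrently. A standalone sketch of that pattern, assuming OCaml 4.08+ for Data_abstract_val; do_blocking_call() and the field layout are made up for illustration:

    #include <caml/mlvalues.h>
    #include <caml/memory.h>
    #include <caml/signals.h>

    struct mmap_interface { void *addr; int len; };
    #define Intf_val(v) ((struct mmap_interface *)Data_abstract_val(v))

    extern int do_blocking_call(void *addr, int len);   /* placeholder */

    CAMLprim value stub_do_io(value intf)
    {
        CAMLparam1(intf);
        /* Snapshot the fields while still holding the runtime lock. */
        void *addr = Intf_val(intf)->addr;
        int len = Intf_val(intf)->len;
        int rc;

        caml_enter_blocking_section();
        rc = do_blocking_call(addr, len);   /* must not touch 'intf' here */
        caml_leave_blocking_section();

        CAMLreturn(Val_int(rc));
    }
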
diff --git a/0021-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch b/0044-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
similarity index 94%
rename from 0021-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
rename to 0044-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
index 8ed7dfa..8876ab7 100644
--- a/0021-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
+++ b/0044-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
@@ -1,7 +1,7 @@
-From 1b6acdeeb2323c53d841356da50440e274e7bf9a Mon Sep 17 00:00:00 2001
+From 587823eca162d063027faf1826ec3544f0a06e78 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Wed, 1 Feb 2023 11:27:42 +0000
-Subject: [PATCH 21/61] tools/ocaml/libs: Fix memory/resource leaks with
+Subject: [PATCH 44/89] tools/ocaml/libs: Fix memory/resource leaks with
caml_alloc_custom()
All caml_alloc_*() functions can throw exceptions, and longjump out of
diff --git a/0022-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch b/0045-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
similarity index 83%
rename from 0022-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
rename to 0045-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
index 1d1edb0..1720bdd 100644
--- a/0022-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
+++ b/0045-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
@@ -1,7 +1,7 @@
-From d4e286db89d80c862b4a24bf971dd71008c8b53e Mon Sep 17 00:00:00 2001
+From 3685e754e6017c616769b28133286d06bf07b613 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 8 Sep 2022 21:27:58 +0100
-Subject: [PATCH 22/61] x86/spec-ctrl: Mitigate Cross-Thread Return Address
+Subject: [PATCH 45/89] x86/spec-ctrl: Mitigate Cross-Thread Return Address
Predictions
This is XSA-426 / CVE-2022-27672
@@ -10,17 +10,17 @@ Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 63305e5392ec2d17b85e7996a97462744425db80)
---
- docs/misc/xen-command-line.pandoc | 2 +-
- xen/arch/x86/spec_ctrl.c | 31 ++++++++++++++++++++++++++++---
- xen/include/asm-x86/cpufeatures.h | 3 ++-
- xen/include/asm-x86/spec_ctrl.h | 15 +++++++++++++++
+ docs/misc/xen-command-line.pandoc | 2 +-
+ xen/arch/x86/include/asm/cpufeatures.h | 3 ++-
+ xen/arch/x86/include/asm/spec_ctrl.h | 15 +++++++++++++
+ xen/arch/x86/spec_ctrl.c | 31 +++++++++++++++++++++++---
4 files changed, 46 insertions(+), 5 deletions(-)
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index bd6826d0ae..b3f60cd923 100644
+index 424b12cfb2..e7fe8b0cc9 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
-@@ -2275,7 +2275,7 @@ guests to use.
+@@ -2343,7 +2343,7 @@ guests to use.
on entry and exit. These blocks are necessary to virtualise support for
guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
* `rsb=` offers control over whether to overwrite the Return Stack Buffer /
@@ -29,11 +29,51 @@ index bd6826d0ae..b3f60cd923 100644
* `md-clear=` offers control over whether to use VERW to flush
microarchitectural buffers on idle and exit from Xen. *Note: For
compatibility with development versions of this fix, `mds=` is also accepted
+diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
+index 865f110986..da0593de85 100644
+--- a/xen/arch/x86/include/asm/cpufeatures.h
++++ b/xen/arch/x86/include/asm/cpufeatures.h
+@@ -35,7 +35,8 @@ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM
+ XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
+ XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
+ XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
+-/* Bits 23,24 unused. */
++/* Bits 23 unused. */
++XEN_CPUFEATURE(SC_RSB_IDLE, X86_SYNTH(24)) /* RSB overwrite needed for idle. */
+ XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
+ XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
+ XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
+diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
+index 6a77c39378..391973ef6a 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl.h
++++ b/xen/arch/x86/include/asm/spec_ctrl.h
+@@ -159,6 +159,21 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
+ */
+ alternative_input("", "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
+ [sel] "m" (info->verw_sel));
++
++ /*
++ * Cross-Thread Return Address Predictions:
++ *
++ * On vulnerable systems, the return predictions (RSB/RAS) are statically
++ * partitioned between active threads. When entering idle, our entries
++ * are re-partitioned to allow the other threads to use them.
++ *
++ * In some cases, we might still have guest entries in the RAS, so flush
++ * them before injecting them sideways to our sibling thread.
++ *
++ * (ab)use alternative_input() to specify clobbers.
++ */
++ alternative_input("", "DO_OVERWRITE_RSB", X86_FEATURE_SC_RSB_IDLE,
++ : "rax", "rcx");
+ }
+
+ /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 90d86fe5cb..14649d92f5 100644
+index a320b81947..e80e2a5ed1 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
-@@ -1317,13 +1317,38 @@ void __init init_speculation_mitigations(void)
+@@ -1327,13 +1327,38 @@ void __init init_speculation_mitigations(void)
* 3) Some CPUs have RSBs which are not full width, which allow the
* attacker's entries to alias Xen addresses.
*
@@ -75,46 +115,6 @@ index 90d86fe5cb..14649d92f5 100644
if ( opt_rsb_pv )
{
-diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
-index ecc1bb0950..ccf9d7287c 100644
---- a/xen/include/asm-x86/cpufeatures.h
-+++ b/xen/include/asm-x86/cpufeatures.h
-@@ -35,7 +35,8 @@ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM
- XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
- XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
- XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
--/* Bits 23,24 unused. */
-+/* Bits 23 unused. */
-+XEN_CPUFEATURE(SC_RSB_IDLE, X86_SYNTH(24)) /* RSB overwrite needed for idle. */
- XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
- XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
- XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
-index 6a77c39378..391973ef6a 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
-@@ -159,6 +159,21 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
- */
- alternative_input("", "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
- [sel] "m" (info->verw_sel));
-+
-+ /*
-+ * Cross-Thread Return Address Predictions:
-+ *
-+ * On vulnerable systems, the return predictions (RSB/RAS) are statically
-+ * partitioned between active threads. When entering idle, our entries
-+ * are re-partitioned to allow the other threads to use them.
-+ *
-+ * In some cases, we might still have guest entries in the RAS, so flush
-+ * them before injecting them sideways to our sibling thread.
-+ *
-+ * (ab)use alternative_input() to specify clobbers.
-+ */
-+ alternative_input("", "DO_OVERWRITE_RSB", X86_FEATURE_SC_RSB_IDLE,
-+ : "rax", "rcx");
- }
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
--
2.40.0
diff --git a/0023-automation-Remove-clang-8-from-Debian-unstable-conta.patch b/0046-automation-Remove-clang-8-from-Debian-unstable-conta.patch
similarity index 90%
rename from 0023-automation-Remove-clang-8-from-Debian-unstable-conta.patch
rename to 0046-automation-Remove-clang-8-from-Debian-unstable-conta.patch
index 36dfb4f..6fc3323 100644
--- a/0023-automation-Remove-clang-8-from-Debian-unstable-conta.patch
+++ b/0046-automation-Remove-clang-8-from-Debian-unstable-conta.patch
@@ -1,7 +1,7 @@
-From 0802504627453a54b1ab408b6e9dc8b5c561172d Mon Sep 17 00:00:00 2001
+From aaf74a532c02017998492c0bf60a9c6be3332f20 Mon Sep 17 00:00:00 2001
From: Anthony PERARD <anthony.perard@citrix.com>
Date: Tue, 21 Feb 2023 16:55:38 +0000
-Subject: [PATCH 23/61] automation: Remove clang-8 from Debian unstable
+Subject: [PATCH 46/89] automation: Remove clang-8 from Debian unstable
container
First, apt complain that it isn't the right way to add keys anymore,
@@ -39,10 +39,10 @@ index dc119fa0b4..0000000000
-deb http://apt.llvm.org/unstable/ llvm-toolchain-8 main
-deb-src http://apt.llvm.org/unstable/ llvm-toolchain-8 main
diff --git a/automation/build/debian/unstable.dockerfile b/automation/build/debian/unstable.dockerfile
-index bd61cd12c2..828afa2e1e 100644
+index 9de766d596..b560337b7a 100644
--- a/automation/build/debian/unstable.dockerfile
+++ b/automation/build/debian/unstable.dockerfile
-@@ -52,15 +52,3 @@ RUN apt-get update && \
+@@ -51,15 +51,3 @@ RUN apt-get update && \
apt-get autoremove -y && \
apt-get clean && \
rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
@@ -59,10 +59,10 @@ index bd61cd12c2..828afa2e1e 100644
- apt-get clean && \
- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index fdd5c76582..06a75a8c5a 100644
+index 716ee0b1e4..bed161b471 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
-@@ -304,16 +304,6 @@ debian-unstable-clang-debug:
+@@ -312,16 +312,6 @@ debian-unstable-clang-debug:
variables:
CONTAINER: debian:unstable
diff --git a/0024-libs-util-Fix-parallel-build-between-flex-bison-and-.patch b/0047-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
similarity index 81%
rename from 0024-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
rename to 0047-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
index 6164878..f3e6d36 100644
--- a/0024-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
+++ b/0047-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
@@ -1,7 +1,7 @@
-From e4b5dff3d06421847761669a3676bef1f23e705a Mon Sep 17 00:00:00 2001
+From c622b8ace93cc38c73f47f5044dc3663ef93f815 Mon Sep 17 00:00:00 2001
From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Fri, 3 Mar 2023 08:06:23 +0100
-Subject: [PATCH 24/61] libs/util: Fix parallel build between flex/bison and CC
+Date: Fri, 3 Mar 2023 07:55:24 +0100
+Subject: [PATCH 47/89] libs/util: Fix parallel build between flex/bison and CC
rules
flex/bison generate two targets, and when those targets are
@@ -27,12 +27,12 @@ master date: 2023-02-09 18:26:17 +0000
1 file changed, 8 insertions(+)
diff --git a/tools/libs/util/Makefile b/tools/libs/util/Makefile
-index b739360be7..977849c056 100644
+index 493d2e00be..fee4ea0dc7 100644
--- a/tools/libs/util/Makefile
+++ b/tools/libs/util/Makefile
-@@ -41,6 +41,14 @@ include $(XEN_ROOT)/tools/libs/libs.mk
+@@ -40,6 +40,14 @@ include $(XEN_ROOT)/tools/libs/libs.mk
- $(LIB_OBJS) $(PIC_OBJS): $(AUTOINCS) _paths.h
+ $(OBJS-y) $(PIC_OBJS): $(AUTOINCS)
+# Adding the .c conterparts of the headers generated by flex/bison as
+# prerequisite of all objects.
@@ -40,7 +40,7 @@ index b739360be7..977849c056 100644
+# header, it should still wait for the .c file to be rebuilt.
+# Otherwise, make doesn't considered "%.c %.h" as grouped targets, and will run
+# the flex/bison rules in parallel of CC rules which only need the header.
-+$(LIB_OBJS) $(PIC_OBJS): libxlu_cfg_l.c libxlu_cfg_y.c libxlu_disk_l.c
++$(OBJS-y) $(PIC_OBJS): libxlu_cfg_l.c libxlu_cfg_y.c libxlu_disk_l.c
+
%.c %.h:: %.y
@rm -f $*.[ch]
diff --git a/0025-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch b/0048-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
similarity index 77%
rename from 0025-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
rename to 0048-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
index e73f62d..46c48de 100644
--- a/0025-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
+++ b/0048-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
@@ -1,7 +1,7 @@
-From 2094f834b85d32233c76763b014bc8764c3e36b1 Mon Sep 17 00:00:00 2001
+From cdc23d47ad85e756540eaa8655ebc2a0445612ed Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:06:44 +0100
-Subject: [PATCH 25/61] x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
+Date: Fri, 3 Mar 2023 07:55:54 +0100
+Subject: [PATCH 48/89] x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
We don't actually need ecx yet, but adding it in now will reduce the amount to
which leaf 7 is out of order in a featureset.
@@ -14,15 +14,15 @@ master date: 2023-02-09 18:26:17 +0000
tools/misc/xen-cpuid.c | 10 ++++++++++
xen/arch/x86/cpu/common.c | 3 ++-
xen/include/public/arch-x86/cpufeatureset.h | 4 ++++
- xen/include/xen/lib/x86/cpuid.h | 17 +++++++++++++++--
- 4 files changed, 31 insertions(+), 3 deletions(-)
+ xen/include/xen/lib/x86/cpuid.h | 15 ++++++++++++++-
+ 4 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index cd094427dd..3cfbbf043f 100644
+index d5833e9ce8..addb3a39a1 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
-@@ -198,6 +198,14 @@ static const char *const str_7b1[32] =
- {
+@@ -202,6 +202,14 @@ static const char *const str_7b1[32] =
+ [ 0] = "ppin",
};
+static const char *const str_7c1[32] =
@@ -36,7 +36,7 @@ index cd094427dd..3cfbbf043f 100644
static const char *const str_7d2[32] =
{
[ 0] = "intel-psfd",
-@@ -223,6 +231,8 @@ static const struct {
+@@ -229,6 +237,8 @@ static const struct {
{ "0x80000021.eax", "e21a", str_e21a },
{ "0x00000007:1.ebx", "7b1", str_7b1 },
{ "0x00000007:2.edx", "7d2", str_7d2 },
@@ -46,10 +46,10 @@ index cd094427dd..3cfbbf043f 100644
#define COL_ALIGN "18"
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
-index 9ce148a666..8222de6461 100644
+index 0412dbc915..b3fcf4680f 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
-@@ -448,7 +448,8 @@ static void generic_identify(struct cpuinfo_x86 *c)
+@@ -450,7 +450,8 @@ static void generic_identify(struct cpuinfo_x86 *c)
cpuid_count(7, 1,
&c->x86_capability[FEATURESET_7a1],
&c->x86_capability[FEATURESET_7b1],
@@ -60,12 +60,12 @@ index 9ce148a666..8222de6461 100644
cpuid_count(7, 2,
&tmp, &tmp, &tmp,
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index e073122140..0b01ca5e8f 100644
+index 7915f5826f..f43cdcd0f9 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -304,6 +304,10 @@ XEN_CPUFEATURE(NSCB, 11*32+ 6) /*A Null Selector Clears Base (and
- /* Intel-defined CPU features, CPUID level 0x00000007:2.edx, word 13 */
- XEN_CPUFEATURE(INTEL_PSFD, 13*32+ 0) /*A MSR_SPEC_CTRL.PSFD */
+@@ -295,6 +295,10 @@ XEN_CPUFEATURE(RRSBA_CTRL, 13*32+ 2) /* MSR_SPEC_CTRL.RRSBA_DIS_* */
+ XEN_CPUFEATURE(BHI_CTRL, 13*32+ 4) /* MSR_SPEC_CTRL.BHI_DIS_S */
+ XEN_CPUFEATURE(MCDT_NO, 13*32+ 5) /*A MCDT_NO */
+/* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */
+
@@ -75,15 +75,13 @@ index e073122140..0b01ca5e8f 100644
/* Clean up from a default include. Close the enum (for C). */
diff --git a/xen/include/xen/lib/x86/cpuid.h b/xen/include/xen/lib/x86/cpuid.h
-index 50be07c0eb..fa98b371ee 100644
+index 73a5c33036..fa98b371ee 100644
--- a/xen/include/xen/lib/x86/cpuid.h
+++ b/xen/include/xen/lib/x86/cpuid.h
-@@ -17,7 +17,9 @@
- #define FEATURESET_7a1 10 /* 0x00000007:1.eax */
+@@ -18,6 +18,8 @@
#define FEATURESET_e21a 11 /* 0x80000021.eax */
#define FEATURESET_7b1 12 /* 0x00000007:1.ebx */
--#define FEATURESET_7d2 13 /* 0x80000007:2.edx */
-+#define FEATURESET_7d2 13 /* 0x00000007:2.edx */
+ #define FEATURESET_7d2 13 /* 0x00000007:2.edx */
+#define FEATURESET_7c1 14 /* 0x00000007:1.ecx */
+#define FEATURESET_7d1 15 /* 0x00000007:1.edx */
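
The featureset plumbing above is sourced from CPUID leaf 7 sub-leaves. As a standalone illustration of how those registers are enumerated, here is a user-space sketch using the compiler-provided cpuid.h helper, not Xen code:

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx, max_subleaf;

        /* Sub-leaf 0 of leaf 7 reports the highest valid sub-leaf in EAX. */
        if ( !__get_cpuid_count(7, 0, &max_subleaf, &ebx, &ecx, &edx) )
            return 1;

        printf("leaf 7 max sub-leaf: %u\n", max_subleaf);

        if ( max_subleaf >= 1 &&
             __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) )
            printf("7:1 ecx=%08x edx=%08x\n", ecx, edx);  /* words 7c1/7d1 */

        return 0;
    }
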
diff --git a/0026-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch b/0049-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
similarity index 83%
rename from 0026-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
rename to 0049-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
index 7fd4031..a34217e 100644
--- a/0026-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
+++ b/0049-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
@@ -1,7 +1,7 @@
-From 5857cc632b884711c172c5766b8fbba59f990b47 Mon Sep 17 00:00:00 2001
+From 8202b9cf84674c5b23a89c4b8722afbb9787f917 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:12:24 +0100
-Subject: [PATCH 26/61] x86/shskt: Disable CET-SS on parts susceptible to
+Date: Fri, 3 Mar 2023 07:56:16 +0100
+Subject: [PATCH 49/89] x86/shskt: Disable CET-SS on parts susceptible to
fractured updates
Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack
@@ -36,13 +36,13 @@ master date: 2023-02-09 18:26:17 +0000
docs/misc/xen-command-line.pandoc | 7 +++-
tools/libs/light/libxl_cpuid.c | 2 +
tools/misc/xen-cpuid.c | 1 +
- xen/arch/x86/cpu/common.c | 8 +++-
+ xen/arch/x86/cpu/common.c | 11 ++++-
xen/arch/x86/setup.c | 46 +++++++++++++++++----
xen/include/public/arch-x86/cpufeatureset.h | 1 +
- 6 files changed, 55 insertions(+), 10 deletions(-)
+ 6 files changed, 57 insertions(+), 11 deletions(-)
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index b3f60cd923..a6018fd5c3 100644
+index e7fe8b0cc9..807ca51fb2 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -287,10 +287,15 @@ can be maintained with the pv-shim mechanism.
@@ -63,23 +63,23 @@ index b3f60cd923..a6018fd5c3 100644
its own protection.
diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
-index 691d5c6b2a..b4eacc2bd5 100644
+index 2aa23225f4..d97a2f3338 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
-@@ -234,6 +234,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
+@@ -235,6 +235,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
{"fsrs", 0x00000007, 1, CPUID_REG_EAX, 11, 1},
{"fsrcs", 0x00000007, 1, CPUID_REG_EAX, 12, 1},
+ {"cet-sss", 0x00000007, 1, CPUID_REG_EDX, 18, 1},
+
{"intel-psfd", 0x00000007, 2, CPUID_REG_EDX, 0, 1},
+ {"mcdt-no", 0x00000007, 2, CPUID_REG_EDX, 5, 1},
- {"lahfsahf", 0x80000001, NA, CPUID_REG_ECX, 0, 1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index 3cfbbf043f..db9c4ed8fc 100644
+index addb3a39a1..0248eaef44 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
-@@ -204,6 +204,7 @@ static const char *const str_7c1[32] =
+@@ -208,6 +208,7 @@ static const char *const str_7c1[32] =
static const char *const str_7d1[32] =
{
@@ -88,31 +88,35 @@ index 3cfbbf043f..db9c4ed8fc 100644
static const char *const str_7d2[32] =
diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
-index 8222de6461..e1fc034ce6 100644
+index b3fcf4680f..27f73d3bbe 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
-@@ -344,9 +344,15 @@ void __init early_cpu_init(void)
+@@ -346,11 +346,18 @@ void __init early_cpu_init(void)
+ x86_cpuid_vendor_to_str(c->x86_vendor), c->x86, c->x86,
c->x86_model, c->x86_model, c->x86_mask, eax);
- if (c->cpuid_level >= 7) {
-- cpuid_count(7, 0, &eax, &ebx, &ecx, &edx);
+- if (c->cpuid_level >= 7)
+- cpuid_count(7, 0, &eax, &ebx,
++ if (c->cpuid_level >= 7) {
+ uint32_t max_subleaf;
+
-+ cpuid_count(7, 0, &max_subleaf, &ebx, &ecx, &edx);
- c->x86_capability[cpufeat_word(X86_FEATURE_CET_SS)] = ecx;
- c->x86_capability[cpufeat_word(X86_FEATURE_CET_IBT)] = edx;
-+
++ cpuid_count(7, 0, &max_subleaf, &ebx,
+ &c->x86_capability[FEATURESET_7c0],
+ &c->x86_capability[FEATURESET_7d0]);
+
+ if (max_subleaf >= 1)
+ cpuid_count(7, 1, &eax, &ebx, &ecx,
+ &c->x86_capability[FEATURESET_7d1]);
- }
-
++ }
++
eax = cpuid_eax(0x80000000);
+ if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
+ ebx = eax >= 0x8000001f ? cpuid_ebx(0x8000001f) : 0;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
-index 70b37d8afe..f0de805780 100644
+index e05189f649..09c17b1016 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
-@@ -98,11 +98,7 @@ unsigned long __initdata highmem_start;
+@@ -95,11 +95,7 @@ unsigned long __initdata highmem_start;
size_param("highmem-start", highmem_start);
#endif
@@ -125,7 +129,7 @@ index 70b37d8afe..f0de805780 100644
#ifdef CONFIG_XEN_IBT
static bool __initdata opt_xen_ibt = true;
-@@ -1113,11 +1109,45 @@ void __init noreturn __start_xen(unsigned long mbi_p)
+@@ -1104,11 +1100,45 @@ void __init noreturn __start_xen(unsigned long mbi_p)
early_cpu_init();
/* Choose shadow stack early, to set infrastructure up appropriately. */
@@ -175,10 +179,10 @@ index 70b37d8afe..f0de805780 100644
if ( opt_xen_ibt && boot_cpu_has(X86_FEATURE_CET_IBT) )
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index 0b01ca5e8f..4832ad09df 100644
+index f43cdcd0f9..08600cfdc7 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -307,6 +307,7 @@ XEN_CPUFEATURE(INTEL_PSFD, 13*32+ 0) /*A MSR_SPEC_CTRL.PSFD */
+@@ -298,6 +298,7 @@ XEN_CPUFEATURE(MCDT_NO, 13*32+ 5) /*A MCDT_NO */
/* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */
/* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */
diff --git a/0027-credit2-respect-credit2_runqueue-all-when-arranging-.patch b/0050-credit2-respect-credit2_runqueue-all-when-arranging-.patch
similarity index 87%
rename from 0027-credit2-respect-credit2_runqueue-all-when-arranging-.patch
rename to 0050-credit2-respect-credit2_runqueue-all-when-arranging-.patch
index 6c8ab5c..0444aa9 100644
--- a/0027-credit2-respect-credit2_runqueue-all-when-arranging-.patch
+++ b/0050-credit2-respect-credit2_runqueue-all-when-arranging-.patch
@@ -1,8 +1,8 @@
-From 366693226ce025e8721626609b4b43b9061b55f5 Mon Sep 17 00:00:00 2001
+From 74b76704fd4059e9133e84c1384501858e9663b7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
<marmarek@invisiblethingslab.com>
-Date: Fri, 3 Mar 2023 08:13:20 +0100
-Subject: [PATCH 27/61] credit2: respect credit2_runqueue=all when arranging
+Date: Fri, 3 Mar 2023 07:57:39 +0100
+Subject: [PATCH 50/89] credit2: respect credit2_runqueue=all when arranging
runqueues
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -28,10 +28,10 @@ master date: 2023-02-15 16:12:42 +0100
2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index a6018fd5c3..7b7a619c1b 100644
+index 807ca51fb2..5be5ce10c6 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
-@@ -724,6 +724,11 @@ Available alternatives, with their meaning, are:
+@@ -726,6 +726,11 @@ Available alternatives, with their meaning, are:
* `all`: just one runqueue shared by all the logical pCPUs of
the host
@@ -42,9 +42,9 @@ index a6018fd5c3..7b7a619c1b 100644
+
### dbgp
> `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
-
+ > `= xhci[ <integer> | @pci<bus>:<slot>.<func> ][,share=<bool>|hwdom]`
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
-index 6396b38e04..1a240f417a 100644
+index 0e3f89e537..ae55feea34 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -996,9 +996,14 @@ cpu_add_to_runqueue(const struct scheduler *ops, unsigned int cpu)
diff --git a/0051-build-make-FILE-symbol-paths-consistent.patch b/0051-build-make-FILE-symbol-paths-consistent.patch
new file mode 100644
index 0000000..47528c2
--- /dev/null
+++ b/0051-build-make-FILE-symbol-paths-consistent.patch
@@ -0,0 +1,42 @@
+From 46c104cce0bf340193cb1eacaee5dcd75e264c8f Mon Sep 17 00:00:00 2001
+From: Ross Lagerwall <ross.lagerwall@citrix.com>
+Date: Fri, 3 Mar 2023 07:58:12 +0100
+Subject: [PATCH 51/89] build: make FILE symbol paths consistent
+
+The FILE symbols in out-of-tree builds may be either a relative path to
+the object dir or an absolute path depending on how the build is
+invoked. Fix the paths for C files so that they are consistent with
+in-tree builds - the path is relative to the "xen" directory (e.g.
+common/irq.c).
+
+This fixes livepatch builds when the original Xen build was out-of-tree
+since livepatch-build always does in-tree builds. Note that this doesn't
+fix the behaviour for Clang < 6 which always embeds full paths.
+
+Fixes: 7115fa562fe7 ("build: adding out-of-tree support to the xen build")
+Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 5b9bb91abba7c983def3b4bef71ab08ad360a242
+master date: 2023-02-15 16:13:49 +0100
+---
+ xen/Rules.mk | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/Rules.mk b/xen/Rules.mk
+index 70b7489ea8..d6b7cec0a8 100644
+--- a/xen/Rules.mk
++++ b/xen/Rules.mk
+@@ -228,8 +228,9 @@ quiet_cmd_cc_o_c = CC $@
+ ifeq ($(CONFIG_ENFORCE_UNIQUE_SYMBOLS),y)
+ cmd_cc_o_c = $(CC) $(c_flags) -c $< -o $(dot-target).tmp -MQ $@
+ ifneq ($(CONFIG_CC_IS_CLANG)$(call clang-ifversion,-lt,600,y),yy)
++ rel-path = $(patsubst $(abs_srctree)/%,%,$(call realpath,$(1)))
+ cmd_objcopy_fix_sym = \
+- $(OBJCOPY) --redefine-sym $(<F)=$< $(dot-target).tmp $@ && rm -f $(dot-target).tmp
++ $(OBJCOPY) --redefine-sym $(<F)=$(call rel-path,$<) $(dot-target).tmp $@ && rm -f $(dot-target).tmp
+ else
+ cmd_objcopy_fix_sym = mv -f $(dot-target).tmp $@
+ endif
+--
+2.40.0
+
diff --git a/0028-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch b/0052-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
similarity index 85%
rename from 0028-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
rename to 0052-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
index 55df5d0..22a214b 100644
--- a/0028-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
+++ b/0052-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
@@ -1,7 +1,7 @@
-From d1c6934b41f8288ea3169e63bce8a7eea9d9c549 Mon Sep 17 00:00:00 2001
+From e9a7942f6c1638c668605fbf6d6e02bc7bff2582 Mon Sep 17 00:00:00 2001
From: Sergey Dyasli <sergey.dyasli@citrix.com>
-Date: Fri, 3 Mar 2023 08:14:01 +0100
-Subject: [PATCH 28/61] x86/ucode/AMD: apply the patch early on every logical
+Date: Fri, 3 Mar 2023 07:58:35 +0100
+Subject: [PATCH 52/89] x86/ucode/AMD: apply the patch early on every logical
thread
The original issue has been reported on AMD Bulldozer-based CPUs where
@@ -32,13 +32,13 @@ master commit: f4ef8a41b80831db2136bdaff9f946a1a4b051e7
master date: 2023-02-21 15:08:05 +0100
---
xen/arch/x86/cpu/microcode/amd.c | 11 ++++++++---
- xen/arch/x86/cpu/microcode/core.c | 24 ++++++++++++++++--------
+ xen/arch/x86/cpu/microcode/core.c | 26 +++++++++++++++++---------
xen/arch/x86/cpu/microcode/intel.c | 10 +++++++---
xen/arch/x86/cpu/microcode/private.h | 3 ++-
- 4 files changed, 33 insertions(+), 15 deletions(-)
+ 4 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
-index fe92e594f1..52182c1a23 100644
+index 8195707ee1..ded8fe90e6 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -176,8 +176,8 @@ static enum microcode_match_result compare_revisions(
@@ -52,7 +52,7 @@ index fe92e594f1..52182c1a23 100644
return OLD_UCODE;
}
-@@ -220,8 +220,13 @@ static int apply_microcode(const struct microcode_patch *patch)
+@@ -220,8 +220,13 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
unsigned int cpu = smp_processor_id();
struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
uint32_t rev, old_rev = sig->rev;
@@ -68,15 +68,16 @@ index fe92e594f1..52182c1a23 100644
if ( check_final_patch_levels(sig) )
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index ac3ceb567c..ceec1f1edc 100644
+index 452a7ca773..57ecc5358b 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -608,16 +608,24 @@ static long microcode_update_helper(void *data)
+@@ -610,17 +610,25 @@ static long cf_check microcode_update_helper(void *data)
* that ucode revision.
*/
spin_lock(&microcode_mutex);
- if ( microcode_cache &&
-- microcode_ops->compare_patch(patch, microcode_cache) != NEW_UCODE )
+- alternative_call(ucode_ops.compare_patch,
+- patch, microcode_cache) != NEW_UCODE )
+ if ( microcode_cache )
{
- spin_unlock(&microcode_mutex);
@@ -87,7 +88,8 @@ index ac3ceb567c..ceec1f1edc 100644
+ enum microcode_match_result result;
- goto put;
-+ result = microcode_ops->compare_patch(patch, microcode_cache);
++ result = alternative_call(ucode_ops.compare_patch, patch,
++ microcode_cache);
+
+ if ( result != NEW_UCODE &&
+ !(opt_ucode_allow_same && result == SAME_UCODE) )
@@ -105,7 +107,7 @@ index ac3ceb567c..ceec1f1edc 100644
spin_unlock(&microcode_mutex);
diff --git a/xen/arch/x86/cpu/microcode/intel.c b/xen/arch/x86/cpu/microcode/intel.c
-index f6d01490e0..c26fbb8cc7 100644
+index f5ba6d76d7..cb08f63d2e 100644
--- a/xen/arch/x86/cpu/microcode/intel.c
+++ b/xen/arch/x86/cpu/microcode/intel.c
@@ -232,8 +232,8 @@ static enum microcode_match_result compare_revisions(
@@ -119,7 +121,7 @@ index f6d01490e0..c26fbb8cc7 100644
/*
* Treat pre-production as always applicable - anyone using pre-production
-@@ -290,8 +290,12 @@ static int apply_microcode(const struct microcode_patch *patch)
+@@ -290,8 +290,12 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
unsigned int cpu = smp_processor_id();
struct cpu_signature *sig = &this_cpu(cpu_sig);
uint32_t rev, old_rev = sig->rev;
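The hunks above widen the revision check so that reloading the currently running revision can be accepted when the allow-same option (opt_ucode_allow_same in the hunk) is set. The comparison boils down to something like this sketch; the enum names follow the patch, the function body is illustrative:

    #include <stdint.h>

    enum microcode_match_result { OLD_UCODE, SAME_UCODE, NEW_UCODE };

    static enum microcode_match_result compare_revisions(uint32_t old_rev,
                                                          uint32_t new_rev)
    {
        if ( new_rev > old_rev )
            return NEW_UCODE;

        if ( new_rev == old_rev )
            return SAME_UCODE;   /* accepted only with opt_ucode_allow_same */

        return OLD_UCODE;
    }
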
diff --git a/0029-x86-perform-mem_sharing-teardown-before-paging-teard.patch b/0053-x86-perform-mem_sharing-teardown-before-paging-teard.patch
similarity index 89%
rename from 0029-x86-perform-mem_sharing-teardown-before-paging-teard.patch
rename to 0053-x86-perform-mem_sharing-teardown-before-paging-teard.patch
index c96f44e..934c0f5 100644
--- a/0029-x86-perform-mem_sharing-teardown-before-paging-teard.patch
+++ b/0053-x86-perform-mem_sharing-teardown-before-paging-teard.patch
@@ -1,7 +1,7 @@
-From 700320a79297fb5087f7dd540424c468b2d2cffe Mon Sep 17 00:00:00 2001
+From e8f28e129d23c940749c66150a89c4ed683a0fb9 Mon Sep 17 00:00:00 2001
From: Tamas K Lengyel <tamas@tklengyel.com>
-Date: Fri, 3 Mar 2023 08:14:25 +0100
-Subject: [PATCH 29/61] x86: perform mem_sharing teardown before paging
+Date: Fri, 3 Mar 2023 07:59:08 +0100
+Subject: [PATCH 53/89] x86: perform mem_sharing teardown before paging
teardown
An assert failure has been observed in p2m_teardown when performing vm
@@ -24,10 +24,10 @@ master date: 2023-02-23 12:35:48 +0100
1 file changed, 29 insertions(+), 27 deletions(-)
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index 3080cde62b..6eeb248908 100644
+index 5a119eec3a..e546c98322 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
-@@ -2343,9 +2343,9 @@ int domain_relinquish_resources(struct domain *d)
+@@ -2347,9 +2347,9 @@ int domain_relinquish_resources(struct domain *d)
enum {
PROG_iommu_pagetables = 1,
@@ -38,7 +38,7 @@ index 3080cde62b..6eeb248908 100644
PROG_xen,
PROG_l4,
PROG_l3,
-@@ -2364,6 +2364,34 @@ int domain_relinquish_resources(struct domain *d)
+@@ -2368,6 +2368,34 @@ int domain_relinquish_resources(struct domain *d)
if ( ret )
return ret;
@@ -73,7 +73,7 @@ index 3080cde62b..6eeb248908 100644
PROGRESS(paging):
/* Tear down paging-assistance stuff. */
-@@ -2404,32 +2432,6 @@ int domain_relinquish_resources(struct domain *d)
+@@ -2408,32 +2436,6 @@ int domain_relinquish_resources(struct domain *d)
d->arch.auto_unmask = 0;
}
diff --git a/0030-xen-Work-around-Clang-IAS-macro-expansion-bug.patch b/0054-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
similarity index 76%
rename from 0030-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
rename to 0054-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
index a92f2f0..525dc49 100644
--- a/0030-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
+++ b/0054-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
@@ -1,7 +1,7 @@
-From 2b8f72a6b40dafc3fb40bce100cd62c4a377535a Mon Sep 17 00:00:00 2001
+From 837bdc6eb2df796e832302347f363afc820694fe Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:14:57 +0100
-Subject: [PATCH 30/61] xen: Work around Clang-IAS macro \@ expansion bug
+Date: Fri, 3 Mar 2023 08:00:04 +0100
+Subject: [PATCH 54/89] xen: Work around Clang-IAS macro \@ expansion bug
https://github.com/llvm/llvm-project/issues/60792
@@ -22,14 +22,14 @@ Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a2adacff0b91cc7b977abb209dc419a2ef15963f
master date: 2023-02-24 17:44:29 +0000
---
- xen/include/asm-x86/spec_ctrl.h | 4 ++--
- xen/include/asm-x86/spec_ctrl_asm.h | 23 ++++++++++++++---------
- 2 files changed, 16 insertions(+), 11 deletions(-)
+ xen/arch/x86/include/asm/spec_ctrl.h | 4 ++--
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 19 ++++++++++++-------
+ 2 files changed, 14 insertions(+), 9 deletions(-)
-diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
+diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
index 391973ef6a..a431fea587 100644
---- a/xen/include/asm-x86/spec_ctrl.h
-+++ b/xen/include/asm-x86/spec_ctrl.h
+--- a/xen/arch/x86/include/asm/spec_ctrl.h
++++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -83,7 +83,7 @@ static always_inline void spec_ctrl_new_guest_context(void)
wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
@@ -48,10 +48,10 @@ index 391973ef6a..a431fea587 100644
: "rax", "rcx");
}
-diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
-index 9eb4ad9ab7..b61a5571ae 100644
---- a/xen/include/asm-x86/spec_ctrl_asm.h
-+++ b/xen/include/asm-x86/spec_ctrl_asm.h
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index fab27ff553..f23bb105c5 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -117,11 +117,16 @@
.L\@_done:
.endm
@@ -70,7 +70,7 @@ index 9eb4ad9ab7..b61a5571ae 100644
* Requires 256 bytes of {,shadow}stack space, but %rsp/SSP has no net
* change. Based on Google's performance numbers, the loop is unrolled to 16
* iterations and two calls per iteration.
-@@ -137,31 +142,31 @@
+@@ -136,27 +141,27 @@
mov $16, %ecx /* 16 iterations, two calls per loop */
mov %rsp, %\tmp /* Store the current %rsp */
@@ -80,13 +80,7 @@ index 9eb4ad9ab7..b61a5571ae 100644
.irp n, 1, 2 /* Unrolled twice. */
- call .L\@_insert_rsb_entry_\n /* Create an RSB entry. */
+ call .L\@_insert_rsb_entry\xu\n /* Create an RSB entry. */
-
--.L\@_capture_speculation_\n:
-+.L\@_capture_speculation\xu\n:
- pause
- lfence
-- jmp .L\@_capture_speculation_\n /* Capture rogue speculation. */
-+ jmp .L\@_capture_speculation\xu\n /* Capture rogue speculation. */
+ int3 /* Halt rogue speculation. */
-.L\@_insert_rsb_entry_\n:
+.L\@_insert_rsb_entry\xu\n:
diff --git a/0031-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch b/0055-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
similarity index 70%
rename from 0031-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
rename to 0055-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
index bad0316..02755a9 100644
--- a/0031-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
+++ b/0055-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
@@ -1,7 +1,7 @@
-From f073db0a07c5f6800a70c91819c4b8c2ba359451 Mon Sep 17 00:00:00 2001
+From b10cf1561a638c835481ae923b571cb8f7350a89 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:15:50 +0100
-Subject: [PATCH 31/61] xen: Fix Clang -Wunicode diagnostic when building
+Date: Fri, 3 Mar 2023 08:01:21 +0100
+Subject: [PATCH 55/89] xen: Fix Clang -Wunicode diagnostic when building
asm-macros
While trying to work around a different Clang-IAS bug (parent changeset), I
@@ -38,10 +38,10 @@ master date: 2023-02-24 17:44:29 +0000
rename xen/arch/x86/{asm-macros.c => asm-macros.S} (100%)
diff --git a/xen/Rules.mk b/xen/Rules.mk
-index 5e0699e58b..1f171f88e2 100644
+index d6b7cec0a8..59072ae8df 100644
--- a/xen/Rules.mk
+++ b/xen/Rules.mk
-@@ -223,6 +223,9 @@ $(filter %.init.o,$(obj-y) $(obj-bin-y) $(extra-y)): %.init.o: %.o FORCE
+@@ -273,6 +273,9 @@ $(filter %.init.o,$(obj-y) $(obj-bin-y) $(extra-y)): $(obj)/%.init.o: $(obj)/%.o
quiet_cmd_cpp_i_c = CPP $@
cmd_cpp_i_c = $(CPP) $(call cpp_flags,$(c_flags)) -MQ $@ -o $@ $<
@@ -51,29 +51,29 @@ index 5e0699e58b..1f171f88e2 100644
quiet_cmd_cc_s_c = CC $@
cmd_cc_s_c = $(CC) $(filter-out -Wa$(comma)%,$(c_flags)) -S $< -o $@
-@@ -232,6 +235,9 @@ cmd_cpp_s_S = $(CPP) $(call cpp_flags,$(a_flags)) -MQ $@ -o $@ $<
- %.i: %.c FORCE
- $(call if_changed,cpp_i_c)
+@@ -282,6 +285,9 @@ cmd_cpp_s_S = $(CPP) $(call cpp_flags,$(a_flags)) -MQ $@ -o $@ $<
+ $(obj)/%.i: $(src)/%.c FORCE
+ $(call if_changed_dep,cpp_i_c)
-+%.i: %.S FORCE
-+ $(call if_changed,cpp_i_S)
++$(obj)/%.i: $(src)/%.S FORCE
++ $(call if_changed_dep,cpp_i_S)
+
- %.s: %.c FORCE
- $(call if_changed,cc_s_c)
+ $(obj)/%.s: $(src)/%.c FORCE
+ $(call if_changed_dep,cc_s_c)
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
-index 69b6cfaded..8e975f472d 100644
+index 177a2ff742..5accbe4c67 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
-@@ -273,7 +273,7 @@ efi/buildid.o efi/relocs-dummy.o: ;
+@@ -240,7 +240,7 @@ $(obj)/efi/buildid.o $(obj)/efi/relocs-dummy.o: ;
.PHONY: include
- include: $(BASEDIR)/include/asm-x86/asm-macros.h
+ include: $(objtree)/arch/x86/include/asm/asm-macros.h
--asm-macros.i: CFLAGS-y += -D__ASSEMBLY__ -P
-+asm-macros.i: CFLAGS-y += -P
+-$(obj)/asm-macros.i: CFLAGS-y += -D__ASSEMBLY__ -P
++$(obj)/asm-macros.i: CFLAGS-y += -P
- $(BASEDIR)/include/asm-x86/asm-macros.h: asm-macros.i Makefile
- echo '#if 0' >$@.new
+ $(objtree)/arch/x86/include/asm/asm-macros.h: $(obj)/asm-macros.i $(src)/Makefile
+ $(call filechk,asm-macros.h)
diff --git a/xen/arch/x86/asm-macros.c b/xen/arch/x86/asm-macros.S
similarity index 100%
rename from xen/arch/x86/asm-macros.c
diff --git a/0056-bump-default-SeaBIOS-version-to-1.16.0.patch b/0056-bump-default-SeaBIOS-version-to-1.16.0.patch
deleted file mode 100644
index 37d9b67..0000000
--- a/0056-bump-default-SeaBIOS-version-to-1.16.0.patch
+++ /dev/null
@@ -1,28 +0,0 @@
-From 2a4d327387601b60c9844a5b0cc44de28792ea52 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 6 May 2022 14:46:52 +0200
-Subject: [PATCH 56/61] bump default SeaBIOS version to 1.16.0
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 944e389daa133dd310d87c4eebacba9f6da76018)
----
- Config.mk | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/Config.mk b/Config.mk
-index 1215c2725b..073715c28d 100644
---- a/Config.mk
-+++ b/Config.mk
-@@ -241,7 +241,7 @@ OVMF_UPSTREAM_REVISION ?= 7b4a99be8a39c12d3a7fc4b8db9f0eab4ac688d5
- QEMU_UPSTREAM_REVISION ?= qemu-xen-4.16.3
- MINIOS_UPSTREAM_REVISION ?= xen-RELEASE-4.16.3
-
--SEABIOS_UPSTREAM_REVISION ?= rel-1.14.0
-+SEABIOS_UPSTREAM_REVISION ?= rel-1.16.0
-
- ETHERBOOT_NICS ?= rtl8139 8086100e
-
---
-2.40.0
-
diff --git a/0032-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch b/0056-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
similarity index 69%
rename from 0032-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
rename to 0056-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
index bfcdd26..59cc172 100644
--- a/0032-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
+++ b/0056-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
@@ -1,7 +1,7 @@
-From a2adc7fcc22405e81dc11290416e6140bb0244ca Mon Sep 17 00:00:00 2001
+From 53bd16bcc0d0f5ed5d1ac6d6dc14bf6ecf2e2c43 Mon Sep 17 00:00:00 2001
From: Bertrand Marquis <bertrand.marquis@arm.com>
-Date: Fri, 3 Mar 2023 08:16:45 +0100
-Subject: [PATCH 32/61] tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG
+Date: Fri, 3 Mar 2023 08:02:30 +0100
+Subject: [PATCH 56/89] tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG
variable
Replace PKG_CONFIG variable name with PKG_CONFIG_FILE for the name of
@@ -20,15 +20,15 @@ master commit: b97e2fe7b9e1f4706693552697239ac2b71efee4
master date: 2023-02-24 17:44:29 +0000
---
tools/libs/ctrl/Makefile | 2 +-
- tools/libs/libs.mk | 13 +++++++------
- 2 files changed, 8 insertions(+), 7 deletions(-)
+ tools/libs/libs.mk | 16 ++++++++--------
+ 2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/tools/libs/ctrl/Makefile b/tools/libs/ctrl/Makefile
-index 6ff5918798..d3666ae7ff 100644
+index 93442ab389..15d0ae8e4e 100644
--- a/tools/libs/ctrl/Makefile
+++ b/tools/libs/ctrl/Makefile
-@@ -47,7 +47,7 @@ CFLAGS += -include $(XEN_ROOT)/tools/config.h
- CFLAGS-$(CONFIG_Linux) += -D_GNU_SOURCE
+@@ -4,7 +4,7 @@ include $(XEN_ROOT)/tools/Rules.mk
+ include Makefile.common
LIBHEADER := xenctrl.h xenctrl_compat.h
-PKG_CONFIG := xencontrol.pc
@@ -37,7 +37,7 @@ index 6ff5918798..d3666ae7ff 100644
NO_HEADERS_CHK := y
diff --git a/tools/libs/libs.mk b/tools/libs/libs.mk
-index f1554462fb..0e005218e2 100644
+index 3eb91fc8f3..3fab5aecff 100644
--- a/tools/libs/libs.mk
+++ b/tools/libs/libs.mk
@@ -1,7 +1,7 @@
@@ -49,25 +49,27 @@ index f1554462fb..0e005218e2 100644
# MAJOR: major version of lib (Xen version if empty)
# MINOR: minor version of lib (0 if empty)
-@@ -29,7 +29,8 @@ endif
- comma:= ,
- empty:=
- space:= $(empty) $(empty)
+@@ -26,7 +26,7 @@ ifneq ($(nosharedlibs),y)
+ TARGETS += lib$(LIB_FILE_NAME).so
+ endif
+
-PKG_CONFIG ?= $(LIB_FILE_NAME).pc
-+
+PKG_CONFIG_FILE ?= $(LIB_FILE_NAME).pc
PKG_CONFIG_NAME ?= Xen$(LIBNAME)
PKG_CONFIG_DESC ?= The $(PKG_CONFIG_NAME) library for Xen hypervisor
PKG_CONFIG_VERSION := $(MAJOR).$(MINOR)
-@@ -38,13 +39,13 @@ PKG_CONFIG_LIB := $(LIB_FILE_NAME)
+@@ -35,13 +35,13 @@ PKG_CONFIG_LIB := $(LIB_FILE_NAME)
PKG_CONFIG_REQPRIV := $(subst $(space),$(comma),$(strip $(foreach lib,$(patsubst ctrl,control,$(USELIBS_$(LIBNAME))),xen$(lib))))
ifneq ($(CONFIG_LIBXC_MINIOS),y)
--PKG_CONFIG_INST := $(PKG_CONFIG)
-+PKG_CONFIG_INST := $(PKG_CONFIG_FILE)
- $(PKG_CONFIG_INST): PKG_CONFIG_PREFIX = $(prefix)
- $(PKG_CONFIG_INST): PKG_CONFIG_INCDIR = $(includedir)
- $(PKG_CONFIG_INST): PKG_CONFIG_LIBDIR = $(libdir)
+-TARGETS += $(PKG_CONFIG)
+-$(PKG_CONFIG): PKG_CONFIG_PREFIX = $(prefix)
+-$(PKG_CONFIG): PKG_CONFIG_INCDIR = $(includedir)
+-$(PKG_CONFIG): PKG_CONFIG_LIBDIR = $(libdir)
++TARGETS += $(PKG_CONFIG_FILE)
++$(PKG_CONFIG_FILE): PKG_CONFIG_PREFIX = $(prefix)
++$(PKG_CONFIG_FILE): PKG_CONFIG_INCDIR = $(includedir)
++$(PKG_CONFIG_FILE): PKG_CONFIG_LIBDIR = $(libdir)
endif
-PKG_CONFIG_LOCAL := $(PKG_CONFIG_DIR)/$(PKG_CONFIG)
@@ -75,7 +77,7 @@ index f1554462fb..0e005218e2 100644
LIBHEADER ?= $(LIB_FILE_NAME).h
LIBHEADERS = $(foreach h, $(LIBHEADER), $(XEN_INCLUDE)/$(h))
-@@ -114,7 +115,7 @@ install: build
+@@ -103,7 +103,7 @@ install:: all
$(SYMLINK_SHLIB) lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR) $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR)
$(SYMLINK_SHLIB) lib$(LIB_FILE_NAME).so.$(MAJOR) $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so
for i in $(LIBHEADERS); do $(INSTALL_DATA) $$i $(DESTDIR)$(includedir); done
@@ -83,16 +85,7 @@ index f1554462fb..0e005218e2 100644
+ $(INSTALL_DATA) $(PKG_CONFIG_FILE) $(DESTDIR)$(PKG_INSTALLDIR)
.PHONY: uninstall
- uninstall:
-@@ -134,7 +135,7 @@ clean:
- rm -rf *.rpm $(LIB) *~ $(DEPS_RM) $(LIB_OBJS) $(PIC_OBJS)
- rm -f lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR) lib$(LIB_FILE_NAME).so.$(MAJOR)
- rm -f headers.chk headers.lst
-- rm -f $(PKG_CONFIG)
-+ rm -f $(PKG_CONFIG_FILE)
- rm -f _paths.h
-
- .PHONY: distclean
+ uninstall::
--
2.40.0
diff --git a/0033-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch b/0057-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
similarity index 93%
rename from 0033-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
rename to 0057-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
index 5caa850..ea80bd0 100644
--- a/0033-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
+++ b/0057-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
@@ -1,7 +1,7 @@
-From b181a3a5532574d2163408284bcd785ec87fe046 Mon Sep 17 00:00:00 2001
+From 01f85d835bb10d18bdab2cc780ea5ad47004516d Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:17:04 +0100
-Subject: [PATCH 33/61] libs/guest: Fix resource leaks in
+Date: Fri, 3 Mar 2023 08:02:59 +0100
+Subject: [PATCH 57/89] libs/guest: Fix resource leaks in
xc_core_arch_map_p2m_tree_rw()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
diff --git a/0034-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch b/0058-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
similarity index 89%
rename from 0034-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
rename to 0058-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
index 4be16a3..d55c095 100644
--- a/0034-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
+++ b/0058-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
@@ -1,7 +1,7 @@
-From 25d103f2eb59f021cce61f07a0bf0bfa696b4416 Mon Sep 17 00:00:00 2001
+From fa8250f1920413f02b63551a6a4d8ef0b47891a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
-Date: Fri, 3 Mar 2023 08:17:23 +0100
-Subject: [PATCH 34/61] libs/guest: Fix leak on realloc failure in
+Date: Fri, 3 Mar 2023 08:03:19 +0100
+Subject: [PATCH 58/89] libs/guest: Fix leak on realloc failure in
backup_ptes()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -29,7 +29,7 @@ master date: 2023-02-27 15:51:23 +0000
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/libs/guest/xg_offline_page.c b/tools/libs/guest/xg_offline_page.c
-index cfe0e2d537..c42b973363 100644
+index c594fdba41..ccd0299f0f 100644
--- a/tools/libs/guest/xg_offline_page.c
+++ b/tools/libs/guest/xg_offline_page.c
@@ -181,10 +181,16 @@ static int backup_ptes(xen_pfn_t table_mfn, int offset,
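The leak fixed above is the classic realloc() pitfall: assigning the result straight back to the only pointer drops the original buffer when realloc() fails. A generic sketch of the safe pattern, not the libxenguest code:

    #include <stdlib.h>

    /* Grow *buf from *count to want elements of size elem. */
    static int grow_buffer(void **buf, size_t *count, size_t want, size_t elem)
    {
        void *tmp = realloc(*buf, want * elem);   /* old pointer kept in *buf */

        if ( !tmp )
            return -1;   /* *buf is still valid; the caller can free it */

        *buf = tmp;
        *count = want;
        return 0;
    }
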
diff --git a/0035-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch b/0059-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
similarity index 81%
rename from 0035-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
rename to 0059-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
index 931d93f..292a61a 100644
--- a/0035-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
+++ b/0059-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
@@ -1,7 +1,7 @@
-From 84dfe7a56f04a7412fa4869b3e756c49e1cfbe75 Mon Sep 17 00:00:00 2001
+From ec5b058d2a6436a2e180315522fcf1645a8153b4 Mon Sep 17 00:00:00 2001
From: Sergey Dyasli <sergey.dyasli@citrix.com>
-Date: Fri, 3 Mar 2023 08:17:40 +0100
-Subject: [PATCH 35/61] x86/ucode/AMD: late load the patch on every logical
+Date: Fri, 3 Mar 2023 08:03:43 +0100
+Subject: [PATCH 59/89] x86/ucode/AMD: late load the patch on every logical
thread
Currently late ucode loading is performed only on the first core of CPU
@@ -21,10 +21,10 @@ master date: 2023-02-28 14:51:28 +0100
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index ceec1f1edc..ee7df9a591 100644
+index 57ecc5358b..2497630bbe 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -273,6 +273,20 @@ static bool microcode_update_cache(struct microcode_patch *patch)
+@@ -274,6 +274,20 @@ static bool microcode_update_cache(struct microcode_patch *patch)
return true;
}
@@ -45,16 +45,16 @@ index ceec1f1edc..ee7df9a591 100644
/* Wait for a condition to be met with a timeout (us). */
static int wait_for_condition(bool (*func)(unsigned int data),
unsigned int data, unsigned int timeout)
-@@ -378,7 +392,7 @@ static int primary_thread_work(const struct microcode_patch *patch)
-
- static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+@@ -380,7 +394,7 @@ static int primary_thread_work(const struct microcode_patch *patch)
+ static int cf_check microcode_nmi_callback(
+ const struct cpu_user_regs *regs, int cpu)
{
- unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
+ bool primary_cpu = is_cpu_primary(cpu);
int ret;
/* System-generated NMI, leave to main handler */
-@@ -391,10 +405,10 @@ static int microcode_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+@@ -393,10 +407,10 @@ static int cf_check microcode_nmi_callback(
* ucode_in_nmi.
*/
if ( cpu == cpumask_first(&cpu_online_map) ||
@@ -67,7 +67,7 @@ index ceec1f1edc..ee7df9a591 100644
ret = primary_thread_work(nmi_patch);
else
ret = secondary_nmi_work();
-@@ -545,7 +559,7 @@ static int do_microcode_update(void *patch)
+@@ -547,7 +561,7 @@ static int cf_check do_microcode_update(void *patch)
*/
if ( cpu == cpumask_first(&cpu_online_map) )
ret = control_thread_fn(patch);
@@ -76,7 +76,7 @@ index ceec1f1edc..ee7df9a591 100644
ret = primary_thread_fn(patch);
else
ret = secondary_thread_fn();
-@@ -637,7 +651,7 @@ static long microcode_update_helper(void *data)
+@@ -640,7 +654,7 @@ static long cf_check microcode_update_helper(void *data)
/* Calculate the number of online CPU core */
nr_cores = 0;
for_each_online_cpu(cpu)
diff --git a/0036-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch b/0060-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
similarity index 81%
rename from 0036-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
rename to 0060-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
index 38629a4..fd397b0 100644
--- a/0036-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
+++ b/0060-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
@@ -1,7 +1,7 @@
-From b0d6684ee58f7252940f5a62e4b85bdc56307eef Mon Sep 17 00:00:00 2001
+From f8f8f07880d3817fc7b0472420eca9fecaa55358 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 11:59:44 +0000
-Subject: [PATCH 36/61] x86/shadow: account for log-dirty mode when
+Date: Tue, 21 Mar 2023 11:58:50 +0000
+Subject: [PATCH 60/89] x86/shadow: account for log-dirty mode when
pre-allocating
Pre-allocation is intended to ensure that in the course of constructing
@@ -32,16 +32,31 @@ Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
(cherry picked from commit 91767a71061035ae42be93de495cd976f863a41a)
---
- xen/arch/x86/mm/paging.c | 1 +
- xen/arch/x86/mm/shadow/common.c | 12 +++++++++++-
- xen/include/asm-x86/paging.h | 4 ++++
+ xen/arch/x86/include/asm/paging.h | 4 ++++
+ xen/arch/x86/mm/paging.c | 1 +
+ xen/arch/x86/mm/shadow/common.c | 12 +++++++++++-
3 files changed, 16 insertions(+), 1 deletion(-)
+diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
+index b2b243a4ff..635ccc83b1 100644
+--- a/xen/arch/x86/include/asm/paging.h
++++ b/xen/arch/x86/include/asm/paging.h
+@@ -190,6 +190,10 @@ bool paging_mfn_is_dirty(const struct domain *d, mfn_t gmfn);
+ #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
+ (LOGDIRTY_NODE_ENTRIES-1))
+
++#define paging_logdirty_levels() \
++ (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
++ PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
++
+ #ifdef CONFIG_HVM
+ /* VRAM dirty tracking support */
+ struct sh_dirty_vram {
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
-index 97ac9ccf59..9fb66e65cd 100644
+index 8d579fa9a3..308d44bce7 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
-@@ -280,6 +280,7 @@ void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+@@ -282,6 +282,7 @@ void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
if ( unlikely(!VALID_M2P(pfn_x(pfn))) )
return;
@@ -50,7 +65,7 @@ index 97ac9ccf59..9fb66e65cd 100644
i2 = L2_LOGDIRTY_IDX(pfn);
i3 = L3_LOGDIRTY_IDX(pfn);
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index 1de0139742..c14a269935 100644
+index a8404f97f6..cf5e181f74 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -1015,7 +1015,17 @@ bool shadow_prealloc(struct domain *d, unsigned int type, unsigned int count)
@@ -72,21 +87,6 @@ index 1de0139742..c14a269935 100644
if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
/*
* Failing to allocate memory required for shadow usage can only result in
-diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
-index 27890791d8..c6b429c691 100644
---- a/xen/include/asm-x86/paging.h
-+++ b/xen/include/asm-x86/paging.h
-@@ -192,6 +192,10 @@ int paging_mfn_is_dirty(struct domain *d, mfn_t gmfn);
- #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
- (LOGDIRTY_NODE_ENTRIES-1))
-
-+#define paging_logdirty_levels() \
-+ (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
-+ PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
-+
- #ifdef CONFIG_HVM
- /* VRAM dirty tracking support */
- struct sh_dirty_vram {
--
2.40.0
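As a quick sanity check of the paging_logdirty_levels() macro introduced above -- this is purely illustrative and not part of the patch -- the formula can be evaluated standalone with the usual x86-64 constants (PADDR_BITS = 52, PAGE_SHIFT = 12, sizeof(mfn_t) = 8 and hence ilog2 = 3; these values are assumptions taken from typical Xen x86-64 builds, not from this diff):

#include <stdio.h>

#define PADDR_BITS          52  /* x86-64 physical address width assumed */
#define PAGE_SHIFT          12
#define MFN_SIZE_LOG2        3  /* ilog2(sizeof(mfn_t)) on 64-bit */
#define DIV_ROUND_UP(n, d)  (((n) + (d) - 1) / (d))

int main(void)
{
    /* Mirrors the new paging_logdirty_levels() expression from the hunk. */
    unsigned int levels =
        DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3),
                     PAGE_SHIFT - MFN_SIZE_LOG2) + 1;

    printf("paging_logdirty_levels() = %u\n", levels); /* prints 4 */
    return 0;
}

DIV_ROUND_UP(25, 9) is 3, plus the leaf level gives 4, which matches the L2/L3/L4 log-dirty indexing macros visible in the surrounding hunks.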
diff --git a/0037-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch b/0061-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
similarity index 90%
rename from 0037-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
rename to 0061-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
index 6730b2d..b638eca 100644
--- a/0037-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
+++ b/0061-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
@@ -1,7 +1,7 @@
-From 2fe1517a00e088f6b1f1aff7d4ea1b477b288987 Mon Sep 17 00:00:00 2001
+From d0cb66d59a956ccba3dbe794f4ec01e4a4269ee9 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 21 Mar 2023 12:01:01 +0000
-Subject: [PATCH 37/61] x86/HVM: bound number of pinned cache attribute regions
+Subject: [PATCH 61/89] x86/HVM: bound number of pinned cache attribute regions
This is exposed via DMOP, i.e. to potentially not fully privileged
device models. With that we may not permit registration of an (almost)
@@ -18,7 +18,7 @@ Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
1 file changed, 5 insertions(+)
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
-index 4a9f3177ed..98e55bbdbd 100644
+index 4d2aa6def8..714911dd7f 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ -595,6 +595,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
diff --git a/0038-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch b/0062-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
similarity index 92%
rename from 0038-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
rename to 0062-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
index ca8528f..a0f6efc 100644
--- a/0038-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
+++ b/0062-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
@@ -1,7 +1,7 @@
-From 564de020d29fbc4efd20ef8052051e86b2465a1a Mon Sep 17 00:00:00 2001
+From a2a915b3960e6ab060d8be2c36e6e697700ea87c Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 21 Mar 2023 12:01:01 +0000
-Subject: [PATCH 38/61] x86/HVM: serialize pinned cache attribute list
+Subject: [PATCH 62/89] x86/HVM: serialize pinned cache attribute list
manipulation
While the RCU variants of list insertion and removal allow lockless list
@@ -20,10 +20,10 @@ Reviewed-by: Julien Grall <jgrall@amazon.com>
1 file changed, 31 insertions(+), 20 deletions(-)
diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
-index 98e55bbdbd..9b3b33012b 100644
+index 714911dd7f..bd5cc42ef4 100644
--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
-@@ -594,7 +594,7 @@ static void free_pinned_cacheattr_entry(struct rcu_head *rcu)
+@@ -594,7 +594,7 @@ static void cf_check free_pinned_cacheattr_entry(struct rcu_head *rcu)
int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
uint64_t gfn_end, uint32_t type)
{
@@ -120,7 +120,7 @@ index 98e55bbdbd..9b3b33012b 100644
+ return rc;
}
- static int hvm_save_mtrr_msr(struct vcpu *v, hvm_domain_context_t *h)
+ static int cf_check hvm_save_mtrr_msr(struct vcpu *v, hvm_domain_context_t *h)
--
2.40.0
diff --git a/0039-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch b/0063-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
similarity index 92%
rename from 0039-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
rename to 0063-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
index 74bcf67..fa97a41 100644
--- a/0039-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
+++ b/0063-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
@@ -1,7 +1,7 @@
-From 3c924fe46b455834b5c04268db6b528b549668d1 Mon Sep 17 00:00:00 2001
+From a730e4d1190594102784222f76a984d10bbc88a9 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Fri, 10 Feb 2023 21:11:14 +0000
-Subject: [PATCH 39/61] x86/spec-ctrl: Defer CR4_PV32_RESTORE on the
+Subject: [PATCH 63/89] x86/spec-ctrl: Defer CR4_PV32_RESTORE on the
cstar_enter path
As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
@@ -31,7 +31,7 @@ Reviewed-by: Jan Beulich <jbeulich@suse.com>
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index fba8ae498f..db2ea7871e 100644
+index ae01285181..7675a59ff0 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -288,7 +288,6 @@ ENTRY(cstar_enter)
diff --git a/0064-x86-vmx-implement-VMExit-based-guest-Bus-Lock-detect.patch b/0064-x86-vmx-implement-VMExit-based-guest-Bus-Lock-detect.patch
new file mode 100644
index 0000000..cebb501
--- /dev/null
+++ b/0064-x86-vmx-implement-VMExit-based-guest-Bus-Lock-detect.patch
@@ -0,0 +1,175 @@
+From 83f12e4eafdc4b034501adf4847a09a1293fdf8b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 21 Mar 2023 13:40:41 +0100
+Subject: [PATCH 64/89] x86/vmx: implement VMExit based guest Bus Lock
+ detection
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Add support for enabling guest Bus Lock Detection on Intel systems.
+Such detection works by triggering a vmexit, which ought to be enough
+of a pause to prevent a guest from abusing of the Bus Lock.
+
+Add an extra Xen perf counter to track the number of Bus Locks detected.
+This is done because Bus Locks can also be reported by setting the bit
+26 in the exit reason field, so also account for those.
+
+Note EXIT_REASON_BUS_LOCK VMExits will always have bit 26 set in
+exit_reason, and hence the performance counter doesn't need to be
+increased for EXIT_REASON_BUS_LOCK handling.
+
+Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: f7d07619d2ae0382e2922e287fbfbb27722f3f0b
+master date: 2022-12-19 11:22:43 +0100
+---
+ xen/arch/x86/hvm/vmx/vmcs.c | 4 +++-
+ xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++++
+ xen/arch/x86/hvm/vmx/vvmx.c | 3 ++-
+ xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 3 +++
+ xen/arch/x86/include/asm/hvm/vmx/vmx.h | 2 ++
+ xen/arch/x86/include/asm/perfc_defn.h | 4 +++-
+ 6 files changed, 28 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
+index 84dbb88d33..a0d5e8d6ab 100644
+--- a/xen/arch/x86/hvm/vmx/vmcs.c
++++ b/xen/arch/x86/hvm/vmx/vmcs.c
+@@ -209,6 +209,7 @@ static void __init vmx_display_features(void)
+ P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
+ P(cpu_has_vmx_pml, "Page Modification Logging");
+ P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
++ P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
+ #undef P
+
+ if ( !printed )
+@@ -318,7 +319,8 @@ static int vmx_init_vmcs_config(bool bsp)
+ SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+ SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
+ SECONDARY_EXEC_XSAVES |
+- SECONDARY_EXEC_TSC_SCALING);
++ SECONDARY_EXEC_TSC_SCALING |
++ SECONDARY_EXEC_BUS_LOCK_DETECTION);
+ if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
+ opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
+ if ( opt_vpid_enabled )
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index 861f91f2af..d0f0f2e429 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -4084,6 +4084,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ return;
+ }
+
++ if ( unlikely(exit_reason & VMX_EXIT_REASONS_BUS_LOCK) )
++ {
++ perfc_incr(buslock);
++ exit_reason &= ~VMX_EXIT_REASONS_BUS_LOCK;
++ }
++
+ /* XXX: This looks ugly, but we need a mechanism to ensure
+ * any pending vmresume has really happened
+ */
+@@ -4593,6 +4599,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ vmx_handle_descriptor_access(exit_reason);
+ break;
+
++ case EXIT_REASON_BUS_LOCK:
++ /*
++ * Nothing to do: just taking a vmexit should be enough of a pause to
++ * prevent a VM from crippling the host with bus locks. Note
++ * EXIT_REASON_BUS_LOCK will always have bit 26 set in exit_reason, and
++ * hence the perf counter is already increased.
++ */
++ break;
++
+ case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
+ case EXIT_REASON_INVPCID:
+ /* fall through */
+diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
+index 5f54451475..2095c1e612 100644
+--- a/xen/arch/x86/hvm/vmx/vvmx.c
++++ b/xen/arch/x86/hvm/vmx/vvmx.c
+@@ -2405,7 +2405,7 @@ void nvmx_idtv_handling(void)
+ * be reinjected, otherwise, pass to L1.
+ */
+ __vmread(VM_EXIT_REASON, &reason);
+- if ( reason != EXIT_REASON_EPT_VIOLATION ?
++ if ( (uint16_t)reason != EXIT_REASON_EPT_VIOLATION ?
+ !(nvmx->intr.intr_info & INTR_INFO_VALID_MASK) :
+ !nvcpu->nv_vmexit_pending )
+ {
+@@ -2486,6 +2486,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
+ case EXIT_REASON_EPT_VIOLATION:
+ case EXIT_REASON_EPT_MISCONFIG:
+ case EXIT_REASON_EXTERNAL_INTERRUPT:
++ case EXIT_REASON_BUS_LOCK:
+ /* pass to L0 handler */
+ break;
+ case VMX_EXIT_REASONS_FAILED_VMENTRY:
+diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+index 75f9928abf..f3df5113d4 100644
+--- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
++++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+@@ -267,6 +267,7 @@ extern u32 vmx_vmentry_control;
+ #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS 0x00040000
+ #define SECONDARY_EXEC_XSAVES 0x00100000
+ #define SECONDARY_EXEC_TSC_SCALING 0x02000000
++#define SECONDARY_EXEC_BUS_LOCK_DETECTION 0x40000000
+ extern u32 vmx_secondary_exec_control;
+
+ #define VMX_EPT_EXEC_ONLY_SUPPORTED 0x00000001
+@@ -346,6 +347,8 @@ extern u64 vmx_ept_vpid_cap;
+ (vmx_secondary_exec_control & SECONDARY_EXEC_XSAVES)
+ #define cpu_has_vmx_tsc_scaling \
+ (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
++#define cpu_has_vmx_bus_lock_detection \
++ (vmx_secondary_exec_control & SECONDARY_EXEC_BUS_LOCK_DETECTION)
+
+ #define VMCS_RID_TYPE_MASK 0x80000000
+
+diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+index 8eedf59155..03995701a1 100644
+--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
++++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+@@ -159,6 +159,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
+ * Exit Reasons
+ */
+ #define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000
++#define VMX_EXIT_REASONS_BUS_LOCK (1u << 26)
+
+ #define EXIT_REASON_EXCEPTION_NMI 0
+ #define EXIT_REASON_EXTERNAL_INTERRUPT 1
+@@ -219,6 +220,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
+ #define EXIT_REASON_PML_FULL 62
+ #define EXIT_REASON_XSAVES 63
+ #define EXIT_REASON_XRSTORS 64
++#define EXIT_REASON_BUS_LOCK 74
+ /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
+
+ /*
+diff --git a/xen/arch/x86/include/asm/perfc_defn.h b/xen/arch/x86/include/asm/perfc_defn.h
+index 509afc516b..6fce21e85a 100644
+--- a/xen/arch/x86/include/asm/perfc_defn.h
++++ b/xen/arch/x86/include/asm/perfc_defn.h
+@@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions, "exceptions", 32)
+
+ #ifdef CONFIG_HVM
+
+-#define VMX_PERF_EXIT_REASON_SIZE 65
++#define VMX_PERF_EXIT_REASON_SIZE 75
+ #define VMEXIT_NPF_PERFC 143
+ #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
+ PERFCOUNTER_ARRAY(vmexits, "vmexits",
+@@ -128,4 +128,6 @@ PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
+ PERFCOUNTER(iommu_pt_shatters, "IOMMU page table shatters")
+ PERFCOUNTER(iommu_pt_coalesces, "IOMMU page table coalesces")
+
++PERFCOUNTER(buslock, "Bus Locks Detected")
++
+ /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
+--
+2.40.0
+
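For context -- purely illustrative and not part of the patch -- the guest behaviour this detection targets is a locked read-modify-write whose operand straddles a cache-line boundary, forcing the processor to assert a bus lock. A minimal user-space sketch (assuming 64-byte cache lines and GCC/Clang __atomic builtins; the misalignment is deliberate and x86-specific):

#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    /* 64-byte-aligned buffer; point a 4-byte counter at offset 62 so a
     * locked RMW on it must span two cache lines (a "split lock"). */
    uint8_t *buf = aligned_alloc(64, 128);
    volatile uint32_t *split;

    if ( !buf )
        return 1;
    split = (volatile uint32_t *)(buf + 62);

    for ( unsigned long i = 0; i < 1000000; i++ )
        __atomic_fetch_add(split, 1, __ATOMIC_SEQ_CST); /* lock add on x86 */

    free(buf);
    return 0;
}

On hardware advertising the feature, each such locked access would surface to Xen either as an EXIT_REASON_BUS_LOCK vmexit or via bit 26 of the exit reason, incrementing the new buslock perf counter added above.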
diff --git a/0065-x86-vmx-introduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch b/0065-x86-vmx-introduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch
new file mode 100644
index 0000000..847ee99
--- /dev/null
+++ b/0065-x86-vmx-introduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch
@@ -0,0 +1,102 @@
+From 27abea1ba6fa68f81b98de31cf9b9ebb594ff238 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 21 Mar 2023 13:41:49 +0100
+Subject: [PATCH 65/89] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Introduce a small helper to OR VMX_INTR_SHADOW_NMI in
+GUEST_INTERRUPTIBILITY_INFO in order to help dealing with the NMI
+unblocked by IRET case. Replace the existing usage in handling
+EXIT_REASON_EXCEPTION_NMI and also add such handling to EPT violations
+and page-modification log-full events.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: d329b37d12132164c3894d0b6284be72576ef950
+master date: 2022-12-19 11:23:34 +0100
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 28 +++++++++++++++++++-------
+ xen/arch/x86/include/asm/hvm/vmx/vmx.h | 3 +++
+ 2 files changed, 24 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index d0f0f2e429..456726e897 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -3967,6 +3967,15 @@ static int vmx_handle_apic_write(void)
+ return vlapic_apicv_write(current, exit_qualification & 0xfff);
+ }
+
++static void undo_nmis_unblocked_by_iret(void)
++{
++ unsigned long guest_info;
++
++ __vmread(GUEST_INTERRUPTIBILITY_INFO, &guest_info);
++ __vmwrite(GUEST_INTERRUPTIBILITY_INFO,
++ guest_info | VMX_INTR_SHADOW_NMI);
++}
++
+ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ {
+ unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
+@@ -4167,13 +4176,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ if ( unlikely(intr_info & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
+ !(idtv_info & INTR_INFO_VALID_MASK) &&
+ (vector != TRAP_double_fault) )
+- {
+- unsigned long guest_info;
+-
+- __vmread(GUEST_INTERRUPTIBILITY_INFO, &guest_info);
+- __vmwrite(GUEST_INTERRUPTIBILITY_INFO,
+- guest_info | VMX_INTR_SHADOW_NMI);
+- }
++ undo_nmis_unblocked_by_iret();
+
+ perfc_incra(cause_vector, vector);
+
+@@ -4539,6 +4542,11 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+
+ __vmread(GUEST_PHYSICAL_ADDRESS, &gpa);
+ __vmread(EXIT_QUALIFICATION, &exit_qualification);
++
++ if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
++ !(idtv_info & INTR_INFO_VALID_MASK) )
++ undo_nmis_unblocked_by_iret();
++
+ ept_handle_violation(exit_qualification, gpa);
+ break;
+ }
+@@ -4583,6 +4591,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ break;
+
+ case EXIT_REASON_PML_FULL:
++ __vmread(EXIT_QUALIFICATION, &exit_qualification);
++
++ if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
++ !(idtv_info & INTR_INFO_VALID_MASK) )
++ undo_nmis_unblocked_by_iret();
++
+ vmx_vcpu_flush_pml_buffer(v);
+ break;
+
+diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+index 03995701a1..eae39365aa 100644
+--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
++++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+@@ -225,6 +225,9 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
+
+ /*
+ * Interruption-information format
++ *
++ * Note INTR_INFO_NMI_UNBLOCKED_BY_IRET is also used with Exit Qualification
++ * field for EPT violations, PML full and SPP-related event vmexits.
+ */
+ #define INTR_INFO_VECTOR_MASK 0xff /* 7:0 */
+ #define INTR_INFO_INTR_TYPE_MASK 0x700 /* 10:8 */
+--
+2.40.0
+
diff --git a/0066-x86-vmx-implement-Notify-VM-Exit.patch b/0066-x86-vmx-implement-Notify-VM-Exit.patch
new file mode 100644
index 0000000..bc54d18
--- /dev/null
+++ b/0066-x86-vmx-implement-Notify-VM-Exit.patch
@@ -0,0 +1,243 @@
+From b745ff30113d2bd91e2d34cf56437b2fe2e2ea35 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 21 Mar 2023 13:42:43 +0100
+Subject: [PATCH 66/89] x86/vmx: implement Notify VM Exit
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Under certain conditions guests can get the CPU stuck in an unbounded
+loop without the possibility of an interrupt window to occur on
+instruction boundary. This was the case with the scenarios described
+in XSA-156.
+
+Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
+if no interrupt window occurs for a specified amount of time. Note
+that using the Notify VM Exit avoids having to trap #AC and #DB
+exceptions, as Xen is guaranteed to get a VM Exit even if the guest
+puts the CPU in a loop without an interrupt window, as such disable
+the intercepts if the feature is available and enabled.
+
+Setting the notify VM exit window to 0 is safe because there's a
+threshold added by the hardware in order to have a sane window value.
+
+Note the handling of EXIT_REASON_NOTIFY in the nested virtualization
+case is passed to L0, and hence a nested guest being able to trigger a
+notify VM exit with an invalid context would be able to crash the L1
+hypervisor (by L0 destroying the domain). Since we don't expose VM
+Notify support to L1 it should already enable the required
+protections in order to prevent VM Notify from triggering in the first
+place.
+
+Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+
+x86/vmx: Partially revert "x86/vmx: implement Notify VM Exit"
+
+The original patch tried to do two things - implement VMNotify, and
+re-optimise VT-x to not intercept #DB/#AC by default.
+
+The second part is buggy in multiple ways. Both GDBSX and Introspection need
+to conditionally intercept #DB, which was not accounted for. Also, #DB
+interception has nothing at all to do with cpu_has_monitor_trap_flag.
+
+Revert the second half, leaving #DB/#AC intercepted unilaterally, but with
+VMNotify active by default when available.
+
+Fixes: 573279cde1c4 ("x86/vmx: implement Notify VM Exit")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: 573279cde1c4e752d4df34bc65ffafa17573148e
+master date: 2022-12-19 11:24:14 +0100
+master commit: 5f08bc9404c7cfa8131e262c7dbcb4d96c752686
+master date: 2023-01-20 19:39:32 +0000
+---
+ docs/misc/xen-command-line.pandoc | 11 +++++++++++
+ xen/arch/x86/hvm/vmx/vmcs.c | 10 ++++++++++
+ xen/arch/x86/hvm/vmx/vmx.c | 16 ++++++++++++++++
+ xen/arch/x86/hvm/vmx/vvmx.c | 1 +
+ xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 4 ++++
+ xen/arch/x86/include/asm/hvm/vmx/vmx.h | 6 ++++++
+ xen/arch/x86/include/asm/perfc_defn.h | 3 ++-
+ 7 files changed, 50 insertions(+), 1 deletion(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index 5be5ce10c6..d601120faa 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2634,6 +2634,17 @@ guest will notify Xen that it has failed to acquire a spinlock.
+ <major>, <minor> and <build> must be integers. The values will be
+ encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled.
+
++### vm-notify-window (Intel)
++> `= <integer>`
++
++> Default: `0`
++
++Specify the value of the VM Notify window used to detect locked VMs. Set to -1
++to disable the feature. Value is in units of crystal clock cycles.
++
++Note the hardware might add a threshold to the provided value in order to make
++it safe, and hence using 0 is fine.
++
+ ### vpid (Intel)
+ > `= <boolean>`
+
+diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
+index a0d5e8d6ab..7912053bda 100644
+--- a/xen/arch/x86/hvm/vmx/vmcs.c
++++ b/xen/arch/x86/hvm/vmx/vmcs.c
+@@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
+ static unsigned int __read_mostly ple_window = 4096;
+ integer_param("ple_window", ple_window);
+
++static unsigned int __ro_after_init vm_notify_window;
++integer_param("vm-notify-window", vm_notify_window);
++
+ static bool __read_mostly opt_ept_pml = true;
+ static s8 __read_mostly opt_ept_ad = -1;
+ int8_t __read_mostly opt_ept_exec_sp = -1;
+@@ -210,6 +213,7 @@ static void __init vmx_display_features(void)
+ P(cpu_has_vmx_pml, "Page Modification Logging");
+ P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
+ P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
++ P(cpu_has_vmx_notify_vm_exiting, "Notify VM Exit");
+ #undef P
+
+ if ( !printed )
+@@ -329,6 +333,8 @@ static int vmx_init_vmcs_config(bool bsp)
+ opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
+ if ( opt_ept_pml )
+ opt |= SECONDARY_EXEC_ENABLE_PML;
++ if ( vm_notify_window != ~0u )
++ opt |= SECONDARY_EXEC_NOTIFY_VM_EXITING;
+
+ /*
+ * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
+@@ -1290,6 +1296,10 @@ static int construct_vmcs(struct vcpu *v)
+ v->arch.hvm.vmx.exception_bitmap = HVM_TRAP_MASK
+ | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
+ | (v->arch.fully_eager_fpu ? 0 : (1U << TRAP_no_device));
++
++ if ( cpu_has_vmx_notify_vm_exiting )
++ __vmwrite(NOTIFY_WINDOW, vm_notify_window);
++
+ vmx_update_exception_bitmap(v);
+
+ v->arch.hvm.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index 456726e897..f0e759eeaf 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -4622,6 +4622,22 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ */
+ break;
+
++ case EXIT_REASON_NOTIFY:
++ __vmread(EXIT_QUALIFICATION, &exit_qualification);
++
++ if ( unlikely(exit_qualification & NOTIFY_VM_CONTEXT_INVALID) )
++ {
++ perfc_incr(vmnotify_crash);
++ gprintk(XENLOG_ERR, "invalid VM context after notify vmexit\n");
++ domain_crash(v->domain);
++ break;
++ }
++
++ if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) )
++ undo_nmis_unblocked_by_iret();
++
++ break;
++
+ case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
+ case EXIT_REASON_INVPCID:
+ /* fall through */
+diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
+index 2095c1e612..f8fe8d0c14 100644
+--- a/xen/arch/x86/hvm/vmx/vvmx.c
++++ b/xen/arch/x86/hvm/vmx/vvmx.c
+@@ -2487,6 +2487,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
+ case EXIT_REASON_EPT_MISCONFIG:
+ case EXIT_REASON_EXTERNAL_INTERRUPT:
+ case EXIT_REASON_BUS_LOCK:
++ case EXIT_REASON_NOTIFY:
+ /* pass to L0 handler */
+ break;
+ case VMX_EXIT_REASONS_FAILED_VMENTRY:
+diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+index f3df5113d4..78404e42b3 100644
+--- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
++++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+@@ -268,6 +268,7 @@ extern u32 vmx_vmentry_control;
+ #define SECONDARY_EXEC_XSAVES 0x00100000
+ #define SECONDARY_EXEC_TSC_SCALING 0x02000000
+ #define SECONDARY_EXEC_BUS_LOCK_DETECTION 0x40000000
++#define SECONDARY_EXEC_NOTIFY_VM_EXITING 0x80000000
+ extern u32 vmx_secondary_exec_control;
+
+ #define VMX_EPT_EXEC_ONLY_SUPPORTED 0x00000001
+@@ -349,6 +350,8 @@ extern u64 vmx_ept_vpid_cap;
+ (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
+ #define cpu_has_vmx_bus_lock_detection \
+ (vmx_secondary_exec_control & SECONDARY_EXEC_BUS_LOCK_DETECTION)
++#define cpu_has_vmx_notify_vm_exiting \
++ (vmx_secondary_exec_control & SECONDARY_EXEC_NOTIFY_VM_EXITING)
+
+ #define VMCS_RID_TYPE_MASK 0x80000000
+
+@@ -456,6 +459,7 @@ enum vmcs_field {
+ SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
+ PLE_GAP = 0x00004020,
+ PLE_WINDOW = 0x00004022,
++ NOTIFY_WINDOW = 0x00004024,
+ VM_INSTRUCTION_ERROR = 0x00004400,
+ VM_EXIT_REASON = 0x00004402,
+ VM_EXIT_INTR_INFO = 0x00004404,
+diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+index eae39365aa..8e1e42ac47 100644
+--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
++++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+@@ -221,6 +221,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
+ #define EXIT_REASON_XSAVES 63
+ #define EXIT_REASON_XRSTORS 64
+ #define EXIT_REASON_BUS_LOCK 74
++#define EXIT_REASON_NOTIFY 75
+ /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
+
+ /*
+@@ -236,6 +237,11 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
+ #define INTR_INFO_VALID_MASK 0x80000000 /* 31 */
+ #define INTR_INFO_RESVD_BITS_MASK 0x7ffff000
+
++/*
++ * Exit Qualifications for NOTIFY VM EXIT
++ */
++#define NOTIFY_VM_CONTEXT_INVALID 1u
++
+ /*
+ * Exit Qualifications for MOV for Control Register Access
+ */
+diff --git a/xen/arch/x86/include/asm/perfc_defn.h b/xen/arch/x86/include/asm/perfc_defn.h
+index 6fce21e85a..487e20dc97 100644
+--- a/xen/arch/x86/include/asm/perfc_defn.h
++++ b/xen/arch/x86/include/asm/perfc_defn.h
+@@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions, "exceptions", 32)
+
+ #ifdef CONFIG_HVM
+
+-#define VMX_PERF_EXIT_REASON_SIZE 75
++#define VMX_PERF_EXIT_REASON_SIZE 76
+ #define VMEXIT_NPF_PERFC 143
+ #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
+ PERFCOUNTER_ARRAY(vmexits, "vmexits",
+@@ -129,5 +129,6 @@ PERFCOUNTER(iommu_pt_shatters, "IOMMU page table shatters")
+ PERFCOUNTER(iommu_pt_coalesces, "IOMMU page table coalesces")
+
+ PERFCOUNTER(buslock, "Bus Locks Detected")
++PERFCOUNTER(vmnotify_crash, "domain crashes by Notify VM Exit")
+
+ /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
+--
+2.40.0
+
diff --git a/0040-tools-python-change-s-size-type-for-Python-3.10.patch b/0067-tools-python-change-s-size-type-for-Python-3.10.patch
similarity index 92%
rename from 0040-tools-python-change-s-size-type-for-Python-3.10.patch
rename to 0067-tools-python-change-s-size-type-for-Python-3.10.patch
index 979fd6f..0671c67 100644
--- a/0040-tools-python-change-s-size-type-for-Python-3.10.patch
+++ b/0067-tools-python-change-s-size-type-for-Python-3.10.patch
@@ -1,8 +1,8 @@
-From 0cbffc6099db7fd01041910a98b99ccad50af11b Mon Sep 17 00:00:00 2001
+From 651ffe2c7847cb9922d22980984a3bea6f47bea7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
<marmarek@invisiblethingslab.com>
-Date: Tue, 21 Mar 2023 13:49:28 +0100
-Subject: [PATCH 40/61] tools/python: change 's#' size type for Python >= 3.10
+Date: Tue, 21 Mar 2023 13:43:44 +0100
+Subject: [PATCH 67/89] tools/python: change 's#' size type for Python >= 3.10
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
diff --git a/0041-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch b/0068-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
similarity index 91%
rename from 0041-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
rename to 0068-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
index ff97af6..a47812b 100644
--- a/0041-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
+++ b/0068-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
@@ -1,7 +1,7 @@
-From 5ce8d2aef85f590e4fb42d18784512203069d0c0 Mon Sep 17 00:00:00 2001
+From 244d39fb13abae6c2da341b76363f169d8bbc93b Mon Sep 17 00:00:00 2001
From: Bernhard Kaindl <bernhard.kaindl@citrix.com>
-Date: Tue, 21 Mar 2023 13:49:47 +0100
-Subject: [PATCH 41/61] tools/xenmon: Fix xenmon.py for with python3.x
+Date: Tue, 21 Mar 2023 13:44:04 +0100
+Subject: [PATCH 68/89] tools/xenmon: Fix xenmon.py for with python3.x
Fixes for Py3:
* class Delayed(): file not defined; also an error for pylint -E. Inherit
diff --git a/0069-x86-spec-ctrl-Add-BHI-controls-to-userspace-componen.patch b/0069-x86-spec-ctrl-Add-BHI-controls-to-userspace-componen.patch
new file mode 100644
index 0000000..734a2e5
--- /dev/null
+++ b/0069-x86-spec-ctrl-Add-BHI-controls-to-userspace-componen.patch
@@ -0,0 +1,51 @@
+From b4dad09bb23c439f2e67ed2eb6d7bdd640b8bbae Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 21 Mar 2023 13:44:27 +0100
+Subject: [PATCH 69/89] x86/spec-ctrl: Add BHI controls to userspace components
+
+This was an oversight when adding the Xen parts.
+
+Fixes: cea9ae062295 ("x86/spec-ctrl: Enumeration for new Intel BHI controls")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 9276e832aef60437da13d91e66fc259fd94d6f91
+master date: 2023-03-13 11:26:26 +0000
+---
+ tools/libs/light/libxl_cpuid.c | 3 +++
+ tools/misc/xen-cpuid.c | 6 +++---
+ 2 files changed, 6 insertions(+), 3 deletions(-)
+
+diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
+index d97a2f3338..55cfbc8f23 100644
+--- a/tools/libs/light/libxl_cpuid.c
++++ b/tools/libs/light/libxl_cpuid.c
+@@ -238,6 +238,9 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
+ {"cet-sss", 0x00000007, 1, CPUID_REG_EDX, 18, 1},
+
+ {"intel-psfd", 0x00000007, 2, CPUID_REG_EDX, 0, 1},
++ {"ipred-ctrl", 0x00000007, 2, CPUID_REG_EDX, 1, 1},
++ {"rrsba-ctrl", 0x00000007, 2, CPUID_REG_EDX, 2, 1},
++ {"bhi-ctrl", 0x00000007, 2, CPUID_REG_EDX, 4, 1},
+ {"mcdt-no", 0x00000007, 2, CPUID_REG_EDX, 5, 1},
+
+ {"lahfsahf", 0x80000001, NA, CPUID_REG_ECX, 0, 1},
+diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
+index 0248eaef44..45e443f5d9 100644
+--- a/tools/misc/xen-cpuid.c
++++ b/tools/misc/xen-cpuid.c
+@@ -213,9 +213,9 @@ static const char *const str_7d1[32] =
+
+ static const char *const str_7d2[32] =
+ {
+- [ 0] = "intel-psfd",
+-
+- /* 4 */ [ 5] = "mcdt-no",
++ [ 0] = "intel-psfd", [ 1] = "ipred-ctrl",
++ [ 2] = "rrsba-ctrl",
++ [ 4] = "bhi-ctrl", [ 5] = "mcdt-no",
+ };
+
+ static const struct {
+--
+2.40.0
+
diff --git a/0042-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch b/0070-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
similarity index 86%
rename from 0042-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
rename to 0070-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
index c425c43..0b2c2b4 100644
--- a/0042-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
+++ b/0070-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
@@ -1,7 +1,7 @@
-From 4a6bedefe589dab12182d6b974de8ea3b2fcc681 Mon Sep 17 00:00:00 2001
+From b5409f4e4d0722e8669123d59f15f784903d153f Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:50:18 +0100
-Subject: [PATCH 42/61] core-parking: fix build with gcc12 and NR_CPUS=1
+Date: Tue, 21 Mar 2023 13:44:53 +0100
+Subject: [PATCH 70/89] core-parking: fix build with gcc12 and NR_CPUS=1
Gcc12 takes issue with core_parking_remove()'s
@@ -27,12 +27,12 @@ master date: 2023-03-13 15:15:42 +0100
4 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
-index 3c14096c80..8e2b504923 100644
+index 6a7825f4ba..2a5c3304e2 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
-@@ -8,7 +8,7 @@ config X86
- select ACPI_LEGACY_TABLES_LOOKUP
+@@ -10,7 +10,7 @@ config X86
select ALTERNATIVE_CALL
+ select ARCH_MAP_DOMAIN_PAGE
select ARCH_SUPPORTS_INT128
- select CORE_PARKING
+ imply CORE_PARKING
@@ -40,10 +40,10 @@ index 3c14096c80..8e2b504923 100644
select HAS_COMPAT
select HAS_CPUFREQ
diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
-index bf4090c942..c35e5669a4 100644
+index a7341dc3d7..e7deee2268 100644
--- a/xen/arch/x86/platform_hypercall.c
+++ b/xen/arch/x86/platform_hypercall.c
-@@ -725,12 +725,17 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op)
+@@ -727,12 +727,17 @@ ret_t do_platform_op(
case XEN_CORE_PARKING_SET:
idle_nums = min_t(uint32_t,
op->u.core_parking.idle_nums, num_present_cpus() - 1);
@@ -65,7 +65,7 @@ index bf4090c942..c35e5669a4 100644
-EFAULT : 0;
break;
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
-index aff52a13f3..ff843eaee2 100644
+index f82abc2488..f8f8d79755 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -179,6 +179,9 @@ long arch_do_sysctl(
@@ -79,7 +79,7 @@ index aff52a13f3..ff843eaee2 100644
fn = smt_up_down_helper;
hcpu = _p(plug);
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
-index 6443943889..c9f4b7f492 100644
+index f1ea3199c8..855c843113 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -10,6 +10,7 @@ config COMPAT
diff --git a/0043-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch b/0071-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
similarity index 80%
rename from 0043-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
rename to 0071-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
index 0e040ad..b33bd11 100644
--- a/0043-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
+++ b/0071-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
@@ -1,7 +1,7 @@
-From cdde3171a2a932a6836b094c4387412e27414ec9 Mon Sep 17 00:00:00 2001
+From d84612ecab00ab31c09a7c5a5892906edbacaf5b Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:51:42 +0100
-Subject: [PATCH 43/61] x86/altp2m: help gcc13 to avoid it emitting a warning
+Date: Tue, 21 Mar 2023 13:45:47 +0100
+Subject: [PATCH 71/89] x86/altp2m: help gcc13 to avoid it emitting a warning
Switches of altp2m-s always expect a valid altp2m to be in place (and
indeed altp2m_vcpu_initialise() sets the active one to be at index 0).
@@ -35,16 +35,16 @@ Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: be62b1fc2aa7375d553603fca07299da765a89fe
master date: 2023-03-13 15:16:21 +0100
---
- xen/arch/x86/hvm/vmx/vmx.c | 8 +-------
- xen/arch/x86/mm/p2m.c | 14 ++------------
- xen/include/asm-x86/p2m.h | 20 ++++++++++++++++++++
+ xen/arch/x86/hvm/vmx/vmx.c | 8 +-------
+ xen/arch/x86/include/asm/p2m.h | 20 ++++++++++++++++++++
+ xen/arch/x86/mm/p2m.c | 14 ++------------
3 files changed, 23 insertions(+), 19 deletions(-)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index 094141be9a..c8a839cd5e 100644
+index f0e759eeaf..a8fb4365ad 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -4036,13 +4036,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+@@ -4072,13 +4072,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
}
}
@@ -58,12 +58,43 @@ index 094141be9a..c8a839cd5e 100644
+ p2m_set_altp2m(v, idx);
}
- /* XXX: This looks ugly, but we need a mechanism to ensure
+ if ( unlikely(currd->arch.monitor.vmexit_enabled) )
+diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
+index bd684d02f3..cd43d8621a 100644
+--- a/xen/arch/x86/include/asm/p2m.h
++++ b/xen/arch/x86/include/asm/p2m.h
+@@ -879,6 +879,26 @@ static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
+ return v->domain->arch.altp2m_p2m[index];
+ }
+
++/* set current alternate p2m table */
++static inline bool p2m_set_altp2m(struct vcpu *v, unsigned int idx)
++{
++ struct p2m_domain *orig;
++
++ BUG_ON(idx >= MAX_ALTP2M);
++
++ if ( idx == vcpu_altp2m(v).p2midx )
++ return false;
++
++ orig = p2m_get_altp2m(v);
++ BUG_ON(!orig);
++ atomic_dec(&orig->active_vcpus);
++
++ vcpu_altp2m(v).p2midx = idx;
++ atomic_inc(&v->domain->arch.altp2m_p2m[idx]->active_vcpus);
++
++ return true;
++}
++
+ /* Switch alternate p2m for a single vcpu */
+ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx);
+
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
-index 8781df9dda..2d41446a69 100644
+index a405ee5fde..b28c899b5e 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
-@@ -2194,13 +2194,8 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx)
+@@ -1787,13 +1787,8 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx)
if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
{
@@ -78,7 +109,7 @@ index 8781df9dda..2d41446a69 100644
rc = 1;
}
-@@ -2471,13 +2466,8 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx)
+@@ -2070,13 +2065,8 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx)
if ( d->arch.altp2m_visible_eptp[idx] != mfn_x(INVALID_MFN) )
{
for_each_vcpu( d, v )
@@ -93,37 +124,6 @@ index 8781df9dda..2d41446a69 100644
rc = 0;
}
-diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
-index 2db9ab0122..f92bb97394 100644
---- a/xen/include/asm-x86/p2m.h
-+++ b/xen/include/asm-x86/p2m.h
-@@ -841,6 +841,26 @@ static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
- return v->domain->arch.altp2m_p2m[index];
- }
-
-+/* set current alternate p2m table */
-+static inline bool p2m_set_altp2m(struct vcpu *v, unsigned int idx)
-+{
-+ struct p2m_domain *orig;
-+
-+ BUG_ON(idx >= MAX_ALTP2M);
-+
-+ if ( idx == vcpu_altp2m(v).p2midx )
-+ return false;
-+
-+ orig = p2m_get_altp2m(v);
-+ BUG_ON(!orig);
-+ atomic_dec(&orig->active_vcpus);
-+
-+ vcpu_altp2m(v).p2midx = idx;
-+ atomic_inc(&v->domain->arch.altp2m_p2m[idx]->active_vcpus);
-+
-+ return true;
-+}
-+
- /* Switch alternate p2m for a single vcpu */
- bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx);
-
--
2.40.0
diff --git a/0044-VT-d-constrain-IGD-check.patch b/0072-VT-d-constrain-IGD-check.patch
similarity index 89%
rename from 0044-VT-d-constrain-IGD-check.patch
rename to 0072-VT-d-constrain-IGD-check.patch
index 13ca74e..497b04b 100644
--- a/0044-VT-d-constrain-IGD-check.patch
+++ b/0072-VT-d-constrain-IGD-check.patch
@@ -1,7 +1,7 @@
-From 4d42cc4d25c35ca381370a1fa0b45350723d1308 Mon Sep 17 00:00:00 2001
+From f971f5c531ce6a5fd6c1ff1f525f2c6837eeb78d Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:52:20 +0100
-Subject: [PATCH 44/61] VT-d: constrain IGD check
+Date: Tue, 21 Mar 2023 13:46:39 +0100
+Subject: [PATCH 72/89] VT-d: constrain IGD check
Marking a DRHD as controlling an IGD isn't very sensible without
checking that at the very least it's a graphics device that lives at
@@ -17,7 +17,7 @@ master date: 2023-03-14 10:44:08 +0100
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
-index 33a12b2ae9..9ec49936b8 100644
+index 78c8bad151..78d4526446 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -391,15 +391,12 @@ static int __init acpi_parse_dev_scope(
diff --git a/0045-bunzip-work-around-gcc13-warning.patch b/0073-bunzip-work-around-gcc13-warning.patch
similarity index 87%
rename from 0045-bunzip-work-around-gcc13-warning.patch
rename to 0073-bunzip-work-around-gcc13-warning.patch
index 9b26011..c7ec163 100644
--- a/0045-bunzip-work-around-gcc13-warning.patch
+++ b/0073-bunzip-work-around-gcc13-warning.patch
@@ -1,7 +1,7 @@
-From 49116b2101094c3d6658928f03db88d035ba97be Mon Sep 17 00:00:00 2001
+From 7082d656ae9bcd26392caf72e50e0f7a61c8f285 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:52:58 +0100
-Subject: [PATCH 45/61] bunzip: work around gcc13 warning
+Date: Tue, 21 Mar 2023 13:47:11 +0100
+Subject: [PATCH 73/89] bunzip: work around gcc13 warning
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -22,7 +22,7 @@ master date: 2023-03-14 10:45:28 +0100
1 file changed, 5 insertions(+)
diff --git a/xen/common/bunzip2.c b/xen/common/bunzip2.c
-index 2087cfbbed..5108e570ed 100644
+index 61b80aff1b..4466426941 100644
--- a/xen/common/bunzip2.c
+++ b/xen/common/bunzip2.c
@@ -233,6 +233,11 @@ static int __init get_next_block(struct bunzip_data *bd)
diff --git a/0046-libacpi-fix-PCI-hotplug-AML.patch b/0074-libacpi-fix-PCI-hotplug-AML.patch
similarity index 92%
rename from 0046-libacpi-fix-PCI-hotplug-AML.patch
rename to 0074-libacpi-fix-PCI-hotplug-AML.patch
index b1c79f5..3583849 100644
--- a/0046-libacpi-fix-PCI-hotplug-AML.patch
+++ b/0074-libacpi-fix-PCI-hotplug-AML.patch
@@ -1,7 +1,7 @@
-From 54102e428ba3f677904278479f8110c8eef6fedc Mon Sep 17 00:00:00 2001
+From 3eac216e6e60860bbc030602c401d3ef8efce8d9 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
-Date: Tue, 21 Mar 2023 13:53:25 +0100
-Subject: [PATCH 46/61] libacpi: fix PCI hotplug AML
+Date: Tue, 21 Mar 2023 13:47:52 +0100
+Subject: [PATCH 74/89] libacpi: fix PCI hotplug AML
The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
@@ -40,7 +40,7 @@ master date: 2023-03-20 17:12:34 +0100
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
-index c5ba4c0b2f..250a50b7eb 100644
+index 1176da80ef..1d27809116 100644
--- a/tools/libacpi/mk_dsdt.c
+++ b/tools/libacpi/mk_dsdt.c
@@ -431,7 +431,7 @@ int main(int argc, char **argv)
diff --git a/0047-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch b/0075-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
similarity index 77%
rename from 0047-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
rename to 0075-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
index 54940ba..5decf2c 100644
--- a/0047-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
+++ b/0075-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
@@ -1,7 +1,7 @@
-From 8e9690a2252eda09537275a951ee0af0b3b330f2 Mon Sep 17 00:00:00 2001
+From 3c85fb7b65d6a8b0fa993bc1cb67eea9b4a64aca Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 31 Mar 2023 08:36:59 +0200
-Subject: [PATCH 47/61] AMD/IOMMU: without XT, x2APIC needs to be forced into
+Date: Fri, 31 Mar 2023 08:28:56 +0200
+Subject: [PATCH 75/89] AMD/IOMMU: without XT, x2APIC needs to be forced into
physical mode
An earlier change with the same title (commit 1ba66a870eba) altered only
@@ -19,10 +19,10 @@ master date: 2023-03-21 09:23:25 +0100
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
-index 628b441da5..247364af58 100644
+index 7dfc793514..d512c50fc5 100644
--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
-@@ -239,11 +239,11 @@ const struct genapic *__init apic_x2apic_probe(void)
+@@ -236,11 +236,11 @@ const struct genapic *__init apic_x2apic_probe(void)
if ( x2apic_phys < 0 )
{
/*
@@ -34,9 +34,9 @@ index 628b441da5..247364af58 100644
*/
- x2apic_phys = !iommu_intremap ||
+ x2apic_phys = iommu_intremap != iommu_intremap_full ||
- (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL);
- }
- else if ( !x2apic_phys )
+ (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
+ (IS_ENABLED(CONFIG_X2APIC_PHYSICAL) &&
+ !(acpi_gbl_FADT.flags & ACPI_FADT_APIC_CLUSTER));
--
2.40.0
diff --git a/0048-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch b/0076-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
similarity index 89%
rename from 0048-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
rename to 0076-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
index 4c480b0..d897da6 100644
--- a/0048-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
+++ b/0076-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
@@ -1,8 +1,8 @@
-From 07e8f5b3d1300327a9f2e67b03dead0e2138b92f Mon Sep 17 00:00:00 2001
+From 33b1c8cd86bd6c311131b8dff32bd45581e2fbc1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
<marmarek@invisiblethingslab.com>
-Date: Fri, 31 Mar 2023 08:38:07 +0200
-Subject: [PATCH 48/61] VT-d: fix iommu=no-igfx if the IOMMU scope contains
+Date: Fri, 31 Mar 2023 08:29:55 +0200
+Subject: [PATCH 76/89] VT-d: fix iommu=no-igfx if the IOMMU scope contains
fake device(s)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -27,7 +27,7 @@ master date: 2023-03-23 09:16:41 +0100
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
-index 9ec49936b8..bfec40f47d 100644
+index 78d4526446..4936c20952 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -389,7 +389,7 @@ static int __init acpi_parse_dev_scope(
diff --git a/0049-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch b/0077-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
similarity index 90%
rename from 0049-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
rename to 0077-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
index 0abf7e9..3486ccd 100644
--- a/0049-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
+++ b/0077-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
@@ -1,7 +1,7 @@
-From cab866ee62d860e9ff4abe701163972d4e9f896d Mon Sep 17 00:00:00 2001
+From 6f2d89d68175e74aca9c67761aa87ffc8f5ffed1 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 31 Mar 2023 08:38:42 +0200
-Subject: [PATCH 49/61] x86/shadow: fix and improve
+Date: Fri, 31 Mar 2023 08:30:41 +0200
+Subject: [PATCH 77/89] x86/shadow: fix and improve
sh_page_has_multiple_shadows()
While no caller currently invokes the function without first making sure
@@ -30,7 +30,7 @@ master date: 2023-03-24 11:07:08 +0100
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
-index 738214f75e..762214f73c 100644
+index 85bb26c7ea..c2bb1ed3c3 100644
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -324,7 +324,7 @@ static inline int sh_page_has_multiple_shadows(struct page_info *pg)
diff --git a/0050-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch b/0078-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
similarity index 88%
rename from 0050-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
rename to 0078-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
index 14a8e14..62de15a 100644
--- a/0050-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
+++ b/0078-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
@@ -1,7 +1,7 @@
-From 90320fd05991d7817cea85e1d45674b757abf03c Mon Sep 17 00:00:00 2001
+From 00aa5c93d14c6561a69fe204cbe29f7519830782 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:39:32 +0200
-Subject: [PATCH 50/61] x86/nospec: Fix evaluate_nospec() code generation under
+Date: Fri, 31 Mar 2023 08:31:20 +0200
+Subject: [PATCH 78/89] x86/nospec: Fix evaluate_nospec() code generation under
Clang
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -60,13 +60,13 @@ Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: bc3c133841435829ba5c0a48427e2a77633502ab
master date: 2023-03-24 12:16:31 +0000
---
- xen/include/asm-x86/nospec.h | 15 +++++++++++++--
+ xen/arch/x86/include/asm/nospec.h | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
-diff --git a/xen/include/asm-x86/nospec.h b/xen/include/asm-x86/nospec.h
+diff --git a/xen/arch/x86/include/asm/nospec.h b/xen/arch/x86/include/asm/nospec.h
index 5312ae4c6f..7150e76b87 100644
---- a/xen/include/asm-x86/nospec.h
-+++ b/xen/include/asm-x86/nospec.h
+--- a/xen/arch/x86/include/asm/nospec.h
++++ b/xen/arch/x86/include/asm/nospec.h
@@ -10,15 +10,26 @@
static always_inline bool barrier_nospec_true(void)
{
diff --git a/0051-x86-shadow-Fix-build-with-no-PG_log_dirty.patch b/0079-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
similarity index 77%
rename from 0051-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
rename to 0079-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
index ef2a137..f7652a4 100644
--- a/0051-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
+++ b/0079-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
@@ -1,7 +1,7 @@
-From 7e1fe95c79d55a1c1a65f71a078b8e31c69ffe94 Mon Sep 17 00:00:00 2001
+From 11c8ef59b9024849c0fc224354904615d5579628 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:39:49 +0200
-Subject: [PATCH 51/61] x86/shadow: Fix build with no PG_log_dirty
+Date: Fri, 31 Mar 2023 08:32:11 +0200
+Subject: [PATCH 79/89] x86/shadow: Fix build with no PG_log_dirty
Gitlab Randconfig found:
@@ -22,14 +22,14 @@ Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d14cb105b1c54ad7b4228d858ae85aa8a672bbd
master date: 2023-03-24 12:16:31 +0000
---
- xen/include/asm-x86/paging.h | 8 ++++----
+ xen/arch/x86/include/asm/paging.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
-diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
-index c6b429c691..43abaa5bd1 100644
---- a/xen/include/asm-x86/paging.h
-+++ b/xen/include/asm-x86/paging.h
-@@ -154,6 +154,10 @@ struct paging_mode {
+diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
+index 635ccc83b1..6f7000d5f4 100644
+--- a/xen/arch/x86/include/asm/paging.h
++++ b/xen/arch/x86/include/asm/paging.h
+@@ -152,6 +152,10 @@ struct paging_mode {
/*****************************************************************************
* Log dirty code */
@@ -40,7 +40,7 @@ index c6b429c691..43abaa5bd1 100644
#if PG_log_dirty
/* get the dirty bitmap for a specific range of pfns */
-@@ -192,10 +196,6 @@ int paging_mfn_is_dirty(struct domain *d, mfn_t gmfn);
+@@ -190,10 +194,6 @@ bool paging_mfn_is_dirty(const struct domain *d, mfn_t gmfn);
#define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
(LOGDIRTY_NODE_ENTRIES-1))
diff --git a/0052-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch b/0080-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
similarity index 86%
rename from 0052-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
rename to 0080-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
index c408fbb..539401f 100644
--- a/0052-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
+++ b/0080-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
@@ -1,7 +1,7 @@
-From b1022b65de59828d40d9d71cc734a42c1c30c972 Mon Sep 17 00:00:00 2001
+From f6a3e93b3788aa009e9b86d9cb14c243b958daa9 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:40:27 +0200
-Subject: [PATCH 52/61] x86/vmx: Don't spuriously crash the domain when INIT is
+Date: Fri, 31 Mar 2023 08:32:57 +0200
+Subject: [PATCH 80/89] x86/vmx: Don't spuriously crash the domain when INIT is
received
In VMX operation, the handling of INIT IPIs is changed. Instead of the CPU
@@ -32,10 +32,10 @@ master date: 2023-03-24 22:49:58 +0000
1 file changed, 4 insertions(+)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index c8a839cd5e..cebe46ef6a 100644
+index a8fb4365ad..64dbd50197 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -4002,6 +4002,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+@@ -4038,6 +4038,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
case EXIT_REASON_MCE_DURING_VMENTRY:
do_machine_check(regs);
break;
diff --git a/0053-x86-ucode-Fix-error-paths-control_thread_fn.patch b/0081-x86-ucode-Fix-error-paths-control_thread_fn.patch
similarity index 77%
rename from 0053-x86-ucode-Fix-error-paths-control_thread_fn.patch
rename to 0081-x86-ucode-Fix-error-paths-control_thread_fn.patch
index 7bb2c27..765fa84 100644
--- a/0053-x86-ucode-Fix-error-paths-control_thread_fn.patch
+++ b/0081-x86-ucode-Fix-error-paths-control_thread_fn.patch
@@ -1,7 +1,7 @@
-From 0f81c5a2c8e0432d5af3d9f4e6398376cd514516 Mon Sep 17 00:00:00 2001
+From 7f55774489d2f12a23f2ac0f516b62e2709cea99 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:40:56 +0200
-Subject: [PATCH 53/61] x86/ucode: Fix error paths control_thread_fn()
+Date: Fri, 31 Mar 2023 08:33:28 +0200
+Subject: [PATCH 81/89] x86/ucode: Fix error paths control_thread_fn()
These two early exits skipped re-enabling the watchdog, restoring the NMI
callback, and clearing the nmi_patch global pointer. Always execute the tail
@@ -18,10 +18,10 @@ master date: 2023-03-28 11:57:56 +0100
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index ee7df9a591..ad150e5963 100644
+index 2497630bbe..c760723e4f 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -488,10 +488,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
+@@ -490,10 +490,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
ret = wait_for_condition(wait_cpu_callin, num_online_cpus(),
MICROCODE_CALLIN_TIMEOUT_US);
if ( ret )
@@ -32,8 +32,8 @@ index ee7df9a591..ad150e5963 100644
+ goto out;
/* Control thread loads ucode first while others are in NMI handler. */
- ret = microcode_ops->apply_microcode(patch);
-@@ -503,8 +500,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
+ ret = alternative_call(ucode_ops.apply_microcode, patch);
+@@ -505,8 +502,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
{
printk(XENLOG_ERR
"Late loading aborted: CPU%u failed to update ucode\n", cpu);
@@ -43,7 +43,7 @@ index ee7df9a591..ad150e5963 100644
}
/* Let primary threads load the given ucode update */
-@@ -535,6 +531,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
+@@ -537,6 +533,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
}
}
diff --git a/0082-include-don-t-mention-stub-headers-more-than-once-in.patch b/0082-include-don-t-mention-stub-headers-more-than-once-in.patch
new file mode 100644
index 0000000..cc0a914
--- /dev/null
+++ b/0082-include-don-t-mention-stub-headers-more-than-once-in.patch
@@ -0,0 +1,37 @@
+From 350693582427887387f21a6eeedaa0ac48aecc3f Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Fri, 31 Mar 2023 08:34:04 +0200
+Subject: [PATCH 82/89] include: don't mention stub headers more than once in a
+ make rule
+
+When !GRANT_TABLE and !PV_SHIM headers-n contains grant_table.h twice,
+causing make to complain "target '...' given more than once in the same
+rule" for the rule generating the stub headers. We don't need duplicate
+entries in headers-n anywhere, so zap them (by using $(sort ...)) right
+where the final value of the variable is constructed.
+
+Fixes: 6bec713f871f ("include/compat: produce stubs for headers not otherwise generated")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 231ab79704cbb5b9be7700287c3b185225d34f1b
+master date: 2023-03-28 14:20:16 +0200
+---
+ xen/include/Makefile | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/include/Makefile b/xen/include/Makefile
+index cfd7851614..e19f9464fd 100644
+--- a/xen/include/Makefile
++++ b/xen/include/Makefile
+@@ -34,7 +34,7 @@ headers-$(CONFIG_TRACEBUFFER) += compat/trace.h
+ headers-$(CONFIG_XENOPROF) += compat/xenoprof.h
+ headers-$(CONFIG_XSM_FLASK) += compat/xsm/flask_op.h
+
+-headers-n := $(filter-out $(headers-y),$(headers-n) $(headers-))
++headers-n := $(sort $(filter-out $(headers-y),$(headers-n) $(headers-)))
+
+ cppflags-y := -include public/xen-compat.h -DXEN_GENERATING_COMPAT_HEADERS
+ cppflags-$(CONFIG_X86) += -m32
+--
+2.40.0
+
diff --git a/0054-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch b/0083-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
similarity index 88%
rename from 0054-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
rename to 0083-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
index 4973ae7..8a1f412 100644
--- a/0054-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
+++ b/0083-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
@@ -1,7 +1,7 @@
-From d080287c2a8dce11baee1d7bbf9276757e8572e4 Mon Sep 17 00:00:00 2001
+From 85100ed78ca18f188b1ca495f132db7df705f1a4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Fri, 31 Mar 2023 08:41:27 +0200
-Subject: [PATCH 54/61] vpci/msix: handle accesses adjacent to the MSI-X table
+Date: Fri, 31 Mar 2023 08:34:26 +0200
+Subject: [PATCH 83/89] vpci/msix: handle accesses adjacent to the MSI-X table
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -62,13 +62,13 @@ master date: 2023-03-28 14:20:35 +0200
master commit: 7a502b4fbc339e9d3d3d45fb37f09da06bc3081c
master date: 2023-03-29 14:56:33 +0200
---
- xen/drivers/vpci/msix.c | 357 +++++++++++++++++++++++++++++-----------
+ xen/drivers/vpci/msix.c | 353 +++++++++++++++++++++++++++++-----------
xen/drivers/vpci/vpci.c | 7 +-
xen/include/xen/vpci.h | 8 +-
- 3 files changed, 275 insertions(+), 97 deletions(-)
+ 3 files changed, 273 insertions(+), 95 deletions(-)
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
-index ea5d73a02a..7e1bfb2f0a 100644
+index bea0cc7aed..cafddcf305 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -27,6 +27,11 @@
@@ -80,8 +80,8 @@ index ea5d73a02a..7e1bfb2f0a 100644
+ PFN_DOWN(addr) <= PFN_DOWN(vmsix_table_addr(vpci, nr) + \
+ vmsix_table_size(vpci, nr) - 1))
+
- static uint32_t control_read(const struct pci_dev *pdev, unsigned int reg,
- void *data)
+ static uint32_t cf_check control_read(
+ const struct pci_dev *pdev, unsigned int reg, void *data)
{
@@ -149,7 +154,7 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
@@ -179,7 +179,11 @@ index ea5d73a02a..7e1bfb2f0a 100644
+
+ return false;
+}
-+
+
+- pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
+- vmsix_table_size(vpci, VPCI_MSIX_PBA));
+- if ( !pba )
+- return read_atomic(&msix->pba);
+static int adjacent_read(const struct domain *d, const struct vpci_msix *msix,
+ unsigned long addr, unsigned int len,
+ unsigned long *data)
@@ -205,11 +209,7 @@ index ea5d73a02a..7e1bfb2f0a 100644
+ if ( unlikely(!IS_ALIGNED(addr, len)) )
+ {
+ unsigned int i;
-
-- pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
-- vmsix_table_size(vpci, VPCI_MSIX_PBA));
-- if ( !pba )
-- return read_atomic(&msix->pba);
++
+ gprintk(XENLOG_DEBUG, "%pp: unaligned read to MSI-X related page\n",
+ &msix->pdev->sbdf);
+
@@ -280,8 +280,8 @@ index ea5d73a02a..7e1bfb2f0a 100644
+ return X86EMUL_OKAY;
}
- static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
-@@ -227,47 +368,11 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
+ static int cf_check msix_read(
+@@ -227,47 +368,11 @@ static int cf_check msix_read(
if ( !msix )
return X86EMUL_RETRY;
@@ -332,12 +332,12 @@ index ea5d73a02a..7e1bfb2f0a 100644
spin_lock(&msix->pdev->vpci->lock);
entry = get_entry(msix, addr);
-@@ -303,57 +408,103 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
+@@ -303,56 +408,102 @@ static int cf_check msix_read(
return X86EMUL_OKAY;
}
--static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
-- unsigned long data)
+-static int cf_check msix_write(
+- struct vcpu *v, unsigned long addr, unsigned int len, unsigned long data)
+static int adjacent_write(const struct domain *d, const struct vpci_msix *msix,
+ unsigned long addr, unsigned int len,
+ unsigned long data)
@@ -367,55 +367,48 @@ index ea5d73a02a..7e1bfb2f0a 100644
return X86EMUL_OKAY;
- if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
-- {
-- /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
-- if ( is_hardware_domain(d) )
-- {
-- struct vpci *vpci = msix->pdev->vpci;
-- unsigned int idx = addr - vmsix_table_addr(vpci, VPCI_MSIX_PBA);
-- const void __iomem *pba = get_pba(vpci);
+ slot = get_slot(vpci, addr);
+ if ( slot >= ARRAY_SIZE(msix->table) )
+ return X86EMUL_OKAY;
-
-- if ( !pba )
-- {
-- /* Unable to map the PBA, ignore write. */
-- gprintk(XENLOG_WARNING,
-- "%pp: unable to map MSI-X PBA, write ignored\n",
-- &msix->pdev->sbdf);
-- return X86EMUL_OKAY;
-- }
++
+ if ( unlikely(!IS_ALIGNED(addr, len)) )
-+ {
+ {
+- struct vpci *vpci = msix->pdev->vpci;
+- unsigned int idx = addr - vmsix_table_addr(vpci, VPCI_MSIX_PBA);
+- const void __iomem *pba = get_pba(vpci);
+ unsigned int i;
-- switch ( len )
-- {
-- case 4:
-- writel(data, pba + idx);
-- break;
+- if ( !is_hardware_domain(d) )
+- /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
+- return X86EMUL_OKAY;
+ gprintk(XENLOG_DEBUG, "%pp: unaligned write to MSI-X related page\n",
+ &msix->pdev->sbdf);
-- case 8:
-- writeq(data, pba + idx);
-- break;
+- if ( !pba )
+ for ( i = 0; i < len; i++ )
-+ {
+ {
+- /* Unable to map the PBA, ignore write. */
+- gprintk(XENLOG_WARNING,
+- "%pp: unable to map MSI-X PBA, write ignored\n",
+- &msix->pdev->sbdf);
+- return X86EMUL_OKAY;
+ int rc = adjacent_write(d, msix, addr + i, 1, data >> (i * 8));
-
-- default:
-- ASSERT_UNREACHABLE();
-- break;
-- }
++
+ if ( rc != X86EMUL_OKAY )
+ return rc;
}
- return X86EMUL_OKAY;
- }
+- switch ( len )
+- {
+- case 4:
+- writel(data, pba + idx);
+- break;
++ return X86EMUL_OKAY;
++ }
+- case 8:
+- writeq(data, pba + idx);
+- break;
+ spin_lock(&vpci->lock);
+ mem = get_table(vpci, slot);
+ if ( !mem )
@@ -426,13 +419,18 @@ index ea5d73a02a..7e1bfb2f0a 100644
+ &msix->pdev->sbdf);
+ return X86EMUL_OKAY;
+ }
-+
+
+- default:
+- ASSERT_UNREACHABLE();
+- break;
+- }
+ switch ( len )
+ {
+ case 1:
+ writeb(data, mem + PAGE_OFFSET(addr));
+ break;
-+
+
+- return X86EMUL_OKAY;
+ case 2:
+ writew(data, mem + PAGE_OFFSET(addr));
+ break;
@@ -447,14 +445,14 @@ index ea5d73a02a..7e1bfb2f0a 100644
+
+ default:
+ ASSERT_UNREACHABLE();
-+ }
+ }
+ spin_unlock(&vpci->lock);
+
+ return X86EMUL_OKAY;
+}
+
-+static int msix_write(struct vcpu *v, unsigned long addr, unsigned int len,
-+ unsigned long data)
++static int cf_check msix_write(
++ struct vcpu *v, unsigned long addr, unsigned int len, unsigned long data)
+{
+ const struct domain *d = v->domain;
+ struct vpci_msix *msix = msix_find(d, addr);
@@ -469,10 +467,9 @@ index ea5d73a02a..7e1bfb2f0a 100644
+
+ if ( !access_allowed(msix->pdev, addr, len) )
+ return X86EMUL_OKAY;
-+
+
spin_lock(&msix->pdev->vpci->lock);
entry = get_entry(msix, addr);
- offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
@@ -482,6 +633,26 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
}
}
@@ -501,10 +498,10 @@ index ea5d73a02a..7e1bfb2f0a 100644
}
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
-index b9339f8f3e..60b5f45cd1 100644
+index 6d48d496bb..652807a4a4 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
-@@ -53,9 +53,12 @@ void vpci_remove_device(struct pci_dev *pdev)
+@@ -54,9 +54,12 @@ void vpci_remove_device(struct pci_dev *pdev)
spin_unlock(&pdev->vpci->lock);
if ( pdev->vpci->msix )
{
@@ -520,10 +517,10 @@ index b9339f8f3e..60b5f45cd1 100644
xfree(pdev->vpci->msix);
xfree(pdev->vpci->msi);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
-index 755b4fd5c8..3326d9026e 100644
+index d8acfeba8a..0b8a2a3c74 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
-@@ -129,8 +129,12 @@ struct vpci {
+@@ -133,8 +133,12 @@ struct vpci {
bool enabled : 1;
/* Masked? */
bool masked : 1;
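The adjacent_write() loop shown in the hunk above re-issues an unaligned access one byte at a time (data >> (i * 8)). A minimal standalone sketch of that decomposition, for illustration only -- the function and parameter names below are made up and this is not code from the Xen tree:
    #include <stdint.h>
    /*
     * Illustration: split an unaligned multi-byte write into byte-wide
     * writes, the way the adjacent_write() loop above does.  'dst' is a
     * stand-in for the mapped page.
     */
    static void split_unaligned_write(volatile uint8_t *dst, uint64_t data,
                                      unsigned int len)
    {
        for ( unsigned int i = 0; i < len; i++ )
            dst[i] = (uint8_t)(data >> (i * 8));   /* little-endian order */
    }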
diff --git a/0055-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch b/0084-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
similarity index 86%
rename from 0055-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
rename to 0084-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
index 9c05f3a..6ab5c69 100644
--- a/0055-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
+++ b/0084-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
@@ -1,7 +1,7 @@
-From 06264af090ac69a95cdadbc261cc82d964dcb568 Mon Sep 17 00:00:00 2001
+From 7758cd57e002c5096b2296ede67c59fca68724d7 Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 31 Mar 2023 08:42:02 +0200
-Subject: [PATCH 55/61] ns16550: correct name/value pair parsing for PCI
+Date: Fri, 31 Mar 2023 08:35:15 +0200
+Subject: [PATCH 84/89] ns16550: correct name/value pair parsing for PCI
port/bridge
First of all these were inverted: "bridge=" caused the port coordinates
@@ -19,10 +19,10 @@ master date: 2023-03-29 14:55:37 +0200
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
-index 5dd4d723f5..3651e0c0d4 100644
+index ce013fb6a5..97b3d8d269 100644
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
-@@ -1536,13 +1536,6 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
+@@ -1631,13 +1631,6 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
break;
#ifdef CONFIG_HAS_PCI
@@ -36,7 +36,7 @@ index 5dd4d723f5..3651e0c0d4 100644
case device:
if ( strncmp(param_value, "pci", 3) == 0 )
{
-@@ -1557,9 +1550,16 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
+@@ -1652,9 +1645,16 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
break;
case port_bdf:
diff --git a/0057-CI-Drop-automation-configs.patch b/0085-CI-Drop-automation-configs.patch
similarity index 90%
rename from 0057-CI-Drop-automation-configs.patch
rename to 0085-CI-Drop-automation-configs.patch
index d726468..bfed25a 100644
--- a/0057-CI-Drop-automation-configs.patch
+++ b/0085-CI-Drop-automation-configs.patch
@@ -1,7 +1,7 @@
-From 657dc5f5f6269008fd7484ca7cca723e21455483 Mon Sep 17 00:00:00 2001
+From 4c0d792675f0843c6dd52acdae38e5c0e112b09e Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu, 29 Dec 2022 15:39:13 +0000
-Subject: [PATCH 57/61] CI: Drop automation/configs/
+Subject: [PATCH 85/89] CI: Drop automation/configs/
Having 3 extra hypervisor builds on the end of a full build is deeply
confusing to debug if one of them fails, because the .config file presented in
@@ -54,10 +54,10 @@ index e9d8b4a7c7..0000000000
-# CONFIG_HVM is not set
-# CONFIG_DEBUG is not set
diff --git a/automation/scripts/build b/automation/scripts/build
-index 281f8b1fcc..2c807fa397 100755
+index a593419063..5dafa72ba5 100755
--- a/automation/scripts/build
+++ b/automation/scripts/build
-@@ -73,24 +73,3 @@ if [[ "${XEN_TARGET_ARCH}" != "x86_32" ]]; then
+@@ -85,24 +85,3 @@ if [[ "${XEN_TARGET_ARCH}" != "x86_32" ]]; then
cp -r dist binaries/
fi
fi
@@ -79,8 +79,8 @@ index 281f8b1fcc..2c807fa397 100755
- echo "Building $cfg"
- make -j$(nproc) -C xen clean
- rm -f xen/.config
-- make -C xen KBUILD_DEFCONFIG=../../../../${cfg_dir}/${cfg} XEN_CONFIG_EXPERT=y defconfig
-- make -j$(nproc) -C xen XEN_CONFIG_EXPERT=y
+- make -C xen KBUILD_DEFCONFIG=../../../../${cfg_dir}/${cfg} defconfig
+- make -j$(nproc) -C xen
-done
--
2.40.0
diff --git a/0058-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch b/0086-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
similarity index 93%
rename from 0058-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
rename to 0086-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
index 92d65ec..a200cab 100644
--- a/0058-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
+++ b/0086-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
@@ -1,7 +1,7 @@
-From 37800cf8ab7806e506b96a13cad0fb395d86663a Mon Sep 17 00:00:00 2001
+From e3b23da4a10fafdabce22e2eba225d9404fc646f Mon Sep 17 00:00:00 2001
From: Michal Orzel <michal.orzel@amd.com>
Date: Tue, 14 Feb 2023 16:38:38 +0100
-Subject: [PATCH 58/61] automation: Switch arm32 cross builds to run on arm64
+Subject: [PATCH 86/89] automation: Switch arm32 cross builds to run on arm64
Due to the limited x86 CI resources slowing down the whole pipeline,
switch the arm32 cross builds to be executed on arm64 which is much more
@@ -42,7 +42,7 @@ index b41a57f197..11860425a6 100644
rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
-
diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index 06a75a8c5a..f66fbca8a7 100644
+index bed161b471..b4caf159f9 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -123,7 +123,7 @@
@@ -54,7 +54,7 @@ index 06a75a8c5a..f66fbca8a7 100644
.arm32-cross-build:
extends: .arm32-cross-build-tmpl
-@@ -497,23 +497,23 @@ alpine-3.12-clang-debug:
+@@ -505,23 +505,23 @@ alpine-3.12-clang-debug:
debian-unstable-gcc-arm32:
extends: .gcc-arm32-cross-build
variables:
diff --git a/0059-automation-Remove-CentOS-7.2-containers-and-builds.patch b/0087-automation-Remove-CentOS-7.2-containers-and-builds.patch
similarity index 96%
rename from 0059-automation-Remove-CentOS-7.2-containers-and-builds.patch
rename to 0087-automation-Remove-CentOS-7.2-containers-and-builds.patch
index 8d58eea..b5d629d 100644
--- a/0059-automation-Remove-CentOS-7.2-containers-and-builds.patch
+++ b/0087-automation-Remove-CentOS-7.2-containers-and-builds.patch
@@ -1,7 +1,7 @@
-From a4d901580b2ab3133bca13159b790914c217b0e2 Mon Sep 17 00:00:00 2001
+From 8c414bab3092bb68ab4eaaba39b61e3804c45f0a Mon Sep 17 00:00:00 2001
From: Anthony PERARD <anthony.perard@citrix.com>
Date: Tue, 21 Feb 2023 16:55:36 +0000
-Subject: [PATCH 59/61] automation: Remove CentOS 7.2 containers and builds
+Subject: [PATCH 87/89] automation: Remove CentOS 7.2 containers and builds
We already have a container which track the latest CentOS 7, no need
for this one as well.
@@ -120,7 +120,7 @@ index 4da27faeb5..0000000000
-gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
-
diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index f66fbca8a7..bc1a732069 100644
+index b4caf159f9..ff6df1cfc2 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -184,16 +184,6 @@ archlinux-gcc-debug:
diff --git a/0060-automation-Remove-non-debug-x86_32-build-jobs.patch b/0088-automation-Remove-non-debug-x86_32-build-jobs.patch
similarity index 88%
rename from 0060-automation-Remove-non-debug-x86_32-build-jobs.patch
rename to 0088-automation-Remove-non-debug-x86_32-build-jobs.patch
index c5516be..d16014e 100644
--- a/0060-automation-Remove-non-debug-x86_32-build-jobs.patch
+++ b/0088-automation-Remove-non-debug-x86_32-build-jobs.patch
@@ -1,7 +1,7 @@
-From 27974fde92850419e385ad0355997c54d78046f2 Mon Sep 17 00:00:00 2001
+From 435a1e5e8fd6fbd52cc16570dcff5982bdbec351 Mon Sep 17 00:00:00 2001
From: Anthony PERARD <anthony.perard@citrix.com>
Date: Fri, 24 Feb 2023 17:29:15 +0000
-Subject: [PATCH 60/61] automation: Remove non-debug x86_32 build jobs
+Subject: [PATCH 88/89] automation: Remove non-debug x86_32 build jobs
In the interest of having less jobs, we remove the x86_32 build jobs
that do release build. Debug build is very likely to be enough to find
@@ -15,7 +15,7 @@ Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
1 file changed, 20 deletions(-)
diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index bc1a732069..4b51ad9e34 100644
+index ff6df1cfc2..eea517aa0a 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -264,21 +264,11 @@ debian-stretch-gcc-debug:
@@ -40,7 +40,7 @@ index bc1a732069..4b51ad9e34 100644
debian-stretch-32-gcc-debug:
extends: .gcc-x86-32-build-debug
variables:
-@@ -316,21 +306,11 @@ debian-unstable-gcc-debug-randconfig:
+@@ -324,21 +314,11 @@ debian-unstable-gcc-debug-randconfig:
CONTAINER: debian:unstable
RANDCONFIG: y
diff --git a/0061-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch b/0089-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
similarity index 94%
rename from 0061-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
rename to 0089-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
index 9170382..c0294ec 100644
--- a/0061-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
+++ b/0089-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
@@ -1,7 +1,7 @@
-From 31627a059c2e186f4ad12d171d964b09abe8a4a9 Mon Sep 17 00:00:00 2001
+From e4a5fb9227889bec99ab212b839680f4d5b51e60 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Fri, 24 Mar 2023 17:59:56 +0000
-Subject: [PATCH 61/61] CI: Remove llvm-8 from the Debian Stretch container
+Subject: [PATCH 89/89] CI: Remove llvm-8 from the Debian Stretch container
For similar reasons to c/s a6b1e2b80fe20. While this container is still
build-able for now, all the other problems with explicitly-versioned compilers
@@ -47,7 +47,7 @@ index da6aa874dd..9861acbcc3 100644
- apt-get clean && \
- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index 4b51ad9e34..fd8034b429 100644
+index eea517aa0a..802449cb96 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -27,13 +27,6 @@
diff --git a/info.txt b/info.txt
index c92b6d7..45b2f7f 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #0 for 4.16.4-pre
+Xen upstream patchset #0 for 4.17.1-pre
Containing patches from
-RELEASE-4.16.3 (08c42cec2f3dbb8d1df62c2ad4945d127b418fd6)
+RELEASE-4.17.0 (5556ac9bf224ed6b977f214653b234de45dcdfbf)
to
-staging-4.16 (4ad5975d4e35635f03d2cb9e86292c0daeabd75f)
+staging-4.17 (e4a5fb9227889bec99ab212b839680f4d5b51e60)
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2023-10-18 18:31 Tomáš Mózes
0 siblings, 0 replies; 11+ messages in thread
From: Tomáš Mózes @ 2023-10-18 18:31 UTC (permalink / raw
To: gentoo-commits
commit: ffe00bc5becaed2dbaed9fdcadb6eea0bd4f9dd4
Author: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
AuthorDate: Wed Oct 18 18:30:08 2023 +0000
Commit: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
CommitDate: Wed Oct 18 18:30:08 2023 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=ffe00bc5
Xen 4.17.3-pre-patchset-0
Signed-off-by: Tomáš Mózes <hydrapolic <AT> gmail.com>
0001-update-Xen-version-to-4.17.1-pre.patch | 136 ------
0001-update-Xen-version-to-4.17.3-pre.patch | 25 +
...ild-with-old-gcc-after-CPU-policy-changes.patch | 84 ++++
...not-release-irq-until-all-cleanup-is-done.patch | 90 ----
...EN_LIB_DIR-to-store-bootloader-from-pygru.patch | 45 ++
...not-forward-MADT-Local-APIC-NMI-structure.patch | 103 ----
0004-build-define-ARCH-and-SRCARCH-later.patch | 67 +++
...-t-mark-external-IRQs-as-pending-when-vLA.patch | 71 ---
...remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch | 50 ++
...n-don-t-mark-IRQ-vectors-as-pending-when-.patch | 60 ---
...remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch | 123 +++++
...-t-mark-evtchn-upcall-vector-as-pending-w.patch | 70 ---
...ate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch | 58 +++
...roadcast-accept-partial-broadcast-success.patch | 34 --
...valuate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch | 50 ++
...cate-the-ESRT-when-booting-via-multiboot2.patch | 195 --------
...prevent-overflow-with-high-frequency-TSCs.patch | 34 --
...ork-wrapping-of-libc-functions-in-test-an.patch | 245 ++++++++++
0010-rombios-Work-around-GCC-issue-99578.patch | 43 ++
...tored-Fix-incorrect-scope-after-an-if-sta.patch | 52 --
0011-rombios-Avoid-using-K-R-function-syntax.patch | 74 +++
...-evtchn-OCaml-5-support-fix-potential-res.patch | 68 ---
0012-rombios-Remove-the-use-of-egrep.patch | 34 ++
...l-evtchn-Add-binding-for-xenevtchn_fdopen.patch | 81 ----
0013-CI-Resync-FreeBSD-config-with-staging.patch | 62 +++
...-evtchn-Extend-the-init-binding-with-a-cl.patch | 90 ----
0014-tools-oxenstored-Style-fixes-to-Domain.patch | 64 ---
...-Fix-Wsingle-bit-bitfield-constant-conver.patch | 43 ++
...tored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch | 82 ----
0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch | 143 ++++++
...tored-Rename-some-port-variables-to-remot.patch | 144 ------
0016-x86-head-check-base-address-alignment.patch | 85 ++++
...oxenstored-Implement-Domain.rebind_evtchn.patch | 67 ---
...e-Handle-start-of-day-RUNNING-transitions.patch | 275 +++++++++++
...tored-Rework-Domain-evtchn-handling-to-us.patch | 209 --------
...sanitize-IO-APIC-pins-before-enabling-lap.patch | 113 +++++
...tored-Keep-dev-xen-evtchn-open-across-liv.patch | 367 --------------
...-x86-ioapic-add-a-raw-field-to-RTE-struct.patch | 147 ++++++
...tored-Log-live-update-issues-at-warning-l.patch | 42 --
...RTE-modifications-must-use-ioapic_write_e.patch | 180 +++++++
...ename-io_apic_read_remap_rte-local-variab.patch | 64 +++
...oxenstored-Set-uncaught-exception-handler.patch | 83 ----
...tored-syslog-Avoid-potential-NULL-derefer.patch | 55 ---
...ass-full-IO-APIC-RTE-for-remapping-table-.patch | 462 ++++++++++++++++++
0023-build-correct-gas-noexecstack-check.patch | 34 ++
...tored-Render-backtraces-more-nicely-in-Sy.patch | 83 ----
...s-xenstore-simplify-loop-handling-connect.patch | 136 ------
...tly-correct-JSON-generation-of-CPU-policy.patch | 38 ++
0025-tboot-Disable-CET-at-shutdown.patch | 53 ++
...-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch | 36 --
...uild-with-recent-QEMU-use-enable-trace-ba.patch | 50 --
...-valid-condition-in-svm_get_pending_event.patch | 29 ++
| 74 ---
...ert-x86-VMX-sanitize-rIP-before-re-enteri.patch | 100 ++++
...ix-reporting-of-spurious-i8259-interrupts.patch | 41 ++
...culate-model-specific-LBRs-once-at-start-.patch | 342 -------------
...pport-for-CPUs-without-model-specific-LBR.patch | 83 ----
...e-Handle-cache-flush-of-an-element-at-the.patch | 111 +++++
...end-Zenbleed-check-to-models-good-ucode-i.patch | 48 ++
...fix-PAE-check-for-top-level-table-unshado.patch | 39 --
| 50 --
...rl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch | 74 +++
...x-an-incorrect-assignment-to-uart-io_size.patch | 34 --
...rl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch | 85 ++++
0033-libxl-fix-guest-kexec-skip-cpuid-policy.patch | 72 ---
...rl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch | 83 ++++
...-xenctrl-Make-domain_getinfolist-tail-rec.patch | 71 ---
...rl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch | 106 ++++
...-xenctrl-Use-larger-chunksize-in-domain_g.patch | 41 --
...djust-restore_all_xen-to-hold-stack_end-i.patch | 74 +++
...aml-xb-mmap-Use-Data_abstract_val-wrapper.patch | 75 ---
...rack-the-IST-ness-of-an-entry-for-the-exi.patch | 109 +++++
0037-tools-ocaml-xb-Drop-Xs_ring.write.patch | 62 ---
...ec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch | 89 ++++
...tored-validate-config-file-before-live-up.patch | 131 -----
...md-Introduce-is_zen-1-2-_uarch-predicates.patch | 91 ++++
...l-libs-Don-t-declare-stubs-as-taking-void.patch | 61 ---
...6-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch | 228 +++++++++
...-libs-Allocate-the-correct-amount-of-memo.patch | 80 ---
...defer-releasing-of-PV-s-top-level-shadow-.patch | 455 +++++++++++++++++
...-evtchn-Don-t-reference-Custom-objects-wi.patch | 213 --------
...ored-domain_entry_fix-Handle-conflicting-.patch | 64 +++
...-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch | 186 +++++++
...-xc-Fix-binding-for-xc_domain_assign_devi.patch | 70 ---
0043-libfsimage-xfs-Remove-dead-code.patch | 71 +++
...-xc-Don-t-reference-Abstract_Tag-objects-.patch | 76 ---
...-xfs-Amend-mask32lo-to-allow-the-value-32.patch | 33 ++
...-libs-Fix-memory-resource-leaks-with-caml.patch | 61 ---
...xfs-Sanity-check-the-superblock-during-mo.patch | 137 ++++++
...rl-Mitigate-Cross-Thread-Return-Address-P.patch | 120 -----
...Remove-clang-8-from-Debian-unstable-conta.patch | 84 ----
...-xfs-Add-compile-time-check-to-libfsimage.patch | 62 +++
...ix-parallel-build-between-flex-bison-and-.patch | 50 --
...tools-pygrub-Remove-unnecessary-hypercall.patch | 60 +++
0048-tools-pygrub-Small-refactors.patch | 65 +++
...uid-Infrastructure-for-leaves-7-1-ecx-edx.patch | 126 -----
...ools-pygrub-Open-the-output-files-earlier.patch | 105 ++++
...isable-CET-SS-on-parts-susceptible-to-fra.patch | 195 --------
...pect-credit2_runqueue-all-when-arranging-.patch | 69 ---
...image-Export-a-new-function-to-preload-al.patch | 126 +++++
0051-build-make-FILE-symbol-paths-consistent.patch | 42 --
0051-tools-pygrub-Deprivilege-pygrub.patch | 307 ++++++++++++
...upport-for-running-bootloader-in-restrict.patch | 251 ++++++++++
...MD-apply-the-patch-early-on-every-logical.patch | 154 ------
...t-bootloader-execution-in-restricted-mode.patch | 158 ++++++
...-mem_sharing-teardown-before-paging-teard.patch | 111 -----
...-asymmetry-with-AMD-DR-MASK-context-switc.patch | 104 ++++
...Work-around-Clang-IAS-macro-expansion-bug.patch | 109 -----
...ect-the-auditing-of-guest-breakpoint-addr.patch | 86 ++++
...ng-Wunicode-diagnostic-when-building-asm-.patch | 83 ----
...KG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch | 91 ----
...Fix-resource-leaks-in-xc_core_arch_map_p2.patch | 65 ---
...Fix-leak-on-realloc-failure-in-backup_pte.patch | 56 ---
...MD-late-load-the-patch-on-every-logical-t.patch | 90 ----
...account-for-log-dirty-mode-when-pre-alloc.patch | 92 ----
...nd-number-of-pinned-cache-attribute-regio.patch | 50 --
...ialize-pinned-cache-attribute-list-manipu.patch | 126 -----
...rl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch | 56 ---
...lement-VMExit-based-guest-Bus-Lock-detect.patch | 175 -------
...troduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch | 102 ----
0066-x86-vmx-implement-Notify-VM-Exit.patch | 243 ----------
...python-change-s-size-type-for-Python-3.10.patch | 72 ---
...s-xenmon-Fix-xenmon.py-for-with-python3.x.patch | 54 ---
...rl-Add-BHI-controls-to-userspace-componen.patch | 51 --
...arking-fix-build-with-gcc12-and-NR_CPUS-1.patch | 95 ----
...help-gcc13-to-avoid-it-emitting-a-warning.patch | 129 -----
0072-VT-d-constrain-IGD-check.patch | 44 --
0073-bunzip-work-around-gcc13-warning.patch | 42 --
0074-libacpi-fix-PCI-hotplug-AML.patch | 57 ---
...ithout-XT-x2APIC-needs-to-be-forced-into-.patch | 42 --
...mmu-no-igfx-if-the-IOMMU-scope-contains-f.patch | 44 --
...fix-and-improve-sh_page_has_multiple_shad.patch | 47 --
...Fix-evaluate_nospec-code-generation-under.patch | 101 ----
...x86-shadow-Fix-build-with-no-PG_log_dirty.patch | 56 ---
...-t-spuriously-crash-the-domain-when-INIT-.patch | 51 --
...6-ucode-Fix-error-paths-control_thread_fn.patch | 56 ---
| 37 --
...andle-accesses-adjacent-to-the-MSI-X-tabl.patch | 540 ---------------------
...rect-name-value-pair-parsing-for-PCI-port.patch | 59 ---
0085-CI-Drop-automation-configs.patch | 87 ----
...Switch-arm32-cross-builds-to-run-on-arm64.patch | 87 ----
...n-Remove-CentOS-7.2-containers-and-builds.patch | 145 ------
...mation-Remove-non-debug-x86_32-build-jobs.patch | 67 ---
...-llvm-8-from-the-Debian-Stretch-container.patch | 103 ----
info.txt | 6 +-
145 files changed, 6138 insertions(+), 8495 deletions(-)
diff --git a/0001-update-Xen-version-to-4.17.1-pre.patch b/0001-update-Xen-version-to-4.17.1-pre.patch
deleted file mode 100644
index 1d1bb53..0000000
--- a/0001-update-Xen-version-to-4.17.1-pre.patch
+++ /dev/null
@@ -1,136 +0,0 @@
-From 0b999fa2eadaeff840a8331b87f1f73abf3b14eb Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 20 Dec 2022 13:40:38 +0100
-Subject: [PATCH 01/89] update Xen version to 4.17.1-pre
-
----
- MAINTAINERS | 92 +++++-----------------------------------------------
- xen/Makefile | 2 +-
- 2 files changed, 10 insertions(+), 84 deletions(-)
-
-diff --git a/MAINTAINERS b/MAINTAINERS
-index 175f10f33f..ebb908cc37 100644
---- a/MAINTAINERS
-+++ b/MAINTAINERS
-@@ -54,6 +54,15 @@ list. Remember to copy the appropriate stable branch maintainer who
- will be listed in this section of the MAINTAINERS file in the
- appropriate branch.
-
-+The maintainer for this branch is:
-+
-+ Jan Beulich <jbeulich@suse.com>
-+
-+Tools backport requests should also be copied to:
-+
-+ Anthony Perard <anthony.perard@citrix.com>
-+
-+
- Unstable Subsystem Maintainers
- ==============================
-
-@@ -104,89 +113,6 @@ Descriptions of section entries:
- xen-maintainers-<version format number of this file>
-
-
-- Check-in policy
-- ===============
--
--In order for a patch to be checked in, in general, several conditions
--must be met:
--
--1. In order to get a change to a given file committed, it must have
-- the approval of at least one maintainer of that file.
--
-- A patch of course needs Acks from the maintainers of each file that
-- it changes; so a patch which changes xen/arch/x86/traps.c,
-- xen/arch/x86/mm/p2m.c, and xen/arch/x86/mm/shadow/multi.c would
-- require an Ack from each of the three sets of maintainers.
--
-- See below for rules on nested maintainership.
--
--2. It must have appropriate approval from someone other than the
-- submitter. This can be either:
--
-- a. An Acked-by from a maintainer of the code being touched (a
-- co-maintainer if available, or a more general level maintainer if
-- not available; see the secton on nested maintainership)
--
-- b. A Reviewed-by by anyone of suitable stature in the community
--
--3. Sufficient time must have been given for anyone to respond. This
-- depends in large part upon the urgency and nature of the patch.
-- For a straightforward uncontroversial patch, a day or two may be
-- sufficient; for a controversial patch, a week or two may be better.
--
--4. There must be no "open" objections.
--
--In a case where one person submits a patch and a maintainer gives an
--Ack, the Ack stands in for both the approval requirement (#1) and the
--Acked-by-non-submitter requirement (#2).
--
--In a case where a maintainer themselves submits a patch, the
--Signed-off-by meets the approval requirement (#1); so a Review
--from anyone in the community suffices for requirement #2.
--
--Before a maintainer checks in their own patch with another community
--member's R-b but no co-maintainer Ack, it is especially important to
--give their co-maintainer opportunity to give feedback, perhaps
--declaring their intention to check it in without their co-maintainers
--ack a day before doing so.
--
--Maintainers may choose to override non-maintainer objections in the
--case that consensus can't be reached.
--
--As always, no policy can cover all possible situations. In
--exceptional circumstances, committers may commit a patch in absence of
--one or more of the above requirements, if they are reasonably
--confident that the other maintainers will approve of their decision in
--retrospect.
--
-- The meaning of nesting
-- ======================
--
--Many maintainership areas are "nested": for example, there are entries
--for xen/arch/x86 as well as xen/arch/x86/mm, and even
--xen/arch/x86/mm/shadow; and there is a section at the end called "THE
--REST" which lists all committers. The meaning of nesting is that:
--
--1. Under normal circumstances, the Ack of the most specific maintainer
--is both necessary and sufficient to get a change to a given file
--committed. So a change to xen/arch/x86/mm/shadow/multi.c requires the
--the Ack of the xen/arch/x86/mm/shadow maintainer for that part of the
--patch, but would not require the Ack of the xen/arch/x86 maintainer or
--the xen/arch/x86/mm maintainer.
--
--2. In unusual circumstances, a more general maintainer's Ack can stand
--in for or even overrule a specific maintainer's Ack. Unusual
--circumstances might include:
-- - The patch is fixing a high-priority issue causing immediate pain,
-- and the more specific maintainer is not available.
-- - The more specific maintainer has not responded either to the
-- original patch, nor to "pings", within a reasonable amount of time.
-- - The more general maintainer wants to overrule the more specific
-- maintainer on some issue. (This should be exceptional.)
-- - In the case of a disagreement between maintainers, THE REST can
-- settle the matter by majority vote. (This should be very exceptional
-- indeed.)
--
-
- Maintainers List (try to look for most precise areas first)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index d7102a3b47..dcedfbc38e 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
- # All other places this is stored (eg. compile.h) should be autogenerated.
- export XEN_VERSION = 4
- export XEN_SUBVERSION = 17
--export XEN_EXTRAVERSION ?= .0$(XEN_VENDORVERSION)
-+export XEN_EXTRAVERSION ?= .1-pre$(XEN_VENDORVERSION)
- export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
- -include xen-version
-
---
-2.40.0
-
diff --git a/0001-update-Xen-version-to-4.17.3-pre.patch b/0001-update-Xen-version-to-4.17.3-pre.patch
new file mode 100644
index 0000000..1be1cd1
--- /dev/null
+++ b/0001-update-Xen-version-to-4.17.3-pre.patch
@@ -0,0 +1,25 @@
+From 2f337a04bfc2dda794ae0fc108577ec72932f83b Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 21 Aug 2023 15:52:13 +0200
+Subject: [PATCH 01/55] update Xen version to 4.17.3-pre
+
+---
+ xen/Makefile | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index fbada570b8..f6005bd536 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
+ # All other places this is stored (eg. compile.h) should be autogenerated.
+ export XEN_VERSION = 4
+ export XEN_SUBVERSION = 17
+-export XEN_EXTRAVERSION ?= .2$(XEN_VENDORVERSION)
++export XEN_EXTRAVERSION ?= .3-pre$(XEN_VENDORVERSION)
+ export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
+ -include xen-version
+
+--
+2.42.0
+
diff --git a/0002-x86-fix-build-with-old-gcc-after-CPU-policy-changes.patch b/0002-x86-fix-build-with-old-gcc-after-CPU-policy-changes.patch
new file mode 100644
index 0000000..1b62572
--- /dev/null
+++ b/0002-x86-fix-build-with-old-gcc-after-CPU-policy-changes.patch
@@ -0,0 +1,84 @@
+From 7d8897984927a51495e9a1b827aa4bce1d779b87 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 21 Aug 2023 15:53:17 +0200
+Subject: [PATCH 02/55] x86: fix build with old gcc after CPU policy changes
+
+Old gcc won't cope with initializers involving unnamed struct/union
+fields.
+
+Fixes: 441b1b2a50ea ("x86/emul: Switch x86_emulate_ctxt to cpu_policy")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 768846690d64bc730c1a1123e8de3af731bb2eb3
+master date: 2023-04-19 11:02:47 +0200
+---
+ tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 4 +++-
+ xen/arch/x86/pv/emul-priv-op.c | 4 +++-
+ xen/arch/x86/pv/ro-page-fault.c | 4 +++-
+ 3 files changed, 9 insertions(+), 3 deletions(-)
+
+diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+index 4885a68210..eeeb6931f4 100644
+--- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
++++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
+@@ -893,12 +893,14 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
+ struct x86_emulate_ctxt ctxt = {
+ .data = &state,
+ .regs = &input.regs,
+- .cpu_policy = &cp,
+ .addr_size = 8 * sizeof(void *),
+ .sp_size = 8 * sizeof(void *),
+ };
+ int rc;
+
++ /* Not part of the initializer, for old gcc to cope. */
++ ctxt.cpu_policy = &cp;
++
+ /* Reset all global state variables */
+ memset(&input, 0, sizeof(input));
+
+diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
+index 04416f1979..2c94beb10e 100644
+--- a/xen/arch/x86/pv/emul-priv-op.c
++++ b/xen/arch/x86/pv/emul-priv-op.c
+@@ -1327,12 +1327,14 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
+ struct domain *currd = curr->domain;
+ struct priv_op_ctxt ctxt = {
+ .ctxt.regs = regs,
+- .ctxt.cpu_policy = currd->arch.cpu_policy,
+ .ctxt.lma = !is_pv_32bit_domain(currd),
+ };
+ int rc;
+ unsigned int eflags, ar;
+
++ /* Not part of the initializer, for old gcc to cope. */
++ ctxt.ctxt.cpu_policy = currd->arch.cpu_policy;
++
+ if ( !pv_emul_read_descriptor(regs->cs, curr, &ctxt.cs.base,
+ &ctxt.cs.limit, &ar, 1) ||
+ !(ar & _SEGMENT_S) ||
+diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
+index 0d02c7d2ab..f23ad5d184 100644
+--- a/xen/arch/x86/pv/ro-page-fault.c
++++ b/xen/arch/x86/pv/ro-page-fault.c
+@@ -356,7 +356,6 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
+ unsigned int addr_size = is_pv_32bit_domain(currd) ? 32 : BITS_PER_LONG;
+ struct x86_emulate_ctxt ctxt = {
+ .regs = regs,
+- .cpu_policy = currd->arch.cpu_policy,
+ .addr_size = addr_size,
+ .sp_size = addr_size,
+ .lma = addr_size > 32,
+@@ -364,6 +363,9 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
+ int rc;
+ bool mmio_ro;
+
++ /* Not part of the initializer, for old gcc to cope. */
++ ctxt.cpu_policy = currd->arch.cpu_policy;
++
+ /* Attempt to read the PTE that maps the VA being accessed. */
+ pte = guest_get_eff_kern_l1e(addr);
+
+--
+2.42.0
+
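The hunks above all follow the same pattern: the field that sits behind an unnamed union member is taken out of the designated initializer and assigned right after it. A minimal standalone C illustration of the construct old gcc rejects -- the types and names here are hypothetical, not taken from the Xen tree:
    struct ctxt {
        int flags;
        union {                       /* unnamed union member */
            const void *cpu_policy;
        };
    };
    void init_ctxt(const void *cp)
    {
        struct ctxt c = {
            .flags = 1,
            /* .cpu_policy = cp,  -- initializing a field reached through
               the unnamed union is what old gcc cannot cope with */
        };
        c.cpu_policy = cp;            /* workaround: plain assignment after */
        (void)c;
    }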
diff --git a/0002-x86-irq-do-not-release-irq-until-all-cleanup-is-done.patch b/0002-x86-irq-do-not-release-irq-until-all-cleanup-is-done.patch
deleted file mode 100644
index 1c7a13d..0000000
--- a/0002-x86-irq-do-not-release-irq-until-all-cleanup-is-done.patch
+++ /dev/null
@@ -1,90 +0,0 @@
-From 9cbc04a95f8a7f7cc27901211cbe19a42850c4ed Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 20 Dec 2022 13:43:04 +0100
-Subject: [PATCH 02/89] x86/irq: do not release irq until all cleanup is done
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Current code in _clear_irq_vector() will mark the irq as unused before
-doing the cleanup required when move_in_progress is true.
-
-This can lead to races in create_irq() if the function picks an irq
-desc that's been marked as unused but has move_in_progress set, as the
-call to assign_irq_vector() in that function can then fail with
--EAGAIN.
-
-Prevent that by only marking irq descs as unused when all the cleanup
-has been done. While there also use write_atomic() when setting
-IRQ_UNUSED in _clear_irq_vector() and add a barrier in order to
-prevent the setting of IRQ_UNUSED getting reordered by the compiler.
-
-The check for move_in_progress cannot be removed from
-_assign_irq_vector(), as other users (io_apic_set_pci_routing() and
-ioapic_guest_write()) can still pass active irq descs to
-assign_irq_vector().
-
-Note the trace point is not moved and is now set before the irq is
-marked as unused. This is done so that the CPU mask provided in the
-trace point is the one belonging to the current vector, not the old
-one.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: e267d11969a40f0aec33dbf966f5a6490b205f43
-master date: 2022-12-02 10:32:21 +0100
----
- xen/arch/x86/irq.c | 31 ++++++++++++++++---------------
- 1 file changed, 16 insertions(+), 15 deletions(-)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index cd0c8a30a8..20150b1c7f 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -220,27 +220,28 @@ static void _clear_irq_vector(struct irq_desc *desc)
- clear_bit(vector, desc->arch.used_vectors);
- }
-
-- desc->arch.used = IRQ_UNUSED;
--
- trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
-
-- if ( likely(!desc->arch.move_in_progress) )
-- return;
-+ if ( unlikely(desc->arch.move_in_progress) )
-+ {
-+ /* If we were in motion, also clear desc->arch.old_vector */
-+ old_vector = desc->arch.old_vector;
-+ cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
-
-- /* If we were in motion, also clear desc->arch.old_vector */
-- old_vector = desc->arch.old_vector;
-- cpumask_and(tmp_mask, desc->arch.old_cpu_mask, &cpu_online_map);
-+ for_each_cpu(cpu, tmp_mask)
-+ {
-+ ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
-+ TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
-+ per_cpu(vector_irq, cpu)[old_vector] = ~irq;
-+ }
-
-- for_each_cpu(cpu, tmp_mask)
-- {
-- ASSERT(per_cpu(vector_irq, cpu)[old_vector] == irq);
-- TRACE_3D(TRC_HW_IRQ_MOVE_FINISH, irq, old_vector, cpu);
-- per_cpu(vector_irq, cpu)[old_vector] = ~irq;
-- }
-+ release_old_vec(desc);
-
-- release_old_vec(desc);
-+ desc->arch.move_in_progress = 0;
-+ }
-
-- desc->arch.move_in_progress = 0;
-+ smp_wmb();
-+ write_atomic(&desc->arch.used, IRQ_UNUSED);
- }
-
- void __init clear_irq_vector(int irq)
---
-2.40.0
-
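The ordering argument in the removed patch above (finish every bit of cleanup first, only then publish IRQ_UNUSED behind a write barrier) can be sketched with standard C11 atomics; this is an analogy under that assumption, not the Xen primitives themselves:
    #include <stdatomic.h>
    #define IRQ_UNUSED 0
    struct irq_slot {
        _Atomic int used;      /* becomes IRQ_UNUSED once the slot may be reused */
        int old_vector;        /* resource torn down during cleanup */
    };
    static void clear_slot(struct irq_slot *s)
    {
        s->old_vector = -1;    /* 1. complete all cleanup first */
        /* 2. only then publish "unused"; release ordering keeps this store
         *    from being observed before the cleanup above, mirroring the
         *    smp_wmb() + write_atomic() pair in _clear_irq_vector(). */
        atomic_store_explicit(&s->used, IRQ_UNUSED, memory_order_release);
    }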
diff --git a/0003-libxl-Use-XEN_LIB_DIR-to-store-bootloader-from-pygru.patch b/0003-libxl-Use-XEN_LIB_DIR-to-store-bootloader-from-pygru.patch
new file mode 100644
index 0000000..a395d7a
--- /dev/null
+++ b/0003-libxl-Use-XEN_LIB_DIR-to-store-bootloader-from-pygru.patch
@@ -0,0 +1,45 @@
+From 8d84be5b557b27e9cc53e48285aebad28a48468c Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Mon, 21 Aug 2023 15:53:47 +0200
+Subject: [PATCH 03/55] libxl: Use XEN_LIB_DIR to store bootloader from pygrub
+
+In osstest, the jobs using pygrub on arm64 on the branch linux-linus
+started to fails with:
+ [Errno 28] No space left on device
+ Error writing temporary copy of ramdisk
+
+This is because /var/run is small when dom0 has only 512MB to work
+with, /var/run is only 40MB. The size of both kernel and ramdisk on
+this jobs is now about 42MB, so not enough space in /var/run.
+
+So, to avoid writing a big binary in ramfs, we will use /var/lib
+instead, like we already do when saving the device model state on
+migration.
+
+Reported-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
+master commit: ad89640ad766d3cb6c92fc8b6406ca6bbab44136
+master date: 2023-08-08 09:45:20 +0200
+---
+ tools/libs/light/libxl_bootloader.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/tools/libs/light/libxl_bootloader.c b/tools/libs/light/libxl_bootloader.c
+index 1bc6e51827..108329b4a5 100644
+--- a/tools/libs/light/libxl_bootloader.c
++++ b/tools/libs/light/libxl_bootloader.c
+@@ -245,8 +245,8 @@ static void bootloader_cleanup(libxl__egc *egc, libxl__bootloader_state *bl)
+ static void bootloader_setpaths(libxl__gc *gc, libxl__bootloader_state *bl)
+ {
+ uint32_t domid = bl->domid;
+- bl->outputdir = GCSPRINTF(XEN_RUN_DIR "/bootloader.%"PRIu32".d", domid);
+- bl->outputpath = GCSPRINTF(XEN_RUN_DIR "/bootloader.%"PRIu32".out", domid);
++ bl->outputdir = GCSPRINTF(XEN_LIB_DIR "/bootloader.%"PRIu32".d", domid);
++ bl->outputpath = GCSPRINTF(XEN_LIB_DIR "/bootloader.%"PRIu32".out", domid);
+ }
+
+ /* Callbacks */
+--
+2.42.0
+
diff --git a/0003-x86-pvh-do-not-forward-MADT-Local-APIC-NMI-structure.patch b/0003-x86-pvh-do-not-forward-MADT-Local-APIC-NMI-structure.patch
deleted file mode 100644
index 47d6997..0000000
--- a/0003-x86-pvh-do-not-forward-MADT-Local-APIC-NMI-structure.patch
+++ /dev/null
@@ -1,103 +0,0 @@
-From b7b34bd66ac77326bb49b10130013b4a9f83e4a2 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 20 Dec 2022 13:43:37 +0100
-Subject: [PATCH 03/89] x86/pvh: do not forward MADT Local APIC NMI structures
- to dom0
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Currently Xen will passthrough any Local APIC NMI Structure found in
-the native ACPI MADT table to a PVH dom0. This is wrong because PVH
-doesn't have access to the physical local APIC, and instead gets an
-emulated local APIC by Xen, that doesn't have the LINT0 or LINT1
-pins wired to anything. Furthermore the ACPI Processor UIDs used in
-the APIC NMI Structures are likely to not match the ones generated by
-Xen for the Local x2APIC Structures, creating confusion to dom0.
-
-Fix this by removing the logic to passthrough the Local APIC NMI
-Structure for PVH dom0.
-
-Fixes: 1d74282c45 ('x86: setup PVHv2 Dom0 ACPI tables')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: b39e6385250ccef9509af0eab9003ad5c1478842
-master date: 2022-12-02 10:33:40 +0100
----
- xen/arch/x86/hvm/dom0_build.c | 34 +---------------------------------
- 1 file changed, 1 insertion(+), 33 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
-index 1864d048a1..3ac6b7b423 100644
---- a/xen/arch/x86/hvm/dom0_build.c
-+++ b/xen/arch/x86/hvm/dom0_build.c
-@@ -58,9 +58,6 @@
- static unsigned int __initdata acpi_intr_overrides;
- static struct acpi_madt_interrupt_override __initdata *intsrcovr;
-
--static unsigned int __initdata acpi_nmi_sources;
--static struct acpi_madt_nmi_source __initdata *nmisrc;
--
- static unsigned int __initdata order_stats[MAX_ORDER + 1];
-
- static void __init print_order_stats(const struct domain *d)
-@@ -763,25 +760,6 @@ static int __init cf_check acpi_set_intr_ovr(
- return 0;
- }
-
--static int __init cf_check acpi_count_nmi_src(
-- struct acpi_subtable_header *header, const unsigned long end)
--{
-- acpi_nmi_sources++;
-- return 0;
--}
--
--static int __init cf_check acpi_set_nmi_src(
-- struct acpi_subtable_header *header, const unsigned long end)
--{
-- const struct acpi_madt_nmi_source *src =
-- container_of(header, struct acpi_madt_nmi_source, header);
--
-- *nmisrc = *src;
-- nmisrc++;
--
-- return 0;
--}
--
- static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
- {
- struct acpi_table_madt *madt;
-@@ -797,16 +775,11 @@ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
- acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
- acpi_count_intr_ovr, UINT_MAX);
-
-- /* Count number of NMI sources in the MADT. */
-- acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_count_nmi_src,
-- UINT_MAX);
--
- max_vcpus = dom0_max_vcpus();
- /* Calculate the size of the crafted MADT. */
- size = sizeof(*madt);
- size += sizeof(*io_apic) * nr_ioapics;
- size += sizeof(*intsrcovr) * acpi_intr_overrides;
-- size += sizeof(*nmisrc) * acpi_nmi_sources;
- size += sizeof(*x2apic) * max_vcpus;
-
- madt = xzalloc_bytes(size);
-@@ -862,12 +835,7 @@ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
- acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_set_intr_ovr,
- acpi_intr_overrides);
-
-- /* Setup NMI sources. */
-- nmisrc = (void *)intsrcovr;
-- acpi_table_parse_madt(ACPI_MADT_TYPE_NMI_SOURCE, acpi_set_nmi_src,
-- acpi_nmi_sources);
--
-- ASSERT(((void *)nmisrc - (void *)madt) == size);
-+ ASSERT(((void *)intsrcovr - (void *)madt) == size);
- madt->header.length = size;
- /*
- * Calling acpi_tb_checksum here is a layering violation, but
---
-2.40.0
-
diff --git a/0004-build-define-ARCH-and-SRCARCH-later.patch b/0004-build-define-ARCH-and-SRCARCH-later.patch
new file mode 100644
index 0000000..aebcbb7
--- /dev/null
+++ b/0004-build-define-ARCH-and-SRCARCH-later.patch
@@ -0,0 +1,67 @@
+From 1c3927f8f6743538a35aa45a91a2d4adbde9f277 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Wed, 5 Jul 2023 08:25:03 +0200
+Subject: [PATCH 04/55] build: define ARCH and SRCARCH later
+
+Defining ARCH and SRCARCH later in xen/Makefile allows to switch to
+immediate evaluation variable type.
+
+ARCH and SRCARCH depend on value defined in Config.mk and aren't used
+for e.g. TARGET_SUBARCH or TARGET_ARCH, and not before they're needed in
+a sub-make or a rule.
+
+This will help reduce the number of times the shell rune is been
+run.
+
+With GNU make 4.4, the number of execution of the command present in
+these $(shell ) increased greatly. This is probably because as of make
+4.4, exported variable are also added to the environment of $(shell )
+construct.
+
+Also, `make -d` shows a lot of these:
+ Makefile:39: not recursively expanding SRCARCH to export to shell function
+ Makefile:38: not recursively expanding ARCH to export to shell function
+
+Reported-by: Jason Andryuk <jandryuk@gmail.com>
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Tested-by: Jason Andryuk <jandryuk@gmail.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 58e0a3f3b2c430f8640ef9df67ac857b0008ebc8)
+---
+ xen/Makefile | 13 +++++++------
+ 1 file changed, 7 insertions(+), 6 deletions(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index f6005bd536..7ecfa6e8e9 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -35,12 +35,6 @@ MAKEFLAGS += -rR
+
+ EFI_MOUNTPOINT ?= $(BOOT_DIR)/efi
+
+-ARCH=$(XEN_TARGET_ARCH)
+-SRCARCH=$(shell echo $(ARCH) | \
+- sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
+- -e s'/riscv.*/riscv/g')
+-export ARCH SRCARCH
+-
+ # Allow someone to change their config file
+ export KCONFIG_CONFIG ?= .config
+
+@@ -241,6 +235,13 @@ include scripts/Kbuild.include
+ include $(XEN_ROOT)/Config.mk
+
+ # Set ARCH/SUBARCH appropriately.
++
++ARCH := $(XEN_TARGET_ARCH)
++SRCARCH := $(shell echo $(ARCH) | \
++ sed -e 's/x86.*/x86/' -e 's/arm\(32\|64\)/arm/g' \
++ -e 's/riscv.*/riscv/g')
++export ARCH SRCARCH
++
+ export TARGET_SUBARCH := $(XEN_TARGET_ARCH)
+ export TARGET_ARCH := $(shell echo $(XEN_TARGET_ARCH) | \
+ sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
+--
+2.42.0
+
diff --git a/0004-x86-HVM-don-t-mark-external-IRQs-as-pending-when-vLA.patch b/0004-x86-HVM-don-t-mark-external-IRQs-as-pending-when-vLA.patch
deleted file mode 100644
index 01dcba8..0000000
--- a/0004-x86-HVM-don-t-mark-external-IRQs-as-pending-when-vLA.patch
+++ /dev/null
@@ -1,71 +0,0 @@
-From 54bb56e12868100c5ce06e33b4f57b6b2b8f37b9 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 20 Dec 2022 13:44:07 +0100
-Subject: [PATCH 04/89] x86/HVM: don't mark external IRQs as pending when
- vLAPIC is disabled
-
-In software-disabled state an LAPIC does not accept any interrupt
-requests and hence no IRR bit would newly become set while in this
-state. As a result it is also wrong for us to mark IO-APIC or MSI
-originating vectors as having a pending request when the vLAPIC is in
-this state. Such interrupts are simply lost.
-
-Introduce (IO-APIC) or re-use (MSI) a local variable to help
-readability.
-
-Fixes: 4fe21ad3712e ("This patch add virtual IOAPIC support for VMX guest")
-Fixes: 85715f4bc7c9 ("MSI 5/6: add MSI support to passthrough HVM domain")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: f1d7aac1e3c3cd164e17d41791a575a5c3e87121
-master date: 2022-12-02 10:35:01 +0100
----
- xen/arch/x86/hvm/vioapic.c | 9 +++++++--
- xen/arch/x86/hvm/vmsi.c | 10 ++++++----
- 2 files changed, 13 insertions(+), 6 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
-index cb7f440160..41e3c4d5e4 100644
---- a/xen/arch/x86/hvm/vioapic.c
-+++ b/xen/arch/x86/hvm/vioapic.c
-@@ -460,9 +460,14 @@ static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin)
-
- case dest_Fixed:
- for_each_vcpu ( d, v )
-- if ( vlapic_match_dest(vcpu_vlapic(v), NULL, 0, dest, dest_mode) )
-- ioapic_inj_irq(vioapic, vcpu_vlapic(v), vector, trig_mode,
-+ {
-+ struct vlapic *vlapic = vcpu_vlapic(v);
-+
-+ if ( vlapic_enabled(vlapic) &&
-+ vlapic_match_dest(vlapic, NULL, 0, dest, dest_mode) )
-+ ioapic_inj_irq(vioapic, vlapic, vector, trig_mode,
- delivery_mode);
-+ }
- break;
-
- case dest_NMI:
-diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
-index 75f92885dc..3cd4923060 100644
---- a/xen/arch/x86/hvm/vmsi.c
-+++ b/xen/arch/x86/hvm/vmsi.c
-@@ -87,10 +87,12 @@ int vmsi_deliver(
-
- case dest_Fixed:
- for_each_vcpu ( d, v )
-- if ( vlapic_match_dest(vcpu_vlapic(v), NULL,
-- 0, dest, dest_mode) )
-- vmsi_inj_irq(vcpu_vlapic(v), vector,
-- trig_mode, delivery_mode);
-+ {
-+ target = vcpu_vlapic(v);
-+ if ( vlapic_enabled(target) &&
-+ vlapic_match_dest(target, NULL, 0, dest, dest_mode) )
-+ vmsi_inj_irq(target, vector, trig_mode, delivery_mode);
-+ }
- break;
-
- default:
---
-2.40.0
-
diff --git a/0005-build-remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch b/0005-build-remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch
new file mode 100644
index 0000000..4f31614
--- /dev/null
+++ b/0005-build-remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch
@@ -0,0 +1,50 @@
+From 56076ef445073458c39c481f9b70c3b4ff848839 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Wed, 5 Jul 2023 08:27:51 +0200
+Subject: [PATCH 05/55] build: remove TARGET_SUBARCH, a duplicate of ARCH
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit a6ab7dd061338c33faef629cbe52ed1608571d84)
+---
+ xen/Makefile | 3 +--
+ xen/build.mk | 2 +-
+ 2 files changed, 2 insertions(+), 3 deletions(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index 7ecfa6e8e9..6e89bcf348 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -234,7 +234,7 @@ include scripts/Kbuild.include
+ # we need XEN_TARGET_ARCH to generate the proper config
+ include $(XEN_ROOT)/Config.mk
+
+-# Set ARCH/SUBARCH appropriately.
++# Set ARCH/SRCARCH appropriately.
+
+ ARCH := $(XEN_TARGET_ARCH)
+ SRCARCH := $(shell echo $(ARCH) | \
+@@ -242,7 +242,6 @@ SRCARCH := $(shell echo $(ARCH) | \
+ -e 's/riscv.*/riscv/g')
+ export ARCH SRCARCH
+
+-export TARGET_SUBARCH := $(XEN_TARGET_ARCH)
+ export TARGET_ARCH := $(shell echo $(XEN_TARGET_ARCH) | \
+ sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
+ -e s'/riscv.*/riscv/g')
+diff --git a/xen/build.mk b/xen/build.mk
+index 758590c68e..d049d3a53a 100644
+--- a/xen/build.mk
++++ b/xen/build.mk
+@@ -41,7 +41,7 @@ include/xen/compile.h: include/xen/compile.h.in .banner FORCE
+ targets += include/xen/compile.h
+
+ -include $(wildcard .asm-offsets.s.d)
+-asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
++asm-offsets.s: arch/$(TARGET_ARCH)/$(ARCH)/asm-offsets.c
+ $(CC) $(call cpp_flags,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
+ $(call move-if-changed,$@.new,$@)
+
+--
+2.42.0
+
diff --git a/0005-x86-Viridian-don-t-mark-IRQ-vectors-as-pending-when-.patch b/0005-x86-Viridian-don-t-mark-IRQ-vectors-as-pending-when-.patch
deleted file mode 100644
index 3086285..0000000
--- a/0005-x86-Viridian-don-t-mark-IRQ-vectors-as-pending-when-.patch
+++ /dev/null
@@ -1,60 +0,0 @@
-From 5810edc049cd5828c2628a377ca8443610e54f82 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 20 Dec 2022 13:44:38 +0100
-Subject: [PATCH 05/89] x86/Viridian: don't mark IRQ vectors as pending when
- vLAPIC is disabled
-
-In software-disabled state an LAPIC does not accept any interrupt
-requests and hence no IRR bit would newly become set while in this
-state. As a result it is also wrong for us to mark Viridian IPI or timer
-vectors as having a pending request when the vLAPIC is in this state.
-Such interrupts are simply lost.
-
-Introduce a local variable in send_ipi() to help readability.
-
-Fixes: fda96b7382ea ("viridian: add implementation of the HvSendSyntheticClusterIpi hypercall")
-Fixes: 26fba3c85571 ("viridian: add implementation of synthetic timers")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Paul Durrant <paul@xen.org>
-master commit: 831419f82913417dee4e5b0f80769c5db590540b
-master date: 2022-12-02 10:35:32 +0100
----
- xen/arch/x86/hvm/viridian/synic.c | 2 +-
- xen/arch/x86/hvm/viridian/viridian.c | 7 ++++++-
- 2 files changed, 7 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/viridian/synic.c b/xen/arch/x86/hvm/viridian/synic.c
-index e18538c60a..856bb898b8 100644
---- a/xen/arch/x86/hvm/viridian/synic.c
-+++ b/xen/arch/x86/hvm/viridian/synic.c
-@@ -359,7 +359,7 @@ bool viridian_synic_deliver_timer_msg(struct vcpu *v, unsigned int sintx,
- BUILD_BUG_ON(sizeof(payload) > sizeof(msg->u.payload));
- memcpy(msg->u.payload, &payload, sizeof(payload));
-
-- if ( !vs->masked )
-+ if ( !vs->masked && vlapic_enabled(vcpu_vlapic(v)) )
- vlapic_set_irq(vcpu_vlapic(v), vs->vector, 0);
-
- return true;
-diff --git a/xen/arch/x86/hvm/viridian/viridian.c b/xen/arch/x86/hvm/viridian/viridian.c
-index 25dca93e8b..2937ddd3a8 100644
---- a/xen/arch/x86/hvm/viridian/viridian.c
-+++ b/xen/arch/x86/hvm/viridian/viridian.c
-@@ -811,7 +811,12 @@ static void send_ipi(struct hypercall_vpmask *vpmask, uint8_t vector)
- cpu_raise_softirq_batch_begin();
-
- for_each_vp ( vpmask, vp )
-- vlapic_set_irq(vcpu_vlapic(currd->vcpu[vp]), vector, 0);
-+ {
-+ struct vlapic *vlapic = vcpu_vlapic(currd->vcpu[vp]);
-+
-+ if ( vlapic_enabled(vlapic) )
-+ vlapic_set_irq(vlapic, vector, 0);
-+ }
-
- if ( nr > 1 )
- cpu_raise_softirq_batch_finish();
---
-2.40.0
-
diff --git a/0006-build-remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch b/0006-build-remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch
new file mode 100644
index 0000000..9eef37a
--- /dev/null
+++ b/0006-build-remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch
@@ -0,0 +1,123 @@
+From 36e84ea02e1e8dce8f3a4e9351ab1c72dec3c11e Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Wed, 5 Jul 2023 08:29:49 +0200
+Subject: [PATCH 06/55] build: remove TARGET_ARCH, a duplicate of SRCARCH
+
+The same command is used to generate the value of both $(TARGET_ARCH)
+and $(SRCARCH), as $(ARCH) is an alias for $(XEN_TARGET_ARCH).
+
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit ac27b3beb9b7b423d5563768de890c7594c21b4e)
+---
+ xen/Makefile | 20 ++++++++------------
+ xen/Rules.mk | 2 +-
+ xen/build.mk | 6 +++---
+ 3 files changed, 12 insertions(+), 16 deletions(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index 6e89bcf348..1a3b9a081f 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -242,10 +242,6 @@ SRCARCH := $(shell echo $(ARCH) | \
+ -e 's/riscv.*/riscv/g')
+ export ARCH SRCARCH
+
+-export TARGET_ARCH := $(shell echo $(XEN_TARGET_ARCH) | \
+- sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
+- -e s'/riscv.*/riscv/g')
+-
+ export CONFIG_SHELL := $(SHELL)
+ export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE
+ export YACC = $(if $(BISON),$(BISON),bison)
+@@ -262,7 +258,7 @@ export XEN_TREEWIDE_CFLAGS := $(CFLAGS)
+ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
+ CLANG_FLAGS :=
+
+-ifeq ($(TARGET_ARCH),x86)
++ifeq ($(SRCARCH),x86)
+ # The tests to select whether the integrated assembler is usable need to happen
+ # before testing any assembler features, or else the result of the tests would
+ # be stale if the integrated assembler is not used.
+@@ -430,22 +426,22 @@ endif
+
+ ifdef building_out_of_srctree
+ CFLAGS += -I$(objtree)/include
+- CFLAGS += -I$(objtree)/arch/$(TARGET_ARCH)/include
++ CFLAGS += -I$(objtree)/arch/$(SRCARCH)/include
+ endif
+ CFLAGS += -I$(srctree)/include
+-CFLAGS += -I$(srctree)/arch/$(TARGET_ARCH)/include
++CFLAGS += -I$(srctree)/arch/$(SRCARCH)/include
+
+ # Note that link order matters!
+ ALL_OBJS-y := common/built_in.o
+ ALL_OBJS-y += drivers/built_in.o
+ ALL_OBJS-y += lib/built_in.o
+ ALL_OBJS-y += xsm/built_in.o
+-ALL_OBJS-y += arch/$(TARGET_ARCH)/built_in.o
++ALL_OBJS-y += arch/$(SRCARCH)/built_in.o
+ ALL_OBJS-$(CONFIG_CRYPTO) += crypto/built_in.o
+
+ ALL_LIBS-y := lib/lib.a
+
+-include $(srctree)/arch/$(TARGET_ARCH)/arch.mk
++include $(srctree)/arch/$(SRCARCH)/arch.mk
+
+ # define new variables to avoid the ones defined in Config.mk
+ export XEN_CFLAGS := $(CFLAGS)
+@@ -587,11 +583,11 @@ $(TARGET): outputmakefile FORCE
+ $(Q)$(MAKE) $(build)=tools
+ $(Q)$(MAKE) $(build)=. include/xen/compile.h
+ $(Q)$(MAKE) $(build)=include all
+- $(Q)$(MAKE) $(build)=arch/$(TARGET_ARCH) include
+- $(Q)$(MAKE) $(build)=. arch/$(TARGET_ARCH)/include/asm/asm-offsets.h
++ $(Q)$(MAKE) $(build)=arch/$(SRCARCH) include
++ $(Q)$(MAKE) $(build)=. arch/$(SRCARCH)/include/asm/asm-offsets.h
+ $(Q)$(MAKE) $(build)=. MKRELOC=$(MKRELOC) 'ALL_OBJS=$(ALL_OBJS-y)' 'ALL_LIBS=$(ALL_LIBS-y)' $@
+
+-SUBDIRS = xsm arch/$(TARGET_ARCH) common drivers lib test
++SUBDIRS = xsm arch/$(SRCARCH) common drivers lib test
+ define all_sources
+ ( find include -type f -name '*.h' -print; \
+ find $(SUBDIRS) -type f -name '*.[chS]' -print )
+diff --git a/xen/Rules.mk b/xen/Rules.mk
+index 59072ae8df..8af3dd7277 100644
+--- a/xen/Rules.mk
++++ b/xen/Rules.mk
+@@ -180,7 +180,7 @@ cpp_flags = $(filter-out -Wa$(comma)% -flto,$(1))
+ c_flags = -MMD -MP -MF $(depfile) $(XEN_CFLAGS)
+ a_flags = -MMD -MP -MF $(depfile) $(XEN_AFLAGS)
+
+-include $(srctree)/arch/$(TARGET_ARCH)/Rules.mk
++include $(srctree)/arch/$(SRCARCH)/Rules.mk
+
+ c_flags += $(_c_flags)
+ a_flags += $(_c_flags)
+diff --git a/xen/build.mk b/xen/build.mk
+index d049d3a53a..9ecb104f1e 100644
+--- a/xen/build.mk
++++ b/xen/build.mk
+@@ -41,11 +41,11 @@ include/xen/compile.h: include/xen/compile.h.in .banner FORCE
+ targets += include/xen/compile.h
+
+ -include $(wildcard .asm-offsets.s.d)
+-asm-offsets.s: arch/$(TARGET_ARCH)/$(ARCH)/asm-offsets.c
++asm-offsets.s: arch/$(SRCARCH)/$(ARCH)/asm-offsets.c
+ $(CC) $(call cpp_flags,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
+ $(call move-if-changed,$@.new,$@)
+
+-arch/$(TARGET_ARCH)/include/asm/asm-offsets.h: asm-offsets.s
++arch/$(SRCARCH)/include/asm/asm-offsets.h: asm-offsets.s
+ @(set -e; \
+ echo "/*"; \
+ echo " * DO NOT MODIFY."; \
+@@ -87,4 +87,4 @@ endif
+ targets += prelink.o
+
+ $(TARGET): prelink.o FORCE
+- $(Q)$(MAKE) $(build)=arch/$(TARGET_ARCH) $@
++ $(Q)$(MAKE) $(build)=arch/$(SRCARCH) $@
+--
+2.42.0
+
diff --git a/0006-x86-HVM-don-t-mark-evtchn-upcall-vector-as-pending-w.patch b/0006-x86-HVM-don-t-mark-evtchn-upcall-vector-as-pending-w.patch
deleted file mode 100644
index 2577f20..0000000
--- a/0006-x86-HVM-don-t-mark-evtchn-upcall-vector-as-pending-w.patch
+++ /dev/null
@@ -1,70 +0,0 @@
-From 26f39b3d705b667aa21f368c252abffb0b4d3e5d Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 20 Dec 2022 13:45:07 +0100
-Subject: [PATCH 06/89] x86/HVM: don't mark evtchn upcall vector as pending
- when vLAPIC is disabled
-
-Linux'es relatively new use of HVMOP_set_evtchn_upcall_vector has
-exposed a problem with the marking of the respective vector as
-pending: For quite some time Linux has been checking whether any stale
-ISR or IRR bits would still be set while preparing the LAPIC for use.
-This check is now triggering on the upcall vector, as the registration,
-at least for APs, happens before the LAPIC is actually enabled.
-
-In software-disabled state an LAPIC would not accept any interrupt
-requests and hence no IRR bit would newly become set while in this
-state. As a result it is also wrong for us to mark the upcall vector as
-having a pending request when the vLAPIC is in this state.
-
-To compensate for the "enabled" check added to the assertion logic, add
-logic to (conditionally) mark the upcall vector as having a request
-pending at the time the LAPIC is being software-enabled by the guest.
-Note however that, like for the pt_may_unmask_irq() we already have
-there, long term we may need to find a different solution. This will be
-especially relevant in case yet better LAPIC acceleration would
-eliminate notifications of guest writes to this and other registers.
-
-Fixes: 7b5b8ca7dffd ("x86/upcall: inject a spurious event after setting upcall vector")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: f5d0279839b58cb622f0995dbf9cff056f03082e
-master date: 2022-12-06 13:51:49 +0100
----
- xen/arch/x86/hvm/irq.c | 5 +++--
- xen/arch/x86/hvm/vlapic.c | 3 +++
- 2 files changed, 6 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
-index 858ab5b248..d93ffe4546 100644
---- a/xen/arch/x86/hvm/irq.c
-+++ b/xen/arch/x86/hvm/irq.c
-@@ -321,9 +321,10 @@ void hvm_assert_evtchn_irq(struct vcpu *v)
-
- if ( v->arch.hvm.evtchn_upcall_vector != 0 )
- {
-- uint8_t vector = v->arch.hvm.evtchn_upcall_vector;
-+ struct vlapic *vlapic = vcpu_vlapic(v);
-
-- vlapic_set_irq(vcpu_vlapic(v), vector, 0);
-+ if ( vlapic_enabled(vlapic) )
-+ vlapic_set_irq(vlapic, v->arch.hvm.evtchn_upcall_vector, 0);
- }
- else if ( is_hvm_pv_evtchn_domain(v->domain) )
- vcpu_kick(v);
-diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
-index 257d3b6851..eb32f12e2d 100644
---- a/xen/arch/x86/hvm/vlapic.c
-+++ b/xen/arch/x86/hvm/vlapic.c
-@@ -829,6 +829,9 @@ void vlapic_reg_write(struct vcpu *v, unsigned int reg, uint32_t val)
- {
- vlapic->hw.disabled &= ~VLAPIC_SW_DISABLED;
- pt_may_unmask_irq(vlapic_domain(vlapic), &vlapic->pt);
-+ if ( v->arch.hvm.evtchn_upcall_vector &&
-+ vcpu_info(v, evtchn_upcall_pending) )
-+ vlapic_set_irq(vlapic, v->arch.hvm.evtchn_upcall_vector, 0);
- }
- break;
-
---
-2.40.0
-
diff --git a/0007-build-evaluate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch b/0007-build-evaluate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch
new file mode 100644
index 0000000..81e5ca4
--- /dev/null
+++ b/0007-build-evaluate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch
@@ -0,0 +1,58 @@
+From a1f68fb56710c507f9c1ec8e8d784f5b1e4088f1 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Mon, 31 Jul 2023 15:02:18 +0200
+Subject: [PATCH 07/55] build: evaluate XEN_BUILD_* and XEN_DOMAIN immediately
+
+With GNU make 4.4, the number of executions of the commands present in
+these $(shell ) calls increased greatly. This is probably because, as of
+make 4.4, exported variables are also added to the environment of $(shell )
+constructs.
+
+Also, `make -d` shows a lot of these:
+ Makefile:15: not recursively expanding XEN_BUILD_DATE to export to shell function
+ Makefile:16: not recursively expanding XEN_BUILD_TIME to export to shell function
+ Makefile:17: not recursively expanding XEN_BUILD_HOST to export to shell function
+ Makefile:14: not recursively expanding XEN_DOMAIN to export to shell function
+
+So, to avoid having these commands run more often than necessary, we
+replace ?= with an equivalent construct that uses immediate expansion.
+
+Reported-by: Jason Andryuk <jandryuk@gmail.com>
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Tested-by: Jason Andryuk <jandryuk@gmail.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 0c594c1b57ee2ecec5f70826c53a2cf02a9c2acb)
+---
+ xen/Makefile | 16 ++++++++++++----
+ 1 file changed, 12 insertions(+), 4 deletions(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index 1a3b9a081f..7bb9de7bdc 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -11,10 +11,18 @@ export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
+ -include xen-version
+
+ export XEN_WHOAMI ?= $(USER)
+-export XEN_DOMAIN ?= $(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))
+-export XEN_BUILD_DATE ?= $(shell LC_ALL=C date)
+-export XEN_BUILD_TIME ?= $(shell LC_ALL=C date +%T)
+-export XEN_BUILD_HOST ?= $(shell hostname)
++ifeq ($(origin XEN_DOMAIN), undefined)
++export XEN_DOMAIN := $(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))
++endif
++ifeq ($(origin XEN_BUILD_DATE), undefined)
++export XEN_BUILD_DATE := $(shell LC_ALL=C date)
++endif
++ifeq ($(origin XEN_BUILD_TIME), undefined)
++export XEN_BUILD_TIME := $(shell LC_ALL=C date +%T)
++endif
++ifeq ($(origin XEN_BUILD_HOST), undefined)
++export XEN_BUILD_HOST := $(shell hostname)
++endif
+
+ # Best effort attempt to find a python interpreter, defaulting to Python 3 if
+ # available. Fall back to just `python` if `which` is nowhere to be found.
+--
+2.42.0
+
diff --git a/0007-ioreq_broadcast-accept-partial-broadcast-success.patch b/0007-ioreq_broadcast-accept-partial-broadcast-success.patch
deleted file mode 100644
index 654990b..0000000
--- a/0007-ioreq_broadcast-accept-partial-broadcast-success.patch
+++ /dev/null
@@ -1,34 +0,0 @@
-From c3e37c60fbf8f8cd71db0f0846c9c7aeadf02963 Mon Sep 17 00:00:00 2001
-From: Per Bilse <per.bilse@citrix.com>
-Date: Tue, 20 Dec 2022 13:45:38 +0100
-Subject: [PATCH 07/89] ioreq_broadcast(): accept partial broadcast success
-
-Avoid incorrectly triggering an error when a broadcast buffered ioreq
-is not handled by all registered clients, as long as the failure is
-strictly because the client doesn't handle buffered ioreqs.
-
-Signed-off-by: Per Bilse <per.bilse@citrix.com>
-Reviewed-by: Paul Durrant <paul@xen.org>
-master commit: a44734df6c24fadbdb001f051cc5580c467caf7d
-master date: 2022-12-07 12:17:30 +0100
----
- xen/common/ioreq.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
-index 4617aef29b..ecb8f545e1 100644
---- a/xen/common/ioreq.c
-+++ b/xen/common/ioreq.c
-@@ -1317,7 +1317,8 @@ unsigned int ioreq_broadcast(ioreq_t *p, bool buffered)
-
- FOR_EACH_IOREQ_SERVER(d, id, s)
- {
-- if ( !s->enabled )
-+ if ( !s->enabled ||
-+ (buffered && s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_OFF) )
- continue;
-
- if ( ioreq_send(s, p, buffered) == IOREQ_STATUS_UNHANDLED )
---
-2.40.0
-
diff --git a/0008-Config.mk-evaluate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch b/0008-Config.mk-evaluate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch
new file mode 100644
index 0000000..8a4cb7d
--- /dev/null
+++ b/0008-Config.mk-evaluate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch
@@ -0,0 +1,50 @@
+From 476d2624ec3cf3e60709580ff1df208bb8f616e2 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Mon, 31 Jul 2023 15:02:34 +0200
+Subject: [PATCH 08/55] Config.mk: evaluate XEN_COMPILE_ARCH and XEN_OS
+ immediately
+
+With GNU make 4.4, the number of executions of the commands present in
+these $(shell ) calls increased greatly. This is probably because, as of
+make 4.4, exported variables are also added to the environment of $(shell )
+constructs.
+
+So, to avoid having these commands run more often than necessary, we
+replace ?= with an equivalent construct that uses immediate expansion.
+
+Reported-by: Jason Andryuk <jandryuk@gmail.com>
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Tested-by: Jason Andryuk <jandryuk@gmail.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit a07414d989cf52e5e84192b78023bee1589bbda4)
+---
+ Config.mk | 8 ++++++--
+ 1 file changed, 6 insertions(+), 2 deletions(-)
+
+diff --git a/Config.mk b/Config.mk
+index 8bc2bcd5f6..4864033c73 100644
+--- a/Config.mk
++++ b/Config.mk
+@@ -19,13 +19,17 @@ or = $(if $(strip $(1)),$(1),$(if $(strip $(2)),$(2),$(if $(strip $(3)),$(
+
+ -include $(XEN_ROOT)/.config
+
+-XEN_COMPILE_ARCH ?= $(shell uname -m | sed -e s/i.86/x86_32/ \
++ifeq ($(origin XEN_COMPILE_ARCH), undefined)
++XEN_COMPILE_ARCH := $(shell uname -m | sed -e s/i.86/x86_32/ \
+ -e s/i86pc/x86_32/ -e s/amd64/x86_64/ \
+ -e s/armv7.*/arm32/ -e s/armv8.*/arm64/ \
+ -e s/aarch64/arm64/)
++endif
+
+ XEN_TARGET_ARCH ?= $(XEN_COMPILE_ARCH)
+-XEN_OS ?= $(shell uname -s)
++ifeq ($(origin XEN_OS), undefined)
++XEN_OS := $(shell uname -s)
++endif
+
+ CONFIG_$(XEN_OS) := y
+
+--
+2.42.0
+
diff --git a/0008-EFI-relocate-the-ESRT-when-booting-via-multiboot2.patch b/0008-EFI-relocate-the-ESRT-when-booting-via-multiboot2.patch
deleted file mode 100644
index d1acae6..0000000
--- a/0008-EFI-relocate-the-ESRT-when-booting-via-multiboot2.patch
+++ /dev/null
@@ -1,195 +0,0 @@
-From 1dcc9b6dfe528c7815a314f9b5581804b5e23750 Mon Sep 17 00:00:00 2001
-From: Demi Marie Obenour <demi@invisiblethingslab.com>
-Date: Tue, 20 Dec 2022 13:46:09 +0100
-Subject: [PATCH 08/89] EFI: relocate the ESRT when booting via multiboot2
-
-This was missed in the initial patchset.
-
-Move efi_relocate_esrt() up to avoid adding a forward declaration.
-
-Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 8d7acf3f7d8d2555c78421dced45bc49f79ae806
-master date: 2022-12-14 12:00:35 +0100
----
- xen/arch/x86/efi/efi-boot.h | 2 +
- xen/common/efi/boot.c | 136 ++++++++++++++++++------------------
- 2 files changed, 70 insertions(+), 68 deletions(-)
-
-diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
-index 27f928ed3c..c94e53d139 100644
---- a/xen/arch/x86/efi/efi-boot.h
-+++ b/xen/arch/x86/efi/efi-boot.h
-@@ -823,6 +823,8 @@ void __init efi_multiboot2(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable
- if ( gop )
- efi_set_gop_mode(gop, gop_mode);
-
-+ efi_relocate_esrt(SystemTable);
-+
- efi_exit_boot(ImageHandle, SystemTable);
- }
-
-diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
-index b3de1011ee..d3c6b055ae 100644
---- a/xen/common/efi/boot.c
-+++ b/xen/common/efi/boot.c
-@@ -625,6 +625,74 @@ static size_t __init get_esrt_size(const EFI_MEMORY_DESCRIPTOR *desc)
- return esrt_ptr->FwResourceCount * sizeof(esrt_ptr->Entries[0]);
- }
-
-+static EFI_GUID __initdata esrt_guid = EFI_SYSTEM_RESOURCE_TABLE_GUID;
-+
-+static void __init efi_relocate_esrt(EFI_SYSTEM_TABLE *SystemTable)
-+{
-+ EFI_STATUS status;
-+ UINTN info_size = 0, map_key, mdesc_size;
-+ void *memory_map = NULL;
-+ UINT32 ver;
-+ unsigned int i;
-+
-+ for ( ; ; )
-+ {
-+ status = efi_bs->GetMemoryMap(&info_size, memory_map, &map_key,
-+ &mdesc_size, &ver);
-+ if ( status == EFI_SUCCESS && memory_map != NULL )
-+ break;
-+ if ( status == EFI_BUFFER_TOO_SMALL || memory_map == NULL )
-+ {
-+ info_size += 8 * mdesc_size;
-+ if ( memory_map != NULL )
-+ efi_bs->FreePool(memory_map);
-+ memory_map = NULL;
-+ status = efi_bs->AllocatePool(EfiLoaderData, info_size, &memory_map);
-+ if ( status == EFI_SUCCESS )
-+ continue;
-+ PrintErr(L"Cannot allocate memory to relocate ESRT\r\n");
-+ }
-+ else
-+ PrintErr(L"Cannot obtain memory map to relocate ESRT\r\n");
-+ return;
-+ }
-+
-+ /* Try to obtain the ESRT. Errors are not fatal. */
-+ for ( i = 0; i < info_size; i += mdesc_size )
-+ {
-+ /*
-+ * ESRT needs to be moved to memory of type EfiACPIReclaimMemory
-+ * so that the memory it is in will not be used for other purposes.
-+ */
-+ void *new_esrt = NULL;
-+ const EFI_MEMORY_DESCRIPTOR *desc = memory_map + i;
-+ size_t esrt_size = get_esrt_size(desc);
-+
-+ if ( !esrt_size )
-+ continue;
-+ if ( desc->Type == EfiRuntimeServicesData ||
-+ desc->Type == EfiACPIReclaimMemory )
-+ break; /* ESRT already safe from reuse */
-+ status = efi_bs->AllocatePool(EfiACPIReclaimMemory, esrt_size,
-+ &new_esrt);
-+ if ( status == EFI_SUCCESS && new_esrt )
-+ {
-+ memcpy(new_esrt, (void *)esrt, esrt_size);
-+ status = efi_bs->InstallConfigurationTable(&esrt_guid, new_esrt);
-+ if ( status != EFI_SUCCESS )
-+ {
-+ PrintErr(L"Cannot install new ESRT\r\n");
-+ efi_bs->FreePool(new_esrt);
-+ }
-+ }
-+ else
-+ PrintErr(L"Cannot allocate memory for ESRT\r\n");
-+ break;
-+ }
-+
-+ efi_bs->FreePool(memory_map);
-+}
-+
- /*
- * Include architecture specific implementation here, which references the
- * static globals defined above.
-@@ -903,8 +971,6 @@ static UINTN __init efi_find_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
- return gop_mode;
- }
-
--static EFI_GUID __initdata esrt_guid = EFI_SYSTEM_RESOURCE_TABLE_GUID;
--
- static void __init efi_tables(void)
- {
- unsigned int i;
-@@ -1113,72 +1179,6 @@ static void __init efi_set_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop, UINTN gop
- #define INVALID_VIRTUAL_ADDRESS (0xBAAADUL << \
- (EFI_PAGE_SHIFT + BITS_PER_LONG - 32))
-
--static void __init efi_relocate_esrt(EFI_SYSTEM_TABLE *SystemTable)
--{
-- EFI_STATUS status;
-- UINTN info_size = 0, map_key, mdesc_size;
-- void *memory_map = NULL;
-- UINT32 ver;
-- unsigned int i;
--
-- for ( ; ; )
-- {
-- status = efi_bs->GetMemoryMap(&info_size, memory_map, &map_key,
-- &mdesc_size, &ver);
-- if ( status == EFI_SUCCESS && memory_map != NULL )
-- break;
-- if ( status == EFI_BUFFER_TOO_SMALL || memory_map == NULL )
-- {
-- info_size += 8 * mdesc_size;
-- if ( memory_map != NULL )
-- efi_bs->FreePool(memory_map);
-- memory_map = NULL;
-- status = efi_bs->AllocatePool(EfiLoaderData, info_size, &memory_map);
-- if ( status == EFI_SUCCESS )
-- continue;
-- PrintErr(L"Cannot allocate memory to relocate ESRT\r\n");
-- }
-- else
-- PrintErr(L"Cannot obtain memory map to relocate ESRT\r\n");
-- return;
-- }
--
-- /* Try to obtain the ESRT. Errors are not fatal. */
-- for ( i = 0; i < info_size; i += mdesc_size )
-- {
-- /*
-- * ESRT needs to be moved to memory of type EfiACPIReclaimMemory
-- * so that the memory it is in will not be used for other purposes.
-- */
-- void *new_esrt = NULL;
-- const EFI_MEMORY_DESCRIPTOR *desc = memory_map + i;
-- size_t esrt_size = get_esrt_size(desc);
--
-- if ( !esrt_size )
-- continue;
-- if ( desc->Type == EfiRuntimeServicesData ||
-- desc->Type == EfiACPIReclaimMemory )
-- break; /* ESRT already safe from reuse */
-- status = efi_bs->AllocatePool(EfiACPIReclaimMemory, esrt_size,
-- &new_esrt);
-- if ( status == EFI_SUCCESS && new_esrt )
-- {
-- memcpy(new_esrt, (void *)esrt, esrt_size);
-- status = efi_bs->InstallConfigurationTable(&esrt_guid, new_esrt);
-- if ( status != EFI_SUCCESS )
-- {
-- PrintErr(L"Cannot install new ESRT\r\n");
-- efi_bs->FreePool(new_esrt);
-- }
-- }
-- else
-- PrintErr(L"Cannot allocate memory for ESRT\r\n");
-- break;
-- }
--
-- efi_bs->FreePool(memory_map);
--}
--
- static void __init efi_exit_boot(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
- {
- EFI_STATUS status;
---
-2.40.0
-
diff --git a/0009-x86-time-prevent-overflow-with-high-frequency-TSCs.patch b/0009-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
deleted file mode 100644
index a9401d7..0000000
--- a/0009-x86-time-prevent-overflow-with-high-frequency-TSCs.patch
+++ /dev/null
@@ -1,34 +0,0 @@
-From a7a26da0b59da7233e6c6f63b180bab131398351 Mon Sep 17 00:00:00 2001
-From: Neowutran <xen@neowutran.ovh>
-Date: Tue, 20 Dec 2022 13:46:38 +0100
-Subject: [PATCH 09/89] x86/time: prevent overflow with high frequency TSCs
-
-Make sure tsc_khz is promoted to a 64-bit type before multiplying by
-1000 to avoid an 'overflow before widen' bug. Otherwise just above
-4.294GHz the value will overflow. Processors with clocks this high are
-now in production and require this to work correctly.
-
-Signed-off-by: Neowutran <xen@neowutran.ovh>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: ad15a0a8ca2515d8ac58edfc0bc1d3719219cb77
-master date: 2022-12-19 11:34:16 +0100
----
- xen/arch/x86/time.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
-index b01acd390d..d882b43cf0 100644
---- a/xen/arch/x86/time.c
-+++ b/xen/arch/x86/time.c
-@@ -2585,7 +2585,7 @@ int tsc_set_info(struct domain *d,
- case TSC_MODE_ALWAYS_EMULATE:
- d->arch.vtsc_offset = get_s_time() - elapsed_nsec;
- d->arch.tsc_khz = gtsc_khz ?: cpu_khz;
-- set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000);
-+ set_time_scale(&d->arch.vtsc_to_ns, d->arch.tsc_khz * 1000UL);
-
- /*
- * In default mode use native TSC if the host has safe TSC and
---
-2.40.0
-
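The 'overflow before widen' pattern described above is a general C pitfall rather than anything specific to xen/arch/x86/time.c; the following standalone sketch (variable names chosen for illustration, not taken from the Xen sources) reproduces the failure and the fix:

  /* 32-bit multiply wraps first and is only widened afterwards. */
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      uint32_t tsc_khz = 4300000;                /* a hypothetical 4.3 GHz TSC */
      uint64_t bad  = tsc_khz * 1000;            /* wraps modulo 2^32, then widens */
      uint64_t good = (uint64_t)tsc_khz * 1000;  /* widen first: 64-bit multiply */

      printf("bad:  %llu\n", (unsigned long long)bad);   /* 5032704    */
      printf("good: %llu\n", (unsigned long long)good);  /* 4300000000 */
      return 0;
  }

The patch above achieves the same effect with the 1000UL suffix, which suffices because unsigned long is 64 bits wide in the x86_64 hypervisor build.
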
diff --git a/0009-x86emul-rework-wrapping-of-libc-functions-in-test-an.patch b/0009-x86emul-rework-wrapping-of-libc-functions-in-test-an.patch
new file mode 100644
index 0000000..4f9c0bb
--- /dev/null
+++ b/0009-x86emul-rework-wrapping-of-libc-functions-in-test-an.patch
@@ -0,0 +1,245 @@
+From 37f1d68fa34220600f1e4ec82af5da70127757e5 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Fri, 18 Aug 2023 15:04:28 +0200
+Subject: [PATCH 09/55] x86emul: rework wrapping of libc functions in test and
+ fuzzing harnesses
+
+Our present approach is working fully behind the compiler's back. This
+was found to not work with LTO. Employ ld's --wrap= option instead. Note
+that while this makes the build work at least with new enough gcc (it
+doesn't with gcc7, for example, due to tool chain side issues afaict),
+according to my testing things still won't work when building the
+fuzzing harness with afl-cc: While with the gcc7 tool chain I see afl-as
+getting invoked, this does not happen with gcc13. Yet without using that
+assembler wrapper the resulting binary will look uninstrumented to
+afl-fuzz.
+
+While checking the resulting binaries I noticed that we've gained uses
+of snprintf() and strstr(), which just so happen not to cause any
+problems. Add wrappers for them as well.
+
+Since we don't have any actual uses of v{,sn}printf(), no definitions of
+their wrappers appear (just yet). But I think we want
+__wrap_{,sn}printf() to properly use __real_v{,sn}printf() right away,
+which means we need declarations of the latter.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit 6fba45ca3be1c5d46cddb1eaf371d9e69550b244)
+---
+ tools/fuzz/x86_instruction_emulator/Makefile | 6 ++-
+ tools/tests/x86_emulator/Makefile | 4 +-
+ tools/tests/x86_emulator/wrappers.c | 55 ++++++++++++++------
+ tools/tests/x86_emulator/x86-emulate.h | 14 +++--
+ 4 files changed, 53 insertions(+), 26 deletions(-)
+
+diff --git a/tools/fuzz/x86_instruction_emulator/Makefile b/tools/fuzz/x86_instruction_emulator/Makefile
+index 13aa238503..c83959c847 100644
+--- a/tools/fuzz/x86_instruction_emulator/Makefile
++++ b/tools/fuzz/x86_instruction_emulator/Makefile
+@@ -29,6 +29,8 @@ GCOV_FLAGS := --coverage
+ %-cov.o: %.c
+ $(CC) -c $(CFLAGS) $(GCOV_FLAGS) $< -o $@
+
++WRAPPED = $(shell sed -n 's,^ *WRAP(\([[:alnum:]_]*\));,\1,p' x86-emulate.h)
++
+ x86-emulate.h: x86_emulate/x86_emulate.h
+ x86-emulate.o x86-emulate-cov.o: x86-emulate.h x86_emulate/x86_emulate.c
+ fuzz-emul.o fuzz-emul-cov.o wrappers.o: x86-emulate.h
+@@ -37,10 +39,10 @@ x86-insn-fuzzer.a: fuzz-emul.o x86-emulate.o cpuid.o
+ $(AR) rc $@ $^
+
+ afl-harness: afl-harness.o fuzz-emul.o x86-emulate.o cpuid.o wrappers.o
+- $(CC) $(CFLAGS) $^ -o $@
++ $(CC) $(CFLAGS) $(addprefix -Wl$(comma)--wrap=,$(WRAPPED)) $^ -o $@
+
+ afl-harness-cov: afl-harness-cov.o fuzz-emul-cov.o x86-emulate-cov.o cpuid.o wrappers.o
+- $(CC) $(CFLAGS) $(GCOV_FLAGS) $^ -o $@
++ $(CC) $(CFLAGS) $(GCOV_FLAGS) $(addprefix -Wl$(comma)--wrap=,$(WRAPPED)) $^ -o $@
+
+ # Common targets
+ .PHONY: all
+diff --git a/tools/tests/x86_emulator/Makefile b/tools/tests/x86_emulator/Makefile
+index bd82598f97..a2fd6607c6 100644
+--- a/tools/tests/x86_emulator/Makefile
++++ b/tools/tests/x86_emulator/Makefile
+@@ -250,8 +250,10 @@ xop.h avx512f.h: simd-fma.c
+
+ endif # 32-bit override
+
++WRAPPED := $(shell sed -n 's,^ *WRAP(\([[:alnum:]_]*\));,\1,p' x86-emulate.h)
++
+ $(TARGET): x86-emulate.o cpuid.o test_x86_emulator.o evex-disp8.o predicates.o wrappers.o
+- $(HOSTCC) $(HOSTCFLAGS) -o $@ $^
++ $(HOSTCC) $(HOSTCFLAGS) $(addprefix -Wl$(comma)--wrap=,$(WRAPPED)) -o $@ $^
+
+ .PHONY: clean
+ clean:
+diff --git a/tools/tests/x86_emulator/wrappers.c b/tools/tests/x86_emulator/wrappers.c
+index eba7cc93c5..3829a6f416 100644
+--- a/tools/tests/x86_emulator/wrappers.c
++++ b/tools/tests/x86_emulator/wrappers.c
+@@ -1,78 +1,103 @@
+ #include <stdarg.h>
+
+-#define WRAP(x) typeof(x) emul_##x
++#define WRAP(x) typeof(x) __wrap_ ## x, __real_ ## x
+ #include "x86-emulate.h"
+
+-size_t emul_fwrite(const void *src, size_t sz, size_t n, FILE *f)
++size_t __wrap_fwrite(const void *src, size_t sz, size_t n, FILE *f)
+ {
+ emul_save_fpu_state();
+- sz = fwrite(src, sz, n, f);
++ sz = __real_fwrite(src, sz, n, f);
+ emul_restore_fpu_state();
+
+ return sz;
+ }
+
+-int emul_memcmp(const void *p1, const void *p2, size_t sz)
++int __wrap_memcmp(const void *p1, const void *p2, size_t sz)
+ {
+ int rc;
+
+ emul_save_fpu_state();
+- rc = memcmp(p1, p2, sz);
++ rc = __real_memcmp(p1, p2, sz);
+ emul_restore_fpu_state();
+
+ return rc;
+ }
+
+-void *emul_memcpy(void *dst, const void *src, size_t sz)
++void *__wrap_memcpy(void *dst, const void *src, size_t sz)
+ {
+ emul_save_fpu_state();
+- memcpy(dst, src, sz);
++ __real_memcpy(dst, src, sz);
+ emul_restore_fpu_state();
+
+ return dst;
+ }
+
+-void *emul_memset(void *dst, int c, size_t sz)
++void *__wrap_memset(void *dst, int c, size_t sz)
+ {
+ emul_save_fpu_state();
+- memset(dst, c, sz);
++ __real_memset(dst, c, sz);
+ emul_restore_fpu_state();
+
+ return dst;
+ }
+
+-int emul_printf(const char *fmt, ...)
++int __wrap_printf(const char *fmt, ...)
+ {
+ va_list varg;
+ int rc;
+
+ emul_save_fpu_state();
+ va_start(varg, fmt);
+- rc = vprintf(fmt, varg);
++ rc = __real_vprintf(fmt, varg);
+ va_end(varg);
+ emul_restore_fpu_state();
+
+ return rc;
+ }
+
+-int emul_putchar(int c)
++int __wrap_putchar(int c)
+ {
+ int rc;
+
+ emul_save_fpu_state();
+- rc = putchar(c);
++ rc = __real_putchar(c);
+ emul_restore_fpu_state();
+
+ return rc;
+ }
+
+-int emul_puts(const char *str)
++int __wrap_puts(const char *str)
+ {
+ int rc;
+
+ emul_save_fpu_state();
+- rc = puts(str);
++ rc = __real_puts(str);
+ emul_restore_fpu_state();
+
+ return rc;
+ }
++
++int __wrap_snprintf(char *buf, size_t n, const char *fmt, ...)
++{
++ va_list varg;
++ int rc;
++
++ emul_save_fpu_state();
++ va_start(varg, fmt);
++ rc = __real_vsnprintf(buf, n, fmt, varg);
++ va_end(varg);
++ emul_restore_fpu_state();
++
++ return rc;
++}
++
++char *__wrap_strstr(const char *s1, const char *s2)
++{
++ char *s;
++
++ emul_save_fpu_state();
++ s = __real_strstr(s1, s2);
++ emul_restore_fpu_state();
++
++ return s;
++}
+diff --git a/tools/tests/x86_emulator/x86-emulate.h b/tools/tests/x86_emulator/x86-emulate.h
+index 19bea9c38d..58760f096d 100644
+--- a/tools/tests/x86_emulator/x86-emulate.h
++++ b/tools/tests/x86_emulator/x86-emulate.h
+@@ -29,9 +29,7 @@
+ #ifdef EOF
+ # error "Must not include <stdio.h> before x86-emulate.h"
+ #endif
+-#ifdef WRAP
+-# include <stdio.h>
+-#endif
++#include <stdio.h>
+
+ #include <xen/xen.h>
+
+@@ -85,11 +83,7 @@ void emul_restore_fpu_state(void);
+ * around the actual function.
+ */
+ #ifndef WRAP
+-# if 0 /* This only works for explicit calls, not for compiler generated ones. */
+-# define WRAP(x) typeof(x) x asm("emul_" #x)
+-# else
+-# define WRAP(x) asm(".equ " #x ", emul_" #x)
+-# endif
++# define WRAP(x) typeof(x) __wrap_ ## x
+ #endif
+
+ WRAP(fwrite);
+@@ -99,6 +93,10 @@ WRAP(memset);
+ WRAP(printf);
+ WRAP(putchar);
+ WRAP(puts);
++WRAP(snprintf);
++WRAP(strstr);
++WRAP(vprintf);
++WRAP(vsnprintf);
+
+ #undef WRAP
+
+--
+2.42.0
+
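For anyone who has not used it before, the ld --wrap= mechanism the patch above switches the harnesses to can be demonstrated in isolation. A minimal sketch with a made-up file name and only puts() wrapped (this is not the Xen test harness code):

  /* wrap-demo.c -- build with:  cc wrap-demo.c -Wl,--wrap=puts -o wrap-demo */
  #include <stdio.h>

  int __real_puts(const char *s);      /* resolved by the linker to the real puts() */

  int __wrap_puts(const char *s)
  {
      __real_puts("[wrapper ran]");    /* extra work around the call */
      return __real_puts(s);           /* forward to the original function */
  }

  int main(void)
  {
      puts("hello");                   /* redirected by ld to __wrap_puts() */
      return 0;
  }

Because the redirection happens at link time it also catches calls the compiler emits on its own, which the previous asm-renaming trick could not; the Makefile hunks above simply expand WRAPPED into one -Wl,--wrap=NAME option per wrapped symbol.
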
diff --git a/0010-rombios-Work-around-GCC-issue-99578.patch b/0010-rombios-Work-around-GCC-issue-99578.patch
new file mode 100644
index 0000000..3995f02
--- /dev/null
+++ b/0010-rombios-Work-around-GCC-issue-99578.patch
@@ -0,0 +1,43 @@
+From ae1045c42954772e48862162d0e95fbc9393c91e Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 17 Aug 2023 21:32:53 +0100
+Subject: [PATCH 10/55] rombios: Work around GCC issue 99578
+
+GCC 12 objects to pointers derived from a constant:
+
+ util.c: In function 'find_rsdp':
+ util.c:429:16: error: array subscript 0 is outside array bounds of 'uint16_t[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds]
+ 429 | ebda_seg = *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
+ cc1: all warnings being treated as errors
+
+This is a GCC bug, but work around it rather than turning array-bounds
+checking off generally.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit e35138a2ffbe1fe71edaaaaae71063dc545a8416)
+---
+ tools/firmware/rombios/32bit/util.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/tools/firmware/rombios/32bit/util.c b/tools/firmware/rombios/32bit/util.c
+index 6c1c480514..a47e000a26 100644
+--- a/tools/firmware/rombios/32bit/util.c
++++ b/tools/firmware/rombios/32bit/util.c
+@@ -424,10 +424,10 @@ static struct acpi_20_rsdp *__find_rsdp(const void *start, unsigned int len)
+ struct acpi_20_rsdp *find_rsdp(void)
+ {
+ struct acpi_20_rsdp *rsdp;
+- uint16_t ebda_seg;
++ uint16_t *volatile /* GCC issue 99578 */ ebda_seg =
++ ADDR_FROM_SEG_OFF(0x40, 0xe);
+
+- ebda_seg = *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
+- rsdp = __find_rsdp((void *)(ebda_seg << 16), 1024);
++ rsdp = __find_rsdp((void *)(*ebda_seg << 16), 1024);
+ if (!rsdp)
+ rsdp = __find_rsdp((void *)0xE0000, 0x20000);
+
+--
+2.42.0
+
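The shape of that workaround is easy to show outside of rombios. A sketch under the usual real-mode segment:offset translation; the macro below is illustrative rather than the rombios definition, and dereferencing such a fixed low address is only meaningful in a firmware/freestanding environment:

  #include <stdint.h>

  /* illustrative: linear address = segment * 16 + offset */
  #define ADDR_FROM_SEG_OFF(seg, off) ((void *)(uintptr_t)(((seg) << 4) + (off)))

  uint16_t read_ebda_segment(void)
  {
      /*
       * A direct dereference of a pointer derived from a constant, i.e.
       *     return *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
       * trips the bogus -Warray-bounds diagnostic (GCC issue 99578).
       * Routing the access through a volatile-qualified pointer variable,
       * as the patch does, hides the constant origin from the optimiser
       * and silences the warning without turning the check off globally.
       */
      uint16_t *volatile ebda_seg = ADDR_FROM_SEG_OFF(0x40, 0xe);

      return *ebda_seg;
  }
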
diff --git a/0010-tools-oxenstored-Fix-incorrect-scope-after-an-if-sta.patch b/0010-tools-oxenstored-Fix-incorrect-scope-after-an-if-sta.patch
deleted file mode 100644
index a8c427d..0000000
--- a/0010-tools-oxenstored-Fix-incorrect-scope-after-an-if-sta.patch
+++ /dev/null
@@ -1,52 +0,0 @@
-From 2e8d7a08bcd111fe21569e9ace1a047df76da949 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 11 Nov 2022 18:50:34 +0000
-Subject: [PATCH 10/89] tools/oxenstored: Fix incorrect scope after an if
- statement
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-A debug statement got inserted into a single-expression if statement.
-
-Insert brackets to give the intended meaning, rather than the actual meaning
-where the "let con = Connections..." is outside and executed unconditionally.
-
-This results in some unnecessary ring checks for domains which otherwise have
-IO credit.
-
-Fixes: 42f0581a91d4 ("tools/oxenstored: Implement live update for socket connections")
-Reported-by: Edwin Török <edvin.torok@citrix.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit ee36179371fd4215a43fb179be2165f65c1cd1cd)
----
- tools/ocaml/xenstored/xenstored.ml | 5 +++--
- 1 file changed, 3 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index ffd43a4eee..c5dc7a28d0 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -475,7 +475,7 @@ let _ =
-
- let ring_scan_checker dom =
- (* no need to scan domains already marked as for processing *)
-- if not (Domain.get_io_credit dom > 0) then
-+ if not (Domain.get_io_credit dom > 0) then (
- debug "Looking up domid %d" (Domain.get_id dom);
- let con = Connections.find_domain cons (Domain.get_id dom) in
- if not (Connection.has_more_work con) then (
-@@ -490,7 +490,8 @@ let _ =
- let n = 32 + 2 * (Domains.number domains) in
- info "found lazy domain %d, credit %d" (Domain.get_id dom) n;
- Domain.set_io_credit ~n dom
-- ) in
-+ )
-+ ) in
-
- let last_stat_time = ref 0. in
- let last_scan_time = ref 0. in
---
-2.40.0
-
diff --git a/0011-rombios-Avoid-using-K-R-function-syntax.patch b/0011-rombios-Avoid-using-K-R-function-syntax.patch
new file mode 100644
index 0000000..0bd761f
--- /dev/null
+++ b/0011-rombios-Avoid-using-K-R-function-syntax.patch
@@ -0,0 +1,74 @@
+From 24487fec3bbebbc1fd3f00d16bca7fb0f56a5f30 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 18 Aug 2023 10:47:46 +0100
+Subject: [PATCH 11/55] rombios: Avoid using K&R function syntax
+
+Clang-15 complains:
+
+ tcgbios.c:598:25: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
+ void tcpa_calling_int19h()
+ ^
+ void
+
+C2x formally removes K&R syntax. The declarations for these functions in
+32bitprotos.h are already ANSI compatible. Update the definitions to match.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit a562afa5679d4a7ceb9cb9222fec1fea9a61f738)
+---
+ tools/firmware/rombios/32bit/tcgbios/tcgbios.c | 10 +++++-----
+ 1 file changed, 5 insertions(+), 5 deletions(-)
+
+diff --git a/tools/firmware/rombios/32bit/tcgbios/tcgbios.c b/tools/firmware/rombios/32bit/tcgbios/tcgbios.c
+index fa22c4460a..ad0eac0d20 100644
+--- a/tools/firmware/rombios/32bit/tcgbios/tcgbios.c
++++ b/tools/firmware/rombios/32bit/tcgbios/tcgbios.c
+@@ -595,7 +595,7 @@ static void tcpa_add_measurement(uint32_t pcrIndex,
+ /*
+ * Add measurement to log about call of int 19h
+ */
+-void tcpa_calling_int19h()
++void tcpa_calling_int19h(void)
+ {
+ tcpa_add_measurement(4, EV_ACTION, 0);
+ }
+@@ -603,7 +603,7 @@ void tcpa_calling_int19h()
+ /*
+ * Add measurement to log about retuning from int 19h
+ */
+-void tcpa_returned_int19h()
++void tcpa_returned_int19h(void)
+ {
+ tcpa_add_measurement(4, EV_ACTION, 1);
+ }
+@@ -611,7 +611,7 @@ void tcpa_returned_int19h()
+ /*
+ * Add event separators for PCRs 0 to 7; specs 8.2.3
+ */
+-void tcpa_add_event_separators()
++void tcpa_add_event_separators(void)
+ {
+ uint32_t pcrIndex = 0;
+ while (pcrIndex <= 7) {
+@@ -624,7 +624,7 @@ void tcpa_add_event_separators()
+ /*
+ * Add a wake event to the log
+ */
+-void tcpa_wake_event()
++void tcpa_wake_event(void)
+ {
+ tcpa_add_measurement_to_log(6,
+ EV_ACTION,
+@@ -659,7 +659,7 @@ void tcpa_add_bootdevice(uint32_t bootcd, uint32_t bootdrv)
+ * Add measurement to the log about option rom scan
+ * 10.4.3 : action 14
+ */
+-void tcpa_start_option_rom_scan()
++void tcpa_start_option_rom_scan(void)
+ {
+ tcpa_add_measurement(2, EV_ACTION, 14);
+ }
+--
+2.42.0
+
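The practical difference between the two spellings is that, before C23, an empty parameter list is not a prototype, so calls are not checked against it. A small sketch with invented function names:

  static void old_style()     { }   /* K&R-style: no prototype, arguments unchecked */
  static void new_style(void) { }   /* ANSI prototype: explicitly takes no arguments */

  int main(void)
  {
      old_style(1, 2, 3);   /* accepted silently by pre-C23 compilers */
      new_style();          /* new_style(1) would be a compile-time error */
      return 0;
  }

Clang's -Wstrict-prototypes (and C2x itself) object to the former, which is why the definitions are updated to match the ANSI declarations already present in 32bitprotos.h.
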
diff --git a/0011-tools-ocaml-evtchn-OCaml-5-support-fix-potential-res.patch b/0011-tools-ocaml-evtchn-OCaml-5-support-fix-potential-res.patch
deleted file mode 100644
index c9cf630..0000000
--- a/0011-tools-ocaml-evtchn-OCaml-5-support-fix-potential-res.patch
+++ /dev/null
@@ -1,68 +0,0 @@
-From d11528a993f80c6a86f4cb0c30578c026348e3e4 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Tue, 18 Jan 2022 15:04:48 +0000
-Subject: [PATCH 11/89] tools/ocaml/evtchn: OCaml 5 support, fix potential
- resource leak
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-There is no binding for xenevtchn_close(). In principle, this is a resource
-leak, but the typical usage is as a singleton that lives for the lifetime of
-the program.
-
-Ocaml 5 no longer permits storing a naked C pointer in an Ocaml value.
-
-Therefore, use a Custom block. This allows us to use the finaliser callback
-to call xenevtchn_close(), if the Ocaml object goes out of scope.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 22d5affdf0cecfa6faae46fbaec68b8018835220)
----
- tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 21 +++++++++++++++++--
- 1 file changed, 19 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-index f889a7a2e4..37f1cc4e14 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-@@ -33,7 +33,22 @@
- #include <caml/fail.h>
- #include <caml/signals.h>
-
--#define _H(__h) ((xenevtchn_handle *)(__h))
-+#define _H(__h) (*((xenevtchn_handle **)Data_custom_val(__h)))
-+
-+static void stub_evtchn_finalize(value v)
-+{
-+ xenevtchn_close(_H(v));
-+}
-+
-+static struct custom_operations xenevtchn_ops = {
-+ .identifier = "xenevtchn",
-+ .finalize = stub_evtchn_finalize,
-+ .compare = custom_compare_default, /* Can't compare */
-+ .hash = custom_hash_default, /* Can't hash */
-+ .serialize = custom_serialize_default, /* Can't serialize */
-+ .deserialize = custom_deserialize_default, /* Can't deserialize */
-+ .compare_ext = custom_compare_ext_default, /* Can't compare */
-+};
-
- CAMLprim value stub_eventchn_init(void)
- {
-@@ -48,7 +63,9 @@ CAMLprim value stub_eventchn_init(void)
- if (xce == NULL)
- caml_failwith("open failed");
-
-- result = (value)xce;
-+ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
-+ _H(result) = xce;
-+
- CAMLreturn(result);
- }
-
---
-2.40.0
-
diff --git a/0012-rombios-Remove-the-use-of-egrep.patch b/0012-rombios-Remove-the-use-of-egrep.patch
new file mode 100644
index 0000000..44702b4
--- /dev/null
+++ b/0012-rombios-Remove-the-use-of-egrep.patch
@@ -0,0 +1,34 @@
+From e418a77295e6b512d212b57123c11e4d4fb23e8c Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 18 Aug 2023 11:05:00 +0100
+Subject: [PATCH 12/55] rombios: Remove the use of egrep
+
+As the Alpine 3.18 container notes:
+
+ egrep: warning: egrep is obsolescent; using grep -E
+
+Adjust it.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 5ddac3c2852ecc120acab86fc403153a2097c5dc)
+---
+ tools/firmware/rombios/32bit/Makefile | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/firmware/rombios/32bit/Makefile b/tools/firmware/rombios/32bit/Makefile
+index c058c71551..50d45647c2 100644
+--- a/tools/firmware/rombios/32bit/Makefile
++++ b/tools/firmware/rombios/32bit/Makefile
+@@ -26,7 +26,7 @@ $(TARGET): 32bitbios_all.o
+ 32bitbios_all.o: 32bitbios.o tcgbios/tcgbiosext.o util.o pmm.o
+ $(LD) $(LDFLAGS_DIRECT) -s -r $^ -o 32bitbios_all.o
+ @nm 32bitbios_all.o | \
+- egrep '^ +U ' >/dev/null && { \
++ grep -E '^ +U ' >/dev/null && { \
+ echo "There are undefined symbols in the BIOS:"; \
+ nm -u 32bitbios_all.o; \
+ exit 11; \
+--
+2.42.0
+
diff --git a/0012-tools-ocaml-evtchn-Add-binding-for-xenevtchn_fdopen.patch b/0012-tools-ocaml-evtchn-Add-binding-for-xenevtchn_fdopen.patch
deleted file mode 100644
index 7e921fd..0000000
--- a/0012-tools-ocaml-evtchn-Add-binding-for-xenevtchn_fdopen.patch
+++ /dev/null
@@ -1,81 +0,0 @@
-From 24d9dc2ae2f88249fcf81f7b7e612cdfb7c73e4b Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Mon, 14 Nov 2022 13:36:19 +0000
-Subject: [PATCH 12/89] tools/ocaml/evtchn: Add binding for xenevtchn_fdopen()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-For live update, the new oxenstored needs to reconstruct an evtchn object
-around an existing file descriptor.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 7ba68a6c558e1fd811c95cb7215a5cd07a3cc2ea)
----
- tools/ocaml/libs/eventchn/xeneventchn.ml | 1 +
- tools/ocaml/libs/eventchn/xeneventchn.mli | 4 ++++
- tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 19 +++++++++++++++++++
- 3 files changed, 24 insertions(+)
-
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn.ml b/tools/ocaml/libs/eventchn/xeneventchn.ml
-index dd00a1f0ea..be4de82f46 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn.ml
-+++ b/tools/ocaml/libs/eventchn/xeneventchn.ml
-@@ -17,6 +17,7 @@
- type handle
-
- external init: unit -> handle = "stub_eventchn_init"
-+external fdopen: Unix.file_descr -> handle = "stub_eventchn_fdopen"
- external fd: handle -> Unix.file_descr = "stub_eventchn_fd"
-
- type t = int
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn.mli b/tools/ocaml/libs/eventchn/xeneventchn.mli
-index 08c7337643..98b3c86f37 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn.mli
-+++ b/tools/ocaml/libs/eventchn/xeneventchn.mli
-@@ -47,6 +47,10 @@ val init: unit -> handle
- (** Return an initialised event channel interface. On error it
- will throw a Failure exception. *)
-
-+val fdopen: Unix.file_descr -> handle
-+(** Return an initialised event channel interface, from an already open evtchn
-+ file descriptor. On error it will throw a Failure exception. *)
-+
- val fd: handle -> Unix.file_descr
- (** Return a file descriptor suitable for Unix.select. When
- the descriptor becomes readable, it is safe to call 'pending'.
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-index 37f1cc4e14..7bdf711bc1 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-@@ -69,6 +69,25 @@ CAMLprim value stub_eventchn_init(void)
- CAMLreturn(result);
- }
-
-+CAMLprim value stub_eventchn_fdopen(value fdval)
-+{
-+ CAMLparam1(fdval);
-+ CAMLlocal1(result);
-+ xenevtchn_handle *xce;
-+
-+ caml_enter_blocking_section();
-+ xce = xenevtchn_fdopen(NULL, Int_val(fdval), 0);
-+ caml_leave_blocking_section();
-+
-+ if (xce == NULL)
-+ caml_failwith("evtchn fdopen failed");
-+
-+ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
-+ _H(result) = xce;
-+
-+ CAMLreturn(result);
-+}
-+
- CAMLprim value stub_eventchn_fd(value xce)
- {
- CAMLparam1(xce);
---
-2.40.0
-
diff --git a/0013-CI-Resync-FreeBSD-config-with-staging.patch b/0013-CI-Resync-FreeBSD-config-with-staging.patch
new file mode 100644
index 0000000..dcd867b
--- /dev/null
+++ b/0013-CI-Resync-FreeBSD-config-with-staging.patch
@@ -0,0 +1,62 @@
+From f00d56309533427981f09ef2614f1bae4bcab62e Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 17 Feb 2023 11:16:32 +0000
+Subject: [PATCH 13/55] CI: Resync FreeBSD config with staging
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+CI: Update FreeBSD to 13.1
+
+Also print the compiler version before starting. It's not easy to find
+otherwise, and does change from time to time.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+(cherry picked from commit 5e7667ea2dd33e0e5e0f3a96db37fdb4ecd98fba)
+
+CI: Update FreeBSD to 13.2
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Stefano Stabellini <sstabellini@kernel.org>
+(cherry picked from commit f872a624cbf92de9944483eea7674ef80ced1380)
+
+CI: Update FreeBSD to 12.4
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+(cherry picked from commit a73560896ce3c513460f26bd1c205060d6ec4f8a)
+---
+ .cirrus.yml | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+
+diff --git a/.cirrus.yml b/.cirrus.yml
+index c38333e736..7e0beb200d 100644
+--- a/.cirrus.yml
++++ b/.cirrus.yml
+@@ -10,19 +10,20 @@ freebsd_template: &FREEBSD_TEMPLATE
+ libxml2 glib git
+
+ build_script:
++ - cc --version
+ - ./configure --with-system-seabios=/usr/local/share/seabios/bios.bin
+ - gmake -j`sysctl -n hw.ncpu` clang=y
+
+ task:
+ name: 'FreeBSD 12'
+ freebsd_instance:
+- image_family: freebsd-12-3
++ image_family: freebsd-12-4
+ << : *FREEBSD_TEMPLATE
+
+ task:
+ name: 'FreeBSD 13'
+ freebsd_instance:
+- image_family: freebsd-13-0
++ image_family: freebsd-13-2
+ << : *FREEBSD_TEMPLATE
+
+ task:
+--
+2.42.0
+
diff --git a/0013-tools-ocaml-evtchn-Extend-the-init-binding-with-a-cl.patch b/0013-tools-ocaml-evtchn-Extend-the-init-binding-with-a-cl.patch
deleted file mode 100644
index af889eb..0000000
--- a/0013-tools-ocaml-evtchn-Extend-the-init-binding-with-a-cl.patch
+++ /dev/null
@@ -1,90 +0,0 @@
-From c7cf603836e40de1b4a6ca7d1d52736eb4a10327 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Thu, 3 Nov 2022 14:50:38 +0000
-Subject: [PATCH 13/89] tools/ocaml/evtchn: Extend the init() binding with a
- cloexec flag
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-For live update, oxenstored wants to clear CLOEXEC on the evtchn handle, so it
-survives the execve() into the new oxenstored.
-
-Have the new interface match how cloexec works in other Ocaml standard
-libraries.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 9bafe4a53306e7aa2ce6ffc96f7477c6f329f7a7)
----
- tools/ocaml/libs/eventchn/xeneventchn.ml | 5 ++++-
- tools/ocaml/libs/eventchn/xeneventchn.mli | 9 ++++++---
- tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 10 +++++++---
- 3 files changed, 17 insertions(+), 7 deletions(-)
-
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn.ml b/tools/ocaml/libs/eventchn/xeneventchn.ml
-index be4de82f46..c16fdd4674 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn.ml
-+++ b/tools/ocaml/libs/eventchn/xeneventchn.ml
-@@ -16,7 +16,10 @@
-
- type handle
-
--external init: unit -> handle = "stub_eventchn_init"
-+external _init: bool -> handle = "stub_eventchn_init"
-+
-+let init ?(cloexec=true) () = _init cloexec
-+
- external fdopen: Unix.file_descr -> handle = "stub_eventchn_fdopen"
- external fd: handle -> Unix.file_descr = "stub_eventchn_fd"
-
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn.mli b/tools/ocaml/libs/eventchn/xeneventchn.mli
-index 98b3c86f37..870429b6b5 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn.mli
-+++ b/tools/ocaml/libs/eventchn/xeneventchn.mli
-@@ -43,9 +43,12 @@ val to_int: t -> int
-
- val of_int: int -> t
-
--val init: unit -> handle
--(** Return an initialised event channel interface. On error it
-- will throw a Failure exception. *)
-+val init: ?cloexec:bool -> unit -> handle
-+(** [init ?cloexec ()]
-+ Return an initialised event channel interface.
-+ The default is to close the underlying file descriptor
-+ on [execve], which can be overriden with [~cloexec:false].
-+ On error it will throw a Failure exception. *)
-
- val fdopen: Unix.file_descr -> handle
- (** Return an initialised event channel interface, from an already open evtchn
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-index 7bdf711bc1..aa8a69cc1e 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-@@ -50,14 +50,18 @@ static struct custom_operations xenevtchn_ops = {
- .compare_ext = custom_compare_ext_default, /* Can't compare */
- };
-
--CAMLprim value stub_eventchn_init(void)
-+CAMLprim value stub_eventchn_init(value cloexec)
- {
-- CAMLparam0();
-+ CAMLparam1(cloexec);
- CAMLlocal1(result);
- xenevtchn_handle *xce;
-+ unsigned int flags = 0;
-+
-+ if ( !Bool_val(cloexec) )
-+ flags |= XENEVTCHN_NO_CLOEXEC;
-
- caml_enter_blocking_section();
-- xce = xenevtchn_open(NULL, 0);
-+ xce = xenevtchn_open(NULL, flags);
- caml_leave_blocking_section();
-
- if (xce == NULL)
---
-2.40.0
-
diff --git a/0014-tools-oxenstored-Style-fixes-to-Domain.patch b/0014-tools-oxenstored-Style-fixes-to-Domain.patch
deleted file mode 100644
index aad4399..0000000
--- a/0014-tools-oxenstored-Style-fixes-to-Domain.patch
+++ /dev/null
@@ -1,64 +0,0 @@
-From 0929960173bc76b8d90df73c8ee665747c233e18 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 30 Nov 2022 14:56:43 +0000
-Subject: [PATCH 14/89] tools/oxenstored: Style fixes to Domain
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-This file has some style problems so severe that they interfere with the
-readability of the subsequent bugfix patches.
-
-Fix these issues ahead of time, to make the subsequent changes more readable.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit b45bfaf359e4821b1bf98a4fcd194d7fd176f167)
----
- tools/ocaml/xenstored/domain.ml | 16 +++++++---------
- 1 file changed, 7 insertions(+), 9 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
-index 81cb59b8f1..ab08dcf37f 100644
---- a/tools/ocaml/xenstored/domain.ml
-+++ b/tools/ocaml/xenstored/domain.ml
-@@ -57,17 +57,16 @@ let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
- let is_free_to_conflict = is_dom0
-
- let string_of_port = function
--| None -> "None"
--| Some x -> string_of_int (Xeneventchn.to_int x)
-+ | None -> "None"
-+ | Some x -> string_of_int (Xeneventchn.to_int x)
-
- let dump d chan =
- fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
-
--let notify dom = match dom.port with
--| None ->
-- warn "domain %d: attempt to notify on unknown port" dom.id
--| Some port ->
-- Event.notify dom.eventchn port
-+let notify dom =
-+ match dom.port with
-+ | None -> warn "domain %d: attempt to notify on unknown port" dom.id
-+ | Some port -> Event.notify dom.eventchn port
-
- let bind_interdomain dom =
- begin match dom.port with
-@@ -84,8 +83,7 @@ let close dom =
- | None -> ()
- | Some port -> Event.unbind dom.eventchn port
- end;
-- Xenmmap.unmap dom.interface;
-- ()
-+ Xenmmap.unmap dom.interface
-
- let make id mfn remote_port interface eventchn = {
- id = id;
---
-2.40.0
-
diff --git a/0014-tools-vchan-Fix-Wsingle-bit-bitfield-constant-conver.patch b/0014-tools-vchan-Fix-Wsingle-bit-bitfield-constant-conver.patch
new file mode 100644
index 0000000..6e29490
--- /dev/null
+++ b/0014-tools-vchan-Fix-Wsingle-bit-bitfield-constant-conver.patch
@@ -0,0 +1,43 @@
+From 052a8d24bc670ab6503e21dfd2fb8bccfc22aa73 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 8 Aug 2023 14:53:42 +0100
+Subject: [PATCH 14/55] tools/vchan: Fix
+ -Wsingle-bit-bitfield-constant-conversion
+
+Gitlab reports:
+
+ node.c:158:17: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
+
+ ctrl->blocking = 1;
+ ^ ~
+ 1 error generated.
+ make[4]: *** [/builds/xen-project/people/andyhhp/xen/tools/vchan/../../tools/Rules.mk:188: node.o] Error 1
+
+In Xen 4.18, this was fixed with c/s 99ab02f63ea8 ("tools: convert bitfields
+to unsigned type") but this is an ABI change which can't be backported.
+
+Switch 1 for -1 to provide a minimally invasive way to fix the build.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+---
+ tools/vchan/node.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/vchan/node.c b/tools/vchan/node.c
+index f1638f013d..a28293b720 100644
+--- a/tools/vchan/node.c
++++ b/tools/vchan/node.c
+@@ -155,7 +155,7 @@ int main(int argc, char **argv)
+ perror("libxenvchan_*_init");
+ exit(1);
+ }
+- ctrl->blocking = 1;
++ ctrl->blocking = -1;
+
+ srand(seed);
+ fprintf(stderr, "seed=%d\n", seed);
+--
+2.42.0
+
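The warning comes down to the value range of a signed one-bit bit-field. A minimal sketch, with an invented struct, assuming the common ABI where a plain int bit-field is signed:

  #include <stdio.h>

  struct ctrl {
      int blocking:1;        /* signed 1-bit field: can only hold 0 or -1 */
  };

  int main(void)
  {
      struct ctrl c;

      c.blocking = -1;                    /* representable, so no diagnostic */
      printf("%d\n", c.blocking);         /* prints -1 */
      printf("%d\n", c.blocking != 0);    /* prints 1: still usable as a flag */
      return 0;
  }

On GCC and Clang, assigning 1 to such a field stores -1 anyway, which is why swapping the constant preserves behaviour while leaving the ABI-visible layout untouched.
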
diff --git a/0015-tools-oxenstored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch b/0015-tools-oxenstored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch
deleted file mode 100644
index 8b83edf..0000000
--- a/0015-tools-oxenstored-Bind-the-DOM_EXC-VIRQ-in-in-Event.i.patch
+++ /dev/null
@@ -1,82 +0,0 @@
-From bc5cc00868ea29d814bb3d783e28b49d1acf63e9 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 29 Nov 2022 21:05:43 +0000
-Subject: [PATCH 15/89] tools/oxenstored: Bind the DOM_EXC VIRQ in in
- Event.init()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Xenstored always needs to bind the DOM_EXC VIRQ.
-
-Instead of doing it shortly after the call to Event.init(), do it in the
-constructor directly. This removes the need for the field to be a mutable
-option.
-
-It will also simplify a future change to support live update. Rename the
-field from virq_port (which could be any VIRQ) to it's proper name.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 9804a5db435fe40c8ded8cf36c2d2b2281c56f1d)
----
- tools/ocaml/xenstored/event.ml | 9 ++++++---
- tools/ocaml/xenstored/xenstored.ml | 4 +---
- 2 files changed, 7 insertions(+), 6 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/event.ml b/tools/ocaml/xenstored/event.ml
-index ccca90b6fc..a3be296374 100644
---- a/tools/ocaml/xenstored/event.ml
-+++ b/tools/ocaml/xenstored/event.ml
-@@ -17,12 +17,15 @@
- (**************** high level binding ****************)
- type t = {
- handle: Xeneventchn.handle;
-- mutable virq_port: Xeneventchn.t option;
-+ domexc: Xeneventchn.t;
- }
-
--let init () = { handle = Xeneventchn.init (); virq_port = None; }
-+let init () =
-+ let handle = Xeneventchn.init () in
-+ let domexc = Xeneventchn.bind_dom_exc_virq handle in
-+ { handle; domexc }
-+
- let fd eventchn = Xeneventchn.fd eventchn.handle
--let bind_dom_exc_virq eventchn = eventchn.virq_port <- Some (Xeneventchn.bind_dom_exc_virq eventchn.handle)
- let bind_interdomain eventchn domid port = Xeneventchn.bind_interdomain eventchn.handle domid port
- let unbind eventchn port = Xeneventchn.unbind eventchn.handle port
- let notify eventchn port = Xeneventchn.notify eventchn.handle port
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index c5dc7a28d0..55071b49ec 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -397,7 +397,6 @@ let _ =
- if cf.restart && Sys.file_exists Disk.xs_daemon_database then (
- let rwro = DB.from_file store domains cons Disk.xs_daemon_database in
- info "Live reload: database loaded";
-- Event.bind_dom_exc_virq eventchn;
- Process.LiveUpdate.completed ();
- rwro
- ) else (
-@@ -413,7 +412,6 @@ let _ =
-
- if cf.domain_init then (
- Connections.add_domain cons (Domains.create0 domains);
-- Event.bind_dom_exc_virq eventchn
- );
- rw_sock
- ) in
-@@ -451,7 +449,7 @@ let _ =
- let port = Event.pending eventchn in
- debug "pending port %d" (Xeneventchn.to_int port);
- finally (fun () ->
-- if Some port = eventchn.Event.virq_port then (
-+ if port = eventchn.Event.domexc then (
- let (notify, deaddom) = Domains.cleanup domains in
- List.iter (Store.reset_permissions store) deaddom;
- List.iter (Connections.del_domain cons) deaddom;
---
-2.40.0
-
diff --git a/0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch b/0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch
new file mode 100644
index 0000000..81e010b
--- /dev/null
+++ b/0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch
@@ -0,0 +1,143 @@
+From 7b5155a79ea946dd513847d4e7ad2b7e6a4ebb73 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:45:29 +0200
+Subject: [PATCH 15/55] xen/vcpu: ignore VCPU_SSHOTTMR_future
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
+When the hypervisor returns -ETIME (timeout in the past), Linux keeps
+retrying to set up the timer with a higher timeout instead of
+self-injecting a timer interrupt.
+
+On boxes without any hardware assistance for logdirty we have seen HVM
+Linux guests < 4.7 with 32 vCPUs give up trying to set up the timer when
+logdirty is enabled:
+
+CE: Reprogramming failure. Giving up
+CE: xen increased min_delta_ns to 1000000 nsec
+CE: Reprogramming failure. Giving up
+CE: Reprogramming failure. Giving up
+CE: xen increased min_delta_ns to 506250 nsec
+CE: xen increased min_delta_ns to 759375 nsec
+CE: xen increased min_delta_ns to 1000000 nsec
+CE: Reprogramming failure. Giving up
+CE: Reprogramming failure. Giving up
+CE: Reprogramming failure. Giving up
+Freezing user space processes ...
+INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
+Task dump for CPU 14:
+swapper/14 R running task 0 0 1 0x00000000
+Call Trace:
+ [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
+ [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
+ [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
+ [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
+ [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
+ [<ffffffff900000d5>] ? start_cpu+0x5/0x14
+INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
+Task dump for CPU 26:
+swapper/26 R running task 0 0 1 0x00000000
+Call Trace:
+ [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
+ [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
+ [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
+ [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
+ [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
+ [<ffffffff900000d5>] ? start_cpu+0x5/0x14
+INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
+Task dump for CPU 26:
+swapper/26 R running task 0 0 1 0x00000000
+Call Trace:
+ [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
+ [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
+ [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
+ [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
+ [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
+ [<ffffffff900000d5>] ? start_cpu+0x5/0x14
+
+Thus leading to CPU stalls and a broken system as a result.
+
+Work around this bogus usage by ignoring VCPU_SSHOTTMR_future in
+the hypervisor. Old Linux versions are the only ones known to have
+(wrongly) attempted to use the flag, and ignoring it is compatible
+with the behavior expected by any guests setting that flag.
+
+Note the usage of the flag has been removed from Linux by commit:
+
+c06b6d70feb3 xen/x86: don't lose event interrupts
+
+Which landed in Linux 4.7.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Henry Wang <Henry.Wang@arm.com> # CHANGELOG
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: 19c6cbd90965b1440bd551069373d6fa3f2f365d
+master date: 2023-05-03 13:36:05 +0200
+---
+ CHANGELOG.md | 6 ++++++
+ xen/common/domain.c | 13 ++++++++++---
+ xen/include/public/vcpu.h | 5 ++++-
+ 3 files changed, 20 insertions(+), 4 deletions(-)
+
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+index 7f4d0f25e9..bb0eceb69a 100644
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+@@ -4,6 +4,12 @@ Notable changes to Xen will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
+
++## [4.17.3](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.3)
++
++### Changed
++ - Ignore VCPUOP_set_singleshot_timer's VCPU_SSHOTTMR_future flag. The only
++ known user doesn't use it properly, leading to in-guest breakage.
++
+ ## [4.17.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.0) - 2022-12-12
+
+ ### Changed
+diff --git a/xen/common/domain.c b/xen/common/domain.c
+index 53f7e734fe..30c2279673 100644
+--- a/xen/common/domain.c
++++ b/xen/common/domain.c
+@@ -1691,9 +1691,16 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
+ if ( copy_from_guest(&set, arg, 1) )
+ return -EFAULT;
+
+- if ( (set.flags & VCPU_SSHOTTMR_future) &&
+- (set.timeout_abs_ns < NOW()) )
+- return -ETIME;
++ if ( set.timeout_abs_ns < NOW() )
++ {
++ /*
++ * Simplify the logic if the timeout has already expired and just
++ * inject the event.
++ */
++ stop_timer(&v->singleshot_timer);
++ send_timer_event(v);
++ break;
++ }
+
+ migrate_timer(&v->singleshot_timer, smp_processor_id());
+ set_timer(&v->singleshot_timer, set.timeout_abs_ns);
+diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
+index 81a3b3a743..a836b264a9 100644
+--- a/xen/include/public/vcpu.h
++++ b/xen/include/public/vcpu.h
+@@ -150,7 +150,10 @@ typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
+ DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
+
+ /* Flags to VCPUOP_set_singleshot_timer. */
+- /* Require the timeout to be in the future (return -ETIME if it's passed). */
++ /*
++ * Request the timeout to be in the future (return -ETIME if it's passed)
++ * but can be ignored by the hypervisor.
++ */
+ #define _VCPU_SSHOTTMR_future (0)
+ #define VCPU_SSHOTTMR_future (1U << _VCPU_SSHOTTMR_future)
+
+--
+2.42.0
+
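For reference, the guest-visible pattern the commit message argues for (and which Linux >= 4.7 follows) looks roughly like the sketch below. It assumes a Linux-style HYPERVISOR_vcpu_op() hypercall wrapper is available; VCPUOP_set_singleshot_timer and struct vcpu_set_singleshot_timer are the public-ABI names.

    /* Minimal sketch of a well-behaved guest caller, under the assumptions
     * stated above: don't pass VCPU_SSHOTTMR_future, and if an older
     * hypervisor still returns -ETIME, raise the timer event locally
     * instead of retrying with ever larger timeouts. */
    #include <xen/interface/vcpu.h>

    static int set_oneshot_timer(unsigned int cpu, uint64_t deadline_abs_ns)
    {
        struct vcpu_set_singleshot_timer single = {
            .timeout_abs_ns = deadline_abs_ns,
            .flags = 0,   /* no VCPU_SSHOTTMR_future */
        };

        return HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, cpu, &single);
    }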
diff --git a/0016-tools-oxenstored-Rename-some-port-variables-to-remot.patch b/0016-tools-oxenstored-Rename-some-port-variables-to-remot.patch
deleted file mode 100644
index 4f168d6..0000000
--- a/0016-tools-oxenstored-Rename-some-port-variables-to-remot.patch
+++ /dev/null
@@ -1,144 +0,0 @@
-From fd0d9b05970986545656c8f6f688f70f3e78a29b Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 30 Nov 2022 03:17:28 +0000
-Subject: [PATCH 16/89] tools/oxenstored: Rename some 'port' variables to
- 'remote_port'
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-This will make the logic clearer when we plumb local_port through these
-functions.
-
-While doing this, rearrange the construct in Domains.create0 to separate the
-remote port handling from the interface handling. (The interface logic is
-dubious in several ways, but not altered by this cleanup.)
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 31fbee749a75621039ca601eaee7222050a7dd83)
----
- tools/ocaml/xenstored/domains.ml | 26 ++++++++++++--------------
- tools/ocaml/xenstored/process.ml | 12 ++++++------
- tools/ocaml/xenstored/xenstored.ml | 8 ++++----
- 3 files changed, 22 insertions(+), 24 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
-index 17fe2fa257..26018ac0dd 100644
---- a/tools/ocaml/xenstored/domains.ml
-+++ b/tools/ocaml/xenstored/domains.ml
-@@ -122,9 +122,9 @@ let cleanup doms =
- let resume _doms _domid =
- ()
-
--let create doms domid mfn port =
-+let create doms domid mfn remote_port =
- let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
-- let dom = Domain.make domid mfn port interface doms.eventchn in
-+ let dom = Domain.make domid mfn remote_port interface doms.eventchn in
- Hashtbl.add doms.table domid dom;
- Domain.bind_interdomain dom;
- dom
-@@ -133,18 +133,16 @@ let xenstored_kva = ref ""
- let xenstored_port = ref ""
-
- let create0 doms =
-- let port, interface =
-- (
-- let port = Utils.read_file_single_integer !xenstored_port
-- and fd = Unix.openfile !xenstored_kva
-- [ Unix.O_RDWR ] 0o600 in
-- let interface = Xenmmap.mmap fd Xenmmap.RDWR Xenmmap.SHARED
-- (Xenmmap.getpagesize()) 0 in
-- Unix.close fd;
-- port, interface
-- )
-- in
-- let dom = Domain.make 0 Nativeint.zero port interface doms.eventchn in
-+ let remote_port = Utils.read_file_single_integer !xenstored_port in
-+
-+ let interface =
-+ let fd = Unix.openfile !xenstored_kva [ Unix.O_RDWR ] 0o600 in
-+ let interface = Xenmmap.mmap fd Xenmmap.RDWR Xenmmap.SHARED (Xenmmap.getpagesize()) 0 in
-+ Unix.close fd;
-+ interface
-+ in
-+
-+ let dom = Domain.make 0 Nativeint.zero remote_port interface doms.eventchn in
- Hashtbl.add doms.table 0 dom;
- Domain.bind_interdomain dom;
- Domain.notify dom;
-diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
-index 72a79e9328..b2973aca2a 100644
---- a/tools/ocaml/xenstored/process.ml
-+++ b/tools/ocaml/xenstored/process.ml
-@@ -558,10 +558,10 @@ let do_transaction_end con t domains cons data =
- let do_introduce con t domains cons data =
- if not (Connection.is_dom0 con)
- then raise Define.Permission_denied;
-- let (domid, mfn, port) =
-+ let (domid, mfn, remote_port) =
- match (split None '\000' data) with
-- | domid :: mfn :: port :: _ ->
-- int_of_string domid, Nativeint.of_string mfn, int_of_string port
-+ | domid :: mfn :: remote_port :: _ ->
-+ int_of_string domid, Nativeint.of_string mfn, int_of_string remote_port
- | _ -> raise Invalid_Cmd_Args;
- in
- let dom =
-@@ -569,18 +569,18 @@ let do_introduce con t domains cons data =
- let edom = Domains.find domains domid in
- if (Domain.get_mfn edom) = mfn && (Connections.find_domain cons domid) != con then begin
- (* Use XS_INTRODUCE for recreating the xenbus event-channel. *)
-- edom.remote_port <- port;
-+ edom.remote_port <- remote_port;
- Domain.bind_interdomain edom;
- end;
- edom
- else try
-- let ndom = Domains.create domains domid mfn port in
-+ let ndom = Domains.create domains domid mfn remote_port in
- Connections.add_domain cons ndom;
- Connections.fire_spec_watches (Transaction.get_root t) cons Store.Path.introduce_domain;
- ndom
- with _ -> raise Invalid_Cmd_Args
- in
-- if (Domain.get_remote_port dom) <> port || (Domain.get_mfn dom) <> mfn then
-+ if (Domain.get_remote_port dom) <> remote_port || (Domain.get_mfn dom) <> mfn then
- raise Domain_not_match
-
- let do_release con t domains cons data =
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index 55071b49ec..1f11f576b5 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -167,10 +167,10 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
- global_f ~rw
- | "socket" :: fd :: [] ->
- socket_f ~fd:(int_of_string fd)
-- | "dom" :: domid :: mfn :: port :: []->
-+ | "dom" :: domid :: mfn :: remote_port :: []->
- domain_f (int_of_string domid)
- (Nativeint.of_string mfn)
-- (int_of_string port)
-+ (int_of_string remote_port)
- | "watch" :: domid :: path :: token :: [] ->
- watch_f (int_of_string domid)
- (unhexify path) (unhexify token)
-@@ -209,10 +209,10 @@ let from_channel store cons doms chan =
- else
- warn "Ignoring invalid socket FD %d" fd
- in
-- let domain_f domid mfn port =
-+ let domain_f domid mfn remote_port =
- let ndom =
- if domid > 0 then
-- Domains.create doms domid mfn port
-+ Domains.create doms domid mfn remote_port
- else
- Domains.create0 doms
- in
---
-2.40.0
-
diff --git a/0016-x86-head-check-base-address-alignment.patch b/0016-x86-head-check-base-address-alignment.patch
new file mode 100644
index 0000000..2b9cead
--- /dev/null
+++ b/0016-x86-head-check-base-address-alignment.patch
@@ -0,0 +1,85 @@
+From e5f9987d5f63ecc3cc9884c614aca699a41e7ca7 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:46:28 +0200
+Subject: [PATCH 16/55] x86/head: check base address alignment
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Ensure that the base address is 2M aligned, or else the page table
+entries created would be corrupt as reserved bits on the PDE end up
+set.
+
+We have encountered broken firmware where grub2 would end up loading
+Xen at a non-2M-aligned region when using the multiboot2 protocol, and
+that caused a very difficult-to-debug triple fault.
+
+If the alignment is not as required by the page tables, print an error
+message and stop the boot. Also add a build-time check that the
+calculation of symbol offsets doesn't break the alignment of passed
+addresses.
+
+The check could be performed earlier, but so far the alignment is
+required by the page tables, and hence it feels more natural for the
+check to live near the piece of code that requires it.
+
+Note that when booted as an EFI application from the PE entry point
+the alignment check is already performed by
+efi_arch_load_addr_check(), and hence there's no need to add another
+check at the point where page tables get built in
+efi_arch_memory_setup().
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 0946068e7faea22868c577d7afa54ba4970ff520
+master date: 2023-05-03 13:36:25 +0200
+---
+ xen/arch/x86/boot/head.S | 14 ++++++++++++++
+ 1 file changed, 14 insertions(+)
+
+diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
+index 245c859dd7..6bc64c9e86 100644
+--- a/xen/arch/x86/boot/head.S
++++ b/xen/arch/x86/boot/head.S
+@@ -1,3 +1,4 @@
++#include <xen/lib.h>
+ #include <xen/multiboot.h>
+ #include <xen/multiboot2.h>
+ #include <public/xen.h>
+@@ -121,6 +122,7 @@ multiboot2_header:
+ .Lbad_ldr_nst: .asciz "ERR: EFI SystemTable is not provided by bootloader!"
+ .Lbad_ldr_nih: .asciz "ERR: EFI ImageHandle is not provided by bootloader!"
+ .Lbad_efi_msg: .asciz "ERR: EFI IA-32 platforms are not supported!"
++.Lbad_alg_msg: .asciz "ERR: Xen must be loaded at a 2Mb boundary!"
+
+ .section .init.data, "aw", @progbits
+ .align 4
+@@ -146,6 +148,9 @@ bad_cpu:
+ not_multiboot:
+ mov $sym_offs(.Lbad_ldr_msg), %ecx
+ jmp .Lget_vtb
++.Lnot_aligned:
++ mov $sym_offs(.Lbad_alg_msg), %ecx
++ jmp .Lget_vtb
+ .Lmb2_no_st:
+ /*
+ * Here we are on EFI platform. vga_text_buffer was zapped earlier
+@@ -673,6 +678,15 @@ trampoline_setup:
+ cmp %edi, %eax
+ jb 1b
+
++ .if !IS_ALIGNED(sym_offs(0), 1 << L2_PAGETABLE_SHIFT)
++ .error "Symbol offset calculation breaks alignment"
++ .endif
++
++ /* Check that the image base is aligned. */
++ lea sym_esi(_start), %eax
++ test $(1 << L2_PAGETABLE_SHIFT) - 1, %eax
++ jnz .Lnot_aligned
++
+ /* Map Xen into the higher mappings using 2M superpages. */
+ lea _PAGE_PSE + PAGE_HYPERVISOR_RWX + sym_esi(_start), %eax
+ mov $sym_offs(_start), %ecx /* %eax = PTE to write ^ */
+--
+2.42.0
+
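The 2Mb requirement follows from mapping Xen with PSE superpages: a 2M PDE encodes the frame address from bit 21 upwards, so any set bits below that would spill into the entry's attribute/reserved bits. A standalone C rendition of the check (the constant mirrors Xen's L2_PAGETABLE_SHIFT; the function name is just for the sketch):

    #include <stdbool.h>
    #include <stdint.h>

    #define L2_PAGETABLE_SHIFT 21   /* 2 MiB superpages */

    /* True if 'load_base' can be mapped by a 2M PDE without setting
     * reserved bits, i.e. the low 21 address bits are clear. */
    static bool base_is_2m_aligned(uint64_t load_base)
    {
        return (load_base & ((UINT64_C(1) << L2_PAGETABLE_SHIFT) - 1)) == 0;
    }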
diff --git a/0017-tools-oxenstored-Implement-Domain.rebind_evtchn.patch b/0017-tools-oxenstored-Implement-Domain.rebind_evtchn.patch
deleted file mode 100644
index 72bcae0..0000000
--- a/0017-tools-oxenstored-Implement-Domain.rebind_evtchn.patch
+++ /dev/null
@@ -1,67 +0,0 @@
-From a20daa7ffda7ccc0e65abe77532a5dc8059bf128 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 30 Nov 2022 11:55:58 +0000
-Subject: [PATCH 17/89] tools/oxenstored: Implement Domain.rebind_evtchn
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Generally speaking, the event channel local/remote port is fixed for the
-lifetime of the associated domain object. The exception to this is a
-secondary XS_INTRODUCE (defined to re-bind to a new event channel) which pokes
-around at the domain object's internal state.
-
-We need to refactor the evtchn handling to support live update, so start by
-moving the relevant manipulation into Domain.
-
-No practical change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit aecdc28d9538ca2a1028ef9bc6550cb171dbbed4)
----
- tools/ocaml/xenstored/domain.ml | 12 ++++++++++++
- tools/ocaml/xenstored/process.ml | 3 +--
- 2 files changed, 13 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
-index ab08dcf37f..d59a9401e2 100644
---- a/tools/ocaml/xenstored/domain.ml
-+++ b/tools/ocaml/xenstored/domain.ml
-@@ -63,6 +63,18 @@ let string_of_port = function
- let dump d chan =
- fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
-
-+let rebind_evtchn d remote_port =
-+ begin match d.port with
-+ | None -> ()
-+ | Some p -> Event.unbind d.eventchn p
-+ end;
-+ let local = Event.bind_interdomain d.eventchn d.id remote_port in
-+ debug "domain %d rebind (l %s, r %d) => (l %d, r %d)"
-+ d.id (string_of_port d.port) d.remote_port
-+ (Xeneventchn.to_int local) remote_port;
-+ d.remote_port <- remote_port;
-+ d.port <- Some (local)
-+
- let notify dom =
- match dom.port with
- | None -> warn "domain %d: attempt to notify on unknown port" dom.id
-diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
-index b2973aca2a..1c80e7198d 100644
---- a/tools/ocaml/xenstored/process.ml
-+++ b/tools/ocaml/xenstored/process.ml
-@@ -569,8 +569,7 @@ let do_introduce con t domains cons data =
- let edom = Domains.find domains domid in
- if (Domain.get_mfn edom) = mfn && (Connections.find_domain cons domid) != con then begin
- (* Use XS_INTRODUCE for recreating the xenbus event-channel. *)
-- edom.remote_port <- remote_port;
-- Domain.bind_interdomain edom;
-+ Domain.rebind_evtchn edom remote_port;
- end;
- edom
- else try
---
-2.40.0
-
diff --git a/0017-xenalyze-Handle-start-of-day-RUNNING-transitions.patch b/0017-xenalyze-Handle-start-of-day-RUNNING-transitions.patch
new file mode 100644
index 0000000..a4501a3
--- /dev/null
+++ b/0017-xenalyze-Handle-start-of-day-RUNNING-transitions.patch
@@ -0,0 +1,275 @@
+From f04295dd802fb6cd43a02ec59a5964b2c5950fe1 Mon Sep 17 00:00:00 2001
+From: George Dunlap <george.dunlap@cloud.com>
+Date: Tue, 5 Sep 2023 08:47:14 +0200
+Subject: [PATCH 17/55] xenalyze: Handle start-of-day ->RUNNING transitions
+
+A recent xentrace highlighted an unhandled corner case in the vcpu
+"start-of-day" logic, if the trace starts after the last running ->
+non-running transition, but before the first non-running -> running
+transition. Because start-of-day wasn't handled, vcpu_next_update()
+was expecting p->current to be NULL, and tripping out with the
+following error message when it wasn't:
+
+vcpu_next_update: FATAL: p->current not NULL! (d32768dv$p, runstate RUNSTATE_INIT)
+
+where 32768 is the DEFAULT_DOMAIN, and $p is the pcpu number.
+
+Instead of calling vcpu_start() piecemeal throughout
+sched_runstate_process(), call it at the top of the function if the
+vcpu in question is still in RUNSTATE_INIT, so that we can handle all
+the cases in one place.
+
+Sketch out at the top of the function all cases which we need to
+handle, and what to do in those cases. Some transitions tell us where
+v is running; some transitions tell us about what is (or is not)
+running on p; some transitions tell us neither.
+
+If a transition tells us where v is now running, update its state;
+otherwise leave it in INIT, in order to avoid having to deal with TSC
+skew on start-up.
+
+If a transition tells us what is or is not running on p, update
+p->current (either to v or NULL). Otherwise leave it alone.
+
+If neither, do nothing.
+
+Reifying those rules:
+
+- If we're continuing to run, set v to RUNNING, and use p->first_tsc
+ as the runstate time.
+
+- If we're starting to run, set v to RUNNING, and use ri->tsc as the
+ runstate time.
+
+- If v is being descheduled, leave v in the INIT state to avoid dealing
+ with TSC skew; but set p->current to NULL so that whatever is
+ scheduled next won't trigger the assert in vcpu_next_update().
+
+- If a vcpu is waking up (switching from one non-runnable state to
+ another non-runnable state), leave v in INIT, and p in whatever
+ state it's in (which may be the default domain, or some other vcpu
+ which has already run).
+
+While here, fix the comment above vcpu_start; it's called when the
+vcpu state is INIT, not when current is the default domain.
+
+Signed-off-by: George Dunlap <george.dunlap@cloud.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: aab4b38b5d77e3c65f44bacd56427a85b7392a11
+master date: 2023-06-30 11:25:33 +0100
+---
+ tools/xentrace/xenalyze.c | 159 ++++++++++++++++++++++++--------------
+ 1 file changed, 101 insertions(+), 58 deletions(-)
+
+diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
+index e7ec284eea..9b4b62c82f 100644
+--- a/tools/xentrace/xenalyze.c
++++ b/tools/xentrace/xenalyze.c
+@@ -6885,39 +6885,86 @@ void vcpu_next_update(struct pcpu_info *p, struct vcpu_data *next, tsc_t tsc)
+ p->lost_record.seen_valid_schedule = 1;
+ }
+
+-/* If current is the default domain, we're fixing up from something
+- * like start-of-day. Update what we can. */
+-void vcpu_start(struct pcpu_info *p, struct vcpu_data *v) {
+- /* If vcpus are created, or first show up, in a "dead zone", this will
+- * fail. */
+- if( !p->current || p->current->d->did != DEFAULT_DOMAIN) {
+- fprintf(stderr, "Strange, p->current not default domain!\n");
+- error(ERR_FILE, NULL);
+- return;
+- }
++/*
++ * If the vcpu in question is in state INIT, we're fixing up from something
++ * like start-of-day. Update what we can.
++ */
++void vcpu_start(struct pcpu_info *p, struct vcpu_data *v,
++ int old_runstate, int new_runstate, tsc_t ri_tsc) {
++ tsc_t tsc;
++
++ /*
++ *
++ * Cases:
++ * running -> running:
++ * v -> running, using p->first_tsc
++ * {runnable, blocked} -> running:
++ * v -> running, using ri->tsc
++ * running -> {runnable, blocked}:
++ * Leave v INIT, but clear p->current in case another vcpu is scheduled
++ * blocked -> runnable:
++ * Leave INIT, and also leave p->current, since we still don't know who's scheduled here
++ */
++
++ /*
++ * NB that a vcpu won't come out of INIT until it starts running somewhere.
++ * If this event is pcpu that has already seen a scheduling event, p->current
++ * should be null; if this is the first scheduling event on this pcpu,
++ * p->current should be the default domain.
++ */
++ if( old_runstate == RUNSTATE_RUNNING ) {
++ if ( !p->current || p->current->d->did != DEFAULT_DOMAIN) {
++ fprintf(stderr, "Strange, p->current not default domain!\n");
++ error(ERR_FILE, NULL);
++ return;
+
+- if(!p->first_tsc) {
+- fprintf(stderr, "Strange, p%d first_tsc 0!\n", p->pid);
+- error(ERR_FILE, NULL);
++ }
++
++ if(!p->first_tsc) {
++ fprintf(stderr, "Strange, p%d first_tsc 0!\n", p->pid);
++ error(ERR_FILE, NULL);
++ }
++
++ if(p->first_tsc <= p->current->runstate.tsc) {
++ fprintf(stderr, "Strange, first_tsc %llx < default_domain runstate tsc %llx!\n",
++ p->first_tsc,
++ p->current->runstate.tsc);
++ error(ERR_FILE, NULL);
++ }
++
++ /* Change default domain to 'queued' */
++ runstate_update(p->current, RUNSTATE_QUEUED, p->first_tsc);
++
++ /*
++ * Set current to NULL, so that if another vcpu (not in INIT)
++ * is scheduled here, we don't trip over the check in
++ * vcpu_next_update()
++ */
++ p->current = NULL;
+ }
+
+- if(p->first_tsc <= p->current->runstate.tsc) {
+- fprintf(stderr, "Strange, first_tsc %llx < default_domain runstate tsc %llx!\n",
+- p->first_tsc,
+- p->current->runstate.tsc);
+- error(ERR_FILE, NULL);
++ /* TSC skew at start-of-day is hard to deal with. Don't
++ * bring a vcpu out of INIT until it's seen to be actually
++ * running somewhere. */
++ if ( new_runstate != RUNSTATE_RUNNING ) {
++ fprintf(warn, "First schedule for d%dv%d doesn't take us into a running state; leaving INIT\n",
++ v->d->did, v->vid);
++
++ return;
+ }
+
+- /* Change default domain to 'queued' */
+- runstate_update(p->current, RUNSTATE_QUEUED, p->first_tsc);
++ tsc = ri_tsc;
++ if ( old_runstate == RUNSTATE_RUNNING ) {
++ /* FIXME: Copy over data from the default domain this interval */
++ fprintf(warn, "Using first_tsc for d%dv%d (%lld cycles)\n",
++ v->d->did, v->vid, p->last_tsc - p->first_tsc);
+
+- /* FIXME: Copy over data from the default domain this interval */
+- fprintf(warn, "Using first_tsc for d%dv%d (%lld cycles)\n",
+- v->d->did, v->vid, p->last_tsc - p->first_tsc);
++ tsc = p->first_tsc;
++ }
+
+ /* Simulate the time since the first tsc */
+- runstate_update(v, RUNSTATE_RUNNING, p->first_tsc);
+- p->time.tsc = p->first_tsc;
++ runstate_update(v, RUNSTATE_RUNNING, tsc);
++ p->time.tsc = tsc;
+ p->current = v;
+ pcpu_string_draw(p);
+ v->p = p;
+@@ -7021,6 +7068,13 @@ void sched_runstate_process(struct pcpu_info *p)
+ last_oldstate = v->runstate.last_oldstate;
+ v->runstate.last_oldstate.wrong = RUNSTATE_INIT;
+
++ /* Handle all "start-of-day" issues in one place. This can be
++ * done before any of the other tracks or sanity checks. */
++ if ( v->runstate.state == RUNSTATE_INIT ) {
++ vcpu_start(p, v, sevt.old_runstate, sevt.new_runstate, ri->tsc);
++ return;
++ }
++
+ /* Close vmexits when the putative reason for blocking / &c stops.
+ * This way, we don't account cpu contention to some other overhead. */
+ if(sevt.new_runstate == RUNSTATE_RUNNABLE
+@@ -7190,32 +7244,27 @@ update:
+ * or stopping actually running on a physical cpu. */
+ if ( type == CONTINUE )
+ {
+- if( v->runstate.state == RUNSTATE_INIT ) {
+- /* Start-of-day; account first tsc -> now to v */
+- vcpu_start(p, v);
+- } else {
+- /* Continue running. First, do some sanity checks */
+- if ( v->runstate.state == RUNSTATE_LOST ) {
+- fprintf(warn, "WARNING: continue with d%dv%d in RUNSTATE_LOST. Resetting current.\n",
+- v->d->did, v->vid);
+- if ( p->current )
+- vcpu_prev_update(p, p->current, ri->tsc, RUNSTATE_LOST);
+- vcpu_next_update(p, v, ri->tsc);
+- }
+- else if( v->runstate.state != RUNSTATE_RUNNING ) {
+- /* This should never happen. */
+- fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
+- v->d->did, v->vid, runstate_name[v->runstate.state]);
+- error(ERR_FILE, NULL);
+- } else if ( v->p != p ) {
+- fprintf(warn, "FATAL: continue on p%d, but d%dv%d p%d!\n",
+- p->pid, v->d->did, v->vid,
+- v->p ? v->p->pid : -1);
+- error(ERR_FILE, NULL);
+- }
+-
+- runstate_update(v, RUNSTATE_RUNNING, ri->tsc);
++ /* Continue running. First, do some sanity checks */
++ if ( v->runstate.state == RUNSTATE_LOST ) {
++ fprintf(warn, "WARNING: continue with d%dv%d in RUNSTATE_LOST. Resetting current.\n",
++ v->d->did, v->vid);
++ if ( p->current )
++ vcpu_prev_update(p, p->current, ri->tsc, RUNSTATE_LOST);
++ vcpu_next_update(p, v, ri->tsc);
++ }
++ else if( v->runstate.state != RUNSTATE_RUNNING ) {
++ /* This should never happen. */
++ fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
++ v->d->did, v->vid, runstate_name[v->runstate.state]);
++ error(ERR_FILE, NULL);
++ } else if ( v->p != p ) {
++ fprintf(warn, "FATAL: continue on p%d, but d%dv%d p%d!\n",
++ p->pid, v->d->did, v->vid,
++ v->p ? v->p->pid : -1);
++ error(ERR_FILE, NULL);
+ }
++
++ runstate_update(v, RUNSTATE_RUNNING, ri->tsc);
+ }
+ else if ( sevt.old_runstate == RUNSTATE_RUNNING
+ || v->runstate.state == RUNSTATE_RUNNING )
+@@ -7232,10 +7281,7 @@ update:
+ * # (should never happen)
+ */
+ if( sevt.old_runstate == RUNSTATE_RUNNING ) {
+- if( v->runstate.state == RUNSTATE_INIT ) {
+- /* Start-of-day; account first tsc -> now to v */
+- vcpu_start(p, v);
+- } else if( v->runstate.state != RUNSTATE_RUNNING
++ if( v->runstate.state != RUNSTATE_RUNNING
+ && v->runstate.state != RUNSTATE_LOST ) {
+ /* This should never happen. */
+ fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
+@@ -7264,11 +7310,8 @@ update:
+
+ vcpu_next_update(p, v, ri->tsc);
+ }
+- else if ( v->runstate.state != RUNSTATE_INIT )
++ else
+ {
+- /* TSC skew at start-of-day is hard to deal with. Don't
+- * bring a vcpu out of INIT until it's seen to be actually
+- * running somewhere. */
+ runstate_update(v, sevt.new_runstate, ri->tsc);
+ }
+
+--
+2.42.0
+
diff --git a/0018-tools-oxenstored-Rework-Domain-evtchn-handling-to-us.patch b/0018-tools-oxenstored-Rework-Domain-evtchn-handling-to-us.patch
deleted file mode 100644
index 1392b34..0000000
--- a/0018-tools-oxenstored-Rework-Domain-evtchn-handling-to-us.patch
+++ /dev/null
@@ -1,209 +0,0 @@
-From 4b418768ef4d75d0f70e4ce7cb5710404527bf47 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 30 Nov 2022 11:59:34 +0000
-Subject: [PATCH 18/89] tools/oxenstored: Rework Domain evtchn handling to use
- port_pair
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Inter-domain event channels are always a pair of local and remote ports.
-Right now the handling is asymmetric, caused by the fact that the evtchn is
-bound after the associated Domain object is constructed.
-
-First, move binding of the event channel into the Domain.make() constructor.
-This means the local port no longer needs to be an option. It also removes
-the final callers of Domain.bind_interdomain.
-
-Next, introduce a new port_pair type to encapsulate the fact that these two
-should be updated together, and replace the previous port and remote_port
-fields. This refactoring also changes the Domain.get_port interface (removing
-an option) so take the opportunity to name it get_local_port instead.
-
-Also, this fixes a use-after-free risk with Domain.close. Once the evtchn has
-been unbound, the same local port number can be reused for a different
-purpose, so explicitly invalidate the ports to prevent their accidental misuse
-in the future.
-
-This also cleans up some of the debugging, to always print a port pair.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit df2db174b36eba67c218763ef621c67912202fc6)
----
- tools/ocaml/xenstored/connections.ml | 9 +---
- tools/ocaml/xenstored/domain.ml | 75 ++++++++++++++--------------
- tools/ocaml/xenstored/domains.ml | 2 -
- 3 files changed, 39 insertions(+), 47 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/connections.ml b/tools/ocaml/xenstored/connections.ml
-index 7d68c583b4..a80ae0bed2 100644
---- a/tools/ocaml/xenstored/connections.ml
-+++ b/tools/ocaml/xenstored/connections.ml
-@@ -48,9 +48,7 @@ let add_domain cons dom =
- let xbcon = Xenbus.Xb.open_mmap ~capacity (Domain.get_interface dom) (fun () -> Domain.notify dom) in
- let con = Connection.create xbcon (Some dom) in
- Hashtbl.add cons.domains (Domain.get_id dom) con;
-- match Domain.get_port dom with
-- | Some p -> Hashtbl.add cons.ports p con;
-- | None -> ()
-+ Hashtbl.add cons.ports (Domain.get_local_port dom) con
-
- let select ?(only_if = (fun _ -> true)) cons =
- Hashtbl.fold (fun _ con (ins, outs) ->
-@@ -97,10 +95,7 @@ let del_domain cons id =
- let con = find_domain cons id in
- Hashtbl.remove cons.domains id;
- (match Connection.get_domain con with
-- | Some d ->
-- (match Domain.get_port d with
-- | Some p -> Hashtbl.remove cons.ports p
-- | None -> ())
-+ | Some d -> Hashtbl.remove cons.ports (Domain.get_local_port d)
- | None -> ());
- del_watches cons con;
- Connection.close con
-diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
-index d59a9401e2..481e10794d 100644
---- a/tools/ocaml/xenstored/domain.ml
-+++ b/tools/ocaml/xenstored/domain.ml
-@@ -19,14 +19,31 @@ open Printf
- let debug fmt = Logging.debug "domain" fmt
- let warn fmt = Logging.warn "domain" fmt
-
-+(* A bound inter-domain event channel port pair. The remote port, and the
-+ local port it is bound to. *)
-+type port_pair =
-+{
-+ local: Xeneventchn.t;
-+ remote: int;
-+}
-+
-+(* Sentinal port_pair with both set to EVTCHN_INVALID *)
-+let invalid_ports =
-+{
-+ local = Xeneventchn.of_int 0;
-+ remote = 0
-+}
-+
-+let string_of_port_pair p =
-+ sprintf "(l %d, r %d)" (Xeneventchn.to_int p.local) p.remote
-+
- type t =
- {
- id: Xenctrl.domid;
- mfn: nativeint;
- interface: Xenmmap.mmap_interface;
- eventchn: Event.t;
-- mutable remote_port: int;
-- mutable port: Xeneventchn.t option;
-+ mutable ports: port_pair;
- mutable bad_client: bool;
- mutable io_credit: int; (* the rounds of ring process left to do, default is 0,
- usually set to 1 when there is work detected, could
-@@ -41,8 +58,8 @@ let is_dom0 d = d.id = 0
- let get_id domain = domain.id
- let get_interface d = d.interface
- let get_mfn d = d.mfn
--let get_remote_port d = d.remote_port
--let get_port d = d.port
-+let get_remote_port d = d.ports.remote
-+let get_local_port d = d.ports.local
-
- let is_bad_domain domain = domain.bad_client
- let mark_as_bad domain = domain.bad_client <- true
-@@ -56,54 +73,36 @@ let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
-
- let is_free_to_conflict = is_dom0
-
--let string_of_port = function
-- | None -> "None"
-- | Some x -> string_of_int (Xeneventchn.to_int x)
--
- let dump d chan =
-- fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.remote_port
-+ fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.ports.remote
-
- let rebind_evtchn d remote_port =
-- begin match d.port with
-- | None -> ()
-- | Some p -> Event.unbind d.eventchn p
-- end;
-+ Event.unbind d.eventchn d.ports.local;
- let local = Event.bind_interdomain d.eventchn d.id remote_port in
-- debug "domain %d rebind (l %s, r %d) => (l %d, r %d)"
-- d.id (string_of_port d.port) d.remote_port
-- (Xeneventchn.to_int local) remote_port;
-- d.remote_port <- remote_port;
-- d.port <- Some (local)
-+ let new_ports = { local; remote = remote_port } in
-+ debug "domain %d rebind %s => %s"
-+ d.id (string_of_port_pair d.ports) (string_of_port_pair new_ports);
-+ d.ports <- new_ports
-
- let notify dom =
-- match dom.port with
-- | None -> warn "domain %d: attempt to notify on unknown port" dom.id
-- | Some port -> Event.notify dom.eventchn port
--
--let bind_interdomain dom =
-- begin match dom.port with
-- | None -> ()
-- | Some port -> Event.unbind dom.eventchn port
-- end;
-- dom.port <- Some (Event.bind_interdomain dom.eventchn dom.id dom.remote_port);
-- debug "bound domain %d remote port %d to local port %s" dom.id dom.remote_port (string_of_port dom.port)
--
-+ Event.notify dom.eventchn dom.ports.local
-
- let close dom =
-- debug "domain %d unbound port %s" dom.id (string_of_port dom.port);
-- begin match dom.port with
-- | None -> ()
-- | Some port -> Event.unbind dom.eventchn port
-- end;
-+ debug "domain %d unbind %s" dom.id (string_of_port_pair dom.ports);
-+ Event.unbind dom.eventchn dom.ports.local;
-+ dom.ports <- invalid_ports;
- Xenmmap.unmap dom.interface
-
--let make id mfn remote_port interface eventchn = {
-+let make id mfn remote_port interface eventchn =
-+ let local = Event.bind_interdomain eventchn id remote_port in
-+ let ports = { local; remote = remote_port } in
-+ debug "domain %d bind %s" id (string_of_port_pair ports);
-+{
- id = id;
- mfn = mfn;
-- remote_port = remote_port;
-+ ports;
- interface = interface;
- eventchn = eventchn;
-- port = None;
- bad_client = false;
- io_credit = 0;
- conflict_credit = !Define.conflict_burst_limit;
-diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
-index 26018ac0dd..2ab0c5f4d8 100644
---- a/tools/ocaml/xenstored/domains.ml
-+++ b/tools/ocaml/xenstored/domains.ml
-@@ -126,7 +126,6 @@ let create doms domid mfn remote_port =
- let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
- let dom = Domain.make domid mfn remote_port interface doms.eventchn in
- Hashtbl.add doms.table domid dom;
-- Domain.bind_interdomain dom;
- dom
-
- let xenstored_kva = ref ""
-@@ -144,7 +143,6 @@ let create0 doms =
-
- let dom = Domain.make 0 Nativeint.zero remote_port interface doms.eventchn in
- Hashtbl.add doms.table 0 dom;
-- Domain.bind_interdomain dom;
- Domain.notify dom;
- dom
-
---
-2.40.0
-
diff --git a/0018-x86-ioapic-sanitize-IO-APIC-pins-before-enabling-lap.patch b/0018-x86-ioapic-sanitize-IO-APIC-pins-before-enabling-lap.patch
new file mode 100644
index 0000000..a03f86e
--- /dev/null
+++ b/0018-x86-ioapic-sanitize-IO-APIC-pins-before-enabling-lap.patch
@@ -0,0 +1,113 @@
+From d0cdd34dd815bf99c3f8a7bddfdde5ae59b0f0db Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:47:34 +0200
+Subject: [PATCH 18/55] x86/ioapic: sanitize IO-APIC pins before enabling lapic
+ LVTERR/ESR
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current logic that initializes the local APIC and the IO-APIC sets up
+the local APIC LVTERR/ESR before doing any sanitization of the IO-APIC
+pin configuration. It's already noted in enable_IO_APIC() that Xen
+shouldn't trust the IO-APIC to be empty at bootup.
+
+At XenServer we have a system where the IO-APIC 0 is handed to Xen
+with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
+with a vector of 0 (all fields of the RTE are zeroed). Once the local
+APIC LVTERR/ESR is enabled, periodic injections from such a pin cause the
+local APIC to in turn inject periodic error vectors:
+
+APIC error on CPU0: 00(40), Received illegal vector
+APIC error on CPU0: 40(40), Received illegal vector
+APIC error on CPU0: 40(40), Received illegal vector
+APIC error on CPU0: 40(40), Received illegal vector
+APIC error on CPU0: 40(40), Received illegal vector
+APIC error on CPU0: 40(40), Received illegal vector
+
+That prevents Xen from booting.
+
+Move the masking of the IO-APIC pins ahead of the setup of the local
+APIC. This has the side effect of also moving the detection of the
+pin where the i8259 is connected, as such detection must be done
+before masking any pins.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 813da5f0e73b8cbd2ac3c7922506e58c28cd736d
+master date: 2023-07-17 10:31:10 +0200
+---
+ xen/arch/x86/apic.c | 4 ++++
+ xen/arch/x86/include/asm/irq.h | 1 +
+ xen/arch/x86/io_apic.c | 4 +---
+ xen/arch/x86/smpboot.c | 5 +++++
+ 4 files changed, 11 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
+index 47e6e5fe41..33103d3e91 100644
+--- a/xen/arch/x86/apic.c
++++ b/xen/arch/x86/apic.c
+@@ -1491,6 +1491,10 @@ int __init APIC_init_uniprocessor (void)
+ physids_clear(phys_cpu_present_map);
+ physid_set(boot_cpu_physical_apicid, phys_cpu_present_map);
+
++ if ( !skip_ioapic_setup && nr_ioapics )
++ /* Sanitize the IO-APIC pins before enabling the lapic LVTERR/ESR. */
++ enable_IO_APIC();
++
+ setup_local_APIC(true);
+
+ if (nmi_watchdog == NMI_LOCAL_APIC)
+diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
+index 76e6ed6d60..f6a0207a80 100644
+--- a/xen/arch/x86/include/asm/irq.h
++++ b/xen/arch/x86/include/asm/irq.h
+@@ -122,6 +122,7 @@ bool bogus_8259A_irq(unsigned int irq);
+ int i8259A_suspend(void);
+ int i8259A_resume(void);
+
++void enable_IO_APIC(void);
+ void setup_IO_APIC(void);
+ void disable_IO_APIC(void);
+ void setup_ioapic_dest(void);
+diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
+index 9b8a972cf5..25a08b1ea6 100644
+--- a/xen/arch/x86/io_apic.c
++++ b/xen/arch/x86/io_apic.c
+@@ -1273,7 +1273,7 @@ static void cf_check _print_IO_APIC_keyhandler(unsigned char key)
+ __print_IO_APIC(0);
+ }
+
+-static void __init enable_IO_APIC(void)
++void __init enable_IO_APIC(void)
+ {
+ int i8259_apic, i8259_pin;
+ int i, apic;
+@@ -2067,8 +2067,6 @@ static void __init ioapic_pm_state_alloc(void)
+
+ void __init setup_IO_APIC(void)
+ {
+- enable_IO_APIC();
+-
+ if (acpi_ioapic)
+ io_apic_irqs = ~0; /* all IRQs go through IOAPIC */
+ else
+diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
+index b46fd9ab18..41ec3211ac 100644
+--- a/xen/arch/x86/smpboot.c
++++ b/xen/arch/x86/smpboot.c
+@@ -1232,6 +1232,11 @@ void __init smp_prepare_cpus(void)
+ verify_local_APIC();
+
+ connect_bsp_APIC();
++
++ if ( !skip_ioapic_setup && nr_ioapics )
++ /* Sanitize the IO-APIC pins before enabling the lapic LVTERR/ESR. */
++ enable_IO_APIC();
++
+ setup_local_APIC(true);
+
+ if ( !skip_ioapic_setup && nr_ioapics )
+--
+2.42.0
+
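As a rough illustration of what the pin sanitization amounts to (this is not the actual enable_IO_APIC() code; read_rte(), write_rte() and ioapic_nr_pins() are hypothetical helpers invented for the sketch):

    #include <stdint.h>

    /* Hypothetical accessors standing in for Xen's io_apic_read()/write(). */
    uint64_t read_rte(unsigned int apic, unsigned int pin);
    void write_rte(unsigned int apic, unsigned int pin, uint64_t rte);
    unsigned int ioapic_nr_pins(unsigned int apic);

    #define RTE_MASKED (UINT64_C(1) << 16)   /* RTE bit 16 is the mask bit */

    /* Mask every pin so a junk RTE left behind by firmware cannot inject
     * vectors once the local APIC LVTERR/ESR is enabled. */
    static void mask_all_pins(unsigned int apic)
    {
        for ( unsigned int pin = 0; pin < ioapic_nr_pins(apic); pin++ )
            write_rte(apic, pin, read_rte(apic, pin) | RTE_MASKED);
    }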
diff --git a/0019-tools-oxenstored-Keep-dev-xen-evtchn-open-across-liv.patch b/0019-tools-oxenstored-Keep-dev-xen-evtchn-open-across-liv.patch
deleted file mode 100644
index f6ae3fe..0000000
--- a/0019-tools-oxenstored-Keep-dev-xen-evtchn-open-across-liv.patch
+++ /dev/null
@@ -1,367 +0,0 @@
-From f02171b663393e10d35123e5572c0f5b3e72c29d Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Thu, 3 Nov 2022 15:31:39 +0000
-Subject: [PATCH 19/89] tools/oxenstored: Keep /dev/xen/evtchn open across live
- update
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Closing the evtchn handle will unbind and free all local ports. The new
-xenstored would need to rebind all evtchns, which is work that we don't want
-or need to be doing during the critical handover period.
-
-However, it turns out that the Windows PV drivers also rebind their local port
-too across suspend/resume, leaving (o)xenstored with a stale idea of the
-remote port to use. In this case, reusing the established connection is the
-only robust option.
-
-Therefore:
- * Have oxenstored open /dev/xen/evtchn without CLOEXEC at start of day.
- * Extend the handover information with the evtchn fd, domexc virq local port,
- and the local port number for each domain connection.
- * Have (the new) oxenstored recover the open handle using Xeneventchn.fdopen,
- and use the provided local ports rather than trying to rebind them.
-
-When this new information isn't present (i.e. live updating from an oxenstored
-prior to this change), the best-effort status quo will have to do.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 9b224c25293a53fcbe32da68052d861dda71a6f4)
----
- tools/ocaml/xenstored/domain.ml | 13 +++--
- tools/ocaml/xenstored/domains.ml | 9 ++--
- tools/ocaml/xenstored/event.ml | 20 +++++--
- tools/ocaml/xenstored/process.ml | 2 +-
- tools/ocaml/xenstored/xenstored.ml | 85 ++++++++++++++++++++----------
- 5 files changed, 90 insertions(+), 39 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/domain.ml b/tools/ocaml/xenstored/domain.ml
-index 481e10794d..5c15752a37 100644
---- a/tools/ocaml/xenstored/domain.ml
-+++ b/tools/ocaml/xenstored/domain.ml
-@@ -74,7 +74,8 @@ let is_paused_for_conflict dom = dom.conflict_credit <= 0.0
- let is_free_to_conflict = is_dom0
-
- let dump d chan =
-- fprintf chan "dom,%d,%nd,%d\n" d.id d.mfn d.ports.remote
-+ fprintf chan "dom,%d,%nd,%d,%d\n"
-+ d.id d.mfn d.ports.remote (Xeneventchn.to_int d.ports.local)
-
- let rebind_evtchn d remote_port =
- Event.unbind d.eventchn d.ports.local;
-@@ -93,8 +94,14 @@ let close dom =
- dom.ports <- invalid_ports;
- Xenmmap.unmap dom.interface
-
--let make id mfn remote_port interface eventchn =
-- let local = Event.bind_interdomain eventchn id remote_port in
-+(* On clean start, local_port will be None, and we must bind the remote port
-+ given. On Live Update, the event channel is already bound, and both the
-+ local and remote port numbers come from the transfer record. *)
-+let make ?local_port ~remote_port id mfn interface eventchn =
-+ let local = match local_port with
-+ | None -> Event.bind_interdomain eventchn id remote_port
-+ | Some p -> Xeneventchn.of_int p
-+ in
- let ports = { local; remote = remote_port } in
- debug "domain %d bind %s" id (string_of_port_pair ports);
- {
-diff --git a/tools/ocaml/xenstored/domains.ml b/tools/ocaml/xenstored/domains.ml
-index 2ab0c5f4d8..b6c075c838 100644
---- a/tools/ocaml/xenstored/domains.ml
-+++ b/tools/ocaml/xenstored/domains.ml
-@@ -56,6 +56,7 @@ let exist doms id = Hashtbl.mem doms.table id
- let find doms id = Hashtbl.find doms.table id
- let number doms = Hashtbl.length doms.table
- let iter doms fct = Hashtbl.iter (fun _ b -> fct b) doms.table
-+let eventchn doms = doms.eventchn
-
- let rec is_empty_queue q =
- Queue.is_empty q ||
-@@ -122,16 +123,16 @@ let cleanup doms =
- let resume _doms _domid =
- ()
-
--let create doms domid mfn remote_port =
-+let create doms ?local_port ~remote_port domid mfn =
- let interface = Xenctrl.map_foreign_range xc domid (Xenmmap.getpagesize()) mfn in
-- let dom = Domain.make domid mfn remote_port interface doms.eventchn in
-+ let dom = Domain.make ?local_port ~remote_port domid mfn interface doms.eventchn in
- Hashtbl.add doms.table domid dom;
- dom
-
- let xenstored_kva = ref ""
- let xenstored_port = ref ""
-
--let create0 doms =
-+let create0 ?local_port doms =
- let remote_port = Utils.read_file_single_integer !xenstored_port in
-
- let interface =
-@@ -141,7 +142,7 @@ let create0 doms =
- interface
- in
-
-- let dom = Domain.make 0 Nativeint.zero remote_port interface doms.eventchn in
-+ let dom = Domain.make ?local_port ~remote_port 0 Nativeint.zero interface doms.eventchn in
- Hashtbl.add doms.table 0 dom;
- Domain.notify dom;
- dom
-diff --git a/tools/ocaml/xenstored/event.ml b/tools/ocaml/xenstored/event.ml
-index a3be296374..629dc6041b 100644
---- a/tools/ocaml/xenstored/event.ml
-+++ b/tools/ocaml/xenstored/event.ml
-@@ -20,9 +20,18 @@ type t = {
- domexc: Xeneventchn.t;
- }
-
--let init () =
-- let handle = Xeneventchn.init () in
-- let domexc = Xeneventchn.bind_dom_exc_virq handle in
-+(* On clean start, both parameters will be None, and we must open the evtchn
-+ handle and bind the DOM_EXC VIRQ. On Live Update, the fd is preserved
-+ across exec(), and the DOM_EXC VIRQ still bound. *)
-+let init ?fd ?domexc_port () =
-+ let handle = match fd with
-+ | None -> Xeneventchn.init ~cloexec:false ()
-+ | Some fd -> fd |> Utils.FD.of_int |> Xeneventchn.fdopen
-+ in
-+ let domexc = match domexc_port with
-+ | None -> Xeneventchn.bind_dom_exc_virq handle
-+ | Some p -> Xeneventchn.of_int p
-+ in
- { handle; domexc }
-
- let fd eventchn = Xeneventchn.fd eventchn.handle
-@@ -31,3 +40,8 @@ let unbind eventchn port = Xeneventchn.unbind eventchn.handle port
- let notify eventchn port = Xeneventchn.notify eventchn.handle port
- let pending eventchn = Xeneventchn.pending eventchn.handle
- let unmask eventchn port = Xeneventchn.unmask eventchn.handle port
-+
-+let dump e chan =
-+ Printf.fprintf chan "evtchn-dev,%d,%d\n"
-+ (Utils.FD.to_int @@ Xeneventchn.fd e.handle)
-+ (Xeneventchn.to_int e.domexc)
-diff --git a/tools/ocaml/xenstored/process.ml b/tools/ocaml/xenstored/process.ml
-index 1c80e7198d..02bd0f7d80 100644
---- a/tools/ocaml/xenstored/process.ml
-+++ b/tools/ocaml/xenstored/process.ml
-@@ -573,7 +573,7 @@ let do_introduce con t domains cons data =
- end;
- edom
- else try
-- let ndom = Domains.create domains domid mfn remote_port in
-+ let ndom = Domains.create ~remote_port domains domid mfn in
- Connections.add_domain cons ndom;
- Connections.fire_spec_watches (Transaction.get_root t) cons Store.Path.introduce_domain;
- ndom
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index 1f11f576b5..f526f4fb23 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -144,7 +144,7 @@ exception Bad_format of string
-
- let dump_format_header = "$xenstored-dump-format"
-
--let from_channel_f chan global_f socket_f domain_f watch_f store_f =
-+let from_channel_f chan global_f evtchn_f socket_f domain_f watch_f store_f =
- let unhexify s = Utils.unhexify s in
- let getpath s =
- let u = Utils.unhexify s in
-@@ -165,12 +165,19 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
- (* there might be more parameters here,
- e.g. a RO socket from a previous version: ignore it *)
- global_f ~rw
-+ | "evtchn-dev" :: fd :: domexc_port :: [] ->
-+ evtchn_f ~fd:(int_of_string fd)
-+ ~domexc_port:(int_of_string domexc_port)
- | "socket" :: fd :: [] ->
- socket_f ~fd:(int_of_string fd)
-- | "dom" :: domid :: mfn :: remote_port :: []->
-- domain_f (int_of_string domid)
-- (Nativeint.of_string mfn)
-- (int_of_string remote_port)
-+ | "dom" :: domid :: mfn :: remote_port :: rest ->
-+ let local_port = match rest with
-+ | [] -> None (* backward compat: old version didn't have it *)
-+ | local_port :: _ -> Some (int_of_string local_port) in
-+ domain_f ?local_port
-+ ~remote_port:(int_of_string remote_port)
-+ (int_of_string domid)
-+ (Nativeint.of_string mfn)
- | "watch" :: domid :: path :: token :: [] ->
- watch_f (int_of_string domid)
- (unhexify path) (unhexify token)
-@@ -189,10 +196,21 @@ let from_channel_f chan global_f socket_f domain_f watch_f store_f =
- done;
- info "Completed loading xenstore dump"
-
--let from_channel store cons doms chan =
-+let from_channel store cons domains_init chan =
- (* don't let the permission get on our way, full perm ! *)
- let op = Store.get_ops store Perms.Connection.full_rights in
- let rwro = ref (None) in
-+ let doms = ref (None) in
-+
-+ let require_doms () =
-+ match !doms with
-+ | None ->
-+ warn "No event channel file descriptor available in dump!";
-+ let domains = domains_init @@ Event.init () in
-+ doms := Some domains;
-+ domains
-+ | Some d -> d
-+ in
- let global_f ~rw =
- let get_listen_sock sockfd =
- let fd = sockfd |> int_of_string |> Utils.FD.of_int in
-@@ -201,6 +219,10 @@ let from_channel store cons doms chan =
- in
- rwro := get_listen_sock rw
- in
-+ let evtchn_f ~fd ~domexc_port =
-+ let evtchn = Event.init ~fd ~domexc_port () in
-+ doms := Some(domains_init evtchn)
-+ in
- let socket_f ~fd =
- let ufd = Utils.FD.of_int fd in
- let is_valid = try (Unix.fstat ufd).Unix.st_kind = Unix.S_SOCK with _ -> false in
-@@ -209,12 +231,13 @@ let from_channel store cons doms chan =
- else
- warn "Ignoring invalid socket FD %d" fd
- in
-- let domain_f domid mfn remote_port =
-+ let domain_f ?local_port ~remote_port domid mfn =
-+ let doms = require_doms () in
- let ndom =
- if domid > 0 then
-- Domains.create doms domid mfn remote_port
-+ Domains.create ?local_port ~remote_port doms domid mfn
- else
-- Domains.create0 doms
-+ Domains.create0 ?local_port doms
- in
- Connections.add_domain cons ndom;
- in
-@@ -229,8 +252,8 @@ let from_channel store cons doms chan =
- op.Store.write path value;
- op.Store.setperms path perms
- in
-- from_channel_f chan global_f socket_f domain_f watch_f store_f;
-- !rwro
-+ from_channel_f chan global_f evtchn_f socket_f domain_f watch_f store_f;
-+ !rwro, require_doms ()
-
- let from_file store cons doms file =
- info "Loading xenstore dump from %s" file;
-@@ -238,7 +261,7 @@ let from_file store cons doms file =
- finally (fun () -> from_channel store doms cons channel)
- (fun () -> close_in channel)
-
--let to_channel store cons rw chan =
-+let to_channel store cons (rw, evtchn) chan =
- let hexify s = Utils.hexify s in
-
- fprintf chan "%s\n" dump_format_header;
-@@ -248,6 +271,9 @@ let to_channel store cons rw chan =
- Utils.FD.to_int fd in
- fprintf chan "global,%d\n" (fdopt rw);
-
-+ (* dump evtchn device info *)
-+ Event.dump evtchn chan;
-+
- (* dump connections related to domains: domid, mfn, eventchn port/ sockets, and watches *)
- Connections.iter cons (fun con -> Connection.dump con chan);
-
-@@ -367,7 +393,6 @@ let _ =
- | None -> () end;
-
- let store = Store.create () in
-- let eventchn = Event.init () in
- let next_frequent_ops = ref 0. in
- let advance_next_frequent_ops () =
- next_frequent_ops := (Unix.gettimeofday () +. !Define.conflict_max_history_seconds)
-@@ -375,16 +400,8 @@ let _ =
- let delay_next_frequent_ops_by duration =
- next_frequent_ops := !next_frequent_ops +. duration
- in
-- let domains = Domains.init eventchn advance_next_frequent_ops in
-+ let domains_init eventchn = Domains.init eventchn advance_next_frequent_ops in
-
-- (* For things that need to be done periodically but more often
-- * than the periodic_ops function *)
-- let frequent_ops () =
-- if Unix.gettimeofday () > !next_frequent_ops then (
-- History.trim ();
-- Domains.incr_conflict_credit domains;
-- advance_next_frequent_ops ()
-- ) in
- let cons = Connections.create () in
-
- let quit = ref false in
-@@ -393,14 +410,15 @@ let _ =
- List.iter (fun path ->
- Store.write store Perms.Connection.full_rights path "") Store.Path.specials;
-
-- let rw_sock =
-+ let rw_sock, domains =
- if cf.restart && Sys.file_exists Disk.xs_daemon_database then (
-- let rwro = DB.from_file store domains cons Disk.xs_daemon_database in
-+ let rw, domains = DB.from_file store domains_init cons Disk.xs_daemon_database in
- info "Live reload: database loaded";
- Process.LiveUpdate.completed ();
-- rwro
-+ rw, domains
- ) else (
- info "No live reload: regular startup";
-+ let domains = domains_init @@ Event.init () in
- if !Disk.enable then (
- info "reading store from disk";
- Disk.read store
-@@ -413,9 +431,18 @@ let _ =
- if cf.domain_init then (
- Connections.add_domain cons (Domains.create0 domains);
- );
-- rw_sock
-+ rw_sock, domains
- ) in
-
-+ (* For things that need to be done periodically but more often
-+ * than the periodic_ops function *)
-+ let frequent_ops () =
-+ if Unix.gettimeofday () > !next_frequent_ops then (
-+ History.trim ();
-+ Domains.incr_conflict_credit domains;
-+ advance_next_frequent_ops ()
-+ ) in
-+
- (* required for xenstore-control to detect availability of live-update *)
- let tool_path = Store.Path.of_string "/tool" in
- if not (Store.path_exists store tool_path) then
-@@ -430,8 +457,10 @@ let _ =
- Sys.set_signal Sys.sigusr1 (Sys.Signal_handle (fun _ -> sigusr1_handler store));
- Sys.set_signal Sys.sigpipe Sys.Signal_ignore;
-
-+ let eventchn = Domains.eventchn domains in
-+
- if cf.activate_access_log then begin
-- let post_rotate () = DB.to_file store cons (None) Disk.xs_daemon_database in
-+ let post_rotate () = DB.to_file store cons (None, eventchn) Disk.xs_daemon_database in
- Logging.init_access_log post_rotate
- end;
-
-@@ -593,7 +622,7 @@ let _ =
- live_update := Process.LiveUpdate.should_run cons;
- if !live_update || !quit then begin
- (* don't initiate live update if saving state fails *)
-- DB.to_file store cons (rw_sock) Disk.xs_daemon_database;
-+ DB.to_file store cons (rw_sock, eventchn) Disk.xs_daemon_database;
- quit := true;
- end
- with exc ->
---
-2.40.0
-
diff --git a/0019-x86-ioapic-add-a-raw-field-to-RTE-struct.patch b/0019-x86-ioapic-add-a-raw-field-to-RTE-struct.patch
new file mode 100644
index 0000000..10e5946
--- /dev/null
+++ b/0019-x86-ioapic-add-a-raw-field-to-RTE-struct.patch
@@ -0,0 +1,147 @@
+From a885649098e06432939907eee84f735a644883e6 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:48:43 +0200
+Subject: [PATCH 19/55] x86/ioapic: add a raw field to RTE struct
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Further changes will require access to the full RTE as a single value
+in order to pass it to IOMMU interrupt remapping handlers.
+
+No functional change intended.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: cdc48cb5a74b10c2b07a09d2f554756d730bfee3
+master date: 2023-07-28 09:39:44 +0200
+---
+ xen/arch/x86/include/asm/io_apic.h | 57 +++++++++++++-----------
+ xen/arch/x86/io_apic.c | 2 +-
+ xen/drivers/passthrough/amd/iommu_intr.c | 4 +-
+ xen/drivers/passthrough/vtd/intremap.c | 4 +-
+ 4 files changed, 35 insertions(+), 32 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
+index ef0878b09e..a558bb063c 100644
+--- a/xen/arch/x86/include/asm/io_apic.h
++++ b/xen/arch/x86/include/asm/io_apic.h
+@@ -89,35 +89,38 @@ enum ioapic_irq_destination_types {
+ };
+
+ struct IO_APIC_route_entry {
+- unsigned int vector:8;
+- unsigned int delivery_mode:3; /*
+- * 000: FIXED
+- * 001: lowest prio
+- * 111: ExtINT
+- */
+- unsigned int dest_mode:1; /* 0: physical, 1: logical */
+- unsigned int delivery_status:1;
+- unsigned int polarity:1; /* 0: low, 1: high */
+- unsigned int irr:1;
+- unsigned int trigger:1; /* 0: edge, 1: level */
+- unsigned int mask:1; /* 0: enabled, 1: disabled */
+- unsigned int __reserved_2:15;
+-
+ union {
+ struct {
+- unsigned int __reserved_1:24;
+- unsigned int physical_dest:4;
+- unsigned int __reserved_2:4;
+- } physical;
+-
+- struct {
+- unsigned int __reserved_1:24;
+- unsigned int logical_dest:8;
+- } logical;
+-
+- /* used when Interrupt Remapping with EIM is enabled */
+- unsigned int dest32;
+- } dest;
++ unsigned int vector:8;
++ unsigned int delivery_mode:3; /*
++ * 000: FIXED
++ * 001: lowest prio
++ * 111: ExtINT
++ */
++ unsigned int dest_mode:1; /* 0: physical, 1: logical */
++ unsigned int delivery_status:1;
++ unsigned int polarity:1; /* 0: low, 1: high */
++ unsigned int irr:1;
++ unsigned int trigger:1; /* 0: edge, 1: level */
++ unsigned int mask:1; /* 0: enabled, 1: disabled */
++ unsigned int __reserved_2:15;
++
++ union {
++ struct {
++ unsigned int __reserved_1:24;
++ unsigned int physical_dest:4;
++ unsigned int __reserved_2:4;
++ } physical;
++
++ struct {
++ unsigned int __reserved_1:24;
++ unsigned int logical_dest:8;
++ } logical;
++ unsigned int dest32;
++ } dest;
++ };
++ uint64_t raw;
++ };
+ };
+
+ /*
+diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
+index 25a08b1ea6..aada2ef96c 100644
+--- a/xen/arch/x86/io_apic.c
++++ b/xen/arch/x86/io_apic.c
+@@ -2360,7 +2360,7 @@ int ioapic_guest_read(unsigned long physbase, unsigned int reg, u32 *pval)
+ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
+ {
+ int apic, pin, irq, ret, pirq;
+- struct IO_APIC_route_entry rte = { 0 };
++ struct IO_APIC_route_entry rte = { };
+ unsigned long flags;
+ struct irq_desc *desc;
+
+diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
+index f4de09f431..9e6be3be35 100644
+--- a/xen/drivers/passthrough/amd/iommu_intr.c
++++ b/xen/drivers/passthrough/amd/iommu_intr.c
+@@ -352,8 +352,8 @@ static int update_intremap_entry_from_ioapic(
+ void cf_check amd_iommu_ioapic_update_ire(
+ unsigned int apic, unsigned int reg, unsigned int value)
+ {
+- struct IO_APIC_route_entry old_rte = { 0 };
+- struct IO_APIC_route_entry new_rte = { 0 };
++ struct IO_APIC_route_entry old_rte = { };
++ struct IO_APIC_route_entry new_rte = { };
+ unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
+ unsigned int pin = (reg - 0x10) / 2;
+ int seg, bdf, rc;
+diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
+index 1512e4866b..019c21c556 100644
+--- a/xen/drivers/passthrough/vtd/intremap.c
++++ b/xen/drivers/passthrough/vtd/intremap.c
+@@ -419,7 +419,7 @@ unsigned int cf_check io_apic_read_remap_rte(
+ {
+ unsigned int ioapic_pin = (reg - 0x10) / 2;
+ int index;
+- struct IO_xAPIC_route_entry old_rte = { 0 };
++ struct IO_xAPIC_route_entry old_rte = { };
+ int rte_upper = (reg & 1) ? 1 : 0;
+ struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
+
+@@ -442,7 +442,7 @@ void cf_check io_apic_write_remap_rte(
+ unsigned int apic, unsigned int reg, unsigned int value)
+ {
+ unsigned int ioapic_pin = (reg - 0x10) / 2;
+- struct IO_xAPIC_route_entry old_rte = { 0 };
++ struct IO_xAPIC_route_entry old_rte = { };
+ struct IO_APIC_route_remap_entry *remap_rte;
+ unsigned int rte_upper = (reg & 1) ? 1 : 0;
+ struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
+--
+2.42.0
+
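The new layout in 0019-x86-ioapic-add-a-raw-field-to-RTE-struct.patch wraps the existing bitfields and the dest union in an anonymous union with a single 64-bit raw member, so the same entry can be read or written either field-by-field or as one value. A rough standalone sketch of that idiom (simplified field layout, GCC-style bitfields assumed; not the Xen header itself):

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for struct IO_APIC_route_entry; field widths and
     * ordering are illustrative, not the exact Xen layout. */
    struct rte {
        union {
            struct {
                uint64_t vector:8;
                uint64_t delivery_mode:3;
                uint64_t dest_mode:1;
                uint64_t delivery_status:1;
                uint64_t polarity:1;
                uint64_t irr:1;
                uint64_t trigger:1;
                uint64_t mask:1;
                uint64_t reserved:39;
                uint64_t dest:8;
            };
            uint64_t raw;            /* the whole entry as one value */
        };
    };

    int main(void)
    {
        struct rte e = { .raw = 0 };

        e.vector = 0x31;
        e.mask = 1;

        /* Handlers can now take the complete RTE as a single argument. */
        printf("raw RTE: %#llx\n", (unsigned long long)e.raw);
        return 0;
    }

With this shape an empty initialiser ({ }) zeroes every field at once, which appears to be why the patch also switches the various = { 0 } RTE initialisers to = { }.
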
diff --git a/0020-tools-oxenstored-Log-live-update-issues-at-warning-l.patch b/0020-tools-oxenstored-Log-live-update-issues-at-warning-l.patch
deleted file mode 100644
index 533e3e7..0000000
--- a/0020-tools-oxenstored-Log-live-update-issues-at-warning-l.patch
+++ /dev/null
@@ -1,42 +0,0 @@
-From 991b512f5f69dde3c923804f887be9df56b03a74 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Tue, 8 Nov 2022 08:57:47 +0000
-Subject: [PATCH 20/89] tools/oxenstored: Log live update issues at warning
- level
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-During live update, oxenstored tries a best effort approach to recover as many
-domains and information as possible even if it encounters errors restoring
-some domains.
-
-However, logging about misunderstood input is more severe than simply info.
-Log it at warning instead.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 3f02e0a70fe9f8143454b742563433958d4a87f8)
----
- tools/ocaml/xenstored/xenstored.ml | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index f526f4fb23..35b8cbd43f 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -186,9 +186,9 @@ let from_channel_f chan global_f evtchn_f socket_f domain_f watch_f store_f =
- (Perms.Node.of_string (unhexify perms ^ "\000"))
- (unhexify value)
- | _ ->
-- info "restoring: ignoring unknown line: %s" line
-+ warn "restoring: ignoring unknown line: %s" line
- with exn ->
-- info "restoring: ignoring unknown line: %s (exception: %s)"
-+ warn "restoring: ignoring unknown line: %s (exception: %s)"
- line (Printexc.to_string exn);
- ()
- with End_of_file ->
---
-2.40.0
-
diff --git a/0020-x86-ioapic-RTE-modifications-must-use-ioapic_write_e.patch b/0020-x86-ioapic-RTE-modifications-must-use-ioapic_write_e.patch
new file mode 100644
index 0000000..43faeeb
--- /dev/null
+++ b/0020-x86-ioapic-RTE-modifications-must-use-ioapic_write_e.patch
@@ -0,0 +1,180 @@
+From 1bd4523d696d26976f64a919df8c7a1b3ea32f6f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:49:37 +0200
+Subject: [PATCH 20/55] x86/ioapic: RTE modifications must use
+ ioapic_write_entry
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Do not allow to write to RTE registers using io_apic_write and instead
+require changes to RTE to be performed using ioapic_write_entry.
+
+This is in preparation for passing the full contents of the RTE to the
+IOMMU interrupt remapping handlers, so remapping entries for IO-APIC
+RTEs can be updated atomically when possible.
+
+While this commit might immediately expand the number of MMIO accesses
+in order to update an IO-APIC RTE, further changes will benefit from
+getting the full RTE value passed to the IOMMU handlers, as the logic
+is greatly simplified when the IOMMU handlers can get the complete RTE
+value in one go.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: ef7995ed1bcd7eac37fb3c3fe56eaa54ea9baf6c
+master date: 2023-07-28 09:40:20 +0200
+---
+ xen/arch/x86/include/asm/io_apic.h | 8 ++---
+ xen/arch/x86/io_apic.c | 43 ++++++++++++------------
+ xen/drivers/passthrough/amd/iommu_intr.c | 6 ----
+ 3 files changed, 25 insertions(+), 32 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
+index a558bb063c..6b514b4e3d 100644
+--- a/xen/arch/x86/include/asm/io_apic.h
++++ b/xen/arch/x86/include/asm/io_apic.h
+@@ -161,8 +161,8 @@ static inline void __io_apic_write(unsigned int apic, unsigned int reg, unsigned
+
+ static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
+ {
+- if ( ioapic_reg_remapped(reg) )
+- return iommu_update_ire_from_apic(apic, reg, value);
++ /* RTE writes must use ioapic_write_entry. */
++ BUG_ON(reg >= 0x10);
+ __io_apic_write(apic, reg, value);
+ }
+
+@@ -172,8 +172,8 @@ static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned i
+ */
+ static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
+ {
+- if ( ioapic_reg_remapped(reg) )
+- return iommu_update_ire_from_apic(apic, reg, value);
++ /* RTE writes must use ioapic_write_entry. */
++ BUG_ON(reg >= 0x10);
+ *(IO_APIC_BASE(apic) + 4) = value;
+ }
+
+diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
+index aada2ef96c..041233b9b7 100644
+--- a/xen/arch/x86/io_apic.c
++++ b/xen/arch/x86/io_apic.c
+@@ -237,15 +237,15 @@ struct IO_APIC_route_entry __ioapic_read_entry(
+ {
+ union entry_union eu;
+
+- if ( raw )
++ if ( raw || !iommu_intremap )
+ {
+ eu.w1 = __io_apic_read(apic, 0x10 + 2 * pin);
+ eu.w2 = __io_apic_read(apic, 0x11 + 2 * pin);
+ }
+ else
+ {
+- eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
+- eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
++ eu.w1 = iommu_read_apic_from_ire(apic, 0x10 + 2 * pin);
++ eu.w2 = iommu_read_apic_from_ire(apic, 0x11 + 2 * pin);
+ }
+
+ return eu.entry;
+@@ -269,15 +269,15 @@ void __ioapic_write_entry(
+ {
+ union entry_union eu = { .entry = e };
+
+- if ( raw )
++ if ( raw || !iommu_intremap )
+ {
+ __io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
+ __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
+ }
+ else
+ {
+- io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
+- io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
++ iommu_update_ire_from_apic(apic, 0x11 + 2 * pin, eu.w2);
++ iommu_update_ire_from_apic(apic, 0x10 + 2 * pin, eu.w1);
+ }
+ }
+
+@@ -433,16 +433,17 @@ static void modify_IO_APIC_irq(unsigned int irq, unsigned int enable,
+ unsigned int disable)
+ {
+ struct irq_pin_list *entry = irq_2_pin + irq;
+- unsigned int pin, reg;
+
+ for (;;) {
+- pin = entry->pin;
++ unsigned int pin = entry->pin;
++ struct IO_APIC_route_entry rte;
++
+ if (pin == -1)
+ break;
+- reg = io_apic_read(entry->apic, 0x10 + pin*2);
+- reg &= ~disable;
+- reg |= enable;
+- io_apic_modify(entry->apic, 0x10 + pin*2, reg);
++ rte = __ioapic_read_entry(entry->apic, pin, false);
++ rte.raw &= ~(uint64_t)disable;
++ rte.raw |= enable;
++ __ioapic_write_entry(entry->apic, pin, false, rte);
+ if (!entry->next)
+ break;
+ entry = irq_2_pin + entry->next;
+@@ -584,16 +585,16 @@ set_ioapic_affinity_irq(struct irq_desc *desc, const cpumask_t *mask)
+ dest = SET_APIC_LOGICAL_ID(dest);
+ entry = irq_2_pin + irq;
+ for (;;) {
+- unsigned int data;
++ struct IO_APIC_route_entry rte;
++
+ pin = entry->pin;
+ if (pin == -1)
+ break;
+
+- io_apic_write(entry->apic, 0x10 + 1 + pin*2, dest);
+- data = io_apic_read(entry->apic, 0x10 + pin*2);
+- data &= ~IO_APIC_REDIR_VECTOR_MASK;
+- data |= MASK_INSR(desc->arch.vector, IO_APIC_REDIR_VECTOR_MASK);
+- io_apic_modify(entry->apic, 0x10 + pin*2, data);
++ rte = __ioapic_read_entry(entry->apic, pin, false);
++ rte.dest.dest32 = dest;
++ rte.vector = desc->arch.vector;
++ __ioapic_write_entry(entry->apic, pin, false, rte);
+
+ if (!entry->next)
+ break;
+@@ -2127,10 +2128,8 @@ void ioapic_resume(void)
+ reg_00.bits.ID = mp_ioapics[apic].mpc_apicid;
+ __io_apic_write(apic, 0, reg_00.raw);
+ }
+- for (i = 0; i < nr_ioapic_entries[apic]; i++, entry++) {
+- __io_apic_write(apic, 0x11+2*i, *(((int *)entry)+1));
+- __io_apic_write(apic, 0x10+2*i, *(((int *)entry)+0));
+- }
++ for (i = 0; i < nr_ioapic_entries[apic]; i++, entry++)
++ __ioapic_write_entry(apic, i, true, *entry);
+ }
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+ }
+diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
+index 9e6be3be35..f32c418a7e 100644
+--- a/xen/drivers/passthrough/amd/iommu_intr.c
++++ b/xen/drivers/passthrough/amd/iommu_intr.c
+@@ -361,12 +361,6 @@ void cf_check amd_iommu_ioapic_update_ire(
+ struct amd_iommu *iommu;
+ unsigned int idx;
+
+- if ( !iommu_intremap )
+- {
+- __io_apic_write(apic, reg, value);
+- return;
+- }
+-
+ idx = ioapic_id_to_index(IO_APIC_ID(apic));
+ if ( idx == MAX_IO_APICS )
+ return;
+--
+2.42.0
+
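With io_apic_write()/io_apic_modify() now refusing RTE registers (reg >= 0x10), callers such as modify_IO_APIC_irq() and set_ioapic_affinity_irq() switch to a read-modify-write of the whole entry through __ioapic_read_entry()/__ioapic_write_entry(). A minimal standalone sketch of that pattern (toy register store and helper names, not the Xen ones; the mask bit does sit at bit 16 of an IO-APIC RTE):

    #include <stdint.h>
    #include <stdio.h>

    /* Toy register store standing in for the IO-APIC; the real code goes
     * through __ioapic_read_entry()/__ioapic_write_entry() instead. */
    struct rte { uint64_t raw; };
    static struct rte regs[24];

    static struct rte read_entry(unsigned int pin)          { return regs[pin]; }
    static void write_entry(unsigned int pin, struct rte e) { regs[pin] = e; }

    /* Read-modify-write of the whole entry rather than poking the
     * 0x10 + 2*pin / 0x11 + 2*pin register halves separately. */
    static void mask_pin(unsigned int pin)
    {
        struct rte e = read_entry(pin);

        e.raw |= 1ull << 16;     /* the RTE mask bit is bit 16 */
        write_entry(pin, e);
    }

    int main(void)
    {
        mask_pin(3);
        printf("pin 3 RTE after masking: %#llx\n",
               (unsigned long long)regs[3].raw);
        return 0;
    }
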
diff --git a/0021-iommu-vtd-rename-io_apic_read_remap_rte-local-variab.patch b/0021-iommu-vtd-rename-io_apic_read_remap_rte-local-variab.patch
new file mode 100644
index 0000000..6560452
--- /dev/null
+++ b/0021-iommu-vtd-rename-io_apic_read_remap_rte-local-variab.patch
@@ -0,0 +1,64 @@
+From e08e7330c58b7ee1efb00e348521a6afc524dc38 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:50:05 +0200
+Subject: [PATCH 21/55] iommu/vtd: rename io_apic_read_remap_rte() local
+ variable
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Preparatory change to unify the IO-APIC pin variable name between
+io_apic_read_remap_rte() and amd_iommu_ioapic_update_ire(), so that
+the local variable can be made a function parameter with the same name
+across vendors.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+master commit: a478b38c01b65fa030303f0324a3380d872eb165
+master date: 2023-07-28 09:40:42 +0200
+---
+ xen/drivers/passthrough/vtd/intremap.c | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
+index 019c21c556..53c9de9a75 100644
+--- a/xen/drivers/passthrough/vtd/intremap.c
++++ b/xen/drivers/passthrough/vtd/intremap.c
+@@ -441,14 +441,14 @@ unsigned int cf_check io_apic_read_remap_rte(
+ void cf_check io_apic_write_remap_rte(
+ unsigned int apic, unsigned int reg, unsigned int value)
+ {
+- unsigned int ioapic_pin = (reg - 0x10) / 2;
++ unsigned int pin = (reg - 0x10) / 2;
+ struct IO_xAPIC_route_entry old_rte = { };
+ struct IO_APIC_route_remap_entry *remap_rte;
+ unsigned int rte_upper = (reg & 1) ? 1 : 0;
+ struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
+ int saved_mask;
+
+- old_rte = __ioapic_read_entry(apic, ioapic_pin, true);
++ old_rte = __ioapic_read_entry(apic, pin, true);
+
+ remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
+
+@@ -458,7 +458,7 @@ void cf_check io_apic_write_remap_rte(
+ __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
+ remap_rte->mask = saved_mask;
+
+- if ( ioapic_rte_to_remap_entry(iommu, apic, ioapic_pin,
++ if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
+ &old_rte, rte_upper, value) )
+ {
+ __io_apic_write(apic, reg, value);
+@@ -468,7 +468,7 @@ void cf_check io_apic_write_remap_rte(
+ __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
+ }
+ else
+- __ioapic_write_entry(apic, ioapic_pin, true, old_rte);
++ __ioapic_write_entry(apic, pin, true, old_rte);
+ }
+
+ static void set_msi_source_id(struct pci_dev *pdev, struct iremap_entry *ire)
+--
+2.42.0
+
diff --git a/0021-tools-oxenstored-Set-uncaught-exception-handler.patch b/0021-tools-oxenstored-Set-uncaught-exception-handler.patch
deleted file mode 100644
index 8a42fcc..0000000
--- a/0021-tools-oxenstored-Set-uncaught-exception-handler.patch
+++ /dev/null
@@ -1,83 +0,0 @@
-From e13a9a2146952859c21c0a0c7b8b07757c2aba9d Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Mon, 7 Nov 2022 17:41:36 +0000
-Subject: [PATCH 21/89] tools/oxenstored: Set uncaught exception handler
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Unhandled exceptions go to stderr by default, but this doesn't typically work
-for oxenstored because:
- * daemonize reopens stderr as /dev/null
- * systemd redirects stderr to /dev/null too
-
-Debugging an unhandled exception requires reproducing the issue locally when
-using --no-fork, and is not conducive to figuring out what went wrong on a
-remote system.
-
-Install a custom handler which also tries to render the backtrace to the
-configured syslog facility, and DAEMON|ERR otherwise.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit ee7815f49faf743e960dac9e72809eb66393bc6d)
----
- tools/ocaml/xenstored/logging.ml | 29 +++++++++++++++++++++++++++++
- tools/ocaml/xenstored/xenstored.ml | 3 ++-
- 2 files changed, 31 insertions(+), 1 deletion(-)
-
-diff --git a/tools/ocaml/xenstored/logging.ml b/tools/ocaml/xenstored/logging.ml
-index 39c3036155..255051437d 100644
---- a/tools/ocaml/xenstored/logging.ml
-+++ b/tools/ocaml/xenstored/logging.ml
-@@ -342,3 +342,32 @@ let xb_answer ~tid ~con ~ty data =
- let watch_not_fired ~con perms path =
- let data = Printf.sprintf "EPERM perms=[%s] path=%s" perms path in
- access_logging ~tid:0 ~con ~data Watch_not_fired ~level:Info
-+
-+let msg_of exn bt =
-+ Printf.sprintf "Fatal exception: %s\n%s\n" (Printexc.to_string exn)
-+ (Printexc.raw_backtrace_to_string bt)
-+
-+let fallback_exception_handler exn bt =
-+ (* stderr goes to /dev/null, so use the logger where possible,
-+ but always print to stderr too, in case everything else fails,
-+ e.g. this can be used to debug with --no-fork
-+
-+ this function should try not to raise exceptions, but if it does
-+ the ocaml runtime should still print the exception, both the original,
-+ and the one from this function, but to stderr this time
-+ *)
-+ let msg = msg_of exn bt in
-+ prerr_endline msg;
-+ (* See Printexc.set_uncaught_exception_handler, need to flush,
-+ so has to call stop and flush *)
-+ match !xenstored_logger with
-+ | Some l -> error "xenstored-fallback" "%s" msg; l.stop ()
-+ | None ->
-+ (* Too early, no logger set yet.
-+ We normally try to use the configured logger so we don't flood syslog
-+ during development for example, or if the user has a file set
-+ *)
-+ try Syslog.log Syslog.Daemon Syslog.Err msg
-+ with e ->
-+ let bt = Printexc.get_raw_backtrace () in
-+ prerr_endline @@ msg_of e bt
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index 35b8cbd43f..4d5851c5cb 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -355,7 +355,8 @@ let tweak_gc () =
- Gc.set { (Gc.get ()) with Gc.max_overhead = !Define.gc_max_overhead }
-
-
--let _ =
-+let () =
-+ Printexc.set_uncaught_exception_handler Logging.fallback_exception_handler;
- let cf = do_argv in
- let pidfile =
- if Sys.file_exists (config_filename cf) then
---
-2.40.0
-
diff --git a/0022-tools-oxenstored-syslog-Avoid-potential-NULL-derefer.patch b/0022-tools-oxenstored-syslog-Avoid-potential-NULL-derefer.patch
deleted file mode 100644
index eb6d42e..0000000
--- a/0022-tools-oxenstored-syslog-Avoid-potential-NULL-derefer.patch
+++ /dev/null
@@ -1,55 +0,0 @@
-From 91a9ac6e9be5aa94020f5c482e6c51b581e2ea39 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Tue, 8 Nov 2022 14:24:19 +0000
-Subject: [PATCH 22/89] tools/oxenstored/syslog: Avoid potential NULL
- dereference
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-strdup() may return NULL. Check for this before passing to syslog().
-
-Drop const from c_msg. It is bogus, as demonstrated by the need to cast to
-void * in order to free the memory.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit acd3fb6d65905f8a185dcb9fe6a330a591b96203)
----
- tools/ocaml/xenstored/syslog_stubs.c | 7 +++++--
- 1 file changed, 5 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/syslog_stubs.c b/tools/ocaml/xenstored/syslog_stubs.c
-index 875d48ad57..e16c3a9491 100644
---- a/tools/ocaml/xenstored/syslog_stubs.c
-+++ b/tools/ocaml/xenstored/syslog_stubs.c
-@@ -14,6 +14,7 @@
-
- #include <syslog.h>
- #include <string.h>
-+#include <caml/fail.h>
- #include <caml/mlvalues.h>
- #include <caml/memory.h>
- #include <caml/alloc.h>
-@@ -35,14 +36,16 @@ static int __syslog_facility_table[] = {
- value stub_syslog(value facility, value level, value msg)
- {
- CAMLparam3(facility, level, msg);
-- const char *c_msg = strdup(String_val(msg));
-+ char *c_msg = strdup(String_val(msg));
- int c_facility = __syslog_facility_table[Int_val(facility)]
- | __syslog_level_table[Int_val(level)];
-
-+ if ( !c_msg )
-+ caml_raise_out_of_memory();
- caml_enter_blocking_section();
- syslog(c_facility, "%s", c_msg);
- caml_leave_blocking_section();
-
-- free((void*)c_msg);
-+ free(c_msg);
- CAMLreturn(Val_unit);
- }
---
-2.40.0
-
diff --git a/0022-x86-iommu-pass-full-IO-APIC-RTE-for-remapping-table-.patch b/0022-x86-iommu-pass-full-IO-APIC-RTE-for-remapping-table-.patch
new file mode 100644
index 0000000..e06714e
--- /dev/null
+++ b/0022-x86-iommu-pass-full-IO-APIC-RTE-for-remapping-table-.patch
@@ -0,0 +1,462 @@
+From 5116fe12d8238cc7d6582ceefd3f7e944bff9a1d Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:50:39 +0200
+Subject: [PATCH 22/55] x86/iommu: pass full IO-APIC RTE for remapping table
+ update
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+So that the remapping entry can be updated atomically when possible.
+
+Doing such update atomically will avoid Xen having to mask the IO-APIC
+pin prior to performing any interrupt movements (ie: changing the
+destination and vector fields), as the interrupt remapping entry is
+always consistent.
+
+This also simplifies some of the logic on both VT-d and AMD-Vi
+implementations, as having the full RTE available instead of half of
+it avoids having to possibly read and update the missing other half from
+hardware.
+
+While there remove the explicit zeroing of new_ire fields in
+ioapic_rte_to_remap_entry() and initialize the variable at definition
+so all fields are zeroed. Note fields could also be initialized with
+final values at definition, but I found that likely too much to be
+done at this time.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Kevin Tian <kevin.tian@intel.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 3e033172b0250446bfe119f31c7f0f51684b0472
+master date: 2023-08-01 11:48:39 +0200
+---
+ xen/arch/x86/include/asm/iommu.h | 3 +-
+ xen/arch/x86/io_apic.c | 5 +-
+ xen/drivers/passthrough/amd/iommu.h | 2 +-
+ xen/drivers/passthrough/amd/iommu_intr.c | 100 ++---------------
+ xen/drivers/passthrough/vtd/extern.h | 2 +-
+ xen/drivers/passthrough/vtd/intremap.c | 131 +++++++++++------------
+ xen/drivers/passthrough/x86/iommu.c | 4 +-
+ xen/include/xen/iommu.h | 3 +-
+ 8 files changed, 82 insertions(+), 168 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
+index fc0afe35bf..c0d4ad3742 100644
+--- a/xen/arch/x86/include/asm/iommu.h
++++ b/xen/arch/x86/include/asm/iommu.h
+@@ -97,7 +97,8 @@ struct iommu_init_ops {
+
+ extern const struct iommu_init_ops *iommu_init_ops;
+
+-void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned int value);
++void iommu_update_ire_from_apic(unsigned int apic, unsigned int pin,
++ uint64_t rte);
+ unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg);
+ int iommu_setup_hpet_msi(struct msi_desc *);
+
+diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
+index 041233b9b7..b3afef8933 100644
+--- a/xen/arch/x86/io_apic.c
++++ b/xen/arch/x86/io_apic.c
+@@ -275,10 +275,7 @@ void __ioapic_write_entry(
+ __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
+ }
+ else
+- {
+- iommu_update_ire_from_apic(apic, 0x11 + 2 * pin, eu.w2);
+- iommu_update_ire_from_apic(apic, 0x10 + 2 * pin, eu.w1);
+- }
++ iommu_update_ire_from_apic(apic, pin, e.raw);
+ }
+
+ static void ioapic_write_entry(
+diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
+index 8bc3c35b1b..5429ada58e 100644
+--- a/xen/drivers/passthrough/amd/iommu.h
++++ b/xen/drivers/passthrough/amd/iommu.h
+@@ -300,7 +300,7 @@ int cf_check amd_iommu_free_intremap_table(
+ unsigned int amd_iommu_intremap_table_order(
+ const void *irt, const struct amd_iommu *iommu);
+ void cf_check amd_iommu_ioapic_update_ire(
+- unsigned int apic, unsigned int reg, unsigned int value);
++ unsigned int apic, unsigned int pin, uint64_t rte);
+ unsigned int cf_check amd_iommu_read_ioapic_from_ire(
+ unsigned int apic, unsigned int reg);
+ int cf_check amd_iommu_msi_msg_update_ire(
+diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
+index f32c418a7e..e83a2a932a 100644
+--- a/xen/drivers/passthrough/amd/iommu_intr.c
++++ b/xen/drivers/passthrough/amd/iommu_intr.c
+@@ -247,11 +247,6 @@ static void update_intremap_entry(const struct amd_iommu *iommu,
+ }
+ }
+
+-static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
+-{
+- return rte->vector | (rte->delivery_mode << 8);
+-}
+-
+ static inline void set_rte_index(struct IO_APIC_route_entry *rte, int offset)
+ {
+ rte->vector = (u8)offset;
+@@ -267,7 +262,6 @@ static int update_intremap_entry_from_ioapic(
+ int bdf,
+ struct amd_iommu *iommu,
+ struct IO_APIC_route_entry *rte,
+- bool_t lo_update,
+ u16 *index)
+ {
+ unsigned long flags;
+@@ -315,31 +309,6 @@ static int update_intremap_entry_from_ioapic(
+ spin_lock(lock);
+ }
+
+- if ( fresh )
+- /* nothing */;
+- else if ( !lo_update )
+- {
+- /*
+- * Low half of incoming RTE is already in remapped format,
+- * so need to recover vector and delivery mode from IRTE.
+- */
+- ASSERT(get_rte_index(rte) == offset);
+- if ( iommu->ctrl.ga_en )
+- vector = entry.ptr128->full.vector;
+- else
+- vector = entry.ptr32->flds.vector;
+- /* The IntType fields match for both formats. */
+- delivery_mode = entry.ptr32->flds.int_type;
+- }
+- else if ( x2apic_enabled )
+- {
+- /*
+- * High half of incoming RTE was read from the I/O APIC and hence may
+- * not hold the full destination, so need to recover full destination
+- * from IRTE.
+- */
+- dest = get_full_dest(entry.ptr128);
+- }
+ update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
+
+ spin_unlock_irqrestore(lock, flags);
+@@ -350,14 +319,11 @@ static int update_intremap_entry_from_ioapic(
+ }
+
+ void cf_check amd_iommu_ioapic_update_ire(
+- unsigned int apic, unsigned int reg, unsigned int value)
++ unsigned int apic, unsigned int pin, uint64_t rte)
+ {
+- struct IO_APIC_route_entry old_rte = { };
+- struct IO_APIC_route_entry new_rte = { };
+- unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
+- unsigned int pin = (reg - 0x10) / 2;
++ struct IO_APIC_route_entry old_rte;
++ struct IO_APIC_route_entry new_rte = { .raw = rte };
+ int seg, bdf, rc;
+- bool saved_mask, fresh = false;
+ struct amd_iommu *iommu;
+ unsigned int idx;
+
+@@ -373,58 +339,23 @@ void cf_check amd_iommu_ioapic_update_ire(
+ {
+ AMD_IOMMU_WARN("failed to find IOMMU for IO-APIC @ %04x:%04x\n",
+ seg, bdf);
+- __io_apic_write(apic, reg, value);
++ __ioapic_write_entry(apic, pin, true, new_rte);
+ return;
+ }
+
+- /* save io-apic rte lower 32 bits */
+- *((u32 *)&old_rte) = __io_apic_read(apic, rte_lo);
+- saved_mask = old_rte.mask;
+-
+- if ( reg == rte_lo )
+- {
+- *((u32 *)&new_rte) = value;
+- /* read upper 32 bits from io-apic rte */
+- *(((u32 *)&new_rte) + 1) = __io_apic_read(apic, reg + 1);
+- }
+- else
+- {
+- *((u32 *)&new_rte) = *((u32 *)&old_rte);
+- *(((u32 *)&new_rte) + 1) = value;
+- }
+-
+- if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_MAX_ENTRIES )
+- {
+- ASSERT(saved_mask);
+-
+- /*
+- * There's nowhere except the IRTE to store a full 32-bit destination,
+- * so we may not bypass entry allocation and updating of the low RTE
+- * half in the (usual) case of the high RTE half getting written first.
+- */
+- if ( new_rte.mask && !x2apic_enabled )
+- {
+- __io_apic_write(apic, reg, value);
+- return;
+- }
+-
+- fresh = true;
+- }
+-
++ old_rte = __ioapic_read_entry(apic, pin, true);
+ /* mask the interrupt while we change the intremap table */
+- if ( !saved_mask )
++ if ( !old_rte.mask )
+ {
+ old_rte.mask = 1;
+- __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
++ __ioapic_write_entry(apic, pin, true, old_rte);
+ }
+
+ /* Update interrupt remapping entry */
+ rc = update_intremap_entry_from_ioapic(
+- bdf, iommu, &new_rte, reg == rte_lo,
++ bdf, iommu, &new_rte,
+ &ioapic_sbdf[idx].pin_2_idx[pin]);
+
+- __io_apic_write(apic, reg, ((u32 *)&new_rte)[reg != rte_lo]);
+-
+ if ( rc )
+ {
+ /* Keep the entry masked. */
+@@ -433,20 +364,7 @@ void cf_check amd_iommu_ioapic_update_ire(
+ return;
+ }
+
+- /* For lower bits access, return directly to avoid double writes */
+- if ( reg == rte_lo )
+- return;
+-
+- /*
+- * Unmask the interrupt after we have updated the intremap table. Also
+- * write the low half if a fresh entry was allocated for a high half
+- * update in x2APIC mode.
+- */
+- if ( !saved_mask || (x2apic_enabled && fresh) )
+- {
+- old_rte.mask = saved_mask;
+- __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
+- }
++ __ioapic_write_entry(apic, pin, true, new_rte);
+ }
+
+ unsigned int cf_check amd_iommu_read_ioapic_from_ire(
+diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
+index 39602d1f88..d49e40c5ce 100644
+--- a/xen/drivers/passthrough/vtd/extern.h
++++ b/xen/drivers/passthrough/vtd/extern.h
+@@ -92,7 +92,7 @@ int cf_check intel_iommu_get_reserved_device_memory(
+ unsigned int cf_check io_apic_read_remap_rte(
+ unsigned int apic, unsigned int reg);
+ void cf_check io_apic_write_remap_rte(
+- unsigned int apic, unsigned int reg, unsigned int value);
++ unsigned int apic, unsigned int pin, uint64_t rte);
+
+ struct msi_desc;
+ struct msi_msg;
+diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
+index 53c9de9a75..78d7bc139a 100644
+--- a/xen/drivers/passthrough/vtd/intremap.c
++++ b/xen/drivers/passthrough/vtd/intremap.c
+@@ -328,15 +328,14 @@ static int remap_entry_to_ioapic_rte(
+
+ static int ioapic_rte_to_remap_entry(struct vtd_iommu *iommu,
+ int apic, unsigned int ioapic_pin, struct IO_xAPIC_route_entry *old_rte,
+- unsigned int rte_upper, unsigned int value)
++ struct IO_xAPIC_route_entry new_rte)
+ {
+ struct iremap_entry *iremap_entry = NULL, *iremap_entries;
+ struct iremap_entry new_ire;
+ struct IO_APIC_route_remap_entry *remap_rte;
+- struct IO_xAPIC_route_entry new_rte;
+ int index;
+ unsigned long flags;
+- bool init = false;
++ bool init = false, masked = old_rte->mask;
+
+ remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
+ spin_lock_irqsave(&iommu->intremap.lock, flags);
+@@ -364,48 +363,40 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu *iommu,
+
+ new_ire = *iremap_entry;
+
+- if ( rte_upper )
+- {
+- if ( x2apic_enabled )
+- new_ire.remap.dst = value;
+- else
+- new_ire.remap.dst = (value >> 24) << 8;
+- }
++ if ( x2apic_enabled )
++ new_ire.remap.dst = new_rte.dest.dest32;
+ else
+- {
+- *(((u32 *)&new_rte) + 0) = value;
+- new_ire.remap.fpd = 0;
+- new_ire.remap.dm = new_rte.dest_mode;
+- new_ire.remap.tm = new_rte.trigger;
+- new_ire.remap.dlm = new_rte.delivery_mode;
+- /* Hardware require RH = 1 for LPR delivery mode */
+- new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
+- new_ire.remap.avail = 0;
+- new_ire.remap.res_1 = 0;
+- new_ire.remap.vector = new_rte.vector;
+- new_ire.remap.res_2 = 0;
+-
+- set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
+- new_ire.remap.res_3 = 0;
+- new_ire.remap.res_4 = 0;
+- new_ire.remap.p = 1; /* finally, set present bit */
+-
+- /* now construct new ioapic rte entry */
+- remap_rte->vector = new_rte.vector;
+- remap_rte->delivery_mode = 0; /* has to be 0 for remap format */
+- remap_rte->index_15 = (index >> 15) & 0x1;
+- remap_rte->index_0_14 = index & 0x7fff;
+-
+- remap_rte->delivery_status = new_rte.delivery_status;
+- remap_rte->polarity = new_rte.polarity;
+- remap_rte->irr = new_rte.irr;
+- remap_rte->trigger = new_rte.trigger;
+- remap_rte->mask = new_rte.mask;
+- remap_rte->reserved = 0;
+- remap_rte->format = 1; /* indicate remap format */
+- }
+-
+- update_irte(iommu, iremap_entry, &new_ire, !init);
++ new_ire.remap.dst = GET_xAPIC_ID(new_rte.dest.dest32) << 8;
++
++ new_ire.remap.dm = new_rte.dest_mode;
++ new_ire.remap.tm = new_rte.trigger;
++ new_ire.remap.dlm = new_rte.delivery_mode;
++ /* Hardware require RH = 1 for LPR delivery mode. */
++ new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
++ new_ire.remap.vector = new_rte.vector;
++
++ set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
++ /* Finally, set present bit. */
++ new_ire.remap.p = 1;
++
++ /* Now construct new ioapic rte entry. */
++ remap_rte->vector = new_rte.vector;
++ /* Has to be 0 for remap format. */
++ remap_rte->delivery_mode = 0;
++ remap_rte->index_15 = (index >> 15) & 0x1;
++ remap_rte->index_0_14 = index & 0x7fff;
++
++ remap_rte->delivery_status = new_rte.delivery_status;
++ remap_rte->polarity = new_rte.polarity;
++ remap_rte->irr = new_rte.irr;
++ remap_rte->trigger = new_rte.trigger;
++ remap_rte->mask = new_rte.mask;
++ remap_rte->reserved = 0;
++ /* Indicate remap format. */
++ remap_rte->format = 1;
++
++ /* If cmpxchg16b is not available the caller must mask the IO-APIC pin. */
++ update_irte(iommu, iremap_entry, &new_ire, !init && !masked);
+ iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
+ iommu_flush_iec_index(iommu, 0, index);
+
+@@ -439,36 +430,42 @@ unsigned int cf_check io_apic_read_remap_rte(
+ }
+
+ void cf_check io_apic_write_remap_rte(
+- unsigned int apic, unsigned int reg, unsigned int value)
++ unsigned int apic, unsigned int pin, uint64_t rte)
+ {
+- unsigned int pin = (reg - 0x10) / 2;
++ struct IO_xAPIC_route_entry new_rte = { .raw = rte };
+ struct IO_xAPIC_route_entry old_rte = { };
+- struct IO_APIC_route_remap_entry *remap_rte;
+- unsigned int rte_upper = (reg & 1) ? 1 : 0;
+ struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
+- int saved_mask;
+-
+- old_rte = __ioapic_read_entry(apic, pin, true);
+-
+- remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
+-
+- /* mask the interrupt while we change the intremap table */
+- saved_mask = remap_rte->mask;
+- remap_rte->mask = 1;
+- __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
+- remap_rte->mask = saved_mask;
++ bool masked = true;
++ int rc;
+
+- if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
+- &old_rte, rte_upper, value) )
++ if ( !cpu_has_cx16 )
+ {
+- __io_apic_write(apic, reg, value);
++ /*
++ * Cannot atomically update the IRTE entry: mask the IO-APIC pin to
++ * avoid interrupts seeing an inconsistent IRTE entry.
++ */
++ old_rte = __ioapic_read_entry(apic, pin, true);
++ if ( !old_rte.mask )
++ {
++ masked = false;
++ old_rte.mask = 1;
++ __ioapic_write_entry(apic, pin, true, old_rte);
++ }
++ }
+
+- /* Recover the original value of 'mask' bit */
+- if ( rte_upper )
+- __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
++ rc = ioapic_rte_to_remap_entry(iommu, apic, pin, &old_rte, new_rte);
++ if ( rc )
++ {
++ if ( !masked )
++ {
++ /* Recover the original value of 'mask' bit */
++ old_rte.mask = 0;
++ __ioapic_write_entry(apic, pin, true, old_rte);
++ }
++ return;
+ }
+- else
+- __ioapic_write_entry(apic, pin, true, old_rte);
++ /* old_rte will contain the updated IO-APIC RTE on success. */
++ __ioapic_write_entry(apic, pin, true, old_rte);
+ }
+
+ static void set_msi_source_id(struct pci_dev *pdev, struct iremap_entry *ire)
+diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
+index f671b0f2bb..8bd0ccb2e9 100644
+--- a/xen/drivers/passthrough/x86/iommu.c
++++ b/xen/drivers/passthrough/x86/iommu.c
+@@ -142,9 +142,9 @@ int iommu_enable_x2apic(void)
+ }
+
+ void iommu_update_ire_from_apic(
+- unsigned int apic, unsigned int reg, unsigned int value)
++ unsigned int apic, unsigned int pin, uint64_t rte)
+ {
+- iommu_vcall(&iommu_ops, update_ire_from_apic, apic, reg, value);
++ iommu_vcall(&iommu_ops, update_ire_from_apic, apic, pin, rte);
+ }
+
+ unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg)
+diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
+index 4f22fc1bed..f8a52627f7 100644
+--- a/xen/include/xen/iommu.h
++++ b/xen/include/xen/iommu.h
+@@ -274,7 +274,8 @@ struct iommu_ops {
+ int (*enable_x2apic)(void);
+ void (*disable_x2apic)(void);
+
+- void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value);
++ void (*update_ire_from_apic)(unsigned int apic, unsigned int pin,
++ uint64_t rte);
+ unsigned int (*read_apic_from_ire)(unsigned int apic, unsigned int reg);
+
+ int (*setup_hpet_msi)(struct msi_desc *);
+--
+2.42.0
+
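One detail worth noting in the patch above: with the whole RTE handed to the VT-d code, the IRTE destination is taken from new_rte.dest.dest32, i.e. the upper 32 bits of the raw value. Under x2APIC it is used as-is, while in xAPIC mode only the top byte is the APIC ID and it lands in bits 15:8 of the IRTE dst field (matching the GET_xAPIC_ID(...) << 8 expression in the patch). A small standalone illustration of that conversion, assuming those bit positions:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* dest32 is the upper half of the 64-bit IO-APIC RTE. */
    static uint32_t irte_dst(uint64_t rte_raw, bool x2apic)
    {
        uint32_t dest32 = rte_raw >> 32;

        if ( x2apic )
            return dest32;                    /* full 32-bit destination */

        /* xAPIC: APIC ID is the top byte, IRTE wants it at bits 15:8. */
        return ((dest32 >> 24) & 0xff) << 8;
    }

    int main(void)
    {
        uint64_t rte = (uint64_t)0x05000000u << 32 | 0x10031u;

        printf("xAPIC dst:  %#x\n", irte_dst(rte, false)); /* 0x500 */
        printf("x2APIC dst: %#x\n", irte_dst(rte, true));  /* 0x5000000 */
        return 0;
    }
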
diff --git a/0023-build-correct-gas-noexecstack-check.patch b/0023-build-correct-gas-noexecstack-check.patch
new file mode 100644
index 0000000..245d631
--- /dev/null
+++ b/0023-build-correct-gas-noexecstack-check.patch
@@ -0,0 +1,34 @@
+From ba360fbb6413231f84a7d68f5cb34858f81d4d23 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 5 Sep 2023 08:51:50 +0200
+Subject: [PATCH 23/55] build: correct gas --noexecstack check
+
+The check was missing an escape for the inner $, thus breaking things
+in the unlikely event that the underlying assembler doesn't support this
+option.
+
+Fixes: 62d22296a95d ("build: silence GNU ld warning about executable stacks")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: d1f6a58dfdc508c43a51c1865c826d519bf16493
+master date: 2023-08-14 09:58:19 +0200
+---
+ xen/Makefile | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index 7bb9de7bdc..455916c757 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -405,7 +405,7 @@ endif
+
+ AFLAGS += -D__ASSEMBLY__
+
+-$(call cc-option-add,AFLAGS,CC,-Wa$(comma)--noexecstack)
++$(call cc-option-add,AFLAGS,CC,-Wa$$(comma)--noexecstack)
+
+ LDFLAGS-$(call ld-option,--warn-rwx-segments) += --no-warn-rwx-segments
+
+--
+2.42.0
+
diff --git a/0023-tools-oxenstored-Render-backtraces-more-nicely-in-Sy.patch b/0023-tools-oxenstored-Render-backtraces-more-nicely-in-Sy.patch
deleted file mode 100644
index c0343d0..0000000
--- a/0023-tools-oxenstored-Render-backtraces-more-nicely-in-Sy.patch
+++ /dev/null
@@ -1,83 +0,0 @@
-From c4972a4272690384b15d5706f2a833aed636895e Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 1 Dec 2022 21:06:25 +0000
-Subject: [PATCH 23/89] tools/oxenstored: Render backtraces more nicely in
- Syslog
-
-fallback_exception_handler feeds a string with embedded newlines directly into
-syslog(). While this is an improvement on getting nothing, syslogd escapes
-all control characters it gets, and emits one (long) log line.
-
-Fix the problem generally in the syslog stub. As we already have a local copy
-of the string, split it in place and emit one syslog() call per line.
-
-Also tweak Logging.msg_of to avoid putting an extra newline on a string which
-already ends with one.
-
-Fixes: ee7815f49faf ("tools/oxenstored: Set uncaught exception handler")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit d2162d884cba0ff7b2ac0d832f4e044444bda2e1)
----
- tools/ocaml/xenstored/logging.ml | 2 +-
- tools/ocaml/xenstored/syslog_stubs.c | 26 +++++++++++++++++++++++---
- 2 files changed, 24 insertions(+), 4 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/logging.ml b/tools/ocaml/xenstored/logging.ml
-index 255051437d..f233bc9a39 100644
---- a/tools/ocaml/xenstored/logging.ml
-+++ b/tools/ocaml/xenstored/logging.ml
-@@ -344,7 +344,7 @@ let watch_not_fired ~con perms path =
- access_logging ~tid:0 ~con ~data Watch_not_fired ~level:Info
-
- let msg_of exn bt =
-- Printf.sprintf "Fatal exception: %s\n%s\n" (Printexc.to_string exn)
-+ Printf.sprintf "Fatal exception: %s\n%s" (Printexc.to_string exn)
- (Printexc.raw_backtrace_to_string bt)
-
- let fallback_exception_handler exn bt =
-diff --git a/tools/ocaml/xenstored/syslog_stubs.c b/tools/ocaml/xenstored/syslog_stubs.c
-index e16c3a9491..760e78ff73 100644
---- a/tools/ocaml/xenstored/syslog_stubs.c
-+++ b/tools/ocaml/xenstored/syslog_stubs.c
-@@ -37,14 +37,34 @@ value stub_syslog(value facility, value level, value msg)
- {
- CAMLparam3(facility, level, msg);
- char *c_msg = strdup(String_val(msg));
-+ char *s = c_msg, *ss;
- int c_facility = __syslog_facility_table[Int_val(facility)]
- | __syslog_level_table[Int_val(level)];
-
- if ( !c_msg )
- caml_raise_out_of_memory();
-- caml_enter_blocking_section();
-- syslog(c_facility, "%s", c_msg);
-- caml_leave_blocking_section();
-+
-+ /*
-+ * syslog() doesn't like embedded newlines, and c_msg generally
-+ * contains them.
-+ *
-+ * Split the message in place by converting \n to \0, and issue one
-+ * syslog() call per line, skipping the final iteration if c_msg ends
-+ * with a newline anyway.
-+ */
-+ do {
-+ ss = strchr(s, '\n');
-+ if ( ss )
-+ *ss = '\0';
-+ else if ( *s == '\0' )
-+ break;
-+
-+ caml_enter_blocking_section();
-+ syslog(c_facility, "%s", s);
-+ caml_leave_blocking_section();
-+
-+ s = ss + 1;
-+ } while ( ss );
-
- free(c_msg);
- CAMLreturn(Val_unit);
---
-2.40.0
-
diff --git a/0024-Revert-tools-xenstore-simplify-loop-handling-connect.patch b/0024-Revert-tools-xenstore-simplify-loop-handling-connect.patch
deleted file mode 100644
index 81481fc..0000000
--- a/0024-Revert-tools-xenstore-simplify-loop-handling-connect.patch
+++ /dev/null
@@ -1,136 +0,0 @@
-From 2f8851c37f88e4eb4858e16626fcb2379db71a4f Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jandryuk@gmail.com>
-Date: Thu, 26 Jan 2023 11:00:24 +0100
-Subject: [PATCH 24/89] Revert "tools/xenstore: simplify loop handling
- connection I/O"
-
-I'm observing guest kexec trigger xenstored to abort on a double free.
-
-gdb output:
-Program received signal SIGABRT, Aborted.
-__pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) at ./nptl/pthread_kill.c:44
-44 ./nptl/pthread_kill.c: No such file or directory.
-(gdb) bt
- at ./nptl/pthread_kill.c:44
- at ./nptl/pthread_kill.c:78
- at ./nptl/pthread_kill.c:89
- at ../sysdeps/posix/raise.c:26
- at talloc.c:119
- ptr=ptr@entry=0x559fae724290) at talloc.c:232
- at xenstored_core.c:2945
-(gdb) frame 5
- at talloc.c:119
-119 TALLOC_ABORT("Bad talloc magic value - double free");
-(gdb) frame 7
- at xenstored_core.c:2945
-2945 talloc_increase_ref_count(conn);
-(gdb) p conn
-$1 = (struct connection *) 0x559fae724290
-
-Looking at a xenstore trace, we have:
-IN 0x559fae71f250 20230120 17:40:53 READ (/local/domain/3/image/device-model-dom
-id )
-wrl: dom 0 1 msec 10000 credit 1000000 reserve 100 disc
-ard
-wrl: dom 3 1 msec 10000 credit 1000000 reserve 100 disc
-ard
-wrl: dom 0 0 msec 10000 credit 1000000 reserve 0 disc
-ard
-wrl: dom 3 0 msec 10000 credit 1000000 reserve 0 disc
-ard
-OUT 0x559fae71f250 20230120 17:40:53 ERROR (ENOENT )
-wrl: dom 0 1 msec 10000 credit 1000000 reserve 100 disc
-ard
-wrl: dom 3 1 msec 10000 credit 1000000 reserve 100 disc
-ard
-IN 0x559fae71f250 20230120 17:40:53 RELEASE (3 )
-DESTROY watch 0x559fae73f630
-DESTROY watch 0x559fae75ddf0
-DESTROY watch 0x559fae75ec30
-DESTROY watch 0x559fae75ea60
-DESTROY watch 0x559fae732c00
-DESTROY watch 0x559fae72cea0
-DESTROY watch 0x559fae728fc0
-DESTROY watch 0x559fae729570
-DESTROY connection 0x559fae724290
-orphaned node /local/domain/3/device/suspend/event-channel deleted
-orphaned node /local/domain/3/device/vbd/51712 deleted
-orphaned node /local/domain/3/device/vkbd/0 deleted
-orphaned node /local/domain/3/device/vif/0 deleted
-orphaned node /local/domain/3/control/shutdown deleted
-orphaned node /local/domain/3/control/feature-poweroff deleted
-orphaned node /local/domain/3/control/feature-reboot deleted
-orphaned node /local/domain/3/control/feature-suspend deleted
-orphaned node /local/domain/3/control/feature-s3 deleted
-orphaned node /local/domain/3/control/feature-s4 deleted
-orphaned node /local/domain/3/control/sysrq deleted
-orphaned node /local/domain/3/data deleted
-orphaned node /local/domain/3/drivers deleted
-orphaned node /local/domain/3/feature deleted
-orphaned node /local/domain/3/attr deleted
-orphaned node /local/domain/3/error deleted
-orphaned node /local/domain/3/console/backend-id deleted
-
-and no further output.
-
-The trace shows that DESTROY was called for connection 0x559fae724290,
-but that is the same pointer (conn) main() was looping through from
-connections. So it wasn't actually removed from the connections list?
-
-Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling
-connection I/O" fixes the abort/double free. I think the use of
-list_for_each_entry_safe is incorrect. list_for_each_entry_safe makes
-traversal safe for deleting the current iterator, but RELEASE/do_release
-will delete some other entry in the connections list. I think the
-observed abort is because list_for_each_entry has next pointing to the
-deleted connection, and it is used in the subsequent iteration.
-
-Add a comment explaining the unsuitability of list_for_each_entry_safe.
-Also notice that the old code takes a reference on next which would
-prevent a use-after-free.
-
-This reverts commit e8e6e42279a5723239c5c40ba4c7f579a979465d.
-
-This is XSA-425/CVE-2022-42330.
-
-Fixes: e8e6e42279a5 ("tools/xenstore: simplify loop handling connection I/O")
-Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
----
- tools/xenstore/xenstored_core.c | 19 +++++++++++++++++--
- 1 file changed, 17 insertions(+), 2 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
-index 476d5c6d51..56dbdc2530 100644
---- a/tools/xenstore/xenstored_core.c
-+++ b/tools/xenstore/xenstored_core.c
-@@ -2935,8 +2935,23 @@ int main(int argc, char *argv[])
- }
- }
-
-- list_for_each_entry_safe(conn, next, &connections, list) {
-- talloc_increase_ref_count(conn);
-+ /*
-+ * list_for_each_entry_safe is not suitable here because
-+ * handle_input may delete entries besides the current one, but
-+ * those may be in the temporary next which would trigger a
-+ * use-after-free. list_for_each_entry_safe is only safe for
-+ * deleting the current entry.
-+ */
-+ next = list_entry(connections.next, typeof(*conn), list);
-+ if (&next->list != &connections)
-+ talloc_increase_ref_count(next);
-+ while (&next->list != &connections) {
-+ conn = next;
-+
-+ next = list_entry(conn->list.next,
-+ typeof(*conn), list);
-+ if (&next->list != &connections)
-+ talloc_increase_ref_count(next);
-
- if (conn_can_read(conn))
- handle_input(conn);
---
-2.40.0
-
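The point made in the revert's message is worth spelling out: list_for_each_entry_safe() only caches the next pointer, so it tolerates deleting the current entry but not an arbitrary other one, which is exactly what handle_input() can trigger via RELEASE/do_release(). The reverted-to walk instead takes a reference on next before handling the current entry. A toy, self-contained version of that walk (simplified refcounted list, not the Xen list.h/talloc machinery):

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy refcounted list; a stand-in for the connections list. */
    struct node {
        struct node *prev, *next;
        int id, refs, dead;
    };

    static struct node head = { &head, &head, -1, 1, 0 };

    static void put(struct node *n)
    {
        if ( n != &head && --n->refs == 0 )
            free(n);
    }

    static void add_node(int id)
    {
        struct node *n = malloc(sizeof(*n));

        n->id = id; n->refs = 1; n->dead = 0;
        n->prev = head.prev; n->next = &head;
        head.prev->next = n; head.prev = n;
    }

    /* Unlink an entry, like do_release() destroying some other connection. */
    static void unlink_node(struct node *n)
    {
        n->prev->next = n->next;
        n->next->prev = n->prev;
        n->dead = 1;
        put(n);
    }

    /* Handling the current entry may delete a *different* entry. */
    static void handle(struct node *n)
    {
        if ( n->id == 1 && n->next != &head )
            unlink_node(n->next);
    }

    int main(void)
    {
        struct node *cur, *next;

        for ( int i = 1; i <= 3; i++ )
            add_node(i);

        /* Hold a reference on next before handling cur, so the walk stays
         * valid even when handle() removes an entry other than cur. */
        next = head.next;
        if ( next != &head )
            next->refs++;
        while ( next != &head )
        {
            cur = next;
            next = cur->next;
            if ( next != &head )
                next->refs++;
            if ( !cur->dead )
                handle(cur);
            put(cur);
        }
        puts("walk finished without touching freed memory");
        return 0;
    }
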
diff --git a/0024-libxl-slightly-correct-JSON-generation-of-CPU-policy.patch b/0024-libxl-slightly-correct-JSON-generation-of-CPU-policy.patch
new file mode 100644
index 0000000..1ec7335
--- /dev/null
+++ b/0024-libxl-slightly-correct-JSON-generation-of-CPU-policy.patch
@@ -0,0 +1,38 @@
+From 042982297802e7b746dc2fac95a453cc88d0aa83 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 5 Sep 2023 08:52:15 +0200
+Subject: [PATCH 24/55] libxl: slightly correct JSON generation of CPU policy
+
+The "cpuid_empty" label is also (in principle; maybe only for rubbish
+input) reachable in the "cpuid_only" case. Hence the label needs to live
+ahead of the check of the variable.
+
+Fixes: 5b80cecb747b ("libxl: introduce MSR data in libxl_cpuid_policy")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: ebce4e3a146c39e57bb7a890e059e89c32b6d547
+master date: 2023-08-17 16:24:17 +0200
+---
+ tools/libs/light/libxl_cpuid.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
+index 849722541c..5c66d094b2 100644
+--- a/tools/libs/light/libxl_cpuid.c
++++ b/tools/libs/light/libxl_cpuid.c
+@@ -710,10 +710,11 @@ parse_cpuid:
+ libxl__strdup(NOGC, libxl__json_object_get_string(r));
+ }
+ }
++
++cpuid_empty:
+ if (cpuid_only)
+ return 0;
+
+-cpuid_empty:
+ co = libxl__json_map_get("msr", o, JSON_ARRAY);
+ if (!libxl__json_object_is_array(co))
+ return ERROR_FAIL;
+--
+2.42.0
+
diff --git a/0025-tboot-Disable-CET-at-shutdown.patch b/0025-tboot-Disable-CET-at-shutdown.patch
new file mode 100644
index 0000000..f06db61
--- /dev/null
+++ b/0025-tboot-Disable-CET-at-shutdown.patch
@@ -0,0 +1,53 @@
+From 7ca58fbef489fcb17631872a2bdc929823a2a494 Mon Sep 17 00:00:00 2001
+From: Jason Andryuk <jandryuk@gmail.com>
+Date: Tue, 5 Sep 2023 08:52:33 +0200
+Subject: [PATCH 25/55] tboot: Disable CET at shutdown
+
+tboot_shutdown() calls into tboot to perform the actual system shutdown.
+tboot isn't built with endbr annotations, and Xen has CET-IBT enabled on
+newer hardware. shutdown_entry isn't annotated with endbr and Xen
+faults:
+
+Panic on CPU 0:
+CONTROL-FLOW PROTECTION FAULT: #CP[0003] endbranch
+
+And Xen hangs at this point.
+
+Disabling CET-IBT let Xen and tboot power off, but reboot was
+performing a poweroff instead of a warm reboot. Disabling all of CET,
+i.e. shadow stacks as well, lets tboot reboot properly.
+
+Fixes: cdbe2b0a1aec ("x86: Enable CET Indirect Branch Tracking")
+Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
+master commit: 0801868f550539d417d46f82c49307480947ccaa
+master date: 2023-08-17 16:24:49 +0200
+---
+ xen/arch/x86/tboot.c | 10 ++++++++++
+ 1 file changed, 10 insertions(+)
+
+diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
+index fe1abfdf08..a2e9e97ed7 100644
+--- a/xen/arch/x86/tboot.c
++++ b/xen/arch/x86/tboot.c
+@@ -398,6 +398,16 @@ void tboot_shutdown(uint32_t shutdown_type)
+ tboot_gen_xenheap_integrity(g_tboot_shared->s3_key, &xenheap_mac);
+ }
+
++ /*
++ * Disable CET - tboot may not be built with endbr, and it doesn't support
++ * shadow stacks.
++ */
++ if ( read_cr4() & X86_CR4_CET )
++ {
++ wrmsrl(MSR_S_CET, 0);
++ write_cr4(read_cr4() & ~X86_CR4_CET);
++ }
++
+ /*
+ * During early boot, we can be called by panic before idle_vcpu[0] is
+ * setup, but in that case we don't need to change page tables.
+--
+2.42.0
+
diff --git a/0025-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch b/0025-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
deleted file mode 100644
index 142280f..0000000
--- a/0025-x86-S3-Restore-Xen-s-MSR_PAT-value-on-S3-resume.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From a470a83c36c07b56d90957ae1e6e9ebc458d3686 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 7 Feb 2023 16:56:14 +0100
-Subject: [PATCH 25/89] x86/S3: Restore Xen's MSR_PAT value on S3 resume
-
-There are two paths in the trampoline, and Xen's PAT needs setting up in both,
-not just the boot path.
-
-Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 4d975798e11579fdf405b348543061129e01b0fb
-master date: 2023-01-10 21:21:30 +0000
----
- xen/arch/x86/boot/wakeup.S | 5 +++++
- 1 file changed, 5 insertions(+)
-
-diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
-index c17d613b61..08447e1934 100644
---- a/xen/arch/x86/boot/wakeup.S
-+++ b/xen/arch/x86/boot/wakeup.S
-@@ -130,6 +130,11 @@ wakeup_32:
- and %edi, %edx
- wrmsr
- 1:
-+ /* Set up PAT before enabling paging. */
-+ mov $XEN_MSR_PAT & 0xffffffff, %eax
-+ mov $XEN_MSR_PAT >> 32, %edx
-+ mov $MSR_IA32_CR_PAT, %ecx
-+ wrmsr
-
- /* Set up EFER (Extended Feature Enable Register). */
- movl $MSR_EFER,%ecx
---
-2.40.0
-
diff --git a/0026-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch b/0026-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
deleted file mode 100644
index 5d937d5..0000000
--- a/0026-tools-Fix-build-with-recent-QEMU-use-enable-trace-ba.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From 1d7a388e7b9711cbd7e14b2020b168b6789772af Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 7 Feb 2023 16:57:22 +0100
-Subject: [PATCH 26/89] tools: Fix build with recent QEMU, use
- "--enable-trace-backends"
-
-The configure option "--enable-trace-backend" isn't accepted anymore
-and we should use "--enable-trace-backends" instead which was
-introduced in 2014 and allows multiple backends.
-
-"--enable-trace-backends" was introduced by:
- 5b808275f3bb ("trace: Multi-backend tracing")
-The backward compatible option "--enable-trace-backend" is removed by
- 10229ec3b0ff ("configure: remove backwards-compatibility and obsolete options")
-
-As we already use ./configure options that wouldn't be accepted by
-older version of QEMU's configure, we will simply use the new spelling
-for the option and avoid trying to detect which spelling to use.
-
-We already make use of "--firmwarepath=" which was introduced by
- 3d5eecab4a5a ("Add --firmwarepath to configure")
-which already includes the new spelling for "--enable-trace-backends".
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
-master commit: e66d450b6e0ffec635639df993ab43ce28b3383f
-master date: 2023-01-11 10:45:29 +0100
----
- tools/Makefile | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/tools/Makefile b/tools/Makefile
-index 9e28027835..4906fdbc23 100644
---- a/tools/Makefile
-+++ b/tools/Makefile
-@@ -218,9 +218,9 @@ subdir-all-qemu-xen-dir: qemu-xen-dir-find
- mkdir -p qemu-xen-build; \
- cd qemu-xen-build; \
- if $$source/scripts/tracetool.py --check-backend --backend log ; then \
-- enable_trace_backend='--enable-trace-backend=log'; \
-+ enable_trace_backend="--enable-trace-backends=log"; \
- elif $$source/scripts/tracetool.py --check-backend --backend stderr ; then \
-- enable_trace_backend='--enable-trace-backend=stderr'; \
-+ enable_trace_backend='--enable-trace-backends=stderr'; \
- else \
- enable_trace_backend='' ; \
- fi ; \
---
-2.40.0
-
diff --git a/0026-x86-svm-Fix-valid-condition-in-svm_get_pending_event.patch b/0026-x86-svm-Fix-valid-condition-in-svm_get_pending_event.patch
new file mode 100644
index 0000000..10aa14f
--- /dev/null
+++ b/0026-x86-svm-Fix-valid-condition-in-svm_get_pending_event.patch
@@ -0,0 +1,29 @@
+From a939e953cdd522da3d8f0efeaea84448b5b570f9 Mon Sep 17 00:00:00 2001
+From: Jinoh Kang <jinoh.kang.kr@gmail.com>
+Date: Tue, 5 Sep 2023 08:53:01 +0200
+Subject: [PATCH 26/55] x86/svm: Fix valid condition in svm_get_pending_event()
+
+Fixes: 9864841914c2 ("x86/vm_event: add support for VM_EVENT_REASON_INTERRUPT")
+Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
+master commit: b2865c2b6f164d2c379177cdd1cb200e4eaba549
+master date: 2023-08-18 20:21:44 +0100
+---
+ xen/arch/x86/hvm/svm/svm.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
+index 5fa945c526..e8f50e7c5e 100644
+--- a/xen/arch/x86/hvm/svm/svm.c
++++ b/xen/arch/x86/hvm/svm/svm.c
+@@ -2490,7 +2490,7 @@ static bool cf_check svm_get_pending_event(
+ {
+ const struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
+
+- if ( vmcb->event_inj.v )
++ if ( !vmcb->event_inj.v )
+ return false;
+
+ info->vector = vmcb->event_inj.vector;
+--
+2.42.0
+
diff --git a/0027-include-compat-produce-stubs-for-headers-not-otherwi.patch b/0027-include-compat-produce-stubs-for-headers-not-otherwi.patch
deleted file mode 100644
index 3528bd6..0000000
--- a/0027-include-compat-produce-stubs-for-headers-not-otherwi.patch
+++ /dev/null
@@ -1,74 +0,0 @@
-From c871e05e138aae2ac75e9b4ccebe6cf3fd1a775b Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Feb 2023 16:57:52 +0100
-Subject: [PATCH 27/89] include/compat: produce stubs for headers not otherwise
- generated
-
-Public headers can include other public headers. Such interdependencies
-are retained in their compat counterparts. Since some compat headers are
-generated only in certain configurations, the referenced headers still
-need to exist. The lack thereof was observed with hvm/hvm_op.h needing
-trace.h, where generation of the latter depends on TRACEBUFFER=y. Make
-empty stubs in such cases (as generating the extra headers is relatively
-slow and hence better to avoid). Changes to .config and incrementally
-(re-)building is covered by the respective .*.cmd then no longer
-matching the command to be used, resulting in the necessary re-creation
-of the (possibly stub) header.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 6bec713f871f21c6254a5783c1e39867ea828256
-master date: 2023-01-12 16:17:54 +0100
----
- xen/include/Makefile | 14 +++++++++++++-
- 1 file changed, 13 insertions(+), 1 deletion(-)
-
-diff --git a/xen/include/Makefile b/xen/include/Makefile
-index 65be310eca..cfd7851614 100644
---- a/xen/include/Makefile
-+++ b/xen/include/Makefile
-@@ -34,6 +34,8 @@ headers-$(CONFIG_TRACEBUFFER) += compat/trace.h
- headers-$(CONFIG_XENOPROF) += compat/xenoprof.h
- headers-$(CONFIG_XSM_FLASK) += compat/xsm/flask_op.h
-
-+headers-n := $(filter-out $(headers-y),$(headers-n) $(headers-))
-+
- cppflags-y := -include public/xen-compat.h -DXEN_GENERATING_COMPAT_HEADERS
- cppflags-$(CONFIG_X86) += -m32
-
-@@ -43,13 +45,16 @@ public-$(CONFIG_X86) := $(wildcard $(srcdir)/public/arch-x86/*.h $(srcdir)/publi
- public-$(CONFIG_ARM) := $(wildcard $(srcdir)/public/arch-arm/*.h $(srcdir)/public/arch-arm/*/*.h)
-
- .PHONY: all
--all: $(addprefix $(obj)/,$(headers-y))
-+all: $(addprefix $(obj)/,$(headers-y) $(headers-n))
-
- quiet_cmd_compat_h = GEN $@
- cmd_compat_h = \
- $(PYTHON) $(srctree)/tools/compat-build-header.py <$< $(patsubst $(obj)/%,%,$@) >>$@.new; \
- mv -f $@.new $@
-
-+quiet_cmd_stub_h = GEN $@
-+cmd_stub_h = echo '/* empty */' >$@
-+
- quiet_cmd_compat_i = CPP $@
- cmd_compat_i = $(CPP) $(filter-out -Wa$(comma)% -include %/include/xen/config.h,$(XEN_CFLAGS)) $(cppflags-y) -o $@ $<
-
-@@ -69,6 +74,13 @@ targets += $(headers-y)
- $(obj)/compat/%.h: $(obj)/compat/%.i $(srctree)/tools/compat-build-header.py FORCE
- $(call if_changed,compat_h)
-
-+# Placeholders may be needed in case files in $(headers-y) include files we
-+# don't otherwise generate. Real dependencies would need spelling out explicitly,
-+# for them to appear in $(headers-y) instead.
-+targets += $(headers-n)
-+$(addprefix $(obj)/,$(headers-n)): FORCE
-+ $(call if_changed,stub_h)
-+
- .PRECIOUS: $(obj)/compat/%.i
- targets += $(patsubst %.h, %.i, $(headers-y))
- $(obj)/compat/%.i: $(obj)/compat/%.c FORCE
---
-2.40.0
-
diff --git a/0027-x86-vmx-Revert-x86-VMX-sanitize-rIP-before-re-enteri.patch b/0027-x86-vmx-Revert-x86-VMX-sanitize-rIP-before-re-enteri.patch
new file mode 100644
index 0000000..a022066
--- /dev/null
+++ b/0027-x86-vmx-Revert-x86-VMX-sanitize-rIP-before-re-enteri.patch
@@ -0,0 +1,100 @@
+From 8be85d8c0df2445c012fac42117396b483db5db0 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 5 Sep 2023 08:53:31 +0200
+Subject: [PATCH 27/55] x86/vmx: Revert "x86/VMX: sanitize rIP before
+ re-entering guest"
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+At the time of XSA-170, the x86 instruction emulator was genuinely broken. It
+would load arbitrary values into %rip and putting a check here probably was
+the best stopgap security fix. It should have been reverted following c/s
+81d3a0b26c1 "x86emul: limit-check branch targets" which corrected the emulator
+behaviour.
+
+However, everyone involved in XSA-170, myself included, failed to read the SDM
+correctly. On the subject of %rip consistency checks, the SDM stated:
+
+ If the processor supports N < 64 linear-address bits, bits 63:N must be
+ identical
+
+A non-canonical %rip (and SSP more recently) is an explicitly legal state in
+x86, and the VMEntry consistency checks are intentionally off-by-one from a
+regular canonical check.
+
+The consequence of this bug is that Xen will currently take a legal x86 state
+which would successfully VMEnter, and corrupt it into having non-architectural
+behaviour.
+
+Furthermore, in the time this bugfix has been pending in public, I
+successfully persuaded Intel to clarify the SDM, adding the following
+clarification:
+
+ The guest RIP value is not required to be canonical; the value of bit N-1
+ may differ from that of bit N.
+
+Fixes: ffbbfda377 ("x86/VMX: sanitize rIP before re-entering guest")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 10c83bb0f5d158d101d983883741b76f927e54a3
+master date: 2023-08-23 18:44:59 +0100
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 34 +---------------------------------
+ 1 file changed, 1 insertion(+), 33 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index f256dc2635..072288a5ef 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -3975,7 +3975,7 @@ static void undo_nmis_unblocked_by_iret(void)
+ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ {
+ unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
+- unsigned int vector = 0, mode;
++ unsigned int vector = 0;
+ struct vcpu *v = current;
+ struct domain *currd = v->domain;
+
+@@ -4650,38 +4650,6 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ out:
+ if ( nestedhvm_vcpu_in_guestmode(v) )
+ nvmx_idtv_handling();
+-
+- /*
+- * VM entry will fail (causing the guest to get crashed) if rIP (and
+- * rFLAGS, but we don't have an issue there) doesn't meet certain
+- * criteria. As we must not allow less than fully privileged mode to have
+- * such an effect on the domain, we correct rIP in that case (accepting
+- * this not being architecturally correct behavior, as the injected #GP
+- * fault will then not see the correct [invalid] return address).
+- * And since we know the guest will crash, we crash it right away if it
+- * already is in most privileged mode.
+- */
+- mode = vmx_guest_x86_mode(v);
+- if ( mode == 8 ? !is_canonical_address(regs->rip)
+- : regs->rip != regs->eip )
+- {
+- gprintk(XENLOG_WARNING, "Bad rIP %lx for mode %u\n", regs->rip, mode);
+-
+- if ( vmx_get_cpl() )
+- {
+- __vmread(VM_ENTRY_INTR_INFO, &intr_info);
+- if ( !(intr_info & INTR_INFO_VALID_MASK) )
+- hvm_inject_hw_exception(TRAP_gp_fault, 0);
+- /* Need to fix rIP nevertheless. */
+- if ( mode == 8 )
+- regs->rip = (long)(regs->rip << (64 - VADDR_BITS)) >>
+- (64 - VADDR_BITS);
+- else
+- regs->rip = regs->eip;
+- }
+- else
+- domain_crash(v->domain);
+- }
+ }
+
+ static void lbr_tsx_fixup(void)
+--
+2.42.0
+
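
As a side note on the off-by-one the revert above hinges on, the difference
between a regular canonical check and the VMEntry consistency check can be
shown with a minimal standalone sketch (assuming 48 implemented
linear-address bits; the helper names are illustrative, not Xen's):

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define VADDR_BITS 48 /* assumed number of implemented linear-address bits */

    /* Regular canonical check: bits 63:(VADDR_BITS-1) must all be equal. */
    static bool is_canonical(uint64_t addr)
    {
        uint64_t high = addr >> (VADDR_BITS - 1);           /* bits 63:47 */

        return high == 0 || high == (UINT64_MAX >> (VADDR_BITS - 1));
    }

    /* VMEntry consistency check per the clarified SDM wording: only bits
     * 63:VADDR_BITS must be identical; bit VADDR_BITS-1 may differ. */
    static bool vmentry_rip_ok(uint64_t rip)
    {
        uint64_t high = rip >> VADDR_BITS;                  /* bits 63:48 */

        return high == 0 || high == (UINT64_MAX >> VADDR_BITS);
    }

    int main(void)
    {
        uint64_t rip = 1ULL << 47;   /* bit 47 set, bits 63:48 clear */

        printf("canonical:        %d\n", is_canonical(rip));
        printf("VMEntry accepted: %d\n", vmentry_rip_ok(rip));
        return 0;
    }

An address such as 1 << 47 fails the canonical check yet passes the VMEntry
check, which is exactly the legal guest state the reverted code used to
corrupt.
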
diff --git a/0028-x86-irq-fix-reporting-of-spurious-i8259-interrupts.patch b/0028-x86-irq-fix-reporting-of-spurious-i8259-interrupts.patch
new file mode 100644
index 0000000..2fcfd68
--- /dev/null
+++ b/0028-x86-irq-fix-reporting-of-spurious-i8259-interrupts.patch
@@ -0,0 +1,41 @@
+From 699de512748d8e3bdcb3225b3b2a77c10cfd2408 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Sep 2023 08:53:57 +0200
+Subject: [PATCH 28/55] x86/irq: fix reporting of spurious i8259 interrupts
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The return value of bogus_8259A_irq() is wrong: the function will
+return `true` when the IRQ is real and `false` when it's a spurious
+IRQ. This causes the "No irq handler for vector ..." message in
+do_IRQ() to be printed for spurious i8259 interrupts which is not
+intended (and not helpful).
+
+Fix by inverting the return value of bogus_8259A_irq().
+
+Fixes: 132906348a14 ('x86/i8259: Handle bogus spurious interrupts more quietly')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 709f6c8ce6422475c372e67507606170a31ccb65
+master date: 2023-08-30 10:03:53 +0200
+---
+ xen/arch/x86/i8259.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/i8259.c b/xen/arch/x86/i8259.c
+index 6b35be10f0..ed9f55abe5 100644
+--- a/xen/arch/x86/i8259.c
++++ b/xen/arch/x86/i8259.c
+@@ -37,7 +37,7 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq);
+
+ bool bogus_8259A_irq(unsigned int irq)
+ {
+- return _mask_and_ack_8259A_irq(irq);
++ return !_mask_and_ack_8259A_irq(irq);
+ }
+
+ static void cf_check mask_and_ack_8259A_irq(struct irq_desc *desc)
+--
+2.42.0
+
diff --git a/0028-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch b/0028-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
deleted file mode 100644
index 8185bee..0000000
--- a/0028-x86-vmx-Calculate-model-specific-LBRs-once-at-start-.patch
+++ /dev/null
@@ -1,342 +0,0 @@
-From 5e3250258afbace3e5dc3f31ac99c1eebf60f238 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 7 Feb 2023 16:58:25 +0100
-Subject: [PATCH 28/89] x86/vmx: Calculate model-specific LBRs once at start of
- day
-
-There is no point repeating this calculation at runtime, especially as it is
-in the fallback path of the WRSMR/RDMSR handlers.
-
-Move the infrastructure higher in vmx.c to avoid forward declarations,
-renaming last_branch_msr_get() to get_model_specific_lbr() to highlight that
-these are model-specific only.
-
-No practical change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: e94af0d58f86c3a914b9cbbf4d9ed3d43b974771
-master date: 2023-01-12 18:42:00 +0000
----
- xen/arch/x86/hvm/vmx/vmx.c | 276 +++++++++++++++++++------------------
- 1 file changed, 139 insertions(+), 137 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index 7c81b80710..ad91464103 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -396,6 +396,142 @@ void vmx_pi_hooks_deassign(struct domain *d)
- domain_unpause(d);
- }
-
-+static const struct lbr_info {
-+ u32 base, count;
-+} p4_lbr[] = {
-+ { MSR_P4_LER_FROM_LIP, 1 },
-+ { MSR_P4_LER_TO_LIP, 1 },
-+ { MSR_P4_LASTBRANCH_TOS, 1 },
-+ { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-+ { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-+ { 0, 0 }
-+}, c2_lbr[] = {
-+ { MSR_IA32_LASTINTFROMIP, 1 },
-+ { MSR_IA32_LASTINTTOIP, 1 },
-+ { MSR_C2_LASTBRANCH_TOS, 1 },
-+ { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
-+ { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
-+ { 0, 0 }
-+}, nh_lbr[] = {
-+ { MSR_IA32_LASTINTFROMIP, 1 },
-+ { MSR_IA32_LASTINTTOIP, 1 },
-+ { MSR_NHL_LBR_SELECT, 1 },
-+ { MSR_NHL_LASTBRANCH_TOS, 1 },
-+ { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-+ { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-+ { 0, 0 }
-+}, sk_lbr[] = {
-+ { MSR_IA32_LASTINTFROMIP, 1 },
-+ { MSR_IA32_LASTINTTOIP, 1 },
-+ { MSR_NHL_LBR_SELECT, 1 },
-+ { MSR_NHL_LASTBRANCH_TOS, 1 },
-+ { MSR_SKL_LASTBRANCH_0_FROM_IP, NUM_MSR_SKL_LASTBRANCH },
-+ { MSR_SKL_LASTBRANCH_0_TO_IP, NUM_MSR_SKL_LASTBRANCH },
-+ { MSR_SKL_LASTBRANCH_0_INFO, NUM_MSR_SKL_LASTBRANCH },
-+ { 0, 0 }
-+}, at_lbr[] = {
-+ { MSR_IA32_LASTINTFROMIP, 1 },
-+ { MSR_IA32_LASTINTTOIP, 1 },
-+ { MSR_C2_LASTBRANCH_TOS, 1 },
-+ { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-+ { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-+ { 0, 0 }
-+}, sm_lbr[] = {
-+ { MSR_IA32_LASTINTFROMIP, 1 },
-+ { MSR_IA32_LASTINTTOIP, 1 },
-+ { MSR_SM_LBR_SELECT, 1 },
-+ { MSR_SM_LASTBRANCH_TOS, 1 },
-+ { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-+ { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-+ { 0, 0 }
-+}, gm_lbr[] = {
-+ { MSR_IA32_LASTINTFROMIP, 1 },
-+ { MSR_IA32_LASTINTTOIP, 1 },
-+ { MSR_SM_LBR_SELECT, 1 },
-+ { MSR_SM_LASTBRANCH_TOS, 1 },
-+ { MSR_GM_LASTBRANCH_0_FROM_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
-+ { MSR_GM_LASTBRANCH_0_TO_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
-+ { 0, 0 }
-+};
-+static const struct lbr_info *__ro_after_init model_specific_lbr;
-+
-+static const struct lbr_info *__init get_model_specific_lbr(void)
-+{
-+ switch ( boot_cpu_data.x86 )
-+ {
-+ case 6:
-+ switch ( boot_cpu_data.x86_model )
-+ {
-+ /* Core2 Duo */
-+ case 0x0f:
-+ /* Enhanced Core */
-+ case 0x17:
-+ /* Xeon 7400 */
-+ case 0x1d:
-+ return c2_lbr;
-+ /* Nehalem */
-+ case 0x1a: case 0x1e: case 0x1f: case 0x2e:
-+ /* Westmere */
-+ case 0x25: case 0x2c: case 0x2f:
-+ /* Sandy Bridge */
-+ case 0x2a: case 0x2d:
-+ /* Ivy Bridge */
-+ case 0x3a: case 0x3e:
-+ /* Haswell */
-+ case 0x3c: case 0x3f: case 0x45: case 0x46:
-+ /* Broadwell */
-+ case 0x3d: case 0x47: case 0x4f: case 0x56:
-+ return nh_lbr;
-+ /* Skylake */
-+ case 0x4e: case 0x5e:
-+ /* Xeon Scalable */
-+ case 0x55:
-+ /* Cannon Lake */
-+ case 0x66:
-+ /* Goldmont Plus */
-+ case 0x7a:
-+ /* Ice Lake */
-+ case 0x6a: case 0x6c: case 0x7d: case 0x7e:
-+ /* Tiger Lake */
-+ case 0x8c: case 0x8d:
-+ /* Tremont */
-+ case 0x86:
-+ /* Kaby Lake */
-+ case 0x8e: case 0x9e:
-+ /* Comet Lake */
-+ case 0xa5: case 0xa6:
-+ return sk_lbr;
-+ /* Atom */
-+ case 0x1c: case 0x26: case 0x27: case 0x35: case 0x36:
-+ return at_lbr;
-+ /* Silvermont */
-+ case 0x37: case 0x4a: case 0x4d: case 0x5a: case 0x5d:
-+ /* Xeon Phi Knights Landing */
-+ case 0x57:
-+ /* Xeon Phi Knights Mill */
-+ case 0x85:
-+ /* Airmont */
-+ case 0x4c:
-+ return sm_lbr;
-+ /* Goldmont */
-+ case 0x5c: case 0x5f:
-+ return gm_lbr;
-+ }
-+ break;
-+
-+ case 15:
-+ switch ( boot_cpu_data.x86_model )
-+ {
-+ /* Pentium4/Xeon with em64t */
-+ case 3: case 4: case 6:
-+ return p4_lbr;
-+ }
-+ break;
-+ }
-+
-+ return NULL;
-+}
-+
- static int cf_check vmx_domain_initialise(struct domain *d)
- {
- static const struct arch_csw csw = {
-@@ -2837,6 +2973,7 @@ const struct hvm_function_table * __init start_vmx(void)
- vmx_function_table.tsc_scaling.setup = vmx_setup_tsc_scaling;
- }
-
-+ model_specific_lbr = get_model_specific_lbr();
- lbr_tsx_fixup_check();
- ler_to_fixup_check();
-
-@@ -2983,141 +3120,6 @@ static int vmx_cr_access(cr_access_qual_t qual)
- return X86EMUL_OKAY;
- }
-
--static const struct lbr_info {
-- u32 base, count;
--} p4_lbr[] = {
-- { MSR_P4_LER_FROM_LIP, 1 },
-- { MSR_P4_LER_TO_LIP, 1 },
-- { MSR_P4_LASTBRANCH_TOS, 1 },
-- { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-- { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-- { 0, 0 }
--}, c2_lbr[] = {
-- { MSR_IA32_LASTINTFROMIP, 1 },
-- { MSR_IA32_LASTINTTOIP, 1 },
-- { MSR_C2_LASTBRANCH_TOS, 1 },
-- { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
-- { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_C2_LASTBRANCH_FROM_TO },
-- { 0, 0 }
--}, nh_lbr[] = {
-- { MSR_IA32_LASTINTFROMIP, 1 },
-- { MSR_IA32_LASTINTTOIP, 1 },
-- { MSR_NHL_LBR_SELECT, 1 },
-- { MSR_NHL_LASTBRANCH_TOS, 1 },
-- { MSR_P4_LASTBRANCH_0_FROM_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-- { MSR_P4_LASTBRANCH_0_TO_LIP, NUM_MSR_P4_LASTBRANCH_FROM_TO },
-- { 0, 0 }
--}, sk_lbr[] = {
-- { MSR_IA32_LASTINTFROMIP, 1 },
-- { MSR_IA32_LASTINTTOIP, 1 },
-- { MSR_NHL_LBR_SELECT, 1 },
-- { MSR_NHL_LASTBRANCH_TOS, 1 },
-- { MSR_SKL_LASTBRANCH_0_FROM_IP, NUM_MSR_SKL_LASTBRANCH },
-- { MSR_SKL_LASTBRANCH_0_TO_IP, NUM_MSR_SKL_LASTBRANCH },
-- { MSR_SKL_LASTBRANCH_0_INFO, NUM_MSR_SKL_LASTBRANCH },
-- { 0, 0 }
--}, at_lbr[] = {
-- { MSR_IA32_LASTINTFROMIP, 1 },
-- { MSR_IA32_LASTINTTOIP, 1 },
-- { MSR_C2_LASTBRANCH_TOS, 1 },
-- { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-- { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-- { 0, 0 }
--}, sm_lbr[] = {
-- { MSR_IA32_LASTINTFROMIP, 1 },
-- { MSR_IA32_LASTINTTOIP, 1 },
-- { MSR_SM_LBR_SELECT, 1 },
-- { MSR_SM_LASTBRANCH_TOS, 1 },
-- { MSR_C2_LASTBRANCH_0_FROM_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-- { MSR_C2_LASTBRANCH_0_TO_IP, NUM_MSR_ATOM_LASTBRANCH_FROM_TO },
-- { 0, 0 }
--}, gm_lbr[] = {
-- { MSR_IA32_LASTINTFROMIP, 1 },
-- { MSR_IA32_LASTINTTOIP, 1 },
-- { MSR_SM_LBR_SELECT, 1 },
-- { MSR_SM_LASTBRANCH_TOS, 1 },
-- { MSR_GM_LASTBRANCH_0_FROM_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
-- { MSR_GM_LASTBRANCH_0_TO_IP, NUM_MSR_GM_LASTBRANCH_FROM_TO },
-- { 0, 0 }
--};
--
--static const struct lbr_info *last_branch_msr_get(void)
--{
-- switch ( boot_cpu_data.x86 )
-- {
-- case 6:
-- switch ( boot_cpu_data.x86_model )
-- {
-- /* Core2 Duo */
-- case 0x0f:
-- /* Enhanced Core */
-- case 0x17:
-- /* Xeon 7400 */
-- case 0x1d:
-- return c2_lbr;
-- /* Nehalem */
-- case 0x1a: case 0x1e: case 0x1f: case 0x2e:
-- /* Westmere */
-- case 0x25: case 0x2c: case 0x2f:
-- /* Sandy Bridge */
-- case 0x2a: case 0x2d:
-- /* Ivy Bridge */
-- case 0x3a: case 0x3e:
-- /* Haswell */
-- case 0x3c: case 0x3f: case 0x45: case 0x46:
-- /* Broadwell */
-- case 0x3d: case 0x47: case 0x4f: case 0x56:
-- return nh_lbr;
-- /* Skylake */
-- case 0x4e: case 0x5e:
-- /* Xeon Scalable */
-- case 0x55:
-- /* Cannon Lake */
-- case 0x66:
-- /* Goldmont Plus */
-- case 0x7a:
-- /* Ice Lake */
-- case 0x6a: case 0x6c: case 0x7d: case 0x7e:
-- /* Tiger Lake */
-- case 0x8c: case 0x8d:
-- /* Tremont */
-- case 0x86:
-- /* Kaby Lake */
-- case 0x8e: case 0x9e:
-- /* Comet Lake */
-- case 0xa5: case 0xa6:
-- return sk_lbr;
-- /* Atom */
-- case 0x1c: case 0x26: case 0x27: case 0x35: case 0x36:
-- return at_lbr;
-- /* Silvermont */
-- case 0x37: case 0x4a: case 0x4d: case 0x5a: case 0x5d:
-- /* Xeon Phi Knights Landing */
-- case 0x57:
-- /* Xeon Phi Knights Mill */
-- case 0x85:
-- /* Airmont */
-- case 0x4c:
-- return sm_lbr;
-- /* Goldmont */
-- case 0x5c: case 0x5f:
-- return gm_lbr;
-- }
-- break;
--
-- case 15:
-- switch ( boot_cpu_data.x86_model )
-- {
-- /* Pentium4/Xeon with em64t */
-- case 3: case 4: case 6:
-- return p4_lbr;
-- }
-- break;
-- }
--
-- return NULL;
--}
--
- enum
- {
- LBR_FORMAT_32 = 0x0, /* 32-bit record format */
-@@ -3224,7 +3226,7 @@ static void __init ler_to_fixup_check(void)
-
- static int is_last_branch_msr(u32 ecx)
- {
-- const struct lbr_info *lbr = last_branch_msr_get();
-+ const struct lbr_info *lbr = model_specific_lbr;
-
- if ( lbr == NULL )
- return 0;
-@@ -3563,7 +3565,7 @@ static int cf_check vmx_msr_write_intercept(
- if ( !(v->arch.hvm.vmx.lbr_flags & LBR_MSRS_INSERTED) &&
- (msr_content & IA32_DEBUGCTLMSR_LBR) )
- {
-- const struct lbr_info *lbr = last_branch_msr_get();
-+ const struct lbr_info *lbr = model_specific_lbr;
-
- if ( unlikely(!lbr) )
- {
---
-2.40.0
-
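
The refactoring above boils down to the usual compute-once-at-init pattern:
resolve the model-dependent table a single time during bring-up, cache it in
a pointer that is never written afterwards, and let the hot paths merely read
it. A rough standalone sketch (illustrative names and a trimmed-down lookup,
not the vmx.c code):

    #include <stdio.h>

    struct lbr_info { unsigned int base, count; };

    static const struct lbr_info example_lbr[] = {
        { 0x1db, 1 }, { 0x1dc, 1 }, { 0, 0 },
    };

    /* Filled exactly once at start of day; read-only afterwards (Xen marks
     * the real pointer __ro_after_init). */
    static const struct lbr_info *model_specific_lbr;

    static const struct lbr_info *get_model_specific_lbr(unsigned int family,
                                                         unsigned int model)
    {
        /* Model lookup trimmed down to a single illustrative case. */
        return (family == 6 && model == 0x0f) ? example_lbr : NULL;
    }

    static int is_last_branch_msr(unsigned int msr)
    {
        /* Hot path: no recomputation, just walk the cached table. */
        for ( const struct lbr_info *lbr = model_specific_lbr;
              lbr && lbr->count; lbr++ )
            if ( msr >= lbr->base && msr < lbr->base + lbr->count )
                return 1;
        return 0;
    }

    int main(void)
    {
        model_specific_lbr = get_model_specific_lbr(6, 0x0f); /* "start_vmx()" */

        printf("0x1db is LBR MSR: %d\n", is_last_branch_msr(0x1db));
        printf("0x200 is LBR MSR: %d\n", is_last_branch_msr(0x200));
        return 0;
    }
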
diff --git a/0029-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch b/0029-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
deleted file mode 100644
index 2f87b83..0000000
--- a/0029-x86-vmx-Support-for-CPUs-without-model-specific-LBR.patch
+++ /dev/null
@@ -1,83 +0,0 @@
-From e904d8ae01a0be53368c8c388f13bf4ffcbcdf6c Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 7 Feb 2023 16:59:14 +0100
-Subject: [PATCH 29/89] x86/vmx: Support for CPUs without model-specific LBR
-
-Ice Lake (server at least) has both architectural LBR and model-specific LBR.
-Sapphire Rapids does not have model-specific LBR at all. I.e. On SPR and
-later, model_specific_lbr will always be NULL, so we must make changes to
-avoid reliably hitting the domain_crash().
-
-The Arch LBR spec states that CPUs without model-specific LBR implement
-MSR_DBG_CTL.LBR by discarding writes and always returning 0.
-
-Do this for any CPU for which we lack model-specific LBR information.
-
-Adjust the now-stale comment, now that the Arch LBR spec has created a way to
-signal "no model specific LBR" to guests.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: 3edca52ce736297d7fcf293860cd94ef62638052
-master date: 2023-01-12 18:42:00 +0000
----
- xen/arch/x86/hvm/vmx/vmx.c | 31 ++++++++++++++++---------------
- 1 file changed, 16 insertions(+), 15 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index ad91464103..861f91f2af 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -3545,18 +3545,26 @@ static int cf_check vmx_msr_write_intercept(
- if ( msr_content & rsvd )
- goto gp_fault;
-
-+ /*
-+ * The Arch LBR spec (new in Ice Lake) states that CPUs with no
-+ * model-specific LBRs implement MSR_DBG_CTL.LBR by discarding writes
-+ * and always returning 0.
-+ *
-+ * Use this property in all cases where we don't know any
-+ * model-specific LBR information, as it matches real hardware
-+ * behaviour on post-Ice Lake systems.
-+ */
-+ if ( !model_specific_lbr )
-+ msr_content &= ~IA32_DEBUGCTLMSR_LBR;
-+
- /*
- * When a guest first enables LBR, arrange to save and restore the LBR
- * MSRs and allow the guest direct access.
- *
-- * MSR_DEBUGCTL and LBR has existed almost as long as MSRs have
-- * existed, and there is no architectural way to hide the feature, or
-- * fail the attempt to enable LBR.
-- *
-- * Unknown host LBR MSRs or hitting -ENOSPC with the guest load/save
-- * list are definitely hypervisor bugs, whereas -ENOMEM for allocating
-- * the load/save list is simply unlucky (and shouldn't occur with
-- * sensible management by the toolstack).
-+ * Hitting -ENOSPC with the guest load/save list is definitely a
-+ * hypervisor bug, whereas -ENOMEM for allocating the load/save list
-+ * is simply unlucky (and shouldn't occur with sensible management by
-+ * the toolstack).
- *
- * Either way, there is nothing we can do right now to recover, and
- * the guest won't execute correctly either. Simply crash the domain
-@@ -3567,13 +3575,6 @@ static int cf_check vmx_msr_write_intercept(
- {
- const struct lbr_info *lbr = model_specific_lbr;
-
-- if ( unlikely(!lbr) )
-- {
-- gprintk(XENLOG_ERR, "Unknown Host LBR MSRs\n");
-- domain_crash(v->domain);
-- return X86EMUL_OKAY;
-- }
--
- for ( ; lbr->count; lbr++ )
- {
- unsigned int i;
---
-2.40.0
-
diff --git a/0029-xen-arm-page-Handle-cache-flush-of-an-element-at-the.patch b/0029-xen-arm-page-Handle-cache-flush-of-an-element-at-the.patch
new file mode 100644
index 0000000..bc866d0
--- /dev/null
+++ b/0029-xen-arm-page-Handle-cache-flush-of-an-element-at-the.patch
@@ -0,0 +1,111 @@
+From d31e5b2a9c39816a954d1088d4cfc782f0006f39 Mon Sep 17 00:00:00 2001
+From: Stefano Stabellini <stefano.stabellini@amd.com>
+Date: Tue, 5 Sep 2023 14:33:29 +0200
+Subject: [PATCH 29/55] xen/arm: page: Handle cache flush of an element at the
+ top of the address space
+
+The region that needs to be cleaned/invalidated may be at the top
+of the address space. This means that 'end' (i.e. 'p + size') will
+be 0 and therefore nothing will be cleaned/invalidated as the check
+in the loop will always be false.
+
+On Arm64, we only support up to 48-bit Virtual
+address space. So this is not a concern there. However, for 32-bit,
+the mapcache is using the last 2GB of the address space. Therefore
+we may not properly clean/invalidate some pages. This could lead
+to memory corruption or data leakage (the scrubbed value may
+still sit in the cache when the guest could read the memory directly
+and therefore read the old content).
+
+Rework invalidate_dcache_va_range(), clean_dcache_va_range(),
+clean_and_invalidate_dcache_va_range() to handle a cache flush
+with an element at the top of the address space.
+
+This is CVE-2023-34321 / XSA-437.
+
+Reported-by: Julien Grall <jgrall@amazon.com>
+Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Acked-by: Bertrand Marquis <bertrand.marquis@arm.com>
+master commit: 9a216e92de9f9011097e4f1fb55ff67ba0a21704
+master date: 2023-09-05 14:30:08 +0200
+---
+ xen/arch/arm/include/asm/page.h | 33 ++++++++++++++++++++-------------
+ 1 file changed, 20 insertions(+), 13 deletions(-)
+
+diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h
+index e7cd62190c..d7fe770a5e 100644
+--- a/xen/arch/arm/include/asm/page.h
++++ b/xen/arch/arm/include/asm/page.h
+@@ -160,26 +160,25 @@ static inline size_t read_dcache_line_bytes(void)
+
+ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
+ {
+- const void *end = p + size;
+ size_t cacheline_mask = dcache_line_bytes - 1;
+
+ dsb(sy); /* So the CPU issues all writes to the range */
+
+ if ( (uintptr_t)p & cacheline_mask )
+ {
++ size -= dcache_line_bytes - ((uintptr_t)p & cacheline_mask);
+ p = (void *)((uintptr_t)p & ~cacheline_mask);
+ asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
+ p += dcache_line_bytes;
+ }
+- if ( (uintptr_t)end & cacheline_mask )
+- {
+- end = (void *)((uintptr_t)end & ~cacheline_mask);
+- asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (end));
+- }
+
+- for ( ; p < end; p += dcache_line_bytes )
++ for ( ; size >= dcache_line_bytes;
++ p += dcache_line_bytes, size -= dcache_line_bytes )
+ asm volatile (__invalidate_dcache_one(0) : : "r" (p));
+
++ if ( size > 0 )
++ asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
++
+ dsb(sy); /* So we know the flushes happen before continuing */
+
+ return 0;
+@@ -187,10 +186,14 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
+
+ static inline int clean_dcache_va_range(const void *p, unsigned long size)
+ {
+- const void *end = p + size;
++ size_t cacheline_mask = dcache_line_bytes - 1;
++
+ dsb(sy); /* So the CPU issues all writes to the range */
+- p = (void *)((uintptr_t)p & ~(dcache_line_bytes - 1));
+- for ( ; p < end; p += dcache_line_bytes )
++ size += (uintptr_t)p & cacheline_mask;
++ size = (size + cacheline_mask) & ~cacheline_mask;
++ p = (void *)((uintptr_t)p & ~cacheline_mask);
++ for ( ; size >= dcache_line_bytes;
++ p += dcache_line_bytes, size -= dcache_line_bytes )
+ asm volatile (__clean_dcache_one(0) : : "r" (p));
+ dsb(sy); /* So we know the flushes happen before continuing */
+ /* ARM callers assume that dcache_* functions cannot fail. */
+@@ -200,10 +203,14 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
+ static inline int clean_and_invalidate_dcache_va_range
+ (const void *p, unsigned long size)
+ {
+- const void *end = p + size;
++ size_t cacheline_mask = dcache_line_bytes - 1;
++
+ dsb(sy); /* So the CPU issues all writes to the range */
+- p = (void *)((uintptr_t)p & ~(dcache_line_bytes - 1));
+- for ( ; p < end; p += dcache_line_bytes )
++ size += (uintptr_t)p & cacheline_mask;
++ size = (size + cacheline_mask) & ~cacheline_mask;
++ p = (void *)((uintptr_t)p & ~cacheline_mask);
++ for ( ; size >= dcache_line_bytes;
++ p += dcache_line_bytes, size -= dcache_line_bytes )
+ asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
+ dsb(sy); /* So we know the flushes happen before continuing */
+ /* ARM callers assume that dcache_* functions cannot fail. */
+--
+2.42.0
+
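
The wraparound being fixed above is easy to reproduce outside of Xen. When a
region ends exactly at the top of the address space, computing an end pointer
as p + size overflows to 0 and a p < end loop never executes, whereas
counting down the remaining size does not depend on an end pointer at all. A
minimal sketch (LINE and the helper names are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define LINE 64 /* assumed cache line size for the sketch */

    /* Buggy pattern: 'end' wraps to 0 when p + size overflows, so the loop
     * body never runs for a region touching the top of the address space. */
    static unsigned long lines_end_pointer(uintptr_t p, unsigned long size)
    {
        uintptr_t end = p + size;
        unsigned long n = 0;

        for ( ; p < end; p += LINE )
            n++;

        return n;
    }

    /* Reworked pattern: iterate on the remaining size, which never overflows. */
    static unsigned long lines_remaining_size(uintptr_t p, unsigned long size)
    {
        unsigned long n = 0;

        for ( ; size >= LINE; p += LINE, size -= LINE )
            n++;

        return n;
    }

    int main(void)
    {
        /* A 4KiB region ending exactly at the top of the address space. */
        uintptr_t p = (uintptr_t)0 - 4096;

        printf("end-pointer loop:    %lu lines\n", lines_end_pointer(p, 4096));
        printf("remaining-size loop: %lu lines\n", lines_remaining_size(p, 4096));
        return 0;
    }

The first helper reports 0 lines for that region, the second reports 64,
mirroring the before/after behaviour of the flush helpers in the patch.
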
diff --git a/0030-x86-AMD-extend-Zenbleed-check-to-models-good-ucode-i.patch b/0030-x86-AMD-extend-Zenbleed-check-to-models-good-ucode-i.patch
new file mode 100644
index 0000000..4581d03
--- /dev/null
+++ b/0030-x86-AMD-extend-Zenbleed-check-to-models-good-ucode-i.patch
@@ -0,0 +1,48 @@
+From d2d2dcae879c6cc05227c9620f0a772f35fe6886 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Wed, 23 Aug 2023 09:26:36 +0200
+Subject: [PATCH 30/55] x86/AMD: extend Zenbleed check to models "good" ucode
+ isn't known for
+
+Reportedly the AMD Custom APU 0405 found on SteamDeck, models 0x90 and
+0x91, (quoting the respective Linux commit) is similarly affected. Put
+another instance of our Zen1 vs Zen2 distinction checks in
+amd_check_zenbleed(), forcing use of the chickenbit irrespective of
+ucode version (building upon real hardware never surfacing a version of
+0xffffffff).
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit 145a69c0944ac70cfcf9d247c85dee9e99d9d302)
+---
+ xen/arch/x86/cpu/amd.c | 13 ++++++++++---
+ 1 file changed, 10 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
+index 3ea214fc2e..1bb3044be1 100644
+--- a/xen/arch/x86/cpu/amd.c
++++ b/xen/arch/x86/cpu/amd.c
+@@ -909,10 +909,17 @@ void amd_check_zenbleed(void)
+ case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
+ default:
+ /*
+- * With the Fam17h check above, parts getting here are Zen1.
+- * They're not affected.
++ * With the Fam17h check above, most parts getting here are
++ * Zen1. They're not affected. Assume Zen2 ones making it
++ * here are affected regardless of microcode version.
++ *
++ * Zen1 vs Zen2 isn't a simple model number comparison, so use
++ * STIBP as a heuristic to distinguish.
+ */
+- return;
++ if (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
++ return;
++ good_rev = ~0U;
++ break;
+ }
+
+ rdmsrl(MSR_AMD64_DE_CFG, val);
+--
+2.42.0
+
diff --git a/0030-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch b/0030-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
deleted file mode 100644
index e2bb8df..0000000
--- a/0030-x86-shadow-fix-PAE-check-for-top-level-table-unshado.patch
+++ /dev/null
@@ -1,39 +0,0 @@
-From 2d74e7035bd060d662f1c4f8522377be8021be92 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 7 Feb 2023 16:59:54 +0100
-Subject: [PATCH 30/89] x86/shadow: fix PAE check for top-level table
- unshadowing
-
-Clearly within the for_each_vcpu() the vCPU of this loop is meant, not
-the (loop invariant) one the fault occurred on.
-
-Fixes: 3d5e6a3ff383 ("x86 hvm: implement HVMOP_pagetable_dying")
-Fixes: ef3b0d8d2c39 ("x86/shadow: shadow_table[] needs only one entry for PV-only configs")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: f8fdceefbb1193ec81667eb40b83bc525cb71204
-master date: 2023-01-20 09:23:42 +0100
----
- xen/arch/x86/mm/shadow/multi.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
-index 2370b30602..671bf8c228 100644
---- a/xen/arch/x86/mm/shadow/multi.c
-+++ b/xen/arch/x86/mm/shadow/multi.c
-@@ -2672,10 +2672,10 @@ static int cf_check sh_page_fault(
- #if GUEST_PAGING_LEVELS == 3
- unsigned int i;
-
-- for_each_shadow_table(v, i)
-+ for_each_shadow_table(tmp, i)
- {
- mfn_t smfn = pagetable_get_mfn(
-- v->arch.paging.shadow.shadow_table[i]);
-+ tmp->arch.paging.shadow.shadow_table[i]);
-
- if ( mfn_valid(smfn) && (mfn_x(smfn) != 0) )
- {
---
-2.40.0
-
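
The underlying mistake in the patch above, looping over one variable while
dereferencing the outer, loop-invariant one, is a classic. A standalone
sketch of the same shape (hypothetical types and names, not the shadow code):

    #include <stdio.h>

    struct vcpu {
        int id;
        int table[4];
    };

    /* Buggy: the loop walks 'tmp' over all vCPUs but keeps reading the table
     * of the loop-invariant 'v', so every iteration inspects the same vCPU. */
    static int count_nonzero_buggy(struct vcpu *vcpus, int nr, struct vcpu *v)
    {
        int hits = 0;

        for ( struct vcpu *tmp = vcpus; tmp < vcpus + nr; tmp++ )
            for ( int i = 0; i < 4; i++ )
                if ( v->table[i] )      /* should be tmp->table[i] */
                    hits++;

        return hits;
    }

    static int count_nonzero_fixed(struct vcpu *vcpus, int nr)
    {
        int hits = 0;

        for ( struct vcpu *tmp = vcpus; tmp < vcpus + nr; tmp++ )
            for ( int i = 0; i < 4; i++ )
                if ( tmp->table[i] )
                    hits++;

        return hits;
    }

    int main(void)
    {
        struct vcpu vcpus[2] = {
            { .id = 0, .table = { 0, 0, 0, 0 } },
            { .id = 1, .table = { 1, 1, 1, 1 } },
        };

        printf("buggy: %d, fixed: %d\n",
               count_nonzero_buggy(vcpus, 2, &vcpus[0]),
               count_nonzero_fixed(vcpus, 2));
        return 0;
    }

The buggy variant reports 0 because every iteration re-inspects vcpus[0],
much as the shadow code kept looking only at the faulting vCPU's tables.
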
diff --git a/0031-build-fix-building-flask-headers-before-descending-i.patch b/0031-build-fix-building-flask-headers-before-descending-i.patch
deleted file mode 100644
index 273e795..0000000
--- a/0031-build-fix-building-flask-headers-before-descending-i.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From 819a5d4ed8b79e21843d5960a7ab8fbd16f28233 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 7 Feb 2023 17:00:29 +0100
-Subject: [PATCH 31/89] build: fix building flask headers before descending in
- flask/ss/
-
-Unfortunately, adding a prerequisite to "$(obj)/ss/built_in.o" doesn't
-work because we have "$(obj)/%/built_in.o: $(obj)/% ;" in Rules.mk.
-So, make is allowed to try to build objects in "xsm/flask/ss/" before
-generating the headers.
-
-Adding a prerequisite on "$(obj)/ss" instead will fix the issue as
-that's the target used to run make in this subdirectory.
-
-Unfortunately, that target is also used when running `make clean`, so
-we want to ignore it in this case. $(MAKECMDGOALS) can't be used in
-this case as it is empty, but we can guess which operation is done by
-looking at the list of loaded makefiles.
-
-Fixes: 7a3bcd2babcc ("build: build everything from the root dir, use obj=$subdir")
-Reported-by: "Daniel P. Smith" <dpsmith@apertussolutions.com>
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: d60324d8af9404014cfcc37bba09e9facfd02fcf
-master date: 2023-01-23 15:03:58 +0100
----
- xen/xsm/flask/Makefile | 6 +++++-
- 1 file changed, 5 insertions(+), 1 deletion(-)
-
-diff --git a/xen/xsm/flask/Makefile b/xen/xsm/flask/Makefile
-index d25312f4fa..3fdcf7727e 100644
---- a/xen/xsm/flask/Makefile
-+++ b/xen/xsm/flask/Makefile
-@@ -16,7 +16,11 @@ FLASK_H_FILES := flask.h class_to_string.h initial_sid_to_string.h
- AV_H_FILES := av_perm_to_string.h av_permissions.h
- ALL_H_FILES := $(addprefix include/,$(FLASK_H_FILES) $(AV_H_FILES))
-
--$(addprefix $(obj)/,$(obj-y)) $(obj)/ss/built_in.o: $(addprefix $(obj)/,$(ALL_H_FILES))
-+# Adding prerequisite to descending into ss/ folder only when not running
-+# `make *clean`.
-+ifeq ($(filter %/Makefile.clean,$(MAKEFILE_LIST)),)
-+$(addprefix $(obj)/,$(obj-y)) $(obj)/ss: $(addprefix $(obj)/,$(ALL_H_FILES))
-+endif
- extra-y += $(ALL_H_FILES)
-
- mkflask := $(srcdir)/policy/mkflask.sh
---
-2.40.0
-
diff --git a/0031-x86-spec-ctrl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch b/0031-x86-spec-ctrl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch
new file mode 100644
index 0000000..10417ae
--- /dev/null
+++ b/0031-x86-spec-ctrl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch
@@ -0,0 +1,74 @@
+From dc28aba565f226f9bec24cfde993e78478acfb4e Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 12 Sep 2023 15:06:49 +0100
+Subject: [PATCH 31/55] x86/spec-ctrl: Fix confusion between
+ SPEC_CTRL_EXIT_TO_XEN{,_IST}
+
+c/s 3fffaf9c13e9 ("x86/entry: Avoid using alternatives in NMI/#MC paths")
+dropped the only user, leaving behind the (incorrect) implication that Xen had
+split exit paths.
+
+Delete the unused SPEC_CTRL_EXIT_TO_XEN and rename SPEC_CTRL_EXIT_TO_XEN_IST
+to SPEC_CTRL_EXIT_TO_XEN for consistency.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 1c18d73774533a55ba9d1cbee8bdace03efdb5e7)
+---
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 10 ++--------
+ xen/arch/x86/x86_64/entry.S | 2 +-
+ 2 files changed, 3 insertions(+), 9 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index f23bb105c5..e8fd01243c 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -79,7 +79,6 @@
+ * - SPEC_CTRL_ENTRY_FROM_PV
+ * - SPEC_CTRL_ENTRY_FROM_INTR
+ * - SPEC_CTRL_ENTRY_FROM_INTR_IST
+- * - SPEC_CTRL_EXIT_TO_XEN_IST
+ * - SPEC_CTRL_EXIT_TO_XEN
+ * - SPEC_CTRL_EXIT_TO_PV
+ *
+@@ -268,11 +267,6 @@
+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
+ X86_FEATURE_SC_MSR_PV
+
+-/* Use when exiting to Xen context. */
+-#define SPEC_CTRL_EXIT_TO_XEN \
+- ALTERNATIVE "", \
+- DO_SPEC_CTRL_EXIT_TO_XEN, X86_FEATURE_SC_MSR_PV
+-
+ /* Use when exiting to PV guest context. */
+ #define SPEC_CTRL_EXIT_TO_PV \
+ ALTERNATIVE "", \
+@@ -339,8 +333,8 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ UNLIKELY_END(\@_serialise)
+ .endm
+
+-/* Use when exiting to Xen in IST context. */
+-.macro SPEC_CTRL_EXIT_TO_XEN_IST
++/* Use when exiting to Xen context. */
++.macro SPEC_CTRL_EXIT_TO_XEN
+ /*
+ * Requires %rbx=stack_end
+ * Clobbers %rax, %rcx, %rdx
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index 7675a59ff0..b45a09823a 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -673,7 +673,7 @@ UNLIKELY_START(ne, exit_cr3)
+ UNLIKELY_END(exit_cr3)
+
+ /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+- SPEC_CTRL_EXIT_TO_XEN_IST /* Req: %rbx=end, Clob: acd */
++ SPEC_CTRL_EXIT_TO_XEN /* Req: %rbx=end, Clob: acd */
+
+ RESTORE_ALL adj=8
+ iretq
+--
+2.42.0
+
diff --git a/0032-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch b/0032-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
deleted file mode 100644
index 8b3a410..0000000
--- a/0032-ns16550-fix-an-incorrect-assignment-to-uart-io_size.patch
+++ /dev/null
@@ -1,34 +0,0 @@
-From d0127881376baeea1e4eb71d0f7b56d942147124 Mon Sep 17 00:00:00 2001
-From: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
-Date: Tue, 7 Feb 2023 17:00:47 +0100
-Subject: [PATCH 32/89] ns16550: fix an incorrect assignment to uart->io_size
-
-uart->io_size represents the size in bytes. Thus, when serial_port.bit_width
-is assigned to it, it should be converted to size in bytes.
-
-Fixes: 17b516196c ("ns16550: add ACPI support for ARM only")
-Reported-by: Jan Beulich <jbeulich@suse.com>
-Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-master commit: 352c89f72ddb67b8d9d4e492203f8c77f85c8df1
-master date: 2023-01-24 16:54:38 +0100
----
- xen/drivers/char/ns16550.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
-index 01a05c9aa8..ce013fb6a5 100644
---- a/xen/drivers/char/ns16550.c
-+++ b/xen/drivers/char/ns16550.c
-@@ -1875,7 +1875,7 @@ static int __init ns16550_acpi_uart_init(const void *data)
- uart->parity = spcr->parity;
- uart->stop_bits = spcr->stop_bits;
- uart->io_base = spcr->serial_port.address;
-- uart->io_size = spcr->serial_port.bit_width;
-+ uart->io_size = DIV_ROUND_UP(spcr->serial_port.bit_width, BITS_PER_BYTE);
- uart->reg_shift = spcr->serial_port.bit_offset;
- uart->reg_width = spcr->serial_port.access_width;
-
---
-2.40.0
-
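
The conversion used by the fix above is the standard round-up integer
division; DIV_ROUND_UP and BITS_PER_BYTE mirror the names in the patch, while
the rest of this sketch is purely illustrative:

    #include <stdio.h>

    #define BITS_PER_BYTE 8
    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
        /* SPCR bit_width values one might see for an MMIO UART region. */
        unsigned int widths[] = { 8, 16, 32, 33 };

        for ( unsigned int i = 0; i < sizeof(widths) / sizeof(widths[0]); i++ )
            printf("bit_width %2u -> io_size %u byte(s)\n",
                   widths[i], DIV_ROUND_UP(widths[i], BITS_PER_BYTE));
        return 0;
    }
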
diff --git a/0032-x86-spec-ctrl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch b/0032-x86-spec-ctrl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch
new file mode 100644
index 0000000..a0c83da
--- /dev/null
+++ b/0032-x86-spec-ctrl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch
@@ -0,0 +1,85 @@
+From 84690fb82c4f4aecb72a6789d8994efa74841e09 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 12 Sep 2023 17:03:16 +0100
+Subject: [PATCH 32/55] x86/spec-ctrl: Fold DO_SPEC_CTRL_EXIT_TO_XEN into its
+ single user
+
+With the SPEC_CTRL_EXIT_TO_XEN{,_IST} confusion fixed, it's now obvious that
+there's only a single EXIT_TO_XEN path. Fold DO_SPEC_CTRL_EXIT_TO_XEN into
+SPEC_CTRL_EXIT_TO_XEN to simplify further fixes.
+
+When merging labels, switch the name to .L\@_skip_sc_msr as "skip" on its own
+is going to be too generic shortly.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 694bb0f280fd08a4377e36e32b84b5062def4de2)
+---
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 40 ++++++++++--------------
+ 1 file changed, 16 insertions(+), 24 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index e8fd01243c..d5f65d80ea 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -211,27 +211,6 @@
+ wrmsr
+ .endm
+
+-.macro DO_SPEC_CTRL_EXIT_TO_XEN
+-/*
+- * Requires %rbx=stack_end
+- * Clobbers %rax, %rcx, %rdx
+- *
+- * When returning to Xen context, look to see whether SPEC_CTRL shadowing is
+- * in effect, and reload the shadow value. This covers race conditions which
+- * exist with an NMI/MCE/etc hitting late in the return-to-guest path.
+- */
+- xor %edx, %edx
+-
+- testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+- jz .L\@_skip
+-
+- mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
+- mov $MSR_SPEC_CTRL, %ecx
+- wrmsr
+-
+-.L\@_skip:
+-.endm
+-
+ .macro DO_SPEC_CTRL_EXIT_TO_GUEST
+ /*
+ * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
+@@ -340,11 +319,24 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ * Clobbers %rax, %rcx, %rdx
+ */
+ testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+- jz .L\@_skip
++ jz .L\@_skip_sc_msr
+
+- DO_SPEC_CTRL_EXIT_TO_XEN
++ /*
++ * When returning to Xen context, look to see whether SPEC_CTRL shadowing
++ * is in effect, and reload the shadow value. This covers race conditions
++ * which exist with an NMI/MCE/etc hitting late in the return-to-guest
++ * path.
++ */
++ xor %edx, %edx
+
+-.L\@_skip:
++ testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
++ jz .L\@_skip_sc_msr
++
++ mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
++ mov $MSR_SPEC_CTRL, %ecx
++ wrmsr
++
++.L\@_skip_sc_msr:
+ .endm
+
+ #endif /* __ASSEMBLY__ */
+--
+2.42.0
+
diff --git a/0033-libxl-fix-guest-kexec-skip-cpuid-policy.patch b/0033-libxl-fix-guest-kexec-skip-cpuid-policy.patch
deleted file mode 100644
index 7eb3779..0000000
--- a/0033-libxl-fix-guest-kexec-skip-cpuid-policy.patch
+++ /dev/null
@@ -1,72 +0,0 @@
-From 3dae50283d9819c691a97f15b133124c00d39a2f Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jandryuk@gmail.com>
-Date: Tue, 7 Feb 2023 17:01:49 +0100
-Subject: [PATCH 33/89] libxl: fix guest kexec - skip cpuid policy
-
-When a domain performs a kexec (soft reset), libxl__build_pre() is
-called with the existing domid. Calling libxl__cpuid_legacy() on the
-existing domain fails since the cpuid policy has already been set, and
-the guest isn't rebuilt and doesn't kexec.
-
-xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
-libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
-libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
-libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
-libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM
-
-During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
-issue. Before commit 34990446ca91, the libxl__cpuid_legacy() failure
-would have been ignored, so kexec would continue.
-
-Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
-Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 1e454c2b5b1172e0fc7457e411ebaba61db8fc87
-master date: 2023-01-26 10:58:23 +0100
----
- tools/libs/light/libxl_create.c | 2 ++
- tools/libs/light/libxl_dom.c | 2 +-
- tools/libs/light/libxl_internal.h | 1 +
- 3 files changed, 4 insertions(+), 1 deletion(-)
-
-diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
-index 612eacfc7f..dbee32b7b7 100644
---- a/tools/libs/light/libxl_create.c
-+++ b/tools/libs/light/libxl_create.c
-@@ -2203,6 +2203,8 @@ static int do_domain_soft_reset(libxl_ctx *ctx,
- aop_console_how);
- cdcs->domid_out = &domid_out;
-
-+ state->soft_reset = true;
-+
- dom_path = libxl__xs_get_dompath(gc, domid);
- if (!dom_path) {
- LOGD(ERROR, domid, "failed to read domain path");
-diff --git a/tools/libs/light/libxl_dom.c b/tools/libs/light/libxl_dom.c
-index b454f988fb..f6311eea6e 100644
---- a/tools/libs/light/libxl_dom.c
-+++ b/tools/libs/light/libxl_dom.c
-@@ -382,7 +382,7 @@ int libxl__build_pre(libxl__gc *gc, uint32_t domid,
- /* Construct a CPUID policy, but only for brand new domains. Domains
- * being migrated-in/restored have CPUID handled during the
- * static_data_done() callback. */
-- if (!state->restore)
-+ if (!state->restore && !state->soft_reset)
- rc = libxl__cpuid_legacy(ctx, domid, false, info);
-
- out:
-diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
-index a7c447c10e..cae160351f 100644
---- a/tools/libs/light/libxl_internal.h
-+++ b/tools/libs/light/libxl_internal.h
-@@ -1406,6 +1406,7 @@ typedef struct {
- /* Whether this domain is being migrated/restored, or booting fresh. Only
- * applicable to the primary domain, not support domains (e.g. stub QEMU). */
- bool restore;
-+ bool soft_reset;
- } libxl__domain_build_state;
-
- _hidden void libxl__domain_build_state_init(libxl__domain_build_state *s);
---
-2.40.0
-
diff --git a/0033-x86-spec-ctrl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch b/0033-x86-spec-ctrl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch
new file mode 100644
index 0000000..a278c5f
--- /dev/null
+++ b/0033-x86-spec-ctrl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch
@@ -0,0 +1,83 @@
+From 3952c73bdbd05f0e666986fce633a591237b3c88 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 1 Sep 2023 11:38:44 +0100
+Subject: [PATCH 33/55] x86/spec-ctrl: Turn the remaining
+ SPEC_CTRL_{ENTRY,EXIT}_* into asm macros
+
+These have grown more complex over time, with some already having been
+converted.
+
+Provide full Requires/Clobbers comments, otherwise missing at this level of
+indirection.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 7125429aafb9e3c9c88fc93001fc2300e0ac2cc8)
+---
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 37 ++++++++++++++++++------
+ 1 file changed, 28 insertions(+), 9 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index d5f65d80ea..c6d5f2ad01 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -231,26 +231,45 @@
+ .endm
+
+ /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
+-#define SPEC_CTRL_ENTRY_FROM_PV \
++.macro SPEC_CTRL_ENTRY_FROM_PV
++/*
++ * Requires %rsp=regs/cpuinfo, %rdx=0
++ * Clobbers %rax, %rcx, %rdx
++ */
+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0), \
+- X86_FEATURE_IBPB_ENTRY_PV; \
+- ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
++ X86_FEATURE_IBPB_ENTRY_PV
++
++ ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV
++
+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0), \
+ X86_FEATURE_SC_MSR_PV
++.endm
+
+ /* Use in interrupt/exception context. May interrupt Xen or PV context. */
+-#define SPEC_CTRL_ENTRY_FROM_INTR \
++.macro SPEC_CTRL_ENTRY_FROM_INTR
++/*
++ * Requires %rsp=regs, %r14=stack_end, %rdx=0
++ * Clobbers %rax, %rcx, %rdx
++ */
+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1), \
+- X86_FEATURE_IBPB_ENTRY_PV; \
+- ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
++ X86_FEATURE_IBPB_ENTRY_PV
++
++ ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV
++
+ ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
+ X86_FEATURE_SC_MSR_PV
++.endm
+
+ /* Use when exiting to PV guest context. */
+-#define SPEC_CTRL_EXIT_TO_PV \
+- ALTERNATIVE "", \
+- DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV; \
++.macro SPEC_CTRL_EXIT_TO_PV
++/*
++ * Requires %rax=spec_ctrl, %rsp=regs/info
++ * Clobbers %rcx, %rdx
++ */
++ ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
++
+ DO_SPEC_CTRL_COND_VERW
++.endm
+
+ /*
+ * Use in IST interrupt/exception context. May interrupt Xen or PV context.
+--
+2.42.0
+
diff --git a/0034-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch b/0034-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
deleted file mode 100644
index 8f57d4e..0000000
--- a/0034-tools-ocaml-xenctrl-Make-domain_getinfolist-tail-rec.patch
+++ /dev/null
@@ -1,71 +0,0 @@
-From 03f545b6cf3220b4647677b588e5525a781a4813 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Tue, 1 Nov 2022 17:59:16 +0000
-Subject: [PATCH 34/89] tools/ocaml/xenctrl: Make domain_getinfolist tail
- recursive
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-domain_getinfolist() is quadratic with the number of domains, because of the
-behaviour of the underlying hypercall. xenopsd was further observed to be
-wasting excessive quantites of time manipulating the list of already-obtained
-domains.
-
-Implement a tail recursive `rev_concat` equivalent to `concat |> rev`, and use
-it instead of calling `@` multiple times.
-
-An incidental benefit is that the list of domains will now be in domid order,
-instead of having pairs of 2 domains changing direction every time.
-
-In a scalability testing scenario with ~1000 VMs, a combination of this and
-the subsequent change takes xenopsd's wallclock time in domain_getinfolist()
-down from 88% to 0.02%
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit c3b6be714c64aa62b56d0bce96f4b6a10b5c2078)
----
- tools/ocaml/libs/xc/xenctrl.ml | 23 +++++++++++++++++------
- 1 file changed, 17 insertions(+), 6 deletions(-)
-
-diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
-index 83e39a8616..85b73a7f6f 100644
---- a/tools/ocaml/libs/xc/xenctrl.ml
-+++ b/tools/ocaml/libs/xc/xenctrl.ml
-@@ -222,14 +222,25 @@ external domain_shutdown: handle -> domid -> shutdown_reason -> unit
- external _domain_getinfolist: handle -> domid -> int -> domaininfo list
- = "stub_xc_domain_getinfolist"
-
-+let rev_append_fold acc e = List.rev_append e acc
-+
-+(**
-+ * [rev_concat lst] is equivalent to [lst |> List.concat |> List.rev]
-+ * except it is tail recursive, whereas [List.concat] isn't.
-+ * Example:
-+ * rev_concat [[10;9;8];[7;6];[5]]] = [5; 6; 7; 8; 9; 10]
-+ *)
-+let rev_concat lst = List.fold_left rev_append_fold [] lst
-+
- let domain_getinfolist handle first_domain =
- let nb = 2 in
-- let last_domid l = (List.hd l).domid + 1 in
-- let rec __getlist from =
-- let l = _domain_getinfolist handle from nb in
-- (if List.length l = nb then __getlist (last_domid l) else []) @ l
-- in
-- List.rev (__getlist first_domain)
-+ let rec __getlist lst from =
-+ (* _domain_getinfolist returns domains in reverse order, largest first *)
-+ match _domain_getinfolist handle from nb with
-+ | [] -> rev_concat lst
-+ | (hd :: _) as l -> __getlist (l :: lst) (hd.domid + 1)
-+ in
-+ __getlist [] first_domain
-
- external domain_getinfo: handle -> domid -> domaininfo= "stub_xc_domain_getinfo"
-
---
-2.40.0
-
diff --git a/0034-x86-spec-ctrl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch b/0034-x86-spec-ctrl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch
new file mode 100644
index 0000000..f360cbd
--- /dev/null
+++ b/0034-x86-spec-ctrl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch
@@ -0,0 +1,106 @@
+From ba023e93d0b1e60b80251bf080bab694efb9f8e3 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 30 Aug 2023 20:11:50 +0100
+Subject: [PATCH 34/55] x86/spec-ctrl: Improve all SPEC_CTRL_{ENTER,EXIT}_*
+ comments
+
+... to better explain how they're used.
+
+Doing so highlights that SPEC_CTRL_EXIT_TO_XEN is missing a VERW flush for the
+corner case when e.g. an NMI hits late in an exit-to-guest path.
+
+Leave a TODO, which will be addressed in subsequent patches which arrange for
+VERW flushing to be safe within SPEC_CTRL_EXIT_TO_XEN.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 45f00557350dc7d0756551069803fc49c29184ca)
+---
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 36 ++++++++++++++++++++----
+ 1 file changed, 31 insertions(+), 5 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index c6d5f2ad01..97c4db31cd 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -230,7 +230,10 @@
+ wrmsr
+ .endm
+
+-/* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
++/*
++ * Used after an entry from PV context: SYSCALL, SYSENTER, INT,
++ * etc. There is always a guest speculation state in context.
++ */
+ .macro SPEC_CTRL_ENTRY_FROM_PV
+ /*
+ * Requires %rsp=regs/cpuinfo, %rdx=0
+@@ -245,7 +248,11 @@
+ X86_FEATURE_SC_MSR_PV
+ .endm
+
+-/* Use in interrupt/exception context. May interrupt Xen or PV context. */
++/*
++ * Used after an exception or maskable interrupt, hitting Xen or PV context.
++ * There will either be a guest speculation context, or (barring fatal
++ * exceptions) a well-formed Xen speculation context.
++ */
+ .macro SPEC_CTRL_ENTRY_FROM_INTR
+ /*
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+@@ -260,7 +267,10 @@
+ X86_FEATURE_SC_MSR_PV
+ .endm
+
+-/* Use when exiting to PV guest context. */
++/*
++ * Used when exiting from any entry context, back to PV context. This
++ * includes from an IST entry which moved onto the primary stack.
++ */
+ .macro SPEC_CTRL_EXIT_TO_PV
+ /*
+ * Requires %rax=spec_ctrl, %rsp=regs/info
+@@ -272,7 +282,13 @@
+ .endm
+
+ /*
+- * Use in IST interrupt/exception context. May interrupt Xen or PV context.
++ * Used after an IST entry hitting Xen or PV context. Special care is needed,
++ * because when hitting Xen context, there may not be a well-formed
++ * speculation context. (i.e. it can hit in the middle of
++ * SPEC_CTRL_{ENTRY,EXIT}_* regions.)
++ *
++ * An IST entry which hits PV context moves onto the primary stack and leaves
++ * via SPEC_CTRL_EXIT_TO_PV, *not* SPEC_CTRL_EXIT_TO_XEN.
+ */
+ .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
+ /*
+@@ -331,7 +347,14 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ UNLIKELY_END(\@_serialise)
+ .endm
+
+-/* Use when exiting to Xen context. */
++/*
++ * Use when exiting from any entry context, back to Xen context. This
++ * includes returning to other SPEC_CTRL_{ENTRY,EXIT}_* regions with an
++ * incomplete speculation context.
++ *
++ * Because we might have interrupted Xen beyond SPEC_CTRL_EXIT_TO_$GUEST, we
++ * need to treat this as if it were an EXIT_TO_$GUEST case too.
++ */
+ .macro SPEC_CTRL_EXIT_TO_XEN
+ /*
+ * Requires %rbx=stack_end
+@@ -356,6 +379,9 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ wrmsr
+
+ .L\@_skip_sc_msr:
++
++ /* TODO VERW */
++
+ .endm
+
+ #endif /* __ASSEMBLY__ */
+--
+2.42.0
+
diff --git a/0035-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch b/0035-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
deleted file mode 100644
index 6c64355..0000000
--- a/0035-tools-ocaml-xenctrl-Use-larger-chunksize-in-domain_g.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From 5d8f9cfa166c55a308856e7b021d778350edbd6c Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Tue, 1 Nov 2022 17:59:17 +0000
-Subject: [PATCH 35/89] tools/ocaml/xenctrl: Use larger chunksize in
- domain_getinfolist
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-domain_getinfolist() is quadratic with the number of domains, because of the
-behaviour of the underlying hypercall. Nevertheless, getting domain info in
-blocks of 1024 is far more efficient than blocks of 2.
-
-In a scalability testing scenario with ~1000 VMs, a combination of this and
-the previous change takes xenopsd's wallclock time in domain_getinfolist()
-down from 88% to 0.02%
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 95db09b1b154fb72fad861815ceae1f3fa49fc4e)
----
- tools/ocaml/libs/xc/xenctrl.ml | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
-index 85b73a7f6f..aa650533f7 100644
---- a/tools/ocaml/libs/xc/xenctrl.ml
-+++ b/tools/ocaml/libs/xc/xenctrl.ml
-@@ -233,7 +233,7 @@ let rev_append_fold acc e = List.rev_append e acc
- let rev_concat lst = List.fold_left rev_append_fold [] lst
-
- let domain_getinfolist handle first_domain =
-- let nb = 2 in
-+ let nb = 1024 in
- let rec __getlist lst from =
- (* _domain_getinfolist returns domains in reverse order, largest first *)
- match _domain_getinfolist handle from nb with
---
-2.40.0
-
diff --git a/0035-x86-entry-Adjust-restore_all_xen-to-hold-stack_end-i.patch b/0035-x86-entry-Adjust-restore_all_xen-to-hold-stack_end-i.patch
new file mode 100644
index 0000000..fe2acaf
--- /dev/null
+++ b/0035-x86-entry-Adjust-restore_all_xen-to-hold-stack_end-i.patch
@@ -0,0 +1,74 @@
+From 5f7efd47c8273fde972637d0360851802f76eca9 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 13 Sep 2023 13:48:16 +0100
+Subject: [PATCH 35/55] x86/entry: Adjust restore_all_xen to hold stack_end in
+ %r14
+
+All other SPEC_CTRL_{ENTRY,EXIT}_* helpers hold stack_end in %r14. Adjust it
+for consistency.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 7aa28849a1155d856e214e9a80a7e65fffdc3e58)
+---
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 8 ++++----
+ xen/arch/x86/x86_64/entry.S | 8 ++++----
+ 2 files changed, 8 insertions(+), 8 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index 97c4db31cd..66c706496f 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -357,10 +357,10 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ */
+ .macro SPEC_CTRL_EXIT_TO_XEN
+ /*
+- * Requires %rbx=stack_end
++ * Requires %r14=stack_end
+ * Clobbers %rax, %rcx, %rdx
+ */
+- testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
++ testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+ jz .L\@_skip_sc_msr
+
+ /*
+@@ -371,10 +371,10 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ */
+ xor %edx, %edx
+
+- testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
++ testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+ jz .L\@_skip_sc_msr
+
+- mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
++ mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%r14), %eax
+ mov $MSR_SPEC_CTRL, %ecx
+ wrmsr
+
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index b45a09823a..92279a225d 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -665,15 +665,15 @@ restore_all_xen:
+ * Check whether we need to switch to the per-CPU page tables, in
+ * case we return to late PV exit code (from an NMI or #MC).
+ */
+- GET_STACK_END(bx)
+- cmpb $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
++ GET_STACK_END(14)
++ cmpb $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
+ UNLIKELY_START(ne, exit_cr3)
+- mov STACK_CPUINFO_FIELD(pv_cr3)(%rbx), %rax
++ mov STACK_CPUINFO_FIELD(pv_cr3)(%r14), %rax
+ mov %rax, %cr3
+ UNLIKELY_END(exit_cr3)
+
+ /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+- SPEC_CTRL_EXIT_TO_XEN /* Req: %rbx=end, Clob: acd */
++ SPEC_CTRL_EXIT_TO_XEN /* Req: %r14=end, Clob: acd */
+
+ RESTORE_ALL adj=8
+ iretq
+--
+2.42.0
+
diff --git a/0036-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch b/0036-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
deleted file mode 100644
index d6a324a..0000000
--- a/0036-tools-ocaml-xb-mmap-Use-Data_abstract_val-wrapper.patch
+++ /dev/null
@@ -1,75 +0,0 @@
-From 7d516fc87637dc551494f8eca08f106f578f7112 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Fri, 16 Dec 2022 18:25:10 +0000
-Subject: [PATCH 36/89] tools/ocaml/xb,mmap: Use Data_abstract_val wrapper
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-This is not strictly necessary since it is essentially a no-op currently: a
-cast to void * and value *, even in OCaml 5.0.
-
-However it does make it clearer that what we have here is not a regular OCaml
-value, but one allocated with Abstract_tag or Custom_tag, and follows the
-example from the manual more closely:
-https://v2.ocaml.org/manual/intfc.html#ss:c-outside-head
-
-It also makes it clearer that these modules have been reviewed for
-compat with OCaml 5.0.
-
-We cannot use OCaml finalizers here, because we want exact control over when
-to unmap these pages from remote domains.
-
-No functional change.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit d2ccc637111d6dbcf808aaffeec7a46f0b1e1c81)
----
- tools/ocaml/libs/mmap/mmap_stubs.h | 4 ++++
- tools/ocaml/libs/mmap/xenmmap_stubs.c | 2 +-
- tools/ocaml/libs/xb/xs_ring_stubs.c | 2 +-
- 3 files changed, 6 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/libs/mmap/mmap_stubs.h b/tools/ocaml/libs/mmap/mmap_stubs.h
-index 65e4239890..f4784e4715 100644
---- a/tools/ocaml/libs/mmap/mmap_stubs.h
-+++ b/tools/ocaml/libs/mmap/mmap_stubs.h
-@@ -30,4 +30,8 @@ struct mmap_interface
- int len;
- };
-
-+#ifndef Data_abstract_val
-+#define Data_abstract_val(x) ((void *)Op_val(x))
-+#endif
-+
- #endif
-diff --git a/tools/ocaml/libs/mmap/xenmmap_stubs.c b/tools/ocaml/libs/mmap/xenmmap_stubs.c
-index e2ce088e25..e03951d781 100644
---- a/tools/ocaml/libs/mmap/xenmmap_stubs.c
-+++ b/tools/ocaml/libs/mmap/xenmmap_stubs.c
-@@ -28,7 +28,7 @@
- #include <caml/fail.h>
- #include <caml/callback.h>
-
--#define Intf_val(a) ((struct mmap_interface *) a)
-+#define Intf_val(a) ((struct mmap_interface *)Data_abstract_val(a))
-
- static int mmap_interface_init(struct mmap_interface *intf,
- int fd, int pflag, int mflag,
-diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
-index 7a91fdee75..1f58524535 100644
---- a/tools/ocaml/libs/xb/xs_ring_stubs.c
-+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
-@@ -35,7 +35,7 @@
- #include <sys/mman.h>
- #include "mmap_stubs.h"
-
--#define GET_C_STRUCT(a) ((struct mmap_interface *) a)
-+#define GET_C_STRUCT(a) ((struct mmap_interface *)Data_abstract_val(a))
-
- /*
- * Bytes_val has been introduced by Ocaml 4.06.1. So define our own version
---
-2.40.0
-
diff --git a/0036-x86-entry-Track-the-IST-ness-of-an-entry-for-the-exi.patch b/0036-x86-entry-Track-the-IST-ness-of-an-entry-for-the-exi.patch
new file mode 100644
index 0000000..ba7ea21
--- /dev/null
+++ b/0036-x86-entry-Track-the-IST-ness-of-an-entry-for-the-exi.patch
@@ -0,0 +1,109 @@
+From e4a71bc0da0baf7464bb0d8e33053f330e5ea366 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 13 Sep 2023 12:20:12 +0100
+Subject: [PATCH 36/55] x86/entry: Track the IST-ness of an entry for the exit
+ paths
+
+Use %r12 to hold an ist_exit boolean. This register is zero elsewhere in the
+entry/exit asm, so it only needs setting in the IST path.
+
+As this is subtle and fragile, add check_ist_exit() to be used in debugging
+builds to cross-check that the ist_exit boolean matches the entry vector.
+
+Write check_ist_exit() in C, because it's debug only and the logic is more
+complicated than I care to maintain in asm.
+
+For now, we only need to use this signal in the exit-to-Xen path, but some
+exit-to-guest paths happen in IST context too. Check the correctness in all
+exit paths to avoid the logic bit-rotting.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 21bdc25b05a0f8ab6bc73520a9ca01327360732c)
+
+x86/entry: Partially revert IST-exit checks
+
+The patch adding check_ist_exit() didn't account for the fact that
+reset_stack_and_jump() is not an ABI-preserving boundary. The IST-ness in
+%r12 doesn't survive into the next context, and is left as a stale value.
+
+This shows up in Gitlab CI for the Clang build:
+
+ https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/5112783827
+
+and in OSSTest for GCC 8:
+
+ http://logs.test-lab.xenproject.org/osstest/logs/183045/test-amd64-amd64-xl-qemuu-debianhvm-amd64/serial-pinot0.log
+
+There's no straightforward way to reconstruct the IST-exit-ness on the
+exit-to-guest path after a context switch. For now, we only need IST-exit on
+the return-to-Xen path.
+
+Fixes: 21bdc25b05a0 ("x86/entry: Track the IST-ness of an entry for the exit paths")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 9b57c800b79b96769ea3dcd6468578fa664d19f9)
+---
+ xen/arch/x86/traps.c | 13 +++++++++++++
+ xen/arch/x86/x86_64/entry.S | 13 ++++++++++++-
+ 2 files changed, 25 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
+index d12004b1c6..e65cc60041 100644
+--- a/xen/arch/x86/traps.c
++++ b/xen/arch/x86/traps.c
+@@ -2315,6 +2315,19 @@ void asm_domain_crash_synchronous(unsigned long addr)
+ do_softirq();
+ }
+
++#ifdef CONFIG_DEBUG
++void check_ist_exit(const struct cpu_user_regs *regs, bool ist_exit)
++{
++ const unsigned int ist_mask =
++ (1U << X86_EXC_NMI) | (1U << X86_EXC_DB) |
++ (1U << X86_EXC_DF) | (1U << X86_EXC_MC);
++ uint8_t ev = regs->entry_vector;
++ bool is_ist = (ev < TRAP_nr) && ((1U << ev) & ist_mask);
++
++ ASSERT(is_ist == ist_exit);
++}
++#endif
++
+ /*
+ * Local variables:
+ * mode: C
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index 92279a225d..4cebc4fbe3 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -659,8 +659,15 @@ ENTRY(early_page_fault)
+ .section .text.entry, "ax", @progbits
+
+ ALIGN
+-/* No special register assumptions. */
++/* %r12=ist_exit */
+ restore_all_xen:
++
++#ifdef CONFIG_DEBUG
++ mov %rsp, %rdi
++ mov %r12, %rsi
++ call check_ist_exit
++#endif
++
+ /*
+ * Check whether we need to switch to the per-CPU page tables, in
+ * case we return to late PV exit code (from an NMI or #MC).
+@@ -1091,6 +1098,10 @@ handle_ist_exception:
+ .L_ist_dispatch_done:
+ mov %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
+ mov %bl, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
++
++ /* This is an IST exit */
++ mov $1, %r12d
++
+ cmpb $TRAP_nmi,UREGS_entry_vector(%rsp)
+ jne ret_from_intr
+
+--
+2.42.0
+
diff --git a/0037-tools-ocaml-xb-Drop-Xs_ring.write.patch b/0037-tools-ocaml-xb-Drop-Xs_ring.write.patch
deleted file mode 100644
index 226ae52..0000000
--- a/0037-tools-ocaml-xb-Drop-Xs_ring.write.patch
+++ /dev/null
@@ -1,62 +0,0 @@
-From f0e653fb4aea77210b8096c170e82de3c2039d89 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Fri, 16 Dec 2022 18:25:20 +0000
-Subject: [PATCH 37/89] tools/ocaml/xb: Drop Xs_ring.write
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-This function is unusued (only Xs_ring.write_substring is used), and the
-bytes/string conversion here is backwards: the C stub implements the bytes
-version and then we use a Bytes.unsafe_of_string to convert a string into
-bytes.
-
-However the operation here really is read-only: we read from the string and
-write it to the ring, so the C stub should implement the read-only string
-version, and if needed we could use Bytes.unsafe_to_string to be able to send
-'bytes'. However that is not necessary as the 'bytes' version is dropped above.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 01f139215e678c2dc7d4bb3f9f2777069bb1b091)
----
- tools/ocaml/libs/xb/xs_ring.ml | 5 +----
- tools/ocaml/libs/xb/xs_ring_stubs.c | 2 +-
- 2 files changed, 2 insertions(+), 5 deletions(-)
-
-diff --git a/tools/ocaml/libs/xb/xs_ring.ml b/tools/ocaml/libs/xb/xs_ring.ml
-index db7f86bd27..dd5e014a33 100644
---- a/tools/ocaml/libs/xb/xs_ring.ml
-+++ b/tools/ocaml/libs/xb/xs_ring.ml
-@@ -25,14 +25,11 @@ module Server_features = Set.Make(struct
- end)
-
- external read: Xenmmap.mmap_interface -> bytes -> int -> int = "ml_interface_read"
--external write: Xenmmap.mmap_interface -> bytes -> int -> int = "ml_interface_write"
-+external write_substring: Xenmmap.mmap_interface -> string -> int -> int = "ml_interface_write"
-
- external _internal_set_server_features: Xenmmap.mmap_interface -> int -> unit = "ml_interface_set_server_features" [@@noalloc]
- external _internal_get_server_features: Xenmmap.mmap_interface -> int = "ml_interface_get_server_features" [@@noalloc]
-
--let write_substring mmap buff len =
-- write mmap (Bytes.unsafe_of_string buff) len
--
- let get_server_features mmap =
- (* NB only one feature currently defined above *)
- let x = _internal_get_server_features mmap in
-diff --git a/tools/ocaml/libs/xb/xs_ring_stubs.c b/tools/ocaml/libs/xb/xs_ring_stubs.c
-index 1f58524535..1243c63f03 100644
---- a/tools/ocaml/libs/xb/xs_ring_stubs.c
-+++ b/tools/ocaml/libs/xb/xs_ring_stubs.c
-@@ -112,7 +112,7 @@ CAMLprim value ml_interface_write(value ml_interface,
- CAMLlocal1(ml_result);
-
- struct mmap_interface *interface = GET_C_STRUCT(ml_interface);
-- const unsigned char *buffer = Bytes_val(ml_buffer);
-+ const char *buffer = String_val(ml_buffer);
- int len = Int_val(ml_len);
- int result;
-
---
-2.40.0
-
diff --git a/0037-x86-spec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch b/0037-x86-spec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch
new file mode 100644
index 0000000..6580907
--- /dev/null
+++ b/0037-x86-spec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch
@@ -0,0 +1,89 @@
+From 2e2c3efcfc9f183674a8de6ed954ffbe7188b70d Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 13 Sep 2023 13:53:33 +0100
+Subject: [PATCH 37/55] x86/spec-ctrl: Issue VERW during IST exit to Xen
+
+There is a corner case where e.g. an NMI hitting an exit-to-guest path after
+SPEC_CTRL_EXIT_TO_* would have run the entire NMI handler *after* the VERW
+flush to scrub potentially sensitive data from uarch buffers.
+
+In order to compensate, issue VERW when exiting to Xen from an IST entry.
+
+SPEC_CTRL_EXIT_TO_XEN already has two reads of spec_ctrl_flags off the stack,
+and we're about to add a third. Load the field into %ebx, and list the
+register as clobbered.
+
+%r12 has been arranged to be the ist_exit signal, so add this as an input
+dependency and use it to identify when to issue a VERW.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 3ee6066bcd737756b0990d417d94eddc0b0d2585)
+---
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 20 +++++++++++++++-----
+ xen/arch/x86/x86_64/entry.S | 2 +-
+ 2 files changed, 16 insertions(+), 6 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index 66c706496f..28a75796e6 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -357,10 +357,12 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ */
+ .macro SPEC_CTRL_EXIT_TO_XEN
+ /*
+- * Requires %r14=stack_end
+- * Clobbers %rax, %rcx, %rdx
++ * Requires %r12=ist_exit, %r14=stack_end
++ * Clobbers %rax, %rbx, %rcx, %rdx
+ */
+- testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
++ movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
++
++ testb $SCF_ist_sc_msr, %bl
+ jz .L\@_skip_sc_msr
+
+ /*
+@@ -371,7 +373,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ */
+ xor %edx, %edx
+
+- testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
++ testb $SCF_use_shadow, %bl
+ jz .L\@_skip_sc_msr
+
+ mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%r14), %eax
+@@ -380,8 +382,16 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+
+ .L\@_skip_sc_msr:
+
+- /* TODO VERW */
++ test %r12, %r12
++ jz .L\@_skip_ist_exit
++
++ /* Logically DO_SPEC_CTRL_COND_VERW but without the %rsp=cpuinfo dependency */
++ testb $SCF_verw, %bl
++ jz .L\@_skip_verw
++ verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
++.L\@_skip_verw:
+
++.L\@_skip_ist_exit:
+ .endm
+
+ #endif /* __ASSEMBLY__ */
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index 4cebc4fbe3..c12e011b4d 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -680,7 +680,7 @@ UNLIKELY_START(ne, exit_cr3)
+ UNLIKELY_END(exit_cr3)
+
+ /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+- SPEC_CTRL_EXIT_TO_XEN /* Req: %r14=end, Clob: acd */
++ SPEC_CTRL_EXIT_TO_XEN /* Req: %r12=ist_exit %r14=end, Clob: abcd */
+
+ RESTORE_ALL adj=8
+ iretq
+--
+2.42.0
+
diff --git a/0038-tools-oxenstored-validate-config-file-before-live-up.patch b/0038-tools-oxenstored-validate-config-file-before-live-up.patch
deleted file mode 100644
index 5b7f58a..0000000
--- a/0038-tools-oxenstored-validate-config-file-before-live-up.patch
+++ /dev/null
@@ -1,131 +0,0 @@
-From e74d868b48d55dfb20f5a41ec20fbec93d8e5deb Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edvin.torok@citrix.com>
-Date: Tue, 11 May 2021 15:56:50 +0000
-Subject: [PATCH 38/89] tools/oxenstored: validate config file before live
- update
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The configuration file can contain typos or various errors that could prevent
-live update from succeeding (e.g. a flag only valid on a different version).
-Unknown entries in the config file would be ignored on startup normally,
-add a strict --config-test that live-update can use to check that the config file
-is valid *for the new binary*.
-
-For compatibility with running old code during live update recognize
---live --help as an equivalent to --config-test.
-
-Signed-off-by: Edwin Török <edvin.torok@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit e6f07052ce4a0f0b7d4dc522d87465efb2d9ee86)
----
- tools/ocaml/xenstored/parse_arg.ml | 26 ++++++++++++++++++++++++++
- tools/ocaml/xenstored/xenstored.ml | 11 +++++++++--
- 2 files changed, 35 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/parse_arg.ml b/tools/ocaml/xenstored/parse_arg.ml
-index 7c0478e76a..5e4ca6f1f7 100644
---- a/tools/ocaml/xenstored/parse_arg.ml
-+++ b/tools/ocaml/xenstored/parse_arg.ml
-@@ -26,8 +26,14 @@ type config =
- restart: bool;
- live_reload: bool;
- disable_socket: bool;
-+ config_test: bool;
- }
-
-+let get_config_filename config_file =
-+ match config_file with
-+ | Some name -> name
-+ | None -> Define.default_config_dir ^ "/oxenstored.conf"
-+
- let do_argv =
- let pidfile = ref "" and tracefile = ref "" (* old xenstored compatibility *)
- and domain_init = ref true
-@@ -38,6 +44,8 @@ let do_argv =
- and restart = ref false
- and live_reload = ref false
- and disable_socket = ref false
-+ and config_test = ref false
-+ and help = ref false
- in
-
- let speclist =
-@@ -55,10 +63,27 @@ let do_argv =
- ("-T", Arg.Set_string tracefile, ""); (* for compatibility *)
- ("--restart", Arg.Set restart, "Read database on starting");
- ("--live", Arg.Set live_reload, "Read live dump on startup");
-+ ("--config-test", Arg.Set config_test, "Test validity of config file");
- ("--disable-socket", Arg.Unit (fun () -> disable_socket := true), "Disable socket");
-+ ("--help", Arg.Set help, "Display this list of options")
- ] in
- let usage_msg = "usage : xenstored [--config-file <filename>] [--no-domain-init] [--help] [--no-fork] [--reraise-top-level] [--restart] [--disable-socket]" in
- Arg.parse speclist (fun _ -> ()) usage_msg;
-+ let () =
-+ if !help then begin
-+ if !live_reload then
-+ (*
-+ * Transform --live --help into --config-test for backward compat with
-+ * running code during live update.
-+ * Caller will validate config and exit
-+ *)
-+ config_test := true
-+ else begin
-+ Arg.usage_string speclist usage_msg |> print_endline;
-+ exit 0
-+ end
-+ end
-+ in
- {
- domain_init = !domain_init;
- activate_access_log = !activate_access_log;
-@@ -70,4 +95,5 @@ let do_argv =
- restart = !restart;
- live_reload = !live_reload;
- disable_socket = !disable_socket;
-+ config_test = !config_test;
- }
-diff --git a/tools/ocaml/xenstored/xenstored.ml b/tools/ocaml/xenstored/xenstored.ml
-index 4d5851c5cb..e2638a5af2 100644
---- a/tools/ocaml/xenstored/xenstored.ml
-+++ b/tools/ocaml/xenstored/xenstored.ml
-@@ -88,7 +88,7 @@ let default_pidfile = Paths.xen_run_dir ^ "/xenstored.pid"
-
- let ring_scan_interval = ref 20
-
--let parse_config filename =
-+let parse_config ?(strict=false) filename =
- let pidfile = ref default_pidfile in
- let options = [
- ("merge-activate", Config.Set_bool Transaction.do_coalesce);
-@@ -129,11 +129,12 @@ let parse_config filename =
- ("xenstored-port", Config.Set_string Domains.xenstored_port); ] in
- begin try Config.read filename options (fun _ _ -> raise Not_found)
- with
-- | Config.Error err -> List.iter (fun (k, e) ->
-+ | Config.Error err as e -> List.iter (fun (k, e) ->
- match e with
- | "unknown key" -> eprintf "config: unknown key %s\n" k
- | _ -> eprintf "config: %s: %s\n" k e
- ) err;
-+ if strict then raise e
- | Sys_error m -> eprintf "error: config: %s\n" m;
- end;
- !pidfile
-@@ -358,6 +359,12 @@ let tweak_gc () =
- let () =
- Printexc.set_uncaught_exception_handler Logging.fallback_exception_handler;
- let cf = do_argv in
-+ if cf.config_test then begin
-+ let path = config_filename cf in
-+ let _pidfile:string = parse_config ~strict:true path in
-+ Printf.printf "Configuration valid at %s\n%!" path;
-+ exit 0
-+ end;
- let pidfile =
- if Sys.file_exists (config_filename cf) then
- parse_config (config_filename cf)
---
-2.40.0
-
diff --git a/0038-x86-amd-Introduce-is_zen-1-2-_uarch-predicates.patch b/0038-x86-amd-Introduce-is_zen-1-2-_uarch-predicates.patch
new file mode 100644
index 0000000..6f2cdcb
--- /dev/null
+++ b/0038-x86-amd-Introduce-is_zen-1-2-_uarch-predicates.patch
@@ -0,0 +1,91 @@
+From 19ee1e1faa32b79274b3484cb1170a5970f1e602 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 15 Sep 2023 12:13:51 +0100
+Subject: [PATCH 38/55] x86/amd: Introduce is_zen{1,2}_uarch() predicates
+
+We already have 3 cases using STIBP as a Zen1/2 heuristic, and are about to
+introduce a 4th. Wrap the heuristic into a pair of predicates rather than
+opencoding it, and the explanation of the heuristic, at each usage site.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit de1d265001397f308c5c3c5d3ffc30e7ef8c0705)
+---
+ xen/arch/x86/cpu/amd.c | 18 ++++--------------
+ xen/arch/x86/include/asm/amd.h | 11 +++++++++++
+ 2 files changed, 15 insertions(+), 14 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
+index 1bb3044be1..e94ba5a0e0 100644
+--- a/xen/arch/x86/cpu/amd.c
++++ b/xen/arch/x86/cpu/amd.c
+@@ -855,15 +855,13 @@ void amd_set_legacy_ssbd(bool enable)
+ * non-branch instructions to be ignored. It is to be set unilaterally in
+ * newer microcode.
+ *
+- * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
+- * simple model number comparison, so use STIBP as a heuristic to separate the
+- * two uarches in Fam17h(AMD)/18h(Hygon).
++ * This chickenbit is something unrelated on Zen1.
+ */
+ void amd_init_spectral_chicken(void)
+ {
+ uint64_t val, chickenbit = 1 << 1;
+
+- if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
++ if (cpu_has_hypervisor || !is_zen2_uarch())
+ return;
+
+ if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
+@@ -912,11 +910,8 @@ void amd_check_zenbleed(void)
+ * With the Fam17h check above, most parts getting here are
+ * Zen1. They're not affected. Assume Zen2 ones making it
+ * here are affected regardless of microcode version.
+- *
+- * Zen1 vs Zen2 isn't a simple model number comparison, so use
+- * STIBP as a heuristic to distinguish.
+ */
+- if (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
++ if (is_zen1_uarch())
+ return;
+ good_rev = ~0U;
+ break;
+@@ -1277,12 +1272,7 @@ static int __init cf_check zen2_c6_errata_check(void)
+ */
+ s_time_t delta;
+
+- /*
+- * Zen1 vs Zen2 isn't a simple model number comparison, so use STIBP as
+- * a heuristic to separate the two uarches in Fam17h.
+- */
+- if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 ||
+- !boot_cpu_has(X86_FEATURE_AMD_STIBP))
++ if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 || !is_zen2_uarch())
+ return 0;
+
+ /*
+diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
+index a975d3de26..82324110ab 100644
+--- a/xen/arch/x86/include/asm/amd.h
++++ b/xen/arch/x86/include/asm/amd.h
+@@ -140,6 +140,17 @@
+ AMD_MODEL_RANGE(0x11, 0x0, 0x0, 0xff, 0xf), \
+ AMD_MODEL_RANGE(0x12, 0x0, 0x0, 0xff, 0xf))
+
++/*
++ * The Zen1 and Zen2 microarchitectures are implemented by AMD (Fam17h) and
++ * Hygon (Fam18h) but without simple model number rules. Instead, use STIBP
++ * as a heuristic that distinguishes the two.
++ *
++ * The caller is required to perform the appropriate vendor/family checks
++ * first.
++ */
++#define is_zen1_uarch() (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
++#define is_zen2_uarch() boot_cpu_has(X86_FEATURE_AMD_STIBP)
++
+ struct cpuinfo_x86;
+ int cpu_has_amd_erratum(const struct cpuinfo_x86 *, int, ...);
+
+--
+2.42.0
+
diff --git a/0039-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch b/0039-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
deleted file mode 100644
index c967391..0000000
--- a/0039-tools-ocaml-libs-Don-t-declare-stubs-as-taking-void.patch
+++ /dev/null
@@ -1,61 +0,0 @@
-From 2c21e1bee6d62cbd523069e839086addf35da9f2 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
-Date: Thu, 12 Jan 2023 11:28:29 +0000
-Subject: [PATCH 39/89] tools/ocaml/libs: Don't declare stubs as taking void
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-There is no such thing as an Ocaml function (C stub or otherwise) taking no
-parameters. In the absence of any other parameters, unit is still passed.
-
-This doesn't explode with any ABI we care about, but would malfunction for an
-ABI environment such as stdcall.
-
-Fixes: c3afd398ba7f ("ocaml: Add XS bindings.")
-Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
-Signed-off-by: Edwin Török <edwin.torok@cloud.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit ff8b560be80b9211c303d74df7e4b3921d2bb8ca)
----
- tools/ocaml/libs/xb/xenbus_stubs.c | 5 ++---
- tools/ocaml/libs/xc/xenctrl_stubs.c | 4 ++--
- 2 files changed, 4 insertions(+), 5 deletions(-)
-
-diff --git a/tools/ocaml/libs/xb/xenbus_stubs.c b/tools/ocaml/libs/xb/xenbus_stubs.c
-index 3065181a55..97116b0782 100644
---- a/tools/ocaml/libs/xb/xenbus_stubs.c
-+++ b/tools/ocaml/libs/xb/xenbus_stubs.c
-@@ -30,10 +30,9 @@
- #include <xenctrl.h>
- #include <xen/io/xs_wire.h>
-
--CAMLprim value stub_header_size(void)
-+CAMLprim value stub_header_size(value unit)
- {
-- CAMLparam0();
-- CAMLreturn(Val_int(sizeof(struct xsd_sockmsg)));
-+ return Val_int(sizeof(struct xsd_sockmsg));
- }
-
- CAMLprim value stub_header_of_string(value s)
-diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index f37848ae0b..6eb0ea69da 100644
---- a/tools/ocaml/libs/xc/xenctrl_stubs.c
-+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
-@@ -67,9 +67,9 @@ static void Noreturn failwith_xc(xc_interface *xch)
- caml_raise_with_string(*caml_named_value("xc.error"), error_str);
- }
-
--CAMLprim value stub_xc_interface_open(void)
-+CAMLprim value stub_xc_interface_open(value unit)
- {
-- CAMLparam0();
-+ CAMLparam1(unit);
- xc_interface *xch;
-
- /* Don't assert XC_OPENFLAG_NON_REENTRANT because these bindings
---
-2.40.0
-
diff --git a/0039-x86-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch b/0039-x86-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch
new file mode 100644
index 0000000..4b23d12
--- /dev/null
+++ b/0039-x86-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch
@@ -0,0 +1,228 @@
+From 9ac2f49f5fa3a5159409241d4f74fb0d721dd4c5 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 30 Aug 2023 20:24:25 +0100
+Subject: [PATCH 39/55] x86/spec-ctrl: Mitigate the Zen1 DIV leakage
+
+In the Zen1 microarchitecture, there is one divider in the pipeline which
+services uops from both threads. In the case of #DE, the latched result from
+the previous DIV to execute will be forwarded speculatively.
+
+This is an interesting covert channel that allows two threads to communicate
+without any system calls. It also allows userspace to obtain the result of
+the most recent DIV instruction executed (even speculatively) in the core,
+which can be from a higher privilege context.
+
+Scrub the result from the divider by executing a non-faulting divide. This
+needs performing on the exit-to-guest paths, and ist_exit-to-Xen.
+
+Alternatives in IST context is believed safe now that it's done in NMI
+context.
+
+This is XSA-439 / CVE-2023-20588.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit b5926c6ecf05c28ee99c6248c42d691ccbf0c315)
+---
+ docs/misc/xen-command-line.pandoc | 6 ++-
+ xen/arch/x86/hvm/svm/entry.S | 1 +
+ xen/arch/x86/include/asm/cpufeatures.h | 2 +-
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 17 +++++++++
+ xen/arch/x86/spec_ctrl.c | 48 +++++++++++++++++++++++-
+ 5 files changed, 71 insertions(+), 3 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index d9dae740cc..b92c8f969c 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2315,7 +2315,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
+ > {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
+ > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
+ > eager-fpu,l1d-flush,branch-harden,srb-lock,
+-> unpriv-mmio,gds-mit}=<bool> ]`
++> unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
+
+ Controls for speculative execution sidechannel mitigations. By default, Xen
+ will pick the most appropriate mitigations based on compiled in support,
+@@ -2437,6 +2437,10 @@ has elected not to lock the configuration, Xen will use GDS_CTRL to mitigate
+ GDS with. Otherwise, Xen will mitigate by disabling AVX, which blocks the use
+ of the AVX2 Gather instructions.
+
++On all hardware, the `div-scrub=` option can be used to force or prevent Xen
++from mitigating the DIV-leakage vulnerability. By default, Xen will mitigate
++DIV-leakage on hardware believed to be vulnerable.
++
+ ### sync_console
+ > `= <boolean>`
+
+diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
+index 981cd82e7c..934f12cf5c 100644
+--- a/xen/arch/x86/hvm/svm/entry.S
++++ b/xen/arch/x86/hvm/svm/entry.S
+@@ -74,6 +74,7 @@ __UNLIKELY_END(nsvm_hap)
+ 1: /* No Spectre v1 concerns. Execution will hit VMRUN imminently. */
+ .endm
+ ALTERNATIVE "", svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
++ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
+
+ pop %r15
+ pop %r14
+diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
+index da0593de85..c3aad21c3b 100644
+--- a/xen/arch/x86/include/asm/cpufeatures.h
++++ b/xen/arch/x86/include/asm/cpufeatures.h
+@@ -35,7 +35,7 @@ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM
+ XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
+ XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
+ XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
+-/* Bits 23 unused. */
++XEN_CPUFEATURE(SC_DIV, X86_SYNTH(23)) /* DIV scrub needed */
+ XEN_CPUFEATURE(SC_RSB_IDLE, X86_SYNTH(24)) /* RSB overwrite needed for idle. */
+ XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
+ XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index 28a75796e6..f4b8b9d956 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -177,6 +177,19 @@
+ .L\@_verw_skip:
+ .endm
+
++.macro DO_SPEC_CTRL_DIV
++/*
++ * Requires nothing
++ * Clobbers %rax
++ *
++ * Issue a DIV for its flushing side effect (Zen1 uarch specific). Any
++ * non-faulting DIV will do; a byte DIV has least latency, and doesn't clobber
++ * %rdx.
++ */
++ mov $1, %eax
++ div %al
++.endm
++
+ .macro DO_SPEC_CTRL_ENTRY maybexen:req
+ /*
+ * Requires %rsp=regs (also cpuinfo if !maybexen)
+@@ -279,6 +292,8 @@
+ ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+
+ DO_SPEC_CTRL_COND_VERW
++
++ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
+ .endm
+
+ /*
+@@ -391,6 +406,8 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
+ .L\@_skip_verw:
+
++ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
++
+ .L\@_skip_ist_exit:
+ .endm
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 79b98f0fe7..0ff3c895ac 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -79,6 +79,7 @@ static int8_t __initdata opt_srb_lock = -1;
+ static bool __initdata opt_unpriv_mmio;
+ static bool __ro_after_init opt_fb_clear_mmio;
+ static int8_t __initdata opt_gds_mit = -1;
++static int8_t __initdata opt_div_scrub = -1;
+
+ static int __init cf_check parse_spec_ctrl(const char *s)
+ {
+@@ -133,6 +134,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ opt_srb_lock = 0;
+ opt_unpriv_mmio = false;
+ opt_gds_mit = 0;
++ opt_div_scrub = 0;
+ }
+ else if ( val > 0 )
+ rc = -EINVAL;
+@@ -285,6 +287,8 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ opt_unpriv_mmio = val;
+ else if ( (val = parse_boolean("gds-mit", s, ss)) >= 0 )
+ opt_gds_mit = val;
++ else if ( (val = parse_boolean("div-scrub", s, ss)) >= 0 )
++ opt_div_scrub = val;
+ else
+ rc = -EINVAL;
+
+@@ -485,7 +489,7 @@ static void __init print_details(enum ind_thunk thunk)
+ "\n");
+
+ /* Settings for Xen's protection, irrespective of guests. */
+- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s\n",
++ printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+ thunk == THUNK_NONE ? "N/A" :
+ thunk == THUNK_RETPOLINE ? "RETPOLINE" :
+ thunk == THUNK_LFENCE ? "LFENCE" :
+@@ -510,6 +514,7 @@ static void __init print_details(enum ind_thunk thunk)
+ opt_l1d_flush ? " L1D_FLUSH" : "",
+ opt_md_clear_pv || opt_md_clear_hvm ||
+ opt_fb_clear_mmio ? " VERW" : "",
++ opt_div_scrub ? " DIV" : "",
+ opt_branch_harden ? " BRANCH_HARDEN" : "");
+
+ /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
+@@ -967,6 +972,45 @@ static void __init srso_calculations(bool hw_smt_enabled)
+ setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
+ }
+
++/*
++ * The Div leakage issue is specific to the AMD Zen1 microarchitecture.
++ *
++ * However, there's no $FOO_NO bit defined, so if we're virtualised we have no
++ * hope of spotting the case where we might move to vulnerable hardware. We
++ * also can't make any useful conclusion about SMT-ness.
++ *
++ * Don't check the hypervisor bit, so at least we do the safe thing when
++ * booting on something that looks like a Zen1 CPU.
++ */
++static bool __init has_div_vuln(void)
++{
++ if ( !(boot_cpu_data.x86_vendor &
++ (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
++ return false;
++
++ if ( boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18 )
++ return false;
++
++ return is_zen1_uarch();
++}
++
++static void __init div_calculations(bool hw_smt_enabled)
++{
++ bool cpu_bug_div = has_div_vuln();
++
++ if ( opt_div_scrub == -1 )
++ opt_div_scrub = cpu_bug_div;
++
++ if ( opt_div_scrub )
++ setup_force_cpu_cap(X86_FEATURE_SC_DIV);
++
++ if ( opt_smt == -1 && !cpu_has_hypervisor && cpu_bug_div && hw_smt_enabled )
++ warning_add(
++ "Booted on leaky-DIV hardware with SMT/Hyperthreading\n"
++ "enabled. Please assess your configuration and choose an\n"
++ "explicit 'smt=<bool>' setting. See XSA-439.\n");
++}
++
+ static void __init ibpb_calculations(void)
+ {
+ bool def_ibpb_entry = false;
+@@ -1726,6 +1770,8 @@ void __init init_speculation_mitigations(void)
+
+ ibpb_calculations();
+
++ div_calculations(hw_smt_enabled);
++
+ /* Check whether Eager FPU should be enabled by default. */
+ if ( opt_eager_fpu == -1 )
+ opt_eager_fpu = should_use_eager_fpu();
+--
+2.42.0
+
diff --git a/0040-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch b/0040-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
deleted file mode 100644
index 5a26683..0000000
--- a/0040-tools-ocaml-libs-Allocate-the-correct-amount-of-memo.patch
+++ /dev/null
@@ -1,80 +0,0 @@
-From 5797b798a542a7e5be34698463152cb92f18776f Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 31 Jan 2023 10:59:42 +0000
-Subject: [PATCH 40/89] tools/ocaml/libs: Allocate the correct amount of memory
- for Abstract_tag
-
-caml_alloc() takes units of Wsize (word size), not bytes. As a consequence,
-we're allocating 4 or 8 times too much memory.
-
-Ocaml has a helper, Wsize_bsize(), but it truncates cases which aren't an
-exact multiple. Use a BUILD_BUG_ON() to cover the potential for truncation,
-as there's no rounding-up form of the helper.
-
-Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
-Fixes: d3e649277a13 ("ocaml: add mmap bindings implementation.")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 36eb2de31b6ecb8787698fb1a701bd708c8971b2)
----
- tools/ocaml/libs/mmap/Makefile | 2 ++
- tools/ocaml/libs/mmap/xenmmap_stubs.c | 6 +++++-
- tools/ocaml/libs/xc/xenctrl_stubs.c | 5 ++++-
- 3 files changed, 11 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/libs/mmap/Makefile b/tools/ocaml/libs/mmap/Makefile
-index a621537135..855b8b2c98 100644
---- a/tools/ocaml/libs/mmap/Makefile
-+++ b/tools/ocaml/libs/mmap/Makefile
-@@ -2,6 +2,8 @@ OCAML_TOPLEVEL=$(CURDIR)/../..
- XEN_ROOT=$(OCAML_TOPLEVEL)/../..
- include $(OCAML_TOPLEVEL)/common.make
-
-+CFLAGS += $(CFLAGS_xeninclude)
-+
- OBJS = xenmmap
- INTF = $(foreach obj, $(OBJS),$(obj).cmi)
- LIBS = xenmmap.cma xenmmap.cmxa
-diff --git a/tools/ocaml/libs/mmap/xenmmap_stubs.c b/tools/ocaml/libs/mmap/xenmmap_stubs.c
-index e03951d781..d623ad390e 100644
---- a/tools/ocaml/libs/mmap/xenmmap_stubs.c
-+++ b/tools/ocaml/libs/mmap/xenmmap_stubs.c
-@@ -21,6 +21,8 @@
- #include <errno.h>
- #include "mmap_stubs.h"
-
-+#include <xen-tools/libs.h>
-+
- #include <caml/mlvalues.h>
- #include <caml/memory.h>
- #include <caml/alloc.h>
-@@ -59,7 +61,9 @@ CAMLprim value stub_mmap_init(value fd, value pflag, value mflag,
- default: caml_invalid_argument("maptype");
- }
-
-- result = caml_alloc(sizeof(struct mmap_interface), Abstract_tag);
-+ BUILD_BUG_ON((sizeof(struct mmap_interface) % sizeof(value)) != 0);
-+ result = caml_alloc(Wsize_bsize(sizeof(struct mmap_interface)),
-+ Abstract_tag);
-
- if (mmap_interface_init(Intf_val(result), Int_val(fd),
- c_pflag, c_mflag,
-diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index 6eb0ea69da..e25367531b 100644
---- a/tools/ocaml/libs/xc/xenctrl_stubs.c
-+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
-@@ -956,7 +956,10 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
- uint32_t c_dom;
- unsigned long c_mfn;
-
-- result = caml_alloc(sizeof(struct mmap_interface), Abstract_tag);
-+ BUILD_BUG_ON((sizeof(struct mmap_interface) % sizeof(value)) != 0);
-+ result = caml_alloc(Wsize_bsize(sizeof(struct mmap_interface)),
-+ Abstract_tag);
-+
- intf = (struct mmap_interface *) result;
-
- intf->len = Int_val(size);
---
-2.40.0
-
diff --git a/0040-x86-shadow-defer-releasing-of-PV-s-top-level-shadow-.patch b/0040-x86-shadow-defer-releasing-of-PV-s-top-level-shadow-.patch
new file mode 100644
index 0000000..21fb16f
--- /dev/null
+++ b/0040-x86-shadow-defer-releasing-of-PV-s-top-level-shadow-.patch
@@ -0,0 +1,455 @@
+From 90c540c58985dc774cf0a1d2dc423473d3f37267 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <JBeulich@suse.com>
+Date: Wed, 20 Sep 2023 10:33:26 +0100
+Subject: [PATCH 40/55] x86/shadow: defer releasing of PV's top-level shadow
+ reference
+
+sh_set_toplevel_shadow() re-pinning the top-level shadow we may be
+running on is not enough (and at the same time unnecessary when the
+shadow isn't what we're running on): That shadow becomes eligible for
+blowing away (from e.g. shadow_prealloc()) immediately after the
+paging lock was dropped. Yet it needs to remain valid until the actual
+page table switch occurred.
+
+Propagate up the call chain the shadow entry that needs releasing
+eventually, and carry out the release immediately after switching page
+tables. Handle update_cr3() failures by switching to idle pagetables.
+Note that various further uses of update_cr3() are HVM-only or only act
+on paused vCPU-s, in which case sh_set_toplevel_shadow() will not defer
+releasing of the reference.
+
+While changing the update_cr3() hook, also convert the "do_locking"
+parameter to boolean.
+
+This is CVE-2023-34322 / XSA-438.
+
+Reported-by: Tim Deegan <tim@xen.org>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: George Dunlap <george.dunlap@cloud.com>
+(cherry picked from commit fb0ff49fe9f784bfee0370c2a3c5f20e39d7a1cb)
+---
+ xen/arch/x86/include/asm/mm.h | 2 +-
+ xen/arch/x86/include/asm/paging.h | 6 ++--
+ xen/arch/x86/include/asm/shadow.h | 8 +++++
+ xen/arch/x86/mm.c | 27 +++++++++++----
+ xen/arch/x86/mm/hap/hap.c | 6 ++--
+ xen/arch/x86/mm/shadow/common.c | 55 ++++++++++++++++++++-----------
+ xen/arch/x86/mm/shadow/multi.c | 33 ++++++++++++-------
+ xen/arch/x86/mm/shadow/none.c | 4 ++-
+ xen/arch/x86/mm/shadow/private.h | 14 ++++----
+ xen/arch/x86/pv/domain.c | 25 ++++++++++++--
+ 10 files changed, 127 insertions(+), 53 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
+index d723c7c38f..a5d7fdd32e 100644
+--- a/xen/arch/x86/include/asm/mm.h
++++ b/xen/arch/x86/include/asm/mm.h
+@@ -552,7 +552,7 @@ void audit_domains(void);
+ #endif
+
+ void make_cr3(struct vcpu *v, mfn_t mfn);
+-void update_cr3(struct vcpu *v);
++pagetable_t update_cr3(struct vcpu *v);
+ int vcpu_destroy_pagetables(struct vcpu *);
+ void *do_page_walk(struct vcpu *v, unsigned long addr);
+
+diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
+index 6f7000d5f4..94c590f31a 100644
+--- a/xen/arch/x86/include/asm/paging.h
++++ b/xen/arch/x86/include/asm/paging.h
+@@ -138,7 +138,7 @@ struct paging_mode {
+ paddr_t ga, uint32_t *pfec,
+ unsigned int *page_order);
+ #endif
+- void (*update_cr3 )(struct vcpu *v, int do_locking,
++ pagetable_t (*update_cr3 )(struct vcpu *v, bool do_locking,
+ bool noflush);
+ void (*update_paging_modes )(struct vcpu *v);
+ bool (*flush_tlb )(const unsigned long *vcpu_bitmap);
+@@ -310,9 +310,9 @@ static inline unsigned long paging_ga_to_gfn_cr3(struct vcpu *v,
+ /* Update all the things that are derived from the guest's CR3.
+ * Called when the guest changes CR3; the caller can then use v->arch.cr3
+ * as the value to load into the host CR3 to schedule this vcpu */
+-static inline void paging_update_cr3(struct vcpu *v, bool noflush)
++static inline pagetable_t paging_update_cr3(struct vcpu *v, bool noflush)
+ {
+- paging_get_hostmode(v)->update_cr3(v, 1, noflush);
++ return paging_get_hostmode(v)->update_cr3(v, 1, noflush);
+ }
+
+ /* Update all the things that are derived from the guest's CR0/CR3/CR4.
+diff --git a/xen/arch/x86/include/asm/shadow.h b/xen/arch/x86/include/asm/shadow.h
+index dad876d294..0b72c9eda8 100644
+--- a/xen/arch/x86/include/asm/shadow.h
++++ b/xen/arch/x86/include/asm/shadow.h
+@@ -99,6 +99,9 @@ int shadow_set_allocation(struct domain *d, unsigned int pages,
+
+ int shadow_get_allocation_bytes(struct domain *d, uint64_t *size);
+
++/* Helper to invoke for deferred releasing of a top-level shadow's reference. */
++void shadow_put_top_level(struct domain *d, pagetable_t old);
++
+ #else /* !CONFIG_SHADOW_PAGING */
+
+ #define shadow_vcpu_teardown(v) ASSERT(is_pv_vcpu(v))
+@@ -121,6 +124,11 @@ static inline void shadow_prepare_page_type_change(struct domain *d,
+
+ static inline void shadow_blow_tables_per_domain(struct domain *d) {}
+
++static inline void shadow_put_top_level(struct domain *d, pagetable_t old)
++{
++ ASSERT_UNREACHABLE();
++}
++
+ static inline int shadow_domctl(struct domain *d,
+ struct xen_domctl_shadow_op *sc,
+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index b46eee1332..e884a6fdbd 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -567,15 +567,12 @@ void write_ptbase(struct vcpu *v)
+ *
+ * Update ref counts to shadow tables appropriately.
+ */
+-void update_cr3(struct vcpu *v)
++pagetable_t update_cr3(struct vcpu *v)
+ {
+ mfn_t cr3_mfn;
+
+ if ( paging_mode_enabled(v->domain) )
+- {
+- paging_update_cr3(v, false);
+- return;
+- }
++ return paging_update_cr3(v, false);
+
+ if ( !(v->arch.flags & TF_kernel_mode) )
+ cr3_mfn = pagetable_get_mfn(v->arch.guest_table_user);
+@@ -583,6 +580,8 @@ void update_cr3(struct vcpu *v)
+ cr3_mfn = pagetable_get_mfn(v->arch.guest_table);
+
+ make_cr3(v, cr3_mfn);
++
++ return pagetable_null();
+ }
+
+ static inline void set_tlbflush_timestamp(struct page_info *page)
+@@ -3285,6 +3284,7 @@ int new_guest_cr3(mfn_t mfn)
+ struct domain *d = curr->domain;
+ int rc;
+ mfn_t old_base_mfn;
++ pagetable_t old_shadow;
+
+ if ( is_pv_32bit_domain(d) )
+ {
+@@ -3352,9 +3352,22 @@ int new_guest_cr3(mfn_t mfn)
+ if ( !VM_ASSIST(d, m2p_strict) )
+ fill_ro_mpt(mfn);
+ curr->arch.guest_table = pagetable_from_mfn(mfn);
+- update_cr3(curr);
++ old_shadow = update_cr3(curr);
++
++ /*
++ * In shadow mode update_cr3() can fail, in which case here we're still
++ * running on the prior top-level shadow (which we're about to release).
++ * Switch to the idle page tables in such an event; the guest will have
++ * been crashed already.
++ */
++ if ( likely(!mfn_eq(pagetable_get_mfn(old_shadow),
++ maddr_to_mfn(curr->arch.cr3 & ~X86_CR3_NOFLUSH))) )
++ write_ptbase(curr);
++ else
++ write_ptbase(idle_vcpu[curr->processor]);
+
+- write_ptbase(curr);
++ if ( !pagetable_is_null(old_shadow) )
++ shadow_put_top_level(d, old_shadow);
+
+ if ( likely(mfn_x(old_base_mfn) != 0) )
+ {
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index 0fc1b1d9ac..57a19c3d59 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -739,11 +739,13 @@ static bool cf_check hap_invlpg(struct vcpu *v, unsigned long linear)
+ return 1;
+ }
+
+-static void cf_check hap_update_cr3(
+- struct vcpu *v, int do_locking, bool noflush)
++static pagetable_t cf_check hap_update_cr3(
++ struct vcpu *v, bool do_locking, bool noflush)
+ {
+ v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3];
+ hvm_update_guest_cr3(v, noflush);
++
++ return pagetable_null();
+ }
+
+ static bool flush_vcpu(const struct vcpu *v, const unsigned long *vcpu_bitmap)
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index cf5e181f74..c0940f939e 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -2590,13 +2590,13 @@ void cf_check shadow_update_paging_modes(struct vcpu *v)
+ }
+
+ /* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
+-void sh_set_toplevel_shadow(struct vcpu *v,
+- unsigned int slot,
+- mfn_t gmfn,
+- unsigned int root_type,
+- mfn_t (*make_shadow)(struct vcpu *v,
+- mfn_t gmfn,
+- uint32_t shadow_type))
++pagetable_t sh_set_toplevel_shadow(struct vcpu *v,
++ unsigned int slot,
++ mfn_t gmfn,
++ unsigned int root_type,
++ mfn_t (*make_shadow)(struct vcpu *v,
++ mfn_t gmfn,
++ uint32_t shadow_type))
+ {
+ mfn_t smfn;
+ pagetable_t old_entry, new_entry;
+@@ -2653,20 +2653,37 @@ void sh_set_toplevel_shadow(struct vcpu *v,
+ mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
+ v->arch.paging.shadow.shadow_table[slot] = new_entry;
+
+- /* Decrement the refcount of the old contents of this slot */
+- if ( !pagetable_is_null(old_entry) )
++ /*
++ * Decrement the refcount of the old contents of this slot, unless
++ * we're still running on that shadow - in that case it'll need holding
++ * on to until the actual page table switch did occur.
++ */
++ if ( !pagetable_is_null(old_entry) && (v != current || !is_pv_domain(d)) )
+ {
+- mfn_t old_smfn = pagetable_get_mfn(old_entry);
+- /* Need to repin the old toplevel shadow if it's been unpinned
+- * by shadow_prealloc(): in PV mode we're still running on this
+- * shadow and it's not safe to free it yet. */
+- if ( !mfn_to_page(old_smfn)->u.sh.pinned && !sh_pin(d, old_smfn) )
+- {
+- printk(XENLOG_G_ERR "can't re-pin %"PRI_mfn"\n", mfn_x(old_smfn));
+- domain_crash(d);
+- }
+- sh_put_ref(d, old_smfn, 0);
++ sh_put_ref(d, pagetable_get_mfn(old_entry), 0);
++ old_entry = pagetable_null();
+ }
++
++ /*
++ * 2- and 3-level shadow mode is used for HVM only. Therefore we never run
++ * on such a shadow, so only call sites requesting an L4 shadow need to pay
++ * attention to the returned value.
++ */
++ ASSERT(pagetable_is_null(old_entry) || root_type == SH_type_l4_64_shadow);
++
++ return old_entry;
++}
++
++/*
++ * Helper invoked when releasing of a top-level shadow's reference was
++ * deferred in sh_set_toplevel_shadow() above.
++ */
++void shadow_put_top_level(struct domain *d, pagetable_t old_entry)
++{
++ ASSERT(!pagetable_is_null(old_entry));
++ paging_lock(d);
++ sh_put_ref(d, pagetable_get_mfn(old_entry), 0);
++ paging_unlock(d);
+ }
+
+ /**************************************************************************/
+diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
+index 671bf8c228..c92b354a78 100644
+--- a/xen/arch/x86/mm/shadow/multi.c
++++ b/xen/arch/x86/mm/shadow/multi.c
+@@ -3224,7 +3224,8 @@ static void cf_check sh_detach_old_tables(struct vcpu *v)
+ }
+ }
+
+-static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
++static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
++ bool noflush)
+ /* Updates vcpu->arch.cr3 after the guest has changed CR3.
+ * Paravirtual guests should set v->arch.guest_table (and guest_table_user,
+ * if appropriate).
+@@ -3238,6 +3239,7 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
+ {
+ struct domain *d = v->domain;
+ mfn_t gmfn;
++ pagetable_t old_entry = pagetable_null();
+ #if GUEST_PAGING_LEVELS == 3
+ const guest_l3e_t *gl3e;
+ unsigned int i, guest_idx;
+@@ -3247,7 +3249,7 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
+ if ( !is_hvm_domain(d) && !v->is_initialised )
+ {
+ ASSERT(v->arch.cr3 == 0);
+- return;
++ return old_entry;
+ }
+
+ if ( do_locking ) paging_lock(v->domain);
+@@ -3320,11 +3322,12 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
+ #if GUEST_PAGING_LEVELS == 4
+ if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
+ guest_flush_tlb_mask(d, d->dirty_cpumask);
+- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
++ old_entry = sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow,
++ sh_make_shadow);
+ if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
+ {
+ ASSERT(d->is_dying || d->is_shutting_down);
+- return;
++ return old_entry;
+ }
+ if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
+ {
+@@ -3368,24 +3371,30 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
+ gl2gfn = guest_l3e_get_gfn(gl3e[i]);
+ gl2mfn = get_gfn_query_unlocked(d, gfn_x(gl2gfn), &p2mt);
+ if ( p2m_is_ram(p2mt) )
+- sh_set_toplevel_shadow(v, i, gl2mfn, SH_type_l2_shadow,
+- sh_make_shadow);
++ old_entry = sh_set_toplevel_shadow(v, i, gl2mfn,
++ SH_type_l2_shadow,
++ sh_make_shadow);
+ else
+- sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
+- sh_make_shadow);
++ old_entry = sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
++ sh_make_shadow);
+ }
+ else
+- sh_set_toplevel_shadow(v, i, INVALID_MFN, 0, sh_make_shadow);
++ old_entry = sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
++ sh_make_shadow);
++
++ ASSERT(pagetable_is_null(old_entry));
+ }
+ }
+ #elif GUEST_PAGING_LEVELS == 2
+ if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
+ guest_flush_tlb_mask(d, d->dirty_cpumask);
+- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
++ old_entry = sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow,
++ sh_make_shadow);
++ ASSERT(pagetable_is_null(old_entry));
+ if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
+ {
+ ASSERT(d->is_dying || d->is_shutting_down);
+- return;
++ return old_entry;
+ }
+ #else
+ #error This should never happen
+@@ -3473,6 +3482,8 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
+
+ /* Release the lock, if we took it (otherwise it's the caller's problem) */
+ if ( do_locking ) paging_unlock(v->domain);
++
++ return old_entry;
+ }
+
+
+diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
+index eaaa874b11..743c0ffb85 100644
+--- a/xen/arch/x86/mm/shadow/none.c
++++ b/xen/arch/x86/mm/shadow/none.c
+@@ -52,9 +52,11 @@ static unsigned long cf_check _gva_to_gfn(
+ }
+ #endif
+
+-static void cf_check _update_cr3(struct vcpu *v, int do_locking, bool noflush)
++static pagetable_t cf_check _update_cr3(struct vcpu *v, bool do_locking,
++ bool noflush)
+ {
+ ASSERT_UNREACHABLE();
++ return pagetable_null();
+ }
+
+ static void cf_check _update_paging_modes(struct vcpu *v)
+diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
+index c2bb1ed3c3..91f798c5aa 100644
+--- a/xen/arch/x86/mm/shadow/private.h
++++ b/xen/arch/x86/mm/shadow/private.h
+@@ -391,13 +391,13 @@ mfn_t shadow_alloc(struct domain *d,
+ void shadow_free(struct domain *d, mfn_t smfn);
+
+ /* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
+-void sh_set_toplevel_shadow(struct vcpu *v,
+- unsigned int slot,
+- mfn_t gmfn,
+- unsigned int root_type,
+- mfn_t (*make_shadow)(struct vcpu *v,
+- mfn_t gmfn,
+- uint32_t shadow_type));
++pagetable_t sh_set_toplevel_shadow(struct vcpu *v,
++ unsigned int slot,
++ mfn_t gmfn,
++ unsigned int root_type,
++ mfn_t (*make_shadow)(struct vcpu *v,
++ mfn_t gmfn,
++ uint32_t shadow_type));
+
+ /* Update the shadows in response to a pagetable write from Xen */
+ int sh_validate_guest_entry(struct vcpu *v, mfn_t gmfn, void *entry, u32 size);
+diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
+index 5c92812dc6..2a445bb17b 100644
+--- a/xen/arch/x86/pv/domain.c
++++ b/xen/arch/x86/pv/domain.c
+@@ -424,10 +424,13 @@ bool __init xpti_pcid_enabled(void)
+
+ static void _toggle_guest_pt(struct vcpu *v)
+ {
++ bool guest_update;
++ pagetable_t old_shadow;
+ unsigned long cr3;
+
+ v->arch.flags ^= TF_kernel_mode;
+- update_cr3(v);
++ guest_update = v->arch.flags & TF_kernel_mode;
++ old_shadow = update_cr3(v);
+
+ /*
+ * Don't flush user global mappings from the TLB. Don't tick TLB clock.
+@@ -436,13 +439,31 @@ static void _toggle_guest_pt(struct vcpu *v)
+ * TLB flush (for just the incoming PCID), as the top level page table may
+ * have changed behind our backs. To be on the safe side, suppress the
+ * no-flush unconditionally in this case.
++ *
++ * Furthermore in shadow mode update_cr3() can fail, in which case here
++ * we're still running on the prior top-level shadow (which we're about
++ * to release). Switch to the idle page tables in such an event; the
++ * guest will have been crashed already.
+ */
+ cr3 = v->arch.cr3;
+ if ( shadow_mode_enabled(v->domain) )
++ {
+ cr3 &= ~X86_CR3_NOFLUSH;
++
++ if ( unlikely(mfn_eq(pagetable_get_mfn(old_shadow),
++ maddr_to_mfn(cr3))) )
++ {
++ cr3 = idle_vcpu[v->processor]->arch.cr3;
++ /* Also suppress runstate/time area updates below. */
++ guest_update = false;
++ }
++ }
+ write_cr3(cr3);
+
+- if ( !(v->arch.flags & TF_kernel_mode) )
++ if ( !pagetable_is_null(old_shadow) )
++ shadow_put_top_level(v->domain, old_shadow);
++
++ if ( !guest_update )
+ return;
+
+ if ( v->arch.pv.need_update_runstate_area && update_runstate_area(v) )
+--
+2.42.0
+
diff --git a/0041-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch b/0041-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
deleted file mode 100644
index cabcdd0..0000000
--- a/0041-tools-ocaml-evtchn-Don-t-reference-Custom-objects-wi.patch
+++ /dev/null
@@ -1,213 +0,0 @@
-From 021b82cc0c71ba592439f175c1ededa800b172a9 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
-Date: Thu, 12 Jan 2023 17:48:29 +0000
-Subject: [PATCH 41/89] tools/ocaml/evtchn: Don't reference Custom objects with
- the GC lock released
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The modification to the _H() macro for Ocaml 5 support introduced a subtle
-bug. From the manual:
-
- https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code
-
-"After caml_release_runtime_system() was called and until
-caml_acquire_runtime_system() is called, the C code must not access any OCaml
-data, nor call any function of the run-time system, nor call back into OCaml
-code."
-
-Previously, the value was a naked C pointer, so dereferencing it wasn't
-"accessing any Ocaml data", but the fix to avoid naked C pointers added a
-layer of indirection through an Ocaml Custom object, meaning that the common
-pattern of using _H() in a blocking section is unsafe.
-
-In order to fix:
-
- * Drop the _H() macro and replace it with a static inline xce_of_val().
- * Opencode the assignment into Data_custom_val() in the two constructors.
- * Rename "value xce" parameters to "value xce_val" so we can consistently
- have "xenevtchn_handle *xce" on the stack, and obtain the pointer with the
- GC lock still held.
-
-Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
-Signed-off-by: Edwin Török <edwin.torok@cloud.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 2636d8ff7a670c4d2485757dbe966e36c259a960)
----
- tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 60 +++++++++++--------
- 1 file changed, 35 insertions(+), 25 deletions(-)
-
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-index aa8a69cc1e..d7881ca95f 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-@@ -33,11 +33,14 @@
- #include <caml/fail.h>
- #include <caml/signals.h>
-
--#define _H(__h) (*((xenevtchn_handle **)Data_custom_val(__h)))
-+static inline xenevtchn_handle *xce_of_val(value v)
-+{
-+ return *(xenevtchn_handle **)Data_custom_val(v);
-+}
-
- static void stub_evtchn_finalize(value v)
- {
-- xenevtchn_close(_H(v));
-+ xenevtchn_close(xce_of_val(v));
- }
-
- static struct custom_operations xenevtchn_ops = {
-@@ -68,7 +71,7 @@ CAMLprim value stub_eventchn_init(value cloexec)
- caml_failwith("open failed");
-
- result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
-- _H(result) = xce;
-+ *(xenevtchn_handle **)Data_custom_val(result) = xce;
-
- CAMLreturn(result);
- }
-@@ -87,18 +90,19 @@ CAMLprim value stub_eventchn_fdopen(value fdval)
- caml_failwith("evtchn fdopen failed");
-
- result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
-- _H(result) = xce;
-+ *(xenevtchn_handle **)Data_custom_val(result) = xce;
-
- CAMLreturn(result);
- }
-
--CAMLprim value stub_eventchn_fd(value xce)
-+CAMLprim value stub_eventchn_fd(value xce_val)
- {
-- CAMLparam1(xce);
-+ CAMLparam1(xce_val);
- CAMLlocal1(result);
-+ xenevtchn_handle *xce = xce_of_val(xce_val);
- int fd;
-
-- fd = xenevtchn_fd(_H(xce));
-+ fd = xenevtchn_fd(xce);
- if (fd == -1)
- caml_failwith("evtchn fd failed");
-
-@@ -107,13 +111,14 @@ CAMLprim value stub_eventchn_fd(value xce)
- CAMLreturn(result);
- }
-
--CAMLprim value stub_eventchn_notify(value xce, value port)
-+CAMLprim value stub_eventchn_notify(value xce_val, value port)
- {
-- CAMLparam2(xce, port);
-+ CAMLparam2(xce_val, port);
-+ xenevtchn_handle *xce = xce_of_val(xce_val);
- int rc;
-
- caml_enter_blocking_section();
-- rc = xenevtchn_notify(_H(xce), Int_val(port));
-+ rc = xenevtchn_notify(xce, Int_val(port));
- caml_leave_blocking_section();
-
- if (rc == -1)
-@@ -122,15 +127,16 @@ CAMLprim value stub_eventchn_notify(value xce, value port)
- CAMLreturn(Val_unit);
- }
-
--CAMLprim value stub_eventchn_bind_interdomain(value xce, value domid,
-+CAMLprim value stub_eventchn_bind_interdomain(value xce_val, value domid,
- value remote_port)
- {
-- CAMLparam3(xce, domid, remote_port);
-+ CAMLparam3(xce_val, domid, remote_port);
- CAMLlocal1(port);
-+ xenevtchn_handle *xce = xce_of_val(xce_val);
- xenevtchn_port_or_error_t rc;
-
- caml_enter_blocking_section();
-- rc = xenevtchn_bind_interdomain(_H(xce), Int_val(domid), Int_val(remote_port));
-+ rc = xenevtchn_bind_interdomain(xce, Int_val(domid), Int_val(remote_port));
- caml_leave_blocking_section();
-
- if (rc == -1)
-@@ -140,14 +146,15 @@ CAMLprim value stub_eventchn_bind_interdomain(value xce, value domid,
- CAMLreturn(port);
- }
-
--CAMLprim value stub_eventchn_bind_virq(value xce, value virq_type)
-+CAMLprim value stub_eventchn_bind_virq(value xce_val, value virq_type)
- {
-- CAMLparam2(xce, virq_type);
-+ CAMLparam2(xce_val, virq_type);
- CAMLlocal1(port);
-+ xenevtchn_handle *xce = xce_of_val(xce_val);
- xenevtchn_port_or_error_t rc;
-
- caml_enter_blocking_section();
-- rc = xenevtchn_bind_virq(_H(xce), Int_val(virq_type));
-+ rc = xenevtchn_bind_virq(xce, Int_val(virq_type));
- caml_leave_blocking_section();
-
- if (rc == -1)
-@@ -157,13 +164,14 @@ CAMLprim value stub_eventchn_bind_virq(value xce, value virq_type)
- CAMLreturn(port);
- }
-
--CAMLprim value stub_eventchn_unbind(value xce, value port)
-+CAMLprim value stub_eventchn_unbind(value xce_val, value port)
- {
-- CAMLparam2(xce, port);
-+ CAMLparam2(xce_val, port);
-+ xenevtchn_handle *xce = xce_of_val(xce_val);
- int rc;
-
- caml_enter_blocking_section();
-- rc = xenevtchn_unbind(_H(xce), Int_val(port));
-+ rc = xenevtchn_unbind(xce, Int_val(port));
- caml_leave_blocking_section();
-
- if (rc == -1)
-@@ -172,14 +180,15 @@ CAMLprim value stub_eventchn_unbind(value xce, value port)
- CAMLreturn(Val_unit);
- }
-
--CAMLprim value stub_eventchn_pending(value xce)
-+CAMLprim value stub_eventchn_pending(value xce_val)
- {
-- CAMLparam1(xce);
-+ CAMLparam1(xce_val);
- CAMLlocal1(result);
-+ xenevtchn_handle *xce = xce_of_val(xce_val);
- xenevtchn_port_or_error_t port;
-
- caml_enter_blocking_section();
-- port = xenevtchn_pending(_H(xce));
-+ port = xenevtchn_pending(xce);
- caml_leave_blocking_section();
-
- if (port == -1)
-@@ -189,16 +198,17 @@ CAMLprim value stub_eventchn_pending(value xce)
- CAMLreturn(result);
- }
-
--CAMLprim value stub_eventchn_unmask(value xce, value _port)
-+CAMLprim value stub_eventchn_unmask(value xce_val, value _port)
- {
-- CAMLparam2(xce, _port);
-+ CAMLparam2(xce_val, _port);
-+ xenevtchn_handle *xce = xce_of_val(xce_val);
- evtchn_port_t port;
- int rc;
-
- port = Int_val(_port);
-
- caml_enter_blocking_section();
-- rc = xenevtchn_unmask(_H(xce), port);
-+ rc = xenevtchn_unmask(xce, port);
- caml_leave_blocking_section();
-
- if (rc)
---
-2.40.0
-
diff --git a/0041-tools-xenstored-domain_entry_fix-Handle-conflicting-.patch b/0041-tools-xenstored-domain_entry_fix-Handle-conflicting-.patch
new file mode 100644
index 0000000..1edecc8
--- /dev/null
+++ b/0041-tools-xenstored-domain_entry_fix-Handle-conflicting-.patch
@@ -0,0 +1,64 @@
+From c4e05c97f57d236040d1da5c1fbf6e3699dc86ea Mon Sep 17 00:00:00 2001
+From: Julien Grall <jgrall@amazon.com>
+Date: Fri, 22 Sep 2023 11:32:16 +0100
+Subject: [PATCH 41/55] tools/xenstored: domain_entry_fix(): Handle conflicting
+ transaction
+
+The function domain_entry_fix() will initially be called to check whether the
+quota is correct before attempting to commit any nodes. So it is possible
+that the accounting is temporarily negative. This is the case
+in the following sequence:
+
+ 1) Create 50 nodes
+ 2) Start two transactions
+ 3) Delete all the nodes in each transaction
+ 4) Commit the two transactions
+
+Because the first transaction will have succeeded and updated the
+accounting, there is no guarantee that 'd->nbentry + num' will still
+be above 0, so the assert() would be triggered.
+The assert() was introduced in dbef1f748289 ("tools/xenstore: simplify
+and fix per domain node accounting") with the assumption that the
+value can't be negative. As this is not true, revert to the original
+check, but restrict it to the path where we don't update. Take the
+opportunity to explain the rationale behind the check.
+
+This is CVE-2023-34323 / XSA-440.
+
+Fixes: dbef1f748289 ("tools/xenstore: simplify and fix per domain node accounting")
+Signed-off-by: Julien Grall <jgrall@amazon.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+---
+ tools/xenstore/xenstored_domain.c | 14 ++++++++++++--
+ 1 file changed, 12 insertions(+), 2 deletions(-)
+
+diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
+index aa86892fed..6074df210c 100644
+--- a/tools/xenstore/xenstored_domain.c
++++ b/tools/xenstore/xenstored_domain.c
+@@ -1094,10 +1094,20 @@ int domain_entry_fix(unsigned int domid, int num, bool update)
+ }
+
+ cnt = d->nbentry + num;
+- assert(cnt >= 0);
+
+- if (update)
++ if (update) {
++ assert(cnt >= 0);
+ d->nbentry = cnt;
++ } else if (cnt < 0) {
++ /*
++ * In a transaction when a node is being added/removed AND
++ * the same node has been added/removed outside the
++ * transaction in parallel, the result value may be negative.
++ * This is no problem, as the transaction will fail due to
++ * the resulting conflict. So override 'cnt'.
++ */
++ cnt = 0;
++ }
+
+ return domid_is_unprivileged(domid) ? cnt : 0;
+ }
+--
+2.42.0
+
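The accounting scenario described in the patch above can be reproduced with a small standalone sketch (plain C, not xenstored code; the domain struct and the entry_fix() helper below are simplified stand-ins for xenstored's per-domain accounting, written here only to illustrate the check-vs-commit distinction):

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Simplified stand-in for xenstored's per-domain node accounting. */
    struct domain { int nbentry; };

    /*
     * Mirrors the fixed domain_entry_fix() logic: assert and update only when
     * committing; on the check-only path a transiently negative value is
     * clamped, because the conflicting transaction will fail anyway.
     */
    static int entry_fix(struct domain *d, int num, bool update)
    {
        int cnt = d->nbentry + num;

        if (update) {
            assert(cnt >= 0);
            d->nbentry = cnt;
        } else if (cnt < 0) {
            cnt = 0;
        }

        return cnt;
    }

    int main(void)
    {
        struct domain d = { .nbentry = 50 };

        /* Two transactions each staged the deletion of all 50 nodes. */
        entry_fix(&d, -50, true);                  /* first commit: 50 -> 0 */
        printf("%d\n", entry_fix(&d, -50, false)); /* second check: clamped to 0 */
        return 0;
    }

With the pre-patch assert() in the check path, the second call would have aborted xenstored instead of letting the conflicting transaction fail cleanly.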
diff --git a/0042-iommu-amd-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch b/0042-iommu-amd-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch
new file mode 100644
index 0000000..66597c2
--- /dev/null
+++ b/0042-iommu-amd-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch
@@ -0,0 +1,186 @@
+From 0d8f9f7f2706e8ad8dfff203173693b631339b86 Mon Sep 17 00:00:00 2001
+From: Roger Pau Monne <roger.pau@citrix.com>
+Date: Tue, 13 Jun 2023 15:01:05 +0200
+Subject: [PATCH 42/55] iommu/amd-vi: flush IOMMU TLB when flushing the DTE
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The caching invalidation guidelines from the AMD-Vi specification (48882—Rev
+3.07-PUB—Oct 2022) seem to be misleading on some hardware, as devices will
+malfunction (see stale DMA mappings) if some fields of the DTE are updated but
+the IOMMU TLB is not flushed. This has been observed in practice on AMD
+systems. Due to the lack of guidance from the currently published
+specification this patch aims to increase the flushing done in order to prevent
+device malfunction.
+
+In order to fix, issue an INVALIDATE_IOMMU_PAGES command from
+amd_iommu_flush_device(), flushing all the address space. Note this requires
+callers to be adjusted in order to pass the DomID on the DTE previous to the
+modification.
+
+Some call sites don't provide a valid DomID to amd_iommu_flush_device() in
+order to avoid the flush. That's because the device had address translations
+disabled and hence the previous DomID on the DTE is not valid. Note the
+current logic relies on the entity disabling address translations to also flush
+the TLB of the in use DomID.
+
+Device I/O TLB flushing when ATS is enabled is not covered by the current
+change, as ATS usage is not security supported.
+
+This is XSA-442 / CVE-2023-34326
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 5fc98b97084a46884acef9320e643faf40d42212)
+---
+ xen/drivers/passthrough/amd/iommu.h | 3 ++-
+ xen/drivers/passthrough/amd/iommu_cmd.c | 10 +++++++++-
+ xen/drivers/passthrough/amd/iommu_guest.c | 5 +++--
+ xen/drivers/passthrough/amd/iommu_init.c | 6 +++++-
+ xen/drivers/passthrough/amd/pci_amd_iommu.c | 14 ++++++++++----
+ 5 files changed, 29 insertions(+), 9 deletions(-)
+
+diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
+index 5429ada58e..a58be28bf9 100644
+--- a/xen/drivers/passthrough/amd/iommu.h
++++ b/xen/drivers/passthrough/amd/iommu.h
+@@ -283,7 +283,8 @@ void amd_iommu_flush_pages(struct domain *d, unsigned long dfn,
+ unsigned int order);
+ void amd_iommu_flush_iotlb(u8 devfn, const struct pci_dev *pdev,
+ uint64_t gaddr, unsigned int order);
+-void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf);
++void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf,
++ domid_t domid);
+ void amd_iommu_flush_intremap(struct amd_iommu *iommu, uint16_t bdf);
+ void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
+
+diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthrough/amd/iommu_cmd.c
+index 40ddf366bb..cb28b36abc 100644
+--- a/xen/drivers/passthrough/amd/iommu_cmd.c
++++ b/xen/drivers/passthrough/amd/iommu_cmd.c
+@@ -363,10 +363,18 @@ void amd_iommu_flush_pages(struct domain *d,
+ _amd_iommu_flush_pages(d, __dfn_to_daddr(dfn), order);
+ }
+
+-void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf)
++void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf,
++ domid_t domid)
+ {
+ invalidate_dev_table_entry(iommu, bdf);
+ flush_command_buffer(iommu, 0);
++
++ /* Also invalidate IOMMU TLB entries when flushing the DTE. */
++ if ( domid != DOMID_INVALID )
++ {
++ invalidate_iommu_pages(iommu, INV_IOMMU_ALL_PAGES_ADDRESS, domid, 0);
++ flush_command_buffer(iommu, 0);
++ }
+ }
+
+ void amd_iommu_flush_intremap(struct amd_iommu *iommu, uint16_t bdf)
+diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
+index 80a331f546..be86bce6fb 100644
+--- a/xen/drivers/passthrough/amd/iommu_guest.c
++++ b/xen/drivers/passthrough/amd/iommu_guest.c
+@@ -385,7 +385,7 @@ static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
+
+ static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
+ {
+- uint16_t gbdf, mbdf, req_id, gdom_id, hdom_id;
++ uint16_t gbdf, mbdf, req_id, gdom_id, hdom_id, prev_domid;
+ struct amd_iommu_dte *gdte, *mdte, *dte_base;
+ struct amd_iommu *iommu = NULL;
+ struct guest_iommu *g_iommu;
+@@ -445,13 +445,14 @@ static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
+ req_id = get_dma_requestor_id(iommu->seg, mbdf);
+ dte_base = iommu->dev_table.buffer;
+ mdte = &dte_base[req_id];
++ prev_domid = mdte->domain_id;
+
+ spin_lock_irqsave(&iommu->lock, flags);
+ dte_set_gcr3_table(mdte, hdom_id, gcr3_mfn << PAGE_SHIFT, gv, glx);
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+- amd_iommu_flush_device(iommu, req_id);
++ amd_iommu_flush_device(iommu, req_id, prev_domid);
+
+ return 0;
+ }
+diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
+index 166570648d..101a60ce17 100644
+--- a/xen/drivers/passthrough/amd/iommu_init.c
++++ b/xen/drivers/passthrough/amd/iommu_init.c
+@@ -1547,7 +1547,11 @@ static int cf_check _invalidate_all_devices(
+ req_id = ivrs_mappings[bdf].dte_requestor_id;
+ if ( iommu )
+ {
+- amd_iommu_flush_device(iommu, req_id);
++ /*
++ * IOMMU TLB flush performed separately (see
++ * invalidate_all_domain_pages()).
++ */
++ amd_iommu_flush_device(iommu, req_id, DOMID_INVALID);
+ amd_iommu_flush_intremap(iommu, req_id);
+ }
+ }
+diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
+index 94e3775506..8641b84712 100644
+--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
++++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
+@@ -192,10 +192,13 @@ static int __must_check amd_iommu_setup_domain_device(
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+- amd_iommu_flush_device(iommu, req_id);
++ /* DTE didn't have DMA translations enabled, do not flush the TLB. */
++ amd_iommu_flush_device(iommu, req_id, DOMID_INVALID);
+ }
+ else if ( dte->pt_root != mfn_x(page_to_mfn(root_pg)) )
+ {
++ domid_t prev_domid = dte->domain_id;
++
+ /*
+ * Strictly speaking if the device is the only one with this requestor
+ * ID, it could be allowed to be re-assigned regardless of unity map
+@@ -252,7 +255,7 @@ static int __must_check amd_iommu_setup_domain_device(
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+- amd_iommu_flush_device(iommu, req_id);
++ amd_iommu_flush_device(iommu, req_id, prev_domid);
+ }
+ else
+ spin_unlock_irqrestore(&iommu->lock, flags);
+@@ -421,6 +424,8 @@ static void amd_iommu_disable_domain_device(const struct domain *domain,
+ spin_lock_irqsave(&iommu->lock, flags);
+ if ( dte->tv || dte->v )
+ {
++ domid_t prev_domid = dte->domain_id;
++
+ /* See the comment in amd_iommu_setup_device_table(). */
+ dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
+ smp_wmb();
+@@ -439,7 +444,7 @@ static void amd_iommu_disable_domain_device(const struct domain *domain,
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+- amd_iommu_flush_device(iommu, req_id);
++ amd_iommu_flush_device(iommu, req_id, prev_domid);
+
+ AMD_IOMMU_DEBUG("Disable: device id = %#x, "
+ "domain = %d, paging mode = %d\n",
+@@ -610,7 +615,8 @@ static int cf_check amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
+
+ spin_unlock_irqrestore(&iommu->lock, flags);
+
+- amd_iommu_flush_device(iommu, bdf);
++ /* DTE didn't have DMA translations enabled, do not flush the TLB. */
++ amd_iommu_flush_device(iommu, bdf, DOMID_INVALID);
+ }
+
+ if ( amd_iommu_reserve_domain_unity_map(
+--
+2.42.0
+
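The calling convention the patch above introduces (capture the DomID that was live in the DTE before rewriting it, and skip the IOMMU TLB flush only when translations were disabled) can be sketched roughly as follows. Everything here is a simplified stand-in, not the real Xen driver code; the helper names and the DOMID_INVALID value are assumptions for illustration, and the stub bodies just print what the real IOMMU commands would do:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t domid_t;
    #define DOMID_INVALID 0x7FF4   /* sentinel; Xen's public headers define the real value */

    struct dte { domid_t domain_id; int v; };

    /* Trivial stand-ins for the IOMMU command helpers used in the hunks above. */
    static void invalidate_dev_table_entry(uint16_t bdf) { printf("INVALIDATE_DEVTAB_ENTRY %#x\n", bdf); }
    static void invalidate_iommu_pages(domid_t domid)    { printf("INVALIDATE_IOMMU_PAGES dom%u\n", domid); }
    static void flush_command_buffer(void)               { printf("COMPLETION_WAIT\n"); }

    /*
     * Flush the DTE and, if the entry previously had translations enabled,
     * also flush the IOMMU TLB entries tagged with the previous DomID.
     */
    static void flush_device(uint16_t bdf, domid_t prev_domid)
    {
        invalidate_dev_table_entry(bdf);
        flush_command_buffer();

        if (prev_domid != DOMID_INVALID) {
            invalidate_iommu_pages(prev_domid);
            flush_command_buffer();
        }
    }

    int main(void)
    {
        struct dte dte = { .domain_id = 1, .v = 1 };
        /* Capture the old DomID *before* rewriting the DTE. */
        domid_t prev = dte.v ? dte.domain_id : DOMID_INVALID;

        dte.domain_id = 2;          /* re-assign the device to another domain */
        flush_device(0x0800, prev); /* illustrative BDF */
        return 0;
    }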
diff --git a/0042-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch b/0042-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
deleted file mode 100644
index ac3e86d..0000000
--- a/0042-tools-ocaml-xc-Fix-binding-for-xc_domain_assign_devi.patch
+++ /dev/null
@@ -1,70 +0,0 @@
-From afdcc108566e5a4ee352b6427c98ebad6885a81d Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
-Date: Thu, 12 Jan 2023 11:38:38 +0000
-Subject: [PATCH 42/89] tools/ocaml/xc: Fix binding for
- xc_domain_assign_device()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The patch adding this binding was plain broken, and unreviewed. It modified
-the C stub to add a 4th parameter without an equivalent adjustment in the
-Ocaml side of the bindings.
-
-In 64bit builds, this causes us to dereference whatever dead value is in %rcx
-when trying to interpret the rflags parameter.
-
-This has gone unnoticed because Xapi doesn't use this binding (it has its
-own), but unbreak the binding by passing RDM_RELAXED unconditionally for
-now (matching the libxl default behaviour).
-
-Fixes: 9b34056cb4 ("tools: extend xc_assign_device() to support rdm reservation policy")
-Signed-off-by: Edwin Török <edwin.torok@cloud.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 4250683842104f02996428f93927a035c8e19266)
----
- tools/ocaml/libs/xc/xenctrl_stubs.c | 17 +++++------------
- 1 file changed, 5 insertions(+), 12 deletions(-)
-
-diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index e25367531b..f376d94334 100644
---- a/tools/ocaml/libs/xc/xenctrl_stubs.c
-+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
-@@ -1139,17 +1139,12 @@ CAMLprim value stub_xc_domain_test_assign_device(value xch, value domid, value d
- CAMLreturn(Val_bool(ret == 0));
- }
-
--static int domain_assign_device_rdm_flag_table[] = {
-- XEN_DOMCTL_DEV_RDM_RELAXED,
--};
--
--CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
-- value rflag)
-+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
- {
-- CAMLparam4(xch, domid, desc, rflag);
-+ CAMLparam3(xch, domid, desc);
- int ret;
- int domain, bus, dev, func;
-- uint32_t sbdf, flag;
-+ uint32_t sbdf;
-
- domain = Int_val(Field(desc, 0));
- bus = Int_val(Field(desc, 1));
-@@ -1157,10 +1152,8 @@ CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
- func = Int_val(Field(desc, 3));
- sbdf = encode_sbdf(domain, bus, dev, func);
-
-- ret = Int_val(Field(rflag, 0));
-- flag = domain_assign_device_rdm_flag_table[ret];
--
-- ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
-+ ret = xc_assign_device(_H(xch), _D(domid), sbdf,
-+ XEN_DOMCTL_DEV_RDM_RELAXED);
-
- if (ret < 0)
- failwith_xc(_H(xch));
---
-2.40.0
-
diff --git a/0043-libfsimage-xfs-Remove-dead-code.patch b/0043-libfsimage-xfs-Remove-dead-code.patch
new file mode 100644
index 0000000..cbb9ad4
--- /dev/null
+++ b/0043-libfsimage-xfs-Remove-dead-code.patch
@@ -0,0 +1,71 @@
+From d665c6690eb3c2c86cb2c7dac09804211481f926 Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Thu, 14 Sep 2023 13:22:50 +0100
+Subject: [PATCH 43/55] libfsimage/xfs: Remove dead code
+
+xfs_info.agnolog (and related code) and XFS_INO_AGBNO_BITS are dead code
+that serve no purpose.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 37fc1e6c1c5c63aafd9cfd76a37728d5baea7d71)
+---
+ tools/libfsimage/xfs/fsys_xfs.c | 18 ------------------
+ 1 file changed, 18 deletions(-)
+
+diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
+index d735a88e55..2800699f59 100644
+--- a/tools/libfsimage/xfs/fsys_xfs.c
++++ b/tools/libfsimage/xfs/fsys_xfs.c
+@@ -37,7 +37,6 @@ struct xfs_info {
+ int blklog;
+ int inopblog;
+ int agblklog;
+- int agnolog;
+ unsigned int nextents;
+ xfs_daddr_t next;
+ xfs_daddr_t daddr;
+@@ -65,9 +64,7 @@ static struct xfs_info xfs;
+
+ #define XFS_INO_MASK(k) ((xfs_uint32_t)((1ULL << (k)) - 1))
+ #define XFS_INO_OFFSET_BITS xfs.inopblog
+-#define XFS_INO_AGBNO_BITS xfs.agblklog
+ #define XFS_INO_AGINO_BITS (xfs.agblklog + xfs.inopblog)
+-#define XFS_INO_AGNO_BITS xfs.agnolog
+
+ static inline xfs_agblock_t
+ agino2agbno (xfs_agino_t agino)
+@@ -149,20 +146,6 @@ xt_len (xfs_bmbt_rec_32_t *r)
+ return le32(r->l3) & mask32lo(21);
+ }
+
+-static inline int
+-xfs_highbit32(xfs_uint32_t v)
+-{
+- int i;
+-
+- if (--v) {
+- for (i = 0; i < 31; i++, v >>= 1) {
+- if (v == 0)
+- return i;
+- }
+- }
+- return 0;
+-}
+-
+ static int
+ isinxt (xfs_fileoff_t key, xfs_fileoff_t offset, xfs_filblks_t len)
+ {
+@@ -472,7 +455,6 @@ xfs_mount (fsi_file_t *ffi, const char *options)
+
+ xfs.inopblog = super.sb_inopblog;
+ xfs.agblklog = super.sb_agblklog;
+- xfs.agnolog = xfs_highbit32 (le32(super.sb_agcount));
+
+ xfs.btnode_ptr0_off =
+ ((xfs.bsize - sizeof(xfs_btree_block_t)) /
+--
+2.42.0
+
diff --git a/0043-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch b/0043-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
deleted file mode 100644
index b7fec46..0000000
--- a/0043-tools-ocaml-xc-Don-t-reference-Abstract_Tag-objects-.patch
+++ /dev/null
@@ -1,76 +0,0 @@
-From bf935b1ff7cc76b2d25f877e56a359afaafcac1f Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 31 Jan 2023 17:19:30 +0000
-Subject: [PATCH 43/89] tools/ocaml/xc: Don't reference Abstract_Tag objects
- with the GC lock released
-
-The intf->{addr,len} references in the xc_map_foreign_range() call are unsafe.
-From the manual:
-
- https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code
-
-"After caml_release_runtime_system() was called and until
-caml_acquire_runtime_system() is called, the C code must not access any OCaml
-data, nor call any function of the run-time system, nor call back into OCaml
-code."
-
-More than what the manual says, the intf pointer is (potentially) invalidated
-by caml_enter_blocking_section() if another thread happens to perform garbage
-collection at just the right (wrong) moment.
-
-Rewrite the logic. There's no need to stash data in the Ocaml object until
-the success path at the very end.
-
-Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit 9e7c74e6f9fd2e44df1212643b80af9032b45b07)
----
- tools/ocaml/libs/xc/xenctrl_stubs.c | 23 +++++++++++------------
- 1 file changed, 11 insertions(+), 12 deletions(-)
-
-diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c b/tools/ocaml/libs/xc/xenctrl_stubs.c
-index f376d94334..facb561577 100644
---- a/tools/ocaml/libs/xc/xenctrl_stubs.c
-+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
-@@ -953,26 +953,25 @@ CAMLprim value stub_map_foreign_range(value xch, value dom,
- CAMLparam4(xch, dom, size, mfn);
- CAMLlocal1(result);
- struct mmap_interface *intf;
-- uint32_t c_dom;
-- unsigned long c_mfn;
-+ unsigned long c_mfn = Nativeint_val(mfn);
-+ int len = Int_val(size);
-+ void *ptr;
-
- BUILD_BUG_ON((sizeof(struct mmap_interface) % sizeof(value)) != 0);
- result = caml_alloc(Wsize_bsize(sizeof(struct mmap_interface)),
- Abstract_tag);
-
-- intf = (struct mmap_interface *) result;
--
-- intf->len = Int_val(size);
--
-- c_dom = _D(dom);
-- c_mfn = Nativeint_val(mfn);
- caml_enter_blocking_section();
-- intf->addr = xc_map_foreign_range(_H(xch), c_dom,
-- intf->len, PROT_READ|PROT_WRITE,
-- c_mfn);
-+ ptr = xc_map_foreign_range(_H(xch), _D(dom), len,
-+ PROT_READ|PROT_WRITE, c_mfn);
- caml_leave_blocking_section();
-- if (!intf->addr)
-+
-+ if (!ptr)
- caml_failwith("xc_map_foreign_range error");
-+
-+ intf = Data_abstract_val(result);
-+ *intf = (struct mmap_interface){ ptr, len };
-+
- CAMLreturn(result);
- }
-
---
-2.40.0
-
diff --git a/0044-libfsimage-xfs-Amend-mask32lo-to-allow-the-value-32.patch b/0044-libfsimage-xfs-Amend-mask32lo-to-allow-the-value-32.patch
new file mode 100644
index 0000000..880ff83
--- /dev/null
+++ b/0044-libfsimage-xfs-Amend-mask32lo-to-allow-the-value-32.patch
@@ -0,0 +1,33 @@
+From f1cd620cc3572c858e276463e05f695d949362c5 Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Thu, 14 Sep 2023 13:22:51 +0100
+Subject: [PATCH 44/55] libfsimage/xfs: Amend mask32lo() to allow the value 32
+
+agblklog could plausibly be 32, but that would overflow this shift.
+Perform the shift as ULL and cast to u32 at the end instead.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit ddc45e4eea946bb373a4b4a60c84bf9339cf413b)
+---
+ tools/libfsimage/xfs/fsys_xfs.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
+index 2800699f59..4720bb4505 100644
+--- a/tools/libfsimage/xfs/fsys_xfs.c
++++ b/tools/libfsimage/xfs/fsys_xfs.c
+@@ -60,7 +60,7 @@ static struct xfs_info xfs;
+ #define inode ((xfs_dinode_t *)((char *)FSYS_BUF + 8192))
+ #define icore (inode->di_core)
+
+-#define mask32lo(n) (((xfs_uint32_t)1 << (n)) - 1)
++#define mask32lo(n) ((xfs_uint32_t)((1ull << (n)) - 1))
+
+ #define XFS_INO_MASK(k) ((xfs_uint32_t)((1ULL << (k)) - 1))
+ #define XFS_INO_OFFSET_BITS xfs.inopblog
+--
+2.42.0
+
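The overflow being fixed above is easy to demonstrate in isolation: shifting a 32-bit value by 32 is undefined behaviour in C, and on x86 the shift count is masked, so the old macro typically produced 0 instead of an all-ones mask. A minimal sketch, with the two macro variants taken in spirit from the hunk above (only the names are changed for the comparison):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Old form: undefined behaviour for n == 32 (x86 masks the count, giving 0). */
    #define mask32lo_old(n) (((uint32_t)1 << (n)) - 1)

    /* Fixed form: perform the shift in 64 bits, truncate to 32 bits at the end. */
    #define mask32lo_new(n) ((uint32_t)((1ull << (n)) - 1))

    int main(void)
    {
        printf("mask32lo_old(21) = 0x%08" PRIx32 "\n", mask32lo_old(21)); /* 0x001fffff */
        printf("mask32lo_new(21) = 0x%08" PRIx32 "\n", mask32lo_new(21)); /* 0x001fffff */
        printf("mask32lo_new(32) = 0x%08" PRIx32 "\n", mask32lo_new(32)); /* 0xffffffff */
        /* mask32lo_old(32) is deliberately not evaluated: it is undefined. */
        return 0;
    }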
diff --git a/0044-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch b/0044-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
deleted file mode 100644
index 8876ab7..0000000
--- a/0044-tools-ocaml-libs-Fix-memory-resource-leaks-with-caml.patch
+++ /dev/null
@@ -1,61 +0,0 @@
-From 587823eca162d063027faf1826ec3544f0a06e78 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 1 Feb 2023 11:27:42 +0000
-Subject: [PATCH 44/89] tools/ocaml/libs: Fix memory/resource leaks with
- caml_alloc_custom()
-
-All caml_alloc_*() functions can throw exceptions, and longjump out of
-context. If this happens, we leak the xch/xce handle.
-
-Reorder the logic to allocate the the Ocaml object first.
-
-Fixes: 8b3c06a3e545 ("tools/ocaml/xenctrl: OCaml 5 support, fix use-after-free")
-Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Christian Lindig <christian.lindig@citrix.com>
-(cherry picked from commit d69ccf52ad467ccc22029172a8e61dc621187889)
----
- tools/ocaml/libs/eventchn/xeneventchn_stubs.c | 6 ++++--
- 1 file changed, 4 insertions(+), 2 deletions(-)
-
-diff --git a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-index d7881ca95f..de2fc29292 100644
---- a/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-+++ b/tools/ocaml/libs/eventchn/xeneventchn_stubs.c
-@@ -63,6 +63,8 @@ CAMLprim value stub_eventchn_init(value cloexec)
- if ( !Bool_val(cloexec) )
- flags |= XENEVTCHN_NO_CLOEXEC;
-
-+ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
-+
- caml_enter_blocking_section();
- xce = xenevtchn_open(NULL, flags);
- caml_leave_blocking_section();
-@@ -70,7 +72,6 @@ CAMLprim value stub_eventchn_init(value cloexec)
- if (xce == NULL)
- caml_failwith("open failed");
-
-- result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
- *(xenevtchn_handle **)Data_custom_val(result) = xce;
-
- CAMLreturn(result);
-@@ -82,6 +83,8 @@ CAMLprim value stub_eventchn_fdopen(value fdval)
- CAMLlocal1(result);
- xenevtchn_handle *xce;
-
-+ result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
-+
- caml_enter_blocking_section();
- xce = xenevtchn_fdopen(NULL, Int_val(fdval), 0);
- caml_leave_blocking_section();
-@@ -89,7 +92,6 @@ CAMLprim value stub_eventchn_fdopen(value fdval)
- if (xce == NULL)
- caml_failwith("evtchn fdopen failed");
-
-- result = caml_alloc_custom(&xenevtchn_ops, sizeof(xce), 0, 1);
- *(xenevtchn_handle **)Data_custom_val(result) = xce;
-
- CAMLreturn(result);
---
-2.40.0
-
diff --git a/0045-libfsimage-xfs-Sanity-check-the-superblock-during-mo.patch b/0045-libfsimage-xfs-Sanity-check-the-superblock-during-mo.patch
new file mode 100644
index 0000000..01ae52a
--- /dev/null
+++ b/0045-libfsimage-xfs-Sanity-check-the-superblock-during-mo.patch
@@ -0,0 +1,137 @@
+From 78143c5336c8316bcc648e964d65a07f216cf77f Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Thu, 14 Sep 2023 13:22:52 +0100
+Subject: [PATCH 45/55] libfsimage/xfs: Sanity-check the superblock during
+ mounts
+
+Sanity-check the XFS superblock for wellformedness at the mount handler.
+This forces pygrub to abort parsing a potentially malformed filesystem and
+ensures the invariants assumed throughout the rest of the code hold.
+
+Also, derive parameters from previously sanitized parameters where possible
+(rather than reading them off the superblock)
+
+The code doesn't try to avoid overflowing the end of the disk, because
+that's an unlikely and benign error. Parameters used in calculations of
+xfs_daddr_t (like the root inode index) aren't in critical need of being
+sanitized.
+
+The sanitization of agblklog is basically checking that no obvious
+overflows happen on agblklog, and then ensuring agblocks is contained in
+the range (2^(sb_agblklog-1), 2^sb_agblklog].
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 620500dd1baf33347dfde5e7fde7cf7fe347da5c)
+---
+ tools/libfsimage/xfs/fsys_xfs.c | 48 ++++++++++++++++++++++++++-------
+ tools/libfsimage/xfs/xfs.h | 12 +++++++++
+ 2 files changed, 50 insertions(+), 10 deletions(-)
+
+diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
+index 4720bb4505..e4eb7e1ee2 100644
+--- a/tools/libfsimage/xfs/fsys_xfs.c
++++ b/tools/libfsimage/xfs/fsys_xfs.c
+@@ -17,6 +17,7 @@
+ * along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
++#include <stdbool.h>
+ #include <xenfsimage_grub.h>
+ #include "xfs.h"
+
+@@ -433,29 +434,56 @@ first_dentry (fsi_file_t *ffi, xfs_ino_t *ino)
+ return next_dentry (ffi, ino);
+ }
+
++static bool
++xfs_sb_is_invalid (const xfs_sb_t *super)
++{
++ return (le32(super->sb_magicnum) != XFS_SB_MAGIC)
++ || ((le16(super->sb_versionnum) & XFS_SB_VERSION_NUMBITS) !=
++ XFS_SB_VERSION_4)
++ || (super->sb_inodelog < XFS_SB_INODELOG_MIN)
++ || (super->sb_inodelog > XFS_SB_INODELOG_MAX)
++ || (super->sb_blocklog < XFS_SB_BLOCKLOG_MIN)
++ || (super->sb_blocklog > XFS_SB_BLOCKLOG_MAX)
++ || (super->sb_blocklog < super->sb_inodelog)
++ || (super->sb_agblklog > XFS_SB_AGBLKLOG_MAX)
++ || ((1ull << super->sb_agblklog) < le32(super->sb_agblocks))
++ || (((1ull << super->sb_agblklog) >> 1) >=
++ le32(super->sb_agblocks))
++ || ((super->sb_blocklog + super->sb_dirblklog) >=
++ XFS_SB_DIRBLK_NUMBITS);
++}
++
+ static int
+ xfs_mount (fsi_file_t *ffi, const char *options)
+ {
+ xfs_sb_t super;
+
+ if (!devread (ffi, 0, 0, sizeof(super), (char *)&super)
+- || (le32(super.sb_magicnum) != XFS_SB_MAGIC)
+- || ((le16(super.sb_versionnum)
+- & XFS_SB_VERSION_NUMBITS) != XFS_SB_VERSION_4) ) {
++ || xfs_sb_is_invalid(&super)) {
+ return 0;
+ }
+
+- xfs.bsize = le32 (super.sb_blocksize);
+- xfs.blklog = super.sb_blocklog;
+- xfs.bdlog = xfs.blklog - SECTOR_BITS;
++ /*
++ * Not sanitized. It's exclusively used to generate disk addresses,
++ * so it's not important from a security standpoint.
++ */
+ xfs.rootino = le64 (super.sb_rootino);
+- xfs.isize = le16 (super.sb_inodesize);
+- xfs.agblocks = le32 (super.sb_agblocks);
+- xfs.dirbsize = xfs.bsize << super.sb_dirblklog;
+
+- xfs.inopblog = super.sb_inopblog;
++ /*
++ * Sanitized to be consistent with each other, only used to
++ * generate disk addresses, so it's safe
++ */
++ xfs.agblocks = le32 (super.sb_agblocks);
+ xfs.agblklog = super.sb_agblklog;
+
++ /* Derived from sanitized parameters */
++ xfs.bsize = 1 << super.sb_blocklog;
++ xfs.blklog = super.sb_blocklog;
++ xfs.bdlog = super.sb_blocklog - SECTOR_BITS;
++ xfs.isize = 1 << super.sb_inodelog;
++ xfs.dirbsize = 1 << (super.sb_blocklog + super.sb_dirblklog);
++ xfs.inopblog = super.sb_blocklog - super.sb_inodelog;
++
+ xfs.btnode_ptr0_off =
+ ((xfs.bsize - sizeof(xfs_btree_block_t)) /
+ (sizeof (xfs_bmbt_key_t) + sizeof (xfs_bmbt_ptr_t)))
+diff --git a/tools/libfsimage/xfs/xfs.h b/tools/libfsimage/xfs/xfs.h
+index 40699281e4..b87e37d3d7 100644
+--- a/tools/libfsimage/xfs/xfs.h
++++ b/tools/libfsimage/xfs/xfs.h
+@@ -134,6 +134,18 @@ typedef struct xfs_sb
+ xfs_uint8_t sb_dummy[7]; /* padding */
+ } xfs_sb_t;
+
++/* Bound taken from xfs.c in GRUB2. It doesn't exist in the spec */
++#define XFS_SB_DIRBLK_NUMBITS 27
++/* Implied by the XFS specification. The minimum block size is 512 octets */
++#define XFS_SB_BLOCKLOG_MIN 9
++/* Implied by the XFS specification. The maximum block size is 65536 octets */
++#define XFS_SB_BLOCKLOG_MAX 16
++/* Implied by the XFS specification. The minimum inode size is 256 octets */
++#define XFS_SB_INODELOG_MIN 8
++/* Implied by the XFS specification. The maximum inode size is 2048 octets */
++#define XFS_SB_INODELOG_MAX 11
++/* High bound for sb_agblklog */
++#define XFS_SB_AGBLKLOG_MAX 32
+
+ /* those are from xfs_btree.h */
+
+--
+2.42.0
+
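The agblklog check described above effectively requires sb_agblklog to be the exact ceiling log2 of sb_agblocks, i.e. agblocks must lie in (2^(agblklog-1), 2^agblklog]. A small self-contained sketch of just that condition (not the libfsimage code; the helper below only models the range check from the hunk above):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Models only the agblklog/agblocks consistency test from xfs_sb_is_invalid(). */
    static bool agblklog_is_consistent(uint8_t agblklog, uint32_t agblocks)
    {
        if (agblklog > 32)                       /* XFS_SB_AGBLKLOG_MAX */
            return false;

        return ((1ull << agblklog) >= agblocks) &&
               (((1ull << agblklog) >> 1) < agblocks);
    }

    int main(void)
    {
        /* 1000 AG blocks need agblklog == 10: 2^9 = 512 < 1000 <= 1024 = 2^10. */
        printf("%d %d %d\n",
               agblklog_is_consistent(10, 1000),  /* 1: consistent     */
               agblklog_is_consistent( 9, 1000),  /* 0: 2^9 < 1000     */
               agblklog_is_consistent(16, 1000)); /* 0: log too large  */
        return 0;
    }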
diff --git a/0045-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch b/0045-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
deleted file mode 100644
index 1720bdd..0000000
--- a/0045-x86-spec-ctrl-Mitigate-Cross-Thread-Return-Address-P.patch
+++ /dev/null
@@ -1,120 +0,0 @@
-From 3685e754e6017c616769b28133286d06bf07b613 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 8 Sep 2022 21:27:58 +0100
-Subject: [PATCH 45/89] x86/spec-ctrl: Mitigate Cross-Thread Return Address
- Predictions
-
-This is XSA-426 / CVE-2022-27672
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 63305e5392ec2d17b85e7996a97462744425db80)
----
- docs/misc/xen-command-line.pandoc | 2 +-
- xen/arch/x86/include/asm/cpufeatures.h | 3 ++-
- xen/arch/x86/include/asm/spec_ctrl.h | 15 +++++++++++++
- xen/arch/x86/spec_ctrl.c | 31 +++++++++++++++++++++++---
- 4 files changed, 46 insertions(+), 5 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 424b12cfb2..e7fe8b0cc9 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2343,7 +2343,7 @@ guests to use.
- on entry and exit. These blocks are necessary to virtualise support for
- guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
- * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
-- Return Address Stack on entry to Xen.
-+ Return Address Stack on entry to Xen and on idle.
- * `md-clear=` offers control over whether to use VERW to flush
- microarchitectural buffers on idle and exit from Xen. *Note: For
- compatibility with development versions of this fix, `mds=` is also accepted
-diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
-index 865f110986..da0593de85 100644
---- a/xen/arch/x86/include/asm/cpufeatures.h
-+++ b/xen/arch/x86/include/asm/cpufeatures.h
-@@ -35,7 +35,8 @@ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM
- XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
- XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
- XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
--/* Bits 23,24 unused. */
-+/* Bits 23 unused. */
-+XEN_CPUFEATURE(SC_RSB_IDLE, X86_SYNTH(24)) /* RSB overwrite needed for idle. */
- XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
- XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
- XEN_CPUFEATURE(XEN_IBT, X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
-diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
-index 6a77c39378..391973ef6a 100644
---- a/xen/arch/x86/include/asm/spec_ctrl.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl.h
-@@ -159,6 +159,21 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
- */
- alternative_input("", "verw %[sel]", X86_FEATURE_SC_VERW_IDLE,
- [sel] "m" (info->verw_sel));
-+
-+ /*
-+ * Cross-Thread Return Address Predictions:
-+ *
-+ * On vulnerable systems, the return predictions (RSB/RAS) are statically
-+ * partitioned between active threads. When entering idle, our entries
-+ * are re-partitioned to allow the other threads to use them.
-+ *
-+ * In some cases, we might still have guest entries in the RAS, so flush
-+ * them before injecting them sideways to our sibling thread.
-+ *
-+ * (ab)use alternative_input() to specify clobbers.
-+ */
-+ alternative_input("", "DO_OVERWRITE_RSB", X86_FEATURE_SC_RSB_IDLE,
-+ : "rax", "rcx");
- }
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe before this call. */
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index a320b81947..e80e2a5ed1 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -1327,13 +1327,38 @@ void __init init_speculation_mitigations(void)
- * 3) Some CPUs have RSBs which are not full width, which allow the
- * attacker's entries to alias Xen addresses.
- *
-+ * 4) Some CPUs have RSBs which are re-partitioned based on thread
-+ * idleness, which allows an attacker to inject entries into the other
-+ * thread. We still active the optimisation in this case, and mitigate
-+ * in the idle path which has lower overhead.
-+ *
- * It is safe to turn off RSB stuffing when Xen is using SMEP itself, and
- * 32bit PV guests are disabled, and when the RSB is full width.
- */
- BUILD_BUG_ON(RO_MPT_VIRT_START != PML4_ADDR(256));
-- if ( opt_rsb_pv == -1 && boot_cpu_has(X86_FEATURE_XEN_SMEP) &&
-- !opt_pv32 && rsb_is_full_width() )
-- opt_rsb_pv = 0;
-+ if ( opt_rsb_pv == -1 )
-+ {
-+ opt_rsb_pv = (opt_pv32 || !boot_cpu_has(X86_FEATURE_XEN_SMEP) ||
-+ !rsb_is_full_width());
-+
-+ /*
-+ * Cross-Thread Return Address Predictions.
-+ *
-+ * Vulnerable systems are Zen1/Zen2 uarch, which is AMD Fam17 / Hygon
-+ * Fam18, when SMT is active.
-+ *
-+ * To mitigate, we must flush the RSB/RAS/RAP once between entering
-+ * Xen and going idle.
-+ *
-+ * Most cases flush on entry to Xen anyway. The one case where we
-+ * don't is when using the SMEP optimisation for PV guests. Flushing
-+ * before going idle is less overhead than flushing on PV entry.
-+ */
-+ if ( !opt_rsb_pv && hw_smt_enabled &&
-+ (boot_cpu_data.x86_vendor & (X86_VENDOR_AMD|X86_VENDOR_HYGON)) &&
-+ (boot_cpu_data.x86 == 0x17 || boot_cpu_data.x86 == 0x18) )
-+ setup_force_cpu_cap(X86_FEATURE_SC_RSB_IDLE);
-+ }
-
- if ( opt_rsb_pv )
- {
---
-2.40.0
-
diff --git a/0046-automation-Remove-clang-8-from-Debian-unstable-conta.patch b/0046-automation-Remove-clang-8-from-Debian-unstable-conta.patch
deleted file mode 100644
index 6fc3323..0000000
--- a/0046-automation-Remove-clang-8-from-Debian-unstable-conta.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From aaf74a532c02017998492c0bf60a9c6be3332f20 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 21 Feb 2023 16:55:38 +0000
-Subject: [PATCH 46/89] automation: Remove clang-8 from Debian unstable
- container
-
-First, apt complain that it isn't the right way to add keys anymore,
-but hopefully that's just a warning.
-
-Second, we can't install clang-8:
-The following packages have unmet dependencies:
- clang-8 : Depends: libstdc++-8-dev but it is not installable
- Depends: libgcc-8-dev but it is not installable
- Depends: libobjc-8-dev but it is not installable
- Recommends: llvm-8-dev but it is not going to be installed
- Recommends: libomp-8-dev but it is not going to be installed
- libllvm8 : Depends: libffi7 (>= 3.3~20180313) but it is not installable
-E: Unable to correct problems, you have held broken packages.
-
-clang on Debian unstable is now version 14.0.6.
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit a6b1e2b80fe2053b1c9c9843fb086a668513ea36)
----
- automation/build/debian/unstable-llvm-8.list | 3 ---
- automation/build/debian/unstable.dockerfile | 12 ------------
- automation/gitlab-ci/build.yaml | 10 ----------
- 3 files changed, 25 deletions(-)
- delete mode 100644 automation/build/debian/unstable-llvm-8.list
-
-diff --git a/automation/build/debian/unstable-llvm-8.list b/automation/build/debian/unstable-llvm-8.list
-deleted file mode 100644
-index dc119fa0b4..0000000000
---- a/automation/build/debian/unstable-llvm-8.list
-+++ /dev/null
-@@ -1,3 +0,0 @@
--# Unstable LLVM 8 repos
--deb http://apt.llvm.org/unstable/ llvm-toolchain-8 main
--deb-src http://apt.llvm.org/unstable/ llvm-toolchain-8 main
-diff --git a/automation/build/debian/unstable.dockerfile b/automation/build/debian/unstable.dockerfile
-index 9de766d596..b560337b7a 100644
---- a/automation/build/debian/unstable.dockerfile
-+++ b/automation/build/debian/unstable.dockerfile
-@@ -51,15 +51,3 @@ RUN apt-get update && \
- apt-get autoremove -y && \
- apt-get clean && \
- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
--
--RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key|apt-key add -
--COPY unstable-llvm-8.list /etc/apt/sources.list.d/
--
--RUN apt-get update && \
-- apt-get --quiet --yes install \
-- clang-8 \
-- lld-8 \
-- && \
-- apt-get autoremove -y && \
-- apt-get clean && \
-- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
-diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index 716ee0b1e4..bed161b471 100644
---- a/automation/gitlab-ci/build.yaml
-+++ b/automation/gitlab-ci/build.yaml
-@@ -312,16 +312,6 @@ debian-unstable-clang-debug:
- variables:
- CONTAINER: debian:unstable
-
--debian-unstable-clang-8:
-- extends: .clang-8-x86-64-build
-- variables:
-- CONTAINER: debian:unstable
--
--debian-unstable-clang-8-debug:
-- extends: .clang-8-x86-64-build-debug
-- variables:
-- CONTAINER: debian:unstable
--
- debian-unstable-gcc:
- extends: .gcc-x86-64-build
- variables:
---
-2.40.0
-
diff --git a/0046-libfsimage-xfs-Add-compile-time-check-to-libfsimage.patch b/0046-libfsimage-xfs-Add-compile-time-check-to-libfsimage.patch
new file mode 100644
index 0000000..0c32745
--- /dev/null
+++ b/0046-libfsimage-xfs-Add-compile-time-check-to-libfsimage.patch
@@ -0,0 +1,62 @@
+From eb4efdac4cc7121f832ee156f39761312878f3a5 Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Thu, 14 Sep 2023 13:22:53 +0100
+Subject: [PATCH 46/55] libfsimage/xfs: Add compile-time check to libfsimage
+
+Adds the common tools include folder to the -I compile flags
+of libfsimage. This allows us to use:
+ xen-tools/common-macros.h:BUILD_BUG_ON()
+
+With it, statically assert a sanitized "blocklog - SECTOR_BITS" cannot
+underflow.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 7d85c70431593550e32022e3a19a37f306f49e00)
+---
+ tools/libfsimage/common.mk | 2 +-
+ tools/libfsimage/xfs/fsys_xfs.c | 4 +++-
+ 2 files changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/tools/libfsimage/common.mk b/tools/libfsimage/common.mk
+index 4fc8c66795..e4336837d0 100644
+--- a/tools/libfsimage/common.mk
++++ b/tools/libfsimage/common.mk
+@@ -1,7 +1,7 @@
+ include $(XEN_ROOT)/tools/Rules.mk
+
+ FSDIR := $(libdir)/xenfsimage
+-CFLAGS += -Wno-unknown-pragmas -I$(XEN_ROOT)/tools/libfsimage/common/ -DFSIMAGE_FSDIR=\"$(FSDIR)\"
++CFLAGS += -Wno-unknown-pragmas -I$(XEN_ROOT)/tools/libfsimage/common/ $(CFLAGS_xeninclude) -DFSIMAGE_FSDIR=\"$(FSDIR)\"
+ CFLAGS += -D_GNU_SOURCE
+ LDFLAGS += -L../common/
+
+diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
+index e4eb7e1ee2..4a8dd6f239 100644
+--- a/tools/libfsimage/xfs/fsys_xfs.c
++++ b/tools/libfsimage/xfs/fsys_xfs.c
+@@ -19,6 +19,7 @@
+
+ #include <stdbool.h>
+ #include <xenfsimage_grub.h>
++#include <xen-tools/libs.h>
+ #include "xfs.h"
+
+ #define MAX_LINK_COUNT 8
+@@ -477,9 +478,10 @@ xfs_mount (fsi_file_t *ffi, const char *options)
+ xfs.agblklog = super.sb_agblklog;
+
+ /* Derived from sanitized parameters */
++ BUILD_BUG_ON(XFS_SB_BLOCKLOG_MIN < SECTOR_BITS);
++ xfs.bdlog = super.sb_blocklog - SECTOR_BITS;
+ xfs.bsize = 1 << super.sb_blocklog;
+ xfs.blklog = super.sb_blocklog;
+- xfs.bdlog = super.sb_blocklog - SECTOR_BITS;
+ xfs.isize = 1 << super.sb_inodelog;
+ xfs.dirbsize = 1 << (super.sb_blocklog + super.sb_dirblklog);
+ xfs.inopblog = super.sb_blocklog - super.sb_inodelog;
+--
+2.42.0
+
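For readers unfamiliar with BUILD_BUG_ON(), the effect relied on above can be obtained with a C11 static assertion. The sketch below is not Xen's actual macro (the real definition lives in the tools headers and may use a different trick); it only shows the compile-time failure the patch depends on:

    /* Not Xen's definition; one way to get the same compile-time guarantee. */
    #define MY_BUILD_BUG_ON(cond) _Static_assert(!(cond), "build-time check failed")

    #define SECTOR_BITS          9
    #define XFS_SB_BLOCKLOG_MIN  9

    int main(void)
    {
        /* Compiles: the sanitized minimum block log cannot underflow SECTOR_BITS. */
        MY_BUILD_BUG_ON(XFS_SB_BLOCKLOG_MIN < SECTOR_BITS);

        /* MY_BUILD_BUG_ON(XFS_SB_BLOCKLOG_MIN < 10); */ /* would fail to compile */
        return 0;
    }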
diff --git a/0047-libs-util-Fix-parallel-build-between-flex-bison-and-.patch b/0047-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
deleted file mode 100644
index f3e6d36..0000000
--- a/0047-libs-util-Fix-parallel-build-between-flex-bison-and-.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From c622b8ace93cc38c73f47f5044dc3663ef93f815 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Fri, 3 Mar 2023 07:55:24 +0100
-Subject: [PATCH 47/89] libs/util: Fix parallel build between flex/bison and CC
- rules
-
-flex/bison generate two targets, and when those targets are
-prerequisite of other rules they are considered independently by make.
-
-We can have a situation where the .c file is out-of-date but not the
-.h, git checkout for example. In this case, if a rule only have the .h
-file as prerequiste, make will procced and start to build the object.
-In parallel, another target can have the .c file as prerequisite and
-make will find out it need re-generating and do so, changing the .h at
-the same time. This parallel task breaks the first one.
-
-To avoid this scenario, we put both the header and the source as
-prerequisite for all object even if they only need the header.
-
-Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: bf652a50fb3bb3b1b3d93db6fb79bc28f978fe75
-master date: 2023-02-09 18:26:17 +0000
----
- tools/libs/util/Makefile | 8 ++++++++
- 1 file changed, 8 insertions(+)
-
-diff --git a/tools/libs/util/Makefile b/tools/libs/util/Makefile
-index 493d2e00be..fee4ea0dc7 100644
---- a/tools/libs/util/Makefile
-+++ b/tools/libs/util/Makefile
-@@ -40,6 +40,14 @@ include $(XEN_ROOT)/tools/libs/libs.mk
-
- $(OBJS-y) $(PIC_OBJS): $(AUTOINCS)
-
-+# Adding the .c conterparts of the headers generated by flex/bison as
-+# prerequisite of all objects.
-+# This is to tell make that if only the .c file is out-of-date but not the
-+# header, it should still wait for the .c file to be rebuilt.
-+# Otherwise, make doesn't considered "%.c %.h" as grouped targets, and will run
-+# the flex/bison rules in parallel of CC rules which only need the header.
-+$(OBJS-y) $(PIC_OBJS): libxlu_cfg_l.c libxlu_cfg_y.c libxlu_disk_l.c
-+
- %.c %.h:: %.y
- @rm -f $*.[ch]
- $(BISON) --output=$*.c $<
---
-2.40.0
-
diff --git a/0047-tools-pygrub-Remove-unnecessary-hypercall.patch b/0047-tools-pygrub-Remove-unnecessary-hypercall.patch
new file mode 100644
index 0000000..6bdd9bb
--- /dev/null
+++ b/0047-tools-pygrub-Remove-unnecessary-hypercall.patch
@@ -0,0 +1,60 @@
+From 8a584126eae53a44cefb0acdbca201233a557fa5 Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Mon, 25 Sep 2023 18:32:21 +0100
+Subject: [PATCH 47/55] tools/pygrub: Remove unnecessary hypercall
+
+There's a hypercall being issued in order to determine whether PV64 is
+supported, but since Xen 4.3 that's strictly true so it's not required.
+
+Plus, this way we can avoid mapping the privcmd interface altogether in the
+depriv pygrub.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit f4b504c6170c446e61055cbd388ae4e832a9deca)
+---
+ tools/pygrub/src/pygrub | 12 +-----------
+ 1 file changed, 1 insertion(+), 11 deletions(-)
+
+diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
+index ce7ab0eb8c..ce4e07d3e8 100755
+--- a/tools/pygrub/src/pygrub
++++ b/tools/pygrub/src/pygrub
+@@ -18,7 +18,6 @@ import os, sys, string, struct, tempfile, re, traceback, stat, errno
+ import copy
+ import logging
+ import platform
+-import xen.lowlevel.xc
+
+ import curses, _curses, curses.textpad, curses.ascii
+ import getopt
+@@ -668,14 +667,6 @@ def run_grub(file, entry, fs, cfg_args):
+
+ return grubcfg
+
+-def supports64bitPVguest():
+- xc = xen.lowlevel.xc.xc()
+- caps = xc.xeninfo()['xen_caps'].split(" ")
+- for cap in caps:
+- if cap == "xen-3.0-x86_64":
+- return True
+- return False
+-
+ # If nothing has been specified, look for a Solaris domU. If found, perform the
+ # necessary tweaks.
+ def sniff_solaris(fs, cfg):
+@@ -684,8 +675,7 @@ def sniff_solaris(fs, cfg):
+ return cfg
+
+ if not cfg["kernel"]:
+- if supports64bitPVguest() and \
+- fs.file_exists("/platform/i86xpv/kernel/amd64/unix"):
++ if fs.file_exists("/platform/i86xpv/kernel/amd64/unix"):
+ cfg["kernel"] = "/platform/i86xpv/kernel/amd64/unix"
+ cfg["ramdisk"] = "/platform/i86pc/amd64/boot_archive"
+ elif fs.file_exists("/platform/i86xpv/kernel/unix"):
+--
+2.42.0
+
diff --git a/0048-tools-pygrub-Small-refactors.patch b/0048-tools-pygrub-Small-refactors.patch
new file mode 100644
index 0000000..55b238c
--- /dev/null
+++ b/0048-tools-pygrub-Small-refactors.patch
@@ -0,0 +1,65 @@
+From e7059f16f7c2b99fea30b9671fec74c0375eee8f Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Mon, 25 Sep 2023 18:32:22 +0100
+Subject: [PATCH 48/55] tools/pygrub: Small refactors
+
+Small tidy up to ensure output_directory always has a trailing '/' to ease
+concatenating paths and that `output` can only be a filename or None.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+(cherry picked from commit 9f2ff9a7c9b3ac734ae99f17f0134ed0343dcccf)
+---
+ tools/pygrub/src/pygrub | 10 +++++-----
+ 1 file changed, 5 insertions(+), 5 deletions(-)
+
+diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
+index ce4e07d3e8..1042c05b86 100755
+--- a/tools/pygrub/src/pygrub
++++ b/tools/pygrub/src/pygrub
+@@ -793,7 +793,7 @@ if __name__ == "__main__":
+ debug = False
+ not_really = False
+ output_format = "sxp"
+- output_directory = "/var/run/xen/pygrub"
++ output_directory = "/var/run/xen/pygrub/"
+
+ # what was passed in
+ incfg = { "kernel": None, "ramdisk": None, "args": "" }
+@@ -815,7 +815,8 @@ if __name__ == "__main__":
+ usage()
+ sys.exit()
+ elif o in ("--output",):
+- output = a
++ if a != "-":
++ output = a
+ elif o in ("--kernel",):
+ incfg["kernel"] = a
+ elif o in ("--ramdisk",):
+@@ -847,12 +848,11 @@ if __name__ == "__main__":
+ if not os.path.isdir(a):
+ print("%s is not an existing directory" % a)
+ sys.exit(1)
+- output_directory = a
++ output_directory = a + '/'
+
+ if debug:
+ logging.basicConfig(level=logging.DEBUG)
+
+-
+ try:
+ os.makedirs(output_directory, 0o700)
+ except OSError as e:
+@@ -861,7 +861,7 @@ if __name__ == "__main__":
+ else:
+ raise
+
+- if output is None or output == "-":
++ if output is None:
+ fd = sys.stdout.fileno()
+ else:
+ fd = os.open(output, os.O_WRONLY)
+--
+2.42.0
+
diff --git a/0048-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch b/0048-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
deleted file mode 100644
index 46c48de..0000000
--- a/0048-x86-cpuid-Infrastructure-for-leaves-7-1-ecx-edx.patch
+++ /dev/null
@@ -1,126 +0,0 @@
-From cdc23d47ad85e756540eaa8655ebc2a0445612ed Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 07:55:54 +0100
-Subject: [PATCH 48/89] x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
-
-We don't actually need ecx yet, but adding it in now will reduce the amount to
-which leaf 7 is out of order in a featureset.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: b4a23bf6293aadecfd03bf9e83974443e2eac9cb
-master date: 2023-02-09 18:26:17 +0000
----
- tools/misc/xen-cpuid.c | 10 ++++++++++
- xen/arch/x86/cpu/common.c | 3 ++-
- xen/include/public/arch-x86/cpufeatureset.h | 4 ++++
- xen/include/xen/lib/x86/cpuid.h | 15 ++++++++++++++-
- 4 files changed, 30 insertions(+), 2 deletions(-)
-
-diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index d5833e9ce8..addb3a39a1 100644
---- a/tools/misc/xen-cpuid.c
-+++ b/tools/misc/xen-cpuid.c
-@@ -202,6 +202,14 @@ static const char *const str_7b1[32] =
- [ 0] = "ppin",
- };
-
-+static const char *const str_7c1[32] =
-+{
-+};
-+
-+static const char *const str_7d1[32] =
-+{
-+};
-+
- static const char *const str_7d2[32] =
- {
- [ 0] = "intel-psfd",
-@@ -229,6 +237,8 @@ static const struct {
- { "0x80000021.eax", "e21a", str_e21a },
- { "0x00000007:1.ebx", "7b1", str_7b1 },
- { "0x00000007:2.edx", "7d2", str_7d2 },
-+ { "0x00000007:1.ecx", "7c1", str_7c1 },
-+ { "0x00000007:1.edx", "7d1", str_7d1 },
- };
-
- #define COL_ALIGN "18"
-diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
-index 0412dbc915..b3fcf4680f 100644
---- a/xen/arch/x86/cpu/common.c
-+++ b/xen/arch/x86/cpu/common.c
-@@ -450,7 +450,8 @@ static void generic_identify(struct cpuinfo_x86 *c)
- cpuid_count(7, 1,
- &c->x86_capability[FEATURESET_7a1],
- &c->x86_capability[FEATURESET_7b1],
-- &tmp, &tmp);
-+ &c->x86_capability[FEATURESET_7c1],
-+ &c->x86_capability[FEATURESET_7d1]);
- if (max_subleaf >= 2)
- cpuid_count(7, 2,
- &tmp, &tmp, &tmp,
-diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index 7915f5826f..f43cdcd0f9 100644
---- a/xen/include/public/arch-x86/cpufeatureset.h
-+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -295,6 +295,10 @@ XEN_CPUFEATURE(RRSBA_CTRL, 13*32+ 2) /* MSR_SPEC_CTRL.RRSBA_DIS_* */
- XEN_CPUFEATURE(BHI_CTRL, 13*32+ 4) /* MSR_SPEC_CTRL.BHI_DIS_S */
- XEN_CPUFEATURE(MCDT_NO, 13*32+ 5) /*A MCDT_NO */
-
-+/* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */
-+
-+/* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */
-+
- #endif /* XEN_CPUFEATURE */
-
- /* Clean up from a default include. Close the enum (for C). */
-diff --git a/xen/include/xen/lib/x86/cpuid.h b/xen/include/xen/lib/x86/cpuid.h
-index 73a5c33036..fa98b371ee 100644
---- a/xen/include/xen/lib/x86/cpuid.h
-+++ b/xen/include/xen/lib/x86/cpuid.h
-@@ -18,6 +18,8 @@
- #define FEATURESET_e21a 11 /* 0x80000021.eax */
- #define FEATURESET_7b1 12 /* 0x00000007:1.ebx */
- #define FEATURESET_7d2 13 /* 0x00000007:2.edx */
-+#define FEATURESET_7c1 14 /* 0x00000007:1.ecx */
-+#define FEATURESET_7d1 15 /* 0x00000007:1.edx */
-
- struct cpuid_leaf
- {
-@@ -194,7 +196,14 @@ struct cpuid_policy
- uint32_t _7b1;
- struct { DECL_BITFIELD(7b1); };
- };
-- uint32_t /* c */:32, /* d */:32;
-+ union {
-+ uint32_t _7c1;
-+ struct { DECL_BITFIELD(7c1); };
-+ };
-+ union {
-+ uint32_t _7d1;
-+ struct { DECL_BITFIELD(7d1); };
-+ };
-
- /* Subleaf 2. */
- uint32_t /* a */:32, /* b */:32, /* c */:32;
-@@ -343,6 +352,8 @@ static inline void cpuid_policy_to_featureset(
- fs[FEATURESET_e21a] = p->extd.e21a;
- fs[FEATURESET_7b1] = p->feat._7b1;
- fs[FEATURESET_7d2] = p->feat._7d2;
-+ fs[FEATURESET_7c1] = p->feat._7c1;
-+ fs[FEATURESET_7d1] = p->feat._7d1;
- }
-
- /* Fill in a CPUID policy from a featureset bitmap. */
-@@ -363,6 +374,8 @@ static inline void cpuid_featureset_to_policy(
- p->extd.e21a = fs[FEATURESET_e21a];
- p->feat._7b1 = fs[FEATURESET_7b1];
- p->feat._7d2 = fs[FEATURESET_7d2];
-+ p->feat._7c1 = fs[FEATURESET_7c1];
-+ p->feat._7d1 = fs[FEATURESET_7d1];
- }
-
- static inline uint64_t cpuid_policy_xcr0_max(const struct cpuid_policy *p)
---
-2.40.0
-
diff --git a/0049-tools-pygrub-Open-the-output-files-earlier.patch b/0049-tools-pygrub-Open-the-output-files-earlier.patch
new file mode 100644
index 0000000..c3b00b1
--- /dev/null
+++ b/0049-tools-pygrub-Open-the-output-files-earlier.patch
@@ -0,0 +1,105 @@
+From 37977420670c65db220349510599d3fe47600ad8 Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Mon, 25 Sep 2023 18:32:23 +0100
+Subject: [PATCH 49/55] tools/pygrub: Open the output files earlier
+
+This patch allows pygrub to get ahold of every RW file descriptor it needs
+early on. A later patch will clamp the filesystem it can access so it can't
+obtain any others.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+(cherry picked from commit 0710d7d44586251bfca9758890616dc3d6de8a74)
+---
+ tools/pygrub/src/pygrub | 37 ++++++++++++++++++++++---------------
+ 1 file changed, 22 insertions(+), 15 deletions(-)
+
+diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
+index 1042c05b86..91e2ec2ab1 100755
+--- a/tools/pygrub/src/pygrub
++++ b/tools/pygrub/src/pygrub
+@@ -738,8 +738,7 @@ if __name__ == "__main__":
+ def usage():
+ print("Usage: %s [-q|--quiet] [-i|--interactive] [-l|--list-entries] [-n|--not-really] [--output=] [--kernel=] [--ramdisk=] [--args=] [--entry=] [--output-directory=] [--output-format=sxp|simple|simple0] [--offset=] <image>" %(sys.argv[0],), file=sys.stderr)
+
+- def copy_from_image(fs, file_to_read, file_type, output_directory,
+- not_really):
++ def copy_from_image(fs, file_to_read, file_type, fd_dst, path_dst, not_really):
+ if not_really:
+ if fs.file_exists(file_to_read):
+ return "<%s:%s>" % (file_type, file_to_read)
+@@ -750,21 +749,18 @@ if __name__ == "__main__":
+ except Exception as e:
+ print(e, file=sys.stderr)
+ sys.exit("Error opening %s in guest" % file_to_read)
+- (tfd, ret) = tempfile.mkstemp(prefix="boot_"+file_type+".",
+- dir=output_directory)
+ dataoff = 0
+ while True:
+ data = datafile.read(FS_READ_MAX, dataoff)
+ if len(data) == 0:
+- os.close(tfd)
++ os.close(fd_dst)
+ del datafile
+- return ret
++ return
+ try:
+- os.write(tfd, data)
++ os.write(fd_dst, data)
+ except Exception as e:
+ print(e, file=sys.stderr)
+- os.close(tfd)
+- os.unlink(ret)
++ os.unlink(path_dst)
+ del datafile
+ sys.exit("Error writing temporary copy of "+file_type)
+ dataoff += len(data)
+@@ -861,6 +857,14 @@ if __name__ == "__main__":
+ else:
+ raise
+
++ if not_really:
++ fd_kernel = path_kernel = fd_ramdisk = path_ramdisk = None
++ else:
++ (fd_kernel, path_kernel) = tempfile.mkstemp(prefix="boot_kernel.",
++ dir=output_directory)
++ (fd_ramdisk, path_ramdisk) = tempfile.mkstemp(prefix="boot_ramdisk.",
++ dir=output_directory)
++
+ if output is None:
+ fd = sys.stdout.fileno()
+ else:
+@@ -920,20 +924,23 @@ if __name__ == "__main__":
+ if fs is None:
+ raise RuntimeError("Unable to find partition containing kernel")
+
+- bootcfg["kernel"] = copy_from_image(fs, chosencfg["kernel"], "kernel",
+- output_directory, not_really)
++ copy_from_image(fs, chosencfg["kernel"], "kernel",
++ fd_kernel, path_kernel, not_really)
++ bootcfg["kernel"] = path_kernel
+
+ if chosencfg["ramdisk"]:
+ try:
+- bootcfg["ramdisk"] = copy_from_image(fs, chosencfg["ramdisk"],
+- "ramdisk", output_directory,
+- not_really)
++ copy_from_image(fs, chosencfg["ramdisk"], "ramdisk",
++ fd_ramdisk, path_ramdisk, not_really)
+ except:
+ if not not_really:
+- os.unlink(bootcfg["kernel"])
++ os.unlink(path_kernel)
+ raise
++ bootcfg["ramdisk"] = path_ramdisk
+ else:
+ initrd = None
++ if not not_really:
++ os.unlink(path_ramdisk)
+
+ args = None
+ if chosencfg["args"]:
+--
+2.42.0
+
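
For context, the pattern the pygrub change above relies on (pre-opening every writable descriptor before any later privilege drop) can be sketched roughly as follows; this is illustrative only and not part of the patch, and the staging directory and payload are placeholders:

    import os, tempfile

    # Placeholder staging directory; pygrub's real default is /var/run/xen/pygrub/
    output_directory = tempfile.gettempdir()

    # Grab every writable descriptor up front, while the process is still privileged.
    fd_kernel, path_kernel = tempfile.mkstemp(prefix="boot_kernel.", dir=output_directory)
    fd_ramdisk, path_ramdisk = tempfile.mkstemp(prefix="boot_ramdisk.", dir=output_directory)

    # Later stages only ever call os.write() on the already-open descriptors and
    # never have to create files themselves.
    os.write(fd_kernel, b"placeholder kernel payload")
    os.close(fd_kernel)
    os.close(fd_ramdisk)

    # As in the patch, an output that ended up empty is simply removed again.
    os.unlink(path_ramdisk)
    os.unlink(path_kernel)
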
diff --git a/0049-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch b/0049-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
deleted file mode 100644
index a34217e..0000000
--- a/0049-x86-shskt-Disable-CET-SS-on-parts-susceptible-to-fra.patch
+++ /dev/null
@@ -1,195 +0,0 @@
-From 8202b9cf84674c5b23a89c4b8722afbb9787f917 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 07:56:16 +0100
-Subject: [PATCH 49/89] x86/shskt: Disable CET-SS on parts susceptible to
- fractured updates
-
-Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack
-Token".
-
-Architecturally, an event delivery which starts in CPL<3 and switches shadow
-stack will first validate the Supervisor Shadow Stack Token (setting the busy
-bit), then pushes CS/LIP/SSP. One example of this is an NMI interrupting Xen.
-
-Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc
-between setting the busy bit and completing the event injection renders the
-action non-restartable, because when it comes time to restart, the busy bit is
-found to be already set.
-
-This is far more easily encountered under virt, yet it is not the fault of the
-hypervisor, nor the fault of the guest kernel. The fault lies somewhere
-between the architectural specification, and the uarch behaviour.
-
-Intel have allocated CPUID.7[1].ecx[18] CET_SSS to enumerate that supervisor
-shadow stacks are safe to use. Because of how Xen lays out its shadow stacks,
-fracturing is not expected to be a problem on native.
-
-Detect this case on boot and default to not using shstk if virtualised.
-Specifying `cet=shstk` on the command line will override this heuristic and
-enable shadow stacks irrespective.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 01e7477d1b081cff4288ff9f51ec59ee94c03ee0
-master date: 2023-02-09 18:26:17 +0000
----
- docs/misc/xen-command-line.pandoc | 7 +++-
- tools/libs/light/libxl_cpuid.c | 2 +
- tools/misc/xen-cpuid.c | 1 +
- xen/arch/x86/cpu/common.c | 11 ++++-
- xen/arch/x86/setup.c | 46 +++++++++++++++++----
- xen/include/public/arch-x86/cpufeatureset.h | 1 +
- 6 files changed, 57 insertions(+), 11 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index e7fe8b0cc9..807ca51fb2 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -287,10 +287,15 @@ can be maintained with the pv-shim mechanism.
- protection.
-
- The option is available when `CONFIG_XEN_SHSTK` is compiled in, and
-- defaults to `true` on hardware supporting CET-SS. Specifying
-+ generally defaults to `true` on hardware supporting CET-SS. Specifying
- `cet=no-shstk` will cause Xen not to use Shadow Stacks even when support
- is available in hardware.
-
-+ Some hardware suffers from an issue known as Supervisor Shadow Stack
-+ Fracturing. On such hardware, Xen will default to not using Shadow Stacks
-+ when virtualised. Specifying `cet=shstk` will override this heuristic and
-+ enable Shadow Stacks unilaterally.
-+
- * The `ibt=` boolean controls whether Xen uses Indirect Branch Tracking for
- its own protection.
-
-diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
-index 2aa23225f4..d97a2f3338 100644
---- a/tools/libs/light/libxl_cpuid.c
-+++ b/tools/libs/light/libxl_cpuid.c
-@@ -235,6 +235,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
- {"fsrs", 0x00000007, 1, CPUID_REG_EAX, 11, 1},
- {"fsrcs", 0x00000007, 1, CPUID_REG_EAX, 12, 1},
-
-+ {"cet-sss", 0x00000007, 1, CPUID_REG_EDX, 18, 1},
-+
- {"intel-psfd", 0x00000007, 2, CPUID_REG_EDX, 0, 1},
- {"mcdt-no", 0x00000007, 2, CPUID_REG_EDX, 5, 1},
-
-diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index addb3a39a1..0248eaef44 100644
---- a/tools/misc/xen-cpuid.c
-+++ b/tools/misc/xen-cpuid.c
-@@ -208,6 +208,7 @@ static const char *const str_7c1[32] =
-
- static const char *const str_7d1[32] =
- {
-+ [18] = "cet-sss",
- };
-
- static const char *const str_7d2[32] =
-diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
-index b3fcf4680f..27f73d3bbe 100644
---- a/xen/arch/x86/cpu/common.c
-+++ b/xen/arch/x86/cpu/common.c
-@@ -346,11 +346,18 @@ void __init early_cpu_init(void)
- x86_cpuid_vendor_to_str(c->x86_vendor), c->x86, c->x86,
- c->x86_model, c->x86_model, c->x86_mask, eax);
-
-- if (c->cpuid_level >= 7)
-- cpuid_count(7, 0, &eax, &ebx,
-+ if (c->cpuid_level >= 7) {
-+ uint32_t max_subleaf;
-+
-+ cpuid_count(7, 0, &max_subleaf, &ebx,
- &c->x86_capability[FEATURESET_7c0],
- &c->x86_capability[FEATURESET_7d0]);
-
-+ if (max_subleaf >= 1)
-+ cpuid_count(7, 1, &eax, &ebx, &ecx,
-+ &c->x86_capability[FEATURESET_7d1]);
-+ }
-+
- eax = cpuid_eax(0x80000000);
- if ((eax >> 16) == 0x8000 && eax >= 0x80000008) {
- ebx = eax >= 0x8000001f ? cpuid_ebx(0x8000001f) : 0;
-diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
-index e05189f649..09c17b1016 100644
---- a/xen/arch/x86/setup.c
-+++ b/xen/arch/x86/setup.c
-@@ -95,11 +95,7 @@ unsigned long __initdata highmem_start;
- size_param("highmem-start", highmem_start);
- #endif
-
--#ifdef CONFIG_XEN_SHSTK
--static bool __initdata opt_xen_shstk = true;
--#else
--#define opt_xen_shstk false
--#endif
-+static int8_t __initdata opt_xen_shstk = -IS_ENABLED(CONFIG_XEN_SHSTK);
-
- #ifdef CONFIG_XEN_IBT
- static bool __initdata opt_xen_ibt = true;
-@@ -1104,11 +1100,45 @@ void __init noreturn __start_xen(unsigned long mbi_p)
- early_cpu_init();
-
- /* Choose shadow stack early, to set infrastructure up appropriately. */
-- if ( opt_xen_shstk && boot_cpu_has(X86_FEATURE_CET_SS) )
-+ if ( !boot_cpu_has(X86_FEATURE_CET_SS) )
-+ opt_xen_shstk = 0;
-+
-+ if ( opt_xen_shstk )
- {
-- printk("Enabling Supervisor Shadow Stacks\n");
-+ /*
-+ * Some CPUs suffer from Shadow Stack Fracturing, an issue whereby a
-+ * fault/VMExit/etc between setting a Supervisor Busy bit and the
-+ * event delivery completing renders the operation non-restartable.
-+ * On restart, event delivery will find the Busy bit already set.
-+ *
-+ * This is a problem on bare metal, but outside of synthetic cases or
-+ * a very badly timed #MC, it's not believed to be a problem. It is a
-+ * much bigger problem under virt, because we can VMExit for a number
-+ * of legitimate reasons and tickle this bug.
-+ *
-+ * CPUs with this addressed enumerate CET-SSS to indicate that
-+ * supervisor shadow stacks are now safe to use.
-+ */
-+ bool cpu_has_bug_shstk_fracture =
-+ boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
-+ !boot_cpu_has(X86_FEATURE_CET_SSS);
-
-- setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK);
-+ /*
-+ * On bare metal, assume that Xen won't be impacted by shstk
-+ * fracturing problems. Under virt, be more conservative and disable
-+ * shstk by default.
-+ */
-+ if ( opt_xen_shstk == -1 )
-+ opt_xen_shstk =
-+ cpu_has_hypervisor ? !cpu_has_bug_shstk_fracture
-+ : true;
-+
-+ if ( opt_xen_shstk )
-+ {
-+ printk("Enabling Supervisor Shadow Stacks\n");
-+
-+ setup_force_cpu_cap(X86_FEATURE_XEN_SHSTK);
-+ }
- }
-
- if ( opt_xen_ibt && boot_cpu_has(X86_FEATURE_CET_IBT) )
-diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index f43cdcd0f9..08600cfdc7 100644
---- a/xen/include/public/arch-x86/cpufeatureset.h
-+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -298,6 +298,7 @@ XEN_CPUFEATURE(MCDT_NO, 13*32+ 5) /*A MCDT_NO */
- /* Intel-defined CPU features, CPUID level 0x00000007:1.ecx, word 14 */
-
- /* Intel-defined CPU features, CPUID level 0x00000007:1.edx, word 15 */
-+XEN_CPUFEATURE(CET_SSS, 15*32+18) /* CET Supervisor Shadow Stacks safe to use */
-
- #endif /* XEN_CPUFEATURE */
-
---
-2.40.0
-
diff --git a/0050-credit2-respect-credit2_runqueue-all-when-arranging-.patch b/0050-credit2-respect-credit2_runqueue-all-when-arranging-.patch
deleted file mode 100644
index 0444aa9..0000000
--- a/0050-credit2-respect-credit2_runqueue-all-when-arranging-.patch
+++ /dev/null
@@ -1,69 +0,0 @@
-From 74b76704fd4059e9133e84c1384501858e9663b7 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
- <marmarek@invisiblethingslab.com>
-Date: Fri, 3 Mar 2023 07:57:39 +0100
-Subject: [PATCH 50/89] credit2: respect credit2_runqueue=all when arranging
- runqueues
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Documentation for credit2_runqueue=all says it should create one queue
-for all pCPUs on the host. But since introduction
-sched_credit2_max_cpus_runqueue, it actually created separate runqueue
-per socket, even if the CPUs count is below
-sched_credit2_max_cpus_runqueue.
-
-Adjust the condition to skip syblink check in case of
-credit2_runqueue=all.
-
-Fixes: 8e2aa76dc167 ("xen: credit2: limit the max number of CPUs in a runqueue")
-Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: 1f5747ee929fbbcae58d7234c6c38a77495d0cfe
-master date: 2023-02-15 16:12:42 +0100
----
- docs/misc/xen-command-line.pandoc | 5 +++++
- xen/common/sched/credit2.c | 9 +++++++--
- 2 files changed, 12 insertions(+), 2 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 807ca51fb2..5be5ce10c6 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -726,6 +726,11 @@ Available alternatives, with their meaning, are:
- * `all`: just one runqueue shared by all the logical pCPUs of
- the host
-
-+Regardless of the above choice, Xen attempts to respect
-+`sched_credit2_max_cpus_runqueue` limit, which may mean more than one runqueue
-+for the `all` value. If that isn't intended, raise
-+the `sched_credit2_max_cpus_runqueue` value.
-+
- ### dbgp
- > `= ehci[ <integer> | @pci<bus>:<slot>.<func> ]`
- > `= xhci[ <integer> | @pci<bus>:<slot>.<func> ][,share=<bool>|hwdom]`
-diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
-index 0e3f89e537..ae55feea34 100644
---- a/xen/common/sched/credit2.c
-+++ b/xen/common/sched/credit2.c
-@@ -996,9 +996,14 @@ cpu_add_to_runqueue(const struct scheduler *ops, unsigned int cpu)
- *
- * Otherwise, let's try to make sure that siblings stay in the
- * same runqueue, pretty much under any cinrcumnstances.
-+ *
-+ * Furthermore, try to respect credit2_runqueue=all, as long as
-+ * max_cpus_runq isn't violated.
- */
-- if ( rqd->refcnt < max_cpus_runq && (ops->cpupool->gran != SCHED_GRAN_cpu ||
-- cpu_runqueue_siblings_match(rqd, cpu, max_cpus_runq)) )
-+ if ( rqd->refcnt < max_cpus_runq &&
-+ (ops->cpupool->gran != SCHED_GRAN_cpu ||
-+ cpu_runqueue_siblings_match(rqd, cpu, max_cpus_runq) ||
-+ opt_runqueue == OPT_RUNQUEUE_ALL) )
- {
- /*
- * This runqueue is ok, but as we said, we also want an even
---
-2.40.0
-
diff --git a/0050-tools-libfsimage-Export-a-new-function-to-preload-al.patch b/0050-tools-libfsimage-Export-a-new-function-to-preload-al.patch
new file mode 100644
index 0000000..949528d
--- /dev/null
+++ b/0050-tools-libfsimage-Export-a-new-function-to-preload-al.patch
@@ -0,0 +1,126 @@
+From 8ee19246ad2c1d0ce241a52683f56b144a4f0b0e Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Mon, 25 Sep 2023 18:32:24 +0100
+Subject: [PATCH 50/55] tools/libfsimage: Export a new function to preload all
+ plugins
+
+This is work required in order to let pygrub operate in highly deprivileged
+chroot mode. This patch adds a function that preloads every plugin, hence
+ensuring that, on function exit, every shared library is loaded in memory.
+
+The new "init" function is supposed to be used before depriv, but that's
+fine because it's not acting on untrusted data.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+(cherry picked from commit 990e65c3ad9ac08642ce62a92852c80be6c83e96)
+---
+ tools/libfsimage/common/fsimage_plugin.c | 4 ++--
+ tools/libfsimage/common/mapfile-GNU | 1 +
+ tools/libfsimage/common/mapfile-SunOS | 1 +
+ tools/libfsimage/common/xenfsimage.h | 8 ++++++++
+ tools/pygrub/src/fsimage/fsimage.c | 15 +++++++++++++++
+ 5 files changed, 27 insertions(+), 2 deletions(-)
+
+diff --git a/tools/libfsimage/common/fsimage_plugin.c b/tools/libfsimage/common/fsimage_plugin.c
+index de1412b423..d0cb9e96a6 100644
+--- a/tools/libfsimage/common/fsimage_plugin.c
++++ b/tools/libfsimage/common/fsimage_plugin.c
+@@ -119,7 +119,7 @@ fail:
+ return (-1);
+ }
+
+-static int load_plugins(void)
++int fsi_init(void)
+ {
+ const char *fsdir = getenv("XEN_FSIMAGE_FSDIR");
+ struct dirent *dp = NULL;
+@@ -180,7 +180,7 @@ int find_plugin(fsi_t *fsi, const char *path, const char *options)
+ fsi_plugin_t *fp;
+ int ret = 0;
+
+- if (plugins == NULL && (ret = load_plugins()) != 0)
++ if (plugins == NULL && (ret = fsi_init()) != 0)
+ goto out;
+
+ for (fp = plugins; fp != NULL; fp = fp->fp_next) {
+diff --git a/tools/libfsimage/common/mapfile-GNU b/tools/libfsimage/common/mapfile-GNU
+index 26d4d7a69e..2d54d527d7 100644
+--- a/tools/libfsimage/common/mapfile-GNU
++++ b/tools/libfsimage/common/mapfile-GNU
+@@ -1,6 +1,7 @@
+ VERSION {
+ libfsimage.so.1.0 {
+ global:
++ fsi_init;
+ fsi_open_fsimage;
+ fsi_close_fsimage;
+ fsi_file_exists;
+diff --git a/tools/libfsimage/common/mapfile-SunOS b/tools/libfsimage/common/mapfile-SunOS
+index e99b90b650..48deedb425 100644
+--- a/tools/libfsimage/common/mapfile-SunOS
++++ b/tools/libfsimage/common/mapfile-SunOS
+@@ -1,5 +1,6 @@
+ libfsimage.so.1.0 {
+ global:
++ fsi_init;
+ fsi_open_fsimage;
+ fsi_close_fsimage;
+ fsi_file_exists;
+diff --git a/tools/libfsimage/common/xenfsimage.h b/tools/libfsimage/common/xenfsimage.h
+index 201abd54f2..341883b2d7 100644
+--- a/tools/libfsimage/common/xenfsimage.h
++++ b/tools/libfsimage/common/xenfsimage.h
+@@ -35,6 +35,14 @@ extern C {
+ typedef struct fsi fsi_t;
+ typedef struct fsi_file fsi_file_t;
+
++/*
++ * Optional initialization function. If invoked it loads the associated
++ * dynamic libraries for the backends ahead of time. This is required if
++ * the library is to run as part of a highly deprivileged executable, as
++ * the libraries may not be reachable after depriv.
++ */
++int fsi_init(void);
++
+ fsi_t *fsi_open_fsimage(const char *, uint64_t, const char *);
+ void fsi_close_fsimage(fsi_t *);
+
+diff --git a/tools/pygrub/src/fsimage/fsimage.c b/tools/pygrub/src/fsimage/fsimage.c
+index 2ebbbe35df..92fbf2851f 100644
+--- a/tools/pygrub/src/fsimage/fsimage.c
++++ b/tools/pygrub/src/fsimage/fsimage.c
+@@ -286,6 +286,15 @@ fsimage_getbootstring(PyObject *o, PyObject *args)
+ return Py_BuildValue("s", bootstring);
+ }
+
++static PyObject *
++fsimage_init(PyObject *o, PyObject *args)
++{
++ if (!PyArg_ParseTuple(args, ""))
++ return (NULL);
++
++ return Py_BuildValue("i", fsi_init());
++}
++
+ PyDoc_STRVAR(fsimage_open__doc__,
+ "open(name, [offset=off]) - Open the given file as a filesystem image.\n"
+ "\n"
+@@ -297,7 +306,13 @@ PyDoc_STRVAR(fsimage_getbootstring__doc__,
+ "getbootstring(fs) - Return the boot string needed for this file system "
+ "or NULL if none is needed.\n");
+
++PyDoc_STRVAR(fsimage_init__doc__,
++ "init() - Loads every dynamic library contained in xenfsimage "
++ "into memory so that it can be used in chrooted environments.\n");
++
+ static struct PyMethodDef fsimage_module_methods[] = {
++ { "init", (PyCFunction)fsimage_init,
++ METH_VARARGS, fsimage_init__doc__ },
+ { "open", (PyCFunction)fsimage_open,
+ METH_VARARGS|METH_KEYWORDS, fsimage_open__doc__ },
+ { "getbootstring", (PyCFunction)fsimage_getbootstring,
+--
+2.42.0
+
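
As a rough usage illustration of the new entry point (not part of the patch; it assumes the xenfsimage Python bindings built from this tree are importable and that the chroot path exists):

    import os
    import xenfsimage  # Python bindings from tools/pygrub/src/fsimage

    # Force every libfsimage backend .so to be dlopen()ed while the library
    # paths are still reachable.
    if xenfsimage.init() != 0:
        raise RuntimeError("failed to preload xenfsimage plugins")

    # From here on the process can confine itself; the plugins are already
    # mapped, so /{,usr}/lib* does not need to be visible inside the chroot.
    os.chroot("/path/to/empty/chroot")  # placeholder path, requires root
    os.chdir("/")
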
diff --git a/0051-build-make-FILE-symbol-paths-consistent.patch b/0051-build-make-FILE-symbol-paths-consistent.patch
deleted file mode 100644
index 47528c2..0000000
--- a/0051-build-make-FILE-symbol-paths-consistent.patch
+++ /dev/null
@@ -1,42 +0,0 @@
-From 46c104cce0bf340193cb1eacaee5dcd75e264c8f Mon Sep 17 00:00:00 2001
-From: Ross Lagerwall <ross.lagerwall@citrix.com>
-Date: Fri, 3 Mar 2023 07:58:12 +0100
-Subject: [PATCH 51/89] build: make FILE symbol paths consistent
-
-The FILE symbols in out-of-tree builds may be either a relative path to
-the object dir or an absolute path depending on how the build is
-invoked. Fix the paths for C files so that they are consistent with
-in-tree builds - the path is relative to the "xen" directory (e.g.
-common/irq.c).
-
-This fixes livepatch builds when the original Xen build was out-of-tree
-since livepatch-build always does in-tree builds. Note that this doesn't
-fix the behaviour for Clang < 6 which always embeds full paths.
-
-Fixes: 7115fa562fe7 ("build: adding out-of-tree support to the xen build")
-Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 5b9bb91abba7c983def3b4bef71ab08ad360a242
-master date: 2023-02-15 16:13:49 +0100
----
- xen/Rules.mk | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/Rules.mk b/xen/Rules.mk
-index 70b7489ea8..d6b7cec0a8 100644
---- a/xen/Rules.mk
-+++ b/xen/Rules.mk
-@@ -228,8 +228,9 @@ quiet_cmd_cc_o_c = CC $@
- ifeq ($(CONFIG_ENFORCE_UNIQUE_SYMBOLS),y)
- cmd_cc_o_c = $(CC) $(c_flags) -c $< -o $(dot-target).tmp -MQ $@
- ifneq ($(CONFIG_CC_IS_CLANG)$(call clang-ifversion,-lt,600,y),yy)
-+ rel-path = $(patsubst $(abs_srctree)/%,%,$(call realpath,$(1)))
- cmd_objcopy_fix_sym = \
-- $(OBJCOPY) --redefine-sym $(<F)=$< $(dot-target).tmp $@ && rm -f $(dot-target).tmp
-+ $(OBJCOPY) --redefine-sym $(<F)=$(call rel-path,$<) $(dot-target).tmp $@ && rm -f $(dot-target).tmp
- else
- cmd_objcopy_fix_sym = mv -f $(dot-target).tmp $@
- endif
---
-2.40.0
-
diff --git a/0051-tools-pygrub-Deprivilege-pygrub.patch b/0051-tools-pygrub-Deprivilege-pygrub.patch
new file mode 100644
index 0000000..1d89191
--- /dev/null
+++ b/0051-tools-pygrub-Deprivilege-pygrub.patch
@@ -0,0 +1,307 @@
+From f5e211654e5fbb7f1fc5cfea7f9c7ab525edb9e7 Mon Sep 17 00:00:00 2001
+From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Date: Mon, 25 Sep 2023 18:32:25 +0100
+Subject: [PATCH 51/55] tools/pygrub: Deprivilege pygrub
+
+Introduce a --runas=<uid> flag to deprivilege pygrub on Linux and *BSDs. It
+also implicitly creates a chroot env where it drops a deprivileged forked
+process. The chroot itself is cleaned up at the end.
+
+If the --runas arg is present, then pygrub forks, leaving the child to
+deprivilege itself, and waiting for it to complete. When the child exits,
+the parent performs cleanup and exits with the same error code.
+
+This is roughly what the child does:
+ 1. Initialize libfsimage (this loads every .so in memory so the chroot
+    can avoid bind-mounting /{,usr}/lib*)
+ 2. Create a temporary empty chroot directory
+ 3. Mount tmpfs in it
+ 4. Bind mount the disk inside, because libfsimage expects a path, not a
+ file descriptor.
+ 5. Remount the root tmpfs to be stricter (ro,nosuid,nodev)
+ 6. Set RLIMIT_FSIZE to a sensibly high amount (128 MiB)
+ 7. Depriv gid, groups and uid
+
+With this scheme in place, the "output" files are writable (up to
+RLIMIT_FSIZE octets) and the exposed filesystem is immutable and contains
+the only file we can't easily get rid of (the disk).
+
+If running on Linux, the child process also unshares mount, IPC, and
+network namespaces before dropping its privileges.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+(cherry picked from commit e0342ae5556f2b6e2db50701b8a0679a45822ca6)
+---
+ tools/pygrub/setup.py | 2 +-
+ tools/pygrub/src/pygrub | 162 +++++++++++++++++++++++++++++++++++++---
+ 2 files changed, 154 insertions(+), 10 deletions(-)
+
+diff --git a/tools/pygrub/setup.py b/tools/pygrub/setup.py
+index 0e4e3d02d3..06b96733d0 100644
+--- a/tools/pygrub/setup.py
++++ b/tools/pygrub/setup.py
+@@ -17,7 +17,7 @@ xenfsimage = Extension("xenfsimage",
+ pkgs = [ 'grub' ]
+
+ setup(name='pygrub',
+- version='0.6',
++ version='0.7',
+ description='Boot loader that looks a lot like grub for Xen',
+ author='Jeremy Katz',
+ author_email='katzj@redhat.com',
+diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
+index 91e2ec2ab1..7cea496ade 100755
+--- a/tools/pygrub/src/pygrub
++++ b/tools/pygrub/src/pygrub
+@@ -16,8 +16,11 @@ from __future__ import print_function
+
+ import os, sys, string, struct, tempfile, re, traceback, stat, errno
+ import copy
++import ctypes, ctypes.util
+ import logging
+ import platform
++import resource
++import subprocess
+
+ import curses, _curses, curses.textpad, curses.ascii
+ import getopt
+@@ -27,10 +30,135 @@ import grub.GrubConf
+ import grub.LiloConf
+ import grub.ExtLinuxConf
+
+-PYGRUB_VER = 0.6
++PYGRUB_VER = 0.7
+ FS_READ_MAX = 1024 * 1024
+ SECTOR_SIZE = 512
+
++# Unless provided through the env variable PYGRUB_MAX_FILE_SIZE_MB, then
++# this is the maximum filesize allowed for files written by the depriv
++# pygrub
++LIMIT_FSIZE = 128 << 20
++
++CLONE_NEWNS = 0x00020000 # mount namespace
++CLONE_NEWNET = 0x40000000 # network namespace
++CLONE_NEWIPC = 0x08000000 # IPC namespace
++
++def unshare(flags):
++ if not sys.platform.startswith("linux"):
++ print("skip_unshare reason=not_linux platform=%s", sys.platform, file=sys.stderr)
++ return
++
++ libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
++ unshare_prototype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, use_errno=True)
++ unshare = unshare_prototype(('unshare', libc))
++
++ if unshare(flags) < 0:
++ raise OSError(ctypes.get_errno(), os.strerror(ctypes.get_errno()))
++
++def bind_mount(src, dst, options):
++ open(dst, "a").close() # touch
++
++ rc = subprocess.call(["mount", "--bind", "-o", options, src, dst])
++ if rc != 0:
++ raise RuntimeError("bad_mount: src=%s dst=%s opts=%s" %
++ (src, dst, options))
++
++def downgrade_rlimits():
++ # Wipe the authority to use unrequired resources
++ resource.setrlimit(resource.RLIMIT_NPROC, (0, 0))
++ resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
++ resource.setrlimit(resource.RLIMIT_MEMLOCK, (0, 0))
++
++ # py2's resource module doesn't know about resource.RLIMIT_MSGQUEUE
++ #
++ # TODO: Use resource.RLIMIT_MSGQUEUE after python2 is deprecated
++ if sys.platform.startswith('linux'):
++ RLIMIT_MSGQUEUE = 12
++ resource.setrlimit(RLIMIT_MSGQUEUE, (0, 0))
++
++ # The final look of the filesystem for this process is fully RO, but
++ # note we have some file descriptor already open (notably, kernel and
++ # ramdisk). In order to avoid a compromised pygrub from filling up the
++ # filesystem we set RLIMIT_FSIZE to a high bound, so that the file
++ # write permissions are bound.
++ fsize = LIMIT_FSIZE
++ if "PYGRUB_MAX_FILE_SIZE_MB" in os.environ.keys():
++ fsize = os.environ["PYGRUB_MAX_FILE_SIZE_MB"] << 20
++
++ resource.setrlimit(resource.RLIMIT_FSIZE, (fsize, fsize))
++
++def depriv(output_directory, output, device, uid, path_kernel, path_ramdisk):
++ # The only point of this call is to force the loading of libfsimage.
++ # That way, we don't need to bind-mount it into the chroot
++ rc = xenfsimage.init()
++ if rc != 0:
++ os.unlink(path_ramdisk)
++ os.unlink(path_kernel)
++ raise RuntimeError("bad_xenfsimage: rc=%d" % rc)
++
++ # Create a temporary directory for the chroot
++ chroot = tempfile.mkdtemp(prefix=str(uid)+'-', dir=output_directory) + '/'
++ device_path = '/device'
++
++ pid = os.fork()
++ if pid:
++ # parent
++ _, rc = os.waitpid(pid, 0)
++
++ for path in [path_kernel, path_ramdisk]:
++ # If the child didn't write anything, just get rid of it,
++ # otherwise we end up consuming a 0-size file when parsing
++ # systems without a ramdisk that the ultimate caller of pygrub
++ # may just be unaware of
++ if rc != 0 or os.path.getsize(path) == 0:
++ os.unlink(path)
++
++ # Normally, unshare(CLONE_NEWNS) will ensure this is not required.
++ # However, this syscall doesn't exist in *BSD systems and doesn't
++ # auto-unmount everything on older Linux kernels (At least as of
++ # Linux 4.19, but it seems fixed in 5.15). Either way,
++ # recursively unmount everything if needed. Quietly.
++ with open('/dev/null', 'w') as devnull:
++ subprocess.call(["umount", "-f", chroot + device_path],
++ stdout=devnull, stderr=devnull)
++ subprocess.call(["umount", "-f", chroot],
++ stdout=devnull, stderr=devnull)
++ os.rmdir(chroot)
++
++ sys.exit(rc)
++
++ # By unsharing the namespace we're making sure it's all bulk-released
++ # at the end, when the namespaces disappear. This means the kernel does
++ # (almost) all the cleanup for us and the parent just has to remove the
++ # temporary directory.
++ unshare(CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWNET)
++
++ # Set sensible limits using the setrlimit interface
++ downgrade_rlimits()
++
++ # We'll mount tmpfs on the chroot to ensure the deprivileged child
++ # cannot affect the persistent state. It's RW now in order to
++ # bind-mount the device, but note it's remounted RO after that.
++ rc = subprocess.call(["mount", "-t", "tmpfs", "none", chroot])
++ if rc != 0:
++ raise RuntimeError("mount_tmpfs rc=%d dst=\"%s\"" % (rc, chroot))
++
++ # Bind the untrusted device RO
++ bind_mount(device, chroot + device_path, "ro,nosuid,noexec")
++
++ rc = subprocess.call(["mount", "-t", "tmpfs", "-o", "remount,ro,nosuid,noexec,nodev", "none", chroot])
++ if rc != 0:
++ raise RuntimeError("remount_tmpfs rc=%d dst=\"%s\"" % (rc, chroot))
++
++ # Drop superpowers!
++ os.chroot(chroot)
++ os.chdir('/')
++ os.setgid(uid)
++ os.setgroups([uid])
++ os.setuid(uid)
++
++ return device_path
++
+ def read_size_roundup(fd, size):
+ if platform.system() != 'FreeBSD':
+ return size
+@@ -736,7 +864,7 @@ if __name__ == "__main__":
+ sel = None
+
+ def usage():
+- print("Usage: %s [-q|--quiet] [-i|--interactive] [-l|--list-entries] [-n|--not-really] [--output=] [--kernel=] [--ramdisk=] [--args=] [--entry=] [--output-directory=] [--output-format=sxp|simple|simple0] [--offset=] <image>" %(sys.argv[0],), file=sys.stderr)
++ print("Usage: %s [-q|--quiet] [-i|--interactive] [-l|--list-entries] [-n|--not-really] [--output=] [--kernel=] [--ramdisk=] [--args=] [--entry=] [--output-directory=] [--output-format=sxp|simple|simple0] [--runas=] [--offset=] <image>" %(sys.argv[0],), file=sys.stderr)
+
+ def copy_from_image(fs, file_to_read, file_type, fd_dst, path_dst, not_really):
+ if not_really:
+@@ -760,7 +888,8 @@ if __name__ == "__main__":
+ os.write(fd_dst, data)
+ except Exception as e:
+ print(e, file=sys.stderr)
+- os.unlink(path_dst)
++ if path_dst:
++ os.unlink(path_dst)
+ del datafile
+ sys.exit("Error writing temporary copy of "+file_type)
+ dataoff += len(data)
+@@ -769,7 +898,7 @@ if __name__ == "__main__":
+ opts, args = getopt.gnu_getopt(sys.argv[1:], 'qilnh::',
+ ["quiet", "interactive", "list-entries", "not-really", "help",
+ "output=", "output-format=", "output-directory=", "offset=",
+- "entry=", "kernel=",
++ "runas=", "entry=", "kernel=",
+ "ramdisk=", "args=", "isconfig", "debug"])
+ except getopt.GetoptError:
+ usage()
+@@ -790,6 +919,7 @@ if __name__ == "__main__":
+ not_really = False
+ output_format = "sxp"
+ output_directory = "/var/run/xen/pygrub/"
++ uid = None
+
+ # what was passed in
+ incfg = { "kernel": None, "ramdisk": None, "args": "" }
+@@ -813,6 +943,13 @@ if __name__ == "__main__":
+ elif o in ("--output",):
+ if a != "-":
+ output = a
++ elif o in ("--runas",):
++ try:
++ uid = int(a)
++ except ValueError:
++ print("runas value must be an integer user id")
++ usage()
++ sys.exit(1)
+ elif o in ("--kernel",):
+ incfg["kernel"] = a
+ elif o in ("--ramdisk",):
+@@ -849,6 +986,10 @@ if __name__ == "__main__":
+ if debug:
+ logging.basicConfig(level=logging.DEBUG)
+
++ if interactive and uid:
++ print("In order to use --runas, you must also set --entry or -q", file=sys.stderr)
++ sys.exit(1)
++
+ try:
+ os.makedirs(output_directory, 0o700)
+ except OSError as e:
+@@ -870,6 +1011,9 @@ if __name__ == "__main__":
+ else:
+ fd = os.open(output, os.O_WRONLY)
+
++ if uid:
++ file = depriv(output_directory, output, file, uid, path_kernel, path_ramdisk)
++
+ # debug
+ if isconfig:
+ chosencfg = run_grub(file, entry, fs, incfg["args"])
+@@ -925,21 +1069,21 @@ if __name__ == "__main__":
+ raise RuntimeError("Unable to find partition containing kernel")
+
+ copy_from_image(fs, chosencfg["kernel"], "kernel",
+- fd_kernel, path_kernel, not_really)
++ fd_kernel, None if uid else path_kernel, not_really)
+ bootcfg["kernel"] = path_kernel
+
+ if chosencfg["ramdisk"]:
+ try:
+ copy_from_image(fs, chosencfg["ramdisk"], "ramdisk",
+- fd_ramdisk, path_ramdisk, not_really)
++ fd_ramdisk, None if uid else path_ramdisk, not_really)
+ except:
+- if not not_really:
+- os.unlink(path_kernel)
++ if not uid and not not_really:
++ os.unlink(path_kernel)
+ raise
+ bootcfg["ramdisk"] = path_ramdisk
+ else:
+ initrd = None
+- if not not_really:
++ if not uid and not not_really:
+ os.unlink(path_ramdisk)
+
+ args = None
+--
+2.42.0
+
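
A toolstack-like caller could drive the new deprivileged mode along these lines (a sketch only; the uid, image path and output settings are placeholders, not values mandated by the patch):

    import subprocess

    cmd = [
        "pygrub",
        "--runas=65534",                            # placeholder unprivileged uid
        "--quiet",                                  # --runas requires -q or --entry
        "--output-directory=/var/run/xen/pygrub/",
        "--output-format=simple0",
        "--output=-",                               # "-" means stdout
        "/path/to/guest-disk.img",                  # guest-owned, untrusted image
    ]
    result = subprocess.run(cmd)
    print("pygrub exited with", result.returncode)
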
diff --git a/0052-libxl-add-support-for-running-bootloader-in-restrict.patch b/0052-libxl-add-support-for-running-bootloader-in-restrict.patch
new file mode 100644
index 0000000..08691b9
--- /dev/null
+++ b/0052-libxl-add-support-for-running-bootloader-in-restrict.patch
@@ -0,0 +1,251 @@
+From 42bf49d74b711ca7fef37bcde12928220c8e9700 Mon Sep 17 00:00:00 2001
+From: Roger Pau Monne <roger.pau@citrix.com>
+Date: Mon, 25 Sep 2023 14:30:20 +0200
+Subject: [PATCH 52/55] libxl: add support for running bootloader in restricted
+ mode
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Much like the device model depriv mode, add the same kind of support for the
+bootloader. Such a feature allows passing a UID as a parameter for the
+bootloader to run as, together with the bootloader itself taking the necessary
+actions to isolate.
+
+Note that the user to run the bootloader as must have the right permissions to
+access the guest disk image (in read mode only), and that the bootloader will
+be run in non-interactive mode when restricted.
+
+If enabled, bootloader restrict mode will attempt to re-use the user(s) from the
+QEMU depriv implementation if no user is provided in the configuration file or
+the environment. See docs/features/qemu-deprivilege.pandoc for more
+information about how to set up those users.
+
+Bootloader restrict mode is not enabled by default as it requires certain
+setup to be done first (setup of the user(s) to use in restrict mode).
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+(cherry picked from commit 1f762642d2cad1a40634e3280361928109d902f1)
+---
+ docs/man/xl.1.pod.in | 33 +++++++++++
+ tools/libs/light/libxl_bootloader.c | 89 ++++++++++++++++++++++++++++-
+ tools/libs/light/libxl_dm.c | 8 +--
+ tools/libs/light/libxl_internal.h | 8 +++
+ 4 files changed, 131 insertions(+), 7 deletions(-)
+
+diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
+index 101e14241d..4831e12242 100644
+--- a/docs/man/xl.1.pod.in
++++ b/docs/man/xl.1.pod.in
+@@ -1957,6 +1957,39 @@ ignored:
+
+ =back
+
++=head1 ENVIRONMENT VARIABLES
++
++The following environment variables shall affect the execution of xl:
++
++=over 4
++
++=item LIBXL_BOOTLOADER_RESTRICT
++
++Attempt to restrict the bootloader after startup, to limit the
++consequences of security vulnerabilities due to parsing guest
++owned image files.
++
++See docs/features/qemu-deprivilege.pandoc for more information
++on how to setup the unprivileged users.
++
++Note that running the bootloader in restricted mode also implies using
++non-interactive mode, and the disk image must be readable by the
++restricted user.
++
++Having this variable set is equivalent to enabling the option, even if the
++value is 0.
++
++=item LIBXL_BOOTLOADER_USER
++
++When using bootloader_restrict, run the bootloader as this user. If
++not set the default QEMU restrict users will be used.
++
++NOTE: Each domain MUST have a SEPARATE username.
++
++See docs/features/qemu-deprivilege.pandoc for more information.
++
++=back
++
+ =head1 SEE ALSO
+
+ The following man pages:
+diff --git a/tools/libs/light/libxl_bootloader.c b/tools/libs/light/libxl_bootloader.c
+index 108329b4a5..23c0ef3e89 100644
+--- a/tools/libs/light/libxl_bootloader.c
++++ b/tools/libs/light/libxl_bootloader.c
+@@ -14,6 +14,7 @@
+
+ #include "libxl_osdeps.h" /* must come before any other headers */
+
++#include <pwd.h>
+ #include <termios.h>
+ #ifdef HAVE_UTMP_H
+ #include <utmp.h>
+@@ -42,8 +43,71 @@ static void bootloader_arg(libxl__bootloader_state *bl, const char *arg)
+ bl->args[bl->nargs++] = arg;
+ }
+
+-static void make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
+- const char *bootloader_path)
++static int bootloader_uid(libxl__gc *gc, domid_t guest_domid,
++ const char *user, uid_t *intended_uid)
++{
++ struct passwd *user_base, user_pwbuf;
++ int rc;
++
++ if (user) {
++ rc = userlookup_helper_getpwnam(gc, user, &user_pwbuf, &user_base);
++ if (rc) return rc;
++
++ if (!user_base) {
++ LOGD(ERROR, guest_domid, "Couldn't find user %s", user);
++ return ERROR_INVAL;
++ }
++
++ *intended_uid = user_base->pw_uid;
++ return 0;
++ }
++
++ /* Re-use QEMU user range for the bootloader. */
++ rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE,
++ &user_pwbuf, &user_base);
++ if (rc) return rc;
++
++ if (user_base) {
++ struct passwd *user_clash, user_clash_pwbuf;
++ uid_t temp_uid = user_base->pw_uid + guest_domid;
++
++ rc = userlookup_helper_getpwuid(gc, temp_uid, &user_clash_pwbuf,
++ &user_clash);
++ if (rc) return rc;
++
++ if (user_clash) {
++ LOGD(ERROR, guest_domid,
++ "wanted to use uid %ld (%s + %d) but that is user %s !",
++ (long)temp_uid, LIBXL_QEMU_USER_RANGE_BASE,
++ guest_domid, user_clash->pw_name);
++ return ERROR_INVAL;
++ }
++
++ *intended_uid = temp_uid;
++ return 0;
++ }
++
++ rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_SHARED, &user_pwbuf,
++ &user_base);
++ if (rc) return rc;
++
++ if (user_base) {
++ LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s",
++ LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED);
++ *intended_uid = user_base->pw_uid;
++
++ return 0;
++ }
++
++ LOGD(ERROR, guest_domid,
++ "Could not find user %s or range base pseudo-user %s, cannot restrict",
++ LIBXL_QEMU_USER_SHARED, LIBXL_QEMU_USER_RANGE_BASE);
++
++ return ERROR_INVAL;
++}
++
++static int make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
++ const char *bootloader_path)
+ {
+ const libxl_domain_build_info *info = bl->info;
+
+@@ -61,6 +125,23 @@ static void make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
+ ARG(GCSPRINTF("--ramdisk=%s", info->ramdisk));
+ if (info->cmdline && *info->cmdline != '\0')
+ ARG(GCSPRINTF("--args=%s", info->cmdline));
++ if (getenv("LIBXL_BOOTLOADER_RESTRICT") ||
++ getenv("LIBXL_BOOTLOADER_USER")) {
++ uid_t uid = -1;
++ int rc = bootloader_uid(gc, bl->domid, getenv("LIBXL_BOOTLOADER_USER"),
++ &uid);
++
++ if (rc) return rc;
++
++ assert(uid != -1);
++ if (!uid) {
++ LOGD(ERROR, bl->domid, "bootloader restrict UID is 0 (root)!");
++ return ERROR_INVAL;
++ }
++ LOGD(DEBUG, bl->domid, "using uid %ld", (long)uid);
++ ARG(GCSPRINTF("--runas=%ld", (long)uid));
++ ARG("--quiet");
++ }
+
+ ARG(GCSPRINTF("--output=%s", bl->outputpath));
+ ARG("--output-format=simple0");
+@@ -79,6 +160,7 @@ static void make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
+ /* Sentinel for execv */
+ ARG(NULL);
+
++ return 0;
+ #undef ARG
+ }
+
+@@ -443,7 +525,8 @@ static void bootloader_disk_attached_cb(libxl__egc *egc,
+ bootloader = bltmp;
+ }
+
+- make_bootloader_args(gc, bl, bootloader);
++ rc = make_bootloader_args(gc, bl, bootloader);
++ if (rc) goto out;
+
+ bl->openpty.ao = ao;
+ bl->openpty.callback = bootloader_gotptys;
+diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
+index fc264a3a13..14b593110f 100644
+--- a/tools/libs/light/libxl_dm.c
++++ b/tools/libs/light/libxl_dm.c
+@@ -80,10 +80,10 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name)
+ * On error, return a libxl-style error code.
+ */
+ #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \
+- static int userlookup_helper_##NAME(libxl__gc *gc, \
+- SPEC_TYPE spec, \
+- struct STRUCTNAME *resultbuf, \
+- struct STRUCTNAME **out) \
++ int userlookup_helper_##NAME(libxl__gc *gc, \
++ SPEC_TYPE spec, \
++ struct STRUCTNAME *resultbuf, \
++ struct STRUCTNAME **out) \
+ { \
+ struct STRUCTNAME *resultp = NULL; \
+ char *buf = NULL; \
+diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
+index 7ad38de30e..f1e3a9a15b 100644
+--- a/tools/libs/light/libxl_internal.h
++++ b/tools/libs/light/libxl_internal.h
+@@ -4873,6 +4873,14 @@ struct libxl__cpu_policy {
+ struct xc_msr *msr;
+ };
+
++struct passwd;
++_hidden int userlookup_helper_getpwnam(libxl__gc*, const char *user,
++ struct passwd *res,
++ struct passwd **out);
++_hidden int userlookup_helper_getpwuid(libxl__gc*, uid_t uid,
++ struct passwd *res,
++ struct passwd **out);
++
+ #endif
+
+ /*
+--
+2.42.0
+
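
Opting in is done purely through the environment; a hedged sketch of what a caller might do (the config path and user name are placeholders):

    import os, subprocess

    env = dict(os.environ)
    env["LIBXL_BOOTLOADER_RESTRICT"] = "1"           # presence alone enables restriction
    env["LIBXL_BOOTLOADER_USER"] = "xen-bootloader"  # optional placeholder; omit to
                                                     # fall back to the QEMU depriv users

    # xl inherits the variables; libxl then appends --runas=<uid> --quiet when
    # running the bootloader for this guest.
    subprocess.run(["xl", "create", "/etc/xen/guest.cfg"], env=env, check=True)
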
diff --git a/0052-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch b/0052-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
deleted file mode 100644
index 22a214b..0000000
--- a/0052-x86-ucode-AMD-apply-the-patch-early-on-every-logical.patch
+++ /dev/null
@@ -1,154 +0,0 @@
-From e9a7942f6c1638c668605fbf6d6e02bc7bff2582 Mon Sep 17 00:00:00 2001
-From: Sergey Dyasli <sergey.dyasli@citrix.com>
-Date: Fri, 3 Mar 2023 07:58:35 +0100
-Subject: [PATCH 52/89] x86/ucode/AMD: apply the patch early on every logical
- thread
-
-The original issue has been reported on AMD Bulldozer-based CPUs where
-ucode loading loses the LWP feature bit in order to gain the IBPB bit.
-LWP disabling is per-SMT/CMT core modification and needs to happen on
-each sibling thread despite the shared microcode engine. Otherwise,
-logical CPUs will end up with different cpuid capabilities.
-Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
-
-Guests running under Xen happen to be not affected because of levelling
-logic for the feature masking/override MSRs which causes the LWP bit to
-fall out and hides the issue. The latest recommendation from AMD, after
-discussing this bug, is to load ucode on every logical CPU.
-
-In Linux kernel this issue has been addressed by e7ad18d1169c
-("x86/microcode/AMD: Apply the patch early on every logical thread").
-Follow the same approach in Xen.
-
-Introduce SAME_UCODE match result and use it for early AMD ucode
-loading. Take this opportunity and move opt_ucode_allow_same out of
-compare_revisions() to the relevant callers and also modify the warning
-message based on it. Intel's side of things is modified for consistency
-but provides no functional change.
-
-Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: f4ef8a41b80831db2136bdaff9f946a1a4b051e7
-master date: 2023-02-21 15:08:05 +0100
----
- xen/arch/x86/cpu/microcode/amd.c | 11 ++++++++---
- xen/arch/x86/cpu/microcode/core.c | 26 +++++++++++++++++---------
- xen/arch/x86/cpu/microcode/intel.c | 10 +++++++---
- xen/arch/x86/cpu/microcode/private.h | 3 ++-
- 4 files changed, 34 insertions(+), 16 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
-index 8195707ee1..ded8fe90e6 100644
---- a/xen/arch/x86/cpu/microcode/amd.c
-+++ b/xen/arch/x86/cpu/microcode/amd.c
-@@ -176,8 +176,8 @@ static enum microcode_match_result compare_revisions(
- if ( new_rev > old_rev )
- return NEW_UCODE;
-
-- if ( opt_ucode_allow_same && new_rev == old_rev )
-- return NEW_UCODE;
-+ if ( new_rev == old_rev )
-+ return SAME_UCODE;
-
- return OLD_UCODE;
- }
-@@ -220,8 +220,13 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
- unsigned int cpu = smp_processor_id();
- struct cpu_signature *sig = &per_cpu(cpu_sig, cpu);
- uint32_t rev, old_rev = sig->rev;
-+ enum microcode_match_result result = microcode_fits(patch);
-
-- if ( microcode_fits(patch) != NEW_UCODE )
-+ /*
-+ * Allow application of the same revision to pick up SMT-specific changes
-+ * even if the revision of the other SMT thread is already up-to-date.
-+ */
-+ if ( result != NEW_UCODE && result != SAME_UCODE )
- return -EINVAL;
-
- if ( check_final_patch_levels(sig) )
-diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index 452a7ca773..57ecc5358b 100644
---- a/xen/arch/x86/cpu/microcode/core.c
-+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -610,17 +610,25 @@ static long cf_check microcode_update_helper(void *data)
- * that ucode revision.
- */
-     spin_lock(&microcode_mutex);
-- if ( microcode_cache &&
-- alternative_call(ucode_ops.compare_patch,
-- patch, microcode_cache) != NEW_UCODE )
-+ if ( microcode_cache )
- {
--        spin_unlock(&microcode_mutex);
-- printk(XENLOG_WARNING "microcode: couldn't find any newer revision "
-- "in the provided blob!\n");
-- microcode_free_patch(patch);
-- ret = -ENOENT;
-+ enum microcode_match_result result;
-
-- goto put;
-+ result = alternative_call(ucode_ops.compare_patch, patch,
-+ microcode_cache);
-+
-+ if ( result != NEW_UCODE &&
-+ !(opt_ucode_allow_same && result == SAME_UCODE) )
-+ {
-+            spin_unlock(&microcode_mutex);
-+ printk(XENLOG_WARNING
-+ "microcode: couldn't find any newer%s revision in the provided blob!\n",
-+ opt_ucode_allow_same ? " (or the same)" : "");
-+ microcode_free_patch(patch);
-+ ret = -ENOENT;
-+
-+ goto put;
-+ }
- }
-     spin_unlock(&microcode_mutex);
-
-diff --git a/xen/arch/x86/cpu/microcode/intel.c b/xen/arch/x86/cpu/microcode/intel.c
-index f5ba6d76d7..cb08f63d2e 100644
---- a/xen/arch/x86/cpu/microcode/intel.c
-+++ b/xen/arch/x86/cpu/microcode/intel.c
-@@ -232,8 +232,8 @@ static enum microcode_match_result compare_revisions(
- if ( new_rev > old_rev )
- return NEW_UCODE;
-
-- if ( opt_ucode_allow_same && new_rev == old_rev )
-- return NEW_UCODE;
-+ if ( new_rev == old_rev )
-+ return SAME_UCODE;
-
- /*
- * Treat pre-production as always applicable - anyone using pre-production
-@@ -290,8 +290,12 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
- unsigned int cpu = smp_processor_id();
- struct cpu_signature *sig = &this_cpu(cpu_sig);
- uint32_t rev, old_rev = sig->rev;
-+ enum microcode_match_result result;
-+
-+ result = microcode_update_match(patch);
-
-- if ( microcode_update_match(patch) != NEW_UCODE )
-+ if ( result != NEW_UCODE &&
-+ !(opt_ucode_allow_same && result == SAME_UCODE) )
- return -EINVAL;
-
- wbinvd();
-diff --git a/xen/arch/x86/cpu/microcode/private.h b/xen/arch/x86/cpu/microcode/private.h
-index c085a10268..feafab0677 100644
---- a/xen/arch/x86/cpu/microcode/private.h
-+++ b/xen/arch/x86/cpu/microcode/private.h
-@@ -6,7 +6,8 @@
- extern bool opt_ucode_allow_same;
-
- enum microcode_match_result {
-- OLD_UCODE, /* signature matched, but revision id is older or equal */
-+ OLD_UCODE, /* signature matched, but revision id is older */
-+ SAME_UCODE, /* signature matched, but revision id is the same */
- NEW_UCODE, /* signature matched, but revision id is newer */
- MIS_UCODE, /* signature mismatched */
- };
---
-2.40.0
-
diff --git a/0053-libxl-limit-bootloader-execution-in-restricted-mode.patch b/0053-libxl-limit-bootloader-execution-in-restricted-mode.patch
new file mode 100644
index 0000000..8c790d3
--- /dev/null
+++ b/0053-libxl-limit-bootloader-execution-in-restricted-mode.patch
@@ -0,0 +1,158 @@
+From 46d00dbf4c22b28910f73f66a03e5cabe50b5395 Mon Sep 17 00:00:00 2001
+From: Roger Pau Monne <roger.pau@citrix.com>
+Date: Thu, 28 Sep 2023 12:22:35 +0200
+Subject: [PATCH 53/55] libxl: limit bootloader execution in restricted mode
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Introduce a timeout for bootloader execution when running in restricted mode.
+
+Allow overriding the default timeout with an environment-provided value.
+
+This is part of XSA-443 / CVE-2023-34325
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+(cherry picked from commit 9c114178ffd700112e91f5ec66cf5151b9c9a8cc)
+---
+ docs/man/xl.1.pod.in | 8 ++++++
+ tools/libs/light/libxl_bootloader.c | 40 +++++++++++++++++++++++++++++
+ tools/libs/light/libxl_internal.h | 2 ++
+ 3 files changed, 50 insertions(+)
+
+diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
+index 4831e12242..c3eb6570ab 100644
+--- a/docs/man/xl.1.pod.in
++++ b/docs/man/xl.1.pod.in
+@@ -1988,6 +1988,14 @@ NOTE: Each domain MUST have a SEPARATE username.
+
+ See docs/features/qemu-deprivilege.pandoc for more information.
+
++=item LIBXL_BOOTLOADER_TIMEOUT
++
++Timeout in seconds for bootloader execution when running in restricted mode.
++Otherwise the build time default in LIBXL_BOOTLOADER_TIMEOUT will be used.
++
++If defined the value must be an unsigned integer between 0 and INT_MAX,
++otherwise behavior is undefined. Setting to 0 disables the timeout.
++
+ =back
+
+ =head1 SEE ALSO
+diff --git a/tools/libs/light/libxl_bootloader.c b/tools/libs/light/libxl_bootloader.c
+index 23c0ef3e89..ee26d08f37 100644
+--- a/tools/libs/light/libxl_bootloader.c
++++ b/tools/libs/light/libxl_bootloader.c
+@@ -30,6 +30,8 @@ static void bootloader_keystrokes_copyfail(libxl__egc *egc,
+ libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
+ static void bootloader_display_copyfail(libxl__egc *egc,
+ libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
++static void bootloader_timeout(libxl__egc *egc, libxl__ev_time *ev,
++ const struct timeval *requested_abs, int rc);
+ static void bootloader_domaindeath(libxl__egc*, libxl__domaindeathcheck *dc,
+ int rc);
+ static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
+@@ -297,6 +299,7 @@ void libxl__bootloader_init(libxl__bootloader_state *bl)
+ bl->ptys[0].master = bl->ptys[0].slave = 0;
+ bl->ptys[1].master = bl->ptys[1].slave = 0;
+ libxl__ev_child_init(&bl->child);
++ libxl__ev_time_init(&bl->time);
+ libxl__domaindeathcheck_init(&bl->deathcheck);
+ bl->keystrokes.ao = bl->ao; libxl__datacopier_init(&bl->keystrokes);
+ bl->display.ao = bl->ao; libxl__datacopier_init(&bl->display);
+@@ -314,6 +317,7 @@ static void bootloader_cleanup(libxl__egc *egc, libxl__bootloader_state *bl)
+ libxl__domaindeathcheck_stop(gc,&bl->deathcheck);
+ libxl__datacopier_kill(&bl->keystrokes);
+ libxl__datacopier_kill(&bl->display);
++ libxl__ev_time_deregister(gc, &bl->time);
+ for (i=0; i<2; i++) {
+ libxl__carefd_close(bl->ptys[i].master);
+ libxl__carefd_close(bl->ptys[i].slave);
+@@ -375,6 +379,7 @@ static void bootloader_stop(libxl__egc *egc,
+
+ libxl__datacopier_kill(&bl->keystrokes);
+ libxl__datacopier_kill(&bl->display);
++ libxl__ev_time_deregister(gc, &bl->time);
+ if (libxl__ev_child_inuse(&bl->child)) {
+ r = kill(bl->child.pid, SIGTERM);
+ if (r) LOGED(WARN, bl->domid, "%sfailed to kill bootloader [%lu]",
+@@ -637,6 +642,25 @@ static void bootloader_gotptys(libxl__egc *egc, libxl__openpty_state *op)
+
+ struct termios termattr;
+
++ if (getenv("LIBXL_BOOTLOADER_RESTRICT") ||
++ getenv("LIBXL_BOOTLOADER_USER")) {
++ const char *timeout_env = getenv("LIBXL_BOOTLOADER_TIMEOUT");
++ int timeout = timeout_env ? atoi(timeout_env)
++ : LIBXL_BOOTLOADER_TIMEOUT;
++
++ if (timeout) {
++ /* Set execution timeout */
++ rc = libxl__ev_time_register_rel(ao, &bl->time,
++ bootloader_timeout,
++ timeout * 1000);
++ if (rc) {
++ LOGED(ERROR, bl->domid,
++ "unable to register timeout for bootloader execution");
++ goto out;
++ }
++ }
++ }
++
+ pid_t pid = libxl__ev_child_fork(gc, &bl->child, bootloader_finished);
+ if (pid == -1) {
+ rc = ERROR_FAIL;
+@@ -702,6 +726,21 @@ static void bootloader_display_copyfail(libxl__egc *egc,
+ libxl__bootloader_state *bl = CONTAINER_OF(dc, *bl, display);
+ bootloader_copyfail(egc, "bootloader output", bl, 1, rc,onwrite,errnoval);
+ }
++static void bootloader_timeout(libxl__egc *egc, libxl__ev_time *ev,
++ const struct timeval *requested_abs, int rc)
++{
++ libxl__bootloader_state *bl = CONTAINER_OF(ev, *bl, time);
++ STATE_AO_GC(bl->ao);
++
++ libxl__ev_time_deregister(gc, &bl->time);
++
++ assert(libxl__ev_child_inuse(&bl->child));
++ LOGD(ERROR, bl->domid, "killing bootloader because of timeout");
++
++ libxl__ev_child_kill_deregister(ao, &bl->child, SIGKILL);
++
++ bootloader_callback(egc, bl, rc);
++}
+
+ static void bootloader_domaindeath(libxl__egc *egc,
+ libxl__domaindeathcheck *dc,
+@@ -718,6 +757,7 @@ static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
+ STATE_AO_GC(bl->ao);
+ int rc;
+
++ libxl__ev_time_deregister(gc, &bl->time);
+ libxl__datacopier_kill(&bl->keystrokes);
+ libxl__datacopier_kill(&bl->display);
+
+diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
+index f1e3a9a15b..d05783617f 100644
+--- a/tools/libs/light/libxl_internal.h
++++ b/tools/libs/light/libxl_internal.h
+@@ -102,6 +102,7 @@
+ #define LIBXL_QMP_CMD_TIMEOUT 10
+ #define LIBXL_STUBDOM_START_TIMEOUT 30
+ #define LIBXL_QEMU_BODGE_TIMEOUT 2
++#define LIBXL_BOOTLOADER_TIMEOUT 120
+ #define LIBXL_XENCONSOLE_LIMIT 1048576
+ #define LIBXL_XENCONSOLE_PROTOCOL "vt100"
+ #define LIBXL_MAXMEM_CONSTANT 1024
+@@ -3744,6 +3745,7 @@ struct libxl__bootloader_state {
+ libxl__openpty_state openpty;
+ libxl__openpty_result ptys[2]; /* [0] is for bootloader */
+ libxl__ev_child child;
++ libxl__ev_time time;
+ libxl__domaindeathcheck deathcheck;
+ int nargs, argsspace;
+ const char **args;
+--
+2.42.0
+
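
The timeout slots into the same environment-driven setup; for example (illustrative, with an arbitrary 60 second value and a placeholder config path):

    import os, subprocess

    env = dict(os.environ)
    env["LIBXL_BOOTLOADER_RESTRICT"] = "1"
    env["LIBXL_BOOTLOADER_TIMEOUT"] = "60"  # seconds; "0" disables the timeout and
                                            # leaving it unset uses the 120s default

    subprocess.run(["xl", "create", "/etc/xen/guest.cfg"], env=env, check=True)
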
diff --git a/0053-x86-perform-mem_sharing-teardown-before-paging-teard.patch b/0053-x86-perform-mem_sharing-teardown-before-paging-teard.patch
deleted file mode 100644
index 934c0f5..0000000
--- a/0053-x86-perform-mem_sharing-teardown-before-paging-teard.patch
+++ /dev/null
@@ -1,111 +0,0 @@
-From e8f28e129d23c940749c66150a89c4ed683a0fb9 Mon Sep 17 00:00:00 2001
-From: Tamas K Lengyel <tamas@tklengyel.com>
-Date: Fri, 3 Mar 2023 07:59:08 +0100
-Subject: [PATCH 53/89] x86: perform mem_sharing teardown before paging
- teardown
-
-An assert failure has been observed in p2m_teardown when performing vm
-forking and then destroying the forked VM (p2m-basic.c:173). The assert
-checks whether the domain's shared pages counter is 0. According to the
-patch that originally added the assert (7bedbbb5c31) the p2m_teardown
-should only happen after mem_sharing already relinquished all shared pages.
-
-In this patch we flip the order in which relinquish ops are called to avoid
-tripping the assert. Conceptually sharing being torn down makes sense to
-happen before paging is torn down.
-
-Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
-Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 2869349f0cb3a89dcbf1f1b30371f58df6309312
-master date: 2023-02-23 12:35:48 +0100
----
- xen/arch/x86/domain.c | 56 ++++++++++++++++++++++---------------------
- 1 file changed, 29 insertions(+), 27 deletions(-)
-
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index 5a119eec3a..e546c98322 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -2347,9 +2347,9 @@ int domain_relinquish_resources(struct domain *d)
-
- enum {
- PROG_iommu_pagetables = 1,
-+ PROG_shared,
- PROG_paging,
- PROG_vcpu_pagetables,
-- PROG_shared,
- PROG_xen,
- PROG_l4,
- PROG_l3,
-@@ -2368,6 +2368,34 @@ int domain_relinquish_resources(struct domain *d)
- if ( ret )
- return ret;
-
-+#ifdef CONFIG_MEM_SHARING
-+ PROGRESS(shared):
-+
-+ if ( is_hvm_domain(d) )
-+ {
-+ /*
-+ * If the domain has shared pages, relinquish them allowing
-+ * for preemption.
-+ */
-+ ret = relinquish_shared_pages(d);
-+ if ( ret )
-+ return ret;
-+
-+ /*
-+ * If the domain is forked, decrement the parent's pause count
-+ * and release the domain.
-+ */
-+ if ( mem_sharing_is_fork(d) )
-+ {
-+ struct domain *parent = d->parent;
-+
-+ d->parent = NULL;
-+ domain_unpause(parent);
-+ put_domain(parent);
-+ }
-+ }
-+#endif
-+
- PROGRESS(paging):
-
- /* Tear down paging-assistance stuff. */
-@@ -2408,32 +2436,6 @@ int domain_relinquish_resources(struct domain *d)
- d->arch.auto_unmask = 0;
- }
-
--#ifdef CONFIG_MEM_SHARING
-- PROGRESS(shared):
--
-- if ( is_hvm_domain(d) )
-- {
-- /* If the domain has shared pages, relinquish them allowing
-- * for preemption. */
-- ret = relinquish_shared_pages(d);
-- if ( ret )
-- return ret;
--
-- /*
-- * If the domain is forked, decrement the parent's pause count
-- * and release the domain.
-- */
-- if ( mem_sharing_is_fork(d) )
-- {
-- struct domain *parent = d->parent;
--
-- d->parent = NULL;
-- domain_unpause(parent);
-- put_domain(parent);
-- }
-- }
--#endif
--
- spin_lock(&d->page_alloc_lock);
- page_list_splice(&d->arch.relmem_list, &d->page_list);
- INIT_PAGE_LIST_HEAD(&d->arch.relmem_list);
---
-2.40.0
-
diff --git a/0054-x86-svm-Fix-asymmetry-with-AMD-DR-MASK-context-switc.patch b/0054-x86-svm-Fix-asymmetry-with-AMD-DR-MASK-context-switc.patch
new file mode 100644
index 0000000..af72c9a
--- /dev/null
+++ b/0054-x86-svm-Fix-asymmetry-with-AMD-DR-MASK-context-switc.patch
@@ -0,0 +1,104 @@
+From 3f8b444072fd8615288d9d11e53fbf0b6a8a7750 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 26 Sep 2023 20:03:36 +0100
+Subject: [PATCH 54/55] x86/svm: Fix asymmetry with AMD DR MASK context
+ switching
+
+The handling of MSR_DR{0..3}_MASK is asymmetric between PV and HVM guests.
+
+HVM guests context switch in based on the guest view of DBEXT, whereas PV
+guests switch in based on the host capability. Both guest types leave the
+context dirty for the next vCPU.
+
+This leads to the following issue:
+
+ * PV or HVM vCPU has debugging active (%dr7 + mask)
+ * Switch out deactivates %dr7 but leaves other state stale in hardware
+ * HVM vCPU with debugging activate but can't see DBEXT is switched in
+ * Switch in loads %dr7 but leaves the mask MSRs alone
+
+Now, the HVM vCPU is operating in the context of the prior vCPU's mask MSR,
+and furthermore in a case where it genuinely expects there to be no masking
+MSRs.
+
+As a stopgap, adjust the HVM path to switch in/out the masks based on host
+capabilities rather than guest visibility (i.e. like the PV path). Adjustment
+of the intercepts still needs to be dependent on the guest visibility
+of DBEXT.
+
+This is part of XSA-444 / CVE-2023-34327
+
+Fixes: c097f54912d3 ("x86/SVM: support data breakpoint extension registers")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+(cherry picked from commit 5d54282f984bb9a7a65b3d12208584f9fdf1c8e1)
+---
+ xen/arch/x86/hvm/svm/svm.c | 24 ++++++++++++++++++------
+ xen/arch/x86/traps.c | 5 +++++
+ 2 files changed, 23 insertions(+), 6 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
+index e8f50e7c5e..fd32600ae3 100644
+--- a/xen/arch/x86/hvm/svm/svm.c
++++ b/xen/arch/x86/hvm/svm/svm.c
+@@ -339,6 +339,10 @@ static void svm_save_dr(struct vcpu *v)
+ v->arch.hvm.flag_dr_dirty = 0;
+ vmcb_set_dr_intercepts(vmcb, ~0u);
+
++ /*
++ * The guest can only have changed the mask MSRs if we previously dropped
++ * intercepts. Re-read them from hardware.
++ */
+ if ( v->domain->arch.cpuid->extd.dbext )
+ {
+ svm_intercept_msr(v, MSR_AMD64_DR0_ADDRESS_MASK, MSR_INTERCEPT_RW);
+@@ -370,17 +374,25 @@ static void __restore_debug_registers(struct vmcb_struct *vmcb, struct vcpu *v)
+
+ ASSERT(v == current);
+
+- if ( v->domain->arch.cpuid->extd.dbext )
++ /*
++ * Both the PV and HVM paths leave stale DR_MASK values in hardware on
++ * context-switch-out. If we're activating %dr7 for the guest, we must
++ * sync the DR_MASKs too, whether or not the guest can see them.
++ */
++ if ( boot_cpu_has(X86_FEATURE_DBEXT) )
+ {
+- svm_intercept_msr(v, MSR_AMD64_DR0_ADDRESS_MASK, MSR_INTERCEPT_NONE);
+- svm_intercept_msr(v, MSR_AMD64_DR1_ADDRESS_MASK, MSR_INTERCEPT_NONE);
+- svm_intercept_msr(v, MSR_AMD64_DR2_ADDRESS_MASK, MSR_INTERCEPT_NONE);
+- svm_intercept_msr(v, MSR_AMD64_DR3_ADDRESS_MASK, MSR_INTERCEPT_NONE);
+-
+ wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, v->arch.msrs->dr_mask[0]);
+ wrmsrl(MSR_AMD64_DR1_ADDRESS_MASK, v->arch.msrs->dr_mask[1]);
+ wrmsrl(MSR_AMD64_DR2_ADDRESS_MASK, v->arch.msrs->dr_mask[2]);
+ wrmsrl(MSR_AMD64_DR3_ADDRESS_MASK, v->arch.msrs->dr_mask[3]);
++
++ if ( v->domain->arch.cpuid->extd.dbext )
++ {
++ svm_intercept_msr(v, MSR_AMD64_DR0_ADDRESS_MASK, MSR_INTERCEPT_NONE);
++ svm_intercept_msr(v, MSR_AMD64_DR1_ADDRESS_MASK, MSR_INTERCEPT_NONE);
++ svm_intercept_msr(v, MSR_AMD64_DR2_ADDRESS_MASK, MSR_INTERCEPT_NONE);
++ svm_intercept_msr(v, MSR_AMD64_DR3_ADDRESS_MASK, MSR_INTERCEPT_NONE);
++ }
+ }
+
+ write_debugreg(0, v->arch.dr[0]);
+diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
+index e65cc60041..06c4f3868b 100644
+--- a/xen/arch/x86/traps.c
++++ b/xen/arch/x86/traps.c
+@@ -2281,6 +2281,11 @@ void activate_debugregs(const struct vcpu *curr)
+ if ( curr->arch.dr7 & DR7_ACTIVE_MASK )
+ write_debugreg(7, curr->arch.dr7);
+
++ /*
++ * Both the PV and HVM paths leave stale DR_MASK values in hardware on
++ * context-switch-out. If we're activating %dr7 for the guest, we must
++ * sync the DR_MASKs too, whether or not the guest can see them.
++ */
+ if ( boot_cpu_has(X86_FEATURE_DBEXT) )
+ {
+ wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, curr->arch.msrs->dr_mask[0]);
+--
+2.42.0
+
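For readers outside the SVM code, a minimal standalone sketch of the shape
the stopgap above aims for: mask values are written on switch-in whenever
the host has DBEXT, while the intercepts (and the switch-out re-read) still
follow the guest-visible bit. All names here (host_has_dbext,
guest_sees_dbext, hw_mask, hw_intercept) are illustrative stand-ins rather
than the real Xen interfaces.

    #include <stdbool.h>
    #include <stdint.h>

    #define NR_DR_MASKS 4

    static uint64_t hw_mask[NR_DR_MASKS];      /* stands in for the mask MSRs  */
    static bool     hw_intercept[NR_DR_MASKS]; /* stands in for MSR intercepts */

    struct vcpu_dbg {
        uint64_t dr_mask[NR_DR_MASKS];
        bool     guest_sees_dbext;             /* guest-visible CPUID bit      */
    };

    static bool host_has_dbext(void) { return true; /* assume a capable host */ }

    /* Switch-out: only a guest that could write the MSRs forces a re-read. */
    static void save_dr_masks(struct vcpu_dbg *v)
    {
        if ( !v->guest_sees_dbext )
            return;

        for ( unsigned int i = 0; i < NR_DR_MASKS; i++ )
        {
            hw_intercept[i] = true;            /* re-arm the intercepts        */
            v->dr_mask[i]   = hw_mask[i];      /* pick up guest-written values */
        }
    }

    /* Switch-in: masks follow host capability, intercepts follow guest view. */
    static void restore_dr_masks(const struct vcpu_dbg *v)
    {
        if ( !host_has_dbext() )
            return;

        for ( unsigned int i = 0; i < NR_DR_MASKS; i++ )
        {
            hw_mask[i] = v->dr_mask[i];        /* always scrub stale masks     */
            if ( v->guest_sees_dbext )
                hw_intercept[i] = false;       /* let the guest touch them     */
        }
    }
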
diff --git a/0054-xen-Work-around-Clang-IAS-macro-expansion-bug.patch b/0054-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
deleted file mode 100644
index 525dc49..0000000
--- a/0054-xen-Work-around-Clang-IAS-macro-expansion-bug.patch
+++ /dev/null
@@ -1,109 +0,0 @@
-From 837bdc6eb2df796e832302347f363afc820694fe Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:00:04 +0100
-Subject: [PATCH 54/89] xen: Work around Clang-IAS macro \@ expansion bug
-
-https://github.com/llvm/llvm-project/issues/60792
-
-It turns out that Clang-IAS does not expand \@ uniquely in a translaition
-unit, and the XSA-426 change tickles this bug:
-
- <instantiation>:4:1: error: invalid symbol redefinition
- .L1_fill_rsb_loop:
- ^
- make[3]: *** [Rules.mk:247: arch/x86/acpi/cpu_idle.o] Error 1
-
-Extend DO_OVERWRITE_RSB with an optional parameter so C callers can mix %= in
-too, which Clang does seem to expand properly.
-
-Fixes: 63305e5392ec ("x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: a2adacff0b91cc7b977abb209dc419a2ef15963f
-master date: 2023-02-24 17:44:29 +0000
----
- xen/arch/x86/include/asm/spec_ctrl.h | 4 ++--
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 19 ++++++++++++-------
- 2 files changed, 14 insertions(+), 9 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
-index 391973ef6a..a431fea587 100644
---- a/xen/arch/x86/include/asm/spec_ctrl.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl.h
-@@ -83,7 +83,7 @@ static always_inline void spec_ctrl_new_guest_context(void)
- wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);
-
- /* (ab)use alternative_input() to specify clobbers. */
-- alternative_input("", "DO_OVERWRITE_RSB", X86_BUG_IBPB_NO_RET,
-+ alternative_input("", "DO_OVERWRITE_RSB xu=%=", X86_BUG_IBPB_NO_RET,
- : "rax", "rcx");
- }
-
-@@ -172,7 +172,7 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
- *
- * (ab)use alternative_input() to specify clobbers.
- */
-- alternative_input("", "DO_OVERWRITE_RSB", X86_FEATURE_SC_RSB_IDLE,
-+ alternative_input("", "DO_OVERWRITE_RSB xu=%=", X86_FEATURE_SC_RSB_IDLE,
- : "rax", "rcx");
- }
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index fab27ff553..f23bb105c5 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -117,11 +117,16 @@
- .L\@_done:
- .endm
-
--.macro DO_OVERWRITE_RSB tmp=rax
-+.macro DO_OVERWRITE_RSB tmp=rax xu
- /*
- * Requires nothing
- * Clobbers \tmp (%rax by default), %rcx
- *
-+ * xu is an optional parameter to add eXtra Uniqueness. It is intended for
-+ * passing %= in from an asm() block, in order to work around
-+ * https://github.com/llvm/llvm-project/issues/60792 where Clang-IAS doesn't
-+ * expand \@ uniquely.
-+ *
- * Requires 256 bytes of {,shadow}stack space, but %rsp/SSP has no net
- * change. Based on Google's performance numbers, the loop is unrolled to 16
- * iterations and two calls per iteration.
-@@ -136,27 +141,27 @@
- mov $16, %ecx /* 16 iterations, two calls per loop */
- mov %rsp, %\tmp /* Store the current %rsp */
-
--.L\@_fill_rsb_loop:
-+.L\@_fill_rsb_loop\xu:
-
- .irp n, 1, 2 /* Unrolled twice. */
-- call .L\@_insert_rsb_entry_\n /* Create an RSB entry. */
-+ call .L\@_insert_rsb_entry\xu\n /* Create an RSB entry. */
- int3 /* Halt rogue speculation. */
-
--.L\@_insert_rsb_entry_\n:
-+.L\@_insert_rsb_entry\xu\n:
- .endr
-
- sub $1, %ecx
-- jnz .L\@_fill_rsb_loop
-+ jnz .L\@_fill_rsb_loop\xu
- mov %\tmp, %rsp /* Restore old %rsp */
-
- #ifdef CONFIG_XEN_SHSTK
- mov $1, %ecx
- rdsspd %ecx
- cmp $1, %ecx
-- je .L\@_shstk_done
-+ je .L\@_shstk_done\xu
- mov $64, %ecx /* 64 * 4 bytes, given incsspd */
- incsspd %ecx /* Restore old SSP */
--.L\@_shstk_done:
-+.L\@_shstk_done\xu:
- #endif
- .endm
-
---
-2.40.0
-
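For anyone unfamiliar with the %= operand the dropped workaround above relies
on, a tiny self-contained example (plain GCC/Clang extended asm for x86-64,
nothing Xen-specific): the compiler substitutes a number unique to each asm()
instance, so locally generated labels cannot collide even when the assembler
mishandles \@.

    #include <stdio.h>

    static int count_down(int n)
    {
        asm volatile (
            ".Lloop_%=:           \n\t" /* %= expands to a per-asm unique number */
            "    sub  $1, %0      \n\t"
            "    jnz  .Lloop_%=   \n\t" /* same asm(), so same number here       */
            : "+r" (n) );
        return n;
    }

    int main(void)
    {
        printf("%d\n", count_down(5));  /* prints 0 */
        return 0;
    }
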
diff --git a/0055-x86-pv-Correct-the-auditing-of-guest-breakpoint-addr.patch b/0055-x86-pv-Correct-the-auditing-of-guest-breakpoint-addr.patch
new file mode 100644
index 0000000..5838e7f
--- /dev/null
+++ b/0055-x86-pv-Correct-the-auditing-of-guest-breakpoint-addr.patch
@@ -0,0 +1,86 @@
+From 0b56bed864ca9b572473957f0254aefa797216f2 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 26 Sep 2023 20:03:36 +0100
+Subject: [PATCH 55/55] x86/pv: Correct the auditing of guest breakpoint
+ addresses
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The use of access_ok() is buggy, because it permits access to the compat
+translation area. 64bit PV guests don't use the XLAT area, but on AMD
+hardware, the DBEXT feature allows a breakpoint to match up to a 4G aligned
+region, allowing the breakpoint to reach outside of the XLAT area.
+
+Prior to c/s cda16c1bb223 ("x86: mirror compat argument translation area for
+32-bit PV"), the live GDT was within 4G of the XLAT area.
+
+Altogether, this allowed a malicious 64bit PV guest on AMD hardware to place
+a breakpoint over the live GDT, and trigger a #DB livelock (CVE-2015-8104).
+
+Introduce breakpoint_addr_ok() and explain why __addr_ok() happens to be an
+appropriate check in this case.
+
+For Xen 4.14 and later, this is a latent bug because the XLAT area has moved
+to be on its own with nothing interesting adjacent. For Xen 4.13 and older on
+AMD hardware, this fixes a PV-trigger-able DoS.
+
+This is part of XSA-444 / CVE-2023-34328.
+
+Fixes: 65e355490817 ("x86/PV: support data breakpoint extension registers")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit dc9d9aa62ddeb14abd5672690d30789829f58f7e)
+---
+ xen/arch/x86/include/asm/debugreg.h | 20 ++++++++++++++++++++
+ xen/arch/x86/pv/misc-hypercalls.c | 2 +-
+ 2 files changed, 21 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
+index c57914efc6..cc29826524 100644
+--- a/xen/arch/x86/include/asm/debugreg.h
++++ b/xen/arch/x86/include/asm/debugreg.h
+@@ -77,6 +77,26 @@
+ asm volatile ( "mov %%db" #reg ",%0" : "=r" (__val) ); \
+ __val; \
+ })
++
++/*
++ * Architecturally, %dr{0..3} can have any arbitrary value. However, Xen
++ * can't allow the guest to breakpoint the Xen address range, so we limit the
++ * guest to the lower canonical half, or above the Xen range in the higher
++ * canonical half.
++ *
++ * Breakpoint lengths are specified to mask the low order address bits,
++ * meaning all breakpoints are naturally aligned. With %dr7, the widest
++ * breakpoint is 8 bytes. With DBEXT, the widest breakpoint is 4G. Both of
++ * the Xen boundaries have >4G alignment.
++ *
++ * In principle we should account for HYPERVISOR_COMPAT_VIRT_START(d), but
++ * 64bit Xen has never enforced this for compat guests, and there's no problem
++ * (to Xen) if the guest breakpoints its alias of the M2P. Skipping this
++ * aspect simplifies the logic, and causes us not to reject a migrating guest
++ * which operated fine on prior versions of Xen.
++ */
++#define breakpoint_addr_ok(a) __addr_ok(a)
++
+ long set_debugreg(struct vcpu *, unsigned int reg, unsigned long value);
+ void activate_debugregs(const struct vcpu *);
+
+diff --git a/xen/arch/x86/pv/misc-hypercalls.c b/xen/arch/x86/pv/misc-hypercalls.c
+index aaaf70eb63..f8636de907 100644
+--- a/xen/arch/x86/pv/misc-hypercalls.c
++++ b/xen/arch/x86/pv/misc-hypercalls.c
+@@ -72,7 +72,7 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
+ switch ( reg )
+ {
+ case 0 ... 3:
+- if ( !access_ok(value, sizeof(long)) )
++ if ( !breakpoint_addr_ok(value) )
+ return -EPERM;
+
+ v->arch.dr[reg] = value;
+--
+2.42.0
+
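The style of check breakpoint_addr_ok() reduces to can be seen in isolation
below. The constants are illustrative rather than the exact Xen layout; the
key property, per the comment in the patch, is that both boundaries are
aligned to at least 4G, so a 4G-aligned DBEXT breakpoint that passes the
check cannot overlap the hypervisor range.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative values only; the real bounds come from Xen's memory layout. */
    #define LOWER_CANONICAL_END   (1ULL << 47)           /* top of guest half   */
    #define HYPERVISOR_RANGE_END  0xffff880000000000ULL  /* example, >4G aligned */

    static bool example_breakpoint_addr_ok(uint64_t addr)
    {
        /* Lower canonical half, or at/above the end of the hypervisor range. */
        return addr < LOWER_CANONICAL_END || addr >= HYPERVISOR_RANGE_END;
    }
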
diff --git a/0055-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch b/0055-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
deleted file mode 100644
index 02755a9..0000000
--- a/0055-xen-Fix-Clang-Wunicode-diagnostic-when-building-asm-.patch
+++ /dev/null
@@ -1,83 +0,0 @@
-From b10cf1561a638c835481ae923b571cb8f7350a89 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:01:21 +0100
-Subject: [PATCH 55/89] xen: Fix Clang -Wunicode diagnostic when building
- asm-macros
-
-While trying to work around a different Clang-IAS bug (parent changeset), I
-stumbled onto:
-
- In file included from arch/x86/asm-macros.c:3:
- ./arch/x86/include/asm/spec_ctrl_asm.h:144:19: error: \u used with
- no following hex digits; treating as '\' followed by identifier [-Werror,-Wunicode]
- .L\@_fill_rsb_loop\uniq:
- ^
-
-It turns out that Clang -E is sensitive to the file extension of the source
-file it is processing. Furthermore, C explicitly permits the use of \u
-escapes in identifier names, so the diagnostic would be reasonable in
-principle if we trying to compile the result.
-
-asm-macros should really have been .S from the outset, as it is ultimately
-generating assembly, not C. Rename it, which causes Clang not to complain.
-
-We need to introduce rules for generating a .i file from .S, and substituting
-c_flags for a_flags lets us drop the now-redundant -D__ASSEMBLY__.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 53f0d02040b1df08f0589f162790ca376e1c2040
-master date: 2023-02-24 17:44:29 +0000
----
- xen/Rules.mk | 6 ++++++
- xen/arch/x86/Makefile | 2 +-
- xen/arch/x86/{asm-macros.c => asm-macros.S} | 0
- 3 files changed, 7 insertions(+), 1 deletion(-)
- rename xen/arch/x86/{asm-macros.c => asm-macros.S} (100%)
-
-diff --git a/xen/Rules.mk b/xen/Rules.mk
-index d6b7cec0a8..59072ae8df 100644
---- a/xen/Rules.mk
-+++ b/xen/Rules.mk
-@@ -273,6 +273,9 @@ $(filter %.init.o,$(obj-y) $(obj-bin-y) $(extra-y)): $(obj)/%.init.o: $(obj)/%.o
- quiet_cmd_cpp_i_c = CPP $@
- cmd_cpp_i_c = $(CPP) $(call cpp_flags,$(c_flags)) -MQ $@ -o $@ $<
-
-+quiet_cmd_cpp_i_S = CPP $@
-+cmd_cpp_i_S = $(CPP) $(call cpp_flags,$(a_flags)) -MQ $@ -o $@ $<
-+
- quiet_cmd_cc_s_c = CC $@
- cmd_cc_s_c = $(CC) $(filter-out -Wa$(comma)%,$(c_flags)) -S $< -o $@
-
-@@ -282,6 +285,9 @@ cmd_cpp_s_S = $(CPP) $(call cpp_flags,$(a_flags)) -MQ $@ -o $@ $<
- $(obj)/%.i: $(src)/%.c FORCE
- $(call if_changed_dep,cpp_i_c)
-
-+$(obj)/%.i: $(src)/%.S FORCE
-+ $(call if_changed_dep,cpp_i_S)
-+
- $(obj)/%.s: $(src)/%.c FORCE
- $(call if_changed_dep,cc_s_c)
-
-diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
-index 177a2ff742..5accbe4c67 100644
---- a/xen/arch/x86/Makefile
-+++ b/xen/arch/x86/Makefile
-@@ -240,7 +240,7 @@ $(obj)/efi/buildid.o $(obj)/efi/relocs-dummy.o: ;
- .PHONY: include
- include: $(objtree)/arch/x86/include/asm/asm-macros.h
-
--$(obj)/asm-macros.i: CFLAGS-y += -D__ASSEMBLY__ -P
-+$(obj)/asm-macros.i: CFLAGS-y += -P
-
- $(objtree)/arch/x86/include/asm/asm-macros.h: $(obj)/asm-macros.i $(src)/Makefile
- $(call filechk,asm-macros.h)
-diff --git a/xen/arch/x86/asm-macros.c b/xen/arch/x86/asm-macros.S
-similarity index 100%
-rename from xen/arch/x86/asm-macros.c
-rename to xen/arch/x86/asm-macros.S
---
-2.40.0
-
diff --git a/0056-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch b/0056-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
deleted file mode 100644
index 59cc172..0000000
--- a/0056-tools-Use-PKG_CONFIG_FILE-instead-of-PKG_CONFIG-vari.patch
+++ /dev/null
@@ -1,91 +0,0 @@
-From 53bd16bcc0d0f5ed5d1ac6d6dc14bf6ecf2e2c43 Mon Sep 17 00:00:00 2001
-From: Bertrand Marquis <bertrand.marquis@arm.com>
-Date: Fri, 3 Mar 2023 08:02:30 +0100
-Subject: [PATCH 56/89] tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG
- variable
-
-Replace PKG_CONFIG variable name with PKG_CONFIG_FILE for the name of
-the pkg-config file.
-This is preventing a conflict in some build systems where PKG_CONFIG
-actually contains the path to the pkg-config executable to use, as the
-default assignment in libs.mk is using a weak assignment (?=).
-
-This problem has been found when trying to build the latest version of
-Xen tools using buildroot.
-
-Fixes: d400dc5729e4 ("tools: tweak tools/libs/libs.mk for being able to support libxenctrl")
-Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: b97e2fe7b9e1f4706693552697239ac2b71efee4
-master date: 2023-02-24 17:44:29 +0000
----
- tools/libs/ctrl/Makefile | 2 +-
- tools/libs/libs.mk | 16 ++++++++--------
- 2 files changed, 9 insertions(+), 9 deletions(-)
-
-diff --git a/tools/libs/ctrl/Makefile b/tools/libs/ctrl/Makefile
-index 93442ab389..15d0ae8e4e 100644
---- a/tools/libs/ctrl/Makefile
-+++ b/tools/libs/ctrl/Makefile
-@@ -4,7 +4,7 @@ include $(XEN_ROOT)/tools/Rules.mk
- include Makefile.common
-
- LIBHEADER := xenctrl.h xenctrl_compat.h
--PKG_CONFIG := xencontrol.pc
-+PKG_CONFIG_FILE := xencontrol.pc
- PKG_CONFIG_NAME := Xencontrol
-
- NO_HEADERS_CHK := y
-diff --git a/tools/libs/libs.mk b/tools/libs/libs.mk
-index 3eb91fc8f3..3fab5aecff 100644
---- a/tools/libs/libs.mk
-+++ b/tools/libs/libs.mk
-@@ -1,7 +1,7 @@
- # Common Makefile for building a lib.
- #
- # Variables taken as input:
--# PKG_CONFIG: name of pkg-config file (xen$(LIBNAME).pc if empty)
-+# PKG_CONFIG_FILE: name of pkg-config file (xen$(LIBNAME).pc if empty)
- # MAJOR: major version of lib (Xen version if empty)
- # MINOR: minor version of lib (0 if empty)
-
-@@ -26,7 +26,7 @@ ifneq ($(nosharedlibs),y)
- TARGETS += lib$(LIB_FILE_NAME).so
- endif
-
--PKG_CONFIG ?= $(LIB_FILE_NAME).pc
-+PKG_CONFIG_FILE ?= $(LIB_FILE_NAME).pc
- PKG_CONFIG_NAME ?= Xen$(LIBNAME)
- PKG_CONFIG_DESC ?= The $(PKG_CONFIG_NAME) library for Xen hypervisor
- PKG_CONFIG_VERSION := $(MAJOR).$(MINOR)
-@@ -35,13 +35,13 @@ PKG_CONFIG_LIB := $(LIB_FILE_NAME)
- PKG_CONFIG_REQPRIV := $(subst $(space),$(comma),$(strip $(foreach lib,$(patsubst ctrl,control,$(USELIBS_$(LIBNAME))),xen$(lib))))
-
- ifneq ($(CONFIG_LIBXC_MINIOS),y)
--TARGETS += $(PKG_CONFIG)
--$(PKG_CONFIG): PKG_CONFIG_PREFIX = $(prefix)
--$(PKG_CONFIG): PKG_CONFIG_INCDIR = $(includedir)
--$(PKG_CONFIG): PKG_CONFIG_LIBDIR = $(libdir)
-+TARGETS += $(PKG_CONFIG_FILE)
-+$(PKG_CONFIG_FILE): PKG_CONFIG_PREFIX = $(prefix)
-+$(PKG_CONFIG_FILE): PKG_CONFIG_INCDIR = $(includedir)
-+$(PKG_CONFIG_FILE): PKG_CONFIG_LIBDIR = $(libdir)
- endif
-
--PKG_CONFIG_LOCAL := $(PKG_CONFIG_DIR)/$(PKG_CONFIG)
-+PKG_CONFIG_LOCAL := $(PKG_CONFIG_DIR)/$(PKG_CONFIG_FILE)
-
- LIBHEADER ?= $(LIB_FILE_NAME).h
- LIBHEADERS = $(foreach h, $(LIBHEADER), $(XEN_INCLUDE)/$(h))
-@@ -103,7 +103,7 @@ install:: all
- $(SYMLINK_SHLIB) lib$(LIB_FILE_NAME).so.$(MAJOR).$(MINOR) $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so.$(MAJOR)
- $(SYMLINK_SHLIB) lib$(LIB_FILE_NAME).so.$(MAJOR) $(DESTDIR)$(libdir)/lib$(LIB_FILE_NAME).so
- for i in $(LIBHEADERS); do $(INSTALL_DATA) $$i $(DESTDIR)$(includedir); done
-- $(INSTALL_DATA) $(PKG_CONFIG) $(DESTDIR)$(PKG_INSTALLDIR)
-+ $(INSTALL_DATA) $(PKG_CONFIG_FILE) $(DESTDIR)$(PKG_INSTALLDIR)
-
- .PHONY: uninstall
- uninstall::
---
-2.40.0
-
diff --git a/0057-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch b/0057-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
deleted file mode 100644
index ea80bd0..0000000
--- a/0057-libs-guest-Fix-resource-leaks-in-xc_core_arch_map_p2.patch
+++ /dev/null
@@ -1,65 +0,0 @@
-From 01f85d835bb10d18bdab2cc780ea5ad47004516d Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 3 Mar 2023 08:02:59 +0100
-Subject: [PATCH 57/89] libs/guest: Fix resource leaks in
- xc_core_arch_map_p2m_tree_rw()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Edwin, with the help of GCC's -fanalyzer, identified that p2m_frame_list_list
-gets leaked. What fanalyzer can't see is that the live_p2m_frame_list_list
-and live_p2m_frame_list foreign mappings are leaked too.
-
-Rework the logic so the out path is executed unconditionally, which cleans up
-all the intermediate allocations/mappings appropriately.
-
-Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table")
-Reported-by: Edwin Török <edwin.torok@cloud.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: 1868d7f22660c8980bd0a7e53f044467e8b63bb5
-master date: 2023-02-27 15:51:23 +0000
----
- tools/libs/guest/xg_core_x86.c | 8 +++-----
- 1 file changed, 3 insertions(+), 5 deletions(-)
-
-diff --git a/tools/libs/guest/xg_core_x86.c b/tools/libs/guest/xg_core_x86.c
-index 61106b98b8..c5e4542ccc 100644
---- a/tools/libs/guest/xg_core_x86.c
-+++ b/tools/libs/guest/xg_core_x86.c
-@@ -229,11 +229,11 @@ xc_core_arch_map_p2m_tree_rw(xc_interface *xch, struct domain_info_context *dinf
- uint32_t dom, shared_info_any_t *live_shinfo)
- {
- /* Double and single indirect references to the live P2M table */
-- xen_pfn_t *live_p2m_frame_list_list;
-+ xen_pfn_t *live_p2m_frame_list_list = NULL;
- xen_pfn_t *live_p2m_frame_list = NULL;
- /* Copies of the above. */
- xen_pfn_t *p2m_frame_list_list = NULL;
-- xen_pfn_t *p2m_frame_list;
-+ xen_pfn_t *p2m_frame_list = NULL;
-
- int err;
- int i;
-@@ -297,8 +297,6 @@ xc_core_arch_map_p2m_tree_rw(xc_interface *xch, struct domain_info_context *dinf
-
- dinfo->p2m_frames = P2M_FL_ENTRIES;
-
-- return p2m_frame_list;
--
- out:
- err = errno;
-
-@@ -312,7 +310,7 @@ xc_core_arch_map_p2m_tree_rw(xc_interface *xch, struct domain_info_context *dinf
-
- errno = err;
-
-- return NULL;
-+ return p2m_frame_list;
- }
-
- static int
---
-2.40.0
-
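The general shape of the rework described above, a single unconditional exit
path that releases every intermediate resource, sketched with generic names
(this is not the libxenguest code itself):

    #include <stdlib.h>
    #include <string.h>

    static char *make_thing(void)
    {
        void *tmp_a = NULL, *tmp_b = NULL;  /* intermediates, freed in one place */
        char *res = NULL;

        tmp_a = malloc(64);
        if ( !tmp_a )
            goto out;

        tmp_b = malloc(64);
        if ( !tmp_b )
            goto out;

        res = strdup("result");             /* may be NULL; out path still runs  */

     out:
        /* Executed on success and failure alike: no early "return res;" above. */
        free(tmp_b);
        free(tmp_a);

        return res;
    }
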
diff --git a/0058-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch b/0058-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
deleted file mode 100644
index d55c095..0000000
--- a/0058-libs-guest-Fix-leak-on-realloc-failure-in-backup_pte.patch
+++ /dev/null
@@ -1,56 +0,0 @@
-From fa8250f1920413f02b63551a6a4d8ef0b47891a8 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
-Date: Fri, 3 Mar 2023 08:03:19 +0100
-Subject: [PATCH 58/89] libs/guest: Fix leak on realloc failure in
- backup_ptes()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-From `man 2 realloc`:
-
- If realloc() fails, the original block is left untouched; it is not freed or moved.
-
-Found using GCC -fanalyzer:
-
- | 184 | backup->entries = realloc(backup->entries,
- | | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- | | | | |
- | | | | (91) when ‘realloc’ fails
- | | | (92) ‘old_ptes.entries’ leaks here; was allocated at (44)
- | | (90) ...to here
-
-Signed-off-by: Edwin Török <edwin.torok@cloud.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 275d13184cfa52ebe4336ed66526ce93716adbe0
-master date: 2023-02-27 15:51:23 +0000
----
- tools/libs/guest/xg_offline_page.c | 10 ++++++++--
- 1 file changed, 8 insertions(+), 2 deletions(-)
-
-diff --git a/tools/libs/guest/xg_offline_page.c b/tools/libs/guest/xg_offline_page.c
-index c594fdba41..ccd0299f0f 100644
---- a/tools/libs/guest/xg_offline_page.c
-+++ b/tools/libs/guest/xg_offline_page.c
-@@ -181,10 +181,16 @@ static int backup_ptes(xen_pfn_t table_mfn, int offset,
-
- if (backup->max == backup->cur)
- {
-- backup->entries = realloc(backup->entries,
-- backup->max * 2 * sizeof(struct pte_backup_entry));
-+ void *orig = backup->entries;
-+
-+ backup->entries = realloc(
-+ orig, backup->max * 2 * sizeof(struct pte_backup_entry));
-+
- if (backup->entries == NULL)
-+ {
-+ free(orig);
- return -1;
-+ }
- else
- backup->max *= 2;
- }
---
-2.40.0
-
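In isolation, the realloc() idiom the fix above switches to looks roughly
like this (generic code, not the backup_ptes() helper): keep the old pointer
so the block can still be freed when realloc() fails, instead of overwriting
the only reference to it.

    #include <stdlib.h>

    static int grow(void **buf, size_t *nmemb, size_t size)
    {
        void *orig   = *buf;
        void *bigger = realloc(orig, *nmemb * 2 * size);

        if ( bigger == NULL )
        {
            free(orig);      /* realloc() left the original untouched; drop it */
            *buf = NULL;
            return -1;
        }

        *buf    = bigger;
        *nmemb *= 2;
        return 0;
    }
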
diff --git a/0059-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch b/0059-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
deleted file mode 100644
index 292a61a..0000000
--- a/0059-x86-ucode-AMD-late-load-the-patch-on-every-logical-t.patch
+++ /dev/null
@@ -1,90 +0,0 @@
-From ec5b058d2a6436a2e180315522fcf1645a8153b4 Mon Sep 17 00:00:00 2001
-From: Sergey Dyasli <sergey.dyasli@citrix.com>
-Date: Fri, 3 Mar 2023 08:03:43 +0100
-Subject: [PATCH 59/89] x86/ucode/AMD: late load the patch on every logical
- thread
-
-Currently late ucode loading is performed only on the first core of CPU
-siblings. But according to the latest recommendation from AMD, late
-ucode loading should happen on every logical thread/core on AMD CPUs.
-
-To achieve that, introduce is_cpu_primary() helper which will consider
-every logical cpu as "primary" when running on AMD CPUs. Also include
-Hygon in the check for future-proofing.
-
-Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: f1315e48a03a42f78f9b03c0a384165baf02acae
-master date: 2023-02-28 14:51:28 +0100
----
- xen/arch/x86/cpu/microcode/core.c | 24 +++++++++++++++++++-----
- 1 file changed, 19 insertions(+), 5 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index 57ecc5358b..2497630bbe 100644
---- a/xen/arch/x86/cpu/microcode/core.c
-+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -274,6 +274,20 @@ static bool microcode_update_cache(struct microcode_patch *patch)
- return true;
- }
-
-+/* Returns true if ucode should be loaded on a given cpu */
-+static bool is_cpu_primary(unsigned int cpu)
-+{
-+ if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
-+ /* Load ucode on every logical thread/core */
-+ return true;
-+
-+ /* Intel CPUs should load ucode only on the first core of SMT siblings */
-+ if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
-+ return true;
-+
-+ return false;
-+}
-+
- /* Wait for a condition to be met with a timeout (us). */
- static int wait_for_condition(bool (*func)(unsigned int data),
- unsigned int data, unsigned int timeout)
-@@ -380,7 +394,7 @@ static int primary_thread_work(const struct microcode_patch *patch)
- static int cf_check microcode_nmi_callback(
- const struct cpu_user_regs *regs, int cpu)
- {
-- unsigned int primary = cpumask_first(this_cpu(cpu_sibling_mask));
-+ bool primary_cpu = is_cpu_primary(cpu);
- int ret;
-
- /* System-generated NMI, leave to main handler */
-@@ -393,10 +407,10 @@ static int cf_check microcode_nmi_callback(
- * ucode_in_nmi.
- */
- if ( cpu == cpumask_first(&cpu_online_map) ||
-- (!ucode_in_nmi && cpu == primary) )
-+ (!ucode_in_nmi && primary_cpu) )
- return 0;
-
-- if ( cpu == primary )
-+ if ( primary_cpu )
- ret = primary_thread_work(nmi_patch);
- else
- ret = secondary_nmi_work();
-@@ -547,7 +561,7 @@ static int cf_check do_microcode_update(void *patch)
- */
- if ( cpu == cpumask_first(&cpu_online_map) )
- ret = control_thread_fn(patch);
-- else if ( cpu == cpumask_first(this_cpu(cpu_sibling_mask)) )
-+ else if ( is_cpu_primary(cpu) )
- ret = primary_thread_fn(patch);
- else
- ret = secondary_thread_fn();
-@@ -640,7 +654,7 @@ static long cf_check microcode_update_helper(void *data)
- /* Calculate the number of online CPU core */
- nr_cores = 0;
- for_each_online_cpu(cpu)
-- if ( cpu == cpumask_first(per_cpu(cpu_sibling_mask, cpu)) )
-+ if ( is_cpu_primary(cpu) )
- nr_cores++;
-
- printk(XENLOG_INFO "%u cores are to update their microcode\n", nr_cores);
---
-2.40.0
-
diff --git a/0060-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch b/0060-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
deleted file mode 100644
index fd397b0..0000000
--- a/0060-x86-shadow-account-for-log-dirty-mode-when-pre-alloc.patch
+++ /dev/null
@@ -1,92 +0,0 @@
-From f8f8f07880d3817fc7b0472420eca9fecaa55358 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 11:58:50 +0000
-Subject: [PATCH 60/89] x86/shadow: account for log-dirty mode when
- pre-allocating
-
-Pre-allocation is intended to ensure that in the course of constructing
-or updating shadows there won't be any risk of just made shadows or
-shadows being acted upon can disappear under our feet. The amount of
-pages pre-allocated then, however, needs to account for all possible
-subsequent allocations. While the use in sh_page_fault() accounts for
-all shadows which may need making, so far it didn't account for
-allocations coming from log-dirty tracking (which piggybacks onto the
-P2M allocation functions).
-
-Since shadow_prealloc() takes a count of shadows (or other data
-structures) rather than a count of pages, putting the adjustment at the
-call site of this function won't work very well: We simply can't express
-the correct count that way in all cases. Instead take care of this in
-the function itself, by "snooping" for L1 type requests. (While not
-applicable right now, future new request sites of L1 tables would then
-also be covered right away.)
-
-It is relevant to note here that pre-allocations like the one done from
-shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
-earlier pre-alloc which already included that count: The inner call will
-simply find enough pages available then; it'll bail right away.
-
-This is CVE-2022-42332 / XSA-427.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Tim Deegan <tim@xen.org>
-(cherry picked from commit 91767a71061035ae42be93de495cd976f863a41a)
----
- xen/arch/x86/include/asm/paging.h | 4 ++++
- xen/arch/x86/mm/paging.c | 1 +
- xen/arch/x86/mm/shadow/common.c | 12 +++++++++++-
- 3 files changed, 16 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
-index b2b243a4ff..635ccc83b1 100644
---- a/xen/arch/x86/include/asm/paging.h
-+++ b/xen/arch/x86/include/asm/paging.h
-@@ -190,6 +190,10 @@ bool paging_mfn_is_dirty(const struct domain *d, mfn_t gmfn);
- #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
- (LOGDIRTY_NODE_ENTRIES-1))
-
-+#define paging_logdirty_levels() \
-+ (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
-+ PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
-+
- #ifdef CONFIG_HVM
- /* VRAM dirty tracking support */
- struct sh_dirty_vram {
-diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
-index 8d579fa9a3..308d44bce7 100644
---- a/xen/arch/x86/mm/paging.c
-+++ b/xen/arch/x86/mm/paging.c
-@@ -282,6 +282,7 @@ void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
- if ( unlikely(!VALID_M2P(pfn_x(pfn))) )
- return;
-
-+ BUILD_BUG_ON(paging_logdirty_levels() != 4);
- i1 = L1_LOGDIRTY_IDX(pfn);
- i2 = L2_LOGDIRTY_IDX(pfn);
- i3 = L3_LOGDIRTY_IDX(pfn);
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index a8404f97f6..cf5e181f74 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -1015,7 +1015,17 @@ bool shadow_prealloc(struct domain *d, unsigned int type, unsigned int count)
- if ( unlikely(d->is_dying) )
- return false;
-
-- ret = _shadow_prealloc(d, shadow_size(type) * count);
-+ count *= shadow_size(type);
-+ /*
-+ * Log-dirty handling may result in allocations when populating its
-+ * tracking structures. Tie this to the caller requesting space for L1
-+ * shadows.
-+ */
-+ if ( paging_mode_log_dirty(d) &&
-+ ((SHF_L1_ANY | SHF_FL1_ANY) & (1u << type)) )
-+ count += paging_logdirty_levels();
-+
-+ ret = _shadow_prealloc(d, count);
- if ( !ret && (!d->is_shutting_down || d->shutdown_code != SHUTDOWN_crash) )
- /*
- * Failing to allocate memory required for shadow usage can only result in
---
-2.40.0
-
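As a sanity check on the new paging_logdirty_levels() expression, plugging in
the usual x86 constants (assumed here: PADDR_BITS = 52, PAGE_SHIFT = 12,
sizeof(mfn_t) = 8 so ilog2() = 3) gives
DIV_ROUND_UP(52 - 12 - 15, 9) + 1 = DIV_ROUND_UP(25, 9) + 1 = 4, which
matches the BUILD_BUG_ON(... != 4) added to paging_mark_pfn_dirty(). The same
arithmetic as a compile-time check:

    enum {
        PADDR_BITS = 52, PAGE_SHIFT = 12, MFN_T_LOG2 = 3,  /* assumed values */
        LOGDIRTY_LEVELS =
            (PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3)    /* 25 pfn bits    */
             + (PAGE_SHIFT - MFN_T_LOG2) - 1)              /* round up ...   */
                / (PAGE_SHIFT - MFN_T_LOG2)                /* ... by 9 -> 3  */
            + 1,                                           /* + L1 itself    */
    };

    _Static_assert(LOGDIRTY_LEVELS == 4, "agrees with the BUILD_BUG_ON above");
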
diff --git a/0061-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch b/0061-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
deleted file mode 100644
index b638eca..0000000
--- a/0061-x86-HVM-bound-number-of-pinned-cache-attribute-regio.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From d0cb66d59a956ccba3dbe794f4ec01e4a4269ee9 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 12:01:01 +0000
-Subject: [PATCH 61/89] x86/HVM: bound number of pinned cache attribute regions
-
-This is exposed via DMOP, i.e. to potentially not fully privileged
-device models. With that we may not permit registration of an (almost)
-unbounded amount of such regions.
-
-This is CVE-2022-42333 / part of XSA-428.
-
-Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit a5e768640f786b681063f4e08af45d0c4e91debf)
----
- xen/arch/x86/hvm/mtrr.c | 5 +++++
- 1 file changed, 5 insertions(+)
-
-diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
-index 4d2aa6def8..714911dd7f 100644
---- a/xen/arch/x86/hvm/mtrr.c
-+++ b/xen/arch/x86/hvm/mtrr.c
-@@ -595,6 +595,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
- uint64_t gfn_end, uint32_t type)
- {
- struct hvm_mem_pinned_cacheattr_range *range;
-+ unsigned int nr = 0;
- int rc = 1;
-
- if ( !is_hvm_domain(d) )
-@@ -666,11 +667,15 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
- rc = -EBUSY;
- break;
- }
-+ ++nr;
- }
- rcu_read_unlock(&pinned_cacheattr_rcu_lock);
- if ( rc <= 0 )
- return rc;
-
-+ if ( nr >= 64 /* The limit is arbitrary. */ )
-+ return -ENOSPC;
-+
- range = xzalloc(struct hvm_mem_pinned_cacheattr_range);
- if ( range == NULL )
- return -ENOMEM;
---
-2.40.0
-
diff --git a/0062-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch b/0062-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
deleted file mode 100644
index a0f6efc..0000000
--- a/0062-x86-HVM-serialize-pinned-cache-attribute-list-manipu.patch
+++ /dev/null
@@ -1,126 +0,0 @@
-From a2a915b3960e6ab060d8be2c36e6e697700ea87c Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 12:01:01 +0000
-Subject: [PATCH 62/89] x86/HVM: serialize pinned cache attribute list
- manipulation
-
-While the RCU variants of list insertion and removal allow lockless list
-traversal (with RCU just read-locked), insertions and removals still
-need serializing amongst themselves. To keep things simple, use the
-domain lock for this purpose.
-
-This is CVE-2022-42334 / part of XSA-428.
-
-Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit 829ec245cf66560e3b50d140ccb3168e7fb7c945)
----
- xen/arch/x86/hvm/mtrr.c | 51 +++++++++++++++++++++++++----------------
- 1 file changed, 31 insertions(+), 20 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/mtrr.c b/xen/arch/x86/hvm/mtrr.c
-index 714911dd7f..bd5cc42ef4 100644
---- a/xen/arch/x86/hvm/mtrr.c
-+++ b/xen/arch/x86/hvm/mtrr.c
-@@ -594,7 +594,7 @@ static void cf_check free_pinned_cacheattr_entry(struct rcu_head *rcu)
- int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
- uint64_t gfn_end, uint32_t type)
- {
-- struct hvm_mem_pinned_cacheattr_range *range;
-+ struct hvm_mem_pinned_cacheattr_range *range, *newr;
- unsigned int nr = 0;
- int rc = 1;
-
-@@ -608,14 +608,15 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
- {
- case XEN_DOMCTL_DELETE_MEM_CACHEATTR:
- /* Remove the requested range. */
-- rcu_read_lock(&pinned_cacheattr_rcu_lock);
-- list_for_each_entry_rcu ( range,
-- &d->arch.hvm.pinned_cacheattr_ranges,
-- list )
-+ domain_lock(d);
-+ list_for_each_entry ( range,
-+ &d->arch.hvm.pinned_cacheattr_ranges,
-+ list )
- if ( range->start == gfn_start && range->end == gfn_end )
- {
-- rcu_read_unlock(&pinned_cacheattr_rcu_lock);
- list_del_rcu(&range->list);
-+ domain_unlock(d);
-+
- type = range->type;
- call_rcu(&range->rcu, free_pinned_cacheattr_entry);
- p2m_memory_type_changed(d);
-@@ -636,7 +637,7 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
- }
- return 0;
- }
-- rcu_read_unlock(&pinned_cacheattr_rcu_lock);
-+ domain_unlock(d);
- return -ENOENT;
-
- case PAT_TYPE_UC_MINUS:
-@@ -651,7 +652,10 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
- return -EINVAL;
- }
-
-- rcu_read_lock(&pinned_cacheattr_rcu_lock);
-+ newr = xzalloc(struct hvm_mem_pinned_cacheattr_range);
-+
-+ domain_lock(d);
-+
- list_for_each_entry_rcu ( range,
- &d->arch.hvm.pinned_cacheattr_ranges,
- list )
-@@ -669,27 +673,34 @@ int hvm_set_mem_pinned_cacheattr(struct domain *d, uint64_t gfn_start,
- }
- ++nr;
- }
-- rcu_read_unlock(&pinned_cacheattr_rcu_lock);
-+
- if ( rc <= 0 )
-- return rc;
-+ /* nothing */;
-+ else if ( nr >= 64 /* The limit is arbitrary. */ )
-+ rc = -ENOSPC;
-+ else if ( !newr )
-+ rc = -ENOMEM;
-+ else
-+ {
-+ newr->start = gfn_start;
-+ newr->end = gfn_end;
-+ newr->type = type;
-
-- if ( nr >= 64 /* The limit is arbitrary. */ )
-- return -ENOSPC;
-+ list_add_rcu(&newr->list, &d->arch.hvm.pinned_cacheattr_ranges);
-
-- range = xzalloc(struct hvm_mem_pinned_cacheattr_range);
-- if ( range == NULL )
-- return -ENOMEM;
-+ newr = NULL;
-+ rc = 0;
-+ }
-+
-+ domain_unlock(d);
-
-- range->start = gfn_start;
-- range->end = gfn_end;
-- range->type = type;
-+ xfree(newr);
-
-- list_add_rcu(&range->list, &d->arch.hvm.pinned_cacheattr_ranges);
- p2m_memory_type_changed(d);
- if ( type != PAT_TYPE_WRBACK )
- flush_all(FLUSH_CACHE);
-
-- return 0;
-+ return rc;
- }
-
- static int cf_check hvm_save_mtrr_msr(struct vcpu *v, hvm_domain_context_t *h)
---
-2.40.0
-
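The allocation pattern adopted above (allocate before taking the lock, decide
under the lock, free afterwards if unused) in a generic standalone form; a
pthread mutex stands in for domain_lock() and none of the names are Xen's:

    #include <pthread.h>
    #include <stdlib.h>

    struct node { struct node *next; int key; };

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static struct node *head;

    static int insert_unique(int key)
    {
        struct node *newn = calloc(1, sizeof(*newn)); /* may sleep: no lock yet */
        struct node *n;
        int rc = 0;

        if ( !newn )
            return -1;

        pthread_mutex_lock(&lock);

        for ( n = head; n; n = n->next )
            if ( n->key == key )
            {
                rc = -2;              /* duplicate: keep newn to free below */
                break;
            }

        if ( rc == 0 )
        {
            newn->key  = key;
            newn->next = head;
            head       = newn;
            newn       = NULL;        /* consumed: must not be freed */
        }

        pthread_mutex_unlock(&lock);

        free(newn);                   /* no-op when the node was inserted */
        return rc;
    }
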
diff --git a/0063-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch b/0063-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
deleted file mode 100644
index fa97a41..0000000
--- a/0063-x86-spec-ctrl-Defer-CR4_PV32_RESTORE-on-the-cstar_en.patch
+++ /dev/null
@@ -1,56 +0,0 @@
-From a730e4d1190594102784222f76a984d10bbc88a9 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 10 Feb 2023 21:11:14 +0000
-Subject: [PATCH 63/89] x86/spec-ctrl: Defer CR4_PV32_RESTORE on the
- cstar_enter path
-
-As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
-the two hunks visible in the patch, RET's are not safe prior to this point.
-
-CR4_PV32_RESTORE hides a CALL/RET pair in certain configurations (PV32
-compiled in, SMEP or SMAP active), and the RET can be attacked with one of
-several known speculative issues.
-
-Furthermore, CR4_PV32_RESTORE also hides a reference to the cr4_pv32_mask
-global variable, which is not safe when XPTI is active before restoring Xen's
-full pagetables.
-
-This crash has gone unnoticed because it is only AMD CPUs which permit the
-SYSCALL instruction in compatibility mode, and these are not vulnerable to
-Meltdown so don't activate XPTI by default.
-
-This is XSA-429 / CVE-2022-42331
-
-Fixes: 5e7962901131 ("x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point")
-Fixes: 5784de3e2067 ("x86: Meltdown band-aid against malicious 64-bit PV guests")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit df5b055b12116d9e63ced59ae5389e69a2a3de48)
----
- xen/arch/x86/x86_64/entry.S | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index ae01285181..7675a59ff0 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -288,7 +288,6 @@ ENTRY(cstar_enter)
- ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
- #endif
- push %rax /* Guest %rsp */
-- CR4_PV32_RESTORE
- movq 8(%rsp), %rax /* Restore guest %rax. */
- movq $FLAT_USER_SS32, 8(%rsp) /* Assume a 64bit domain. Compat handled lower. */
- pushq %r11
-@@ -312,6 +311,8 @@ ENTRY(cstar_enter)
- .Lcstar_cr3_okay:
- sti
-
-+ CR4_PV32_RESTORE
-+
- movq STACK_CPUINFO_FIELD(current_vcpu)(%rbx), %rbx
-
- #ifdef CONFIG_PV32
---
-2.40.0
-
diff --git a/0064-x86-vmx-implement-VMExit-based-guest-Bus-Lock-detect.patch b/0064-x86-vmx-implement-VMExit-based-guest-Bus-Lock-detect.patch
deleted file mode 100644
index cebb501..0000000
--- a/0064-x86-vmx-implement-VMExit-based-guest-Bus-Lock-detect.patch
+++ /dev/null
@@ -1,175 +0,0 @@
-From 83f12e4eafdc4b034501adf4847a09a1293fdf8b Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 21 Mar 2023 13:40:41 +0100
-Subject: [PATCH 64/89] x86/vmx: implement VMExit based guest Bus Lock
- detection
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Add support for enabling guest Bus Lock Detection on Intel systems.
-Such detection works by triggering a vmexit, which ought to be enough
-of a pause to prevent a guest from abusing of the Bus Lock.
-
-Add an extra Xen perf counter to track the number of Bus Locks detected.
-This is done because Bus Locks can also be reported by setting the bit
-26 in the exit reason field, so also account for those.
-
-Note EXIT_REASON_BUS_LOCK VMExits will always have bit 26 set in
-exit_reason, and hence the performance counter doesn't need to be
-increased for EXIT_REASON_BUS_LOCK handling.
-
-Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: f7d07619d2ae0382e2922e287fbfbb27722f3f0b
-master date: 2022-12-19 11:22:43 +0100
----
- xen/arch/x86/hvm/vmx/vmcs.c | 4 +++-
- xen/arch/x86/hvm/vmx/vmx.c | 15 +++++++++++++++
- xen/arch/x86/hvm/vmx/vvmx.c | 3 ++-
- xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 3 +++
- xen/arch/x86/include/asm/hvm/vmx/vmx.h | 2 ++
- xen/arch/x86/include/asm/perfc_defn.h | 4 +++-
- 6 files changed, 28 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
-index 84dbb88d33..a0d5e8d6ab 100644
---- a/xen/arch/x86/hvm/vmx/vmcs.c
-+++ b/xen/arch/x86/hvm/vmx/vmcs.c
-@@ -209,6 +209,7 @@ static void __init vmx_display_features(void)
- P(cpu_has_vmx_virt_exceptions, "Virtualisation Exceptions");
- P(cpu_has_vmx_pml, "Page Modification Logging");
- P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
-+ P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
- #undef P
-
- if ( !printed )
-@@ -318,7 +319,8 @@ static int vmx_init_vmcs_config(bool bsp)
- SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
- SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
- SECONDARY_EXEC_XSAVES |
-- SECONDARY_EXEC_TSC_SCALING);
-+ SECONDARY_EXEC_TSC_SCALING |
-+ SECONDARY_EXEC_BUS_LOCK_DETECTION);
- if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
- opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
- if ( opt_vpid_enabled )
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index 861f91f2af..d0f0f2e429 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -4084,6 +4084,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- return;
- }
-
-+ if ( unlikely(exit_reason & VMX_EXIT_REASONS_BUS_LOCK) )
-+ {
-+ perfc_incr(buslock);
-+ exit_reason &= ~VMX_EXIT_REASONS_BUS_LOCK;
-+ }
-+
- /* XXX: This looks ugly, but we need a mechanism to ensure
- * any pending vmresume has really happened
- */
-@@ -4593,6 +4599,15 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- vmx_handle_descriptor_access(exit_reason);
- break;
-
-+ case EXIT_REASON_BUS_LOCK:
-+ /*
-+ * Nothing to do: just taking a vmexit should be enough of a pause to
-+ * prevent a VM from crippling the host with bus locks. Note
-+ * EXIT_REASON_BUS_LOCK will always have bit 26 set in exit_reason, and
-+ * hence the perf counter is already increased.
-+ */
-+ break;
-+
- case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
- case EXIT_REASON_INVPCID:
- /* fall through */
-diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
-index 5f54451475..2095c1e612 100644
---- a/xen/arch/x86/hvm/vmx/vvmx.c
-+++ b/xen/arch/x86/hvm/vmx/vvmx.c
-@@ -2405,7 +2405,7 @@ void nvmx_idtv_handling(void)
- * be reinjected, otherwise, pass to L1.
- */
- __vmread(VM_EXIT_REASON, &reason);
-- if ( reason != EXIT_REASON_EPT_VIOLATION ?
-+ if ( (uint16_t)reason != EXIT_REASON_EPT_VIOLATION ?
- !(nvmx->intr.intr_info & INTR_INFO_VALID_MASK) :
- !nvcpu->nv_vmexit_pending )
- {
-@@ -2486,6 +2486,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
- case EXIT_REASON_EPT_VIOLATION:
- case EXIT_REASON_EPT_MISCONFIG:
- case EXIT_REASON_EXTERNAL_INTERRUPT:
-+ case EXIT_REASON_BUS_LOCK:
- /* pass to L0 handler */
- break;
- case VMX_EXIT_REASONS_FAILED_VMENTRY:
-diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-index 75f9928abf..f3df5113d4 100644
---- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-+++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-@@ -267,6 +267,7 @@ extern u32 vmx_vmentry_control;
- #define SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS 0x00040000
- #define SECONDARY_EXEC_XSAVES 0x00100000
- #define SECONDARY_EXEC_TSC_SCALING 0x02000000
-+#define SECONDARY_EXEC_BUS_LOCK_DETECTION 0x40000000
- extern u32 vmx_secondary_exec_control;
-
- #define VMX_EPT_EXEC_ONLY_SUPPORTED 0x00000001
-@@ -346,6 +347,8 @@ extern u64 vmx_ept_vpid_cap;
- (vmx_secondary_exec_control & SECONDARY_EXEC_XSAVES)
- #define cpu_has_vmx_tsc_scaling \
- (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
-+#define cpu_has_vmx_bus_lock_detection \
-+ (vmx_secondary_exec_control & SECONDARY_EXEC_BUS_LOCK_DETECTION)
-
- #define VMCS_RID_TYPE_MASK 0x80000000
-
-diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-index 8eedf59155..03995701a1 100644
---- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-@@ -159,6 +159,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
- * Exit Reasons
- */
- #define VMX_EXIT_REASONS_FAILED_VMENTRY 0x80000000
-+#define VMX_EXIT_REASONS_BUS_LOCK (1u << 26)
-
- #define EXIT_REASON_EXCEPTION_NMI 0
- #define EXIT_REASON_EXTERNAL_INTERRUPT 1
-@@ -219,6 +220,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
- #define EXIT_REASON_PML_FULL 62
- #define EXIT_REASON_XSAVES 63
- #define EXIT_REASON_XRSTORS 64
-+#define EXIT_REASON_BUS_LOCK 74
- /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
-
- /*
-diff --git a/xen/arch/x86/include/asm/perfc_defn.h b/xen/arch/x86/include/asm/perfc_defn.h
-index 509afc516b..6fce21e85a 100644
---- a/xen/arch/x86/include/asm/perfc_defn.h
-+++ b/xen/arch/x86/include/asm/perfc_defn.h
-@@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions, "exceptions", 32)
-
- #ifdef CONFIG_HVM
-
--#define VMX_PERF_EXIT_REASON_SIZE 65
-+#define VMX_PERF_EXIT_REASON_SIZE 75
- #define VMEXIT_NPF_PERFC 143
- #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
- PERFCOUNTER_ARRAY(vmexits, "vmexits",
-@@ -128,4 +128,6 @@ PERFCOUNTER(pauseloop_exits, "vmexits from Pause-Loop Detection")
- PERFCOUNTER(iommu_pt_shatters, "IOMMU page table shatters")
- PERFCOUNTER(iommu_pt_coalesces, "IOMMU page table coalesces")
-
-+PERFCOUNTER(buslock, "Bus Locks Detected")
-+
- /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
---
-2.40.0
-
diff --git a/0065-x86-vmx-introduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch b/0065-x86-vmx-introduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch
deleted file mode 100644
index 847ee99..0000000
--- a/0065-x86-vmx-introduce-helper-to-set-VMX_INTR_SHADOW_NMI.patch
+++ /dev/null
@@ -1,102 +0,0 @@
-From 27abea1ba6fa68f81b98de31cf9b9ebb594ff238 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 21 Mar 2023 13:41:49 +0100
-Subject: [PATCH 65/89] x86/vmx: introduce helper to set VMX_INTR_SHADOW_NMI
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Introduce a small helper to OR VMX_INTR_SHADOW_NMI in
-GUEST_INTERRUPTIBILITY_INFO in order to help dealing with the NMI
-unblocked by IRET case. Replace the existing usage in handling
-EXIT_REASON_EXCEPTION_NMI and also add such handling to EPT violations
-and page-modification log-full events.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: d329b37d12132164c3894d0b6284be72576ef950
-master date: 2022-12-19 11:23:34 +0100
----
- xen/arch/x86/hvm/vmx/vmx.c | 28 +++++++++++++++++++-------
- xen/arch/x86/include/asm/hvm/vmx/vmx.h | 3 +++
- 2 files changed, 24 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index d0f0f2e429..456726e897 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -3967,6 +3967,15 @@ static int vmx_handle_apic_write(void)
- return vlapic_apicv_write(current, exit_qualification & 0xfff);
- }
-
-+static void undo_nmis_unblocked_by_iret(void)
-+{
-+ unsigned long guest_info;
-+
-+ __vmread(GUEST_INTERRUPTIBILITY_INFO, &guest_info);
-+ __vmwrite(GUEST_INTERRUPTIBILITY_INFO,
-+ guest_info | VMX_INTR_SHADOW_NMI);
-+}
-+
- void vmx_vmexit_handler(struct cpu_user_regs *regs)
- {
- unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
-@@ -4167,13 +4176,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- if ( unlikely(intr_info & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
- !(idtv_info & INTR_INFO_VALID_MASK) &&
- (vector != TRAP_double_fault) )
-- {
-- unsigned long guest_info;
--
-- __vmread(GUEST_INTERRUPTIBILITY_INFO, &guest_info);
-- __vmwrite(GUEST_INTERRUPTIBILITY_INFO,
-- guest_info | VMX_INTR_SHADOW_NMI);
-- }
-+ undo_nmis_unblocked_by_iret();
-
- perfc_incra(cause_vector, vector);
-
-@@ -4539,6 +4542,11 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
-
- __vmread(GUEST_PHYSICAL_ADDRESS, &gpa);
- __vmread(EXIT_QUALIFICATION, &exit_qualification);
-+
-+ if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
-+ !(idtv_info & INTR_INFO_VALID_MASK) )
-+ undo_nmis_unblocked_by_iret();
-+
- ept_handle_violation(exit_qualification, gpa);
- break;
- }
-@@ -4583,6 +4591,12 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- break;
-
- case EXIT_REASON_PML_FULL:
-+ __vmread(EXIT_QUALIFICATION, &exit_qualification);
-+
-+ if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) &&
-+ !(idtv_info & INTR_INFO_VALID_MASK) )
-+ undo_nmis_unblocked_by_iret();
-+
- vmx_vcpu_flush_pml_buffer(v);
- break;
-
-diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-index 03995701a1..eae39365aa 100644
---- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-@@ -225,6 +225,9 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
-
- /*
- * Interruption-information format
-+ *
-+ * Note INTR_INFO_NMI_UNBLOCKED_BY_IRET is also used with Exit Qualification
-+ * field for EPT violations, PML full and SPP-related event vmexits.
- */
- #define INTR_INFO_VECTOR_MASK 0xff /* 7:0 */
- #define INTR_INFO_INTR_TYPE_MASK 0x700 /* 10:8 */
---
-2.40.0
-
diff --git a/0066-x86-vmx-implement-Notify-VM-Exit.patch b/0066-x86-vmx-implement-Notify-VM-Exit.patch
deleted file mode 100644
index bc54d18..0000000
--- a/0066-x86-vmx-implement-Notify-VM-Exit.patch
+++ /dev/null
@@ -1,243 +0,0 @@
-From b745ff30113d2bd91e2d34cf56437b2fe2e2ea35 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 21 Mar 2023 13:42:43 +0100
-Subject: [PATCH 66/89] x86/vmx: implement Notify VM Exit
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Under certain conditions guests can get the CPU stuck in an unbounded
-loop without the possibility of an interrupt window to occur on
-instruction boundary. This was the case with the scenarios described
-in XSA-156.
-
-Make use of the Notify VM Exit mechanism, that will trigger a VM Exit
-if no interrupt window occurs for a specified amount of time. Note
-that using the Notify VM Exit avoids having to trap #AC and #DB
-exceptions, as Xen is guaranteed to get a VM Exit even if the guest
-puts the CPU in a loop without an interrupt window, as such disable
-the intercepts if the feature is available and enabled.
-
-Setting the notify VM exit window to 0 is safe because there's a
-threshold added by the hardware in order to have a sane window value.
-
-Note the handling of EXIT_REASON_NOTIFY in the nested virtualization
-case is passed to L0, and hence a nested guest being able to trigger a
-notify VM exit with an invalid context would be able to crash the L1
-hypervisor (by L0 destroying the domain). Since we don't expose VM
-Notify support to L1 it should already enable the required
-protections in order to prevent VM Notify from triggering in the first
-place.
-
-Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-
-x86/vmx: Partially revert "x86/vmx: implement Notify VM Exit"
-
-The original patch tried to do two things - implement VMNotify, and
-re-optimise VT-x to not intercept #DB/#AC by default.
-
-The second part is buggy in multiple ways. Both GDBSX and Introspection need
-to conditionally intercept #DB, which was not accounted for. Also, #DB
-interception has nothing at all to do with cpu_has_monitor_trap_flag.
-
-Revert the second half, leaving #DB/#AC intercepted unilaterally, but with
-VMNotify active by default when available.
-
-Fixes: 573279cde1c4 ("x86/vmx: implement Notify VM Exit")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: 573279cde1c4e752d4df34bc65ffafa17573148e
-master date: 2022-12-19 11:24:14 +0100
-master commit: 5f08bc9404c7cfa8131e262c7dbcb4d96c752686
-master date: 2023-01-20 19:39:32 +0000
----
- docs/misc/xen-command-line.pandoc | 11 +++++++++++
- xen/arch/x86/hvm/vmx/vmcs.c | 10 ++++++++++
- xen/arch/x86/hvm/vmx/vmx.c | 16 ++++++++++++++++
- xen/arch/x86/hvm/vmx/vvmx.c | 1 +
- xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 4 ++++
- xen/arch/x86/include/asm/hvm/vmx/vmx.h | 6 ++++++
- xen/arch/x86/include/asm/perfc_defn.h | 3 ++-
- 7 files changed, 50 insertions(+), 1 deletion(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 5be5ce10c6..d601120faa 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2634,6 +2634,17 @@ guest will notify Xen that it has failed to acquire a spinlock.
- <major>, <minor> and <build> must be integers. The values will be
- encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled.
-
-+### vm-notify-window (Intel)
-+> `= <integer>`
-+
-+> Default: `0`
-+
-+Specify the value of the VM Notify window used to detect locked VMs. Set to -1
-+to disable the feature. Value is in units of crystal clock cycles.
-+
-+Note the hardware might add a threshold to the provided value in order to make
-+it safe, and hence using 0 is fine.
-+
- ### vpid (Intel)
- > `= <boolean>`
-
-diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
-index a0d5e8d6ab..7912053bda 100644
---- a/xen/arch/x86/hvm/vmx/vmcs.c
-+++ b/xen/arch/x86/hvm/vmx/vmcs.c
-@@ -67,6 +67,9 @@ integer_param("ple_gap", ple_gap);
- static unsigned int __read_mostly ple_window = 4096;
- integer_param("ple_window", ple_window);
-
-+static unsigned int __ro_after_init vm_notify_window;
-+integer_param("vm-notify-window", vm_notify_window);
-+
- static bool __read_mostly opt_ept_pml = true;
- static s8 __read_mostly opt_ept_ad = -1;
- int8_t __read_mostly opt_ept_exec_sp = -1;
-@@ -210,6 +213,7 @@ static void __init vmx_display_features(void)
- P(cpu_has_vmx_pml, "Page Modification Logging");
- P(cpu_has_vmx_tsc_scaling, "TSC Scaling");
- P(cpu_has_vmx_bus_lock_detection, "Bus Lock Detection");
-+ P(cpu_has_vmx_notify_vm_exiting, "Notify VM Exit");
- #undef P
-
- if ( !printed )
-@@ -329,6 +333,8 @@ static int vmx_init_vmcs_config(bool bsp)
- opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
- if ( opt_ept_pml )
- opt |= SECONDARY_EXEC_ENABLE_PML;
-+ if ( vm_notify_window != ~0u )
-+ opt |= SECONDARY_EXEC_NOTIFY_VM_EXITING;
-
- /*
- * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
-@@ -1290,6 +1296,10 @@ static int construct_vmcs(struct vcpu *v)
- v->arch.hvm.vmx.exception_bitmap = HVM_TRAP_MASK
- | (paging_mode_hap(d) ? 0 : (1U << TRAP_page_fault))
- | (v->arch.fully_eager_fpu ? 0 : (1U << TRAP_no_device));
-+
-+ if ( cpu_has_vmx_notify_vm_exiting )
-+ __vmwrite(NOTIFY_WINDOW, vm_notify_window);
-+
- vmx_update_exception_bitmap(v);
-
- v->arch.hvm.guest_cr[0] = X86_CR0_PE | X86_CR0_ET;
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index 456726e897..f0e759eeaf 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -4622,6 +4622,22 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- */
- break;
-
-+ case EXIT_REASON_NOTIFY:
-+ __vmread(EXIT_QUALIFICATION, &exit_qualification);
-+
-+ if ( unlikely(exit_qualification & NOTIFY_VM_CONTEXT_INVALID) )
-+ {
-+ perfc_incr(vmnotify_crash);
-+ gprintk(XENLOG_ERR, "invalid VM context after notify vmexit\n");
-+ domain_crash(v->domain);
-+ break;
-+ }
-+
-+ if ( unlikely(exit_qualification & INTR_INFO_NMI_UNBLOCKED_BY_IRET) )
-+ undo_nmis_unblocked_by_iret();
-+
-+ break;
-+
- case EXIT_REASON_VMX_PREEMPTION_TIMER_EXPIRED:
- case EXIT_REASON_INVPCID:
- /* fall through */
-diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
-index 2095c1e612..f8fe8d0c14 100644
---- a/xen/arch/x86/hvm/vmx/vvmx.c
-+++ b/xen/arch/x86/hvm/vmx/vvmx.c
-@@ -2487,6 +2487,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
- case EXIT_REASON_EPT_MISCONFIG:
- case EXIT_REASON_EXTERNAL_INTERRUPT:
- case EXIT_REASON_BUS_LOCK:
-+ case EXIT_REASON_NOTIFY:
- /* pass to L0 handler */
- break;
- case VMX_EXIT_REASONS_FAILED_VMENTRY:
-diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-index f3df5113d4..78404e42b3 100644
---- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-+++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-@@ -268,6 +268,7 @@ extern u32 vmx_vmentry_control;
- #define SECONDARY_EXEC_XSAVES 0x00100000
- #define SECONDARY_EXEC_TSC_SCALING 0x02000000
- #define SECONDARY_EXEC_BUS_LOCK_DETECTION 0x40000000
-+#define SECONDARY_EXEC_NOTIFY_VM_EXITING 0x80000000
- extern u32 vmx_secondary_exec_control;
-
- #define VMX_EPT_EXEC_ONLY_SUPPORTED 0x00000001
-@@ -349,6 +350,8 @@ extern u64 vmx_ept_vpid_cap;
- (vmx_secondary_exec_control & SECONDARY_EXEC_TSC_SCALING)
- #define cpu_has_vmx_bus_lock_detection \
- (vmx_secondary_exec_control & SECONDARY_EXEC_BUS_LOCK_DETECTION)
-+#define cpu_has_vmx_notify_vm_exiting \
-+ (vmx_secondary_exec_control & SECONDARY_EXEC_NOTIFY_VM_EXITING)
-
- #define VMCS_RID_TYPE_MASK 0x80000000
-
-@@ -456,6 +459,7 @@ enum vmcs_field {
- SECONDARY_VM_EXEC_CONTROL = 0x0000401e,
- PLE_GAP = 0x00004020,
- PLE_WINDOW = 0x00004022,
-+ NOTIFY_WINDOW = 0x00004024,
- VM_INSTRUCTION_ERROR = 0x00004400,
- VM_EXIT_REASON = 0x00004402,
- VM_EXIT_INTR_INFO = 0x00004404,
-diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmx.h b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-index eae39365aa..8e1e42ac47 100644
---- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
-@@ -221,6 +221,7 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
- #define EXIT_REASON_XSAVES 63
- #define EXIT_REASON_XRSTORS 64
- #define EXIT_REASON_BUS_LOCK 74
-+#define EXIT_REASON_NOTIFY 75
- /* Remember to also update VMX_PERF_EXIT_REASON_SIZE! */
-
- /*
-@@ -236,6 +237,11 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
- #define INTR_INFO_VALID_MASK 0x80000000 /* 31 */
- #define INTR_INFO_RESVD_BITS_MASK 0x7ffff000
-
-+/*
-+ * Exit Qualifications for NOTIFY VM EXIT
-+ */
-+#define NOTIFY_VM_CONTEXT_INVALID 1u
-+
- /*
- * Exit Qualifications for MOV for Control Register Access
- */
-diff --git a/xen/arch/x86/include/asm/perfc_defn.h b/xen/arch/x86/include/asm/perfc_defn.h
-index 6fce21e85a..487e20dc97 100644
---- a/xen/arch/x86/include/asm/perfc_defn.h
-+++ b/xen/arch/x86/include/asm/perfc_defn.h
-@@ -6,7 +6,7 @@ PERFCOUNTER_ARRAY(exceptions, "exceptions", 32)
-
- #ifdef CONFIG_HVM
-
--#define VMX_PERF_EXIT_REASON_SIZE 75
-+#define VMX_PERF_EXIT_REASON_SIZE 76
- #define VMEXIT_NPF_PERFC 143
- #define SVM_PERF_EXIT_REASON_SIZE (VMEXIT_NPF_PERFC + 1)
- PERFCOUNTER_ARRAY(vmexits, "vmexits",
-@@ -129,5 +129,6 @@ PERFCOUNTER(iommu_pt_shatters, "IOMMU page table shatters")
- PERFCOUNTER(iommu_pt_coalesces, "IOMMU page table coalesces")
-
- PERFCOUNTER(buslock, "Bus Locks Detected")
-+PERFCOUNTER(vmnotify_crash, "domain crashes by Notify VM Exit")
-
- /*#endif*/ /* __XEN_PERFC_DEFN_H__ */
---
-2.40.0
-
diff --git a/0067-tools-python-change-s-size-type-for-Python-3.10.patch b/0067-tools-python-change-s-size-type-for-Python-3.10.patch
deleted file mode 100644
index 0671c67..0000000
--- a/0067-tools-python-change-s-size-type-for-Python-3.10.patch
+++ /dev/null
@@ -1,72 +0,0 @@
-From 651ffe2c7847cb9922d22980984a3bea6f47bea7 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
- <marmarek@invisiblethingslab.com>
-Date: Tue, 21 Mar 2023 13:43:44 +0100
-Subject: [PATCH 67/89] tools/python: change 's#' size type for Python >= 3.10
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Python < 3.10 by default uses 'int' type for data+size string types
-(s#), unless PY_SSIZE_T_CLEAN is defined - in which case it uses
-Py_ssize_t. The former behavior was removed in Python 3.10 and now it's
-required to define PY_SSIZE_T_CLEAN before including Python.h, and using
-Py_ssize_t for the length argument. The PY_SSIZE_T_CLEAN behavior is
-supported since Python 2.5.
-
-Adjust bindings accordingly.
-
-Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 897257ba49d0a6ddcf084960fd792ccce9c40f94
-master date: 2023-02-06 08:50:13 +0100
----
- tools/python/xen/lowlevel/xc/xc.c | 3 ++-
- tools/python/xen/lowlevel/xs/xs.c | 3 ++-
- 2 files changed, 4 insertions(+), 2 deletions(-)
-
-diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
-index fd00861032..cfb2734a99 100644
---- a/tools/python/xen/lowlevel/xc/xc.c
-+++ b/tools/python/xen/lowlevel/xc/xc.c
-@@ -4,6 +4,7 @@
- * Copyright (c) 2003-2004, K A Fraser (University of Cambridge)
- */
-
-+#define PY_SSIZE_T_CLEAN
- #include <Python.h>
- #define XC_WANT_COMPAT_MAP_FOREIGN_API
- #include <xenctrl.h>
-@@ -1774,7 +1775,7 @@ static PyObject *pyflask_load(PyObject *self, PyObject *args, PyObject *kwds)
- {
- xc_interface *xc_handle;
- char *policy;
-- uint32_t len;
-+ Py_ssize_t len;
- int ret;
-
- static char *kwd_list[] = { "policy", NULL };
-diff --git a/tools/python/xen/lowlevel/xs/xs.c b/tools/python/xen/lowlevel/xs/xs.c
-index 0dad7fa5f2..3ba5a8b893 100644
---- a/tools/python/xen/lowlevel/xs/xs.c
-+++ b/tools/python/xen/lowlevel/xs/xs.c
-@@ -18,6 +18,7 @@
- * Copyright (C) 2005 XenSource Ltd.
- */
-
-+#define PY_SSIZE_T_CLEAN
- #include <Python.h>
-
- #include <stdbool.h>
-@@ -141,7 +142,7 @@ static PyObject *xspy_write(XsHandle *self, PyObject *args)
- char *thstr;
- char *path;
- char *data;
-- int data_n;
-+ Py_ssize_t data_n;
- bool result;
-
- if (!xh)
---
-2.40.0
-
diff --git a/0068-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch b/0068-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
deleted file mode 100644
index a47812b..0000000
--- a/0068-tools-xenmon-Fix-xenmon.py-for-with-python3.x.patch
+++ /dev/null
@@ -1,54 +0,0 @@
-From 244d39fb13abae6c2da341b76363f169d8bbc93b Mon Sep 17 00:00:00 2001
-From: Bernhard Kaindl <bernhard.kaindl@citrix.com>
-Date: Tue, 21 Mar 2023 13:44:04 +0100
-Subject: [PATCH 68/89] tools/xenmon: Fix xenmon.py for with python3.x
-
-Fixes for Py3:
-* class Delayed(): file not defined; also an error for pylint -E. Inherit
- object instead for Py2 compatibility. Fix DomainInfo() too.
-* Inconsistent use of tabs and spaces for indentation (in one block)
-
-Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 3a59443c1d5ae0677a792c660ccd3796ce036732
-master date: 2023-02-06 10:22:12 +0000
----
- tools/xenmon/xenmon.py | 8 ++++----
- 1 file changed, 4 insertions(+), 4 deletions(-)
-
-diff --git a/tools/xenmon/xenmon.py b/tools/xenmon/xenmon.py
-index 175eacd2cb..977ada6887 100644
---- a/tools/xenmon/xenmon.py
-+++ b/tools/xenmon/xenmon.py
-@@ -117,7 +117,7 @@ def setup_cmdline_parser():
- return parser
-
- # encapsulate information about a domain
--class DomainInfo:
-+class DomainInfo(object):
- def __init__(self):
- self.allocated_sum = 0
- self.gotten_sum = 0
-@@ -533,7 +533,7 @@ def show_livestats(cpu):
- # simple functions to allow initialization of log files without actually
- # physically creating files that are never used; only on the first real
- # write does the file get created
--class Delayed(file):
-+class Delayed(object):
- def __init__(self, filename, mode):
- self.filename = filename
- self.saved_mode = mode
-@@ -677,8 +677,8 @@ def main():
-
- if os.uname()[0] == "SunOS":
- xenbaked_cmd = "/usr/lib/xenbaked"
-- stop_cmd = "/usr/bin/pkill -INT -z global xenbaked"
-- kill_cmd = "/usr/bin/pkill -KILL -z global xenbaked"
-+ stop_cmd = "/usr/bin/pkill -INT -z global xenbaked"
-+ kill_cmd = "/usr/bin/pkill -KILL -z global xenbaked"
- else:
- # assumes that xenbaked is in your path
- xenbaked_cmd = "xenbaked"
---
-2.40.0
-
diff --git a/0069-x86-spec-ctrl-Add-BHI-controls-to-userspace-componen.patch b/0069-x86-spec-ctrl-Add-BHI-controls-to-userspace-componen.patch
deleted file mode 100644
index 734a2e5..0000000
--- a/0069-x86-spec-ctrl-Add-BHI-controls-to-userspace-componen.patch
+++ /dev/null
@@ -1,51 +0,0 @@
-From b4dad09bb23c439f2e67ed2eb6d7bdd640b8bbae Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 21 Mar 2023 13:44:27 +0100
-Subject: [PATCH 69/89] x86/spec-ctrl: Add BHI controls to userspace components
-
-This was an oversight when adding the Xen parts.
-
-Fixes: cea9ae062295 ("x86/spec-ctrl: Enumeration for new Intel BHI controls")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 9276e832aef60437da13d91e66fc259fd94d6f91
-master date: 2023-03-13 11:26:26 +0000
----
- tools/libs/light/libxl_cpuid.c | 3 +++
- tools/misc/xen-cpuid.c | 6 +++---
- 2 files changed, 6 insertions(+), 3 deletions(-)
-
-diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
-index d97a2f3338..55cfbc8f23 100644
---- a/tools/libs/light/libxl_cpuid.c
-+++ b/tools/libs/light/libxl_cpuid.c
-@@ -238,6 +238,9 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
- {"cet-sss", 0x00000007, 1, CPUID_REG_EDX, 18, 1},
-
- {"intel-psfd", 0x00000007, 2, CPUID_REG_EDX, 0, 1},
-+ {"ipred-ctrl", 0x00000007, 2, CPUID_REG_EDX, 1, 1},
-+ {"rrsba-ctrl", 0x00000007, 2, CPUID_REG_EDX, 2, 1},
-+ {"bhi-ctrl", 0x00000007, 2, CPUID_REG_EDX, 4, 1},
- {"mcdt-no", 0x00000007, 2, CPUID_REG_EDX, 5, 1},
-
- {"lahfsahf", 0x80000001, NA, CPUID_REG_ECX, 0, 1},
-diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index 0248eaef44..45e443f5d9 100644
---- a/tools/misc/xen-cpuid.c
-+++ b/tools/misc/xen-cpuid.c
-@@ -213,9 +213,9 @@ static const char *const str_7d1[32] =
-
- static const char *const str_7d2[32] =
- {
-- [ 0] = "intel-psfd",
--
-- /* 4 */ [ 5] = "mcdt-no",
-+ [ 0] = "intel-psfd", [ 1] = "ipred-ctrl",
-+ [ 2] = "rrsba-ctrl",
-+ [ 4] = "bhi-ctrl", [ 5] = "mcdt-no",
- };
-
- static const struct {
---
-2.40.0
-
diff --git a/0070-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch b/0070-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
deleted file mode 100644
index 0b2c2b4..0000000
--- a/0070-core-parking-fix-build-with-gcc12-and-NR_CPUS-1.patch
+++ /dev/null
@@ -1,95 +0,0 @@
-From b5409f4e4d0722e8669123d59f15f784903d153f Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:44:53 +0100
-Subject: [PATCH 70/89] core-parking: fix build with gcc12 and NR_CPUS=1
-
-Gcc12 takes issue with core_parking_remove()'s
-
- for ( ; i < cur_idle_nums; ++i )
- core_parking_cpunum[i] = core_parking_cpunum[i + 1];
-
-complaining that the right hand side array access is past the bounds of
-1. Clearly the compiler can't know that cur_idle_nums can only ever be
-zero in this case (as the sole CPU cannot be parked).
-
-Arrange for core_parking.c's contents to not be needed altogether, and
-then disable its building when NR_CPUS == 1.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 4b0422f70feb4b1cd04598ffde805fc224f3812e
-master date: 2023-03-13 15:15:42 +0100
----
- xen/arch/x86/Kconfig | 2 +-
- xen/arch/x86/platform_hypercall.c | 11 ++++++++---
- xen/arch/x86/sysctl.c | 3 +++
- xen/common/Kconfig | 1 +
- 4 files changed, 13 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
-index 6a7825f4ba..2a5c3304e2 100644
---- a/xen/arch/x86/Kconfig
-+++ b/xen/arch/x86/Kconfig
-@@ -10,7 +10,7 @@ config X86
- select ALTERNATIVE_CALL
- select ARCH_MAP_DOMAIN_PAGE
- select ARCH_SUPPORTS_INT128
-- select CORE_PARKING
-+ imply CORE_PARKING
- select HAS_ALTERNATIVE
- select HAS_COMPAT
- select HAS_CPUFREQ
-diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c
-index a7341dc3d7..e7deee2268 100644
---- a/xen/arch/x86/platform_hypercall.c
-+++ b/xen/arch/x86/platform_hypercall.c
-@@ -727,12 +727,17 @@ ret_t do_platform_op(
- case XEN_CORE_PARKING_SET:
- idle_nums = min_t(uint32_t,
- op->u.core_parking.idle_nums, num_present_cpus() - 1);
-- ret = continue_hypercall_on_cpu(
-- 0, core_parking_helper, (void *)(unsigned long)idle_nums);
-+ if ( CONFIG_NR_CPUS > 1 )
-+ ret = continue_hypercall_on_cpu(
-+ 0, core_parking_helper,
-+ (void *)(unsigned long)idle_nums);
-+ else if ( idle_nums )
-+ ret = -EINVAL;
- break;
-
- case XEN_CORE_PARKING_GET:
-- op->u.core_parking.idle_nums = get_cur_idle_nums();
-+ op->u.core_parking.idle_nums = CONFIG_NR_CPUS > 1
-+ ? get_cur_idle_nums() : 0;
- ret = __copy_field_to_guest(u_xenpf_op, op, u.core_parking) ?
- -EFAULT : 0;
- break;
-diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
-index f82abc2488..f8f8d79755 100644
---- a/xen/arch/x86/sysctl.c
-+++ b/xen/arch/x86/sysctl.c
-@@ -179,6 +179,9 @@ long arch_do_sysctl(
- ret = -EBUSY;
- break;
- }
-+ if ( CONFIG_NR_CPUS <= 1 )
-+ /* Mimic behavior of smt_up_down_helper(). */
-+ return 0;
- plug = op == XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE;
- fn = smt_up_down_helper;
- hcpu = _p(plug);
-diff --git a/xen/common/Kconfig b/xen/common/Kconfig
-index f1ea3199c8..855c843113 100644
---- a/xen/common/Kconfig
-+++ b/xen/common/Kconfig
-@@ -10,6 +10,7 @@ config COMPAT
-
- config CORE_PARKING
- bool
-+ depends on NR_CPUS > 1
-
- config GRANT_TABLE
- bool "Grant table support" if EXPERT
---
-2.40.0
-
diff --git a/0071-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch b/0071-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
deleted file mode 100644
index b33bd11..0000000
--- a/0071-x86-altp2m-help-gcc13-to-avoid-it-emitting-a-warning.patch
+++ /dev/null
@@ -1,129 +0,0 @@
-From d84612ecab00ab31c09a7c5a5892906edbacaf5b Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:45:47 +0100
-Subject: [PATCH 71/89] x86/altp2m: help gcc13 to avoid it emitting a warning
-
-Switches of altp2m-s always expect a valid altp2m to be in place (and
-indeed altp2m_vcpu_initialise() sets the active one to be at index 0).
-The compiler, however, cannot know that, and hence it cannot eliminate
-p2m_get_altp2m()'s case of returning (literal) NULL. If then the compiler
-decides to special case that code path in the caller, the dereference in
-instances of
-
- atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
-
-can, to the code generator, appear to be NULL dereferences, leading to
-
-In function 'atomic_dec',
- inlined from '...' at ...:
-./arch/x86/include/asm/atomic.h:182:5: error: array subscript 0 is outside array bounds of 'int[0]' [-Werror=array-bounds=]
-
-Aid the compiler by adding a BUG_ON() checking the return value of the
-problematic p2m_get_altp2m(). Since with the use of the local variable
-the 2nd p2m_get_altp2m() each will look questionable at the first glance
-(Why is the local variable not used here?), open-code the only relevant
-piece of p2m_get_altp2m() there.
-
-To avoid repeatedly doing these transformations, and also to limit how
-"bad" the open-coding really is, convert the entire operation to an
-inline helper, used by all three instances (and accepting the redundant
-BUG_ON(idx >= MAX_ALTP2M) in two of the three cases).
-
-Reported-by: Charles Arnold <carnold@suse.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: be62b1fc2aa7375d553603fca07299da765a89fe
-master date: 2023-03-13 15:16:21 +0100
----
- xen/arch/x86/hvm/vmx/vmx.c | 8 +-------
- xen/arch/x86/include/asm/p2m.h | 20 ++++++++++++++++++++
- xen/arch/x86/mm/p2m.c | 14 ++------------
- 3 files changed, 23 insertions(+), 19 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index f0e759eeaf..a8fb4365ad 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -4072,13 +4072,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- }
- }
-
-- if ( idx != vcpu_altp2m(v).p2midx )
-- {
-- BUG_ON(idx >= MAX_ALTP2M);
-- atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
-- vcpu_altp2m(v).p2midx = idx;
-- atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-- }
-+ p2m_set_altp2m(v, idx);
- }
-
- if ( unlikely(currd->arch.monitor.vmexit_enabled) )
-diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
-index bd684d02f3..cd43d8621a 100644
---- a/xen/arch/x86/include/asm/p2m.h
-+++ b/xen/arch/x86/include/asm/p2m.h
-@@ -879,6 +879,26 @@ static inline struct p2m_domain *p2m_get_altp2m(struct vcpu *v)
- return v->domain->arch.altp2m_p2m[index];
- }
-
-+/* set current alternate p2m table */
-+static inline bool p2m_set_altp2m(struct vcpu *v, unsigned int idx)
-+{
-+ struct p2m_domain *orig;
-+
-+ BUG_ON(idx >= MAX_ALTP2M);
-+
-+ if ( idx == vcpu_altp2m(v).p2midx )
-+ return false;
-+
-+ orig = p2m_get_altp2m(v);
-+ BUG_ON(!orig);
-+ atomic_dec(&orig->active_vcpus);
-+
-+ vcpu_altp2m(v).p2midx = idx;
-+ atomic_inc(&v->domain->arch.altp2m_p2m[idx]->active_vcpus);
-+
-+ return true;
-+}
-+
- /* Switch alternate p2m for a single vcpu */
- bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx);
-
-diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
-index a405ee5fde..b28c899b5e 100644
---- a/xen/arch/x86/mm/p2m.c
-+++ b/xen/arch/x86/mm/p2m.c
-@@ -1787,13 +1787,8 @@ bool_t p2m_switch_vcpu_altp2m_by_id(struct vcpu *v, unsigned int idx)
-
- if ( d->arch.altp2m_eptp[idx] != mfn_x(INVALID_MFN) )
- {
-- if ( idx != vcpu_altp2m(v).p2midx )
-- {
-- atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
-- vcpu_altp2m(v).p2midx = idx;
-- atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-+ if ( p2m_set_altp2m(v, idx) )
- altp2m_vcpu_update_p2m(v);
-- }
- rc = 1;
- }
-
-@@ -2070,13 +2065,8 @@ int p2m_switch_domain_altp2m_by_id(struct domain *d, unsigned int idx)
- if ( d->arch.altp2m_visible_eptp[idx] != mfn_x(INVALID_MFN) )
- {
- for_each_vcpu( d, v )
-- if ( idx != vcpu_altp2m(v).p2midx )
-- {
-- atomic_dec(&p2m_get_altp2m(v)->active_vcpus);
-- vcpu_altp2m(v).p2midx = idx;
-- atomic_inc(&p2m_get_altp2m(v)->active_vcpus);
-+ if ( p2m_set_altp2m(v, idx) )
- altp2m_vcpu_update_p2m(v);
-- }
-
- rc = 0;
- }
---
-2.40.0
-
diff --git a/0072-VT-d-constrain-IGD-check.patch b/0072-VT-d-constrain-IGD-check.patch
deleted file mode 100644
index 497b04b..0000000
--- a/0072-VT-d-constrain-IGD-check.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From f971f5c531ce6a5fd6c1ff1f525f2c6837eeb78d Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:46:39 +0100
-Subject: [PATCH 72/89] VT-d: constrain IGD check
-
-Marking a DRHD as controlling an IGD isn't very sensible without
-checking that at the very least it's a graphics device that lives at
-0000:00:02.0. Re-use the reading of the class-code to control both the
-clearing of "gfx_only" and the setting of "igd_drhd_address".
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: f8c4317295fa1cde1a81779b7e362651c084efb8
-master date: 2023-03-14 10:44:08 +0100
----
- xen/drivers/passthrough/vtd/dmar.c | 9 +++------
- 1 file changed, 3 insertions(+), 6 deletions(-)
-
-diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
-index 78c8bad151..78d4526446 100644
---- a/xen/drivers/passthrough/vtd/dmar.c
-+++ b/xen/drivers/passthrough/vtd/dmar.c
-@@ -391,15 +391,12 @@ static int __init acpi_parse_dev_scope(
-
- if ( drhd )
- {
-- if ( (seg == 0) && (bus == 0) && (path->dev == 2) &&
-- (path->fn == 0) )
-- igd_drhd_address = drhd->address;
--
-- if ( gfx_only &&
-- pci_conf_read8(PCI_SBDF(seg, bus, path->dev, path->fn),
-+ if ( pci_conf_read8(PCI_SBDF(seg, bus, path->dev, path->fn),
- PCI_CLASS_DEVICE + 1) != 0x03
- /* PCI_BASE_CLASS_DISPLAY */ )
- gfx_only = false;
-+ else if ( !seg && !bus && path->dev == 2 && !path->fn )
-+ igd_drhd_address = drhd->address;
- }
-
- break;
---
-2.40.0
-
diff --git a/0073-bunzip-work-around-gcc13-warning.patch b/0073-bunzip-work-around-gcc13-warning.patch
deleted file mode 100644
index c7ec163..0000000
--- a/0073-bunzip-work-around-gcc13-warning.patch
+++ /dev/null
@@ -1,42 +0,0 @@
-From 7082d656ae9bcd26392caf72e50e0f7a61c8f285 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 Mar 2023 13:47:11 +0100
-Subject: [PATCH 73/89] bunzip: work around gcc13 warning
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-While provable that length[0] is always initialized (because symCount
-cannot be zero), upcoming gcc13 fails to recognize this and warns about
-the unconditional use of the value immediately following the loop.
-
-See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511.
-
-Reported-by: Martin Liška <martin.liska@suse.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 402195e56de0aacf97e05c80ed367d464ca6938b
-master date: 2023-03-14 10:45:28 +0100
----
- xen/common/bunzip2.c | 5 +++++
- 1 file changed, 5 insertions(+)
-
-diff --git a/xen/common/bunzip2.c b/xen/common/bunzip2.c
-index 61b80aff1b..4466426941 100644
---- a/xen/common/bunzip2.c
-+++ b/xen/common/bunzip2.c
-@@ -233,6 +233,11 @@ static int __init get_next_block(struct bunzip_data *bd)
- becomes negative, so an unsigned inequality catches
- it.) */
- t = get_bits(bd, 5)-1;
-+ /* GCC 13 has apparently improved use-before-set detection, but
-+ it can't figure out that length[0] is always initialized by
-+ virtue of symCount always being positive when making it here.
-+ See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511. */
-+ length[0] = 0;
- for (i = 0; i < symCount; i++) {
- for (;;) {
- if (((unsigned)t) > (MAX_HUFCODE_BITS-1))
---
-2.40.0
-
diff --git a/0074-libacpi-fix-PCI-hotplug-AML.patch b/0074-libacpi-fix-PCI-hotplug-AML.patch
deleted file mode 100644
index 3583849..0000000
--- a/0074-libacpi-fix-PCI-hotplug-AML.patch
+++ /dev/null
@@ -1,57 +0,0 @@
-From 3eac216e6e60860bbc030602c401d3ef8efce8d9 Mon Sep 17 00:00:00 2001
-From: David Woodhouse <dwmw@amazon.co.uk>
-Date: Tue, 21 Mar 2023 13:47:52 +0100
-Subject: [PATCH 74/89] libacpi: fix PCI hotplug AML
-
-The emulated PIIX3 uses a nybble for the status of each PCI function,
-so the status for e.g. slot 0 functions 0 and 1 respectively can be
-read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).
-
-The AML that Xen gives to a guest gets the operand order for the odd-
-numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
-instead.
-
-As far as I can tell, this was the wrong way round in Xen from the
-moment that PCI hotplug was first introduced in commit 83d82e6f35a8:
-
-+ ShiftRight (0x4, \_GPE.PH00, Local1)
-+ Return (Local1) /* IN status as the _STA */
-
-Or maybe there's bizarre AML operand ordering going on there, like
-Intel's wrong-way-round assembler, and it only broke later when it was
-changed to being generated?
-
-Either way, it's definitely wrong now, and instrumenting a Linux guest
-shows that it correctly sees _STA being 0x00 in function 0 of an empty
-slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
-look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
-an adapter is present in every slot in /sys/bus/pci/slots/*
-
-Quite why Linux wants to look for function 1 being physically present
-when function 0 isn't... I don't want to think about right now.
-
-Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
-Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: b190af7d3e90f58da5f58044b8dea7261b8b483d
-master date: 2023-03-20 17:12:34 +0100
----
- tools/libacpi/mk_dsdt.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/libacpi/mk_dsdt.c b/tools/libacpi/mk_dsdt.c
-index 1176da80ef..1d27809116 100644
---- a/tools/libacpi/mk_dsdt.c
-+++ b/tools/libacpi/mk_dsdt.c
-@@ -431,7 +431,7 @@ int main(int argc, char **argv)
- stmt("Store", "0x89, \\_GPE.DPT2");
- }
- if ( slot & 1 )
-- stmt("ShiftRight", "0x4, \\_GPE.PH%02X, Local1", slot & ~1);
-+ stmt("ShiftRight", "\\_GPE.PH%02X, 0x04, Local1", slot & ~1);
- else
- stmt("And", "\\_GPE.PH%02X, 0x0f, Local1", slot & ~1);
- stmt("Return", "Local1"); /* IN status as the _STA */
---
-2.40.0
-
diff --git a/0075-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch b/0075-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
deleted file mode 100644
index 5decf2c..0000000
--- a/0075-AMD-IOMMU-without-XT-x2APIC-needs-to-be-forced-into-.patch
+++ /dev/null
@@ -1,42 +0,0 @@
-From 3c85fb7b65d6a8b0fa993bc1cb67eea9b4a64aca Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 31 Mar 2023 08:28:56 +0200
-Subject: [PATCH 75/89] AMD/IOMMU: without XT, x2APIC needs to be forced into
- physical mode
-
-An earlier change with the same title (commit 1ba66a870eba) altered only
-the path where x2apic_phys was already set to false (perhaps from the
-command line). The same of course needs applying when the variable
-wasn't modified yet from its initial value.
-
-Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 0d2686f6b66b4b1b3c72c3525083b0ce02830054
-master date: 2023-03-21 09:23:25 +0100
----
- xen/arch/x86/genapic/x2apic.c | 6 +++---
- 1 file changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
-index 7dfc793514..d512c50fc5 100644
---- a/xen/arch/x86/genapic/x2apic.c
-+++ b/xen/arch/x86/genapic/x2apic.c
-@@ -236,11 +236,11 @@ const struct genapic *__init apic_x2apic_probe(void)
- if ( x2apic_phys < 0 )
- {
- /*
-- * Force physical mode if there's no interrupt remapping support: The
-- * ID in clustered mode requires a 32 bit destination field due to
-+ * Force physical mode if there's no (full) interrupt remapping support:
-+ * The ID in clustered mode requires a 32 bit destination field due to
- * the usage of the high 16 bits to hold the cluster ID.
- */
-- x2apic_phys = !iommu_intremap ||
-+ x2apic_phys = iommu_intremap != iommu_intremap_full ||
- (acpi_gbl_FADT.flags & ACPI_FADT_APIC_PHYSICAL) ||
- (IS_ENABLED(CONFIG_X2APIC_PHYSICAL) &&
- !(acpi_gbl_FADT.flags & ACPI_FADT_APIC_CLUSTER));
---
-2.40.0
-
diff --git a/0076-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch b/0076-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
deleted file mode 100644
index d897da6..0000000
--- a/0076-VT-d-fix-iommu-no-igfx-if-the-IOMMU-scope-contains-f.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From 33b1c8cd86bd6c311131b8dff32bd45581e2fbc1 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
- <marmarek@invisiblethingslab.com>
-Date: Fri, 31 Mar 2023 08:29:55 +0200
-Subject: [PATCH 76/89] VT-d: fix iommu=no-igfx if the IOMMU scope contains
- fake device(s)
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-If the scope for IGD's IOMMU contains an additional device that doesn't
-actually exist, iommu=no-igfx would not disable that IOMMU. In this
-particular case (Thinkpad x230) it included 00:02.1, but there is no
-such device on this platform. Consider only existing devices for the
-"gfx only" check as well as the establishing of IGD DRHD address
-(underlying is_igd_drhd(), which is used to determine applicability of
-two workarounds).
-
-Fixes: 2d7f191b392e ("VT-d: generalize and correct "iommu=no-igfx" handling")
-Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: 49de6749baa8d0addc3048defd4ef3e85cb135e9
-master date: 2023-03-23 09:16:41 +0100
----
- xen/drivers/passthrough/vtd/dmar.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
-index 78d4526446..4936c20952 100644
---- a/xen/drivers/passthrough/vtd/dmar.c
-+++ b/xen/drivers/passthrough/vtd/dmar.c
-@@ -389,7 +389,7 @@ static int __init acpi_parse_dev_scope(
- printk(VTDPREFIX " endpoint: %pp\n",
- &PCI_SBDF(seg, bus, path->dev, path->fn));
-
-- if ( drhd )
-+ if ( drhd && pci_device_detect(seg, bus, path->dev, path->fn) )
- {
- if ( pci_conf_read8(PCI_SBDF(seg, bus, path->dev, path->fn),
- PCI_CLASS_DEVICE + 1) != 0x03
---
-2.40.0
-
diff --git a/0077-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch b/0077-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
deleted file mode 100644
index 3486ccd..0000000
--- a/0077-x86-shadow-fix-and-improve-sh_page_has_multiple_shad.patch
+++ /dev/null
@@ -1,47 +0,0 @@
-From 6f2d89d68175e74aca9c67761aa87ffc8f5ffed1 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 31 Mar 2023 08:30:41 +0200
-Subject: [PATCH 77/89] x86/shadow: fix and improve
- sh_page_has_multiple_shadows()
-
-While no caller currently invokes the function without first making sure
-there is at least one shadow [1], we'd better eliminate UB here:
-find_first_set_bit() requires input to be non-zero to return a well-
-defined result.
-
-Further, using find_first_set_bit() isn't very efficient in the first
-place for the intended purpose.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-
-[1] The function has exactly two uses, and both are from OOS code, which
- is HVM-only. For HVM (but not for PV) sh_mfn_is_a_page_table(),
- guarding the call to sh_unsync(), guarantees at least one shadow.
- Hence even if sh_page_has_multiple_shadows() returned a bogus value
- when invoked for a PV domain, the subsequent is_hvm_vcpu() and
- oos_active checks (the former being redundant with the latter) will
- compensate. (Arguably that oos_active check should come first, for
- both clarity and efficiency reasons.)
-master commit: 2896224a4e294652c33f487b603d20bd30955f21
-master date: 2023-03-24 11:07:08 +0100
----
- xen/arch/x86/mm/shadow/private.h | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
-index 85bb26c7ea..c2bb1ed3c3 100644
---- a/xen/arch/x86/mm/shadow/private.h
-+++ b/xen/arch/x86/mm/shadow/private.h
-@@ -324,7 +324,7 @@ static inline int sh_page_has_multiple_shadows(struct page_info *pg)
- return 0;
- shadows = pg->shadow_flags & SHF_page_type_mask;
- /* More than one type bit set in shadow-flags? */
-- return ( (shadows & ~(1UL << find_first_set_bit(shadows))) != 0 );
-+ return shadows && (shadows & (shadows - 1));
- }
-
- #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
---
-2.40.0
-
diff --git a/0078-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch b/0078-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
deleted file mode 100644
index 62de15a..0000000
--- a/0078-x86-nospec-Fix-evaluate_nospec-code-generation-under.patch
+++ /dev/null
@@ -1,101 +0,0 @@
-From 00aa5c93d14c6561a69fe204cbe29f7519830782 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:31:20 +0200
-Subject: [PATCH 78/89] x86/nospec: Fix evaluate_nospec() code generation under
- Clang
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-It turns out that evaluate_nospec() code generation is not safe under Clang.
-Given:
-
- void eval_nospec_test(int x)
- {
- if ( evaluate_nospec(x) )
- asm volatile ("nop #true" ::: "memory");
- else
- asm volatile ("nop #false" ::: "memory");
- }
-
-Clang emits:
-
- <eval_nospec_test>:
- 0f ae e8 lfence
- 85 ff test %edi,%edi
- 74 02 je <eval_nospec_test+0x9>
- 90 nop
- c3 ret
- 90 nop
- c3 ret
-
-which is not safe because the lfence has been hoisted above the conditional
-jump. Clang concludes that both barrier_nospec_true()'s have identical side
-effects and can safely be merged.
-
-Clang can be persuaded that the side effects are different if there are
-different comments in the asm blocks. This is fragile, but no more fragile
-that other aspects of this construct.
-
-Introduce barrier_nospec_false() with a separate internal comment to prevent
-Clang merging it with barrier_nospec_true() despite the otherwise-identical
-content. The generated code now becomes:
-
- <eval_nospec_test>:
- 85 ff test %edi,%edi
- 74 05 je <eval_nospec_test+0x9>
- 0f ae e8 lfence
- 90 nop
- c3 ret
- 0f ae e8 lfence
- 90 nop
- c3 ret
-
-which has the correct number of lfence's, and in the correct place.
-
-Link: https://github.com/llvm/llvm-project/issues/55084
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: bc3c133841435829ba5c0a48427e2a77633502ab
-master date: 2023-03-24 12:16:31 +0000
----
- xen/arch/x86/include/asm/nospec.h | 15 +++++++++++++--
- 1 file changed, 13 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/nospec.h b/xen/arch/x86/include/asm/nospec.h
-index 5312ae4c6f..7150e76b87 100644
---- a/xen/arch/x86/include/asm/nospec.h
-+++ b/xen/arch/x86/include/asm/nospec.h
-@@ -10,15 +10,26 @@
- static always_inline bool barrier_nospec_true(void)
- {
- #ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
-- alternative("lfence", "", X86_FEATURE_SC_NO_BRANCH_HARDEN);
-+ alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_BRANCH_HARDEN);
- #endif
- return true;
- }
-
-+static always_inline bool barrier_nospec_false(void)
-+{
-+#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
-+ alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_BRANCH_HARDEN);
-+#endif
-+ return false;
-+}
-+
- /* Allow to protect evaluation of conditionals with respect to speculation */
- static always_inline bool evaluate_nospec(bool condition)
- {
-- return condition ? barrier_nospec_true() : !barrier_nospec_true();
-+ if ( condition )
-+ return barrier_nospec_true();
-+ else
-+ return barrier_nospec_false();
- }
-
- /* Allow to block speculative execution in generic code */
---
-2.40.0
-
diff --git a/0079-x86-shadow-Fix-build-with-no-PG_log_dirty.patch b/0079-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
deleted file mode 100644
index f7652a4..0000000
--- a/0079-x86-shadow-Fix-build-with-no-PG_log_dirty.patch
+++ /dev/null
@@ -1,56 +0,0 @@
-From 11c8ef59b9024849c0fc224354904615d5579628 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:32:11 +0200
-Subject: [PATCH 79/89] x86/shadow: Fix build with no PG_log_dirty
-
-Gitlab Randconfig found:
-
- arch/x86/mm/shadow/common.c: In function 'shadow_prealloc':
- arch/x86/mm/shadow/common.c:1023:18: error: implicit declaration of function
- 'paging_logdirty_levels'; did you mean 'paging_log_dirty_init'? [-Werror=implicit-function-declaration]
- 1023 | count += paging_logdirty_levels();
- | ^~~~~~~~~~~~~~~~~~~~~~
- | paging_log_dirty_init
- arch/x86/mm/shadow/common.c:1023:18: error: nested extern declaration of 'paging_logdirty_levels' [-Werror=nested-externs]
-
-The '#if PG_log_dirty' expression is currently SHADOW_PAGING && !HVM &&
-PV_SHIM_EXCLUSIVE. Move the declaration outside.
-
-Fixes: 33fb3a661223 ("x86/shadow: account for log-dirty mode when pre-allocating")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 6d14cb105b1c54ad7b4228d858ae85aa8a672bbd
-master date: 2023-03-24 12:16:31 +0000
----
- xen/arch/x86/include/asm/paging.h | 8 ++++----
- 1 file changed, 4 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
-index 635ccc83b1..6f7000d5f4 100644
---- a/xen/arch/x86/include/asm/paging.h
-+++ b/xen/arch/x86/include/asm/paging.h
-@@ -152,6 +152,10 @@ struct paging_mode {
- /*****************************************************************************
- * Log dirty code */
-
-+#define paging_logdirty_levels() \
-+ (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
-+ PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
-+
- #if PG_log_dirty
-
- /* get the dirty bitmap for a specific range of pfns */
-@@ -190,10 +194,6 @@ bool paging_mfn_is_dirty(const struct domain *d, mfn_t gmfn);
- #define L4_LOGDIRTY_IDX(pfn) ((pfn_x(pfn) >> (PAGE_SHIFT + 3 + PAGETABLE_ORDER * 2)) & \
- (LOGDIRTY_NODE_ENTRIES-1))
-
--#define paging_logdirty_levels() \
-- (DIV_ROUND_UP(PADDR_BITS - PAGE_SHIFT - (PAGE_SHIFT + 3), \
-- PAGE_SHIFT - ilog2(sizeof(mfn_t))) + 1)
--
- #ifdef CONFIG_HVM
- /* VRAM dirty tracking support */
- struct sh_dirty_vram {
---
-2.40.0
-
diff --git a/0080-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch b/0080-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
deleted file mode 100644
index 539401f..0000000
--- a/0080-x86-vmx-Don-t-spuriously-crash-the-domain-when-INIT-.patch
+++ /dev/null
@@ -1,51 +0,0 @@
-From f6a3e93b3788aa009e9b86d9cb14c243b958daa9 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:32:57 +0200
-Subject: [PATCH 80/89] x86/vmx: Don't spuriously crash the domain when INIT is
- received
-
-In VMX operation, the handling of INIT IPIs is changed. Instead of the CPU
-resetting, the next VMEntry fails with EXIT_REASON_INIT. From the TXT spec,
-the intent of this behaviour is so that an entity which cares can scrub
-secrets from RAM before participating in an orderly shutdown.
-
-Right now, Xen's behaviour is that when an INIT arrives, the HVM VM which
-schedules next is killed (citing an unknown VMExit), *and* we ignore the INIT
-and continue blindly onwards anyway.
-
-This patch addresses only the first of these two problems by ignoring the INIT
-and continuing without crashing the VM in question.
-
-The second wants addressing too, just as soon as we've figured out something
-better to do...
-
-Discovered as collateral damage from when an AP triple faults on S3 resume on
-Intel TigerLake platforms.
-
-Link: https://github.com/QubesOS/qubes-issues/issues/7283
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: b1f11273d5a774cc88a3685c96c2e7cf6385e3b6
-master date: 2023-03-24 22:49:58 +0000
----
- xen/arch/x86/hvm/vmx/vmx.c | 4 ++++
- 1 file changed, 4 insertions(+)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index a8fb4365ad..64dbd50197 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -4038,6 +4038,10 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- case EXIT_REASON_MCE_DURING_VMENTRY:
- do_machine_check(regs);
- break;
-+
-+ case EXIT_REASON_INIT:
-+ printk(XENLOG_ERR "Error: INIT received - ignoring\n");
-+ return; /* Re-enter the guest without further processing */
- }
-
- /* Now enable interrupts so it's safe to take locks. */
---
-2.40.0
-
diff --git a/0081-x86-ucode-Fix-error-paths-control_thread_fn.patch b/0081-x86-ucode-Fix-error-paths-control_thread_fn.patch
deleted file mode 100644
index 765fa84..0000000
--- a/0081-x86-ucode-Fix-error-paths-control_thread_fn.patch
+++ /dev/null
@@ -1,56 +0,0 @@
-From 7f55774489d2f12a23f2ac0f516b62e2709cea99 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 31 Mar 2023 08:33:28 +0200
-Subject: [PATCH 81/89] x86/ucode: Fix error paths control_thread_fn()
-
-These two early exits skipped re-enabling the watchdog, restoring the NMI
-callback, and clearing the nmi_patch global pointer. Always execute the tail
-of the function on the way out.
-
-Fixes: 8dd4dfa92d62 ("x86/microcode: Synchronize late microcode loading")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: fc2e1f3aad602a66c14b8285a1bd38a82f8fd02d
-master date: 2023-03-28 11:57:56 +0100
----
- xen/arch/x86/cpu/microcode/core.c | 9 +++------
- 1 file changed, 3 insertions(+), 6 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index 2497630bbe..c760723e4f 100644
---- a/xen/arch/x86/cpu/microcode/core.c
-+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -490,10 +490,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
- ret = wait_for_condition(wait_cpu_callin, num_online_cpus(),
- MICROCODE_CALLIN_TIMEOUT_US);
- if ( ret )
-- {
-- set_state(LOADING_EXIT);
-- return ret;
-- }
-+ goto out;
-
- /* Control thread loads ucode first while others are in NMI handler. */
- ret = alternative_call(ucode_ops.apply_microcode, patch);
-@@ -505,8 +502,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
- {
- printk(XENLOG_ERR
- "Late loading aborted: CPU%u failed to update ucode\n", cpu);
-- set_state(LOADING_EXIT);
-- return ret;
-+ goto out;
- }
-
- /* Let primary threads load the given ucode update */
-@@ -537,6 +533,7 @@ static int control_thread_fn(const struct microcode_patch *patch)
- }
- }
-
-+ out:
- /* Mark loading is done to unblock other threads */
- set_state(LOADING_EXIT);
-
---
-2.40.0
-
diff --git a/0082-include-don-t-mention-stub-headers-more-than-once-in.patch b/0082-include-don-t-mention-stub-headers-more-than-once-in.patch
deleted file mode 100644
index cc0a914..0000000
--- a/0082-include-don-t-mention-stub-headers-more-than-once-in.patch
+++ /dev/null
@@ -1,37 +0,0 @@
-From 350693582427887387f21a6eeedaa0ac48aecc3f Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 31 Mar 2023 08:34:04 +0200
-Subject: [PATCH 82/89] include: don't mention stub headers more than once in a
- make rule
-
-When !GRANT_TABLE and !PV_SHIM headers-n contains grant_table.h twice,
-causing make to complain "target '...' given more than once in the same
-rule" for the rule generating the stub headers. We don't need duplicate
-entries in headers-n anywhere, so zap them (by using $(sort ...)) right
-where the final value of the variable is constructed.
-
-Fixes: 6bec713f871f ("include/compat: produce stubs for headers not otherwise generated")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 231ab79704cbb5b9be7700287c3b185225d34f1b
-master date: 2023-03-28 14:20:16 +0200
----
- xen/include/Makefile | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/include/Makefile b/xen/include/Makefile
-index cfd7851614..e19f9464fd 100644
---- a/xen/include/Makefile
-+++ b/xen/include/Makefile
-@@ -34,7 +34,7 @@ headers-$(CONFIG_TRACEBUFFER) += compat/trace.h
- headers-$(CONFIG_XENOPROF) += compat/xenoprof.h
- headers-$(CONFIG_XSM_FLASK) += compat/xsm/flask_op.h
-
--headers-n := $(filter-out $(headers-y),$(headers-n) $(headers-))
-+headers-n := $(sort $(filter-out $(headers-y),$(headers-n) $(headers-)))
-
- cppflags-y := -include public/xen-compat.h -DXEN_GENERATING_COMPAT_HEADERS
- cppflags-$(CONFIG_X86) += -m32
---
-2.40.0
-
diff --git a/0083-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch b/0083-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
deleted file mode 100644
index 8a1f412..0000000
--- a/0083-vpci-msix-handle-accesses-adjacent-to-the-MSI-X-tabl.patch
+++ /dev/null
@@ -1,540 +0,0 @@
-From 85100ed78ca18f188b1ca495f132db7df705f1a4 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Fri, 31 Mar 2023 08:34:26 +0200
-Subject: [PATCH 83/89] vpci/msix: handle accesses adjacent to the MSI-X table
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The handling of the MSI-X table accesses by Xen requires that any
-pages part of the MSI-X related tables are not mapped into the domain
-physmap. As a result, any device registers in the same pages as the
-start or the end of the MSIX or PBA tables are not currently
-accessible, as the accesses are just dropped.
-
-Note the spec forbids such placing of registers, as the MSIX and PBA
-tables must be 4K isolated from any other registers:
-
-"If a Base Address register that maps address space for the MSI-X
-Table or MSI-X PBA also maps other usable address space that is not
-associated with MSI-X structures, locations (e.g., for CSRs) used in
-the other address space must not share any naturally aligned 4-KB
-address range with one where either MSI-X structure resides."
-
-Yet the 'Intel Wi-Fi 6 AX201' device on one of my boxes has registers
-in the same page as the MSIX tables, and thus won't work on a PVH dom0
-without this fix.
-
-In order to cope with the behavior, pass through any accesses that fall
-on the same page as the MSIX tables (but don't fall in between) to the
-underlying hardware. Such forwarding also takes care of the PBA
-accesses, so it allows to remove the code doing this handling in
-msix_{read,write}. Note that as a result accesses to the PBA array
-are no longer limited to 4 and 8 byte sizes, there's no access size
-restriction for PBA accesses documented in the specification.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-
-vpci/msix: restore PBA access length and alignment restrictions
-
-Accesses to the PBA array have the same length and alignment
-limitations as accesses to the MSI-X table:
-
-"For all accesses to MSI-X Table and MSI-X PBA fields, software must
-use aligned full DWORD or aligned full QWORD transactions; otherwise,
-the result is undefined."
-
-Introduce such length and alignment checks into the handling of PBA
-accesses for vPCI. This was a mistake of mine for not reading the
-specification correctly.
-
-Note that accesses must now be aligned, and hence there's no longer a
-need to check that the end of the access falls into the PBA region as
-both the access and the region addresses must be aligned.
-
-Fixes: b177892d2d ('vpci/msix: handle accesses adjacent to the MSI-X table')
-Reported-by: Jan Beulich <jbeulich@suse.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: b177892d2d0e8a31122c218989f43130aeba5282
-master date: 2023-03-28 14:20:35 +0200
-master commit: 7a502b4fbc339e9d3d3d45fb37f09da06bc3081c
-master date: 2023-03-29 14:56:33 +0200
----
- xen/drivers/vpci/msix.c | 353 +++++++++++++++++++++++++++++-----------
- xen/drivers/vpci/vpci.c | 7 +-
- xen/include/xen/vpci.h | 8 +-
- 3 files changed, 273 insertions(+), 95 deletions(-)
-
-diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
-index bea0cc7aed..cafddcf305 100644
---- a/xen/drivers/vpci/msix.c
-+++ b/xen/drivers/vpci/msix.c
-@@ -27,6 +27,11 @@
- ((addr) >= vmsix_table_addr(vpci, nr) && \
- (addr) < vmsix_table_addr(vpci, nr) + vmsix_table_size(vpci, nr))
-
-+#define VMSIX_ADDR_SAME_PAGE(addr, vpci, nr) \
-+ (PFN_DOWN(addr) >= PFN_DOWN(vmsix_table_addr(vpci, nr)) && \
-+ PFN_DOWN(addr) <= PFN_DOWN(vmsix_table_addr(vpci, nr) + \
-+ vmsix_table_size(vpci, nr) - 1))
-+
- static uint32_t cf_check control_read(
- const struct pci_dev *pdev, unsigned int reg, void *data)
- {
-@@ -149,7 +154,7 @@ static struct vpci_msix *msix_find(const struct domain *d, unsigned long addr)
-
- for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
- if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
-- VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
-+ VMSIX_ADDR_SAME_PAGE(addr, msix->pdev->vpci, i) )
- return msix;
- }
-
-@@ -182,36 +187,172 @@ static struct vpci_msix_entry *get_entry(struct vpci_msix *msix,
- return &msix->entries[(addr - start) / PCI_MSIX_ENTRY_SIZE];
- }
-
--static void __iomem *get_pba(struct vpci *vpci)
-+static void __iomem *get_table(struct vpci *vpci, unsigned int slot)
- {
- struct vpci_msix *msix = vpci->msix;
-+ paddr_t addr = 0;
-+
-+ ASSERT(spin_is_locked(&vpci->lock));
-+
-+ if ( likely(msix->table[slot]) )
-+ return msix->table[slot];
-+
-+ switch ( slot )
-+ {
-+ case VPCI_MSIX_TBL_TAIL:
-+ addr = vmsix_table_size(vpci, VPCI_MSIX_TABLE);
-+ fallthrough;
-+ case VPCI_MSIX_TBL_HEAD:
-+ addr += vmsix_table_addr(vpci, VPCI_MSIX_TABLE);
-+ break;
-+
-+ case VPCI_MSIX_PBA_TAIL:
-+ addr = vmsix_table_size(vpci, VPCI_MSIX_PBA);
-+ fallthrough;
-+ case VPCI_MSIX_PBA_HEAD:
-+ addr += vmsix_table_addr(vpci, VPCI_MSIX_PBA);
-+ break;
-+
-+ default:
-+ ASSERT_UNREACHABLE();
-+ return NULL;
-+ }
-+
-+ msix->table[slot] = ioremap(round_pgdown(addr), PAGE_SIZE);
-+
-+ return msix->table[slot];
-+}
-+
-+unsigned int get_slot(const struct vpci *vpci, unsigned long addr)
-+{
-+ unsigned long pfn = PFN_DOWN(addr);
-+
- /*
-- * PBA will only be unmapped when the device is deassigned, so access it
-- * without holding the vpci lock.
-+ * The logic below relies on having the tables identity mapped to the guest
-+ * address space, or for the `addr` parameter to be translated into its
-+ * host physical memory address equivalent.
- */
-- void __iomem *pba = read_atomic(&msix->pba);
-
-- if ( likely(pba) )
-- return pba;
-+ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_TABLE)) )
-+ return VPCI_MSIX_TBL_HEAD;
-+ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_TABLE) +
-+ vmsix_table_size(vpci, VPCI_MSIX_TABLE) - 1) )
-+ return VPCI_MSIX_TBL_TAIL;
-+ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_PBA)) )
-+ return VPCI_MSIX_PBA_HEAD;
-+ if ( pfn == PFN_DOWN(vmsix_table_addr(vpci, VPCI_MSIX_PBA) +
-+ vmsix_table_size(vpci, VPCI_MSIX_PBA) - 1) )
-+ return VPCI_MSIX_PBA_TAIL;
-+
-+ ASSERT_UNREACHABLE();
-+ return -1;
-+}
-+
-+static bool adjacent_handle(const struct vpci_msix *msix, unsigned long addr)
-+{
-+ unsigned int i;
-+
-+ if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
-+ return true;
-+
-+ if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_TABLE) )
-+ return false;
-+
-+ for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
-+ if ( VMSIX_ADDR_SAME_PAGE(addr, msix->pdev->vpci, i) )
-+ return true;
-+
-+ return false;
-+}
-
-- pba = ioremap(vmsix_table_addr(vpci, VPCI_MSIX_PBA),
-- vmsix_table_size(vpci, VPCI_MSIX_PBA));
-- if ( !pba )
-- return read_atomic(&msix->pba);
-+static int adjacent_read(const struct domain *d, const struct vpci_msix *msix,
-+ unsigned long addr, unsigned int len,
-+ unsigned long *data)
-+{
-+ const void __iomem *mem;
-+ struct vpci *vpci = msix->pdev->vpci;
-+ unsigned int slot;
-+
-+ *data = ~0ul;
-+
-+ if ( !adjacent_handle(msix, addr + len - 1) )
-+ return X86EMUL_OKAY;
-+
-+ if ( VMSIX_ADDR_IN_RANGE(addr, vpci, VPCI_MSIX_PBA) &&
-+ !access_allowed(msix->pdev, addr, len) )
-+ /* PBA accesses must be aligned and 4 or 8 bytes in size. */
-+ return X86EMUL_OKAY;
-+
-+ slot = get_slot(vpci, addr);
-+ if ( slot >= ARRAY_SIZE(msix->table) )
-+ return X86EMUL_OKAY;
-+
-+ if ( unlikely(!IS_ALIGNED(addr, len)) )
-+ {
-+ unsigned int i;
-+
-+ gprintk(XENLOG_DEBUG, "%pp: unaligned read to MSI-X related page\n",
-+ &msix->pdev->sbdf);
-+
-+ /*
-+ * Split unaligned accesses into byte sized ones. Shouldn't happen in
-+ * the first place, but devices shouldn't have registers in the same 4K
-+ * page as the MSIX tables either.
-+ *
-+ * It's unclear whether this could cause issues if a guest expects
-+ * registers to be accessed atomically, it better use an aligned access
-+ * if it has such expectations.
-+ */
-+ for ( i = 0; i < len; i++ )
-+ {
-+ unsigned long partial = ~0ul;
-+ int rc = adjacent_read(d, msix, addr + i, 1, &partial);
-+
-+ if ( rc != X86EMUL_OKAY )
-+ return rc;
-+
-+ *data &= ~(0xfful << (i * 8));
-+ *data |= (partial & 0xff) << (i * 8);
-+ }
-+
-+ return X86EMUL_OKAY;
-+ }
-
- spin_lock(&vpci->lock);
-- if ( !msix->pba )
-+ mem = get_table(vpci, slot);
-+ if ( !mem )
- {
-- write_atomic(&msix->pba, pba);
- spin_unlock(&vpci->lock);
-+ gprintk(XENLOG_WARNING,
-+ "%pp: unable to map MSI-X page, returning all bits set\n",
-+ &msix->pdev->sbdf);
-+ return X86EMUL_OKAY;
- }
-- else
-+
-+ switch ( len )
- {
-- spin_unlock(&vpci->lock);
-- iounmap(pba);
-+ case 1:
-+ *data = readb(mem + PAGE_OFFSET(addr));
-+ break;
-+
-+ case 2:
-+ *data = readw(mem + PAGE_OFFSET(addr));
-+ break;
-+
-+ case 4:
-+ *data = readl(mem + PAGE_OFFSET(addr));
-+ break;
-+
-+ case 8:
-+ *data = readq(mem + PAGE_OFFSET(addr));
-+ break;
-+
-+ default:
-+ ASSERT_UNREACHABLE();
- }
-+ spin_unlock(&vpci->lock);
-
-- return read_atomic(&msix->pba);
-+ return X86EMUL_OKAY;
- }
-
- static int cf_check msix_read(
-@@ -227,47 +368,11 @@ static int cf_check msix_read(
- if ( !msix )
- return X86EMUL_RETRY;
-
-- if ( !access_allowed(msix->pdev, addr, len) )
-- return X86EMUL_OKAY;
--
-- if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
-- {
-- struct vpci *vpci = msix->pdev->vpci;
-- unsigned int idx = addr - vmsix_table_addr(vpci, VPCI_MSIX_PBA);
-- const void __iomem *pba = get_pba(vpci);
--
-- /*
-- * Access to PBA.
-- *
-- * TODO: note that this relies on having the PBA identity mapped to the
-- * guest address space. If this changes the address will need to be
-- * translated.
-- */
-- if ( !pba )
-- {
-- gprintk(XENLOG_WARNING,
-- "%pp: unable to map MSI-X PBA, report all pending\n",
-- &msix->pdev->sbdf);
-- return X86EMUL_OKAY;
-- }
--
-- switch ( len )
-- {
-- case 4:
-- *data = readl(pba + idx);
-- break;
--
-- case 8:
-- *data = readq(pba + idx);
-- break;
--
-- default:
-- ASSERT_UNREACHABLE();
-- break;
-- }
-+ if ( adjacent_handle(msix, addr) )
-+ return adjacent_read(d, msix, addr, len, data);
-
-+ if ( !access_allowed(msix->pdev, addr, len) )
- return X86EMUL_OKAY;
-- }
-
- spin_lock(&msix->pdev->vpci->lock);
- entry = get_entry(msix, addr);
-@@ -303,56 +408,102 @@ static int cf_check msix_read(
- return X86EMUL_OKAY;
- }
-
--static int cf_check msix_write(
-- struct vcpu *v, unsigned long addr, unsigned int len, unsigned long data)
-+static int adjacent_write(const struct domain *d, const struct vpci_msix *msix,
-+ unsigned long addr, unsigned int len,
-+ unsigned long data)
- {
-- const struct domain *d = v->domain;
-- struct vpci_msix *msix = msix_find(d, addr);
-- struct vpci_msix_entry *entry;
-- unsigned int offset;
-+ void __iomem *mem;
-+ struct vpci *vpci = msix->pdev->vpci;
-+ unsigned int slot;
-
-- if ( !msix )
-- return X86EMUL_RETRY;
-+ if ( !adjacent_handle(msix, addr + len - 1) )
-+ return X86EMUL_OKAY;
-
-- if ( !access_allowed(msix->pdev, addr, len) )
-+ /*
-+ * Only check start and end of the access because the size of the PBA is
-+ * assumed to be equal or bigger (8 bytes) than the length of any access
-+ * handled here.
-+ */
-+ if ( VMSIX_ADDR_IN_RANGE(addr, vpci, VPCI_MSIX_PBA) &&
-+ (!access_allowed(msix->pdev, addr, len) || !is_hardware_domain(d)) )
-+ /* Ignore writes to PBA for DomUs, it's undefined behavior. */
- return X86EMUL_OKAY;
-
-- if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
-+ slot = get_slot(vpci, addr);
-+ if ( slot >= ARRAY_SIZE(msix->table) )
-+ return X86EMUL_OKAY;
-+
-+ if ( unlikely(!IS_ALIGNED(addr, len)) )
- {
-- struct vpci *vpci = msix->pdev->vpci;
-- unsigned int idx = addr - vmsix_table_addr(vpci, VPCI_MSIX_PBA);
-- const void __iomem *pba = get_pba(vpci);
-+ unsigned int i;
-
-- if ( !is_hardware_domain(d) )
-- /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
-- return X86EMUL_OKAY;
-+ gprintk(XENLOG_DEBUG, "%pp: unaligned write to MSI-X related page\n",
-+ &msix->pdev->sbdf);
-
-- if ( !pba )
-+ for ( i = 0; i < len; i++ )
- {
-- /* Unable to map the PBA, ignore write. */
-- gprintk(XENLOG_WARNING,
-- "%pp: unable to map MSI-X PBA, write ignored\n",
-- &msix->pdev->sbdf);
-- return X86EMUL_OKAY;
-+ int rc = adjacent_write(d, msix, addr + i, 1, data >> (i * 8));
-+
-+ if ( rc != X86EMUL_OKAY )
-+ return rc;
- }
-
-- switch ( len )
-- {
-- case 4:
-- writel(data, pba + idx);
-- break;
-+ return X86EMUL_OKAY;
-+ }
-
-- case 8:
-- writeq(data, pba + idx);
-- break;
-+ spin_lock(&vpci->lock);
-+ mem = get_table(vpci, slot);
-+ if ( !mem )
-+ {
-+ spin_unlock(&vpci->lock);
-+ gprintk(XENLOG_WARNING,
-+ "%pp: unable to map MSI-X page, dropping write\n",
-+ &msix->pdev->sbdf);
-+ return X86EMUL_OKAY;
-+ }
-
-- default:
-- ASSERT_UNREACHABLE();
-- break;
-- }
-+ switch ( len )
-+ {
-+ case 1:
-+ writeb(data, mem + PAGE_OFFSET(addr));
-+ break;
-
-- return X86EMUL_OKAY;
-+ case 2:
-+ writew(data, mem + PAGE_OFFSET(addr));
-+ break;
-+
-+ case 4:
-+ writel(data, mem + PAGE_OFFSET(addr));
-+ break;
-+
-+ case 8:
-+ writeq(data, mem + PAGE_OFFSET(addr));
-+ break;
-+
-+ default:
-+ ASSERT_UNREACHABLE();
- }
-+ spin_unlock(&vpci->lock);
-+
-+ return X86EMUL_OKAY;
-+}
-+
-+static int cf_check msix_write(
-+ struct vcpu *v, unsigned long addr, unsigned int len, unsigned long data)
-+{
-+ const struct domain *d = v->domain;
-+ struct vpci_msix *msix = msix_find(d, addr);
-+ struct vpci_msix_entry *entry;
-+ unsigned int offset;
-+
-+ if ( !msix )
-+ return X86EMUL_RETRY;
-+
-+ if ( adjacent_handle(msix, addr) )
-+ return adjacent_write(d, msix, addr, len, data);
-+
-+ if ( !access_allowed(msix->pdev, addr, len) )
-+ return X86EMUL_OKAY;
-
- spin_lock(&msix->pdev->vpci->lock);
- entry = get_entry(msix, addr);
-@@ -482,6 +633,26 @@ int vpci_make_msix_hole(const struct pci_dev *pdev)
- }
- }
-
-+ if ( is_hardware_domain(d) )
-+ {
-+ /*
-+ * For dom0 only: remove any hypervisor mappings of the MSIX or PBA
-+ * related areas, as dom0 is capable of moving the position of the BARs
-+ * in the host address space.
-+ *
-+ * We rely on being called with the vPCI lock held once the domain is
-+ * running, so the maps are not in use.
-+ */
-+ for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->table); i++ )
-+ if ( pdev->vpci->msix->table[i] )
-+ {
-+ /* If there are any maps, the domain must be running. */
-+ ASSERT(spin_is_locked(&pdev->vpci->lock));
-+ iounmap(pdev->vpci->msix->table[i]);
-+ pdev->vpci->msix->table[i] = NULL;
-+ }
-+ }
-+
- return 0;
- }
-
-diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
-index 6d48d496bb..652807a4a4 100644
---- a/xen/drivers/vpci/vpci.c
-+++ b/xen/drivers/vpci/vpci.c
-@@ -54,9 +54,12 @@ void vpci_remove_device(struct pci_dev *pdev)
- spin_unlock(&pdev->vpci->lock);
- if ( pdev->vpci->msix )
- {
-+ unsigned int i;
-+
- list_del(&pdev->vpci->msix->next);
-- if ( pdev->vpci->msix->pba )
-- iounmap(pdev->vpci->msix->pba);
-+ for ( i = 0; i < ARRAY_SIZE(pdev->vpci->msix->table); i++ )
-+ if ( pdev->vpci->msix->table[i] )
-+ iounmap(pdev->vpci->msix->table[i]);
- }
- xfree(pdev->vpci->msix);
- xfree(pdev->vpci->msi);
-diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
-index d8acfeba8a..0b8a2a3c74 100644
---- a/xen/include/xen/vpci.h
-+++ b/xen/include/xen/vpci.h
-@@ -133,8 +133,12 @@ struct vpci {
- bool enabled : 1;
- /* Masked? */
- bool masked : 1;
-- /* PBA map */
-- void __iomem *pba;
-+ /* Partial table map. */
-+#define VPCI_MSIX_TBL_HEAD 0
-+#define VPCI_MSIX_TBL_TAIL 1
-+#define VPCI_MSIX_PBA_HEAD 2
-+#define VPCI_MSIX_PBA_TAIL 3
-+ void __iomem *table[4];
- /* Entries. */
- struct vpci_msix_entry {
- uint64_t addr;
---
-2.40.0
-
diff --git a/0084-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch b/0084-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
deleted file mode 100644
index 6ab5c69..0000000
--- a/0084-ns16550-correct-name-value-pair-parsing-for-PCI-port.patch
+++ /dev/null
@@ -1,59 +0,0 @@
-From 7758cd57e002c5096b2296ede67c59fca68724d7 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 31 Mar 2023 08:35:15 +0200
-Subject: [PATCH 84/89] ns16550: correct name/value pair parsing for PCI
- port/bridge
-
-First of all these were inverted: "bridge=" caused the port coordinates
-to be established, while "port=" controlled the bridge coordinates. And
-then the error messages being identical also wasn't helpful. While
-correcting this also move both case blocks close together.
-
-Fixes: 97fd49a7e074 ("ns16550: add support for UART parameters to be specifed with name-value pairs")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: e692b22230b411d762ac9e278a398e28df474eae
-master date: 2023-03-29 14:55:37 +0200
----
- xen/drivers/char/ns16550.c | 16 ++++++++--------
- 1 file changed, 8 insertions(+), 8 deletions(-)
-
-diff --git a/xen/drivers/char/ns16550.c b/xen/drivers/char/ns16550.c
-index ce013fb6a5..97b3d8d269 100644
---- a/xen/drivers/char/ns16550.c
-+++ b/xen/drivers/char/ns16550.c
-@@ -1631,13 +1631,6 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
- break;
-
- #ifdef CONFIG_HAS_PCI
-- case bridge_bdf:
-- if ( !parse_pci(param_value, NULL, &uart->ps_bdf[0],
-- &uart->ps_bdf[1], &uart->ps_bdf[2]) )
-- PARSE_ERR_RET("Bad port PCI coordinates\n");
-- uart->ps_bdf_enable = true;
-- break;
--
- case device:
- if ( strncmp(param_value, "pci", 3) == 0 )
- {
-@@ -1652,9 +1645,16 @@ static bool __init parse_namevalue_pairs(char *str, struct ns16550 *uart)
- break;
-
- case port_bdf:
-+ if ( !parse_pci(param_value, NULL, &uart->ps_bdf[0],
-+ &uart->ps_bdf[1], &uart->ps_bdf[2]) )
-+ PARSE_ERR_RET("Bad port PCI coordinates\n");
-+ uart->ps_bdf_enable = true;
-+ break;
-+
-+ case bridge_bdf:
- if ( !parse_pci(param_value, NULL, &uart->pb_bdf[0],
- &uart->pb_bdf[1], &uart->pb_bdf[2]) )
-- PARSE_ERR_RET("Bad port PCI coordinates\n");
-+ PARSE_ERR_RET("Bad bridge PCI coordinates\n");
- uart->pb_bdf_enable = true;
- break;
- #endif
---
-2.40.0
-
diff --git a/0085-CI-Drop-automation-configs.patch b/0085-CI-Drop-automation-configs.patch
deleted file mode 100644
index bfed25a..0000000
--- a/0085-CI-Drop-automation-configs.patch
+++ /dev/null
@@ -1,87 +0,0 @@
-From 4c0d792675f0843c6dd52acdae38e5c0e112b09e Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 29 Dec 2022 15:39:13 +0000
-Subject: [PATCH 85/89] CI: Drop automation/configs/
-
-Having 3 extra hypervisor builds on the end of a full build is deeply
-confusing to debug if one of them fails, because the .config file presented in
-the artefacts is not the one which caused a build failure. Also, the log
-tends to be truncated in the UI.
-
-PV-only is tested as part of PV-Shim in a full build anyway, so doesn't need
-repeating. HVM-only and neither appear frequently in randconfig, so drop all
-the logic here to simplify things.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Michal Orzel <michal.orzel@amd.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-(cherry picked from commit 7b20009a812f26e74bdbde2ab96165376b3dad34)
----
- automation/configs/x86/hvm_only_config | 3 ---
- automation/configs/x86/no_hvm_pv_config | 3 ---
- automation/configs/x86/pv_only_config | 3 ---
- automation/scripts/build | 21 ---------------------
- 4 files changed, 30 deletions(-)
- delete mode 100644 automation/configs/x86/hvm_only_config
- delete mode 100644 automation/configs/x86/no_hvm_pv_config
- delete mode 100644 automation/configs/x86/pv_only_config
-
-diff --git a/automation/configs/x86/hvm_only_config b/automation/configs/x86/hvm_only_config
-deleted file mode 100644
-index 9efbddd535..0000000000
---- a/automation/configs/x86/hvm_only_config
-+++ /dev/null
-@@ -1,3 +0,0 @@
--CONFIG_HVM=y
--# CONFIG_PV is not set
--# CONFIG_DEBUG is not set
-diff --git a/automation/configs/x86/no_hvm_pv_config b/automation/configs/x86/no_hvm_pv_config
-deleted file mode 100644
-index 0bf6a8e468..0000000000
---- a/automation/configs/x86/no_hvm_pv_config
-+++ /dev/null
-@@ -1,3 +0,0 @@
--# CONFIG_HVM is not set
--# CONFIG_PV is not set
--# CONFIG_DEBUG is not set
-diff --git a/automation/configs/x86/pv_only_config b/automation/configs/x86/pv_only_config
-deleted file mode 100644
-index e9d8b4a7c7..0000000000
---- a/automation/configs/x86/pv_only_config
-+++ /dev/null
-@@ -1,3 +0,0 @@
--CONFIG_PV=y
--# CONFIG_HVM is not set
--# CONFIG_DEBUG is not set
-diff --git a/automation/scripts/build b/automation/scripts/build
-index a593419063..5dafa72ba5 100755
---- a/automation/scripts/build
-+++ b/automation/scripts/build
-@@ -85,24 +85,3 @@ if [[ "${XEN_TARGET_ARCH}" != "x86_32" ]]; then
- cp -r dist binaries/
- fi
- fi
--
--if [[ "${hypervisor_only}" == "y" ]]; then
-- # If we are build testing a specific Kconfig exit now, there's no point in
-- # testing all the possible configs.
-- exit 0
--fi
--
--# Build all the configs we care about
--case ${XEN_TARGET_ARCH} in
-- x86_64) arch=x86 ;;
-- *) exit 0 ;;
--esac
--
--cfg_dir="automation/configs/${arch}"
--for cfg in `ls ${cfg_dir}`; do
-- echo "Building $cfg"
-- make -j$(nproc) -C xen clean
-- rm -f xen/.config
-- make -C xen KBUILD_DEFCONFIG=../../../../${cfg_dir}/${cfg} defconfig
-- make -j$(nproc) -C xen
--done
---
-2.40.0
-
diff --git a/0086-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch b/0086-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
deleted file mode 100644
index a200cab..0000000
--- a/0086-automation-Switch-arm32-cross-builds-to-run-on-arm64.patch
+++ /dev/null
@@ -1,87 +0,0 @@
-From e3b23da4a10fafdabce22e2eba225d9404fc646f Mon Sep 17 00:00:00 2001
-From: Michal Orzel <michal.orzel@amd.com>
-Date: Tue, 14 Feb 2023 16:38:38 +0100
-Subject: [PATCH 86/89] automation: Switch arm32 cross builds to run on arm64
-
-Due to the limited x86 CI resources slowing down the whole pipeline,
-switch the arm32 cross builds to be executed on arm64 which is much more
-capable. For that, rename the existing debian container dockerfile
-from unstable-arm32-gcc to unstable-arm64v8-arm32-gcc and use
-arm64v8/debian:unstable as an image. Note, that we cannot use the same
-container name as we have to keep the backwards compatibility.
-Take the opportunity to remove extra empty line at the end of a file.
-
-Modify the tag of .arm32-cross-build-tmpl to arm64 and update the build
-jobs accordingly.
-
-Signed-off-by: Michal Orzel <michal.orzel@amd.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-(cherry picked from commit a35fccc8df93de7154dba87db6e7bcf391e9d51c)
----
- ...ockerfile => unstable-arm64v8-arm32-gcc.dockerfile} | 3 +--
- automation/gitlab-ci/build.yaml | 10 +++++-----
- 2 files changed, 6 insertions(+), 7 deletions(-)
- rename automation/build/debian/{unstable-arm32-gcc.dockerfile => unstable-arm64v8-arm32-gcc.dockerfile} (94%)
-
-diff --git a/automation/build/debian/unstable-arm32-gcc.dockerfile b/automation/build/debian/unstable-arm64v8-arm32-gcc.dockerfile
-similarity index 94%
-rename from automation/build/debian/unstable-arm32-gcc.dockerfile
-rename to automation/build/debian/unstable-arm64v8-arm32-gcc.dockerfile
-index b41a57f197..11860425a6 100644
---- a/automation/build/debian/unstable-arm32-gcc.dockerfile
-+++ b/automation/build/debian/unstable-arm64v8-arm32-gcc.dockerfile
-@@ -1,4 +1,4 @@
--FROM debian:unstable
-+FROM arm64v8/debian:unstable
- LABEL maintainer.name="The Xen Project" \
- maintainer.email="xen-devel@lists.xenproject.org"
-
-@@ -21,4 +21,3 @@ RUN apt-get update && \
- apt-get autoremove -y && \
- apt-get clean && \
- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
--
-diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index bed161b471..b4caf159f9 100644
---- a/automation/gitlab-ci/build.yaml
-+++ b/automation/gitlab-ci/build.yaml
-@@ -123,7 +123,7 @@
- variables:
- XEN_TARGET_ARCH: arm32
- tags:
-- - x86_64
-+ - arm64
-
- .arm32-cross-build:
- extends: .arm32-cross-build-tmpl
-@@ -505,23 +505,23 @@ alpine-3.12-clang-debug:
- debian-unstable-gcc-arm32:
- extends: .gcc-arm32-cross-build
- variables:
-- CONTAINER: debian:unstable-arm32-gcc
-+ CONTAINER: debian:unstable-arm64v8-arm32-gcc
-
- debian-unstable-gcc-arm32-debug:
- extends: .gcc-arm32-cross-build-debug
- variables:
-- CONTAINER: debian:unstable-arm32-gcc
-+ CONTAINER: debian:unstable-arm64v8-arm32-gcc
-
- debian-unstable-gcc-arm32-randconfig:
- extends: .gcc-arm32-cross-build
- variables:
-- CONTAINER: debian:unstable-arm32-gcc
-+ CONTAINER: debian:unstable-arm64v8-arm32-gcc
- RANDCONFIG: y
-
- debian-unstable-gcc-arm32-debug-randconfig:
- extends: .gcc-arm32-cross-build-debug
- variables:
-- CONTAINER: debian:unstable-arm32-gcc
-+ CONTAINER: debian:unstable-arm64v8-arm32-gcc
- RANDCONFIG: y
-
- # Arm builds
---
-2.40.0
-
diff --git a/0087-automation-Remove-CentOS-7.2-containers-and-builds.patch b/0087-automation-Remove-CentOS-7.2-containers-and-builds.patch
deleted file mode 100644
index b5d629d..0000000
--- a/0087-automation-Remove-CentOS-7.2-containers-and-builds.patch
+++ /dev/null
@@ -1,145 +0,0 @@
-From 8c414bab3092bb68ab4eaaba39b61e3804c45f0a Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 21 Feb 2023 16:55:36 +0000
-Subject: [PATCH 87/89] automation: Remove CentOS 7.2 containers and builds
-
-We already have a container which track the latest CentOS 7, no need
-for this one as well.
-
-Also, 7.2 have outdated root certificate which prevent connection to
-website which use Let's Encrypt.
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit ba512629f76dfddb39ea9133ee51cdd9e392a927)
----
- automation/build/centos/7.2.dockerfile | 52 -------------------------
- automation/build/centos/CentOS-7.2.repo | 35 -----------------
- automation/gitlab-ci/build.yaml | 10 -----
- 3 files changed, 97 deletions(-)
- delete mode 100644 automation/build/centos/7.2.dockerfile
- delete mode 100644 automation/build/centos/CentOS-7.2.repo
-
-diff --git a/automation/build/centos/7.2.dockerfile b/automation/build/centos/7.2.dockerfile
-deleted file mode 100644
-index 4baa097e31..0000000000
---- a/automation/build/centos/7.2.dockerfile
-+++ /dev/null
-@@ -1,52 +0,0 @@
--FROM centos:7.2.1511
--LABEL maintainer.name="The Xen Project" \
-- maintainer.email="xen-devel@lists.xenproject.org"
--
--# ensure we only get bits from the vault for
--# the version we want
--COPY CentOS-7.2.repo /etc/yum.repos.d/CentOS-Base.repo
--
--# install EPEL for dev86, xz-devel and possibly other packages
--RUN yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm && \
-- yum clean all
--
--RUN mkdir /build
--WORKDIR /build
--
--# work around https://github.com/moby/moby/issues/10180
--# and install Xen depends
--RUN rpm --rebuilddb && \
-- yum -y install \
-- yum-plugin-ovl \
-- gcc \
-- gcc-c++ \
-- ncurses-devel \
-- zlib-devel \
-- openssl-devel \
-- python-devel \
-- libuuid-devel \
-- pkgconfig \
-- # gettext for Xen < 4.13
-- gettext \
-- flex \
-- bison \
-- libaio-devel \
-- glib2-devel \
-- yajl-devel \
-- pixman-devel \
-- glibc-devel \
-- # glibc-devel.i686 for Xen < 4.15
-- glibc-devel.i686 \
-- make \
-- binutils \
-- git \
-- wget \
-- acpica-tools \
-- python-markdown \
-- patch \
-- checkpolicy \
-- dev86 \
-- xz-devel \
-- bzip2 \
-- nasm \
-- && yum clean all
-diff --git a/automation/build/centos/CentOS-7.2.repo b/automation/build/centos/CentOS-7.2.repo
-deleted file mode 100644
-index 4da27faeb5..0000000000
---- a/automation/build/centos/CentOS-7.2.repo
-+++ /dev/null
-@@ -1,35 +0,0 @@
--# CentOS-Base.repo
--#
--# This is a replacement file that pins things to just use CentOS 7.2
--# from the CentOS Vault.
--#
--
--[base]
--name=CentOS-7.2.1511 - Base
--baseurl=http://vault.centos.org/7.2.1511/os/$basearch/
--gpgcheck=1
--gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
--
--#released updates
--[updates]
--name=CentOS-7.2.1511 - Updates
--baseurl=http://vault.centos.org/7.2.1511/updates/$basearch/
--gpgcheck=1
--gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
--
--#additional packages that may be useful
--[extras]
--name=CentOS-7.2.1511 - Extras
--baseurl=http://vault.centos.org/7.2.1511/extras/$basearch/
--gpgcheck=1
--gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
--
--#additional packages that extend functionality of existing packages
--[centosplus]
--name=CentOS-7.2.1511 - Plus
--baseurl=http://vault.centos.org/7.2.1511/centosplus/$basearch/
--gpgcheck=1
--gpgcheck=1
--enabled=0
--gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
--
-diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index b4caf159f9..ff6df1cfc2 100644
---- a/automation/gitlab-ci/build.yaml
-+++ b/automation/gitlab-ci/build.yaml
-@@ -184,16 +184,6 @@ archlinux-gcc-debug:
- variables:
- CONTAINER: archlinux:current
-
--centos-7-2-gcc:
-- extends: .gcc-x86-64-build
-- variables:
-- CONTAINER: centos:7.2
--
--centos-7-2-gcc-debug:
-- extends: .gcc-x86-64-build-debug
-- variables:
-- CONTAINER: centos:7.2
--
- centos-7-gcc:
- extends: .gcc-x86-64-build
- variables:
---
-2.40.0
-
diff --git a/0088-automation-Remove-non-debug-x86_32-build-jobs.patch b/0088-automation-Remove-non-debug-x86_32-build-jobs.patch
deleted file mode 100644
index d16014e..0000000
--- a/0088-automation-Remove-non-debug-x86_32-build-jobs.patch
+++ /dev/null
@@ -1,67 +0,0 @@
-From 435a1e5e8fd6fbd52cc16570dcff5982bdbec351 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Fri, 24 Feb 2023 17:29:15 +0000
-Subject: [PATCH 88/89] automation: Remove non-debug x86_32 build jobs
-
-In the interest of having less jobs, we remove the x86_32 build jobs
-that do release build. Debug build is very likely to be enough to find
-32bit build issues.
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit 7b66792ea7f77fb9e587e1e9c530a7c869eecba1)
----
- automation/gitlab-ci/build.yaml | 20 --------------------
- 1 file changed, 20 deletions(-)
-
-diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index ff6df1cfc2..eea517aa0a 100644
---- a/automation/gitlab-ci/build.yaml
-+++ b/automation/gitlab-ci/build.yaml
-@@ -264,21 +264,11 @@ debian-stretch-gcc-debug:
- variables:
- CONTAINER: debian:stretch
-
--debian-stretch-32-clang:
-- extends: .clang-x86-32-build
-- variables:
-- CONTAINER: debian:stretch-i386
--
- debian-stretch-32-clang-debug:
- extends: .clang-x86-32-build-debug
- variables:
- CONTAINER: debian:stretch-i386
-
--debian-stretch-32-gcc:
-- extends: .gcc-x86-32-build
-- variables:
-- CONTAINER: debian:stretch-i386
--
- debian-stretch-32-gcc-debug:
- extends: .gcc-x86-32-build-debug
- variables:
-@@ -324,21 +314,11 @@ debian-unstable-gcc-debug-randconfig:
- CONTAINER: debian:unstable
- RANDCONFIG: y
-
--debian-unstable-32-clang:
-- extends: .clang-x86-32-build
-- variables:
-- CONTAINER: debian:unstable-i386
--
- debian-unstable-32-clang-debug:
- extends: .clang-x86-32-build-debug
- variables:
- CONTAINER: debian:unstable-i386
-
--debian-unstable-32-gcc:
-- extends: .gcc-x86-32-build
-- variables:
-- CONTAINER: debian:unstable-i386
--
- debian-unstable-32-gcc-debug:
- extends: .gcc-x86-32-build-debug
- variables:
---
-2.40.0
-
diff --git a/0089-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch b/0089-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
deleted file mode 100644
index c0294ec..0000000
--- a/0089-CI-Remove-llvm-8-from-the-Debian-Stretch-container.patch
+++ /dev/null
@@ -1,103 +0,0 @@
-From e4a5fb9227889bec99ab212b839680f4d5b51e60 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 24 Mar 2023 17:59:56 +0000
-Subject: [PATCH 89/89] CI: Remove llvm-8 from the Debian Stretch container
-
-For similar reasons to c/s a6b1e2b80fe20. While this container is still
-build-able for now, all the other problems with explicitly-versioned compilers
-remain.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-(cherry picked from commit 7a298375721636290a57f31bb0f7c2a5a38956a4)
----
- automation/build/debian/stretch-llvm-8.list | 3 ---
- automation/build/debian/stretch.dockerfile | 12 ---------
- automation/gitlab-ci/build.yaml | 27 ---------------------
- 3 files changed, 42 deletions(-)
- delete mode 100644 automation/build/debian/stretch-llvm-8.list
-
-diff --git a/automation/build/debian/stretch-llvm-8.list b/automation/build/debian/stretch-llvm-8.list
-deleted file mode 100644
-index 09fe843fb2..0000000000
---- a/automation/build/debian/stretch-llvm-8.list
-+++ /dev/null
-@@ -1,3 +0,0 @@
--# Strech LLVM 8 repos
--deb http://apt.llvm.org/stretch/ llvm-toolchain-stretch-8 main
--deb-src http://apt.llvm.org/stretch/ llvm-toolchain-stretch-8 main
-diff --git a/automation/build/debian/stretch.dockerfile b/automation/build/debian/stretch.dockerfile
-index da6aa874dd..9861acbcc3 100644
---- a/automation/build/debian/stretch.dockerfile
-+++ b/automation/build/debian/stretch.dockerfile
-@@ -53,15 +53,3 @@ RUN apt-get update && \
- apt-get autoremove -y && \
- apt-get clean && \
- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
--
--RUN wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | apt-key add -
--COPY stretch-llvm-8.list /etc/apt/sources.list.d/
--
--RUN apt-get update && \
-- apt-get --quiet --yes install \
-- clang-8 \
-- lld-8 \
-- && \
-- apt-get autoremove -y && \
-- apt-get clean && \
-- rm -rf /var/lib/apt/lists* /tmp/* /var/tmp/*
-diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
-index eea517aa0a..802449cb96 100644
---- a/automation/gitlab-ci/build.yaml
-+++ b/automation/gitlab-ci/build.yaml
-@@ -27,13 +27,6 @@
- CXX: clang++
- clang: y
-
--.clang-8-tmpl:
-- variables: &clang-8
-- CC: clang-8
-- CXX: clang++-8
-- LD: ld.lld-8
-- clang: y
--
- .x86-64-build-tmpl:
- <<: *build
- variables:
-@@ -98,16 +91,6 @@
- variables:
- <<: *clang
-
--.clang-8-x86-64-build:
-- extends: .x86-64-build
-- variables:
-- <<: *clang-8
--
--.clang-8-x86-64-build-debug:
-- extends: .x86-64-build-debug
-- variables:
-- <<: *clang-8
--
- .clang-x86-32-build:
- extends: .x86-32-build
- variables:
-@@ -244,16 +227,6 @@ debian-stretch-clang-debug:
- variables:
- CONTAINER: debian:stretch
-
--debian-stretch-clang-8:
-- extends: .clang-8-x86-64-build
-- variables:
-- CONTAINER: debian:stretch
--
--debian-stretch-clang-8-debug:
-- extends: .clang-8-x86-64-build-debug
-- variables:
-- CONTAINER: debian:stretch
--
- debian-stretch-gcc:
- extends: .gcc-x86-64-build
- variables:
---
-2.40.0
-
diff --git a/info.txt b/info.txt
index 45b2f7f..26a1905 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #0 for 4.17.1-pre
+Xen upstream patchset #0 for 4.17.3-pre
Containing patches from
-RELEASE-4.17.0 (5556ac9bf224ed6b977f214653b234de45dcdfbf)
+RELEASE-4.17.2 (b86c313a4a9c3ec4c9f825d9b99131753296485f)
to
-staging-4.17 (e4a5fb9227889bec99ab212b839680f4d5b51e60)
+staging-4.17 (0b56bed864ca9b572473957f0254aefa797216f2)
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2024-02-03 18:12 Tomáš Mózes
0 siblings, 0 replies; 11+ messages in thread
From: Tomáš Mózes @ 2024-02-03 18:12 UTC (permalink / raw
To: gentoo-commits
commit: 0fbc09bbe820146fd857c79bb150028703342c87
Author: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
AuthorDate: Sat Feb 3 18:12:02 2024 +0000
Commit: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
CommitDate: Sat Feb 3 18:12:02 2024 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=0fbc09bb
Xen 4.17.4-pre-patchset-0
Signed-off-by: Tomáš Mózes <hydrapolic <AT> gmail.com>
... => 0001-update-Xen-version-to-4.17.4-pre.patch | 14 +-
...vice-assignment-if-phantom-functions-cann.patch | 91 ++++
...ild-with-old-gcc-after-CPU-policy-changes.patch | 84 ----
0003-VT-d-Fix-else-vs-endif-misplacement.patch | 70 ++++
...EN_LIB_DIR-to-store-bootloader-from-pygru.patch | 45 --
0004-build-define-ARCH-and-SRCARCH-later.patch | 67 ---
...end-CPU-erratum-1474-fix-to-more-affected.patch | 123 ++++++
0005-CirrusCI-drop-FreeBSD-12.patch | 39 ++
...remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch | 50 ---
...remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch | 123 ------
...nsure-Global-Performance-Counter-Control-.patch | 74 ++++
...ate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch | 58 ---
...vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch | 65 +++
...valuate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch | 50 ---
...vmx-Disallow-the-use-of-inactivity-states.patch | 126 ++++++
...-move-lib-fdt-elf-temp.o-and-their-deps-t.patch | 70 ++++
...ork-wrapping-of-libc-functions-in-test-an.patch | 245 -----------
0010-rombios-Work-around-GCC-issue-99578.patch | 43 --
...m-pt-fix-off-by-one-in-entry-check-assert.patch | 36 ++
0011-rombios-Avoid-using-K-R-function-syntax.patch | 74 ----
0012-rombios-Remove-the-use-of-egrep.patch | 34 --
0013-CI-Resync-FreeBSD-config-with-staging.patch | 62 ---
...-Fix-Wsingle-bit-bitfield-constant-conver.patch | 43 --
0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch | 143 -------
0016-x86-head-check-base-address-alignment.patch | 85 ----
...e-Handle-start-of-day-RUNNING-transitions.patch | 275 ------------
...sanitize-IO-APIC-pins-before-enabling-lap.patch | 113 -----
...-x86-ioapic-add-a-raw-field-to-RTE-struct.patch | 147 -------
...RTE-modifications-must-use-ioapic_write_e.patch | 180 --------
...ename-io_apic_read_remap_rte-local-variab.patch | 64 ---
...ass-full-IO-APIC-RTE-for-remapping-table-.patch | 462 ---------------------
0023-build-correct-gas-noexecstack-check.patch | 34 --
...tly-correct-JSON-generation-of-CPU-policy.patch | 38 --
0025-tboot-Disable-CET-at-shutdown.patch | 53 ---
...-valid-condition-in-svm_get_pending_event.patch | 29 --
...ert-x86-VMX-sanitize-rIP-before-re-enteri.patch | 100 -----
...ix-reporting-of-spurious-i8259-interrupts.patch | 41 --
...e-Handle-cache-flush-of-an-element-at-the.patch | 111 -----
...end-Zenbleed-check-to-models-good-ucode-i.patch | 48 ---
...rl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch | 74 ----
...rl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch | 85 ----
...rl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch | 83 ----
...rl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch | 106 -----
...djust-restore_all_xen-to-hold-stack_end-i.patch | 74 ----
...rack-the-IST-ness-of-an-entry-for-the-exi.patch | 109 -----
...ec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch | 89 ----
...md-Introduce-is_zen-1-2-_uarch-predicates.patch | 91 ----
...6-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch | 228 ----------
...defer-releasing-of-PV-s-top-level-shadow-.patch | 455 --------------------
...ored-domain_entry_fix-Handle-conflicting-.patch | 64 ---
...-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch | 186 ---------
0043-libfsimage-xfs-Remove-dead-code.patch | 71 ----
...-xfs-Amend-mask32lo-to-allow-the-value-32.patch | 33 --
...xfs-Sanity-check-the-superblock-during-mo.patch | 137 ------
...-xfs-Add-compile-time-check-to-libfsimage.patch | 62 ---
...tools-pygrub-Remove-unnecessary-hypercall.patch | 60 ---
0048-tools-pygrub-Small-refactors.patch | 65 ---
...ools-pygrub-Open-the-output-files-earlier.patch | 105 -----
...image-Export-a-new-function-to-preload-al.patch | 126 ------
0051-tools-pygrub-Deprivilege-pygrub.patch | 307 --------------
...upport-for-running-bootloader-in-restrict.patch | 251 -----------
...t-bootloader-execution-in-restricted-mode.patch | 158 -------
...-asymmetry-with-AMD-DR-MASK-context-switc.patch | 104 -----
...ect-the-auditing-of-guest-breakpoint-addr.patch | 86 ----
info.txt | 6 +-
65 files changed, 704 insertions(+), 6120 deletions(-)
diff --git a/0001-update-Xen-version-to-4.17.3-pre.patch b/0001-update-Xen-version-to-4.17.4-pre.patch
similarity index 62%
rename from 0001-update-Xen-version-to-4.17.3-pre.patch
rename to 0001-update-Xen-version-to-4.17.4-pre.patch
index 1be1cd1..b532743 100644
--- a/0001-update-Xen-version-to-4.17.3-pre.patch
+++ b/0001-update-Xen-version-to-4.17.4-pre.patch
@@ -1,25 +1,25 @@
-From 2f337a04bfc2dda794ae0fc108577ec72932f83b Mon Sep 17 00:00:00 2001
+From 4f6e9d4327eb5252f1e8cac97a095d8b8485dadb Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 21 Aug 2023 15:52:13 +0200
-Subject: [PATCH 01/55] update Xen version to 4.17.3-pre
+Date: Tue, 30 Jan 2024 14:36:44 +0100
+Subject: [PATCH 01/10] update Xen version to 4.17.4-pre
---
xen/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/Makefile b/xen/Makefile
-index fbada570b8..f6005bd536 100644
+index a46e6330db..dd0b004e1c 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
export XEN_SUBVERSION = 17
--export XEN_EXTRAVERSION ?= .2$(XEN_VENDORVERSION)
-+export XEN_EXTRAVERSION ?= .3-pre$(XEN_VENDORVERSION)
+-export XEN_EXTRAVERSION ?= .3$(XEN_VENDORVERSION)
++export XEN_EXTRAVERSION ?= .4-pre$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
--
-2.42.0
+2.43.0
diff --git a/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch b/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch
new file mode 100644
index 0000000..d91802f
--- /dev/null
+++ b/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch
@@ -0,0 +1,91 @@
+From f9e1ed51bdba31017ea17e1819eb2ade6b5c8615 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 30 Jan 2024 14:37:39 +0100
+Subject: [PATCH 02/10] pci: fail device assignment if phantom functions cannot
+ be assigned
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current behavior is that no error is reported if (some) phantom functions
+fail to be assigned during device add or assignment, so the operation succeeds
+even if some phantom functions are not correctly setup.
+
+This can lead to devices possibly being successfully assigned to a domU while
+some of the device phantom functions are still assigned to dom0. Even when the
+device is assigned domIO before being assigned to a domU phantom functions
+might fail to be assigned to domIO, and also fail to be assigned to the domU,
+leaving them assigned to dom0.
+
+Since the device can generate requests using the IDs of those phantom
+functions, given the scenario above a device in such state would be in control
+of a domU, but still capable of generating transactions that use a context ID
+targeting dom0 owned memory.
+
+Modify device assign in order to attempt to deassign the device if phantom
+functions failed to be assigned.
+
+Note that device addition is not modified in the same way, as in that case the
+device is assigned to a trusted domain, and hence partial assign can lead to
+device malfunction but not a security issue.
+
+This is XSA-449 / CVE-2023-46839
+
+Fixes: 4e9950dc1bd2 ('IOMMU: add phantom function support')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: cb4ecb3cc17b02c2814bc817efd05f3f3ba33d1e
+master date: 2024-01-30 14:28:01 +0100
+---
+ xen/drivers/passthrough/pci.c | 27 +++++++++++++++++++++------
+ 1 file changed, 21 insertions(+), 6 deletions(-)
+
+diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
+index 07d1986d33..8c62b14d19 100644
+--- a/xen/drivers/passthrough/pci.c
++++ b/xen/drivers/passthrough/pci.c
+@@ -1444,11 +1444,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
+
+ pdev->fault.count = 0;
+
+- if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
+- pci_to_dev(pdev), flag)) )
+- goto done;
++ rc = iommu_call(hd->platform_ops, assign_device, d, devfn, pci_to_dev(pdev),
++ flag);
+
+- for ( ; pdev->phantom_stride; rc = 0 )
++ while ( pdev->phantom_stride && !rc )
+ {
+ devfn += pdev->phantom_stride;
+ if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
+@@ -1459,8 +1458,24 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
+
+ done:
+ if ( rc )
+- printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
+- d, &PCI_SBDF(seg, bus, devfn), rc);
++ {
++ printk(XENLOG_G_WARNING "%pd: assign %s(%pp) failed (%d)\n",
++ d, devfn != pdev->devfn ? "phantom function " : "",
++ &PCI_SBDF(seg, bus, devfn), rc);
++
++ if ( devfn != pdev->devfn && deassign_device(d, seg, bus, pdev->devfn) )
++ {
++ /*
++ * Device with phantom functions that failed to both assign and
++ * rollback. Mark the device as broken and crash the target domain,
++ * as the state of the functions at this point is unknown and Xen
++ * has no way to assert consistent context assignment among them.
++ */
++ pdev->broken = true;
++ if ( !is_hardware_domain(d) && d != dom_io )
++ domain_crash(d);
++ }
++ }
+ /* The device is assigned to dom_io so mark it as quarantined */
+ else if ( d == dom_io )
+ pdev->quarantine = true;
+--
+2.43.0
+
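For context, the rollback pattern the XSA-449 patch above introduces can be reduced to a few lines. The sketch below is a standalone toy model, not Xen code: assign_fn()/deassign_all() are made-up stand-ins for the IOMMU calls, and the failing function index is arbitrary. The point is simply that a partially assigned multi-function device is treated as a hard failure rather than a success.

    /*
     * Standalone toy model (not Xen code) of the XSA-449 fix above: if any
     * phantom function fails to assign, roll the whole device back rather
     * than leaving it split between two domains.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define NFUNCS 4

    static bool assign_fn(unsigned int fn)
    {
        return fn != 2;              /* pretend phantom function 2 fails */
    }

    static void deassign_all(void)
    {
        printf("rolling back: deassigning base function and phantoms\n");
    }

    static int assign_device(void)
    {
        for ( unsigned int fn = 0; fn < NFUNCS; fn++ )
            if ( !assign_fn(fn) )
            {
                /* Partial assignment is unsafe: undo everything. */
                deassign_all();
                return -1;
            }

        return 0;
    }

    int main(void)
    {
        return assign_device() ? 1 : 0;
    }
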
diff --git a/0002-x86-fix-build-with-old-gcc-after-CPU-policy-changes.patch b/0002-x86-fix-build-with-old-gcc-after-CPU-policy-changes.patch
deleted file mode 100644
index 1b62572..0000000
--- a/0002-x86-fix-build-with-old-gcc-after-CPU-policy-changes.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From 7d8897984927a51495e9a1b827aa4bce1d779b87 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 21 Aug 2023 15:53:17 +0200
-Subject: [PATCH 02/55] x86: fix build with old gcc after CPU policy changes
-
-Old gcc won't cope with initializers involving unnamed struct/union
-fields.
-
-Fixes: 441b1b2a50ea ("x86/emul: Switch x86_emulate_ctxt to cpu_policy")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 768846690d64bc730c1a1123e8de3af731bb2eb3
-master date: 2023-04-19 11:02:47 +0200
----
- tools/fuzz/x86_instruction_emulator/fuzz-emul.c | 4 +++-
- xen/arch/x86/pv/emul-priv-op.c | 4 +++-
- xen/arch/x86/pv/ro-page-fault.c | 4 +++-
- 3 files changed, 9 insertions(+), 3 deletions(-)
-
-diff --git a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
-index 4885a68210..eeeb6931f4 100644
---- a/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
-+++ b/tools/fuzz/x86_instruction_emulator/fuzz-emul.c
-@@ -893,12 +893,14 @@ int LLVMFuzzerTestOneInput(const uint8_t *data_p, size_t size)
- struct x86_emulate_ctxt ctxt = {
- .data = &state,
- .regs = &input.regs,
-- .cpu_policy = &cp,
- .addr_size = 8 * sizeof(void *),
- .sp_size = 8 * sizeof(void *),
- };
- int rc;
-
-+ /* Not part of the initializer, for old gcc to cope. */
-+ ctxt.cpu_policy = &cp;
-+
- /* Reset all global state variables */
- memset(&input, 0, sizeof(input));
-
-diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
-index 04416f1979..2c94beb10e 100644
---- a/xen/arch/x86/pv/emul-priv-op.c
-+++ b/xen/arch/x86/pv/emul-priv-op.c
-@@ -1327,12 +1327,14 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
- struct domain *currd = curr->domain;
- struct priv_op_ctxt ctxt = {
- .ctxt.regs = regs,
-- .ctxt.cpu_policy = currd->arch.cpu_policy,
- .ctxt.lma = !is_pv_32bit_domain(currd),
- };
- int rc;
- unsigned int eflags, ar;
-
-+ /* Not part of the initializer, for old gcc to cope. */
-+ ctxt.ctxt.cpu_policy = currd->arch.cpu_policy;
-+
- if ( !pv_emul_read_descriptor(regs->cs, curr, &ctxt.cs.base,
- &ctxt.cs.limit, &ar, 1) ||
- !(ar & _SEGMENT_S) ||
-diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
-index 0d02c7d2ab..f23ad5d184 100644
---- a/xen/arch/x86/pv/ro-page-fault.c
-+++ b/xen/arch/x86/pv/ro-page-fault.c
-@@ -356,7 +356,6 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
- unsigned int addr_size = is_pv_32bit_domain(currd) ? 32 : BITS_PER_LONG;
- struct x86_emulate_ctxt ctxt = {
- .regs = regs,
-- .cpu_policy = currd->arch.cpu_policy,
- .addr_size = addr_size,
- .sp_size = addr_size,
- .lma = addr_size > 32,
-@@ -364,6 +363,9 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
- int rc;
- bool mmio_ro;
-
-+ /* Not part of the initializer, for old gcc to cope. */
-+ ctxt.cpu_policy = currd->arch.cpu_policy;
-+
- /* Attempt to read the PTE that maps the VA being accessed. */
- pte = guest_get_eff_kern_l1e(addr);
-
---
-2.42.0
-
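The gcc limitation described in the patch removed above is easy to demonstrate in isolation. A minimal sketch with made-up types, assuming only that the compiler accepts anonymous struct members:

    /*
     * Minimal illustration (made-up types) of the workaround in the patch
     * above: old gcc rejects designated initializers that name a member of
     * an anonymous struct/union, so such members are assigned afterwards.
     */
    #include <stdio.h>

    struct ctxt {
        struct {
            int cpu_policy;          /* member of an unnamed struct */
        };
        int addr_size;
    };

    int main(void)
    {
        struct ctxt c = {
            /* .cpu_policy = 1, -- would trip old gcc; set it below instead */
            .addr_size = 64,
        };

        /* Not part of the initializer, for old gcc to cope. */
        c.cpu_policy = 1;

        printf("%d %d\n", c.cpu_policy, c.addr_size);
        return 0;
    }
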
diff --git a/0003-VT-d-Fix-else-vs-endif-misplacement.patch b/0003-VT-d-Fix-else-vs-endif-misplacement.patch
new file mode 100644
index 0000000..2e7f78d
--- /dev/null
+++ b/0003-VT-d-Fix-else-vs-endif-misplacement.patch
@@ -0,0 +1,70 @@
+From 6b1864afc14d484cdbc9754ce3172ac3dc189846 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 30 Jan 2024 14:38:38 +0100
+Subject: [PATCH 03/10] VT-d: Fix "else" vs "#endif" misplacement
+
+In domain_pgd_maddr() the "#endif" is misplaced with respect to "else". This
+generates incorrect logic when CONFIG_HVM is compiled out, as the "else" body
+is executed unconditionally.
+
+Rework the logic to use IS_ENABLED() instead of explicit #ifdef-ary, as it's
+clearer to follow. This in turn involves adjusting p2m_get_pagetable() to
+compile when CONFIG_HVM is disabled.
+
+This is XSA-450 / CVE-2023-46840.
+
+Fixes: 033ff90aa9c1 ("x86/P2M: p2m_{alloc,free}_ptp() and p2m_alloc_table() are HVM-only")
+Reported-by: Teddy Astie <teddy.astie@vates.tech>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: cc6ba68edf6dcd18c3865e7d7c0f1ed822796426
+master date: 2024-01-30 14:29:15 +0100
+---
+ xen/arch/x86/include/asm/p2m.h | 9 ++++++++-
+ xen/drivers/passthrough/vtd/iommu.c | 4 +---
+ 2 files changed, 9 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
+index cd43d8621a..4f691533d5 100644
+--- a/xen/arch/x86/include/asm/p2m.h
++++ b/xen/arch/x86/include/asm/p2m.h
+@@ -447,7 +447,14 @@ static inline bool_t p2m_is_altp2m(const struct p2m_domain *p2m)
+ return p2m->p2m_class == p2m_alternate;
+ }
+
+-#define p2m_get_pagetable(p2m) ((p2m)->phys_table)
++#ifdef CONFIG_HVM
++static inline pagetable_t p2m_get_pagetable(const struct p2m_domain *p2m)
++{
++ return p2m->phys_table;
++}
++#else
++pagetable_t p2m_get_pagetable(const struct p2m_domain *p2m);
++#endif
+
+ /*
+ * Ensure any deferred p2m TLB flush has been completed on all VCPUs.
+diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
+index b4c11a6b48..908b3ba6ee 100644
+--- a/xen/drivers/passthrough/vtd/iommu.c
++++ b/xen/drivers/passthrough/vtd/iommu.c
+@@ -441,15 +441,13 @@ static paddr_t domain_pgd_maddr(struct domain *d, paddr_t pgd_maddr,
+
+ if ( pgd_maddr )
+ /* nothing */;
+-#ifdef CONFIG_HVM
+- else if ( iommu_use_hap_pt(d) )
++ else if ( IS_ENABLED(CONFIG_HVM) && iommu_use_hap_pt(d) )
+ {
+ pagetable_t pgt = p2m_get_pagetable(p2m_get_hostp2m(d));
+
+ pgd_maddr = pagetable_get_paddr(pgt);
+ }
+ else
+-#endif
+ {
+ if ( !hd->arch.vtd.pgd_maddr )
+ {
+--
+2.43.0
+
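The control-flow hazard XSA-450 describes is easier to see in a cut-down form. The sketch below is not the VT-d code; FEATURE_ENABLED stands in for CONFIG_HVM and the puts() calls for the real branch bodies.

    /*
     * Cut-down sketch (not the VT-d code) of the XSA-450 bug shape: placing
     * "#endif" between "else if" and "else" makes the final "else" body run
     * unconditionally when FEATURE is compiled out.  Folding the condition
     * into the "else if", IS_ENABLED()-style, keeps all three branches.
     */
    #include <stdbool.h>
    #include <stdio.h>

    #define FEATURE_ENABLED 0        /* pretend CONFIG_HVM is off */

    static void buggy(bool a, bool b)
    {
        if ( a )
            puts("first branch");
    #if FEATURE_ENABLED
        else if ( b )
            puts("feature branch");
        else
    #endif
            puts("fallback");        /* with FEATURE off, runs even when a is true */
    }

    static void fixed(bool a, bool b)
    {
        if ( a )
            puts("first branch");
        else if ( FEATURE_ENABLED && b )
            puts("feature branch");
        else
            puts("fallback");
    }

    int main(void)
    {
        buggy(true, false);          /* prints both lines: the bug */
        fixed(true, false);          /* prints only "first branch" */
        return 0;
    }
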
diff --git a/0003-libxl-Use-XEN_LIB_DIR-to-store-bootloader-from-pygru.patch b/0003-libxl-Use-XEN_LIB_DIR-to-store-bootloader-from-pygru.patch
deleted file mode 100644
index a395d7a..0000000
--- a/0003-libxl-Use-XEN_LIB_DIR-to-store-bootloader-from-pygru.patch
+++ /dev/null
@@ -1,45 +0,0 @@
-From 8d84be5b557b27e9cc53e48285aebad28a48468c Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Mon, 21 Aug 2023 15:53:47 +0200
-Subject: [PATCH 03/55] libxl: Use XEN_LIB_DIR to store bootloader from pygrub
-
-In osstest, the jobs using pygrub on arm64 on the branch linux-linus
-started to fails with:
- [Errno 28] No space left on device
- Error writing temporary copy of ramdisk
-
-This is because /var/run is small when dom0 has only 512MB to work
-with, /var/run is only 40MB. The size of both kernel and ramdisk on
-this jobs is now about 42MB, so not enough space in /var/run.
-
-So, to avoid writing a big binary in ramfs, we will use /var/lib
-instead, like we already do when saving the device model state on
-migration.
-
-Reported-by: Jan Beulich <jbeulich@suse.com>
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
-master commit: ad89640ad766d3cb6c92fc8b6406ca6bbab44136
-master date: 2023-08-08 09:45:20 +0200
----
- tools/libs/light/libxl_bootloader.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/tools/libs/light/libxl_bootloader.c b/tools/libs/light/libxl_bootloader.c
-index 1bc6e51827..108329b4a5 100644
---- a/tools/libs/light/libxl_bootloader.c
-+++ b/tools/libs/light/libxl_bootloader.c
-@@ -245,8 +245,8 @@ static void bootloader_cleanup(libxl__egc *egc, libxl__bootloader_state *bl)
- static void bootloader_setpaths(libxl__gc *gc, libxl__bootloader_state *bl)
- {
- uint32_t domid = bl->domid;
-- bl->outputdir = GCSPRINTF(XEN_RUN_DIR "/bootloader.%"PRIu32".d", domid);
-- bl->outputpath = GCSPRINTF(XEN_RUN_DIR "/bootloader.%"PRIu32".out", domid);
-+ bl->outputdir = GCSPRINTF(XEN_LIB_DIR "/bootloader.%"PRIu32".d", domid);
-+ bl->outputpath = GCSPRINTF(XEN_LIB_DIR "/bootloader.%"PRIu32".out", domid);
- }
-
- /* Callbacks */
---
-2.42.0
-
diff --git a/0004-build-define-ARCH-and-SRCARCH-later.patch b/0004-build-define-ARCH-and-SRCARCH-later.patch
deleted file mode 100644
index aebcbb7..0000000
--- a/0004-build-define-ARCH-and-SRCARCH-later.patch
+++ /dev/null
@@ -1,67 +0,0 @@
-From 1c3927f8f6743538a35aa45a91a2d4adbde9f277 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Wed, 5 Jul 2023 08:25:03 +0200
-Subject: [PATCH 04/55] build: define ARCH and SRCARCH later
-
-Defining ARCH and SRCARCH later in xen/Makefile allows to switch to
-immediate evaluation variable type.
-
-ARCH and SRCARCH depend on value defined in Config.mk and aren't used
-for e.g. TARGET_SUBARCH or TARGET_ARCH, and not before they're needed in
-a sub-make or a rule.
-
-This will help reduce the number of times the shell rune is been
-run.
-
-With GNU make 4.4, the number of execution of the command present in
-these $(shell ) increased greatly. This is probably because as of make
-4.4, exported variable are also added to the environment of $(shell )
-construct.
-
-Also, `make -d` shows a lot of these:
- Makefile:39: not recursively expanding SRCARCH to export to shell function
- Makefile:38: not recursively expanding ARCH to export to shell function
-
-Reported-by: Jason Andryuk <jandryuk@gmail.com>
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Tested-by: Jason Andryuk <jandryuk@gmail.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 58e0a3f3b2c430f8640ef9df67ac857b0008ebc8)
----
- xen/Makefile | 13 +++++++------
- 1 file changed, 7 insertions(+), 6 deletions(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index f6005bd536..7ecfa6e8e9 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -35,12 +35,6 @@ MAKEFLAGS += -rR
-
- EFI_MOUNTPOINT ?= $(BOOT_DIR)/efi
-
--ARCH=$(XEN_TARGET_ARCH)
--SRCARCH=$(shell echo $(ARCH) | \
-- sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
-- -e s'/riscv.*/riscv/g')
--export ARCH SRCARCH
--
- # Allow someone to change their config file
- export KCONFIG_CONFIG ?= .config
-
-@@ -241,6 +235,13 @@ include scripts/Kbuild.include
- include $(XEN_ROOT)/Config.mk
-
- # Set ARCH/SUBARCH appropriately.
-+
-+ARCH := $(XEN_TARGET_ARCH)
-+SRCARCH := $(shell echo $(ARCH) | \
-+ sed -e 's/x86.*/x86/' -e 's/arm\(32\|64\)/arm/g' \
-+ -e 's/riscv.*/riscv/g')
-+export ARCH SRCARCH
-+
- export TARGET_SUBARCH := $(XEN_TARGET_ARCH)
- export TARGET_ARCH := $(shell echo $(XEN_TARGET_ARCH) | \
- sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
---
-2.42.0
-
diff --git a/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch b/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch
new file mode 100644
index 0000000..f1289aa
--- /dev/null
+++ b/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch
@@ -0,0 +1,123 @@
+From abcc32f0634627fe21117a48bd10e792bfbdd6dc Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Fri, 2 Feb 2024 08:01:09 +0100
+Subject: [PATCH 04/10] x86/amd: Extend CPU erratum #1474 fix to more affected
+ models
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Erratum #1474 has now been extended to cover models from family 17h ranges
+00-2Fh, so the errata now covers all the models released under Family
+17h (Zen, Zen+ and Zen2).
+
+Additionally extend the workaround to Family 18h (Hygon), since it's based on
+the Zen architecture and very likely affected.
+
+Rename all the zen2 related symbols to fam17, since the errata doesn't
+exclusively affect Zen2 anymore.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 23db507a01a4ec5259ec0ab43d296a41b1c326ba
+master date: 2023-12-21 12:19:40 +0000
+---
+ xen/arch/x86/cpu/amd.c | 27 ++++++++++++++-------------
+ 1 file changed, 14 insertions(+), 13 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
+index 29ae97e7c0..3d85e9797d 100644
+--- a/xen/arch/x86/cpu/amd.c
++++ b/xen/arch/x86/cpu/amd.c
+@@ -54,7 +54,7 @@ bool __read_mostly amd_acpi_c1e_quirk;
+ bool __ro_after_init amd_legacy_ssbd;
+ bool __initdata amd_virt_spec_ctrl;
+
+-static bool __read_mostly zen2_c6_disabled;
++static bool __read_mostly fam17_c6_disabled;
+
+ static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
+ unsigned int *hi)
+@@ -951,24 +951,24 @@ void amd_check_zenbleed(void)
+ val & chickenbit ? "chickenbit" : "microcode");
+ }
+
+-static void cf_check zen2_disable_c6(void *arg)
++static void cf_check fam17_disable_c6(void *arg)
+ {
+ /* Disable C6 by clearing the CCR{0,1,2}_CC6EN bits. */
+ const uint64_t mask = ~((1ul << 6) | (1ul << 14) | (1ul << 22));
+ uint64_t val;
+
+- if (!zen2_c6_disabled) {
++ if (!fam17_c6_disabled) {
+ printk(XENLOG_WARNING
+ "Disabling C6 after 1000 days apparent uptime due to AMD errata 1474\n");
+- zen2_c6_disabled = true;
++ fam17_c6_disabled = true;
+ /*
+ * Prevent CPU hotplug so that started CPUs will either see
+- * zen2_c6_disabled set, or will be handled by
++ * zen_c6_disabled set, or will be handled by
+ * smp_call_function().
+ */
+ while (!get_cpu_maps())
+ process_pending_softirqs();
+- smp_call_function(zen2_disable_c6, NULL, 0);
++ smp_call_function(fam17_disable_c6, NULL, 0);
+ put_cpu_maps();
+ }
+
+@@ -1273,8 +1273,8 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
+ amd_check_zenbleed();
+ amd_check_erratum_1485();
+
+- if (zen2_c6_disabled)
+- zen2_disable_c6(NULL);
++ if (fam17_c6_disabled)
++ fam17_disable_c6(NULL);
+
+ check_syscfg_dram_mod_en();
+
+@@ -1286,7 +1286,7 @@ const struct cpu_dev amd_cpu_dev = {
+ .c_init = init_amd,
+ };
+
+-static int __init cf_check zen2_c6_errata_check(void)
++static int __init cf_check amd_check_erratum_1474(void)
+ {
+ /*
+ * Errata #1474: A Core May Hang After About 1044 Days
+@@ -1294,7 +1294,8 @@ static int __init cf_check zen2_c6_errata_check(void)
+ */
+ s_time_t delta;
+
+- if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 || !is_zen2_uarch())
++ if (cpu_has_hypervisor ||
++ (boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18))
+ return 0;
+
+ /*
+@@ -1309,10 +1310,10 @@ static int __init cf_check zen2_c6_errata_check(void)
+ if (delta > 0) {
+ static struct timer errata_c6;
+
+- init_timer(&errata_c6, zen2_disable_c6, NULL, 0);
++ init_timer(&errata_c6, fam17_disable_c6, NULL, 0);
+ set_timer(&errata_c6, NOW() + delta);
+ } else
+- zen2_disable_c6(NULL);
++ fam17_disable_c6(NULL);
+
+ return 0;
+ }
+@@ -1320,4 +1321,4 @@ static int __init cf_check zen2_c6_errata_check(void)
+ * Must be executed after early_time_init() for tsc_ticks2ns() to have been
+ * calibrated. That prevents us doing the check in init_amd().
+ */
+-presmp_initcall(zen2_c6_errata_check);
++presmp_initcall(amd_check_erratum_1474);
+--
+2.43.0
+
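The scheduling half of the workaround above boils down to "act now if the deadline has already passed, otherwise arm a one-shot timer". The sketch below models that with plain integers rather than Xen's s_time_t/timer API, so the helper names and values are illustrative only.

    /*
     * Sketch of the scheduling logic in the erratum #1474 workaround above,
     * using plain arithmetic instead of Xen's s_time_t/timer infrastructure.
     * The 1000-day cutoff deliberately undershoots the ~1044-day erratum
     * window, as in the real patch.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define DAY_SECS      (24 * 60 * 60)
    #define CUTOFF_SECS   (1000LL * DAY_SECS)

    static void disable_c6(void)
    {
        puts("disabling C6 (erratum #1474 workaround)");
    }

    static void schedule_workaround(int64_t uptime_secs)
    {
        int64_t delta = CUTOFF_SECS - uptime_secs;

        if ( delta > 0 )
            printf("arming one-shot timer to fire in %lld seconds\n",
                   (long long)delta);
        else
            disable_c6();            /* already past the cutoff: act now */
    }

    int main(void)
    {
        schedule_workaround(5 * DAY_SECS);      /* fresh boot: timer armed */
        schedule_workaround(1001LL * DAY_SECS); /* past cutoff: immediate */
        return 0;
    }
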
diff --git a/0005-CirrusCI-drop-FreeBSD-12.patch b/0005-CirrusCI-drop-FreeBSD-12.patch
new file mode 100644
index 0000000..cca7bb0
--- /dev/null
+++ b/0005-CirrusCI-drop-FreeBSD-12.patch
@@ -0,0 +1,39 @@
+From 0ef1fb43ddd61b3c4c953e833e012ac21ad5ca0f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Fri, 2 Feb 2024 08:01:50 +0100
+Subject: [PATCH 05/10] CirrusCI: drop FreeBSD 12
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Went EOL by the end of December 2023, and the pkg repos have been shut down.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: c2ce3466472e9c9eda79f5dc98eb701bc6fdba20
+master date: 2024-01-15 12:20:11 +0100
+---
+ .cirrus.yml | 6 ------
+ 1 file changed, 6 deletions(-)
+
+diff --git a/.cirrus.yml b/.cirrus.yml
+index 7e0beb200d..63f3afb104 100644
+--- a/.cirrus.yml
++++ b/.cirrus.yml
+@@ -14,12 +14,6 @@ freebsd_template: &FREEBSD_TEMPLATE
+ - ./configure --with-system-seabios=/usr/local/share/seabios/bios.bin
+ - gmake -j`sysctl -n hw.ncpu` clang=y
+
+-task:
+- name: 'FreeBSD 12'
+- freebsd_instance:
+- image_family: freebsd-12-4
+- << : *FREEBSD_TEMPLATE
+-
+ task:
+ name: 'FreeBSD 13'
+ freebsd_instance:
+--
+2.43.0
+
diff --git a/0005-build-remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch b/0005-build-remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch
deleted file mode 100644
index 4f31614..0000000
--- a/0005-build-remove-TARGET_SUBARCH-a-duplicate-of-ARCH.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From 56076ef445073458c39c481f9b70c3b4ff848839 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Wed, 5 Jul 2023 08:27:51 +0200
-Subject: [PATCH 05/55] build: remove TARGET_SUBARCH, a duplicate of ARCH
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit a6ab7dd061338c33faef629cbe52ed1608571d84)
----
- xen/Makefile | 3 +--
- xen/build.mk | 2 +-
- 2 files changed, 2 insertions(+), 3 deletions(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 7ecfa6e8e9..6e89bcf348 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -234,7 +234,7 @@ include scripts/Kbuild.include
- # we need XEN_TARGET_ARCH to generate the proper config
- include $(XEN_ROOT)/Config.mk
-
--# Set ARCH/SUBARCH appropriately.
-+# Set ARCH/SRCARCH appropriately.
-
- ARCH := $(XEN_TARGET_ARCH)
- SRCARCH := $(shell echo $(ARCH) | \
-@@ -242,7 +242,6 @@ SRCARCH := $(shell echo $(ARCH) | \
- -e 's/riscv.*/riscv/g')
- export ARCH SRCARCH
-
--export TARGET_SUBARCH := $(XEN_TARGET_ARCH)
- export TARGET_ARCH := $(shell echo $(XEN_TARGET_ARCH) | \
- sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
- -e s'/riscv.*/riscv/g')
-diff --git a/xen/build.mk b/xen/build.mk
-index 758590c68e..d049d3a53a 100644
---- a/xen/build.mk
-+++ b/xen/build.mk
-@@ -41,7 +41,7 @@ include/xen/compile.h: include/xen/compile.h.in .banner FORCE
- targets += include/xen/compile.h
-
- -include $(wildcard .asm-offsets.s.d)
--asm-offsets.s: arch/$(TARGET_ARCH)/$(TARGET_SUBARCH)/asm-offsets.c
-+asm-offsets.s: arch/$(TARGET_ARCH)/$(ARCH)/asm-offsets.c
- $(CC) $(call cpp_flags,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
- $(call move-if-changed,$@.new,$@)
-
---
-2.42.0
-
diff --git a/0006-build-remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch b/0006-build-remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch
deleted file mode 100644
index 9eef37a..0000000
--- a/0006-build-remove-TARGET_ARCH-a-duplicate-of-SRCARCH.patch
+++ /dev/null
@@ -1,123 +0,0 @@
-From 36e84ea02e1e8dce8f3a4e9351ab1c72dec3c11e Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Wed, 5 Jul 2023 08:29:49 +0200
-Subject: [PATCH 06/55] build: remove TARGET_ARCH, a duplicate of SRCARCH
-
-The same command is used to generate the value of both $(TARGET_ARCH)
-and $(SRCARCH), as $(ARCH) is an alias for $(XEN_TARGET_ARCH).
-
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit ac27b3beb9b7b423d5563768de890c7594c21b4e)
----
- xen/Makefile | 20 ++++++++------------
- xen/Rules.mk | 2 +-
- xen/build.mk | 6 +++---
- 3 files changed, 12 insertions(+), 16 deletions(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 6e89bcf348..1a3b9a081f 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -242,10 +242,6 @@ SRCARCH := $(shell echo $(ARCH) | \
- -e 's/riscv.*/riscv/g')
- export ARCH SRCARCH
-
--export TARGET_ARCH := $(shell echo $(XEN_TARGET_ARCH) | \
-- sed -e 's/x86.*/x86/' -e s'/arm\(32\|64\)/arm/g' \
-- -e s'/riscv.*/riscv/g')
--
- export CONFIG_SHELL := $(SHELL)
- export CC CXX LD NM OBJCOPY OBJDUMP ADDR2LINE
- export YACC = $(if $(BISON),$(BISON),bison)
-@@ -262,7 +258,7 @@ export XEN_TREEWIDE_CFLAGS := $(CFLAGS)
- ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
- CLANG_FLAGS :=
-
--ifeq ($(TARGET_ARCH),x86)
-+ifeq ($(SRCARCH),x86)
- # The tests to select whether the integrated assembler is usable need to happen
- # before testing any assembler features, or else the result of the tests would
- # be stale if the integrated assembler is not used.
-@@ -430,22 +426,22 @@ endif
-
- ifdef building_out_of_srctree
- CFLAGS += -I$(objtree)/include
-- CFLAGS += -I$(objtree)/arch/$(TARGET_ARCH)/include
-+ CFLAGS += -I$(objtree)/arch/$(SRCARCH)/include
- endif
- CFLAGS += -I$(srctree)/include
--CFLAGS += -I$(srctree)/arch/$(TARGET_ARCH)/include
-+CFLAGS += -I$(srctree)/arch/$(SRCARCH)/include
-
- # Note that link order matters!
- ALL_OBJS-y := common/built_in.o
- ALL_OBJS-y += drivers/built_in.o
- ALL_OBJS-y += lib/built_in.o
- ALL_OBJS-y += xsm/built_in.o
--ALL_OBJS-y += arch/$(TARGET_ARCH)/built_in.o
-+ALL_OBJS-y += arch/$(SRCARCH)/built_in.o
- ALL_OBJS-$(CONFIG_CRYPTO) += crypto/built_in.o
-
- ALL_LIBS-y := lib/lib.a
-
--include $(srctree)/arch/$(TARGET_ARCH)/arch.mk
-+include $(srctree)/arch/$(SRCARCH)/arch.mk
-
- # define new variables to avoid the ones defined in Config.mk
- export XEN_CFLAGS := $(CFLAGS)
-@@ -587,11 +583,11 @@ $(TARGET): outputmakefile FORCE
- $(Q)$(MAKE) $(build)=tools
- $(Q)$(MAKE) $(build)=. include/xen/compile.h
- $(Q)$(MAKE) $(build)=include all
-- $(Q)$(MAKE) $(build)=arch/$(TARGET_ARCH) include
-- $(Q)$(MAKE) $(build)=. arch/$(TARGET_ARCH)/include/asm/asm-offsets.h
-+ $(Q)$(MAKE) $(build)=arch/$(SRCARCH) include
-+ $(Q)$(MAKE) $(build)=. arch/$(SRCARCH)/include/asm/asm-offsets.h
- $(Q)$(MAKE) $(build)=. MKRELOC=$(MKRELOC) 'ALL_OBJS=$(ALL_OBJS-y)' 'ALL_LIBS=$(ALL_LIBS-y)' $@
-
--SUBDIRS = xsm arch/$(TARGET_ARCH) common drivers lib test
-+SUBDIRS = xsm arch/$(SRCARCH) common drivers lib test
- define all_sources
- ( find include -type f -name '*.h' -print; \
- find $(SUBDIRS) -type f -name '*.[chS]' -print )
-diff --git a/xen/Rules.mk b/xen/Rules.mk
-index 59072ae8df..8af3dd7277 100644
---- a/xen/Rules.mk
-+++ b/xen/Rules.mk
-@@ -180,7 +180,7 @@ cpp_flags = $(filter-out -Wa$(comma)% -flto,$(1))
- c_flags = -MMD -MP -MF $(depfile) $(XEN_CFLAGS)
- a_flags = -MMD -MP -MF $(depfile) $(XEN_AFLAGS)
-
--include $(srctree)/arch/$(TARGET_ARCH)/Rules.mk
-+include $(srctree)/arch/$(SRCARCH)/Rules.mk
-
- c_flags += $(_c_flags)
- a_flags += $(_c_flags)
-diff --git a/xen/build.mk b/xen/build.mk
-index d049d3a53a..9ecb104f1e 100644
---- a/xen/build.mk
-+++ b/xen/build.mk
-@@ -41,11 +41,11 @@ include/xen/compile.h: include/xen/compile.h.in .banner FORCE
- targets += include/xen/compile.h
-
- -include $(wildcard .asm-offsets.s.d)
--asm-offsets.s: arch/$(TARGET_ARCH)/$(ARCH)/asm-offsets.c
-+asm-offsets.s: arch/$(SRCARCH)/$(ARCH)/asm-offsets.c
- $(CC) $(call cpp_flags,$(c_flags)) -S -g0 -o $@.new -MQ $@ $<
- $(call move-if-changed,$@.new,$@)
-
--arch/$(TARGET_ARCH)/include/asm/asm-offsets.h: asm-offsets.s
-+arch/$(SRCARCH)/include/asm/asm-offsets.h: asm-offsets.s
- @(set -e; \
- echo "/*"; \
- echo " * DO NOT MODIFY."; \
-@@ -87,4 +87,4 @@ endif
- targets += prelink.o
-
- $(TARGET): prelink.o FORCE
-- $(Q)$(MAKE) $(build)=arch/$(TARGET_ARCH) $@
-+ $(Q)$(MAKE) $(build)=arch/$(SRCARCH) $@
---
-2.42.0
-
diff --git a/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch b/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch
new file mode 100644
index 0000000..dc64ad6
--- /dev/null
+++ b/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch
@@ -0,0 +1,74 @@
+From d0ad2cc5eac1b5d3cfd14204d377ce2384f52607 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Fri, 2 Feb 2024 08:02:20 +0100
+Subject: [PATCH 06/10] x86/intel: ensure Global Performance Counter Control is
+ setup correctly
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
+MSR contains per-counter enable bits that is ANDed with the enable bit in the
+counter EVNTSEL MSR in order for a PMC counter to be enabled.
+
+So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
+bits being set by default, but at least on some Intel Sapphire and Emerald
+Rapids this is no longer the case, and Xen reports:
+
+Testing NMI watchdog on all CPUs: 0 40 stuck
+
+The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
+doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
+relevant enable bit in PERF_GLOBAL_CTRL not being set.
+
+Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
+general-purpose PMCs are enabled. Doing so brings the state of the package-BSP
+PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: 6bdb965178bbb3fc50cd4418d4770a7789956e2c
+master date: 2024-01-17 10:40:52 +0100
+---
+ xen/arch/x86/cpu/intel.c | 23 ++++++++++++++++++++++-
+ 1 file changed, 22 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
+index b40ac696e6..96723b5d44 100644
+--- a/xen/arch/x86/cpu/intel.c
++++ b/xen/arch/x86/cpu/intel.c
+@@ -528,9 +528,30 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
+ init_intel_cacheinfo(c);
+ if (c->cpuid_level > 9) {
+ unsigned eax = cpuid_eax(10);
++ unsigned int cnt = (eax >> 8) & 0xff;
++
+ /* Check for version and the number of counters */
+- if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
++ if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
++ uint64_t global_ctrl;
++ unsigned int cnt_mask = (1UL << cnt) - 1;
++
++ /*
++ * On (some?) Sapphire/Emerald Rapids platforms each
++ * package-BSP starts with all the enable bits for the
++ * general-purpose PMCs cleared. Adjust so counters
++ * can be enabled from EVNTSEL.
++ */
++ rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
++ if ((global_ctrl & cnt_mask) != cnt_mask) {
++ printk("CPU%u: invalid PERF_GLOBAL_CTRL: %#"
++ PRIx64 " adjusting to %#" PRIx64 "\n",
++ smp_processor_id(), global_ctrl,
++ global_ctrl | cnt_mask);
++ wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
++ global_ctrl | cnt_mask);
++ }
+ __set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
++ }
+ }
+
+ if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )
+--
+2.43.0
+
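For readers less familiar with architectural perfmon, here is a minimal sketch of the gating the commit message above describes. This is not Xen code; it only assumes the SDM bit layout, where bit 22 of IA32_PERFEVTSELx is the per-counter enable and bit i of IA32_PERF_GLOBAL_CTRL enables general-purpose PMC i.

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Sketch only: PMC 'i' counts only when its bit in PERF_GLOBAL_CTRL and
     * the EN bit (bit 22) of the matching EVNTSEL MSR are both set, so a
     * zeroed GLOBAL_CTRL on the package BSP keeps PMC0 idle even after
     * EVNTSEL0.EN has been programmed.
     */
    static bool pmc_counting(uint64_t global_ctrl, uint64_t evntsel,
                             unsigned int i)
    {
        return (global_ctrl & (1ULL << i)) && (evntsel & (1ULL << 22));
    }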
diff --git a/0007-build-evaluate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch b/0007-build-evaluate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch
deleted file mode 100644
index 81e5ca4..0000000
--- a/0007-build-evaluate-XEN_BUILD_-and-XEN_DOMAIN-immediately.patch
+++ /dev/null
@@ -1,58 +0,0 @@
-From a1f68fb56710c507f9c1ec8e8d784f5b1e4088f1 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Mon, 31 Jul 2023 15:02:18 +0200
-Subject: [PATCH 07/55] build: evaluate XEN_BUILD_* and XEN_DOMAIN immediately
-
-With GNU make 4.4, the number of executions of the commands present in
-these $(shell ) increased greatly. This is probably because, as of make
-4.4, exported variables are also added to the environment of $(shell )
-constructs.
-
-Also, `make -d` shows a lot of these:
- Makefile:15: not recursively expanding XEN_BUILD_DATE to export to shell function
- Makefile:16: not recursively expanding XEN_BUILD_TIME to export to shell function
- Makefile:17: not recursively expanding XEN_BUILD_HOST to export to shell function
- Makefile:14: not recursively expanding XEN_DOMAIN to export to shell function
-
-So, to avoid having these commands run more often than necessary, we
-replace ?= with an equivalent that uses immediate expansion.
-
-Reported-by: Jason Andryuk <jandryuk@gmail.com>
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Tested-by: Jason Andryuk <jandryuk@gmail.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 0c594c1b57ee2ecec5f70826c53a2cf02a9c2acb)
----
- xen/Makefile | 16 ++++++++++++----
- 1 file changed, 12 insertions(+), 4 deletions(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 1a3b9a081f..7bb9de7bdc 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -11,10 +11,18 @@ export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
- -include xen-version
-
- export XEN_WHOAMI ?= $(USER)
--export XEN_DOMAIN ?= $(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))
--export XEN_BUILD_DATE ?= $(shell LC_ALL=C date)
--export XEN_BUILD_TIME ?= $(shell LC_ALL=C date +%T)
--export XEN_BUILD_HOST ?= $(shell hostname)
-+ifeq ($(origin XEN_DOMAIN), undefined)
-+export XEN_DOMAIN := $(shell ([ -x /bin/dnsdomainname ] && /bin/dnsdomainname) || ([ -x /bin/domainname ] && /bin/domainname || echo [unknown]))
-+endif
-+ifeq ($(origin XEN_BUILD_DATE), undefined)
-+export XEN_BUILD_DATE := $(shell LC_ALL=C date)
-+endif
-+ifeq ($(origin XEN_BUILD_TIME), undefined)
-+export XEN_BUILD_TIME := $(shell LC_ALL=C date +%T)
-+endif
-+ifeq ($(origin XEN_BUILD_HOST), undefined)
-+export XEN_BUILD_HOST := $(shell hostname)
-+endif
-
- # Best effort attempt to find a python interpreter, defaulting to Python 3 if
- # available. Fall back to just `python` if `which` is nowhere to be found.
---
-2.42.0
-
diff --git a/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch b/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch
new file mode 100644
index 0000000..a1937a7
--- /dev/null
+++ b/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch
@@ -0,0 +1,65 @@
+From eca5416f9b0e179de9553900de8de660ab09199d Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 2 Feb 2024 08:02:51 +0100
+Subject: [PATCH 07/10] x86/vmx: Fix IRQ handling for EXIT_REASON_INIT
+
+When receiving an INIT, a prior bugfix tried to ignore the INIT and continue
+onwards.
+
+Unfortunately it's not safe to return at that point in vmx_vmexit_handler().
+Just out of context in the first hunk is a local_irqs_enabled() which is
+depended-upon by the return-to-guest path, causing the following checklock
+failure in debug builds:
+
+ (XEN) Error: INIT received - ignoring
+ (XEN) CHECKLOCK FAILURE: prev irqsafe: 0, curr irqsafe 1
+ (XEN) Xen BUG at common/spinlock.c:132
+ (XEN) ----[ Xen-4.19-unstable x86_64 debug=y Tainted: H ]----
+ ...
+ (XEN) Xen call trace:
+ (XEN) [<ffff82d040238e10>] R check_lock+0xcd/0xe1
+ (XEN) [<ffff82d040238fe3>] F _spin_lock+0x1b/0x60
+ (XEN) [<ffff82d0402ed6a8>] F pt_update_irq+0x32/0x3bb
+ (XEN) [<ffff82d0402b9632>] F vmx_intr_assist+0x3b/0x51d
+ (XEN) [<ffff82d040206447>] F vmx_asm_vmexit_handler+0xf7/0x210
+
+Luckily, this is benign in release builds. Accidentally having IRQs disabled
+when trying to take an IRQs-on lock isn't a deadlock-vulnerable pattern.
+
+Drop the problematic early return. In hindsight, it's wrong to skip other
+normal VMExit steps.
+
+Fixes: b1f11273d5a7 ("x86/vmx: Don't spuriously crash the domain when INIT is received")
+Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: d1f8883aebe00f6a9632d77ab0cd5c6d02c9cbe4
+master date: 2024-01-18 20:59:06 +0000
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index 072288a5ef..31f4a861c6 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -4037,7 +4037,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+
+ case EXIT_REASON_INIT:
+ printk(XENLOG_ERR "Error: INIT received - ignoring\n");
+- return; /* Renter the guest without further processing */
++ break;
+ }
+
+ /* Now enable interrupts so it's safe to take locks. */
+@@ -4323,6 +4323,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
+ break;
+ }
+ case EXIT_REASON_EXTERNAL_INTERRUPT:
++ case EXIT_REASON_INIT:
+ /* Already handled above. */
+ break;
+ case EXIT_REASON_TRIPLE_FAULT:
+--
+2.43.0
+
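As an aside, a stripped-down sketch of the control-flow hazard the hunks above remove: returning from the first switch would go back to the guest with interrupts still off, skipping the re-enable that later lock acquisitions rely on, while breaking and listing the reason again in the second switch does not. This is illustrative only, with made-up names, not the real vmx_vmexit_handler().

    #include <stdio.h>

    static void irqs_on(void) { puts("interrupts enabled"); } /* stand-in */

    static void exit_handler_sketch(unsigned int reason)
    {
        /* interrupts are notionally still disabled on entry */
        switch ( reason )
        {
        case 0: /* INIT */
            /* an early "return" here would skip irqs_on() below -- the bug */
            break;
        }

        irqs_on(); /* code past this point may take IRQs-on locks */

        switch ( reason )
        {
        case 0: /* INIT */
            /* already handled above */
            break;
        }
    }

    int main(void) { exit_handler_sketch(0); return 0; }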
diff --git a/0008-Config.mk-evaluate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch b/0008-Config.mk-evaluate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch
deleted file mode 100644
index 8a4cb7d..0000000
--- a/0008-Config.mk-evaluate-XEN_COMPILE_ARCH-and-XEN_OS-immed.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From 476d2624ec3cf3e60709580ff1df208bb8f616e2 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Mon, 31 Jul 2023 15:02:34 +0200
-Subject: [PATCH 08/55] Config.mk: evaluate XEN_COMPILE_ARCH and XEN_OS
- immediately
-
-With GNU make 4.4, the number of executions of the commands present in
-these $(shell ) increased greatly. This is probably because, as of make
-4.4, exported variables are also added to the environment of $(shell )
-constructs.
-
-So, to avoid having these commands run more often than necessary, we
-replace ?= with an equivalent that uses immediate expansion.
-
-Reported-by: Jason Andryuk <jandryuk@gmail.com>
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Tested-by: Jason Andryuk <jandryuk@gmail.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit a07414d989cf52e5e84192b78023bee1589bbda4)
----
- Config.mk | 8 ++++++--
- 1 file changed, 6 insertions(+), 2 deletions(-)
-
-diff --git a/Config.mk b/Config.mk
-index 8bc2bcd5f6..4864033c73 100644
---- a/Config.mk
-+++ b/Config.mk
-@@ -19,13 +19,17 @@ or = $(if $(strip $(1)),$(1),$(if $(strip $(2)),$(2),$(if $(strip $(3)),$(
-
- -include $(XEN_ROOT)/.config
-
--XEN_COMPILE_ARCH ?= $(shell uname -m | sed -e s/i.86/x86_32/ \
-+ifeq ($(origin XEN_COMPILE_ARCH), undefined)
-+XEN_COMPILE_ARCH := $(shell uname -m | sed -e s/i.86/x86_32/ \
- -e s/i86pc/x86_32/ -e s/amd64/x86_64/ \
- -e s/armv7.*/arm32/ -e s/armv8.*/arm64/ \
- -e s/aarch64/arm64/)
-+endif
-
- XEN_TARGET_ARCH ?= $(XEN_COMPILE_ARCH)
--XEN_OS ?= $(shell uname -s)
-+ifeq ($(origin XEN_OS), undefined)
-+XEN_OS := $(shell uname -s)
-+endif
-
- CONFIG_$(XEN_OS) := y
-
---
-2.42.0
-
diff --git a/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch b/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch
new file mode 100644
index 0000000..12c2d59
--- /dev/null
+++ b/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch
@@ -0,0 +1,126 @@
+From 7bd612727df792671e44152a8205f0cf821ad984 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 2 Feb 2024 08:03:26 +0100
+Subject: [PATCH 08/10] x86/vmx: Disallow the use of inactivity states
+
+Right now, vvmx will blindly copy L12's ACTIVITY_STATE into the L02 VMCS and
+enter the vCPU. Luckily for us, nested-virt is explicitly unsupported for
+security bugs.
+
+The inactivity states are HLT, SHUTDOWN and WAIT-FOR-SIPI, and as noted by the
+SDM in Vol3 27.7 "Special Features of VM Entry":
+
+ If VM entry ends with the logical processor in an inactive activity state,
+ the VM entry generates any special bus cycle that is normally generated when
+ that activity state is entered from the active state.
+
+Also,
+
+ Some activity states unconditionally block certain events.
+
+I.e. A VMEntry with ACTIVITY=SHUTDOWN will initiate a platform reset, while a
+VMEntry with ACTIVITY=WAIT-FOR-SIPI will really block everything other than
+SIPIs.
+
+Both of these activity states are for the TXT ACM to use, not for regular
+hypervisors, and Xen doesn't support dropping the HLT intercept either.
+
+There are two paths in Xen which operate on ACTIVITY_STATE.
+
+1) The vmx_{get,set}_nonreg_state() helpers for VM-Fork.
+
+ As regular VMs can't use any inactivity states, this is just duplicating
+ the 0 from construct_vmcs(). Retain the ability to query activity_state,
+ but crash the domain on any attempt to set an inactivity state.
+
+2) Nested virt, because of ACTIVITY_STATE in vmcs_gstate_field[].
+
+ Explicitly hide the inactivity states in the guest's view of MSR_VMX_MISC,
+ and remove ACTIVITY_STATE from vmcs_gstate_field[].
+
+ In virtual_vmentry(), we should trigger a VMEntry failure for the use of
+ any inactivity states, but there's no support for that in the code at all
+ so leave a TODO for when we finally start working on nested-virt in
+ earnest.
+
+Reported-by: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
+master commit: 3643bb53a05b7c8fbac072c63bef1538f2a6d0d2
+master date: 2024-01-18 20:59:06 +0000
+---
+ xen/arch/x86/hvm/vmx/vmx.c | 8 +++++++-
+ xen/arch/x86/hvm/vmx/vvmx.c | 9 +++++++--
+ xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 1 +
+ 3 files changed, 15 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
+index 31f4a861c6..35d391d8e5 100644
+--- a/xen/arch/x86/hvm/vmx/vmx.c
++++ b/xen/arch/x86/hvm/vmx/vmx.c
+@@ -1499,7 +1499,13 @@ static void cf_check vmx_set_nonreg_state(struct vcpu *v,
+ {
+ vmx_vmcs_enter(v);
+
+- __vmwrite(GUEST_ACTIVITY_STATE, nrs->vmx.activity_state);
++ if ( nrs->vmx.activity_state )
++ {
++ printk("Attempt to set %pv activity_state %#lx\n",
++ v, nrs->vmx.activity_state);
++ domain_crash(v->domain);
++ }
++
+ __vmwrite(GUEST_INTERRUPTIBILITY_INFO, nrs->vmx.interruptibility_info);
+ __vmwrite(GUEST_PENDING_DBG_EXCEPTIONS, nrs->vmx.pending_dbg);
+
+diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
+index f8fe8d0c14..515cb5ae77 100644
+--- a/xen/arch/x86/hvm/vmx/vvmx.c
++++ b/xen/arch/x86/hvm/vmx/vvmx.c
+@@ -910,7 +910,10 @@ static const u16 vmcs_gstate_field[] = {
+ GUEST_LDTR_AR_BYTES,
+ GUEST_TR_AR_BYTES,
+ GUEST_INTERRUPTIBILITY_INFO,
++ /*
++ * ACTIVITY_STATE is handled specially.
+ GUEST_ACTIVITY_STATE,
++ */
+ GUEST_SYSENTER_CS,
+ GUEST_PREEMPTION_TIMER,
+ /* natural */
+@@ -1211,6 +1214,8 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
+ nvcpu->nv_vmentry_pending = 0;
+ nvcpu->nv_vmswitch_in_progress = 1;
+
++ /* TODO: Fail VMentry for GUEST_ACTIVITY_STATE != 0 */
++
+ /*
+ * EFER handling:
+ * hvm_set_efer won't work if CR0.PG = 1, so we change the value
+@@ -2327,8 +2332,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
+ data = hvm_cr4_guest_valid_bits(d);
+ break;
+ case MSR_IA32_VMX_MISC:
+- /* Do not support CR3-target feature now */
+- data = host_data & ~VMX_MISC_CR3_TARGET;
++ /* Do not support CR3-targets or activity states. */
++ data = host_data & ~(VMX_MISC_CR3_TARGET | VMX_MISC_ACTIVITY_MASK);
+ break;
+ case MSR_IA32_VMX_EPT_VPID_CAP:
+ data = nept_get_ept_vpid_cap();
+diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+index 78404e42b3..0af021d5f5 100644
+--- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
++++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
+@@ -288,6 +288,7 @@ extern u32 vmx_secondary_exec_control;
+ #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x80000000000ULL
+ extern u64 vmx_ept_vpid_cap;
+
++#define VMX_MISC_ACTIVITY_MASK 0x000001c0
+ #define VMX_MISC_PROC_TRACE 0x00004000
+ #define VMX_MISC_CR3_TARGET 0x01ff0000
+ #define VMX_MISC_VMWRITE_ALL 0x20000000
+--
+2.43.0
+
diff --git a/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch b/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch
new file mode 100644
index 0000000..9ee7104
--- /dev/null
+++ b/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch
@@ -0,0 +1,70 @@
+From afb85cf1e8f165abf88de9d8a6df625692a753b1 Mon Sep 17 00:00:00 2001
+From: Michal Orzel <michal.orzel@amd.com>
+Date: Fri, 2 Feb 2024 08:04:07 +0100
+Subject: [PATCH 09/10] lib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps
+ to $(targets)
+
+At the moment, trying to run xencov read/reset (calling SYSCTL_coverage_op
+under the hood) results in a crash. This is due to a profiler trying to
+access data in the .init.* sections (libfdt for Arm and libelf for x86)
+that are stripped after boot. Normally, the build system compiles any
+*.init.o file without COV_FLAGS. However, these two libraries are
+handled differently as sections will be renamed to init after linking.
+
+To override COV_FLAGS to empty for these libraries, lib{fdt,elf}.o were
+added to nocov-y. This worked until e321576f4047 ("xen/build: start using
+if_changed") that added lib{fdt,elf}-temp.o and their deps to extra-y.
+This way, even though these objects appear as prerequisites of
+lib{fdt,elf}.o and the settings should propagate to them, make can also
+build them as a prerequisite of __build, in which case COV_FLAGS would
+still have the unwanted flags. Fix it by switching to $(targets) instead.
+
+Also, for libfdt, append libfdt.o to nocov-y only if CONFIG_OVERLAY_DTB
+is not set. Otherwise, there is no section renaming and we should be able
+to run the coverage.
+
+Fixes: e321576f4047 ("xen/build: start using if_changed")
+Signed-off-by: Michal Orzel <michal.orzel@amd.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: 79519fcfa0605bbf19d8c02b979af3a2c8afed68
+master date: 2024-01-23 12:02:44 +0100
+---
+ xen/common/libelf/Makefile | 2 +-
+ xen/common/libfdt/Makefile | 4 ++--
+ 2 files changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/xen/common/libelf/Makefile b/xen/common/libelf/Makefile
+index 8a4522e4e1..917d12b006 100644
+--- a/xen/common/libelf/Makefile
++++ b/xen/common/libelf/Makefile
+@@ -13,4 +13,4 @@ $(obj)/libelf.o: $(obj)/libelf-temp.o FORCE
+ $(obj)/libelf-temp.o: $(addprefix $(obj)/,$(libelf-objs)) FORCE
+ $(call if_changed,ld)
+
+-extra-y += libelf-temp.o $(libelf-objs)
++targets += libelf-temp.o $(libelf-objs)
+diff --git a/xen/common/libfdt/Makefile b/xen/common/libfdt/Makefile
+index 75aaefa2e3..4d14fd61ba 100644
+--- a/xen/common/libfdt/Makefile
++++ b/xen/common/libfdt/Makefile
+@@ -2,9 +2,9 @@ include $(src)/Makefile.libfdt
+
+ SECTIONS := text data $(SPECIAL_DATA_SECTIONS)
+ OBJCOPYFLAGS := $(foreach s,$(SECTIONS),--rename-section .$(s)=.init.$(s))
++nocov-y += libfdt.o
+
+ obj-y += libfdt.o
+-nocov-y += libfdt.o
+
+ CFLAGS-y += -I$(srctree)/include/xen/libfdt/
+
+@@ -14,4 +14,4 @@ $(obj)/libfdt.o: $(obj)/libfdt-temp.o FORCE
+ $(obj)/libfdt-temp.o: $(addprefix $(obj)/,$(LIBFDT_OBJS)) FORCE
+ $(call if_changed,ld)
+
+-extra-y += libfdt-temp.o $(LIBFDT_OBJS)
++targets += libfdt-temp.o $(LIBFDT_OBJS)
+--
+2.43.0
+
diff --git a/0009-x86emul-rework-wrapping-of-libc-functions-in-test-an.patch b/0009-x86emul-rework-wrapping-of-libc-functions-in-test-an.patch
deleted file mode 100644
index 4f9c0bb..0000000
--- a/0009-x86emul-rework-wrapping-of-libc-functions-in-test-an.patch
+++ /dev/null
@@ -1,245 +0,0 @@
-From 37f1d68fa34220600f1e4ec82af5da70127757e5 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Fri, 18 Aug 2023 15:04:28 +0200
-Subject: [PATCH 09/55] x86emul: rework wrapping of libc functions in test and
- fuzzing harnesses
-
-Our present approach is working fully behind the compiler's back. This
-was found to not work with LTO. Employ ld's --wrap= option instead. Note
-that while this makes the build work at least with new enough gcc (it
-doesn't with gcc7, for example, due to tool chain side issues afaict),
-according to my testing things still won't work when building the
-fuzzing harness with afl-cc: While with the gcc7 tool chain I see afl-as
-getting invoked, this does not happen with gcc13. Yet without using that
-assembler wrapper the resulting binary will look uninstrumented to
-afl-fuzz.
-
-While checking the resulting binaries I noticed that we've gained uses
-of snprintf() and strstr(), which only just so happen to not cause any
-problems. Add wrappers for them as well.
-
-Since we don't have any actual uses of v{,sn}printf(), no definitions of
-their wrappers appear (just yet). But I think we want
-__wrap_{,sn}printf() to properly use __real_v{,sn}printf() right away,
-which means we need declarations of the latter.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit 6fba45ca3be1c5d46cddb1eaf371d9e69550b244)
----
- tools/fuzz/x86_instruction_emulator/Makefile | 6 ++-
- tools/tests/x86_emulator/Makefile | 4 +-
- tools/tests/x86_emulator/wrappers.c | 55 ++++++++++++++------
- tools/tests/x86_emulator/x86-emulate.h | 14 +++--
- 4 files changed, 53 insertions(+), 26 deletions(-)
-
-diff --git a/tools/fuzz/x86_instruction_emulator/Makefile b/tools/fuzz/x86_instruction_emulator/Makefile
-index 13aa238503..c83959c847 100644
---- a/tools/fuzz/x86_instruction_emulator/Makefile
-+++ b/tools/fuzz/x86_instruction_emulator/Makefile
-@@ -29,6 +29,8 @@ GCOV_FLAGS := --coverage
- %-cov.o: %.c
- $(CC) -c $(CFLAGS) $(GCOV_FLAGS) $< -o $@
-
-+WRAPPED = $(shell sed -n 's,^ *WRAP(\([[:alnum:]_]*\));,\1,p' x86-emulate.h)
-+
- x86-emulate.h: x86_emulate/x86_emulate.h
- x86-emulate.o x86-emulate-cov.o: x86-emulate.h x86_emulate/x86_emulate.c
- fuzz-emul.o fuzz-emul-cov.o wrappers.o: x86-emulate.h
-@@ -37,10 +39,10 @@ x86-insn-fuzzer.a: fuzz-emul.o x86-emulate.o cpuid.o
- $(AR) rc $@ $^
-
- afl-harness: afl-harness.o fuzz-emul.o x86-emulate.o cpuid.o wrappers.o
-- $(CC) $(CFLAGS) $^ -o $@
-+ $(CC) $(CFLAGS) $(addprefix -Wl$(comma)--wrap=,$(WRAPPED)) $^ -o $@
-
- afl-harness-cov: afl-harness-cov.o fuzz-emul-cov.o x86-emulate-cov.o cpuid.o wrappers.o
-- $(CC) $(CFLAGS) $(GCOV_FLAGS) $^ -o $@
-+ $(CC) $(CFLAGS) $(GCOV_FLAGS) $(addprefix -Wl$(comma)--wrap=,$(WRAPPED)) $^ -o $@
-
- # Common targets
- .PHONY: all
-diff --git a/tools/tests/x86_emulator/Makefile b/tools/tests/x86_emulator/Makefile
-index bd82598f97..a2fd6607c6 100644
---- a/tools/tests/x86_emulator/Makefile
-+++ b/tools/tests/x86_emulator/Makefile
-@@ -250,8 +250,10 @@ xop.h avx512f.h: simd-fma.c
-
- endif # 32-bit override
-
-+WRAPPED := $(shell sed -n 's,^ *WRAP(\([[:alnum:]_]*\));,\1,p' x86-emulate.h)
-+
- $(TARGET): x86-emulate.o cpuid.o test_x86_emulator.o evex-disp8.o predicates.o wrappers.o
-- $(HOSTCC) $(HOSTCFLAGS) -o $@ $^
-+ $(HOSTCC) $(HOSTCFLAGS) $(addprefix -Wl$(comma)--wrap=,$(WRAPPED)) -o $@ $^
-
- .PHONY: clean
- clean:
-diff --git a/tools/tests/x86_emulator/wrappers.c b/tools/tests/x86_emulator/wrappers.c
-index eba7cc93c5..3829a6f416 100644
---- a/tools/tests/x86_emulator/wrappers.c
-+++ b/tools/tests/x86_emulator/wrappers.c
-@@ -1,78 +1,103 @@
- #include <stdarg.h>
-
--#define WRAP(x) typeof(x) emul_##x
-+#define WRAP(x) typeof(x) __wrap_ ## x, __real_ ## x
- #include "x86-emulate.h"
-
--size_t emul_fwrite(const void *src, size_t sz, size_t n, FILE *f)
-+size_t __wrap_fwrite(const void *src, size_t sz, size_t n, FILE *f)
- {
- emul_save_fpu_state();
-- sz = fwrite(src, sz, n, f);
-+ sz = __real_fwrite(src, sz, n, f);
- emul_restore_fpu_state();
-
- return sz;
- }
-
--int emul_memcmp(const void *p1, const void *p2, size_t sz)
-+int __wrap_memcmp(const void *p1, const void *p2, size_t sz)
- {
- int rc;
-
- emul_save_fpu_state();
-- rc = memcmp(p1, p2, sz);
-+ rc = __real_memcmp(p1, p2, sz);
- emul_restore_fpu_state();
-
- return rc;
- }
-
--void *emul_memcpy(void *dst, const void *src, size_t sz)
-+void *__wrap_memcpy(void *dst, const void *src, size_t sz)
- {
- emul_save_fpu_state();
-- memcpy(dst, src, sz);
-+ __real_memcpy(dst, src, sz);
- emul_restore_fpu_state();
-
- return dst;
- }
-
--void *emul_memset(void *dst, int c, size_t sz)
-+void *__wrap_memset(void *dst, int c, size_t sz)
- {
- emul_save_fpu_state();
-- memset(dst, c, sz);
-+ __real_memset(dst, c, sz);
- emul_restore_fpu_state();
-
- return dst;
- }
-
--int emul_printf(const char *fmt, ...)
-+int __wrap_printf(const char *fmt, ...)
- {
- va_list varg;
- int rc;
-
- emul_save_fpu_state();
- va_start(varg, fmt);
-- rc = vprintf(fmt, varg);
-+ rc = __real_vprintf(fmt, varg);
- va_end(varg);
- emul_restore_fpu_state();
-
- return rc;
- }
-
--int emul_putchar(int c)
-+int __wrap_putchar(int c)
- {
- int rc;
-
- emul_save_fpu_state();
-- rc = putchar(c);
-+ rc = __real_putchar(c);
- emul_restore_fpu_state();
-
- return rc;
- }
-
--int emul_puts(const char *str)
-+int __wrap_puts(const char *str)
- {
- int rc;
-
- emul_save_fpu_state();
-- rc = puts(str);
-+ rc = __real_puts(str);
- emul_restore_fpu_state();
-
- return rc;
- }
-+
-+int __wrap_snprintf(char *buf, size_t n, const char *fmt, ...)
-+{
-+ va_list varg;
-+ int rc;
-+
-+ emul_save_fpu_state();
-+ va_start(varg, fmt);
-+ rc = __real_vsnprintf(buf, n, fmt, varg);
-+ va_end(varg);
-+ emul_restore_fpu_state();
-+
-+ return rc;
-+}
-+
-+char *__wrap_strstr(const char *s1, const char *s2)
-+{
-+ char *s;
-+
-+ emul_save_fpu_state();
-+ s = __real_strstr(s1, s2);
-+ emul_restore_fpu_state();
-+
-+ return s;
-+}
-diff --git a/tools/tests/x86_emulator/x86-emulate.h b/tools/tests/x86_emulator/x86-emulate.h
-index 19bea9c38d..58760f096d 100644
---- a/tools/tests/x86_emulator/x86-emulate.h
-+++ b/tools/tests/x86_emulator/x86-emulate.h
-@@ -29,9 +29,7 @@
- #ifdef EOF
- # error "Must not include <stdio.h> before x86-emulate.h"
- #endif
--#ifdef WRAP
--# include <stdio.h>
--#endif
-+#include <stdio.h>
-
- #include <xen/xen.h>
-
-@@ -85,11 +83,7 @@ void emul_restore_fpu_state(void);
- * around the actual function.
- */
- #ifndef WRAP
--# if 0 /* This only works for explicit calls, not for compiler generated ones. */
--# define WRAP(x) typeof(x) x asm("emul_" #x)
--# else
--# define WRAP(x) asm(".equ " #x ", emul_" #x)
--# endif
-+# define WRAP(x) typeof(x) __wrap_ ## x
- #endif
-
- WRAP(fwrite);
-@@ -99,6 +93,10 @@ WRAP(memset);
- WRAP(printf);
- WRAP(putchar);
- WRAP(puts);
-+WRAP(snprintf);
-+WRAP(strstr);
-+WRAP(vprintf);
-+WRAP(vsnprintf);
-
- #undef WRAP
-
---
-2.42.0
-
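For context on the --wrap mechanism the removed patch relied on, a minimal standalone sketch (memcpy chosen arbitrarily; assumes GNU ld): compiling this unit and linking the program with -Wl,--wrap=memcpy makes undefined references to memcpy resolve to __wrap_memcpy, while __real_memcpy resolves to the original libc symbol.

    #include <stddef.h>
    #include <string.h>

    /* Resolved by the linker to the real memcpy when --wrap=memcpy is used. */
    void *__real_memcpy(void *dst, const void *src, size_t n);

    void *__wrap_memcpy(void *dst, const void *src, size_t n)
    {
        /* a real wrapper would save/restore FPU state here, as above */
        return __real_memcpy(dst, src, n);
    }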
diff --git a/0010-rombios-Work-around-GCC-issue-99578.patch b/0010-rombios-Work-around-GCC-issue-99578.patch
deleted file mode 100644
index 3995f02..0000000
--- a/0010-rombios-Work-around-GCC-issue-99578.patch
+++ /dev/null
@@ -1,43 +0,0 @@
-From ae1045c42954772e48862162d0e95fbc9393c91e Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 17 Aug 2023 21:32:53 +0100
-Subject: [PATCH 10/55] rombios: Work around GCC issue 99578
-
-GCC 12 objects to pointers derived from a constant:
-
- util.c: In function 'find_rsdp':
- util.c:429:16: error: array subscript 0 is outside array bounds of 'uint16_t[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds]
- 429 | ebda_seg = *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
- cc1: all warnings being treated as errors
-
-This is a GCC bug, but work around it rather than turning array-bounds
-checking off generally.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit e35138a2ffbe1fe71edaaaaae71063dc545a8416)
----
- tools/firmware/rombios/32bit/util.c | 6 +++---
- 1 file changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/tools/firmware/rombios/32bit/util.c b/tools/firmware/rombios/32bit/util.c
-index 6c1c480514..a47e000a26 100644
---- a/tools/firmware/rombios/32bit/util.c
-+++ b/tools/firmware/rombios/32bit/util.c
-@@ -424,10 +424,10 @@ static struct acpi_20_rsdp *__find_rsdp(const void *start, unsigned int len)
- struct acpi_20_rsdp *find_rsdp(void)
- {
- struct acpi_20_rsdp *rsdp;
-- uint16_t ebda_seg;
-+ uint16_t *volatile /* GCC issue 99578 */ ebda_seg =
-+ ADDR_FROM_SEG_OFF(0x40, 0xe);
-
-- ebda_seg = *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
-- rsdp = __find_rsdp((void *)(ebda_seg << 16), 1024);
-+ rsdp = __find_rsdp((void *)(*ebda_seg << 16), 1024);
- if (!rsdp)
- rsdp = __find_rsdp((void *)0xE0000, 0x20000);
-
---
-2.42.0
-
diff --git a/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch b/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch
new file mode 100644
index 0000000..ba99063
--- /dev/null
+++ b/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch
@@ -0,0 +1,36 @@
+From 091466ba55d1e2e75738f751818ace2e3ed08ccf Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Fri, 2 Feb 2024 08:04:33 +0100
+Subject: [PATCH 10/10] x86/p2m-pt: fix off by one in entry check assert
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The MMIO RO rangeset overlap check is bogus: the rangeset is inclusive so the
+passed end mfn should be the last mfn to be mapped (not last + 1).
+
+Fixes: 6fa1755644d0 ('amd/npt/shadow: replace assert that prevents creating 2M/1G MMIO entries')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: George Dunlap <george.dunlap@cloud.com>
+master commit: 610775d0dd61c1bd2f4720c755986098e6a5bafd
+master date: 2024-01-25 16:09:04 +0100
+---
+ xen/arch/x86/mm/p2m-pt.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
+index eaba2b0fb4..f02ebae372 100644
+--- a/xen/arch/x86/mm/p2m-pt.c
++++ b/xen/arch/x86/mm/p2m-pt.c
+@@ -564,7 +564,7 @@ static void check_entry(mfn_t mfn, p2m_type_t new, p2m_type_t old,
+ if ( new == p2m_mmio_direct )
+ ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
+ !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
+- mfn_x(mfn) + (1ul << order)));
++ mfn_x(mfn) + (1UL << order) - 1));
+ else if ( p2m_allows_invalid_mfn(new) || new == p2m_invalid ||
+ new == p2m_mmio_dm )
+ ASSERT(mfn_valid(mfn) || mfn_eq(mfn, INVALID_MFN));
+--
+2.43.0
+
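To make the off-by-one concrete: for an order-N entry starting at mfn, the last frame covered is mfn + (1 << order) - 1, and an inclusive rangeset has to be queried with that value. A tiny sketch, not Xen code:

    /* Sketch only: inclusive end of an order-sized block starting at mfn. */
    static unsigned long block_last(unsigned long mfn, unsigned int order)
    {
        return mfn + (1UL << order) - 1;
    }

    /* e.g. an order-9 (2M) entry at mfn 0x1000 covers [0x1000, 0x11ff];
     * passing 0x1200 to an inclusive overlap check tests one frame too many. */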
diff --git a/0011-rombios-Avoid-using-K-R-function-syntax.patch b/0011-rombios-Avoid-using-K-R-function-syntax.patch
deleted file mode 100644
index 0bd761f..0000000
--- a/0011-rombios-Avoid-using-K-R-function-syntax.patch
+++ /dev/null
@@ -1,74 +0,0 @@
-From 24487fec3bbebbc1fd3f00d16bca7fb0f56a5f30 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 18 Aug 2023 10:47:46 +0100
-Subject: [PATCH 11/55] rombios: Avoid using K&R function syntax
-
-Clang-15 complains:
-
- tcgbios.c:598:25: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
- void tcpa_calling_int19h()
- ^
- void
-
-C2x formally removes K&R syntax. The declarations for these functions in
-32bitprotos.h are already ANSI compatible. Update the definitions to match.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit a562afa5679d4a7ceb9cb9222fec1fea9a61f738)
----
- tools/firmware/rombios/32bit/tcgbios/tcgbios.c | 10 +++++-----
- 1 file changed, 5 insertions(+), 5 deletions(-)
-
-diff --git a/tools/firmware/rombios/32bit/tcgbios/tcgbios.c b/tools/firmware/rombios/32bit/tcgbios/tcgbios.c
-index fa22c4460a..ad0eac0d20 100644
---- a/tools/firmware/rombios/32bit/tcgbios/tcgbios.c
-+++ b/tools/firmware/rombios/32bit/tcgbios/tcgbios.c
-@@ -595,7 +595,7 @@ static void tcpa_add_measurement(uint32_t pcrIndex,
- /*
- * Add measurement to log about call of int 19h
- */
--void tcpa_calling_int19h()
-+void tcpa_calling_int19h(void)
- {
- tcpa_add_measurement(4, EV_ACTION, 0);
- }
-@@ -603,7 +603,7 @@ void tcpa_calling_int19h()
- /*
- * Add measurement to log about retuning from int 19h
- */
--void tcpa_returned_int19h()
-+void tcpa_returned_int19h(void)
- {
- tcpa_add_measurement(4, EV_ACTION, 1);
- }
-@@ -611,7 +611,7 @@ void tcpa_returned_int19h()
- /*
- * Add event separators for PCRs 0 to 7; specs 8.2.3
- */
--void tcpa_add_event_separators()
-+void tcpa_add_event_separators(void)
- {
- uint32_t pcrIndex = 0;
- while (pcrIndex <= 7) {
-@@ -624,7 +624,7 @@ void tcpa_add_event_separators()
- /*
- * Add a wake event to the log
- */
--void tcpa_wake_event()
-+void tcpa_wake_event(void)
- {
- tcpa_add_measurement_to_log(6,
- EV_ACTION,
-@@ -659,7 +659,7 @@ void tcpa_add_bootdevice(uint32_t bootcd, uint32_t bootdrv)
- * Add measurement to the log about option rom scan
- * 10.4.3 : action 14
- */
--void tcpa_start_option_rom_scan()
-+void tcpa_start_option_rom_scan(void)
- {
- tcpa_add_measurement(2, EV_ACTION, 14);
- }
---
-2.42.0
-
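For readers less familiar with the distinction, a two-line sketch of the syntaxes involved (illustrative only, not rombios code): an empty parameter list is the old unprototyped form clang warns about, while (void) is an explicit "takes no arguments" prototype.

    /* Sketch only -- not rombios code. */
    void old_style();      /* empty parens: historically no prototype */
    void ansi_style(void); /* (void): explicit "takes no arguments" prototype */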
diff --git a/0012-rombios-Remove-the-use-of-egrep.patch b/0012-rombios-Remove-the-use-of-egrep.patch
deleted file mode 100644
index 44702b4..0000000
--- a/0012-rombios-Remove-the-use-of-egrep.patch
+++ /dev/null
@@ -1,34 +0,0 @@
-From e418a77295e6b512d212b57123c11e4d4fb23e8c Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 18 Aug 2023 11:05:00 +0100
-Subject: [PATCH 12/55] rombios: Remove the use of egrep
-
-As the Alpine 3.18 container notes:
-
- egrep: warning: egrep is obsolescent; using grep -E
-
-Adjust it.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 5ddac3c2852ecc120acab86fc403153a2097c5dc)
----
- tools/firmware/rombios/32bit/Makefile | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/firmware/rombios/32bit/Makefile b/tools/firmware/rombios/32bit/Makefile
-index c058c71551..50d45647c2 100644
---- a/tools/firmware/rombios/32bit/Makefile
-+++ b/tools/firmware/rombios/32bit/Makefile
-@@ -26,7 +26,7 @@ $(TARGET): 32bitbios_all.o
- 32bitbios_all.o: 32bitbios.o tcgbios/tcgbiosext.o util.o pmm.o
- $(LD) $(LDFLAGS_DIRECT) -s -r $^ -o 32bitbios_all.o
- @nm 32bitbios_all.o | \
-- egrep '^ +U ' >/dev/null && { \
-+ grep -E '^ +U ' >/dev/null && { \
- echo "There are undefined symbols in the BIOS:"; \
- nm -u 32bitbios_all.o; \
- exit 11; \
---
-2.42.0
-
diff --git a/0013-CI-Resync-FreeBSD-config-with-staging.patch b/0013-CI-Resync-FreeBSD-config-with-staging.patch
deleted file mode 100644
index dcd867b..0000000
--- a/0013-CI-Resync-FreeBSD-config-with-staging.patch
+++ /dev/null
@@ -1,62 +0,0 @@
-From f00d56309533427981f09ef2614f1bae4bcab62e Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 17 Feb 2023 11:16:32 +0000
-Subject: [PATCH 13/55] CI: Resync FreeBSD config with staging
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-CI: Update FreeBSD to 13.1
-
-Also print the compiler version before starting. It's not easy to find
-otherwise, and does change from time to time.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-(cherry picked from commit 5e7667ea2dd33e0e5e0f3a96db37fdb4ecd98fba)
-
-CI: Update FreeBSD to 13.2
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Stefano Stabellini <sstabellini@kernel.org>
-(cherry picked from commit f872a624cbf92de9944483eea7674ef80ced1380)
-
-CI: Update FreeBSD to 12.4
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-(cherry picked from commit a73560896ce3c513460f26bd1c205060d6ec4f8a)
----
- .cirrus.yml | 5 +++--
- 1 file changed, 3 insertions(+), 2 deletions(-)
-
-diff --git a/.cirrus.yml b/.cirrus.yml
-index c38333e736..7e0beb200d 100644
---- a/.cirrus.yml
-+++ b/.cirrus.yml
-@@ -10,19 +10,20 @@ freebsd_template: &FREEBSD_TEMPLATE
- libxml2 glib git
-
- build_script:
-+ - cc --version
- - ./configure --with-system-seabios=/usr/local/share/seabios/bios.bin
- - gmake -j`sysctl -n hw.ncpu` clang=y
-
- task:
- name: 'FreeBSD 12'
- freebsd_instance:
-- image_family: freebsd-12-3
-+ image_family: freebsd-12-4
- << : *FREEBSD_TEMPLATE
-
- task:
- name: 'FreeBSD 13'
- freebsd_instance:
-- image_family: freebsd-13-0
-+ image_family: freebsd-13-2
- << : *FREEBSD_TEMPLATE
-
- task:
---
-2.42.0
-
diff --git a/0014-tools-vchan-Fix-Wsingle-bit-bitfield-constant-conver.patch b/0014-tools-vchan-Fix-Wsingle-bit-bitfield-constant-conver.patch
deleted file mode 100644
index 6e29490..0000000
--- a/0014-tools-vchan-Fix-Wsingle-bit-bitfield-constant-conver.patch
+++ /dev/null
@@ -1,43 +0,0 @@
-From 052a8d24bc670ab6503e21dfd2fb8bccfc22aa73 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 8 Aug 2023 14:53:42 +0100
-Subject: [PATCH 14/55] tools/vchan: Fix
- -Wsingle-bit-bitfield-constant-conversion
-
-Gitlab reports:
-
- node.c:158:17: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
-
- ctrl->blocking = 1;
- ^ ~
- 1 error generated.
- make[4]: *** [/builds/xen-project/people/andyhhp/xen/tools/vchan/../../tools/Rules.mk:188: node.o] Error 1
-
-In Xen 4.18, this was fixed with c/s 99ab02f63ea8 ("tools: convert bitfields
-to unsigned type") but this is an ABI change which can't be backported.
-
-Switch 1 for -1 to provide a minimally invasive way to fix the build.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
----
- tools/vchan/node.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/vchan/node.c b/tools/vchan/node.c
-index f1638f013d..a28293b720 100644
---- a/tools/vchan/node.c
-+++ b/tools/vchan/node.c
-@@ -155,7 +155,7 @@ int main(int argc, char **argv)
- perror("libxenvchan_*_init");
- exit(1);
- }
-- ctrl->blocking = 1;
-+ ctrl->blocking = -1;
-
- srand(seed);
- fprintf(stderr, "seed=%d\n", seed);
---
-2.42.0
-
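To illustrate the diagnostic being silenced (a sketch, not the libxenvchan ABI; the field name just mirrors the one above): whether a plain int bit-field is signed is implementation-defined, and clang treats this one as signed, so a one-bit field can only represent 0 and -1 and assigning 1 truncates.

    /* Sketch only. */
    struct flags {
        int blocking:1;   /* signed here: representable values are 0 and -1 */
    };

    static void demo(struct flags *f)
    {
        /* f->blocking = 1;  -Wsingle-bit-bitfield-constant-conversion */
        f->blocking = -1;    /* same stored bit pattern, no warning */
    }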
diff --git a/0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch b/0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch
deleted file mode 100644
index 81e010b..0000000
--- a/0015-xen-vcpu-ignore-VCPU_SSHOTTMR_future.patch
+++ /dev/null
@@ -1,143 +0,0 @@
-From 7b5155a79ea946dd513847d4e7ad2b7e6a4ebb73 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:45:29 +0200
-Subject: [PATCH 15/55] xen/vcpu: ignore VCPU_SSHOTTMR_future
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The usage of VCPU_SSHOTTMR_future in Linux prior to 4.7 is bogus.
-When the hypervisor returns -ETIME (timeout in the past) Linux keeps
-retrying to setup the timer with a higher timeout instead of
-self-injecting a timer interrupt.
-
-On boxes without any hardware assistance for logdirty we have seen HVM
-Linux guests < 4.7 with 32vCPUs give up trying to setup the timer when
-logdirty is enabled:
-
-CE: Reprogramming failure. Giving up
-CE: xen increased min_delta_ns to 1000000 nsec
-CE: Reprogramming failure. Giving up
-CE: Reprogramming failure. Giving up
-CE: xen increased min_delta_ns to 506250 nsec
-CE: xen increased min_delta_ns to 759375 nsec
-CE: xen increased min_delta_ns to 1000000 nsec
-CE: Reprogramming failure. Giving up
-CE: Reprogramming failure. Giving up
-CE: Reprogramming failure. Giving up
-Freezing user space processes ...
-INFO: rcu_sched detected stalls on CPUs/tasks: { 14} (detected by 10, t=60002 jiffies, g=4006, c=4005, q=14130)
-Task dump for CPU 14:
-swapper/14 R running task 0 0 1 0x00000000
-Call Trace:
- [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
- [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
- [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
- [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
- [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
- [<ffffffff900000d5>] ? start_cpu+0x5/0x14
-INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=6922, c=6921, q=7013)
-Task dump for CPU 26:
-swapper/26 R running task 0 0 1 0x00000000
-Call Trace:
- [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
- [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
- [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
- [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
- [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
- [<ffffffff900000d5>] ? start_cpu+0x5/0x14
-INFO: rcu_sched detected stalls on CPUs/tasks: { 26} (detected by 24, t=60002 jiffies, g=8499, c=8498, q=7664)
-Task dump for CPU 26:
-swapper/26 R running task 0 0 1 0x00000000
-Call Trace:
- [<ffffffff90160f5d>] ? rcu_eqs_enter_common.isra.30+0x3d/0xf0
- [<ffffffff907b9bde>] ? default_idle+0x1e/0xd0
- [<ffffffff90039570>] ? arch_cpu_idle+0x20/0xc0
- [<ffffffff9010820a>] ? cpu_startup_entry+0x14a/0x1e0
- [<ffffffff9005d3a7>] ? start_secondary+0x1f7/0x270
- [<ffffffff900000d5>] ? start_cpu+0x5/0x14
-
-Thus leading to CPU stalls and a broken system as a result.
-
-Workaround this bogus usage by ignoring the VCPU_SSHOTTMR_future in
-the hypervisor. Old Linux versions are the only ones known to have
-(wrongly) attempted to use the flag, and ignoring it is compatible
-with the behavior expected by any guests setting that flag.
-
-Note the usage of the flag has been removed from Linux by commit:
-
-c06b6d70feb3 xen/x86: don't lose event interrupts
-
-Which landed in Linux 4.7.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Henry Wang <Henry.Wang@arm.com> # CHANGELOG
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: 19c6cbd90965b1440bd551069373d6fa3f2f365d
-master date: 2023-05-03 13:36:05 +0200
----
- CHANGELOG.md | 6 ++++++
- xen/common/domain.c | 13 ++++++++++---
- xen/include/public/vcpu.h | 5 ++++-
- 3 files changed, 20 insertions(+), 4 deletions(-)
-
-diff --git a/CHANGELOG.md b/CHANGELOG.md
-index 7f4d0f25e9..bb0eceb69a 100644
---- a/CHANGELOG.md
-+++ b/CHANGELOG.md
-@@ -4,6 +4,12 @@ Notable changes to Xen will be documented in this file.
-
- The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
-
-+## [4.17.3](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.3)
-+
-+### Changed
-+ - Ignore VCPUOP_set_singleshot_timer's VCPU_SSHOTTMR_future flag. The only
-+ known user doesn't use it properly, leading to in-guest breakage.
-+
- ## [4.17.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.17.0) - 2022-12-12
-
- ### Changed
-diff --git a/xen/common/domain.c b/xen/common/domain.c
-index 53f7e734fe..30c2279673 100644
---- a/xen/common/domain.c
-+++ b/xen/common/domain.c
-@@ -1691,9 +1691,16 @@ long common_vcpu_op(int cmd, struct vcpu *v, XEN_GUEST_HANDLE_PARAM(void) arg)
- if ( copy_from_guest(&set, arg, 1) )
- return -EFAULT;
-
-- if ( (set.flags & VCPU_SSHOTTMR_future) &&
-- (set.timeout_abs_ns < NOW()) )
-- return -ETIME;
-+ if ( set.timeout_abs_ns < NOW() )
-+ {
-+ /*
-+ * Simplify the logic if the timeout has already expired and just
-+ * inject the event.
-+ */
-+ stop_timer(&v->singleshot_timer);
-+ send_timer_event(v);
-+ break;
-+ }
-
- migrate_timer(&v->singleshot_timer, smp_processor_id());
- set_timer(&v->singleshot_timer, set.timeout_abs_ns);
-diff --git a/xen/include/public/vcpu.h b/xen/include/public/vcpu.h
-index 81a3b3a743..a836b264a9 100644
---- a/xen/include/public/vcpu.h
-+++ b/xen/include/public/vcpu.h
-@@ -150,7 +150,10 @@ typedef struct vcpu_set_singleshot_timer vcpu_set_singleshot_timer_t;
- DEFINE_XEN_GUEST_HANDLE(vcpu_set_singleshot_timer_t);
-
- /* Flags to VCPUOP_set_singleshot_timer. */
-- /* Require the timeout to be in the future (return -ETIME if it's passed). */
-+ /*
-+ * Request the timeout to be in the future (return -ETIME if it's passed)
-+ * but can be ignored by the hypervisor.
-+ */
- #define _VCPU_SSHOTTMR_future (0)
- #define VCPU_SSHOTTMR_future (1U << _VCPU_SSHOTTMR_future)
-
---
-2.42.0
-
diff --git a/0016-x86-head-check-base-address-alignment.patch b/0016-x86-head-check-base-address-alignment.patch
deleted file mode 100644
index 2b9cead..0000000
--- a/0016-x86-head-check-base-address-alignment.patch
+++ /dev/null
@@ -1,85 +0,0 @@
-From e5f9987d5f63ecc3cc9884c614aca699a41e7ca7 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:46:28 +0200
-Subject: [PATCH 16/55] x86/head: check base address alignment
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Ensure that the base address is 2M aligned, or else the page table
-entries created would be corrupt as reserved bits on the PDE end up
-set.
-
-We have encountered a broken firmware where grub2 would end up loading
-Xen at a non 2M aligned region when using the multiboot2 protocol, and
-that caused a very difficult to debug triple fault.
-
-If the alignment is not as required by the page tables print an error
-message and stop the boot. Also add a build time check that the
-calculation of symbol offsets don't break alignment of passed
-addresses.
-
-The check could be performed earlier, but so far the alignment is
-required by the page tables, and hence feels more natural that the
-check lives near to the piece of code that requires it.
-
-Note that when booted as an EFI application from the PE entry point
-the alignment check is already performed by
-efi_arch_load_addr_check(), and hence there's no need to add another
-check at the point where page tables get built in
-efi_arch_memory_setup().
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 0946068e7faea22868c577d7afa54ba4970ff520
-master date: 2023-05-03 13:36:25 +0200
----
- xen/arch/x86/boot/head.S | 14 ++++++++++++++
- 1 file changed, 14 insertions(+)
-
-diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
-index 245c859dd7..6bc64c9e86 100644
---- a/xen/arch/x86/boot/head.S
-+++ b/xen/arch/x86/boot/head.S
-@@ -1,3 +1,4 @@
-+#include <xen/lib.h>
- #include <xen/multiboot.h>
- #include <xen/multiboot2.h>
- #include <public/xen.h>
-@@ -121,6 +122,7 @@ multiboot2_header:
- .Lbad_ldr_nst: .asciz "ERR: EFI SystemTable is not provided by bootloader!"
- .Lbad_ldr_nih: .asciz "ERR: EFI ImageHandle is not provided by bootloader!"
- .Lbad_efi_msg: .asciz "ERR: EFI IA-32 platforms are not supported!"
-+.Lbad_alg_msg: .asciz "ERR: Xen must be loaded at a 2Mb boundary!"
-
- .section .init.data, "aw", @progbits
- .align 4
-@@ -146,6 +148,9 @@ bad_cpu:
- not_multiboot:
- mov $sym_offs(.Lbad_ldr_msg), %ecx
- jmp .Lget_vtb
-+.Lnot_aligned:
-+ mov $sym_offs(.Lbad_alg_msg), %ecx
-+ jmp .Lget_vtb
- .Lmb2_no_st:
- /*
- * Here we are on EFI platform. vga_text_buffer was zapped earlier
-@@ -673,6 +678,15 @@ trampoline_setup:
- cmp %edi, %eax
- jb 1b
-
-+ .if !IS_ALIGNED(sym_offs(0), 1 << L2_PAGETABLE_SHIFT)
-+ .error "Symbol offset calculation breaks alignment"
-+ .endif
-+
-+ /* Check that the image base is aligned. */
-+ lea sym_esi(_start), %eax
-+ test $(1 << L2_PAGETABLE_SHIFT) - 1, %eax
-+ jnz .Lnot_aligned
-+
- /* Map Xen into the higher mappings using 2M superpages. */
- lea _PAGE_PSE + PAGE_HYPERVISOR_RWX + sym_esi(_start), %eax
- mov $sym_offs(_start), %ecx /* %eax = PTE to write ^ */
---
-2.42.0
-
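The equivalent of the assembly check in C, for illustration only (assuming x86's 2M superpage shift of 21, i.e. L2_PAGETABLE_SHIFT; this is not the code added above):

    #include <stdbool.h>

    #define SUPERPAGE_SHIFT 21 /* assumed: x86 L2_PAGETABLE_SHIFT (2M entries) */

    static bool is_2m_aligned(unsigned long base)
    {
        /* any of the low 21 bits set would land in reserved PDE bits */
        return !(base & ((1UL << SUPERPAGE_SHIFT) - 1));
    }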
diff --git a/0017-xenalyze-Handle-start-of-day-RUNNING-transitions.patch b/0017-xenalyze-Handle-start-of-day-RUNNING-transitions.patch
deleted file mode 100644
index a4501a3..0000000
--- a/0017-xenalyze-Handle-start-of-day-RUNNING-transitions.patch
+++ /dev/null
@@ -1,275 +0,0 @@
-From f04295dd802fb6cd43a02ec59a5964b2c5950fe1 Mon Sep 17 00:00:00 2001
-From: George Dunlap <george.dunlap@cloud.com>
-Date: Tue, 5 Sep 2023 08:47:14 +0200
-Subject: [PATCH 17/55] xenalyze: Handle start-of-day ->RUNNING transitions
-
-A recent xentrace highlighted an unhandled corner case in the vcpu
-"start-of-day" logic, if the trace starts after the last running ->
-non-running transition, but before the first non-running -> running
-transition. Because start-of-day wasn't handled, vcpu_next_update()
-was expecting p->current to be NULL, and tripping out with the
-following error message when it wasn't:
-
-vcpu_next_update: FATAL: p->current not NULL! (d32768dv$p, runstate RUNSTATE_INIT)
-
-where 32768 is the DEFAULT_DOMAIN, and $p is the pcpu number.
-
-Instead of calling vcpu_start() piecemeal throughout
-sched_runstate_process(), call it at the top of the function if the
-vcpu in question is still in RUNSTATE_INIT, so that we can handle all
-the cases in one place.
-
-Sketch out at the top of the function all cases which we need to
-handle, and what to do in those cases. Some transitions tell us where
-v is running; some transitions tell us about what is (or is not)
-running on p; some transitions tell us neither.
-
-If a transition tells us where v is now running, update its state;
-otherwise leave it in INIT, in order to avoid having to deal with TSC
-skew on start-up.
-
-If a transition tells us what is or is not running on p, update
-p->current (either to v or NULL). Otherwise leave it alone.
-
-If neither, do nothing.
-
-Reifying those rules:
-
-- If we're continuing to run, set v to RUNNING, and use p->first_tsc
- as the runstate time.
-
-- If we're starting to run, set v to RUNNING, and use ri->tsc as the
- runstate time.
-
-- If v is being descheduled, leave v in the INIT state to avoid dealing
- with TSC skew; but set p->current to NULL so that whatever is
- scheduled next won't trigger the assert in vcpu_next_update().
-
-- If a vcpu is waking up (switching from one non-runnable state to
- another non-runnable state), leave v in INIT, and p in whatever
- state it's in (which may be the default domain, or some other vcpu
- which has already run).
-
-While here, fix the comment above vcpu_start; it's called when the
-vcpu state is INIT, not when current is the default domain.
-
-Signed-off-by: George Dunlap <george.dunlap@cloud.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: aab4b38b5d77e3c65f44bacd56427a85b7392a11
-master date: 2023-06-30 11:25:33 +0100
----
- tools/xentrace/xenalyze.c | 159 ++++++++++++++++++++++++--------------
- 1 file changed, 101 insertions(+), 58 deletions(-)
-
-diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
-index e7ec284eea..9b4b62c82f 100644
---- a/tools/xentrace/xenalyze.c
-+++ b/tools/xentrace/xenalyze.c
-@@ -6885,39 +6885,86 @@ void vcpu_next_update(struct pcpu_info *p, struct vcpu_data *next, tsc_t tsc)
- p->lost_record.seen_valid_schedule = 1;
- }
-
--/* If current is the default domain, we're fixing up from something
-- * like start-of-day. Update what we can. */
--void vcpu_start(struct pcpu_info *p, struct vcpu_data *v) {
-- /* If vcpus are created, or first show up, in a "dead zone", this will
-- * fail. */
-- if( !p->current || p->current->d->did != DEFAULT_DOMAIN) {
-- fprintf(stderr, "Strange, p->current not default domain!\n");
-- error(ERR_FILE, NULL);
-- return;
-- }
-+/*
-+ * If the vcpu in question is in state INIT, we're fixing up from something
-+ * like start-of-day. Update what we can.
-+ */
-+void vcpu_start(struct pcpu_info *p, struct vcpu_data *v,
-+ int old_runstate, int new_runstate, tsc_t ri_tsc) {
-+ tsc_t tsc;
-+
-+ /*
-+ *
-+ * Cases:
-+ * running -> running:
-+ * v -> running, using p->first_tsc
-+ * {runnable, blocked} -> running:
-+ * v -> running, using ri->tsc
-+ * running -> {runnable, blocked}:
-+ * Leave v INIT, but clear p->current in case another vcpu is scheduled
-+ * blocked -> runnable:
-+ * Leave INIT, and also leave p->current, since we still don't know who's scheduled here
-+ */
-+
-+ /*
-+ * NB that a vcpu won't come out of INIT until it starts running somewhere.
-+ * If this event is pcpu that has already seen a scheduling event, p->current
-+ * should be null; if this is the first scheduling event on this pcpu,
-+ * p->current should be the default domain.
-+ */
-+ if( old_runstate == RUNSTATE_RUNNING ) {
-+ if ( !p->current || p->current->d->did != DEFAULT_DOMAIN) {
-+ fprintf(stderr, "Strange, p->current not default domain!\n");
-+ error(ERR_FILE, NULL);
-+ return;
-
-- if(!p->first_tsc) {
-- fprintf(stderr, "Strange, p%d first_tsc 0!\n", p->pid);
-- error(ERR_FILE, NULL);
-+ }
-+
-+ if(!p->first_tsc) {
-+ fprintf(stderr, "Strange, p%d first_tsc 0!\n", p->pid);
-+ error(ERR_FILE, NULL);
-+ }
-+
-+ if(p->first_tsc <= p->current->runstate.tsc) {
-+ fprintf(stderr, "Strange, first_tsc %llx < default_domain runstate tsc %llx!\n",
-+ p->first_tsc,
-+ p->current->runstate.tsc);
-+ error(ERR_FILE, NULL);
-+ }
-+
-+ /* Change default domain to 'queued' */
-+ runstate_update(p->current, RUNSTATE_QUEUED, p->first_tsc);
-+
-+ /*
-+ * Set current to NULL, so that if another vcpu (not in INIT)
-+ * is scheduled here, we don't trip over the check in
-+ * vcpu_next_update()
-+ */
-+ p->current = NULL;
- }
-
-- if(p->first_tsc <= p->current->runstate.tsc) {
-- fprintf(stderr, "Strange, first_tsc %llx < default_domain runstate tsc %llx!\n",
-- p->first_tsc,
-- p->current->runstate.tsc);
-- error(ERR_FILE, NULL);
-+ /* TSC skew at start-of-day is hard to deal with. Don't
-+ * bring a vcpu out of INIT until it's seen to be actually
-+ * running somewhere. */
-+ if ( new_runstate != RUNSTATE_RUNNING ) {
-+ fprintf(warn, "First schedule for d%dv%d doesn't take us into a running state; leaving INIT\n",
-+ v->d->did, v->vid);
-+
-+ return;
- }
-
-- /* Change default domain to 'queued' */
-- runstate_update(p->current, RUNSTATE_QUEUED, p->first_tsc);
-+ tsc = ri_tsc;
-+ if ( old_runstate == RUNSTATE_RUNNING ) {
-+ /* FIXME: Copy over data from the default domain this interval */
-+ fprintf(warn, "Using first_tsc for d%dv%d (%lld cycles)\n",
-+ v->d->did, v->vid, p->last_tsc - p->first_tsc);
-
-- /* FIXME: Copy over data from the default domain this interval */
-- fprintf(warn, "Using first_tsc for d%dv%d (%lld cycles)\n",
-- v->d->did, v->vid, p->last_tsc - p->first_tsc);
-+ tsc = p->first_tsc;
-+ }
-
- /* Simulate the time since the first tsc */
-- runstate_update(v, RUNSTATE_RUNNING, p->first_tsc);
-- p->time.tsc = p->first_tsc;
-+ runstate_update(v, RUNSTATE_RUNNING, tsc);
-+ p->time.tsc = tsc;
- p->current = v;
- pcpu_string_draw(p);
- v->p = p;
-@@ -7021,6 +7068,13 @@ void sched_runstate_process(struct pcpu_info *p)
- last_oldstate = v->runstate.last_oldstate;
- v->runstate.last_oldstate.wrong = RUNSTATE_INIT;
-
-+ /* Handle all "start-of-day" issues in one place. This can be
-+ * done before any of the other tracks or sanity checks. */
-+ if ( v->runstate.state == RUNSTATE_INIT ) {
-+ vcpu_start(p, v, sevt.old_runstate, sevt.new_runstate, ri->tsc);
-+ return;
-+ }
-+
- /* Close vmexits when the putative reason for blocking / &c stops.
- * This way, we don't account cpu contention to some other overhead. */
- if(sevt.new_runstate == RUNSTATE_RUNNABLE
-@@ -7190,32 +7244,27 @@ update:
- * or stopping actually running on a physical cpu. */
- if ( type == CONTINUE )
- {
-- if( v->runstate.state == RUNSTATE_INIT ) {
-- /* Start-of-day; account first tsc -> now to v */
-- vcpu_start(p, v);
-- } else {
-- /* Continue running. First, do some sanity checks */
-- if ( v->runstate.state == RUNSTATE_LOST ) {
-- fprintf(warn, "WARNING: continue with d%dv%d in RUNSTATE_LOST. Resetting current.\n",
-- v->d->did, v->vid);
-- if ( p->current )
-- vcpu_prev_update(p, p->current, ri->tsc, RUNSTATE_LOST);
-- vcpu_next_update(p, v, ri->tsc);
-- }
-- else if( v->runstate.state != RUNSTATE_RUNNING ) {
-- /* This should never happen. */
-- fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
-- v->d->did, v->vid, runstate_name[v->runstate.state]);
-- error(ERR_FILE, NULL);
-- } else if ( v->p != p ) {
-- fprintf(warn, "FATAL: continue on p%d, but d%dv%d p%d!\n",
-- p->pid, v->d->did, v->vid,
-- v->p ? v->p->pid : -1);
-- error(ERR_FILE, NULL);
-- }
--
-- runstate_update(v, RUNSTATE_RUNNING, ri->tsc);
-+ /* Continue running. First, do some sanity checks */
-+ if ( v->runstate.state == RUNSTATE_LOST ) {
-+ fprintf(warn, "WARNING: continue with d%dv%d in RUNSTATE_LOST. Resetting current.\n",
-+ v->d->did, v->vid);
-+ if ( p->current )
-+ vcpu_prev_update(p, p->current, ri->tsc, RUNSTATE_LOST);
-+ vcpu_next_update(p, v, ri->tsc);
-+ }
-+ else if( v->runstate.state != RUNSTATE_RUNNING ) {
-+ /* This should never happen. */
-+ fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
-+ v->d->did, v->vid, runstate_name[v->runstate.state]);
-+ error(ERR_FILE, NULL);
-+ } else if ( v->p != p ) {
-+ fprintf(warn, "FATAL: continue on p%d, but d%dv%d p%d!\n",
-+ p->pid, v->d->did, v->vid,
-+ v->p ? v->p->pid : -1);
-+ error(ERR_FILE, NULL);
- }
-+
-+ runstate_update(v, RUNSTATE_RUNNING, ri->tsc);
- }
- else if ( sevt.old_runstate == RUNSTATE_RUNNING
- || v->runstate.state == RUNSTATE_RUNNING )
-@@ -7232,10 +7281,7 @@ update:
- * # (should never happen)
- */
- if( sevt.old_runstate == RUNSTATE_RUNNING ) {
-- if( v->runstate.state == RUNSTATE_INIT ) {
-- /* Start-of-day; account first tsc -> now to v */
-- vcpu_start(p, v);
-- } else if( v->runstate.state != RUNSTATE_RUNNING
-+ if( v->runstate.state != RUNSTATE_RUNNING
- && v->runstate.state != RUNSTATE_LOST ) {
- /* This should never happen. */
- fprintf(warn, "FATAL: sevt.old_runstate running, but d%dv%d runstate %s!\n",
-@@ -7264,11 +7310,8 @@ update:
-
- vcpu_next_update(p, v, ri->tsc);
- }
-- else if ( v->runstate.state != RUNSTATE_INIT )
-+ else
- {
-- /* TSC skew at start-of-day is hard to deal with. Don't
-- * bring a vcpu out of INIT until it's seen to be actually
-- * running somewhere. */
- runstate_update(v, sevt.new_runstate, ri->tsc);
- }
-
---
-2.42.0
-
diff --git a/0018-x86-ioapic-sanitize-IO-APIC-pins-before-enabling-lap.patch b/0018-x86-ioapic-sanitize-IO-APIC-pins-before-enabling-lap.patch
deleted file mode 100644
index a03f86e..0000000
--- a/0018-x86-ioapic-sanitize-IO-APIC-pins-before-enabling-lap.patch
+++ /dev/null
@@ -1,113 +0,0 @@
-From d0cdd34dd815bf99c3f8a7bddfdde5ae59b0f0db Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:47:34 +0200
-Subject: [PATCH 18/55] x86/ioapic: sanitize IO-APIC pins before enabling lapic
- LVTERR/ESR
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The current logic to init the local APIC and the IO-APIC does init the
-local APIC LVTERR/ESR before doing any sanitization on the IO-APIC pin
-configuration. It's already noted on enable_IO_APIC() that Xen
-shouldn't trust the IO-APIC being empty at bootup.
-
-At XenServer we have a system where the IO-APIC 0 is handed to Xen
-with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
-with a vector of 0 (all fields of the RTE are zeroed). Once the local
-APIC LVTERR/ESR is enabled periodic injections from such pin cause the
-local APIC to in turn inject periodic error vectors:
-
-APIC error on CPU0: 00(40), Received illegal vector
-APIC error on CPU0: 40(40), Received illegal vector
-APIC error on CPU0: 40(40), Received illegal vector
-APIC error on CPU0: 40(40), Received illegal vector
-APIC error on CPU0: 40(40), Received illegal vector
-APIC error on CPU0: 40(40), Received illegal vector
-
-That prevents Xen from booting.
-
-Move the masking of the IO-APIC pins ahead of the setup of the local
-APIC. This has the side effect of also moving the detection of the
-pin where the i8259 is connected, as such detection must be done
-before masking any pins.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 813da5f0e73b8cbd2ac3c7922506e58c28cd736d
-master date: 2023-07-17 10:31:10 +0200
----
- xen/arch/x86/apic.c | 4 ++++
- xen/arch/x86/include/asm/irq.h | 1 +
- xen/arch/x86/io_apic.c | 4 +---
- xen/arch/x86/smpboot.c | 5 +++++
- 4 files changed, 11 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/apic.c b/xen/arch/x86/apic.c
-index 47e6e5fe41..33103d3e91 100644
---- a/xen/arch/x86/apic.c
-+++ b/xen/arch/x86/apic.c
-@@ -1491,6 +1491,10 @@ int __init APIC_init_uniprocessor (void)
- physids_clear(phys_cpu_present_map);
- physid_set(boot_cpu_physical_apicid, phys_cpu_present_map);
-
-+ if ( !skip_ioapic_setup && nr_ioapics )
-+ /* Sanitize the IO-APIC pins before enabling the lapic LVTERR/ESR. */
-+ enable_IO_APIC();
-+
- setup_local_APIC(true);
-
- if (nmi_watchdog == NMI_LOCAL_APIC)
-diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
-index 76e6ed6d60..f6a0207a80 100644
---- a/xen/arch/x86/include/asm/irq.h
-+++ b/xen/arch/x86/include/asm/irq.h
-@@ -122,6 +122,7 @@ bool bogus_8259A_irq(unsigned int irq);
- int i8259A_suspend(void);
- int i8259A_resume(void);
-
-+void enable_IO_APIC(void);
- void setup_IO_APIC(void);
- void disable_IO_APIC(void);
- void setup_ioapic_dest(void);
-diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
-index 9b8a972cf5..25a08b1ea6 100644
---- a/xen/arch/x86/io_apic.c
-+++ b/xen/arch/x86/io_apic.c
-@@ -1273,7 +1273,7 @@ static void cf_check _print_IO_APIC_keyhandler(unsigned char key)
- __print_IO_APIC(0);
- }
-
--static void __init enable_IO_APIC(void)
-+void __init enable_IO_APIC(void)
- {
- int i8259_apic, i8259_pin;
- int i, apic;
-@@ -2067,8 +2067,6 @@ static void __init ioapic_pm_state_alloc(void)
-
- void __init setup_IO_APIC(void)
- {
-- enable_IO_APIC();
--
- if (acpi_ioapic)
- io_apic_irqs = ~0; /* all IRQs go through IOAPIC */
- else
-diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
-index b46fd9ab18..41ec3211ac 100644
---- a/xen/arch/x86/smpboot.c
-+++ b/xen/arch/x86/smpboot.c
-@@ -1232,6 +1232,11 @@ void __init smp_prepare_cpus(void)
- verify_local_APIC();
-
- connect_bsp_APIC();
-+
-+ if ( !skip_ioapic_setup && nr_ioapics )
-+ /* Sanitize the IO-APIC pins before enabling the lapic LVTERR/ESR. */
-+ enable_IO_APIC();
-+
- setup_local_APIC(true);
-
- if ( !skip_ioapic_setup && nr_ioapics )
---
-2.42.0
-
diff --git a/0019-x86-ioapic-add-a-raw-field-to-RTE-struct.patch b/0019-x86-ioapic-add-a-raw-field-to-RTE-struct.patch
deleted file mode 100644
index 10e5946..0000000
--- a/0019-x86-ioapic-add-a-raw-field-to-RTE-struct.patch
+++ /dev/null
@@ -1,147 +0,0 @@
-From a885649098e06432939907eee84f735a644883e6 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:48:43 +0200
-Subject: [PATCH 19/55] x86/ioapic: add a raw field to RTE struct
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Further changes will require access to the full RTE as a single value
-in order to pass it to IOMMU interrupt remapping handlers.
-
-No functional change intended.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: cdc48cb5a74b10c2b07a09d2f554756d730bfee3
-master date: 2023-07-28 09:39:44 +0200
----
- xen/arch/x86/include/asm/io_apic.h | 57 +++++++++++++-----------
- xen/arch/x86/io_apic.c | 2 +-
- xen/drivers/passthrough/amd/iommu_intr.c | 4 +-
- xen/drivers/passthrough/vtd/intremap.c | 4 +-
- 4 files changed, 35 insertions(+), 32 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
-index ef0878b09e..a558bb063c 100644
---- a/xen/arch/x86/include/asm/io_apic.h
-+++ b/xen/arch/x86/include/asm/io_apic.h
-@@ -89,35 +89,38 @@ enum ioapic_irq_destination_types {
- };
-
- struct IO_APIC_route_entry {
-- unsigned int vector:8;
-- unsigned int delivery_mode:3; /*
-- * 000: FIXED
-- * 001: lowest prio
-- * 111: ExtINT
-- */
-- unsigned int dest_mode:1; /* 0: physical, 1: logical */
-- unsigned int delivery_status:1;
-- unsigned int polarity:1; /* 0: low, 1: high */
-- unsigned int irr:1;
-- unsigned int trigger:1; /* 0: edge, 1: level */
-- unsigned int mask:1; /* 0: enabled, 1: disabled */
-- unsigned int __reserved_2:15;
--
- union {
- struct {
-- unsigned int __reserved_1:24;
-- unsigned int physical_dest:4;
-- unsigned int __reserved_2:4;
-- } physical;
--
-- struct {
-- unsigned int __reserved_1:24;
-- unsigned int logical_dest:8;
-- } logical;
--
-- /* used when Interrupt Remapping with EIM is enabled */
-- unsigned int dest32;
-- } dest;
-+ unsigned int vector:8;
-+ unsigned int delivery_mode:3; /*
-+ * 000: FIXED
-+ * 001: lowest prio
-+ * 111: ExtINT
-+ */
-+ unsigned int dest_mode:1; /* 0: physical, 1: logical */
-+ unsigned int delivery_status:1;
-+ unsigned int polarity:1; /* 0: low, 1: high */
-+ unsigned int irr:1;
-+ unsigned int trigger:1; /* 0: edge, 1: level */
-+ unsigned int mask:1; /* 0: enabled, 1: disabled */
-+ unsigned int __reserved_2:15;
-+
-+ union {
-+ struct {
-+ unsigned int __reserved_1:24;
-+ unsigned int physical_dest:4;
-+ unsigned int __reserved_2:4;
-+ } physical;
-+
-+ struct {
-+ unsigned int __reserved_1:24;
-+ unsigned int logical_dest:8;
-+ } logical;
-+ unsigned int dest32;
-+ } dest;
-+ };
-+ uint64_t raw;
-+ };
- };
-
- /*
-diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
-index 25a08b1ea6..aada2ef96c 100644
---- a/xen/arch/x86/io_apic.c
-+++ b/xen/arch/x86/io_apic.c
-@@ -2360,7 +2360,7 @@ int ioapic_guest_read(unsigned long physbase, unsigned int reg, u32 *pval)
- int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
- {
- int apic, pin, irq, ret, pirq;
-- struct IO_APIC_route_entry rte = { 0 };
-+ struct IO_APIC_route_entry rte = { };
- unsigned long flags;
- struct irq_desc *desc;
-
-diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
-index f4de09f431..9e6be3be35 100644
---- a/xen/drivers/passthrough/amd/iommu_intr.c
-+++ b/xen/drivers/passthrough/amd/iommu_intr.c
-@@ -352,8 +352,8 @@ static int update_intremap_entry_from_ioapic(
- void cf_check amd_iommu_ioapic_update_ire(
- unsigned int apic, unsigned int reg, unsigned int value)
- {
-- struct IO_APIC_route_entry old_rte = { 0 };
-- struct IO_APIC_route_entry new_rte = { 0 };
-+ struct IO_APIC_route_entry old_rte = { };
-+ struct IO_APIC_route_entry new_rte = { };
- unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
- unsigned int pin = (reg - 0x10) / 2;
- int seg, bdf, rc;
-diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
-index 1512e4866b..019c21c556 100644
---- a/xen/drivers/passthrough/vtd/intremap.c
-+++ b/xen/drivers/passthrough/vtd/intremap.c
-@@ -419,7 +419,7 @@ unsigned int cf_check io_apic_read_remap_rte(
- {
- unsigned int ioapic_pin = (reg - 0x10) / 2;
- int index;
-- struct IO_xAPIC_route_entry old_rte = { 0 };
-+ struct IO_xAPIC_route_entry old_rte = { };
- int rte_upper = (reg & 1) ? 1 : 0;
- struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
-
-@@ -442,7 +442,7 @@ void cf_check io_apic_write_remap_rte(
- unsigned int apic, unsigned int reg, unsigned int value)
- {
- unsigned int ioapic_pin = (reg - 0x10) / 2;
-- struct IO_xAPIC_route_entry old_rte = { 0 };
-+ struct IO_xAPIC_route_entry old_rte = { };
- struct IO_APIC_route_remap_entry *remap_rte;
- unsigned int rte_upper = (reg & 1) ? 1 : 0;
- struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
---
-2.42.0
-
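The anonymous union added by the patch just above is what later patches rely on to treat an IO-APIC RTE as a single 64-bit value. As a rough illustration only, here is a minimal stand-alone C model of that layout; rte_model is an invented name, the field widths are simplified rather than copied from Xen's io_apic.h, and the 64-bit bitfields assume a GCC/Clang-style toolchain.

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for an IO-APIC redirection table entry: the
     * bitfields and the 64-bit raw view alias the same storage through
     * an anonymous union. */
    struct rte_model {
        union {
            struct {
                uint64_t vector:8;
                uint64_t delivery_mode:3;
                uint64_t dest_mode:1;
                uint64_t delivery_status:1;
                uint64_t polarity:1;
                uint64_t irr:1;
                uint64_t trigger:1;
                uint64_t mask:1;
                uint64_t reserved:15;
                uint64_t dest:32;
            };
            uint64_t raw;
        };
    };

    int main(void)
    {
        struct rte_model rte = { .raw = 0 };

        rte.vector = 0x20;
        rte.mask = 1;

        /* The whole entry can be read, written or passed on as one value. */
        printf("raw RTE = %#llx\n", (unsigned long long)rte.raw);

        rte.raw = 0;   /* clearing the raw view resets every field at once */
        printf("vector = %u, mask = %u\n",
               (unsigned int)rte.vector, (unsigned int)rte.mask);
        return 0;
    }

Because the bitfields and 'raw' overlay the same 64 bits, a full entry can be snapshotted or handed to another subsystem without reassembling it from two 32-bit register reads, which is what the follow-up patches below exploit.
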
diff --git a/0020-x86-ioapic-RTE-modifications-must-use-ioapic_write_e.patch b/0020-x86-ioapic-RTE-modifications-must-use-ioapic_write_e.patch
deleted file mode 100644
index 43faeeb..0000000
--- a/0020-x86-ioapic-RTE-modifications-must-use-ioapic_write_e.patch
+++ /dev/null
@@ -1,180 +0,0 @@
-From 1bd4523d696d26976f64a919df8c7a1b3ea32f6f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:49:37 +0200
-Subject: [PATCH 20/55] x86/ioapic: RTE modifications must use
- ioapic_write_entry
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Do not allow writes to RTE registers using io_apic_write; instead
-require changes to RTEs to be performed using ioapic_write_entry.
-
-This is in preparation for passing the full contents of the RTE to the
-IOMMU interrupt remapping handlers, so remapping entries for IO-APIC
-RTEs can be updated atomically when possible.
-
-While this commit might immediately expand the number of MMIO accesses
-needed to update an IO-APIC RTE, further changes will benefit from
-having the full RTE value passed to the IOMMU handlers, as the logic
-is greatly simplified when the IOMMU handlers get the complete RTE
-value in one go.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: ef7995ed1bcd7eac37fb3c3fe56eaa54ea9baf6c
-master date: 2023-07-28 09:40:20 +0200
----
- xen/arch/x86/include/asm/io_apic.h | 8 ++---
- xen/arch/x86/io_apic.c | 43 ++++++++++++------------
- xen/drivers/passthrough/amd/iommu_intr.c | 6 ----
- 3 files changed, 25 insertions(+), 32 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
-index a558bb063c..6b514b4e3d 100644
---- a/xen/arch/x86/include/asm/io_apic.h
-+++ b/xen/arch/x86/include/asm/io_apic.h
-@@ -161,8 +161,8 @@ static inline void __io_apic_write(unsigned int apic, unsigned int reg, unsigned
-
- static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned int value)
- {
-- if ( ioapic_reg_remapped(reg) )
-- return iommu_update_ire_from_apic(apic, reg, value);
-+ /* RTE writes must use ioapic_write_entry. */
-+ BUG_ON(reg >= 0x10);
- __io_apic_write(apic, reg, value);
- }
-
-@@ -172,8 +172,8 @@ static inline void io_apic_write(unsigned int apic, unsigned int reg, unsigned i
- */
- static inline void io_apic_modify(unsigned int apic, unsigned int reg, unsigned int value)
- {
-- if ( ioapic_reg_remapped(reg) )
-- return iommu_update_ire_from_apic(apic, reg, value);
-+ /* RTE writes must use ioapic_write_entry. */
-+ BUG_ON(reg >= 0x10);
- *(IO_APIC_BASE(apic) + 4) = value;
- }
-
-diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
-index aada2ef96c..041233b9b7 100644
---- a/xen/arch/x86/io_apic.c
-+++ b/xen/arch/x86/io_apic.c
-@@ -237,15 +237,15 @@ struct IO_APIC_route_entry __ioapic_read_entry(
- {
- union entry_union eu;
-
-- if ( raw )
-+ if ( raw || !iommu_intremap )
- {
- eu.w1 = __io_apic_read(apic, 0x10 + 2 * pin);
- eu.w2 = __io_apic_read(apic, 0x11 + 2 * pin);
- }
- else
- {
-- eu.w1 = io_apic_read(apic, 0x10 + 2 * pin);
-- eu.w2 = io_apic_read(apic, 0x11 + 2 * pin);
-+ eu.w1 = iommu_read_apic_from_ire(apic, 0x10 + 2 * pin);
-+ eu.w2 = iommu_read_apic_from_ire(apic, 0x11 + 2 * pin);
- }
-
- return eu.entry;
-@@ -269,15 +269,15 @@ void __ioapic_write_entry(
- {
- union entry_union eu = { .entry = e };
-
-- if ( raw )
-+ if ( raw || !iommu_intremap )
- {
- __io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
- __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
- }
- else
- {
-- io_apic_write(apic, 0x11 + 2 * pin, eu.w2);
-- io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
-+ iommu_update_ire_from_apic(apic, 0x11 + 2 * pin, eu.w2);
-+ iommu_update_ire_from_apic(apic, 0x10 + 2 * pin, eu.w1);
- }
- }
-
-@@ -433,16 +433,17 @@ static void modify_IO_APIC_irq(unsigned int irq, unsigned int enable,
- unsigned int disable)
- {
- struct irq_pin_list *entry = irq_2_pin + irq;
-- unsigned int pin, reg;
-
- for (;;) {
-- pin = entry->pin;
-+ unsigned int pin = entry->pin;
-+ struct IO_APIC_route_entry rte;
-+
- if (pin == -1)
- break;
-- reg = io_apic_read(entry->apic, 0x10 + pin*2);
-- reg &= ~disable;
-- reg |= enable;
-- io_apic_modify(entry->apic, 0x10 + pin*2, reg);
-+ rte = __ioapic_read_entry(entry->apic, pin, false);
-+ rte.raw &= ~(uint64_t)disable;
-+ rte.raw |= enable;
-+ __ioapic_write_entry(entry->apic, pin, false, rte);
- if (!entry->next)
- break;
- entry = irq_2_pin + entry->next;
-@@ -584,16 +585,16 @@ set_ioapic_affinity_irq(struct irq_desc *desc, const cpumask_t *mask)
- dest = SET_APIC_LOGICAL_ID(dest);
- entry = irq_2_pin + irq;
- for (;;) {
-- unsigned int data;
-+ struct IO_APIC_route_entry rte;
-+
- pin = entry->pin;
- if (pin == -1)
- break;
-
-- io_apic_write(entry->apic, 0x10 + 1 + pin*2, dest);
-- data = io_apic_read(entry->apic, 0x10 + pin*2);
-- data &= ~IO_APIC_REDIR_VECTOR_MASK;
-- data |= MASK_INSR(desc->arch.vector, IO_APIC_REDIR_VECTOR_MASK);
-- io_apic_modify(entry->apic, 0x10 + pin*2, data);
-+ rte = __ioapic_read_entry(entry->apic, pin, false);
-+ rte.dest.dest32 = dest;
-+ rte.vector = desc->arch.vector;
-+ __ioapic_write_entry(entry->apic, pin, false, rte);
-
- if (!entry->next)
- break;
-@@ -2127,10 +2128,8 @@ void ioapic_resume(void)
- reg_00.bits.ID = mp_ioapics[apic].mpc_apicid;
- __io_apic_write(apic, 0, reg_00.raw);
- }
-- for (i = 0; i < nr_ioapic_entries[apic]; i++, entry++) {
-- __io_apic_write(apic, 0x11+2*i, *(((int *)entry)+1));
-- __io_apic_write(apic, 0x10+2*i, *(((int *)entry)+0));
-- }
-+ for (i = 0; i < nr_ioapic_entries[apic]; i++, entry++)
-+ __ioapic_write_entry(apic, i, true, *entry);
- }
- spin_unlock_irqrestore(&ioapic_lock, flags);
- }
-diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
-index 9e6be3be35..f32c418a7e 100644
---- a/xen/drivers/passthrough/amd/iommu_intr.c
-+++ b/xen/drivers/passthrough/amd/iommu_intr.c
-@@ -361,12 +361,6 @@ void cf_check amd_iommu_ioapic_update_ire(
- struct amd_iommu *iommu;
- unsigned int idx;
-
-- if ( !iommu_intremap )
-- {
-- __io_apic_write(apic, reg, value);
-- return;
-- }
--
- idx = ioapic_id_to_index(IO_APIC_ID(apic));
- if ( idx == MAX_IO_APICS )
- return;
---
-2.42.0
-
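The reworked modify_IO_APIC_irq() and set_ioapic_affinity_irq() above now follow a read-entry / modify / write-entry pattern instead of poking individual 32-bit registers. A sketch of that pattern against a plain in-memory table, rather than real IO-APIC MMIO, might look as follows; fake_rtes, rte_read_entry() and rte_write_entry() are invented stand-ins for __ioapic_read_entry()/__ioapic_write_entry() and exist only to show the raw-view flow.

    #include <stdint.h>
    #include <stdio.h>

    #define RTE_MASKED  (UINT64_C(1) << 16)  /* mask bit of an IO-APIC RTE */
    #define RTE_VECTOR  UINT64_C(0xff)       /* vector field, bits 7:0 */

    /* Stand-in for the redirection table (the real code goes through MMIO). */
    static uint64_t fake_rtes[24];

    static uint64_t rte_read_entry(unsigned int pin)
    {
        return fake_rtes[pin];               /* models __ioapic_read_entry() */
    }

    static void rte_write_entry(unsigned int pin, uint64_t rte)
    {
        fake_rtes[pin] = rte;                /* models __ioapic_write_entry() */
    }

    /* Whole-entry read/modify/write, mirroring the reworked
     * modify_IO_APIC_irq(): clear 'disable' bits, set 'enable' bits,
     * write the full entry back in one go. */
    static void rte_modify(unsigned int pin, uint64_t enable, uint64_t disable)
    {
        uint64_t rte = rte_read_entry(pin);

        rte &= ~disable;
        rte |= enable;
        rte_write_entry(pin, rte);
    }

    int main(void)
    {
        /* Mask pin 2 and give it vector 0x30, then unmask it again. */
        rte_modify(2, RTE_MASKED | 0x30, RTE_VECTOR);
        printf("pin 2 RTE: %#llx\n", (unsigned long long)fake_rtes[2]);
        rte_modify(2, 0, RTE_MASKED);
        printf("pin 2 RTE: %#llx\n", (unsigned long long)fake_rtes[2]);
        return 0;
    }
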
diff --git a/0021-iommu-vtd-rename-io_apic_read_remap_rte-local-variab.patch b/0021-iommu-vtd-rename-io_apic_read_remap_rte-local-variab.patch
deleted file mode 100644
index 6560452..0000000
--- a/0021-iommu-vtd-rename-io_apic_read_remap_rte-local-variab.patch
+++ /dev/null
@@ -1,64 +0,0 @@
-From e08e7330c58b7ee1efb00e348521a6afc524dc38 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:50:05 +0200
-Subject: [PATCH 21/55] iommu/vtd: rename io_apic_read_remap_rte() local
- variable
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Preparatory change to unify the IO-APIC pin variable name between
-io_apic_read_remap_rte() and amd_iommu_ioapic_update_ire(), so that
-the local variable can be made a function parameter with the same name
-across vendors.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-master commit: a478b38c01b65fa030303f0324a3380d872eb165
-master date: 2023-07-28 09:40:42 +0200
----
- xen/drivers/passthrough/vtd/intremap.c | 8 ++++----
- 1 file changed, 4 insertions(+), 4 deletions(-)
-
-diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
-index 019c21c556..53c9de9a75 100644
---- a/xen/drivers/passthrough/vtd/intremap.c
-+++ b/xen/drivers/passthrough/vtd/intremap.c
-@@ -441,14 +441,14 @@ unsigned int cf_check io_apic_read_remap_rte(
- void cf_check io_apic_write_remap_rte(
- unsigned int apic, unsigned int reg, unsigned int value)
- {
-- unsigned int ioapic_pin = (reg - 0x10) / 2;
-+ unsigned int pin = (reg - 0x10) / 2;
- struct IO_xAPIC_route_entry old_rte = { };
- struct IO_APIC_route_remap_entry *remap_rte;
- unsigned int rte_upper = (reg & 1) ? 1 : 0;
- struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
- int saved_mask;
-
-- old_rte = __ioapic_read_entry(apic, ioapic_pin, true);
-+ old_rte = __ioapic_read_entry(apic, pin, true);
-
- remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
-
-@@ -458,7 +458,7 @@ void cf_check io_apic_write_remap_rte(
- __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
- remap_rte->mask = saved_mask;
-
-- if ( ioapic_rte_to_remap_entry(iommu, apic, ioapic_pin,
-+ if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
- &old_rte, rte_upper, value) )
- {
- __io_apic_write(apic, reg, value);
-@@ -468,7 +468,7 @@ void cf_check io_apic_write_remap_rte(
- __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
- }
- else
-- __ioapic_write_entry(apic, ioapic_pin, true, old_rte);
-+ __ioapic_write_entry(apic, pin, true, old_rte);
- }
-
- static void set_msi_source_id(struct pci_dev *pdev, struct iremap_entry *ire)
---
-2.42.0
-
diff --git a/0022-x86-iommu-pass-full-IO-APIC-RTE-for-remapping-table-.patch b/0022-x86-iommu-pass-full-IO-APIC-RTE-for-remapping-table-.patch
deleted file mode 100644
index e06714e..0000000
--- a/0022-x86-iommu-pass-full-IO-APIC-RTE-for-remapping-table-.patch
+++ /dev/null
@@ -1,462 +0,0 @@
-From 5116fe12d8238cc7d6582ceefd3f7e944bff9a1d Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:50:39 +0200
-Subject: [PATCH 22/55] x86/iommu: pass full IO-APIC RTE for remapping table
- update
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-So that the remapping entry can be updated atomically when possible.
-
-Doing such an update atomically avoids Xen having to mask the IO-APIC
-pin prior to performing any interrupt movements (i.e. changing the
-destination and vector fields), as the interrupt remapping entry is
-always consistent.
-
-This also simplifies some of the logic in both the VT-d and AMD-Vi
-implementations, as having the full RTE available instead of half of
-it avoids possibly having to read and update the missing other half
-from hardware.
-
-While there, remove the explicit zeroing of new_ire fields in
-ioapic_rte_to_remap_entry() and initialize the variable at definition
-so all fields are zeroed.  Note that the fields could also be
-initialized with their final values at definition, but I found that
-likely too much to be done at this time.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Kevin Tian <kevin.tian@intel.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 3e033172b0250446bfe119f31c7f0f51684b0472
-master date: 2023-08-01 11:48:39 +0200
----
- xen/arch/x86/include/asm/iommu.h | 3 +-
- xen/arch/x86/io_apic.c | 5 +-
- xen/drivers/passthrough/amd/iommu.h | 2 +-
- xen/drivers/passthrough/amd/iommu_intr.c | 100 ++---------------
- xen/drivers/passthrough/vtd/extern.h | 2 +-
- xen/drivers/passthrough/vtd/intremap.c | 131 +++++++++++------------
- xen/drivers/passthrough/x86/iommu.c | 4 +-
- xen/include/xen/iommu.h | 3 +-
- 8 files changed, 82 insertions(+), 168 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
-index fc0afe35bf..c0d4ad3742 100644
---- a/xen/arch/x86/include/asm/iommu.h
-+++ b/xen/arch/x86/include/asm/iommu.h
-@@ -97,7 +97,8 @@ struct iommu_init_ops {
-
- extern const struct iommu_init_ops *iommu_init_ops;
-
--void iommu_update_ire_from_apic(unsigned int apic, unsigned int reg, unsigned int value);
-+void iommu_update_ire_from_apic(unsigned int apic, unsigned int pin,
-+ uint64_t rte);
- unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg);
- int iommu_setup_hpet_msi(struct msi_desc *);
-
-diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
-index 041233b9b7..b3afef8933 100644
---- a/xen/arch/x86/io_apic.c
-+++ b/xen/arch/x86/io_apic.c
-@@ -275,10 +275,7 @@ void __ioapic_write_entry(
- __io_apic_write(apic, 0x10 + 2 * pin, eu.w1);
- }
- else
-- {
-- iommu_update_ire_from_apic(apic, 0x11 + 2 * pin, eu.w2);
-- iommu_update_ire_from_apic(apic, 0x10 + 2 * pin, eu.w1);
-- }
-+ iommu_update_ire_from_apic(apic, pin, e.raw);
- }
-
- static void ioapic_write_entry(
-diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
-index 8bc3c35b1b..5429ada58e 100644
---- a/xen/drivers/passthrough/amd/iommu.h
-+++ b/xen/drivers/passthrough/amd/iommu.h
-@@ -300,7 +300,7 @@ int cf_check amd_iommu_free_intremap_table(
- unsigned int amd_iommu_intremap_table_order(
- const void *irt, const struct amd_iommu *iommu);
- void cf_check amd_iommu_ioapic_update_ire(
-- unsigned int apic, unsigned int reg, unsigned int value);
-+ unsigned int apic, unsigned int pin, uint64_t rte);
- unsigned int cf_check amd_iommu_read_ioapic_from_ire(
- unsigned int apic, unsigned int reg);
- int cf_check amd_iommu_msi_msg_update_ire(
-diff --git a/xen/drivers/passthrough/amd/iommu_intr.c b/xen/drivers/passthrough/amd/iommu_intr.c
-index f32c418a7e..e83a2a932a 100644
---- a/xen/drivers/passthrough/amd/iommu_intr.c
-+++ b/xen/drivers/passthrough/amd/iommu_intr.c
-@@ -247,11 +247,6 @@ static void update_intremap_entry(const struct amd_iommu *iommu,
- }
- }
-
--static inline int get_rte_index(const struct IO_APIC_route_entry *rte)
--{
-- return rte->vector | (rte->delivery_mode << 8);
--}
--
- static inline void set_rte_index(struct IO_APIC_route_entry *rte, int offset)
- {
- rte->vector = (u8)offset;
-@@ -267,7 +262,6 @@ static int update_intremap_entry_from_ioapic(
- int bdf,
- struct amd_iommu *iommu,
- struct IO_APIC_route_entry *rte,
-- bool_t lo_update,
- u16 *index)
- {
- unsigned long flags;
-@@ -315,31 +309,6 @@ static int update_intremap_entry_from_ioapic(
- spin_lock(lock);
- }
-
-- if ( fresh )
-- /* nothing */;
-- else if ( !lo_update )
-- {
-- /*
-- * Low half of incoming RTE is already in remapped format,
-- * so need to recover vector and delivery mode from IRTE.
-- */
-- ASSERT(get_rte_index(rte) == offset);
-- if ( iommu->ctrl.ga_en )
-- vector = entry.ptr128->full.vector;
-- else
-- vector = entry.ptr32->flds.vector;
-- /* The IntType fields match for both formats. */
-- delivery_mode = entry.ptr32->flds.int_type;
-- }
-- else if ( x2apic_enabled )
-- {
-- /*
-- * High half of incoming RTE was read from the I/O APIC and hence may
-- * not hold the full destination, so need to recover full destination
-- * from IRTE.
-- */
-- dest = get_full_dest(entry.ptr128);
-- }
- update_intremap_entry(iommu, entry, vector, delivery_mode, dest_mode, dest);
-
- spin_unlock_irqrestore(lock, flags);
-@@ -350,14 +319,11 @@ static int update_intremap_entry_from_ioapic(
- }
-
- void cf_check amd_iommu_ioapic_update_ire(
-- unsigned int apic, unsigned int reg, unsigned int value)
-+ unsigned int apic, unsigned int pin, uint64_t rte)
- {
-- struct IO_APIC_route_entry old_rte = { };
-- struct IO_APIC_route_entry new_rte = { };
-- unsigned int rte_lo = (reg & 1) ? reg - 1 : reg;
-- unsigned int pin = (reg - 0x10) / 2;
-+ struct IO_APIC_route_entry old_rte;
-+ struct IO_APIC_route_entry new_rte = { .raw = rte };
- int seg, bdf, rc;
-- bool saved_mask, fresh = false;
- struct amd_iommu *iommu;
- unsigned int idx;
-
-@@ -373,58 +339,23 @@ void cf_check amd_iommu_ioapic_update_ire(
- {
- AMD_IOMMU_WARN("failed to find IOMMU for IO-APIC @ %04x:%04x\n",
- seg, bdf);
-- __io_apic_write(apic, reg, value);
-+ __ioapic_write_entry(apic, pin, true, new_rte);
- return;
- }
-
-- /* save io-apic rte lower 32 bits */
-- *((u32 *)&old_rte) = __io_apic_read(apic, rte_lo);
-- saved_mask = old_rte.mask;
--
-- if ( reg == rte_lo )
-- {
-- *((u32 *)&new_rte) = value;
-- /* read upper 32 bits from io-apic rte */
-- *(((u32 *)&new_rte) + 1) = __io_apic_read(apic, reg + 1);
-- }
-- else
-- {
-- *((u32 *)&new_rte) = *((u32 *)&old_rte);
-- *(((u32 *)&new_rte) + 1) = value;
-- }
--
-- if ( ioapic_sbdf[idx].pin_2_idx[pin] >= INTREMAP_MAX_ENTRIES )
-- {
-- ASSERT(saved_mask);
--
-- /*
-- * There's nowhere except the IRTE to store a full 32-bit destination,
-- * so we may not bypass entry allocation and updating of the low RTE
-- * half in the (usual) case of the high RTE half getting written first.
-- */
-- if ( new_rte.mask && !x2apic_enabled )
-- {
-- __io_apic_write(apic, reg, value);
-- return;
-- }
--
-- fresh = true;
-- }
--
-+ old_rte = __ioapic_read_entry(apic, pin, true);
- /* mask the interrupt while we change the intremap table */
-- if ( !saved_mask )
-+ if ( !old_rte.mask )
- {
- old_rte.mask = 1;
-- __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
-+ __ioapic_write_entry(apic, pin, true, old_rte);
- }
-
- /* Update interrupt remapping entry */
- rc = update_intremap_entry_from_ioapic(
-- bdf, iommu, &new_rte, reg == rte_lo,
-+ bdf, iommu, &new_rte,
- &ioapic_sbdf[idx].pin_2_idx[pin]);
-
-- __io_apic_write(apic, reg, ((u32 *)&new_rte)[reg != rte_lo]);
--
- if ( rc )
- {
- /* Keep the entry masked. */
-@@ -433,20 +364,7 @@ void cf_check amd_iommu_ioapic_update_ire(
- return;
- }
-
-- /* For lower bits access, return directly to avoid double writes */
-- if ( reg == rte_lo )
-- return;
--
-- /*
-- * Unmask the interrupt after we have updated the intremap table. Also
-- * write the low half if a fresh entry was allocated for a high half
-- * update in x2APIC mode.
-- */
-- if ( !saved_mask || (x2apic_enabled && fresh) )
-- {
-- old_rte.mask = saved_mask;
-- __io_apic_write(apic, rte_lo, *((u32 *)&old_rte));
-- }
-+ __ioapic_write_entry(apic, pin, true, new_rte);
- }
-
- unsigned int cf_check amd_iommu_read_ioapic_from_ire(
-diff --git a/xen/drivers/passthrough/vtd/extern.h b/xen/drivers/passthrough/vtd/extern.h
-index 39602d1f88..d49e40c5ce 100644
---- a/xen/drivers/passthrough/vtd/extern.h
-+++ b/xen/drivers/passthrough/vtd/extern.h
-@@ -92,7 +92,7 @@ int cf_check intel_iommu_get_reserved_device_memory(
- unsigned int cf_check io_apic_read_remap_rte(
- unsigned int apic, unsigned int reg);
- void cf_check io_apic_write_remap_rte(
-- unsigned int apic, unsigned int reg, unsigned int value);
-+ unsigned int apic, unsigned int pin, uint64_t rte);
-
- struct msi_desc;
- struct msi_msg;
-diff --git a/xen/drivers/passthrough/vtd/intremap.c b/xen/drivers/passthrough/vtd/intremap.c
-index 53c9de9a75..78d7bc139a 100644
---- a/xen/drivers/passthrough/vtd/intremap.c
-+++ b/xen/drivers/passthrough/vtd/intremap.c
-@@ -328,15 +328,14 @@ static int remap_entry_to_ioapic_rte(
-
- static int ioapic_rte_to_remap_entry(struct vtd_iommu *iommu,
- int apic, unsigned int ioapic_pin, struct IO_xAPIC_route_entry *old_rte,
-- unsigned int rte_upper, unsigned int value)
-+ struct IO_xAPIC_route_entry new_rte)
- {
- struct iremap_entry *iremap_entry = NULL, *iremap_entries;
- struct iremap_entry new_ire;
- struct IO_APIC_route_remap_entry *remap_rte;
-- struct IO_xAPIC_route_entry new_rte;
- int index;
- unsigned long flags;
-- bool init = false;
-+ bool init = false, masked = old_rte->mask;
-
- remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
- spin_lock_irqsave(&iommu->intremap.lock, flags);
-@@ -364,48 +363,40 @@ static int ioapic_rte_to_remap_entry(struct vtd_iommu *iommu,
-
- new_ire = *iremap_entry;
-
-- if ( rte_upper )
-- {
-- if ( x2apic_enabled )
-- new_ire.remap.dst = value;
-- else
-- new_ire.remap.dst = (value >> 24) << 8;
-- }
-+ if ( x2apic_enabled )
-+ new_ire.remap.dst = new_rte.dest.dest32;
- else
-- {
-- *(((u32 *)&new_rte) + 0) = value;
-- new_ire.remap.fpd = 0;
-- new_ire.remap.dm = new_rte.dest_mode;
-- new_ire.remap.tm = new_rte.trigger;
-- new_ire.remap.dlm = new_rte.delivery_mode;
-- /* Hardware require RH = 1 for LPR delivery mode */
-- new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
-- new_ire.remap.avail = 0;
-- new_ire.remap.res_1 = 0;
-- new_ire.remap.vector = new_rte.vector;
-- new_ire.remap.res_2 = 0;
--
-- set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
-- new_ire.remap.res_3 = 0;
-- new_ire.remap.res_4 = 0;
-- new_ire.remap.p = 1; /* finally, set present bit */
--
-- /* now construct new ioapic rte entry */
-- remap_rte->vector = new_rte.vector;
-- remap_rte->delivery_mode = 0; /* has to be 0 for remap format */
-- remap_rte->index_15 = (index >> 15) & 0x1;
-- remap_rte->index_0_14 = index & 0x7fff;
--
-- remap_rte->delivery_status = new_rte.delivery_status;
-- remap_rte->polarity = new_rte.polarity;
-- remap_rte->irr = new_rte.irr;
-- remap_rte->trigger = new_rte.trigger;
-- remap_rte->mask = new_rte.mask;
-- remap_rte->reserved = 0;
-- remap_rte->format = 1; /* indicate remap format */
-- }
--
-- update_irte(iommu, iremap_entry, &new_ire, !init);
-+ new_ire.remap.dst = GET_xAPIC_ID(new_rte.dest.dest32) << 8;
-+
-+ new_ire.remap.dm = new_rte.dest_mode;
-+ new_ire.remap.tm = new_rte.trigger;
-+ new_ire.remap.dlm = new_rte.delivery_mode;
-+ /* Hardware require RH = 1 for LPR delivery mode. */
-+ new_ire.remap.rh = (new_ire.remap.dlm == dest_LowestPrio);
-+ new_ire.remap.vector = new_rte.vector;
-+
-+ set_ioapic_source_id(IO_APIC_ID(apic), &new_ire);
-+ /* Finally, set present bit. */
-+ new_ire.remap.p = 1;
-+
-+ /* Now construct new ioapic rte entry. */
-+ remap_rte->vector = new_rte.vector;
-+ /* Has to be 0 for remap format. */
-+ remap_rte->delivery_mode = 0;
-+ remap_rte->index_15 = (index >> 15) & 0x1;
-+ remap_rte->index_0_14 = index & 0x7fff;
-+
-+ remap_rte->delivery_status = new_rte.delivery_status;
-+ remap_rte->polarity = new_rte.polarity;
-+ remap_rte->irr = new_rte.irr;
-+ remap_rte->trigger = new_rte.trigger;
-+ remap_rte->mask = new_rte.mask;
-+ remap_rte->reserved = 0;
-+ /* Indicate remap format. */
-+ remap_rte->format = 1;
-+
-+ /* If cmpxchg16b is not available the caller must mask the IO-APIC pin. */
-+ update_irte(iommu, iremap_entry, &new_ire, !init && !masked);
- iommu_sync_cache(iremap_entry, sizeof(*iremap_entry));
- iommu_flush_iec_index(iommu, 0, index);
-
-@@ -439,36 +430,42 @@ unsigned int cf_check io_apic_read_remap_rte(
- }
-
- void cf_check io_apic_write_remap_rte(
-- unsigned int apic, unsigned int reg, unsigned int value)
-+ unsigned int apic, unsigned int pin, uint64_t rte)
- {
-- unsigned int pin = (reg - 0x10) / 2;
-+ struct IO_xAPIC_route_entry new_rte = { .raw = rte };
- struct IO_xAPIC_route_entry old_rte = { };
-- struct IO_APIC_route_remap_entry *remap_rte;
-- unsigned int rte_upper = (reg & 1) ? 1 : 0;
- struct vtd_iommu *iommu = ioapic_to_iommu(IO_APIC_ID(apic));
-- int saved_mask;
--
-- old_rte = __ioapic_read_entry(apic, pin, true);
--
-- remap_rte = (struct IO_APIC_route_remap_entry *) &old_rte;
--
-- /* mask the interrupt while we change the intremap table */
-- saved_mask = remap_rte->mask;
-- remap_rte->mask = 1;
-- __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
-- remap_rte->mask = saved_mask;
-+ bool masked = true;
-+ int rc;
-
-- if ( ioapic_rte_to_remap_entry(iommu, apic, pin,
-- &old_rte, rte_upper, value) )
-+ if ( !cpu_has_cx16 )
- {
-- __io_apic_write(apic, reg, value);
-+ /*
-+ * Cannot atomically update the IRTE entry: mask the IO-APIC pin to
-+ * avoid interrupts seeing an inconsistent IRTE entry.
-+ */
-+ old_rte = __ioapic_read_entry(apic, pin, true);
-+ if ( !old_rte.mask )
-+ {
-+ masked = false;
-+ old_rte.mask = 1;
-+ __ioapic_write_entry(apic, pin, true, old_rte);
-+ }
-+ }
-
-- /* Recover the original value of 'mask' bit */
-- if ( rte_upper )
-- __io_apic_write(apic, reg & ~1, *(u32 *)&old_rte);
-+ rc = ioapic_rte_to_remap_entry(iommu, apic, pin, &old_rte, new_rte);
-+ if ( rc )
-+ {
-+ if ( !masked )
-+ {
-+ /* Recover the original value of 'mask' bit */
-+ old_rte.mask = 0;
-+ __ioapic_write_entry(apic, pin, true, old_rte);
-+ }
-+ return;
- }
-- else
-- __ioapic_write_entry(apic, pin, true, old_rte);
-+ /* old_rte will contain the updated IO-APIC RTE on success. */
-+ __ioapic_write_entry(apic, pin, true, old_rte);
- }
-
- static void set_msi_source_id(struct pci_dev *pdev, struct iremap_entry *ire)
-diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
-index f671b0f2bb..8bd0ccb2e9 100644
---- a/xen/drivers/passthrough/x86/iommu.c
-+++ b/xen/drivers/passthrough/x86/iommu.c
-@@ -142,9 +142,9 @@ int iommu_enable_x2apic(void)
- }
-
- void iommu_update_ire_from_apic(
-- unsigned int apic, unsigned int reg, unsigned int value)
-+ unsigned int apic, unsigned int pin, uint64_t rte)
- {
-- iommu_vcall(&iommu_ops, update_ire_from_apic, apic, reg, value);
-+ iommu_vcall(&iommu_ops, update_ire_from_apic, apic, pin, rte);
- }
-
- unsigned int iommu_read_apic_from_ire(unsigned int apic, unsigned int reg)
-diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
-index 4f22fc1bed..f8a52627f7 100644
---- a/xen/include/xen/iommu.h
-+++ b/xen/include/xen/iommu.h
-@@ -274,7 +274,8 @@ struct iommu_ops {
- int (*enable_x2apic)(void);
- void (*disable_x2apic)(void);
-
-- void (*update_ire_from_apic)(unsigned int apic, unsigned int reg, unsigned int value);
-+ void (*update_ire_from_apic)(unsigned int apic, unsigned int pin,
-+ uint64_t rte);
- unsigned int (*read_apic_from_ire)(unsigned int apic, unsigned int reg);
-
- int (*setup_hpet_msi)(struct msi_desc *);
---
-2.42.0
-
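On the VT-d side the patch above boils down to a single decision: update the interrupt-remapping entry in place when a 128-bit atomic write (cmpxchg16b) is available, and otherwise mask the IO-APIC pin around the update. The stand-alone sketch below captures only that decision; has_cx16, fake_rte, write_rte() and update_irte_for_pin() are invented stand-ins for the real helpers in intremap.c, not Xen code.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RTE_MASKED  (UINT64_C(1) << 16)

    static bool has_cx16 = true;         /* is cmpxchg16b available? */
    static uint64_t fake_rte[24];        /* stands in for the IO-APIC RTEs */

    static void write_rte(unsigned int pin, uint64_t rte)
    {
        fake_rte[pin] = rte;
    }

    static int update_irte_for_pin(unsigned int pin, uint64_t rte)
    {
        printf("pin %u: IRTE updated for RTE %#llx (%s)\n", pin,
               (unsigned long long)rte,
               has_cx16 ? "atomically" : "pin masked");
        return 0;                        /* pretend the update succeeded */
    }

    /* Shape of io_apic_write_remap_rte() after the rework: only mask the
     * pin around the remapping update when it cannot be done atomically. */
    static void write_remapped_rte(unsigned int pin, uint64_t new_rte)
    {
        uint64_t old_rte = fake_rte[pin];
        bool we_masked = false;

        if ( !has_cx16 && !(old_rte & RTE_MASKED) )
        {
            write_rte(pin, old_rte | RTE_MASKED);  /* temporarily mask */
            we_masked = true;
        }

        if ( update_irte_for_pin(pin, new_rte) )
        {
            if ( we_masked )
                write_rte(pin, old_rte);           /* failed: restore */
            return;
        }

        write_rte(pin, new_rte);         /* success: full RTE in one write */
    }

    int main(void)
    {
        write_remapped_rte(3, 0x30);
        has_cx16 = false;
        write_remapped_rte(3, 0x31);
        return 0;
    }
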
diff --git a/0023-build-correct-gas-noexecstack-check.patch b/0023-build-correct-gas-noexecstack-check.patch
deleted file mode 100644
index 245d631..0000000
--- a/0023-build-correct-gas-noexecstack-check.patch
+++ /dev/null
@@ -1,34 +0,0 @@
-From ba360fbb6413231f84a7d68f5cb34858f81d4d23 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 5 Sep 2023 08:51:50 +0200
-Subject: [PATCH 23/55] build: correct gas --noexecstack check
-
-The check was missing an escape for the inner $, thus breaking things
-in the unlikely event that the underlying assembler doesn't support this
-option.
-
-Fixes: 62d22296a95d ("build: silence GNU ld warning about executable stacks")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: d1f6a58dfdc508c43a51c1865c826d519bf16493
-master date: 2023-08-14 09:58:19 +0200
----
- xen/Makefile | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 7bb9de7bdc..455916c757 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -405,7 +405,7 @@ endif
-
- AFLAGS += -D__ASSEMBLY__
-
--$(call cc-option-add,AFLAGS,CC,-Wa$(comma)--noexecstack)
-+$(call cc-option-add,AFLAGS,CC,-Wa$$(comma)--noexecstack)
-
- LDFLAGS-$(call ld-option,--warn-rwx-segments) += --no-warn-rwx-segments
-
---
-2.42.0
-
diff --git a/0024-libxl-slightly-correct-JSON-generation-of-CPU-policy.patch b/0024-libxl-slightly-correct-JSON-generation-of-CPU-policy.patch
deleted file mode 100644
index 1ec7335..0000000
--- a/0024-libxl-slightly-correct-JSON-generation-of-CPU-policy.patch
+++ /dev/null
@@ -1,38 +0,0 @@
-From 042982297802e7b746dc2fac95a453cc88d0aa83 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 5 Sep 2023 08:52:15 +0200
-Subject: [PATCH 24/55] libxl: slightly correct JSON generation of CPU policy
-
-The "cpuid_empty" label is also (in principle; maybe only for rubbish
-input) reachable in the "cpuid_only" case. Hence the label needs to live
-ahead of the check of the variable.
-
-Fixes: 5b80cecb747b ("libxl: introduce MSR data in libxl_cpuid_policy")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: ebce4e3a146c39e57bb7a890e059e89c32b6d547
-master date: 2023-08-17 16:24:17 +0200
----
- tools/libs/light/libxl_cpuid.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
-index 849722541c..5c66d094b2 100644
---- a/tools/libs/light/libxl_cpuid.c
-+++ b/tools/libs/light/libxl_cpuid.c
-@@ -710,10 +710,11 @@ parse_cpuid:
- libxl__strdup(NOGC, libxl__json_object_get_string(r));
- }
- }
-+
-+cpuid_empty:
- if (cpuid_only)
- return 0;
-
--cpuid_empty:
- co = libxl__json_map_get("msr", o, JSON_ARRAY);
- if (!libxl__json_object_is_array(co))
- return ERROR_FAIL;
---
-2.42.0
-
diff --git a/0025-tboot-Disable-CET-at-shutdown.patch b/0025-tboot-Disable-CET-at-shutdown.patch
deleted file mode 100644
index f06db61..0000000
--- a/0025-tboot-Disable-CET-at-shutdown.patch
+++ /dev/null
@@ -1,53 +0,0 @@
-From 7ca58fbef489fcb17631872a2bdc929823a2a494 Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jandryuk@gmail.com>
-Date: Tue, 5 Sep 2023 08:52:33 +0200
-Subject: [PATCH 25/55] tboot: Disable CET at shutdown
-
-tboot_shutdown() calls into tboot to perform the actual system shutdown.
-tboot isn't built with endbr annotations, and Xen has CET-IBT enabled on
-newer hardware. shutdown_entry isn't annotated with endbr and Xen
-faults:
-
-Panic on CPU 0:
-CONTROL-FLOW PROTECTION FAULT: #CP[0003] endbranch
-
-And Xen hangs at this point.
-
-Disabling CET-IBT let Xen and tboot power off, but reboot was
-performing a poweroff instead of a warm reboot. Disabling all of CET,
-i.e. shadow stacks as well, lets tboot reboot properly.
-
-Fixes: cdbe2b0a1aec ("x86: Enable CET Indirect Branch Tracking")
-Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
-master commit: 0801868f550539d417d46f82c49307480947ccaa
-master date: 2023-08-17 16:24:49 +0200
----
- xen/arch/x86/tboot.c | 10 ++++++++++
- 1 file changed, 10 insertions(+)
-
-diff --git a/xen/arch/x86/tboot.c b/xen/arch/x86/tboot.c
-index fe1abfdf08..a2e9e97ed7 100644
---- a/xen/arch/x86/tboot.c
-+++ b/xen/arch/x86/tboot.c
-@@ -398,6 +398,16 @@ void tboot_shutdown(uint32_t shutdown_type)
- tboot_gen_xenheap_integrity(g_tboot_shared->s3_key, &xenheap_mac);
- }
-
-+ /*
-+ * Disable CET - tboot may not be built with endbr, and it doesn't support
-+ * shadow stacks.
-+ */
-+ if ( read_cr4() & X86_CR4_CET )
-+ {
-+ wrmsrl(MSR_S_CET, 0);
-+ write_cr4(read_cr4() & ~X86_CR4_CET);
-+ }
-+
- /*
- * During early boot, we can be called by panic before idle_vcpu[0] is
- * setup, but in that case we don't need to change page tables.
---
-2.42.0
-
diff --git a/0026-x86-svm-Fix-valid-condition-in-svm_get_pending_event.patch b/0026-x86-svm-Fix-valid-condition-in-svm_get_pending_event.patch
deleted file mode 100644
index 10aa14f..0000000
--- a/0026-x86-svm-Fix-valid-condition-in-svm_get_pending_event.patch
+++ /dev/null
@@ -1,29 +0,0 @@
-From a939e953cdd522da3d8f0efeaea84448b5b570f9 Mon Sep 17 00:00:00 2001
-From: Jinoh Kang <jinoh.kang.kr@gmail.com>
-Date: Tue, 5 Sep 2023 08:53:01 +0200
-Subject: [PATCH 26/55] x86/svm: Fix valid condition in svm_get_pending_event()
-
-Fixes: 9864841914c2 ("x86/vm_event: add support for VM_EVENT_REASON_INTERRUPT")
-Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
-master commit: b2865c2b6f164d2c379177cdd1cb200e4eaba549
-master date: 2023-08-18 20:21:44 +0100
----
- xen/arch/x86/hvm/svm/svm.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
-index 5fa945c526..e8f50e7c5e 100644
---- a/xen/arch/x86/hvm/svm/svm.c
-+++ b/xen/arch/x86/hvm/svm/svm.c
-@@ -2490,7 +2490,7 @@ static bool cf_check svm_get_pending_event(
- {
- const struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
-
-- if ( vmcb->event_inj.v )
-+ if ( !vmcb->event_inj.v )
- return false;
-
- info->vector = vmcb->event_inj.vector;
---
-2.42.0
-
diff --git a/0027-x86-vmx-Revert-x86-VMX-sanitize-rIP-before-re-enteri.patch b/0027-x86-vmx-Revert-x86-VMX-sanitize-rIP-before-re-enteri.patch
deleted file mode 100644
index a022066..0000000
--- a/0027-x86-vmx-Revert-x86-VMX-sanitize-rIP-before-re-enteri.patch
+++ /dev/null
@@ -1,100 +0,0 @@
-From 8be85d8c0df2445c012fac42117396b483db5db0 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 5 Sep 2023 08:53:31 +0200
-Subject: [PATCH 27/55] x86/vmx: Revert "x86/VMX: sanitize rIP before
- re-entering guest"
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-At the time of XSA-170, the x86 instruction emulator was genuinely broken. It
-would load arbitrary values into %rip, and putting a check here was
-probably the best stopgap security fix.  It should have been reverted
-following c/s 81d3a0b26c1 "x86emul: limit-check branch targets", which
-corrected the emulator behaviour.
-
-However, everyone involved in XSA-170, myself included, failed to read the SDM
-correctly. On the subject of %rip consistency checks, the SDM stated:
-
- If the processor supports N < 64 linear-address bits, bits 63:N must be
- identical
-
-A non-canonical %rip (and SSP more recently) is an explicitly legal state in
-x86, and the VMEntry consistency checks are intentionally off-by-one from a
-regular canonical check.
-
-The consequence of this bug is that Xen will currently take a legal x86 state
-which would successfully VMEnter, and corrupt it into having non-architectural
-behaviour.
-
-Furthermore, in the time this bugfix has been pending in public, I
-successfully persuaded Intel to clarify the SDM, adding the following
-clarification:
-
- The guest RIP value is not required to be canonical; the value of bit N-1
- may differ from that of bit N.
-
-Fixes: ffbbfda377 ("x86/VMX: sanitize rIP before re-entering guest")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 10c83bb0f5d158d101d983883741b76f927e54a3
-master date: 2023-08-23 18:44:59 +0100
----
- xen/arch/x86/hvm/vmx/vmx.c | 34 +---------------------------------
- 1 file changed, 1 insertion(+), 33 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index f256dc2635..072288a5ef 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -3975,7 +3975,7 @@ static void undo_nmis_unblocked_by_iret(void)
- void vmx_vmexit_handler(struct cpu_user_regs *regs)
- {
- unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
-- unsigned int vector = 0, mode;
-+ unsigned int vector = 0;
- struct vcpu *v = current;
- struct domain *currd = v->domain;
-
-@@ -4650,38 +4650,6 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- out:
- if ( nestedhvm_vcpu_in_guestmode(v) )
- nvmx_idtv_handling();
--
-- /*
-- * VM entry will fail (causing the guest to get crashed) if rIP (and
-- * rFLAGS, but we don't have an issue there) doesn't meet certain
-- * criteria. As we must not allow less than fully privileged mode to have
-- * such an effect on the domain, we correct rIP in that case (accepting
-- * this not being architecturally correct behavior, as the injected #GP
-- * fault will then not see the correct [invalid] return address).
-- * And since we know the guest will crash, we crash it right away if it
-- * already is in most privileged mode.
-- */
-- mode = vmx_guest_x86_mode(v);
-- if ( mode == 8 ? !is_canonical_address(regs->rip)
-- : regs->rip != regs->eip )
-- {
-- gprintk(XENLOG_WARNING, "Bad rIP %lx for mode %u\n", regs->rip, mode);
--
-- if ( vmx_get_cpl() )
-- {
-- __vmread(VM_ENTRY_INTR_INFO, &intr_info);
-- if ( !(intr_info & INTR_INFO_VALID_MASK) )
-- hvm_inject_hw_exception(TRAP_gp_fault, 0);
-- /* Need to fix rIP nevertheless. */
-- if ( mode == 8 )
-- regs->rip = (long)(regs->rip << (64 - VADDR_BITS)) >>
-- (64 - VADDR_BITS);
-- else
-- regs->rip = regs->eip;
-- }
-- else
-- domain_crash(v->domain);
-- }
- }
-
- static void lbr_tsx_fixup(void)
---
-2.42.0
-
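The "off-by-one" relationship between a regular canonical check and the VMEntry consistency check quoted above can be made concrete with a few lines of stand-alone C. The helpers below are illustrative only, not Xen's is_canonical_address(), and assume a CPU with 48 linear-address bits.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LA_BITS 48   /* assume 48 linear-address bits */

    /* Regular canonical check: bits 63:47 must all equal bit 47. */
    static bool is_canonical(uint64_t addr)
    {
        return ((int64_t)(addr << (64 - LA_BITS)) >> (64 - LA_BITS))
               == (int64_t)addr;
    }

    /* VMEntry consistency check as quoted from the SDM: bits 63:48 only
     * need to be identical to each other; they need not match bit 47. */
    static bool vmentry_rip_ok(uint64_t addr)
    {
        uint64_t top = addr >> LA_BITS;

        return top == 0 || top == (UINT64_C(1) << (64 - LA_BITS)) - 1;
    }

    int main(void)
    {
        /* Bit 47 set, bits 63:48 clear: legal for VMEntry, not canonical. */
        uint64_t rip = UINT64_C(0x0000800000000000);

        printf("canonical: %d, VMEntry-ok: %d\n",
               is_canonical(rip), vmentry_rip_ok(rip));
        return 0;
    }

For this %rip, bit 47 differs from bits 63:48, so the address is not canonical, yet the VMEntry check is satisfied; that is exactly the class of legal guest state which the reverted sanitisation would have corrupted.
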
diff --git a/0028-x86-irq-fix-reporting-of-spurious-i8259-interrupts.patch b/0028-x86-irq-fix-reporting-of-spurious-i8259-interrupts.patch
deleted file mode 100644
index 2fcfd68..0000000
--- a/0028-x86-irq-fix-reporting-of-spurious-i8259-interrupts.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From 699de512748d8e3bdcb3225b3b2a77c10cfd2408 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Sep 2023 08:53:57 +0200
-Subject: [PATCH 28/55] x86/irq: fix reporting of spurious i8259 interrupts
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The return value of bogus_8259A_irq() is wrong: the function will
-return `true` when the IRQ is real and `false` when it's a spurious
-IRQ. This causes the "No irq handler for vector ..." message in
-do_IRQ() to be printed for spurious i8259 interrupts, which is not
-intended (and not helpful).
-
-Fix by inverting the return value of bogus_8259A_irq().
-
-Fixes: 132906348a14 ('x86/i8259: Handle bogus spurious interrupts more quietly')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 709f6c8ce6422475c372e67507606170a31ccb65
-master date: 2023-08-30 10:03:53 +0200
----
- xen/arch/x86/i8259.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/i8259.c b/xen/arch/x86/i8259.c
-index 6b35be10f0..ed9f55abe5 100644
---- a/xen/arch/x86/i8259.c
-+++ b/xen/arch/x86/i8259.c
-@@ -37,7 +37,7 @@ static bool _mask_and_ack_8259A_irq(unsigned int irq);
-
- bool bogus_8259A_irq(unsigned int irq)
- {
-- return _mask_and_ack_8259A_irq(irq);
-+ return !_mask_and_ack_8259A_irq(irq);
- }
-
- static void cf_check mask_and_ack_8259A_irq(struct irq_desc *desc)
---
-2.42.0
-
diff --git a/0029-xen-arm-page-Handle-cache-flush-of-an-element-at-the.patch b/0029-xen-arm-page-Handle-cache-flush-of-an-element-at-the.patch
deleted file mode 100644
index bc866d0..0000000
--- a/0029-xen-arm-page-Handle-cache-flush-of-an-element-at-the.patch
+++ /dev/null
@@ -1,111 +0,0 @@
-From d31e5b2a9c39816a954d1088d4cfc782f0006f39 Mon Sep 17 00:00:00 2001
-From: Stefano Stabellini <stefano.stabellini@amd.com>
-Date: Tue, 5 Sep 2023 14:33:29 +0200
-Subject: [PATCH 29/55] xen/arm: page: Handle cache flush of an element at the
- top of the address space
-
-The region that needs to be cleaned/invalidated may be at the top
-of the address space. This means that 'end' (i.e. 'p + size') will
-be 0 and therefore nothing will be cleaned/invalidated as the check
-in the loop will always be false.
-
-On Arm64, we only support up to a 48-bit virtual address space, so
-this is not a concern there. However, for 32-bit, the mapcache is
-using the last 2GB of the address space. Therefore we may not properly
-clean/invalidate some pages. This could lead to memory corruption or
-data leakage (the scrubbed value may still sit in the cache when the
-guest could read the memory directly and therefore read the old
-content).
-
-Rework invalidate_dcache_va_range(), clean_dcache_va_range(),
-clean_and_invalidate_dcache_va_range() to handle a cache flush
-with an element at the top of the address space.
-
-This is CVE-2023-34321 / XSA-437.
-
-Reported-by: Julien Grall <jgrall@amazon.com>
-Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Acked-by: Bertrand Marquis <bertrand.marquis@arm.com>
-master commit: 9a216e92de9f9011097e4f1fb55ff67ba0a21704
-master date: 2023-09-05 14:30:08 +0200
----
- xen/arch/arm/include/asm/page.h | 33 ++++++++++++++++++++-------------
- 1 file changed, 20 insertions(+), 13 deletions(-)
-
-diff --git a/xen/arch/arm/include/asm/page.h b/xen/arch/arm/include/asm/page.h
-index e7cd62190c..d7fe770a5e 100644
---- a/xen/arch/arm/include/asm/page.h
-+++ b/xen/arch/arm/include/asm/page.h
-@@ -160,26 +160,25 @@ static inline size_t read_dcache_line_bytes(void)
-
- static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
- {
-- const void *end = p + size;
- size_t cacheline_mask = dcache_line_bytes - 1;
-
- dsb(sy); /* So the CPU issues all writes to the range */
-
- if ( (uintptr_t)p & cacheline_mask )
- {
-+ size -= dcache_line_bytes - ((uintptr_t)p & cacheline_mask);
- p = (void *)((uintptr_t)p & ~cacheline_mask);
- asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
- p += dcache_line_bytes;
- }
-- if ( (uintptr_t)end & cacheline_mask )
-- {
-- end = (void *)((uintptr_t)end & ~cacheline_mask);
-- asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (end));
-- }
-
-- for ( ; p < end; p += dcache_line_bytes )
-+ for ( ; size >= dcache_line_bytes;
-+ p += dcache_line_bytes, size -= dcache_line_bytes )
- asm volatile (__invalidate_dcache_one(0) : : "r" (p));
-
-+ if ( size > 0 )
-+ asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
-+
- dsb(sy); /* So we know the flushes happen before continuing */
-
- return 0;
-@@ -187,10 +186,14 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
-
- static inline int clean_dcache_va_range(const void *p, unsigned long size)
- {
-- const void *end = p + size;
-+ size_t cacheline_mask = dcache_line_bytes - 1;
-+
- dsb(sy); /* So the CPU issues all writes to the range */
-- p = (void *)((uintptr_t)p & ~(dcache_line_bytes - 1));
-- for ( ; p < end; p += dcache_line_bytes )
-+ size += (uintptr_t)p & cacheline_mask;
-+ size = (size + cacheline_mask) & ~cacheline_mask;
-+ p = (void *)((uintptr_t)p & ~cacheline_mask);
-+ for ( ; size >= dcache_line_bytes;
-+ p += dcache_line_bytes, size -= dcache_line_bytes )
- asm volatile (__clean_dcache_one(0) : : "r" (p));
- dsb(sy); /* So we know the flushes happen before continuing */
- /* ARM callers assume that dcache_* functions cannot fail. */
-@@ -200,10 +203,14 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
- static inline int clean_and_invalidate_dcache_va_range
- (const void *p, unsigned long size)
- {
-- const void *end = p + size;
-+ size_t cacheline_mask = dcache_line_bytes - 1;
-+
- dsb(sy); /* So the CPU issues all writes to the range */
-- p = (void *)((uintptr_t)p & ~(dcache_line_bytes - 1));
-- for ( ; p < end; p += dcache_line_bytes )
-+ size += (uintptr_t)p & cacheline_mask;
-+ size = (size + cacheline_mask) & ~cacheline_mask;
-+ p = (void *)((uintptr_t)p & ~cacheline_mask);
-+ for ( ; size >= dcache_line_bytes;
-+ p += dcache_line_bytes, size -= dcache_line_bytes )
- asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
- dsb(sy); /* So we know the flushes happen before continuing */
- /* ARM callers assume that dcache_* functions cannot fail. */
---
-2.42.0
-
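The wrap-around described above is easy to reproduce in isolation. The snippet below models a 32-bit virtual address space with uint32_t arithmetic (the values are made up and none of this is Xen code): an end-pointer loop does nothing for a range touching the top of the address space, while counting down the remaining size, as the fixed routines now do, visits every line.

    #include <stdint.h>
    #include <stdio.h>

    #define LINE 64u   /* model cache line size */

    int main(void)
    {
        /* A line-aligned 4 KiB range ending exactly at the top of a
         * 32-bit virtual address space. */
        uint32_t p = 0xfffff000u, size = 0x1000u;

        /* Old style: 'end' wraps to 0, so 'cur < end' is never true. */
        uint32_t end = p + size;         /* 0xfffff000 + 0x1000 wraps to 0 */
        unsigned int flushed_old = 0;
        for ( uint32_t cur = p; cur < end; cur += LINE )
            flushed_old++;

        /* Fixed style: loop on the remaining size, not an end pointer. */
        unsigned int flushed_new = 0;
        for ( uint32_t left = size; left >= LINE; p += LINE, left -= LINE )
            flushed_new++;

        printf("old loop: %u lines, fixed loop: %u lines\n",
               flushed_old, flushed_new); /* prints: 0 vs 64 */
        return 0;
    }
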
diff --git a/0030-x86-AMD-extend-Zenbleed-check-to-models-good-ucode-i.patch b/0030-x86-AMD-extend-Zenbleed-check-to-models-good-ucode-i.patch
deleted file mode 100644
index 4581d03..0000000
--- a/0030-x86-AMD-extend-Zenbleed-check-to-models-good-ucode-i.patch
+++ /dev/null
@@ -1,48 +0,0 @@
-From d2d2dcae879c6cc05227c9620f0a772f35fe6886 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Wed, 23 Aug 2023 09:26:36 +0200
-Subject: [PATCH 30/55] x86/AMD: extend Zenbleed check to models "good" ucode
- isn't known for
-
-Reportedly the AMD Custom APU 0405 found on the SteamDeck, models 0x90
-and 0x91 (quoting the respective Linux commit), is similarly affected. Put
-another instance of our Zen1 vs Zen2 distinction checks in
-amd_check_zenbleed(), forcing use of the chickenbit irrespective of
-ucode version (building upon real hardware never surfacing a version of
-0xffffffff).
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit 145a69c0944ac70cfcf9d247c85dee9e99d9d302)
----
- xen/arch/x86/cpu/amd.c | 13 ++++++++++---
- 1 file changed, 10 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
-index 3ea214fc2e..1bb3044be1 100644
---- a/xen/arch/x86/cpu/amd.c
-+++ b/xen/arch/x86/cpu/amd.c
-@@ -909,10 +909,17 @@ void amd_check_zenbleed(void)
- case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
- default:
- /*
-- * With the Fam17h check above, parts getting here are Zen1.
-- * They're not affected.
-+ * With the Fam17h check above, most parts getting here are
-+ * Zen1. They're not affected. Assume Zen2 ones making it
-+ * here are affected regardless of microcode version.
-+ *
-+ * Zen1 vs Zen2 isn't a simple model number comparison, so use
-+ * STIBP as a heuristic to distinguish.
- */
-- return;
-+ if (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
-+ return;
-+ good_rev = ~0U;
-+ break;
- }
-
- rdmsrl(MSR_AMD64_DE_CFG, val);
---
-2.42.0
-
diff --git a/0031-x86-spec-ctrl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch b/0031-x86-spec-ctrl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch
deleted file mode 100644
index 10417ae..0000000
--- a/0031-x86-spec-ctrl-Fix-confusion-between-SPEC_CTRL_EXIT_T.patch
+++ /dev/null
@@ -1,74 +0,0 @@
-From dc28aba565f226f9bec24cfde993e78478acfb4e Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 12 Sep 2023 15:06:49 +0100
-Subject: [PATCH 31/55] x86/spec-ctrl: Fix confusion between
- SPEC_CTRL_EXIT_TO_XEN{,_IST}
-
-c/s 3fffaf9c13e9 ("x86/entry: Avoid using alternatives in NMI/#MC paths")
-dropped the only user, leaving behind the (incorrect) implication that Xen had
-split exit paths.
-
-Delete the unused SPEC_CTRL_EXIT_TO_XEN and rename SPEC_CTRL_EXIT_TO_XEN_IST
-to SPEC_CTRL_EXIT_TO_XEN for consistency.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 1c18d73774533a55ba9d1cbee8bdace03efdb5e7)
----
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 10 ++--------
- xen/arch/x86/x86_64/entry.S | 2 +-
- 2 files changed, 3 insertions(+), 9 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index f23bb105c5..e8fd01243c 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -79,7 +79,6 @@
- * - SPEC_CTRL_ENTRY_FROM_PV
- * - SPEC_CTRL_ENTRY_FROM_INTR
- * - SPEC_CTRL_ENTRY_FROM_INTR_IST
-- * - SPEC_CTRL_EXIT_TO_XEN_IST
- * - SPEC_CTRL_EXIT_TO_XEN
- * - SPEC_CTRL_EXIT_TO_PV
- *
-@@ -268,11 +267,6 @@
- ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
- X86_FEATURE_SC_MSR_PV
-
--/* Use when exiting to Xen context. */
--#define SPEC_CTRL_EXIT_TO_XEN \
-- ALTERNATIVE "", \
-- DO_SPEC_CTRL_EXIT_TO_XEN, X86_FEATURE_SC_MSR_PV
--
- /* Use when exiting to PV guest context. */
- #define SPEC_CTRL_EXIT_TO_PV \
- ALTERNATIVE "", \
-@@ -339,8 +333,8 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- UNLIKELY_END(\@_serialise)
- .endm
-
--/* Use when exiting to Xen in IST context. */
--.macro SPEC_CTRL_EXIT_TO_XEN_IST
-+/* Use when exiting to Xen context. */
-+.macro SPEC_CTRL_EXIT_TO_XEN
- /*
- * Requires %rbx=stack_end
- * Clobbers %rax, %rcx, %rdx
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index 7675a59ff0..b45a09823a 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -673,7 +673,7 @@ UNLIKELY_START(ne, exit_cr3)
- UNLIKELY_END(exit_cr3)
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-- SPEC_CTRL_EXIT_TO_XEN_IST /* Req: %rbx=end, Clob: acd */
-+ SPEC_CTRL_EXIT_TO_XEN /* Req: %rbx=end, Clob: acd */
-
- RESTORE_ALL adj=8
- iretq
---
-2.42.0
-
diff --git a/0032-x86-spec-ctrl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch b/0032-x86-spec-ctrl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch
deleted file mode 100644
index a0c83da..0000000
--- a/0032-x86-spec-ctrl-Fold-DO_SPEC_CTRL_EXIT_TO_XEN-into-it-.patch
+++ /dev/null
@@ -1,85 +0,0 @@
-From 84690fb82c4f4aecb72a6789d8994efa74841e09 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 12 Sep 2023 17:03:16 +0100
-Subject: [PATCH 32/55] x86/spec-ctrl: Fold DO_SPEC_CTRL_EXIT_TO_XEN into its
- single user
-
-With the SPEC_CTRL_EXIT_TO_XEN{,_IST} confusion fixed, it's now obvious that
-there's only a single EXIT_TO_XEN path. Fold DO_SPEC_CTRL_EXIT_TO_XEN into
-SPEC_CTRL_EXIT_TO_XEN to simplify further fixes.
-
-When merging labels, switch the name to .L\@_skip_sc_msr as "skip" on its own
-is going to be too generic shortly.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 694bb0f280fd08a4377e36e32b84b5062def4de2)
----
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 40 ++++++++++--------------
- 1 file changed, 16 insertions(+), 24 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index e8fd01243c..d5f65d80ea 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -211,27 +211,6 @@
- wrmsr
- .endm
-
--.macro DO_SPEC_CTRL_EXIT_TO_XEN
--/*
-- * Requires %rbx=stack_end
-- * Clobbers %rax, %rcx, %rdx
-- *
-- * When returning to Xen context, look to see whether SPEC_CTRL shadowing is
-- * in effect, and reload the shadow value. This covers race conditions which
-- * exist with an NMI/MCE/etc hitting late in the return-to-guest path.
-- */
-- xor %edx, %edx
--
-- testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-- jz .L\@_skip
--
-- mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
-- mov $MSR_SPEC_CTRL, %ecx
-- wrmsr
--
--.L\@_skip:
--.endm
--
- .macro DO_SPEC_CTRL_EXIT_TO_GUEST
- /*
- * Requires %eax=spec_ctrl, %rsp=regs/cpuinfo
-@@ -340,11 +319,24 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- * Clobbers %rax, %rcx, %rdx
- */
- testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-- jz .L\@_skip
-+ jz .L\@_skip_sc_msr
-
-- DO_SPEC_CTRL_EXIT_TO_XEN
-+ /*
-+ * When returning to Xen context, look to see whether SPEC_CTRL shadowing
-+ * is in effect, and reload the shadow value. This covers race conditions
-+ * which exist with an NMI/MCE/etc hitting late in the return-to-guest
-+ * path.
-+ */
-+ xor %edx, %edx
-
--.L\@_skip:
-+ testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-+ jz .L\@_skip_sc_msr
-+
-+ mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
-+ mov $MSR_SPEC_CTRL, %ecx
-+ wrmsr
-+
-+.L\@_skip_sc_msr:
- .endm
-
- #endif /* __ASSEMBLY__ */
---
-2.42.0
-
diff --git a/0033-x86-spec-ctrl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch b/0033-x86-spec-ctrl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch
deleted file mode 100644
index a278c5f..0000000
--- a/0033-x86-spec-ctrl-Turn-the-remaining-SPEC_CTRL_-ENTRY-EX.patch
+++ /dev/null
@@ -1,83 +0,0 @@
-From 3952c73bdbd05f0e666986fce633a591237b3c88 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 1 Sep 2023 11:38:44 +0100
-Subject: [PATCH 33/55] x86/spec-ctrl: Turn the remaining
- SPEC_CTRL_{ENTRY,EXIT}_* into asm macros
-
-These have grown more complex over time, with some already having been
-converted.
-
-Provide full Requires/Clobbers comments, otherwise missing at this level of
-indirection.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 7125429aafb9e3c9c88fc93001fc2300e0ac2cc8)
----
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 37 ++++++++++++++++++------
- 1 file changed, 28 insertions(+), 9 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index d5f65d80ea..c6d5f2ad01 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -231,26 +231,45 @@
- .endm
-
- /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
--#define SPEC_CTRL_ENTRY_FROM_PV \
-+.macro SPEC_CTRL_ENTRY_FROM_PV
-+/*
-+ * Requires %rsp=regs/cpuinfo, %rdx=0
-+ * Clobbers %rax, %rcx, %rdx
-+ */
- ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0), \
-- X86_FEATURE_IBPB_ENTRY_PV; \
-- ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
-+ X86_FEATURE_IBPB_ENTRY_PV
-+
-+ ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV
-+
- ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0), \
- X86_FEATURE_SC_MSR_PV
-+.endm
-
- /* Use in interrupt/exception context. May interrupt Xen or PV context. */
--#define SPEC_CTRL_ENTRY_FROM_INTR \
-+.macro SPEC_CTRL_ENTRY_FROM_INTR
-+/*
-+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
-+ * Clobbers %rax, %rcx, %rdx
-+ */
- ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1), \
-- X86_FEATURE_IBPB_ENTRY_PV; \
-- ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV; \
-+ X86_FEATURE_IBPB_ENTRY_PV
-+
-+ ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV
-+
- ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1), \
- X86_FEATURE_SC_MSR_PV
-+.endm
-
- /* Use when exiting to PV guest context. */
--#define SPEC_CTRL_EXIT_TO_PV \
-- ALTERNATIVE "", \
-- DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV; \
-+.macro SPEC_CTRL_EXIT_TO_PV
-+/*
-+ * Requires %rax=spec_ctrl, %rsp=regs/info
-+ * Clobbers %rcx, %rdx
-+ */
-+ ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
-+
- DO_SPEC_CTRL_COND_VERW
-+.endm
-
- /*
- * Use in IST interrupt/exception context. May interrupt Xen or PV context.
---
-2.42.0
-
diff --git a/0034-x86-spec-ctrl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch b/0034-x86-spec-ctrl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch
deleted file mode 100644
index f360cbd..0000000
--- a/0034-x86-spec-ctrl-Improve-all-SPEC_CTRL_-ENTER-EXIT-_-co.patch
+++ /dev/null
@@ -1,106 +0,0 @@
-From ba023e93d0b1e60b80251bf080bab694efb9f8e3 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 30 Aug 2023 20:11:50 +0100
-Subject: [PATCH 34/55] x86/spec-ctrl: Improve all SPEC_CTRL_{ENTER,EXIT}_*
- comments
-
-... to better explain how they're used.
-
-Doing so highlights that SPEC_CTRL_EXIT_TO_XEN is missing a VERW flush for the
-corner case when e.g. an NMI hits late in an exit-to-guest path.
-
-Leave a TODO, which will be addressed in subsequent patches which arrange for
-VERW flushing to be safe within SPEC_CTRL_EXIT_TO_XEN.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 45f00557350dc7d0756551069803fc49c29184ca)
----
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 36 ++++++++++++++++++++----
- 1 file changed, 31 insertions(+), 5 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index c6d5f2ad01..97c4db31cd 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -230,7 +230,10 @@
- wrmsr
- .endm
-
--/* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
-+/*
-+ * Used after an entry from PV context: SYSCALL, SYSENTER, INT,
-+ * etc. There is always a guest speculation state in context.
-+ */
- .macro SPEC_CTRL_ENTRY_FROM_PV
- /*
- * Requires %rsp=regs/cpuinfo, %rdx=0
-@@ -245,7 +248,11 @@
- X86_FEATURE_SC_MSR_PV
- .endm
-
--/* Use in interrupt/exception context. May interrupt Xen or PV context. */
-+/*
-+ * Used after an exception or maskable interrupt, hitting Xen or PV context.
-+ * There will either be a guest speculation context, or (barring fatal
-+ * exceptions) a well-formed Xen speculation context.
-+ */
- .macro SPEC_CTRL_ENTRY_FROM_INTR
- /*
- * Requires %rsp=regs, %r14=stack_end, %rdx=0
-@@ -260,7 +267,10 @@
- X86_FEATURE_SC_MSR_PV
- .endm
-
--/* Use when exiting to PV guest context. */
-+/*
-+ * Used when exiting from any entry context, back to PV context. This
-+ * includes from an IST entry which moved onto the primary stack.
-+ */
- .macro SPEC_CTRL_EXIT_TO_PV
- /*
- * Requires %rax=spec_ctrl, %rsp=regs/info
-@@ -272,7 +282,13 @@
- .endm
-
- /*
-- * Use in IST interrupt/exception context. May interrupt Xen or PV context.
-+ * Used after an IST entry hitting Xen or PV context. Special care is needed,
-+ * because when hitting Xen context, there may not be a well-formed
-+ * speculation context. (i.e. it can hit in the middle of
-+ * SPEC_CTRL_{ENTRY,EXIT}_* regions.)
-+ *
-+ * An IST entry which hits PV context moves onto the primary stack and leaves
-+ * via SPEC_CTRL_EXIT_TO_PV, *not* SPEC_CTRL_EXIT_TO_XEN.
- */
- .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
- /*
-@@ -331,7 +347,14 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- UNLIKELY_END(\@_serialise)
- .endm
-
--/* Use when exiting to Xen context. */
-+/*
-+ * Use when exiting from any entry context, back to Xen context. This
-+ * includes returning to other SPEC_CTRL_{ENTRY,EXIT}_* regions with an
-+ * incomplete speculation context.
-+ *
-+ * Because we might have interrupted Xen beyond SPEC_CTRL_EXIT_TO_$GUEST, we
-+ * need to treat this as if it were an EXIT_TO_$GUEST case too.
-+ */
- .macro SPEC_CTRL_EXIT_TO_XEN
- /*
- * Requires %rbx=stack_end
-@@ -356,6 +379,9 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- wrmsr
-
- .L\@_skip_sc_msr:
-+
-+ /* TODO VERW */
-+
- .endm
-
- #endif /* __ASSEMBLY__ */
---
-2.42.0
-
diff --git a/0035-x86-entry-Adjust-restore_all_xen-to-hold-stack_end-i.patch b/0035-x86-entry-Adjust-restore_all_xen-to-hold-stack_end-i.patch
deleted file mode 100644
index fe2acaf..0000000
--- a/0035-x86-entry-Adjust-restore_all_xen-to-hold-stack_end-i.patch
+++ /dev/null
@@ -1,74 +0,0 @@
-From 5f7efd47c8273fde972637d0360851802f76eca9 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 13 Sep 2023 13:48:16 +0100
-Subject: [PATCH 35/55] x86/entry: Adjust restore_all_xen to hold stack_end in
- %r14
-
-All other SPEC_CTRL_{ENTRY,EXIT}_* helpers hold stack_end in %r14. Adjust it
-for consistency.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 7aa28849a1155d856e214e9a80a7e65fffdc3e58)
----
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 8 ++++----
- xen/arch/x86/x86_64/entry.S | 8 ++++----
- 2 files changed, 8 insertions(+), 8 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index 97c4db31cd..66c706496f 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -357,10 +357,10 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- */
- .macro SPEC_CTRL_EXIT_TO_XEN
- /*
-- * Requires %rbx=stack_end
-+ * Requires %r14=stack_end
- * Clobbers %rax, %rcx, %rdx
- */
-- testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-+ testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
- jz .L\@_skip_sc_msr
-
- /*
-@@ -371,10 +371,10 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- */
- xor %edx, %edx
-
-- testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
-+ testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
- jz .L\@_skip_sc_msr
-
-- mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%rbx), %eax
-+ mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%r14), %eax
- mov $MSR_SPEC_CTRL, %ecx
- wrmsr
-
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index b45a09823a..92279a225d 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -665,15 +665,15 @@ restore_all_xen:
- * Check whether we need to switch to the per-CPU page tables, in
- * case we return to late PV exit code (from an NMI or #MC).
- */
-- GET_STACK_END(bx)
-- cmpb $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%rbx)
-+ GET_STACK_END(14)
-+ cmpb $0, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
- UNLIKELY_START(ne, exit_cr3)
-- mov STACK_CPUINFO_FIELD(pv_cr3)(%rbx), %rax
-+ mov STACK_CPUINFO_FIELD(pv_cr3)(%r14), %rax
- mov %rax, %cr3
- UNLIKELY_END(exit_cr3)
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-- SPEC_CTRL_EXIT_TO_XEN /* Req: %rbx=end, Clob: acd */
-+ SPEC_CTRL_EXIT_TO_XEN /* Req: %r14=end, Clob: acd */
-
- RESTORE_ALL adj=8
- iretq
---
-2.42.0
-
diff --git a/0036-x86-entry-Track-the-IST-ness-of-an-entry-for-the-exi.patch b/0036-x86-entry-Track-the-IST-ness-of-an-entry-for-the-exi.patch
deleted file mode 100644
index ba7ea21..0000000
--- a/0036-x86-entry-Track-the-IST-ness-of-an-entry-for-the-exi.patch
+++ /dev/null
@@ -1,109 +0,0 @@
-From e4a71bc0da0baf7464bb0d8e33053f330e5ea366 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 13 Sep 2023 12:20:12 +0100
-Subject: [PATCH 36/55] x86/entry: Track the IST-ness of an entry for the exit
- paths
-
-Use %r12 to hold an ist_exit boolean. This register is zero elsewhere in the
-entry/exit asm, so it only needs setting in the IST path.
-
-As this is subtle and fragile, add check_ist_exit() to be used in debugging
-builds to cross-check that the ist_exit boolean matches the entry vector.
-
-Write check_ist_exit() in C, because it's debug only and the logic is more
-complicated than I care to maintain in asm.
-
-For now, we only need to use this signal in the exit-to-Xen path, but some
-exit-to-guest paths happen in IST context too. Check the correctness in all
-exit paths to avoid the logic bit-rotting.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 21bdc25b05a0f8ab6bc73520a9ca01327360732c)
-
-x86/entry: Partially revert IST-exit checks
-
-The patch adding check_ist_exit() didn't account for the fact that
-reset_stack_and_jump() is not an ABI-preserving boundary. The IST-ness in
-%r12 doesn't survive into the next context, and is a stale value.
-
-This shows up in Gitlab CI for the Clang build:
-
- https://gitlab.com/xen-project/people/andyhhp/xen/-/jobs/5112783827
-
-and in OSSTest for GCC 8:
-
- http://logs.test-lab.xenproject.org/osstest/logs/183045/test-amd64-amd64-xl-qemuu-debianhvm-amd64/serial-pinot0.log
-
-There's no straightforward way to reconstruct the IST-exit-ness on the
-exit-to-guest path after a context switch. For now, we only need IST-exit on
-the return-to-Xen path.
-
-Fixes: 21bdc25b05a0 ("x86/entry: Track the IST-ness of an entry for the exit paths")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 9b57c800b79b96769ea3dcd6468578fa664d19f9)
----
- xen/arch/x86/traps.c | 13 +++++++++++++
- xen/arch/x86/x86_64/entry.S | 13 ++++++++++++-
- 2 files changed, 25 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
-index d12004b1c6..e65cc60041 100644
---- a/xen/arch/x86/traps.c
-+++ b/xen/arch/x86/traps.c
-@@ -2315,6 +2315,19 @@ void asm_domain_crash_synchronous(unsigned long addr)
- do_softirq();
- }
-
-+#ifdef CONFIG_DEBUG
-+void check_ist_exit(const struct cpu_user_regs *regs, bool ist_exit)
-+{
-+ const unsigned int ist_mask =
-+ (1U << X86_EXC_NMI) | (1U << X86_EXC_DB) |
-+ (1U << X86_EXC_DF) | (1U << X86_EXC_MC);
-+ uint8_t ev = regs->entry_vector;
-+ bool is_ist = (ev < TRAP_nr) && ((1U << ev) & ist_mask);
-+
-+ ASSERT(is_ist == ist_exit);
-+}
-+#endif
-+
- /*
- * Local variables:
- * mode: C
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index 92279a225d..4cebc4fbe3 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -659,8 +659,15 @@ ENTRY(early_page_fault)
- .section .text.entry, "ax", @progbits
-
- ALIGN
--/* No special register assumptions. */
-+/* %r12=ist_exit */
- restore_all_xen:
-+
-+#ifdef CONFIG_DEBUG
-+ mov %rsp, %rdi
-+ mov %r12, %rsi
-+ call check_ist_exit
-+#endif
-+
- /*
- * Check whether we need to switch to the per-CPU page tables, in
- * case we return to late PV exit code (from an NMI or #MC).
-@@ -1091,6 +1098,10 @@ handle_ist_exception:
- .L_ist_dispatch_done:
- mov %r15, STACK_CPUINFO_FIELD(xen_cr3)(%r14)
- mov %bl, STACK_CPUINFO_FIELD(use_pv_cr3)(%r14)
-+
-+ /* This is an IST exit */
-+ mov $1, %r12d
-+
- cmpb $TRAP_nmi,UREGS_entry_vector(%rsp)
- jne ret_from_intr
-
---
-2.42.0
-
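
For reference, the check_ist_exit() cross-check removed above reduces to a bitmask membership test on the entry vector. A standalone sketch (not Xen code; the upper bound stands in for TRAP_nr, assumed to be 32, and vector numbers are the architectural ones) would look like:

    /* Build a mask of the four IST vectors (#DB=1, NMI=2, #DF=8, #MC=18)
     * and test membership with a shift-and-AND, after bounding the vector. */
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define X86_EXC_DB   1
    #define X86_EXC_NMI  2
    #define X86_EXC_DF   8
    #define X86_EXC_MC  18

    static bool vector_uses_ist(uint8_t ev)
    {
        const unsigned int ist_mask =
            (1U << X86_EXC_NMI) | (1U << X86_EXC_DB) |
            (1U << X86_EXC_DF)  | (1U << X86_EXC_MC);

        return ev < 32 && ((1U << ev) & ist_mask);
    }

    int main(void)
    {
        assert(vector_uses_ist(X86_EXC_NMI));
        assert(!vector_uses_ist(14));     /* #PF is not an IST vector */
        return 0;
    }
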
diff --git a/0037-x86-spec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch b/0037-x86-spec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch
deleted file mode 100644
index 6580907..0000000
--- a/0037-x86-spec-ctrl-Issue-VERW-during-IST-exit-to-Xen.patch
+++ /dev/null
@@ -1,89 +0,0 @@
-From 2e2c3efcfc9f183674a8de6ed954ffbe7188b70d Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 13 Sep 2023 13:53:33 +0100
-Subject: [PATCH 37/55] x86/spec-ctrl: Issue VERW during IST exit to Xen
-
-There is a corner case where e.g. an NMI hitting an exit-to-guest path after
-SPEC_CTRL_EXIT_TO_* would have run the entire NMI handler *after* the VERW
-flush to scrub potentially sensitive data from uarch buffers.
-
-In order to compensate, issue VERW when exiting to Xen from an IST entry.
-
-SPEC_CTRL_EXIT_TO_XEN already has two reads of spec_ctrl_flags off the stack,
-and we're about to add a third. Load the field into %ebx, and list the
-register as clobbered.
-
-%r12 has been arranged to be the ist_exit signal, so add this as an input
-dependency and use it to identify when to issue a VERW.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 3ee6066bcd737756b0990d417d94eddc0b0d2585)
----
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 20 +++++++++++++++-----
- xen/arch/x86/x86_64/entry.S | 2 +-
- 2 files changed, 16 insertions(+), 6 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index 66c706496f..28a75796e6 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -357,10 +357,12 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- */
- .macro SPEC_CTRL_EXIT_TO_XEN
- /*
-- * Requires %r14=stack_end
-- * Clobbers %rax, %rcx, %rdx
-+ * Requires %r12=ist_exit, %r14=stack_end
-+ * Clobbers %rax, %rbx, %rcx, %rdx
- */
-- testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
-+ movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
-+
-+ testb $SCF_ist_sc_msr, %bl
- jz .L\@_skip_sc_msr
-
- /*
-@@ -371,7 +373,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- */
- xor %edx, %edx
-
-- testb $SCF_use_shadow, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
-+ testb $SCF_use_shadow, %bl
- jz .L\@_skip_sc_msr
-
- mov STACK_CPUINFO_FIELD(shadow_spec_ctrl)(%r14), %eax
-@@ -380,8 +382,16 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
-
- .L\@_skip_sc_msr:
-
-- /* TODO VERW */
-+ test %r12, %r12
-+ jz .L\@_skip_ist_exit
-+
-+ /* Logically DO_SPEC_CTRL_COND_VERW but without the %rsp=cpuinfo dependency */
-+ testb $SCF_verw, %bl
-+ jz .L\@_skip_verw
-+ verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
-+.L\@_skip_verw:
-
-+.L\@_skip_ist_exit:
- .endm
-
- #endif /* __ASSEMBLY__ */
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index 4cebc4fbe3..c12e011b4d 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -680,7 +680,7 @@ UNLIKELY_START(ne, exit_cr3)
- UNLIKELY_END(exit_cr3)
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-- SPEC_CTRL_EXIT_TO_XEN /* Req: %r14=end, Clob: acd */
-+ SPEC_CTRL_EXIT_TO_XEN /* Req: %r12=ist_exit %r14=end, Clob: abcd */
-
- RESTORE_ALL adj=8
- iretq
---
-2.42.0
-
diff --git a/0038-x86-amd-Introduce-is_zen-1-2-_uarch-predicates.patch b/0038-x86-amd-Introduce-is_zen-1-2-_uarch-predicates.patch
deleted file mode 100644
index 6f2cdcb..0000000
--- a/0038-x86-amd-Introduce-is_zen-1-2-_uarch-predicates.patch
+++ /dev/null
@@ -1,91 +0,0 @@
-From 19ee1e1faa32b79274b3484cb1170a5970f1e602 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 15 Sep 2023 12:13:51 +0100
-Subject: [PATCH 38/55] x86/amd: Introduce is_zen{1,2}_uarch() predicates
-
-We already have 3 cases using STIBP as a Zen1/2 heuristic, and are about to
-introduce a 4th. Wrap the heuristic into a pair of predicates rather than
-opencoding it, and the explanation of the heuristic, at each usage site.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit de1d265001397f308c5c3c5d3ffc30e7ef8c0705)
----
- xen/arch/x86/cpu/amd.c | 18 ++++--------------
- xen/arch/x86/include/asm/amd.h | 11 +++++++++++
- 2 files changed, 15 insertions(+), 14 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
-index 1bb3044be1..e94ba5a0e0 100644
---- a/xen/arch/x86/cpu/amd.c
-+++ b/xen/arch/x86/cpu/amd.c
-@@ -855,15 +855,13 @@ void amd_set_legacy_ssbd(bool enable)
- * non-branch instructions to be ignored. It is to be set unilaterally in
- * newer microcode.
- *
-- * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
-- * simple model number comparison, so use STIBP as a heuristic to separate the
-- * two uarches in Fam17h(AMD)/18h(Hygon).
-+ * This chickenbit is something unrelated on Zen1.
- */
- void amd_init_spectral_chicken(void)
- {
- uint64_t val, chickenbit = 1 << 1;
-
-- if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
-+ if (cpu_has_hypervisor || !is_zen2_uarch())
- return;
-
- if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
-@@ -912,11 +910,8 @@ void amd_check_zenbleed(void)
- * With the Fam17h check above, most parts getting here are
- * Zen1. They're not affected. Assume Zen2 ones making it
- * here are affected regardless of microcode version.
-- *
-- * Zen1 vs Zen2 isn't a simple model number comparison, so use
-- * STIBP as a heuristic to distinguish.
- */
-- if (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
-+ if (is_zen1_uarch())
- return;
- good_rev = ~0U;
- break;
-@@ -1277,12 +1272,7 @@ static int __init cf_check zen2_c6_errata_check(void)
- */
- s_time_t delta;
-
-- /*
-- * Zen1 vs Zen2 isn't a simple model number comparison, so use STIBP as
-- * a heuristic to separate the two uarches in Fam17h.
-- */
-- if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 ||
-- !boot_cpu_has(X86_FEATURE_AMD_STIBP))
-+ if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 || !is_zen2_uarch())
- return 0;
-
- /*
-diff --git a/xen/arch/x86/include/asm/amd.h b/xen/arch/x86/include/asm/amd.h
-index a975d3de26..82324110ab 100644
---- a/xen/arch/x86/include/asm/amd.h
-+++ b/xen/arch/x86/include/asm/amd.h
-@@ -140,6 +140,17 @@
- AMD_MODEL_RANGE(0x11, 0x0, 0x0, 0xff, 0xf), \
- AMD_MODEL_RANGE(0x12, 0x0, 0x0, 0xff, 0xf))
-
-+/*
-+ * The Zen1 and Zen2 microarchitectures are implemented by AMD (Fam17h) and
-+ * Hygon (Fam18h) but without simple model number rules. Instead, use STIBP
-+ * as a heuristic that distinguishes the two.
-+ *
-+ * The caller is required to perform the appropriate vendor/family checks
-+ * first.
-+ */
-+#define is_zen1_uarch() (!boot_cpu_has(X86_FEATURE_AMD_STIBP))
-+#define is_zen2_uarch() boot_cpu_has(X86_FEATURE_AMD_STIBP)
-+
- struct cpuinfo_x86;
- int cpu_has_amd_erratum(const struct cpuinfo_x86 *, int, ...);
-
---
-2.42.0
-
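
The is_zen{1,2}_uarch() predicates removed above deliberately leave the vendor and family checks to the caller. A hedged sketch of the intended call pattern (boot_cpu_data and the X86_VENDOR_* constants are stand-ins for the real Xen definitions):

    /* Illustrative only: narrow to Fam17h (AMD) / Fam18h (Hygon) first,
     * because the STIBP heuristic only separates Zen1 from Zen2 there. */
    static bool affects_zen1_only(void)
    {
        if ( !(boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
            return false;

        if ( boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18 )
            return false;

        /* STIBP not enumerated => Zen1 (or the Hygon equivalent). */
        return is_zen1_uarch();
    }
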
diff --git a/0039-x86-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch b/0039-x86-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch
deleted file mode 100644
index 4b23d12..0000000
--- a/0039-x86-spec-ctrl-Mitigate-the-Zen1-DIV-leakage.patch
+++ /dev/null
@@ -1,228 +0,0 @@
-From 9ac2f49f5fa3a5159409241d4f74fb0d721dd4c5 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 30 Aug 2023 20:24:25 +0100
-Subject: [PATCH 39/55] x86/spec-ctrl: Mitigate the Zen1 DIV leakage
-
-In the Zen1 microarchitecture, there is one divider in the pipeline which
-services uops from both threads. In the case of #DE, the latched result from
-the previous DIV to execute will be forwarded speculatively.
-
-This is an interesting covert channel that allows two threads to communicate
-without any system calls. It also allows userspace to obtain the result of
-the most recent DIV instruction executed (even speculatively) in the core,
-which can be from a higher privilege context.
-
-Scrub the result from the divider by executing a non-faulting divide. This
-needs performing on the exit-to-guest paths, and ist_exit-to-Xen.
-
-Alternatives in IST context is believed safe now that it's done in NMI
-context.
-
-This is XSA-439 / CVE-2023-20588.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit b5926c6ecf05c28ee99c6248c42d691ccbf0c315)
----
- docs/misc/xen-command-line.pandoc | 6 ++-
- xen/arch/x86/hvm/svm/entry.S | 1 +
- xen/arch/x86/include/asm/cpufeatures.h | 2 +-
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 17 +++++++++
- xen/arch/x86/spec_ctrl.c | 48 +++++++++++++++++++++++-
- 5 files changed, 71 insertions(+), 3 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index d9dae740cc..b92c8f969c 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2315,7 +2315,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
- > {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
- > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
- > eager-fpu,l1d-flush,branch-harden,srb-lock,
--> unpriv-mmio,gds-mit}=<bool> ]`
-+> unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
-
- Controls for speculative execution sidechannel mitigations. By default, Xen
- will pick the most appropriate mitigations based on compiled in support,
-@@ -2437,6 +2437,10 @@ has elected not to lock the configuration, Xen will use GDS_CTRL to mitigate
- GDS with. Otherwise, Xen will mitigate by disabling AVX, which blocks the use
- of the AVX2 Gather instructions.
-
-+On all hardware, the `div-scrub=` option can be used to force or prevent Xen
-+from mitigating the DIV-leakage vulnerability. By default, Xen will mitigate
-+DIV-leakage on hardware believed to be vulnerable.
-+
- ### sync_console
- > `= <boolean>`
-
-diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
-index 981cd82e7c..934f12cf5c 100644
---- a/xen/arch/x86/hvm/svm/entry.S
-+++ b/xen/arch/x86/hvm/svm/entry.S
-@@ -74,6 +74,7 @@ __UNLIKELY_END(nsvm_hap)
- 1: /* No Spectre v1 concerns. Execution will hit VMRUN imminently. */
- .endm
- ALTERNATIVE "", svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
-+ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
-
- pop %r15
- pop %r14
-diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
-index da0593de85..c3aad21c3b 100644
---- a/xen/arch/x86/include/asm/cpufeatures.h
-+++ b/xen/arch/x86/include/asm/cpufeatures.h
-@@ -35,7 +35,7 @@ XEN_CPUFEATURE(SC_RSB_HVM, X86_SYNTH(19)) /* RSB overwrite needed for HVM
- XEN_CPUFEATURE(XEN_SELFSNOOP, X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
- XEN_CPUFEATURE(SC_MSR_IDLE, X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
- XEN_CPUFEATURE(XEN_LBR, X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
--/* Bits 23 unused. */
-+XEN_CPUFEATURE(SC_DIV, X86_SYNTH(23)) /* DIV scrub needed */
- XEN_CPUFEATURE(SC_RSB_IDLE, X86_SYNTH(24)) /* RSB overwrite needed for idle. */
- XEN_CPUFEATURE(SC_VERW_IDLE, X86_SYNTH(25)) /* VERW used by Xen for idle */
- XEN_CPUFEATURE(XEN_SHSTK, X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index 28a75796e6..f4b8b9d956 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -177,6 +177,19 @@
- .L\@_verw_skip:
- .endm
-
-+.macro DO_SPEC_CTRL_DIV
-+/*
-+ * Requires nothing
-+ * Clobbers %rax
-+ *
-+ * Issue a DIV for its flushing side effect (Zen1 uarch specific). Any
-+ * non-faulting DIV will do; a byte DIV has least latency, and doesn't clobber
-+ * %rdx.
-+ */
-+ mov $1, %eax
-+ div %al
-+.endm
-+
- .macro DO_SPEC_CTRL_ENTRY maybexen:req
- /*
- * Requires %rsp=regs (also cpuinfo if !maybexen)
-@@ -279,6 +292,8 @@
- ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
-
- DO_SPEC_CTRL_COND_VERW
-+
-+ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
- .endm
-
- /*
-@@ -391,6 +406,8 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
- .L\@_skip_verw:
-
-+ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
-+
- .L\@_skip_ist_exit:
- .endm
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 79b98f0fe7..0ff3c895ac 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -79,6 +79,7 @@ static int8_t __initdata opt_srb_lock = -1;
- static bool __initdata opt_unpriv_mmio;
- static bool __ro_after_init opt_fb_clear_mmio;
- static int8_t __initdata opt_gds_mit = -1;
-+static int8_t __initdata opt_div_scrub = -1;
-
- static int __init cf_check parse_spec_ctrl(const char *s)
- {
-@@ -133,6 +134,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- opt_srb_lock = 0;
- opt_unpriv_mmio = false;
- opt_gds_mit = 0;
-+ opt_div_scrub = 0;
- }
- else if ( val > 0 )
- rc = -EINVAL;
-@@ -285,6 +287,8 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- opt_unpriv_mmio = val;
- else if ( (val = parse_boolean("gds-mit", s, ss)) >= 0 )
- opt_gds_mit = val;
-+ else if ( (val = parse_boolean("div-scrub", s, ss)) >= 0 )
-+ opt_div_scrub = val;
- else
- rc = -EINVAL;
-
-@@ -485,7 +489,7 @@ static void __init print_details(enum ind_thunk thunk)
- "\n");
-
- /* Settings for Xen's protection, irrespective of guests. */
-- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s\n",
-+ printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
- thunk == THUNK_NONE ? "N/A" :
- thunk == THUNK_RETPOLINE ? "RETPOLINE" :
- thunk == THUNK_LFENCE ? "LFENCE" :
-@@ -510,6 +514,7 @@ static void __init print_details(enum ind_thunk thunk)
- opt_l1d_flush ? " L1D_FLUSH" : "",
- opt_md_clear_pv || opt_md_clear_hvm ||
- opt_fb_clear_mmio ? " VERW" : "",
-+ opt_div_scrub ? " DIV" : "",
- opt_branch_harden ? " BRANCH_HARDEN" : "");
-
- /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
-@@ -967,6 +972,45 @@ static void __init srso_calculations(bool hw_smt_enabled)
- setup_force_cpu_cap(X86_FEATURE_SRSO_NO);
- }
-
-+/*
-+ * The Div leakage issue is specific to the AMD Zen1 microarchitecture.
-+ *
-+ * However, there's no $FOO_NO bit defined, so if we're virtualised we have no
-+ * hope of spotting the case where we might move to vulnerable hardware. We
-+ * also can't make any useful conclusion about SMT-ness.
-+ *
-+ * Don't check the hypervisor bit, so at least we do the safe thing when
-+ * booting on something that looks like a Zen1 CPU.
-+ */
-+static bool __init has_div_vuln(void)
-+{
-+ if ( !(boot_cpu_data.x86_vendor &
-+ (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
-+ return false;
-+
-+ if ( boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18 )
-+ return false;
-+
-+ return is_zen1_uarch();
-+}
-+
-+static void __init div_calculations(bool hw_smt_enabled)
-+{
-+ bool cpu_bug_div = has_div_vuln();
-+
-+ if ( opt_div_scrub == -1 )
-+ opt_div_scrub = cpu_bug_div;
-+
-+ if ( opt_div_scrub )
-+ setup_force_cpu_cap(X86_FEATURE_SC_DIV);
-+
-+ if ( opt_smt == -1 && !cpu_has_hypervisor && cpu_bug_div && hw_smt_enabled )
-+ warning_add(
-+ "Booted on leaky-DIV hardware with SMT/Hyperthreading\n"
-+ "enabled. Please assess your configuration and choose an\n"
-+ "explicit 'smt=<bool>' setting. See XSA-439.\n");
-+}
-+
- static void __init ibpb_calculations(void)
- {
- bool def_ibpb_entry = false;
-@@ -1726,6 +1770,8 @@ void __init init_speculation_mitigations(void)
-
- ibpb_calculations();
-
-+ div_calculations(hw_smt_enabled);
-+
- /* Check whether Eager FPU should be enabled by default. */
- if ( opt_eager_fpu == -1 )
- opt_eager_fpu = should_use_eager_fpu();
---
-2.42.0
-
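
The scrubbing primitive itself is tiny: any non-faulting DIV overwrites the latched result in the Zen1 divider. A minimal C sketch of what DO_SPEC_CTRL_DIV amounts to (not the Xen implementation, which is wired in through the alternatives framework via X86_FEATURE_SC_DIV, as the hunks above show):

    static inline void div_scrub(void)
    {
        unsigned int eax = 1;

        /*
         * 1 / 1 in byte form: guaranteed not to fault, lowest latency,
         * and a byte divide leaves %rdx untouched.  The result is junk
         * and intentionally discarded.
         */
        asm volatile ( "div %%al" : "+a" (eax) :: "cc" );
    }
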
diff --git a/0040-x86-shadow-defer-releasing-of-PV-s-top-level-shadow-.patch b/0040-x86-shadow-defer-releasing-of-PV-s-top-level-shadow-.patch
deleted file mode 100644
index 21fb16f..0000000
--- a/0040-x86-shadow-defer-releasing-of-PV-s-top-level-shadow-.patch
+++ /dev/null
@@ -1,455 +0,0 @@
-From 90c540c58985dc774cf0a1d2dc423473d3f37267 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <JBeulich@suse.com>
-Date: Wed, 20 Sep 2023 10:33:26 +0100
-Subject: [PATCH 40/55] x86/shadow: defer releasing of PV's top-level shadow
- reference
-
-sh_set_toplevel_shadow() re-pinning the top-level shadow we may be
-running on is not enough (and at the same time unnecessary when the
-shadow isn't what we're running on): That shadow becomes eligible for
-blowing away (from e.g. shadow_prealloc()) immediately after the
-paging lock was dropped. Yet it needs to remain valid until the actual
-page table switch occurred.
-
-Propagate up the call chain the shadow entry that needs releasing
-eventually, and carry out the release immediately after switching page
-tables. Handle update_cr3() failures by switching to idle pagetables.
-Note that various further uses of update_cr3() are HVM-only or only act
-on paused vCPU-s, in which case sh_set_toplevel_shadow() will not defer
-releasing of the reference.
-
-While changing the update_cr3() hook, also convert the "do_locking"
-parameter to boolean.
-
-This is CVE-2023-34322 / XSA-438.
-
-Reported-by: Tim Deegan <tim@xen.org>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: George Dunlap <george.dunlap@cloud.com>
-(cherry picked from commit fb0ff49fe9f784bfee0370c2a3c5f20e39d7a1cb)
----
- xen/arch/x86/include/asm/mm.h | 2 +-
- xen/arch/x86/include/asm/paging.h | 6 ++--
- xen/arch/x86/include/asm/shadow.h | 8 +++++
- xen/arch/x86/mm.c | 27 +++++++++++----
- xen/arch/x86/mm/hap/hap.c | 6 ++--
- xen/arch/x86/mm/shadow/common.c | 55 ++++++++++++++++++++-----------
- xen/arch/x86/mm/shadow/multi.c | 33 ++++++++++++-------
- xen/arch/x86/mm/shadow/none.c | 4 ++-
- xen/arch/x86/mm/shadow/private.h | 14 ++++----
- xen/arch/x86/pv/domain.c | 25 ++++++++++++--
- 10 files changed, 127 insertions(+), 53 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
-index d723c7c38f..a5d7fdd32e 100644
---- a/xen/arch/x86/include/asm/mm.h
-+++ b/xen/arch/x86/include/asm/mm.h
-@@ -552,7 +552,7 @@ void audit_domains(void);
- #endif
-
- void make_cr3(struct vcpu *v, mfn_t mfn);
--void update_cr3(struct vcpu *v);
-+pagetable_t update_cr3(struct vcpu *v);
- int vcpu_destroy_pagetables(struct vcpu *);
- void *do_page_walk(struct vcpu *v, unsigned long addr);
-
-diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
-index 6f7000d5f4..94c590f31a 100644
---- a/xen/arch/x86/include/asm/paging.h
-+++ b/xen/arch/x86/include/asm/paging.h
-@@ -138,7 +138,7 @@ struct paging_mode {
- paddr_t ga, uint32_t *pfec,
- unsigned int *page_order);
- #endif
-- void (*update_cr3 )(struct vcpu *v, int do_locking,
-+ pagetable_t (*update_cr3 )(struct vcpu *v, bool do_locking,
- bool noflush);
- void (*update_paging_modes )(struct vcpu *v);
- bool (*flush_tlb )(const unsigned long *vcpu_bitmap);
-@@ -310,9 +310,9 @@ static inline unsigned long paging_ga_to_gfn_cr3(struct vcpu *v,
- /* Update all the things that are derived from the guest's CR3.
- * Called when the guest changes CR3; the caller can then use v->arch.cr3
- * as the value to load into the host CR3 to schedule this vcpu */
--static inline void paging_update_cr3(struct vcpu *v, bool noflush)
-+static inline pagetable_t paging_update_cr3(struct vcpu *v, bool noflush)
- {
-- paging_get_hostmode(v)->update_cr3(v, 1, noflush);
-+ return paging_get_hostmode(v)->update_cr3(v, 1, noflush);
- }
-
- /* Update all the things that are derived from the guest's CR0/CR3/CR4.
-diff --git a/xen/arch/x86/include/asm/shadow.h b/xen/arch/x86/include/asm/shadow.h
-index dad876d294..0b72c9eda8 100644
---- a/xen/arch/x86/include/asm/shadow.h
-+++ b/xen/arch/x86/include/asm/shadow.h
-@@ -99,6 +99,9 @@ int shadow_set_allocation(struct domain *d, unsigned int pages,
-
- int shadow_get_allocation_bytes(struct domain *d, uint64_t *size);
-
-+/* Helper to invoke for deferred releasing of a top-level shadow's reference. */
-+void shadow_put_top_level(struct domain *d, pagetable_t old);
-+
- #else /* !CONFIG_SHADOW_PAGING */
-
- #define shadow_vcpu_teardown(v) ASSERT(is_pv_vcpu(v))
-@@ -121,6 +124,11 @@ static inline void shadow_prepare_page_type_change(struct domain *d,
-
- static inline void shadow_blow_tables_per_domain(struct domain *d) {}
-
-+static inline void shadow_put_top_level(struct domain *d, pagetable_t old)
-+{
-+ ASSERT_UNREACHABLE();
-+}
-+
- static inline int shadow_domctl(struct domain *d,
- struct xen_domctl_shadow_op *sc,
- XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index b46eee1332..e884a6fdbd 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -567,15 +567,12 @@ void write_ptbase(struct vcpu *v)
- *
- * Update ref counts to shadow tables appropriately.
- */
--void update_cr3(struct vcpu *v)
-+pagetable_t update_cr3(struct vcpu *v)
- {
- mfn_t cr3_mfn;
-
- if ( paging_mode_enabled(v->domain) )
-- {
-- paging_update_cr3(v, false);
-- return;
-- }
-+ return paging_update_cr3(v, false);
-
- if ( !(v->arch.flags & TF_kernel_mode) )
- cr3_mfn = pagetable_get_mfn(v->arch.guest_table_user);
-@@ -583,6 +580,8 @@ void update_cr3(struct vcpu *v)
- cr3_mfn = pagetable_get_mfn(v->arch.guest_table);
-
- make_cr3(v, cr3_mfn);
-+
-+ return pagetable_null();
- }
-
- static inline void set_tlbflush_timestamp(struct page_info *page)
-@@ -3285,6 +3284,7 @@ int new_guest_cr3(mfn_t mfn)
- struct domain *d = curr->domain;
- int rc;
- mfn_t old_base_mfn;
-+ pagetable_t old_shadow;
-
- if ( is_pv_32bit_domain(d) )
- {
-@@ -3352,9 +3352,22 @@ int new_guest_cr3(mfn_t mfn)
- if ( !VM_ASSIST(d, m2p_strict) )
- fill_ro_mpt(mfn);
- curr->arch.guest_table = pagetable_from_mfn(mfn);
-- update_cr3(curr);
-+ old_shadow = update_cr3(curr);
-+
-+ /*
-+ * In shadow mode update_cr3() can fail, in which case here we're still
-+ * running on the prior top-level shadow (which we're about to release).
-+ * Switch to the idle page tables in such an event; the guest will have
-+ * been crashed already.
-+ */
-+ if ( likely(!mfn_eq(pagetable_get_mfn(old_shadow),
-+ maddr_to_mfn(curr->arch.cr3 & ~X86_CR3_NOFLUSH))) )
-+ write_ptbase(curr);
-+ else
-+ write_ptbase(idle_vcpu[curr->processor]);
-
-- write_ptbase(curr);
-+ if ( !pagetable_is_null(old_shadow) )
-+ shadow_put_top_level(d, old_shadow);
-
- if ( likely(mfn_x(old_base_mfn) != 0) )
- {
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index 0fc1b1d9ac..57a19c3d59 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -739,11 +739,13 @@ static bool cf_check hap_invlpg(struct vcpu *v, unsigned long linear)
- return 1;
- }
-
--static void cf_check hap_update_cr3(
-- struct vcpu *v, int do_locking, bool noflush)
-+static pagetable_t cf_check hap_update_cr3(
-+ struct vcpu *v, bool do_locking, bool noflush)
- {
- v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3];
- hvm_update_guest_cr3(v, noflush);
-+
-+ return pagetable_null();
- }
-
- static bool flush_vcpu(const struct vcpu *v, const unsigned long *vcpu_bitmap)
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index cf5e181f74..c0940f939e 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -2590,13 +2590,13 @@ void cf_check shadow_update_paging_modes(struct vcpu *v)
- }
-
- /* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
--void sh_set_toplevel_shadow(struct vcpu *v,
-- unsigned int slot,
-- mfn_t gmfn,
-- unsigned int root_type,
-- mfn_t (*make_shadow)(struct vcpu *v,
-- mfn_t gmfn,
-- uint32_t shadow_type))
-+pagetable_t sh_set_toplevel_shadow(struct vcpu *v,
-+ unsigned int slot,
-+ mfn_t gmfn,
-+ unsigned int root_type,
-+ mfn_t (*make_shadow)(struct vcpu *v,
-+ mfn_t gmfn,
-+ uint32_t shadow_type))
- {
- mfn_t smfn;
- pagetable_t old_entry, new_entry;
-@@ -2653,20 +2653,37 @@ void sh_set_toplevel_shadow(struct vcpu *v,
- mfn_x(gmfn), mfn_x(pagetable_get_mfn(new_entry)));
- v->arch.paging.shadow.shadow_table[slot] = new_entry;
-
-- /* Decrement the refcount of the old contents of this slot */
-- if ( !pagetable_is_null(old_entry) )
-+ /*
-+ * Decrement the refcount of the old contents of this slot, unless
-+ * we're still running on that shadow - in that case it'll need holding
-+ * on to until the actual page table switch did occur.
-+ */
-+ if ( !pagetable_is_null(old_entry) && (v != current || !is_pv_domain(d)) )
- {
-- mfn_t old_smfn = pagetable_get_mfn(old_entry);
-- /* Need to repin the old toplevel shadow if it's been unpinned
-- * by shadow_prealloc(): in PV mode we're still running on this
-- * shadow and it's not safe to free it yet. */
-- if ( !mfn_to_page(old_smfn)->u.sh.pinned && !sh_pin(d, old_smfn) )
-- {
-- printk(XENLOG_G_ERR "can't re-pin %"PRI_mfn"\n", mfn_x(old_smfn));
-- domain_crash(d);
-- }
-- sh_put_ref(d, old_smfn, 0);
-+ sh_put_ref(d, pagetable_get_mfn(old_entry), 0);
-+ old_entry = pagetable_null();
- }
-+
-+ /*
-+ * 2- and 3-level shadow mode is used for HVM only. Therefore we never run
-+ * on such a shadow, so only call sites requesting an L4 shadow need to pay
-+ * attention to the returned value.
-+ */
-+ ASSERT(pagetable_is_null(old_entry) || root_type == SH_type_l4_64_shadow);
-+
-+ return old_entry;
-+}
-+
-+/*
-+ * Helper invoked when releasing of a top-level shadow's reference was
-+ * deferred in sh_set_toplevel_shadow() above.
-+ */
-+void shadow_put_top_level(struct domain *d, pagetable_t old_entry)
-+{
-+ ASSERT(!pagetable_is_null(old_entry));
-+ paging_lock(d);
-+ sh_put_ref(d, pagetable_get_mfn(old_entry), 0);
-+ paging_unlock(d);
- }
-
- /**************************************************************************/
-diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
-index 671bf8c228..c92b354a78 100644
---- a/xen/arch/x86/mm/shadow/multi.c
-+++ b/xen/arch/x86/mm/shadow/multi.c
-@@ -3224,7 +3224,8 @@ static void cf_check sh_detach_old_tables(struct vcpu *v)
- }
- }
-
--static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
-+static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
-+ bool noflush)
- /* Updates vcpu->arch.cr3 after the guest has changed CR3.
- * Paravirtual guests should set v->arch.guest_table (and guest_table_user,
- * if appropriate).
-@@ -3238,6 +3239,7 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
- {
- struct domain *d = v->domain;
- mfn_t gmfn;
-+ pagetable_t old_entry = pagetable_null();
- #if GUEST_PAGING_LEVELS == 3
- const guest_l3e_t *gl3e;
- unsigned int i, guest_idx;
-@@ -3247,7 +3249,7 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
- if ( !is_hvm_domain(d) && !v->is_initialised )
- {
- ASSERT(v->arch.cr3 == 0);
-- return;
-+ return old_entry;
- }
-
- if ( do_locking ) paging_lock(v->domain);
-@@ -3320,11 +3322,12 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
- #if GUEST_PAGING_LEVELS == 4
- if ( sh_remove_write_access(d, gmfn, 4, 0) != 0 )
- guest_flush_tlb_mask(d, d->dirty_cpumask);
-- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow, sh_make_shadow);
-+ old_entry = sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l4_shadow,
-+ sh_make_shadow);
- if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
- {
- ASSERT(d->is_dying || d->is_shutting_down);
-- return;
-+ return old_entry;
- }
- if ( !shadow_mode_external(d) && !is_pv_32bit_domain(d) )
- {
-@@ -3368,24 +3371,30 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
- gl2gfn = guest_l3e_get_gfn(gl3e[i]);
- gl2mfn = get_gfn_query_unlocked(d, gfn_x(gl2gfn), &p2mt);
- if ( p2m_is_ram(p2mt) )
-- sh_set_toplevel_shadow(v, i, gl2mfn, SH_type_l2_shadow,
-- sh_make_shadow);
-+ old_entry = sh_set_toplevel_shadow(v, i, gl2mfn,
-+ SH_type_l2_shadow,
-+ sh_make_shadow);
- else
-- sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
-- sh_make_shadow);
-+ old_entry = sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
-+ sh_make_shadow);
- }
- else
-- sh_set_toplevel_shadow(v, i, INVALID_MFN, 0, sh_make_shadow);
-+ old_entry = sh_set_toplevel_shadow(v, i, INVALID_MFN, 0,
-+ sh_make_shadow);
-+
-+ ASSERT(pagetable_is_null(old_entry));
- }
- }
- #elif GUEST_PAGING_LEVELS == 2
- if ( sh_remove_write_access(d, gmfn, 2, 0) != 0 )
- guest_flush_tlb_mask(d, d->dirty_cpumask);
-- sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow, sh_make_shadow);
-+ old_entry = sh_set_toplevel_shadow(v, 0, gmfn, SH_type_l2_shadow,
-+ sh_make_shadow);
-+ ASSERT(pagetable_is_null(old_entry));
- if ( unlikely(pagetable_is_null(v->arch.paging.shadow.shadow_table[0])) )
- {
- ASSERT(d->is_dying || d->is_shutting_down);
-- return;
-+ return old_entry;
- }
- #else
- #error This should never happen
-@@ -3473,6 +3482,8 @@ static void cf_check sh_update_cr3(struct vcpu *v, int do_locking, bool noflush)
-
- /* Release the lock, if we took it (otherwise it's the caller's problem) */
- if ( do_locking ) paging_unlock(v->domain);
-+
-+ return old_entry;
- }
-
-
-diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
-index eaaa874b11..743c0ffb85 100644
---- a/xen/arch/x86/mm/shadow/none.c
-+++ b/xen/arch/x86/mm/shadow/none.c
-@@ -52,9 +52,11 @@ static unsigned long cf_check _gva_to_gfn(
- }
- #endif
-
--static void cf_check _update_cr3(struct vcpu *v, int do_locking, bool noflush)
-+static pagetable_t cf_check _update_cr3(struct vcpu *v, bool do_locking,
-+ bool noflush)
- {
- ASSERT_UNREACHABLE();
-+ return pagetable_null();
- }
-
- static void cf_check _update_paging_modes(struct vcpu *v)
-diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
-index c2bb1ed3c3..91f798c5aa 100644
---- a/xen/arch/x86/mm/shadow/private.h
-+++ b/xen/arch/x86/mm/shadow/private.h
-@@ -391,13 +391,13 @@ mfn_t shadow_alloc(struct domain *d,
- void shadow_free(struct domain *d, mfn_t smfn);
-
- /* Set up the top-level shadow and install it in slot 'slot' of shadow_table */
--void sh_set_toplevel_shadow(struct vcpu *v,
-- unsigned int slot,
-- mfn_t gmfn,
-- unsigned int root_type,
-- mfn_t (*make_shadow)(struct vcpu *v,
-- mfn_t gmfn,
-- uint32_t shadow_type));
-+pagetable_t sh_set_toplevel_shadow(struct vcpu *v,
-+ unsigned int slot,
-+ mfn_t gmfn,
-+ unsigned int root_type,
-+ mfn_t (*make_shadow)(struct vcpu *v,
-+ mfn_t gmfn,
-+ uint32_t shadow_type));
-
- /* Update the shadows in response to a pagetable write from Xen */
- int sh_validate_guest_entry(struct vcpu *v, mfn_t gmfn, void *entry, u32 size);
-diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
-index 5c92812dc6..2a445bb17b 100644
---- a/xen/arch/x86/pv/domain.c
-+++ b/xen/arch/x86/pv/domain.c
-@@ -424,10 +424,13 @@ bool __init xpti_pcid_enabled(void)
-
- static void _toggle_guest_pt(struct vcpu *v)
- {
-+ bool guest_update;
-+ pagetable_t old_shadow;
- unsigned long cr3;
-
- v->arch.flags ^= TF_kernel_mode;
-- update_cr3(v);
-+ guest_update = v->arch.flags & TF_kernel_mode;
-+ old_shadow = update_cr3(v);
-
- /*
- * Don't flush user global mappings from the TLB. Don't tick TLB clock.
-@@ -436,13 +439,31 @@ static void _toggle_guest_pt(struct vcpu *v)
- * TLB flush (for just the incoming PCID), as the top level page table may
- * have changed behind our backs. To be on the safe side, suppress the
- * no-flush unconditionally in this case.
-+ *
-+ * Furthermore in shadow mode update_cr3() can fail, in which case here
-+ * we're still running on the prior top-level shadow (which we're about
-+ * to release). Switch to the idle page tables in such an event; the
-+ * guest will have been crashed already.
- */
- cr3 = v->arch.cr3;
- if ( shadow_mode_enabled(v->domain) )
-+ {
- cr3 &= ~X86_CR3_NOFLUSH;
-+
-+ if ( unlikely(mfn_eq(pagetable_get_mfn(old_shadow),
-+ maddr_to_mfn(cr3))) )
-+ {
-+ cr3 = idle_vcpu[v->processor]->arch.cr3;
-+ /* Also suppress runstate/time area updates below. */
-+ guest_update = false;
-+ }
-+ }
- write_cr3(cr3);
-
-- if ( !(v->arch.flags & TF_kernel_mode) )
-+ if ( !pagetable_is_null(old_shadow) )
-+ shadow_put_top_level(v->domain, old_shadow);
-+
-+ if ( !guest_update )
- return;
-
- if ( v->arch.pv.need_update_runstate_area && update_runstate_area(v) )
---
-2.42.0
-
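
Stripped of the shadow-specific detail, the structural change above is a deferral pattern: the update routine hands back the reference it could not drop safely, and the caller releases it only after the page-table switch has really happened. A simplified sketch of the new_guest_cr3() flow from the hunks above (the failure path that falls back to the idle page tables is omitted):

    pagetable_t old_shadow = update_cr3(curr);   /* may still be in use here */

    write_ptbase(curr);                          /* the actual page-table switch */

    if ( !pagetable_is_null(old_shadow) )
        shadow_put_top_level(d, old_shadow);     /* now safe to drop the ref */
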
diff --git a/0041-tools-xenstored-domain_entry_fix-Handle-conflicting-.patch b/0041-tools-xenstored-domain_entry_fix-Handle-conflicting-.patch
deleted file mode 100644
index 1edecc8..0000000
--- a/0041-tools-xenstored-domain_entry_fix-Handle-conflicting-.patch
+++ /dev/null
@@ -1,64 +0,0 @@
-From c4e05c97f57d236040d1da5c1fbf6e3699dc86ea Mon Sep 17 00:00:00 2001
-From: Julien Grall <jgrall@amazon.com>
-Date: Fri, 22 Sep 2023 11:32:16 +0100
-Subject: [PATCH 41/55] tools/xenstored: domain_entry_fix(): Handle conflicting
- transaction
-
-The function domain_entry_fix() will be initially called to check if the
-quota is correct before attempting to commit any nodes. So it would be
-possible that accounting is temporarily negative. This is the case
-in the following sequence:
-
- 1) Create 50 nodes
- 2) Start two transactions
- 3) Delete all the nodes in each transaction
- 4) Commit the two transactions
-
-Because the first transaction will have succeeded and updated the
-accounting, there is no guarantee that 'd->nbentry + num' will still
-be above 0. So the assert() would be triggered.
-The assert() was introduced in dbef1f748289 ("tools/xenstore: simplify
-and fix per domain node accounting") with the assumption that the
-value can't be negative. As this is not true, revert to the original
-check but restricted to the path where we don't update. Take the
-opportunity to explain the rationale behind the check.
-
-This is CVE-2023-34323 / XSA-440.
-
-Fixes: dbef1f748289 ("tools/xenstore: simplify and fix per domain node accounting")
-Signed-off-by: Julien Grall <jgrall@amazon.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
----
- tools/xenstore/xenstored_domain.c | 14 ++++++++++++--
- 1 file changed, 12 insertions(+), 2 deletions(-)
-
-diff --git a/tools/xenstore/xenstored_domain.c b/tools/xenstore/xenstored_domain.c
-index aa86892fed..6074df210c 100644
---- a/tools/xenstore/xenstored_domain.c
-+++ b/tools/xenstore/xenstored_domain.c
-@@ -1094,10 +1094,20 @@ int domain_entry_fix(unsigned int domid, int num, bool update)
- }
-
- cnt = d->nbentry + num;
-- assert(cnt >= 0);
-
-- if (update)
-+ if (update) {
-+ assert(cnt >= 0);
- d->nbentry = cnt;
-+ } else if (cnt < 0) {
-+ /*
-+ * In a transaction when a node is being added/removed AND
-+ * the same node has been added/removed outside the
-+ * transaction in parallel, the result value may be negative.
-+ * This is no problem, as the transaction will fail due to
-+ * the resulting conflict. So override 'cnt'.
-+ */
-+ cnt = 0;
-+ }
-
- return domid_is_unprivileged(domid) ? cnt : 0;
- }
---
-2.42.0
-
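
Reduced to its accounting core, the fix distinguishes the dry-run quota check, where a transiently negative sum merely signals a conflicting parallel transaction and is clamped, from the commit path, where it must never be negative. A simplified sketch omitting the per-domain structures and the unprivileged-domain handling of the real code:

    #include <assert.h>
    #include <stdbool.h>

    static int entry_fix(int nbentry, int num, bool update, int *nbentry_out)
    {
        int cnt = nbentry + num;

        if (update) {
            assert(cnt >= 0);    /* commit path: accounting is authoritative */
            *nbentry_out = cnt;
        } else if (cnt < 0) {
            /* Conflicting parallel transaction; it will fail on commit. */
            cnt = 0;
        }

        return cnt;
    }
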
diff --git a/0042-iommu-amd-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch b/0042-iommu-amd-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch
deleted file mode 100644
index 66597c2..0000000
--- a/0042-iommu-amd-vi-flush-IOMMU-TLB-when-flushing-the-DTE.patch
+++ /dev/null
@@ -1,186 +0,0 @@
-From 0d8f9f7f2706e8ad8dfff203173693b631339b86 Mon Sep 17 00:00:00 2001
-From: Roger Pau Monne <roger.pau@citrix.com>
-Date: Tue, 13 Jun 2023 15:01:05 +0200
-Subject: [PATCH 42/55] iommu/amd-vi: flush IOMMU TLB when flushing the DTE
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The caching invalidation guidelines from the AMD-Vi specification (48882—Rev
-3.07-PUB—Oct 2022) seem to be misleading on some hardware, as devices will
-malfunction (see stale DMA mappings) if some fields of the DTE are updated but
-the IOMMU TLB is not flushed. This has been observed in practice on AMD
-systems. Due to the lack of guidance from the currently published
-specification this patch aims to increase the flushing done in order to prevent
-device malfunction.
-
-In order to fix, issue an INVALIDATE_IOMMU_PAGES command from
-amd_iommu_flush_device(), flushing all the address space. Note this requires
-callers to be adjusted in order to pass the DomID on the DTE previous to the
-modification.
-
-Some call sites don't provide a valid DomID to amd_iommu_flush_device() in
-order to avoid the flush. That's because the device had address translations
-disabled and hence the previous DomID on the DTE is not valid. Note the
-current logic relies on the entity disabling address translations to also flush
-the TLB of the in use DomID.
-
-Device I/O TLB flushing when ATS are enabled is not covered by the current
-change, as ATS usage is not security supported.
-
-This is XSA-442 / CVE-2023-34326
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 5fc98b97084a46884acef9320e643faf40d42212)
----
- xen/drivers/passthrough/amd/iommu.h | 3 ++-
- xen/drivers/passthrough/amd/iommu_cmd.c | 10 +++++++++-
- xen/drivers/passthrough/amd/iommu_guest.c | 5 +++--
- xen/drivers/passthrough/amd/iommu_init.c | 6 +++++-
- xen/drivers/passthrough/amd/pci_amd_iommu.c | 14 ++++++++++----
- 5 files changed, 29 insertions(+), 9 deletions(-)
-
-diff --git a/xen/drivers/passthrough/amd/iommu.h b/xen/drivers/passthrough/amd/iommu.h
-index 5429ada58e..a58be28bf9 100644
---- a/xen/drivers/passthrough/amd/iommu.h
-+++ b/xen/drivers/passthrough/amd/iommu.h
-@@ -283,7 +283,8 @@ void amd_iommu_flush_pages(struct domain *d, unsigned long dfn,
- unsigned int order);
- void amd_iommu_flush_iotlb(u8 devfn, const struct pci_dev *pdev,
- uint64_t gaddr, unsigned int order);
--void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf);
-+void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf,
-+ domid_t domid);
- void amd_iommu_flush_intremap(struct amd_iommu *iommu, uint16_t bdf);
- void amd_iommu_flush_all_caches(struct amd_iommu *iommu);
-
-diff --git a/xen/drivers/passthrough/amd/iommu_cmd.c b/xen/drivers/passthrough/amd/iommu_cmd.c
-index 40ddf366bb..cb28b36abc 100644
---- a/xen/drivers/passthrough/amd/iommu_cmd.c
-+++ b/xen/drivers/passthrough/amd/iommu_cmd.c
-@@ -363,10 +363,18 @@ void amd_iommu_flush_pages(struct domain *d,
- _amd_iommu_flush_pages(d, __dfn_to_daddr(dfn), order);
- }
-
--void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf)
-+void amd_iommu_flush_device(struct amd_iommu *iommu, uint16_t bdf,
-+ domid_t domid)
- {
- invalidate_dev_table_entry(iommu, bdf);
- flush_command_buffer(iommu, 0);
-+
-+ /* Also invalidate IOMMU TLB entries when flushing the DTE. */
-+ if ( domid != DOMID_INVALID )
-+ {
-+ invalidate_iommu_pages(iommu, INV_IOMMU_ALL_PAGES_ADDRESS, domid, 0);
-+ flush_command_buffer(iommu, 0);
-+ }
- }
-
- void amd_iommu_flush_intremap(struct amd_iommu *iommu, uint16_t bdf)
-diff --git a/xen/drivers/passthrough/amd/iommu_guest.c b/xen/drivers/passthrough/amd/iommu_guest.c
-index 80a331f546..be86bce6fb 100644
---- a/xen/drivers/passthrough/amd/iommu_guest.c
-+++ b/xen/drivers/passthrough/amd/iommu_guest.c
-@@ -385,7 +385,7 @@ static int do_completion_wait(struct domain *d, cmd_entry_t *cmd)
-
- static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
- {
-- uint16_t gbdf, mbdf, req_id, gdom_id, hdom_id;
-+ uint16_t gbdf, mbdf, req_id, gdom_id, hdom_id, prev_domid;
- struct amd_iommu_dte *gdte, *mdte, *dte_base;
- struct amd_iommu *iommu = NULL;
- struct guest_iommu *g_iommu;
-@@ -445,13 +445,14 @@ static int do_invalidate_dte(struct domain *d, cmd_entry_t *cmd)
- req_id = get_dma_requestor_id(iommu->seg, mbdf);
- dte_base = iommu->dev_table.buffer;
- mdte = &dte_base[req_id];
-+ prev_domid = mdte->domain_id;
-
- spin_lock_irqsave(&iommu->lock, flags);
- dte_set_gcr3_table(mdte, hdom_id, gcr3_mfn << PAGE_SHIFT, gv, glx);
-
- spin_unlock_irqrestore(&iommu->lock, flags);
-
-- amd_iommu_flush_device(iommu, req_id);
-+ amd_iommu_flush_device(iommu, req_id, prev_domid);
-
- return 0;
- }
-diff --git a/xen/drivers/passthrough/amd/iommu_init.c b/xen/drivers/passthrough/amd/iommu_init.c
-index 166570648d..101a60ce17 100644
---- a/xen/drivers/passthrough/amd/iommu_init.c
-+++ b/xen/drivers/passthrough/amd/iommu_init.c
-@@ -1547,7 +1547,11 @@ static int cf_check _invalidate_all_devices(
- req_id = ivrs_mappings[bdf].dte_requestor_id;
- if ( iommu )
- {
-- amd_iommu_flush_device(iommu, req_id);
-+ /*
-+ * IOMMU TLB flush performed separately (see
-+ * invalidate_all_domain_pages()).
-+ */
-+ amd_iommu_flush_device(iommu, req_id, DOMID_INVALID);
- amd_iommu_flush_intremap(iommu, req_id);
- }
- }
-diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
-index 94e3775506..8641b84712 100644
---- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
-+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
-@@ -192,10 +192,13 @@ static int __must_check amd_iommu_setup_domain_device(
-
- spin_unlock_irqrestore(&iommu->lock, flags);
-
-- amd_iommu_flush_device(iommu, req_id);
-+ /* DTE didn't have DMA translations enabled, do not flush the TLB. */
-+ amd_iommu_flush_device(iommu, req_id, DOMID_INVALID);
- }
- else if ( dte->pt_root != mfn_x(page_to_mfn(root_pg)) )
- {
-+ domid_t prev_domid = dte->domain_id;
-+
- /*
- * Strictly speaking if the device is the only one with this requestor
- * ID, it could be allowed to be re-assigned regardless of unity map
-@@ -252,7 +255,7 @@ static int __must_check amd_iommu_setup_domain_device(
-
- spin_unlock_irqrestore(&iommu->lock, flags);
-
-- amd_iommu_flush_device(iommu, req_id);
-+ amd_iommu_flush_device(iommu, req_id, prev_domid);
- }
- else
- spin_unlock_irqrestore(&iommu->lock, flags);
-@@ -421,6 +424,8 @@ static void amd_iommu_disable_domain_device(const struct domain *domain,
- spin_lock_irqsave(&iommu->lock, flags);
- if ( dte->tv || dte->v )
- {
-+ domid_t prev_domid = dte->domain_id;
-+
- /* See the comment in amd_iommu_setup_device_table(). */
- dte->int_ctl = IOMMU_DEV_TABLE_INT_CONTROL_ABORTED;
- smp_wmb();
-@@ -439,7 +444,7 @@ static void amd_iommu_disable_domain_device(const struct domain *domain,
-
- spin_unlock_irqrestore(&iommu->lock, flags);
-
-- amd_iommu_flush_device(iommu, req_id);
-+ amd_iommu_flush_device(iommu, req_id, prev_domid);
-
- AMD_IOMMU_DEBUG("Disable: device id = %#x, "
- "domain = %d, paging mode = %d\n",
-@@ -610,7 +615,8 @@ static int cf_check amd_iommu_add_device(u8 devfn, struct pci_dev *pdev)
-
- spin_unlock_irqrestore(&iommu->lock, flags);
-
-- amd_iommu_flush_device(iommu, bdf);
-+ /* DTE didn't have DMA translations enabled, do not flush the TLB. */
-+ amd_iommu_flush_device(iommu, bdf, DOMID_INVALID);
- }
-
- if ( amd_iommu_reserve_domain_unity_map(
---
-2.42.0
-
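
The caller-side discipline the patch establishes is to latch the DomID the DTE currently carries before rewriting it, so the IOMMU TLB entries tagged with that old domain can be invalidated as well. Schematically (rewrite_dte() is a hypothetical stand-in for the DTE update done under iommu->lock):

    domid_t prev_domid = dte->domain_id;      /* capture before the rewrite */

    rewrite_dte(dte, new_domain);             /* hypothetical helper */

    /* Flushes the DTE and, given a valid prev_domid, the IOMMU TLB too. */
    amd_iommu_flush_device(iommu, req_id, prev_domid);

Callers whose old DTE had address translations disabled pass DOMID_INVALID instead, which skips the TLB flush.
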
diff --git a/0043-libfsimage-xfs-Remove-dead-code.patch b/0043-libfsimage-xfs-Remove-dead-code.patch
deleted file mode 100644
index cbb9ad4..0000000
--- a/0043-libfsimage-xfs-Remove-dead-code.patch
+++ /dev/null
@@ -1,71 +0,0 @@
-From d665c6690eb3c2c86cb2c7dac09804211481f926 Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Thu, 14 Sep 2023 13:22:50 +0100
-Subject: [PATCH 43/55] libfsimage/xfs: Remove dead code
-
-xfs_info.agnolog (and related code) and XFS_INO_AGBNO_BITS are dead code
-that serve no purpose.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 37fc1e6c1c5c63aafd9cfd76a37728d5baea7d71)
----
- tools/libfsimage/xfs/fsys_xfs.c | 18 ------------------
- 1 file changed, 18 deletions(-)
-
-diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
-index d735a88e55..2800699f59 100644
---- a/tools/libfsimage/xfs/fsys_xfs.c
-+++ b/tools/libfsimage/xfs/fsys_xfs.c
-@@ -37,7 +37,6 @@ struct xfs_info {
- int blklog;
- int inopblog;
- int agblklog;
-- int agnolog;
- unsigned int nextents;
- xfs_daddr_t next;
- xfs_daddr_t daddr;
-@@ -65,9 +64,7 @@ static struct xfs_info xfs;
-
- #define XFS_INO_MASK(k) ((xfs_uint32_t)((1ULL << (k)) - 1))
- #define XFS_INO_OFFSET_BITS xfs.inopblog
--#define XFS_INO_AGBNO_BITS xfs.agblklog
- #define XFS_INO_AGINO_BITS (xfs.agblklog + xfs.inopblog)
--#define XFS_INO_AGNO_BITS xfs.agnolog
-
- static inline xfs_agblock_t
- agino2agbno (xfs_agino_t agino)
-@@ -149,20 +146,6 @@ xt_len (xfs_bmbt_rec_32_t *r)
- return le32(r->l3) & mask32lo(21);
- }
-
--static inline int
--xfs_highbit32(xfs_uint32_t v)
--{
-- int i;
--
-- if (--v) {
-- for (i = 0; i < 31; i++, v >>= 1) {
-- if (v == 0)
-- return i;
-- }
-- }
-- return 0;
--}
--
- static int
- isinxt (xfs_fileoff_t key, xfs_fileoff_t offset, xfs_filblks_t len)
- {
-@@ -472,7 +455,6 @@ xfs_mount (fsi_file_t *ffi, const char *options)
-
- xfs.inopblog = super.sb_inopblog;
- xfs.agblklog = super.sb_agblklog;
-- xfs.agnolog = xfs_highbit32 (le32(super.sb_agcount));
-
- xfs.btnode_ptr0_off =
- ((xfs.bsize - sizeof(xfs_btree_block_t)) /
---
-2.42.0
-
diff --git a/0044-libfsimage-xfs-Amend-mask32lo-to-allow-the-value-32.patch b/0044-libfsimage-xfs-Amend-mask32lo-to-allow-the-value-32.patch
deleted file mode 100644
index 880ff83..0000000
--- a/0044-libfsimage-xfs-Amend-mask32lo-to-allow-the-value-32.patch
+++ /dev/null
@@ -1,33 +0,0 @@
-From f1cd620cc3572c858e276463e05f695d949362c5 Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Thu, 14 Sep 2023 13:22:51 +0100
-Subject: [PATCH 44/55] libfsimage/xfs: Amend mask32lo() to allow the value 32
-
-agblklog could plausibly be 32, but that would overflow this shift.
-Perform the shift as ULL and cast to u32 at the end instead.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit ddc45e4eea946bb373a4b4a60c84bf9339cf413b)
----
- tools/libfsimage/xfs/fsys_xfs.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
-index 2800699f59..4720bb4505 100644
---- a/tools/libfsimage/xfs/fsys_xfs.c
-+++ b/tools/libfsimage/xfs/fsys_xfs.c
-@@ -60,7 +60,7 @@ static struct xfs_info xfs;
- #define inode ((xfs_dinode_t *)((char *)FSYS_BUF + 8192))
- #define icore (inode->di_core)
-
--#define mask32lo(n) (((xfs_uint32_t)1 << (n)) - 1)
-+#define mask32lo(n) ((xfs_uint32_t)((1ull << (n)) - 1))
-
- #define XFS_INO_MASK(k) ((xfs_uint32_t)((1ULL << (k)) - 1))
- #define XFS_INO_OFFSET_BITS xfs.inopblog
---
-2.42.0
-
diff --git a/0045-libfsimage-xfs-Sanity-check-the-superblock-during-mo.patch b/0045-libfsimage-xfs-Sanity-check-the-superblock-during-mo.patch
deleted file mode 100644
index 01ae52a..0000000
--- a/0045-libfsimage-xfs-Sanity-check-the-superblock-during-mo.patch
+++ /dev/null
@@ -1,137 +0,0 @@
-From 78143c5336c8316bcc648e964d65a07f216cf77f Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Thu, 14 Sep 2023 13:22:52 +0100
-Subject: [PATCH 45/55] libfsimage/xfs: Sanity-check the superblock during
- mounts
-
-Sanity-check the XFS superblock for wellformedness at the mount handler.
-This forces pygrub to abort parsing a potentially malformed filesystem and
-ensures the invariants assumed throughout the rest of the code hold.
-
-Also, derive parameters from previously sanitized parameters where possible
-(rather than reading them off the superblock)
-
-The code doesn't try to avoid overflowing the end of the disk, because
-that's an unlikely and benign error. Parameters used in calculations of
-xfs_daddr_t (like the root inode index) aren't in critical need of being
-sanitized.
-
-The sanitization of agblklog is basically checking that no obvious
-overflows happen on agblklog, and then ensuring agblocks is contained in
-the range (2^(sb_agblklog-1), 2^sb_agblklog].
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 620500dd1baf33347dfde5e7fde7cf7fe347da5c)
----
- tools/libfsimage/xfs/fsys_xfs.c | 48 ++++++++++++++++++++++++++-------
- tools/libfsimage/xfs/xfs.h | 12 +++++++++
- 2 files changed, 50 insertions(+), 10 deletions(-)
-
-diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
-index 4720bb4505..e4eb7e1ee2 100644
---- a/tools/libfsimage/xfs/fsys_xfs.c
-+++ b/tools/libfsimage/xfs/fsys_xfs.c
-@@ -17,6 +17,7 @@
- * along with this program; If not, see <http://www.gnu.org/licenses/>.
- */
-
-+#include <stdbool.h>
- #include <xenfsimage_grub.h>
- #include "xfs.h"
-
-@@ -433,29 +434,56 @@ first_dentry (fsi_file_t *ffi, xfs_ino_t *ino)
- return next_dentry (ffi, ino);
- }
-
-+static bool
-+xfs_sb_is_invalid (const xfs_sb_t *super)
-+{
-+ return (le32(super->sb_magicnum) != XFS_SB_MAGIC)
-+ || ((le16(super->sb_versionnum) & XFS_SB_VERSION_NUMBITS) !=
-+ XFS_SB_VERSION_4)
-+ || (super->sb_inodelog < XFS_SB_INODELOG_MIN)
-+ || (super->sb_inodelog > XFS_SB_INODELOG_MAX)
-+ || (super->sb_blocklog < XFS_SB_BLOCKLOG_MIN)
-+ || (super->sb_blocklog > XFS_SB_BLOCKLOG_MAX)
-+ || (super->sb_blocklog < super->sb_inodelog)
-+ || (super->sb_agblklog > XFS_SB_AGBLKLOG_MAX)
-+ || ((1ull << super->sb_agblklog) < le32(super->sb_agblocks))
-+ || (((1ull << super->sb_agblklog) >> 1) >=
-+ le32(super->sb_agblocks))
-+ || ((super->sb_blocklog + super->sb_dirblklog) >=
-+ XFS_SB_DIRBLK_NUMBITS);
-+}
-+
- static int
- xfs_mount (fsi_file_t *ffi, const char *options)
- {
- xfs_sb_t super;
-
- if (!devread (ffi, 0, 0, sizeof(super), (char *)&super)
-- || (le32(super.sb_magicnum) != XFS_SB_MAGIC)
-- || ((le16(super.sb_versionnum)
-- & XFS_SB_VERSION_NUMBITS) != XFS_SB_VERSION_4) ) {
-+ || xfs_sb_is_invalid(&super)) {
- return 0;
- }
-
-- xfs.bsize = le32 (super.sb_blocksize);
-- xfs.blklog = super.sb_blocklog;
-- xfs.bdlog = xfs.blklog - SECTOR_BITS;
-+ /*
-+ * Not sanitized. It's exclusively used to generate disk addresses,
-+ * so it's not important from a security standpoint.
-+ */
- xfs.rootino = le64 (super.sb_rootino);
-- xfs.isize = le16 (super.sb_inodesize);
-- xfs.agblocks = le32 (super.sb_agblocks);
-- xfs.dirbsize = xfs.bsize << super.sb_dirblklog;
-
-- xfs.inopblog = super.sb_inopblog;
-+ /*
-+ * Sanitized to be consistent with each other, only used to
-+ * generate disk addresses, so it's safe
-+ */
-+ xfs.agblocks = le32 (super.sb_agblocks);
- xfs.agblklog = super.sb_agblklog;
-
-+ /* Derived from sanitized parameters */
-+ xfs.bsize = 1 << super.sb_blocklog;
-+ xfs.blklog = super.sb_blocklog;
-+ xfs.bdlog = super.sb_blocklog - SECTOR_BITS;
-+ xfs.isize = 1 << super.sb_inodelog;
-+ xfs.dirbsize = 1 << (super.sb_blocklog + super.sb_dirblklog);
-+ xfs.inopblog = super.sb_blocklog - super.sb_inodelog;
-+
- xfs.btnode_ptr0_off =
- ((xfs.bsize - sizeof(xfs_btree_block_t)) /
- (sizeof (xfs_bmbt_key_t) + sizeof (xfs_bmbt_ptr_t)))
-diff --git a/tools/libfsimage/xfs/xfs.h b/tools/libfsimage/xfs/xfs.h
-index 40699281e4..b87e37d3d7 100644
---- a/tools/libfsimage/xfs/xfs.h
-+++ b/tools/libfsimage/xfs/xfs.h
-@@ -134,6 +134,18 @@ typedef struct xfs_sb
- xfs_uint8_t sb_dummy[7]; /* padding */
- } xfs_sb_t;
-
-+/* Bound taken from xfs.c in GRUB2. It doesn't exist in the spec */
-+#define XFS_SB_DIRBLK_NUMBITS 27
-+/* Implied by the XFS specification. The minimum block size is 512 octets */
-+#define XFS_SB_BLOCKLOG_MIN 9
-+/* Implied by the XFS specification. The maximum block size is 65536 octets */
-+#define XFS_SB_BLOCKLOG_MAX 16
-+/* Implied by the XFS specification. The minimum inode size is 256 octets */
-+#define XFS_SB_INODELOG_MIN 8
-+/* Implied by the XFS specification. The maximum inode size is 2048 octets */
-+#define XFS_SB_INODELOG_MAX 11
-+/* High bound for sb_agblklog */
-+#define XFS_SB_AGBLKLOG_MAX 32
-
- /* those are from xfs_btree.h */
-
---
-2.42.0
-
diff --git a/0046-libfsimage-xfs-Add-compile-time-check-to-libfsimage.patch b/0046-libfsimage-xfs-Add-compile-time-check-to-libfsimage.patch
deleted file mode 100644
index 0c32745..0000000
--- a/0046-libfsimage-xfs-Add-compile-time-check-to-libfsimage.patch
+++ /dev/null
@@ -1,62 +0,0 @@
-From eb4efdac4cc7121f832ee156f39761312878f3a5 Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Thu, 14 Sep 2023 13:22:53 +0100
-Subject: [PATCH 46/55] libfsimage/xfs: Add compile-time check to libfsimage
-
-Adds the common tools include folder to the -I compile flags
-of libfsimage. This allows us to use:
- xen-tools/common-macros.h:BUILD_BUG_ON()
-
-With it, statically assert a sanitized "blocklog - SECTOR_BITS" cannot
-underflow.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 7d85c70431593550e32022e3a19a37f306f49e00)
----
- tools/libfsimage/common.mk | 2 +-
- tools/libfsimage/xfs/fsys_xfs.c | 4 +++-
- 2 files changed, 4 insertions(+), 2 deletions(-)
-
-diff --git a/tools/libfsimage/common.mk b/tools/libfsimage/common.mk
-index 4fc8c66795..e4336837d0 100644
---- a/tools/libfsimage/common.mk
-+++ b/tools/libfsimage/common.mk
-@@ -1,7 +1,7 @@
- include $(XEN_ROOT)/tools/Rules.mk
-
- FSDIR := $(libdir)/xenfsimage
--CFLAGS += -Wno-unknown-pragmas -I$(XEN_ROOT)/tools/libfsimage/common/ -DFSIMAGE_FSDIR=\"$(FSDIR)\"
-+CFLAGS += -Wno-unknown-pragmas -I$(XEN_ROOT)/tools/libfsimage/common/ $(CFLAGS_xeninclude) -DFSIMAGE_FSDIR=\"$(FSDIR)\"
- CFLAGS += -D_GNU_SOURCE
- LDFLAGS += -L../common/
-
-diff --git a/tools/libfsimage/xfs/fsys_xfs.c b/tools/libfsimage/xfs/fsys_xfs.c
-index e4eb7e1ee2..4a8dd6f239 100644
---- a/tools/libfsimage/xfs/fsys_xfs.c
-+++ b/tools/libfsimage/xfs/fsys_xfs.c
-@@ -19,6 +19,7 @@
-
- #include <stdbool.h>
- #include <xenfsimage_grub.h>
-+#include <xen-tools/libs.h>
- #include "xfs.h"
-
- #define MAX_LINK_COUNT 8
-@@ -477,9 +478,10 @@ xfs_mount (fsi_file_t *ffi, const char *options)
- xfs.agblklog = super.sb_agblklog;
-
- /* Derived from sanitized parameters */
-+ BUILD_BUG_ON(XFS_SB_BLOCKLOG_MIN < SECTOR_BITS);
-+ xfs.bdlog = super.sb_blocklog - SECTOR_BITS;
- xfs.bsize = 1 << super.sb_blocklog;
- xfs.blklog = super.sb_blocklog;
-- xfs.bdlog = super.sb_blocklog - SECTOR_BITS;
- xfs.isize = 1 << super.sb_inodelog;
- xfs.dirbsize = 1 << (super.sb_blocklog + super.sb_dirblklog);
- xfs.inopblog = super.sb_blocklog - super.sb_inodelog;
---
-2.42.0
-
diff --git a/0047-tools-pygrub-Remove-unnecessary-hypercall.patch b/0047-tools-pygrub-Remove-unnecessary-hypercall.patch
deleted file mode 100644
index 6bdd9bb..0000000
--- a/0047-tools-pygrub-Remove-unnecessary-hypercall.patch
+++ /dev/null
@@ -1,60 +0,0 @@
-From 8a584126eae53a44cefb0acdbca201233a557fa5 Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Mon, 25 Sep 2023 18:32:21 +0100
-Subject: [PATCH 47/55] tools/pygrub: Remove unnecessary hypercall
-
-There's a hypercall being issued in order to determine whether PV64 is
-supported, but since Xen 4.3 that's strictly true so it's not required.
-
-Plus, this way we can avoid mapping the privcmd interface altogether in the
-depriv pygrub.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit f4b504c6170c446e61055cbd388ae4e832a9deca)
----
- tools/pygrub/src/pygrub | 12 +-----------
- 1 file changed, 1 insertion(+), 11 deletions(-)
-
-diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
-index ce7ab0eb8c..ce4e07d3e8 100755
---- a/tools/pygrub/src/pygrub
-+++ b/tools/pygrub/src/pygrub
-@@ -18,7 +18,6 @@ import os, sys, string, struct, tempfile, re, traceback, stat, errno
- import copy
- import logging
- import platform
--import xen.lowlevel.xc
-
- import curses, _curses, curses.textpad, curses.ascii
- import getopt
-@@ -668,14 +667,6 @@ def run_grub(file, entry, fs, cfg_args):
-
- return grubcfg
-
--def supports64bitPVguest():
-- xc = xen.lowlevel.xc.xc()
-- caps = xc.xeninfo()['xen_caps'].split(" ")
-- for cap in caps:
-- if cap == "xen-3.0-x86_64":
-- return True
-- return False
--
- # If nothing has been specified, look for a Solaris domU. If found, perform the
- # necessary tweaks.
- def sniff_solaris(fs, cfg):
-@@ -684,8 +675,7 @@ def sniff_solaris(fs, cfg):
- return cfg
-
- if not cfg["kernel"]:
-- if supports64bitPVguest() and \
-- fs.file_exists("/platform/i86xpv/kernel/amd64/unix"):
-+ if fs.file_exists("/platform/i86xpv/kernel/amd64/unix"):
- cfg["kernel"] = "/platform/i86xpv/kernel/amd64/unix"
- cfg["ramdisk"] = "/platform/i86pc/amd64/boot_archive"
- elif fs.file_exists("/platform/i86xpv/kernel/unix"):
---
-2.42.0
-
diff --git a/0048-tools-pygrub-Small-refactors.patch b/0048-tools-pygrub-Small-refactors.patch
deleted file mode 100644
index 55b238c..0000000
--- a/0048-tools-pygrub-Small-refactors.patch
+++ /dev/null
@@ -1,65 +0,0 @@
-From e7059f16f7c2b99fea30b9671fec74c0375eee8f Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Mon, 25 Sep 2023 18:32:22 +0100
-Subject: [PATCH 48/55] tools/pygrub: Small refactors
-
-Small tidy up to ensure output_directory always has a trailing '/' to ease
-concatenating paths and that `output` can only be a filename or None.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-(cherry picked from commit 9f2ff9a7c9b3ac734ae99f17f0134ed0343dcccf)
----
- tools/pygrub/src/pygrub | 10 +++++-----
- 1 file changed, 5 insertions(+), 5 deletions(-)
-
-diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
-index ce4e07d3e8..1042c05b86 100755
---- a/tools/pygrub/src/pygrub
-+++ b/tools/pygrub/src/pygrub
-@@ -793,7 +793,7 @@ if __name__ == "__main__":
- debug = False
- not_really = False
- output_format = "sxp"
-- output_directory = "/var/run/xen/pygrub"
-+ output_directory = "/var/run/xen/pygrub/"
-
- # what was passed in
- incfg = { "kernel": None, "ramdisk": None, "args": "" }
-@@ -815,7 +815,8 @@ if __name__ == "__main__":
- usage()
- sys.exit()
- elif o in ("--output",):
-- output = a
-+ if a != "-":
-+ output = a
- elif o in ("--kernel",):
- incfg["kernel"] = a
- elif o in ("--ramdisk",):
-@@ -847,12 +848,11 @@ if __name__ == "__main__":
- if not os.path.isdir(a):
- print("%s is not an existing directory" % a)
- sys.exit(1)
-- output_directory = a
-+ output_directory = a + '/'
-
- if debug:
- logging.basicConfig(level=logging.DEBUG)
-
--
- try:
- os.makedirs(output_directory, 0o700)
- except OSError as e:
-@@ -861,7 +861,7 @@ if __name__ == "__main__":
- else:
- raise
-
-- if output is None or output == "-":
-+ if output is None:
- fd = sys.stdout.fileno()
- else:
- fd = os.open(output, os.O_WRONLY)
---
-2.42.0
-
diff --git a/0049-tools-pygrub-Open-the-output-files-earlier.patch b/0049-tools-pygrub-Open-the-output-files-earlier.patch
deleted file mode 100644
index c3b00b1..0000000
--- a/0049-tools-pygrub-Open-the-output-files-earlier.patch
+++ /dev/null
@@ -1,105 +0,0 @@
-From 37977420670c65db220349510599d3fe47600ad8 Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Mon, 25 Sep 2023 18:32:23 +0100
-Subject: [PATCH 49/55] tools/pygrub: Open the output files earlier
-
-This patch allows pygrub to get ahold of every RW file descriptor it needs
-early on. A later patch will clamp the filesystem it can access so it can't
-obtain any others.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-(cherry picked from commit 0710d7d44586251bfca9758890616dc3d6de8a74)
----
- tools/pygrub/src/pygrub | 37 ++++++++++++++++++++++---------------
- 1 file changed, 22 insertions(+), 15 deletions(-)
-
-diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
-index 1042c05b86..91e2ec2ab1 100755
---- a/tools/pygrub/src/pygrub
-+++ b/tools/pygrub/src/pygrub
-@@ -738,8 +738,7 @@ if __name__ == "__main__":
- def usage():
- print("Usage: %s [-q|--quiet] [-i|--interactive] [-l|--list-entries] [-n|--not-really] [--output=] [--kernel=] [--ramdisk=] [--args=] [--entry=] [--output-directory=] [--output-format=sxp|simple|simple0] [--offset=] <image>" %(sys.argv[0],), file=sys.stderr)
-
-- def copy_from_image(fs, file_to_read, file_type, output_directory,
-- not_really):
-+ def copy_from_image(fs, file_to_read, file_type, fd_dst, path_dst, not_really):
- if not_really:
- if fs.file_exists(file_to_read):
- return "<%s:%s>" % (file_type, file_to_read)
-@@ -750,21 +749,18 @@ if __name__ == "__main__":
- except Exception as e:
- print(e, file=sys.stderr)
- sys.exit("Error opening %s in guest" % file_to_read)
-- (tfd, ret) = tempfile.mkstemp(prefix="boot_"+file_type+".",
-- dir=output_directory)
- dataoff = 0
- while True:
- data = datafile.read(FS_READ_MAX, dataoff)
- if len(data) == 0:
-- os.close(tfd)
-+ os.close(fd_dst)
- del datafile
-- return ret
-+ return
- try:
-- os.write(tfd, data)
-+ os.write(fd_dst, data)
- except Exception as e:
- print(e, file=sys.stderr)
-- os.close(tfd)
-- os.unlink(ret)
-+ os.unlink(path_dst)
- del datafile
- sys.exit("Error writing temporary copy of "+file_type)
- dataoff += len(data)
-@@ -861,6 +857,14 @@ if __name__ == "__main__":
- else:
- raise
-
-+ if not_really:
-+ fd_kernel = path_kernel = fd_ramdisk = path_ramdisk = None
-+ else:
-+ (fd_kernel, path_kernel) = tempfile.mkstemp(prefix="boot_kernel.",
-+ dir=output_directory)
-+ (fd_ramdisk, path_ramdisk) = tempfile.mkstemp(prefix="boot_ramdisk.",
-+ dir=output_directory)
-+
- if output is None:
- fd = sys.stdout.fileno()
- else:
-@@ -920,20 +924,23 @@ if __name__ == "__main__":
- if fs is None:
- raise RuntimeError("Unable to find partition containing kernel")
-
-- bootcfg["kernel"] = copy_from_image(fs, chosencfg["kernel"], "kernel",
-- output_directory, not_really)
-+ copy_from_image(fs, chosencfg["kernel"], "kernel",
-+ fd_kernel, path_kernel, not_really)
-+ bootcfg["kernel"] = path_kernel
-
- if chosencfg["ramdisk"]:
- try:
-- bootcfg["ramdisk"] = copy_from_image(fs, chosencfg["ramdisk"],
-- "ramdisk", output_directory,
-- not_really)
-+ copy_from_image(fs, chosencfg["ramdisk"], "ramdisk",
-+ fd_ramdisk, path_ramdisk, not_really)
- except:
- if not not_really:
-- os.unlink(bootcfg["kernel"])
-+ os.unlink(path_kernel)
- raise
-+ bootcfg["ramdisk"] = path_ramdisk
- else:
- initrd = None
-+ if not not_really:
-+ os.unlink(path_ramdisk)
-
- args = None
- if chosencfg["args"]:
---
-2.42.0
-
diff --git a/0050-tools-libfsimage-Export-a-new-function-to-preload-al.patch b/0050-tools-libfsimage-Export-a-new-function-to-preload-al.patch
deleted file mode 100644
index 949528d..0000000
--- a/0050-tools-libfsimage-Export-a-new-function-to-preload-al.patch
+++ /dev/null
@@ -1,126 +0,0 @@
-From 8ee19246ad2c1d0ce241a52683f56b144a4f0b0e Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Mon, 25 Sep 2023 18:32:24 +0100
-Subject: [PATCH 50/55] tools/libfsimage: Export a new function to preload all
- plugins
-
-This is work required in order to let pygrub operate in highly deprivileged
-chroot mode. This patch adds a function that preloads every plugin, hence
-ensuring that a on function exit, every shared library is loaded in memory.
-
-The new "init" function is supposed to be used before depriv, but that's
-fine because it's not acting on untrusted data.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-(cherry picked from commit 990e65c3ad9ac08642ce62a92852c80be6c83e96)
----
- tools/libfsimage/common/fsimage_plugin.c | 4 ++--
- tools/libfsimage/common/mapfile-GNU | 1 +
- tools/libfsimage/common/mapfile-SunOS | 1 +
- tools/libfsimage/common/xenfsimage.h | 8 ++++++++
- tools/pygrub/src/fsimage/fsimage.c | 15 +++++++++++++++
- 5 files changed, 27 insertions(+), 2 deletions(-)
-
-diff --git a/tools/libfsimage/common/fsimage_plugin.c b/tools/libfsimage/common/fsimage_plugin.c
-index de1412b423..d0cb9e96a6 100644
---- a/tools/libfsimage/common/fsimage_plugin.c
-+++ b/tools/libfsimage/common/fsimage_plugin.c
-@@ -119,7 +119,7 @@ fail:
- return (-1);
- }
-
--static int load_plugins(void)
-+int fsi_init(void)
- {
- const char *fsdir = getenv("XEN_FSIMAGE_FSDIR");
- struct dirent *dp = NULL;
-@@ -180,7 +180,7 @@ int find_plugin(fsi_t *fsi, const char *path, const char *options)
- fsi_plugin_t *fp;
- int ret = 0;
-
-- if (plugins == NULL && (ret = load_plugins()) != 0)
-+ if (plugins == NULL && (ret = fsi_init()) != 0)
- goto out;
-
- for (fp = plugins; fp != NULL; fp = fp->fp_next) {
-diff --git a/tools/libfsimage/common/mapfile-GNU b/tools/libfsimage/common/mapfile-GNU
-index 26d4d7a69e..2d54d527d7 100644
---- a/tools/libfsimage/common/mapfile-GNU
-+++ b/tools/libfsimage/common/mapfile-GNU
-@@ -1,6 +1,7 @@
- VERSION {
- libfsimage.so.1.0 {
- global:
-+ fsi_init;
- fsi_open_fsimage;
- fsi_close_fsimage;
- fsi_file_exists;
-diff --git a/tools/libfsimage/common/mapfile-SunOS b/tools/libfsimage/common/mapfile-SunOS
-index e99b90b650..48deedb425 100644
---- a/tools/libfsimage/common/mapfile-SunOS
-+++ b/tools/libfsimage/common/mapfile-SunOS
-@@ -1,5 +1,6 @@
- libfsimage.so.1.0 {
- global:
-+ fsi_init;
- fsi_open_fsimage;
- fsi_close_fsimage;
- fsi_file_exists;
-diff --git a/tools/libfsimage/common/xenfsimage.h b/tools/libfsimage/common/xenfsimage.h
-index 201abd54f2..341883b2d7 100644
---- a/tools/libfsimage/common/xenfsimage.h
-+++ b/tools/libfsimage/common/xenfsimage.h
-@@ -35,6 +35,14 @@ extern C {
- typedef struct fsi fsi_t;
- typedef struct fsi_file fsi_file_t;
-
-+/*
-+ * Optional initialization function. If invoked it loads the associated
-+ * dynamic libraries for the backends ahead of time. This is required if
-+ * the library is to run as part of a highly deprivileged executable, as
-+ * the libraries may not be reachable after depriv.
-+ */
-+int fsi_init(void);
-+
- fsi_t *fsi_open_fsimage(const char *, uint64_t, const char *);
- void fsi_close_fsimage(fsi_t *);
-
-diff --git a/tools/pygrub/src/fsimage/fsimage.c b/tools/pygrub/src/fsimage/fsimage.c
-index 2ebbbe35df..92fbf2851f 100644
---- a/tools/pygrub/src/fsimage/fsimage.c
-+++ b/tools/pygrub/src/fsimage/fsimage.c
-@@ -286,6 +286,15 @@ fsimage_getbootstring(PyObject *o, PyObject *args)
- return Py_BuildValue("s", bootstring);
- }
-
-+static PyObject *
-+fsimage_init(PyObject *o, PyObject *args)
-+{
-+ if (!PyArg_ParseTuple(args, ""))
-+ return (NULL);
-+
-+ return Py_BuildValue("i", fsi_init());
-+}
-+
- PyDoc_STRVAR(fsimage_open__doc__,
- "open(name, [offset=off]) - Open the given file as a filesystem image.\n"
- "\n"
-@@ -297,7 +306,13 @@ PyDoc_STRVAR(fsimage_getbootstring__doc__,
- "getbootstring(fs) - Return the boot string needed for this file system "
- "or NULL if none is needed.\n");
-
-+PyDoc_STRVAR(fsimage_init__doc__,
-+ "init() - Loads every dynamic library contained in xenfsimage "
-+ "into memory so that it can be used in chrooted environments.\n");
-+
- static struct PyMethodDef fsimage_module_methods[] = {
-+ { "init", (PyCFunction)fsimage_init,
-+ METH_VARARGS, fsimage_init__doc__ },
- { "open", (PyCFunction)fsimage_open,
- METH_VARARGS|METH_KEYWORDS, fsimage_open__doc__ },
- { "getbootstring", (PyCFunction)fsimage_getbootstring,
---
-2.42.0
-
diff --git a/0051-tools-pygrub-Deprivilege-pygrub.patch b/0051-tools-pygrub-Deprivilege-pygrub.patch
deleted file mode 100644
index 1d89191..0000000
--- a/0051-tools-pygrub-Deprivilege-pygrub.patch
+++ /dev/null
@@ -1,307 +0,0 @@
-From f5e211654e5fbb7f1fc5cfea7f9c7ab525edb9e7 Mon Sep 17 00:00:00 2001
-From: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-Date: Mon, 25 Sep 2023 18:32:25 +0100
-Subject: [PATCH 51/55] tools/pygrub: Deprivilege pygrub
-
-Introduce a --runas=<uid> flag to deprivilege pygrub on Linux and *BSDs. It
-also implicitly creates a chroot env where it drops a deprivileged forked
-process. The chroot itself is cleaned up at the end.
-
-If the --runas arg is present, then pygrub forks, leaving the child to
-deprivilege itself, and waiting for it to complete. When the child exists,
-the parent performs cleanup and exits with the same error code.
-
-This is roughly what the child does:
- 1. Initialize libfsimage (this loads every .so in memory so the chroot
- can avoid bind-mounting /{,usr}/lib*
- 2. Create a temporary empty chroot directory
- 3. Mount tmpfs in it
- 4. Bind mount the disk inside, because libfsimage expects a path, not a
- file descriptor.
- 5. Remount the root tmpfs to be stricter (ro,nosuid,nodev)
- 6. Set RLIMIT_FSIZE to a sensibly high amount (128 MiB)
- 7. Depriv gid, groups and uid
-
-With this scheme in place, the "output" files are writable (up to
-RLIMIT_FSIZE octets) and the exposed filesystem is immutable and contains
-the single only file we can't easily get rid of (the disk).
-
-If running on Linux, the child process also unshares mount, IPC, and
-network namespaces before dropping its privileges.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
-(cherry picked from commit e0342ae5556f2b6e2db50701b8a0679a45822ca6)
----
- tools/pygrub/setup.py | 2 +-
- tools/pygrub/src/pygrub | 162 +++++++++++++++++++++++++++++++++++++---
- 2 files changed, 154 insertions(+), 10 deletions(-)
-
-diff --git a/tools/pygrub/setup.py b/tools/pygrub/setup.py
-index 0e4e3d02d3..06b96733d0 100644
---- a/tools/pygrub/setup.py
-+++ b/tools/pygrub/setup.py
-@@ -17,7 +17,7 @@ xenfsimage = Extension("xenfsimage",
- pkgs = [ 'grub' ]
-
- setup(name='pygrub',
-- version='0.6',
-+ version='0.7',
- description='Boot loader that looks a lot like grub for Xen',
- author='Jeremy Katz',
- author_email='katzj@redhat.com',
-diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub
-index 91e2ec2ab1..7cea496ade 100755
---- a/tools/pygrub/src/pygrub
-+++ b/tools/pygrub/src/pygrub
-@@ -16,8 +16,11 @@ from __future__ import print_function
-
- import os, sys, string, struct, tempfile, re, traceback, stat, errno
- import copy
-+import ctypes, ctypes.util
- import logging
- import platform
-+import resource
-+import subprocess
-
- import curses, _curses, curses.textpad, curses.ascii
- import getopt
-@@ -27,10 +30,135 @@ import grub.GrubConf
- import grub.LiloConf
- import grub.ExtLinuxConf
-
--PYGRUB_VER = 0.6
-+PYGRUB_VER = 0.7
- FS_READ_MAX = 1024 * 1024
- SECTOR_SIZE = 512
-
-+# Unless provided through the env variable PYGRUB_MAX_FILE_SIZE_MB, then
-+# this is the maximum filesize allowed for files written by the depriv
-+# pygrub
-+LIMIT_FSIZE = 128 << 20
-+
-+CLONE_NEWNS = 0x00020000 # mount namespace
-+CLONE_NEWNET = 0x40000000 # network namespace
-+CLONE_NEWIPC = 0x08000000 # IPC namespace
-+
-+def unshare(flags):
-+ if not sys.platform.startswith("linux"):
-+ print("skip_unshare reason=not_linux platform=%s", sys.platform, file=sys.stderr)
-+ return
-+
-+ libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
-+ unshare_prototype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, use_errno=True)
-+ unshare = unshare_prototype(('unshare', libc))
-+
-+ if unshare(flags) < 0:
-+ raise OSError(ctypes.get_errno(), os.strerror(ctypes.get_errno()))
-+
-+def bind_mount(src, dst, options):
-+ open(dst, "a").close() # touch
-+
-+ rc = subprocess.call(["mount", "--bind", "-o", options, src, dst])
-+ if rc != 0:
-+ raise RuntimeError("bad_mount: src=%s dst=%s opts=%s" %
-+ (src, dst, options))
-+
-+def downgrade_rlimits():
-+ # Wipe the authority to use unrequired resources
-+ resource.setrlimit(resource.RLIMIT_NPROC, (0, 0))
-+ resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
-+ resource.setrlimit(resource.RLIMIT_MEMLOCK, (0, 0))
-+
-+ # py2's resource module doesn't know about resource.RLIMIT_MSGQUEUE
-+ #
-+ # TODO: Use resource.RLIMIT_MSGQUEUE after python2 is deprecated
-+ if sys.platform.startswith('linux'):
-+ RLIMIT_MSGQUEUE = 12
-+ resource.setrlimit(RLIMIT_MSGQUEUE, (0, 0))
-+
-+ # The final look of the filesystem for this process is fully RO, but
-+ # note we have some file descriptor already open (notably, kernel and
-+ # ramdisk). In order to avoid a compromised pygrub from filling up the
-+ # filesystem we set RLIMIT_FSIZE to a high bound, so that the file
-+ # write permissions are bound.
-+ fsize = LIMIT_FSIZE
-+ if "PYGRUB_MAX_FILE_SIZE_MB" in os.environ.keys():
-+ fsize = os.environ["PYGRUB_MAX_FILE_SIZE_MB"] << 20
-+
-+ resource.setrlimit(resource.RLIMIT_FSIZE, (fsize, fsize))
-+
-+def depriv(output_directory, output, device, uid, path_kernel, path_ramdisk):
-+ # The only point of this call is to force the loading of libfsimage.
-+ # That way, we don't need to bind-mount it into the chroot
-+ rc = xenfsimage.init()
-+ if rc != 0:
-+ os.unlink(path_ramdisk)
-+ os.unlink(path_kernel)
-+ raise RuntimeError("bad_xenfsimage: rc=%d" % rc)
-+
-+ # Create a temporary directory for the chroot
-+ chroot = tempfile.mkdtemp(prefix=str(uid)+'-', dir=output_directory) + '/'
-+ device_path = '/device'
-+
-+ pid = os.fork()
-+ if pid:
-+ # parent
-+ _, rc = os.waitpid(pid, 0)
-+
-+ for path in [path_kernel, path_ramdisk]:
-+ # If the child didn't write anything, just get rid of it,
-+ # otherwise we end up consuming a 0-size file when parsing
-+ # systems without a ramdisk that the ultimate caller of pygrub
-+ # may just be unaware of
-+ if rc != 0 or os.path.getsize(path) == 0:
-+ os.unlink(path)
-+
-+ # Normally, unshare(CLONE_NEWNS) will ensure this is not required.
-+ # However, this syscall doesn't exist in *BSD systems and doesn't
-+ # auto-unmount everything on older Linux kernels (At least as of
-+ # Linux 4.19, but it seems fixed in 5.15). Either way,
-+ # recursively unmount everything if needed. Quietly.
-+ with open('/dev/null', 'w') as devnull:
-+ subprocess.call(["umount", "-f", chroot + device_path],
-+ stdout=devnull, stderr=devnull)
-+ subprocess.call(["umount", "-f", chroot],
-+ stdout=devnull, stderr=devnull)
-+ os.rmdir(chroot)
-+
-+ sys.exit(rc)
-+
-+ # By unsharing the namespace we're making sure it's all bulk-released
-+ # at the end, when the namespaces disappear. This means the kernel does
-+ # (almost) all the cleanup for us and the parent just has to remove the
-+ # temporary directory.
-+ unshare(CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWNET)
-+
-+ # Set sensible limits using the setrlimit interface
-+ downgrade_rlimits()
-+
-+ # We'll mount tmpfs on the chroot to ensure the deprivileged child
-+ # cannot affect the persistent state. It's RW now in order to
-+ # bind-mount the device, but note it's remounted RO after that.
-+ rc = subprocess.call(["mount", "-t", "tmpfs", "none", chroot])
-+ if rc != 0:
-+ raise RuntimeError("mount_tmpfs rc=%d dst=\"%s\"" % (rc, chroot))
-+
-+ # Bind the untrusted device RO
-+ bind_mount(device, chroot + device_path, "ro,nosuid,noexec")
-+
-+ rc = subprocess.call(["mount", "-t", "tmpfs", "-o", "remount,ro,nosuid,noexec,nodev", "none", chroot])
-+ if rc != 0:
-+ raise RuntimeError("remount_tmpfs rc=%d dst=\"%s\"" % (rc, chroot))
-+
-+ # Drop superpowers!
-+ os.chroot(chroot)
-+ os.chdir('/')
-+ os.setgid(uid)
-+ os.setgroups([uid])
-+ os.setuid(uid)
-+
-+ return device_path
-+
- def read_size_roundup(fd, size):
- if platform.system() != 'FreeBSD':
- return size
-@@ -736,7 +864,7 @@ if __name__ == "__main__":
- sel = None
-
- def usage():
-- print("Usage: %s [-q|--quiet] [-i|--interactive] [-l|--list-entries] [-n|--not-really] [--output=] [--kernel=] [--ramdisk=] [--args=] [--entry=] [--output-directory=] [--output-format=sxp|simple|simple0] [--offset=] <image>" %(sys.argv[0],), file=sys.stderr)
-+ print("Usage: %s [-q|--quiet] [-i|--interactive] [-l|--list-entries] [-n|--not-really] [--output=] [--kernel=] [--ramdisk=] [--args=] [--entry=] [--output-directory=] [--output-format=sxp|simple|simple0] [--runas=] [--offset=] <image>" %(sys.argv[0],), file=sys.stderr)
-
- def copy_from_image(fs, file_to_read, file_type, fd_dst, path_dst, not_really):
- if not_really:
-@@ -760,7 +888,8 @@ if __name__ == "__main__":
- os.write(fd_dst, data)
- except Exception as e:
- print(e, file=sys.stderr)
-- os.unlink(path_dst)
-+ if path_dst:
-+ os.unlink(path_dst)
- del datafile
- sys.exit("Error writing temporary copy of "+file_type)
- dataoff += len(data)
-@@ -769,7 +898,7 @@ if __name__ == "__main__":
- opts, args = getopt.gnu_getopt(sys.argv[1:], 'qilnh::',
- ["quiet", "interactive", "list-entries", "not-really", "help",
- "output=", "output-format=", "output-directory=", "offset=",
-- "entry=", "kernel=",
-+ "runas=", "entry=", "kernel=",
- "ramdisk=", "args=", "isconfig", "debug"])
- except getopt.GetoptError:
- usage()
-@@ -790,6 +919,7 @@ if __name__ == "__main__":
- not_really = False
- output_format = "sxp"
- output_directory = "/var/run/xen/pygrub/"
-+ uid = None
-
- # what was passed in
- incfg = { "kernel": None, "ramdisk": None, "args": "" }
-@@ -813,6 +943,13 @@ if __name__ == "__main__":
- elif o in ("--output",):
- if a != "-":
- output = a
-+ elif o in ("--runas",):
-+ try:
-+ uid = int(a)
-+ except ValueError:
-+ print("runas value must be an integer user id")
-+ usage()
-+ sys.exit(1)
- elif o in ("--kernel",):
- incfg["kernel"] = a
- elif o in ("--ramdisk",):
-@@ -849,6 +986,10 @@ if __name__ == "__main__":
- if debug:
- logging.basicConfig(level=logging.DEBUG)
-
-+ if interactive and uid:
-+ print("In order to use --runas, you must also set --entry or -q", file=sys.stderr)
-+ sys.exit(1)
-+
- try:
- os.makedirs(output_directory, 0o700)
- except OSError as e:
-@@ -870,6 +1011,9 @@ if __name__ == "__main__":
- else:
- fd = os.open(output, os.O_WRONLY)
-
-+ if uid:
-+ file = depriv(output_directory, output, file, uid, path_kernel, path_ramdisk)
-+
- # debug
- if isconfig:
- chosencfg = run_grub(file, entry, fs, incfg["args"])
-@@ -925,21 +1069,21 @@ if __name__ == "__main__":
- raise RuntimeError("Unable to find partition containing kernel")
-
- copy_from_image(fs, chosencfg["kernel"], "kernel",
-- fd_kernel, path_kernel, not_really)
-+ fd_kernel, None if uid else path_kernel, not_really)
- bootcfg["kernel"] = path_kernel
-
- if chosencfg["ramdisk"]:
- try:
- copy_from_image(fs, chosencfg["ramdisk"], "ramdisk",
-- fd_ramdisk, path_ramdisk, not_really)
-+ fd_ramdisk, None if uid else path_ramdisk, not_really)
- except:
-- if not not_really:
-- os.unlink(path_kernel)
-+ if not uid and not not_really:
-+ os.unlink(path_kernel)
- raise
- bootcfg["ramdisk"] = path_ramdisk
- else:
- initrd = None
-- if not not_really:
-+ if not uid and not not_really:
- os.unlink(path_ramdisk)
-
- args = None
---
-2.42.0
-
diff --git a/0052-libxl-add-support-for-running-bootloader-in-restrict.patch b/0052-libxl-add-support-for-running-bootloader-in-restrict.patch
deleted file mode 100644
index 08691b9..0000000
--- a/0052-libxl-add-support-for-running-bootloader-in-restrict.patch
+++ /dev/null
@@ -1,251 +0,0 @@
-From 42bf49d74b711ca7fef37bcde12928220c8e9700 Mon Sep 17 00:00:00 2001
-From: Roger Pau Monne <roger.pau@citrix.com>
-Date: Mon, 25 Sep 2023 14:30:20 +0200
-Subject: [PATCH 52/55] libxl: add support for running bootloader in restricted
- mode
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Much like the device model depriv mode, add the same kind of support for the
-bootloader. Such feature allows passing a UID as a parameter for the
-bootloader to run as, together with the bootloader itself taking the necessary
-actions to isolate.
-
-Note that the user to run the bootloader as must have the right permissions to
-access the guest disk image (in read mode only), and that the bootloader will
-be run in non-interactive mode when restricted.
-
-If enabled bootloader restrict mode will attempt to re-use the user(s) from the
-QEMU depriv implementation if no user is provided on the configuration file or
-the environment. See docs/features/qemu-deprivilege.pandoc for more
-information about how to setup those users.
-
-Bootloader restrict mode is not enabled by default as it requires certain
-setup to be done first (setup of the user(s) to use in restrict mode).
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-(cherry picked from commit 1f762642d2cad1a40634e3280361928109d902f1)
----
- docs/man/xl.1.pod.in | 33 +++++++++++
- tools/libs/light/libxl_bootloader.c | 89 ++++++++++++++++++++++++++++-
- tools/libs/light/libxl_dm.c | 8 +--
- tools/libs/light/libxl_internal.h | 8 +++
- 4 files changed, 131 insertions(+), 7 deletions(-)
-
-diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
-index 101e14241d..4831e12242 100644
---- a/docs/man/xl.1.pod.in
-+++ b/docs/man/xl.1.pod.in
-@@ -1957,6 +1957,39 @@ ignored:
-
- =back
-
-+=head1 ENVIRONMENT VARIABLES
-+
-+The following environment variables shall affect the execution of xl:
-+
-+=over 4
-+
-+=item LIBXL_BOOTLOADER_RESTRICT
-+
-+Attempt to restrict the bootloader after startup, to limit the
-+consequences of security vulnerabilities due to parsing guest
-+owned image files.
-+
-+See docs/features/qemu-deprivilege.pandoc for more information
-+on how to setup the unprivileged users.
-+
-+Note that running the bootloader in restricted mode also implies using
-+non-interactive mode, and the disk image must be readable by the
-+restricted user.
-+
-+Having this variable set is equivalent to enabling the option, even if the
-+value is 0.
-+
-+=item LIBXL_BOOTLOADER_USER
-+
-+When using bootloader_restrict, run the bootloader as this user. If
-+not set the default QEMU restrict users will be used.
-+
-+NOTE: Each domain MUST have a SEPARATE username.
-+
-+See docs/features/qemu-deprivilege.pandoc for more information.
-+
-+=back
-+
- =head1 SEE ALSO
-
- The following man pages:
-diff --git a/tools/libs/light/libxl_bootloader.c b/tools/libs/light/libxl_bootloader.c
-index 108329b4a5..23c0ef3e89 100644
---- a/tools/libs/light/libxl_bootloader.c
-+++ b/tools/libs/light/libxl_bootloader.c
-@@ -14,6 +14,7 @@
-
- #include "libxl_osdeps.h" /* must come before any other headers */
-
-+#include <pwd.h>
- #include <termios.h>
- #ifdef HAVE_UTMP_H
- #include <utmp.h>
-@@ -42,8 +43,71 @@ static void bootloader_arg(libxl__bootloader_state *bl, const char *arg)
- bl->args[bl->nargs++] = arg;
- }
-
--static void make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
-- const char *bootloader_path)
-+static int bootloader_uid(libxl__gc *gc, domid_t guest_domid,
-+ const char *user, uid_t *intended_uid)
-+{
-+ struct passwd *user_base, user_pwbuf;
-+ int rc;
-+
-+ if (user) {
-+ rc = userlookup_helper_getpwnam(gc, user, &user_pwbuf, &user_base);
-+ if (rc) return rc;
-+
-+ if (!user_base) {
-+ LOGD(ERROR, guest_domid, "Couldn't find user %s", user);
-+ return ERROR_INVAL;
-+ }
-+
-+ *intended_uid = user_base->pw_uid;
-+ return 0;
-+ }
-+
-+ /* Re-use QEMU user range for the bootloader. */
-+ rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_RANGE_BASE,
-+ &user_pwbuf, &user_base);
-+ if (rc) return rc;
-+
-+ if (user_base) {
-+ struct passwd *user_clash, user_clash_pwbuf;
-+ uid_t temp_uid = user_base->pw_uid + guest_domid;
-+
-+ rc = userlookup_helper_getpwuid(gc, temp_uid, &user_clash_pwbuf,
-+ &user_clash);
-+ if (rc) return rc;
-+
-+ if (user_clash) {
-+ LOGD(ERROR, guest_domid,
-+ "wanted to use uid %ld (%s + %d) but that is user %s !",
-+ (long)temp_uid, LIBXL_QEMU_USER_RANGE_BASE,
-+ guest_domid, user_clash->pw_name);
-+ return ERROR_INVAL;
-+ }
-+
-+ *intended_uid = temp_uid;
-+ return 0;
-+ }
-+
-+ rc = userlookup_helper_getpwnam(gc, LIBXL_QEMU_USER_SHARED, &user_pwbuf,
-+ &user_base);
-+ if (rc) return rc;
-+
-+ if (user_base) {
-+ LOGD(WARN, guest_domid, "Could not find user %s, falling back to %s",
-+ LIBXL_QEMU_USER_RANGE_BASE, LIBXL_QEMU_USER_SHARED);
-+ *intended_uid = user_base->pw_uid;
-+
-+ return 0;
-+ }
-+
-+ LOGD(ERROR, guest_domid,
-+ "Could not find user %s or range base pseudo-user %s, cannot restrict",
-+ LIBXL_QEMU_USER_SHARED, LIBXL_QEMU_USER_RANGE_BASE);
-+
-+ return ERROR_INVAL;
-+}
-+
-+static int make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
-+ const char *bootloader_path)
- {
- const libxl_domain_build_info *info = bl->info;
-
-@@ -61,6 +125,23 @@ static void make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
- ARG(GCSPRINTF("--ramdisk=%s", info->ramdisk));
- if (info->cmdline && *info->cmdline != '\0')
- ARG(GCSPRINTF("--args=%s", info->cmdline));
-+ if (getenv("LIBXL_BOOTLOADER_RESTRICT") ||
-+ getenv("LIBXL_BOOTLOADER_USER")) {
-+ uid_t uid = -1;
-+ int rc = bootloader_uid(gc, bl->domid, getenv("LIBXL_BOOTLOADER_USER"),
-+ &uid);
-+
-+ if (rc) return rc;
-+
-+ assert(uid != -1);
-+ if (!uid) {
-+ LOGD(ERROR, bl->domid, "bootloader restrict UID is 0 (root)!");
-+ return ERROR_INVAL;
-+ }
-+ LOGD(DEBUG, bl->domid, "using uid %ld", (long)uid);
-+ ARG(GCSPRINTF("--runas=%ld", (long)uid));
-+ ARG("--quiet");
-+ }
-
- ARG(GCSPRINTF("--output=%s", bl->outputpath));
- ARG("--output-format=simple0");
-@@ -79,6 +160,7 @@ static void make_bootloader_args(libxl__gc *gc, libxl__bootloader_state *bl,
- /* Sentinel for execv */
- ARG(NULL);
-
-+ return 0;
- #undef ARG
- }
-
-@@ -443,7 +525,8 @@ static void bootloader_disk_attached_cb(libxl__egc *egc,
- bootloader = bltmp;
- }
-
-- make_bootloader_args(gc, bl, bootloader);
-+ rc = make_bootloader_args(gc, bl, bootloader);
-+ if (rc) goto out;
-
- bl->openpty.ao = ao;
- bl->openpty.callback = bootloader_gotptys;
-diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
-index fc264a3a13..14b593110f 100644
---- a/tools/libs/light/libxl_dm.c
-+++ b/tools/libs/light/libxl_dm.c
-@@ -80,10 +80,10 @@ static int libxl__create_qemu_logfile(libxl__gc *gc, char *name)
- * On error, return a libxl-style error code.
- */
- #define DEFINE_USERLOOKUP_HELPER(NAME,SPEC_TYPE,STRUCTNAME,SYSCONF) \
-- static int userlookup_helper_##NAME(libxl__gc *gc, \
-- SPEC_TYPE spec, \
-- struct STRUCTNAME *resultbuf, \
-- struct STRUCTNAME **out) \
-+ int userlookup_helper_##NAME(libxl__gc *gc, \
-+ SPEC_TYPE spec, \
-+ struct STRUCTNAME *resultbuf, \
-+ struct STRUCTNAME **out) \
- { \
- struct STRUCTNAME *resultp = NULL; \
- char *buf = NULL; \
-diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
-index 7ad38de30e..f1e3a9a15b 100644
---- a/tools/libs/light/libxl_internal.h
-+++ b/tools/libs/light/libxl_internal.h
-@@ -4873,6 +4873,14 @@ struct libxl__cpu_policy {
- struct xc_msr *msr;
- };
-
-+struct passwd;
-+_hidden int userlookup_helper_getpwnam(libxl__gc*, const char *user,
-+ struct passwd *res,
-+ struct passwd **out);
-+_hidden int userlookup_helper_getpwuid(libxl__gc*, uid_t uid,
-+ struct passwd *res,
-+ struct passwd **out);
-+
- #endif
-
- /*
---
-2.42.0
-
diff --git a/0053-libxl-limit-bootloader-execution-in-restricted-mode.patch b/0053-libxl-limit-bootloader-execution-in-restricted-mode.patch
deleted file mode 100644
index 8c790d3..0000000
--- a/0053-libxl-limit-bootloader-execution-in-restricted-mode.patch
+++ /dev/null
@@ -1,158 +0,0 @@
-From 46d00dbf4c22b28910f73f66a03e5cabe50b5395 Mon Sep 17 00:00:00 2001
-From: Roger Pau Monne <roger.pau@citrix.com>
-Date: Thu, 28 Sep 2023 12:22:35 +0200
-Subject: [PATCH 53/55] libxl: limit bootloader execution in restricted mode
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Introduce a timeout for bootloader execution when running in restricted mode.
-
-Allow overwriting the default time out with an environment provided value.
-
-This is part of XSA-443 / CVE-2023-34325
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-(cherry picked from commit 9c114178ffd700112e91f5ec66cf5151b9c9a8cc)
----
- docs/man/xl.1.pod.in | 8 ++++++
- tools/libs/light/libxl_bootloader.c | 40 +++++++++++++++++++++++++++++
- tools/libs/light/libxl_internal.h | 2 ++
- 3 files changed, 50 insertions(+)
-
-diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
-index 4831e12242..c3eb6570ab 100644
---- a/docs/man/xl.1.pod.in
-+++ b/docs/man/xl.1.pod.in
-@@ -1988,6 +1988,14 @@ NOTE: Each domain MUST have a SEPARATE username.
-
- See docs/features/qemu-deprivilege.pandoc for more information.
-
-+=item LIBXL_BOOTLOADER_TIMEOUT
-+
-+Timeout in seconds for bootloader execution when running in restricted mode.
-+Otherwise the build time default in LIBXL_BOOTLOADER_TIMEOUT will be used.
-+
-+If defined the value must be an unsigned integer between 0 and INT_MAX,
-+otherwise behavior is undefined. Setting to 0 disables the timeout.
-+
- =back
-
- =head1 SEE ALSO
-diff --git a/tools/libs/light/libxl_bootloader.c b/tools/libs/light/libxl_bootloader.c
-index 23c0ef3e89..ee26d08f37 100644
---- a/tools/libs/light/libxl_bootloader.c
-+++ b/tools/libs/light/libxl_bootloader.c
-@@ -30,6 +30,8 @@ static void bootloader_keystrokes_copyfail(libxl__egc *egc,
- libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
- static void bootloader_display_copyfail(libxl__egc *egc,
- libxl__datacopier_state *dc, int rc, int onwrite, int errnoval);
-+static void bootloader_timeout(libxl__egc *egc, libxl__ev_time *ev,
-+ const struct timeval *requested_abs, int rc);
- static void bootloader_domaindeath(libxl__egc*, libxl__domaindeathcheck *dc,
- int rc);
- static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
-@@ -297,6 +299,7 @@ void libxl__bootloader_init(libxl__bootloader_state *bl)
- bl->ptys[0].master = bl->ptys[0].slave = 0;
- bl->ptys[1].master = bl->ptys[1].slave = 0;
- libxl__ev_child_init(&bl->child);
-+ libxl__ev_time_init(&bl->time);
- libxl__domaindeathcheck_init(&bl->deathcheck);
- bl->keystrokes.ao = bl->ao; libxl__datacopier_init(&bl->keystrokes);
- bl->display.ao = bl->ao; libxl__datacopier_init(&bl->display);
-@@ -314,6 +317,7 @@ static void bootloader_cleanup(libxl__egc *egc, libxl__bootloader_state *bl)
- libxl__domaindeathcheck_stop(gc,&bl->deathcheck);
- libxl__datacopier_kill(&bl->keystrokes);
- libxl__datacopier_kill(&bl->display);
-+ libxl__ev_time_deregister(gc, &bl->time);
- for (i=0; i<2; i++) {
- libxl__carefd_close(bl->ptys[i].master);
- libxl__carefd_close(bl->ptys[i].slave);
-@@ -375,6 +379,7 @@ static void bootloader_stop(libxl__egc *egc,
-
- libxl__datacopier_kill(&bl->keystrokes);
- libxl__datacopier_kill(&bl->display);
-+ libxl__ev_time_deregister(gc, &bl->time);
- if (libxl__ev_child_inuse(&bl->child)) {
- r = kill(bl->child.pid, SIGTERM);
- if (r) LOGED(WARN, bl->domid, "%sfailed to kill bootloader [%lu]",
-@@ -637,6 +642,25 @@ static void bootloader_gotptys(libxl__egc *egc, libxl__openpty_state *op)
-
- struct termios termattr;
-
-+ if (getenv("LIBXL_BOOTLOADER_RESTRICT") ||
-+ getenv("LIBXL_BOOTLOADER_USER")) {
-+ const char *timeout_env = getenv("LIBXL_BOOTLOADER_TIMEOUT");
-+ int timeout = timeout_env ? atoi(timeout_env)
-+ : LIBXL_BOOTLOADER_TIMEOUT;
-+
-+ if (timeout) {
-+ /* Set execution timeout */
-+ rc = libxl__ev_time_register_rel(ao, &bl->time,
-+ bootloader_timeout,
-+ timeout * 1000);
-+ if (rc) {
-+ LOGED(ERROR, bl->domid,
-+ "unable to register timeout for bootloader execution");
-+ goto out;
-+ }
-+ }
-+ }
-+
- pid_t pid = libxl__ev_child_fork(gc, &bl->child, bootloader_finished);
- if (pid == -1) {
- rc = ERROR_FAIL;
-@@ -702,6 +726,21 @@ static void bootloader_display_copyfail(libxl__egc *egc,
- libxl__bootloader_state *bl = CONTAINER_OF(dc, *bl, display);
- bootloader_copyfail(egc, "bootloader output", bl, 1, rc,onwrite,errnoval);
- }
-+static void bootloader_timeout(libxl__egc *egc, libxl__ev_time *ev,
-+ const struct timeval *requested_abs, int rc)
-+{
-+ libxl__bootloader_state *bl = CONTAINER_OF(ev, *bl, time);
-+ STATE_AO_GC(bl->ao);
-+
-+ libxl__ev_time_deregister(gc, &bl->time);
-+
-+ assert(libxl__ev_child_inuse(&bl->child));
-+ LOGD(ERROR, bl->domid, "killing bootloader because of timeout");
-+
-+ libxl__ev_child_kill_deregister(ao, &bl->child, SIGKILL);
-+
-+ bootloader_callback(egc, bl, rc);
-+}
-
- static void bootloader_domaindeath(libxl__egc *egc,
- libxl__domaindeathcheck *dc,
-@@ -718,6 +757,7 @@ static void bootloader_finished(libxl__egc *egc, libxl__ev_child *child,
- STATE_AO_GC(bl->ao);
- int rc;
-
-+ libxl__ev_time_deregister(gc, &bl->time);
- libxl__datacopier_kill(&bl->keystrokes);
- libxl__datacopier_kill(&bl->display);
-
-diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
-index f1e3a9a15b..d05783617f 100644
---- a/tools/libs/light/libxl_internal.h
-+++ b/tools/libs/light/libxl_internal.h
-@@ -102,6 +102,7 @@
- #define LIBXL_QMP_CMD_TIMEOUT 10
- #define LIBXL_STUBDOM_START_TIMEOUT 30
- #define LIBXL_QEMU_BODGE_TIMEOUT 2
-+#define LIBXL_BOOTLOADER_TIMEOUT 120
- #define LIBXL_XENCONSOLE_LIMIT 1048576
- #define LIBXL_XENCONSOLE_PROTOCOL "vt100"
- #define LIBXL_MAXMEM_CONSTANT 1024
-@@ -3744,6 +3745,7 @@ struct libxl__bootloader_state {
- libxl__openpty_state openpty;
- libxl__openpty_result ptys[2]; /* [0] is for bootloader */
- libxl__ev_child child;
-+ libxl__ev_time time;
- libxl__domaindeathcheck deathcheck;
- int nargs, argsspace;
- const char **args;
---
-2.42.0
-
diff --git a/0054-x86-svm-Fix-asymmetry-with-AMD-DR-MASK-context-switc.patch b/0054-x86-svm-Fix-asymmetry-with-AMD-DR-MASK-context-switc.patch
deleted file mode 100644
index af72c9a..0000000
--- a/0054-x86-svm-Fix-asymmetry-with-AMD-DR-MASK-context-switc.patch
+++ /dev/null
@@ -1,104 +0,0 @@
-From 3f8b444072fd8615288d9d11e53fbf0b6a8a7750 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 26 Sep 2023 20:03:36 +0100
-Subject: [PATCH 54/55] x86/svm: Fix asymmetry with AMD DR MASK context
- switching
-
-The handling of MSR_DR{0..3}_MASK is asymmetric between PV and HVM guests.
-
-HVM guests context switch in based on the guest view of DBEXT, whereas PV
-guest switch in base on the host capability. Both guest types leave the
-context dirty for the next vCPU.
-
-This leads to the following issue:
-
- * PV or HVM vCPU has debugging active (%dr7 + mask)
- * Switch out deactivates %dr7 but leaves other state stale in hardware
- * HVM vCPU with debugging activate but can't see DBEXT is switched in
- * Switch in loads %dr7 but leaves the mask MSRs alone
-
-Now, the HVM vCPU is operating in the context of the prior vCPU's mask MSR,
-and furthermore in a case where it genuinely expects there to be no masking
-MSRs.
-
-As a stopgap, adjust the HVM path to switch in/out the masks based on host
-capabilities rather than guest visibility (i.e. like the PV path). Adjustment
-of the of the intercepts still needs to be dependent on the guest visibility
-of DBEXT.
-
-This is part of XSA-444 / CVE-2023-34327
-
-Fixes: c097f54912d3 ("x86/SVM: support data breakpoint extension registers")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-(cherry picked from commit 5d54282f984bb9a7a65b3d12208584f9fdf1c8e1)
----
- xen/arch/x86/hvm/svm/svm.c | 24 ++++++++++++++++++------
- xen/arch/x86/traps.c | 5 +++++
- 2 files changed, 23 insertions(+), 6 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
-index e8f50e7c5e..fd32600ae3 100644
---- a/xen/arch/x86/hvm/svm/svm.c
-+++ b/xen/arch/x86/hvm/svm/svm.c
-@@ -339,6 +339,10 @@ static void svm_save_dr(struct vcpu *v)
- v->arch.hvm.flag_dr_dirty = 0;
- vmcb_set_dr_intercepts(vmcb, ~0u);
-
-+ /*
-+ * The guest can only have changed the mask MSRs if we previous dropped
-+ * intercepts. Re-read them from hardware.
-+ */
- if ( v->domain->arch.cpuid->extd.dbext )
- {
- svm_intercept_msr(v, MSR_AMD64_DR0_ADDRESS_MASK, MSR_INTERCEPT_RW);
-@@ -370,17 +374,25 @@ static void __restore_debug_registers(struct vmcb_struct *vmcb, struct vcpu *v)
-
- ASSERT(v == current);
-
-- if ( v->domain->arch.cpuid->extd.dbext )
-+ /*
-+ * Both the PV and HVM paths leave stale DR_MASK values in hardware on
-+ * context-switch-out. If we're activating %dr7 for the guest, we must
-+ * sync the DR_MASKs too, whether or not the guest can see them.
-+ */
-+ if ( boot_cpu_has(X86_FEATURE_DBEXT) )
- {
-- svm_intercept_msr(v, MSR_AMD64_DR0_ADDRESS_MASK, MSR_INTERCEPT_NONE);
-- svm_intercept_msr(v, MSR_AMD64_DR1_ADDRESS_MASK, MSR_INTERCEPT_NONE);
-- svm_intercept_msr(v, MSR_AMD64_DR2_ADDRESS_MASK, MSR_INTERCEPT_NONE);
-- svm_intercept_msr(v, MSR_AMD64_DR3_ADDRESS_MASK, MSR_INTERCEPT_NONE);
--
- wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, v->arch.msrs->dr_mask[0]);
- wrmsrl(MSR_AMD64_DR1_ADDRESS_MASK, v->arch.msrs->dr_mask[1]);
- wrmsrl(MSR_AMD64_DR2_ADDRESS_MASK, v->arch.msrs->dr_mask[2]);
- wrmsrl(MSR_AMD64_DR3_ADDRESS_MASK, v->arch.msrs->dr_mask[3]);
-+
-+ if ( v->domain->arch.cpuid->extd.dbext )
-+ {
-+ svm_intercept_msr(v, MSR_AMD64_DR0_ADDRESS_MASK, MSR_INTERCEPT_NONE);
-+ svm_intercept_msr(v, MSR_AMD64_DR1_ADDRESS_MASK, MSR_INTERCEPT_NONE);
-+ svm_intercept_msr(v, MSR_AMD64_DR2_ADDRESS_MASK, MSR_INTERCEPT_NONE);
-+ svm_intercept_msr(v, MSR_AMD64_DR3_ADDRESS_MASK, MSR_INTERCEPT_NONE);
-+ }
- }
-
- write_debugreg(0, v->arch.dr[0]);
-diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
-index e65cc60041..06c4f3868b 100644
---- a/xen/arch/x86/traps.c
-+++ b/xen/arch/x86/traps.c
-@@ -2281,6 +2281,11 @@ void activate_debugregs(const struct vcpu *curr)
- if ( curr->arch.dr7 & DR7_ACTIVE_MASK )
- write_debugreg(7, curr->arch.dr7);
-
-+ /*
-+ * Both the PV and HVM paths leave stale DR_MASK values in hardware on
-+ * context-switch-out. If we're activating %dr7 for the guest, we must
-+ * sync the DR_MASKs too, whether or not the guest can see them.
-+ */
- if ( boot_cpu_has(X86_FEATURE_DBEXT) )
- {
- wrmsrl(MSR_AMD64_DR0_ADDRESS_MASK, curr->arch.msrs->dr_mask[0]);
---
-2.42.0
-
diff --git a/0055-x86-pv-Correct-the-auditing-of-guest-breakpoint-addr.patch b/0055-x86-pv-Correct-the-auditing-of-guest-breakpoint-addr.patch
deleted file mode 100644
index 5838e7f..0000000
--- a/0055-x86-pv-Correct-the-auditing-of-guest-breakpoint-addr.patch
+++ /dev/null
@@ -1,86 +0,0 @@
-From 0b56bed864ca9b572473957f0254aefa797216f2 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 26 Sep 2023 20:03:36 +0100
-Subject: [PATCH 55/55] x86/pv: Correct the auditing of guest breakpoint
- addresses
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The use of access_ok() is buggy, because it permits access to the compat
-translation area. 64bit PV guests don't use the XLAT area, but on AMD
-hardware, the DBEXT feature allows a breakpoint to match up to a 4G aligned
-region, allowing the breakpoint to reach outside of the XLAT area.
-
-Prior to c/s cda16c1bb223 ("x86: mirror compat argument translation area for
-32-bit PV"), the live GDT was within 4G of the XLAT area.
-
-All together, this allowed a malicious 64bit PV guest on AMD hardware to place
-a breakpoint over the live GDT, and trigger a #DB livelock (CVE-2015-8104).
-
-Introduce breakpoint_addr_ok() and explain why __addr_ok() happens to be an
-appropriate check in this case.
-
-For Xen 4.14 and later, this is a latent bug because the XLAT area has moved
-to be on its own with nothing interesting adjacent. For Xen 4.13 and older on
-AMD hardware, this fixes a PV-trigger-able DoS.
-
-This is part of XSA-444 / CVE-2023-34328.
-
-Fixes: 65e355490817 ("x86/PV: support data breakpoint extension registers")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit dc9d9aa62ddeb14abd5672690d30789829f58f7e)
----
- xen/arch/x86/include/asm/debugreg.h | 20 ++++++++++++++++++++
- xen/arch/x86/pv/misc-hypercalls.c | 2 +-
- 2 files changed, 21 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
-index c57914efc6..cc29826524 100644
---- a/xen/arch/x86/include/asm/debugreg.h
-+++ b/xen/arch/x86/include/asm/debugreg.h
-@@ -77,6 +77,26 @@
- asm volatile ( "mov %%db" #reg ",%0" : "=r" (__val) ); \
- __val; \
- })
-+
-+/*
-+ * Architecturally, %dr{0..3} can have any arbitrary value. However, Xen
-+ * can't allow the guest to breakpoint the Xen address range, so we limit the
-+ * guest to the lower canonical half, or above the Xen range in the higher
-+ * canonical half.
-+ *
-+ * Breakpoint lengths are specified to mask the low order address bits,
-+ * meaning all breakpoints are naturally aligned. With %dr7, the widest
-+ * breakpoint is 8 bytes. With DBEXT, the widest breakpoint is 4G. Both of
-+ * the Xen boundaries have >4G alignment.
-+ *
-+ * In principle we should account for HYPERVISOR_COMPAT_VIRT_START(d), but
-+ * 64bit Xen has never enforced this for compat guests, and there's no problem
-+ * (to Xen) if the guest breakpoints it's alias of the M2P. Skipping this
-+ * aspect simplifies the logic, and causes us not to reject a migrating guest
-+ * which operated fine on prior versions of Xen.
-+ */
-+#define breakpoint_addr_ok(a) __addr_ok(a)
-+
- long set_debugreg(struct vcpu *, unsigned int reg, unsigned long value);
- void activate_debugregs(const struct vcpu *);
-
-diff --git a/xen/arch/x86/pv/misc-hypercalls.c b/xen/arch/x86/pv/misc-hypercalls.c
-index aaaf70eb63..f8636de907 100644
---- a/xen/arch/x86/pv/misc-hypercalls.c
-+++ b/xen/arch/x86/pv/misc-hypercalls.c
-@@ -72,7 +72,7 @@ long set_debugreg(struct vcpu *v, unsigned int reg, unsigned long value)
- switch ( reg )
- {
- case 0 ... 3:
-- if ( !access_ok(value, sizeof(long)) )
-+ if ( !breakpoint_addr_ok(value) )
- return -EPERM;
-
- v->arch.dr[reg] = value;
---
-2.42.0
-
diff --git a/info.txt b/info.txt
index 26a1905..0a99509 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #0 for 4.17.3-pre
+Xen upstream patchset #0 for 4.17.4-pre
Containing patches from
-RELEASE-4.17.2 (b86c313a4a9c3ec4c9f825d9b99131753296485f)
+RELEASE-4.17.3 (07f413d7ffb06eab36045bd19f53555de1cacf62)
to
-staging-4.17 (0b56bed864ca9b572473957f0254aefa797216f2)
+staging-4.17 (091466ba55d1e2e75738f751818ace2e3ed08ccf)
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2024-04-05 7:00 Tomáš Mózes
0 siblings, 0 replies; 11+ messages in thread
From: Tomáš Mózes @ 2024-04-05 7:00 UTC (permalink / raw
To: gentoo-commits
commit: d0ce95087288b30e5e211bac8e9a0817f2effcf5
Author: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
AuthorDate: Fri Apr 5 06:59:40 2024 +0000
Commit: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
CommitDate: Fri Apr 5 06:59:40 2024 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=d0ce9508
Xen 4.17.4-pre-patchset-1
Signed-off-by: Tomáš Mózes <hydrapolic <AT> gmail.com>
0001-update-Xen-version-to-4.17.4-pre.patch | 4 +-
...vice-assignment-if-phantom-functions-cann.patch | 4 +-
0003-VT-d-Fix-else-vs-endif-misplacement.patch | 4 +-
...end-CPU-erratum-1474-fix-to-more-affected.patch | 4 +-
0005-CirrusCI-drop-FreeBSD-12.patch | 4 +-
...nsure-Global-Performance-Counter-Control-.patch | 4 +-
...vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch | 4 +-
...vmx-Disallow-the-use-of-inactivity-states.patch | 4 +-
...-move-lib-fdt-elf-temp.o-and-their-deps-t.patch | 4 +-
...m-pt-fix-off-by-one-in-entry-check-assert.patch | 4 +-
...s-xentop-fix-sorting-bug-for-some-columns.patch | 67 ++++
0012-amd-vi-fix-IVMD-memory-type-checks.patch | 53 +++
...hvm-Fix-fast-singlestep-state-persistence.patch | 86 +++++
...y-state-on-hvmemul_map_linear_addr-s-erro.patch | 63 ++++
0015-build-Replace-which-with-command-v.patch | 57 +++
...le-relocating-memory-for-qemu-xen-in-stub.patch | 50 +++
...sure-build-fails-when-running-kconfig-fai.patch | 58 +++
0018-x86emul-add-missing-EVEX.R-checks.patch | 50 +++
...vepatch-fix-norevert-test-hook-setup-typo.patch | 36 ++
...-fix-printf-format-specifier-in-no_config.patch | 38 ++
...-use-a-union-as-register-type-for-functio.patch | 141 +++++++
...x-BRANCH_HARDEN-option-to-only-be-set-whe.patch | 57 +++
...-for-shadow-stack-in-exception-from-stub-.patch | 212 +++++++++++
0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch | 52 +++
...e-SVM-VMX-when-their-enabling-is-prohibit.patch | 67 ++++
...sched-Fix-UB-shift-in-compat_set_timer_op.patch | 86 +++++
...int-the-built-in-SPECULATIVE_HARDEN_-opti.patch | 54 +++
...x-INDIRECT_THUNK-option-to-only-be-set-wh.patch | 67 ++++
...-not-print-thunk-option-selection-if-not-.patch | 50 +++
...ch-register-livepatch-regions-when-loaded.patch | 159 ++++++++
...ch-search-for-symbols-in-all-loaded-paylo.patch | 149 ++++++++
...ch-fix-norevert-test-attempt-to-open-code.patch | 186 ++++++++++
...ch-properly-build-the-noapply-and-norever.patch | 43 +++
...ix-segfault-in-device_model_spawn_outcome.patch | 39 ++
...-always-use-a-temporary-parameter-stashin.patch | 197 ++++++++++
...icy-Allow-for-levelling-of-VERW-side-effe.patch | 102 ++++++
...CI-skip-huge-BARs-in-certain-calculations.patch | 99 +++++
...detection-of-last-L1-entry-in-modify_xen_.patch | 41 +++
0039-x86-entry-Introduce-EFRAME_-constants.patch | 314 ++++++++++++++++
0040-x86-Resync-intel-family.h-from-Linux.patch | 98 +++++
...form-VERW-flushing-later-in-the-VMExit-pa.patch | 146 ++++++++
...rl-Perform-VERW-flushing-later-in-exit-pa.patch | 209 +++++++++++
...x86-spec-ctrl-Rename-VERW-related-options.patch | 248 +++++++++++++
0044-x86-spec-ctrl-VERW-handling-adjustments.patch | 171 +++++++++
...rl-Mitigation-Register-File-Data-Sampling.patch | 320 ++++++++++++++++
...-Delete-update_cr3-s-do_locking-parameter.patch | 161 ++++++++
...-Swap-order-of-actions-in-the-FREE-macros.patch | 58 +++
...k-introduce-support-for-blocking-speculat.patch | 331 +++++++++++++++++
...oduce-support-for-blocking-speculation-in.patch | 125 +++++++
...ck-introduce-support-for-blocking-specula.patch | 87 +++++
...empt-to-ensure-lock-wrappers-are-always-i.patch | 405 +++++++++++++++++++++
...-speculation-barriers-to-open-coded-locks.patch | 73 ++++
...-conditional-lock-taking-from-speculative.patch | 216 +++++++++++
...s-ipxe-update-for-fixing-build-with-GCC12.patch | 33 ++
...-block_lock_speculation-in-_mm_write_lock.patch | 35 ++
...x-setup_apic_nmi_watchdog-to-fail-more-cl.patch | 120 ++++++
...-together-P2M-update-and-increment-of-ent.patch | 61 ++++
...tored-Use-Map-instead-of-Hashtbl-for-quot.patch | 143 ++++++++
0059-tools-oxenstored-Make-Quota.t-pure.patch | 121 ++++++
...x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch | 90 +++++
...icy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch | 85 +++++
...irtual-region-Rename-the-start-end-fields.patch | 140 +++++++
...en-virtual-region-Include-rodata-pointers.patch | 71 ++++
...livepatch-Relax-permissions-on-rodata-too.patch | 85 +++++
...prove-the-boot-watchdog-determination-of-.patch | 106 ++++++
...Support-the-watchdog-on-newer-AMD-systems.patch | 48 +++
...s-resource-Fix-HVM-guest-in-SHADOW-builds.patch | 110 ++++++
info.txt | 4 +-
68 files changed, 6591 insertions(+), 22 deletions(-)
diff --git a/0001-update-Xen-version-to-4.17.4-pre.patch b/0001-update-Xen-version-to-4.17.4-pre.patch
index b532743..e1070c9 100644
--- a/0001-update-Xen-version-to-4.17.4-pre.patch
+++ b/0001-update-Xen-version-to-4.17.4-pre.patch
@@ -1,7 +1,7 @@
From 4f6e9d4327eb5252f1e8cac97a095d8b8485dadb Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
Date: Tue, 30 Jan 2024 14:36:44 +0100
-Subject: [PATCH 01/10] update Xen version to 4.17.4-pre
+Subject: [PATCH 01/67] update Xen version to 4.17.4-pre
---
xen/Makefile | 2 +-
@@ -21,5 +21,5 @@ index a46e6330db..dd0b004e1c 100644
-include xen-version
--
-2.43.0
+2.44.0
diff --git a/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch b/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch
index d91802f..bafad55 100644
--- a/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch
+++ b/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch
@@ -1,7 +1,7 @@
From f9e1ed51bdba31017ea17e1819eb2ade6b5c8615 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Tue, 30 Jan 2024 14:37:39 +0100
-Subject: [PATCH 02/10] pci: fail device assignment if phantom functions cannot
+Subject: [PATCH 02/67] pci: fail device assignment if phantom functions cannot
be assigned
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -87,5 +87,5 @@ index 07d1986d33..8c62b14d19 100644
else if ( d == dom_io )
pdev->quarantine = true;
--
-2.43.0
+2.44.0
diff --git a/0003-VT-d-Fix-else-vs-endif-misplacement.patch b/0003-VT-d-Fix-else-vs-endif-misplacement.patch
index 2e7f78d..622fa18 100644
--- a/0003-VT-d-Fix-else-vs-endif-misplacement.patch
+++ b/0003-VT-d-Fix-else-vs-endif-misplacement.patch
@@ -1,7 +1,7 @@
From 6b1864afc14d484cdbc9754ce3172ac3dc189846 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Tue, 30 Jan 2024 14:38:38 +0100
-Subject: [PATCH 03/10] VT-d: Fix "else" vs "#endif" misplacement
+Subject: [PATCH 03/67] VT-d: Fix "else" vs "#endif" misplacement
In domain_pgd_maddr() the "#endif" is misplaced with respect to "else". This
generates incorrect logic when CONFIG_HVM is compiled out, as the "else" body
@@ -66,5 +66,5 @@ index b4c11a6b48..908b3ba6ee 100644
if ( !hd->arch.vtd.pgd_maddr )
{
--
-2.43.0
+2.44.0
diff --git a/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch b/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch
index f1289aa..fa90a46 100644
--- a/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch
+++ b/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch
@@ -1,7 +1,7 @@
From abcc32f0634627fe21117a48bd10e792bfbdd6dc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Fri, 2 Feb 2024 08:01:09 +0100
-Subject: [PATCH 04/10] x86/amd: Extend CPU erratum #1474 fix to more affected
+Subject: [PATCH 04/67] x86/amd: Extend CPU erratum #1474 fix to more affected
models
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -119,5 +119,5 @@ index 29ae97e7c0..3d85e9797d 100644
-presmp_initcall(zen2_c6_errata_check);
+presmp_initcall(amd_check_erratum_1474);
--
-2.43.0
+2.44.0
diff --git a/0005-CirrusCI-drop-FreeBSD-12.patch b/0005-CirrusCI-drop-FreeBSD-12.patch
index cca7bb0..dac712b 100644
--- a/0005-CirrusCI-drop-FreeBSD-12.patch
+++ b/0005-CirrusCI-drop-FreeBSD-12.patch
@@ -1,7 +1,7 @@
From 0ef1fb43ddd61b3c4c953e833e012ac21ad5ca0f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Fri, 2 Feb 2024 08:01:50 +0100
-Subject: [PATCH 05/10] CirrusCI: drop FreeBSD 12
+Subject: [PATCH 05/67] CirrusCI: drop FreeBSD 12
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -35,5 +35,5 @@ index 7e0beb200d..63f3afb104 100644
name: 'FreeBSD 13'
freebsd_instance:
--
-2.43.0
+2.44.0
diff --git a/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch b/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch
index dc64ad6..ce07803 100644
--- a/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch
+++ b/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch
@@ -1,7 +1,7 @@
From d0ad2cc5eac1b5d3cfd14204d377ce2384f52607 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Fri, 2 Feb 2024 08:02:20 +0100
-Subject: [PATCH 06/10] x86/intel: ensure Global Performance Counter Control is
+Subject: [PATCH 06/67] x86/intel: ensure Global Performance Counter Control is
setup correctly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
@@ -70,5 +70,5 @@ index b40ac696e6..96723b5d44 100644
if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )
--
-2.43.0
+2.44.0
diff --git a/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch b/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch
index a1937a7..2100acc 100644
--- a/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch
+++ b/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch
@@ -1,7 +1,7 @@
From eca5416f9b0e179de9553900de8de660ab09199d Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Fri, 2 Feb 2024 08:02:51 +0100
-Subject: [PATCH 07/10] x86/vmx: Fix IRQ handling for EXIT_REASON_INIT
+Subject: [PATCH 07/67] x86/vmx: Fix IRQ handling for EXIT_REASON_INIT
When receiving an INIT, a prior bugfix tried to ignore the INIT and continue
onwards.
@@ -61,5 +61,5 @@ index 072288a5ef..31f4a861c6 100644
break;
case EXIT_REASON_TRIPLE_FAULT:
--
-2.43.0
+2.44.0
diff --git a/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch b/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch
index 12c2d59..3af45e8 100644
--- a/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch
+++ b/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch
@@ -1,7 +1,7 @@
From 7bd612727df792671e44152a8205f0cf821ad984 Mon Sep 17 00:00:00 2001
From: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Fri, 2 Feb 2024 08:03:26 +0100
-Subject: [PATCH 08/10] x86/vmx: Disallow the use of inactivity states
+Subject: [PATCH 08/67] x86/vmx: Disallow the use of inactivity states
Right now, vvmx will blindly copy L12's ACTIVITY_STATE into the L02 VMCS and
enter the vCPU. Luckily for us, nested-virt is explicitly unsupported for
@@ -122,5 +122,5 @@ index 78404e42b3..0af021d5f5 100644
#define VMX_MISC_CR3_TARGET 0x01ff0000
#define VMX_MISC_VMWRITE_ALL 0x20000000
--
-2.43.0
+2.44.0
diff --git a/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch b/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch
index 9ee7104..f33d27d 100644
--- a/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch
+++ b/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch
@@ -1,7 +1,7 @@
From afb85cf1e8f165abf88de9d8a6df625692a753b1 Mon Sep 17 00:00:00 2001
From: Michal Orzel <michal.orzel@amd.com>
Date: Fri, 2 Feb 2024 08:04:07 +0100
-Subject: [PATCH 09/10] lib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps
+Subject: [PATCH 09/67] lib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps
to $(targets)
At the moment, trying to run xencov read/reset (calling SYSCTL_coverage_op
@@ -66,5 +66,5 @@ index 75aaefa2e3..4d14fd61ba 100644
-extra-y += libfdt-temp.o $(LIBFDT_OBJS)
+targets += libfdt-temp.o $(LIBFDT_OBJS)
--
-2.43.0
+2.44.0
diff --git a/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch b/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch
index ba99063..9b3b9a0 100644
--- a/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch
+++ b/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch
@@ -1,7 +1,7 @@
From 091466ba55d1e2e75738f751818ace2e3ed08ccf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Date: Fri, 2 Feb 2024 08:04:33 +0100
-Subject: [PATCH 10/10] x86/p2m-pt: fix off by one in entry check assert
+Subject: [PATCH 10/67] x86/p2m-pt: fix off by one in entry check assert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
@@ -32,5 +32,5 @@ index eaba2b0fb4..f02ebae372 100644
new == p2m_mmio_dm )
ASSERT(mfn_valid(mfn) || mfn_eq(mfn, INVALID_MFN));
--
-2.43.0
+2.44.0
diff --git a/0011-tools-xentop-fix-sorting-bug-for-some-columns.patch b/0011-tools-xentop-fix-sorting-bug-for-some-columns.patch
new file mode 100644
index 0000000..6bf11d9
--- /dev/null
+++ b/0011-tools-xentop-fix-sorting-bug-for-some-columns.patch
@@ -0,0 +1,67 @@
+From 61da71968ea44964fd1dd2e449b053c77eb83139 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Cyril=20R=C3=A9bert=20=28zithro=29?= <slack@rabbit.lu>
+Date: Tue, 27 Feb 2024 14:06:53 +0100
+Subject: [PATCH 11/67] tools/xentop: fix sorting bug for some columns
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Sort doesn't work on columns VBD_OO, VBD_RD, VBD_WR and VBD_RSECT.
+Fix by adjusting variable names in the compare functions.
+Bug fix only. No functional change.
+
+Fixes: 91c3e3dc91d6 ("tools/xentop: Display '-' when stats are not available.")
+Signed-off-by: Cyril Rébert (zithro) <slack@rabbit.lu>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 29f17d837421f13c0e0010802de1b2d51d2ded4a
+master date: 2024-02-05 17:58:23 +0000
+---
+ tools/xentop/xentop.c | 10 +++++-----
+ 1 file changed, 5 insertions(+), 5 deletions(-)
+
+diff --git a/tools/xentop/xentop.c b/tools/xentop/xentop.c
+index 950e8935c4..545bd5e96d 100644
+--- a/tools/xentop/xentop.c
++++ b/tools/xentop/xentop.c
+@@ -684,7 +684,7 @@ static int compare_vbd_oo(xenstat_domain *domain1, xenstat_domain *domain2)
+ unsigned long long dom1_vbd_oo = 0, dom2_vbd_oo = 0;
+
+ tot_vbd_reqs(domain1, FIELD_VBD_OO, &dom1_vbd_oo);
+- tot_vbd_reqs(domain1, FIELD_VBD_OO, &dom2_vbd_oo);
++ tot_vbd_reqs(domain2, FIELD_VBD_OO, &dom2_vbd_oo);
+
+ return -compare(dom1_vbd_oo, dom2_vbd_oo);
+ }
+@@ -711,9 +711,9 @@ static int compare_vbd_rd(xenstat_domain *domain1, xenstat_domain *domain2)
+ unsigned long long dom1_vbd_rd = 0, dom2_vbd_rd = 0;
+
+ tot_vbd_reqs(domain1, FIELD_VBD_RD, &dom1_vbd_rd);
+- tot_vbd_reqs(domain1, FIELD_VBD_RD, &dom2_vbd_rd);
++ tot_vbd_reqs(domain2, FIELD_VBD_RD, &dom2_vbd_rd);
+
+- return -compare(dom1_vbd_rd, dom1_vbd_rd);
++ return -compare(dom1_vbd_rd, dom2_vbd_rd);
+ }
+
+ /* Prints number of total VBD READ requests statistic */
+@@ -738,7 +738,7 @@ static int compare_vbd_wr(xenstat_domain *domain1, xenstat_domain *domain2)
+ unsigned long long dom1_vbd_wr = 0, dom2_vbd_wr = 0;
+
+ tot_vbd_reqs(domain1, FIELD_VBD_WR, &dom1_vbd_wr);
+- tot_vbd_reqs(domain1, FIELD_VBD_WR, &dom2_vbd_wr);
++ tot_vbd_reqs(domain2, FIELD_VBD_WR, &dom2_vbd_wr);
+
+ return -compare(dom1_vbd_wr, dom2_vbd_wr);
+ }
+@@ -765,7 +765,7 @@ static int compare_vbd_rsect(xenstat_domain *domain1, xenstat_domain *domain2)
+ unsigned long long dom1_vbd_rsect = 0, dom2_vbd_rsect = 0;
+
+ tot_vbd_reqs(domain1, FIELD_VBD_RSECT, &dom1_vbd_rsect);
+- tot_vbd_reqs(domain1, FIELD_VBD_RSECT, &dom2_vbd_rsect);
++ tot_vbd_reqs(domain2, FIELD_VBD_RSECT, &dom2_vbd_rsect);
+
+ return -compare(dom1_vbd_rsect, dom2_vbd_rsect);
+ }
+--
+2.44.0
+
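A minimal stand-alone C sketch of the comparator pattern the fix above restores (not xentop code; the struct, field names and values are made up): each sort key must be derived from its own element, otherwise the ordering silently degenerates.

    #include <stdio.h>
    #include <stdlib.h>

    struct dom {
        const char *name;
        unsigned long long vbd_rd;
    };

    /* Each key comes from its own element; reading d1->vbd_rd twice (the bug
     * fixed above) would make the comparison meaningless. */
    static int compare_vbd_rd(const void *a, const void *b)
    {
        const struct dom *d1 = a, *d2 = b;
        unsigned long long rd1 = d1->vbd_rd, rd2 = d2->vbd_rd;

        return (rd1 < rd2) - (rd1 > rd2);   /* descending, as xentop sorts */
    }

    int main(void)
    {
        struct dom doms[] = { { "a", 10 }, { "b", 30 }, { "c", 20 } };

        qsort(doms, sizeof(doms) / sizeof(doms[0]), sizeof(doms[0]),
              compare_vbd_rd);
        for ( unsigned int i = 0; i < sizeof(doms) / sizeof(doms[0]); i++ )
            printf("%s %llu\n", doms[i].name, doms[i].vbd_rd);
        return 0;
    }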
diff --git a/0012-amd-vi-fix-IVMD-memory-type-checks.patch b/0012-amd-vi-fix-IVMD-memory-type-checks.patch
new file mode 100644
index 0000000..f38e39e
--- /dev/null
+++ b/0012-amd-vi-fix-IVMD-memory-type-checks.patch
@@ -0,0 +1,53 @@
+From 463aaf3fbf62d24e898ae0c2ba53d85ca0f94d3f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 27 Feb 2024 14:07:12 +0100
+Subject: [PATCH 12/67] amd-vi: fix IVMD memory type checks
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current code that parses the IVMD blocks is relaxed with regard to the
+restriction that such unity regions should always fall into memory ranges
+marked as reserved in the memory map.
+
+However the type checks for the IVMD addresses are inverted, and as a result
+IVMD ranges falling into RAM areas are accepted. Note that having such ranges
+in the first place is a firmware bug, as IVMD should always fall into reserved
+ranges.
+
+Fixes: ed6c77ebf0c1 ('AMD/IOMMU: check / convert IVMD ranges for being / to be reserved')
+Reported-by: Ox <oxjo@proton.me>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Tested-by: oxjo <oxjo@proton.me>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 83afa313583019d9f159c122cecf867735d27ec5
+master date: 2024-02-06 11:56:13 +0100
+---
+ xen/drivers/passthrough/amd/iommu_acpi.c | 11 ++++++++---
+ 1 file changed, 8 insertions(+), 3 deletions(-)
+
+diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
+index 3b577c9b39..3a7045c39b 100644
+--- a/xen/drivers/passthrough/amd/iommu_acpi.c
++++ b/xen/drivers/passthrough/amd/iommu_acpi.c
+@@ -426,9 +426,14 @@ static int __init parse_ivmd_block(const struct acpi_ivrs_memory *ivmd_block)
+ return -EIO;
+ }
+
+- /* Types which won't be handed out are considered good enough. */
+- if ( !(type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
+- RAM_TYPE_UNUSABLE)) )
++ /*
++ * Types which aren't RAM are considered good enough.
++ * Note that a page being partially RESERVED, ACPI or UNUSABLE will
++ * force Xen into assuming the whole page as having that type in
++ * practice.
++ */
++ if ( type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
++ RAM_TYPE_UNUSABLE) )
+ continue;
+
+ AMD_IOMMU_ERROR("IVMD: page at %lx can't be converted\n", addr);
+--
+2.44.0
+
diff --git a/0013-x86-hvm-Fix-fast-singlestep-state-persistence.patch b/0013-x86-hvm-Fix-fast-singlestep-state-persistence.patch
new file mode 100644
index 0000000..2a14354
--- /dev/null
+++ b/0013-x86-hvm-Fix-fast-singlestep-state-persistence.patch
@@ -0,0 +1,86 @@
+From 415f770d23f9fcbc02436560fa6583dcd8e1343f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Petr=20Bene=C5=A1?= <w1benny@gmail.com>
+Date: Tue, 27 Feb 2024 14:07:45 +0100
+Subject: [PATCH 13/67] x86/hvm: Fix fast singlestep state persistence
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This patch addresses an issue where the fast singlestep setting would persist
+despite xc_domain_debug_control being called with XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF.
+Specifically, if fast singlestep was enabled in a VMI session and that session
+stopped before the MTF trap occurred, the fast singlestep setting remained
+active even though MTF itself was disabled. This led to a situation where, upon
+starting a new VMI session, the first event to trigger an EPT violation would
+cause the corresponding EPT event callback to be skipped due to the lingering
+fast singlestep setting.
+
+The fix ensures that the fast singlestep setting is properly reset when
+disabling single step debugging operations.
+
+Signed-off-by: Petr Beneš <w1benny@gmail.com>
+Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
+master commit: 897def94b56175ce569673a05909d2f223e1e749
+master date: 2024-02-12 09:37:58 +0100
+---
+ xen/arch/x86/hvm/hvm.c | 34 ++++++++++++++++++++++++----------
+ 1 file changed, 24 insertions(+), 10 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
+index d6c6ab8897..558dc3eddc 100644
+--- a/xen/arch/x86/hvm/hvm.c
++++ b/xen/arch/x86/hvm/hvm.c
+@@ -5153,26 +5153,40 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
+
+ int hvm_debug_op(struct vcpu *v, int32_t op)
+ {
+- int rc;
++ int rc = 0;
+
+ switch ( op )
+ {
+ case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON:
+ case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF:
+- rc = -EOPNOTSUPP;
+ if ( !cpu_has_monitor_trap_flag )
+- break;
+- rc = 0;
+- vcpu_pause(v);
+- v->arch.hvm.single_step =
+- (op == XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON);
+- vcpu_unpause(v); /* guest will latch new state */
++ return -EOPNOTSUPP;
+ break;
+ default:
+- rc = -ENOSYS;
+- break;
++ return -ENOSYS;
++ }
++
++ vcpu_pause(v);
++
++ switch ( op )
++ {
++ case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON:
++ v->arch.hvm.single_step = true;
++ break;
++
++ case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF:
++ v->arch.hvm.single_step = false;
++ v->arch.hvm.fast_single_step.enabled = false;
++ v->arch.hvm.fast_single_step.p2midx = 0;
++ break;
++
++ default: /* Excluded above */
++ ASSERT_UNREACHABLE();
++ return -ENOSYS;
+ }
+
++ vcpu_unpause(v); /* guest will latch new state */
++
+ return rc;
+ }
+
+--
+2.44.0
+
diff --git a/0014-x86-HVM-tidy-state-on-hvmemul_map_linear_addr-s-erro.patch b/0014-x86-HVM-tidy-state-on-hvmemul_map_linear_addr-s-erro.patch
new file mode 100644
index 0000000..6536674
--- /dev/null
+++ b/0014-x86-HVM-tidy-state-on-hvmemul_map_linear_addr-s-erro.patch
@@ -0,0 +1,63 @@
+From b3ae0e6201495216b12157bd8b2382b28fdd7dae Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 27 Feb 2024 14:08:20 +0100
+Subject: [PATCH 14/67] x86/HVM: tidy state on hvmemul_map_linear_addr()'s
+ error path
+
+While in the vast majority of cases failure of the function will not
+be followed by re-invocation with the same emulation context, a few
+very specific insns - involving multiple independent writes, e.g. ENTER
+and PUSHA - exist where this can happen. Since failure of the function
+only signals to the caller that it ought to try an MMIO write instead,
+such failure also cannot be assumed to result in wholesale failure of
+emulation of the current insn. Instead we have to maintain internal
+state such that another invocation of the function with the same
+emulation context remains possible. To achieve that we need to reset MFN
+slots after putting page references on the error path.
+
+Note that all of this affects debugging code only, in causing an
+assertion to trigger (higher up in the function). There's otherwise no
+misbehavior - such a "leftover" slot would simply be overwritten by new
+contents in a release build.
+
+Also extend the related unmap() assertion, to further check for MFN 0.
+
+Fixes: 8cbd4fb0b7ea ("x86/hvm: implement hvmemul_write() using real mappings")
+Reported-by: Manuel Andreas <manuel.andreas@tum.de>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Paul Durrant <paul@xen.org>
+master commit: e72f951df407bc3be82faac64d8733a270036ba1
+master date: 2024-02-13 09:36:14 +0100
+---
+ xen/arch/x86/hvm/emulate.c | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
+index 275451dd36..27928dc3f3 100644
+--- a/xen/arch/x86/hvm/emulate.c
++++ b/xen/arch/x86/hvm/emulate.c
+@@ -697,7 +697,12 @@ static void *hvmemul_map_linear_addr(
+ out:
+ /* Drop all held references. */
+ while ( mfn-- > hvmemul_ctxt->mfn )
++ {
+ put_page(mfn_to_page(*mfn));
++#ifndef NDEBUG /* Clean slot for a subsequent map()'s error checking. */
++ *mfn = _mfn(0);
++#endif
++ }
+
+ return err;
+ }
+@@ -719,7 +724,7 @@ static void hvmemul_unmap_linear_addr(
+
+ for ( i = 0; i < nr_frames; i++ )
+ {
+- ASSERT(mfn_valid(*mfn));
++ ASSERT(mfn_x(*mfn) && mfn_valid(*mfn));
+ paging_mark_dirty(currd, *mfn);
+ put_page(mfn_to_page(*mfn));
+
+--
+2.44.0
+
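A generic, self-contained sketch (plain C, not Xen code; the slot count and use of malloc() are made up) of the clean-up pattern the fix above completes: on failure, release everything acquired so far in reverse order and reset the slots, so that a retry with the same context starts from clean state.

    #include <stdlib.h>

    #define NR_SLOTS 4

    static void *slot[NR_SLOTS];

    static int map_all(size_t bytes)
    {
        unsigned int i;

        for ( i = 0; i < NR_SLOTS; i++ )
        {
            slot[i] = malloc(bytes);
            if ( !slot[i] )
                goto out;
        }
        return 0;

     out:
        /* Drop everything acquired so far, in reverse order, and clear the
         * slots so a subsequent attempt finds consistent state (the analogue
         * of resetting the MFN slots above). */
        while ( i-- )
        {
            free(slot[i]);
            slot[i] = NULL;
        }
        return -1;
    }

    int main(void)
    {
        return map_all(64) ? 1 : 0;
    }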
diff --git a/0015-build-Replace-which-with-command-v.patch b/0015-build-Replace-which-with-command-v.patch
new file mode 100644
index 0000000..57f21d4
--- /dev/null
+++ b/0015-build-Replace-which-with-command-v.patch
@@ -0,0 +1,57 @@
+From 1330a5fe44ca91f98857b53fe8bbe06522d9db27 Mon Sep 17 00:00:00 2001
+From: Anthony PERARD <anthony.perard@citrix.com>
+Date: Tue, 27 Feb 2024 14:08:50 +0100
+Subject: [PATCH 15/67] build: Replace `which` with `command -v`
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The `which` command is not standard, may not exist on the build host,
+or may not behave as expected by the build system. It is recommended
+to use `command -v` to find out if a command exist and have its path,
+and it's part of a POSIX shell standard (at least, it seems to be
+mandatory since IEEE Std 1003.1-2008, but was optional before).
+
+Fixes: c8a8645f1efe ("xen/build: Automatically locate a suitable python interpreter")
+Fixes: 3b47bcdb6d38 ("xen/build: Use a distro version of figlet")
+Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
+Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: f93629b18b528a5ab1b1092949c5420069c7226c
+master date: 2024-02-19 12:45:48 +0100
+---
+ xen/Makefile | 4 ++--
+ xen/build.mk | 2 +-
+ 2 files changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index dd0b004e1c..7ea13a6791 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -25,8 +25,8 @@ export XEN_BUILD_HOST := $(shell hostname)
+ endif
+
+ # Best effort attempt to find a python interpreter, defaulting to Python 3 if
+-# available. Fall back to just `python` if `which` is nowhere to be found.
+-PYTHON_INTERPRETER := $(word 1,$(shell which python3 python python2 2>/dev/null) python)
++# available. Fall back to just `python`.
++PYTHON_INTERPRETER := $(word 1,$(shell command -v python3 || command -v python || command -v python2) python)
+ export PYTHON ?= $(PYTHON_INTERPRETER)
+
+ export CHECKPOLICY ?= checkpolicy
+diff --git a/xen/build.mk b/xen/build.mk
+index 9ecb104f1e..b489f77b7c 100644
+--- a/xen/build.mk
++++ b/xen/build.mk
+@@ -1,6 +1,6 @@
+ quiet_cmd_banner = BANNER $@
+ define cmd_banner
+- if which figlet >/dev/null 2>&1 ; then \
++ if command -v figlet >/dev/null 2>&1 ; then \
+ echo " Xen $(XEN_FULLVERSION)" | figlet -f $< > $@.tmp; \
+ else \
+ echo " Xen $(XEN_FULLVERSION)" > $@.tmp; \
+--
+2.44.0
+
diff --git a/0016-libxl-Disable-relocating-memory-for-qemu-xen-in-stub.patch b/0016-libxl-Disable-relocating-memory-for-qemu-xen-in-stub.patch
new file mode 100644
index 0000000..f75e07c
--- /dev/null
+++ b/0016-libxl-Disable-relocating-memory-for-qemu-xen-in-stub.patch
@@ -0,0 +1,50 @@
+From b9745280736ee526374873aa3c4142596e2ba10b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
+ <marmarek@invisiblethingslab.com>
+Date: Tue, 27 Feb 2024 14:09:19 +0100
+Subject: [PATCH 16/67] libxl: Disable relocating memory for qemu-xen in
+ stubdomain too
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+According to comments (and experiments) qemu-xen cannot handle memory
+relocation done by hvmloader. The code was already disabled when running
+qemu-xen in dom0 (see libxl__spawn_local_dm()), but it was missed when
+adding qemu-xen support to stubdomain. Adjust libxl__spawn_stub_dm() to
+be consistent in this regard.
+
+Reported-by: Neowutran <xen@neowutran.ovh>
+Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
+Acked-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: 97883aa269f6745a6ded232be3a855abb1297e0d
+master date: 2024-02-22 11:48:22 +0100
+---
+ tools/libs/light/libxl_dm.c | 10 ++++++++++
+ 1 file changed, 10 insertions(+)
+
+diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
+index 14b593110f..ed620a9d8e 100644
+--- a/tools/libs/light/libxl_dm.c
++++ b/tools/libs/light/libxl_dm.c
+@@ -2432,6 +2432,16 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss)
+ "%s",
+ libxl_bios_type_to_string(guest_config->b_info.u.hvm.bios));
+ }
++ /* Disable relocating memory to make the MMIO hole larger
++ * unless we're running qemu-traditional and vNUMA is not
++ * configured. */
++ libxl__xs_printf(gc, XBT_NULL,
++ libxl__sprintf(gc, "%s/hvmloader/allow-memory-relocate",
++ libxl__xs_get_dompath(gc, guest_domid)),
++ "%d",
++ guest_config->b_info.device_model_version
++ == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
++ !libxl__vnuma_configured(&guest_config->b_info));
+ ret = xc_domain_set_target(ctx->xch, dm_domid, guest_domid);
+ if (ret<0) {
+ LOGED(ERROR, guest_domid, "setting target domain %d -> %d",
+--
+2.44.0
+
diff --git a/0017-build-make-sure-build-fails-when-running-kconfig-fai.patch b/0017-build-make-sure-build-fails-when-running-kconfig-fai.patch
new file mode 100644
index 0000000..1bb3aa8
--- /dev/null
+++ b/0017-build-make-sure-build-fails-when-running-kconfig-fai.patch
@@ -0,0 +1,58 @@
+From ea869977271f93945451908be9b6117ffd1fb02d Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 27 Feb 2024 14:09:37 +0100
+Subject: [PATCH 17/67] build: make sure build fails when running kconfig fails
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Because of using "-include", failure to (re)build auto.conf (with
+auto.conf.cmd produced as a secondary target) won't stop make from
+continuing the build. Arrange for it being possible to drop the - from
+Rules.mk, requiring that the include be skipped for tools-only targets.
+Note that relying on the inclusion in those cases wouldn't be correct
+anyway, as it might be a stale file (yet to be rebuilt) which would be
+included, while during initial build, the file would be absent
+altogether.
+
+Fixes: 8d4c17a90b0a ("xen/build: silence make warnings about missing auto.conf*")
+Reported-by: Roger Pau Monné <roger.pau@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: d34e5fa2e8db19f23081f46a3e710bb122130691
+master date: 2024-02-22 11:52:47 +0100
+---
+ xen/Makefile | 1 +
+ xen/Rules.mk | 4 +++-
+ 2 files changed, 4 insertions(+), 1 deletion(-)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index 7ea13a6791..bac3684a36 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -374,6 +374,7 @@ $(KCONFIG_CONFIG): tools_fixdep
+ # This exploits the 'multi-target pattern rule' trick.
+ # The syncconfig should be executed only once to make all the targets.
+ include/config/%.conf include/config/%.conf.cmd: $(KCONFIG_CONFIG)
++ $(Q)rm -f include/config/auto.conf
+ $(Q)$(MAKE) $(build)=tools/kconfig syncconfig
+
+ ifeq ($(CONFIG_DEBUG),y)
+diff --git a/xen/Rules.mk b/xen/Rules.mk
+index 8af3dd7277..d759cccee3 100644
+--- a/xen/Rules.mk
++++ b/xen/Rules.mk
+@@ -15,7 +15,9 @@ srcdir := $(srctree)/$(src)
+ PHONY := __build
+ __build:
+
+--include $(objtree)/include/config/auto.conf
++ifneq ($(firstword $(subst /, ,$(obj))),tools)
++include $(objtree)/include/config/auto.conf
++endif
+
+ include $(XEN_ROOT)/Config.mk
+ include $(srctree)/scripts/Kbuild.include
+--
+2.44.0
+
diff --git a/0018-x86emul-add-missing-EVEX.R-checks.patch b/0018-x86emul-add-missing-EVEX.R-checks.patch
new file mode 100644
index 0000000..12e7702
--- /dev/null
+++ b/0018-x86emul-add-missing-EVEX.R-checks.patch
@@ -0,0 +1,50 @@
+From 16f2e47eb1207d866f95cf694a60a7ceb8f96a36 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 27 Feb 2024 14:09:55 +0100
+Subject: [PATCH 18/67] x86emul: add missing EVEX.R' checks
+
+EVEX.R' is not ignored in 64-bit code when encoding a GPR or mask
+register. While for mask registers suitable checks are in place (there
+also covering EVEX.R), they were missing for the few cases where in
+EVEX-encoded instructions ModR/M.reg encodes a GPR. While for VPEXTRW
+the bit is replaced before an emulation stub is invoked, for
+VCVT{,T}{S,D,H}2{,U}SI this actually would have led to #UD from inside
+an emulation stub, in turn raising #UD to the guest, but accompanied by
+log messages indicating something's wrong in Xen nevertheless.
+
+Fixes: 001bd91ad864 ("x86emul: support AVX512{F,BW,DQ} extract insns")
+Fixes: baf4a376f550 ("x86emul: support AVX512F legacy-equivalent scalar int/FP conversion insns")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: cb319824bfa8d3c9ea0410cc71daaedc3e11aa2a
+master date: 2024-02-22 11:54:07 +0100
+---
+ xen/arch/x86/x86_emulate/x86_emulate.c | 5 +++--
+ 1 file changed, 3 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
+index 0c0336f737..995670cbc8 100644
+--- a/xen/arch/x86/x86_emulate/x86_emulate.c
++++ b/xen/arch/x86/x86_emulate/x86_emulate.c
+@@ -6829,7 +6829,8 @@ x86_emulate(
+ CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2d): /* vcvts{s,d}2si xmm/mem,reg */
+ CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x78): /* vcvtts{s,d}2usi xmm/mem,reg */
+ CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x79): /* vcvts{s,d}2usi xmm/mem,reg */
+- generate_exception_if((evex.reg != 0xf || !evex.RX || evex.opmsk ||
++ generate_exception_if((evex.reg != 0xf || !evex.RX || !evex.R ||
++ evex.opmsk ||
+ (ea.type != OP_REG && evex.brs)),
+ EXC_UD);
+ host_and_vcpu_must_have(avx512f);
+@@ -10705,7 +10706,7 @@ x86_emulate(
+ goto pextr;
+
+ case X86EMUL_OPC_EVEX_66(0x0f, 0xc5): /* vpextrw $imm8,xmm,reg */
+- generate_exception_if(ea.type != OP_REG, EXC_UD);
++ generate_exception_if(ea.type != OP_REG || !evex.R, EXC_UD);
+ /* Convert to alternative encoding: We want to use a memory operand. */
+ evex.opcx = ext_0f3a;
+ b = 0x15;
+--
+2.44.0
+
diff --git a/0019-xen-livepatch-fix-norevert-test-hook-setup-typo.patch b/0019-xen-livepatch-fix-norevert-test-hook-setup-typo.patch
new file mode 100644
index 0000000..1676f7a
--- /dev/null
+++ b/0019-xen-livepatch-fix-norevert-test-hook-setup-typo.patch
@@ -0,0 +1,36 @@
+From f6b12792542e372f36a71ea4c2563e6dd6e4fa57 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 27 Feb 2024 14:10:24 +0100
+Subject: [PATCH 19/67] xen/livepatch: fix norevert test hook setup typo
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The test code has a typo in using LIVEPATCH_APPLY_HOOK() instead of
+LIVEPATCH_REVERT_HOOK().
+
+Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: f0622dd4fd6ae6ddb523a45d89ed9b8f3a9a8f36
+master date: 2024-02-26 10:13:46 +0100
+---
+ xen/test/livepatch/xen_action_hooks_norevert.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/test/livepatch/xen_action_hooks_norevert.c b/xen/test/livepatch/xen_action_hooks_norevert.c
+index 3e21ade6ab..c173855192 100644
+--- a/xen/test/livepatch/xen_action_hooks_norevert.c
++++ b/xen/test/livepatch/xen_action_hooks_norevert.c
+@@ -120,7 +120,7 @@ static void post_revert_hook(livepatch_payload_t *payload)
+ printk(KERN_DEBUG "%s: Hook done.\n", __func__);
+ }
+
+-LIVEPATCH_APPLY_HOOK(revert_hook);
++LIVEPATCH_REVERT_HOOK(revert_hook);
+
+ LIVEPATCH_PREAPPLY_HOOK(pre_apply_hook);
+ LIVEPATCH_POSTAPPLY_HOOK(post_apply_hook);
+--
+2.44.0
+
diff --git a/0020-xen-cmdline-fix-printf-format-specifier-in-no_config.patch b/0020-xen-cmdline-fix-printf-format-specifier-in-no_config.patch
new file mode 100644
index 0000000..b47d9ee
--- /dev/null
+++ b/0020-xen-cmdline-fix-printf-format-specifier-in-no_config.patch
@@ -0,0 +1,38 @@
+From 229e8a72ee4cde5698aaf42cc59ae57446dce60f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 27 Feb 2024 14:10:39 +0100
+Subject: [PATCH 20/67] xen/cmdline: fix printf format specifier in
+ no_config_param()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+'*' sets the width field, which is the minimum number of characters to output,
+but what we want in no_config_param() is the precision instead, which is '.*'
+as it imposes a maximum limit on the output.
+
+Fixes: 68d757df8dd2 ('x86/pv: Options to disable and/or compile out 32bit PV support')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: ef101f525173cf51dc70f4c77862f6f10a8ddccf
+master date: 2024-02-26 10:17:40 +0100
+---
+ xen/include/xen/param.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/include/xen/param.h b/xen/include/xen/param.h
+index 93c3fe7cb7..e02e49635c 100644
+--- a/xen/include/xen/param.h
++++ b/xen/include/xen/param.h
+@@ -191,7 +191,7 @@ static inline void no_config_param(const char *cfg, const char *param,
+ {
+ int len = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s);
+
+- printk(XENLOG_INFO "CONFIG_%s disabled - ignoring '%s=%*s' setting\n",
++ printk(XENLOG_INFO "CONFIG_%s disabled - ignoring '%s=%.*s' setting\n",
+ cfg, param, len, s);
+ }
+
+--
+2.44.0
+
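A minimal stand-alone illustration of the width ('*') vs precision ('.*') distinction the fix above relies on (plain C, not Xen code; the string and length are made up):

    #include <stdio.h>

    int main(void)
    {
        const char *s = "no-32bit-pv=extra";   /* pretend only the first 8 chars matter */
        int len = 8;

        printf("[%*s]\n", len, s);    /* '*'  is the field width (a minimum): the whole string prints */
        printf("[%.*s]\n", len, s);   /* '.*' is the precision (a maximum): prints "[no-32bit]"       */
        return 0;
    }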
diff --git a/0021-x86-altcall-use-a-union-as-register-type-for-functio.patch b/0021-x86-altcall-use-a-union-as-register-type-for-functio.patch
new file mode 100644
index 0000000..ab050dd
--- /dev/null
+++ b/0021-x86-altcall-use-a-union-as-register-type-for-functio.patch
@@ -0,0 +1,141 @@
+From 1aafe054e7d1efbf8e8482a9cdd4be5753b79e2f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 27 Feb 2024 14:11:04 +0100
+Subject: [PATCH 21/67] x86/altcall: use a union as register type for function
+ parameters on clang
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current code for alternative calls uses the caller parameter types as the
+types for the register variables that serve as function parameters:
+
+uint8_t foo;
+[...]
+alternative_call(myfunc, foo);
+
+Would expand roughly into:
+
+register uint8_t a1_ asm("rdi") = foo;
+register unsigned long a2_ asm("rsi");
+[...]
+asm volatile ("call *%c[addr](%%rip)"...);
+
+However with -O2 clang will generate incorrect code, given the following
+example:
+
+unsigned int func(uint8_t t)
+{
+ return t;
+}
+
+static void bar(uint8_t b)
+{
+ int ret_;
+ register uint8_t di asm("rdi") = b;
+ register unsigned long si asm("rsi");
+ register unsigned long dx asm("rdx");
+ register unsigned long cx asm("rcx");
+ register unsigned long r8 asm("r8");
+ register unsigned long r9 asm("r9");
+ register unsigned long r10 asm("r10");
+ register unsigned long r11 asm("r11");
+
+ asm volatile ( "call %c[addr]"
+ : "+r" (di), "=r" (si), "=r" (dx),
+ "=r" (cx), "=r" (r8), "=r" (r9),
+ "=r" (r10), "=r" (r11), "=a" (ret_)
+ : [addr] "i" (&(func)), "g" (func)
+ : "memory" );
+}
+
+void foo(unsigned int a)
+{
+ bar(a);
+}
+
+Clang generates the following assembly code:
+
+func: # @func
+ movl %edi, %eax
+ retq
+foo: # @foo
+ callq func
+ retq
+
+Note the truncation of the unsigned int parameter 'a' of foo() to uint8_t when
+passed into bar() is lost. clang doesn't zero extend the parameters in the
+callee when required, as the psABI mandates.
+
+The above can be worked around by using a union when defining the register
+variables, so that `di` becomes:
+
+register union {
+ uint8_t e;
+ unsigned long r;
+} di asm("rdi") = { .e = b };
+
+Which results in following code generated for `foo()`:
+
+foo: # @foo
+ movzbl %dil, %edi
+ callq func
+ retq
+
+So the truncation is no longer lost. Apply this workaround only when built
+with clang.
+
+Reported-by: Matthew Grooms <mgrooms@shrew.net>
+Link: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277200
+Link: https://github.com/llvm/llvm-project/issues/12579
+Link: https://github.com/llvm/llvm-project/issues/82598
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: 2ce562b2a413cbdb2e1128989ed1722290a27c4e
+master date: 2024-02-26 10:18:01 +0100
+---
+ xen/arch/x86/include/asm/alternative.h | 25 +++++++++++++++++++++++++
+ 1 file changed, 25 insertions(+)
+
+diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
+index a7a82c2c03..bcb1dc94f4 100644
+--- a/xen/arch/x86/include/asm/alternative.h
++++ b/xen/arch/x86/include/asm/alternative.h
+@@ -167,9 +167,34 @@ extern void alternative_branches(void);
+ #define ALT_CALL_arg5 "r8"
+ #define ALT_CALL_arg6 "r9"
+
++#ifdef CONFIG_CC_IS_CLANG
++/*
++ * Use a union with an unsigned long in order to prevent clang from
++ * skipping a possible truncation of the value. By using the union any
++ * truncation is carried before the call instruction, in turn covering
++ * for ABI-non-compliance in that the necessary clipping / extension of
++ * the value is supposed to be carried out in the callee.
++ *
++ * Note this behavior is not mandated by the standard, and hence could
++ * stop being a viable workaround, or worse, could cause a different set
++ * of code-generation issues in future clang versions.
++ *
++ * This has been reported upstream:
++ * https://github.com/llvm/llvm-project/issues/12579
++ * https://github.com/llvm/llvm-project/issues/82598
++ */
++#define ALT_CALL_ARG(arg, n) \
++ register union { \
++ typeof(arg) e; \
++ unsigned long r; \
++ } a ## n ## _ asm ( ALT_CALL_arg ## n ) = { \
++ .e = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); }) \
++ }
++#else
+ #define ALT_CALL_ARG(arg, n) \
+ register typeof(arg) a ## n ## _ asm ( ALT_CALL_arg ## n ) = \
+ ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })
++#endif
+ #define ALT_CALL_NO_ARG(n) \
+ register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n )
+
+--
+2.44.0
+
diff --git a/0022-x86-spec-fix-BRANCH_HARDEN-option-to-only-be-set-whe.patch b/0022-x86-spec-fix-BRANCH_HARDEN-option-to-only-be-set-whe.patch
new file mode 100644
index 0000000..ce01c1a
--- /dev/null
+++ b/0022-x86-spec-fix-BRANCH_HARDEN-option-to-only-be-set-whe.patch
@@ -0,0 +1,57 @@
+From 91650010815f3da0834bc9781c4359350d1162a5 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 27 Feb 2024 14:11:40 +0100
+Subject: [PATCH 22/67] x86/spec: fix BRANCH_HARDEN option to only be set when
+ build-enabled
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current logic to handle the BRANCH_HARDEN option will report it as enabled
+even when build-time disabled. Fix this by only allowing the option to be set
+when support for it is built into Xen.
+
+Fixes: 2d6f36daa086 ('x86/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_BRANCH')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 60e00f77a5cc671d30c5ef3318f5b8e9b74e4aa3
+master date: 2024-02-26 16:06:42 +0100
+---
+ xen/arch/x86/spec_ctrl.c | 14 ++++++++++++--
+ 1 file changed, 12 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 56e07d7536..661716d695 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -62,7 +62,8 @@ int8_t __initdata opt_psfd = -1;
+ int8_t __ro_after_init opt_ibpb_ctxt_switch = -1;
+ int8_t __read_mostly opt_eager_fpu = -1;
+ int8_t __read_mostly opt_l1d_flush = -1;
+-static bool __initdata opt_branch_harden = true;
++static bool __initdata opt_branch_harden =
++ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
+
+ bool __initdata bsp_delay_spec_ctrl;
+ uint8_t __read_mostly default_xen_spec_ctrl;
+@@ -280,7 +281,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
+ opt_l1d_flush = val;
+ else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
+- opt_branch_harden = val;
++ {
++ if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) )
++ opt_branch_harden = val;
++ else
++ {
++ no_config_param("SPECULATIVE_HARDEN_BRANCH", "spec-ctrl", s,
++ ss);
++ rc = -EINVAL;
++ }
++ }
+ else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
+ opt_srb_lock = val;
+ else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
+--
+2.44.0
+
diff --git a/0023-x86-account-for-shadow-stack-in-exception-from-stub-.patch b/0023-x86-account-for-shadow-stack-in-exception-from-stub-.patch
new file mode 100644
index 0000000..e23a764
--- /dev/null
+++ b/0023-x86-account-for-shadow-stack-in-exception-from-stub-.patch
@@ -0,0 +1,212 @@
+From 49f77602373b58b7bbdb40cea2b49d2f88d4003d Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 27 Feb 2024 14:12:11 +0100
+Subject: [PATCH 23/67] x86: account for shadow stack in exception-from-stub
+ recovery
+
+Dealing with exceptions raised from within emulation stubs involves
+discarding the return address (replaced by exception-related information).
+Such discarding of course also requires removing the corresponding entry
+from the shadow stack.
+
+Also amend the comment in fixup_exception_return(), to further clarify
+why use of ptr[1] can't be an out-of-bounds access.
+
+This is CVE-2023-46841 / XSA-451.
+
+Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 91f5f7a9154919a765c3933521760acffeddbf28
+master date: 2024-02-27 13:49:22 +0100
+---
+ xen/arch/x86/extable.c | 20 ++++++----
+ xen/arch/x86/include/asm/uaccess.h | 3 +-
+ xen/arch/x86/traps.c | 63 +++++++++++++++++++++++++++---
+ 3 files changed, 71 insertions(+), 15 deletions(-)
+
+diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
+index 6758ba1dca..dd9583f2a5 100644
+--- a/xen/arch/x86/extable.c
++++ b/xen/arch/x86/extable.c
+@@ -86,26 +86,29 @@ search_one_extable(const struct exception_table_entry *first,
+ }
+
+ unsigned long
+-search_exception_table(const struct cpu_user_regs *regs)
++search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
+ {
+ const struct virtual_region *region = find_text_region(regs->rip);
+ unsigned long stub = this_cpu(stubs.addr);
+
+ if ( region && region->ex )
++ {
++ *stub_ra = 0;
+ return search_one_extable(region->ex, region->ex_end, regs->rip);
++ }
+
+ if ( regs->rip >= stub + STUB_BUF_SIZE / 2 &&
+ regs->rip < stub + STUB_BUF_SIZE &&
+ regs->rsp > (unsigned long)regs &&
+ regs->rsp < (unsigned long)get_cpu_info() )
+ {
+- unsigned long retptr = *(unsigned long *)regs->rsp;
++ unsigned long retaddr = *(unsigned long *)regs->rsp, fixup;
+
+- region = find_text_region(retptr);
+- retptr = region && region->ex
+- ? search_one_extable(region->ex, region->ex_end, retptr)
+- : 0;
+- if ( retptr )
++ region = find_text_region(retaddr);
++ fixup = region && region->ex
++ ? search_one_extable(region->ex, region->ex_end, retaddr)
++ : 0;
++ if ( fixup )
+ {
+ /*
+ * Put trap number and error code on the stack (in place of the
+@@ -117,7 +120,8 @@ search_exception_table(const struct cpu_user_regs *regs)
+ };
+
+ *(unsigned long *)regs->rsp = token.raw;
+- return retptr;
++ *stub_ra = retaddr;
++ return fixup;
+ }
+ }
+
+diff --git a/xen/arch/x86/include/asm/uaccess.h b/xen/arch/x86/include/asm/uaccess.h
+index 684fccd95c..74bb222c03 100644
+--- a/xen/arch/x86/include/asm/uaccess.h
++++ b/xen/arch/x86/include/asm/uaccess.h
+@@ -421,7 +421,8 @@ union stub_exception_token {
+ unsigned long raw;
+ };
+
+-extern unsigned long search_exception_table(const struct cpu_user_regs *regs);
++extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
++ unsigned long *stub_ra);
+ extern void sort_exception_tables(void);
+ extern void sort_exception_table(struct exception_table_entry *start,
+ const struct exception_table_entry *stop);
+diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
+index 06c4f3868b..7599bee361 100644
+--- a/xen/arch/x86/traps.c
++++ b/xen/arch/x86/traps.c
+@@ -856,7 +856,7 @@ void do_unhandled_trap(struct cpu_user_regs *regs)
+ }
+
+ static void fixup_exception_return(struct cpu_user_regs *regs,
+- unsigned long fixup)
++ unsigned long fixup, unsigned long stub_ra)
+ {
+ if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
+ {
+@@ -873,7 +873,8 @@ static void fixup_exception_return(struct cpu_user_regs *regs,
+ /*
+ * Search for %rip. The shstk currently looks like this:
+ *
+- * ... [Likely pointed to by SSP]
++ * tok [Supervisor token, == &tok | BUSY, only with FRED inactive]
++ * ... [Pointed to by SSP for most exceptions, empty in IST cases]
+ * %cs [== regs->cs]
+ * %rip [== regs->rip]
+ * SSP [Likely points to 3 slots higher, above %cs]
+@@ -891,7 +892,56 @@ static void fixup_exception_return(struct cpu_user_regs *regs,
+ */
+ if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
+ {
++ unsigned long primary_shstk =
++ (ssp & ~(STACK_SIZE - 1)) +
++ (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
++
+ wrss(fixup, ptr);
++
++ if ( !stub_ra )
++ goto shstk_done;
++
++ /*
++ * Stub recovery ought to happen only when the outer context
++ * was on the main shadow stack. We need to also "pop" the
++ * stub's return address from the interrupted context's shadow
++ * stack. That is,
++ * - if we're still on the main stack, we need to move the
++ * entire stack (up to and including the exception frame)
++ * up by one slot, incrementing the original SSP in the
++ * exception frame,
++ * - if we're on an IST stack, we need to increment the
++ * original SSP.
++ */
++ BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
++
++ if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
++ {
++ /*
++ * We're on an IST stack. First make sure the two return
++ * addresses actually match. Then increment the interrupted
++ * context's SSP.
++ */
++ BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
++ wrss(ptr[-1] + 8, &ptr[-1]);
++ goto shstk_done;
++ }
++
++ /* Make sure the two return addresses actually match. */
++ BUG_ON(stub_ra != ptr[2]);
++
++ /* Move exception frame, updating SSP there. */
++ wrss(ptr[1], &ptr[2]); /* %cs */
++ wrss(ptr[0], &ptr[1]); /* %rip */
++ wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
++
++ /* Move all newer entries. */
++ while ( --ptr != _p(ssp) )
++ wrss(ptr[-1], &ptr[0]);
++
++ /* Finally account for our own stack having shifted up. */
++ asm volatile ( "incsspd %0" :: "r" (2) );
++
+ goto shstk_done;
+ }
+ }
+@@ -912,7 +962,8 @@ static void fixup_exception_return(struct cpu_user_regs *regs,
+
+ static bool extable_fixup(struct cpu_user_regs *regs, bool print)
+ {
+- unsigned long fixup = search_exception_table(regs);
++ unsigned long stub_ra = 0;
++ unsigned long fixup = search_exception_table(regs, &stub_ra);
+
+ if ( unlikely(fixup == 0) )
+ return false;
+@@ -926,7 +977,7 @@ static bool extable_fixup(struct cpu_user_regs *regs, bool print)
+ vector_name(regs->entry_vector), regs->error_code,
+ _p(regs->rip), _p(regs->rip), _p(fixup));
+
+- fixup_exception_return(regs, fixup);
++ fixup_exception_return(regs, fixup, stub_ra);
+ this_cpu(last_extable_addr) = regs->rip;
+
+ return true;
+@@ -1214,7 +1265,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
+ void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
+
+ fn(regs);
+- fixup_exception_return(regs, (unsigned long)eip);
++ fixup_exception_return(regs, (unsigned long)eip, 0);
+ return;
+ }
+
+@@ -1235,7 +1286,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
+ case BUGFRAME_warn:
+ printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno);
+ show_execution_state(regs);
+- fixup_exception_return(regs, (unsigned long)eip);
++ fixup_exception_return(regs, (unsigned long)eip, 0);
+ return;
+
+ case BUGFRAME_bug:
+--
+2.44.0
+
diff --git a/0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch b/0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch
new file mode 100644
index 0000000..7bdd651
--- /dev/null
+++ b/0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch
@@ -0,0 +1,52 @@
+From 6cbccc4071ef49a8c591ecaddfdcb1cc26d28411 Mon Sep 17 00:00:00 2001
+From: Michal Orzel <michal.orzel@amd.com>
+Date: Thu, 8 Feb 2024 11:43:39 +0100
+Subject: [PATCH 24/67] xen/arm: Fix UBSAN failure in start_xen()
+
+When running Xen on arm32, in a scenario where Xen is loaded at an address
+such that boot_phys_offset >= 2GB, UBSAN reports the following:
+
+(XEN) UBSAN: Undefined behaviour in arch/arm/setup.c:739:58
+(XEN) pointer operation underflowed 00200000 to 86800000
+(XEN) Xen WARN at common/ubsan/ubsan.c:172
+(XEN) ----[ Xen-4.19-unstable arm32 debug=y ubsan=y Not tainted ]----
+...
+(XEN) Xen call trace:
+(XEN) [<0031b4c0>] ubsan.c#ubsan_epilogue+0x18/0xf0 (PC)
+(XEN) [<0031d134>] __ubsan_handle_pointer_overflow+0xb8/0xd4 (LR)
+(XEN) [<0031d134>] __ubsan_handle_pointer_overflow+0xb8/0xd4
+(XEN) [<004d15a8>] start_xen+0xe0/0xbe0
+(XEN) [<0020007c>] head.o#primary_switched+0x4/0x30
+
+The failure is reported for the following line:
+(paddr_t)(uintptr_t)(_start + boot_phys_offset)
+
+This occurs because the compiler treats (ptr + size) with size bigger than
+PTRDIFF_MAX as undefined behavior. To address this, switch to macro
+virt_to_maddr(), given the future plans to eliminate boot_phys_offset.
+
+Signed-off-by: Michal Orzel <michal.orzel@amd.com>
+Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
+Tested-by: Luca Fancellu <luca.fancellu@arm.com>
+Acked-by: Julien Grall <jgrall@amazon.com>
+(cherry picked from commit e11f5766503c0ff074b4e0f888bbfc931518a169)
+---
+ xen/arch/arm/setup.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
+index 4395640019..9ee19c2bc1 100644
+--- a/xen/arch/arm/setup.c
++++ b/xen/arch/arm/setup.c
+@@ -1025,7 +1025,7 @@ void __init start_xen(unsigned long boot_phys_offset,
+
+ /* Register Xen's load address as a boot module. */
+ xen_bootmodule = add_boot_module(BOOTMOD_XEN,
+- (paddr_t)(uintptr_t)(_start + boot_phys_offset),
++ virt_to_maddr(_start),
+ (paddr_t)(uintptr_t)(_end - _start), false);
+ BUG_ON(!xen_bootmodule);
+
+--
+2.44.0
+
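The underlying C rule can be shown with a small self-contained example (not Xen code; the array and offset are made up): pointer arithmetic with an offset larger than PTRDIFF_MAX, or one that leaves the pointed-to object, is treated as undefined, whereas the same computation carried out on an unsigned integer type is ordinary, well-defined arithmetic.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        static char image[16];                      /* stand-in for the loaded Xen image */
        uintptr_t boot_phys_offset = 0x86600000u;   /* hypothetical large load offset    */

        /* UB variant (what UBSAN flagged):  image + boot_phys_offset          */
        /* Well-defined: convert to an integer first, then add the offset.     */
        uintptr_t paddr = (uintptr_t)image + boot_phys_offset;

        printf("%#lx\n", (unsigned long)paddr);
        return 0;
    }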
diff --git a/0025-x86-HVM-hide-SVM-VMX-when-their-enabling-is-prohibit.patch b/0025-x86-HVM-hide-SVM-VMX-when-their-enabling-is-prohibit.patch
new file mode 100644
index 0000000..28e489b
--- /dev/null
+++ b/0025-x86-HVM-hide-SVM-VMX-when-their-enabling-is-prohibit.patch
@@ -0,0 +1,67 @@
+From 9c0d518eb8dc69430e6a8d767bd101dad19b846a Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 5 Mar 2024 11:56:31 +0100
+Subject: [PATCH 25/67] x86/HVM: hide SVM/VMX when their enabling is prohibited
+ by firmware
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+... or we fail to enable the functionality on the BSP for other reasons.
+The only place where the hardware's announcement of the feature remains
+recorded is the raw CPU policy/featureset.
+
+Inspired by https://lore.kernel.org/all/20230921114940.957141-1-pbonzini@redhat.com/.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 0b5f149338e35a795bf609ce584640b0977f9e6c
+master date: 2024-01-09 14:06:34 +0100
+---
+ xen/arch/x86/hvm/svm/svm.c | 1 +
+ xen/arch/x86/hvm/vmx/vmcs.c | 17 +++++++++++++++++
+ 2 files changed, 18 insertions(+)
+
+diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
+index fd32600ae3..3c17464550 100644
+--- a/xen/arch/x86/hvm/svm/svm.c
++++ b/xen/arch/x86/hvm/svm/svm.c
+@@ -1669,6 +1669,7 @@ const struct hvm_function_table * __init start_svm(void)
+
+ if ( _svm_cpu_up(true) )
+ {
++ setup_clear_cpu_cap(X86_FEATURE_SVM);
+ printk("SVM: failed to initialise.\n");
+ return NULL;
+ }
+diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
+index bcbecc6945..b5ecc51b43 100644
+--- a/xen/arch/x86/hvm/vmx/vmcs.c
++++ b/xen/arch/x86/hvm/vmx/vmcs.c
+@@ -2163,6 +2163,23 @@ int __init vmx_vmcs_init(void)
+
+ if ( !ret )
+ register_keyhandler('v', vmcs_dump, "dump VT-x VMCSs", 1);
++ else
++ {
++ setup_clear_cpu_cap(X86_FEATURE_VMX);
++
++ /*
++ * _vmx_vcpu_up() may have made it past feature identification.
++ * Make sure all dependent features are off as well.
++ */
++ vmx_basic_msr = 0;
++ vmx_pin_based_exec_control = 0;
++ vmx_cpu_based_exec_control = 0;
++ vmx_secondary_exec_control = 0;
++ vmx_vmexit_control = 0;
++ vmx_vmentry_control = 0;
++ vmx_ept_vpid_cap = 0;
++ vmx_vmfunc = 0;
++ }
+
+ return ret;
+ }
+--
+2.44.0
+
diff --git a/0026-xen-sched-Fix-UB-shift-in-compat_set_timer_op.patch b/0026-xen-sched-Fix-UB-shift-in-compat_set_timer_op.patch
new file mode 100644
index 0000000..4b051ea
--- /dev/null
+++ b/0026-xen-sched-Fix-UB-shift-in-compat_set_timer_op.patch
@@ -0,0 +1,86 @@
+From b75bee183210318150e678e14b35224d7c73edb6 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 5 Mar 2024 11:57:02 +0100
+Subject: [PATCH 26/67] xen/sched: Fix UB shift in compat_set_timer_op()
+
+Tamas reported this UBSAN failure from fuzzing:
+
+ (XEN) ================================================================================
+ (XEN) UBSAN: Undefined behaviour in common/sched/compat.c:48:37
+ (XEN) left shift of negative value -2147425536
+ (XEN) ----[ Xen-4.19-unstable x86_64 debug=y ubsan=y Not tainted ]----
+ ...
+ (XEN) Xen call trace:
+ (XEN) [<ffff82d040307c1c>] R ubsan.c#ubsan_epilogue+0xa/0xd9
+ (XEN) [<ffff82d040308afb>] F __ubsan_handle_shift_out_of_bounds+0x11a/0x1c5
+ (XEN) [<ffff82d040307758>] F compat_set_timer_op+0x41/0x43
+ (XEN) [<ffff82d04040e4cc>] F hvm_do_multicall_call+0x77f/0xa75
+ (XEN) [<ffff82d040519462>] F arch_do_multicall_call+0xec/0xf1
+ (XEN) [<ffff82d040261567>] F do_multicall+0x1dc/0xde3
+ (XEN) [<ffff82d04040d2b3>] F hvm_hypercall+0xa00/0x149a
+ (XEN) [<ffff82d0403cd072>] F vmx_vmexit_handler+0x1596/0x279c
+ (XEN) [<ffff82d0403d909b>] F vmx_asm_vmexit_handler+0xdb/0x200
+
+Left-shifting any negative value is strictly undefined behaviour in C, and
+the two parameters here come straight from the guest.
+
+The fuzzer happened to choose lo 0xf, hi 0x8000e300.
+
+Switch everything to be unsigned values, making the shift well defined.
+
+As GCC documents:
+
+ As an extension to the C language, GCC does not use the latitude given in
+ C99 and C11 only to treat certain aspects of signed '<<' as undefined.
+ However, -fsanitize=shift (and -fsanitize=undefined) will diagnose such
+ cases.
+
+this was deemed not to need an XSA.
+
+Note: The unsigned -> signed conversion for do_set_timer_op()'s s_time_t
+parameter is also well defined. C makes it implementation defined, and GCC
+defines it as reduction modulo 2^N to be within range of the new type.
+
+Fixes: 2942f45e09fb ("Enable compatibility mode operation for HYPERVISOR_sched_op and HYPERVISOR_set_timer_op.")
+Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: ae6d4fd876765e6d623eec67d14f5d0464be09cb
+master date: 2024-02-01 19:52:44 +0000
+---
+ xen/common/sched/compat.c | 4 ++--
+ xen/include/hypercall-defs.c | 2 +-
+ 2 files changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/xen/common/sched/compat.c b/xen/common/sched/compat.c
+index 040b4caca2..b827fdecb8 100644
+--- a/xen/common/sched/compat.c
++++ b/xen/common/sched/compat.c
+@@ -39,9 +39,9 @@ static int compat_poll(struct compat_sched_poll *compat)
+
+ #include "core.c"
+
+-int compat_set_timer_op(u32 lo, s32 hi)
++int compat_set_timer_op(uint32_t lo, uint32_t hi)
+ {
+- return do_set_timer_op(((s64)hi << 32) | lo);
++ return do_set_timer_op(((uint64_t)hi << 32) | lo);
+ }
+
+ /*
+diff --git a/xen/include/hypercall-defs.c b/xen/include/hypercall-defs.c
+index 1896121074..c442dee284 100644
+--- a/xen/include/hypercall-defs.c
++++ b/xen/include/hypercall-defs.c
+@@ -127,7 +127,7 @@ xenoprof_op(int op, void *arg)
+
+ #ifdef CONFIG_COMPAT
+ prefix: compat
+-set_timer_op(uint32_t lo, int32_t hi)
++set_timer_op(uint32_t lo, uint32_t hi)
+ multicall(multicall_entry_compat_t *call_list, uint32_t nr_calls)
+ memory_op(unsigned int cmd, void *arg)
+ #ifdef CONFIG_IOREQ_SERVER
+--
+2.44.0
+
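A standalone sketch (not part of the patch series) reproduces the arithmetic
with the fuzzer's values and shows the unsigned formulation the patch switches
to; the old ((s64)hi << 32) form would be diagnosed under -fsanitize=shift
whenever hi has its top bit set. make_timeout() is a made-up name standing in
for the compat_set_timer_op()/do_set_timer_op() pair.

#include <stdint.h>
#include <stdio.h>

static int64_t make_timeout(uint32_t lo, uint32_t hi)
{
    /*
     * Well defined: the shift operates on an unsigned 64-bit value. The
     * conversion of the result back to int64_t is implementation defined
     * (reduction modulo 2^64 on GCC), as the commit message notes.
     */
    return (int64_t)(((uint64_t)hi << 32) | lo);
}

int main(void)
{
    /* The values the fuzzer picked: lo 0xf, hi 0x8000e300. */
    printf("%#llx\n", (unsigned long long)make_timeout(0xf, 0x8000e300));

    return 0;
}
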
diff --git a/0027-x86-spec-print-the-built-in-SPECULATIVE_HARDEN_-opti.patch b/0027-x86-spec-print-the-built-in-SPECULATIVE_HARDEN_-opti.patch
new file mode 100644
index 0000000..845247a
--- /dev/null
+++ b/0027-x86-spec-print-the-built-in-SPECULATIVE_HARDEN_-opti.patch
@@ -0,0 +1,54 @@
+From 76ea2aab3652cc34e474de0905f0a9cd4df7d087 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 11:57:41 +0100
+Subject: [PATCH 27/67] x86/spec: print the built-in SPECULATIVE_HARDEN_*
+ options
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Just like it's done for INDIRECT_THUNK and SHADOW_PAGING.
+
+Reported-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 6e9507f7d51fe49df8bc70f83e49ce06c92e4e54
+master date: 2024-02-27 14:57:52 +0100
+---
+ xen/arch/x86/spec_ctrl.c | 14 +++++++++++++-
+ 1 file changed, 13 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 661716d695..93f1cf3bb5 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -488,13 +488,25 @@ static void __init print_details(enum ind_thunk thunk)
+ (e21a & cpufeat_mask(X86_FEATURE_SBPB)) ? " SBPB" : "");
+
+ /* Compiled-in support which pertains to mitigations. */
+- if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
++ if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
++ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
++ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
++ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
+ printk(" Compiled-in support:"
+ #ifdef CONFIG_INDIRECT_THUNK
+ " INDIRECT_THUNK"
+ #endif
+ #ifdef CONFIG_SHADOW_PAGING
+ " SHADOW_PAGING"
++#endif
++#ifdef CONFIG_SPECULATIVE_HARDEN_ARRAY
++ " HARDEN_ARRAY"
++#endif
++#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
++ " HARDEN_BRANCH"
++#endif
++#ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
++ " HARDEN_GUEST_ACCESS"
+ #endif
+ "\n");
+
+--
+2.44.0
+
diff --git a/0028-x86-spec-fix-INDIRECT_THUNK-option-to-only-be-set-wh.patch b/0028-x86-spec-fix-INDIRECT_THUNK-option-to-only-be-set-wh.patch
new file mode 100644
index 0000000..dfbf516
--- /dev/null
+++ b/0028-x86-spec-fix-INDIRECT_THUNK-option-to-only-be-set-wh.patch
@@ -0,0 +1,67 @@
+From 693455c3c370e535eb6cd065800ff91e147815fa Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 11:58:04 +0100
+Subject: [PATCH 28/67] x86/spec: fix INDIRECT_THUNK option to only be set when
+ build-enabled
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Attempt to provide a more helpful error message when the user attempts to set
+the spec-ctrl=bti-thunk option but the support is disabled at build time.
+
+While there also adjust the command line documentation to mention
+CONFIG_INDIRECT_THUNK instead of INDIRECT_THUNK.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 8441fa806a3b778867867cd0159fa1722e90397e
+master date: 2024-02-27 14:58:20 +0100
+---
+ docs/misc/xen-command-line.pandoc | 10 +++++-----
+ xen/arch/x86/spec_ctrl.c | 7 ++++++-
+ 2 files changed, 11 insertions(+), 6 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index 05f613c71c..2006697226 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2378,11 +2378,11 @@ guests to use.
+ performance reasons dom0 is unprotected by default. If it is necessary to
+ protect dom0 too, boot with `spec-ctrl=ibpb-entry`.
+
+-If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
+-select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
+-locations. The default thunk is `retpoline` (generally preferred), with the
+-alternatives being `jmp` (a `jmp *%reg` gadget, minimal overhead), and
+-`lfence` (an `lfence; jmp *%reg` gadget).
++If Xen was compiled with `CONFIG_INDIRECT_THUNK` support, `bti-thunk=` can be
++used to select which of the thunks gets patched into the
++`__x86_indirect_thunk_%reg` locations. The default thunk is `retpoline`
++(generally preferred), with the alternatives being `jmp` (a `jmp *%reg` gadget,
++minimal overhead), and `lfence` (an `lfence; jmp *%reg` gadget).
+
+ On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
+ `ibrs=` option can be used to force or prevent Xen using the feature itself.
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 93f1cf3bb5..098fa3184d 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -253,7 +253,12 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ {
+ s += 10;
+
+- if ( !cmdline_strcmp(s, "retpoline") )
++ if ( !IS_ENABLED(CONFIG_INDIRECT_THUNK) )
++ {
++ no_config_param("INDIRECT_THUNK", "spec-ctrl", s - 10, ss);
++ rc = -EINVAL;
++ }
++ else if ( !cmdline_strcmp(s, "retpoline") )
+ opt_thunk = THUNK_RETPOLINE;
+ else if ( !cmdline_strcmp(s, "lfence") )
+ opt_thunk = THUNK_LFENCE;
+--
+2.44.0
+
diff --git a/0029-x86-spec-do-not-print-thunk-option-selection-if-not-.patch b/0029-x86-spec-do-not-print-thunk-option-selection-if-not-.patch
new file mode 100644
index 0000000..71e6633
--- /dev/null
+++ b/0029-x86-spec-do-not-print-thunk-option-selection-if-not-.patch
@@ -0,0 +1,50 @@
+From 0ce25b46ab2fb53a1b58f7682ca14971453f4f2c Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 11:58:36 +0100
+Subject: [PATCH 29/67] x86/spec: do not print thunk option selection if not
+ built-in
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Since the thunk built-in enable is printed as part of the "Compiled-in
+support:" line, avoid printing anything in "Xen settings:" if the thunk is
+disabled at build time.
+
+Note the BTI-Thunk option printing is also adjusted to print a colon in the
+same way the other options on the line do.
+
+Requested-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 576528a2a742069af203e90c613c5c93e23c9755
+master date: 2024-02-27 14:58:40 +0100
+---
+ xen/arch/x86/spec_ctrl.c | 11 ++++++-----
+ 1 file changed, 6 insertions(+), 5 deletions(-)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 098fa3184d..25a18ac598 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -516,11 +516,12 @@ static void __init print_details(enum ind_thunk thunk)
+ "\n");
+
+ /* Settings for Xen's protection, irrespective of guests. */
+- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+- thunk == THUNK_NONE ? "N/A" :
+- thunk == THUNK_RETPOLINE ? "RETPOLINE" :
+- thunk == THUNK_LFENCE ? "LFENCE" :
+- thunk == THUNK_JMP ? "JMP" : "?",
++ printk(" Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
++ thunk != THUNK_NONE ? "BTI-Thunk: " : "",
++ thunk == THUNK_NONE ? "" :
++ thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
++ thunk == THUNK_LFENCE ? "LFENCE, " :
++ thunk == THUNK_JMP ? "JMP, " : "?, ",
+ (!boot_cpu_has(X86_FEATURE_IBRSB) &&
+ !boot_cpu_has(X86_FEATURE_IBRS)) ? "No" :
+ (default_xen_spec_ctrl & SPEC_CTRL_IBRS) ? "IBRS+" : "IBRS-",
+--
+2.44.0
+
diff --git a/0030-xen-livepatch-register-livepatch-regions-when-loaded.patch b/0030-xen-livepatch-register-livepatch-regions-when-loaded.patch
new file mode 100644
index 0000000..f521ecc
--- /dev/null
+++ b/0030-xen-livepatch-register-livepatch-regions-when-loaded.patch
@@ -0,0 +1,159 @@
+From b11917de0cd261a878beaf50c18a689bde0b2f50 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 11:59:26 +0100
+Subject: [PATCH 30/67] xen/livepatch: register livepatch regions when loaded
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Currently livepatch regions are registered as virtual regions only after the
+livepatch has been applied.
+
+This can lead to issues when using the pre-apply or post-revert hooks, as at
+that point the livepatch is not in the virtual regions list. If a livepatch
+pre-apply hook contains a WARN() it would trigger a hypervisor crash, as the
+code that handles the bug frame won't be able to find the instruction pointer
+that triggered the #UD in any of the registered virtual regions.
+
+Fix this by adding the livepatch payloads as virtual regions as soon as loaded,
+and only remove them once the payload is unloaded. This requires some changes
+to the virtual regions code, as the removal of the virtual regions is no longer
+done in stop machine context, and hence an RCU barrier is added in order to
+make sure there are no users of the virtual region after it's been removed from
+the list.
+
+Fixes: 8313c864fa95 ('livepatch: Implement pre-|post- apply|revert hooks')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: a57b4074ab39bee78b6c116277f0a9963bd8e687
+master date: 2024-02-28 16:57:25 +0000
+---
+ xen/common/livepatch.c | 4 ++--
+ xen/common/virtual_region.c | 44 ++++++++++++++-----------------------
+ 2 files changed, 19 insertions(+), 29 deletions(-)
+
+diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
+index c2ae84d18b..537e9f33e4 100644
+--- a/xen/common/livepatch.c
++++ b/xen/common/livepatch.c
+@@ -1015,6 +1015,7 @@ static int build_symbol_table(struct payload *payload,
+ static void free_payload(struct payload *data)
+ {
+ ASSERT(spin_is_locked(&payload_lock));
++ unregister_virtual_region(&data->region);
+ list_del(&data->list);
+ payload_cnt--;
+ payload_version++;
+@@ -1114,6 +1115,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
+ INIT_LIST_HEAD(&data->list);
+ INIT_LIST_HEAD(&data->applied_list);
+
++ register_virtual_region(&data->region);
+ list_add_tail(&data->list, &payload_list);
+ payload_cnt++;
+ payload_version++;
+@@ -1330,7 +1332,6 @@ static inline void apply_payload_tail(struct payload *data)
+ * The applied_list is iterated by the trap code.
+ */
+ list_add_tail_rcu(&data->applied_list, &applied_list);
+- register_virtual_region(&data->region);
+
+ data->state = LIVEPATCH_STATE_APPLIED;
+ }
+@@ -1376,7 +1377,6 @@ static inline void revert_payload_tail(struct payload *data)
+ * The applied_list is iterated by the trap code.
+ */
+ list_del_rcu(&data->applied_list);
+- unregister_virtual_region(&data->region);
+
+ data->reverted = true;
+ data->state = LIVEPATCH_STATE_CHECKED;
+diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
+index 5f89703f51..9f12c30efe 100644
+--- a/xen/common/virtual_region.c
++++ b/xen/common/virtual_region.c
+@@ -23,14 +23,8 @@ static struct virtual_region core_init __initdata = {
+ };
+
+ /*
+- * RCU locking. Additions are done either at startup (when there is only
+- * one CPU) or when all CPUs are running without IRQs.
+- *
+- * Deletions are bit tricky. We do it when Live Patch (all CPUs running
+- * without IRQs) or during bootup (when clearing the init).
+- *
+- * Hence we use list_del_rcu (which sports an memory fence) and a spinlock
+- * on deletion.
++ * RCU locking. Modifications to the list must be done in exclusive mode, and
++ * hence need to hold the spinlock.
+ *
+ * All readers of virtual_region_list MUST use list_for_each_entry_rcu.
+ */
+@@ -58,41 +52,36 @@ const struct virtual_region *find_text_region(unsigned long addr)
+
+ void register_virtual_region(struct virtual_region *r)
+ {
+- ASSERT(!local_irq_is_enabled());
++ unsigned long flags;
+
++ spin_lock_irqsave(&virtual_region_lock, flags);
+ list_add_tail_rcu(&r->list, &virtual_region_list);
++ spin_unlock_irqrestore(&virtual_region_lock, flags);
+ }
+
+-static void remove_virtual_region(struct virtual_region *r)
++/*
++ * Suggest inline so when !CONFIG_LIVEPATCH the function is not left
++ * unreachable after init code is removed.
++ */
++static void inline remove_virtual_region(struct virtual_region *r)
+ {
+ unsigned long flags;
+
+ spin_lock_irqsave(&virtual_region_lock, flags);
+ list_del_rcu(&r->list);
+ spin_unlock_irqrestore(&virtual_region_lock, flags);
+- /*
+- * We do not need to invoke call_rcu.
+- *
+- * This is due to the fact that on the deletion we have made sure
+- * to use spinlocks (to guard against somebody else calling
+- * unregister_virtual_region) and list_deletion spiced with
+- * memory barrier.
+- *
+- * That protects us from corrupting the list as the readers all
+- * use list_for_each_entry_rcu which is safe against concurrent
+- * deletions.
+- */
+ }
+
++#ifdef CONFIG_LIVEPATCH
+ void unregister_virtual_region(struct virtual_region *r)
+ {
+- /* Expected to be called from Live Patch - which has IRQs disabled. */
+- ASSERT(!local_irq_is_enabled());
+-
+ remove_virtual_region(r);
++
++ /* Assert that no CPU might be using the removed region. */
++ rcu_barrier();
+ }
+
+-#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_X86)
++#ifdef CONFIG_X86
+ void relax_virtual_region_perms(void)
+ {
+ const struct virtual_region *region;
+@@ -116,7 +105,8 @@ void tighten_virtual_region_perms(void)
+ PAGE_HYPERVISOR_RX);
+ rcu_read_unlock(&rcu_virtual_region_lock);
+ }
+-#endif
++#endif /* CONFIG_X86 */
++#endif /* CONFIG_LIVEPATCH */
+
+ void __init unregister_init_virtual_region(void)
+ {
+--
+2.44.0
+
diff --git a/0031-xen-livepatch-search-for-symbols-in-all-loaded-paylo.patch b/0031-xen-livepatch-search-for-symbols-in-all-loaded-paylo.patch
new file mode 100644
index 0000000..c778639
--- /dev/null
+++ b/0031-xen-livepatch-search-for-symbols-in-all-loaded-paylo.patch
@@ -0,0 +1,149 @@
+From c54cf903b06fb1933fad053cc547580c92c856ea Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 11:59:35 +0100
+Subject: [PATCH 31/67] xen/livepatch: search for symbols in all loaded
+ payloads
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When checking if an address belongs to a patch, or when resolving a symbol,
+take into account all loaded livepatch payloads, even if not applied.
+
+This is required in order for the pre-apply and post-revert hooks to work
+properly, or else Xen won't detect the instruction pointer belonging to those
+hooks as being part of the currently active text.
+
+Move the RCU handling to be used for payload_list instead of applied_list, as
+now the calls from trap code will iterate over the payload_list.
+
+Fixes: 8313c864fa95 ('livepatch: Implement pre-|post- apply|revert hooks')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: d2daa40fb3ddb8f83e238e57854bd878924cde90
+master date: 2024-02-28 16:57:25 +0000
+---
+ xen/common/livepatch.c | 49 +++++++++++++++---------------------------
+ 1 file changed, 17 insertions(+), 32 deletions(-)
+
+diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
+index 537e9f33e4..a129ab9973 100644
+--- a/xen/common/livepatch.c
++++ b/xen/common/livepatch.c
+@@ -36,13 +36,14 @@
+ * caller in schedule_work.
+ */
+ static DEFINE_SPINLOCK(payload_lock);
+-static LIST_HEAD(payload_list);
+-
+ /*
+- * Patches which have been applied. Need RCU in case we crash (and then
+- * traps code would iterate via applied_list) when adding entries on the list.
++ * Need RCU in case we crash (and then traps code would iterate via
++ * payload_list) when adding entries on the list.
+ */
+-static DEFINE_RCU_READ_LOCK(rcu_applied_lock);
++static DEFINE_RCU_READ_LOCK(rcu_payload_lock);
++static LIST_HEAD(payload_list);
++
++/* Patches which have been applied. Only modified from stop machine context. */
+ static LIST_HEAD(applied_list);
+
+ static unsigned int payload_cnt;
+@@ -111,12 +112,8 @@ bool_t is_patch(const void *ptr)
+ const struct payload *data;
+ bool_t r = 0;
+
+- /*
+- * Only RCU locking since this list is only ever changed during apply
+- * or revert context. And in case it dies there we need an safe list.
+- */
+- rcu_read_lock(&rcu_applied_lock);
+- list_for_each_entry_rcu ( data, &applied_list, applied_list )
++ rcu_read_lock(&rcu_payload_lock);
++ list_for_each_entry_rcu ( data, &payload_list, list )
+ {
+ if ( (ptr >= data->rw_addr &&
+ ptr < (data->rw_addr + data->rw_size)) ||
+@@ -130,7 +127,7 @@ bool_t is_patch(const void *ptr)
+ }
+
+ }
+- rcu_read_unlock(&rcu_applied_lock);
++ rcu_read_unlock(&rcu_payload_lock);
+
+ return r;
+ }
+@@ -166,12 +163,8 @@ static const char *cf_check livepatch_symbols_lookup(
+ const void *va = (const void *)addr;
+ const char *n = NULL;
+
+- /*
+- * Only RCU locking since this list is only ever changed during apply
+- * or revert context. And in case it dies there we need an safe list.
+- */
+- rcu_read_lock(&rcu_applied_lock);
+- list_for_each_entry_rcu ( data, &applied_list, applied_list )
++ rcu_read_lock(&rcu_payload_lock);
++ list_for_each_entry_rcu ( data, &payload_list, list )
+ {
+ if ( va < data->text_addr ||
+ va >= (data->text_addr + data->text_size) )
+@@ -200,7 +193,7 @@ static const char *cf_check livepatch_symbols_lookup(
+ n = data->symtab[best].name;
+ break;
+ }
+- rcu_read_unlock(&rcu_applied_lock);
++ rcu_read_unlock(&rcu_payload_lock);
+
+ return n;
+ }
+@@ -1016,7 +1009,8 @@ static void free_payload(struct payload *data)
+ {
+ ASSERT(spin_is_locked(&payload_lock));
+ unregister_virtual_region(&data->region);
+- list_del(&data->list);
++ list_del_rcu(&data->list);
++ rcu_barrier();
+ payload_cnt--;
+ payload_version++;
+ free_payload_data(data);
+@@ -1116,7 +1110,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
+ INIT_LIST_HEAD(&data->applied_list);
+
+ register_virtual_region(&data->region);
+- list_add_tail(&data->list, &payload_list);
++ list_add_tail_rcu(&data->list, &payload_list);
+ payload_cnt++;
+ payload_version++;
+ }
+@@ -1327,11 +1321,7 @@ static int apply_payload(struct payload *data)
+
+ static inline void apply_payload_tail(struct payload *data)
+ {
+- /*
+- * We need RCU variant (which has barriers) in case we crash here.
+- * The applied_list is iterated by the trap code.
+- */
+- list_add_tail_rcu(&data->applied_list, &applied_list);
++ list_add_tail(&data->applied_list, &applied_list);
+
+ data->state = LIVEPATCH_STATE_APPLIED;
+ }
+@@ -1371,12 +1361,7 @@ static int revert_payload(struct payload *data)
+
+ static inline void revert_payload_tail(struct payload *data)
+ {
+-
+- /*
+- * We need RCU variant (which has barriers) in case we crash here.
+- * The applied_list is iterated by the trap code.
+- */
+- list_del_rcu(&data->applied_list);
++ list_del(&data->applied_list);
+
+ data->reverted = true;
+ data->state = LIVEPATCH_STATE_CHECKED;
+--
+2.44.0
+
diff --git a/0032-xen-livepatch-fix-norevert-test-attempt-to-open-code.patch b/0032-xen-livepatch-fix-norevert-test-attempt-to-open-code.patch
new file mode 100644
index 0000000..76af9ef
--- /dev/null
+++ b/0032-xen-livepatch-fix-norevert-test-attempt-to-open-code.patch
@@ -0,0 +1,186 @@
+From 5564323f643715f9d364df88e0eb9c7d6fd2c22b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 11:59:43 +0100
+Subject: [PATCH 32/67] xen/livepatch: fix norevert test attempt to open-code
+ revert
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The purpose of the norevert test is to install a dummy handler that replaces
+the internal Xen revert code, and then perform the revert in the post-revert
+hook. For that purpose, using the previous common_livepatch_revert() is not
+enough, as that only reverts the individual functions, not the whole state of
+the payload.
+
+Remove both common_livepatch_{apply,revert}() and instead expose
+revert_payload{,_tail}() in order to perform the patch revert from the
+post-revert hook.
+
+Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: cdae267ce10d04d71d1687b5701ff2911a96b6dc
+master date: 2024-02-28 16:57:25 +0000
+---
+ xen/common/livepatch.c | 41 +++++++++++++++++--
+ xen/include/xen/livepatch.h | 32 ++-------------
+ .../livepatch/xen_action_hooks_norevert.c | 22 +++-------
+ 3 files changed, 46 insertions(+), 49 deletions(-)
+
+diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
+index a129ab9973..a5068a2217 100644
+--- a/xen/common/livepatch.c
++++ b/xen/common/livepatch.c
+@@ -1310,7 +1310,22 @@ static int apply_payload(struct payload *data)
+ ASSERT(!local_irq_is_enabled());
+
+ for ( i = 0; i < data->nfuncs; i++ )
+- common_livepatch_apply(&data->funcs[i], &data->fstate[i]);
++ {
++ const struct livepatch_func *func = &data->funcs[i];
++ struct livepatch_fstate *state = &data->fstate[i];
++
++ /* If the action has been already executed on this function, do nothing. */
++ if ( state->applied == LIVEPATCH_FUNC_APPLIED )
++ {
++ printk(XENLOG_WARNING LIVEPATCH
++ "%s: %s has been already applied before\n",
++ __func__, func->name);
++ continue;
++ }
++
++ arch_livepatch_apply(func, state);
++ state->applied = LIVEPATCH_FUNC_APPLIED;
++ }
+
+ arch_livepatch_revive();
+
+@@ -1326,7 +1341,7 @@ static inline void apply_payload_tail(struct payload *data)
+ data->state = LIVEPATCH_STATE_APPLIED;
+ }
+
+-static int revert_payload(struct payload *data)
++int revert_payload(struct payload *data)
+ {
+ unsigned int i;
+ int rc;
+@@ -1341,7 +1356,25 @@ static int revert_payload(struct payload *data)
+ }
+
+ for ( i = 0; i < data->nfuncs; i++ )
+- common_livepatch_revert(&data->funcs[i], &data->fstate[i]);
++ {
++ const struct livepatch_func *func = &data->funcs[i];
++ struct livepatch_fstate *state = &data->fstate[i];
++
++ /*
++ * If the apply action hasn't been executed on this function, do
++ * nothing.
++ */
++ if ( !func->old_addr || state->applied == LIVEPATCH_FUNC_NOT_APPLIED )
++ {
++ printk(XENLOG_WARNING LIVEPATCH
++ "%s: %s has not been applied before\n",
++ __func__, func->name);
++ continue;
++ }
++
++ arch_livepatch_revert(func, state);
++ state->applied = LIVEPATCH_FUNC_NOT_APPLIED;
++ }
+
+ /*
+ * Since we are running with IRQs disabled and the hooks may call common
+@@ -1359,7 +1392,7 @@ static int revert_payload(struct payload *data)
+ return 0;
+ }
+
+-static inline void revert_payload_tail(struct payload *data)
++void revert_payload_tail(struct payload *data)
+ {
+ list_del(&data->applied_list);
+
+diff --git a/xen/include/xen/livepatch.h b/xen/include/xen/livepatch.h
+index 537d3d58b6..c9ee58fd37 100644
+--- a/xen/include/xen/livepatch.h
++++ b/xen/include/xen/livepatch.h
+@@ -136,35 +136,11 @@ void arch_livepatch_post_action(void);
+ void arch_livepatch_mask(void);
+ void arch_livepatch_unmask(void);
+
+-static inline void common_livepatch_apply(const struct livepatch_func *func,
+- struct livepatch_fstate *state)
+-{
+- /* If the action has been already executed on this function, do nothing. */
+- if ( state->applied == LIVEPATCH_FUNC_APPLIED )
+- {
+- printk(XENLOG_WARNING LIVEPATCH "%s: %s has been already applied before\n",
+- __func__, func->name);
+- return;
+- }
+-
+- arch_livepatch_apply(func, state);
+- state->applied = LIVEPATCH_FUNC_APPLIED;
+-}
++/* Only for testing purposes. */
++struct payload;
++int revert_payload(struct payload *data);
++void revert_payload_tail(struct payload *data);
+
+-static inline void common_livepatch_revert(const struct livepatch_func *func,
+- struct livepatch_fstate *state)
+-{
+- /* If the apply action hasn't been executed on this function, do nothing. */
+- if ( !func->old_addr || state->applied == LIVEPATCH_FUNC_NOT_APPLIED )
+- {
+- printk(XENLOG_WARNING LIVEPATCH "%s: %s has not been applied before\n",
+- __func__, func->name);
+- return;
+- }
+-
+- arch_livepatch_revert(func, state);
+- state->applied = LIVEPATCH_FUNC_NOT_APPLIED;
+-}
+ #else
+
+ /*
+diff --git a/xen/test/livepatch/xen_action_hooks_norevert.c b/xen/test/livepatch/xen_action_hooks_norevert.c
+index c173855192..c5fbab1746 100644
+--- a/xen/test/livepatch/xen_action_hooks_norevert.c
++++ b/xen/test/livepatch/xen_action_hooks_norevert.c
+@@ -96,26 +96,14 @@ static int revert_hook(livepatch_payload_t *payload)
+
+ static void post_revert_hook(livepatch_payload_t *payload)
+ {
+- int i;
++ unsigned long flags;
+
+ printk(KERN_DEBUG "%s: Hook starting.\n", __func__);
+
+- for (i = 0; i < payload->nfuncs; i++)
+- {
+- const struct livepatch_func *func = &payload->funcs[i];
+- struct livepatch_fstate *fstate = &payload->fstate[i];
+-
+- BUG_ON(revert_cnt != 1);
+- BUG_ON(fstate->applied != LIVEPATCH_FUNC_APPLIED);
+-
+- /* Outside of quiesce zone: MAY TRIGGER HOST CRASH/UNDEFINED BEHAVIOR */
+- arch_livepatch_quiesce();
+- common_livepatch_revert(payload);
+- arch_livepatch_revive();
+- BUG_ON(fstate->applied == LIVEPATCH_FUNC_APPLIED);
+-
+- printk(KERN_DEBUG "%s: post reverted: %s\n", __func__, func->name);
+- }
++ local_irq_save(flags);
++ BUG_ON(revert_payload(payload));
++ revert_payload_tail(payload);
++ local_irq_restore(flags);
+
+ printk(KERN_DEBUG "%s: Hook done.\n", __func__);
+ }
+--
+2.44.0
+
diff --git a/0033-xen-livepatch-properly-build-the-noapply-and-norever.patch b/0033-xen-livepatch-properly-build-the-noapply-and-norever.patch
new file mode 100644
index 0000000..76803c6
--- /dev/null
+++ b/0033-xen-livepatch-properly-build-the-noapply-and-norever.patch
@@ -0,0 +1,43 @@
+From a59106b27609b6ae2873bd6755949b1258290872 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 11:59:51 +0100
+Subject: [PATCH 33/67] xen/livepatch: properly build the noapply and norevert
+ tests
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+It seems the build variables for those tests were copy-pasted from
+xen_action_hooks_marker-objs and not adjusted to use the correct source files.
+
+Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: e579677095782c7dec792597ba8b037b7d716b32
+master date: 2024-02-28 16:57:25 +0000
+---
+ xen/test/livepatch/Makefile | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/xen/test/livepatch/Makefile b/xen/test/livepatch/Makefile
+index c258ab0b59..d987a8367f 100644
+--- a/xen/test/livepatch/Makefile
++++ b/xen/test/livepatch/Makefile
+@@ -118,12 +118,12 @@ xen_action_hooks_marker-objs := xen_action_hooks_marker.o xen_hello_world_func.o
+ $(obj)/xen_action_hooks_noapply.o: $(obj)/config.h
+
+ extra-y += xen_action_hooks_noapply.livepatch
+-xen_action_hooks_noapply-objs := xen_action_hooks_marker.o xen_hello_world_func.o note.o xen_note.o
++xen_action_hooks_noapply-objs := xen_action_hooks_noapply.o xen_hello_world_func.o note.o xen_note.o
+
+ $(obj)/xen_action_hooks_norevert.o: $(obj)/config.h
+
+ extra-y += xen_action_hooks_norevert.livepatch
+-xen_action_hooks_norevert-objs := xen_action_hooks_marker.o xen_hello_world_func.o note.o xen_note.o
++xen_action_hooks_norevert-objs := xen_action_hooks_norevert.o xen_hello_world_func.o note.o xen_note.o
+
+ EXPECT_BYTES_COUNT := 8
+ CODE_GET_EXPECT=$(shell $(OBJDUMP) -d --insn-width=1 $(1) | sed -n -e '/<'$(2)'>:$$/,/^$$/ p' | tail -n +2 | head -n $(EXPECT_BYTES_COUNT) | awk '{$$0=$$2; printf "%s", substr($$0,length-1)}' | sed 's/.\{2\}/0x&,/g' | sed 's/^/{/;s/,$$/}/g')
+--
+2.44.0
+
diff --git a/0034-libxl-Fix-segfault-in-device_model_spawn_outcome.patch b/0034-libxl-Fix-segfault-in-device_model_spawn_outcome.patch
new file mode 100644
index 0000000..7f23a73
--- /dev/null
+++ b/0034-libxl-Fix-segfault-in-device_model_spawn_outcome.patch
@@ -0,0 +1,39 @@
+From c4ee68eda9937743527fff41f4ede0f6a3228080 Mon Sep 17 00:00:00 2001
+From: Jason Andryuk <jandryuk@gmail.com>
+Date: Tue, 5 Mar 2024 12:00:30 +0100
+Subject: [PATCH 34/67] libxl: Fix segfault in device_model_spawn_outcome
+
+libxl__spawn_qdisk_backend() explicitly sets guest_config to NULL when
+starting QEMU (the usual launch through libxl__spawn_local_dm() has a
+guest_config though).
+
+Bail early on a NULL guest_config/d_config. This skips the QMP queries
+for chardevs and VNC, but this xenpv QEMU instance isn't expected to
+provide those - only qdisk (or 9pfs backends after an upcoming change).
+
+Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
+Acked-by: Anthony PERARD <anthony.perard@citrix.com>
+master commit: d4f3d35f043f6ef29393166b0dd131c8102cf255
+master date: 2024-02-29 08:18:38 +0100
+---
+ tools/libs/light/libxl_dm.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
+index ed620a9d8e..29b43ed20a 100644
+--- a/tools/libs/light/libxl_dm.c
++++ b/tools/libs/light/libxl_dm.c
+@@ -3172,8 +3172,8 @@ static void device_model_spawn_outcome(libxl__egc *egc,
+
+ /* Check if spawn failed */
+ if (rc) goto out;
+-
+- if (d_config->b_info.device_model_version
++ /* d_config is NULL for xl devd/libxl__spawn_qemu_xenpv_backend(). */
++ if (d_config && d_config->b_info.device_model_version
+ == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
+ rc = libxl__ev_time_register_rel(ao, &dmss->timeout,
+ devise_model_postconfig_timeout,
+--
+2.44.0
+
diff --git a/0035-x86-altcall-always-use-a-temporary-parameter-stashin.patch b/0035-x86-altcall-always-use-a-temporary-parameter-stashin.patch
new file mode 100644
index 0000000..177c73b
--- /dev/null
+++ b/0035-x86-altcall-always-use-a-temporary-parameter-stashin.patch
@@ -0,0 +1,197 @@
+From 2f49d9f89c14519d4cb1e06ab8370cf4ba50fab7 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 5 Mar 2024 12:00:47 +0100
+Subject: [PATCH 35/67] x86/altcall: always use a temporary parameter stashing
+ variable
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The usage in ALT_CALL_ARG() on clang of:
+
+register union {
+ typeof(arg) e;
+ const unsigned long r;
+} ...
+
+When `arg` is the first argument to alternative_{,v}call() and
+const_vlapic_vcpu() is used, clang 3.5.0 complains with:
+
+arch/x86/hvm/vlapic.c:141:47: error: non-const static data member must be initialized out of line
+ alternative_call(hvm_funcs.test_pir, const_vlapic_vcpu(vlapic), vec) )
+
+Work around this by pulling `arg1` into a local variable, as is done for the
+further arguments (arg2, arg3...).
+
+Originally arg1 wasn't pulled into a variable because, for the a1_ register
+local variable, the possible clobbering as a result of operators on other
+variables doesn't matter:
+
+https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
+
+Note that clang 3.8.1 seems to already be fixed and doesn't require the
+workaround, but since it's harmless, apply it uniformly everywhere.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: c20850540ad6a32f4fc17bde9b01c92b0df18bf0
+master date: 2024-02-29 08:21:49 +0100
+---
+ xen/arch/x86/include/asm/alternative.h | 36 +++++++++++++++++---------
+ 1 file changed, 24 insertions(+), 12 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
+index bcb1dc94f4..fa04481316 100644
+--- a/xen/arch/x86/include/asm/alternative.h
++++ b/xen/arch/x86/include/asm/alternative.h
+@@ -253,21 +253,24 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_vcall1(func, arg) ({ \
+- ALT_CALL_ARG(arg, 1); \
++ typeof(arg) v1_ = (arg); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_NO_ARG2; \
+ (void)sizeof(func(arg)); \
+ (void)alternative_callN(1, int, func); \
+ })
+
+ #define alternative_call1(func, arg) ({ \
+- ALT_CALL_ARG(arg, 1); \
++ typeof(arg) v1_ = (arg); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_NO_ARG2; \
+ alternative_callN(1, typeof(func(arg)), func); \
+ })
+
+ #define alternative_vcall2(func, arg1, arg2) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_NO_ARG3; \
+ (void)sizeof(func(arg1, arg2)); \
+@@ -275,17 +278,19 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_call2(func, arg1, arg2) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_NO_ARG3; \
+ alternative_callN(2, typeof(func(arg1, arg2)), func); \
+ })
+
+ #define alternative_vcall3(func, arg1, arg2, arg3) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_NO_ARG4; \
+@@ -294,9 +299,10 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_call3(func, arg1, arg2, arg3) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_NO_ARG4; \
+@@ -305,10 +311,11 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_vcall4(func, arg1, arg2, arg3, arg4) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+ typeof(arg4) v4_ = (arg4); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_ARG(v4_, 4); \
+@@ -318,10 +325,11 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_call4(func, arg1, arg2, arg3, arg4) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+ typeof(arg4) v4_ = (arg4); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_ARG(v4_, 4); \
+@@ -332,11 +340,12 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_vcall5(func, arg1, arg2, arg3, arg4, arg5) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+ typeof(arg4) v4_ = (arg4); \
+ typeof(arg5) v5_ = (arg5); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_ARG(v4_, 4); \
+@@ -347,11 +356,12 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_call5(func, arg1, arg2, arg3, arg4, arg5) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+ typeof(arg4) v4_ = (arg4); \
+ typeof(arg5) v5_ = (arg5); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_ARG(v4_, 4); \
+@@ -363,12 +373,13 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_vcall6(func, arg1, arg2, arg3, arg4, arg5, arg6) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+ typeof(arg4) v4_ = (arg4); \
+ typeof(arg5) v5_ = (arg5); \
+ typeof(arg6) v6_ = (arg6); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_ARG(v4_, 4); \
+@@ -379,12 +390,13 @@ extern void alternative_branches(void);
+ })
+
+ #define alternative_call6(func, arg1, arg2, arg3, arg4, arg5, arg6) ({ \
++ typeof(arg1) v1_ = (arg1); \
+ typeof(arg2) v2_ = (arg2); \
+ typeof(arg3) v3_ = (arg3); \
+ typeof(arg4) v4_ = (arg4); \
+ typeof(arg5) v5_ = (arg5); \
+ typeof(arg6) v6_ = (arg6); \
+- ALT_CALL_ARG(arg1, 1); \
++ ALT_CALL_ARG(v1_, 1); \
+ ALT_CALL_ARG(v2_, 2); \
+ ALT_CALL_ARG(v3_, 3); \
+ ALT_CALL_ARG(v4_, 4); \
+--
+2.44.0
+
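The underlying clang quirk involves typeof() applied to the result of
const-returning helpers such as const_vlapic_vcpu(), but the shape of the fix,
stashing every macro argument in a typeof()-typed temporary inside a GNU
statement expression, can be shown in isolation. The sketch below is not Xen's
ALT_CALL machinery; CALL1() and add_one() are made-up names, and it relies on
GNU C extensions (gcc/clang).

#include <stdio.h>

/* Stash the argument in a typeof()-typed temporary, then use only the copy. */
#define CALL1(func, arg) ({        \
    typeof(arg) v1_ = (arg);       \
    func(v1_);                     \
})

static int add_one(int x)
{
    return x + 1;
}

int main(void)
{
    int i = 41;

    printf("%d\n", CALL1(add_one, i++));   /* prints 42 */
    printf("i is now %d\n", i);            /* prints 42 */

    return 0;
}
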
diff --git a/0036-x86-cpu-policy-Allow-for-levelling-of-VERW-side-effe.patch b/0036-x86-cpu-policy-Allow-for-levelling-of-VERW-side-effe.patch
new file mode 100644
index 0000000..b91ff52
--- /dev/null
+++ b/0036-x86-cpu-policy-Allow-for-levelling-of-VERW-side-effe.patch
@@ -0,0 +1,102 @@
+From 54dacb5c02cba4676879ed077765734326b78e39 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 5 Mar 2024 12:01:22 +0100
+Subject: [PATCH 36/67] x86/cpu-policy: Allow for levelling of VERW side
+ effects
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+MD_CLEAR and FB_CLEAR need OR-ing across a migration pool. Allow this by
+having them unconditionally set in max, with the host values reflected in
+default. Annotate the bits as having special properties.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: de17162cafd27f2865a3102a2ec0f386a02ed03d
+master date: 2024-03-01 20:14:19 +0000
+---
+ xen/arch/x86/cpu-policy.c | 24 +++++++++++++++++++++
+ xen/arch/x86/include/asm/cpufeature.h | 1 +
+ xen/include/public/arch-x86/cpufeatureset.h | 4 ++--
+ 3 files changed, 27 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
+index f0f2c8a1c0..7b875a7221 100644
+--- a/xen/arch/x86/cpu-policy.c
++++ b/xen/arch/x86/cpu-policy.c
+@@ -435,6 +435,16 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
+ __set_bit(X86_FEATURE_RSBA, fs);
+ __set_bit(X86_FEATURE_RRSBA, fs);
+
++ /*
++ * These bits indicate that the VERW instruction may have gained
++ * scrubbing side effects. With pooling, they mean "you might migrate
++ * somewhere where scrubbing is necessary", and may need exposing on
++ * unaffected hardware. This is fine, because the VERW instruction
++ * has been around since the 286.
++ */
++ __set_bit(X86_FEATURE_MD_CLEAR, fs);
++ __set_bit(X86_FEATURE_FB_CLEAR, fs);
++
+ /*
+ * The Gather Data Sampling microcode mitigation (August 2023) has an
+ * adverse performance impact on the CLWB instruction on SKX/CLX/CPX.
+@@ -469,6 +479,20 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
+ cpu_has_rdrand && !is_forced_cpu_cap(X86_FEATURE_RDRAND) )
+ __clear_bit(X86_FEATURE_RDRAND, fs);
+
++ /*
++ * These bits indicate that the VERW instruction may have gained
++ * scrubbing side effects. The max policy has them set for migration
++ * reasons, so reset the default policy back to the host values in
++ * case we're unaffected.
++ */
++ __clear_bit(X86_FEATURE_MD_CLEAR, fs);
++ if ( cpu_has_md_clear )
++ __set_bit(X86_FEATURE_MD_CLEAR, fs);
++
++ __clear_bit(X86_FEATURE_FB_CLEAR, fs);
++ if ( cpu_has_fb_clear )
++ __set_bit(X86_FEATURE_FB_CLEAR, fs);
++
+ /*
+ * The Gather Data Sampling microcode mitigation (August 2023) has an
+ * adverse performance impact on the CLWB instruction on SKX/CLX/CPX.
+diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
+index 9ef7756593..ec824e8954 100644
+--- a/xen/arch/x86/include/asm/cpufeature.h
++++ b/xen/arch/x86/include/asm/cpufeature.h
+@@ -136,6 +136,7 @@
+ #define cpu_has_avx512_4fmaps boot_cpu_has(X86_FEATURE_AVX512_4FMAPS)
+ #define cpu_has_avx512_vp2intersect boot_cpu_has(X86_FEATURE_AVX512_VP2INTERSECT)
+ #define cpu_has_srbds_ctrl boot_cpu_has(X86_FEATURE_SRBDS_CTRL)
++#define cpu_has_md_clear boot_cpu_has(X86_FEATURE_MD_CLEAR)
+ #define cpu_has_rtm_always_abort boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)
+ #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
+ #define cpu_has_serialize boot_cpu_has(X86_FEATURE_SERIALIZE)
+diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
+index 94d211df2f..aec1407613 100644
+--- a/xen/include/public/arch-x86/cpufeatureset.h
++++ b/xen/include/public/arch-x86/cpufeatureset.h
+@@ -260,7 +260,7 @@ XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A AVX512 Multiply Accumulation Single
+ XEN_CPUFEATURE(FSRM, 9*32+ 4) /*A Fast Short REP MOVS */
+ XEN_CPUFEATURE(AVX512_VP2INTERSECT, 9*32+8) /*a VP2INTERSECT{D,Q} insns */
+ XEN_CPUFEATURE(SRBDS_CTRL, 9*32+ 9) /* MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS. */
+-XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*A VERW clears microarchitectural buffers */
++XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*!A VERW clears microarchitectural buffers */
+ XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! June 2021 TSX defeaturing in microcode. */
+ XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
+ XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A SERIALIZE insn */
+@@ -321,7 +321,7 @@ XEN_CPUFEATURE(DOITM, 16*32+12) /* Data Operand Invariant Timing
+ XEN_CPUFEATURE(SBDR_SSDP_NO, 16*32+13) /*A No Shared Buffer Data Read or Sideband Stale Data Propagation */
+ XEN_CPUFEATURE(FBSDP_NO, 16*32+14) /*A No Fill Buffer Stale Data Propagation */
+ XEN_CPUFEATURE(PSDP_NO, 16*32+15) /*A No Primary Stale Data Propagation */
+-XEN_CPUFEATURE(FB_CLEAR, 16*32+17) /*A Fill Buffers cleared by VERW */
++XEN_CPUFEATURE(FB_CLEAR, 16*32+17) /*!A Fill Buffers cleared by VERW */
+ XEN_CPUFEATURE(FB_CLEAR_CTRL, 16*32+18) /* MSR_OPT_CPU_CTRL.FB_CLEAR_DIS */
+ XEN_CPUFEATURE(RRSBA, 16*32+19) /*! Restricted RSB Alternative */
+ XEN_CPUFEATURE(BHI_NO, 16*32+20) /*A No Branch History Injection */
+--
+2.44.0
+
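For readers outside the CPU policy code, the max/default split can be pictured
with a couple of bitmap operations: the max policy always advertises the bit so
a migration pool can be levelled, while the default policy mirrors what the
host really has. The sketch below is only an illustration; FEAT_MD_CLEAR,
set_feat() and clear_feat() are made-up names, and plain bit operations stand
in for Xen's __set_bit()/__clear_bit() helpers.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FEAT_MD_CLEAR 10                  /* illustrative bit position only */

static void set_feat(uint32_t *fs, unsigned int bit)
{
    fs[bit / 32] |= 1u << (bit % 32);
}

static void clear_feat(uint32_t *fs, unsigned int bit)
{
    fs[bit / 32] &= ~(1u << (bit % 32));
}

int main(void)
{
    bool host_has_md_clear = false;       /* pretend the host lacks it */
    uint32_t max_fs[1] = { 0 }, def_fs[1] = { 0 };

    /* Max policy: always set, so migration pools can be levelled up. */
    set_feat(max_fs, FEAT_MD_CLEAR);

    /* Default policy: reset to the host value in case we're unaffected. */
    clear_feat(def_fs, FEAT_MD_CLEAR);
    if ( host_has_md_clear )
        set_feat(def_fs, FEAT_MD_CLEAR);

    printf("max=%#x default=%#x\n", max_fs[0], def_fs[0]);

    return 0;
}
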
diff --git a/0037-hvmloader-PCI-skip-huge-BARs-in-certain-calculations.patch b/0037-hvmloader-PCI-skip-huge-BARs-in-certain-calculations.patch
new file mode 100644
index 0000000..a46f913
--- /dev/null
+++ b/0037-hvmloader-PCI-skip-huge-BARs-in-certain-calculations.patch
@@ -0,0 +1,99 @@
+From 1e9808227c10717228969e924cab49cad4af6265 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 12 Mar 2024 12:08:48 +0100
+Subject: [PATCH 37/67] hvmloader/PCI: skip huge BARs in certain calculations
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+BARs of size 2Gb and up can't possibly fit below 4Gb: Both the bottom of
+the lower 2Gb range and the top of the higher 2Gb range have a special
+purpose. Don't even have them influence whether to (perhaps) relocate
+low RAM.
+
+Reported-by: Neowutran <xen@neowutran.ovh>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 57acad12a09ffa490e870ebe17596aad858f0191
+master date: 2024-03-06 10:19:29 +0100
+---
+ tools/firmware/hvmloader/pci.c | 28 ++++++++++++++++++++--------
+ 1 file changed, 20 insertions(+), 8 deletions(-)
+
+diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
+index 257a6feb61..c3c61ca060 100644
+--- a/tools/firmware/hvmloader/pci.c
++++ b/tools/firmware/hvmloader/pci.c
+@@ -33,6 +33,13 @@ uint32_t pci_mem_start = HVM_BELOW_4G_MMIO_START;
+ const uint32_t pci_mem_end = RESERVED_MEMBASE;
+ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
+
++/*
++ * BARs larger than this value are put in 64-bit space unconditionally. That
++ * is, such BARs also don't play into the determination of how big the lowmem
++ * MMIO hole needs to be.
++ */
++#define BAR_RELOC_THRESH GB(1)
++
+ enum virtual_vga virtual_vga = VGA_none;
+ unsigned long igd_opregion_pgbase = 0;
+
+@@ -286,9 +293,11 @@ void pci_setup(void)
+ bars[i].bar_reg = bar_reg;
+ bars[i].bar_sz = bar_sz;
+
+- if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
+- PCI_BASE_ADDRESS_SPACE_MEMORY) ||
+- (bar_reg == PCI_ROM_ADDRESS) )
++ if ( is_64bar && bar_sz > BAR_RELOC_THRESH )
++ bar64_relocate = 1;
++ else if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
++ PCI_BASE_ADDRESS_SPACE_MEMORY) ||
++ (bar_reg == PCI_ROM_ADDRESS) )
+ mmio_total += bar_sz;
+
+ nr_bars++;
+@@ -367,7 +376,7 @@ void pci_setup(void)
+ pci_mem_start = hvm_info->low_mem_pgend << PAGE_SHIFT;
+ }
+
+- if ( mmio_total > (pci_mem_end - pci_mem_start) )
++ if ( mmio_total > (pci_mem_end - pci_mem_start) || bar64_relocate )
+ {
+ printf("Low MMIO hole not large enough for all devices,"
+ " relocating some BARs to 64-bit\n");
+@@ -430,7 +439,8 @@ void pci_setup(void)
+
+ /*
+ * Relocate to high memory if the total amount of MMIO needed
+- * is more than the low MMIO available. Because devices are
++ * is more than the low MMIO available or BARs bigger than
++ * BAR_RELOC_THRESH are present. Because devices are
+ * processed in order of bar_sz, this will preferentially
+ * relocate larger devices to high memory first.
+ *
+@@ -446,8 +456,9 @@ void pci_setup(void)
+ * the code here assumes it to be.)
+ * Should either of those two conditions change, this code will break.
+ */
+- using_64bar = bars[i].is_64bar && bar64_relocate
+- && (mmio_total > (mem_resource.max - mem_resource.base));
++ using_64bar = bars[i].is_64bar && bar64_relocate &&
++ (mmio_total > (mem_resource.max - mem_resource.base) ||
++ bar_sz > BAR_RELOC_THRESH);
+ bar_data = pci_readl(devfn, bar_reg);
+
+ if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
+@@ -467,7 +478,8 @@ void pci_setup(void)
+ resource = &mem_resource;
+ bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
+ }
+- mmio_total -= bar_sz;
++ if ( bar_sz <= BAR_RELOC_THRESH )
++ mmio_total -= bar_sz;
+ }
+ else
+ {
+--
+2.44.0
+
diff --git a/0038-x86-mm-fix-detection-of-last-L1-entry-in-modify_xen_.patch b/0038-x86-mm-fix-detection-of-last-L1-entry-in-modify_xen_.patch
new file mode 100644
index 0000000..66b4db3
--- /dev/null
+++ b/0038-x86-mm-fix-detection-of-last-L1-entry-in-modify_xen_.patch
@@ -0,0 +1,41 @@
+From 1f94117bec55a7b934fed3dfd3529db624eb441f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 12 Mar 2024 12:08:59 +0100
+Subject: [PATCH 38/67] x86/mm: fix detection of last L1 entry in
+ modify_xen_mappings_lite()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current logic to detect when to switch to the next L1 table is incorrectly
+using l2_table_offset() in order to notice when the last entry on the current
+L1 table has been reached.
+
+It should instead use l1_table_offset() to check whether the index has wrapped
+to point to the first entry, and so the next L1 table should be used.
+
+Fixes: 8676092a0f16 ('x86/livepatch: Fix livepatch application when CET is active')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 7c81558208de7858251b62f168a449be84305595
+master date: 2024-03-11 11:09:42 +0000
+---
+ xen/arch/x86/mm.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index e884a6fdbd..330c4abcd1 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -5963,7 +5963,7 @@ void init_or_livepatch modify_xen_mappings_lite(
+
+ v += 1UL << L1_PAGETABLE_SHIFT;
+
+- if ( l2_table_offset(v) == 0 )
++ if ( l1_table_offset(v) == 0 )
+ break;
+ }
+
+--
+2.44.0
+
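To see why l1_table_offset() is the right check, the standalone sketch below
recomputes both indices while stepping page by page across a 2 MiB boundary,
i.e. across the coverage of one L1 table. The shift and entry-count values are
the standard x86-64 ones written out locally, and l1_idx()/l2_idx() are made-up
stand-ins for Xen's l1_table_offset()/l2_table_offset().

#include <stdio.h>

#define PAGE_SHIFT       12
#define PAGETABLE_ORDER   9
#define PT_ENTRIES       (1UL << PAGETABLE_ORDER)

static unsigned long l1_idx(unsigned long va)
{
    return (va >> PAGE_SHIFT) & (PT_ENTRIES - 1);
}

static unsigned long l2_idx(unsigned long va)
{
    return (va >> (PAGE_SHIFT + PAGETABLE_ORDER)) & (PT_ENTRIES - 1);
}

int main(void)
{
    /* Start two pages short of the 6 MiB boundary and walk across it. */
    unsigned long va = (6UL << 20) - 2 * (1UL << PAGE_SHIFT);

    for ( unsigned int i = 0; i < 4; i++, va += 1UL << PAGE_SHIFT )
        printf("va=%#lx l1=%3lu l2=%lu -> %s\n", va, l1_idx(va), l2_idx(va),
               l1_idx(va) ? "same L1 table" : "next L1 table");

    /*
     * l1_idx() wraps to 0 exactly when the 2 MiB boundary is crossed; l2_idx()
     * stays at 2 and then 3 throughout, and is only ever 0 for the first 2 MiB
     * of each 1 GiB region, which is why the original check misfired.
     */
    return 0;
}
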
diff --git a/0039-x86-entry-Introduce-EFRAME_-constants.patch b/0039-x86-entry-Introduce-EFRAME_-constants.patch
new file mode 100644
index 0000000..c280286
--- /dev/null
+++ b/0039-x86-entry-Introduce-EFRAME_-constants.patch
@@ -0,0 +1,314 @@
+From e691f99f17198906f813b85dcabafe5addb9a57a Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Sat, 27 Jan 2024 17:52:09 +0000
+Subject: [PATCH 39/67] x86/entry: Introduce EFRAME_* constants
+
+restore_all_guest() does a lot of manipulation of the stack after popping the
+GPRs, and uses raw %rsp displacements to do so. Also, almost all entry paths
+use raw %rsp displacements prior to pushing GPRs.
+
+Provide better mnemonics, to aid readability and reduce the chance of errors
+when editing.
+
+No functional change. The resulting binary is identical.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 37541208f119a9c552c6c6c3246ea61be0d44035)
+---
+ xen/arch/x86/x86_64/asm-offsets.c | 17 ++++++++
+ xen/arch/x86/x86_64/compat/entry.S | 2 +-
+ xen/arch/x86/x86_64/entry.S | 70 +++++++++++++++---------------
+ 3 files changed, 53 insertions(+), 36 deletions(-)
+
+diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
+index 287dac101a..31fa63b77f 100644
+--- a/xen/arch/x86/x86_64/asm-offsets.c
++++ b/xen/arch/x86/x86_64/asm-offsets.c
+@@ -51,6 +51,23 @@ void __dummy__(void)
+ OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es);
+ BLANK();
+
++ /*
++ * EFRAME_* is for the entry/exit logic where %rsp is pointing at
++ * UREGS_error_code and GPRs are still/already guest values.
++ */
++#define OFFSET_EF(sym, mem) \
++ DEFINE(sym, offsetof(struct cpu_user_regs, mem) - \
++ offsetof(struct cpu_user_regs, error_code))
++
++ OFFSET_EF(EFRAME_entry_vector, entry_vector);
++ OFFSET_EF(EFRAME_rip, rip);
++ OFFSET_EF(EFRAME_cs, cs);
++ OFFSET_EF(EFRAME_eflags, eflags);
++ OFFSET_EF(EFRAME_rsp, rsp);
++ BLANK();
++
++#undef OFFSET_EF
++
+ OFFSET(VCPU_processor, struct vcpu, processor);
+ OFFSET(VCPU_domain, struct vcpu, domain);
+ OFFSET(VCPU_vcpu_info, struct vcpu, vcpu_info);
+diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
+index 253bb1688c..7c211314d8 100644
+--- a/xen/arch/x86/x86_64/compat/entry.S
++++ b/xen/arch/x86/x86_64/compat/entry.S
+@@ -15,7 +15,7 @@ ENTRY(entry_int82)
+ ENDBR64
+ ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
+ pushq $0
+- movl $HYPERCALL_VECTOR, 4(%rsp)
++ movl $HYPERCALL_VECTOR, EFRAME_entry_vector(%rsp)
+ SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
+
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index 585b0c9551..412cbeb3ec 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -190,15 +190,15 @@ restore_all_guest:
+ SPEC_CTRL_EXIT_TO_PV /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+
+ RESTORE_ALL
+- testw $TRAP_syscall,4(%rsp)
++ testw $TRAP_syscall, EFRAME_entry_vector(%rsp)
+ jz iret_exit_to_guest
+
+- movq 24(%rsp),%r11 # RFLAGS
++ mov EFRAME_eflags(%rsp), %r11
+ andq $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), %r11
+ orq $X86_EFLAGS_IF,%r11
+
+ /* Don't use SYSRET path if the return address is not canonical. */
+- movq 8(%rsp),%rcx
++ mov EFRAME_rip(%rsp), %rcx
+ sarq $47,%rcx
+ incl %ecx
+ cmpl $1,%ecx
+@@ -213,20 +213,20 @@ restore_all_guest:
+ ALTERNATIVE "", rag_clrssbsy, X86_FEATURE_XEN_SHSTK
+ #endif
+
+- movq 8(%rsp), %rcx # RIP
+- cmpw $FLAT_USER_CS32,16(%rsp)# CS
+- movq 32(%rsp),%rsp # RSP
++ mov EFRAME_rip(%rsp), %rcx
++ cmpw $FLAT_USER_CS32, EFRAME_cs(%rsp)
++ mov EFRAME_rsp(%rsp), %rsp
+ je 1f
+ sysretq
+ 1: sysretl
+
+ ALIGN
+ .Lrestore_rcx_iret_exit_to_guest:
+- movq 8(%rsp), %rcx # RIP
++ mov EFRAME_rip(%rsp), %rcx
+ /* No special register assumptions. */
+ iret_exit_to_guest:
+- andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), 24(%rsp)
+- orl $X86_EFLAGS_IF,24(%rsp)
++ andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), EFRAME_eflags(%rsp)
++ orl $X86_EFLAGS_IF, EFRAME_eflags(%rsp)
+ addq $8,%rsp
+ .Lft0: iretq
+ _ASM_PRE_EXTABLE(.Lft0, handle_exception)
+@@ -257,7 +257,7 @@ ENTRY(lstar_enter)
+ pushq $FLAT_KERNEL_CS64
+ pushq %rcx
+ pushq $0
+- movl $TRAP_syscall, 4(%rsp)
++ movl $TRAP_syscall, EFRAME_entry_vector(%rsp)
+ SAVE_ALL
+
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+@@ -294,7 +294,7 @@ ENTRY(cstar_enter)
+ pushq $FLAT_USER_CS32
+ pushq %rcx
+ pushq $0
+- movl $TRAP_syscall, 4(%rsp)
++ movl $TRAP_syscall, EFRAME_entry_vector(%rsp)
+ SAVE_ALL
+
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+@@ -335,7 +335,7 @@ GLOBAL(sysenter_eflags_saved)
+ pushq $3 /* ring 3 null cs */
+ pushq $0 /* null rip */
+ pushq $0
+- movl $TRAP_syscall, 4(%rsp)
++ movl $TRAP_syscall, EFRAME_entry_vector(%rsp)
+ SAVE_ALL
+
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+@@ -389,7 +389,7 @@ ENTRY(int80_direct_trap)
+ ENDBR64
+ ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
+ pushq $0
+- movl $0x80, 4(%rsp)
++ movl $0x80, EFRAME_entry_vector(%rsp)
+ SAVE_ALL
+
+ SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
+@@ -649,7 +649,7 @@ ret_from_intr:
+ .section .init.text, "ax", @progbits
+ ENTRY(early_page_fault)
+ ENDBR64
+- movl $TRAP_page_fault, 4(%rsp)
++ movl $TRAP_page_fault, EFRAME_entry_vector(%rsp)
+ SAVE_ALL
+ movq %rsp, %rdi
+ call do_early_page_fault
+@@ -716,7 +716,7 @@ ENTRY(common_interrupt)
+
+ ENTRY(page_fault)
+ ENDBR64
+- movl $TRAP_page_fault,4(%rsp)
++ movl $TRAP_page_fault, EFRAME_entry_vector(%rsp)
+ /* No special register assumptions. */
+ GLOBAL(handle_exception)
+ ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
+@@ -892,90 +892,90 @@ FATAL_exception_with_ints_disabled:
+ ENTRY(divide_error)
+ ENDBR64
+ pushq $0
+- movl $TRAP_divide_error,4(%rsp)
++ movl $TRAP_divide_error, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(coprocessor_error)
+ ENDBR64
+ pushq $0
+- movl $TRAP_copro_error,4(%rsp)
++ movl $TRAP_copro_error, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(simd_coprocessor_error)
+ ENDBR64
+ pushq $0
+- movl $TRAP_simd_error,4(%rsp)
++ movl $TRAP_simd_error, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(device_not_available)
+ ENDBR64
+ pushq $0
+- movl $TRAP_no_device,4(%rsp)
++ movl $TRAP_no_device, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(debug)
+ ENDBR64
+ pushq $0
+- movl $TRAP_debug,4(%rsp)
++ movl $TRAP_debug, EFRAME_entry_vector(%rsp)
+ jmp handle_ist_exception
+
+ ENTRY(int3)
+ ENDBR64
+ pushq $0
+- movl $TRAP_int3,4(%rsp)
++ movl $TRAP_int3, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(overflow)
+ ENDBR64
+ pushq $0
+- movl $TRAP_overflow,4(%rsp)
++ movl $TRAP_overflow, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(bounds)
+ ENDBR64
+ pushq $0
+- movl $TRAP_bounds,4(%rsp)
++ movl $TRAP_bounds, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(invalid_op)
+ ENDBR64
+ pushq $0
+- movl $TRAP_invalid_op,4(%rsp)
++ movl $TRAP_invalid_op, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(invalid_TSS)
+ ENDBR64
+- movl $TRAP_invalid_tss,4(%rsp)
++ movl $TRAP_invalid_tss, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(segment_not_present)
+ ENDBR64
+- movl $TRAP_no_segment,4(%rsp)
++ movl $TRAP_no_segment, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(stack_segment)
+ ENDBR64
+- movl $TRAP_stack_error,4(%rsp)
++ movl $TRAP_stack_error, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(general_protection)
+ ENDBR64
+- movl $TRAP_gp_fault,4(%rsp)
++ movl $TRAP_gp_fault, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(alignment_check)
+ ENDBR64
+- movl $TRAP_alignment_check,4(%rsp)
++ movl $TRAP_alignment_check, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(entry_CP)
+ ENDBR64
+- movl $X86_EXC_CP, 4(%rsp)
++ movl $X86_EXC_CP, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ ENTRY(double_fault)
+ ENDBR64
+- movl $TRAP_double_fault,4(%rsp)
++ movl $TRAP_double_fault, EFRAME_entry_vector(%rsp)
+ /* Set AC to reduce chance of further SMAP faults */
+ ALTERNATIVE "", stac, X86_FEATURE_XEN_SMAP
+ SAVE_ALL
+@@ -1001,7 +1001,7 @@ ENTRY(double_fault)
+ ENTRY(nmi)
+ ENDBR64
+ pushq $0
+- movl $TRAP_nmi,4(%rsp)
++ movl $TRAP_nmi, EFRAME_entry_vector(%rsp)
+ handle_ist_exception:
+ ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
+ SAVE_ALL
+@@ -1134,7 +1134,7 @@ handle_ist_exception:
+ ENTRY(machine_check)
+ ENDBR64
+ pushq $0
+- movl $TRAP_machine_check,4(%rsp)
++ movl $TRAP_machine_check, EFRAME_entry_vector(%rsp)
+ jmp handle_ist_exception
+
+ /* No op trap handler. Required for kexec crash path. */
+@@ -1171,7 +1171,7 @@ autogen_stubs: /* Automatically generated stubs. */
+ 1:
+ ENDBR64
+ pushq $0
+- movb $vec,4(%rsp)
++ movb $vec, EFRAME_entry_vector(%rsp)
+ jmp common_interrupt
+
+ entrypoint 1b
+@@ -1185,7 +1185,7 @@ autogen_stubs: /* Automatically generated stubs. */
+ test $8,%spl /* 64bit exception frames are 16 byte aligned, but the word */
+ jz 2f /* size is 8 bytes. Check whether the processor gave us an */
+ pushq $0 /* error code, and insert an empty one if not. */
+-2: movb $vec,4(%rsp)
++2: movb $vec, EFRAME_entry_vector(%rsp)
+ jmp handle_exception
+
+ entrypoint 1b
+--
+2.44.0
+
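A side note on the technique above: the EFRAME_* constants are simply offsetof()
differences taken relative to error_code, so the assembly displacements keep
tracking the structure automatically. The following standalone sketch is not Xen
code; the struct layout is a simplified stand-in for the tail of struct
cpu_user_regs (the real structure has the GPRs first and uses narrower types for
cs), but it reproduces the same 4/8/16/24/32 displacements the patch replaces.

    #include <stddef.h>
    #include <stdio.h>

    /* Simplified stand-in for the tail of struct cpu_user_regs. */
    struct eframe {
        unsigned int  error_code;   /* %rsp points here on entry */
        unsigned int  entry_vector; /* +4  */
        unsigned long rip;          /* +8  */
        unsigned long cs;           /* +16 */
        unsigned long eflags;       /* +24 */
        unsigned long rsp;          /* +32 */
    };

    #define OFFSET_EF(mem) \
        (offsetof(struct eframe, mem) - offsetof(struct eframe, error_code))

    int main(void)
    {
        printf("EFRAME_entry_vector = %zu\n", OFFSET_EF(entry_vector));
        printf("EFRAME_rip          = %zu\n", OFFSET_EF(rip));
        printf("EFRAME_cs           = %zu\n", OFFSET_EF(cs));
        printf("EFRAME_eflags       = %zu\n", OFFSET_EF(eflags));
        printf("EFRAME_rsp          = %zu\n", OFFSET_EF(rsp));
        return 0;
    }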
diff --git a/0040-x86-Resync-intel-family.h-from-Linux.patch b/0040-x86-Resync-intel-family.h-from-Linux.patch
new file mode 100644
index 0000000..84e0304
--- /dev/null
+++ b/0040-x86-Resync-intel-family.h-from-Linux.patch
@@ -0,0 +1,98 @@
+From abc43cf5a6579f1aa0decf0a2349cdd2d2473117 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 27 Feb 2024 16:07:39 +0000
+Subject: [PATCH 40/67] x86: Resync intel-family.h from Linux
+
+From v6.8-rc6
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 195e75371b13c4f7ecdf7b5c50aed0d02f2d7ce8)
+---
+ xen/arch/x86/include/asm/intel-family.h | 38 ++++++++++++++++++++++---
+ 1 file changed, 34 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/intel-family.h b/xen/arch/x86/include/asm/intel-family.h
+index ffc49151be..b65e9c46b9 100644
+--- a/xen/arch/x86/include/asm/intel-family.h
++++ b/xen/arch/x86/include/asm/intel-family.h
+@@ -26,6 +26,9 @@
+ * _G - parts with extra graphics on
+ * _X - regular server parts
+ * _D - micro server parts
++ * _N,_P - other mobile parts
++ * _H - premium mobile parts
++ * _S - other client parts
+ *
+ * Historical OPTDIFFs:
+ *
+@@ -37,6 +40,9 @@
+ * their own names :-(
+ */
+
++/* Wildcard match for FAM6 so X86_MATCH_INTEL_FAM6_MODEL(ANY) works */
++#define INTEL_FAM6_ANY X86_MODEL_ANY
++
+ #define INTEL_FAM6_CORE_YONAH 0x0E
+
+ #define INTEL_FAM6_CORE2_MEROM 0x0F
+@@ -93,8 +99,6 @@
+ #define INTEL_FAM6_ICELAKE_L 0x7E /* Sunny Cove */
+ #define INTEL_FAM6_ICELAKE_NNPI 0x9D /* Sunny Cove */
+
+-#define INTEL_FAM6_LAKEFIELD 0x8A /* Sunny Cove / Tremont */
+-
+ #define INTEL_FAM6_ROCKETLAKE 0xA7 /* Cypress Cove */
+
+ #define INTEL_FAM6_TIGERLAKE_L 0x8C /* Willow Cove */
+@@ -102,12 +106,31 @@
+
+ #define INTEL_FAM6_SAPPHIRERAPIDS_X 0x8F /* Golden Cove */
+
++#define INTEL_FAM6_EMERALDRAPIDS_X 0xCF
++
++#define INTEL_FAM6_GRANITERAPIDS_X 0xAD
++#define INTEL_FAM6_GRANITERAPIDS_D 0xAE
++
++/* "Hybrid" Processors (P-Core/E-Core) */
++
++#define INTEL_FAM6_LAKEFIELD 0x8A /* Sunny Cove / Tremont */
++
+ #define INTEL_FAM6_ALDERLAKE 0x97 /* Golden Cove / Gracemont */
+ #define INTEL_FAM6_ALDERLAKE_L 0x9A /* Golden Cove / Gracemont */
+
+-#define INTEL_FAM6_RAPTORLAKE 0xB7
++#define INTEL_FAM6_RAPTORLAKE 0xB7 /* Raptor Cove / Enhanced Gracemont */
++#define INTEL_FAM6_RAPTORLAKE_P 0xBA
++#define INTEL_FAM6_RAPTORLAKE_S 0xBF
++
++#define INTEL_FAM6_METEORLAKE 0xAC
++#define INTEL_FAM6_METEORLAKE_L 0xAA
++
++#define INTEL_FAM6_ARROWLAKE_H 0xC5
++#define INTEL_FAM6_ARROWLAKE 0xC6
++
++#define INTEL_FAM6_LUNARLAKE_M 0xBD
+
+-/* "Small Core" Processors (Atom) */
++/* "Small Core" Processors (Atom/E-Core) */
+
+ #define INTEL_FAM6_ATOM_BONNELL 0x1C /* Diamondville, Pineview */
+ #define INTEL_FAM6_ATOM_BONNELL_MID 0x26 /* Silverthorne, Lincroft */
+@@ -134,6 +157,13 @@
+ #define INTEL_FAM6_ATOM_TREMONT 0x96 /* Elkhart Lake */
+ #define INTEL_FAM6_ATOM_TREMONT_L 0x9C /* Jasper Lake */
+
++#define INTEL_FAM6_ATOM_GRACEMONT 0xBE /* Alderlake N */
++
++#define INTEL_FAM6_ATOM_CRESTMONT_X 0xAF /* Sierra Forest */
++#define INTEL_FAM6_ATOM_CRESTMONT 0xB6 /* Grand Ridge */
++
++#define INTEL_FAM6_ATOM_DARKMONT_X 0xDD /* Clearwater Forest */
++
+ /* Xeon Phi */
+
+ #define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
+--
+2.44.0
+
diff --git a/0041-x86-vmx-Perform-VERW-flushing-later-in-the-VMExit-pa.patch b/0041-x86-vmx-Perform-VERW-flushing-later-in-the-VMExit-pa.patch
new file mode 100644
index 0000000..871f10f
--- /dev/null
+++ b/0041-x86-vmx-Perform-VERW-flushing-later-in-the-VMExit-pa.patch
@@ -0,0 +1,146 @@
+From 77f2bec134049aba29b9b459f955022722d10847 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 23 Jun 2023 11:32:00 +0100
+Subject: [PATCH 41/67] x86/vmx: Perform VERW flushing later in the VMExit path
+
+Broken out of the following patch because this change is subtle enough on its
+own. See it for the rationale of why we're moving VERW.
+
+As for how, extend the trick already used to hold one condition in
+flags (RESUME vs LAUNCH) through the POPing of GPRs.
+
+Move the MOV CR earlier. Intel specify flags to be undefined across it.
+
+Encode the two conditions we want using SF and PF. See the code comment for
+exactly how.
+
+Leave a comment to explain the lack of any content around
+SPEC_CTRL_EXIT_TO_VMX, but leave the block in place. Sod's law says if we
+delete it, we'll need to reintroduce it.
+
+This is part of XSA-452 / CVE-2023-28746.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 475fa20b7384464210f42bad7195f87bd6f1c63f)
+---
+ xen/arch/x86/hvm/vmx/entry.S | 36 +++++++++++++++++++++---
+ xen/arch/x86/include/asm/asm_defns.h | 8 ++++++
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 7 +++++
+ xen/arch/x86/x86_64/asm-offsets.c | 1 +
+ 4 files changed, 48 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
+index 5f5de45a13..cdde76e138 100644
+--- a/xen/arch/x86/hvm/vmx/entry.S
++++ b/xen/arch/x86/hvm/vmx/entry.S
+@@ -87,17 +87,39 @@ UNLIKELY_END(realmode)
+
+ /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+ /* SPEC_CTRL_EXIT_TO_VMX Req: %rsp=regs/cpuinfo Clob: */
+- DO_SPEC_CTRL_COND_VERW
++ /*
++ * All speculation safety work happens to be elsewhere. VERW is after
++ * popping the GPRs, while restoring the guest MSR_SPEC_CTRL is left
++ * to the MSR load list.
++ */
+
+ mov VCPU_hvm_guest_cr2(%rbx),%rax
++ mov %rax, %cr2
++
++ /*
++ * We need to perform two conditional actions (VERW, and Resume vs
++ * Launch) after popping GPRs. With some cunning, we can encode both
++ * of these in eflags together.
++ *
++ * Parity is only calculated over the bottom byte of the answer, while
++ * Sign is simply the top bit.
++ *
++ * Therefore, the final OR instruction ends up producing:
++ * SF = VCPU_vmx_launched
++ * PF = !SCF_verw
++ */
++ BUILD_BUG_ON(SCF_verw & ~0xff)
++ movzbl VCPU_vmx_launched(%rbx), %ecx
++ shl $31, %ecx
++ movzbl CPUINFO_spec_ctrl_flags(%rsp), %eax
++ and $SCF_verw, %eax
++ or %eax, %ecx
+
+ pop %r15
+ pop %r14
+ pop %r13
+ pop %r12
+ pop %rbp
+- mov %rax,%cr2
+- cmpb $0,VCPU_vmx_launched(%rbx)
+ pop %rbx
+ pop %r11
+ pop %r10
+@@ -108,7 +130,13 @@ UNLIKELY_END(realmode)
+ pop %rdx
+ pop %rsi
+ pop %rdi
+- je .Lvmx_launch
++
++ jpe .L_skip_verw
++ /* VERW clobbers ZF, but preserves all others, including SF. */
++ verw STK_REL(CPUINFO_verw_sel, CPUINFO_error_code)(%rsp)
++.L_skip_verw:
++
++ jns .Lvmx_launch
+
+ /*.Lvmx_resume:*/
+ VMRESUME
+diff --git a/xen/arch/x86/include/asm/asm_defns.h b/xen/arch/x86/include/asm/asm_defns.h
+index d9431180cf..abc6822b08 100644
+--- a/xen/arch/x86/include/asm/asm_defns.h
++++ b/xen/arch/x86/include/asm/asm_defns.h
+@@ -81,6 +81,14 @@ register unsigned long current_stack_pointer asm("rsp");
+
+ #ifdef __ASSEMBLY__
+
++.macro BUILD_BUG_ON condstr, cond:vararg
++ .if \cond
++ .error "Condition \"\condstr\" not satisfied"
++ .endif
++.endm
++/* preprocessor macro to make error message more user friendly */
++#define BUILD_BUG_ON(cond) BUILD_BUG_ON #cond, cond
++
+ #ifdef HAVE_AS_QUOTED_SYM
+ #define SUBSECTION_LBL(tag) \
+ .ifndef .L.tag; \
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index f4b8b9d956..ca9cb0f5dd 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -164,6 +164,13 @@
+ #endif
+ .endm
+
++/*
++ * Helper to improve the readability of stack displacements with %rsp in
++ * unusual positions. Both @field and @top_of_stk should be constants from
++ * the same object. @top_of_stk should be where %rsp is currently pointing.
++ */
++#define STK_REL(field, top_of_stk) ((field) - (top_of_stk))
++
+ .macro DO_SPEC_CTRL_COND_VERW
+ /*
+ * Requires %rsp=cpuinfo
+diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
+index 31fa63b77f..a4e94d6930 100644
+--- a/xen/arch/x86/x86_64/asm-offsets.c
++++ b/xen/arch/x86/x86_64/asm-offsets.c
+@@ -135,6 +135,7 @@ void __dummy__(void)
+ #endif
+
+ OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
++ OFFSET(CPUINFO_error_code, struct cpu_info, guest_cpu_user_regs.error_code);
+ OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
+ OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
+ OFFSET(CPUINFO_per_cpu_offset, struct cpu_info, per_cpu_offset);
+--
+2.44.0
+
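The SF/PF packing described in that patch is compact but easy to misread, so
here is a hedged, standalone model of it. The SCF_verw value below is an
assumption for illustration only; all that matters is that it is a single bit
within the low byte. The loop mimics the shl/and/or sequence and shows which
flag each condition lands in, matching the code comment (SF = launched,
PF = !SCF_verw).

    #include <stdbool.h>
    #include <stdio.h>

    #define SCF_verw (1u << 3)   /* assumed value; "one bit in the low byte" is what matters */

    /* x86 PF: set when the low byte of the result has an even number of 1 bits. */
    static bool pf_of(unsigned int result)
    {
        return !(__builtin_popcount(result & 0xff) & 1);
    }

    int main(void)
    {
        for (int launched = 0; launched <= 1; launched++)
            for (int verw = 0; verw <= 1; verw++) {
                unsigned int scf = verw ? SCF_verw : 0;          /* spec_ctrl_flags    */
                unsigned int ecx = (unsigned int)launched << 31; /* shl $31, %ecx      */
                unsigned int eax = scf & SCF_verw;               /* and $SCF_verw,%eax */

                ecx |= eax;                                      /* or %eax, %ecx      */

                printf("launched=%d verw=%d -> SF=%u PF=%d\n",
                       launched, verw, ecx >> 31, (int)pf_of(ecx));
            }
        return 0;
    }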
diff --git a/0042-x86-spec-ctrl-Perform-VERW-flushing-later-in-exit-pa.patch b/0042-x86-spec-ctrl-Perform-VERW-flushing-later-in-exit-pa.patch
new file mode 100644
index 0000000..ac78acd
--- /dev/null
+++ b/0042-x86-spec-ctrl-Perform-VERW-flushing-later-in-exit-pa.patch
@@ -0,0 +1,209 @@
+From 76af773de5d3e68b7140cc9c5343be6746c9101c Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Sat, 27 Jan 2024 18:20:56 +0000
+Subject: [PATCH 42/67] x86/spec-ctrl: Perform VERW flushing later in exit
+ paths
+
+On parts vulnerable to RFDS, VERW's side effects are extended to scrub all
+non-architectural entries in various Physical Register Files. To remove all
+of Xen's values, the VERW must be after popping the GPRs.
+
+Rework SPEC_CTRL_COND_VERW to default to a CPUINFO_error_code %rsp position,
+but with overrides for other contexts. Identify that it clobbers eflags; this
+is particularly relevant for the SYSRET path.
+
+For the IST exit return to Xen, have the main SPEC_CTRL_EXIT_TO_XEN put a
+shadow copy of spec_ctrl_flags, as GPRs can't be used at the point we want to
+issue the VERW.
+
+This is part of XSA-452 / CVE-2023-28746.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 0a666cf2cd99df6faf3eebc81a1fc286e4eca4c7)
+---
+ xen/arch/x86/include/asm/spec_ctrl_asm.h | 36 ++++++++++++++++--------
+ xen/arch/x86/x86_64/asm-offsets.c | 13 +++++++--
+ xen/arch/x86/x86_64/compat/entry.S | 6 ++++
+ xen/arch/x86/x86_64/entry.S | 21 +++++++++++++-
+ 4 files changed, 61 insertions(+), 15 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+index ca9cb0f5dd..97a97b2b82 100644
+--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
++++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
+@@ -171,16 +171,23 @@
+ */
+ #define STK_REL(field, top_of_stk) ((field) - (top_of_stk))
+
+-.macro DO_SPEC_CTRL_COND_VERW
++.macro SPEC_CTRL_COND_VERW \
++ scf=STK_REL(CPUINFO_spec_ctrl_flags, CPUINFO_error_code), \
++ sel=STK_REL(CPUINFO_verw_sel, CPUINFO_error_code)
+ /*
+- * Requires %rsp=cpuinfo
++ * Requires \scf and \sel as %rsp-relative expressions
++ * Clobbers eflags
++ *
++ * VERW needs to run after guest GPRs have been restored, where only %rsp is
++ * good to use. Default to expecting %rsp pointing at CPUINFO_error_code.
++ * Contexts where this is not true must provide an alternative \scf and \sel.
+ *
+ * Issue a VERW for its flushing side effect, if indicated. This is a Spectre
+ * v1 gadget, but the IRET/VMEntry is serialising.
+ */
+- testb $SCF_verw, CPUINFO_spec_ctrl_flags(%rsp)
++ testb $SCF_verw, \scf(%rsp)
+ jz .L\@_verw_skip
+- verw CPUINFO_verw_sel(%rsp)
++ verw \sel(%rsp)
+ .L\@_verw_skip:
+ .endm
+
+@@ -298,8 +305,6 @@
+ */
+ ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
+
+- DO_SPEC_CTRL_COND_VERW
+-
+ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
+ .endm
+
+@@ -379,7 +384,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ */
+ .macro SPEC_CTRL_EXIT_TO_XEN
+ /*
+- * Requires %r12=ist_exit, %r14=stack_end
++ * Requires %r12=ist_exit, %r14=stack_end, %rsp=regs
+ * Clobbers %rax, %rbx, %rcx, %rdx
+ */
+ movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
+@@ -407,11 +412,18 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
+ test %r12, %r12
+ jz .L\@_skip_ist_exit
+
+- /* Logically DO_SPEC_CTRL_COND_VERW but without the %rsp=cpuinfo dependency */
+- testb $SCF_verw, %bl
+- jz .L\@_skip_verw
+- verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
+-.L\@_skip_verw:
++ /*
++ * Stash SCF and verw_sel above eflags in the case of an IST_exit. The
++ * VERW logic needs to run after guest GPRs have been restored; i.e. where
++ * we cannot use %r12 or %r14 for the purposes they have here.
++ *
++ * When the CPU pushed this exception frame, it zero-extended eflags.
++ * Therefore it is safe for the VERW logic to look at the stashed SCF
++ * outside of the ist_exit condition. Also, this stashing won't influence
++ * any other restore_all_guest() paths.
++ */
++ or $(__HYPERVISOR_DS32 << 16), %ebx
++ mov %ebx, UREGS_eflags + 4(%rsp) /* EFRAME_shadow_scf/sel */
+
+ ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
+
+diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
+index a4e94d6930..4cd5938d7b 100644
+--- a/xen/arch/x86/x86_64/asm-offsets.c
++++ b/xen/arch/x86/x86_64/asm-offsets.c
+@@ -55,14 +55,22 @@ void __dummy__(void)
+ * EFRAME_* is for the entry/exit logic where %rsp is pointing at
+ * UREGS_error_code and GPRs are still/already guest values.
+ */
+-#define OFFSET_EF(sym, mem) \
++#define OFFSET_EF(sym, mem, ...) \
+ DEFINE(sym, offsetof(struct cpu_user_regs, mem) - \
+- offsetof(struct cpu_user_regs, error_code))
++ offsetof(struct cpu_user_regs, error_code) __VA_ARGS__)
+
+ OFFSET_EF(EFRAME_entry_vector, entry_vector);
+ OFFSET_EF(EFRAME_rip, rip);
+ OFFSET_EF(EFRAME_cs, cs);
+ OFFSET_EF(EFRAME_eflags, eflags);
++
++ /*
++ * These aren't real fields. They're spare space, used by the IST
++ * exit-to-xen path.
++ */
++ OFFSET_EF(EFRAME_shadow_scf, eflags, +4);
++ OFFSET_EF(EFRAME_shadow_sel, eflags, +6);
++
+ OFFSET_EF(EFRAME_rsp, rsp);
+ BLANK();
+
+@@ -136,6 +144,7 @@ void __dummy__(void)
+
+ OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
+ OFFSET(CPUINFO_error_code, struct cpu_info, guest_cpu_user_regs.error_code);
++ OFFSET(CPUINFO_rip, struct cpu_info, guest_cpu_user_regs.rip);
+ OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
+ OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
+ OFFSET(CPUINFO_per_cpu_offset, struct cpu_info, per_cpu_offset);
+diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
+index 7c211314d8..3b2fbcd873 100644
+--- a/xen/arch/x86/x86_64/compat/entry.S
++++ b/xen/arch/x86/x86_64/compat/entry.S
+@@ -161,6 +161,12 @@ ENTRY(compat_restore_all_guest)
+ SPEC_CTRL_EXIT_TO_PV /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+
+ RESTORE_ALL adj=8 compat=1
++
++ /* Account for ev/ec having already been popped off the stack. */
++ SPEC_CTRL_COND_VERW \
++ scf=STK_REL(CPUINFO_spec_ctrl_flags, CPUINFO_rip), \
++ sel=STK_REL(CPUINFO_verw_sel, CPUINFO_rip)
++
+ .Lft0: iretq
+ _ASM_PRE_EXTABLE(.Lft0, handle_exception)
+
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index 412cbeb3ec..ef517e2945 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -214,6 +214,9 @@ restore_all_guest:
+ #endif
+
+ mov EFRAME_rip(%rsp), %rcx
++
++ SPEC_CTRL_COND_VERW /* Req: %rsp=eframe Clob: efl */
++
+ cmpw $FLAT_USER_CS32, EFRAME_cs(%rsp)
+ mov EFRAME_rsp(%rsp), %rsp
+ je 1f
+@@ -227,6 +230,9 @@ restore_all_guest:
+ iret_exit_to_guest:
+ andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), EFRAME_eflags(%rsp)
+ orl $X86_EFLAGS_IF, EFRAME_eflags(%rsp)
++
++ SPEC_CTRL_COND_VERW /* Req: %rsp=eframe Clob: efl */
++
+ addq $8,%rsp
+ .Lft0: iretq
+ _ASM_PRE_EXTABLE(.Lft0, handle_exception)
+@@ -679,9 +685,22 @@ UNLIKELY_START(ne, exit_cr3)
+ UNLIKELY_END(exit_cr3)
+
+ /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
+- SPEC_CTRL_EXIT_TO_XEN /* Req: %r12=ist_exit %r14=end, Clob: abcd */
++ SPEC_CTRL_EXIT_TO_XEN /* Req: %r12=ist_exit %r14=end %rsp=regs, Clob: abcd */
+
+ RESTORE_ALL adj=8
++
++ /*
++ * When the CPU pushed this exception frame, it zero-extended eflags.
++ * For an IST exit, SPEC_CTRL_EXIT_TO_XEN stashed shadow copies of
++ * spec_ctrl_flags and verw_sel above eflags, as we can't use any GPRs,
++ * and we're at a random place on the stack, not in a CPUINFO block.
++ *
++ * Account for ev/ec having already been popped off the stack.
++ */
++ SPEC_CTRL_COND_VERW \
++ scf=STK_REL(EFRAME_shadow_scf, EFRAME_rip), \
++ sel=STK_REL(EFRAME_shadow_sel, EFRAME_rip)
++
+ iretq
+
+ ENTRY(common_interrupt)
+--
+2.44.0
+
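For readers following the STK_REL() arithmetic: the macro just turns "offset of
the field" minus "offset %rsp currently corresponds to" into a plain
displacement. A small sketch with made-up offsets follows; the real values come
from asm-offsets.c and differ, the numbers below are chosen only so that rip
sits 8 bytes above error_code, as it does once ev/ec have been popped.

    #include <stdio.h>

    enum {                               /* hypothetical asm-offsets values */
        CPUINFO_error_code      = 0xc8,
        CPUINFO_rip             = 0xd0,  /* error_code + 8 */
        CPUINFO_verw_sel        = 0x110,
        CPUINFO_spec_ctrl_flags = 0x112,
    };

    #define STK_REL(field, top_of_stk) ((field) - (top_of_stk))

    int main(void)
    {
        /* Default case: %rsp points at error_code. */
        printf("scf disp  = %d\n", STK_REL(CPUINFO_spec_ctrl_flags, CPUINFO_error_code));
        printf("sel disp  = %d\n", STK_REL(CPUINFO_verw_sel,        CPUINFO_error_code));

        /* compat_restore_all_guest: ev/ec already popped, %rsp points at rip. */
        printf("scf disp' = %d\n", STK_REL(CPUINFO_spec_ctrl_flags, CPUINFO_rip));
        printf("sel disp' = %d\n", STK_REL(CPUINFO_verw_sel,        CPUINFO_rip));
        return 0;
    }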
diff --git a/0043-x86-spec-ctrl-Rename-VERW-related-options.patch b/0043-x86-spec-ctrl-Rename-VERW-related-options.patch
new file mode 100644
index 0000000..38edc15
--- /dev/null
+++ b/0043-x86-spec-ctrl-Rename-VERW-related-options.patch
@@ -0,0 +1,248 @@
+From d55d52961d13d4fcd1441fcfca98f690e687b941 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Mon, 12 Feb 2024 17:50:43 +0000
+Subject: [PATCH 43/67] x86/spec-ctrl: Rename VERW related options
+
+VERW is going to be used for a 3rd purpose, and the existing nomenclature
+didn't survive the Stale MMIO issues terribly well.
+
+Rename the command line option from `md-clear=` to `verw=`. This is more
+consistent with other options which tend to be named based on what they're
+doing, not which feature enumeration they use behind the scenes. Retain
+`md-clear=` as a deprecated alias.
+
+Rename opt_md_clear_{pv,hvm} and opt_fb_clear_mmio to opt_verw_{pv,hvm,mmio},
+which has a side effect of making spec_ctrl_init_domain() rather clearer to
+follow.
+
+No functional change.
+
+This is part of XSA-452 / CVE-2023-28746.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit f7603ca252e4226739eb3129a5290ee3da3f8ea4)
+---
+ docs/misc/xen-command-line.pandoc | 15 ++++----
+ xen/arch/x86/spec_ctrl.c | 62 ++++++++++++++++---------------
+ 2 files changed, 40 insertions(+), 37 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index 2006697226..d909ec94fe 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2324,7 +2324,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
+
+ ### spec-ctrl (x86)
+ > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
+-> {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
++> {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
+ > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
+ > eager-fpu,l1d-flush,branch-harden,srb-lock,
+ > unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
+@@ -2349,7 +2349,7 @@ in place for guests to use.
+
+ Use of a positive boolean value for either of these options is invalid.
+
+-The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
++The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `verw=` and `ibpb-entry=` options
+ offer fine grained control over the primitives by Xen. These impact Xen's
+ ability to protect itself, and/or Xen's ability to virtualise support for
+ guests to use.
+@@ -2366,11 +2366,12 @@ guests to use.
+ guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
+ * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
+ Return Address Stack on entry to Xen and on idle.
+-* `md-clear=` offers control over whether to use VERW to flush
+- microarchitectural buffers on idle and exit from Xen. *Note: For
+- compatibility with development versions of this fix, `mds=` is also accepted
+- on Xen 4.12 and earlier as an alias. Consult vendor documentation in
+- preference to here.*
++* `verw=` offers control over whether to use VERW for its scrubbing side
++ effects at appropriate privilege transitions. The exact side effects are
++ microarchitecture and microcode specific. *Note: `md-clear=` is accepted as
++ a deprecated alias. For compatibility with development versions of XSA-297,
++ `mds=` is also accepted on Xen 4.12 and earlier as an alias. Consult vendor
++ documentation in preference to here.*
+ * `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
+ Barrier) is used on entry to Xen. This is used by default on hardware
+ vulnerable to Branch Type Confusion, and hardware vulnerable to Speculative
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 25a18ac598..e12ec9930c 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -37,8 +37,8 @@ static bool __initdata opt_msr_sc_pv = true;
+ static bool __initdata opt_msr_sc_hvm = true;
+ static int8_t __initdata opt_rsb_pv = -1;
+ static bool __initdata opt_rsb_hvm = true;
+-static int8_t __ro_after_init opt_md_clear_pv = -1;
+-static int8_t __ro_after_init opt_md_clear_hvm = -1;
++static int8_t __ro_after_init opt_verw_pv = -1;
++static int8_t __ro_after_init opt_verw_hvm = -1;
+
+ static int8_t __ro_after_init opt_ibpb_entry_pv = -1;
+ static int8_t __ro_after_init opt_ibpb_entry_hvm = -1;
+@@ -78,7 +78,7 @@ static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination.
+
+ static int8_t __initdata opt_srb_lock = -1;
+ static bool __initdata opt_unpriv_mmio;
+-static bool __ro_after_init opt_fb_clear_mmio;
++static bool __ro_after_init opt_verw_mmio;
+ static int8_t __initdata opt_gds_mit = -1;
+ static int8_t __initdata opt_div_scrub = -1;
+
+@@ -120,8 +120,8 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ disable_common:
+ opt_rsb_pv = false;
+ opt_rsb_hvm = false;
+- opt_md_clear_pv = 0;
+- opt_md_clear_hvm = 0;
++ opt_verw_pv = 0;
++ opt_verw_hvm = 0;
+ opt_ibpb_entry_pv = 0;
+ opt_ibpb_entry_hvm = 0;
+ opt_ibpb_entry_dom0 = false;
+@@ -152,14 +152,14 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ {
+ opt_msr_sc_pv = val;
+ opt_rsb_pv = val;
+- opt_md_clear_pv = val;
++ opt_verw_pv = val;
+ opt_ibpb_entry_pv = val;
+ }
+ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+ {
+ opt_msr_sc_hvm = val;
+ opt_rsb_hvm = val;
+- opt_md_clear_hvm = val;
++ opt_verw_hvm = val;
+ opt_ibpb_entry_hvm = val;
+ }
+ else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
+@@ -204,21 +204,22 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ break;
+ }
+ }
+- else if ( (val = parse_boolean("md-clear", s, ss)) != -1 )
++ else if ( (val = parse_boolean("verw", s, ss)) != -1 ||
++ (val = parse_boolean("md-clear", s, ss)) != -1 )
+ {
+ switch ( val )
+ {
+ case 0:
+ case 1:
+- opt_md_clear_pv = opt_md_clear_hvm = val;
++ opt_verw_pv = opt_verw_hvm = val;
+ break;
+
+ case -2:
+- s += strlen("md-clear=");
++ s += (*s == 'v') ? strlen("verw=") : strlen("md-clear=");
+ if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+- opt_md_clear_pv = val;
++ opt_verw_pv = val;
+ else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+- opt_md_clear_hvm = val;
++ opt_verw_hvm = val;
+ else
+ default:
+ rc = -EINVAL;
+@@ -540,8 +541,8 @@ static void __init print_details(enum ind_thunk thunk)
+ opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-",
+ opt_ibpb_ctxt_switch ? " IBPB-ctxt" : "",
+ opt_l1d_flush ? " L1D_FLUSH" : "",
+- opt_md_clear_pv || opt_md_clear_hvm ||
+- opt_fb_clear_mmio ? " VERW" : "",
++ opt_verw_pv || opt_verw_hvm ||
++ opt_verw_mmio ? " VERW" : "",
+ opt_div_scrub ? " DIV" : "",
+ opt_branch_harden ? " BRANCH_HARDEN" : "");
+
+@@ -562,13 +563,13 @@ static void __init print_details(enum ind_thunk thunk)
+ boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
+ amd_virt_spec_ctrl ||
+- opt_eager_fpu || opt_md_clear_hvm) ? "" : " None",
++ opt_eager_fpu || opt_verw_hvm) ? "" : " None",
+ boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ? " MSR_SPEC_CTRL" : "",
+ (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
+ amd_virt_spec_ctrl) ? " MSR_VIRT_SPEC_CTRL" : "",
+ boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ? " RSB" : "",
+ opt_eager_fpu ? " EAGER_FPU" : "",
+- opt_md_clear_hvm ? " MD_CLEAR" : "",
++ opt_verw_hvm ? " VERW" : "",
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ? " IBPB-entry" : "");
+
+ #endif
+@@ -577,11 +578,11 @@ static void __init print_details(enum ind_thunk thunk)
+ (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
+ boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
+- opt_eager_fpu || opt_md_clear_pv) ? "" : " None",
++ opt_eager_fpu || opt_verw_pv) ? "" : " None",
+ boot_cpu_has(X86_FEATURE_SC_MSR_PV) ? " MSR_SPEC_CTRL" : "",
+ boot_cpu_has(X86_FEATURE_SC_RSB_PV) ? " RSB" : "",
+ opt_eager_fpu ? " EAGER_FPU" : "",
+- opt_md_clear_pv ? " MD_CLEAR" : "",
++ opt_verw_pv ? " VERW" : "",
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ? " IBPB-entry" : "");
+
+ printk(" XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
+@@ -1514,8 +1515,8 @@ void spec_ctrl_init_domain(struct domain *d)
+ {
+ bool pv = is_pv_domain(d);
+
+- bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+- (opt_fb_clear_mmio && is_iommu_enabled(d)));
++ bool verw = ((pv ? opt_verw_pv : opt_verw_hvm) ||
++ (opt_verw_mmio && is_iommu_enabled(d)));
+
+ bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
+ (d->domain_id != 0 || opt_ibpb_entry_dom0));
+@@ -1878,19 +1879,20 @@ void __init init_speculation_mitigations(void)
+ * the return-to-guest path.
+ */
+ if ( opt_unpriv_mmio )
+- opt_fb_clear_mmio = cpu_has_fb_clear;
++ opt_verw_mmio = cpu_has_fb_clear;
+
+ /*
+ * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+ * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+ * but it is somewhat better than nothing.
+ */
+- if ( opt_md_clear_pv == -1 )
+- opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+- boot_cpu_has(X86_FEATURE_MD_CLEAR));
+- if ( opt_md_clear_hvm == -1 )
+- opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+- boot_cpu_has(X86_FEATURE_MD_CLEAR));
++ if ( opt_verw_pv == -1 )
++ opt_verw_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
++ cpu_has_md_clear);
++
++ if ( opt_verw_hvm == -1 )
++ opt_verw_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
++ cpu_has_md_clear);
+
+ /*
+ * Enable MDS/MMIO defences as applicable. The Idle blocks need using if
+@@ -1903,12 +1905,12 @@ void __init init_speculation_mitigations(void)
+ * MDS mitigations. L1D_FLUSH is not safe for MMIO mitigations.)
+ *
+ * After calculating the appropriate idle setting, simplify
+- * opt_md_clear_hvm to mean just "should we VERW on the way into HVM
++ * opt_verw_hvm to mean just "should we VERW on the way into HVM
+ * guests", so spec_ctrl_init_domain() can calculate suitable settings.
+ */
+- if ( opt_md_clear_pv || opt_md_clear_hvm || opt_fb_clear_mmio )
++ if ( opt_verw_pv || opt_verw_hvm || opt_verw_mmio )
+ setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+- opt_md_clear_hvm &= !cpu_has_skip_l1dfl && !opt_l1d_flush;
++ opt_verw_hvm &= !cpu_has_skip_l1dfl && !opt_l1d_flush;
+
+ /*
+ * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+--
+2.44.0
+
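The `verw=`/`md-clear=` alias handling above leans on parse_boolean()'s "-2"
convention: name matched, but the value needs sub-parsing. The toy below uses a
simplified stand-in for Xen's parse_boolean() (the real helper also takes an end
pointer and accepts more spellings), purely to show how the alias skip works.

    #include <stdio.h>
    #include <string.h>

    /* Stand-in: 1/0 for plain booleans, -2 for "name=<more>", -1 for no match. */
    static int parse_boolean(const char *name, const char *s)
    {
        size_t n = strlen(name);

        if (strncmp(s, name, n))
            return -1;
        if (!s[n])
            return 1;
        if (s[n] != '=')
            return -1;
        if (!strcmp(s + n + 1, "1"))
            return 1;
        if (!strcmp(s + n + 1, "0"))
            return 0;
        return -2;
    }

    int main(void)
    {
        const char *opts[] = { "verw", "verw=0", "verw=pv", "md-clear=hvm" };

        for (unsigned int i = 0; i < sizeof(opts) / sizeof(*opts); i++) {
            const char *s = opts[i];
            int pv = -1, hvm = -1, val;

            if ((val = parse_boolean("verw", s)) != -1 ||
                (val = parse_boolean("md-clear", s)) != -1) {
                if (val >= 0)
                    pv = hvm = val;
                else {
                    /* Skip whichever alias matched, exactly as the patch does. */
                    s += (*s == 'v') ? strlen("verw=") : strlen("md-clear=");
                    if (!strcmp(s, "pv"))
                        pv = 1;
                    else if (!strcmp(s, "hvm"))
                        hvm = 1;
                }
            }
            printf("%-13s -> pv=%2d hvm=%2d\n", opts[i], pv, hvm);
        }
        return 0;
    }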
diff --git a/0044-x86-spec-ctrl-VERW-handling-adjustments.patch b/0044-x86-spec-ctrl-VERW-handling-adjustments.patch
new file mode 100644
index 0000000..e2458c9
--- /dev/null
+++ b/0044-x86-spec-ctrl-VERW-handling-adjustments.patch
@@ -0,0 +1,171 @@
+From 6663430b442fdf9698bd8e03f701a4547309ad71 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 5 Mar 2024 19:33:37 +0000
+Subject: [PATCH 44/67] x86/spec-ctrl: VERW-handling adjustments
+
+... before we add yet more complexity to this logic. Mostly expanded
+comments, but with three minor changes.
+
+1) Introduce cpu_has_useful_md_clear to simplify later logic in this patch and
+ future ones.
+
+2) We only ever need SC_VERW_IDLE when SMT is active. If SMT isn't active,
+ then there's no re-partition of pipeline resources based on thread-idleness
+ to worry about.
+
+3) The logic to adjust HVM VERW based on L1D_FLUSH is unmaintainable and, as
+ it turns out, wrong. SKIP_L1DFL is just a hint bit, whereas opt_l1d_flush
+ is the relevant decision of whether to use L1D_FLUSH based on
+ susceptibility and user preference.
+
+ Rewrite the logic so it can be followed, and incorporate the fact that when
+ FB_CLEAR is visible, L1D_FLUSH isn't a safe substitution.
+
+This is part of XSA-452 / CVE-2023-28746.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 1eb91a8a06230b4b64228c9a380194f8cfe6c5e2)
+---
+ xen/arch/x86/spec_ctrl.c | 99 +++++++++++++++++++++++++++++-----------
+ 1 file changed, 73 insertions(+), 26 deletions(-)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index e12ec9930c..adb6bc74e8 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -1531,7 +1531,7 @@ void __init init_speculation_mitigations(void)
+ {
+ enum ind_thunk thunk = THUNK_DEFAULT;
+ bool has_spec_ctrl, ibrs = false, hw_smt_enabled;
+- bool cpu_has_bug_taa, retpoline_safe;
++ bool cpu_has_bug_taa, cpu_has_useful_md_clear, retpoline_safe;
+
+ hw_smt_enabled = check_smt_enabled();
+
+@@ -1867,50 +1867,97 @@ void __init init_speculation_mitigations(void)
+ "enabled. Please assess your configuration and choose an\n"
+ "explicit 'smt=<bool>' setting. See XSA-273.\n");
+
++ /*
++ * A brief summary of VERW-related changes.
++ *
++ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-analysis-microarchitectural-data-sampling.html
++ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html
++ *
++ * Relevant ucodes:
++ *
++ * - May 2019, for MDS. Introduces the MD_CLEAR CPUID bit and VERW side
++ * effects to scrub Store/Load/Fill buffers as applicable. MD_CLEAR
++ * exists architecturally, even when the side effects have been removed.
++ *
++ * Use VERW to scrub on return-to-guest. Parts with L1D_FLUSH to
++ * mitigate L1TF have the same side effect, so no need to do both.
++ *
++ * Various Atoms suffer from Store-buffer sampling only. Store buffers
++ * are statically partitioned between non-idle threads, so scrubbing is
++ * wanted when going idle too.
++ *
++ * Load ports and Fill buffers are competitively shared between threads.
++ * SMT must be disabled for VERW scrubbing to be fully effective.
++ *
++ * - November 2019, for TAA. Extended VERW side effects to TSX-enabled
++ * MDS_NO parts.
++ *
++ * - February 2022, for Client TSX de-feature. Removed VERW side effects
++ * from Client CPUs only.
++ *
++ * - May 2022, for MMIO Stale Data. (Re)introduced Fill Buffer scrubbing
++ * on all MMIO-affected parts which didn't already have it for MDS
++ * reasons, enumerating FB_CLEAR on those parts only.
++ *
++ * If FB_CLEAR is enumerated, L1D_FLUSH does not have the same scrubbing
++ * side effects as VERW and cannot be used in its place.
++ */
+ mds_calculations();
+
+ /*
+- * Parts which enumerate FB_CLEAR are those which are post-MDS_NO and have
+- * reintroduced the VERW fill buffer flushing side effect because of a
+- * susceptibility to FBSDP.
++ * Parts which enumerate FB_CLEAR are those with now-updated microcode
++ * which weren't susceptible to the original MFBDS (and therefore didn't
++ * have Fill Buffer scrubbing side effects to begin with, or were Client
++ * MDS_NO non-TAA_NO parts where the scrubbing was removed), but have had
++ * the scrubbing reintroduced because of a susceptibility to FBSDP.
+ *
+ * If unprivileged guests have (or will have) MMIO mappings, we can
+ * mitigate cross-domain leakage of fill buffer data by issuing VERW on
+- * the return-to-guest path.
++ * the return-to-guest path. This is only a token effort if SMT is
++ * active.
+ */
+ if ( opt_unpriv_mmio )
+ opt_verw_mmio = cpu_has_fb_clear;
+
+ /*
+- * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
+- * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
+- * but it is somewhat better than nothing.
++ * MD_CLEAR is enumerated architecturally forevermore, even after the
++ * scrubbing side effects have been removed. Create ourselves a version
++ * which expresses whether we think MD_CLEAR is having any useful side
++ * effect.
++ */
++ cpu_has_useful_md_clear = (cpu_has_md_clear &&
++ (cpu_has_bug_mds || cpu_has_bug_msbds_only));
++
++ /*
++ * By default, use VERW scrubbing on applicable hardware, if we think it's
++ * going to have an effect. This will only be a token effort for
++ * MLPDS/MFBDS when SMT is enabled.
+ */
+ if ( opt_verw_pv == -1 )
+- opt_verw_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+- cpu_has_md_clear);
++ opt_verw_pv = cpu_has_useful_md_clear;
+
+ if ( opt_verw_hvm == -1 )
+- opt_verw_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
+- cpu_has_md_clear);
++ opt_verw_hvm = cpu_has_useful_md_clear;
+
+ /*
+- * Enable MDS/MMIO defences as applicable. The Idle blocks need using if
+- * either the PV or HVM MDS defences are used, or if we may give MMIO
+- * access to untrusted guests.
+- *
+- * HVM is more complicated. The MD_CLEAR microcode extends L1D_FLUSH with
+- * equivalent semantics to avoid needing to perform both flushes on the
+- * HVM path. Therefore, we don't need VERW in addition to L1D_FLUSH (for
+- * MDS mitigations. L1D_FLUSH is not safe for MMIO mitigations.)
+- *
+- * After calculating the appropriate idle setting, simplify
+- * opt_verw_hvm to mean just "should we VERW on the way into HVM
+- * guests", so spec_ctrl_init_domain() can calculate suitable settings.
++ * If SMT is active, and we're protecting against MDS or MMIO stale data,
++ * we need to scrub before going idle as well as on return to guest.
++ * Various pipeline resources are repartitioned amongst non-idle threads.
+ */
+- if ( opt_verw_pv || opt_verw_hvm || opt_verw_mmio )
++ if ( ((cpu_has_useful_md_clear && (opt_verw_pv || opt_verw_hvm)) ||
++ opt_verw_mmio) && hw_smt_enabled )
+ setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
+- opt_verw_hvm &= !cpu_has_skip_l1dfl && !opt_l1d_flush;
++
++ /*
++ * After calculating the appropriate idle setting, simplify opt_verw_hvm
++ * to mean just "should we VERW on the way into HVM guests", so
++ * spec_ctrl_init_domain() can calculate suitable settings.
++ *
++ * It is only safe to use L1D_FLUSH in place of VERW when MD_CLEAR is the
++ * only *_CLEAR we can see.
++ */
++ if ( opt_l1d_flush && cpu_has_md_clear && !cpu_has_fb_clear )
++ opt_verw_hvm = false;
+
+ /*
+ * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
+--
+2.44.0
+
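To make the reworked logic easier to audit, here is a condensed, standalone
restatement of the decisions this patch leaves in place. The names mirror the
cpu_has_*/opt_* variables in spec_ctrl.c, but the wiring is an illustrative
sketch under those naming assumptions, not a drop-in.

    #include <stdbool.h>
    #include <stdio.h>

    struct in {
        bool md_clear, fb_clear, bug_mds, bug_msbds_only;
        bool smt, l1d_flush, verw_mmio;
        int  verw_pv, verw_hvm;             /* -1 means "no explicit choice" */
    };

    static void decide(struct in *s, bool *verw_idle)
    {
        /* (1) MD_CLEAR only counts if we think it still scrubs anything. */
        bool useful_md_clear = s->md_clear && (s->bug_mds || s->bug_msbds_only);

        if (s->verw_pv == -1)
            s->verw_pv = useful_md_clear;
        if (s->verw_hvm == -1)
            s->verw_hvm = useful_md_clear;

        /* (2) Idle scrubbing only matters when SMT repartitions resources. */
        *verw_idle = ((useful_md_clear && (s->verw_pv || s->verw_hvm)) ||
                      s->verw_mmio) && s->smt;

        /* (3) L1D_FLUSH only substitutes for VERW when MD_CLEAR is the only
         *     *_CLEAR in sight. */
        if (s->l1d_flush && s->md_clear && !s->fb_clear)
            s->verw_hvm = false;
    }

    int main(void)
    {
        struct in s = { .md_clear = true, .bug_mds = true, .smt = true,
                        .verw_pv = -1, .verw_hvm = -1 };
        bool idle;

        decide(&s, &idle);
        printf("verw_pv=%d verw_hvm=%d verw_idle=%d\n",
               s.verw_pv, s.verw_hvm, (int)idle);
        return 0;
    }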
diff --git a/0045-x86-spec-ctrl-Mitigation-Register-File-Data-Sampling.patch b/0045-x86-spec-ctrl-Mitigation-Register-File-Data-Sampling.patch
new file mode 100644
index 0000000..4a10524
--- /dev/null
+++ b/0045-x86-spec-ctrl-Mitigation-Register-File-Data-Sampling.patch
@@ -0,0 +1,320 @@
+From d85481135d87abbbf1feab18b749288fa08b65f2 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 22 Jun 2023 23:32:19 +0100
+Subject: [PATCH 45/67] x86/spec-ctrl: Mitigation Register File Data Sampling
+
+RFDS affects Atom cores, also branded E-cores, between the Goldmont and
+Gracemont microarchitectures. This includes Alder Lake and Raptor Lake hybrid
+client systems which have a mix of Gracemont and other types of cores.
+
+Two new bits have been defined: RFDS_CLEAR to indicate VERW has more side
+effects, and RFDS_NO to indicate that the system is unaffected. Plenty of
+unaffected CPUs won't be getting RFDS_NO retrofitted in microcode, so we
+synthesise it. Alder Lake and Raptor Lake Xeon-E's are unaffected due to
+their platform configuration, and we must use the Hybrid CPUID bit to
+distinguish them from their non-Xeon counterparts.
+
+Like MD_CLEAR and FB_CLEAR, RFDS_CLEAR needs OR-ing across a resource pool, so
+set it in the max policies and reflect the host setting in default.
+
+This is part of XSA-452 / CVE-2023-28746.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit fb5b6f6744713410c74cfc12b7176c108e3c9a31)
+---
+ tools/misc/xen-cpuid.c | 5 +-
+ xen/arch/x86/cpu-policy.c | 5 +
+ xen/arch/x86/include/asm/cpufeature.h | 3 +
+ xen/arch/x86/include/asm/msr-index.h | 2 +
+ xen/arch/x86/spec_ctrl.c | 100 +++++++++++++++++++-
+ xen/include/public/arch-x86/cpufeatureset.h | 3 +
+ 6 files changed, 111 insertions(+), 7 deletions(-)
+
+diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
+index aefc140d66..5ceea8be07 100644
+--- a/tools/misc/xen-cpuid.c
++++ b/tools/misc/xen-cpuid.c
+@@ -172,7 +172,7 @@ static const char *const str_7d0[32] =
+ [ 8] = "avx512-vp2intersect", [ 9] = "srbds-ctrl",
+ [10] = "md-clear", [11] = "rtm-always-abort",
+ /* 12 */ [13] = "tsx-force-abort",
+- [14] = "serialize",
++ [14] = "serialize", [15] = "hybrid",
+ [16] = "tsxldtrk",
+ [18] = "pconfig",
+ [20] = "cet-ibt",
+@@ -237,7 +237,8 @@ static const char *const str_m10Al[32] =
+ [20] = "bhi-no", [21] = "xapic-status",
+ /* 22 */ [23] = "ovrclk-status",
+ [24] = "pbrsb-no", [25] = "gds-ctrl",
+- [26] = "gds-no",
++ [26] = "gds-no", [27] = "rfds-no",
++ [28] = "rfds-clear",
+ };
+
+ static const char *const str_m10Ah[32] =
+diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
+index 7b875a7221..96c2cee1a8 100644
+--- a/xen/arch/x86/cpu-policy.c
++++ b/xen/arch/x86/cpu-policy.c
+@@ -444,6 +444,7 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
+ */
+ __set_bit(X86_FEATURE_MD_CLEAR, fs);
+ __set_bit(X86_FEATURE_FB_CLEAR, fs);
++ __set_bit(X86_FEATURE_RFDS_CLEAR, fs);
+
+ /*
+ * The Gather Data Sampling microcode mitigation (August 2023) has an
+@@ -493,6 +494,10 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
+ if ( cpu_has_fb_clear )
+ __set_bit(X86_FEATURE_FB_CLEAR, fs);
+
++ __clear_bit(X86_FEATURE_RFDS_CLEAR, fs);
++ if ( cpu_has_rfds_clear )
++ __set_bit(X86_FEATURE_RFDS_CLEAR, fs);
++
+ /*
+ * The Gather Data Sampling microcode mitigation (August 2023) has an
+ * adverse performance impact on the CLWB instruction on SKX/CLX/CPX.
+diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
+index ec824e8954..a6b8af1296 100644
+--- a/xen/arch/x86/include/asm/cpufeature.h
++++ b/xen/arch/x86/include/asm/cpufeature.h
+@@ -140,6 +140,7 @@
+ #define cpu_has_rtm_always_abort boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)
+ #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
+ #define cpu_has_serialize boot_cpu_has(X86_FEATURE_SERIALIZE)
++#define cpu_has_hybrid boot_cpu_has(X86_FEATURE_HYBRID)
+ #define cpu_has_avx512_fp16 boot_cpu_has(X86_FEATURE_AVX512_FP16)
+ #define cpu_has_arch_caps boot_cpu_has(X86_FEATURE_ARCH_CAPS)
+
+@@ -161,6 +162,8 @@
+ #define cpu_has_rrsba boot_cpu_has(X86_FEATURE_RRSBA)
+ #define cpu_has_gds_ctrl boot_cpu_has(X86_FEATURE_GDS_CTRL)
+ #define cpu_has_gds_no boot_cpu_has(X86_FEATURE_GDS_NO)
++#define cpu_has_rfds_no boot_cpu_has(X86_FEATURE_RFDS_NO)
++#define cpu_has_rfds_clear boot_cpu_has(X86_FEATURE_RFDS_CLEAR)
+
+ /* Synthesized. */
+ #define cpu_has_arch_perfmon boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
+diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
+index 6abf7bc34a..9b5f67711f 100644
+--- a/xen/arch/x86/include/asm/msr-index.h
++++ b/xen/arch/x86/include/asm/msr-index.h
+@@ -88,6 +88,8 @@
+ #define ARCH_CAPS_PBRSB_NO (_AC(1, ULL) << 24)
+ #define ARCH_CAPS_GDS_CTRL (_AC(1, ULL) << 25)
+ #define ARCH_CAPS_GDS_NO (_AC(1, ULL) << 26)
++#define ARCH_CAPS_RFDS_NO (_AC(1, ULL) << 27)
++#define ARCH_CAPS_RFDS_CLEAR (_AC(1, ULL) << 28)
+
+ #define MSR_FLUSH_CMD 0x0000010b
+ #define FLUSH_CMD_L1D (_AC(1, ULL) << 0)
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index adb6bc74e8..1ee81e2dfe 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -24,6 +24,7 @@
+
+ #include <asm/amd.h>
+ #include <asm/hvm/svm/svm.h>
++#include <asm/intel-family.h>
+ #include <asm/microcode.h>
+ #include <asm/msr.h>
+ #include <asm/pv/domain.h>
+@@ -447,7 +448,7 @@ static void __init print_details(enum ind_thunk thunk)
+ * Hardware read-only information, stating immunity to certain issues, or
+ * suggestions of which mitigation to use.
+ */
+- printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
++ printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+ (caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
+ (caps & ARCH_CAPS_EIBRS) ? " EIBRS" : "",
+ (caps & ARCH_CAPS_RSBA) ? " RSBA" : "",
+@@ -463,6 +464,7 @@ static void __init print_details(enum ind_thunk thunk)
+ (caps & ARCH_CAPS_FB_CLEAR) ? " FB_CLEAR" : "",
+ (caps & ARCH_CAPS_PBRSB_NO) ? " PBRSB_NO" : "",
+ (caps & ARCH_CAPS_GDS_NO) ? " GDS_NO" : "",
++ (caps & ARCH_CAPS_RFDS_NO) ? " RFDS_NO" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS)) ? " IBRS_ALWAYS" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS)) ? " STIBP_ALWAYS" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_IBRS_FAST)) ? " IBRS_FAST" : "",
+@@ -473,7 +475,7 @@ static void __init print_details(enum ind_thunk thunk)
+ (e21a & cpufeat_mask(X86_FEATURE_SRSO_NO)) ? " SRSO_NO" : "");
+
+ /* Hardware features which need driving to mitigate issues. */
+- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
++ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+ (e8b & cpufeat_mask(X86_FEATURE_IBPB)) ||
+ (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBPB" : "",
+ (e8b & cpufeat_mask(X86_FEATURE_IBRS)) ||
+@@ -491,6 +493,7 @@ static void __init print_details(enum ind_thunk thunk)
+ (caps & ARCH_CAPS_TSX_CTRL) ? " TSX_CTRL" : "",
+ (caps & ARCH_CAPS_FB_CLEAR_CTRL) ? " FB_CLEAR_CTRL" : "",
+ (caps & ARCH_CAPS_GDS_CTRL) ? " GDS_CTRL" : "",
++ (caps & ARCH_CAPS_RFDS_CLEAR) ? " RFDS_CLEAR" : "",
+ (e21a & cpufeat_mask(X86_FEATURE_SBPB)) ? " SBPB" : "");
+
+ /* Compiled-in support which pertains to mitigations. */
+@@ -1359,6 +1362,83 @@ static __init void mds_calculations(void)
+ }
+ }
+
++/*
++ * Register File Data Sampling affects Atom cores from the Goldmont to
++ * Gracemont microarchitectures. The March 2024 microcode adds RFDS_NO to
++ * some but not all unaffected parts, and RFDS_CLEAR to affected parts still
++ * in support.
++ *
++ * Alder Lake and Raptor Lake client CPUs have a mix of P cores
++ * (Golden/Raptor Cove, not vulnerable) and E cores (Gracemont,
++ * vulnerable), and both enumerate RFDS_CLEAR.
++ *
++ * Both exist in a Xeon SKU, which has the E cores (Gracemont) disabled by
++ * platform configuration, and enumerate RFDS_NO.
++ *
++ * With older parts, or with out-of-date microcode, synthesise RFDS_NO when
++ * safe to do so.
++ *
++ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
++ */
++static void __init rfds_calculations(void)
++{
++ /* RFDS is only known to affect Intel Family 6 processors at this time. */
++ if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
++ boot_cpu_data.x86 != 6 )
++ return;
++
++ /*
++ * If RFDS_NO or RFDS_CLEAR are visible, we've either got suitable
++ * microcode, or an RFDS-aware hypervisor is levelling us in a pool.
++ */
++ if ( cpu_has_rfds_no || cpu_has_rfds_clear )
++ return;
++
++ /* If we're virtualised, don't attempt to synthesise RFDS_NO. */
++ if ( cpu_has_hypervisor )
++ return;
++
++ /*
++ * Not all CPUs are expected to get a microcode update enumerating one of
++ * RFDS_{NO,CLEAR}, or we might have out-of-date microcode.
++ */
++ switch ( boot_cpu_data.x86_model )
++ {
++ case INTEL_FAM6_ALDERLAKE:
++ case INTEL_FAM6_RAPTORLAKE:
++ /*
++ * Alder Lake and Raptor Lake might be a client SKU (with the
++ * Gracemont cores active, and therefore vulnerable) or might be a
++ * server SKU (with the Gracemont cores disabled, and therefore not
++ * vulnerable).
++ *
++ * See if the CPU identifies as hybrid to distinguish the two cases.
++ */
++ if ( !cpu_has_hybrid )
++ break;
++ fallthrough;
++ case INTEL_FAM6_ALDERLAKE_L:
++ case INTEL_FAM6_RAPTORLAKE_P:
++ case INTEL_FAM6_RAPTORLAKE_S:
++
++ case INTEL_FAM6_ATOM_GOLDMONT: /* Apollo Lake */
++ case INTEL_FAM6_ATOM_GOLDMONT_D: /* Denverton */
++ case INTEL_FAM6_ATOM_GOLDMONT_PLUS: /* Gemini Lake */
++ case INTEL_FAM6_ATOM_TREMONT_D: /* Snow Ridge / Parker Ridge */
++ case INTEL_FAM6_ATOM_TREMONT: /* Elkhart Lake */
++ case INTEL_FAM6_ATOM_TREMONT_L: /* Jasper Lake */
++ case INTEL_FAM6_ATOM_GRACEMONT: /* Alder Lake N */
++ return;
++ }
++
++ /*
++ * We appear to be on an unaffected CPU which didn't enumerate RFDS_NO,
++ * perhaps because of its age or because of out-of-date microcode.
++ * Synthesise it.
++ */
++ setup_force_cpu_cap(X86_FEATURE_RFDS_NO);
++}
++
+ static bool __init cpu_has_gds(void)
+ {
+ /*
+@@ -1872,6 +1952,7 @@ void __init init_speculation_mitigations(void)
+ *
+ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-analysis-microarchitectural-data-sampling.html
+ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html
++ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
+ *
+ * Relevant ucodes:
+ *
+@@ -1901,8 +1982,12 @@ void __init init_speculation_mitigations(void)
+ *
+ * If FB_CLEAR is enumerated, L1D_FLUSH does not have the same scrubbing
+ * side effects as VERW and cannot be used in its place.
++ *
++ * - March 2024, for RFDS. Enumerate RFDS_CLEAR to mean that VERW now
++ * scrubs non-architectural entries from certain register files.
+ */
+ mds_calculations();
++ rfds_calculations();
+
+ /*
+ * Parts which enumerate FB_CLEAR are those with now-updated microcode
+@@ -1934,15 +2019,19 @@ void __init init_speculation_mitigations(void)
+ * MLPDS/MFBDS when SMT is enabled.
+ */
+ if ( opt_verw_pv == -1 )
+- opt_verw_pv = cpu_has_useful_md_clear;
++ opt_verw_pv = cpu_has_useful_md_clear || cpu_has_rfds_clear;
+
+ if ( opt_verw_hvm == -1 )
+- opt_verw_hvm = cpu_has_useful_md_clear;
++ opt_verw_hvm = cpu_has_useful_md_clear || cpu_has_rfds_clear;
+
+ /*
+ * If SMT is active, and we're protecting against MDS or MMIO stale data,
+ * we need to scrub before going idle as well as on return to guest.
+ * Various pipeline resources are repartitioned amongst non-idle threads.
++ *
++ * We don't need to scrub on idle for RFDS. There are no affected cores
++ * which support SMT, despite there being affected cores in hybrid systems
++ * which have SMT elsewhere in the platform.
+ */
+ if ( ((cpu_has_useful_md_clear && (opt_verw_pv || opt_verw_hvm)) ||
+ opt_verw_mmio) && hw_smt_enabled )
+@@ -1956,7 +2045,8 @@ void __init init_speculation_mitigations(void)
+ * It is only safe to use L1D_FLUSH in place of VERW when MD_CLEAR is the
+ * only *_CLEAR we can see.
+ */
+- if ( opt_l1d_flush && cpu_has_md_clear && !cpu_has_fb_clear )
++ if ( opt_l1d_flush && cpu_has_md_clear && !cpu_has_fb_clear &&
++ !cpu_has_rfds_clear )
+ opt_verw_hvm = false;
+
+ /*
+diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
+index aec1407613..113e6cadc1 100644
+--- a/xen/include/public/arch-x86/cpufeatureset.h
++++ b/xen/include/public/arch-x86/cpufeatureset.h
+@@ -264,6 +264,7 @@ XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*!A VERW clears microarchitectural buffe
+ XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! June 2021 TSX defeaturing in microcode. */
+ XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
+ XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A SERIALIZE insn */
++XEN_CPUFEATURE(HYBRID, 9*32+15) /* Heterogeneous platform */
+ XEN_CPUFEATURE(TSXLDTRK, 9*32+16) /*a TSX load tracking suspend/resume insns */
+ XEN_CPUFEATURE(CET_IBT, 9*32+20) /* CET - Indirect Branch Tracking */
+ XEN_CPUFEATURE(AVX512_FP16, 9*32+23) /* AVX512 FP16 instructions */
+@@ -330,6 +331,8 @@ XEN_CPUFEATURE(OVRCLK_STATUS, 16*32+23) /* MSR_OVERCLOCKING_STATUS */
+ XEN_CPUFEATURE(PBRSB_NO, 16*32+24) /*A No Post-Barrier RSB predictions */
+ XEN_CPUFEATURE(GDS_CTRL, 16*32+25) /* MCU_OPT_CTRL.GDS_MIT_{DIS,LOCK} */
+ XEN_CPUFEATURE(GDS_NO, 16*32+26) /*A No Gather Data Sampling */
++XEN_CPUFEATURE(RFDS_NO, 16*32+27) /*A No Register File Data Sampling */
++XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register File(s) cleared by VERW */
+
+ /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */
+
+--
+2.44.0
+
diff --git a/0046-x86-paging-Delete-update_cr3-s-do_locking-parameter.patch b/0046-x86-paging-Delete-update_cr3-s-do_locking-parameter.patch
new file mode 100644
index 0000000..ce397a1
--- /dev/null
+++ b/0046-x86-paging-Delete-update_cr3-s-do_locking-parameter.patch
@@ -0,0 +1,161 @@
+From bf70ce8b3449c49eb828d5b1f4934a49b00fef35 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 20 Sep 2023 20:06:53 +0100
+Subject: [PATCH 46/67] x86/paging: Delete update_cr3()'s do_locking parameter
+
+Nicola reports that the XSA-438 fix introduced new MISRA violations because of
+some incidental tidying it tried to do. The parameter is useless, so resolve
+the MISRA regression by removing it.
+
+hap_update_cr3() discards the parameter entirely, while sh_update_cr3() uses
+it to distinguish internal and external callers and therefore whether the
+paging lock should be taken.
+
+However, we have paging_lock_recursive() for this purpose, which also removes
+the possibility of the shadow internal callers accidentally not holding the lock.
+
+Fixes: fb0ff49fe9f7 ("x86/shadow: defer releasing of PV's top-level shadow reference")
+Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Release-acked-by: Henry Wang <Henry.Wang@arm.com>
+(cherry picked from commit e71157d1ac2a7fbf413130663cf0a93ff9fbcf7e)
+---
+ xen/arch/x86/include/asm/paging.h | 5 ++---
+ xen/arch/x86/mm/hap/hap.c | 5 ++---
+ xen/arch/x86/mm/shadow/common.c | 2 +-
+ xen/arch/x86/mm/shadow/multi.c | 17 ++++++++---------
+ xen/arch/x86/mm/shadow/none.c | 3 +--
+ 5 files changed, 14 insertions(+), 18 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
+index 94c590f31a..809ff35d9a 100644
+--- a/xen/arch/x86/include/asm/paging.h
++++ b/xen/arch/x86/include/asm/paging.h
+@@ -138,8 +138,7 @@ struct paging_mode {
+ paddr_t ga, uint32_t *pfec,
+ unsigned int *page_order);
+ #endif
+- pagetable_t (*update_cr3 )(struct vcpu *v, bool do_locking,
+- bool noflush);
++ pagetable_t (*update_cr3 )(struct vcpu *v, bool noflush);
+ void (*update_paging_modes )(struct vcpu *v);
+ bool (*flush_tlb )(const unsigned long *vcpu_bitmap);
+
+@@ -312,7 +311,7 @@ static inline unsigned long paging_ga_to_gfn_cr3(struct vcpu *v,
+ * as the value to load into the host CR3 to schedule this vcpu */
+ static inline pagetable_t paging_update_cr3(struct vcpu *v, bool noflush)
+ {
+- return paging_get_hostmode(v)->update_cr3(v, 1, noflush);
++ return paging_get_hostmode(v)->update_cr3(v, noflush);
+ }
+
+ /* Update all the things that are derived from the guest's CR0/CR3/CR4.
+diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
+index 57a19c3d59..3ad39a7dd7 100644
+--- a/xen/arch/x86/mm/hap/hap.c
++++ b/xen/arch/x86/mm/hap/hap.c
+@@ -739,8 +739,7 @@ static bool cf_check hap_invlpg(struct vcpu *v, unsigned long linear)
+ return 1;
+ }
+
+-static pagetable_t cf_check hap_update_cr3(
+- struct vcpu *v, bool do_locking, bool noflush)
++static pagetable_t cf_check hap_update_cr3(struct vcpu *v, bool noflush)
+ {
+ v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3];
+ hvm_update_guest_cr3(v, noflush);
+@@ -826,7 +825,7 @@ static void cf_check hap_update_paging_modes(struct vcpu *v)
+ }
+
+ /* CR3 is effectively updated by a mode change. Flush ASIDs, etc. */
+- hap_update_cr3(v, 0, false);
++ hap_update_cr3(v, false);
+
+ unlock:
+ paging_unlock(d);
+diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
+index c0940f939e..18714dbd02 100644
+--- a/xen/arch/x86/mm/shadow/common.c
++++ b/xen/arch/x86/mm/shadow/common.c
+@@ -2579,7 +2579,7 @@ static void sh_update_paging_modes(struct vcpu *v)
+ }
+ #endif /* OOS */
+
+- v->arch.paging.mode->update_cr3(v, 0, false);
++ v->arch.paging.mode->update_cr3(v, false);
+ }
+
+ void cf_check shadow_update_paging_modes(struct vcpu *v)
+diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
+index c92b354a78..e54a507b54 100644
+--- a/xen/arch/x86/mm/shadow/multi.c
++++ b/xen/arch/x86/mm/shadow/multi.c
+@@ -2506,7 +2506,7 @@ static int cf_check sh_page_fault(
+ * In any case, in the PAE case, the ASSERT is not true; it can
+ * happen because of actions the guest is taking. */
+ #if GUEST_PAGING_LEVELS == 3
+- v->arch.paging.mode->update_cr3(v, 0, false);
++ v->arch.paging.mode->update_cr3(v, false);
+ #else
+ ASSERT(d->is_shutting_down);
+ #endif
+@@ -3224,17 +3224,13 @@ static void cf_check sh_detach_old_tables(struct vcpu *v)
+ }
+ }
+
+-static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
+- bool noflush)
++static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool noflush)
+ /* Updates vcpu->arch.cr3 after the guest has changed CR3.
+ * Paravirtual guests should set v->arch.guest_table (and guest_table_user,
+ * if appropriate).
+ * HVM guests should also make sure hvm_get_guest_cntl_reg(v, 3) works;
+ * this function will call hvm_update_guest_cr(v, 3) to tell them where the
+ * shadow tables are.
+- * If do_locking != 0, assume we are being called from outside the
+- * shadow code, and must take and release the paging lock; otherwise
+- * that is the caller's responsibility.
+ */
+ {
+ struct domain *d = v->domain;
+@@ -3252,7 +3248,11 @@ static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
+ return old_entry;
+ }
+
+- if ( do_locking ) paging_lock(v->domain);
++ /*
++ * This is used externally (with the paging lock not taken) and internally
++ * by the shadow code (with the lock already taken).
++ */
++ paging_lock_recursive(v->domain);
+
+ #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
+ /* Need to resync all the shadow entries on a TLB flush. Resync
+@@ -3480,8 +3480,7 @@ static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
+ shadow_sync_other_vcpus(v);
+ #endif
+
+- /* Release the lock, if we took it (otherwise it's the caller's problem) */
+- if ( do_locking ) paging_unlock(v->domain);
++ paging_unlock(v->domain);
+
+ return old_entry;
+ }
+diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
+index 743c0ffb85..7e4e386cd0 100644
+--- a/xen/arch/x86/mm/shadow/none.c
++++ b/xen/arch/x86/mm/shadow/none.c
+@@ -52,8 +52,7 @@ static unsigned long cf_check _gva_to_gfn(
+ }
+ #endif
+
+-static pagetable_t cf_check _update_cr3(struct vcpu *v, bool do_locking,
+- bool noflush)
++static pagetable_t cf_check _update_cr3(struct vcpu *v, bool noflush)
+ {
+ ASSERT_UNREACHABLE();
+ return pagetable_null();
+--
+2.44.0
+
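For readers who want the locking idiom in isolation: a minimal standalone sketch (plain C and pthreads; the helper names and the use of a recursive mutex are illustrative assumptions, not Xen's paging_lock_recursive() implementation) of why a recursive lock lets internal and external callers share one code path:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock;   /* stands in for the paging lock */

static void lock_recursive_sketch(void) { pthread_mutex_lock(&lock); }
static void unlock_sketch(void)         { pthread_mutex_unlock(&lock); }

/* "Internal" path: may be reached with the lock already held. */
static void internal_update(void)
{
    lock_recursive_sketch();
    /* ... touch paging state ... */
    unlock_sketch();
}

/* "External" path: does not hold the lock yet. */
static void external_update(void)
{
    lock_recursive_sketch();
    internal_update();          /* re-acquires harmlessly */
    unlock_sketch();
}

int main(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);

    external_update();
    puts("both call paths used the same locking helper");
    return 0;
}
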
diff --git a/0047-xen-Swap-order-of-actions-in-the-FREE-macros.patch b/0047-xen-Swap-order-of-actions-in-the-FREE-macros.patch
new file mode 100644
index 0000000..3e58906
--- /dev/null
+++ b/0047-xen-Swap-order-of-actions-in-the-FREE-macros.patch
@@ -0,0 +1,58 @@
+From 0a53565f1886201cc8a8afe9b2619ee297c20955 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Fri, 2 Feb 2024 00:39:42 +0000
+Subject: [PATCH 47/67] xen: Swap order of actions in the FREE*() macros
+
+Wherever possible, it is a good idea to NULL out the visible reference to an
+object prior to freeing it. The FREE*() macros already collect together both
+parts, making it easy to adjust.
+
+This has a marginal code generation improvement, as some of the calls to the
+free() function can be tailcall optimised.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit c4f427ec879e7c0df6d44d02561e8bee838a293e)
+---
+ xen/include/xen/mm.h | 3 ++-
+ xen/include/xen/xmalloc.h | 7 ++++---
+ 2 files changed, 6 insertions(+), 4 deletions(-)
+
+diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
+index 3dc61bcc3c..211685a5d2 100644
+--- a/xen/include/xen/mm.h
++++ b/xen/include/xen/mm.h
+@@ -80,8 +80,9 @@ bool scrub_free_pages(void);
+
+ /* Free an allocation, and zero the pointer to it. */
+ #define FREE_XENHEAP_PAGES(p, o) do { \
+- free_xenheap_pages(p, o); \
++ void *_ptr_ = (p); \
+ (p) = NULL; \
++ free_xenheap_pages(_ptr_, o); \
+ } while ( false )
+ #define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
+
+diff --git a/xen/include/xen/xmalloc.h b/xen/include/xen/xmalloc.h
+index 16979a117c..d857298011 100644
+--- a/xen/include/xen/xmalloc.h
++++ b/xen/include/xen/xmalloc.h
+@@ -66,9 +66,10 @@
+ extern void xfree(void *);
+
+ /* Free an allocation, and zero the pointer to it. */
+-#define XFREE(p) do { \
+- xfree(p); \
+- (p) = NULL; \
++#define XFREE(p) do { \
++ void *_ptr_ = (p); \
++ (p) = NULL; \
++ xfree(_ptr_); \
+ } while ( false )
+
+ /* Underlying functions */
+--
+2.44.0
+
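As a self-contained illustration of the ordering adopted above (clear the visible pointer first, call free last so it can become a tail call), here is a hedged sketch with hypothetical names standing in for xfree()/XFREE():

#include <stdlib.h>

static void my_free(void *p) { free(p); }   /* stand-in for xfree() */

/* Clear the visible reference before freeing; the free call is the last
 * action of the macro, so the compiler may emit it as a tail call. */
#define MY_FREE(p) do {                     \
    void *_ptr_ = (p);                      \
    (p) = NULL;                             \
    my_free(_ptr_);                         \
} while ( 0 )

int main(void)
{
    int *buf = malloc(16 * sizeof(*buf));

    MY_FREE(buf);               /* buf reads back as NULL afterwards */
    return buf == NULL ? 0 : 1;
}
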
diff --git a/0048-x86-spinlock-introduce-support-for-blocking-speculat.patch b/0048-x86-spinlock-introduce-support-for-blocking-speculat.patch
new file mode 100644
index 0000000..ecf0830
--- /dev/null
+++ b/0048-x86-spinlock-introduce-support-for-blocking-speculat.patch
@@ -0,0 +1,331 @@
+From 9d2f136328aab5537b7180a1b23e171893ebe455 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 13 Feb 2024 13:08:05 +0100
+Subject: [PATCH 48/67] x86/spinlock: introduce support for blocking
+ speculation into critical regions
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Introduce a new Kconfig option to block speculation into lock protected
+critical regions. The Kconfig option is enabled by default, but the mitigation
+won't be engaged unless it's explicitly enabled in the command line using
+`spec-ctrl=lock-harden`.
+
+Convert the spinlock acquire macros into always-inline functions, and introduce
+a speculation barrier after the lock has been taken. Note the speculation
+barrier is not placed inside the implementation of the spin lock functions, so as
+to prevent speculation from falling through the call to the lock functions
+resulting in the barrier also being skipped.
+
+trylock variants are protected using a construct akin to the existing
+evaluate_nospec().
+
+This patch only implements the speculation barrier for x86.
+
+Note spin locks are the only locking primitive taken care of in this change;
+further locking primitives will be adjusted by separate changes.
+
+This is part of XSA-453 / CVE-2024-2193
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 7ef0084418e188d05f338c3e028fbbe8b6924afa)
+---
+ docs/misc/xen-command-line.pandoc | 7 ++++-
+ xen/arch/x86/include/asm/cpufeatures.h | 2 +-
+ xen/arch/x86/include/asm/nospec.h | 26 ++++++++++++++++++
+ xen/arch/x86/spec_ctrl.c | 26 +++++++++++++++---
+ xen/common/Kconfig | 17 ++++++++++++
+ xen/include/xen/nospec.h | 15 +++++++++++
+ xen/include/xen/spinlock.h | 37 +++++++++++++++++++++-----
+ 7 files changed, 119 insertions(+), 11 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index d909ec94fe..e1d56407dd 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -2327,7 +2327,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
+ > {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
+ > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
+ > eager-fpu,l1d-flush,branch-harden,srb-lock,
+-> unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
++> unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]`
+
+ Controls for speculative execution sidechannel mitigations. By default, Xen
+ will pick the most appropriate mitigations based on compiled in support,
+@@ -2454,6 +2454,11 @@ On all hardware, the `div-scrub=` option can be used to force or prevent Xen
+ from mitigating the DIV-leakage vulnerability. By default, Xen will mitigate
+ DIV-leakage on hardware believed to be vulnerable.
+
++If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
++boolean can be used to force or prevent Xen from using speculation barriers to
++protect lock critical regions. This mitigation won't be engaged by default,
++and needs to be explicitly enabled on the command line.
++
+ ### sync_console
+ > `= <boolean>`
+
+diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
+index c3aad21c3b..7e8221fd85 100644
+--- a/xen/arch/x86/include/asm/cpufeatures.h
++++ b/xen/arch/x86/include/asm/cpufeatures.h
+@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF, X86_SYNTH( 8)) /* APERFMPERF */
+ XEN_CPUFEATURE(MFENCE_RDTSC, X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
+ XEN_CPUFEATURE(XEN_SMEP, X86_SYNTH(10)) /* SMEP gets used by Xen itself */
+ XEN_CPUFEATURE(XEN_SMAP, X86_SYNTH(11)) /* SMAP gets used by Xen itself */
+-/* Bit 12 unused. */
++XEN_CPUFEATURE(SC_NO_LOCK_HARDEN, X86_SYNTH(12)) /* (Disable) Lock critical region hardening */
+ XEN_CPUFEATURE(IND_THUNK_LFENCE, X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
+ XEN_CPUFEATURE(IND_THUNK_JMP, X86_SYNTH(14)) /* Use IND_THUNK_JMP */
+ XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
+diff --git a/xen/arch/x86/include/asm/nospec.h b/xen/arch/x86/include/asm/nospec.h
+index 7150e76b87..0725839e19 100644
+--- a/xen/arch/x86/include/asm/nospec.h
++++ b/xen/arch/x86/include/asm/nospec.h
+@@ -38,6 +38,32 @@ static always_inline void block_speculation(void)
+ barrier_nospec_true();
+ }
+
++static always_inline void arch_block_lock_speculation(void)
++{
++ alternative("lfence", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
++}
++
++/* Allow to insert a read memory barrier into conditionals */
++static always_inline bool barrier_lock_true(void)
++{
++ alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
++ return true;
++}
++
++static always_inline bool barrier_lock_false(void)
++{
++ alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
++ return false;
++}
++
++static always_inline bool arch_lock_evaluate_nospec(bool condition)
++{
++ if ( condition )
++ return barrier_lock_true();
++ else
++ return barrier_lock_false();
++}
++
+ #endif /* _ASM_X86_NOSPEC_H */
+
+ /*
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 1ee81e2dfe..ac21af2c5c 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -65,6 +65,7 @@ int8_t __read_mostly opt_eager_fpu = -1;
+ int8_t __read_mostly opt_l1d_flush = -1;
+ static bool __initdata opt_branch_harden =
+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
++static bool __initdata opt_lock_harden;
+
+ bool __initdata bsp_delay_spec_ctrl;
+ uint8_t __read_mostly default_xen_spec_ctrl;
+@@ -133,6 +134,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ opt_ssbd = false;
+ opt_l1d_flush = 0;
+ opt_branch_harden = false;
++ opt_lock_harden = false;
+ opt_srb_lock = 0;
+ opt_unpriv_mmio = false;
+ opt_gds_mit = 0;
+@@ -298,6 +300,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
+ rc = -EINVAL;
+ }
+ }
++ else if ( (val = parse_boolean("lock-harden", s, ss)) >= 0 )
++ {
++ if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
++ opt_lock_harden = val;
++ else
++ {
++ no_config_param("SPECULATIVE_HARDEN_LOCK", "spec-ctrl", s, ss);
++ rc = -EINVAL;
++ }
++ }
+ else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
+ opt_srb_lock = val;
+ else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
+@@ -500,7 +512,8 @@ static void __init print_details(enum ind_thunk thunk)
+ if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
+- IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
++ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) ||
++ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
+ printk(" Compiled-in support:"
+ #ifdef CONFIG_INDIRECT_THUNK
+ " INDIRECT_THUNK"
+@@ -516,11 +529,14 @@ static void __init print_details(enum ind_thunk thunk)
+ #endif
+ #ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
+ " HARDEN_GUEST_ACCESS"
++#endif
++#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
++ " HARDEN_LOCK"
+ #endif
+ "\n");
+
+ /* Settings for Xen's protection, irrespective of guests. */
+- printk(" Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
++ printk(" Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s%s\n",
+ thunk != THUNK_NONE ? "BTI-Thunk: " : "",
+ thunk == THUNK_NONE ? "" :
+ thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
+@@ -547,7 +563,8 @@ static void __init print_details(enum ind_thunk thunk)
+ opt_verw_pv || opt_verw_hvm ||
+ opt_verw_mmio ? " VERW" : "",
+ opt_div_scrub ? " DIV" : "",
+- opt_branch_harden ? " BRANCH_HARDEN" : "");
++ opt_branch_harden ? " BRANCH_HARDEN" : "",
++ opt_lock_harden ? " LOCK_HARDEN" : "");
+
+ /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
+ if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
+@@ -1930,6 +1947,9 @@ void __init init_speculation_mitigations(void)
+ if ( !opt_branch_harden )
+ setup_force_cpu_cap(X86_FEATURE_SC_NO_BRANCH_HARDEN);
+
++ if ( !opt_lock_harden )
++ setup_force_cpu_cap(X86_FEATURE_SC_NO_LOCK_HARDEN);
++
+ /*
+ * We do not disable HT by default on affected hardware.
+ *
+diff --git a/xen/common/Kconfig b/xen/common/Kconfig
+index e7794cb7f6..cd73851538 100644
+--- a/xen/common/Kconfig
++++ b/xen/common/Kconfig
+@@ -173,6 +173,23 @@ config SPECULATIVE_HARDEN_GUEST_ACCESS
+
+ If unsure, say Y.
+
++config SPECULATIVE_HARDEN_LOCK
++ bool "Speculative lock context hardening"
++ default y
++ depends on X86
++ help
++ Contemporary processors may use speculative execution as a
++ performance optimisation, but this can potentially be abused by an
++ attacker to leak data via speculative sidechannels.
++
++ One source of data leakage is via speculative accesses to lock
++ critical regions.
++
++ This option is disabled by default at run time, and needs to be
++ enabled on the command line.
++
++ If unsure, say Y.
++
+ endmenu
+
+ config DIT_DEFAULT
+diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
+index 76255bc46e..4552846403 100644
+--- a/xen/include/xen/nospec.h
++++ b/xen/include/xen/nospec.h
+@@ -70,6 +70,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
+ #define array_access_nospec(array, index) \
+ (array)[array_index_nospec(index, ARRAY_SIZE(array))]
+
++static always_inline void block_lock_speculation(void)
++{
++#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
++ arch_block_lock_speculation();
++#endif
++}
++
++static always_inline bool lock_evaluate_nospec(bool condition)
++{
++#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
++ return arch_lock_evaluate_nospec(condition);
++#endif
++ return condition;
++}
++
+ #endif /* XEN_NOSPEC_H */
+
+ /*
+diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
+index 961891bea4..daf48fdea7 100644
+--- a/xen/include/xen/spinlock.h
++++ b/xen/include/xen/spinlock.h
+@@ -1,6 +1,7 @@
+ #ifndef __SPINLOCK_H__
+ #define __SPINLOCK_H__
+
++#include <xen/nospec.h>
+ #include <xen/time.h>
+ #include <asm/system.h>
+ #include <asm/spinlock.h>
+@@ -189,13 +190,30 @@ int _spin_trylock_recursive(spinlock_t *lock);
+ void _spin_lock_recursive(spinlock_t *lock);
+ void _spin_unlock_recursive(spinlock_t *lock);
+
+-#define spin_lock(l) _spin_lock(l)
+-#define spin_lock_cb(l, c, d) _spin_lock_cb(l, c, d)
+-#define spin_lock_irq(l) _spin_lock_irq(l)
++static always_inline void spin_lock(spinlock_t *l)
++{
++ _spin_lock(l);
++ block_lock_speculation();
++}
++
++static always_inline void spin_lock_cb(spinlock_t *l, void (*c)(void *data),
++ void *d)
++{
++ _spin_lock_cb(l, c, d);
++ block_lock_speculation();
++}
++
++static always_inline void spin_lock_irq(spinlock_t *l)
++{
++ _spin_lock_irq(l);
++ block_lock_speculation();
++}
++
+ #define spin_lock_irqsave(l, f) \
+ ({ \
+ BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long)); \
+ ((f) = _spin_lock_irqsave(l)); \
++ block_lock_speculation(); \
+ })
+
+ #define spin_unlock(l) _spin_unlock(l)
+@@ -203,7 +221,7 @@ void _spin_unlock_recursive(spinlock_t *lock);
+ #define spin_unlock_irqrestore(l, f) _spin_unlock_irqrestore(l, f)
+
+ #define spin_is_locked(l) _spin_is_locked(l)
+-#define spin_trylock(l) _spin_trylock(l)
++#define spin_trylock(l) lock_evaluate_nospec(_spin_trylock(l))
+
+ #define spin_trylock_irqsave(lock, flags) \
+ ({ \
+@@ -224,8 +242,15 @@ void _spin_unlock_recursive(spinlock_t *lock);
+ * are any critical regions that cannot form part of such a set, they can use
+ * standard spin_[un]lock().
+ */
+-#define spin_trylock_recursive(l) _spin_trylock_recursive(l)
+-#define spin_lock_recursive(l) _spin_lock_recursive(l)
++#define spin_trylock_recursive(l) \
++ lock_evaluate_nospec(_spin_trylock_recursive(l))
++
++static always_inline void spin_lock_recursive(spinlock_t *l)
++{
++ _spin_lock_recursive(l);
++ block_lock_speculation();
++}
++
+ #define spin_unlock_recursive(l) _spin_unlock_recursive(l)
+
+ #endif /* __SPINLOCK_H__ */
+--
+2.44.0
+
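The same idea can be shown outside Xen. The sketch below (x86 and GCC assumed for the LFENCE; pthreads and the helper names are illustrative stand-ins, not the patch's primitives) keeps the barrier in an always-inline wrapper after the lock is taken, and routes a trylock result through a barrier on both branches, mirroring the evaluate_nospec()-style construct:

#include <pthread.h>
#include <stdbool.h>

static inline __attribute__((always_inline)) void lock_barrier(void)
{
    __asm__ volatile ( "lfence" ::: "memory" );   /* block speculation */
}

static inline __attribute__((always_inline)) void
guarded_lock(pthread_mutex_t *l)
{
    pthread_mutex_lock(l);      /* plain primitive, no barrier of its own */
    lock_barrier();             /* barrier lives in the inlined wrapper */
}

static inline __attribute__((always_inline)) bool
guarded_trylock(pthread_mutex_t *l)
{
    if ( pthread_mutex_trylock(l) == 0 )
    {
        lock_barrier();         /* lock taken: serialise before the region */
        return true;
    }
    lock_barrier();             /* not taken: serialise that path as well */
    return false;
}

int main(void)
{
    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    guarded_lock(&m);
    pthread_mutex_unlock(&m);
    return guarded_trylock(&m) ? 0 : 1;
}
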
diff --git a/0049-rwlock-introduce-support-for-blocking-speculation-in.patch b/0049-rwlock-introduce-support-for-blocking-speculation-in.patch
new file mode 100644
index 0000000..593b588
--- /dev/null
+++ b/0049-rwlock-introduce-support-for-blocking-speculation-in.patch
@@ -0,0 +1,125 @@
+From 7454dad6ee15f9fa6d84fc285d366b86f3d47494 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 13 Feb 2024 16:08:52 +0100
+Subject: [PATCH 49/67] rwlock: introduce support for blocking speculation into
+ critical regions
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Introduce inline wrappers as required and add direct calls to
+block_lock_speculation() in order to prevent speculation into the rwlock
+protected critical regions.
+
+Note the rwlock primitives are adjusted to use the non speculation safe variants
+of the spinlock handlers, as a speculation barrier is added in the rwlock
+calling wrappers.
+
+trylock variants are protected by using lock_evaluate_nospec().
+
+This is part of XSA-453 / CVE-2024-2193
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit a1fb15f61692b1fa9945fc51f55471ace49cdd59)
+---
+ xen/common/rwlock.c | 14 +++++++++++---
+ xen/include/xen/rwlock.h | 34 ++++++++++++++++++++++++++++------
+ 2 files changed, 39 insertions(+), 9 deletions(-)
+
+diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
+index aa15529bbe..cda06b9d6e 100644
+--- a/xen/common/rwlock.c
++++ b/xen/common/rwlock.c
+@@ -34,8 +34,11 @@ void queue_read_lock_slowpath(rwlock_t *lock)
+
+ /*
+ * Put the reader into the wait queue.
++ *
++ * Use the speculation unsafe helper, as it's the caller responsibility to
++ * issue a speculation barrier if required.
+ */
+- spin_lock(&lock->lock);
++ _spin_lock(&lock->lock);
+
+ /*
+ * At the head of the wait queue now, wait until the writer state
+@@ -64,8 +67,13 @@ void queue_write_lock_slowpath(rwlock_t *lock)
+ {
+ u32 cnts;
+
+- /* Put the writer into the wait queue. */
+- spin_lock(&lock->lock);
++ /*
++ * Put the writer into the wait queue.
++ *
++ * Use the speculation unsafe helper, as it's the caller's responsibility to
++ * issue a speculation barrier if required.
++ */
++ _spin_lock(&lock->lock);
+
+ /* Try to acquire the lock directly if no reader is present. */
+ if ( !atomic_read(&lock->cnts) &&
+diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
+index 0cc9167715..fd0458be94 100644
+--- a/xen/include/xen/rwlock.h
++++ b/xen/include/xen/rwlock.h
+@@ -247,27 +247,49 @@ static inline int _rw_is_write_locked(rwlock_t *lock)
+ return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
+ }
+
+-#define read_lock(l) _read_lock(l)
+-#define read_lock_irq(l) _read_lock_irq(l)
++static always_inline void read_lock(rwlock_t *l)
++{
++ _read_lock(l);
++ block_lock_speculation();
++}
++
++static always_inline void read_lock_irq(rwlock_t *l)
++{
++ _read_lock_irq(l);
++ block_lock_speculation();
++}
++
+ #define read_lock_irqsave(l, f) \
+ ({ \
+ BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long)); \
+ ((f) = _read_lock_irqsave(l)); \
++ block_lock_speculation(); \
+ })
+
+ #define read_unlock(l) _read_unlock(l)
+ #define read_unlock_irq(l) _read_unlock_irq(l)
+ #define read_unlock_irqrestore(l, f) _read_unlock_irqrestore(l, f)
+-#define read_trylock(l) _read_trylock(l)
++#define read_trylock(l) lock_evaluate_nospec(_read_trylock(l))
++
++static always_inline void write_lock(rwlock_t *l)
++{
++ _write_lock(l);
++ block_lock_speculation();
++}
++
++static always_inline void write_lock_irq(rwlock_t *l)
++{
++ _write_lock_irq(l);
++ block_lock_speculation();
++}
+
+-#define write_lock(l) _write_lock(l)
+-#define write_lock_irq(l) _write_lock_irq(l)
+ #define write_lock_irqsave(l, f) \
+ ({ \
+ BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long)); \
+ ((f) = _write_lock_irqsave(l)); \
++ block_lock_speculation(); \
+ })
+-#define write_trylock(l) _write_trylock(l)
++#define write_trylock(l) lock_evaluate_nospec(_write_trylock(l))
+
+ #define write_unlock(l) _write_unlock(l)
+ #define write_unlock_irq(l) _write_unlock_irq(l)
+--
+2.44.0
+
diff --git a/0050-percpu-rwlock-introduce-support-for-blocking-specula.patch b/0050-percpu-rwlock-introduce-support-for-blocking-specula.patch
new file mode 100644
index 0000000..1da2128
--- /dev/null
+++ b/0050-percpu-rwlock-introduce-support-for-blocking-specula.patch
@@ -0,0 +1,87 @@
+From 468a368b2e5a38fc0be8e9e5f475820f7e4a6b4f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 13 Feb 2024 17:57:38 +0100
+Subject: [PATCH 50/67] percpu-rwlock: introduce support for blocking
+ speculation into critical regions
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Add direct calls to block_lock_speculation() where required in order to prevent
+speculation into the lock protected critical regions. Also convert
+_percpu_read_lock() from inline to always_inline.
+
+Note that _percpu_write_lock() has been modified to use the non speculation
+safe variant of the locking primitives, as a speculation barrier is added
+unconditionally by the calling wrapper.
+
+This is part of XSA-453 / CVE-2024-2193
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit f218daf6d3a3b847736d37c6a6b76031a0d08441)
+---
+ xen/common/rwlock.c | 6 +++++-
+ xen/include/xen/rwlock.h | 14 ++++++++++----
+ 2 files changed, 15 insertions(+), 5 deletions(-)
+
+diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
+index cda06b9d6e..4da0ed8fad 100644
+--- a/xen/common/rwlock.c
++++ b/xen/common/rwlock.c
+@@ -125,8 +125,12 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
+ /*
+ * First take the write lock to protect against other writers or slow
+ * path readers.
++ *
++ * Note we use the speculation unsafe variant of write_lock(), as the
++ * calling wrapper already adds a speculation barrier after the lock has
++ * been taken.
+ */
+- write_lock(&percpu_rwlock->rwlock);
++ _write_lock(&percpu_rwlock->rwlock);
+
+ /* Now set the global variable so that readers start using read_lock. */
+ percpu_rwlock->writer_activating = 1;
+diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
+index fd0458be94..abe0804bf7 100644
+--- a/xen/include/xen/rwlock.h
++++ b/xen/include/xen/rwlock.h
+@@ -326,8 +326,8 @@ static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
+ #define percpu_rwlock_resource_init(l, owner) \
+ (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
+
+-static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
+- percpu_rwlock_t *percpu_rwlock)
++static always_inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
++ percpu_rwlock_t *percpu_rwlock)
+ {
+ /* Validate the correct per_cpudata variable has been provided. */
+ _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
+@@ -362,6 +362,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
+ }
+ else
+ {
++ /* Other branch already has a speculation barrier in read_lock(). */
++ block_lock_speculation();
+ /* All other paths have implicit check_lock() calls via read_lock(). */
+ check_lock(&percpu_rwlock->rwlock.lock.debug, false);
+ }
+@@ -410,8 +412,12 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
+ _percpu_read_lock(&get_per_cpu_var(percpu), lock)
+ #define percpu_read_unlock(percpu, lock) \
+ _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
+-#define percpu_write_lock(percpu, lock) \
+- _percpu_write_lock(&get_per_cpu_var(percpu), lock)
++
++#define percpu_write_lock(percpu, lock) \
++({ \
++ _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
++ block_lock_speculation(); \
++})
+ #define percpu_write_unlock(percpu, lock) \
+ _percpu_write_unlock(&get_per_cpu_var(percpu), lock)
+
+--
+2.44.0
+
diff --git a/0051-locking-attempt-to-ensure-lock-wrappers-are-always-i.patch b/0051-locking-attempt-to-ensure-lock-wrappers-are-always-i.patch
new file mode 100644
index 0000000..822836d
--- /dev/null
+++ b/0051-locking-attempt-to-ensure-lock-wrappers-are-always-i.patch
@@ -0,0 +1,405 @@
+From 2cc5e57be680a516aa5cdef4281856d09b9d0ea6 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 4 Mar 2024 14:29:36 +0100
+Subject: [PATCH 51/67] locking: attempt to ensure lock wrappers are always
+ inline
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+In order to prevent the locking speculation barriers from being inside of
+`call`ed functions that could be speculatively bypassed.
+
+While there also add an extra locking barrier to _mm_write_lock() in the branch
+taken when the lock is already held.
+
+Note some functions are switched to use the unsafe variants (without speculation
+barrier) of the locking primitives, but a speculation barrier is always added
+to the exposed public lock wrapping helper. That's the case with
+sched_spin_lock_double() or pcidevs_lock() for example.
+
+This is part of XSA-453 / CVE-2024-2193
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 197ecd838a2aaf959a469df3696d4559c4f8b762)
+---
+ xen/arch/x86/hvm/vpt.c | 10 +++++++---
+ xen/arch/x86/include/asm/irq.h | 1 +
+ xen/arch/x86/mm/mm-locks.h | 28 +++++++++++++++-------------
+ xen/arch/x86/mm/p2m-pod.c | 2 +-
+ xen/common/event_channel.c | 5 +++--
+ xen/common/grant_table.c | 6 +++---
+ xen/common/sched/core.c | 19 ++++++++++++-------
+ xen/common/sched/private.h | 26 ++++++++++++++++++++++++--
+ xen/common/timer.c | 8 +++++---
+ xen/drivers/passthrough/pci.c | 5 +++--
+ xen/include/xen/event.h | 4 ++--
+ xen/include/xen/pci.h | 8 ++++++--
+ 12 files changed, 82 insertions(+), 40 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
+index cb1d81bf9e..66f1095245 100644
+--- a/xen/arch/x86/hvm/vpt.c
++++ b/xen/arch/x86/hvm/vpt.c
+@@ -161,7 +161,7 @@ static int pt_irq_masked(struct periodic_time *pt)
+ * pt->vcpu field, because another thread holding the pt_migrate lock
+ * may already be spinning waiting for your vcpu lock.
+ */
+-static void pt_vcpu_lock(struct vcpu *v)
++static always_inline void pt_vcpu_lock(struct vcpu *v)
+ {
+ spin_lock(&v->arch.hvm.tm_lock);
+ }
+@@ -180,9 +180,13 @@ static void pt_vcpu_unlock(struct vcpu *v)
+ * need to take an additional lock that protects against pt->vcpu
+ * changing.
+ */
+-static void pt_lock(struct periodic_time *pt)
++static always_inline void pt_lock(struct periodic_time *pt)
+ {
+- read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
++ /*
++ * Use the speculation unsafe variant for the first lock, as the following
++ * lock taking helper already includes a speculation barrier.
++ */
++ _read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+ spin_lock(&pt->vcpu->arch.hvm.tm_lock);
+ }
+
+diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
+index f6a0207a80..823d627fd0 100644
+--- a/xen/arch/x86/include/asm/irq.h
++++ b/xen/arch/x86/include/asm/irq.h
+@@ -178,6 +178,7 @@ void cf_check irq_complete_move(struct irq_desc *);
+
+ extern struct irq_desc *irq_desc;
+
++/* Not speculation safe, only used for AP bringup. */
+ void lock_vector_lock(void);
+ void unlock_vector_lock(void);
+
+diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
+index c1523aeccf..265239c49f 100644
+--- a/xen/arch/x86/mm/mm-locks.h
++++ b/xen/arch/x86/mm/mm-locks.h
+@@ -86,8 +86,8 @@ static inline void _set_lock_level(int l)
+ this_cpu(mm_lock_level) = l;
+ }
+
+-static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
+- const char *func, int level, int rec)
++static always_inline void _mm_lock(const struct domain *d, mm_lock_t *l,
++ const char *func, int level, int rec)
+ {
+ if ( !((mm_locked_by_me(l)) && rec) )
+ _check_lock_level(d, level);
+@@ -137,8 +137,8 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
+ return (l->locker == get_processor_id());
+ }
+
+-static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+- const char *func, int level)
++static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
++ const char *func, int level)
+ {
+ if ( !mm_write_locked_by_me(l) )
+ {
+@@ -149,6 +149,8 @@ static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+ l->unlock_level = _get_lock_level();
+ _set_lock_level(_lock_level(d, level));
+ }
++ else
++ block_speculation();
+ l->recurse_count++;
+ }
+
+@@ -162,8 +164,8 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
+ percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
+ }
+
+-static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
+- int level)
++static always_inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
++ int level)
+ {
+ _check_lock_level(d, level);
+ percpu_read_lock(p2m_percpu_rwlock, &l->lock);
+@@ -178,15 +180,15 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
+
+ /* This wrapper uses the line number to express the locking order below */
+ #define declare_mm_lock(name) \
+- static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l, \
+- const char *func, int rec) \
++ static always_inline void mm_lock_##name( \
++ const struct domain *d, mm_lock_t *l, const char *func, int rec) \
+ { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
+ #define declare_mm_rwlock(name) \
+- static inline void mm_write_lock_##name(const struct domain *d, \
+- mm_rwlock_t *l, const char *func) \
++ static always_inline void mm_write_lock_##name( \
++ const struct domain *d, mm_rwlock_t *l, const char *func) \
+ { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); } \
+- static inline void mm_read_lock_##name(const struct domain *d, \
+- mm_rwlock_t *l) \
++ static always_inline void mm_read_lock_##name(const struct domain *d, \
++ mm_rwlock_t *l) \
+ { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
+ /* These capture the name of the calling function */
+ #define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
+@@ -321,7 +323,7 @@ declare_mm_lock(altp2mlist)
+ #define MM_LOCK_ORDER_altp2m 40
+ declare_mm_rwlock(altp2m);
+
+-static inline void p2m_lock(struct p2m_domain *p)
++static always_inline void p2m_lock(struct p2m_domain *p)
+ {
+ if ( p2m_is_altp2m(p) )
+ mm_write_lock(altp2m, p->domain, &p->lock);
+diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
+index fc110506dc..99dbcb3101 100644
+--- a/xen/arch/x86/mm/p2m-pod.c
++++ b/xen/arch/x86/mm/p2m-pod.c
+@@ -36,7 +36,7 @@
+ #define superpage_aligned(_x) (((_x)&(SUPERPAGE_PAGES-1))==0)
+
+ /* Enforce lock ordering when grabbing the "external" page_alloc lock */
+-static inline void lock_page_alloc(struct p2m_domain *p2m)
++static always_inline void lock_page_alloc(struct p2m_domain *p2m)
+ {
+ page_alloc_mm_pre_lock(p2m->domain);
+ spin_lock(&(p2m->domain->page_alloc_lock));
+diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
+index f5e0b12d15..dada9f15f5 100644
+--- a/xen/common/event_channel.c
++++ b/xen/common/event_channel.c
+@@ -62,7 +62,7 @@
+ * just assume the event channel is free or unbound at the moment when the
+ * evtchn_read_trylock() returns false.
+ */
+-static inline void evtchn_write_lock(struct evtchn *evtchn)
++static always_inline void evtchn_write_lock(struct evtchn *evtchn)
+ {
+ write_lock(&evtchn->lock);
+
+@@ -364,7 +364,8 @@ int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc, evtchn_port_t port)
+ return rc;
+ }
+
+-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
++static always_inline void double_evtchn_lock(struct evtchn *lchn,
++ struct evtchn *rchn)
+ {
+ ASSERT(lchn != rchn);
+
+diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
+index ee7cc496b8..62a8685cd5 100644
+--- a/xen/common/grant_table.c
++++ b/xen/common/grant_table.c
+@@ -410,7 +410,7 @@ static inline void act_set_gfn(struct active_grant_entry *act, gfn_t gfn)
+
+ static DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
+
+-static inline void grant_read_lock(struct grant_table *gt)
++static always_inline void grant_read_lock(struct grant_table *gt)
+ {
+ percpu_read_lock(grant_rwlock, >->lock);
+ }
+@@ -420,7 +420,7 @@ static inline void grant_read_unlock(struct grant_table *gt)
+ percpu_read_unlock(grant_rwlock, >->lock);
+ }
+
+-static inline void grant_write_lock(struct grant_table *gt)
++static always_inline void grant_write_lock(struct grant_table *gt)
+ {
+ percpu_write_lock(grant_rwlock, >->lock);
+ }
+@@ -457,7 +457,7 @@ nr_active_grant_frames(struct grant_table *gt)
+ return num_act_frames_from_sha_frames(nr_grant_frames(gt));
+ }
+
+-static inline struct active_grant_entry *
++static always_inline struct active_grant_entry *
+ active_entry_acquire(struct grant_table *t, grant_ref_t e)
+ {
+ struct active_grant_entry *act;
+diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
+index 078beb1adb..29bbab5ac6 100644
+--- a/xen/common/sched/core.c
++++ b/xen/common/sched/core.c
+@@ -348,23 +348,28 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
+ * This avoids dead- or live-locks when this code is running on both
+ * cpus at the same time.
+ */
+-static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
+- unsigned long *flags)
++static always_inline void sched_spin_lock_double(
++ spinlock_t *lock1, spinlock_t *lock2, unsigned long *flags)
+ {
++ /*
++ * In order to avoid extra overhead, use the locking primitives without the
++ * speculation barrier, and introduce a single barrier here.
++ */
+ if ( lock1 == lock2 )
+ {
+- spin_lock_irqsave(lock1, *flags);
++ *flags = _spin_lock_irqsave(lock1);
+ }
+ else if ( lock1 < lock2 )
+ {
+- spin_lock_irqsave(lock1, *flags);
+- spin_lock(lock2);
++ *flags = _spin_lock_irqsave(lock1);
++ _spin_lock(lock2);
+ }
+ else
+ {
+- spin_lock_irqsave(lock2, *flags);
+- spin_lock(lock1);
++ *flags = _spin_lock_irqsave(lock2);
++ _spin_lock(lock1);
+ }
++ block_lock_speculation();
+ }
+
+ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
+diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
+index 0527a8c70d..24a93dd0c1 100644
+--- a/xen/common/sched/private.h
++++ b/xen/common/sched/private.h
+@@ -207,8 +207,24 @@ DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
+ #define cpumask_scratch (&this_cpu(cpumask_scratch))
+ #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
+
++/*
++ * Deal with _spin_lock_irqsave() returning the flags value instead of storing
++ * it in a passed parameter.
++ */
++#define _sched_spinlock0(lock, irq) _spin_lock##irq(lock)
++#define _sched_spinlock1(lock, irq, arg) ({ \
++ BUILD_BUG_ON(sizeof(arg) != sizeof(unsigned long)); \
++ (arg) = _spin_lock##irq(lock); \
++})
++
++#define _sched_spinlock__(nr) _sched_spinlock ## nr
++#define _sched_spinlock_(nr) _sched_spinlock__(nr)
++#define _sched_spinlock(lock, irq, args...) \
++ _sched_spinlock_(count_args(args))(lock, irq, ## args)
++
+ #define sched_lock(kind, param, cpu, irq, arg...) \
+-static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
++static always_inline spinlock_t \
++*kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
+ { \
+ for ( ; ; ) \
+ { \
+@@ -220,10 +236,16 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
+ * \
+ * It may also be the case that v->processor may change but the \
+ * lock may be the same; this will succeed in that case. \
++ * \
++ * Use the speculation unsafe locking helper, there's a speculation \
++ * barrier before returning to the caller. \
+ */ \
+- spin_lock##irq(lock, ## arg); \
++ _sched_spinlock(lock, irq, ## arg); \
+ if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
++ { \
++ block_lock_speculation(); \
+ return lock; \
++ } \
+ spin_unlock##irq(lock, ## arg); \
+ } \
+ }
+diff --git a/xen/common/timer.c b/xen/common/timer.c
+index 9b5016d5ed..459668d417 100644
+--- a/xen/common/timer.c
++++ b/xen/common/timer.c
+@@ -240,7 +240,7 @@ static inline void deactivate_timer(struct timer *timer)
+ list_add(&timer->inactive, &per_cpu(timers, timer->cpu).inactive);
+ }
+
+-static inline bool_t timer_lock(struct timer *timer)
++static inline bool_t timer_lock_unsafe(struct timer *timer)
+ {
+ unsigned int cpu;
+
+@@ -254,7 +254,8 @@ static inline bool_t timer_lock(struct timer *timer)
+ rcu_read_unlock(&timer_cpu_read_lock);
+ return 0;
+ }
+- spin_lock(&per_cpu(timers, cpu).lock);
++ /* Use the speculation unsafe variant, the wrapper has the barrier. */
++ _spin_lock(&per_cpu(timers, cpu).lock);
+ if ( likely(timer->cpu == cpu) )
+ break;
+ spin_unlock(&per_cpu(timers, cpu).lock);
+@@ -267,8 +268,9 @@ static inline bool_t timer_lock(struct timer *timer)
+ #define timer_lock_irqsave(t, flags) ({ \
+ bool_t __x; \
+ local_irq_save(flags); \
+- if ( !(__x = timer_lock(t)) ) \
++ if ( !(__x = timer_lock_unsafe(t)) ) \
+ local_irq_restore(flags); \
++ block_lock_speculation(); \
+ __x; \
+ })
+
+diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
+index 8c62b14d19..1b3d285166 100644
+--- a/xen/drivers/passthrough/pci.c
++++ b/xen/drivers/passthrough/pci.c
+@@ -52,9 +52,10 @@ struct pci_seg {
+
+ static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
+
+-void pcidevs_lock(void)
++/* Do not use, as it has no speculation barrier, use pcidevs_lock() instead. */
++void pcidevs_lock_unsafe(void)
+ {
+- spin_lock_recursive(&_pcidevs_lock);
++ _spin_lock_recursive(&_pcidevs_lock);
+ }
+
+ void pcidevs_unlock(void)
+diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
+index 8eae9984a9..dd96e84c69 100644
+--- a/xen/include/xen/event.h
++++ b/xen/include/xen/event.h
+@@ -114,12 +114,12 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
+ #define bucket_from_port(d, p) \
+ ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
+
+-static inline void evtchn_read_lock(struct evtchn *evtchn)
++static always_inline void evtchn_read_lock(struct evtchn *evtchn)
+ {
+ read_lock(&evtchn->lock);
+ }
+
+-static inline bool evtchn_read_trylock(struct evtchn *evtchn)
++static always_inline bool evtchn_read_trylock(struct evtchn *evtchn)
+ {
+ return read_trylock(&evtchn->lock);
+ }
+diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
+index 5975ca2f30..b373f139d1 100644
+--- a/xen/include/xen/pci.h
++++ b/xen/include/xen/pci.h
+@@ -155,8 +155,12 @@ struct pci_dev {
+ * devices, it also sync the access to the msi capability that is not
+ * interrupt handling related (the mask bit register).
+ */
+-
+-void pcidevs_lock(void);
++void pcidevs_lock_unsafe(void);
++static always_inline void pcidevs_lock(void)
++{
++ pcidevs_lock_unsafe();
++ block_lock_speculation();
++}
+ void pcidevs_unlock(void);
+ bool_t __must_check pcidevs_locked(void);
+
+--
+2.44.0
+
diff --git a/0052-x86-mm-add-speculation-barriers-to-open-coded-locks.patch b/0052-x86-mm-add-speculation-barriers-to-open-coded-locks.patch
new file mode 100644
index 0000000..9e20f78
--- /dev/null
+++ b/0052-x86-mm-add-speculation-barriers-to-open-coded-locks.patch
@@ -0,0 +1,73 @@
+From 074b4c8987db235a0b86798810c045f68e4775b6 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 4 Mar 2024 18:08:48 +0100
+Subject: [PATCH 52/67] x86/mm: add speculation barriers to open coded locks
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Add a speculation barrier to the clearly identified open-coded lock taking
+functions.
+
+Note that the memory sharing page_lock() replacement (_page_lock()) is left
+as-is, as the code is experimental and not security supported.
+
+This is part of XSA-453 / CVE-2024-2193
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 42a572a38e22a97d86a4b648a22597628d5b42e4)
+---
+ xen/arch/x86/include/asm/mm.h | 4 +++-
+ xen/arch/x86/mm.c | 6 ++++--
+ 2 files changed, 7 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
+index a5d7fdd32e..5845b729c3 100644
+--- a/xen/arch/x86/include/asm/mm.h
++++ b/xen/arch/x86/include/asm/mm.h
+@@ -393,7 +393,9 @@ const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
+ * The use of PGT_locked in mem_sharing does not collide, since mem_sharing is
+ * only supported for hvm guests, which do not have PV PTEs updated.
+ */
+-int page_lock(struct page_info *page);
++int page_lock_unsafe(struct page_info *page);
++#define page_lock(pg) lock_evaluate_nospec(page_lock_unsafe(pg))
++
+ void page_unlock(struct page_info *page);
+
+ void put_page_type(struct page_info *page);
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index 330c4abcd1..8d19d719bd 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -2033,7 +2033,7 @@ static inline bool current_locked_page_ne_check(struct page_info *page) {
+ #define current_locked_page_ne_check(x) true
+ #endif
+
+-int page_lock(struct page_info *page)
++int page_lock_unsafe(struct page_info *page)
+ {
+ unsigned long x, nx;
+
+@@ -2094,7 +2094,7 @@ void page_unlock(struct page_info *page)
+ * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
+ * reverse order.
+ */
+-static void l3t_lock(struct page_info *page)
++static always_inline void l3t_lock(struct page_info *page)
+ {
+ unsigned long x, nx;
+
+@@ -2103,6 +2103,8 @@ static void l3t_lock(struct page_info *page)
+ cpu_relax();
+ nx = x | PGT_locked;
+ } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
++
++ block_lock_speculation();
+ }
+
+ static void l3t_unlock(struct page_info *page)
+--
+2.44.0
+
diff --git a/0053-x86-protect-conditional-lock-taking-from-speculative.patch b/0053-x86-protect-conditional-lock-taking-from-speculative.patch
new file mode 100644
index 0000000..f0caa24
--- /dev/null
+++ b/0053-x86-protect-conditional-lock-taking-from-speculative.patch
@@ -0,0 +1,216 @@
+From 0ebd2e49bcd0f566ba6b9158555942aab8e41332 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 4 Mar 2024 16:24:21 +0100
+Subject: [PATCH 53/67] x86: protect conditional lock taking from speculative
+ execution
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Conditionally taken locks that use the pattern:
+
+if ( lock )
+ spin_lock(...);
+
+Need an else branch in order to issue a speculation barrier in the else case,
+just like it's done in case the lock needs to be acquired.
+
+eval_nospec() could be used on the condition itself, but that would result in a
+double barrier on the branch where the lock is taken.
+
+Introduce a new pair of helpers, {gfn,spin}_lock_if() that can be used to
+conditionally take a lock in a speculation safe way.
+
+This is part of XSA-453 / CVE-2024-2193
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+(cherry picked from commit 03cf7ca23e0e876075954c558485b267b7d02406)
+---
+ xen/arch/x86/mm.c | 35 +++++++++++++----------------------
+ xen/arch/x86/mm/mm-locks.h | 9 +++++++++
+ xen/arch/x86/mm/p2m.c | 5 ++---
+ xen/include/xen/spinlock.h | 8 ++++++++
+ 4 files changed, 32 insertions(+), 25 deletions(-)
+
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index 8d19d719bd..d31b8d56ff 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -5023,8 +5023,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
+ if ( !l3t )
+ return NULL;
+ UNMAP_DOMAIN_PAGE(l3t);
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+ if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
+ {
+ l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
+@@ -5061,8 +5060,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
+ return NULL;
+ }
+ UNMAP_DOMAIN_PAGE(l2t);
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+ if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
+ {
+ l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
+@@ -5100,8 +5098,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
+ return NULL;
+ }
+ UNMAP_DOMAIN_PAGE(l1t);
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+ if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
+ {
+ l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
+@@ -5132,6 +5129,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
+ do { \
+ if ( locking ) \
+ l3t_lock(page); \
++ else \
++ block_lock_speculation(); \
+ } while ( false )
+
+ #define L3T_UNLOCK(page) \
+@@ -5347,8 +5346,7 @@ int map_pages_to_xen(
+ if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
+ flush_flags |= FLUSH_TLB_GLOBAL;
+
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+ if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
+ (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+ {
+@@ -5452,8 +5450,7 @@ int map_pages_to_xen(
+ if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
+ flush_flags |= FLUSH_TLB_GLOBAL;
+
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+ if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
+ (l2e_get_flags(*pl2e) & _PAGE_PSE) )
+ {
+@@ -5494,8 +5491,7 @@ int map_pages_to_xen(
+ unsigned long base_mfn;
+ const l1_pgentry_t *l1t;
+
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+
+ ol2e = *pl2e;
+ /*
+@@ -5549,8 +5545,7 @@ int map_pages_to_xen(
+ unsigned long base_mfn;
+ const l2_pgentry_t *l2t;
+
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+
+ ol3e = *pl3e;
+ /*
+@@ -5694,8 +5689,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
+ l3e_get_flags(*pl3e)));
+ UNMAP_DOMAIN_PAGE(l2t);
+
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+ if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
+ (l3e_get_flags(*pl3e) & _PAGE_PSE) )
+ {
+@@ -5754,8 +5748,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
+ l2e_get_flags(*pl2e) & ~_PAGE_PSE));
+ UNMAP_DOMAIN_PAGE(l1t);
+
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+ if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
+ (l2e_get_flags(*pl2e) & _PAGE_PSE) )
+ {
+@@ -5799,8 +5792,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
+ */
+ if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
+ continue;
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+
+ /*
+ * L2E may be already cleared, or set to a superpage, by
+@@ -5847,8 +5839,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
+ if ( (nf & _PAGE_PRESENT) ||
+ ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
+ continue;
+- if ( locking )
+- spin_lock(&map_pgdir_lock);
++ spin_lock_if(locking, &map_pgdir_lock);
+
+ /*
+ * L3E may be already cleared, or set to a superpage, by
+diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
+index 265239c49f..3ea2d8eb03 100644
+--- a/xen/arch/x86/mm/mm-locks.h
++++ b/xen/arch/x86/mm/mm-locks.h
+@@ -347,6 +347,15 @@ static inline void p2m_unlock(struct p2m_domain *p)
+ #define p2m_locked_by_me(p) mm_write_locked_by_me(&(p)->lock)
+ #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
+
++static always_inline void gfn_lock_if(bool condition, struct p2m_domain *p2m,
++ gfn_t gfn, unsigned int order)
++{
++ if ( condition )
++ gfn_lock(p2m, gfn, order);
++ else
++ block_lock_speculation();
++}
++
+ /* PoD lock (per-p2m-table)
+ *
+ * Protects private PoD data structs: entry and cache
+diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
+index b28c899b5e..1fa9e01012 100644
+--- a/xen/arch/x86/mm/p2m.c
++++ b/xen/arch/x86/mm/p2m.c
+@@ -292,9 +292,8 @@ mfn_t p2m_get_gfn_type_access(struct p2m_domain *p2m, gfn_t gfn,
+ if ( q & P2M_UNSHARE )
+ q |= P2M_ALLOC;
+
+- if ( locked )
+- /* Grab the lock here, don't release until put_gfn */
+- gfn_lock(p2m, gfn, 0);
++ /* Grab the lock here, don't release until put_gfn */
++ gfn_lock_if(locked, p2m, gfn, 0);
+
+ mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
+
+diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
+index daf48fdea7..7e75d0e2e7 100644
+--- a/xen/include/xen/spinlock.h
++++ b/xen/include/xen/spinlock.h
+@@ -216,6 +216,14 @@ static always_inline void spin_lock_irq(spinlock_t *l)
+ block_lock_speculation(); \
+ })
+
++/* Conditionally take a spinlock in a speculation safe way. */
++static always_inline void spin_lock_if(bool condition, spinlock_t *l)
++{
++ if ( condition )
++ _spin_lock(l);
++ block_lock_speculation();
++}
++
+ #define spin_unlock(l) _spin_unlock(l)
+ #define spin_unlock_irq(l) _spin_unlock_irq(l)
+ #define spin_unlock_irqrestore(l, f) _spin_unlock_irqrestore(l, f)
+--
+2.44.0
+
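To see the shape of the helper in isolation: a hedged standalone sketch (x86 and GCC assumed for the LFENCE; pthreads and the names are illustrative, not Xen's spin_lock_if()) of conditionally taking a lock so that both branches end in exactly one speculation barrier:

#include <pthread.h>
#include <stdbool.h>

static inline __attribute__((always_inline)) void lock_barrier(void)
{
    __asm__ volatile ( "lfence" ::: "memory" );
}

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

static inline __attribute__((always_inline)) void
lock_if(bool condition, pthread_mutex_t *l)
{
    if ( condition )
        pthread_mutex_lock(l);  /* barrier-free primitive */
    lock_barrier();             /* one barrier covers both branches */
}

static void update_mapping(bool locking)
{
    lock_if(locking, &map_lock);
    /* ... modify the mapping ... */
    if ( locking )
        pthread_mutex_unlock(&map_lock);
}

int main(void)
{
    update_mapping(true);
    update_mapping(false);
    return 0;
}
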
diff --git a/0054-tools-ipxe-update-for-fixing-build-with-GCC12.patch b/0054-tools-ipxe-update-for-fixing-build-with-GCC12.patch
new file mode 100644
index 0000000..90efaf8
--- /dev/null
+++ b/0054-tools-ipxe-update-for-fixing-build-with-GCC12.patch
@@ -0,0 +1,33 @@
+From a01c0b0f9691a8350e74938329892f949669119e Mon Sep 17 00:00:00 2001
+From: Olaf Hering <olaf@aepfle.de>
+Date: Wed, 27 Mar 2024 12:27:03 +0100
+Subject: [PATCH 54/67] tools: ipxe: update for fixing build with GCC12
+
+Use a snapshot which includes commit
+b0ded89e917b48b73097d3b8b88dfa3afb264ed0 ("[build] Disable dangling
+pointer checking for GCC"), which fixes build with gcc12.
+
+Signed-off-by: Olaf Hering <olaf@aepfle.de>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 18a36b4a9b088875486cfe33a2d4a8ae7eb4ab47
+master date: 2023-04-25 23:47:45 +0100
+---
+ tools/firmware/etherboot/Makefile | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/firmware/etherboot/Makefile b/tools/firmware/etherboot/Makefile
+index 4bc3633ba3..7a56fe8014 100644
+--- a/tools/firmware/etherboot/Makefile
++++ b/tools/firmware/etherboot/Makefile
+@@ -11,7 +11,7 @@ IPXE_GIT_URL ?= git://git.ipxe.org/ipxe.git
+ endif
+
+ # put an updated tar.gz on xenbits after changes to this variable
+-IPXE_GIT_TAG := 3c040ad387099483102708bb1839110bc788cefb
++IPXE_GIT_TAG := 1d1cf74a5e58811822bee4b3da3cff7282fcdfca
+
+ IPXE_TARBALL_URL ?= $(XEN_EXTFILES_URL)/ipxe-git-$(IPXE_GIT_TAG).tar.gz
+
+--
+2.44.0
+
diff --git a/0055-x86-mm-use-block_lock_speculation-in-_mm_write_lock.patch b/0055-x86-mm-use-block_lock_speculation-in-_mm_write_lock.patch
new file mode 100644
index 0000000..719234c
--- /dev/null
+++ b/0055-x86-mm-use-block_lock_speculation-in-_mm_write_lock.patch
@@ -0,0 +1,35 @@
+From a153b8b42e9027ba3057bc7c8bf55e4d71e86ec3 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Wed, 27 Mar 2024 12:28:24 +0100
+Subject: [PATCH 55/67] x86/mm: use block_lock_speculation() in
+ _mm_write_lock()
+
+I can only guess that using block_speculation() there was a leftover
+from, earlier on, SPECULATIVE_HARDEN_LOCK depending on
+SPECULATIVE_HARDEN_BRANCH.
+
+Fixes: 197ecd838a2a ("locking: attempt to ensure lock wrappers are always inline")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 62018f08708a5ff6ef8fc8ff2aaaac46e5a60430
+master date: 2024-03-18 13:53:37 +0100
+---
+ xen/arch/x86/mm/mm-locks.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
+index 3ea2d8eb03..7d6e4d2a7c 100644
+--- a/xen/arch/x86/mm/mm-locks.h
++++ b/xen/arch/x86/mm/mm-locks.h
+@@ -150,7 +150,7 @@ static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+ _set_lock_level(_lock_level(d, level));
+ }
+ else
+- block_speculation();
++ block_lock_speculation();
+ l->recurse_count++;
+ }
+
+--
+2.44.0
+
diff --git a/0056-x86-boot-Fix-setup_apic_nmi_watchdog-to-fail-more-cl.patch b/0056-x86-boot-Fix-setup_apic_nmi_watchdog-to-fail-more-cl.patch
new file mode 100644
index 0000000..5d549c1
--- /dev/null
+++ b/0056-x86-boot-Fix-setup_apic_nmi_watchdog-to-fail-more-cl.patch
@@ -0,0 +1,120 @@
+From 471b53c6a092940f3629990d9ca946aa22bd8535 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 27 Mar 2024 12:29:11 +0100
+Subject: [PATCH 56/67] x86/boot: Fix setup_apic_nmi_watchdog() to fail more
+ cleanly
+
+Right now, if the user requests the watchdog on the command line,
+setup_apic_nmi_watchdog() will blindly assume that setting up the watchdog
+worked. Reuse nmi_perfctr_msr to identify when the watchdog has been
+configured.
+
+Rearrange setup_p6_watchdog() to not set nmi_perfctr_msr until the sanity
+checks are complete. Turn setup_p4_watchdog() into a void function, matching
+the others.
+
+If the watchdog isn't set up, inform the user and override to NMI_NONE, which
+will prevent check_nmi_watchdog() from claiming that all CPUs are stuck.
+
+e.g.:
+
+ (XEN) alt table ffff82d040697c38 -> ffff82d0406a97f0
+ (XEN) Failed to configure NMI watchdog
+ (XEN) Brought up 512 CPUs
+ (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: f658321374687c7339235e1ac643e0427acff717
+master date: 2024-03-19 18:29:37 +0000
+---
+ xen/arch/x86/nmi.c | 25 ++++++++++++-------------
+ 1 file changed, 12 insertions(+), 13 deletions(-)
+
+diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
+index 7656023748..7c9591b65e 100644
+--- a/xen/arch/x86/nmi.c
++++ b/xen/arch/x86/nmi.c
+@@ -323,8 +323,6 @@ static void setup_p6_watchdog(unsigned counter)
+ {
+ unsigned int evntsel;
+
+- nmi_perfctr_msr = MSR_P6_PERFCTR(0);
+-
+ if ( !nmi_p6_event_width && current_cpu_data.cpuid_level >= 0xa )
+ nmi_p6_event_width = MASK_EXTR(cpuid_eax(0xa), P6_EVENT_WIDTH_MASK);
+ if ( !nmi_p6_event_width )
+@@ -334,6 +332,8 @@ static void setup_p6_watchdog(unsigned counter)
+ nmi_p6_event_width > BITS_PER_LONG )
+ return;
+
++ nmi_perfctr_msr = MSR_P6_PERFCTR(0);
++
+ clear_msr_range(MSR_P6_EVNTSEL(0), 2);
+ clear_msr_range(MSR_P6_PERFCTR(0), 2);
+
+@@ -349,13 +349,13 @@ static void setup_p6_watchdog(unsigned counter)
+ wrmsr(MSR_P6_EVNTSEL(0), evntsel, 0);
+ }
+
+-static int setup_p4_watchdog(void)
++static void setup_p4_watchdog(void)
+ {
+ uint64_t misc_enable;
+
+ rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+ if (!(misc_enable & MSR_IA32_MISC_ENABLE_PERF_AVAIL))
+- return 0;
++ return;
+
+ nmi_perfctr_msr = MSR_P4_IQ_PERFCTR0;
+ nmi_p4_cccr_val = P4_NMI_IQ_CCCR0;
+@@ -378,13 +378,12 @@ static int setup_p4_watchdog(void)
+ clear_msr_range(0x3E0, 2);
+ clear_msr_range(MSR_P4_BPU_CCCR0, 18);
+ clear_msr_range(MSR_P4_BPU_PERFCTR0, 18);
+-
++
+ wrmsrl(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0);
+ wrmsrl(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE);
+ write_watchdog_counter("P4_IQ_COUNTER0");
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
+ wrmsrl(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val);
+- return 1;
+ }
+
+ void setup_apic_nmi_watchdog(void)
+@@ -399,8 +398,6 @@ void setup_apic_nmi_watchdog(void)
+ case 0xf ... 0x19:
+ setup_k7_watchdog();
+ break;
+- default:
+- return;
+ }
+ break;
+ case X86_VENDOR_INTEL:
+@@ -411,14 +408,16 @@ void setup_apic_nmi_watchdog(void)
+ : CORE_EVENT_CPU_CLOCKS_NOT_HALTED);
+ break;
+ case 15:
+- if (!setup_p4_watchdog())
+- return;
++ setup_p4_watchdog();
+ break;
+- default:
+- return;
+ }
+ break;
+- default:
++ }
++
++ if ( nmi_perfctr_msr == 0 )
++ {
++ printk(XENLOG_WARNING "Failed to configure NMI watchdog\n");
++ nmi_watchdog = NMI_NONE;
+ return;
+ }
+
+--
+2.44.0
+
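The sentinel pattern used in the patch above - leave nmi_perfctr_msr at zero until every sanity check has passed, and have the caller treat zero as "not configured" - is easy to lift out of context. A minimal standalone C sketch with invented names (watchdog_msr, hw_counter_usable), not the real Xen code:

    #include <stdbool.h>
    #include <stdio.h>

    static unsigned int watchdog_msr;    /* 0 == sentinel: not configured */

    /* Hypothetical probe; stands in for the CPUID/MSR sanity checks. */
    static bool hw_counter_usable(void)
    {
        return false;                    /* pretend the checks failed */
    }

    static void setup_watchdog(void)
    {
        if ( !hw_counter_usable() )
            return;                      /* bail before touching the sentinel */

        watchdog_msr = 0x186;            /* only now claim success */
        /* ... program the performance counter here ... */
    }

    int main(void)
    {
        setup_watchdog();

        if ( watchdog_msr == 0 )
        {
            printf("Failed to configure NMI watchdog\n");
            /* disable the feature rather than pretending it works */
        }
        return 0;
    }

The caller makes the failure visible and downgrades the feature instead of letting later code assume the watchdog is running.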
diff --git a/0057-x86-PoD-tie-together-P2M-update-and-increment-of-ent.patch b/0057-x86-PoD-tie-together-P2M-update-and-increment-of-ent.patch
new file mode 100644
index 0000000..dedc1c2
--- /dev/null
+++ b/0057-x86-PoD-tie-together-P2M-update-and-increment-of-ent.patch
@@ -0,0 +1,61 @@
+From bfb69205376d94ff91b09a337c47fb665ee12da3 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Wed, 27 Mar 2024 12:29:33 +0100
+Subject: [PATCH 57/67] x86/PoD: tie together P2M update and increment of entry
+ count
+
+When the PoD lock is not held across the entire region covering both the
+P2M update and the stats update, the entry count - if it is to be wrong at
+all - should err towards too large a value rather than too small, so that
+functions don't bail early upon finding the count to be zero. However,
+rather than moving the increment ahead (and adjusting it back on failure),
+extend the PoD-locked region.
+
+Fixes: 99af3cd40b6e ("x86/mm: Rework locking in the PoD layer")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: George Dunlap <george.dunlap@cloud.com>
+master commit: cc950c49ae6a6690f7fc3041a1f43122c250d250
+master date: 2024-03-21 09:48:10 +0100
+---
+ xen/arch/x86/mm/p2m-pod.c | 15 ++++++++++++---
+ 1 file changed, 12 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
+index 99dbcb3101..e903db9d93 100644
+--- a/xen/arch/x86/mm/p2m-pod.c
++++ b/xen/arch/x86/mm/p2m-pod.c
+@@ -1370,19 +1370,28 @@ mark_populate_on_demand(struct domain *d, unsigned long gfn_l,
+ }
+ }
+
++ /*
++ * P2M update and stats increment need to collectively be under PoD lock,
++ * to prevent code elsewhere observing PoD entry count being zero despite
++ * there actually still being PoD entries (created by the p2m_set_entry()
++ * invocation below).
++ */
++ pod_lock(p2m);
++
+ /* Now, actually do the two-way mapping */
+ rc = p2m_set_entry(p2m, gfn, INVALID_MFN, order,
+ p2m_populate_on_demand, p2m->default_access);
+ if ( rc == 0 )
+ {
+- pod_lock(p2m);
+ p2m->pod.entry_count += 1UL << order;
+ p2m->pod.entry_count -= pod_count;
+ BUG_ON(p2m->pod.entry_count < 0);
+- pod_unlock(p2m);
++ }
++
++ pod_unlock(p2m);
+
++ if ( rc == 0 )
+ ioreq_request_mapcache_invalidate(d);
+- }
+ else if ( order )
+ {
+ /*
+--
+2.44.0
+
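The underlying rule in the patch above is that a data-structure update and the statistic other code reads alongside it must be published under one critical section; otherwise a reader can observe a zero count while entries already exist. A rough pthreads sketch of the extended-lock shape (invented names, nothing Xen-specific):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t pod_lock = PTHREAD_MUTEX_INITIALIZER;
    static long entry_count;        /* readers treat 0 as "nothing to do" */

    /* Stand-in for p2m_set_entry(); may fail. */
    static int set_entries(unsigned long gfn, long nr)
    {
        (void)gfn;
        (void)nr;
        return 0;
    }

    static int mark_entries(unsigned long gfn, long nr)
    {
        int rc;

        /*
         * Take the lock before the structure update, not just around the
         * counter bump, so no reader can see entry_count == 0 while the
         * entries are already present.
         */
        pthread_mutex_lock(&pod_lock);

        rc = set_entries(gfn, nr);
        if ( rc == 0 )
            entry_count += nr;

        pthread_mutex_unlock(&pod_lock);

        return rc;
    }

    int main(void)
    {
        if ( mark_entries(0x1000, 4) == 0 )
            printf("entry_count = %ld\n", entry_count);
        return 0;
    }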
diff --git a/0058-tools-oxenstored-Use-Map-instead-of-Hashtbl-for-quot.patch b/0058-tools-oxenstored-Use-Map-instead-of-Hashtbl-for-quot.patch
new file mode 100644
index 0000000..dfc7f5a
--- /dev/null
+++ b/0058-tools-oxenstored-Use-Map-instead-of-Hashtbl-for-quot.patch
@@ -0,0 +1,143 @@
+From 7abd305607938b846da1a37dd1bda7bf7d47dba5 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
+Date: Wed, 31 Jan 2024 10:52:55 +0000
+Subject: [PATCH 58/67] tools/oxenstored: Use Map instead of Hashtbl for quotas
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+On a stress test running 1000 VMs, flamegraphs have shown that
+`oxenstored` spends a large amount of time in `Hashtbl.copy` and the GC.
+
+Hashtable complexity:
+ * read/write: O(1) average
+ * copy: O(domains) -- copying the entire table
+
+Map complexity:
+ * read/write: O(log n) worst case
+ * copy: O(1) -- a word copy
+
+We always perform at least one 'copy' when processing each xenstore
+packet (regardless of whether it is a read-only operation or inside a
+transaction), so the actual complexity per packet is:
+ * Hashtbl: O(domains)
+ * Map: O(log domains)
+
+Maps are the clear winner, and a better fit for the immutable xenstore
+tree.
+
+Signed-off-by: Edwin Török <edwin.torok@cloud.com>
+Acked-by: Christian Lindig <christian.lindig@cloud.com>
+(cherry picked from commit b6cf604207fd0a04451a48f2ce6d05fb66c612ab)
+---
+ tools/ocaml/xenstored/quota.ml | 65 ++++++++++++++++++----------------
+ 1 file changed, 34 insertions(+), 31 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/quota.ml b/tools/ocaml/xenstored/quota.ml
+index 6e3d6401ae..ee8dd22581 100644
+--- a/tools/ocaml/xenstored/quota.ml
++++ b/tools/ocaml/xenstored/quota.ml
+@@ -23,66 +23,69 @@ let activate = ref true
+ let maxent = ref (1000)
+ let maxsize = ref (2048)
+
++module Domid = struct
++ type t = Xenctrl.domid
++ let compare (a:t) (b:t) = compare a b
++end
++
++module DomidMap = Map.Make(Domid)
++
+ type t = {
+ maxent: int; (* max entities per domU *)
+ maxsize: int; (* max size of data store in one node *)
+- cur: (Xenctrl.domid, int) Hashtbl.t; (* current domains quota *)
++ mutable cur: int DomidMap.t; (* current domains quota *)
+ }
+
+ let to_string quota domid =
+- if Hashtbl.mem quota.cur domid
+- then Printf.sprintf "dom%i quota: %i/%i" domid (Hashtbl.find quota.cur domid) quota.maxent
+- else Printf.sprintf "dom%i quota: not set" domid
++ try
++ Printf.sprintf "dom%i quota: %i/%i" domid (DomidMap.find domid quota.cur) quota.maxent
++ with Not_found ->
++ Printf.sprintf "dom%i quota: not set" domid
+
+ let create () =
+- { maxent = !maxent; maxsize = !maxsize; cur = Hashtbl.create 100; }
++ { maxent = !maxent; maxsize = !maxsize; cur = DomidMap.empty; }
+
+-let copy quota = { quota with cur = (Hashtbl.copy quota.cur) }
++let copy quota = { quota with cur = quota.cur }
+
+-let del quota id = Hashtbl.remove quota.cur id
++let del quota id = { quota with cur = DomidMap.remove id quota.cur }
+
+ let _check quota id size =
+ if size > quota.maxsize then (
+ warn "domain %u err create entry: data too big %d" id size;
+ raise Data_too_big
+ );
+- if id > 0 && Hashtbl.mem quota.cur id then
+- let entry = Hashtbl.find quota.cur id in
++ if id > 0 then
++ try
++ let entry = DomidMap.find id quota.cur in
+ if entry >= quota.maxent then (
+ warn "domain %u cannot create entry: quota reached" id;
+ raise Limit_reached
+ )
++ with Not_found -> ()
+
+ let check quota id size =
+ if !activate then
+ _check quota id size
+
+-let get_entry quota id = Hashtbl.find quota.cur id
++let find_or_zero quota_cur id =
++ try DomidMap.find id quota_cur with Not_found -> 0
+
+-let set_entry quota id nb =
+- if nb = 0
+- then Hashtbl.remove quota.cur id
+- else begin
+- if Hashtbl.mem quota.cur id then
+- Hashtbl.replace quota.cur id nb
+- else
+- Hashtbl.add quota.cur id nb
+- end
++let update_entry quota_cur id diff =
++ let nb = diff + find_or_zero quota_cur id in
++ if nb = 0 then DomidMap.remove id quota_cur
++ else DomidMap.add id nb quota_cur
+
+ let del_entry quota id =
+- try
+- let nb = get_entry quota id in
+- set_entry quota id (nb - 1)
+- with Not_found -> ()
++ quota.cur <- update_entry quota.cur id (-1)
+
+ let add_entry quota id =
+- let nb = try get_entry quota id with Not_found -> 0 in
+- set_entry quota id (nb + 1)
+-
+-let add quota diff =
+- Hashtbl.iter (fun id nb -> set_entry quota id (get_entry quota id + nb)) diff.cur
++ quota.cur <- update_entry quota.cur id (+1)
+
+ let merge orig_quota mod_quota dest_quota =
+- Hashtbl.iter (fun id nb -> let diff = nb - (try get_entry orig_quota id with Not_found -> 0) in
+- if diff <> 0 then
+- set_entry dest_quota id ((try get_entry dest_quota id with Not_found -> 0) + diff)) mod_quota.cur
++ let fold_merge id nb dest =
++ match nb - find_or_zero orig_quota.cur id with
++ | 0 -> dest (* not modified *)
++ | diff -> update_entry dest id diff (* update with [x=x+diff] *)
++ in
++ dest_quota.cur <- DomidMap.fold fold_merge mod_quota.cur dest_quota.cur
++ (* dest_quota = dest_quota + (mod_quota - orig_quota) *)
+--
+2.44.0
+
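The complexity argument in the commit message above can also be made concrete in C (rather than OCaml) terms: a mutable hash table has to be deep-copied element by element, while an immutable structure can be "copied" by sharing its root. A deliberately simplified sketch - it omits the structural sharing a real persistent map also performs on updates:

    #include <stdlib.h>
    #include <string.h>

    /* Mutable table: a copy must duplicate all n slots -- O(n). */
    struct table {
        size_t n;
        int *slots;
    };

    static struct table copy_table(const struct table *t)
    {
        struct table c = { t->n, malloc(t->n * sizeof(*c.slots)) };

        if ( c.slots )
            memcpy(c.slots, t->slots, t->n * sizeof(*c.slots));
        return c;
    }

    /* Immutable tree: nodes are never modified after creation... */
    struct node {
        int key, value;
        const struct node *left, *right;
    };

    struct snapshot {
        const struct node *root;
    };

    /* ...so "copying" a snapshot is a single word copy -- O(1). */
    static struct snapshot copy_snapshot(struct snapshot s)
    {
        return s;
    }

    int main(void)
    {
        int data[4] = { 1, 2, 3, 4 };
        struct table t = { 4, data };
        struct table tc = copy_table(&t);       /* walks every entry */

        struct node leaf = { 1, 10, NULL, NULL };
        struct snapshot s = { &leaf };
        struct snapshot sc = copy_snapshot(s);  /* copies one pointer */

        (void)sc;
        free(tc.slots);
        return 0;
    }

Since oxenstored copies the quota table for every packet, shrinking that copy from O(domains) to O(1) is where the win comes from.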
diff --git a/0059-tools-oxenstored-Make-Quota.t-pure.patch b/0059-tools-oxenstored-Make-Quota.t-pure.patch
new file mode 100644
index 0000000..7616b90
--- /dev/null
+++ b/0059-tools-oxenstored-Make-Quota.t-pure.patch
@@ -0,0 +1,121 @@
+From f38a815a54000ca51ff5165b2863d60b6bbea49c Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
+Date: Wed, 31 Jan 2024 10:52:56 +0000
+Subject: [PATCH 59/67] tools/oxenstored: Make Quota.t pure
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Now that we no longer have a hashtable inside we can make Quota.t pure, and
+push the mutable update to its callers. Store.t already had a mutable Quota.t
+field.
+
+No functional change.
+
+Signed-off-by: Edwin Török <edwin.torok@cloud.com>
+Acked-by: Christian Lindig <christian.lindig@cloud.com>
+(cherry picked from commit 098d868e52ac0165b7f36e22b767ea70cef70054)
+---
+ tools/ocaml/xenstored/quota.ml | 8 ++++----
+ tools/ocaml/xenstored/store.ml | 17 ++++++++++-------
+ 2 files changed, 14 insertions(+), 11 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/quota.ml b/tools/ocaml/xenstored/quota.ml
+index ee8dd22581..b3ab678c72 100644
+--- a/tools/ocaml/xenstored/quota.ml
++++ b/tools/ocaml/xenstored/quota.ml
+@@ -33,7 +33,7 @@ module DomidMap = Map.Make(Domid)
+ type t = {
+ maxent: int; (* max entities per domU *)
+ maxsize: int; (* max size of data store in one node *)
+- mutable cur: int DomidMap.t; (* current domains quota *)
++ cur: int DomidMap.t; (* current domains quota *)
+ }
+
+ let to_string quota domid =
+@@ -76,10 +76,10 @@ let update_entry quota_cur id diff =
+ else DomidMap.add id nb quota_cur
+
+ let del_entry quota id =
+- quota.cur <- update_entry quota.cur id (-1)
++ {quota with cur = update_entry quota.cur id (-1)}
+
+ let add_entry quota id =
+- quota.cur <- update_entry quota.cur id (+1)
++ {quota with cur = update_entry quota.cur id (+1)}
+
+ let merge orig_quota mod_quota dest_quota =
+ let fold_merge id nb dest =
+@@ -87,5 +87,5 @@ let merge orig_quota mod_quota dest_quota =
+ | 0 -> dest (* not modified *)
+ | diff -> update_entry dest id diff (* update with [x=x+diff] *)
+ in
+- dest_quota.cur <- DomidMap.fold fold_merge mod_quota.cur dest_quota.cur
++ {dest_quota with cur = DomidMap.fold fold_merge mod_quota.cur dest_quota.cur}
+ (* dest_quota = dest_quota + (mod_quota - orig_quota) *)
+diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
+index c94dbf3a62..5dd965db15 100644
+--- a/tools/ocaml/xenstored/store.ml
++++ b/tools/ocaml/xenstored/store.ml
+@@ -85,7 +85,9 @@ let check_owner node connection =
+ raise Define.Permission_denied;
+ end
+
+-let rec recurse fct node = fct node; SymbolMap.iter (fun _ -> recurse fct) node.children
++let rec recurse fct node acc =
++ let acc = fct node acc in
++ SymbolMap.fold (fun _ -> recurse fct) node.children acc
+
+ (** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
+ possibly removing some nodes.
+@@ -408,7 +410,7 @@ let dump_buffer store = dump_store_buf store.root
+ let set_node store path node orig_quota mod_quota =
+ let root = Path.set_node store.root path node in
+ store.root <- root;
+- Quota.merge orig_quota mod_quota store.quota
++ store.quota <- Quota.merge orig_quota mod_quota store.quota
+
+ let write store perm path value =
+ let node, existing = get_deepest_existing_node store path in
+@@ -422,7 +424,7 @@ let write store perm path value =
+ let root, node_created = path_write store perm path value in
+ store.root <- root;
+ if node_created
+- then Quota.add_entry store.quota owner
++ then store.quota <- Quota.add_entry store.quota owner
+
+ let mkdir store perm path =
+ let node, existing = get_deepest_existing_node store path in
+@@ -431,7 +433,7 @@ let mkdir store perm path =
+ if not (existing || (Perms.Connection.is_dom0 perm)) then Quota.check store.quota owner 0;
+ store.root <- path_mkdir store perm path;
+ if not existing then
+- Quota.add_entry store.quota owner
++ store.quota <- Quota.add_entry store.quota owner
+
+ let rm store perm path =
+ let rmed_node = Path.get_node store.root path in
+@@ -439,7 +441,7 @@ let rm store perm path =
+ | None -> raise Define.Doesnt_exist
+ | Some rmed_node ->
+ store.root <- path_rm store perm path;
+- Node.recurse (fun node -> Quota.del_entry store.quota (Node.get_owner node)) rmed_node
++ store.quota <- Node.recurse (fun node quota -> Quota.del_entry quota (Node.get_owner node)) rmed_node store.quota
+
+ let setperms store perm path nperms =
+ match Path.get_node store.root path with
+@@ -450,8 +452,9 @@ let setperms store perm path nperms =
+ if not ((old_owner = new_owner) || (Perms.Connection.is_dom0 perm)) then
+ raise Define.Permission_denied;
+ store.root <- path_setperms store perm path nperms;
+- Quota.del_entry store.quota old_owner;
+- Quota.add_entry store.quota new_owner
++ store.quota <-
++ let quota = Quota.del_entry store.quota old_owner in
++ Quota.add_entry quota new_owner
+
+ let reset_permissions store domid =
+ Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
+--
+2.44.0
+
diff --git a/0060-x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch b/0060-x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch
new file mode 100644
index 0000000..ce2b89d
--- /dev/null
+++ b/0060-x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch
@@ -0,0 +1,90 @@
+From bb27e11c56963e170d1f6d2fbddbc956f7164121 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:17:25 +0200
+Subject: [PATCH 60/67] x86/cpu-policy: Hide x2APIC from PV guests
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
+access any of the x2APIC MSR range. Therefore they mustn't see the x2APIC
+CPUID bit saying that they can.
+
+Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
+generally see x2APIC except on Zen1-and-older AMD systems.
+
+Linux works around this by explicitly hiding the bit itself, and filtering
+EXTD out of MSR_APIC_BASE reads. NetBSD behaves more in the spirit of PV
+guests, and entirely ignores the APIC when built as a PV guest.
+
+Change the annotation from !A to !S. This has a consequence of stripping it
+out of both PV featuremasks. However, as existing guests may have seen the
+bit, set it back into the PV Max policy; a VM which saw the bit and is alive
+enough to migrate will have ignored it one way or another.
+
+Hiding x2APIC does change the contents of leaf 0xb, but as the information is
+nonsense to begin with, this is likely an improvement on the status quo.
+
+Xen's blind assumption that APIC_ID = vCPU_ID * 2 isn't interlinked with the
+host's topology structure, where a PV guest may see real host values, and the
+APIC_IDs are useless without an MADT to start with. Dom0 is the only PV VM to
+get an MADT but it's the host one, meaning the two sets of APIC_IDs are from
+different address spaces.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 5420aa165dfa5fe95dd84bb71cb96c15459935b1
+master date: 2024-03-01 20:14:19 +0000
+---
+ xen/arch/x86/cpu-policy.c | 11 +++++++++--
+ xen/include/public/arch-x86/cpufeatureset.h | 2 +-
+ 2 files changed, 10 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
+index 96c2cee1a8..ed64d56294 100644
+--- a/xen/arch/x86/cpu-policy.c
++++ b/xen/arch/x86/cpu-policy.c
+@@ -559,6 +559,14 @@ static void __init calculate_pv_max_policy(void)
+ for ( i = 0; i < ARRAY_SIZE(fs); ++i )
+ fs[i] &= pv_max_featuremask[i];
+
++ /*
++ * Xen at the time of writing (Feb 2024, 4.19 dev cycle) used to leak the
++ * host x2APIC capability into PV guests, but never supported the guest
++ * trying to turn x2APIC mode on. Tolerate an incoming VM which saw the
++ * x2APIC CPUID bit and is alive enough to migrate.
++ */
++ __set_bit(X86_FEATURE_X2APIC, fs);
++
+ /*
+ * If Xen isn't virtualising MSR_SPEC_CTRL for PV guests (functional
+ * availability, or admin choice), hide the feature.
+@@ -837,11 +845,10 @@ void recalculate_cpuid_policy(struct domain *d)
+ }
+
+ /*
+- * Allow the toolstack to set HTT, X2APIC and CMP_LEGACY. These bits
++ * Allow the toolstack to set HTT and CMP_LEGACY. These bits
+ * affect how to interpret topology information in other cpuid leaves.
+ */
+ __set_bit(X86_FEATURE_HTT, max_fs);
+- __set_bit(X86_FEATURE_X2APIC, max_fs);
+ __set_bit(X86_FEATURE_CMP_LEGACY, max_fs);
+
+ /*
+diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
+index 113e6cadc1..bc971f3c6f 100644
+--- a/xen/include/public/arch-x86/cpufeatureset.h
++++ b/xen/include/public/arch-x86/cpufeatureset.h
+@@ -123,7 +123,7 @@ XEN_CPUFEATURE(PCID, 1*32+17) /*H Process Context ID */
+ XEN_CPUFEATURE(DCA, 1*32+18) /* Direct Cache Access */
+ XEN_CPUFEATURE(SSE4_1, 1*32+19) /*A Streaming SIMD Extensions 4.1 */
+ XEN_CPUFEATURE(SSE4_2, 1*32+20) /*A Streaming SIMD Extensions 4.2 */
+-XEN_CPUFEATURE(X2APIC, 1*32+21) /*!A Extended xAPIC */
++XEN_CPUFEATURE(X2APIC, 1*32+21) /*!S Extended xAPIC */
+ XEN_CPUFEATURE(MOVBE, 1*32+22) /*A movbe instruction */
+ XEN_CPUFEATURE(POPCNT, 1*32+23) /*A POPCNT instruction */
+ XEN_CPUFEATURE(TSC_DEADLINE, 1*32+24) /*S TSC Deadline Timer */
+--
+2.44.0
+
diff --git a/0061-x86-cpu-policy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch b/0061-x86-cpu-policy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch
new file mode 100644
index 0000000..d1b8786
--- /dev/null
+++ b/0061-x86-cpu-policy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch
@@ -0,0 +1,85 @@
+From 70ad9c5fdeac4814050080c87e06d44292ecf868 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:18:05 +0200
+Subject: [PATCH 61/67] x86/cpu-policy: Fix visibility of HTT/CMP_LEGACY in max
+ policies
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The block in recalculate_cpuid_policy() predates the proper split between
+default and max policies, and was a "slightly max for a toolstack which knows
+about it" capability. It didn't get transformed properly in Xen 4.14.
+
+Because Xen will accept a VM with HTT/CMP_LEGACY seen, they should be visible
+in the max policies. Keep the default policy matching host settings.
+
+This manifested as an incorrectly-rejected migration across XenServer's Xen
+4.13 -> 4.17 upgrade, as Xapi is slowly growing the logic to check a VM
+against the target max policy.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: e2d8a652251660c3252d92b442e1a9c5d6e6a1e9
+master date: 2024-03-01 20:14:19 +0000
+---
+ xen/arch/x86/cpu-policy.c | 29 ++++++++++++++++++++++-------
+ 1 file changed, 22 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
+index ed64d56294..24acd12ce2 100644
+--- a/xen/arch/x86/cpu-policy.c
++++ b/xen/arch/x86/cpu-policy.c
+@@ -458,6 +458,16 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
+ raw_cpu_policy.feat.clwb )
+ __set_bit(X86_FEATURE_CLWB, fs);
+ }
++
++ /*
++ * Topology information inside the guest is entirely at the toolstack's
++ * discretion, and bears no relationship to the host we're running on.
++ *
++ * HTT identifies p->basic.lppp as valid
++ * CMP_LEGACY identifies p->extd.nc as valid
++ */
++ __set_bit(X86_FEATURE_HTT, fs);
++ __set_bit(X86_FEATURE_CMP_LEGACY, fs);
+ }
+
+ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
+@@ -512,6 +522,18 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
+ __clear_bit(X86_FEATURE_CLWB, fs);
+ }
+
++ /*
++ * Topology information is at the toolstack's discretion so these are
++ * unconditionally set in max, but pick a default which matches the host.
++ */
++ __clear_bit(X86_FEATURE_HTT, fs);
++ if ( cpu_has_htt )
++ __set_bit(X86_FEATURE_HTT, fs);
++
++ __clear_bit(X86_FEATURE_CMP_LEGACY, fs);
++ if ( cpu_has_cmp_legacy )
++ __set_bit(X86_FEATURE_CMP_LEGACY, fs);
++
+ /*
+ * On certain hardware, speculative or errata workarounds can result in
+ * TSX being placed in "force-abort" mode, where it doesn't actually
+@@ -844,13 +866,6 @@ void recalculate_cpuid_policy(struct domain *d)
+ }
+ }
+
+- /*
+- * Allow the toolstack to set HTT and CMP_LEGACY. These bits
+- * affect how to interpret topology information in other cpuid leaves.
+- */
+- __set_bit(X86_FEATURE_HTT, max_fs);
+- __set_bit(X86_FEATURE_CMP_LEGACY, max_fs);
+-
+ /*
+ * 32bit PV domains can't use any Long Mode features, and cannot use
+ * SYSCALL on non-AMD hardware.
+--
+2.44.0
+
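The distinction the two cpu-policy patches above rely on - bits set unconditionally in the max policy (what Xen will tolerate in an incoming VM) versus bits mirroring the host in the default policy (what a newly created guest sees) - is ultimately just bitmap arithmetic over the feature words. A toy sketch with invented feature constants, not the real cpufeatureset ABI:

    #include <stdint.h>
    #include <stdio.h>

    #define FEAT_HTT     (1U << 0)
    #define FEAT_X2APIC  (1U << 1)

    int main(void)
    {
        uint32_t host_fs = FEAT_X2APIC;   /* pretend: host lacks HTT */
        uint32_t max_fs, default_fs;

        /* Max policy: everything Xen is prepared to accept when migrating in. */
        max_fs  = host_fs;
        max_fs |= FEAT_HTT;               /* topology is the toolstack's call */
        max_fs |= FEAT_X2APIC;            /* legacy guests may have seen it   */

        /* Default policy: what new guests get; keep it matching the host. */
        default_fs  = host_fs;
        default_fs &= ~FEAT_X2APIC;       /* hidden from newly created PV guests */

        printf("max %#x, default %#x\n", max_fs, default_fs);
        return 0;
    }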
diff --git a/0062-xen-virtual-region-Rename-the-start-end-fields.patch b/0062-xen-virtual-region-Rename-the-start-end-fields.patch
new file mode 100644
index 0000000..9dbd5c9
--- /dev/null
+++ b/0062-xen-virtual-region-Rename-the-start-end-fields.patch
@@ -0,0 +1,140 @@
+From 2392e958ec6fd2e48e011781344cf94dee6d6142 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:18:51 +0200
+Subject: [PATCH 62/67] xen/virtual-region: Rename the start/end fields
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+... to text_{start,end}. We're about to introduce another start/end pair.
+
+Despite its name, struct virtual_region has always been a module-ish
+description. Call this out specifically.
+
+As minor cleanup, replace ROUNDUP(x, PAGE_SIZE) with the more concise
+PAGE_ALIGN() ahead of duplicating the example.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: 989556c6f8ca080f5f202417af97d1188b9ba52a
+master date: 2024-03-07 14:24:42 +0000
+---
+ xen/common/livepatch.c | 9 +++++----
+ xen/common/virtual_region.c | 19 ++++++++++---------
+ xen/include/xen/virtual_region.h | 11 +++++++++--
+ 3 files changed, 24 insertions(+), 15 deletions(-)
+
+diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
+index a5068a2217..29395f286f 100644
+--- a/xen/common/livepatch.c
++++ b/xen/common/livepatch.c
+@@ -785,8 +785,8 @@ static int prepare_payload(struct payload *payload,
+ region = &payload->region;
+
+ region->symbols_lookup = livepatch_symbols_lookup;
+- region->start = payload->text_addr;
+- region->end = payload->text_addr + payload->text_size;
++ region->text_start = payload->text_addr;
++ region->text_end = payload->text_addr + payload->text_size;
+
+ /* Optional sections. */
+ for ( i = 0; i < BUGFRAME_NR; i++ )
+@@ -823,8 +823,9 @@ static int prepare_payload(struct payload *payload,
+ const void *instr = ALT_ORIG_PTR(a);
+ const void *replacement = ALT_REPL_PTR(a);
+
+- if ( (instr < region->start && instr >= region->end) ||
+- (replacement < region->start && replacement >= region->end) )
++ if ( (instr < region->text_start && instr >= region->text_end) ||
++ (replacement < region->text_start &&
++ replacement >= region->text_end) )
+ {
+ printk(XENLOG_ERR LIVEPATCH "%s Alt patching outside payload: %p\n",
+ elf->name, instr);
+diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
+index 9f12c30efe..b22ffb75c4 100644
+--- a/xen/common/virtual_region.c
++++ b/xen/common/virtual_region.c
+@@ -11,15 +11,15 @@
+
+ static struct virtual_region core = {
+ .list = LIST_HEAD_INIT(core.list),
+- .start = _stext,
+- .end = _etext,
++ .text_start = _stext,
++ .text_end = _etext,
+ };
+
+ /* Becomes irrelevant when __init sections are cleared. */
+ static struct virtual_region core_init __initdata = {
+ .list = LIST_HEAD_INIT(core_init.list),
+- .start = _sinittext,
+- .end = _einittext,
++ .text_start = _sinittext,
++ .text_end = _einittext,
+ };
+
+ /*
+@@ -39,7 +39,8 @@ const struct virtual_region *find_text_region(unsigned long addr)
+ rcu_read_lock(&rcu_virtual_region_lock);
+ list_for_each_entry_rcu( region, &virtual_region_list, list )
+ {
+- if ( (void *)addr >= region->start && (void *)addr < region->end )
++ if ( (void *)addr >= region->text_start &&
++ (void *)addr < region->text_end )
+ {
+ rcu_read_unlock(&rcu_virtual_region_lock);
+ return region;
+@@ -88,8 +89,8 @@ void relax_virtual_region_perms(void)
+
+ rcu_read_lock(&rcu_virtual_region_lock);
+ list_for_each_entry_rcu( region, &virtual_region_list, list )
+- modify_xen_mappings_lite((unsigned long)region->start,
+- ROUNDUP((unsigned long)region->end, PAGE_SIZE),
++ modify_xen_mappings_lite((unsigned long)region->text_start,
++ PAGE_ALIGN((unsigned long)region->text_end),
+ PAGE_HYPERVISOR_RWX);
+ rcu_read_unlock(&rcu_virtual_region_lock);
+ }
+@@ -100,8 +101,8 @@ void tighten_virtual_region_perms(void)
+
+ rcu_read_lock(&rcu_virtual_region_lock);
+ list_for_each_entry_rcu( region, &virtual_region_list, list )
+- modify_xen_mappings_lite((unsigned long)region->start,
+- ROUNDUP((unsigned long)region->end, PAGE_SIZE),
++ modify_xen_mappings_lite((unsigned long)region->text_start,
++ PAGE_ALIGN((unsigned long)region->text_end),
+ PAGE_HYPERVISOR_RX);
+ rcu_read_unlock(&rcu_virtual_region_lock);
+ }
+diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
+index d053620711..442a45bf1f 100644
+--- a/xen/include/xen/virtual_region.h
++++ b/xen/include/xen/virtual_region.h
+@@ -9,11 +9,18 @@
+ #include <xen/list.h>
+ #include <xen/symbols.h>
+
++/*
++ * Despite its name, this is a module(ish) description.
++ *
++ * There's one region for the runtime .text/etc, one region for .init during
++ * boot only, and one region per livepatch.
++ */
+ struct virtual_region
+ {
+ struct list_head list;
+- const void *start; /* Virtual address start. */
+- const void *end; /* Virtual address end. */
++
++ const void *text_start; /* .text virtual address start. */
++ const void *text_end; /* .text virtual address end. */
+
+ /* If this is NULL the default lookup mechanism is used. */
+ symbols_lookup_t *symbols_lookup;
+--
+2.44.0
+
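Underneath the renaming, find_text_region() is an address-to-module lookup: walk the registered regions and return the one whose [text_start, text_end) range contains the address. A standalone sketch of that lookup, using a plain array instead of Xen's RCU-protected list and invented names:

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    struct region {
        const char *name;
        const void *text_start, *text_end;   /* half-open: [start, end) */
    };

    /* Return the region whose text range contains addr, or NULL. */
    static const struct region *find_region(const struct region *r, size_t n,
                                            const void *addr)
    {
        uintptr_t a = (uintptr_t)addr;

        for ( size_t i = 0; i < n; i++ )
            if ( a >= (uintptr_t)r[i].text_start &&
                 a <  (uintptr_t)r[i].text_end )
                return &r[i];

        return NULL;
    }

    int main(void)
    {
        static const char core[64], patch[64];   /* stand-ins for .text blobs */
        const struct region regions[] = {
            { "core",      core,  core  + sizeof(core)  },
            { "livepatch", patch, patch + sizeof(patch) },
        };
        const struct region *hit = find_region(regions, 2, patch + 10);

        printf("%s\n", hit ? hit->name : "not found");
        return 0;
    }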
diff --git a/0063-xen-virtual-region-Include-rodata-pointers.patch b/0063-xen-virtual-region-Include-rodata-pointers.patch
new file mode 100644
index 0000000..9f51d4d
--- /dev/null
+++ b/0063-xen-virtual-region-Include-rodata-pointers.patch
@@ -0,0 +1,71 @@
+From 335cbb55567b20df8e8bd2d1b340609e272ddab6 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:19:11 +0200
+Subject: [PATCH 63/67] xen/virtual-region: Include rodata pointers
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+These are optional. .init doesn't distinguish types of data like this, and
+livepatches don't necessarily have any .rodata either.
+
+No functional change.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: ef969144a425e39f5b214a875b5713d0ea8575fb
+master date: 2024-03-07 14:24:42 +0000
+---
+ xen/common/livepatch.c | 6 ++++++
+ xen/common/virtual_region.c | 2 ++
+ xen/include/xen/virtual_region.h | 3 +++
+ 3 files changed, 11 insertions(+)
+
+diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
+index 29395f286f..28c09ddf58 100644
+--- a/xen/common/livepatch.c
++++ b/xen/common/livepatch.c
+@@ -788,6 +788,12 @@ static int prepare_payload(struct payload *payload,
+ region->text_start = payload->text_addr;
+ region->text_end = payload->text_addr + payload->text_size;
+
++ if ( payload->ro_size )
++ {
++ region->rodata_start = payload->ro_addr;
++ region->rodata_end = payload->ro_addr + payload->ro_size;
++ }
++
+ /* Optional sections. */
+ for ( i = 0; i < BUGFRAME_NR; i++ )
+ {
+diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
+index b22ffb75c4..9c566f8ec9 100644
+--- a/xen/common/virtual_region.c
++++ b/xen/common/virtual_region.c
+@@ -13,6 +13,8 @@ static struct virtual_region core = {
+ .list = LIST_HEAD_INIT(core.list),
+ .text_start = _stext,
+ .text_end = _etext,
++ .rodata_start = _srodata,
++ .rodata_end = _erodata,
+ };
+
+ /* Becomes irrelevant when __init sections are cleared. */
+diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
+index 442a45bf1f..dcdc95ba49 100644
+--- a/xen/include/xen/virtual_region.h
++++ b/xen/include/xen/virtual_region.h
+@@ -22,6 +22,9 @@ struct virtual_region
+ const void *text_start; /* .text virtual address start. */
+ const void *text_end; /* .text virtual address end. */
+
++ const void *rodata_start; /* .rodata virtual address start (optional). */
++ const void *rodata_end; /* .rodata virtual address end. */
++
+ /* If this is NULL the default lookup mechanism is used. */
+ symbols_lookup_t *symbols_lookup;
+
+--
+2.44.0
+
diff --git a/0064-x86-livepatch-Relax-permissions-on-rodata-too.patch b/0064-x86-livepatch-Relax-permissions-on-rodata-too.patch
new file mode 100644
index 0000000..bc80769
--- /dev/null
+++ b/0064-x86-livepatch-Relax-permissions-on-rodata-too.patch
@@ -0,0 +1,85 @@
+From c3ff11b11c21777a9b1c616607705f3a7340b391 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:19:36 +0200
+Subject: [PATCH 64/67] x86/livepatch: Relax permissions on rodata too
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+This reinstates the capability to patch .rodata in load/unload hooks, which
+was lost when we stopped using CR0.WP=0 to patch.
+
+This turns out to be rather less of a large TODO than I thought at the time.
+
+Fixes: 8676092a0f16 ("x86/livepatch: Fix livepatch application when CET is active")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+master commit: b083b1c393dc8961acf0959b1d2e0ad459985ae3
+master date: 2024-03-07 14:24:42 +0000
+---
+ xen/arch/x86/livepatch.c | 4 ++--
+ xen/common/virtual_region.c | 12 ++++++++++++
+ 2 files changed, 14 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
+index ee539f001b..4f76127e1f 100644
+--- a/xen/arch/x86/livepatch.c
++++ b/xen/arch/x86/livepatch.c
+@@ -62,7 +62,7 @@ int arch_livepatch_safety_check(void)
+ int noinline arch_livepatch_quiesce(void)
+ {
+ /*
+- * Relax perms on .text to be RWX, so we can modify them.
++ * Relax perms on .text/.rodata, so we can modify them.
+ *
+ * This relaxes perms globally, but all other CPUs are waiting on us.
+ */
+@@ -75,7 +75,7 @@ int noinline arch_livepatch_quiesce(void)
+ void noinline arch_livepatch_revive(void)
+ {
+ /*
+- * Reinstate perms on .text to be RX. This also cleans out the dirty
++ * Reinstate perms on .text/.rodata. This also cleans out the dirty
+ * bits, which matters when CET Shstk is active.
+ *
+ * The other CPUs waiting for us could in principle have re-walked while
+diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
+index 9c566f8ec9..aefc08e75f 100644
+--- a/xen/common/virtual_region.c
++++ b/xen/common/virtual_region.c
+@@ -91,9 +91,15 @@ void relax_virtual_region_perms(void)
+
+ rcu_read_lock(&rcu_virtual_region_lock);
+ list_for_each_entry_rcu( region, &virtual_region_list, list )
++ {
+ modify_xen_mappings_lite((unsigned long)region->text_start,
+ PAGE_ALIGN((unsigned long)region->text_end),
+ PAGE_HYPERVISOR_RWX);
++ if ( region->rodata_start )
++ modify_xen_mappings_lite((unsigned long)region->rodata_start,
++ PAGE_ALIGN((unsigned long)region->rodata_end),
++ PAGE_HYPERVISOR_RW);
++ }
+ rcu_read_unlock(&rcu_virtual_region_lock);
+ }
+
+@@ -103,9 +109,15 @@ void tighten_virtual_region_perms(void)
+
+ rcu_read_lock(&rcu_virtual_region_lock);
+ list_for_each_entry_rcu( region, &virtual_region_list, list )
++ {
+ modify_xen_mappings_lite((unsigned long)region->text_start,
+ PAGE_ALIGN((unsigned long)region->text_end),
+ PAGE_HYPERVISOR_RX);
++ if ( region->rodata_start )
++ modify_xen_mappings_lite((unsigned long)region->rodata_start,
++ PAGE_ALIGN((unsigned long)region->rodata_end),
++ PAGE_HYPERVISOR_RO);
++ }
+ rcu_read_unlock(&rcu_virtual_region_lock);
+ }
+ #endif /* CONFIG_X86 */
+--
+2.44.0
+
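The relax/tighten pairing above has a close userspace analogue: temporarily make a read-only region writable around a patching operation, then restore the stricter protection. A small POSIX sketch using mmap/mprotect (no Xen interfaces involved):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;
        /* Stand-in for a payload's .rodata region. */
        char *rodata = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if ( rodata == MAP_FAILED )
            return 1;

        strcpy(rodata, "original");
        mprotect(rodata, len, PROT_READ);               /* steady state: RO */

        /* "Load hook": relax, patch, tighten again. */
        mprotect(rodata, len, PROT_READ | PROT_WRITE);  /* relax */
        strcpy(rodata, "patched");
        mprotect(rodata, len, PROT_READ);               /* tighten */

        printf("%s\n", rodata);
        munmap(rodata, len);
        return 0;
    }

As in the hypervisor case, the window during which the region is writable is kept as short as possible.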
diff --git a/0065-x86-boot-Improve-the-boot-watchdog-determination-of-.patch b/0065-x86-boot-Improve-the-boot-watchdog-determination-of-.patch
new file mode 100644
index 0000000..4a46326
--- /dev/null
+++ b/0065-x86-boot-Improve-the-boot-watchdog-determination-of-.patch
@@ -0,0 +1,106 @@
+From 846fb984b506135917c2862d2e4607005d6afdeb Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:20:09 +0200
+Subject: [PATCH 65/67] x86/boot: Improve the boot watchdog determination of
+ stuck cpus
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Right now, check_nmi_watchdog() has two processing loops over all online CPUs
+using prev_nmi_count as storage.
+
+Use a cpumask_t instead (1/32nd as much initdata) and have wait_for_nmis()
+make the determination of whether it is stuck, rather than having both
+functions need to agree on how many ticks mean "stuck".
+
+More importantly though, it means we can use the standard cpumask
+infrastructure, including turning this:
+
+ (XEN) Brought up 512 CPUs
+ (XEN) Testing NMI watchdog on all CPUs: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,
266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511} stuck
+
+into the rather more manageable:
+
+ (XEN) Brought up 512 CPUs
+ (XEN) Testing NMI watchdog on all CPUs: {0-511} stuck
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 9e18f339830c828798aef465556d4029d83476a0
+master date: 2024-03-19 18:29:37 +0000
+---
+ xen/arch/x86/nmi.c | 33 ++++++++++++++-------------------
+ 1 file changed, 14 insertions(+), 19 deletions(-)
+
+diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
+index 7c9591b65e..dd31034ac8 100644
+--- a/xen/arch/x86/nmi.c
++++ b/xen/arch/x86/nmi.c
+@@ -150,6 +150,8 @@ int nmi_active;
+
+ static void __init cf_check wait_for_nmis(void *p)
+ {
++ cpumask_t *stuck_cpus = p;
++ unsigned int cpu = smp_processor_id();
+ unsigned int start_count = this_cpu(nmi_count);
+ unsigned long ticks = 10 * 1000 * cpu_khz / nmi_hz;
+ unsigned long s, e;
+@@ -158,42 +160,35 @@ static void __init cf_check wait_for_nmis(void *p)
+ do {
+ cpu_relax();
+ if ( this_cpu(nmi_count) >= start_count + 2 )
+- break;
++ return;
++
+ e = rdtsc();
+- } while( e - s < ticks );
++ } while ( e - s < ticks );
++
++ /* Timeout. Mark ourselves as stuck. */
++ cpumask_set_cpu(cpu, stuck_cpus);
+ }
+
+ void __init check_nmi_watchdog(void)
+ {
+- static unsigned int __initdata prev_nmi_count[NR_CPUS];
+- int cpu;
+- bool ok = true;
++ static cpumask_t __initdata stuck_cpus;
+
+ if ( nmi_watchdog == NMI_NONE )
+ return;
+
+ printk("Testing NMI watchdog on all CPUs:");
+
+- for_each_online_cpu ( cpu )
+- prev_nmi_count[cpu] = per_cpu(nmi_count, cpu);
+-
+ /*
+ * Wait at most 10 ticks for 2 watchdog NMIs on each CPU.
+ * Busy-wait on all CPUs: the LAPIC counter that the NMI watchdog
+ * uses only runs while the core's not halted
+ */
+- on_selected_cpus(&cpu_online_map, wait_for_nmis, NULL, 1);
+-
+- for_each_online_cpu ( cpu )
+- {
+- if ( per_cpu(nmi_count, cpu) - prev_nmi_count[cpu] < 2 )
+- {
+- printk(" %d", cpu);
+- ok = false;
+- }
+- }
++ on_selected_cpus(&cpu_online_map, wait_for_nmis, &stuck_cpus, 1);
+
+- printk(" %s\n", ok ? "ok" : "stuck");
++ if ( cpumask_empty(&stuck_cpus) )
++ printk("ok\n");
++ else
++ printk("{%*pbl} stuck\n", CPUMASK_PR(&stuck_cpus));
+
+ /*
+ * Now that we know it works we can reduce NMI frequency to
+--
+2.44.0
+
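The readable "{0-511}" output comes from printing the stuck-CPU bitmap as compact ranges (Xen's %*pbl / CPUMASK_PR formatting). A standalone sketch of both halves of the idea - workers marking themselves in a shared bitmap on timeout, and the bitmap being printed as ranges - with invented names and a plain bool array instead of cpumask_t:

    #include <stdbool.h>
    #include <stdio.h>

    #define NR_CPUS 512

    static bool stuck[NR_CPUS];

    /* Print set bits as a compact range list, e.g. "{0-3,7,9-511}". */
    static void print_ranges(const bool *map, unsigned int nr)
    {
        const char *sep = "";
        unsigned int i = 0;

        printf("{");
        while ( i < nr )
        {
            if ( !map[i] )
            {
                i++;
                continue;
            }

            unsigned int start = i;

            while ( i < nr && map[i] )
                i++;
            if ( i - start == 1 )
                printf("%s%u", sep, start);
            else
                printf("%s%u-%u", sep, start, i - 1);
            sep = ",";
        }
        printf("}\n");
    }

    int main(void)
    {
        for ( unsigned int cpu = 0; cpu < NR_CPUS; cpu++ )
            stuck[cpu] = true;          /* pretend every CPU timed out */

        print_ranges(stuck, NR_CPUS);   /* prints "{0-511}" */
        return 0;
    }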
diff --git a/0066-x86-boot-Support-the-watchdog-on-newer-AMD-systems.patch b/0066-x86-boot-Support-the-watchdog-on-newer-AMD-systems.patch
new file mode 100644
index 0000000..e501861
--- /dev/null
+++ b/0066-x86-boot-Support-the-watchdog-on-newer-AMD-systems.patch
@@ -0,0 +1,48 @@
+From 2777b499f1f6d5cea68f9479f82d055542b822ad Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:20:30 +0200
+Subject: [PATCH 66/67] x86/boot: Support the watchdog on newer AMD systems
+
+The MSRs used by setup_k7_watchdog() are architectural in 64bit. The Unit
+Select (0x76, cycles not in halt state) isn't, but it hasn't changed in 25
+years, making this a trend likely to continue.
+
+Drop the family check. If the Unit Select does happen to change meaning in
+the future, check_nmi_watchdog() will still notice the watchdog not operating
+as expected.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 131892e0dcc1265b621c2b7d844cb9e7c3a4404f
+master date: 2024-03-19 18:29:37 +0000
+---
+ xen/arch/x86/nmi.c | 11 ++++-------
+ 1 file changed, 4 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
+index dd31034ac8..c7c51614a6 100644
+--- a/xen/arch/x86/nmi.c
++++ b/xen/arch/x86/nmi.c
+@@ -386,15 +386,12 @@ void setup_apic_nmi_watchdog(void)
+ if ( nmi_watchdog == NMI_NONE )
+ return;
+
+- switch (boot_cpu_data.x86_vendor) {
++ switch ( boot_cpu_data.x86_vendor )
++ {
+ case X86_VENDOR_AMD:
+- switch (boot_cpu_data.x86) {
+- case 6:
+- case 0xf ... 0x19:
+- setup_k7_watchdog();
+- break;
+- }
++ setup_k7_watchdog();
+ break;
++
+ case X86_VENDOR_INTEL:
+ switch (boot_cpu_data.x86) {
+ case 6:
+--
+2.44.0
+
diff --git a/0067-tests-resource-Fix-HVM-guest-in-SHADOW-builds.patch b/0067-tests-resource-Fix-HVM-guest-in-SHADOW-builds.patch
new file mode 100644
index 0000000..5ce4e17
--- /dev/null
+++ b/0067-tests-resource-Fix-HVM-guest-in-SHADOW-builds.patch
@@ -0,0 +1,110 @@
+From 9bc40dbcf9eafccc1923b2555286bf6a2af03b7a Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 2 Apr 2024 16:24:07 +0200
+Subject: [PATCH 67/67] tests/resource: Fix HVM guest in !SHADOW builds
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Right now, test-resource always creates HVM Shadow guests. But if Xen has
+SHADOW compiled out, running the test yields:
+
+ $./test-resource
+ XENMEM_acquire_resource tests
+ Test x86 PV
+ Created d1
+ Test grant table
+ Test x86 PVH
+ Skip: 95 - Operation not supported
+
+and doesn't really test HVM guests, but doesn't fail either.
+
+There's nothing paging-mode-specific about this test, so default to HAP if
+possible and provide a more specific message if neither HAP nor Shadow is
+available.
+
+As we've got physinfo to hand, also provide a more specific message about the
+absence of PV or HVM support.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 0263dc9069ddb66335c72a159e09050b1600e56a
+master date: 2024-03-01 20:14:19 +0000
+---
+ tools/tests/resource/test-resource.c | 39 ++++++++++++++++++++++++++++
+ 1 file changed, 39 insertions(+)
+
+diff --git a/tools/tests/resource/test-resource.c b/tools/tests/resource/test-resource.c
+index 0a950072f9..e2c4ba3478 100644
+--- a/tools/tests/resource/test-resource.c
++++ b/tools/tests/resource/test-resource.c
+@@ -20,6 +20,8 @@ static xc_interface *xch;
+ static xenforeignmemory_handle *fh;
+ static xengnttab_handle *gh;
+
++static xc_physinfo_t physinfo;
++
+ static void test_gnttab(uint32_t domid, unsigned int nr_frames,
+ unsigned long gfn)
+ {
+@@ -172,6 +174,37 @@ static void test_domain_configurations(void)
+
+ printf("Test %s\n", t->name);
+
++#if defined(__x86_64__) || defined(__i386__)
++ if ( t->create.flags & XEN_DOMCTL_CDF_hvm )
++ {
++ if ( !(physinfo.capabilities & XEN_SYSCTL_PHYSCAP_hvm) )
++ {
++ printf(" Skip: HVM not available\n");
++ continue;
++ }
++
++ /*
++ * On x86, use HAP guests if possible, but skip if neither HAP nor
++ * SHADOW is available.
++ */
++ if ( physinfo.capabilities & XEN_SYSCTL_PHYSCAP_hap )
++ t->create.flags |= XEN_DOMCTL_CDF_hap;
++ else if ( !(physinfo.capabilities & XEN_SYSCTL_PHYSCAP_shadow) )
++ {
++ printf(" Skip: Neither HAP or SHADOW available\n");
++ continue;
++ }
++ }
++ else
++ {
++ if ( !(physinfo.capabilities & XEN_SYSCTL_PHYSCAP_pv) )
++ {
++ printf(" Skip: PV not available\n");
++ continue;
++ }
++ }
++#endif
++
+ rc = xc_domain_create(xch, &domid, &t->create);
+ if ( rc )
+ {
+@@ -214,6 +247,8 @@ static void test_domain_configurations(void)
+
+ int main(int argc, char **argv)
+ {
++ int rc;
++
+ printf("XENMEM_acquire_resource tests\n");
+
+ xch = xc_interface_open(NULL, NULL, 0);
+@@ -227,6 +262,10 @@ int main(int argc, char **argv)
+ if ( !gh )
+ err(1, "xengnttab_open");
+
++ rc = xc_physinfo(xch, &physinfo);
++ if ( rc )
++ err(1, "Failed to obtain physinfo");
++
+ test_domain_configurations();
+
+ return !!nr_failures;
+--
+2.44.0
+
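The shape of the fix above - query the host's capabilities once, then skip or adapt each test configuration rather than letting it fail with a generic error - is a common pattern for portable test harnesses. A rough standalone sketch with made-up capability flags (not the real XEN_SYSCTL_PHYSCAP_* values):

    #include <stdio.h>

    #define CAP_PV      (1U << 0)
    #define CAP_HVM     (1U << 1)
    #define CAP_HAP     (1U << 2)
    #define CAP_SHADOW  (1U << 3)

    struct test {
        const char *name;
        unsigned int needs;
    };

    int main(void)
    {
        unsigned int caps = CAP_PV | CAP_HVM | CAP_HAP;   /* pretend host */
        struct test tests[] = {
            { "x86 PV",  CAP_PV  },
            { "x86 PVH", CAP_HVM },
        };

        for ( unsigned int i = 0; i < sizeof(tests) / sizeof(tests[0]); i++ )
        {
            const struct test *t = &tests[i];

            printf("Test %s\n", t->name);

            if ( !(caps & t->needs) )
            {
                printf("  Skip: not available\n");
                continue;
            }

            if ( t->needs & CAP_HVM )
            {
                /* Prefer HAP; fall back to shadow; otherwise skip. */
                if ( caps & CAP_HAP )
                    printf("  using HAP\n");
                else if ( caps & CAP_SHADOW )
                    printf("  using shadow\n");
                else
                {
                    printf("  Skip: neither HAP nor shadow available\n");
                    continue;
                }
            }

            /* ... create the domain and run the test here ... */
        }
        return 0;
    }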
diff --git a/info.txt b/info.txt
index 0a99509..fa9f510 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #0 for 4.17.4-pre
+Xen upstream patchset #1 for 4.17.4-pre
Containing patches from
RELEASE-4.17.3 (07f413d7ffb06eab36045bd19f53555de1cacf62)
to
-staging-4.17 (091466ba55d1e2e75738f751818ace2e3ed08ccf)
+staging-4.17 (9bc40dbcf9eafccc1923b2555286bf6a2af03b7a)
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2024-08-01 13:03 Tomáš Mózes
0 siblings, 0 replies; 11+ messages in thread
From: Tomáš Mózes @ 2024-08-01 13:03 UTC (permalink / raw
To: gentoo-commits
commit: 212febf72900c12405591dcc5902d4cfa11173bf
Author: Tomáš Mózes <tomas.mozes <AT> gmail <DOT> com>
AuthorDate: Thu Aug 1 13:02:58 2024 +0000
Commit: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
CommitDate: Thu Aug 1 13:02:58 2024 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=212febf7
Xen 4.18.3-pre-patchset-0
Signed-off-by: Tomáš Mózes <tomas.mozes <AT> gmail.com>
...x86-entry-Fix-build-with-older-toolchains.patch | 32 +
...-__alt_call_maybe_initdata-so-it-s-safe-f.patch | 49 ++
...vice-assignment-if-phantom-functions-cann.patch | 91 ---
0003-VT-d-Fix-else-vs-endif-misplacement.patch | 70 ---
...id-UIP-flag-being-set-for-longer-than-exp.patch | 57 ++
...R-correct-inadvertently-inverted-WC-check.patch | 36 ++
...end-CPU-erratum-1474-fix-to-more-affected.patch | 123 ----
0005-CirrusCI-drop-FreeBSD-12.patch | 39 --
...x-reporting-of-BHB-clearing-usage-from-gu.patch | 69 +++
...nsure-Global-Performance-Counter-Control-.patch | 74 ---
...-x86-spec-adjust-logic-that-elides-lfence.patch | 75 +++
...vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch | 65 ---
0007-xen-xsm-Wire-up-get_dom0_console.patch | 66 +++
...vmx-Disallow-the-use-of-inactivity-states.patch | 126 ----
...en-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch | 41 ++
...t-ATS-checking-for-root-complex-integrate.patch | 63 ++
...-move-lib-fdt-elf-temp.o-and-their-deps-t.patch | 70 ---
...ibxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch | 47 ++
...m-pt-fix-off-by-one-in-entry-check-assert.patch | 36 --
...s-xentop-fix-sorting-bug-for-some-columns.patch | 67 ---
...icy-Fix-migration-from-Ice-Lake-to-Cascad.patch | 92 +++
0012-amd-vi-fix-IVMD-memory-type-checks.patch | 53 --
...code-Distinguish-ucode-already-up-to-date.patch | 58 ++
...opulation-of-the-online-vCPU-bitmap-for-P.patch | 61 ++
...hvm-Fix-fast-singlestep-state-persistence.patch | 86 ---
...andling-XenStore-errors-in-device-creatio.patch | 191 ++++++
...y-state-on-hvmemul_map_linear_addr-s-erro.patch | 63 --
0015-build-Replace-which-with-command-v.patch | 57 --
...et-all-sched_resource-data-inside-locked-.patch | 84 +++
...le-relocating-memory-for-qemu-xen-in-stub.patch | 50 --
...-x86-respect-mapcache_domain_init-failing.patch | 38 ++
...sure-build-fails-when-running-kconfig-fai.patch | 58 --
0017-tools-xentop-Fix-cpu-sort-order.patch | 76 +++
...oid-system-wide-rendezvous-when-setting-A.patch | 60 ++
0018-x86emul-add-missing-EVEX.R-checks.patch | 50 --
... => 0019-update-Xen-version-to-4.18.3-pre.patch | 16 +-
...vepatch-fix-norevert-test-hook-setup-typo.patch | 36 --
...urther-fixes-to-identify-ucode-already-up.patch | 92 +++
...-fix-printf-format-specifier-in-no_config.patch | 38 --
...-use-a-union-as-register-type-for-functio.patch | 141 -----
...vent-watchdog-triggering-when-dumping-MSI.patch | 44 ++
...ove-offline-CPUs-from-old-CPU-mask-when-a.patch | 44 ++
...x-BRANCH_HARDEN-option-to-only-be-set-whe.patch | 57 --
0023-CI-Update-FreeBSD-to-13.3.patch | 33 ++
...-for-shadow-stack-in-exception-from-stub-.patch | 212 -------
...not-use-shorthand-IPI-destinations-in-CPU.patch | 98 ++++
0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch | 52 --
...e-SVM-VMX-when-their-enabling-is-prohibit.patch | 67 ---
...mit-interrupt-movement-done-by-fixup_irqs.patch | 104 ++++
...rect-special-page-checking-in-epte_get_en.patch | 46 ++
...sched-Fix-UB-shift-in-compat_set_timer_op.patch | 86 ---
...id-marking-non-present-entries-for-re-con.patch | 85 +++
...int-the-built-in-SPECULATIVE_HARDEN_-opti.patch | 54 --
...p-questionable-mfn_valid-from-epte_get_en.patch | 47 ++
...x-INDIRECT_THUNK-option-to-only-be-set-wh.patch | 67 ---
...86-Intel-unlock-CPUID-earlier-for-the-BSP.patch | 105 ++++
...-not-print-thunk-option-selection-if-not-.patch | 50 --
...l-with-old_cpu_mask-for-interrupts-in-mov.patch | 84 +++
...ch-register-livepatch-regions-when-loaded.patch | 159 -----
...dle-moving-interrupts-in-_assign_irq_vect.patch | 172 ++++++
...ch-search-for-symbols-in-all-loaded-paylo.patch | 149 -----
...ch-fix-norevert-test-attempt-to-open-code.patch | 186 ------
...san-Fix-UB-in-type_descriptor-declaration.patch | 39 ++
...86-xstate-Fix-initialisation-of-XSS-cache.patch | 74 +++
...ch-properly-build-the-noapply-and-norever.patch | 43 --
...ix-segfault-in-device_model_spawn_outcome.patch | 39 --
...puid-Fix-handling-of-XSAVE-dynamic-leaves.patch | 72 +++
...-always-use-a-temporary-parameter-stashin.patch | 197 -------
...ward-pending-interrupts-to-new-destinatio.patch | 143 +++++
...icy-Allow-for-levelling-of-VERW-side-effe.patch | 102 ----
...exception-from-stub-recovery-selftests-wi.patch | 84 +++
...CI-skip-huge-BARs-in-certain-calculations.patch | 99 ----
...-don-t-let-test-xenstore-write-nodes-exce.patch | 41 ++
...-let-test-xenstore-exit-with-non-0-status.patch | 57 ++
...detection-of-last-L1-entry-in-modify_xen_.patch | 41 --
0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch | 58 ++
0039-x86-entry-Introduce-EFRAME_-constants.patch | 314 ----------
...t-stand-alone-sd_notify-implementation-fr.patch | 130 +++++
0040-x86-Resync-intel-family.h-from-Linux.patch | 98 ----
...o-xenstored-Don-t-link-against-libsystemd.patch | 87 +++
...form-VERW-flushing-later-in-the-VMExit-pa.patch | 146 -----
0042-tools-Drop-libsystemd-as-a-dependency.patch | 648 +++++++++++++++++++++
...rl-Perform-VERW-flushing-later-in-exit-pa.patch | 209 -------
...x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch | 46 ++
...x86-spec-ctrl-Rename-VERW-related-options.patch | 248 --------
0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch | 53 ++
0044-x86-spec-ctrl-VERW-handling-adjustments.patch | 171 ------
0045-pirq_cleanup_check-leaks.patch | 84 +++
...rl-Mitigation-Register-File-Data-Sampling.patch | 320 ----------
...ilder-Correct-the-length-calculation-in-x.patch | 44 ++
...-Delete-update_cr3-s-do_locking-parameter.patch | 161 -----
...ols-libxs-Fix-CLOEXEC-handling-in-get_dev.patch | 95 +++
...-Swap-order-of-actions-in-the-FREE-macros.patch | 58 --
...-libxs-Fix-CLOEXEC-handling-in-get_socket.patch | 60 ++
...k-introduce-support-for-blocking-speculat.patch | 331 -----------
...oduce-support-for-blocking-speculation-in.patch | 125 ----
...s-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch | 109 ++++
...ument-and-enforce-extra_guest_irqs-upper-.patch | 156 +++++
...ck-introduce-support-for-blocking-specula.patch | 87 ---
...empt-to-ensure-lock-wrappers-are-always-i.patch | 405 -------------
...on-t-clear-DF-when-raising-UD-for-lack-of.patch | 58 ++
0052-evtchn-build-fix-for-Arm.patch | 43 ++
...-speculation-barriers-to-open-coded-locks.patch | 73 ---
...RQ-avoid-double-unlock-in-map_domain_pirq.patch | 53 ++
...-conditional-lock-taking-from-speculative.patch | 216 -------
...s-ipxe-update-for-fixing-build-with-GCC12.patch | 33 --
...-Return-pirq-that-irq-was-already-mapped-.patch | 38 ++
...libxs-Fix-fcntl-invocation-in-set_cloexec.patch | 57 ++
...-block_lock_speculation-in-_mm_write_lock.patch | 35 --
...-fix-clang-code-gen-when-using-altcall-in.patch | 85 +++
...x-setup_apic_nmi_watchdog-to-fail-more-cl.patch | 120 ----
...-together-P2M-update-and-increment-of-ent.patch | 61 --
...tored-Use-Map-instead-of-Hashtbl-for-quot.patch | 143 -----
0059-tools-oxenstored-Make-Quota.t-pure.patch | 121 ----
...x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch | 90 ---
...icy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch | 85 ---
...irtual-region-Rename-the-start-end-fields.patch | 140 -----
...en-virtual-region-Include-rodata-pointers.patch | 71 ---
...livepatch-Relax-permissions-on-rodata-too.patch | 85 ---
...prove-the-boot-watchdog-determination-of-.patch | 106 ----
...Support-the-watchdog-on-newer-AMD-systems.patch | 48 --
...s-resource-Fix-HVM-guest-in-SHADOW-builds.patch | 110 ----
info.txt | 6 +-
123 files changed, 4574 insertions(+), 7274 deletions(-)
diff --git a/0001-x86-entry-Fix-build-with-older-toolchains.patch b/0001-x86-entry-Fix-build-with-older-toolchains.patch
new file mode 100644
index 0000000..ad6e76a
--- /dev/null
+++ b/0001-x86-entry-Fix-build-with-older-toolchains.patch
@@ -0,0 +1,32 @@
+From 2d38302c33b117aa9a417056db241aefc840c2f0 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 9 Apr 2024 21:39:51 +0100
+Subject: [PATCH 01/56] x86/entry: Fix build with older toolchains
+
+Binutils older than 2.29 doesn't know INCSSPD.
+
+Fixes: 8e186f98ce0e ("x86: Use indirect calls in reset-stack infrastructure")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+(cherry picked from commit a9fa82500818a8d8ce5f2843f1577bd2c29d088e)
+---
+ xen/arch/x86/x86_64/entry.S | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index ad7dd3b23b..054fcb225f 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -643,7 +643,9 @@ ENTRY(continue_pv_domain)
+ * JMPed to. Drop the return address.
+ */
+ add $8, %rsp
++#ifdef CONFIG_XEN_SHSTK
+ ALTERNATIVE "", "mov $2, %eax; incsspd %eax", X86_FEATURE_XEN_SHSTK
++#endif
+
+ call check_wakeup_from_wait
+ ret_from_intr:
+--
+2.45.2
+
diff --git a/0002-altcall-fix-__alt_call_maybe_initdata-so-it-s-safe-f.patch b/0002-altcall-fix-__alt_call_maybe_initdata-so-it-s-safe-f.patch
new file mode 100644
index 0000000..05ecd83
--- /dev/null
+++ b/0002-altcall-fix-__alt_call_maybe_initdata-so-it-s-safe-f.patch
@@ -0,0 +1,49 @@
+From 8bdcb0b98b53140102031ceca0611f22190227fd Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 29 Apr 2024 09:35:21 +0200
+Subject: [PATCH 02/56] altcall: fix __alt_call_maybe_initdata so it's safe for
+ livepatch
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Setting alternative call variables as __init is not safe for use with
+livepatch, as livepatches can rightfully introduce new alternative calls to
+structures marked as __alt_call_maybe_initdata (possibly just indirectly due to
+replacing existing functions that use those). Attempting to resolve those
+alternative calls then results in page faults as the variable that holds the
+function pointer address has been freed.
+
+When livepatch is supported use the __ro_after_init attribute instead of
+__initdata for __alt_call_maybe_initdata.
+
+Fixes: f26bb285949b ('xen: Implement xen/alternative-call.h for use in common code')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: af4cd0a6a61cdb03bc1afca9478b05b0c9703599
+master date: 2024-04-11 18:51:36 +0100
+---
+ xen/include/xen/alternative-call.h | 7 ++++++-
+ 1 file changed, 6 insertions(+), 1 deletion(-)
+
+diff --git a/xen/include/xen/alternative-call.h b/xen/include/xen/alternative-call.h
+index 5c6b9a562b..10f7d7637e 100644
+--- a/xen/include/xen/alternative-call.h
++++ b/xen/include/xen/alternative-call.h
+@@ -50,7 +50,12 @@
+
+ #include <asm/alternative.h>
+
+-#define __alt_call_maybe_initdata __initdata
++#ifdef CONFIG_LIVEPATCH
++/* Must keep for livepatches to resolve alternative calls. */
++# define __alt_call_maybe_initdata __ro_after_init
++#else
++# define __alt_call_maybe_initdata __initdata
++#endif
+
+ #else
+
+--
+2.45.2
+
diff --git a/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch b/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch
deleted file mode 100644
index bafad55..0000000
--- a/0002-pci-fail-device-assignment-if-phantom-functions-cann.patch
+++ /dev/null
@@ -1,91 +0,0 @@
-From f9e1ed51bdba31017ea17e1819eb2ade6b5c8615 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 30 Jan 2024 14:37:39 +0100
-Subject: [PATCH 02/67] pci: fail device assignment if phantom functions cannot
- be assigned
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The current behavior is that no error is reported if (some) phantom functions
-fail to be assigned during device add or assignment, so the operation succeeds
-even if some phantom functions are not correctly setup.
-
-This can lead to devices possibly being successfully assigned to a domU while
-some of the device phantom functions are still assigned to dom0. Even when the
-device is assigned domIO before being assigned to a domU phantom functions
-might fail to be assigned to domIO, and also fail to be assigned to the domU,
-leaving them assigned to dom0.
-
-Since the device can generate requests using the IDs of those phantom
-functions, given the scenario above a device in such state would be in control
-of a domU, but still capable of generating transactions that use a context ID
-targeting dom0 owned memory.
-
-Modify device assign in order to attempt to deassign the device if phantom
-functions failed to be assigned.
-
-Note that device addition is not modified in the same way, as in that case the
-device is assigned to a trusted domain, and hence partial assign can lead to
-device malfunction but not a security issue.
-
-This is XSA-449 / CVE-2023-46839
-
-Fixes: 4e9950dc1bd2 ('IOMMU: add phantom function support')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: cb4ecb3cc17b02c2814bc817efd05f3f3ba33d1e
-master date: 2024-01-30 14:28:01 +0100
----
- xen/drivers/passthrough/pci.c | 27 +++++++++++++++++++++------
- 1 file changed, 21 insertions(+), 6 deletions(-)
-
-diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
-index 07d1986d33..8c62b14d19 100644
---- a/xen/drivers/passthrough/pci.c
-+++ b/xen/drivers/passthrough/pci.c
-@@ -1444,11 +1444,10 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
-
- pdev->fault.count = 0;
-
-- if ( (rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
-- pci_to_dev(pdev), flag)) )
-- goto done;
-+ rc = iommu_call(hd->platform_ops, assign_device, d, devfn, pci_to_dev(pdev),
-+ flag);
-
-- for ( ; pdev->phantom_stride; rc = 0 )
-+ while ( pdev->phantom_stride && !rc )
- {
- devfn += pdev->phantom_stride;
- if ( PCI_SLOT(devfn) != PCI_SLOT(pdev->devfn) )
-@@ -1459,8 +1458,24 @@ static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 flag)
-
- done:
- if ( rc )
-- printk(XENLOG_G_WARNING "%pd: assign (%pp) failed (%d)\n",
-- d, &PCI_SBDF(seg, bus, devfn), rc);
-+ {
-+ printk(XENLOG_G_WARNING "%pd: assign %s(%pp) failed (%d)\n",
-+ d, devfn != pdev->devfn ? "phantom function " : "",
-+ &PCI_SBDF(seg, bus, devfn), rc);
-+
-+ if ( devfn != pdev->devfn && deassign_device(d, seg, bus, pdev->devfn) )
-+ {
-+ /*
-+ * Device with phantom functions that failed to both assign and
-+ * rollback. Mark the device as broken and crash the target domain,
-+ * as the state of the functions at this point is unknown and Xen
-+ * has no way to assert consistent context assignment among them.
-+ */
-+ pdev->broken = true;
-+ if ( !is_hardware_domain(d) && d != dom_io )
-+ domain_crash(d);
-+ }
-+ }
- /* The device is assigned to dom_io so mark it as quarantined */
- else if ( d == dom_io )
- pdev->quarantine = true;
---
-2.44.0
-
diff --git a/0003-VT-d-Fix-else-vs-endif-misplacement.patch b/0003-VT-d-Fix-else-vs-endif-misplacement.patch
deleted file mode 100644
index 622fa18..0000000
--- a/0003-VT-d-Fix-else-vs-endif-misplacement.patch
+++ /dev/null
@@ -1,70 +0,0 @@
-From 6b1864afc14d484cdbc9754ce3172ac3dc189846 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 30 Jan 2024 14:38:38 +0100
-Subject: [PATCH 03/67] VT-d: Fix "else" vs "#endif" misplacement
-
-In domain_pgd_maddr() the "#endif" is misplaced with respect to "else". This
-generates incorrect logic when CONFIG_HVM is compiled out, as the "else" body
-is executed unconditionally.
-
-Rework the logic to use IS_ENABLED() instead of explicit #ifdef-ary, as it's
-clearer to follow. This in turn involves adjusting p2m_get_pagetable() to
-compile when CONFIG_HVM is disabled.
-
-This is XSA-450 / CVE-2023-46840.
-
-Fixes: 033ff90aa9c1 ("x86/P2M: p2m_{alloc,free}_ptp() and p2m_alloc_table() are HVM-only")
-Reported-by: Teddy Astie <teddy.astie@vates.tech>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: cc6ba68edf6dcd18c3865e7d7c0f1ed822796426
-master date: 2024-01-30 14:29:15 +0100
----
- xen/arch/x86/include/asm/p2m.h | 9 ++++++++-
- xen/drivers/passthrough/vtd/iommu.c | 4 +---
- 2 files changed, 9 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/p2m.h b/xen/arch/x86/include/asm/p2m.h
-index cd43d8621a..4f691533d5 100644
---- a/xen/arch/x86/include/asm/p2m.h
-+++ b/xen/arch/x86/include/asm/p2m.h
-@@ -447,7 +447,14 @@ static inline bool_t p2m_is_altp2m(const struct p2m_domain *p2m)
- return p2m->p2m_class == p2m_alternate;
- }
-
--#define p2m_get_pagetable(p2m) ((p2m)->phys_table)
-+#ifdef CONFIG_HVM
-+static inline pagetable_t p2m_get_pagetable(const struct p2m_domain *p2m)
-+{
-+ return p2m->phys_table;
-+}
-+#else
-+pagetable_t p2m_get_pagetable(const struct p2m_domain *p2m);
-+#endif
-
- /*
- * Ensure any deferred p2m TLB flush has been completed on all VCPUs.
-diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
-index b4c11a6b48..908b3ba6ee 100644
---- a/xen/drivers/passthrough/vtd/iommu.c
-+++ b/xen/drivers/passthrough/vtd/iommu.c
-@@ -441,15 +441,13 @@ static paddr_t domain_pgd_maddr(struct domain *d, paddr_t pgd_maddr,
-
- if ( pgd_maddr )
- /* nothing */;
--#ifdef CONFIG_HVM
-- else if ( iommu_use_hap_pt(d) )
-+ else if ( IS_ENABLED(CONFIG_HVM) && iommu_use_hap_pt(d) )
- {
- pagetable_t pgt = p2m_get_pagetable(p2m_get_hostp2m(d));
-
- pgd_maddr = pagetable_get_paddr(pgt);
- }
- else
--#endif
- {
- if ( !hd->arch.vtd.pgd_maddr )
- {
---
-2.44.0
-
diff --git a/0003-x86-rtc-Avoid-UIP-flag-being-set-for-longer-than-exp.patch b/0003-x86-rtc-Avoid-UIP-flag-being-set-for-longer-than-exp.patch
new file mode 100644
index 0000000..8307630
--- /dev/null
+++ b/0003-x86-rtc-Avoid-UIP-flag-being-set-for-longer-than-exp.patch
@@ -0,0 +1,57 @@
+From af0e9ba44a58c87d6d135d8ffbf468b4ceac0a41 Mon Sep 17 00:00:00 2001
+From: Ross Lagerwall <ross.lagerwall@citrix.com>
+Date: Mon, 29 Apr 2024 09:36:04 +0200
+Subject: [PATCH 03/56] x86/rtc: Avoid UIP flag being set for longer than
+ expected
+
+In a test, OVMF reported an error initializing the RTC without
+indicating the precise nature of the error. The only plausible
+explanation I can find is as follows:
+
+As part of the initialization, OVMF reads register C and then reads
+register A repeatedly until the UIP flag is not set. If this takes longer
+than 100 ms, OVMF fails and reports an error. This may happen with the
+following sequence of events:
+
+At guest time=0s, rtc_init() calls check_update_timer() which schedules
+update_timer for t=(1 - 244us).
+
+At t=1s, the update_timer function happens to have been called >= 244us
+late. In the timer callback, it sets the UIP flag and schedules
+update_timer2 for t=1s.
+
+Before update_timer2 runs, the guest reads register C which calls
+check_update_timer(). check_update_timer() stops the scheduled
+update_timer2 and since the guest time is now outside of the update
+cycle, it schedules update_timer for t=(2 - 244us).
+
+The UIP flag will therefore be set for a whole second from t=1 to t=2
+while the guest repeatedly reads register A waiting for the UIP flag to
+clear. Fix it by clearing the UIP flag when scheduling update_timer.
+
+I was able to reproduce this issue with a synthetic test and this
+resolves the issue.
+
+Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 43a07069863b419433dee12c9b58c1f7ce70aa97
+master date: 2024-04-23 14:09:18 +0200
+---
+ xen/arch/x86/hvm/rtc.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
+index 206b4296e9..4839374352 100644
+--- a/xen/arch/x86/hvm/rtc.c
++++ b/xen/arch/x86/hvm/rtc.c
+@@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s)
+ }
+ else
+ {
++ s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
+ next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
+ expire_time = NOW() + next_update_time;
+ s->next_update_time = expire_time;
+--
+2.45.2
+
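The timing described above hinges on the update timer being scheduled 244us before each seconds rollover. A rough standalone sketch of that arithmetic, with constants taken from the patch context (this is not a drop-in for xen/arch/x86/hvm/rtc.c):

    #include <stdint.h>
    #include <stdio.h>

    #define USEC_PER_SEC 1000000ULL
    #define NS_PER_USEC  1000ULL

    /*
     * Nanoseconds until the update timer should fire, given how far into
     * the current second the guest clock is; the UIP window opens 244us
     * before the rollover.  The patch additionally clears RTC_UIP at this
     * point so a late-running previous cycle cannot leave it stuck for a
     * whole second.
     */
    static uint64_t next_update_delay_ns(uint64_t guest_usec)
    {
        return (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
    }

    int main(void)
    {
        /* halfway through a second -> fire in ~0.5s minus 244us */
        printf("%llu ns\n", (unsigned long long)next_update_delay_ns(500000));
        return 0;
    }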
diff --git a/0004-x86-MTRR-correct-inadvertently-inverted-WC-check.patch b/0004-x86-MTRR-correct-inadvertently-inverted-WC-check.patch
new file mode 100644
index 0000000..ed7754d
--- /dev/null
+++ b/0004-x86-MTRR-correct-inadvertently-inverted-WC-check.patch
@@ -0,0 +1,36 @@
+From eb7059767c82d833ebecdf8106e96482b04f3c40 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Mon, 29 Apr 2024 09:36:37 +0200
+Subject: [PATCH 04/56] x86/MTRR: correct inadvertently inverted WC check
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The ! clearly got lost by mistake.
+
+Fixes: e9e0eb30d4d6 ("x86/MTRR: avoid several indirect calls")
+Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 77e25f0e30ddd11e043e6fce84bf108ce7de5b6f
+master date: 2024-04-23 14:13:48 +0200
+---
+ xen/arch/x86/cpu/mtrr/main.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/cpu/mtrr/main.c b/xen/arch/x86/cpu/mtrr/main.c
+index 55a4da54a7..90b235f57e 100644
+--- a/xen/arch/x86/cpu/mtrr/main.c
++++ b/xen/arch/x86/cpu/mtrr/main.c
+@@ -316,7 +316,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
+ }
+
+ /* If the type is WC, check that this processor supports it */
+- if ((type == X86_MT_WC) && mtrr_have_wrcomb()) {
++ if ((type == X86_MT_WC) && !mtrr_have_wrcomb()) {
+ printk(KERN_WARNING
+ "mtrr: your processor doesn't support write-combining\n");
+ return -EOPNOTSUPP;
+--
+2.45.2
+
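For reference, the corrected guard simply refuses a write-combining request unless the CPU advertises WC support. A tiny self-contained sketch of that check, where have_wrcomb() is a stand-in for the real capability probe:

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define X86_MT_WC 1

    static bool have_wrcomb(void) { return false; }  /* assume: no WC support */

    static int add_range(int type)
    {
        /* The '!' restored by the patch: reject WC when it is unsupported. */
        if (type == X86_MT_WC && !have_wrcomb())
            return -EOPNOTSUPP;
        return 0;
    }

    int main(void)
    {
        printf("%d\n", add_range(X86_MT_WC));  /* negative errno: rejected */
        return 0;
    }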
diff --git a/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch b/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch
deleted file mode 100644
index fa90a46..0000000
--- a/0004-x86-amd-Extend-CPU-erratum-1474-fix-to-more-affected.patch
+++ /dev/null
@@ -1,123 +0,0 @@
-From abcc32f0634627fe21117a48bd10e792bfbdd6dc Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Fri, 2 Feb 2024 08:01:09 +0100
-Subject: [PATCH 04/67] x86/amd: Extend CPU erratum #1474 fix to more affected
- models
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Erratum #1474 has now been extended to cover models from family 17h ranges
-00-2Fh, so the errata now covers all the models released under Family
-17h (Zen, Zen+ and Zen2).
-
-Additionally extend the workaround to Family 18h (Hygon), since it's based on
-the Zen architecture and very likely affected.
-
-Rename all the zen2 related symbols to fam17, since the errata doesn't
-exclusively affect Zen2 anymore.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 23db507a01a4ec5259ec0ab43d296a41b1c326ba
-master date: 2023-12-21 12:19:40 +0000
----
- xen/arch/x86/cpu/amd.c | 27 ++++++++++++++-------------
- 1 file changed, 14 insertions(+), 13 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
-index 29ae97e7c0..3d85e9797d 100644
---- a/xen/arch/x86/cpu/amd.c
-+++ b/xen/arch/x86/cpu/amd.c
-@@ -54,7 +54,7 @@ bool __read_mostly amd_acpi_c1e_quirk;
- bool __ro_after_init amd_legacy_ssbd;
- bool __initdata amd_virt_spec_ctrl;
-
--static bool __read_mostly zen2_c6_disabled;
-+static bool __read_mostly fam17_c6_disabled;
-
- static inline int rdmsr_amd_safe(unsigned int msr, unsigned int *lo,
- unsigned int *hi)
-@@ -951,24 +951,24 @@ void amd_check_zenbleed(void)
- val & chickenbit ? "chickenbit" : "microcode");
- }
-
--static void cf_check zen2_disable_c6(void *arg)
-+static void cf_check fam17_disable_c6(void *arg)
- {
- /* Disable C6 by clearing the CCR{0,1,2}_CC6EN bits. */
- const uint64_t mask = ~((1ul << 6) | (1ul << 14) | (1ul << 22));
- uint64_t val;
-
-- if (!zen2_c6_disabled) {
-+ if (!fam17_c6_disabled) {
- printk(XENLOG_WARNING
- "Disabling C6 after 1000 days apparent uptime due to AMD errata 1474\n");
-- zen2_c6_disabled = true;
-+ fam17_c6_disabled = true;
- /*
- * Prevent CPU hotplug so that started CPUs will either see
-- * zen2_c6_disabled set, or will be handled by
-+ * zen_c6_disabled set, or will be handled by
- * smp_call_function().
- */
- while (!get_cpu_maps())
- process_pending_softirqs();
-- smp_call_function(zen2_disable_c6, NULL, 0);
-+ smp_call_function(fam17_disable_c6, NULL, 0);
- put_cpu_maps();
- }
-
-@@ -1273,8 +1273,8 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
- amd_check_zenbleed();
- amd_check_erratum_1485();
-
-- if (zen2_c6_disabled)
-- zen2_disable_c6(NULL);
-+ if (fam17_c6_disabled)
-+ fam17_disable_c6(NULL);
-
- check_syscfg_dram_mod_en();
-
-@@ -1286,7 +1286,7 @@ const struct cpu_dev amd_cpu_dev = {
- .c_init = init_amd,
- };
-
--static int __init cf_check zen2_c6_errata_check(void)
-+static int __init cf_check amd_check_erratum_1474(void)
- {
- /*
- * Errata #1474: A Core May Hang After About 1044 Days
-@@ -1294,7 +1294,8 @@ static int __init cf_check zen2_c6_errata_check(void)
- */
- s_time_t delta;
-
-- if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17 || !is_zen2_uarch())
-+ if (cpu_has_hypervisor ||
-+ (boot_cpu_data.x86 != 0x17 && boot_cpu_data.x86 != 0x18))
- return 0;
-
- /*
-@@ -1309,10 +1310,10 @@ static int __init cf_check zen2_c6_errata_check(void)
- if (delta > 0) {
- static struct timer errata_c6;
-
-- init_timer(&errata_c6, zen2_disable_c6, NULL, 0);
-+ init_timer(&errata_c6, fam17_disable_c6, NULL, 0);
- set_timer(&errata_c6, NOW() + delta);
- } else
-- zen2_disable_c6(NULL);
-+ fam17_disable_c6(NULL);
-
- return 0;
- }
-@@ -1320,4 +1321,4 @@ static int __init cf_check zen2_c6_errata_check(void)
- * Must be executed after early_time_init() for tsc_ticks2ns() to have been
- * calibrated. That prevents us doing the check in init_amd().
- */
--presmp_initcall(zen2_c6_errata_check);
-+presmp_initcall(amd_check_erratum_1474);
---
-2.44.0
-
diff --git a/0005-CirrusCI-drop-FreeBSD-12.patch b/0005-CirrusCI-drop-FreeBSD-12.patch
deleted file mode 100644
index dac712b..0000000
--- a/0005-CirrusCI-drop-FreeBSD-12.patch
+++ /dev/null
@@ -1,39 +0,0 @@
-From 0ef1fb43ddd61b3c4c953e833e012ac21ad5ca0f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Fri, 2 Feb 2024 08:01:50 +0100
-Subject: [PATCH 05/67] CirrusCI: drop FreeBSD 12
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Went EOL by the end of December 2023, and the pkg repos have been shut down.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: c2ce3466472e9c9eda79f5dc98eb701bc6fdba20
-master date: 2024-01-15 12:20:11 +0100
----
- .cirrus.yml | 6 ------
- 1 file changed, 6 deletions(-)
-
-diff --git a/.cirrus.yml b/.cirrus.yml
-index 7e0beb200d..63f3afb104 100644
---- a/.cirrus.yml
-+++ b/.cirrus.yml
-@@ -14,12 +14,6 @@ freebsd_template: &FREEBSD_TEMPLATE
- - ./configure --with-system-seabios=/usr/local/share/seabios/bios.bin
- - gmake -j`sysctl -n hw.ncpu` clang=y
-
--task:
-- name: 'FreeBSD 12'
-- freebsd_instance:
-- image_family: freebsd-12-4
-- << : *FREEBSD_TEMPLATE
--
- task:
- name: 'FreeBSD 13'
- freebsd_instance:
---
-2.44.0
-
diff --git a/0005-x86-spec-fix-reporting-of-BHB-clearing-usage-from-gu.patch b/0005-x86-spec-fix-reporting-of-BHB-clearing-usage-from-gu.patch
new file mode 100644
index 0000000..bad0428
--- /dev/null
+++ b/0005-x86-spec-fix-reporting-of-BHB-clearing-usage-from-gu.patch
@@ -0,0 +1,69 @@
+From 0b0c7dca70d64c35c86e5d503f67366ebe2b9138 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 29 Apr 2024 09:37:04 +0200
+Subject: [PATCH 05/56] x86/spec: fix reporting of BHB clearing usage from
+ guest entry points
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Reporting whether the BHB clearing on entry is done for the different domain
+types based on cpu_has_bhb_seq is unhelpful, as that variable signals whether
+there's a BHB clearing sequence selected, but that alone doesn't imply that
+such a sequence is used from the PV and/or HVM entry points.
+
+Instead use opt_bhb_entry_{pv,hvm} which do signal whether BHB clearing is
+performed on entry from PV/HVM.
+
+Fixes: 689ad48ce9cf ('x86/spec-ctrl: Wire up the Native-BHI software sequences')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 049ab0b2c9f1f5edb54b505fef0bc575787dafe9
+master date: 2024-04-25 16:35:56 +0200
+---
+ xen/arch/x86/spec_ctrl.c | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index ba4349a024..8c67d6256a 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -634,7 +634,7 @@ static void __init print_details(enum ind_thunk thunk)
+ (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
+ boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
+- cpu_has_bhb_seq || amd_virt_spec_ctrl ||
++ opt_bhb_entry_hvm || amd_virt_spec_ctrl ||
+ opt_eager_fpu || opt_verw_hvm) ? "" : " None",
+ boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ? " MSR_SPEC_CTRL" : "",
+ (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
+@@ -643,7 +643,7 @@ static void __init print_details(enum ind_thunk thunk)
+ opt_eager_fpu ? " EAGER_FPU" : "",
+ opt_verw_hvm ? " VERW" : "",
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ? " IBPB-entry" : "",
+- cpu_has_bhb_seq ? " BHB-entry" : "");
++ opt_bhb_entry_hvm ? " BHB-entry" : "");
+
+ #endif
+ #ifdef CONFIG_PV
+@@ -651,14 +651,14 @@ static void __init print_details(enum ind_thunk thunk)
+ (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
+ boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
+- cpu_has_bhb_seq ||
++ opt_bhb_entry_pv ||
+ opt_eager_fpu || opt_verw_pv) ? "" : " None",
+ boot_cpu_has(X86_FEATURE_SC_MSR_PV) ? " MSR_SPEC_CTRL" : "",
+ boot_cpu_has(X86_FEATURE_SC_RSB_PV) ? " RSB" : "",
+ opt_eager_fpu ? " EAGER_FPU" : "",
+ opt_verw_pv ? " VERW" : "",
+ boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ? " IBPB-entry" : "",
+- cpu_has_bhb_seq ? " BHB-entry" : "");
++ opt_bhb_entry_pv ? " BHB-entry" : "");
+
+ printk(" XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
+ opt_xpti_hwdom ? "enabled" : "disabled",
+--
+2.45.2
+
diff --git a/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch b/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch
deleted file mode 100644
index ce07803..0000000
--- a/0006-x86-intel-ensure-Global-Performance-Counter-Control-.patch
+++ /dev/null
@@ -1,74 +0,0 @@
-From d0ad2cc5eac1b5d3cfd14204d377ce2384f52607 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Fri, 2 Feb 2024 08:02:20 +0100
-Subject: [PATCH 06/67] x86/intel: ensure Global Performance Counter Control is
- setup correctly
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-When Architectural Performance Monitoring is available, the PERF_GLOBAL_CTRL
-MSR contains per-counter enable bits that is ANDed with the enable bit in the
-counter EVNTSEL MSR in order for a PMC counter to be enabled.
-
-So far the watchdog code seems to have relied on the PERF_GLOBAL_CTRL enable
-bits being set by default, but at least on some Intel Sapphire and Emerald
-Rapids this is no longer the case, and Xen reports:
-
-Testing NMI watchdog on all CPUs: 0 40 stuck
-
-The first CPU on each package is started with PERF_GLOBAL_CTRL zeroed, so PMC0
-doesn't start counting when the enable bit in EVNTSEL0 is set, due to the
-relevant enable bit in PERF_GLOBAL_CTRL not being set.
-
-Check and adjust PERF_GLOBAL_CTRL during CPU initialization so that all the
-general-purpose PMCs are enabled. Doing so brings the state of the package-BSP
-PERF_GLOBAL_CTRL in line with the rest of the CPUs on the system.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: 6bdb965178bbb3fc50cd4418d4770a7789956e2c
-master date: 2024-01-17 10:40:52 +0100
----
- xen/arch/x86/cpu/intel.c | 23 ++++++++++++++++++++++-
- 1 file changed, 22 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
-index b40ac696e6..96723b5d44 100644
---- a/xen/arch/x86/cpu/intel.c
-+++ b/xen/arch/x86/cpu/intel.c
-@@ -528,9 +528,30 @@ static void cf_check init_intel(struct cpuinfo_x86 *c)
- init_intel_cacheinfo(c);
- if (c->cpuid_level > 9) {
- unsigned eax = cpuid_eax(10);
-+ unsigned int cnt = (eax >> 8) & 0xff;
-+
- /* Check for version and the number of counters */
-- if ((eax & 0xff) && (((eax>>8) & 0xff) > 1))
-+ if ((eax & 0xff) && (cnt > 1) && (cnt <= 32)) {
-+ uint64_t global_ctrl;
-+ unsigned int cnt_mask = (1UL << cnt) - 1;
-+
-+ /*
-+ * On (some?) Sapphire/Emerald Rapids platforms each
-+ * package-BSP starts with all the enable bits for the
-+ * general-purpose PMCs cleared. Adjust so counters
-+ * can be enabled from EVNTSEL.
-+ */
-+ rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, global_ctrl);
-+ if ((global_ctrl & cnt_mask) != cnt_mask) {
-+ printk("CPU%u: invalid PERF_GLOBAL_CTRL: %#"
-+ PRIx64 " adjusting to %#" PRIx64 "\n",
-+ smp_processor_id(), global_ctrl,
-+ global_ctrl | cnt_mask);
-+ wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL,
-+ global_ctrl | cnt_mask);
-+ }
- __set_bit(X86_FEATURE_ARCH_PERFMON, c->x86_capability);
-+ }
- }
-
- if ( !cpu_has(c, X86_FEATURE_XTOPOLOGY) )
---
-2.44.0
-
diff --git a/0006-x86-spec-adjust-logic-that-elides-lfence.patch b/0006-x86-spec-adjust-logic-that-elides-lfence.patch
new file mode 100644
index 0000000..6da96c4
--- /dev/null
+++ b/0006-x86-spec-adjust-logic-that-elides-lfence.patch
@@ -0,0 +1,75 @@
+From f0ff1d9cb96041a84a24857a6464628240deed4f Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Mon, 29 Apr 2024 09:37:29 +0200
+Subject: [PATCH 06/56] x86/spec: adjust logic that elides lfence
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+It's currently too restrictive by just checking whether there's a BHB clearing
+sequence selected. It should instead check whether BHB clearing is used on
+entry from PV or HVM specifically.
+
+Switch to use opt_bhb_entry_{pv,hvm} instead, and then remove cpu_has_bhb_seq
+since it no longer has any users.
+
+Reported-by: Jan Beulich <jbeulich@suse.com>
+Fixes: 954c983abcee ('x86/spec-ctrl: Software BHB-clearing sequences')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 656ae8f1091bcefec9c46ec3ea3ac2118742d4f6
+master date: 2024-04-25 16:37:01 +0200
+---
+ xen/arch/x86/include/asm/cpufeature.h | 3 ---
+ xen/arch/x86/spec_ctrl.c | 6 +++---
+ 2 files changed, 3 insertions(+), 6 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
+index 7a312c485e..3c57f55de0 100644
+--- a/xen/arch/x86/include/asm/cpufeature.h
++++ b/xen/arch/x86/include/asm/cpufeature.h
+@@ -228,9 +228,6 @@ static inline bool boot_cpu_has(unsigned int feat)
+ #define cpu_bug_fpu_ptrs boot_cpu_has(X86_BUG_FPU_PTRS)
+ #define cpu_bug_null_seg boot_cpu_has(X86_BUG_NULL_SEG)
+
+-#define cpu_has_bhb_seq (boot_cpu_has(X86_SPEC_BHB_TSX) || \
+- boot_cpu_has(X86_SPEC_BHB_LOOPS))
+-
+ enum _cache_type {
+ CACHE_TYPE_NULL = 0,
+ CACHE_TYPE_DATA = 1,
+diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
+index 8c67d6256a..12c19b7eca 100644
+--- a/xen/arch/x86/spec_ctrl.c
++++ b/xen/arch/x86/spec_ctrl.c
+@@ -2328,7 +2328,7 @@ void __init init_speculation_mitigations(void)
+ * unconditional WRMSR. If we do have it, or we're not using any
+ * prior conditional block, then it's safe to drop the LFENCE.
+ */
+- if ( !cpu_has_bhb_seq &&
++ if ( !opt_bhb_entry_pv &&
+ (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
+ !boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)) )
+ setup_force_cpu_cap(X86_SPEC_NO_LFENCE_ENTRY_PV);
+@@ -2344,7 +2344,7 @@ void __init init_speculation_mitigations(void)
+ * active in the block that is skipped when interrupting guest
+ * context, then it's safe to drop the LFENCE.
+ */
+- if ( !cpu_has_bhb_seq &&
++ if ( !opt_bhb_entry_pv &&
+ (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
+ (!boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) &&
+ !boot_cpu_has(X86_FEATURE_SC_RSB_PV))) )
+@@ -2356,7 +2356,7 @@ void __init init_speculation_mitigations(void)
+ * A BHB sequence, if used, is the only conditional action, so if we
+ * don't have it, we don't need the safety LFENCE.
+ */
+- if ( !cpu_has_bhb_seq )
++ if ( !opt_bhb_entry_hvm )
+ setup_force_cpu_cap(X86_SPEC_NO_LFENCE_ENTRY_VMX);
+ }
+
+--
+2.45.2
+
diff --git a/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch b/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch
deleted file mode 100644
index 2100acc..0000000
--- a/0007-x86-vmx-Fix-IRQ-handling-for-EXIT_REASON_INIT.patch
+++ /dev/null
@@ -1,65 +0,0 @@
-From eca5416f9b0e179de9553900de8de660ab09199d Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 2 Feb 2024 08:02:51 +0100
-Subject: [PATCH 07/67] x86/vmx: Fix IRQ handling for EXIT_REASON_INIT
-
-When receiving an INIT, a prior bugfix tried to ignore the INIT and continue
-onwards.
-
-Unfortunately it's not safe to return at that point in vmx_vmexit_handler().
-Just out of context in the first hunk is a local_irqs_enabled() which is
-depended-upon by the return-to-guest path, causing the following checklock
-failure in debug builds:
-
- (XEN) Error: INIT received - ignoring
- (XEN) CHECKLOCK FAILURE: prev irqsafe: 0, curr irqsafe 1
- (XEN) Xen BUG at common/spinlock.c:132
- (XEN) ----[ Xen-4.19-unstable x86_64 debug=y Tainted: H ]----
- ...
- (XEN) Xen call trace:
- (XEN) [<ffff82d040238e10>] R check_lock+0xcd/0xe1
- (XEN) [<ffff82d040238fe3>] F _spin_lock+0x1b/0x60
- (XEN) [<ffff82d0402ed6a8>] F pt_update_irq+0x32/0x3bb
- (XEN) [<ffff82d0402b9632>] F vmx_intr_assist+0x3b/0x51d
- (XEN) [<ffff82d040206447>] F vmx_asm_vmexit_handler+0xf7/0x210
-
-Luckily, this is benign in release builds. Accidentally having IRQs disabled
-when trying to take an IRQs-on lock isn't a deadlock-vulnerable pattern.
-
-Drop the problematic early return. In hindsight, it's wrong to skip other
-normal VMExit steps.
-
-Fixes: b1f11273d5a7 ("x86/vmx: Don't spuriously crash the domain when INIT is received")
-Reported-by: Reima ISHII <ishiir@g.ecc.u-tokyo.ac.jp>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: d1f8883aebe00f6a9632d77ab0cd5c6d02c9cbe4
-master date: 2024-01-18 20:59:06 +0000
----
- xen/arch/x86/hvm/vmx/vmx.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index 072288a5ef..31f4a861c6 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -4037,7 +4037,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
-
- case EXIT_REASON_INIT:
- printk(XENLOG_ERR "Error: INIT received - ignoring\n");
-- return; /* Renter the guest without further processing */
-+ break;
- }
-
- /* Now enable interrupts so it's safe to take locks. */
-@@ -4323,6 +4323,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
- break;
- }
- case EXIT_REASON_EXTERNAL_INTERRUPT:
-+ case EXIT_REASON_INIT:
- /* Already handled above. */
- break;
- case EXIT_REASON_TRIPLE_FAULT:
---
-2.44.0
-
diff --git a/0007-xen-xsm-Wire-up-get_dom0_console.patch b/0007-xen-xsm-Wire-up-get_dom0_console.patch
new file mode 100644
index 0000000..540541c
--- /dev/null
+++ b/0007-xen-xsm-Wire-up-get_dom0_console.patch
@@ -0,0 +1,66 @@
+From 026542c8577ab6af7c1dbc7446547bdc2bc705fd Mon Sep 17 00:00:00 2001
+From: Jason Andryuk <jason.andryuk@amd.com>
+Date: Tue, 21 May 2024 10:19:43 +0200
+Subject: [PATCH 07/56] xen/xsm: Wire up get_dom0_console
+
+An XSM hook for get_dom0_console is currently missing. Using XSM with
+a PVH dom0 shows:
+(XEN) FLASK: Denying unknown platform_op: 64.
+
+Wire up the hook, and allow it for dom0.
+
+Fixes: 4dd160583c ("x86/platform: introduce hypercall to get initial video console settings")
+Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
+Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
+master commit: 647f7e50ebeeb8152974cad6a12affe474c74513
+master date: 2024-04-30 08:33:41 +0200
+---
+ tools/flask/policy/modules/dom0.te | 2 +-
+ xen/xsm/flask/hooks.c | 4 ++++
+ xen/xsm/flask/policy/access_vectors | 2 ++
+ 3 files changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
+index f1dcff48e2..16b8c9646d 100644
+--- a/tools/flask/policy/modules/dom0.te
++++ b/tools/flask/policy/modules/dom0.te
+@@ -16,7 +16,7 @@ allow dom0_t xen_t:xen {
+ allow dom0_t xen_t:xen2 {
+ resource_op psr_cmt_op psr_alloc pmu_ctrl get_symbol
+ get_cpu_levelling_caps get_cpu_featureset livepatch_op
+- coverage_op
++ coverage_op get_dom0_console
+ };
+
+ # Allow dom0 to use all XENVER_ subops that have checks.
+diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
+index 78225f68c1..5e88c71b8e 100644
+--- a/xen/xsm/flask/hooks.c
++++ b/xen/xsm/flask/hooks.c
+@@ -1558,6 +1558,10 @@ static int cf_check flask_platform_op(uint32_t op)
+ return avc_has_perm(domain_sid(current->domain), SECINITSID_XEN,
+ SECCLASS_XEN2, XEN2__GET_SYMBOL, NULL);
+
++ case XENPF_get_dom0_console:
++ return avc_has_perm(domain_sid(current->domain), SECINITSID_XEN,
++ SECCLASS_XEN2, XEN2__GET_DOM0_CONSOLE, NULL);
++
+ default:
+ return avc_unknown_permission("platform_op", op);
+ }
+diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
+index 4e6710a63e..a35e3d4c51 100644
+--- a/xen/xsm/flask/policy/access_vectors
++++ b/xen/xsm/flask/policy/access_vectors
+@@ -99,6 +99,8 @@ class xen2
+ livepatch_op
+ # XEN_SYSCTL_coverage_op
+ coverage_op
++# XENPF_get_dom0_console
++ get_dom0_console
+ }
+
+ # Classes domain and domain2 consist of operations that a domain performs on
+--
+2.45.2
+
diff --git a/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch b/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch
deleted file mode 100644
index 3af45e8..0000000
--- a/0008-x86-vmx-Disallow-the-use-of-inactivity-states.patch
+++ /dev/null
@@ -1,126 +0,0 @@
-From 7bd612727df792671e44152a8205f0cf821ad984 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 2 Feb 2024 08:03:26 +0100
-Subject: [PATCH 08/67] x86/vmx: Disallow the use of inactivity states
-
-Right now, vvmx will blindly copy L12's ACTIVITY_STATE into the L02 VMCS and
-enter the vCPU. Luckily for us, nested-virt is explicitly unsupported for
-security bugs.
-
-The inactivity states are HLT, SHUTDOWN and WAIT-FOR-SIPI, and as noted by the
-SDM in Vol3 27.7 "Special Features of VM Entry":
-
- If VM entry ends with the logical processor in an inactive activity state,
- the VM entry generates any special bus cycle that is normally generated when
- that activity state is entered from the active state.
-
-Also,
-
- Some activity states unconditionally block certain events.
-
-I.e. A VMEntry with ACTIVITY=SHUTDOWN will initiate a platform reset, while a
-VMEntry with ACTIVITY=WAIT-FOR-SIPI will really block everything other than
-SIPIs.
-
-Both of these activity states are for the TXT ACM to use, not for regular
-hypervisors, and Xen doesn't support dropping the HLT intercept either.
-
-There are two paths in Xen which operate on ACTIVITY_STATE.
-
-1) The vmx_{get,set}_nonreg_state() helpers for VM-Fork.
-
- As regular VMs can't use any inactivity states, this is just duplicating
- the 0 from construct_vmcs(). Retain the ability to query activity_state,
- but crash the domain on any attempt to set an inactivity state.
-
-2) Nested virt, because of ACTIVITY_STATE in vmcs_gstate_field[].
-
- Explicitly hide the inactivity states in the guest's view of MSR_VMX_MISC,
- and remove ACTIVITY_STATE from vmcs_gstate_field[].
-
- In virtual_vmentry(), we should trigger a VMEntry failure for the use of
- any inactivity states, but there's no support for that in the code at all
- so leave a TODO for when we finally start working on nested-virt in
- earnest.
-
-Reported-by: Reima Ishii <ishiir@g.ecc.u-tokyo.ac.jp>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
-master commit: 3643bb53a05b7c8fbac072c63bef1538f2a6d0d2
-master date: 2024-01-18 20:59:06 +0000
----
- xen/arch/x86/hvm/vmx/vmx.c | 8 +++++++-
- xen/arch/x86/hvm/vmx/vvmx.c | 9 +++++++--
- xen/arch/x86/include/asm/hvm/vmx/vmcs.h | 1 +
- 3 files changed, 15 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
-index 31f4a861c6..35d391d8e5 100644
---- a/xen/arch/x86/hvm/vmx/vmx.c
-+++ b/xen/arch/x86/hvm/vmx/vmx.c
-@@ -1499,7 +1499,13 @@ static void cf_check vmx_set_nonreg_state(struct vcpu *v,
- {
- vmx_vmcs_enter(v);
-
-- __vmwrite(GUEST_ACTIVITY_STATE, nrs->vmx.activity_state);
-+ if ( nrs->vmx.activity_state )
-+ {
-+ printk("Attempt to set %pv activity_state %#lx\n",
-+ v, nrs->vmx.activity_state);
-+ domain_crash(v->domain);
-+ }
-+
- __vmwrite(GUEST_INTERRUPTIBILITY_INFO, nrs->vmx.interruptibility_info);
- __vmwrite(GUEST_PENDING_DBG_EXCEPTIONS, nrs->vmx.pending_dbg);
-
-diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
-index f8fe8d0c14..515cb5ae77 100644
---- a/xen/arch/x86/hvm/vmx/vvmx.c
-+++ b/xen/arch/x86/hvm/vmx/vvmx.c
-@@ -910,7 +910,10 @@ static const u16 vmcs_gstate_field[] = {
- GUEST_LDTR_AR_BYTES,
- GUEST_TR_AR_BYTES,
- GUEST_INTERRUPTIBILITY_INFO,
-+ /*
-+ * ACTIVITY_STATE is handled specially.
- GUEST_ACTIVITY_STATE,
-+ */
- GUEST_SYSENTER_CS,
- GUEST_PREEMPTION_TIMER,
- /* natural */
-@@ -1211,6 +1214,8 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
- nvcpu->nv_vmentry_pending = 0;
- nvcpu->nv_vmswitch_in_progress = 1;
-
-+ /* TODO: Fail VMentry for GUEST_ACTIVITY_STATE != 0 */
-+
- /*
- * EFER handling:
- * hvm_set_efer won't work if CR0.PG = 1, so we change the value
-@@ -2327,8 +2332,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
- data = hvm_cr4_guest_valid_bits(d);
- break;
- case MSR_IA32_VMX_MISC:
-- /* Do not support CR3-target feature now */
-- data = host_data & ~VMX_MISC_CR3_TARGET;
-+ /* Do not support CR3-targets or activity states. */
-+ data = host_data & ~(VMX_MISC_CR3_TARGET | VMX_MISC_ACTIVITY_MASK);
- break;
- case MSR_IA32_VMX_EPT_VPID_CAP:
- data = nept_get_ept_vpid_cap();
-diff --git a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-index 78404e42b3..0af021d5f5 100644
---- a/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-+++ b/xen/arch/x86/include/asm/hvm/vmx/vmcs.h
-@@ -288,6 +288,7 @@ extern u32 vmx_secondary_exec_control;
- #define VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL 0x80000000000ULL
- extern u64 vmx_ept_vpid_cap;
-
-+#define VMX_MISC_ACTIVITY_MASK 0x000001c0
- #define VMX_MISC_PROC_TRACE 0x00004000
- #define VMX_MISC_CR3_TARGET 0x01ff0000
- #define VMX_MISC_VMWRITE_ALL 0x20000000
---
-2.44.0
-
diff --git a/0008-xen-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch b/0008-xen-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch
new file mode 100644
index 0000000..7c04f23
--- /dev/null
+++ b/0008-xen-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch
@@ -0,0 +1,41 @@
+From 47cf06c09a2fa1ee92ea3e7718c8f8e0f1450d88 Mon Sep 17 00:00:00 2001
+From: Jason Andryuk <jason.andryuk@amd.com>
+Date: Tue, 21 May 2024 10:20:06 +0200
+Subject: [PATCH 08/56] xen/x86: Fix Syntax warning in gen-cpuid.py
+
+Python 3.12.2 warns:
+
+xen/tools/gen-cpuid.py:50: SyntaxWarning: invalid escape sequence '\s'
+ "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
+xen/tools/gen-cpuid.py:51: SyntaxWarning: invalid escape sequence '\s'
+ "\s+/\*([\w!]*) .*$")
+
+Specify the strings as raw strings so '\s' is read as literal '\' + 's'.
+This avoids escaping all the '\'s in the strings.
+
+Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 08e79bba73d74a85d3ce6ff0f91c5205f1e05eda
+master date: 2024-04-30 08:34:37 +0200
+---
+ xen/tools/gen-cpuid.py | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
+index 02dd45a5ed..415d644db5 100755
+--- a/xen/tools/gen-cpuid.py
++++ b/xen/tools/gen-cpuid.py
+@@ -47,8 +47,8 @@ def parse_definitions(state):
+ """
+ feat_regex = re.compile(
+ r"^XEN_CPUFEATURE\(([A-Z0-9_]+),"
+- "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
+- "\s+/\*([\w!]*) .*$")
++ r"\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
++ r"\s+/\*([\w!]*) .*$")
+
+ word_regex = re.compile(
+ r"^/\* .* word (\d*) \*/$")
+--
+2.45.2
+
diff --git a/0009-VT-d-correct-ATS-checking-for-root-complex-integrate.patch b/0009-VT-d-correct-ATS-checking-for-root-complex-integrate.patch
new file mode 100644
index 0000000..2d2dc91
--- /dev/null
+++ b/0009-VT-d-correct-ATS-checking-for-root-complex-integrate.patch
@@ -0,0 +1,63 @@
+From a4c5bbb9db07b27e66f7c47676b1c888e1bece20 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 May 2024 10:20:58 +0200
+Subject: [PATCH 09/56] VT-d: correct ATS checking for root complex integrated
+ devices
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Spec version 4.1 says
+
+"The ATSR structures identifies PCI Express Root-Ports supporting
+ Address Translation Services (ATS) transactions. Software must enable
+ ATS on endpoint devices behind a Root Port only if the Root Port is
+ reported as supporting ATS transactions."
+
+Clearly root complex integrated devices aren't "behind root ports",
+matching my observation on a SapphireRapids system having an ATS-
+capable root complex integrated device. Hence for such devices we
+shouldn't try to locate a corresponding ATSR.
+
+Since both pci_find_ext_capability() and pci_find_cap_offset() return
+"unsigned int", change "pos" to that type at the same time.
+
+Fixes: 903b93211f56 ("[VTD] laying the ground work for ATS")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 04e31583bab97e5042a44a1d00fce2760272635f
+master date: 2024-05-06 09:22:45 +0200
+---
+ xen/drivers/passthrough/vtd/x86/ats.c | 9 +++++++--
+ 1 file changed, 7 insertions(+), 2 deletions(-)
+
+diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
+index 1f5913bed9..61052ef580 100644
+--- a/xen/drivers/passthrough/vtd/x86/ats.c
++++ b/xen/drivers/passthrough/vtd/x86/ats.c
+@@ -44,7 +44,7 @@ struct acpi_drhd_unit *find_ats_dev_drhd(struct vtd_iommu *iommu)
+ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
+ {
+ struct acpi_drhd_unit *ats_drhd;
+- int pos;
++ unsigned int pos, expfl = 0;
+
+ if ( !ats_enabled || !iommu_qinval )
+ return 0;
+@@ -53,7 +53,12 @@ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
+ !ecap_dev_iotlb(drhd->iommu->ecap) )
+ return 0;
+
+- if ( !acpi_find_matched_atsr_unit(pdev) )
++ pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_EXP);
++ if ( pos )
++ expfl = pci_conf_read16(pdev->sbdf, pos + PCI_EXP_FLAGS);
++
++ if ( MASK_EXTR(expfl, PCI_EXP_FLAGS_TYPE) != PCI_EXP_TYPE_RC_END &&
++ !acpi_find_matched_atsr_unit(pdev) )
+ return 0;
+
+ ats_drhd = find_ats_dev_drhd(drhd->iommu);
+--
+2.45.2
+
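The new check above keys off the device/port type field in the PCI Express Capabilities register. A standalone sketch of that extraction using the standard register layout; the helper names are illustrative, not Xen's:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define PCI_EXP_FLAGS_TYPE  0x00f0  /* device/port type, bits 7:4 */
    #define PCI_EXP_TYPE_RC_END 0x9     /* root complex integrated endpoint */

    /* Same idea as Xen's MASK_EXTR(): shift the masked field down to bit 0. */
    static unsigned int mask_extr(uint16_t val, uint16_t mask)
    {
        return (val & mask) / (mask & -mask);
    }

    static bool is_rc_integrated(uint16_t expfl)
    {
        return mask_extr(expfl, PCI_EXP_FLAGS_TYPE) == PCI_EXP_TYPE_RC_END;
    }

    int main(void)
    {
        /* Express Capabilities word with type field 0x9 -> prints 1. */
        printf("%d\n", is_rc_integrated(0x0092));
        return 0;
    }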
diff --git a/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch b/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch
deleted file mode 100644
index f33d27d..0000000
--- a/0009-lib-fdt-elf-move-lib-fdt-elf-temp.o-and-their-deps-t.patch
+++ /dev/null
@@ -1,70 +0,0 @@
-From afb85cf1e8f165abf88de9d8a6df625692a753b1 Mon Sep 17 00:00:00 2001
-From: Michal Orzel <michal.orzel@amd.com>
-Date: Fri, 2 Feb 2024 08:04:07 +0100
-Subject: [PATCH 09/67] lib{fdt,elf}: move lib{fdt,elf}-temp.o and their deps
- to $(targets)
-
-At the moment, trying to run xencov read/reset (calling SYSCTL_coverage_op
-under the hood) results in a crash. This is due to a profiler trying to
-access data in the .init.* sections (libfdt for Arm and libelf for x86)
-that are stripped after boot. Normally, the build system compiles any
-*.init.o file without COV_FLAGS. However, these two libraries are
-handled differently as sections will be renamed to init after linking.
-
-To override COV_FLAGS to empty for these libraries, lib{fdt,elf}.o were
-added to nocov-y. This worked until e321576f4047 ("xen/build: start using
-if_changed") that added lib{fdt,elf}-temp.o and their deps to extra-y.
-This way, even though these objects appear as prerequisites of
-lib{fdt,elf}.o and the settings should propagate to them, make can also
-build them as a prerequisite of __build, in which case COV_FLAGS would
-still have the unwanted flags. Fix it by switching to $(targets) instead.
-
-Also, for libfdt, append libfdt.o to nocov-y only if CONFIG_OVERLAY_DTB
-is not set. Otherwise, there is no section renaming and we should be able
-to run the coverage.
-
-Fixes: e321576f4047 ("xen/build: start using if_changed")
-Signed-off-by: Michal Orzel <michal.orzel@amd.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: 79519fcfa0605bbf19d8c02b979af3a2c8afed68
-master date: 2024-01-23 12:02:44 +0100
----
- xen/common/libelf/Makefile | 2 +-
- xen/common/libfdt/Makefile | 4 ++--
- 2 files changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/xen/common/libelf/Makefile b/xen/common/libelf/Makefile
-index 8a4522e4e1..917d12b006 100644
---- a/xen/common/libelf/Makefile
-+++ b/xen/common/libelf/Makefile
-@@ -13,4 +13,4 @@ $(obj)/libelf.o: $(obj)/libelf-temp.o FORCE
- $(obj)/libelf-temp.o: $(addprefix $(obj)/,$(libelf-objs)) FORCE
- $(call if_changed,ld)
-
--extra-y += libelf-temp.o $(libelf-objs)
-+targets += libelf-temp.o $(libelf-objs)
-diff --git a/xen/common/libfdt/Makefile b/xen/common/libfdt/Makefile
-index 75aaefa2e3..4d14fd61ba 100644
---- a/xen/common/libfdt/Makefile
-+++ b/xen/common/libfdt/Makefile
-@@ -2,9 +2,9 @@ include $(src)/Makefile.libfdt
-
- SECTIONS := text data $(SPECIAL_DATA_SECTIONS)
- OBJCOPYFLAGS := $(foreach s,$(SECTIONS),--rename-section .$(s)=.init.$(s))
-+nocov-y += libfdt.o
-
- obj-y += libfdt.o
--nocov-y += libfdt.o
-
- CFLAGS-y += -I$(srctree)/include/xen/libfdt/
-
-@@ -14,4 +14,4 @@ $(obj)/libfdt.o: $(obj)/libfdt-temp.o FORCE
- $(obj)/libfdt-temp.o: $(addprefix $(obj)/,$(LIBFDT_OBJS)) FORCE
- $(call if_changed,ld)
-
--extra-y += libfdt-temp.o $(LIBFDT_OBJS)
-+targets += libfdt-temp.o $(LIBFDT_OBJS)
---
-2.44.0
-
diff --git a/0010-tools-libxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch b/0010-tools-libxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch
new file mode 100644
index 0000000..9f9cdd7
--- /dev/null
+++ b/0010-tools-libxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch
@@ -0,0 +1,47 @@
+From 2bc52041cacb33a301ebf939d69a021597941186 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 21 May 2024 10:21:47 +0200
+Subject: [PATCH 10/56] tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC
+
+The header description for xs_open() goes as far as to suggest that the fd is
+O_CLOEXEC, but it isn't actually.
+
+`xl devd` has been observed leaking /dev/xen/xenbus into children.
+
+Link: https://github.com/QubesOS/qubes-issues/issues/8292
+Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: f4f2f3402b2f4985d69ffc0d46f845d05fd0b60f
+master date: 2024-05-07 15:18:36 +0100
+---
+ tools/libs/store/xs.c | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
+index 140b9a2839..1498515073 100644
+--- a/tools/libs/store/xs.c
++++ b/tools/libs/store/xs.c
+@@ -54,6 +54,10 @@ struct xs_stored_msg {
+ #include <dlfcn.h>
+ #endif
+
++#ifndef O_CLOEXEC
++#define O_CLOEXEC 0
++#endif
++
+ struct xs_handle {
+ /* Communications channel to xenstore daemon. */
+ int fd;
+@@ -227,7 +231,7 @@ error:
+ static int get_dev(const char *connect_to)
+ {
+ /* We cannot open read-only because requests are writes */
+- return open(connect_to, O_RDWR);
++ return open(connect_to, O_RDWR | O_CLOEXEC);
+ }
+
+ static int all_restrict_cb(Xentoolcore__Active_Handle *ah, domid_t domid) {
+--
+2.45.2
+
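The pattern above — open the device read/write and close-on-exec, with a fallback define for headers that lack O_CLOEXEC — is easy to reproduce standalone; the path below is just an example:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    #ifndef O_CLOEXEC
    #define O_CLOEXEC 0   /* fallback: the fd simply won't be close-on-exec */
    #endif

    int main(void)
    {
        int fd = open("/dev/null", O_RDWR | O_CLOEXEC);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* The descriptor is not inherited across exec*() in child processes. */
        close(fd);
        return 0;
    }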
diff --git a/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch b/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch
deleted file mode 100644
index 9b3b9a0..0000000
--- a/0010-x86-p2m-pt-fix-off-by-one-in-entry-check-assert.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From 091466ba55d1e2e75738f751818ace2e3ed08ccf Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Fri, 2 Feb 2024 08:04:33 +0100
-Subject: [PATCH 10/67] x86/p2m-pt: fix off by one in entry check assert
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The MMIO RO rangeset overlap check is bogus: the rangeset is inclusive so the
-passed end mfn should be the last mfn to be mapped (not last + 1).
-
-Fixes: 6fa1755644d0 ('amd/npt/shadow: replace assert that prevents creating 2M/1G MMIO entries')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: George Dunlap <george.dunlap@cloud.com>
-master commit: 610775d0dd61c1bd2f4720c755986098e6a5bafd
-master date: 2024-01-25 16:09:04 +0100
----
- xen/arch/x86/mm/p2m-pt.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/mm/p2m-pt.c b/xen/arch/x86/mm/p2m-pt.c
-index eaba2b0fb4..f02ebae372 100644
---- a/xen/arch/x86/mm/p2m-pt.c
-+++ b/xen/arch/x86/mm/p2m-pt.c
-@@ -564,7 +564,7 @@ static void check_entry(mfn_t mfn, p2m_type_t new, p2m_type_t old,
- if ( new == p2m_mmio_direct )
- ASSERT(!mfn_eq(mfn, INVALID_MFN) &&
- !rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
-- mfn_x(mfn) + (1ul << order)));
-+ mfn_x(mfn) + (1UL << order) - 1));
- else if ( p2m_allows_invalid_mfn(new) || new == p2m_invalid ||
- new == p2m_mmio_dm )
- ASSERT(mfn_valid(mfn) || mfn_eq(mfn, INVALID_MFN));
---
-2.44.0
-
diff --git a/0011-tools-xentop-fix-sorting-bug-for-some-columns.patch b/0011-tools-xentop-fix-sorting-bug-for-some-columns.patch
deleted file mode 100644
index 6bf11d9..0000000
--- a/0011-tools-xentop-fix-sorting-bug-for-some-columns.patch
+++ /dev/null
@@ -1,67 +0,0 @@
-From 61da71968ea44964fd1dd2e449b053c77eb83139 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Cyril=20R=C3=A9bert=20=28zithro=29?= <slack@rabbit.lu>
-Date: Tue, 27 Feb 2024 14:06:53 +0100
-Subject: [PATCH 11/67] tools/xentop: fix sorting bug for some columns
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Sort doesn't work on columns VBD_OO, VBD_RD, VBD_WR and VBD_RSECT.
-Fix by adjusting variables names in compare functions.
-Bug fix only. No functional change.
-
-Fixes: 91c3e3dc91d6 ("tools/xentop: Display '-' when stats are not available.")
-Signed-off-by: Cyril Rébert (zithro) <slack@rabbit.lu>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 29f17d837421f13c0e0010802de1b2d51d2ded4a
-master date: 2024-02-05 17:58:23 +0000
----
- tools/xentop/xentop.c | 10 +++++-----
- 1 file changed, 5 insertions(+), 5 deletions(-)
-
-diff --git a/tools/xentop/xentop.c b/tools/xentop/xentop.c
-index 950e8935c4..545bd5e96d 100644
---- a/tools/xentop/xentop.c
-+++ b/tools/xentop/xentop.c
-@@ -684,7 +684,7 @@ static int compare_vbd_oo(xenstat_domain *domain1, xenstat_domain *domain2)
- unsigned long long dom1_vbd_oo = 0, dom2_vbd_oo = 0;
-
- tot_vbd_reqs(domain1, FIELD_VBD_OO, &dom1_vbd_oo);
-- tot_vbd_reqs(domain1, FIELD_VBD_OO, &dom2_vbd_oo);
-+ tot_vbd_reqs(domain2, FIELD_VBD_OO, &dom2_vbd_oo);
-
- return -compare(dom1_vbd_oo, dom2_vbd_oo);
- }
-@@ -711,9 +711,9 @@ static int compare_vbd_rd(xenstat_domain *domain1, xenstat_domain *domain2)
- unsigned long long dom1_vbd_rd = 0, dom2_vbd_rd = 0;
-
- tot_vbd_reqs(domain1, FIELD_VBD_RD, &dom1_vbd_rd);
-- tot_vbd_reqs(domain1, FIELD_VBD_RD, &dom2_vbd_rd);
-+ tot_vbd_reqs(domain2, FIELD_VBD_RD, &dom2_vbd_rd);
-
-- return -compare(dom1_vbd_rd, dom1_vbd_rd);
-+ return -compare(dom1_vbd_rd, dom2_vbd_rd);
- }
-
- /* Prints number of total VBD READ requests statistic */
-@@ -738,7 +738,7 @@ static int compare_vbd_wr(xenstat_domain *domain1, xenstat_domain *domain2)
- unsigned long long dom1_vbd_wr = 0, dom2_vbd_wr = 0;
-
- tot_vbd_reqs(domain1, FIELD_VBD_WR, &dom1_vbd_wr);
-- tot_vbd_reqs(domain1, FIELD_VBD_WR, &dom2_vbd_wr);
-+ tot_vbd_reqs(domain2, FIELD_VBD_WR, &dom2_vbd_wr);
-
- return -compare(dom1_vbd_wr, dom2_vbd_wr);
- }
-@@ -765,7 +765,7 @@ static int compare_vbd_rsect(xenstat_domain *domain1, xenstat_domain *domain2)
- unsigned long long dom1_vbd_rsect = 0, dom2_vbd_rsect = 0;
-
- tot_vbd_reqs(domain1, FIELD_VBD_RSECT, &dom1_vbd_rsect);
-- tot_vbd_reqs(domain1, FIELD_VBD_RSECT, &dom2_vbd_rsect);
-+ tot_vbd_reqs(domain2, FIELD_VBD_RSECT, &dom2_vbd_rsect);
-
- return -compare(dom1_vbd_rsect, dom2_vbd_rsect);
- }
---
-2.44.0
-
diff --git a/0011-x86-cpu-policy-Fix-migration-from-Ice-Lake-to-Cascad.patch b/0011-x86-cpu-policy-Fix-migration-from-Ice-Lake-to-Cascad.patch
new file mode 100644
index 0000000..26eb3ec
--- /dev/null
+++ b/0011-x86-cpu-policy-Fix-migration-from-Ice-Lake-to-Cascad.patch
@@ -0,0 +1,92 @@
+From 0673eae8e53de5007dba35149527579819428323 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 21 May 2024 10:22:08 +0200
+Subject: [PATCH 11/56] x86/cpu-policy: Fix migration from Ice Lake to Cascade
+ Lake
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Ever since Xen 4.14, there has been a latent bug with migration.
+
+While some toolstacks can level the features properly, they don't shrink
+feat.max_subleaf when all features have been dropped. This is because
+we *still* have not completed the toolstack side work for full CPU Policy
+objects.
+
+As a consequence, even when properly feature levelled, VMs can't migrate
+"backwards" across hardware which reduces feat.max_subleaf. One such example
+is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).
+
+Extend the max policies feat.max_subleaf to the highest number Xen knows
+about, but leave the default policies matching the host. This will allow VMs
+with a higher feat.max_subleaf than strictly necessary to migrate in.
+
+Eventually we'll manage to teach the toolstack how to avoid creating such VMs
+in the first place, but there's still more work to do there.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: a2330b51df267e20e66bbba6c5bf08f0570ed58b
+master date: 2024-05-07 16:56:46 +0100
+---
+ xen/arch/x86/cpu-policy.c | 22 ++++++++++++++++++++++
+ 1 file changed, 22 insertions(+)
+
+diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
+index a822800f52..1aba6ed4ca 100644
+--- a/xen/arch/x86/cpu-policy.c
++++ b/xen/arch/x86/cpu-policy.c
+@@ -603,6 +603,13 @@ static void __init calculate_pv_max_policy(void)
+ unsigned int i;
+
+ *p = host_cpu_policy;
++
++ /*
++ * Some VMs may have a larger-than-necessary feat max_subleaf. Allow them
++ * to migrate in.
++ */
++ p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
++
+ x86_cpu_policy_to_featureset(p, fs);
+
+ for ( i = 0; i < ARRAY_SIZE(fs); ++i )
+@@ -643,6 +650,10 @@ static void __init calculate_pv_def_policy(void)
+ unsigned int i;
+
+ *p = pv_max_cpu_policy;
++
++ /* Default to the same max_subleaf as the host. */
++ p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf;
++
+ x86_cpu_policy_to_featureset(p, fs);
+
+ for ( i = 0; i < ARRAY_SIZE(fs); ++i )
+@@ -679,6 +690,13 @@ static void __init calculate_hvm_max_policy(void)
+ const uint32_t *mask;
+
+ *p = host_cpu_policy;
++
++ /*
++ * Some VMs may have a larger-than-necessary feat max_subleaf. Allow them
++ * to migrate in.
++ */
++ p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
++
+ x86_cpu_policy_to_featureset(p, fs);
+
+ mask = hvm_hap_supported() ?
+@@ -780,6 +798,10 @@ static void __init calculate_hvm_def_policy(void)
+ const uint32_t *mask;
+
+ *p = hvm_max_cpu_policy;
++
++ /* Default to the same max_subleaf as the host. */
++ p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf;
++
+ x86_cpu_policy_to_featureset(p, fs);
+
+ mask = hvm_hap_supported() ?
+--
+2.45.2
+
diff --git a/0012-amd-vi-fix-IVMD-memory-type-checks.patch b/0012-amd-vi-fix-IVMD-memory-type-checks.patch
deleted file mode 100644
index f38e39e..0000000
--- a/0012-amd-vi-fix-IVMD-memory-type-checks.patch
+++ /dev/null
@@ -1,53 +0,0 @@
-From 463aaf3fbf62d24e898ae0c2ba53d85ca0f94d3f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 27 Feb 2024 14:07:12 +0100
-Subject: [PATCH 12/67] amd-vi: fix IVMD memory type checks
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The current code that parses the IVMD blocks is relaxed with regard to the
-restriction that such unity regions should always fall into memory ranges
-marked as reserved in the memory map.
-
-However the type checks for the IVMD addresses are inverted, and as a result
-IVMD ranges falling into RAM areas are accepted. Note that having such ranges
-in the first place is a firmware bug, as IVMD should always fall into reserved
-ranges.
-
-Fixes: ed6c77ebf0c1 ('AMD/IOMMU: check / convert IVMD ranges for being / to be reserved')
-Reported-by: Ox <oxjo@proton.me>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Tested-by: oxjo <oxjo@proton.me>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 83afa313583019d9f159c122cecf867735d27ec5
-master date: 2024-02-06 11:56:13 +0100
----
- xen/drivers/passthrough/amd/iommu_acpi.c | 11 ++++++++---
- 1 file changed, 8 insertions(+), 3 deletions(-)
-
-diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
-index 3b577c9b39..3a7045c39b 100644
---- a/xen/drivers/passthrough/amd/iommu_acpi.c
-+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
-@@ -426,9 +426,14 @@ static int __init parse_ivmd_block(const struct acpi_ivrs_memory *ivmd_block)
- return -EIO;
- }
-
-- /* Types which won't be handed out are considered good enough. */
-- if ( !(type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
-- RAM_TYPE_UNUSABLE)) )
-+ /*
-+ * Types which aren't RAM are considered good enough.
-+ * Note that a page being partially RESERVED, ACPI or UNUSABLE will
-+ * force Xen into assuming the whole page as having that type in
-+ * practice.
-+ */
-+ if ( type & (RAM_TYPE_RESERVED | RAM_TYPE_ACPI |
-+ RAM_TYPE_UNUSABLE) )
- continue;
-
- AMD_IOMMU_ERROR("IVMD: page at %lx can't be converted\n", addr);
---
-2.44.0
-
diff --git a/0012-x86-ucode-Distinguish-ucode-already-up-to-date.patch b/0012-x86-ucode-Distinguish-ucode-already-up-to-date.patch
new file mode 100644
index 0000000..dd2f91a
--- /dev/null
+++ b/0012-x86-ucode-Distinguish-ucode-already-up-to-date.patch
@@ -0,0 +1,58 @@
+From a42c83b202cc034c43c723cf363dbbabac61b1af Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 21 May 2024 10:22:52 +0200
+Subject: [PATCH 12/56] x86/ucode: Distinguish "ucode already up to date"
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Right now, Xen returns -ENOENT for both "the provided blob isn't correct for
+this CPU", and "the blob isn't newer than what's loaded".
+
+This in turn causes xen-ucode to exit with an error, when "nothing to do" is
+more commonly a success condition.
+
+Handle EEXIST specially and exit cleanly.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 648db37a155aca6f66d4cf3bb118417a728c3579
+master date: 2024-05-09 18:19:49 +0100
+---
+ tools/misc/xen-ucode.c | 5 ++++-
+ xen/arch/x86/cpu/microcode/core.c | 2 +-
+ 2 files changed, 5 insertions(+), 2 deletions(-)
+
+diff --git a/tools/misc/xen-ucode.c b/tools/misc/xen-ucode.c
+index c6ae6498d6..390969db3d 100644
+--- a/tools/misc/xen-ucode.c
++++ b/tools/misc/xen-ucode.c
+@@ -125,8 +125,11 @@ int main(int argc, char *argv[])
+ exit(1);
+ }
+
++ errno = 0;
+ ret = xc_microcode_update(xch, buf, len);
+- if ( ret )
++ if ( ret == -1 && errno == EEXIST )
++ printf("Microcode already up to date\n");
++ else if ( ret )
+ {
+ fprintf(stderr, "Failed to update microcode. (err: %s)\n",
+ strerror(errno));
+diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
+index 4e011cdc41..d5338ad345 100644
+--- a/xen/arch/x86/cpu/microcode/core.c
++++ b/xen/arch/x86/cpu/microcode/core.c
+@@ -640,7 +640,7 @@ static long cf_check microcode_update_helper(void *data)
+ "microcode: couldn't find any newer%s revision in the provided blob!\n",
+ opt_ucode_allow_same ? " (or the same)" : "");
+ microcode_free_patch(patch);
+- ret = -ENOENT;
++ ret = -EEXIST;
+
+ goto put;
+ }
+--
+2.45.2
+
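The calling convention relied on above is the usual libxc one: the call returns -1 with errno set, and EEXIST is now treated as "nothing to do" rather than a failure. A small sketch of that caller-side handling, where do_update() is a stand-in rather than a real API:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    static int do_update(void)   /* pretend hypercall wrapper */
    {
        errno = EEXIST;          /* hypervisor says: already up to date */
        return -1;
    }

    int main(void)
    {
        int ret;

        errno = 0;
        ret = do_update();

        if ( ret == -1 && errno == EEXIST )
            printf("Microcode already up to date\n");
        else if ( ret )
            fprintf(stderr, "update failed: %s\n", strerror(errno));

        return 0;
    }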
diff --git a/0013-libxl-fix-population-of-the-online-vCPU-bitmap-for-P.patch b/0013-libxl-fix-population-of-the-online-vCPU-bitmap-for-P.patch
new file mode 100644
index 0000000..e5fb285
--- /dev/null
+++ b/0013-libxl-fix-population-of-the-online-vCPU-bitmap-for-P.patch
@@ -0,0 +1,61 @@
+From 9966e5413133157a630f7462518005fb898e582a Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 21 May 2024 10:23:27 +0200
+Subject: [PATCH 13/56] libxl: fix population of the online vCPU bitmap for PVH
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+libxl passes some information to libacpi to create the ACPI table for a PVH
+guest, and among that information there is a bitmap of which vCPUs are online,
+which can be fewer than the maximum number of vCPUs assigned to the domain.
+
+While the population of the bitmap is done correctly for HVM based on the
+number of online vCPUs, for PVH the population of the bitmap is done based on
+the maximum number of allowed vCPUs. This leads to all local APIC entries in
+the MADT being set as enabled, which contradicts the data in xenstore whenever
+the number of online vCPUs differs from the maximum.
+
+Fix by copying the internal libxl bitmap that's populated based on the vCPUs
+parameter.
+
+Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
+Link: https://gitlab.com/libvirt/libvirt/-/issues/399
+Reported-by: Leigh Brown <leigh@solinno.co.uk>
+Fixes: 14c0d328da2b ('libxl/acpi: Build ACPI tables for HVMlite guests')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Tested-by: Leigh Brown <leigh@solinno.co.uk>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 5cc7347b04b2d0a3133754c7a9b936f614ec656a
+master date: 2024-05-11 00:13:43 +0100
+---
+ tools/libs/light/libxl_x86_acpi.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/tools/libs/light/libxl_x86_acpi.c b/tools/libs/light/libxl_x86_acpi.c
+index 620f3c700c..5cf261bd67 100644
+--- a/tools/libs/light/libxl_x86_acpi.c
++++ b/tools/libs/light/libxl_x86_acpi.c
+@@ -89,7 +89,7 @@ static int init_acpi_config(libxl__gc *gc,
+ uint32_t domid = dom->guest_domid;
+ xc_domaininfo_t info;
+ struct hvm_info_table *hvminfo;
+- int i, r, rc;
++ int r, rc;
+
+ config->dsdt_anycpu = config->dsdt_15cpu = dsdt_pvh;
+ config->dsdt_anycpu_len = config->dsdt_15cpu_len = dsdt_pvh_len;
+@@ -138,8 +138,8 @@ static int init_acpi_config(libxl__gc *gc,
+ hvminfo->nr_vcpus = info.max_vcpu_id + 1;
+ }
+
+- for (i = 0; i < hvminfo->nr_vcpus; i++)
+- hvminfo->vcpu_online[i / 8] |= 1 << (i & 7);
++ memcpy(hvminfo->vcpu_online, b_info->avail_vcpus.map,
++ b_info->avail_vcpus.size);
+
+ config->hvminfo = hvminfo;
+
+--
+2.45.2
+
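For context on the 0013 libxl change, a small standalone sketch (not libxl code) contrasting the two ways of filling a byte-array vCPU bitmap: marking the first nr_vcpus bits reports every local APIC as enabled, while copying an availability map preserves the holes. The map values below are made up:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_VCPUS 16

    int main(void)
    {
        uint8_t online_old[MAX_VCPUS / 8] = { 0 };
        uint8_t online_new[MAX_VCPUS / 8] = { 0 };
        /* Hypothetical availability map: only vCPUs 0, 1 and 5 are online. */
        const uint8_t avail[MAX_VCPUS / 8] = { 0x23, 0x00 };
        unsigned int max_vcpus = 8, i;

        /* Previous behaviour: mark every vCPU up to the maximum as online. */
        for ( i = 0; i < max_vcpus; i++ )
            online_old[i / 8] |= 1 << (i & 7);

        /* Fixed behaviour: copy the availability bitmap as-is. */
        memcpy(online_new, avail, sizeof(avail));

        printf("old: %#x %#x\n", online_old[0], online_old[1]);
        printf("new: %#x %#x\n", online_new[0], online_new[1]);
        return 0;
    }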
diff --git a/0013-x86-hvm-Fix-fast-singlestep-state-persistence.patch b/0013-x86-hvm-Fix-fast-singlestep-state-persistence.patch
deleted file mode 100644
index 2a14354..0000000
--- a/0013-x86-hvm-Fix-fast-singlestep-state-persistence.patch
+++ /dev/null
@@ -1,86 +0,0 @@
-From 415f770d23f9fcbc02436560fa6583dcd8e1343f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Petr=20Bene=C5=A1?= <w1benny@gmail.com>
-Date: Tue, 27 Feb 2024 14:07:45 +0100
-Subject: [PATCH 13/67] x86/hvm: Fix fast singlestep state persistence
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-This patch addresses an issue where the fast singlestep setting would persist
-despite xc_domain_debug_control being called with XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF.
-Specifically, if fast singlestep was enabled in a VMI session and that session
-stopped before the MTF trap occurred, the fast singlestep setting remained
-active even though MTF itself was disabled. This led to a situation where, upon
-starting a new VMI session, the first event to trigger an EPT violation would
-cause the corresponding EPT event callback to be skipped due to the lingering
-fast singlestep setting.
-
-The fix ensures that the fast singlestep setting is properly reset when
-disabling single step debugging operations.
-
-Signed-off-by: Petr Beneš <w1benny@gmail.com>
-Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
-master commit: 897def94b56175ce569673a05909d2f223e1e749
-master date: 2024-02-12 09:37:58 +0100
----
- xen/arch/x86/hvm/hvm.c | 34 ++++++++++++++++++++++++----------
- 1 file changed, 24 insertions(+), 10 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
-index d6c6ab8897..558dc3eddc 100644
---- a/xen/arch/x86/hvm/hvm.c
-+++ b/xen/arch/x86/hvm/hvm.c
-@@ -5153,26 +5153,40 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
-
- int hvm_debug_op(struct vcpu *v, int32_t op)
- {
-- int rc;
-+ int rc = 0;
-
- switch ( op )
- {
- case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON:
- case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF:
-- rc = -EOPNOTSUPP;
- if ( !cpu_has_monitor_trap_flag )
-- break;
-- rc = 0;
-- vcpu_pause(v);
-- v->arch.hvm.single_step =
-- (op == XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON);
-- vcpu_unpause(v); /* guest will latch new state */
-+ return -EOPNOTSUPP;
- break;
- default:
-- rc = -ENOSYS;
-- break;
-+ return -ENOSYS;
-+ }
-+
-+ vcpu_pause(v);
-+
-+ switch ( op )
-+ {
-+ case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON:
-+ v->arch.hvm.single_step = true;
-+ break;
-+
-+ case XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF:
-+ v->arch.hvm.single_step = false;
-+ v->arch.hvm.fast_single_step.enabled = false;
-+ v->arch.hvm.fast_single_step.p2midx = 0;
-+ break;
-+
-+ default: /* Excluded above */
-+ ASSERT_UNREACHABLE();
-+ return -ENOSYS;
- }
-
-+ vcpu_unpause(v); /* guest will latch new state */
-+
- return rc;
- }
-
---
-2.44.0
-
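The superseded 0013 singlestep patch above shows a pattern worth noting: when a debug feature is switched off, every piece of state it may have left behind should be cleared, not just the main flag. A minimal sketch with made-up field names, not the hypervisor's actual structures:

    #include <stdbool.h>
    #include <stdio.h>

    struct vcpu_debug {
        bool single_step;
        struct {
            bool enabled;
            unsigned int p2midx;
        } fast_single_step;
    };

    static void single_step_off(struct vcpu_debug *v)
    {
        v->single_step = false;
        /* Also reset the fast-singlestep state, so a later session does not
         * inherit it and silently skip its first monitor callback. */
        v->fast_single_step.enabled = false;
        v->fast_single_step.p2midx = 0;
    }

    int main(void)
    {
        struct vcpu_debug v = {
            .single_step = true,
            .fast_single_step = { .enabled = true, .p2midx = 3 },
        };

        single_step_off(&v);
        printf("ss=%d fast=%d idx=%u\n",
               v.single_step, v.fast_single_step.enabled,
               v.fast_single_step.p2midx);
        return 0;
    }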
diff --git a/0014-libxl-Fix-handling-XenStore-errors-in-device-creatio.patch b/0014-libxl-Fix-handling-XenStore-errors-in-device-creatio.patch
new file mode 100644
index 0000000..ac28521
--- /dev/null
+++ b/0014-libxl-Fix-handling-XenStore-errors-in-device-creatio.patch
@@ -0,0 +1,191 @@
+From 8271f0e8f23b63199caf0edcfe85ebc1c1412d1b Mon Sep 17 00:00:00 2001
+From: Demi Marie Obenour <demi@invisiblethingslab.com>
+Date: Tue, 21 May 2024 10:23:52 +0200
+Subject: [PATCH 14/56] libxl: Fix handling XenStore errors in device creation
+
+If xenstored runs out of memory it is possible for it to fail operations
+that should succeed. libxl wasn't robust against this, and could fail
+to ensure that the TTY path of a non-initial console was created and
+read-only for guests. This doesn't qualify for an XSA because guests
+should not be able to run xenstored out of memory, but it still needs to
+be fixed.
+
+Add the missing error checks to ensure that all errors are properly
+handled and that at no point can a guest make the TTY path of its
+frontend directory writable.
+
+Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: 531d3bea5e9357357eaf6d40f5784a1b4c29b910
+master date: 2024-05-11 00:13:43 +0100
+---
+ tools/libs/light/libxl_console.c | 11 ++---
+ tools/libs/light/libxl_device.c | 72 ++++++++++++++++++++------------
+ tools/libs/light/libxl_xshelp.c | 13 ++++--
+ 3 files changed, 60 insertions(+), 36 deletions(-)
+
+diff --git a/tools/libs/light/libxl_console.c b/tools/libs/light/libxl_console.c
+index cd7412a327..a563c9d3c7 100644
+--- a/tools/libs/light/libxl_console.c
++++ b/tools/libs/light/libxl_console.c
+@@ -351,11 +351,10 @@ int libxl__device_console_add(libxl__gc *gc, uint32_t domid,
+ flexarray_append(front, "protocol");
+ flexarray_append(front, LIBXL_XENCONSOLE_PROTOCOL);
+ }
+- libxl__device_generic_add(gc, XBT_NULL, device,
+- libxl__xs_kvs_of_flexarray(gc, back),
+- libxl__xs_kvs_of_flexarray(gc, front),
+- libxl__xs_kvs_of_flexarray(gc, ro_front));
+- rc = 0;
++ rc = libxl__device_generic_add(gc, XBT_NULL, device,
++ libxl__xs_kvs_of_flexarray(gc, back),
++ libxl__xs_kvs_of_flexarray(gc, front),
++ libxl__xs_kvs_of_flexarray(gc, ro_front));
+ out:
+ return rc;
+ }
+@@ -665,6 +664,8 @@ int libxl_device_channel_getinfo(libxl_ctx *ctx, uint32_t domid,
+ */
+ if (!val) val = "/NO-SUCH-PATH";
+ channelinfo->u.pty.path = strdup(val);
++ if (channelinfo->u.pty.path == NULL)
++ abort();
+ break;
+ default:
+ break;
+diff --git a/tools/libs/light/libxl_device.c b/tools/libs/light/libxl_device.c
+index 13da6e0573..3035501f2c 100644
+--- a/tools/libs/light/libxl_device.c
++++ b/tools/libs/light/libxl_device.c
+@@ -177,8 +177,13 @@ int libxl__device_generic_add(libxl__gc *gc, xs_transaction_t t,
+ ro_frontend_perms[1].perms = backend_perms[1].perms = XS_PERM_READ;
+
+ retry_transaction:
+- if (create_transaction)
++ if (create_transaction) {
+ t = xs_transaction_start(ctx->xsh);
++ if (t == XBT_NULL) {
++ LOGED(ERROR, device->domid, "xs_transaction_start failed");
++ return ERROR_FAIL;
++ }
++ }
+
+ /* FIXME: read frontend_path and check state before removing stuff */
+
+@@ -195,42 +200,55 @@ retry_transaction:
+ if (rc) goto out;
+ }
+
+- /* xxx much of this function lacks error checks! */
+-
+ if (fents || ro_fents) {
+- xs_rm(ctx->xsh, t, frontend_path);
+- xs_mkdir(ctx->xsh, t, frontend_path);
++ if (!xs_rm(ctx->xsh, t, frontend_path) && errno != ENOENT)
++ goto out;
++ if (!xs_mkdir(ctx->xsh, t, frontend_path))
++ goto out;
+ /* Console 0 is a special case. It doesn't use the regular PV
+ * state machine but also the frontend directory has
+ * historically contained other information, such as the
+ * vnc-port, which we don't want the guest fiddling with.
+ */
+ if ((device->kind == LIBXL__DEVICE_KIND_CONSOLE && device->devid == 0) ||
+- (device->kind == LIBXL__DEVICE_KIND_VUART))
+- xs_set_permissions(ctx->xsh, t, frontend_path,
+- ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms));
+- else
+- xs_set_permissions(ctx->xsh, t, frontend_path,
+- frontend_perms, ARRAY_SIZE(frontend_perms));
+- xs_write(ctx->xsh, t, GCSPRINTF("%s/backend", frontend_path),
+- backend_path, strlen(backend_path));
+- if (fents)
+- libxl__xs_writev_perms(gc, t, frontend_path, fents,
+- frontend_perms, ARRAY_SIZE(frontend_perms));
+- if (ro_fents)
+- libxl__xs_writev_perms(gc, t, frontend_path, ro_fents,
+- ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms));
++ (device->kind == LIBXL__DEVICE_KIND_VUART)) {
++ if (!xs_set_permissions(ctx->xsh, t, frontend_path,
++ ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms)))
++ goto out;
++ } else {
++ if (!xs_set_permissions(ctx->xsh, t, frontend_path,
++ frontend_perms, ARRAY_SIZE(frontend_perms)))
++ goto out;
++ }
++ if (!xs_write(ctx->xsh, t, GCSPRINTF("%s/backend", frontend_path),
++ backend_path, strlen(backend_path)))
++ goto out;
++ if (fents) {
++ rc = libxl__xs_writev_perms(gc, t, frontend_path, fents,
++ frontend_perms, ARRAY_SIZE(frontend_perms));
++ if (rc) goto out;
++ }
++ if (ro_fents) {
++ rc = libxl__xs_writev_perms(gc, t, frontend_path, ro_fents,
++ ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms));
++ if (rc) goto out;
++ }
+ }
+
+ if (bents) {
+ if (!libxl_only) {
+- xs_rm(ctx->xsh, t, backend_path);
+- xs_mkdir(ctx->xsh, t, backend_path);
+- xs_set_permissions(ctx->xsh, t, backend_path, backend_perms,
+- ARRAY_SIZE(backend_perms));
+- xs_write(ctx->xsh, t, GCSPRINTF("%s/frontend", backend_path),
+- frontend_path, strlen(frontend_path));
+- libxl__xs_writev(gc, t, backend_path, bents);
++ if (!xs_rm(ctx->xsh, t, backend_path) && errno != ENOENT)
++ goto out;
++ if (!xs_mkdir(ctx->xsh, t, backend_path))
++ goto out;
++ if (!xs_set_permissions(ctx->xsh, t, backend_path, backend_perms,
++ ARRAY_SIZE(backend_perms)))
++ goto out;
++ if (!xs_write(ctx->xsh, t, GCSPRINTF("%s/frontend", backend_path),
++ frontend_path, strlen(frontend_path)))
++ goto out;
++ rc = libxl__xs_writev(gc, t, backend_path, bents);
++ if (rc) goto out;
+ }
+
+ /*
+@@ -276,7 +294,7 @@ retry_transaction:
+ out:
+ if (create_transaction && t)
+ libxl__xs_transaction_abort(gc, &t);
+- return rc;
++ return rc != 0 ? rc : ERROR_FAIL;
+ }
+
+ typedef struct {
+diff --git a/tools/libs/light/libxl_xshelp.c b/tools/libs/light/libxl_xshelp.c
+index 751cd942d9..a6e34ab10f 100644
+--- a/tools/libs/light/libxl_xshelp.c
++++ b/tools/libs/light/libxl_xshelp.c
+@@ -60,10 +60,15 @@ int libxl__xs_writev_perms(libxl__gc *gc, xs_transaction_t t,
+ for (i = 0; kvs[i] != NULL; i += 2) {
+ path = GCSPRINTF("%s/%s", dir, kvs[i]);
+ if (path && kvs[i + 1]) {
+- int length = strlen(kvs[i + 1]);
+- xs_write(ctx->xsh, t, path, kvs[i + 1], length);
+- if (perms)
+- xs_set_permissions(ctx->xsh, t, path, perms, num_perms);
++ size_t length = strlen(kvs[i + 1]);
++ if (length > UINT_MAX)
++ return ERROR_FAIL;
++ if (!xs_write(ctx->xsh, t, path, kvs[i + 1], length))
++ return ERROR_FAIL;
++ if (perms) {
++ if (!xs_set_permissions(ctx->xsh, t, path, perms, num_perms))
++ return ERROR_FAIL;
++ }
+ }
+ }
+ return 0;
+--
+2.45.2
+
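The 0014 libxl patch boils down to never ignoring the boolean results of the store operations inside a transaction. A generic sketch of that shape, using hypothetical store_mkdir()/store_write() helpers rather than the real xs_* API:

    #include <stdbool.h>
    #include <stdio.h>

    #define ERROR_FAIL (-3)

    /* Hypothetical stand-ins for xs_mkdir()/xs_write(): false means failure. */
    static bool store_mkdir(const char *path) { (void)path; return true; }
    static bool store_write(const char *path, const char *val)
    { (void)path; (void)val; return true; }

    static int create_frontend(const char *fe_path, const char *backend_path)
    {
        int rc = 0;

        if ( !store_mkdir(fe_path) )       /* previously errors were ignored */
            goto out;
        if ( !store_write(fe_path, backend_path) )
            goto out;

        return 0;

     out:
        /* No specific error code yet: report a generic failure, mirroring
         * the new end of libxl__device_generic_add(). */
        return rc != 0 ? rc : ERROR_FAIL;
    }

    int main(void)
    {
        return create_frontend("/local/domain/1/device/console/1",
                               "/local/domain/0/backend/console/1/1") ? 1 : 0;
    }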
diff --git a/0014-x86-HVM-tidy-state-on-hvmemul_map_linear_addr-s-erro.patch b/0014-x86-HVM-tidy-state-on-hvmemul_map_linear_addr-s-erro.patch
deleted file mode 100644
index 6536674..0000000
--- a/0014-x86-HVM-tidy-state-on-hvmemul_map_linear_addr-s-erro.patch
+++ /dev/null
@@ -1,63 +0,0 @@
-From b3ae0e6201495216b12157bd8b2382b28fdd7dae Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 27 Feb 2024 14:08:20 +0100
-Subject: [PATCH 14/67] x86/HVM: tidy state on hvmemul_map_linear_addr()'s
- error path
-
-While in the vast majority of cases failure of the function will not
-be followed by re-invocation with the same emulation context, a few
-very specific insns - involving multiple independent writes, e.g. ENTER
-and PUSHA - exist where this can happen. Since failure of the function
-only signals to the caller that it ought to try an MMIO write instead,
-such failure also cannot be assumed to result in wholesale failure of
-emulation of the current insn. Instead we have to maintain internal
-state such that another invocation of the function with the same
-emulation context remains possible. To achieve that we need to reset MFN
-slots after putting page references on the error path.
-
-Note that all of this affects debugging code only, in causing an
-assertion to trigger (higher up in the function). There's otherwise no
-misbehavior - such a "leftover" slot would simply be overwritten by new
-contents in a release build.
-
-Also extend the related unmap() assertion, to further check for MFN 0.
-
-Fixes: 8cbd4fb0b7ea ("x86/hvm: implement hvmemul_write() using real mappings")
-Reported-by: Manuel Andreas <manuel.andreas@tum.de>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Paul Durrant <paul@xen.org>
-master commit: e72f951df407bc3be82faac64d8733a270036ba1
-master date: 2024-02-13 09:36:14 +0100
----
- xen/arch/x86/hvm/emulate.c | 7 ++++++-
- 1 file changed, 6 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
-index 275451dd36..27928dc3f3 100644
---- a/xen/arch/x86/hvm/emulate.c
-+++ b/xen/arch/x86/hvm/emulate.c
-@@ -697,7 +697,12 @@ static void *hvmemul_map_linear_addr(
- out:
- /* Drop all held references. */
- while ( mfn-- > hvmemul_ctxt->mfn )
-+ {
- put_page(mfn_to_page(*mfn));
-+#ifndef NDEBUG /* Clean slot for a subsequent map()'s error checking. */
-+ *mfn = _mfn(0);
-+#endif
-+ }
-
- return err;
- }
-@@ -719,7 +724,7 @@ static void hvmemul_unmap_linear_addr(
-
- for ( i = 0; i < nr_frames; i++ )
- {
-- ASSERT(mfn_valid(*mfn));
-+ ASSERT(mfn_x(*mfn) && mfn_valid(*mfn));
- paging_mark_dirty(currd, *mfn);
- put_page(mfn_to_page(*mfn));
-
---
-2.44.0
-
diff --git a/0015-build-Replace-which-with-command-v.patch b/0015-build-Replace-which-with-command-v.patch
deleted file mode 100644
index 57f21d4..0000000
--- a/0015-build-Replace-which-with-command-v.patch
+++ /dev/null
@@ -1,57 +0,0 @@
-From 1330a5fe44ca91f98857b53fe8bbe06522d9db27 Mon Sep 17 00:00:00 2001
-From: Anthony PERARD <anthony.perard@citrix.com>
-Date: Tue, 27 Feb 2024 14:08:50 +0100
-Subject: [PATCH 15/67] build: Replace `which` with `command -v`
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The `which` command is not standard, may not exist on the build host,
-or may not behave as expected by the build system. It is recommended
-to use `command -v` to find out whether a command exists and get its path,
-and it's part of a POSIX shell standard (at least, it seems to be
-mandatory since IEEE Std 1003.1-2008, but was optional before).
-
-Fixes: c8a8645f1efe ("xen/build: Automatically locate a suitable python interpreter")
-Fixes: 3b47bcdb6d38 ("xen/build: Use a distro version of figlet")
-Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
-Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: f93629b18b528a5ab1b1092949c5420069c7226c
-master date: 2024-02-19 12:45:48 +0100
----
- xen/Makefile | 4 ++--
- xen/build.mk | 2 +-
- 2 files changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index dd0b004e1c..7ea13a6791 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -25,8 +25,8 @@ export XEN_BUILD_HOST := $(shell hostname)
- endif
-
- # Best effort attempt to find a python interpreter, defaulting to Python 3 if
--# available. Fall back to just `python` if `which` is nowhere to be found.
--PYTHON_INTERPRETER := $(word 1,$(shell which python3 python python2 2>/dev/null) python)
-+# available. Fall back to just `python`.
-+PYTHON_INTERPRETER := $(word 1,$(shell command -v python3 || command -v python || command -v python2) python)
- export PYTHON ?= $(PYTHON_INTERPRETER)
-
- export CHECKPOLICY ?= checkpolicy
-diff --git a/xen/build.mk b/xen/build.mk
-index 9ecb104f1e..b489f77b7c 100644
---- a/xen/build.mk
-+++ b/xen/build.mk
-@@ -1,6 +1,6 @@
- quiet_cmd_banner = BANNER $@
- define cmd_banner
-- if which figlet >/dev/null 2>&1 ; then \
-+ if command -v figlet >/dev/null 2>&1 ; then \
- echo " Xen $(XEN_FULLVERSION)" | figlet -f $< > $@.tmp; \
- else \
- echo " Xen $(XEN_FULLVERSION)" > $@.tmp; \
---
-2.44.0
-
diff --git a/0015-xen-sched-set-all-sched_resource-data-inside-locked-.patch b/0015-xen-sched-set-all-sched_resource-data-inside-locked-.patch
new file mode 100644
index 0000000..a8090d4
--- /dev/null
+++ b/0015-xen-sched-set-all-sched_resource-data-inside-locked-.patch
@@ -0,0 +1,84 @@
+From 3999b675cad5b717274d6493899b0eea8896f4d7 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Tue, 21 May 2024 10:24:26 +0200
+Subject: [PATCH 15/56] xen/sched: set all sched_resource data inside locked
+ region for new cpu
+
+When adding a cpu to a scheduler, set all data items of struct
+sched_resource inside the locked region, as otherwise a race might
+happen (e.g. when trying to access the cpupool of the cpu):
+
+ (XEN) ----[ Xen-4.19.0-1-d x86_64 debug=y Tainted: H ]----
+ (XEN) CPU: 45
+ (XEN) RIP: e008:[<ffff82d040244cbf>] common/sched/credit.c#csched_load_balance+0x41/0x877
+ (XEN) RFLAGS: 0000000000010092 CONTEXT: hypervisor
+ (XEN) rax: ffff82d040981618 rbx: ffff82d040981618 rcx: 0000000000000000
+ (XEN) rdx: 0000003ff68cd000 rsi: 000000000000002d rdi: ffff83103723d450
+ (XEN) rbp: ffff83207caa7d48 rsp: ffff83207caa7b98 r8: 0000000000000000
+ (XEN) r9: ffff831037253cf0 r10: ffff83103767c3f0 r11: 0000000000000009
+ (XEN) r12: ffff831037237990 r13: ffff831037237990 r14: ffff831037253720
+ (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 0000000000f526e0
+ (XEN) cr3: 000000005bc2f000 cr2: 0000000000000010
+ (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
+ (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
+ (XEN) Xen code around <ffff82d040244cbf> (common/sched/credit.c#csched_load_balance+0x41/0x877):
+ (XEN) 48 8b 0c 10 48 8b 49 08 <48> 8b 79 10 48 89 bd b8 fe ff ff 49 8b 4e 28 48
+ <snip>
+ (XEN) Xen call trace:
+ (XEN) [<ffff82d040244cbf>] R common/sched/credit.c#csched_load_balance+0x41/0x877
+ (XEN) [<ffff82d040245a18>] F common/sched/credit.c#csched_schedule+0x36a/0x69f
+ (XEN) [<ffff82d040252644>] F common/sched/core.c#do_schedule+0xe8/0x433
+ (XEN) [<ffff82d0402572dd>] F common/sched/core.c#schedule+0x2e5/0x2f9
+ (XEN) [<ffff82d040232f35>] F common/softirq.c#__do_softirq+0x94/0xbe
+ (XEN) [<ffff82d040232fc8>] F do_softirq+0x13/0x15
+ (XEN) [<ffff82d0403075ef>] F arch/x86/domain.c#idle_loop+0x92/0xe6
+ (XEN)
+ (XEN) Pagetable walk from 0000000000000010:
+ (XEN) L4[0x000] = 000000103ff61063 ffffffffffffffff
+ (XEN) L3[0x000] = 000000103ff60063 ffffffffffffffff
+ (XEN) L2[0x000] = 0000001033dff063 ffffffffffffffff
+ (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
+ (XEN)
+ (XEN) ****************************************
+ (XEN) Panic on CPU 45:
+ (XEN) FATAL PAGE FAULT
+ (XEN) [error_code=0000]
+ (XEN) Faulting linear address: 0000000000000010
+ (XEN) ****************************************
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Fixes: a8c6c623192e ("sched: clarify use cases of schedule_cpu_switch()")
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: d104a07524ffc92ae7a70dfe192c291de2a563cc
+master date: 2024-05-15 19:59:52 +0100
+---
+ xen/common/sched/core.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
+index 34ad39b9ad..3c2403ebcf 100644
+--- a/xen/common/sched/core.c
++++ b/xen/common/sched/core.c
+@@ -3179,6 +3179,8 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
+
+ sr->scheduler = new_ops;
+ sr->sched_priv = ppriv;
++ sr->granularity = cpupool_get_granularity(c);
++ sr->cpupool = c;
+
+ /*
+ * Reroute the lock to the per pCPU lock as /last/ thing. In fact,
+@@ -3191,8 +3193,6 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
+ /* _Not_ pcpu_schedule_unlock(): schedule_lock has changed! */
+ spin_unlock_irqrestore(old_lock, flags);
+
+- sr->granularity = cpupool_get_granularity(c);
+- sr->cpupool = c;
+ /* The cpu is added to a pool, trigger it to go pick up some work */
+ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+
+--
+2.45.2
+
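A sketch of the ordering rule the 0015 scheduler patch enforces: finish initialising every field of a shared structure before publishing it (here, before dropping the lock), so another CPU can never observe a half-initialised object. The names are illustrative, not Xen's:

    #include <pthread.h>
    #include <stddef.h>

    struct resource {
        pthread_mutex_t lock;
        void *scheduler;
        void *priv;
        unsigned int granularity;
        void *pool;
    };

    /* All fields are written while the lock is held; only then is the
     * structure considered live for other threads. */
    static void resource_add(struct resource *r, void *ops, void *priv,
                             void *pool, unsigned int gran)
    {
        pthread_mutex_lock(&r->lock);

        r->scheduler = ops;
        r->priv = priv;
        r->granularity = gran;   /* previously set after unlock: racy */
        r->pool = pool;          /* likewise */

        pthread_mutex_unlock(&r->lock);
    }

    int main(void)
    {
        struct resource r = { .lock = PTHREAD_MUTEX_INITIALIZER };

        resource_add(&r, NULL, NULL, NULL, 1);
        return 0;
    }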
diff --git a/0016-libxl-Disable-relocating-memory-for-qemu-xen-in-stub.patch b/0016-libxl-Disable-relocating-memory-for-qemu-xen-in-stub.patch
deleted file mode 100644
index f75e07c..0000000
--- a/0016-libxl-Disable-relocating-memory-for-qemu-xen-in-stub.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From b9745280736ee526374873aa3c4142596e2ba10b Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Marek=20Marczykowski-G=C3=B3recki?=
- <marmarek@invisiblethingslab.com>
-Date: Tue, 27 Feb 2024 14:09:19 +0100
-Subject: [PATCH 16/67] libxl: Disable relocating memory for qemu-xen in
- stubdomain too
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-According to comments (and experiments) qemu-xen cannot handle memory
-relocation done by hvmloader. The code was already disabled when running
-qemu-xen in dom0 (see libxl__spawn_local_dm()), but it was missed when
-adding qemu-xen support to stubdomain. Adjust libxl__spawn_stub_dm() to
-be consistent in this regard.
-
-Reported-by: Neowutran <xen@neowutran.ovh>
-Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
-Acked-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: 97883aa269f6745a6ded232be3a855abb1297e0d
-master date: 2024-02-22 11:48:22 +0100
----
- tools/libs/light/libxl_dm.c | 10 ++++++++++
- 1 file changed, 10 insertions(+)
-
-diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
-index 14b593110f..ed620a9d8e 100644
---- a/tools/libs/light/libxl_dm.c
-+++ b/tools/libs/light/libxl_dm.c
-@@ -2432,6 +2432,16 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss)
- "%s",
- libxl_bios_type_to_string(guest_config->b_info.u.hvm.bios));
- }
-+ /* Disable relocating memory to make the MMIO hole larger
-+ * unless we're running qemu-traditional and vNUMA is not
-+ * configured. */
-+ libxl__xs_printf(gc, XBT_NULL,
-+ libxl__sprintf(gc, "%s/hvmloader/allow-memory-relocate",
-+ libxl__xs_get_dompath(gc, guest_domid)),
-+ "%d",
-+ guest_config->b_info.device_model_version
-+ == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
-+ !libxl__vnuma_configured(&guest_config->b_info));
- ret = xc_domain_set_target(ctx->xch, dm_domid, guest_domid);
- if (ret<0) {
- LOGED(ERROR, guest_domid, "setting target domain %d -> %d",
---
-2.44.0
-
diff --git a/0016-x86-respect-mapcache_domain_init-failing.patch b/0016-x86-respect-mapcache_domain_init-failing.patch
new file mode 100644
index 0000000..db7ddfe
--- /dev/null
+++ b/0016-x86-respect-mapcache_domain_init-failing.patch
@@ -0,0 +1,38 @@
+From dfabab2cd9461ef9d21a708461f35d2ae4b55220 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 21 May 2024 10:25:08 +0200
+Subject: [PATCH 16/56] x86: respect mapcache_domain_init() failing
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The function itself properly handles and hands onwards failure from
+create_perdomain_mapping(). Therefore its caller should respect possible
+failure, too.
+
+Fixes: 4b28bf6ae90b ("x86: re-introduce map_domain_page() et al")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 7270fdc7a0028d4b7b26fd1b36c6b9e97abcf3da
+master date: 2024-05-15 19:59:52 +0100
+---
+ xen/arch/x86/domain.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index 307446273a..5feb0d0679 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -850,7 +850,8 @@ int arch_domain_create(struct domain *d,
+ }
+ else if ( is_pv_domain(d) )
+ {
+- mapcache_domain_init(d);
++ if ( (rc = mapcache_domain_init(d)) != 0 )
++ goto fail;
+
+ if ( (rc = pv_domain_initialise(d)) != 0 )
+ goto fail;
+--
+2.45.2
+
diff --git a/0017-build-make-sure-build-fails-when-running-kconfig-fai.patch b/0017-build-make-sure-build-fails-when-running-kconfig-fai.patch
deleted file mode 100644
index 1bb3aa8..0000000
--- a/0017-build-make-sure-build-fails-when-running-kconfig-fai.patch
+++ /dev/null
@@ -1,58 +0,0 @@
-From ea869977271f93945451908be9b6117ffd1fb02d Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 27 Feb 2024 14:09:37 +0100
-Subject: [PATCH 17/67] build: make sure build fails when running kconfig fails
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Because of using "-include", failure to (re)build auto.conf (with
-auto.conf.cmd produced as a secondary target) won't stop make from
-continuing the build. Arrange for it being possible to drop the - from
-Rules.mk, requiring that the include be skipped for tools-only targets.
-Note that relying on the inclusion in those cases wouldn't be correct
-anyway, as it might be a stale file (yet to be rebuilt) which would be
-included, while during initial build, the file would be absent
-altogether.
-
-Fixes: 8d4c17a90b0a ("xen/build: silence make warnings about missing auto.conf*")
-Reported-by: Roger Pau Monné <roger.pau@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: d34e5fa2e8db19f23081f46a3e710bb122130691
-master date: 2024-02-22 11:52:47 +0100
----
- xen/Makefile | 1 +
- xen/Rules.mk | 4 +++-
- 2 files changed, 4 insertions(+), 1 deletion(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 7ea13a6791..bac3684a36 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -374,6 +374,7 @@ $(KCONFIG_CONFIG): tools_fixdep
- # This exploits the 'multi-target pattern rule' trick.
- # The syncconfig should be executed only once to make all the targets.
- include/config/%.conf include/config/%.conf.cmd: $(KCONFIG_CONFIG)
-+ $(Q)rm -f include/config/auto.conf
- $(Q)$(MAKE) $(build)=tools/kconfig syncconfig
-
- ifeq ($(CONFIG_DEBUG),y)
-diff --git a/xen/Rules.mk b/xen/Rules.mk
-index 8af3dd7277..d759cccee3 100644
---- a/xen/Rules.mk
-+++ b/xen/Rules.mk
-@@ -15,7 +15,9 @@ srcdir := $(srctree)/$(src)
- PHONY := __build
- __build:
-
---include $(objtree)/include/config/auto.conf
-+ifneq ($(firstword $(subst /, ,$(obj))),tools)
-+include $(objtree)/include/config/auto.conf
-+endif
-
- include $(XEN_ROOT)/Config.mk
- include $(srctree)/scripts/Kbuild.include
---
-2.44.0
-
diff --git a/0017-tools-xentop-Fix-cpu-sort-order.patch b/0017-tools-xentop-Fix-cpu-sort-order.patch
new file mode 100644
index 0000000..de19ddc
--- /dev/null
+++ b/0017-tools-xentop-Fix-cpu-sort-order.patch
@@ -0,0 +1,76 @@
+From f3d20dd31770a70971f4f85521eec1e741d38695 Mon Sep 17 00:00:00 2001
+From: Leigh Brown <leigh@solinno.co.uk>
+Date: Tue, 21 May 2024 10:25:30 +0200
+Subject: [PATCH 17/56] tools/xentop: Fix cpu% sort order
+
+In compare_cpu_pct(), there is a double -> unsigned long long conversion when
+calling compare(). In C, this discards the fractional part, resulting in an
+out-of-order sorting such as:
+
+ NAME STATE CPU(sec) CPU(%)
+ xendd --b--- 4020 5.7
+ icecream --b--- 2600 3.8
+ Domain-0 -----r 1060 1.5
+ neon --b--- 827 1.1
+ cheese --b--- 225 0.7
+ pizza --b--- 359 0.5
+ cassini --b--- 490 0.4
+ fusilli --b--- 159 0.2
+ bob --b--- 502 0.2
+ blender --b--- 121 0.2
+ bread --b--- 69 0.1
+ chickpea --b--- 67 0.1
+ lentil --b--- 67 0.1
+
+Introduce compare_dbl() function and update compare_cpu_pct() to call it.
+
+Fixes: 49839b535b78 ("Add xenstat framework.")
+Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: e27fc7d15eab79e604e8b8728778594accc23cf1
+master date: 2024-05-15 19:59:52 +0100
+---
+ tools/xentop/xentop.c | 13 ++++++++++++-
+ 1 file changed, 12 insertions(+), 1 deletion(-)
+
+diff --git a/tools/xentop/xentop.c b/tools/xentop/xentop.c
+index 545bd5e96d..c2a311befe 100644
+--- a/tools/xentop/xentop.c
++++ b/tools/xentop/xentop.c
+@@ -85,6 +85,7 @@ static void set_delay(const char *value);
+ static void set_prompt(const char *new_prompt, void (*func)(const char *));
+ static int handle_key(int);
+ static int compare(unsigned long long, unsigned long long);
++static int compare_dbl(double, double);
+ static int compare_domains(xenstat_domain **, xenstat_domain **);
+ static unsigned long long tot_net_bytes( xenstat_domain *, int);
+ static bool tot_vbd_reqs(xenstat_domain *, int, unsigned long long *);
+@@ -422,6 +423,16 @@ static int compare(unsigned long long i1, unsigned long long i2)
+ return 0;
+ }
+
++/* Compares two double precision numbers, returning -1,0,1 for <,=,> */
++static int compare_dbl(double d1, double d2)
++{
++ if (d1 < d2)
++ return -1;
++ if (d1 > d2)
++ return 1;
++ return 0;
++}
++
+ /* Comparison function for use with qsort. Compares two domains using the
+ * current sort field. */
+ static int compare_domains(xenstat_domain **domain1, xenstat_domain **domain2)
+@@ -523,7 +534,7 @@ static double get_cpu_pct(xenstat_domain *domain)
+
+ static int compare_cpu_pct(xenstat_domain *domain1, xenstat_domain *domain2)
+ {
+- return -compare(get_cpu_pct(domain1), get_cpu_pct(domain2));
++ return -compare_dbl(get_cpu_pct(domain1), get_cpu_pct(domain2));
+ }
+
+ /* Prints cpu percentage statistic */
+--
+2.45.2
+
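The xentop fix above comes down to not routing doubles through an unsigned long long comparison. A tiny standalone demonstration of the truncation problem and the compare_dbl()-style fix; the values are made up:

    #include <stdio.h>

    /* Integer comparison as used for most xentop columns. */
    static int compare(unsigned long long i1, unsigned long long i2)
    {
        return i1 < i2 ? -1 : i1 > i2 ? 1 : 0;
    }

    /* Double-precision variant, mirroring the fix. */
    static int compare_dbl(double d1, double d2)
    {
        return d1 < d2 ? -1 : d1 > d2 ? 1 : 0;
    }

    int main(void)
    {
        double a = 0.7, b = 0.5;   /* e.g. two domains' CPU(%) values */

        /* The implicit double -> unsigned long long conversion truncates
         * both values to 0, so the domains compare as equal. */
        printf("truncated: %d\n", compare(a, b));     /* prints 0 */
        printf("exact:     %d\n", compare_dbl(a, b)); /* prints 1 */
        return 0;
    }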
diff --git a/0018-x86-mtrr-avoid-system-wide-rendezvous-when-setting-A.patch b/0018-x86-mtrr-avoid-system-wide-rendezvous-when-setting-A.patch
new file mode 100644
index 0000000..a57775d
--- /dev/null
+++ b/0018-x86-mtrr-avoid-system-wide-rendezvous-when-setting-A.patch
@@ -0,0 +1,60 @@
+From 7cdb1fa2ab0b5e11f66cada0370770404153c824 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 21 May 2024 10:25:39 +0200
+Subject: [PATCH 18/56] x86/mtrr: avoid system wide rendezvous when setting AP
+ MTRRs
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There's no point in forcing a system wide update of the MTRRs on all processors
+when there are no changes to be propagated. On AP startup it's only the AP
+that needs to write the system wide MTRR values in order to match the rest of
+the already online CPUs.
+
+We have occasionally seen the watchdog trigger during `xen-hptool cpu-online`
+in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs
+on all the CPUs in the system.
+
+While there, adjust the comment to clarify why the system-wide resetting of the
+MTRR registers is not needed for the purposes of mtrr_ap_init().
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: abd00b037da5ffa4e8c4508a5df0cd6eabb805a4
+master date: 2024-05-15 19:59:52 +0100
+---
+ xen/arch/x86/cpu/mtrr/main.c | 15 ++++++++-------
+ 1 file changed, 8 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/mtrr/main.c b/xen/arch/x86/cpu/mtrr/main.c
+index 90b235f57e..0a44ebbcb0 100644
+--- a/xen/arch/x86/cpu/mtrr/main.c
++++ b/xen/arch/x86/cpu/mtrr/main.c
+@@ -573,14 +573,15 @@ void mtrr_ap_init(void)
+ if (!mtrr_if || hold_mtrr_updates_on_aps)
+ return;
+ /*
+- * Ideally we should hold mtrr_mutex here to avoid mtrr entries changed,
+- * but this routine will be called in cpu boot time, holding the lock
+- * breaks it. This routine is called in two cases: 1.very earily time
+- * of software resume, when there absolutely isn't mtrr entry changes;
+- * 2.cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug lock to
+- * prevent mtrr entry changes
++ * hold_mtrr_updates_on_aps takes care of preventing unnecessary MTRR
++ * updates when batch starting the CPUs (see
++ * mtrr_aps_sync_{begin,end}()).
++ *
++ * Otherwise just apply the current system wide MTRR values to this AP.
++ * Note this doesn't require synchronization with the other CPUs, as
++ * there are strictly no modifications of the current MTRR values.
+ */
+- set_mtrr(~0U, 0, 0, 0);
++ mtrr_set_all();
+ }
+
+ /**
+--
+2.45.2
+
diff --git a/0018-x86emul-add-missing-EVEX.R-checks.patch b/0018-x86emul-add-missing-EVEX.R-checks.patch
deleted file mode 100644
index 12e7702..0000000
--- a/0018-x86emul-add-missing-EVEX.R-checks.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From 16f2e47eb1207d866f95cf694a60a7ceb8f96a36 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 27 Feb 2024 14:09:55 +0100
-Subject: [PATCH 18/67] x86emul: add missing EVEX.R' checks
-
-EVEX.R' is not ignored in 64-bit code when encoding a GPR or mask
-register. While for mask registers suitable checks are in place (there
-also covering EVEX.R), they were missing for the few cases where in
-EVEX-encoded instructions ModR/M.reg encodes a GPR. While for VPEXTRW
-the bit is replaced before an emulation stub is invoked, for
-VCVT{,T}{S,D,H}2{,U}SI this actually would have led to #UD from inside
-an emulation stub, in turn raising #UD to the guest, but accompanied by
-log messages indicating something's wrong in Xen nevertheless.
-
-Fixes: 001bd91ad864 ("x86emul: support AVX512{F,BW,DQ} extract insns")
-Fixes: baf4a376f550 ("x86emul: support AVX512F legacy-equivalent scalar int/FP conversion insns")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: cb319824bfa8d3c9ea0410cc71daaedc3e11aa2a
-master date: 2024-02-22 11:54:07 +0100
----
- xen/arch/x86/x86_emulate/x86_emulate.c | 5 +++--
- 1 file changed, 3 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
-index 0c0336f737..995670cbc8 100644
---- a/xen/arch/x86/x86_emulate/x86_emulate.c
-+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
-@@ -6829,7 +6829,8 @@ x86_emulate(
- CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x2d): /* vcvts{s,d}2si xmm/mem,reg */
- CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x78): /* vcvtts{s,d}2usi xmm/mem,reg */
- CASE_SIMD_SCALAR_FP(_EVEX, 0x0f, 0x79): /* vcvts{s,d}2usi xmm/mem,reg */
-- generate_exception_if((evex.reg != 0xf || !evex.RX || evex.opmsk ||
-+ generate_exception_if((evex.reg != 0xf || !evex.RX || !evex.R ||
-+ evex.opmsk ||
- (ea.type != OP_REG && evex.brs)),
- EXC_UD);
- host_and_vcpu_must_have(avx512f);
-@@ -10705,7 +10706,7 @@ x86_emulate(
- goto pextr;
-
- case X86EMUL_OPC_EVEX_66(0x0f, 0xc5): /* vpextrw $imm8,xmm,reg */
-- generate_exception_if(ea.type != OP_REG, EXC_UD);
-+ generate_exception_if(ea.type != OP_REG || !evex.R, EXC_UD);
- /* Convert to alternative encoding: We want to use a memory operand. */
- evex.opcx = ext_0f3a;
- b = 0x15;
---
-2.44.0
-
diff --git a/0001-update-Xen-version-to-4.17.4-pre.patch b/0019-update-Xen-version-to-4.18.3-pre.patch
similarity index 58%
rename from 0001-update-Xen-version-to-4.17.4-pre.patch
rename to 0019-update-Xen-version-to-4.18.3-pre.patch
index e1070c9..34f2b33 100644
--- a/0001-update-Xen-version-to-4.17.4-pre.patch
+++ b/0019-update-Xen-version-to-4.18.3-pre.patch
@@ -1,25 +1,25 @@
-From 4f6e9d4327eb5252f1e8cac97a095d8b8485dadb Mon Sep 17 00:00:00 2001
+From 01f7a3c792241d348a4e454a30afdf6c0d6cd71c Mon Sep 17 00:00:00 2001
From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 30 Jan 2024 14:36:44 +0100
-Subject: [PATCH 01/67] update Xen version to 4.17.4-pre
+Date: Tue, 21 May 2024 11:52:11 +0200
+Subject: [PATCH 19/56] update Xen version to 4.18.3-pre
---
xen/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/Makefile b/xen/Makefile
-index a46e6330db..dd0b004e1c 100644
+index 657f6fa4e3..786ab61600 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
# All other places this is stored (eg. compile.h) should be autogenerated.
export XEN_VERSION = 4
- export XEN_SUBVERSION = 17
--export XEN_EXTRAVERSION ?= .3$(XEN_VENDORVERSION)
-+export XEN_EXTRAVERSION ?= .4-pre$(XEN_VENDORVERSION)
+ export XEN_SUBVERSION = 18
+-export XEN_EXTRAVERSION ?= .2$(XEN_VENDORVERSION)
++export XEN_EXTRAVERSION ?= .3-pre$(XEN_VENDORVERSION)
export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
-include xen-version
--
-2.44.0
+2.45.2
diff --git a/0019-xen-livepatch-fix-norevert-test-hook-setup-typo.patch b/0019-xen-livepatch-fix-norevert-test-hook-setup-typo.patch
deleted file mode 100644
index 1676f7a..0000000
--- a/0019-xen-livepatch-fix-norevert-test-hook-setup-typo.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From f6b12792542e372f36a71ea4c2563e6dd6e4fa57 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 27 Feb 2024 14:10:24 +0100
-Subject: [PATCH 19/67] xen/livepatch: fix norevert test hook setup typo
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The test code has a typo in using LIVEPATCH_APPLY_HOOK() instead of
-LIVEPATCH_REVERT_HOOK().
-
-Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: f0622dd4fd6ae6ddb523a45d89ed9b8f3a9a8f36
-master date: 2024-02-26 10:13:46 +0100
----
- xen/test/livepatch/xen_action_hooks_norevert.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/test/livepatch/xen_action_hooks_norevert.c b/xen/test/livepatch/xen_action_hooks_norevert.c
-index 3e21ade6ab..c173855192 100644
---- a/xen/test/livepatch/xen_action_hooks_norevert.c
-+++ b/xen/test/livepatch/xen_action_hooks_norevert.c
-@@ -120,7 +120,7 @@ static void post_revert_hook(livepatch_payload_t *payload)
- printk(KERN_DEBUG "%s: Hook done.\n", __func__);
- }
-
--LIVEPATCH_APPLY_HOOK(revert_hook);
-+LIVEPATCH_REVERT_HOOK(revert_hook);
-
- LIVEPATCH_PREAPPLY_HOOK(pre_apply_hook);
- LIVEPATCH_POSTAPPLY_HOOK(post_apply_hook);
---
-2.44.0
-
diff --git a/0020-x86-ucode-Further-fixes-to-identify-ucode-already-up.patch b/0020-x86-ucode-Further-fixes-to-identify-ucode-already-up.patch
new file mode 100644
index 0000000..c00dce2
--- /dev/null
+++ b/0020-x86-ucode-Further-fixes-to-identify-ucode-already-up.patch
@@ -0,0 +1,92 @@
+From cd873f00bedca2f1afeaf13a78f70e719c5b1398 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 26 Jun 2024 13:36:13 +0200
+Subject: [PATCH 20/56] x86/ucode: Further fixes to identify "ucode already up
+ to date"
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When the revision in hardware is newer than anything Xen has to hand,
+'microcode_cache' isn't set up. Then, `xen-ucode` initiates the update
+because it doesn't know whether the revisions across the system are symmetric
+or not. This involves the patch getting all the way into the
+apply_microcode() hooks before being found to be too old.
+
+This is all a giant mess and needs an overhaul, but in the short term simply
+adjust the apply_microcode() hooks to return -EEXIST.
+
+Also, unconditionally print the preexisting microcode revision on boot. It's
+relevant information which is otherwise unavailable if Xen doesn't find new
+microcode to use.
+
+Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 977d98e67c2e929c62aa1f495fc4c6341c45abb5
+master date: 2024-05-16 13:59:11 +0100
+---
+ xen/arch/x86/cpu/microcode/amd.c | 7 +++++--
+ xen/arch/x86/cpu/microcode/core.c | 2 ++
+ xen/arch/x86/cpu/microcode/intel.c | 7 +++++--
+ 3 files changed, 12 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
+index 75fc84e445..d8f7646e88 100644
+--- a/xen/arch/x86/cpu/microcode/amd.c
++++ b/xen/arch/x86/cpu/microcode/amd.c
+@@ -222,12 +222,15 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
+ uint32_t rev, old_rev = sig->rev;
+ enum microcode_match_result result = microcode_fits(patch);
+
++ if ( result == MIS_UCODE )
++ return -EINVAL;
++
+ /*
+ * Allow application of the same revision to pick up SMT-specific changes
+ * even if the revision of the other SMT thread is already up-to-date.
+ */
+- if ( result != NEW_UCODE && result != SAME_UCODE )
+- return -EINVAL;
++ if ( result == OLD_UCODE )
++ return -EEXIST;
+
+ if ( check_final_patch_levels(sig) )
+ {
+diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
+index d5338ad345..8a47f4471f 100644
+--- a/xen/arch/x86/cpu/microcode/core.c
++++ b/xen/arch/x86/cpu/microcode/core.c
+@@ -887,6 +887,8 @@ int __init early_microcode_init(unsigned long *module_map,
+
+ ucode_ops.collect_cpu_info();
+
++ printk(XENLOG_INFO "BSP microcode revision: 0x%08x\n", this_cpu(cpu_sig).rev);
++
+ /*
+ * Some hypervisors deliberately report a microcode revision of -1 to
+ * mean that they will not accept microcode updates.
+diff --git a/xen/arch/x86/cpu/microcode/intel.c b/xen/arch/x86/cpu/microcode/intel.c
+index 060c529a6e..a2d88e3ac0 100644
+--- a/xen/arch/x86/cpu/microcode/intel.c
++++ b/xen/arch/x86/cpu/microcode/intel.c
+@@ -294,10 +294,13 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
+
+ result = microcode_update_match(patch);
+
+- if ( result != NEW_UCODE &&
+- !(opt_ucode_allow_same && result == SAME_UCODE) )
++ if ( result == MIS_UCODE )
+ return -EINVAL;
+
++ if ( result == OLD_UCODE ||
++ (result == SAME_UCODE && !opt_ucode_allow_same) )
++ return -EEXIST;
++
+ wbinvd();
+
+ wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)patch->data);
+--
+2.45.2
+
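A sketch of the result-to-errno mapping that the 0020 microcode change converges on for both vendors: a mismatched blob stays -EINVAL, while "older (or same, when not explicitly allowed)" becomes -EEXIST so the tools can treat it as "already up to date". The enum and helper below are illustrative only, not the hypervisor's code:

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    enum match_result { MIS_UCODE, OLD_UCODE, SAME_UCODE, NEW_UCODE };

    static int apply_result_to_errno(enum match_result r, bool allow_same)
    {
        if ( r == MIS_UCODE )
            return -EINVAL;              /* blob is for a different CPU */
        if ( r == OLD_UCODE || (r == SAME_UCODE && !allow_same) )
            return -EEXIST;              /* nothing to do */
        return 0;                        /* NEW_UCODE (or allowed SAME) */
    }

    int main(void)
    {
        printf("%d %d %d\n",
               apply_result_to_errno(MIS_UCODE, false),
               apply_result_to_errno(OLD_UCODE, false),
               apply_result_to_errno(NEW_UCODE, false));
        return 0;
    }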
diff --git a/0020-xen-cmdline-fix-printf-format-specifier-in-no_config.patch b/0020-xen-cmdline-fix-printf-format-specifier-in-no_config.patch
deleted file mode 100644
index b47d9ee..0000000
--- a/0020-xen-cmdline-fix-printf-format-specifier-in-no_config.patch
+++ /dev/null
@@ -1,38 +0,0 @@
-From 229e8a72ee4cde5698aaf42cc59ae57446dce60f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 27 Feb 2024 14:10:39 +0100
-Subject: [PATCH 20/67] xen/cmdline: fix printf format specifier in
- no_config_param()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-'*' sets the width field, which is the minimum number of characters to output,
-but what we want in no_config_param() is the precision instead, which is '.*'
-as it imposes a maximum limit on the output.
-
-Fixes: 68d757df8dd2 ('x86/pv: Options to disable and/or compile out 32bit PV support')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: ef101f525173cf51dc70f4c77862f6f10a8ddccf
-master date: 2024-02-26 10:17:40 +0100
----
- xen/include/xen/param.h | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/include/xen/param.h b/xen/include/xen/param.h
-index 93c3fe7cb7..e02e49635c 100644
---- a/xen/include/xen/param.h
-+++ b/xen/include/xen/param.h
-@@ -191,7 +191,7 @@ static inline void no_config_param(const char *cfg, const char *param,
- {
- int len = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s);
-
-- printk(XENLOG_INFO "CONFIG_%s disabled - ignoring '%s=%*s' setting\n",
-+ printk(XENLOG_INFO "CONFIG_%s disabled - ignoring '%s=%.*s' setting\n",
- cfg, param, len, s);
- }
-
---
-2.44.0
-
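To see the width-vs-precision distinction fixed by the no_config_param() patch: '%*s' only sets a minimum field width, while '%.*s' caps how many characters of the string are printed. A two-line standalone demo with a made-up option string:

    #include <stdio.h>

    int main(void)
    {
        const char *s = "pv=32,foo";   /* only the first 5 chars are wanted */
        int len = 5;

        printf("[%*s]\n", len, s);     /* width: prints the whole string */
        printf("[%.*s]\n", len, s);    /* precision: prints just "pv=32" */
        return 0;
    }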
diff --git a/0021-x86-altcall-use-a-union-as-register-type-for-functio.patch b/0021-x86-altcall-use-a-union-as-register-type-for-functio.patch
deleted file mode 100644
index ab050dd..0000000
--- a/0021-x86-altcall-use-a-union-as-register-type-for-functio.patch
+++ /dev/null
@@ -1,141 +0,0 @@
-From 1aafe054e7d1efbf8e8482a9cdd4be5753b79e2f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 27 Feb 2024 14:11:04 +0100
-Subject: [PATCH 21/67] x86/altcall: use a union as register type for function
- parameters on clang
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The current code for alternative calls uses the caller parameter types as the
-types for the register variables that serve as function parameters:
-
-uint8_t foo;
-[...]
-alternative_call(myfunc, foo);
-
-Would expand roughly into:
-
-register unint8_t a1_ asm("rdi") = foo;
-register unsigned long a2_ asm("rsi");
-[...]
-asm volatile ("call *%c[addr](%%rip)"...);
-
-However with -O2 clang will generate incorrect code, given the following
-example:
-
-unsigned int func(uint8_t t)
-{
- return t;
-}
-
-static void bar(uint8_t b)
-{
- int ret_;
- register uint8_t di asm("rdi") = b;
- register unsigned long si asm("rsi");
- register unsigned long dx asm("rdx");
- register unsigned long cx asm("rcx");
- register unsigned long r8 asm("r8");
- register unsigned long r9 asm("r9");
- register unsigned long r10 asm("r10");
- register unsigned long r11 asm("r11");
-
- asm volatile ( "call %c[addr]"
- : "+r" (di), "=r" (si), "=r" (dx),
- "=r" (cx), "=r" (r8), "=r" (r9),
- "=r" (r10), "=r" (r11), "=a" (ret_)
- : [addr] "i" (&(func)), "g" (func)
- : "memory" );
-}
-
-void foo(unsigned int a)
-{
- bar(a);
-}
-
-Clang generates the following assembly code:
-
-func: # @func
- movl %edi, %eax
- retq
-foo: # @foo
- callq func
- retq
-
-Note the truncation of the unsigned int parameter 'a' of foo() to uint8_t when
-passed into bar() is lost. clang doesn't zero extend the parameters in the
-callee when required, as the psABI mandates.
-
-The above can be worked around by using a union when defining the register
-variables, so that `di` becomes:
-
-register union {
- uint8_t e;
- unsigned long r;
-} di asm("rdi") = { .e = b };
-
-Which results in following code generated for `foo()`:
-
-foo: # @foo
- movzbl %dil, %edi
- callq func
- retq
-
-So the truncation is no longer lost. Apply such a workaround only when built
-with clang.
-
-Reported-by: Matthew Grooms <mgrooms@shrew.net>
-Link: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277200
-Link: https://github.com/llvm/llvm-project/issues/12579
-Link: https://github.com/llvm/llvm-project/issues/82598
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: 2ce562b2a413cbdb2e1128989ed1722290a27c4e
-master date: 2024-02-26 10:18:01 +0100
----
- xen/arch/x86/include/asm/alternative.h | 25 +++++++++++++++++++++++++
- 1 file changed, 25 insertions(+)
-
-diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
-index a7a82c2c03..bcb1dc94f4 100644
---- a/xen/arch/x86/include/asm/alternative.h
-+++ b/xen/arch/x86/include/asm/alternative.h
-@@ -167,9 +167,34 @@ extern void alternative_branches(void);
- #define ALT_CALL_arg5 "r8"
- #define ALT_CALL_arg6 "r9"
-
-+#ifdef CONFIG_CC_IS_CLANG
-+/*
-+ * Use a union with an unsigned long in order to prevent clang from
-+ * skipping a possible truncation of the value. By using the union any
-+ * truncation is carried before the call instruction, in turn covering
-+ * for ABI-non-compliance in that the necessary clipping / extension of
-+ * the value is supposed to be carried out in the callee.
-+ *
-+ * Note this behavior is not mandated by the standard, and hence could
-+ * stop being a viable workaround, or worse, could cause a different set
-+ * of code-generation issues in future clang versions.
-+ *
-+ * This has been reported upstream:
-+ * https://github.com/llvm/llvm-project/issues/12579
-+ * https://github.com/llvm/llvm-project/issues/82598
-+ */
-+#define ALT_CALL_ARG(arg, n) \
-+ register union { \
-+ typeof(arg) e; \
-+ unsigned long r; \
-+ } a ## n ## _ asm ( ALT_CALL_arg ## n ) = { \
-+ .e = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); }) \
-+ }
-+#else
- #define ALT_CALL_ARG(arg, n) \
- register typeof(arg) a ## n ## _ asm ( ALT_CALL_arg ## n ) = \
- ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })
-+#endif
- #define ALT_CALL_NO_ARG(n) \
- register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n )
-
---
-2.44.0
-
diff --git a/0021-x86-msi-prevent-watchdog-triggering-when-dumping-MSI.patch b/0021-x86-msi-prevent-watchdog-triggering-when-dumping-MSI.patch
new file mode 100644
index 0000000..8bcc63f
--- /dev/null
+++ b/0021-x86-msi-prevent-watchdog-triggering-when-dumping-MSI.patch
@@ -0,0 +1,44 @@
+From 1ffb29d132600e6a7965c2885505615a6fd6c647 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Wed, 26 Jun 2024 13:36:52 +0200
+Subject: [PATCH 21/56] x86/msi: prevent watchdog triggering when dumping MSI
+ state
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Use the same check that's used in dump_irqs().
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 594b22ca5be681ec1b42c34f321cc2600d582210
+master date: 2024-05-20 14:29:44 +0100
+---
+ xen/arch/x86/msi.c | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
+index a78367d7cf..3eaeffd1e0 100644
+--- a/xen/arch/x86/msi.c
++++ b/xen/arch/x86/msi.c
+@@ -17,6 +17,7 @@
+ #include <xen/param.h>
+ #include <xen/pci.h>
+ #include <xen/pci_regs.h>
++#include <xen/softirq.h>
+ #include <xen/iocap.h>
+ #include <xen/keyhandler.h>
+ #include <xen/pfn.h>
+@@ -1405,6 +1406,9 @@ static void cf_check dump_msi(unsigned char key)
+ unsigned long flags;
+ const char *type = "???";
+
++ if ( !(irq & 0x1f) )
++ process_pending_softirqs();
++
+ if ( !irq_desc_initialized(desc) )
+ continue;
+
+--
+2.45.2
+
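The dump_msi() change follows a common pattern for long, keyhandler-style dumps: punctuate the loop every so often so pending softirqs (and the watchdog) get serviced. A generic standalone sketch with a hypothetical do_pending_work() stand-in for process_pending_softirqs():

    #include <stdio.h>

    #define NR_ITEMS 256

    /* Hypothetical stand-in for process_pending_softirqs(). */
    static void do_pending_work(void)
    {
        /* In Xen this lets softirqs run and keeps the watchdog happy. */
    }

    int main(void)
    {
        unsigned int i;

        for ( i = 0; i < NR_ITEMS; i++ )
        {
            /* Every 32 iterations, break the work up before continuing. */
            if ( !(i & 0x1f) )
                do_pending_work();

            printf("item %u\n", i);   /* the actual (slow) per-item dump */
        }
        return 0;
    }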
diff --git a/0022-x86-irq-remove-offline-CPUs-from-old-CPU-mask-when-a.patch b/0022-x86-irq-remove-offline-CPUs-from-old-CPU-mask-when-a.patch
new file mode 100644
index 0000000..28fec3e
--- /dev/null
+++ b/0022-x86-irq-remove-offline-CPUs-from-old-CPU-mask-when-a.patch
@@ -0,0 +1,44 @@
+From 52e16bf065cb42b79d14ac74d701d1f9d8506430 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Wed, 26 Jun 2024 13:37:20 +0200
+Subject: [PATCH 22/56] x86/irq: remove offline CPUs from old CPU mask when
+ adjusting move_cleanup_count
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When adjusting move_cleanup_count to account for CPUs that are offline, also
+adjust old_cpu_mask; otherwise further calls to fixup_irqs() could subtract
+those again and create an imbalance in move_cleanup_count.
+
+Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: e63209d3ba2fd1b2f232babd14c9c679ffa7b09a
+master date: 2024-06-10 10:33:22 +0200
+---
+ xen/arch/x86/irq.c | 8 ++++++++
+ 1 file changed, 8 insertions(+)
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index e07006391a..db14df93db 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -2576,6 +2576,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ desc->arch.move_cleanup_count -= cpumask_weight(affinity);
+ if ( !desc->arch.move_cleanup_count )
+ release_old_vec(desc);
++ else
++ /*
++ * Adjust old_cpu_mask to account for the offline CPUs,
++ * otherwise further calls to fixup_irqs() could subtract those
++ * again and possibly underflow the counter.
++ */
++ cpumask_andnot(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
++ affinity);
+ }
+
+ if ( !desc->action || cpumask_subset(desc->affinity, mask) )
+--
+2.45.2
+
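The 0022 IRQ fix keeps a counter and the mask it was derived from consistent: whenever offline CPUs are subtracted from move_cleanup_count, the same CPUs must also be cleared from old_cpu_mask, or a later pass subtracts them again. A small bitmask/counter sketch of that invariant, using plain unsigned masks instead of cpumask_t and a GCC/Clang builtin in place of cpumask_weight():

    #include <stdio.h>

    /* Count of set bits in a small mask (stand-in for cpumask_weight()). */
    static unsigned int weight(unsigned int mask)
    {
        return (unsigned int)__builtin_popcount(mask);
    }

    int main(void)
    {
        unsigned int old_cpu_mask = 0x0f;          /* CPUs 0-3 still pending */
        unsigned int cleanup_count = weight(old_cpu_mask);
        unsigned int offline = 0x0c;               /* CPUs 2-3 went offline  */
        unsigned int affected = old_cpu_mask & offline;

        cleanup_count -= weight(affected);
        /* Without this line a second fixup pass would see CPUs 2-3 again
         * and underflow cleanup_count. */
        old_cpu_mask &= ~affected;

        printf("count=%u mask=%#x\n", cleanup_count, old_cpu_mask);
        return 0;
    }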
diff --git a/0022-x86-spec-fix-BRANCH_HARDEN-option-to-only-be-set-whe.patch b/0022-x86-spec-fix-BRANCH_HARDEN-option-to-only-be-set-whe.patch
deleted file mode 100644
index ce01c1a..0000000
--- a/0022-x86-spec-fix-BRANCH_HARDEN-option-to-only-be-set-whe.patch
+++ /dev/null
@@ -1,57 +0,0 @@
-From 91650010815f3da0834bc9781c4359350d1162a5 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 27 Feb 2024 14:11:40 +0100
-Subject: [PATCH 22/67] x86/spec: fix BRANCH_HARDEN option to only be set when
- build-enabled
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The current logic to handle the BRANCH_HARDEN option will report it as enabled
-even when build-time disabled. Fix this by only allowing the option to be set
-when support for it is built into Xen.
-
-Fixes: 2d6f36daa086 ('x86/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_BRANCH')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 60e00f77a5cc671d30c5ef3318f5b8e9b74e4aa3
-master date: 2024-02-26 16:06:42 +0100
----
- xen/arch/x86/spec_ctrl.c | 14 ++++++++++++--
- 1 file changed, 12 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 56e07d7536..661716d695 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -62,7 +62,8 @@ int8_t __initdata opt_psfd = -1;
- int8_t __ro_after_init opt_ibpb_ctxt_switch = -1;
- int8_t __read_mostly opt_eager_fpu = -1;
- int8_t __read_mostly opt_l1d_flush = -1;
--static bool __initdata opt_branch_harden = true;
-+static bool __initdata opt_branch_harden =
-+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
-
- bool __initdata bsp_delay_spec_ctrl;
- uint8_t __read_mostly default_xen_spec_ctrl;
-@@ -280,7 +281,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
- opt_l1d_flush = val;
- else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
-- opt_branch_harden = val;
-+ {
-+ if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) )
-+ opt_branch_harden = val;
-+ else
-+ {
-+ no_config_param("SPECULATIVE_HARDEN_BRANCH", "spec-ctrl", s,
-+ ss);
-+ rc = -EINVAL;
-+ }
-+ }
- else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
- opt_srb_lock = val;
- else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
---
-2.44.0
-
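The BRANCH_HARDEN fix is an instance of a general pattern: both the default value of an option and its command-line parser should be gated on the same compile-time switch, so a build-time-disabled feature can never be reported as enabled. A compact sketch using a plain macro in place of Kconfig's IS_ENABLED(); the names are made up:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for CONFIG_SPECULATIVE_HARDEN_BRANCH / IS_ENABLED(). */
    #define FEATURE_BUILT_IN 0

    static bool opt_branch_harden = FEATURE_BUILT_IN; /* default follows build */

    static int parse_branch_harden(bool val)
    {
        if ( FEATURE_BUILT_IN )
        {
            opt_branch_harden = val;
            return 0;
        }
        /* Named but compiled out: reject instead of silently "enabling". */
        fprintf(stderr, "branch-harden support not built in, ignoring\n");
        return -1;
    }

    int main(void)
    {
        int rc = parse_branch_harden(true);

        printf("rc=%d branch-harden=%d\n", rc, opt_branch_harden);
        return 0;
    }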
diff --git a/0023-CI-Update-FreeBSD-to-13.3.patch b/0023-CI-Update-FreeBSD-to-13.3.patch
new file mode 100644
index 0000000..6a6e7ae
--- /dev/null
+++ b/0023-CI-Update-FreeBSD-to-13.3.patch
@@ -0,0 +1,33 @@
+From 80f2d2c2a515a6b9a4ea1b128267c6e1b5085002 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 26 Jun 2024 13:37:58 +0200
+Subject: [PATCH 23/56] CI: Update FreeBSD to 13.3
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+Acked-by: Stefano Stabellini <sstabellini@kernel.org>
+master commit: 5ea7f2c9d7a1334b3b2bd5f67fab4d447b60613d
+master date: 2024-06-11 17:00:10 +0100
+---
+ .cirrus.yml | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/.cirrus.yml b/.cirrus.yml
+index 63f3afb104..e961877881 100644
+--- a/.cirrus.yml
++++ b/.cirrus.yml
+@@ -17,7 +17,7 @@ freebsd_template: &FREEBSD_TEMPLATE
+ task:
+ name: 'FreeBSD 13'
+ freebsd_instance:
+- image_family: freebsd-13-2
++ image_family: freebsd-13-3
+ << : *FREEBSD_TEMPLATE
+
+ task:
+--
+2.45.2
+
diff --git a/0023-x86-account-for-shadow-stack-in-exception-from-stub-.patch b/0023-x86-account-for-shadow-stack-in-exception-from-stub-.patch
deleted file mode 100644
index e23a764..0000000
--- a/0023-x86-account-for-shadow-stack-in-exception-from-stub-.patch
+++ /dev/null
@@ -1,212 +0,0 @@
-From 49f77602373b58b7bbdb40cea2b49d2f88d4003d Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 27 Feb 2024 14:12:11 +0100
-Subject: [PATCH 23/67] x86: account for shadow stack in exception-from-stub
- recovery
-
-Dealing with exceptions raised from within emulation stubs involves
-discarding return address (replaced by exception related information).
-Such discarding of course also requires removing the corresponding entry
-from the shadow stack.
-
-Also amend the comment in fixup_exception_return(), to further clarify
-why use of ptr[1] can't be an out-of-bounds access.
-
-This is CVE-2023-46841 / XSA-451.
-
-Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 91f5f7a9154919a765c3933521760acffeddbf28
-master date: 2024-02-27 13:49:22 +0100
----
- xen/arch/x86/extable.c | 20 ++++++----
- xen/arch/x86/include/asm/uaccess.h | 3 +-
- xen/arch/x86/traps.c | 63 +++++++++++++++++++++++++++---
- 3 files changed, 71 insertions(+), 15 deletions(-)
-
-diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
-index 6758ba1dca..dd9583f2a5 100644
---- a/xen/arch/x86/extable.c
-+++ b/xen/arch/x86/extable.c
-@@ -86,26 +86,29 @@ search_one_extable(const struct exception_table_entry *first,
- }
-
- unsigned long
--search_exception_table(const struct cpu_user_regs *regs)
-+search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
- {
- const struct virtual_region *region = find_text_region(regs->rip);
- unsigned long stub = this_cpu(stubs.addr);
-
- if ( region && region->ex )
-+ {
-+ *stub_ra = 0;
- return search_one_extable(region->ex, region->ex_end, regs->rip);
-+ }
-
- if ( regs->rip >= stub + STUB_BUF_SIZE / 2 &&
- regs->rip < stub + STUB_BUF_SIZE &&
- regs->rsp > (unsigned long)regs &&
- regs->rsp < (unsigned long)get_cpu_info() )
- {
-- unsigned long retptr = *(unsigned long *)regs->rsp;
-+ unsigned long retaddr = *(unsigned long *)regs->rsp, fixup;
-
-- region = find_text_region(retptr);
-- retptr = region && region->ex
-- ? search_one_extable(region->ex, region->ex_end, retptr)
-- : 0;
-- if ( retptr )
-+ region = find_text_region(retaddr);
-+ fixup = region && region->ex
-+ ? search_one_extable(region->ex, region->ex_end, retaddr)
-+ : 0;
-+ if ( fixup )
- {
- /*
- * Put trap number and error code on the stack (in place of the
-@@ -117,7 +120,8 @@ search_exception_table(const struct cpu_user_regs *regs)
- };
-
- *(unsigned long *)regs->rsp = token.raw;
-- return retptr;
-+ *stub_ra = retaddr;
-+ return fixup;
- }
- }
-
-diff --git a/xen/arch/x86/include/asm/uaccess.h b/xen/arch/x86/include/asm/uaccess.h
-index 684fccd95c..74bb222c03 100644
---- a/xen/arch/x86/include/asm/uaccess.h
-+++ b/xen/arch/x86/include/asm/uaccess.h
-@@ -421,7 +421,8 @@ union stub_exception_token {
- unsigned long raw;
- };
-
--extern unsigned long search_exception_table(const struct cpu_user_regs *regs);
-+extern unsigned long search_exception_table(const struct cpu_user_regs *regs,
-+ unsigned long *stub_ra);
- extern void sort_exception_tables(void);
- extern void sort_exception_table(struct exception_table_entry *start,
- const struct exception_table_entry *stop);
-diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
-index 06c4f3868b..7599bee361 100644
---- a/xen/arch/x86/traps.c
-+++ b/xen/arch/x86/traps.c
-@@ -856,7 +856,7 @@ void do_unhandled_trap(struct cpu_user_regs *regs)
- }
-
- static void fixup_exception_return(struct cpu_user_regs *regs,
-- unsigned long fixup)
-+ unsigned long fixup, unsigned long stub_ra)
- {
- if ( IS_ENABLED(CONFIG_XEN_SHSTK) )
- {
-@@ -873,7 +873,8 @@ static void fixup_exception_return(struct cpu_user_regs *regs,
- /*
- * Search for %rip. The shstk currently looks like this:
- *
-- * ... [Likely pointed to by SSP]
-+ * tok [Supervisor token, == &tok | BUSY, only with FRED inactive]
-+ * ... [Pointed to by SSP for most exceptions, empty in IST cases]
- * %cs [== regs->cs]
- * %rip [== regs->rip]
- * SSP [Likely points to 3 slots higher, above %cs]
-@@ -891,7 +892,56 @@ static void fixup_exception_return(struct cpu_user_regs *regs,
- */
- if ( ptr[0] == regs->rip && ptr[1] == regs->cs )
- {
-+ unsigned long primary_shstk =
-+ (ssp & ~(STACK_SIZE - 1)) +
-+ (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8;
-+
- wrss(fixup, ptr);
-+
-+ if ( !stub_ra )
-+ goto shstk_done;
-+
-+ /*
-+ * Stub recovery ought to happen only when the outer context
-+ * was on the main shadow stack. We need to also "pop" the
-+ * stub's return address from the interrupted context's shadow
-+ * stack. That is,
-+ * - if we're still on the main stack, we need to move the
-+ * entire stack (up to and including the exception frame)
-+ * up by one slot, incrementing the original SSP in the
-+ * exception frame,
-+ * - if we're on an IST stack, we need to increment the
-+ * original SSP.
-+ */
-+ BUG_ON((ptr[-1] ^ primary_shstk) >> PAGE_SHIFT);
-+
-+ if ( (ssp ^ primary_shstk) >> PAGE_SHIFT )
-+ {
-+ /*
-+ * We're on an IST stack. First make sure the two return
-+ * addresses actually match. Then increment the interrupted
-+ * context's SSP.
-+ */
-+ BUG_ON(stub_ra != *(unsigned long*)ptr[-1]);
-+ wrss(ptr[-1] + 8, &ptr[-1]);
-+ goto shstk_done;
-+ }
-+
-+ /* Make sure the two return addresses actually match. */
-+ BUG_ON(stub_ra != ptr[2]);
-+
-+ /* Move exception frame, updating SSP there. */
-+ wrss(ptr[1], &ptr[2]); /* %cs */
-+ wrss(ptr[0], &ptr[1]); /* %rip */
-+ wrss(ptr[-1] + 8, &ptr[0]); /* SSP */
-+
-+ /* Move all newer entries. */
-+ while ( --ptr != _p(ssp) )
-+ wrss(ptr[-1], &ptr[0]);
-+
-+ /* Finally account for our own stack having shifted up. */
-+ asm volatile ( "incsspd %0" :: "r" (2) );
-+
- goto shstk_done;
- }
- }
-@@ -912,7 +962,8 @@ static void fixup_exception_return(struct cpu_user_regs *regs,
-
- static bool extable_fixup(struct cpu_user_regs *regs, bool print)
- {
-- unsigned long fixup = search_exception_table(regs);
-+ unsigned long stub_ra = 0;
-+ unsigned long fixup = search_exception_table(regs, &stub_ra);
-
- if ( unlikely(fixup == 0) )
- return false;
-@@ -926,7 +977,7 @@ static bool extable_fixup(struct cpu_user_regs *regs, bool print)
- vector_name(regs->entry_vector), regs->error_code,
- _p(regs->rip), _p(regs->rip), _p(fixup));
-
-- fixup_exception_return(regs, fixup);
-+ fixup_exception_return(regs, fixup, stub_ra);
- this_cpu(last_extable_addr) = regs->rip;
-
- return true;
-@@ -1214,7 +1265,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
- void (*fn)(struct cpu_user_regs *) = bug_ptr(bug);
-
- fn(regs);
-- fixup_exception_return(regs, (unsigned long)eip);
-+ fixup_exception_return(regs, (unsigned long)eip, 0);
- return;
- }
-
-@@ -1235,7 +1286,7 @@ void do_invalid_op(struct cpu_user_regs *regs)
- case BUGFRAME_warn:
- printk("Xen WARN at %s%s:%d\n", prefix, filename, lineno);
- show_execution_state(regs);
-- fixup_exception_return(regs, (unsigned long)eip);
-+ fixup_exception_return(regs, (unsigned long)eip, 0);
- return;
-
- case BUGFRAME_bug:
---
-2.44.0
-
diff --git a/0024-x86-smp-do-not-use-shorthand-IPI-destinations-in-CPU.patch b/0024-x86-smp-do-not-use-shorthand-IPI-destinations-in-CPU.patch
new file mode 100644
index 0000000..b69c88c
--- /dev/null
+++ b/0024-x86-smp-do-not-use-shorthand-IPI-destinations-in-CPU.patch
@@ -0,0 +1,98 @@
+From 98238d49ecb149a5ac07cb8032817904c404ac2b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Wed, 26 Jun 2024 13:38:36 +0200
+Subject: [PATCH 24/56] x86/smp: do not use shorthand IPI destinations in CPU
+ hot{,un}plug contexts
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Due to the current rwlock logic, if the CPU calling get_cpu_maps() does
+so from a cpu_hotplug_{begin,done}() region the function will still
+return success, because a CPU taking the rwlock in read mode after
+having taken it in write mode is allowed. Such a corner case means that using
+get_cpu_maps() alone is not enough to prevent use of the shorthand in CPU
+hotplug regions.
+
+Introduce a new helper to detect whether the current caller is between a
+cpu_hotplug_{begin,done}() region and use it in send_IPI_mask() to restrict
+shorthand usage.
+
+Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 171c52fba5d94e050d704770480dcb983490d0ad
+master date: 2024-06-12 14:29:31 +0200
+---
+ xen/arch/x86/smp.c | 2 +-
+ xen/common/cpu.c | 5 +++++
+ xen/include/xen/cpu.h | 10 ++++++++++
+ xen/include/xen/rwlock.h | 2 ++
+ 4 files changed, 18 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
+index 3a331cbdbc..340fcafb46 100644
+--- a/xen/arch/x86/smp.c
++++ b/xen/arch/x86/smp.c
+@@ -88,7 +88,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
+ * the system have been accounted for.
+ */
+ if ( system_state > SYS_STATE_smp_boot &&
+- !unaccounted_cpus && !disabled_cpus &&
++ !unaccounted_cpus && !disabled_cpus && !cpu_in_hotplug_context() &&
+ /* NB: get_cpu_maps lock requires enabled interrupts. */
+ local_irq_is_enabled() && (cpus_locked = get_cpu_maps()) &&
+ (park_offline_cpus ||
+diff --git a/xen/common/cpu.c b/xen/common/cpu.c
+index 8709db4d29..6e35b114c0 100644
+--- a/xen/common/cpu.c
++++ b/xen/common/cpu.c
+@@ -68,6 +68,11 @@ void cpu_hotplug_done(void)
+ write_unlock(&cpu_add_remove_lock);
+ }
+
++bool cpu_in_hotplug_context(void)
++{
++ return rw_is_write_locked_by_me(&cpu_add_remove_lock);
++}
++
+ static NOTIFIER_HEAD(cpu_chain);
+
+ void __init register_cpu_notifier(struct notifier_block *nb)
+diff --git a/xen/include/xen/cpu.h b/xen/include/xen/cpu.h
+index e1d4eb5967..6bf5786750 100644
+--- a/xen/include/xen/cpu.h
++++ b/xen/include/xen/cpu.h
+@@ -13,6 +13,16 @@ void put_cpu_maps(void);
+ void cpu_hotplug_begin(void);
+ void cpu_hotplug_done(void);
+
++/*
++ * Returns true when the caller CPU is between a cpu_hotplug_{begin,done}()
++ * region.
++ *
++ * This is required to safely identify hotplug contexts, as get_cpu_maps()
++ * would otherwise succeed because a caller holding the lock in write mode is
++ * allowed to acquire the same lock in read mode.
++ */
++bool cpu_in_hotplug_context(void);
++
+ /* Receive notification of CPU hotplug events. */
+ void register_cpu_notifier(struct notifier_block *nb);
+
+diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
+index 9e35ee2edf..dc74d1c057 100644
+--- a/xen/include/xen/rwlock.h
++++ b/xen/include/xen/rwlock.h
+@@ -309,6 +309,8 @@ static always_inline void write_lock_irq(rwlock_t *l)
+
+ #define rw_is_locked(l) _rw_is_locked(l)
+ #define rw_is_write_locked(l) _rw_is_write_locked(l)
++#define rw_is_write_locked_by_me(l) \
++ lock_evaluate_nospec(_is_write_locked_by_me(atomic_read(&(l)->cnts)))
+
+
+ typedef struct percpu_rwlock percpu_rwlock_t;
+--
+2.45.2
+
diff --git a/0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch b/0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch
deleted file mode 100644
index 7bdd651..0000000
--- a/0024-xen-arm-Fix-UBSAN-failure-in-start_xen.patch
+++ /dev/null
@@ -1,52 +0,0 @@
-From 6cbccc4071ef49a8c591ecaddfdcb1cc26d28411 Mon Sep 17 00:00:00 2001
-From: Michal Orzel <michal.orzel@amd.com>
-Date: Thu, 8 Feb 2024 11:43:39 +0100
-Subject: [PATCH 24/67] xen/arm: Fix UBSAN failure in start_xen()
-
-When running Xen on arm32, in scenario where Xen is loaded at an address
-such as boot_phys_offset >= 2GB, UBSAN reports the following:
-
-(XEN) UBSAN: Undefined behaviour in arch/arm/setup.c:739:58
-(XEN) pointer operation underflowed 00200000 to 86800000
-(XEN) Xen WARN at common/ubsan/ubsan.c:172
-(XEN) ----[ Xen-4.19-unstable arm32 debug=y ubsan=y Not tainted ]----
-...
-(XEN) Xen call trace:
-(XEN) [<0031b4c0>] ubsan.c#ubsan_epilogue+0x18/0xf0 (PC)
-(XEN) [<0031d134>] __ubsan_handle_pointer_overflow+0xb8/0xd4 (LR)
-(XEN) [<0031d134>] __ubsan_handle_pointer_overflow+0xb8/0xd4
-(XEN) [<004d15a8>] start_xen+0xe0/0xbe0
-(XEN) [<0020007c>] head.o#primary_switched+0x4/0x30
-
-The failure is reported for the following line:
-(paddr_t)(uintptr_t)(_start + boot_phys_offset)
-
-This occurs because the compiler treats (ptr + size) with size bigger than
-PTRDIFF_MAX as undefined behavior. To address this, switch to macro
-virt_to_maddr(), given the future plans to eliminate boot_phys_offset.
-
-Signed-off-by: Michal Orzel <michal.orzel@amd.com>
-Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
-Tested-by: Luca Fancellu <luca.fancellu@arm.com>
-Acked-by: Julien Grall <jgrall@amazon.com>
-(cherry picked from commit e11f5766503c0ff074b4e0f888bbfc931518a169)
----
- xen/arch/arm/setup.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
-index 4395640019..9ee19c2bc1 100644
---- a/xen/arch/arm/setup.c
-+++ b/xen/arch/arm/setup.c
-@@ -1025,7 +1025,7 @@ void __init start_xen(unsigned long boot_phys_offset,
-
- /* Register Xen's load address as a boot module. */
- xen_bootmodule = add_boot_module(BOOTMOD_XEN,
-- (paddr_t)(uintptr_t)(_start + boot_phys_offset),
-+ virt_to_maddr(_start),
- (paddr_t)(uintptr_t)(_end - _start), false);
- BUG_ON(!xen_bootmodule);
-
---
-2.44.0
-
diff --git a/0025-x86-HVM-hide-SVM-VMX-when-their-enabling-is-prohibit.patch b/0025-x86-HVM-hide-SVM-VMX-when-their-enabling-is-prohibit.patch
deleted file mode 100644
index 28e489b..0000000
--- a/0025-x86-HVM-hide-SVM-VMX-when-their-enabling-is-prohibit.patch
+++ /dev/null
@@ -1,67 +0,0 @@
-From 9c0d518eb8dc69430e6a8d767bd101dad19b846a Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 5 Mar 2024 11:56:31 +0100
-Subject: [PATCH 25/67] x86/HVM: hide SVM/VMX when their enabling is prohibited
- by firmware
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-... or we fail to enable the functionality on the BSP for other reasons.
-The only place where hardware announcing the feature is recorded is the
-raw CPU policy/featureset.
-
-Inspired by https://lore.kernel.org/all/20230921114940.957141-1-pbonzini@redhat.com/.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 0b5f149338e35a795bf609ce584640b0977f9e6c
-master date: 2024-01-09 14:06:34 +0100
----
- xen/arch/x86/hvm/svm/svm.c | 1 +
- xen/arch/x86/hvm/vmx/vmcs.c | 17 +++++++++++++++++
- 2 files changed, 18 insertions(+)
-
-diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
-index fd32600ae3..3c17464550 100644
---- a/xen/arch/x86/hvm/svm/svm.c
-+++ b/xen/arch/x86/hvm/svm/svm.c
-@@ -1669,6 +1669,7 @@ const struct hvm_function_table * __init start_svm(void)
-
- if ( _svm_cpu_up(true) )
- {
-+ setup_clear_cpu_cap(X86_FEATURE_SVM);
- printk("SVM: failed to initialise.\n");
- return NULL;
- }
-diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
-index bcbecc6945..b5ecc51b43 100644
---- a/xen/arch/x86/hvm/vmx/vmcs.c
-+++ b/xen/arch/x86/hvm/vmx/vmcs.c
-@@ -2163,6 +2163,23 @@ int __init vmx_vmcs_init(void)
-
- if ( !ret )
- register_keyhandler('v', vmcs_dump, "dump VT-x VMCSs", 1);
-+ else
-+ {
-+ setup_clear_cpu_cap(X86_FEATURE_VMX);
-+
-+ /*
-+ * _vmx_vcpu_up() may have made it past feature identification.
-+ * Make sure all dependent features are off as well.
-+ */
-+ vmx_basic_msr = 0;
-+ vmx_pin_based_exec_control = 0;
-+ vmx_cpu_based_exec_control = 0;
-+ vmx_secondary_exec_control = 0;
-+ vmx_vmexit_control = 0;
-+ vmx_vmentry_control = 0;
-+ vmx_ept_vpid_cap = 0;
-+ vmx_vmfunc = 0;
-+ }
-
- return ret;
- }
---
-2.44.0
-
diff --git a/0025-x86-irq-limit-interrupt-movement-done-by-fixup_irqs.patch b/0025-x86-irq-limit-interrupt-movement-done-by-fixup_irqs.patch
new file mode 100644
index 0000000..7c40bba
--- /dev/null
+++ b/0025-x86-irq-limit-interrupt-movement-done-by-fixup_irqs.patch
@@ -0,0 +1,104 @@
+From ce0a0cb0a74a909abf988f242aa228acdd2917fe Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Wed, 26 Jun 2024 13:39:11 +0200
+Subject: [PATCH 25/56] x86/irq: limit interrupt movement done by fixup_irqs()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current check used in fixup_irqs() to decide whether to move around
+interrupts is based on the affinity mask, but such mask can have all bits set,
+and hence is unlikely to be a subset of the input mask. For example if an
+interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
+an all set CPU mask would cause that interrupt to be shuffled around
+unconditionally.
+
+What fixup_irqs() cares about is evacuating interrupts from CPUs not set in the
+input CPU mask, and for that purpose it should check whether the interrupt is
+assigned to a CPU not present in the input mask. Assume that ->arch.cpu_mask
+is a subset of the ->affinity mask, and keep the current logic that resets the
+->affinity mask if the interrupt has to be shuffled around.
+
+Doing the affinity movement based on ->arch.cpu_mask requires removing the
+special handling to ->arch.cpu_mask done for high priority vectors, otherwise
+the adjustment done to cpu_mask makes them always skip the CPU interrupt
+movement.
+
+While there also adjust the comment as to the purpose of fixup_irqs().
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: c7564d7366d865cc407e3d64bca816d07edee174
+master date: 2024-06-12 14:30:40 +0200
+---
+ xen/arch/x86/include/asm/irq.h | 2 +-
+ xen/arch/x86/irq.c | 21 +++++++++++----------
+ 2 files changed, 12 insertions(+), 11 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
+index d7fb8ec7e8..71d4a8fc56 100644
+--- a/xen/arch/x86/include/asm/irq.h
++++ b/xen/arch/x86/include/asm/irq.h
+@@ -132,7 +132,7 @@ void free_domain_pirqs(struct domain *d);
+ int map_domain_emuirq_pirq(struct domain *d, int pirq, int emuirq);
+ int unmap_domain_pirq_emuirq(struct domain *d, int pirq);
+
+-/* Reset irq affinities to match the given CPU mask. */
++/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
+ void fixup_irqs(const cpumask_t *mask, bool verbose);
+ void fixup_eoi(void);
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index db14df93db..566331bec1 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -2529,7 +2529,7 @@ static int __init cf_check setup_dump_irqs(void)
+ }
+ __initcall(setup_dump_irqs);
+
+-/* Reset irq affinities to match the given CPU mask. */
++/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
+ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ {
+ unsigned int irq;
+@@ -2553,19 +2553,15 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+
+ vector = irq_to_vector(irq);
+ if ( vector >= FIRST_HIPRIORITY_VECTOR &&
+- vector <= LAST_HIPRIORITY_VECTOR )
++ vector <= LAST_HIPRIORITY_VECTOR &&
++ desc->handler == &no_irq_type )
+ {
+- cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
+-
+ /*
+ * This can in particular happen when parking secondary threads
+ * during boot and when the serial console wants to use a PCI IRQ.
+ */
+- if ( desc->handler == &no_irq_type )
+- {
+- spin_unlock(&desc->lock);
+- continue;
+- }
++ spin_unlock(&desc->lock);
++ continue;
+ }
+
+ if ( desc->arch.move_cleanup_count )
+@@ -2586,7 +2582,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ affinity);
+ }
+
+- if ( !desc->action || cpumask_subset(desc->affinity, mask) )
++ /*
++ * Avoid shuffling the interrupt around as long as current target CPUs
++ * are a subset of the input mask. What fixup_irqs() cares about is
++ * evacuating interrupts from CPUs not in the input mask.
++ */
++ if ( !desc->action || cpumask_subset(desc->arch.cpu_mask, mask) )
+ {
+ spin_unlock(&desc->lock);
+ continue;
+--
+2.45.2
+
diff --git a/0026-x86-EPT-correct-special-page-checking-in-epte_get_en.patch b/0026-x86-EPT-correct-special-page-checking-in-epte_get_en.patch
new file mode 100644
index 0000000..c94728a
--- /dev/null
+++ b/0026-x86-EPT-correct-special-page-checking-in-epte_get_en.patch
@@ -0,0 +1,46 @@
+From 6e647efaf2b02ce92bcf80bec47c18cca5084f8a Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Wed, 26 Jun 2024 13:39:44 +0200
+Subject: [PATCH 26/56] x86/EPT: correct special page checking in
+ epte_get_entry_emt()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+mfn_valid() granularity is (currently) 256Mb. Therefore the start of a
+1Gb page passing the test doesn't necessarily mean all parts of such a
+range would also pass. Yet using the result of mfn_to_page() on an MFN
+which doesn't pass mfn_valid() checking is liable to result in a crash
+(the invocation of mfn_to_page() alone is presumably "just" UB in such a
+case).
+
+Fixes: ca24b2ffdbd9 ("x86/hvm: set 'ipat' in EPT for special pages")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 5540b94e8191059eb9cbbe98ac316232a42208f6
+master date: 2024-06-13 16:53:34 +0200
+---
+ xen/arch/x86/mm/p2m-ept.c | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
+index 85c4e8e54f..1aa6bbc771 100644
+--- a/xen/arch/x86/mm/p2m-ept.c
++++ b/xen/arch/x86/mm/p2m-ept.c
+@@ -518,8 +518,12 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
+ }
+
+ for ( special_pgs = i = 0; i < (1ul << order); i++ )
+- if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
++ {
++ mfn_t cur = mfn_add(mfn, i);
++
++ if ( mfn_valid(cur) && is_special_page(mfn_to_page(cur)) )
+ special_pgs++;
++ }
+
+ if ( special_pgs )
+ {
+--
+2.45.2
+
diff --git a/0026-xen-sched-Fix-UB-shift-in-compat_set_timer_op.patch b/0026-xen-sched-Fix-UB-shift-in-compat_set_timer_op.patch
deleted file mode 100644
index 4b051ea..0000000
--- a/0026-xen-sched-Fix-UB-shift-in-compat_set_timer_op.patch
+++ /dev/null
@@ -1,86 +0,0 @@
-From b75bee183210318150e678e14b35224d7c73edb6 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 5 Mar 2024 11:57:02 +0100
-Subject: [PATCH 26/67] xen/sched: Fix UB shift in compat_set_timer_op()
-
-Tamas reported this UBSAN failure from fuzzing:
-
- (XEN) ================================================================================
- (XEN) UBSAN: Undefined behaviour in common/sched/compat.c:48:37
- (XEN) left shift of negative value -2147425536
- (XEN) ----[ Xen-4.19-unstable x86_64 debug=y ubsan=y Not tainted ]----
- ...
- (XEN) Xen call trace:
- (XEN) [<ffff82d040307c1c>] R ubsan.c#ubsan_epilogue+0xa/0xd9
- (XEN) [<ffff82d040308afb>] F __ubsan_handle_shift_out_of_bounds+0x11a/0x1c5
- (XEN) [<ffff82d040307758>] F compat_set_timer_op+0x41/0x43
- (XEN) [<ffff82d04040e4cc>] F hvm_do_multicall_call+0x77f/0xa75
- (XEN) [<ffff82d040519462>] F arch_do_multicall_call+0xec/0xf1
- (XEN) [<ffff82d040261567>] F do_multicall+0x1dc/0xde3
- (XEN) [<ffff82d04040d2b3>] F hvm_hypercall+0xa00/0x149a
- (XEN) [<ffff82d0403cd072>] F vmx_vmexit_handler+0x1596/0x279c
- (XEN) [<ffff82d0403d909b>] F vmx_asm_vmexit_handler+0xdb/0x200
-
-Left-shifting any negative value is strictly undefined behaviour in C, and
-the two parameters here come straight from the guest.
-
-The fuzzer happened to choose lo 0xf, hi 0x8000e300.
-
-Switch everything to be unsigned values, making the shift well defined.
-
-As GCC documents:
-
- As an extension to the C language, GCC does not use the latitude given in
- C99 and C11 only to treat certain aspects of signed '<<' as undefined.
- However, -fsanitize=shift (and -fsanitize=undefined) will diagnose such
- cases.
-
-this was deemed not to need an XSA.
-
-Note: The unsigned -> signed conversion for do_set_timer_op()'s s_time_t
-parameter is also well defined. C makes it implementation defined, and GCC
-defines it as reduction modulo 2^N to be within range of the new type.
-
-Fixes: 2942f45e09fb ("Enable compatibility mode operation for HYPERVISOR_sched_op and HYPERVISOR_set_timer_op.")
-Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: ae6d4fd876765e6d623eec67d14f5d0464be09cb
-master date: 2024-02-01 19:52:44 +0000
----
- xen/common/sched/compat.c | 4 ++--
- xen/include/hypercall-defs.c | 2 +-
- 2 files changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/xen/common/sched/compat.c b/xen/common/sched/compat.c
-index 040b4caca2..b827fdecb8 100644
---- a/xen/common/sched/compat.c
-+++ b/xen/common/sched/compat.c
-@@ -39,9 +39,9 @@ static int compat_poll(struct compat_sched_poll *compat)
-
- #include "core.c"
-
--int compat_set_timer_op(u32 lo, s32 hi)
-+int compat_set_timer_op(uint32_t lo, uint32_t hi)
- {
-- return do_set_timer_op(((s64)hi << 32) | lo);
-+ return do_set_timer_op(((uint64_t)hi << 32) | lo);
- }
-
- /*
-diff --git a/xen/include/hypercall-defs.c b/xen/include/hypercall-defs.c
-index 1896121074..c442dee284 100644
---- a/xen/include/hypercall-defs.c
-+++ b/xen/include/hypercall-defs.c
-@@ -127,7 +127,7 @@ xenoprof_op(int op, void *arg)
-
- #ifdef CONFIG_COMPAT
- prefix: compat
--set_timer_op(uint32_t lo, int32_t hi)
-+set_timer_op(uint32_t lo, uint32_t hi)
- multicall(multicall_entry_compat_t *call_list, uint32_t nr_calls)
- memory_op(unsigned int cmd, void *arg)
- #ifdef CONFIG_IOREQ_SERVER
---
-2.44.0
-
diff --git a/0027-x86-EPT-avoid-marking-non-present-entries-for-re-con.patch b/0027-x86-EPT-avoid-marking-non-present-entries-for-re-con.patch
new file mode 100644
index 0000000..23e8946
--- /dev/null
+++ b/0027-x86-EPT-avoid-marking-non-present-entries-for-re-con.patch
@@ -0,0 +1,85 @@
+From d31385be5c8e8bc5efb6f8848057bd0c69e8274a Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Wed, 26 Jun 2024 13:40:11 +0200
+Subject: [PATCH 27/56] x86/EPT: avoid marking non-present entries for
+ re-configuring
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+For non-present entries EMT, like most other fields, is meaningless to
+hardware. Make the logic in ept_set_entry() setting the field (and iPAT)
+conditional upon dealing with a present entry, leaving the value at 0
+otherwise. This has two effects for epte_get_entry_emt() which we'll
+want to leverage subsequently:
+1) The call moved here now won't be issued with INVALID_MFN anymore (a
+ respective BUG_ON() is being added).
+2) Neither of the other two calls could now be issued with a truncated
+ form of INVALID_MFN anymore (as long as there's no bug anywhere
+ marking an entry present when that was populated using INVALID_MFN).
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 777c71d31325bc55ba1cc3f317d4155fe519ab0b
+master date: 2024-06-13 16:54:17 +0200
+---
+ xen/arch/x86/mm/p2m-ept.c | 29 ++++++++++++++++++-----------
+ 1 file changed, 18 insertions(+), 11 deletions(-)
+
+diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
+index 1aa6bbc771..641d61b350 100644
+--- a/xen/arch/x86/mm/p2m-ept.c
++++ b/xen/arch/x86/mm/p2m-ept.c
+@@ -649,6 +649,8 @@ static int cf_check resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
+ if ( e.emt != MTRR_NUM_TYPES )
+ break;
+
++ ASSERT(is_epte_present(&e));
++
+ if ( level == 0 )
+ {
+ for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
+@@ -914,17 +916,6 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
+
+ if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
+ {
+- bool ipat;
+- int emt = epte_get_entry_emt(p2m->domain, _gfn(gfn), mfn,
+- i * EPT_TABLE_ORDER, &ipat,
+- p2mt);
+-
+- if ( emt >= 0 )
+- new_entry.emt = emt;
+- else /* ept_handle_misconfig() will need to take care of this. */
+- new_entry.emt = MTRR_NUM_TYPES;
+-
+- new_entry.ipat = ipat;
+ new_entry.sp = !!i;
+ new_entry.sa_p2mt = p2mt;
+ new_entry.access = p2ma;
+@@ -940,6 +931,22 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
+ need_modify_vtd_table = 0;
+
+ ept_p2m_type_to_flags(p2m, &new_entry);
++
++ if ( is_epte_present(&new_entry) )
++ {
++ bool ipat;
++ int emt = epte_get_entry_emt(p2m->domain, _gfn(gfn), mfn,
++ i * EPT_TABLE_ORDER, &ipat,
++ p2mt);
++
++ BUG_ON(mfn_eq(mfn, INVALID_MFN));
++
++ if ( emt >= 0 )
++ new_entry.emt = emt;
++ else /* ept_handle_misconfig() will need to take care of this. */
++ new_entry.emt = MTRR_NUM_TYPES;
++ new_entry.ipat = ipat;
++ }
+ }
+
+ if ( sve != -1 )
+--
+2.45.2
+
diff --git a/0027-x86-spec-print-the-built-in-SPECULATIVE_HARDEN_-opti.patch b/0027-x86-spec-print-the-built-in-SPECULATIVE_HARDEN_-opti.patch
deleted file mode 100644
index 845247a..0000000
--- a/0027-x86-spec-print-the-built-in-SPECULATIVE_HARDEN_-opti.patch
+++ /dev/null
@@ -1,54 +0,0 @@
-From 76ea2aab3652cc34e474de0905f0a9cd4df7d087 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 11:57:41 +0100
-Subject: [PATCH 27/67] x86/spec: print the built-in SPECULATIVE_HARDEN_*
- options
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Just like it's done for INDIRECT_THUNK and SHADOW_PAGING.
-
-Reported-by: Jan Beulich <jbeulich@suse.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 6e9507f7d51fe49df8bc70f83e49ce06c92e4e54
-master date: 2024-02-27 14:57:52 +0100
----
- xen/arch/x86/spec_ctrl.c | 14 +++++++++++++-
- 1 file changed, 13 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 661716d695..93f1cf3bb5 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -488,13 +488,25 @@ static void __init print_details(enum ind_thunk thunk)
- (e21a & cpufeat_mask(X86_FEATURE_SBPB)) ? " SBPB" : "");
-
- /* Compiled-in support which pertains to mitigations. */
-- if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
-+ if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
-+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
-+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
-+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
- printk(" Compiled-in support:"
- #ifdef CONFIG_INDIRECT_THUNK
- " INDIRECT_THUNK"
- #endif
- #ifdef CONFIG_SHADOW_PAGING
- " SHADOW_PAGING"
-+#endif
-+#ifdef CONFIG_SPECULATIVE_HARDEN_ARRAY
-+ " HARDEN_ARRAY"
-+#endif
-+#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
-+ " HARDEN_BRANCH"
-+#endif
-+#ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
-+ " HARDEN_GUEST_ACCESS"
- #endif
- "\n");
-
---
-2.44.0
-
diff --git a/0028-x86-EPT-drop-questionable-mfn_valid-from-epte_get_en.patch b/0028-x86-EPT-drop-questionable-mfn_valid-from-epte_get_en.patch
new file mode 100644
index 0000000..ee495d4
--- /dev/null
+++ b/0028-x86-EPT-drop-questionable-mfn_valid-from-epte_get_en.patch
@@ -0,0 +1,47 @@
+From 3b777c2ce4ea8cf67b79a5496e51201145606798 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Wed, 26 Jun 2024 13:40:35 +0200
+Subject: [PATCH 28/56] x86/EPT: drop questionable mfn_valid() from
+ epte_get_entry_emt()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+mfn_valid() is RAM-focused; it will often return false for MMIO. Yet
+access to actual MMIO space should not generally be restricted to UC
+only; especially video frame buffer accesses are unduly affected by such
+a restriction.
+
+Since, as of 777c71d31325 ("x86/EPT: avoid marking non-present entries
+for re-configuring"), the function won't be called with INVALID_MFN or,
+worse, truncated forms thereof anymore, we can fully drop that check.
+
+Fixes: 81fd0d3ca4b2 ("x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 4fdd8d75566fdad06667a79ec0ce6f43cc466c54
+master date: 2024-06-13 16:55:22 +0200
+---
+ xen/arch/x86/mm/p2m-ept.c | 6 ------
+ 1 file changed, 6 deletions(-)
+
+diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
+index 641d61b350..d325424e97 100644
+--- a/xen/arch/x86/mm/p2m-ept.c
++++ b/xen/arch/x86/mm/p2m-ept.c
+@@ -500,12 +500,6 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
+ return -1;
+ }
+
+- if ( !mfn_valid(mfn) )
+- {
+- *ipat = true;
+- return X86_MT_UC;
+- }
+-
+ /*
+ * Conditional must be kept in sync with the code in
+ * {iomem,ioports}_{permit,deny}_access().
+--
+2.45.2
+
diff --git a/0028-x86-spec-fix-INDIRECT_THUNK-option-to-only-be-set-wh.patch b/0028-x86-spec-fix-INDIRECT_THUNK-option-to-only-be-set-wh.patch
deleted file mode 100644
index dfbf516..0000000
--- a/0028-x86-spec-fix-INDIRECT_THUNK-option-to-only-be-set-wh.patch
+++ /dev/null
@@ -1,67 +0,0 @@
-From 693455c3c370e535eb6cd065800ff91e147815fa Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 11:58:04 +0100
-Subject: [PATCH 28/67] x86/spec: fix INDIRECT_THUNK option to only be set when
- build-enabled
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Attempt to provide a more helpful error message when the user attempts to set
-spec-ctrl=bti-thunk option but the support is build-time disabled.
-
-While there also adjust the command line documentation to mention
-CONFIG_INDIRECT_THUNK instead of INDIRECT_THUNK.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 8441fa806a3b778867867cd0159fa1722e90397e
-master date: 2024-02-27 14:58:20 +0100
----
- docs/misc/xen-command-line.pandoc | 10 +++++-----
- xen/arch/x86/spec_ctrl.c | 7 ++++++-
- 2 files changed, 11 insertions(+), 6 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 05f613c71c..2006697226 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2378,11 +2378,11 @@ guests to use.
- performance reasons dom0 is unprotected by default. If it is necessary to
- protect dom0 too, boot with `spec-ctrl=ibpb-entry`.
-
--If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
--select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
--locations. The default thunk is `retpoline` (generally preferred), with the
--alternatives being `jmp` (a `jmp *%reg` gadget, minimal overhead), and
--`lfence` (an `lfence; jmp *%reg` gadget).
-+If Xen was compiled with `CONFIG_INDIRECT_THUNK` support, `bti-thunk=` can be
-+used to select which of the thunks gets patched into the
-+`__x86_indirect_thunk_%reg` locations. The default thunk is `retpoline`
-+(generally preferred), with the alternatives being `jmp` (a `jmp *%reg` gadget,
-+minimal overhead), and `lfence` (an `lfence; jmp *%reg` gadget).
-
- On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
- `ibrs=` option can be used to force or prevent Xen using the feature itself.
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 93f1cf3bb5..098fa3184d 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -253,7 +253,12 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- {
- s += 10;
-
-- if ( !cmdline_strcmp(s, "retpoline") )
-+ if ( !IS_ENABLED(CONFIG_INDIRECT_THUNK) )
-+ {
-+ no_config_param("INDIRECT_THUNK", "spec-ctrl", s - 10, ss);
-+ rc = -EINVAL;
-+ }
-+ else if ( !cmdline_strcmp(s, "retpoline") )
- opt_thunk = THUNK_RETPOLINE;
- else if ( !cmdline_strcmp(s, "lfence") )
- opt_thunk = THUNK_LFENCE;
---
-2.44.0
-
diff --git a/0029-x86-Intel-unlock-CPUID-earlier-for-the-BSP.patch b/0029-x86-Intel-unlock-CPUID-earlier-for-the-BSP.patch
new file mode 100644
index 0000000..6722508
--- /dev/null
+++ b/0029-x86-Intel-unlock-CPUID-earlier-for-the-BSP.patch
@@ -0,0 +1,105 @@
+From c4b284912695a5802433512b913e968eda01544f Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Wed, 26 Jun 2024 13:41:05 +0200
+Subject: [PATCH 29/56] x86/Intel: unlock CPUID earlier for the BSP
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
+this bit is set by the BIOS then CPUID evaluation does not work when
+data from any leaf greater than two is needed; early_cpu_init() in
+particular wants to collect leaf 7 data.
+
+Cure this by unlocking CPUID right before evaluating anything which
+depends on the maximum CPUID leaf being greater than two.
+
+Inspired by (and description cloned from) Linux commit 0c2f6d04619e
+("x86/topology/intel: Unlock CPUID before evaluating anything").
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: fa4d026737a47cd1d66ffb797a29150b4453aa9f
+master date: 2024-06-18 15:12:44 +0200
+---
+ xen/arch/x86/cpu/common.c | 3 ++-
+ xen/arch/x86/cpu/cpu.h | 2 ++
+ xen/arch/x86/cpu/intel.c | 29 +++++++++++++++++------------
+ 3 files changed, 21 insertions(+), 13 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
+index 26eed2ade1..edec0a2546 100644
+--- a/xen/arch/x86/cpu/common.c
++++ b/xen/arch/x86/cpu/common.c
+@@ -336,7 +336,8 @@ void __init early_cpu_init(bool verbose)
+
+ c->x86_vendor = x86_cpuid_lookup_vendor(ebx, ecx, edx);
+ switch (c->x86_vendor) {
+- case X86_VENDOR_INTEL: actual_cpu = intel_cpu_dev; break;
++ case X86_VENDOR_INTEL: intel_unlock_cpuid_leaves(c);
++ actual_cpu = intel_cpu_dev; break;
+ case X86_VENDOR_AMD: actual_cpu = amd_cpu_dev; break;
+ case X86_VENDOR_CENTAUR: actual_cpu = centaur_cpu_dev; break;
+ case X86_VENDOR_SHANGHAI: actual_cpu = shanghai_cpu_dev; break;
+diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
+index e3d06278b3..8be65e975a 100644
+--- a/xen/arch/x86/cpu/cpu.h
++++ b/xen/arch/x86/cpu/cpu.h
+@@ -24,3 +24,5 @@ void amd_init_lfence(struct cpuinfo_x86 *c);
+ void amd_init_ssbd(const struct cpuinfo_x86 *c);
+ void amd_init_spectral_chicken(void);
+ void detect_zen2_null_seg_behaviour(void);
++
++void intel_unlock_cpuid_leaves(struct cpuinfo_x86 *c);
+diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
+index deb7b70464..0dc7c27601 100644
+--- a/xen/arch/x86/cpu/intel.c
++++ b/xen/arch/x86/cpu/intel.c
+@@ -303,10 +303,24 @@ static void __init noinline intel_init_levelling(void)
+ ctxt_switch_masking = intel_ctxt_switch_masking;
+ }
+
+-static void cf_check early_init_intel(struct cpuinfo_x86 *c)
++/* Unmask CPUID levels if masked. */
++void intel_unlock_cpuid_leaves(struct cpuinfo_x86 *c)
+ {
+- u64 misc_enable, disable;
++ uint64_t misc_enable, disable;
++
++ rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
++
++ disable = misc_enable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID;
++ if (disable) {
++ wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
++ bootsym(trampoline_misc_enable_off) |= disable;
++ c->cpuid_level = cpuid_eax(0);
++ printk(KERN_INFO "revised cpuid level: %u\n", c->cpuid_level);
++ }
++}
+
++static void cf_check early_init_intel(struct cpuinfo_x86 *c)
++{
+ /* Netburst reports 64 bytes clflush size, but does IO in 128 bytes */
+ if (c->x86 == 15 && c->x86_cache_alignment == 64)
+ c->x86_cache_alignment = 128;
+@@ -315,16 +329,7 @@ static void cf_check early_init_intel(struct cpuinfo_x86 *c)
+ bootsym(trampoline_misc_enable_off) & MSR_IA32_MISC_ENABLE_XD_DISABLE)
+ printk(KERN_INFO "re-enabled NX (Execute Disable) protection\n");
+
+- /* Unmask CPUID levels and NX if masked: */
+- rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
+-
+- disable = misc_enable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID;
+- if (disable) {
+- wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
+- bootsym(trampoline_misc_enable_off) |= disable;
+- printk(KERN_INFO "revised cpuid level: %d\n",
+- cpuid_eax(0));
+- }
++ intel_unlock_cpuid_leaves(c);
+
+ /* CPUID workaround for Intel 0F33/0F34 CPU */
+ if (boot_cpu_data.x86 == 0xF && boot_cpu_data.x86_model == 3 &&
+--
+2.45.2
+
diff --git a/0029-x86-spec-do-not-print-thunk-option-selection-if-not-.patch b/0029-x86-spec-do-not-print-thunk-option-selection-if-not-.patch
deleted file mode 100644
index 71e6633..0000000
--- a/0029-x86-spec-do-not-print-thunk-option-selection-if-not-.patch
+++ /dev/null
@@ -1,50 +0,0 @@
-From 0ce25b46ab2fb53a1b58f7682ca14971453f4f2c Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 11:58:36 +0100
-Subject: [PATCH 29/67] x86/spec: do not print thunk option selection if not
- built-in
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Since the thunk built-in enable is printed as part of the "Compiled-in
-support:" line, avoid printing anything in "Xen settings:" if the thunk is
-disabled at build time.
-
-Note the BTI-Thunk option printing is also adjusted to print a colon in the
-same way the other options on the line do.
-
-Requested-by: Jan Beulich <jbeulich@suse.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 576528a2a742069af203e90c613c5c93e23c9755
-master date: 2024-02-27 14:58:40 +0100
----
- xen/arch/x86/spec_ctrl.c | 11 ++++++-----
- 1 file changed, 6 insertions(+), 5 deletions(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 098fa3184d..25a18ac598 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -516,11 +516,12 @@ static void __init print_details(enum ind_thunk thunk)
- "\n");
-
- /* Settings for Xen's protection, irrespective of guests. */
-- printk(" Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
-- thunk == THUNK_NONE ? "N/A" :
-- thunk == THUNK_RETPOLINE ? "RETPOLINE" :
-- thunk == THUNK_LFENCE ? "LFENCE" :
-- thunk == THUNK_JMP ? "JMP" : "?",
-+ printk(" Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
-+ thunk != THUNK_NONE ? "BTI-Thunk: " : "",
-+ thunk == THUNK_NONE ? "" :
-+ thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
-+ thunk == THUNK_LFENCE ? "LFENCE, " :
-+ thunk == THUNK_JMP ? "JMP, " : "?, ",
- (!boot_cpu_has(X86_FEATURE_IBRSB) &&
- !boot_cpu_has(X86_FEATURE_IBRS)) ? "No" :
- (default_xen_spec_ctrl & SPEC_CTRL_IBRS) ? "IBRS+" : "IBRS-",
---
-2.44.0
-
diff --git a/0030-x86-irq-deal-with-old_cpu_mask-for-interrupts-in-mov.patch b/0030-x86-irq-deal-with-old_cpu_mask-for-interrupts-in-mov.patch
new file mode 100644
index 0000000..785df10
--- /dev/null
+++ b/0030-x86-irq-deal-with-old_cpu_mask-for-interrupts-in-mov.patch
@@ -0,0 +1,84 @@
+From 39a6170c15bf369a2b26c855ea7621387ed4070b Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Wed, 26 Jun 2024 13:41:35 +0200
+Subject: [PATCH 30/56] x86/irq: deal with old_cpu_mask for interrupts in
+ movement in fixup_irqs()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Given the current logic it's possible for ->arch.old_cpu_mask to get out of
+sync: if a CPU set in old_cpu_mask is offlined and then onlined
+again without old_cpu_mask having been updated, the data in the mask will no
+longer be accurate, as when brought back online the CPU will no longer have
+old_vector configured to handle the old interrupt source.
+
+If there's an interrupt movement in progress, and the to be offlined CPU (which
+is the call context) is in the old_cpu_mask, clear it and update the mask, so
+it doesn't contain stale data.
+
+Note that when the system is going down fixup_irqs() will be called by
+smp_send_stop() from CPU 0 with a mask with only CPU 0 on it, effectively
+asking to move all interrupts to the current caller (CPU 0) which is the only
+CPU to remain online. In that case we don't care to migrate interrupts that
+are in the process of being moved, as it's likely we won't be able to move all
+interrupts to CPU 0 due to vector shortage anyway.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 817d1cd627be668c358d038f0fadbf7d24d417d3
+master date: 2024-06-18 15:14:49 +0200
+---
+ xen/arch/x86/irq.c | 29 ++++++++++++++++++++++++++++-
+ 1 file changed, 28 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index 566331bec1..f877327975 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -2539,7 +2539,7 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ for ( irq = 0; irq < nr_irqs; irq++ )
+ {
+ bool break_affinity = false, set_affinity = true;
+- unsigned int vector;
++ unsigned int vector, cpu = smp_processor_id();
+ cpumask_t *affinity = this_cpu(scratch_cpumask);
+
+ if ( irq == 2 )
+@@ -2582,6 +2582,33 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ affinity);
+ }
+
++ if ( desc->arch.move_in_progress &&
++ /*
++ * Only attempt to adjust the mask if the current CPU is going
++ * offline, otherwise the whole system is going down and leaving
++ * stale data in the masks is fine.
++ */
++ !cpu_online(cpu) &&
++ cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
++ {
++ /*
++ * This CPU is going offline, remove it from ->arch.old_cpu_mask
++ * and possibly release the old vector if the old mask becomes
++ * empty.
++ *
++ * Note cleaning ->arch.old_cpu_mask is required if the CPU is
++ * brought offline and then online again, as when re-onlined the
++ * per-cpu vector table will no longer have ->arch.old_vector
++ * setup, and hence ->arch.old_cpu_mask would be stale.
++ */
++ cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
++ if ( cpumask_empty(desc->arch.old_cpu_mask) )
++ {
++ desc->arch.move_in_progress = 0;
++ release_old_vec(desc);
++ }
++ }
++
+ /*
+ * Avoid shuffling the interrupt around as long as current target CPUs
+ * are a subset of the input mask. What fixup_irqs() cares about is
+--
+2.45.2
+
diff --git a/0030-xen-livepatch-register-livepatch-regions-when-loaded.patch b/0030-xen-livepatch-register-livepatch-regions-when-loaded.patch
deleted file mode 100644
index f521ecc..0000000
--- a/0030-xen-livepatch-register-livepatch-regions-when-loaded.patch
+++ /dev/null
@@ -1,159 +0,0 @@
-From b11917de0cd261a878beaf50c18a689bde0b2f50 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 11:59:26 +0100
-Subject: [PATCH 30/67] xen/livepatch: register livepatch regions when loaded
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Currently livepatch regions are registered as virtual regions only after the
-livepatch has been applied.
-
-This can lead to issues when using the pre-apply or post-revert hooks, as at
-that point the livepatch is not in the virtual regions list. If a livepatch
-pre-apply hook contains a WARN() it would trigger an hypervisor crash, as the
-code to handle the bug frame won't be able to find the instruction pointer that
-triggered the #UD in any of the registered virtual regions, and hence crash.
-
-Fix this by adding the livepatch payloads as virtual regions as soon as loaded,
-and only remove them once the payload is unloaded. This requires some changes
-to the virtual regions code, as the removal of the virtual regions is no longer
-done in stop machine context, and hence an RCU barrier is added in order to
-make sure there are no users of the virtual region after it's been removed from
-the list.
-
-Fixes: 8313c864fa95 ('livepatch: Implement pre-|post- apply|revert hooks')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: a57b4074ab39bee78b6c116277f0a9963bd8e687
-master date: 2024-02-28 16:57:25 +0000
----
- xen/common/livepatch.c | 4 ++--
- xen/common/virtual_region.c | 44 ++++++++++++++-----------------------
- 2 files changed, 19 insertions(+), 29 deletions(-)
-
-diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
-index c2ae84d18b..537e9f33e4 100644
---- a/xen/common/livepatch.c
-+++ b/xen/common/livepatch.c
-@@ -1015,6 +1015,7 @@ static int build_symbol_table(struct payload *payload,
- static void free_payload(struct payload *data)
- {
- ASSERT(spin_is_locked(&payload_lock));
-+ unregister_virtual_region(&data->region);
- list_del(&data->list);
- payload_cnt--;
- payload_version++;
-@@ -1114,6 +1115,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
- INIT_LIST_HEAD(&data->list);
- INIT_LIST_HEAD(&data->applied_list);
-
-+ register_virtual_region(&data->region);
- list_add_tail(&data->list, &payload_list);
- payload_cnt++;
- payload_version++;
-@@ -1330,7 +1332,6 @@ static inline void apply_payload_tail(struct payload *data)
- * The applied_list is iterated by the trap code.
- */
- list_add_tail_rcu(&data->applied_list, &applied_list);
-- register_virtual_region(&data->region);
-
- data->state = LIVEPATCH_STATE_APPLIED;
- }
-@@ -1376,7 +1377,6 @@ static inline void revert_payload_tail(struct payload *data)
- * The applied_list is iterated by the trap code.
- */
- list_del_rcu(&data->applied_list);
-- unregister_virtual_region(&data->region);
-
- data->reverted = true;
- data->state = LIVEPATCH_STATE_CHECKED;
-diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
-index 5f89703f51..9f12c30efe 100644
---- a/xen/common/virtual_region.c
-+++ b/xen/common/virtual_region.c
-@@ -23,14 +23,8 @@ static struct virtual_region core_init __initdata = {
- };
-
- /*
-- * RCU locking. Additions are done either at startup (when there is only
-- * one CPU) or when all CPUs are running without IRQs.
-- *
-- * Deletions are bit tricky. We do it when Live Patch (all CPUs running
-- * without IRQs) or during bootup (when clearing the init).
-- *
-- * Hence we use list_del_rcu (which sports an memory fence) and a spinlock
-- * on deletion.
-+ * RCU locking. Modifications to the list must be done in exclusive mode, and
-+ * hence need to hold the spinlock.
- *
- * All readers of virtual_region_list MUST use list_for_each_entry_rcu.
- */
-@@ -58,41 +52,36 @@ const struct virtual_region *find_text_region(unsigned long addr)
-
- void register_virtual_region(struct virtual_region *r)
- {
-- ASSERT(!local_irq_is_enabled());
-+ unsigned long flags;
-
-+ spin_lock_irqsave(&virtual_region_lock, flags);
- list_add_tail_rcu(&r->list, &virtual_region_list);
-+ spin_unlock_irqrestore(&virtual_region_lock, flags);
- }
-
--static void remove_virtual_region(struct virtual_region *r)
-+/*
-+ * Suggest inline so when !CONFIG_LIVEPATCH the function is not left
-+ * unreachable after init code is removed.
-+ */
-+static void inline remove_virtual_region(struct virtual_region *r)
- {
- unsigned long flags;
-
- spin_lock_irqsave(&virtual_region_lock, flags);
- list_del_rcu(&r->list);
- spin_unlock_irqrestore(&virtual_region_lock, flags);
-- /*
-- * We do not need to invoke call_rcu.
-- *
-- * This is due to the fact that on the deletion we have made sure
-- * to use spinlocks (to guard against somebody else calling
-- * unregister_virtual_region) and list_deletion spiced with
-- * memory barrier.
-- *
-- * That protects us from corrupting the list as the readers all
-- * use list_for_each_entry_rcu which is safe against concurrent
-- * deletions.
-- */
- }
-
-+#ifdef CONFIG_LIVEPATCH
- void unregister_virtual_region(struct virtual_region *r)
- {
-- /* Expected to be called from Live Patch - which has IRQs disabled. */
-- ASSERT(!local_irq_is_enabled());
--
- remove_virtual_region(r);
-+
-+ /* Assert that no CPU might be using the removed region. */
-+ rcu_barrier();
- }
-
--#if defined(CONFIG_LIVEPATCH) && defined(CONFIG_X86)
-+#ifdef CONFIG_X86
- void relax_virtual_region_perms(void)
- {
- const struct virtual_region *region;
-@@ -116,7 +105,8 @@ void tighten_virtual_region_perms(void)
- PAGE_HYPERVISOR_RX);
- rcu_read_unlock(&rcu_virtual_region_lock);
- }
--#endif
-+#endif /* CONFIG_X86 */
-+#endif /* CONFIG_LIVEPATCH */
-
- void __init unregister_init_virtual_region(void)
- {
---
-2.44.0
-
diff --git a/0031-x86-irq-handle-moving-interrupts-in-_assign_irq_vect.patch b/0031-x86-irq-handle-moving-interrupts-in-_assign_irq_vect.patch
new file mode 100644
index 0000000..96e87cd
--- /dev/null
+++ b/0031-x86-irq-handle-moving-interrupts-in-_assign_irq_vect.patch
@@ -0,0 +1,172 @@
+From 3a8f4ec75d8ed8da6370deac95c341cbada96802 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Wed, 26 Jun 2024 13:42:05 +0200
+Subject: [PATCH 31/56] x86/irq: handle moving interrupts in
+ _assign_irq_vector()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Currently there's logic in fixup_irqs() that attempts to prevent
+_assign_irq_vector() from failing, as fixup_irqs() is required to evacuate all
+interrupts from the CPUs not present in the input mask. The current logic in
+fixup_irqs() is incomplete, as it doesn't deal with interrupts that have
+move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.
+
+Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so that
+_assign_irq_vector() cannot fail, introduce logic in _assign_irq_vector()
+to deal with interrupts that have either move_{in_progress,cleanup_count} set
+and no remaining online CPUs in ->arch.cpu_mask.
+
+If _assign_irq_vector() is requested to move an interrupt in the state
+described above, first attempt to see if ->arch.old_cpu_mask contains any valid
+CPUs that could be used as fallback, and if that's the case do move the
+interrupt back to the previous destination. Note this is easier because the
+vector hasn't been released yet, so there's no need to allocate and setup a new
+vector on the destination.
+
+Due to the logic in fixup_irqs() that clears offline CPUs from
+->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) it
+shouldn't be possible to get into _assign_irq_vector() with
+->arch.move_{in_progress,cleanup_count} set but no online CPUs in
+->arch.old_cpu_mask.
+
+However if ->arch.move_{in_progress,cleanup_count} is set and the interrupt has
+also changed affinity, it's possible the members of ->arch.old_cpu_mask are no
+longer part of the affinity set; in that case move the interrupt to a CPU in
+the provided mask and keep the current ->arch.old_{cpu_mask,vector} for the
+pending interrupt movement to be completed.
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 369558924a642bbb0cb731e9a3375958867cb17b
+master date: 2024-06-18 15:15:10 +0200
+---
+ xen/arch/x86/irq.c | 97 ++++++++++++++++++++++++++++++++--------------
+ 1 file changed, 68 insertions(+), 29 deletions(-)
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index f877327975..13ef61a5b7 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -553,7 +553,58 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
+ }
+
+ if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
+- return -EAGAIN;
++ {
++ /*
++ * If the current destination is online refuse to shuffle. Retry after
++ * the in-progress movement has finished.
++ */
++ if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) )
++ return -EAGAIN;
++
++ /*
++ * Due to the logic in fixup_irqs() that clears offlined CPUs from
++ * ->arch.old_cpu_mask it shouldn't be possible to get here with
++ * ->arch.move_{in_progress,cleanup_count} set and no online CPUs in
++ * ->arch.old_cpu_mask.
++ */
++ ASSERT(valid_irq_vector(desc->arch.old_vector));
++ ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, &cpu_online_map));
++
++ if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
++ {
++ /*
++ * Fallback to the old destination if moving is in progress and the
++ * current destination is to be offlined. This is only possible if
++ * the CPUs in old_cpu_mask intersect with the affinity mask passed
++ * in the 'mask' parameter.
++ */
++ desc->arch.vector = desc->arch.old_vector;
++ cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, mask);
++
++ /* Undo any possibly done cleanup. */
++ for_each_cpu(cpu, desc->arch.cpu_mask)
++ per_cpu(vector_irq, cpu)[desc->arch.vector] = irq;
++
++ /* Cancel the pending move and release the current vector. */
++ desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
++ cpumask_clear(desc->arch.old_cpu_mask);
++ desc->arch.move_in_progress = 0;
++ desc->arch.move_cleanup_count = 0;
++ if ( desc->arch.used_vectors )
++ {
++ ASSERT(test_bit(old_vector, desc->arch.used_vectors));
++ clear_bit(old_vector, desc->arch.used_vectors);
++ }
++
++ return 0;
++ }
++
++ /*
++ * There's an interrupt movement in progress but the destination(s) in
++ * ->arch.old_cpu_mask are not suitable given the 'mask' parameter, go
++ * through the full logic to find a new vector in a suitable CPU.
++ */
++ }
+
+ err = -ENOSPC;
+
+@@ -609,7 +660,22 @@ next:
+ current_vector = vector;
+ current_offset = offset;
+
+- if ( valid_irq_vector(old_vector) )
++ if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
++ {
++ ASSERT(!cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map));
++ /*
++ * Special case when evacuating an interrupt from a CPU to be
++ * offlined and the interrupt was already in the process of being
++ * moved. Leave ->arch.old_{vector,cpu_mask} as-is and just
++ * replace ->arch.{cpu_mask,vector} with the new destination.
++ * Cleanup will be done normally for the old fields, just release
++ * the current vector here.
++ */
++ if ( desc->arch.used_vectors &&
++ !test_and_clear_bit(old_vector, desc->arch.used_vectors) )
++ ASSERT_UNREACHABLE();
++ }
++ else if ( valid_irq_vector(old_vector) )
+ {
+ cpumask_and(desc->arch.old_cpu_mask, desc->arch.cpu_mask,
+ &cpu_online_map);
+@@ -2620,33 +2686,6 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ continue;
+ }
+
+- /*
+- * In order for the affinity adjustment below to be successful, we
+- * need _assign_irq_vector() to succeed. This in particular means
+- * clearing desc->arch.move_in_progress if this would otherwise
+- * prevent the function from succeeding. Since there's no way for the
+- * flag to get cleared anymore when there's no possible destination
+- * left (the only possibility then would be the IRQs enabled window
+- * after this loop), there's then also no race with us doing it here.
+- *
+- * Therefore the logic here and there need to remain in sync.
+- */
+- if ( desc->arch.move_in_progress &&
+- !cpumask_intersects(mask, desc->arch.cpu_mask) )
+- {
+- unsigned int cpu;
+-
+- cpumask_and(affinity, desc->arch.old_cpu_mask, &cpu_online_map);
+-
+- spin_lock(&vector_lock);
+- for_each_cpu(cpu, affinity)
+- per_cpu(vector_irq, cpu)[desc->arch.old_vector] = ~irq;
+- spin_unlock(&vector_lock);
+-
+- release_old_vec(desc);
+- desc->arch.move_in_progress = 0;
+- }
+-
+ if ( !cpumask_intersects(mask, desc->affinity) )
+ {
+ break_affinity = true;
+--
+2.45.2
+
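The fallback decision described in the commit message of the patch above comes
down to a little mask arithmetic. The following stand-alone C sketch uses
hypothetical types and names (not Xen code) purely to illustrate the three
possible outcomes when a pending movement meets a new destination request:

    #include <stdio.h>

    /* Hypothetical, simplified model of the ->arch state consulted above. */
    struct fake_arch_irq {
        unsigned long cpu_mask;      /* current destination CPUs (bitmask) */
        unsigned long old_cpu_mask;  /* previous destination, movement pending */
    };

    enum evac_action { EVAC_RETRY, EVAC_FALL_BACK_TO_OLD, EVAC_NEW_VECTOR };

    /* Decision taken when a move is pending and a new destination is needed. */
    static enum evac_action evac_decision(const struct fake_arch_irq *a,
                                          unsigned long online,
                                          unsigned long allowed)
    {
        if ( a->cpu_mask & online )
            return EVAC_RETRY;            /* current destination alive: -EAGAIN */
        if ( a->old_cpu_mask & allowed & online )
            return EVAC_FALL_BACK_TO_OLD; /* old vector still programmed there */
        return EVAC_NEW_VECTOR;           /* fall through to the full allocator */
    }

    int main(void)
    {
        struct fake_arch_irq irq = { .cpu_mask = 1UL << 3,
                                     .old_cpu_mask = 1UL << 1 };

        /* CPU 3 going offline, CPUs 0-2 online, affinity allows CPUs 1-2. */
        printf("action: %d\n", evac_decision(&irq, 0x7, 0x6));
        return 0;
    }
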
diff --git a/0031-xen-livepatch-search-for-symbols-in-all-loaded-paylo.patch b/0031-xen-livepatch-search-for-symbols-in-all-loaded-paylo.patch
deleted file mode 100644
index c778639..0000000
--- a/0031-xen-livepatch-search-for-symbols-in-all-loaded-paylo.patch
+++ /dev/null
@@ -1,149 +0,0 @@
-From c54cf903b06fb1933fad053cc547580c92c856ea Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 11:59:35 +0100
-Subject: [PATCH 31/67] xen/livepatch: search for symbols in all loaded
- payloads
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-When checking if an address belongs to a patch, or when resolving a symbol,
-take into account all loaded livepatch payloads, even if not applied.
-
-This is required in order for the pre-apply and post-revert hooks to work
-properly, or else Xen won't detect the instruction pointer belonging to those
-hooks as being part of the currently active text.
-
-Move the RCU handling to be used for payload_list instead of applied_list, as
-now the calls from trap code will iterate over the payload_list.
-
-Fixes: 8313c864fa95 ('livepatch: Implement pre-|post- apply|revert hooks')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: d2daa40fb3ddb8f83e238e57854bd878924cde90
-master date: 2024-02-28 16:57:25 +0000
----
- xen/common/livepatch.c | 49 +++++++++++++++---------------------------
- 1 file changed, 17 insertions(+), 32 deletions(-)
-
-diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
-index 537e9f33e4..a129ab9973 100644
---- a/xen/common/livepatch.c
-+++ b/xen/common/livepatch.c
-@@ -36,13 +36,14 @@
- * caller in schedule_work.
- */
- static DEFINE_SPINLOCK(payload_lock);
--static LIST_HEAD(payload_list);
--
- /*
-- * Patches which have been applied. Need RCU in case we crash (and then
-- * traps code would iterate via applied_list) when adding entries on the list.
-+ * Need RCU in case we crash (and then traps code would iterate via
-+ * payload_list) when adding entries on the list.
- */
--static DEFINE_RCU_READ_LOCK(rcu_applied_lock);
-+static DEFINE_RCU_READ_LOCK(rcu_payload_lock);
-+static LIST_HEAD(payload_list);
-+
-+/* Patches which have been applied. Only modified from stop machine context. */
- static LIST_HEAD(applied_list);
-
- static unsigned int payload_cnt;
-@@ -111,12 +112,8 @@ bool_t is_patch(const void *ptr)
- const struct payload *data;
- bool_t r = 0;
-
-- /*
-- * Only RCU locking since this list is only ever changed during apply
-- * or revert context. And in case it dies there we need an safe list.
-- */
-- rcu_read_lock(&rcu_applied_lock);
-- list_for_each_entry_rcu ( data, &applied_list, applied_list )
-+ rcu_read_lock(&rcu_payload_lock);
-+ list_for_each_entry_rcu ( data, &payload_list, list )
- {
- if ( (ptr >= data->rw_addr &&
- ptr < (data->rw_addr + data->rw_size)) ||
-@@ -130,7 +127,7 @@ bool_t is_patch(const void *ptr)
- }
-
- }
-- rcu_read_unlock(&rcu_applied_lock);
-+ rcu_read_unlock(&rcu_payload_lock);
-
- return r;
- }
-@@ -166,12 +163,8 @@ static const char *cf_check livepatch_symbols_lookup(
- const void *va = (const void *)addr;
- const char *n = NULL;
-
-- /*
-- * Only RCU locking since this list is only ever changed during apply
-- * or revert context. And in case it dies there we need an safe list.
-- */
-- rcu_read_lock(&rcu_applied_lock);
-- list_for_each_entry_rcu ( data, &applied_list, applied_list )
-+ rcu_read_lock(&rcu_payload_lock);
-+ list_for_each_entry_rcu ( data, &payload_list, list )
- {
- if ( va < data->text_addr ||
- va >= (data->text_addr + data->text_size) )
-@@ -200,7 +193,7 @@ static const char *cf_check livepatch_symbols_lookup(
- n = data->symtab[best].name;
- break;
- }
-- rcu_read_unlock(&rcu_applied_lock);
-+ rcu_read_unlock(&rcu_payload_lock);
-
- return n;
- }
-@@ -1016,7 +1009,8 @@ static void free_payload(struct payload *data)
- {
- ASSERT(spin_is_locked(&payload_lock));
- unregister_virtual_region(&data->region);
-- list_del(&data->list);
-+ list_del_rcu(&data->list);
-+ rcu_barrier();
- payload_cnt--;
- payload_version++;
- free_payload_data(data);
-@@ -1116,7 +1110,7 @@ static int livepatch_upload(struct xen_sysctl_livepatch_upload *upload)
- INIT_LIST_HEAD(&data->applied_list);
-
- register_virtual_region(&data->region);
-- list_add_tail(&data->list, &payload_list);
-+ list_add_tail_rcu(&data->list, &payload_list);
- payload_cnt++;
- payload_version++;
- }
-@@ -1327,11 +1321,7 @@ static int apply_payload(struct payload *data)
-
- static inline void apply_payload_tail(struct payload *data)
- {
-- /*
-- * We need RCU variant (which has barriers) in case we crash here.
-- * The applied_list is iterated by the trap code.
-- */
-- list_add_tail_rcu(&data->applied_list, &applied_list);
-+ list_add_tail(&data->applied_list, &applied_list);
-
- data->state = LIVEPATCH_STATE_APPLIED;
- }
-@@ -1371,12 +1361,7 @@ static int revert_payload(struct payload *data)
-
- static inline void revert_payload_tail(struct payload *data)
- {
--
-- /*
-- * We need RCU variant (which has barriers) in case we crash here.
-- * The applied_list is iterated by the trap code.
-- */
-- list_del_rcu(&data->applied_list);
-+ list_del(&data->applied_list);
-
- data->reverted = true;
- data->state = LIVEPATCH_STATE_CHECKED;
---
-2.44.0
-
diff --git a/0032-xen-livepatch-fix-norevert-test-attempt-to-open-code.patch b/0032-xen-livepatch-fix-norevert-test-attempt-to-open-code.patch
deleted file mode 100644
index 76af9ef..0000000
--- a/0032-xen-livepatch-fix-norevert-test-attempt-to-open-code.patch
+++ /dev/null
@@ -1,186 +0,0 @@
-From 5564323f643715f9d364df88e0eb9c7d6fd2c22b Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 11:59:43 +0100
-Subject: [PATCH 32/67] xen/livepatch: fix norevert test attempt to open-code
- revert
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The purpose of the norevert test is to install a dummy handler that replaces
-the internal Xen revert code, and then perform the revert in the post-revert
-hook. For that purpose the usage of the previous common_livepatch_revert() is
-not enough, as that just reverts specific functions, but not the whole state of
-the payload.
-
-Remove both common_livepatch_{apply,revert}() and instead expose
-revert_payload{,_tail}() in order to perform the patch revert from the
-post-revert hook.
-
-Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: cdae267ce10d04d71d1687b5701ff2911a96b6dc
-master date: 2024-02-28 16:57:25 +0000
----
- xen/common/livepatch.c | 41 +++++++++++++++++--
- xen/include/xen/livepatch.h | 32 ++-------------
- .../livepatch/xen_action_hooks_norevert.c | 22 +++-------
- 3 files changed, 46 insertions(+), 49 deletions(-)
-
-diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
-index a129ab9973..a5068a2217 100644
---- a/xen/common/livepatch.c
-+++ b/xen/common/livepatch.c
-@@ -1310,7 +1310,22 @@ static int apply_payload(struct payload *data)
- ASSERT(!local_irq_is_enabled());
-
- for ( i = 0; i < data->nfuncs; i++ )
-- common_livepatch_apply(&data->funcs[i], &data->fstate[i]);
-+ {
-+ const struct livepatch_func *func = &data->funcs[i];
-+ struct livepatch_fstate *state = &data->fstate[i];
-+
-+ /* If the action has been already executed on this function, do nothing. */
-+ if ( state->applied == LIVEPATCH_FUNC_APPLIED )
-+ {
-+ printk(XENLOG_WARNING LIVEPATCH
-+ "%s: %s has been already applied before\n",
-+ __func__, func->name);
-+ continue;
-+ }
-+
-+ arch_livepatch_apply(func, state);
-+ state->applied = LIVEPATCH_FUNC_APPLIED;
-+ }
-
- arch_livepatch_revive();
-
-@@ -1326,7 +1341,7 @@ static inline void apply_payload_tail(struct payload *data)
- data->state = LIVEPATCH_STATE_APPLIED;
- }
-
--static int revert_payload(struct payload *data)
-+int revert_payload(struct payload *data)
- {
- unsigned int i;
- int rc;
-@@ -1341,7 +1356,25 @@ static int revert_payload(struct payload *data)
- }
-
- for ( i = 0; i < data->nfuncs; i++ )
-- common_livepatch_revert(&data->funcs[i], &data->fstate[i]);
-+ {
-+ const struct livepatch_func *func = &data->funcs[i];
-+ struct livepatch_fstate *state = &data->fstate[i];
-+
-+ /*
-+ * If the apply action hasn't been executed on this function, do
-+ * nothing.
-+ */
-+ if ( !func->old_addr || state->applied == LIVEPATCH_FUNC_NOT_APPLIED )
-+ {
-+ printk(XENLOG_WARNING LIVEPATCH
-+ "%s: %s has not been applied before\n",
-+ __func__, func->name);
-+ continue;
-+ }
-+
-+ arch_livepatch_revert(func, state);
-+ state->applied = LIVEPATCH_FUNC_NOT_APPLIED;
-+ }
-
- /*
- * Since we are running with IRQs disabled and the hooks may call common
-@@ -1359,7 +1392,7 @@ static int revert_payload(struct payload *data)
- return 0;
- }
-
--static inline void revert_payload_tail(struct payload *data)
-+void revert_payload_tail(struct payload *data)
- {
- list_del(&data->applied_list);
-
-diff --git a/xen/include/xen/livepatch.h b/xen/include/xen/livepatch.h
-index 537d3d58b6..c9ee58fd37 100644
---- a/xen/include/xen/livepatch.h
-+++ b/xen/include/xen/livepatch.h
-@@ -136,35 +136,11 @@ void arch_livepatch_post_action(void);
- void arch_livepatch_mask(void);
- void arch_livepatch_unmask(void);
-
--static inline void common_livepatch_apply(const struct livepatch_func *func,
-- struct livepatch_fstate *state)
--{
-- /* If the action has been already executed on this function, do nothing. */
-- if ( state->applied == LIVEPATCH_FUNC_APPLIED )
-- {
-- printk(XENLOG_WARNING LIVEPATCH "%s: %s has been already applied before\n",
-- __func__, func->name);
-- return;
-- }
--
-- arch_livepatch_apply(func, state);
-- state->applied = LIVEPATCH_FUNC_APPLIED;
--}
-+/* Only for testing purposes. */
-+struct payload;
-+int revert_payload(struct payload *data);
-+void revert_payload_tail(struct payload *data);
-
--static inline void common_livepatch_revert(const struct livepatch_func *func,
-- struct livepatch_fstate *state)
--{
-- /* If the apply action hasn't been executed on this function, do nothing. */
-- if ( !func->old_addr || state->applied == LIVEPATCH_FUNC_NOT_APPLIED )
-- {
-- printk(XENLOG_WARNING LIVEPATCH "%s: %s has not been applied before\n",
-- __func__, func->name);
-- return;
-- }
--
-- arch_livepatch_revert(func, state);
-- state->applied = LIVEPATCH_FUNC_NOT_APPLIED;
--}
- #else
-
- /*
-diff --git a/xen/test/livepatch/xen_action_hooks_norevert.c b/xen/test/livepatch/xen_action_hooks_norevert.c
-index c173855192..c5fbab1746 100644
---- a/xen/test/livepatch/xen_action_hooks_norevert.c
-+++ b/xen/test/livepatch/xen_action_hooks_norevert.c
-@@ -96,26 +96,14 @@ static int revert_hook(livepatch_payload_t *payload)
-
- static void post_revert_hook(livepatch_payload_t *payload)
- {
-- int i;
-+ unsigned long flags;
-
- printk(KERN_DEBUG "%s: Hook starting.\n", __func__);
-
-- for (i = 0; i < payload->nfuncs; i++)
-- {
-- const struct livepatch_func *func = &payload->funcs[i];
-- struct livepatch_fstate *fstate = &payload->fstate[i];
--
-- BUG_ON(revert_cnt != 1);
-- BUG_ON(fstate->applied != LIVEPATCH_FUNC_APPLIED);
--
-- /* Outside of quiesce zone: MAY TRIGGER HOST CRASH/UNDEFINED BEHAVIOR */
-- arch_livepatch_quiesce();
-- common_livepatch_revert(payload);
-- arch_livepatch_revive();
-- BUG_ON(fstate->applied == LIVEPATCH_FUNC_APPLIED);
--
-- printk(KERN_DEBUG "%s: post reverted: %s\n", __func__, func->name);
-- }
-+ local_irq_save(flags);
-+ BUG_ON(revert_payload(payload));
-+ revert_payload_tail(payload);
-+ local_irq_restore(flags);
-
- printk(KERN_DEBUG "%s: Hook done.\n", __func__);
- }
---
-2.44.0
-
diff --git a/0032-xen-ubsan-Fix-UB-in-type_descriptor-declaration.patch b/0032-xen-ubsan-Fix-UB-in-type_descriptor-declaration.patch
new file mode 100644
index 0000000..c7c0968
--- /dev/null
+++ b/0032-xen-ubsan-Fix-UB-in-type_descriptor-declaration.patch
@@ -0,0 +1,39 @@
+From 5397ab9995f7354e7f8122a8a91c810256afa3d1 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 26 Jun 2024 13:42:30 +0200
+Subject: [PATCH 32/56] xen/ubsan: Fix UB in type_descriptor declaration
+
+struct type_descriptor is arranged with a NUL terminated string following the
+kind/info fields.
+
+The only reason this doesn't trip UBSAN detection itself (on more modern
+compilers at least) is because struct type_descriptor is only referenced in
+suppressed regions.
+
+Switch the declaration to be a real flexible array member. No functional change.
+
+Fixes: 00fcf4dd8eb4 ("xen/ubsan: Import ubsan implementation from Linux 4.13")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: bd59af99700f075d06a6d47a16f777c9519928e0
+master date: 2024-06-18 14:55:04 +0100
+---
+ xen/common/ubsan/ubsan.h | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/common/ubsan/ubsan.h b/xen/common/ubsan/ubsan.h
+index a3159040fe..3db42e75b1 100644
+--- a/xen/common/ubsan/ubsan.h
++++ b/xen/common/ubsan/ubsan.h
+@@ -10,7 +10,7 @@ enum {
+ struct type_descriptor {
+ u16 type_kind;
+ u16 type_info;
+- char type_name[1];
++ char type_name[];
+ };
+
+ struct source_location {
+--
+2.45.2
+
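The one-character change above relies on a general C property: indexing past
the declared size of a trailing one-element array is undefined behaviour even
when extra storage was allocated, while a true flexible array member is well
defined for any access inside the allocation. A minimal stand-alone
illustration (not Xen code) of the two declarations:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Old style: reads/writes beyond type_name[0] are UB despite the larger
     * allocation, which is what a UBSAN bounds check can flag. */
    struct old_desc { unsigned short kind, info; char type_name[1]; };

    /* New style: a real flexible array member; any access within the
     * allocated size is well defined. */
    struct new_desc { unsigned short kind, info; char type_name[]; };

    int main(void)
    {
        const char *name = "int";
        struct new_desc *d = malloc(sizeof(*d) + strlen(name) + 1);

        if ( !d )
            return 1;
        strcpy(d->type_name, name);   /* fits in the allocation */
        printf("%s\n", d->type_name);
        free(d);
        return 0;
    }
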
diff --git a/0033-x86-xstate-Fix-initialisation-of-XSS-cache.patch b/0033-x86-xstate-Fix-initialisation-of-XSS-cache.patch
new file mode 100644
index 0000000..1a8c724
--- /dev/null
+++ b/0033-x86-xstate-Fix-initialisation-of-XSS-cache.patch
@@ -0,0 +1,74 @@
+From 4ee1df89d9c92609e5fff3c9b261ce4b1bb88e42 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 26 Jun 2024 13:43:19 +0200
+Subject: [PATCH 33/56] x86/xstate: Fix initialisation of XSS cache
+
+The clobbering of this_cpu(xcr0) and this_cpu(xss) to architecturally invalid
+values is to force the subsequent set_xcr0() and set_msr_xss() to reload the
+hardware register.
+
+While XCR0 is reloaded in xstate_init(), MSR_XSS isn't. This causes
+get_msr_xss() to return the invalid value, and logic of the form:
+
+ old = get_msr_xss();
+ set_msr_xss(new);
+ ...
+ set_msr_xss(old);
+
+to try and restore said invalid value.
+
+The architecturally invalid value must be purged from the cache, meaning the
+hardware register must be written at least once. This in turn highlights that
+the invalid value must only be used in the case that the hardware register is
+available.
+
+Fixes: f7f4a523927f ("x86/xstate: reset cached register values on resume")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 9e6dbbe8bf400aacb99009ddffa91d2a0c312b39
+master date: 2024-06-19 13:00:06 +0100
+---
+ xen/arch/x86/xstate.c | 18 +++++++++++-------
+ 1 file changed, 11 insertions(+), 7 deletions(-)
+
+diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
+index f442610fc5..ca76f98fe2 100644
+--- a/xen/arch/x86/xstate.c
++++ b/xen/arch/x86/xstate.c
+@@ -641,13 +641,6 @@ void xstate_init(struct cpuinfo_x86 *c)
+ return;
+ }
+
+- /*
+- * Zap the cached values to make set_xcr0() and set_msr_xss() really
+- * write it.
+- */
+- this_cpu(xcr0) = 0;
+- this_cpu(xss) = ~0;
+-
+ cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
+ feature_mask = (((u64)edx << 32) | eax) & XCNTXT_MASK;
+ BUG_ON(!valid_xcr0(feature_mask));
+@@ -657,8 +650,19 @@ void xstate_init(struct cpuinfo_x86 *c)
+ * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size.
+ */
+ set_in_cr4(X86_CR4_OSXSAVE);
++
++ /*
++ * Zap the cached values to make set_xcr0() and set_msr_xss() really write
++ * the hardware register.
++ */
++ this_cpu(xcr0) = 0;
+ if ( !set_xcr0(feature_mask) )
+ BUG();
++ if ( cpu_has_xsaves )
++ {
++ this_cpu(xss) = ~0;
++ set_msr_xss(0);
++ }
+
+ if ( bsp )
+ {
+--
+2.45.2
+
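The pattern being fixed above - keep a software copy of an MSR so redundant
writes can be skipped, and poison that copy only when the register is both
present and about to be rewritten - can be sketched generically. All names
below are hypothetical stand-ins, not the actual Xen helpers:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t cached_xss;           /* software copy of the MSR */
    static uint64_t hw_xss;               /* stand-in for the real register */
    static bool xsaves_supported = true;  /* stand-in for cpu_has_xsaves */

    static void set_msr_xss(uint64_t val)
    {
        if ( cached_xss != val )          /* the cache skips redundant writes */
        {
            hw_xss = val;                 /* the "wrmsr" */
            cached_xss = val;
        }
    }

    static void xss_cache_init(void)
    {
        if ( !xsaves_supported )
            return;                       /* never poison the cache if the MSR
                                             can't be written back */
        cached_xss = ~0ULL;               /* architecturally invalid marker ... */
        set_msr_xss(0);                   /* ... purged at once by a real write */
    }

    int main(void)
    {
        xss_cache_init();
        printf("cached=%#llx hw=%#llx\n",
               (unsigned long long)cached_xss, (unsigned long long)hw_xss);
        return 0;
    }
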
diff --git a/0033-xen-livepatch-properly-build-the-noapply-and-norever.patch b/0033-xen-livepatch-properly-build-the-noapply-and-norever.patch
deleted file mode 100644
index 76803c6..0000000
--- a/0033-xen-livepatch-properly-build-the-noapply-and-norever.patch
+++ /dev/null
@@ -1,43 +0,0 @@
-From a59106b27609b6ae2873bd6755949b1258290872 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 11:59:51 +0100
-Subject: [PATCH 33/67] xen/livepatch: properly build the noapply and norevert
- tests
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-It seems the build variables for those tests where copy-pasted from
-xen_action_hooks_marker-objs and not adjusted to use the correct source files.
-
-Fixes: 6047104c3ccc ('livepatch: Add per-function applied/reverted state tracking marker')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: e579677095782c7dec792597ba8b037b7d716b32
-master date: 2024-02-28 16:57:25 +0000
----
- xen/test/livepatch/Makefile | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/xen/test/livepatch/Makefile b/xen/test/livepatch/Makefile
-index c258ab0b59..d987a8367f 100644
---- a/xen/test/livepatch/Makefile
-+++ b/xen/test/livepatch/Makefile
-@@ -118,12 +118,12 @@ xen_action_hooks_marker-objs := xen_action_hooks_marker.o xen_hello_world_func.o
- $(obj)/xen_action_hooks_noapply.o: $(obj)/config.h
-
- extra-y += xen_action_hooks_noapply.livepatch
--xen_action_hooks_noapply-objs := xen_action_hooks_marker.o xen_hello_world_func.o note.o xen_note.o
-+xen_action_hooks_noapply-objs := xen_action_hooks_noapply.o xen_hello_world_func.o note.o xen_note.o
-
- $(obj)/xen_action_hooks_norevert.o: $(obj)/config.h
-
- extra-y += xen_action_hooks_norevert.livepatch
--xen_action_hooks_norevert-objs := xen_action_hooks_marker.o xen_hello_world_func.o note.o xen_note.o
-+xen_action_hooks_norevert-objs := xen_action_hooks_norevert.o xen_hello_world_func.o note.o xen_note.o
-
- EXPECT_BYTES_COUNT := 8
- CODE_GET_EXPECT=$(shell $(OBJDUMP) -d --insn-width=1 $(1) | sed -n -e '/<'$(2)'>:$$/,/^$$/ p' | tail -n +2 | head -n $(EXPECT_BYTES_COUNT) | awk '{$$0=$$2; printf "%s", substr($$0,length-1)}' | sed 's/.\{2\}/0x&,/g' | sed 's/^/{/;s/,$$/}/g')
---
-2.44.0
-
diff --git a/0034-libxl-Fix-segfault-in-device_model_spawn_outcome.patch b/0034-libxl-Fix-segfault-in-device_model_spawn_outcome.patch
deleted file mode 100644
index 7f23a73..0000000
--- a/0034-libxl-Fix-segfault-in-device_model_spawn_outcome.patch
+++ /dev/null
@@ -1,39 +0,0 @@
-From c4ee68eda9937743527fff41f4ede0f6a3228080 Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jandryuk@gmail.com>
-Date: Tue, 5 Mar 2024 12:00:30 +0100
-Subject: [PATCH 34/67] libxl: Fix segfault in device_model_spawn_outcome
-
-libxl__spawn_qdisk_backend() explicitly sets guest_config to NULL when
-starting QEMU (the usual launch through libxl__spawn_local_dm() has a
-guest_config though).
-
-Bail early on a NULL guest_config/d_config. This skips the QMP queries
-for chardevs and VNC, but this xenpv QEMU instance isn't expected to
-provide those - only qdisk (or 9pfs backends after an upcoming change).
-
-Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
-Acked-by: Anthony PERARD <anthony.perard@citrix.com>
-master commit: d4f3d35f043f6ef29393166b0dd131c8102cf255
-master date: 2024-02-29 08:18:38 +0100
----
- tools/libs/light/libxl_dm.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
-index ed620a9d8e..29b43ed20a 100644
---- a/tools/libs/light/libxl_dm.c
-+++ b/tools/libs/light/libxl_dm.c
-@@ -3172,8 +3172,8 @@ static void device_model_spawn_outcome(libxl__egc *egc,
-
- /* Check if spawn failed */
- if (rc) goto out;
--
-- if (d_config->b_info.device_model_version
-+ /* d_config is NULL for xl devd/libxl__spawn_qemu_xenpv_backend(). */
-+ if (d_config && d_config->b_info.device_model_version
- == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) {
- rc = libxl__ev_time_register_rel(ao, &dmss->timeout,
- devise_model_postconfig_timeout,
---
-2.44.0
-
diff --git a/0034-x86-cpuid-Fix-handling-of-XSAVE-dynamic-leaves.patch b/0034-x86-cpuid-Fix-handling-of-XSAVE-dynamic-leaves.patch
new file mode 100644
index 0000000..1905728
--- /dev/null
+++ b/0034-x86-cpuid-Fix-handling-of-XSAVE-dynamic-leaves.patch
@@ -0,0 +1,72 @@
+From 9b43092d54b5f9e9d39d9f20393671e303b19e81 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Wed, 26 Jun 2024 13:43:44 +0200
+Subject: [PATCH 34/56] x86/cpuid: Fix handling of XSAVE dynamic leaves
+
+[ This is a minimal backport of commit 71cacfb035f4 ("x86/cpuid: Fix handling
+ of XSAVE dynamic leaves") to fix the bugs without depending on the large
+ rework of XSTATE handling in Xen 4.19 ]
+
+First, if XSAVE is available in hardware but not visible to the guest, the
+dynamic leaves shouldn't be filled in.
+
+Second, the comment concerning XSS state is wrong. VT-x doesn't manage
+host/guest state automatically, but there is provision for "host only" bits to
+be set, so the implications are still accurate.
+
+In Xen 4.18, no XSS states are supported, so it's safe to keep deferring to
+real hardware.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 71cacfb035f4a78ee10970dc38a3baa04d387451
+master date: 2024-06-19 13:00:06 +0100
+---
+ xen/arch/x86/cpuid.c | 30 +++++++++++++-----------------
+ 1 file changed, 13 insertions(+), 17 deletions(-)
+
+diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
+index 455a09b2dd..f6fd6cc6b3 100644
+--- a/xen/arch/x86/cpuid.c
++++ b/xen/arch/x86/cpuid.c
+@@ -330,24 +330,20 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
+ case XSTATE_CPUID:
+ switch ( subleaf )
+ {
+- case 1:
+- if ( p->xstate.xsavec || p->xstate.xsaves )
+- {
+- /*
+- * TODO: Figure out what to do for XSS state. VT-x manages
+- * host vs guest MSR_XSS automatically, so as soon as we start
+- * supporting any XSS states, the wrong XSS will be in
+- * context.
+- */
+- BUILD_BUG_ON(XSTATE_XSAVES_ONLY != 0);
+-
+- /*
+- * Read CPUID[0xD,0/1].EBX from hardware. They vary with
+- * enabled XSTATE, and appropraite XCR0|XSS are in context.
+- */
++ /*
++ * Read CPUID[0xd,0/1].EBX from hardware. They vary with enabled
++ * XSTATE, and the appropriate XCR0 is in context.
++ */
+ case 0:
+- res->b = cpuid_count_ebx(leaf, subleaf);
+- }
++ if ( p->basic.xsave )
++ res->b = cpuid_count_ebx(0xd, 0);
++ break;
++
++ case 1:
++ /* This only works because Xen doesn't support XSS states yet. */
++ BUILD_BUG_ON(XSTATE_XSAVES_ONLY != 0);
++ if ( p->xstate.xsavec )
++ res->b = cpuid_count_ebx(0xd, 1);
+ break;
+ }
+ break;
+--
+2.45.2
+
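For reference, CPUID leaf 0xd sub-leaves 0 and 1 report sizes that vary with
the currently enabled XSTATE components, which is why the hunk above re-reads
them from hardware. A small x86-only sketch (GCC/clang, using <cpuid.h>; not
Xen code) of reading the dynamic size:

    #include <cpuid.h>
    #include <stdio.h>

    /* EBX of CPUID[0xd].0 = bytes needed for the XSAVE area covering the
     * XCR0 components enabled right now, so the value is dynamic. */
    static unsigned int xsave_cur_size(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx) )
            return 0;   /* leaf not available */
        return ebx;
    }

    int main(void)
    {
        printf("current XSAVE area size: %u bytes\n", xsave_cur_size());
        return 0;
    }
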
diff --git a/0035-x86-altcall-always-use-a-temporary-parameter-stashin.patch b/0035-x86-altcall-always-use-a-temporary-parameter-stashin.patch
deleted file mode 100644
index 177c73b..0000000
--- a/0035-x86-altcall-always-use-a-temporary-parameter-stashin.patch
+++ /dev/null
@@ -1,197 +0,0 @@
-From 2f49d9f89c14519d4cb1e06ab8370cf4ba50fab7 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 5 Mar 2024 12:00:47 +0100
-Subject: [PATCH 35/67] x86/altcall: always use a temporary parameter stashing
- variable
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The usage in ALT_CALL_ARG() on clang of:
-
-register union {
- typeof(arg) e;
- const unsigned long r;
-} ...
-
-When `arg` is the first argument to alternative_{,v}call() and
-const_vlapic_vcpu() is used results in clang 3.5.0 complaining with:
-
-arch/x86/hvm/vlapic.c:141:47: error: non-const static data member must be initialized out of line
- alternative_call(hvm_funcs.test_pir, const_vlapic_vcpu(vlapic), vec) )
-
-Workaround this by pulling `arg1` into a local variable, like it's done for
-further arguments (arg2, arg3...)
-
-Originally arg1 wasn't pulled into a variable because for the a1_ register
-local variable the possible clobbering as a result of operators on other
-variables don't matter:
-
-https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
-
-Note clang version 3.8.1 seems to already be fixed and don't require the
-workaround, but since it's harmless do it uniformly everywhere.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-master commit: c20850540ad6a32f4fc17bde9b01c92b0df18bf0
-master date: 2024-02-29 08:21:49 +0100
----
- xen/arch/x86/include/asm/alternative.h | 36 +++++++++++++++++---------
- 1 file changed, 24 insertions(+), 12 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
-index bcb1dc94f4..fa04481316 100644
---- a/xen/arch/x86/include/asm/alternative.h
-+++ b/xen/arch/x86/include/asm/alternative.h
-@@ -253,21 +253,24 @@ extern void alternative_branches(void);
- })
-
- #define alternative_vcall1(func, arg) ({ \
-- ALT_CALL_ARG(arg, 1); \
-+ typeof(arg) v1_ = (arg); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_NO_ARG2; \
- (void)sizeof(func(arg)); \
- (void)alternative_callN(1, int, func); \
- })
-
- #define alternative_call1(func, arg) ({ \
-- ALT_CALL_ARG(arg, 1); \
-+ typeof(arg) v1_ = (arg); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_NO_ARG2; \
- alternative_callN(1, typeof(func(arg)), func); \
- })
-
- #define alternative_vcall2(func, arg1, arg2) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_NO_ARG3; \
- (void)sizeof(func(arg1, arg2)); \
-@@ -275,17 +278,19 @@ extern void alternative_branches(void);
- })
-
- #define alternative_call2(func, arg1, arg2) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_NO_ARG3; \
- alternative_callN(2, typeof(func(arg1, arg2)), func); \
- })
-
- #define alternative_vcall3(func, arg1, arg2, arg3) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_NO_ARG4; \
-@@ -294,9 +299,10 @@ extern void alternative_branches(void);
- })
-
- #define alternative_call3(func, arg1, arg2, arg3) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_NO_ARG4; \
-@@ -305,10 +311,11 @@ extern void alternative_branches(void);
- })
-
- #define alternative_vcall4(func, arg1, arg2, arg3, arg4) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
- typeof(arg4) v4_ = (arg4); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_ARG(v4_, 4); \
-@@ -318,10 +325,11 @@ extern void alternative_branches(void);
- })
-
- #define alternative_call4(func, arg1, arg2, arg3, arg4) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
- typeof(arg4) v4_ = (arg4); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_ARG(v4_, 4); \
-@@ -332,11 +340,12 @@ extern void alternative_branches(void);
- })
-
- #define alternative_vcall5(func, arg1, arg2, arg3, arg4, arg5) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
- typeof(arg4) v4_ = (arg4); \
- typeof(arg5) v5_ = (arg5); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_ARG(v4_, 4); \
-@@ -347,11 +356,12 @@ extern void alternative_branches(void);
- })
-
- #define alternative_call5(func, arg1, arg2, arg3, arg4, arg5) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
- typeof(arg4) v4_ = (arg4); \
- typeof(arg5) v5_ = (arg5); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_ARG(v4_, 4); \
-@@ -363,12 +373,13 @@ extern void alternative_branches(void);
- })
-
- #define alternative_vcall6(func, arg1, arg2, arg3, arg4, arg5, arg6) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
- typeof(arg4) v4_ = (arg4); \
- typeof(arg5) v5_ = (arg5); \
- typeof(arg6) v6_ = (arg6); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_ARG(v4_, 4); \
-@@ -379,12 +390,13 @@ extern void alternative_branches(void);
- })
-
- #define alternative_call6(func, arg1, arg2, arg3, arg4, arg5, arg6) ({ \
-+ typeof(arg1) v1_ = (arg1); \
- typeof(arg2) v2_ = (arg2); \
- typeof(arg3) v3_ = (arg3); \
- typeof(arg4) v4_ = (arg4); \
- typeof(arg5) v5_ = (arg5); \
- typeof(arg6) v6_ = (arg6); \
-- ALT_CALL_ARG(arg1, 1); \
-+ ALT_CALL_ARG(v1_, 1); \
- ALT_CALL_ARG(v2_, 2); \
- ALT_CALL_ARG(v3_, 3); \
- ALT_CALL_ARG(v4_, 4); \
---
-2.44.0
-
diff --git a/0035-x86-irq-forward-pending-interrupts-to-new-destinatio.patch b/0035-x86-irq-forward-pending-interrupts-to-new-destinatio.patch
new file mode 100644
index 0000000..f05b09e
--- /dev/null
+++ b/0035-x86-irq-forward-pending-interrupts-to-new-destinatio.patch
@@ -0,0 +1,143 @@
+From e95d30f9e5eed0c5d9dbf72d4cc3ae373152ab10 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Wed, 26 Jun 2024 13:44:08 +0200
+Subject: [PATCH 35/56] x86/irq: forward pending interrupts to new destination
+ in fixup_irqs()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+fixup_irqs() is used to evacuate interrupts from CPUs that are about to be
+offlined. Given the CPU is to become offline, the normal migration logic used
+by Xen, where the vector in the previous target(s) is left configured until the
+interrupt is received on the new destination, is not suitable.
+
+Instead attempt to do as much as possible in order to prevent losing
+interrupts. If fixup_irqs() is called from the CPU to be offlined (as is
+currently the case for CPU hot unplug), attempt to forward pending vectors when
+interrupts that target the current CPU are migrated to a different destination.
+
+Additionally, for interrupts that have already been moved from the current CPU
+prior to the call to fixup_irqs() but that haven't been delivered to the new
+destination (iow: interrupts with move_in_progress set and the current CPU set
+in ->arch.old_cpu_mask) also check whether the previous vector is pending and
+forward it to the new destination.
+
+This allows us to remove the window with interrupts enabled at the bottom of
+fixup_irqs(). Such a window wasn't safe anyway: references to the CPU to become
+offline are removed from interrupt masks, but the per-CPU vector_irq[] array
+is not updated to reflect those changes (as the CPU is going offline anyway).
+
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: e2bb28d621584fce15c907002ddc7c6772644b64
+master date: 2024-06-20 12:09:32 +0200
+---
+ xen/arch/x86/include/asm/apic.h | 5 ++++
+ xen/arch/x86/irq.c | 46 ++++++++++++++++++++++++++++-----
+ 2 files changed, 45 insertions(+), 6 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/apic.h b/xen/arch/x86/include/asm/apic.h
+index 7625c0ecd6..ad8d7cc054 100644
+--- a/xen/arch/x86/include/asm/apic.h
++++ b/xen/arch/x86/include/asm/apic.h
+@@ -145,6 +145,11 @@ static __inline bool_t apic_isr_read(u8 vector)
+ (vector & 0x1f)) & 1;
+ }
+
++static inline bool apic_irr_read(unsigned int vector)
++{
++ return apic_read(APIC_IRR + (vector / 32 * 0x10)) & (1U << (vector % 32));
++}
++
+ static __inline u32 get_apic_id(void) /* Get the physical APIC id */
+ {
+ u32 id = apic_read(APIC_ID);
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index 13ef61a5b7..290f8d26e7 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -2604,7 +2604,7 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+
+ for ( irq = 0; irq < nr_irqs; irq++ )
+ {
+- bool break_affinity = false, set_affinity = true;
++ bool break_affinity = false, set_affinity = true, check_irr = false;
+ unsigned int vector, cpu = smp_processor_id();
+ cpumask_t *affinity = this_cpu(scratch_cpumask);
+
+@@ -2657,6 +2657,25 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ !cpu_online(cpu) &&
+ cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
+ {
++ /*
++ * This to be offlined CPU was the target of an interrupt that's
++ * been moved, and the new destination target hasn't yet
++ * acknowledged any interrupt from it.
++ *
++ * We know the interrupt is configured to target the new CPU at
++ * this point, so we can check IRR for any pending vectors and
++ * forward them to the new destination.
++ *
++ * Note that for the other case of an interrupt movement being in
++ * progress (move_cleanup_count being non-zero) we know the new
++ * destination has already acked at least one interrupt from this
++ * source, and hence there's no need to forward any stale
++ * interrupts.
++ */
++ if ( apic_irr_read(desc->arch.old_vector) )
++ send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
++ desc->arch.vector);
++
+ /*
+ * This CPU is going offline, remove it from ->arch.old_cpu_mask
+ * and possibly release the old vector if the old mask becomes
+@@ -2697,6 +2716,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ if ( desc->handler->disable )
+ desc->handler->disable(desc);
+
++ /*
++ * If the current CPU is going offline and is (one of) the target(s) of
++ * the interrupt, signal to check whether there are any pending vectors
++ * to be handled in the local APIC after the interrupt has been moved.
++ */
++ if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
++ check_irr = true;
++
+ if ( desc->handler->set_affinity )
+ desc->handler->set_affinity(desc, affinity);
+ else if ( !(warned++) )
+@@ -2707,6 +2734,18 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+
+ cpumask_copy(affinity, desc->affinity);
+
++ if ( check_irr && apic_irr_read(vector) )
++ /*
++ * Forward pending interrupt to the new destination, this CPU is
++ * going offline and otherwise the interrupt would be lost.
++ *
++ * Do the IRR check as late as possible before releasing the irq
++ * desc in order for any in-flight interrupts to be delivered to
++ * the lapic.
++ */
++ send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
++ desc->arch.vector);
++
+ spin_unlock(&desc->lock);
+
+ if ( !verbose )
+@@ -2718,11 +2757,6 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
+ printk("Broke affinity for IRQ%u, new: %*pb\n",
+ irq, CPUMASK_PR(affinity));
+ }
+-
+- /* That doesn't seem sufficient. Give it 1ms. */
+- local_irq_enable();
+- mdelay(1);
+- local_irq_disable();
+ }
+
+ void fixup_eoi(void)
+--
+2.45.2
+
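The apic_irr_read() helper added above packs 256 interrupt vectors into eight
32-bit IRR registers spaced 0x10 bytes apart. A tiny stand-alone model of that
indexing, with a fake register file instead of the local APIC (illustration
only, not Xen code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Fake IRR: 256 vector bits in eight 32-bit words. On real hardware the
     * words sit 0x10 bytes apart starting at APIC offset 0x200. */
    static uint32_t fake_irr[8];

    static bool irr_test(unsigned int vector)
    {
        /* Same indexing as apic_irr_read(): word = vector / 32,
         * bit within the word = vector % 32. */
        return fake_irr[vector / 32] & (1U << (vector % 32));
    }

    int main(void)
    {
        fake_irr[0x41 / 32] |= 1U << (0x41 % 32);  /* pretend 0x41 is pending */
        printf("vector 0x41 pending: %d\n", irr_test(0x41));
        printf("vector 0x42 pending: %d\n", irr_test(0x42));
        return 0;
    }
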
diff --git a/0036-x86-cpu-policy-Allow-for-levelling-of-VERW-side-effe.patch b/0036-x86-cpu-policy-Allow-for-levelling-of-VERW-side-effe.patch
deleted file mode 100644
index b91ff52..0000000
--- a/0036-x86-cpu-policy-Allow-for-levelling-of-VERW-side-effe.patch
+++ /dev/null
@@ -1,102 +0,0 @@
-From 54dacb5c02cba4676879ed077765734326b78e39 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 5 Mar 2024 12:01:22 +0100
-Subject: [PATCH 36/67] x86/cpu-policy: Allow for levelling of VERW side
- effects
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-MD_CLEAR and FB_CLEAR need OR-ing across a migrate pool. Allow this, by
-having them unconditinally set in max, with the host values reflected in
-default. Annotate the bits as having special properies.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: de17162cafd27f2865a3102a2ec0f386a02ed03d
-master date: 2024-03-01 20:14:19 +0000
----
- xen/arch/x86/cpu-policy.c | 24 +++++++++++++++++++++
- xen/arch/x86/include/asm/cpufeature.h | 1 +
- xen/include/public/arch-x86/cpufeatureset.h | 4 ++--
- 3 files changed, 27 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
-index f0f2c8a1c0..7b875a7221 100644
---- a/xen/arch/x86/cpu-policy.c
-+++ b/xen/arch/x86/cpu-policy.c
-@@ -435,6 +435,16 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
- __set_bit(X86_FEATURE_RSBA, fs);
- __set_bit(X86_FEATURE_RRSBA, fs);
-
-+ /*
-+ * These bits indicate that the VERW instruction may have gained
-+ * scrubbing side effects. With pooling, they mean "you might migrate
-+ * somewhere where scrubbing is necessary", and may need exposing on
-+ * unaffected hardware. This is fine, because the VERW instruction
-+ * has been around since the 286.
-+ */
-+ __set_bit(X86_FEATURE_MD_CLEAR, fs);
-+ __set_bit(X86_FEATURE_FB_CLEAR, fs);
-+
- /*
- * The Gather Data Sampling microcode mitigation (August 2023) has an
- * adverse performance impact on the CLWB instruction on SKX/CLX/CPX.
-@@ -469,6 +479,20 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
- cpu_has_rdrand && !is_forced_cpu_cap(X86_FEATURE_RDRAND) )
- __clear_bit(X86_FEATURE_RDRAND, fs);
-
-+ /*
-+ * These bits indicate that the VERW instruction may have gained
-+ * scrubbing side effects. The max policy has them set for migration
-+ * reasons, so reset the default policy back to the host values in
-+ * case we're unaffected.
-+ */
-+ __clear_bit(X86_FEATURE_MD_CLEAR, fs);
-+ if ( cpu_has_md_clear )
-+ __set_bit(X86_FEATURE_MD_CLEAR, fs);
-+
-+ __clear_bit(X86_FEATURE_FB_CLEAR, fs);
-+ if ( cpu_has_fb_clear )
-+ __set_bit(X86_FEATURE_FB_CLEAR, fs);
-+
- /*
- * The Gather Data Sampling microcode mitigation (August 2023) has an
- * adverse performance impact on the CLWB instruction on SKX/CLX/CPX.
-diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
-index 9ef7756593..ec824e8954 100644
---- a/xen/arch/x86/include/asm/cpufeature.h
-+++ b/xen/arch/x86/include/asm/cpufeature.h
-@@ -136,6 +136,7 @@
- #define cpu_has_avx512_4fmaps boot_cpu_has(X86_FEATURE_AVX512_4FMAPS)
- #define cpu_has_avx512_vp2intersect boot_cpu_has(X86_FEATURE_AVX512_VP2INTERSECT)
- #define cpu_has_srbds_ctrl boot_cpu_has(X86_FEATURE_SRBDS_CTRL)
-+#define cpu_has_md_clear boot_cpu_has(X86_FEATURE_MD_CLEAR)
- #define cpu_has_rtm_always_abort boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)
- #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
- #define cpu_has_serialize boot_cpu_has(X86_FEATURE_SERIALIZE)
-diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index 94d211df2f..aec1407613 100644
---- a/xen/include/public/arch-x86/cpufeatureset.h
-+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -260,7 +260,7 @@ XEN_CPUFEATURE(AVX512_4FMAPS, 9*32+ 3) /*A AVX512 Multiply Accumulation Single
- XEN_CPUFEATURE(FSRM, 9*32+ 4) /*A Fast Short REP MOVS */
- XEN_CPUFEATURE(AVX512_VP2INTERSECT, 9*32+8) /*a VP2INTERSECT{D,Q} insns */
- XEN_CPUFEATURE(SRBDS_CTRL, 9*32+ 9) /* MSR_MCU_OPT_CTRL and RNGDS_MITG_DIS. */
--XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*A VERW clears microarchitectural buffers */
-+XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*!A VERW clears microarchitectural buffers */
- XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! June 2021 TSX defeaturing in microcode. */
- XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
- XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A SERIALIZE insn */
-@@ -321,7 +321,7 @@ XEN_CPUFEATURE(DOITM, 16*32+12) /* Data Operand Invariant Timing
- XEN_CPUFEATURE(SBDR_SSDP_NO, 16*32+13) /*A No Shared Buffer Data Read or Sideband Stale Data Propagation */
- XEN_CPUFEATURE(FBSDP_NO, 16*32+14) /*A No Fill Buffer Stale Data Propagation */
- XEN_CPUFEATURE(PSDP_NO, 16*32+15) /*A No Primary Stale Data Propagation */
--XEN_CPUFEATURE(FB_CLEAR, 16*32+17) /*A Fill Buffers cleared by VERW */
-+XEN_CPUFEATURE(FB_CLEAR, 16*32+17) /*!A Fill Buffers cleared by VERW */
- XEN_CPUFEATURE(FB_CLEAR_CTRL, 16*32+18) /* MSR_OPT_CPU_CTRL.FB_CLEAR_DIS */
- XEN_CPUFEATURE(RRSBA, 16*32+19) /*! Restricted RSB Alternative */
- XEN_CPUFEATURE(BHI_NO, 16*32+20) /*A No Branch History Injection */
---
-2.44.0
-
diff --git a/0036-x86-re-run-exception-from-stub-recovery-selftests-wi.patch b/0036-x86-re-run-exception-from-stub-recovery-selftests-wi.patch
new file mode 100644
index 0000000..a552e9c
--- /dev/null
+++ b/0036-x86-re-run-exception-from-stub-recovery-selftests-wi.patch
@@ -0,0 +1,84 @@
+From 5ac3cbbf83e1f955aeaf5d0f503099f5249b5c25 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Thu, 4 Jul 2024 14:06:19 +0200
+Subject: [PATCH 36/56] x86: re-run exception-from-stub recovery selftests with
+ CET-SS enabled
+
+On the BSP, shadow stacks are enabled only relatively late in the
+booting process. In particular, they aren't active yet when initcalls are
+run. Keep the testing there, but invoke that testing a second time when
+shadow stacks are active, to make sure we won't regress that case after
+addressing XSA-451.
+
+While touching this code, switch the guard from NDEBUG to CONFIG_DEBUG,
+such that IS_ENABLED() can validly be used at the new call site.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: cfe3ad67127b86e1b1c06993b86422673a51b050
+master date: 2024-02-27 13:49:52 +0100
+---
+ xen/arch/x86/extable.c | 8 +++++---
+ xen/arch/x86/include/asm/setup.h | 2 ++
+ xen/arch/x86/setup.c | 4 ++++
+ 3 files changed, 11 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
+index 8ffcd346d7..12cc9935d8 100644
+--- a/xen/arch/x86/extable.c
++++ b/xen/arch/x86/extable.c
+@@ -128,10 +128,11 @@ search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
+ return 0;
+ }
+
+-#ifndef NDEBUG
++#ifdef CONFIG_DEBUG
++#include <asm/setup.h>
+ #include <asm/traps.h>
+
+-static int __init cf_check stub_selftest(void)
++int __init cf_check stub_selftest(void)
+ {
+ static const struct {
+ uint8_t opc[8];
+@@ -155,7 +156,8 @@ static int __init cf_check stub_selftest(void)
+ unsigned int i;
+ bool fail = false;
+
+- printk("Running stub recovery selftests...\n");
++ printk("%s stub recovery selftests...\n",
++ system_state < SYS_STATE_active ? "Running" : "Re-running");
+
+ for ( i = 0; i < ARRAY_SIZE(tests); ++i )
+ {
+diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
+index 9a460e4db8..14d15048eb 100644
+--- a/xen/arch/x86/include/asm/setup.h
++++ b/xen/arch/x86/include/asm/setup.h
+@@ -38,6 +38,8 @@ void *bootstrap_map(const module_t *mod);
+
+ int xen_in_range(unsigned long mfn);
+
++int cf_check stub_selftest(void);
++
+ extern uint8_t kbd_shift_flags;
+
+ #ifdef NDEBUG
+diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
+index 25017b5d96..f2592c3dc9 100644
+--- a/xen/arch/x86/setup.c
++++ b/xen/arch/x86/setup.c
+@@ -738,6 +738,10 @@ static void noreturn init_done(void)
+
+ system_state = SYS_STATE_active;
+
++ /* Re-run stub recovery self-tests with CET-SS active. */
++ if ( IS_ENABLED(CONFIG_DEBUG) && cpu_has_xen_shstk )
++ stub_selftest();
++
+ domain_unpause_by_systemcontroller(dom0);
+
+ /* MUST be done prior to removing .init data. */
+--
+2.45.2
+
diff --git a/0037-hvmloader-PCI-skip-huge-BARs-in-certain-calculations.patch b/0037-hvmloader-PCI-skip-huge-BARs-in-certain-calculations.patch
deleted file mode 100644
index a46f913..0000000
--- a/0037-hvmloader-PCI-skip-huge-BARs-in-certain-calculations.patch
+++ /dev/null
@@ -1,99 +0,0 @@
-From 1e9808227c10717228969e924cab49cad4af6265 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 12 Mar 2024 12:08:48 +0100
-Subject: [PATCH 37/67] hvmloader/PCI: skip huge BARs in certain calculations
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-BARs of size 2Gb and up can't possibly fit below 4Gb: Both the bottom of
-the lower 2Gb range and the top of the higher 2Gb range have special
-purpose. Don't even have them influence whether to (perhaps) relocate
-low RAM.
-
-Reported-by: Neowutran <xen@neowutran.ovh>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 57acad12a09ffa490e870ebe17596aad858f0191
-master date: 2024-03-06 10:19:29 +0100
----
- tools/firmware/hvmloader/pci.c | 28 ++++++++++++++++++++--------
- 1 file changed, 20 insertions(+), 8 deletions(-)
-
-diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
-index 257a6feb61..c3c61ca060 100644
---- a/tools/firmware/hvmloader/pci.c
-+++ b/tools/firmware/hvmloader/pci.c
-@@ -33,6 +33,13 @@ uint32_t pci_mem_start = HVM_BELOW_4G_MMIO_START;
- const uint32_t pci_mem_end = RESERVED_MEMBASE;
- uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
-
-+/*
-+ * BARs larger than this value are put in 64-bit space unconditionally. That
-+ * is, such BARs also don't play into the determination of how big the lowmem
-+ * MMIO hole needs to be.
-+ */
-+#define BAR_RELOC_THRESH GB(1)
-+
- enum virtual_vga virtual_vga = VGA_none;
- unsigned long igd_opregion_pgbase = 0;
-
-@@ -286,9 +293,11 @@ void pci_setup(void)
- bars[i].bar_reg = bar_reg;
- bars[i].bar_sz = bar_sz;
-
-- if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
-- PCI_BASE_ADDRESS_SPACE_MEMORY) ||
-- (bar_reg == PCI_ROM_ADDRESS) )
-+ if ( is_64bar && bar_sz > BAR_RELOC_THRESH )
-+ bar64_relocate = 1;
-+ else if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
-+ PCI_BASE_ADDRESS_SPACE_MEMORY) ||
-+ (bar_reg == PCI_ROM_ADDRESS) )
- mmio_total += bar_sz;
-
- nr_bars++;
-@@ -367,7 +376,7 @@ void pci_setup(void)
- pci_mem_start = hvm_info->low_mem_pgend << PAGE_SHIFT;
- }
-
-- if ( mmio_total > (pci_mem_end - pci_mem_start) )
-+ if ( mmio_total > (pci_mem_end - pci_mem_start) || bar64_relocate )
- {
- printf("Low MMIO hole not large enough for all devices,"
- " relocating some BARs to 64-bit\n");
-@@ -430,7 +439,8 @@ void pci_setup(void)
-
- /*
- * Relocate to high memory if the total amount of MMIO needed
-- * is more than the low MMIO available. Because devices are
-+ * is more than the low MMIO available or BARs bigger than
-+ * BAR_RELOC_THRESH are present. Because devices are
- * processed in order of bar_sz, this will preferentially
- * relocate larger devices to high memory first.
- *
-@@ -446,8 +456,9 @@ void pci_setup(void)
- * the code here assumes it to be.)
- * Should either of those two conditions change, this code will break.
- */
-- using_64bar = bars[i].is_64bar && bar64_relocate
-- && (mmio_total > (mem_resource.max - mem_resource.base));
-+ using_64bar = bars[i].is_64bar && bar64_relocate &&
-+ (mmio_total > (mem_resource.max - mem_resource.base) ||
-+ bar_sz > BAR_RELOC_THRESH);
- bar_data = pci_readl(devfn, bar_reg);
-
- if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
-@@ -467,7 +478,8 @@ void pci_setup(void)
- resource = &mem_resource;
- bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
- }
-- mmio_total -= bar_sz;
-+ if ( bar_sz <= BAR_RELOC_THRESH )
-+ mmio_total -= bar_sz;
- }
- else
- {
---
-2.44.0
-
diff --git a/0037-tools-tests-don-t-let-test-xenstore-write-nodes-exce.patch b/0037-tools-tests-don-t-let-test-xenstore-write-nodes-exce.patch
new file mode 100644
index 0000000..cc7e47d
--- /dev/null
+++ b/0037-tools-tests-don-t-let-test-xenstore-write-nodes-exce.patch
@@ -0,0 +1,41 @@
+From 0ebfa35965257343ba3d8377be91ad8512a9c749 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Thu, 4 Jul 2024 14:06:54 +0200
+Subject: [PATCH 37/56] tools/tests: don't let test-xenstore write nodes
+ exceeding default size
+
+Today test-xenstore will write nodes with 3000 bytes of node data. This
+size exceeds the default quota for the allowed node size. While this
+works in dom0 with C-xenstored, OCAML-xenstored does not like it.
+
+Use a size of 2000 instead, which is lower than the allowed default
+node size of 2048.
+
+Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework")
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 642005e310483c490b0725fab4672f2b77fdf2ba
+master date: 2024-05-02 18:15:31 +0100
+---
+ tools/tests/xenstore/test-xenstore.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/tools/tests/xenstore/test-xenstore.c b/tools/tests/xenstore/test-xenstore.c
+index d491dac53b..73a7011d21 100644
+--- a/tools/tests/xenstore/test-xenstore.c
++++ b/tools/tests/xenstore/test-xenstore.c
+@@ -408,9 +408,9 @@ static int test_ta3_deinit(uintptr_t par)
+ #define TEST(s, f, p, l) { s, f ## _init, f, f ## _deinit, (uintptr_t)(p), l }
+ struct test tests[] = {
+ TEST("read 1", test_read, 1, "Read node with 1 byte data"),
+-TEST("read 3000", test_read, 3000, "Read node with 3000 bytes data"),
++TEST("read 2000", test_read, 2000, "Read node with 2000 bytes data"),
+ TEST("write 1", test_write, 1, "Write node with 1 byte data"),
+-TEST("write 3000", test_write, 3000, "Write node with 3000 bytes data"),
++TEST("write 2000", test_write, 2000, "Write node with 2000 bytes data"),
+ TEST("dir", test_dir, 0, "List directory"),
+ TEST("rm node", test_rm, 0, "Remove single node"),
+ TEST("rm dir", test_rm, WRITE_BUFFERS_N, "Remove node with sub-nodes"),
+--
+2.45.2
+
diff --git a/0038-tools-tests-let-test-xenstore-exit-with-non-0-status.patch b/0038-tools-tests-let-test-xenstore-exit-with-non-0-status.patch
new file mode 100644
index 0000000..ee0a497
--- /dev/null
+++ b/0038-tools-tests-let-test-xenstore-exit-with-non-0-status.patch
@@ -0,0 +1,57 @@
+From 22f623622cc60571be9cccc323a1d17749683667 Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Thu, 4 Jul 2024 14:07:12 +0200
+Subject: [PATCH 38/56] tools/tests: let test-xenstore exit with non-0 status
+ in case of error
+
+In case a test is failing in test-xenstore, let the tool exit with an
+exit status other than 0.
+
+Fix a typo in an error message.
+
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework")
+Signed-off-by: Juergen Gross <jgross@suse.com>
+master commit: 2d4ba205591ba64f31149ae31051678159ee9e11
+master date: 2024-05-02 18:15:46 +0100
+---
+ tools/tests/xenstore/test-xenstore.c | 8 ++++----
+ 1 file changed, 4 insertions(+), 4 deletions(-)
+
+diff --git a/tools/tests/xenstore/test-xenstore.c b/tools/tests/xenstore/test-xenstore.c
+index 73a7011d21..7a9bd9afb3 100644
+--- a/tools/tests/xenstore/test-xenstore.c
++++ b/tools/tests/xenstore/test-xenstore.c
+@@ -506,14 +506,14 @@ int main(int argc, char *argv[])
+ stop = time(NULL) + randtime;
+ srandom((unsigned int)stop);
+
+- while ( time(NULL) < stop )
++ while ( time(NULL) < stop && !ret )
+ {
+ t = random() % ARRAY_SIZE(tests);
+ ret = call_test(tests + t, iters, true);
+ }
+ }
+ else
+- for ( t = 0; t < ARRAY_SIZE(tests); t++ )
++ for ( t = 0; t < ARRAY_SIZE(tests) && !ret; t++ )
+ {
+ if ( !test || !strcmp(test, tests[t].name) )
+ ret = call_test(tests + t, iters, false);
+@@ -525,10 +525,10 @@ int main(int argc, char *argv[])
+ xs_close(xsh);
+
+ if ( ta_loops )
+- printf("Exhaustive transaction retries (%d) occurrred %d times.\n",
++ printf("Exhaustive transaction retries (%d) occurred %d times.\n",
+ MAX_TA_LOOPS, ta_loops);
+
+- return 0;
++ return ret ? 3 : 0;
+ }
+
+ /*
+--
+2.45.2
+
diff --git a/0038-x86-mm-fix-detection-of-last-L1-entry-in-modify_xen_.patch b/0038-x86-mm-fix-detection-of-last-L1-entry-in-modify_xen_.patch
deleted file mode 100644
index 66b4db3..0000000
--- a/0038-x86-mm-fix-detection-of-last-L1-entry-in-modify_xen_.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From 1f94117bec55a7b934fed3dfd3529db624eb441f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 12 Mar 2024 12:08:59 +0100
-Subject: [PATCH 38/67] x86/mm: fix detection of last L1 entry in
- modify_xen_mappings_lite()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The current logic to detect when to switch to the next L1 table is incorrectly
-using l2_table_offset() in order to notice when the last entry on the current
-L1 table has been reached.
-
-It should instead use l1_table_offset() to check whether the index has wrapped
-to point to the first entry, and so the next L1 table should be used.
-
-Fixes: 8676092a0f16 ('x86/livepatch: Fix livepatch application when CET is active')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 7c81558208de7858251b62f168a449be84305595
-master date: 2024-03-11 11:09:42 +0000
----
- xen/arch/x86/mm.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index e884a6fdbd..330c4abcd1 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -5963,7 +5963,7 @@ void init_or_livepatch modify_xen_mappings_lite(
-
- v += 1UL << L1_PAGETABLE_SHIFT;
-
-- if ( l2_table_offset(v) == 0 )
-+ if ( l1_table_offset(v) == 0 )
- break;
- }
-
---
-2.44.0
-
diff --git a/0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch b/0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch
new file mode 100644
index 0000000..8b2c4ec
--- /dev/null
+++ b/0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch
@@ -0,0 +1,58 @@
+From 75b4f9474a1aa33a6f9e0986b51c390f9b38ae5a Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:08:11 +0200
+Subject: [PATCH 39/56] LICENSES: Add MIT-0 (MIT No Attribution)
+
+We are about to import code licensed under MIT-0. It's compatible for us to
+use, so identify it as a permitted license.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
+Acked-by: Christian Lindig <christian.lindig@cloud.com>
+master commit: 219cdff3fb7b4a03ab14869584f111e0f623b330
+master date: 2024-05-23 15:04:40 +0100
+---
+ LICENSES/MIT-0 | 31 +++++++++++++++++++++++++++++++
+ 1 file changed, 31 insertions(+)
+ create mode 100644 LICENSES/MIT-0
+
+diff --git a/LICENSES/MIT-0 b/LICENSES/MIT-0
+new file mode 100644
+index 0000000000..70fb90ee34
+--- /dev/null
++++ b/LICENSES/MIT-0
+@@ -0,0 +1,31 @@
++Valid-License-Identifier: MIT-0
++
++SPDX-URL: https://spdx.org/licenses/MIT-0.html
++
++Usage-Guide:
++
++ To use the MIT-0 License put the following SPDX tag/value pair into a
++ comment according to the placement guidelines in the licensing rules
++ documentation:
++ SPDX-License-Identifier: MIT-0
++
++License-Text:
++
++MIT No Attribution
++
++Copyright <year> <copyright holder>
++
++Permission is hereby granted, free of charge, to any person obtaining a copy
++of this software and associated documentation files (the "Software"), to deal
++in the Software without restriction, including without limitation the rights
++to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
++copies of the Software, and to permit persons to whom the Software is
++furnished to do so.
++
++THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
++IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
++FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
++AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
++LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
++OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
++SOFTWARE.
+--
+2.45.2
+
diff --git a/0039-x86-entry-Introduce-EFRAME_-constants.patch b/0039-x86-entry-Introduce-EFRAME_-constants.patch
deleted file mode 100644
index c280286..0000000
--- a/0039-x86-entry-Introduce-EFRAME_-constants.patch
+++ /dev/null
@@ -1,314 +0,0 @@
-From e691f99f17198906f813b85dcabafe5addb9a57a Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Sat, 27 Jan 2024 17:52:09 +0000
-Subject: [PATCH 39/67] x86/entry: Introduce EFRAME_* constants
-
-restore_all_guest() does a lot of manipulation of the stack after popping the
-GPRs, and uses raw %rsp displacements to do so. Also, almost all entrypaths
-use raw %rsp displacements prior to pushing GPRs.
-
-Provide better mnemonics, to aid readability and reduce the chance of errors
-when editing.
-
-No functional change. The resulting binary is identical.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 37541208f119a9c552c6c6c3246ea61be0d44035)
----
- xen/arch/x86/x86_64/asm-offsets.c | 17 ++++++++
- xen/arch/x86/x86_64/compat/entry.S | 2 +-
- xen/arch/x86/x86_64/entry.S | 70 +++++++++++++++---------------
- 3 files changed, 53 insertions(+), 36 deletions(-)
-
-diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
-index 287dac101a..31fa63b77f 100644
---- a/xen/arch/x86/x86_64/asm-offsets.c
-+++ b/xen/arch/x86/x86_64/asm-offsets.c
-@@ -51,6 +51,23 @@ void __dummy__(void)
- OFFSET(UREGS_kernel_sizeof, struct cpu_user_regs, es);
- BLANK();
-
-+ /*
-+ * EFRAME_* is for the entry/exit logic where %rsp is pointing at
-+ * UREGS_error_code and GPRs are still/already guest values.
-+ */
-+#define OFFSET_EF(sym, mem) \
-+ DEFINE(sym, offsetof(struct cpu_user_regs, mem) - \
-+ offsetof(struct cpu_user_regs, error_code))
-+
-+ OFFSET_EF(EFRAME_entry_vector, entry_vector);
-+ OFFSET_EF(EFRAME_rip, rip);
-+ OFFSET_EF(EFRAME_cs, cs);
-+ OFFSET_EF(EFRAME_eflags, eflags);
-+ OFFSET_EF(EFRAME_rsp, rsp);
-+ BLANK();
-+
-+#undef OFFSET_EF
-+
- OFFSET(VCPU_processor, struct vcpu, processor);
- OFFSET(VCPU_domain, struct vcpu, domain);
- OFFSET(VCPU_vcpu_info, struct vcpu, vcpu_info);
-diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
-index 253bb1688c..7c211314d8 100644
---- a/xen/arch/x86/x86_64/compat/entry.S
-+++ b/xen/arch/x86/x86_64/compat/entry.S
-@@ -15,7 +15,7 @@ ENTRY(entry_int82)
- ENDBR64
- ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
- pushq $0
-- movl $HYPERCALL_VECTOR, 4(%rsp)
-+ movl $HYPERCALL_VECTOR, EFRAME_entry_vector(%rsp)
- SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
-
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index 585b0c9551..412cbeb3ec 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -190,15 +190,15 @@ restore_all_guest:
- SPEC_CTRL_EXIT_TO_PV /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
-
- RESTORE_ALL
-- testw $TRAP_syscall,4(%rsp)
-+ testw $TRAP_syscall, EFRAME_entry_vector(%rsp)
- jz iret_exit_to_guest
-
-- movq 24(%rsp),%r11 # RFLAGS
-+ mov EFRAME_eflags(%rsp), %r11
- andq $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), %r11
- orq $X86_EFLAGS_IF,%r11
-
- /* Don't use SYSRET path if the return address is not canonical. */
-- movq 8(%rsp),%rcx
-+ mov EFRAME_rip(%rsp), %rcx
- sarq $47,%rcx
- incl %ecx
- cmpl $1,%ecx
-@@ -213,20 +213,20 @@ restore_all_guest:
- ALTERNATIVE "", rag_clrssbsy, X86_FEATURE_XEN_SHSTK
- #endif
-
-- movq 8(%rsp), %rcx # RIP
-- cmpw $FLAT_USER_CS32,16(%rsp)# CS
-- movq 32(%rsp),%rsp # RSP
-+ mov EFRAME_rip(%rsp), %rcx
-+ cmpw $FLAT_USER_CS32, EFRAME_cs(%rsp)
-+ mov EFRAME_rsp(%rsp), %rsp
- je 1f
- sysretq
- 1: sysretl
-
- ALIGN
- .Lrestore_rcx_iret_exit_to_guest:
-- movq 8(%rsp), %rcx # RIP
-+ mov EFRAME_rip(%rsp), %rcx
- /* No special register assumptions. */
- iret_exit_to_guest:
-- andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), 24(%rsp)
-- orl $X86_EFLAGS_IF,24(%rsp)
-+ andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), EFRAME_eflags(%rsp)
-+ orl $X86_EFLAGS_IF, EFRAME_eflags(%rsp)
- addq $8,%rsp
- .Lft0: iretq
- _ASM_PRE_EXTABLE(.Lft0, handle_exception)
-@@ -257,7 +257,7 @@ ENTRY(lstar_enter)
- pushq $FLAT_KERNEL_CS64
- pushq %rcx
- pushq $0
-- movl $TRAP_syscall, 4(%rsp)
-+ movl $TRAP_syscall, EFRAME_entry_vector(%rsp)
- SAVE_ALL
-
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
-@@ -294,7 +294,7 @@ ENTRY(cstar_enter)
- pushq $FLAT_USER_CS32
- pushq %rcx
- pushq $0
-- movl $TRAP_syscall, 4(%rsp)
-+ movl $TRAP_syscall, EFRAME_entry_vector(%rsp)
- SAVE_ALL
-
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
-@@ -335,7 +335,7 @@ GLOBAL(sysenter_eflags_saved)
- pushq $3 /* ring 3 null cs */
- pushq $0 /* null rip */
- pushq $0
-- movl $TRAP_syscall, 4(%rsp)
-+ movl $TRAP_syscall, EFRAME_entry_vector(%rsp)
- SAVE_ALL
-
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
-@@ -389,7 +389,7 @@ ENTRY(int80_direct_trap)
- ENDBR64
- ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
- pushq $0
-- movl $0x80, 4(%rsp)
-+ movl $0x80, EFRAME_entry_vector(%rsp)
- SAVE_ALL
-
- SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
-@@ -649,7 +649,7 @@ ret_from_intr:
- .section .init.text, "ax", @progbits
- ENTRY(early_page_fault)
- ENDBR64
-- movl $TRAP_page_fault, 4(%rsp)
-+ movl $TRAP_page_fault, EFRAME_entry_vector(%rsp)
- SAVE_ALL
- movq %rsp, %rdi
- call do_early_page_fault
-@@ -716,7 +716,7 @@ ENTRY(common_interrupt)
-
- ENTRY(page_fault)
- ENDBR64
-- movl $TRAP_page_fault,4(%rsp)
-+ movl $TRAP_page_fault, EFRAME_entry_vector(%rsp)
- /* No special register assumptions. */
- GLOBAL(handle_exception)
- ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
-@@ -892,90 +892,90 @@ FATAL_exception_with_ints_disabled:
- ENTRY(divide_error)
- ENDBR64
- pushq $0
-- movl $TRAP_divide_error,4(%rsp)
-+ movl $TRAP_divide_error, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(coprocessor_error)
- ENDBR64
- pushq $0
-- movl $TRAP_copro_error,4(%rsp)
-+ movl $TRAP_copro_error, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(simd_coprocessor_error)
- ENDBR64
- pushq $0
-- movl $TRAP_simd_error,4(%rsp)
-+ movl $TRAP_simd_error, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(device_not_available)
- ENDBR64
- pushq $0
-- movl $TRAP_no_device,4(%rsp)
-+ movl $TRAP_no_device, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(debug)
- ENDBR64
- pushq $0
-- movl $TRAP_debug,4(%rsp)
-+ movl $TRAP_debug, EFRAME_entry_vector(%rsp)
- jmp handle_ist_exception
-
- ENTRY(int3)
- ENDBR64
- pushq $0
-- movl $TRAP_int3,4(%rsp)
-+ movl $TRAP_int3, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(overflow)
- ENDBR64
- pushq $0
-- movl $TRAP_overflow,4(%rsp)
-+ movl $TRAP_overflow, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(bounds)
- ENDBR64
- pushq $0
-- movl $TRAP_bounds,4(%rsp)
-+ movl $TRAP_bounds, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(invalid_op)
- ENDBR64
- pushq $0
-- movl $TRAP_invalid_op,4(%rsp)
-+ movl $TRAP_invalid_op, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(invalid_TSS)
- ENDBR64
-- movl $TRAP_invalid_tss,4(%rsp)
-+ movl $TRAP_invalid_tss, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(segment_not_present)
- ENDBR64
-- movl $TRAP_no_segment,4(%rsp)
-+ movl $TRAP_no_segment, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(stack_segment)
- ENDBR64
-- movl $TRAP_stack_error,4(%rsp)
-+ movl $TRAP_stack_error, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(general_protection)
- ENDBR64
-- movl $TRAP_gp_fault,4(%rsp)
-+ movl $TRAP_gp_fault, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(alignment_check)
- ENDBR64
-- movl $TRAP_alignment_check,4(%rsp)
-+ movl $TRAP_alignment_check, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(entry_CP)
- ENDBR64
-- movl $X86_EXC_CP, 4(%rsp)
-+ movl $X86_EXC_CP, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- ENTRY(double_fault)
- ENDBR64
-- movl $TRAP_double_fault,4(%rsp)
-+ movl $TRAP_double_fault, EFRAME_entry_vector(%rsp)
- /* Set AC to reduce chance of further SMAP faults */
- ALTERNATIVE "", stac, X86_FEATURE_XEN_SMAP
- SAVE_ALL
-@@ -1001,7 +1001,7 @@ ENTRY(double_fault)
- ENTRY(nmi)
- ENDBR64
- pushq $0
-- movl $TRAP_nmi,4(%rsp)
-+ movl $TRAP_nmi, EFRAME_entry_vector(%rsp)
- handle_ist_exception:
- ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
- SAVE_ALL
-@@ -1134,7 +1134,7 @@ handle_ist_exception:
- ENTRY(machine_check)
- ENDBR64
- pushq $0
-- movl $TRAP_machine_check,4(%rsp)
-+ movl $TRAP_machine_check, EFRAME_entry_vector(%rsp)
- jmp handle_ist_exception
-
- /* No op trap handler. Required for kexec crash path. */
-@@ -1171,7 +1171,7 @@ autogen_stubs: /* Automatically generated stubs. */
- 1:
- ENDBR64
- pushq $0
-- movb $vec,4(%rsp)
-+ movb $vec, EFRAME_entry_vector(%rsp)
- jmp common_interrupt
-
- entrypoint 1b
-@@ -1185,7 +1185,7 @@ autogen_stubs: /* Automatically generated stubs. */
- test $8,%spl /* 64bit exception frames are 16 byte aligned, but the word */
- jz 2f /* size is 8 bytes. Check whether the processor gave us an */
- pushq $0 /* error code, and insert an empty one if not. */
--2: movb $vec,4(%rsp)
-+2: movb $vec, EFRAME_entry_vector(%rsp)
- jmp handle_exception
-
- entrypoint 1b
---
-2.44.0
-
diff --git a/0040-tools-Import-stand-alone-sd_notify-implementation-fr.patch b/0040-tools-Import-stand-alone-sd_notify-implementation-fr.patch
new file mode 100644
index 0000000..990158d
--- /dev/null
+++ b/0040-tools-Import-stand-alone-sd_notify-implementation-fr.patch
@@ -0,0 +1,130 @@
+From 1743102a92479834c8e17b20697129e05b7c8313 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:10:10 +0200
+Subject: [PATCH 40/56] tools: Import stand-alone sd_notify() implementation
+ from systemd
+
+... in order to avoid linking against the whole of libsystemd.
+
+Only minimal changes to the upstream copy, to function as a drop-in
+replacement for sd_notify() and as a header-only library.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Acked-by: Christian Lindig <christian.lindig@cloud.com>
+master commit: 78510f3a1522f2856330ffa429e0e35f8aab4277
+master date: 2024-05-23 15:04:40 +0100
+---
+ tools/include/xen-sd-notify.h | 98 +++++++++++++++++++++++++++++++++++
+ 1 file changed, 98 insertions(+)
+ create mode 100644 tools/include/xen-sd-notify.h
+
+diff --git a/tools/include/xen-sd-notify.h b/tools/include/xen-sd-notify.h
+new file mode 100644
+index 0000000000..28c9b20f15
+--- /dev/null
++++ b/tools/include/xen-sd-notify.h
+@@ -0,0 +1,98 @@
++/* SPDX-License-Identifier: MIT-0 */
++
++/*
++ * Implement the systemd notify protocol without external dependencies.
++ * Supports both readiness notification on startup and on reloading,
++ * according to the protocol defined at:
++ * https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html
++ * This protocol is guaranteed to be stable as per:
++ * https://systemd.io/PORTABILITY_AND_STABILITY/
++ *
++ * Differences from the upstream copy:
++ * - Rename/rework as a drop-in replacement for systemd/sd-daemon.h
++ * - Only take the subset Xen cares about
++ * - Respect -Wdeclaration-after-statement
++ */
++
++#ifndef XEN_SD_NOTIFY
++#define XEN_SD_NOTIFY
++
++#include <errno.h>
++#include <stddef.h>
++#include <stdlib.h>
++#include <sys/socket.h>
++#include <sys/un.h>
++#include <unistd.h>
++
++static inline void xen_sd_closep(int *fd) {
++ if (!fd || *fd < 0)
++ return;
++
++ close(*fd);
++ *fd = -1;
++}
++
++static inline int xen_sd_notify(const char *message) {
++ union sockaddr_union {
++ struct sockaddr sa;
++ struct sockaddr_un sun;
++ } socket_addr = {
++ .sun.sun_family = AF_UNIX,
++ };
++ size_t path_length, message_length;
++ ssize_t written;
++ const char *socket_path;
++ int __attribute__((cleanup(xen_sd_closep))) fd = -1;
++
++ /* Verify the argument first */
++ if (!message)
++ return -EINVAL;
++
++ message_length = strlen(message);
++ if (message_length == 0)
++ return -EINVAL;
++
++ /* If the variable is not set, the protocol is a noop */
++ socket_path = getenv("NOTIFY_SOCKET");
++ if (!socket_path)
++ return 0; /* Not set? Nothing to do */
++
++ /* Only AF_UNIX is supported, with path or abstract sockets */
++ if (socket_path[0] != '/' && socket_path[0] != '@')
++ return -EAFNOSUPPORT;
++
++ path_length = strlen(socket_path);
++ /* Ensure there is room for NUL byte */
++ if (path_length >= sizeof(socket_addr.sun.sun_path))
++ return -E2BIG;
++
++ memcpy(socket_addr.sun.sun_path, socket_path, path_length);
++
++ /* Support for abstract socket */
++ if (socket_addr.sun.sun_path[0] == '@')
++ socket_addr.sun.sun_path[0] = 0;
++
++ fd = socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0);
++ if (fd < 0)
++ return -errno;
++
++ if (connect(fd, &socket_addr.sa, offsetof(struct sockaddr_un, sun_path) + path_length) != 0)
++ return -errno;
++
++ written = write(fd, message, message_length);
++ if (written != (ssize_t) message_length)
++ return written < 0 ? -errno : -EPROTO;
++
++ return 1; /* Notified! */
++}
++
++static inline int sd_notify(int unset_environment, const char *message) {
++ int r = xen_sd_notify(message);
++
++ if (unset_environment)
++ unsetenv("NOTIFY_SOCKET");
++
++ return r;
++}
++
++#endif /* XEN_SD_NOTIFY */
+--
+2.45.2
+
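For reference, the xen-sd-notify.h header added above is a drop-in for systemd/sd-daemon.h, so a daemon only needs the familiar two-argument sd_notify() call. A minimal usage sketch follows; the daemon body and the include path are assumptions for illustration, not part of the patch:

    /* Sketch: signal readiness through the bundled xen-sd-notify.h.
     * sd_notify() is a no-op when NOTIFY_SOCKET is absent, returns 1 when
     * the notification was sent, and a negative errno-style value on error. */
    #include <stdio.h>
    #include <xen-sd-notify.h>

    int main(void)
    {
        /* ... daemon initialisation would happen here ... */

        int rc = sd_notify(1 /* unset NOTIFY_SOCKET afterwards */, "READY=1");

        if (rc < 0)
            fprintf(stderr, "sd_notify: error %d\n", rc);

        return 0;
    }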
diff --git a/0040-x86-Resync-intel-family.h-from-Linux.patch b/0040-x86-Resync-intel-family.h-from-Linux.patch
deleted file mode 100644
index 84e0304..0000000
--- a/0040-x86-Resync-intel-family.h-from-Linux.patch
+++ /dev/null
@@ -1,98 +0,0 @@
-From abc43cf5a6579f1aa0decf0a2349cdd2d2473117 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 27 Feb 2024 16:07:39 +0000
-Subject: [PATCH 40/67] x86: Resync intel-family.h from Linux
-
-From v6.8-rc6
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 195e75371b13c4f7ecdf7b5c50aed0d02f2d7ce8)
----
- xen/arch/x86/include/asm/intel-family.h | 38 ++++++++++++++++++++++---
- 1 file changed, 34 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/intel-family.h b/xen/arch/x86/include/asm/intel-family.h
-index ffc49151be..b65e9c46b9 100644
---- a/xen/arch/x86/include/asm/intel-family.h
-+++ b/xen/arch/x86/include/asm/intel-family.h
-@@ -26,6 +26,9 @@
- * _G - parts with extra graphics on
- * _X - regular server parts
- * _D - micro server parts
-+ * _N,_P - other mobile parts
-+ * _H - premium mobile parts
-+ * _S - other client parts
- *
- * Historical OPTDIFFs:
- *
-@@ -37,6 +40,9 @@
- * their own names :-(
- */
-
-+/* Wildcard match for FAM6 so X86_MATCH_INTEL_FAM6_MODEL(ANY) works */
-+#define INTEL_FAM6_ANY X86_MODEL_ANY
-+
- #define INTEL_FAM6_CORE_YONAH 0x0E
-
- #define INTEL_FAM6_CORE2_MEROM 0x0F
-@@ -93,8 +99,6 @@
- #define INTEL_FAM6_ICELAKE_L 0x7E /* Sunny Cove */
- #define INTEL_FAM6_ICELAKE_NNPI 0x9D /* Sunny Cove */
-
--#define INTEL_FAM6_LAKEFIELD 0x8A /* Sunny Cove / Tremont */
--
- #define INTEL_FAM6_ROCKETLAKE 0xA7 /* Cypress Cove */
-
- #define INTEL_FAM6_TIGERLAKE_L 0x8C /* Willow Cove */
-@@ -102,12 +106,31 @@
-
- #define INTEL_FAM6_SAPPHIRERAPIDS_X 0x8F /* Golden Cove */
-
-+#define INTEL_FAM6_EMERALDRAPIDS_X 0xCF
-+
-+#define INTEL_FAM6_GRANITERAPIDS_X 0xAD
-+#define INTEL_FAM6_GRANITERAPIDS_D 0xAE
-+
-+/* "Hybrid" Processors (P-Core/E-Core) */
-+
-+#define INTEL_FAM6_LAKEFIELD 0x8A /* Sunny Cove / Tremont */
-+
- #define INTEL_FAM6_ALDERLAKE 0x97 /* Golden Cove / Gracemont */
- #define INTEL_FAM6_ALDERLAKE_L 0x9A /* Golden Cove / Gracemont */
-
--#define INTEL_FAM6_RAPTORLAKE 0xB7
-+#define INTEL_FAM6_RAPTORLAKE 0xB7 /* Raptor Cove / Enhanced Gracemont */
-+#define INTEL_FAM6_RAPTORLAKE_P 0xBA
-+#define INTEL_FAM6_RAPTORLAKE_S 0xBF
-+
-+#define INTEL_FAM6_METEORLAKE 0xAC
-+#define INTEL_FAM6_METEORLAKE_L 0xAA
-+
-+#define INTEL_FAM6_ARROWLAKE_H 0xC5
-+#define INTEL_FAM6_ARROWLAKE 0xC6
-+
-+#define INTEL_FAM6_LUNARLAKE_M 0xBD
-
--/* "Small Core" Processors (Atom) */
-+/* "Small Core" Processors (Atom/E-Core) */
-
- #define INTEL_FAM6_ATOM_BONNELL 0x1C /* Diamondville, Pineview */
- #define INTEL_FAM6_ATOM_BONNELL_MID 0x26 /* Silverthorne, Lincroft */
-@@ -134,6 +157,13 @@
- #define INTEL_FAM6_ATOM_TREMONT 0x96 /* Elkhart Lake */
- #define INTEL_FAM6_ATOM_TREMONT_L 0x9C /* Jasper Lake */
-
-+#define INTEL_FAM6_ATOM_GRACEMONT 0xBE /* Alderlake N */
-+
-+#define INTEL_FAM6_ATOM_CRESTMONT_X 0xAF /* Sierra Forest */
-+#define INTEL_FAM6_ATOM_CRESTMONT 0xB6 /* Grand Ridge */
-+
-+#define INTEL_FAM6_ATOM_DARKMONT_X 0xDD /* Clearwater Forest */
-+
- /* Xeon Phi */
-
- #define INTEL_FAM6_XEON_PHI_KNL 0x57 /* Knights Landing */
---
-2.44.0
-
diff --git a/0041-tools-c-o-xenstored-Don-t-link-against-libsystemd.patch b/0041-tools-c-o-xenstored-Don-t-link-against-libsystemd.patch
new file mode 100644
index 0000000..5bf3f98
--- /dev/null
+++ b/0041-tools-c-o-xenstored-Don-t-link-against-libsystemd.patch
@@ -0,0 +1,87 @@
+From 77cf215157d267a7776f3c4ec32e89064dcd84cd Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:10:29 +0200
+Subject: [PATCH 41/56] tools/{c,o}xenstored: Don't link against libsystemd
+
+Use the local freestanding wrapper instead.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Acked-by: Christian Lindig <christian.lindig@cloud.com>
+master commit: caf864482689a5dd6a945759b6372bb260d49665
+master date: 2024-05-23 15:04:40 +0100
+---
+ tools/ocaml/xenstored/Makefile | 3 +--
+ tools/ocaml/xenstored/systemd_stubs.c | 2 +-
+ tools/xenstored/Makefile | 5 -----
+ tools/xenstored/core.c | 4 ++--
+ 4 files changed, 4 insertions(+), 10 deletions(-)
+
+diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile
+index e8aaecf2e6..fa45305d8c 100644
+--- a/tools/ocaml/xenstored/Makefile
++++ b/tools/ocaml/xenstored/Makefile
+@@ -4,8 +4,7 @@ include $(OCAML_TOPLEVEL)/common.make
+
+ # Include configure output (config.h)
+ CFLAGS += -include $(XEN_ROOT)/tools/config.h
+-CFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_CFLAGS)
+-LDFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_LIBS)
++CFLAGS-$(CONFIG_SYSTEMD) += $(CFLAGS_xeninclude)
+
+ CFLAGS += $(CFLAGS-y)
+ CFLAGS += $(APPEND_CFLAGS)
+diff --git a/tools/ocaml/xenstored/systemd_stubs.c b/tools/ocaml/xenstored/systemd_stubs.c
+index f4c875075a..7dbbdd35bf 100644
+--- a/tools/ocaml/xenstored/systemd_stubs.c
++++ b/tools/ocaml/xenstored/systemd_stubs.c
+@@ -25,7 +25,7 @@
+
+ #if defined(HAVE_SYSTEMD)
+
+-#include <systemd/sd-daemon.h>
++#include <xen-sd-notify.h>
+
+ CAMLprim value ocaml_sd_notify_ready(value ignore)
+ {
+diff --git a/tools/xenstored/Makefile b/tools/xenstored/Makefile
+index e0897ed1ba..09adfe1d50 100644
+--- a/tools/xenstored/Makefile
++++ b/tools/xenstored/Makefile
+@@ -9,11 +9,6 @@ xenstored: LDLIBS += $(LDLIBS_libxenctrl)
+ xenstored: LDLIBS += -lrt
+ xenstored: LDLIBS += $(SOCKET_LIBS)
+
+-ifeq ($(CONFIG_SYSTEMD),y)
+-$(XENSTORED_OBJS-y): CFLAGS += $(SYSTEMD_CFLAGS)
+-xenstored: LDLIBS += $(SYSTEMD_LIBS)
+-endif
+-
+ TARGETS := xenstored
+
+ .PHONY: all
+diff --git a/tools/xenstored/core.c b/tools/xenstored/core.c
+index edd07711db..dfe98e7bfc 100644
+--- a/tools/xenstored/core.c
++++ b/tools/xenstored/core.c
+@@ -61,7 +61,7 @@
+ #endif
+
+ #if defined(XEN_SYSTEMD_ENABLED)
+-#include <systemd/sd-daemon.h>
++#include <xen-sd-notify.h>
+ #endif
+
+ extern xenevtchn_handle *xce_handle; /* in domain.c */
+@@ -3000,7 +3000,7 @@ int main(int argc, char *argv[])
+ #if defined(XEN_SYSTEMD_ENABLED)
+ if (!live_update) {
+ sd_notify(1, "READY=1");
+- fprintf(stderr, SD_NOTICE "xenstored is ready\n");
++ fprintf(stderr, "xenstored is ready\n");
+ }
+ #endif
+
+--
+2.45.2
+
diff --git a/0041-x86-vmx-Perform-VERW-flushing-later-in-the-VMExit-pa.patch b/0041-x86-vmx-Perform-VERW-flushing-later-in-the-VMExit-pa.patch
deleted file mode 100644
index 871f10f..0000000
--- a/0041-x86-vmx-Perform-VERW-flushing-later-in-the-VMExit-pa.patch
+++ /dev/null
@@ -1,146 +0,0 @@
-From 77f2bec134049aba29b9b459f955022722d10847 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 23 Jun 2023 11:32:00 +0100
-Subject: [PATCH 41/67] x86/vmx: Perform VERW flushing later in the VMExit path
-
-Broken out of the following patch because this change is subtle enough on its
-own. See it for the rational of why we're moving VERW.
-
-As for how, extend the trick already used to hold one condition in
-flags (RESUME vs LAUNCH) through the POPing of GPRs.
-
-Move the MOV CR earlier. Intel specify flags to be undefined across it.
-
-Encode the two conditions we want using SF and PF. See the code comment for
-exactly how.
-
-Leave a comment to explain the lack of any content around
-SPEC_CTRL_EXIT_TO_VMX, but leave the block in place. Sods law says if we
-delete it, we'll need to reintroduce it.
-
-This is part of XSA-452 / CVE-2023-28746.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 475fa20b7384464210f42bad7195f87bd6f1c63f)
----
- xen/arch/x86/hvm/vmx/entry.S | 36 +++++++++++++++++++++---
- xen/arch/x86/include/asm/asm_defns.h | 8 ++++++
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 7 +++++
- xen/arch/x86/x86_64/asm-offsets.c | 1 +
- 4 files changed, 48 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
-index 5f5de45a13..cdde76e138 100644
---- a/xen/arch/x86/hvm/vmx/entry.S
-+++ b/xen/arch/x86/hvm/vmx/entry.S
-@@ -87,17 +87,39 @@ UNLIKELY_END(realmode)
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
- /* SPEC_CTRL_EXIT_TO_VMX Req: %rsp=regs/cpuinfo Clob: */
-- DO_SPEC_CTRL_COND_VERW
-+ /*
-+ * All speculation safety work happens to be elsewhere. VERW is after
-+ * popping the GPRs, while restoring the guest MSR_SPEC_CTRL is left
-+ * to the MSR load list.
-+ */
-
- mov VCPU_hvm_guest_cr2(%rbx),%rax
-+ mov %rax, %cr2
-+
-+ /*
-+ * We need to perform two conditional actions (VERW, and Resume vs
-+ * Launch) after popping GPRs. With some cunning, we can encode both
-+ * of these in eflags together.
-+ *
-+ * Parity is only calculated over the bottom byte of the answer, while
-+ * Sign is simply the top bit.
-+ *
-+ * Therefore, the final OR instruction ends up producing:
-+ * SF = VCPU_vmx_launched
-+ * PF = !SCF_verw
-+ */
-+ BUILD_BUG_ON(SCF_verw & ~0xff)
-+ movzbl VCPU_vmx_launched(%rbx), %ecx
-+ shl $31, %ecx
-+ movzbl CPUINFO_spec_ctrl_flags(%rsp), %eax
-+ and $SCF_verw, %eax
-+ or %eax, %ecx
-
- pop %r15
- pop %r14
- pop %r13
- pop %r12
- pop %rbp
-- mov %rax,%cr2
-- cmpb $0,VCPU_vmx_launched(%rbx)
- pop %rbx
- pop %r11
- pop %r10
-@@ -108,7 +130,13 @@ UNLIKELY_END(realmode)
- pop %rdx
- pop %rsi
- pop %rdi
-- je .Lvmx_launch
-+
-+ jpe .L_skip_verw
-+ /* VERW clobbers ZF, but preserves all others, including SF. */
-+ verw STK_REL(CPUINFO_verw_sel, CPUINFO_error_code)(%rsp)
-+.L_skip_verw:
-+
-+ jns .Lvmx_launch
-
- /*.Lvmx_resume:*/
- VMRESUME
-diff --git a/xen/arch/x86/include/asm/asm_defns.h b/xen/arch/x86/include/asm/asm_defns.h
-index d9431180cf..abc6822b08 100644
---- a/xen/arch/x86/include/asm/asm_defns.h
-+++ b/xen/arch/x86/include/asm/asm_defns.h
-@@ -81,6 +81,14 @@ register unsigned long current_stack_pointer asm("rsp");
-
- #ifdef __ASSEMBLY__
-
-+.macro BUILD_BUG_ON condstr, cond:vararg
-+ .if \cond
-+ .error "Condition \"\condstr\" not satisfied"
-+ .endif
-+.endm
-+/* preprocessor macro to make error message more user friendly */
-+#define BUILD_BUG_ON(cond) BUILD_BUG_ON #cond, cond
-+
- #ifdef HAVE_AS_QUOTED_SYM
- #define SUBSECTION_LBL(tag) \
- .ifndef .L.tag; \
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index f4b8b9d956..ca9cb0f5dd 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -164,6 +164,13 @@
- #endif
- .endm
-
-+/*
-+ * Helper to improve the readability of stack displacements with %rsp in
-+ * unusual positions. Both @field and @top_of_stack should be constants from
-+ * the same object. @top_of_stack should be where %rsp is currently pointing.
-+ */
-+#define STK_REL(field, top_of_stk) ((field) - (top_of_stk))
-+
- .macro DO_SPEC_CTRL_COND_VERW
- /*
- * Requires %rsp=cpuinfo
-diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
-index 31fa63b77f..a4e94d6930 100644
---- a/xen/arch/x86/x86_64/asm-offsets.c
-+++ b/xen/arch/x86/x86_64/asm-offsets.c
-@@ -135,6 +135,7 @@ void __dummy__(void)
- #endif
-
- OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
-+ OFFSET(CPUINFO_error_code, struct cpu_info, guest_cpu_user_regs.error_code);
- OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
- OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
- OFFSET(CPUINFO_per_cpu_offset, struct cpu_info, per_cpu_offset);
---
-2.44.0
-
diff --git a/0042-tools-Drop-libsystemd-as-a-dependency.patch b/0042-tools-Drop-libsystemd-as-a-dependency.patch
new file mode 100644
index 0000000..168680e
--- /dev/null
+++ b/0042-tools-Drop-libsystemd-as-a-dependency.patch
@@ -0,0 +1,648 @@
+From 7967bd358e93ed83e01813a8d0dfd68aa67f5780 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:10:40 +0200
+Subject: [PATCH 42/56] tools: Drop libsystemd as a dependency
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+There are no more users, and we want to dissuade people from introducing new
+users just for sd_notify() and friends. Drop the dependency.
+
+We still want the overall --with{,out}-systemd to gate the generation of the
+service/unit/mount/etc files.
+
+Rerun autogen.sh, and mark the dependency as removed in the build containers.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Acked-by: Christian Lindig <christian.lindig@cloud.com>
+
+tools: (Actually) drop libsystemd as a dependency
+
+When reinstating some of systemd.m4 between v1 and v2, I reintroduced a little
+too much. While {c,o}xenstored are indeed no longer linked against
+libsystemd, ./configure still looks for it.
+
+Drop this too.
+
+Fixes: ae26101f6bfc ("tools: Drop libsystemd as a dependency")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: ae26101f6bfc8185adcdb9165d469bdc467780db
+master date: 2024-05-23 15:04:40 +0100
+master commit: 6ef4fa1e7fe78c1dae07b451292b07facfce4902
+master date: 2024-05-30 12:15:25 +0100
+---
+ CHANGELOG.md | 7 +-
+ config/Tools.mk.in | 2 -
+ m4/systemd.m4 | 17 --
+ tools/configure | 485 +--------------------------------------------
+ 4 files changed, 7 insertions(+), 504 deletions(-)
+
+diff --git a/CHANGELOG.md b/CHANGELOG.md
+index fa54d59df1..ceca12eb5f 100644
+--- a/CHANGELOG.md
++++ b/CHANGELOG.md
+@@ -4,7 +4,12 @@ Notable changes to Xen will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
+
+-## [4.18.2](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.18.2)
++## [4.18.3](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.18.3)
++
++### Changed
++ - When building with Systemd support (./configure --enable-systemd), remove
++ libsystemd as a build dependency. Systemd Notify support is retained, now
++ using a standalone library implementation.
+
+ ## [4.18.1](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.18.1)
+
+diff --git a/config/Tools.mk.in b/config/Tools.mk.in
+index b54ab21f96..50fbef841f 100644
+--- a/config/Tools.mk.in
++++ b/config/Tools.mk.in
+@@ -52,8 +52,6 @@ CONFIG_PYGRUB := @pygrub@
+ CONFIG_LIBFSIMAGE := @libfsimage@
+
+ CONFIG_SYSTEMD := @systemd@
+-SYSTEMD_CFLAGS := @SYSTEMD_CFLAGS@
+-SYSTEMD_LIBS := @SYSTEMD_LIBS@
+ XEN_SYSTEMD_DIR := @SYSTEMD_DIR@
+ XEN_SYSTEMD_MODULES_LOAD := @SYSTEMD_MODULES_LOAD@
+ CONFIG_9PFS := @ninepfs@
+diff --git a/m4/systemd.m4 b/m4/systemd.m4
+index 112dc11b5e..ab12ea313d 100644
+--- a/m4/systemd.m4
++++ b/m4/systemd.m4
+@@ -41,15 +41,6 @@ AC_DEFUN([AX_ALLOW_SYSTEMD_OPTS], [
+ ])
+
+ AC_DEFUN([AX_CHECK_SYSTEMD_LIBS], [
+- PKG_CHECK_MODULES([SYSTEMD], [libsystemd-daemon],,
+- [PKG_CHECK_MODULES([SYSTEMD], [libsystemd >= 209])]
+- )
+- dnl pkg-config older than 0.24 does not set these for
+- dnl PKG_CHECK_MODULES() worth also noting is that as of version 208
+- dnl of systemd pkg-config --cflags currently yields no extra flags yet.
+- AC_SUBST([SYSTEMD_CFLAGS])
+- AC_SUBST([SYSTEMD_LIBS])
+-
+ AS_IF([test "x$SYSTEMD_DIR" = x], [
+ dnl In order to use the line below we need to fix upstream systemd
+ dnl to properly ${prefix} for child variables in
+@@ -95,13 +86,6 @@ AC_DEFUN([AX_CHECK_SYSTEMD], [
+ ],[systemd=n])
+ ])
+
+-AC_DEFUN([AX_CHECK_SYSTEMD_ENABLE_AVAILABLE], [
+- PKG_CHECK_MODULES([SYSTEMD], [libsystemd-daemon], [systemd="y"],[
+- PKG_CHECK_MODULES([SYSTEMD], [libsystemd >= 209],
+- [systemd="y"],[systemd="n"])
+- ])
+-])
+-
+ dnl Enables systemd by default and requires a --disable-systemd option flag
+ dnl to configure if you want to disable.
+ AC_DEFUN([AX_ENABLE_SYSTEMD], [
+@@ -121,6 +105,5 @@ dnl to have systemd build libraries it will be enabled. You can always force
+ dnl disable with --disable-systemd
+ AC_DEFUN([AX_AVAILABLE_SYSTEMD], [
+ AX_ALLOW_SYSTEMD_OPTS()
+- AX_CHECK_SYSTEMD_ENABLE_AVAILABLE()
+ AX_CHECK_SYSTEMD()
+ ])
+diff --git a/tools/configure b/tools/configure
+index 38c0808d3a..7bb935d23b 100755
+--- a/tools/configure
++++ b/tools/configure
+@@ -626,8 +626,6 @@ ac_subst_vars='LTLIBOBJS
+ LIBOBJS
+ pvshim
+ ninepfs
+-SYSTEMD_LIBS
+-SYSTEMD_CFLAGS
+ SYSTEMD_MODULES_LOAD
+ SYSTEMD_DIR
+ systemd
+@@ -864,9 +862,7 @@ pixman_LIBS
+ libzstd_CFLAGS
+ libzstd_LIBS
+ LIBNL3_CFLAGS
+-LIBNL3_LIBS
+-SYSTEMD_CFLAGS
+-SYSTEMD_LIBS'
++LIBNL3_LIBS'
+
+
+ # Initialize some variables set by options.
+@@ -1621,10 +1617,6 @@ Some influential environment variables:
+ LIBNL3_CFLAGS
+ C compiler flags for LIBNL3, overriding pkg-config
+ LIBNL3_LIBS linker flags for LIBNL3, overriding pkg-config
+- SYSTEMD_CFLAGS
+- C compiler flags for SYSTEMD, overriding pkg-config
+- SYSTEMD_LIBS
+- linker flags for SYSTEMD, overriding pkg-config
+
+ Use these variables to override the choices made by `configure' or to help
+ it to find libraries and programs with nonstandard names/locations.
+@@ -3889,8 +3881,6 @@ esac
+
+
+
+-
+-
+
+
+
+@@ -9540,223 +9530,6 @@ fi
+
+
+
+-
+-pkg_failed=no
+-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
+-$as_echo_n "checking for SYSTEMD... " >&6; }
+-
+-if test -n "$SYSTEMD_CFLAGS"; then
+- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd-daemon" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-if test -n "$SYSTEMD_LIBS"; then
+- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd-daemon" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-
+-
+-
+-if test $pkg_failed = yes; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+- _pkg_short_errors_supported=yes
+-else
+- _pkg_short_errors_supported=no
+-fi
+- if test $_pkg_short_errors_supported = yes; then
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
+- else
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
+- fi
+- # Put the nasty error message in config.log where it belongs
+- echo "$SYSTEMD_PKG_ERRORS" >&5
+-
+-
+-
+-pkg_failed=no
+-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
+-$as_echo_n "checking for SYSTEMD... " >&6; }
+-
+-if test -n "$SYSTEMD_CFLAGS"; then
+- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-if test -n "$SYSTEMD_LIBS"; then
+- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-
+-
+-
+-if test $pkg_failed = yes; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+- _pkg_short_errors_supported=yes
+-else
+- _pkg_short_errors_supported=no
+-fi
+- if test $_pkg_short_errors_supported = yes; then
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- else
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- fi
+- # Put the nasty error message in config.log where it belongs
+- echo "$SYSTEMD_PKG_ERRORS" >&5
+-
+- systemd="n"
+-elif test $pkg_failed = untried; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+- systemd="n"
+-else
+- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
+- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+-$as_echo "yes" >&6; }
+- systemd="y"
+-fi
+-
+-elif test $pkg_failed = untried; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-
+-pkg_failed=no
+-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
+-$as_echo_n "checking for SYSTEMD... " >&6; }
+-
+-if test -n "$SYSTEMD_CFLAGS"; then
+- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-if test -n "$SYSTEMD_LIBS"; then
+- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-
+-
+-
+-if test $pkg_failed = yes; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+- _pkg_short_errors_supported=yes
+-else
+- _pkg_short_errors_supported=no
+-fi
+- if test $_pkg_short_errors_supported = yes; then
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- else
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- fi
+- # Put the nasty error message in config.log where it belongs
+- echo "$SYSTEMD_PKG_ERRORS" >&5
+-
+- systemd="n"
+-elif test $pkg_failed = untried; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+- systemd="n"
+-else
+- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
+- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+-$as_echo "yes" >&6; }
+- systemd="y"
+-fi
+-
+-else
+- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
+- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+-$as_echo "yes" >&6; }
+- systemd="y"
+-fi
+-
+-
+ if test "x$enable_systemd" != "xno"; then :
+
+ if test "x$systemd" = "xy" ; then :
+@@ -9766,262 +9539,6 @@ $as_echo "#define HAVE_SYSTEMD 1" >>confdefs.h
+
+ systemd=y
+
+-
+-pkg_failed=no
+-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
+-$as_echo_n "checking for SYSTEMD... " >&6; }
+-
+-if test -n "$SYSTEMD_CFLAGS"; then
+- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd-daemon" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-if test -n "$SYSTEMD_LIBS"; then
+- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd-daemon" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-
+-
+-
+-if test $pkg_failed = yes; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+- _pkg_short_errors_supported=yes
+-else
+- _pkg_short_errors_supported=no
+-fi
+- if test $_pkg_short_errors_supported = yes; then
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
+- else
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
+- fi
+- # Put the nasty error message in config.log where it belongs
+- echo "$SYSTEMD_PKG_ERRORS" >&5
+-
+-
+-pkg_failed=no
+-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
+-$as_echo_n "checking for SYSTEMD... " >&6; }
+-
+-if test -n "$SYSTEMD_CFLAGS"; then
+- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-if test -n "$SYSTEMD_LIBS"; then
+- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-
+-
+-
+-if test $pkg_failed = yes; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+- _pkg_short_errors_supported=yes
+-else
+- _pkg_short_errors_supported=no
+-fi
+- if test $_pkg_short_errors_supported = yes; then
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- else
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- fi
+- # Put the nasty error message in config.log where it belongs
+- echo "$SYSTEMD_PKG_ERRORS" >&5
+-
+- as_fn_error $? "Package requirements (libsystemd >= 209) were not met:
+-
+-$SYSTEMD_PKG_ERRORS
+-
+-Consider adjusting the PKG_CONFIG_PATH environment variable if you
+-installed software in a non-standard prefix.
+-
+-Alternatively, you may set the environment variables SYSTEMD_CFLAGS
+-and SYSTEMD_LIBS to avoid the need to call pkg-config.
+-See the pkg-config man page for more details." "$LINENO" 5
+-elif test $pkg_failed = untried; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+-$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+-as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+-is in your PATH or set the PKG_CONFIG environment variable to the full
+-path to pkg-config.
+-
+-Alternatively, you may set the environment variables SYSTEMD_CFLAGS
+-and SYSTEMD_LIBS to avoid the need to call pkg-config.
+-See the pkg-config man page for more details.
+-
+-To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+-See \`config.log' for more details" "$LINENO" 5; }
+-else
+- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
+- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+-$as_echo "yes" >&6; }
+-
+-fi
+-
+-elif test $pkg_failed = untried; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-pkg_failed=no
+-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
+-$as_echo_n "checking for SYSTEMD... " >&6; }
+-
+-if test -n "$SYSTEMD_CFLAGS"; then
+- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-if test -n "$SYSTEMD_LIBS"; then
+- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
+- elif test -n "$PKG_CONFIG"; then
+- if test -n "$PKG_CONFIG" && \
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
+- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
+- ac_status=$?
+- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+- test $ac_status = 0; }; then
+- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
+- test "x$?" != "x0" && pkg_failed=yes
+-else
+- pkg_failed=yes
+-fi
+- else
+- pkg_failed=untried
+-fi
+-
+-
+-
+-if test $pkg_failed = yes; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+-
+-if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+- _pkg_short_errors_supported=yes
+-else
+- _pkg_short_errors_supported=no
+-fi
+- if test $_pkg_short_errors_supported = yes; then
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- else
+- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
+- fi
+- # Put the nasty error message in config.log where it belongs
+- echo "$SYSTEMD_PKG_ERRORS" >&5
+-
+- as_fn_error $? "Package requirements (libsystemd >= 209) were not met:
+-
+-$SYSTEMD_PKG_ERRORS
+-
+-Consider adjusting the PKG_CONFIG_PATH environment variable if you
+-installed software in a non-standard prefix.
+-
+-Alternatively, you may set the environment variables SYSTEMD_CFLAGS
+-and SYSTEMD_LIBS to avoid the need to call pkg-config.
+-See the pkg-config man page for more details." "$LINENO" 5
+-elif test $pkg_failed = untried; then
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+-$as_echo "no" >&6; }
+- { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+-$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+-as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
+-is in your PATH or set the PKG_CONFIG environment variable to the full
+-path to pkg-config.
+-
+-Alternatively, you may set the environment variables SYSTEMD_CFLAGS
+-and SYSTEMD_LIBS to avoid the need to call pkg-config.
+-See the pkg-config man page for more details.
+-
+-To get pkg-config, see <http://pkg-config.freedesktop.org/>.
+-See \`config.log' for more details" "$LINENO" 5; }
+-else
+- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
+- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+-$as_echo "yes" >&6; }
+-
+-fi
+-
+-else
+- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
+- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
+- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+-$as_echo "yes" >&6; }
+-
+-fi
+-
+-
+-
+ if test "x$SYSTEMD_DIR" = x; then :
+
+ SYSTEMD_DIR="\$(prefix)/lib/systemd/system/"
+--
+2.45.2
+
diff --git a/0042-x86-spec-ctrl-Perform-VERW-flushing-later-in-exit-pa.patch b/0042-x86-spec-ctrl-Perform-VERW-flushing-later-in-exit-pa.patch
deleted file mode 100644
index ac78acd..0000000
--- a/0042-x86-spec-ctrl-Perform-VERW-flushing-later-in-exit-pa.patch
+++ /dev/null
@@ -1,209 +0,0 @@
-From 76af773de5d3e68b7140cc9c5343be6746c9101c Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Sat, 27 Jan 2024 18:20:56 +0000
-Subject: [PATCH 42/67] x86/spec-ctrl: Perform VERW flushing later in exit
- paths
-
-On parts vulnerable to RFDS, VERW's side effects are extended to scrub all
-non-architectural entries in various Physical Register Files. To remove all
-of Xen's values, the VERW must be after popping the GPRs.
-
-Rework SPEC_CTRL_COND_VERW to default to an CPUINFO_error_code %rsp position,
-but with overrides for other contexts. Identify that it clobbers eflags; this
-is particularly relevant for the SYSRET path.
-
-For the IST exit return to Xen, have the main SPEC_CTRL_EXIT_TO_XEN put a
-shadow copy of spec_ctrl_flags, as GPRs can't be used at the point we want to
-issue the VERW.
-
-This is part of XSA-452 / CVE-2023-28746.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 0a666cf2cd99df6faf3eebc81a1fc286e4eca4c7)
----
- xen/arch/x86/include/asm/spec_ctrl_asm.h | 36 ++++++++++++++++--------
- xen/arch/x86/x86_64/asm-offsets.c | 13 +++++++--
- xen/arch/x86/x86_64/compat/entry.S | 6 ++++
- xen/arch/x86/x86_64/entry.S | 21 +++++++++++++-
- 4 files changed, 61 insertions(+), 15 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-index ca9cb0f5dd..97a97b2b82 100644
---- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
-+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
-@@ -171,16 +171,23 @@
- */
- #define STK_REL(field, top_of_stk) ((field) - (top_of_stk))
-
--.macro DO_SPEC_CTRL_COND_VERW
-+.macro SPEC_CTRL_COND_VERW \
-+ scf=STK_REL(CPUINFO_spec_ctrl_flags, CPUINFO_error_code), \
-+ sel=STK_REL(CPUINFO_verw_sel, CPUINFO_error_code)
- /*
-- * Requires %rsp=cpuinfo
-+ * Requires \scf and \sel as %rsp-relative expressions
-+ * Clobbers eflags
-+ *
-+ * VERW needs to run after guest GPRs have been restored, where only %rsp is
-+ * good to use. Default to expecting %rsp pointing at CPUINFO_error_code.
-+ * Contexts where this is not true must provide an alternative \scf and \sel.
- *
- * Issue a VERW for its flushing side effect, if indicated. This is a Spectre
- * v1 gadget, but the IRET/VMEntry is serialising.
- */
-- testb $SCF_verw, CPUINFO_spec_ctrl_flags(%rsp)
-+ testb $SCF_verw, \scf(%rsp)
- jz .L\@_verw_skip
-- verw CPUINFO_verw_sel(%rsp)
-+ verw \sel(%rsp)
- .L\@_verw_skip:
- .endm
-
-@@ -298,8 +305,6 @@
- */
- ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV
-
-- DO_SPEC_CTRL_COND_VERW
--
- ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
- .endm
-
-@@ -379,7 +384,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- */
- .macro SPEC_CTRL_EXIT_TO_XEN
- /*
-- * Requires %r12=ist_exit, %r14=stack_end
-+ * Requires %r12=ist_exit, %r14=stack_end, %rsp=regs
- * Clobbers %rax, %rbx, %rcx, %rdx
- */
- movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
-@@ -407,11 +412,18 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
- test %r12, %r12
- jz .L\@_skip_ist_exit
-
-- /* Logically DO_SPEC_CTRL_COND_VERW but without the %rsp=cpuinfo dependency */
-- testb $SCF_verw, %bl
-- jz .L\@_skip_verw
-- verw STACK_CPUINFO_FIELD(verw_sel)(%r14)
--.L\@_skip_verw:
-+ /*
-+ * Stash SCF and verw_sel above eflags in the case of an IST_exit. The
-+ * VERW logic needs to run after guest GPRs have been restored; i.e. where
-+ * we cannot use %r12 or %r14 for the purposes they have here.
-+ *
-+ * When the CPU pushed this exception frame, it zero-extended eflags.
-+ * Therefore it is safe for the VERW logic to look at the stashed SCF
-+ * outside of the ist_exit condition. Also, this stashing won't influence
-+ * any other restore_all_guest() paths.
-+ */
-+ or $(__HYPERVISOR_DS32 << 16), %ebx
-+ mov %ebx, UREGS_eflags + 4(%rsp) /* EFRAME_shadow_scf/sel */
-
- ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
-
-diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
-index a4e94d6930..4cd5938d7b 100644
---- a/xen/arch/x86/x86_64/asm-offsets.c
-+++ b/xen/arch/x86/x86_64/asm-offsets.c
-@@ -55,14 +55,22 @@ void __dummy__(void)
- * EFRAME_* is for the entry/exit logic where %rsp is pointing at
- * UREGS_error_code and GPRs are still/already guest values.
- */
--#define OFFSET_EF(sym, mem) \
-+#define OFFSET_EF(sym, mem, ...) \
- DEFINE(sym, offsetof(struct cpu_user_regs, mem) - \
-- offsetof(struct cpu_user_regs, error_code))
-+ offsetof(struct cpu_user_regs, error_code) __VA_ARGS__)
-
- OFFSET_EF(EFRAME_entry_vector, entry_vector);
- OFFSET_EF(EFRAME_rip, rip);
- OFFSET_EF(EFRAME_cs, cs);
- OFFSET_EF(EFRAME_eflags, eflags);
-+
-+ /*
-+ * These aren't real fields. They're spare space, used by the IST
-+ * exit-to-xen path.
-+ */
-+ OFFSET_EF(EFRAME_shadow_scf, eflags, +4);
-+ OFFSET_EF(EFRAME_shadow_sel, eflags, +6);
-+
- OFFSET_EF(EFRAME_rsp, rsp);
- BLANK();
-
-@@ -136,6 +144,7 @@ void __dummy__(void)
-
- OFFSET(CPUINFO_guest_cpu_user_regs, struct cpu_info, guest_cpu_user_regs);
- OFFSET(CPUINFO_error_code, struct cpu_info, guest_cpu_user_regs.error_code);
-+ OFFSET(CPUINFO_rip, struct cpu_info, guest_cpu_user_regs.rip);
- OFFSET(CPUINFO_verw_sel, struct cpu_info, verw_sel);
- OFFSET(CPUINFO_current_vcpu, struct cpu_info, current_vcpu);
- OFFSET(CPUINFO_per_cpu_offset, struct cpu_info, per_cpu_offset);
-diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
-index 7c211314d8..3b2fbcd873 100644
---- a/xen/arch/x86/x86_64/compat/entry.S
-+++ b/xen/arch/x86/x86_64/compat/entry.S
-@@ -161,6 +161,12 @@ ENTRY(compat_restore_all_guest)
- SPEC_CTRL_EXIT_TO_PV /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
-
- RESTORE_ALL adj=8 compat=1
-+
-+ /* Account for ev/ec having already been popped off the stack. */
-+ SPEC_CTRL_COND_VERW \
-+ scf=STK_REL(CPUINFO_spec_ctrl_flags, CPUINFO_rip), \
-+ sel=STK_REL(CPUINFO_verw_sel, CPUINFO_rip)
-+
- .Lft0: iretq
- _ASM_PRE_EXTABLE(.Lft0, handle_exception)
-
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index 412cbeb3ec..ef517e2945 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -214,6 +214,9 @@ restore_all_guest:
- #endif
-
- mov EFRAME_rip(%rsp), %rcx
-+
-+ SPEC_CTRL_COND_VERW /* Req: %rsp=eframe Clob: efl */
-+
- cmpw $FLAT_USER_CS32, EFRAME_cs(%rsp)
- mov EFRAME_rsp(%rsp), %rsp
- je 1f
-@@ -227,6 +230,9 @@ restore_all_guest:
- iret_exit_to_guest:
- andl $~(X86_EFLAGS_IOPL | X86_EFLAGS_VM), EFRAME_eflags(%rsp)
- orl $X86_EFLAGS_IF, EFRAME_eflags(%rsp)
-+
-+ SPEC_CTRL_COND_VERW /* Req: %rsp=eframe Clob: efl */
-+
- addq $8,%rsp
- .Lft0: iretq
- _ASM_PRE_EXTABLE(.Lft0, handle_exception)
-@@ -679,9 +685,22 @@ UNLIKELY_START(ne, exit_cr3)
- UNLIKELY_END(exit_cr3)
-
- /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-- SPEC_CTRL_EXIT_TO_XEN /* Req: %r12=ist_exit %r14=end, Clob: abcd */
-+ SPEC_CTRL_EXIT_TO_XEN /* Req: %r12=ist_exit %r14=end %rsp=regs, Clob: abcd */
-
- RESTORE_ALL adj=8
-+
-+ /*
-+ * When the CPU pushed this exception frame, it zero-extended eflags.
-+ * For an IST exit, SPEC_CTRL_EXIT_TO_XEN stashed shadow copies of
-+ * spec_ctrl_flags and ver_sel above eflags, as we can't use any GPRs,
-+ * and we're at a random place on the stack, not in a CPUFINFO block.
-+ *
-+ * Account for ev/ec having already been popped off the stack.
-+ */
-+ SPEC_CTRL_COND_VERW \
-+ scf=STK_REL(EFRAME_shadow_scf, EFRAME_rip), \
-+ sel=STK_REL(EFRAME_shadow_sel, EFRAME_rip)
-+
- iretq
-
- ENTRY(common_interrupt)
---
-2.44.0
-
diff --git a/0043-x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch b/0043-x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch
new file mode 100644
index 0000000..c368c1d
--- /dev/null
+++ b/0043-x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch
@@ -0,0 +1,46 @@
+From 0dc5fbee17cd2bcb1aa6a1cf420dd80381587de8 Mon Sep 17 00:00:00 2001
+From: Matthew Barnes <matthew.barnes@cloud.com>
+Date: Thu, 4 Jul 2024 14:11:03 +0200
+Subject: [PATCH 43/56] x86/ioapic: Fix signed shifts in io_apic.c
+
+There are bit shifts in the IOAPIC code where signed integers are
+shifted left by up to 31 bits, which is undefined behaviour.
+
+Fix this by changing the integer constants involved from signed to unsigned.
+
+Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: c5746b021e573184fb92b601a0e93a295485054e
+master date: 2024-06-21 15:09:26 +0100
+---
+ xen/arch/x86/io_apic.c | 6 ++++--
+ 1 file changed, 4 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
+index 0ef61fb2f1..c5342789e8 100644
+--- a/xen/arch/x86/io_apic.c
++++ b/xen/arch/x86/io_apic.c
+@@ -1692,7 +1692,8 @@ static void cf_check mask_and_ack_level_ioapic_irq(struct irq_desc *desc)
+ !io_apic_level_ack_pending(desc->irq))
+ move_masked_irq(desc);
+
+- if ( !(v & (1 << (i & 0x1f))) ) {
++ if ( !(v & (1U << (i & 0x1f))) )
++ {
+ spin_lock(&ioapic_lock);
+ __edge_IO_APIC_irq(desc->irq);
+ __level_IO_APIC_irq(desc->irq);
+@@ -1756,7 +1757,8 @@ static void cf_check end_level_ioapic_irq_new(struct irq_desc *desc, u8 vector)
+ !io_apic_level_ack_pending(desc->irq) )
+ move_native_irq(desc);
+
+- if (!(v & (1 << (i & 0x1f)))) {
++ if ( !(v & (1U << (i & 0x1f))) )
++ {
+ spin_lock(&ioapic_lock);
+ __mask_IO_APIC_irq(desc->irq);
+ __edge_IO_APIC_irq(desc->irq);
+--
+2.45.2
+
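As background for the shift fix above: left-shifting 1 into bit 31 of a signed
int is undefined behaviour in C, whereas the same shift on an unsigned constant
is well defined. A minimal stand-alone sketch of the before/after pattern
(illustrative only, not part of the patch series):

    #include <stdio.h>

    int main(void)
    {
        unsigned int i = 31;                    /* worst case: the top bit */

        /* int bad = 1 << (i & 0x1f);    -- undefined behaviour at i == 31 */
        unsigned int ok = 1U << (i & 0x1f);     /* well defined for 0..31 */

        printf("%#x\n", ok);                    /* prints 0x80000000 */
        return 0;
    }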
diff --git a/0043-x86-spec-ctrl-Rename-VERW-related-options.patch b/0043-x86-spec-ctrl-Rename-VERW-related-options.patch
deleted file mode 100644
index 38edc15..0000000
--- a/0043-x86-spec-ctrl-Rename-VERW-related-options.patch
+++ /dev/null
@@ -1,248 +0,0 @@
-From d55d52961d13d4fcd1441fcfca98f690e687b941 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Mon, 12 Feb 2024 17:50:43 +0000
-Subject: [PATCH 43/67] x86/spec-ctrl: Rename VERW related options
-
-VERW is going to be used for a 3rd purpose, and the existing nomenclature
-didn't survive the Stale MMIO issues terribly well.
-
-Rename the command line option from `md-clear=` to `verw=`. This is more
-consistent with other options which tend to be named based on what they're
-doing, not which feature enumeration they use behind the scenes. Retain
-`md-clear=` as a deprecated alias.
-
-Rename opt_md_clear_{pv,hvm} and opt_fb_clear_mmio to opt_verw_{pv,hvm,mmio},
-which has a side effect of making spec_ctrl_init_domain() rather clearer to
-follow.
-
-No functional change.
-
-This is part of XSA-452 / CVE-2023-28746.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit f7603ca252e4226739eb3129a5290ee3da3f8ea4)
----
- docs/misc/xen-command-line.pandoc | 15 ++++----
- xen/arch/x86/spec_ctrl.c | 62 ++++++++++++++++---------------
- 2 files changed, 40 insertions(+), 37 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 2006697226..d909ec94fe 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2324,7 +2324,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
-
- ### spec-ctrl (x86)
- > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
--> {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
-+> {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
- > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
- > eager-fpu,l1d-flush,branch-harden,srb-lock,
- > unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
-@@ -2349,7 +2349,7 @@ in place for guests to use.
-
- Use of a positive boolean value for either of these options is invalid.
-
--The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
-+The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `verw=` and `ibpb-entry=` options
- offer fine grained control over the primitives by Xen. These impact Xen's
- ability to protect itself, and/or Xen's ability to virtualise support for
- guests to use.
-@@ -2366,11 +2366,12 @@ guests to use.
- guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
- * `rsb=` offers control over whether to overwrite the Return Stack Buffer /
- Return Address Stack on entry to Xen and on idle.
--* `md-clear=` offers control over whether to use VERW to flush
-- microarchitectural buffers on idle and exit from Xen. *Note: For
-- compatibility with development versions of this fix, `mds=` is also accepted
-- on Xen 4.12 and earlier as an alias. Consult vendor documentation in
-- preference to here.*
-+* `verw=` offers control over whether to use VERW for its scrubbing side
-+ effects at appropriate privilege transitions. The exact side effects are
-+ microarchitecture and microcode specific. *Note: `md-clear=` is accepted as
-+ a deprecated alias. For compatibility with development versions of XSA-297,
-+ `mds=` is also accepted on Xen 4.12 and earlier as an alias. Consult vendor
-+ documentation in preference to here.*
- * `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
- Barrier) is used on entry to Xen. This is used by default on hardware
- vulnerable to Branch Type Confusion, and hardware vulnerable to Speculative
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 25a18ac598..e12ec9930c 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -37,8 +37,8 @@ static bool __initdata opt_msr_sc_pv = true;
- static bool __initdata opt_msr_sc_hvm = true;
- static int8_t __initdata opt_rsb_pv = -1;
- static bool __initdata opt_rsb_hvm = true;
--static int8_t __ro_after_init opt_md_clear_pv = -1;
--static int8_t __ro_after_init opt_md_clear_hvm = -1;
-+static int8_t __ro_after_init opt_verw_pv = -1;
-+static int8_t __ro_after_init opt_verw_hvm = -1;
-
- static int8_t __ro_after_init opt_ibpb_entry_pv = -1;
- static int8_t __ro_after_init opt_ibpb_entry_hvm = -1;
-@@ -78,7 +78,7 @@ static bool __initdata cpu_has_bug_mds; /* Any other M{LP,SB,FB}DS combination.
-
- static int8_t __initdata opt_srb_lock = -1;
- static bool __initdata opt_unpriv_mmio;
--static bool __ro_after_init opt_fb_clear_mmio;
-+static bool __ro_after_init opt_verw_mmio;
- static int8_t __initdata opt_gds_mit = -1;
- static int8_t __initdata opt_div_scrub = -1;
-
-@@ -120,8 +120,8 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- disable_common:
- opt_rsb_pv = false;
- opt_rsb_hvm = false;
-- opt_md_clear_pv = 0;
-- opt_md_clear_hvm = 0;
-+ opt_verw_pv = 0;
-+ opt_verw_hvm = 0;
- opt_ibpb_entry_pv = 0;
- opt_ibpb_entry_hvm = 0;
- opt_ibpb_entry_dom0 = false;
-@@ -152,14 +152,14 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- {
- opt_msr_sc_pv = val;
- opt_rsb_pv = val;
-- opt_md_clear_pv = val;
-+ opt_verw_pv = val;
- opt_ibpb_entry_pv = val;
- }
- else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
- {
- opt_msr_sc_hvm = val;
- opt_rsb_hvm = val;
-- opt_md_clear_hvm = val;
-+ opt_verw_hvm = val;
- opt_ibpb_entry_hvm = val;
- }
- else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
-@@ -204,21 +204,22 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- break;
- }
- }
-- else if ( (val = parse_boolean("md-clear", s, ss)) != -1 )
-+ else if ( (val = parse_boolean("verw", s, ss)) != -1 ||
-+ (val = parse_boolean("md-clear", s, ss)) != -1 )
- {
- switch ( val )
- {
- case 0:
- case 1:
-- opt_md_clear_pv = opt_md_clear_hvm = val;
-+ opt_verw_pv = opt_verw_hvm = val;
- break;
-
- case -2:
-- s += strlen("md-clear=");
-+ s += (*s == 'v') ? strlen("verw=") : strlen("md-clear=");
- if ( (val = parse_boolean("pv", s, ss)) >= 0 )
-- opt_md_clear_pv = val;
-+ opt_verw_pv = val;
- else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
-- opt_md_clear_hvm = val;
-+ opt_verw_hvm = val;
- else
- default:
- rc = -EINVAL;
-@@ -540,8 +541,8 @@ static void __init print_details(enum ind_thunk thunk)
- opt_srb_lock ? " SRB_LOCK+" : " SRB_LOCK-",
- opt_ibpb_ctxt_switch ? " IBPB-ctxt" : "",
- opt_l1d_flush ? " L1D_FLUSH" : "",
-- opt_md_clear_pv || opt_md_clear_hvm ||
-- opt_fb_clear_mmio ? " VERW" : "",
-+ opt_verw_pv || opt_verw_hvm ||
-+ opt_verw_mmio ? " VERW" : "",
- opt_div_scrub ? " DIV" : "",
- opt_branch_harden ? " BRANCH_HARDEN" : "");
-
-@@ -562,13 +563,13 @@ static void __init print_details(enum ind_thunk thunk)
- boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
- amd_virt_spec_ctrl ||
-- opt_eager_fpu || opt_md_clear_hvm) ? "" : " None",
-+ opt_eager_fpu || opt_verw_hvm) ? "" : " None",
- boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ? " MSR_SPEC_CTRL" : "",
- (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
- amd_virt_spec_ctrl) ? " MSR_VIRT_SPEC_CTRL" : "",
- boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ? " RSB" : "",
- opt_eager_fpu ? " EAGER_FPU" : "",
-- opt_md_clear_hvm ? " MD_CLEAR" : "",
-+ opt_verw_hvm ? " VERW" : "",
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ? " IBPB-entry" : "");
-
- #endif
-@@ -577,11 +578,11 @@ static void __init print_details(enum ind_thunk thunk)
- (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
- boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
-- opt_eager_fpu || opt_md_clear_pv) ? "" : " None",
-+ opt_eager_fpu || opt_verw_pv) ? "" : " None",
- boot_cpu_has(X86_FEATURE_SC_MSR_PV) ? " MSR_SPEC_CTRL" : "",
- boot_cpu_has(X86_FEATURE_SC_RSB_PV) ? " RSB" : "",
- opt_eager_fpu ? " EAGER_FPU" : "",
-- opt_md_clear_pv ? " MD_CLEAR" : "",
-+ opt_verw_pv ? " VERW" : "",
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ? " IBPB-entry" : "");
-
- printk(" XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
-@@ -1514,8 +1515,8 @@ void spec_ctrl_init_domain(struct domain *d)
- {
- bool pv = is_pv_domain(d);
-
-- bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-- (opt_fb_clear_mmio && is_iommu_enabled(d)));
-+ bool verw = ((pv ? opt_verw_pv : opt_verw_hvm) ||
-+ (opt_verw_mmio && is_iommu_enabled(d)));
-
- bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
- (d->domain_id != 0 || opt_ibpb_entry_dom0));
-@@ -1878,19 +1879,20 @@ void __init init_speculation_mitigations(void)
- * the return-to-guest path.
- */
- if ( opt_unpriv_mmio )
-- opt_fb_clear_mmio = cpu_has_fb_clear;
-+ opt_verw_mmio = cpu_has_fb_clear;
-
- /*
- * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
- * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
- * but it is somewhat better than nothing.
- */
-- if ( opt_md_clear_pv == -1 )
-- opt_md_clear_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
-- boot_cpu_has(X86_FEATURE_MD_CLEAR));
-- if ( opt_md_clear_hvm == -1 )
-- opt_md_clear_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
-- boot_cpu_has(X86_FEATURE_MD_CLEAR));
-+ if ( opt_verw_pv == -1 )
-+ opt_verw_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
-+ cpu_has_md_clear);
-+
-+ if ( opt_verw_hvm == -1 )
-+ opt_verw_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
-+ cpu_has_md_clear);
-
- /*
- * Enable MDS/MMIO defences as applicable. The Idle blocks need using if
-@@ -1903,12 +1905,12 @@ void __init init_speculation_mitigations(void)
- * MDS mitigations. L1D_FLUSH is not safe for MMIO mitigations.)
- *
- * After calculating the appropriate idle setting, simplify
-- * opt_md_clear_hvm to mean just "should we VERW on the way into HVM
-+ * opt_verw_hvm to mean just "should we VERW on the way into HVM
- * guests", so spec_ctrl_init_domain() can calculate suitable settings.
- */
-- if ( opt_md_clear_pv || opt_md_clear_hvm || opt_fb_clear_mmio )
-+ if ( opt_verw_pv || opt_verw_hvm || opt_verw_mmio )
- setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
-- opt_md_clear_hvm &= !cpu_has_skip_l1dfl && !opt_l1d_flush;
-+ opt_verw_hvm &= !cpu_has_skip_l1dfl && !opt_l1d_flush;
-
- /*
- * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
---
-2.44.0
-
diff --git a/0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch b/0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch
new file mode 100644
index 0000000..39dc3eb
--- /dev/null
+++ b/0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch
@@ -0,0 +1,53 @@
+From 2b3bf02c4f5e44d7d7bd3636530c9ebc837dea87 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:11:36 +0200
+Subject: [PATCH 44/56] tools/xl: Open xldevd.log with O_CLOEXEC
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+`xl devd` has been observed leaking /var/log/xldevd.log into children.
+
+Note this is specifically safe; dup2() leaves O_CLOEXEC disabled on newfd, so
+after setting up stdout/stderr, it's only the logfile fd which will close on
+exec().
+
+Link: https://github.com/QubesOS/qubes-issues/issues/8292
+Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Reviewed-by: Demi Marie Obenour <demi@invisiblethingslab.com>
+Acked-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: ba52b3b624e4a1a976908552364eba924ca45430
+master date: 2024-06-24 16:22:59 +0100
+---
+ tools/xl/xl_utils.c | 6 +++++-
+ 1 file changed, 5 insertions(+), 1 deletion(-)
+
+diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c
+index 17489d1829..b0d23b2cdb 100644
+--- a/tools/xl/xl_utils.c
++++ b/tools/xl/xl_utils.c
+@@ -27,6 +27,10 @@
+ #include "xl.h"
+ #include "xl_utils.h"
+
++#ifndef O_CLOEXEC
++#define O_CLOEXEC 0
++#endif
++
+ void dolog(const char *file, int line, const char *func, const char *fmt, ...)
+ {
+ va_list ap;
+@@ -270,7 +274,7 @@ int do_daemonize(const char *name, const char *pidfile)
+ exit(-1);
+ }
+
+- CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644));
++ CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644));
+ free(fullname);
+ assert(logfile >= 3);
+
+--
+2.45.2
+
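The dup2() note in the message above is doing the heavy lifting: FD_CLOEXEC is
a per-descriptor flag and is always clear on the descriptor dup2() creates, so
duplicating the log fd onto stdout/stderr leaves those inheritable by children
while the original fd still closes across exec(). A small stand-alone sketch of
that behaviour (hypothetical path, not the xl code):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int logfile = open("/tmp/example.log",
                           O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644);

        if (logfile < 0)
            return 1;

        /*
         * The duplicates have FD_CLOEXEC clear, so an exec()'d child keeps
         * its stdout/stderr; only the original logfile fd closes on exec().
         */
        dup2(logfile, STDOUT_FILENO);
        dup2(logfile, STDERR_FILENO);

        return 0;
    }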
diff --git a/0044-x86-spec-ctrl-VERW-handling-adjustments.patch b/0044-x86-spec-ctrl-VERW-handling-adjustments.patch
deleted file mode 100644
index e2458c9..0000000
--- a/0044-x86-spec-ctrl-VERW-handling-adjustments.patch
+++ /dev/null
@@ -1,171 +0,0 @@
-From 6663430b442fdf9698bd8e03f701a4547309ad71 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 5 Mar 2024 19:33:37 +0000
-Subject: [PATCH 44/67] x86/spec-ctrl: VERW-handling adjustments
-
-... before we add yet more complexity to this logic. Mostly expanded
-comments, but with three minor changes.
-
-1) Introduce cpu_has_useful_md_clear to simplify later logic in this patch and
- future ones.
-
-2) We only ever need SC_VERW_IDLE when SMT is active. If SMT isn't active,
- then there's no re-partition of pipeline resources based on thread-idleness
- to worry about.
-
-3) The logic to adjust HVM VERW based on L1D_FLUSH is unmaintainable and, as
- it turns out, wrong. SKIP_L1DFL is just a hint bit, whereas opt_l1d_flush
- is the relevant decision of whether to use L1D_FLUSH based on
- susceptibility and user preference.
-
- Rewrite the logic so it can be followed, and incorporate the fact that when
- FB_CLEAR is visible, L1D_FLUSH isn't a safe substitution.
-
-This is part of XSA-452 / CVE-2023-28746.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 1eb91a8a06230b4b64228c9a380194f8cfe6c5e2)
----
- xen/arch/x86/spec_ctrl.c | 99 +++++++++++++++++++++++++++++-----------
- 1 file changed, 73 insertions(+), 26 deletions(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index e12ec9930c..adb6bc74e8 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -1531,7 +1531,7 @@ void __init init_speculation_mitigations(void)
- {
- enum ind_thunk thunk = THUNK_DEFAULT;
- bool has_spec_ctrl, ibrs = false, hw_smt_enabled;
-- bool cpu_has_bug_taa, retpoline_safe;
-+ bool cpu_has_bug_taa, cpu_has_useful_md_clear, retpoline_safe;
-
- hw_smt_enabled = check_smt_enabled();
-
-@@ -1867,50 +1867,97 @@ void __init init_speculation_mitigations(void)
- "enabled. Please assess your configuration and choose an\n"
- "explicit 'smt=<bool>' setting. See XSA-273.\n");
-
-+ /*
-+ * A brief summary of VERW-related changes.
-+ *
-+ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-analysis-microarchitectural-data-sampling.html
-+ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html
-+ *
-+ * Relevant ucodes:
-+ *
-+ * - May 2019, for MDS. Introduces the MD_CLEAR CPUID bit and VERW side
-+ * effects to scrub Store/Load/Fill buffers as applicable. MD_CLEAR
-+ * exists architecturally, even when the side effects have been removed.
-+ *
-+ * Use VERW to scrub on return-to-guest. Parts with L1D_FLUSH to
-+ * mitigate L1TF have the same side effect, so no need to do both.
-+ *
-+ * Various Atoms suffer from Store-buffer sampling only. Store buffers
-+ * are statically partitioned between non-idle threads, so scrubbing is
-+ * wanted when going idle too.
-+ *
-+ * Load ports and Fill buffers are competitively shared between threads.
-+ * SMT must be disabled for VERW scrubbing to be fully effective.
-+ *
-+ * - November 2019, for TAA. Extended VERW side effects to TSX-enabled
-+ * MDS_NO parts.
-+ *
-+ * - February 2022, for Client TSX de-feature. Removed VERW side effects
-+ * from Client CPUs only.
-+ *
-+ * - May 2022, for MMIO Stale Data. (Re)introduced Fill Buffer scrubbing
-+ * on all MMIO-affected parts which didn't already have it for MDS
-+ * reasons, enumerating FB_CLEAR on those parts only.
-+ *
-+ * If FB_CLEAR is enumerated, L1D_FLUSH does not have the same scrubbing
-+ * side effects as VERW and cannot be used in its place.
-+ */
- mds_calculations();
-
- /*
-- * Parts which enumerate FB_CLEAR are those which are post-MDS_NO and have
-- * reintroduced the VERW fill buffer flushing side effect because of a
-- * susceptibility to FBSDP.
-+ * Parts which enumerate FB_CLEAR are those with now-updated microcode
-+ * which weren't susceptible to the original MFBDS (and therefore didn't
-+ * have Fill Buffer scrubbing side effects to begin with, or were Client
-+ * MDS_NO non-TAA_NO parts where the scrubbing was removed), but have had
-+ * the scrubbing reintroduced because of a susceptibility to FBSDP.
- *
- * If unprivileged guests have (or will have) MMIO mappings, we can
- * mitigate cross-domain leakage of fill buffer data by issuing VERW on
-- * the return-to-guest path.
-+ * the return-to-guest path. This is only a token effort if SMT is
-+ * active.
- */
- if ( opt_unpriv_mmio )
- opt_verw_mmio = cpu_has_fb_clear;
-
- /*
-- * By default, enable PV and HVM mitigations on MDS-vulnerable hardware.
-- * This will only be a token effort for MLPDS/MFBDS when HT is enabled,
-- * but it is somewhat better than nothing.
-+ * MD_CLEAR is enumerated architecturally forevermore, even after the
-+ * scrubbing side effects have been removed. Create ourselves a version
-+ * which expresses whether we think MD_CLEAR is having any useful side
-+ * effect.
-+ */
-+ cpu_has_useful_md_clear = (cpu_has_md_clear &&
-+ (cpu_has_bug_mds || cpu_has_bug_msbds_only));
-+
-+ /*
-+ * By default, use VERW scrubbing on applicable hardware, if we think it's
-+ * going to have an effect. This will only be a token effort for
-+ * MLPDS/MFBDS when SMT is enabled.
- */
- if ( opt_verw_pv == -1 )
-- opt_verw_pv = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
-- cpu_has_md_clear);
-+ opt_verw_pv = cpu_has_useful_md_clear;
-
- if ( opt_verw_hvm == -1 )
-- opt_verw_hvm = ((cpu_has_bug_mds || cpu_has_bug_msbds_only) &&
-- cpu_has_md_clear);
-+ opt_verw_hvm = cpu_has_useful_md_clear;
-
- /*
-- * Enable MDS/MMIO defences as applicable. The Idle blocks need using if
-- * either the PV or HVM MDS defences are used, or if we may give MMIO
-- * access to untrusted guests.
-- *
-- * HVM is more complicated. The MD_CLEAR microcode extends L1D_FLUSH with
-- * equivalent semantics to avoid needing to perform both flushes on the
-- * HVM path. Therefore, we don't need VERW in addition to L1D_FLUSH (for
-- * MDS mitigations. L1D_FLUSH is not safe for MMIO mitigations.)
-- *
-- * After calculating the appropriate idle setting, simplify
-- * opt_verw_hvm to mean just "should we VERW on the way into HVM
-- * guests", so spec_ctrl_init_domain() can calculate suitable settings.
-+ * If SMT is active, and we're protecting against MDS or MMIO stale data,
-+ * we need to scrub before going idle as well as on return to guest.
-+ * Various pipeline resources are repartitioned amongst non-idle threads.
- */
-- if ( opt_verw_pv || opt_verw_hvm || opt_verw_mmio )
-+ if ( ((cpu_has_useful_md_clear && (opt_verw_pv || opt_verw_hvm)) ||
-+ opt_verw_mmio) && hw_smt_enabled )
- setup_force_cpu_cap(X86_FEATURE_SC_VERW_IDLE);
-- opt_verw_hvm &= !cpu_has_skip_l1dfl && !opt_l1d_flush;
-+
-+ /*
-+ * After calculating the appropriate idle setting, simplify opt_verw_hvm
-+ * to mean just "should we VERW on the way into HVM guests", so
-+ * spec_ctrl_init_domain() can calculate suitable settings.
-+ *
-+ * It is only safe to use L1D_FLUSH in place of VERW when MD_CLEAR is the
-+ * only *_CLEAR we can see.
-+ */
-+ if ( opt_l1d_flush && cpu_has_md_clear && !cpu_has_fb_clear )
-+ opt_verw_hvm = false;
-
- /*
- * Warn the user if they are on MLPDS/MFBDS-vulnerable hardware with HT
---
-2.44.0
-
diff --git a/0045-pirq_cleanup_check-leaks.patch b/0045-pirq_cleanup_check-leaks.patch
new file mode 100644
index 0000000..dcf96c7
--- /dev/null
+++ b/0045-pirq_cleanup_check-leaks.patch
@@ -0,0 +1,84 @@
+From c9f50d2c5f29b630603e2b95f29e5b6e416a6187 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Thu, 4 Jul 2024 14:11:57 +0200
+Subject: [PATCH 45/56] pirq_cleanup_check() leaks
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Its original introduction had two issues: for one, the "common" part of
+the checks (carried out in the macro) was inverted. And then, after
+removal from the radix tree, the structure wasn't scheduled for freeing.
+(All structures still left in the radix tree would be freed upon domain
+destruction, though.)
+
+For the freeing to be safe even if it didn't use RCU (i.e. to avoid use-
+after-free), re-arrange checks/operations in evtchn_close(), such that
+the pointer wouldn't be used anymore after calling pirq_cleanup_check()
+(noting that unmap_domain_pirq_emuirq() itself calls the function in the
+success case).
+
+Fixes: c24536b636f2 ("replace d->nr_pirqs sized arrays with radix tree")
+Fixes: 79858fee307c ("xen: fix hvm_domain_use_pirq's behavior")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: daa90dfea9175c07f13d1a2d901857b2dd14d080
+master date: 2024-07-02 08:35:56 +0200
+---
+ xen/arch/x86/irq.c | 1 +
+ xen/common/event_channel.c | 11 ++++++++---
+ xen/include/xen/irq.h | 2 +-
+ 3 files changed, 10 insertions(+), 4 deletions(-)
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index 290f8d26e7..00be3b88e8 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -1413,6 +1413,7 @@ void (pirq_cleanup_check)(struct pirq *pirq, struct domain *d)
+
+ if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
+ BUG();
++ free_pirq_struct(pirq);
+ }
+
+ /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */
+diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
+index 66f924a7b0..b1a6215c37 100644
+--- a/xen/common/event_channel.c
++++ b/xen/common/event_channel.c
+@@ -705,11 +705,16 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
+ if ( !is_hvm_domain(d1) )
+ pirq_guest_unbind(d1, pirq);
+ pirq->evtchn = 0;
+- pirq_cleanup_check(pirq, d1);
+ #ifdef CONFIG_X86
+- if ( is_hvm_domain(d1) && domain_pirq_to_irq(d1, pirq->pirq) > 0 )
+- unmap_domain_pirq_emuirq(d1, pirq->pirq);
++ if ( !is_hvm_domain(d1) ||
++ domain_pirq_to_irq(d1, pirq->pirq) <= 0 ||
++ unmap_domain_pirq_emuirq(d1, pirq->pirq) < 0 )
++ /*
++ * The successful path of unmap_domain_pirq_emuirq() will have
++ * called pirq_cleanup_check() already.
++ */
+ #endif
++ pirq_cleanup_check(pirq, d1);
+ }
+ unlink_pirq_port(chn1, d1->vcpu[chn1->notify_vcpu_id]);
+ break;
+diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
+index 65083135e1..5dcd2d8f0c 100644
+--- a/xen/include/xen/irq.h
++++ b/xen/include/xen/irq.h
+@@ -180,7 +180,7 @@ extern struct pirq *pirq_get_info(struct domain *d, int pirq);
+ void pirq_cleanup_check(struct pirq *pirq, struct domain *d);
+
+ #define pirq_cleanup_check(pirq, d) \
+- ((pirq)->evtchn ? pirq_cleanup_check(pirq, d) : (void)0)
++ (!(pirq)->evtchn ? pirq_cleanup_check(pirq, d) : (void)0)
+
+ extern void pirq_guest_eoi(struct pirq *pirq);
+ extern void desc_guest_eoi(struct irq_desc *desc, struct pirq *pirq);
+--
+2.45.2
+
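The inverted check fixed above sits in a common Xen idiom: a real function
shadowed by a same-named macro that skips the call on the fast path, with the
parenthesised `(pirq_cleanup_check)` definition in irq.c escaping the macro.
A generic sketch of the pattern with the condition the right way round
(made-up `struct obj`, not the Xen types):

    #include <stdio.h>

    struct obj {
        int evtchn;     /* 0 means "no longer referenced, safe to clean up" */
    };

    void cleanup_check(struct obj *o);

    /* Fast-path wrapper: only call the real function when cleanup may be needed. */
    #define cleanup_check(o) \
        (!(o)->evtchn ? cleanup_check(o) : (void)0)

    /* Parenthesising the name stops the macro from expanding here. */
    void (cleanup_check)(struct obj *o)
    {
        printf("cleaning up\n");
    }

    int main(void)
    {
        struct obj o = { .evtchn = 0 };

        cleanup_check(&o);      /* the macro's condition holds, so the function runs */
        return 0;
    }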
diff --git a/0045-x86-spec-ctrl-Mitigation-Register-File-Data-Sampling.patch b/0045-x86-spec-ctrl-Mitigation-Register-File-Data-Sampling.patch
deleted file mode 100644
index 4a10524..0000000
--- a/0045-x86-spec-ctrl-Mitigation-Register-File-Data-Sampling.patch
+++ /dev/null
@@ -1,320 +0,0 @@
-From d85481135d87abbbf1feab18b749288fa08b65f2 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 22 Jun 2023 23:32:19 +0100
-Subject: [PATCH 45/67] x86/spec-ctrl: Mitigation Register File Data Sampling
-
-RFDS affects Atom cores, also branded E-cores, between the Goldmont and
-Gracemont microarchitectures. This includes Alder Lake and Raptor Lake hybrid
-client systems which have a mix of Gracemont and other types of cores.
-
-Two new bits have been defined: RFDS_CLEAR to indicate VERW has more side
-effects, and RFDS_NO to indicate that the system is unaffected. Plenty of
-unaffected CPUs won't be getting RFDS_NO retrofitted in microcode, so we
-synthesise it. Alder Lake and Raptor Lake Xeon-E's are unaffected due to
-their platform configuration, and we must use the Hybrid CPUID bit to
-distinguish them from their non-Xeon counterparts.
-
-Like MD_CLEAR and FB_CLEAR, RFDS_CLEAR needs OR-ing across a resource pool, so
-set it in the max policies and reflect the host setting in default.
-
-This is part of XSA-452 / CVE-2023-28746.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit fb5b6f6744713410c74cfc12b7176c108e3c9a31)
----
- tools/misc/xen-cpuid.c | 5 +-
- xen/arch/x86/cpu-policy.c | 5 +
- xen/arch/x86/include/asm/cpufeature.h | 3 +
- xen/arch/x86/include/asm/msr-index.h | 2 +
- xen/arch/x86/spec_ctrl.c | 100 +++++++++++++++++++-
- xen/include/public/arch-x86/cpufeatureset.h | 3 +
- 6 files changed, 111 insertions(+), 7 deletions(-)
-
-diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
-index aefc140d66..5ceea8be07 100644
---- a/tools/misc/xen-cpuid.c
-+++ b/tools/misc/xen-cpuid.c
-@@ -172,7 +172,7 @@ static const char *const str_7d0[32] =
- [ 8] = "avx512-vp2intersect", [ 9] = "srbds-ctrl",
- [10] = "md-clear", [11] = "rtm-always-abort",
- /* 12 */ [13] = "tsx-force-abort",
-- [14] = "serialize",
-+ [14] = "serialize", [15] = "hybrid",
- [16] = "tsxldtrk",
- [18] = "pconfig",
- [20] = "cet-ibt",
-@@ -237,7 +237,8 @@ static const char *const str_m10Al[32] =
- [20] = "bhi-no", [21] = "xapic-status",
- /* 22 */ [23] = "ovrclk-status",
- [24] = "pbrsb-no", [25] = "gds-ctrl",
-- [26] = "gds-no",
-+ [26] = "gds-no", [27] = "rfds-no",
-+ [28] = "rfds-clear",
- };
-
- static const char *const str_m10Ah[32] =
-diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
-index 7b875a7221..96c2cee1a8 100644
---- a/xen/arch/x86/cpu-policy.c
-+++ b/xen/arch/x86/cpu-policy.c
-@@ -444,6 +444,7 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
- */
- __set_bit(X86_FEATURE_MD_CLEAR, fs);
- __set_bit(X86_FEATURE_FB_CLEAR, fs);
-+ __set_bit(X86_FEATURE_RFDS_CLEAR, fs);
-
- /*
- * The Gather Data Sampling microcode mitigation (August 2023) has an
-@@ -493,6 +494,10 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
- if ( cpu_has_fb_clear )
- __set_bit(X86_FEATURE_FB_CLEAR, fs);
-
-+ __clear_bit(X86_FEATURE_RFDS_CLEAR, fs);
-+ if ( cpu_has_rfds_clear )
-+ __set_bit(X86_FEATURE_RFDS_CLEAR, fs);
-+
- /*
- * The Gather Data Sampling microcode mitigation (August 2023) has an
- * adverse performance impact on the CLWB instruction on SKX/CLX/CPX.
-diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
-index ec824e8954..a6b8af1296 100644
---- a/xen/arch/x86/include/asm/cpufeature.h
-+++ b/xen/arch/x86/include/asm/cpufeature.h
-@@ -140,6 +140,7 @@
- #define cpu_has_rtm_always_abort boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)
- #define cpu_has_tsx_force_abort boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)
- #define cpu_has_serialize boot_cpu_has(X86_FEATURE_SERIALIZE)
-+#define cpu_has_hybrid boot_cpu_has(X86_FEATURE_HYBRID)
- #define cpu_has_avx512_fp16 boot_cpu_has(X86_FEATURE_AVX512_FP16)
- #define cpu_has_arch_caps boot_cpu_has(X86_FEATURE_ARCH_CAPS)
-
-@@ -161,6 +162,8 @@
- #define cpu_has_rrsba boot_cpu_has(X86_FEATURE_RRSBA)
- #define cpu_has_gds_ctrl boot_cpu_has(X86_FEATURE_GDS_CTRL)
- #define cpu_has_gds_no boot_cpu_has(X86_FEATURE_GDS_NO)
-+#define cpu_has_rfds_no boot_cpu_has(X86_FEATURE_RFDS_NO)
-+#define cpu_has_rfds_clear boot_cpu_has(X86_FEATURE_RFDS_CLEAR)
-
- /* Synthesized. */
- #define cpu_has_arch_perfmon boot_cpu_has(X86_FEATURE_ARCH_PERFMON)
-diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
-index 6abf7bc34a..9b5f67711f 100644
---- a/xen/arch/x86/include/asm/msr-index.h
-+++ b/xen/arch/x86/include/asm/msr-index.h
-@@ -88,6 +88,8 @@
- #define ARCH_CAPS_PBRSB_NO (_AC(1, ULL) << 24)
- #define ARCH_CAPS_GDS_CTRL (_AC(1, ULL) << 25)
- #define ARCH_CAPS_GDS_NO (_AC(1, ULL) << 26)
-+#define ARCH_CAPS_RFDS_NO (_AC(1, ULL) << 27)
-+#define ARCH_CAPS_RFDS_CLEAR (_AC(1, ULL) << 28)
-
- #define MSR_FLUSH_CMD 0x0000010b
- #define FLUSH_CMD_L1D (_AC(1, ULL) << 0)
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index adb6bc74e8..1ee81e2dfe 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -24,6 +24,7 @@
-
- #include <asm/amd.h>
- #include <asm/hvm/svm/svm.h>
-+#include <asm/intel-family.h>
- #include <asm/microcode.h>
- #include <asm/msr.h>
- #include <asm/pv/domain.h>
-@@ -447,7 +448,7 @@ static void __init print_details(enum ind_thunk thunk)
- * Hardware read-only information, stating immunity to certain issues, or
- * suggestions of which mitigation to use.
- */
-- printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
-+ printk(" Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
- (caps & ARCH_CAPS_RDCL_NO) ? " RDCL_NO" : "",
- (caps & ARCH_CAPS_EIBRS) ? " EIBRS" : "",
- (caps & ARCH_CAPS_RSBA) ? " RSBA" : "",
-@@ -463,6 +464,7 @@ static void __init print_details(enum ind_thunk thunk)
- (caps & ARCH_CAPS_FB_CLEAR) ? " FB_CLEAR" : "",
- (caps & ARCH_CAPS_PBRSB_NO) ? " PBRSB_NO" : "",
- (caps & ARCH_CAPS_GDS_NO) ? " GDS_NO" : "",
-+ (caps & ARCH_CAPS_RFDS_NO) ? " RFDS_NO" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS)) ? " IBRS_ALWAYS" : "",
- (e8b & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS)) ? " STIBP_ALWAYS" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS_FAST)) ? " IBRS_FAST" : "",
-@@ -473,7 +475,7 @@ static void __init print_details(enum ind_thunk thunk)
- (e21a & cpufeat_mask(X86_FEATURE_SRSO_NO)) ? " SRSO_NO" : "");
-
- /* Hardware features which need driving to mitigate issues. */
-- printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
-+ printk(" Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
- (e8b & cpufeat_mask(X86_FEATURE_IBPB)) ||
- (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBPB" : "",
- (e8b & cpufeat_mask(X86_FEATURE_IBRS)) ||
-@@ -491,6 +493,7 @@ static void __init print_details(enum ind_thunk thunk)
- (caps & ARCH_CAPS_TSX_CTRL) ? " TSX_CTRL" : "",
- (caps & ARCH_CAPS_FB_CLEAR_CTRL) ? " FB_CLEAR_CTRL" : "",
- (caps & ARCH_CAPS_GDS_CTRL) ? " GDS_CTRL" : "",
-+ (caps & ARCH_CAPS_RFDS_CLEAR) ? " RFDS_CLEAR" : "",
- (e21a & cpufeat_mask(X86_FEATURE_SBPB)) ? " SBPB" : "");
-
- /* Compiled-in support which pertains to mitigations. */
-@@ -1359,6 +1362,83 @@ static __init void mds_calculations(void)
- }
- }
-
-+/*
-+ * Register File Data Sampling affects Atom cores from the Goldmont to
-+ * Gracemont microarchitectures. The March 2024 microcode adds RFDS_NO to
-+ * some but not all unaffected parts, and RFDS_CLEAR to affected parts still
-+ * in support.
-+ *
-+ * Alder Lake and Raptor Lake client CPUs have a mix of P cores
-+ * (Golden/Raptor Cove, not vulnerable) and E cores (Gracemont,
-+ * vulnerable), and both enumerate RFDS_CLEAR.
-+ *
-+ * Both exist in a Xeon SKU, which has the E cores (Gracemont) disabled by
-+ * platform configuration, and enumerate RFDS_NO.
-+ *
-+ * With older parts, or with out-of-date microcode, synthesise RFDS_NO when
-+ * safe to do so.
-+ *
-+ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
-+ */
-+static void __init rfds_calculations(void)
-+{
-+ /* RFDS is only known to affect Intel Family 6 processors at this time. */
-+ if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ||
-+ boot_cpu_data.x86 != 6 )
-+ return;
-+
-+ /*
-+ * If RFDS_NO or RFDS_CLEAR are visible, we've either got suitable
-+ * microcode, or an RFDS-aware hypervisor is levelling us in a pool.
-+ */
-+ if ( cpu_has_rfds_no || cpu_has_rfds_clear )
-+ return;
-+
-+ /* If we're virtualised, don't attempt to synthesise RFDS_NO. */
-+ if ( cpu_has_hypervisor )
-+ return;
-+
-+ /*
-+ * Not all CPUs are expected to get a microcode update enumerating one of
-+ * RFDS_{NO,CLEAR}, or we might have out-of-date microcode.
-+ */
-+ switch ( boot_cpu_data.x86_model )
-+ {
-+ case INTEL_FAM6_ALDERLAKE:
-+ case INTEL_FAM6_RAPTORLAKE:
-+ /*
-+ * Alder Lake and Raptor Lake might be a client SKU (with the
-+ * Gracemont cores active, and therefore vulnerable) or might be a
-+ * server SKU (with the Gracemont cores disabled, and therefore not
-+ * vulnerable).
-+ *
-+ * See if the CPU identifies as hybrid to distinguish the two cases.
-+ */
-+ if ( !cpu_has_hybrid )
-+ break;
-+ fallthrough;
-+ case INTEL_FAM6_ALDERLAKE_L:
-+ case INTEL_FAM6_RAPTORLAKE_P:
-+ case INTEL_FAM6_RAPTORLAKE_S:
-+
-+ case INTEL_FAM6_ATOM_GOLDMONT: /* Apollo Lake */
-+ case INTEL_FAM6_ATOM_GOLDMONT_D: /* Denverton */
-+ case INTEL_FAM6_ATOM_GOLDMONT_PLUS: /* Gemini Lake */
-+ case INTEL_FAM6_ATOM_TREMONT_D: /* Snow Ridge / Parker Ridge */
-+ case INTEL_FAM6_ATOM_TREMONT: /* Elkhart Lake */
-+ case INTEL_FAM6_ATOM_TREMONT_L: /* Jasper Lake */
-+ case INTEL_FAM6_ATOM_GRACEMONT: /* Alder Lake N */
-+ return;
-+ }
-+
-+ /*
-+ * We appear to be on an unaffected CPU which didn't enumerate RFDS_NO,
-+ * perhaps because of its age or because of out-of-date microcode.
-+ * Synthesise it.
-+ */
-+ setup_force_cpu_cap(X86_FEATURE_RFDS_NO);
-+}
-+
- static bool __init cpu_has_gds(void)
- {
- /*
-@@ -1872,6 +1952,7 @@ void __init init_speculation_mitigations(void)
- *
- * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-analysis-microarchitectural-data-sampling.html
- * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html
-+ * https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/register-file-data-sampling.html
- *
- * Relevant ucodes:
- *
-@@ -1901,8 +1982,12 @@ void __init init_speculation_mitigations(void)
- *
- * If FB_CLEAR is enumerated, L1D_FLUSH does not have the same scrubbing
- * side effects as VERW and cannot be used in its place.
-+ *
-+ * - March 2024, for RFDS. Enumerate RFDS_CLEAR to mean that VERW now
-+ * scrubs non-architectural entries from certain register files.
- */
- mds_calculations();
-+ rfds_calculations();
-
- /*
- * Parts which enumerate FB_CLEAR are those with now-updated microcode
-@@ -1934,15 +2019,19 @@ void __init init_speculation_mitigations(void)
- * MLPDS/MFBDS when SMT is enabled.
- */
- if ( opt_verw_pv == -1 )
-- opt_verw_pv = cpu_has_useful_md_clear;
-+ opt_verw_pv = cpu_has_useful_md_clear || cpu_has_rfds_clear;
-
- if ( opt_verw_hvm == -1 )
-- opt_verw_hvm = cpu_has_useful_md_clear;
-+ opt_verw_hvm = cpu_has_useful_md_clear || cpu_has_rfds_clear;
-
- /*
- * If SMT is active, and we're protecting against MDS or MMIO stale data,
- * we need to scrub before going idle as well as on return to guest.
- * Various pipeline resources are repartitioned amongst non-idle threads.
-+ *
-+ * We don't need to scrub on idle for RFDS. There are no affected cores
-+ * which support SMT, despite there being affected cores in hybrid systems
-+ * which have SMT elsewhere in the platform.
- */
- if ( ((cpu_has_useful_md_clear && (opt_verw_pv || opt_verw_hvm)) ||
- opt_verw_mmio) && hw_smt_enabled )
-@@ -1956,7 +2045,8 @@ void __init init_speculation_mitigations(void)
- * It is only safe to use L1D_FLUSH in place of VERW when MD_CLEAR is the
- * only *_CLEAR we can see.
- */
-- if ( opt_l1d_flush && cpu_has_md_clear && !cpu_has_fb_clear )
-+ if ( opt_l1d_flush && cpu_has_md_clear && !cpu_has_fb_clear &&
-+ !cpu_has_rfds_clear )
- opt_verw_hvm = false;
-
- /*
-diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index aec1407613..113e6cadc1 100644
---- a/xen/include/public/arch-x86/cpufeatureset.h
-+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -264,6 +264,7 @@ XEN_CPUFEATURE(MD_CLEAR, 9*32+10) /*!A VERW clears microarchitectural buffe
- XEN_CPUFEATURE(RTM_ALWAYS_ABORT, 9*32+11) /*! June 2021 TSX defeaturing in microcode. */
- XEN_CPUFEATURE(TSX_FORCE_ABORT, 9*32+13) /* MSR_TSX_FORCE_ABORT.RTM_ABORT */
- XEN_CPUFEATURE(SERIALIZE, 9*32+14) /*A SERIALIZE insn */
-+XEN_CPUFEATURE(HYBRID, 9*32+15) /* Heterogeneous platform */
- XEN_CPUFEATURE(TSXLDTRK, 9*32+16) /*a TSX load tracking suspend/resume insns */
- XEN_CPUFEATURE(CET_IBT, 9*32+20) /* CET - Indirect Branch Tracking */
- XEN_CPUFEATURE(AVX512_FP16, 9*32+23) /* AVX512 FP16 instructions */
-@@ -330,6 +331,8 @@ XEN_CPUFEATURE(OVRCLK_STATUS, 16*32+23) /* MSR_OVERCLOCKING_STATUS */
- XEN_CPUFEATURE(PBRSB_NO, 16*32+24) /*A No Post-Barrier RSB predictions */
- XEN_CPUFEATURE(GDS_CTRL, 16*32+25) /* MCU_OPT_CTRL.GDS_MIT_{DIS,LOCK} */
- XEN_CPUFEATURE(GDS_NO, 16*32+26) /*A No Gather Data Sampling */
-+XEN_CPUFEATURE(RFDS_NO, 16*32+27) /*A No Register File Data Sampling */
-+XEN_CPUFEATURE(RFDS_CLEAR, 16*32+28) /*!A Register File(s) cleared by VERW */
-
- /* Intel-defined CPU features, MSR_ARCH_CAPS 0x10a.edx, word 17 */
-
---
-2.44.0
-
diff --git a/0046-tools-dombuilder-Correct-the-length-calculation-in-x.patch b/0046-tools-dombuilder-Correct-the-length-calculation-in-x.patch
new file mode 100644
index 0000000..b25f15d
--- /dev/null
+++ b/0046-tools-dombuilder-Correct-the-length-calculation-in-x.patch
@@ -0,0 +1,44 @@
+From 8e51c8f1d45fad242a315fa17ba3582c02e66840 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:12:31 +0200
+Subject: [PATCH 46/56] tools/dombuilder: Correct the length calculation in
+ xc_dom_alloc_segment()
+
+xc_dom_alloc_segment() is passed a size in bytes, calculates a size in pages
+from it, then fills in the new segment information with a bytes value
+re-calculated from the number of pages.
+
+This causes the module information given to the guest (MB, or PVH) to have
+incorrect sizes; specifically, sizes rounded up to the next page.
+
+This in turn is problematic for Xen. When Xen finds a gzipped module, it
+peeks at the end metadata to judge the decompressed size, which is a -4
+backreference from the reported end of the module.
+
+Fill in seg->vend using the correct number of bytes.
+
+Fixes: ea7c8a3d0e82 ("libxc: reorganize domain builder guest memory allocator")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Acked-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: 4c3a618b0adaa0cd59e0fa0898bb60978b8b3a5f
+master date: 2024-07-02 10:50:18 +0100
+---
+ tools/libs/guest/xg_dom_core.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/libs/guest/xg_dom_core.c b/tools/libs/guest/xg_dom_core.c
+index c4f4e7f3e2..f5521d528b 100644
+--- a/tools/libs/guest/xg_dom_core.c
++++ b/tools/libs/guest/xg_dom_core.c
+@@ -601,7 +601,7 @@ int xc_dom_alloc_segment(struct xc_dom_image *dom,
+ memset(ptr, 0, pages * page_size);
+
+ seg->vstart = start;
+- seg->vend = dom->virt_alloc_end;
++ seg->vend = start + size;
+
+ DOMPRINTF("%-20s: %-12s : 0x%" PRIx64 " -> 0x%" PRIx64
+ " (pfn 0x%" PRIpfn " + 0x%" PRIpfn " pages)",
+--
+2.45.2
+
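To make the failure mode above concrete: gzip stores the decompressed length
(ISIZE, modulo 2^32) little-endian in the final four bytes of a stream, so the
"-4 backreference" only lands on it if the reported module end is the true byte
length rather than a page-rounded one. A stand-alone sketch of both halves
(illustrative values, independent of the dombuilder code):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SIZE 4096u

    /* Decode gzip's trailing ISIZE field from the last 4 bytes of a stream. */
    static uint32_t gzip_isize(const uint8_t *mod, size_t len)
    {
        const uint8_t *p = mod + len - 4;

        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }

    int main(void)
    {
        size_t size = 10000;                                /* true module size */
        size_t pages = (size + PAGE_SIZE - 1) / PAGE_SIZE;  /* 3 pages */
        size_t rounded = pages * PAGE_SIZE;                 /* 12288 bytes */

        /* Fake trailer whose last 4 bytes encode ISIZE = 10000 (0x2710). */
        uint8_t tail[8] = { 0, 0, 0, 0, 0x10, 0x27, 0x00, 0x00 };

        printf("ISIZE at end-4: %u\n", gzip_isize(tail, sizeof(tail)));
        printf("reporting %zu instead of %zu moves the end by %zu bytes\n",
               rounded, size, rounded - size);
        return 0;
    }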
diff --git a/0046-x86-paging-Delete-update_cr3-s-do_locking-parameter.patch b/0046-x86-paging-Delete-update_cr3-s-do_locking-parameter.patch
deleted file mode 100644
index ce397a1..0000000
--- a/0046-x86-paging-Delete-update_cr3-s-do_locking-parameter.patch
+++ /dev/null
@@ -1,161 +0,0 @@
-From bf70ce8b3449c49eb828d5b1f4934a49b00fef35 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 20 Sep 2023 20:06:53 +0100
-Subject: [PATCH 46/67] x86/paging: Delete update_cr3()'s do_locking parameter
-
-Nicola reports that the XSA-438 fix introduced new MISRA violations because of
-some incidental tidying it tried to do. The parameter is useless, so resolve
-the MISRA regression by removing it.
-
-hap_update_cr3() discards the parameter entirely, while sh_update_cr3() uses
-it to distinguish internal and external callers and therefore whether the
-paging lock should be taken.
-
-However, we have paging_lock_recursive() for this purpose, which also avoids
-the ability for the shadow internal callers to accidentally not hold the lock.
-
-Fixes: fb0ff49fe9f7 ("x86/shadow: defer releasing of PV's top-level shadow reference")
-Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Release-acked-by: Henry Wang <Henry.Wang@arm.com>
-(cherry picked from commit e71157d1ac2a7fbf413130663cf0a93ff9fbcf7e)
----
- xen/arch/x86/include/asm/paging.h | 5 ++---
- xen/arch/x86/mm/hap/hap.c | 5 ++---
- xen/arch/x86/mm/shadow/common.c | 2 +-
- xen/arch/x86/mm/shadow/multi.c | 17 ++++++++---------
- xen/arch/x86/mm/shadow/none.c | 3 +--
- 5 files changed, 14 insertions(+), 18 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
-index 94c590f31a..809ff35d9a 100644
---- a/xen/arch/x86/include/asm/paging.h
-+++ b/xen/arch/x86/include/asm/paging.h
-@@ -138,8 +138,7 @@ struct paging_mode {
- paddr_t ga, uint32_t *pfec,
- unsigned int *page_order);
- #endif
-- pagetable_t (*update_cr3 )(struct vcpu *v, bool do_locking,
-- bool noflush);
-+ pagetable_t (*update_cr3 )(struct vcpu *v, bool noflush);
- void (*update_paging_modes )(struct vcpu *v);
- bool (*flush_tlb )(const unsigned long *vcpu_bitmap);
-
-@@ -312,7 +311,7 @@ static inline unsigned long paging_ga_to_gfn_cr3(struct vcpu *v,
- * as the value to load into the host CR3 to schedule this vcpu */
- static inline pagetable_t paging_update_cr3(struct vcpu *v, bool noflush)
- {
-- return paging_get_hostmode(v)->update_cr3(v, 1, noflush);
-+ return paging_get_hostmode(v)->update_cr3(v, noflush);
- }
-
- /* Update all the things that are derived from the guest's CR0/CR3/CR4.
-diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
-index 57a19c3d59..3ad39a7dd7 100644
---- a/xen/arch/x86/mm/hap/hap.c
-+++ b/xen/arch/x86/mm/hap/hap.c
-@@ -739,8 +739,7 @@ static bool cf_check hap_invlpg(struct vcpu *v, unsigned long linear)
- return 1;
- }
-
--static pagetable_t cf_check hap_update_cr3(
-- struct vcpu *v, bool do_locking, bool noflush)
-+static pagetable_t cf_check hap_update_cr3(struct vcpu *v, bool noflush)
- {
- v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3];
- hvm_update_guest_cr3(v, noflush);
-@@ -826,7 +825,7 @@ static void cf_check hap_update_paging_modes(struct vcpu *v)
- }
-
- /* CR3 is effectively updated by a mode change. Flush ASIDs, etc. */
-- hap_update_cr3(v, 0, false);
-+ hap_update_cr3(v, false);
-
- unlock:
- paging_unlock(d);
-diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
-index c0940f939e..18714dbd02 100644
---- a/xen/arch/x86/mm/shadow/common.c
-+++ b/xen/arch/x86/mm/shadow/common.c
-@@ -2579,7 +2579,7 @@ static void sh_update_paging_modes(struct vcpu *v)
- }
- #endif /* OOS */
-
-- v->arch.paging.mode->update_cr3(v, 0, false);
-+ v->arch.paging.mode->update_cr3(v, false);
- }
-
- void cf_check shadow_update_paging_modes(struct vcpu *v)
-diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
-index c92b354a78..e54a507b54 100644
---- a/xen/arch/x86/mm/shadow/multi.c
-+++ b/xen/arch/x86/mm/shadow/multi.c
-@@ -2506,7 +2506,7 @@ static int cf_check sh_page_fault(
- * In any case, in the PAE case, the ASSERT is not true; it can
- * happen because of actions the guest is taking. */
- #if GUEST_PAGING_LEVELS == 3
-- v->arch.paging.mode->update_cr3(v, 0, false);
-+ v->arch.paging.mode->update_cr3(v, false);
- #else
- ASSERT(d->is_shutting_down);
- #endif
-@@ -3224,17 +3224,13 @@ static void cf_check sh_detach_old_tables(struct vcpu *v)
- }
- }
-
--static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
-- bool noflush)
-+static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool noflush)
- /* Updates vcpu->arch.cr3 after the guest has changed CR3.
- * Paravirtual guests should set v->arch.guest_table (and guest_table_user,
- * if appropriate).
- * HVM guests should also make sure hvm_get_guest_cntl_reg(v, 3) works;
- * this function will call hvm_update_guest_cr(v, 3) to tell them where the
- * shadow tables are.
-- * If do_locking != 0, assume we are being called from outside the
-- * shadow code, and must take and release the paging lock; otherwise
-- * that is the caller's responsibility.
- */
- {
- struct domain *d = v->domain;
-@@ -3252,7 +3248,11 @@ static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
- return old_entry;
- }
-
-- if ( do_locking ) paging_lock(v->domain);
-+ /*
-+ * This is used externally (with the paging lock not taken) and internally
-+ * by the shadow code (with the lock already taken).
-+ */
-+ paging_lock_recursive(v->domain);
-
- #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
- /* Need to resync all the shadow entries on a TLB flush. Resync
-@@ -3480,8 +3480,7 @@ static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
- shadow_sync_other_vcpus(v);
- #endif
-
-- /* Release the lock, if we took it (otherwise it's the caller's problem) */
-- if ( do_locking ) paging_unlock(v->domain);
-+ paging_unlock(v->domain);
-
- return old_entry;
- }
-diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
-index 743c0ffb85..7e4e386cd0 100644
---- a/xen/arch/x86/mm/shadow/none.c
-+++ b/xen/arch/x86/mm/shadow/none.c
-@@ -52,8 +52,7 @@ static unsigned long cf_check _gva_to_gfn(
- }
- #endif
-
--static pagetable_t cf_check _update_cr3(struct vcpu *v, bool do_locking,
-- bool noflush)
-+static pagetable_t cf_check _update_cr3(struct vcpu *v, bool noflush)
- {
- ASSERT_UNREACHABLE();
- return pagetable_null();
---
-2.44.0
-
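The parameter removed above encodes "does the caller already hold the lock?",
which paging_lock_recursive() makes unnecessary: internal callers that hold the
lock and external callers that don't can share one entry point. A rough
user-space analogy using a recursive pthread mutex (build with -pthread; Xen's
paging lock is of course its own primitive):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lock;

    /* One entry point for everyone; no "bool do_locking" parameter needed. */
    static void update(void)
    {
        pthread_mutex_lock(&lock);      /* recursive: safe if already held */
        puts("updating under the lock");
        pthread_mutex_unlock(&lock);
    }

    static void internal_caller(void)
    {
        pthread_mutex_lock(&lock);
        update();                       /* already holds the lock */
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_mutexattr_t attr;

        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
        pthread_mutex_init(&lock, &attr);

        update();                       /* external caller, lock not held */
        internal_caller();
        return 0;
    }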
diff --git a/0047-tools-libxs-Fix-CLOEXEC-handling-in-get_dev.patch b/0047-tools-libxs-Fix-CLOEXEC-handling-in-get_dev.patch
new file mode 100644
index 0000000..aabae58
--- /dev/null
+++ b/0047-tools-libxs-Fix-CLOEXEC-handling-in-get_dev.patch
@@ -0,0 +1,95 @@
+From d1b3bbb46402af77089906a97c413c14ed1740d2 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:13:10 +0200
+Subject: [PATCH 47/56] tools/libxs: Fix CLOEXEC handling in get_dev()
+
+Move the O_CLOEXEC compatibility outside of an #ifdef USE_PTHREAD block.
+
+Introduce set_cloexec() to wrap fcntl() setting FD_CLOEXEC. It will be reused
+for other CLOEXEC fixes too.
+
+Use set_cloexec() when O_CLOEXEC isn't available as a best-effort fallback.
+
+Fixes: f4f2f3402b2f ("tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC")
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Acked-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: bf7c1464706adfa903f1e7d59383d042c3a88e39
+master date: 2024-07-02 10:51:06 +0100
+---
+ tools/libs/store/xs.c | 38 ++++++++++++++++++++++++++++++++------
+ 1 file changed, 32 insertions(+), 6 deletions(-)
+
+diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
+index 1498515073..037e79d98b 100644
+--- a/tools/libs/store/xs.c
++++ b/tools/libs/store/xs.c
+@@ -40,6 +40,10 @@
+ #include <xentoolcore_internal.h>
+ #include <xen_list.h>
+
++#ifndef O_CLOEXEC
++#define O_CLOEXEC 0
++#endif
++
+ struct xs_stored_msg {
+ XEN_TAILQ_ENTRY(struct xs_stored_msg) list;
+ struct xsd_sockmsg hdr;
+@@ -54,10 +58,6 @@ struct xs_stored_msg {
+ #include <dlfcn.h>
+ #endif
+
+-#ifndef O_CLOEXEC
+-#define O_CLOEXEC 0
+-#endif
+-
+ struct xs_handle {
+ /* Communications channel to xenstore daemon. */
+ int fd;
+@@ -176,6 +176,16 @@ static bool setnonblock(int fd, int nonblock) {
+ return true;
+ }
+
++static bool set_cloexec(int fd)
++{
++ int flags = fcntl(fd, F_GETFD);
++
++ if (flags < 0)
++ return false;
++
++ return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) >= 0;
++}
++
+ int xs_fileno(struct xs_handle *h)
+ {
+ char c = 0;
+@@ -230,8 +240,24 @@ error:
+
+ static int get_dev(const char *connect_to)
+ {
+- /* We cannot open read-only because requests are writes */
+- return open(connect_to, O_RDWR | O_CLOEXEC);
++ int fd, saved_errno;
++
++ fd = open(connect_to, O_RDWR | O_CLOEXEC);
++ if (fd < 0)
++ return -1;
++
++ /* Compat for non-O_CLOEXEC environments. Racy. */
++ if (!O_CLOEXEC && !set_cloexec(fd))
++ goto error;
++
++ return fd;
++
++error:
++ saved_errno = errno;
++ close(fd);
++ errno = saved_errno;
++
++ return -1;
+ }
+
+ static int all_restrict_cb(Xentoolcore__Active_Handle *ah, domid_t domid) {
+--
+2.45.2
+
diff --git a/0047-xen-Swap-order-of-actions-in-the-FREE-macros.patch b/0047-xen-Swap-order-of-actions-in-the-FREE-macros.patch
deleted file mode 100644
index 3e58906..0000000
--- a/0047-xen-Swap-order-of-actions-in-the-FREE-macros.patch
+++ /dev/null
@@ -1,58 +0,0 @@
-From 0a53565f1886201cc8a8afe9b2619ee297c20955 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Fri, 2 Feb 2024 00:39:42 +0000
-Subject: [PATCH 47/67] xen: Swap order of actions in the FREE*() macros
-
-Wherever possible, it is a good idea to NULL out the visible reference to an
-object prior to freeing it. The FREE*() macros already collect together both
-parts, making it easy to adjust.
-
-This has a marginal code generation improvement, as some of the calls to the
-free() function can be tailcall optimised.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit c4f427ec879e7c0df6d44d02561e8bee838a293e)
----
- xen/include/xen/mm.h | 3 ++-
- xen/include/xen/xmalloc.h | 7 ++++---
- 2 files changed, 6 insertions(+), 4 deletions(-)
-
-diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
-index 3dc61bcc3c..211685a5d2 100644
---- a/xen/include/xen/mm.h
-+++ b/xen/include/xen/mm.h
-@@ -80,8 +80,9 @@ bool scrub_free_pages(void);
-
- /* Free an allocation, and zero the pointer to it. */
- #define FREE_XENHEAP_PAGES(p, o) do { \
-- free_xenheap_pages(p, o); \
-+ void *_ptr_ = (p); \
- (p) = NULL; \
-+ free_xenheap_pages(_ptr_, o); \
- } while ( false )
- #define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
-
-diff --git a/xen/include/xen/xmalloc.h b/xen/include/xen/xmalloc.h
-index 16979a117c..d857298011 100644
---- a/xen/include/xen/xmalloc.h
-+++ b/xen/include/xen/xmalloc.h
-@@ -66,9 +66,10 @@
- extern void xfree(void *);
-
- /* Free an allocation, and zero the pointer to it. */
--#define XFREE(p) do { \
-- xfree(p); \
-- (p) = NULL; \
-+#define XFREE(p) do { \
-+ void *_ptr_ = (p); \
-+ (p) = NULL; \
-+ xfree(_ptr_); \
- } while ( false )
-
- /* Underlying functions */
---
-2.44.0
-
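The reordering above is a small general-purpose pattern: clear the visible
pointer before handing the old value to the allocator, so there is no window in
which it dangles, and the free call ends up in tail position. A minimal
user-space rendition using the standard free() in place of Xen's xfree():

    #include <stdlib.h>

    /* Zero the visible pointer first, then free the old value. */
    #define FREE(p) do {            \
        void *_ptr_ = (p);          \
        (p) = NULL;                 \
        free(_ptr_);                \
    } while ( 0 )

    int main(void)
    {
        int *data = malloc(sizeof(*data));

        FREE(data);                 /* data is NULL afterwards, never dangling */
        return data == NULL ? 0 : 1;
    }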
diff --git a/0048-tools-libxs-Fix-CLOEXEC-handling-in-get_socket.patch b/0048-tools-libxs-Fix-CLOEXEC-handling-in-get_socket.patch
new file mode 100644
index 0000000..e01a6b4
--- /dev/null
+++ b/0048-tools-libxs-Fix-CLOEXEC-handling-in-get_socket.patch
@@ -0,0 +1,60 @@
+From d689bb4d2cd3ccdb0067b0ca953cccbc5ab375ae Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:13:18 +0200
+Subject: [PATCH 48/56] tools/libxs: Fix CLOEXEC handling in get_socket()
+
+get_socket() opens a socket, then uses fcntl() to set CLOEXEC. This is racy
+with exec().
+
+Open the socket with SOCK_CLOEXEC. Use the same compatibility strategy as
+O_CLOEXEC on ancient versions of Linux.
+
+Reported-by: Frediano Ziglio <frediano.ziglio@cloud.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Acked-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: 1957dd6aff931877fc22699d8f2d4be8728014ba
+master date: 2024-07-02 10:51:11 +0100
+---
+ tools/libs/store/xs.c | 14 ++++++++------
+ 1 file changed, 8 insertions(+), 6 deletions(-)
+
+diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
+index 037e79d98b..11a766c508 100644
+--- a/tools/libs/store/xs.c
++++ b/tools/libs/store/xs.c
+@@ -44,6 +44,10 @@
+ #define O_CLOEXEC 0
+ #endif
+
++#ifndef SOCK_CLOEXEC
++#define SOCK_CLOEXEC 0
++#endif
++
+ struct xs_stored_msg {
+ XEN_TAILQ_ENTRY(struct xs_stored_msg) list;
+ struct xsd_sockmsg hdr;
+@@ -207,16 +211,14 @@ int xs_fileno(struct xs_handle *h)
+ static int get_socket(const char *connect_to)
+ {
+ struct sockaddr_un addr;
+- int sock, saved_errno, flags;
++ int sock, saved_errno;
+
+- sock = socket(PF_UNIX, SOCK_STREAM, 0);
++ sock = socket(PF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
+ if (sock < 0)
+ return -1;
+
+- if ((flags = fcntl(sock, F_GETFD)) < 0)
+- goto error;
+- flags |= FD_CLOEXEC;
+- if (fcntl(sock, F_SETFD, flags) < 0)
++ /* Compat for non-SOCK_CLOEXEC environments. Racy. */
++ if (!SOCK_CLOEXEC && !set_cloexec(sock))
+ goto error;
+
+ addr.sun_family = AF_UNIX;
+--
+2.45.2
+
diff --git a/0048-x86-spinlock-introduce-support-for-blocking-speculat.patch b/0048-x86-spinlock-introduce-support-for-blocking-speculat.patch
deleted file mode 100644
index ecf0830..0000000
--- a/0048-x86-spinlock-introduce-support-for-blocking-speculat.patch
+++ /dev/null
@@ -1,331 +0,0 @@
-From 9d2f136328aab5537b7180a1b23e171893ebe455 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 13 Feb 2024 13:08:05 +0100
-Subject: [PATCH 48/67] x86/spinlock: introduce support for blocking
- speculation into critical regions
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Introduce a new Kconfig option to block speculation into lock protected
-critical regions. The Kconfig option is enabled by default, but the mitigation
-won't be engaged unless it's explicitly enabled in the command line using
-`spec-ctrl=lock-harden`.
-
-Convert the spinlock acquire macros into always-inline functions, and introduce
-a speculation barrier after the lock has been taken. Note the speculation
-barrier is not placed inside the implementation of the spin lock functions, so
-as to prevent speculation from falling through the call to the lock functions,
-which would result in the barrier also being skipped.
-
-trylock variants are protected using a construct akin to the existing
-evaluate_nospec().
-
-This patch only implements the speculation barrier for x86.
-
-Note spin locks are the only locking primitive taken care of in this change;
-further locking primitives will be adjusted by separate changes.
-
-This is part of XSA-453 / CVE-2024-2193
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 7ef0084418e188d05f338c3e028fbbe8b6924afa)
----
- docs/misc/xen-command-line.pandoc | 7 ++++-
- xen/arch/x86/include/asm/cpufeatures.h | 2 +-
- xen/arch/x86/include/asm/nospec.h | 26 ++++++++++++++++++
- xen/arch/x86/spec_ctrl.c | 26 +++++++++++++++---
- xen/common/Kconfig | 17 ++++++++++++
- xen/include/xen/nospec.h | 15 +++++++++++
- xen/include/xen/spinlock.h | 37 +++++++++++++++++++++-----
- 7 files changed, 119 insertions(+), 11 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index d909ec94fe..e1d56407dd 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -2327,7 +2327,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
- > {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
- > bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
- > eager-fpu,l1d-flush,branch-harden,srb-lock,
--> unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
-+> unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]`
-
- Controls for speculative execution sidechannel mitigations. By default, Xen
- will pick the most appropriate mitigations based on compiled in support,
-@@ -2454,6 +2454,11 @@ On all hardware, the `div-scrub=` option can be used to force or prevent Xen
- from mitigating the DIV-leakage vulnerability. By default, Xen will mitigate
- DIV-leakage on hardware believed to be vulnerable.
-
-+If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
-+boolean can be used to force or prevent Xen from using speculation barriers to
-+protect lock critical regions. This mitigation won't be engaged by default,
-+and needs to be explicitly enabled on the command line.
-+
- ### sync_console
- > `= <boolean>`
-
-diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
-index c3aad21c3b..7e8221fd85 100644
---- a/xen/arch/x86/include/asm/cpufeatures.h
-+++ b/xen/arch/x86/include/asm/cpufeatures.h
-@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF, X86_SYNTH( 8)) /* APERFMPERF */
- XEN_CPUFEATURE(MFENCE_RDTSC, X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
- XEN_CPUFEATURE(XEN_SMEP, X86_SYNTH(10)) /* SMEP gets used by Xen itself */
- XEN_CPUFEATURE(XEN_SMAP, X86_SYNTH(11)) /* SMAP gets used by Xen itself */
--/* Bit 12 unused. */
-+XEN_CPUFEATURE(SC_NO_LOCK_HARDEN, X86_SYNTH(12)) /* (Disable) Lock critical region hardening */
- XEN_CPUFEATURE(IND_THUNK_LFENCE, X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
- XEN_CPUFEATURE(IND_THUNK_JMP, X86_SYNTH(14)) /* Use IND_THUNK_JMP */
- XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
-diff --git a/xen/arch/x86/include/asm/nospec.h b/xen/arch/x86/include/asm/nospec.h
-index 7150e76b87..0725839e19 100644
---- a/xen/arch/x86/include/asm/nospec.h
-+++ b/xen/arch/x86/include/asm/nospec.h
-@@ -38,6 +38,32 @@ static always_inline void block_speculation(void)
- barrier_nospec_true();
- }
-
-+static always_inline void arch_block_lock_speculation(void)
-+{
-+ alternative("lfence", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
-+}
-+
-+/* Allow to insert a read memory barrier into conditionals */
-+static always_inline bool barrier_lock_true(void)
-+{
-+ alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
-+ return true;
-+}
-+
-+static always_inline bool barrier_lock_false(void)
-+{
-+ alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
-+ return false;
-+}
-+
-+static always_inline bool arch_lock_evaluate_nospec(bool condition)
-+{
-+ if ( condition )
-+ return barrier_lock_true();
-+ else
-+ return barrier_lock_false();
-+}
-+
- #endif /* _ASM_X86_NOSPEC_H */
-
- /*
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 1ee81e2dfe..ac21af2c5c 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -65,6 +65,7 @@ int8_t __read_mostly opt_eager_fpu = -1;
- int8_t __read_mostly opt_l1d_flush = -1;
- static bool __initdata opt_branch_harden =
- IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
-+static bool __initdata opt_lock_harden;
-
- bool __initdata bsp_delay_spec_ctrl;
- uint8_t __read_mostly default_xen_spec_ctrl;
-@@ -133,6 +134,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- opt_ssbd = false;
- opt_l1d_flush = 0;
- opt_branch_harden = false;
-+ opt_lock_harden = false;
- opt_srb_lock = 0;
- opt_unpriv_mmio = false;
- opt_gds_mit = 0;
-@@ -298,6 +300,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
- rc = -EINVAL;
- }
- }
-+ else if ( (val = parse_boolean("lock-harden", s, ss)) >= 0 )
-+ {
-+ if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
-+ opt_lock_harden = val;
-+ else
-+ {
-+ no_config_param("SPECULATIVE_HARDEN_LOCK", "spec-ctrl", s, ss);
-+ rc = -EINVAL;
-+ }
-+ }
- else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
- opt_srb_lock = val;
- else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
-@@ -500,7 +512,8 @@ static void __init print_details(enum ind_thunk thunk)
- if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
- IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
- IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
-- IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
-+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) ||
-+ IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
- printk(" Compiled-in support:"
- #ifdef CONFIG_INDIRECT_THUNK
- " INDIRECT_THUNK"
-@@ -516,11 +529,14 @@ static void __init print_details(enum ind_thunk thunk)
- #endif
- #ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
- " HARDEN_GUEST_ACCESS"
-+#endif
-+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
-+ " HARDEN_LOCK"
- #endif
- "\n");
-
- /* Settings for Xen's protection, irrespective of guests. */
-- printk(" Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
-+ printk(" Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s%s\n",
- thunk != THUNK_NONE ? "BTI-Thunk: " : "",
- thunk == THUNK_NONE ? "" :
- thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
-@@ -547,7 +563,8 @@ static void __init print_details(enum ind_thunk thunk)
- opt_verw_pv || opt_verw_hvm ||
- opt_verw_mmio ? " VERW" : "",
- opt_div_scrub ? " DIV" : "",
-- opt_branch_harden ? " BRANCH_HARDEN" : "");
-+ opt_branch_harden ? " BRANCH_HARDEN" : "",
-+ opt_lock_harden ? " LOCK_HARDEN" : "");
-
- /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
- if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
-@@ -1930,6 +1947,9 @@ void __init init_speculation_mitigations(void)
- if ( !opt_branch_harden )
- setup_force_cpu_cap(X86_FEATURE_SC_NO_BRANCH_HARDEN);
-
-+ if ( !opt_lock_harden )
-+ setup_force_cpu_cap(X86_FEATURE_SC_NO_LOCK_HARDEN);
-+
- /*
- * We do not disable HT by default on affected hardware.
- *
-diff --git a/xen/common/Kconfig b/xen/common/Kconfig
-index e7794cb7f6..cd73851538 100644
---- a/xen/common/Kconfig
-+++ b/xen/common/Kconfig
-@@ -173,6 +173,23 @@ config SPECULATIVE_HARDEN_GUEST_ACCESS
-
- If unsure, say Y.
-
-+config SPECULATIVE_HARDEN_LOCK
-+ bool "Speculative lock context hardening"
-+ default y
-+ depends on X86
-+ help
-+ Contemporary processors may use speculative execution as a
-+ performance optimisation, but this can potentially be abused by an
-+ attacker to leak data via speculative sidechannels.
-+
-+ One source of data leakage is via speculative accesses to lock
-+ critical regions.
-+
-+ This option is disabled by default at run time, and needs to be
-+ enabled on the command line.
-+
-+ If unsure, say Y.
-+
- endmenu
-
- config DIT_DEFAULT
-diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
-index 76255bc46e..4552846403 100644
---- a/xen/include/xen/nospec.h
-+++ b/xen/include/xen/nospec.h
-@@ -70,6 +70,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
- #define array_access_nospec(array, index) \
- (array)[array_index_nospec(index, ARRAY_SIZE(array))]
-
-+static always_inline void block_lock_speculation(void)
-+{
-+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
-+ arch_block_lock_speculation();
-+#endif
-+}
-+
-+static always_inline bool lock_evaluate_nospec(bool condition)
-+{
-+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
-+ return arch_lock_evaluate_nospec(condition);
-+#endif
-+ return condition;
-+}
-+
- #endif /* XEN_NOSPEC_H */
-
- /*
-diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
-index 961891bea4..daf48fdea7 100644
---- a/xen/include/xen/spinlock.h
-+++ b/xen/include/xen/spinlock.h
-@@ -1,6 +1,7 @@
- #ifndef __SPINLOCK_H__
- #define __SPINLOCK_H__
-
-+#include <xen/nospec.h>
- #include <xen/time.h>
- #include <asm/system.h>
- #include <asm/spinlock.h>
-@@ -189,13 +190,30 @@ int _spin_trylock_recursive(spinlock_t *lock);
- void _spin_lock_recursive(spinlock_t *lock);
- void _spin_unlock_recursive(spinlock_t *lock);
-
--#define spin_lock(l) _spin_lock(l)
--#define spin_lock_cb(l, c, d) _spin_lock_cb(l, c, d)
--#define spin_lock_irq(l) _spin_lock_irq(l)
-+static always_inline void spin_lock(spinlock_t *l)
-+{
-+ _spin_lock(l);
-+ block_lock_speculation();
-+}
-+
-+static always_inline void spin_lock_cb(spinlock_t *l, void (*c)(void *data),
-+ void *d)
-+{
-+ _spin_lock_cb(l, c, d);
-+ block_lock_speculation();
-+}
-+
-+static always_inline void spin_lock_irq(spinlock_t *l)
-+{
-+ _spin_lock_irq(l);
-+ block_lock_speculation();
-+}
-+
- #define spin_lock_irqsave(l, f) \
- ({ \
- BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long)); \
- ((f) = _spin_lock_irqsave(l)); \
-+ block_lock_speculation(); \
- })
-
- #define spin_unlock(l) _spin_unlock(l)
-@@ -203,7 +221,7 @@ void _spin_unlock_recursive(spinlock_t *lock);
- #define spin_unlock_irqrestore(l, f) _spin_unlock_irqrestore(l, f)
-
- #define spin_is_locked(l) _spin_is_locked(l)
--#define spin_trylock(l) _spin_trylock(l)
-+#define spin_trylock(l) lock_evaluate_nospec(_spin_trylock(l))
-
- #define spin_trylock_irqsave(lock, flags) \
- ({ \
-@@ -224,8 +242,15 @@ void _spin_unlock_recursive(spinlock_t *lock);
- * are any critical regions that cannot form part of such a set, they can use
- * standard spin_[un]lock().
- */
--#define spin_trylock_recursive(l) _spin_trylock_recursive(l)
--#define spin_lock_recursive(l) _spin_lock_recursive(l)
-+#define spin_trylock_recursive(l) \
-+ lock_evaluate_nospec(_spin_trylock_recursive(l))
-+
-+static always_inline void spin_lock_recursive(spinlock_t *l)
-+{
-+ _spin_lock_recursive(l);
-+ block_lock_speculation();
-+}
-+
- #define spin_unlock_recursive(l) _spin_unlock_recursive(l)
-
- #endif /* __SPINLOCK_H__ */
---
-2.44.0
-
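The (now dropped) patch above relies on a pattern worth spelling out: the
speculation barrier has to live in the always-inline wrapper rather than in
the out-of-line lock routine, so that speculatively bypassing the call
cannot also bypass the barrier. A userspace, x86-only analogue under
GCC/Clang might look as follows; the real Xen code instead emits the LFENCE
via alternative() keyed off X86_FEATURE_SC_NO_LOCK_HARDEN.

    #include <pthread.h>

    /* LFENCE as a speculation barrier; x86-only, compiler-specific. */
    static inline __attribute__((always_inline)) void block_lock_speculation(void)
    {
        __asm__ __volatile__ ( "lfence" ::: "memory" );
    }

    static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Wrapper is inlined into every caller; acquisition stays out of line. */
    static inline __attribute__((always_inline)) void demo_spin_lock(void)
    {
        pthread_mutex_lock(&demo_lock);   /* raw, barrier-free acquisition */
        block_lock_speculation();         /* barrier emitted at the call site */
    }

    static void demo_spin_unlock(void)
    {
        pthread_mutex_unlock(&demo_lock);
    }
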
diff --git a/0049-rwlock-introduce-support-for-blocking-speculation-in.patch b/0049-rwlock-introduce-support-for-blocking-speculation-in.patch
deleted file mode 100644
index 593b588..0000000
--- a/0049-rwlock-introduce-support-for-blocking-speculation-in.patch
+++ /dev/null
@@ -1,125 +0,0 @@
-From 7454dad6ee15f9fa6d84fc285d366b86f3d47494 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 13 Feb 2024 16:08:52 +0100
-Subject: [PATCH 49/67] rwlock: introduce support for blocking speculation into
- critical regions
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Introduce inline wrappers as required and add direct calls to
-block_lock_speculation() in order to prevent speculation into the rwlock
-protected critical regions.
-
-Note the rwlock primitives are adjusted to use the non speculation safe variants
-of the spinlock handlers, as a speculation barrier is added in the rwlock
-calling wrappers.
-
-trylock variants are protected by using lock_evaluate_nospec().
-
-This is part of XSA-453 / CVE-2024-2193
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit a1fb15f61692b1fa9945fc51f55471ace49cdd59)
----
- xen/common/rwlock.c | 14 +++++++++++---
- xen/include/xen/rwlock.h | 34 ++++++++++++++++++++++++++++------
- 2 files changed, 39 insertions(+), 9 deletions(-)
-
-diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
-index aa15529bbe..cda06b9d6e 100644
---- a/xen/common/rwlock.c
-+++ b/xen/common/rwlock.c
-@@ -34,8 +34,11 @@ void queue_read_lock_slowpath(rwlock_t *lock)
-
- /*
- * Put the reader into the wait queue.
-+ *
-+ * Use the speculation unsafe helper, as it's the caller responsibility to
-+ * issue a speculation barrier if required.
- */
-- spin_lock(&lock->lock);
-+ _spin_lock(&lock->lock);
-
- /*
- * At the head of the wait queue now, wait until the writer state
-@@ -64,8 +67,13 @@ void queue_write_lock_slowpath(rwlock_t *lock)
- {
- u32 cnts;
-
-- /* Put the writer into the wait queue. */
-- spin_lock(&lock->lock);
-+ /*
-+ * Put the writer into the wait queue.
-+ *
-+ * Use the speculation unsafe helper, as it's the caller responsibility to
-+ * issue a speculation barrier if required.
-+ */
-+ _spin_lock(&lock->lock);
-
- /* Try to acquire the lock directly if no reader is present. */
- if ( !atomic_read(&lock->cnts) &&
-diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
-index 0cc9167715..fd0458be94 100644
---- a/xen/include/xen/rwlock.h
-+++ b/xen/include/xen/rwlock.h
-@@ -247,27 +247,49 @@ static inline int _rw_is_write_locked(rwlock_t *lock)
- return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
- }
-
--#define read_lock(l) _read_lock(l)
--#define read_lock_irq(l) _read_lock_irq(l)
-+static always_inline void read_lock(rwlock_t *l)
-+{
-+ _read_lock(l);
-+ block_lock_speculation();
-+}
-+
-+static always_inline void read_lock_irq(rwlock_t *l)
-+{
-+ _read_lock_irq(l);
-+ block_lock_speculation();
-+}
-+
- #define read_lock_irqsave(l, f) \
- ({ \
- BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long)); \
- ((f) = _read_lock_irqsave(l)); \
-+ block_lock_speculation(); \
- })
-
- #define read_unlock(l) _read_unlock(l)
- #define read_unlock_irq(l) _read_unlock_irq(l)
- #define read_unlock_irqrestore(l, f) _read_unlock_irqrestore(l, f)
--#define read_trylock(l) _read_trylock(l)
-+#define read_trylock(l) lock_evaluate_nospec(_read_trylock(l))
-+
-+static always_inline void write_lock(rwlock_t *l)
-+{
-+ _write_lock(l);
-+ block_lock_speculation();
-+}
-+
-+static always_inline void write_lock_irq(rwlock_t *l)
-+{
-+ _write_lock_irq(l);
-+ block_lock_speculation();
-+}
-
--#define write_lock(l) _write_lock(l)
--#define write_lock_irq(l) _write_lock_irq(l)
- #define write_lock_irqsave(l, f) \
- ({ \
- BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long)); \
- ((f) = _write_lock_irqsave(l)); \
-+ block_lock_speculation(); \
- })
--#define write_trylock(l) _write_trylock(l)
-+#define write_trylock(l) lock_evaluate_nospec(_write_trylock(l))
-
- #define write_unlock(l) _write_unlock(l)
- #define write_unlock_irq(l) _write_unlock_irq(l)
---
-2.44.0
-
diff --git a/0049-tools-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch b/0049-tools-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch
new file mode 100644
index 0000000..564cece
--- /dev/null
+++ b/0049-tools-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch
@@ -0,0 +1,109 @@
+From 26b8ff1861a870e01456b31bf999f25df5538ebf Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 4 Jul 2024 14:13:30 +0200
+Subject: [PATCH 49/56] tools/libxs: Fix CLOEXEC handling in xs_fileno()
+
+xs_fileno() opens a pipe on first use to communicate between the watch thread
+and the main thread. Nothing ever sets CLOEXEC on the file descriptors.
+
+Check for the availability of the pipe2() function with configure. Despite
+pipe2() starting life as Linux-only, FreeBSD and NetBSD have since gained it.
+
+When pipe2() isn't available, try our best with pipe() and set_cloexec().
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Acked-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: a2ff677852f0ce05fa335e8e5682bf2ae0c916ee
+master date: 2024-07-02 10:52:59 +0100
+---
+ tools/config.h.in | 3 +++
+ tools/configure | 12 ++++++++++++
+ tools/configure.ac | 2 ++
+ tools/libs/store/xs.c | 16 +++++++++++++++-
+ 4 files changed, 32 insertions(+), 1 deletion(-)
+
+diff --git a/tools/config.h.in b/tools/config.h.in
+index 0bb2fe08a1..50ad60fcb0 100644
+--- a/tools/config.h.in
++++ b/tools/config.h.in
+@@ -39,6 +39,9 @@
+ /* Define to 1 if you have the <memory.h> header file. */
+ #undef HAVE_MEMORY_H
+
++/* Define to 1 if you have the `pipe2' function. */
++#undef HAVE_PIPE2
++
+ /* pygrub enabled */
+ #undef HAVE_PYGRUB
+
+diff --git a/tools/configure b/tools/configure
+index 7bb935d23b..e35112b5c5 100755
+--- a/tools/configure
++++ b/tools/configure
+@@ -9751,6 +9751,18 @@ if test "$ax_found" = "0"; then :
+ fi
+
+
++for ac_func in pipe2
++do :
++ ac_fn_c_check_func "$LINENO" "pipe2" "ac_cv_func_pipe2"
++if test "x$ac_cv_func_pipe2" = xyes; then :
++ cat >>confdefs.h <<_ACEOF
++#define HAVE_PIPE2 1
++_ACEOF
++
++fi
++done
++
++
+ cat >confcache <<\_ACEOF
+ # This file is a shell script that caches the results of configure
+ # tests run on this system so they can be shared between configure
+diff --git a/tools/configure.ac b/tools/configure.ac
+index 618ef8c63f..53ac20af1e 100644
+--- a/tools/configure.ac
++++ b/tools/configure.ac
+@@ -543,4 +543,6 @@ AS_IF([test "x$pvshim" = "xy"], [
+
+ AX_FIND_HEADER([INCLUDE_ENDIAN_H], [endian.h sys/endian.h])
+
++AC_CHECK_FUNCS([pipe2])
++
+ AC_OUTPUT()
+diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
+index 11a766c508..c8845b69e2 100644
+--- a/tools/libs/store/xs.c
++++ b/tools/libs/store/xs.c
+@@ -190,13 +190,27 @@ static bool set_cloexec(int fd)
+ return fcntl(fd, flags | FD_CLOEXEC) >= 0;
+ }
+
++static int pipe_cloexec(int fds[2])
++{
++#if HAVE_PIPE2
++ return pipe2(fds, O_CLOEXEC);
++#else
++ if (pipe(fds) < 0)
++ return -1;
++ /* Best effort to set CLOEXEC. Racy. */
++ set_cloexec(fds[0]);
++ set_cloexec(fds[1]);
++ return 0;
++#endif
++}
++
+ int xs_fileno(struct xs_handle *h)
+ {
+ char c = 0;
+
+ mutex_lock(&h->watch_mutex);
+
+- if ((h->watch_pipe[0] == -1) && (pipe(h->watch_pipe) != -1)) {
++ if ((h->watch_pipe[0] == -1) && (pipe_cloexec(h->watch_pipe) != -1)) {
+ /* Kick things off if the watch list is already non-empty. */
+ if (!XEN_TAILQ_EMPTY(&h->watch_list))
+ while (write(h->watch_pipe[1], &c, 1) != 1)
+--
+2.45.2
+
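In the same spirit, here is a minimal sketch of the pipe-creation fallback
described above, with HAVE_PIPE2 standing in for the configure result and
the helper names being hypothetical; FD_CLOEXEC is applied via the F_SETFD
fcntl() command.

    #define _GNU_SOURCE           /* for pipe2() on glibc */
    #include <fcntl.h>
    #include <unistd.h>

    static int mark_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);

        return (flags < 0) ? -1 : fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }

    static int make_pipe_cloexec(int fds[2])
    {
    #ifdef HAVE_PIPE2
        return pipe2(fds, O_CLOEXEC);     /* atomic: no window for exec() */
    #else
        if (pipe(fds) < 0)
            return -1;
        /* Best effort; a concurrent fork()+exec() can still leak the fds. */
        mark_cloexec(fds[0]);
        mark_cloexec(fds[1]);
        return 0;
    #endif
    }
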
diff --git a/0050-cmdline-document-and-enforce-extra_guest_irqs-upper-.patch b/0050-cmdline-document-and-enforce-extra_guest_irqs-upper-.patch
new file mode 100644
index 0000000..f7f61e8
--- /dev/null
+++ b/0050-cmdline-document-and-enforce-extra_guest_irqs-upper-.patch
@@ -0,0 +1,156 @@
+From 30c695ddaf067cbe7a98037474e7910109238807 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Thu, 4 Jul 2024 14:14:16 +0200
+Subject: [PATCH 50/56] cmdline: document and enforce "extra_guest_irqs" upper
+ bounds
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+PHYSDEVOP_pirq_eoi_gmfn_v<N> accepting just a single GFN implies that no
+more than 32k pIRQ-s can be used by a domain on x86. Document this upper
+bound.
+
+To also enforce the limit, (ab)use both arch_hwdom_irqs() (changing its
+parameter type) and setup_system_domains(). This is primarily to avoid
+exposing the two static variables or introducing yet further arch hooks.
+
+While touching arch_hwdom_irqs() also mark it hwdom-init.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Roger Pau Monné <roger.pau@citrix.com>
+
+amend 'cmdline: document and enforce "extra_guest_irqs" upper bounds'
+
+Address late review comments for what is now commit 17f6d398f765:
+- bound max_irqs right away against nr_irqs
+- introduce a #define for a constant used twice
+
+Requested-by: Roger Pau Monné <roger.pau@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 17f6d398f76597f8009ec0530842fb8705ece7ba
+master date: 2024-07-02 12:00:27 +0200
+master commit: 1f56accba33ffea0abf7d1c6384710823d10cbd6
+master date: 2024-07-03 14:03:27 +0200
+---
+ docs/misc/xen-command-line.pandoc | 3 ++-
+ xen/arch/x86/io_apic.c | 17 ++++++++++-------
+ xen/common/domain.c | 24 ++++++++++++++++++++++--
+ xen/include/xen/irq.h | 3 ++-
+ 4 files changed, 36 insertions(+), 11 deletions(-)
+
+diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
+index 10a09bbf23..d857bd0f89 100644
+--- a/docs/misc/xen-command-line.pandoc
++++ b/docs/misc/xen-command-line.pandoc
+@@ -1175,7 +1175,8 @@ common for all domUs, while the optional second number (preceded by a comma)
+ is for dom0. Changing the setting for domU has no impact on dom0 and vice
+ versa. For example to change dom0 without changing domU, use
+ `extra_guest_irqs=,512`. The default value for Dom0 and an eventual separate
+-hardware domain is architecture dependent.
++hardware domain is architecture dependent. The upper limit for both values on
++x86 is such that the resulting total number of IRQs can't be higher than 32768.
+ Note that specifying zero as domU value means zero, while for dom0 it means
+ to use the default.
+
+diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
+index c5342789e8..f7591fd091 100644
+--- a/xen/arch/x86/io_apic.c
++++ b/xen/arch/x86/io_apic.c
+@@ -2664,18 +2664,21 @@ void __init ioapic_init(void)
+ nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
+ }
+
+-unsigned int arch_hwdom_irqs(domid_t domid)
++unsigned int __hwdom_init arch_hwdom_irqs(const struct domain *d)
+ {
+ unsigned int n = fls(num_present_cpus());
++ /* Bounding by the domain pirq EOI bitmap capacity. */
++ const unsigned int max_irqs = min_t(unsigned int, nr_irqs,
++ PAGE_SIZE * BITS_PER_BYTE);
+
+- if ( !domid )
+- n = min(n, dom0_max_vcpus());
+- n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
++ if ( is_system_domain(d) )
++ return max_irqs;
+
+- /* Bounded by the domain pirq eoi bitmap gfn. */
+- n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);
++ if ( !d->domain_id )
++ n = min(n, dom0_max_vcpus());
++ n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, max_irqs);
+
+- printk("Dom%d has maximum %u PIRQs\n", domid, n);
++ printk("%pd has maximum %u PIRQs\n", d, n);
+
+ return n;
+ }
+diff --git a/xen/common/domain.c b/xen/common/domain.c
+index 003f4ab125..62832a5860 100644
+--- a/xen/common/domain.c
++++ b/xen/common/domain.c
+@@ -351,7 +351,8 @@ static int late_hwdom_init(struct domain *d)
+ }
+
+ static unsigned int __read_mostly extra_hwdom_irqs;
+-static unsigned int __read_mostly extra_domU_irqs = 32;
++#define DEFAULT_EXTRA_DOMU_IRQS 32U
++static unsigned int __read_mostly extra_domU_irqs = DEFAULT_EXTRA_DOMU_IRQS;
+
+ static int __init cf_check parse_extra_guest_irqs(const char *s)
+ {
+@@ -688,7 +689,7 @@ struct domain *domain_create(domid_t domid,
+ d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
+ else
+ d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
+- : arch_hwdom_irqs(domid);
++ : arch_hwdom_irqs(d);
+ d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
+
+ radix_tree_init(&d->pirq_tree);
+@@ -812,6 +813,25 @@ void __init setup_system_domains(void)
+ if ( IS_ERR(dom_xen) )
+ panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
+
++#ifdef CONFIG_HAS_PIRQ
++ /* Bound-check values passed via "extra_guest_irqs=". */
++ {
++ unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
++
++ if ( extra_hwdom_irqs > n - nr_static_irqs )
++ {
++ extra_hwdom_irqs = n - nr_static_irqs;
++ printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
++ }
++ if ( extra_domU_irqs >
++ max(DEFAULT_EXTRA_DOMU_IRQS, n - nr_static_irqs) )
++ {
++ extra_domU_irqs = n - nr_static_irqs;
++ printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
++ }
++ }
++#endif
++
+ /*
+ * Initialise our DOMID_IO domain.
+ * This domain owns I/O pages that are within the range of the page_info
+diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
+index 5dcd2d8f0c..bef170bcb6 100644
+--- a/xen/include/xen/irq.h
++++ b/xen/include/xen/irq.h
+@@ -196,8 +196,9 @@ extern struct irq_desc *pirq_spin_lock_irq_desc(
+
+ unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask);
+
++/* When passed a system domain, this returns the maximum permissible value. */
+ #ifndef arch_hwdom_irqs
+-unsigned int arch_hwdom_irqs(domid_t domid);
++unsigned int arch_hwdom_irqs(const struct domain *d);
+ #endif
+
+ #ifndef arch_evtchn_bind_pirq
+--
+2.45.2
+
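The 32768 figure enforced above follows directly from the single-GFN
interface: one 4 KiB bitmap page holds PAGE_SIZE * BITS_PER_BYTE = 4096 * 8
= 32768 bits, one per pIRQ. A simplified standalone sketch of the clamping
(nr_static_irqs and the command-line value are assumed; the real code also
bounds against nr_irqs and the hwdom default):

    #include <stdio.h>

    #define PAGE_SIZE      4096u
    #define BITS_PER_BYTE  8u

    static unsigned int min_u(unsigned int a, unsigned int b)
    {
        return a < b ? a : b;
    }

    int main(void)
    {
        unsigned int max_irqs = PAGE_SIZE * BITS_PER_BYTE;  /* 32768 */
        unsigned int nr_static_irqs = 256;                  /* assumed */
        unsigned int extra_hwdom_irqs = 40000;              /* from cmdline */

        extra_hwdom_irqs = min_u(extra_hwdom_irqs, max_irqs - nr_static_irqs);
        printf("max %u, hwdom extra clamped to %u\n",
               max_irqs, extra_hwdom_irqs);
        return 0;
    }
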
diff --git a/0050-percpu-rwlock-introduce-support-for-blocking-specula.patch b/0050-percpu-rwlock-introduce-support-for-blocking-specula.patch
deleted file mode 100644
index 1da2128..0000000
--- a/0050-percpu-rwlock-introduce-support-for-blocking-specula.patch
+++ /dev/null
@@ -1,87 +0,0 @@
-From 468a368b2e5a38fc0be8e9e5f475820f7e4a6b4f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 13 Feb 2024 17:57:38 +0100
-Subject: [PATCH 50/67] percpu-rwlock: introduce support for blocking
- speculation into critical regions
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Add direct calls to block_lock_speculation() where required in order to prevent
-speculation into the lock protected critical regions. Also convert
-_percpu_read_lock() from inline to always_inline.
-
-Note that _percpu_write_lock() has been modified to use the non speculation
-safe variant of the locking primitives, as a speculation barrier is added
-unconditionally by the calling wrapper.
-
-This is part of XSA-453 / CVE-2024-2193
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit f218daf6d3a3b847736d37c6a6b76031a0d08441)
----
- xen/common/rwlock.c | 6 +++++-
- xen/include/xen/rwlock.h | 14 ++++++++++----
- 2 files changed, 15 insertions(+), 5 deletions(-)
-
-diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
-index cda06b9d6e..4da0ed8fad 100644
---- a/xen/common/rwlock.c
-+++ b/xen/common/rwlock.c
-@@ -125,8 +125,12 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
- /*
- * First take the write lock to protect against other writers or slow
- * path readers.
-+ *
-+ * Note we use the speculation unsafe variant of write_lock(), as the
-+ * calling wrapper already adds a speculation barrier after the lock has
-+ * been taken.
- */
-- write_lock(&percpu_rwlock->rwlock);
-+ _write_lock(&percpu_rwlock->rwlock);
-
- /* Now set the global variable so that readers start using read_lock. */
- percpu_rwlock->writer_activating = 1;
-diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
-index fd0458be94..abe0804bf7 100644
---- a/xen/include/xen/rwlock.h
-+++ b/xen/include/xen/rwlock.h
-@@ -326,8 +326,8 @@ static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
- #define percpu_rwlock_resource_init(l, owner) \
- (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
-
--static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
-- percpu_rwlock_t *percpu_rwlock)
-+static always_inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
-+ percpu_rwlock_t *percpu_rwlock)
- {
- /* Validate the correct per_cpudata variable has been provided. */
- _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
-@@ -362,6 +362,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
- }
- else
- {
-+ /* Other branch already has a speculation barrier in read_lock(). */
-+ block_lock_speculation();
- /* All other paths have implicit check_lock() calls via read_lock(). */
- check_lock(&percpu_rwlock->rwlock.lock.debug, false);
- }
-@@ -410,8 +412,12 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
- _percpu_read_lock(&get_per_cpu_var(percpu), lock)
- #define percpu_read_unlock(percpu, lock) \
- _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
--#define percpu_write_lock(percpu, lock) \
-- _percpu_write_lock(&get_per_cpu_var(percpu), lock)
-+
-+#define percpu_write_lock(percpu, lock) \
-+({ \
-+ _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
-+ block_lock_speculation(); \
-+})
- #define percpu_write_unlock(percpu, lock) \
- _percpu_write_unlock(&get_per_cpu_var(percpu), lock)
-
---
-2.44.0
-
diff --git a/0051-locking-attempt-to-ensure-lock-wrappers-are-always-i.patch b/0051-locking-attempt-to-ensure-lock-wrappers-are-always-i.patch
deleted file mode 100644
index 822836d..0000000
--- a/0051-locking-attempt-to-ensure-lock-wrappers-are-always-i.patch
+++ /dev/null
@@ -1,405 +0,0 @@
-From 2cc5e57be680a516aa5cdef4281856d09b9d0ea6 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 4 Mar 2024 14:29:36 +0100
-Subject: [PATCH 51/67] locking: attempt to ensure lock wrappers are always
- inline
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Make the lock wrappers always inline in order to prevent the locking
-speculation barriers from ending up inside of `call`ed functions that could
-be speculatively bypassed.
-
-While there, also add an extra locking barrier to _mm_write_lock() in the
-branch taken when the lock is already held.
-
-Note some functions are switched to use the unsafe variants (without speculation
-barrier) of the locking primitives, but a speculation barrier is always added
-to the exposed public lock wrapping helper. That's the case with
-sched_spin_lock_double() or pcidevs_lock() for example.
-
-This is part of XSA-453 / CVE-2024-2193
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 197ecd838a2aaf959a469df3696d4559c4f8b762)
----
- xen/arch/x86/hvm/vpt.c | 10 +++++++---
- xen/arch/x86/include/asm/irq.h | 1 +
- xen/arch/x86/mm/mm-locks.h | 28 +++++++++++++++-------------
- xen/arch/x86/mm/p2m-pod.c | 2 +-
- xen/common/event_channel.c | 5 +++--
- xen/common/grant_table.c | 6 +++---
- xen/common/sched/core.c | 19 ++++++++++++-------
- xen/common/sched/private.h | 26 ++++++++++++++++++++++++--
- xen/common/timer.c | 8 +++++---
- xen/drivers/passthrough/pci.c | 5 +++--
- xen/include/xen/event.h | 4 ++--
- xen/include/xen/pci.h | 8 ++++++--
- 12 files changed, 82 insertions(+), 40 deletions(-)
-
-diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
-index cb1d81bf9e..66f1095245 100644
---- a/xen/arch/x86/hvm/vpt.c
-+++ b/xen/arch/x86/hvm/vpt.c
-@@ -161,7 +161,7 @@ static int pt_irq_masked(struct periodic_time *pt)
- * pt->vcpu field, because another thread holding the pt_migrate lock
- * may already be spinning waiting for your vcpu lock.
- */
--static void pt_vcpu_lock(struct vcpu *v)
-+static always_inline void pt_vcpu_lock(struct vcpu *v)
- {
- spin_lock(&v->arch.hvm.tm_lock);
- }
-@@ -180,9 +180,13 @@ static void pt_vcpu_unlock(struct vcpu *v)
- * need to take an additional lock that protects against pt->vcpu
- * changing.
- */
--static void pt_lock(struct periodic_time *pt)
-+static always_inline void pt_lock(struct periodic_time *pt)
- {
-- read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
-+ /*
-+ * Use the speculation unsafe variant for the first lock, as the following
-+ * lock taking helper already includes a speculation barrier.
-+ */
-+ _read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
- spin_lock(&pt->vcpu->arch.hvm.tm_lock);
- }
-
-diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
-index f6a0207a80..823d627fd0 100644
---- a/xen/arch/x86/include/asm/irq.h
-+++ b/xen/arch/x86/include/asm/irq.h
-@@ -178,6 +178,7 @@ void cf_check irq_complete_move(struct irq_desc *);
-
- extern struct irq_desc *irq_desc;
-
-+/* Not speculation safe, only used for AP bringup. */
- void lock_vector_lock(void);
- void unlock_vector_lock(void);
-
-diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
-index c1523aeccf..265239c49f 100644
---- a/xen/arch/x86/mm/mm-locks.h
-+++ b/xen/arch/x86/mm/mm-locks.h
-@@ -86,8 +86,8 @@ static inline void _set_lock_level(int l)
- this_cpu(mm_lock_level) = l;
- }
-
--static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
-- const char *func, int level, int rec)
-+static always_inline void _mm_lock(const struct domain *d, mm_lock_t *l,
-+ const char *func, int level, int rec)
- {
- if ( !((mm_locked_by_me(l)) && rec) )
- _check_lock_level(d, level);
-@@ -137,8 +137,8 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
- return (l->locker == get_processor_id());
- }
-
--static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
-- const char *func, int level)
-+static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
-+ const char *func, int level)
- {
- if ( !mm_write_locked_by_me(l) )
- {
-@@ -149,6 +149,8 @@ static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
- l->unlock_level = _get_lock_level();
- _set_lock_level(_lock_level(d, level));
- }
-+ else
-+ block_speculation();
- l->recurse_count++;
- }
-
-@@ -162,8 +164,8 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
- percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
- }
-
--static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
-- int level)
-+static always_inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
-+ int level)
- {
- _check_lock_level(d, level);
- percpu_read_lock(p2m_percpu_rwlock, &l->lock);
-@@ -178,15 +180,15 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
-
- /* This wrapper uses the line number to express the locking order below */
- #define declare_mm_lock(name) \
-- static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l, \
-- const char *func, int rec) \
-+ static always_inline void mm_lock_##name( \
-+ const struct domain *d, mm_lock_t *l, const char *func, int rec) \
- { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
- #define declare_mm_rwlock(name) \
-- static inline void mm_write_lock_##name(const struct domain *d, \
-- mm_rwlock_t *l, const char *func) \
-+ static always_inline void mm_write_lock_##name( \
-+ const struct domain *d, mm_rwlock_t *l, const char *func) \
- { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); } \
-- static inline void mm_read_lock_##name(const struct domain *d, \
-- mm_rwlock_t *l) \
-+ static always_inline void mm_read_lock_##name(const struct domain *d, \
-+ mm_rwlock_t *l) \
- { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
- /* These capture the name of the calling function */
- #define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
-@@ -321,7 +323,7 @@ declare_mm_lock(altp2mlist)
- #define MM_LOCK_ORDER_altp2m 40
- declare_mm_rwlock(altp2m);
-
--static inline void p2m_lock(struct p2m_domain *p)
-+static always_inline void p2m_lock(struct p2m_domain *p)
- {
- if ( p2m_is_altp2m(p) )
- mm_write_lock(altp2m, p->domain, &p->lock);
-diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
-index fc110506dc..99dbcb3101 100644
---- a/xen/arch/x86/mm/p2m-pod.c
-+++ b/xen/arch/x86/mm/p2m-pod.c
-@@ -36,7 +36,7 @@
- #define superpage_aligned(_x) (((_x)&(SUPERPAGE_PAGES-1))==0)
-
- /* Enforce lock ordering when grabbing the "external" page_alloc lock */
--static inline void lock_page_alloc(struct p2m_domain *p2m)
-+static always_inline void lock_page_alloc(struct p2m_domain *p2m)
- {
- page_alloc_mm_pre_lock(p2m->domain);
- spin_lock(&(p2m->domain->page_alloc_lock));
-diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
-index f5e0b12d15..dada9f15f5 100644
---- a/xen/common/event_channel.c
-+++ b/xen/common/event_channel.c
-@@ -62,7 +62,7 @@
- * just assume the event channel is free or unbound at the moment when the
- * evtchn_read_trylock() returns false.
- */
--static inline void evtchn_write_lock(struct evtchn *evtchn)
-+static always_inline void evtchn_write_lock(struct evtchn *evtchn)
- {
- write_lock(&evtchn->lock);
-
-@@ -364,7 +364,8 @@ int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc, evtchn_port_t port)
- return rc;
- }
-
--static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
-+static always_inline void double_evtchn_lock(struct evtchn *lchn,
-+ struct evtchn *rchn)
- {
- ASSERT(lchn != rchn);
-
-diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
-index ee7cc496b8..62a8685cd5 100644
---- a/xen/common/grant_table.c
-+++ b/xen/common/grant_table.c
-@@ -410,7 +410,7 @@ static inline void act_set_gfn(struct active_grant_entry *act, gfn_t gfn)
-
- static DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
-
--static inline void grant_read_lock(struct grant_table *gt)
-+static always_inline void grant_read_lock(struct grant_table *gt)
- {
- percpu_read_lock(grant_rwlock, >->lock);
- }
-@@ -420,7 +420,7 @@ static inline void grant_read_unlock(struct grant_table *gt)
- percpu_read_unlock(grant_rwlock, >->lock);
- }
-
--static inline void grant_write_lock(struct grant_table *gt)
-+static always_inline void grant_write_lock(struct grant_table *gt)
- {
- percpu_write_lock(grant_rwlock, >->lock);
- }
-@@ -457,7 +457,7 @@ nr_active_grant_frames(struct grant_table *gt)
- return num_act_frames_from_sha_frames(nr_grant_frames(gt));
- }
-
--static inline struct active_grant_entry *
-+static always_inline struct active_grant_entry *
- active_entry_acquire(struct grant_table *t, grant_ref_t e)
- {
- struct active_grant_entry *act;
-diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
-index 078beb1adb..29bbab5ac6 100644
---- a/xen/common/sched/core.c
-+++ b/xen/common/sched/core.c
-@@ -348,23 +348,28 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
- * This avoids dead- or live-locks when this code is running on both
- * cpus at the same time.
- */
--static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
-- unsigned long *flags)
-+static always_inline void sched_spin_lock_double(
-+ spinlock_t *lock1, spinlock_t *lock2, unsigned long *flags)
- {
-+ /*
-+ * In order to avoid extra overhead, use the locking primitives without the
-+ * speculation barrier, and introduce a single barrier here.
-+ */
- if ( lock1 == lock2 )
- {
-- spin_lock_irqsave(lock1, *flags);
-+ *flags = _spin_lock_irqsave(lock1);
- }
- else if ( lock1 < lock2 )
- {
-- spin_lock_irqsave(lock1, *flags);
-- spin_lock(lock2);
-+ *flags = _spin_lock_irqsave(lock1);
-+ _spin_lock(lock2);
- }
- else
- {
-- spin_lock_irqsave(lock2, *flags);
-- spin_lock(lock1);
-+ *flags = _spin_lock_irqsave(lock2);
-+ _spin_lock(lock1);
- }
-+ block_lock_speculation();
- }
-
- static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
-diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
-index 0527a8c70d..24a93dd0c1 100644
---- a/xen/common/sched/private.h
-+++ b/xen/common/sched/private.h
-@@ -207,8 +207,24 @@ DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
- #define cpumask_scratch (&this_cpu(cpumask_scratch))
- #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
-
-+/*
-+ * Deal with _spin_lock_irqsave() returning the flags value instead of storing
-+ * it in a passed parameter.
-+ */
-+#define _sched_spinlock0(lock, irq) _spin_lock##irq(lock)
-+#define _sched_spinlock1(lock, irq, arg) ({ \
-+ BUILD_BUG_ON(sizeof(arg) != sizeof(unsigned long)); \
-+ (arg) = _spin_lock##irq(lock); \
-+})
-+
-+#define _sched_spinlock__(nr) _sched_spinlock ## nr
-+#define _sched_spinlock_(nr) _sched_spinlock__(nr)
-+#define _sched_spinlock(lock, irq, args...) \
-+ _sched_spinlock_(count_args(args))(lock, irq, ## args)
-+
- #define sched_lock(kind, param, cpu, irq, arg...) \
--static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
-+static always_inline spinlock_t \
-+*kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
- { \
- for ( ; ; ) \
- { \
-@@ -220,10 +236,16 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
- * \
- * It may also be the case that v->processor may change but the \
- * lock may be the same; this will succeed in that case. \
-+ * \
-+ * Use the speculation unsafe locking helper, there's a speculation \
-+ * barrier before returning to the caller. \
- */ \
-- spin_lock##irq(lock, ## arg); \
-+ _sched_spinlock(lock, irq, ## arg); \
- if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
-+ { \
-+ block_lock_speculation(); \
- return lock; \
-+ } \
- spin_unlock##irq(lock, ## arg); \
- } \
- }
-diff --git a/xen/common/timer.c b/xen/common/timer.c
-index 9b5016d5ed..459668d417 100644
---- a/xen/common/timer.c
-+++ b/xen/common/timer.c
-@@ -240,7 +240,7 @@ static inline void deactivate_timer(struct timer *timer)
- list_add(&timer->inactive, &per_cpu(timers, timer->cpu).inactive);
- }
-
--static inline bool_t timer_lock(struct timer *timer)
-+static inline bool_t timer_lock_unsafe(struct timer *timer)
- {
- unsigned int cpu;
-
-@@ -254,7 +254,8 @@ static inline bool_t timer_lock(struct timer *timer)
- rcu_read_unlock(&timer_cpu_read_lock);
- return 0;
- }
-- spin_lock(&per_cpu(timers, cpu).lock);
-+ /* Use the speculation unsafe variant, the wrapper has the barrier. */
-+ _spin_lock(&per_cpu(timers, cpu).lock);
- if ( likely(timer->cpu == cpu) )
- break;
- spin_unlock(&per_cpu(timers, cpu).lock);
-@@ -267,8 +268,9 @@ static inline bool_t timer_lock(struct timer *timer)
- #define timer_lock_irqsave(t, flags) ({ \
- bool_t __x; \
- local_irq_save(flags); \
-- if ( !(__x = timer_lock(t)) ) \
-+ if ( !(__x = timer_lock_unsafe(t)) ) \
- local_irq_restore(flags); \
-+ block_lock_speculation(); \
- __x; \
- })
-
-diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
-index 8c62b14d19..1b3d285166 100644
---- a/xen/drivers/passthrough/pci.c
-+++ b/xen/drivers/passthrough/pci.c
-@@ -52,9 +52,10 @@ struct pci_seg {
-
- static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
-
--void pcidevs_lock(void)
-+/* Do not use, as it has no speculation barrier, use pcidevs_lock() instead. */
-+void pcidevs_lock_unsafe(void)
- {
-- spin_lock_recursive(&_pcidevs_lock);
-+ _spin_lock_recursive(&_pcidevs_lock);
- }
-
- void pcidevs_unlock(void)
-diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
-index 8eae9984a9..dd96e84c69 100644
---- a/xen/include/xen/event.h
-+++ b/xen/include/xen/event.h
-@@ -114,12 +114,12 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
- #define bucket_from_port(d, p) \
- ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
-
--static inline void evtchn_read_lock(struct evtchn *evtchn)
-+static always_inline void evtchn_read_lock(struct evtchn *evtchn)
- {
- read_lock(&evtchn->lock);
- }
-
--static inline bool evtchn_read_trylock(struct evtchn *evtchn)
-+static always_inline bool evtchn_read_trylock(struct evtchn *evtchn)
- {
- return read_trylock(&evtchn->lock);
- }
-diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
-index 5975ca2f30..b373f139d1 100644
---- a/xen/include/xen/pci.h
-+++ b/xen/include/xen/pci.h
-@@ -155,8 +155,12 @@ struct pci_dev {
- * devices, it also sync the access to the msi capability that is not
- * interrupt handling related (the mask bit register).
- */
--
--void pcidevs_lock(void);
-+void pcidevs_lock_unsafe(void);
-+static always_inline void pcidevs_lock(void)
-+{
-+ pcidevs_lock_unsafe();
-+ block_lock_speculation();
-+}
- void pcidevs_unlock(void);
- bool_t __must_check pcidevs_locked(void);
-
---
-2.44.0
-
diff --git a/0051-x86-entry-don-t-clear-DF-when-raising-UD-for-lack-of.patch b/0051-x86-entry-don-t-clear-DF-when-raising-UD-for-lack-of.patch
new file mode 100644
index 0000000..acefc8e
--- /dev/null
+++ b/0051-x86-entry-don-t-clear-DF-when-raising-UD-for-lack-of.patch
@@ -0,0 +1,58 @@
+From 7e636b8a16412d4f0d94b2b24d7ebcd2c749afff Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Thu, 4 Jul 2024 14:14:49 +0200
+Subject: [PATCH 51/56] x86/entry: don't clear DF when raising #UD for lack of
+ syscall handler
+
+While doing so is intentional when invoking the actual callback, to
+mimic a hard-coded SYSCALL_MASK / FMASK MSR, the same should not be done
+when no handler is available and hence #UD is raised.
+
+Fixes: ca6fcf4321b3 ("x86/pv: Inject #UD for missing SYSCALL callbacks")
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: d2fe9ab3048d503869ec81bc49db07e55a4a2386
+master date: 2024-07-02 12:01:21 +0200
+---
+ xen/arch/x86/x86_64/entry.S | 12 +++++++++++-
+ 1 file changed, 11 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
+index 054fcb225f..d3def49ea3 100644
+--- a/xen/arch/x86/x86_64/entry.S
++++ b/xen/arch/x86/x86_64/entry.S
+@@ -38,6 +38,14 @@ switch_to_kernel:
+ setc %cl
+ leal (,%rcx,TBF_INTERRUPT),%ecx
+
++ /*
++ * The PV ABI hardcodes the (guest-inaccessible and virtual)
++ * SYSCALL_MASK MSR such that DF (and nothing else) would be cleared.
++ * Note that the equivalent of IF (VGCF_syscall_disables_events) is
++ * dealt with separately above.
++ */
++ mov $~X86_EFLAGS_DF, %esi
++
+ test %rax, %rax
+ UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
+ mov VCPU_trap_ctxt(%rbx), %rdi
+@@ -47,12 +55,14 @@ UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
+ testb $4, X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_flags(%rdi)
+ setnz %cl
+ lea TBF_EXCEPTION(, %rcx, TBF_INTERRUPT), %ecx
++ or $~0, %esi /* Don't clear DF */
+ UNLIKELY_END(syscall_no_callback)
+
+ movq %rax,TRAPBOUNCE_eip(%rdx)
+ movb %cl,TRAPBOUNCE_flags(%rdx)
+ call create_bounce_frame
+- andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp)
++ /* Conditionally clear DF */
++ and %esi, UREGS_eflags(%rsp)
+ /* %rbx: struct vcpu */
+ test_all_events:
+ ASSERT_NOT_IN_ATOMIC
+--
+2.45.2
+
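The patch above clears DF conditionally by choosing a mask up front and
applying it unconditionally after create_bounce_frame. A standalone C
illustration of the same trick (not the entry.S code itself):

    #include <stdint.h>
    #include <stdio.h>

    #define X86_EFLAGS_DF 0x00000400u

    static uint32_t bounce_eflags(uint32_t eflags, int have_callback)
    {
        /* ~DF when the real callback runs, ~0 (keep everything) for #UD. */
        uint32_t mask = have_callback ? ~X86_EFLAGS_DF : ~0u;

        return eflags & mask;
    }

    int main(void)
    {
        uint32_t eflags = X86_EFLAGS_DF | 0x2;  /* DF set, plus the always-1 bit */

        printf("callback path: %#x, #UD path: %#x\n",
               bounce_eflags(eflags, 1), bounce_eflags(eflags, 0));
        return 0;
    }
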
diff --git a/0052-evtchn-build-fix-for-Arm.patch b/0052-evtchn-build-fix-for-Arm.patch
new file mode 100644
index 0000000..6cbeb10
--- /dev/null
+++ b/0052-evtchn-build-fix-for-Arm.patch
@@ -0,0 +1,43 @@
+From 45c5333935628e7c80de0bd5a9d9eff50b305b16 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Thu, 4 Jul 2024 16:57:29 +0200
+Subject: [PATCH 52/56] evtchn: build fix for Arm
+
+When backporting daa90dfea917 ("pirq_cleanup_check() leaks") I neglected
+to pay attention to it depending on 13a7b0f9f747 ("restrict concept of
+pIRQ to x86"). That one doesn't want backporting imo, so use / adjust
+custom #ifdef-ary to address the immediate issue of pirq_cleanup_check()
+not being available on Arm.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+---
+ xen/common/event_channel.c | 4 +++-
+ 1 file changed, 3 insertions(+), 1 deletion(-)
+
+diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
+index b1a6215c37..e6ec556603 100644
+--- a/xen/common/event_channel.c
++++ b/xen/common/event_channel.c
+@@ -643,7 +643,9 @@ static int evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
+ if ( rc != 0 )
+ {
+ info->evtchn = 0;
++#ifdef CONFIG_X86
+ pirq_cleanup_check(info, d);
++#endif
+ goto out;
+ }
+
+@@ -713,8 +715,8 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
+ * The successful path of unmap_domain_pirq_emuirq() will have
+ * called pirq_cleanup_check() already.
+ */
+-#endif
+ pirq_cleanup_check(pirq, d1);
++#endif
+ }
+ unlink_pirq_port(chn1, d1->vcpu[chn1->notify_vcpu_id]);
+ break;
+--
+2.45.2
+
diff --git a/0052-x86-mm-add-speculation-barriers-to-open-coded-locks.patch b/0052-x86-mm-add-speculation-barriers-to-open-coded-locks.patch
deleted file mode 100644
index 9e20f78..0000000
--- a/0052-x86-mm-add-speculation-barriers-to-open-coded-locks.patch
+++ /dev/null
@@ -1,73 +0,0 @@
-From 074b4c8987db235a0b86798810c045f68e4775b6 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 4 Mar 2024 18:08:48 +0100
-Subject: [PATCH 52/67] x86/mm: add speculation barriers to open coded locks
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Add a speculation barrier to the clearly identified open-coded lock taking
-functions.
-
-Note that the memory sharing page_lock() replacement (_page_lock()) is left
-as-is, as the code is experimental and not security supported.
-
-This is part of XSA-453 / CVE-2024-2193
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 42a572a38e22a97d86a4b648a22597628d5b42e4)
----
- xen/arch/x86/include/asm/mm.h | 4 +++-
- xen/arch/x86/mm.c | 6 ++++--
- 2 files changed, 7 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
-index a5d7fdd32e..5845b729c3 100644
---- a/xen/arch/x86/include/asm/mm.h
-+++ b/xen/arch/x86/include/asm/mm.h
-@@ -393,7 +393,9 @@ const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
- * The use of PGT_locked in mem_sharing does not collide, since mem_sharing is
- * only supported for hvm guests, which do not have PV PTEs updated.
- */
--int page_lock(struct page_info *page);
-+int page_lock_unsafe(struct page_info *page);
-+#define page_lock(pg) lock_evaluate_nospec(page_lock_unsafe(pg))
-+
- void page_unlock(struct page_info *page);
-
- void put_page_type(struct page_info *page);
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index 330c4abcd1..8d19d719bd 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -2033,7 +2033,7 @@ static inline bool current_locked_page_ne_check(struct page_info *page) {
- #define current_locked_page_ne_check(x) true
- #endif
-
--int page_lock(struct page_info *page)
-+int page_lock_unsafe(struct page_info *page)
- {
- unsigned long x, nx;
-
-@@ -2094,7 +2094,7 @@ void page_unlock(struct page_info *page)
- * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
- * reverse order.
- */
--static void l3t_lock(struct page_info *page)
-+static always_inline void l3t_lock(struct page_info *page)
- {
- unsigned long x, nx;
-
-@@ -2103,6 +2103,8 @@ static void l3t_lock(struct page_info *page)
- cpu_relax();
- nx = x | PGT_locked;
- } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
-+
-+ block_lock_speculation();
- }
-
- static void l3t_unlock(struct page_info *page)
---
-2.44.0
-
diff --git a/0053-x86-IRQ-avoid-double-unlock-in-map_domain_pirq.patch b/0053-x86-IRQ-avoid-double-unlock-in-map_domain_pirq.patch
new file mode 100644
index 0000000..686e142
--- /dev/null
+++ b/0053-x86-IRQ-avoid-double-unlock-in-map_domain_pirq.patch
@@ -0,0 +1,53 @@
+From d46a1ce3175dc45e97a8c9b89b0d0ff46145ae64 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 16 Jul 2024 14:14:43 +0200
+Subject: [PATCH 53/56] x86/IRQ: avoid double unlock in map_domain_pirq()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Ever since its introduction the main loop in the function dealing
+with multi-vector MSI had error exit points ("break") with different
+properties: In one case no IRQ descriptor lock is being held.
+Nevertheless the subsequent error cleanup path assumed such a lock would
+uniformly need releasing. Identify the case by setting "desc" to NULL,
+thus allowing the unlock to be skipped as necessary.
+
+This is CVE-2024-31143 / XSA-458.
+
+Coverity ID: 1605298
+Fixes: d1b6d0a02489 ("x86: enable multi-vector MSI")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 57338346f29cea7b183403561bdc5f407163b846
+master date: 2024-07-16 14:09:14 +0200
+---
+ xen/arch/x86/irq.c | 5 ++++-
+ 1 file changed, 4 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index 00be3b88e8..5dae8bd1b9 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -2287,6 +2287,7 @@ int map_domain_pirq(
+
+ set_domain_irq_pirq(d, irq, info);
+ spin_unlock_irqrestore(&desc->lock, flags);
++ desc = NULL;
+
+ info = NULL;
+ irq = create_irq(NUMA_NO_NODE, true);
+@@ -2322,7 +2323,9 @@ int map_domain_pirq(
+
+ if ( ret )
+ {
+- spin_unlock_irqrestore(&desc->lock, flags);
++ if ( desc )
++ spin_unlock_irqrestore(&desc->lock, flags);
++
+ pci_disable_msi(msi_desc);
+ if ( nr )
+ {
+--
+2.45.2
+
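The fix above uses a common cleanup idiom: once the lock has been dropped on
an early path, the descriptor pointer is cleared so the shared error path
can tell whether an unlock is still owed. A userspace sketch with
hypothetical names:

    #include <pthread.h>
    #include <stddef.h>

    struct desc {
        pthread_mutex_t lock;
    };

    static int setup_next_step(void)
    {
        return -1;                   /* stand-in for a step that can fail */
    }

    static int demo_map(struct desc *desc)
    {
        int ret;

        pthread_mutex_lock(&desc->lock);
        /* ... first stage of the work, under the lock ... */
        pthread_mutex_unlock(&desc->lock);
        desc = NULL;                 /* record that the lock is gone */

        ret = setup_next_step();     /* may fail without retaking the lock */

        if (ret) {
            if (desc)                /* only unlock if still held */
                pthread_mutex_unlock(&desc->lock);
            /* ... rest of the error cleanup ... */
        }
        return ret;
    }
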
diff --git a/0053-x86-protect-conditional-lock-taking-from-speculative.patch b/0053-x86-protect-conditional-lock-taking-from-speculative.patch
deleted file mode 100644
index f0caa24..0000000
--- a/0053-x86-protect-conditional-lock-taking-from-speculative.patch
+++ /dev/null
@@ -1,216 +0,0 @@
-From 0ebd2e49bcd0f566ba6b9158555942aab8e41332 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 4 Mar 2024 16:24:21 +0100
-Subject: [PATCH 53/67] x86: protect conditional lock taking from speculative
- execution
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Conditionally taken locks that use the pattern:
-
-if ( lock )
- spin_lock(...);
-
-Need an else branch in order to issue a speculation barrier in the else case,
-just like it's done in case the lock needs to be acquired.
-
-eval_nospec() could be used on the condition itself, but that would result in a
-double barrier on the branch where the lock is taken.
-
-Introduce a new pair of helpers, {gfn,spin}_lock_if() that can be used to
-conditionally take a lock in a speculation safe way.
-
-This is part of XSA-453 / CVE-2024-2193
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-(cherry picked from commit 03cf7ca23e0e876075954c558485b267b7d02406)
----
- xen/arch/x86/mm.c | 35 +++++++++++++----------------------
- xen/arch/x86/mm/mm-locks.h | 9 +++++++++
- xen/arch/x86/mm/p2m.c | 5 ++---
- xen/include/xen/spinlock.h | 8 ++++++++
- 4 files changed, 32 insertions(+), 25 deletions(-)
-
-diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
-index 8d19d719bd..d31b8d56ff 100644
---- a/xen/arch/x86/mm.c
-+++ b/xen/arch/x86/mm.c
-@@ -5023,8 +5023,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
- if ( !l3t )
- return NULL;
- UNMAP_DOMAIN_PAGE(l3t);
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
- if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
- {
- l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
-@@ -5061,8 +5060,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
- return NULL;
- }
- UNMAP_DOMAIN_PAGE(l2t);
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
- if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
- {
- l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
-@@ -5100,8 +5098,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
- return NULL;
- }
- UNMAP_DOMAIN_PAGE(l1t);
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
- if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
- {
- l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
-@@ -5132,6 +5129,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
- do { \
- if ( locking ) \
- l3t_lock(page); \
-+ else \
-+ block_lock_speculation(); \
- } while ( false )
-
- #define L3T_UNLOCK(page) \
-@@ -5347,8 +5346,7 @@ int map_pages_to_xen(
- if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
- flush_flags |= FLUSH_TLB_GLOBAL;
-
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
- if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
- (l3e_get_flags(*pl3e) & _PAGE_PSE) )
- {
-@@ -5452,8 +5450,7 @@ int map_pages_to_xen(
- if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
- flush_flags |= FLUSH_TLB_GLOBAL;
-
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
- if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
- (l2e_get_flags(*pl2e) & _PAGE_PSE) )
- {
-@@ -5494,8 +5491,7 @@ int map_pages_to_xen(
- unsigned long base_mfn;
- const l1_pgentry_t *l1t;
-
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
-
- ol2e = *pl2e;
- /*
-@@ -5549,8 +5545,7 @@ int map_pages_to_xen(
- unsigned long base_mfn;
- const l2_pgentry_t *l2t;
-
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
-
- ol3e = *pl3e;
- /*
-@@ -5694,8 +5689,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
- l3e_get_flags(*pl3e)));
- UNMAP_DOMAIN_PAGE(l2t);
-
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
- if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
- (l3e_get_flags(*pl3e) & _PAGE_PSE) )
- {
-@@ -5754,8 +5748,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
- l2e_get_flags(*pl2e) & ~_PAGE_PSE));
- UNMAP_DOMAIN_PAGE(l1t);
-
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
- if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
- (l2e_get_flags(*pl2e) & _PAGE_PSE) )
- {
-@@ -5799,8 +5792,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
- */
- if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
- continue;
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
-
- /*
- * L2E may be already cleared, or set to a superpage, by
-@@ -5847,8 +5839,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
- if ( (nf & _PAGE_PRESENT) ||
- ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
- continue;
-- if ( locking )
-- spin_lock(&map_pgdir_lock);
-+ spin_lock_if(locking, &map_pgdir_lock);
-
- /*
- * L3E may be already cleared, or set to a superpage, by
-diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
-index 265239c49f..3ea2d8eb03 100644
---- a/xen/arch/x86/mm/mm-locks.h
-+++ b/xen/arch/x86/mm/mm-locks.h
-@@ -347,6 +347,15 @@ static inline void p2m_unlock(struct p2m_domain *p)
- #define p2m_locked_by_me(p) mm_write_locked_by_me(&(p)->lock)
- #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
-
-+static always_inline void gfn_lock_if(bool condition, struct p2m_domain *p2m,
-+ gfn_t gfn, unsigned int order)
-+{
-+ if ( condition )
-+ gfn_lock(p2m, gfn, order);
-+ else
-+ block_lock_speculation();
-+}
-+
- /* PoD lock (per-p2m-table)
- *
- * Protects private PoD data structs: entry and cache
-diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
-index b28c899b5e..1fa9e01012 100644
---- a/xen/arch/x86/mm/p2m.c
-+++ b/xen/arch/x86/mm/p2m.c
-@@ -292,9 +292,8 @@ mfn_t p2m_get_gfn_type_access(struct p2m_domain *p2m, gfn_t gfn,
- if ( q & P2M_UNSHARE )
- q |= P2M_ALLOC;
-
-- if ( locked )
-- /* Grab the lock here, don't release until put_gfn */
-- gfn_lock(p2m, gfn, 0);
-+ /* Grab the lock here, don't release until put_gfn */
-+ gfn_lock_if(locked, p2m, gfn, 0);
-
- mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
-
-diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
-index daf48fdea7..7e75d0e2e7 100644
---- a/xen/include/xen/spinlock.h
-+++ b/xen/include/xen/spinlock.h
-@@ -216,6 +216,14 @@ static always_inline void spin_lock_irq(spinlock_t *l)
- block_lock_speculation(); \
- })
-
-+/* Conditionally take a spinlock in a speculation safe way. */
-+static always_inline void spin_lock_if(bool condition, spinlock_t *l)
-+{
-+ if ( condition )
-+ _spin_lock(l);
-+ block_lock_speculation();
-+}
-+
- #define spin_unlock(l) _spin_unlock(l)
- #define spin_unlock_irq(l) _spin_unlock_irq(l)
- #define spin_unlock_irqrestore(l, f) _spin_unlock_irqrestore(l, f)
---
-2.44.0
-
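The hardening pattern in the hunks above pairs a conditional lock acquisition with an unconditional speculation barrier, so a mispredicted "locking" branch cannot speculatively run the critical section without the fence. A minimal user-space sketch of the same idea, using pthreads and an x86 lfence as stand-ins for Xen's spinlock and block_lock_speculation() (the names below are illustrative, not Xen's):

    #include <stdbool.h>
    #include <pthread.h>

    /* Stand-in for Xen's block_lock_speculation(): an lfence keeps
     * speculative execution from running ahead of the condition check. */
    static inline void speculation_barrier(void)
    {
        __asm__ __volatile__ ( "lfence" ::: "memory" );
    }

    /* Conditionally take a lock in a speculation-safe way: the barrier is
     * executed whether or not the lock is taken. */
    static inline void lock_if(bool condition, pthread_mutex_t *l)
    {
        if ( condition )
            pthread_mutex_lock(l);
        speculation_barrier();
    }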
diff --git a/0054-tools-ipxe-update-for-fixing-build-with-GCC12.patch b/0054-tools-ipxe-update-for-fixing-build-with-GCC12.patch
deleted file mode 100644
index 90efaf8..0000000
--- a/0054-tools-ipxe-update-for-fixing-build-with-GCC12.patch
+++ /dev/null
@@ -1,33 +0,0 @@
-From a01c0b0f9691a8350e74938329892f949669119e Mon Sep 17 00:00:00 2001
-From: Olaf Hering <olaf@aepfle.de>
-Date: Wed, 27 Mar 2024 12:27:03 +0100
-Subject: [PATCH 54/67] tools: ipxe: update for fixing build with GCC12
-
-Use a snapshot which includes commit
-b0ded89e917b48b73097d3b8b88dfa3afb264ed0 ("[build] Disable dangling
-pointer checking for GCC"), which fixes build with gcc12.
-
-Signed-off-by: Olaf Hering <olaf@aepfle.de>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 18a36b4a9b088875486cfe33a2d4a8ae7eb4ab47
-master date: 2023-04-25 23:47:45 +0100
----
- tools/firmware/etherboot/Makefile | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/firmware/etherboot/Makefile b/tools/firmware/etherboot/Makefile
-index 4bc3633ba3..7a56fe8014 100644
---- a/tools/firmware/etherboot/Makefile
-+++ b/tools/firmware/etherboot/Makefile
-@@ -11,7 +11,7 @@ IPXE_GIT_URL ?= git://git.ipxe.org/ipxe.git
- endif
-
- # put an updated tar.gz on xenbits after changes to this variable
--IPXE_GIT_TAG := 3c040ad387099483102708bb1839110bc788cefb
-+IPXE_GIT_TAG := 1d1cf74a5e58811822bee4b3da3cff7282fcdfca
-
- IPXE_TARBALL_URL ?= $(XEN_EXTFILES_URL)/ipxe-git-$(IPXE_GIT_TAG).tar.gz
-
---
-2.44.0
-
diff --git a/0054-x86-physdev-Return-pirq-that-irq-was-already-mapped-.patch b/0054-x86-physdev-Return-pirq-that-irq-was-already-mapped-.patch
new file mode 100644
index 0000000..5e245f9
--- /dev/null
+++ b/0054-x86-physdev-Return-pirq-that-irq-was-already-mapped-.patch
@@ -0,0 +1,38 @@
+From f9f3062f11e144438fac9e9da6aa4cb41a6009b1 Mon Sep 17 00:00:00 2001
+From: Jiqian Chen <Jiqian.Chen@amd.com>
+Date: Thu, 25 Jul 2024 16:20:17 +0200
+Subject: [PATCH 54/56] x86/physdev: Return pirq that irq was already mapped to
+
+Fix a bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to allocate and
+map a pirq"). After that refactoring, when pirq<0 and current_pirq>0, the caller
+wants a free pirq allocated for an irq that already has a mapped pirq; the function
+then returns the negative pirq, so the call fails. The logic before the refactoring
+was different: it returned the current_pirq that the irq was already mapped to,
+making the call succeed.
+
+Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq")
+Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
+Signed-off-by: Huang Rui <ray.huang@amd.com>
+Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 0d2b87b5adfc19e87e9027d996db204c66a47f30
+master date: 2024-07-08 14:46:12 +0100
+---
+ xen/arch/x86/irq.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
+index 5dae8bd1b9..6b1f338eae 100644
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -2914,6 +2914,7 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
+ d->domain_id, index, pirq, current_pirq);
+ if ( current_pirq < 0 )
+ return -EBUSY;
++ pirq = current_pirq;
+ }
+ else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
+ {
+--
+2.45.2
+
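The one-line fix restores the pre-refactoring behaviour: when the caller asked for "any pirq" (negative request) and the irq already has a mapping, return that mapping rather than the negative request value. A hypothetical condensation of the patched branch (not the Xen function itself):

    #include <errno.h>

    /* Called once an existing mapping (current_pirq) has been found. */
    static int resolve_pirq(int requested_pirq, int current_pirq)
    {
        if ( current_pirq < 0 )       /* conflicting mapping */
            return -EBUSY;

        if ( requested_pirq < 0 )     /* "allocate any": reuse the mapping */
            return current_pirq;

        return requested_pirq;
    }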
diff --git a/0055-tools-libxs-Fix-fcntl-invocation-in-set_cloexec.patch b/0055-tools-libxs-Fix-fcntl-invocation-in-set_cloexec.patch
new file mode 100644
index 0000000..e4cc09e
--- /dev/null
+++ b/0055-tools-libxs-Fix-fcntl-invocation-in-set_cloexec.patch
@@ -0,0 +1,57 @@
+From 81f1e807fadb8111d71b78191e01ca688d74eac7 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 25 Jul 2024 16:20:53 +0200
+Subject: [PATCH 55/56] tools/libxs: Fix fcntl() invocation in set_cloexec()
+
+set_cloexec() had a bit too much copy&paste from setnonblock(), and
+insufficient testing on ancient versions of Linux...
+
+As written (emulating ancient linux by undef'ing O_CLOEXEC), strace shows:
+
+ open("/dev/xen/xenbus", O_RDWR) = 3
+ fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
+ fcntl(3, 0x8003 /* F_??? */, 0x7ffe4a771d90) = -1 EINVAL (Invalid argument)
+ close(3) = 0
+
+which is obviously nonsense.
+
+Switch F_GETFL -> F_GETFD, and fix the second invocation to use F_SETFD. With
+this, strace is rather happier:
+
+ open("/dev/xen/xenbus", O_RDWR) = 3
+ fcntl(3, F_GETFD) = 0
+ fcntl(3, F_SETFD, FD_CLOEXEC) = 0
+
+Fixes: bf7c1464706a ("tools/libxs: Fix CLOEXEC handling in get_dev()")
+Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: 37810b52d003f8a04af41d7b1f85eff24af9f804
+master date: 2024-07-09 15:32:18 +0100
+---
+ tools/libs/store/xs.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
+index c8845b69e2..38a6ce3cf2 100644
+--- a/tools/libs/store/xs.c
++++ b/tools/libs/store/xs.c
+@@ -182,12 +182,12 @@ static bool setnonblock(int fd, int nonblock) {
+
+ static bool set_cloexec(int fd)
+ {
+- int flags = fcntl(fd, F_GETFL);
++ int flags = fcntl(fd, F_GETFD);
+
+ if (flags < 0)
+ return false;
+
+- return fcntl(fd, flags | FD_CLOEXEC) >= 0;
++ return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) >= 0;
+ }
+
+ static int pipe_cloexec(int fds[2])
+--
+2.45.2
+
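The key distinction the fix relies on: FD_CLOEXEC lives in the descriptor flags, manipulated with F_GETFD/F_SETFD, whereas F_GETFL/F_SETFL operate on the file status flags. A self-contained sketch mirroring the corrected helper, for the case where the file could not be opened with O_CLOEXEC in the first place:

    #include <fcntl.h>
    #include <stdbool.h>

    /* Set close-on-exec on an already-open fd. */
    static bool set_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);

        if ( flags < 0 )
            return false;

        return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) >= 0;
    }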
diff --git a/0055-x86-mm-use-block_lock_speculation-in-_mm_write_lock.patch b/0055-x86-mm-use-block_lock_speculation-in-_mm_write_lock.patch
deleted file mode 100644
index 719234c..0000000
--- a/0055-x86-mm-use-block_lock_speculation-in-_mm_write_lock.patch
+++ /dev/null
@@ -1,35 +0,0 @@
-From a153b8b42e9027ba3057bc7c8bf55e4d71e86ec3 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Wed, 27 Mar 2024 12:28:24 +0100
-Subject: [PATCH 55/67] x86/mm: use block_lock_speculation() in
- _mm_write_lock()
-
-I can only guess that using block_speculation() there was a leftover
-from, earlier on, SPECULATIVE_HARDEN_LOCK depending on
-SPECULATIVE_HARDEN_BRANCH.
-
-Fixes: 197ecd838a2a ("locking: attempt to ensure lock wrappers are always inline")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 62018f08708a5ff6ef8fc8ff2aaaac46e5a60430
-master date: 2024-03-18 13:53:37 +0100
----
- xen/arch/x86/mm/mm-locks.h | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
-index 3ea2d8eb03..7d6e4d2a7c 100644
---- a/xen/arch/x86/mm/mm-locks.h
-+++ b/xen/arch/x86/mm/mm-locks.h
-@@ -150,7 +150,7 @@ static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
- _set_lock_level(_lock_level(d, level));
- }
- else
-- block_speculation();
-+ block_lock_speculation();
- l->recurse_count++;
- }
-
---
-2.44.0
-
diff --git a/0056-x86-altcall-fix-clang-code-gen-when-using-altcall-in.patch b/0056-x86-altcall-fix-clang-code-gen-when-using-altcall-in.patch
new file mode 100644
index 0000000..c94c516
--- /dev/null
+++ b/0056-x86-altcall-fix-clang-code-gen-when-using-altcall-in.patch
@@ -0,0 +1,85 @@
+From d078d0aa86e9e3b937f673dc89306b3afd09d560 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Thu, 25 Jul 2024 16:21:17 +0200
+Subject: [PATCH 56/56] x86/altcall: fix clang code-gen when using altcall in
+ loop constructs
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Yet another clang code generation issue when using altcalls.
+
+The issue this time is with using loop constructs around alternative_{,v}call
+instances using parameter types smaller than the register size.
+
+Given the following example code:
+
+static void bar(bool b)
+{
+ unsigned int i;
+
+ for ( i = 0; i < 10; i++ )
+ {
+ int ret_;
+ register union {
+ bool e;
+ unsigned long r;
+ } di asm("rdi") = { .e = b };
+ register unsigned long si asm("rsi");
+ register unsigned long dx asm("rdx");
+ register unsigned long cx asm("rcx");
+ register unsigned long r8 asm("r8");
+ register unsigned long r9 asm("r9");
+ register unsigned long r10 asm("r10");
+ register unsigned long r11 asm("r11");
+
+ asm volatile ( "call %c[addr]"
+ : "+r" (di), "=r" (si), "=r" (dx),
+ "=r" (cx), "=r" (r8), "=r" (r9),
+ "=r" (r10), "=r" (r11), "=a" (ret_)
+ : [addr] "i" (&(func)), "g" (func)
+ : "memory" );
+ }
+}
+
+See: https://godbolt.org/z/qvxMGd84q
+
+Clang will generate machine code that only resets the low 8 bits of %rdi
+between loop calls, leaving the rest of the register possibly containing
+garbage from the use of %rdi inside the called function. Note also that clang
+doesn't truncate the input parameters at the callee, thus breaking the psABI.
+
+Fix this by turning the `e` element in the anonymous union into an array that
+consumes the same space as an unsigned long, as this forces clang to reset the
+whole %rdi register instead of just the low 8 bits.
+
+Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
+Suggested-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: d51b2f5ea1915fe058f730b0ec542cf84254fca0
+master date: 2024-07-23 13:59:30 +0200
+---
+ xen/arch/x86/include/asm/alternative.h | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
+index 0d3697f1de..e63b459276 100644
+--- a/xen/arch/x86/include/asm/alternative.h
++++ b/xen/arch/x86/include/asm/alternative.h
+@@ -185,10 +185,10 @@ extern void alternative_branches(void);
+ */
+ #define ALT_CALL_ARG(arg, n) \
+ register union { \
+- typeof(arg) e; \
++ typeof(arg) e[sizeof(long) / sizeof(arg)]; \
+ unsigned long r; \
+ } a ## n ## _ asm ( ALT_CALL_arg ## n ) = { \
+- .e = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); }) \
++ .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
+ }
+ #else
+ #define ALT_CALL_ARG(arg, n) \
+--
+2.45.2
+
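Part of why the array works is a property of C initialisation: a designated initializer for one element zero-fills the rest of the aggregate, so the compiler must write the whole register-sized union rather than only the low byte. A standalone illustration (hypothetical, not the Xen macro):

    #include <stdio.h>

    union reg {
        unsigned char e[sizeof(unsigned long)];
        unsigned long r;
    };

    int main(void)
    {
        union reg a = { .e[0] = 1 };   /* remaining bytes are zero-filled */

        printf("%#lx\n", a.r);         /* prints 0x1 on little-endian */
        return 0;
    }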
diff --git a/0056-x86-boot-Fix-setup_apic_nmi_watchdog-to-fail-more-cl.patch b/0056-x86-boot-Fix-setup_apic_nmi_watchdog-to-fail-more-cl.patch
deleted file mode 100644
index 5d549c1..0000000
--- a/0056-x86-boot-Fix-setup_apic_nmi_watchdog-to-fail-more-cl.patch
+++ /dev/null
@@ -1,120 +0,0 @@
-From 471b53c6a092940f3629990d9ca946aa22bd8535 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 27 Mar 2024 12:29:11 +0100
-Subject: [PATCH 56/67] x86/boot: Fix setup_apic_nmi_watchdog() to fail more
- cleanly
-
-Right now, if the user requests the watchdog on the command line,
-setup_apic_nmi_watchdog() will blindly assume that setting up the watchdog
-worked. Reuse nmi_perfctr_msr to identify when the watchdog has been
-configured.
-
-Rearrange setup_p6_watchdog() to not set nmi_perfctr_msr until the sanity
-checks are complete. Turn setup_p4_watchdog() into a void function, matching
-the others.
-
-If the watchdog isn't set up, inform the user and override to NMI_NONE, which
-will prevent check_nmi_watchdog() from claiming that all CPUs are stuck.
-
-e.g.:
-
- (XEN) alt table ffff82d040697c38 -> ffff82d0406a97f0
- (XEN) Failed to configure NMI watchdog
- (XEN) Brought up 512 CPUs
- (XEN) Scheduling granularity: cpu, 1 CPU per sched-resource
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: f658321374687c7339235e1ac643e0427acff717
-master date: 2024-03-19 18:29:37 +0000
----
- xen/arch/x86/nmi.c | 25 ++++++++++++-------------
- 1 file changed, 12 insertions(+), 13 deletions(-)
-
-diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
-index 7656023748..7c9591b65e 100644
---- a/xen/arch/x86/nmi.c
-+++ b/xen/arch/x86/nmi.c
-@@ -323,8 +323,6 @@ static void setup_p6_watchdog(unsigned counter)
- {
- unsigned int evntsel;
-
-- nmi_perfctr_msr = MSR_P6_PERFCTR(0);
--
- if ( !nmi_p6_event_width && current_cpu_data.cpuid_level >= 0xa )
- nmi_p6_event_width = MASK_EXTR(cpuid_eax(0xa), P6_EVENT_WIDTH_MASK);
- if ( !nmi_p6_event_width )
-@@ -334,6 +332,8 @@ static void setup_p6_watchdog(unsigned counter)
- nmi_p6_event_width > BITS_PER_LONG )
- return;
-
-+ nmi_perfctr_msr = MSR_P6_PERFCTR(0);
-+
- clear_msr_range(MSR_P6_EVNTSEL(0), 2);
- clear_msr_range(MSR_P6_PERFCTR(0), 2);
-
-@@ -349,13 +349,13 @@ static void setup_p6_watchdog(unsigned counter)
- wrmsr(MSR_P6_EVNTSEL(0), evntsel, 0);
- }
-
--static int setup_p4_watchdog(void)
-+static void setup_p4_watchdog(void)
- {
- uint64_t misc_enable;
-
- rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
- if (!(misc_enable & MSR_IA32_MISC_ENABLE_PERF_AVAIL))
-- return 0;
-+ return;
-
- nmi_perfctr_msr = MSR_P4_IQ_PERFCTR0;
- nmi_p4_cccr_val = P4_NMI_IQ_CCCR0;
-@@ -378,13 +378,12 @@ static int setup_p4_watchdog(void)
- clear_msr_range(0x3E0, 2);
- clear_msr_range(MSR_P4_BPU_CCCR0, 18);
- clear_msr_range(MSR_P4_BPU_PERFCTR0, 18);
--
-+
- wrmsrl(MSR_P4_CRU_ESCR0, P4_NMI_CRU_ESCR0);
- wrmsrl(MSR_P4_IQ_CCCR0, P4_NMI_IQ_CCCR0 & ~P4_CCCR_ENABLE);
- write_watchdog_counter("P4_IQ_COUNTER0");
- apic_write(APIC_LVTPC, APIC_DM_NMI);
- wrmsrl(MSR_P4_IQ_CCCR0, nmi_p4_cccr_val);
-- return 1;
- }
-
- void setup_apic_nmi_watchdog(void)
-@@ -399,8 +398,6 @@ void setup_apic_nmi_watchdog(void)
- case 0xf ... 0x19:
- setup_k7_watchdog();
- break;
-- default:
-- return;
- }
- break;
- case X86_VENDOR_INTEL:
-@@ -411,14 +408,16 @@ void setup_apic_nmi_watchdog(void)
- : CORE_EVENT_CPU_CLOCKS_NOT_HALTED);
- break;
- case 15:
-- if (!setup_p4_watchdog())
-- return;
-+ setup_p4_watchdog();
- break;
-- default:
-- return;
- }
- break;
-- default:
-+ }
-+
-+ if ( nmi_perfctr_msr == 0 )
-+ {
-+ printk(XENLOG_WARNING "Failed to configure NMI watchdog\n");
-+ nmi_watchdog = NMI_NONE;
- return;
- }
-
---
-2.44.0
-
diff --git a/0057-x86-PoD-tie-together-P2M-update-and-increment-of-ent.patch b/0057-x86-PoD-tie-together-P2M-update-and-increment-of-ent.patch
deleted file mode 100644
index dedc1c2..0000000
--- a/0057-x86-PoD-tie-together-P2M-update-and-increment-of-ent.patch
+++ /dev/null
@@ -1,61 +0,0 @@
-From bfb69205376d94ff91b09a337c47fb665ee12da3 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Wed, 27 Mar 2024 12:29:33 +0100
-Subject: [PATCH 57/67] x86/PoD: tie together P2M update and increment of entry
- count
-
-When not holding the PoD lock across the entire region covering P2M
-update and stats update, the entry count - if to be incorrect at all -
-should indicate too large a value in preference to a too small one, to
-avoid functions bailing early when they find the count is zero. However,
-instead of moving the increment ahead (and adjust back upon failure),
-extend the PoD-locked region.
-
-Fixes: 99af3cd40b6e ("x86/mm: Rework locking in the PoD layer")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: George Dunlap <george.dunlap@cloud.com>
-master commit: cc950c49ae6a6690f7fc3041a1f43122c250d250
-master date: 2024-03-21 09:48:10 +0100
----
- xen/arch/x86/mm/p2m-pod.c | 15 ++++++++++++---
- 1 file changed, 12 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
-index 99dbcb3101..e903db9d93 100644
---- a/xen/arch/x86/mm/p2m-pod.c
-+++ b/xen/arch/x86/mm/p2m-pod.c
-@@ -1370,19 +1370,28 @@ mark_populate_on_demand(struct domain *d, unsigned long gfn_l,
- }
- }
-
-+ /*
-+ * P2M update and stats increment need to collectively be under PoD lock,
-+ * to prevent code elsewhere observing PoD entry count being zero despite
-+ * there actually still being PoD entries (created by the p2m_set_entry()
-+ * invocation below).
-+ */
-+ pod_lock(p2m);
-+
- /* Now, actually do the two-way mapping */
- rc = p2m_set_entry(p2m, gfn, INVALID_MFN, order,
- p2m_populate_on_demand, p2m->default_access);
- if ( rc == 0 )
- {
-- pod_lock(p2m);
- p2m->pod.entry_count += 1UL << order;
- p2m->pod.entry_count -= pod_count;
- BUG_ON(p2m->pod.entry_count < 0);
-- pod_unlock(p2m);
-+ }
-+
-+ pod_unlock(p2m);
-
-+ if ( rc == 0 )
- ioreq_request_mapcache_invalidate(d);
-- }
- else if ( order )
- {
- /*
---
-2.44.0
-
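The shape of the fix is to widen the locked region so the state-creating update and the counter that tracks it are modified in one critical section; readers can then never observe a zero count while entries already exist. A toy pthread analogue (not the Xen code):

    #include <pthread.h>

    static pthread_mutex_t pod_lock = PTHREAD_MUTEX_INITIALIZER;
    static long entry_count;

    /* set_entry() may create entries; the count is updated under the
     * same lock, so no reader sees entry_count == 0 with entries present. */
    static int mark_pod(unsigned int order, int (*set_entry)(void))
    {
        int rc;

        pthread_mutex_lock(&pod_lock);

        rc = set_entry();
        if ( rc == 0 )
            entry_count += 1L << order;

        pthread_mutex_unlock(&pod_lock);

        return rc;
    }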
diff --git a/0058-tools-oxenstored-Use-Map-instead-of-Hashtbl-for-quot.patch b/0058-tools-oxenstored-Use-Map-instead-of-Hashtbl-for-quot.patch
deleted file mode 100644
index dfc7f5a..0000000
--- a/0058-tools-oxenstored-Use-Map-instead-of-Hashtbl-for-quot.patch
+++ /dev/null
@@ -1,143 +0,0 @@
-From 7abd305607938b846da1a37dd1bda7bf7d47dba5 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
-Date: Wed, 31 Jan 2024 10:52:55 +0000
-Subject: [PATCH 58/67] tools/oxenstored: Use Map instead of Hashtbl for quotas
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-On a stress test running 1000 VMs flamegraphs have shown that
-`oxenstored` spends a large amount of time in `Hashtbl.copy` and the GC.
-
-Hashtable complexity:
- * read/write: O(1) average
- * copy: O(domains) -- copying the entire table
-
-Map complexity:
- * read/write: O(log n) worst case
- * copy: O(1) -- a word copy
-
-We always perform at least one 'copy' when processing each xenstore
-packet (regardless whether it is a readonly operation or inside a
-transaction or not), so the actual complexity per packet is:
- * Hashtbl: O(domains)
- * Map: O(log domains)
-
-Maps are the clear winner, and a better fit for the immutable xenstore
-tree.
-
-Signed-off-by: Edwin Török <edwin.torok@cloud.com>
-Acked-by: Christian Lindig <christian.lindig@cloud.com>
-(cherry picked from commit b6cf604207fd0a04451a48f2ce6d05fb66c612ab)
----
- tools/ocaml/xenstored/quota.ml | 65 ++++++++++++++++++----------------
- 1 file changed, 34 insertions(+), 31 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/quota.ml b/tools/ocaml/xenstored/quota.ml
-index 6e3d6401ae..ee8dd22581 100644
---- a/tools/ocaml/xenstored/quota.ml
-+++ b/tools/ocaml/xenstored/quota.ml
-@@ -23,66 +23,69 @@ let activate = ref true
- let maxent = ref (1000)
- let maxsize = ref (2048)
-
-+module Domid = struct
-+ type t = Xenctrl.domid
-+ let compare (a:t) (b:t) = compare a b
-+end
-+
-+module DomidMap = Map.Make(Domid)
-+
- type t = {
- maxent: int; (* max entities per domU *)
- maxsize: int; (* max size of data store in one node *)
-- cur: (Xenctrl.domid, int) Hashtbl.t; (* current domains quota *)
-+ mutable cur: int DomidMap.t; (* current domains quota *)
- }
-
- let to_string quota domid =
-- if Hashtbl.mem quota.cur domid
-- then Printf.sprintf "dom%i quota: %i/%i" domid (Hashtbl.find quota.cur domid) quota.maxent
-- else Printf.sprintf "dom%i quota: not set" domid
-+ try
-+ Printf.sprintf "dom%i quota: %i/%i" domid (DomidMap.find domid quota.cur) quota.maxent
-+ with Not_found ->
-+ Printf.sprintf "dom%i quota: not set" domid
-
- let create () =
-- { maxent = !maxent; maxsize = !maxsize; cur = Hashtbl.create 100; }
-+ { maxent = !maxent; maxsize = !maxsize; cur = DomidMap.empty; }
-
--let copy quota = { quota with cur = (Hashtbl.copy quota.cur) }
-+let copy quota = { quota with cur = quota.cur }
-
--let del quota id = Hashtbl.remove quota.cur id
-+let del quota id = { quota with cur = DomidMap.remove id quota.cur }
-
- let _check quota id size =
- if size > quota.maxsize then (
- warn "domain %u err create entry: data too big %d" id size;
- raise Data_too_big
- );
-- if id > 0 && Hashtbl.mem quota.cur id then
-- let entry = Hashtbl.find quota.cur id in
-+ if id > 0 then
-+ try
-+ let entry = DomidMap.find id quota.cur in
- if entry >= quota.maxent then (
- warn "domain %u cannot create entry: quota reached" id;
- raise Limit_reached
- )
-+ with Not_found -> ()
-
- let check quota id size =
- if !activate then
- _check quota id size
-
--let get_entry quota id = Hashtbl.find quota.cur id
-+let find_or_zero quota_cur id =
-+ try DomidMap.find id quota_cur with Not_found -> 0
-
--let set_entry quota id nb =
-- if nb = 0
-- then Hashtbl.remove quota.cur id
-- else begin
-- if Hashtbl.mem quota.cur id then
-- Hashtbl.replace quota.cur id nb
-- else
-- Hashtbl.add quota.cur id nb
-- end
-+let update_entry quota_cur id diff =
-+ let nb = diff + find_or_zero quota_cur id in
-+ if nb = 0 then DomidMap.remove id quota_cur
-+ else DomidMap.add id nb quota_cur
-
- let del_entry quota id =
-- try
-- let nb = get_entry quota id in
-- set_entry quota id (nb - 1)
-- with Not_found -> ()
-+ quota.cur <- update_entry quota.cur id (-1)
-
- let add_entry quota id =
-- let nb = try get_entry quota id with Not_found -> 0 in
-- set_entry quota id (nb + 1)
--
--let add quota diff =
-- Hashtbl.iter (fun id nb -> set_entry quota id (get_entry quota id + nb)) diff.cur
-+ quota.cur <- update_entry quota.cur id (+1)
-
- let merge orig_quota mod_quota dest_quota =
-- Hashtbl.iter (fun id nb -> let diff = nb - (try get_entry orig_quota id with Not_found -> 0) in
-- if diff <> 0 then
-- set_entry dest_quota id ((try get_entry dest_quota id with Not_found -> 0) + diff)) mod_quota.cur
-+ let fold_merge id nb dest =
-+ match nb - find_or_zero orig_quota.cur id with
-+ | 0 -> dest (* not modified *)
-+ | diff -> update_entry dest id diff (* update with [x=x+diff] *)
-+ in
-+ dest_quota.cur <- DomidMap.fold fold_merge mod_quota.cur dest_quota.cur
-+ (* dest_quota = dest_quota + (mod_quota - orig_quota) *)
---
-2.44.0
-
diff --git a/0059-tools-oxenstored-Make-Quota.t-pure.patch b/0059-tools-oxenstored-Make-Quota.t-pure.patch
deleted file mode 100644
index 7616b90..0000000
--- a/0059-tools-oxenstored-Make-Quota.t-pure.patch
+++ /dev/null
@@ -1,121 +0,0 @@
-From f38a815a54000ca51ff5165b2863d60b6bbea49c Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Edwin=20T=C3=B6r=C3=B6k?= <edwin.torok@cloud.com>
-Date: Wed, 31 Jan 2024 10:52:56 +0000
-Subject: [PATCH 59/67] tools/oxenstored: Make Quota.t pure
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Now that we no longer have a hashtable inside we can make Quota.t pure, and
-push the mutable update to its callers. Store.t already had a mutable Quota.t
-field.
-
-No functional change.
-
-Signed-off-by: Edwin Török <edwin.torok@cloud.com>
-Acked-by: Christian Lindig <christian.lindig@cloud.com>
-(cherry picked from commit 098d868e52ac0165b7f36e22b767ea70cef70054)
----
- tools/ocaml/xenstored/quota.ml | 8 ++++----
- tools/ocaml/xenstored/store.ml | 17 ++++++++++-------
- 2 files changed, 14 insertions(+), 11 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/quota.ml b/tools/ocaml/xenstored/quota.ml
-index ee8dd22581..b3ab678c72 100644
---- a/tools/ocaml/xenstored/quota.ml
-+++ b/tools/ocaml/xenstored/quota.ml
-@@ -33,7 +33,7 @@ module DomidMap = Map.Make(Domid)
- type t = {
- maxent: int; (* max entities per domU *)
- maxsize: int; (* max size of data store in one node *)
-- mutable cur: int DomidMap.t; (* current domains quota *)
-+ cur: int DomidMap.t; (* current domains quota *)
- }
-
- let to_string quota domid =
-@@ -76,10 +76,10 @@ let update_entry quota_cur id diff =
- else DomidMap.add id nb quota_cur
-
- let del_entry quota id =
-- quota.cur <- update_entry quota.cur id (-1)
-+ {quota with cur = update_entry quota.cur id (-1)}
-
- let add_entry quota id =
-- quota.cur <- update_entry quota.cur id (+1)
-+ {quota with cur = update_entry quota.cur id (+1)}
-
- let merge orig_quota mod_quota dest_quota =
- let fold_merge id nb dest =
-@@ -87,5 +87,5 @@ let merge orig_quota mod_quota dest_quota =
- | 0 -> dest (* not modified *)
- | diff -> update_entry dest id diff (* update with [x=x+diff] *)
- in
-- dest_quota.cur <- DomidMap.fold fold_merge mod_quota.cur dest_quota.cur
-+ {dest_quota with cur = DomidMap.fold fold_merge mod_quota.cur dest_quota.cur}
- (* dest_quota = dest_quota + (mod_quota - orig_quota) *)
-diff --git a/tools/ocaml/xenstored/store.ml b/tools/ocaml/xenstored/store.ml
-index c94dbf3a62..5dd965db15 100644
---- a/tools/ocaml/xenstored/store.ml
-+++ b/tools/ocaml/xenstored/store.ml
-@@ -85,7 +85,9 @@ let check_owner node connection =
- raise Define.Permission_denied;
- end
-
--let rec recurse fct node = fct node; SymbolMap.iter (fun _ -> recurse fct) node.children
-+let rec recurse fct node acc =
-+ let acc = fct node acc in
-+ SymbolMap.fold (fun _ -> recurse fct) node.children acc
-
- (** [recurse_filter_map f tree] applies [f] on each node in the tree recursively,
- possibly removing some nodes.
-@@ -408,7 +410,7 @@ let dump_buffer store = dump_store_buf store.root
- let set_node store path node orig_quota mod_quota =
- let root = Path.set_node store.root path node in
- store.root <- root;
-- Quota.merge orig_quota mod_quota store.quota
-+ store.quota <- Quota.merge orig_quota mod_quota store.quota
-
- let write store perm path value =
- let node, existing = get_deepest_existing_node store path in
-@@ -422,7 +424,7 @@ let write store perm path value =
- let root, node_created = path_write store perm path value in
- store.root <- root;
- if node_created
-- then Quota.add_entry store.quota owner
-+ then store.quota <- Quota.add_entry store.quota owner
-
- let mkdir store perm path =
- let node, existing = get_deepest_existing_node store path in
-@@ -431,7 +433,7 @@ let mkdir store perm path =
- if not (existing || (Perms.Connection.is_dom0 perm)) then Quota.check store.quota owner 0;
- store.root <- path_mkdir store perm path;
- if not existing then
-- Quota.add_entry store.quota owner
-+ store.quota <- Quota.add_entry store.quota owner
-
- let rm store perm path =
- let rmed_node = Path.get_node store.root path in
-@@ -439,7 +441,7 @@ let rm store perm path =
- | None -> raise Define.Doesnt_exist
- | Some rmed_node ->
- store.root <- path_rm store perm path;
-- Node.recurse (fun node -> Quota.del_entry store.quota (Node.get_owner node)) rmed_node
-+ store.quota <- Node.recurse (fun node quota -> Quota.del_entry quota (Node.get_owner node)) rmed_node store.quota
-
- let setperms store perm path nperms =
- match Path.get_node store.root path with
-@@ -450,8 +452,9 @@ let setperms store perm path nperms =
- if not ((old_owner = new_owner) || (Perms.Connection.is_dom0 perm)) then
- raise Define.Permission_denied;
- store.root <- path_setperms store perm path nperms;
-- Quota.del_entry store.quota old_owner;
-- Quota.add_entry store.quota new_owner
-+ store.quota <-
-+ let quota = Quota.del_entry store.quota old_owner in
-+ Quota.add_entry quota new_owner
-
- let reset_permissions store domid =
- Logging.info "store|node" "Cleaning up xenstore ACLs for domid %d" domid;
---
-2.44.0
-
diff --git a/0060-x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch b/0060-x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch
deleted file mode 100644
index ce2b89d..0000000
--- a/0060-x86-cpu-policy-Hide-x2APIC-from-PV-guests.patch
+++ /dev/null
@@ -1,90 +0,0 @@
-From bb27e11c56963e170d1f6d2fbddbc956f7164121 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:17:25 +0200
-Subject: [PATCH 60/67] x86/cpu-policy: Hide x2APIC from PV guests
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-PV guests can't write to MSR_APIC_BASE (in order to set EXTD), nor can they
-access any of the x2APIC MSR range. Therefore they mustn't see the x2APIC
-CPUID bit saying that they can.
-
-Right now, the host x2APIC flag filters into PV guests, meaning that PV guests
-generally see x2APIC except on Zen1-and-older AMD systems.
-
-Linux works around this by explicitly hiding the bit itself, and filtering
-EXTD out of MSR_APIC_BASE reads. NetBSD behaves more in the spirit of PV
-guests, and entirely ignores the APIC when built as a PV guest.
-
-Change the annotation from !A to !S. This has a consequence of stripping it
-out of both PV featuremasks. However, as existing guests may have seen the
-bit, set it back into the PV Max policy; a VM which saw the bit and is alive
-enough to migrate will have ignored it one way or another.
-
-Hiding x2APIC does change the contents of leaf 0xb, but as the information is
-nonsense to begin with, this is likely an improvement on the status quo.
-
-Xen's blind assumption that APIC_ID = vCPU_ID * 2 isn't interlinked with the
-host's topology structure, where a PV guest may see real host values, and the
-APIC_IDs are useless without an MADT to start with. Dom0 is the only PV VM to
-get an MADT but it's the host one, meaning the two sets of APIC_IDs are from
-different address spaces.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 5420aa165dfa5fe95dd84bb71cb96c15459935b1
-master date: 2024-03-01 20:14:19 +0000
----
- xen/arch/x86/cpu-policy.c | 11 +++++++++--
- xen/include/public/arch-x86/cpufeatureset.h | 2 +-
- 2 files changed, 10 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
-index 96c2cee1a8..ed64d56294 100644
---- a/xen/arch/x86/cpu-policy.c
-+++ b/xen/arch/x86/cpu-policy.c
-@@ -559,6 +559,14 @@ static void __init calculate_pv_max_policy(void)
- for ( i = 0; i < ARRAY_SIZE(fs); ++i )
- fs[i] &= pv_max_featuremask[i];
-
-+ /*
-+ * Xen at the time of writing (Feb 2024, 4.19 dev cycle) used to leak the
-+ * host x2APIC capability into PV guests, but never supported the guest
-+ * trying to turn x2APIC mode on. Tolerate an incoming VM which saw the
-+ * x2APIC CPUID bit and is alive enough to migrate.
-+ */
-+ __set_bit(X86_FEATURE_X2APIC, fs);
-+
- /*
- * If Xen isn't virtualising MSR_SPEC_CTRL for PV guests (functional
- * availability, or admin choice), hide the feature.
-@@ -837,11 +845,10 @@ void recalculate_cpuid_policy(struct domain *d)
- }
-
- /*
-- * Allow the toolstack to set HTT, X2APIC and CMP_LEGACY. These bits
-+ * Allow the toolstack to set HTT and CMP_LEGACY. These bits
- * affect how to interpret topology information in other cpuid leaves.
- */
- __set_bit(X86_FEATURE_HTT, max_fs);
-- __set_bit(X86_FEATURE_X2APIC, max_fs);
- __set_bit(X86_FEATURE_CMP_LEGACY, max_fs);
-
- /*
-diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
-index 113e6cadc1..bc971f3c6f 100644
---- a/xen/include/public/arch-x86/cpufeatureset.h
-+++ b/xen/include/public/arch-x86/cpufeatureset.h
-@@ -123,7 +123,7 @@ XEN_CPUFEATURE(PCID, 1*32+17) /*H Process Context ID */
- XEN_CPUFEATURE(DCA, 1*32+18) /* Direct Cache Access */
- XEN_CPUFEATURE(SSE4_1, 1*32+19) /*A Streaming SIMD Extensions 4.1 */
- XEN_CPUFEATURE(SSE4_2, 1*32+20) /*A Streaming SIMD Extensions 4.2 */
--XEN_CPUFEATURE(X2APIC, 1*32+21) /*!A Extended xAPIC */
-+XEN_CPUFEATURE(X2APIC, 1*32+21) /*!S Extended xAPIC */
- XEN_CPUFEATURE(MOVBE, 1*32+22) /*A movbe instruction */
- XEN_CPUFEATURE(POPCNT, 1*32+23) /*A POPCNT instruction */
- XEN_CPUFEATURE(TSC_DEADLINE, 1*32+24) /*S TSC Deadline Timer */
---
-2.44.0
-
diff --git a/0061-x86-cpu-policy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch b/0061-x86-cpu-policy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch
deleted file mode 100644
index d1b8786..0000000
--- a/0061-x86-cpu-policy-Fix-visibility-of-HTT-CMP_LEGACY-in-m.patch
+++ /dev/null
@@ -1,85 +0,0 @@
-From 70ad9c5fdeac4814050080c87e06d44292ecf868 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:18:05 +0200
-Subject: [PATCH 61/67] x86/cpu-policy: Fix visibility of HTT/CMP_LEGACY in max
- policies
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The block in recalculate_cpuid_policy() predates the proper split between
-default and max policies, and was a "slightly max for a toolstack which knows
-about it" capability. It didn't get transformed properly in Xen 4.14.
-
-Because Xen will accept a VM with HTT/CMP_LEGACY seen, they should be visible
-in the max policies. Keep the default policy matching host settings.
-
-This manifested as an incorrectly-rejected migration across XenServer's Xen
-4.13 -> 4.17 upgrade, as Xapi is slowly growing the logic to check a VM
-against the target max policy.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: e2d8a652251660c3252d92b442e1a9c5d6e6a1e9
-master date: 2024-03-01 20:14:19 +0000
----
- xen/arch/x86/cpu-policy.c | 29 ++++++++++++++++++++++-------
- 1 file changed, 22 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
-index ed64d56294..24acd12ce2 100644
---- a/xen/arch/x86/cpu-policy.c
-+++ b/xen/arch/x86/cpu-policy.c
-@@ -458,6 +458,16 @@ static void __init guest_common_max_feature_adjustments(uint32_t *fs)
- raw_cpu_policy.feat.clwb )
- __set_bit(X86_FEATURE_CLWB, fs);
- }
-+
-+ /*
-+ * Topology information inside the guest is entirely at the toolstack's
-+ * discretion, and bears no relationship to the host we're running on.
-+ *
-+ * HTT identifies p->basic.lppp as valid
-+ * CMP_LEGACY identifies p->extd.nc as valid
-+ */
-+ __set_bit(X86_FEATURE_HTT, fs);
-+ __set_bit(X86_FEATURE_CMP_LEGACY, fs);
- }
-
- static void __init guest_common_default_feature_adjustments(uint32_t *fs)
-@@ -512,6 +522,18 @@ static void __init guest_common_default_feature_adjustments(uint32_t *fs)
- __clear_bit(X86_FEATURE_CLWB, fs);
- }
-
-+ /*
-+ * Topology information is at the toolstack's discretion so these are
-+ * unconditionally set in max, but pick a default which matches the host.
-+ */
-+ __clear_bit(X86_FEATURE_HTT, fs);
-+ if ( cpu_has_htt )
-+ __set_bit(X86_FEATURE_HTT, fs);
-+
-+ __clear_bit(X86_FEATURE_CMP_LEGACY, fs);
-+ if ( cpu_has_cmp_legacy )
-+ __set_bit(X86_FEATURE_CMP_LEGACY, fs);
-+
- /*
- * On certain hardware, speculative or errata workarounds can result in
- * TSX being placed in "force-abort" mode, where it doesn't actually
-@@ -844,13 +866,6 @@ void recalculate_cpuid_policy(struct domain *d)
- }
- }
-
-- /*
-- * Allow the toolstack to set HTT and CMP_LEGACY. These bits
-- * affect how to interpret topology information in other cpuid leaves.
-- */
-- __set_bit(X86_FEATURE_HTT, max_fs);
-- __set_bit(X86_FEATURE_CMP_LEGACY, max_fs);
--
- /*
- * 32bit PV domains can't use any Long Mode features, and cannot use
- * SYSCALL on non-AMD hardware.
---
-2.44.0
-
diff --git a/0062-xen-virtual-region-Rename-the-start-end-fields.patch b/0062-xen-virtual-region-Rename-the-start-end-fields.patch
deleted file mode 100644
index 9dbd5c9..0000000
--- a/0062-xen-virtual-region-Rename-the-start-end-fields.patch
+++ /dev/null
@@ -1,140 +0,0 @@
-From 2392e958ec6fd2e48e011781344cf94dee6d6142 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:18:51 +0200
-Subject: [PATCH 62/67] xen/virtual-region: Rename the start/end fields
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-... to text_{start,end}. We're about to introduce another start/end pair.
-
-Despite its name, struct virtual_region has always been a module-ish
-description. Call this out specifically.
-
-As minor cleanup, replace ROUNDUP(x, PAGE_SIZE) with the more concise
-PAGE_ALIGN() ahead of duplicating the example.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: 989556c6f8ca080f5f202417af97d1188b9ba52a
-master date: 2024-03-07 14:24:42 +0000
----
- xen/common/livepatch.c | 9 +++++----
- xen/common/virtual_region.c | 19 ++++++++++---------
- xen/include/xen/virtual_region.h | 11 +++++++++--
- 3 files changed, 24 insertions(+), 15 deletions(-)
-
-diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
-index a5068a2217..29395f286f 100644
---- a/xen/common/livepatch.c
-+++ b/xen/common/livepatch.c
-@@ -785,8 +785,8 @@ static int prepare_payload(struct payload *payload,
- region = &payload->region;
-
- region->symbols_lookup = livepatch_symbols_lookup;
-- region->start = payload->text_addr;
-- region->end = payload->text_addr + payload->text_size;
-+ region->text_start = payload->text_addr;
-+ region->text_end = payload->text_addr + payload->text_size;
-
- /* Optional sections. */
- for ( i = 0; i < BUGFRAME_NR; i++ )
-@@ -823,8 +823,9 @@ static int prepare_payload(struct payload *payload,
- const void *instr = ALT_ORIG_PTR(a);
- const void *replacement = ALT_REPL_PTR(a);
-
-- if ( (instr < region->start && instr >= region->end) ||
-- (replacement < region->start && replacement >= region->end) )
-+ if ( (instr < region->text_start && instr >= region->text_end) ||
-+ (replacement < region->text_start &&
-+ replacement >= region->text_end) )
- {
- printk(XENLOG_ERR LIVEPATCH "%s Alt patching outside payload: %p\n",
- elf->name, instr);
-diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
-index 9f12c30efe..b22ffb75c4 100644
---- a/xen/common/virtual_region.c
-+++ b/xen/common/virtual_region.c
-@@ -11,15 +11,15 @@
-
- static struct virtual_region core = {
- .list = LIST_HEAD_INIT(core.list),
-- .start = _stext,
-- .end = _etext,
-+ .text_start = _stext,
-+ .text_end = _etext,
- };
-
- /* Becomes irrelevant when __init sections are cleared. */
- static struct virtual_region core_init __initdata = {
- .list = LIST_HEAD_INIT(core_init.list),
-- .start = _sinittext,
-- .end = _einittext,
-+ .text_start = _sinittext,
-+ .text_end = _einittext,
- };
-
- /*
-@@ -39,7 +39,8 @@ const struct virtual_region *find_text_region(unsigned long addr)
- rcu_read_lock(&rcu_virtual_region_lock);
- list_for_each_entry_rcu( region, &virtual_region_list, list )
- {
-- if ( (void *)addr >= region->start && (void *)addr < region->end )
-+ if ( (void *)addr >= region->text_start &&
-+ (void *)addr < region->text_end )
- {
- rcu_read_unlock(&rcu_virtual_region_lock);
- return region;
-@@ -88,8 +89,8 @@ void relax_virtual_region_perms(void)
-
- rcu_read_lock(&rcu_virtual_region_lock);
- list_for_each_entry_rcu( region, &virtual_region_list, list )
-- modify_xen_mappings_lite((unsigned long)region->start,
-- ROUNDUP((unsigned long)region->end, PAGE_SIZE),
-+ modify_xen_mappings_lite((unsigned long)region->text_start,
-+ PAGE_ALIGN((unsigned long)region->text_end),
- PAGE_HYPERVISOR_RWX);
- rcu_read_unlock(&rcu_virtual_region_lock);
- }
-@@ -100,8 +101,8 @@ void tighten_virtual_region_perms(void)
-
- rcu_read_lock(&rcu_virtual_region_lock);
- list_for_each_entry_rcu( region, &virtual_region_list, list )
-- modify_xen_mappings_lite((unsigned long)region->start,
-- ROUNDUP((unsigned long)region->end, PAGE_SIZE),
-+ modify_xen_mappings_lite((unsigned long)region->text_start,
-+ PAGE_ALIGN((unsigned long)region->text_end),
- PAGE_HYPERVISOR_RX);
- rcu_read_unlock(&rcu_virtual_region_lock);
- }
-diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
-index d053620711..442a45bf1f 100644
---- a/xen/include/xen/virtual_region.h
-+++ b/xen/include/xen/virtual_region.h
-@@ -9,11 +9,18 @@
- #include <xen/list.h>
- #include <xen/symbols.h>
-
-+/*
-+ * Despite it's name, this is a module(ish) description.
-+ *
-+ * There's one region for the runtime .text/etc, one region for .init during
-+ * boot only, and one region per livepatch.
-+ */
- struct virtual_region
- {
- struct list_head list;
-- const void *start; /* Virtual address start. */
-- const void *end; /* Virtual address end. */
-+
-+ const void *text_start; /* .text virtual address start. */
-+ const void *text_end; /* .text virtual address end. */
-
- /* If this is NULL the default lookup mechanism is used. */
- symbols_lookup_t *symbols_lookup;
---
-2.44.0
-
diff --git a/0063-xen-virtual-region-Include-rodata-pointers.patch b/0063-xen-virtual-region-Include-rodata-pointers.patch
deleted file mode 100644
index 9f51d4d..0000000
--- a/0063-xen-virtual-region-Include-rodata-pointers.patch
+++ /dev/null
@@ -1,71 +0,0 @@
-From 335cbb55567b20df8e8bd2d1b340609e272ddab6 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:19:11 +0200
-Subject: [PATCH 63/67] xen/virtual-region: Include rodata pointers
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-These are optional. .init doesn't distinguish types of data like this, and
-livepatches don't necessarily have any .rodata either.
-
-No functional change.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: ef969144a425e39f5b214a875b5713d0ea8575fb
-master date: 2024-03-07 14:24:42 +0000
----
- xen/common/livepatch.c | 6 ++++++
- xen/common/virtual_region.c | 2 ++
- xen/include/xen/virtual_region.h | 3 +++
- 3 files changed, 11 insertions(+)
-
-diff --git a/xen/common/livepatch.c b/xen/common/livepatch.c
-index 29395f286f..28c09ddf58 100644
---- a/xen/common/livepatch.c
-+++ b/xen/common/livepatch.c
-@@ -788,6 +788,12 @@ static int prepare_payload(struct payload *payload,
- region->text_start = payload->text_addr;
- region->text_end = payload->text_addr + payload->text_size;
-
-+ if ( payload->ro_size )
-+ {
-+ region->rodata_start = payload->ro_addr;
-+ region->rodata_end = payload->ro_addr + payload->ro_size;
-+ }
-+
- /* Optional sections. */
- for ( i = 0; i < BUGFRAME_NR; i++ )
- {
-diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
-index b22ffb75c4..9c566f8ec9 100644
---- a/xen/common/virtual_region.c
-+++ b/xen/common/virtual_region.c
-@@ -13,6 +13,8 @@ static struct virtual_region core = {
- .list = LIST_HEAD_INIT(core.list),
- .text_start = _stext,
- .text_end = _etext,
-+ .rodata_start = _srodata,
-+ .rodata_end = _erodata,
- };
-
- /* Becomes irrelevant when __init sections are cleared. */
-diff --git a/xen/include/xen/virtual_region.h b/xen/include/xen/virtual_region.h
-index 442a45bf1f..dcdc95ba49 100644
---- a/xen/include/xen/virtual_region.h
-+++ b/xen/include/xen/virtual_region.h
-@@ -22,6 +22,9 @@ struct virtual_region
- const void *text_start; /* .text virtual address start. */
- const void *text_end; /* .text virtual address end. */
-
-+ const void *rodata_start; /* .rodata virtual address start (optional). */
-+ const void *rodata_end; /* .rodata virtual address end. */
-+
- /* If this is NULL the default lookup mechanism is used. */
- symbols_lookup_t *symbols_lookup;
-
---
-2.44.0
-
diff --git a/0064-x86-livepatch-Relax-permissions-on-rodata-too.patch b/0064-x86-livepatch-Relax-permissions-on-rodata-too.patch
deleted file mode 100644
index bc80769..0000000
--- a/0064-x86-livepatch-Relax-permissions-on-rodata-too.patch
+++ /dev/null
@@ -1,85 +0,0 @@
-From c3ff11b11c21777a9b1c616607705f3a7340b391 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:19:36 +0200
-Subject: [PATCH 64/67] x86/livepatch: Relax permissions on rodata too
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-This reinstates the capability to patch .rodata in load/unload hooks, which
-was lost when we stopped using CR0.WP=0 to patch.
-
-This turns out to be rather less of a large TODO than I thought at the time.
-
-Fixes: 8676092a0f16 ("x86/livepatch: Fix livepatch application when CET is active")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-master commit: b083b1c393dc8961acf0959b1d2e0ad459985ae3
-master date: 2024-03-07 14:24:42 +0000
----
- xen/arch/x86/livepatch.c | 4 ++--
- xen/common/virtual_region.c | 12 ++++++++++++
- 2 files changed, 14 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/livepatch.c b/xen/arch/x86/livepatch.c
-index ee539f001b..4f76127e1f 100644
---- a/xen/arch/x86/livepatch.c
-+++ b/xen/arch/x86/livepatch.c
-@@ -62,7 +62,7 @@ int arch_livepatch_safety_check(void)
- int noinline arch_livepatch_quiesce(void)
- {
- /*
-- * Relax perms on .text to be RWX, so we can modify them.
-+ * Relax perms on .text/.rodata, so we can modify them.
- *
- * This relaxes perms globally, but all other CPUs are waiting on us.
- */
-@@ -75,7 +75,7 @@ int noinline arch_livepatch_quiesce(void)
- void noinline arch_livepatch_revive(void)
- {
- /*
-- * Reinstate perms on .text to be RX. This also cleans out the dirty
-+ * Reinstate perms on .text/.rodata. This also cleans out the dirty
- * bits, which matters when CET Shstk is active.
- *
- * The other CPUs waiting for us could in principle have re-walked while
-diff --git a/xen/common/virtual_region.c b/xen/common/virtual_region.c
-index 9c566f8ec9..aefc08e75f 100644
---- a/xen/common/virtual_region.c
-+++ b/xen/common/virtual_region.c
-@@ -91,9 +91,15 @@ void relax_virtual_region_perms(void)
-
- rcu_read_lock(&rcu_virtual_region_lock);
- list_for_each_entry_rcu( region, &virtual_region_list, list )
-+ {
- modify_xen_mappings_lite((unsigned long)region->text_start,
- PAGE_ALIGN((unsigned long)region->text_end),
- PAGE_HYPERVISOR_RWX);
-+ if ( region->rodata_start )
-+ modify_xen_mappings_lite((unsigned long)region->rodata_start,
-+ PAGE_ALIGN((unsigned long)region->rodata_end),
-+ PAGE_HYPERVISOR_RW);
-+ }
- rcu_read_unlock(&rcu_virtual_region_lock);
- }
-
-@@ -103,9 +109,15 @@ void tighten_virtual_region_perms(void)
-
- rcu_read_lock(&rcu_virtual_region_lock);
- list_for_each_entry_rcu( region, &virtual_region_list, list )
-+ {
- modify_xen_mappings_lite((unsigned long)region->text_start,
- PAGE_ALIGN((unsigned long)region->text_end),
- PAGE_HYPERVISOR_RX);
-+ if ( region->rodata_start )
-+ modify_xen_mappings_lite((unsigned long)region->rodata_start,
-+ PAGE_ALIGN((unsigned long)region->rodata_end),
-+ PAGE_HYPERVISOR_RO);
-+ }
- rcu_read_unlock(&rcu_virtual_region_lock);
- }
- #endif /* CONFIG_X86 */
---
-2.44.0
-
diff --git a/0065-x86-boot-Improve-the-boot-watchdog-determination-of-.patch b/0065-x86-boot-Improve-the-boot-watchdog-determination-of-.patch
deleted file mode 100644
index 4a46326..0000000
--- a/0065-x86-boot-Improve-the-boot-watchdog-determination-of-.patch
+++ /dev/null
@@ -1,106 +0,0 @@
-From 846fb984b506135917c2862d2e4607005d6afdeb Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:20:09 +0200
-Subject: [PATCH 65/67] x86/boot: Improve the boot watchdog determination of
- stuck cpus
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Right now, check_nmi_watchdog() has two processing loops over all online CPUs
-using prev_nmi_count as storage.
-
-Use a cpumask_t instead (1/32th as much initdata) and have wait_for_nmis()
-make the determination of whether it is stuck, rather than having both
-functions needing to agree on how many ticks mean stuck.
-
-More importantly though, it means we can use the standard cpumask
-infrastructure, including turning this:
-
- (XEN) Brought up 512 CPUs
- (XEN) Testing NMI watchdog on all CPUs: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,
266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511} stuck
-
-into the rather more manageable:
-
- (XEN) Brought up 512 CPUs
- (XEN) Testing NMI watchdog on all CPUs: {0-511} stuck
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 9e18f339830c828798aef465556d4029d83476a0
-master date: 2024-03-19 18:29:37 +0000
----
- xen/arch/x86/nmi.c | 33 ++++++++++++++-------------------
- 1 file changed, 14 insertions(+), 19 deletions(-)
-
-diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
-index 7c9591b65e..dd31034ac8 100644
---- a/xen/arch/x86/nmi.c
-+++ b/xen/arch/x86/nmi.c
-@@ -150,6 +150,8 @@ int nmi_active;
-
- static void __init cf_check wait_for_nmis(void *p)
- {
-+ cpumask_t *stuck_cpus = p;
-+ unsigned int cpu = smp_processor_id();
- unsigned int start_count = this_cpu(nmi_count);
- unsigned long ticks = 10 * 1000 * cpu_khz / nmi_hz;
- unsigned long s, e;
-@@ -158,42 +160,35 @@ static void __init cf_check wait_for_nmis(void *p)
- do {
- cpu_relax();
- if ( this_cpu(nmi_count) >= start_count + 2 )
-- break;
-+ return;
-+
- e = rdtsc();
-- } while( e - s < ticks );
-+ } while ( e - s < ticks );
-+
-+ /* Timeout. Mark ourselves as stuck. */
-+ cpumask_set_cpu(cpu, stuck_cpus);
- }
-
- void __init check_nmi_watchdog(void)
- {
-- static unsigned int __initdata prev_nmi_count[NR_CPUS];
-- int cpu;
-- bool ok = true;
-+ static cpumask_t __initdata stuck_cpus;
-
- if ( nmi_watchdog == NMI_NONE )
- return;
-
- printk("Testing NMI watchdog on all CPUs:");
-
-- for_each_online_cpu ( cpu )
-- prev_nmi_count[cpu] = per_cpu(nmi_count, cpu);
--
- /*
- * Wait at most 10 ticks for 2 watchdog NMIs on each CPU.
- * Busy-wait on all CPUs: the LAPIC counter that the NMI watchdog
- * uses only runs while the core's not halted
- */
-- on_selected_cpus(&cpu_online_map, wait_for_nmis, NULL, 1);
--
-- for_each_online_cpu ( cpu )
-- {
-- if ( per_cpu(nmi_count, cpu) - prev_nmi_count[cpu] < 2 )
-- {
-- printk(" %d", cpu);
-- ok = false;
-- }
-- }
-+ on_selected_cpus(&cpu_online_map, wait_for_nmis, &stuck_cpus, 1);
-
-- printk(" %s\n", ok ? "ok" : "stuck");
-+ if ( cpumask_empty(&stuck_cpus) )
-+ printk("ok\n");
-+ else
-+ printk("{%*pbl} stuck\n", CPUMASK_PR(&stuck_cpus));
-
- /*
- * Now that we know it works we can reduce NMI frequency to
---
-2.44.0
-
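The reworked reporting reduces to: each CPU sets its own bit in a shared mask only when it times out, and the caller summarises the mask instead of diffing per-CPU counters. A toy user-space analogue (hypothetical, not the Xen code):

    #include <stdatomic.h>
    #include <stdio.h>

    static _Atomic unsigned long stuck_mask;

    /* Per-CPU callback: mark ourselves stuck only if we never saw 2 NMIs. */
    static void wait_for_nmis(unsigned int cpu, int saw_two_nmis)
    {
        if ( !saw_two_nmis )
            atomic_fetch_or(&stuck_mask, 1UL << cpu);
    }

    int main(void)
    {
        wait_for_nmis(3, 0);   /* pretend CPU 3 timed out */
        wait_for_nmis(5, 1);   /* CPU 5 saw its NMIs */

        if ( stuck_mask == 0 )
            puts("ok");
        else
            printf("{%#lx} stuck\n", (unsigned long)stuck_mask);

        return 0;
    }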
diff --git a/0066-x86-boot-Support-the-watchdog-on-newer-AMD-systems.patch b/0066-x86-boot-Support-the-watchdog-on-newer-AMD-systems.patch
deleted file mode 100644
index e501861..0000000
--- a/0066-x86-boot-Support-the-watchdog-on-newer-AMD-systems.patch
+++ /dev/null
@@ -1,48 +0,0 @@
-From 2777b499f1f6d5cea68f9479f82d055542b822ad Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:20:30 +0200
-Subject: [PATCH 66/67] x86/boot: Support the watchdog on newer AMD systems
-
-The MSRs used by setup_k7_watchdog() are architectural in 64bit. The Unit
-Select (0x76, cycles not in halt state) isn't, but it hasn't changed in 25
-years, making this a trend likely to continue.
-
-Drop the family check. If the Unit Select does happen to change meaning in
-the future, check_nmi_watchdog() will still notice the watchdog not operating
-as expected.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 131892e0dcc1265b621c2b7d844cb9e7c3a4404f
-master date: 2024-03-19 18:29:37 +0000
----
- xen/arch/x86/nmi.c | 11 ++++-------
- 1 file changed, 4 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
-index dd31034ac8..c7c51614a6 100644
---- a/xen/arch/x86/nmi.c
-+++ b/xen/arch/x86/nmi.c
-@@ -386,15 +386,12 @@ void setup_apic_nmi_watchdog(void)
- if ( nmi_watchdog == NMI_NONE )
- return;
-
-- switch (boot_cpu_data.x86_vendor) {
-+ switch ( boot_cpu_data.x86_vendor )
-+ {
- case X86_VENDOR_AMD:
-- switch (boot_cpu_data.x86) {
-- case 6:
-- case 0xf ... 0x19:
-- setup_k7_watchdog();
-- break;
-- }
-+ setup_k7_watchdog();
- break;
-+
- case X86_VENDOR_INTEL:
- switch (boot_cpu_data.x86) {
- case 6:
---
-2.44.0
-
diff --git a/0067-tests-resource-Fix-HVM-guest-in-SHADOW-builds.patch b/0067-tests-resource-Fix-HVM-guest-in-SHADOW-builds.patch
deleted file mode 100644
index 5ce4e17..0000000
--- a/0067-tests-resource-Fix-HVM-guest-in-SHADOW-builds.patch
+++ /dev/null
@@ -1,110 +0,0 @@
-From 9bc40dbcf9eafccc1923b2555286bf6a2af03b7a Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 2 Apr 2024 16:24:07 +0200
-Subject: [PATCH 67/67] tests/resource: Fix HVM guest in !SHADOW builds
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Right now, test-resource always creates HVM Shadow guests. But if Xen has
-SHADOW compiled out, running the test yields:
-
- $./test-resource
- XENMEM_acquire_resource tests
- Test x86 PV
- Created d1
- Test grant table
- Test x86 PVH
- Skip: 95 - Operation not supported
-
-and doesn't really test HVM guests, but doesn't fail either.
-
-There's nothing paging-mode-specific about this test, so default to HAP if
-possible and provide a more specific message if neither HAP or Shadow are
-available.
-
-As we've got physinfo to hand, also provide more specific message about the
-absence of PV or HVM support.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 0263dc9069ddb66335c72a159e09050b1600e56a
-master date: 2024-03-01 20:14:19 +0000
----
- tools/tests/resource/test-resource.c | 39 ++++++++++++++++++++++++++++
- 1 file changed, 39 insertions(+)
-
-diff --git a/tools/tests/resource/test-resource.c b/tools/tests/resource/test-resource.c
-index 0a950072f9..e2c4ba3478 100644
---- a/tools/tests/resource/test-resource.c
-+++ b/tools/tests/resource/test-resource.c
-@@ -20,6 +20,8 @@ static xc_interface *xch;
- static xenforeignmemory_handle *fh;
- static xengnttab_handle *gh;
-
-+static xc_physinfo_t physinfo;
-+
- static void test_gnttab(uint32_t domid, unsigned int nr_frames,
- unsigned long gfn)
- {
-@@ -172,6 +174,37 @@ static void test_domain_configurations(void)
-
- printf("Test %s\n", t->name);
-
-+#if defined(__x86_64__) || defined(__i386__)
-+ if ( t->create.flags & XEN_DOMCTL_CDF_hvm )
-+ {
-+ if ( !(physinfo.capabilities & XEN_SYSCTL_PHYSCAP_hvm) )
-+ {
-+ printf(" Skip: HVM not available\n");
-+ continue;
-+ }
-+
-+ /*
-+ * On x86, use HAP guests if possible, but skip if neither HAP nor
-+ * SHADOW is available.
-+ */
-+ if ( physinfo.capabilities & XEN_SYSCTL_PHYSCAP_hap )
-+ t->create.flags |= XEN_DOMCTL_CDF_hap;
-+ else if ( !(physinfo.capabilities & XEN_SYSCTL_PHYSCAP_shadow) )
-+ {
-+ printf(" Skip: Neither HAP or SHADOW available\n");
-+ continue;
-+ }
-+ }
-+ else
-+ {
-+ if ( !(physinfo.capabilities & XEN_SYSCTL_PHYSCAP_pv) )
-+ {
-+ printf(" Skip: PV not available\n");
-+ continue;
-+ }
-+ }
-+#endif
-+
- rc = xc_domain_create(xch, &domid, &t->create);
- if ( rc )
- {
-@@ -214,6 +247,8 @@ static void test_domain_configurations(void)
-
- int main(int argc, char **argv)
- {
-+ int rc;
-+
- printf("XENMEM_acquire_resource tests\n");
-
- xch = xc_interface_open(NULL, NULL, 0);
-@@ -227,6 +262,10 @@ int main(int argc, char **argv)
- if ( !gh )
- err(1, "xengnttab_open");
-
-+ rc = xc_physinfo(xch, &physinfo);
-+ if ( rc )
-+ err(1, "Failed to obtain physinfo");
-+
- test_domain_configurations();
-
- return !!nr_failures;
---
-2.44.0
-
diff --git a/info.txt b/info.txt
index fa9f510..ccc4d4e 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #1 for 4.17.4-pre
+Xen upstream patchset #0 for 4.18.3-pre
Containing patches from
-RELEASE-4.17.3 (07f413d7ffb06eab36045bd19f53555de1cacf62)
+RELEASE-4.18.2 (844f9931c6c207588a70f897262c628cd542f75a)
to
-staging-4.17 (9bc40dbcf9eafccc1923b2555286bf6a2af03b7a)
+staging-4.18 (d078d0aa86e9e3b937f673dc89306b3afd09d560)
* [gentoo-commits] proj/xen-upstream-patches:main commit in: /
@ 2024-10-02 5:59 Tomáš Mózes
0 siblings, 0 replies; 11+ messages in thread
From: Tomáš Mózes @ 2024-10-02 5:59 UTC (permalink / raw
To: gentoo-commits
commit: befc038ba7247e93c8b224942fcca2c4a9e32717
Author: Tomáš Mózes <tomas.mozes <AT> gmail <DOT> com>
AuthorDate: Wed Oct 2 05:59:17 2024 +0000
Commit: Tomáš Mózes <hydrapolic <AT> gmail <DOT> com>
CommitDate: Wed Oct 2 05:59:17 2024 +0000
URL: https://gitweb.gentoo.org/proj/xen-upstream-patches.git/commit/?id=befc038b
Xen 4.19.1-pre-patchset-0
Signed-off-by: Tomáš Mózes <tomas.mozes <AT> gmail.com>
0001-update-Xen-version-to-4.19.1-pre.patch | 164 ++++++
...x86-entry-Fix-build-with-older-toolchains.patch | 32 -
...-__alt_call_maybe_initdata-so-it-s-safe-f.patch | 49 --
0002-bunzip2-fix-rare-decompression-failure.patch | 39 ++
...Fix-permission-checks-on-XEN_DOMCTL_creat.patch | 150 +++++
...id-UIP-flag-being-set-for-longer-than-exp.patch | 57 --
...R-correct-inadvertently-inverted-WC-check.patch | 36 --
...x-restoring-cr3-and-the-mapcache-override.patch | 38 ++
...6-altcall-further-refine-clang-workaround.patch | 73 +++
...x-reporting-of-BHB-clearing-usage-from-gu.patch | 69 ---
...-x86-spec-adjust-logic-that-elides-lfence.patch | 75 ---
...hed-fix-error-handling-in-cpu_schedule_up.patch | 113 ++++
...-xen-hvm-Don-t-skip-MSR_READ-trace-record.patch | 40 ++
0007-xen-xsm-Wire-up-get_dom0_console.patch | 66 ---
...chn-Use-errno-macro-to-handle-hypercall-e.patch | 75 +++
...en-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch | 41 --
0009-9pfsd-fix-release-build-with-old-gcc.patch | 33 ++
...t-ATS-checking-for-root-complex-integrate.patch | 63 --
...ibxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch | 47 --
...x-misaligned-IO-breakpoint-behaviour-in-P.patch | 41 ++
...U-move-tracking-in-iommu_identity_mapping.patch | 111 ++++
...icy-Fix-migration-from-Ice-Lake-to-Cascad.patch | 92 ---
...rough-documents-as-security-unsupported-w.patch | 42 ++
...code-Distinguish-ucode-already-up-to-date.patch | 58 --
0013-automation-disable-Yocto-jobs.patch | 48 ++
...opulation-of-the-online-vCPU-bitmap-for-P.patch | 61 --
0014-automation-use-expect-to-run-QEMU.patch | 362 ++++++++++++
...andling-XenStore-errors-in-device-creatio.patch | 191 ------
...C-prevent-undue-recursion-of-vlapic_error.patch | 57 ++
...et-all-sched_resource-data-inside-locked-.patch | 84 ---
0016-Arm-correct-FIXADDR_TOP.patch | 58 ++
...-x86-respect-mapcache_domain_init-failing.patch | 38 --
0017-tools-xentop-Fix-cpu-sort-order.patch | 76 ---
0017-xl-fix-incorrect-output-in-help-command.patch | 36 ++
...oid-system-wide-rendezvous-when-setting-A.patch | 60 --
...rect-UD-check-for-AVX512-FP16-complex-mul.patch | 37 ++
0019-update-Xen-version-to-4.18.3-pre.patch | 25 -
...-Introduce-x86_merge_dr6-and-fix-do_debug.patch | 140 +++++
...v-Fix-merging-of-new-status-bits-into-dr6.patch | 222 +++++++
...urther-fixes-to-identify-ucode-already-up.patch | 92 ---
...vent-watchdog-triggering-when-dumping-MSI.patch | 44 --
...ess-Coverity-complaint-in-check_guest_io_.patch | 112 ++++
...ove-offline-CPUs-from-old-CPU-mask-when-a.patch | 44 --
...ays-set-operand-size-for-AVX-VNNI-INT8-in.patch | 36 ++
0023-CI-Update-FreeBSD-to-13.3.patch | 33 --
...-fake-operand-size-for-AVX512CD-broadcast.patch | 35 ++
...not-use-shorthand-IPI-destinations-in-CPU.patch | 98 ----
...correct-cluster-tracking-upon-CPUs-going-.patch | 52 ++
...-disable-SMAP-for-PV-domain-building-only.patch | 145 +++++
...mit-interrupt-movement-done-by-fixup_irqs.patch | 104 ----
...rect-special-page-checking-in-epte_get_en.patch | 46 --
...rrect-partial-HPET_STATUS-write-emulation.patch | 37 ++
...ust-__irq_to_desc-to-fix-build-with-gcc14.patch | 61 ++
...id-marking-non-present-entries-for-re-con.patch | 85 ---
...ul-termination-of-the-return-value-of-lib.patch | 100 ++++
...p-questionable-mfn_valid-from-epte_get_en.patch | 47 --
0029-SUPPORT.md-split-XSM-from-Flask.patch | 66 +++
...86-Intel-unlock-CPUID-earlier-for-the-BSP.patch | 105 ----
0030-x86-fix-UP-build-with-gcc14.patch | 63 ++
...l-with-old_cpu_mask-for-interrupts-in-mov.patch | 84 ---
...dle-moving-interrupts-in-_assign_irq_vect.patch | 172 ------
0031-x86emul-test-fix-build-with-gas-2.43.patch | 86 +++
...-HVM-properly-reject-indirect-VRAM-writes.patch | 45 ++
...san-Fix-UB-in-type_descriptor-declaration.patch | 39 --
...86-xstate-Fix-initialisation-of-XSS-cache.patch | 74 ---
...-handle-ACPI-RSDT-table-in-PVH-Dom0-build.patch | 63 ++
...cile-protocol-specification-with-in-use-i.patch | 183 ++++++
...puid-Fix-handling-of-XSAVE-dynamic-leaves.patch | 72 ---
...ward-pending-interrupts-to-new-destinatio.patch | 143 -----
...ix-buffer-under-run-when-parsing-AMD-cont.patch | 62 ++
...exception-from-stub-recovery-selftests-wi.patch | 84 ---
...-don-t-let-test-xenstore-write-nodes-exce.patch | 41 --
...-let-test-xenstore-exit-with-non-0-status.patch | 57 --
0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch | 58 --
...t-stand-alone-sd_notify-implementation-fr.patch | 130 -----
...o-xenstored-Don-t-link-against-libsystemd.patch | 87 ---
0042-tools-Drop-libsystemd-as-a-dependency.patch | 648 ---------------------
...x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch | 46 --
0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch | 53 --
0045-pirq_cleanup_check-leaks.patch | 84 ---
...ilder-Correct-the-length-calculation-in-x.patch | 44 --
...ols-libxs-Fix-CLOEXEC-handling-in-get_dev.patch | 95 ---
...-libxs-Fix-CLOEXEC-handling-in-get_socket.patch | 60 --
...s-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch | 109 ----
...ument-and-enforce-extra_guest_irqs-upper-.patch | 156 -----
...on-t-clear-DF-when-raising-UD-for-lack-of.patch | 58 --
0052-evtchn-build-fix-for-Arm.patch | 43 --
...RQ-avoid-double-unlock-in-map_domain_pirq.patch | 53 --
...-Return-pirq-that-irq-was-already-mapped-.patch | 38 --
...libxs-Fix-fcntl-invocation-in-set_cloexec.patch | 57 --
...-fix-clang-code-gen-when-using-altcall-in.patch | 85 ---
info.txt | 6 +-
92 files changed, 3028 insertions(+), 4591 deletions(-)
diff --git a/0001-update-Xen-version-to-4.19.1-pre.patch b/0001-update-Xen-version-to-4.19.1-pre.patch
new file mode 100644
index 0000000..8801862
--- /dev/null
+++ b/0001-update-Xen-version-to-4.19.1-pre.patch
@@ -0,0 +1,164 @@
+From f97db9b3bc3deac4eead160106a3f6de2ccce81d Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Thu, 8 Aug 2024 13:43:19 +0200
+Subject: [PATCH 01/35] update Xen version to 4.19.1-pre
+
+---
+ Config.mk | 2 -
+ MAINTAINERS | 106 +++++----------------------------------------------
+ xen/Makefile | 2 +-
+ 3 files changed, 10 insertions(+), 100 deletions(-)
+
+diff --git a/Config.mk b/Config.mk
+index ac8fb847ce..03a89624c7 100644
+--- a/Config.mk
++++ b/Config.mk
+@@ -234,8 +234,6 @@ ETHERBOOT_NICS ?= rtl8139 8086100e
+
+ QEMU_TRADITIONAL_URL ?= https://xenbits.xen.org/git-http/qemu-xen-traditional.git
+ QEMU_TRADITIONAL_REVISION ?= xen-4.19.0
+-# Wed Jul 15 10:01:40 2020 +0100
+-# qemu-trad: remove Xen path dependencies
+
+ # Specify which qemu-dm to use. This may be `ioemu' to use the old
+ # Mercurial in-tree version, or a local directory, or a git URL.
+diff --git a/MAINTAINERS b/MAINTAINERS
+index 2b0c894527..fe81ed63ad 100644
+--- a/MAINTAINERS
++++ b/MAINTAINERS
+@@ -54,6 +54,15 @@ list. Remember to copy the appropriate stable branch maintainer who
+ will be listed in this section of the MAINTAINERS file in the
+ appropriate branch.
+
++The maintainer for this branch is:
++
++ Jan Beulich <jbeulich@suse.com>
++
++Tools backport requests should also be copied to:
++
++ Anthony Perard <anthony.perard@citrix.com>
++
++
+ Unstable Subsystem Maintainers
+ ==============================
+
+@@ -104,103 +113,6 @@ Descriptions of section entries:
+ xen-maintainers-<version format number of this file>
+
+
+- Check-in policy
+- ===============
+-
+-In order for a patch to be checked in, in general, several conditions
+-must be met:
+-
+-1. In order to get a change to a given file committed, it must have
+- the approval of at least one maintainer of that file.
+-
+- A patch of course needs Acks from the maintainers of each file that
+- it changes; so a patch which changes xen/arch/x86/traps.c,
+- xen/arch/x86/mm/p2m.c, and xen/arch/x86/mm/shadow/multi.c would
+- require an Ack from each of the three sets of maintainers.
+-
+- See below for rules on nested maintainership.
+-
+-2. Each change must have appropriate approval from someone other than
+- the person who wrote it. This can be either:
+-
+- a. An Acked-by from a maintainer of the code being touched (a
+- co-maintainer if available, or a more general level maintainer if
+- not available; see the section on nested maintainership)
+-
+- b. A Reviewed-by by anyone of suitable stature in the community
+-
+-3. Sufficient time must have been given for anyone to respond. This
+- depends in large part upon the urgency and nature of the patch.
+- For a straightforward uncontroversial patch, a day or two may be
+- sufficient; for a controversial patch, a week or two may be better.
+-
+-4. There must be no "open" objections.
+-
+-In a case where one person submits a patch and a maintainer gives an
+-Ack, the Ack stands in for both the approval requirement (#1) and the
+-Acked-by-non-submitter requirement (#2).
+-
+-In a case where a maintainer themselves submits a patch, the
+-Signed-off-by meets the approval requirement (#1); so a Review
+-from anyone in the community suffices for requirement #2.
+-
+-Before a maintainer checks in their own patch with another community
+-member's R-b but no co-maintainer Ack, it is especially important to
+-give their co-maintainer opportunity to give feedback, perhaps
+-declaring their intention to check it in without their co-maintainers
+-ack a day before doing so.
+-
+-In the case where two people collaborate on a patch, at least one of
+-whom is a maintainer -- typically where one maintainer will do an
+-early version of the patch, and another maintainer will pick it up and
+-revise it -- there should be two Signed-off-by's and one Acked-by or
+-Reviewed-by; with the maintainer who did the most recent change
+-sending the patch, and an Acked-by or Reviewed-by coming from the
+-maintainer who did not most recently edit the patch. This satisfies
+-the requirement #2 because a) the Signed-off-by of the sender approves
+-the final version of the patch; including all parts of the patch that
+-the sender did not write b) the Reviewed-by approves the final version
+-of the patch, including all patches that the reviewer did not write.
+-Thus all code in the patch has been approved by someone who did not
+-write it.
+-
+-Maintainers may choose to override non-maintainer objections in the
+-case that consensus can't be reached.
+-
+-As always, no policy can cover all possible situations. In
+-exceptional circumstances, committers may commit a patch in absence of
+-one or more of the above requirements, if they are reasonably
+-confident that the other maintainers will approve of their decision in
+-retrospect.
+-
+- The meaning of nesting
+- ======================
+-
+-Many maintainership areas are "nested": for example, there are entries
+-for xen/arch/x86 as well as xen/arch/x86/mm, and even
+-xen/arch/x86/mm/shadow; and there is a section at the end called "THE
+-REST" which lists all committers. The meaning of nesting is that:
+-
+-1. Under normal circumstances, the Ack of the most specific maintainer
+-is both necessary and sufficient to get a change to a given file
+-committed. So a change to xen/arch/x86/mm/shadow/multi.c requires the
+-Ack of the xen/arch/x86/mm/shadow maintainer for that part of the
+-patch, but would not require the Ack of the xen/arch/x86 maintainer or
+-the xen/arch/x86/mm maintainer.
+-
+-2. In unusual circumstances, a more general maintainer's Ack can stand
+-in for or even overrule a specific maintainer's Ack. Unusual
+-circumstances might include:
+- - The patch is fixing a high-priority issue causing immediate pain,
+- and the more specific maintainer is not available.
+- - The more specific maintainer has not responded either to the
+- original patch, nor to "pings", within a reasonable amount of time.
+- - The more general maintainer wants to overrule the more specific
+- maintainer on some issue. (This should be exceptional.)
+- - In the case of a disagreement between maintainers, THE REST can
+- settle the matter by majority vote. (This should be very exceptional
+- indeed.)
+-
+
+ Maintainers List (try to look for most precise areas first)
+
+diff --git a/xen/Makefile b/xen/Makefile
+index 16055101fb..59dac504b3 100644
+--- a/xen/Makefile
++++ b/xen/Makefile
+@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
+ # All other places this is stored (eg. compile.h) should be autogenerated.
+ export XEN_VERSION = 4
+ export XEN_SUBVERSION = 19
+-export XEN_EXTRAVERSION ?= .0$(XEN_VENDORVERSION)
++export XEN_EXTRAVERSION ?= .1-pre$(XEN_VENDORVERSION)
+ export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
+ -include xen-version
+
+--
+2.46.1
+
diff --git a/0001-x86-entry-Fix-build-with-older-toolchains.patch b/0001-x86-entry-Fix-build-with-older-toolchains.patch
deleted file mode 100644
index ad6e76a..0000000
--- a/0001-x86-entry-Fix-build-with-older-toolchains.patch
+++ /dev/null
@@ -1,32 +0,0 @@
-From 2d38302c33b117aa9a417056db241aefc840c2f0 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 9 Apr 2024 21:39:51 +0100
-Subject: [PATCH 01/56] x86/entry: Fix build with older toolchains
-
-Binutils older than 2.29 doesn't know INCSSPD.
-
-Fixes: 8e186f98ce0e ("x86: Use indirect calls in reset-stack infrastructure")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-(cherry picked from commit a9fa82500818a8d8ce5f2843f1577bd2c29d088e)
----
- xen/arch/x86/x86_64/entry.S | 2 ++
- 1 file changed, 2 insertions(+)
-
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index ad7dd3b23b..054fcb225f 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -643,7 +643,9 @@ ENTRY(continue_pv_domain)
- * JMPed to. Drop the return address.
- */
- add $8, %rsp
-+#ifdef CONFIG_XEN_SHSTK
- ALTERNATIVE "", "mov $2, %eax; incsspd %eax", X86_FEATURE_XEN_SHSTK
-+#endif
-
- call check_wakeup_from_wait
- ret_from_intr:
---
-2.45.2
-
diff --git a/0002-altcall-fix-__alt_call_maybe_initdata-so-it-s-safe-f.patch b/0002-altcall-fix-__alt_call_maybe_initdata-so-it-s-safe-f.patch
deleted file mode 100644
index 05ecd83..0000000
--- a/0002-altcall-fix-__alt_call_maybe_initdata-so-it-s-safe-f.patch
+++ /dev/null
@@ -1,49 +0,0 @@
-From 8bdcb0b98b53140102031ceca0611f22190227fd Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 29 Apr 2024 09:35:21 +0200
-Subject: [PATCH 02/56] altcall: fix __alt_call_maybe_initdata so it's safe for
- livepatch
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Setting alternative call variables as __init is not safe for use with
-livepatch, as livepatches can rightfully introduce new alternative calls to
-structures marked as __alt_call_maybe_initdata (possibly just indirectly due to
-replacing existing functions that use those). Attempting to resolve those
-alternative calls then results in page faults as the variable that holds the
-function pointer address has been freed.
-
-When livepatch is supported use the __ro_after_init attribute instead of
-__initdata for __alt_call_maybe_initdata.
-
-Fixes: f26bb285949b ('xen: Implement xen/alternative-call.h for use in common code')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: af4cd0a6a61cdb03bc1afca9478b05b0c9703599
-master date: 2024-04-11 18:51:36 +0100
----
- xen/include/xen/alternative-call.h | 7 ++++++-
- 1 file changed, 6 insertions(+), 1 deletion(-)
-
-diff --git a/xen/include/xen/alternative-call.h b/xen/include/xen/alternative-call.h
-index 5c6b9a562b..10f7d7637e 100644
---- a/xen/include/xen/alternative-call.h
-+++ b/xen/include/xen/alternative-call.h
-@@ -50,7 +50,12 @@
-
- #include <asm/alternative.h>
-
--#define __alt_call_maybe_initdata __initdata
-+#ifdef CONFIG_LIVEPATCH
-+/* Must keep for livepatches to resolve alternative calls. */
-+# define __alt_call_maybe_initdata __ro_after_init
-+#else
-+# define __alt_call_maybe_initdata __initdata
-+#endif
-
- #else
-
---
-2.45.2
-
diff --git a/0002-bunzip2-fix-rare-decompression-failure.patch b/0002-bunzip2-fix-rare-decompression-failure.patch
new file mode 100644
index 0000000..79e8339
--- /dev/null
+++ b/0002-bunzip2-fix-rare-decompression-failure.patch
@@ -0,0 +1,39 @@
+From e54077cbca7149c8fa856535b69a4c70dfd48cd2 Mon Sep 17 00:00:00 2001
+From: Ross Lagerwall <ross.lagerwall@citrix.com>
+Date: Thu, 8 Aug 2024 13:44:26 +0200
+Subject: [PATCH 02/35] bunzip2: fix rare decompression failure
+
+The decompression code parses a huffman tree and counts the number of
+symbols for a given bit length. In rare cases, there may be >= 256
+symbols with a given bit length, causing the unsigned char to overflow.
+This causes a decompression failure later when the code tries and fails to
+find the bit length for a given symbol.
+
+Since the maximum number of symbols is 258, use unsigned short instead.
+
+Fixes: ab77e81f6521 ("x86/dom0: support bzip2 and lzma compressed bzImage payloads")
+Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+Acked-by: Jan Beulich <jbeulich@suse.com>
+master commit: 303d3ff85c90ee4af4bad4e3b1d4932fa2634d64
+master date: 2024-07-30 11:55:56 +0200
+---
+ xen/common/bunzip2.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/common/bunzip2.c b/xen/common/bunzip2.c
+index 4466426941..79f17162b1 100644
+--- a/xen/common/bunzip2.c
++++ b/xen/common/bunzip2.c
+@@ -221,7 +221,8 @@ static int __init get_next_block(struct bunzip_data *bd)
+ RUNB) */
+ symCount = symTotal+2;
+ for (j = 0; j < groupCount; j++) {
+- unsigned char length[MAX_SYMBOLS], temp[MAX_HUFCODE_BITS+1];
++ unsigned char length[MAX_SYMBOLS];
++ unsigned short temp[MAX_HUFCODE_BITS+1];
+ int minLen, maxLen, pp;
+ /* Read Huffman code lengths for each symbol. They're
+ stored in a way similar to mtf; record a starting
+--
+2.46.1
+
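For illustration only (not part of the patchset): a stand-alone sketch of the counter overflow the commit message describes. bzip2 permits up to 258 symbols per group, so an 8-bit counter can wrap while a 16-bit one cannot.

    #include <stdio.h>

    int main(void)
    {
        unsigned char  narrow = 0;   /* 8-bit counter, as before the fix */
        unsigned short wide   = 0;   /* 16-bit counter, as after the fix */
        int i;

        /* Count 258 symbols at one bit length, the bzip2 maximum. */
        for ( i = 0; i < 258; i++ )
        {
            narrow++;
            wide++;
        }

        /* narrow has wrapped to 2; wide still holds 258. */
        printf("unsigned char count:  %u\n", narrow);
        printf("unsigned short count: %u\n", wide);

        return 0;
    }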
diff --git a/0003-XSM-domctl-Fix-permission-checks-on-XEN_DOMCTL_creat.patch b/0003-XSM-domctl-Fix-permission-checks-on-XEN_DOMCTL_creat.patch
new file mode 100644
index 0000000..ccdb369
--- /dev/null
+++ b/0003-XSM-domctl-Fix-permission-checks-on-XEN_DOMCTL_creat.patch
@@ -0,0 +1,150 @@
+From d2ecc1f231b90d4e54394e25a9aef9be42c0d196 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Thu, 8 Aug 2024 13:44:56 +0200
+Subject: [PATCH 03/35] XSM/domctl: Fix permission checks on
+ XEN_DOMCTL_createdomain
+
+The XSM checks for XEN_DOMCTL_createdomain are problematic. There's a split
+between xsm_domctl() called early, and flask_domain_create() called quite late
+during domain construction.
+
+All XSM implementations except Flask have a simple IS_PRIV check in
+xsm_domctl(), and operate as expected when an unprivileged domain tries to
+make a hypercall.
+
+Flask however foregoes any action in xsm_domctl() and defers everything,
+including the simple "is the caller permitted to create a domain" check, to
+flask_domain_create().
+
+As a consequence, when XSM Flask is active, and irrespective of the policy
+loaded, all domains irrespective of privilege can:
+
+ * Mutate the global 'rover' variable, used to track the next free domid.
+ Therefore, all domains can cause a domid wraparound, and combined with a
+ voluntary reboot, choose their own domid.
+
+ * Cause a reasonable amount of a domain to be constructed before ultimately
+ failing for permission reasons, including the use of settings outside of
+ supported limits.
+
+In order to remediate this, pass the ssidref into xsm_domctl() and at least
+check that the calling domain is privileged enough to create domains.
+
+Take the opportunity to also fix the sign of the cmd parameter to be unsigned.
+
+This issue has not been assigned an XSA, because Flask is experimental and not
+security supported.
+
+Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
+master commit: ee32b9b29af449d38aad0a1b3a81aaae586f5ea7
+master date: 2024-07-30 17:42:17 +0100
+---
+ xen/arch/x86/mm/paging.c | 2 +-
+ xen/common/domctl.c | 4 +++-
+ xen/include/xsm/dummy.h | 2 +-
+ xen/include/xsm/xsm.h | 7 ++++---
+ xen/xsm/flask/hooks.c | 14 ++++++++++++--
+ 5 files changed, 21 insertions(+), 8 deletions(-)
+
+diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
+index bca320fffa..dd47bde5ce 100644
+--- a/xen/arch/x86/mm/paging.c
++++ b/xen/arch/x86/mm/paging.c
+@@ -767,7 +767,7 @@ long do_paging_domctl_cont(
+ if ( d == NULL )
+ return -ESRCH;
+
+- ret = xsm_domctl(XSM_OTHER, d, op.cmd);
++ ret = xsm_domctl(XSM_OTHER, d, op.cmd, 0 /* SSIDref not applicable */);
+ if ( !ret )
+ {
+ if ( domctl_lock_acquire() )
+diff --git a/xen/common/domctl.c b/xen/common/domctl.c
+index 2c0331bb05..ea16b75910 100644
+--- a/xen/common/domctl.c
++++ b/xen/common/domctl.c
+@@ -322,7 +322,9 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+ break;
+ }
+
+- ret = xsm_domctl(XSM_OTHER, d, op->cmd);
++ ret = xsm_domctl(XSM_OTHER, d, op->cmd,
++ /* SSIDRef only applicable for cmd == createdomain */
++ op->u.createdomain.ssidref);
+ if ( ret )
+ goto domctl_out_unlock_domonly;
+
+diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
+index 00d2cbebf2..7956f27a29 100644
+--- a/xen/include/xsm/dummy.h
++++ b/xen/include/xsm/dummy.h
+@@ -162,7 +162,7 @@ static XSM_INLINE int cf_check xsm_set_target(
+ }
+
+ static XSM_INLINE int cf_check xsm_domctl(
+- XSM_DEFAULT_ARG struct domain *d, int cmd)
++ XSM_DEFAULT_ARG struct domain *d, unsigned int cmd, uint32_t ssidref)
+ {
+ XSM_ASSERT_ACTION(XSM_OTHER);
+ switch ( cmd )
+diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
+index 8dad03fd3d..627c0d2731 100644
+--- a/xen/include/xsm/xsm.h
++++ b/xen/include/xsm/xsm.h
+@@ -60,7 +60,7 @@ struct xsm_ops {
+ int (*domctl_scheduler_op)(struct domain *d, int op);
+ int (*sysctl_scheduler_op)(int op);
+ int (*set_target)(struct domain *d, struct domain *e);
+- int (*domctl)(struct domain *d, int cmd);
++ int (*domctl)(struct domain *d, unsigned int cmd, uint32_t ssidref);
+ int (*sysctl)(int cmd);
+ int (*readconsole)(uint32_t clear);
+
+@@ -248,9 +248,10 @@ static inline int xsm_set_target(
+ return alternative_call(xsm_ops.set_target, d, e);
+ }
+
+-static inline int xsm_domctl(xsm_default_t def, struct domain *d, int cmd)
++static inline int xsm_domctl(xsm_default_t def, struct domain *d,
++ unsigned int cmd, uint32_t ssidref)
+ {
+- return alternative_call(xsm_ops.domctl, d, cmd);
++ return alternative_call(xsm_ops.domctl, d, cmd, ssidref);
+ }
+
+ static inline int xsm_sysctl(xsm_default_t def, int cmd)
+diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
+index 5e88c71b8e..278ad38c2a 100644
+--- a/xen/xsm/flask/hooks.c
++++ b/xen/xsm/flask/hooks.c
+@@ -663,12 +663,22 @@ static int cf_check flask_set_target(struct domain *d, struct domain *t)
+ return rc;
+ }
+
+-static int cf_check flask_domctl(struct domain *d, int cmd)
++static int cf_check flask_domctl(struct domain *d, unsigned int cmd,
++ uint32_t ssidref)
+ {
+ switch ( cmd )
+ {
+- /* These have individual XSM hooks (common/domctl.c) */
+ case XEN_DOMCTL_createdomain:
++ /*
++ * There is a later hook too, but at this early point simply check
++ * that the calling domain is privileged enough to create a domain.
++ *
++ * Note that d is NULL because we haven't even allocated memory for it
++ * this early in XEN_DOMCTL_createdomain.
++ */
++ return avc_current_has_perm(ssidref, SECCLASS_DOMAIN, DOMAIN__CREATE, NULL);
++
++ /* These have individual XSM hooks (common/domctl.c) */
+ case XEN_DOMCTL_getdomaininfo:
+ case XEN_DOMCTL_scheduler_op:
+ case XEN_DOMCTL_irq_permission:
+--
+2.46.1
+
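For illustration only, and a deliberate simplification of the hypervisor's real domid handling: a sketch of why deferring the privilege check lets every caller advance a global 'rover'-style allocator. All names below are hypothetical stand-ins.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define DOMID_LIMIT 0x7ff0U                /* simplified reserved boundary */

    static uint16_t rover = 1;                 /* next candidate domid */

    static uint16_t alloc_domid(void)
    {
        uint16_t domid = rover;

        if ( ++rover >= DOMID_LIMIT )
            rover = 1;                         /* wrap, skipping domid 0 */

        return domid;
    }

    static bool caller_is_privileged(void)     /* stand-in for the XSM hook */
    {
        return false;
    }

    int main(void)
    {
        /* Pre-fix ordering: allocate first, check afterwards. */
        uint16_t domid = alloc_domid();

        if ( !caller_is_privileged() )
            printf("denied, but rover already advanced past %u\n", domid);

        return 0;
    }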
diff --git a/0003-x86-rtc-Avoid-UIP-flag-being-set-for-longer-than-exp.patch b/0003-x86-rtc-Avoid-UIP-flag-being-set-for-longer-than-exp.patch
deleted file mode 100644
index 8307630..0000000
--- a/0003-x86-rtc-Avoid-UIP-flag-being-set-for-longer-than-exp.patch
+++ /dev/null
@@ -1,57 +0,0 @@
-From af0e9ba44a58c87d6d135d8ffbf468b4ceac0a41 Mon Sep 17 00:00:00 2001
-From: Ross Lagerwall <ross.lagerwall@citrix.com>
-Date: Mon, 29 Apr 2024 09:36:04 +0200
-Subject: [PATCH 03/56] x86/rtc: Avoid UIP flag being set for longer than
- expected
-
-In a test, OVMF reported an error initializing the RTC without
-indicating the precise nature of the error. The only plausible
-explanation I can find is as follows:
-
-As part of the initialization, OVMF reads register C and then reads
-register A repeatedly until the UIP flag is not set. If this takes longer
-than 100 ms, OVMF fails and reports an error. This may happen with the
-following sequence of events:
-
-At guest time=0s, rtc_init() calls check_update_timer() which schedules
-update_timer for t=(1 - 244us).
-
-At t=1s, the update_timer function happens to have been called >= 244us
-late. In the timer callback, it sets the UIP flag and schedules
-update_timer2 for t=1s.
-
-Before update_timer2 runs, the guest reads register C which calls
-check_update_timer(). check_update_timer() stops the scheduled
-update_timer2 and since the guest time is now outside of the update
-cycle, it schedules update_timer for t=(2 - 244us).
-
-The UIP flag will therefore be set for a whole second from t=1 to t=2
-while the guest repeatedly reads register A waiting for the UIP flag to
-clear. Fix it by clearing the UIP flag when scheduling update_timer.
-
-I was able to reproduce this issue with a synthetic test and this
-resolves the issue.
-
-Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 43a07069863b419433dee12c9b58c1f7ce70aa97
-master date: 2024-04-23 14:09:18 +0200
----
- xen/arch/x86/hvm/rtc.c | 1 +
- 1 file changed, 1 insertion(+)
-
-diff --git a/xen/arch/x86/hvm/rtc.c b/xen/arch/x86/hvm/rtc.c
-index 206b4296e9..4839374352 100644
---- a/xen/arch/x86/hvm/rtc.c
-+++ b/xen/arch/x86/hvm/rtc.c
-@@ -202,6 +202,7 @@ static void check_update_timer(RTCState *s)
- }
- else
- {
-+ s->hw.cmos_data[RTC_REG_A] &= ~RTC_UIP;
- next_update_time = (USEC_PER_SEC - guest_usec - 244) * NS_PER_USEC;
- expire_time = NOW() + next_update_time;
- s->next_update_time = expire_time;
---
-2.45.2
-
diff --git a/0004-x86-MTRR-correct-inadvertently-inverted-WC-check.patch b/0004-x86-MTRR-correct-inadvertently-inverted-WC-check.patch
deleted file mode 100644
index ed7754d..0000000
--- a/0004-x86-MTRR-correct-inadvertently-inverted-WC-check.patch
+++ /dev/null
@@ -1,36 +0,0 @@
-From eb7059767c82d833ebecdf8106e96482b04f3c40 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Mon, 29 Apr 2024 09:36:37 +0200
-Subject: [PATCH 04/56] x86/MTRR: correct inadvertently inverted WC check
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The ! clearly got lost by mistake.
-
-Fixes: e9e0eb30d4d6 ("x86/MTRR: avoid several indirect calls")
-Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 77e25f0e30ddd11e043e6fce84bf108ce7de5b6f
-master date: 2024-04-23 14:13:48 +0200
----
- xen/arch/x86/cpu/mtrr/main.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/cpu/mtrr/main.c b/xen/arch/x86/cpu/mtrr/main.c
-index 55a4da54a7..90b235f57e 100644
---- a/xen/arch/x86/cpu/mtrr/main.c
-+++ b/xen/arch/x86/cpu/mtrr/main.c
-@@ -316,7 +316,7 @@ int mtrr_add_page(unsigned long base, unsigned long size,
- }
-
- /* If the type is WC, check that this processor supports it */
-- if ((type == X86_MT_WC) && mtrr_have_wrcomb()) {
-+ if ((type == X86_MT_WC) && !mtrr_have_wrcomb()) {
- printk(KERN_WARNING
- "mtrr: your processor doesn't support write-combining\n");
- return -EOPNOTSUPP;
---
-2.45.2
-
diff --git a/0004-x86-dom0-fix-restoring-cr3-and-the-mapcache-override.patch b/0004-x86-dom0-fix-restoring-cr3-and-the-mapcache-override.patch
new file mode 100644
index 0000000..40dbb9f
--- /dev/null
+++ b/0004-x86-dom0-fix-restoring-cr3-and-the-mapcache-override.patch
@@ -0,0 +1,38 @@
+From adf1939b51a0a2fa596f7acca0989bfe56cab307 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Thu, 8 Aug 2024 13:45:28 +0200
+Subject: [PATCH 04/35] x86/dom0: fix restoring %cr3 and the mapcache override
+ on PV build error
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+One of the error paths in the PV dom0 builder section that runs on the guest
+page-tables wasn't restoring the Xen value of %cr3, neither removing the
+mapcache override.
+
+Fixes: 079ff2d32c3d ('libelf-loader: introduce elf_load_image')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 1fc3f77113dd43b14fa7ef5936dcdba120c0b63f
+master date: 2024-07-31 12:41:02 +0200
+---
+ xen/arch/x86/pv/dom0_build.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
+index d8043fa58a..57e58a02e7 100644
+--- a/xen/arch/x86/pv/dom0_build.c
++++ b/xen/arch/x86/pv/dom0_build.c
+@@ -825,6 +825,8 @@ int __init dom0_construct_pv(struct domain *d,
+ rc = elf_load_binary(&elf);
+ if ( rc < 0 )
+ {
++ mapcache_override_current(NULL);
++ switch_cr3_cr4(current->arch.cr3, read_cr4());
+ printk("Failed to load the kernel binary\n");
+ goto out;
+ }
+--
+2.46.1
+
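A minimal sketch (hypothetical names, not the dom0 builder itself) of the rule the fix enforces: state overridden before running on the guest's context must be restored on every exit path, including the error one.

    #include <stdio.h>

    static int current_cr3;                    /* 0 = Xen's own page tables */

    static void switch_cr3(int cr3) { current_cr3 = cr3; }

    static int load_image(int fail) { return fail ? -1 : 0; }

    static int build_guest(int fail)
    {
        int rc;

        switch_cr3(1);                         /* run on the guest's page tables */

        rc = load_image(fail);

        switch_cr3(0);                         /* restore on success and failure */
        if ( rc < 0 )
            printf("Failed to load the kernel binary\n");

        return rc;
    }

    int main(void)
    {
        build_guest(1);
        printf("back on cr3 %d\n", current_cr3);   /* 0 once restored */

        return 0;
    }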
diff --git a/0005-x86-altcall-further-refine-clang-workaround.patch b/0005-x86-altcall-further-refine-clang-workaround.patch
new file mode 100644
index 0000000..7107099
--- /dev/null
+++ b/0005-x86-altcall-further-refine-clang-workaround.patch
@@ -0,0 +1,73 @@
+From ee032f29972b8c58db9fcf96650f9cbc083edca8 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Thu, 8 Aug 2024 13:45:58 +0200
+Subject: [PATCH 05/35] x86/altcall: further refine clang workaround
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The current code in ALT_CALL_ARG() won't successfully workaround the clang
+code-generation issue if the arg parameter has a size that's not a power of 2.
+While there are no such sized parameters at the moment, improve the workaround
+to also be effective when such sizes are used.
+
+Instead of using a union with a long use an unsigned long that's first
+initialized to 0 and afterwards set to the argument value.
+
+Reported-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Suggested-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 561cba38ff551383a628dc93e64ab0691cfc92bf
+master date: 2024-07-31 12:41:22 +0200
+---
+ xen/arch/x86/include/asm/alternative.h | 26 ++++++++++++--------------
+ 1 file changed, 12 insertions(+), 14 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
+index e63b459276..a86eadfaec 100644
+--- a/xen/arch/x86/include/asm/alternative.h
++++ b/xen/arch/x86/include/asm/alternative.h
+@@ -169,27 +169,25 @@ extern void alternative_branches(void);
+
+ #ifdef CONFIG_CC_IS_CLANG
+ /*
+- * Use a union with an unsigned long in order to prevent clang from
+- * skipping a possible truncation of the value. By using the union any
+- * truncation is carried before the call instruction, in turn covering
+- * for ABI-non-compliance in that the necessary clipping / extension of
+- * the value is supposed to be carried out in the callee.
++ * Clang doesn't follow the psABI and doesn't truncate parameter values at the
++ * callee. This can lead to bad code being generated when using alternative
++ * calls.
+ *
+- * Note this behavior is not mandated by the standard, and hence could
+- * stop being a viable workaround, or worse, could cause a different set
+- * of code-generation issues in future clang versions.
++ * Workaround it by using a temporary intermediate variable that's zeroed
++ * before being assigned the parameter value, as that forces clang to zero the
++ * register at the caller.
+ *
+ * This has been reported upstream:
+ * https://github.com/llvm/llvm-project/issues/12579
+ * https://github.com/llvm/llvm-project/issues/82598
+ */
+ #define ALT_CALL_ARG(arg, n) \
+- register union { \
+- typeof(arg) e[sizeof(long) / sizeof(arg)]; \
+- unsigned long r; \
+- } a ## n ## _ asm ( ALT_CALL_arg ## n ) = { \
+- .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
+- }
++ register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = ({ \
++ unsigned long tmp = 0; \
++ BUILD_BUG_ON(sizeof(arg) > sizeof(unsigned long)); \
++ *(typeof(arg) *)&tmp = (arg); \
++ tmp; \
++ })
+ #else
+ #define ALT_CALL_ARG(arg, n) \
+ register typeof(arg) a ## n ## _ asm ( ALT_CALL_arg ## n ) = \
+--
+2.46.1
+
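For illustration only: the zero-then-assign idea in isolation, outside the alternative-call machinery. This assumes GNU C (statement expressions, __typeof__) and a little-endian target, the same assumptions the hypervisor macro makes; the real macro additionally carries a build-time size check.

    #include <stdint.h>
    #include <stdio.h>

    /* Widen a narrow argument through a zeroed unsigned long so the upper
     * register bits are well defined at the call site. */
    #define WIDEN_ARG(arg) ({                         \
        unsigned long tmp_ = 0;                       \
        *(__typeof__(arg) *)&tmp_ = (arg);            \
        tmp_;                                         \
    })

    int main(void)
    {
        uint8_t small = 0x7f;
        unsigned long wide = WIDEN_ARG(small);

        printf("%#lx\n", wide);                       /* 0x7f, upper bits zero */

        return 0;
    }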
diff --git a/0005-x86-spec-fix-reporting-of-BHB-clearing-usage-from-gu.patch b/0005-x86-spec-fix-reporting-of-BHB-clearing-usage-from-gu.patch
deleted file mode 100644
index bad0428..0000000
--- a/0005-x86-spec-fix-reporting-of-BHB-clearing-usage-from-gu.patch
+++ /dev/null
@@ -1,69 +0,0 @@
-From 0b0c7dca70d64c35c86e5d503f67366ebe2b9138 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 29 Apr 2024 09:37:04 +0200
-Subject: [PATCH 05/56] x86/spec: fix reporting of BHB clearing usage from
- guest entry points
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Reporting whether the BHB clearing on entry is done for the different domains
-types based on cpu_has_bhb_seq is unhelpful, as that variable signals whether
-there's a BHB clearing sequence selected, but that alone doesn't imply that
-such sequence is used from the PV and/or HVM entry points.
-
-Instead use opt_bhb_entry_{pv,hvm} which do signal whether BHB clearing is
-performed on entry from PV/HVM.
-
-Fixes: 689ad48ce9cf ('x86/spec-ctrl: Wire up the Native-BHI software sequences')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 049ab0b2c9f1f5edb54b505fef0bc575787dafe9
-master date: 2024-04-25 16:35:56 +0200
----
- xen/arch/x86/spec_ctrl.c | 8 ++++----
- 1 file changed, 4 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index ba4349a024..8c67d6256a 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -634,7 +634,7 @@ static void __init print_details(enum ind_thunk thunk)
- (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
- boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
-- cpu_has_bhb_seq || amd_virt_spec_ctrl ||
-+ opt_bhb_entry_hvm || amd_virt_spec_ctrl ||
- opt_eager_fpu || opt_verw_hvm) ? "" : " None",
- boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ? " MSR_SPEC_CTRL" : "",
- (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
-@@ -643,7 +643,7 @@ static void __init print_details(enum ind_thunk thunk)
- opt_eager_fpu ? " EAGER_FPU" : "",
- opt_verw_hvm ? " VERW" : "",
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ? " IBPB-entry" : "",
-- cpu_has_bhb_seq ? " BHB-entry" : "");
-+ opt_bhb_entry_hvm ? " BHB-entry" : "");
-
- #endif
- #ifdef CONFIG_PV
-@@ -651,14 +651,14 @@ static void __init print_details(enum ind_thunk thunk)
- (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
- boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
-- cpu_has_bhb_seq ||
-+ opt_bhb_entry_pv ||
- opt_eager_fpu || opt_verw_pv) ? "" : " None",
- boot_cpu_has(X86_FEATURE_SC_MSR_PV) ? " MSR_SPEC_CTRL" : "",
- boot_cpu_has(X86_FEATURE_SC_RSB_PV) ? " RSB" : "",
- opt_eager_fpu ? " EAGER_FPU" : "",
- opt_verw_pv ? " VERW" : "",
- boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ? " IBPB-entry" : "",
-- cpu_has_bhb_seq ? " BHB-entry" : "");
-+ opt_bhb_entry_pv ? " BHB-entry" : "");
-
- printk(" XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
- opt_xpti_hwdom ? "enabled" : "disabled",
---
-2.45.2
-
diff --git a/0006-x86-spec-adjust-logic-that-elides-lfence.patch b/0006-x86-spec-adjust-logic-that-elides-lfence.patch
deleted file mode 100644
index 6da96c4..0000000
--- a/0006-x86-spec-adjust-logic-that-elides-lfence.patch
+++ /dev/null
@@ -1,75 +0,0 @@
-From f0ff1d9cb96041a84a24857a6464628240deed4f Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Mon, 29 Apr 2024 09:37:29 +0200
-Subject: [PATCH 06/56] x86/spec: adjust logic that elides lfence
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-It's currently too restrictive by just checking whether there's a BHB clearing
-sequence selected. It should instead check whether BHB clearing is used on
-entry from PV or HVM specifically.
-
-Switch to use opt_bhb_entry_{pv,hvm} instead, and then remove cpu_has_bhb_seq
-since it no longer has any users.
-
-Reported-by: Jan Beulich <jbeulich@suse.com>
-Fixes: 954c983abcee ('x86/spec-ctrl: Software BHB-clearing sequences')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 656ae8f1091bcefec9c46ec3ea3ac2118742d4f6
-master date: 2024-04-25 16:37:01 +0200
----
- xen/arch/x86/include/asm/cpufeature.h | 3 ---
- xen/arch/x86/spec_ctrl.c | 6 +++---
- 2 files changed, 3 insertions(+), 6 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/cpufeature.h b/xen/arch/x86/include/asm/cpufeature.h
-index 7a312c485e..3c57f55de0 100644
---- a/xen/arch/x86/include/asm/cpufeature.h
-+++ b/xen/arch/x86/include/asm/cpufeature.h
-@@ -228,9 +228,6 @@ static inline bool boot_cpu_has(unsigned int feat)
- #define cpu_bug_fpu_ptrs boot_cpu_has(X86_BUG_FPU_PTRS)
- #define cpu_bug_null_seg boot_cpu_has(X86_BUG_NULL_SEG)
-
--#define cpu_has_bhb_seq (boot_cpu_has(X86_SPEC_BHB_TSX) || \
-- boot_cpu_has(X86_SPEC_BHB_LOOPS))
--
- enum _cache_type {
- CACHE_TYPE_NULL = 0,
- CACHE_TYPE_DATA = 1,
-diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
-index 8c67d6256a..12c19b7eca 100644
---- a/xen/arch/x86/spec_ctrl.c
-+++ b/xen/arch/x86/spec_ctrl.c
-@@ -2328,7 +2328,7 @@ void __init init_speculation_mitigations(void)
- * unconditional WRMSR. If we do have it, or we're not using any
- * prior conditional block, then it's safe to drop the LFENCE.
- */
-- if ( !cpu_has_bhb_seq &&
-+ if ( !opt_bhb_entry_pv &&
- (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
- !boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)) )
- setup_force_cpu_cap(X86_SPEC_NO_LFENCE_ENTRY_PV);
-@@ -2344,7 +2344,7 @@ void __init init_speculation_mitigations(void)
- * active in the block that is skipped when interrupting guest
- * context, then it's safe to drop the LFENCE.
- */
-- if ( !cpu_has_bhb_seq &&
-+ if ( !opt_bhb_entry_pv &&
- (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
- (!boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) &&
- !boot_cpu_has(X86_FEATURE_SC_RSB_PV))) )
-@@ -2356,7 +2356,7 @@ void __init init_speculation_mitigations(void)
- * A BHB sequence, if used, is the only conditional action, so if we
- * don't have it, we don't need the safety LFENCE.
- */
-- if ( !cpu_has_bhb_seq )
-+ if ( !opt_bhb_entry_hvm )
- setup_force_cpu_cap(X86_SPEC_NO_LFENCE_ENTRY_VMX);
- }
-
---
-2.45.2
-
diff --git a/0006-xen-sched-fix-error-handling-in-cpu_schedule_up.patch b/0006-xen-sched-fix-error-handling-in-cpu_schedule_up.patch
new file mode 100644
index 0000000..86189a6
--- /dev/null
+++ b/0006-xen-sched-fix-error-handling-in-cpu_schedule_up.patch
@@ -0,0 +1,113 @@
+From b37580d5e984770266783b639552a97c36ecb58a Mon Sep 17 00:00:00 2001
+From: Juergen Gross <jgross@suse.com>
+Date: Thu, 8 Aug 2024 13:46:21 +0200
+Subject: [PATCH 06/35] xen/sched: fix error handling in cpu_schedule_up()
+
+In case cpu_schedule_up() is failing, it needs to undo all externally
+visible changes it has done before.
+
+Reason is that cpu_schedule_callback() won't be called with the
+CPU_UP_CANCELED notifier in case cpu_schedule_up() did fail.
+
+Fixes: 207589dbacd4 ("xen/sched: move per cpu scheduler private data into struct sched_resource")
+Reported-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 44a7d4f0a5e9eae41a44a162e54ff6d2ebe5b7d6
+master date: 2024-07-31 14:50:18 +0200
+---
+ xen/common/sched/core.c | 63 +++++++++++++++++++++--------------------
+ 1 file changed, 33 insertions(+), 30 deletions(-)
+
+diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
+index d84b65f197..c466711e9e 100644
+--- a/xen/common/sched/core.c
++++ b/xen/common/sched/core.c
+@@ -2755,6 +2755,36 @@ static struct sched_resource *sched_alloc_res(void)
+ return sr;
+ }
+
++static void cf_check sched_res_free(struct rcu_head *head)
++{
++ struct sched_resource *sr = container_of(head, struct sched_resource, rcu);
++
++ free_cpumask_var(sr->cpus);
++ if ( sr->sched_unit_idle )
++ sched_free_unit_mem(sr->sched_unit_idle);
++ xfree(sr);
++}
++
++static void cpu_schedule_down(unsigned int cpu)
++{
++ struct sched_resource *sr;
++
++ rcu_read_lock(&sched_res_rculock);
++
++ sr = get_sched_res(cpu);
++
++ kill_timer(&sr->s_timer);
++
++ cpumask_clear_cpu(cpu, &sched_res_mask);
++ set_sched_res(cpu, NULL);
++
++ /* Keep idle unit. */
++ sr->sched_unit_idle = NULL;
++ call_rcu(&sr->rcu, sched_res_free);
++
++ rcu_read_unlock(&sched_res_rculock);
++}
++
+ static int cpu_schedule_up(unsigned int cpu)
+ {
+ struct sched_resource *sr;
+@@ -2794,7 +2824,10 @@ static int cpu_schedule_up(unsigned int cpu)
+ idle_vcpu[cpu]->sched_unit->res = sr;
+
+ if ( idle_vcpu[cpu] == NULL )
++ {
++ cpu_schedule_down(cpu);
+ return -ENOMEM;
++ }
+
+ idle_vcpu[cpu]->sched_unit->rendezvous_in_cnt = 0;
+
+@@ -2812,36 +2845,6 @@ static int cpu_schedule_up(unsigned int cpu)
+ return 0;
+ }
+
+-static void cf_check sched_res_free(struct rcu_head *head)
+-{
+- struct sched_resource *sr = container_of(head, struct sched_resource, rcu);
+-
+- free_cpumask_var(sr->cpus);
+- if ( sr->sched_unit_idle )
+- sched_free_unit_mem(sr->sched_unit_idle);
+- xfree(sr);
+-}
+-
+-static void cpu_schedule_down(unsigned int cpu)
+-{
+- struct sched_resource *sr;
+-
+- rcu_read_lock(&sched_res_rculock);
+-
+- sr = get_sched_res(cpu);
+-
+- kill_timer(&sr->s_timer);
+-
+- cpumask_clear_cpu(cpu, &sched_res_mask);
+- set_sched_res(cpu, NULL);
+-
+- /* Keep idle unit. */
+- sr->sched_unit_idle = NULL;
+- call_rcu(&sr->rcu, sched_res_free);
+-
+- rcu_read_unlock(&sched_res_rculock);
+-}
+-
+ void sched_rm_cpu(unsigned int cpu)
+ {
+ int rc;
+--
+2.46.1
+
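A minimal sketch (hypothetical names) of the rule being applied: when no later cancel callback will run for a failed setup step, the setup function itself must undo the externally visible changes it already made.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void *per_cpu_state[4];             /* externally visible state */

    static void teardown_cpu(unsigned int cpu)
    {
        free(per_cpu_state[cpu]);
        per_cpu_state[cpu] = NULL;
    }

    static int setup_cpu(unsigned int cpu, bool later_step_fails)
    {
        per_cpu_state[cpu] = malloc(64);
        if ( !per_cpu_state[cpu] )
            return -1;

        if ( later_step_fails )
        {
            teardown_cpu(cpu);                 /* undo before reporting failure */
            return -1;
        }

        return 0;
    }

    int main(void)
    {
        int rc = setup_cpu(1, true);

        printf("rc=%d state=%p\n", rc, per_cpu_state[1]);   /* rc=-1 state=(nil) */

        return 0;
    }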
diff --git a/0007-xen-hvm-Don-t-skip-MSR_READ-trace-record.patch b/0007-xen-hvm-Don-t-skip-MSR_READ-trace-record.patch
new file mode 100644
index 0000000..0649616
--- /dev/null
+++ b/0007-xen-hvm-Don-t-skip-MSR_READ-trace-record.patch
@@ -0,0 +1,40 @@
+From 97a15007c9606d4c53109754bb21fd593bca589b Mon Sep 17 00:00:00 2001
+From: George Dunlap <george.dunlap@cloud.com>
+Date: Thu, 8 Aug 2024 13:47:02 +0200
+Subject: [PATCH 07/35] xen/hvm: Don't skip MSR_READ trace record
+
+Commit 37f074a3383 ("x86/msr: introduce guest_rdmsr()") introduced a
+function to combine the MSR_READ handling between PV and HVM.
+Unfortunately, by returning directly, it skipped the trace generation,
+leading to gaps in the trace record, as well as xenalyze errors like
+this:
+
+hvm_generic_postprocess: d2v0 Strange, exit 7c(VMEXIT_MSR) missing a handler
+
+Replace the `return` with `goto out`.
+
+Fixes: 37f074a3383 ("x86/msr: introduce guest_rdmsr()")
+Signed-off-by: George Dunlap <george.dunlap@cloud.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: bc8a43fff61ae4162a95d84f4e148d6773667cd2
+master date: 2024-08-02 08:42:09 +0200
+---
+ xen/arch/x86/hvm/hvm.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
+index 7f4b627b1f..0fe2b85b16 100644
+--- a/xen/arch/x86/hvm/hvm.c
++++ b/xen/arch/x86/hvm/hvm.c
+@@ -3557,7 +3557,7 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
+ fixed_range_base = (uint64_t *)v->arch.hvm.mtrr.fixed_ranges;
+
+ if ( (ret = guest_rdmsr(v, msr, msr_content)) != X86EMUL_UNHANDLEABLE )
+- return ret;
++ goto out;
+
+ ret = X86EMUL_OKAY;
+
+--
+2.46.1
+
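For illustration only (hypothetical names): why a bare return in the middle of such a handler skips the trailing trace record, while "goto out" keeps a single exit path that always emits it.

    #include <stdio.h>

    #define UNHANDLED 1
    #define OK        0

    static unsigned long traced;

    static void trace_msr_read(unsigned int msr, unsigned long val)
    {
        traced++;
        (void)msr;
        (void)val;
    }

    static int common_rdmsr(unsigned int msr, unsigned long *val)
    {
        if ( msr == 0x10 )                     /* only MSR 0x10 has a handler */
        {
            *val = 42;
            return OK;
        }

        return UNHANDLED;
    }

    static int msr_read(unsigned int msr, unsigned long *val)
    {
        int ret;

        if ( (ret = common_rdmsr(msr, val)) != UNHANDLED )
            goto out;                          /* "return ret" would skip tracing */

        *val = 0;                              /* legacy per-MSR handling here */
        ret = OK;

     out:
        trace_msr_read(msr, *val);
        return ret;
    }

    int main(void)
    {
        unsigned long val;

        msr_read(0x10, &val);
        msr_read(0x20, &val);
        printf("trace records: %lu\n", traced);   /* 2 with goto, 1 with return */

        return 0;
    }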
diff --git a/0007-xen-xsm-Wire-up-get_dom0_console.patch b/0007-xen-xsm-Wire-up-get_dom0_console.patch
deleted file mode 100644
index 540541c..0000000
--- a/0007-xen-xsm-Wire-up-get_dom0_console.patch
+++ /dev/null
@@ -1,66 +0,0 @@
-From 026542c8577ab6af7c1dbc7446547bdc2bc705fd Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jason.andryuk@amd.com>
-Date: Tue, 21 May 2024 10:19:43 +0200
-Subject: [PATCH 07/56] xen/xsm: Wire up get_dom0_console
-
-An XSM hook for get_dom0_console is currently missing. Using XSM with
-a PVH dom0 shows:
-(XEN) FLASK: Denying unknown platform_op: 64.
-
-Wire up the hook, and allow it for dom0.
-
-Fixes: 4dd160583c ("x86/platform: introduce hypercall to get initial video console settings")
-Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
-Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
-master commit: 647f7e50ebeeb8152974cad6a12affe474c74513
-master date: 2024-04-30 08:33:41 +0200
----
- tools/flask/policy/modules/dom0.te | 2 +-
- xen/xsm/flask/hooks.c | 4 ++++
- xen/xsm/flask/policy/access_vectors | 2 ++
- 3 files changed, 7 insertions(+), 1 deletion(-)
-
-diff --git a/tools/flask/policy/modules/dom0.te b/tools/flask/policy/modules/dom0.te
-index f1dcff48e2..16b8c9646d 100644
---- a/tools/flask/policy/modules/dom0.te
-+++ b/tools/flask/policy/modules/dom0.te
-@@ -16,7 +16,7 @@ allow dom0_t xen_t:xen {
- allow dom0_t xen_t:xen2 {
- resource_op psr_cmt_op psr_alloc pmu_ctrl get_symbol
- get_cpu_levelling_caps get_cpu_featureset livepatch_op
-- coverage_op
-+ coverage_op get_dom0_console
- };
-
- # Allow dom0 to use all XENVER_ subops that have checks.
-diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
-index 78225f68c1..5e88c71b8e 100644
---- a/xen/xsm/flask/hooks.c
-+++ b/xen/xsm/flask/hooks.c
-@@ -1558,6 +1558,10 @@ static int cf_check flask_platform_op(uint32_t op)
- return avc_has_perm(domain_sid(current->domain), SECINITSID_XEN,
- SECCLASS_XEN2, XEN2__GET_SYMBOL, NULL);
-
-+ case XENPF_get_dom0_console:
-+ return avc_has_perm(domain_sid(current->domain), SECINITSID_XEN,
-+ SECCLASS_XEN2, XEN2__GET_DOM0_CONSOLE, NULL);
-+
- default:
- return avc_unknown_permission("platform_op", op);
- }
-diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
-index 4e6710a63e..a35e3d4c51 100644
---- a/xen/xsm/flask/policy/access_vectors
-+++ b/xen/xsm/flask/policy/access_vectors
-@@ -99,6 +99,8 @@ class xen2
- livepatch_op
- # XEN_SYSCTL_coverage_op
- coverage_op
-+# XENPF_get_dom0_console
-+ get_dom0_console
- }
-
- # Classes domain and domain2 consist of operations that a domain performs on
---
-2.45.2
-
diff --git a/0008-tools-lsevtchn-Use-errno-macro-to-handle-hypercall-e.patch b/0008-tools-lsevtchn-Use-errno-macro-to-handle-hypercall-e.patch
new file mode 100644
index 0000000..76bb65e
--- /dev/null
+++ b/0008-tools-lsevtchn-Use-errno-macro-to-handle-hypercall-e.patch
@@ -0,0 +1,75 @@
+From e0e84771b61ed985809d105d8f116d4c520542b0 Mon Sep 17 00:00:00 2001
+From: Matthew Barnes <matthew.barnes@cloud.com>
+Date: Thu, 8 Aug 2024 13:47:30 +0200
+Subject: [PATCH 08/35] tools/lsevtchn: Use errno macro to handle hypercall
+ error cases
+
+Currently, lsevtchn aborts its event channel enumeration when it hits
+an event channel that is owned by Xen.
+
+lsevtchn does not distinguish between different hypercall errors, which
+results in lsevtchn missing potentially relevant event channels with
+higher port numbers.
+
+Use the errno macro to distinguish between hypercall errors, and
+continue event channel enumeration if the hypercall error is not
+critical to enumeration.
+
+Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com>
+Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: e92a453c8db8bba62d6be3006079e2b9990c3978
+master date: 2024-08-02 08:43:57 +0200
+---
+ tools/xcutils/lsevtchn.c | 22 ++++++++++++++++++++--
+ 1 file changed, 20 insertions(+), 2 deletions(-)
+
+diff --git a/tools/xcutils/lsevtchn.c b/tools/xcutils/lsevtchn.c
+index d1710613dd..30c8d847b8 100644
+--- a/tools/xcutils/lsevtchn.c
++++ b/tools/xcutils/lsevtchn.c
+@@ -3,6 +3,7 @@
+ #include <stdint.h>
+ #include <string.h>
+ #include <stdio.h>
++#include <errno.h>
+
+ #include <xenctrl.h>
+
+@@ -24,7 +25,23 @@ int main(int argc, char **argv)
+ status.port = port;
+ rc = xc_evtchn_status(xch, &status);
+ if ( rc < 0 )
+- break;
++ {
++ switch ( errno )
++ {
++ case EACCES: /* Xen-owned evtchn */
++ continue;
++
++ case EINVAL: /* Port enumeration has ended */
++ rc = 0;
++ break;
++
++ default:
++ perror("xc_evtchn_status");
++ rc = 1;
++ break;
++ }
++ goto out;
++ }
+
+ if ( status.status == EVTCHNSTAT_closed )
+ continue;
+@@ -58,7 +75,8 @@ int main(int argc, char **argv)
+ printf("\n");
+ }
+
++ out:
+ xc_interface_close(xch);
+
+- return 0;
++ return rc;
+ }
+--
+2.46.1
+
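For illustration only: the enumeration shape the patch adopts, with a hypothetical query_port() standing in for xc_evtchn_status(). EACCES on a single port is skipped, EINVAL ends the walk, and anything else is treated as fatal.

    #include <errno.h>
    #include <stdio.h>

    /* Hypothetical stand-in: returns -1 with errno set on error. */
    static int query_port(unsigned int port)
    {
        if ( port == 3 ) { errno = EACCES; return -1; }   /* Xen-owned port */
        if ( port >= 8 ) { errno = EINVAL; return -1; }   /* past the last port */
        return 0;
    }

    int main(void)
    {
        unsigned int port;
        int rc = 0;

        for ( port = 0; ; port++ )
        {
            if ( query_port(port) < 0 )
            {
                switch ( errno )
                {
                case EACCES:               /* inaccessible port: keep going */
                    continue;

                case EINVAL:               /* enumeration has ended */
                    rc = 0;
                    break;

                default:
                    perror("query_port");
                    rc = 1;
                    break;
                }
                break;                     /* leave the enumeration loop */
            }

            printf("port %u usable\n", port);
        }

        return rc;
    }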
diff --git a/0008-xen-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch b/0008-xen-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch
deleted file mode 100644
index 7c04f23..0000000
--- a/0008-xen-x86-Fix-Syntax-warning-in-gen-cpuid.py.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From 47cf06c09a2fa1ee92ea3e7718c8f8e0f1450d88 Mon Sep 17 00:00:00 2001
-From: Jason Andryuk <jason.andryuk@amd.com>
-Date: Tue, 21 May 2024 10:20:06 +0200
-Subject: [PATCH 08/56] xen/x86: Fix Syntax warning in gen-cpuid.py
-
-Python 3.12.2 warns:
-
-xen/tools/gen-cpuid.py:50: SyntaxWarning: invalid escape sequence '\s'
- "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
-xen/tools/gen-cpuid.py:51: SyntaxWarning: invalid escape sequence '\s'
- "\s+/\*([\w!]*) .*$")
-
-Specify the strings as raw strings so '\s' is read as literal '\' + 's'.
-This avoids escaping all the '\'s in the strings.
-
-Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 08e79bba73d74a85d3ce6ff0f91c5205f1e05eda
-master date: 2024-04-30 08:34:37 +0200
----
- xen/tools/gen-cpuid.py | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/xen/tools/gen-cpuid.py b/xen/tools/gen-cpuid.py
-index 02dd45a5ed..415d644db5 100755
---- a/xen/tools/gen-cpuid.py
-+++ b/xen/tools/gen-cpuid.py
-@@ -47,8 +47,8 @@ def parse_definitions(state):
- """
- feat_regex = re.compile(
- r"^XEN_CPUFEATURE\(([A-Z0-9_]+),"
-- "\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
-- "\s+/\*([\w!]*) .*$")
-+ r"\s+([\s\d]+\*[\s\d]+\+[\s\d]+)\)"
-+ r"\s+/\*([\w!]*) .*$")
-
- word_regex = re.compile(
- r"^/\* .* word (\d*) \*/$")
---
-2.45.2
-
diff --git a/0009-9pfsd-fix-release-build-with-old-gcc.patch b/0009-9pfsd-fix-release-build-with-old-gcc.patch
new file mode 100644
index 0000000..6d6f2ef
--- /dev/null
+++ b/0009-9pfsd-fix-release-build-with-old-gcc.patch
@@ -0,0 +1,33 @@
+From 8ad5a8c5c36add2eee70a7253da4098ebffdb79b Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Thu, 8 Aug 2024 13:47:44 +0200
+Subject: [PATCH 09/35] 9pfsd: fix release build with old gcc
+
+Being able to recognize that "par" is reliably initialized on the 1st
+loop iteration requires not overly old compilers.
+
+Fixes: 7809132b1a1d ("tools/xen-9pfsd: add 9pfs response generation support")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: 984cb316cb27b53704c607e640a7dd2763b898ab
+master date: 2024-08-02 08:44:22 +0200
+---
+ tools/9pfsd/io.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/9pfsd/io.c b/tools/9pfsd/io.c
+index df1be3df7d..468e0241f5 100644
+--- a/tools/9pfsd/io.c
++++ b/tools/9pfsd/io.c
+@@ -196,7 +196,7 @@ static void fill_buffer_at(void **data, const char *fmt, ...);
+ static void vfill_buffer_at(void **data, const char *fmt, va_list ap)
+ {
+ const char *f;
+- const void *par;
++ const void *par = NULL; /* old gcc */
+ const char *str_val;
+ const struct p9_qid *qid;
+ const struct p9_stat *stat;
+--
+2.46.1
+
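A minimal sketch (hypothetical format language) of the pattern older gcc mis-analyses: the pointer is always assigned before use at run time, but proving it requires tracking the format string, so an explicit initialiser silences the bogus warning.

    #include <stdio.h>

    /* Every 'v' in the format is preceded by an 'a', so 'par' is always set
     * before it is read; old gcc cannot see that and warns without the
     * explicit initialiser. */
    static void emit(const char *fmt, const int *values)
    {
        const int *par = NULL;          /* explicit init for old gcc */
        const char *f;

        for ( f = fmt; *f; f++ )
        {
            switch ( *f )
            {
            case 'a':                   /* select the next argument */
                par = values++;
                break;

            case 'v':                   /* print the selected argument */
                printf("%d\n", *par);
                break;
            }
        }
    }

    int main(void)
    {
        static const int vals[] = { 1, 2 };

        emit("avav", vals);

        return 0;
    }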
diff --git a/0009-VT-d-correct-ATS-checking-for-root-complex-integrate.patch b/0009-VT-d-correct-ATS-checking-for-root-complex-integrate.patch
deleted file mode 100644
index 2d2dc91..0000000
--- a/0009-VT-d-correct-ATS-checking-for-root-complex-integrate.patch
+++ /dev/null
@@ -1,63 +0,0 @@
-From a4c5bbb9db07b27e66f7c47676b1c888e1bece20 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 May 2024 10:20:58 +0200
-Subject: [PATCH 09/56] VT-d: correct ATS checking for root complex integrated
- devices
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Spec version 4.1 says
-
-"The ATSR structures identifies PCI Express Root-Ports supporting
- Address Translation Services (ATS) transactions. Software must enable
- ATS on endpoint devices behind a Root Port only if the Root Port is
- reported as supporting ATS transactions."
-
-Clearly root complex integrated devices aren't "behind root ports",
-matching my observation on a SapphireRapids system having an ATS-
-capable root complex integrated device. Hence for such devices we
-shouldn't try to locate a corresponding ATSR.
-
-Since both pci_find_ext_capability() and pci_find_cap_offset() return
-"unsigned int", change "pos" to that type at the same time.
-
-Fixes: 903b93211f56 ("[VTD] laying the ground work for ATS")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 04e31583bab97e5042a44a1d00fce2760272635f
-master date: 2024-05-06 09:22:45 +0200
----
- xen/drivers/passthrough/vtd/x86/ats.c | 9 +++++++--
- 1 file changed, 7 insertions(+), 2 deletions(-)
-
-diff --git a/xen/drivers/passthrough/vtd/x86/ats.c b/xen/drivers/passthrough/vtd/x86/ats.c
-index 1f5913bed9..61052ef580 100644
---- a/xen/drivers/passthrough/vtd/x86/ats.c
-+++ b/xen/drivers/passthrough/vtd/x86/ats.c
-@@ -44,7 +44,7 @@ struct acpi_drhd_unit *find_ats_dev_drhd(struct vtd_iommu *iommu)
- int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
- {
- struct acpi_drhd_unit *ats_drhd;
-- int pos;
-+ unsigned int pos, expfl = 0;
-
- if ( !ats_enabled || !iommu_qinval )
- return 0;
-@@ -53,7 +53,12 @@ int ats_device(const struct pci_dev *pdev, const struct acpi_drhd_unit *drhd)
- !ecap_dev_iotlb(drhd->iommu->ecap) )
- return 0;
-
-- if ( !acpi_find_matched_atsr_unit(pdev) )
-+ pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_EXP);
-+ if ( pos )
-+ expfl = pci_conf_read16(pdev->sbdf, pos + PCI_EXP_FLAGS);
-+
-+ if ( MASK_EXTR(expfl, PCI_EXP_FLAGS_TYPE) != PCI_EXP_TYPE_RC_END &&
-+ !acpi_find_matched_atsr_unit(pdev) )
- return 0;
-
- ats_drhd = find_ats_dev_drhd(drhd->iommu);
---
-2.45.2
-
diff --git a/0010-tools-libxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch b/0010-tools-libxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch
deleted file mode 100644
index 9f9cdd7..0000000
--- a/0010-tools-libxs-Open-dev-xen-xenbus-fds-as-O_CLOEXEC.patch
+++ /dev/null
@@ -1,47 +0,0 @@
-From 2bc52041cacb33a301ebf939d69a021597941186 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 21 May 2024 10:21:47 +0200
-Subject: [PATCH 10/56] tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC
-
-The header description for xs_open() goes as far as to suggest that the fd is
-O_CLOEXEC, but it isn't actually.
-
-`xl devd` has been observed leaking /dev/xen/xenbus into children.
-
-Link: https://github.com/QubesOS/qubes-issues/issues/8292
-Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: f4f2f3402b2f4985d69ffc0d46f845d05fd0b60f
-master date: 2024-05-07 15:18:36 +0100
----
- tools/libs/store/xs.c | 6 +++++-
- 1 file changed, 5 insertions(+), 1 deletion(-)
-
-diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
-index 140b9a2839..1498515073 100644
---- a/tools/libs/store/xs.c
-+++ b/tools/libs/store/xs.c
-@@ -54,6 +54,10 @@ struct xs_stored_msg {
- #include <dlfcn.h>
- #endif
-
-+#ifndef O_CLOEXEC
-+#define O_CLOEXEC 0
-+#endif
-+
- struct xs_handle {
- /* Communications channel to xenstore daemon. */
- int fd;
-@@ -227,7 +231,7 @@ error:
- static int get_dev(const char *connect_to)
- {
- /* We cannot open read-only because requests are writes */
-- return open(connect_to, O_RDWR);
-+ return open(connect_to, O_RDWR | O_CLOEXEC);
- }
-
- static int all_restrict_cb(Xentoolcore__Active_Handle *ah, domid_t domid) {
---
-2.45.2
-
diff --git a/0010-x86-emul-Fix-misaligned-IO-breakpoint-behaviour-in-P.patch b/0010-x86-emul-Fix-misaligned-IO-breakpoint-behaviour-in-P.patch
new file mode 100644
index 0000000..07b592a
--- /dev/null
+++ b/0010-x86-emul-Fix-misaligned-IO-breakpoint-behaviour-in-P.patch
@@ -0,0 +1,41 @@
+From 033060ee6e05f9e86ef1a51674864b55dc15e62c Mon Sep 17 00:00:00 2001
+From: Matthew Barnes <matthew.barnes@cloud.com>
+Date: Thu, 8 Aug 2024 13:48:03 +0200
+Subject: [PATCH 10/35] x86/emul: Fix misaligned IO breakpoint behaviour in PV
+ guests
+
+When hardware breakpoints are configured on misaligned IO ports, the
+hardware will mask the addresses based on the breakpoint width during
+comparison.
+
+For PV guests, misaligned IO breakpoints do not behave the same way, and
+therefore yield different results.
+
+This patch tweaks the emulation of IO breakpoints for PV guests such
+that they reproduce the same behaviour as hardware.
+
+Fixes: bec9e3205018 ("x86: emulate I/O port access breakpoints")
+Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 08aacc392d86d4c7dbebdb5e664060ae2af72057
+master date: 2024-08-08 13:27:50 +0200
+---
+ xen/arch/x86/pv/emul-priv-op.c | 2 ++
+ 1 file changed, 2 insertions(+)
+
+diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
+index f101510a1b..aa11ecadaa 100644
+--- a/xen/arch/x86/pv/emul-priv-op.c
++++ b/xen/arch/x86/pv/emul-priv-op.c
+@@ -346,6 +346,8 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
+ case DR_LEN_8: width = 8; break;
+ }
+
++ start &= ~(width - 1UL);
++
+ if ( (start < (port + len)) && ((start + width) > port) )
+ match |= 1u << i;
+ }
+--
+2.46.1
+
diff --git a/0011-x86-IOMMU-move-tracking-in-iommu_identity_mapping.patch b/0011-x86-IOMMU-move-tracking-in-iommu_identity_mapping.patch
new file mode 100644
index 0000000..930ea55
--- /dev/null
+++ b/0011-x86-IOMMU-move-tracking-in-iommu_identity_mapping.patch
@@ -0,0 +1,111 @@
+From c61d4264d26d1ffb26563bfb6dc2f0b06cd72128 Mon Sep 17 00:00:00 2001
+From: Teddy Astie <teddy.astie@vates.tech>
+Date: Tue, 13 Aug 2024 16:47:19 +0200
+Subject: [PATCH 11/35] x86/IOMMU: move tracking in iommu_identity_mapping()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+If for some reason xmalloc() fails after having mapped the reserved
+regions, an error is reported, but the regions remain mapped in the P2M.
+
+Similarly if an error occurs during set_identity_p2m_entry() (except on
+the first call), the partial mappings of the region would be retained
+without being tracked anywhere, and hence without there being a way to
+remove them again from the domain's P2M.
+
+Move the setting up of the list entry ahead of trying to map the region.
+In cases other than the first mapping failing, keep record of the full
+region, such that a subsequent unmapping request can be properly torn
+down.
+
+To compensate for the potentially excess unmapping requests, don't log a
+warning from p2m_remove_identity_entry() when there really was nothing
+mapped at a given GFN.
+
+This is XSA-460 / CVE-2024-31145.
+
+Fixes: 2201b67b9128 ("VT-d: improve RMRR region handling")
+Fixes: c0e19d7c6c42 ("IOMMU: generalize VT-d's tracking of mapped RMRR regions")
+Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: beadd68b5490ada053d72f8a9ce6fd696d626596
+master date: 2024-08-13 16:36:40 +0200
+---
+ xen/arch/x86/mm/p2m.c | 8 +++++---
+ xen/drivers/passthrough/x86/iommu.c | 30 ++++++++++++++++++++---------
+ 2 files changed, 26 insertions(+), 12 deletions(-)
+
+diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
+index e7e327d6a6..1739133fc2 100644
+--- a/xen/arch/x86/mm/p2m.c
++++ b/xen/arch/x86/mm/p2m.c
+@@ -1267,9 +1267,11 @@ int p2m_remove_identity_entry(struct domain *d, unsigned long gfn_l)
+ else
+ {
+ gfn_unlock(p2m, gfn, 0);
+- printk(XENLOG_G_WARNING
+- "non-identity map d%d:%lx not cleared (mapped to %lx)\n",
+- d->domain_id, gfn_l, mfn_x(mfn));
++ if ( (p2mt != p2m_invalid && p2mt != p2m_mmio_dm) ||
++ a != p2m_access_n || !mfn_eq(mfn, INVALID_MFN) )
++ printk(XENLOG_G_WARNING
++ "non-identity map %pd:%lx not cleared (mapped to %lx)\n",
++ d, gfn_l, mfn_x(mfn));
+ ret = 0;
+ }
+
+diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
+index cc0062b027..8b1e0596b8 100644
+--- a/xen/drivers/passthrough/x86/iommu.c
++++ b/xen/drivers/passthrough/x86/iommu.c
+@@ -267,24 +267,36 @@ int iommu_identity_mapping(struct domain *d, p2m_access_t p2ma,
+ if ( p2ma == p2m_access_x )
+ return -ENOENT;
+
+- while ( base_pfn < end_pfn )
+- {
+- int err = set_identity_p2m_entry(d, base_pfn, p2ma, flag);
+-
+- if ( err )
+- return err;
+- base_pfn++;
+- }
+-
+ map = xmalloc(struct identity_map);
+ if ( !map )
+ return -ENOMEM;
++
+ map->base = base;
+ map->end = end;
+ map->access = p2ma;
+ map->count = 1;
++
++ /*
++ * Insert into list ahead of mapping, so the range can be found when
++ * trying to clean up.
++ */
+ list_add_tail(&map->list, &hd->arch.identity_maps);
+
++ for ( ; base_pfn < end_pfn; ++base_pfn )
++ {
++ int err = set_identity_p2m_entry(d, base_pfn, p2ma, flag);
++
++ if ( !err )
++ continue;
++
++ if ( (map->base >> PAGE_SHIFT_4K) == base_pfn )
++ {
++ list_del(&map->list);
++ xfree(map);
++ }
++ return err;
++ }
++
+ return 0;
+ }
+
+--
+2.46.1
+
diff --git a/0011-x86-cpu-policy-Fix-migration-from-Ice-Lake-to-Cascad.patch b/0011-x86-cpu-policy-Fix-migration-from-Ice-Lake-to-Cascad.patch
deleted file mode 100644
index 26eb3ec..0000000
--- a/0011-x86-cpu-policy-Fix-migration-from-Ice-Lake-to-Cascad.patch
+++ /dev/null
@@ -1,92 +0,0 @@
-From 0673eae8e53de5007dba35149527579819428323 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 21 May 2024 10:22:08 +0200
-Subject: [PATCH 11/56] x86/cpu-policy: Fix migration from Ice Lake to Cascade
- Lake
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Ever since Xen 4.14, there has been a latent bug with migration.
-
-While some toolstacks can level the features properly, they don't shink
-feat.max_subleaf when all features have been dropped. This is because
-we *still* have not completed the toolstack side work for full CPU Policy
-objects.
-
-As a consequence, even when properly feature levelled, VMs can't migrate
-"backwards" across hardware which reduces feat.max_subleaf. One such example
-is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).
-
-Extend the max policies feat.max_subleaf to the hightest number Xen knows
-about, but leave the default policies matching the host. This will allow VMs
-with a higher feat.max_subleaf than strictly necessary to migrate in.
-
-Eventually we'll manage to teach the toolstack how to avoid creating such VMs
-in the first place, but there's still more work to do there.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: a2330b51df267e20e66bbba6c5bf08f0570ed58b
-master date: 2024-05-07 16:56:46 +0100
----
- xen/arch/x86/cpu-policy.c | 22 ++++++++++++++++++++++
- 1 file changed, 22 insertions(+)
-
-diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
-index a822800f52..1aba6ed4ca 100644
---- a/xen/arch/x86/cpu-policy.c
-+++ b/xen/arch/x86/cpu-policy.c
-@@ -603,6 +603,13 @@ static void __init calculate_pv_max_policy(void)
- unsigned int i;
-
- *p = host_cpu_policy;
-+
-+ /*
-+ * Some VMs may have a larger-than-necessary feat max_subleaf. Allow them
-+ * to migrate in.
-+ */
-+ p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
-+
- x86_cpu_policy_to_featureset(p, fs);
-
- for ( i = 0; i < ARRAY_SIZE(fs); ++i )
-@@ -643,6 +650,10 @@ static void __init calculate_pv_def_policy(void)
- unsigned int i;
-
- *p = pv_max_cpu_policy;
-+
-+ /* Default to the same max_subleaf as the host. */
-+ p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf;
-+
- x86_cpu_policy_to_featureset(p, fs);
-
- for ( i = 0; i < ARRAY_SIZE(fs); ++i )
-@@ -679,6 +690,13 @@ static void __init calculate_hvm_max_policy(void)
- const uint32_t *mask;
-
- *p = host_cpu_policy;
-+
-+ /*
-+ * Some VMs may have a larger-than-necessary feat max_subleaf. Allow them
-+ * to migrate in.
-+ */
-+ p->feat.max_subleaf = ARRAY_SIZE(p->feat.raw) - 1;
-+
- x86_cpu_policy_to_featureset(p, fs);
-
- mask = hvm_hap_supported() ?
-@@ -780,6 +798,10 @@ static void __init calculate_hvm_def_policy(void)
- const uint32_t *mask;
-
- *p = hvm_max_cpu_policy;
-+
-+ /* Default to the same max_subleaf as the host. */
-+ p->feat.max_subleaf = host_cpu_policy.feat.max_subleaf;
-+
- x86_cpu_policy_to_featureset(p, fs);
-
- mask = hvm_hap_supported() ?
---
-2.45.2
-
diff --git a/0012-x86-pass-through-documents-as-security-unsupported-w.patch b/0012-x86-pass-through-documents-as-security-unsupported-w.patch
new file mode 100644
index 0000000..a83553c
--- /dev/null
+++ b/0012-x86-pass-through-documents-as-security-unsupported-w.patch
@@ -0,0 +1,42 @@
+From 3e8a2217f211d49dd771f7918d72df057121109f Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 13 Aug 2024 16:48:13 +0200
+Subject: [PATCH 12/35] x86/pass-through: documents as security-unsupported
+ when sharing resources
+
+When multiple devices share resources and one of them is to be passed
+through to a guest, security of the entire system and of respective
+guests individually cannot really be guaranteed without knowing
+internals of any of the involved guests. Therefore such a configuration
+cannot really be security-supported, yet making that explicit was so far
+missing.
+
+This is XSA-461 / CVE-2024-31146.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+master commit: 9c94eda1e3790820699a6de3f6a7c959ecf30600
+master date: 2024-08-13 16:37:25 +0200
+---
+ SUPPORT.md | 5 +++++
+ 1 file changed, 5 insertions(+)
+
+diff --git a/SUPPORT.md b/SUPPORT.md
+index 8b998d9bc7..1d8b38cbd0 100644
+--- a/SUPPORT.md
++++ b/SUPPORT.md
+@@ -841,6 +841,11 @@ This feature is not security supported: see https://xenbits.xen.org/xsa/advisory
+
+ Only systems using IOMMUs are supported.
+
++Passing through of devices sharing resources with another device is not
++security supported. Such sharing could e.g. be the same line interrupt being
++used by multiple devices, one of which is to be passed through, or two such
++devices having memory BARs within the same 4k page.
++
+ Not compatible with migration, populate-on-demand, altp2m,
+ introspection, memory sharing, or memory paging.
+
+--
+2.46.1
+
diff --git a/0012-x86-ucode-Distinguish-ucode-already-up-to-date.patch b/0012-x86-ucode-Distinguish-ucode-already-up-to-date.patch
deleted file mode 100644
index dd2f91a..0000000
--- a/0012-x86-ucode-Distinguish-ucode-already-up-to-date.patch
+++ /dev/null
@@ -1,58 +0,0 @@
-From a42c83b202cc034c43c723cf363dbbabac61b1af Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Tue, 21 May 2024 10:22:52 +0200
-Subject: [PATCH 12/56] x86/ucode: Distinguish "ucode already up to date"
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Right now, Xen returns -ENOENT for both "the provided blob isn't correct for
-this CPU", and "the blob isn't newer than what's loaded".
-
-This in turn causes xen-ucode to exit with an error, when "nothing to do" is
-more commonly a success condition.
-
-Handle EEXIST specially and exit cleanly.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 648db37a155aca6f66d4cf3bb118417a728c3579
-master date: 2024-05-09 18:19:49 +0100
----
- tools/misc/xen-ucode.c | 5 ++++-
- xen/arch/x86/cpu/microcode/core.c | 2 +-
- 2 files changed, 5 insertions(+), 2 deletions(-)
-
-diff --git a/tools/misc/xen-ucode.c b/tools/misc/xen-ucode.c
-index c6ae6498d6..390969db3d 100644
---- a/tools/misc/xen-ucode.c
-+++ b/tools/misc/xen-ucode.c
-@@ -125,8 +125,11 @@ int main(int argc, char *argv[])
- exit(1);
- }
-
-+ errno = 0;
- ret = xc_microcode_update(xch, buf, len);
-- if ( ret )
-+ if ( ret == -1 && errno == EEXIST )
-+ printf("Microcode already up to date\n");
-+ else if ( ret )
- {
- fprintf(stderr, "Failed to update microcode. (err: %s)\n",
- strerror(errno));
-diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index 4e011cdc41..d5338ad345 100644
---- a/xen/arch/x86/cpu/microcode/core.c
-+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -640,7 +640,7 @@ static long cf_check microcode_update_helper(void *data)
- "microcode: couldn't find any newer%s revision in the provided blob!\n",
- opt_ucode_allow_same ? " (or the same)" : "");
- microcode_free_patch(patch);
-- ret = -ENOENT;
-+ ret = -EEXIST;
-
- goto put;
- }
---
-2.45.2
-
diff --git a/0013-automation-disable-Yocto-jobs.patch b/0013-automation-disable-Yocto-jobs.patch
new file mode 100644
index 0000000..72fda13
--- /dev/null
+++ b/0013-automation-disable-Yocto-jobs.patch
@@ -0,0 +1,48 @@
+From 51ae51301f2b4bccd365353f78510c1bdac522c9 Mon Sep 17 00:00:00 2001
+From: Stefano Stabellini <stefano.stabellini@amd.com>
+Date: Fri, 9 Aug 2024 23:59:18 -0700
+Subject: [PATCH 13/35] automation: disable Yocto jobs
+
+The Yocto jobs take a long time to run. We are changing Gitlab ARM64
+runners and the new runners might not be able to finish the Yocto jobs
+in a reasonable time.
+
+For now, disable the Yocto jobs by turning them into "manual" trigger
+(they need to be manually executed.)
+
+Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
+Reviewed-by: Michal Orzel <michal.orzel@amd.com>
+master commit: 1c24bca387136d73f88f46ce3db82d34411702e8
+master date: 2024-08-09 23:59:18 -0700
+---
+ automation/gitlab-ci/build.yaml | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
+index 7ce88d38e7..09895d1fbd 100644
+--- a/automation/gitlab-ci/build.yaml
++++ b/automation/gitlab-ci/build.yaml
+@@ -470,17 +470,20 @@ yocto-qemuarm64:
+ extends: .yocto-test-arm64
+ variables:
+ YOCTO_BOARD: qemuarm64
++ when: manual
+
+ yocto-qemuarm:
+ extends: .yocto-test-arm64
+ variables:
+ YOCTO_BOARD: qemuarm
+ YOCTO_OUTPUT: --copy-output
++ when: manual
+
+ yocto-qemux86-64:
+ extends: .yocto-test-x86-64
+ variables:
+ YOCTO_BOARD: qemux86-64
++ when: manual
+
+ # Cppcheck analysis jobs
+
+--
+2.46.1
+
diff --git a/0013-libxl-fix-population-of-the-online-vCPU-bitmap-for-P.patch b/0013-libxl-fix-population-of-the-online-vCPU-bitmap-for-P.patch
deleted file mode 100644
index e5fb285..0000000
--- a/0013-libxl-fix-population-of-the-online-vCPU-bitmap-for-P.patch
+++ /dev/null
@@ -1,61 +0,0 @@
-From 9966e5413133157a630f7462518005fb898e582a Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 21 May 2024 10:23:27 +0200
-Subject: [PATCH 13/56] libxl: fix population of the online vCPU bitmap for PVH
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-libxl passes some information to libacpi to create the ACPI table for a PVH
-guest, and among that information it's a bitmap of which vCPUs are online
-which can be less than the maximum number of vCPUs assigned to the domain.
-
-While the population of the bitmap is done correctly for HVM based on the
-number of online vCPUs, for PVH the population of the bitmap is done based on
-the number of maximum vCPUs allowed. This leads to all local APIC entries in
-the MADT being set as enabled, which contradicts the data in xenstore if vCPUs
-is different than maximum vCPUs.
-
-Fix by copying the internal libxl bitmap that's populated based on the vCPUs
-parameter.
-
-Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
-Link: https://gitlab.com/libvirt/libvirt/-/issues/399
-Reported-by: Leigh Brown <leigh@solinno.co.uk>
-Fixes: 14c0d328da2b ('libxl/acpi: Build ACPI tables for HVMlite guests')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Tested-by: Leigh Brown <leigh@solinno.co.uk>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 5cc7347b04b2d0a3133754c7a9b936f614ec656a
-master date: 2024-05-11 00:13:43 +0100
----
- tools/libs/light/libxl_x86_acpi.c | 6 +++---
- 1 file changed, 3 insertions(+), 3 deletions(-)
-
-diff --git a/tools/libs/light/libxl_x86_acpi.c b/tools/libs/light/libxl_x86_acpi.c
-index 620f3c700c..5cf261bd67 100644
---- a/tools/libs/light/libxl_x86_acpi.c
-+++ b/tools/libs/light/libxl_x86_acpi.c
-@@ -89,7 +89,7 @@ static int init_acpi_config(libxl__gc *gc,
- uint32_t domid = dom->guest_domid;
- xc_domaininfo_t info;
- struct hvm_info_table *hvminfo;
-- int i, r, rc;
-+ int r, rc;
-
- config->dsdt_anycpu = config->dsdt_15cpu = dsdt_pvh;
- config->dsdt_anycpu_len = config->dsdt_15cpu_len = dsdt_pvh_len;
-@@ -138,8 +138,8 @@ static int init_acpi_config(libxl__gc *gc,
- hvminfo->nr_vcpus = info.max_vcpu_id + 1;
- }
-
-- for (i = 0; i < hvminfo->nr_vcpus; i++)
-- hvminfo->vcpu_online[i / 8] |= 1 << (i & 7);
-+ memcpy(hvminfo->vcpu_online, b_info->avail_vcpus.map,
-+ b_info->avail_vcpus.size);
-
- config->hvminfo = hvminfo;
-
---
-2.45.2
-
diff --git a/0014-automation-use-expect-to-run-QEMU.patch b/0014-automation-use-expect-to-run-QEMU.patch
new file mode 100644
index 0000000..90e2c62
--- /dev/null
+++ b/0014-automation-use-expect-to-run-QEMU.patch
@@ -0,0 +1,362 @@
+From 0918434e0fbee48c9dccc5fe262de5a81e380c15 Mon Sep 17 00:00:00 2001
+From: Stefano Stabellini <stefano.stabellini@amd.com>
+Date: Fri, 9 Aug 2024 23:59:20 -0700
+Subject: [PATCH 14/35] automation: use expect to run QEMU
+
+Use expect to invoke QEMU so that we can terminate the test as soon as
+we get the right string in the output instead of waiting until the
+final timeout.
+
+For timeout, instead of hardcoding the value, use a Gitlab CI
+variable "QEMU_TIMEOUT" that can be changed depending on the latest
+status of the Gitlab CI runners.
+
+Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
+Reviewed-by: Michal Orzel <michal.orzel@amd.com>
+master commit: c36efb7fcea6ef9f31a20e60ec79ed3ae293feee
+master date: 2024-08-09 23:59:20 -0700
+---
+ automation/scripts/qemu-alpine-x86_64.sh | 16 +++----
+ automation/scripts/qemu-key.exp | 45 +++++++++++++++++++
+ automation/scripts/qemu-smoke-dom0-arm32.sh | 16 +++----
+ automation/scripts/qemu-smoke-dom0-arm64.sh | 16 +++----
+ .../scripts/qemu-smoke-dom0less-arm32.sh | 18 ++++----
+ .../scripts/qemu-smoke-dom0less-arm64.sh | 16 +++----
+ automation/scripts/qemu-smoke-ppc64le.sh | 13 +++---
+ automation/scripts/qemu-smoke-riscv64.sh | 13 +++---
+ automation/scripts/qemu-smoke-x86-64.sh | 15 ++++---
+ automation/scripts/qemu-xtf-dom0less-arm64.sh | 15 +++----
+ 10 files changed, 112 insertions(+), 71 deletions(-)
+ create mode 100755 automation/scripts/qemu-key.exp
+
+diff --git a/automation/scripts/qemu-alpine-x86_64.sh b/automation/scripts/qemu-alpine-x86_64.sh
+index 8e398dcea3..5359e0820b 100755
+--- a/automation/scripts/qemu-alpine-x86_64.sh
++++ b/automation/scripts/qemu-alpine-x86_64.sh
+@@ -77,18 +77,16 @@ EOF
+ # Run the test
+ rm -f smoke.serial
+ set +e
+-timeout -k 1 720 \
+-qemu-system-x86_64 \
++export QEMU_CMD="qemu-system-x86_64 \
+ -cpu qemu64,+svm \
+ -m 2G -smp 2 \
+ -monitor none -serial stdio \
+ -nographic \
+ -device virtio-net-pci,netdev=n0 \
+- -netdev user,id=n0,tftp=binaries,bootfile=/pxelinux.0 |& \
+- # Remove carriage returns from the stdout output, as gitlab
+- # interface chokes on them
+- tee smoke.serial | sed 's/\r//'
++ -netdev user,id=n0,tftp=binaries,bootfile=/pxelinux.0"
+
+-set -e
+-(grep -q "Domain-0" smoke.serial && grep -q "BusyBox" smoke.serial) || exit 1
+-exit 0
++export QEMU_LOG="smoke.serial"
++export LOG_MSG="Domain-0"
++export PASSED="BusyBox"
++
++./automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-key.exp b/automation/scripts/qemu-key.exp
+new file mode 100755
+index 0000000000..35eb903a31
+--- /dev/null
++++ b/automation/scripts/qemu-key.exp
+@@ -0,0 +1,45 @@
++#!/usr/bin/expect -f
++
++set timeout $env(QEMU_TIMEOUT)
++
++log_file -a $env(QEMU_LOG)
++
++match_max 10000
++
++eval spawn $env(QEMU_CMD)
++
++expect_after {
++ -re "(.*)\r" {
++ exp_continue
++ }
++ timeout {send_error "ERROR-Timeout!\n"; exit 1}
++ eof {send_error "ERROR-EOF!\n"; exit 1}
++}
++
++if {[info exists env(UBOOT_CMD)]} {
++ expect "=>"
++
++ send "$env(UBOOT_CMD)\r"
++}
++
++if {[info exists env(LOG_MSG)]} {
++ expect {
++ "$env(PASSED)" {
++ expect "$env(LOG_MSG)"
++ exit 0
++ }
++ "$env(LOG_MSG)" {
++ expect "$env(PASSED)"
++ exit 0
++ }
++ }
++}
++
++expect {
++ "$env(PASSED)" {
++ exit 0
++ }
++}
++
++expect eof
++
+diff --git a/automation/scripts/qemu-smoke-dom0-arm32.sh b/automation/scripts/qemu-smoke-dom0-arm32.sh
+index 31c05cc840..bab66bfe44 100755
+--- a/automation/scripts/qemu-smoke-dom0-arm32.sh
++++ b/automation/scripts/qemu-smoke-dom0-arm32.sh
+@@ -78,9 +78,7 @@ bash imagebuilder/scripts/uboot-script-gen -t tftp -d . -c config
+
+ rm -f ${serial_log}
+ set +e
+-echo " virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"| \
+-timeout -k 1 720 \
+-./qemu-system-arm \
++export QEMU_CMD="./qemu-system-arm \
+ -machine virt \
+ -machine virtualization=true \
+ -smp 4 \
+@@ -91,9 +89,11 @@ timeout -k 1 720 \
+ -no-reboot \
+ -device virtio-net-pci,netdev=n0 \
+ -netdev user,id=n0,tftp=./ \
+- -bios /usr/lib/u-boot/qemu_arm/u-boot.bin |& \
+- tee ${serial_log} | sed 's/\r//'
++ -bios /usr/lib/u-boot/qemu_arm/u-boot.bin"
++
++export UBOOT_CMD="virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"
++export QEMU_LOG="${serial_log}"
++export LOG_MSG="Domain-0"
++export PASSED="/ #"
+
+-set -e
+-(grep -q "Domain-0" ${serial_log} && grep -q "^/ #" ${serial_log}) || exit 1
+-exit 0
++../automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-smoke-dom0-arm64.sh b/automation/scripts/qemu-smoke-dom0-arm64.sh
+index 352963a741..0094bfc8e1 100755
+--- a/automation/scripts/qemu-smoke-dom0-arm64.sh
++++ b/automation/scripts/qemu-smoke-dom0-arm64.sh
+@@ -94,9 +94,7 @@ bash imagebuilder/scripts/uboot-script-gen -t tftp -d binaries/ -c binaries/conf
+ # Run the test
+ rm -f smoke.serial
+ set +e
+-echo " virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"| \
+-timeout -k 1 720 \
+-./binaries/qemu-system-aarch64 \
++export QEMU_CMD="./binaries/qemu-system-aarch64 \
+ -machine virtualization=true \
+ -cpu cortex-a57 -machine type=virt \
+ -m 2048 -monitor none -serial stdio \
+@@ -104,9 +102,11 @@ timeout -k 1 720 \
+ -no-reboot \
+ -device virtio-net-pci,netdev=n0 \
+ -netdev user,id=n0,tftp=binaries \
+- -bios /usr/lib/u-boot/qemu_arm64/u-boot.bin |& \
+- tee smoke.serial | sed 's/\r//'
++ -bios /usr/lib/u-boot/qemu_arm64/u-boot.bin"
++
++export UBOOT_CMD="virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"
++export QEMU_LOG="smoke.serial"
++export LOG_MSG="Domain-0"
++export PASSED="BusyBox"
+
+-set -e
+-(grep -q "Domain-0" smoke.serial && grep -q "BusyBox" smoke.serial) || exit 1
+-exit 0
++./automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-smoke-dom0less-arm32.sh b/automation/scripts/qemu-smoke-dom0less-arm32.sh
+index c027c8c5c8..68ffbabdb8 100755
+--- a/automation/scripts/qemu-smoke-dom0less-arm32.sh
++++ b/automation/scripts/qemu-smoke-dom0less-arm32.sh
+@@ -5,7 +5,7 @@ set -ex
+ test_variant=$1
+
+ # Prompt to grep for to check if dom0 booted successfully
+-dom0_prompt="^/ #"
++dom0_prompt="/ #"
+
+ serial_log="$(pwd)/smoke.serial"
+
+@@ -131,9 +131,7 @@ bash imagebuilder/scripts/uboot-script-gen -t tftp -d . -c config
+ # Run the test
+ rm -f ${serial_log}
+ set +e
+-echo " virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"| \
+-timeout -k 1 240 \
+-./qemu-system-arm \
++export QEMU_CMD="./qemu-system-arm \
+ -machine virt \
+ -machine virtualization=true \
+ -smp 4 \
+@@ -144,9 +142,11 @@ timeout -k 1 240 \
+ -no-reboot \
+ -device virtio-net-pci,netdev=n0 \
+ -netdev user,id=n0,tftp=./ \
+- -bios /usr/lib/u-boot/qemu_arm/u-boot.bin |& \
+- tee ${serial_log} | sed 's/\r//'
++ -bios /usr/lib/u-boot/qemu_arm/u-boot.bin"
++
++export UBOOT_CMD="virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"
++export QEMU_LOG="${serial_log}"
++export LOG_MSG="${dom0_prompt}"
++export PASSED="${passed}"
+
+-set -e
+-(grep -q "${dom0_prompt}" ${serial_log} && grep -q "${passed}" ${serial_log}) || exit 1
+-exit 0
++../automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-smoke-dom0less-arm64.sh b/automation/scripts/qemu-smoke-dom0less-arm64.sh
+index 15258692d5..eb25c4af4b 100755
+--- a/automation/scripts/qemu-smoke-dom0less-arm64.sh
++++ b/automation/scripts/qemu-smoke-dom0less-arm64.sh
+@@ -205,9 +205,7 @@ bash imagebuilder/scripts/uboot-script-gen -t tftp -d binaries/ -c binaries/conf
+ # Run the test
+ rm -f smoke.serial
+ set +e
+-echo " virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"| \
+-timeout -k 1 240 \
+-./binaries/qemu-system-aarch64 \
++export QEMU_CMD="./binaries/qemu-system-aarch64 \
+ -machine virtualization=true \
+ -cpu cortex-a57 -machine type=virt,gic-version=$gic_version \
+ -m 2048 -monitor none -serial stdio \
+@@ -215,9 +213,11 @@ timeout -k 1 240 \
+ -no-reboot \
+ -device virtio-net-pci,netdev=n0 \
+ -netdev user,id=n0,tftp=binaries \
+- -bios /usr/lib/u-boot/qemu_arm64/u-boot.bin |& \
+- tee smoke.serial | sed 's/\r//'
++ -bios /usr/lib/u-boot/qemu_arm64/u-boot.bin"
++
++export UBOOT_CMD="virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"
++export QEMU_LOG="smoke.serial"
++export LOG_MSG="Welcome to Alpine Linux"
++export PASSED="${passed}"
+
+-set -e
+-(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" smoke.serial) || exit 1
+-exit 0
++./automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-smoke-ppc64le.sh b/automation/scripts/qemu-smoke-ppc64le.sh
+index 9088881b73..ccb4a576f4 100755
+--- a/automation/scripts/qemu-smoke-ppc64le.sh
++++ b/automation/scripts/qemu-smoke-ppc64le.sh
+@@ -11,8 +11,7 @@ machine=$1
+ rm -f ${serial_log}
+ set +e
+
+-timeout -k 1 20 \
+-qemu-system-ppc64 \
++export QEMU_CMD="qemu-system-ppc64 \
+ -bios skiboot.lid \
+ -M $machine \
+ -m 2g \
+@@ -21,9 +20,9 @@ qemu-system-ppc64 \
+ -monitor none \
+ -nographic \
+ -serial stdio \
+- -kernel binaries/xen \
+- |& tee ${serial_log} | sed 's/\r//'
++ -kernel binaries/xen"
+
+-set -e
+-(grep -q "Hello, ppc64le!" ${serial_log}) || exit 1
+-exit 0
++export QEMU_LOG="${serial_log}"
++export PASSED="Hello, ppc64le!"
++
++./automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-smoke-riscv64.sh b/automation/scripts/qemu-smoke-riscv64.sh
+index f90df3c051..0355c075b7 100755
+--- a/automation/scripts/qemu-smoke-riscv64.sh
++++ b/automation/scripts/qemu-smoke-riscv64.sh
+@@ -6,15 +6,14 @@ set -ex
+ rm -f smoke.serial
+ set +e
+
+-timeout -k 1 2 \
+-qemu-system-riscv64 \
++export QEMU_CMD="qemu-system-riscv64 \
+ -M virt \
+ -smp 1 \
+ -nographic \
+ -m 2g \
+- -kernel binaries/xen \
+- |& tee smoke.serial | sed 's/\r//'
++ -kernel binaries/xen"
+
+-set -e
+-(grep -q "All set up" smoke.serial) || exit 1
+-exit 0
++export QEMU_LOG="smoke.serial"
++export PASSED="All set up"
++
++./automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-smoke-x86-64.sh b/automation/scripts/qemu-smoke-x86-64.sh
+index 3014d07314..37ac10e068 100755
+--- a/automation/scripts/qemu-smoke-x86-64.sh
++++ b/automation/scripts/qemu-smoke-x86-64.sh
+@@ -16,11 +16,12 @@ esac
+
+ rm -f smoke.serial
+ set +e
+-timeout -k 1 30 \
+-qemu-system-x86_64 -nographic -kernel binaries/xen \
++export QEMU_CMD="qemu-system-x86_64 -nographic -kernel binaries/xen \
+ -initrd xtf/tests/example/$k \
+- -append "loglvl=all console=com1 noreboot console_timestamps=boot $extra" \
+- -m 512 -monitor none -serial file:smoke.serial
+-set -e
+-grep -q 'Test result: SUCCESS' smoke.serial || exit 1
+-exit 0
++ -append \"loglvl=all console=com1 noreboot console_timestamps=boot $extra\" \
++ -m 512 -monitor none -serial stdio"
++
++export QEMU_LOG="smoke.serial"
++export PASSED="Test result: SUCCESS"
++
++./automation/scripts/qemu-key.exp
+diff --git a/automation/scripts/qemu-xtf-dom0less-arm64.sh b/automation/scripts/qemu-xtf-dom0less-arm64.sh
+index b08c2d44fb..0666f6363e 100755
+--- a/automation/scripts/qemu-xtf-dom0less-arm64.sh
++++ b/automation/scripts/qemu-xtf-dom0less-arm64.sh
+@@ -51,9 +51,7 @@ bash imagebuilder/scripts/uboot-script-gen -t tftp -d binaries/ -c binaries/conf
+ # Run the test
+ rm -f smoke.serial
+ set +e
+-echo " virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"| \
+-timeout -k 1 120 \
+-./binaries/qemu-system-aarch64 \
++export QEMU_CMD="./binaries/qemu-system-aarch64 \
+ -machine virtualization=true \
+ -cpu cortex-a57 -machine type=virt \
+ -m 2048 -monitor none -serial stdio \
+@@ -61,9 +59,10 @@ timeout -k 1 120 \
+ -no-reboot \
+ -device virtio-net-pci,netdev=n0 \
+ -netdev user,id=n0,tftp=binaries \
+- -bios /usr/lib/u-boot/qemu_arm64/u-boot.bin |& \
+- tee smoke.serial | sed 's/\r//'
++ -bios /usr/lib/u-boot/qemu_arm64/u-boot.bin"
++
++export UBOOT_CMD="virtio scan; dhcp; tftpb 0x40000000 boot.scr; source 0x40000000"
++export QEMU_LOG="smoke.serial"
++export PASSED="${passed}"
+
+-set -e
+-(grep -q "${passed}" smoke.serial) || exit 1
+-exit 0
++./automation/scripts/qemu-key.exp
+--
+2.46.1
+
diff --git a/0014-libxl-Fix-handling-XenStore-errors-in-device-creatio.patch b/0014-libxl-Fix-handling-XenStore-errors-in-device-creatio.patch
deleted file mode 100644
index ac28521..0000000
--- a/0014-libxl-Fix-handling-XenStore-errors-in-device-creatio.patch
+++ /dev/null
@@ -1,191 +0,0 @@
-From 8271f0e8f23b63199caf0edcfe85ebc1c1412d1b Mon Sep 17 00:00:00 2001
-From: Demi Marie Obenour <demi@invisiblethingslab.com>
-Date: Tue, 21 May 2024 10:23:52 +0200
-Subject: [PATCH 14/56] libxl: Fix handling XenStore errors in device creation
-
-If xenstored runs out of memory it is possible for it to fail operations
-that should succeed. libxl wasn't robust against this, and could fail
-to ensure that the TTY path of a non-initial console was created and
-read-only for guests. This doesn't qualify for an XSA because guests
-should not be able to run xenstored out of memory, but it still needs to
-be fixed.
-
-Add the missing error checks to ensure that all errors are properly
-handled and that at no point can a guest make the TTY path of its
-frontend directory writable.
-
-Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: 531d3bea5e9357357eaf6d40f5784a1b4c29b910
-master date: 2024-05-11 00:13:43 +0100
----
- tools/libs/light/libxl_console.c | 11 ++---
- tools/libs/light/libxl_device.c | 72 ++++++++++++++++++++------------
- tools/libs/light/libxl_xshelp.c | 13 ++++--
- 3 files changed, 60 insertions(+), 36 deletions(-)
-
-diff --git a/tools/libs/light/libxl_console.c b/tools/libs/light/libxl_console.c
-index cd7412a327..a563c9d3c7 100644
---- a/tools/libs/light/libxl_console.c
-+++ b/tools/libs/light/libxl_console.c
-@@ -351,11 +351,10 @@ int libxl__device_console_add(libxl__gc *gc, uint32_t domid,
- flexarray_append(front, "protocol");
- flexarray_append(front, LIBXL_XENCONSOLE_PROTOCOL);
- }
-- libxl__device_generic_add(gc, XBT_NULL, device,
-- libxl__xs_kvs_of_flexarray(gc, back),
-- libxl__xs_kvs_of_flexarray(gc, front),
-- libxl__xs_kvs_of_flexarray(gc, ro_front));
-- rc = 0;
-+ rc = libxl__device_generic_add(gc, XBT_NULL, device,
-+ libxl__xs_kvs_of_flexarray(gc, back),
-+ libxl__xs_kvs_of_flexarray(gc, front),
-+ libxl__xs_kvs_of_flexarray(gc, ro_front));
- out:
- return rc;
- }
-@@ -665,6 +664,8 @@ int libxl_device_channel_getinfo(libxl_ctx *ctx, uint32_t domid,
- */
- if (!val) val = "/NO-SUCH-PATH";
- channelinfo->u.pty.path = strdup(val);
-+ if (channelinfo->u.pty.path == NULL)
-+ abort();
- break;
- default:
- break;
-diff --git a/tools/libs/light/libxl_device.c b/tools/libs/light/libxl_device.c
-index 13da6e0573..3035501f2c 100644
---- a/tools/libs/light/libxl_device.c
-+++ b/tools/libs/light/libxl_device.c
-@@ -177,8 +177,13 @@ int libxl__device_generic_add(libxl__gc *gc, xs_transaction_t t,
- ro_frontend_perms[1].perms = backend_perms[1].perms = XS_PERM_READ;
-
- retry_transaction:
-- if (create_transaction)
-+ if (create_transaction) {
- t = xs_transaction_start(ctx->xsh);
-+ if (t == XBT_NULL) {
-+ LOGED(ERROR, device->domid, "xs_transaction_start failed");
-+ return ERROR_FAIL;
-+ }
-+ }
-
- /* FIXME: read frontend_path and check state before removing stuff */
-
-@@ -195,42 +200,55 @@ retry_transaction:
- if (rc) goto out;
- }
-
-- /* xxx much of this function lacks error checks! */
--
- if (fents || ro_fents) {
-- xs_rm(ctx->xsh, t, frontend_path);
-- xs_mkdir(ctx->xsh, t, frontend_path);
-+ if (!xs_rm(ctx->xsh, t, frontend_path) && errno != ENOENT)
-+ goto out;
-+ if (!xs_mkdir(ctx->xsh, t, frontend_path))
-+ goto out;
- /* Console 0 is a special case. It doesn't use the regular PV
- * state machine but also the frontend directory has
- * historically contained other information, such as the
- * vnc-port, which we don't want the guest fiddling with.
- */
- if ((device->kind == LIBXL__DEVICE_KIND_CONSOLE && device->devid == 0) ||
-- (device->kind == LIBXL__DEVICE_KIND_VUART))
-- xs_set_permissions(ctx->xsh, t, frontend_path,
-- ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms));
-- else
-- xs_set_permissions(ctx->xsh, t, frontend_path,
-- frontend_perms, ARRAY_SIZE(frontend_perms));
-- xs_write(ctx->xsh, t, GCSPRINTF("%s/backend", frontend_path),
-- backend_path, strlen(backend_path));
-- if (fents)
-- libxl__xs_writev_perms(gc, t, frontend_path, fents,
-- frontend_perms, ARRAY_SIZE(frontend_perms));
-- if (ro_fents)
-- libxl__xs_writev_perms(gc, t, frontend_path, ro_fents,
-- ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms));
-+ (device->kind == LIBXL__DEVICE_KIND_VUART)) {
-+ if (!xs_set_permissions(ctx->xsh, t, frontend_path,
-+ ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms)))
-+ goto out;
-+ } else {
-+ if (!xs_set_permissions(ctx->xsh, t, frontend_path,
-+ frontend_perms, ARRAY_SIZE(frontend_perms)))
-+ goto out;
-+ }
-+ if (!xs_write(ctx->xsh, t, GCSPRINTF("%s/backend", frontend_path),
-+ backend_path, strlen(backend_path)))
-+ goto out;
-+ if (fents) {
-+ rc = libxl__xs_writev_perms(gc, t, frontend_path, fents,
-+ frontend_perms, ARRAY_SIZE(frontend_perms));
-+ if (rc) goto out;
-+ }
-+ if (ro_fents) {
-+ rc = libxl__xs_writev_perms(gc, t, frontend_path, ro_fents,
-+ ro_frontend_perms, ARRAY_SIZE(ro_frontend_perms));
-+ if (rc) goto out;
-+ }
- }
-
- if (bents) {
- if (!libxl_only) {
-- xs_rm(ctx->xsh, t, backend_path);
-- xs_mkdir(ctx->xsh, t, backend_path);
-- xs_set_permissions(ctx->xsh, t, backend_path, backend_perms,
-- ARRAY_SIZE(backend_perms));
-- xs_write(ctx->xsh, t, GCSPRINTF("%s/frontend", backend_path),
-- frontend_path, strlen(frontend_path));
-- libxl__xs_writev(gc, t, backend_path, bents);
-+ if (!xs_rm(ctx->xsh, t, backend_path) && errno != ENOENT)
-+ goto out;
-+ if (!xs_mkdir(ctx->xsh, t, backend_path))
-+ goto out;
-+ if (!xs_set_permissions(ctx->xsh, t, backend_path, backend_perms,
-+ ARRAY_SIZE(backend_perms)))
-+ goto out;
-+ if (!xs_write(ctx->xsh, t, GCSPRINTF("%s/frontend", backend_path),
-+ frontend_path, strlen(frontend_path)))
-+ goto out;
-+ rc = libxl__xs_writev(gc, t, backend_path, bents);
-+ if (rc) goto out;
- }
-
- /*
-@@ -276,7 +294,7 @@ retry_transaction:
- out:
- if (create_transaction && t)
- libxl__xs_transaction_abort(gc, &t);
-- return rc;
-+ return rc != 0 ? rc : ERROR_FAIL;
- }
-
- typedef struct {
-diff --git a/tools/libs/light/libxl_xshelp.c b/tools/libs/light/libxl_xshelp.c
-index 751cd942d9..a6e34ab10f 100644
---- a/tools/libs/light/libxl_xshelp.c
-+++ b/tools/libs/light/libxl_xshelp.c
-@@ -60,10 +60,15 @@ int libxl__xs_writev_perms(libxl__gc *gc, xs_transaction_t t,
- for (i = 0; kvs[i] != NULL; i += 2) {
- path = GCSPRINTF("%s/%s", dir, kvs[i]);
- if (path && kvs[i + 1]) {
-- int length = strlen(kvs[i + 1]);
-- xs_write(ctx->xsh, t, path, kvs[i + 1], length);
-- if (perms)
-- xs_set_permissions(ctx->xsh, t, path, perms, num_perms);
-+ size_t length = strlen(kvs[i + 1]);
-+ if (length > UINT_MAX)
-+ return ERROR_FAIL;
-+ if (!xs_write(ctx->xsh, t, path, kvs[i + 1], length))
-+ return ERROR_FAIL;
-+ if (perms) {
-+ if (!xs_set_permissions(ctx->xsh, t, path, perms, num_perms))
-+ return ERROR_FAIL;
-+ }
- }
- }
- return 0;
---
-2.45.2
-
diff --git a/0015-x86-vLAPIC-prevent-undue-recursion-of-vlapic_error.patch b/0015-x86-vLAPIC-prevent-undue-recursion-of-vlapic_error.patch
new file mode 100644
index 0000000..ce66fe7
--- /dev/null
+++ b/0015-x86-vLAPIC-prevent-undue-recursion-of-vlapic_error.patch
@@ -0,0 +1,57 @@
+From 9358a7fad7f0427e7d1666da0c78cef341ee9072 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:27:03 +0200
+Subject: [PATCH 15/35] x86/vLAPIC: prevent undue recursion of vlapic_error()
+
+With the error vector set to an illegal value, the function invoking
+vlapic_set_irq() would bring execution back here, with the non-recursive
+lock already held. Avoid the call in this case, merely further updating
+ESR (if necessary).
+
+This is XSA-462 / CVE-2024-45817.
+
+Fixes: 5f32d186a8b1 ("x86/vlapic: don't silently accept bad vectors")
+Reported-by: Federico Serafini <federico.serafini@bugseng.com>
+Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: c42d9ec61f6d11e25fa77bd44dd11dad1edda268
+master date: 2024-09-24 14:23:29 +0200
+---
+ xen/arch/x86/hvm/vlapic.c | 17 ++++++++++++++++-
+ 1 file changed, 16 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
+index 9cfc82666a..46ff758904 100644
+--- a/xen/arch/x86/hvm/vlapic.c
++++ b/xen/arch/x86/hvm/vlapic.c
+@@ -112,9 +112,24 @@ static void vlapic_error(struct vlapic *vlapic, unsigned int errmask)
+ if ( (esr & errmask) != errmask )
+ {
+ uint32_t lvterr = vlapic_get_reg(vlapic, APIC_LVTERR);
++ bool inj = false;
+
+- vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
+ if ( !(lvterr & APIC_LVT_MASKED) )
++ {
++ /*
++ * If LVTERR is unmasked and has an illegal vector, vlapic_set_irq()
++ * will end up back here. Break the cycle by only injecting LVTERR
++ * if it will succeed, and folding in RECVILL otherwise.
++ */
++ if ( (lvterr & APIC_VECTOR_MASK) >= 16 )
++ inj = true;
++ else
++ errmask |= APIC_ESR_RECVILL;
++ }
++
++ vlapic_set_reg(vlapic, APIC_ESR, esr | errmask);
++
++ if ( inj )
+ vlapic_set_irq(vlapic, lvterr & APIC_VECTOR_MASK, 0);
+ }
+ spin_unlock_irqrestore(&vlapic->esr_lock, flags);
+--
+2.46.1
+
diff --git a/0015-xen-sched-set-all-sched_resource-data-inside-locked-.patch b/0015-xen-sched-set-all-sched_resource-data-inside-locked-.patch
deleted file mode 100644
index a8090d4..0000000
--- a/0015-xen-sched-set-all-sched_resource-data-inside-locked-.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From 3999b675cad5b717274d6493899b0eea8896f4d7 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Tue, 21 May 2024 10:24:26 +0200
-Subject: [PATCH 15/56] xen/sched: set all sched_resource data inside locked
- region for new cpu
-
-When adding a cpu to a scheduler, set all data items of struct
-sched_resource inside the locked region, as otherwise a race might
-happen (e.g. when trying to access the cpupool of the cpu):
-
- (XEN) ----[ Xen-4.19.0-1-d x86_64 debug=y Tainted: H ]----
- (XEN) CPU: 45
- (XEN) RIP: e008:[<ffff82d040244cbf>] common/sched/credit.c#csched_load_balance+0x41/0x877
- (XEN) RFLAGS: 0000000000010092 CONTEXT: hypervisor
- (XEN) rax: ffff82d040981618 rbx: ffff82d040981618 rcx: 0000000000000000
- (XEN) rdx: 0000003ff68cd000 rsi: 000000000000002d rdi: ffff83103723d450
- (XEN) rbp: ffff83207caa7d48 rsp: ffff83207caa7b98 r8: 0000000000000000
- (XEN) r9: ffff831037253cf0 r10: ffff83103767c3f0 r11: 0000000000000009
- (XEN) r12: ffff831037237990 r13: ffff831037237990 r14: ffff831037253720
- (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 0000000000f526e0
- (XEN) cr3: 000000005bc2f000 cr2: 0000000000000010
- (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
- (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
- (XEN) Xen code around <ffff82d040244cbf> (common/sched/credit.c#csched_load_balance+0x41/0x877):
- (XEN) 48 8b 0c 10 48 8b 49 08 <48> 8b 79 10 48 89 bd b8 fe ff ff 49 8b 4e 28 48
- <snip>
- (XEN) Xen call trace:
- (XEN) [<ffff82d040244cbf>] R common/sched/credit.c#csched_load_balance+0x41/0x877
- (XEN) [<ffff82d040245a18>] F common/sched/credit.c#csched_schedule+0x36a/0x69f
- (XEN) [<ffff82d040252644>] F common/sched/core.c#do_schedule+0xe8/0x433
- (XEN) [<ffff82d0402572dd>] F common/sched/core.c#schedule+0x2e5/0x2f9
- (XEN) [<ffff82d040232f35>] F common/softirq.c#__do_softirq+0x94/0xbe
- (XEN) [<ffff82d040232fc8>] F do_softirq+0x13/0x15
- (XEN) [<ffff82d0403075ef>] F arch/x86/domain.c#idle_loop+0x92/0xe6
- (XEN)
- (XEN) Pagetable walk from 0000000000000010:
- (XEN) L4[0x000] = 000000103ff61063 ffffffffffffffff
- (XEN) L3[0x000] = 000000103ff60063 ffffffffffffffff
- (XEN) L2[0x000] = 0000001033dff063 ffffffffffffffff
- (XEN) L1[0x000] = 0000000000000000 ffffffffffffffff
- (XEN)
- (XEN) ****************************************
- (XEN) Panic on CPU 45:
- (XEN) FATAL PAGE FAULT
- (XEN) [error_code=0000]
- (XEN) Faulting linear address: 0000000000000010
- (XEN) ****************************************
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Fixes: a8c6c623192e ("sched: clarify use cases of schedule_cpu_switch()")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: d104a07524ffc92ae7a70dfe192c291de2a563cc
-master date: 2024-05-15 19:59:52 +0100
----
- xen/common/sched/core.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
-index 34ad39b9ad..3c2403ebcf 100644
---- a/xen/common/sched/core.c
-+++ b/xen/common/sched/core.c
-@@ -3179,6 +3179,8 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
-
- sr->scheduler = new_ops;
- sr->sched_priv = ppriv;
-+ sr->granularity = cpupool_get_granularity(c);
-+ sr->cpupool = c;
-
- /*
- * Reroute the lock to the per pCPU lock as /last/ thing. In fact,
-@@ -3191,8 +3193,6 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
- /* _Not_ pcpu_schedule_unlock(): schedule_lock has changed! */
- spin_unlock_irqrestore(old_lock, flags);
-
-- sr->granularity = cpupool_get_granularity(c);
-- sr->cpupool = c;
- /* The cpu is added to a pool, trigger it to go pick up some work */
- cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
-
---
-2.45.2
-
diff --git a/0016-Arm-correct-FIXADDR_TOP.patch b/0016-Arm-correct-FIXADDR_TOP.patch
new file mode 100644
index 0000000..244e873
--- /dev/null
+++ b/0016-Arm-correct-FIXADDR_TOP.patch
@@ -0,0 +1,58 @@
+From 46a2ce35212c9b35c4818ca9eec918aa4a45cb48 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:28:22 +0200
+Subject: [PATCH 16/35] Arm: correct FIXADDR_TOP
+
+While reviewing a RISC-V patch cloning the Arm code, I noticed an
+off-by-1 here: FIX_PMAP_{BEGIN,END} being an inclusive range and
+FIX_LAST being the same as FIX_PMAP_END, FIXADDR_TOP cannot derive from
+FIX_LAST alone, or else the BUG_ON() in virt_to_fix() would trigger if
+FIX_PMAP_END ended up being used.
+
+While touching this area also add a check for fixmap and boot FDT area
+to not only not overlap, but to have at least one (unmapped) page in
+between.
+
+Fixes: 4f17357b52f6 ("xen/arm: add Persistent Map (PMAP) infrastructure")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Michal Orzel <michal.orzel@amd.com>
+master commit: fe3412ab83cc53c2bf2c497be3794bc09751efa5
+master date: 2024-08-13 21:50:55 +0100
+---
+ xen/arch/arm/include/asm/fixmap.h | 2 +-
+ xen/arch/arm/mmu/setup.c | 6 ++++++
+ 2 files changed, 7 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
+index a823456ecb..0cb5d54d1c 100644
+--- a/xen/arch/arm/include/asm/fixmap.h
++++ b/xen/arch/arm/include/asm/fixmap.h
+@@ -18,7 +18,7 @@
+ #define FIX_LAST FIX_PMAP_END
+
+ #define FIXADDR_START FIXMAP_ADDR(0)
+-#define FIXADDR_TOP FIXMAP_ADDR(FIX_LAST)
++#define FIXADDR_TOP FIXMAP_ADDR(FIX_LAST + 1)
+
+ #ifndef __ASSEMBLY__
+
+diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
+index f4bb424c3c..57042ed57b 100644
+--- a/xen/arch/arm/mmu/setup.c
++++ b/xen/arch/arm/mmu/setup.c
+@@ -128,6 +128,12 @@ static void __init __maybe_unused build_assertions(void)
+
+ #undef CHECK_SAME_SLOT
+ #undef CHECK_DIFFERENT_SLOT
++
++ /*
++ * Fixmaps must not overlap with boot FDT mapping area. Make sure there's
++ * at least one guard page in between.
++ */
++ BUILD_BUG_ON(FIXADDR_TOP >= BOOT_FDT_VIRT_START);
+ }
+
+ lpae_t __init pte_of_xenaddr(vaddr_t va)
+--
+2.46.1
+
diff --git a/0016-x86-respect-mapcache_domain_init-failing.patch b/0016-x86-respect-mapcache_domain_init-failing.patch
deleted file mode 100644
index db7ddfe..0000000
--- a/0016-x86-respect-mapcache_domain_init-failing.patch
+++ /dev/null
@@ -1,38 +0,0 @@
-From dfabab2cd9461ef9d21a708461f35d2ae4b55220 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 May 2024 10:25:08 +0200
-Subject: [PATCH 16/56] x86: respect mapcache_domain_init() failing
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The function itself properly handles and hands onwards failure from
-create_perdomain_mapping(). Therefore its caller should respect possible
-failure, too.
-
-Fixes: 4b28bf6ae90b ("x86: re-introduce map_domain_page() et al")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 7270fdc7a0028d4b7b26fd1b36c6b9e97abcf3da
-master date: 2024-05-15 19:59:52 +0100
----
- xen/arch/x86/domain.c | 3 ++-
- 1 file changed, 2 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
-index 307446273a..5feb0d0679 100644
---- a/xen/arch/x86/domain.c
-+++ b/xen/arch/x86/domain.c
-@@ -850,7 +850,8 @@ int arch_domain_create(struct domain *d,
- }
- else if ( is_pv_domain(d) )
- {
-- mapcache_domain_init(d);
-+ if ( (rc = mapcache_domain_init(d)) != 0 )
-+ goto fail;
-
- if ( (rc = pv_domain_initialise(d)) != 0 )
- goto fail;
---
-2.45.2
-
diff --git a/0017-tools-xentop-Fix-cpu-sort-order.patch b/0017-tools-xentop-Fix-cpu-sort-order.patch
deleted file mode 100644
index de19ddc..0000000
--- a/0017-tools-xentop-Fix-cpu-sort-order.patch
+++ /dev/null
@@ -1,76 +0,0 @@
-From f3d20dd31770a70971f4f85521eec1e741d38695 Mon Sep 17 00:00:00 2001
-From: Leigh Brown <leigh@solinno.co.uk>
-Date: Tue, 21 May 2024 10:25:30 +0200
-Subject: [PATCH 17/56] tools/xentop: Fix cpu% sort order
-
-In compare_cpu_pct(), there is a double -> unsigned long long converion when
-calling compare(). In C, this discards the fractional part, resulting in an
-out-of order sorting such as:
-
- NAME STATE CPU(sec) CPU(%)
- xendd --b--- 4020 5.7
- icecream --b--- 2600 3.8
- Domain-0 -----r 1060 1.5
- neon --b--- 827 1.1
- cheese --b--- 225 0.7
- pizza --b--- 359 0.5
- cassini --b--- 490 0.4
- fusilli --b--- 159 0.2
- bob --b--- 502 0.2
- blender --b--- 121 0.2
- bread --b--- 69 0.1
- chickpea --b--- 67 0.1
- lentil --b--- 67 0.1
-
-Introduce compare_dbl() function and update compare_cpu_pct() to call it.
-
-Fixes: 49839b535b78 ("Add xenstat framework.")
-Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: e27fc7d15eab79e604e8b8728778594accc23cf1
-master date: 2024-05-15 19:59:52 +0100
----
- tools/xentop/xentop.c | 13 ++++++++++++-
- 1 file changed, 12 insertions(+), 1 deletion(-)
-
-diff --git a/tools/xentop/xentop.c b/tools/xentop/xentop.c
-index 545bd5e96d..c2a311befe 100644
---- a/tools/xentop/xentop.c
-+++ b/tools/xentop/xentop.c
-@@ -85,6 +85,7 @@ static void set_delay(const char *value);
- static void set_prompt(const char *new_prompt, void (*func)(const char *));
- static int handle_key(int);
- static int compare(unsigned long long, unsigned long long);
-+static int compare_dbl(double, double);
- static int compare_domains(xenstat_domain **, xenstat_domain **);
- static unsigned long long tot_net_bytes( xenstat_domain *, int);
- static bool tot_vbd_reqs(xenstat_domain *, int, unsigned long long *);
-@@ -422,6 +423,16 @@ static int compare(unsigned long long i1, unsigned long long i2)
- return 0;
- }
-
-+/* Compares two double precision numbers, returning -1,0,1 for <,=,> */
-+static int compare_dbl(double d1, double d2)
-+{
-+ if (d1 < d2)
-+ return -1;
-+ if (d1 > d2)
-+ return 1;
-+ return 0;
-+}
-+
- /* Comparison function for use with qsort. Compares two domains using the
- * current sort field. */
- static int compare_domains(xenstat_domain **domain1, xenstat_domain **domain2)
-@@ -523,7 +534,7 @@ static double get_cpu_pct(xenstat_domain *domain)
-
- static int compare_cpu_pct(xenstat_domain *domain1, xenstat_domain *domain2)
- {
-- return -compare(get_cpu_pct(domain1), get_cpu_pct(domain2));
-+ return -compare_dbl(get_cpu_pct(domain1), get_cpu_pct(domain2));
- }
-
- /* Prints cpu percentage statistic */
---
-2.45.2
-
diff --git a/0017-xl-fix-incorrect-output-in-help-command.patch b/0017-xl-fix-incorrect-output-in-help-command.patch
new file mode 100644
index 0000000..f2ab58a
--- /dev/null
+++ b/0017-xl-fix-incorrect-output-in-help-command.patch
@@ -0,0 +1,36 @@
+From e12998a9db8d0ac14477557d09b437783a999ea4 Mon Sep 17 00:00:00 2001
+From: "John E. Krokes" <mag@netherworld.org>
+Date: Tue, 24 Sep 2024 14:29:26 +0200
+Subject: [PATCH 17/35] xl: fix incorrect output in "help" command
+
+In "xl help", the output includes this line:
+
+ vsnd-list List virtual display devices for a domain
+
+This should obviously say "sound devices" instead of "display devices".
+
+Signed-off-by: John E. Krokes <mag@netherworld.org>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Acked-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: 09226d165b57d919150458044c5b594d3d1dc23a
+master date: 2024-08-14 08:49:44 +0200
+---
+ tools/xl/xl_cmdtable.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
+index 42751228c1..53fc22d344 100644
+--- a/tools/xl/xl_cmdtable.c
++++ b/tools/xl/xl_cmdtable.c
+@@ -433,7 +433,7 @@ const struct cmd_spec cmd_table[] = {
+ },
+ { "vsnd-list",
+ &main_vsndlist, 0, 0,
+- "List virtual display devices for a domain",
++ "List virtual sound devices for a domain",
+ "<Domain(s)>",
+ },
+ { "vsnd-detach",
+--
+2.46.1
+
diff --git a/0018-x86-mtrr-avoid-system-wide-rendezvous-when-setting-A.patch b/0018-x86-mtrr-avoid-system-wide-rendezvous-when-setting-A.patch
deleted file mode 100644
index a57775d..0000000
--- a/0018-x86-mtrr-avoid-system-wide-rendezvous-when-setting-A.patch
+++ /dev/null
@@ -1,60 +0,0 @@
-From 7cdb1fa2ab0b5e11f66cada0370770404153c824 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Tue, 21 May 2024 10:25:39 +0200
-Subject: [PATCH 18/56] x86/mtrr: avoid system wide rendezvous when setting AP
- MTRRs
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-There's no point in forcing a system wide update of the MTRRs on all processors
-when there are no changes to be propagated. On AP startup it's only the AP
-that needs to write the system wide MTRR values in order to match the rest of
-the already online CPUs.
-
-We have occasionally seen the watchdog trigger during `xen-hptool cpu-online`
-in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs
-on all the CPUs in the system.
-
-While there adjust the comment to clarify why the system-wide resetting of the
-MTRR registers is not needed for the purposes of mtrr_ap_init().
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: abd00b037da5ffa4e8c4508a5df0cd6eabb805a4
-master date: 2024-05-15 19:59:52 +0100
----
- xen/arch/x86/cpu/mtrr/main.c | 15 ++++++++-------
- 1 file changed, 8 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/mtrr/main.c b/xen/arch/x86/cpu/mtrr/main.c
-index 90b235f57e..0a44ebbcb0 100644
---- a/xen/arch/x86/cpu/mtrr/main.c
-+++ b/xen/arch/x86/cpu/mtrr/main.c
-@@ -573,14 +573,15 @@ void mtrr_ap_init(void)
- if (!mtrr_if || hold_mtrr_updates_on_aps)
- return;
- /*
-- * Ideally we should hold mtrr_mutex here to avoid mtrr entries changed,
-- * but this routine will be called in cpu boot time, holding the lock
-- * breaks it. This routine is called in two cases: 1.very earily time
-- * of software resume, when there absolutely isn't mtrr entry changes;
-- * 2.cpu hotadd time. We let mtrr_add/del_page hold cpuhotplug lock to
-- * prevent mtrr entry changes
-+ * hold_mtrr_updates_on_aps takes care of preventing unnecessary MTRR
-+ * updates when batch starting the CPUs (see
-+ * mtrr_aps_sync_{begin,end}()).
-+ *
-+ * Otherwise just apply the current system wide MTRR values to this AP.
-+ * Note this doesn't require synchronization with the other CPUs, as
-+ * there are strictly no modifications of the current MTRR values.
- */
-- set_mtrr(~0U, 0, 0, 0);
-+ mtrr_set_all();
- }
-
- /**
---
-2.45.2
-
diff --git a/0018-x86emul-correct-UD-check-for-AVX512-FP16-complex-mul.patch b/0018-x86emul-correct-UD-check-for-AVX512-FP16-complex-mul.patch
new file mode 100644
index 0000000..cdbc59e
--- /dev/null
+++ b/0018-x86emul-correct-UD-check-for-AVX512-FP16-complex-mul.patch
@@ -0,0 +1,37 @@
+From e2f29f7bad59c4be53363c8c0d2933982a22d0de Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:30:04 +0200
+Subject: [PATCH 18/35] x86emul: correct #UD check for AVX512-FP16 complex
+ multiplications
+
+avx512_vlen_check()'s argument was inverted, while the surrounding
+conditional wrongly forced the EVEX.L'L check for the scalar forms when
+embedded rounding was in effect.
+
+Fixes: d14c52cba0f5 ("x86emul: handle AVX512-FP16 complex multiplication insns")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: a30d438ce58b70c5955f5d37f776086ab8f88623
+master date: 2024-08-19 15:32:31 +0200
+---
+ xen/arch/x86/x86_emulate/x86_emulate.c | 4 ++--
+ 1 file changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
+index 2d5c1de8ec..16557385bf 100644
+--- a/xen/arch/x86/x86_emulate/x86_emulate.c
++++ b/xen/arch/x86/x86_emulate/x86_emulate.c
+@@ -7984,8 +7984,8 @@ x86_emulate(
+ generate_exception_if(modrm_reg == src1 ||
+ (ea.type != OP_MEM && modrm_reg == modrm_rm),
+ X86_EXC_UD);
+- if ( ea.type != OP_REG || (b & 1) || !evex.brs )
+- avx512_vlen_check(!(b & 1));
++ if ( ea.type != OP_REG || !evex.brs )
++ avx512_vlen_check(b & 1);
+ goto simd_zmm;
+ }
+
+--
+2.46.1
+
diff --git a/0019-update-Xen-version-to-4.18.3-pre.patch b/0019-update-Xen-version-to-4.18.3-pre.patch
deleted file mode 100644
index 34f2b33..0000000
--- a/0019-update-Xen-version-to-4.18.3-pre.patch
+++ /dev/null
@@ -1,25 +0,0 @@
-From 01f7a3c792241d348a4e454a30afdf6c0d6cd71c Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 21 May 2024 11:52:11 +0200
-Subject: [PATCH 19/56] update Xen version to 4.18.3-pre
-
----
- xen/Makefile | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/Makefile b/xen/Makefile
-index 657f6fa4e3..786ab61600 100644
---- a/xen/Makefile
-+++ b/xen/Makefile
-@@ -6,7 +6,7 @@ this-makefile := $(call lastword,$(MAKEFILE_LIST))
- # All other places this is stored (eg. compile.h) should be autogenerated.
- export XEN_VERSION = 4
- export XEN_SUBVERSION = 18
--export XEN_EXTRAVERSION ?= .2$(XEN_VENDORVERSION)
-+export XEN_EXTRAVERSION ?= .3-pre$(XEN_VENDORVERSION)
- export XEN_FULLVERSION = $(XEN_VERSION).$(XEN_SUBVERSION)$(XEN_EXTRAVERSION)
- -include xen-version
-
---
-2.45.2
-
diff --git a/0019-x86-pv-Introduce-x86_merge_dr6-and-fix-do_debug.patch b/0019-x86-pv-Introduce-x86_merge_dr6-and-fix-do_debug.patch
new file mode 100644
index 0000000..87690da
--- /dev/null
+++ b/0019-x86-pv-Introduce-x86_merge_dr6-and-fix-do_debug.patch
@@ -0,0 +1,140 @@
+From de924e4dbac80ac7d94a2e86c37eecccaa1bc677 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 24 Sep 2024 14:30:49 +0200
+Subject: [PATCH 19/35] x86/pv: Introduce x86_merge_dr6() and fix do_debug()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Pretty much everywhere in Xen the logic to update %dr6 when injecting #DB is
+buggy. Introduce a new x86_merge_dr6() helper, and start fixing the mess by
+adjusting the dr6 merge in do_debug(). Also correct the comment.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 54ef601a66e8d812a6a6a308f02524e81201825e
+master date: 2024-08-21 23:59:19 +0100
+---
+ xen/arch/x86/debug.c | 40 ++++++++++++++++++++++++++++
+ xen/arch/x86/include/asm/debugreg.h | 7 +++++
+ xen/arch/x86/include/asm/x86-defns.h | 7 +++++
+ xen/arch/x86/traps.c | 11 +++++---
+ 4 files changed, 62 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/debug.c b/xen/arch/x86/debug.c
+index 127fe83021..b10f1f12b6 100644
+--- a/xen/arch/x86/debug.c
++++ b/xen/arch/x86/debug.c
+@@ -2,12 +2,52 @@
+ /*
+ * Copyright (C) 2023 XenServer.
+ */
++#include <xen/bug.h>
+ #include <xen/kernel.h>
+
+ #include <xen/lib/x86/cpu-policy.h>
+
+ #include <asm/debugreg.h>
+
++/*
++ * Merge new bits into dr6. 'new' is always given in positive polarity,
++ * matching the Intel VMCS PENDING_DBG semantics.
++ *
++ * At the time of writing (August 2024), on the subject of %dr6 updates the
++ * manuals are either vague (Intel "certain exceptions may clear bits 0-3"),
++ * or disputed (AMD makes statements which don't match observed behaviour).
++ *
++ * The only debug exception I can find which doesn't clear the breakpoint bits
++ * is ICEBP(/INT1) on AMD systems. This is also the one source of #DB that
++ * doesn't have an explicit status bit, meaning we can't easily identify this
++ * case either (AMD systems don't virtualise PENDING_DBG and only provide a
++ * post-merge %dr6 value).
++ *
++ * Treat %dr6 merging as unconditionally writing the breakpoint bits.
++ *
++ * We can't really manage any better, and guest kernels handling #DB as
++ * instructed by the SDM/APM (i.e. reading %dr6 then resetting it back to
++ * default) wont notice.
++ */
++unsigned int x86_merge_dr6(const struct cpu_policy *p, unsigned int dr6,
++ unsigned int new)
++{
++ /* Flip dr6 to have positive polarity. */
++ dr6 ^= X86_DR6_DEFAULT;
++
++ /* Sanity check that only known values are passed in. */
++ ASSERT(!(dr6 & ~X86_DR6_KNOWN_MASK));
++ ASSERT(!(new & ~X86_DR6_KNOWN_MASK));
++
++ /* Breakpoint bits overridden. All others accumulate. */
++ dr6 = (dr6 & ~X86_DR6_BP_MASK) | new;
++
++ /* Flip dr6 back to having default polarity. */
++ dr6 ^= X86_DR6_DEFAULT;
++
++ return x86_adj_dr6_rsvd(p, dr6);
++}
++
+ unsigned int x86_adj_dr6_rsvd(const struct cpu_policy *p, unsigned int dr6)
+ {
+ unsigned int ones = X86_DR6_DEFAULT;
+diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
+index 96c406ad53..6baa725441 100644
+--- a/xen/arch/x86/include/asm/debugreg.h
++++ b/xen/arch/x86/include/asm/debugreg.h
+@@ -108,4 +108,11 @@ struct cpu_policy;
+ unsigned int x86_adj_dr6_rsvd(const struct cpu_policy *p, unsigned int dr6);
+ unsigned int x86_adj_dr7_rsvd(const struct cpu_policy *p, unsigned int dr7);
+
++/*
++ * Merge new bits into dr6. 'new' is always given in positive polarity,
++ * matching the Intel VMCS PENDING_DBG semantics.
++ */
++unsigned int x86_merge_dr6(const struct cpu_policy *p, unsigned int dr6,
++ unsigned int new);
++
+ #endif /* _X86_DEBUGREG_H */
+diff --git a/xen/arch/x86/include/asm/x86-defns.h b/xen/arch/x86/include/asm/x86-defns.h
+index 3bcdbaccd3..caa92829ea 100644
+--- a/xen/arch/x86/include/asm/x86-defns.h
++++ b/xen/arch/x86/include/asm/x86-defns.h
+@@ -132,6 +132,13 @@
+ #define X86_DR6_ZEROS _AC(0x00001000, UL) /* %dr6 bits forced to 0 */
+ #define X86_DR6_DEFAULT _AC(0xffff0ff0, UL) /* Default %dr6 value */
+
++#define X86_DR6_BP_MASK \
++ (X86_DR6_B0 | X86_DR6_B1 | X86_DR6_B2 | X86_DR6_B3)
++
++#define X86_DR6_KNOWN_MASK \
++ (X86_DR6_BP_MASK | X86_DR6_BLD | X86_DR6_BD | X86_DR6_BS | \
++ X86_DR6_BT | X86_DR6_RTM)
++
+ /*
+ * Debug control flags in DR7.
+ */
+diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
+index ee91fc56b1..78e83f6fc1 100644
+--- a/xen/arch/x86/traps.c
++++ b/xen/arch/x86/traps.c
+@@ -2017,9 +2017,14 @@ void asmlinkage do_debug(struct cpu_user_regs *regs)
+ return;
+ }
+
+- /* Save debug status register where guest OS can peek at it */
+- v->arch.dr6 |= (dr6 & ~X86_DR6_DEFAULT);
+- v->arch.dr6 &= (dr6 | ~X86_DR6_DEFAULT);
++ /*
++ * Update the guest's dr6 so the debugger can peek at it.
++ *
++ * TODO: This should be passed out-of-band, so guest state is not modified
++ * by debugging actions completed behind it's back.
++ */
++ v->arch.dr6 = x86_merge_dr6(v->domain->arch.cpu_policy,
++ v->arch.dr6, dr6 ^ X86_DR6_DEFAULT);
+
+ if ( guest_kernel_mode(v, regs) && v->domain->debugger_attached )
+ {
+--
+2.46.1
+
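For reference, the merge semantics described above can be exercised in isolation. The standalone sketch below reproduces the bit manipulation of x86_merge_dr6() using only constants visible in the hunk (X86_DR6_DEFAULT, plus the B0-B3 breakpoint bits taken to be bits 0-3 as architecturally defined); it deliberately omits the cpu_policy based reserved-bit adjustment, so it is an approximation of the helper rather than a drop-in copy.

    #include <stdio.h>

    #define X86_DR6_DEFAULT  0xffff0ff0u   /* value from the hunk above        */
    #define X86_DR6_BP_MASK  0x0000000fu   /* B0-B3, architecturally bits 0-3  */

    /* Positive-polarity merge, minus the cpu_policy reserved-bit adjustment. */
    static unsigned int merge_dr6(unsigned int dr6, unsigned int new)
    {
        dr6 ^= X86_DR6_DEFAULT;                 /* flip to positive polarity   */
        dr6 = (dr6 & ~X86_DR6_BP_MASK) | new;   /* BP bits overridden, rest kept */
        return dr6 ^ X86_DR6_DEFAULT;           /* back to default polarity    */
    }

    int main(void)
    {
        /* Guest %dr6 has B1 pending; a new single-step event (BS, bit 14) arrives. */
        unsigned int dr6 = X86_DR6_DEFAULT | (1u << 1);

        printf("%#x -> %#x\n", dr6, merge_dr6(dr6, 1u << 14));  /* B1 drops, BS set */
        return 0;
    }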
diff --git a/0020-x86-pv-Fix-merging-of-new-status-bits-into-dr6.patch b/0020-x86-pv-Fix-merging-of-new-status-bits-into-dr6.patch
new file mode 100644
index 0000000..b9be372
--- /dev/null
+++ b/0020-x86-pv-Fix-merging-of-new-status-bits-into-dr6.patch
@@ -0,0 +1,222 @@
+From b74a5ea8399d1a0466c55332f557863acdae21b6 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 24 Sep 2024 14:34:30 +0200
+Subject: [PATCH 20/35] x86/pv: Fix merging of new status bits into %dr6
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+All #DB exceptions result in an update of %dr6, but this isn't captured in
+Xen's handling, and is buggy just about everywhere.
+
+To begin resolving this issue, add a new pending_dbg field to x86_event
+(unioned with cr2 to avoid taking any extra space, adjusting users to avoid
+old-GCC bugs with anonymous unions), and introduce pv_inject_DB() to replace
+the current callers using pv_inject_hw_exception().
+
+Push the adjustment of v->arch.dr6 into pv_inject_event(), and use the new
+x86_merge_dr6() rather than the current incorrect logic.
+
+A key property is that pending_dbg is taken with positive polarity to deal
+with RTM/BLD sensibly. Most callers pass in a constant, but callers passing
+in a hardware %dr6 value need to XOR the value with X86_DR6_DEFAULT to flip to
+positive polarity.
+
+This fixes the behaviour of the breakpoint status bits; that any left pending
+are generally discarded when a new #DB is raised. In principle it would fix
+RTM/BLD too, except PV guests can't turn these capabilities on to start with.
+
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: db39fa4b27ea470902d4625567cb6fa24030ddfa
+master date: 2024-08-21 23:59:19 +0100
+---
+ xen/arch/x86/include/asm/domain.h | 18 ++++++++++++++++--
+ xen/arch/x86/include/asm/hvm/hvm.h | 3 ++-
+ xen/arch/x86/pv/emul-priv-op.c | 5 +----
+ xen/arch/x86/pv/emulate.c | 9 +++++++--
+ xen/arch/x86/pv/ro-page-fault.c | 2 +-
+ xen/arch/x86/pv/traps.c | 16 ++++++++++++----
+ xen/arch/x86/traps.c | 2 +-
+ xen/arch/x86/x86_emulate/x86_emulate.h | 5 ++++-
+ 8 files changed, 44 insertions(+), 16 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
+index f5daeb182b..5d92891e6f 100644
+--- a/xen/arch/x86/include/asm/domain.h
++++ b/xen/arch/x86/include/asm/domain.h
+@@ -731,15 +731,29 @@ static inline void pv_inject_hw_exception(unsigned int vector, int errcode)
+ pv_inject_event(&event);
+ }
+
++static inline void pv_inject_DB(unsigned long pending_dbg)
++{
++ struct x86_event event = {
++ .vector = X86_EXC_DB,
++ .type = X86_EVENTTYPE_HW_EXCEPTION,
++ .error_code = X86_EVENT_NO_EC,
++ };
++
++ event.pending_dbg = pending_dbg;
++
++ pv_inject_event(&event);
++}
++
+ static inline void pv_inject_page_fault(int errcode, unsigned long cr2)
+ {
+- const struct x86_event event = {
++ struct x86_event event = {
+ .vector = X86_EXC_PF,
+ .type = X86_EVENTTYPE_HW_EXCEPTION,
+ .error_code = errcode,
+- .cr2 = cr2,
+ };
+
++ event.cr2 = cr2;
++
+ pv_inject_event(&event);
+ }
+
+diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
+index 1c01e22c8e..238eece0cf 100644
+--- a/xen/arch/x86/include/asm/hvm/hvm.h
++++ b/xen/arch/x86/include/asm/hvm/hvm.h
+@@ -525,9 +525,10 @@ static inline void hvm_inject_page_fault(int errcode, unsigned long cr2)
+ .vector = X86_EXC_PF,
+ .type = X86_EVENTTYPE_HW_EXCEPTION,
+ .error_code = errcode,
+- .cr2 = cr2,
+ };
+
++ event.cr2 = cr2;
++
+ hvm_inject_event(&event);
+ }
+
+diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
+index aa11ecadaa..15c83b9d23 100644
+--- a/xen/arch/x86/pv/emul-priv-op.c
++++ b/xen/arch/x86/pv/emul-priv-op.c
+@@ -1366,10 +1366,7 @@ int pv_emulate_privileged_op(struct cpu_user_regs *regs)
+ ctxt.bpmatch |= DR_STEP;
+
+ if ( ctxt.bpmatch )
+- {
+- curr->arch.dr6 |= ctxt.bpmatch | DR_STATUS_RESERVED_ONE;
+- pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
+- }
++ pv_inject_DB(ctxt.bpmatch);
+
+ /* fall through */
+ case X86EMUL_RETRY:
+diff --git a/xen/arch/x86/pv/emulate.c b/xen/arch/x86/pv/emulate.c
+index e7a1c0a2cc..8c44dea123 100644
+--- a/xen/arch/x86/pv/emulate.c
++++ b/xen/arch/x86/pv/emulate.c
+@@ -71,10 +71,15 @@ void pv_emul_instruction_done(struct cpu_user_regs *regs, unsigned long rip)
+ {
+ regs->rip = rip;
+ regs->eflags &= ~X86_EFLAGS_RF;
++
+ if ( regs->eflags & X86_EFLAGS_TF )
+ {
+- current->arch.dr6 |= DR_STEP | DR_STATUS_RESERVED_ONE;
+- pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
++ /*
++ * TODO: this should generally use TF from the start of the
++ * instruction. It's only a latent bug for now, as this path isn't
++ * used for any instruction which modifies eflags.
++ */
++ pv_inject_DB(X86_DR6_BS);
+ }
+ }
+
+diff --git a/xen/arch/x86/pv/ro-page-fault.c b/xen/arch/x86/pv/ro-page-fault.c
+index cad28ef928..d0fe07e3a1 100644
+--- a/xen/arch/x86/pv/ro-page-fault.c
++++ b/xen/arch/x86/pv/ro-page-fault.c
+@@ -390,7 +390,7 @@ int pv_ro_page_fault(unsigned long addr, struct cpu_user_regs *regs)
+ /* Fallthrough */
+ case X86EMUL_OKAY:
+ if ( ctxt.retire.singlestep )
+- pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
++ pv_inject_DB(X86_DR6_BS);
+
+ /* Fallthrough */
+ case X86EMUL_RETRY:
+diff --git a/xen/arch/x86/pv/traps.c b/xen/arch/x86/pv/traps.c
+index 83e84e2762..5a7341abf0 100644
+--- a/xen/arch/x86/pv/traps.c
++++ b/xen/arch/x86/pv/traps.c
+@@ -12,6 +12,7 @@
+ #include <xen/lib.h>
+ #include <xen/softirq.h>
+
++#include <asm/debugreg.h>
+ #include <asm/pv/trace.h>
+ #include <asm/shared.h>
+ #include <asm/traps.h>
+@@ -50,9 +51,9 @@ void pv_inject_event(const struct x86_event *event)
+ tb->cs = ti->cs;
+ tb->eip = ti->address;
+
+- if ( event->type == X86_EVENTTYPE_HW_EXCEPTION &&
+- vector == X86_EXC_PF )
++ switch ( vector | -(event->type == X86_EVENTTYPE_SW_INTERRUPT) )
+ {
++ case X86_EXC_PF:
+ curr->arch.pv.ctrlreg[2] = event->cr2;
+ arch_set_cr2(curr, event->cr2);
+
+@@ -62,9 +63,16 @@ void pv_inject_event(const struct x86_event *event)
+ error_code |= PFEC_user_mode;
+
+ trace_pv_page_fault(event->cr2, error_code);
+- }
+- else
++ break;
++
++ case X86_EXC_DB:
++ curr->arch.dr6 = x86_merge_dr6(curr->domain->arch.cpu_policy,
++ curr->arch.dr6, event->pending_dbg);
++ fallthrough;
++ default:
+ trace_pv_trap(vector, regs->rip, use_error_code, error_code);
++ break;
++ }
+
+ if ( use_error_code )
+ {
+diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
+index 78e83f6fc1..8e2df3e719 100644
+--- a/xen/arch/x86/traps.c
++++ b/xen/arch/x86/traps.c
+@@ -2032,7 +2032,7 @@ void asmlinkage do_debug(struct cpu_user_regs *regs)
+ return;
+ }
+
+- pv_inject_hw_exception(X86_EXC_DB, X86_EVENT_NO_EC);
++ pv_inject_DB(0 /* N/A, already merged */);
+ }
+
+ void asmlinkage do_entry_CP(struct cpu_user_regs *regs)
+diff --git a/xen/arch/x86/x86_emulate/x86_emulate.h b/xen/arch/x86/x86_emulate/x86_emulate.h
+index d92be69d84..e8a0e57228 100644
+--- a/xen/arch/x86/x86_emulate/x86_emulate.h
++++ b/xen/arch/x86/x86_emulate/x86_emulate.h
+@@ -78,7 +78,10 @@ struct x86_event {
+ uint8_t type; /* X86_EVENTTYPE_* */
+ uint8_t insn_len; /* Instruction length */
+ int32_t error_code; /* X86_EVENT_NO_EC if n/a */
+- unsigned long cr2; /* Only for X86_EXC_PF h/w exception */
++ union {
++ unsigned long cr2; /* #PF */
++ unsigned long pending_dbg; /* #DB (new DR6 bits, positive polarity) */
++ };
+ };
+
+ /*
+--
+2.46.1
+
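To illustrate the positive-polarity convention the commit message relies on: a caller holding a raw hardware %dr6 turns it into a pending_dbg value with a single XOR against X86_DR6_DEFAULT. A minimal standalone check of that conversion (assuming the architectural bit 14 position for the BS/single-step bit) would be:

    #include <assert.h>

    #define X86_DR6_DEFAULT 0xffff0ff0u
    #define X86_DR6_BS      (1u << 14)   /* single-step status bit */

    int main(void)
    {
        /* Raw hardware %dr6 after a single-step #DB: default bits plus BS. */
        unsigned int hw_dr6 = X86_DR6_DEFAULT | X86_DR6_BS;

        /* XOR with the default value leaves only the positive-polarity bits. */
        assert((hw_dr6 ^ X86_DR6_DEFAULT) == X86_DR6_BS);

        /* A quiescent %dr6 converts to an empty pending_dbg. */
        assert((X86_DR6_DEFAULT ^ X86_DR6_DEFAULT) == 0);
        return 0;
    }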
diff --git a/0020-x86-ucode-Further-fixes-to-identify-ucode-already-up.patch b/0020-x86-ucode-Further-fixes-to-identify-ucode-already-up.patch
deleted file mode 100644
index c00dce2..0000000
--- a/0020-x86-ucode-Further-fixes-to-identify-ucode-already-up.patch
+++ /dev/null
@@ -1,92 +0,0 @@
-From cd873f00bedca2f1afeaf13a78f70e719c5b1398 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 26 Jun 2024 13:36:13 +0200
-Subject: [PATCH 20/56] x86/ucode: Further fixes to identify "ucode already up
- to date"
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-When the revision in hardware is newer than anything Xen has to hand,
-'microcode_cache' isn't set up. Then, `xen-ucode` initiates the update
-because it doesn't know whether the revisions across the system are symmetric
-or not. This involves the patch getting all the way into the
-apply_microcode() hooks before being found to be too old.
-
-This is all a giant mess and needs an overhaul, but in the short term simply
-adjust the apply_microcode() to return -EEXIST.
-
-Also, unconditionally print the preexisting microcode revision on boot. It's
-relevant information which is otherwise unavailable if Xen doesn't find new
-microcode to use.
-
-Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 977d98e67c2e929c62aa1f495fc4c6341c45abb5
-master date: 2024-05-16 13:59:11 +0100
----
- xen/arch/x86/cpu/microcode/amd.c | 7 +++++--
- xen/arch/x86/cpu/microcode/core.c | 2 ++
- xen/arch/x86/cpu/microcode/intel.c | 7 +++++--
- 3 files changed, 12 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
-index 75fc84e445..d8f7646e88 100644
---- a/xen/arch/x86/cpu/microcode/amd.c
-+++ b/xen/arch/x86/cpu/microcode/amd.c
-@@ -222,12 +222,15 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
- uint32_t rev, old_rev = sig->rev;
- enum microcode_match_result result = microcode_fits(patch);
-
-+ if ( result == MIS_UCODE )
-+ return -EINVAL;
-+
- /*
- * Allow application of the same revision to pick up SMT-specific changes
- * even if the revision of the other SMT thread is already up-to-date.
- */
-- if ( result != NEW_UCODE && result != SAME_UCODE )
-- return -EINVAL;
-+ if ( result == OLD_UCODE )
-+ return -EEXIST;
-
- if ( check_final_patch_levels(sig) )
- {
-diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
-index d5338ad345..8a47f4471f 100644
---- a/xen/arch/x86/cpu/microcode/core.c
-+++ b/xen/arch/x86/cpu/microcode/core.c
-@@ -887,6 +887,8 @@ int __init early_microcode_init(unsigned long *module_map,
-
- ucode_ops.collect_cpu_info();
-
-+ printk(XENLOG_INFO "BSP microcode revision: 0x%08x\n", this_cpu(cpu_sig).rev);
-+
- /*
- * Some hypervisors deliberately report a microcode revision of -1 to
- * mean that they will not accept microcode updates.
-diff --git a/xen/arch/x86/cpu/microcode/intel.c b/xen/arch/x86/cpu/microcode/intel.c
-index 060c529a6e..a2d88e3ac0 100644
---- a/xen/arch/x86/cpu/microcode/intel.c
-+++ b/xen/arch/x86/cpu/microcode/intel.c
-@@ -294,10 +294,13 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
-
- result = microcode_update_match(patch);
-
-- if ( result != NEW_UCODE &&
-- !(opt_ucode_allow_same && result == SAME_UCODE) )
-+ if ( result == MIS_UCODE )
- return -EINVAL;
-
-+ if ( result == OLD_UCODE ||
-+ (result == SAME_UCODE && !opt_ucode_allow_same) )
-+ return -EEXIST;
-+
- wbinvd();
-
- wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)patch->data);
---
-2.45.2
-
diff --git a/0021-x86-msi-prevent-watchdog-triggering-when-dumping-MSI.patch b/0021-x86-msi-prevent-watchdog-triggering-when-dumping-MSI.patch
deleted file mode 100644
index 8bcc63f..0000000
--- a/0021-x86-msi-prevent-watchdog-triggering-when-dumping-MSI.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From 1ffb29d132600e6a7965c2885505615a6fd6c647 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Wed, 26 Jun 2024 13:36:52 +0200
-Subject: [PATCH 21/56] x86/msi: prevent watchdog triggering when dumping MSI
- state
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Use the same check that's used in dump_irqs().
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 594b22ca5be681ec1b42c34f321cc2600d582210
-master date: 2024-05-20 14:29:44 +0100
----
- xen/arch/x86/msi.c | 4 ++++
- 1 file changed, 4 insertions(+)
-
-diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
-index a78367d7cf..3eaeffd1e0 100644
---- a/xen/arch/x86/msi.c
-+++ b/xen/arch/x86/msi.c
-@@ -17,6 +17,7 @@
- #include <xen/param.h>
- #include <xen/pci.h>
- #include <xen/pci_regs.h>
-+#include <xen/softirq.h>
- #include <xen/iocap.h>
- #include <xen/keyhandler.h>
- #include <xen/pfn.h>
-@@ -1405,6 +1406,9 @@ static void cf_check dump_msi(unsigned char key)
- unsigned long flags;
- const char *type = "???";
-
-+ if ( !(irq & 0x1f) )
-+ process_pending_softirqs();
-+
- if ( !irq_desc_initialized(desc) )
- continue;
-
---
-2.45.2
-
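For context on the check being added in this (now dropped) patch: !(irq & 0x1f) is true once every 32 iterations, which is how the dump loop yields to softirqs often enough to keep the watchdog quiet. A toy standalone version of the pattern, with process_pending_softirqs() replaced by a counting stub, looks like this:

    #include <stdio.h>

    /* Stand-in for Xen's process_pending_softirqs(); here it just counts calls. */
    static unsigned int softirq_polls;
    static void process_pending_softirqs(void)
    {
        softirq_polls++;
    }

    int main(void)
    {
        /* Walking a large IRQ space: poll for softirqs once every 32 entries. */
        for ( unsigned int irq = 0; irq < 1024; irq++ )
        {
            if ( !(irq & 0x1f) )        /* true when irq is a multiple of 32 */
                process_pending_softirqs();

            /* ... per-IRQ dumping work would go here ... */
        }

        printf("polled %u times for 1024 entries\n", softirq_polls);  /* 32 */
        return 0;
    }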
diff --git a/0021-x86-pv-Address-Coverity-complaint-in-check_guest_io_.patch b/0021-x86-pv-Address-Coverity-complaint-in-check_guest_io_.patch
new file mode 100644
index 0000000..b951fb1
--- /dev/null
+++ b/0021-x86-pv-Address-Coverity-complaint-in-check_guest_io_.patch
@@ -0,0 +1,112 @@
+From cb6c3cfc5f8aa8bd8aae1abffea0574b02a04840 Mon Sep 17 00:00:00 2001
+From: Andrew Cooper <andrew.cooper3@citrix.com>
+Date: Tue, 24 Sep 2024 14:36:25 +0200
+Subject: [PATCH 21/35] x86/pv: Address Coverity complaint in
+ check_guest_io_breakpoint()
+
+Commit 08aacc392d86 ("x86/emul: Fix misaligned IO breakpoint behaviour in PV
+guests") caused a Coverity INTEGER_OVERFLOW complaint based on the reasoning
+that width could be 0.
+
+It can't, but digging into the code generation, GCC 8 and later (bisected on
+godbolt) choose to emit a CSWITCH lookup table, and because of the range (bottom
+2 bits clear), it's a 16-entry lookup table.
+
+So Coverity is understandable, given that GCC did emit a (dead) logic path
+where width stayed 0.
+
+Rewrite the logic. Introduce x86_bp_width() which compiles to a single basic
+block, which replaces the switch() statement. Take the opportunity to also
+make start and width be loop-scope variables.
+
+No practical change, but it should compile better and placate Coverity.
+
+Fixes: 08aacc392d86 ("x86/emul: Fix misaligned IO breakpoint behaviour in PV guests")
+Coverity-ID: 1616152
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: 6d41a9d8a12ff89adabdc286e63e9391a0481699
+master date: 2024-08-21 23:59:19 +0100
+---
+ xen/arch/x86/include/asm/debugreg.h | 25 +++++++++++++++++++++++++
+ xen/arch/x86/pv/emul-priv-op.c | 21 ++++++---------------
+ 2 files changed, 31 insertions(+), 15 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/debugreg.h b/xen/arch/x86/include/asm/debugreg.h
+index 6baa725441..23aa592e40 100644
+--- a/xen/arch/x86/include/asm/debugreg.h
++++ b/xen/arch/x86/include/asm/debugreg.h
+@@ -115,4 +115,29 @@ unsigned int x86_adj_dr7_rsvd(const struct cpu_policy *p, unsigned int dr7);
+ unsigned int x86_merge_dr6(const struct cpu_policy *p, unsigned int dr6,
+ unsigned int new);
+
++/*
++ * Calculate the width of a breakpoint from its dr7 encoding.
++ *
++ * The LEN encoding in dr7 is 2 bits wide per breakpoint and encoded as a X-1
++ * (0, 1 and 3) for widths of 1, 2 and 4 respectively in the 32bit days.
++ *
++ * In 64bit, the unused value (2) was given a meaning of width 8, which is
++ * great for efficiency but less great for nicely calculating the width.
++ */
++static inline unsigned int x86_bp_width(unsigned int dr7, unsigned int bp)
++{
++ unsigned int raw = (dr7 >> (DR_CONTROL_SHIFT +
++ DR_CONTROL_SIZE * bp + 2)) & 3;
++
++ /*
++ * If the top bit is set (i.e. we've got an 4 or 8 byte wide breakpoint),
++ * flip the bottom to reverse their order, making them sorted properly.
++ * Then it's a simple shift to calculate the width.
++ */
++ if ( raw & 2 )
++ raw ^= 1;
++
++ return 1U << raw;
++}
++
+ #endif /* _X86_DEBUGREG_H */
+diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
+index 15c83b9d23..b90f745c75 100644
+--- a/xen/arch/x86/pv/emul-priv-op.c
++++ b/xen/arch/x86/pv/emul-priv-op.c
+@@ -323,30 +323,21 @@ static unsigned int check_guest_io_breakpoint(struct vcpu *v,
+ unsigned int port,
+ unsigned int len)
+ {
+- unsigned int width, i, match = 0;
+- unsigned long start;
++ unsigned int i, match = 0;
+
+ if ( !v->arch.pv.dr7_emul || !(v->arch.pv.ctrlreg[4] & X86_CR4_DE) )
+ return 0;
+
+ for ( i = 0; i < 4; i++ )
+ {
++ unsigned long start;
++ unsigned int width;
++
+ if ( !(v->arch.pv.dr7_emul & (3 << (i * DR_ENABLE_SIZE))) )
+ continue;
+
+- start = v->arch.dr[i];
+- width = 0;
+-
+- switch ( (v->arch.dr7 >>
+- (DR_CONTROL_SHIFT + i * DR_CONTROL_SIZE)) & 0xc )
+- {
+- case DR_LEN_1: width = 1; break;
+- case DR_LEN_2: width = 2; break;
+- case DR_LEN_4: width = 4; break;
+- case DR_LEN_8: width = 8; break;
+- }
+-
+- start &= ~(width - 1UL);
++ width = x86_bp_width(v->arch.dr7, i);
++ start = v->arch.dr[i] & ~(width - 1UL);
+
+ if ( (start < (port + len)) && ((start + width) > port) )
+ match |= 1u << i;
+--
+2.46.1
+
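The width calculation is easy to verify on its own. The sketch below mirrors x86_bp_width() with assumed values for DR_CONTROL_SHIFT (16) and DR_CONTROL_SIZE (4), matching the architectural dr7 layout; it only exists to show the LEN encodings 0/1/2/3 mapping to widths 1/2/8/4.

    #include <stdio.h>

    #define DR_CONTROL_SHIFT 16  /* per-breakpoint control fields start at bit 16 */
    #define DR_CONTROL_SIZE   4  /* 2 R/W bits + 2 LEN bits per breakpoint        */

    /* Same calculation as the x86_bp_width() helper introduced above. */
    static unsigned int bp_width(unsigned int dr7, unsigned int bp)
    {
        unsigned int raw = (dr7 >> (DR_CONTROL_SHIFT + DR_CONTROL_SIZE * bp + 2)) & 3;

        if ( raw & 2 )   /* LEN encodings 2 and 3 swap so the shift works */
            raw ^= 1;

        return 1u << raw;
    }

    int main(void)
    {
        /* LEN encodings 0..3 placed in the LEN0 field (bits 18-19). */
        for ( unsigned int len = 0; len < 4; len++ )
            printf("LEN=%u -> width %u\n", len, bp_width(len << 18, 0));
        /* Prints widths 1, 2, 8, 4 for encodings 0, 1, 2, 3 respectively. */
        return 0;
    }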
diff --git a/0022-x86-irq-remove-offline-CPUs-from-old-CPU-mask-when-a.patch b/0022-x86-irq-remove-offline-CPUs-from-old-CPU-mask-when-a.patch
deleted file mode 100644
index 28fec3e..0000000
--- a/0022-x86-irq-remove-offline-CPUs-from-old-CPU-mask-when-a.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From 52e16bf065cb42b79d14ac74d701d1f9d8506430 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Wed, 26 Jun 2024 13:37:20 +0200
-Subject: [PATCH 22/56] x86/irq: remove offline CPUs from old CPU mask when
- adjusting move_cleanup_count
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-When adjusting move_cleanup_count to account for CPUs that are offline also
-adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
-those again and create an imbalance in move_cleanup_count.
-
-Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: e63209d3ba2fd1b2f232babd14c9c679ffa7b09a
-master date: 2024-06-10 10:33:22 +0200
----
- xen/arch/x86/irq.c | 8 ++++++++
- 1 file changed, 8 insertions(+)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index e07006391a..db14df93db 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -2576,6 +2576,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- desc->arch.move_cleanup_count -= cpumask_weight(affinity);
- if ( !desc->arch.move_cleanup_count )
- release_old_vec(desc);
-+ else
-+ /*
-+ * Adjust old_cpu_mask to account for the offline CPUs,
-+ * otherwise further calls to fixup_irqs() could subtract those
-+ * again and possibly underflow the counter.
-+ */
-+ cpumask_andnot(desc->arch.old_cpu_mask, desc->arch.old_cpu_mask,
-+ affinity);
- }
-
- if ( !desc->action || cpumask_subset(desc->affinity, mask) )
---
-2.45.2
-
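To see why the mask has to be pruned alongside the counter, the toy model below replays the scenario from the commit message with plain bit masks (made-up names, not the hypervisor's cpumask API). Without the cpumask_andnot() step the counter is decremented twice for the same offline CPUs and goes negative:

    #include <stdio.h>

    /* GCC builtin standing in for cpumask_weight(). */
    static unsigned int weight(unsigned int mask) { return __builtin_popcount(mask); }

    int main(void)
    {
        for ( int prune = 0; prune <= 1; prune++ )
        {
            unsigned int old_cpu_mask = 0x0c;      /* vector previously on CPUs 2,3 */
            unsigned int online       = 0x03;      /* CPUs 2,3 have gone offline    */
            int move_cleanup_count = weight(old_cpu_mask);

            for ( int pass = 0; pass < 2; pass++ ) /* two fixup_irqs() invocations */
            {
                unsigned int offline_in_old = old_cpu_mask & ~online;

                move_cleanup_count -= weight(offline_in_old);
                if ( prune )                       /* the cpumask_andnot() added above */
                    old_cpu_mask &= ~offline_in_old;
            }

            printf("prune=%d -> move_cleanup_count=%d\n", prune, move_cleanup_count);
        }
        /* Without pruning the count underflows to -2; with it, it ends at 0. */
        return 0;
    }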
diff --git a/0022-x86emul-always-set-operand-size-for-AVX-VNNI-INT8-in.patch b/0022-x86emul-always-set-operand-size-for-AVX-VNNI-INT8-in.patch
new file mode 100644
index 0000000..cf37c99
--- /dev/null
+++ b/0022-x86emul-always-set-operand-size-for-AVX-VNNI-INT8-in.patch
@@ -0,0 +1,36 @@
+From 1e68200487e662e9f8720d508a1d6b3d3e2c72b9 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:37:08 +0200
+Subject: [PATCH 22/35] x86emul: always set operand size for AVX-VNNI-INT8
+ insns
+
+Unlike for AVX-VNNI-INT16 I failed to notice that op_bytes may still be
+zero when reaching the respective case block: With the ext0f38_table[]
+entries having simd_packed_int, the defaulting at the bottom of
+x86emul_decode() won't set the field to non-zero for F3- or F2-prefixed
+insns.
+
+Fixes: 842acaa743a5 ("x86emul: support AVX-VNNI-INT8")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: d45687cca2450bfebe1dfbddb22f4f03c6fbc9cb
+master date: 2024-08-23 09:11:15 +0200
+---
+ xen/arch/x86/x86_emulate/x86_emulate.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
+index 16557385bf..4d9649a2af 100644
+--- a/xen/arch/x86/x86_emulate/x86_emulate.c
++++ b/xen/arch/x86/x86_emulate/x86_emulate.c
+@@ -6075,6 +6075,7 @@ x86_emulate(
+ case X86EMUL_OPC_VEX_F2(0x0f38, 0x51): /* vpdpbssds [xy]mm/mem,[xy]mm,[xy]mm */
+ host_and_vcpu_must_have(avx_vnni_int8);
+ generate_exception_if(vex.w, X86_EXC_UD);
++ op_bytes = 16 << vex.l;
+ goto simd_0f_ymm;
+
+ case X86EMUL_OPC_VEX_66(0x0f38, 0x50): /* vpdpbusd [xy]mm/mem,[xy]mm,[xy]mm */
+--
+2.46.1
+
diff --git a/0023-CI-Update-FreeBSD-to-13.3.patch b/0023-CI-Update-FreeBSD-to-13.3.patch
deleted file mode 100644
index 6a6e7ae..0000000
--- a/0023-CI-Update-FreeBSD-to-13.3.patch
+++ /dev/null
@@ -1,33 +0,0 @@
-From 80f2d2c2a515a6b9a4ea1b128267c6e1b5085002 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 26 Jun 2024 13:37:58 +0200
-Subject: [PATCH 23/56] CI: Update FreeBSD to 13.3
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-Acked-by: Stefano Stabellini <sstabellini@kernel.org>
-master commit: 5ea7f2c9d7a1334b3b2bd5f67fab4d447b60613d
-master date: 2024-06-11 17:00:10 +0100
----
- .cirrus.yml | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/.cirrus.yml b/.cirrus.yml
-index 63f3afb104..e961877881 100644
---- a/.cirrus.yml
-+++ b/.cirrus.yml
-@@ -17,7 +17,7 @@ freebsd_template: &FREEBSD_TEMPLATE
- task:
- name: 'FreeBSD 13'
- freebsd_instance:
-- image_family: freebsd-13-2
-+ image_family: freebsd-13-3
- << : *FREEBSD_TEMPLATE
-
- task:
---
-2.45.2
-
diff --git a/0023-x86emul-set-fake-operand-size-for-AVX512CD-broadcast.patch b/0023-x86emul-set-fake-operand-size-for-AVX512CD-broadcast.patch
new file mode 100644
index 0000000..d94f56e
--- /dev/null
+++ b/0023-x86emul-set-fake-operand-size-for-AVX512CD-broadcast.patch
@@ -0,0 +1,35 @@
+From a0d6b75b832d2f7c54429de1a550fe122bcd6881 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:37:52 +0200
+Subject: [PATCH 23/35] x86emul: set (fake) operand size for AVX512CD broadcast
+ insns
+
+Back at the time I failed to pay attention to op_bytes still being zero
+when reaching the respective case block: With the ext0f38_table[]
+entries having simd_packed_int, the defaulting at the bottom of
+x86emul_decode() won't set the field to non-zero for F3-prefixed insns.
+
+Fixes: 37ccca740c26 ("x86emul: support AVX512CD insns")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 6fa6b7feaafd622db3a2f3436750cf07782f4c12
+master date: 2024-08-23 09:12:24 +0200
+---
+ xen/arch/x86/x86_emulate/x86_emulate.c | 1 +
+ 1 file changed, 1 insertion(+)
+
+diff --git a/xen/arch/x86/x86_emulate/x86_emulate.c b/xen/arch/x86/x86_emulate/x86_emulate.c
+index 4d9649a2af..305f4286bf 100644
+--- a/xen/arch/x86/x86_emulate/x86_emulate.c
++++ b/xen/arch/x86/x86_emulate/x86_emulate.c
+@@ -5928,6 +5928,7 @@ x86_emulate(
+ evex.w == ((b >> 4) & 1)),
+ X86_EXC_UD);
+ d |= TwoOp;
++ op_bytes = 1; /* fake */
+ /* fall through */
+ case X86EMUL_OPC_EVEX_66(0x0f38, 0xc4): /* vpconflict{d,q} [xyz]mm/mem,[xyz]mm{k} */
+ fault_suppression = false;
+--
+2.46.1
+
diff --git a/0024-x86-smp-do-not-use-shorthand-IPI-destinations-in-CPU.patch b/0024-x86-smp-do-not-use-shorthand-IPI-destinations-in-CPU.patch
deleted file mode 100644
index b69c88c..0000000
--- a/0024-x86-smp-do-not-use-shorthand-IPI-destinations-in-CPU.patch
+++ /dev/null
@@ -1,98 +0,0 @@
-From 98238d49ecb149a5ac07cb8032817904c404ac2b Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Wed, 26 Jun 2024 13:38:36 +0200
-Subject: [PATCH 24/56] x86/smp: do not use shorthand IPI destinations in CPU
- hot{,un}plug contexts
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Due to the current rwlock logic, if the CPU calling get_cpu_maps() does
-so from a cpu_hotplug_{begin,done}() region the function will still
-return success, because a CPU taking the rwlock in read mode after
-having taken it in write mode is allowed. Such corner case makes using
-get_cpu_maps() alone not enough to prevent using the shorthand in CPU
-hotplug regions.
-
-Introduce a new helper to detect whether the current caller is between a
-cpu_hotplug_{begin,done}() region and use it in send_IPI_mask() to restrict
-shorthand usage.
-
-Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 171c52fba5d94e050d704770480dcb983490d0ad
-master date: 2024-06-12 14:29:31 +0200
----
- xen/arch/x86/smp.c | 2 +-
- xen/common/cpu.c | 5 +++++
- xen/include/xen/cpu.h | 10 ++++++++++
- xen/include/xen/rwlock.h | 2 ++
- 4 files changed, 18 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
-index 3a331cbdbc..340fcafb46 100644
---- a/xen/arch/x86/smp.c
-+++ b/xen/arch/x86/smp.c
-@@ -88,7 +88,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
- * the system have been accounted for.
- */
- if ( system_state > SYS_STATE_smp_boot &&
-- !unaccounted_cpus && !disabled_cpus &&
-+ !unaccounted_cpus && !disabled_cpus && !cpu_in_hotplug_context() &&
- /* NB: get_cpu_maps lock requires enabled interrupts. */
- local_irq_is_enabled() && (cpus_locked = get_cpu_maps()) &&
- (park_offline_cpus ||
-diff --git a/xen/common/cpu.c b/xen/common/cpu.c
-index 8709db4d29..6e35b114c0 100644
---- a/xen/common/cpu.c
-+++ b/xen/common/cpu.c
-@@ -68,6 +68,11 @@ void cpu_hotplug_done(void)
- write_unlock(&cpu_add_remove_lock);
- }
-
-+bool cpu_in_hotplug_context(void)
-+{
-+ return rw_is_write_locked_by_me(&cpu_add_remove_lock);
-+}
-+
- static NOTIFIER_HEAD(cpu_chain);
-
- void __init register_cpu_notifier(struct notifier_block *nb)
-diff --git a/xen/include/xen/cpu.h b/xen/include/xen/cpu.h
-index e1d4eb5967..6bf5786750 100644
---- a/xen/include/xen/cpu.h
-+++ b/xen/include/xen/cpu.h
-@@ -13,6 +13,16 @@ void put_cpu_maps(void);
- void cpu_hotplug_begin(void);
- void cpu_hotplug_done(void);
-
-+/*
-+ * Returns true when the caller CPU is between a cpu_hotplug_{begin,done}()
-+ * region.
-+ *
-+ * This is required to safely identify hotplug contexts, as get_cpu_maps()
-+ * would otherwise succeed because a caller holding the lock in write mode is
-+ * allowed to acquire the same lock in read mode.
-+ */
-+bool cpu_in_hotplug_context(void);
-+
- /* Receive notification of CPU hotplug events. */
- void register_cpu_notifier(struct notifier_block *nb);
-
-diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
-index 9e35ee2edf..dc74d1c057 100644
---- a/xen/include/xen/rwlock.h
-+++ b/xen/include/xen/rwlock.h
-@@ -309,6 +309,8 @@ static always_inline void write_lock_irq(rwlock_t *l)
-
- #define rw_is_locked(l) _rw_is_locked(l)
- #define rw_is_write_locked(l) _rw_is_write_locked(l)
-+#define rw_is_write_locked_by_me(l) \
-+ lock_evaluate_nospec(_is_write_locked_by_me(atomic_read(&(l)->cnts)))
-
-
- typedef struct percpu_rwlock percpu_rwlock_t;
---
-2.45.2
-
diff --git a/0024-x86-x2APIC-correct-cluster-tracking-upon-CPUs-going-.patch b/0024-x86-x2APIC-correct-cluster-tracking-upon-CPUs-going-.patch
new file mode 100644
index 0000000..a85c858
--- /dev/null
+++ b/0024-x86-x2APIC-correct-cluster-tracking-upon-CPUs-going-.patch
@@ -0,0 +1,52 @@
+From 404fb9b745dd3f1ca17c3e957e43e3f95ab2613a Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:38:27 +0200
+Subject: [PATCH 24/35] x86/x2APIC: correct cluster tracking upon CPUs going
+ down for S3
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Downing CPUs for S3 is somewhat special: Since we can expect the system
+to come back up in exactly the same hardware configuration, per-CPU data
+for the secondary CPUs isn't de-allocated (and then cleared upon re-
+allocation when the CPUs are being brought back up). Therefore the
+cluster_cpus per-CPU pointer will retain its value for all CPUs other
+than the final one in a cluster (i.e. in particular for all CPUs in the
+same cluster as CPU0). That, however, is in conflict with the assertion
+early in init_apic_ldr_x2apic_cluster().
+
+Note that the issue is avoided on Intel hardware, where we park CPUs
+instead of bringing them down.
+
+Extend the bypassing of the freeing to the suspend case, thus making
+suspend/resume also a tiny bit faster.
+
+Fixes: 2e6c8f182c9c ("x86: distinguish CPU offlining from CPU removal")
+Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: ad3ff7b4279d16c91c23cda6e8be5bc670b25c9a
+master date: 2024-08-26 10:30:40 +0200
+---
+ xen/arch/x86/genapic/x2apic.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/genapic/x2apic.c b/xen/arch/x86/genapic/x2apic.c
+index 371dd100c7..d531035fa4 100644
+--- a/xen/arch/x86/genapic/x2apic.c
++++ b/xen/arch/x86/genapic/x2apic.c
+@@ -228,7 +228,8 @@ static int cf_check update_clusterinfo(
+ case CPU_UP_CANCELED:
+ case CPU_DEAD:
+ case CPU_REMOVE:
+- if ( park_offline_cpus == (action != CPU_REMOVE) )
++ if ( park_offline_cpus == (action != CPU_REMOVE) ||
++ system_state == SYS_STATE_suspend )
+ break;
+ if ( per_cpu(cluster_cpus, cpu) )
+ {
+--
+2.46.1
+
diff --git a/0025-x86-dom0-disable-SMAP-for-PV-domain-building-only.patch b/0025-x86-dom0-disable-SMAP-for-PV-domain-building-only.patch
new file mode 100644
index 0000000..f431756
--- /dev/null
+++ b/0025-x86-dom0-disable-SMAP-for-PV-domain-building-only.patch
@@ -0,0 +1,145 @@
+From 743af916723eb4f1197719fc0aebd4460bafb5bf Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 24 Sep 2024 14:39:23 +0200
+Subject: [PATCH 25/35] x86/dom0: disable SMAP for PV domain building only
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Move the logic that disables SMAP so it's only performed when building a PV
+dom0, PVH dom0 builder doesn't require disabling SMAP.
+
+The fixes tag is to account for the wrong usage of cpu_has_smap in
+create_dom0(), it should instead have used
+boot_cpu_has(X86_FEATURE_XEN_SMAP). Fix while moving the logic to apply to PV
+only.
+
+While there also make cr4_pv32_mask __ro_after_init.
+
+Fixes: 493ab190e5b1 ('xen/sm{e, a}p: allow disabling sm{e, a}p for Xen itself')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: fb1658221a31ec1db33253a80001191391e73b17
+master date: 2024-08-28 19:59:07 +0100
+---
+ xen/arch/x86/include/asm/setup.h | 2 ++
+ xen/arch/x86/pv/dom0_build.c | 40 ++++++++++++++++++++++++++++----
+ xen/arch/x86/setup.c | 20 +---------------
+ 3 files changed, 38 insertions(+), 24 deletions(-)
+
+diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
+index d75589178b..8f7dfefb4d 100644
+--- a/xen/arch/x86/include/asm/setup.h
++++ b/xen/arch/x86/include/asm/setup.h
+@@ -64,6 +64,8 @@ extern bool opt_dom0_verbose;
+ extern bool opt_dom0_cpuid_faulting;
+ extern bool opt_dom0_msr_relaxed;
+
++extern unsigned long cr4_pv32_mask;
++
+ #define max_init_domid (0)
+
+ #endif
+diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
+index 57e58a02e7..07e9594493 100644
+--- a/xen/arch/x86/pv/dom0_build.c
++++ b/xen/arch/x86/pv/dom0_build.c
+@@ -354,11 +354,11 @@ static struct page_info * __init alloc_chunk(struct domain *d,
+ return page;
+ }
+
+-int __init dom0_construct_pv(struct domain *d,
+- const module_t *image,
+- unsigned long image_headroom,
+- module_t *initrd,
+- const char *cmdline)
++static int __init dom0_construct(struct domain *d,
++ const module_t *image,
++ unsigned long image_headroom,
++ module_t *initrd,
++ const char *cmdline)
+ {
+ int i, rc, order, machine;
+ bool compatible, compat;
+@@ -1051,6 +1051,36 @@ out:
+ return rc;
+ }
+
++int __init dom0_construct_pv(struct domain *d,
++ const module_t *image,
++ unsigned long image_headroom,
++ module_t *initrd,
++ const char *cmdline)
++{
++ int rc;
++
++ /*
++ * Clear SMAP in CR4 to allow user-accesses in construct_dom0(). This
++ * prevents us needing to rewrite construct_dom0() in terms of
++ * copy_{to,from}_user().
++ */
++ if ( boot_cpu_has(X86_FEATURE_XEN_SMAP) )
++ {
++ cr4_pv32_mask &= ~X86_CR4_SMAP;
++ write_cr4(read_cr4() & ~X86_CR4_SMAP);
++ }
++
++ rc = dom0_construct(d, image, image_headroom, initrd, cmdline);
++
++ if ( boot_cpu_has(X86_FEATURE_XEN_SMAP) )
++ {
++ write_cr4(read_cr4() | X86_CR4_SMAP);
++ cr4_pv32_mask |= X86_CR4_SMAP;
++ }
++
++ return rc;
++}
++
+ /*
+ * Local variables:
+ * mode: C
+diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
+index eee20bb175..f1076c7203 100644
+--- a/xen/arch/x86/setup.c
++++ b/xen/arch/x86/setup.c
+@@ -79,8 +79,7 @@ bool __read_mostly use_invpcid;
+ int8_t __initdata opt_probe_port_aliases = -1;
+ boolean_param("probe-port-aliases", opt_probe_port_aliases);
+
+-/* Only used in asm code and within this source file */
+-unsigned long asmlinkage __read_mostly cr4_pv32_mask;
++unsigned long __ro_after_init cr4_pv32_mask;
+
+ /* **** Linux config option: propagated to domain0. */
+ /* "acpi=off": Sisables both ACPI table parsing and interpreter. */
+@@ -955,26 +954,9 @@ static struct domain *__init create_dom0(const module_t *image,
+ }
+ }
+
+- /*
+- * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
+- * This saves a large number of corner cases interactions with
+- * copy_from_user().
+- */
+- if ( cpu_has_smap )
+- {
+- cr4_pv32_mask &= ~X86_CR4_SMAP;
+- write_cr4(read_cr4() & ~X86_CR4_SMAP);
+- }
+-
+ if ( construct_dom0(d, image, headroom, initrd, cmdline) != 0 )
+ panic("Could not construct domain 0\n");
+
+- if ( cpu_has_smap )
+- {
+- write_cr4(read_cr4() | X86_CR4_SMAP);
+- cr4_pv32_mask |= X86_CR4_SMAP;
+- }
+-
+ return d;
+ }
+
+--
+2.46.1
+
diff --git a/0025-x86-irq-limit-interrupt-movement-done-by-fixup_irqs.patch b/0025-x86-irq-limit-interrupt-movement-done-by-fixup_irqs.patch
deleted file mode 100644
index 7c40bba..0000000
--- a/0025-x86-irq-limit-interrupt-movement-done-by-fixup_irqs.patch
+++ /dev/null
@@ -1,104 +0,0 @@
-From ce0a0cb0a74a909abf988f242aa228acdd2917fe Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Wed, 26 Jun 2024 13:39:11 +0200
-Subject: [PATCH 25/56] x86/irq: limit interrupt movement done by fixup_irqs()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-The current check used in fixup_irqs() to decide whether to move around
-interrupts is based on the affinity mask, but such mask can have all bits set,
-and hence is unlikely to be a subset of the input mask. For example if an
-interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
-an all set CPU mask would cause that interrupt to be shuffled around
-unconditionally.
-
-What fixup_irqs() care about is evacuating interrupts from CPUs not set on the
-input CPU mask, and for that purpose it should check whether the interrupt is
-assigned to a CPU not present in the input mask. Assume that ->arch.cpu_mask
-is a subset of the ->affinity mask, and keep the current logic that resets the
-->affinity mask if the interrupt has to be shuffled around.
-
-Doing the affinity movement based on ->arch.cpu_mask requires removing the
-special handling to ->arch.cpu_mask done for high priority vectors, otherwise
-the adjustment done to cpu_mask makes them always skip the CPU interrupt
-movement.
-
-While there also adjust the comment as to the purpose of fixup_irqs().
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: c7564d7366d865cc407e3d64bca816d07edee174
-master date: 2024-06-12 14:30:40 +0200
----
- xen/arch/x86/include/asm/irq.h | 2 +-
- xen/arch/x86/irq.c | 21 +++++++++++----------
- 2 files changed, 12 insertions(+), 11 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
-index d7fb8ec7e8..71d4a8fc56 100644
---- a/xen/arch/x86/include/asm/irq.h
-+++ b/xen/arch/x86/include/asm/irq.h
-@@ -132,7 +132,7 @@ void free_domain_pirqs(struct domain *d);
- int map_domain_emuirq_pirq(struct domain *d, int pirq, int emuirq);
- int unmap_domain_pirq_emuirq(struct domain *d, int pirq);
-
--/* Reset irq affinities to match the given CPU mask. */
-+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
- void fixup_irqs(const cpumask_t *mask, bool verbose);
- void fixup_eoi(void);
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index db14df93db..566331bec1 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -2529,7 +2529,7 @@ static int __init cf_check setup_dump_irqs(void)
- }
- __initcall(setup_dump_irqs);
-
--/* Reset irq affinities to match the given CPU mask. */
-+/* Evacuate interrupts assigned to CPUs not present in the input CPU mask. */
- void fixup_irqs(const cpumask_t *mask, bool verbose)
- {
- unsigned int irq;
-@@ -2553,19 +2553,15 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
-
- vector = irq_to_vector(irq);
- if ( vector >= FIRST_HIPRIORITY_VECTOR &&
-- vector <= LAST_HIPRIORITY_VECTOR )
-+ vector <= LAST_HIPRIORITY_VECTOR &&
-+ desc->handler == &no_irq_type )
- {
-- cpumask_and(desc->arch.cpu_mask, desc->arch.cpu_mask, mask);
--
- /*
- * This can in particular happen when parking secondary threads
- * during boot and when the serial console wants to use a PCI IRQ.
- */
-- if ( desc->handler == &no_irq_type )
-- {
-- spin_unlock(&desc->lock);
-- continue;
-- }
-+ spin_unlock(&desc->lock);
-+ continue;
- }
-
- if ( desc->arch.move_cleanup_count )
-@@ -2586,7 +2582,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- affinity);
- }
-
-- if ( !desc->action || cpumask_subset(desc->affinity, mask) )
-+ /*
-+ * Avoid shuffling the interrupt around as long as current target CPUs
-+ * are a subset of the input mask. What fixup_irqs() cares about is
-+ * evacuating interrupts from CPUs not in the input mask.
-+ */
-+ if ( !desc->action || cpumask_subset(desc->arch.cpu_mask, mask) )
- {
- spin_unlock(&desc->lock);
- continue;
---
-2.45.2
-
diff --git a/0026-x86-EPT-correct-special-page-checking-in-epte_get_en.patch b/0026-x86-EPT-correct-special-page-checking-in-epte_get_en.patch
deleted file mode 100644
index c94728a..0000000
--- a/0026-x86-EPT-correct-special-page-checking-in-epte_get_en.patch
+++ /dev/null
@@ -1,46 +0,0 @@
-From 6e647efaf2b02ce92bcf80bec47c18cca5084f8a Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Wed, 26 Jun 2024 13:39:44 +0200
-Subject: [PATCH 26/56] x86/EPT: correct special page checking in
- epte_get_entry_emt()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-mfn_valid() granularity is (currently) 256Mb. Therefore the start of a
-1Gb page passing the test doesn't necessarily mean all parts of such a
-range would also pass. Yet using the result of mfn_to_page() on an MFN
-which doesn't pass mfn_valid() checking is liable to result in a crash
-(the invocation of mfn_to_page() alone is presumably "just" UB in such a
-case).
-
-Fixes: ca24b2ffdbd9 ("x86/hvm: set 'ipat' in EPT for special pages")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 5540b94e8191059eb9cbbe98ac316232a42208f6
-master date: 2024-06-13 16:53:34 +0200
----
- xen/arch/x86/mm/p2m-ept.c | 6 +++++-
- 1 file changed, 5 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
-index 85c4e8e54f..1aa6bbc771 100644
---- a/xen/arch/x86/mm/p2m-ept.c
-+++ b/xen/arch/x86/mm/p2m-ept.c
-@@ -518,8 +518,12 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
- }
-
- for ( special_pgs = i = 0; i < (1ul << order); i++ )
-- if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
-+ {
-+ mfn_t cur = mfn_add(mfn, i);
-+
-+ if ( mfn_valid(cur) && is_special_page(mfn_to_page(cur)) )
- special_pgs++;
-+ }
-
- if ( special_pgs )
- {
---
-2.45.2
-
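Some quick arithmetic behind the reasoning in this (now dropped) patch, using the 256Mb granularity quoted in its message:

    #include <stdio.h>

    int main(void)
    {
        const unsigned long frame       = 4UL << 10;    /* 4k page frame       */
        const unsigned long superpage   = 1UL << 30;    /* 1Gb EPT superpage   */
        const unsigned long granularity = 256UL << 20;  /* mfn_valid() section */

        printf("frames covered by a 1Gb entry : %lu\n", superpage / frame);       /* 262144 */
        printf("256Mb sections spanned        : %lu\n", superpage / granularity); /* 4      */
        /*
         * Checking the first MFN only vouches for the first of those sections,
         * hence the per-MFN mfn_valid() test added to the loop above.
         */
        return 0;
    }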
diff --git a/0026-x86-HVM-correct-partial-HPET_STATUS-write-emulation.patch b/0026-x86-HVM-correct-partial-HPET_STATUS-write-emulation.patch
new file mode 100644
index 0000000..3b79d84
--- /dev/null
+++ b/0026-x86-HVM-correct-partial-HPET_STATUS-write-emulation.patch
@@ -0,0 +1,37 @@
+From 6e96dee93c60af4ee446f5e0fddf3b424824de18 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:40:03 +0200
+Subject: [PATCH 26/35] x86/HVM: correct partial HPET_STATUS write emulation
+
+For partial writes the non-written parts of registers are folded into
+the full 64-bit value from what they're presently set to. That's wrong
+to do though when the behavior is write-1-to-clear: Writes not
+including the low 3 bits would unconditionally clear all ISR bits which
+are presently set. Re-calculate the value to use.
+
+Fixes: be07023be115 ("x86/vhpet: add support for level triggered interrupts")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 41d358d2f9607ba37c216effa39b9f1bc58de69d
+master date: 2024-08-29 10:02:20 +0200
+---
+ xen/arch/x86/hvm/hpet.c | 3 ++-
+ 1 file changed, 2 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
+index 87642575f9..f0e5f877f4 100644
+--- a/xen/arch/x86/hvm/hpet.c
++++ b/xen/arch/x86/hvm/hpet.c
+@@ -404,7 +404,8 @@ static int cf_check hpet_write(
+ break;
+
+ case HPET_STATUS:
+- /* write 1 to clear. */
++ /* Write 1 to clear. Therefore don't use new_val directly here. */
++ new_val = val << ((addr & 7) * 8);
+ while ( new_val )
+ {
+ bool active;
+--
+2.46.1
+
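A worked example of the partial-write handling may help here. HPET_STATUS is the general interrupt status register at offset 0x20 in the HPET register space, and the values below are made up: folding the written byte into the current contents produces spurious 1s in write-1-to-clear positions, while the recalculated value carries only what the guest actually wrote.

    #include <stdio.h>
    #include <stdint.h>

    #define HPET_STATUS 0x20  /* general interrupt status register offset */

    int main(void)
    {
        uint64_t isr = 0x5;                   /* timers 0 and 2 currently pending */
        uint64_t val = 0x04;                  /* guest writes one byte of value 4 */
        unsigned int addr = HPET_STATUS + 1;  /* ... at byte offset 1             */

        /* Wrong: fold the written byte into the current contents, then W1C. */
        uint64_t folded = (isr & ~(0xffULL << 8)) | (val << 8);
        /* Right (as in the fix): reposition only what was actually written. */
        uint64_t repositioned = val << ((addr & 7) * 8);

        printf("folded       = %#llx\n", (unsigned long long)folded);       /* 0x405 */
        printf("repositioned = %#llx\n", (unsigned long long)repositioned); /* 0x400 */
        /* W1C on 0x405 would wrongly clear ISR bits 0 and 2; 0x400 leaves them. */
        return 0;
    }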
diff --git a/0027-Arm64-adjust-__irq_to_desc-to-fix-build-with-gcc14.patch b/0027-Arm64-adjust-__irq_to_desc-to-fix-build-with-gcc14.patch
new file mode 100644
index 0000000..a95c549
--- /dev/null
+++ b/0027-Arm64-adjust-__irq_to_desc-to-fix-build-with-gcc14.patch
@@ -0,0 +1,61 @@
+From ee826bc490d6036ed9b637ada014a2d59d151f79 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:40:34 +0200
+Subject: [PATCH 27/35] Arm64: adjust __irq_to_desc() to fix build with gcc14
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+With the original code I observe
+
+In function ‘__irq_to_desc’,
+ inlined from ‘route_irq_to_guest’ at arch/arm/irq.c:465:12:
+arch/arm/irq.c:54:16: error: array subscript -2 is below array bounds of ‘irq_desc_t[32]’ {aka ‘struct irq_desc[32]’} [-Werror=array-bounds=]
+ 54 | return &this_cpu(local_irq_desc)[irq];
+ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+which looks pretty bogus: How in the world does the compiler arrive at
+-2 when compiling route_irq_to_guest()? Yet independent of that the
+function's parameter wants to be of unsigned type anyway, as shown by
+a vast majority of callers (others use plain int when they really mean
+non-negative quantities). With that adjustment the code compiles fine
+again.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Michal Orzel <michal.orzel@amd.com>
+master commit: 99f942f3d410059dc223ee0a908827e928ef3592
+master date: 2024-08-29 10:03:53 +0200
+---
+ xen/arch/arm/include/asm/irq.h | 2 +-
+ xen/arch/arm/irq.c | 2 +-
+ 2 files changed, 2 insertions(+), 2 deletions(-)
+
+diff --git a/xen/arch/arm/include/asm/irq.h b/xen/arch/arm/include/asm/irq.h
+index ec437add09..88e060bf29 100644
+--- a/xen/arch/arm/include/asm/irq.h
++++ b/xen/arch/arm/include/asm/irq.h
+@@ -56,7 +56,7 @@ extern const unsigned int nr_irqs;
+ struct irq_desc;
+ struct irqaction;
+
+-struct irq_desc *__irq_to_desc(int irq);
++struct irq_desc *__irq_to_desc(unsigned int irq);
+
+ #define irq_to_desc(irq) __irq_to_desc(irq)
+
+diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
+index 6b89f64fd1..b9757d7ad3 100644
+--- a/xen/arch/arm/irq.c
++++ b/xen/arch/arm/irq.c
+@@ -48,7 +48,7 @@ void irq_end_none(struct irq_desc *irq)
+ static irq_desc_t irq_desc[NR_IRQS];
+ static DEFINE_PER_CPU(irq_desc_t[NR_LOCAL_IRQS], local_irq_desc);
+
+-struct irq_desc *__irq_to_desc(int irq)
++struct irq_desc *__irq_to_desc(unsigned int irq)
+ {
+ if ( irq < NR_LOCAL_IRQS )
+ return &this_cpu(local_irq_desc)[irq];
+--
+2.46.1
+
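A reduced standalone illustration of the signedness point (array sizes made up): with a signed parameter the irq < NR_LOCAL_IRQS guard still admits negative values, which is what gives newer GCC room to derive a negative subscript once the helper is inlined into a caller, whereas with an unsigned parameter the same guard bounds the index to [0, NR_LOCAL_IRQS).

    #define NR_LOCAL_IRQS 32

    typedef struct irq_desc { int dummy; } irq_desc_t;

    static irq_desc_t local_irq_desc[NR_LOCAL_IRQS];   /* sizes made up for the demo */
    static irq_desc_t irq_desc[256];

    /* Signed index: the guard still admits negatives. */
    irq_desc_t *lookup_signed(int irq)
    {
        if ( irq < NR_LOCAL_IRQS )
            return &local_irq_desc[irq];
        return &irq_desc[irq - NR_LOCAL_IRQS];
    }

    /* Unsigned index: the same guard bounds irq to [0, NR_LOCAL_IRQS). */
    irq_desc_t *lookup_unsigned(unsigned int irq)
    {
        if ( irq < NR_LOCAL_IRQS )
            return &local_irq_desc[irq];
        return &irq_desc[irq - NR_LOCAL_IRQS];
    }

    int main(void)
    {
        return lookup_signed(0) == lookup_unsigned(0) ? 0 : 1;
    }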
diff --git a/0027-x86-EPT-avoid-marking-non-present-entries-for-re-con.patch b/0027-x86-EPT-avoid-marking-non-present-entries-for-re-con.patch
deleted file mode 100644
index 23e8946..0000000
--- a/0027-x86-EPT-avoid-marking-non-present-entries-for-re-con.patch
+++ /dev/null
@@ -1,85 +0,0 @@
-From d31385be5c8e8bc5efb6f8848057bd0c69e8274a Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Wed, 26 Jun 2024 13:40:11 +0200
-Subject: [PATCH 27/56] x86/EPT: avoid marking non-present entries for
- re-configuring
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-For non-present entries EMT, like most other fields, is meaningless to
-hardware. Make the logic in ept_set_entry() setting the field (and iPAT)
-conditional upon dealing with a present entry, leaving the value at 0
-otherwise. This has two effects for epte_get_entry_emt() which we'll
-want to leverage subsequently:
-1) The call moved here now won't be issued with INVALID_MFN anymore (a
- respective BUG_ON() is being added).
-2) Neither of the other two calls could now be issued with a truncated
- form of INVALID_MFN anymore (as long as there's no bug anywhere
- marking an entry present when that was populated using INVALID_MFN).
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 777c71d31325bc55ba1cc3f317d4155fe519ab0b
-master date: 2024-06-13 16:54:17 +0200
----
- xen/arch/x86/mm/p2m-ept.c | 29 ++++++++++++++++++-----------
- 1 file changed, 18 insertions(+), 11 deletions(-)
-
-diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
-index 1aa6bbc771..641d61b350 100644
---- a/xen/arch/x86/mm/p2m-ept.c
-+++ b/xen/arch/x86/mm/p2m-ept.c
-@@ -649,6 +649,8 @@ static int cf_check resolve_misconfig(struct p2m_domain *p2m, unsigned long gfn)
- if ( e.emt != MTRR_NUM_TYPES )
- break;
-
-+ ASSERT(is_epte_present(&e));
-+
- if ( level == 0 )
- {
- for ( gfn -= i, i = 0; i < EPT_PAGETABLE_ENTRIES; ++i )
-@@ -914,17 +916,6 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
-
- if ( mfn_valid(mfn) || p2m_allows_invalid_mfn(p2mt) )
- {
-- bool ipat;
-- int emt = epte_get_entry_emt(p2m->domain, _gfn(gfn), mfn,
-- i * EPT_TABLE_ORDER, &ipat,
-- p2mt);
--
-- if ( emt >= 0 )
-- new_entry.emt = emt;
-- else /* ept_handle_misconfig() will need to take care of this. */
-- new_entry.emt = MTRR_NUM_TYPES;
--
-- new_entry.ipat = ipat;
- new_entry.sp = !!i;
- new_entry.sa_p2mt = p2mt;
- new_entry.access = p2ma;
-@@ -940,6 +931,22 @@ ept_set_entry(struct p2m_domain *p2m, gfn_t gfn_, mfn_t mfn,
- need_modify_vtd_table = 0;
-
- ept_p2m_type_to_flags(p2m, &new_entry);
-+
-+ if ( is_epte_present(&new_entry) )
-+ {
-+ bool ipat;
-+ int emt = epte_get_entry_emt(p2m->domain, _gfn(gfn), mfn,
-+ i * EPT_TABLE_ORDER, &ipat,
-+ p2mt);
-+
-+ BUG_ON(mfn_eq(mfn, INVALID_MFN));
-+
-+ if ( emt >= 0 )
-+ new_entry.emt = emt;
-+ else /* ept_handle_misconfig() will need to take care of this. */
-+ new_entry.emt = MTRR_NUM_TYPES;
-+ new_entry.ipat = ipat;
-+ }
- }
-
- if ( sve != -1 )
---
-2.45.2
-
diff --git a/0028-libxl-Fix-nul-termination-of-the-return-value-of-lib.patch b/0028-libxl-Fix-nul-termination-of-the-return-value-of-lib.patch
new file mode 100644
index 0000000..7f43c74
--- /dev/null
+++ b/0028-libxl-Fix-nul-termination-of-the-return-value-of-lib.patch
@@ -0,0 +1,100 @@
+From c18635fd69fc2da238f00a26ab707f1b2a50bf64 Mon Sep 17 00:00:00 2001
+From: Javi Merino <javi.merino@cloud.com>
+Date: Tue, 24 Sep 2024 14:41:06 +0200
+Subject: [PATCH 28/35] libxl: Fix nul-termination of the return value of
+ libxl_xen_console_read_line()
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When built with ASAN, "xl dmesg" crashes in the "printf("%s", line)"
+call in main_dmesg(). ASAN reports a heap buffer overflow: an
+off-by-one access to cr->buffer.
+
+The readconsole sysctl copies up to count characters into the buffer,
+but it does not add a null character at the end. Despite the
+documentation of libxl_xen_console_read_line(), line_r is not
+nul-terminated if 16384 characters were copied to the buffer.
+
+Fix this by asking xc_readconsolering() to fill the buffer up to size
+- 1. As the number of characters in the buffer is only needed in
+libxl_xen_console_read_line(), make it a local variable there instead
+of part of the libxl__xen_console_reader struct.
+
+Fixes: 4024bae739cc ("xl: Add subcommand 'xl dmesg'")
+Reported-by: Edwin Török <edwin.torok@cloud.com>
+Signed-off-by: Javi Merino <javi.merino@cloud.com>
+Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: bb03169bcb6ecccf372de1f6b9285cd519a26bb8
+master date: 2024-09-03 10:53:44 +0100
+---
+ tools/libs/light/libxl_console.c | 19 +++++++++++++++----
+ tools/libs/light/libxl_internal.h | 1 -
+ 2 files changed, 15 insertions(+), 5 deletions(-)
+
+diff --git a/tools/libs/light/libxl_console.c b/tools/libs/light/libxl_console.c
+index a563c9d3c7..9f736b8913 100644
+--- a/tools/libs/light/libxl_console.c
++++ b/tools/libs/light/libxl_console.c
+@@ -774,12 +774,17 @@ libxl_xen_console_reader *
+ {
+ GC_INIT(ctx);
+ libxl_xen_console_reader *cr;
+- unsigned int size = 16384;
++ /*
++ * We want xen to fill the buffer in as few hypercalls as
++ * possible, but xen will not nul-terminate it. The default size
++ * of Xen's console buffer is 16384. Leave one byte at the end
++ * for the null character.
++ */
++ unsigned int size = 16384 + 1;
+
+ cr = libxl__zalloc(NOGC, sizeof(libxl_xen_console_reader));
+ cr->buffer = libxl__zalloc(NOGC, size);
+ cr->size = size;
+- cr->count = size;
+ cr->clear = clear;
+ cr->incremental = 1;
+
+@@ -800,10 +805,16 @@ int libxl_xen_console_read_line(libxl_ctx *ctx,
+ char **line_r)
+ {
+ int ret;
++ /*
++ * Number of chars to copy into the buffer. xc_readconsolering()
++ * does not add a null character at the end, so leave a space for
++ * us to add it.
++ */
++ unsigned int nr_chars = cr->size - 1;
+ GC_INIT(ctx);
+
+ memset(cr->buffer, 0, cr->size);
+- ret = xc_readconsolering(ctx->xch, cr->buffer, &cr->count,
++ ret = xc_readconsolering(ctx->xch, cr->buffer, &nr_chars,
+ cr->clear, cr->incremental, &cr->index);
+ if (ret < 0) {
+ LOGE(ERROR, "reading console ring buffer");
+@@ -811,7 +822,7 @@ int libxl_xen_console_read_line(libxl_ctx *ctx,
+ return ERROR_FAIL;
+ }
+ if (!ret) {
+- if (cr->count) {
++ if (nr_chars) {
+ *line_r = cr->buffer;
+ ret = 1;
+ } else {
+diff --git a/tools/libs/light/libxl_internal.h b/tools/libs/light/libxl_internal.h
+index 3b58bb2d7f..96d14f5746 100644
+--- a/tools/libs/light/libxl_internal.h
++++ b/tools/libs/light/libxl_internal.h
+@@ -2077,7 +2077,6 @@ _hidden char *libxl__uuid2string(libxl__gc *gc, const libxl_uuid uuid);
+ struct libxl__xen_console_reader {
+ char *buffer;
+ unsigned int size;
+- unsigned int count;
+ unsigned int clear;
+ unsigned int incremental;
+ unsigned int index;
+--
+2.46.1
+
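As an aside to the libxl patch above, the fix is an instance of the usual
"reserve one byte for the terminator" pattern when a producer fills a buffer
without nul-terminating it: allocate one byte more than the producer may
write, zero the buffer, and cap the producer at size - 1. A minimal
standalone C sketch of that pattern (fill_ring() is a made-up stand-in for
the console ring read, not the real libxc call):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in producer: copies up to *nr bytes into buf and, like the
     * console ring hypercall, never writes a trailing '\0'. */
    static int fill_ring(char *buf, unsigned int *nr)
    {
        static const char data[] = "console ring contents";
        unsigned int n = sizeof(data) - 1;

        if ( n > *nr )
            n = *nr;
        memcpy(buf, data, n);
        *nr = n;
        return 0;
    }

    int main(void)
    {
        unsigned int size = 16384 + 1;   /* one spare byte for '\0' */
        unsigned int nr = size - 1;      /* producer may fill at most size - 1 */
        char *buf = calloc(1, size);     /* zeroed, so the spare byte stays 0 */

        if ( buf && fill_ring(buf, &nr) == 0 )
            printf("%s\n", buf);         /* safe: always nul-terminated */
        free(buf);
        return 0;
    }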
diff --git a/0028-x86-EPT-drop-questionable-mfn_valid-from-epte_get_en.patch b/0028-x86-EPT-drop-questionable-mfn_valid-from-epte_get_en.patch
deleted file mode 100644
index ee495d4..0000000
--- a/0028-x86-EPT-drop-questionable-mfn_valid-from-epte_get_en.patch
+++ /dev/null
@@ -1,47 +0,0 @@
-From 3b777c2ce4ea8cf67b79a5496e51201145606798 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Wed, 26 Jun 2024 13:40:35 +0200
-Subject: [PATCH 28/56] x86/EPT: drop questionable mfn_valid() from
- epte_get_entry_emt()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-mfn_valid() is RAM-focused; it will often return false for MMIO. Yet
-access to actual MMIO space should not generally be restricted to UC
-only; especially video frame buffer accesses are unduly affected by such
-a restriction.
-
-Since, as of 777c71d31325 ("x86/EPT: avoid marking non-present entries
-for re-configuring"), the function won't be called with INVALID_MFN or,
-worse, truncated forms thereof anymore, we can fully drop that check.
-
-Fixes: 81fd0d3ca4b2 ("x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 4fdd8d75566fdad06667a79ec0ce6f43cc466c54
-master date: 2024-06-13 16:55:22 +0200
----
- xen/arch/x86/mm/p2m-ept.c | 6 ------
- 1 file changed, 6 deletions(-)
-
-diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
-index 641d61b350..d325424e97 100644
---- a/xen/arch/x86/mm/p2m-ept.c
-+++ b/xen/arch/x86/mm/p2m-ept.c
-@@ -500,12 +500,6 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
- return -1;
- }
-
-- if ( !mfn_valid(mfn) )
-- {
-- *ipat = true;
-- return X86_MT_UC;
-- }
--
- /*
- * Conditional must be kept in sync with the code in
- * {iomem,ioports}_{permit,deny}_access().
---
-2.45.2
-
diff --git a/0029-SUPPORT.md-split-XSM-from-Flask.patch b/0029-SUPPORT.md-split-XSM-from-Flask.patch
new file mode 100644
index 0000000..4cf9adb
--- /dev/null
+++ b/0029-SUPPORT.md-split-XSM-from-Flask.patch
@@ -0,0 +1,66 @@
+From 3ceb79ceabab58305a0f35aed0117537f7a6b922 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:41:51 +0200
+Subject: [PATCH 29/35] SUPPORT.md: split XSM from Flask
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+XSM is a generic framework, which in particular is also used by SILO.
+With this it can't really be experimental: Arm mandates SILO for having
+a security supported configuration.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
+master commit: d7c18b8720824d7efc39ffa7296751e1812865a9
+master date: 2024-09-04 16:05:03 +0200
+---
+ SUPPORT.md | 19 +++++++++++++++++--
+ 1 file changed, 17 insertions(+), 2 deletions(-)
+
+diff --git a/SUPPORT.md b/SUPPORT.md
+index 1d8b38cbd0..ba6052477b 100644
+--- a/SUPPORT.md
++++ b/SUPPORT.md
+@@ -768,13 +768,21 @@ Compile time disabled for ARM by default.
+
+ Status, x86: Supported, not security supported
+
+-### XSM & FLASK
++### XSM (Xen Security Module) Framework
++
++XSM is a security policy framework. The dummy implementation is covered by this
++statement, and implements a policy whereby dom0 is all powerful. See below for
++alternative modules (FLASK, SILO).
++
++ Status: Supported
++
++### FLASK XSM Module
+
+ Status: Experimental
+
+ Compile time disabled by default.
+
+-Also note that using XSM
++Also note that using FLASK
+ to delegate various domain control hypercalls
+ to particular other domains, rather than only permitting use by dom0,
+ is also specifically excluded from security support for many hypercalls.
+@@ -787,6 +795,13 @@ Please see XSA-77 for more details.
+ The default policy includes FLASK labels and roles for a "typical" Xen-based system
+ with dom0, driver domains, stub domains, domUs, and so on.
+
++### SILO XSM Module
++
++SILO extends the dummy policy by enforcing that DomU-s can only communicate
++with Dom0, yet not with each other.
++
++ Status: Supported
++
+ ## Virtual Hardware, Hypervisor
+
+ ### x86/Nested PV
+--
+2.46.1
+
diff --git a/0029-x86-Intel-unlock-CPUID-earlier-for-the-BSP.patch b/0029-x86-Intel-unlock-CPUID-earlier-for-the-BSP.patch
deleted file mode 100644
index 6722508..0000000
--- a/0029-x86-Intel-unlock-CPUID-earlier-for-the-BSP.patch
+++ /dev/null
@@ -1,105 +0,0 @@
-From c4b284912695a5802433512b913e968eda01544f Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Wed, 26 Jun 2024 13:41:05 +0200
-Subject: [PATCH 29/56] x86/Intel: unlock CPUID earlier for the BSP
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
-this bit is set by the BIOS then CPUID evaluation does not work when
-data from any leaf greater than two is needed; early_cpu_init() in
-particular wants to collect leaf 7 data.
-
-Cure this by unlocking CPUID right before evaluating anything which
-depends on the maximum CPUID leaf being greater than two.
-
-Inspired by (and description cloned from) Linux commit 0c2f6d04619e
-("x86/topology/intel: Unlock CPUID before evaluating anything").
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: fa4d026737a47cd1d66ffb797a29150b4453aa9f
-master date: 2024-06-18 15:12:44 +0200
----
- xen/arch/x86/cpu/common.c | 3 ++-
- xen/arch/x86/cpu/cpu.h | 2 ++
- xen/arch/x86/cpu/intel.c | 29 +++++++++++++++++------------
- 3 files changed, 21 insertions(+), 13 deletions(-)
-
-diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
-index 26eed2ade1..edec0a2546 100644
---- a/xen/arch/x86/cpu/common.c
-+++ b/xen/arch/x86/cpu/common.c
-@@ -336,7 +336,8 @@ void __init early_cpu_init(bool verbose)
-
- c->x86_vendor = x86_cpuid_lookup_vendor(ebx, ecx, edx);
- switch (c->x86_vendor) {
-- case X86_VENDOR_INTEL: actual_cpu = intel_cpu_dev; break;
-+ case X86_VENDOR_INTEL: intel_unlock_cpuid_leaves(c);
-+ actual_cpu = intel_cpu_dev; break;
- case X86_VENDOR_AMD: actual_cpu = amd_cpu_dev; break;
- case X86_VENDOR_CENTAUR: actual_cpu = centaur_cpu_dev; break;
- case X86_VENDOR_SHANGHAI: actual_cpu = shanghai_cpu_dev; break;
-diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
-index e3d06278b3..8be65e975a 100644
---- a/xen/arch/x86/cpu/cpu.h
-+++ b/xen/arch/x86/cpu/cpu.h
-@@ -24,3 +24,5 @@ void amd_init_lfence(struct cpuinfo_x86 *c);
- void amd_init_ssbd(const struct cpuinfo_x86 *c);
- void amd_init_spectral_chicken(void);
- void detect_zen2_null_seg_behaviour(void);
-+
-+void intel_unlock_cpuid_leaves(struct cpuinfo_x86 *c);
-diff --git a/xen/arch/x86/cpu/intel.c b/xen/arch/x86/cpu/intel.c
-index deb7b70464..0dc7c27601 100644
---- a/xen/arch/x86/cpu/intel.c
-+++ b/xen/arch/x86/cpu/intel.c
-@@ -303,10 +303,24 @@ static void __init noinline intel_init_levelling(void)
- ctxt_switch_masking = intel_ctxt_switch_masking;
- }
-
--static void cf_check early_init_intel(struct cpuinfo_x86 *c)
-+/* Unmask CPUID levels if masked. */
-+void intel_unlock_cpuid_leaves(struct cpuinfo_x86 *c)
- {
-- u64 misc_enable, disable;
-+ uint64_t misc_enable, disable;
-+
-+ rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
-+
-+ disable = misc_enable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID;
-+ if (disable) {
-+ wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
-+ bootsym(trampoline_misc_enable_off) |= disable;
-+ c->cpuid_level = cpuid_eax(0);
-+ printk(KERN_INFO "revised cpuid level: %u\n", c->cpuid_level);
-+ }
-+}
-
-+static void cf_check early_init_intel(struct cpuinfo_x86 *c)
-+{
- /* Netburst reports 64 bytes clflush size, but does IO in 128 bytes */
- if (c->x86 == 15 && c->x86_cache_alignment == 64)
- c->x86_cache_alignment = 128;
-@@ -315,16 +329,7 @@ static void cf_check early_init_intel(struct cpuinfo_x86 *c)
- bootsym(trampoline_misc_enable_off) & MSR_IA32_MISC_ENABLE_XD_DISABLE)
- printk(KERN_INFO "re-enabled NX (Execute Disable) protection\n");
-
-- /* Unmask CPUID levels and NX if masked: */
-- rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
--
-- disable = misc_enable & MSR_IA32_MISC_ENABLE_LIMIT_CPUID;
-- if (disable) {
-- wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable);
-- bootsym(trampoline_misc_enable_off) |= disable;
-- printk(KERN_INFO "revised cpuid level: %d\n",
-- cpuid_eax(0));
-- }
-+ intel_unlock_cpuid_leaves(c);
-
- /* CPUID workaround for Intel 0F33/0F34 CPU */
- if (boot_cpu_data.x86 == 0xF && boot_cpu_data.x86_model == 3 &&
---
-2.45.2
-
diff --git a/0030-x86-fix-UP-build-with-gcc14.patch b/0030-x86-fix-UP-build-with-gcc14.patch
new file mode 100644
index 0000000..bdb7dbe
--- /dev/null
+++ b/0030-x86-fix-UP-build-with-gcc14.patch
@@ -0,0 +1,63 @@
+From d625c4e9fb46ef1b81a5b32d8fe1774c432cddd6 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:41:59 +0200
+Subject: [PATCH 30/35] x86: fix UP build with gcc14
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+The complaint is:
+
+In file included from ././include/xen/config.h:17,
+ from <command-line>:
+arch/x86/smpboot.c: In function ‘link_thread_siblings.constprop’:
+./include/asm-generic/percpu.h:16:51: error: array subscript [0, 0] is outside array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds=]
+ 16 | (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
+./include/xen/compiler.h:140:29: note: in definition of macro ‘RELOC_HIDE’
+ 140 | (typeof(ptr)) (__ptr + (off)); })
+ | ^~~
+arch/x86/smpboot.c:238:27: note: in expansion of macro ‘per_cpu’
+ 238 | cpumask_set_cpu(cpu2, per_cpu(cpu_sibling_mask, cpu1));
+ | ^~~~~~~
+In file included from ./arch/x86/include/generated/asm/percpu.h:1,
+ from ./include/xen/percpu.h:30,
+ from ./arch/x86/include/asm/cpuid.h:9,
+ from ./arch/x86/include/asm/cpufeature.h:11,
+ from ./arch/x86/include/asm/system.h:6,
+ from ./include/xen/list.h:11,
+ from ./include/xen/mm.h:68,
+ from arch/x86/smpboot.c:12:
+./include/asm-generic/percpu.h:12:22: note: while referencing ‘__per_cpu_offset’
+ 12 | extern unsigned long __per_cpu_offset[NR_CPUS];
+ | ^~~~~~~~~~~~~~~~
+
+Which I consider bogus in the first place ("array subscript [0, 0]" vs a
+1-element array). Yet taking the experience from 99f942f3d410 ("Arm64:
+adjust __irq_to_desc() to fix build with gcc14") I guessed that
+switching function parameters to unsigned int (which they should have
+been anyway) might help. And voilà ...
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: a2de7dc4d845738e734b10fce6550c89c6b1092c
+master date: 2024-09-04 16:09:28 +0200
+---
+ xen/arch/x86/smpboot.c | 2 +-
+ 1 file changed, 1 insertion(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
+index 8aa621533f..0a89f22a39 100644
+--- a/xen/arch/x86/smpboot.c
++++ b/xen/arch/x86/smpboot.c
+@@ -226,7 +226,7 @@ static int booting_cpu;
+ /* CPUs for which sibling maps can be computed. */
+ static cpumask_t cpu_sibling_setup_map;
+
+-static void link_thread_siblings(int cpu1, int cpu2)
++static void link_thread_siblings(unsigned int cpu1, unsigned int cpu2)
+ {
+ cpumask_set_cpu(cpu1, per_cpu(cpu_sibling_mask, cpu2));
+ cpumask_set_cpu(cpu2, per_cpu(cpu_sibling_mask, cpu1));
+--
+2.46.1
+
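A loose reduction of the reasoning in the commit message above, for
illustration only (it is not a faithful reproducer of the gcc 14 diagnostic,
which involves constant propagation into link_thread_siblings()): on a UP
build the per-CPU offset array has exactly one element, and an unsigned CPU
parameter narrows the value range the compiler has to reason about when such
an array is indexed.

    /* Toy model, not Xen code: NR_CPUS_DEMO plays the role of NR_CPUS == 1. */
    #define NR_CPUS_DEMO 1

    unsigned long per_cpu_offset_demo[NR_CPUS_DEMO];

    unsigned long get_offset_demo(unsigned int cpu)   /* previously: int cpu */
    {
        return per_cpu_offset_demo[cpu];
    }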
diff --git a/0030-x86-irq-deal-with-old_cpu_mask-for-interrupts-in-mov.patch b/0030-x86-irq-deal-with-old_cpu_mask-for-interrupts-in-mov.patch
deleted file mode 100644
index 785df10..0000000
--- a/0030-x86-irq-deal-with-old_cpu_mask-for-interrupts-in-mov.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From 39a6170c15bf369a2b26c855ea7621387ed4070b Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Wed, 26 Jun 2024 13:41:35 +0200
-Subject: [PATCH 30/56] x86/irq: deal with old_cpu_mask for interrupts in
- movement in fixup_irqs()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Given the current logic it's possible for ->arch.old_cpu_mask to get out of
-sync: if a CPU set in old_cpu_mask is offlined and then onlined
-again without old_cpu_mask having been updated the data in the mask will no
-longer be accurate, as when brought back online the CPU will no longer have
-old_vector configured to handle the old interrupt source.
-
-If there's an interrupt movement in progress, and the to be offlined CPU (which
-is the call context) is in the old_cpu_mask, clear it and update the mask, so
-it doesn't contain stale data.
-
-Note that when the system is going down fixup_irqs() will be called by
-smp_send_stop() from CPU 0 with a mask with only CPU 0 on it, effectively
-asking to move all interrupts to the current caller (CPU 0) which is the only
-CPU to remain online. In that case we don't care to migrate interrupts that
-are in the process of being moved, as it's likely we won't be able to move all
-interrupts to CPU 0 due to vector shortage anyway.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 817d1cd627be668c358d038f0fadbf7d24d417d3
-master date: 2024-06-18 15:14:49 +0200
----
- xen/arch/x86/irq.c | 29 ++++++++++++++++++++++++++++-
- 1 file changed, 28 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index 566331bec1..f877327975 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -2539,7 +2539,7 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- for ( irq = 0; irq < nr_irqs; irq++ )
- {
- bool break_affinity = false, set_affinity = true;
-- unsigned int vector;
-+ unsigned int vector, cpu = smp_processor_id();
- cpumask_t *affinity = this_cpu(scratch_cpumask);
-
- if ( irq == 2 )
-@@ -2582,6 +2582,33 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- affinity);
- }
-
-+ if ( desc->arch.move_in_progress &&
-+ /*
-+ * Only attempt to adjust the mask if the current CPU is going
-+ * offline, otherwise the whole system is going down and leaving
-+ * stale data in the masks is fine.
-+ */
-+ !cpu_online(cpu) &&
-+ cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
-+ {
-+ /*
-+ * This CPU is going offline, remove it from ->arch.old_cpu_mask
-+ * and possibly release the old vector if the old mask becomes
-+ * empty.
-+ *
-+ * Note cleaning ->arch.old_cpu_mask is required if the CPU is
-+ * brought offline and then online again, as when re-onlined the
-+ * per-cpu vector table will no longer have ->arch.old_vector
-+ * setup, and hence ->arch.old_cpu_mask would be stale.
-+ */
-+ cpumask_clear_cpu(cpu, desc->arch.old_cpu_mask);
-+ if ( cpumask_empty(desc->arch.old_cpu_mask) )
-+ {
-+ desc->arch.move_in_progress = 0;
-+ release_old_vec(desc);
-+ }
-+ }
-+
- /*
- * Avoid shuffling the interrupt around as long as current target CPUs
- * are a subset of the input mask. What fixup_irqs() cares about is
---
-2.45.2
-
diff --git a/0031-x86-irq-handle-moving-interrupts-in-_assign_irq_vect.patch b/0031-x86-irq-handle-moving-interrupts-in-_assign_irq_vect.patch
deleted file mode 100644
index 96e87cd..0000000
--- a/0031-x86-irq-handle-moving-interrupts-in-_assign_irq_vect.patch
+++ /dev/null
@@ -1,172 +0,0 @@
-From 3a8f4ec75d8ed8da6370deac95c341cbada96802 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Wed, 26 Jun 2024 13:42:05 +0200
-Subject: [PATCH 31/56] x86/irq: handle moving interrupts in
- _assign_irq_vector()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Currently there's logic in fixup_irqs() that attempts to prevent
-_assign_irq_vector() from failing, as fixup_irqs() is required to evacuate all
-interrupts from the CPUs not present in the input mask. The current logic in
-fixup_irqs() is incomplete, as it doesn't deal with interrupts that have
-move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.
-
-Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so that
-_assign_irq_vector() cannot fail, introduce logic in _assign_irq_vector()
-to deal with interrupts that have either move_{in_progress,cleanup_count} set
-and no remaining online CPUs in ->arch.cpu_mask.
-
-If _assign_irq_vector() is requested to move an interrupt in the state
-described above, first attempt to see if ->arch.old_cpu_mask contains any valid
-CPUs that could be used as fallback, and if that's the case do move the
-interrupt back to the previous destination. Note this is easier because the
-vector hasn't been released yet, so there's no need to allocate and setup a new
-vector on the destination.
-
-Due to the logic in fixup_irqs() that clears offline CPUs from
-->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) it
-shouldn't be possible to get into _assign_irq_vector() with
-->arch.move_{in_progress,cleanup_count} set but no online CPUs in
-->arch.old_cpu_mask.
-
-However if ->arch.move_{in_progress,cleanup_count} is set and the interrupt has
-also changed affinity, it's possible the members of ->arch.old_cpu_mask are no
-longer part of the affinity set, move the interrupt to a different CPU part of
-the provided mask and keep the current ->arch.old_{cpu_mask,vector} for the
-pending interrupt movement to be completed.
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 369558924a642bbb0cb731e9a3375958867cb17b
-master date: 2024-06-18 15:15:10 +0200
----
- xen/arch/x86/irq.c | 97 ++++++++++++++++++++++++++++++++--------------
- 1 file changed, 68 insertions(+), 29 deletions(-)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index f877327975..13ef61a5b7 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -553,7 +553,58 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
- }
-
- if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
-- return -EAGAIN;
-+ {
-+ /*
-+ * If the current destination is online refuse to shuffle. Retry after
-+ * the in-progress movement has finished.
-+ */
-+ if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) )
-+ return -EAGAIN;
-+
-+ /*
-+ * Due to the logic in fixup_irqs() that clears offlined CPUs from
-+ * ->arch.old_cpu_mask it shouldn't be possible to get here with
-+ * ->arch.move_{in_progress,cleanup_count} set and no online CPUs in
-+ * ->arch.old_cpu_mask.
-+ */
-+ ASSERT(valid_irq_vector(desc->arch.old_vector));
-+ ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, &cpu_online_map));
-+
-+ if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
-+ {
-+ /*
-+ * Fallback to the old destination if moving is in progress and the
-+ * current destination is to be offlined. This is only possible if
-+ * the CPUs in old_cpu_mask intersect with the affinity mask passed
-+ * in the 'mask' parameter.
-+ */
-+ desc->arch.vector = desc->arch.old_vector;
-+ cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, mask);
-+
-+ /* Undo any possibly done cleanup. */
-+ for_each_cpu(cpu, desc->arch.cpu_mask)
-+ per_cpu(vector_irq, cpu)[desc->arch.vector] = irq;
-+
-+ /* Cancel the pending move and release the current vector. */
-+ desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
-+ cpumask_clear(desc->arch.old_cpu_mask);
-+ desc->arch.move_in_progress = 0;
-+ desc->arch.move_cleanup_count = 0;
-+ if ( desc->arch.used_vectors )
-+ {
-+ ASSERT(test_bit(old_vector, desc->arch.used_vectors));
-+ clear_bit(old_vector, desc->arch.used_vectors);
-+ }
-+
-+ return 0;
-+ }
-+
-+ /*
-+ * There's an interrupt movement in progress but the destination(s) in
-+ * ->arch.old_cpu_mask are not suitable given the 'mask' parameter, go
-+ * through the full logic to find a new vector in a suitable CPU.
-+ */
-+ }
-
- err = -ENOSPC;
-
-@@ -609,7 +660,22 @@ next:
- current_vector = vector;
- current_offset = offset;
-
-- if ( valid_irq_vector(old_vector) )
-+ if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
-+ {
-+ ASSERT(!cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map));
-+ /*
-+ * Special case when evacuating an interrupt from a CPU to be
-+ * offlined and the interrupt was already in the process of being
-+ * moved. Leave ->arch.old_{vector,cpu_mask} as-is and just
-+ * replace ->arch.{cpu_mask,vector} with the new destination.
-+ * Cleanup will be done normally for the old fields, just release
-+ * the current vector here.
-+ */
-+ if ( desc->arch.used_vectors &&
-+ !test_and_clear_bit(old_vector, desc->arch.used_vectors) )
-+ ASSERT_UNREACHABLE();
-+ }
-+ else if ( valid_irq_vector(old_vector) )
- {
- cpumask_and(desc->arch.old_cpu_mask, desc->arch.cpu_mask,
- &cpu_online_map);
-@@ -2620,33 +2686,6 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- continue;
- }
-
-- /*
-- * In order for the affinity adjustment below to be successful, we
-- * need _assign_irq_vector() to succeed. This in particular means
-- * clearing desc->arch.move_in_progress if this would otherwise
-- * prevent the function from succeeding. Since there's no way for the
-- * flag to get cleared anymore when there's no possible destination
-- * left (the only possibility then would be the IRQs enabled window
-- * after this loop), there's then also no race with us doing it here.
-- *
-- * Therefore the logic here and there need to remain in sync.
-- */
-- if ( desc->arch.move_in_progress &&
-- !cpumask_intersects(mask, desc->arch.cpu_mask) )
-- {
-- unsigned int cpu;
--
-- cpumask_and(affinity, desc->arch.old_cpu_mask, &cpu_online_map);
--
-- spin_lock(&vector_lock);
-- for_each_cpu(cpu, affinity)
-- per_cpu(vector_irq, cpu)[desc->arch.old_vector] = ~irq;
-- spin_unlock(&vector_lock);
--
-- release_old_vec(desc);
-- desc->arch.move_in_progress = 0;
-- }
--
- if ( !cpumask_intersects(mask, desc->affinity) )
- {
- break_affinity = true;
---
-2.45.2
-
diff --git a/0031-x86emul-test-fix-build-with-gas-2.43.patch b/0031-x86emul-test-fix-build-with-gas-2.43.patch
new file mode 100644
index 0000000..fe30e10
--- /dev/null
+++ b/0031-x86emul-test-fix-build-with-gas-2.43.patch
@@ -0,0 +1,86 @@
+From 78d412f8bc3d78458cd868ba375ad30175194d91 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:42:39 +0200
+Subject: [PATCH 31/35] x86emul/test: fix build with gas 2.43
+
+Drop explicit {evex} pseudo-prefixes. New gas (validly) complains when
+they're used on things other than instructions. Our use was potentially
+ahead of macro invocations - see simd.h's "override" macro.
+
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: 3c09288298af881ea1bb568740deb2d2a06bcd41
+master date: 2024-09-06 08:41:18 +0200
+---
+ tools/tests/x86_emulator/simd.c | 14 +++++++-------
+ 1 file changed, 7 insertions(+), 7 deletions(-)
+
+diff --git a/tools/tests/x86_emulator/simd.c b/tools/tests/x86_emulator/simd.c
+index 263cea662d..d68a7364c2 100644
+--- a/tools/tests/x86_emulator/simd.c
++++ b/tools/tests/x86_emulator/simd.c
+@@ -333,7 +333,7 @@ static inline vec_t movlhps(vec_t x, vec_t y) {
+ # if FLOAT_SIZE == 4
+ # define broadcast(x) ({ \
+ vec_t t_; \
+- asm ( "%{evex%} vbroadcastss %1, %0" \
++ asm ( "vbroadcastss %1, %0" \
+ : "=v" (t_) : "m" (*(float[1]){ x }) ); \
+ t_; \
+ })
+@@ -401,14 +401,14 @@ static inline vec_t movlhps(vec_t x, vec_t y) {
+ # if VEC_SIZE >= 32
+ # define broadcast(x) ({ \
+ vec_t t_; \
+- asm ( "%{evex%} vbroadcastsd %1, %0" : "=v" (t_) \
++ asm ( "vbroadcastsd %1, %0" : "=v" (t_) \
+ : "m" (*(double[1]){ x }) ); \
+ t_; \
+ })
+ # else
+ # define broadcast(x) ({ \
+ vec_t t_; \
+- asm ( "%{evex%} vpbroadcastq %1, %0" \
++ asm ( "vpbroadcastq %1, %0" \
+ : "=v" (t_) : "m" (*(double[1]){ x }) ); \
+ t_; \
+ })
+@@ -601,7 +601,7 @@ static inline vec_t movlhps(vec_t x, vec_t y) {
+ # if INT_SIZE == 4 || UINT_SIZE == 4
+ # define broadcast(x) ({ \
+ vec_t t_; \
+- asm ( "%{evex%} vpbroadcastd %1, %0" \
++ asm ( "vpbroadcastd %1, %0" \
+ : "=v" (t_) : "m" (*(int[1]){ x }) ); \
+ t_; \
+ })
+@@ -649,7 +649,7 @@ static inline vec_t movlhps(vec_t x, vec_t y) {
+ # elif INT_SIZE == 8 || UINT_SIZE == 8
+ # define broadcast(x) ({ \
+ vec_t t_; \
+- asm ( "%{evex%} vpbroadcastq %1, %0" \
++ asm ( "vpbroadcastq %1, %0" \
+ : "=v" (t_) : "m" (*(long long[1]){ x }) ); \
+ t_; \
+ })
+@@ -716,7 +716,7 @@ static inline vec_t movlhps(vec_t x, vec_t y) {
+ # if INT_SIZE == 1 || UINT_SIZE == 1
+ # define broadcast(x) ({ \
+ vec_t t_; \
+- asm ( "%{evex%} vpbroadcastb %1, %0" \
++ asm ( "vpbroadcastb %1, %0" \
+ : "=v" (t_) : "m" (*(char[1]){ x }) ); \
+ t_; \
+ })
+@@ -745,7 +745,7 @@ static inline vec_t movlhps(vec_t x, vec_t y) {
+ # elif INT_SIZE == 2 || UINT_SIZE == 2
+ # define broadcast(x) ({ \
+ vec_t t_; \
+- asm ( "%{evex%} vpbroadcastw %1, %0" \
++ asm ( "vpbroadcastw %1, %0" \
+ : "=v" (t_) : "m" (*(short[1]){ x }) ); \
+ t_; \
+ })
+--
+2.46.1
+
diff --git a/0032-x86-HVM-properly-reject-indirect-VRAM-writes.patch b/0032-x86-HVM-properly-reject-indirect-VRAM-writes.patch
new file mode 100644
index 0000000..79652e7
--- /dev/null
+++ b/0032-x86-HVM-properly-reject-indirect-VRAM-writes.patch
@@ -0,0 +1,45 @@
+From ec3999e205ccadbeb8ab1f8420dea02fee2b5a5d Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich@suse.com>
+Date: Tue, 24 Sep 2024 14:43:02 +0200
+Subject: [PATCH 32/35] x86/HVM: properly reject "indirect" VRAM writes
+
+While ->count will only be different from 1 for "indirect" (data in
+guest memory) accesses, it being 1 does not exclude the request being an
+"indirect" one. Check both to be on the safe side, and bring the ->count
+part also in line with what ioreq_send_buffered() actually refuses to
+handle.
+
+Fixes: 3bbaaec09b1b ("x86/hvm: unify stdvga mmio intercept with standard mmio intercept")
+Signed-off-by: Jan Beulich <jbeulich@suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
+master commit: eb7cd0593d88c4b967a24bca8bd30591966676cd
+master date: 2024-09-12 09:13:04 +0200
+---
+ xen/arch/x86/hvm/stdvga.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
+index b16c59f772..5f02d88615 100644
+--- a/xen/arch/x86/hvm/stdvga.c
++++ b/xen/arch/x86/hvm/stdvga.c
+@@ -530,14 +530,14 @@ static bool cf_check stdvga_mem_accept(
+
+ spin_lock(&s->lock);
+
+- if ( p->dir == IOREQ_WRITE && p->count > 1 )
++ if ( p->dir == IOREQ_WRITE && (p->data_is_ptr || p->count != 1) )
+ {
+ /*
+ * We cannot return X86EMUL_UNHANDLEABLE on anything other then the
+ * first cycle of an I/O. So, since we cannot guarantee to always be
+ * able to send buffered writes, we have to reject any multi-cycle
+- * I/O and, since we are rejecting an I/O, we must invalidate the
+- * cache.
++ * or "indirect" I/O and, since we are rejecting an I/O, we must
++ * invalidate the cache.
+ * Single-cycle write transactions are accepted even if the cache is
+ * not active since we can assert, when in stdvga mode, that writes
+ * to VRAM have no side effect and thus we can try to buffer them.
+--
+2.46.1
+
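Reduced to its essentials, the accept check after the patch above only lets
a write through when it is a single, direct cycle: neither "indirect" (the
data lives in guest memory, signalled by data_is_ptr) nor repeated
(count != 1). The sketch below uses simplified stand-in types, not Xen's
ioreq definitions:

    #include <stdbool.h>

    enum { DEMO_IOREQ_READ, DEMO_IOREQ_WRITE };

    struct demo_ioreq {
        int dir;            /* DEMO_IOREQ_READ or DEMO_IOREQ_WRITE */
        bool data_is_ptr;   /* true: 'data' holds a guest address, not a value */
        unsigned int count; /* number of repetitions of the access */
    };

    bool demo_accept_write(const struct demo_ioreq *p)
    {
        if ( p->dir == DEMO_IOREQ_WRITE && (p->data_is_ptr || p->count != 1) )
            return false;   /* reject: cannot guarantee buffered delivery */
        return true;        /* single, direct write: safe to buffer */
    }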
diff --git a/0032-xen-ubsan-Fix-UB-in-type_descriptor-declaration.patch b/0032-xen-ubsan-Fix-UB-in-type_descriptor-declaration.patch
deleted file mode 100644
index c7c0968..0000000
--- a/0032-xen-ubsan-Fix-UB-in-type_descriptor-declaration.patch
+++ /dev/null
@@ -1,39 +0,0 @@
-From 5397ab9995f7354e7f8122a8a91c810256afa3d1 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 26 Jun 2024 13:42:30 +0200
-Subject: [PATCH 32/56] xen/ubsan: Fix UB in type_descriptor declaration
-
-struct type_descriptor is arranged with a NUL terminated string following the
-kind/info fields.
-
-The only reason this doesn't trip UBSAN detection itself (on more modern
-compilers at least) is because struct type_descriptor is only referenced in
-suppressed regions.
-
-Switch the declaration to be a real flexible member. No functional change.
-
-Fixes: 00fcf4dd8eb4 ("xen/ubsan: Import ubsan implementation from Linux 4.13")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: bd59af99700f075d06a6d47a16f777c9519928e0
-master date: 2024-06-18 14:55:04 +0100
----
- xen/common/ubsan/ubsan.h | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/xen/common/ubsan/ubsan.h b/xen/common/ubsan/ubsan.h
-index a3159040fe..3db42e75b1 100644
---- a/xen/common/ubsan/ubsan.h
-+++ b/xen/common/ubsan/ubsan.h
-@@ -10,7 +10,7 @@ enum {
- struct type_descriptor {
- u16 type_kind;
- u16 type_info;
-- char type_name[1];
-+ char type_name[];
- };
-
- struct source_location {
---
-2.45.2
-
diff --git a/0033-x86-xstate-Fix-initialisation-of-XSS-cache.patch b/0033-x86-xstate-Fix-initialisation-of-XSS-cache.patch
deleted file mode 100644
index 1a8c724..0000000
--- a/0033-x86-xstate-Fix-initialisation-of-XSS-cache.patch
+++ /dev/null
@@ -1,74 +0,0 @@
-From 4ee1df89d9c92609e5fff3c9b261ce4b1bb88e42 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 26 Jun 2024 13:43:19 +0200
-Subject: [PATCH 33/56] x86/xstate: Fix initialisation of XSS cache
-
-The clobbering of this_cpu(xcr0) and this_cpu(xss) to architecturally invalid
-values is to force the subsequent set_xcr0() and set_msr_xss() to reload the
-hardware register.
-
-While XCR0 is reloaded in xstate_init(), MSR_XSS isn't. This causes
-get_msr_xss() to return the invalid value, and logic of the form:
-
- old = get_msr_xss();
- set_msr_xss(new);
- ...
- set_msr_xss(old);
-
-to try and restore said invalid value.
-
-The architecturally invalid value must be purged from the cache, meaning the
-hardware register must be written at least once. This in turn highlights that
-the invalid value must only be used in the case that the hardware register is
-available.
-
-Fixes: f7f4a523927f ("x86/xstate: reset cached register values on resume")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 9e6dbbe8bf400aacb99009ddffa91d2a0c312b39
-master date: 2024-06-19 13:00:06 +0100
----
- xen/arch/x86/xstate.c | 18 +++++++++++-------
- 1 file changed, 11 insertions(+), 7 deletions(-)
-
-diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
-index f442610fc5..ca76f98fe2 100644
---- a/xen/arch/x86/xstate.c
-+++ b/xen/arch/x86/xstate.c
-@@ -641,13 +641,6 @@ void xstate_init(struct cpuinfo_x86 *c)
- return;
- }
-
-- /*
-- * Zap the cached values to make set_xcr0() and set_msr_xss() really
-- * write it.
-- */
-- this_cpu(xcr0) = 0;
-- this_cpu(xss) = ~0;
--
- cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
- feature_mask = (((u64)edx << 32) | eax) & XCNTXT_MASK;
- BUG_ON(!valid_xcr0(feature_mask));
-@@ -657,8 +650,19 @@ void xstate_init(struct cpuinfo_x86 *c)
- * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size.
- */
- set_in_cr4(X86_CR4_OSXSAVE);
-+
-+ /*
-+ * Zap the cached values to make set_xcr0() and set_msr_xss() really write
-+ * the hardware register.
-+ */
-+ this_cpu(xcr0) = 0;
- if ( !set_xcr0(feature_mask) )
- BUG();
-+ if ( cpu_has_xsaves )
-+ {
-+ this_cpu(xss) = ~0;
-+ set_msr_xss(0);
-+ }
-
- if ( bsp )
- {
---
-2.45.2
-
diff --git a/0033-xen-x86-pvh-handle-ACPI-RSDT-table-in-PVH-Dom0-build.patch b/0033-xen-x86-pvh-handle-ACPI-RSDT-table-in-PVH-Dom0-build.patch
new file mode 100644
index 0000000..d5d65b8
--- /dev/null
+++ b/0033-xen-x86-pvh-handle-ACPI-RSDT-table-in-PVH-Dom0-build.patch
@@ -0,0 +1,63 @@
+From d0ea9b319d4ca04e29ef533db0c3655a78dec315 Mon Sep 17 00:00:00 2001
+From: Stefano Stabellini <stefano.stabellini@amd.com>
+Date: Tue, 24 Sep 2024 14:43:24 +0200
+Subject: [PATCH 33/35] xen/x86/pvh: handle ACPI RSDT table in PVH Dom0 build
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Xen always generates an XSDT table even if the firmware only provided an
+RSDT table. Copy the RSDT header from the firmware table, adjusting the
+signature, for the XSDT table when not provided by the firmware.
+
+This is necessary to run Xen on QEMU.
+
+Fixes: 1d74282c455f ('x86: setup PVHv2 Dom0 ACPI tables')
+Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
+Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
+Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
+Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
+master commit: 6e7f7a0c16c4d406bda6d4a900252ff63a7c5fad
+master date: 2024-09-12 09:18:25 +0200
+---
+ xen/arch/x86/hvm/dom0_build.c | 17 ++++++++++++++++-
+ 1 file changed, 16 insertions(+), 1 deletion(-)
+
+diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
+index f3eddb6846..3dd913bdb0 100644
+--- a/xen/arch/x86/hvm/dom0_build.c
++++ b/xen/arch/x86/hvm/dom0_build.c
+@@ -1078,7 +1078,16 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
+ rc = -EINVAL;
+ goto out;
+ }
+- xsdt_paddr = rsdp->xsdt_physical_address;
++ /*
++ * Note the header is the same for both RSDT and XSDT, so it's fine to
++ * copy the native RSDT header to the Xen crafted XSDT if no native
++ * XSDT is available.
++ */
++ if ( rsdp->revision > 1 && rsdp->xsdt_physical_address )
++ xsdt_paddr = rsdp->xsdt_physical_address;
++ else
++ xsdt_paddr = rsdp->rsdt_physical_address;
++
+ acpi_os_unmap_memory(rsdp, sizeof(*rsdp));
+ table = acpi_os_map_memory(xsdt_paddr, sizeof(*table));
+ if ( !table )
+@@ -1090,6 +1099,12 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
+ xsdt->header = *table;
+ acpi_os_unmap_memory(table, sizeof(*table));
+
++ /*
++ * In case the header is an RSDT copy, unconditionally ensure it has
++ * an XSDT sig.
++ */
++ xsdt->header.signature[0] = 'X';
++
+ /* Add the custom MADT. */
+ xsdt->table_offset_entry[0] = madt_addr;
+
+--
+2.46.1
+
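The two decisions made by the patch above can be summarised in a few lines of
C. The types below are trimmed stand-ins, not the real ACPI/Xen definitions:
fall back to the RSDT address when the RSDP is too old (revision <= 1) or
carries no XSDT pointer, and force the copied header's signature to "XSDT",
of which only the first byte differs from "RSDT".

    #include <stdint.h>

    struct demo_rsdp {
        uint8_t  revision;
        uint32_t rsdt_physical_address;
        uint64_t xsdt_physical_address;
    };

    struct demo_table_header {
        char signature[4];
        /* ... length, checksum, etc. ... */
    };

    uint64_t pick_xsdt_paddr(const struct demo_rsdp *rsdp)
    {
        if ( rsdp->revision > 1 && rsdp->xsdt_physical_address )
            return rsdp->xsdt_physical_address;
        return rsdp->rsdt_physical_address;   /* RSDT-only firmware */
    }

    void force_xsdt_signature(struct demo_table_header *hdr)
    {
        hdr->signature[0] = 'X';   /* "RSDT" -> "XSDT" */
    }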
diff --git a/0034-blkif-reconcile-protocol-specification-with-in-use-i.patch b/0034-blkif-reconcile-protocol-specification-with-in-use-i.patch
new file mode 100644
index 0000000..baa2b49
--- /dev/null
+++ b/0034-blkif-reconcile-protocol-specification-with-in-use-i.patch
@@ -0,0 +1,183 @@
+From 933416b13966a3fa2a37b1f645c23afbd8fb6d09 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
+Date: Tue, 24 Sep 2024 14:43:50 +0200
+Subject: [PATCH 34/35] blkif: reconcile protocol specification with in-use
+ implementations
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+Current blkif implementations (both backends and frontends) all have slight
+differences in how they handle the 'sector-size' xenstore node, and how
+other fields are derived from this value or hardcoded to be expressed in units
+of 512 bytes.
+
+To give some context, this is an excerpt of how different implementations use
+the value in 'sector-size' as the base unit for other fields rather than
+just to set the logical sector size of the block device:
+
+ │ sectors xenbus node │ requests sector_number │ requests {first,last}_sect
+────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
+FreeBSD blk{front,back} │ sector-size │ sector-size │ 512
+────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
+Linux blk{front,back} │ 512 │ 512 │ 512
+────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
+QEMU blkback │ sector-size │ sector-size │ sector-size
+────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
+Windows blkfront │ sector-size │ sector-size │ sector-size
+────────────────────────┼─────────────────────┼────────────────────────┼───────────────────────────
+MiniOS │ sector-size │ 512 │ 512
+
+An attempt was made by 67e1c050e36b in order to change the base units of the
+request fields and the xenstore 'sectors' node. That however only led to more
+confusion, as the specification now clearly diverged from the reference
+implementation in Linux. Such change was only implemented for QEMU Qdisk
+and Windows PV blkfront.
+
+Partially revert to the state before 67e1c050e36b while adjusting the
+documentation for 'sectors' to match what it used to be previous to
+2fa701e5346d:
+
+ * Declare 'feature-large-sector-size' deprecated. Frontends should not expose
+ the node, backends should not make decisions based on its presence.
+
+ * Clarify that 'sectors' xenstore node and the requests fields are always in
+ 512-byte units, like it was previous to 2fa701e5346d and 67e1c050e36b.
+
+All base units for the fields used in the protocol are 512-byte based, the
+xenbus 'sector-size' field is only used to signal the logic block size. When
+'sector-size' is greater than 512, blkfront implementations must make sure that
+the offsets and sizes (despite being expressed in 512-byte units) are aligned
+to the logical block size specified in 'sector-size', otherwise the backend
+will fail to process the requests.
+
+This will require changes to some of the frontends and backends in order to
+properly support 'sector-size' nodes greater than 512.
+
+Fixes: 2fa701e5346d ('blkif.h: Provide more complete documentation of the blkif interface')
+Fixes: 67e1c050e36b ('public/io/blkif.h: try to fix the semantics of sector based quantities')
+Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
+Reviewed-by: Juergen Gross <jgross@suse.com>
+Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
+master commit: 221f2748e8dabe8361b8cdfcffbeab9102c4c899
+master date: 2024-09-12 14:04:56 +0200
+---
+ xen/include/public/io/blkif.h | 52 ++++++++++++++++++++++++++---------
+ 1 file changed, 39 insertions(+), 13 deletions(-)
+
+diff --git a/xen/include/public/io/blkif.h b/xen/include/public/io/blkif.h
+index 22f1eef0c0..9b00d633d3 100644
+--- a/xen/include/public/io/blkif.h
++++ b/xen/include/public/io/blkif.h
+@@ -237,12 +237,16 @@
+ * sector-size
+ * Values: <uint32_t>
+ *
+- * The logical block size, in bytes, of the underlying storage. This
+- * must be a power of two with a minimum value of 512.
++ * The logical block size, in bytes, of the underlying storage. This must
++ * be a power of two with a minimum value of 512. The sector size should
++ * only be used for request segment length and alignment.
+ *
+- * NOTE: Because of implementation bugs in some frontends this must be
+- * set to 512, unless the frontend advertizes a non-zero value
+- * in its "feature-large-sector-size" xenbus node. (See below).
++ * When exposing a device that uses a logical sector size of 4096, the
++ * only difference xenstore wise will be that 'sector-size' (and possibly
++ * 'physical-sector-size' if supported by the backend) will be 4096, but
++ * the 'sectors' node will still be calculated using 512 byte units. The
++ * sector base units in the ring requests fields will all be 512 byte
++ * based despite the logical sector size exposed in 'sector-size'.
+ *
+ * physical-sector-size
+ * Values: <uint32_t>
+@@ -254,9 +258,9 @@
+ * sectors
+ * Values: <uint64_t>
+ *
+- * The size of the backend device, expressed in units of "sector-size".
+- * The product of "sector-size" and "sectors" must also be an integer
+- * multiple of "physical-sector-size", if that node is present.
++ * The size of the backend device, expressed in units of 512b. The
++ * product of "sectors" * 512 must also be an integer multiple of
++ * "physical-sector-size", if that node is present.
+ *
+ *****************************************************************************
+ * Frontend XenBus Nodes
+@@ -338,6 +342,7 @@
+ * feature-large-sector-size
+ * Values: 0/1 (boolean)
+ * Default Value: 0
++ * Notes: DEPRECATED, 12
+ *
+ * A value of "1" indicates that the frontend will correctly supply and
+ * interpret all sector-based quantities in terms of the "sector-size"
+@@ -411,6 +416,11 @@
+ *(10) The discard-secure property may be present and will be set to 1 if the
+ * backing device supports secure discard.
+ *(11) Only used by Linux and NetBSD.
++ *(12) Possibly only ever implemented by the QEMU Qdisk backend and the Windows
++ * PV block frontend. Other backends and frontends supported 'sector-size'
++ * values greater than 512 before such feature was added. Frontends should
++ * not expose this node, neither should backends make any decisions based
++ * on it being exposed by the frontend.
+ */
+
+ /*
+@@ -619,11 +629,14 @@
+ #define BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST 8
+
+ /*
+- * NB. 'first_sect' and 'last_sect' in blkif_request_segment, as well as
+- * 'sector_number' in blkif_request, blkif_request_discard and
+- * blkif_request_indirect are sector-based quantities. See the description
+- * of the "feature-large-sector-size" frontend xenbus node above for
+- * more information.
++ * NB. 'first_sect' and 'last_sect' in blkif_request_segment are all in units
++ * of 512 bytes, despite the 'sector-size' xenstore node possibly having a
++ * value greater than 512.
++ *
++ * The value in 'first_sect' and 'last_sect' fields must be setup so that the
++ * resulting segment offset and size is aligned to the logical sector size
++ * reported by the 'sector-size' xenstore node, see 'Backend Device Properties'
++ * section.
+ */
+ struct blkif_request_segment {
+ grant_ref_t gref; /* reference to I/O buffer frame */
+@@ -634,6 +647,10 @@ struct blkif_request_segment {
+
+ /*
+ * Starting ring element for any I/O request.
++ *
++ * The 'sector_number' field is in units of 512b, despite the value of the
++ * 'sector-size' xenstore node. Note however that the offset in
++ * 'sector_number' must be aligned to 'sector-size'.
+ */
+ struct blkif_request {
+ uint8_t operation; /* BLKIF_OP_??? */
+@@ -648,6 +665,10 @@ typedef struct blkif_request blkif_request_t;
+ /*
+ * Cast to this structure when blkif_request.operation == BLKIF_OP_DISCARD
+ * sizeof(struct blkif_request_discard) <= sizeof(struct blkif_request)
++ *
++ * The 'sector_number' field is in units of 512b, despite the value of the
++ * 'sector-size' xenstore node. Note however that the offset in
++ * 'sector_number' must be aligned to 'sector-size'.
+ */
+ struct blkif_request_discard {
+ uint8_t operation; /* BLKIF_OP_DISCARD */
+@@ -660,6 +681,11 @@ struct blkif_request_discard {
+ };
+ typedef struct blkif_request_discard blkif_request_discard_t;
+
++/*
++ * The 'sector_number' field is in units of 512b, despite the value of the
++ * 'sector-size' xenstore node. Note however that the offset in
++ * 'sector_number' must be aligned to 'sector-size'.
++ */
+ struct blkif_request_indirect {
+ uint8_t operation; /* BLKIF_OP_INDIRECT */
+ uint8_t indirect_op; /* BLKIF_OP_{READ/WRITE} */
+--
+2.46.1
+
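A short worked example of the unit convention the patch above documents: ring
request sector fields are always expressed in 512-byte units, while the
xenstore 'sector-size' node only dictates the alignment that offsets and
segment bounds must respect. The helper name and assert-based checking are
illustrative, not part of blkif.h:

    #include <assert.h>
    #include <stdint.h>

    uint64_t byte_offset_to_sector_number(uint64_t byte_offset,
                                          uint32_t logical_sector_size)
    {
        /* Offsets must be aligned to the advertised logical block size... */
        assert(byte_offset % logical_sector_size == 0);
        /* ...but the on-ring value is in 512-byte units regardless. */
        return byte_offset / 512;
    }

With 'sector-size' = 4096, a byte offset of 8192 becomes sector_number 16,
and any offset that is not a multiple of 4096 is one the backend may refuse
to process.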
diff --git a/0034-x86-cpuid-Fix-handling-of-XSAVE-dynamic-leaves.patch b/0034-x86-cpuid-Fix-handling-of-XSAVE-dynamic-leaves.patch
deleted file mode 100644
index 1905728..0000000
--- a/0034-x86-cpuid-Fix-handling-of-XSAVE-dynamic-leaves.patch
+++ /dev/null
@@ -1,72 +0,0 @@
-From 9b43092d54b5f9e9d39d9f20393671e303b19e81 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Wed, 26 Jun 2024 13:43:44 +0200
-Subject: [PATCH 34/56] x86/cpuid: Fix handling of XSAVE dynamic leaves
-
-[ This is a minimal backport of commit 71cacfb035f4 ("x86/cpuid: Fix handling
- of XSAVE dynamic leaves") to fix the bugs without depending on the large
- rework of XSTATE handling in Xen 4.19 ]
-
-First, if XSAVE is available in hardware but not visible to the guest, the
-dynamic leaves shouldn't be filled in.
-
-Second, the comment concerning XSS state is wrong. VT-x doesn't manage
-host/guest state automatically, but there is provision for "host only" bits to
-be set, so the implications are still accurate.
-
-In Xen 4.18, no XSS states are supported, so it's safe to keep deferring to
-real hardware.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 71cacfb035f4a78ee10970dc38a3baa04d387451
-master date: 2024-06-19 13:00:06 +0100
----
- xen/arch/x86/cpuid.c | 30 +++++++++++++-----------------
- 1 file changed, 13 insertions(+), 17 deletions(-)
-
-diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
-index 455a09b2dd..f6fd6cc6b3 100644
---- a/xen/arch/x86/cpuid.c
-+++ b/xen/arch/x86/cpuid.c
-@@ -330,24 +330,20 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
- case XSTATE_CPUID:
- switch ( subleaf )
- {
-- case 1:
-- if ( p->xstate.xsavec || p->xstate.xsaves )
-- {
-- /*
-- * TODO: Figure out what to do for XSS state. VT-x manages
-- * host vs guest MSR_XSS automatically, so as soon as we start
-- * supporting any XSS states, the wrong XSS will be in
-- * context.
-- */
-- BUILD_BUG_ON(XSTATE_XSAVES_ONLY != 0);
--
-- /*
-- * Read CPUID[0xD,0/1].EBX from hardware. They vary with
-- * enabled XSTATE, and appropraite XCR0|XSS are in context.
-- */
-+ /*
-+ * Read CPUID[0xd,0/1].EBX from hardware. They vary with enabled
-+ * XSTATE, and the appropriate XCR0 is in context.
-+ */
- case 0:
-- res->b = cpuid_count_ebx(leaf, subleaf);
-- }
-+ if ( p->basic.xsave )
-+ res->b = cpuid_count_ebx(0xd, 0);
-+ break;
-+
-+ case 1:
-+ /* This only works because Xen doesn't support XSS states yet. */
-+ BUILD_BUG_ON(XSTATE_XSAVES_ONLY != 0);
-+ if ( p->xstate.xsavec )
-+ res->b = cpuid_count_ebx(0xd, 1);
- break;
- }
- break;
---
-2.45.2
-
diff --git a/0035-x86-irq-forward-pending-interrupts-to-new-destinatio.patch b/0035-x86-irq-forward-pending-interrupts-to-new-destinatio.patch
deleted file mode 100644
index f05b09e..0000000
--- a/0035-x86-irq-forward-pending-interrupts-to-new-destinatio.patch
+++ /dev/null
@@ -1,143 +0,0 @@
-From e95d30f9e5eed0c5d9dbf72d4cc3ae373152ab10 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Wed, 26 Jun 2024 13:44:08 +0200
-Subject: [PATCH 35/56] x86/irq: forward pending interrupts to new destination
- in fixup_irqs()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-fixup_irqs() is used to evacuate interrupts from to be offlined CPUs. Given
-the CPU is to become offline, the normal migration logic used by Xen where the
-vector in the previous target(s) is left configured until the interrupt is
-received on the new destination is not suitable.
-
-Instead attempt to do as much as possible in order to prevent losing
-interrupts. If fixup_irqs() is called from the CPU to be offlined (as is
-currently the case for CPU hot unplug) attempt to forward pending vectors when
-interrupts that target the current CPU are migrated to a different destination.
-
-Additionally, for interrupts that have already been moved from the current CPU
-prior to the call to fixup_irqs() but that haven't been delivered to the new
-destination (iow: interrupts with move_in_progress set and the current CPU set
-in ->arch.old_cpu_mask) also check whether the previous vector is pending and
-forward it to the new destination.
-
-This allows us to remove the window with interrupts enabled at the bottom of
-fixup_irqs(). Such window wasn't safe anyway: references to the CPU to become
-offline are removed from interrupts masks, but the per-CPU vector_irq[] array
-is not updated to reflect those changes (as the CPU is going offline anyway).
-
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: e2bb28d621584fce15c907002ddc7c6772644b64
-master date: 2024-06-20 12:09:32 +0200
----
- xen/arch/x86/include/asm/apic.h | 5 ++++
- xen/arch/x86/irq.c | 46 ++++++++++++++++++++++++++++-----
- 2 files changed, 45 insertions(+), 6 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/apic.h b/xen/arch/x86/include/asm/apic.h
-index 7625c0ecd6..ad8d7cc054 100644
---- a/xen/arch/x86/include/asm/apic.h
-+++ b/xen/arch/x86/include/asm/apic.h
-@@ -145,6 +145,11 @@ static __inline bool_t apic_isr_read(u8 vector)
- (vector & 0x1f)) & 1;
- }
-
-+static inline bool apic_irr_read(unsigned int vector)
-+{
-+ return apic_read(APIC_IRR + (vector / 32 * 0x10)) & (1U << (vector % 32));
-+}
-+
- static __inline u32 get_apic_id(void) /* Get the physical APIC id */
- {
- u32 id = apic_read(APIC_ID);
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index 13ef61a5b7..290f8d26e7 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -2604,7 +2604,7 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
-
- for ( irq = 0; irq < nr_irqs; irq++ )
- {
-- bool break_affinity = false, set_affinity = true;
-+ bool break_affinity = false, set_affinity = true, check_irr = false;
- unsigned int vector, cpu = smp_processor_id();
- cpumask_t *affinity = this_cpu(scratch_cpumask);
-
-@@ -2657,6 +2657,25 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- !cpu_online(cpu) &&
- cpumask_test_cpu(cpu, desc->arch.old_cpu_mask) )
- {
-+ /*
-+ * This to be offlined CPU was the target of an interrupt that's
-+ * been moved, and the new destination target hasn't yet
-+ * acknowledged any interrupt from it.
-+ *
-+ * We know the interrupt is configured to target the new CPU at
-+ * this point, so we can check IRR for any pending vectors and
-+ * forward them to the new destination.
-+ *
-+ * Note that for the other case of an interrupt movement being in
-+ * progress (move_cleanup_count being non-zero) we know the new
-+ * destination has already acked at least one interrupt from this
-+ * source, and hence there's no need to forward any stale
-+ * interrupts.
-+ */
-+ if ( apic_irr_read(desc->arch.old_vector) )
-+ send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
-+ desc->arch.vector);
-+
- /*
- * This CPU is going offline, remove it from ->arch.old_cpu_mask
- * and possibly release the old vector if the old mask becomes
-@@ -2697,6 +2716,14 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- if ( desc->handler->disable )
- desc->handler->disable(desc);
-
-+ /*
-+ * If the current CPU is going offline and is (one of) the target(s) of
-+ * the interrupt, signal to check whether there are any pending vectors
-+ * to be handled in the local APIC after the interrupt has been moved.
-+ */
-+ if ( !cpu_online(cpu) && cpumask_test_cpu(cpu, desc->arch.cpu_mask) )
-+ check_irr = true;
-+
- if ( desc->handler->set_affinity )
- desc->handler->set_affinity(desc, affinity);
- else if ( !(warned++) )
-@@ -2707,6 +2734,18 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
-
- cpumask_copy(affinity, desc->affinity);
-
-+ if ( check_irr && apic_irr_read(vector) )
-+ /*
-+ * Forward pending interrupt to the new destination, this CPU is
-+ * going offline and otherwise the interrupt would be lost.
-+ *
-+ * Do the IRR check as late as possible before releasing the irq
-+ * desc in order for any in-flight interrupts to be delivered to
-+ * the lapic.
-+ */
-+ send_IPI_mask(cpumask_of(cpumask_any(desc->arch.cpu_mask)),
-+ desc->arch.vector);
-+
- spin_unlock(&desc->lock);
-
- if ( !verbose )
-@@ -2718,11 +2757,6 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
- printk("Broke affinity for IRQ%u, new: %*pb\n",
- irq, CPUMASK_PR(affinity));
- }
--
-- /* That doesn't seem sufficient. Give it 1ms. */
-- local_irq_enable();
-- mdelay(1);
-- local_irq_disable();
- }
-
- void fixup_eoi(void)
---
-2.45.2
-
diff --git a/0035-xen-ucode-Fix-buffer-under-run-when-parsing-AMD-cont.patch b/0035-xen-ucode-Fix-buffer-under-run-when-parsing-AMD-cont.patch
new file mode 100644
index 0000000..221f55b
--- /dev/null
+++ b/0035-xen-ucode-Fix-buffer-under-run-when-parsing-AMD-cont.patch
@@ -0,0 +1,62 @@
+From 2c61ab407172682e1382204a8305107f19e2951b Mon Sep 17 00:00:00 2001
+From: Demi Marie Obenour <demi@invisiblethingslab.com>
+Date: Tue, 24 Sep 2024 14:44:10 +0200
+Subject: [PATCH 35/35] xen/ucode: Fix buffer under-run when parsing AMD
+ containers
+
+The AMD container format has no formal spec. It is, at best, precision
+guesswork based on AMD's prior contributions to open source projects. The
+Equivalence Table has both an explicit length, and an expectation of having a
+NULL entry at the end.
+
+Xen was sanity checking the NULL entry, but without confirming that an entry
+was present, resulting in a read off the front of the buffer. With some
+manual debugging/annotations this manifests as:
+
+ (XEN) *** Buf ffff83204c00b19c, eq ffff83204c00b194
+ (XEN) *** eq: 0c 00 00 00 44 4d 41 00 00 00 00 00 00 00 00 00 aa aa aa aa
+ ^-Actual buffer-------------------^
+ (XEN) *** installed_cpu: 000c
+ (XEN) microcode: Bad equivalent cpu table
+ (XEN) Parsing microcode blob error -22
+
+When loaded by hypercall, the 4 bytes interpreted as installed_cpu happen to
+be the containing struct ucode_buf's len field, and luckily will be nonzero.
+
+When loaded at boot, it's possible for the access to #PF if the module happens
+to have been placed on a 2M boundary by the bootloader. Under Linux, it will
+commonly be the end of the CPIO header.
+
+Drop the probe of the NULL entry; nothing else cares. A container without one
+is well formed, insofar as we can still parse it correctly. With this
+dropped, the same container results in:
+
+ (XEN) microcode: couldn't find any matching ucode in the provided blob!
+
+Fixes: 4de936a38aa9 ("x86/ucode/amd: Rework parsing logic in cpu_request_microcode()")
+Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
+Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
+Reviewed-by: Jan Beulich <jbeulich@suse.com>
+master commit: a8bf14f6f331d4f428010b4277b67c33f561ed19
+master date: 2024-09-13 15:23:30 +0100
+---
+ xen/arch/x86/cpu/microcode/amd.c | 3 +--
+ 1 file changed, 1 insertion(+), 2 deletions(-)
+
+diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
+index f76a563c8b..9fe6e29751 100644
+--- a/xen/arch/x86/cpu/microcode/amd.c
++++ b/xen/arch/x86/cpu/microcode/amd.c
+@@ -336,8 +336,7 @@ static struct microcode_patch *cf_check cpu_request_microcode(
+ if ( size < sizeof(*et) ||
+ (et = buf)->type != UCODE_EQUIV_CPU_TABLE_TYPE ||
+ size - sizeof(*et) < et->len ||
+- et->len % sizeof(et->eq[0]) ||
+- et->eq[(et->len / sizeof(et->eq[0])) - 1].installed_cpu )
++ et->len % sizeof(et->eq[0]) )
+ {
+ printk(XENLOG_ERR "microcode: Bad equivalent cpu table\n");
+ error = -EINVAL;
+--
+2.46.1
+
diff --git a/0036-x86-re-run-exception-from-stub-recovery-selftests-wi.patch b/0036-x86-re-run-exception-from-stub-recovery-selftests-wi.patch
deleted file mode 100644
index a552e9c..0000000
--- a/0036-x86-re-run-exception-from-stub-recovery-selftests-wi.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From 5ac3cbbf83e1f955aeaf5d0f503099f5249b5c25 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Thu, 4 Jul 2024 14:06:19 +0200
-Subject: [PATCH 36/56] x86: re-run exception-from-stub recovery selftests with
- CET-SS enabled
-
-On the BSP, shadow stacks are enabled only relatively late in the
-booting process. They in particular aren't active yet when initcalls are
-run. Keep the testing there, but invoke that testing a 2nd time when
-shadow stacks are active, to make sure we won't regress that case after
-addressing XSA-451.
-
-While touching this code, switch the guard from NDEBUG to CONFIG_DEBUG,
-such that IS_ENABLED() can validly be used at the new call site.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: cfe3ad67127b86e1b1c06993b86422673a51b050
-master date: 2024-02-27 13:49:52 +0100
----
- xen/arch/x86/extable.c | 8 +++++---
- xen/arch/x86/include/asm/setup.h | 2 ++
- xen/arch/x86/setup.c | 4 ++++
- 3 files changed, 11 insertions(+), 3 deletions(-)
-
-diff --git a/xen/arch/x86/extable.c b/xen/arch/x86/extable.c
-index 8ffcd346d7..12cc9935d8 100644
---- a/xen/arch/x86/extable.c
-+++ b/xen/arch/x86/extable.c
-@@ -128,10 +128,11 @@ search_exception_table(const struct cpu_user_regs *regs, unsigned long *stub_ra)
- return 0;
- }
-
--#ifndef NDEBUG
-+#ifdef CONFIG_DEBUG
-+#include <asm/setup.h>
- #include <asm/traps.h>
-
--static int __init cf_check stub_selftest(void)
-+int __init cf_check stub_selftest(void)
- {
- static const struct {
- uint8_t opc[8];
-@@ -155,7 +156,8 @@ static int __init cf_check stub_selftest(void)
- unsigned int i;
- bool fail = false;
-
-- printk("Running stub recovery selftests...\n");
-+ printk("%s stub recovery selftests...\n",
-+ system_state < SYS_STATE_active ? "Running" : "Re-running");
-
- for ( i = 0; i < ARRAY_SIZE(tests); ++i )
- {
-diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
-index 9a460e4db8..14d15048eb 100644
---- a/xen/arch/x86/include/asm/setup.h
-+++ b/xen/arch/x86/include/asm/setup.h
-@@ -38,6 +38,8 @@ void *bootstrap_map(const module_t *mod);
-
- int xen_in_range(unsigned long mfn);
-
-+int cf_check stub_selftest(void);
-+
- extern uint8_t kbd_shift_flags;
-
- #ifdef NDEBUG
-diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
-index 25017b5d96..f2592c3dc9 100644
---- a/xen/arch/x86/setup.c
-+++ b/xen/arch/x86/setup.c
-@@ -738,6 +738,10 @@ static void noreturn init_done(void)
-
- system_state = SYS_STATE_active;
-
-+ /* Re-run stub recovery self-tests with CET-SS active. */
-+ if ( IS_ENABLED(CONFIG_DEBUG) && cpu_has_xen_shstk )
-+ stub_selftest();
-+
- domain_unpause_by_systemcontroller(dom0);
-
- /* MUST be done prior to removing .init data. */
---
-2.45.2
-
diff --git a/0037-tools-tests-don-t-let-test-xenstore-write-nodes-exce.patch b/0037-tools-tests-don-t-let-test-xenstore-write-nodes-exce.patch
deleted file mode 100644
index cc7e47d..0000000
--- a/0037-tools-tests-don-t-let-test-xenstore-write-nodes-exce.patch
+++ /dev/null
@@ -1,41 +0,0 @@
-From 0ebfa35965257343ba3d8377be91ad8512a9c749 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Thu, 4 Jul 2024 14:06:54 +0200
-Subject: [PATCH 37/56] tools/tests: don't let test-xenstore write nodes
- exceeding default size
-
-Today test-xenstore will write nodes with 3000 bytes node data. This
-size is exceeding the default quota for the allowed node size. While
-working in dom0 with C-xenstored, OCAML-xenstored does not like that.
-
-Use a size of 2000 instead, which is lower than the allowed default
-node size of 2048.
-
-Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: 642005e310483c490b0725fab4672f2b77fdf2ba
-master date: 2024-05-02 18:15:31 +0100
----
- tools/tests/xenstore/test-xenstore.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/tools/tests/xenstore/test-xenstore.c b/tools/tests/xenstore/test-xenstore.c
-index d491dac53b..73a7011d21 100644
---- a/tools/tests/xenstore/test-xenstore.c
-+++ b/tools/tests/xenstore/test-xenstore.c
-@@ -408,9 +408,9 @@ static int test_ta3_deinit(uintptr_t par)
- #define TEST(s, f, p, l) { s, f ## _init, f, f ## _deinit, (uintptr_t)(p), l }
- struct test tests[] = {
- TEST("read 1", test_read, 1, "Read node with 1 byte data"),
--TEST("read 3000", test_read, 3000, "Read node with 3000 bytes data"),
-+TEST("read 2000", test_read, 2000, "Read node with 2000 bytes data"),
- TEST("write 1", test_write, 1, "Write node with 1 byte data"),
--TEST("write 3000", test_write, 3000, "Write node with 3000 bytes data"),
-+TEST("write 2000", test_write, 2000, "Write node with 2000 bytes data"),
- TEST("dir", test_dir, 0, "List directory"),
- TEST("rm node", test_rm, 0, "Remove single node"),
- TEST("rm dir", test_rm, WRITE_BUFFERS_N, "Remove node with sub-nodes"),
---
-2.45.2
-
diff --git a/0038-tools-tests-let-test-xenstore-exit-with-non-0-status.patch b/0038-tools-tests-let-test-xenstore-exit-with-non-0-status.patch
deleted file mode 100644
index ee0a497..0000000
--- a/0038-tools-tests-let-test-xenstore-exit-with-non-0-status.patch
+++ /dev/null
@@ -1,57 +0,0 @@
-From 22f623622cc60571be9cccc323a1d17749683667 Mon Sep 17 00:00:00 2001
-From: Juergen Gross <jgross@suse.com>
-Date: Thu, 4 Jul 2024 14:07:12 +0200
-Subject: [PATCH 38/56] tools/tests: let test-xenstore exit with non-0 status
- in case of error
-
-In case a test is failing in test-xenstore, let the tool exit with an
-exit status other than 0.
-
-Fix a typo in an error message.
-
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Fixes: 3afc5e4a5b75 ("tools/tests: add xenstore testing framework")
-Signed-off-by: Juergen Gross <jgross@suse.com>
-master commit: 2d4ba205591ba64f31149ae31051678159ee9e11
-master date: 2024-05-02 18:15:46 +0100
----
- tools/tests/xenstore/test-xenstore.c | 8 ++++----
- 1 file changed, 4 insertions(+), 4 deletions(-)
-
-diff --git a/tools/tests/xenstore/test-xenstore.c b/tools/tests/xenstore/test-xenstore.c
-index 73a7011d21..7a9bd9afb3 100644
---- a/tools/tests/xenstore/test-xenstore.c
-+++ b/tools/tests/xenstore/test-xenstore.c
-@@ -506,14 +506,14 @@ int main(int argc, char *argv[])
- stop = time(NULL) + randtime;
- srandom((unsigned int)stop);
-
-- while ( time(NULL) < stop )
-+ while ( time(NULL) < stop && !ret )
- {
- t = random() % ARRAY_SIZE(tests);
- ret = call_test(tests + t, iters, true);
- }
- }
- else
-- for ( t = 0; t < ARRAY_SIZE(tests); t++ )
-+ for ( t = 0; t < ARRAY_SIZE(tests) && !ret; t++ )
- {
- if ( !test || !strcmp(test, tests[t].name) )
- ret = call_test(tests + t, iters, false);
-@@ -525,10 +525,10 @@ int main(int argc, char *argv[])
- xs_close(xsh);
-
- if ( ta_loops )
-- printf("Exhaustive transaction retries (%d) occurrred %d times.\n",
-+ printf("Exhaustive transaction retries (%d) occurred %d times.\n",
- MAX_TA_LOOPS, ta_loops);
-
-- return 0;
-+ return ret ? 3 : 0;
- }
-
- /*
---
-2.45.2
-
diff --git a/0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch b/0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch
deleted file mode 100644
index 8b2c4ec..0000000
--- a/0039-LICENSES-Add-MIT-0-MIT-No-Attribution.patch
+++ /dev/null
@@ -1,58 +0,0 @@
-From 75b4f9474a1aa33a6f9e0986b51c390f9b38ae5a Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:08:11 +0200
-Subject: [PATCH 39/56] LICENSES: Add MIT-0 (MIT No Attribution)
-
-We are about to import code licensed under MIT-0. It's compatible for us to
-use, so identify it as a permitted license.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
-Acked-by: Christian Lindig <christian.lindig@cloud.com>
-master commit: 219cdff3fb7b4a03ab14869584f111e0f623b330
-master date: 2024-05-23 15:04:40 +0100
----
- LICENSES/MIT-0 | 31 +++++++++++++++++++++++++++++++
- 1 file changed, 31 insertions(+)
- create mode 100644 LICENSES/MIT-0
-
-diff --git a/LICENSES/MIT-0 b/LICENSES/MIT-0
-new file mode 100644
-index 0000000000..70fb90ee34
---- /dev/null
-+++ b/LICENSES/MIT-0
-@@ -0,0 +1,31 @@
-+Valid-License-Identifier: MIT-0
-+
-+SPDX-URL: https://spdx.org/licenses/MIT-0.html
-+
-+Usage-Guide:
-+
-+ To use the MIT-0 License put the following SPDX tag/value pair into a
-+ comment according to the placement guidelines in the licensing rules
-+ documentation:
-+ SPDX-License-Identifier: MIT-0
-+
-+License-Text:
-+
-+MIT No Attribution
-+
-+Copyright <year> <copyright holder>
-+
-+Permission is hereby granted, free of charge, to any person obtaining a copy
-+of this software and associated documentation files (the "Software"), to deal
-+in the Software without restriction, including without limitation the rights
-+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-+copies of the Software, and to permit persons to whom the Software is
-+furnished to do so.
-+
-+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-+SOFTWARE.
---
-2.45.2
-
diff --git a/0040-tools-Import-stand-alone-sd_notify-implementation-fr.patch b/0040-tools-Import-stand-alone-sd_notify-implementation-fr.patch
deleted file mode 100644
index 990158d..0000000
--- a/0040-tools-Import-stand-alone-sd_notify-implementation-fr.patch
+++ /dev/null
@@ -1,130 +0,0 @@
-From 1743102a92479834c8e17b20697129e05b7c8313 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:10:10 +0200
-Subject: [PATCH 40/56] tools: Import stand-alone sd_notify() implementation
- from systemd
-
-... in order to avoid linking against the whole of libsystemd.
-
-Only minimal changes to the upstream copy, to function as a drop-in
-replacement for sd_notify() and as a header-only library.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Acked-by: Christian Lindig <christian.lindig@cloud.com>
-master commit: 78510f3a1522f2856330ffa429e0e35f8aab4277
-master date: 2024-05-23 15:04:40 +0100
-master commit: 78510f3a1522f2856330ffa429e0e35f8aab4277
-master date: 2024-05-23 15:04:40 +0100
----
- tools/include/xen-sd-notify.h | 98 +++++++++++++++++++++++++++++++++++
- 1 file changed, 98 insertions(+)
- create mode 100644 tools/include/xen-sd-notify.h
-
-diff --git a/tools/include/xen-sd-notify.h b/tools/include/xen-sd-notify.h
-new file mode 100644
-index 0000000000..28c9b20f15
---- /dev/null
-+++ b/tools/include/xen-sd-notify.h
-@@ -0,0 +1,98 @@
-+/* SPDX-License-Identifier: MIT-0 */
-+
-+/*
-+ * Implement the systemd notify protocol without external dependencies.
-+ * Supports both readiness notification on startup and on reloading,
-+ * according to the protocol defined at:
-+ * https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html
-+ * This protocol is guaranteed to be stable as per:
-+ * https://systemd.io/PORTABILITY_AND_STABILITY/
-+ *
-+ * Differences from the upstream copy:
-+ * - Rename/rework as a drop-in replacement for systemd/sd-daemon.h
-+ * - Only take the subset Xen cares about
-+ * - Respect -Wdeclaration-after-statement
-+ */
-+
-+#ifndef XEN_SD_NOTIFY
-+#define XEN_SD_NOTIFY
-+
-+#include <errno.h>
-+#include <stddef.h>
-+#include <stdlib.h>
-+#include <sys/socket.h>
-+#include <sys/un.h>
-+#include <unistd.h>
-+
-+static inline void xen_sd_closep(int *fd) {
-+ if (!fd || *fd < 0)
-+ return;
-+
-+ close(*fd);
-+ *fd = -1;
-+}
-+
-+static inline int xen_sd_notify(const char *message) {
-+ union sockaddr_union {
-+ struct sockaddr sa;
-+ struct sockaddr_un sun;
-+ } socket_addr = {
-+ .sun.sun_family = AF_UNIX,
-+ };
-+ size_t path_length, message_length;
-+ ssize_t written;
-+ const char *socket_path;
-+ int __attribute__((cleanup(xen_sd_closep))) fd = -1;
-+
-+ /* Verify the argument first */
-+ if (!message)
-+ return -EINVAL;
-+
-+ message_length = strlen(message);
-+ if (message_length == 0)
-+ return -EINVAL;
-+
-+ /* If the variable is not set, the protocol is a noop */
-+ socket_path = getenv("NOTIFY_SOCKET");
-+ if (!socket_path)
-+ return 0; /* Not set? Nothing to do */
-+
-+ /* Only AF_UNIX is supported, with path or abstract sockets */
-+ if (socket_path[0] != '/' && socket_path[0] != '@')
-+ return -EAFNOSUPPORT;
-+
-+ path_length = strlen(socket_path);
-+ /* Ensure there is room for NUL byte */
-+ if (path_length >= sizeof(socket_addr.sun.sun_path))
-+ return -E2BIG;
-+
-+ memcpy(socket_addr.sun.sun_path, socket_path, path_length);
-+
-+ /* Support for abstract socket */
-+ if (socket_addr.sun.sun_path[0] == '@')
-+ socket_addr.sun.sun_path[0] = 0;
-+
-+ fd = socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0);
-+ if (fd < 0)
-+ return -errno;
-+
-+ if (connect(fd, &socket_addr.sa, offsetof(struct sockaddr_un, sun_path) + path_length) != 0)
-+ return -errno;
-+
-+ written = write(fd, message, message_length);
-+ if (written != (ssize_t) message_length)
-+ return written < 0 ? -errno : -EPROTO;
-+
-+ return 1; /* Notified! */
-+}
-+
-+static inline int sd_notify(int unset_environment, const char *message) {
-+ int r = xen_sd_notify(message);
-+
-+ if (unset_environment)
-+ unsetenv("NOTIFY_SOCKET");
-+
-+ return r;
-+}
-+
-+#endif /* XEN_SD_NOTIFY */
---
-2.45.2
-
diff --git a/0041-tools-c-o-xenstored-Don-t-link-against-libsystemd.patch b/0041-tools-c-o-xenstored-Don-t-link-against-libsystemd.patch
deleted file mode 100644
index 5bf3f98..0000000
--- a/0041-tools-c-o-xenstored-Don-t-link-against-libsystemd.patch
+++ /dev/null
@@ -1,87 +0,0 @@
-From 77cf215157d267a7776f3c4ec32e89064dcd84cd Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:10:29 +0200
-Subject: [PATCH 41/56] tools/{c,o}xenstored: Don't link against libsystemd
-
-Use the local freestanding wrapper instead.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Acked-by: Christian Lindig <christian.lindig@cloud.com>
-master commit: caf864482689a5dd6a945759b6372bb260d49665
-master date: 2024-05-23 15:04:40 +0100
----
- tools/ocaml/xenstored/Makefile | 3 +--
- tools/ocaml/xenstored/systemd_stubs.c | 2 +-
- tools/xenstored/Makefile | 5 -----
- tools/xenstored/core.c | 4 ++--
- 4 files changed, 4 insertions(+), 10 deletions(-)
-
-diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile
-index e8aaecf2e6..fa45305d8c 100644
---- a/tools/ocaml/xenstored/Makefile
-+++ b/tools/ocaml/xenstored/Makefile
-@@ -4,8 +4,7 @@ include $(OCAML_TOPLEVEL)/common.make
-
- # Include configure output (config.h)
- CFLAGS += -include $(XEN_ROOT)/tools/config.h
--CFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_CFLAGS)
--LDFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_LIBS)
-+CFLAGS-$(CONFIG_SYSTEMD) += $(CFLAGS_xeninclude)
-
- CFLAGS += $(CFLAGS-y)
- CFLAGS += $(APPEND_CFLAGS)
-diff --git a/tools/ocaml/xenstored/systemd_stubs.c b/tools/ocaml/xenstored/systemd_stubs.c
-index f4c875075a..7dbbdd35bf 100644
---- a/tools/ocaml/xenstored/systemd_stubs.c
-+++ b/tools/ocaml/xenstored/systemd_stubs.c
-@@ -25,7 +25,7 @@
-
- #if defined(HAVE_SYSTEMD)
-
--#include <systemd/sd-daemon.h>
-+#include <xen-sd-notify.h>
-
- CAMLprim value ocaml_sd_notify_ready(value ignore)
- {
-diff --git a/tools/xenstored/Makefile b/tools/xenstored/Makefile
-index e0897ed1ba..09adfe1d50 100644
---- a/tools/xenstored/Makefile
-+++ b/tools/xenstored/Makefile
-@@ -9,11 +9,6 @@ xenstored: LDLIBS += $(LDLIBS_libxenctrl)
- xenstored: LDLIBS += -lrt
- xenstored: LDLIBS += $(SOCKET_LIBS)
-
--ifeq ($(CONFIG_SYSTEMD),y)
--$(XENSTORED_OBJS-y): CFLAGS += $(SYSTEMD_CFLAGS)
--xenstored: LDLIBS += $(SYSTEMD_LIBS)
--endif
--
- TARGETS := xenstored
-
- .PHONY: all
-diff --git a/tools/xenstored/core.c b/tools/xenstored/core.c
-index edd07711db..dfe98e7bfc 100644
---- a/tools/xenstored/core.c
-+++ b/tools/xenstored/core.c
-@@ -61,7 +61,7 @@
- #endif
-
- #if defined(XEN_SYSTEMD_ENABLED)
--#include <systemd/sd-daemon.h>
-+#include <xen-sd-notify.h>
- #endif
-
- extern xenevtchn_handle *xce_handle; /* in domain.c */
-@@ -3000,7 +3000,7 @@ int main(int argc, char *argv[])
- #if defined(XEN_SYSTEMD_ENABLED)
- if (!live_update) {
- sd_notify(1, "READY=1");
-- fprintf(stderr, SD_NOTICE "xenstored is ready\n");
-+ fprintf(stderr, "xenstored is ready\n");
- }
- #endif
-
---
-2.45.2
-
diff --git a/0042-tools-Drop-libsystemd-as-a-dependency.patch b/0042-tools-Drop-libsystemd-as-a-dependency.patch
deleted file mode 100644
index 168680e..0000000
--- a/0042-tools-Drop-libsystemd-as-a-dependency.patch
+++ /dev/null
@@ -1,648 +0,0 @@
-From 7967bd358e93ed83e01813a8d0dfd68aa67f5780 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:10:40 +0200
-Subject: [PATCH 42/56] tools: Drop libsystemd as a dependency
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-There are no more users, and we want to dissuade people from introducing new
-users just for sd_notify() and friends. Drop the dependency.
-
-We still want the overall --with{,out}-systemd to gate the generation of the
-service/unit/mount/etc files.
-
-Rerun autogen.sh, and mark the dependency as removed in the build containers.
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Acked-by: Christian Lindig <christian.lindig@cloud.com>
-
-tools: (Actually) drop libsystemd as a dependency
-
-When reinstating some of systemd.m4 between v1 and v2, I reintroduced a little
-too much. While {c,o}xenstored are indeed no longer linked against
-libsystemd, ./configure still looks for it.
-
-Drop this too.
-
-Fixes: ae26101f6bfc ("tools: Drop libsystemd as a dependency")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: ae26101f6bfc8185adcdb9165d469bdc467780db
-master date: 2024-05-23 15:04:40 +0100
-master commit: 6ef4fa1e7fe78c1dae07b451292b07facfce4902
-master date: 2024-05-30 12:15:25 +0100
----
- CHANGELOG.md | 7 +-
- config/Tools.mk.in | 2 -
- m4/systemd.m4 | 17 --
- tools/configure | 485 +--------------------------------------------
- 4 files changed, 7 insertions(+), 504 deletions(-)
-
-diff --git a/CHANGELOG.md b/CHANGELOG.md
-index fa54d59df1..ceca12eb5f 100644
---- a/CHANGELOG.md
-+++ b/CHANGELOG.md
-@@ -4,7 +4,12 @@ Notable changes to Xen will be documented in this file.
-
- The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
-
--## [4.18.2](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.18.2)
-+## [4.18.3](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.18.3)
-+
-+### Changed
-+ - When building with Systemd support (./configure --enable-systemd), remove
-+ libsystemd as a build dependency. Systemd Notify support is retained, now
-+ using a standalone library implementation.
-
- ## [4.18.1](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-4.18.1)
-
-diff --git a/config/Tools.mk.in b/config/Tools.mk.in
-index b54ab21f96..50fbef841f 100644
---- a/config/Tools.mk.in
-+++ b/config/Tools.mk.in
-@@ -52,8 +52,6 @@ CONFIG_PYGRUB := @pygrub@
- CONFIG_LIBFSIMAGE := @libfsimage@
-
- CONFIG_SYSTEMD := @systemd@
--SYSTEMD_CFLAGS := @SYSTEMD_CFLAGS@
--SYSTEMD_LIBS := @SYSTEMD_LIBS@
- XEN_SYSTEMD_DIR := @SYSTEMD_DIR@
- XEN_SYSTEMD_MODULES_LOAD := @SYSTEMD_MODULES_LOAD@
- CONFIG_9PFS := @ninepfs@
-diff --git a/m4/systemd.m4 b/m4/systemd.m4
-index 112dc11b5e..ab12ea313d 100644
---- a/m4/systemd.m4
-+++ b/m4/systemd.m4
-@@ -41,15 +41,6 @@ AC_DEFUN([AX_ALLOW_SYSTEMD_OPTS], [
- ])
-
- AC_DEFUN([AX_CHECK_SYSTEMD_LIBS], [
-- PKG_CHECK_MODULES([SYSTEMD], [libsystemd-daemon],,
-- [PKG_CHECK_MODULES([SYSTEMD], [libsystemd >= 209])]
-- )
-- dnl pkg-config older than 0.24 does not set these for
-- dnl PKG_CHECK_MODULES() worth also noting is that as of version 208
-- dnl of systemd pkg-config --cflags currently yields no extra flags yet.
-- AC_SUBST([SYSTEMD_CFLAGS])
-- AC_SUBST([SYSTEMD_LIBS])
--
- AS_IF([test "x$SYSTEMD_DIR" = x], [
- dnl In order to use the line below we need to fix upstream systemd
- dnl to properly ${prefix} for child variables in
-@@ -95,13 +86,6 @@ AC_DEFUN([AX_CHECK_SYSTEMD], [
- ],[systemd=n])
- ])
-
--AC_DEFUN([AX_CHECK_SYSTEMD_ENABLE_AVAILABLE], [
-- PKG_CHECK_MODULES([SYSTEMD], [libsystemd-daemon], [systemd="y"],[
-- PKG_CHECK_MODULES([SYSTEMD], [libsystemd >= 209],
-- [systemd="y"],[systemd="n"])
-- ])
--])
--
- dnl Enables systemd by default and requires a --disable-systemd option flag
- dnl to configure if you want to disable.
- AC_DEFUN([AX_ENABLE_SYSTEMD], [
-@@ -121,6 +105,5 @@ dnl to have systemd build libraries it will be enabled. You can always force
- dnl disable with --disable-systemd
- AC_DEFUN([AX_AVAILABLE_SYSTEMD], [
- AX_ALLOW_SYSTEMD_OPTS()
-- AX_CHECK_SYSTEMD_ENABLE_AVAILABLE()
- AX_CHECK_SYSTEMD()
- ])
-diff --git a/tools/configure b/tools/configure
-index 38c0808d3a..7bb935d23b 100755
---- a/tools/configure
-+++ b/tools/configure
-@@ -626,8 +626,6 @@ ac_subst_vars='LTLIBOBJS
- LIBOBJS
- pvshim
- ninepfs
--SYSTEMD_LIBS
--SYSTEMD_CFLAGS
- SYSTEMD_MODULES_LOAD
- SYSTEMD_DIR
- systemd
-@@ -864,9 +862,7 @@ pixman_LIBS
- libzstd_CFLAGS
- libzstd_LIBS
- LIBNL3_CFLAGS
--LIBNL3_LIBS
--SYSTEMD_CFLAGS
--SYSTEMD_LIBS'
-+LIBNL3_LIBS'
-
-
- # Initialize some variables set by options.
-@@ -1621,10 +1617,6 @@ Some influential environment variables:
- LIBNL3_CFLAGS
- C compiler flags for LIBNL3, overriding pkg-config
- LIBNL3_LIBS linker flags for LIBNL3, overriding pkg-config
-- SYSTEMD_CFLAGS
-- C compiler flags for SYSTEMD, overriding pkg-config
-- SYSTEMD_LIBS
-- linker flags for SYSTEMD, overriding pkg-config
-
- Use these variables to override the choices made by `configure' or to help
- it to find libraries and programs with nonstandard names/locations.
-@@ -3889,8 +3881,6 @@ esac
-
-
-
--
--
-
-
-
-@@ -9540,223 +9530,6 @@ fi
-
-
-
--
--pkg_failed=no
--{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
--$as_echo_n "checking for SYSTEMD... " >&6; }
--
--if test -n "$SYSTEMD_CFLAGS"; then
-- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd-daemon" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--if test -n "$SYSTEMD_LIBS"; then
-- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd-daemon" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--
--
--
--if test $pkg_failed = yes; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
-- _pkg_short_errors_supported=yes
--else
-- _pkg_short_errors_supported=no
--fi
-- if test $_pkg_short_errors_supported = yes; then
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
-- else
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
-- fi
-- # Put the nasty error message in config.log where it belongs
-- echo "$SYSTEMD_PKG_ERRORS" >&5
--
--
--
--pkg_failed=no
--{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
--$as_echo_n "checking for SYSTEMD... " >&6; }
--
--if test -n "$SYSTEMD_CFLAGS"; then
-- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--if test -n "$SYSTEMD_LIBS"; then
-- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--
--
--
--if test $pkg_failed = yes; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
-- _pkg_short_errors_supported=yes
--else
-- _pkg_short_errors_supported=no
--fi
-- if test $_pkg_short_errors_supported = yes; then
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- else
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- fi
-- # Put the nasty error message in config.log where it belongs
-- echo "$SYSTEMD_PKG_ERRORS" >&5
--
-- systemd="n"
--elif test $pkg_failed = untried; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
-- systemd="n"
--else
-- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
-- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
--$as_echo "yes" >&6; }
-- systemd="y"
--fi
--
--elif test $pkg_failed = untried; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--
--pkg_failed=no
--{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
--$as_echo_n "checking for SYSTEMD... " >&6; }
--
--if test -n "$SYSTEMD_CFLAGS"; then
-- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--if test -n "$SYSTEMD_LIBS"; then
-- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--
--
--
--if test $pkg_failed = yes; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
-- _pkg_short_errors_supported=yes
--else
-- _pkg_short_errors_supported=no
--fi
-- if test $_pkg_short_errors_supported = yes; then
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- else
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- fi
-- # Put the nasty error message in config.log where it belongs
-- echo "$SYSTEMD_PKG_ERRORS" >&5
--
-- systemd="n"
--elif test $pkg_failed = untried; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
-- systemd="n"
--else
-- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
-- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
--$as_echo "yes" >&6; }
-- systemd="y"
--fi
--
--else
-- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
-- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
--$as_echo "yes" >&6; }
-- systemd="y"
--fi
--
--
- if test "x$enable_systemd" != "xno"; then :
-
- if test "x$systemd" = "xy" ; then :
-@@ -9766,262 +9539,6 @@ $as_echo "#define HAVE_SYSTEMD 1" >>confdefs.h
-
- systemd=y
-
--
--pkg_failed=no
--{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
--$as_echo_n "checking for SYSTEMD... " >&6; }
--
--if test -n "$SYSTEMD_CFLAGS"; then
-- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd-daemon" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--if test -n "$SYSTEMD_LIBS"; then
-- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd-daemon\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd-daemon") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd-daemon" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--
--
--
--if test $pkg_failed = yes; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
-- _pkg_short_errors_supported=yes
--else
-- _pkg_short_errors_supported=no
--fi
-- if test $_pkg_short_errors_supported = yes; then
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
-- else
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd-daemon" 2>&1`
-- fi
-- # Put the nasty error message in config.log where it belongs
-- echo "$SYSTEMD_PKG_ERRORS" >&5
--
--
--pkg_failed=no
--{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
--$as_echo_n "checking for SYSTEMD... " >&6; }
--
--if test -n "$SYSTEMD_CFLAGS"; then
-- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--if test -n "$SYSTEMD_LIBS"; then
-- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--
--
--
--if test $pkg_failed = yes; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
-- _pkg_short_errors_supported=yes
--else
-- _pkg_short_errors_supported=no
--fi
-- if test $_pkg_short_errors_supported = yes; then
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- else
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- fi
-- # Put the nasty error message in config.log where it belongs
-- echo "$SYSTEMD_PKG_ERRORS" >&5
--
-- as_fn_error $? "Package requirements (libsystemd >= 209) were not met:
--
--$SYSTEMD_PKG_ERRORS
--
--Consider adjusting the PKG_CONFIG_PATH environment variable if you
--installed software in a non-standard prefix.
--
--Alternatively, you may set the environment variables SYSTEMD_CFLAGS
--and SYSTEMD_LIBS to avoid the need to call pkg-config.
--See the pkg-config man page for more details." "$LINENO" 5
--elif test $pkg_failed = untried; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
--$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
--as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
--is in your PATH or set the PKG_CONFIG environment variable to the full
--path to pkg-config.
--
--Alternatively, you may set the environment variables SYSTEMD_CFLAGS
--and SYSTEMD_LIBS to avoid the need to call pkg-config.
--See the pkg-config man page for more details.
--
--To get pkg-config, see <http://pkg-config.freedesktop.org/>.
--See \`config.log' for more details" "$LINENO" 5; }
--else
-- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
-- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
--$as_echo "yes" >&6; }
--
--fi
--
--elif test $pkg_failed = untried; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--pkg_failed=no
--{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SYSTEMD" >&5
--$as_echo_n "checking for SYSTEMD... " >&6; }
--
--if test -n "$SYSTEMD_CFLAGS"; then
-- pkg_cv_SYSTEMD_CFLAGS="$SYSTEMD_CFLAGS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_CFLAGS=`$PKG_CONFIG --cflags "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--if test -n "$SYSTEMD_LIBS"; then
-- pkg_cv_SYSTEMD_LIBS="$SYSTEMD_LIBS"
-- elif test -n "$PKG_CONFIG"; then
-- if test -n "$PKG_CONFIG" && \
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists --print-errors \"libsystemd >= 209\""; } >&5
-- ($PKG_CONFIG --exists --print-errors "libsystemd >= 209") 2>&5
-- ac_status=$?
-- $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
-- test $ac_status = 0; }; then
-- pkg_cv_SYSTEMD_LIBS=`$PKG_CONFIG --libs "libsystemd >= 209" 2>/dev/null`
-- test "x$?" != "x0" && pkg_failed=yes
--else
-- pkg_failed=yes
--fi
-- else
-- pkg_failed=untried
--fi
--
--
--
--if test $pkg_failed = yes; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
--
--if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
-- _pkg_short_errors_supported=yes
--else
-- _pkg_short_errors_supported=no
--fi
-- if test $_pkg_short_errors_supported = yes; then
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- else
-- SYSTEMD_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs "libsystemd >= 209" 2>&1`
-- fi
-- # Put the nasty error message in config.log where it belongs
-- echo "$SYSTEMD_PKG_ERRORS" >&5
--
-- as_fn_error $? "Package requirements (libsystemd >= 209) were not met:
--
--$SYSTEMD_PKG_ERRORS
--
--Consider adjusting the PKG_CONFIG_PATH environment variable if you
--installed software in a non-standard prefix.
--
--Alternatively, you may set the environment variables SYSTEMD_CFLAGS
--and SYSTEMD_LIBS to avoid the need to call pkg-config.
--See the pkg-config man page for more details." "$LINENO" 5
--elif test $pkg_failed = untried; then
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
--$as_echo "no" >&6; }
-- { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
--$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
--as_fn_error $? "The pkg-config script could not be found or is too old. Make sure it
--is in your PATH or set the PKG_CONFIG environment variable to the full
--path to pkg-config.
--
--Alternatively, you may set the environment variables SYSTEMD_CFLAGS
--and SYSTEMD_LIBS to avoid the need to call pkg-config.
--See the pkg-config man page for more details.
--
--To get pkg-config, see <http://pkg-config.freedesktop.org/>.
--See \`config.log' for more details" "$LINENO" 5; }
--else
-- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
-- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
--$as_echo "yes" >&6; }
--
--fi
--
--else
-- SYSTEMD_CFLAGS=$pkg_cv_SYSTEMD_CFLAGS
-- SYSTEMD_LIBS=$pkg_cv_SYSTEMD_LIBS
-- { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
--$as_echo "yes" >&6; }
--
--fi
--
--
--
- if test "x$SYSTEMD_DIR" = x; then :
-
- SYSTEMD_DIR="\$(prefix)/lib/systemd/system/"
---
-2.45.2
-
diff --git a/0043-x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch b/0043-x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch
deleted file mode 100644
index c368c1d..0000000
--- a/0043-x86-ioapic-Fix-signed-shifts-in-io_apic.c.patch
+++ /dev/null
@@ -1,46 +0,0 @@
-From 0dc5fbee17cd2bcb1aa6a1cf420dd80381587de8 Mon Sep 17 00:00:00 2001
-From: Matthew Barnes <matthew.barnes@cloud.com>
-Date: Thu, 4 Jul 2024 14:11:03 +0200
-Subject: [PATCH 43/56] x86/ioapic: Fix signed shifts in io_apic.c
-
-There exists bitshifts in the IOAPIC code where signed integers are
-shifted to the left by up to 31 bits, which is undefined behaviour.
-
-This patch fixes this by changing the integers from signed to unsigned.
-
-Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: c5746b021e573184fb92b601a0e93a295485054e
-master date: 2024-06-21 15:09:26 +0100
----
- xen/arch/x86/io_apic.c | 6 ++++--
- 1 file changed, 4 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
-index 0ef61fb2f1..c5342789e8 100644
---- a/xen/arch/x86/io_apic.c
-+++ b/xen/arch/x86/io_apic.c
-@@ -1692,7 +1692,8 @@ static void cf_check mask_and_ack_level_ioapic_irq(struct irq_desc *desc)
- !io_apic_level_ack_pending(desc->irq))
- move_masked_irq(desc);
-
-- if ( !(v & (1 << (i & 0x1f))) ) {
-+ if ( !(v & (1U << (i & 0x1f))) )
-+ {
- spin_lock(&ioapic_lock);
- __edge_IO_APIC_irq(desc->irq);
- __level_IO_APIC_irq(desc->irq);
-@@ -1756,7 +1757,8 @@ static void cf_check end_level_ioapic_irq_new(struct irq_desc *desc, u8 vector)
- !io_apic_level_ack_pending(desc->irq) )
- move_native_irq(desc);
-
-- if (!(v & (1 << (i & 0x1f)))) {
-+ if ( !(v & (1U << (i & 0x1f))) )
-+ {
- spin_lock(&ioapic_lock);
- __mask_IO_APIC_irq(desc->irq);
- __edge_IO_APIC_irq(desc->irq);
---
-2.45.2
-
diff --git a/0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch b/0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch
deleted file mode 100644
index 39dc3eb..0000000
--- a/0044-tools-xl-Open-xldevd.log-with-O_CLOEXEC.patch
+++ /dev/null
@@ -1,53 +0,0 @@
-From 2b3bf02c4f5e44d7d7bd3636530c9ebc837dea87 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:11:36 +0200
-Subject: [PATCH 44/56] tools/xl: Open xldevd.log with O_CLOEXEC
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-`xl devd` has been observed leaking /var/log/xldevd.log into children.
-
-Note this is specifically safe; dup2() leaves O_CLOEXEC disabled on newfd, so
-after setting up stdout/stderr, it's only the logfile fd which will close on
-exec().
-
-Link: https://github.com/QubesOS/qubes-issues/issues/8292
-Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
-Reviewed-by: Demi Marie Obenour <demi@invisiblethingslab.com>
-Acked-by: Anthony PERARD <anthony.perard@vates.tech>
-master commit: ba52b3b624e4a1a976908552364eba924ca45430
-master date: 2024-06-24 16:22:59 +0100
----
- tools/xl/xl_utils.c | 6 +++++-
- 1 file changed, 5 insertions(+), 1 deletion(-)
-
-diff --git a/tools/xl/xl_utils.c b/tools/xl/xl_utils.c
-index 17489d1829..b0d23b2cdb 100644
---- a/tools/xl/xl_utils.c
-+++ b/tools/xl/xl_utils.c
-@@ -27,6 +27,10 @@
- #include "xl.h"
- #include "xl_utils.h"
-
-+#ifndef O_CLOEXEC
-+#define O_CLOEXEC 0
-+#endif
-+
- void dolog(const char *file, int line, const char *func, const char *fmt, ...)
- {
- va_list ap;
-@@ -270,7 +274,7 @@ int do_daemonize(const char *name, const char *pidfile)
- exit(-1);
- }
-
-- CHK_SYSCALL(logfile = open(fullname, O_WRONLY|O_CREAT|O_APPEND, 0644));
-+ CHK_SYSCALL(logfile = open(fullname, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0644));
- free(fullname);
- assert(logfile >= 3);
-
---
-2.45.2
-
diff --git a/0045-pirq_cleanup_check-leaks.patch b/0045-pirq_cleanup_check-leaks.patch
deleted file mode 100644
index dcf96c7..0000000
--- a/0045-pirq_cleanup_check-leaks.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From c9f50d2c5f29b630603e2b95f29e5b6e416a6187 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Thu, 4 Jul 2024 14:11:57 +0200
-Subject: [PATCH 45/56] pirq_cleanup_check() leaks
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Its original introduction had two issues: For one the "common" part of
-the checks (carried out in the macro) was inverted. And then after
-removal from the radix tree the structure wasn't scheduled for freeing.
-(All structures still left in the radix tree would be freed upon domain
-destruction, though.)
-
-For the freeing to be safe even if it didn't use RCU (i.e. to avoid use-
-after-free), re-arrange checks/operations in evtchn_close(), such that
-the pointer wouldn't be used anymore after calling pirq_cleanup_check()
-(noting that unmap_domain_pirq_emuirq() itself calls the function in the
-success case).
-
-Fixes: c24536b636f2 ("replace d->nr_pirqs sized arrays with radix tree")
-Fixes: 79858fee307c ("xen: fix hvm_domain_use_pirq's behavior")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: daa90dfea9175c07f13d1a2d901857b2dd14d080
-master date: 2024-07-02 08:35:56 +0200
----
- xen/arch/x86/irq.c | 1 +
- xen/common/event_channel.c | 11 ++++++++---
- xen/include/xen/irq.h | 2 +-
- 3 files changed, 10 insertions(+), 4 deletions(-)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index 290f8d26e7..00be3b88e8 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -1413,6 +1413,7 @@ void (pirq_cleanup_check)(struct pirq *pirq, struct domain *d)
-
- if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
- BUG();
-+ free_pirq_struct(pirq);
- }
-
- /* Flush all ready EOIs from the top of this CPU's pending-EOI stack. */
-diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
-index 66f924a7b0..b1a6215c37 100644
---- a/xen/common/event_channel.c
-+++ b/xen/common/event_channel.c
-@@ -705,11 +705,16 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
- if ( !is_hvm_domain(d1) )
- pirq_guest_unbind(d1, pirq);
- pirq->evtchn = 0;
-- pirq_cleanup_check(pirq, d1);
- #ifdef CONFIG_X86
-- if ( is_hvm_domain(d1) && domain_pirq_to_irq(d1, pirq->pirq) > 0 )
-- unmap_domain_pirq_emuirq(d1, pirq->pirq);
-+ if ( !is_hvm_domain(d1) ||
-+ domain_pirq_to_irq(d1, pirq->pirq) <= 0 ||
-+ unmap_domain_pirq_emuirq(d1, pirq->pirq) < 0 )
-+ /*
-+ * The successful path of unmap_domain_pirq_emuirq() will have
-+ * called pirq_cleanup_check() already.
-+ */
- #endif
-+ pirq_cleanup_check(pirq, d1);
- }
- unlink_pirq_port(chn1, d1->vcpu[chn1->notify_vcpu_id]);
- break;
-diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
-index 65083135e1..5dcd2d8f0c 100644
---- a/xen/include/xen/irq.h
-+++ b/xen/include/xen/irq.h
-@@ -180,7 +180,7 @@ extern struct pirq *pirq_get_info(struct domain *d, int pirq);
- void pirq_cleanup_check(struct pirq *pirq, struct domain *d);
-
- #define pirq_cleanup_check(pirq, d) \
-- ((pirq)->evtchn ? pirq_cleanup_check(pirq, d) : (void)0)
-+ (!(pirq)->evtchn ? pirq_cleanup_check(pirq, d) : (void)0)
-
- extern void pirq_guest_eoi(struct pirq *pirq);
- extern void desc_guest_eoi(struct irq_desc *desc, struct pirq *pirq);
---
-2.45.2
-
diff --git a/0046-tools-dombuilder-Correct-the-length-calculation-in-x.patch b/0046-tools-dombuilder-Correct-the-length-calculation-in-x.patch
deleted file mode 100644
index b25f15d..0000000
--- a/0046-tools-dombuilder-Correct-the-length-calculation-in-x.patch
+++ /dev/null
@@ -1,44 +0,0 @@
-From 8e51c8f1d45fad242a315fa17ba3582c02e66840 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:12:31 +0200
-Subject: [PATCH 46/56] tools/dombuilder: Correct the length calculation in
- xc_dom_alloc_segment()
-
-xc_dom_alloc_segment() is passed a size in bytes, calculates a size in pages
-from it, then fills in the new segment information with a bytes value
-re-calculated from the number of pages.
-
-This causes the module information given to the guest (MB, or PVH) to have
-incorrect sizes; specifically, sizes rounded up to the next page.
-
-This in turn is problematic for Xen. When Xen finds a gzipped module, it
-peeks at the end metadata to judge the decompressed size, which is a -4
-backreference from the reported end of the module.
-
-Fill in seg->vend using the correct number of bytes.
-
-Fixes: ea7c8a3d0e82 ("libxc: reorganize domain builder guest memory allocator")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Acked-by: Anthony PERARD <anthony.perard@vates.tech>
-master commit: 4c3a618b0adaa0cd59e0fa0898bb60978b8b3a5f
-master date: 2024-07-02 10:50:18 +0100
----
- tools/libs/guest/xg_dom_core.c | 2 +-
- 1 file changed, 1 insertion(+), 1 deletion(-)
-
-diff --git a/tools/libs/guest/xg_dom_core.c b/tools/libs/guest/xg_dom_core.c
-index c4f4e7f3e2..f5521d528b 100644
---- a/tools/libs/guest/xg_dom_core.c
-+++ b/tools/libs/guest/xg_dom_core.c
-@@ -601,7 +601,7 @@ int xc_dom_alloc_segment(struct xc_dom_image *dom,
- memset(ptr, 0, pages * page_size);
-
- seg->vstart = start;
-- seg->vend = dom->virt_alloc_end;
-+ seg->vend = start + size;
-
- DOMPRINTF("%-20s: %-12s : 0x%" PRIx64 " -> 0x%" PRIx64
- " (pfn 0x%" PRIpfn " + 0x%" PRIpfn " pages)",
---
-2.45.2
-
diff --git a/0047-tools-libxs-Fix-CLOEXEC-handling-in-get_dev.patch b/0047-tools-libxs-Fix-CLOEXEC-handling-in-get_dev.patch
deleted file mode 100644
index aabae58..0000000
--- a/0047-tools-libxs-Fix-CLOEXEC-handling-in-get_dev.patch
+++ /dev/null
@@ -1,95 +0,0 @@
-From d1b3bbb46402af77089906a97c413c14ed1740d2 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:13:10 +0200
-Subject: [PATCH 47/56] tools/libxs: Fix CLOEXEC handling in get_dev()
-
-Move the O_CLOEXEC compatibility outside of an #ifdef USE_PTHREAD block.
-
-Introduce set_cloexec() to wrap fcntl() setting FD_CLOEXEC. It will be reused
-for other CLOEXEC fixes too.
-
-Use set_cloexec() when O_CLOEXEC isn't available as a best-effort fallback.
-
-Fixes: f4f2f3402b2f ("tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC")
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Acked-by: Anthony PERARD <anthony.perard@vates.tech>
-master commit: bf7c1464706adfa903f1e7d59383d042c3a88e39
-master date: 2024-07-02 10:51:06 +0100
----
- tools/libs/store/xs.c | 38 ++++++++++++++++++++++++++++++++------
- 1 file changed, 32 insertions(+), 6 deletions(-)
-
-diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
-index 1498515073..037e79d98b 100644
---- a/tools/libs/store/xs.c
-+++ b/tools/libs/store/xs.c
-@@ -40,6 +40,10 @@
- #include <xentoolcore_internal.h>
- #include <xen_list.h>
-
-+#ifndef O_CLOEXEC
-+#define O_CLOEXEC 0
-+#endif
-+
- struct xs_stored_msg {
- XEN_TAILQ_ENTRY(struct xs_stored_msg) list;
- struct xsd_sockmsg hdr;
-@@ -54,10 +58,6 @@ struct xs_stored_msg {
- #include <dlfcn.h>
- #endif
-
--#ifndef O_CLOEXEC
--#define O_CLOEXEC 0
--#endif
--
- struct xs_handle {
- /* Communications channel to xenstore daemon. */
- int fd;
-@@ -176,6 +176,16 @@ static bool setnonblock(int fd, int nonblock) {
- return true;
- }
-
-+static bool set_cloexec(int fd)
-+{
-+ int flags = fcntl(fd, F_GETFL);
-+
-+ if (flags < 0)
-+ return false;
-+
-+ return fcntl(fd, flags | FD_CLOEXEC) >= 0;
-+}
-+
- int xs_fileno(struct xs_handle *h)
- {
- char c = 0;
-@@ -230,8 +240,24 @@ error:
-
- static int get_dev(const char *connect_to)
- {
-- /* We cannot open read-only because requests are writes */
-- return open(connect_to, O_RDWR | O_CLOEXEC);
-+ int fd, saved_errno;
-+
-+ fd = open(connect_to, O_RDWR | O_CLOEXEC);
-+ if (fd < 0)
-+ return -1;
-+
-+ /* Compat for non-O_CLOEXEC environments. Racy. */
-+ if (!O_CLOEXEC && !set_cloexec(fd))
-+ goto error;
-+
-+ return fd;
-+
-+error:
-+ saved_errno = errno;
-+ close(fd);
-+ errno = saved_errno;
-+
-+ return -1;
- }
-
- static int all_restrict_cb(Xentoolcore__Active_Handle *ah, domid_t domid) {
---
-2.45.2
-
diff --git a/0048-tools-libxs-Fix-CLOEXEC-handling-in-get_socket.patch b/0048-tools-libxs-Fix-CLOEXEC-handling-in-get_socket.patch
deleted file mode 100644
index e01a6b4..0000000
--- a/0048-tools-libxs-Fix-CLOEXEC-handling-in-get_socket.patch
+++ /dev/null
@@ -1,60 +0,0 @@
-From d689bb4d2cd3ccdb0067b0ca953cccbc5ab375ae Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:13:18 +0200
-Subject: [PATCH 48/56] tools/libxs: Fix CLOEXEC handling in get_socket()
-
-get_socket() opens a socket, then uses fcntl() to set CLOEXEC. This is racy
-with exec().
-
-Open the socket with SOCK_CLOEXEC. Use the same compatibility strategy as
-O_CLOEXEC on ancient versions of Linux.
-
-Reported-by: Frediano Ziglio <frediano.ziglio@cloud.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Acked-by: Anthony PERARD <anthony.perard@vates.tech>
-master commit: 1957dd6aff931877fc22699d8f2d4be8728014ba
-master date: 2024-07-02 10:51:11 +0100
----
- tools/libs/store/xs.c | 14 ++++++++------
- 1 file changed, 8 insertions(+), 6 deletions(-)
-
-diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
-index 037e79d98b..11a766c508 100644
---- a/tools/libs/store/xs.c
-+++ b/tools/libs/store/xs.c
-@@ -44,6 +44,10 @@
- #define O_CLOEXEC 0
- #endif
-
-+#ifndef SOCK_CLOEXEC
-+#define SOCK_CLOEXEC 0
-+#endif
-+
- struct xs_stored_msg {
- XEN_TAILQ_ENTRY(struct xs_stored_msg) list;
- struct xsd_sockmsg hdr;
-@@ -207,16 +211,14 @@ int xs_fileno(struct xs_handle *h)
- static int get_socket(const char *connect_to)
- {
- struct sockaddr_un addr;
-- int sock, saved_errno, flags;
-+ int sock, saved_errno;
-
-- sock = socket(PF_UNIX, SOCK_STREAM, 0);
-+ sock = socket(PF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
- if (sock < 0)
- return -1;
-
-- if ((flags = fcntl(sock, F_GETFD)) < 0)
-- goto error;
-- flags |= FD_CLOEXEC;
-- if (fcntl(sock, F_SETFD, flags) < 0)
-+ /* Compat for non-SOCK_CLOEXEC environments. Racy. */
-+ if (!SOCK_CLOEXEC && !set_cloexec(sock))
- goto error;
-
- addr.sun_family = AF_UNIX;
---
-2.45.2
-
diff --git a/0049-tools-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch b/0049-tools-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch
deleted file mode 100644
index 564cece..0000000
--- a/0049-tools-libxs-Fix-CLOEXEC-handling-in-xs_fileno.patch
+++ /dev/null
@@ -1,109 +0,0 @@
-From 26b8ff1861a870e01456b31bf999f25df5538ebf Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 4 Jul 2024 14:13:30 +0200
-Subject: [PATCH 49/56] tools/libxs: Fix CLOEXEC handling in xs_fileno()
-
-xs_fileno() opens a pipe on first use to communicate between the watch thread
-and the main thread. Nothing ever sets CLOEXEC on the file descriptors.
-
-Check for the availability of the pipe2() function with configure. Despite
-starting life as Linux-only, FreeBSD and NetBSD have gained it.
-
-When pipe2() isn't available, try our best with pipe() and set_cloexec().
-
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-Acked-by: Anthony PERARD <anthony.perard@vates.tech>
-master commit: a2ff677852f0ce05fa335e8e5682bf2ae0c916ee
-master date: 2024-07-02 10:52:59 +0100
----
- tools/config.h.in | 3 +++
- tools/configure | 12 ++++++++++++
- tools/configure.ac | 2 ++
- tools/libs/store/xs.c | 16 +++++++++++++++-
- 4 files changed, 32 insertions(+), 1 deletion(-)
-
-diff --git a/tools/config.h.in b/tools/config.h.in
-index 0bb2fe08a1..50ad60fcb0 100644
---- a/tools/config.h.in
-+++ b/tools/config.h.in
-@@ -39,6 +39,9 @@
- /* Define to 1 if you have the <memory.h> header file. */
- #undef HAVE_MEMORY_H
-
-+/* Define to 1 if you have the `pipe2' function. */
-+#undef HAVE_PIPE2
-+
- /* pygrub enabled */
- #undef HAVE_PYGRUB
-
-diff --git a/tools/configure b/tools/configure
-index 7bb935d23b..e35112b5c5 100755
---- a/tools/configure
-+++ b/tools/configure
-@@ -9751,6 +9751,18 @@ if test "$ax_found" = "0"; then :
- fi
-
-
-+for ac_func in pipe2
-+do :
-+ ac_fn_c_check_func "$LINENO" "pipe2" "ac_cv_func_pipe2"
-+if test "x$ac_cv_func_pipe2" = xyes; then :
-+ cat >>confdefs.h <<_ACEOF
-+#define HAVE_PIPE2 1
-+_ACEOF
-+
-+fi
-+done
-+
-+
- cat >confcache <<\_ACEOF
- # This file is a shell script that caches the results of configure
- # tests run on this system so they can be shared between configure
-diff --git a/tools/configure.ac b/tools/configure.ac
-index 618ef8c63f..53ac20af1e 100644
---- a/tools/configure.ac
-+++ b/tools/configure.ac
-@@ -543,4 +543,6 @@ AS_IF([test "x$pvshim" = "xy"], [
-
- AX_FIND_HEADER([INCLUDE_ENDIAN_H], [endian.h sys/endian.h])
-
-+AC_CHECK_FUNCS([pipe2])
-+
- AC_OUTPUT()
-diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
-index 11a766c508..c8845b69e2 100644
---- a/tools/libs/store/xs.c
-+++ b/tools/libs/store/xs.c
-@@ -190,13 +190,27 @@ static bool set_cloexec(int fd)
- return fcntl(fd, flags | FD_CLOEXEC) >= 0;
- }
-
-+static int pipe_cloexec(int fds[2])
-+{
-+#if HAVE_PIPE2
-+ return pipe2(fds, O_CLOEXEC);
-+#else
-+ if (pipe(fds) < 0)
-+ return -1;
-+ /* Best effort to set CLOEXEC. Racy. */
-+ set_cloexec(fds[0]);
-+ set_cloexec(fds[1]);
-+ return 0;
-+#endif
-+}
-+
- int xs_fileno(struct xs_handle *h)
- {
- char c = 0;
-
- mutex_lock(&h->watch_mutex);
-
-- if ((h->watch_pipe[0] == -1) && (pipe(h->watch_pipe) != -1)) {
-+ if ((h->watch_pipe[0] == -1) && (pipe_cloexec(h->watch_pipe) != -1)) {
- /* Kick things off if the watch list is already non-empty. */
- if (!XEN_TAILQ_EMPTY(&h->watch_list))
- while (write(h->watch_pipe[1], &c, 1) != 1)
---
-2.45.2
-
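For context, a hedged sketch of the usual consumer of xs_fileno(): polling the watch pipe that the patch above now marks CLOEXEC. The xs_fileno() and xs_read_watch() calls follow the public <xenstore.h> interface; the wait_for_watch() wrapper is hypothetical and not part of the patch.

    #include <poll.h>
    #include <stdlib.h>
    #include <xenstore.h>

    /* Block until a previously registered xenstore watch fires, then consume
     * the event from the watch pipe. */
    static void wait_for_watch(struct xs_handle *h)
    {
        struct pollfd pfd = { .fd = xs_fileno(h), .events = POLLIN };

        if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
            unsigned int num;
            char **ev = xs_read_watch(h, &num); /* ev[0]=path, ev[1]=token */

            free(ev);
        }
    }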
diff --git a/0050-cmdline-document-and-enforce-extra_guest_irqs-upper-.patch b/0050-cmdline-document-and-enforce-extra_guest_irqs-upper-.patch
deleted file mode 100644
index f7f61e8..0000000
--- a/0050-cmdline-document-and-enforce-extra_guest_irqs-upper-.patch
+++ /dev/null
@@ -1,156 +0,0 @@
-From 30c695ddaf067cbe7a98037474e7910109238807 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Thu, 4 Jul 2024 14:14:16 +0200
-Subject: [PATCH 50/56] cmdline: document and enforce "extra_guest_irqs" upper
- bounds
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-PHYSDEVOP_pirq_eoi_gmfn_v<N> accepting just a single GFN implies that no
-more than 32k pIRQ-s can be used by a domain on x86. Document this upper
-bound.
-
-To also enforce the limit, (ab)use both arch_hwdom_irqs() (changing its
-parameter type) and setup_system_domains(). This is primarily to avoid
-exposing the two static variables or introducing yet further arch hooks.
-
-While touching arch_hwdom_irqs() also mark it hwdom-init.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Acked-by: Roger Pau Monné <roger.pau@citrix.com>
-
-amend 'cmdline: document and enforce "extra_guest_irqs" upper bounds'
-
-Address late review comments for what is now commit 17f6d398f765:
-- bound max_irqs right away against nr_irqs
-- introduce a #define for a constant used twice
-
-Requested-by: Roger Pau Monné <roger.pau@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 17f6d398f76597f8009ec0530842fb8705ece7ba
-master date: 2024-07-02 12:00:27 +0200
-master commit: 1f56accba33ffea0abf7d1c6384710823d10cbd6
-master date: 2024-07-03 14:03:27 +0200
----
- docs/misc/xen-command-line.pandoc | 3 ++-
- xen/arch/x86/io_apic.c | 17 ++++++++++-------
- xen/common/domain.c | 24 ++++++++++++++++++++++--
- xen/include/xen/irq.h | 3 ++-
- 4 files changed, 36 insertions(+), 11 deletions(-)
-
-diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
-index 10a09bbf23..d857bd0f89 100644
---- a/docs/misc/xen-command-line.pandoc
-+++ b/docs/misc/xen-command-line.pandoc
-@@ -1175,7 +1175,8 @@ common for all domUs, while the optional second number (preceded by a comma)
- is for dom0. Changing the setting for domU has no impact on dom0 and vice
- versa. For example to change dom0 without changing domU, use
- `extra_guest_irqs=,512`. The default value for Dom0 and an eventual separate
--hardware domain is architecture dependent.
-+hardware domain is architecture dependent. The upper limit for both values on
-+x86 is such that the resulting total number of IRQs can't be higher than 32768.
- Note that specifying zero as domU value means zero, while for dom0 it means
- to use the default.
-
-diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
-index c5342789e8..f7591fd091 100644
---- a/xen/arch/x86/io_apic.c
-+++ b/xen/arch/x86/io_apic.c
-@@ -2664,18 +2664,21 @@ void __init ioapic_init(void)
- nr_irqs_gsi, nr_irqs - nr_irqs_gsi);
- }
-
--unsigned int arch_hwdom_irqs(domid_t domid)
-+unsigned int __hwdom_init arch_hwdom_irqs(const struct domain *d)
- {
- unsigned int n = fls(num_present_cpus());
-+ /* Bounding by the domain pirq EOI bitmap capacity. */
-+ const unsigned int max_irqs = min_t(unsigned int, nr_irqs,
-+ PAGE_SIZE * BITS_PER_BYTE);
-
-- if ( !domid )
-- n = min(n, dom0_max_vcpus());
-- n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, nr_irqs);
-+ if ( is_system_domain(d) )
-+ return max_irqs;
-
-- /* Bounded by the domain pirq eoi bitmap gfn. */
-- n = min_t(unsigned int, n, PAGE_SIZE * BITS_PER_BYTE);
-+ if ( !d->domain_id )
-+ n = min(n, dom0_max_vcpus());
-+ n = min(nr_irqs_gsi + n * NR_DYNAMIC_VECTORS, max_irqs);
-
-- printk("Dom%d has maximum %u PIRQs\n", domid, n);
-+ printk("%pd has maximum %u PIRQs\n", d, n);
-
- return n;
- }
-diff --git a/xen/common/domain.c b/xen/common/domain.c
-index 003f4ab125..62832a5860 100644
---- a/xen/common/domain.c
-+++ b/xen/common/domain.c
-@@ -351,7 +351,8 @@ static int late_hwdom_init(struct domain *d)
- }
-
- static unsigned int __read_mostly extra_hwdom_irqs;
--static unsigned int __read_mostly extra_domU_irqs = 32;
-+#define DEFAULT_EXTRA_DOMU_IRQS 32U
-+static unsigned int __read_mostly extra_domU_irqs = DEFAULT_EXTRA_DOMU_IRQS;
-
- static int __init cf_check parse_extra_guest_irqs(const char *s)
- {
-@@ -688,7 +689,7 @@ struct domain *domain_create(domid_t domid,
- d->nr_pirqs = nr_static_irqs + extra_domU_irqs;
- else
- d->nr_pirqs = extra_hwdom_irqs ? nr_static_irqs + extra_hwdom_irqs
-- : arch_hwdom_irqs(domid);
-+ : arch_hwdom_irqs(d);
- d->nr_pirqs = min(d->nr_pirqs, nr_irqs);
-
- radix_tree_init(&d->pirq_tree);
-@@ -812,6 +813,25 @@ void __init setup_system_domains(void)
- if ( IS_ERR(dom_xen) )
- panic("Failed to create d[XEN]: %ld\n", PTR_ERR(dom_xen));
-
-+#ifdef CONFIG_HAS_PIRQ
-+ /* Bound-check values passed via "extra_guest_irqs=". */
-+ {
-+ unsigned int n = max(arch_hwdom_irqs(dom_xen), nr_static_irqs);
-+
-+ if ( extra_hwdom_irqs > n - nr_static_irqs )
-+ {
-+ extra_hwdom_irqs = n - nr_static_irqs;
-+ printk(XENLOG_WARNING "hwdom IRQs bounded to %u\n", n);
-+ }
-+ if ( extra_domU_irqs >
-+ max(DEFAULT_EXTRA_DOMU_IRQS, n - nr_static_irqs) )
-+ {
-+ extra_domU_irqs = n - nr_static_irqs;
-+ printk(XENLOG_WARNING "domU IRQs bounded to %u\n", n);
-+ }
-+ }
-+#endif
-+
- /*
- * Initialise our DOMID_IO domain.
- * This domain owns I/O pages that are within the range of the page_info
-diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
-index 5dcd2d8f0c..bef170bcb6 100644
---- a/xen/include/xen/irq.h
-+++ b/xen/include/xen/irq.h
-@@ -196,8 +196,9 @@ extern struct irq_desc *pirq_spin_lock_irq_desc(
-
- unsigned int set_desc_affinity(struct irq_desc *desc, const cpumask_t *mask);
-
-+/* When passed a system domain, this returns the maximum permissible value. */
- #ifndef arch_hwdom_irqs
--unsigned int arch_hwdom_irqs(domid_t domid);
-+unsigned int arch_hwdom_irqs(const struct domain *d);
- #endif
-
- #ifndef arch_evtchn_bind_pirq
---
-2.45.2
-
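The 32768 figure quoted in the documentation hunk above follows directly from the single-GFN EOI bitmap; a minimal sketch of the arithmetic (constants are the standard x86 values, restated here rather than taken from this diff):

    /* One 4 KiB guest frame holds the pirq EOI bitmap, one bit per pIRQ,
     * so the ceiling is PAGE_SIZE * BITS_PER_BYTE = 32768 pIRQs. */
    #define PAGE_SIZE     4096u
    #define BITS_PER_BYTE 8u

    _Static_assert(PAGE_SIZE * BITS_PER_BYTE == 32768u,
                   "a single-GFN EOI bitmap caps a domain at 32768 pIRQs");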
diff --git a/0051-x86-entry-don-t-clear-DF-when-raising-UD-for-lack-of.patch b/0051-x86-entry-don-t-clear-DF-when-raising-UD-for-lack-of.patch
deleted file mode 100644
index acefc8e..0000000
--- a/0051-x86-entry-don-t-clear-DF-when-raising-UD-for-lack-of.patch
+++ /dev/null
@@ -1,58 +0,0 @@
-From 7e636b8a16412d4f0d94b2b24d7ebcd2c749afff Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Thu, 4 Jul 2024 14:14:49 +0200
-Subject: [PATCH 51/56] x86/entry: don't clear DF when raising #UD for lack of
- syscall handler
-
-While doing so is intentional when invoking the actual callback, to
-mimic a hard-coded SYSCALL_MASK / FMASK MSR, the same should not be done
-when no handler is available and hence #UD is raised.
-
-Fixes: ca6fcf4321b3 ("x86/pv: Inject #UD for missing SYSCALL callbacks")
-Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
-master commit: d2fe9ab3048d503869ec81bc49db07e55a4a2386
-master date: 2024-07-02 12:01:21 +0200
----
- xen/arch/x86/x86_64/entry.S | 12 +++++++++++-
- 1 file changed, 11 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
-index 054fcb225f..d3def49ea3 100644
---- a/xen/arch/x86/x86_64/entry.S
-+++ b/xen/arch/x86/x86_64/entry.S
-@@ -38,6 +38,14 @@ switch_to_kernel:
- setc %cl
- leal (,%rcx,TBF_INTERRUPT),%ecx
-
-+ /*
-+ * The PV ABI hardcodes the (guest-inaccessible and virtual)
-+ * SYSCALL_MASK MSR such that DF (and nothing else) would be cleared.
-+ * Note that the equivalent of IF (VGCF_syscall_disables_events) is
-+ * dealt with separately above.
-+ */
-+ mov $~X86_EFLAGS_DF, %esi
-+
- test %rax, %rax
- UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
- mov VCPU_trap_ctxt(%rbx), %rdi
-@@ -47,12 +55,14 @@ UNLIKELY_START(z, syscall_no_callback) /* TB_eip == 0 => #UD */
- testb $4, X86_EXC_UD * TRAPINFO_sizeof + TRAPINFO_flags(%rdi)
- setnz %cl
- lea TBF_EXCEPTION(, %rcx, TBF_INTERRUPT), %ecx
-+ or $~0, %esi /* Don't clear DF */
- UNLIKELY_END(syscall_no_callback)
-
- movq %rax,TRAPBOUNCE_eip(%rdx)
- movb %cl,TRAPBOUNCE_flags(%rdx)
- call create_bounce_frame
-- andl $~X86_EFLAGS_DF,UREGS_eflags(%rsp)
-+ /* Conditionally clear DF */
-+ and %esi, UREGS_eflags(%rsp)
- /* %rbx: struct vcpu */
- test_all_events:
- ASSERT_NOT_IN_ATOMIC
---
-2.45.2
-
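As a C-level illustration of the conditional-mask trick used above (the patch itself is assembly; the function and parameter names below are hypothetical):

    /* Start with a mask that clears DF, mimicking the hard-coded guest
     * SYSCALL_MASK; widen it to ~0 on the #UD path so the guest's EFLAGS
     * are left untouched. */
    #define X86_EFLAGS_DF 0x00000400u

    static unsigned int bounce_eflags(unsigned int eflags, int have_callback)
    {
        unsigned int mask = ~X86_EFLAGS_DF;   /* normal SYSCALL callback */

        if (!have_callback)                   /* raising #UD instead */
            mask = ~0u;                       /* don't clear DF */

        return eflags & mask;
    }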
diff --git a/0052-evtchn-build-fix-for-Arm.patch b/0052-evtchn-build-fix-for-Arm.patch
deleted file mode 100644
index 6cbeb10..0000000
--- a/0052-evtchn-build-fix-for-Arm.patch
+++ /dev/null
@@ -1,43 +0,0 @@
-From 45c5333935628e7c80de0bd5a9d9eff50b305b16 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Thu, 4 Jul 2024 16:57:29 +0200
-Subject: [PATCH 52/56] evtchn: build fix for Arm
-
-When backporting daa90dfea917 ("pirq_cleanup_check() leaks") I neglected
-to pay attention to it depending on 13a7b0f9f747 ("restrict concept of
-pIRQ to x86"). That one doesn't want backporting imo, so use / adjust
-custom #ifdef-ary to address the immediate issue of pirq_cleanup_check()
-not being available on Arm.
-
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
----
- xen/common/event_channel.c | 4 +++-
- 1 file changed, 3 insertions(+), 1 deletion(-)
-
-diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
-index b1a6215c37..e6ec556603 100644
---- a/xen/common/event_channel.c
-+++ b/xen/common/event_channel.c
-@@ -643,7 +643,9 @@ static int evtchn_bind_pirq(evtchn_bind_pirq_t *bind)
- if ( rc != 0 )
- {
- info->evtchn = 0;
-+#ifdef CONFIG_X86
- pirq_cleanup_check(info, d);
-+#endif
- goto out;
- }
-
-@@ -713,8 +715,8 @@ int evtchn_close(struct domain *d1, int port1, bool guest)
- * The successful path of unmap_domain_pirq_emuirq() will have
- * called pirq_cleanup_check() already.
- */
--#endif
- pirq_cleanup_check(pirq, d1);
-+#endif
- }
- unlink_pirq_port(chn1, d1->vcpu[chn1->notify_vcpu_id]);
- break;
---
-2.45.2
-
diff --git a/0053-x86-IRQ-avoid-double-unlock-in-map_domain_pirq.patch b/0053-x86-IRQ-avoid-double-unlock-in-map_domain_pirq.patch
deleted file mode 100644
index 686e142..0000000
--- a/0053-x86-IRQ-avoid-double-unlock-in-map_domain_pirq.patch
+++ /dev/null
@@ -1,53 +0,0 @@
-From d46a1ce3175dc45e97a8c9b89b0d0ff46145ae64 Mon Sep 17 00:00:00 2001
-From: Jan Beulich <jbeulich@suse.com>
-Date: Tue, 16 Jul 2024 14:14:43 +0200
-Subject: [PATCH 53/56] x86/IRQ: avoid double unlock in map_domain_pirq()
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Forever since its introduction the main loop in the function dealing
-with multi-vector MSI had error exit points ("break") with different
-properties: In one case no IRQ descriptor lock is being held.
-Nevertheless the subsequent error cleanup path assumed such a lock would
-uniformly need releasing. Identify the case by setting "desc" to NULL,
-thus allowing the unlock to be skipped as necessary.
-
-This is CVE-2024-31143 / XSA-458.
-
-Coverity ID: 1605298
-Fixes: d1b6d0a02489 ("x86: enable multi-vector MSI")
-Signed-off-by: Jan Beulich <jbeulich@suse.com>
-Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-master commit: 57338346f29cea7b183403561bdc5f407163b846
-master date: 2024-07-16 14:09:14 +0200
----
- xen/arch/x86/irq.c | 5 ++++-
- 1 file changed, 4 insertions(+), 1 deletion(-)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index 00be3b88e8..5dae8bd1b9 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -2287,6 +2287,7 @@ int map_domain_pirq(
-
- set_domain_irq_pirq(d, irq, info);
- spin_unlock_irqrestore(&desc->lock, flags);
-+ desc = NULL;
-
- info = NULL;
- irq = create_irq(NUMA_NO_NODE, true);
-@@ -2322,7 +2323,9 @@ int map_domain_pirq(
-
- if ( ret )
- {
-- spin_unlock_irqrestore(&desc->lock, flags);
-+ if ( desc )
-+ spin_unlock_irqrestore(&desc->lock, flags);
-+
- pci_disable_msi(msi_desc);
- if ( nr )
- {
---
-2.45.2
-
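A generic, self-contained sketch of the pattern the fix above applies, using a pthread mutex in place of Xen's IRQ descriptor spinlock; the function and variable names are hypothetical:

    #include <pthread.h>

    static int setup(pthread_mutex_t *lock)
    {
        pthread_mutex_t *held = lock;
        int ret;

        pthread_mutex_lock(held);
        ret = 0;  /* ... stage 1: may fail while the lock is held ... */
        if (ret)
            goto error;

        pthread_mutex_unlock(held);
        held = NULL;          /* from here on there is nothing to unlock */

        ret = -1; /* ... stage 2: may fail with the lock already dropped ... */

     error:
        if (ret) {
            if (held)         /* release only if actually still held */
                pthread_mutex_unlock(held);
            /* ... common error cleanup ... */
        }
        return ret;
    }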
diff --git a/0054-x86-physdev-Return-pirq-that-irq-was-already-mapped-.patch b/0054-x86-physdev-Return-pirq-that-irq-was-already-mapped-.patch
deleted file mode 100644
index 5e245f9..0000000
--- a/0054-x86-physdev-Return-pirq-that-irq-was-already-mapped-.patch
+++ /dev/null
@@ -1,38 +0,0 @@
-From f9f3062f11e144438fac9e9da6aa4cb41a6009b1 Mon Sep 17 00:00:00 2001
-From: Jiqian Chen <Jiqian.Chen@amd.com>
-Date: Thu, 25 Jul 2024 16:20:17 +0200
-Subject: [PATCH 54/56] x86/physdev: Return pirq that irq was already mapped to
-
-Fix a bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to allocate and
-map a pirq"). After that re-factoring, when pirq<0 and current_pirq>0 (i.e. the
-caller wants a free pirq allocated for an irq that already has a mapped pirq), the
-function returns the negative pirq and the call fails. The logic before the
-re-factoring instead returned the current_pirq that the irq was already mapped to,
-making the call succeed.
-
-Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq")
-Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
-Signed-off-by: Huang Rui <ray.huang@amd.com>
-Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: 0d2b87b5adfc19e87e9027d996db204c66a47f30
-master date: 2024-07-08 14:46:12 +0100
----
- xen/arch/x86/irq.c | 1 +
- 1 file changed, 1 insertion(+)
-
-diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
-index 5dae8bd1b9..6b1f338eae 100644
---- a/xen/arch/x86/irq.c
-+++ b/xen/arch/x86/irq.c
-@@ -2914,6 +2914,7 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
- d->domain_id, index, pirq, current_pirq);
- if ( current_pirq < 0 )
- return -EBUSY;
-+ pirq = current_pirq;
- }
- else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
- {
---
-2.45.2
-
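A heavily simplified sketch of the corrected behaviour described in the message above; pick_pirq() is a hypothetical helper, not the real allocate_pirq():

    /* If the caller asked for "any free pirq" (requested < 0) but the irq is
     * already mapped, hand back the existing mapping instead of failing. */
    static int pick_pirq(int requested_pirq, int current_pirq)
    {
        if (requested_pirq < 0 && current_pirq > 0)
            return current_pirq;        /* reuse the existing mapping */

        /* ... all other combinations handled as before ... */
        return requested_pirq;
    }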
diff --git a/0055-tools-libxs-Fix-fcntl-invocation-in-set_cloexec.patch b/0055-tools-libxs-Fix-fcntl-invocation-in-set_cloexec.patch
deleted file mode 100644
index e4cc09e..0000000
--- a/0055-tools-libxs-Fix-fcntl-invocation-in-set_cloexec.patch
+++ /dev/null
@@ -1,57 +0,0 @@
-From 81f1e807fadb8111d71b78191e01ca688d74eac7 Mon Sep 17 00:00:00 2001
-From: Andrew Cooper <andrew.cooper3@citrix.com>
-Date: Thu, 25 Jul 2024 16:20:53 +0200
-Subject: [PATCH 55/56] tools/libxs: Fix fcntl() invocation in set_cloexec()
-
-set_cloexec() had a bit too much copy&paste from setnonblock(), and
-insufficient testing on ancient versions of Linux...
-
-As written (emulating ancient linux by undef'ing O_CLOEXEC), strace shows:
-
- open("/dev/xen/xenbus", O_RDWR) = 3
- fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
- fcntl(3, 0x8003 /* F_??? */, 0x7ffe4a771d90) = -1 EINVAL (Invalid argument)
- close(3) = 0
-
-which is obviously nonsense.
-
-Switch F_GETFL -> F_GETFD, and fix the second invocation to use F_SETFD. With
-this, strace is rather happier:
-
- open("/dev/xen/xenbus", O_RDWR) = 3
- fcntl(3, F_GETFD) = 0
- fcntl(3, F_SETFD, FD_CLOEXEC) = 0
-
-Fixes: bf7c1464706a ("tools/libxs: Fix CLOEXEC handling in get_dev()")
-Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
-Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
-Reviewed-by: Juergen Gross <jgross@suse.com>
-master commit: 37810b52d003f8a04af41d7b1f85eff24af9f804
-master date: 2024-07-09 15:32:18 +0100
----
- tools/libs/store/xs.c | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/tools/libs/store/xs.c b/tools/libs/store/xs.c
-index c8845b69e2..38a6ce3cf2 100644
---- a/tools/libs/store/xs.c
-+++ b/tools/libs/store/xs.c
-@@ -182,12 +182,12 @@ static bool setnonblock(int fd, int nonblock) {
-
- static bool set_cloexec(int fd)
- {
-- int flags = fcntl(fd, F_GETFL);
-+ int flags = fcntl(fd, F_GETFD);
-
- if (flags < 0)
- return false;
-
-- return fcntl(fd, flags | FD_CLOEXEC) >= 0;
-+ return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) >= 0;
- }
-
- static int pipe_cloexec(int fds[2])
---
-2.45.2
-
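For reference, a small sketch of the fcntl() distinction the fix above hinges on; the helper names are illustrative only:

    #include <fcntl.h>
    #include <stdbool.h>

    /* F_GETFD/F_SETFD act on the per-descriptor flags (FD_CLOEXEC) ... */
    static bool make_cloexec(int fd)
    {
        int fdflags = fcntl(fd, F_GETFD);

        return fdflags >= 0 && fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC) == 0;
    }

    /* ... while F_GETFL/F_SETFL act on the open-file status flags. */
    static bool make_nonblock(int fd)
    {
        int flflags = fcntl(fd, F_GETFL);

        return flflags >= 0 && fcntl(fd, F_SETFL, flflags | O_NONBLOCK) == 0;
    }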
diff --git a/0056-x86-altcall-fix-clang-code-gen-when-using-altcall-in.patch b/0056-x86-altcall-fix-clang-code-gen-when-using-altcall-in.patch
deleted file mode 100644
index c94c516..0000000
--- a/0056-x86-altcall-fix-clang-code-gen-when-using-altcall-in.patch
+++ /dev/null
@@ -1,85 +0,0 @@
-From d078d0aa86e9e3b937f673dc89306b3afd09d560 Mon Sep 17 00:00:00 2001
-From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
-Date: Thu, 25 Jul 2024 16:21:17 +0200
-Subject: [PATCH 56/56] x86/altcall: fix clang code-gen when using altcall in
- loop constructs
-MIME-Version: 1.0
-Content-Type: text/plain; charset=UTF-8
-Content-Transfer-Encoding: 8bit
-
-Yet another clang code generation issue when using altcalls.
-
-The issue this time is with using loop constructs around alternative_{,v}call
-instances using parameter types smaller than the register size.
-
-Given the following example code:
-
-static void bar(bool b)
-{
- unsigned int i;
-
- for ( i = 0; i < 10; i++ )
- {
- int ret_;
- register union {
- bool e;
- unsigned long r;
- } di asm("rdi") = { .e = b };
- register unsigned long si asm("rsi");
- register unsigned long dx asm("rdx");
- register unsigned long cx asm("rcx");
- register unsigned long r8 asm("r8");
- register unsigned long r9 asm("r9");
- register unsigned long r10 asm("r10");
- register unsigned long r11 asm("r11");
-
- asm volatile ( "call %c[addr]"
- : "+r" (di), "=r" (si), "=r" (dx),
- "=r" (cx), "=r" (r8), "=r" (r9),
- "=r" (r10), "=r" (r11), "=a" (ret_)
- : [addr] "i" (&(func)), "g" (func)
- : "memory" );
- }
-}
-
-See: https://godbolt.org/z/qvxMGd84q
-
-Clang will generate machine code that only resets the low 8 bits of %rdi
-between loop calls, leaving the rest of the register possibly containing
-garbage from the use of %rdi inside the called function. Note also that clang
-doesn't truncate the input parameters at the callee, thus breaking the psABI.
-
-Fix this by turning the `e` element in the anonymous union into an array that
-consumes the same space as an unsigned long, as this forces clang to reset the
-whole %rdi register instead of just the low 8 bits.
-
-Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
-Suggested-by: Jan Beulich <jbeulich@suse.com>
-Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
-Reviewed-by: Jan Beulich <jbeulich@suse.com>
-master commit: d51b2f5ea1915fe058f730b0ec542cf84254fca0
-master date: 2024-07-23 13:59:30 +0200
----
- xen/arch/x86/include/asm/alternative.h | 4 ++--
- 1 file changed, 2 insertions(+), 2 deletions(-)
-
-diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
-index 0d3697f1de..e63b459276 100644
---- a/xen/arch/x86/include/asm/alternative.h
-+++ b/xen/arch/x86/include/asm/alternative.h
-@@ -185,10 +185,10 @@ extern void alternative_branches(void);
- */
- #define ALT_CALL_ARG(arg, n) \
- register union { \
-- typeof(arg) e; \
-+ typeof(arg) e[sizeof(long) / sizeof(arg)]; \
- unsigned long r; \
- } a ## n ## _ asm ( ALT_CALL_arg ## n ) = { \
-- .e = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); }) \
-+ .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
- }
- #else
- #define ALT_CALL_ARG(arg, n) \
---
-2.45.2
-
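A self-contained illustration of the union-with-array trick adopted above (x86, little-endian assumed; the wrapper function is hypothetical and not part of the patch):

    #include <stdbool.h>

    static unsigned long widen(bool b)
    {
        union {
            bool e[sizeof(unsigned long) / sizeof(bool)]; /* spans the register */
            unsigned long r;
        } a = { .e[0] = b };   /* the rest of .e is zero-initialised, so no
                                  stale bytes survive in the register image */

        return a.r;            /* equals b (0 or 1) on little-endian x86 */
    }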
diff --git a/info.txt b/info.txt
index ccc4d4e..eaab4bc 100644
--- a/info.txt
+++ b/info.txt
@@ -1,6 +1,6 @@
-Xen upstream patchset #0 for 4.18.3-pre
+Xen upstream patchset #0 for 4.19.1-pre
Containing patches from
-RELEASE-4.18.2 (844f9931c6c207588a70f897262c628cd542f75a)
+RELEASE-4.19.0 (0ef126c163d99932c9d7142e2bd130633c5c4844)
to
-staging-4.18 (d078d0aa86e9e3b937f673dc89306b3afd09d560)
+staging-4.19 (2c61ab407172682e1382204a8305107f19e2951b)