From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by finch.gentoo.org (Postfix) with ESMTPS id 6F3D4138806 for ; Mon, 25 Dec 2017 14:35:02 +0000 (UTC) Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id 41E49E0CCD; Mon, 25 Dec 2017 14:34:59 +0000 (UTC) Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by pigeon.gentoo.org (Postfix) with ESMTPS id E546BE0CCD for ; Mon, 25 Dec 2017 14:34:58 +0000 (UTC) Received: from oystercatcher.gentoo.org (oystercatcher.gentoo.org [148.251.78.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.gentoo.org (Postfix) with ESMTPS id 15A0C33C770 for ; Mon, 25 Dec 2017 14:34:55 +0000 (UTC) Received: from localhost.localdomain (localhost [IPv6:::1]) by oystercatcher.gentoo.org (Postfix) with ESMTP id A0484AE8F for ; Mon, 25 Dec 2017 14:34:53 +0000 (UTC) From: "Alice Ferrazzi" To: gentoo-commits@lists.gentoo.org Content-Transfer-Encoding: 8bit Content-type: text/plain; charset=UTF-8 Reply-To: gentoo-dev@lists.gentoo.org, "Alice Ferrazzi" Message-ID: <1514212079.5a2a504d74eaf43aa0b3bf953836a44e0bb40b28.alicef@gentoo> Subject: [gentoo-commits] proj/linux-patches:4.14 commit in: / X-VCS-Repository: proj/linux-patches X-VCS-Files: 0000_README 1008_linux-4.14.9.patch X-VCS-Directories: / X-VCS-Committer: alicef X-VCS-Committer-Name: Alice Ferrazzi X-VCS-Revision: 5a2a504d74eaf43aa0b3bf953836a44e0bb40b28 X-VCS-Branch: 4.14 Date: Mon, 25 Dec 2017 14:34:53 +0000 (UTC) Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-commits@lists.gentoo.org X-Archives-Salt: 63db567b-28f2-4ef6-8292-01a2b7fdef2c X-Archives-Hash: ad780b153ebe611b5f93cd9b21265651 commit: 5a2a504d74eaf43aa0b3bf953836a44e0bb40b28 Author: Alice Ferrazzi gentoo org> AuthorDate: Mon Dec 25 14:27:59 2017 +0000 Commit: Alice Ferrazzi gentoo org> CommitDate: Mon Dec 25 14:27:59 2017 +0000 URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=5a2a504d linux kernel 4.14.9 0000_README | 4 + 1008_linux-4.14.9.patch | 10292 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 10296 insertions(+) diff --git a/0000_README b/0000_README index 9a6da0f..a19ca77 100644 --- a/0000_README +++ b/0000_README @@ -75,6 +75,10 @@ Patch: 1007_linux-4.14.8.patch From: http://www.kernel.org Desc: Linux 4.14.8 +Patch: 1008_linux-4.14.9.patch +From: http://www.kernel.org +Desc: Linux 4.14.9 + Patch: 1500_XATTR_USER_PREFIX.patch From: https://bugs.gentoo.org/show_bug.cgi?id=470644 Desc: Support for namespace user.pax.* on tmpfs. diff --git a/1008_linux-4.14.9.patch b/1008_linux-4.14.9.patch new file mode 100644 index 0000000..006dbd5 --- /dev/null +++ b/1008_linux-4.14.9.patch @@ -0,0 +1,10292 @@ +diff --git a/Documentation/x86/orc-unwinder.txt b/Documentation/x86/orc-unwinder.txt +index af0c9a4c65a6..cd4b29be29af 100644 +--- a/Documentation/x86/orc-unwinder.txt ++++ b/Documentation/x86/orc-unwinder.txt +@@ -4,7 +4,7 @@ ORC unwinder + Overview + -------- + +-The kernel CONFIG_ORC_UNWINDER option enables the ORC unwinder, which is ++The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is + similar in concept to a DWARF unwinder. 
The difference is that the + format of the ORC data is much simpler than DWARF, which in turn allows + the ORC unwinder to be much simpler and faster. +diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt +index b0798e281aa6..3448e675b462 100644 +--- a/Documentation/x86/x86_64/mm.txt ++++ b/Documentation/x86/x86_64/mm.txt +@@ -34,7 +34,7 @@ ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space + ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole + ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) + ... unused hole ... +-ffd8000000000000 - fff7ffffffffffff (=53 bits) kasan shadow memory (8PB) ++ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB) + ... unused hole ... + ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks + ... unused hole ... +diff --git a/Makefile b/Makefile +index 97b5ae76ac8c..ed2132c6d286 100644 +--- a/Makefile ++++ b/Makefile +@@ -1,7 +1,7 @@ + # SPDX-License-Identifier: GPL-2.0 + VERSION = 4 + PATCHLEVEL = 14 +-SUBLEVEL = 8 ++SUBLEVEL = 9 + EXTRAVERSION = + NAME = Petit Gorille + +@@ -935,8 +935,8 @@ ifdef CONFIG_STACK_VALIDATION + ifeq ($(has_libelf),1) + objtool_target := tools/objtool FORCE + else +- ifdef CONFIG_ORC_UNWINDER +- $(error "Cannot generate ORC metadata for CONFIG_ORC_UNWINDER=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") ++ ifdef CONFIG_UNWINDER_ORC ++ $(error "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") + else + $(warning "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") + endif +diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig +index 8c2a2619971b..f1d7834990ec 100644 +--- a/arch/arm/configs/exynos_defconfig ++++ b/arch/arm/configs/exynos_defconfig +@@ -244,7 +244,7 @@ CONFIG_USB_STORAGE_ONETOUCH=m + CONFIG_USB_STORAGE_KARMA=m + CONFIG_USB_STORAGE_CYPRESS_ATACB=m + CONFIG_USB_STORAGE_ENE_UB6250=m +-CONFIG_USB_UAS=m ++CONFIG_USB_UAS=y + CONFIG_USB_DWC3=y + CONFIG_USB_DWC2=y + CONFIG_USB_HSIC_USB3503=y +diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h +index e9c9a117bd25..c7cdbb43ae7c 100644 +--- a/arch/arm/include/asm/ptrace.h ++++ b/arch/arm/include/asm/ptrace.h +@@ -126,8 +126,7 @@ extern unsigned long profile_pc(struct pt_regs *regs); + /* + * kprobe-based event tracer support + */ +-#include +-#include ++#include + #define MAX_REG_OFFSET (offsetof(struct pt_regs, ARM_ORIG_r0)) + + extern int regs_query_register_offset(const char *name); +diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h +index caf86be815ba..4052ec39e8db 100644 +--- a/arch/arm64/include/asm/fixmap.h ++++ b/arch/arm64/include/asm/fixmap.h +@@ -51,6 +51,13 @@ enum fixed_addresses { + + FIX_EARLYCON_MEM_BASE, + FIX_TEXT_POKE0, ++ ++#ifdef CONFIG_ACPI_APEI_GHES ++ /* Used for GHES mapping from assorted contexts */ ++ FIX_APEI_GHES_IRQ, ++ FIX_APEI_GHES_NMI, ++#endif /* CONFIG_ACPI_APEI_GHES */ ++ + __end_of_permanent_fixed_addresses, + + /* +diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c +index 57190f384f63..ce848ff84edd 100644 +--- a/arch/powerpc/kernel/watchdog.c ++++ b/arch/powerpc/kernel/watchdog.c +@@ -276,9 +276,12 @@ void arch_touch_nmi_watchdog(void) + { + unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000; + int cpu = smp_processor_id(); ++ u64 tb = get_tb(); + +- if (get_tb() - 
per_cpu(wd_timer_tb, cpu) >= ticks) +- watchdog_timer_interrupt(cpu); ++ if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) { ++ per_cpu(wd_timer_tb, cpu) = tb; ++ wd_smp_clear_cpu_pending(cpu, tb); ++ } + } + EXPORT_SYMBOL(arch_touch_nmi_watchdog); + +diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c +index a66e64b0b251..5d115bd32539 100644 +--- a/arch/powerpc/net/bpf_jit_comp64.c ++++ b/arch/powerpc/net/bpf_jit_comp64.c +@@ -762,7 +762,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, + func = (u8 *) __bpf_call_base + imm; + + /* Save skb pointer if we need to re-cache skb data */ +- if (bpf_helper_changes_pkt_data(func)) ++ if ((ctx->seen & SEEN_SKB) && ++ bpf_helper_changes_pkt_data(func)) + PPC_BPF_STL(3, 1, bpf_jit_stack_local(ctx)); + + bpf_jit_emit_func_call(image, ctx, (u64)func); +@@ -771,7 +772,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, + PPC_MR(b2p[BPF_REG_0], 3); + + /* refresh skb cache */ +- if (bpf_helper_changes_pkt_data(func)) { ++ if ((ctx->seen & SEEN_SKB) && ++ bpf_helper_changes_pkt_data(func)) { + /* reload skb pointer to r3 */ + PPC_BPF_LL(3, 1, bpf_jit_stack_local(ctx)); + bpf_jit_emit_skb_loads(image, ctx); +diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c +index c008083fbc4f..2c8b325591cc 100644 +--- a/arch/powerpc/xmon/xmon.c ++++ b/arch/powerpc/xmon/xmon.c +@@ -530,14 +530,19 @@ static int xmon_core(struct pt_regs *regs, int fromipi) + + waiting: + secondary = 1; ++ spin_begin(); + while (secondary && !xmon_gate) { + if (in_xmon == 0) { +- if (fromipi) ++ if (fromipi) { ++ spin_end(); + goto leave; ++ } + secondary = test_and_set_bit(0, &in_xmon); + } +- barrier(); ++ spin_cpu_relax(); ++ touch_nmi_watchdog(); + } ++ spin_end(); + + if (!secondary && !xmon_gate) { + /* we are the first cpu to come in */ +@@ -568,21 +573,25 @@ static int xmon_core(struct pt_regs *regs, int fromipi) + mb(); + xmon_gate = 1; + barrier(); ++ touch_nmi_watchdog(); + } + + cmdloop: + while (in_xmon) { + if (secondary) { ++ spin_begin(); + if (cpu == xmon_owner) { + if (!test_and_set_bit(0, &xmon_taken)) { + secondary = 0; ++ spin_end(); + continue; + } + /* missed it */ + while (cpu == xmon_owner) +- barrier(); ++ spin_cpu_relax(); + } +- barrier(); ++ spin_cpu_relax(); ++ touch_nmi_watchdog(); + } else { + cmd = cmds(regs); + if (cmd != 0) { +diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c +index b15cd2f0320f..33e2785f6842 100644 +--- a/arch/s390/net/bpf_jit_comp.c ++++ b/arch/s390/net/bpf_jit_comp.c +@@ -55,8 +55,7 @@ struct bpf_jit { + #define SEEN_LITERAL 8 /* code uses literals */ + #define SEEN_FUNC 16 /* calls C functions */ + #define SEEN_TAIL_CALL 32 /* code uses tail calls */ +-#define SEEN_SKB_CHANGE 64 /* code changes skb data */ +-#define SEEN_REG_AX 128 /* code uses constant blinding */ ++#define SEEN_REG_AX 64 /* code uses constant blinding */ + #define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB) + + /* +@@ -448,12 +447,12 @@ static void bpf_jit_prologue(struct bpf_jit *jit) + EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0, + REG_15, 152); + } +- if (jit->seen & SEEN_SKB) ++ if (jit->seen & SEEN_SKB) { + emit_load_skb_data_hlen(jit); +- if (jit->seen & SEEN_SKB_CHANGE) + /* stg %b1,ST_OFF_SKBP(%r0,%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_1, REG_0, REG_15, + STK_OFF_SKBP); ++ } + } + + /* +@@ -983,8 +982,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i + EMIT2(0x0d00, REG_14, REG_W1); + /* lgr %b0,%r2: load 
return value into %b0 */ + EMIT4(0xb9040000, BPF_REG_0, REG_2); +- if (bpf_helper_changes_pkt_data((void *)func)) { +- jit->seen |= SEEN_SKB_CHANGE; ++ if ((jit->seen & SEEN_SKB) && ++ bpf_helper_changes_pkt_data((void *)func)) { + /* lg %b1,ST_OFF_SKBP(%r15) */ + EMIT6_DISP_LH(0xe3000000, 0x0004, BPF_REG_1, REG_0, + REG_15, STK_OFF_SKBP); +diff --git a/arch/sparc/include/asm/ptrace.h b/arch/sparc/include/asm/ptrace.h +index 6a339a78f4f4..71dd82b43cc5 100644 +--- a/arch/sparc/include/asm/ptrace.h ++++ b/arch/sparc/include/asm/ptrace.h +@@ -7,6 +7,7 @@ + #if defined(__sparc__) && defined(__arch64__) + #ifndef __ASSEMBLY__ + ++#include + #include + #include + +diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c +index 5765e7e711f7..ff5f9cb3039a 100644 +--- a/arch/sparc/net/bpf_jit_comp_64.c ++++ b/arch/sparc/net/bpf_jit_comp_64.c +@@ -1245,14 +1245,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) + u8 *func = ((u8 *)__bpf_call_base) + imm; + + ctx->saw_call = true; ++ if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func)) ++ emit_reg_move(bpf2sparc[BPF_REG_1], L7, ctx); + + emit_call((u32 *)func, ctx); + emit_nop(ctx); + + emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx); + +- if (bpf_helper_changes_pkt_data(func) && ctx->saw_ld_abs_ind) +- load_skb_regs(ctx, bpf2sparc[BPF_REG_6]); ++ if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func)) ++ load_skb_regs(ctx, L7); + break; + } + +diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild +index 50a32c33d729..73c57f614c9e 100644 +--- a/arch/um/include/asm/Kbuild ++++ b/arch/um/include/asm/Kbuild +@@ -1,4 +1,5 @@ + generic-y += barrier.h ++generic-y += bpf_perf_event.h + generic-y += bug.h + generic-y += clkdev.h + generic-y += current.h +diff --git a/arch/um/include/shared/init.h b/arch/um/include/shared/init.h +index 390572daa40d..b3f5865a92c9 100644 +--- a/arch/um/include/shared/init.h ++++ b/arch/um/include/shared/init.h +@@ -41,7 +41,7 @@ + typedef int (*initcall_t)(void); + typedef void (*exitcall_t)(void); + +-#include ++#include + + /* These are for everybody (although not all archs will actually + discard it in modules) */ +diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig +index 9bceea6a5852..48646160eb83 100644 +--- a/arch/x86/Kconfig ++++ b/arch/x86/Kconfig +@@ -108,7 +108,7 @@ config X86 + select HAVE_ARCH_AUDITSYSCALL + select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE + select HAVE_ARCH_JUMP_LABEL +- select HAVE_ARCH_KASAN if X86_64 && SPARSEMEM_VMEMMAP ++ select HAVE_ARCH_KASAN if X86_64 + select HAVE_ARCH_KGDB + select HAVE_ARCH_KMEMCHECK + select HAVE_ARCH_MMAP_RND_BITS if MMU +@@ -171,7 +171,7 @@ config X86 + select HAVE_PERF_USER_STACK_DUMP + select HAVE_RCU_TABLE_FREE + select HAVE_REGS_AND_STACK_ACCESS_API +- select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER_UNWINDER && STACK_VALIDATION ++ select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION + select HAVE_STACK_VALIDATION if X86_64 + select HAVE_SYSCALL_TRACEPOINTS + select HAVE_UNSTABLE_SCHED_CLOCK +@@ -303,7 +303,6 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC + config KASAN_SHADOW_OFFSET + hex + depends on KASAN +- default 0xdff8000000000000 if X86_5LEVEL + default 0xdffffc0000000000 + + config HAVE_INTEL_TXT +diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug +index 90b123056f4b..6293a8768a91 100644 +--- a/arch/x86/Kconfig.debug ++++ b/arch/x86/Kconfig.debug +@@ -359,28 +359,14 @@ config PUNIT_ATOM_DEBUG + + choice + prompt "Choose kernel 
unwinder" +- default FRAME_POINTER_UNWINDER ++ default UNWINDER_ORC if X86_64 ++ default UNWINDER_FRAME_POINTER if X86_32 + ---help--- + This determines which method will be used for unwinding kernel stack + traces for panics, oopses, bugs, warnings, perf, /proc//stack, + livepatch, lockdep, and more. + +-config FRAME_POINTER_UNWINDER +- bool "Frame pointer unwinder" +- select FRAME_POINTER +- ---help--- +- This option enables the frame pointer unwinder for unwinding kernel +- stack traces. +- +- The unwinder itself is fast and it uses less RAM than the ORC +- unwinder, but the kernel text size will grow by ~3% and the kernel's +- overall performance will degrade by roughly 5-10%. +- +- This option is recommended if you want to use the livepatch +- consistency model, as this is currently the only way to get a +- reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE). +- +-config ORC_UNWINDER ++config UNWINDER_ORC + bool "ORC unwinder" + depends on X86_64 + select STACK_VALIDATION +@@ -396,7 +382,22 @@ config ORC_UNWINDER + Enabling this option will increase the kernel's runtime memory usage + by roughly 2-4MB, depending on your kernel config. + +-config GUESS_UNWINDER ++config UNWINDER_FRAME_POINTER ++ bool "Frame pointer unwinder" ++ select FRAME_POINTER ++ ---help--- ++ This option enables the frame pointer unwinder for unwinding kernel ++ stack traces. ++ ++ The unwinder itself is fast and it uses less RAM than the ORC ++ unwinder, but the kernel text size will grow by ~3% and the kernel's ++ overall performance will degrade by roughly 5-10%. ++ ++ This option is recommended if you want to use the livepatch ++ consistency model, as this is currently the only way to get a ++ reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE). ++ ++config UNWINDER_GUESS + bool "Guess unwinder" + depends on EXPERT + ---help--- +@@ -411,7 +412,7 @@ config GUESS_UNWINDER + endchoice + + config FRAME_POINTER +- depends on !ORC_UNWINDER && !GUESS_UNWINDER ++ depends on !UNWINDER_ORC && !UNWINDER_GUESS + bool + + endmenu +diff --git a/arch/x86/configs/tiny.config b/arch/x86/configs/tiny.config +index 550cd5012b73..66c9e2aab16c 100644 +--- a/arch/x86/configs/tiny.config ++++ b/arch/x86/configs/tiny.config +@@ -1,5 +1,5 @@ + CONFIG_NOHIGHMEM=y + # CONFIG_HIGHMEM4G is not set + # CONFIG_HIGHMEM64G is not set +-CONFIG_GUESS_UNWINDER=y +-# CONFIG_FRAME_POINTER_UNWINDER is not set ++CONFIG_UNWINDER_GUESS=y ++# CONFIG_UNWINDER_FRAME_POINTER is not set +diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig +index 4a4b16e56d35..e32fc1f274d8 100644 +--- a/arch/x86/configs/x86_64_defconfig ++++ b/arch/x86/configs/x86_64_defconfig +@@ -299,6 +299,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y + # CONFIG_DEBUG_RODATA_TEST is not set + CONFIG_DEBUG_BOOT_PARAMS=y + CONFIG_OPTIMIZE_INLINING=y ++CONFIG_UNWINDER_ORC=y + CONFIG_SECURITY=y + CONFIG_SECURITY_NETWORK=y + CONFIG_SECURITY_SELINUX=y +diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h +index 6e160031cfea..3fd8bc560fae 100644 +--- a/arch/x86/entry/calling.h ++++ b/arch/x86/entry/calling.h +@@ -142,56 +142,25 @@ For 32-bit we have the following conventions - kernel is built with + UNWIND_HINT_REGS offset=\offset + .endm + +- .macro RESTORE_EXTRA_REGS offset=0 +- movq 0*8+\offset(%rsp), %r15 +- movq 1*8+\offset(%rsp), %r14 +- movq 2*8+\offset(%rsp), %r13 +- movq 3*8+\offset(%rsp), %r12 +- movq 4*8+\offset(%rsp), %rbp +- movq 5*8+\offset(%rsp), %rbx +- UNWIND_HINT_REGS offset=\offset extra=0 +- .endm +- +- .macro RESTORE_C_REGS_HELPER 
rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1 +- .if \rstor_r11 +- movq 6*8(%rsp), %r11 +- .endif +- .if \rstor_r8910 +- movq 7*8(%rsp), %r10 +- movq 8*8(%rsp), %r9 +- movq 9*8(%rsp), %r8 +- .endif +- .if \rstor_rax +- movq 10*8(%rsp), %rax +- .endif +- .if \rstor_rcx +- movq 11*8(%rsp), %rcx +- .endif +- .if \rstor_rdx +- movq 12*8(%rsp), %rdx +- .endif +- movq 13*8(%rsp), %rsi +- movq 14*8(%rsp), %rdi +- UNWIND_HINT_IRET_REGS offset=16*8 +- .endm +- .macro RESTORE_C_REGS +- RESTORE_C_REGS_HELPER 1,1,1,1,1 +- .endm +- .macro RESTORE_C_REGS_EXCEPT_RAX +- RESTORE_C_REGS_HELPER 0,1,1,1,1 +- .endm +- .macro RESTORE_C_REGS_EXCEPT_RCX +- RESTORE_C_REGS_HELPER 1,0,1,1,1 +- .endm +- .macro RESTORE_C_REGS_EXCEPT_R11 +- RESTORE_C_REGS_HELPER 1,1,0,1,1 +- .endm +- .macro RESTORE_C_REGS_EXCEPT_RCX_R11 +- RESTORE_C_REGS_HELPER 1,0,0,1,1 +- .endm +- +- .macro REMOVE_PT_GPREGS_FROM_STACK addskip=0 +- subq $-(15*8+\addskip), %rsp ++ .macro POP_EXTRA_REGS ++ popq %r15 ++ popq %r14 ++ popq %r13 ++ popq %r12 ++ popq %rbp ++ popq %rbx ++ .endm ++ ++ .macro POP_C_REGS ++ popq %r11 ++ popq %r10 ++ popq %r9 ++ popq %r8 ++ popq %rax ++ popq %rcx ++ popq %rdx ++ popq %rsi ++ popq %rdi + .endm + + .macro icebp +diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S +index 4838037f97f6..bd8b57a5c874 100644 +--- a/arch/x86/entry/entry_32.S ++++ b/arch/x86/entry/entry_32.S +@@ -941,7 +941,8 @@ ENTRY(debug) + movl %esp, %eax # pt_regs pointer + + /* Are we currently on the SYSENTER stack? */ +- PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx) ++ movl PER_CPU_VAR(cpu_entry_area), %ecx ++ addl $CPU_ENTRY_AREA_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx + subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ + cmpl $SIZEOF_SYSENTER_stack, %ecx + jb .Ldebug_from_sysenter_stack +@@ -984,7 +985,8 @@ ENTRY(nmi) + movl %esp, %eax # pt_regs pointer + + /* Are we currently on the SYSENTER stack? */ +- PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx) ++ movl PER_CPU_VAR(cpu_entry_area), %ecx ++ addl $CPU_ENTRY_AREA_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx + subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ + cmpl $SIZEOF_SYSENTER_stack, %ecx + jb .Lnmi_from_sysenter_stack +diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S +index 2e956afe272c..6abe3fcaece9 100644 +--- a/arch/x86/entry/entry_64.S ++++ b/arch/x86/entry/entry_64.S +@@ -136,6 +136,64 @@ END(native_usergs_sysret64) + * with them due to bugs in both AMD and Intel CPUs. + */ + ++ .pushsection .entry_trampoline, "ax" ++ ++/* ++ * The code in here gets remapped into cpu_entry_area's trampoline. This means ++ * that the assembler and linker have the wrong idea as to where this code ++ * lives (and, in fact, it's mapped more than once, so it's not even at a ++ * fixed address). So we can't reference any symbols outside the entry ++ * trampoline and expect it to work. ++ * ++ * Instead, we carefully abuse %rip-relative addressing. ++ * _entry_trampoline(%rip) refers to the start of the remapped) entry ++ * trampoline. We can thus find cpu_entry_area with this macro: ++ */ ++ ++#define CPU_ENTRY_AREA \ ++ _entry_trampoline - CPU_ENTRY_AREA_entry_trampoline(%rip) ++ ++/* The top word of the SYSENTER stack is hot and is usable as scratch space. */ ++#define RSP_SCRATCH CPU_ENTRY_AREA_SYSENTER_stack + \ ++ SIZEOF_SYSENTER_stack - 8 + CPU_ENTRY_AREA ++ ++ENTRY(entry_SYSCALL_64_trampoline) ++ UNWIND_HINT_EMPTY ++ swapgs ++ ++ /* Stash the user RSP. 
*/ ++ movq %rsp, RSP_SCRATCH ++ ++ /* Load the top of the task stack into RSP */ ++ movq CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp ++ ++ /* Start building the simulated IRET frame. */ ++ pushq $__USER_DS /* pt_regs->ss */ ++ pushq RSP_SCRATCH /* pt_regs->sp */ ++ pushq %r11 /* pt_regs->flags */ ++ pushq $__USER_CS /* pt_regs->cs */ ++ pushq %rcx /* pt_regs->ip */ ++ ++ /* ++ * x86 lacks a near absolute jump, and we can't jump to the real ++ * entry text with a relative jump. We could push the target ++ * address and then use retq, but this destroys the pipeline on ++ * many CPUs (wasting over 20 cycles on Sandy Bridge). Instead, ++ * spill RDI and restore it in a second-stage trampoline. ++ */ ++ pushq %rdi ++ movq $entry_SYSCALL_64_stage2, %rdi ++ jmp *%rdi ++END(entry_SYSCALL_64_trampoline) ++ ++ .popsection ++ ++ENTRY(entry_SYSCALL_64_stage2) ++ UNWIND_HINT_EMPTY ++ popq %rdi ++ jmp entry_SYSCALL_64_after_hwframe ++END(entry_SYSCALL_64_stage2) ++ + ENTRY(entry_SYSCALL_64) + UNWIND_HINT_EMPTY + /* +@@ -221,10 +279,9 @@ entry_SYSCALL_64_fastpath: + TRACE_IRQS_ON /* user mode is traced as IRQs on */ + movq RIP(%rsp), %rcx + movq EFLAGS(%rsp), %r11 +- RESTORE_C_REGS_EXCEPT_RCX_R11 +- movq RSP(%rsp), %rsp ++ addq $6*8, %rsp /* skip extra regs -- they were preserved */ + UNWIND_HINT_EMPTY +- USERGS_SYSRET64 ++ jmp .Lpop_c_regs_except_rcx_r11_and_sysret + + 1: + /* +@@ -246,17 +303,18 @@ entry_SYSCALL64_slow_path: + call do_syscall_64 /* returns with IRQs disabled */ + + return_from_SYSCALL_64: +- RESTORE_EXTRA_REGS + TRACE_IRQS_IRETQ /* we're about to change IF */ + + /* + * Try to use SYSRET instead of IRET if we're returning to +- * a completely clean 64-bit userspace context. ++ * a completely clean 64-bit userspace context. If we're not, ++ * go to the slow exit path. + */ + movq RCX(%rsp), %rcx + movq RIP(%rsp), %r11 +- cmpq %rcx, %r11 /* RCX == RIP */ +- jne opportunistic_sysret_failed ++ ++ cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */ ++ jne swapgs_restore_regs_and_return_to_usermode + + /* + * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP +@@ -274,14 +332,14 @@ return_from_SYSCALL_64: + + /* If this changed %rcx, it was not canonical */ + cmpq %rcx, %r11 +- jne opportunistic_sysret_failed ++ jne swapgs_restore_regs_and_return_to_usermode + + cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */ +- jne opportunistic_sysret_failed ++ jne swapgs_restore_regs_and_return_to_usermode + + movq R11(%rsp), %r11 + cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */ +- jne opportunistic_sysret_failed ++ jne swapgs_restore_regs_and_return_to_usermode + + /* + * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot +@@ -302,12 +360,12 @@ return_from_SYSCALL_64: + * would never get past 'stuck_here'. + */ + testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11 +- jnz opportunistic_sysret_failed ++ jnz swapgs_restore_regs_and_return_to_usermode + + /* nothing to check for RSP */ + + cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */ +- jne opportunistic_sysret_failed ++ jne swapgs_restore_regs_and_return_to_usermode + + /* + * We win! 
This label is here just for ease of understanding +@@ -315,14 +373,36 @@ return_from_SYSCALL_64: + */ + syscall_return_via_sysret: + /* rcx and r11 are already restored (see code above) */ +- RESTORE_C_REGS_EXCEPT_RCX_R11 +- movq RSP(%rsp), %rsp + UNWIND_HINT_EMPTY +- USERGS_SYSRET64 ++ POP_EXTRA_REGS ++.Lpop_c_regs_except_rcx_r11_and_sysret: ++ popq %rsi /* skip r11 */ ++ popq %r10 ++ popq %r9 ++ popq %r8 ++ popq %rax ++ popq %rsi /* skip rcx */ ++ popq %rdx ++ popq %rsi + +-opportunistic_sysret_failed: +- SWAPGS +- jmp restore_c_regs_and_iret ++ /* ++ * Now all regs are restored except RSP and RDI. ++ * Save old stack pointer and switch to trampoline stack. ++ */ ++ movq %rsp, %rdi ++ movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp ++ ++ pushq RSP-RDI(%rdi) /* RSP */ ++ pushq (%rdi) /* RDI */ ++ ++ /* ++ * We are on the trampoline stack. All regs except RDI are live. ++ * We can do future final exit work right here. ++ */ ++ ++ popq %rdi ++ popq %rsp ++ USERGS_SYSRET64 + END(entry_SYSCALL_64) + + ENTRY(stub_ptregs_64) +@@ -423,8 +503,7 @@ ENTRY(ret_from_fork) + movq %rsp, %rdi + call syscall_return_slowpath /* returns with IRQs disabled */ + TRACE_IRQS_ON /* user mode is traced as IRQS on */ +- SWAPGS +- jmp restore_regs_and_iret ++ jmp swapgs_restore_regs_and_return_to_usermode + + 1: + /* kernel thread */ +@@ -457,12 +536,13 @@ END(irq_entries_start) + + .macro DEBUG_ENTRY_ASSERT_IRQS_OFF + #ifdef CONFIG_DEBUG_ENTRY +- pushfq +- testl $X86_EFLAGS_IF, (%rsp) ++ pushq %rax ++ SAVE_FLAGS(CLBR_RAX) ++ testl $X86_EFLAGS_IF, %eax + jz .Lokay_\@ + ud2 + .Lokay_\@: +- addq $8, %rsp ++ popq %rax + #endif + .endm + +@@ -554,6 +634,13 @@ END(irq_entries_start) + /* 0(%rsp): ~(interrupt number) */ + .macro interrupt func + cld ++ ++ testb $3, CS-ORIG_RAX(%rsp) ++ jz 1f ++ SWAPGS ++ call switch_to_thread_stack ++1: ++ + ALLOC_PT_GPREGS_ON_STACK + SAVE_C_REGS + SAVE_EXTRA_REGS +@@ -563,12 +650,8 @@ END(irq_entries_start) + jz 1f + + /* +- * IRQ from user mode. Switch to kernel gsbase and inform context +- * tracking that we're in kernel mode. +- */ +- SWAPGS +- +- /* ++ * IRQ from user mode. ++ * + * We need to tell lockdep that IRQs are off. We can't do this until + * we fix gsbase, and we should do it before enter_from_user_mode + * (which can take locks). Since TRACE_IRQS_OFF idempotent, +@@ -612,8 +695,52 @@ GLOBAL(retint_user) + mov %rsp,%rdi + call prepare_exit_to_usermode + TRACE_IRQS_IRETQ ++ ++GLOBAL(swapgs_restore_regs_and_return_to_usermode) ++#ifdef CONFIG_DEBUG_ENTRY ++ /* Assert that pt_regs indicates user mode. */ ++ testb $3, CS(%rsp) ++ jnz 1f ++ ud2 ++1: ++#endif ++ POP_EXTRA_REGS ++ popq %r11 ++ popq %r10 ++ popq %r9 ++ popq %r8 ++ popq %rax ++ popq %rcx ++ popq %rdx ++ popq %rsi ++ ++ /* ++ * The stack is now user RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS. ++ * Save old stack pointer and switch to trampoline stack. ++ */ ++ movq %rsp, %rdi ++ movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp ++ ++ /* Copy the IRET frame to the trampoline stack. */ ++ pushq 6*8(%rdi) /* SS */ ++ pushq 5*8(%rdi) /* RSP */ ++ pushq 4*8(%rdi) /* EFLAGS */ ++ pushq 3*8(%rdi) /* CS */ ++ pushq 2*8(%rdi) /* RIP */ ++ ++ /* Push user RDI on the trampoline stack. */ ++ pushq (%rdi) ++ ++ /* ++ * We are on the trampoline stack. All regs except RDI are live. ++ * We can do future final exit work right here. ++ */ ++ ++ /* Restore RDI. 
*/ ++ popq %rdi + SWAPGS +- jmp restore_regs_and_iret ++ INTERRUPT_RETURN ++ + + /* Returning to kernel space */ + retint_kernel: +@@ -633,15 +760,17 @@ retint_kernel: + */ + TRACE_IRQS_IRETQ + +-/* +- * At this label, code paths which return to kernel and to user, +- * which come from interrupts/exception and from syscalls, merge. +- */ +-GLOBAL(restore_regs_and_iret) +- RESTORE_EXTRA_REGS +-restore_c_regs_and_iret: +- RESTORE_C_REGS +- REMOVE_PT_GPREGS_FROM_STACK 8 ++GLOBAL(restore_regs_and_return_to_kernel) ++#ifdef CONFIG_DEBUG_ENTRY ++ /* Assert that pt_regs indicates kernel mode. */ ++ testb $3, CS(%rsp) ++ jz 1f ++ ud2 ++1: ++#endif ++ POP_EXTRA_REGS ++ POP_C_REGS ++ addq $8, %rsp /* skip regs->orig_ax */ + INTERRUPT_RETURN + + ENTRY(native_iret) +@@ -805,7 +934,33 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt + /* + * Exception entry points. + */ +-#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8) ++#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8) ++ ++/* ++ * Switch to the thread stack. This is called with the IRET frame and ++ * orig_ax on the stack. (That is, RDI..R12 are not on the stack and ++ * space has not been allocated for them.) ++ */ ++ENTRY(switch_to_thread_stack) ++ UNWIND_HINT_FUNC ++ ++ pushq %rdi ++ movq %rsp, %rdi ++ movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp ++ UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI ++ ++ pushq 7*8(%rdi) /* regs->ss */ ++ pushq 6*8(%rdi) /* regs->rsp */ ++ pushq 5*8(%rdi) /* regs->eflags */ ++ pushq 4*8(%rdi) /* regs->cs */ ++ pushq 3*8(%rdi) /* regs->ip */ ++ pushq 2*8(%rdi) /* regs->orig_ax */ ++ pushq 8(%rdi) /* return address */ ++ UNWIND_HINT_FUNC ++ ++ movq (%rdi), %rdi ++ ret ++END(switch_to_thread_stack) + + .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 + ENTRY(\sym) +@@ -818,17 +973,18 @@ ENTRY(\sym) + + ASM_CLAC + +- .ifeq \has_error_code ++ .if \has_error_code == 0 + pushq $-1 /* ORIG_RAX: no syscall to restart */ + .endif + + ALLOC_PT_GPREGS_ON_STACK + +- .if \paranoid +- .if \paranoid == 1 ++ .if \paranoid < 2 + testb $3, CS(%rsp) /* If coming from userspace, switch stacks */ +- jnz 1f ++ jnz .Lfrom_usermode_switch_stack_\@ + .endif ++ ++ .if \paranoid + call paranoid_entry + .else + call error_entry +@@ -870,20 +1026,15 @@ ENTRY(\sym) + jmp error_exit + .endif + +- .if \paranoid == 1 ++ .if \paranoid < 2 + /* +- * Paranoid entry from userspace. Switch stacks and treat it ++ * Entry from userspace. Switch stacks and treat it + * as a normal entry. This means that paranoid handlers + * run in real process context if user_mode(regs). + */ +-1: ++.Lfrom_usermode_switch_stack_\@: + call error_entry + +- +- movq %rsp, %rdi /* pt_regs pointer */ +- call sync_regs +- movq %rax, %rsp /* switch stack */ +- + movq %rsp, %rdi /* pt_regs pointer */ + + .if \has_error_code +@@ -1059,6 +1210,7 @@ idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK + idtentry stack_segment do_stack_segment has_error_code=1 + + #ifdef CONFIG_XEN ++idtentry xennmi do_nmi has_error_code=0 + idtentry xendebug do_debug has_error_code=0 + idtentry xenint3 do_int3 has_error_code=0 + #endif +@@ -1112,17 +1264,14 @@ ENTRY(paranoid_exit) + DISABLE_INTERRUPTS(CLBR_ANY) + TRACE_IRQS_OFF_DEBUG + testl %ebx, %ebx /* swapgs needed? 
*/ +- jnz paranoid_exit_no_swapgs ++ jnz .Lparanoid_exit_no_swapgs + TRACE_IRQS_IRETQ + SWAPGS_UNSAFE_STACK +- jmp paranoid_exit_restore +-paranoid_exit_no_swapgs: ++ jmp .Lparanoid_exit_restore ++.Lparanoid_exit_no_swapgs: + TRACE_IRQS_IRETQ_DEBUG +-paranoid_exit_restore: +- RESTORE_EXTRA_REGS +- RESTORE_C_REGS +- REMOVE_PT_GPREGS_FROM_STACK 8 +- INTERRUPT_RETURN ++.Lparanoid_exit_restore: ++ jmp restore_regs_and_return_to_kernel + END(paranoid_exit) + + /* +@@ -1146,6 +1295,14 @@ ENTRY(error_entry) + SWAPGS + + .Lerror_entry_from_usermode_after_swapgs: ++ /* Put us onto the real thread stack. */ ++ popq %r12 /* save return addr in %12 */ ++ movq %rsp, %rdi /* arg0 = pt_regs pointer */ ++ call sync_regs ++ movq %rax, %rsp /* switch stack */ ++ ENCODE_FRAME_POINTER ++ pushq %r12 ++ + /* + * We need to tell lockdep that IRQs are off. We can't do this until + * we fix gsbase, and we should do it before enter_from_user_mode +@@ -1223,10 +1380,13 @@ ENTRY(error_exit) + jmp retint_user + END(error_exit) + +-/* Runs on exception stack */ +-/* XXX: broken on Xen PV */ ++/* ++ * Runs on exception stack. Xen PV does not go through this path at all, ++ * so we can use real assembly here. ++ */ + ENTRY(nmi) + UNWIND_HINT_IRET_REGS ++ + /* + * We allow breakpoints in NMIs. If a breakpoint occurs, then + * the iretq it performs will take us out of NMI context. +@@ -1284,7 +1444,7 @@ ENTRY(nmi) + * stacks lest we corrupt the "NMI executing" variable. + */ + +- SWAPGS_UNSAFE_STACK ++ swapgs + cld + movq %rsp, %rdx + movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp +@@ -1328,8 +1488,7 @@ ENTRY(nmi) + * Return back to user mode. We must *not* do the normal exit + * work, because we don't want to enable interrupts. + */ +- SWAPGS +- jmp restore_regs_and_iret ++ jmp swapgs_restore_regs_and_return_to_usermode + + .Lnmi_from_kernel: + /* +@@ -1450,7 +1609,7 @@ nested_nmi_out: + popq %rdx + + /* We are returning to kernel mode, so this cannot result in a fault. */ +- INTERRUPT_RETURN ++ iretq + + first_nmi: + /* Restore rdx. */ +@@ -1481,7 +1640,7 @@ first_nmi: + pushfq /* RFLAGS */ + pushq $__KERNEL_CS /* CS */ + pushq $1f /* RIP */ +- INTERRUPT_RETURN /* continues at repeat_nmi below */ ++ iretq /* continues at repeat_nmi below */ + UNWIND_HINT_IRET_REGS + 1: + #endif +@@ -1544,29 +1703,34 @@ end_repeat_nmi: + nmi_swapgs: + SWAPGS_UNSAFE_STACK + nmi_restore: +- RESTORE_EXTRA_REGS +- RESTORE_C_REGS ++ POP_EXTRA_REGS ++ POP_C_REGS + +- /* Point RSP at the "iret" frame. */ +- REMOVE_PT_GPREGS_FROM_STACK 6*8 ++ /* ++ * Skip orig_ax and the "outermost" frame to point RSP at the "iret" ++ * at the "iret" frame. ++ */ ++ addq $6*8, %rsp + + /* + * Clear "NMI executing". Set DF first so that we can easily + * distinguish the remaining code between here and IRET from +- * the SYSCALL entry and exit paths. On a native kernel, we +- * could just inspect RIP, but, on paravirt kernels, +- * INTERRUPT_RETURN can translate into a jump into a +- * hypercall page. ++ * the SYSCALL entry and exit paths. ++ * ++ * We arguably should just inspect RIP instead, but I (Andy) wrote ++ * this code when I had the misapprehension that Xen PV supported ++ * NMIs, and Xen PV would break that approach. + */ + std + movq $0, 5*8(%rsp) /* clear "NMI executing" */ + + /* +- * INTERRUPT_RETURN reads the "iret" frame and exits the NMI +- * stack in a single instruction. We are returning to kernel +- * mode, so this cannot result in a fault. ++ * iretq reads the "iret" frame and exits the NMI stack in a ++ * single instruction. 
We are returning to kernel mode, so this ++ * cannot result in a fault. Similarly, we don't need to worry ++ * about espfix64 on the way back to kernel mode. + */ +- INTERRUPT_RETURN ++ iretq + END(nmi) + + ENTRY(ignore_sysret) +diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S +index b5c7a56ed256..95ad40eb7eff 100644 +--- a/arch/x86/entry/entry_64_compat.S ++++ b/arch/x86/entry/entry_64_compat.S +@@ -48,7 +48,7 @@ + */ + ENTRY(entry_SYSENTER_compat) + /* Interrupts are off on entry. */ +- SWAPGS_UNSAFE_STACK ++ SWAPGS + movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp + + /* +@@ -306,8 +306,11 @@ ENTRY(entry_INT80_compat) + */ + movl %eax, %eax + +- /* Construct struct pt_regs on stack (iret frame is already on stack) */ + pushq %rax /* pt_regs->orig_ax */ ++ ++ /* switch to thread stack expects orig_ax to be pushed */ ++ call switch_to_thread_stack ++ + pushq %rdi /* pt_regs->di */ + pushq %rsi /* pt_regs->si */ + pushq %rdx /* pt_regs->dx */ +@@ -337,8 +340,7 @@ ENTRY(entry_INT80_compat) + + /* Go back to user mode. */ + TRACE_IRQS_ON +- SWAPGS +- jmp restore_regs_and_iret ++ jmp swapgs_restore_regs_and_return_to_usermode + END(entry_INT80_compat) + + ENTRY(stub32_clone) +diff --git a/arch/x86/entry/syscalls/Makefile b/arch/x86/entry/syscalls/Makefile +index 331f1dca5085..6fb9b57ed5ba 100644 +--- a/arch/x86/entry/syscalls/Makefile ++++ b/arch/x86/entry/syscalls/Makefile +@@ -1,6 +1,6 @@ + # SPDX-License-Identifier: GPL-2.0 +-out := $(obj)/../../include/generated/asm +-uapi := $(obj)/../../include/generated/uapi/asm ++out := arch/$(SRCARCH)/include/generated/asm ++uapi := arch/$(SRCARCH)/include/generated/uapi/asm + + # Create output directory if not already present + _dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \ +diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c +index 80534d3c2480..589af1eec7c1 100644 +--- a/arch/x86/events/core.c ++++ b/arch/x86/events/core.c +@@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(unsigned int segment) + struct ldt_struct *ldt; + + /* IRQs are off, so this synchronizes with smp_store_release */ +- ldt = lockless_dereference(current->active_mm->context.ldt); ++ ldt = READ_ONCE(current->active_mm->context.ldt); + if (!ldt || idx >= ldt->nr_entries) + return 0; + +diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c +index f94855000d4e..09c26a4f139c 100644 +--- a/arch/x86/events/intel/core.c ++++ b/arch/x86/events/intel/core.c +@@ -2958,6 +2958,10 @@ static unsigned long intel_pmu_free_running_flags(struct perf_event *event) + + if (event->attr.use_clockid) + flags &= ~PERF_SAMPLE_TIME; ++ if (!event->attr.exclude_kernel) ++ flags &= ~PERF_SAMPLE_REGS_USER; ++ if (event->attr.sample_regs_user & ~PEBS_REGS) ++ flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR); + return flags; + } + +diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h +index 4196f81ec0e1..f7aaadf9331f 100644 +--- a/arch/x86/events/perf_event.h ++++ b/arch/x86/events/perf_event.h +@@ -85,13 +85,15 @@ struct amd_nb { + * Flags PEBS can handle without an PMI. + * + * TID can only be handled by flushing at context switch. ++ * REGS_USER can be handled for events limited to ring 3. 
+ * + */ + #define PEBS_FREERUNNING_FLAGS \ + (PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR | \ + PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | \ + PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | \ +- PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR) ++ PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \ ++ PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER) + + /* + * A debug store configuration. +@@ -110,6 +112,26 @@ struct debug_store { + u64 pebs_event_reset[MAX_PEBS_EVENTS]; + }; + ++#define PEBS_REGS \ ++ (PERF_REG_X86_AX | \ ++ PERF_REG_X86_BX | \ ++ PERF_REG_X86_CX | \ ++ PERF_REG_X86_DX | \ ++ PERF_REG_X86_DI | \ ++ PERF_REG_X86_SI | \ ++ PERF_REG_X86_SP | \ ++ PERF_REG_X86_BP | \ ++ PERF_REG_X86_IP | \ ++ PERF_REG_X86_FLAGS | \ ++ PERF_REG_X86_R8 | \ ++ PERF_REG_X86_R9 | \ ++ PERF_REG_X86_R10 | \ ++ PERF_REG_X86_R11 | \ ++ PERF_REG_X86_R12 | \ ++ PERF_REG_X86_R13 | \ ++ PERF_REG_X86_R14 | \ ++ PERF_REG_X86_R15) ++ + /* + * Per register state. + */ +diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c +index a5db63f728a2..a0b86cf486e0 100644 +--- a/arch/x86/hyperv/hv_init.c ++++ b/arch/x86/hyperv/hv_init.c +@@ -113,7 +113,7 @@ void hyperv_init(void) + u64 guest_id; + union hv_x64_msr_hypercall_contents hypercall_msr; + +- if (x86_hyper != &x86_hyper_ms_hyperv) ++ if (x86_hyper_type != X86_HYPER_MS_HYPERV) + return; + + /* Allocate percpu VP index */ +diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h +index 5b0579abb398..3ac991d81e74 100644 +--- a/arch/x86/include/asm/archrandom.h ++++ b/arch/x86/include/asm/archrandom.h +@@ -45,7 +45,7 @@ static inline bool rdrand_long(unsigned long *v) + bool ok; + unsigned int retry = RDRAND_RETRY_LOOPS; + do { +- asm volatile(RDRAND_LONG "\n\t" ++ asm volatile(RDRAND_LONG + CC_SET(c) + : CC_OUT(c) (ok), "=a" (*v)); + if (ok) +@@ -59,7 +59,7 @@ static inline bool rdrand_int(unsigned int *v) + bool ok; + unsigned int retry = RDRAND_RETRY_LOOPS; + do { +- asm volatile(RDRAND_INT "\n\t" ++ asm volatile(RDRAND_INT + CC_SET(c) + : CC_OUT(c) (ok), "=a" (*v)); + if (ok) +@@ -71,7 +71,7 @@ static inline bool rdrand_int(unsigned int *v) + static inline bool rdseed_long(unsigned long *v) + { + bool ok; +- asm volatile(RDSEED_LONG "\n\t" ++ asm volatile(RDSEED_LONG + CC_SET(c) + : CC_OUT(c) (ok), "=a" (*v)); + return ok; +@@ -80,7 +80,7 @@ static inline bool rdseed_long(unsigned long *v) + static inline bool rdseed_int(unsigned int *v) + { + bool ok; +- asm volatile(RDSEED_INT "\n\t" ++ asm volatile(RDSEED_INT + CC_SET(c) + : CC_OUT(c) (ok), "=a" (*v)); + return ok; +diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h +index 2bcf47314959..3fa039855b8f 100644 +--- a/arch/x86/include/asm/bitops.h ++++ b/arch/x86/include/asm/bitops.h +@@ -143,7 +143,7 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) + static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr) + { + bool negative; +- asm volatile(LOCK_PREFIX "andb %2,%1\n\t" ++ asm volatile(LOCK_PREFIX "andb %2,%1" + CC_SET(s) + : CC_OUT(s) (negative), ADDR + : "ir" ((char) ~(1 << nr)) : "memory"); +@@ -246,7 +246,7 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long * + { + bool oldbit; + +- asm("bts %2,%1\n\t" ++ asm("bts %2,%1" + CC_SET(c) + : CC_OUT(c) (oldbit), ADDR + : "Ir" (nr)); +@@ -286,7 +286,7 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long + { + bool oldbit; + +- asm 
volatile("btr %2,%1\n\t" ++ asm volatile("btr %2,%1" + CC_SET(c) + : CC_OUT(c) (oldbit), ADDR + : "Ir" (nr)); +@@ -298,7 +298,7 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon + { + bool oldbit; + +- asm volatile("btc %2,%1\n\t" ++ asm volatile("btc %2,%1" + CC_SET(c) + : CC_OUT(c) (oldbit), ADDR + : "Ir" (nr) : "memory"); +@@ -329,7 +329,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l + { + bool oldbit; + +- asm volatile("bt %2,%1\n\t" ++ asm volatile("bt %2,%1" + CC_SET(c) + : CC_OUT(c) (oldbit) + : "m" (*(unsigned long *)addr), "Ir" (nr)); +diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h +index 70bc1df580b2..2cbd75dd2fd3 100644 +--- a/arch/x86/include/asm/compat.h ++++ b/arch/x86/include/asm/compat.h +@@ -7,6 +7,7 @@ + */ + #include + #include ++#include + #include + #include + #include +diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h +index 0dfa68438e80..ea9a7dde62e5 100644 +--- a/arch/x86/include/asm/cpufeature.h ++++ b/arch/x86/include/asm/cpufeature.h +@@ -126,16 +126,17 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; + #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit) + + #define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability)) +-#define clear_cpu_cap(c, bit) clear_bit(bit, (unsigned long *)((c)->x86_capability)) +-#define setup_clear_cpu_cap(bit) do { \ +- clear_cpu_cap(&boot_cpu_data, bit); \ +- set_bit(bit, (unsigned long *)cpu_caps_cleared); \ +-} while (0) ++ ++extern void setup_clear_cpu_cap(unsigned int bit); ++extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit); ++ + #define setup_force_cpu_cap(bit) do { \ + set_cpu_cap(&boot_cpu_data, bit); \ + set_bit(bit, (unsigned long *)cpu_caps_set); \ + } while (0) + ++#define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit) ++ + #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_X86_FAST_FEATURE_TESTS) + /* + * Static testing of CPU features. Used the same as boot_cpu_has(). +diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h +index 793690fbda36..800104c8a3ed 100644 +--- a/arch/x86/include/asm/cpufeatures.h ++++ b/arch/x86/include/asm/cpufeatures.h +@@ -13,173 +13,176 @@ + /* + * Defines x86 CPU feature bits + */ +-#define NCAPINTS 18 /* N 32-bit words worth of info */ +-#define NBUGINTS 1 /* N 32-bit bug flags */ ++#define NCAPINTS 18 /* N 32-bit words worth of info */ ++#define NBUGINTS 1 /* N 32-bit bug flags */ + + /* + * Note: If the comment begins with a quoted string, that string is used + * in /proc/cpuinfo instead of the macro name. If the string is "", + * this feature bit is not displayed in /proc/cpuinfo at all. ++ * ++ * When adding new features here that depend on other features, ++ * please update the table in kernel/cpu/cpuid-deps.c as well. 
+ */ + +-/* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */ +-#define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ +-#define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ +-#define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ +-#define X86_FEATURE_PSE ( 0*32+ 3) /* Page Size Extensions */ +-#define X86_FEATURE_TSC ( 0*32+ 4) /* Time Stamp Counter */ +-#define X86_FEATURE_MSR ( 0*32+ 5) /* Model-Specific Registers */ +-#define X86_FEATURE_PAE ( 0*32+ 6) /* Physical Address Extensions */ +-#define X86_FEATURE_MCE ( 0*32+ 7) /* Machine Check Exception */ +-#define X86_FEATURE_CX8 ( 0*32+ 8) /* CMPXCHG8 instruction */ +-#define X86_FEATURE_APIC ( 0*32+ 9) /* Onboard APIC */ +-#define X86_FEATURE_SEP ( 0*32+11) /* SYSENTER/SYSEXIT */ +-#define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ +-#define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ +-#define X86_FEATURE_MCA ( 0*32+14) /* Machine Check Architecture */ +-#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ +- /* (plus FCMOVcc, FCOMI with FPU) */ +-#define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ +-#define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ +-#define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ +-#define X86_FEATURE_CLFLUSH ( 0*32+19) /* CLFLUSH instruction */ +-#define X86_FEATURE_DS ( 0*32+21) /* "dts" Debug Store */ +-#define X86_FEATURE_ACPI ( 0*32+22) /* ACPI via MSR */ +-#define X86_FEATURE_MMX ( 0*32+23) /* Multimedia Extensions */ +-#define X86_FEATURE_FXSR ( 0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */ +-#define X86_FEATURE_XMM ( 0*32+25) /* "sse" */ +-#define X86_FEATURE_XMM2 ( 0*32+26) /* "sse2" */ +-#define X86_FEATURE_SELFSNOOP ( 0*32+27) /* "ss" CPU self snoop */ +-#define X86_FEATURE_HT ( 0*32+28) /* Hyper-Threading */ +-#define X86_FEATURE_ACC ( 0*32+29) /* "tm" Automatic clock control */ +-#define X86_FEATURE_IA64 ( 0*32+30) /* IA-64 processor */ +-#define X86_FEATURE_PBE ( 0*32+31) /* Pending Break Enable */ ++/* Intel-defined CPU features, CPUID level 0x00000001 (EDX), word 0 */ ++#define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ ++#define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ ++#define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ ++#define X86_FEATURE_PSE ( 0*32+ 3) /* Page Size Extensions */ ++#define X86_FEATURE_TSC ( 0*32+ 4) /* Time Stamp Counter */ ++#define X86_FEATURE_MSR ( 0*32+ 5) /* Model-Specific Registers */ ++#define X86_FEATURE_PAE ( 0*32+ 6) /* Physical Address Extensions */ ++#define X86_FEATURE_MCE ( 0*32+ 7) /* Machine Check Exception */ ++#define X86_FEATURE_CX8 ( 0*32+ 8) /* CMPXCHG8 instruction */ ++#define X86_FEATURE_APIC ( 0*32+ 9) /* Onboard APIC */ ++#define X86_FEATURE_SEP ( 0*32+11) /* SYSENTER/SYSEXIT */ ++#define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ ++#define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ ++#define X86_FEATURE_MCA ( 0*32+14) /* Machine Check Architecture */ ++#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions (plus FCMOVcc, FCOMI with FPU) */ ++#define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ ++#define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ ++#define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ ++#define X86_FEATURE_CLFLUSH ( 0*32+19) /* CLFLUSH instruction */ ++#define X86_FEATURE_DS ( 0*32+21) /* "dts" Debug Store */ ++#define X86_FEATURE_ACPI ( 0*32+22) /* ACPI via MSR */ ++#define X86_FEATURE_MMX ( 0*32+23) /* Multimedia Extensions */ ++#define X86_FEATURE_FXSR ( 
0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */ ++#define X86_FEATURE_XMM ( 0*32+25) /* "sse" */ ++#define X86_FEATURE_XMM2 ( 0*32+26) /* "sse2" */ ++#define X86_FEATURE_SELFSNOOP ( 0*32+27) /* "ss" CPU self snoop */ ++#define X86_FEATURE_HT ( 0*32+28) /* Hyper-Threading */ ++#define X86_FEATURE_ACC ( 0*32+29) /* "tm" Automatic clock control */ ++#define X86_FEATURE_IA64 ( 0*32+30) /* IA-64 processor */ ++#define X86_FEATURE_PBE ( 0*32+31) /* Pending Break Enable */ + + /* AMD-defined CPU features, CPUID level 0x80000001, word 1 */ + /* Don't duplicate feature flags which are redundant with Intel! */ +-#define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ +-#define X86_FEATURE_MP ( 1*32+19) /* MP Capable. */ +-#define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ +-#define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ +-#define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ +-#define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ +-#define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ +-#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64) */ +-#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow! extensions */ +-#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow! */ ++#define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ ++#define X86_FEATURE_MP ( 1*32+19) /* MP Capable */ ++#define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ ++#define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ ++#define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ ++#define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ ++#define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ ++#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64, 64-bit support) */ ++#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow extensions */ ++#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow */ + + /* Transmeta-defined CPU features, CPUID level 0x80860001, word 2 */ +-#define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ +-#define X86_FEATURE_LONGRUN ( 2*32+ 1) /* Longrun power control */ +-#define X86_FEATURE_LRTI ( 2*32+ 3) /* LongRun table interface */ ++#define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ ++#define X86_FEATURE_LONGRUN ( 2*32+ 1) /* Longrun power control */ ++#define X86_FEATURE_LRTI ( 2*32+ 3) /* LongRun table interface */ + + /* Other features, Linux-defined mapping, word 3 */ + /* This range is used for feature bits which conflict or are synthesized */ +-#define X86_FEATURE_CXMMX ( 3*32+ 0) /* Cyrix MMX extensions */ +-#define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ +-#define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ +-#define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ +-/* cpu types for specific tunings: */ +-#define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ +-#define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ +-#define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ +-#define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ +-#define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ +-#define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ +-#define X86_FEATURE_ART ( 3*32+10) /* Platform has always running timer (ART) */ +-#define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ +-#define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ +-#define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ +-#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in ia32 userspace */ +-#define 
X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in ia32 userspace */ +-#define X86_FEATURE_REP_GOOD ( 3*32+16) /* rep microcode works well */ +-#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" Mfence synchronizes RDTSC */ +-#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" Lfence synchronizes RDTSC */ +-#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ +-#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ +-#define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */ +-#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* cpu topology enum extensions */ +-#define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ +-#define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ +-#define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ +-#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ +-#define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ +-#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ +-#define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ +-#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ ++#define X86_FEATURE_CXMMX ( 3*32+ 0) /* Cyrix MMX extensions */ ++#define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ ++#define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ ++#define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ ++ ++/* CPU types for specific tunings: */ ++#define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ ++#define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ ++#define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ ++#define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ ++#define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ ++#define X86_FEATURE_UP ( 3*32+ 9) /* SMP kernel running on UP */ ++#define X86_FEATURE_ART ( 3*32+10) /* Always running timer (ART) */ ++#define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ ++#define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ ++#define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ ++#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in IA32 userspace */ ++#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */ ++#define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */ ++#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" MFENCE synchronizes RDTSC */ ++#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" LFENCE synchronizes RDTSC */ ++#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ ++#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ ++#define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */ ++#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* CPU topology enum extensions */ ++#define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ ++#define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ ++#define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ ++#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* Extended APICID (8 bits) */ ++#define X86_FEATURE_AMD_DCM ( 3*32+27) /* AMD multi-node processor */ ++#define X86_FEATURE_APERFMPERF ( 3*32+28) /* P-State hardware coordination feedback capability (APERF/MPERF MSRs) */ ++#define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ ++#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ + 
+-/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ +-#define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ +-#define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ +-#define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ +-#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" Monitor/Mwait support */ +-#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL Qual. Debug Store */ +-#define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ +-#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer mode */ +-#define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ +-#define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ +-#define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ +-#define X86_FEATURE_CID ( 4*32+10) /* Context ID */ +-#define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ +-#define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ +-#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B */ +-#define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ +-#define X86_FEATURE_PDCM ( 4*32+15) /* Performance Capabilities */ +-#define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ +-#define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ +-#define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ +-#define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ +-#define X86_FEATURE_X2APIC ( 4*32+21) /* x2APIC */ +-#define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ +-#define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ +-#define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* Tsc deadline timer */ +-#define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ +-#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ +-#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE enabled in the OS */ +-#define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ +-#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit fp conversions */ +-#define X86_FEATURE_RDRAND ( 4*32+30) /* The RDRAND instruction */ +-#define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */ ++/* Intel-defined CPU features, CPUID level 0x00000001 (ECX), word 4 */ ++#define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ ++#define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ ++#define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ ++#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" MONITOR/MWAIT support */ ++#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL-qualified (filtered) Debug Store */ ++#define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ ++#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer Mode eXtensions */ ++#define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ ++#define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ ++#define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ ++#define X86_FEATURE_CID ( 4*32+10) /* Context ID */ ++#define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ ++#define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ ++#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B instruction */ ++#define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ ++#define X86_FEATURE_PDCM ( 4*32+15) /* Perf/Debug Capabilities MSR */ ++#define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ ++#define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ ++#define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ ++#define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ ++#define X86_FEATURE_X2APIC ( 4*32+21) /* X2APIC */ 
++#define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ ++#define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ ++#define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* TSC deadline timer */ ++#define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ ++#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV instructions */ ++#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE instruction enabled in the OS */ ++#define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ ++#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit FP conversions */ ++#define X86_FEATURE_RDRAND ( 4*32+30) /* RDRAND instruction */ ++#define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */ + + /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */ +-#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */ +-#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */ +-#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */ +-#define X86_FEATURE_XCRYPT_EN ( 5*32+ 7) /* "ace_en" on-CPU crypto enabled */ +-#define X86_FEATURE_ACE2 ( 5*32+ 8) /* Advanced Cryptography Engine v2 */ +-#define X86_FEATURE_ACE2_EN ( 5*32+ 9) /* ACE v2 enabled */ +-#define X86_FEATURE_PHE ( 5*32+10) /* PadLock Hash Engine */ +-#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */ +-#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ +-#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */ ++#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */ ++#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */ ++#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */ ++#define X86_FEATURE_XCRYPT_EN ( 5*32+ 7) /* "ace_en" on-CPU crypto enabled */ ++#define X86_FEATURE_ACE2 ( 5*32+ 8) /* Advanced Cryptography Engine v2 */ ++#define X86_FEATURE_ACE2_EN ( 5*32+ 9) /* ACE v2 enabled */ ++#define X86_FEATURE_PHE ( 5*32+10) /* PadLock Hash Engine */ ++#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */ ++#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ ++#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */ + +-/* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */ +-#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ +-#define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ +-#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure virtual machine */ +-#define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ +-#define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ +-#define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ +-#define X86_FEATURE_SSE4A ( 6*32+ 6) /* SSE-4A */ +-#define X86_FEATURE_MISALIGNSSE ( 6*32+ 7) /* Misaligned SSE mode */ +-#define X86_FEATURE_3DNOWPREFETCH ( 6*32+ 8) /* 3DNow prefetch instructions */ +-#define X86_FEATURE_OSVW ( 6*32+ 9) /* OS Visible Workaround */ +-#define X86_FEATURE_IBS ( 6*32+10) /* Instruction Based Sampling */ +-#define X86_FEATURE_XOP ( 6*32+11) /* extended AVX instructions */ +-#define X86_FEATURE_SKINIT ( 6*32+12) /* SKINIT/STGI instructions */ +-#define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ +-#define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ +-#define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ +-#define X86_FEATURE_TCE ( 6*32+17) /* translation cache extension */ +-#define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ +-#define X86_FEATURE_TBM ( 6*32+21) /* trailing bit manipulations */ +-#define 
X86_FEATURE_TOPOEXT ( 6*32+22) /* topology extensions CPUID leafs */ +-#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */ +-#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ +-#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */ +-#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */ +-#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ +-#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */ ++/* More extended AMD flags: CPUID level 0x80000001, ECX, word 6 */ ++#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ ++#define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ ++#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure Virtual Machine */ ++#define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ ++#define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ ++#define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ ++#define X86_FEATURE_SSE4A ( 6*32+ 6) /* SSE-4A */ ++#define X86_FEATURE_MISALIGNSSE ( 6*32+ 7) /* Misaligned SSE mode */ ++#define X86_FEATURE_3DNOWPREFETCH ( 6*32+ 8) /* 3DNow prefetch instructions */ ++#define X86_FEATURE_OSVW ( 6*32+ 9) /* OS Visible Workaround */ ++#define X86_FEATURE_IBS ( 6*32+10) /* Instruction Based Sampling */ ++#define X86_FEATURE_XOP ( 6*32+11) /* extended AVX instructions */ ++#define X86_FEATURE_SKINIT ( 6*32+12) /* SKINIT/STGI instructions */ ++#define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ ++#define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ ++#define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ ++#define X86_FEATURE_TCE ( 6*32+17) /* Translation Cache Extension */ ++#define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ ++#define X86_FEATURE_TBM ( 6*32+21) /* Trailing Bit Manipulations */ ++#define X86_FEATURE_TOPOEXT ( 6*32+22) /* Topology extensions CPUID leafs */ ++#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* Core performance counter extensions */ ++#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ ++#define X86_FEATURE_BPEXT ( 6*32+26) /* Data breakpoint extension */ ++#define X86_FEATURE_PTSC ( 6*32+27) /* Performance time-stamp counter */ ++#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ ++#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX instructions) */ + + /* + * Auxiliary flags: Linux defined - For features scattered in various +@@ -187,146 +190,155 @@ + * + * Reuse free bits when adding new feature flags! 
+ */ +-#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT */ +-#define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ +-#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ +-#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ +-#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ +-#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ +-#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ ++#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT instructions */ ++#define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ ++#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ ++#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ ++#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ ++#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ ++#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ + +-#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ +-#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ +-#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ ++#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ ++#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ ++#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ + +-#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ +-#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ +-#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ +-#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ ++#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ ++#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ ++#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */ ++#define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */ + +-#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ ++#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ + + /* Virtualization flags: Linux defined, word 8 */ +-#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ +-#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */ +-#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */ +-#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ +-#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */ ++#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ ++#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */ ++#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */ ++#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ ++#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */ + +-#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ +-#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */ ++#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */ ++#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */ + + +-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */ +-#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/ +-#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment 
MSR 0x3b */ +-#define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */ +-#define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ +-#define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ +-#define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ +-#define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation extensions */ +-#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */ +-#define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ +-#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ +-#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ +-#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */ +-#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ +-#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ +-#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ +-#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ +-#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ +-#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ +-#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ +-#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ +-#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ +-#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ +-#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ +-#define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ +-#define X86_FEATURE_SHA_NI ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */ +-#define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ +-#define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */ ++/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ ++#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ ++#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3B */ ++#define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */ ++#define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ ++#define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ ++#define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ ++#define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation extensions */ ++#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB instructions */ ++#define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ ++#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ ++#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ ++#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */ ++#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ ++#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ ++#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ ++#define X86_FEATURE_RDSEED ( 9*32+18) /* RDSEED instruction */ ++#define X86_FEATURE_ADX ( 9*32+19) /* ADCX and ADOX instructions */ ++#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ ++#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ ++#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ 
++#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ ++#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ ++#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ ++#define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ ++#define X86_FEATURE_SHA_NI ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */ ++#define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ ++#define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */ + +-/* Extended state features, CPUID level 0x0000000d:1 (eax), word 10 */ +-#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ +-#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ +-#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */ +-#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */ ++/* Extended state features, CPUID level 0x0000000d:1 (EAX), word 10 */ ++#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT instruction */ ++#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC instruction */ ++#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 instruction */ ++#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS instructions */ + +-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (edx), word 11 */ +-#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */ ++/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */ ++#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */ + +-/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */ +-#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */ +-#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ +-#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */ ++/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */ ++#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring */ ++#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ ++#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */ + +-/* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */ +-#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ +-#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */ ++/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ ++#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ ++#define X86_FEATURE_IRPERF (13*32+ 1) /* Instructions Retired Count */ ++#define X86_FEATURE_XSAVEERPTR (13*32+ 2) /* Always save/restore FP error pointers */ + +-/* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */ +-#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ +-#define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ +-#define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ +-#define X86_FEATURE_PLN (14*32+ 4) /* Intel Power Limit Notification */ +-#define X86_FEATURE_PTS (14*32+ 6) /* Intel Package Thermal Status */ +-#define X86_FEATURE_HWP (14*32+ 7) /* Intel Hardware P-states */ +-#define X86_FEATURE_HWP_NOTIFY (14*32+ 8) /* HWP Notification */ +-#define X86_FEATURE_HWP_ACT_WINDOW (14*32+ 9) /* HWP Activity Window */ +-#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. 
Preference */ +-#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */ ++/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ ++#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ ++#define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ ++#define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ ++#define X86_FEATURE_PLN (14*32+ 4) /* Intel Power Limit Notification */ ++#define X86_FEATURE_PTS (14*32+ 6) /* Intel Package Thermal Status */ ++#define X86_FEATURE_HWP (14*32+ 7) /* Intel Hardware P-states */ ++#define X86_FEATURE_HWP_NOTIFY (14*32+ 8) /* HWP Notification */ ++#define X86_FEATURE_HWP_ACT_WINDOW (14*32+ 9) /* HWP Activity Window */ ++#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. Preference */ ++#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */ + +-/* AMD SVM Feature Identification, CPUID level 0x8000000a (edx), word 15 */ +-#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ +-#define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ +-#define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ +-#define X86_FEATURE_NRIPS (15*32+ 3) /* "nrip_save" SVM next_rip save */ +-#define X86_FEATURE_TSCRATEMSR (15*32+ 4) /* "tsc_scale" TSC scaling support */ +-#define X86_FEATURE_VMCBCLEAN (15*32+ 5) /* "vmcb_clean" VMCB clean bits support */ +-#define X86_FEATURE_FLUSHBYASID (15*32+ 6) /* flush-by-ASID support */ +-#define X86_FEATURE_DECODEASSISTS (15*32+ 7) /* Decode Assists support */ +-#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */ +-#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */ +-#define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */ +-#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ +-#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */ ++/* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */ ++#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ ++#define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ ++#define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ ++#define X86_FEATURE_NRIPS (15*32+ 3) /* "nrip_save" SVM next_rip save */ ++#define X86_FEATURE_TSCRATEMSR (15*32+ 4) /* "tsc_scale" TSC scaling support */ ++#define X86_FEATURE_VMCBCLEAN (15*32+ 5) /* "vmcb_clean" VMCB clean bits support */ ++#define X86_FEATURE_FLUSHBYASID (15*32+ 6) /* flush-by-ASID support */ ++#define X86_FEATURE_DECODEASSISTS (15*32+ 7) /* Decode Assists support */ ++#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */ ++#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */ ++#define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */ ++#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ ++#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */ + +-/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ +-#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +-#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ +-#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ +-#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ +-#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ +-#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ ++/* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), 
word 16 */ ++#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ ++#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */ ++#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ ++#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ ++#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ ++#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ ++#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ ++#define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ ++#define X86_FEATURE_AVX512_VNNI (16*32+11) /* Vector Neural Network Instructions */ ++#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */ ++#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ ++#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ ++#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ + +-/* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */ +-#define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ +-#define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ +-#define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */ ++/* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ ++#define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */ ++#define X86_FEATURE_SUCCOR (17*32+ 1) /* Uncorrectable error containment and recovery */ ++#define X86_FEATURE_SMCA (17*32+ 3) /* Scalable MCA */ + + /* + * BUG word(s) + */ +-#define X86_BUG(x) (NCAPINTS*32 + (x)) ++#define X86_BUG(x) (NCAPINTS*32 + (x)) + +-#define X86_BUG_F00F X86_BUG(0) /* Intel F00F */ +-#define X86_BUG_FDIV X86_BUG(1) /* FPU FDIV */ +-#define X86_BUG_COMA X86_BUG(2) /* Cyrix 6x86 coma */ +-#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* "tlb_mmatch" AMD Erratum 383 */ +-#define X86_BUG_AMD_APIC_C1E X86_BUG(4) /* "apic_c1e" AMD Erratum 400 */ +-#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */ +-#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */ +-#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */ +-#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */ ++#define X86_BUG_F00F X86_BUG(0) /* Intel F00F */ ++#define X86_BUG_FDIV X86_BUG(1) /* FPU FDIV */ ++#define X86_BUG_COMA X86_BUG(2) /* Cyrix 6x86 coma */ ++#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* "tlb_mmatch" AMD Erratum 383 */ ++#define X86_BUG_AMD_APIC_C1E X86_BUG(4) /* "apic_c1e" AMD Erratum 400 */ ++#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */ ++#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */ ++#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */ ++#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */ + #ifdef CONFIG_X86_32 + /* + * 64-bit kernels don't use X86_BUG_ESPFIX. Make the define conditional + * to avoid confusion. 
+ */ +-#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ ++#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ + #endif +-#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ +-#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ +-#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ +-#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ ++#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ ++#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ ++#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ ++#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ ++ + #endif /* _ASM_X86_CPUFEATURES_H */ +diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h +index 0a3e808b9123..2ace1f90d138 100644 +--- a/arch/x86/include/asm/desc.h ++++ b/arch/x86/include/asm/desc.h +@@ -60,17 +60,10 @@ static inline struct desc_struct *get_current_gdt_rw(void) + return this_cpu_ptr(&gdt_page)->gdt; + } + +-/* Get the fixmap index for a specific processor */ +-static inline unsigned int get_cpu_gdt_ro_index(int cpu) +-{ +- return FIX_GDT_REMAP_BEGIN + cpu; +-} +- + /* Provide the fixmap address of the remapped GDT */ + static inline struct desc_struct *get_cpu_gdt_ro(int cpu) + { +- unsigned int idx = get_cpu_gdt_ro_index(cpu); +- return (struct desc_struct *)__fix_to_virt(idx); ++ return (struct desc_struct *)&get_cpu_entry_area(cpu)->gdt; + } + + /* Provide the current read-only GDT */ +@@ -185,7 +178,7 @@ static inline void set_tssldt_descriptor(void *d, unsigned long addr, + #endif + } + +-static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr) ++static inline void __set_tss_desc(unsigned cpu, unsigned int entry, struct x86_hw_tss *addr) + { + struct desc_struct *d = get_cpu_gdt_rw(cpu); + tss_desc tss; +diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h +index dcd9fb55e679..94fc4fa14127 100644 +--- a/arch/x86/include/asm/fixmap.h ++++ b/arch/x86/include/asm/fixmap.h +@@ -44,6 +44,45 @@ extern unsigned long __FIXADDR_TOP; + PAGE_SIZE) + #endif + ++/* ++ * cpu_entry_area is a percpu region in the fixmap that contains things ++ * needed by the CPU and early entry/exit code. Real types aren't used ++ * for all fields here to avoid circular header dependencies. ++ * ++ * Every field is a virtual alias of some other allocated backing store. ++ * There is no direct allocation of a struct cpu_entry_area. ++ */ ++struct cpu_entry_area { ++ char gdt[PAGE_SIZE]; ++ ++ /* ++ * The GDT is just below SYSENTER_stack and thus serves (on x86_64) as ++ * a a read-only guard page. ++ */ ++ struct SYSENTER_stack_page SYSENTER_stack_page; ++ ++ /* ++ * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because ++ * we need task switches to work, and task switches write to the TSS. ++ */ ++ struct tss_struct tss; ++ ++ char entry_trampoline[PAGE_SIZE]; ++ ++#ifdef CONFIG_X86_64 ++ /* ++ * Exception stacks used for IST entries. ++ * ++ * In the future, this should have a separate slot for each stack ++ * with guard pages between them. 
++ */ ++ char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]; ++#endif ++}; ++ ++#define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) ++ ++extern void setup_cpu_entry_areas(void); + + /* + * Here we define all the compile-time 'special' virtual +@@ -101,8 +140,14 @@ enum fixed_addresses { + FIX_LNW_VRTC, + #endif + /* Fixmap entries to remap the GDTs, one per processor. */ +- FIX_GDT_REMAP_BEGIN, +- FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, ++ FIX_CPU_ENTRY_AREA_TOP, ++ FIX_CPU_ENTRY_AREA_BOTTOM = FIX_CPU_ENTRY_AREA_TOP + (CPU_ENTRY_AREA_PAGES * NR_CPUS) - 1, ++ ++#ifdef CONFIG_ACPI_APEI_GHES ++ /* Used for GHES mapping from assorted contexts */ ++ FIX_APEI_GHES_IRQ, ++ FIX_APEI_GHES_NMI, ++#endif + + __end_of_permanent_fixed_addresses, + +@@ -185,5 +230,30 @@ void __init *early_memremap_decrypted_wp(resource_size_t phys_addr, + void __early_set_fixmap(enum fixed_addresses idx, + phys_addr_t phys, pgprot_t flags); + ++static inline unsigned int __get_cpu_entry_area_page_index(int cpu, int page) ++{ ++ BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0); ++ ++ return FIX_CPU_ENTRY_AREA_BOTTOM - cpu*CPU_ENTRY_AREA_PAGES - page; ++} ++ ++#define __get_cpu_entry_area_offset_index(cpu, offset) ({ \ ++ BUILD_BUG_ON(offset % PAGE_SIZE != 0); \ ++ __get_cpu_entry_area_page_index(cpu, offset / PAGE_SIZE); \ ++ }) ++ ++#define get_cpu_entry_area_index(cpu, field) \ ++ __get_cpu_entry_area_offset_index((cpu), offsetof(struct cpu_entry_area, field)) ++ ++static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) ++{ ++ return (struct cpu_entry_area *)__fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0)); ++} ++ ++static inline struct SYSENTER_stack *cpu_SYSENTER_stack(int cpu) ++{ ++ return &get_cpu_entry_area(cpu)->SYSENTER_stack_page.stack; ++} ++ + #endif /* !__ASSEMBLY__ */ + #endif /* _ASM_X86_FIXMAP_H */ +diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h +index 0ead9dbb9130..96aa6b9884dc 100644 +--- a/arch/x86/include/asm/hypervisor.h ++++ b/arch/x86/include/asm/hypervisor.h +@@ -20,14 +20,22 @@ + #ifndef _ASM_X86_HYPERVISOR_H + #define _ASM_X86_HYPERVISOR_H + ++/* x86 hypervisor types */ ++enum x86_hypervisor_type { ++ X86_HYPER_NATIVE = 0, ++ X86_HYPER_VMWARE, ++ X86_HYPER_MS_HYPERV, ++ X86_HYPER_XEN_PV, ++ X86_HYPER_XEN_HVM, ++ X86_HYPER_KVM, ++}; ++ + #ifdef CONFIG_HYPERVISOR_GUEST + + #include ++#include + #include + +-/* +- * x86 hypervisor information +- */ + struct hypervisor_x86 { + /* Hypervisor name */ + const char *name; +@@ -35,40 +43,27 @@ struct hypervisor_x86 { + /* Detection routine */ + uint32_t (*detect)(void); + +- /* Platform setup (run once per boot) */ +- void (*init_platform)(void); +- +- /* X2APIC detection (run once per boot) */ +- bool (*x2apic_available)(void); ++ /* Hypervisor type */ ++ enum x86_hypervisor_type type; + +- /* pin current vcpu to specified physical cpu (run rarely) */ +- void (*pin_vcpu)(int); ++ /* init time callbacks */ ++ struct x86_hyper_init init; + +- /* called during init_mem_mapping() to setup early mappings. 
*/ +- void (*init_mem_mapping)(void); ++ /* runtime callbacks */ ++ struct x86_hyper_runtime runtime; + }; + +-extern const struct hypervisor_x86 *x86_hyper; +- +-/* Recognized hypervisors */ +-extern const struct hypervisor_x86 x86_hyper_vmware; +-extern const struct hypervisor_x86 x86_hyper_ms_hyperv; +-extern const struct hypervisor_x86 x86_hyper_xen_pv; +-extern const struct hypervisor_x86 x86_hyper_xen_hvm; +-extern const struct hypervisor_x86 x86_hyper_kvm; +- ++extern enum x86_hypervisor_type x86_hyper_type; + extern void init_hypervisor_platform(void); +-extern bool hypervisor_x2apic_available(void); +-extern void hypervisor_pin_vcpu(int cpu); +- +-static inline void hypervisor_init_mem_mapping(void) ++static inline bool hypervisor_is_type(enum x86_hypervisor_type type) + { +- if (x86_hyper && x86_hyper->init_mem_mapping) +- x86_hyper->init_mem_mapping(); ++ return x86_hyper_type == type; + } + #else + static inline void init_hypervisor_platform(void) { } +-static inline bool hypervisor_x2apic_available(void) { return false; } +-static inline void hypervisor_init_mem_mapping(void) { } ++static inline bool hypervisor_is_type(enum x86_hypervisor_type type) ++{ ++ return type == X86_HYPER_NATIVE; ++} + #endif /* CONFIG_HYPERVISOR_GUEST */ + #endif /* _ASM_X86_HYPERVISOR_H */ +diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h +index c8ef23f2c28f..89f08955fff7 100644 +--- a/arch/x86/include/asm/irqflags.h ++++ b/arch/x86/include/asm/irqflags.h +@@ -142,6 +142,9 @@ static inline notrace unsigned long arch_local_irq_save(void) + swapgs; \ + sysretl + ++#ifdef CONFIG_DEBUG_ENTRY ++#define SAVE_FLAGS(x) pushfq; popq %rax ++#endif + #else + #define INTERRUPT_RETURN iret + #define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit +diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h +index f86a8caa561e..395c9631e000 100644 +--- a/arch/x86/include/asm/kdebug.h ++++ b/arch/x86/include/asm/kdebug.h +@@ -26,6 +26,7 @@ extern void die(const char *, struct pt_regs *,long); + extern int __must_check __die(const char *, struct pt_regs *, long); + extern void show_stack_regs(struct pt_regs *regs); + extern void __show_regs(struct pt_regs *regs, int all); ++extern void show_iret_regs(struct pt_regs *regs); + extern unsigned long oops_begin(void); + extern void oops_end(unsigned long, struct pt_regs *, int signr); + +diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h +index 6699fc441644..6d16d15d09a0 100644 +--- a/arch/x86/include/asm/mmu_context.h ++++ b/arch/x86/include/asm/mmu_context.h +@@ -73,8 +73,8 @@ static inline void load_mm_ldt(struct mm_struct *mm) + #ifdef CONFIG_MODIFY_LDT_SYSCALL + struct ldt_struct *ldt; + +- /* lockless_dereference synchronizes with smp_store_release */ +- ldt = lockless_dereference(mm->context.ldt); ++ /* READ_ONCE synchronizes with smp_store_release */ ++ ldt = READ_ONCE(mm->context.ldt); + + /* + * Any change to mm->context.ldt is followed by an IPI to all +diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h +index 8546fafa21a9..7948a17febb4 100644 +--- a/arch/x86/include/asm/module.h ++++ b/arch/x86/include/asm/module.h +@@ -6,7 +6,7 @@ + #include + + struct mod_arch_specific { +-#ifdef CONFIG_ORC_UNWINDER ++#ifdef CONFIG_UNWINDER_ORC + unsigned int num_orcs; + int *orc_unwind_ip; + struct orc_entry *orc_unwind; +diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h +index fd81228e8037..892df375b615 100644 +--- a/arch/x86/include/asm/paravirt.h 
++++ b/arch/x86/include/asm/paravirt.h +@@ -16,10 +16,9 @@ + #include + #include + +-static inline void load_sp0(struct tss_struct *tss, +- struct thread_struct *thread) ++static inline void load_sp0(unsigned long sp0) + { +- PVOP_VCALL2(pv_cpu_ops.load_sp0, tss, thread); ++ PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0); + } + + /* The paravirtualized CPUID instruction. */ +@@ -928,6 +927,15 @@ extern void default_banner(void); + PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64), \ + CLBR_NONE, \ + jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_usergs_sysret64)) ++ ++#ifdef CONFIG_DEBUG_ENTRY ++#define SAVE_FLAGS(clobbers) \ ++ PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_save_fl), clobbers, \ ++ PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \ ++ call PARA_INDIRECT(pv_irq_ops+PV_IRQ_save_fl); \ ++ PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) ++#endif ++ + #endif /* CONFIG_X86_32 */ + + #endif /* __ASSEMBLY__ */ +diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h +index 10cc3b9709fe..6ec54d01972d 100644 +--- a/arch/x86/include/asm/paravirt_types.h ++++ b/arch/x86/include/asm/paravirt_types.h +@@ -134,7 +134,7 @@ struct pv_cpu_ops { + void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries); + void (*free_ldt)(struct desc_struct *ldt, unsigned entries); + +- void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t); ++ void (*load_sp0)(unsigned long sp0); + + void (*set_iopl_mask)(unsigned mask); + +diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h +index 377f1ffd18be..ba3c523aaf16 100644 +--- a/arch/x86/include/asm/percpu.h ++++ b/arch/x86/include/asm/percpu.h +@@ -526,7 +526,7 @@ static inline bool x86_this_cpu_variable_test_bit(int nr, + { + bool oldbit; + +- asm volatile("bt "__percpu_arg(2)",%1\n\t" ++ asm volatile("bt "__percpu_arg(2)",%1" + CC_SET(c) + : CC_OUT(c) (oldbit) + : "m" (*(unsigned long __percpu *)addr), "Ir" (nr)); +diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h +index 59df7b47a434..9e9b05fc4860 100644 +--- a/arch/x86/include/asm/pgtable_types.h ++++ b/arch/x86/include/asm/pgtable_types.h +@@ -200,10 +200,9 @@ enum page_cache_mode { + + #define _PAGE_ENC (_AT(pteval_t, sme_me_mask)) + +-#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ +- _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC) + #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ + _PAGE_DIRTY | _PAGE_ENC) ++#define _PAGE_TABLE (_KERNPG_TABLE | _PAGE_USER) + + #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC) + #define __PAGE_KERNEL_ENC_WP (__PAGE_KERNEL_WP | _PAGE_ENC) +diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h +index bdac19ab2488..da943411d3d8 100644 +--- a/arch/x86/include/asm/processor.h ++++ b/arch/x86/include/asm/processor.h +@@ -162,9 +162,9 @@ enum cpuid_regs_idx { + extern struct cpuinfo_x86 boot_cpu_data; + extern struct cpuinfo_x86 new_cpu_data; + +-extern struct tss_struct doublefault_tss; +-extern __u32 cpu_caps_cleared[NCAPINTS]; +-extern __u32 cpu_caps_set[NCAPINTS]; ++extern struct x86_hw_tss doublefault_tss; ++extern __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; ++extern __u32 cpu_caps_set[NCAPINTS + NBUGINTS]; + + #ifdef CONFIG_SMP + DECLARE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); +@@ -252,6 +252,11 @@ static inline void load_cr3(pgd_t *pgdir) + write_cr3(__sme_pa(pgdir)); + } + ++/* ++ * Note that while the legacy 'TSS' name comes from 'Task State Segment', ++ * on modern x86 CPUs the TSS also holds information important 
to 64-bit mode, ++ * unrelated to the task-switch mechanism: ++ */ + #ifdef CONFIG_X86_32 + /* This is the TSS defined by the hardware. */ + struct x86_hw_tss { +@@ -304,7 +309,13 @@ struct x86_hw_tss { + struct x86_hw_tss { + u32 reserved1; + u64 sp0; ++ ++ /* ++ * We store cpu_current_top_of_stack in sp1 so it's always accessible. ++ * Linux does not use ring 1, so sp1 is not otherwise needed. ++ */ + u64 sp1; ++ + u64 sp2; + u64 reserved2; + u64 ist[7]; +@@ -322,12 +333,22 @@ struct x86_hw_tss { + #define IO_BITMAP_BITS 65536 + #define IO_BITMAP_BYTES (IO_BITMAP_BITS/8) + #define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long)) +-#define IO_BITMAP_OFFSET offsetof(struct tss_struct, io_bitmap) ++#define IO_BITMAP_OFFSET (offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss)) + #define INVALID_IO_BITMAP_OFFSET 0x8000 + ++struct SYSENTER_stack { ++ unsigned long words[64]; ++}; ++ ++struct SYSENTER_stack_page { ++ struct SYSENTER_stack stack; ++} __aligned(PAGE_SIZE); ++ + struct tss_struct { + /* +- * The hardware state: ++ * The fixed hardware portion. This must not cross a page boundary ++ * at risk of violating the SDM's advice and potentially triggering ++ * errata. + */ + struct x86_hw_tss x86_tss; + +@@ -338,18 +359,9 @@ struct tss_struct { + * be within the limit. + */ + unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; ++} __aligned(PAGE_SIZE); + +-#ifdef CONFIG_X86_32 +- /* +- * Space for the temporary SYSENTER stack. +- */ +- unsigned long SYSENTER_stack_canary; +- unsigned long SYSENTER_stack[64]; +-#endif +- +-} ____cacheline_aligned; +- +-DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss); ++DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); + + /* + * sizeof(unsigned long) coming from an extra "long" at the end +@@ -363,6 +375,9 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss); + + #ifdef CONFIG_X86_32 + DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack); ++#else ++/* The RO copy can't be accessed with this_cpu_xyz(), so use the RW copy. */ ++#define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1 + #endif + + /* +@@ -431,7 +446,9 @@ typedef struct { + struct thread_struct { + /* Cached TLS descriptors: */ + struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; ++#ifdef CONFIG_X86_32 + unsigned long sp0; ++#endif + unsigned long sp; + #ifdef CONFIG_X86_32 + unsigned long sysenter_cs; +@@ -518,16 +535,9 @@ static inline void native_set_iopl_mask(unsigned mask) + } + + static inline void +-native_load_sp0(struct tss_struct *tss, struct thread_struct *thread) ++native_load_sp0(unsigned long sp0) + { +- tss->x86_tss.sp0 = thread->sp0; +-#ifdef CONFIG_X86_32 +- /* Only happens when SEP is enabled, no need to test "SEP"arately: */ +- if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) { +- tss->x86_tss.ss1 = thread->sysenter_cs; +- wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); +- } +-#endif ++ this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0); + } + + static inline void native_swapgs(void) +@@ -539,12 +549,18 @@ static inline void native_swapgs(void) + + static inline unsigned long current_top_of_stack(void) + { +-#ifdef CONFIG_X86_64 +- return this_cpu_read_stable(cpu_tss.x86_tss.sp0); +-#else +- /* sp0 on x86_32 is special in and around vm86 mode. */ ++ /* ++ * We can't read directly from tss.sp0: sp0 on x86_32 is special in ++ * and around vm86 mode and sp0 on x86_64 is special because of the ++ * entry trampoline. 
++ */ + return this_cpu_read_stable(cpu_current_top_of_stack); +-#endif ++} ++ ++static inline bool on_thread_stack(void) ++{ ++ return (unsigned long)(current_top_of_stack() - ++ current_stack_pointer) < THREAD_SIZE; + } + + #ifdef CONFIG_PARAVIRT +@@ -552,10 +568,9 @@ static inline unsigned long current_top_of_stack(void) + #else + #define __cpuid native_cpuid + +-static inline void load_sp0(struct tss_struct *tss, +- struct thread_struct *thread) ++static inline void load_sp0(unsigned long sp0) + { +- native_load_sp0(tss, thread); ++ native_load_sp0(sp0); + } + + #define set_iopl_mask native_set_iopl_mask +@@ -804,6 +819,15 @@ static inline void spin_lock_prefetch(const void *x) + #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \ + TOP_OF_KERNEL_STACK_PADDING) + ++#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1)) ++ ++#define task_pt_regs(task) \ ++({ \ ++ unsigned long __ptr = (unsigned long)task_stack_page(task); \ ++ __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \ ++ ((struct pt_regs *)__ptr) - 1; \ ++}) ++ + #ifdef CONFIG_X86_32 + /* + * User space process size: 3GB (default). +@@ -823,23 +847,6 @@ static inline void spin_lock_prefetch(const void *x) + .addr_limit = KERNEL_DS, \ + } + +-/* +- * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack. +- * This is necessary to guarantee that the entire "struct pt_regs" +- * is accessible even if the CPU haven't stored the SS/ESP registers +- * on the stack (interrupt gate does not save these registers +- * when switching to the same priv ring). +- * Therefore beware: accessing the ss/esp fields of the +- * "struct pt_regs" is possible, but they may contain the +- * completely wrong values. +- */ +-#define task_pt_regs(task) \ +-({ \ +- unsigned long __ptr = (unsigned long)task_stack_page(task); \ +- __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \ +- ((struct pt_regs *)__ptr) - 1; \ +-}) +- + #define KSTK_ESP(task) (task_pt_regs(task)->sp) + + #else +@@ -873,11 +880,9 @@ static inline void spin_lock_prefetch(const void *x) + #define STACK_TOP_MAX TASK_SIZE_MAX + + #define INIT_THREAD { \ +- .sp0 = TOP_OF_INIT_STACK, \ + .addr_limit = KERNEL_DS, \ + } + +-#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1) + extern unsigned long KSTK_ESP(struct task_struct *task); + + #endif /* CONFIG_X86_64 */ +diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h +index c0e3c45cf6ab..14131dd06b29 100644 +--- a/arch/x86/include/asm/ptrace.h ++++ b/arch/x86/include/asm/ptrace.h +@@ -136,9 +136,9 @@ static inline int v8086_mode(struct pt_regs *regs) + #endif + } + +-#ifdef CONFIG_X86_64 + static inline bool user_64bit_mode(struct pt_regs *regs) + { ++#ifdef CONFIG_X86_64 + #ifndef CONFIG_PARAVIRT + /* + * On non-paravirt systems, this is the only long mode CPL 3 +@@ -149,8 +149,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs) + /* Headers are too twisted for this to go in paravirt.h. 
*/ + return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; + #endif ++#else /* !CONFIG_X86_64 */ ++ return false; ++#endif + } + ++#ifdef CONFIG_X86_64 + #define current_user_stack_pointer() current_pt_regs()->sp + #define compat_user_stack_pointer() current_pt_regs()->sp + #endif +diff --git a/arch/x86/include/asm/rmwcc.h b/arch/x86/include/asm/rmwcc.h +index d8f3a6ae9f6c..f91c365e57c3 100644 +--- a/arch/x86/include/asm/rmwcc.h ++++ b/arch/x86/include/asm/rmwcc.h +@@ -29,7 +29,7 @@ cc_label: \ + #define __GEN_RMWcc(fullop, var, cc, clobbers, ...) \ + do { \ + bool c; \ +- asm volatile (fullop ";" CC_SET(cc) \ ++ asm volatile (fullop CC_SET(cc) \ + : [counter] "+m" (var), CC_OUT(cc) (c) \ + : __VA_ARGS__ : clobbers); \ + return c; \ +diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h +index 8da111b3c342..f8062bfd43a0 100644 +--- a/arch/x86/include/asm/stacktrace.h ++++ b/arch/x86/include/asm/stacktrace.h +@@ -16,6 +16,7 @@ enum stack_type { + STACK_TYPE_TASK, + STACK_TYPE_IRQ, + STACK_TYPE_SOFTIRQ, ++ STACK_TYPE_SYSENTER, + STACK_TYPE_EXCEPTION, + STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1, + }; +@@ -28,6 +29,8 @@ struct stack_info { + bool in_task_stack(unsigned long *stack, struct task_struct *task, + struct stack_info *info); + ++bool in_sysenter_stack(unsigned long *stack, struct stack_info *info); ++ + int get_stack_info(unsigned long *stack, struct task_struct *task, + struct stack_info *info, unsigned long *visit_mask); + +diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h +index 899084b70412..9b6df68d8fd1 100644 +--- a/arch/x86/include/asm/switch_to.h ++++ b/arch/x86/include/asm/switch_to.h +@@ -2,6 +2,8 @@ + #ifndef _ASM_X86_SWITCH_TO_H + #define _ASM_X86_SWITCH_TO_H + ++#include ++ + struct task_struct; /* one of the stranger aspects of C forward declarations */ + + struct task_struct *__switch_to_asm(struct task_struct *prev, +@@ -73,4 +75,28 @@ do { \ + ((last) = __switch_to_asm((prev), (next))); \ + } while (0) + ++#ifdef CONFIG_X86_32 ++static inline void refresh_sysenter_cs(struct thread_struct *thread) ++{ ++ /* Only happens when SEP is enabled, no need to test "SEP"arately: */ ++ if (unlikely(this_cpu_read(cpu_tss_rw.x86_tss.ss1) == thread->sysenter_cs)) ++ return; ++ ++ this_cpu_write(cpu_tss_rw.x86_tss.ss1, thread->sysenter_cs); ++ wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); ++} ++#endif ++ ++/* This is used when switching tasks or entering/exiting vm86 mode. 
*/ ++static inline void update_sp0(struct task_struct *task) ++{ ++ /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ ++#ifdef CONFIG_X86_32 ++ load_sp0(task->thread.sp0); ++#else ++ if (static_cpu_has(X86_FEATURE_XENPV)) ++ load_sp0(task_top_of_stack(task)); ++#endif ++} ++ + #endif /* _ASM_X86_SWITCH_TO_H */ +diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h +index 70f425947dc5..00223333821a 100644 +--- a/arch/x86/include/asm/thread_info.h ++++ b/arch/x86/include/asm/thread_info.h +@@ -207,7 +207,7 @@ static inline int arch_within_stack_frames(const void * const stack, + #else /* !__ASSEMBLY__ */ + + #ifdef CONFIG_X86_64 +-# define cpu_current_top_of_stack (cpu_tss + TSS_sp0) ++# define cpu_current_top_of_stack (cpu_tss_rw + TSS_sp1) + #endif + + #endif +diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h +index fa60398bbc3a..069c04be1507 100644 +--- a/arch/x86/include/asm/trace/fpu.h ++++ b/arch/x86/include/asm/trace/fpu.h +@@ -34,11 +34,6 @@ DECLARE_EVENT_CLASS(x86_fpu, + ) + ); + +-DEFINE_EVENT(x86_fpu, x86_fpu_state, +- TP_PROTO(struct fpu *fpu), +- TP_ARGS(fpu) +-); +- + DEFINE_EVENT(x86_fpu, x86_fpu_before_save, + TP_PROTO(struct fpu *fpu), + TP_ARGS(fpu) +@@ -74,11 +69,6 @@ DEFINE_EVENT(x86_fpu, x86_fpu_activate_state, + TP_ARGS(fpu) + ); + +-DEFINE_EVENT(x86_fpu, x86_fpu_deactivate_state, +- TP_PROTO(struct fpu *fpu), +- TP_ARGS(fpu) +-); +- + DEFINE_EVENT(x86_fpu, x86_fpu_init_state, + TP_PROTO(struct fpu *fpu), + TP_ARGS(fpu) +diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h +index b0cced97a6ce..31051f35cbb7 100644 +--- a/arch/x86/include/asm/traps.h ++++ b/arch/x86/include/asm/traps.h +@@ -38,9 +38,9 @@ asmlinkage void simd_coprocessor_error(void); + + #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV) + asmlinkage void xen_divide_error(void); ++asmlinkage void xen_xennmi(void); + asmlinkage void xen_xendebug(void); + asmlinkage void xen_xenint3(void); +-asmlinkage void xen_nmi(void); + asmlinkage void xen_overflow(void); + asmlinkage void xen_bounds(void); + asmlinkage void xen_invalid_op(void); +@@ -75,7 +75,6 @@ dotraplinkage void do_segment_not_present(struct pt_regs *, long); + dotraplinkage void do_stack_segment(struct pt_regs *, long); + #ifdef CONFIG_X86_64 + dotraplinkage void do_double_fault(struct pt_regs *, long); +-asmlinkage struct pt_regs *sync_regs(struct pt_regs *); + #endif + dotraplinkage void do_general_protection(struct pt_regs *, long); + dotraplinkage void do_page_fault(struct pt_regs *, unsigned long); +@@ -145,4 +144,22 @@ enum { + X86_TRAP_IRET = 32, /* 32, IRET Exception */ + }; + ++/* ++ * Page fault error code bits: ++ * ++ * bit 0 == 0: no page found 1: protection fault ++ * bit 1 == 0: read access 1: write access ++ * bit 2 == 0: kernel-mode access 1: user-mode access ++ * bit 3 == 1: use of reserved bit detected ++ * bit 4 == 1: fault was an instruction fetch ++ * bit 5 == 1: protection keys block access ++ */ ++enum x86_pf_error_code { ++ X86_PF_PROT = 1 << 0, ++ X86_PF_WRITE = 1 << 1, ++ X86_PF_USER = 1 << 2, ++ X86_PF_RSVD = 1 << 3, ++ X86_PF_INSTR = 1 << 4, ++ X86_PF_PK = 1 << 5, ++}; + #endif /* _ASM_X86_TRAPS_H */ +diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h +index 87adc0d38c4a..c1688c2d0a12 100644 +--- a/arch/x86/include/asm/unwind.h ++++ b/arch/x86/include/asm/unwind.h +@@ -7,17 +7,20 @@ + #include + #include + ++#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip)) ++#define 
IRET_FRAME_SIZE (sizeof(struct pt_regs) - IRET_FRAME_OFFSET) ++ + struct unwind_state { + struct stack_info stack_info; + unsigned long stack_mask; + struct task_struct *task; + int graph_idx; + bool error; +-#if defined(CONFIG_ORC_UNWINDER) ++#if defined(CONFIG_UNWINDER_ORC) + bool signal, full_regs; + unsigned long sp, bp, ip; + struct pt_regs *regs; +-#elif defined(CONFIG_FRAME_POINTER_UNWINDER) ++#elif defined(CONFIG_UNWINDER_FRAME_POINTER) + bool got_irq; + unsigned long *bp, *orig_sp, ip; + struct pt_regs *regs; +@@ -51,7 +54,11 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, + __unwind_start(state, task, regs, first_frame); + } + +-#if defined(CONFIG_ORC_UNWINDER) || defined(CONFIG_FRAME_POINTER_UNWINDER) ++#if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER) ++/* ++ * WARNING: The entire pt_regs may not be safe to dereference. In some cases, ++ * only the iret frame registers are accessible. Use with caution! ++ */ + static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) + { + if (unwind_done(state)) +@@ -66,7 +73,7 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) + } + #endif + +-#ifdef CONFIG_ORC_UNWINDER ++#ifdef CONFIG_UNWINDER_ORC + void unwind_init(void); + void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size, + void *orc, size_t orc_size); +diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h +index 8a1ebf9540dd..ad15a0fda917 100644 +--- a/arch/x86/include/asm/x86_init.h ++++ b/arch/x86/include/asm/x86_init.h +@@ -114,6 +114,18 @@ struct x86_init_pci { + void (*fixup_irqs)(void); + }; + ++/** ++ * struct x86_hyper_init - x86 hypervisor init functions ++ * @init_platform: platform setup ++ * @x2apic_available: X2APIC detection ++ * @init_mem_mapping: setup early mappings during init_mem_mapping() ++ */ ++struct x86_hyper_init { ++ void (*init_platform)(void); ++ bool (*x2apic_available)(void); ++ void (*init_mem_mapping)(void); ++}; ++ + /** + * struct x86_init_ops - functions for platform specific setup + * +@@ -127,6 +139,7 @@ struct x86_init_ops { + struct x86_init_timers timers; + struct x86_init_iommu iommu; + struct x86_init_pci pci; ++ struct x86_hyper_init hyper; + }; + + /** +@@ -199,6 +212,15 @@ struct x86_legacy_features { + struct x86_legacy_devices devices; + }; + ++/** ++ * struct x86_hyper_runtime - x86 hypervisor specific runtime callbacks ++ * ++ * @pin_vcpu: pin current vcpu to specified physical cpu (run rarely) ++ */ ++struct x86_hyper_runtime { ++ void (*pin_vcpu)(int cpu); ++}; ++ + /** + * struct x86_platform_ops - platform specific runtime functions + * @calibrate_cpu: calibrate CPU +@@ -218,6 +240,7 @@ struct x86_legacy_features { + * possible in x86_early_init_platform_quirks() by + * only using the current x86_hardware_subarch + * semantics. 
++ * @hyper: x86 hypervisor specific runtime callbacks + */ + struct x86_platform_ops { + unsigned long (*calibrate_cpu)(void); +@@ -233,6 +256,7 @@ struct x86_platform_ops { + void (*apic_post_init)(void); + struct x86_legacy_features legacy; + void (*set_legacy_features)(void); ++ struct x86_hyper_runtime hyper; + }; + + struct pci_dev; +diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h +index 6f3355399665..53b4ca55ebb6 100644 +--- a/arch/x86/include/uapi/asm/processor-flags.h ++++ b/arch/x86/include/uapi/asm/processor-flags.h +@@ -152,5 +152,8 @@ + #define CX86_ARR_BASE 0xc4 + #define CX86_RCR_BASE 0xdc + ++#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ ++ X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ ++ X86_CR0_PG) + + #endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */ +diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile +index 5f70044340ff..295abaa58add 100644 +--- a/arch/x86/kernel/Makefile ++++ b/arch/x86/kernel/Makefile +@@ -25,9 +25,9 @@ endif + KASAN_SANITIZE_head$(BITS).o := n + KASAN_SANITIZE_dumpstack.o := n + KASAN_SANITIZE_dumpstack_$(BITS).o := n +-KASAN_SANITIZE_stacktrace.o := n ++KASAN_SANITIZE_stacktrace.o := n ++KASAN_SANITIZE_paravirt.o := n + +-OBJECT_FILES_NON_STANDARD_head_$(BITS).o := y + OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y + OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y + OBJECT_FILES_NON_STANDARD_test_nx.o := y +@@ -128,9 +128,9 @@ obj-$(CONFIG_PERF_EVENTS) += perf_regs.o + obj-$(CONFIG_TRACING) += tracepoint.o + obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o + +-obj-$(CONFIG_ORC_UNWINDER) += unwind_orc.o +-obj-$(CONFIG_FRAME_POINTER_UNWINDER) += unwind_frame.o +-obj-$(CONFIG_GUESS_UNWINDER) += unwind_guess.o ++obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o ++obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o ++obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o + + ### + # 64 bit specific files +diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c +index ff891772c9f8..89c7c8569e5e 100644 +--- a/arch/x86/kernel/apic/apic.c ++++ b/arch/x86/kernel/apic/apic.c +@@ -1645,7 +1645,7 @@ static __init void try_to_enable_x2apic(int remap_mode) + * under KVM + */ + if (max_physical_apicid > 255 || +- !hypervisor_x2apic_available()) { ++ !x86_init.hyper.x2apic_available()) { + pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n"); + x2apic_disable(); + return; +diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c +index 0d57bb9079c9..c0b694810ff4 100644 +--- a/arch/x86/kernel/apic/x2apic_uv_x.c ++++ b/arch/x86/kernel/apic/x2apic_uv_x.c +@@ -920,9 +920,8 @@ static __init void uv_rtc_init(void) + /* + * percpu heartbeat timer + */ +-static void uv_heartbeat(unsigned long ignored) ++static void uv_heartbeat(struct timer_list *timer) + { +- struct timer_list *timer = &uv_scir_info->timer; + unsigned char bits = uv_scir_info->state; + + /* Flip heartbeat bit: */ +@@ -947,7 +946,7 @@ static int uv_heartbeat_enable(unsigned int cpu) + struct timer_list *timer = &uv_cpu_scir_info(cpu)->timer; + + uv_set_cpu_scir_bits(cpu, SCIR_CPU_HEARTBEAT|SCIR_CPU_ACTIVITY); +- setup_pinned_timer(timer, uv_heartbeat, cpu); ++ timer_setup(timer, uv_heartbeat, TIMER_PINNED); + timer->expires = jiffies + SCIR_CPU_HB_INTERVAL; + add_timer_on(timer, cpu); + uv_cpu_scir_info(cpu)->enabled = 1; +diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c +index 8ea78275480d..cd360a5e0dca 100644 +--- a/arch/x86/kernel/asm-offsets.c ++++ 
b/arch/x86/kernel/asm-offsets.c +@@ -93,4 +93,10 @@ void common(void) { + + BLANK(); + DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); ++ ++ /* Layout info for cpu_entry_area */ ++ OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss); ++ OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline); ++ OFFSET(CPU_ENTRY_AREA_SYSENTER_stack, cpu_entry_area, SYSENTER_stack_page); ++ DEFINE(SIZEOF_SYSENTER_stack, sizeof(struct SYSENTER_stack)); + } +diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c +index dedf428b20b6..7d20d9c0b3d6 100644 +--- a/arch/x86/kernel/asm-offsets_32.c ++++ b/arch/x86/kernel/asm-offsets_32.c +@@ -47,13 +47,8 @@ void foo(void) + BLANK(); + + /* Offset from the sysenter stack to tss.sp0 */ +- DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) - +- offsetofend(struct tss_struct, SYSENTER_stack)); +- +- /* Offset from cpu_tss to SYSENTER_stack */ +- OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack); +- /* Size of SYSENTER_stack */ +- DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack)); ++ DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - ++ offsetofend(struct cpu_entry_area, SYSENTER_stack_page.stack)); + + #ifdef CONFIG_CC_STACKPROTECTOR + BLANK(); +diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c +index 630212fa9b9d..bf51e51d808d 100644 +--- a/arch/x86/kernel/asm-offsets_64.c ++++ b/arch/x86/kernel/asm-offsets_64.c +@@ -23,6 +23,9 @@ int main(void) + #ifdef CONFIG_PARAVIRT + OFFSET(PV_CPU_usergs_sysret64, pv_cpu_ops, usergs_sysret64); + OFFSET(PV_CPU_swapgs, pv_cpu_ops, swapgs); ++#ifdef CONFIG_DEBUG_ENTRY ++ OFFSET(PV_IRQ_save_fl, pv_irq_ops, save_fl); ++#endif + BLANK(); + #endif + +@@ -63,6 +66,7 @@ int main(void) + + OFFSET(TSS_ist, tss_struct, x86_tss.ist); + OFFSET(TSS_sp0, tss_struct, x86_tss.sp0); ++ OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); + BLANK(); + + #ifdef CONFIG_CC_STACKPROTECTOR +diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile +index c60922a66385..90cb82dbba57 100644 +--- a/arch/x86/kernel/cpu/Makefile ++++ b/arch/x86/kernel/cpu/Makefile +@@ -23,6 +23,7 @@ obj-y += rdrand.o + obj-y += match.o + obj-y += bugs.o + obj-$(CONFIG_CPU_FREQ) += aperfmperf.o ++obj-y += cpuid-deps.o + + obj-$(CONFIG_PROC_FS) += proc.o + obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o +diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c +index d58184b7cd44..bcb75dc97d44 100644 +--- a/arch/x86/kernel/cpu/amd.c ++++ b/arch/x86/kernel/cpu/amd.c +@@ -804,8 +804,11 @@ static void init_amd(struct cpuinfo_x86 *c) + case 0x17: init_amd_zn(c); break; + } + +- /* Enable workaround for FXSAVE leak */ +- if (c->x86 >= 6) ++ /* ++ * Enable workaround for FXSAVE leak on CPUs ++ * without a XSaveErPtr feature ++ */ ++ if ((c->x86 >= 6) && (!cpu_has(c, X86_FEATURE_XSAVEERPTR))) + set_cpu_bug(c, X86_BUG_FXSAVE_LEAK); + + cpu_detect_cache_sizes(c); +diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c +index c9176bae7fd8..034900623adf 100644 +--- a/arch/x86/kernel/cpu/common.c ++++ b/arch/x86/kernel/cpu/common.c +@@ -452,8 +452,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c) + return NULL; /* Not found */ + } + +-__u32 cpu_caps_cleared[NCAPINTS]; +-__u32 cpu_caps_set[NCAPINTS]; ++__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; ++__u32 cpu_caps_set[NCAPINTS + NBUGINTS]; + + void load_percpu_segment(int cpu) + { +@@ -466,27 +466,116 @@ void load_percpu_segment(int cpu) + 
load_stack_canary_segment(); + } + +-/* Setup the fixmap mapping only once per-processor */ +-static inline void setup_fixmap_gdt(int cpu) ++#ifdef CONFIG_X86_32 ++/* The 32-bit entry code needs to find cpu_entry_area. */ ++DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); ++#endif ++ ++#ifdef CONFIG_X86_64 ++/* ++ * Special IST stacks which the CPU switches to when it calls ++ * an IST-marked descriptor entry. Up to 7 stacks (hardware ++ * limit), all of them are 4K, except the debug stack which ++ * is 8K. ++ */ ++static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = { ++ [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ, ++ [DEBUG_STACK - 1] = DEBUG_STKSZ ++}; ++ ++static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks ++ [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); ++#endif ++ ++static DEFINE_PER_CPU_PAGE_ALIGNED(struct SYSENTER_stack_page, ++ SYSENTER_stack_storage); ++ ++static void __init ++set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot) ++{ ++ for ( ; pages; pages--, idx--, ptr += PAGE_SIZE) ++ __set_fixmap(idx, per_cpu_ptr_to_phys(ptr), prot); ++} ++ ++/* Setup the fixmap mappings only once per-processor */ ++static void __init setup_cpu_entry_area(int cpu) + { + #ifdef CONFIG_X86_64 +- /* On 64-bit systems, we use a read-only fixmap GDT. */ +- pgprot_t prot = PAGE_KERNEL_RO; ++ extern char _entry_trampoline[]; ++ ++ /* On 64-bit systems, we use a read-only fixmap GDT and TSS. */ ++ pgprot_t gdt_prot = PAGE_KERNEL_RO; ++ pgprot_t tss_prot = PAGE_KERNEL_RO; + #else + /* + * On native 32-bit systems, the GDT cannot be read-only because + * our double fault handler uses a task gate, and entering through +- * a task gate needs to change an available TSS to busy. If the GDT +- * is read-only, that will triple fault. ++ * a task gate needs to change an available TSS to busy. If the ++ * GDT is read-only, that will triple fault. The TSS cannot be ++ * read-only because the CPU writes to it on task switches. + * +- * On Xen PV, the GDT must be read-only because the hypervisor requires +- * it. ++ * On Xen PV, the GDT must be read-only because the hypervisor ++ * requires it. + */ +- pgprot_t prot = boot_cpu_has(X86_FEATURE_XENPV) ? ++ pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ? + PAGE_KERNEL_RO : PAGE_KERNEL; ++ pgprot_t tss_prot = PAGE_KERNEL; ++#endif ++ ++ __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); ++ set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, SYSENTER_stack_page), ++ per_cpu_ptr(&SYSENTER_stack_storage, cpu), 1, ++ PAGE_KERNEL); ++ ++ /* ++ * The Intel SDM says (Volume 3, 7.2.1): ++ * ++ * Avoid placing a page boundary in the part of the TSS that the ++ * processor reads during a task switch (the first 104 bytes). The ++ * processor may not correctly perform address translations if a ++ * boundary occurs in this area. During a task switch, the processor ++ * reads and writes into the first 104 bytes of each TSS (using ++ * contiguous physical addresses beginning with the physical address ++ * of the first byte of the TSS). So, after TSS access begins, if ++ * part of the 104 bytes is not physically contiguous, the processor ++ * will access incorrect information without generating a page-fault ++ * exception. ++ * ++ * There are also a lot of errata involving the TSS spanning a page ++ * boundary. Assert that we're not doing that. 
++ */ ++ BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^ ++ offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK); ++ BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0); ++ set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss), ++ &per_cpu(cpu_tss_rw, cpu), ++ sizeof(struct tss_struct) / PAGE_SIZE, ++ tss_prot); ++ ++#ifdef CONFIG_X86_32 ++ per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu); + #endif + +- __set_fixmap(get_cpu_gdt_ro_index(cpu), get_cpu_gdt_paddr(cpu), prot); ++#ifdef CONFIG_X86_64 ++ BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0); ++ BUILD_BUG_ON(sizeof(exception_stacks) != ++ sizeof(((struct cpu_entry_area *)0)->exception_stacks)); ++ set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, exception_stacks), ++ &per_cpu(exception_stacks, cpu), ++ sizeof(exception_stacks) / PAGE_SIZE, ++ PAGE_KERNEL); ++ ++ __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline), ++ __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX); ++#endif ++} ++ ++void __init setup_cpu_entry_areas(void) ++{ ++ unsigned int cpu; ++ ++ for_each_possible_cpu(cpu) ++ setup_cpu_entry_area(cpu); + } + + /* Load the original GDT from the per-cpu structure */ +@@ -723,7 +812,7 @@ static void apply_forced_caps(struct cpuinfo_x86 *c) + { + int i; + +- for (i = 0; i < NCAPINTS; i++) { ++ for (i = 0; i < NCAPINTS + NBUGINTS; i++) { + c->x86_capability[i] &= ~cpu_caps_cleared[i]; + c->x86_capability[i] |= cpu_caps_set[i]; + } +@@ -1225,7 +1314,7 @@ void enable_sep_cpu(void) + return; + + cpu = get_cpu(); +- tss = &per_cpu(cpu_tss, cpu); ++ tss = &per_cpu(cpu_tss_rw, cpu); + + /* + * We cache MSR_IA32_SYSENTER_CS's value in the TSS's ss1 field -- +@@ -1234,11 +1323,7 @@ void enable_sep_cpu(void) + + tss->x86_tss.ss1 = __KERNEL_CS; + wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0); +- +- wrmsr(MSR_IA32_SYSENTER_ESP, +- (unsigned long)tss + offsetofend(struct tss_struct, SYSENTER_stack), +- 0); +- ++ wrmsr(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_SYSENTER_stack(cpu) + 1), 0); + wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0); + + put_cpu(); +@@ -1301,18 +1386,16 @@ void print_cpu_info(struct cpuinfo_x86 *c) + pr_cont(")\n"); + } + +-static __init int setup_disablecpuid(char *arg) ++/* ++ * clearcpuid= was already parsed in fpu__init_parse_early_param. ++ * But we need to keep a dummy __setup around otherwise it would ++ * show up as an environment variable for init. ++ */ ++static __init int setup_clearcpuid(char *arg) + { +- int bit; +- +- if (get_option(&arg, &bit) && bit >= 0 && bit < NCAPINTS * 32) +- setup_clear_cpu_cap(bit); +- else +- return 0; +- + return 1; + } +-__setup("clearcpuid=", setup_disablecpuid); ++__setup("clearcpuid=", setup_clearcpuid); + + #ifdef CONFIG_X86_64 + DEFINE_PER_CPU_FIRST(union irq_stack_union, +@@ -1334,25 +1417,19 @@ DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1; + DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT; + EXPORT_PER_CPU_SYMBOL(__preempt_count); + +-/* +- * Special IST stacks which the CPU switches to when it calls +- * an IST-marked descriptor entry. Up to 7 stacks (hardware +- * limit), all of them are 4K, except the debug stack which +- * is 8K. +- */ +-static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = { +- [0 ... 
N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ, +- [DEBUG_STACK - 1] = DEBUG_STKSZ +-}; +- +-static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks +- [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); +- + /* May not be marked __init: used by software suspend */ + void syscall_init(void) + { ++ extern char _entry_trampoline[]; ++ extern char entry_SYSCALL_64_trampoline[]; ++ ++ int cpu = smp_processor_id(); ++ unsigned long SYSCALL64_entry_trampoline = ++ (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline + ++ (entry_SYSCALL_64_trampoline - _entry_trampoline); ++ + wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS); +- wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64); ++ wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline); + + #ifdef CONFIG_IA32_EMULATION + wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat); +@@ -1363,7 +1440,7 @@ void syscall_init(void) + * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit). + */ + wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS); +- wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL); ++ wrmsrl_safe(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_SYSENTER_stack(cpu) + 1)); + wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat); + #else + wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret); +@@ -1507,7 +1584,7 @@ void cpu_init(void) + if (cpu) + load_ucode_ap(); + +- t = &per_cpu(cpu_tss, cpu); ++ t = &per_cpu(cpu_tss_rw, cpu); + oist = &per_cpu(orig_ist, cpu); + + #ifdef CONFIG_NUMA +@@ -1546,7 +1623,7 @@ void cpu_init(void) + * set up and load the per-CPU TSS + */ + if (!oist->ist[0]) { +- char *estacks = per_cpu(exception_stacks, cpu); ++ char *estacks = get_cpu_entry_area(cpu)->exception_stacks; + + for (v = 0; v < N_EXCEPTION_STACKS; v++) { + estacks += exception_stack_sizes[v]; +@@ -1557,7 +1634,7 @@ void cpu_init(void) + } + } + +- t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap); ++ t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET; + + /* + * <= is required because the CPU will access up to +@@ -1572,9 +1649,14 @@ void cpu_init(void) + initialize_tlbstate_and_flush(); + enter_lazy_tlb(&init_mm, me); + +- load_sp0(t, ¤t->thread); +- set_tss_desc(cpu, t); ++ /* ++ * Initialize the TSS. sp0 points to the entry trampoline stack ++ * regardless of what task is running. ++ */ ++ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); + load_TR_desc(); ++ load_sp0((unsigned long)(cpu_SYSENTER_stack(cpu) + 1)); ++ + load_mm_ldt(&init_mm); + + clear_all_debug_regs(); +@@ -1585,7 +1667,6 @@ void cpu_init(void) + if (is_uv_system()) + uv_cpu_init(); + +- setup_fixmap_gdt(cpu); + load_fixmap_gdt(cpu); + } + +@@ -1595,8 +1676,7 @@ void cpu_init(void) + { + int cpu = smp_processor_id(); + struct task_struct *curr = current; +- struct tss_struct *t = &per_cpu(cpu_tss, cpu); +- struct thread_struct *thread = &curr->thread; ++ struct tss_struct *t = &per_cpu(cpu_tss_rw, cpu); + + wait_for_master_cpu(cpu); + +@@ -1627,12 +1707,16 @@ void cpu_init(void) + initialize_tlbstate_and_flush(); + enter_lazy_tlb(&init_mm, curr); + +- load_sp0(t, thread); +- set_tss_desc(cpu, t); ++ /* ++ * Initialize the TSS. Don't bother initializing sp0, as the initial ++ * task never enters user mode. 
++ */ ++ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); + load_TR_desc(); ++ + load_mm_ldt(&init_mm); + +- t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap); ++ t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET; + + #ifdef CONFIG_DOUBLEFAULT + /* Set up doublefault TSS pointer in the GDT */ +@@ -1644,7 +1728,6 @@ void cpu_init(void) + + fpu__init_cpu(); + +- setup_fixmap_gdt(cpu); + load_fixmap_gdt(cpu); + } + #endif +diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c +new file mode 100644 +index 000000000000..904b0a3c4e53 +--- /dev/null ++++ b/arch/x86/kernel/cpu/cpuid-deps.c +@@ -0,0 +1,121 @@ ++/* Declare dependencies between CPUIDs */ ++#include ++#include ++#include ++#include ++ ++struct cpuid_dep { ++ unsigned int feature; ++ unsigned int depends; ++}; ++ ++/* ++ * Table of CPUID features that depend on others. ++ * ++ * This only includes dependencies that can be usefully disabled, not ++ * features part of the base set (like FPU). ++ * ++ * Note this all is not __init / __initdata because it can be ++ * called from cpu hotplug. It shouldn't do anything in this case, ++ * but it's difficult to tell that to the init reference checker. ++ */ ++const static struct cpuid_dep cpuid_deps[] = { ++ { X86_FEATURE_XSAVEOPT, X86_FEATURE_XSAVE }, ++ { X86_FEATURE_XSAVEC, X86_FEATURE_XSAVE }, ++ { X86_FEATURE_XSAVES, X86_FEATURE_XSAVE }, ++ { X86_FEATURE_AVX, X86_FEATURE_XSAVE }, ++ { X86_FEATURE_PKU, X86_FEATURE_XSAVE }, ++ { X86_FEATURE_MPX, X86_FEATURE_XSAVE }, ++ { X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE }, ++ { X86_FEATURE_FXSR_OPT, X86_FEATURE_FXSR }, ++ { X86_FEATURE_XMM, X86_FEATURE_FXSR }, ++ { X86_FEATURE_XMM2, X86_FEATURE_XMM }, ++ { X86_FEATURE_XMM3, X86_FEATURE_XMM2 }, ++ { X86_FEATURE_XMM4_1, X86_FEATURE_XMM2 }, ++ { X86_FEATURE_XMM4_2, X86_FEATURE_XMM2 }, ++ { X86_FEATURE_XMM3, X86_FEATURE_XMM2 }, ++ { X86_FEATURE_PCLMULQDQ, X86_FEATURE_XMM2 }, ++ { X86_FEATURE_SSSE3, X86_FEATURE_XMM2, }, ++ { X86_FEATURE_F16C, X86_FEATURE_XMM2, }, ++ { X86_FEATURE_AES, X86_FEATURE_XMM2 }, ++ { X86_FEATURE_SHA_NI, X86_FEATURE_XMM2 }, ++ { X86_FEATURE_FMA, X86_FEATURE_AVX }, ++ { X86_FEATURE_AVX2, X86_FEATURE_AVX, }, ++ { X86_FEATURE_AVX512F, X86_FEATURE_AVX, }, ++ { X86_FEATURE_AVX512IFMA, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512PF, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512ER, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512CD, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512DQ, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512BW, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512VL, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512VBMI, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512_VBMI2, X86_FEATURE_AVX512VL }, ++ { X86_FEATURE_GFNI, X86_FEATURE_AVX512VL }, ++ { X86_FEATURE_VAES, X86_FEATURE_AVX512VL }, ++ { X86_FEATURE_VPCLMULQDQ, X86_FEATURE_AVX512VL }, ++ { X86_FEATURE_AVX512_VNNI, X86_FEATURE_AVX512VL }, ++ { X86_FEATURE_AVX512_BITALG, X86_FEATURE_AVX512VL }, ++ { X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F }, ++ { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F }, ++ {} ++}; ++ ++static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) ++{ ++ /* ++ * Note: This could use the non atomic __*_bit() variants, but the ++ * rest of the cpufeature code uses atomics as well, so keep it for ++ * consistency. Cleanup all of it separately. 
++ */ ++ if (!c) { ++ clear_cpu_cap(&boot_cpu_data, feature); ++ set_bit(feature, (unsigned long *)cpu_caps_cleared); ++ } else { ++ clear_bit(feature, (unsigned long *)c->x86_capability); ++ } ++} ++ ++/* Take the capabilities and the BUG bits into account */ ++#define MAX_FEATURE_BITS ((NCAPINTS + NBUGINTS) * sizeof(u32) * 8) ++ ++static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) ++{ ++ DECLARE_BITMAP(disable, MAX_FEATURE_BITS); ++ const struct cpuid_dep *d; ++ bool changed; ++ ++ if (WARN_ON(feature >= MAX_FEATURE_BITS)) ++ return; ++ ++ clear_feature(c, feature); ++ ++ /* Collect all features to disable, handling dependencies */ ++ memset(disable, 0, sizeof(disable)); ++ __set_bit(feature, disable); ++ ++ /* Loop until we get a stable state. */ ++ do { ++ changed = false; ++ for (d = cpuid_deps; d->feature; d++) { ++ if (!test_bit(d->depends, disable)) ++ continue; ++ if (__test_and_set_bit(d->feature, disable)) ++ continue; ++ ++ changed = true; ++ clear_feature(c, d->feature); ++ } ++ } while (changed); ++} ++ ++void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) ++{ ++ do_clear_cpu_cap(c, feature); ++} ++ ++void setup_clear_cpu_cap(unsigned int feature) ++{ ++ do_clear_cpu_cap(NULL, feature); ++} +diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c +index 4fa90006ac68..bea8d3e24f50 100644 +--- a/arch/x86/kernel/cpu/hypervisor.c ++++ b/arch/x86/kernel/cpu/hypervisor.c +@@ -26,6 +26,12 @@ + #include + #include + ++extern const struct hypervisor_x86 x86_hyper_vmware; ++extern const struct hypervisor_x86 x86_hyper_ms_hyperv; ++extern const struct hypervisor_x86 x86_hyper_xen_pv; ++extern const struct hypervisor_x86 x86_hyper_xen_hvm; ++extern const struct hypervisor_x86 x86_hyper_kvm; ++ + static const __initconst struct hypervisor_x86 * const hypervisors[] = + { + #ifdef CONFIG_XEN_PV +@@ -41,54 +47,52 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] = + #endif + }; + +-const struct hypervisor_x86 *x86_hyper; +-EXPORT_SYMBOL(x86_hyper); ++enum x86_hypervisor_type x86_hyper_type; ++EXPORT_SYMBOL(x86_hyper_type); + +-static inline void __init ++static inline const struct hypervisor_x86 * __init + detect_hypervisor_vendor(void) + { +- const struct hypervisor_x86 *h, * const *p; ++ const struct hypervisor_x86 *h = NULL, * const *p; + uint32_t pri, max_pri = 0; + + for (p = hypervisors; p < hypervisors + ARRAY_SIZE(hypervisors); p++) { +- h = *p; +- pri = h->detect(); +- if (pri != 0 && pri > max_pri) { ++ pri = (*p)->detect(); ++ if (pri > max_pri) { + max_pri = pri; +- x86_hyper = h; ++ h = *p; + } + } + +- if (max_pri) +- pr_info("Hypervisor detected: %s\n", x86_hyper->name); ++ if (h) ++ pr_info("Hypervisor detected: %s\n", h->name); ++ ++ return h; + } + +-void __init init_hypervisor_platform(void) ++static void __init copy_array(const void *src, void *target, unsigned int size) + { ++ unsigned int i, n = size / sizeof(void *); ++ const void * const *from = (const void * const *)src; ++ const void **to = (const void **)target; + +- detect_hypervisor_vendor(); +- +- if (!x86_hyper) +- return; +- +- if (x86_hyper->init_platform) +- x86_hyper->init_platform(); ++ for (i = 0; i < n; i++) ++ if (from[i]) ++ to[i] = from[i]; + } + +-bool __init hypervisor_x2apic_available(void) ++void __init init_hypervisor_platform(void) + { +- return x86_hyper && +- x86_hyper->x2apic_available && +- x86_hyper->x2apic_available(); +-} ++ const struct hypervisor_x86 *h; + +-void hypervisor_pin_vcpu(int cpu) +-{ +- if 
(!x86_hyper) ++ h = detect_hypervisor_vendor(); ++ ++ if (!h) + return; + +- if (x86_hyper->pin_vcpu) +- x86_hyper->pin_vcpu(cpu); +- else +- WARN_ONCE(1, "vcpu pinning requested but not supported!\n"); ++ copy_array(&h->init, &x86_init.hyper, sizeof(h->init)); ++ copy_array(&h->runtime, &x86_platform.hyper, sizeof(h->runtime)); ++ ++ x86_hyper_type = h->type; ++ x86_init.hyper.init_platform(); + } +diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c +index 236324e83a3a..85eb5fc180c8 100644 +--- a/arch/x86/kernel/cpu/mshyperv.c ++++ b/arch/x86/kernel/cpu/mshyperv.c +@@ -254,9 +254,9 @@ static void __init ms_hyperv_init_platform(void) + #endif + } + +-const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = { ++const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = { + .name = "Microsoft Hyper-V", + .detect = ms_hyperv_platform, +- .init_platform = ms_hyperv_init_platform, ++ .type = X86_HYPER_MS_HYPERV, ++ .init.init_platform = ms_hyperv_init_platform, + }; +-EXPORT_SYMBOL(x86_hyper_ms_hyperv); +diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c +index 40ed26852ebd..8e005329648b 100644 +--- a/arch/x86/kernel/cpu/vmware.c ++++ b/arch/x86/kernel/cpu/vmware.c +@@ -205,10 +205,10 @@ static bool __init vmware_legacy_x2apic_available(void) + (eax & (1 << VMWARE_PORT_CMD_LEGACY_X2APIC)) != 0; + } + +-const __refconst struct hypervisor_x86 x86_hyper_vmware = { ++const __initconst struct hypervisor_x86 x86_hyper_vmware = { + .name = "VMware", + .detect = vmware_platform, +- .init_platform = vmware_platform_setup, +- .x2apic_available = vmware_legacy_x2apic_available, ++ .type = X86_HYPER_VMWARE, ++ .init.init_platform = vmware_platform_setup, ++ .init.x2apic_available = vmware_legacy_x2apic_available, + }; +-EXPORT_SYMBOL(x86_hyper_vmware); +diff --git a/arch/x86/kernel/doublefault.c b/arch/x86/kernel/doublefault.c +index 0e662c55ae90..0b8cedb20d6d 100644 +--- a/arch/x86/kernel/doublefault.c ++++ b/arch/x86/kernel/doublefault.c +@@ -50,25 +50,23 @@ static void doublefault_fn(void) + cpu_relax(); + } + +-struct tss_struct doublefault_tss __cacheline_aligned = { +- .x86_tss = { +- .sp0 = STACK_START, +- .ss0 = __KERNEL_DS, +- .ldt = 0, +- .io_bitmap_base = INVALID_IO_BITMAP_OFFSET, +- +- .ip = (unsigned long) doublefault_fn, +- /* 0x2 bit is always set */ +- .flags = X86_EFLAGS_SF | 0x2, +- .sp = STACK_START, +- .es = __USER_DS, +- .cs = __KERNEL_CS, +- .ss = __KERNEL_DS, +- .ds = __USER_DS, +- .fs = __KERNEL_PERCPU, +- +- .__cr3 = __pa_nodebug(swapper_pg_dir), +- } ++struct x86_hw_tss doublefault_tss __cacheline_aligned = { ++ .sp0 = STACK_START, ++ .ss0 = __KERNEL_DS, ++ .ldt = 0, ++ .io_bitmap_base = INVALID_IO_BITMAP_OFFSET, ++ ++ .ip = (unsigned long) doublefault_fn, ++ /* 0x2 bit is always set */ ++ .flags = X86_EFLAGS_SF | 0x2, ++ .sp = STACK_START, ++ .es = __USER_DS, ++ .cs = __KERNEL_CS, ++ .ss = __KERNEL_DS, ++ .ds = __USER_DS, ++ .fs = __KERNEL_PERCPU, ++ ++ .__cr3 = __pa_nodebug(swapper_pg_dir), + }; + + /* dummy for do_double_fault() call */ +diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c +index f13b4c00a5de..bbd6d986e2d0 100644 +--- a/arch/x86/kernel/dumpstack.c ++++ b/arch/x86/kernel/dumpstack.c +@@ -43,6 +43,24 @@ bool in_task_stack(unsigned long *stack, struct task_struct *task, + return true; + } + ++bool in_sysenter_stack(unsigned long *stack, struct stack_info *info) ++{ ++ struct SYSENTER_stack *ss = cpu_SYSENTER_stack(smp_processor_id()); ++ ++ void *begin = ss; ++ void *end = ss + 1; ++ ++ if 
((void *)stack < begin || (void *)stack >= end) ++ return false; ++ ++ info->type = STACK_TYPE_SYSENTER; ++ info->begin = begin; ++ info->end = end; ++ info->next_sp = NULL; ++ ++ return true; ++} ++ + static void printk_stack_address(unsigned long address, int reliable, + char *log_lvl) + { +@@ -50,6 +68,28 @@ static void printk_stack_address(unsigned long address, int reliable, + printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address); + } + ++void show_iret_regs(struct pt_regs *regs) ++{ ++ printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip); ++ printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss, ++ regs->sp, regs->flags); ++} ++ ++static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) ++{ ++ if (on_stack(info, regs, sizeof(*regs))) ++ __show_regs(regs, 0); ++ else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET, ++ IRET_FRAME_SIZE)) { ++ /* ++ * When an interrupt or exception occurs in entry code, the ++ * full pt_regs might not have been saved yet. In that case ++ * just print the iret frame. ++ */ ++ show_iret_regs(regs); ++ } ++} ++ + void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, + unsigned long *stack, char *log_lvl) + { +@@ -71,31 +111,35 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, + * - task stack + * - interrupt stack + * - HW exception stacks (double fault, nmi, debug, mce) ++ * - SYSENTER stack + * +- * x86-32 can have up to three stacks: ++ * x86-32 can have up to four stacks: + * - task stack + * - softirq stack + * - hardirq stack ++ * - SYSENTER stack + */ + for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { + const char *stack_name; + +- /* +- * If we overflowed the task stack into a guard page, jump back +- * to the bottom of the usable stack. +- */ +- if (task_stack_page(task) - (void *)stack < PAGE_SIZE) +- stack = task_stack_page(task); +- +- if (get_stack_info(stack, task, &stack_info, &visit_mask)) +- break; ++ if (get_stack_info(stack, task, &stack_info, &visit_mask)) { ++ /* ++ * We weren't on a valid stack. It's possible that ++ * we overflowed a valid stack into a guard page. ++ * See if the next page up is valid so that we can ++ * generate some kind of backtrace if this happens. ++ */ ++ stack = (unsigned long *)PAGE_ALIGN((unsigned long)stack); ++ if (get_stack_info(stack, task, &stack_info, &visit_mask)) ++ break; ++ } + + stack_name = stack_type_name(stack_info.type); + if (stack_name) + printk("%s <%s>\n", log_lvl, stack_name); + +- if (regs && on_stack(&stack_info, regs, sizeof(*regs))) +- __show_regs(regs, 0); ++ if (regs) ++ show_regs_safe(&stack_info, regs); + + /* + * Scan the stack, printing any text addresses we find. At the +@@ -119,7 +163,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, + + /* + * Don't print regs->ip again if it was already printed +- * by __show_regs() below. ++ * by show_regs_safe() below. 
+ */ + if (regs && stack == ®s->ip) + goto next; +@@ -155,8 +199,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, + + /* if the frame has entry regs, print them */ + regs = unwind_get_entry_regs(&state); +- if (regs && on_stack(&stack_info, regs, sizeof(*regs))) +- __show_regs(regs, 0); ++ if (regs) ++ show_regs_safe(&stack_info, regs); + } + + if (stack_name) +diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c +index daefae83a3aa..5ff13a6b3680 100644 +--- a/arch/x86/kernel/dumpstack_32.c ++++ b/arch/x86/kernel/dumpstack_32.c +@@ -26,6 +26,9 @@ const char *stack_type_name(enum stack_type type) + if (type == STACK_TYPE_SOFTIRQ) + return "SOFTIRQ"; + ++ if (type == STACK_TYPE_SYSENTER) ++ return "SYSENTER"; ++ + return NULL; + } + +@@ -93,6 +96,9 @@ int get_stack_info(unsigned long *stack, struct task_struct *task, + if (task != current) + goto unknown; + ++ if (in_sysenter_stack(stack, info)) ++ goto recursion_check; ++ + if (in_hardirq_stack(stack, info)) + goto recursion_check; + +diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c +index 88ce2ffdb110..abc828f8c297 100644 +--- a/arch/x86/kernel/dumpstack_64.c ++++ b/arch/x86/kernel/dumpstack_64.c +@@ -37,6 +37,9 @@ const char *stack_type_name(enum stack_type type) + if (type == STACK_TYPE_IRQ) + return "IRQ"; + ++ if (type == STACK_TYPE_SYSENTER) ++ return "SYSENTER"; ++ + if (type >= STACK_TYPE_EXCEPTION && type <= STACK_TYPE_EXCEPTION_LAST) + return exception_stack_names[type - STACK_TYPE_EXCEPTION]; + +@@ -115,6 +118,9 @@ int get_stack_info(unsigned long *stack, struct task_struct *task, + if (in_irq_stack(stack, info)) + goto recursion_check; + ++ if (in_sysenter_stack(stack, info)) ++ goto recursion_check; ++ + goto unknown; + + recursion_check: +diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c +index 7affb7e3d9a5..6abd83572b01 100644 +--- a/arch/x86/kernel/fpu/init.c ++++ b/arch/x86/kernel/fpu/init.c +@@ -249,6 +249,10 @@ static void __init fpu__init_system_ctx_switch(void) + */ + static void __init fpu__init_parse_early_param(void) + { ++ char arg[32]; ++ char *argptr = arg; ++ int bit; ++ + if (cmdline_find_option_bool(boot_command_line, "no387")) + setup_clear_cpu_cap(X86_FEATURE_FPU); + +@@ -266,6 +270,13 @@ static void __init fpu__init_parse_early_param(void) + + if (cmdline_find_option_bool(boot_command_line, "noxsaves")) + setup_clear_cpu_cap(X86_FEATURE_XSAVES); ++ ++ if (cmdline_find_option(boot_command_line, "clearcpuid", arg, ++ sizeof(arg)) && ++ get_option(&argptr, &bit) && ++ bit >= 0 && ++ bit < NCAPINTS * 32) ++ setup_clear_cpu_cap(bit); + } + + /* +diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c +index f1d5476c9022..87a57b7642d3 100644 +--- a/arch/x86/kernel/fpu/xstate.c ++++ b/arch/x86/kernel/fpu/xstate.c +@@ -15,6 +15,7 @@ + #include + + #include ++#include + + /* + * Although we spell it out in here, the Processor Trace +@@ -36,6 +37,19 @@ static const char *xfeature_names[] = + "unknown xstate feature" , + }; + ++static short xsave_cpuid_features[] __initdata = { ++ X86_FEATURE_FPU, ++ X86_FEATURE_XMM, ++ X86_FEATURE_AVX, ++ X86_FEATURE_MPX, ++ X86_FEATURE_MPX, ++ X86_FEATURE_AVX512F, ++ X86_FEATURE_AVX512F, ++ X86_FEATURE_AVX512F, ++ X86_FEATURE_INTEL_PT, ++ X86_FEATURE_PKU, ++}; ++ + /* + * Mask of xstate features supported by the CPU and the kernel: + */ +@@ -59,26 +73,6 @@ unsigned int fpu_user_xstate_size; + void fpu__xstate_clear_all_cpu_caps(void) + { + 
setup_clear_cpu_cap(X86_FEATURE_XSAVE); +- setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT); +- setup_clear_cpu_cap(X86_FEATURE_XSAVEC); +- setup_clear_cpu_cap(X86_FEATURE_XSAVES); +- setup_clear_cpu_cap(X86_FEATURE_AVX); +- setup_clear_cpu_cap(X86_FEATURE_AVX2); +- setup_clear_cpu_cap(X86_FEATURE_AVX512F); +- setup_clear_cpu_cap(X86_FEATURE_AVX512IFMA); +- setup_clear_cpu_cap(X86_FEATURE_AVX512PF); +- setup_clear_cpu_cap(X86_FEATURE_AVX512ER); +- setup_clear_cpu_cap(X86_FEATURE_AVX512CD); +- setup_clear_cpu_cap(X86_FEATURE_AVX512DQ); +- setup_clear_cpu_cap(X86_FEATURE_AVX512BW); +- setup_clear_cpu_cap(X86_FEATURE_AVX512VL); +- setup_clear_cpu_cap(X86_FEATURE_MPX); +- setup_clear_cpu_cap(X86_FEATURE_XGETBV1); +- setup_clear_cpu_cap(X86_FEATURE_AVX512VBMI); +- setup_clear_cpu_cap(X86_FEATURE_PKU); +- setup_clear_cpu_cap(X86_FEATURE_AVX512_4VNNIW); +- setup_clear_cpu_cap(X86_FEATURE_AVX512_4FMAPS); +- setup_clear_cpu_cap(X86_FEATURE_AVX512_VPOPCNTDQ); + } + + /* +@@ -726,6 +720,7 @@ void __init fpu__init_system_xstate(void) + unsigned int eax, ebx, ecx, edx; + static int on_boot_cpu __initdata = 1; + int err; ++ int i; + + WARN_ON_FPU(!on_boot_cpu); + on_boot_cpu = 0; +@@ -759,6 +754,14 @@ void __init fpu__init_system_xstate(void) + goto out_disable; + } + ++ /* ++ * Clear XSAVE features that are disabled in the normal CPUID. ++ */ ++ for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) { ++ if (!boot_cpu_has(xsave_cpuid_features[i])) ++ xfeatures_mask &= ~BIT(i); ++ } ++ + xfeatures_mask &= fpu__get_supported_xfeatures_mask(); + + /* Enable xstate instructions to be able to continue with initialization: */ +diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S +index f1d528bb66a6..c29020907886 100644 +--- a/arch/x86/kernel/head_32.S ++++ b/arch/x86/kernel/head_32.S +@@ -212,9 +212,6 @@ ENTRY(startup_32_smp) + #endif + + .Ldefault_entry: +-#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ +- X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ +- X86_CR0_PG) + movl $(CR0_STATE & ~X86_CR0_PG),%eax + movl %eax,%cr0 + +@@ -402,7 +399,7 @@ ENTRY(early_idt_handler_array) + # 24(%rsp) error code + i = 0 + .rept NUM_EXCEPTION_VECTORS +- .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 ++ .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0 + pushl $0 # Dummy error code, to make stack frame uniform + .endif + pushl $i # 20(%esp) Vector number +diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S +index 6dde3f3fc1f8..7dca675fe78d 100644 +--- a/arch/x86/kernel/head_64.S ++++ b/arch/x86/kernel/head_64.S +@@ -38,11 +38,12 @@ + * + */ + +-#define p4d_index(x) (((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1)) + #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) + ++#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) + PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE) + PGD_START_KERNEL = pgd_index(__START_KERNEL_map) ++#endif + L3_START_KERNEL = pud_index(__START_KERNEL_map) + + .text +@@ -50,6 +51,7 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map) + .code64 + .globl startup_64 + startup_64: ++ UNWIND_HINT_EMPTY + /* + * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0, + * and someone has loaded an identity mapped page table +@@ -89,6 +91,7 @@ startup_64: + addq $(early_top_pgt - __START_KERNEL_map), %rax + jmp 1f + ENTRY(secondary_startup_64) ++ UNWIND_HINT_EMPTY + /* + * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0, + * and someone has loaded a mapped page table. 
+@@ -133,6 +136,7 @@ ENTRY(secondary_startup_64) + movq $1f, %rax + jmp *%rax + 1: ++ UNWIND_HINT_EMPTY + + /* Check if nx is implemented */ + movl $0x80000001, %eax +@@ -150,9 +154,6 @@ ENTRY(secondary_startup_64) + 1: wrmsr /* Make changes effective */ + + /* Setup cr0 */ +-#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ +- X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ +- X86_CR0_PG) + movl $CR0_STATE, %eax + /* Make changes effective */ + movq %rax, %cr0 +@@ -235,7 +236,7 @@ ENTRY(secondary_startup_64) + pushq %rax # target address in negative space + lretq + .Lafter_lret: +-ENDPROC(secondary_startup_64) ++END(secondary_startup_64) + + #include "verify_cpu.S" + +@@ -247,6 +248,7 @@ ENDPROC(secondary_startup_64) + */ + ENTRY(start_cpu0) + movq initial_stack(%rip), %rsp ++ UNWIND_HINT_EMPTY + jmp .Ljump_to_C_code + ENDPROC(start_cpu0) + #endif +@@ -266,26 +268,24 @@ ENDPROC(start_cpu0) + .quad init_thread_union + THREAD_SIZE - SIZEOF_PTREGS + __FINITDATA + +-bad_address: +- jmp bad_address +- + __INIT + ENTRY(early_idt_handler_array) +- # 104(%rsp) %rflags +- # 96(%rsp) %cs +- # 88(%rsp) %rip +- # 80(%rsp) error code + i = 0 + .rept NUM_EXCEPTION_VECTORS +- .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 +- pushq $0 # Dummy error code, to make stack frame uniform ++ .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0 ++ UNWIND_HINT_IRET_REGS ++ pushq $0 # Dummy error code, to make stack frame uniform ++ .else ++ UNWIND_HINT_IRET_REGS offset=8 + .endif + pushq $i # 72(%rsp) Vector number + jmp early_idt_handler_common ++ UNWIND_HINT_IRET_REGS + i = i + 1 + .fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc + .endr +-ENDPROC(early_idt_handler_array) ++ UNWIND_HINT_IRET_REGS offset=16 ++END(early_idt_handler_array) + + early_idt_handler_common: + /* +@@ -313,6 +313,7 @@ early_idt_handler_common: + pushq %r13 /* pt_regs->r13 */ + pushq %r14 /* pt_regs->r14 */ + pushq %r15 /* pt_regs->r15 */ ++ UNWIND_HINT_REGS + + cmpq $14,%rsi /* Page fault? */ + jnz 10f +@@ -327,8 +328,8 @@ early_idt_handler_common: + + 20: + decl early_recursion_flag(%rip) +- jmp restore_regs_and_iret +-ENDPROC(early_idt_handler_common) ++ jmp restore_regs_and_return_to_kernel ++END(early_idt_handler_common) + + __INITDATA + +@@ -362,10 +363,7 @@ NEXT_PAGE(early_dynamic_pgts) + + .data + +-#ifndef CONFIG_XEN +-NEXT_PAGE(init_top_pgt) +- .fill 512,8,0 +-#else ++#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) + NEXT_PAGE(init_top_pgt) + .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC + .org init_top_pgt + PGD_PAGE_OFFSET*8, 0 +@@ -382,6 +380,9 @@ NEXT_PAGE(level2_ident_pgt) + * Don't set NX because code runs from these pages. 
+ */ + PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD) ++#else ++NEXT_PAGE(init_top_pgt) ++ .fill 512,8,0 + #endif + + #ifdef CONFIG_X86_5LEVEL +@@ -435,7 +436,7 @@ ENTRY(phys_base) + EXPORT_SYMBOL(phys_base) + + #include "../../x86/xen/xen-head.S" +- ++ + __PAGE_ALIGNED_BSS + NEXT_PAGE(empty_zero_page) + .skip PAGE_SIZE +diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c +index 3feb648781c4..2f723301eb58 100644 +--- a/arch/x86/kernel/ioport.c ++++ b/arch/x86/kernel/ioport.c +@@ -67,7 +67,7 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on) + * because the ->io_bitmap_max value must match the bitmap + * contents: + */ +- tss = &per_cpu(cpu_tss, get_cpu()); ++ tss = &per_cpu(cpu_tss_rw, get_cpu()); + + if (turn_on) + bitmap_clear(t->io_bitmap_ptr, from, num); +diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c +index 52089c043160..aa9d51eea9d0 100644 +--- a/arch/x86/kernel/irq.c ++++ b/arch/x86/kernel/irq.c +@@ -219,18 +219,6 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs) + /* high bit used in ret_from_ code */ + unsigned vector = ~regs->orig_ax; + +- /* +- * NB: Unlike exception entries, IRQ entries do not reliably +- * handle context tracking in the low-level entry code. This is +- * because syscall entries execute briefly with IRQs on before +- * updating context tracking state, so we can take an IRQ from +- * kernel mode with CONTEXT_USER. The low-level entry code only +- * updates the context if we came from user mode, so we won't +- * switch to CONTEXT_KERNEL. We'll fix that once the syscall +- * code is cleaned up enough that we can cleanly defer enabling +- * IRQs. +- */ +- + entering_irq(); + + /* entering_irq() tells RCU that we're not quiescent. Check it. */ +diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c +index 020efbf5786b..d86e344f5b3d 100644 +--- a/arch/x86/kernel/irq_64.c ++++ b/arch/x86/kernel/irq_64.c +@@ -57,10 +57,10 @@ static inline void stack_overflow_check(struct pt_regs *regs) + if (regs->sp >= estack_top && regs->sp <= estack_bottom) + return; + +- WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx)\n", ++ WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx,ip:%pF)\n", + current->comm, curbase, regs->sp, + irq_stack_top, irq_stack_bottom, +- estack_top, estack_bottom); ++ estack_top, estack_bottom, (void *)regs->ip); + + if (sysctl_panic_on_stackoverflow) + panic("low stack detected by irq handler - check messages\n"); +diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c +index 8bb9594d0761..a94de09edbed 100644 +--- a/arch/x86/kernel/kvm.c ++++ b/arch/x86/kernel/kvm.c +@@ -544,12 +544,12 @@ static uint32_t __init kvm_detect(void) + return kvm_cpuid_base(); + } + +-const struct hypervisor_x86 x86_hyper_kvm __refconst = { ++const __initconst struct hypervisor_x86 x86_hyper_kvm = { + .name = "KVM", + .detect = kvm_detect, +- .x2apic_available = kvm_para_available, ++ .type = X86_HYPER_KVM, ++ .init.x2apic_available = kvm_para_available, + }; +-EXPORT_SYMBOL_GPL(x86_hyper_kvm); + + static __init int activate_jump_labels(void) + { +diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c +index ae5615b03def..1c1eae961340 100644 +--- a/arch/x86/kernel/ldt.c ++++ b/arch/x86/kernel/ldt.c +@@ -103,7 +103,7 @@ static void finalize_ldt_struct(struct ldt_struct *ldt) + static void install_ldt(struct mm_struct 
*current_mm, + struct ldt_struct *ldt) + { +- /* Synchronizes with lockless_dereference in load_mm_ldt. */ ++ /* Synchronizes with READ_ONCE in load_mm_ldt. */ + smp_store_release(¤t_mm->context.ldt, ldt); + + /* Activate the LDT for all CPUs using current_mm. */ +diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c +index ac0be8283325..9edadabf04f6 100644 +--- a/arch/x86/kernel/paravirt_patch_64.c ++++ b/arch/x86/kernel/paravirt_patch_64.c +@@ -10,7 +10,6 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax"); + DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax"); + DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax"); + DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3"); +-DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)"); + DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd"); + + DEF_NATIVE(pv_cpu_ops, usergs_sysret64, "swapgs; sysretq"); +@@ -60,7 +59,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf, + PATCH_SITE(pv_mmu_ops, read_cr2); + PATCH_SITE(pv_mmu_ops, read_cr3); + PATCH_SITE(pv_mmu_ops, write_cr3); +- PATCH_SITE(pv_mmu_ops, flush_tlb_single); + PATCH_SITE(pv_cpu_ops, wbinvd); + #if defined(CONFIG_PARAVIRT_SPINLOCKS) + case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock): +diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c +index c67685337c5a..517415978409 100644 +--- a/arch/x86/kernel/process.c ++++ b/arch/x86/kernel/process.c +@@ -47,9 +47,25 @@ + * section. Since TSS's are completely CPU-local, we want them + * on exact cacheline boundaries, to eliminate cacheline ping-pong. + */ +-__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { ++__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss_rw) = { + .x86_tss = { +- .sp0 = TOP_OF_INIT_STACK, ++ /* ++ * .sp0 is only used when entering ring 0 from a lower ++ * privilege level. Since the init task never runs anything ++ * but ring 0 code, there is no need for a valid value here. ++ * Poison it. ++ */ ++ .sp0 = (1UL << (BITS_PER_LONG-1)) + 1, ++ ++#ifdef CONFIG_X86_64 ++ /* ++ * .sp1 is cpu_current_top_of_stack. The init task never ++ * runs user code, but cpu_current_top_of_stack should still ++ * be well defined before the first context switch. ++ */ ++ .sp1 = TOP_OF_INIT_STACK, ++#endif ++ + #ifdef CONFIG_X86_32 + .ss0 = __KERNEL_DS, + .ss1 = __KERNEL_CS, +@@ -65,11 +81,8 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { + */ + .io_bitmap = { [0 ... 
IO_BITMAP_LONGS] = ~0 }, + #endif +-#ifdef CONFIG_X86_32 +- .SYSENTER_stack_canary = STACK_END_MAGIC, +-#endif + }; +-EXPORT_PER_CPU_SYMBOL(cpu_tss); ++EXPORT_PER_CPU_SYMBOL(cpu_tss_rw); + + DEFINE_PER_CPU(bool, __tss_limit_invalid); + EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid); +@@ -98,7 +111,7 @@ void exit_thread(struct task_struct *tsk) + struct fpu *fpu = &t->fpu; + + if (bp) { +- struct tss_struct *tss = &per_cpu(cpu_tss, get_cpu()); ++ struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu()); + + t->io_bitmap_ptr = NULL; + clear_thread_flag(TIF_IO_BITMAP); +diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c +index 11966251cd42..5224c6099184 100644 +--- a/arch/x86/kernel/process_32.c ++++ b/arch/x86/kernel/process_32.c +@@ -234,7 +234,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) + struct fpu *prev_fpu = &prev->fpu; + struct fpu *next_fpu = &next->fpu; + int cpu = smp_processor_id(); +- struct tss_struct *tss = &per_cpu(cpu_tss, cpu); ++ struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu); + + /* never put a printk in __switch_to... printk() calls wake_up*() indirectly */ + +@@ -284,9 +284,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) + + /* + * Reload esp0 and cpu_current_top_of_stack. This changes +- * current_thread_info(). ++ * current_thread_info(). Refresh the SYSENTER configuration in ++ * case prev or next is vm86. + */ +- load_sp0(tss, next); ++ update_sp0(next_p); ++ refresh_sysenter_cs(next); + this_cpu_write(cpu_current_top_of_stack, + (unsigned long)task_stack_page(next_p) + + THREAD_SIZE); +diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c +index 302e7b2572d1..c75466232016 100644 +--- a/arch/x86/kernel/process_64.c ++++ b/arch/x86/kernel/process_64.c +@@ -69,9 +69,8 @@ void __show_regs(struct pt_regs *regs, int all) + unsigned int fsindex, gsindex; + unsigned int ds, cs, es; + +- printk(KERN_DEFAULT "RIP: %04lx:%pS\n", regs->cs, (void *)regs->ip); +- printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss, +- regs->sp, regs->flags); ++ show_iret_regs(regs); ++ + if (regs->orig_ax != -1) + pr_cont(" ORIG_RAX: %016lx\n", regs->orig_ax); + else +@@ -88,6 +87,9 @@ void __show_regs(struct pt_regs *regs, int all) + printk(KERN_DEFAULT "R13: %016lx R14: %016lx R15: %016lx\n", + regs->r13, regs->r14, regs->r15); + ++ if (!all) ++ return; ++ + asm("movl %%ds,%0" : "=r" (ds)); + asm("movl %%cs,%0" : "=r" (cs)); + asm("movl %%es,%0" : "=r" (es)); +@@ -98,9 +100,6 @@ void __show_regs(struct pt_regs *regs, int all) + rdmsrl(MSR_GS_BASE, gs); + rdmsrl(MSR_KERNEL_GS_BASE, shadowgs); + +- if (!all) +- return; +- + cr0 = read_cr0(); + cr2 = read_cr2(); + cr3 = __read_cr3(); +@@ -274,7 +273,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, + struct inactive_task_frame *frame; + struct task_struct *me = current; + +- p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE; + childregs = task_pt_regs(p); + fork_frame = container_of(childregs, struct fork_frame, regs); + frame = &fork_frame->frame; +@@ -401,7 +399,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) + struct fpu *prev_fpu = &prev->fpu; + struct fpu *next_fpu = &next->fpu; + int cpu = smp_processor_id(); +- struct tss_struct *tss = &per_cpu(cpu_tss, cpu); ++ struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu); + + WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) && + this_cpu_read(irq_count) != -1); +@@ -463,9 +461,10 @@ __switch_to(struct task_struct *prev_p, struct 
task_struct *next_p) + * Switch the PDA and FPU contexts. + */ + this_cpu_write(current_task, next_p); ++ this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p)); + +- /* Reload esp0 and ss1. This changes current_thread_info(). */ +- load_sp0(tss, next); ++ /* Reload sp0. */ ++ update_sp0(next_p); + + /* + * Now maybe reload the debug registers and handle I/O bitmaps +diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c +index 5e0453f18a57..142126ab5aae 100644 +--- a/arch/x86/kernel/smpboot.c ++++ b/arch/x86/kernel/smpboot.c +@@ -962,8 +962,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle) + #ifdef CONFIG_X86_32 + /* Stack for startup_32 can be just as for start_secondary onwards */ + irq_ctx_init(cpu); +- per_cpu(cpu_current_top_of_stack, cpu) = +- (unsigned long)task_stack_page(idle) + THREAD_SIZE; ++ per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle); + #else + initial_gs = per_cpu_offset(cpu); + #endif +diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c +index 5a6b8f809792..74136fd16f49 100644 +--- a/arch/x86/kernel/traps.c ++++ b/arch/x86/kernel/traps.c +@@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs *regs) + * will catch asm bugs and any attempt to use ist_preempt_enable + * from double_fault. + */ +- BUG_ON((unsigned long)(current_top_of_stack() - +- current_stack_pointer) >= THREAD_SIZE); ++ BUG_ON(!on_thread_stack()); + + preempt_enable_no_resched(); + } +@@ -349,9 +348,15 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) + + /* + * If IRET takes a non-IST fault on the espfix64 stack, then we +- * end up promoting it to a doublefault. In that case, modify +- * the stack to make it look like we just entered the #GP +- * handler from user space, similar to bad_iret. ++ * end up promoting it to a doublefault. In that case, take ++ * advantage of the fact that we're not using the normal (TSS.sp0) ++ * stack right now. We can write a fake #GP(0) frame at TSS.sp0 ++ * and then modify our own IRET frame so that, when we return, ++ * we land directly at the #GP(0) vector with the stack already ++ * set up according to its expectations. ++ * ++ * The net result is that our #GP handler will think that we ++ * entered from usermode with the bad user context. + * + * No need for ist_enter here because we don't use RCU. + */ +@@ -359,13 +364,26 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) + regs->cs == __KERNEL_CS && + regs->ip == (unsigned long)native_irq_return_iret) + { +- struct pt_regs *normal_regs = task_pt_regs(current); ++ struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1; + +- /* Fake a #GP(0) from userspace. */ +- memmove(&normal_regs->ip, (void *)regs->sp, 5*8); +- normal_regs->orig_ax = 0; /* Missing (lost) #GP error code */ ++ /* ++ * regs->sp points to the failing IRET frame on the ++ * ESPFIX64 stack. Copy it to the entry stack. This fills ++ * in gpregs->ss through gpregs->ip. ++ * ++ */ ++ memmove(&gpregs->ip, (void *)regs->sp, 5*8); ++ gpregs->orig_ax = 0; /* Missing (lost) #GP error code */ ++ ++ /* ++ * Adjust our frame so that we return straight to the #GP ++ * vector with the expected RSP value. This is safe because ++ * we won't enable interupts or schedule before we invoke ++ * general_protection, so nothing will clobber the stack ++ * frame we just set up. 
++ */ + regs->ip = (unsigned long)general_protection; +- regs->sp = (unsigned long)&normal_regs->orig_ax; ++ regs->sp = (unsigned long)&gpregs->orig_ax; + + return; + } +@@ -390,7 +408,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) + * + * Processors update CR2 whenever a page fault is detected. If a + * second page fault occurs while an earlier page fault is being +- * deliv- ered, the faulting linear address of the second fault will ++ * delivered, the faulting linear address of the second fault will + * overwrite the contents of CR2 (replacing the previous + * address). These updates to CR2 occur even if the page fault + * results in a double fault or occurs during the delivery of a +@@ -601,14 +619,15 @@ NOKPROBE_SYMBOL(do_int3); + + #ifdef CONFIG_X86_64 + /* +- * Help handler running on IST stack to switch off the IST stack if the +- * interrupted code was in user mode. The actual stack switch is done in +- * entry_64.S ++ * Help handler running on a per-cpu (IST or entry trampoline) stack ++ * to switch to the normal thread stack if the interrupted code was in ++ * user mode. The actual stack switch is done in entry_64.S + */ + asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs) + { +- struct pt_regs *regs = task_pt_regs(current); +- *regs = *eregs; ++ struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1; ++ if (regs != eregs) ++ *regs = *eregs; + return regs; + } + NOKPROBE_SYMBOL(sync_regs); +@@ -624,13 +643,13 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s) + /* + * This is called from entry_64.S early in handling a fault + * caused by a bad iret to user mode. To handle the fault +- * correctly, we want move our stack frame to task_pt_regs +- * and we want to pretend that the exception came from the +- * iret target. ++ * correctly, we want to move our stack frame to where it would ++ * be had we entered directly on the entry stack (rather than ++ * just below the IRET frame) and we want to pretend that the ++ * exception came from the IRET target. + */ + struct bad_iret_stack *new_stack = +- container_of(task_pt_regs(current), +- struct bad_iret_stack, regs); ++ (struct bad_iret_stack *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1; + + /* Copy the IRET target to the new stack. */ + memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8); +@@ -795,14 +814,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) + debug_stack_usage_dec(); + + exit: +-#if defined(CONFIG_X86_32) +- /* +- * This is the most likely code path that involves non-trivial use +- * of the SYSENTER stack. Check that we haven't overrun it. 
+- */ +- WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC, +- "Overran or corrupted SYSENTER stack\n"); +-#endif + ist_exit(regs); + } + NOKPROBE_SYMBOL(do_debug); +@@ -929,6 +940,9 @@ dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code) + + void __init trap_init(void) + { ++ /* Init cpu_entry_area before IST entries are set up */ ++ setup_cpu_entry_areas(); ++ + idt_setup_traps(); + + /* +diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c +index a3f973b2c97a..be86a865087a 100644 +--- a/arch/x86/kernel/unwind_orc.c ++++ b/arch/x86/kernel/unwind_orc.c +@@ -253,22 +253,15 @@ unsigned long *unwind_get_return_address_ptr(struct unwind_state *state) + return NULL; + } + +-static bool stack_access_ok(struct unwind_state *state, unsigned long addr, ++static bool stack_access_ok(struct unwind_state *state, unsigned long _addr, + size_t len) + { + struct stack_info *info = &state->stack_info; ++ void *addr = (void *)_addr; + +- /* +- * If the address isn't on the current stack, switch to the next one. +- * +- * We may have to traverse multiple stacks to deal with the possibility +- * that info->next_sp could point to an empty stack and the address +- * could be on a subsequent stack. +- */ +- while (!on_stack(info, (void *)addr, len)) +- if (get_stack_info(info->next_sp, state->task, info, +- &state->stack_mask)) +- return false; ++ if (!on_stack(info, addr, len) && ++ (get_stack_info(addr, state->task, info, &state->stack_mask))) ++ return false; + + return true; + } +@@ -283,42 +276,32 @@ static bool deref_stack_reg(struct unwind_state *state, unsigned long addr, + return true; + } + +-#define REGS_SIZE (sizeof(struct pt_regs)) +-#define SP_OFFSET (offsetof(struct pt_regs, sp)) +-#define IRET_REGS_SIZE (REGS_SIZE - offsetof(struct pt_regs, ip)) +-#define IRET_SP_OFFSET (SP_OFFSET - offsetof(struct pt_regs, ip)) +- + static bool deref_stack_regs(struct unwind_state *state, unsigned long addr, +- unsigned long *ip, unsigned long *sp, bool full) ++ unsigned long *ip, unsigned long *sp) + { +- size_t regs_size = full ? REGS_SIZE : IRET_REGS_SIZE; +- size_t sp_offset = full ? 
SP_OFFSET : IRET_SP_OFFSET; +- struct pt_regs *regs = (struct pt_regs *)(addr + regs_size - REGS_SIZE); +- +- if (IS_ENABLED(CONFIG_X86_64)) { +- if (!stack_access_ok(state, addr, regs_size)) +- return false; ++ struct pt_regs *regs = (struct pt_regs *)addr; + +- *ip = regs->ip; +- *sp = regs->sp; ++ /* x86-32 support will be more complicated due to the ®s->sp hack */ ++ BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_32)); + +- return true; +- } +- +- if (!stack_access_ok(state, addr, sp_offset)) ++ if (!stack_access_ok(state, addr, sizeof(struct pt_regs))) + return false; + + *ip = regs->ip; ++ *sp = regs->sp; ++ return true; ++} + +- if (user_mode(regs)) { +- if (!stack_access_ok(state, addr + sp_offset, +- REGS_SIZE - SP_OFFSET)) +- return false; ++static bool deref_stack_iret_regs(struct unwind_state *state, unsigned long addr, ++ unsigned long *ip, unsigned long *sp) ++{ ++ struct pt_regs *regs = (void *)addr - IRET_FRAME_OFFSET; + +- *sp = regs->sp; +- } else +- *sp = (unsigned long)®s->sp; ++ if (!stack_access_ok(state, addr, IRET_FRAME_SIZE)) ++ return false; + ++ *ip = regs->ip; ++ *sp = regs->sp; + return true; + } + +@@ -327,7 +310,6 @@ bool unwind_next_frame(struct unwind_state *state) + unsigned long ip_p, sp, orig_ip, prev_sp = state->sp; + enum stack_type prev_type = state->stack_info.type; + struct orc_entry *orc; +- struct pt_regs *ptregs; + bool indirect = false; + + if (unwind_done(state)) +@@ -435,7 +417,7 @@ bool unwind_next_frame(struct unwind_state *state) + break; + + case ORC_TYPE_REGS: +- if (!deref_stack_regs(state, sp, &state->ip, &state->sp, true)) { ++ if (!deref_stack_regs(state, sp, &state->ip, &state->sp)) { + orc_warn("can't dereference registers at %p for ip %pB\n", + (void *)sp, (void *)orig_ip); + goto done; +@@ -447,20 +429,14 @@ bool unwind_next_frame(struct unwind_state *state) + break; + + case ORC_TYPE_REGS_IRET: +- if (!deref_stack_regs(state, sp, &state->ip, &state->sp, false)) { ++ if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) { + orc_warn("can't dereference iret registers at %p for ip %pB\n", + (void *)sp, (void *)orig_ip); + goto done; + } + +- ptregs = container_of((void *)sp, struct pt_regs, ip); +- if ((unsigned long)ptregs >= prev_sp && +- on_stack(&state->stack_info, ptregs, REGS_SIZE)) { +- state->regs = ptregs; +- state->full_regs = false; +- } else +- state->regs = NULL; +- ++ state->regs = (void *)sp - IRET_FRAME_OFFSET; ++ state->full_regs = false; + state->signal = true; + break; + +@@ -553,8 +529,18 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, + } + + if (get_stack_info((unsigned long *)state->sp, state->task, +- &state->stack_info, &state->stack_mask)) +- return; ++ &state->stack_info, &state->stack_mask)) { ++ /* ++ * We weren't on a valid stack. It's possible that ++ * we overflowed a valid stack into a guard page. ++ * See if the next page up is valid so that we can ++ * generate some kind of backtrace if this happens. 
++ */ ++ void *next_page = (void *)PAGE_ALIGN((unsigned long)state->sp); ++ if (get_stack_info(next_page, state->task, &state->stack_info, ++ &state->stack_mask)) ++ return; ++ } + + /* + * The caller can provide the address of the first frame directly +diff --git a/arch/x86/kernel/verify_cpu.S b/arch/x86/kernel/verify_cpu.S +index 014ea59aa153..3d3c2f71f617 100644 +--- a/arch/x86/kernel/verify_cpu.S ++++ b/arch/x86/kernel/verify_cpu.S +@@ -33,7 +33,7 @@ + #include + #include + +-verify_cpu: ++ENTRY(verify_cpu) + pushf # Save caller passed flags + push $0 # Kill any dangerous flags + popf +@@ -139,3 +139,4 @@ verify_cpu: + popf # Restore caller passed flags + xorl %eax, %eax + ret ++ENDPROC(verify_cpu) +diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c +index 68244742ecb0..5edb27f1a2c4 100644 +--- a/arch/x86/kernel/vm86_32.c ++++ b/arch/x86/kernel/vm86_32.c +@@ -55,6 +55,7 @@ + #include + #include + #include ++#include + + /* + * Known problems: +@@ -94,7 +95,6 @@ + + void save_v86_state(struct kernel_vm86_regs *regs, int retval) + { +- struct tss_struct *tss; + struct task_struct *tsk = current; + struct vm86plus_struct __user *user; + struct vm86 *vm86 = current->thread.vm86; +@@ -146,12 +146,13 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval) + do_exit(SIGSEGV); + } + +- tss = &per_cpu(cpu_tss, get_cpu()); ++ preempt_disable(); + tsk->thread.sp0 = vm86->saved_sp0; + tsk->thread.sysenter_cs = __KERNEL_CS; +- load_sp0(tss, &tsk->thread); ++ update_sp0(tsk); ++ refresh_sysenter_cs(&tsk->thread); + vm86->saved_sp0 = 0; +- put_cpu(); ++ preempt_enable(); + + memcpy(®s->pt, &vm86->regs32, sizeof(struct pt_regs)); + +@@ -237,7 +238,6 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg) + + static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) + { +- struct tss_struct *tss; + struct task_struct *tsk = current; + struct vm86 *vm86 = tsk->thread.vm86; + struct kernel_vm86_regs vm86regs; +@@ -365,15 +365,17 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) + vm86->saved_sp0 = tsk->thread.sp0; + lazy_save_gs(vm86->regs32.gs); + +- tss = &per_cpu(cpu_tss, get_cpu()); + /* make room for real-mode segments */ ++ preempt_disable(); + tsk->thread.sp0 += 16; + +- if (static_cpu_has(X86_FEATURE_SEP)) ++ if (static_cpu_has(X86_FEATURE_SEP)) { + tsk->thread.sysenter_cs = 0; ++ refresh_sysenter_cs(&tsk->thread); ++ } + +- load_sp0(tss, &tsk->thread); +- put_cpu(); ++ update_sp0(tsk); ++ preempt_enable(); + + if (vm86->flags & VM86_SCREEN_BITMAP) + mark_screen_rdonly(tsk->mm); +diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S +index a4009fb9be87..d2a8b5a24a44 100644 +--- a/arch/x86/kernel/vmlinux.lds.S ++++ b/arch/x86/kernel/vmlinux.lds.S +@@ -107,6 +107,15 @@ SECTIONS + SOFTIRQENTRY_TEXT + *(.fixup) + *(.gnu.warning) ++ ++#ifdef CONFIG_X86_64 ++ . = ALIGN(PAGE_SIZE); ++ _entry_trampoline = .; ++ *(.entry_trampoline) ++ . = ALIGN(PAGE_SIZE); ++ ASSERT(. 
- _entry_trampoline == PAGE_SIZE, "entry trampoline is too big"); ++#endif ++ + /* End of text section */ + _etext = .; + } :text = 0x9090 +diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c +index a088b2c47f73..5b2d10c1973a 100644 +--- a/arch/x86/kernel/x86_init.c ++++ b/arch/x86/kernel/x86_init.c +@@ -28,6 +28,8 @@ void x86_init_noop(void) { } + void __init x86_init_uint_noop(unsigned int unused) { } + int __init iommu_init_noop(void) { return 0; } + void iommu_shutdown_noop(void) { } ++bool __init bool_x86_init_noop(void) { return false; } ++void x86_op_int_noop(int cpu) { } + + /* + * The platform setup functions are preset with the default functions +@@ -81,6 +83,12 @@ struct x86_init_ops x86_init __initdata = { + .init_irq = x86_default_pci_init_irq, + .fixup_irqs = x86_default_pci_fixup_irqs, + }, ++ ++ .hyper = { ++ .init_platform = x86_init_noop, ++ .x2apic_available = bool_x86_init_noop, ++ .init_mem_mapping = x86_init_noop, ++ }, + }; + + struct x86_cpuinit_ops x86_cpuinit = { +@@ -101,6 +109,7 @@ struct x86_platform_ops x86_platform __ro_after_init = { + .get_nmi_reason = default_get_nmi_reason, + .save_sched_clock_state = tsc_save_sched_clock_state, + .restore_sched_clock_state = tsc_restore_sched_clock_state, ++ .hyper.pin_vcpu = x86_op_int_noop, + }; + + EXPORT_SYMBOL_GPL(x86_platform); +diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c +index 7a69cf053711..13ebeedcec07 100644 +--- a/arch/x86/kvm/mmu.c ++++ b/arch/x86/kvm/mmu.c +@@ -5476,13 +5476,13 @@ int kvm_mmu_module_init(void) + + pte_list_desc_cache = kmem_cache_create("pte_list_desc", + sizeof(struct pte_list_desc), +- 0, 0, NULL); ++ 0, SLAB_ACCOUNT, NULL); + if (!pte_list_desc_cache) + goto nomem; + + mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header", + sizeof(struct kvm_mmu_page), +- 0, 0, NULL); ++ 0, SLAB_ACCOUNT, NULL); + if (!mmu_page_header_cache) + goto nomem; + +diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c +index bc5921c1e2f2..47d9432756f3 100644 +--- a/arch/x86/kvm/vmx.c ++++ b/arch/x86/kvm/vmx.c +@@ -2295,7 +2295,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) + * processors. See 22.2.4. + */ + vmcs_writel(HOST_TR_BASE, +- (unsigned long)this_cpu_ptr(&cpu_tss)); ++ (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss); + vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt); /* 22.2.4 */ + + /* +diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c +index 553f8fd23cc4..4846eff7e4c8 100644 +--- a/arch/x86/lib/delay.c ++++ b/arch/x86/lib/delay.c +@@ -107,10 +107,10 @@ static void delay_mwaitx(unsigned long __loops) + delay = min_t(u64, MWAITX_MAX_LOOPS, loops); + + /* +- * Use cpu_tss as a cacheline-aligned, seldomly ++ * Use cpu_tss_rw as a cacheline-aligned, seldomly + * accessed per-cpu variable as the monitor target. 
+ */ +- __monitorx(raw_cpu_ptr(&cpu_tss), 0, 0); ++ __monitorx(raw_cpu_ptr(&cpu_tss_rw), 0, 0); + + /* + * AMD, like Intel, supports the EAX hint and EAX=0xf +diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c +index b0ff378650a9..3109ba6c6ede 100644 +--- a/arch/x86/mm/fault.c ++++ b/arch/x86/mm/fault.c +@@ -29,26 +29,6 @@ + #define CREATE_TRACE_POINTS + #include + +-/* +- * Page fault error code bits: +- * +- * bit 0 == 0: no page found 1: protection fault +- * bit 1 == 0: read access 1: write access +- * bit 2 == 0: kernel-mode access 1: user-mode access +- * bit 3 == 1: use of reserved bit detected +- * bit 4 == 1: fault was an instruction fetch +- * bit 5 == 1: protection keys block access +- */ +-enum x86_pf_error_code { +- +- PF_PROT = 1 << 0, +- PF_WRITE = 1 << 1, +- PF_USER = 1 << 2, +- PF_RSVD = 1 << 3, +- PF_INSTR = 1 << 4, +- PF_PK = 1 << 5, +-}; +- + /* + * Returns 0 if mmiotrace is disabled, or if the fault is not + * handled by mmiotrace: +@@ -150,7 +130,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) + * If it was a exec (instruction fetch) fault on NX page, then + * do not ignore the fault: + */ +- if (error_code & PF_INSTR) ++ if (error_code & X86_PF_INSTR) + return 0; + + instr = (void *)convert_ip_to_linear(current, regs); +@@ -180,7 +160,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) + * siginfo so userspace can discover which protection key was set + * on the PTE. + * +- * If we get here, we know that the hardware signaled a PF_PK ++ * If we get here, we know that the hardware signaled a X86_PF_PK + * fault and that there was a VMA once we got in the fault + * handler. It does *not* guarantee that the VMA we find here + * was the one that we faulted on. +@@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info, u32 *pkey) + /* + * force_sig_info_fault() is called from a number of + * contexts, some of which have a VMA and some of which +- * do not. The PF_PK handing happens after we have a ++ * do not. The X86_PF_PK handing happens after we have a + * valid VMA, so we should never reach this without a + * valid VMA. + */ +@@ -698,7 +678,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, + if (!oops_may_print()) + return; + +- if (error_code & PF_INSTR) { ++ if (error_code & X86_PF_INSTR) { + unsigned int level; + pgd_t *pgd; + pte_t *pte; +@@ -780,7 +760,7 @@ no_context(struct pt_regs *regs, unsigned long error_code, + */ + if (current->thread.sig_on_uaccess_err && signal) { + tsk->thread.trap_nr = X86_TRAP_PF; +- tsk->thread.error_code = error_code | PF_USER; ++ tsk->thread.error_code = error_code | X86_PF_USER; + tsk->thread.cr2 = address; + + /* XXX: hwpoison faults will set the wrong code. */ +@@ -898,7 +878,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, + struct task_struct *tsk = current; + + /* User mode accesses just cause a SIGSEGV */ +- if (error_code & PF_USER) { ++ if (error_code & X86_PF_USER) { + /* + * It's possible to have interrupts off here: + */ +@@ -919,7 +899,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, + * Instruction fetch faults in the vsyscall page might need + * emulation. 
+ */ +- if (unlikely((error_code & PF_INSTR) && ++ if (unlikely((error_code & X86_PF_INSTR) && + ((address & ~0xfff) == VSYSCALL_ADDR))) { + if (emulate_vsyscall(regs, address)) + return; +@@ -932,7 +912,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, + * are always protection faults. + */ + if (address >= TASK_SIZE_MAX) +- error_code |= PF_PROT; ++ error_code |= X86_PF_PROT; + + if (likely(show_unhandled_signals)) + show_signal_msg(regs, error_code, address, tsk); +@@ -993,11 +973,11 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code, + + if (!boot_cpu_has(X86_FEATURE_OSPKE)) + return false; +- if (error_code & PF_PK) ++ if (error_code & X86_PF_PK) + return true; + /* this checks permission keys on the VMA: */ +- if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE), +- (error_code & PF_INSTR), foreign)) ++ if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE), ++ (error_code & X86_PF_INSTR), foreign)) + return true; + return false; + } +@@ -1025,7 +1005,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, + int code = BUS_ADRERR; + + /* Kernel mode? Handle exceptions or die: */ +- if (!(error_code & PF_USER)) { ++ if (!(error_code & X86_PF_USER)) { + no_context(regs, error_code, address, SIGBUS, BUS_ADRERR); + return; + } +@@ -1053,14 +1033,14 @@ static noinline void + mm_fault_error(struct pt_regs *regs, unsigned long error_code, + unsigned long address, u32 *pkey, unsigned int fault) + { +- if (fatal_signal_pending(current) && !(error_code & PF_USER)) { ++ if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) { + no_context(regs, error_code, address, 0, 0); + return; + } + + if (fault & VM_FAULT_OOM) { + /* Kernel mode? Handle exceptions or die: */ +- if (!(error_code & PF_USER)) { ++ if (!(error_code & X86_PF_USER)) { + no_context(regs, error_code, address, + SIGSEGV, SEGV_MAPERR); + return; +@@ -1085,16 +1065,16 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, + + static int spurious_fault_check(unsigned long error_code, pte_t *pte) + { +- if ((error_code & PF_WRITE) && !pte_write(*pte)) ++ if ((error_code & X86_PF_WRITE) && !pte_write(*pte)) + return 0; + +- if ((error_code & PF_INSTR) && !pte_exec(*pte)) ++ if ((error_code & X86_PF_INSTR) && !pte_exec(*pte)) + return 0; + /* + * Note: We do not do lazy flushing on protection key +- * changes, so no spurious fault will ever set PF_PK. ++ * changes, so no spurious fault will ever set X86_PF_PK. + */ +- if ((error_code & PF_PK)) ++ if ((error_code & X86_PF_PK)) + return 1; + + return 1; +@@ -1140,8 +1120,8 @@ spurious_fault(unsigned long error_code, unsigned long address) + * change, so user accesses are not expected to cause spurious + * faults. + */ +- if (error_code != (PF_WRITE | PF_PROT) +- && error_code != (PF_INSTR | PF_PROT)) ++ if (error_code != (X86_PF_WRITE | X86_PF_PROT) && ++ error_code != (X86_PF_INSTR | X86_PF_PROT)) + return 0; + + pgd = init_mm.pgd + pgd_index(address); +@@ -1201,19 +1181,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) + * always an unconditional error and can never result in + * a follow-up action to resolve the fault, like a COW. + */ +- if (error_code & PF_PK) ++ if (error_code & X86_PF_PK) + return 1; + + /* + * Make sure to check the VMA so that we do not perform +- * faults just to hit a PF_PK as soon as we fill in a ++ * faults just to hit a X86_PF_PK as soon as we fill in a + * page. 
+ */ +- if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE), +- (error_code & PF_INSTR), foreign)) ++ if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE), ++ (error_code & X86_PF_INSTR), foreign)) + return 1; + +- if (error_code & PF_WRITE) { ++ if (error_code & X86_PF_WRITE) { + /* write, present and write, not present: */ + if (unlikely(!(vma->vm_flags & VM_WRITE))) + return 1; +@@ -1221,7 +1201,7 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) + } + + /* read, present: */ +- if (unlikely(error_code & PF_PROT)) ++ if (unlikely(error_code & X86_PF_PROT)) + return 1; + + /* read, not present: */ +@@ -1244,7 +1224,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs) + if (!static_cpu_has(X86_FEATURE_SMAP)) + return false; + +- if (error_code & PF_USER) ++ if (error_code & X86_PF_USER) + return false; + + if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC)) +@@ -1297,7 +1277,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, + * protection error (error_code & 9) == 0. + */ + if (unlikely(fault_in_kernel_space(address))) { +- if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) { ++ if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) { + if (vmalloc_fault(address) >= 0) + return; + +@@ -1325,7 +1305,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, + if (unlikely(kprobes_fault(regs))) + return; + +- if (unlikely(error_code & PF_RSVD)) ++ if (unlikely(error_code & X86_PF_RSVD)) + pgtable_bad(regs, error_code, address); + + if (unlikely(smap_violation(error_code, regs))) { +@@ -1351,7 +1331,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, + */ + if (user_mode(regs)) { + local_irq_enable(); +- error_code |= PF_USER; ++ error_code |= X86_PF_USER; + flags |= FAULT_FLAG_USER; + } else { + if (regs->flags & X86_EFLAGS_IF) +@@ -1360,9 +1340,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, + + perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); + +- if (error_code & PF_WRITE) ++ if (error_code & X86_PF_WRITE) + flags |= FAULT_FLAG_WRITE; +- if (error_code & PF_INSTR) ++ if (error_code & X86_PF_INSTR) + flags |= FAULT_FLAG_INSTRUCTION; + + /* +@@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, + * space check, thus avoiding the deadlock: + */ + if (unlikely(!down_read_trylock(&mm->mmap_sem))) { +- if ((error_code & PF_USER) == 0 && ++ if (!(error_code & X86_PF_USER) && + !search_exception_tables(regs->ip)) { + bad_area_nosemaphore(regs, error_code, address, NULL); + return; +@@ -1409,7 +1389,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, + bad_area(regs, error_code, address); + return; + } +- if (error_code & PF_USER) { ++ if (error_code & X86_PF_USER) { + /* + * Accessing the stack below %sp is always a bug. 
+ * The large cushion allows instructions like enter +diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c +index af5c1ed21d43..a22c2b95e513 100644 +--- a/arch/x86/mm/init.c ++++ b/arch/x86/mm/init.c +@@ -671,7 +671,7 @@ void __init init_mem_mapping(void) + load_cr3(swapper_pg_dir); + __flush_tlb_all(); + +- hypervisor_init_mem_mapping(); ++ x86_init.hyper.init_mem_mapping(); + + early_memtest(0, max_pfn_mapped << PAGE_SHIFT); + } +diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c +index 048fbe8fc274..adcea90a2046 100644 +--- a/arch/x86/mm/init_64.c ++++ b/arch/x86/mm/init_64.c +@@ -1426,16 +1426,16 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node) + + #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HAVE_BOOTMEM_INFO_NODE) + void register_page_bootmem_memmap(unsigned long section_nr, +- struct page *start_page, unsigned long size) ++ struct page *start_page, unsigned long nr_pages) + { + unsigned long addr = (unsigned long)start_page; +- unsigned long end = (unsigned long)(start_page + size); ++ unsigned long end = (unsigned long)(start_page + nr_pages); + unsigned long next; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; +- unsigned int nr_pages; ++ unsigned int nr_pmd_pages; + struct page *page; + + for (; addr < end; addr = next) { +@@ -1482,9 +1482,9 @@ void register_page_bootmem_memmap(unsigned long section_nr, + if (pmd_none(*pmd)) + continue; + +- nr_pages = 1 << (get_order(PMD_SIZE)); ++ nr_pmd_pages = 1 << get_order(PMD_SIZE); + page = pmd_page(*pmd); +- while (nr_pages--) ++ while (nr_pmd_pages--) + get_page_bootmem(section_nr, page++, + SECTION_INFO); + } +diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c +index 8f5be3eb40dd..9ec70d780f1f 100644 +--- a/arch/x86/mm/kasan_init_64.c ++++ b/arch/x86/mm/kasan_init_64.c +@@ -4,19 +4,150 @@ + #include + #include + #include ++#include + #include + #include + #include + #include + + #include ++#include + #include + #include + #include + + extern struct range pfn_mapped[E820_MAX_ENTRIES]; + +-static int __init map_range(struct range *range) ++static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE); ++ ++static __init void *early_alloc(size_t size, int nid) ++{ ++ return memblock_virt_alloc_try_nid_nopanic(size, size, ++ __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid); ++} ++ ++static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr, ++ unsigned long end, int nid) ++{ ++ pte_t *pte; ++ ++ if (pmd_none(*pmd)) { ++ void *p; ++ ++ if (boot_cpu_has(X86_FEATURE_PSE) && ++ ((end - addr) == PMD_SIZE) && ++ IS_ALIGNED(addr, PMD_SIZE)) { ++ p = early_alloc(PMD_SIZE, nid); ++ if (p && pmd_set_huge(pmd, __pa(p), PAGE_KERNEL)) ++ return; ++ else if (p) ++ memblock_free(__pa(p), PMD_SIZE); ++ } ++ ++ p = early_alloc(PAGE_SIZE, nid); ++ pmd_populate_kernel(&init_mm, pmd, p); ++ } ++ ++ pte = pte_offset_kernel(pmd, addr); ++ do { ++ pte_t entry; ++ void *p; ++ ++ if (!pte_none(*pte)) ++ continue; ++ ++ p = early_alloc(PAGE_SIZE, nid); ++ entry = pfn_pte(PFN_DOWN(__pa(p)), PAGE_KERNEL); ++ set_pte_at(&init_mm, addr, pte, entry); ++ } while (pte++, addr += PAGE_SIZE, addr != end); ++} ++ ++static void __init kasan_populate_pud(pud_t *pud, unsigned long addr, ++ unsigned long end, int nid) ++{ ++ pmd_t *pmd; ++ unsigned long next; ++ ++ if (pud_none(*pud)) { ++ void *p; ++ ++ if (boot_cpu_has(X86_FEATURE_GBPAGES) && ++ ((end - addr) == PUD_SIZE) && ++ IS_ALIGNED(addr, PUD_SIZE)) { ++ p = early_alloc(PUD_SIZE, nid); ++ if (p && 
pud_set_huge(pud, __pa(p), PAGE_KERNEL)) ++ return; ++ else if (p) ++ memblock_free(__pa(p), PUD_SIZE); ++ } ++ ++ p = early_alloc(PAGE_SIZE, nid); ++ pud_populate(&init_mm, pud, p); ++ } ++ ++ pmd = pmd_offset(pud, addr); ++ do { ++ next = pmd_addr_end(addr, end); ++ if (!pmd_large(*pmd)) ++ kasan_populate_pmd(pmd, addr, next, nid); ++ } while (pmd++, addr = next, addr != end); ++} ++ ++static void __init kasan_populate_p4d(p4d_t *p4d, unsigned long addr, ++ unsigned long end, int nid) ++{ ++ pud_t *pud; ++ unsigned long next; ++ ++ if (p4d_none(*p4d)) { ++ void *p = early_alloc(PAGE_SIZE, nid); ++ ++ p4d_populate(&init_mm, p4d, p); ++ } ++ ++ pud = pud_offset(p4d, addr); ++ do { ++ next = pud_addr_end(addr, end); ++ if (!pud_large(*pud)) ++ kasan_populate_pud(pud, addr, next, nid); ++ } while (pud++, addr = next, addr != end); ++} ++ ++static void __init kasan_populate_pgd(pgd_t *pgd, unsigned long addr, ++ unsigned long end, int nid) ++{ ++ void *p; ++ p4d_t *p4d; ++ unsigned long next; ++ ++ if (pgd_none(*pgd)) { ++ p = early_alloc(PAGE_SIZE, nid); ++ pgd_populate(&init_mm, pgd, p); ++ } ++ ++ p4d = p4d_offset(pgd, addr); ++ do { ++ next = p4d_addr_end(addr, end); ++ kasan_populate_p4d(p4d, addr, next, nid); ++ } while (p4d++, addr = next, addr != end); ++} ++ ++static void __init kasan_populate_shadow(unsigned long addr, unsigned long end, ++ int nid) ++{ ++ pgd_t *pgd; ++ unsigned long next; ++ ++ addr = addr & PAGE_MASK; ++ end = round_up(end, PAGE_SIZE); ++ pgd = pgd_offset_k(addr); ++ do { ++ next = pgd_addr_end(addr, end); ++ kasan_populate_pgd(pgd, addr, next, nid); ++ } while (pgd++, addr = next, addr != end); ++} ++ ++static void __init map_range(struct range *range) + { + unsigned long start; + unsigned long end; +@@ -24,15 +155,17 @@ static int __init map_range(struct range *range) + start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start)); + end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end)); + +- return vmemmap_populate(start, end, NUMA_NO_NODE); ++ kasan_populate_shadow(start, end, early_pfn_to_nid(range->start)); + } + + static void __init clear_pgds(unsigned long start, + unsigned long end) + { + pgd_t *pgd; ++ /* See comment in kasan_init() */ ++ unsigned long pgd_end = end & PGDIR_MASK; + +- for (; start < end; start += PGDIR_SIZE) { ++ for (; start < pgd_end; start += PGDIR_SIZE) { + pgd = pgd_offset_k(start); + /* + * With folded p4d, pgd_clear() is nop, use p4d_clear() +@@ -43,29 +176,61 @@ static void __init clear_pgds(unsigned long start, + else + pgd_clear(pgd); + } ++ ++ pgd = pgd_offset_k(start); ++ for (; start < end; start += P4D_SIZE) ++ p4d_clear(p4d_offset(pgd, start)); ++} ++ ++static inline p4d_t *early_p4d_offset(pgd_t *pgd, unsigned long addr) ++{ ++ unsigned long p4d; ++ ++ if (!IS_ENABLED(CONFIG_X86_5LEVEL)) ++ return (p4d_t *)pgd; ++ ++ p4d = __pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK; ++ p4d += __START_KERNEL_map - phys_base; ++ return (p4d_t *)p4d + p4d_index(addr); ++} ++ ++static void __init kasan_early_p4d_populate(pgd_t *pgd, ++ unsigned long addr, ++ unsigned long end) ++{ ++ pgd_t pgd_entry; ++ p4d_t *p4d, p4d_entry; ++ unsigned long next; ++ ++ if (pgd_none(*pgd)) { ++ pgd_entry = __pgd(_KERNPG_TABLE | __pa_nodebug(kasan_zero_p4d)); ++ set_pgd(pgd, pgd_entry); ++ } ++ ++ p4d = early_p4d_offset(pgd, addr); ++ do { ++ next = p4d_addr_end(addr, end); ++ ++ if (!p4d_none(*p4d)) ++ continue; ++ ++ p4d_entry = __p4d(_KERNPG_TABLE | __pa_nodebug(kasan_zero_pud)); ++ set_p4d(p4d, p4d_entry); ++ } while (p4d++, addr = 
next, addr != end && p4d_none(*p4d)); + } + + static void __init kasan_map_early_shadow(pgd_t *pgd) + { +- int i; +- unsigned long start = KASAN_SHADOW_START; ++ /* See comment in kasan_init() */ ++ unsigned long addr = KASAN_SHADOW_START & PGDIR_MASK; + unsigned long end = KASAN_SHADOW_END; ++ unsigned long next; + +- for (i = pgd_index(start); start < end; i++) { +- switch (CONFIG_PGTABLE_LEVELS) { +- case 4: +- pgd[i] = __pgd(__pa_nodebug(kasan_zero_pud) | +- _KERNPG_TABLE); +- break; +- case 5: +- pgd[i] = __pgd(__pa_nodebug(kasan_zero_p4d) | +- _KERNPG_TABLE); +- break; +- default: +- BUILD_BUG(); +- } +- start += PGDIR_SIZE; +- } ++ pgd += pgd_index(addr); ++ do { ++ next = pgd_addr_end(addr, end); ++ kasan_early_p4d_populate(pgd, addr, next); ++ } while (pgd++, addr = next, addr != end); + } + + #ifdef CONFIG_KASAN_INLINE +@@ -102,7 +267,7 @@ void __init kasan_early_init(void) + for (i = 0; i < PTRS_PER_PUD; i++) + kasan_zero_pud[i] = __pud(pud_val); + +- for (i = 0; CONFIG_PGTABLE_LEVELS >= 5 && i < PTRS_PER_P4D; i++) ++ for (i = 0; IS_ENABLED(CONFIG_X86_5LEVEL) && i < PTRS_PER_P4D; i++) + kasan_zero_p4d[i] = __p4d(p4d_val); + + kasan_map_early_shadow(early_top_pgt); +@@ -112,37 +277,76 @@ void __init kasan_early_init(void) + void __init kasan_init(void) + { + int i; ++ void *shadow_cpu_entry_begin, *shadow_cpu_entry_end; + + #ifdef CONFIG_KASAN_INLINE + register_die_notifier(&kasan_die_notifier); + #endif + + memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt)); ++ ++ /* ++ * We use the same shadow offset for 4- and 5-level paging to ++ * facilitate boot-time switching between paging modes. ++ * As result in 5-level paging mode KASAN_SHADOW_START and ++ * KASAN_SHADOW_END are not aligned to PGD boundary. ++ * ++ * KASAN_SHADOW_START doesn't share PGD with anything else. ++ * We claim whole PGD entry to make things easier. ++ * ++ * KASAN_SHADOW_END lands in the last PGD entry and it collides with ++ * bunch of things like kernel code, modules, EFI mapping, etc. ++ * We need to take extra steps to not overwrite them. 
++ */ ++ if (IS_ENABLED(CONFIG_X86_5LEVEL)) { ++ void *ptr; ++ ++ ptr = (void *)pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_END)); ++ memcpy(tmp_p4d_table, (void *)ptr, sizeof(tmp_p4d_table)); ++ set_pgd(&early_top_pgt[pgd_index(KASAN_SHADOW_END)], ++ __pgd(__pa(tmp_p4d_table) | _KERNPG_TABLE)); ++ } ++ + load_cr3(early_top_pgt); + __flush_tlb_all(); + +- clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); ++ clear_pgds(KASAN_SHADOW_START & PGDIR_MASK, KASAN_SHADOW_END); + +- kasan_populate_zero_shadow((void *)KASAN_SHADOW_START, ++ kasan_populate_zero_shadow((void *)(KASAN_SHADOW_START & PGDIR_MASK), + kasan_mem_to_shadow((void *)PAGE_OFFSET)); + + for (i = 0; i < E820_MAX_ENTRIES; i++) { + if (pfn_mapped[i].end == 0) + break; + +- if (map_range(&pfn_mapped[i])) +- panic("kasan: unable to allocate shadow!"); ++ map_range(&pfn_mapped[i]); + } ++ + kasan_populate_zero_shadow( + kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), + kasan_mem_to_shadow((void *)__START_KERNEL_map)); + +- vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext), +- (unsigned long)kasan_mem_to_shadow(_end), +- NUMA_NO_NODE); ++ kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext), ++ (unsigned long)kasan_mem_to_shadow(_end), ++ early_pfn_to_nid(__pa(_stext))); ++ ++ shadow_cpu_entry_begin = (void *)__fix_to_virt(FIX_CPU_ENTRY_AREA_BOTTOM); ++ shadow_cpu_entry_begin = kasan_mem_to_shadow(shadow_cpu_entry_begin); ++ shadow_cpu_entry_begin = (void *)round_down((unsigned long)shadow_cpu_entry_begin, ++ PAGE_SIZE); ++ ++ shadow_cpu_entry_end = (void *)(__fix_to_virt(FIX_CPU_ENTRY_AREA_TOP) + PAGE_SIZE); ++ shadow_cpu_entry_end = kasan_mem_to_shadow(shadow_cpu_entry_end); ++ shadow_cpu_entry_end = (void *)round_up((unsigned long)shadow_cpu_entry_end, ++ PAGE_SIZE); + + kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), +- (void *)KASAN_SHADOW_END); ++ shadow_cpu_entry_begin); ++ ++ kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin, ++ (unsigned long)shadow_cpu_entry_end, 0); ++ ++ kasan_populate_zero_shadow(shadow_cpu_entry_end, (void *)KASAN_SHADOW_END); + + load_cr3(init_top_pgt); + __flush_tlb_all(); +diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c +index 84fcfde53f8f..04d5157fe7f8 100644 +--- a/arch/x86/power/cpu.c ++++ b/arch/x86/power/cpu.c +@@ -160,17 +160,19 @@ static void do_fpu_end(void) + static void fix_processor_context(void) + { + int cpu = smp_processor_id(); +- struct tss_struct *t = &per_cpu(cpu_tss, cpu); + #ifdef CONFIG_X86_64 + struct desc_struct *desc = get_cpu_gdt_rw(cpu); + tss_desc tss; + #endif +- set_tss_desc(cpu, t); /* +- * This just modifies memory; should not be +- * necessary. But... This is necessary, because +- * 386 hardware has concept of busy TSS or some +- * similar stupidity. +- */ ++ ++ /* ++ * We need to reload TR, which requires that we change the ++ * GDT entry to indicate "available" first. ++ * ++ * XXX: This could probably all be replaced by a call to ++ * force_reload_TR(). 
++ */ ++ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); + + #ifdef CONFIG_X86_64 + memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc)); +diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c +index de503c225ae1..754d5391d9fa 100644 +--- a/arch/x86/xen/enlighten_hvm.c ++++ b/arch/x86/xen/enlighten_hvm.c +@@ -226,12 +226,12 @@ static uint32_t __init xen_platform_hvm(void) + return xen_cpuid_base(); + } + +-const struct hypervisor_x86 x86_hyper_xen_hvm = { ++const __initconst struct hypervisor_x86 x86_hyper_xen_hvm = { + .name = "Xen HVM", + .detect = xen_platform_hvm, +- .init_platform = xen_hvm_guest_init, +- .pin_vcpu = xen_pin_vcpu, +- .x2apic_available = xen_x2apic_para_available, +- .init_mem_mapping = xen_hvm_init_mem_mapping, ++ .type = X86_HYPER_XEN_HVM, ++ .init.init_platform = xen_hvm_guest_init, ++ .init.x2apic_available = xen_x2apic_para_available, ++ .init.init_mem_mapping = xen_hvm_init_mem_mapping, ++ .runtime.pin_vcpu = xen_pin_vcpu, + }; +-EXPORT_SYMBOL(x86_hyper_xen_hvm); +diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c +index d4396e27b1fb..ae3a071e1d0f 100644 +--- a/arch/x86/xen/enlighten_pv.c ++++ b/arch/x86/xen/enlighten_pv.c +@@ -601,7 +601,7 @@ static struct trap_array_entry trap_array[] = { + #ifdef CONFIG_X86_MCE + { machine_check, xen_machine_check, true }, + #endif +- { nmi, xen_nmi, true }, ++ { nmi, xen_xennmi, true }, + { overflow, xen_overflow, false }, + #ifdef CONFIG_IA32_EMULATION + { entry_INT80_compat, xen_entry_INT80_compat, false }, +@@ -811,15 +811,14 @@ static void __init xen_write_gdt_entry_boot(struct desc_struct *dt, int entry, + } + } + +-static void xen_load_sp0(struct tss_struct *tss, +- struct thread_struct *thread) ++static void xen_load_sp0(unsigned long sp0) + { + struct multicall_space mcs; + + mcs = xen_mc_entry(0); +- MULTI_stack_switch(mcs.mc, __KERNEL_DS, thread->sp0); ++ MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0); + xen_mc_issue(PARAVIRT_LAZY_CPU); +- tss->x86_tss.sp0 = thread->sp0; ++ this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0); + } + + void xen_set_iopl_mask(unsigned mask) +@@ -1460,9 +1459,9 @@ static uint32_t __init xen_platform_pv(void) + return 0; + } + +-const struct hypervisor_x86 x86_hyper_xen_pv = { ++const __initconst struct hypervisor_x86 x86_hyper_xen_pv = { + .name = "Xen PV", + .detect = xen_platform_pv, +- .pin_vcpu = xen_pin_vcpu, ++ .type = X86_HYPER_XEN_PV, ++ .runtime.pin_vcpu = xen_pin_vcpu, + }; +-EXPORT_SYMBOL(x86_hyper_xen_pv); +diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c +index 71495f1a86d7..c2454237fa67 100644 +--- a/arch/x86/xen/mmu_pv.c ++++ b/arch/x86/xen/mmu_pv.c +@@ -449,7 +449,7 @@ __visible pmd_t xen_make_pmd(pmdval_t pmd) + } + PV_CALLEE_SAVE_REGS_THUNK(xen_make_pmd); + +-#if CONFIG_PGTABLE_LEVELS == 4 ++#ifdef CONFIG_X86_64 + __visible pudval_t xen_pud_val(pud_t pud) + { + return pte_mfn_to_pfn(pud.pud); +@@ -538,7 +538,7 @@ static void xen_set_p4d(p4d_t *ptr, p4d_t val) + + xen_mc_issue(PARAVIRT_LAZY_MMU); + } +-#endif /* CONFIG_PGTABLE_LEVELS == 4 */ ++#endif /* CONFIG_X86_64 */ + + static int xen_pmd_walk(struct mm_struct *mm, pmd_t *pmd, + int (*func)(struct mm_struct *mm, struct page *, enum pt_level), +@@ -580,21 +580,17 @@ static int xen_p4d_walk(struct mm_struct *mm, p4d_t *p4d, + int (*func)(struct mm_struct *mm, struct page *, enum pt_level), + bool last, unsigned long limit) + { +- int i, nr, flush = 0; ++ int flush = 0; ++ pud_t *pud; + +- nr = last ? 
p4d_index(limit) + 1 : PTRS_PER_P4D; +- for (i = 0; i < nr; i++) { +- pud_t *pud; + +- if (p4d_none(p4d[i])) +- continue; ++ if (p4d_none(*p4d)) ++ return flush; + +- pud = pud_offset(&p4d[i], 0); +- if (PTRS_PER_PUD > 1) +- flush |= (*func)(mm, virt_to_page(pud), PT_PUD); +- flush |= xen_pud_walk(mm, pud, func, +- last && i == nr - 1, limit); +- } ++ pud = pud_offset(p4d, 0); ++ if (PTRS_PER_PUD > 1) ++ flush |= (*func)(mm, virt_to_page(pud), PT_PUD); ++ flush |= xen_pud_walk(mm, pud, func, last, limit); + return flush; + } + +@@ -644,8 +640,6 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd, + continue; + + p4d = p4d_offset(&pgd[i], 0); +- if (PTRS_PER_P4D > 1) +- flush |= (*func)(mm, virt_to_page(p4d), PT_P4D); + flush |= xen_p4d_walk(mm, p4d, func, i == nr - 1, limit); + } + +@@ -1176,22 +1170,14 @@ static void __init xen_cleanmfnmap(unsigned long vaddr) + { + pgd_t *pgd; + p4d_t *p4d; +- unsigned int i; + bool unpin; + + unpin = (vaddr == 2 * PGDIR_SIZE); + vaddr &= PMD_MASK; + pgd = pgd_offset_k(vaddr); + p4d = p4d_offset(pgd, 0); +- for (i = 0; i < PTRS_PER_P4D; i++) { +- if (p4d_none(p4d[i])) +- continue; +- xen_cleanmfnmap_p4d(p4d + i, unpin); +- } +- if (IS_ENABLED(CONFIG_X86_5LEVEL)) { +- set_pgd(pgd, __pgd(0)); +- xen_cleanmfnmap_free_pgtbl(p4d, unpin); +- } ++ if (!p4d_none(*p4d)) ++ xen_cleanmfnmap_p4d(p4d, unpin); + } + + static void __init xen_pagetable_p2m_free(void) +@@ -1692,7 +1678,7 @@ static void xen_release_pmd(unsigned long pfn) + xen_release_ptpage(pfn, PT_PMD); + } + +-#if CONFIG_PGTABLE_LEVELS >= 4 ++#ifdef CONFIG_X86_64 + static void xen_alloc_pud(struct mm_struct *mm, unsigned long pfn) + { + xen_alloc_ptpage(mm, pfn, PT_PUD); +@@ -2029,13 +2015,12 @@ static phys_addr_t __init xen_early_virt_to_phys(unsigned long vaddr) + */ + void __init xen_relocate_p2m(void) + { +- phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys, p4d_phys; ++ phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys; + unsigned long p2m_pfn, p2m_pfn_end, n_frames, pfn, pfn_end; +- int n_pte, n_pt, n_pmd, n_pud, n_p4d, idx_pte, idx_pt, idx_pmd, idx_pud, idx_p4d; ++ int n_pte, n_pt, n_pmd, n_pud, idx_pte, idx_pt, idx_pmd, idx_pud; + pte_t *pt; + pmd_t *pmd; + pud_t *pud; +- p4d_t *p4d = NULL; + pgd_t *pgd; + unsigned long *new_p2m; + int save_pud; +@@ -2045,11 +2030,7 @@ void __init xen_relocate_p2m(void) + n_pt = roundup(size, PMD_SIZE) >> PMD_SHIFT; + n_pmd = roundup(size, PUD_SIZE) >> PUD_SHIFT; + n_pud = roundup(size, P4D_SIZE) >> P4D_SHIFT; +- if (PTRS_PER_P4D > 1) +- n_p4d = roundup(size, PGDIR_SIZE) >> PGDIR_SHIFT; +- else +- n_p4d = 0; +- n_frames = n_pte + n_pt + n_pmd + n_pud + n_p4d; ++ n_frames = n_pte + n_pt + n_pmd + n_pud; + + new_area = xen_find_free_area(PFN_PHYS(n_frames)); + if (!new_area) { +@@ -2065,76 +2046,56 @@ void __init xen_relocate_p2m(void) + * To avoid any possible virtual address collision, just use + * 2 * PUD_SIZE for the new area. 
+ */ +- p4d_phys = new_area; +- pud_phys = p4d_phys + PFN_PHYS(n_p4d); ++ pud_phys = new_area; + pmd_phys = pud_phys + PFN_PHYS(n_pud); + pt_phys = pmd_phys + PFN_PHYS(n_pmd); + p2m_pfn = PFN_DOWN(pt_phys) + n_pt; + + pgd = __va(read_cr3_pa()); + new_p2m = (unsigned long *)(2 * PGDIR_SIZE); +- idx_p4d = 0; + save_pud = n_pud; +- do { +- if (n_p4d > 0) { +- p4d = early_memremap(p4d_phys, PAGE_SIZE); +- clear_page(p4d); +- n_pud = min(save_pud, PTRS_PER_P4D); +- } +- for (idx_pud = 0; idx_pud < n_pud; idx_pud++) { +- pud = early_memremap(pud_phys, PAGE_SIZE); +- clear_page(pud); +- for (idx_pmd = 0; idx_pmd < min(n_pmd, PTRS_PER_PUD); +- idx_pmd++) { +- pmd = early_memremap(pmd_phys, PAGE_SIZE); +- clear_page(pmd); +- for (idx_pt = 0; idx_pt < min(n_pt, PTRS_PER_PMD); +- idx_pt++) { +- pt = early_memremap(pt_phys, PAGE_SIZE); +- clear_page(pt); +- for (idx_pte = 0; +- idx_pte < min(n_pte, PTRS_PER_PTE); +- idx_pte++) { +- set_pte(pt + idx_pte, +- pfn_pte(p2m_pfn, PAGE_KERNEL)); +- p2m_pfn++; +- } +- n_pte -= PTRS_PER_PTE; +- early_memunmap(pt, PAGE_SIZE); +- make_lowmem_page_readonly(__va(pt_phys)); +- pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, +- PFN_DOWN(pt_phys)); +- set_pmd(pmd + idx_pt, +- __pmd(_PAGE_TABLE | pt_phys)); +- pt_phys += PAGE_SIZE; ++ for (idx_pud = 0; idx_pud < n_pud; idx_pud++) { ++ pud = early_memremap(pud_phys, PAGE_SIZE); ++ clear_page(pud); ++ for (idx_pmd = 0; idx_pmd < min(n_pmd, PTRS_PER_PUD); ++ idx_pmd++) { ++ pmd = early_memremap(pmd_phys, PAGE_SIZE); ++ clear_page(pmd); ++ for (idx_pt = 0; idx_pt < min(n_pt, PTRS_PER_PMD); ++ idx_pt++) { ++ pt = early_memremap(pt_phys, PAGE_SIZE); ++ clear_page(pt); ++ for (idx_pte = 0; ++ idx_pte < min(n_pte, PTRS_PER_PTE); ++ idx_pte++) { ++ set_pte(pt + idx_pte, ++ pfn_pte(p2m_pfn, PAGE_KERNEL)); ++ p2m_pfn++; + } +- n_pt -= PTRS_PER_PMD; +- early_memunmap(pmd, PAGE_SIZE); +- make_lowmem_page_readonly(__va(pmd_phys)); +- pin_pagetable_pfn(MMUEXT_PIN_L2_TABLE, +- PFN_DOWN(pmd_phys)); +- set_pud(pud + idx_pmd, __pud(_PAGE_TABLE | pmd_phys)); +- pmd_phys += PAGE_SIZE; ++ n_pte -= PTRS_PER_PTE; ++ early_memunmap(pt, PAGE_SIZE); ++ make_lowmem_page_readonly(__va(pt_phys)); ++ pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, ++ PFN_DOWN(pt_phys)); ++ set_pmd(pmd + idx_pt, ++ __pmd(_PAGE_TABLE | pt_phys)); ++ pt_phys += PAGE_SIZE; + } +- n_pmd -= PTRS_PER_PUD; +- early_memunmap(pud, PAGE_SIZE); +- make_lowmem_page_readonly(__va(pud_phys)); +- pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(pud_phys)); +- if (n_p4d > 0) +- set_p4d(p4d + idx_pud, __p4d(_PAGE_TABLE | pud_phys)); +- else +- set_pgd(pgd + 2 + idx_pud, __pgd(_PAGE_TABLE | pud_phys)); +- pud_phys += PAGE_SIZE; +- } +- if (n_p4d > 0) { +- save_pud -= PTRS_PER_P4D; +- early_memunmap(p4d, PAGE_SIZE); +- make_lowmem_page_readonly(__va(p4d_phys)); +- pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE, PFN_DOWN(p4d_phys)); +- set_pgd(pgd + 2 + idx_p4d, __pgd(_PAGE_TABLE | p4d_phys)); +- p4d_phys += PAGE_SIZE; ++ n_pt -= PTRS_PER_PMD; ++ early_memunmap(pmd, PAGE_SIZE); ++ make_lowmem_page_readonly(__va(pmd_phys)); ++ pin_pagetable_pfn(MMUEXT_PIN_L2_TABLE, ++ PFN_DOWN(pmd_phys)); ++ set_pud(pud + idx_pmd, __pud(_PAGE_TABLE | pmd_phys)); ++ pmd_phys += PAGE_SIZE; + } +- } while (++idx_p4d < n_p4d); ++ n_pmd -= PTRS_PER_PUD; ++ early_memunmap(pud, PAGE_SIZE); ++ make_lowmem_page_readonly(__va(pud_phys)); ++ pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(pud_phys)); ++ set_pgd(pgd + 2 + idx_pud, __pgd(_PAGE_TABLE | pud_phys)); ++ pud_phys += PAGE_SIZE; ++ } + + /* Now copy the old p2m info to the new area. 
*/ + memcpy(new_p2m, xen_p2m_addr, size); +@@ -2311,7 +2272,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot) + #endif + case FIX_TEXT_POKE0: + case FIX_TEXT_POKE1: +- case FIX_GDT_REMAP_BEGIN ... FIX_GDT_REMAP_END: ++ case FIX_CPU_ENTRY_AREA_TOP ... FIX_CPU_ENTRY_AREA_BOTTOM: + /* All local page mappings */ + pte = pfn_pte(phys, prot); + break; +@@ -2361,7 +2322,7 @@ static void __init xen_post_allocator_init(void) + pv_mmu_ops.set_pte = xen_set_pte; + pv_mmu_ops.set_pmd = xen_set_pmd; + pv_mmu_ops.set_pud = xen_set_pud; +-#if CONFIG_PGTABLE_LEVELS >= 4 ++#ifdef CONFIG_X86_64 + pv_mmu_ops.set_p4d = xen_set_p4d; + #endif + +@@ -2371,7 +2332,7 @@ static void __init xen_post_allocator_init(void) + pv_mmu_ops.alloc_pmd = xen_alloc_pmd; + pv_mmu_ops.release_pte = xen_release_pte; + pv_mmu_ops.release_pmd = xen_release_pmd; +-#if CONFIG_PGTABLE_LEVELS >= 4 ++#ifdef CONFIG_X86_64 + pv_mmu_ops.alloc_pud = xen_alloc_pud; + pv_mmu_ops.release_pud = xen_release_pud; + #endif +@@ -2435,14 +2396,14 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = { + .make_pmd = PV_CALLEE_SAVE(xen_make_pmd), + .pmd_val = PV_CALLEE_SAVE(xen_pmd_val), + +-#if CONFIG_PGTABLE_LEVELS >= 4 ++#ifdef CONFIG_X86_64 + .pud_val = PV_CALLEE_SAVE(xen_pud_val), + .make_pud = PV_CALLEE_SAVE(xen_make_pud), + .set_p4d = xen_set_p4d_hyper, + + .alloc_pud = xen_alloc_pmd_init, + .release_pud = xen_release_pmd_init, +-#endif /* CONFIG_PGTABLE_LEVELS == 4 */ ++#endif /* CONFIG_X86_64 */ + + .activate_mm = xen_activate_mm, + .dup_mmap = xen_dup_mmap, +diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c +index 05f91ce9b55e..c0c756c76afe 100644 +--- a/arch/x86/xen/smp_pv.c ++++ b/arch/x86/xen/smp_pv.c +@@ -14,6 +14,7 @@ + * single-threaded. + */ + #include ++#include + #include + #include + #include +@@ -294,12 +295,19 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) + #endif + memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt)); + ++ /* ++ * Bring up the CPU in cpu_bringup_and_idle() with the stack ++ * pointing just below where pt_regs would be if it were a normal ++ * kernel entry. ++ */ + ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle; + ctxt->flags = VGCF_IN_KERNEL; + ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */ + ctxt->user_regs.ds = __USER_DS; + ctxt->user_regs.es = __USER_DS; + ctxt->user_regs.ss = __KERNEL_DS; ++ ctxt->user_regs.cs = __KERNEL_CS; ++ ctxt->user_regs.esp = (unsigned long)task_pt_regs(idle); + + xen_copy_trap_info(ctxt->trap_ctxt); + +@@ -314,8 +322,13 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) + ctxt->gdt_frames[0] = gdt_mfn; + ctxt->gdt_ents = GDT_ENTRIES; + ++ /* ++ * Set SS:SP that Xen will use when entering guest kernel mode ++ * from guest user mode. Subsequent calls to load_sp0() can ++ * change this value. 
++ */ + ctxt->kernel_ss = __KERNEL_DS; +- ctxt->kernel_sp = idle->thread.sp0; ++ ctxt->kernel_sp = task_top_of_stack(idle); + + #ifdef CONFIG_X86_32 + ctxt->event_callback_cs = __KERNEL_CS; +@@ -327,10 +340,8 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) + (unsigned long)xen_hypervisor_callback; + ctxt->failsafe_callback_eip = + (unsigned long)xen_failsafe_callback; +- ctxt->user_regs.cs = __KERNEL_CS; + per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir); + +- ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs); + ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir)); + if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt)) + BUG(); +diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S +index c98a48c861fd..8a10c9a9e2b5 100644 +--- a/arch/x86/xen/xen-asm_64.S ++++ b/arch/x86/xen/xen-asm_64.S +@@ -30,7 +30,7 @@ xen_pv_trap debug + xen_pv_trap xendebug + xen_pv_trap int3 + xen_pv_trap xenint3 +-xen_pv_trap nmi ++xen_pv_trap xennmi + xen_pv_trap overflow + xen_pv_trap bounds + xen_pv_trap invalid_op +diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S +index b5b8d7f43557..497cc55a0c16 100644 +--- a/arch/x86/xen/xen-head.S ++++ b/arch/x86/xen/xen-head.S +@@ -10,6 +10,7 @@ + #include + #include + #include ++#include + + #include + #include +@@ -20,6 +21,7 @@ + #ifdef CONFIG_XEN_PV + __INIT + ENTRY(startup_xen) ++ UNWIND_HINT_EMPTY + cld + + /* Clear .bss */ +@@ -34,21 +36,24 @@ ENTRY(startup_xen) + mov $init_thread_union+THREAD_SIZE, %_ASM_SP + + jmp xen_start_kernel +- ++END(startup_xen) + __FINIT + #endif + + .pushsection .text + .balign PAGE_SIZE + ENTRY(hypercall_page) +- .skip PAGE_SIZE ++ .rept (PAGE_SIZE / 32) ++ UNWIND_HINT_EMPTY ++ .skip 32 ++ .endr + + #define HYPERCALL(n) \ + .equ xen_hypercall_##n, hypercall_page + __HYPERVISOR_##n * 32; \ + .type xen_hypercall_##n, @function; .size xen_hypercall_##n, 32 + #include + #undef HYPERCALL +- ++END(hypercall_page) + .popsection + + ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz "linux") +diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c +index a4783da90ba8..0f860cf0d56d 100644 +--- a/block/bfq-iosched.c ++++ b/block/bfq-iosched.c +@@ -108,6 +108,7 @@ + #include "blk-mq-tag.h" + #include "blk-mq-sched.h" + #include "bfq-iosched.h" ++#include "blk-wbt.h" + + #define BFQ_BFQQ_FNS(name) \ + void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \ +@@ -4775,7 +4776,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) + bfq_init_root_group(bfqd->root_group, bfqd); + bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group); + +- ++ wbt_disable_default(q); + return 0; + + out_free: +diff --git a/block/blk-wbt.c b/block/blk-wbt.c +index 6a9a0f03a67b..e59d59c11ebb 100644 +--- a/block/blk-wbt.c ++++ b/block/blk-wbt.c +@@ -654,7 +654,7 @@ void wbt_set_write_cache(struct rq_wb *rwb, bool write_cache_on) + } + + /* +- * Disable wbt, if enabled by default. Only called from CFQ. ++ * Disable wbt, if enabled by default. 
+ */ + void wbt_disable_default(struct request_queue *q) + { +diff --git a/crypto/lrw.c b/crypto/lrw.c +index a8bfae4451bf..eb681e9fe574 100644 +--- a/crypto/lrw.c ++++ b/crypto/lrw.c +@@ -610,8 +610,10 @@ static int create(struct crypto_template *tmpl, struct rtattr **tb) + ecb_name[len - 1] = 0; + + if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, +- "lrw(%s)", ecb_name) >= CRYPTO_MAX_ALG_NAME) +- return -ENAMETOOLONG; ++ "lrw(%s)", ecb_name) >= CRYPTO_MAX_ALG_NAME) { ++ err = -ENAMETOOLONG; ++ goto err_drop_spawn; ++ } + } + + inst->alg.base.cra_flags = alg->base.cra_flags & CRYPTO_ALG_ASYNC; +diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c +index 3c3a37b8503b..572b6c7303ed 100644 +--- a/drivers/acpi/apei/ghes.c ++++ b/drivers/acpi/apei/ghes.c +@@ -51,6 +51,7 @@ + #include + #include + #include ++#include + #include + #include + +@@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex); + * Because the memory area used to transfer hardware error information + * from BIOS to Linux can be determined only in NMI, IRQ or timer + * handler, but general ioremap can not be used in atomic context, so +- * a special version of atomic ioremap is implemented for that. ++ * the fixmap is used instead. + */ + + /* +@@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex); + /* virtual memory area for atomic ioremap */ + static struct vm_struct *ghes_ioremap_area; + /* +- * These 2 spinlock is used to prevent atomic ioremap virtual memory +- * area from being mapped simultaneously. ++ * These 2 spinlocks are used to prevent the fixmap entries from being used ++ * simultaneously. + */ + static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi); + static DEFINE_SPINLOCK(ghes_ioremap_lock_irq); +@@ -159,52 +160,36 @@ static void ghes_ioremap_exit(void) + + static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) + { +- unsigned long vaddr; + phys_addr_t paddr; + pgprot_t prot; + +- vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr); +- + paddr = pfn << PAGE_SHIFT; + prot = arch_apei_get_mem_attribute(paddr); +- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); ++ __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot); + +- return (void __iomem *)vaddr; ++ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI); + } + + static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) + { +- unsigned long vaddr, paddr; ++ phys_addr_t paddr; + pgprot_t prot; + +- vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); +- + paddr = pfn << PAGE_SHIFT; + prot = arch_apei_get_mem_attribute(paddr); ++ __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot); + +- ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); +- +- return (void __iomem *)vaddr; ++ return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ); + } + +-static void ghes_iounmap_nmi(void __iomem *vaddr_ptr) ++static void ghes_iounmap_nmi(void) + { +- unsigned long vaddr = (unsigned long __force)vaddr_ptr; +- void *base = ghes_ioremap_area->addr; +- +- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base)); +- unmap_kernel_range_noflush(vaddr, PAGE_SIZE); +- arch_apei_flush_tlb_one(vaddr); ++ clear_fixmap(FIX_APEI_GHES_NMI); + } + +-static void ghes_iounmap_irq(void __iomem *vaddr_ptr) ++static void ghes_iounmap_irq(void) + { +- unsigned long vaddr = (unsigned long __force)vaddr_ptr; +- void *base = ghes_ioremap_area->addr; +- +- BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base)); +- unmap_kernel_range_noflush(vaddr, PAGE_SIZE); +- arch_apei_flush_tlb_one(vaddr); ++ clear_fixmap(FIX_APEI_GHES_IRQ); + } + + 
static int ghes_estatus_pool_init(void) +@@ -360,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, + paddr += trunk; + buffer += trunk; + if (in_nmi) { +- ghes_iounmap_nmi(vaddr); ++ ghes_iounmap_nmi(); + raw_spin_unlock(&ghes_ioremap_lock_nmi); + } else { +- ghes_iounmap_irq(vaddr); ++ ghes_iounmap_irq(); + spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags); + } + } +@@ -851,17 +836,8 @@ static void ghes_sea_remove(struct ghes *ghes) + synchronize_rcu(); + } + #else /* CONFIG_ACPI_APEI_SEA */ +-static inline void ghes_sea_add(struct ghes *ghes) +-{ +- pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n", +- ghes->generic->header.source_id); +-} +- +-static inline void ghes_sea_remove(struct ghes *ghes) +-{ +- pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n", +- ghes->generic->header.source_id); +-} ++static inline void ghes_sea_add(struct ghes *ghes) { } ++static inline void ghes_sea_remove(struct ghes *ghes) { } + #endif /* CONFIG_ACPI_APEI_SEA */ + + #ifdef CONFIG_HAVE_ACPI_APEI_NMI +@@ -1063,23 +1039,9 @@ static void ghes_nmi_init_cxt(void) + init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq); + } + #else /* CONFIG_HAVE_ACPI_APEI_NMI */ +-static inline void ghes_nmi_add(struct ghes *ghes) +-{ +- pr_err(GHES_PFX "ID: %d, trying to add NMI notification which is not supported!\n", +- ghes->generic->header.source_id); +- BUG(); +-} +- +-static inline void ghes_nmi_remove(struct ghes *ghes) +-{ +- pr_err(GHES_PFX "ID: %d, trying to remove NMI notification which is not supported!\n", +- ghes->generic->header.source_id); +- BUG(); +-} +- +-static inline void ghes_nmi_init_cxt(void) +-{ +-} ++static inline void ghes_nmi_add(struct ghes *ghes) { } ++static inline void ghes_nmi_remove(struct ghes *ghes) { } ++static inline void ghes_nmi_init_cxt(void) { } + #endif /* CONFIG_HAVE_ACPI_APEI_NMI */ + + static int ghes_probe(struct platform_device *ghes_dev) +diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c +index a6de32530693..0459b1204694 100644 +--- a/drivers/base/power/opp/core.c ++++ b/drivers/base/power/opp/core.c +@@ -296,7 +296,7 @@ int dev_pm_opp_get_opp_count(struct device *dev) + opp_table = _find_opp_table(dev); + if (IS_ERR(opp_table)) { + count = PTR_ERR(opp_table); +- dev_err(dev, "%s: OPP table not found (%d)\n", ++ dev_dbg(dev, "%s: OPP table not found (%d)\n", + __func__, count); + return count; + } +diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c +index e2540113d0da..73d2d88ddc03 100644 +--- a/drivers/bluetooth/hci_bcm.c ++++ b/drivers/bluetooth/hci_bcm.c +@@ -68,7 +68,7 @@ struct bcm_device { + u32 init_speed; + u32 oper_speed; + int irq; +- u8 irq_polarity; ++ bool irq_active_low; + + #ifdef CONFIG_PM + struct hci_uart *hu; +@@ -213,7 +213,9 @@ static int bcm_request_irq(struct bcm_data *bcm) + } + + err = devm_request_irq(&bdev->pdev->dev, bdev->irq, bcm_host_wake, +- IRQF_TRIGGER_RISING, "host_wake", bdev); ++ bdev->irq_active_low ? 
IRQF_TRIGGER_FALLING : ++ IRQF_TRIGGER_RISING, ++ "host_wake", bdev); + if (err) + goto unlock; + +@@ -253,7 +255,7 @@ static int bcm_setup_sleep(struct hci_uart *hu) + struct sk_buff *skb; + struct bcm_set_sleep_mode sleep_params = default_sleep_params; + +- sleep_params.host_wake_active = !bcm->dev->irq_polarity; ++ sleep_params.host_wake_active = !bcm->dev->irq_active_low; + + skb = __hci_cmd_sync(hu->hdev, 0xfc27, sizeof(sleep_params), + &sleep_params, HCI_INIT_TIMEOUT); +@@ -690,10 +692,8 @@ static const struct acpi_gpio_mapping acpi_bcm_int_first_gpios[] = { + }; + + #ifdef CONFIG_ACPI +-static u8 acpi_active_low = ACPI_ACTIVE_LOW; +- + /* IRQ polarity of some chipsets are not defined correctly in ACPI table. */ +-static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { ++static const struct dmi_system_id bcm_active_low_irq_dmi_table[] = { + { + .ident = "Asus T100TA", + .matches = { +@@ -701,7 +701,6 @@ static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { + "ASUSTeK COMPUTER INC."), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T100TA"), + }, +- .driver_data = &acpi_active_low, + }, + { + .ident = "Asus T100CHI", +@@ -710,7 +709,6 @@ static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { + "ASUSTeK COMPUTER INC."), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T100CHI"), + }, +- .driver_data = &acpi_active_low, + }, + { /* Handle ThinkPad 8 tablets with BCM2E55 chipset ACPI ID */ + .ident = "Lenovo ThinkPad 8", +@@ -718,7 +716,6 @@ static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "ThinkPad 8"), + }, +- .driver_data = &acpi_active_low, + }, + { } + }; +@@ -733,13 +730,13 @@ static int bcm_resource(struct acpi_resource *ares, void *data) + switch (ares->type) { + case ACPI_RESOURCE_TYPE_EXTENDED_IRQ: + irq = &ares->data.extended_irq; +- dev->irq_polarity = irq->polarity; ++ dev->irq_active_low = irq->polarity == ACPI_ACTIVE_LOW; + break; + + case ACPI_RESOURCE_TYPE_GPIO: + gpio = &ares->data.gpio; + if (gpio->connection_type == ACPI_RESOURCE_GPIO_TYPE_INT) +- dev->irq_polarity = gpio->polarity; ++ dev->irq_active_low = gpio->polarity == ACPI_ACTIVE_LOW; + break; + + case ACPI_RESOURCE_TYPE_SERIAL_BUS: +@@ -834,11 +831,11 @@ static int bcm_acpi_probe(struct bcm_device *dev) + return ret; + acpi_dev_free_resource_list(&resources); + +- dmi_id = dmi_first_match(bcm_wrong_irq_dmi_table); ++ dmi_id = dmi_first_match(bcm_active_low_irq_dmi_table); + if (dmi_id) { + bt_dev_warn(dev, "%s: Overwriting IRQ polarity to active low", + dmi_id->ident); +- dev->irq_polarity = *(u8 *)dmi_id->driver_data; ++ dev->irq_active_low = true; + } + + return 0; +diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c +index 6e2403805784..6aef3bde10d7 100644 +--- a/drivers/bluetooth/hci_ldisc.c ++++ b/drivers/bluetooth/hci_ldisc.c +@@ -41,6 +41,7 @@ + #include + #include + #include ++#include + + #include + #include +@@ -298,6 +299,12 @@ void hci_uart_set_flow_control(struct hci_uart *hu, bool enable) + unsigned int set = 0; + unsigned int clear = 0; + ++ if (hu->serdev) { ++ serdev_device_set_flow_control(hu->serdev, !enable); ++ serdev_device_set_rts(hu->serdev, !enable); ++ return; ++ } ++ + if (enable) { + /* Disable hardware flow control */ + ktermios = tty->termios; +diff --git a/drivers/clk/sunxi-ng/ccu-sun5i.c b/drivers/clk/sunxi-ng/ccu-sun5i.c +index ab9e850b3707..2f385a57cd91 100644 +--- a/drivers/clk/sunxi-ng/ccu-sun5i.c ++++ b/drivers/clk/sunxi-ng/ccu-sun5i.c +@@ -982,8 
+982,8 @@ static void __init sun5i_ccu_init(struct device_node *node, + + /* Force the PLL-Audio-1x divider to 4 */ + val = readl(reg + SUN5I_PLL_AUDIO_REG); +- val &= ~GENMASK(19, 16); +- writel(val | (3 << 16), reg + SUN5I_PLL_AUDIO_REG); ++ val &= ~GENMASK(29, 26); ++ writel(val | (3 << 26), reg + SUN5I_PLL_AUDIO_REG); + + /* + * Use the peripheral PLL as the AHB parent, instead of CPU / +diff --git a/drivers/clk/sunxi-ng/ccu-sun6i-a31.c b/drivers/clk/sunxi-ng/ccu-sun6i-a31.c +index 8af434815fba..241fb13f1c06 100644 +--- a/drivers/clk/sunxi-ng/ccu-sun6i-a31.c ++++ b/drivers/clk/sunxi-ng/ccu-sun6i-a31.c +@@ -608,7 +608,7 @@ static SUNXI_CCU_M_WITH_MUX_GATE(hdmi_clk, "hdmi", lcd_ch1_parents, + 0x150, 0, 4, 24, 2, BIT(31), + CLK_SET_RATE_PARENT); + +-static SUNXI_CCU_GATE(hdmi_ddc_clk, "hdmi-ddc", "osc24M", 0x150, BIT(30), 0); ++static SUNXI_CCU_GATE(hdmi_ddc_clk, "ddc", "osc24M", 0x150, BIT(30), 0); + + static SUNXI_CCU_GATE(ps_clk, "ps", "lcd1-ch1", 0x140, BIT(31), 0); + +diff --git a/drivers/clk/sunxi-ng/ccu_nm.c b/drivers/clk/sunxi-ng/ccu_nm.c +index a32158e8f2e3..84a5e7f17f6f 100644 +--- a/drivers/clk/sunxi-ng/ccu_nm.c ++++ b/drivers/clk/sunxi-ng/ccu_nm.c +@@ -99,6 +99,9 @@ static long ccu_nm_round_rate(struct clk_hw *hw, unsigned long rate, + struct ccu_nm *nm = hw_to_ccu_nm(hw); + struct _ccu_nm _nm; + ++ if (ccu_frac_helper_has_rate(&nm->common, &nm->frac, rate)) ++ return rate; ++ + _nm.min_n = nm->n.min ?: 1; + _nm.max_n = nm->n.max ?: 1 << nm->n.width; + _nm.min_m = 1; +diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c +index 484cc8909d5c..ed4df58a855e 100644 +--- a/drivers/cpuidle/cpuidle.c ++++ b/drivers/cpuidle/cpuidle.c +@@ -208,6 +208,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, + return -EBUSY; + } + target_state = &drv->states[index]; ++ broadcast = false; + } + + /* Take note of the planned idle state. 
*/ +diff --git a/drivers/crypto/amcc/crypto4xx_core.h b/drivers/crypto/amcc/crypto4xx_core.h +index ecfdcfe3698d..4f41d6da5acc 100644 +--- a/drivers/crypto/amcc/crypto4xx_core.h ++++ b/drivers/crypto/amcc/crypto4xx_core.h +@@ -34,12 +34,12 @@ + #define PPC405EX_CE_RESET 0x00000008 + + #define CRYPTO4XX_CRYPTO_PRIORITY 300 +-#define PPC4XX_LAST_PD 63 +-#define PPC4XX_NUM_PD 64 +-#define PPC4XX_LAST_GD 1023 ++#define PPC4XX_NUM_PD 256 ++#define PPC4XX_LAST_PD (PPC4XX_NUM_PD - 1) + #define PPC4XX_NUM_GD 1024 +-#define PPC4XX_LAST_SD 63 +-#define PPC4XX_NUM_SD 64 ++#define PPC4XX_LAST_GD (PPC4XX_NUM_GD - 1) ++#define PPC4XX_NUM_SD 256 ++#define PPC4XX_LAST_SD (PPC4XX_NUM_SD - 1) + #define PPC4XX_SD_BUFFER_SIZE 2048 + + #define PD_ENTRY_INUSE 1 +diff --git a/drivers/gpu/drm/drm_dp_dual_mode_helper.c b/drivers/gpu/drm/drm_dp_dual_mode_helper.c +index 0ef9011a1856..02a50929af67 100644 +--- a/drivers/gpu/drm/drm_dp_dual_mode_helper.c ++++ b/drivers/gpu/drm/drm_dp_dual_mode_helper.c +@@ -410,6 +410,7 @@ int drm_lspcon_get_mode(struct i2c_adapter *adapter, + { + u8 data; + int ret = 0; ++ int retry; + + if (!mode) { + DRM_ERROR("NULL input\n"); +@@ -417,10 +418,19 @@ int drm_lspcon_get_mode(struct i2c_adapter *adapter, + } + + /* Read Status: i2c over aux */ +- ret = drm_dp_dual_mode_read(adapter, DP_DUAL_MODE_LSPCON_CURRENT_MODE, +- &data, sizeof(data)); ++ for (retry = 0; retry < 6; retry++) { ++ if (retry) ++ usleep_range(500, 1000); ++ ++ ret = drm_dp_dual_mode_read(adapter, ++ DP_DUAL_MODE_LSPCON_CURRENT_MODE, ++ &data, sizeof(data)); ++ if (!ret) ++ break; ++ } ++ + if (ret < 0) { +- DRM_ERROR("LSPCON read(0x80, 0x41) failed\n"); ++ DRM_DEBUG_KMS("LSPCON read(0x80, 0x41) failed\n"); + return -EFAULT; + } + +diff --git a/drivers/gpu/drm/vc4/vc4_dsi.c b/drivers/gpu/drm/vc4/vc4_dsi.c +index d1e0dc908048..04796d7d0fdb 100644 +--- a/drivers/gpu/drm/vc4/vc4_dsi.c ++++ b/drivers/gpu/drm/vc4/vc4_dsi.c +@@ -866,7 +866,8 @@ static bool vc4_dsi_encoder_mode_fixup(struct drm_encoder *encoder, + adjusted_mode->clock = pixel_clock_hz / 1000 + 1; + + /* Given the new pixel clock, adjust HFP to keep vrefresh the same. 
*/ +- adjusted_mode->htotal = pixel_clock_hz / (mode->vrefresh * mode->vtotal); ++ adjusted_mode->htotal = adjusted_mode->clock * mode->htotal / ++ mode->clock; + adjusted_mode->hsync_end += adjusted_mode->htotal - mode->htotal; + adjusted_mode->hsync_start += adjusted_mode->htotal - mode->htotal; + +diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c +index 937801ac2fe0..2cd134dd94d2 100644 +--- a/drivers/hv/vmbus_drv.c ++++ b/drivers/hv/vmbus_drv.c +@@ -1534,7 +1534,7 @@ static int __init hv_acpi_init(void) + { + int ret, t; + +- if (x86_hyper != &x86_hyper_ms_hyperv) ++ if (x86_hyper_type != X86_HYPER_MS_HYPERV) + return -ENODEV; + + init_completion(&probe_event); +diff --git a/drivers/iio/accel/st_accel_core.c b/drivers/iio/accel/st_accel_core.c +index 752856b3a849..379de1829cdb 100644 +--- a/drivers/iio/accel/st_accel_core.c ++++ b/drivers/iio/accel/st_accel_core.c +@@ -164,7 +164,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { + .mask_int2 = 0x00, + .addr_ihl = 0x25, + .mask_ihl = 0x02, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .sim = { + .addr = 0x23, +@@ -236,7 +239,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { + .mask_ihl = 0x80, + .addr_od = 0x22, + .mask_od = 0x40, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .sim = { + .addr = 0x23, +@@ -318,7 +324,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { + .mask_int2 = 0x00, + .addr_ihl = 0x23, + .mask_ihl = 0x40, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + .ig1 = { + .en_addr = 0x23, + .en_mask = 0x08, +@@ -389,7 +398,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { + .drdy_irq = { + .addr = 0x21, + .mask_int1 = 0x04, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .sim = { + .addr = 0x21, +@@ -451,7 +463,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { + .mask_ihl = 0x80, + .addr_od = 0x22, + .mask_od = 0x40, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .sim = { + .addr = 0x21, +@@ -569,7 +584,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { + .drdy_irq = { + .addr = 0x21, + .mask_int1 = 0x04, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .sim = { + .addr = 0x21, +@@ -640,7 +658,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { + .mask_int2 = 0x00, + .addr_ihl = 0x25, + .mask_ihl = 0x02, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .sim = { + .addr = 0x23, +diff --git a/drivers/iio/common/st_sensors/st_sensors_core.c b/drivers/iio/common/st_sensors/st_sensors_core.c +index 02e833b14db0..34115f05d5c4 100644 +--- a/drivers/iio/common/st_sensors/st_sensors_core.c ++++ b/drivers/iio/common/st_sensors/st_sensors_core.c +@@ -470,7 +470,7 @@ int st_sensors_set_dataready_irq(struct iio_dev *indio_dev, bool enable) + * different one. 
Take into account irq status register + * to understand if irq trigger can be properly supported + */ +- if (sdata->sensor_settings->drdy_irq.addr_stat_drdy) ++ if (sdata->sensor_settings->drdy_irq.stat_drdy.addr) + sdata->hw_irq_trigger = enable; + return 0; + } +diff --git a/drivers/iio/common/st_sensors/st_sensors_trigger.c b/drivers/iio/common/st_sensors/st_sensors_trigger.c +index fa73e6795359..fdcc5a891958 100644 +--- a/drivers/iio/common/st_sensors/st_sensors_trigger.c ++++ b/drivers/iio/common/st_sensors/st_sensors_trigger.c +@@ -31,7 +31,7 @@ static int st_sensors_new_samples_available(struct iio_dev *indio_dev, + int ret; + + /* How would I know if I can't check it? */ +- if (!sdata->sensor_settings->drdy_irq.addr_stat_drdy) ++ if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr) + return -EINVAL; + + /* No scan mask, no interrupt */ +@@ -39,23 +39,15 @@ static int st_sensors_new_samples_available(struct iio_dev *indio_dev, + return 0; + + ret = sdata->tf->read_byte(&sdata->tb, sdata->dev, +- sdata->sensor_settings->drdy_irq.addr_stat_drdy, ++ sdata->sensor_settings->drdy_irq.stat_drdy.addr, + &status); + if (ret < 0) { + dev_err(sdata->dev, + "error checking samples available\n"); + return ret; + } +- /* +- * the lower bits of .active_scan_mask[0] is directly mapped +- * to the channels on the sensor: either bit 0 for +- * one-dimensional sensors, or e.g. x,y,z for accelerometers, +- * gyroscopes or magnetometers. No sensor use more than 3 +- * channels, so cut the other status bits here. +- */ +- status &= 0x07; + +- if (status & (u8)indio_dev->active_scan_mask[0]) ++ if (status & sdata->sensor_settings->drdy_irq.stat_drdy.mask) + return 1; + + return 0; +@@ -212,7 +204,7 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev, + * it was "our" interrupt. + */ + if (sdata->int_pin_open_drain && +- sdata->sensor_settings->drdy_irq.addr_stat_drdy) ++ sdata->sensor_settings->drdy_irq.stat_drdy.addr) + irq_trig |= IRQF_SHARED; + + err = request_threaded_irq(sdata->get_irq_data_ready(indio_dev), +diff --git a/drivers/iio/gyro/st_gyro_core.c b/drivers/iio/gyro/st_gyro_core.c +index e366422e8512..2536a8400c98 100644 +--- a/drivers/iio/gyro/st_gyro_core.c ++++ b/drivers/iio/gyro/st_gyro_core.c +@@ -118,7 +118,10 @@ static const struct st_sensor_settings st_gyro_sensors_settings[] = { + * drain settings, but only for INT1 and not + * for the DRDY line on INT2. + */ +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .multi_read_bit = true, + .bootime = 2, +@@ -188,7 +191,10 @@ static const struct st_sensor_settings st_gyro_sensors_settings[] = { + * drain settings, but only for INT1 and not + * for the DRDY line on INT2. + */ +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .multi_read_bit = true, + .bootime = 2, +@@ -253,7 +259,10 @@ static const struct st_sensor_settings st_gyro_sensors_settings[] = { + * drain settings, but only for INT1 and not + * for the DRDY line on INT2. 
+ */ +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .multi_read_bit = true, + .bootime = 2, +diff --git a/drivers/iio/magnetometer/st_magn_core.c b/drivers/iio/magnetometer/st_magn_core.c +index 08aafba4481c..19031a7bce23 100644 +--- a/drivers/iio/magnetometer/st_magn_core.c ++++ b/drivers/iio/magnetometer/st_magn_core.c +@@ -317,7 +317,10 @@ static const struct st_sensor_settings st_magn_sensors_settings[] = { + }, + .drdy_irq = { + /* drdy line is routed drdy pin */ +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x07, ++ }, + }, + .multi_read_bit = true, + .bootime = 2, +@@ -361,7 +364,10 @@ static const struct st_sensor_settings st_magn_sensors_settings[] = { + .drdy_irq = { + .addr = 0x62, + .mask_int1 = 0x01, +- .addr_stat_drdy = 0x67, ++ .stat_drdy = { ++ .addr = 0x67, ++ .mask = 0x07, ++ }, + }, + .multi_read_bit = false, + .bootime = 2, +diff --git a/drivers/iio/pressure/st_pressure_core.c b/drivers/iio/pressure/st_pressure_core.c +index 34611a8ea2ce..ea075fcd5a6f 100644 +--- a/drivers/iio/pressure/st_pressure_core.c ++++ b/drivers/iio/pressure/st_pressure_core.c +@@ -287,7 +287,10 @@ static const struct st_sensor_settings st_press_sensors_settings[] = { + .mask_ihl = 0x80, + .addr_od = 0x22, + .mask_od = 0x40, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x03, ++ }, + }, + .multi_read_bit = true, + .bootime = 2, +@@ -395,7 +398,10 @@ static const struct st_sensor_settings st_press_sensors_settings[] = { + .mask_ihl = 0x80, + .addr_od = 0x22, + .mask_od = 0x40, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x03, ++ }, + }, + .multi_read_bit = true, + .bootime = 2, +@@ -454,7 +460,10 @@ static const struct st_sensor_settings st_press_sensors_settings[] = { + .mask_ihl = 0x80, + .addr_od = 0x12, + .mask_od = 0x40, +- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .stat_drdy = { ++ .addr = ST_SENSORS_DEFAULT_STAT_ADDR, ++ .mask = 0x03, ++ }, + }, + .multi_read_bit = false, + .bootime = 2, +diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +index 747efd1ae5a6..8208c30f03c5 100644 +--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c ++++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +@@ -1001,6 +1001,11 @@ static void hns_roce_v1_mr_free_work_fn(struct work_struct *work) + } + } + ++ if (!ne) { ++ dev_err(dev, "Reseved loop qp is absent!\n"); ++ goto free_work; ++ } ++ + do { + ret = hns_roce_v1_poll_cq(&mr_free_cq->ib_cq, ne, wc); + if (ret < 0) { +diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c +index c1b5f38f31a5..3b4916680018 100644 +--- a/drivers/infiniband/sw/rxe/rxe_pool.c ++++ b/drivers/infiniband/sw/rxe/rxe_pool.c +@@ -404,6 +404,8 @@ void *rxe_alloc(struct rxe_pool *pool) + elem = kmem_cache_zalloc(pool_cache(pool), + (pool->flags & RXE_POOL_ATOMIC) ? 
+ GFP_ATOMIC : GFP_KERNEL); ++ if (!elem) ++ return NULL; + + elem->pool = pool; + kref_init(&elem->ref_cnt); +diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c +index afa938bd26d6..a72278e9cd27 100644 +--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c ++++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c +@@ -139,6 +139,7 @@ void opa_vnic_release_mac_tbl(struct opa_vnic_adapter *adapter) + rcu_assign_pointer(adapter->mactbl, NULL); + synchronize_rcu(); + opa_vnic_free_mac_tbl(mactbl); ++ adapter->info.vport.mac_tbl_digest = 0; + mutex_unlock(&adapter->mactbl_lock); + } + +diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c +index c2733964379c..9655cc3aa3a0 100644 +--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c ++++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c +@@ -348,7 +348,7 @@ void opa_vnic_query_mcast_macs(struct opa_vnic_adapter *adapter, + void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, + struct opa_veswport_iface_macs *macs) + { +- u16 start_idx, tot_macs, num_macs, idx = 0, count = 0; ++ u16 start_idx, tot_macs, num_macs, idx = 0, count = 0, em_macs = 0; + struct netdev_hw_addr *ha; + + start_idx = be16_to_cpu(macs->start_idx); +@@ -359,8 +359,10 @@ void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, + + /* Do not include EM specified MAC address */ + if (!memcmp(adapter->info.vport.base_mac_addr, ha->addr, +- ARRAY_SIZE(adapter->info.vport.base_mac_addr))) ++ ARRAY_SIZE(adapter->info.vport.base_mac_addr))) { ++ em_macs++; + continue; ++ } + + if (start_idx > idx++) + continue; +@@ -383,7 +385,7 @@ void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, + } + + tot_macs = netdev_hw_addr_list_count(&adapter->netdev->dev_addrs) + +- netdev_uc_count(adapter->netdev); ++ netdev_uc_count(adapter->netdev) - em_macs; + macs->tot_macs_in_lst = cpu_to_be16(tot_macs); + macs->num_macs_in_msg = cpu_to_be16(count); + macs->gen_count = cpu_to_be16(adapter->info.vport.uc_macs_gen_count); +diff --git a/drivers/input/mouse/vmmouse.c b/drivers/input/mouse/vmmouse.c +index 0f586780ceb4..1ae5c1ef3f5b 100644 +--- a/drivers/input/mouse/vmmouse.c ++++ b/drivers/input/mouse/vmmouse.c +@@ -316,11 +316,9 @@ static int vmmouse_enable(struct psmouse *psmouse) + /* + * Array of supported hypervisors. 
+ */ +-static const struct hypervisor_x86 *vmmouse_supported_hypervisors[] = { +- &x86_hyper_vmware, +-#ifdef CONFIG_KVM_GUEST +- &x86_hyper_kvm, +-#endif ++static enum x86_hypervisor_type vmmouse_supported_hypervisors[] = { ++ X86_HYPER_VMWARE, ++ X86_HYPER_KVM, + }; + + /** +@@ -331,7 +329,7 @@ static bool vmmouse_check_hypervisor(void) + int i; + + for (i = 0; i < ARRAY_SIZE(vmmouse_supported_hypervisors); i++) +- if (vmmouse_supported_hypervisors[i] == x86_hyper) ++ if (vmmouse_supported_hypervisors[i] == x86_hyper_type) + return true; + + return false; +diff --git a/drivers/leds/leds-pca955x.c b/drivers/leds/leds-pca955x.c +index 905729191d3e..78183f90820e 100644 +--- a/drivers/leds/leds-pca955x.c ++++ b/drivers/leds/leds-pca955x.c +@@ -61,6 +61,10 @@ + #define PCA955X_LS_BLINK0 0x2 /* Blink at PWM0 rate */ + #define PCA955X_LS_BLINK1 0x3 /* Blink at PWM1 rate */ + ++#define PCA955X_GPIO_INPUT LED_OFF ++#define PCA955X_GPIO_HIGH LED_OFF ++#define PCA955X_GPIO_LOW LED_FULL ++ + enum pca955x_type { + pca9550, + pca9551, +@@ -329,9 +333,9 @@ static int pca955x_set_value(struct gpio_chip *gc, unsigned int offset, + struct pca955x_led *led = &pca955x->leds[offset]; + + if (val) +- return pca955x_led_set(&led->led_cdev, LED_FULL); +- else +- return pca955x_led_set(&led->led_cdev, LED_OFF); ++ return pca955x_led_set(&led->led_cdev, PCA955X_GPIO_HIGH); ++ ++ return pca955x_led_set(&led->led_cdev, PCA955X_GPIO_LOW); + } + + static void pca955x_gpio_set_value(struct gpio_chip *gc, unsigned int offset, +@@ -355,8 +359,11 @@ static int pca955x_gpio_get_value(struct gpio_chip *gc, unsigned int offset) + static int pca955x_gpio_direction_input(struct gpio_chip *gc, + unsigned int offset) + { +- /* To use as input ensure pin is not driven */ +- return pca955x_set_value(gc, offset, 0); ++ struct pca955x *pca955x = gpiochip_get_data(gc); ++ struct pca955x_led *led = &pca955x->leds[offset]; ++ ++ /* To use as input ensure pin is not driven. */ ++ return pca955x_led_set(&led->led_cdev, PCA955X_GPIO_INPUT); + } + + static int pca955x_gpio_direction_output(struct gpio_chip *gc, +diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c +index 35e82b14ded7..ddf0a4341ae2 100644 +--- a/drivers/md/dm-mpath.c ++++ b/drivers/md/dm-mpath.c +@@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(struct multipath *m, + + pgpath = path_to_pgpath(path); + +- if (unlikely(lockless_dereference(m->current_pg) != pg)) { ++ if (unlikely(READ_ONCE(m->current_pg) != pg)) { + /* Only update current_pgpath if pg changed */ + spin_lock_irqsave(&m->lock, flags); + m->current_pgpath = pgpath; +@@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes) + } + + /* Were we instructed to switch PG? */ +- if (lockless_dereference(m->next_pg)) { ++ if (READ_ONCE(m->next_pg)) { + spin_lock_irqsave(&m->lock, flags); + pg = m->next_pg; + if (!pg) { +@@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes) + + /* Don't change PG until it has no remaining paths */ + check_current_pg: +- pg = lockless_dereference(m->current_pg); ++ pg = READ_ONCE(m->current_pg); + if (pg) { + pgpath = choose_path_in_pg(m, pg, nr_bytes); + if (!IS_ERR_OR_NULL(pgpath)) +@@ -473,7 +473,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq, + struct request *clone; + + /* Do we need to select a new pgpath? 
*/ +- pgpath = lockless_dereference(m->current_pgpath); ++ pgpath = READ_ONCE(m->current_pgpath); + if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags)) + pgpath = choose_pgpath(m, nr_bytes); + +@@ -533,7 +533,7 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, struct dm_m + bool queue_io; + + /* Do we need to select a new pgpath? */ +- pgpath = lockless_dereference(m->current_pgpath); ++ pgpath = READ_ONCE(m->current_pgpath); + queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags); + if (!pgpath || !queue_io) + pgpath = choose_pgpath(m, nr_bytes); +@@ -1802,7 +1802,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti, + struct pgpath *current_pgpath; + int r; + +- current_pgpath = lockless_dereference(m->current_pgpath); ++ current_pgpath = READ_ONCE(m->current_pgpath); + if (!current_pgpath) + current_pgpath = choose_pgpath(m, 0); + +@@ -1824,7 +1824,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti, + } + + if (r == -ENOTCONN) { +- if (!lockless_dereference(m->current_pg)) { ++ if (!READ_ONCE(m->current_pg)) { + /* Path status changed, redo selection */ + (void) choose_pgpath(m, 0); + } +@@ -1893,9 +1893,9 @@ static int multipath_busy(struct dm_target *ti) + return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED); + + /* Guess which priority_group will be used at next mapping time */ +- pg = lockless_dereference(m->current_pg); +- next_pg = lockless_dereference(m->next_pg); +- if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg)) ++ pg = READ_ONCE(m->current_pg); ++ next_pg = READ_ONCE(m->next_pg); ++ if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg)) + pg = next_pg; + + if (!pg) { +diff --git a/drivers/md/md.c b/drivers/md/md.c +index 98ea86309ceb..6bf093cef958 100644 +--- a/drivers/md/md.c ++++ b/drivers/md/md.c +@@ -7468,8 +7468,8 @@ void md_wakeup_thread(struct md_thread *thread) + { + if (thread) { + pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm); +- if (!test_and_set_bit(THREAD_WAKEUP, &thread->flags)) +- wake_up(&thread->wqueue); ++ set_bit(THREAD_WAKEUP, &thread->flags); ++ wake_up(&thread->wqueue); + } + } + EXPORT_SYMBOL(md_wakeup_thread); +diff --git a/drivers/misc/pti.c b/drivers/misc/pti.c +index eda38cbe8530..41f2a9f6851d 100644 +--- a/drivers/misc/pti.c ++++ b/drivers/misc/pti.c +@@ -32,7 +32,7 @@ + #include + #include + #include +-#include ++#include + #include + #include + +diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c +index 1e688bfec567..9047c0a529b2 100644 +--- a/drivers/misc/vmw_balloon.c ++++ b/drivers/misc/vmw_balloon.c +@@ -1271,7 +1271,7 @@ static int __init vmballoon_init(void) + * Check if we are running on VMware's hypervisor and bail out + * if we are not. 
+ */ +- if (x86_hyper != &x86_hyper_vmware) ++ if (x86_hyper_type != X86_HYPER_VMWARE) + return -ENODEV; + + for (is_2m_pages = 0; is_2m_pages < VMW_BALLOON_NUM_PAGE_SIZES; +diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c +index c66abd476023..3b0db01ead1f 100644 +--- a/drivers/net/ethernet/ibm/ibmvnic.c ++++ b/drivers/net/ethernet/ibm/ibmvnic.c +@@ -927,6 +927,7 @@ static int ibmvnic_open(struct net_device *netdev) + } + + rc = __ibmvnic_open(netdev); ++ netif_carrier_on(netdev); + mutex_unlock(&adapter->reset_lock); + + return rc; +@@ -3899,6 +3900,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) + if (rc) + goto ibmvnic_init_fail; + ++ netif_carrier_off(netdev); + rc = register_netdev(netdev); + if (rc) { + dev_err(&dev->dev, "failed to register netdev rc=%d\n", rc); +diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h +index 689c413b7782..d2f9a2dd76a2 100644 +--- a/drivers/net/ethernet/intel/fm10k/fm10k.h ++++ b/drivers/net/ethernet/intel/fm10k/fm10k.h +@@ -526,8 +526,8 @@ s32 fm10k_iov_update_pvid(struct fm10k_intfc *interface, u16 glort, u16 pvid); + int fm10k_ndo_set_vf_mac(struct net_device *netdev, int vf_idx, u8 *mac); + int fm10k_ndo_set_vf_vlan(struct net_device *netdev, + int vf_idx, u16 vid, u8 qos, __be16 vlan_proto); +-int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, int rate, +- int unused); ++int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, ++ int __always_unused min_rate, int max_rate); + int fm10k_ndo_get_vf_config(struct net_device *netdev, + int vf_idx, struct ifla_vf_info *ivi); + +diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c +index 5f4dac0d36ef..e72fd52bacfe 100644 +--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c ++++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c +@@ -126,6 +126,9 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface) + struct fm10k_mbx_info *mbx = &vf_info->mbx; + u16 glort = vf_info->glort; + ++ /* process the SM mailbox first to drain outgoing messages */ ++ hw->mbx.ops.process(hw, &hw->mbx); ++ + /* verify port mapping is valid, if not reset port */ + if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort)) + hw->iov.ops.reset_lport(hw, vf_info); +@@ -482,7 +485,7 @@ int fm10k_ndo_set_vf_vlan(struct net_device *netdev, int vf_idx, u16 vid, + } + + int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, +- int __always_unused unused, int rate) ++ int __always_unused min_rate, int max_rate) + { + struct fm10k_intfc *interface = netdev_priv(netdev); + struct fm10k_iov_data *iov_data = interface->iov_data; +@@ -493,14 +496,15 @@ int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, + return -EINVAL; + + /* rate limit cannot be less than 10Mbs or greater than link speed */ +- if (rate && ((rate < FM10K_VF_TC_MIN) || rate > FM10K_VF_TC_MAX)) ++ if (max_rate && ++ (max_rate < FM10K_VF_TC_MIN || max_rate > FM10K_VF_TC_MAX)) + return -EINVAL; + + /* store values */ +- iov_data->vf_info[vf_idx].rate = rate; ++ iov_data->vf_info[vf_idx].rate = max_rate; + + /* update hardware configuration */ +- hw->iov.ops.configure_tc(hw, vf_idx, rate); ++ hw->iov.ops.configure_tc(hw, vf_idx, max_rate); + + return 0; + } +diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c +index ea20aacd5e1d..b2cde9b16d82 100644 +--- a/drivers/net/ethernet/intel/i40e/i40e_main.c ++++ 
b/drivers/net/ethernet/intel/i40e/i40e_main.c +@@ -2874,14 +2874,15 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi) + static void i40e_config_xps_tx_ring(struct i40e_ring *ring) + { + struct i40e_vsi *vsi = ring->vsi; ++ int cpu; + + if (!ring->q_vector || !ring->netdev) + return; + + if ((vsi->tc_config.numtc <= 1) && + !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) { +- netif_set_xps_queue(ring->netdev, +- get_cpu_mask(ring->q_vector->v_idx), ++ cpu = cpumask_local_spread(ring->q_vector->v_idx, -1); ++ netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu), + ring->queue_index); + } + +@@ -3471,6 +3472,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename) + int tx_int_idx = 0; + int vector, err; + int irq_num; ++ int cpu; + + for (vector = 0; vector < q_vectors; vector++) { + struct i40e_q_vector *q_vector = vsi->q_vectors[vector]; +@@ -3506,10 +3508,14 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename) + q_vector->affinity_notify.notify = i40e_irq_affinity_notify; + q_vector->affinity_notify.release = i40e_irq_affinity_release; + irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify); +- /* get_cpu_mask returns a static constant mask with +- * a permanent lifetime so it's ok to use here. ++ /* Spread affinity hints out across online CPUs. ++ * ++ * get_cpu_mask returns a static constant mask with ++ * a permanent lifetime so it's ok to pass to ++ * irq_set_affinity_hint without making a copy. + */ +- irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx)); ++ cpu = cpumask_local_spread(q_vector->v_idx, -1); ++ irq_set_affinity_hint(irq_num, get_cpu_mask(cpu)); + } + + vsi->irqs_ready = true; +diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +index 4d1e670f490e..e368b0237a1b 100644 +--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c ++++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +@@ -1008,8 +1008,8 @@ static void i40e_cleanup_reset_vf(struct i40e_vf *vf) + set_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states); + clear_bit(I40E_VF_STATE_DISABLED, &vf->vf_states); + /* Do not notify the client during VF init */ +- if (test_and_clear_bit(I40E_VF_STATE_PRE_ENABLE, +- &vf->vf_states)) ++ if (!test_and_clear_bit(I40E_VF_STATE_PRE_ENABLE, ++ &vf->vf_states)) + i40e_notify_client_of_vf_reset(pf, abs_vf_id); + vf->num_vlan = 0; + } +@@ -2779,6 +2779,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) + struct i40e_mac_filter *f; + struct i40e_vf *vf; + int ret = 0; ++ struct hlist_node *h; + int bkt; + + /* validate the request */ +@@ -2817,7 +2818,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) + /* Delete all the filters for this VSI - we're going to kill it + * anyway. 
+ */ +- hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) ++ hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) + __i40e_del_filter(vsi, f); + + spin_unlock_bh(&vsi->mac_filter_hash_lock); +diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c +index 1825d956bb00..1ccad6f30ebf 100644 +--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c ++++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c +@@ -546,6 +546,7 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename) + unsigned int vector, q_vectors; + unsigned int rx_int_idx = 0, tx_int_idx = 0; + int irq_num, err; ++ int cpu; + + i40evf_irq_disable(adapter); + /* Decrement for Other and TCP Timer vectors */ +@@ -584,10 +585,12 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename) + q_vector->affinity_notify.release = + i40evf_irq_affinity_release; + irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify); +- /* get_cpu_mask returns a static constant mask with +- * a permanent lifetime so it's ok to use here. ++ /* Spread the IRQ affinity hints across online CPUs. Note that ++ * get_cpu_mask returns a mask with a permanent lifetime so ++ * it's safe to use as a hint for irq_set_affinity_hint. + */ +- irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx)); ++ cpu = cpumask_local_spread(q_vector->v_idx, -1); ++ irq_set_affinity_hint(irq_num, get_cpu_mask(cpu)); + } + + return 0; +diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c +index b0031c5ff767..667dbc7d4a4e 100644 +--- a/drivers/net/ethernet/intel/igb/igb_main.c ++++ b/drivers/net/ethernet/intel/igb/igb_main.c +@@ -3162,6 +3162,8 @@ static int igb_sw_init(struct igb_adapter *adapter) + /* Setup and initialize a copy of the hw vlan table array */ + adapter->shadow_vfta = kcalloc(E1000_VLAN_FILTER_TBL_SIZE, sizeof(u32), + GFP_ATOMIC); ++ if (!adapter->shadow_vfta) ++ return -ENOMEM; + + /* This call may decrease the number of queues */ + if (igb_init_interrupt_scheme(adapter, true)) { +diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c +index 6e6ab6f6875e..64429a14c630 100644 +--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c ++++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c +@@ -3781,10 +3781,10 @@ s32 ixgbe_set_fw_drv_ver_generic(struct ixgbe_hw *hw, u8 maj, u8 min, + fw_cmd.ver_build = build; + fw_cmd.ver_sub = sub; + fw_cmd.hdr.checksum = 0; +- fw_cmd.hdr.checksum = ixgbe_calculate_checksum((u8 *)&fw_cmd, +- (FW_CEM_HDR_LEN + fw_cmd.hdr.buf_len)); + fw_cmd.pad = 0; + fw_cmd.pad2 = 0; ++ fw_cmd.hdr.checksum = ixgbe_calculate_checksum((u8 *)&fw_cmd, ++ (FW_CEM_HDR_LEN + fw_cmd.hdr.buf_len)); + + for (i = 0; i <= FW_CEM_MAX_RETRIES; i++) { + ret_val = ixgbe_host_interface_command(hw, &fw_cmd, +diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c +index 19fbb2f28ea4..8a85217845ae 100644 +--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c ++++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c +@@ -900,6 +900,8 @@ static s32 ixgbe_read_ee_hostif_buffer_X550(struct ixgbe_hw *hw, + /* convert offset from words to bytes */ + buffer.address = cpu_to_be32((offset + current_word) * 2); + buffer.length = cpu_to_be16(words_to_read * 2); ++ buffer.pad2 = 0; ++ buffer.pad3 = 0; + + status = ixgbe_hic_unlocked(hw, (u32 *)&buffer, sizeof(buffer), + IXGBE_HI_COMMAND_TIMEOUT); +diff --git 
a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c +index c1e52b9dc58d..5f93e6add563 100644 +--- a/drivers/net/phy/at803x.c ++++ b/drivers/net/phy/at803x.c +@@ -167,7 +167,7 @@ static int at803x_set_wol(struct phy_device *phydev, + mac = (const u8 *) ndev->dev_addr; + + if (!is_valid_ether_addr(mac)) +- return -EFAULT; ++ return -EINVAL; + + for (i = 0; i < 3; i++) { + phy_write(phydev, AT803X_MMD_ACCESS_CONTROL, +diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c +index ac41c8be9200..0fd8e164339c 100644 +--- a/drivers/pci/iov.c ++++ b/drivers/pci/iov.c +@@ -162,7 +162,6 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset) + + pci_device_add(virtfn, virtfn->bus); + +- pci_bus_add_device(virtfn); + sprintf(buf, "virtfn%u", id); + rc = sysfs_create_link(&dev->dev.kobj, &virtfn->dev.kobj, buf); + if (rc) +@@ -173,6 +172,8 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset) + + kobject_uevent(&virtfn->dev.kobj, KOBJ_CHANGE); + ++ pci_bus_add_device(virtfn); ++ + return 0; + + failed2: +diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c +index 6078dfc11b11..74f1c57ab93b 100644 +--- a/drivers/pci/pci.c ++++ b/drivers/pci/pci.c +@@ -4356,6 +4356,10 @@ static bool pci_bus_resetable(struct pci_bus *bus) + { + struct pci_dev *dev; + ++ ++ if (bus->self && (bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)) ++ return false; ++ + list_for_each_entry(dev, &bus->devices, bus_list) { + if (dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET || + (dev->subordinate && !pci_bus_resetable(dev->subordinate))) +diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c +index 890efcc574cb..744805232155 100644 +--- a/drivers/pci/pcie/aer/aerdrv_core.c ++++ b/drivers/pci/pcie/aer/aerdrv_core.c +@@ -390,7 +390,14 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, + * If the error is reported by an end point, we think this + * error is related to the upstream link of the end point. + */ +- pci_walk_bus(dev->bus, cb, &result_data); ++ if (state == pci_channel_io_normal) ++ /* ++ * the error is non fatal so the bus is ok, just invoke ++ * the callback for the function that logged the error. 
++ */ ++ cb(dev, &result_data); ++ else ++ pci_walk_bus(dev->bus, cb, &result_data); + } + + return result_data.result; +diff --git a/drivers/platform/x86/asus-wireless.c b/drivers/platform/x86/asus-wireless.c +index f3796164329e..d4aeac3477f5 100644 +--- a/drivers/platform/x86/asus-wireless.c ++++ b/drivers/platform/x86/asus-wireless.c +@@ -118,6 +118,7 @@ static void asus_wireless_notify(struct acpi_device *adev, u32 event) + return; + } + input_report_key(data->idev, KEY_RFKILL, 1); ++ input_sync(data->idev); + input_report_key(data->idev, KEY_RFKILL, 0); + input_sync(data->idev); + } +diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c +index 8cec9a02c0b8..9eb32ead63db 100644 +--- a/drivers/rtc/interface.c ++++ b/drivers/rtc/interface.c +@@ -779,7 +779,7 @@ static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer) + } + + timerqueue_add(&rtc->timerqueue, &timer->node); +- if (!next) { ++ if (!next || ktime_before(timer->node.expires, next->expires)) { + struct rtc_wkalrm alarm; + int err; + alarm.time = rtc_ktime_to_tm(timer->node.expires); +diff --git a/drivers/rtc/rtc-pl031.c b/drivers/rtc/rtc-pl031.c +index e1687e19c59f..a30f24cb6c83 100644 +--- a/drivers/rtc/rtc-pl031.c ++++ b/drivers/rtc/rtc-pl031.c +@@ -308,7 +308,8 @@ static int pl031_remove(struct amba_device *adev) + + dev_pm_clear_wake_irq(&adev->dev); + device_init_wakeup(&adev->dev, false); +- free_irq(adev->irq[0], ldata); ++ if (adev->irq[0]) ++ free_irq(adev->irq[0], ldata); + rtc_device_unregister(ldata->rtc); + iounmap(ldata->base); + kfree(ldata); +@@ -381,12 +382,13 @@ static int pl031_probe(struct amba_device *adev, const struct amba_id *id) + goto out_no_rtc; + } + +- if (request_irq(adev->irq[0], pl031_interrupt, +- vendor->irqflags, "rtc-pl031", ldata)) { +- ret = -EIO; +- goto out_no_irq; ++ if (adev->irq[0]) { ++ ret = request_irq(adev->irq[0], pl031_interrupt, ++ vendor->irqflags, "rtc-pl031", ldata); ++ if (ret) ++ goto out_no_irq; ++ dev_pm_set_wake_irq(&adev->dev, adev->irq[0]); + } +- dev_pm_set_wake_irq(&adev->dev, adev->irq[0]); + return 0; + + out_no_irq: +diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c +index 1d02cf9fe06c..30d5f0ef29bb 100644 +--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c ++++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c +@@ -1575,6 +1575,7 @@ static void release_offload_resources(struct cxgbi_sock *csk) + csk, csk->state, csk->flags, csk->tid); + + cxgbi_sock_free_cpl_skbs(csk); ++ cxgbi_sock_purge_write_queue(csk); + if (csk->wr_cred != csk->wr_max_cred) { + cxgbi_sock_purge_wr_queue(csk); + cxgbi_sock_reset_wr_list(csk); +diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c +index 499df9d17339..d9a03beb76a4 100644 +--- a/drivers/scsi/lpfc/lpfc_hbadisc.c ++++ b/drivers/scsi/lpfc/lpfc_hbadisc.c +@@ -4983,7 +4983,8 @@ lpfc_nlp_remove(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp) + lpfc_cancel_retry_delay_tmo(vport, ndlp); + if ((ndlp->nlp_flag & NLP_DEFER_RM) && + !(ndlp->nlp_flag & NLP_REG_LOGIN_SEND) && +- !(ndlp->nlp_flag & NLP_RPI_REGISTERED)) { ++ !(ndlp->nlp_flag & NLP_RPI_REGISTERED) && ++ phba->sli_rev != LPFC_SLI_REV4) { + /* For this case we need to cleanup the default rpi + * allocated by the firmware. 
+ */ +diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h +index 1db0a38683f4..2b145966c73f 100644 +--- a/drivers/scsi/lpfc/lpfc_hw4.h ++++ b/drivers/scsi/lpfc/lpfc_hw4.h +@@ -3636,7 +3636,7 @@ struct lpfc_mbx_get_port_name { + #define MB_CEQ_STATUS_QUEUE_FLUSHING 0x4 + #define MB_CQE_STATUS_DMA_FAILED 0x5 + +-#define LPFC_MBX_WR_CONFIG_MAX_BDE 8 ++#define LPFC_MBX_WR_CONFIG_MAX_BDE 1 + struct lpfc_mbx_wr_object { + struct mbox_header header; + union { +diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c +index 3c5b054a56ac..7ac1a067d780 100644 +--- a/drivers/scsi/lpfc/lpfc_nvmet.c ++++ b/drivers/scsi/lpfc/lpfc_nvmet.c +@@ -1464,6 +1464,7 @@ static struct lpfc_nvmet_ctxbuf * + lpfc_nvmet_replenish_context(struct lpfc_hba *phba, + struct lpfc_nvmet_ctx_info *current_infop) + { ++#if (IS_ENABLED(CONFIG_NVME_TARGET_FC)) + struct lpfc_nvmet_ctxbuf *ctx_buf = NULL; + struct lpfc_nvmet_ctx_info *get_infop; + int i; +@@ -1511,6 +1512,7 @@ lpfc_nvmet_replenish_context(struct lpfc_hba *phba, + get_infop = get_infop->nvmet_ctx_next_cpu; + } + ++#endif + /* Nothing found, all contexts for the MRQ are in-flight */ + return NULL; + } +diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c +index 22998cbd538f..33ff691878e2 100644 +--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c ++++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c +@@ -4804,6 +4804,11 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) + } else if (log_info == VIRTUAL_IO_FAILED_RETRY) { + scmd->result = DID_RESET << 16; + break; ++ } else if ((scmd->device->channel == RAID_CHANNEL) && ++ (scsi_state == (MPI2_SCSI_STATE_TERMINATED | ++ MPI2_SCSI_STATE_NO_SCSI_STATUS))) { ++ scmd->result = DID_RESET << 16; ++ break; + } + scmd->result = DID_SOFT_ERROR << 16; + break; +diff --git a/drivers/staging/greybus/light.c b/drivers/staging/greybus/light.c +index 3f4148c92308..0f538b8c3a07 100644 +--- a/drivers/staging/greybus/light.c ++++ b/drivers/staging/greybus/light.c +@@ -925,6 +925,8 @@ static void __gb_lights_led_unregister(struct gb_channel *channel) + return; + + led_classdev_unregister(cdev); ++ kfree(cdev->name); ++ cdev->name = NULL; + channel->led = NULL; + } + +diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c +index 7952357df9c8..edb6e4e9ef3a 100644 +--- a/drivers/tee/optee/core.c ++++ b/drivers/tee/optee/core.c +@@ -590,7 +590,6 @@ static int __init optee_driver_init(void) + return -ENODEV; + + np = of_find_matching_node(fw_np, optee_match); +- of_node_put(fw_np); + if (!np) + return -ENODEV; + +diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c +index bd3572c41585..6d8906d65476 100644 +--- a/drivers/thermal/hisi_thermal.c ++++ b/drivers/thermal/hisi_thermal.c +@@ -35,8 +35,9 @@ + #define TEMP0_RST_MSK (0x1C) + #define TEMP0_VALUE (0x28) + +-#define HISI_TEMP_BASE (-60) ++#define HISI_TEMP_BASE (-60000) + #define HISI_TEMP_RESET (100000) ++#define HISI_TEMP_STEP (784) + + #define HISI_MAX_SENSORS 4 + +@@ -61,19 +62,38 @@ struct hisi_thermal_data { + void __iomem *regs; + }; + +-/* in millicelsius */ +-static inline int _step_to_temp(int step) ++/* ++ * The temperature computation on the tsensor is as follow: ++ * Unit: millidegree Celsius ++ * Step: 255/200 (0.7843) ++ * Temperature base: -60°C ++ * ++ * The register is programmed in temperature steps, every step is 784 ++ * millidegree and begins at -60 000 m°C ++ * ++ * The temperature from the steps: ++ * ++ * Temp = TempBase + (steps x 784) ++ 
* ++ * and the steps from the temperature: ++ * ++ * steps = (Temp - TempBase) / 784 ++ * ++ */ ++static inline int hisi_thermal_step_to_temp(int step) + { +- /* +- * Every step equals (1 * 200) / 255 celsius, and finally +- * need convert to millicelsius. +- */ +- return (HISI_TEMP_BASE * 1000 + (step * 200000 / 255)); ++ return HISI_TEMP_BASE + (step * HISI_TEMP_STEP); ++} ++ ++static inline long hisi_thermal_temp_to_step(long temp) ++{ ++ return (temp - HISI_TEMP_BASE) / HISI_TEMP_STEP; + } + +-static inline long _temp_to_step(long temp) ++static inline long hisi_thermal_round_temp(int temp) + { +- return ((temp - HISI_TEMP_BASE * 1000) * 255) / 200000; ++ return hisi_thermal_step_to_temp( ++ hisi_thermal_temp_to_step(temp)); + } + + static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, +@@ -99,7 +119,7 @@ static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, + usleep_range(3000, 5000); + + val = readl(data->regs + TEMP0_VALUE); +- val = _step_to_temp(val); ++ val = hisi_thermal_step_to_temp(val); + + mutex_unlock(&data->thermal_lock); + +@@ -126,10 +146,11 @@ static void hisi_thermal_enable_bind_irq_sensor + writel((sensor->id << 12), data->regs + TEMP0_CFG); + + /* enable for interrupt */ +- writel(_temp_to_step(sensor->thres_temp) | 0x0FFFFFF00, ++ writel(hisi_thermal_temp_to_step(sensor->thres_temp) | 0x0FFFFFF00, + data->regs + TEMP0_TH); + +- writel(_temp_to_step(HISI_TEMP_RESET), data->regs + TEMP0_RST_TH); ++ writel(hisi_thermal_temp_to_step(HISI_TEMP_RESET), ++ data->regs + TEMP0_RST_TH); + + /* enable module */ + writel(0x1, data->regs + TEMP0_RST_MSK); +@@ -230,7 +251,7 @@ static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) + sensor = &data->sensors[data->irq_bind_sensor]; + + dev_crit(&data->pdev->dev, "THERMAL ALARM: T > %d\n", +- sensor->thres_temp / 1000); ++ sensor->thres_temp); + mutex_unlock(&data->thermal_lock); + + for (i = 0; i < HISI_MAX_SENSORS; i++) { +@@ -269,7 +290,7 @@ static int hisi_thermal_register_sensor(struct platform_device *pdev, + + for (i = 0; i < of_thermal_get_ntrips(sensor->tzd); i++) { + if (trip[i].type == THERMAL_TRIP_PASSIVE) { +- sensor->thres_temp = trip[i].temperature; ++ sensor->thres_temp = hisi_thermal_round_temp(trip[i].temperature); + break; + } + } +@@ -317,15 +338,6 @@ static int hisi_thermal_probe(struct platform_device *pdev) + if (data->irq < 0) + return data->irq; + +- ret = devm_request_threaded_irq(&pdev->dev, data->irq, +- hisi_thermal_alarm_irq, +- hisi_thermal_alarm_irq_thread, +- 0, "hisi_thermal", data); +- if (ret < 0) { +- dev_err(&pdev->dev, "failed to request alarm irq: %d\n", ret); +- return ret; +- } +- + platform_set_drvdata(pdev, data); + + data->clk = devm_clk_get(&pdev->dev, "thermal_clk"); +@@ -345,8 +357,7 @@ static int hisi_thermal_probe(struct platform_device *pdev) + } + + hisi_thermal_enable_bind_irq_sensor(data); +- irq_get_irqchip_state(data->irq, IRQCHIP_STATE_MASKED, +- &data->irq_enabled); ++ data->irq_enabled = true; + + for (i = 0; i < HISI_MAX_SENSORS; ++i) { + ret = hisi_thermal_register_sensor(pdev, data, +@@ -358,6 +369,17 @@ static int hisi_thermal_probe(struct platform_device *pdev) + hisi_thermal_toggle_sensor(&data->sensors[i], true); + } + ++ ret = devm_request_threaded_irq(&pdev->dev, data->irq, ++ hisi_thermal_alarm_irq, ++ hisi_thermal_alarm_irq_thread, ++ 0, "hisi_thermal", data); ++ if (ret < 0) { ++ dev_err(&pdev->dev, "failed to request alarm irq: %d\n", ret); ++ return ret; ++ } ++ ++ enable_irq(data->irq); ++ + return 0; + } + 
+diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c +index 5628fe114347..91335e6de88a 100644 +--- a/drivers/vfio/pci/vfio_pci_config.c ++++ b/drivers/vfio/pci/vfio_pci_config.c +@@ -849,11 +849,13 @@ static int __init init_pci_cap_exp_perm(struct perm_bits *perm) + + /* + * Allow writes to device control fields, except devctl_phantom, +- * which could confuse IOMMU, and the ARI bit in devctl2, which ++ * which could confuse IOMMU, MPS, which can break communication ++ * with other physical devices, and the ARI bit in devctl2, which + * is set at probe time. FLR gets virtualized via our writefn. + */ + p_setw(perm, PCI_EXP_DEVCTL, +- PCI_EXP_DEVCTL_BCR_FLR, ~PCI_EXP_DEVCTL_PHANTOM); ++ PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD, ++ ~PCI_EXP_DEVCTL_PHANTOM); + p_setw(perm, PCI_EXP_DEVCTL2, NO_VIRT, ~PCI_EXP_DEVCTL2_ARI); + return 0; + } +diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c +index 9bd17682655a..1c2289ddd555 100644 +--- a/drivers/video/backlight/pwm_bl.c ++++ b/drivers/video/backlight/pwm_bl.c +@@ -79,14 +79,17 @@ static void pwm_backlight_power_off(struct pwm_bl_data *pb) + static int compute_duty_cycle(struct pwm_bl_data *pb, int brightness) + { + unsigned int lth = pb->lth_brightness; +- int duty_cycle; ++ u64 duty_cycle; + + if (pb->levels) + duty_cycle = pb->levels[brightness]; + else + duty_cycle = brightness; + +- return (duty_cycle * (pb->period - lth) / pb->scale) + lth; ++ duty_cycle *= pb->period - lth; ++ do_div(duty_cycle, pb->scale); ++ ++ return duty_cycle + lth; + } + + static int pwm_backlight_update_status(struct backlight_device *bl) +diff --git a/fs/dcache.c b/fs/dcache.c +index f90141387f01..34c852af215c 100644 +--- a/fs/dcache.c ++++ b/fs/dcache.c +@@ -231,7 +231,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c + { + /* + * Be careful about RCU walk racing with rename: +- * use 'lockless_dereference' to fetch the name pointer. ++ * use 'READ_ONCE' to fetch the name pointer. + * + * NOTE! Even if a rename will mean that the length + * was not loaded atomically, we don't care. 
The +@@ -245,7 +245,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c + * early because the data cannot match (there can + * be no NUL in the ct/tcount data) + */ +- const unsigned char *cs = lockless_dereference(dentry->d_name.name); ++ const unsigned char *cs = READ_ONCE(dentry->d_name.name); + + return dentry_string_cmp(cs, ct, tcount); + } +diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h +index 25d9b5adcd42..36b49bd09264 100644 +--- a/fs/overlayfs/ovl_entry.h ++++ b/fs/overlayfs/ovl_entry.h +@@ -77,5 +77,5 @@ static inline struct ovl_inode *OVL_I(struct inode *inode) + + static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi) + { +- return lockless_dereference(oi->__upperdentry); ++ return READ_ONCE(oi->__upperdentry); + } +diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c +index b2c7f33e08fc..d94a51dc4e32 100644 +--- a/fs/overlayfs/readdir.c ++++ b/fs/overlayfs/readdir.c +@@ -757,7 +757,7 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end, + if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) { + struct inode *inode = file_inode(file); + +- realfile = lockless_dereference(od->upperfile); ++ realfile = READ_ONCE(od->upperfile); + if (!realfile) { + struct path upperpath; + +diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h +index e549bff87c5b..353f52fdc35e 100644 +--- a/include/asm-generic/vmlinux.lds.h ++++ b/include/asm-generic/vmlinux.lds.h +@@ -688,7 +688,7 @@ + #define BUG_TABLE + #endif + +-#ifdef CONFIG_ORC_UNWINDER ++#ifdef CONFIG_UNWINDER_ORC + #define ORC_UNWIND_TABLE \ + . = ALIGN(4); \ + .orc_unwind_ip : AT(ADDR(.orc_unwind_ip) - LOAD_OFFSET) { \ +diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h +index b8d200f60a40..73bec75b74c8 100644 +--- a/include/linux/bpf_verifier.h ++++ b/include/linux/bpf_verifier.h +@@ -15,11 +15,11 @@ + * In practice this is far bigger than any realistic pointer offset; this limit + * ensures that umax_value + (int)off + (int)size cannot overflow a u64. + */ +-#define BPF_MAX_VAR_OFF (1ULL << 31) ++#define BPF_MAX_VAR_OFF (1 << 29) + /* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO]. This ensures + * that converting umax_value to int cannot overflow. + */ +-#define BPF_MAX_VAR_SIZ INT_MAX ++#define BPF_MAX_VAR_SIZ (1 << 29) + + /* Liveness marks, used for registers and spilled-regs (in stack slots). + * Read marks propagate upwards until they find a write mark; they record that +@@ -110,7 +110,7 @@ struct bpf_insn_aux_data { + struct bpf_map *map_ptr; /* pointer for call insn into lookup_elem */ + }; + int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ +- int converted_op_size; /* the valid value width after perceived conversion */ ++ bool seen; /* this insn was processed by the verifier */ + }; + + #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */ +diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h +index 780b1242bf24..3b609edffa8f 100644 +--- a/include/linux/compiler-clang.h ++++ b/include/linux/compiler-clang.h +@@ -1,5 +1,5 @@ + /* SPDX-License-Identifier: GPL-2.0 */ +-#ifndef __LINUX_COMPILER_H ++#ifndef __LINUX_COMPILER_TYPES_H + #error "Please don't include directly, include instead." 
+ #endif + +diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h +index bb78e5bdff26..2272ded07496 100644 +--- a/include/linux/compiler-gcc.h ++++ b/include/linux/compiler-gcc.h +@@ -1,5 +1,5 @@ + /* SPDX-License-Identifier: GPL-2.0 */ +-#ifndef __LINUX_COMPILER_H ++#ifndef __LINUX_COMPILER_TYPES_H + #error "Please don't include directly, include instead." + #endif + +diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h +index 523d1b74550f..bfa08160db3a 100644 +--- a/include/linux/compiler-intel.h ++++ b/include/linux/compiler-intel.h +@@ -1,5 +1,5 @@ + /* SPDX-License-Identifier: GPL-2.0 */ +-#ifndef __LINUX_COMPILER_H ++#ifndef __LINUX_COMPILER_TYPES_H + #error "Please don't include directly, include instead." + #endif + +diff --git a/include/linux/compiler.h b/include/linux/compiler.h +index 202710420d6d..fab5dc250c61 100644 +--- a/include/linux/compiler.h ++++ b/include/linux/compiler.h +@@ -2,111 +2,12 @@ + #ifndef __LINUX_COMPILER_H + #define __LINUX_COMPILER_H + +-#ifndef __ASSEMBLY__ ++#include + +-#ifdef __CHECKER__ +-# define __user __attribute__((noderef, address_space(1))) +-# define __kernel __attribute__((address_space(0))) +-# define __safe __attribute__((safe)) +-# define __force __attribute__((force)) +-# define __nocast __attribute__((nocast)) +-# define __iomem __attribute__((noderef, address_space(2))) +-# define __must_hold(x) __attribute__((context(x,1,1))) +-# define __acquires(x) __attribute__((context(x,0,1))) +-# define __releases(x) __attribute__((context(x,1,0))) +-# define __acquire(x) __context__(x,1) +-# define __release(x) __context__(x,-1) +-# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) +-# define __percpu __attribute__((noderef, address_space(3))) +-# define __rcu __attribute__((noderef, address_space(4))) +-# define __private __attribute__((noderef)) +-extern void __chk_user_ptr(const volatile void __user *); +-extern void __chk_io_ptr(const volatile void __iomem *); +-# define ACCESS_PRIVATE(p, member) (*((typeof((p)->member) __force *) &(p)->member)) +-#else /* __CHECKER__ */ +-# ifdef STRUCTLEAK_PLUGIN +-# define __user __attribute__((user)) +-# else +-# define __user +-# endif +-# define __kernel +-# define __safe +-# define __force +-# define __nocast +-# define __iomem +-# define __chk_user_ptr(x) (void)0 +-# define __chk_io_ptr(x) (void)0 +-# define __builtin_warning(x, y...) (1) +-# define __must_hold(x) +-# define __acquires(x) +-# define __releases(x) +-# define __acquire(x) (void)0 +-# define __release(x) (void)0 +-# define __cond_lock(x,c) (c) +-# define __percpu +-# define __rcu +-# define __private +-# define ACCESS_PRIVATE(p, member) ((p)->member) +-#endif /* __CHECKER__ */ +- +-/* Indirect macros required for expanded argument pasting, eg. __LINE__. */ +-#define ___PASTE(a,b) a##b +-#define __PASTE(a,b) ___PASTE(a,b) ++#ifndef __ASSEMBLY__ + + #ifdef __KERNEL__ + +-#ifdef __GNUC__ +-#include +-#endif +- +-#if defined(CC_USING_HOTPATCH) && !defined(__CHECKER__) +-#define notrace __attribute__((hotpatch(0,0))) +-#else +-#define notrace __attribute__((no_instrument_function)) +-#endif +- +-/* Intel compiler defines __GNUC__. So we will overwrite implementations +- * coming from above header files here +- */ +-#ifdef __INTEL_COMPILER +-# include +-#endif +- +-/* Clang compiler defines __GNUC__. 
So we will overwrite implementations +- * coming from above header files here +- */ +-#ifdef __clang__ +-#include +-#endif +- +-/* +- * Generic compiler-dependent macros required for kernel +- * build go below this comment. Actual compiler/compiler version +- * specific implementations come from the above header files +- */ +- +-struct ftrace_branch_data { +- const char *func; +- const char *file; +- unsigned line; +- union { +- struct { +- unsigned long correct; +- unsigned long incorrect; +- }; +- struct { +- unsigned long miss; +- unsigned long hit; +- }; +- unsigned long miss_hit[2]; +- }; +-}; +- +-struct ftrace_likely_data { +- struct ftrace_branch_data data; +- unsigned long constant; +-}; +- + /* + * Note: DISABLE_BRANCH_PROFILING can be used by special lowlevel code + * to disable branch tracing on a per file basis. +@@ -333,6 +234,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s + * with an explicit memory barrier or atomic instruction that provides the + * required ordering. + */ ++#include + + #define __READ_ONCE(x, check) \ + ({ \ +@@ -341,6 +243,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s + __read_once_size(&(x), __u.__c, sizeof(x)); \ + else \ + __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \ ++ smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \ + __u.__val; \ + }) + #define READ_ONCE(x) __READ_ONCE(x, 1) +@@ -363,167 +266,6 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s + + #endif /* __ASSEMBLY__ */ + +-#ifdef __KERNEL__ +-/* +- * Allow us to mark functions as 'deprecated' and have gcc emit a nice +- * warning for each use, in hopes of speeding the functions removal. +- * Usage is: +- * int __deprecated foo(void) +- */ +-#ifndef __deprecated +-# define __deprecated /* unimplemented */ +-#endif +- +-#ifdef MODULE +-#define __deprecated_for_modules __deprecated +-#else +-#define __deprecated_for_modules +-#endif +- +-#ifndef __must_check +-#define __must_check +-#endif +- +-#ifndef CONFIG_ENABLE_MUST_CHECK +-#undef __must_check +-#define __must_check +-#endif +-#ifndef CONFIG_ENABLE_WARN_DEPRECATED +-#undef __deprecated +-#undef __deprecated_for_modules +-#define __deprecated +-#define __deprecated_for_modules +-#endif +- +-#ifndef __malloc +-#define __malloc +-#endif +- +-/* +- * Allow us to avoid 'defined but not used' warnings on functions and data, +- * as well as force them to be emitted to the assembly file. +- * +- * As of gcc 3.4, static functions that are not marked with attribute((used)) +- * may be elided from the assembly file. As of gcc 3.4, static data not so +- * marked will not be elided, but this may change in a future gcc version. +- * +- * NOTE: Because distributions shipped with a backported unit-at-a-time +- * compiler in gcc 3.3, we must define __used to be __attribute__((used)) +- * for gcc >=3.3 instead of 3.4. +- * +- * In prior versions of gcc, such functions and data would be emitted, but +- * would be warned about except with attribute((unused)). +- * +- * Mark functions that are referenced only in inline assembly as __used so +- * the code is emitted even though it appears to be unreferenced. 
+- */ +-#ifndef __used +-# define __used /* unimplemented */ +-#endif +- +-#ifndef __maybe_unused +-# define __maybe_unused /* unimplemented */ +-#endif +- +-#ifndef __always_unused +-# define __always_unused /* unimplemented */ +-#endif +- +-#ifndef noinline +-#define noinline +-#endif +- +-/* +- * Rather then using noinline to prevent stack consumption, use +- * noinline_for_stack instead. For documentation reasons. +- */ +-#define noinline_for_stack noinline +- +-#ifndef __always_inline +-#define __always_inline inline +-#endif +- +-#endif /* __KERNEL__ */ +- +-/* +- * From the GCC manual: +- * +- * Many functions do not examine any values except their arguments, +- * and have no effects except the return value. Basically this is +- * just slightly more strict class than the `pure' attribute above, +- * since function is not allowed to read global memory. +- * +- * Note that a function that has pointer arguments and examines the +- * data pointed to must _not_ be declared `const'. Likewise, a +- * function that calls a non-`const' function usually must not be +- * `const'. It does not make sense for a `const' function to return +- * `void'. +- */ +-#ifndef __attribute_const__ +-# define __attribute_const__ /* unimplemented */ +-#endif +- +-#ifndef __designated_init +-# define __designated_init +-#endif +- +-#ifndef __latent_entropy +-# define __latent_entropy +-#endif +- +-#ifndef __randomize_layout +-# define __randomize_layout __designated_init +-#endif +- +-#ifndef __no_randomize_layout +-# define __no_randomize_layout +-#endif +- +-#ifndef randomized_struct_fields_start +-# define randomized_struct_fields_start +-# define randomized_struct_fields_end +-#endif +- +-/* +- * Tell gcc if a function is cold. The compiler will assume any path +- * directly leading to the call is unlikely. +- */ +- +-#ifndef __cold +-#define __cold +-#endif +- +-/* Simple shorthand for a section definition */ +-#ifndef __section +-# define __section(S) __attribute__ ((__section__(#S))) +-#endif +- +-#ifndef __visible +-#define __visible +-#endif +- +-#ifndef __nostackprotector +-# define __nostackprotector +-#endif +- +-/* +- * Assume alignment of return value. +- */ +-#ifndef __assume_aligned +-#define __assume_aligned(a, ...) +-#endif +- +- +-/* Are two types/vars the same type (ignoring qualifiers)? 
*/ +-#ifndef __same_type +-# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) +-#endif +- +-/* Is this type a native word size -- useful for atomic operations */ +-#ifndef __native_word +-# define __native_word(t) (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) +-#endif +- + /* Compile time object size, -1 for unknown */ + #ifndef __compiletime_object_size + # define __compiletime_object_size(obj) -1 +diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h +new file mode 100644 +index 000000000000..6b79a9bba9a7 +--- /dev/null ++++ b/include/linux/compiler_types.h +@@ -0,0 +1,274 @@ ++#ifndef __LINUX_COMPILER_TYPES_H ++#define __LINUX_COMPILER_TYPES_H ++ ++#ifndef __ASSEMBLY__ ++ ++#ifdef __CHECKER__ ++# define __user __attribute__((noderef, address_space(1))) ++# define __kernel __attribute__((address_space(0))) ++# define __safe __attribute__((safe)) ++# define __force __attribute__((force)) ++# define __nocast __attribute__((nocast)) ++# define __iomem __attribute__((noderef, address_space(2))) ++# define __must_hold(x) __attribute__((context(x,1,1))) ++# define __acquires(x) __attribute__((context(x,0,1))) ++# define __releases(x) __attribute__((context(x,1,0))) ++# define __acquire(x) __context__(x,1) ++# define __release(x) __context__(x,-1) ++# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) ++# define __percpu __attribute__((noderef, address_space(3))) ++# define __rcu __attribute__((noderef, address_space(4))) ++# define __private __attribute__((noderef)) ++extern void __chk_user_ptr(const volatile void __user *); ++extern void __chk_io_ptr(const volatile void __iomem *); ++# define ACCESS_PRIVATE(p, member) (*((typeof((p)->member) __force *) &(p)->member)) ++#else /* __CHECKER__ */ ++# ifdef STRUCTLEAK_PLUGIN ++# define __user __attribute__((user)) ++# else ++# define __user ++# endif ++# define __kernel ++# define __safe ++# define __force ++# define __nocast ++# define __iomem ++# define __chk_user_ptr(x) (void)0 ++# define __chk_io_ptr(x) (void)0 ++# define __builtin_warning(x, y...) (1) ++# define __must_hold(x) ++# define __acquires(x) ++# define __releases(x) ++# define __acquire(x) (void)0 ++# define __release(x) (void)0 ++# define __cond_lock(x,c) (c) ++# define __percpu ++# define __rcu ++# define __private ++# define ACCESS_PRIVATE(p, member) ((p)->member) ++#endif /* __CHECKER__ */ ++ ++/* Indirect macros required for expanded argument pasting, eg. __LINE__. */ ++#define ___PASTE(a,b) a##b ++#define __PASTE(a,b) ___PASTE(a,b) ++ ++#ifdef __KERNEL__ ++ ++#ifdef __GNUC__ ++#include ++#endif ++ ++#if defined(CC_USING_HOTPATCH) && !defined(__CHECKER__) ++#define notrace __attribute__((hotpatch(0,0))) ++#else ++#define notrace __attribute__((no_instrument_function)) ++#endif ++ ++/* Intel compiler defines __GNUC__. So we will overwrite implementations ++ * coming from above header files here ++ */ ++#ifdef __INTEL_COMPILER ++# include ++#endif ++ ++/* Clang compiler defines __GNUC__. So we will overwrite implementations ++ * coming from above header files here ++ */ ++#ifdef __clang__ ++#include ++#endif ++ ++/* ++ * Generic compiler-dependent macros required for kernel ++ * build go below this comment. 
Actual compiler/compiler version ++ * specific implementations come from the above header files ++ */ ++ ++struct ftrace_branch_data { ++ const char *func; ++ const char *file; ++ unsigned line; ++ union { ++ struct { ++ unsigned long correct; ++ unsigned long incorrect; ++ }; ++ struct { ++ unsigned long miss; ++ unsigned long hit; ++ }; ++ unsigned long miss_hit[2]; ++ }; ++}; ++ ++struct ftrace_likely_data { ++ struct ftrace_branch_data data; ++ unsigned long constant; ++}; ++ ++#endif /* __KERNEL__ */ ++ ++#endif /* __ASSEMBLY__ */ ++ ++#ifdef __KERNEL__ ++/* ++ * Allow us to mark functions as 'deprecated' and have gcc emit a nice ++ * warning for each use, in hopes of speeding the functions removal. ++ * Usage is: ++ * int __deprecated foo(void) ++ */ ++#ifndef __deprecated ++# define __deprecated /* unimplemented */ ++#endif ++ ++#ifdef MODULE ++#define __deprecated_for_modules __deprecated ++#else ++#define __deprecated_for_modules ++#endif ++ ++#ifndef __must_check ++#define __must_check ++#endif ++ ++#ifndef CONFIG_ENABLE_MUST_CHECK ++#undef __must_check ++#define __must_check ++#endif ++#ifndef CONFIG_ENABLE_WARN_DEPRECATED ++#undef __deprecated ++#undef __deprecated_for_modules ++#define __deprecated ++#define __deprecated_for_modules ++#endif ++ ++#ifndef __malloc ++#define __malloc ++#endif ++ ++/* ++ * Allow us to avoid 'defined but not used' warnings on functions and data, ++ * as well as force them to be emitted to the assembly file. ++ * ++ * As of gcc 3.4, static functions that are not marked with attribute((used)) ++ * may be elided from the assembly file. As of gcc 3.4, static data not so ++ * marked will not be elided, but this may change in a future gcc version. ++ * ++ * NOTE: Because distributions shipped with a backported unit-at-a-time ++ * compiler in gcc 3.3, we must define __used to be __attribute__((used)) ++ * for gcc >=3.3 instead of 3.4. ++ * ++ * In prior versions of gcc, such functions and data would be emitted, but ++ * would be warned about except with attribute((unused)). ++ * ++ * Mark functions that are referenced only in inline assembly as __used so ++ * the code is emitted even though it appears to be unreferenced. ++ */ ++#ifndef __used ++# define __used /* unimplemented */ ++#endif ++ ++#ifndef __maybe_unused ++# define __maybe_unused /* unimplemented */ ++#endif ++ ++#ifndef __always_unused ++# define __always_unused /* unimplemented */ ++#endif ++ ++#ifndef noinline ++#define noinline ++#endif ++ ++/* ++ * Rather then using noinline to prevent stack consumption, use ++ * noinline_for_stack instead. For documentation reasons. ++ */ ++#define noinline_for_stack noinline ++ ++#ifndef __always_inline ++#define __always_inline inline ++#endif ++ ++#endif /* __KERNEL__ */ ++ ++/* ++ * From the GCC manual: ++ * ++ * Many functions do not examine any values except their arguments, ++ * and have no effects except the return value. Basically this is ++ * just slightly more strict class than the `pure' attribute above, ++ * since function is not allowed to read global memory. ++ * ++ * Note that a function that has pointer arguments and examines the ++ * data pointed to must _not_ be declared `const'. Likewise, a ++ * function that calls a non-`const' function usually must not be ++ * `const'. It does not make sense for a `const' function to return ++ * `void'. 
++ */ ++#ifndef __attribute_const__ ++# define __attribute_const__ /* unimplemented */ ++#endif ++ ++#ifndef __designated_init ++# define __designated_init ++#endif ++ ++#ifndef __latent_entropy ++# define __latent_entropy ++#endif ++ ++#ifndef __randomize_layout ++# define __randomize_layout __designated_init ++#endif ++ ++#ifndef __no_randomize_layout ++# define __no_randomize_layout ++#endif ++ ++#ifndef randomized_struct_fields_start ++# define randomized_struct_fields_start ++# define randomized_struct_fields_end ++#endif ++ ++/* ++ * Tell gcc if a function is cold. The compiler will assume any path ++ * directly leading to the call is unlikely. ++ */ ++ ++#ifndef __cold ++#define __cold ++#endif ++ ++/* Simple shorthand for a section definition */ ++#ifndef __section ++# define __section(S) __attribute__ ((__section__(#S))) ++#endif ++ ++#ifndef __visible ++#define __visible ++#endif ++ ++#ifndef __nostackprotector ++# define __nostackprotector ++#endif ++ ++/* ++ * Assume alignment of return value. ++ */ ++#ifndef __assume_aligned ++#define __assume_aligned(a, ...) ++#endif ++ ++ ++/* Are two types/vars the same type (ignoring qualifiers)? */ ++#ifndef __same_type ++# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) ++#endif ++ ++/* Is this type a native word size -- useful for atomic operations */ ++#ifndef __native_word ++# define __native_word(t) (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) ++#endif ++ ++#endif /* __LINUX_COMPILER_TYPES_H */ +diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h +index b4054fd5b6f6..b19563f9a8eb 100644 +--- a/include/linux/hypervisor.h ++++ b/include/linux/hypervisor.h +@@ -7,8 +7,12 @@ + * Juergen Gross + */ + +-#ifdef CONFIG_HYPERVISOR_GUEST +-#include ++#ifdef CONFIG_X86 ++#include ++static inline void hypervisor_pin_vcpu(int cpu) ++{ ++ x86_platform.hyper.pin_vcpu(cpu); ++} + #else + static inline void hypervisor_pin_vcpu(int cpu) + { +diff --git a/include/linux/iio/common/st_sensors.h b/include/linux/iio/common/st_sensors.h +index 7b0fa8b5c120..ce0ef1c0a30a 100644 +--- a/include/linux/iio/common/st_sensors.h ++++ b/include/linux/iio/common/st_sensors.h +@@ -139,7 +139,7 @@ struct st_sensor_das { + * @mask_ihl: mask to enable/disable active low on the INT lines. + * @addr_od: address to enable/disable Open Drain on the INT lines. + * @mask_od: mask to enable/disable Open Drain on the INT lines. +- * @addr_stat_drdy: address to read status of DRDY (data ready) interrupt ++ * struct stat_drdy - status register of DRDY (data ready) interrupt. + * struct ig1 - represents the Interrupt Generator 1 of sensors. + * @en_addr: address of the enable ig1 register. + * @en_mask: mask to write the on/off value for enable. +@@ -152,7 +152,10 @@ struct st_sensor_data_ready_irq { + u8 mask_ihl; + u8 addr_od; + u8 mask_od; +- u8 addr_stat_drdy; ++ struct { ++ u8 addr; ++ u8 mask; ++ } stat_drdy; + struct { + u8 en_addr; + u8 en_mask; +diff --git a/include/linux/intel-pti.h b/include/linux/intel-pti.h +new file mode 100644 +index 000000000000..2710d72de3c9 +--- /dev/null ++++ b/include/linux/intel-pti.h +@@ -0,0 +1,43 @@ ++/* ++ * Copyright (C) Intel 2011 ++ * ++ * This program is free software; you can redistribute it and/or modify ++ * it under the terms of the GNU General Public License version 2 as ++ * published by the Free Software Foundation. 
++ * ++ * This program is distributed in the hope that it will be useful, ++ * but WITHOUT ANY WARRANTY; without even the implied warranty of ++ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ++ * GNU General Public License for more details. ++ * ++ * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ++ * ++ * The PTI (Parallel Trace Interface) driver directs trace data routed from ++ * various parts in the system out through the Intel Penwell PTI port and ++ * out of the mobile device for analysis with a debugging tool ++ * (Lauterbach, Fido). This is part of a solution for the MIPI P1149.7, ++ * compact JTAG, standard. ++ * ++ * This header file will allow other parts of the OS to use the ++ * interface to write out it's contents for debugging a mobile system. ++ */ ++ ++#ifndef LINUX_INTEL_PTI_H_ ++#define LINUX_INTEL_PTI_H_ ++ ++/* offset for last dword of any PTI message. Part of MIPI P1149.7 */ ++#define PTI_LASTDWORD_DTS 0x30 ++ ++/* basic structure used as a write address to the PTI HW */ ++struct pti_masterchannel { ++ u8 master; ++ u8 channel; ++}; ++ ++/* the following functions are defined in misc/pti.c */ ++void pti_writedata(struct pti_masterchannel *mc, u8 *buf, int count); ++struct pti_masterchannel *pti_request_masterchannel(u8 type, ++ const char *thread_name); ++void pti_release_masterchannel(struct pti_masterchannel *mc); ++ ++#endif /* LINUX_INTEL_PTI_H_ */ +diff --git a/include/linux/linkage.h b/include/linux/linkage.h +index 2e6f90bd52aa..f68db9e450eb 100644 +--- a/include/linux/linkage.h ++++ b/include/linux/linkage.h +@@ -2,7 +2,7 @@ + #ifndef _LINUX_LINKAGE_H + #define _LINUX_LINKAGE_H + +-#include ++#include + #include + #include + #include +diff --git a/include/linux/mm.h b/include/linux/mm.h +index db647d428100..f50deada0f5c 100644 +--- a/include/linux/mm.h ++++ b/include/linux/mm.h +@@ -2510,7 +2510,7 @@ void vmemmap_populate_print_last(void); + void vmemmap_free(unsigned long start, unsigned long end); + #endif + void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, +- unsigned long size); ++ unsigned long nr_pages); + + enum mf_flags { + MF_COUNT_INCREASED = 1 << 0, +diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h +index 18b06983131a..f0938257ee6d 100644 +--- a/include/linux/mmzone.h ++++ b/include/linux/mmzone.h +@@ -1152,13 +1152,17 @@ struct mem_section { + #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1) + + #ifdef CONFIG_SPARSEMEM_EXTREME +-extern struct mem_section *mem_section[NR_SECTION_ROOTS]; ++extern struct mem_section **mem_section; + #else + extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; + #endif + + static inline struct mem_section *__nr_to_section(unsigned long nr) + { ++#ifdef CONFIG_SPARSEMEM_EXTREME ++ if (!mem_section) ++ return NULL; ++#endif + if (!mem_section[SECTION_NR_TO_ROOT(nr)]) + return NULL; + return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; +diff --git a/include/linux/pti.h b/include/linux/pti.h +deleted file mode 100644 +index b3ea01a3197e..000000000000 +--- a/include/linux/pti.h ++++ /dev/null +@@ -1,43 +0,0 @@ +-/* +- * Copyright (C) Intel 2011 +- * +- * This program is free software; you can redistribute it and/or modify +- * it under the terms of the GNU General Public License version 2 as +- * published by the Free Software Foundation. 
+- * +- * This program is distributed in the hope that it will be useful, +- * but WITHOUT ANY WARRANTY; without even the implied warranty of +- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +- * GNU General Public License for more details. +- * +- * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +- * +- * The PTI (Parallel Trace Interface) driver directs trace data routed from +- * various parts in the system out through the Intel Penwell PTI port and +- * out of the mobile device for analysis with a debugging tool +- * (Lauterbach, Fido). This is part of a solution for the MIPI P1149.7, +- * compact JTAG, standard. +- * +- * This header file will allow other parts of the OS to use the +- * interface to write out it's contents for debugging a mobile system. +- */ +- +-#ifndef PTI_H_ +-#define PTI_H_ +- +-/* offset for last dword of any PTI message. Part of MIPI P1149.7 */ +-#define PTI_LASTDWORD_DTS 0x30 +- +-/* basic structure used as a write address to the PTI HW */ +-struct pti_masterchannel { +- u8 master; +- u8 channel; +-}; +- +-/* the following functions are defined in misc/pti.c */ +-void pti_writedata(struct pti_masterchannel *mc, u8 *buf, int count); +-struct pti_masterchannel *pti_request_masterchannel(u8 type, +- const char *thread_name); +-void pti_release_masterchannel(struct pti_masterchannel *mc); +- +-#endif /*PTI_H_*/ +diff --git a/include/linux/rculist.h b/include/linux/rculist.h +index c2cdd45a880a..127f534fec94 100644 +--- a/include/linux/rculist.h ++++ b/include/linux/rculist.h +@@ -275,7 +275,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, + * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). + */ + #define list_entry_rcu(ptr, type, member) \ +- container_of(lockless_dereference(ptr), type, member) ++ container_of(READ_ONCE(ptr), type, member) + + /* + * Where are list_empty_rcu() and list_first_entry_rcu()? +@@ -368,7 +368,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, + * example is when items are added to the list, but never deleted. + */ + #define list_entry_lockless(ptr, type, member) \ +- container_of((typeof(ptr))lockless_dereference(ptr), type, member) ++ container_of((typeof(ptr))READ_ONCE(ptr), type, member) + + /** + * list_for_each_entry_lockless - iterate over rcu list of given type +diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h +index 1a9f70d44af9..a6ddc42f87a5 100644 +--- a/include/linux/rcupdate.h ++++ b/include/linux/rcupdate.h +@@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_check(void) { } + #define __rcu_dereference_check(p, c, space) \ + ({ \ + /* Dependency order vs. p above. */ \ +- typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ ++ typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \ + RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ + rcu_dereference_sparse(p, space); \ + ((typeof(*p) __force __kernel *)(________p1)); \ +@@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_check(void) { } + #define rcu_dereference_raw(p) \ + ({ \ + /* Dependency order vs. p above. 
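The rcu_dereference_check()/rcu_dereference_raw() hunks above are part of a patch-wide conversion from lockless_dereference() to READ_ONCE(). A minimal userspace approximation of the load half of that primitive is sketched below; it is not part of the patch, READ_ONCE_DEMO is an invented name, and by this point in the 4.14 series the real kernel READ_ONCE() also supplies the dependency ordering that lockless_dereference() used to add explicitly.

#include <stdio.h>

#define READ_ONCE_DEMO(x) (*(volatile __typeof__(x) *)&(x))

static int shared = 42;

int main(void)
{
        /* a single, non-torn load the compiler may not re-read or split */
        int v = READ_ONCE_DEMO(shared);

        printf("%d\n", v);
        return 0;
}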
*/ \ +- typeof(p) ________p1 = lockless_dereference(p); \ ++ typeof(p) ________p1 = READ_ONCE(p); \ + ((typeof(*p) __force __kernel *)(________p1)); \ + }) + +diff --git a/include/uapi/linux/stddef.h b/include/uapi/linux/stddef.h +index f65b92e0e1f9..ee8220f8dcf5 100644 +--- a/include/uapi/linux/stddef.h ++++ b/include/uapi/linux/stddef.h +@@ -1,5 +1,5 @@ + /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +-#include ++#include + + #ifndef __always_inline + #define __always_inline inline +diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c +index c48ca2a34b5e..c5ff809e86d0 100644 +--- a/kernel/bpf/verifier.c ++++ b/kernel/bpf/verifier.c +@@ -1061,6 +1061,11 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, + break; + case PTR_TO_STACK: + pointer_desc = "stack "; ++ /* The stack spill tracking logic in check_stack_write() ++ * and check_stack_read() relies on stack accesses being ++ * aligned. ++ */ ++ strict = true; + break; + default: + break; +@@ -1068,6 +1073,29 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, + return check_generic_ptr_alignment(reg, pointer_desc, off, size, strict); + } + ++/* truncate register to smaller size (in bytes) ++ * must be called with size < BPF_REG_SIZE ++ */ ++static void coerce_reg_to_size(struct bpf_reg_state *reg, int size) ++{ ++ u64 mask; ++ ++ /* clear high bits in bit representation */ ++ reg->var_off = tnum_cast(reg->var_off, size); ++ ++ /* fix arithmetic bounds */ ++ mask = ((u64)1 << (size * 8)) - 1; ++ if ((reg->umin_value & ~mask) == (reg->umax_value & ~mask)) { ++ reg->umin_value &= mask; ++ reg->umax_value &= mask; ++ } else { ++ reg->umin_value = 0; ++ reg->umax_value = mask; ++ } ++ reg->smin_value = reg->umin_value; ++ reg->smax_value = reg->umax_value; ++} ++ + /* check whether memory at (regno + off) is accessible for t = (read | write) + * if t==write, value_regno is a register which value is stored into memory + * if t==read, value_regno is a register which will receive the value from memory +@@ -1200,9 +1228,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn + if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ && + state->regs[value_regno].type == SCALAR_VALUE) { + /* b/h/w load zero-extends, mark upper bits as known 0 */ +- state->regs[value_regno].var_off = tnum_cast( +- state->regs[value_regno].var_off, size); +- __update_reg_bounds(&state->regs[value_regno]); ++ coerce_reg_to_size(&state->regs[value_regno], size); + } + return err; + } +@@ -1282,6 +1308,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno, + tnum_strn(tn_buf, sizeof(tn_buf), regs[regno].var_off); + verbose("invalid variable stack read R%d var_off=%s\n", + regno, tn_buf); ++ return -EACCES; + } + off = regs[regno].off + regs[regno].var_off.value; + if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 || +@@ -1742,14 +1769,6 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx) + return 0; + } + +-static void coerce_reg_to_32(struct bpf_reg_state *reg) +-{ +- /* clear high 32 bits */ +- reg->var_off = tnum_cast(reg->var_off, 4); +- /* Update bounds */ +- __update_reg_bounds(reg); +-} +- + static bool signed_add_overflows(s64 a, s64 b) + { + /* Do the add in u64, where overflow is well-defined */ +@@ -1770,6 +1789,41 @@ static bool signed_sub_overflows(s64 a, s64 b) + return res > a; + } + ++static bool check_reg_sane_offset(struct bpf_verifier_env *env, ++ const struct bpf_reg_state *reg, ++ enum 
bpf_reg_type type) ++{ ++ bool known = tnum_is_const(reg->var_off); ++ s64 val = reg->var_off.value; ++ s64 smin = reg->smin_value; ++ ++ if (known && (val >= BPF_MAX_VAR_OFF || val <= -BPF_MAX_VAR_OFF)) { ++ verbose("math between %s pointer and %lld is not allowed\n", ++ reg_type_str[type], val); ++ return false; ++ } ++ ++ if (reg->off >= BPF_MAX_VAR_OFF || reg->off <= -BPF_MAX_VAR_OFF) { ++ verbose("%s pointer offset %d is not allowed\n", ++ reg_type_str[type], reg->off); ++ return false; ++ } ++ ++ if (smin == S64_MIN) { ++ verbose("math between %s pointer and register with unbounded min value is not allowed\n", ++ reg_type_str[type]); ++ return false; ++ } ++ ++ if (smin >= BPF_MAX_VAR_OFF || smin <= -BPF_MAX_VAR_OFF) { ++ verbose("value %lld makes %s pointer be out of bounds\n", ++ smin, reg_type_str[type]); ++ return false; ++ } ++ ++ return true; ++} ++ + /* Handles arithmetic on a pointer and a scalar: computes new min/max and var_off. + * Caller should also handle BPF_MOV case separately. + * If we return -EACCES, caller may want to try again treating pointer as a +@@ -1835,6 +1889,10 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, + dst_reg->type = ptr_reg->type; + dst_reg->id = ptr_reg->id; + ++ if (!check_reg_sane_offset(env, off_reg, ptr_reg->type) || ++ !check_reg_sane_offset(env, ptr_reg, ptr_reg->type)) ++ return -EINVAL; ++ + switch (opcode) { + case BPF_ADD: + /* We can take a fixed offset as long as it doesn't overflow +@@ -1965,12 +2023,19 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, + return -EACCES; + } + ++ if (!check_reg_sane_offset(env, dst_reg, ptr_reg->type)) ++ return -EINVAL; ++ + __update_reg_bounds(dst_reg); + __reg_deduce_bounds(dst_reg); + __reg_bound_offset(dst_reg); + return 0; + } + ++/* WARNING: This function does calculations on 64-bit values, but the actual ++ * execution may occur on 32-bit values. Therefore, things like bitshifts ++ * need extra checks in the 32-bit case. ++ */ + static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, + struct bpf_insn *insn, + struct bpf_reg_state *dst_reg, +@@ -1981,12 +2046,8 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, + bool src_known, dst_known; + s64 smin_val, smax_val; + u64 umin_val, umax_val; ++ u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32; + +- if (BPF_CLASS(insn->code) != BPF_ALU64) { +- /* 32-bit ALU ops are (32,32)->64 */ +- coerce_reg_to_32(dst_reg); +- coerce_reg_to_32(&src_reg); +- } + smin_val = src_reg.smin_value; + smax_val = src_reg.smax_value; + umin_val = src_reg.umin_value; +@@ -1994,6 +2055,12 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, + src_known = tnum_is_const(src_reg.var_off); + dst_known = tnum_is_const(dst_reg->var_off); + ++ if (!src_known && ++ opcode != BPF_ADD && opcode != BPF_SUB && opcode != BPF_AND) { ++ __mark_reg_unknown(dst_reg); ++ return 0; ++ } ++ + switch (opcode) { + case BPF_ADD: + if (signed_add_overflows(dst_reg->smin_value, smin_val) || +@@ -2122,9 +2189,9 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, + __update_reg_bounds(dst_reg); + break; + case BPF_LSH: +- if (umax_val > 63) { +- /* Shifts greater than 63 are undefined. This includes +- * shifts by a negative number. ++ if (umax_val >= insn_bitness) { ++ /* Shifts greater than 31 or 63 are undefined. ++ * This includes shifts by a negative number. 
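The BPF_LSH/BPF_RSH changes above replace the hard-coded check against 63 with one against insn_bitness, so 32-bit shifts are validated against 32 rather than 64. The standalone sketch below (not part of the patch; shift_count_ok is an invented name) mirrors that check: any shift count that can reach the register width is undefined, so the verifier must treat the result as unknown.

#include <stdint.h>
#include <stdio.h>

/* Same condition the patched verifier applies before BPF_LSH/BPF_RSH. */
static int shift_count_ok(uint64_t umax_val, unsigned int insn_bitness)
{
        return umax_val < insn_bitness;
}

int main(void)
{
        printf("32-bit shift by 31: %s\n", shift_count_ok(31, 32) ? "defined" : "unknown");
        printf("32-bit shift by 32: %s\n", shift_count_ok(32, 32) ? "defined" : "unknown");
        printf("64-bit shift by 63: %s\n", shift_count_ok(63, 64) ? "defined" : "unknown");
        return 0;
}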
+ */ + mark_reg_unknown(regs, insn->dst_reg); + break; +@@ -2150,27 +2217,29 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, + __update_reg_bounds(dst_reg); + break; + case BPF_RSH: +- if (umax_val > 63) { +- /* Shifts greater than 63 are undefined. This includes +- * shifts by a negative number. ++ if (umax_val >= insn_bitness) { ++ /* Shifts greater than 31 or 63 are undefined. ++ * This includes shifts by a negative number. + */ + mark_reg_unknown(regs, insn->dst_reg); + break; + } +- /* BPF_RSH is an unsigned shift, so make the appropriate casts */ +- if (dst_reg->smin_value < 0) { +- if (umin_val) { +- /* Sign bit will be cleared */ +- dst_reg->smin_value = 0; +- } else { +- /* Lost sign bit information */ +- dst_reg->smin_value = S64_MIN; +- dst_reg->smax_value = S64_MAX; +- } +- } else { +- dst_reg->smin_value = +- (u64)(dst_reg->smin_value) >> umax_val; +- } ++ /* BPF_RSH is an unsigned shift. If the value in dst_reg might ++ * be negative, then either: ++ * 1) src_reg might be zero, so the sign bit of the result is ++ * unknown, so we lose our signed bounds ++ * 2) it's known negative, thus the unsigned bounds capture the ++ * signed bounds ++ * 3) the signed bounds cross zero, so they tell us nothing ++ * about the result ++ * If the value in dst_reg is known nonnegative, then again the ++ * unsigned bounts capture the signed bounds. ++ * Thus, in all cases it suffices to blow away our signed bounds ++ * and rely on inferring new ones from the unsigned bounds and ++ * var_off of the result. ++ */ ++ dst_reg->smin_value = S64_MIN; ++ dst_reg->smax_value = S64_MAX; + if (src_known) + dst_reg->var_off = tnum_rshift(dst_reg->var_off, + umin_val); +@@ -2186,6 +2255,12 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, + break; + } + ++ if (BPF_CLASS(insn->code) != BPF_ALU64) { ++ /* 32-bit ALU ops are (32,32)->32 */ ++ coerce_reg_to_size(dst_reg, 4); ++ coerce_reg_to_size(&src_reg, 4); ++ } ++ + __reg_deduce_bounds(dst_reg); + __reg_bound_offset(dst_reg); + return 0; +@@ -2362,17 +2437,20 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn) + return -EACCES; + } + mark_reg_unknown(regs, insn->dst_reg); +- /* high 32 bits are known zero. */ +- regs[insn->dst_reg].var_off = tnum_cast( +- regs[insn->dst_reg].var_off, 4); +- __update_reg_bounds(®s[insn->dst_reg]); ++ coerce_reg_to_size(®s[insn->dst_reg], 4); + } + } else { + /* case: R = imm + * remember the value we stored into this reg + */ + regs[insn->dst_reg].type = SCALAR_VALUE; +- __mark_reg_known(regs + insn->dst_reg, insn->imm); ++ if (BPF_CLASS(insn->code) == BPF_ALU64) { ++ __mark_reg_known(regs + insn->dst_reg, ++ insn->imm); ++ } else { ++ __mark_reg_known(regs + insn->dst_reg, ++ (u32)insn->imm); ++ } + } + + } else if (opcode > BPF_END) { +@@ -3307,15 +3385,14 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur, + return range_within(rold, rcur) && + tnum_in(rold->var_off, rcur->var_off); + } else { +- /* if we knew anything about the old value, we're not +- * equal, because we can't know anything about the +- * scalar value of the pointer in the new value. ++ /* We're trying to use a pointer in place of a scalar. ++ * Even if the scalar was unbounded, this could lead to ++ * pointer leaks because scalars are allowed to leak ++ * while pointers are not. We could make this safe in ++ * special cases if root is calling us, but it's ++ * probably not worth the hassle. 
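coerce_reg_to_size(), used above for sub-64-bit loads and after 32-bit ALU ops, truncates both the tnum and the tracked bounds. A userspace model of the unsigned-bounds part is sketched below; it is illustrative only (coerce_to_size and struct bounds are invented names) and omits the signed-bounds and var_off handling of the real function.

#include <stdint.h>
#include <stdio.h>

struct bounds {
        uint64_t umin, umax;
};

/* Truncate tracked unsigned bounds to 'size' bytes (size < 8). */
static void coerce_to_size(struct bounds *b, int size)
{
        uint64_t mask = ((uint64_t)1 << (size * 8)) - 1;

        if ((b->umin & ~mask) == (b->umax & ~mask)) {
                /* high bits identical: truncation keeps the range exact */
                b->umin &= mask;
                b->umax &= mask;
        } else {
                /* range straddles a high-bit change: only the mask is known */
                b->umin = 0;
                b->umax = mask;
        }
}

int main(void)
{
        struct bounds b = { 0x100000010ULL, 0x1000000f0ULL };

        coerce_to_size(&b, 4);
        printf("umin=%#llx umax=%#llx\n",
               (unsigned long long)b.umin, (unsigned long long)b.umax);
        return 0;
}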
+ */ +- return rold->umin_value == 0 && +- rold->umax_value == U64_MAX && +- rold->smin_value == S64_MIN && +- rold->smax_value == S64_MAX && +- tnum_is_unknown(rold->var_off); ++ return false; + } + case PTR_TO_MAP_VALUE: + /* If the new min/max/var_off satisfy the old ones and +@@ -3665,6 +3742,7 @@ static int do_check(struct bpf_verifier_env *env) + if (err) + return err; + ++ env->insn_aux_data[insn_idx].seen = true; + if (class == BPF_ALU || class == BPF_ALU64) { + err = check_alu_op(env, insn); + if (err) +@@ -3855,6 +3933,7 @@ static int do_check(struct bpf_verifier_env *env) + return err; + + insn_idx++; ++ env->insn_aux_data[insn_idx].seen = true; + } else { + verbose("invalid BPF_LD mode\n"); + return -EINVAL; +@@ -4035,6 +4114,7 @@ static int adjust_insn_aux_data(struct bpf_verifier_env *env, u32 prog_len, + u32 off, u32 cnt) + { + struct bpf_insn_aux_data *new_data, *old_data = env->insn_aux_data; ++ int i; + + if (cnt == 1) + return 0; +@@ -4044,6 +4124,8 @@ static int adjust_insn_aux_data(struct bpf_verifier_env *env, u32 prog_len, + memcpy(new_data, old_data, sizeof(struct bpf_insn_aux_data) * off); + memcpy(new_data + off + cnt - 1, old_data + off, + sizeof(struct bpf_insn_aux_data) * (prog_len - off - cnt + 1)); ++ for (i = off; i < off + cnt - 1; i++) ++ new_data[i].seen = true; + env->insn_aux_data = new_data; + vfree(old_data); + return 0; +@@ -4062,6 +4144,25 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of + return new_prog; + } + ++/* The verifier does more data flow analysis than llvm and will not explore ++ * branches that are dead at run time. Malicious programs can have dead code ++ * too. Therefore replace all dead at-run-time code with nops. ++ */ ++static void sanitize_dead_code(struct bpf_verifier_env *env) ++{ ++ struct bpf_insn_aux_data *aux_data = env->insn_aux_data; ++ struct bpf_insn nop = BPF_MOV64_REG(BPF_REG_0, BPF_REG_0); ++ struct bpf_insn *insn = env->prog->insnsi; ++ const int insn_cnt = env->prog->len; ++ int i; ++ ++ for (i = 0; i < insn_cnt; i++) { ++ if (aux_data[i].seen) ++ continue; ++ memcpy(insn + i, &nop, sizeof(nop)); ++ } ++} ++ + /* convert load instructions that access fields of 'struct __sk_buff' + * into sequence of instructions that access fields of 'struct sk_buff' + */ +@@ -4378,6 +4479,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr) + while (pop_stack(env, NULL) >= 0); + free_states(env); + ++ if (ret == 0) ++ sanitize_dead_code(env); ++ + if (ret == 0) + /* program is valid, convert *(u32*)(ctx + off) accesses */ + ret = convert_ctx_accesses(env); +diff --git a/kernel/events/core.c b/kernel/events/core.c +index 4f1d4bfc607a..24ebad5567b4 100644 +--- a/kernel/events/core.c ++++ b/kernel/events/core.c +@@ -4233,7 +4233,7 @@ static void perf_remove_from_owner(struct perf_event *event) + * indeed free this event, otherwise we need to serialize on + * owner->perf_event_mutex. + */ +- owner = lockless_dereference(event->owner); ++ owner = READ_ONCE(event->owner); + if (owner) { + /* + * Since delayed_put_task_struct() also drops the last +@@ -4330,7 +4330,7 @@ int perf_event_release_kernel(struct perf_event *event) + * Cannot change, child events are not migrated, see the + * comment with perf_event_ctx_lock_nested(). + */ +- ctx = lockless_dereference(child->ctx); ++ ctx = READ_ONCE(child->ctx); + /* + * Since child_mutex nests inside ctx::mutex, we must jump + * through hoops. We start by grabbing a reference on the ctx. 
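sanitize_dead_code(), added above, runs after verification and overwrites every instruction slot that was never marked seen with a register-to-itself move. A toy model of that pass, not part of the patch and using invented names, is sketched below:

#include <stdio.h>
#include <string.h>

struct insn { char text[16]; };
struct aux  { int seen; };

static void sanitize(struct insn *prog, const struct aux *aux, int cnt)
{
        /* stand-in for BPF_MOV64_REG(BPF_REG_0, BPF_REG_0) */
        struct insn nop = { "mov r0, r0" };
        int i;

        for (i = 0; i < cnt; i++) {
                if (aux[i].seen)
                        continue;
                memcpy(&prog[i], &nop, sizeof(nop));
        }
}

int main(void)
{
        struct insn prog[3] = { {"ld r1, [r2]"}, {"jmp +1"}, {"add r0, 1"} };
        struct aux  aux[3]  = { {1}, {1}, {0} };  /* third insn never reached */
        int i;

        sanitize(prog, aux, 3);
        for (i = 0; i < 3; i++)
                printf("%d: %s\n", i, prog[i].text);
        return 0;
}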
+diff --git a/kernel/seccomp.c b/kernel/seccomp.c +index 418a1c045933..5f0dfb2abb8d 100644 +--- a/kernel/seccomp.c ++++ b/kernel/seccomp.c +@@ -190,7 +190,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, + u32 ret = SECCOMP_RET_ALLOW; + /* Make sure cross-thread synced filter points somewhere sane. */ + struct seccomp_filter *f = +- lockless_dereference(current->seccomp.filter); ++ READ_ONCE(current->seccomp.filter); + + /* Ensure unexpected behavior doesn't result in failing open. */ + if (unlikely(WARN_ON(f == NULL))) +diff --git a/kernel/task_work.c b/kernel/task_work.c +index 5718b3ea202a..0fef395662a6 100644 +--- a/kernel/task_work.c ++++ b/kernel/task_work.c +@@ -68,7 +68,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func) + * we raced with task_work_run(), *pprev == NULL/exited. + */ + raw_spin_lock_irqsave(&task->pi_lock, flags); +- while ((work = lockless_dereference(*pprev))) { ++ while ((work = READ_ONCE(*pprev))) { + if (work->func != func) + pprev = &work->next; + else if (cmpxchg(pprev, work, work->next) == work) +diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c +index dc498b605d5d..6350f64d5aa4 100644 +--- a/kernel/trace/bpf_trace.c ++++ b/kernel/trace/bpf_trace.c +@@ -293,14 +293,13 @@ static const struct bpf_func_proto bpf_perf_event_read_proto = { + .arg2_type = ARG_ANYTHING, + }; + +-static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd); ++static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd); + + static __always_inline u64 + __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, +- u64 flags, struct perf_raw_record *raw) ++ u64 flags, struct perf_sample_data *sd) + { + struct bpf_array *array = container_of(map, struct bpf_array, map); +- struct perf_sample_data *sd = this_cpu_ptr(&bpf_sd); + unsigned int cpu = smp_processor_id(); + u64 index = flags & BPF_F_INDEX_MASK; + struct bpf_event_entry *ee; +@@ -323,8 +322,6 @@ __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, + if (unlikely(event->oncpu != cpu)) + return -EOPNOTSUPP; + +- perf_sample_data_init(sd, 0, 0); +- sd->raw = raw; + perf_event_output(event, sd, regs); + return 0; + } +@@ -332,6 +329,7 @@ __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, + BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, + u64, flags, void *, data, u64, size) + { ++ struct perf_sample_data *sd = this_cpu_ptr(&bpf_trace_sd); + struct perf_raw_record raw = { + .frag = { + .size = size, +@@ -342,7 +340,10 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, + if (unlikely(flags & ~(BPF_F_INDEX_MASK))) + return -EINVAL; + +- return __bpf_perf_event_output(regs, map, flags, &raw); ++ perf_sample_data_init(sd, 0, 0); ++ sd->raw = &raw; ++ ++ return __bpf_perf_event_output(regs, map, flags, sd); + } + + static const struct bpf_func_proto bpf_perf_event_output_proto = { +@@ -357,10 +358,12 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = { + }; + + static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs); ++static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd); + + u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, + void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy) + { ++ struct perf_sample_data *sd = this_cpu_ptr(&bpf_misc_sd); + struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs); + struct perf_raw_frag frag = { + .copy = ctx_copy, +@@ -378,8 +381,10 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 
meta_size, + }; + + perf_fetch_caller_regs(regs); ++ perf_sample_data_init(sd, 0, 0); ++ sd->raw = &raw; + +- return __bpf_perf_event_output(regs, map, flags, &raw); ++ return __bpf_perf_event_output(regs, map, flags, sd); + } + + BPF_CALL_0(bpf_get_current_task) +diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c +index 1c21d0e2a145..7eb975a2d0e1 100644 +--- a/kernel/trace/trace_events_hist.c ++++ b/kernel/trace/trace_events_hist.c +@@ -450,7 +450,7 @@ static int create_val_field(struct hist_trigger_data *hist_data, + } + + field = trace_find_event_field(file->event_call, field_name); +- if (!field) { ++ if (!field || !field->size) { + ret = -EINVAL; + goto out; + } +@@ -548,7 +548,7 @@ static int create_key_field(struct hist_trigger_data *hist_data, + } + + field = trace_find_event_field(file->event_call, field_name); +- if (!field) { ++ if (!field || !field->size) { + ret = -EINVAL; + goto out; + } +diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug +index dfdad67d8f6c..ff21b4dbb392 100644 +--- a/lib/Kconfig.debug ++++ b/lib/Kconfig.debug +@@ -376,7 +376,7 @@ config STACK_VALIDATION + that runtime stack traces are more reliable. + + This is also a prerequisite for generation of ORC unwind data, which +- is needed for CONFIG_ORC_UNWINDER. ++ is needed for CONFIG_UNWINDER_ORC. + + For more information, see + tools/objtool/Documentation/stack-validation.txt. +diff --git a/mm/slab.h b/mm/slab.h +index 028cdc7df67e..86d7c7d860f9 100644 +--- a/mm/slab.h ++++ b/mm/slab.h +@@ -259,7 +259,7 @@ cache_from_memcg_idx(struct kmem_cache *s, int idx) + * memcg_caches issues a write barrier to match this (see + * memcg_create_kmem_cache()). + */ +- cachep = lockless_dereference(arr->entries[idx]); ++ cachep = READ_ONCE(arr->entries[idx]); + rcu_read_unlock(); + + return cachep; +diff --git a/mm/sparse.c b/mm/sparse.c +index 4900707ae146..60805abf98af 100644 +--- a/mm/sparse.c ++++ b/mm/sparse.c +@@ -23,8 +23,7 @@ + * 1) mem_section - memory sections, mem_map's for valid memory + */ + #ifdef CONFIG_SPARSEMEM_EXTREME +-struct mem_section *mem_section[NR_SECTION_ROOTS] +- ____cacheline_internodealigned_in_smp; ++struct mem_section **mem_section; + #else + struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT] + ____cacheline_internodealigned_in_smp; +@@ -101,7 +100,7 @@ static inline int sparse_index_init(unsigned long section_nr, int nid) + int __section_nr(struct mem_section* ms) + { + unsigned long root_nr; +- struct mem_section* root; ++ struct mem_section *root = NULL; + + for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) { + root = __nr_to_section(root_nr * SECTIONS_PER_ROOT); +@@ -112,7 +111,7 @@ int __section_nr(struct mem_section* ms) + break; + } + +- VM_BUG_ON(root_nr == NR_SECTION_ROOTS); ++ VM_BUG_ON(!root); + + return (root_nr * SECTIONS_PER_ROOT) + (ms - root); + } +@@ -208,6 +207,16 @@ void __init memory_present(int nid, unsigned long start, unsigned long end) + { + unsigned long pfn; + ++#ifdef CONFIG_SPARSEMEM_EXTREME ++ if (unlikely(!mem_section)) { ++ unsigned long size, align; ++ ++ size = sizeof(struct mem_section) * NR_SECTION_ROOTS; ++ align = 1 << (INTERNODE_CACHE_SHIFT); ++ mem_section = memblock_virt_alloc(size, align); ++ } ++#endif ++ + start &= PAGE_SECTION_MASK; + mminit_validate_memmodel_limits(&start, &end); + for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) { +@@ -330,11 +339,17 @@ sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, + static void __init check_usemap_section_nr(int nid, unsigned long 
*usemap) + { + unsigned long usemap_snr, pgdat_snr; +- static unsigned long old_usemap_snr = NR_MEM_SECTIONS; +- static unsigned long old_pgdat_snr = NR_MEM_SECTIONS; ++ static unsigned long old_usemap_snr; ++ static unsigned long old_pgdat_snr; + struct pglist_data *pgdat = NODE_DATA(nid); + int usemap_nid; + ++ /* First call */ ++ if (!old_usemap_snr) { ++ old_usemap_snr = NR_MEM_SECTIONS; ++ old_pgdat_snr = NR_MEM_SECTIONS; ++ } ++ + usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT); + pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT); + if (usemap_snr == pgdat_snr) +diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c +index 467e44d7587d..045331204097 100644 +--- a/net/ipv4/ip_gre.c ++++ b/net/ipv4/ip_gre.c +@@ -579,8 +579,8 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev, + if (gre_handle_offloads(skb, false)) + goto err_free_rt; + +- if (skb->len > dev->mtu) { +- pskb_trim(skb, dev->mtu); ++ if (skb->len > dev->mtu + dev->hard_header_len) { ++ pskb_trim(skb, dev->mtu + dev->hard_header_len); + truncate = true; + } + +@@ -731,8 +731,8 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb, + if (skb_cow_head(skb, dev->needed_headroom)) + goto free_skb; + +- if (skb->len - dev->hard_header_len > dev->mtu) { +- pskb_trim(skb, dev->mtu); ++ if (skb->len > dev->mtu + dev->hard_header_len) { ++ pskb_trim(skb, dev->mtu + dev->hard_header_len); + truncate = true; + } + +diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c +index 218cfcc77650..ee113ff15fd0 100644 +--- a/net/ipv4/tcp_vegas.c ++++ b/net/ipv4/tcp_vegas.c +@@ -158,7 +158,7 @@ EXPORT_SYMBOL_GPL(tcp_vegas_cwnd_event); + + static inline u32 tcp_vegas_ssthresh(struct tcp_sock *tp) + { +- return min(tp->snd_ssthresh, tp->snd_cwnd-1); ++ return min(tp->snd_ssthresh, tp->snd_cwnd); + } + + static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked) +diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c +index 8a1c846d3df9..2ec39404c449 100644 +--- a/net/ipv6/addrconf.c ++++ b/net/ipv6/addrconf.c +@@ -303,10 +303,10 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { + .disable_policy = 0, + }; + +-/* Check if a valid qdisc is available */ +-static inline bool addrconf_qdisc_ok(const struct net_device *dev) ++/* Check if link is ready: is it up and is a valid qdisc available */ ++static inline bool addrconf_link_ready(const struct net_device *dev) + { +- return !qdisc_tx_is_noop(dev); ++ return netif_oper_up(dev) && !qdisc_tx_is_noop(dev); + } + + static void addrconf_del_rs_timer(struct inet6_dev *idev) +@@ -451,7 +451,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) + + ndev->token = in6addr_any; + +- if (netif_running(dev) && addrconf_qdisc_ok(dev)) ++ if (netif_running(dev) && addrconf_link_ready(dev)) + ndev->if_flags |= IF_READY; + + ipv6_mc_init_dev(ndev); +@@ -3404,7 +3404,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, + /* restore routes for permanent addresses */ + addrconf_permanent_addr(dev); + +- if (!addrconf_qdisc_ok(dev)) { ++ if (!addrconf_link_ready(dev)) { + /* device is not ready yet. */ + pr_info("ADDRCONF(NETDEV_UP): %s: link is not ready\n", + dev->name); +@@ -3419,7 +3419,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, + run_pending = 1; + } + } else if (event == NETDEV_CHANGE) { +- if (!addrconf_qdisc_ok(dev)) { ++ if (!addrconf_link_ready(dev)) { + /* device is still not ready. 
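The addrconf change above replaces the qdisc-only addrconf_qdisc_ok() test with addrconf_link_ready(), which additionally requires the operational state to be up. A trivial userspace model of the new predicate, with stubbed stand-ins for the kernel helpers and an invented struct, is sketched below:

#include <stdbool.h>
#include <stdio.h>

struct net_device {
        bool oper_up;
        bool tx_is_noop;
};

static bool netif_oper_up(const struct net_device *dev)
{
        return dev->oper_up;
}

static bool qdisc_tx_is_noop(const struct net_device *dev)
{
        return dev->tx_is_noop;
}

static bool addrconf_link_ready(const struct net_device *dev)
{
        return netif_oper_up(dev) && !qdisc_tx_is_noop(dev);
}

int main(void)
{
        /* a valid qdisc alone is no longer enough; the link must also be up */
        struct net_device dev = { .oper_up = false, .tx_is_noop = false };

        printf("ready: %d\n", addrconf_link_ready(&dev));
        return 0;
}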
*/ + break; + } +diff --git a/net/ipv6/route.c b/net/ipv6/route.c +index 598efa8cfe25..76b47682f77f 100644 +--- a/net/ipv6/route.c ++++ b/net/ipv6/route.c +@@ -1055,7 +1055,6 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt) + + static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) + { +- struct fib6_table *table = rt->rt6i_table; + struct rt6_info *pcpu_rt, *prev, **p; + + pcpu_rt = ip6_rt_pcpu_alloc(rt); +@@ -1066,28 +1065,20 @@ static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) + return net->ipv6.ip6_null_entry; + } + +- read_lock_bh(&table->tb6_lock); +- if (rt->rt6i_pcpu) { +- p = this_cpu_ptr(rt->rt6i_pcpu); +- prev = cmpxchg(p, NULL, pcpu_rt); +- if (prev) { +- /* If someone did it before us, return prev instead */ +- dst_release_immediate(&pcpu_rt->dst); +- pcpu_rt = prev; +- } +- } else { +- /* rt has been removed from the fib6 tree +- * before we have a chance to acquire the read_lock. +- * In this case, don't brother to create a pcpu rt +- * since rt is going away anyway. The next +- * dst_check() will trigger a re-lookup. +- */ ++ dst_hold(&pcpu_rt->dst); ++ p = this_cpu_ptr(rt->rt6i_pcpu); ++ prev = cmpxchg(p, NULL, pcpu_rt); ++ if (prev) { ++ /* If someone did it before us, return prev instead */ ++ /* release refcnt taken by ip6_rt_pcpu_alloc() */ + dst_release_immediate(&pcpu_rt->dst); +- pcpu_rt = rt; ++ /* release refcnt taken by above dst_hold() */ ++ dst_release_immediate(&pcpu_rt->dst); ++ dst_hold(&prev->dst); ++ pcpu_rt = prev; + } +- dst_hold(&pcpu_rt->dst); ++ + rt6_dst_from_metrics_check(pcpu_rt); +- read_unlock_bh(&table->tb6_lock); + return pcpu_rt; + } + +@@ -1177,19 +1168,28 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, + if (pcpu_rt) { + read_unlock_bh(&table->tb6_lock); + } else { +- /* We have to do the read_unlock first +- * because rt6_make_pcpu_route() may trigger +- * ip6_dst_gc() which will take the write_lock. +- */ +- dst_hold(&rt->dst); +- read_unlock_bh(&table->tb6_lock); +- pcpu_rt = rt6_make_pcpu_route(rt); +- dst_release(&rt->dst); ++ /* atomic_inc_not_zero() is needed when using rcu */ ++ if (atomic_inc_not_zero(&rt->rt6i_ref)) { ++ /* We have to do the read_unlock first ++ * because rt6_make_pcpu_route() may trigger ++ * ip6_dst_gc() which will take the write_lock. ++ * ++ * No dst_hold() on rt is needed because grabbing ++ * rt->rt6i_ref makes sure rt can't be released. ++ */ ++ read_unlock_bh(&table->tb6_lock); ++ pcpu_rt = rt6_make_pcpu_route(rt); ++ rt6_release(rt); ++ } else { ++ /* rt is already removed from tree */ ++ read_unlock_bh(&table->tb6_lock); ++ pcpu_rt = net->ipv6.ip6_null_entry; ++ dst_hold(&pcpu_rt->dst); ++ } + } + + trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6); + return pcpu_rt; +- + } + } + EXPORT_SYMBOL_GPL(ip6_pol_route); +diff --git a/net/sctp/stream.c b/net/sctp/stream.c +index fa8371ff05c4..724adf2786a2 100644 +--- a/net/sctp/stream.c ++++ b/net/sctp/stream.c +@@ -40,9 +40,14 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt, + { + int i; + ++ gfp |= __GFP_NOWARN; ++ + /* Initial stream->out size may be very big, so free it and alloc +- * a new one with new outcnt to save memory. ++ * a new one with new outcnt to save memory if needed. 
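The sctp_stream_init() hunk above skips freeing and reallocating stream->out when the requested outcnt already matches the current one. A simplified standalone model of that early-out is sketched below; it is not part of the patch, the names are invented, and it allocates before freeing purely to keep the demo state consistent on failure (the kernel code keeps its original free-then-kcalloc order).

#include <stdio.h>
#include <stdlib.h>

struct stream {
        int *out;
        unsigned short outcnt;
};

static int stream_resize_out(struct stream *s, unsigned short outcnt)
{
        int *tmp;

        if (outcnt == s->outcnt)        /* nothing to do, keep current array */
                return 0;

        tmp = calloc(outcnt, sizeof(*tmp));
        if (!tmp)
                return -1;

        free(s->out);
        s->out = tmp;
        s->outcnt = outcnt;
        return 0;
}

int main(void)
{
        struct stream s = { NULL, 0 };

        stream_resize_out(&s, 16);
        stream_resize_out(&s, 16);      /* second call skips the realloc */
        printf("outcnt = %u\n", s.outcnt);
        free(s.out);
        return 0;
}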
+ */ ++ if (outcnt == stream->outcnt) ++ goto in; ++ + kfree(stream->out); + + stream->out = kcalloc(outcnt, sizeof(*stream->out), gfp); +@@ -53,6 +58,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt, + for (i = 0; i < stream->outcnt; i++) + stream->out[i].state = SCTP_STREAM_OPEN; + ++in: + if (!incnt) + return 0; + +diff --git a/scripts/Makefile.build b/scripts/Makefile.build +index bb831d49bcfd..e63af4e19382 100644 +--- a/scripts/Makefile.build ++++ b/scripts/Makefile.build +@@ -259,7 +259,7 @@ ifneq ($(SKIP_STACK_VALIDATION),1) + + __objtool_obj := $(objtree)/tools/objtool/objtool + +-objtool_args = $(if $(CONFIG_ORC_UNWINDER),orc generate,check) ++objtool_args = $(if $(CONFIG_UNWINDER_ORC),orc generate,check) + + ifndef CONFIG_FRAME_POINTER + objtool_args += --no-fp +diff --git a/scripts/headers_install.sh b/scripts/headers_install.sh +index 4d1ea96e8794..a18bca720995 100755 +--- a/scripts/headers_install.sh ++++ b/scripts/headers_install.sh +@@ -34,7 +34,7 @@ do + sed -r \ + -e 's/([ \t(])(__user|__force|__iomem)[ \t]/\1/g' \ + -e 's/__attribute_const__([ \t]|$)/\1/g' \ +- -e 's@^#include @@' \ ++ -e 's@^#include @@' \ + -e 's/(^|[^a-zA-Z0-9])__packed([^a-zA-Z0-9_]|$)/\1__attribute__((packed))\2/g' \ + -e 's/(^|[ \t(])(inline|asm|volatile)([ \t(]|$)/\1__\2__\3/g' \ + -e 's@#(ifndef|define|endif[ \t]*/[*])[ \t]*_UAPI@#\1 @' \ +diff --git a/sound/soc/codecs/msm8916-wcd-analog.c b/sound/soc/codecs/msm8916-wcd-analog.c +index 549c269acc7d..18933bf6473f 100644 +--- a/sound/soc/codecs/msm8916-wcd-analog.c ++++ b/sound/soc/codecs/msm8916-wcd-analog.c +@@ -104,7 +104,7 @@ + #define CDC_A_MICB_1_VAL (0xf141) + #define MICB_MIN_VAL 1600 + #define MICB_STEP_SIZE 50 +-#define MICB_VOLTAGE_REGVAL(v) ((v - MICB_MIN_VAL)/MICB_STEP_SIZE) ++#define MICB_VOLTAGE_REGVAL(v) (((v - MICB_MIN_VAL)/MICB_STEP_SIZE) << 3) + #define MICB_1_VAL_MICB_OUT_VAL_MASK GENMASK(7, 3) + #define MICB_1_VAL_MICB_OUT_VAL_V2P70V ((0x16) << 3) + #define MICB_1_VAL_MICB_OUT_VAL_V1P80V ((0x4) << 3) +@@ -349,8 +349,9 @@ static void pm8916_wcd_analog_micbias_enable(struct snd_soc_codec *codec) + | MICB_1_CTL_EXT_PRECHARG_EN_ENABLE); + + if (wcd->micbias_mv) { +- snd_soc_write(codec, CDC_A_MICB_1_VAL, +- MICB_VOLTAGE_REGVAL(wcd->micbias_mv)); ++ snd_soc_update_bits(codec, CDC_A_MICB_1_VAL, ++ MICB_1_VAL_MICB_OUT_VAL_MASK, ++ MICB_VOLTAGE_REGVAL(wcd->micbias_mv)); + /* + * Special headset needs MICBIAS as 2.7V so wait for + * 50 msec for the MICBIAS to reach 2.7 volts. 
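The msm8916-wcd-analog fix above shifts MICB_VOLTAGE_REGVAL() into bits 7:3 so the computed step lands inside MICB_1_VAL_MICB_OUT_VAL_MASK, and switches to snd_soc_update_bits() so only that field is touched. The arithmetic can be checked in a standalone program (illustrative only; the mask value 0xf8 is GENMASK(7, 3) written out):

#include <stdio.h>

#define MICB_MIN_VAL            1600
#define MICB_STEP_SIZE          50
#define MICB_VOLTAGE_REGVAL(v)  ((((v) - MICB_MIN_VAL) / MICB_STEP_SIZE) << 3)
#define MICB_OUT_VAL_MASK       0xf8    /* GENMASK(7, 3) */

int main(void)
{
        int mv = 2700;  /* 2.7 V microphone bias */
        unsigned int regval = MICB_VOLTAGE_REGVAL(mv) & MICB_OUT_VAL_MASK;

        /* (2700 - 1600) / 50 = 0x16; shifted into bits 7:3 this is 0xb0 */
        printf("regval = 0x%02x\n", regval);
        return 0;
}

For 2.7 V this yields 0xb0, the same value as the driver's existing MICB_1_VAL_MICB_OUT_VAL_V2P70V definition, (0x16) << 3.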
+@@ -1241,6 +1242,8 @@ static const struct of_device_id pm8916_wcd_analog_spmi_match_table[] = { + { } + }; + ++MODULE_DEVICE_TABLE(of, pm8916_wcd_analog_spmi_match_table); ++ + static struct platform_driver pm8916_wcd_analog_spmi_driver = { + .driver = { + .name = "qcom,pm8916-wcd-spmi-codec", +diff --git a/sound/soc/img/img-parallel-out.c b/sound/soc/img/img-parallel-out.c +index 23b0f0f6ec9c..2fc8a6372206 100644 +--- a/sound/soc/img/img-parallel-out.c ++++ b/sound/soc/img/img-parallel-out.c +@@ -164,9 +164,11 @@ static int img_prl_out_set_fmt(struct snd_soc_dai *dai, unsigned int fmt) + return -EINVAL; + } + ++ pm_runtime_get_sync(prl->dev); + reg = img_prl_out_readl(prl, IMG_PRL_OUT_CTL); + reg = (reg & ~IMG_PRL_OUT_CTL_EDGE_MASK) | control_set; + img_prl_out_writel(prl, reg, IMG_PRL_OUT_CTL); ++ pm_runtime_put(prl->dev); + + return 0; + } +diff --git a/tools/objtool/check.c b/tools/objtool/check.c +index c0e26ad1fa7e..9b341584eb1b 100644 +--- a/tools/objtool/check.c ++++ b/tools/objtool/check.c +@@ -1757,11 +1757,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, + if (insn->dead_end) + return 0; + +- insn = next_insn; +- if (!insn) { ++ if (!next_insn) { ++ if (state.cfa.base == CFI_UNDEFINED) ++ return 0; + WARN("%s: unexpected end of section", sec->name); + return 1; + } ++ ++ insn = next_insn; + } + + return 0; +diff --git a/tools/objtool/objtool.c b/tools/objtool/objtool.c +index 31e0f9143840..07f329919828 100644 +--- a/tools/objtool/objtool.c ++++ b/tools/objtool/objtool.c +@@ -70,7 +70,7 @@ static void cmd_usage(void) + + printf("\n"); + +- exit(1); ++ exit(129); + } + + static void handle_options(int *argc, const char ***argv) +@@ -86,9 +86,7 @@ static void handle_options(int *argc, const char ***argv) + break; + } else { + fprintf(stderr, "Unknown option: %s\n", cmd); +- fprintf(stderr, "\n Usage: %s\n", +- objtool_usage_string); +- exit(1); ++ cmd_usage(); + } + + (*argv)++; +diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c +index 64ae21f64489..7a2d221c4702 100644 +--- a/tools/testing/selftests/bpf/test_verifier.c ++++ b/tools/testing/selftests/bpf/test_verifier.c +@@ -606,7 +606,6 @@ static struct bpf_test tests[] = { + }, + .errstr = "misaligned stack access", + .result = REJECT, +- .flags = F_LOAD_WITH_STRICT_ALIGNMENT, + }, + { + "invalid map_fd for function call", +@@ -1797,7 +1796,6 @@ static struct bpf_test tests[] = { + }, + .result = REJECT, + .errstr = "misaligned stack access off (0x0; 0x0)+-8+2 size 8", +- .flags = F_LOAD_WITH_STRICT_ALIGNMENT, + }, + { + "PTR_TO_STACK store/load - bad alignment on reg", +@@ -1810,7 +1808,6 @@ static struct bpf_test tests[] = { + }, + .result = REJECT, + .errstr = "misaligned stack access off (0x0; 0x0)+-10+8 size 8", +- .flags = F_LOAD_WITH_STRICT_ALIGNMENT, + }, + { + "PTR_TO_STACK store/load - out of bounds low", +@@ -6115,7 +6112,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6139,7 +6136,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6165,7 +6162,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R8 invalid mem access 'inv'", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { 
+@@ -6190,7 +6187,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R8 invalid mem access 'inv'", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6238,7 +6235,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6309,7 +6306,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6360,7 +6357,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6387,7 +6384,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6413,7 +6410,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6442,7 +6439,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6472,7 +6469,7 @@ static struct bpf_test tests[] = { + BPF_JMP_IMM(BPF_JA, 0, 0, -7), + }, + .fixup_map1 = { 4 }, +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + }, + { +@@ -6500,8 +6497,7 @@ static struct bpf_test tests[] = { + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, +- .errstr_unpriv = "R0 pointer comparison prohibited", +- .errstr = "R0 min value is negative", ++ .errstr = "unbounded min value", + .result = REJECT, + .result_unpriv = REJECT, + }, +@@ -6556,6 +6552,462 @@ static struct bpf_test tests[] = { + .errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.", + .result = REJECT, + }, ++ { ++ "bounds check based on zero-extended MOV", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), ++ /* r2 = 0x0000'0000'ffff'ffff */ ++ BPF_MOV32_IMM(BPF_REG_2, 0xffffffff), ++ /* r2 = 0 */ ++ BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32), ++ /* no-op */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2), ++ /* access at offset 0 */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), ++ /* exit */ ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 3 }, ++ .result = ACCEPT ++ }, ++ { ++ "bounds check based on sign-extended MOV. 
test1", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), ++ /* r2 = 0xffff'ffff'ffff'ffff */ ++ BPF_MOV64_IMM(BPF_REG_2, 0xffffffff), ++ /* r2 = 0xffff'ffff */ ++ BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32), ++ /* r0 = */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2), ++ /* access to OOB pointer */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), ++ /* exit */ ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 3 }, ++ .errstr = "map_value pointer and 4294967295", ++ .result = REJECT ++ }, ++ { ++ "bounds check based on sign-extended MOV. test2", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), ++ /* r2 = 0xffff'ffff'ffff'ffff */ ++ BPF_MOV64_IMM(BPF_REG_2, 0xffffffff), ++ /* r2 = 0xfff'ffff */ ++ BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 36), ++ /* r0 = */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2), ++ /* access to OOB pointer */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), ++ /* exit */ ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 3 }, ++ .errstr = "R0 min value is outside of the array range", ++ .result = REJECT ++ }, ++ { ++ "bounds check based on reg_off + var_off + insn_off. test1", ++ .insns = { ++ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1, ++ offsetof(struct __sk_buff, mark)), ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), ++ BPF_ALU64_IMM(BPF_AND, BPF_REG_6, 1), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, (1 << 29) - 1), ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, (1 << 29) - 1), ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 3), ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 4 }, ++ .errstr = "value_size=8 off=1073741825", ++ .result = REJECT, ++ .prog_type = BPF_PROG_TYPE_SCHED_CLS, ++ }, ++ { ++ "bounds check based on reg_off + var_off + insn_off. 
test2", ++ .insns = { ++ BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1, ++ offsetof(struct __sk_buff, mark)), ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), ++ BPF_ALU64_IMM(BPF_AND, BPF_REG_6, 1), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, (1 << 30) - 1), ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, (1 << 29) - 1), ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 3), ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 4 }, ++ .errstr = "value 1073741823", ++ .result = REJECT, ++ .prog_type = BPF_PROG_TYPE_SCHED_CLS, ++ }, ++ { ++ "bounds check after truncation of non-boundary-crossing range", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), ++ /* r1 = [0x00, 0xff] */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), ++ BPF_MOV64_IMM(BPF_REG_2, 1), ++ /* r2 = 0x10'0000'0000 */ ++ BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 36), ++ /* r1 = [0x10'0000'0000, 0x10'0000'00ff] */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2), ++ /* r1 = [0x10'7fff'ffff, 0x10'8000'00fe] */ ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff), ++ /* r1 = [0x00, 0xff] */ ++ BPF_ALU32_IMM(BPF_SUB, BPF_REG_1, 0x7fffffff), ++ /* r1 = 0 */ ++ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), ++ /* no-op */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), ++ /* access at offset 0 */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), ++ /* exit */ ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 3 }, ++ .result = ACCEPT ++ }, ++ { ++ "bounds check after truncation of boundary-crossing range (1)", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), ++ /* r1 = [0x00, 0xff] */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = [0xffff'ff80, 0x1'0000'007f] */ ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = [0xffff'ff80, 0xffff'ffff] or ++ * [0x0000'0000, 0x0000'007f] ++ */ ++ BPF_ALU32_IMM(BPF_ADD, BPF_REG_1, 0), ++ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = [0x00, 0xff] or ++ * [0xffff'ffff'0000'0080, 0xffff'ffff'ffff'ffff] ++ */ ++ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = 0 or ++ * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff] ++ */ ++ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), ++ /* no-op or OOB pointer computation */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), ++ /* potentially OOB access */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), ++ /* exit */ ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 3 }, ++ /* not actually fully unbounded, but the bound is very high */ ++ .errstr = "R0 unbounded memory access", ++ .result = REJECT ++ }, ++ { ++ "bounds check after truncation of boundary-crossing range (2)", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, 
BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), ++ /* r1 = [0x00, 0xff] */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = [0xffff'ff80, 0x1'0000'007f] */ ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = [0xffff'ff80, 0xffff'ffff] or ++ * [0x0000'0000, 0x0000'007f] ++ * difference to previous test: truncation via MOV32 ++ * instead of ALU32. ++ */ ++ BPF_MOV32_REG(BPF_REG_1, BPF_REG_1), ++ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = [0x00, 0xff] or ++ * [0xffff'ffff'0000'0080, 0xffff'ffff'ffff'ffff] ++ */ ++ BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), ++ /* r1 = 0 or ++ * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff] ++ */ ++ BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), ++ /* no-op or OOB pointer computation */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), ++ /* potentially OOB access */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), ++ /* exit */ ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 3 }, ++ /* not actually fully unbounded, but the bound is very high */ ++ .errstr = "R0 unbounded memory access", ++ .result = REJECT ++ }, ++ { ++ "bounds check after wrapping 32-bit addition", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5), ++ /* r1 = 0x7fff'ffff */ ++ BPF_MOV64_IMM(BPF_REG_1, 0x7fffffff), ++ /* r1 = 0xffff'fffe */ ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff), ++ /* r1 = 0 */ ++ BPF_ALU32_IMM(BPF_ADD, BPF_REG_1, 2), ++ /* no-op */ ++ BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), ++ /* access at offset 0 */ ++ BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), ++ /* exit */ ++ BPF_MOV64_IMM(BPF_REG_0, 0), ++ BPF_EXIT_INSN(), ++ }, ++ .fixup_map1 = { 3 }, ++ .result = ACCEPT ++ }, ++ { ++ "bounds check after shift with oversized count operand", ++ .insns = { ++ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), ++ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), ++ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), ++ BPF_LD_MAP_FD(BPF_REG_1, 0), ++ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, ++ BPF_FUNC_map_lookup_elem), ++ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6), ++ BPF_MOV64_IMM(BPF_REG_2, 32), ++ BPF_MOV64_IMM(BPF_REG_1, 1), ++ /* r1 = (u32)1 << (u32)32 = ? 
 */
++			BPF_ALU32_REG(BPF_LSH, BPF_REG_1, BPF_REG_2),
++			/* r1 = [0x0000, 0xffff] */
++			BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0xffff),
++			/* computes unknown pointer, potentially OOB */
++			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
++			/* potentially OOB access */
++			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
++			/* exit */
++			BPF_MOV64_IMM(BPF_REG_0, 0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.errstr = "R0 max value is outside of the array range",
++		.result = REJECT
++	},
++	{
++		"bounds check after right shift of maybe-negative number",
++		.insns = {
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
++			/* r1 = [0x00, 0xff] */
++			BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
++			/* r1 = [-0x01, 0xfe] */
++			BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1),
++			/* r1 = 0 or 0xff'ffff'ffff'ffff */
++			BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8),
++			/* r1 = 0 or 0xffff'ffff'ffff */
++			BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8),
++			/* computes unknown pointer, potentially OOB */
++			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
++			/* potentially OOB access */
++			BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0),
++			/* exit */
++			BPF_MOV64_IMM(BPF_REG_0, 0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.errstr = "R0 unbounded memory access",
++		.result = REJECT
++	},
++	{
++		"bounds check map access with off+size signed 32bit overflow. test1",
++		.insns = {
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
++			BPF_EXIT_INSN(),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x7ffffffe),
++			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
++			BPF_JMP_A(0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.errstr = "map_value pointer and 2147483646",
++		.result = REJECT
++	},
++	{
++		"bounds check map access with off+size signed 32bit overflow. test2",
++		.insns = {
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
++			BPF_EXIT_INSN(),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff),
++			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
++			BPF_JMP_A(0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.errstr = "pointer offset 1073741822",
++		.result = REJECT
++	},
++	{
++		"bounds check map access with off+size signed 32bit overflow. test3",
++		.insns = {
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
++			BPF_EXIT_INSN(),
++			BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 0x1fffffff),
++			BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 0x1fffffff),
++			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 2),
++			BPF_JMP_A(0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.errstr = "pointer offset -1073741822",
++		.result = REJECT
++	},
++	{
++		"bounds check map access with off+size signed 32bit overflow. test4",
++		.insns = {
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
++			BPF_EXIT_INSN(),
++			BPF_MOV64_IMM(BPF_REG_1, 1000000),
++			BPF_ALU64_IMM(BPF_MUL, BPF_REG_1, 1000000),
++			BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1),
++			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 2),
++			BPF_JMP_A(0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.errstr = "map_value pointer and 1000000000000",
++		.result = REJECT
++	},
++	{
++		"pointer/scalar confusion in state equality check (way 1)",
++		.insns = {
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
++			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
++			BPF_JMP_A(1),
++			BPF_MOV64_REG(BPF_REG_0, BPF_REG_10),
++			BPF_JMP_A(0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.result = ACCEPT,
++		.result_unpriv = REJECT,
++		.errstr_unpriv = "R0 leaks addr as return value"
++	},
++	{
++		"pointer/scalar confusion in state equality check (way 2)",
++		.insns = {
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
++			BPF_MOV64_REG(BPF_REG_0, BPF_REG_10),
++			BPF_JMP_A(1),
++			BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 3 },
++		.result = ACCEPT,
++		.result_unpriv = REJECT,
++		.errstr_unpriv = "R0 leaks addr as return value"
++	},
+ 	{
+ 		"variable-offset ctx access",
+ 		.insns = {
+@@ -6597,6 +7049,71 @@ static struct bpf_test tests[] = {
+ 		.result = REJECT,
+ 		.prog_type = BPF_PROG_TYPE_LWT_IN,
+ 	},
++	{
++		"indirect variable-offset stack access",
++		.insns = {
++			/* Fill the top 8 bytes of the stack */
++			BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
++			/* Get an unknown value */
++			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
++			/* Make it small and 4-byte aligned */
++			BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4),
++			BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 8),
++			/* add it to fp. We now have either fp-4 or fp-8, but
++			 * we don't know which
++			 */
++			BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10),
++			/* dereference it indirectly */
++			BPF_LD_MAP_FD(BPF_REG_1, 0),
++			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
++				     BPF_FUNC_map_lookup_elem),
++			BPF_MOV64_IMM(BPF_REG_0, 0),
++			BPF_EXIT_INSN(),
++		},
++		.fixup_map1 = { 5 },
++		.errstr = "variable stack read R2",
++		.result = REJECT,
++		.prog_type = BPF_PROG_TYPE_LWT_IN,
++	},
++	{
++		"direct stack access with 32-bit wraparound. test1",
++		.insns = {
++			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff),
++			BPF_MOV32_IMM(BPF_REG_0, 0),
++			BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
++			BPF_EXIT_INSN()
++		},
++		.errstr = "fp pointer and 2147483647",
++		.result = REJECT
++	},
++	{
++		"direct stack access with 32-bit wraparound. test2",
++		.insns = {
++			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x3fffffff),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x3fffffff),
++			BPF_MOV32_IMM(BPF_REG_0, 0),
++			BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
++			BPF_EXIT_INSN()
++		},
++		.errstr = "fp pointer and 1073741823",
++		.result = REJECT
++	},
++	{
++		"direct stack access with 32-bit wraparound. test3",
++		.insns = {
++			BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x1fffffff),
++			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x1fffffff),
++			BPF_MOV32_IMM(BPF_REG_0, 0),
++			BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
++			BPF_EXIT_INSN()
++		},
++		.errstr = "fp pointer offset 1073741822",
++		.result = REJECT
++	},
+ 	{
+ 		"liveness pruning and write screening",
+ 		.insns = {
+diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c
+index 2afc41a3730f..66e5ce5b91f0 100644
+--- a/tools/testing/selftests/x86/ldt_gdt.c
++++ b/tools/testing/selftests/x86/ldt_gdt.c
+@@ -137,30 +137,51 @@ static void check_valid_segment(uint16_t index, int ldt,
+ 	}
+ }
+ 
+-static bool install_valid_mode(const struct user_desc *desc, uint32_t ar,
+-			       bool oldmode)
++static bool install_valid_mode(const struct user_desc *d, uint32_t ar,
++			       bool oldmode, bool ldt)
+ {
+-	int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
+-			  desc, sizeof(*desc));
+-	if (ret < -1)
+-		errno = -ret;
++	struct user_desc desc = *d;
++	int ret;
++
++	if (!ldt) {
++#ifndef __i386__
++		/* No point testing set_thread_area in a 64-bit build */
++		return false;
++#endif
++		if (!gdt_entry_num)
++			return false;
++		desc.entry_number = gdt_entry_num;
++
++		ret = syscall(SYS_set_thread_area, &desc);
++	} else {
++		ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11,
++			      &desc, sizeof(desc));
++
++		if (ret < -1)
++			errno = -ret;
++
++		if (ret != 0 && errno == ENOSYS) {
++			printf("[OK]\tmodify_ldt returned -ENOSYS\n");
++			return false;
++		}
++	}
++
+ 	if (ret == 0) {
+-		uint32_t limit = desc->limit;
+-		if (desc->limit_in_pages)
++		uint32_t limit = desc.limit;
++		if (desc.limit_in_pages)
+ 			limit = (limit << 12) + 4095;
+-		check_valid_segment(desc->entry_number, 1, ar, limit, true);
++		check_valid_segment(desc.entry_number, ldt, ar, limit, true);
+ 		return true;
+-	} else if (errno == ENOSYS) {
+-		printf("[OK]\tmodify_ldt returned -ENOSYS\n");
+-		return false;
+ 	} else {
+-		if (desc->seg_32bit) {
+-			printf("[FAIL]\tUnexpected modify_ldt failure %d\n",
++		if (desc.seg_32bit) {
++			printf("[FAIL]\tUnexpected %s failure %d\n",
++			       ldt ? "modify_ldt" : "set_thread_area",
+ 			       errno);
+ 			nerrs++;
+ 			return false;
+ 		} else {
+-			printf("[OK]\tmodify_ldt rejected 16 bit segment\n");
++			printf("[OK]\t%s rejected 16 bit segment\n",
++			       ldt ? "modify_ldt" : "set_thread_area");
+ 			return false;
+ 		}
+ 	}
+@@ -168,7 +189,15 @@ static bool install_valid_mode(const struct user_desc *desc, uint32_t ar,
+ 
+ static bool install_valid(const struct user_desc *desc, uint32_t ar)
+ {
+-	return install_valid_mode(desc, ar, false);
++	bool ret = install_valid_mode(desc, ar, false, true);
++
++	if (desc->contents <= 1 && desc->seg_32bit &&
++	    !desc->seg_not_present) {
++		/* Should work in the GDT, too. */
++		install_valid_mode(desc, ar, false, false);
++	}
++
++	return ret;
+ }
+ 
+ static void install_invalid(const struct user_desc *desc, bool oldmode)
+diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
+index 484e8820c382..2447d7c017e7 100644
+--- a/virt/kvm/kvm_main.c
++++ b/virt/kvm/kvm_main.c
+@@ -4018,7 +4018,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align,
+ 	if (!vcpu_align)
+ 		vcpu_align = __alignof__(struct kvm_vcpu);
+ 	kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align,
+-					   0, NULL);
++					   SLAB_ACCOUNT, NULL);
+ 	if (!kvm_vcpu_cache) {
+ 		r = -ENOMEM;
+ 		goto out_free_3;