From: "Mike Pagano"
To: gentoo-commits@lists.gentoo.org
Reply-To: gentoo-dev@lists.gentoo.org, "Mike Pagano"
Message-ID: <1407520332.55bddecaa455f53be117a668dca0f114c11bc216.mpagano@gentoo>
Subject: [gentoo-commits] proj/linux-patches:3.10 commit in: /
X-VCS-Repository: proj/linux-patches
X-VCS-Files: 0000_README 1051_linux-3.10.52.patch
X-VCS-Directories: /
X-VCS-Committer: mpagano
X-VCS-Committer-Name: Mike Pagano
X-VCS-Revision: 55bddecaa455f53be117a668dca0f114c11bc216
X-VCS-Branch: 3.10
Date: Fri, 8 Aug 2014 17:54:35 +0000 (UTC)
List-Id: Gentoo Linux mail

commit: 55bddecaa455f53be117a668dca0f114c11bc216
Author: Mike Pagano gentoo org>
AuthorDate: Fri Aug 8 17:52:12 2014 +0000 Commit: Mike Pagano gentoo org> CommitDate: Fri Aug 8 17:52:12 2014 +0000 URL: http://git.overlays.gentoo.org/gitweb/?p=proj/linux-patches.git;a=commit;h=55bddeca Linux patch 3.10.52 --- 0000_README | 4 + 1051_linux-3.10.52.patch | 1554 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1558 insertions(+) diff --git a/0000_README b/0000_README index 35d32f5..1e6798c 100644 --- a/0000_README +++ b/0000_README @@ -246,6 +246,10 @@ Patch: 1050_linux-3.10.51.patch From: http://www.kernel.org Desc: Linux 3.10.51 +Patch: 1051_linux-3.10.52.patch +From: http://www.kernel.org +Desc: Linux 3.10.52 + Patch: 1500_XATTR_USER_PREFIX.patch From: https://bugs.gentoo.org/show_bug.cgi?id=470644 Desc: Support for namespace user.pax.* on tmpfs. diff --git a/1051_linux-3.10.52.patch b/1051_linux-3.10.52.patch new file mode 100644 index 0000000..570d7cd --- /dev/null +++ b/1051_linux-3.10.52.patch @@ -0,0 +1,1554 @@ +diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt +index 881582f75c9c..bd4370487b07 100644 +--- a/Documentation/x86/x86_64/mm.txt ++++ b/Documentation/x86/x86_64/mm.txt +@@ -12,6 +12,8 @@ ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space + ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole + ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) + ... unused hole ... ++ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ++... unused hole ... 
+ ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0 + ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space + ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls +diff --git a/Makefile b/Makefile +index f9f6ee59c61a..b94f00938acc 100644 +--- a/Makefile ++++ b/Makefile +@@ -1,6 +1,6 @@ + VERSION = 3 + PATCHLEVEL = 10 +-SUBLEVEL = 51 ++SUBLEVEL = 52 + EXTRAVERSION = + NAME = TOSSUG Baby Fish + +diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c +index 83cb3ac27095..c61d2373408c 100644 +--- a/arch/arm/mm/idmap.c ++++ b/arch/arm/mm/idmap.c +@@ -24,6 +24,13 @@ static void idmap_add_pmd(pud_t *pud, unsigned long addr, unsigned long end, + pr_warning("Failed to allocate identity pmd.\n"); + return; + } ++ /* ++ * Copy the original PMD to ensure that the PMD entries for ++ * the kernel image are preserved. ++ */ ++ if (!pud_none(*pud)) ++ memcpy(pmd, pmd_offset(pud, 0), ++ PTRS_PER_PMD * sizeof(pmd_t)); + pud_populate(&init_mm, pud, pmd); + pmd += pmd_index(addr); + } else +diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig +index af88b27ce313..a649cb686692 100644 +--- a/arch/x86/Kconfig ++++ b/arch/x86/Kconfig +@@ -952,10 +952,27 @@ config VM86 + default y + depends on X86_32 + ---help--- +- This option is required by programs like DOSEMU to run 16-bit legacy +- code on X86 processors. It also may be needed by software like +- XFree86 to initialize some video cards via BIOS. Disabling this +- option saves about 6k. ++ This option is required by programs like DOSEMU to run ++ 16-bit real mode legacy code on x86 processors. It also may ++ be needed by software like XFree86 to initialize some video ++ cards via BIOS. Disabling this option saves about 6K. ++ ++config X86_16BIT ++ bool "Enable support for 16-bit segments" if EXPERT ++ default y ++ ---help--- ++ This option is required by programs like Wine to run 16-bit ++ protected mode legacy code on x86 processors. 
Disabling ++ this option saves about 300 bytes on i386, or around 6K text ++ plus 16K runtime memory on x86-64, ++ ++config X86_ESPFIX32 ++ def_bool y ++ depends on X86_16BIT && X86_32 ++ ++config X86_ESPFIX64 ++ def_bool y ++ depends on X86_16BIT && X86_64 + + config TOSHIBA + tristate "Toshiba Laptop support" +diff --git a/arch/x86/include/asm/espfix.h b/arch/x86/include/asm/espfix.h +new file mode 100644 +index 000000000000..99efebb2f69d +--- /dev/null ++++ b/arch/x86/include/asm/espfix.h +@@ -0,0 +1,16 @@ ++#ifndef _ASM_X86_ESPFIX_H ++#define _ASM_X86_ESPFIX_H ++ ++#ifdef CONFIG_X86_64 ++ ++#include ++ ++DECLARE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack); ++DECLARE_PER_CPU_READ_MOSTLY(unsigned long, espfix_waddr); ++ ++extern void init_espfix_bsp(void); ++extern void init_espfix_ap(void); ++ ++#endif /* CONFIG_X86_64 */ ++ ++#endif /* _ASM_X86_ESPFIX_H */ +diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h +index bba3cf88e624..0a8b519226b8 100644 +--- a/arch/x86/include/asm/irqflags.h ++++ b/arch/x86/include/asm/irqflags.h +@@ -129,7 +129,7 @@ static inline notrace unsigned long arch_local_irq_save(void) + + #define PARAVIRT_ADJUST_EXCEPTION_FRAME /* */ + +-#define INTERRUPT_RETURN iretq ++#define INTERRUPT_RETURN jmp native_iret + #define USERGS_SYSRET64 \ + swapgs; \ + sysretq; +diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h +index 2d883440cb9a..b1609f2c524c 100644 +--- a/arch/x86/include/asm/pgtable_64_types.h ++++ b/arch/x86/include/asm/pgtable_64_types.h +@@ -61,6 +61,8 @@ typedef struct { pteval_t pte; } pte_t; + #define MODULES_VADDR _AC(0xffffffffa0000000, UL) + #define MODULES_END _AC(0xffffffffff000000, UL) + #define MODULES_LEN (MODULES_END - MODULES_VADDR) ++#define ESPFIX_PGD_ENTRY _AC(-2, UL) ++#define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << PGDIR_SHIFT) + + #define EARLY_DYNAMIC_PAGE_TABLES 64 + +diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h 
+index b7bf3505e1ec..2e327f114a1b 100644 +--- a/arch/x86/include/asm/setup.h ++++ b/arch/x86/include/asm/setup.h +@@ -62,6 +62,8 @@ static inline void x86_ce4100_early_setup(void) { } + + #ifndef _SETUP + ++#include ++ + /* + * This is set up by the setup-routine at boot-time + */ +diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile +index 7bd3bd310106..111eb356dbea 100644 +--- a/arch/x86/kernel/Makefile ++++ b/arch/x86/kernel/Makefile +@@ -27,6 +27,7 @@ obj-$(CONFIG_X86_64) += sys_x86_64.o x8664_ksyms_64.o + obj-y += syscall_$(BITS).o + obj-$(CONFIG_X86_64) += vsyscall_64.o + obj-$(CONFIG_X86_64) += vsyscall_emu_64.o ++obj-$(CONFIG_X86_ESPFIX64) += espfix_64.o + obj-y += bootflag.o e820.o + obj-y += pci-dma.o quirks.o topology.o kdebugfs.o + obj-y += alternative.o i8253.o pci-nommu.o hw_breakpoint.o +diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S +index 08fa44443a01..5c38e2b298cd 100644 +--- a/arch/x86/kernel/entry_32.S ++++ b/arch/x86/kernel/entry_32.S +@@ -532,6 +532,7 @@ syscall_exit: + restore_all: + TRACE_IRQS_IRET + restore_all_notrace: ++#ifdef CONFIG_X86_ESPFIX32 + movl PT_EFLAGS(%esp), %eax # mix EFLAGS, SS and CS + # Warning: PT_OLDSS(%esp) contains the wrong/random values if we + # are returning to the kernel. 
+@@ -542,6 +543,7 @@ restore_all_notrace: + cmpl $((SEGMENT_LDT << 8) | USER_RPL), %eax + CFI_REMEMBER_STATE + je ldt_ss # returning to user-space with LDT SS ++#endif + restore_nocheck: + RESTORE_REGS 4 # skip orig_eax/error_code + irq_return: +@@ -554,6 +556,7 @@ ENTRY(iret_exc) + .previous + _ASM_EXTABLE(irq_return,iret_exc) + ++#ifdef CONFIG_X86_ESPFIX32 + CFI_RESTORE_STATE + ldt_ss: + #ifdef CONFIG_PARAVIRT +@@ -597,6 +600,7 @@ ldt_ss: + lss (%esp), %esp /* switch to espfix segment */ + CFI_ADJUST_CFA_OFFSET -8 + jmp restore_nocheck ++#endif + CFI_ENDPROC + ENDPROC(system_call) + +@@ -709,6 +713,7 @@ END(syscall_badsys) + * the high word of the segment base from the GDT and swiches to the + * normal stack and adjusts ESP with the matching offset. + */ ++#ifdef CONFIG_X86_ESPFIX32 + /* fixup the stack */ + mov GDT_ESPFIX_SS + 4, %al /* bits 16..23 */ + mov GDT_ESPFIX_SS + 7, %ah /* bits 24..31 */ +@@ -718,8 +723,10 @@ END(syscall_badsys) + pushl_cfi %eax + lss (%esp), %esp /* switch to the normal stack segment */ + CFI_ADJUST_CFA_OFFSET -8 ++#endif + .endm + .macro UNWIND_ESPFIX_STACK ++#ifdef CONFIG_X86_ESPFIX32 + movl %ss, %eax + /* see if on espfix stack */ + cmpw $__ESPFIX_SS, %ax +@@ -730,6 +737,7 @@ END(syscall_badsys) + /* switch to normal stack */ + FIXUP_ESPFIX_STACK + 27: ++#endif + .endm + + /* +@@ -1337,11 +1345,13 @@ END(debug) + ENTRY(nmi) + RING0_INT_FRAME + ASM_CLAC ++#ifdef CONFIG_X86_ESPFIX32 + pushl_cfi %eax + movl %ss, %eax + cmpw $__ESPFIX_SS, %ax + popl_cfi %eax + je nmi_espfix_stack ++#endif + cmpl $ia32_sysenter_target,(%esp) + je nmi_stack_fixup + pushl_cfi %eax +@@ -1381,6 +1391,7 @@ nmi_debug_stack_check: + FIX_STACK 24, nmi_stack_correct, 1 + jmp nmi_stack_correct + ++#ifdef CONFIG_X86_ESPFIX32 + nmi_espfix_stack: + /* We have a RING0_INT_FRAME here. 
+ * +@@ -1402,6 +1413,7 @@ nmi_espfix_stack: + lss 12+4(%esp), %esp # back to espfix stack + CFI_ADJUST_CFA_OFFSET -24 + jmp irq_return ++#endif + CFI_ENDPROC + END(nmi) + +diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S +index 7ac938a4bfab..39ba6914bbc6 100644 +--- a/arch/x86/kernel/entry_64.S ++++ b/arch/x86/kernel/entry_64.S +@@ -58,6 +58,7 @@ + #include + #include + #include ++#include + #include + + /* Avoid __ASSEMBLER__'ifying just for this. */ +@@ -1056,12 +1057,45 @@ restore_args: + + irq_return: + INTERRUPT_RETURN +- _ASM_EXTABLE(irq_return, bad_iret) + +-#ifdef CONFIG_PARAVIRT + ENTRY(native_iret) ++ /* ++ * Are we returning to a stack segment from the LDT? Note: in ++ * 64-bit mode SS:RSP on the exception stack is always valid. ++ */ ++#ifdef CONFIG_X86_ESPFIX64 ++ testb $4,(SS-RIP)(%rsp) ++ jnz native_irq_return_ldt ++#endif ++ ++native_irq_return_iret: + iretq +- _ASM_EXTABLE(native_iret, bad_iret) ++ _ASM_EXTABLE(native_irq_return_iret, bad_iret) ++ ++#ifdef CONFIG_X86_ESPFIX64 ++native_irq_return_ldt: ++ pushq_cfi %rax ++ pushq_cfi %rdi ++ SWAPGS ++ movq PER_CPU_VAR(espfix_waddr),%rdi ++ movq %rax,(0*8)(%rdi) /* RAX */ ++ movq (2*8)(%rsp),%rax /* RIP */ ++ movq %rax,(1*8)(%rdi) ++ movq (3*8)(%rsp),%rax /* CS */ ++ movq %rax,(2*8)(%rdi) ++ movq (4*8)(%rsp),%rax /* RFLAGS */ ++ movq %rax,(3*8)(%rdi) ++ movq (6*8)(%rsp),%rax /* SS */ ++ movq %rax,(5*8)(%rdi) ++ movq (5*8)(%rsp),%rax /* RSP */ ++ movq %rax,(4*8)(%rdi) ++ andl $0xffff0000,%eax ++ popq_cfi %rdi ++ orq PER_CPU_VAR(espfix_stack),%rax ++ SWAPGS ++ movq %rax,%rsp ++ popq_cfi %rax ++ jmp native_irq_return_iret + #endif + + .section .fixup,"ax" +@@ -1127,9 +1161,40 @@ ENTRY(retint_kernel) + call preempt_schedule_irq + jmp exit_intr + #endif +- + CFI_ENDPROC + END(common_interrupt) ++ ++ /* ++ * If IRET takes a fault on the espfix stack, then we ++ * end up promoting it to a doublefault. 
In that case, ++ * modify the stack to make it look like we just entered ++ * the #GP handler from user space, similar to bad_iret. ++ */ ++#ifdef CONFIG_X86_ESPFIX64 ++ ALIGN ++__do_double_fault: ++ XCPT_FRAME 1 RDI+8 ++ movq RSP(%rdi),%rax /* Trap on the espfix stack? */ ++ sarq $PGDIR_SHIFT,%rax ++ cmpl $ESPFIX_PGD_ENTRY,%eax ++ jne do_double_fault /* No, just deliver the fault */ ++ cmpl $__KERNEL_CS,CS(%rdi) ++ jne do_double_fault ++ movq RIP(%rdi),%rax ++ cmpq $native_irq_return_iret,%rax ++ jne do_double_fault /* This shouldn't happen... */ ++ movq PER_CPU_VAR(kernel_stack),%rax ++ subq $(6*8-KERNEL_STACK_OFFSET),%rax /* Reset to original stack */ ++ movq %rax,RSP(%rdi) ++ movq $0,(%rax) /* Missing (lost) #GP error code */ ++ movq $general_protection,RIP(%rdi) ++ retq ++ CFI_ENDPROC ++END(__do_double_fault) ++#else ++# define __do_double_fault do_double_fault ++#endif ++ + /* + * End of kprobes section + */ +@@ -1298,7 +1363,7 @@ zeroentry overflow do_overflow + zeroentry bounds do_bounds + zeroentry invalid_op do_invalid_op + zeroentry device_not_available do_device_not_available +-paranoiderrorentry double_fault do_double_fault ++paranoiderrorentry double_fault __do_double_fault + zeroentry coprocessor_segment_overrun do_coprocessor_segment_overrun + errorentry invalid_TSS do_invalid_TSS + errorentry segment_not_present do_segment_not_present +@@ -1585,7 +1650,7 @@ error_sti: + */ + error_kernelspace: + incl %ebx +- leaq irq_return(%rip),%rcx ++ leaq native_irq_return_iret(%rip),%rcx + cmpq %rcx,RIP+8(%rsp) + je error_swapgs + movl %ecx,%eax /* zero extend */ +diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c +new file mode 100644 +index 000000000000..94d857fb1033 +--- /dev/null ++++ b/arch/x86/kernel/espfix_64.c +@@ -0,0 +1,208 @@ ++/* ----------------------------------------------------------------------- * ++ * ++ * Copyright 2014 Intel Corporation; author: H. 
Peter Anvin ++ * ++ * This program is free software; you can redistribute it and/or modify it ++ * under the terms and conditions of the GNU General Public License, ++ * version 2, as published by the Free Software Foundation. ++ * ++ * This program is distributed in the hope it will be useful, but WITHOUT ++ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or ++ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for ++ * more details. ++ * ++ * ----------------------------------------------------------------------- */ ++ ++/* ++ * The IRET instruction, when returning to a 16-bit segment, only ++ * restores the bottom 16 bits of the user space stack pointer. This ++ * causes some 16-bit software to break, but it also leaks kernel state ++ * to user space. ++ * ++ * This works around this by creating percpu "ministacks", each of which ++ * is mapped 2^16 times 64K apart. When we detect that the return SS is ++ * on the LDT, we copy the IRET frame to the ministack and use the ++ * relevant alias to return to userspace. The ministacks are mapped ++ * readonly, so if the IRET fault we promote #GP to #DF which is an IST ++ * vector and thus has its own stack; we then do the fixup in the #DF ++ * handler. ++ * ++ * This file sets up the ministacks and the related page tables. The ++ * actual ministack invocation is in entry_64.S. ++ */ ++ ++#include ++#include ++#include ++#include ++#include ++#include ++#include ++#include ++#include ++#include ++ ++/* ++ * Note: we only need 6*8 = 48 bytes for the espfix stack, but round ++ * it up to a cache line to avoid unnecessary sharing. ++ */ ++#define ESPFIX_STACK_SIZE (8*8UL) ++#define ESPFIX_STACKS_PER_PAGE (PAGE_SIZE/ESPFIX_STACK_SIZE) ++ ++/* There is address space for how many espfix pages? 
*/ ++#define ESPFIX_PAGE_SPACE (1UL << (PGDIR_SHIFT-PAGE_SHIFT-16)) ++ ++#define ESPFIX_MAX_CPUS (ESPFIX_STACKS_PER_PAGE * ESPFIX_PAGE_SPACE) ++#if CONFIG_NR_CPUS > ESPFIX_MAX_CPUS ++# error "Need more than one PGD for the ESPFIX hack" ++#endif ++ ++#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO) ++ ++/* This contains the *bottom* address of the espfix stack */ ++DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_stack); ++DEFINE_PER_CPU_READ_MOSTLY(unsigned long, espfix_waddr); ++ ++/* Initialization mutex - should this be a spinlock? */ ++static DEFINE_MUTEX(espfix_init_mutex); ++ ++/* Page allocation bitmap - each page serves ESPFIX_STACKS_PER_PAGE CPUs */ ++#define ESPFIX_MAX_PAGES DIV_ROUND_UP(CONFIG_NR_CPUS, ESPFIX_STACKS_PER_PAGE) ++static void *espfix_pages[ESPFIX_MAX_PAGES]; ++ ++static __page_aligned_bss pud_t espfix_pud_page[PTRS_PER_PUD] ++ __aligned(PAGE_SIZE); ++ ++static unsigned int page_random, slot_random; ++ ++/* ++ * This returns the bottom address of the espfix stack for a specific CPU. ++ * The math allows for a non-power-of-two ESPFIX_STACK_SIZE, in which case ++ * we have to account for some amount of padding at the end of each page. 
++ */ ++static inline unsigned long espfix_base_addr(unsigned int cpu) ++{ ++ unsigned long page, slot; ++ unsigned long addr; ++ ++ page = (cpu / ESPFIX_STACKS_PER_PAGE) ^ page_random; ++ slot = (cpu + slot_random) % ESPFIX_STACKS_PER_PAGE; ++ addr = (page << PAGE_SHIFT) + (slot * ESPFIX_STACK_SIZE); ++ addr = (addr & 0xffffUL) | ((addr & ~0xffffUL) << 16); ++ addr += ESPFIX_BASE_ADDR; ++ return addr; ++} ++ ++#define PTE_STRIDE (65536/PAGE_SIZE) ++#define ESPFIX_PTE_CLONES (PTRS_PER_PTE/PTE_STRIDE) ++#define ESPFIX_PMD_CLONES PTRS_PER_PMD ++#define ESPFIX_PUD_CLONES (65536/(ESPFIX_PTE_CLONES*ESPFIX_PMD_CLONES)) ++ ++#define PGTABLE_PROT ((_KERNPG_TABLE & ~_PAGE_RW) | _PAGE_NX) ++ ++static void init_espfix_random(void) ++{ ++ unsigned long rand; ++ ++ /* ++ * This is run before the entropy pools are initialized, ++ * but this is hopefully better than nothing. ++ */ ++ if (!arch_get_random_long(&rand)) { ++ /* The constant is an arbitrary large prime */ ++ rdtscll(rand); ++ rand *= 0xc345c6b72fd16123UL; ++ } ++ ++ slot_random = rand % ESPFIX_STACKS_PER_PAGE; ++ page_random = (rand / ESPFIX_STACKS_PER_PAGE) ++ & (ESPFIX_PAGE_SPACE - 1); ++} ++ ++void __init init_espfix_bsp(void) ++{ ++ pgd_t *pgd_p; ++ pteval_t ptemask; ++ ++ ptemask = __supported_pte_mask; ++ ++ /* Install the espfix pud into the kernel page directory */ ++ pgd_p = &init_level4_pgt[pgd_index(ESPFIX_BASE_ADDR)]; ++ pgd_populate(&init_mm, pgd_p, (pud_t *)espfix_pud_page); ++ ++ /* Randomize the locations */ ++ init_espfix_random(); ++ ++ /* The rest is the same as for any other processor */ ++ init_espfix_ap(); ++} ++ ++void init_espfix_ap(void) ++{ ++ unsigned int cpu, page; ++ unsigned long addr; ++ pud_t pud, *pud_p; ++ pmd_t pmd, *pmd_p; ++ pte_t pte, *pte_p; ++ int n; ++ void *stack_page; ++ pteval_t ptemask; ++ ++ /* We only have to do this once... 
*/ ++ if (likely(this_cpu_read(espfix_stack))) ++ return; /* Already initialized */ ++ ++ cpu = smp_processor_id(); ++ addr = espfix_base_addr(cpu); ++ page = cpu/ESPFIX_STACKS_PER_PAGE; ++ ++ /* Did another CPU already set this up? */ ++ stack_page = ACCESS_ONCE(espfix_pages[page]); ++ if (likely(stack_page)) ++ goto done; ++ ++ mutex_lock(&espfix_init_mutex); ++ ++ /* Did we race on the lock? */ ++ stack_page = ACCESS_ONCE(espfix_pages[page]); ++ if (stack_page) ++ goto unlock_done; ++ ++ ptemask = __supported_pte_mask; ++ ++ pud_p = &espfix_pud_page[pud_index(addr)]; ++ pud = *pud_p; ++ if (!pud_present(pud)) { ++ pmd_p = (pmd_t *)__get_free_page(PGALLOC_GFP); ++ pud = __pud(__pa(pmd_p) | (PGTABLE_PROT & ptemask)); ++ paravirt_alloc_pmd(&init_mm, __pa(pmd_p) >> PAGE_SHIFT); ++ for (n = 0; n < ESPFIX_PUD_CLONES; n++) ++ set_pud(&pud_p[n], pud); ++ } ++ ++ pmd_p = pmd_offset(&pud, addr); ++ pmd = *pmd_p; ++ if (!pmd_present(pmd)) { ++ pte_p = (pte_t *)__get_free_page(PGALLOC_GFP); ++ pmd = __pmd(__pa(pte_p) | (PGTABLE_PROT & ptemask)); ++ paravirt_alloc_pte(&init_mm, __pa(pte_p) >> PAGE_SHIFT); ++ for (n = 0; n < ESPFIX_PMD_CLONES; n++) ++ set_pmd(&pmd_p[n], pmd); ++ } ++ ++ pte_p = pte_offset_kernel(&pmd, addr); ++ stack_page = (void *)__get_free_page(GFP_KERNEL); ++ pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask)); ++ for (n = 0; n < ESPFIX_PTE_CLONES; n++) ++ set_pte(&pte_p[n*PTE_STRIDE], pte); ++ ++ /* Job is done for this CPU and any CPU which shares this page */ ++ ACCESS_ONCE(espfix_pages[page]) = stack_page; ++ ++unlock_done: ++ mutex_unlock(&espfix_init_mutex); ++done: ++ this_cpu_write(espfix_stack, addr); ++ this_cpu_write(espfix_waddr, (unsigned long)stack_page ++ + (addr & ~PAGE_MASK)); ++} +diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c +index dcbbaa165bde..c37886d759cc 100644 +--- a/arch/x86/kernel/ldt.c ++++ b/arch/x86/kernel/ldt.c +@@ -20,8 +20,6 @@ + #include + #include + +-int sysctl_ldt16 = 0; +- + #ifdef CONFIG_SMP + 
static void flush_ldt(void *current_mm) + { +@@ -231,16 +229,10 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) + } + } + +- /* +- * On x86-64 we do not support 16-bit segments due to +- * IRET leaking the high bits of the kernel stack address. +- */ +-#ifdef CONFIG_X86_64 +- if (!ldt_info.seg_32bit && !sysctl_ldt16) { ++ if (!IS_ENABLED(CONFIG_X86_16BIT) && !ldt_info.seg_32bit) { + error = -EINVAL; + goto out_unlock; + } +-#endif + + fill_ldt(&ldt, &ldt_info); + if (oldmode) +diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c +index 3f08f34f93eb..a1da6737ba5b 100644 +--- a/arch/x86/kernel/paravirt_patch_64.c ++++ b/arch/x86/kernel/paravirt_patch_64.c +@@ -6,7 +6,6 @@ DEF_NATIVE(pv_irq_ops, irq_disable, "cli"); + DEF_NATIVE(pv_irq_ops, irq_enable, "sti"); + DEF_NATIVE(pv_irq_ops, restore_fl, "pushq %rdi; popfq"); + DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax"); +-DEF_NATIVE(pv_cpu_ops, iret, "iretq"); + DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax"); + DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax"); + DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3"); +@@ -50,7 +49,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf, + PATCH_SITE(pv_irq_ops, save_fl); + PATCH_SITE(pv_irq_ops, irq_enable); + PATCH_SITE(pv_irq_ops, irq_disable); +- PATCH_SITE(pv_cpu_ops, iret); + PATCH_SITE(pv_cpu_ops, irq_enable_sysexit); + PATCH_SITE(pv_cpu_ops, usergs_sysret32); + PATCH_SITE(pv_cpu_ops, usergs_sysret64); +diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c +index bfd348e99369..fe862750583b 100644 +--- a/arch/x86/kernel/smpboot.c ++++ b/arch/x86/kernel/smpboot.c +@@ -265,6 +265,13 @@ notrace static void __cpuinit start_secondary(void *unused) + check_tsc_sync_target(); + + /* ++ * Enable the espfix hack for this CPU ++ */ ++#ifdef CONFIG_X86_ESPFIX64 ++ init_espfix_ap(); ++#endif ++ ++ /* + * We need to hold vector_lock so there the set of online cpus + * does not change 
while we are assigning vectors to cpus. Holding + * this lock ensures we don't half assign or remove an irq from a cpu. +diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c +index 0002a3a33081..e04e67753238 100644 +--- a/arch/x86/mm/dump_pagetables.c ++++ b/arch/x86/mm/dump_pagetables.c +@@ -30,11 +30,13 @@ struct pg_state { + unsigned long start_address; + unsigned long current_address; + const struct addr_marker *marker; ++ unsigned long lines; + }; + + struct addr_marker { + unsigned long start_address; + const char *name; ++ unsigned long max_lines; + }; + + /* indices for address_markers; keep sync'd w/ address_markers below */ +@@ -45,6 +47,7 @@ enum address_markers_idx { + LOW_KERNEL_NR, + VMALLOC_START_NR, + VMEMMAP_START_NR, ++ ESPFIX_START_NR, + HIGH_KERNEL_NR, + MODULES_VADDR_NR, + MODULES_END_NR, +@@ -67,6 +70,7 @@ static struct addr_marker address_markers[] = { + { PAGE_OFFSET, "Low Kernel Mapping" }, + { VMALLOC_START, "vmalloc() Area" }, + { VMEMMAP_START, "Vmemmap" }, ++ { ESPFIX_BASE_ADDR, "ESPfix Area", 16 }, + { __START_KERNEL_map, "High Kernel Mapping" }, + { MODULES_VADDR, "Modules" }, + { MODULES_END, "End Modules" }, +@@ -163,7 +167,7 @@ static void note_page(struct seq_file *m, struct pg_state *st, + pgprot_t new_prot, int level) + { + pgprotval_t prot, cur; +- static const char units[] = "KMGTPE"; ++ static const char units[] = "BKMGTPE"; + + /* + * If we have a "break" in the series, we need to flush the state that +@@ -178,6 +182,7 @@ static void note_page(struct seq_file *m, struct pg_state *st, + st->current_prot = new_prot; + st->level = level; + st->marker = address_markers; ++ st->lines = 0; + seq_printf(m, "---[ %s ]---\n", st->marker->name); + } else if (prot != cur || level != st->level || + st->current_address >= st->marker[1].start_address) { +@@ -188,17 +193,21 @@ static void note_page(struct seq_file *m, struct pg_state *st, + /* + * Now print the actual finished series + */ +- seq_printf(m, 
"0x%0*lx-0x%0*lx ", +- width, st->start_address, +- width, st->current_address); +- +- delta = (st->current_address - st->start_address) >> 10; +- while (!(delta & 1023) && unit[1]) { +- delta >>= 10; +- unit++; ++ if (!st->marker->max_lines || ++ st->lines < st->marker->max_lines) { ++ seq_printf(m, "0x%0*lx-0x%0*lx ", ++ width, st->start_address, ++ width, st->current_address); ++ ++ delta = (st->current_address - st->start_address); ++ while (!(delta & 1023) && unit[1]) { ++ delta >>= 10; ++ unit++; ++ } ++ seq_printf(m, "%9lu%c ", delta, *unit); ++ printk_prot(m, st->current_prot, st->level); + } +- seq_printf(m, "%9lu%c ", delta, *unit); +- printk_prot(m, st->current_prot, st->level); ++ st->lines++; + + /* + * We print markers for special areas of address space, +@@ -206,7 +215,15 @@ static void note_page(struct seq_file *m, struct pg_state *st, + * This helps in the interpretation. + */ + if (st->current_address >= st->marker[1].start_address) { ++ if (st->marker->max_lines && ++ st->lines > st->marker->max_lines) { ++ unsigned long nskip = ++ st->lines - st->marker->max_lines; ++ seq_printf(m, "... %lu entr%s skipped ... \n", ++ nskip, nskip == 1 ? 
"y" : "ies"); ++ } + st->marker++; ++ st->lines = 0; + seq_printf(m, "---[ %s ]---\n", st->marker->name); + } + +diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c +index 0f134c7cfc24..0faad646f5fd 100644 +--- a/arch/x86/vdso/vdso32-setup.c ++++ b/arch/x86/vdso/vdso32-setup.c +@@ -41,7 +41,6 @@ enum { + #ifdef CONFIG_X86_64 + #define vdso_enabled sysctl_vsyscall32 + #define arch_setup_additional_pages syscall32_setup_pages +-extern int sysctl_ldt16; + #endif + + /* +@@ -381,13 +380,6 @@ static ctl_table abi_table2[] = { + .mode = 0644, + .proc_handler = proc_dointvec + }, +- { +- .procname = "ldt16", +- .data = &sysctl_ldt16, +- .maxlen = sizeof(int), +- .mode = 0644, +- .proc_handler = proc_dointvec +- }, + {} + }; + +diff --git a/crypto/af_alg.c b/crypto/af_alg.c +index ac33d5f30778..bf948e134981 100644 +--- a/crypto/af_alg.c ++++ b/crypto/af_alg.c +@@ -21,6 +21,7 @@ + #include + #include + #include ++#include + + struct alg_type_list { + const struct af_alg_type *type; +@@ -243,6 +244,7 @@ int af_alg_accept(struct sock *sk, struct socket *newsock) + + sock_init_data(newsock, sk2); + sock_graft(sk2, newsock); ++ security_sk_clone(sk, sk2); + + err = type->accept(ask->private, sk2); + if (err) { +diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c +index d344cf3ac9e3..e13c5f4b12cb 100644 +--- a/drivers/iio/industrialio-buffer.c ++++ b/drivers/iio/industrialio-buffer.c +@@ -849,7 +849,7 @@ static int iio_buffer_update_demux(struct iio_dev *indio_dev, + + /* Now we have the two masks, work from least sig and build up sizes */ + for_each_set_bit(out_ind, +- indio_dev->active_scan_mask, ++ buffer->scan_mask, + indio_dev->masklength) { + in_ind = find_next_bit(indio_dev->active_scan_mask, + indio_dev->masklength, +diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c +index 658613021919..f8821ce27802 100644 +--- a/drivers/net/ethernet/marvell/mvneta.c ++++ 
b/drivers/net/ethernet/marvell/mvneta.c +@@ -99,16 +99,56 @@ + #define MVNETA_CPU_RXQ_ACCESS_ALL_MASK 0x000000ff + #define MVNETA_CPU_TXQ_ACCESS_ALL_MASK 0x0000ff00 + #define MVNETA_RXQ_TIME_COAL_REG(q) (0x2580 + ((q) << 2)) ++ ++/* Exception Interrupt Port/Queue Cause register */ ++ + #define MVNETA_INTR_NEW_CAUSE 0x25a0 +-#define MVNETA_RX_INTR_MASK(nr_rxqs) (((1 << nr_rxqs) - 1) << 8) + #define MVNETA_INTR_NEW_MASK 0x25a4 ++ ++/* bits 0..7 = TXQ SENT, one bit per queue. ++ * bits 8..15 = RXQ OCCUP, one bit per queue. ++ * bits 16..23 = RXQ FREE, one bit per queue. ++ * bit 29 = OLD_REG_SUM, see old reg ? ++ * bit 30 = TX_ERR_SUM, one bit for 4 ports ++ * bit 31 = MISC_SUM, one bit for 4 ports ++ */ ++#define MVNETA_TX_INTR_MASK(nr_txqs) (((1 << nr_txqs) - 1) << 0) ++#define MVNETA_TX_INTR_MASK_ALL (0xff << 0) ++#define MVNETA_RX_INTR_MASK(nr_rxqs) (((1 << nr_rxqs) - 1) << 8) ++#define MVNETA_RX_INTR_MASK_ALL (0xff << 8) ++ + #define MVNETA_INTR_OLD_CAUSE 0x25a8 + #define MVNETA_INTR_OLD_MASK 0x25ac ++ ++/* Data Path Port/Queue Cause Register */ + #define MVNETA_INTR_MISC_CAUSE 0x25b0 + #define MVNETA_INTR_MISC_MASK 0x25b4 ++ ++#define MVNETA_CAUSE_PHY_STATUS_CHANGE BIT(0) ++#define MVNETA_CAUSE_LINK_CHANGE BIT(1) ++#define MVNETA_CAUSE_PTP BIT(4) ++ ++#define MVNETA_CAUSE_INTERNAL_ADDR_ERR BIT(7) ++#define MVNETA_CAUSE_RX_OVERRUN BIT(8) ++#define MVNETA_CAUSE_RX_CRC_ERROR BIT(9) ++#define MVNETA_CAUSE_RX_LARGE_PKT BIT(10) ++#define MVNETA_CAUSE_TX_UNDERUN BIT(11) ++#define MVNETA_CAUSE_PRBS_ERR BIT(12) ++#define MVNETA_CAUSE_PSC_SYNC_CHANGE BIT(13) ++#define MVNETA_CAUSE_SERDES_SYNC_ERR BIT(14) ++ ++#define MVNETA_CAUSE_BMU_ALLOC_ERR_SHIFT 16 ++#define MVNETA_CAUSE_BMU_ALLOC_ERR_ALL_MASK (0xF << MVNETA_CAUSE_BMU_ALLOC_ERR_SHIFT) ++#define MVNETA_CAUSE_BMU_ALLOC_ERR_MASK(pool) (1 << (MVNETA_CAUSE_BMU_ALLOC_ERR_SHIFT + (pool))) ++ ++#define MVNETA_CAUSE_TXQ_ERROR_SHIFT 24 ++#define MVNETA_CAUSE_TXQ_ERROR_ALL_MASK (0xFF << MVNETA_CAUSE_TXQ_ERROR_SHIFT) ++#define 
MVNETA_CAUSE_TXQ_ERROR_MASK(q) (1 << (MVNETA_CAUSE_TXQ_ERROR_SHIFT + (q))) ++ + #define MVNETA_INTR_ENABLE 0x25b8 + #define MVNETA_TXQ_INTR_ENABLE_ALL_MASK 0x0000ff00 +-#define MVNETA_RXQ_INTR_ENABLE_ALL_MASK 0xff000000 ++#define MVNETA_RXQ_INTR_ENABLE_ALL_MASK 0xff000000 // note: neta says it's 0x000000FF ++ + #define MVNETA_RXQ_CMD 0x2680 + #define MVNETA_RXQ_DISABLE_SHIFT 8 + #define MVNETA_RXQ_ENABLE_MASK 0x000000ff +@@ -174,9 +214,6 @@ + #define MVNETA_RX_COAL_PKTS 32 + #define MVNETA_RX_COAL_USEC 100 + +-/* Timer */ +-#define MVNETA_TX_DONE_TIMER_PERIOD 10 +- + /* Napi polling weight */ + #define MVNETA_RX_POLL_WEIGHT 64 + +@@ -219,10 +256,12 @@ + + #define MVNETA_RX_BUF_SIZE(pkt_size) ((pkt_size) + NET_SKB_PAD) + +-struct mvneta_stats { ++struct mvneta_pcpu_stats { + struct u64_stats_sync syncp; +- u64 packets; +- u64 bytes; ++ u64 rx_packets; ++ u64 rx_bytes; ++ u64 tx_packets; ++ u64 tx_bytes; + }; + + struct mvneta_port { +@@ -230,16 +269,11 @@ struct mvneta_port { + void __iomem *base; + struct mvneta_rx_queue *rxqs; + struct mvneta_tx_queue *txqs; +- struct timer_list tx_done_timer; + struct net_device *dev; + + u32 cause_rx_tx; + struct napi_struct napi; + +- /* Flags */ +- unsigned long flags; +-#define MVNETA_F_TX_DONE_TIMER_BIT 0 +- + /* Napi weight */ + int weight; + +@@ -248,8 +282,7 @@ struct mvneta_port { + u8 mcast_count[256]; + u16 tx_ring_size; + u16 rx_ring_size; +- struct mvneta_stats tx_stats; +- struct mvneta_stats rx_stats; ++ struct mvneta_pcpu_stats *stats; + + struct mii_bus *mii_bus; + struct phy_device *phy_dev; +@@ -428,21 +461,29 @@ struct rtnl_link_stats64 *mvneta_get_stats64(struct net_device *dev, + { + struct mvneta_port *pp = netdev_priv(dev); + unsigned int start; ++ int cpu; + +- memset(stats, 0, sizeof(struct rtnl_link_stats64)); +- +- do { +- start = u64_stats_fetch_begin_bh(&pp->rx_stats.syncp); +- stats->rx_packets = pp->rx_stats.packets; +- stats->rx_bytes = pp->rx_stats.bytes; +- } while 
(u64_stats_fetch_retry_bh(&pp->rx_stats.syncp, start));
++	for_each_possible_cpu(cpu) {
++		struct mvneta_pcpu_stats *cpu_stats;
++		u64 rx_packets;
++		u64 rx_bytes;
++		u64 tx_packets;
++		u64 tx_bytes;
+ 
++		cpu_stats = per_cpu_ptr(pp->stats, cpu);
++		do {
++			start = u64_stats_fetch_begin_bh(&cpu_stats->syncp);
++			rx_packets = cpu_stats->rx_packets;
++			rx_bytes = cpu_stats->rx_bytes;
++			tx_packets = cpu_stats->tx_packets;
++			tx_bytes = cpu_stats->tx_bytes;
++		} while (u64_stats_fetch_retry_bh(&cpu_stats->syncp, start));
+ 
+-	do {
+-		start = u64_stats_fetch_begin_bh(&pp->tx_stats.syncp);
+-		stats->tx_packets = pp->tx_stats.packets;
+-		stats->tx_bytes = pp->tx_stats.bytes;
+-	} while (u64_stats_fetch_retry_bh(&pp->tx_stats.syncp, start));
++		stats->rx_packets += rx_packets;
++		stats->rx_bytes += rx_bytes;
++		stats->tx_packets += tx_packets;
++		stats->tx_bytes += tx_bytes;
++	}
+ 
+ 	stats->rx_errors = dev->stats.rx_errors;
+ 	stats->rx_dropped = dev->stats.rx_dropped;
+@@ -1063,17 +1104,6 @@ static void mvneta_tx_done_pkts_coal_set(struct mvneta_port *pp,
+ 	txq->done_pkts_coal = value;
+ }
+ 
+-/* Trigger tx done timer in MVNETA_TX_DONE_TIMER_PERIOD msecs */
+-static void mvneta_add_tx_done_timer(struct mvneta_port *pp)
+-{
+-	if (test_and_set_bit(MVNETA_F_TX_DONE_TIMER_BIT, &pp->flags) == 0) {
+-		pp->tx_done_timer.expires = jiffies +
+-			msecs_to_jiffies(MVNETA_TX_DONE_TIMER_PERIOD);
+-		add_timer(&pp->tx_done_timer);
+-	}
+-}
+-
+-
+ /* Handle rx descriptor fill by setting buf_cookie and buf_phys_addr */
+ static void mvneta_rx_desc_fill(struct mvneta_rx_desc *rx_desc,
+ 				u32 phys_addr, u32 cookie)
+@@ -1354,6 +1384,8 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ {
+ 	struct net_device *dev = pp->dev;
+ 	int rx_done, rx_filled;
++	u32 rcvd_pkts = 0;
++	u32 rcvd_bytes = 0;
+ 
+ 	/* Get number of received packets */
+ 	rx_done = mvneta_rxq_busy_desc_num_get(pp, rxq);
+@@ -1391,10 +1423,8 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ 
+ 		rx_bytes =
rx_desc->data_size -
+ 			(ETH_FCS_LEN + MVNETA_MH_SIZE);
+-		u64_stats_update_begin(&pp->rx_stats.syncp);
+-		pp->rx_stats.packets++;
+-		pp->rx_stats.bytes += rx_bytes;
+-		u64_stats_update_end(&pp->rx_stats.syncp);
++		rcvd_pkts++;
++		rcvd_bytes += rx_bytes;
+ 
+ 		/* Linux processing */
+ 		skb_reserve(skb, MVNETA_MH_SIZE);
+@@ -1415,6 +1445,15 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ 		}
+ 	}
+ 
++	if (rcvd_pkts) {
++		struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
++
++		u64_stats_update_begin(&stats->syncp);
++		stats->rx_packets += rcvd_pkts;
++		stats->rx_bytes += rcvd_bytes;
++		u64_stats_update_end(&stats->syncp);
++	}
++
+ 	/* Update rxq management counters */
+ 	mvneta_rxq_desc_num_update(pp, rxq, rx_done, rx_filled);
+ 
+@@ -1545,25 +1584,17 @@ static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
+ 
+ out:
+ 	if (frags > 0) {
+-		u64_stats_update_begin(&pp->tx_stats.syncp);
+-		pp->tx_stats.packets++;
+-		pp->tx_stats.bytes += skb->len;
+-		u64_stats_update_end(&pp->tx_stats.syncp);
++		struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
+ 
++		u64_stats_update_begin(&stats->syncp);
++		stats->tx_packets++;
++		stats->tx_bytes += skb->len;
++		u64_stats_update_end(&stats->syncp);
+ 	} else {
+ 		dev->stats.tx_dropped++;
+ 		dev_kfree_skb_any(skb);
+ 	}
+ 
+-	if (txq->count >= MVNETA_TXDONE_COAL_PKTS)
+-		mvneta_txq_done(pp, txq);
+-
+-	/* If after calling mvneta_txq_done, count equals
+-	 * frags, we need to set the timer
+-	 */
+-	if (txq->count == frags && frags > 0)
+-		mvneta_add_tx_done_timer(pp);
+-
+ 	return NETDEV_TX_OK;
+ }
+ 
+@@ -1839,14 +1870,22 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
+ 
+ 	/* Read cause register */
+ 	cause_rx_tx = mvreg_read(pp, MVNETA_INTR_NEW_CAUSE) &
+-		MVNETA_RX_INTR_MASK(rxq_number);
++		(MVNETA_RX_INTR_MASK(rxq_number) | MVNETA_TX_INTR_MASK(txq_number));
++
++	/* Release Tx descriptors */
++	if (cause_rx_tx & MVNETA_TX_INTR_MASK_ALL) {
++		int tx_todo = 0;
++
++		mvneta_tx_done_gbe(pp,
(cause_rx_tx & MVNETA_TX_INTR_MASK_ALL), &tx_todo);
++		cause_rx_tx &= ~MVNETA_TX_INTR_MASK_ALL;
++	}
+ 
+ 	/* For the case where the last mvneta_poll did not process all
+ 	 * RX packets
+ 	 */
+ 	cause_rx_tx |= pp->cause_rx_tx;
+ 	if (rxq_number > 1) {
+-		while ((cause_rx_tx != 0) && (budget > 0)) {
++		while ((cause_rx_tx & MVNETA_RX_INTR_MASK_ALL) && (budget > 0)) {
+ 			int count;
+ 			struct mvneta_rx_queue *rxq;
+ 			/* get rx queue number from cause_rx_tx */
+@@ -1878,7 +1917,7 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
+ 		napi_complete(napi);
+ 		local_irq_save(flags);
+ 		mvreg_write(pp, MVNETA_INTR_NEW_MASK,
+-			MVNETA_RX_INTR_MASK(rxq_number));
++			MVNETA_RX_INTR_MASK(rxq_number) | MVNETA_TX_INTR_MASK(txq_number));
+ 		local_irq_restore(flags);
+ 	}
+ 
+@@ -1886,26 +1925,6 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
+ 	return rx_done;
+ }
+ 
+-/* tx done timer callback */
+-static void mvneta_tx_done_timer_callback(unsigned long data)
+-{
+-	struct net_device *dev = (struct net_device *)data;
+-	struct mvneta_port *pp = netdev_priv(dev);
+-	int tx_done = 0, tx_todo = 0;
+-
+-	if (!netif_running(dev))
+-		return ;
+-
+-	clear_bit(MVNETA_F_TX_DONE_TIMER_BIT, &pp->flags);
+-
+-	tx_done = mvneta_tx_done_gbe(pp,
+-			(((1 << txq_number) - 1) &
+-			MVNETA_CAUSE_TXQ_SENT_DESC_ALL_MASK),
+-			&tx_todo);
+-	if (tx_todo > 0)
+-		mvneta_add_tx_done_timer(pp);
+-}
+-
+ /* Handle rxq fill: allocates rxq skbs; called when initializing a port */
+ static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
+ 			int num)
+@@ -2155,7 +2174,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
+ 
+ 	/* Unmask interrupts */
+ 	mvreg_write(pp, MVNETA_INTR_NEW_MASK,
+-		MVNETA_RX_INTR_MASK(rxq_number));
++		MVNETA_RX_INTR_MASK(rxq_number) | MVNETA_TX_INTR_MASK(txq_number));
+ 
+ 	phy_start(pp->phy_dev);
+ 	netif_tx_start_all_queues(pp->dev);
+@@ -2188,16 +2207,6 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
+ 	mvneta_rx_reset(pp);
+ }
+ 
+-/* tx
timeout callback - display a message and stop/start the network device */
+-static void mvneta_tx_timeout(struct net_device *dev)
+-{
+-	struct mvneta_port *pp = netdev_priv(dev);
+-
+-	netdev_info(dev, "tx timeout\n");
+-	mvneta_stop_dev(pp);
+-	mvneta_start_dev(pp);
+-}
+-
+ /* Return positive if MTU is valid */
+ static int mvneta_check_mtu_valid(struct net_device *dev, int mtu)
+ {
+@@ -2426,8 +2435,6 @@ static int mvneta_stop(struct net_device *dev)
+ 	free_irq(dev->irq, pp);
+ 	mvneta_cleanup_rxqs(pp);
+ 	mvneta_cleanup_txqs(pp);
+-	del_timer(&pp->tx_done_timer);
+-	clear_bit(MVNETA_F_TX_DONE_TIMER_BIT, &pp->flags);
+ 
+ 	return 0;
+ }
+@@ -2548,7 +2555,6 @@ static const struct net_device_ops mvneta_netdev_ops = {
+ 	.ndo_set_rx_mode = mvneta_set_rx_mode,
+ 	.ndo_set_mac_address = mvneta_set_mac_addr,
+ 	.ndo_change_mtu = mvneta_change_mtu,
+-	.ndo_tx_timeout = mvneta_tx_timeout,
+ 	.ndo_get_stats64 = mvneta_get_stats64,
+ };
+ 
+@@ -2729,10 +2735,6 @@ static int mvneta_probe(struct platform_device *pdev)
+ 
+ 	pp = netdev_priv(dev);
+ 
+-	pp->tx_done_timer.function = mvneta_tx_done_timer_callback;
+-	init_timer(&pp->tx_done_timer);
+-	clear_bit(MVNETA_F_TX_DONE_TIMER_BIT, &pp->flags);
+-
+ 	pp->weight = MVNETA_RX_POLL_WEIGHT;
+ 	pp->phy_node = phy_node;
+ 	pp->phy_interface = phy_mode;
+@@ -2751,7 +2753,12 @@ static int mvneta_probe(struct platform_device *pdev)
+ 
+ 	clk_prepare_enable(pp->clk);
+ 
+-	pp->tx_done_timer.data = (unsigned long)dev;
++	/* Alloc per-cpu stats */
++	pp->stats = alloc_percpu(struct mvneta_pcpu_stats);
++	if (!pp->stats) {
++		err = -ENOMEM;
++		goto err_clk;
++	}
+ 
+ 	pp->tx_ring_size = MVNETA_MAX_TXD;
+ 	pp->rx_ring_size = MVNETA_MAX_RXD;
+@@ -2762,7 +2769,7 @@ static int mvneta_probe(struct platform_device *pdev)
+ 	err = mvneta_init(pp, phy_addr);
+ 	if (err < 0) {
+ 		dev_err(&pdev->dev, "can't init eth hal\n");
+-		goto err_clk;
++		goto err_free_stats;
+ 	}
+ 	mvneta_port_power_up(pp, phy_mode);
+ 
+@@ -2791,6 +2798,8 @@ static int mvneta_probe(struct
platform_device *pdev)
+ 
+ err_deinit:
+ 	mvneta_deinit(pp);
++err_free_stats:
++	free_percpu(pp->stats);
+ err_clk:
+ 	clk_disable_unprepare(pp->clk);
+ err_unmap:
+@@ -2811,6 +2820,7 @@ static int mvneta_remove(struct platform_device *pdev)
+ 	unregister_netdev(dev);
+ 	mvneta_deinit(pp);
+ 	clk_disable_unprepare(pp->clk);
++	free_percpu(pp->stats);
+ 	iounmap(pp->base);
+ 	irq_dispose_mapping(dev->irq);
+ 	free_netdev(dev);
+diff --git a/drivers/rapidio/devices/tsi721_dma.c b/drivers/rapidio/devices/tsi721_dma.c
+index 91245f5dbe81..47257b6eea84 100644
+--- a/drivers/rapidio/devices/tsi721_dma.c
++++ b/drivers/rapidio/devices/tsi721_dma.c
+@@ -287,6 +287,12 @@ struct tsi721_tx_desc *tsi721_desc_get(struct tsi721_bdma_chan *bdma_chan)
+ 			"desc %p not ACKed\n", tx_desc);
+ 	}
+ 
++	if (ret == NULL) {
++		dev_dbg(bdma_chan->dchan.device->dev,
++			"%s: unable to obtain tx descriptor\n", __func__);
++		goto err_out;
++	}
++
+ 	i = bdma_chan->wr_count_next % bdma_chan->bd_num;
+ 	if (i == bdma_chan->bd_num - 1) {
+ 		i = 0;
+@@ -297,7 +303,7 @@ struct tsi721_tx_desc *tsi721_desc_get(struct tsi721_bdma_chan *bdma_chan)
+ 	tx_desc->txd.phys = bdma_chan->bd_phys +
+ 				i * sizeof(struct tsi721_dma_desc);
+ 	tx_desc->hw_desc = &((struct tsi721_dma_desc *)bdma_chan->bd_base)[i];
+-
++err_out:
+ 	spin_unlock_bh(&bdma_chan->lock);
+ 
+ 	return ret;
+diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
+index 86d522004a20..e5953c8018c5 100644
+--- a/drivers/scsi/scsi_lib.c
++++ b/drivers/scsi/scsi_lib.c
+@@ -815,6 +815,14 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
+ 			scsi_next_command(cmd);
+ 			return;
+ 		}
++	} else if (blk_rq_bytes(req) == 0 && result && !sense_deferred) {
++		/*
++		 * Certain non BLOCK_PC requests are commands that don't
++		 * actually transfer anything (FLUSH), so cannot use
++		 * good_bytes != blk_rq_bytes(req) as the signal for an error.
++		 * This sets the error explicitly for the problem case.
++		 */
++		error = __scsi_error_from_host_byte(cmd, result);
+ 	}
+ 
+ 	/* no bidi support for !REQ_TYPE_BLOCK_PC yet */
+diff --git a/drivers/staging/vt6655/bssdb.c b/drivers/staging/vt6655/bssdb.c
+index f983915168b7..3496a77612ba 100644
+--- a/drivers/staging/vt6655/bssdb.c
++++ b/drivers/staging/vt6655/bssdb.c
+@@ -1026,7 +1026,7 @@ start:
+ 			pDevice->byERPFlag &= ~(WLAN_SET_ERP_USE_PROTECTION(1));
+ 		}
+ 
+-		{
++		if (pDevice->eCommandState == WLAN_ASSOCIATE_WAIT) {
+ 			pDevice->byReAssocCount++;
+ 			if ((pDevice->byReAssocCount > 10) && (pDevice->bLinkPass != true)) { //10 sec timeout
+ 				printk("Re-association timeout!!!\n");
+diff --git a/drivers/staging/vt6655/device_main.c b/drivers/staging/vt6655/device_main.c
+index 08b250f01dae..d170b6f9db7c 100644
+--- a/drivers/staging/vt6655/device_main.c
++++ b/drivers/staging/vt6655/device_main.c
+@@ -2434,6 +2434,7 @@ static irqreturn_t device_intr(int irq, void *dev_instance) {
+ 	int handled = 0;
+ 	unsigned char byData = 0;
+ 	int ii = 0;
++	unsigned long flags;
+ 	// unsigned char byRSSI;
+ 
+ 	MACvReadISR(pDevice->PortOffset, &pDevice->dwIsr);
+@@ -2459,7 +2460,8 @@ static irqreturn_t device_intr(int irq, void *dev_instance) {
+ 
+ 	handled = 1;
+ 	MACvIntDisable(pDevice->PortOffset);
+-	spin_lock_irq(&pDevice->lock);
++
++	spin_lock_irqsave(&pDevice->lock, flags);
+ 
+ 	//Make sure current page is 0
+ 	VNSvInPortB(pDevice->PortOffset + MAC_REG_PAGE1SEL, &byOrgPageSel);
+@@ -2700,7 +2702,8 @@ static irqreturn_t device_intr(int irq, void *dev_instance) {
+ 		MACvSelectPage1(pDevice->PortOffset);
+ 	}
+ 
+-	spin_unlock_irq(&pDevice->lock);
++	spin_unlock_irqrestore(&pDevice->lock, flags);
++
+ 	MACvIntEnable(pDevice->PortOffset, IMR_MASK_VALUE);
+ 
+ 	return IRQ_RETVAL(handled);
+diff --git a/include/linux/printk.h b/include/linux/printk.h
+index 22c7052e9372..708b8a84f6c0 100644
+--- a/include/linux/printk.h
++++ b/include/linux/printk.h
+@@ -124,9 +124,9 @@ asmlinkage __printf(1, 2) __cold
+ int printk(const char *fmt, ...);
+ 
+ /*
+- *
Special printk facility for scheduler use only, _DO_NOT_USE_ !
++ * Special printk facility for scheduler/timekeeping use only, _DO_NOT_USE_ !
+  */
+-__printf(1, 2) __cold int printk_sched(const char *fmt, ...);
++__printf(1, 2) __cold int printk_deferred(const char *fmt, ...);
+ 
+ /*
+  * Please don't use printk_ratelimit(), because it shares ratelimiting state
+@@ -161,7 +161,7 @@ int printk(const char *s, ...)
+ 	return 0;
+ }
+ static inline __printf(1, 2) __cold
+-int printk_sched(const char *s, ...)
++int printk_deferred(const char *s, ...)
+ {
+ 	return 0;
+ }
+diff --git a/init/main.c b/init/main.c
+index e83ac04fda97..2132ffd5e031 100644
+--- a/init/main.c
++++ b/init/main.c
+@@ -606,6 +606,10 @@ asmlinkage void __init start_kernel(void)
+ 	if (efi_enabled(EFI_RUNTIME_SERVICES))
+ 		efi_enter_virtual_mode();
+ #endif
++#ifdef CONFIG_X86_ESPFIX64
++	/* Should be run before the first non-init thread is created */
++	init_espfix_bsp();
++#endif
+ 	thread_info_cache_init();
+ 	cred_init();
+ 	fork_init(totalram_pages);
+diff --git a/kernel/printk.c b/kernel/printk.c
+index d37d45c90ae6..f7aff4bd5454 100644
+--- a/kernel/printk.c
++++ b/kernel/printk.c
+@@ -2485,7 +2485,7 @@ void wake_up_klogd(void)
+ 	preempt_enable();
+ }
+ 
+-int printk_sched(const char *fmt, ...)
++int printk_deferred(const char *fmt, ...)
+ {
+ 	unsigned long flags;
+ 	va_list args;
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index 2672eca82a2b..c771f2547bef 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -1235,7 +1235,7 @@ out:
+ 		 * leave kernel.
+ 		 */
+ 		if (p->mm && printk_ratelimit()) {
+-			printk_sched("process %d (%s) no longer affine to cpu%d\n",
++			printk_deferred("process %d (%s) no longer affine to cpu%d\n",
+ 					task_pid_nr(p), p->comm, cpu);
+ 		}
+ 	}
+diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
+index 15334e6de832..2dffc7b5d469 100644
+--- a/kernel/sched/rt.c
++++ b/kernel/sched/rt.c
+@@ -892,7 +892,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
+ 
+ 			if (!once) {
+ 				once = true;
+-				printk_sched("sched: RT throttling activated\n");
++				printk_deferred("sched: RT throttling activated\n");
+ 			}
+ 		} else {
+ 			/*
+diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
+index 9df0e3b19f09..58e8430165b5 100644
+--- a/kernel/time/clockevents.c
++++ b/kernel/time/clockevents.c
+@@ -138,7 +138,8 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
+ {
+ 	/* Nothing to do if we already reached the limit */
+ 	if (dev->min_delta_ns >= MIN_DELTA_LIMIT) {
+-		printk(KERN_WARNING "CE: Reprogramming failure. Giving up\n");
++		printk_deferred(KERN_WARNING
++				"CE: Reprogramming failure. Giving up\n");
+ 		dev->next_event.tv64 = KTIME_MAX;
+ 		return -ETIME;
+ 	}
+@@ -151,9 +152,10 @@ static int clockevents_increase_min_delta(struct clock_event_device *dev)
+ 	if (dev->min_delta_ns > MIN_DELTA_LIMIT)
+ 		dev->min_delta_ns = MIN_DELTA_LIMIT;
+ 
+-	printk(KERN_WARNING "CE: %s increased min_delta_ns to %llu nsec\n",
+-		dev->name ? dev->name : "?",
+-		(unsigned long long) dev->min_delta_ns);
++	printk_deferred(KERN_WARNING
++			"CE: %s increased min_delta_ns to %llu nsec\n",
++			dev->name ?
dev->name : "?",
++			(unsigned long long) dev->min_delta_ns);
+ 	return 0;
+ }
+ 
+diff --git a/lib/btree.c b/lib/btree.c
+index f9a484676cb6..4264871ea1a0 100644
+--- a/lib/btree.c
++++ b/lib/btree.c
+@@ -198,6 +198,7 @@ EXPORT_SYMBOL_GPL(btree_init);
+ 
+ void btree_destroy(struct btree_head *head)
+ {
++	mempool_free(head->node, head->mempool);
+ 	mempool_destroy(head->mempool);
+ 	head->mempool = NULL;
+ }
+diff --git a/mm/page_alloc.c b/mm/page_alloc.c
+index 0ab02fb8e9b1..71305c6aba5b 100644
+--- a/mm/page_alloc.c
++++ b/mm/page_alloc.c
+@@ -2339,7 +2339,7 @@ static inline int
+ gfp_to_alloc_flags(gfp_t gfp_mask)
+ {
+ 	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
+-	const gfp_t wait = gfp_mask & __GFP_WAIT;
++	const bool atomic = !(gfp_mask & (__GFP_WAIT | __GFP_NO_KSWAPD));
+ 
+ 	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
+ 	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
+@@ -2348,20 +2348,20 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
+ 	 * The caller may dip into page reserves a bit more if the caller
+ 	 * cannot run direct reclaim, or if the caller has realtime scheduling
+ 	 * policy or is asking for __GFP_HIGH memory. GFP_ATOMIC requests will
+-	 * set both ALLOC_HARDER (!wait) and ALLOC_HIGH (__GFP_HIGH).
++	 * set both ALLOC_HARDER (atomic == true) and ALLOC_HIGH (__GFP_HIGH).
+ 	 */
+ 	alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);
+ 
+-	if (!wait) {
++	if (atomic) {
+ 		/*
+-		 * Not worth trying to allocate harder for
+-		 * __GFP_NOMEMALLOC even if it can't schedule.
++		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
++		 * if it can't schedule.
+ 		 */
+-		if (!(gfp_mask & __GFP_NOMEMALLOC))
++		if (!(gfp_mask & __GFP_NOMEMALLOC))
+ 			alloc_flags |= ALLOC_HARDER;
+ 		/*
+-		 * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
+-		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
++		 * Ignore cpuset mems for GFP_ATOMIC rather than fail, see the
++		 * comment for __cpuset_node_allowed_softwall().
+ 		 */
+ 		alloc_flags &= ~ALLOC_CPUSET;
+ 	} else if (unlikely(rt_task(current)) && !in_interrupt())
+diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
+index 9a0e5874e73e..164fa9dcd97d 100644
+--- a/net/l2tp/l2tp_ppp.c
++++ b/net/l2tp/l2tp_ppp.c
+@@ -1365,7 +1365,7 @@ static int pppol2tp_setsockopt(struct socket *sock, int level, int optname,
+ 	int err;
+ 
+ 	if (level != SOL_PPPOL2TP)
+-		return udp_prot.setsockopt(sk, level, optname, optval, optlen);
++		return -EINVAL;
+ 
+ 	if (optlen < sizeof(int))
+ 		return -EINVAL;
+@@ -1491,7 +1491,7 @@ static int pppol2tp_getsockopt(struct socket *sock, int level, int optname,
+ 	struct pppol2tp_session *ps;
+ 
+ 	if (level != SOL_PPPOL2TP)
+-		return udp_prot.getsockopt(sk, level, optname, optval, optlen);
++		return -EINVAL;
+ 
+ 	if (get_user(len, optlen))
+ 		return -EFAULT;
+diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
+index d566cdba24ec..10eea2326022 100644
+--- a/net/mac80211/tx.c
++++ b/net/mac80211/tx.c
+@@ -398,6 +398,9 @@ ieee80211_tx_h_multicast_ps_buf(struct ieee80211_tx_data *tx)
+ 	if (ieee80211_has_order(hdr->frame_control))
+ 		return TX_CONTINUE;
+ 
++	if (ieee80211_is_probe_req(hdr->frame_control))
++		return TX_CONTINUE;
++
+ 	/* no stations in PS mode */
+ 	if (!atomic_read(&ps->num_sta_ps))
+ 		return TX_CONTINUE;
+@@ -447,6 +450,7 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
+ {
+ 	struct sta_info *sta = tx->sta;
+ 	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb);
++	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data;
+ 	struct ieee80211_local *local = tx->local;
+ 
+ 	if (unlikely(!sta))
+@@ -457,6 +461,15 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
+ 			!(info->flags & IEEE80211_TX_CTL_NO_PS_BUFFER))) {
+ 		int ac = skb_get_queue_mapping(tx->skb);
+ 
++		/* only deauth, disassoc and action are bufferable MMPDUs */
++		if (ieee80211_is_mgmt(hdr->frame_control) &&
++			!ieee80211_is_deauth(hdr->frame_control) &&
++			!ieee80211_is_disassoc(hdr->frame_control) &&
++
!ieee80211_is_action(hdr->frame_control)) {
++			info->flags |= IEEE80211_TX_CTL_NO_PS_BUFFER;
++			return TX_CONTINUE;
++		}
++
+ 		ps_dbg(sta->sdata, "STA %pM aid %d: PS buffer for AC %d\n",
+ 			sta->sta.addr, sta->sta.aid, ac);
+ 		if (tx->local->total_ps_buffered >= TOTAL_MAX_TX_BUFFER)
+@@ -514,22 +527,8 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
+ static ieee80211_tx_result debug_noinline
+ ieee80211_tx_h_ps_buf(struct ieee80211_tx_data *tx)
+ {
+-	struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb);
+-	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data;
+-
+ 	if (unlikely(tx->flags & IEEE80211_TX_PS_BUFFERED))
+ 		return TX_CONTINUE;
+-
+-	/* only deauth, disassoc and action are bufferable MMPDUs */
+-	if (ieee80211_is_mgmt(hdr->frame_control) &&
+-		!ieee80211_is_deauth(hdr->frame_control) &&
+-		!ieee80211_is_disassoc(hdr->frame_control) &&
+-		!ieee80211_is_action(hdr->frame_control)) {
+-		if (tx->flags & IEEE80211_TX_UNICAST)
+-			info->flags |= IEEE80211_TX_CTL_NO_PS_BUFFER;
+-		return TX_CONTINUE;
+-	}
+-
+ 	if (tx->flags & IEEE80211_TX_UNICAST)
+ 		return ieee80211_tx_h_unicast_ps_buf(tx);
+ 	else
+diff --git a/net/wireless/trace.h b/net/wireless/trace.h
+index 5755bc14abbd..bc5a75b1aef8 100644
+--- a/net/wireless/trace.h
++++ b/net/wireless/trace.h
+@@ -1972,7 +1972,8 @@ TRACE_EVENT(cfg80211_michael_mic_failure,
+ 		MAC_ASSIGN(addr, addr);
+ 		__entry->key_type = key_type;
+ 		__entry->key_id = key_id;
+-		memcpy(__entry->tsc, tsc, 6);
++		if (tsc)
++			memcpy(__entry->tsc, tsc, 6);
+ 	),
+ 	TP_printk(NETDEV_PR_FMT ", " MAC_PR_FMT ", key type: %d, key id: %d, tsc: %pm",
+ 		NETDEV_PR_ARG, MAC_PR_ARG(addr), __entry->key_type,