From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by finch.gentoo.org (Postfix) with ESMTPS id 3E277158094 for ; Tue, 27 Sep 2022 12:02:19 +0000 (UTC) Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id 67A78E0B63; Tue, 27 Sep 2022 12:02:18 +0000 (UTC) Received: from smtp.gentoo.org (woodpecker.gentoo.org [140.211.166.183]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by pigeon.gentoo.org (Postfix) with ESMTPS id 4B1E6E0B63 for ; Tue, 27 Sep 2022 12:02:18 +0000 (UTC) Received: from oystercatcher.gentoo.org (oystercatcher.gentoo.org [148.251.78.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp.gentoo.org (Postfix) with ESMTPS id 442F7341161 for ; Tue, 27 Sep 2022 12:02:17 +0000 (UTC) Received: from localhost.localdomain (localhost [IPv6:::1]) by oystercatcher.gentoo.org (Postfix) with ESMTP id 72CE75E9 for ; Tue, 27 Sep 2022 12:02:15 +0000 (UTC) From: "Mike Pagano" To: gentoo-commits@lists.gentoo.org Content-Transfer-Encoding: 8bit Content-type: text/plain; charset=UTF-8 Reply-To: gentoo-dev@lists.gentoo.org, "Mike Pagano" Message-ID: <1664280069.50ea097c7e3b862632af3b0c99cb2467857c7fa7.mpagano@gentoo> Subject: [gentoo-commits] proj/linux-patches:5.19 commit in: / X-VCS-Repository: proj/linux-patches X-VCS-Files: 0000_README 5030_ACPI-Skip-dummy-wait-for-Zen-processors.patch X-VCS-Directories: / X-VCS-Committer: mpagano X-VCS-Committer-Name: Mike Pagano X-VCS-Revision: 50ea097c7e3b862632af3b0c99cb2467857c7fa7 X-VCS-Branch: 5.19 Date: Tue, 27 Sep 2022 12:02:15 +0000 (UTC) Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-commits@lists.gentoo.org X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply X-Archives-Salt: 76281412-b2f7-4a87-a2d1-523e11030e04 X-Archives-Hash: ddb7156c44d43541e188a9de33b0aca9 commit: 50ea097c7e3b862632af3b0c99cb2467857c7fa7 Author: Mike Pagano gentoo org> AuthorDate: Tue Sep 27 12:01:09 2022 +0000 Commit: Mike Pagano gentoo org> CommitDate: Tue Sep 27 12:01:09 2022 +0000 URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=50ea097c ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture See: https://lkml.org/lkml/2022/9/21/74 See: https://www.phoronix.com/news/Linux-AMD-Old-Chipset-WA Signed-off-by: Mike Pagano gentoo.org> 0000_README | 4 + 5030_ACPI-Skip-dummy-wait-for-Zen-processors.patch | 142 +++++++++++++++++++++ 2 files changed, 146 insertions(+) diff --git a/0000_README b/0000_README index c5e283b0..034f4583 100644 --- a/0000_README +++ b/0000_README @@ -134,3 +134,7 @@ Desc: BMQ(BitMap Queue) Scheduler. A new CPU scheduler developed from PDS(incl Patch: 5021_BMQ-and-PDS-gentoo-defaults.patch From: https://gitweb.gentoo.org/proj/linux-patches.git/ Desc: Set defaults for BMQ. Add archs as people test, default to N + +Patch: 5030_ACPI-Skip-dummy-wait-for-Zen-processors.patch +From: https://lkml.org/lkml/2022/9/21/74 +Desc: ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture diff --git a/5030_ACPI-Skip-dummy-wait-for-Zen-processors.patch b/5030_ACPI-Skip-dummy-wait-for-Zen-processors.patch new file mode 100644 index 00000000..e0f6ebe8 --- /dev/null +++ b/5030_ACPI-Skip-dummy-wait-for-Zen-processors.patch @@ -0,0 +1,142 @@ +linux-kernel.vger.kernel.org archive mirror + help / color / mirror / Atom feed +From: K Prateek Nayak +To: +Cc: , , + , , + , , + , , , + , , + , , + , , + , + K Prateek Nayak , + "Calvin Ong" , , + +Subject: [PATCH] ACPI: processor_idle: Skip dummy wait for processors based on the Zen microarchitecture +Date: Wed, 21 Sep 2022 12:06:38 +0530 [thread overview] +Message-ID: <20220921063638.2489-1-kprateek.nayak@amd.com> (raw) + +Processors based on the Zen microarchitecture support IOPORT based deeper +C-states. The idle driver reads the acpi_gbl_FADT.xpm_timer_block.address +in the IOPORT based C-state exit path which is claimed to be a +"Dummy wait op" and has been around since ACPI introduction to Linux +dating back to Andy Grover's Mar 14, 2002 posting [1]. +The comment above the dummy operation was elaborated by Andreas Mohr back +in 2006 in commit b488f02156d3d ("ACPI: restore comment justifying 'extra' +P_LVLx access") [2] where the commit log claims: +"this dummy read was about: STPCLK# doesn't get asserted in time on +(some) chipsets, which is why we need to have a dummy I/O read to delaarchitecture-relatedy +further instruction processing until the CPU is fully stopped." + +However, sampling certain workloads with IBS on AMD Zen3 system shows +that a significant amount of time is spent in the dummy op, which +incorrectly gets accounted as C-State residency. A large C-State +residency value can prime the cpuidle governor to recommend a deeper +C-State during the subsequent idle instances, starting a vicious cycle, +leading to performance degradation on workloads that rapidly switch +between busy and idle phases. + +One such workload is tbench where a massive performance degradation can +be observed during certain runs. Following are some statistics gathered +by running tbench with 128 clients, on a dual socket (2 x 64C/128T) Zen3 +system with the baseline kernel, baseline kernel keeping C2 disabled, +and baseline kernel with this patch applied keeping C2 enabled: + +baseline kernel was tip:sched/core at +commit f3dd3f674555 ("sched: Remove the limitation of WF_ON_CPU on +wakelist if wakee cpu is idle") + +Kernel : baseline baseline + C2 disabled baseline + patch + +Min (MB/s) : 2215.06 33072.10 (+1393.05%) 33016.10 (+1390.52%) +Max (MB/s) : 32938.80 34399.10 34774.50 +Median (MB/s) : 32191.80 33476.60 33805.70 +AMean (MB/s) : 22448.55 33649.27 (+49.89%) 33865.43 (+50.85%) +AMean Stddev : 17526.70 680.14 880.72 +AMean CoefVar : 78.07% 2.02% 2.60% + +The data shows there are edge cases that can cause massive regressions +in case of tbench. Profiling the bad runs with IBS shows a significant +amount of time being spent in acpi_idle_do_entry method: + +Overhead Command Shared Object Symbolarchitecture-related + 74.76% swapper [kernel.kallsyms] [k] acpi_idle_do_entry + 0.71% tbench [kernel.kallsyms] [k] update_sd_lb_stats.constprop.0 + 0.69% tbench_srv [kernel.kallsyms] [k] update_sd_lb_stats.constprop.0 + 0.49% swapper [kernel.kallsyms] [k] psi_group_change + ... + +Annotation of acpi_idle_do_entry method reveals almost all the time in +acpi_idle_do_entry is spent on the port I/O in wait_for_freeze(): + + 0.14 │ in (%dx),%al # <------ First "in" corresponding to inb(cx->address) + 0.51 │ mov 0x144d64d(%rip),%rax + 0.00 │ test $0x80000000,%eax + │ ↓ jne 62 # <------ Skip if running in guest + 0.00 │ mov 0x19800c3(%rip),%rdx + 99.33 │ in (%dx),%eax # <------ Second "in" corresponding to inl(acpi_gbl_FADT.xpm_timer_block.address) + 0.00 │62: mov -0x8(%rbp),%r12 + 0.00 │ leavearchitecture-related + 0.00 │ ← ret + +This overhead is reflected in the C2 residency on the test system where +C2 is an IOPORT based C-State. The total C-state residency reported by +"cpupower idle-info" on CPU0 for good and bad case over the 80s tbench +run is as follows (all numbers are in microseconds): + + Good Run Bad Run + (Baseline) + +POLL: 43338 6231 (-85.62%) +C1 (MWAIT Based): 23576156 363861 (-98.45%) +C2 (IOPORT Based): 10781218 77027280 (+614.45%) + +The larger residency value in bad case leads to the system recommending +C2 state again for subsequent idle instances. The pattern lasts till the +end of the tbench run. Following is the breakdown of "entry_method" +passed to acpi_idle_do_entry during good run and bad run: + + Good Run Bad Run + (Baseline) + +Number of times acpi_idle_do_entry was called: 6149573 6149050 (-0.01%) + |-> Number of times entry_method was "ACPI_CSTATE_FFH": 6141494 88144 (-98.56%) + |-> Number of times entry_method was "ACPI_CSTATE_HALT": 0 0 (+0.00%) + |-> Number of times entry_method was "ACPI_CSTATE_SYSTEMIO": 8079 6060906 (+74920.49%) + +For processors based on the Zen microarchitecture, this dummy wait op is +unnecessary and can be skipped when choosing IOPORT based C-States to +avoid polluting the C-state residency information. + +Link: https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=972c16130d9dc182cedcdd408408d9eacc7d6a2d [1] +Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b488f02156d3deb08f5ad7816d565c370a8cc6f1 [2] + +Suggested-by: Calvin Ong +Cc: stable@vger.kernel.org +Cc: regressions@lists.linux.dev +Signed-off-by: K Prateek Nayak +--- + drivers/acpi/processor_idle.c | 7 +++++-- + 1 file changed, 5 insertions(+), 2 deletions(-) + +diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c +index 16a1663d02d4..18850aa2b79b 100644 +--- a/drivers/acpi/processor_idle.c ++++ b/drivers/acpi/processor_idle.c +@@ -528,8 +528,11 @@ static int acpi_idle_bm_check(void) + static void wait_for_freeze(void) + { + #ifdef CONFIG_X86 +- /* No delay is needed if we are in guest */ +- if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) ++ /* ++ * No delay is needed if we are in guest or on a processor ++ * based on the Zen microarchitecture. ++ */ ++ if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN)) + return; + #endif + /* Dummy wait op - must do something useless after P_LVL2 read +-- +2.25.1