public inbox for gentoo-dev@lists.gentoo.org
* [gentoo-dev] [PATCH 0/2] rocm.eclass: drop xnack feature flag and support ROCm 6
@ 2024-04-14 16:21 Yiyang Wu
  2024-04-14 16:21 ` [gentoo-dev] [PATCH 1/2] rocm.eclass: remove xnack flag for broader compatibility Yiyang Wu
  2024-04-14 16:21 ` [gentoo-dev] [PATCH 2/2] rocm.eclass: Enable ROCm 6, add gfx94{0,1,2} (MI300) support Yiyang Wu
  0 siblings, 2 replies; 3+ messages in thread
From: Yiyang Wu @ 2024-04-14 16:21 UTC (permalink / raw)
  To: gentoo-dev; +Cc: Yiyang Wu

The first patch simplifies rocm.eclass by dropping the xnack feature
flag from get_amdgpu_flags. This makes the compiled GPU binaries more
broadly compatible. The details are explained in the commit message.

The second commit adds support for ROCm 6 and MI300.

Github pull request: https://github.com/gentoo/gentoo/pull/36254

Yiyang Wu (2):
  rocm.eclass: remove xnack flag for broader compatibility
  Enable ROCm 6, add gfx94{0,1,2} (MI300) support

 eclass/rocm.eclass                | 31 +++++++++++++------------------
 profiles/desc/amdgpu_targets.desc | 14 ++++++++++----
 2 files changed, 23 insertions(+), 22 deletions(-)

-- 
2.41.0




* [gentoo-dev] [PATCH 1/2] rocm.eclass: remove xnack flag for broader compatibility
  2024-04-14 16:21 [gentoo-dev] [PATCH 0/2] rocm.eclass: drop xnack feature flag and support ROCm 6 Yiyang Wu
@ 2024-04-14 16:21 ` Yiyang Wu
  2024-04-14 16:21 ` [gentoo-dev] [PATCH 2/2] rocm.eclass: Enable ROCm 6, add gfx94{0,1,2} (MI300) support Yiyang Wu
  1 sibling, 0 replies; 3+ messages in thread
From: Yiyang Wu @ 2024-04-14 16:21 UTC (permalink / raw)
  To: gentoo-dev; +Cc: Yiyang Wu

Initially, rocm.eclass appended the xnack [1,2] feature flag to gfx9
GPUs, since ROCm upstream does this in many of its math libraries, e.g.
rocBLAS [3]. That list includes gfx90a:xnack+, indicating that xnack is
usable on the MI200 series, so rocm.eclass appended :xnack+ to gfx90a.

But it turns out that xnack- is also common on the MI200 series, and
restricting to xnack+ produces GPU kernels that are incompatible with
xnack- mode.

In addition, the community is exploring the use of xnack on other gfx9
GPUs [4,5], which rocm.eclass previously restricted to xnack-.

Without the xnack feature flag, GPU kernels are compiled in "xnack any"
mode, which can run in either mode. This potentially sacrifices some
performance [6,7], although there is no direct evidence of that; rocFFT
reports no performance penalty [8].

For the reasons above, do not append the xnack feature flag to
AMDGPU_TARGETS, so that the compiled kernels are compatible with GPUs
operating in either xnack mode.
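
As a rough illustration of the effect (a sketch only, not the eclass
itself: the function names old_get_amdgpu_flags and new_get_amdgpu_flags
and the example target list are hypothetical), the target string changes
roughly as follows:

```shell
# Hypothetical stand-ins for get_amdgpu_flags before and after this patch.

# Old behaviour: pin an xnack mode for selected gfx9 targets.
old_get_amdgpu_flags() {
	local gpu_target target_feature amdgpu_target_flags=
	for gpu_target in ${AMDGPU_TARGETS}; do
		target_feature=
		case ${gpu_target} in
			gfx906|gfx908) target_feature=:xnack- ;;
			gfx90a) target_feature=:xnack+ ;;
		esac
		amdgpu_target_flags="${amdgpu_target_flags}${gpu_target}${target_feature};"
	done
	echo "${amdgpu_target_flags}"
}

# New behaviour: emit bare targets and leave the xnack mode unspecified.
new_get_amdgpu_flags() {
	# Unquoted on purpose: word splitting yields one "%s;" per target.
	printf "%s;" ${AMDGPU_TARGETS}
	echo
}

# Arbitrary example selection, assumed for illustration.
AMDGPU_TARGETS="gfx906 gfx90a gfx1030"

old_get_amdgpu_flags   # gfx906:xnack-;gfx90a:xnack+;gfx1030;
new_get_amdgpu_flags   # gfx906;gfx90a;gfx1030;
```

With the bare form, the compiler is not told which xnack mode to
assume, which corresponds to the "xnack any" behaviour described above.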

[1] https://wiki.gentoo.org/wiki/ROCm#XNACK_target_feature
[2] https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html#xnack
[3] https://github.com/ROCm/rocBLAS/blob/release/rocm-rel-5.0/CMakeLists.txt#L201
[4] https://niconiconi.neocities.org/tech-notes/xnack-on-amd-gpus/
[5] https://arxiv.org/abs/2401.02680
[6] https://llvm.org/docs/AMDGPUUsage.html#target-features
[7] https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes
[8] https://github.com/ROCm/rocFFT/commit/cd2689360ba3b3579d044d8925838ff307b4b4cf

Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
---
 eclass/rocm.eclass | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass
index 9804ecde97d0..e03e8bdd507a 100644
--- a/eclass/rocm.eclass
+++ b/eclass/rocm.eclass
@@ -1,4 +1,4 @@
-# Copyright 2022-2023 Gentoo Authors
+# Copyright 2022-2024 Gentoo Authors
 # Distributed under the terms of the GNU General Public License v2
 
 # @ECLASS: rocm.eclass
@@ -201,22 +201,7 @@ unset -f _rocm_set_globals
 # Append default target feature to GPU arch. See
 # https://llvm.org/docs/AMDGPUUsage.html#target-features
 get_amdgpu_flags() {
-	local amdgpu_target_flags
-	for gpu_target in ${AMDGPU_TARGETS}; do
-	local target_feature=
-		case ${gpu_target} in
-			gfx906|gfx908)
-				target_feature=:xnack-
-				;;
-			gfx90a)
-				target_feature=:xnack+
-				;;
-			*)
-				;;
-		esac
-		amdgpu_target_flags+="${gpu_target}${target_feature};"
-	done
-	echo "${amdgpu_target_flags}"
+	echo $(printf "%s;" ${AMDGPU_TARGETS[@]})
 }
 
 # @FUNCTION: check_amdgpu
-- 
2.41.0




* [gentoo-dev] [PATCH 2/2] rocm.eclass: Enable ROCm 6, add gfx94{0,1,2} (MI300) support
  2024-04-14 16:21 [gentoo-dev] [PATCH 0/2] rocm.eclass: drop xnack feature flag and support ROCm 6 Yiyang Wu
  2024-04-14 16:21 ` [gentoo-dev] [PATCH 1/2] rocm.eclass: remove xnack flag for broader compatibility Yiyang Wu
@ 2024-04-14 16:21 ` Yiyang Wu
  1 sibling, 0 replies; 3+ messages in thread
From: Yiyang Wu @ 2024-04-14 16:21 UTC (permalink / raw)
  To: gentoo-dev; +Cc: Yiyang Wu

Also update the references, since the original reference does not
contain MI300. The "see also" blog is also removed because it hasn't been
updated for 2 years.

Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
---
 eclass/rocm.eclass                | 12 +++++++++++-
 profiles/desc/amdgpu_targets.desc | 14 ++++++++++----
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass
index e03e8bdd507a..7039455dec6b 100644
--- a/eclass/rocm.eclass
+++ b/eclass/rocm.eclass
@@ -161,7 +161,7 @@ _rocm_set_globals() {
 				gfx906 gfx908 gfx90a gfx1030
 			)
 			;;
-		5.*|9999)
+		5.*)
 			unofficial_amdgpu_targets=(
 				gfx803 gfx900 gfx1010 gfx1011 gfx1012
 				gfx1031 gfx1100 gfx1101 gfx1102
@@ -170,6 +170,16 @@ _rocm_set_globals() {
 				gfx906 gfx908 gfx90a gfx1030
 			)
 			;;
+		6.*|9999)
+			unofficial_amdgpu_targets=(
+				gfx803 gfx900 gfx940 gfx941
+				gfx1010 gfx1011 gfx1012
+				gfx1031 gfx1101 gfx1102
+			)
+			official_amdgpu_targets=(
+				gfx906 gfx908 gfx90a gfx942 gfx1030 gfx1100
+			)
+			;;
 		*)
 			die "Unknown ROCm major version! Please update rocm.eclass before bumping to new ebuilds"
 			;;
diff --git a/profiles/desc/amdgpu_targets.desc b/profiles/desc/amdgpu_targets.desc
index d52080781947..8f337b03f63d 100644
--- a/profiles/desc/amdgpu_targets.desc
+++ b/profiles/desc/amdgpu_targets.desc
@@ -1,15 +1,21 @@
-# Copyright 1999-2023 Gentoo Authors.
+# Copyright 1999-2024 Gentoo Authors.
 # Distributed under the terms of the GNU General Public License v2
 
-# Reference:
-# GPU name and Architecture codename: https://github.com/GPUOpen-Tools/device_info/blob/master/DeviceInfo.cpp
-# See also: https://www.coelacanth-dream.com/posts/2019/12/30/did-rid-product-matome-p2/#fn:67
+# Reference: GPU name and architecture codename documented by
+# GPUOpen-Tools https://github.com/GPUOpen-Tools/device_info/blob/master/DeviceInfo.cpp
+# ROCm official document (Instinct accelerator only) https://rocm.docs.amd.com/en/latest/reference/gpu-arch/gpu-arch-spec-overview.html
+# Kernel document (note: GC version is not amdgpu gfx target) https://www.kernel.org/doc/html/latest/gpu/amdgpu/driver-misc.html#discrete-gpu-info
+# Kernel source code (map of IP version vs amdgpu gfx target) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/amd/amdkfd/kfd_device.c kgd2kfd_probe function
+# Mesa drm source code https://gitlab.freedesktop.org/mesa/drm/blob/main/data/amdgpu.ids
 
 gfx803 - Fiji GPU, codename fiji, including Radeon R9 Nano/Fury/FuryX, Radeon Pro Duo, FirePro S9300x2, Radeon Instinct MI8
 gfx900 - Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega 64 Liquid, Radeon Pro Vega 48/56/64/64X, Radeon Pro WX 8200/9100, Radeon Pro V320/V340/SSG, Radeon Instinct MI25
 gfx906 - Vega GPU, codename vega20, including Radeon (Pro) VII, Radeon Instinct MI50/MI60
 gfx908 - CDNA Accelerator, codename arcturus, including AMD Instinct MI100 Accelerator
 gfx90a - CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators
+gfx940 - CDNA3 Accelerator, codename aqua_vangaram, MI300A rev 0
+gfx941 - CDNA3 Accelerator, codename aqua_vangaram, MI300X rev 0
+gfx942 - CDNA3 Accelerator, codename aqua_vangaram, MI300A and MI300X rev >=1
 gfx1010 - RDNA GPU, codename navi10, including Radeon RX 5700XT/5700/5700M/5700B/5700XTB/5600XT/5600/5600M, Radeon Pro 5700XT/5700, Radeon Pro W5700X/W5700
 gfx1011 - RDNA GPU, codename navi12, including Radeon Pro 5600M/V520
 gfx1012 - RDNA GPU, codename navi14, including Radeon RX 5500XT/5500/5500M/5500XTB/5300/5300M, Radeon Pro 5500XT/5500M/5300/5300M, Radeon Pro W5500X/W5500/W5500M/W5300M
-- 
2.41.0


