public inbox for gentoo-dev@lists.gentoo.org
 help / color / mirror / Atom feed
* [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a
@ 2023-11-17  3:21 Yiyang Wu
  2023-11-17  3:21 ` [gentoo-dev] [PATCH 2/2] profiles/desc/amdgpu_targets.desc: Fill in more GPU models Yiyang Wu
  2023-11-27  0:08 ` [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Benda Xu
  0 siblings, 2 replies; 4+ messages in thread
From: Yiyang Wu @ 2023-11-17  3:21 UTC (permalink / raw)
  To: gentoo-dev; +Cc: Yiyang Wu

MI210 is xnack disabled by default. Compile with :xnack+ will produce
GPU kernels that cannot be run on MI210.

Two targets, gfx90a_xnack and gfx90a_noxnack are introduced to replace
gfx90a, for the following reason:

Upstream usually ships 2 version: gfx90a:xnack-, gfx90a:xnack+. Although
a single gfx90a should have maximum compatibility, According to [1,2],
compile with xnack+/xnack- may have better performance on xnack
enabled/disabled GPUs. Therefore we ship both the target, align with
upstream. gfx900 is also appended with :xnack- to align with upstream
default.

However, some ROCm libraries does not need xnack feature flag, such as
sci-libs/miopen and sci-libs/rocFFT, so "--no-xnack-flag" option is
introduced to stop appending xnack feature. Demonstrate this option in
the second example ebuild.

Update description file profiles/desc/amdgpu_targets.desc accordingly.

Github PR: https://github.com/gentoo/gentoo/pull/33861

[1] https://llvm.org/docs/AMDGPUUsage.html#target-features
[2] https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes

Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
---
 eclass/rocm.eclass                | 31 ++++++++++++++++++++++---------
 profiles/desc/amdgpu_targets.desc |  5 ++++-
 2 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass
index b78dfea1cc31..3209c003e323 100644
--- a/eclass/rocm.eclass
+++ b/eclass/rocm.eclass
@@ -60,7 +60,8 @@
 #
 # Examples for packages depend on ROCm libraries -- a package which depends on
 # rocBLAS, uses comma separated ${HCC_AMDGPU_TARGET} to determine GPU
-# architectures, and requires ROCm version >=5.1
+# architectures which does not accept :xnack feature flag, and requires ROCm
+# version >=5.1
 # @CODE
 # ROCM_VERSION=5.1
 # inherit rocm
@@ -71,7 +72,7 @@
 #
 # src_configure() {
 #     if use rocm; then
-#         local amdgpu_flags=$(get_amdgpu_flags)
+#         local amdgpu_flags=$(get_amdgpu_flags --no-xnack-flag)
 #         export HCC_AMDGPU_TARGET=${amdgpu_flags//;/,}
 #     fi
 #     default
@@ -143,7 +144,7 @@ _rocm_set_globals() {
 				gfx803 gfx900 gfx1010 gfx1011 gfx1012 gfx1031
 			)
 			official_amdgpu_targets=(
-				gfx906 gfx908 gfx90a gfx1030
+				gfx906 gfx908 gfx90a_xnack gfx90a_noxnack gfx1030
 			)
 			;;
 		5.*|9999)
@@ -152,7 +153,7 @@ _rocm_set_globals() {
 				gfx1031 gfx1100 gfx1101 gfx1102
 			)
 			official_amdgpu_targets=(
-				gfx906 gfx908 gfx90a gfx1030
+				gfx906 gfx908 gfx90a_xnack gfx90a_noxnack gfx1030
 			)
 			;;
 		*)
@@ -181,26 +182,38 @@ unset -f _rocm_set_globals
 
 
 # @FUNCTION: get_amdgpu_flags
-# @USAGE: get_amdgpu_flags
+# @USAGE: get_amdgpu_flags [--no-xnack-flag]
 # @DESCRIPTION:
 # Convert specified use flag of amdgpu_targets to compilation flags.
-# Append default target feature to GPU arch. See
+# Append default target feature xnack to GPU arch. See
 # https://llvm.org/docs/AMDGPUUsage.html#target-features
+# If specified with --no-xnack-flag, do not append xnack feature flag.
 get_amdgpu_flags() {
 	local amdgpu_target_flags
 	for gpu_target in ${AMDGPU_TARGETS}; do
 	local target_feature=
 		case ${gpu_target} in
-			gfx906|gfx908)
+			gfx900|gfx906|gfx908)
+				# These GPUs ususally does not enable xnack, so
+				# disabling xnack generates faster GPU kernels.
 				target_feature=:xnack-
 				;;
-			gfx90a)
+			gfx90a_noxnack)
+				gpu_target=gfx90a
+				target_feature=:xnack-
+				;;
+			gfx90a_xnack)
+				gpu_target=gfx90a
 				target_feature=:xnack+
 				;;
 			*)
 				;;
 		esac
-		amdgpu_target_flags+="${gpu_target}${target_feature};"
+		if [[ "$1" == "--no-xnack-flag" ]]; then
+			amdgpu_target_flags+="${gpu_target};"
+		else
+			amdgpu_target_flags+="${gpu_target}${target_feature};"
+		fi
 	done
 	echo "${amdgpu_target_flags}"
 }
diff --git a/profiles/desc/amdgpu_targets.desc b/profiles/desc/amdgpu_targets.desc
index 9c5739e9d9a4..77ffa43399f1 100644
--- a/profiles/desc/amdgpu_targets.desc
+++ b/profiles/desc/amdgpu_targets.desc
@@ -4,12 +4,15 @@
 # Reference:
 # GPU name and Architecture codename: https://github.com/GPUOpen-Tools/device_info/blob/master/DeviceInfo.cpp
 # See also: https://www.coelacanth-dream.com/posts/2019/12/30/did-rid-product-matome-p2/#fn:67
+# xnack feature reference: https://llvm.org/docs/AMDGPUUsage.html#target-features
+# See also: https://wiki.gentoo.org/wiki/ROCm#XNACK_target_feature
 
 gfx803 - Fiji GPU, codename fiji, including Radeon R9 Nano/Fury/FuryX, Radeon Pro Duo, FirePro S9300x2, Radeon Instinct MI8
 gfx900 - Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega 64 Liquid, Radeon Pro Vega 48/56/64/64X, Radeon Pro WX 8200/9100, Radeon Pro V320/V340/SSG, Radeon Instinct MI25
 gfx906 - Vega GPU, codename vega20, including Radeon (Pro) VII, Radeon Instinct MI50/MI60
 gfx908 - CDNA Accelerator, codename arcturus, including AMD Instinct MI100 Accelerator
-gfx90a - CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators
+gfx90a_xnack - CDNA2: same as gfx90a_noxnack, with xnack enabled. Compatible with xnack enabled GPU state.
+gfx90a_noxnack - CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators, compatible with xnack disabled GPU state.
 gfx1010 - RDNA GPU, codename navi10, including Radeon RX 5700XT/5700/5700M/5700B/5700XTB/5600XT/5600/5600M, Radeon Pro 5700XT/5700, Radeon Pro W5700X/W5700
 gfx1011 - RDNA GPU, codename navi12, including Radeon Pro 5600M/V520
 gfx1012 - RDNA GPU, codename navi14, including Radeon RX 5500XT/5500/5500M/5500XTB/5300/5300M, Radeon Pro 5500XT/5500M/5300/5300M, Radeon Pro W5500X/W5500/W5500M/W5300M
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [gentoo-dev] [PATCH 2/2] profiles/desc/amdgpu_targets.desc: Fill in more GPU models
  2023-11-17  3:21 [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Yiyang Wu
@ 2023-11-17  3:21 ` Yiyang Wu
  2023-11-27  0:08 ` [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Benda Xu
  1 sibling, 0 replies; 4+ messages in thread
From: Yiyang Wu @ 2023-11-17  3:21 UTC (permalink / raw)
  To: gentoo-dev; +Cc: Yiyang Wu

Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
---
 profiles/desc/amdgpu_targets.desc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/profiles/desc/amdgpu_targets.desc b/profiles/desc/amdgpu_targets.desc
index 77ffa43399f1..46513023c3a9 100644
--- a/profiles/desc/amdgpu_targets.desc
+++ b/profiles/desc/amdgpu_targets.desc
@@ -18,6 +18,6 @@ gfx1011 - RDNA GPU, codename navi12, including Radeon Pro 5600M/V520
 gfx1012 - RDNA GPU, codename navi14, including Radeon RX 5500XT/5500/5500M/5500XTB/5300/5300M, Radeon Pro 5500XT/5500M/5300/5300M, Radeon Pro W5500X/W5500/W5500M/W5300M
 gfx1030 - RDNA2 GPU, codename navi21/sienna cichlid, including Radeon RX 6950XT/6900XT/6800XT/6800, Radeon Pro W6800
 gfx1031 - RDNA2 GPU, codename navi22/navy flounder, including Radeon RX 6750XT/6700XT/6800M/6700M
-gfx1100 - RDNA3 GPU, codename navi31/plum bonito, including Radeon RX 7900XTX/7900XT
-gfx1101 - RDNA3 GPU, codename navi32
-gfx1102 - RDNA3 GPU, codename navi33
+gfx1100 - RDNA3 GPU, codename navi31/plum bonito, including Radeon RX 7900XTX/7900XT, AMD Radeon Pro W7900/W7800
+gfx1101 - RDNA3 GPU, codename navi32, including Radeon RX 7700XT/7800XT
+gfx1102 - RDNA3 GPU, codename navi33, including Radeon RX 7600/7600M/7600M XT/7700S/7600S, AMD Radeon PRO W7600/W7500
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a
  2023-11-17  3:21 [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Yiyang Wu
  2023-11-17  3:21 ` [gentoo-dev] [PATCH 2/2] profiles/desc/amdgpu_targets.desc: Fill in more GPU models Yiyang Wu
@ 2023-11-27  0:08 ` Benda Xu
  2023-12-06  7:44   ` WuYiyang
  1 sibling, 1 reply; 4+ messages in thread
From: Benda Xu @ 2023-11-27  0:08 UTC (permalink / raw)
  To: Yiyang Wu; +Cc: gentoo-dev

Hi Yiyang,

Yiyang Wu <xgreenlandforwyy@gmail.com> writes:

> Upstream usually ships 2 version: gfx90a:xnack-, gfx90a:xnack+. Although
> a single gfx90a should have maximum compatibility, According to [1,2],
> compile with xnack+/xnack- may have better performance on xnack
> enabled/disabled GPUs. Therefore we ship both the target, align with
> upstream. gfx900 is also appended with :xnack- to align with upstream
> default.

> [1] https://llvm.org/docs/AMDGPUUsage.html#target-features
> [2] https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes

So, you want to add a new ABI to gfx90a for experimental xnack feature.
I suggest make it gfx908a with gfx908a_xnack, instead of
"gfx908a_noxnack" for consistency the existing naming scheme.

With this minimal modification, the remaining cards such as gfx906 and
gfx908 that support xnack could be updated incrementally.

Benda


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a
  2023-11-27  0:08 ` [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Benda Xu
@ 2023-12-06  7:44   ` WuYiyang
  0 siblings, 0 replies; 4+ messages in thread
From: WuYiyang @ 2023-12-06  7:44 UTC (permalink / raw)
  To: Benda Xu; +Cc: gentoo-dev

On Mon, Nov 27, 2023 at 08:08:40AM +0800, Benda Xu wrote:
> Hi Yiyang,
> 
> Yiyang Wu <xgreenlandforwyy@gmail.com> writes:
> 
> > Upstream usually ships 2 version: gfx90a:xnack-, gfx90a:xnack+. Although
> > a single gfx90a should have maximum compatibility, According to [1,2],
> > compile with xnack+/xnack- may have better performance on xnack
> > enabled/disabled GPUs. Therefore we ship both the target, align with
> > upstream. gfx900 is also appended with :xnack- to align with upstream
> > default.
> 
> > [1] https://llvm.org/docs/AMDGPUUsage.html#target-features
> > [2] https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes
> 
> So, you want to add a new ABI to gfx90a for experimental xnack feature.
> I suggest make it gfx90a with gfx90a_xnack, instead of
> "gfx90a_noxnack" for consistency the existing naming scheme.

Previously the gfx90a is appended with :xnack+ feature:

```bash
case ${gpu_target} in
	gfx906|gfx908)
		target_feature=:xnack-
		;;
	gfx90a)
		target_feature=:xnack+
		;;
	*)
		;;
esac
```

That means actually we are shipping gfx90a_xnack originally, without shipping
the xnack disabled build.

So maybe you mean adding a new ABI gfx90a_noxnack, while keeping gfx90a_xnack
as its original name gfx90a?

> 
> With this minimal modification, the remaining cards such as gfx906 and
> gfx908 that support xnack could be updated incrementally.

Yes, although ROCm upstream does not ship the xnack supported build of math
libraries, for unknown reasons. There's no public documentation about this,
while some discussions exists [1]

[1] https://github.com/RadeonOpenCompute/ROCm/issues/2358

Best,
Yiyang Wu


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-12-06  7:44 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-17  3:21 [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Yiyang Wu
2023-11-17  3:21 ` [gentoo-dev] [PATCH 2/2] profiles/desc/amdgpu_targets.desc: Fill in more GPU models Yiyang Wu
2023-11-27  0:08 ` [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Benda Xu
2023-12-06  7:44   ` WuYiyang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox