From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by finch.gentoo.org (Postfix) with ESMTPS id 26EE4158089 for ; Fri, 17 Nov 2023 03:23:00 +0000 (UTC) Received: from pigeon.gentoo.org (localhost [127.0.0.1]) by pigeon.gentoo.org (Postfix) with SMTP id 1DFD82BC022; Fri, 17 Nov 2023 03:22:55 +0000 (UTC) Received: from mail-pf1-x42e.google.com (mail-pf1-x42e.google.com [IPv6:2607:f8b0:4864:20::42e]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by pigeon.gentoo.org (Postfix) with ESMTPS id E2FD92BC018 for ; Fri, 17 Nov 2023 03:22:54 +0000 (UTC) Received: by mail-pf1-x42e.google.com with SMTP id d2e1a72fcca58-6bbfb8f7ac4so416312b3a.0 for ; Thu, 16 Nov 2023 19:22:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1700191374; x=1700796174; darn=lists.gentoo.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=mtooaNj4mbY+GqUSSaBFoFVIfJsr0h/HEBZm8F47zEI=; b=OxkNu7TUR4YfgFceANLTHFK02tPqfKlSxFbnCuS5ypb1sL2GpktbPVjIk2KhdToZlR cg6o7MDO6+uQf1/Xh5NBAxNqmuM7Ir67fx3szwO+Oshju9mxQdXrXXnunXeMBT+zjkDg eE5usYoAYNSsYAvPFHpbkkIkv6OHZUVfUg+WLgMO+XTAOVS5Qla7osgAtucs5+rkxDsR 3tdzmonW/5VsXPT+O3F+EJJbKG3Y7m8AWjS2L2pV0czk9zqNFBqoCh1QEniEpml7v1aX 2R7bAnC/twcBzfZ6fJRZb2+TyloginlHNVZS1iMNAHvF5gNaM/K73VaLganLbamP/g8K RS8A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700191374; x=1700796174; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=mtooaNj4mbY+GqUSSaBFoFVIfJsr0h/HEBZm8F47zEI=; b=rIUyc51Tydp0F7oXt6QB7P5Kcx7Z4NmE+6u7bErcxhIKg29BFcrjp5u8eZZl8EGtPN NjYIVEtc+XPMeiRdd8rekdWguo+IZoqttp/izhOv6WnfKbtgcvxNCE4dUUxEoM1QE+/G yWUrsItOpYcEJMzJg/k+07JEMrcELntPQ1aY68xWjTSyuUtV0zSLLhYckcWbnHUSRzEL AftuYZcwM1Emt2KWdWmvQzPEaxqxO3VYJzhdd2f3sIpZl85xas2ItLXEJRrSeCz7bzQY t+mI53JVlvmqPL5T9h27RpW9hYh8+bqk7fRhMXz41sVDl8GgZm+XNCaHDY7Fgv6qx5we ozVg== X-Gm-Message-State: AOJu0Yw0t6FmXVj3PRjmyhG688Wih/3SSqRbP8maOKe0Pls3DcOu5S/D 6gwZQoxdergybKrvZ8wDZcA5yFTZehg= X-Google-Smtp-Source: AGHT+IHe46D/dvnwxXz2t9YrfJIgScN92/BIk/1pg8v4aNo/D6GpoAfKe6b7xbsqr22vuzHyRGyzRg== X-Received: by 2002:a05:6a00:6c89:b0:68a:6cbe:35a7 with SMTP id jc9-20020a056a006c8900b0068a6cbe35a7mr8598527pfb.2.1700191373531; Thu, 16 Nov 2023 19:22:53 -0800 (PST) Received: from localhost (49.212.183.201.v6.sakura.ne.jp. [2403:3a00:202:1120:49:212:183:201]) by smtp.gmail.com with ESMTPSA id u12-20020a056a00098c00b006c0316485f9sm462979pfg.64.2023.11.16.19.22.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Nov 2023 19:22:53 -0800 (PST) From: Yiyang Wu To: gentoo-dev@lists.gentoo.org Cc: Yiyang Wu Subject: [gentoo-dev] [PATCH 1/2] rocm.eclass: Fix the xnack feature for gfx90a Date: Fri, 17 Nov 2023 11:21:17 +0800 Message-ID: <20231117032201.5213-1-xgreenlandforwyy@gmail.com> X-Mailer: git-send-email 2.41.0 Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-dev@lists.gentoo.org Reply-to: gentoo-dev@lists.gentoo.org X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Archives-Salt: f373902a-3869-4db7-b0c7-e1590347ef28 X-Archives-Hash: 58ff571169d2faf0f6821bfbfca31ac9 MI210 is xnack disabled by default. Compile with :xnack+ will produce GPU kernels that cannot be run on MI210. Two targets, gfx90a_xnack and gfx90a_noxnack are introduced to replace gfx90a, for the following reason: Upstream usually ships 2 version: gfx90a:xnack-, gfx90a:xnack+. Although a single gfx90a should have maximum compatibility, According to [1,2], compile with xnack+/xnack- may have better performance on xnack enabled/disabled GPUs. Therefore we ship both the target, align with upstream. gfx900 is also appended with :xnack- to align with upstream default. However, some ROCm libraries does not need xnack feature flag, such as sci-libs/miopen and sci-libs/rocFFT, so "--no-xnack-flag" option is introduced to stop appending xnack feature. Demonstrate this option in the second example ebuild. Update description file profiles/desc/amdgpu_targets.desc accordingly. Github PR: https://github.com/gentoo/gentoo/pull/33861 [1] https://llvm.org/docs/AMDGPUUsage.html#target-features [2] https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes Signed-off-by: Yiyang Wu --- eclass/rocm.eclass | 31 ++++++++++++++++++++++--------- profiles/desc/amdgpu_targets.desc | 5 ++++- 2 files changed, 26 insertions(+), 10 deletions(-) diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass index b78dfea1cc31..3209c003e323 100644 --- a/eclass/rocm.eclass +++ b/eclass/rocm.eclass @@ -60,7 +60,8 @@ # # Examples for packages depend on ROCm libraries -- a package which depends on # rocBLAS, uses comma separated ${HCC_AMDGPU_TARGET} to determine GPU -# architectures, and requires ROCm version >=5.1 +# architectures which does not accept :xnack feature flag, and requires ROCm +# version >=5.1 # @CODE # ROCM_VERSION=5.1 # inherit rocm @@ -71,7 +72,7 @@ # # src_configure() { # if use rocm; then -# local amdgpu_flags=$(get_amdgpu_flags) +# local amdgpu_flags=$(get_amdgpu_flags --no-xnack-flag) # export HCC_AMDGPU_TARGET=${amdgpu_flags//;/,} # fi # default @@ -143,7 +144,7 @@ _rocm_set_globals() { gfx803 gfx900 gfx1010 gfx1011 gfx1012 gfx1031 ) official_amdgpu_targets=( - gfx906 gfx908 gfx90a gfx1030 + gfx906 gfx908 gfx90a_xnack gfx90a_noxnack gfx1030 ) ;; 5.*|9999) @@ -152,7 +153,7 @@ _rocm_set_globals() { gfx1031 gfx1100 gfx1101 gfx1102 ) official_amdgpu_targets=( - gfx906 gfx908 gfx90a gfx1030 + gfx906 gfx908 gfx90a_xnack gfx90a_noxnack gfx1030 ) ;; *) @@ -181,26 +182,38 @@ unset -f _rocm_set_globals # @FUNCTION: get_amdgpu_flags -# @USAGE: get_amdgpu_flags +# @USAGE: get_amdgpu_flags [--no-xnack-flag] # @DESCRIPTION: # Convert specified use flag of amdgpu_targets to compilation flags. -# Append default target feature to GPU arch. See +# Append default target feature xnack to GPU arch. See # https://llvm.org/docs/AMDGPUUsage.html#target-features +# If specified with --no-xnack-flag, do not append xnack feature flag. get_amdgpu_flags() { local amdgpu_target_flags for gpu_target in ${AMDGPU_TARGETS}; do local target_feature= case ${gpu_target} in - gfx906|gfx908) + gfx900|gfx906|gfx908) + # These GPUs ususally does not enable xnack, so + # disabling xnack generates faster GPU kernels. target_feature=:xnack- ;; - gfx90a) + gfx90a_noxnack) + gpu_target=gfx90a + target_feature=:xnack- + ;; + gfx90a_xnack) + gpu_target=gfx90a target_feature=:xnack+ ;; *) ;; esac - amdgpu_target_flags+="${gpu_target}${target_feature};" + if [[ "$1" == "--no-xnack-flag" ]]; then + amdgpu_target_flags+="${gpu_target};" + else + amdgpu_target_flags+="${gpu_target}${target_feature};" + fi done echo "${amdgpu_target_flags}" } diff --git a/profiles/desc/amdgpu_targets.desc b/profiles/desc/amdgpu_targets.desc index 9c5739e9d9a4..77ffa43399f1 100644 --- a/profiles/desc/amdgpu_targets.desc +++ b/profiles/desc/amdgpu_targets.desc @@ -4,12 +4,15 @@ # Reference: # GPU name and Architecture codename: https://github.com/GPUOpen-Tools/device_info/blob/master/DeviceInfo.cpp # See also: https://www.coelacanth-dream.com/posts/2019/12/30/did-rid-product-matome-p2/#fn:67 +# xnack feature reference: https://llvm.org/docs/AMDGPUUsage.html#target-features +# See also: https://wiki.gentoo.org/wiki/ROCm#XNACK_target_feature gfx803 - Fiji GPU, codename fiji, including Radeon R9 Nano/Fury/FuryX, Radeon Pro Duo, FirePro S9300x2, Radeon Instinct MI8 gfx900 - Vega GPU, codename vega10, including Radeon Vega Frontier Edition, Radeon RX Vega 56/64, Radeon RX Vega 64 Liquid, Radeon Pro Vega 48/56/64/64X, Radeon Pro WX 8200/9100, Radeon Pro V320/V340/SSG, Radeon Instinct MI25 gfx906 - Vega GPU, codename vega20, including Radeon (Pro) VII, Radeon Instinct MI50/MI60 gfx908 - CDNA Accelerator, codename arcturus, including AMD Instinct MI100 Accelerator -gfx90a - CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators +gfx90a_xnack - CDNA2: same as gfx90a_noxnack, with xnack enabled. Compatible with xnack enabled GPU state. +gfx90a_noxnack - CDNA2 Accelerator, codename aldebaran, including AMD Instinct MI200 series Accelerators, compatible with xnack disabled GPU state. gfx1010 - RDNA GPU, codename navi10, including Radeon RX 5700XT/5700/5700M/5700B/5700XTB/5600XT/5600/5600M, Radeon Pro 5700XT/5700, Radeon Pro W5700X/W5700 gfx1011 - RDNA GPU, codename navi12, including Radeon Pro 5600M/V520 gfx1012 - RDNA GPU, codename navi14, including Radeon RX 5500XT/5500/5500M/5500XTB/5300/5300M, Radeon Pro 5500XT/5500M/5300/5300M, Radeon Pro W5500X/W5500/W5500M/W5300M -- 2.41.0