Date: Mon, 27 Jun 2022 23:12:14 +0800
From: wuyy
To: gentoo-soc
Subject: [gentoo-soc] Week 2 Report for Refining ROCm Packages in Gentoo

Hello all,

The second week of refining ROCm ebuilds was quite busy. I set up Docker to perform clean builds, which uncovered two hidden bugs in hip, and there is also progress on completing rocm-5.1.3 against vanilla llvm/clang.

After learning a lesson from bug #853184, I realized that a clean environment for building and testing is essential for finding hidden bugs, especially missing dependencies. I found the cause of #853184 and fixed it in [1].
With the help of clean builds, I found bug #853718 and fixed it in [2]. I also reproduced bug #843263 and provided a fix in [3], and I fixed an old bug, #853184, with [12]. Another fix is [4], for a serious issue of incorrect manifests (bugs #851792 and #851795). Andrew Ammerlaan also pointed out the QA issue of calling python3 directly to execute scripts instead of using EPYTHON. I will address that in week 3.

Next, the progress on rocm-5.1.3 against vanilla llvm/clang. The major achievements are:

1. Michał Górny explained the policy for packaging llvm/clang to me, so the heavy-handed patch in [6] is not suitable. I studied the patch and found it unnecessary, as long as we add `--rocm-path=/usr` and `--hip-device-lib-path=/usr/lib/amdgcn/bitcode` when calling clang to compile hip sources. So I patched hipcc.pl in dev-util/hip and comgr-compiler.cpp in dev-libs/rocm-comgr to add `--rocm-path=/usr` explicitly. Note that the need for the rocm-comgr patch is not obvious, because a test called "compile_hip_test_in_process" is not enabled, and so cannot fail, unless dev-util/hip is merged (hip depends on rocm-comgr, but rocm-comgr does not depend on hip). I guess that is why Debian and Fedora have not encountered this issue. I suppose they are also packaging hip and will meet similar problems, so it would be really helpful if the ROCm teams of the major distributions could discuss and share information on packaging hip.

2. I packaged dev-util/hip-5.1.3; it is currently in [7]. It works, although I am not satisfied with the tens of sed commands and ten patches it needs: upstream hip is currently not distribution-friendly. I fixed the cmake issue mentioned in the week 1 report, also mentioned in [8]. I also hit a bug when trying to turn on USE=profile, and the solution is to backport two patches (see details in [9]), which means that this release of hip cannot be built as-is, because some important fixes are not included.
Add the hard-coded clang-runtime include paths and the abuse of `-isystem`, and I really find hip the most challenging of the ROCm packages.

3. Blender still works after removing the clang patch mentioned in 1.; details can be found in [10]. I also tried backporting a patch to enable HIP for Cycles (a render engine for Blender) on Radeon VII, but it failed with a GPU memory access error, which indicates that hip needs further tuning [11].

4. The version 5.1.3 ebuilds are in good shape [7], including the low-level runtimes {roct-thunk-interface, rocr-runtime, rocminfo} and the toolchain {rocm-device-libs, rocm-comgr, rocm-cmake, hip}, and are waiting for a PR. The commits are squashed, but you can see my original history of battling hip in the unrebased tree [16]. rocBLAS is also bumped to 5.1.3 and is running tests, but I decided to rewrite it later to make use of rocm.eclass.

5. rocm-comgr upstream noticed my bug report [17].

So hip-5.1.3 now seems to be ready, and my test system shows no bugs. I'll open a PR for my rocm-5.1.3 branch [7] right after [3] gets merged.

Next week I will land hip-5.1.3 in ::gentoo and prepare a draft of rocm.eclass. There will also be bug fixes, concentrating on rocBLAS not respecting MAKEOPTS (#852236), the rocprofiler QA issue [5], and the rocFFT build issue with hip-5.1.3 [13]. In the longer term, I'll also investigate the embedded header in libhipamd64.so and libhiprtc-builtins.so, which blocks CuPy, and how well vanilla libomp supports ROCm OpenMP offloading compared to aomp (llvm-roc), which is relevant to rocSPARSE [14].

Summary: I fixed the existing bugs in ::gentoo, so the blockers are gone [15]. I finished dev-util/hip-5.1.3 and its 5.1.3 dependencies. Too many hacks are applied to hip, so it would be helpful to share information with developers from other distributions and to report those issues, or open PRs, upstream.
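
For readers who want to see what the two flags from point 1 look like on a command line, here is a minimal sketch. It only prints a clang invocation; it does not run the compiler. The helper name hip_clang_cmd and the file names are made up for illustration and are not part of hipcc.pl or of the actual patches. The two paths are the ones quoted in this report, and `-x hip` is clang's standard way of selecting HIP as the input language.

```shell
#!/bin/sh
# Sketch only: print (do not run) a clang command line for a HIP source,
# using the two flags quoted in point 1 of this report.
# hip_clang_cmd and the file names are hypothetical, for illustration.

rocm_path=/usr                            # value quoted in the report
device_lib_path=/usr/lib/amdgcn/bitcode   # value quoted in the report

hip_clang_cmd() {
    # $1: HIP source file, $2: output binary
    printf '%s\n' "clang++ -x hip --rocm-path=${rocm_path} --hip-device-lib-path=${device_lib_path} $1 -o $2"
}

hip_clang_cmd saxpy.hip saxpy
```

On a system with a clang that has the AMDGPU backend and rocm-device-libs installed, the printed line could be pasted into a shell to try the build by hand. Whether more flags are needed depends on the GPU and the setup.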

[1] https://github.com/gentoo/gentoo/pull/26018
[2] https://gitweb.gentoo.org/repo/gentoo.git/commit/93ff73188c29fe12088f6166df669847cde9b2b4
[3] https://github.com/gentoo/gentoo/pull/26090
[4] https://github.com/gentoo/gentoo/pull/25891
[5] https://github.com/gentoo/gentoo/pull/25891#issuecomment-1163481516
[6] https://github.com/gentoo/gentoo/pull/25999
[7] https://github.com/littlewu2508/gentoo/tree/rocm-5.1.3-submit
[8] https://bugs.gentoo.org/693200#c23
[9] https://github.com/ROCm-Developer-Tools/hipamd/issues/18#issuecomment-1167198811
[10] https://bugs.gentoo.org/693200#c24
[11] https://developer.blender.org/D15242
[12] https://github.com/gentoo/gentoo/pull/26039
[13] https://bugs.gentoo.org/693200#c25
[14] https://github.com/gentoo/gentoo/pull/25318
[15] https://github.com/justxi/rocm/issues/8#issuecomment-1166165426
[16] https://github.com/littlewu2508/gentoo/tree/rocm-5.1.3
[17] https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45