public inbox for gentoo-soc@lists.gentoo.org
* [gentoo-soc] Week 1 Report for Refining ROCm Packages in Gentoo
@ 2022-06-20 13:41 wuyy
  2022-06-20 14:43 ` Benda Xu
  0 siblings, 1 reply; 5+ messages in thread
From: wuyy @ 2022-06-20 13:41 UTC
  To: gentoo-soc

Hello all,

This is my first week of GSoC at Gentoo, and I found it truly exciting. The first week centered on making dev-util/hip rely on vanilla clang. In https://github.com/littlewu2508/gentoo/tree/blender-rocm, I bumped rocm-device-libs, rocm-comgr, and hip to 5.1.3 with vanilla llvm/clang as the backend; after that I bumped blender to 3.2.0 and enabled its HIP cycles, and it worked on a Radeon 6700XT (see [1])! That means I made a good start on replacing llvm-roc with system llvm, which was originally the last item in my GSoC proposal. So I changed the plan a bit, moving the final week's task forward.

The story began when I heard that blender 3.2.0 was finally released with HIP cycles support on Linux, so I decided to try it out. I also searched Bugzilla and noticed a proposal to use llvm.eclass and a rocm USE flag [1].

After a quick bump of media-gfx/blender and its required dependencies, I enabled HIP cycles in the ebuild and started emerging. The build was surprisingly smooth, since the build commands simply call hipcc without many arguments, and hipcc is already in good shape. However, blender aborted when I tried to use HIP cycles at runtime -- the error suggested that more than one copy of the llvm libraries was linked in. I realized that some dependencies like mesa link vanilla llvm, while blender itself has to link llvm-roc because it has components compiled with hipcc. I reported my attempt in [1] and Sebastian Parborg confirmed the cause of the failure, so I opened another bug about llvm-roc at [2]. There I described the situation and gave two possible solutions: use vanilla clang as hip's backend, or make llvm-roc another slot of llvm/clang. This is actually the final-week task in my GSoC proposal, but at that time I didn't realize how important it is to make llvm-roc compatible with system llvm, since I had never encountered a package that uses both llvm and HIP. In the bug report I said that the second solution should be easier, so I preferred it, but in my heart I think the first one is more elegant, so I would try it first and fall back to the second if it failed. And so I started my journey of removing llvm-roc from the ROCm dependency tree.
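The "more than one llvm" symptom described above can be checked for mechanically. The following is a minimal sketch, not the procedure used in the report: it scans text that lists shared objects (for example `ldd` output, or the paths in `/proc/<pid>/maps` for libraries loaded at runtime) and reports whether more than one distinct libLLVM path appears. The function names and the sample paths are hypothetical and chosen only for illustration.

```python
import re
from typing import Set

# Absolute paths whose file name starts with "libLLVM".
_LLVM_PATH = re.compile(r"/\S*/libLLVM[^\s/]*")


def llvm_library_paths(listing: str) -> Set[str]:
    """Return the distinct libLLVM paths found in a library listing."""
    return set(_LLVM_PATH.findall(listing))


def has_llvm_conflict(listing: str) -> bool:
    """True if the listing names more than one distinct libLLVM path."""
    return len(llvm_library_paths(listing)) > 1


# Hypothetical listing: both paths are made up for this example.
SAMPLE = """\
\tlibLLVM-14.so => /usr/lib/llvm/14/lib64/libLLVM-14.so (0x00007f0000000000)
\tlibLLVM-14git.so => /opt/example-llvm-roc/lib/libLLVM-14git.so (0x00007f1000000000)
"""

if __name__ == "__main__":
    for path in sorted(llvm_library_paths(SAMPLE)):
        print(path)
    print("conflict:", has_llvm_conflict(SAMPLE))
```

The check compares paths as plain strings, so a symlink and its target would count as two libraries; resolving paths first would be needed on a real system.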

The first step was to modify rocm-device-libs. With the help of Michał Górny (who pointed out in [2] that packages should not assume llvm is built with "BUILD_SHARED_LIBS=ON" and link individual llvm components, knowledge++), I patched the source to make it rely only on llvm:14 (Fedora developers have also discussed this and would like to upstream their patches). Next came rocm-comgr, where I encountered serious problems. With help from Yuyi Wang, I worked out a patch [3] (though I do not understand why Debian and Fedora don't need it), and I plan to upstream it to the ROCm team in the future. After that only four test failures remained, but they took me a long time to debug, and I found that neither the Debian team, the Fedora team, nor I have come to a solution yet, so I opened an upstream GitHub issue at https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45. While writing the ebuild I used llvm.eclass to determine the llvm prefix and `clang -print-resource-dir` to locate CLANG_RESOURCE_DIR, which is in `/usr/lib/clang/<version>` rather than the default relative path in llvm-project -- knowledge+=2.
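The resource-directory lookup mentioned above can be reproduced outside an ebuild. This is a small sketch that assumes a `clang` executable may or may not be on `PATH`; the helper name `clang_resource_dir` is hypothetical, while `-print-resource-dir` is the clang option named in the report.

```python
import shutil
import subprocess
from typing import Optional


def clang_resource_dir(clang: str = "clang") -> Optional[str]:
    """Ask clang for its resource directory.

    Returns None when the executable is not found or the query fails.
    """
    exe = shutil.which(clang)
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-print-resource-dir"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    # Prints the directory reported by the installed clang, or None.
    print(clang_resource_dir())
```

Querying the compiler avoids hard-coding a path that differs between distributions and llvm versions.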

Then it was all about HIP. I encountered many issues with finding the correct include locations, and fixed them one by one. At last I came to a new hipvars.pm and a patch to hipcc.pl, disabling poisoning `-isystem` and correcting many paths. Now directly calling hipcc works, and blender rendered successfully using HIP cycles! I was amazed at this result.
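The report does not spell out what the hipcc.pl patch does. As an illustration only, one common way `-isystem` causes trouble is when it names a default system directory such as `/usr/include`: the directory moves ahead of the C++ standard library headers in the search order, and `#include_next` in those headers can then fail. The sketch below drops such arguments from a compiler command line. The function name and the directory set are assumptions, it handles only the separate `-isystem <dir>` form, and it is not the actual Gentoo patch.

```python
from typing import FrozenSet, Iterable, List

# Assumed set of default system include directories; adjust per system.
DEFAULT_SYSTEM_DIRS: FrozenSet[str] = frozenset({"/usr/include", "/usr/local/include"})


def drop_system_isystem(
    args: Iterable[str],
    system_dirs: FrozenSet[str] = DEFAULT_SYSTEM_DIRS,
) -> List[str]:
    """Remove `-isystem <dir>` pairs whose <dir> is a default system directory."""
    kept: List[str] = []
    it = iter(args)
    for arg in it:
        if arg != "-isystem":
            kept.append(arg)
            continue
        path = next(it, None)
        if path is None:
            # Dangling -isystem with no directory: pass it through unchanged.
            kept.append(arg)
            break
        if path.rstrip("/") in system_dirs:
            continue  # skip both the flag and its directory
        kept.extend([arg, path])
    return kept


if __name__ == "__main__":
    cmd = ["-isystem", "/usr/include", "-isystem", "/opt/example/include", "-O2"]
    print(drop_system_isystem(cmd))
```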

Then I continued testing by compiling rocBLAS-5.1.3 with this new ROCm toolchain. Sadly, some paths in its cmake files need to be corrected. I've done some fixes, but more are needed before rocBLAS can be configured. Bumping the high-level libraries with this new toolchain will be the major task of the coming week. Another job is to finalize and push the low-level runtimes and toolchain into ::gentoo via PRs, starting from https://github.com/gentoo/gentoo/pull/25785. I'll also fix existing bugs when I bump the versions of the packages in sci-libs. For https://bugs.gentoo.org/852236 I already have a solution. The bugs about not respecting CFLAGS/LDFLAGS I will investigate; I think the problem is shared with https://bugs.gentoo.org/851792. I'll check them one by one.

**So, the plan is changed as follows:**

I am currently about halfway through week 11's task. So the plan for week 11 is merged into week 1, meaning that the tasks in weeks 1-10 are postponed by one week.

Also, since I'm using ROCm-5.1.3 as the testbed for the new toolchain, I would like to make use of rocm.eclass, if possible. That means the original weeks 5-8 would be moved after week 2 (between CuPy and TensorFlow).

In conclusion, in the first week I was persuaded by [1] that [2] is an important blocker, so the task in week 11 is no longer optional but essential, and gets promoted. The good news is that I'm making good progress on this issue, and I believe I'm the first Gentoo user to package and use blender-3.2 with HIP cycles. The bad news is that I'm not finished hacking the cmake modules for HIP.


[1] https://bugs.gentoo.org/693200
[2] https://bugs.gentoo.org/851702
[3] https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/issues/45#issuecomment-1155975910
--
Best wishes,
Yiyang Wu



* Re: [gentoo-soc] Week 1 Report for Refining ROCm Packages in Gentoo
  2022-06-20 13:41 [gentoo-soc] Week 1 Report for Refining ROCm Packages in Gentoo wuyy
@ 2022-06-20 14:43 ` Benda Xu
  2022-06-21  4:20   ` wuyy
  2022-06-21  8:35   ` Benda Xu
  0 siblings, 2 replies; 5+ messages in thread
From: Benda Xu @ 2022-06-20 14:43 UTC
  To: gentoo-soc

Congratulations, Yiyang, on the exciting achievements in your first week.
Success in using upstream llvm is a major step towards a healthy ROCm
ecosystem.

wuyy <xgreenlandforwyy@gmail.com> writes:

> [...]

> At last I came to a new hipvars.pm and a patch to hipcc.pl, disabling
> poisoning `-isystem` and correcting many paths. Now directly calling
> hipcc works, and blender rendered successfully using HIP cycles!

Have you summarized all the modifications in bug reports and/or pull
requests?

> [...]

Yours,
Benda



* Re: [gentoo-soc] Week 1 Report for Refining ROCm Packages in Gentoo
  2022-06-20 14:43 ` Benda Xu
@ 2022-06-21  4:20   ` wuyy
  2022-06-21  8:35   ` Benda Xu
  1 sibling, 0 replies; 5+ messages in thread
From: wuyy @ 2022-06-21  4:20 UTC
  To: gentoo-soc

On Mon, Jun 20, 2022 at 10:43:02PM +0800, Benda Xu wrote:
> 
> Have you summarized all the modifications in bug reports and/or pull
> requests?
> 
The summary of bug reports in week 1:
1. 822828, 693200, 851702, 842405, 842405

The summary of closed pull requests during week 1:
1. https://github.com/gentoo/gentoo/pull/25861

The summary of currently open pull requests:

1. rocprofiler QA fixes: https://github.com/gentoo/gentoo/pull/25891 Status: open for review
2. dev-libs/ocl-icd prefix adoption: https://github.com/gentoo/gentoo/pull/25785 Status: fixing
3. sys-devel/clang ROCm patch: https://github.com/gentoo/gentoo/pull/25999 Status: open for review
4. dev-util/premake prefix adoption (this is related to https://github.com/GPUOpen-LibrariesAndSDKs/HIPRTSDK) https://github.com/gentoo/gentoo/pull/25825 Status: open for review
-- 



* Re: [gentoo-soc] Week 1 Report for Refining ROCm Packages in Gentoo
  2022-06-20 14:43 ` Benda Xu
  2022-06-21  4:20   ` wuyy
@ 2022-06-21  8:35   ` Benda Xu
  2022-06-21  9:14     ` wuyy
  1 sibling, 1 reply; 5+ messages in thread
From: Benda Xu @ 2022-06-21  8:35 UTC
  To: gentoo-soc

> Benda Xu <heroxbd@gentoo.org> writes:

> >> At last I came to a new hipvars.pm and a patch to hipcc.pl, disabling
> >> poisoning `-isystem` and correcting many paths. Now directly calling
> >> hipcc works, and blender rendered successfully using HIP cycles!
> >
> > Have you summarized all the modifications in bug reports and/or pull
> > requests?

> The summary of bug reports in week 1:
> 1. 822828, 693200, 851702, 842405, 842405

> The summary of closed pull requests during week 1:
> 1. https://github.com/gentoo/gentoo/pull/25861

You are not answering my question.  My original comment was only on the
severe issue of hipcc 'isystem'.  Although you mentioned "I came to a
new hipvars.pm and a patch to hipcc.pl, disabling poisoning `-isystem`
and correcting many paths", I cannot see any fix pushed into ::gentoo.
Otherwise we would not have been bitten by bug 853184.

This is a situation you should avoid at all costs in the future.

Yours,
Benda



* Re: [gentoo-soc] Week 1 Report for Refining ROCm Packages in Gentoo
  2022-06-21  8:35   ` Benda Xu
@ 2022-06-21  9:14     ` wuyy
  0 siblings, 0 replies; 5+ messages in thread
From: wuyy @ 2022-06-21  9:14 UTC
  To: gentoo-soc

On Tue, Jun 21, 2022 at 04:35:20PM +0800, Benda Xu wrote:
> > Benda Xu <heroxbd@gentoo.org> writes:
> 
> You are not answering my question.  My original comment was only on the
> severe issue of hipcc 'isystem'.  Although you mentioned "I came to a
> new hipvars.pm and a patch to hipcc.pl, disabling poisoning `-isystem`
> and correcting many paths", I cannot see any fix pushed into ::gentoo.
> Otherwise we would not have been bitten by bug 853184.

The statement "I came to a new hipvars.pm and a patch to hipcc.pl,
disabling poisoning `-isystem`" describes the situation when packaging
dev-util/hip-5.1.3 against vanilla llvm/clang on my own development
branch, and it is independent of bug 853184 (they may share some
similarities, but at that time I didn't realize there could be a bug in
hip-5.0.2). Bug 853184 was introduced by https://github.com/gentoo/gentoo/pull/25861.

I am sorry for pushing dev-util/hip-5.0.2-r1 without testing its functionality.
I'll go back to the 5.0.2 branch and fix up the mess, and set up a CI
system to provide solid testing.

> 
> This is a situation you should avoid at all costs in the future.

-- 

