Date: Mon, 17 Jun 2019 06:33:08 -0700
From: Mo Zhou
To: gentoo-dev@lists.gentoo.org
Cc: Michał Górny
Subject: Re: [gentoo-dev] RFC: BLAS and LAPACK runtime switching (Re-designed)
In-Reply-To: <1881be60ffbdd5675e918970246a8f151d490148.camel@gentoo.org>

Hi Michał,

Sorry for the late reply. I just had a severe hardware failure.

On 2019-06-13 07:49, Michał Górny wrote:
>>
>> sci-libs/{blas,cblas,lapack,lapacke}::gentoo should be deprecated. They
>> are based on exactly the same source tarball, and maintaining 4 ebuild
>> files for a single tarball is not a good choice IMHO. Those old ebuild
>> files seem to leverage the flexibility of upstream build system
>> because it enables one to, for example, skip the reference blas build
>> and use an existing optimized BLAS implementation and hence introduce
>> flexibility. That flexibility is hard to maintain and is not necessary
>> anymore with the new runtime switching mechanism.
>>
>> That's why I propose to merge the 4 ebuilds into a single one:
>> sci-libs/lapack. We don't need to add the "reference" postfix
>> because no upstream will loot the name "lapack". When talking
>> about "lapack" it's always the reference implementation.
>
> What's the real gain here, and how does it compare to loss of
> flexibility of being able to build only what the package in question
> needs?

First let's see what these 4 components are:

  1. blas: written in Fortran, provides fundamental linear algebra
     routines. libblas.so can work alone.

  2. cblas: a thin C wrapper around the Fortran blas. That means
     libcblas.so calls libblas.so for the real calculation.

  3. lapack: written in Fortran, frequently calls BLAS to implement
     higher-level linear algebra routines. liblapack.so needs
     libblas.so (Fortran).

  4. lapacke: a thin C wrapper around the Fortran lapack.
     liblapacke.so needs liblapack.so.

The real gain from merging the 4 ebuilds into 1 ebuild:

  1. Easier to maintain: updating 4 ebuilds on every single version
     bump is much harder than updating only 1.
     This will also make it easier to provide and maintain the
     virtual-* features in the long run.

  2. It avoids confusing or even potentially problematic setups, e.g.:

     A user happens to compile OpenBLAS as the libblas provider and
     BLIS as the libcblas provider:

       appA -> libblas (OpenBLAS)
       appB -> libcblas (BLIS)
       appC -> liblapacke (Ref) -> liblapack (Ref) -> libblas (OpenBLAS) -> libcblas.so (BLIS)

     The user will get confused about which BLAS is really doing the
     calculation. Plus, mixing threading models may sometimes cause
     poor performance (e.g. openmp + pthread) or even silent
     corruption (e.g. GNU openmp + Intel openmp).

     Merging cblas into blas, and lapacke into lapack, will make it
     harder to get things wrong.

IMHO the mentioned flexibility is not really necessary. Any scientific
computing user who needs performance and dislikes the virtual-*
solution can link their programs directly against MKL or OpenBLAS
without thinking about the reference blas, because both MKL and
OpenBLAS provide the full set of blas, cblas, lapack and lapacke
APIs and ABIs via a single shared object.

Plus, that flexibility can be replaced by the proposed runtime
switching solution: by switching the blas (cblas) selection,
liblapack.so can be dynamically linked against different optimized
implementations. Discarding this flexibility will only affect users
who insist on linking an unoptimized lapack against a specific blas
implementation.

And one may also run into trouble with such flexibility, e.g.:

  libcblas (Reference) -> libblas.so (reference)
  liblapack (Reference) -> libopenblas.so

  appC -> (liblapacke, libcblas)
    --> liblapacke -> liblapack -> libopenblas
    --> libcblas (reference)

libopenblas's ABI is a superset of libcblas's, which leads to
confusion and a symbol race condition at run time.

With the proposed (redesigned) solution, these potentially bad cases
can be avoided because the solution tries to keep the backend
consistent.
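
To make the mixed-backend cases above concrete, here is a minimal
sketch in Python. It is not part of the proposal and not how any
existing tool works: the dependency graph, the provider tables and the
function names are all made up for illustration. It only encodes the
relations described above (cblas calls blas, lapack needs blas, lapacke
needs lapack) and assumes an appC that links both liblapacke and
libcblas, then reports which BLAS-level backends end up in one process.

```python
# Hypothetical link graph; library names follow the examples in this mail,
# but the exact edges for appC are an assumption made for illustration.
DEPENDS = {
    "appC": ["liblapacke", "libcblas"],
    "liblapacke": ["liblapack"],
    "liblapack": ["libblas"],
    "libcblas": ["libblas"],
    "libblas": [],
}

# Libraries whose provider actually performs the BLAS computation.
BLAS_LEVEL = ("libblas", "libcblas")


def reachable(node, depends):
    """Return every node reachable from `node`, including itself."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(depends.get(current, []))
    return seen


def blas_backends(app, depends, providers):
    """Return the set of providers serving BLAS-level libraries for `app`."""
    libs = reachable(app, depends)
    return {providers[lib] for lib in libs if lib in BLAS_LEVEL}


def is_consistent(app, depends, providers):
    """True when at most one backend does the BLAS work for `app`."""
    return len(blas_backends(app, depends, providers)) <= 1


# The confusing setup: OpenBLAS provides libblas, BLIS provides libcblas.
MIXED = {
    "libblas": "OpenBLAS",
    "libcblas": "BLIS",
    "liblapack": "Reference",
    "liblapacke": "Reference",
}

# What a single consistent selection would look like.
CONSISTENT = dict(MIXED, libcblas="OpenBLAS")

if __name__ == "__main__":
    print(sorted(blas_backends("appC", DEPENDS, MIXED)))
    print(is_consistent("appC", DEPENDS, CONSISTENT))
```

With the MIXED table, appC pulls in two different BLAS backends; with
the CONSISTENT table it pulls in only one, which is the property the
merged ebuild plus runtime switching is meant to make the default.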

Some people had headaches with the BLAS/LAPACK flexibility, and they
created flexiblas.

In short, the (4->1) change can reduce the maintenance cost for
(blas,cblas,lapack,lapacke) and make the virtual-* feature easier to
implement and maintain in the long run. Additionally, the flexibility
mentioned before is not really necessary once the virtual-* feature is
fully implemented.

Best,
Mo.