From: Mo Zhou
To: gentoo-dev@lists.gentoo.org
Subject: [gentoo-dev] RFC: BLAS and LAPACK runtime switching
Date: Tue, 28 May 2019 01:37:00 -0700
Message-ID: <2d3636f5bd6a738f30a4ad2e697b1ddb@debian.org>

Hi Gentoo devs,

Classical numerical linear
algebra libraries, BLAS[1] and LAPACK[2], play important roles in
scientific computing, as many software packages such as NumPy, SciPy,
Julia, Octave, and R are built upon them.

There is a standard implementation of BLAS and LAPACK, named netlib or
simply the "reference implementation". This implementation is already
provided by Gentoo's main repo. However, it has a major problem:
performance. On the other hand, a number of well-optimized BLAS/LAPACK
implementations exist, including OpenBLAS (free), BLIS (free), MKL
(non-free), etc., but none of them has been properly integrated into
the Gentoo distribution.

I'm writing to propose a good solution to this problem. If no Gentoo
developer objects to this proposal, I'll keep moving forward and start
submitting PRs to the Gentoo main repo.

Historical Obstacle
-------------------

Different BLAS/LAPACK implementations are expected to be compatible
with each other at both the API and ABI level. They can be used as
drop-in replacements for one another. This sounds nice, but the
difference in SONAME has hampered the Gentoo integration of the
well-optimized ones.

Assume a Gentoo user has compiled a pile of packages on top of the
reference BLAS and LAPACK, i.e. these reverse dependencies are linked
against libblas.so.3 and liblapack.so.3. When the user discovers that
OpenBLAS provides much better performance, they'll have to recompile
the whole reverse dependency tree in order to take advantage of
OpenBLAS, because the SONAME of OpenBLAS is libopenblas.so.0. When the
user wants to try MKL (libmkl_rt.so), they'll have to recompile the
whole reverse dependency tree again.

This is not friendly to our earth.

Goal
----

* When a program is linked against libblas.so or liblapack.so provided
  by any BLAS/LAPACK provider, the eselect-based solution will allow
  the user to switch the underlying library without recompiling
  anything.

* When a program is linked against a specific implementation, e.g.
  libmkl_rt.so, the solution doesn't break anything.
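
The two goals can be illustrated with a toy model. The sketch below is
not how eselect or the dynamic linker is implemented: a temporary
directory stands in for the library directory, the "libraries" are
plain text files that only record their provider, and the helper names
(install_candidate, select, resolve, demo) are hypothetical. It assumes
a POSIX system where unprivileged symlinks are available.

```python
import os
import tempfile


def install_candidate(root, unit, provider, soname):
    """Create a stand-in 'library' file at <root>/<unit>/<provider>/<soname>."""
    directory = os.path.join(root, unit, provider)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, soname)
    with open(path, "w") as handle:
        handle.write(provider)  # the content only records who provides it
    return path


def select(root, soname, candidate):
    """Repoint <root>/<soname> at the chosen candidate via a symlink."""
    link = os.path.join(root, soname)
    temporary = link + ".new"
    if os.path.lexists(temporary):
        os.remove(temporary)
    os.symlink(candidate, temporary)
    os.replace(temporary, link)  # rename over the old link in one step


def resolve(root, needed_soname):
    """Toy stand-in for the dynamic linker: look the SONAME up by file name."""
    with open(os.path.join(root, needed_soname)) as handle:
        return handle.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        reference = install_candidate(root, "blas", "reference", "libblas.so.3")
        openblas = install_candidate(root, "blas", "openblas", "libblas.so.3")
        # Provider-specific library, installed directly and never switched.
        with open(os.path.join(root, "libopenblas.so.0"), "w") as handle:
            handle.write("openblas")

        seen = []
        select(root, "libblas.so.3", reference)
        seen.append(resolve(root, "libblas.so.3"))      # generic SONAME
        select(root, "libblas.so.3", openblas)
        seen.append(resolve(root, "libblas.so.3"))      # same SONAME, new provider
        seen.append(resolve(root, "libopenblas.so.0"))  # unaffected by the switch
        return seen


if __name__ == "__main__":
    print(demo())
```

A "program" that only asks for libblas.so.3 sees a different provider
after each select() call without being rebuilt, while one that asks for
libopenblas.so.0 keeps resolving to the same file throughout.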

Solution
--------

Similar to Debian's update-alternatives mechanism, Gentoo's eselect is
good at dealing with drop-in replacements as well. My preliminary
investigation suggests that eselect is enough for enabling BLAS/LAPACK
runtime switching. Hence, the proposed solution is eselect-based:

* Every BLAS/LAPACK implementation should provide a generic library and
  eselect candidate libraries at the same time. Taking netlib, BLIS and
  OpenBLAS as examples:

  reference:

    usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3)
      -- default BLAS provider
      -- candidate of the eselect "blas" unit
      -- will be symlinked to usr/lib64/libblas.so.3 by eselect
    usr/lib64/lapack/reference/liblapack.so.3 (SONAME=liblapack.so.3)
      -- default LAPACK provider
      -- candidate of the eselect "lapack" unit
      -- will be symlinked to usr/lib64/liblapack.so.3 by eselect

  blis (doesn't provide LAPACK):

    usr/lib64/libblis.so.2 (SONAME=libblis.so.2)
      -- general purpose
    usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3)
      -- candidate of the eselect "blas" unit
      -- will be symlinked to usr/lib64/libblas.so.3 by eselect
      -- compiled from the same set of object files as libblis.so.2

  openblas:

    usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0)
      -- general purpose
    usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3)
      -- candidate of the eselect "blas" unit
      -- will be symlinked to usr/lib64/libblas.so.3 by eselect
      -- compiled from the same set of object files as libopenblas.so.0
    usr/lib64/lapack/openblas/liblapack.so.3 (SONAME=liblapack.so.3)
      -- candidate of the eselect "lapack" unit
      -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
      -- compiled from the same set of object files as libopenblas.so.0

This solution is similar to Debian's[3]. It achieves our goal, and it
requires us to patch upstream build systems (as Debian does). A
preliminary demonstration of this solution is available, see below.

Is this solution reliable?
--------------------------

* A similar solution has been used by Debian for many years.
* Many projects call BLAS/LAPACK libraries through FFI, including
  Julia. (See Julia's standard library: LinearAlgebra)

Proposed Changes
----------------

1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference from the
   Gentoo main repo. They use exactly the same source tarball. It's not
   very helpful to package these components in such a fine-grained
   manner. A single sci-libs/lapack package is enough.

2. Merge the "cblas" eselect unit into the "blas" unit. It is
   potentially harmful when "blas" and "cblas" point to different
   implementations. That means "app-eselect/eselect-cblas" should be
   deprecated.

3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers
   will be registered in their dependency information.

Note that ebuilds for BLAS/LAPACK reverse dependencies are expected to
work correctly with these changes, without modification. For example,
my local numpy-1.16.1 compilation succeeded unchanged.

Preliminary Demonstration
-------------------------

The preliminary implementation is available in my personal overlay[4].
A simple sanity test script `check-cpp.sh` is provided to illustrate
the effectiveness of the proposed solution.

The script `check-cpp.sh` compiles two C++ programs -- one calls
general matrix-matrix multiplication from BLAS, while the other calls
general singular value decomposition from LAPACK. Once they are
compiled, the script switches between different BLAS/LAPACK
implementations and runs the C++ programs without recompilation.

The preliminary result is available here[5].
(CPU=Power9, ARCH=ppc64le)

From the experimental results, we find that

For (512x512) single precision matrix multiplication:
* reference BLAS takes ~360 ms
* BLIS takes ~70 ms
* OpenBLAS takes ~10 ms

For (512x512) single precision singular value decomposition:
* reference LAPACK takes ~1900 ms
* BLIS (+reference LAPACK) takes ~1500 ms
* OpenBLAS takes ~1100 ms

The difference in computation speed illustrates the effectiveness of
the proposed solution. Theoretically, any other package could take
advantage of this solution without any recompilation, as long as it's
linked against a library with the generic SONAME (libblas.so.3 or
liblapack.so.3).

Acknowledgement
---------------

This is an ongoing GSoC 2019 project:
https://summerofcode.withgoogle.com/projects/?sp-page=2#6268942782300160

Mentor: Benda Xu

[1] BLAS = Basic Linear Algebra Subprograms. It's a set of API + ABI.
[2] LAPACK = Linear Algebra PACKage. It's a set of API + ABI.
[3] https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
[4] https://github.com/cdluminate/my-overlay
[5] https://gist.github.com/cdluminate/0cfeab19b89a8b5ac4ea2c5f942d8f64