public inbox for gentoo-portage-dev@lists.gentoo.org
* [gentoo-portage-dev] [PATCH 1/2] cpuinfo: use better available CPU calculation
@ 2019-02-16  6:21 robbat2
  2019-02-16  6:21 ` [gentoo-portage-dev] [PATCH 2/2] Replace multiprocessing.cpu_count with portage.util.cpuinfo.get_cpu_count robbat2
  0 siblings, 1 reply; 3+ messages in thread
From: robbat2 @ 2019-02-16  6:21 UTC
  To: gentoo-portage-dev; +Cc: Robin H. Johnson

From: "Robin H. Johnson" <robbat2@gentoo.org>

The existing portage.util.cpuinfo.get_cpu_count() behavior is wrong when
run in any environment where the cpuset is a subset of online CPUs.

The solution recommended by the 'os.cpu_count()' help is to use:
 len(os.sched_getaffinity(0))

This only works on Linux, so keep multiprocessing.cpu_count() as a
fallback. In newer versions of Python, multiprocessing.cpu_count() is a
wrapper for os.cpu_count().

Reported-By: Daniel Robbins <drobbins@funtoo.org>
Fixes: https://bugs.funtoo.org/browse/FL-6227
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
---
 lib/portage/util/cpuinfo.py | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)
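
Illustration only, not part of the patch: a minimal standalone sketch of
the distinction the commit message draws, assuming a Python 3 interpreter.
The names cpu_counts() and available_cpu_count() are hypothetical and do
not exist in portage; the sysconf keys are guarded because they may be
missing on some platforms.

```python
import multiprocessing
import os


def available_cpu_count():
    """Return the number of CPUs this process may run on, or None."""
    try:
        # Only present on platforms that support it (e.g. Linux).
        return len(os.sched_getaffinity(0))
    except (AttributeError, NotImplementedError):
        pass
    try:
        # Fallback: counts CPUs in the system, ignoring any cpuset.
        return multiprocessing.cpu_count()
    except NotImplementedError:
        return None


def cpu_counts():
    """Collect configured, online, and available CPU counts."""
    counts = {}
    for label, name in (
        ("configured", "SC_NPROCESSORS_CONF"),
        ("online", "SC_NPROCESSORS_ONLN"),
    ):
        try:
            counts[label] = os.sysconf(name)
        except (AttributeError, ValueError, OSError):
            # os.sysconf or this key is unavailable on this platform.
            counts[label] = None
    counts["available"] = available_cpu_count()
    return counts
```

On a Linux host, running this under a restricted affinity mask (for
example via 'taskset -c 0') should make the "available" value smaller
than the "online" value, which is the case this patch addresses.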

diff --git a/lib/portage/util/cpuinfo.py b/lib/portage/util/cpuinfo.py
index 669e707b5..9ab1c119d 100644
--- a/lib/portage/util/cpuinfo.py
+++ b/lib/portage/util/cpuinfo.py
@@ -1,15 +1,44 @@
-# Copyright 2015 Gentoo Foundation
+# Copyright 2015-2019 Gentoo Authors
 # Distributed under the terms of the GNU General Public License v2
 
 __all__ = ['get_cpu_count']
 
+# Before you set out to change this function, figure out what you're really
+# asking:
+#
+# - How many CPUs exist in this system (e.g. that the kernel is aware of?)
+#   This is 'getconf _NPROCESSORS_CONF' / get_nprocs_conf(3)
+#   In modern Linux, implemented by counting CPUs in /sys/devices/system/cpu/
+#
+# - How many CPUs in this system are ONLINE right now?
+#   This is 'getconf _NPROCESSORS_ONLN' / get_nprocs(3)
+#   In modern Linux, implemented by parsing /sys/devices/system/cpu/online
+#
+# - How many CPUs are available to this program?
+#   This is 'nproc' / sched_getaffinity(2), which is implemented in modern
+#   Linux kernels by querying the kernel scheduler. This might not be available
+#   in some non-Linux systems!
+#
+# - How many CPUs are available to this thread?
+#   This is pthread_getaffinity_np(3)
+#
+# As a further warning, the results returned by this function can differ
+# between runs, if altered by the scheduler or other external factors.
 
 def get_cpu_count():
 	"""
-	Try to obtain the number of CPUs available.
+	Try to obtain the number of CPUs available to this process.
 
 	@return: Number of CPUs or None if unable to obtain.
 	"""
+	try:
+		import os
+		# This was introduced in Python 3.3 only, but exists in Linux
+		# all the way back to the 2.5.8 kernel.
+		# This is NOT available in FreeBSD!
+		return len(os.sched_getaffinity(0))
+	except (ImportError, NotImplementedError, AttributeError):
+		pass
 
 	try:
 		import multiprocessing
-- 
2.18.0




* [gentoo-portage-dev] [PATCH 2/2] Replace multiprocessing.cpu_count with portage.util.cpuinfo.get_cpu_count
  2019-02-16  6:21 [gentoo-portage-dev] [PATCH 1/2] cpuinfo: use better available CPU calculation robbat2
@ 2019-02-16  6:21 ` robbat2
  2019-02-16  6:58   ` Zac Medico
  0 siblings, 1 reply; 3+ messages in thread
From: robbat2 @ 2019-02-16  6:21 UTC
  To: gentoo-portage-dev; +Cc: Robin H. Johnson

From: "Robin H. Johnson" <robbat2@gentoo.org>

portage.util.cpuinfo.get_cpu_count was only used in one spot before, and
other call-sites just used multiprocessing.cpu_count() directly.

Replace all multiprocessing.cpu_count() calls with get_cpu_count() in
portage.util.cpuinfo, to ensure consistency in CPU calculation.

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
---
 lib/portage/dbapi/porttree.py              |  4 ++--
 lib/portage/util/futures/executor/fork.py  |  4 ++--
 lib/portage/util/futures/iter_completed.py | 18 +++++++++---------
 3 files changed, 13 insertions(+), 13 deletions(-)
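
Illustration only, not part of the patch: per its docstring,
get_cpu_count() returns None when no count can be obtained, whereas
multiprocessing.cpu_count() may raise NotImplementedError instead. With
the 'value or get_cpu_count()' pattern, a None could therefore reach the
caller. A minimal sketch of a defensive fallback, assuming a floor of one
job is acceptable; resolve_max_jobs and cpu_count_func are hypothetical
names, and cpu_count_func stands in for get_cpu_count.

```python
def resolve_max_jobs(max_jobs=None, cpu_count_func=lambda: None):
    """Pick a job limit that is never None (hypothetical helper).

    An explicit max_jobs wins; otherwise use the detected CPU count;
    if detection fails and returns None, fall back to a single job.
    """
    return max_jobs or cpu_count_func() or 1
```

For example, resolve_max_jobs(None, lambda: None) falls back to 1 rather
than passing None on to a scheduler.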

diff --git a/lib/portage/dbapi/porttree.py b/lib/portage/dbapi/porttree.py
index 2ff3e1b34..64a5f3681 100644
--- a/lib/portage/dbapi/porttree.py
+++ b/lib/portage/dbapi/porttree.py
@@ -1471,11 +1471,11 @@ def _async_manifest_fetchlist(portdb, repo_config, cp, cpv_list=None,
 	@param cpv_list: list of ebuild cpv values for a Manifest
 	@type cpv_list: list
 	@param max_jobs: max number of futures to process concurrently (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_jobs: int
 	@param max_load: max load allowed when scheduling a new future,
 		otherwise schedule no more than 1 future at a time (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_load: int or float
 	@param loop: event loop
 	@type loop: EventLoop
diff --git a/lib/portage/util/futures/executor/fork.py b/lib/portage/util/futures/executor/fork.py
index 72844403c..add7b3c9e 100644
--- a/lib/portage/util/futures/executor/fork.py
+++ b/lib/portage/util/futures/executor/fork.py
@@ -7,13 +7,13 @@ __all__ = (
 
 import collections
 import functools
-import multiprocessing
 import os
 import sys
 import traceback
 
 from portage.util._async.AsyncFunction import AsyncFunction
 from portage.util.futures import asyncio
+from portage.util.cpuinfo import get_cpu_count
 
 
 class ForkExecutor(object):
@@ -24,7 +24,7 @@ class ForkExecutor(object):
 	This is entirely driven by an event loop.
 	"""
 	def __init__(self, max_workers=None, loop=None):
-		self._max_workers = max_workers or multiprocessing.cpu_count()
+		self._max_workers = max_workers or get_cpu_count()
 		self._loop = asyncio._wrap_loop(loop)
 		self._submit_queue = collections.deque()
 		self._running_tasks = {}
diff --git a/lib/portage/util/futures/iter_completed.py b/lib/portage/util/futures/iter_completed.py
index 31b5e0c78..4c48ea0fe 100644
--- a/lib/portage/util/futures/iter_completed.py
+++ b/lib/portage/util/futures/iter_completed.py
@@ -2,11 +2,11 @@
 # Distributed under the terms of the GNU General Public License v2
 
 import functools
-import multiprocessing
 
 from portage.util._async.AsyncTaskFuture import AsyncTaskFuture
 from portage.util._async.TaskScheduler import TaskScheduler
 from portage.util.futures import asyncio
+from portage.util.cpuinfo import get_cpu_count
 
 
 def iter_completed(futures, max_jobs=None, max_load=None, loop=None):
@@ -18,11 +18,11 @@ def iter_completed(futures, max_jobs=None, max_load=None, loop=None):
 	@param futures: iterator of asyncio.Future (or compatible)
 	@type futures: iterator
 	@param max_jobs: max number of futures to process concurrently (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_jobs: int
 	@param max_load: max load allowed when scheduling a new future,
 		otherwise schedule no more than 1 future at a time (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_load: int or float
 	@param loop: event loop
 	@type loop: EventLoop
@@ -47,11 +47,11 @@ def async_iter_completed(futures, max_jobs=None, max_load=None, loop=None):
 	@param futures: iterator of asyncio.Future (or compatible)
 	@type futures: iterator
 	@param max_jobs: max number of futures to process concurrently (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_jobs: int
 	@param max_load: max load allowed when scheduling a new future,
 		otherwise schedule no more than 1 future at a time (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_load: int or float
 	@param loop: event loop
 	@type loop: EventLoop
@@ -61,8 +61,8 @@ def async_iter_completed(futures, max_jobs=None, max_load=None, loop=None):
 	"""
 	loop = asyncio._wrap_loop(loop)
 
-	max_jobs = max_jobs or multiprocessing.cpu_count()
-	max_load = max_load or multiprocessing.cpu_count()
+	max_jobs = max_jobs or get_cpu_count()
+	max_load = max_load or get_cpu_count()
 
 	future_map = {}
 	def task_generator():
@@ -120,11 +120,11 @@ def iter_gather(futures, max_jobs=None, max_load=None, loop=None):
 	@param futures: iterator of asyncio.Future (or compatible)
 	@type futures: iterator
 	@param max_jobs: max number of futures to process concurrently (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_jobs: int
 	@param max_load: max load allowed when scheduling a new future,
 		otherwise schedule no more than 1 future at a time (default
-		is multiprocessing.cpu_count())
+		is portage.util.cpuinfo.get_cpu_count())
 	@type max_load: int or float
 	@param loop: event loop
 	@type loop: EventLoop
-- 
2.18.0




* Re: [gentoo-portage-dev] [PATCH 2/2] Replace multiprocessing.cpu_count with portage.util.cpuinfo.get_cpu_count
  2019-02-16  6:21 ` [gentoo-portage-dev] [PATCH 2/2] Replace multiprocessing.cpu_count with portage.util.cpuinfo.get_cpu_count robbat2
@ 2019-02-16  6:58   ` Zac Medico
  0 siblings, 0 replies; 3+ messages in thread
From: Zac Medico @ 2019-02-16  6:58 UTC
  To: gentoo-portage-dev, robbat2



On 2/15/19 10:21 PM, robbat2@gentoo.org wrote:
> From: "Robin H. Johnson" <robbat2@gentoo.org>
> 
> portage.util.cpuinfo.get_cpu_count was only used in one spot before, and
> other call-sites just used multiprocessing.cpu_count() directly.
> 
> Replace all multiprocessing.cpu_count() calls with get_cpu_count() in
> portage.util.cpuinfo, to ensure consistency in CPU calculation.
> 
> Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
> ---
>  lib/portage/dbapi/porttree.py              |  4 ++--
>  lib/portage/util/futures/executor/fork.py  |  4 ++--
>  lib/portage/util/futures/iter_completed.py | 18 +++++++++---------
>  3 files changed, 13 insertions(+), 13 deletions(-)

Series looks good, please merge.
-- 
Thanks,
Zac



