From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by finch.gentoo.org (Postfix) with ESMTPS id E00DC158041
 for ; Wed, 21 Feb 2024 11:33:45 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
 by pigeon.gentoo.org (Postfix) with SMTP id 07488E29A2;
 Wed, 21 Feb 2024 11:33:45 +0000 (UTC)
Received: from smtp.gentoo.org (mail.gentoo.org [IPv6:2001:470:ea4a:1:5054:ff:fec7:86e4])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (4096 bits))
 (No client certificate requested)
 by pigeon.gentoo.org (Postfix) with ESMTPS id DE31BE29A2
 for ; Wed, 21 Feb 2024 11:33:44 +0000 (UTC)
Received: from oystercatcher.gentoo.org (oystercatcher.gentoo.org [148.251.78.52])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (4096 bits))
 (No client certificate requested)
 by smtp.gentoo.org (Postfix) with ESMTPS id 17C2D3431D2
 for ; Wed, 21 Feb 2024 11:33:44 +0000 (UTC)
Received: from localhost.localdomain (localhost [IPv6:::1])
 by oystercatcher.gentoo.org (Postfix) with ESMTP id 89AF6118C
 for ; Wed, 21 Feb 2024 11:33:42 +0000 (UTC)
From: "Alfredo Tupone"
To: gentoo-commits@lists.gentoo.org
Content-Transfer-Encoding: 8bit
Content-type: text/plain; charset=UTF-8
Reply-To: gentoo-dev@lists.gentoo.org, "Alfredo Tupone"
Message-ID: <1708515185.c05b1cf909d724d72c0ce14fd7c870ab3749677f.tupone@gentoo>
Subject: [gentoo-commits] repo/gentoo:master commit in: sci-libs/datasets/, sci-libs/datasets/files/
X-VCS-Repository: repo/gentoo
X-VCS-Files: sci-libs/datasets/Manifest sci-libs/datasets/datasets-2.15.0.ebuild sci-libs/datasets/datasets-2.16.0.ebuild sci-libs/datasets/files/datasets-2.15.0-tests.patch
 sci-libs/datasets/files/datasets-2.16.0-tests.patch
X-VCS-Directories: sci-libs/datasets/ sci-libs/datasets/files/
X-VCS-Committer: tupone
X-VCS-Committer-Name: Alfredo Tupone
X-VCS-Revision: c05b1cf909d724d72c0ce14fd7c870ab3749677f
X-VCS-Branch: master
Date: Wed, 21 Feb 2024 11:33:42 +0000 (UTC)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-commits@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
X-Archives-Salt: 9cdcd7ca-2e87-4d2e-b950-c3dd2118529c
X-Archives-Hash: 78242a31660fb0b76c2fe5b85b20fd43

commit:     c05b1cf909d724d72c0ce14fd7c870ab3749677f
Author:     Alfredo Tupone gentoo org>
AuthorDate: Wed Feb 21 11:32:18 2024 +0000
Commit:     Alfredo Tupone gentoo org>
CommitDate: Wed Feb 21 11:33:05 2024 +0000
URL:        https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c05b1cf9

sci-libs/datasets: add 2.16.0, drop 2.15.0

Signed-off-by: Alfredo Tupone gentoo.org>

 sci-libs/datasets/Manifest                         |  2 +-
 ...tasets-2.15.0.ebuild => datasets-2.16.0.ebuild} |  4 +-
 .../datasets/files/datasets-2.15.0-tests.patch     | 46 -----------
 .../datasets/files/datasets-2.16.0-tests.patch     | 89 ++++++++++++++++++++++
 4 files changed, 92 insertions(+), 49 deletions(-)

diff --git a/sci-libs/datasets/Manifest b/sci-libs/datasets/Manifest
index 42abdebf934c..0880ec7cb629 100644
--- a/sci-libs/datasets/Manifest
+++ b/sci-libs/datasets/Manifest
@@ -1 +1 @@
-DIST datasets-2.15.0.gh.tar.gz 2147191 BLAKE2B eadf0133f0baa9f0469a51f28e00d3656b2b799ed1ff221ad6df39640c9777ccd46b706e46898ffa0597bc43288ee5991410d5c6d0a2cb3b814658c92d779a68 SHA512 589ca7992d58007c556558ef0889354fe34821f55e79025ea475d08c105428fe84c77c9183ec0028d8e60b25ba0ea8565bd8c6003a85bb6472d1cb4a247142e2
+DIST datasets-2.16.0.gh.tar.gz 2163874 BLAKE2B baec91a0e39fac3e07f11e352a286c0940cbc672e7233267e70d1abb64dd31bae18c55213a20fafaeaf2f60268104f294c77c9b73ddc1b289175904288a7c440 SHA512 f2a17ffab192163cfc196cc2bad0adb2ca657b5cf911f74f299b6e29eb4fcfacc377505b1857974a6b55252eedf8775a8706f9e991450c55e5d613020dc03735

diff --git a/sci-libs/datasets/datasets-2.15.0.ebuild b/sci-libs/datasets/datasets-2.16.0.ebuild
similarity index 95%
rename from sci-libs/datasets/datasets-2.15.0.ebuild
rename to sci-libs/datasets/datasets-2.16.0.ebuild
index 52af2f93ac88..0325b5ae63d6 100644
--- a/sci-libs/datasets/datasets-2.15.0.ebuild
+++ b/sci-libs/datasets/datasets-2.16.0.ebuild
@@ -4,7 +4,7 @@
 EAPI=8
 
 DISTUTILS_USE_PEP517=setuptools
-PYTHON_COMPAT=( python3_{9..11} )
+PYTHON_COMPAT=( python3_{10..12} )
 DISTUTILS_SINGLE_IMPL=1
 inherit distutils-r1
 
@@ -36,7 +36,7 @@ RDEPEND="
 		dev-python/tqdm[${PYTHON_USEDEP}]
 		dev-python/xxhash[${PYTHON_USEDEP}]
 		dev-python/zstandard[${PYTHON_USEDEP}]
-		>=sci-libs/huggingface_hub-0.14.0[${PYTHON_USEDEP}]
+		sci-libs/huggingface_hub[${PYTHON_USEDEP}]
 		sci-libs/scikit-learn[${PYTHON_USEDEP}]
 	')
 "

diff --git a/sci-libs/datasets/files/datasets-2.15.0-tests.patch b/sci-libs/datasets/files/datasets-2.15.0-tests.patch
deleted file mode 100644
index 64d8dcfdc8d8..000000000000
--- a/sci-libs/datasets/files/datasets-2.15.0-tests.patch
+++ /dev/null
@@ -1,46 +0,0 @@
---- a/tests/test_arrow_dataset.py 2024-02-20 21:53:24.248470991 +0100
-+++ b/tests/test_arrow_dataset.py 2024-02-20 21:53:29.441804737 +0100
-@@ -3978,7 +3978,6 @@
-     [
-         "relative/path",
-         "/absolute/path",
--        "s3://bucket/relative/path",
-         "hdfs://relative/path",
-         "hdfs:///absolute/path",
-     ],
---- a/tests/test_hf_gcp.py 2024-02-20 21:55:18.821852434 +0100
-+++ b/tests/test_hf_gcp.py 2024-02-20 21:55:46.525186394 +0100
-@@ -22,7 +22,6 @@
-     {"dataset": "wikipedia", "config_name": "20220301.it"},
-     {"dataset": "wikipedia", "config_name": "20220301.simple"},
-     {"dataset": "snli", "config_name": "plain_text"},
--    {"dataset": "eli5", "config_name": "LFQA_reddit"},
-     {"dataset": "wiki40b", "config_name": "en"},
-     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.compressed"},
-     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.no_index"},
---- a/tests/test_inspect.py 2024-02-20 22:01:35.148488467 +0100
-+++ b/tests/test_inspect.py 2024-02-20 22:02:14.458561571 +0100
-@@ -15,7 +15,7 @@
- pytestmark = pytest.mark.integration
- 
- 
--@pytest.mark.parametrize("path", ["paws", "csv"])
-+@pytest.mark.parametrize("path", ["csv"])
- def test_inspect_dataset(path, tmp_path):
-     inspect_dataset(path, tmp_path)
-     script_name = path + ".py"
---- a/tests/test_load.py 2024-02-20 22:12:13.699209107 +0100
-+++ b/tests/test_load.py 2024-02-20 22:13:10.862626708 +0100
-@@ -1235,12 +1235,6 @@
- 
- 
- @pytest.mark.integration
--def test_load_streaming_private_dataset_with_zipped_data(hf_token, hf_private_dataset_repo_zipped_txt_data):
--    ds = load_dataset(hf_private_dataset_repo_zipped_txt_data, streaming=True, token=hf_token)
--    assert next(iter(ds)) is not None
--
--
--@pytest.mark.integration
- def test_load_dataset_config_kwargs_passed_as_arguments():
-     ds_default = load_dataset(SAMPLE_DATASET_IDENTIFIER4)
-     ds_custom = load_dataset(SAMPLE_DATASET_IDENTIFIER4, drop_metadata=True)

diff --git a/sci-libs/datasets/files/datasets-2.16.0-tests.patch b/sci-libs/datasets/files/datasets-2.16.0-tests.patch
new file mode 100644
index 000000000000..6b2845bce168
--- /dev/null
+++ b/sci-libs/datasets/files/datasets-2.16.0-tests.patch
@@ -0,0 +1,89 @@
+--- a/tests/test_arrow_dataset.py 2024-02-20 21:53:24.248470991 +0100
++++ b/tests/test_arrow_dataset.py 2024-02-20 21:53:29.441804737 +0100
+@@ -3982,7 +3982,6 @@
+     [
+         "relative/path",
+         "/absolute/path",
+-        "s3://bucket/relative/path",
+         "hdfs://relative/path",
+         "hdfs:///absolute/path",
+     ],
+--- a/tests/test_load.py 2024-02-20 22:12:13.699209107 +0100
++++ b/tests/test_load.py 2024-02-20 22:13:10.862626708 +0100
+@@ -386,21 +386,6 @@
+             hf_modules_cache=self.hf_modules_cache,
+         )
+ 
+-    def test_HubDatasetModuleFactoryWithScript_dont_trust_remote_code(self):
+-        # "squad" has a dataset script
+-        factory = HubDatasetModuleFactoryWithScript(
+-            "squad", download_config=self.download_config, dynamic_modules_path=self.dynamic_modules_path
+-        )
+-        with patch.object(config, "HF_DATASETS_TRUST_REMOTE_CODE", None):  # this will be the default soon
+-            self.assertRaises(ValueError, factory.get_module)
+-        factory = HubDatasetModuleFactoryWithScript(
+-            "squad",
+-            download_config=self.download_config,
+-            dynamic_modules_path=self.dynamic_modules_path,
+-            trust_remote_code=False,
+-        )
+-        self.assertRaises(ValueError, factory.get_module)
+-
+     def test_HubDatasetModuleFactoryWithScript_with_github_dataset(self):
+         # "wmt_t2t" has additional imports (internal)
+         factory = HubDatasetModuleFactoryWithScript(
+@@ -1235,12 +1235,6 @@
+ 
+ 
+ @pytest.mark.integration
+-def test_load_streaming_private_dataset_with_zipped_data(hf_token, hf_private_dataset_repo_zipped_txt_data):
+-    ds = load_dataset(hf_private_dataset_repo_zipped_txt_data, streaming=True, token=hf_token)
+-    assert next(iter(ds)) is not None
+-
+-
+-@pytest.mark.integration
+ def test_load_dataset_config_kwargs_passed_as_arguments():
+     ds_default = load_dataset(SAMPLE_DATASET_IDENTIFIER4)
+     ds_custom = load_dataset(SAMPLE_DATASET_IDENTIFIER4, drop_metadata=True)
+--- a/tests/test_hf_gcp.py 2024-02-21 09:59:26.918397895 +0100
++++ b/tests/test_hf_gcp.py 2024-02-21 09:59:46.335100597 +0100
+@@ -21,7 +21,6 @@
+     {"dataset": "wikipedia", "config_name": "20220301.frr"},
+     {"dataset": "wikipedia", "config_name": "20220301.it"},
+     {"dataset": "wikipedia", "config_name": "20220301.simple"},
+-    {"dataset": "eli5", "config_name": "LFQA_reddit"},
+     {"dataset": "wiki40b", "config_name": "en"},
+     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.compressed"},
+     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.no_index"},
+--- a/tests/test_inspect.py 2024-02-21 10:03:32.315520016 +0100
++++ b/tests/test_inspect.py 2024-02-21 10:03:50.345553490 +0100
+@@ -18,7 +18,7 @@
+ pytestmark = pytest.mark.integration
+ 
+ 
+-@pytest.mark.parametrize("path", ["paws", csv.__file__])
++@pytest.mark.parametrize("path", [csv.__file__])
+ def test_inspect_dataset(path, tmp_path):
+     inspect_dataset(path, tmp_path)
+     script_name = Path(path).stem + ".py"
+--- a/tests/packaged_modules/test_cache.py 2024-02-21 12:04:18.036866572 +0100
++++ b/tests/packaged_modules/test_cache.py 2024-02-21 12:04:54.333558520 +0100
+@@ -44,18 +44,3 @@
+         Cache(dataset_name=text_dir.name, hash="missing").download_and_prepare()
+     with pytest.raises(ValueError):
+         Cache(dataset_name=text_dir.name, config_name="missing", version="auto", hash="auto").download_and_prepare()
+-
+-
+-@pytest.mark.integration
+-def test_cache_multi_configs():
+-    repo_id = SAMPLE_DATASET_TWO_CONFIG_IN_METADATA
+-    dataset_name = repo_id.split("/")[-1]
+-    config_name = "v1"
+-    ds = load_dataset(repo_id, config_name)
+-    cache = Cache(dataset_name=dataset_name, repo_id=repo_id, config_name=config_name, version="auto", hash="auto")
+-    reloaded = cache.as_dataset()
+-    assert list(ds) == list(reloaded)
+-    assert len(ds["train"]) == len(reloaded["train"])
+-    with pytest.raises(ValueError) as excinfo:
+-        Cache(dataset_name=dataset_name, repo_id=repo_id, config_name="missing", version="auto", hash="auto")
+-    assert config_name in str(excinfo.value)