[gentoo-commits] repo/gentoo:master commit in: sci-libs/datasets/, sci-libs/datasets/files/

From: "Alfredo Tupone" <tupone@gentoo.org>
To: gentoo-commits@lists.gentoo.org
Subject: [gentoo-commits] repo/gentoo:master commit in: sci-libs/datasets/, sci-libs/datasets/files/
Date: Wed, 21 Feb 2024 11:33:42 +0000 (UTC)
Message-ID: <1708515185.c05b1cf909d724d72c0ce14fd7c870ab3749677f.tupone@gentoo>

commit:     c05b1cf909d724d72c0ce14fd7c870ab3749677f
Author:     Alfredo Tupone <tupone <AT> gentoo <DOT> org>
AuthorDate: Wed Feb 21 11:32:18 2024 +0000
Commit:     Alfredo Tupone <tupone <AT> gentoo <DOT> org>
CommitDate: Wed Feb 21 11:33:05 2024 +0000
URL:        https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c05b1cf9

sci-libs/datasets: add 2.16.0, drop 2.15.0

Signed-off-by: Alfredo Tupone <tupone <AT> gentoo.org>

 sci-libs/datasets/Manifest                         |  2 +-
 ...tasets-2.15.0.ebuild => datasets-2.16.0.ebuild} |  4 +-
 .../datasets/files/datasets-2.15.0-tests.patch     | 46 -----------
 .../datasets/files/datasets-2.16.0-tests.patch     | 89 ++++++++++++++++++++++
 4 files changed, 92 insertions(+), 49 deletions(-)

diff --git a/sci-libs/datasets/Manifest b/sci-libs/datasets/Manifest
index 42abdebf934c..0880ec7cb629 100644
--- a/sci-libs/datasets/Manifest
+++ b/sci-libs/datasets/Manifest
@@ -1 +1 @@
-DIST datasets-2.15.0.gh.tar.gz 2147191 BLAKE2B eadf0133f0baa9f0469a51f28e00d3656b2b799ed1ff221ad6df39640c9777ccd46b706e46898ffa0597bc43288ee5991410d5c6d0a2cb3b814658c92d779a68 SHA512 589ca7992d58007c556558ef0889354fe34821f55e79025ea475d08c105428fe84c77c9183ec0028d8e60b25ba0ea8565bd8c6003a85bb6472d1cb4a247142e2
+DIST datasets-2.16.0.gh.tar.gz 2163874 BLAKE2B baec91a0e39fac3e07f11e352a286c0940cbc672e7233267e70d1abb64dd31bae18c55213a20fafaeaf2f60268104f294c77c9b73ddc1b289175904288a7c440 SHA512 f2a17ffab192163cfc196cc2bad0adb2ca657b5cf911f74f299b6e29eb4fcfacc377505b1857974a6b55252eedf8775a8706f9e991450c55e5d613020dc03735

diff --git a/sci-libs/datasets/datasets-2.15.0.ebuild b/sci-libs/datasets/datasets-2.16.0.ebuild
similarity index 95%
rename from sci-libs/datasets/datasets-2.15.0.ebuild
rename to sci-libs/datasets/datasets-2.16.0.ebuild
index 52af2f93ac88..0325b5ae63d6 100644
--- a/sci-libs/datasets/datasets-2.15.0.ebuild
+++ b/sci-libs/datasets/datasets-2.16.0.ebuild
@@ -4,7 +4,7 @@
 EAPI=8
 
 DISTUTILS_USE_PEP517=setuptools
-PYTHON_COMPAT=( python3_{9..11} )
+PYTHON_COMPAT=( python3_{10..12} )
 DISTUTILS_SINGLE_IMPL=1
 inherit distutils-r1
 
@@ -36,7 +36,7 @@ RDEPEND="
 		dev-python/tqdm[${PYTHON_USEDEP}]
 		dev-python/xxhash[${PYTHON_USEDEP}]
 		dev-python/zstandard[${PYTHON_USEDEP}]
-		>=sci-libs/huggingface_hub-0.14.0[${PYTHON_USEDEP}]
+		sci-libs/huggingface_hub[${PYTHON_USEDEP}]
 		sci-libs/scikit-learn[${PYTHON_USEDEP}]
 	')
 "

diff --git a/sci-libs/datasets/files/datasets-2.15.0-tests.patch b/sci-libs/datasets/files/datasets-2.15.0-tests.patch
deleted file mode 100644
index 64d8dcfdc8d8..000000000000
--- a/sci-libs/datasets/files/datasets-2.15.0-tests.patch
+++ /dev/null
@@ -1,46 +0,0 @@
---- a/tests/test_arrow_dataset.py	2024-02-20 21:53:24.248470991 +0100
-+++ b/tests/test_arrow_dataset.py	2024-02-20 21:53:29.441804737 +0100
-@@ -3978,7 +3978,6 @@
-     [
-         "relative/path",
-         "/absolute/path",
--        "s3://bucket/relative/path",
-         "hdfs://relative/path",
-         "hdfs:///absolute/path",
-     ],
---- a/tests/test_hf_gcp.py	2024-02-20 21:55:18.821852434 +0100
-+++ b/tests/test_hf_gcp.py	2024-02-20 21:55:46.525186394 +0100
-@@ -22,7 +22,6 @@
-     {"dataset": "wikipedia", "config_name": "20220301.it"},
-     {"dataset": "wikipedia", "config_name": "20220301.simple"},
-     {"dataset": "snli", "config_name": "plain_text"},
--    {"dataset": "eli5", "config_name": "LFQA_reddit"},
-     {"dataset": "wiki40b", "config_name": "en"},
-     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.compressed"},
-     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.no_index"},
---- a/tests/test_inspect.py	2024-02-20 22:01:35.148488467 +0100
-+++ b/tests/test_inspect.py	2024-02-20 22:02:14.458561571 +0100
-@@ -15,7 +15,7 @@
- pytestmark = pytest.mark.integration
- 
- 
--@pytest.mark.parametrize("path", ["paws", "csv"])
-+@pytest.mark.parametrize("path", ["csv"])
- def test_inspect_dataset(path, tmp_path):
-     inspect_dataset(path, tmp_path)
-     script_name = path + ".py"
---- a/tests/test_load.py	2024-02-20 22:12:13.699209107 +0100
-+++ b/tests/test_load.py	2024-02-20 22:13:10.862626708 +0100
-@@ -1235,12 +1235,6 @@
- 
- 
- @pytest.mark.integration
--def test_load_streaming_private_dataset_with_zipped_data(hf_token, hf_private_dataset_repo_zipped_txt_data):
--    ds = load_dataset(hf_private_dataset_repo_zipped_txt_data, streaming=True, token=hf_token)
--    assert next(iter(ds)) is not None
--
--
--@pytest.mark.integration
- def test_load_dataset_config_kwargs_passed_as_arguments():
-     ds_default = load_dataset(SAMPLE_DATASET_IDENTIFIER4)
-     ds_custom = load_dataset(SAMPLE_DATASET_IDENTIFIER4, drop_metadata=True)

diff --git a/sci-libs/datasets/files/datasets-2.16.0-tests.patch b/sci-libs/datasets/files/datasets-2.16.0-tests.patch
new file mode 100644
index 000000000000..6b2845bce168
--- /dev/null
+++ b/sci-libs/datasets/files/datasets-2.16.0-tests.patch
@@ -0,0 +1,89 @@
+--- a/tests/test_arrow_dataset.py	2024-02-20 21:53:24.248470991 +0100
++++ b/tests/test_arrow_dataset.py	2024-02-20 21:53:29.441804737 +0100
+@@ -3982,7 +3982,6 @@
+     [
+         "relative/path",
+         "/absolute/path",
+-        "s3://bucket/relative/path",
+         "hdfs://relative/path",
+         "hdfs:///absolute/path",
+     ],
+--- a/tests/test_load.py	2024-02-20 22:12:13.699209107 +0100
++++ b/tests/test_load.py	2024-02-20 22:13:10.862626708 +0100
+@@ -386,21 +386,6 @@
+             hf_modules_cache=self.hf_modules_cache,
+         )
+ 
+-    def test_HubDatasetModuleFactoryWithScript_dont_trust_remote_code(self):
+-        # "squad" has a dataset script
+-        factory = HubDatasetModuleFactoryWithScript(
+-            "squad", download_config=self.download_config, dynamic_modules_path=self.dynamic_modules_path
+-        )
+-        with patch.object(config, "HF_DATASETS_TRUST_REMOTE_CODE", None):  # this will be the default soon
+-            self.assertRaises(ValueError, factory.get_module)
+-        factory = HubDatasetModuleFactoryWithScript(
+-            "squad",
+-            download_config=self.download_config,
+-            dynamic_modules_path=self.dynamic_modules_path,
+-            trust_remote_code=False,
+-        )
+-        self.assertRaises(ValueError, factory.get_module)
+-
+     def test_HubDatasetModuleFactoryWithScript_with_github_dataset(self):
+         # "wmt_t2t" has additional imports (internal)
+         factory = HubDatasetModuleFactoryWithScript(
+@@ -1235,12 +1235,6 @@
+ 
+ 
+ @pytest.mark.integration
+-def test_load_streaming_private_dataset_with_zipped_data(hf_token, hf_private_dataset_repo_zipped_txt_data):
+-    ds = load_dataset(hf_private_dataset_repo_zipped_txt_data, streaming=True, token=hf_token)
+-    assert next(iter(ds)) is not None
+-
+-
+-@pytest.mark.integration
+ def test_load_dataset_config_kwargs_passed_as_arguments():
+     ds_default = load_dataset(SAMPLE_DATASET_IDENTIFIER4)
+     ds_custom = load_dataset(SAMPLE_DATASET_IDENTIFIER4, drop_metadata=True)
+--- a/tests/test_hf_gcp.py	2024-02-21 09:59:26.918397895 +0100
++++ b/tests/test_hf_gcp.py	2024-02-21 09:59:46.335100597 +0100
+@@ -21,7 +21,6 @@
+     {"dataset": "wikipedia", "config_name": "20220301.frr"},
+     {"dataset": "wikipedia", "config_name": "20220301.it"},
+     {"dataset": "wikipedia", "config_name": "20220301.simple"},
+-    {"dataset": "eli5", "config_name": "LFQA_reddit"},
+     {"dataset": "wiki40b", "config_name": "en"},
+     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.compressed"},
+     {"dataset": "wiki_dpr", "config_name": "psgs_w100.nq.no_index"},
+--- a/tests/test_inspect.py	2024-02-21 10:03:32.315520016 +0100
++++ b/tests/test_inspect.py	2024-02-21 10:03:50.345553490 +0100
+@@ -18,7 +18,7 @@
+ pytestmark = pytest.mark.integration
+ 
+ 
+-@pytest.mark.parametrize("path", ["paws", csv.__file__])
++@pytest.mark.parametrize("path", [csv.__file__])
+ def test_inspect_dataset(path, tmp_path):
+     inspect_dataset(path, tmp_path)
+     script_name = Path(path).stem + ".py"
+--- a/tests/packaged_modules/test_cache.py	2024-02-21 12:04:18.036866572 +0100
++++ b/tests/packaged_modules/test_cache.py	2024-02-21 12:04:54.333558520 +0100
+@@ -44,18 +44,3 @@
+         Cache(dataset_name=text_dir.name, hash="missing").download_and_prepare()
+     with pytest.raises(ValueError):
+         Cache(dataset_name=text_dir.name, config_name="missing", version="auto", hash="auto").download_and_prepare()
+-
+-
+-@pytest.mark.integration
+-def test_cache_multi_configs():
+-    repo_id = SAMPLE_DATASET_TWO_CONFIG_IN_METADATA
+-    dataset_name = repo_id.split("/")[-1]
+-    config_name = "v1"
+-    ds = load_dataset(repo_id, config_name)
+-    cache = Cache(dataset_name=dataset_name, repo_id=repo_id, config_name=config_name, version="auto", hash="auto")
+-    reloaded = cache.as_dataset()
+-    assert list(ds) == list(reloaded)
+-    assert len(ds["train"]) == len(reloaded["train"])
+-    with pytest.raises(ValueError) as excinfo:
+-        Cache(dataset_name=dataset_name, repo_id=repo_id, config_name="missing", version="auto", hash="auto")
+-    assert config_name in str(excinfo.value)
