* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
2013-08-12 15:41 [gentoo-commits] proj/g-sorcery:master " Jauhien Piatlicki
@ 2013-08-12 15:40 ` Jauhien Piatlicki
0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-12 15:40 UTC
To: gentoo-commits
commit: 82a4549a97cd62c2896e8ba7d15409fc526b0a82
Author: Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Mon Aug 12 15:40:39 2013 +0000
Commit: Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Mon Aug 12 15:40:39 2013 +0000
URL: http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=82a4549a
gs_pypi/pypi_db: fix URI and parsing
---
gs_pypi/pypi_db.py | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 1feb59e..1a42f92 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -24,7 +24,7 @@ class PypiDBGenerator(DBGenerator):
def get_download_uries(self, common_config, config):
self.repo_uri = config["repo_uri"]
- return [{"uri": self.repo_uri + "?%3Aaction=index", "output": "packages"}]
+ return [{"uri": self.repo_uri + "pypi?%3Aaction=index", "output": "packages"}]
def parse_data(self, data_f):
soup = bs4.BeautifulSoup(data_f.read())
@@ -47,7 +47,7 @@ class PypiDBGenerator(DBGenerator):
"parser": self.parse_package_page,
"output": package + "-" + version})
pkg_uries = self.decode_download_uries(pkg_uries)
- for uri in pkg_uries:
+ for uri in pkg_uries[:10]:
while True:
try:
self.process_uri(uri, data)
@@ -91,8 +91,14 @@ class PypiDBGenerator(DBGenerator):
"pyversion": file_pyversion,
"uploaded": file_uploaded,
"size": file_size})
-
- for ul in soup("ul", class_ = "nodot")[:1]:
+
+ uls = soup("ul", class_ = "nodot")
+ if uls:
+ if "Downloads (All Versions):" in uls[0]("strong")[0].string:
+ ul = uls[1]
+ else:
+ ul = uls[0]
+
for entry in ul.contents:
if not hasattr(entry, "name") or entry.name != "li":
continue
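
Note on the URI change in this patch: the index address is built by plain string concatenation, so it is only well formed when `repo_uri` ends with a slash. A minimal sketch of a slash-tolerant variant is below. The helper name `index_uri` and the example host are hypothetical and not part of g-sorcery; only the `pypi?%3Aaction=index` suffix comes from the patch.

```python
def index_uri(repo_uri):
    """Return the package index URI for a repository root.

    Hypothetical helper: accepts the root with or without a trailing
    slash, so the configuration value does not have to be exact.
    """
    return repo_uri.rstrip("/") + "/pypi?%3Aaction=index"


# Both spellings of the (made-up) root give the same address.
print(index_uri("https://pypi.example.org"))
print(index_uri("https://pypi.example.org/"))
```

Whether the real configuration always carries the trailing slash is not stated in the thread, so this is a defensive choice rather than a required fix.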
* [gentoo-commits] proj/g-sorcery:master commit in: gs_pypi/
@ 2013-08-12 18:38 Jauhien Piatlicki
2013-08-12 18:38 ` [gentoo-commits] proj/g-sorcery:pypi " Jauhien Piatlicki
0 siblings, 1 reply; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-12 18:38 UTC
To: gentoo-commits
commit: 5db4a26f73b50f8ef398709dc605b20191dacb5e
Author: Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Mon Aug 12 18:37:57 2013 +0000
Commit: Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Mon Aug 12 18:37:57 2013 +0000
URL: http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=5db4a26f
gs_pypi/pypi_db: fix parsing and store info in database
---
gs_pypi/pypi_db.py | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 52e83e3..ee5c2d5 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -63,7 +63,10 @@ class PypiDBGenerator(DBGenerator):
data = {}
data["files"] = []
data["info"] = {}
- for table in soup("table")[-1:]:
+ for table in soup("table", class_ = "list")[-1:]:
+ if not "File" in table("th")[0].string:
+ continue
+
for entry in table("tr")[1:-1]:
fields = entry("td")
@@ -151,10 +154,12 @@ class PypiDBGenerator(DBGenerator):
continue
files_src_uri = ""
+ md5 = ""
if pkg_data["files"]:
for file_entry in pkg_data["files"]:
if file_entry["type"] == "\n Source\n ":
files_src_uri = file_entry["url"]
+ md5 = file_entry["md5"]
break
download_url = ""
@@ -205,5 +210,8 @@ class PypiDBGenerator(DBGenerator):
ebuild_data["homepage"] = homepage
ebuild_data["license"] = license
ebuild_data["source_uri"] = source_uri
+ ebuild_data["md5"] = md5
+
+ ebuild_data["info"] = info
pkg_db.add_package(Package(category, filtered_package, filtered_version), ebuild_data)
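
Note on the md5 handling in this patch: the checksum travels in the fragment of each file link (`URL#md5=HEX`), and the source file is found by comparing the type cell against a string with exact surrounding whitespace. The sketch below shows one way to make both steps less brittle, assuming the link and field layout seen in the diffs of this thread. The function names `split_file_link` and `pick_source_file` are hypothetical and do not exist in g-sorcery.

```python
from urllib.parse import parse_qs, urldefrag


def split_file_link(href):
    """Split 'URL#md5=HEX' into (url, md5); md5 is '' when absent."""
    url, fragment = urldefrag(href)
    md5 = parse_qs(fragment).get("md5", [""])[0]
    return url, md5


def pick_source_file(files):
    """Return (url, md5) of the first entry whose type is 'Source'.

    The type is stripped before comparing, so the match does not
    depend on the whitespace that surrounds the table cell text.
    """
    for entry in files:
        if (entry.get("type") or "").strip() == "Source":
            return entry["url"], entry["md5"]
    return "", ""


url, md5 = split_file_link("https://example.org/f/foo-1.0.tar.gz#md5=0123abcd")
files = [{"type": "\n Source\n ", "url": url, "md5": md5}]
print(pick_source_file(files))
```

Returning empty strings when nothing matches mirrors the `files_src_uri = ""` and `md5 = ""` defaults in the patch.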
* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
@ 2013-08-12 18:38 Jauhien Piatlicki
0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-12 18:38 UTC
To: gentoo-commits
commit: 78ec3e8545829801ec9e0bc97e0fdf84c23e328b
Author: Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Mon Aug 12 15:45:13 2013 +0000
Commit: Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Mon Aug 12 15:45:13 2013 +0000
URL: http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=78ec3e85
gs_pypi/pypi_db: remove accidentally commited debug code
---
gs_pypi/pypi_db.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 1a42f92..52e83e3 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -47,7 +47,7 @@ class PypiDBGenerator(DBGenerator):
"parser": self.parse_package_page,
"output": package + "-" + version})
pkg_uries = self.decode_download_uries(pkg_uries)
- for uri in pkg_uries[:10]:
+ for uri in pkg_uries:
while True:
try:
self.process_uri(uri, data)
* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
@ 2013-08-14 8:23 Jauhien Piatlicki
0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-14 8:23 UTC
To: gentoo-commits
commit: 8bce5c1e20223170b8569405f214cd266d0d606f
Author: Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Wed Aug 14 08:20:04 2013 +0000
Commit: Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Wed Aug 14 08:20:04 2013 +0000
URL: http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=8bce5c1e
gs_pypi/pypi_db: ignore errors during package page parsing
---
gs_pypi/pypi_db.py | 122 ++++++++++++++++++++++++++++-------------------------
1 file changed, 64 insertions(+), 58 deletions(-)
diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index ee5c2d5..5db6c59 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -63,66 +63,72 @@ class PypiDBGenerator(DBGenerator):
data = {}
data["files"] = []
data["info"] = {}
- for table in soup("table", class_ = "list")[-1:]:
- if not "File" in table("th")[0].string:
- continue
-
- for entry in table("tr")[1:-1]:
- fields = entry("td")
-
- FILE = 0
- URL = 0
- MD5 = 1
-
- TYPE = 1
- PYVERSION = 2
- UPLOADED = 3
- SIZE = 4
-
- file_inf = fields[FILE]("a")[0]["href"].split("#")
- file_url = file_inf[URL]
- file_md5 = file_inf[MD5][4:]
-
- file_type = fields[TYPE].string
- file_pyversion = fields[PYVERSION].string
- file_uploaded = fields[UPLOADED].string
- file_size = fields[SIZE].string
-
- data["files"].append({"url": file_url,
- "md5": file_md5,
- "type": file_type,
- "pyversion": file_pyversion,
- "uploaded": file_uploaded,
- "size": file_size})
-
- uls = soup("ul", class_ = "nodot")
- if uls:
- if "Downloads (All Versions):" in uls[0]("strong")[0].string:
- ul = uls[1]
- else:
- ul = uls[0]
-
- for entry in ul.contents:
- if not hasattr(entry, "name") or entry.name != "li":
- continue
- entry_name = entry("strong")[0].string
- if not entry_name:
+ try:
+ for table in soup("table", class_ = "list")[-1:]:
+ if not "File" in table("th")[0].string:
continue
- if entry_name == "Categories":
- data["info"][entry_name] = {}
- for cat_entry in entry("a"):
- cat_data = cat_entry.string.split(" :: ")
- data["info"][entry_name][cat_data[0]] = cat_data[1:]
- continue
-
- if entry("span"):
- data["info"][entry_name] = entry("span")[0].string
- continue
-
- if entry("a"):
- data["info"][entry_name] = entry("a")[0]["href"]
- continue
+ for entry in table("tr")[1:-1]:
+ fields = entry("td")
+
+ FILE = 0
+ URL = 0
+ MD5 = 1
+
+ TYPE = 1
+ PYVERSION = 2
+ UPLOADED = 3
+ SIZE = 4
+
+ file_inf = fields[FILE]("a")[0]["href"].split("#")
+ file_url = file_inf[URL]
+ file_md5 = file_inf[MD5][4:]
+
+ file_type = fields[TYPE].string
+ file_pyversion = fields[PYVERSION].string
+ file_uploaded = fields[UPLOADED].string
+ file_size = fields[SIZE].string
+
+ data["files"].append({"url": file_url,
+ "md5": file_md5,
+ "type": file_type,
+ "pyversion": file_pyversion,
+ "uploaded": file_uploaded,
+ "size": file_size})
+
+ uls = soup("ul", class_ = "nodot")
+ if uls:
+ if "Downloads (All Versions):" in uls[0]("strong")[0].string:
+ ul = uls[1]
+ else:
+ ul = uls[0]
+
+ for entry in ul.contents:
+ if not hasattr(entry, "name") or entry.name != "li":
+ continue
+ entry_name = entry("strong")[0].string
+ if not entry_name:
+ continue
+
+ if entry_name == "Categories":
+ data["info"][entry_name] = {}
+ for cat_entry in entry("a"):
+ cat_data = cat_entry.string.split(" :: ")
+ data["info"][entry_name][cat_data[0]] = cat_data[1:]
+ continue
+
+ if entry("span"):
+ data["info"][entry_name] = entry("span")[0].string
+ continue
+
+ if entry("a"):
+ data["info"][entry_name] = entry("a")[0]["href"]
+ continue
+
+ except Exception as error:
+ print("There was an error during parsing: " + str(error))
+ print("Ignoring this package.")
+ data = {}
return data
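
Note on the error handling in this patch: the whole page parser is wrapped in one `try` block, and any exception discards the package by returning an empty dict. The same idea can be sketched as a small wrapper, shown below. The name `parse_safely` is hypothetical, and the narrower exception tuple is an assumption about what indexing into scraped HTML typically raises; the patch itself catches `Exception`.

```python
def parse_safely(parser, page):
    """Run parser(page); on a parsing failure report it and return {}.

    Hypothetical wrapper. Only the error types that missing tags,
    attributes or list items usually produce are caught, so that
    unrelated bugs still surface.
    """
    try:
        return parser(page)
    except (AttributeError, IndexError, KeyError, TypeError) as error:
        print("There was an error during parsing: " + str(error))
        print("Ignoring this package.")
        return {}


# A parser that fails on a page without the expected key.
print(parse_safely(lambda page: {"files": page["rows"][1:]}, {}))
```

A caller then has to treat an empty result as "skip this package", which is what returning `{}` in the patch implies.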
* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
@ 2013-08-14 8:23 Jauhien Piatlicki
0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-14 8:23 UTC
To: gentoo-commits
commit: 97fabb7e1423bcd18a3c70148dc705bf2018bfaf
Author: Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Wed Aug 14 08:23:32 2013 +0000
Commit: Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Wed Aug 14 08:23:32 2013 +0000
URL: http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=97fabb7e
gs_pypi/pypi_db: sleep after page downloading failed
---
gs_pypi/pypi_db.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 5db6c59..9963b4e 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -13,6 +13,7 @@
import datetime
import re
+import time
import bs4
@@ -53,6 +54,7 @@ class PypiDBGenerator(DBGenerator):
self.process_uri(uri, data)
except DownloadingError as error:
print(str(error))
+ time.sleep(2)
continue
break
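
Note on the retry loop in this patch: a failed download is retried forever, now with a two-second pause between attempts. A minimal sketch of a bounded variant follows. `fetch_with_retry`, its parameters, and the local `DownloadingError` class are stand-ins written for this example; the real exception comes from g-sorcery, and its definition is not shown in this thread.

```python
import time


class DownloadingError(Exception):
    """Stand-in for the project's exception of the same name."""


def fetch_with_retry(fetch, uri, attempts=5, delay=2.0, sleep=time.sleep):
    """Call fetch(uri), retrying on DownloadingError.

    Sleeps `delay` seconds between attempts and re-raises the last
    error once `attempts` tries have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(uri)
        except DownloadingError as error:
            print(str(error))
            if attempt == attempts:
                raise
            sleep(delay)
```

Passing `sleep` as a parameter keeps the pause testable without waiting; the limit of five attempts is an arbitrary choice for the sketch.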