public inbox for gentoo-commits@lists.gentoo.org
* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
  2013-08-12 15:41 [gentoo-commits] proj/g-sorcery:master " Jauhien Piatlicki
@ 2013-08-12 15:40 ` Jauhien Piatlicki
  0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-12 15:40 UTC
  To: gentoo-commits

commit:     82a4549a97cd62c2896e8ba7d15409fc526b0a82
Author:     Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Mon Aug 12 15:40:39 2013 +0000
Commit:     Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Mon Aug 12 15:40:39 2013 +0000
URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=82a4549a

gs_pypi/pypi_db: fix URI and parsing

---
 gs_pypi/pypi_db.py | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 1feb59e..1a42f92 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -24,7 +24,7 @@ class PypiDBGenerator(DBGenerator):
 
     def get_download_uries(self, common_config, config):
         self.repo_uri = config["repo_uri"]
-        return [{"uri": self.repo_uri + "?%3Aaction=index", "output": "packages"}]
+        return [{"uri": self.repo_uri + "pypi?%3Aaction=index", "output": "packages"}]
 
     def parse_data(self, data_f):
         soup = bs4.BeautifulSoup(data_f.read())
@@ -47,7 +47,7 @@ class PypiDBGenerator(DBGenerator):
                               "parser": self.parse_package_page,
                               "output": package + "-" + version})
         pkg_uries = self.decode_download_uries(pkg_uries)
-        for uri in pkg_uries:
+        for uri in pkg_uries[:10]:
             while True:
                 try:
                     self.process_uri(uri, data)
@@ -91,8 +91,14 @@ class PypiDBGenerator(DBGenerator):
                                       "pyversion": file_pyversion,
                                       "uploaded": file_uploaded,
                                       "size": file_size})
-                
-        for ul in soup("ul", class_ = "nodot")[:1]:
+
+        uls = soup("ul", class_ = "nodot")
+        if uls:
+            if "Downloads (All Versions):" in uls[0]("strong")[0].string:
+                ul = uls[1]
+            else:
+                ul = uls[0]
+
             for entry in ul.contents:
                 if not hasattr(entry, "name") or entry.name != "li":
                     continue
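
The last hunk above skips a leading "Downloads (All Versions):" list before reading the package metadata. A minimal sketch of that selection on plain strings, assuming each list is represented by the text of its first <strong> element. The name pick_info_list is hypothetical, and the length check is an addition that is not part of the commit (the committed code indexes uls[1] unconditionally):

```python
def pick_info_list(headings):
    """Return the index of the list holding package metadata, or None.

    `headings` holds the text of the first <strong> element of each
    <ul class="nodot"> list, in page order (None if it has no plain text).
    """
    if not headings:
        return None
    first = headings[0] or ""
    if "Downloads (All Versions):" in first:
        # Download statistics come first; metadata is in the next list,
        # provided the page actually has one.
        return 1 if len(headings) > 1 else None
    return 0


print(pick_info_list(["Downloads (All Versions):", "Author:"]))
```

Returning None when only the statistics list is present lets the caller skip the metadata step rather than fail on a missing second list.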



* [gentoo-commits] proj/g-sorcery:master commit in: gs_pypi/
@ 2013-08-12 18:38 Jauhien Piatlicki
  2013-08-12 18:38 ` [gentoo-commits] proj/g-sorcery:pypi " Jauhien Piatlicki
  0 siblings, 1 reply; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-12 18:38 UTC
  To: gentoo-commits

commit:     5db4a26f73b50f8ef398709dc605b20191dacb5e
Author:     Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Mon Aug 12 18:37:57 2013 +0000
Commit:     Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Mon Aug 12 18:37:57 2013 +0000
URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=5db4a26f

gs_pypi/pypi_db: fix parsing and store info in database

---
 gs_pypi/pypi_db.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 52e83e3..ee5c2d5 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -63,7 +63,10 @@ class PypiDBGenerator(DBGenerator):
         data = {}
         data["files"] = []
         data["info"] = {}
-        for table in soup("table")[-1:]:
+        for table in soup("table", class_ = "list")[-1:]:
+            if not "File" in table("th")[0].string:
+                continue
+
             for entry in table("tr")[1:-1]:
                 fields = entry("td")
                 
@@ -151,10 +154,12 @@ class PypiDBGenerator(DBGenerator):
                 continue
 
             files_src_uri = ""
+            md5 = ""
             if pkg_data["files"]:
                 for file_entry in pkg_data["files"]:
                     if file_entry["type"] == "\n    Source\n  ":
                         files_src_uri = file_entry["url"]
+                        md5 = file_entry["md5"]
                         break
 
             download_url = ""
@@ -205,5 +210,8 @@ class PypiDBGenerator(DBGenerator):
             ebuild_data["homepage"] = homepage
             ebuild_data["license"] = license
             ebuild_data["source_uri"] = source_uri
+            ebuild_data["md5"] = md5
+
+            ebuild_data["info"] = info
 
             pkg_db.add_package(Package(category, filtered_package, filtered_version), ebuild_data)
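
The second hunk above takes the URL and MD5 of the first file whose type cell reads "Source". A rough sketch of that lookup as a standalone function, assuming file entries shaped like the dictionaries built in parse_package_page. The name pick_source and the sample data are hypothetical, and stripping whitespace before the comparison is a deviation from the commit, which matches the exact cell text "\n    Source\n  ":

```python
def pick_source(files):
    """Return (url, md5) of the first source distribution, or ("", "")."""
    for entry in files:
        # Compare the stripped cell text so layout whitespace does not matter.
        if (entry.get("type") or "").strip() == "Source":
            return entry["url"], entry["md5"]
    return "", ""


# Hypothetical entries for illustration only.
files = [
    {"type": "\n    Python Egg\n  ", "url": "http://example.org/foo-1.0.egg", "md5": "aaa"},
    {"type": "\n    Source\n  ", "url": "http://example.org/foo-1.0.tar.gz", "md5": "bbb"},
]
print(pick_source(files))
```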



* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
@ 2013-08-12 18:38 Jauhien Piatlicki
  0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-12 18:38 UTC
  To: gentoo-commits

commit:     78ec3e8545829801ec9e0bc97e0fdf84c23e328b
Author:     Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Mon Aug 12 15:45:13 2013 +0000
Commit:     Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Mon Aug 12 15:45:13 2013 +0000
URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=78ec3e85

gs_pypi/pypi_db: remove accidentally committed debug code

---
 gs_pypi/pypi_db.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 1a42f92..52e83e3 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -47,7 +47,7 @@ class PypiDBGenerator(DBGenerator):
                               "parser": self.parse_package_page,
                               "output": package + "-" + version})
         pkg_uries = self.decode_download_uries(pkg_uries)
-        for uri in pkg_uries[:10]:
+        for uri in pkg_uries:
             while True:
                 try:
                     self.process_uri(uri, data)



* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
@ 2013-08-14  8:23 Jauhien Piatlicki
  0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-14  8:23 UTC
  To: gentoo-commits

commit:     8bce5c1e20223170b8569405f214cd266d0d606f
Author:     Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Wed Aug 14 08:20:04 2013 +0000
Commit:     Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Wed Aug 14 08:20:04 2013 +0000
URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=8bce5c1e

gs_pypi/pypi_db: ignore errors during package page parsing

---
 gs_pypi/pypi_db.py | 122 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 64 insertions(+), 58 deletions(-)

diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index ee5c2d5..5db6c59 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -63,66 +63,72 @@ class PypiDBGenerator(DBGenerator):
         data = {}
         data["files"] = []
         data["info"] = {}
-        for table in soup("table", class_ = "list")[-1:]:
-            if not "File" in table("th")[0].string:
-                continue
-
-            for entry in table("tr")[1:-1]:
-                fields = entry("td")
-                
-                FILE = 0
-                URL = 0
-                MD5 = 1
-                
-                TYPE = 1
-                PYVERSION = 2
-                UPLOADED = 3
-                SIZE = 4
-                
-                file_inf = fields[FILE]("a")[0]["href"].split("#")
-                file_url = file_inf[URL]
-                file_md5 = file_inf[MD5][4:]
-
-                file_type = fields[TYPE].string
-                file_pyversion = fields[PYVERSION].string
-                file_uploaded = fields[UPLOADED].string
-                file_size = fields[SIZE].string
-
-                data["files"].append({"url": file_url,
-                                      "md5": file_md5,
-                                      "type": file_type,
-                                      "pyversion": file_pyversion,
-                                      "uploaded": file_uploaded,
-                                      "size": file_size})
-
-        uls = soup("ul", class_ = "nodot")
-        if uls:
-            if "Downloads (All Versions):" in uls[0]("strong")[0].string:
-                ul = uls[1]
-            else:
-                ul = uls[0]
-
-            for entry in ul.contents:
-                if not hasattr(entry, "name") or entry.name != "li":
-                    continue
-                entry_name = entry("strong")[0].string
-                if not entry_name:
+        try:
+            for table in soup("table", class_ = "list")[-1:]:
+                if not "File" in table("th")[0].string:
                     continue
 
-                if entry_name == "Categories":
-                    data["info"][entry_name] = {}
-                    for cat_entry in entry("a"):
-                        cat_data = cat_entry.string.split(" :: ")
-                        data["info"][entry_name][cat_data[0]] = cat_data[1:]
-                    continue
-
-                if entry("span"):
-                    data["info"][entry_name] = entry("span")[0].string
-                    continue
-
-                if entry("a"):
-                    data["info"][entry_name] = entry("a")[0]["href"]
-                    continue
+                for entry in table("tr")[1:-1]:
+                    fields = entry("td")
+
+                    FILE = 0
+                    URL = 0
+                    MD5 = 1
+
+                    TYPE = 1
+                    PYVERSION = 2
+                    UPLOADED = 3
+                    SIZE = 4
+
+                    file_inf = fields[FILE]("a")[0]["href"].split("#")
+                    file_url = file_inf[URL]
+                    file_md5 = file_inf[MD5][4:]
+
+                    file_type = fields[TYPE].string
+                    file_pyversion = fields[PYVERSION].string
+                    file_uploaded = fields[UPLOADED].string
+                    file_size = fields[SIZE].string
+
+                    data["files"].append({"url": file_url,
+                                          "md5": file_md5,
+                                          "type": file_type,
+                                          "pyversion": file_pyversion,
+                                          "uploaded": file_uploaded,
+                                          "size": file_size})
+
+            uls = soup("ul", class_ = "nodot")
+            if uls:
+                if "Downloads (All Versions):" in uls[0]("strong")[0].string:
+                    ul = uls[1]
+                else:
+                    ul = uls[0]
+
+                for entry in ul.contents:
+                    if not hasattr(entry, "name") or entry.name != "li":
+                        continue
+                    entry_name = entry("strong")[0].string
+                    if not entry_name:
+                        continue
+
+                    if entry_name == "Categories":
+                        data["info"][entry_name] = {}
+                        for cat_entry in entry("a"):
+                            cat_data = cat_entry.string.split(" :: ")
+                            data["info"][entry_name][cat_data[0]] = cat_data[1:]
+                        continue
+
+                    if entry("span"):
+                        data["info"][entry_name] = entry("span")[0].string
+                        continue
+
+                    if entry("a"):
+                        data["info"][entry_name] = entry("a")[0]["href"]
+                        continue
+
+        except Exception as error:
+            print("There was an error during parsing: " + str(error))
+            print("Ignoring this package.")
+            data = {}
 
         return data
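
The change above wraps the whole page parser in a try block so that a malformed page yields an empty result instead of aborting the run. A minimal sketch of the same pattern as a reusable wrapper, assuming the parser is any callable. The names parse_or_skip and parse_categories are hypothetical; the category splitting mirrors the " :: " handling in the diff, operating on plain strings rather than parsed HTML:

```python
def parse_categories(classifiers):
    """Map the first " :: " component of each classifier to the rest."""
    categories = {}
    for text in classifiers:
        parts = text.split(" :: ")
        # As in the commit, a repeated first component overwrites the
        # entry stored for it earlier.
        categories[parts[0]] = parts[1:]
    return categories


def parse_or_skip(parser, page):
    """Run parser(page); on any error, report it and return {}."""
    try:
        return parser(page)
    except Exception as error:
        print("There was an error during parsing: " + str(error))
        print("Ignoring this package.")
        return {}


print(parse_or_skip(parse_categories, ["Development Status :: 4 - Beta"]))
print(parse_or_skip(parse_categories, [None]))  # None has no split(), so {} is returned
```

Catching Exception this broadly trades diagnosability for robustness: one bad page no longer stops database generation, but genuine parser bugs are only visible in the printed message.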
 



* [gentoo-commits] proj/g-sorcery:pypi commit in: gs_pypi/
@ 2013-08-14  8:23 Jauhien Piatlicki
  0 siblings, 0 replies; 6+ messages in thread
From: Jauhien Piatlicki @ 2013-08-14  8:23 UTC
  To: gentoo-commits

commit:     97fabb7e1423bcd18a3c70148dc705bf2018bfaf
Author:     Jauhien Piatlicki (jauhien) <piatlicki <AT> gmail <DOT> com>
AuthorDate: Wed Aug 14 08:23:32 2013 +0000
Commit:     Jauhien Piatlicki <piatlicki <AT> gmail <DOT> com>
CommitDate: Wed Aug 14 08:23:32 2013 +0000
URL:        http://git.overlays.gentoo.org/gitweb/?p=proj/g-sorcery.git;a=commit;h=97fabb7e

gs_pypi/pypi_db: sleep after page downloading failed

---
 gs_pypi/pypi_db.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gs_pypi/pypi_db.py b/gs_pypi/pypi_db.py
index 5db6c59..9963b4e 100644
--- a/gs_pypi/pypi_db.py
+++ b/gs_pypi/pypi_db.py
@@ -13,6 +13,7 @@
 
 import datetime
 import re
+import time
 
 import bs4
 
@@ -53,6 +54,7 @@ class PypiDBGenerator(DBGenerator):
                     self.process_uri(uri, data)
                 except DownloadingError as error:
                     print(str(error))
+                    time.sleep(2)
                     continue
                 break
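
The change above pauses for two seconds before retrying a failed download. A sketch of that retry loop in isolation, under some assumptions: DownloadingError here is a local stand-in for the project's exception, fetch_with_retry and its parameters are hypothetical, and the max_attempts cap is an addition (the loop in the commit retries without limit):

```python
import time


class DownloadingError(Exception):
    """Stand-in for the project's downloading error type."""


def fetch_with_retry(fetch, uri, delay=2, max_attempts=5):
    """Call fetch(uri), sleeping `delay` seconds after each failed attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(uri)
        except DownloadingError as error:
            print(str(error))
            if attempt == max_attempts:
                # Give up and let the caller see the last error.
                raise
            time.sleep(delay)


calls = []


def flaky(uri):
    """Fail twice, then succeed, to exercise the retry path."""
    calls.append(uri)
    if len(calls) < 3:
        raise DownloadingError("temporary failure for " + uri)
    return "ok"


print(fetch_with_retry(flaky, "http://example.org/pkg", delay=0))
```

The fixed delay keeps the behaviour close to the commit; a capped attempt count avoids spinning forever when a page is permanently unavailable.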
 


