From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 17694 invoked by uid 1002); 30 Mar 2003 10:47:07 -0000 Mailing-List: contact gentoo-dev-help@gentoo.org; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-Id: Gentoo Linux mail X-BeenThere: gentoo-dev@gentoo.org Received: (qmail 4106 invoked from network); 30 Mar 2003 10:47:06 -0000 Date: Sun, 30 Mar 2003 02:47:05 -0800 From: "Robin H. Johnson" To: George Shapovalov Cc: gentoo-dev@gentoo.org Message-ID: <20030330104705.GA20613@cherenkov.orbis-terrarum.net> Mail-Followup-To: George Shapovalov , gentoo-dev@gentoo.org References: <20030330063505.GA19969@cherenkov.orbis-terrarum.net> <20030330075150.GA20237@cherenkov.orbis-terrarum.net> <200303300113.24841.george@gentoo.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="NzB8fVQJ5HfG6fxh" Content-Disposition: inline In-Reply-To: <200303300113.24841.george@gentoo.org> User-Agent: Mutt/1.5.3i Subject: Re: [gentoo-dev] Portage programming question X-Archives-Salt: 0264b9ef-8c78-451b-8a29-fa8f562a372f X-Archives-Hash: fa7ac75b44818af923b0ee0326ef7d35 --NzB8fVQJ5HfG6fxh Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Sun, Mar 30, 2003 at 01:13:24AM -0800, George Shapovalov wrote: > On Saturday 29 March 2003 23:51, Robin H. Johnson wrote: > > I'm going to write some C++ or Java instead for what I want. Python > > seems to lack the Sets and other sequence datatypes that would be > > greatly useful. > Well, not sure what are you referring to, but python sure does have seque= nce=20 > types and sets. In fact all sequence types provide set abilities. > You might want to take a look here: > http://www.python.org/doc/current/lib/typesseq.html > and here: > http://www.python.org/doc/current/tut/node7.html I looked at those originally when learning python properly earlier today. My definitions: A LIST is a group of unordered and possibly duplicate items. As in LINKED LIST. Commonly O(n) time to access any item. O(n/2) or better if cleverly implemented. O(n) time for add/remove/contains/size. O(1) for add/remove (after search). An ARRAY is a LIST with additional pointers to each element allowing constant time O(1) access to each element. O(n) time for arbitrary add/remove.=20 A SET is a group of items that are all unique and are accessed in naturally sorted order. O(1) time for access/add/remove/contains/size. A MAP is defined as a SET with each element being the key to another data structure. (Python provides this as a dictionary). Same big-O as SET. A BIMAP is best explained as a pair of SETs doubly linked in MAPs. It is MAP that can be used in both directions, with significent memory, speed and consistancy improvements, over just using two MAPs. An explaination of BIMAP: A is the set of (A1,A2,A3) B is the set of (B1,B2,B3) BIMAP.parents() might be: {A1=3D>(B1,B3),A2=3D>(B2),A3=3D(B2,B3)} BIMAP.children() would then be: {B1=3D>(A1),B2=3D>(A2,A3),B3=3D(A1,A3)} The actual data of A and B is stored seperately and references to them are used since each item will appear at least twice. BIMAPs offer the major advantage that an element from set of data can be used as a key to obtain related items in constant amortized time. O(1) for almost all operations. Hashtable implementation of SET/MAP/BIMAP is cruicial for high performance. Python's implementation of a MAP (dictionary) implemented via hash tables, but I can't find anything on tweaking the load factor and initial capacity, which makes a significent difference on data sets of large but known sizes. For these structures, various iterators are required as well, as this is the best way to traverse the data. Functional programming methods to apply to the data is also required, such as (Python provides these first three) filter, map, reduce, (Python doesn't seem to have these last three) set union, set difference, set intersection. For a SET, I considered using MAP and just ignoring the data field, but that seems like a terrible waste of cpu and memory. I find it really strange that Python doesn't have a SET while it does have a MAP, esp. given that MAPs are a logical extension of SETs. It seems Python did at one stage consider adding SETS: http://www.python.org/peps/pep-0218.html But I can't make much sense of the outcome. It seems I would have to implement BIMAP myself, since there isn't one in Python that I can find at all. I would admit that it is significently less common data structure than a list/map/set. I did have to write my own Java implementation of it, but it is available in other functional programming languages (Mercury provides a very good example). BIMAP is the main structure that I would be using, but SET and MAP also to a slightly lesser degree. I want to create a BIMAP containing the relative paths to each ebuild as one key, and the IUSE data from that ebuild as the other key. --=20 Robin Hugh Johnson E-Mail : robbat2@orbis-terrarum.net Home Page : http://www.orbis-terrarum.net/?l=3Dpeople.robbat2 ICQ# : 30269588 or 41961639 GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85 --NzB8fVQJ5HfG6fxh Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.0 (GNU/Linux) Comment: Robbat2 @ Orbis-Terrarum Networks iD8DBQE+hsspsnuUTjSIToURAn1FAJoC9iTZ4V46OiicBASKJP40pwoI4wCfU9zD 5TiSxwhyEWNG8J/MaELqtWo= =Y9Sn -----END PGP SIGNATURE----- --NzB8fVQJ5HfG6fxh--