public inbox for gentoo-soc@lists.gentoo.org
* [gentoo-soc] Project IDFetch - Weekly report #8-9 ("Once upon a time...")
@ 2010-08-06 21:57 Kostyantyn Ovechko
From: Kostyantyn Ovechko @ 2010-08-06 21:57 UTC
  To: gentoo-soc

-=====================================================================-
-    Project IDFetch - Weekly report #8-9 ("Once upon a time...")     - 
-=====================================================================-

Once upon a time there were no computers, and nobody knew what a gnome[1]
and dEmons[2] look like. Today even kids know this, but I still bumped
into a problem: I can not see a dEmon. It all started when I was
trying to play "Roshambo game"[3] with the segget dEmon.

Firstly, I was trying to fork[4] the curly[5] daemon twice and it kept
punching me in the nose, so I thought my TTL[6] would rapidly decrease.
I understood that it's not such an easy thing to win while fighting with
someone you can not see. And when the daemon obtained Python[7-8] support
and started to spawn[9] zombies[10], I got even more problems.
Conscience was telling me that I must play by the rules, but consciousness
was sure that the daemon doesn't always abide by the protocol[11]. I tried
to follow the thread[12], but the dEmon was running like a ghost[13],
so I almost got myself lost in the thicket of logs[14] and trees[15] :-(

Anjuta[16] came to my rescue and helped me to improve my tools, so I could
see what the daemon does. Unfortunately, curses[17-18] usually don't work
on dEmons, and I really needed pure magic to win this game. So I
learned "Mutex"[19], "Rainbow Colors"[20] and some other tricks.

In the meantime I found myself knowing more and more about the dEmon,
but this was not enough and I had to prepare good arguments if I was
going to talk to the dEmon. Here they are:

1. For segget daemon:
    Command line arguments: 
        --no-daemon
        --conf-dir=specify_conf_dir_here
    Arguments are optional. If no arguments are provided, segget runs in
    daemon mode and reads its configuration files from the /etc/seggetd dir.

2. For request tool:
        --pkglist-file
    E.g.:
           $ request --pkglist-file=/home/user/mypkg.list

3. For tuiclient:
       --wait-distfile=distfile_name
    tuiclient checks the distfile status and returns when the distfile is
    downloaded or is not in the queue.
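
    The waiting behaviour of --wait-distfile can be pictured with a minimal
    sketch. This is not tuiclient's code: get_status() is a hypothetical
    callback, and the status strings and polling interval are assumptions
    made only for illustration.

```python
import time


def wait_distfile(name, get_status, interval=1.0, sleep=time.sleep):
    """Block until `name` is downloaded or is no longer in the queue.

    get_status(name) is a hypothetical callback: it returns a status
    string, or None when the distfile is not in the queue.
    """
    while True:
        status = get_status(name)
        if status is None:
            return "not in queue"
        if status == "downloaded":
            return "downloaded"
        # Any other status (assumed: waiting, downloading, ...) means
        # the download is still pending, so poll again later.
        sleep(interval)
```

    A caller would pass a function that asks the daemon for the status; how
    tuiclient actually does that is not shown here.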


Btw, here are the features added to the segget daemon and tuiclient during
this period of time:

1. DAEMON
=========
1.1. Options:
--------------
    Add daemon mode to segget
    Add /etc/init.d/seggetd script to start|stop|restart|status segget daemon
    Check all checksums that are set; checksums are optional.
    Consider a distfile failed if one of its segments has failed.
    Fixed: if only local mirrors were available and all of them failed to download
    a distfile, the distfile still had DWAITING status, because attempt_limit wasn't reached.

    Add CoralCDN support as an option to network#.conf files (section [mode])

    Add options FOLLOW_LOCATION and MAX_REDIRS to network#.conf files
    
    SYNOPSIS: FOLLOW_LOCATION = 0 | 1
    A parameter set to 1 tells segget to follow any Location: header that the server
    sends as part of an HTTP header. This means that segget will re-send the
    same request to the new location and follow new Location: headers all the way
    until no more such headers are returned. MAX_REDIRS can be used to limit the
    number of redirects segget will follow.
    Default:
    follow_location=1
    
    MAX_REDIRS
    The set number will be the redirection limit. If that many redirections have
    been followed, the next redirect will cause an error. This option only makes
    sense if FOLLOW_LOCATION is used at the same time.
    Setting the limit to 0 will make segget refuse any redirect.
    Minimum value: 0
    Maximum value: 100
    Default:
    max_redirs=5
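
    How FOLLOW_LOCATION and MAX_REDIRS interact can be sketched as follows.
    This is only an illustration of the rules described above, not segget's
    code: resolve_url(), TooManyRedirects and the redirects table (a dict
    standing in for servers that answer with a Location: header) are
    hypothetical names.

```python
class TooManyRedirects(Exception):
    """Raised when one more redirect would exceed the limit."""


def resolve_url(url, redirects, follow_location=1, max_redirs=5):
    """Return the URL that ends up being fetched.

    `redirects` maps a URL to the Location: it redirects to; a URL
    that is missing from the dict is served without a redirect.
    """
    followed = 0
    while url in redirects:
        if not follow_location:
            # Location: headers are not followed at all.
            return url
        if followed >= max_redirs:
            # The limit is already used up, so the next redirect is an error.
            raise TooManyRedirects("limit of %d redirects reached" % max_redirs)
        url = redirects[url]
        followed += 1
    return url
```

    With max_redirs=0 the very first redirect raises the error, which matches
    "refuse any redirect" above.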

    Add BIND_LOCAL_PORT and BIND_LOCAL_PORT_RANGE options to network#.conf files
    
    BIND_LOCAL_PORT
    This sets the local port number of the socket used for the connection. This option
    can be used in combination with BIND_INTERFACE, and it is recommended to
    use BIND_LOCAL_PORT_RANGE as well when this is set. Set to 0 to disable
    binding. Valid port numbers are 1 - 65535.
    Minimum value: 0 (no binding)
    Maximum value: 65535
    Default:
    bind_local_port=0
    
    BIND_LOCAL_PORT_RANGE
    If BIND_LOCAL_PORT=0 this option will be ignored.
    This is the number of attempts segget should make to find a
    working local port number. It starts with the given BIND_LOCAL_PORT and adds
    one to the number for each retry. Setting this to 1 or below will make segget
    do only one try for the exact port number. Port numbers by nature are scarce
    resources that will be busy at times so setting this value to something too
    low might cause unnecessary connection setup failures.
    Minimum value: 1
    Maximum value: 65535
    Default:
    bind_local_port_range=20
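
    The retry rule of BIND_LOCAL_PORT_RANGE can be illustrated by listing
    the ports that would be tried. A minimal sketch, not segget's code:
    candidate_ports() is a hypothetical helper, and stopping at port 65535
    is an assumption.

```python
def candidate_ports(bind_local_port, bind_local_port_range=20):
    """List the local ports to try, in order."""
    if bind_local_port == 0:
        # Binding is disabled, so the range option is ignored.
        return []
    # A range of 1 or below means a single try for the exact port.
    attempts = max(1, bind_local_port_range)
    # Assumption: never go past the highest valid port number.
    last = min(bind_local_port + attempts - 1, 65535)
    return list(range(bind_local_port, last + 1))
```

    For instance, candidate_ports(50000, 3) lists 50000, 50001 and 50002.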

    Add option PROXY_TYPE to network#.conf files
    
    SYNOPSIS: PROXY_TYPE = 0 | 1 | 2 | 3 | 4 | 5
    0 - HTTP
    1 - HTTP_1_0
    2 - SOCKS4
    3 - SOCKS4a
    4 - SOCKS5
    5 - SOCKS5_HOSTNAME
    Specify type of the proxy.
    Default:
    proxy_type=0
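
    For scripts that need to read this option, the numbering above can be
    kept in a small lookup table. A sketch only: PROXY_TYPES and
    parse_proxy_type() are hypothetical names, not part of segget.

```python
# Numbering taken from the PROXY_TYPE synopsis above.
PROXY_TYPES = {
    0: "HTTP",
    1: "HTTP_1_0",
    2: "SOCKS4",
    3: "SOCKS4a",
    4: "SOCKS5",
    5: "SOCKS5_HOSTNAME",
}


def parse_proxy_type(value="0"):
    """Turn a proxy_type setting into its name; the default is 0 (HTTP)."""
    number = int(value)
    if number not in PROXY_TYPES:
        raise ValueError("proxy_type must be 0..5, got %r" % (value,))
    return PROXY_TYPES[number]
```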

1.2. Proxy-fetcher
------------------
    Implement checks for both (proxy_fetcher and request_server) queues.
    
    There are 2 queues: the proxy_fetcher queue and the request_server queue.
    
    Note: segget processes the request_server queue first and, if no segment
    was chosen, switches to the proxy_fetcher queue.
    
    Before adding a distfile to any of the queues it's necessary to
    check both queues, since the distfile may already be in one of them.
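
    The two rules above (check both queues before adding, serve
    request_server first) can be sketched like this. It is a simplified
    model, not segget's code: the Queues class and its methods are
    hypothetical, and it hands out whole distfiles instead of choosing
    segments.

```python
from collections import deque


class Queues(object):
    """Simplified model of the request_server and proxy_fetcher queues."""

    def __init__(self):
        self.request_server = deque()
        self.proxy_fetcher = deque()

    def contains(self, distfile):
        # A distfile may already sit in either queue, so look in both.
        return distfile in self.request_server or distfile in self.proxy_fetcher

    def add(self, distfile, queue_name):
        """Add to 'request_server' or 'proxy_fetcher' unless already queued."""
        if self.contains(distfile):
            return False
        getattr(self, queue_name).append(distfile)
        return True

    def next_distfile(self):
        # request_server is processed first; proxy_fetcher is the fallback.
        for queue in (self.request_server, self.proxy_fetcher):
            if queue:
                return queue.popleft()
        return None
```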

1.3. Python scripting
---------------------
    Add [scripting_and_scheduling] section to segget.conf file.
    [scripting_and_scheduling]
    Segget provides Python scripting functionality to support scheduling.
    Each time segget tries to start a new connection via a certain network, it calls
    a python script (client.py) to accept or reject this connection and,
    if necessary, adjust its settings.
    
    PYTHON_PATH
    Define path to python
    Default:
    python_path=/usr/bin/python
    
    SCRIPTS_DIR
    Define a path to the dir with python scripts. Before establishing a connection for
    a particular segment via network#, segget checks SCRIPTS_DIR.
    If SCRIPTS_DIR contains a net#.py file, segget will launch the schedule() function
    from this file to apply settings for the connection and accept or reject this
    segment for the moment. The net#.py file is a python script file
    with a user-written schedule() function.
    It's necessary to import functions (from functions import *) before using
    get("variable"), set("variable",value), accept_segment() and reject_segment()
    in schedule().
    The get() function can obtain values for the following variables:
    connection.num, connection.url, connection.max_speed_limit,
    network.num, network.mode, network.active_connections_count,
    distfile.name, distfile.size, distfile.dld_segments_count,
    distfile.segments_count, distfile.active_connections_count,
    segment.num, segment.try_num, segment.size, segment.range
    The set() function can change connection.max_speed_limit, see example:
    -----------------EXAMPLE STARTS-----------------
    from functions import *
    import time

    def schedule():
        localtime = time.localtime(time.time())
        hour = localtime[3]
        # disable downloading distfiles that have size more than 5 000 000 bytes
        # from 8-00 to 22-00.
        if hour >= 8 and hour < 22 and get("distfile.size") > 5000000:
            print("reject because distfile is too big")
            reject_segment()
            # stop here, so the rejected segment is not accepted below
            return
        # set speed limit 50 000 cps for distfiles larger than 1 000 000 bytes
        if get("distfile.size") > 1000000:
            print("limit connection speed")
            # the variable name is passed as a string: set("variable", value)
            set("connection.max_speed_limit", 50000)
            accept_segment()
    -----------------EXAMPLE ENDS-----------------
    In the example above, localtime holds the following tuple:
    Index  Attributes       Values
      0     tm_year         e.g. 2008
      1     tm_mon          1 to 12
      2     tm_mday         1 to 31
      3     tm_hour         0 to 23
      4     tm_min          0 to 59
      5     tm_sec          0 to 61 (60 or 61 are leap-seconds)
      6     tm_wday         0 to 6 (0 is Monday)
      7     tm_yday         1 to 366 (Julian day)
      8     tm_isdst        -1, 0, 1, -1 means library determines DST
    Therefore localtime[3] provides hours.
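
    As a side note, the value returned by Python's time.localtime() can also
    be read by attribute name, which avoids remembering the index. A small
    sketch using only the standard time module:

```python
import time

localtime = time.localtime(time.time())

# Index 3 and the tm_hour attribute refer to the same field.
hour_by_index = localtime[3]
hour_by_name = localtime.tm_hour
```

    Either form can be used inside schedule().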
    The segment will be accepted by default if it was neither accepted nor rejected
    during the schedule() function.
    segget saves logs of the resulting stdout and stderr in the log folder
    separately for each network. Hence, if there's an error in the net3.py file, the python
    error message will be saved to net3_script_stderr.log. Results of print will
    be saved to net3_script_stdout.log.
    Default:
    scripts_dir=./scripts
    
    SCRIPT_SOCKET_PATH
    Segget uses AF_UNIX domain sockets for communication with python.
    Specify path for the socket on your filesystem.
    Default:
    script_socket_path=/tmp/segget_script_socket

1.4. Logs
--------
    Add "none" as an option for log files.

    Add explanations for CURL error codes to logs. 

    Add options: GENERAL_LOG_TIME_FORMAT, ERROR_LOG_TIME_FORMAT and DEBUG_LOG_TIME_FORMAT to segget.conf file
    
    GENERAL_LOG_TIME_FORMAT
    Set time format for general log as a string containing any combination of
    regular characters and special format specifiers. These format specifiers are
    replaced by the corresponding values of the time being logged (they match
    those of the C strftime() function). They all begin with a percent (%) sign, and are:
    %a Abbreviated weekday name             [For example: Thu]
    %A Full weekday name                    [For example: Thursday]
    %b Abbreviated month name               [For example: Aug]
    %B Full month name                      [For example: August]
    %c Date and time representation         [For example: Thu Aug 23 14:55:02 2001]
    %d Day of the month (01-31)             [For example: 23]
    %H Hour in 24h format (00-23)           [For example: 14]
    %I Hour in 12h format (01-12)           [For example: 02]
    %j Day of the year (001-366)            [For example: 235]
    %m Month as a decimal number (01-12)    [For example: 08]
    %M Minute (00-59)                       [For example: 55]
    %p AM or PM designation                 [For example: PM]
    %S Second (00-61)                       [For example: 02]
    %U Week number with the first Sunday
       as the first day of week one (00-53) [For example: 33]
    %w Weekday as a decimal number with
       Sunday as 0 (0-6)                    [For example: 4]
    %W Week number with the first Monday as
       the first day of week one (00-53)    [For example: 34]
    %x Date representation                  [For example: 08/23/01]
    %X Time representation                  [For example: 14:55:02]
    %y Year, last two digits (00-99)        [For example: 01]
    %Y Year                                 [For example: 2001]
    %Z Timezone name or abbreviation        [For example: CDT]
    %% A % sign                             [For example: %]
    
    For instance: general_log_time_format=Time: %m/%d %X
    
    Default:
    general_log_time_format=%m/%d %X
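
    A format string can be tried out before putting it into segget.conf,
    since Python's time.strftime() accepts the same specifiers. A minimal
    sketch: log_prefix() and the log message are made up for illustration,
    and the exact text produced by %X depends on the locale.

```python
import time


def log_prefix(time_format, moment):
    """Render `moment` (a time tuple) with a *_LOG_TIME_FORMAT style string."""
    return time.strftime(time_format, moment)


# The timestamp used in the specifier table above.
moment = time.strptime("2001-08-23 14:55:02", "%Y-%m-%d %H:%M:%S")

# Default format: month/day followed by the locale's time representation.
prefix = log_prefix("%m/%d %X", moment)
line = prefix + " " + "hypothetical log message"
```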
    
    ERROR_LOG_TIME_FORMAT
    Set time format for error log as a string containing any combination of
    regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT
    for details on format specifiers.
    Default:
    error_log_time_format=%m/%d %X
    
    DEBUG_LOG_TIME_FORMAT
    Set time format for debug log as a string containing any combination of
    regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT
    for details on format specifiers.
    Default:
    debug_log_time_format=%m/%d %X

2. REQUEST TOOL
===============
    Add request tool.

    The request tool reads a list of distfiles from the ./pkg.list file and
    requests the seggetd daemon to download the distfiles from the list.

3. TUICLIENT
============
    Add network_type for each connection to tui.
    Add ETA, AVG speed and active/total connections to tui.
    Add segments counters to stats and tui.
    Add connection num to totals.
    Add log and error_log windows to tuiclient.
    Add distfiles window to tuiclient that shows progress on distfile downloads,
    including its status: added/waiting/downloading/downloaded/failed/rejected by script etc.

[1]  Gnome http://www.gnome.org/
[2]  dEmon http://www.clker.com/cliparts/5/1/b/d/11954315391526924611beastie_freebsd_daemon_r_02.svg.med.png
[3]  Roshambo game http://www.erikandanna.com/Humor/FlashStuff/SouthPark/roshamboN.swf
[4]  fork http://en.wikipedia.org/wiki/Fork_%28software_development%29
[5]  curl http://curl.haxx.se/
[6]  TTL http://en.wikipedia.org/wiki/Time_to_live
[7]  Python http://loyalkng.com/wp-content/uploads/2010/03/adam-apple-bizarro-cartoon-comic-tampon-chandelier-pc-mac-snake-eve.jpg
[8]  Python http://www.python.org/
[9]  spawn http://en.wikipedia.org/wiki/Spawn_(computing)
[10] zombies http://en.wikipedia.org/wiki/Zombie_process
[11] protocol http://en.wikipedia.org/wiki/Communications_protocol
[12] thread http://en.wikipedia.org/wiki/Thread_(computer_science)
[13] ghost http://www.youtube.com/watch?v=9WrEDyIzdjY from the 3rd minute
[14] logs http://www.nawwal.org/~mrgoff/photojournal/2004/winspr/pictures/03-20nurselog.jpg
[15] pstrees http://en.wikipedia.org/wiki/Pstree
[16] Anjuta http://www.anjuta.org/
[17] curses http://en.wikipedia.org/wiki/Curse
[18] Ncurses http://en.wikipedia.org/wiki/Ncurses
[19] Mutex http://en.wikipedia.org/wiki/Mutual_exclusion
[20] Rainbow Colors http://idfetch.isgreat.org/_content2/tuiclient_rainbow_colors.jpg see "DISTFILES" window.

Best regards,
Kostyantyn aka simka
