-=====================================================================-
-     Project IDFetch - Weekly report #8-9 ("Once upon a time...")    -
-=====================================================================-

Once upon a time there were no computers, and nobody knew what a gnome[1] and dEmons[2] looked like. Today even kids know this, but I still ran into a problem: I could not see a dEmon.

It all started when I was trying to play the "Roshambo game"[3] with the segget dEmon. At first I tried to fork[4] the curly[5] daemon twice, and it kept punching me in the nose, so I thought my TTL[6] would decrease rapidly. I realized that it is not easy to win a fight against someone you cannot see. And when the daemon gained Python[7-8] support and started to spawn[9] zombies[10], I got even more problems. My conscience told me that I must play by the rules, but my consciousness was sure that the daemon doesn't always abide by the protocol[11]. I tried to follow the thread[12], but the dEmon was running like a ghost[13], so I almost got lost in the thicket of logs[14] and trees[15] :-(

Anjuta[16] came to my rescue and helped me improve my tools, so I could see what the daemon was doing. Unfortunately, curses[17-18] usually don't work on dEmons, and I really needed pure magic to win this game. So I learned "Mutex"[19], "Rainbow Colors"[20] and a few other tricks. Meanwhile, I found myself knowing more and more about the dEmon, but that was not enough: I had to prepare good arguments if I was going to talk to it. Here they are:

1. For the segget daemon:
   Command line arguments:
     --no-daemon
     --conf-dir=specify_conf_dir_here
   Both arguments are optional. If no arguments are provided, segget runs in daemon mode and reads its configuration files from the /etc/seggetd dir.

2. For the request tool:
     --pkglist-file
   E.g.: $ request --pkglist-file=/home/user/mypkg.list

3. For tuiclient:
     --wait-distfile=distfile_name
   tuiclient checks the distfile status and returns when the distfile has been downloaded or is not in the queue.

By the way, here are the features added to the segget daemon, the request tool and tuiclient during this period:

1. DAEMON
=========

1.1. Options:
-------------
Add daemon mode to segget.
Add /etc/init.d/seggetd script to start|stop|restart|status the segget daemon.
Check all checksums that are set; checksums are optional.
Consider a distfile failed if any of its segments has failed.
Fixed: if only local mirrors were available and all of them failed to download a distfile, the distfile still had DWAITING status, because attempt_limit hadn't been reached.
Add CoralCDN support as an option to network#.conf files (section [mode]).
Add options FOLLOW_LOCATION and MAX_REDIRS to network#.conf files:

  FOLLOW_LOCATION
    SYNOPSIS: FOLLOW_LOCATION = 0 | 1
    Setting this to 1 tells segget to follow any Location: header that the server sends as part of the HTTP response headers. Segget will then re-send the same request to the new location and keep following new Location: headers until no more such headers are returned. MAX_REDIRS can be used to limit the number of redirects segget will follow.
    Default: follow_location=1

  MAX_REDIRS
    Sets the redirection limit. Once that many redirections have been followed, the next redirect causes an error. This option only makes sense if FOLLOW_LOCATION is enabled at the same time. Setting the limit to 0 makes segget refuse any redirect.
    Minimum value: 0
    Maximum value: 100
    Default: max_redirs=5

Add options BIND_LOCAL_PORT and BIND_LOCAL_PORT_RANGE to network#.conf files:

  BIND_LOCAL_PORT
    Sets the local port number of the socket used for the connection. This option can be used in combination with BIND_INTERFACE, and it is recommended to set BIND_LOCAL_PORT_RANGE as well when this option is set. Set to 0 to disable binding. Valid port numbers are 1 - 65535.
    Minimum value: 0 (no binding)
    Maximum value: 65535
    Default: bind_local_port=0

  BIND_LOCAL_PORT_RANGE
    Ignored if BIND_LOCAL_PORT=0. This is the number of attempts segget makes to find a working local port number: it starts with the given BIND_LOCAL_PORT and adds one to the number for each retry. Setting this to 1 or below makes segget try only the exact port number. Port numbers are by nature a scarce resource and will be busy at times, so setting this value too low might cause unnecessary connection setup failures.
    Minimum value: 1
    Maximum value: 65535
    Default: bind_local_port_range=20

Add option PROXY_TYPE to network#.conf files:

  PROXY_TYPE
    SYNOPSIS: PROXY_TYPE = 0 | 1 | 2 | 3 | 4 | 5
      0 - HTTP
      1 - HTTP_1_0
      2 - SOCKS4
      3 - SOCKS4a
      4 - SOCKS5
      5 - SOCKS5_HOSTNAME
    Specifies the type of the proxy.
    Default: proxy_type=0

1.2. Proxy-fetcher
------------------
Implement checks for both queues (proxy_fetcher and request_server).
There are 2 queues: the proxy_fetcher queue and the request_server queue.
Note: segget processes the request_server queue first and, if no segment was chosen there, switches to the proxy_fetcher queue. Before adding a distfile to either queue, it's necessary to check both queues, since the distfile may already be in one of them.

1.3. Python scripting
---------------------
Add [scripting_and_scheduling] section to segget.conf file.

[scripting_and_scheduling]
Segget provides Python scripting functionality to support scheduling. Each time segget tries to start a new connection on a certain network, it calls a python script (client.py) to accept or reject this connection and, if necessary, adjust its settings.

  PYTHON_PATH
    Defines the path to python.
    Default: python_path=/usr/bin/python

  SCRIPTS_DIR
    Defines the path to the dir with python scripts. Before establishing a connection for a particular segment via network#, segget checks SCRIPTS_DIR.
    If SCRIPTS_DIR contains a net#.py file, segget launches the schedule() function from this file to apply settings for the connection and to accept or reject this segment for the moment.
    A net#.py file is a python script with a user-written schedule() function. The functions must be imported (from functions import *) before get("variable"), set("variable", value), accept_segment() and reject_segment() can be used in schedule().
    get() can obtain values of the following variables:
      connection.num, connection.url, connection.max_speed_limit,
      network.num, network.mode, network.active_connections_count,
      distfile.name, distfile.size, distfile.dld_segments_count,
      distfile.segments_count, distfile.active_connections_count,
      segment.num, segment.try_num, segment.size, segment.range
    set() can change connection.max_speed_limit, see example:

-----------------EXAMPLE STARTS-----------------
from functions import *
import time

def schedule():
    localtime = time.localtime(time.time())
    hour = localtime[3]
    # disable downloading distfiles larger than 5 000 000 bytes
    # from 8-00 to 22-00
    if 8 <= hour < 22 and get("distfile.size") > 5000000:
        print("reject because distfile is too big")
        reject_segment()
        return  # don't fall through to accept_segment() below
    # set speed limit 50 000 cps for distfiles larger than 1 000 000 bytes
    if get("distfile.size") > 1000000:
        print("limit connection speed")
        set("connection.max_speed_limit", 50000)
    accept_segment()
-----------------EXAMPLE ENDS-----------------

    In the example above, localtime is a tuple with the following fields:
      Index  Attribute  Values
      0      tm_year    e.g.: 2008
      1      tm_mon     1 to 12
      2      tm_mday    1 to 31
      3      tm_hour    0 to 23
      4      tm_min     0 to 59
      5      tm_sec     0 to 61 (60 and 61 are leap seconds)
      6      tm_wday    0 to 6 (0 is Monday)
      7      tm_yday    1 to 366 (Julian day)
      8      tm_isdst   -1, 0, 1 (-1 means the library determines DST)
    Therefore localtime[3] is the hour.
    A segment is accepted by default if schedule() neither accepts nor rejects it.
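To try out the logic of a schedule() function without starting the daemon, it can be exercised against stand-ins. This is a minimal sketch under stated assumptions: FakeSegget, dry_run() and the hour parameter are hypothetical helpers written only for this illustration, not part of segget's functions module; they merely mimic the documented get()/set()/accept_segment()/reject_segment() calls and the accept-by-default rule. The rules are those of the example: reject distfiles larger than 5 000 000 bytes from 8-00 to 22-00, and cap the speed at 50 000 cps for distfiles larger than 1 000 000 bytes.

```python
# Hypothetical stand-ins for segget's get()/set()/accept_segment()/reject_segment().
# They only record what schedule() asked for; they are NOT segget's functions module.
class FakeSegget:
    def __init__(self, variables):
        self.variables = dict(variables)  # e.g. {"distfile.size": 6000000}
        self.decision = None              # "accept", "reject" or None

    def get(self, name):
        return self.variables[name]

    def set(self, name, value):
        self.variables[name] = value

    def accept_segment(self):
        self.decision = "accept"

    def reject_segment(self):
        self.decision = "reject"


def schedule(seg, hour):
    """The example's rules, with the hour passed in so any time can be tested."""
    if 8 <= hour < 22 and seg.get("distfile.size") > 5000000:
        seg.reject_segment()
        return
    if seg.get("distfile.size") > 1000000:
        seg.set("connection.max_speed_limit", 50000)
    seg.accept_segment()


def dry_run(size, hour):
    """Return (decision, resulting connection.max_speed_limit)."""
    seg = FakeSegget({"distfile.size": size, "connection.max_speed_limit": 0})
    schedule(seg, hour)
    # Mirror the documented default: accepted unless explicitly rejected.
    decision = seg.decision or "accept"
    return decision, seg.get("connection.max_speed_limit")


if __name__ == "__main__":
    for size, hour in [(6000000, 12), (6000000, 23), (2000000, 12), (500000, 12)]:
        print(size, hour, dry_run(size, hour))
```

Passing the hour in, instead of reading time.localtime() inside, is what makes the boundary cases (8-00 and 22-00) easy to check.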
    Segget saves the stdout and stderr of the scripts to the log folder, separately for each network. Hence, if there's an error in net3.py, the python error message is saved to net3_script_stderr.log, and the output of print goes to net3_script_stdout.log.
    Default: scripts_dir=./scripts

  SCRIPT_SOCKET_PATH
    Segget uses AF_UNIX domain sockets to communicate with python. Specify the path for the socket on your filesystem.
    Default: script_socket_path=/tmp/segget_script_socket

1.4. Logs
---------
Add "none" as an option for log files.
Add explanations of CURL error codes to logs.
Add options GENERAL_LOG_TIME_FORMAT, ERROR_LOG_TIME_FORMAT and DEBUG_LOG_TIME_FORMAT to segget.conf file:

  GENERAL_LOG_TIME_FORMAT
    Sets the time format for the general log as a string containing any combination of regular characters and special format specifiers. The format specifiers (the same as those of C's strftime() function) are replaced with the corresponding values of the logged time. They all begin with a percent (%) sign:
      %a  Abbreviated weekday name [For example: Thu]
      %A  Full weekday name [For example: Thursday]
      %b  Abbreviated month name [For example: Aug]
      %B  Full month name [For example: August]
      %c  Date and time representation [For example: Thu Aug 23 14:55:02 2001]
      %d  Day of the month (01-31) [For example: 23]
      %H  Hour in 24h format (00-23) [For example: 14]
      %I  Hour in 12h format (01-12) [For example: 02]
      %j  Day of the year (001-366) [For example: 235]
      %m  Month as a decimal number (01-12) [For example: 08]
      %M  Minute (00-59) [For example: 55]
      %p  AM or PM designation [For example: PM]
      %S  Second (00-61) [For example: 02]
      %U  Week number with the first Sunday as the first day of week one (00-53) [For example: 33]
      %w  Weekday as a decimal number with Sunday as 0 (0-6) [For example: 4]
      %W  Week number with the first Monday as the first day of week one (00-53) [For example: 34]
      %x  Date representation [For example: 08/23/01]
      %X  Time representation [For example: 14:55:02]
      %y  Year, last two digits (00-99) [For example: 01]
      %Y  Year [For example: 2001]
      %Z  Timezone name or abbreviation [For example: CDT]
      %%  A % sign [For example: %]
    For instance: general_log_time_format=Time: %m/%d %X
    Default: general_log_time_format=%m/%d %X

  ERROR_LOG_TIME_FORMAT
    Sets the time format for the error log as a string containing any combination of regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT for details on format specifiers.
    Default: error_log_time_format=%m/%d %X

  DEBUG_LOG_TIME_FORMAT
    Sets the time format for the debug log as a string containing any combination of regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT for details on format specifiers.
    Default: debug_log_time_format=%m/%d %X

2. REQUEST TOOL
===============
Add request tool. The request tool reads a list of distfiles from the ./pkg.list file and asks the seggetd daemon to download the distfiles on the list.

3. TUICLIENT
============
Add network_type for each connection to tui.
Add ETA, AVG speed and active/total connections to tui.
Add segment counters to stats and tui.
Add connection num to totals.
Add log and error_log windows to tuiclient.
Add distfiles window to tuiclient that shows the progress of distfile downloads, including their status: added/waiting/downloading/downloaded/failed/rejected by script etc.
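The FOLLOW_LOCATION and MAX_REDIRS rules from section 1.1 can be illustrated with a small model. This is a sketch under assumptions: resolve() and TooManyRedirects are hypothetical names, and a dict mapping a URL to its Location: header stands in for real HTTP responses. It shows that the limit counts redirects already followed, so max_redirs=0 refuses the very first redirect and a redirect loop ends in an error instead of running forever.

```python
class TooManyRedirects(Exception):
    """Raised when MAX_REDIRS redirects were followed and another one arrives."""


def resolve(url, locations, follow_location=1, max_redirs=5):
    """Return the URL whose response is finally used.

    locations maps a URL to the Location: header it answers with; it is a
    hypothetical stand-in for real HTTP responses.
    """
    followed = 0
    while url in locations:
        if not follow_location:
            return url  # FOLLOW_LOCATION=0: the redirect is not followed
        if followed >= max_redirs:
            raise TooManyRedirects(
                "limit of %d redirect(s) reached at %s" % (max_redirs, url))
        url = locations[url]
        followed += 1
    return url
```

For example, with {"http://a/x": "http://b/x", "http://b/x": "http://c/x"} the default settings end at http://c/x, while max_redirs=1 raises TooManyRedirects on the second redirect.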
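The BIND_LOCAL_PORT / BIND_LOCAL_PORT_RANGE retry rule from section 1.1 can be sketched the same way. candidate_ports() and bind_first_free() are hypothetical helpers built on Python's socket module; they follow the described behaviour (0 disables binding, ports are tried upwards from BIND_LOCAL_PORT, a range of 1 or below means a single try). The report does not say what happens when the range runs past 65535, so stopping there is an assumption of this sketch.

```python
import socket

MAX_PORT = 65535


def candidate_ports(bind_local_port, bind_local_port_range):
    """Ports tried for one connection, following the BIND_LOCAL_PORT* rules."""
    if bind_local_port == 0:
        return []  # 0 disables binding; BIND_LOCAL_PORT_RANGE is ignored
    attempts = max(bind_local_port_range, 1)  # 1 or below: only the exact port
    # Assumption: stop at 65535 (the report doesn't say what happens past it).
    last = min(bind_local_port + attempts - 1, MAX_PORT)
    return list(range(bind_local_port, last + 1))


def bind_first_free(host, bind_local_port, bind_local_port_range):
    """Bind a TCP socket to the first free candidate port.

    Returns the bound socket, or None when binding is disabled.
    Raises the last OSError if every candidate port is busy.
    """
    last_error = None
    for port in candidate_ports(bind_local_port, bind_local_port_range):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError as err:
            sock.close()
            last_error = err
    if last_error is not None:
        raise last_error
    return None
```

With the defaults (bind_local_port_range=20) and BIND_LOCAL_PORT=6000, ports 6000 through 6019 would be tried in order, which is why a very small range fails as soon as a single port happens to be busy.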
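The two-queue rule from section 1.2 (check both queues before adding; take a segment from request_server first and fall back to proxy_fetcher only if nothing was chosen there) can be modelled as follows. TwoQueues, add() and choose_segment() are hypothetical names for illustration, and pick_segment stands in for whatever logic segget uses to pick a free segment of a distfile; this is a sketch of the rule, not segget's actual implementation.

```python
from collections import deque


class TwoQueues:
    """Hypothetical model of segget's request_server and proxy_fetcher queues."""

    ORDER = ("request_server", "proxy_fetcher")  # request_server is processed first

    def __init__(self):
        self.queues = {name: deque() for name in self.ORDER}

    def contains(self, distfile):
        # A distfile may already sit in either queue, so look in both.
        return any(distfile in queue for queue in self.queues.values())

    def add(self, queue_name, distfile):
        """Add distfile to queue_name unless it is already queued anywhere."""
        if self.contains(distfile):
            return False
        self.queues[queue_name].append(distfile)
        return True

    def choose_segment(self, pick_segment):
        """Return (distfile, segment) or None.

        pick_segment(distfile) returns a segment to download or None;
        proxy_fetcher is only consulted if request_server yields nothing.
        """
        for name in self.ORDER:
            for distfile in self.queues[name]:
                segment = pick_segment(distfile)
                if segment is not None:
                    return distfile, segment
        return None
```

Checking both queues on every add() is linear in the queue length; keeping a set of queued names alongside the deques would make the duplicate check constant-time.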
[1] Gnome http://www.gnome.org/
[2] dEmon http://www.clker.com/cliparts/5/1/b/d/11954315391526924611beastie_freebsd_daemon_r_02.svg.med.png
[3] Roshambo game http://www.erikandanna.com/Humor/FlashStuff/SouthPark/roshamboN.swf
[4] fork http://en.wikipedia.org/wiki/Fork_%28software_development%29
[5] curl http://curl.haxx.se/
[6] TTL http://en.wikipedia.org/wiki/Time_to_live
[7] Python http://loyalkng.com/wp-content/uploads/2010/03/adam-apple-bizarro-cartoon-comic-tampon-chandelier-pc-mac-snake-eve.jpg
[8] Python http://www.python.org/
[9] spawn http://en.wikipedia.org/wiki/Spawn_(computing)
[10] zombies http://en.wikipedia.org/wiki/Zombie_process
[11] protocol http://en.wikipedia.org/wiki/Communications_protocol
[12] thread http://en.wikipedia.org/wiki/Thread_(computer_science)
[13] ghost http://www.youtube.com/watch?v=9WrEDyIzdjY (from the 3rd minute)
[14] logs http://www.nawwal.org/~mrgoff/photojournal/2004/winspr/pictures/03-20nurselog.jpg
[15] pstrees http://en.wikipedia.org/wiki/Pstree
[16] Anjuta http://www.anjuta.org/
[17] curses http://en.wikipedia.org/wiki/Curse
[18] Ncurses http://en.wikipedia.org/wiki/Ncurses
[19] Mutex http://en.wikipedia.org/wiki/Mutual_exclusion
[20] Rainbow Colors http://idfetch.isgreat.org/_content2/tuiclient_rainbow_colors.jpg (see the "DISTFILES" window)

Best regards,
Kostyantyn aka simka