* [gentoo-soc] Project IDFetch - Weekly report #8-9 ("Once upon a time...")
@ 2010-08-06 21:57 Kostyantyn Ovechko
From: Kostyantyn Ovechko @ 2010-08-06 21:57 UTC
To: gentoo-soc
-=====================================================================-
- Project IDFetch - Weekly report #8-9 ("Once upon a time...") -
-=====================================================================-
Once upon a time there were no computers, and nobody knew what a gnome[1]
and dEmons[2] look like. Today even kids know this, but I still bumped
into a problem: I can not see a dEmon. It all started when I was
trying to play "Roshambo game"[3] with the segget dEmon.
First, I was trying to fork[4] the curly[5] daemon twice and it kept
punching me in the nose, so I thought my TTL[6] would rapidly decrease.
I understood that it's not such an easy thing to win while fighting with
someone you can not see. And when the daemon obtained Python[7-8] support
and started to spawn[9] zombies[10], I got even more problems.
Conscience was telling me that I must play by the rules, but consciousness
was sure that the daemon doesn't always abide by the protocol[11]. I tried
to follow the thread[12-13], but the dEmon was running like a ghost[13],
so I almost got lost in the thicket of logs[14] and trees[15] :-(
Anjuta[16] came to my rescue and helped me to improve my tools, so I could
see what the daemon does. Unfortunately, curses[17-18] usually don't work
on dEmons, and I really needed pure magic to win this game. So I
learned "Mutex"[19], "Rainbow Colors"[20] and some other tricks.
In the meantime I found myself knowing more and more about the dEmon,
but this was not enough, and I had to prepare good arguments if I was
going to talk to the dEmon. Here they are:
1. For segget daemon:
Command line arguments:
--no-daemon
--conf-dir=specify_conf_dir_here
Arguments are optional. If no arguments are provided, segget runs in daemon
mode and reads its configuration files from the /etc/seggetd dir.
2. For request tool:
--pkglist-file
E.g.:
$ request --pkglist-file=/home/user/mypkg.list
3. For tuiclient:
--wait-distfile=distfile_name
tuiclient checks distfile status, and returns when distfile is downloaded or not in the queue.
Btw, here are the features added to the segget daemon and tuiclient during this
period of time:
1. DAEMON
=========
1.1. Options:
--------------
Add daemon mode to segget
Add /etc/init.d/seggetd script to start|stop|restart|status segget daemon
Check every checksum that is set; checksums are optional.
Consider a distfile failed if any of its segments has failed.
Fixed: if only local mirrors were available and all of them failed to download
a distfile, the distfile still had DWAITING status, because attempt_limit wasn't reached.
Add CoralCDN support as an option to network#.conf files (section [mode])
Add options FOLLOW_LOCATION and MAX_REDIRS to network#.conf files
SYNOPSIS: FOLLOW_LOCATION= 0 | 1
A parameter set to 1 tells segget to follow any Location: header that the server
sends as part of an HTTP header. This means that segget will re-send the
same request to the new location and keep following Location: headers
until no more such headers are returned. MAX_REDIRS can be used to limit the
number of redirects segget will follow.
Default:
follow_location=1
MAX_REDIRS
The set number will be the redirection limit. If that many redirections have
been followed, the next redirect will cause an error. This option only makes
sense if FOLLOW_LOCATION is enabled at the same time.
Setting the limit to 0 will make segget refuse any redirect.
Minimum value: 0
Maximum value: 100
Default:
max_redirs=5
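The interplay of FOLLOW_LOCATION and MAX_REDIRS described above can be sketched in Python. This is only an illustration of the stated rules, not segget's code: the follow() helper, the RedirectError exception and the locations dict (which stands in for the servers' Location: headers) are made-up names for this example.

```python
# Illustration only: `follow`, `RedirectError` and `locations` are
# hypothetical names, not part of segget.

class RedirectError(Exception):
    """Raised when one more redirect would exceed the configured limit."""


def follow(url, locations, follow_location=1, max_redirs=5):
    """Return the final URL after following redirects.

    `locations` maps a URL to the URL named in its Location: header.
    """
    followed = 0
    while url in locations:
        if not follow_location:
            # FOLLOW_LOCATION=0: redirects are not followed at all.
            return url
        if followed >= max_redirs:
            # The limit is used up, so the next redirect is an error.
            raise RedirectError("more than %d redirects" % max_redirs)
        url = locations[url]
        followed += 1
    return url


locations = {"http://a/": "http://b/", "http://b/": "http://c/"}
print(follow("http://a/", locations))
```

With max_redirs=0 the very first redirect raises RedirectError, which matches "refuse any redirect" above.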
Add BIND_LOCAL_PORT and BIND_LOCAL_PORT_RANGE options to network#.conf files
BIND_LOCAL_PORT
This sets the local port number of the socket used for the connection. This option
can be used in combination with BIND_INTERFACE, and it is recommended to
use BIND_LOCAL_PORT_RANGE as well when this is set. Set to 0 to disable
binding. Valid port numbers are 1 - 65535.
Minimum value: 0 (no binding)
Maximum value: 65535
Default:
bind_local_port=0
BIND_LOCAL_PORT_RANGE
If BIND_LOCAL_PORT=0 this option will be ignored.
This is the number of attempts segget should make to find a
working local port number. It starts with the given BIND_LOCAL_PORT and adds
one to the number for each retry. Setting this to 1 or below will make segget
do only one try for the exact port number. Port numbers by nature are scarce
resources that will be busy at times so setting this value to something too
low might cause unnecessary connection setup failures.
Minimum value: 1
Maximum value: 65535
Default:
bind_local_port_range=20
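To make the relation between BIND_LOCAL_PORT and BIND_LOCAL_PORT_RANGE concrete, here is a small Python sketch that lists the ports that would be tried, in order. candidate_ports() is a hypothetical helper written for this example, and stopping at port 65535 is an assumption, since the text above does not say what happens when the range runs past the last valid port.

```python
# Illustration only: `candidate_ports` is a hypothetical helper, not segget code.

MAX_PORT = 65535


def candidate_ports(bind_local_port, bind_local_port_range=20):
    """List the local ports that would be tried, in order."""
    if bind_local_port == 0:
        # 0 disables binding, so the range is ignored.
        return []
    # A range of 1 or below means a single try for the exact port.
    attempts = max(bind_local_port_range, 1)
    # Assumption: never go past the highest valid port number.
    last = min(bind_local_port + attempts - 1, MAX_PORT)
    return list(range(bind_local_port, last + 1))


print(candidate_ports(40000, 3))
```

So bind_local_port=40000 with the default range of 20 would cover ports 40000 to 40019.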
Add option proxy_type to network#.conf files
SYNOPSIS: PROXY_TYPE = 0 | 1 | 2 | 3 | 4 | 5
0 - HTTP
1 - HTTP_1_0
2 - SOCKS4
3 - SOCKS4a
4 - SOCKS5
5 - SOCKS5_HOSTNAME
Specify type of the proxy.
Default:
proxy_type=0
1.2. Proxy-fetcher
------------------
Implement checks for both (proxy_fetcher and request_server) queues.
There are 2 queues: the proxy_fetcher queue and the request_server queue.
Note: Segget processes the request_server queue first and, if no segment was
chosen, switches to the proxy_fetcher queue.
Before adding a distfile to either of the queues it's necessary to
check both queues, since the distfile may already be in one of them.
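The two rules above (check both queues before adding, serve request_server first) can be sketched in Python as follows. The Queues class and its method names are hypothetical, and the sketch picks whole distfiles rather than segments to stay short; segget itself chooses segments.

```python
# Illustration only: the class and method names are hypothetical.
from collections import deque


class Queues(object):
    def __init__(self):
        self.request_server = deque()
        self.proxy_fetcher = deque()

    def add(self, queue_name, distfile):
        """Queue a distfile unless either queue already holds it."""
        if distfile in self.request_server or distfile in self.proxy_fetcher:
            return False
        target = {"request_server": self.request_server,
                  "proxy_fetcher": self.proxy_fetcher}[queue_name]
        target.append(distfile)
        return True

    def next_distfile(self):
        """Serve request_server first, then fall back to proxy_fetcher."""
        for queue in (self.request_server, self.proxy_fetcher):
            if queue:
                return queue.popleft()
        return None


queues = Queues()
queues.add("proxy_fetcher", "foo-1.0.tar.bz2")
queues.add("request_server", "bar-2.1.tar.gz")
# Rejected: the file is already waiting in the proxy_fetcher queue.
print(queues.add("request_server", "foo-1.0.tar.bz2"))
```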
1.3. Python scripting
---------------------
Add [scripting_and_scheduling] section to segget.conf file.
[scripting_and_scheduling]
Segget provides Python scripting functionality to support scheduling.
Each time segget tries to start a new connection on a certain network it calls
a python script (client.py) to accept or reject this connection and,
if necessary, adjust its settings.
PYTHON_PATH
Define path to python
Default:
python_path=/usr/bin/python
SCRIPTS_DIR
Define a path to the dir with python scripts. Before establishing a connection for
a particular segment via network#, segget checks SCRIPTS_DIR.
If SCRIPTS_DIR contains a net#.py file, segget will launch the schedule() function
from this file to apply settings for the connection and accept or reject this
segment for the moment. The net#.py file is a python script file
with a user-written schedule() function.
It's necessary to import the functions module before using get("variable"),
set("variable",value), accept_segment() and reject_segment() in schedule().
get() function can obtain values for the following variables:
connection.num, connection.url, connection.max_speed_limit,
network.num, network.mode, network.active_connections_count,
distfile.name, distfile.size, distfile.dld_segments_count,
distfile.segments_count, distfile.active_connections_count,
segment.num, segment.try_num, segment.size, segment.range
set() function can change connection.max_speed_limit, see example:
-----------------EXAMPLE STARTS-----------------
from functions import *
import time

def schedule():
    localtime = time.localtime(time.time())
    hour = localtime[3]
    # disable downloading distfiles that have size more than 5 000 000 bytes
    # from 8-00 to 22-00.
    if hour >= 8 and hour < 22 and get("distfile.size") > 5000000:
        print("reject because distfile is too big")
        reject_segment()
        # stop here so the rejected segment is not accepted below
        return
    # set speed limit 50 000 cps for distfiles larger than 1 000 000 bytes
    if get("distfile.size") > 1000000:
        print("limit connection speed")
        set("connection.max_speed_limit", 50000)
    accept_segment()
-----------------EXAMPLE ENDS-----------------
In the example above, localtime holds the following tuple:
Index  Attribute  Values
0      tm_year    e.g.: 2008
1      tm_mon     1 to 12
2      tm_mday    1 to 31
3      tm_hour    0 to 23
4      tm_min     0 to 59
5      tm_sec     0 to 61 (60 or 61 are leap-seconds)
6      tm_wday    0 to 6 (0 is Monday)
7      tm_yday    1 to 366 (Julian day)
8      tm_isdst   -1, 0, 1, -1 means library determines DST
Therefore localtime[3] provides hours.
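The fields of the tuple can also be read by name, which may be easier to follow in a schedule() function than numeric indexes. A short check in plain Python (nothing segget-specific here):

```python
import time

# A fixed timestamp keeps the example reproducible.
now = time.localtime(0)

# Index 3 and the tm_hour attribute refer to the same field.
print(now[3] == now.tm_hour)
# Likewise index 6 is tm_wday, where 0 means Monday.
print(now[6] == now.tm_wday)
```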
The segment will be accepted by default if it was neither accepted nor rejected
during the schedule() function.
Segget saves the resulting stdout and stderr in the log folder
separately for each network. Hence, if there's an error in the net3.py file, the python
error message is saved to net3_script_stderr.log. Output of print is
saved to net3_script_stdout.log.
Default:
scripts_dir=./scripts
SCRIPT_SOCKET_PATH
Segget uses AF_UNIX domain sockets for communication with python.
Specify path for the socket on your filesystem.
Default:
script_socket_path=/tmp/segget_script_socket
1.4 Logs
--------
Add "none" as an option for log files.
Add explanations for CURL error codes to logs.
Add options: GENERAL_LOG_TIME_FORMAT, ERROR_LOG_TIME_FORMAT and DEBUG_LOG_TIME_FORMAT to segget.conf file
GENERAL_LOG_TIME_FORMAT
Set time format for general log as a string containing any combination of
regular characters and special format specifiers. These format specifiers are
replaced with the corresponding values of the time being logged.
They all begin with a percentage (%) sign, and are:
%a Abbreviated weekday name [For example: Thu]
%A Full weekday name [For example: Thursday]
%b Abbreviated month name [For example: Aug]
%B Full month name [For example: August]
%c Date and time representation [For example: Thu Aug 23 14:55:02 2001]
%d Day of the month (01-31) [For example: 23]
%H Hour in 24h format (00-23) [For example: 14]
%I Hour in 12h format (01-12) [For example: 02]
%j Day of the year (001-366) [For example: 235]
%m Month as a decimal number (01-12) [For example: 08]
%M Minute (00-59) [For example: 55]
%p AM or PM designation [For example: PM]
%S Second (00-61) [For example: 02]
%U Week number with the first Sunday
as the first day of week one (00-53) [For example: 33]
%w Weekday as a decimal number with
Sunday as 0 (0-6) [For example: 4]
%W Week number with the first Monday as
the first day of week one (00-53) [For example: 34]
%x Date representation [For example: 08/23/01]
%X Time representation [For example: 14:55:02]
%y Year, last two digits (00-99) [For example: 01]
%Y Year [For example: 2001]
%Z Timezone name or abbreviation [For example: CDT]
%% A % sign [For example: %]
For instance: general_log_time_format=Time: %m/%d %X
Default:
general_log_time_format=%m/%d %X
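The specifiers listed above are the same ones Python's time.strftime() understands, so a format string can be previewed before it goes into segget.conf. This is a sketch under the assumption that segget hands the configured string to strftime unchanged; the sample moment is the one used in the specifier table.

```python
import time

# The sample moment from the specifier table: Thu Aug 23 14:55:02 2001.
moment = time.strptime("2001-08-23 14:55:02", "%Y-%m-%d %H:%M:%S")

default_format = "%m/%d %X"
custom_format = "Time: %m/%d %X"

# %X is locale-dependent, so the time part may differ between systems.
print(time.strftime(default_format, moment))
print(time.strftime(custom_format, moment))
```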
ERROR_LOG_TIME_FORMAT
Set time format for error log as a string containing any combination of
regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT
for details on format specifiers.
Default:
error_log_time_format=%m/%d %X
DEBUG_LOG_TIME_FORMAT
Set time format for debug log as a string containing any combination of
regular characters and special format specifiers. See GENERAL_LOG_TIME_FORMAT
for details on format specifiers.
Default:
debug_log_time_format=%m/%d %X
2. REQUEST TOOL
===============
Add request tool.
The request tool reads a list of distfiles from the ./pkg.list file and asks
the seggetd daemon to download the distfiles from the list.
3. TUICLIENT
============
Add network_type for each connection to tui.
Add ETA, AVG speed and active/total connections to tui.
Add segments counters to stats and tui.
Add connection num to totals.
Add log and error_log windows to tuiclient
Add distfiles window to tuiclient that shows progress on distfile downloads,
including its status: added/waiting/downloading/downloaded/failed/rejected by script etc.
[1] Gnome http://www.gnome.org/
[2] dEmon http://www.clker.com/cliparts/5/1/b/d/11954315391526924611beastie_freebsd_daemon_r_02.svg.med.png
[3] Roshambo game http://www.erikandanna.com/Humor/FlashStuff/SouthPark/roshamboN.swf
[4] fork http://en.wikipedia.org/wiki/Fork_%28software_development%29
[5] curl http://curl.haxx.se/
[6] TTL http://en.wikipedia.org/wiki/Time_to_live
[7] Python http://loyalkng.com/wp-content/uploads/2010/03/adam-apple-bizarro-cartoon-comic-tampon-chandelier-pc-mac-snake-eve.jpg
[8] Python http://www.python.org/
[9] spawn http://en.wikipedia.org/wiki/Spawn_(computing)
[10] zombies http://en.wikipedia.org/wiki/Zombie_process
[11] protocol http://en.wikipedia.org/wiki/Communications_protocol
[12] thread http://en.wikipedia.org/wiki/Thread_(computer_science)
[13] ghost http://www.youtube.com/watch?v=9WrEDyIzdjY from the 3rd minute
[14] logs http://www.nawwal.org/~mrgoff/photojournal/2004/winspr/pictures/03-20nurselog.jpg
[15] pstrees http://en.wikipedia.org/wiki/Pstree
[16] Anjuta http://www.anjuta.org/
[17] curses http://en.wikipedia.org/wiki/Curse
[18] Ncurses http://en.wikipedia.org/wiki/Ncurses
[19] Mutex http://en.wikipedia.org/wiki/Mutual_exclusion
[20] Rainbow Colors http://idfetch.isgreat.org/_content2/tuiclient_rainbow_colors.jpg see "DISTFILES" window.
Best regards,
Kostyantyn aka simka