public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] Trying to automate HTML ---> pdf
From: felix @ 2008-01-27 17:06 UTC
  To: gentoo-user

I am trying to automate converting a URL into a pdf file.  These web
pages include JavaScript and fancy formatting, so the simple-minded
converters just don't cut it.  My next plan was to hack up a real
browser so it would take two command line args, the URL and the print
file, render the page, print it to the pdf file, and exit.  From what
I know of some of them, they would have to be configured in advance,
and invocation would have to be strictly controlled so only one
instance runs at a time, at least per user.  I could probably create
several Firefox user sessions and run them all
simultaneously, but multiple real users would work for me too.
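
A minimal sketch of the one-instance-at-a-time constraint, assuming a
POSIX shell: a lock directory created with mkdir (an atomic operation)
guards whatever command drives the browser.  The name with_lock, the
lock path, and the exit status 75 are illustrative choices, not
anything provided by Konqueror or Firefox.

```shell
# Run a command only if no other run holds the lock directory.
# mkdir either creates the directory or fails, so two callers can
# never both acquire the lock.
with_lock() {
    wl_dir=$1
    shift
    # Someone else holds the lock: give up at once with a non-zero
    # status (75 is an arbitrary choice for this sketch).
    mkdir "$wl_dir" 2>/dev/null || return 75
    wl_status=0
    "$@" || wl_status=$?
    # Release the lock.  If the process is killed before this point
    # the directory stays behind and must be removed by hand.
    rmdir "$wl_dir"
    return "$wl_status"
}
```

A call might look like `with_lock "$HOME/.url2pdf.lock" ./url2pdf.pl URL`,
where url2pdf.pl stands for a hypothetical driver script.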

Firefox doesn't print to PDF, however.  But Konqueror does.  By using
the DCOP interface, I can even pass it commands to load a URL and
print the page, although I have to settle for the configured print
file name.  But since I have to run individual sessions anyway, that's
no big deal.  The commands look like this:

    dcop konqueror-6352 'konqueror-mainwindow#1' openURL 'http://slashdot.org'
    dcop konqueror-6352 html-widget2 print true

There's a bit more to it than that, since widget names change, but a
simple Perl program handles it easily (so far!).
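
As a rough illustration of coping with changing widget names, the
sketch below picks an html-widget object out of a list of DCOP object
names.  It assumes that `dcop konqueror-PID` prints one object name
per line and that the highest-numbered html-widget is the one wanted;
both are assumptions, and latest_html_widget is a made-up name.

```shell
# Read DCOP object names on stdin, one per line, and print the
# html-widgetN entry with the largest N.  Returns non-zero when no
# such entry exists.  Choosing the largest N is an assumption.
latest_html_widget() {
    lhw_n=$(sed -n 's/^html-widget\([0-9][0-9]*\)$/\1/p' | sort -n | tail -n 1)
    if [ -n "$lhw_n" ]; then
        printf 'html-widget%s\n' "$lhw_n"
    else
        return 1
    fi
}
```

With the session from the commands above, usage would be along the
lines of `dcop konqueror-6352 | latest_html_widget`.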

However, there's a problem.  The "openURL" command returns without
waiting for the web page to finish loading, and the "print" command
doesn't wait for the load to finish either.  The "print" command does
wait for printing to finish before returning, which is nice.

This means I have to put in some arbitrary "sleep 30" or so between
"openURL" and "print" to have a good chance of a complete printed
page, and even then, there is no guarantee it actually will be
complete.  We have to send these pdf files to a bank, and it would not
be good to send them incomplete pages, even if only one out of 100 or
even 1000.  There will be at least hundreds of these every day.
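
For comparison with a fixed sleep, here is a minimal sketch of polling
with a deadline.  It does not solve the real problem: what to poll is
exactly the open question, so page_is_loaded in the usage line is a
hypothetical placeholder, and wait_until is a made-up name.  The point
is only that a timeout can fail the job instead of printing a page
that may be incomplete.

```shell
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-run COMMAND every INTERVAL seconds until it succeeds (return 0)
# or TIMEOUT seconds have elapsed (return 1).
wait_until() {
    wu_timeout=$1
    wu_interval=$2
    shift 2
    wu_deadline=$(( $(date +%s) + wu_timeout ))
    until "$@"; do
        # Deadline reached without success: report failure so the
        # caller can skip printing altogether.
        [ "$(date +%s)" -ge "$wu_deadline" ] && return 1
        sleep "$wu_interval"
    done
}
```

Usage would be something like
`wait_until 60 1 page_is_loaded && dcop konqueror-6352 html-widget2 print true`.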

I started to look at sources but there is no "konqueror-3.5.8.tar.gz"
or anything similar.  No doubt most of the code is handled by Qt
widgets and KDE libs.

Here are my questions:

0.  Is there a better place to ask this?  I tried a KDE mailing list
    and got no responses; there weren't even many views.

1.  Is there either a DCOP command to wait for a URL to be loaded or a
    DCOP command like openURL which waits?

2.  Is there a source file for konqueror which I could hack to take
    command line parameters without changing libraries or other code
    which would affect the rest of KDE?  I don't have any problem with
    a hacked and renamed konqueror command.

3.  Is there some other way of converting complicated web pages into
    PDF?  Converters that don't understand JavaScript, style sheets,
    and everything else that a real browser does are useless to me.

4.  Are there other ways to do this that I haven't thought of?

-- 
            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
     Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o


