* [gentoo-user] creating local copies of web pages
@ 2005-12-02 1:41 Robert Persson
2005-12-02 5:25 ` Shawn Singh
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Robert Persson @ 2005-12-02 1:41 UTC
To: gentoo-user
I have been trying all afternoon to make local copies of web pages from a
netscape bookmark file. I have been wrestling with httrack (through
khttrack), pavuk and wget, but none of them work. httrack and pavuk seem to
claim they can do the job, but they can't, or at least not in any way an
ordinary mortal could be expected to work out. They do things like pretending
to download hundreds of files without actually saving them to disk, crashing
suddenly and frequently, and popping up messages saying that I haven't
contributed enough code to their project to expect the thing to work
properly. I don't want to do anything hideously complicated. I just want to
make local copies of some bookmarked pages. What tools should I be using?
I would be happy to use a windows tool in wine if it worked. I would be happy
to reboot into Windows if I could get this job done.
One option would be to feed wget a list of urls. The trouble is I don't know
how to turn an html bookmark file into a simple list of urls. I imagine I
could do it in sed if I spent enough time to learn sed, but my afternoon has
gone now and I don't have the time.
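That conversion can be sketched in a few lines of shell, assuming the bookmark file uses Netscape's usual HREF="..." markup (all file names below are placeholders, not from the thread):

```shell
# Sketch: pull a plain URL list out of a Netscape-style bookmark file.
# /tmp/bookmarks.html stands in for the real file.
cat > /tmp/bookmarks.html <<'EOF'
<DT><A HREF="http://www.gentoo.org/" ADD_DATE="0">Gentoo Linux</A>
<DT><A HREF="http://www.example.com/page.html">Example</A>
EOF
# Extract every HREF value, one URL per line.
grep -o 'HREF="[^"]*"' /tmp/bookmarks.html | sed 's/^HREF="//; s/"$//' > /tmp/urls.txt
cat /tmp/urls.txt
# The list could then be fed to wget (network step, not run here):
# wget -i /tmp/urls.txt
```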
Many thanks
Robert
--
Robert Persson
"Don't use nuclear weapons to troubleshoot faults."
(US Air Force Instruction 91-111, 1 Oct 1997)
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 1:41 [gentoo-user] creating local copies of web pages Robert Persson
@ 2005-12-02 5:25 ` Shawn Singh
2005-12-02 9:37 ` Martins Steinbergs
2005-12-02 9:05 ` Neil Bothwick
2005-12-02 15:42 ` Billy Holmes
2 siblings, 1 reply; 17+ messages in thread
From: Shawn Singh @ 2005-12-02 5:25 UTC
To: gentoo-user
I guess I'm not exactly sure what you're trying to do, but when I want to
get a local copy of a website I do this:
nohup wget -m http://www.someUrL.org &
Shawn
On 12/2/05, Robert Persson <ireneshusband@yahoo.co.uk> wrote:
>
> I have been trying all afternoon to make local copies of web pages from a
> netscape bookmark file. I have been wrestling with httrack (through
> khttrack), pavuk and wget, but none of them work. httrack and pavuk seem
> to claim they can do the job, but they can't, or at least not in any way
> an ordinary mortal could be expected to work out. They do things like
> pretending to download hundreds of files without actually saving them to
> disk, crashing suddenly and frequently, and popping up messages saying
> that I haven't contributed enough code to their project to expect the
> thing to work properly. I don't want to do anything hideously complicated.
> I just want to make local copies of some bookmarked pages. What tools
> should I be using?
>
> I would be happy to use a windows tool in wine if it worked. I would be
> happy to reboot into Windows if I could get this job done.
>
> One option would be to feed wget a list of urls. The trouble is I don't
> know how to turn an html bookmark file into a simple list of urls. I
> imagine I could do it in sed if I spent enough time to learn sed, but my
> afternoon has gone now and I don't have the time.
>
> Many thanks
> Robert
> --
> Robert Persson
>
> "Don't use nuclear weapons to troubleshoot faults."
> (US Air Force Instruction 91-111, 1 Oct 1997)
>
> --
> gentoo-user@gentoo.org mailing list
>
>
--
Shawn Singh
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 5:25 ` Shawn Singh
@ 2005-12-02 9:37 ` Martins Steinbergs
2005-12-02 13:42 ` Robert Persson
0 siblings, 1 reply; 17+ messages in thread
From: Martins Steinbergs @ 2005-12-02 9:37 UTC
To: gentoo-user
On Friday 02 December 2005 07:25, Shawn Singh wrote:
> I guess I'm not exactly sure what you're trying to do, but when I want to
> get a local copy of a website I do this:
>
> nohup wget -m http://www.someUrL.org &
>
> Shawn
>
> On 12/2/05, Robert Persson <ireneshusband@yahoo.co.uk> wrote:
> > I have been trying all afternoon to make local copies of web pages from
> > a netscape bookmark file. I have been wrestling with httrack (through
> > khttrack), pavuk and wget, but none of them work. httrack and pavuk
> > seem to claim they can do the job, but they can't, or at least not in
> > any way an ordinary mortal could be expected to work out. They do
> > things like pretending to download hundreds of files without actually
> > saving them to disk, crashing suddenly and frequently, and popping up
> > messages saying that I haven't contributed enough code to their project
> > to expect the thing to work properly. I don't want to do anything
> > hideously complicated. I just want to make local copies of some
> > bookmarked pages. What tools should I be using?
> >
> > I would be happy to use a windows tool in wine if it worked. I would be
> > happy to reboot into Windows if I could get this job done.
> >
> > One option would be to feed wget a list of urls. The trouble is I don't
> > know how to turn an html bookmark file into a simple list of urls. I
> > imagine I could do it in sed if I spent enough time to learn sed, but
> > my afternoon has gone now and I don't have the time.
> >
> > Many thanks
> > Robert
> > --
> > Robert Persson
> >
> > "Don't use nuclear weapons to troubleshoot faults."
> > (US Air Force Instruction 91-111, 1 Oct 1997)
> >
> > --
> > gentoo-user@gentoo.org mailing list
>
> --
> Shawn Singh
I use httrack, both the Linux and Windows versions, generally without
problems. It sometimes fails to parse dynamic-content websites, but man
httrack describes plenty of options. In a previous job (Windows only) I ran
a daily httrack task to fetch fresh rar files with database updates.
If there are really no files and dirs created in your ~/websites folder,
check write permissions and whether there is any disk space left.
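For reference, a minimal httrack invocation plus the two checks suggested above, as a sketch (the URL and project directory are placeholders; the httrack line itself needs network access, so it is left commented):

```shell
# Hedged example of a basic httrack run; -O sets where the mirror,
# logs and cache are written, which makes the output location unambiguous.
# httrack "http://www.example.com/" -O "$HOME/websites/example"

# Rule out the two failure causes mentioned above before re-running:
mkdir -p "$HOME/websites"
[ -w "$HOME/websites" ] && echo "websites dir is writable"
df -P "$HOME" | awk 'NR==2 {print $4 " KB free"}'
```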
--
Linux 2.6.15-rc2 AMD Athlon(tm) 64 Processor 3200+
11:18:28 up 45 min, 7 users, load average: 0.00, 0.00, 0.00
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 9:37 ` Martins Steinbergs
@ 2005-12-02 13:42 ` Robert Persson
2005-12-02 14:40 ` Martins Steinbergs
0 siblings, 1 reply; 17+ messages in thread
From: Robert Persson @ 2005-12-02 13:42 UTC
To: gentoo-user
On December 2, 2005 01:37 am Martins Steinbergs was like:
> if there are really no files and dirs created in your ~/websites folder,
> check write permissions and whether there is any disk space left.
Permissions are fine and there is quite a bit of space on the disk. httrack
creates directories in ~/websites, but no other files, despite the fact that
it claims to be downloading bucketloads of them.
--
Robert Persson
"Don't use nuclear weapons to troubleshoot faults."
(US Air Force Instruction 91-111, 1 Oct 1997)
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 13:42 ` Robert Persson
@ 2005-12-02 14:40 ` Martins Steinbergs
2005-12-03 7:04 ` Robert Persson
0 siblings, 1 reply; 17+ messages in thread
From: Martins Steinbergs @ 2005-12-02 14:40 UTC
To: gentoo-user
On Friday 02 December 2005 15:42, Robert Persson wrote:
>
> Permissions are fine and there is quite a bit of space on the disk.
> httrack creates directories in ~/websites, but no other files, despite
> the fact that it claims to be downloading bucketloads of them.
> --
> Robert Persson
>
> "Don't use nuclear weapons to troubleshoot faults."
> (US Air Force Instruction 91-111, 1 Oct 1997)
If httrack is running as root, all the output goes to /root/websites/. Have
you looked there?
--
Linux 2.6.15-rc2 AMD Athlon(tm) 64 Processor 3200+
16:38:31 up 6:05, 2 users, load average: 0.08, 0.49, 0.82
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 14:40 ` Martins Steinbergs
@ 2005-12-03 7:04 ` Robert Persson
2005-12-03 13:40 ` Martins Steinbergs
0 siblings, 1 reply; 17+ messages in thread
From: Robert Persson @ 2005-12-03 7:04 UTC
To: gentoo-user
On December 2, 2005 06:40 am Martins Steinbergs was like:
> If httrack is running as root, all the output goes to /root/websites/.
> Have you looked there?
I wasn't running it as root. The strange thing is that httrack did start
creating a directory structure in ~/websites consisting of a couple of dozen
directories or so (e.g.
~/websites/politics/www.fromthewilderness.com/free/ww3/), but it didn't
actually store any html or other site content, despite the fact that it was
taking a very long time to do this and was claiming to have downloaded
hundreds of files.
--
Robert Persson
"Don't use nuclear weapons to troubleshoot faults."
(US Air Force Instruction 91-111, 1 Oct 1997)
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-03 7:04 ` Robert Persson
@ 2005-12-03 13:40 ` Martins Steinbergs
2005-12-03 18:09 ` Robert Persson
0 siblings, 1 reply; 17+ messages in thread
From: Martins Steinbergs @ 2005-12-03 13:40 UTC
To: gentoo-user
On Saturday 03 December 2005 09:04, Robert Persson wrote:
> I wasn't running it as root. The strange thing is that httrack did start
> creating a directory structure in ~/websites consisting of a couple of
> dozen directories or so (e.g.
> ~/websites/politics/www.fromthewilderness.com/free/ww3/), but it didn't
> actually store any html or other site content, despite the fact that it
> was taking a very long time to do this and was claiming to have
> downloaded hundreds of files.
> --
> Robert Persson
>
> "Don't use nuclear weapons to troubleshoot faults."
> (US Air Force Instruction 91-111, 1 Oct 1997)
If there aren't any files or folders under ~/websites then it isn't a
problem with httrack. Even when mirroring goes wrong, there should at least
be a project folder containing an hts-cache folder plus hts-log.txt and
index.html files. Sorry, not much help from here.
martins
--
Linux 2.6.15-rc2 AMD Athlon(tm) 64 Processor 3200+
15:20:24 up 1:03, 3 users, load average: 0.27, 0.13, 0.08
* Re: [gentoo-user] creating local copies of web pages
2005-12-03 13:40 ` Martins Steinbergs
@ 2005-12-03 18:09 ` Robert Persson
2005-12-30 5:24 ` Robert Persson
0 siblings, 1 reply; 17+ messages in thread
From: Robert Persson @ 2005-12-03 18:09 UTC
To: gentoo-user
On December 3, 2005 05:40 am Martins Steinbergs was like:
> If there aren't any files or folders under ~/websites then it isn't a
> problem with httrack. Even when mirroring goes wrong, there should at
> least be a project folder containing an hts-cache folder plus
> hts-log.txt and index.html files. Sorry, not much help from here.
> martins
But that's not what I've been saying, Martins. httrack +does+ create
directories in ~/websites, including hts-cache. It also creates hts-log.txt,
index.html, a lock file and a couple of gifs. However hts-cache is the only
one of those directories with anything in it (aside from subdirectories and
sub-subdirectories), and index.html is an empty file. What there is in
hts-cache is a file called new.dat which contains a lot of the html that
ought to have been put into the folders, all rolled into one huge file.
--
Robert Persson
"Don't use nuclear weapons to troubleshoot faults."
(US Air Force Instruction 91-111, 1 Oct 1997)
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-03 18:09 ` Robert Persson
@ 2005-12-30 5:24 ` Robert Persson
0 siblings, 0 replies; 17+ messages in thread
From: Robert Persson @ 2005-12-30 5:24 UTC
To: gentoo-user
On December 3, 2005 10:09 am Robert Persson was like:
> On December 3, 2005 05:40 am Martins Steinbergs was like:
> > If there aren't any files or folders under ~/websites then it isn't a
> > problem with httrack. Even when mirroring goes wrong, there should at
> > least be a project folder containing an hts-cache folder plus
> > hts-log.txt and index.html files. Sorry, not much help from here.
> > martins
>
> But that's not what I've been saying, Martins. httrack +does+ create
> directories in ~/websites, including hts-cache. It also creates
> hts-log.txt, index.html, a lock file and a couple of gifs. However
> hts-cache is the only one of those directories with anything in it (aside
> from subdirectories and sub-subdirectories), and index.html is an empty
> file. What there is in hts-cache is a file called new.dat which contains a
> lot of the html that ought to have been put into the folders, all rolled
> into one huge file.
It seems that the problem has been simply that httrack is incredibly slow at
downloading pages, and that it also caches a lot of them before committing
them to wherever they are supposed to end up. This is why it looked like
nothing was happening whenever I tried to use it.
Many thanks to everyone who has helped me here.
Robert
--
Robert Persson
"Don't use nuclear weapons to troubleshoot faults."
(US Air Force Instruction 91-111, 1 Oct 1997)
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 1:41 [gentoo-user] creating local copies of web pages Robert Persson
2005-12-02 5:25 ` Shawn Singh
@ 2005-12-02 9:05 ` Neil Bothwick
2005-12-02 13:39 ` Robert Persson
2005-12-02 15:42 ` Billy Holmes
2 siblings, 1 reply; 17+ messages in thread
From: Neil Bothwick @ 2005-12-02 9:05 UTC
To: gentoo-user
On Thu, 1 Dec 2005 17:41:36 -0800, Robert Persson wrote:
> One option would be to feed wget a list of urls. The trouble is I don't
> know how to turn an html bookmark file into a simple list of urls. I
> imagine I could do it in sed if I spent enough time to learn sed, but
> my afternoon has gone now and I don't have the time.
wget will accept most files containing URLs; it doesn't have to be a
straight list. Try feeding it your bookmark file as is.
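Concretely, that suggestion looks something like this (a sketch, not verified against a real Netscape file; /tmp/nb.html is a stand-in, and the wget line needs network access so it is shown commented):

```shell
# Sketch of the suggestion above: -i reads URLs from a file, and
# --force-html tells wget to parse that file as HTML rather than as a
# plain one-URL-per-line list, which is what a bookmark file needs.
cat > /tmp/nb.html <<'EOF'
<DT><A HREF="http://www.gentoo.org/">Gentoo Linux</A>
EOF
# Network step, shown but not run here:
# wget --force-html -i /tmp/nb.html
grep -c 'HREF=' /tmp/nb.html
```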
--
Neil Bothwick
Excuse for the day: daemons did it
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 9:05 ` Neil Bothwick
@ 2005-12-02 13:39 ` Robert Persson
0 siblings, 0 replies; 17+ messages in thread
From: Robert Persson @ 2005-12-02 13:39 UTC
To: gentoo-user
On December 2, 2005 01:05 am Neil Bothwick was like:
> wget will accept most files containing URLs; it doesn't have to be a
> straight list. Try feeding it your bookmark file as is.
Tried that. It borked. :-(
--
Robert Persson
"Don't use nuclear weapons to troubleshoot faults."
(US Air Force Instruction 91-111, 1 Oct 1997)
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 1:41 [gentoo-user] creating local copies of web pages Robert Persson
2005-12-02 5:25 ` Shawn Singh
2005-12-02 9:05 ` Neil Bothwick
@ 2005-12-02 15:42 ` Billy Holmes
2005-12-03 6:56 ` Robert Persson
2 siblings, 1 reply; 17+ messages in thread
From: Billy Holmes @ 2005-12-02 15:42 UTC
To: gentoo-user
Robert Persson wrote:
> I have been trying all afternoon to make local copies of web pages from a
> netscape bookmark file. I have been wrestling with httrack (through
wget -r http://$site/
Have you tried that yet?
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-02 15:42 ` Billy Holmes
@ 2005-12-03 6:56 ` Robert Persson
2005-12-03 15:40 ` Matthew Cline
` (3 more replies)
0 siblings, 4 replies; 17+ messages in thread
From: Robert Persson @ 2005-12-03 6:56 UTC
To: gentoo-user
On December 2, 2005 07:42 am Billy Holmes was like:
> Robert Persson wrote:
> > I have been trying all afternoon to make local copies of web pages from a
> > netscape bookmark file. I have been wrestling with httrack (through
>
> wget -r http://$site/
>
> have you tried that, yet?
The trouble is that I have a bookmark file with several hundred entries. wget
is supposed to be fairly good at extracting urls from text files, but it
couldn't handle this particular file.
Robert
--
Robert Persson
"Don't use nuclear weapons to troubleshoot faults."
(US Air Force Instruction 91-111, 1 Oct 1997)
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-03 6:56 ` Robert Persson
@ 2005-12-03 15:40 ` Matthew Cline
2005-12-03 18:00 ` [gentoo-user] " Harry Putnam
` (2 subsequent siblings)
3 siblings, 0 replies; 17+ messages in thread
From: Matthew Cline @ 2005-12-03 15:40 UTC
To: gentoo-user
On 12/3/05, Robert Persson <ireneshusband@yahoo.co.uk> wrote:
>
> The trouble is that I have a bookmark file with several hundred entries. wget
> is supposed to be fairly good at extracting urls from text files, but it
> couldn't handle this particular file.
>
I don't know what the exact format of your particular text file is, but why
don't you just use sed and friends to convert the text file into a format
that wget can use?
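One possible sed one-liner along those lines, as a sketch (it assumes at most one HREF per line, which Netscape bookmark files typically have; the file names are placeholders):

```shell
# Sketch of the sed conversion: print only the captured URL from lines
# that contain an HREF attribute. /tmp/bm.html is a placeholder file.
cat > /tmp/bm.html <<'EOF'
<DT><A HREF="http://www.gentoo.org/">Gentoo</A>
<HR>
<DT><A HREF="http://www.example.com/">Example</A>
EOF
sed -n 's/.*HREF="\([^"]*\)".*/\1/p' /tmp/bm.html > /tmp/bm-urls.txt
cat /tmp/bm-urls.txt
# wget could then read the result:
# wget -i /tmp/bm-urls.txt
```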
HTH,
Matt
--
gentoo-user@gentoo.org mailing list
* [gentoo-user] Re: creating local copies of web pages
2005-12-03 6:56 ` Robert Persson
2005-12-03 15:40 ` Matthew Cline
@ 2005-12-03 18:00 ` Harry Putnam
2005-12-05 17:32 ` [gentoo-user] " Billy Holmes
2005-12-05 17:33 ` Billy Holmes
3 siblings, 0 replies; 17+ messages in thread
From: Harry Putnam @ 2005-12-03 18:00 UTC
To: gentoo-user
Robert Persson <ireneshusband@yahoo.co.uk> writes:
> The trouble is that I have a bookmark file with several hundred entries. wget
> is supposed to be fairly good at extracting urls from text files, but it
> couldn't handle this particular file.
There is a program in Portage called `urlview'. I haven't used it for a
couple of years, but it used to be able to create clickable urls from ones
found in text files. Maybe it can output something wget can use.
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-03 6:56 ` Robert Persson
2005-12-03 15:40 ` Matthew Cline
2005-12-03 18:00 ` [gentoo-user] " Harry Putnam
@ 2005-12-05 17:32 ` Billy Holmes
2005-12-05 17:33 ` Billy Holmes
3 siblings, 0 replies; 17+ messages in thread
From: Billy Holmes @ 2005-12-05 17:32 UTC
To: gentoo-user
Robert Persson wrote:
> The trouble is that I have a bookmark file with several hundred entries. wget
> is supposed to be fairly good at extracting urls from text files, but it
> couldn't handle this particular file.
Try this:
emerge HTML-Tree
then as a normal user, run this script like so (where $file is your
bookmark file)
$ perl listhref.pl $file > list.txt
[snip]
#!/usr/bin/perl
# Print the href of every <A> tag in the file named on the command line,
# one per line.
use HTML::Tree;
print join("\n",
    map { $_->attr('href') }
    HTML::TreeBuilder->new()->parse_file(shift)
        ->look_down("_tag", "A", sub { $_[0]->attr('href') ne "" })
)."\n";
exit;
[snip]
Then you can process your urls like so:
xargs wget -m < list.txt
--
gentoo-user@gentoo.org mailing list
* Re: [gentoo-user] creating local copies of web pages
2005-12-03 6:56 ` Robert Persson
` (2 preceding siblings ...)
2005-12-05 17:32 ` [gentoo-user] " Billy Holmes
@ 2005-12-05 17:33 ` Billy Holmes
3 siblings, 0 replies; 17+ messages in thread
From: Billy Holmes @ 2005-12-05 17:33 UTC
To: gentoo-user
Robert Persson wrote:
> The trouble is that I have a bookmark file with several hundred entries. wget
> is supposed to be fairly good at extracting urls from text files, but it
> couldn't handle this particular file.
My previous message assumes that your bookmark file is in reality an HTML
file.
--
gentoo-user@gentoo.org mailing list
end of thread [newest: 2005-12-30 5:28 UTC]
Thread overview: 17+ messages
2005-12-02 1:41 [gentoo-user] creating local copies of web pages Robert Persson
2005-12-02 5:25 ` Shawn Singh
2005-12-02 9:37 ` Martins Steinbergs
2005-12-02 13:42 ` Robert Persson
2005-12-02 14:40 ` Martins Steinbergs
2005-12-03 7:04 ` Robert Persson
2005-12-03 13:40 ` Martins Steinbergs
2005-12-03 18:09 ` Robert Persson
2005-12-30 5:24 ` Robert Persson
2005-12-02 9:05 ` Neil Bothwick
2005-12-02 13:39 ` Robert Persson
2005-12-02 15:42 ` Billy Holmes
2005-12-03 6:56 ` Robert Persson
2005-12-03 15:40 ` Matthew Cline
2005-12-03 18:00 ` [gentoo-user] " Harry Putnam
2005-12-05 17:32 ` [gentoo-user] " Billy Holmes
2005-12-05 17:33 ` Billy Holmes