* [gentoo-user] checksumming files
@ 2008-12-04 7:10 Mick
2008-12-04 7:21 ` Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf)
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Mick @ 2008-12-04 7:10 UTC
To: gentoo-user
Almost every time I split a large file >1G into say 200k chunks, then ftp it
to a server and then:
cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
it fails. Checking the split files in turn, I often find one or two chunks that
fail on their own md5 checks. Despite that the concatenated file often works
(e.g. if it is a video file it'll play alright).
Can you explain this? Should I be using a different check to verify the
integrity of the ftp'd file?
--
Regards,
Mick
* Re: [gentoo-user] checksumming files
2008-12-04 7:10 [gentoo-user] checksumming files Mick
@ 2008-12-04 7:21 ` Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf)
2008-12-05 18:48 ` Mick
2008-12-04 9:06 ` Neil Bothwick
` (2 subsequent siblings)
3 siblings, 1 reply; 11+ messages in thread
From: Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf) @ 2008-12-04 7:21 UTC
To: gentoo-user
On Thursday, 04.12.2008, at 07:10 +0000, ext Mick wrote:
> Almost every time I split a large file >1G into say 200k chunks, then ftp it
> to a server and then:
>
> cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
>
> it fails. Checking the split files in turn, I often find one or two chunks that
> fail on their own md5 checks. Despite that the concatenated file often works
> (e.g. if it is a video file it'll play alright).
>
> Can you explain this? Should I be using a different check to verify the
> integrity of the ftp'd file?
Did you make sure the chunks are transferred in binary mode? BTW, most
modern FTP clients have a resume option, so there's no need to split.
HTH...
Dirk
--
Dirk Heinrichs | Tel: +49 (0)162 234 3408
Configuration Manager | Fax: +49 (0)211 47068 111
Capgemini Deutschland | Mail: dirk.heinrichs@capgemini.com
Wanheimerstraße 68 | Web: http://www.capgemini.com
D-40468 Düsseldorf | ICQ#: 110037733
GPG Public Key C2E467BB | Keyserver: www.keyserver.net
* Re: [gentoo-user] checksumming files
2008-12-04 7:10 [gentoo-user] checksumming files Mick
2008-12-04 7:21 ` Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf)
@ 2008-12-04 9:06 ` Neil Bothwick
2008-12-05 19:57 ` Paul Hartman
2008-12-05 20:32 ` Albert Hopkins
3 siblings, 0 replies; 11+ messages in thread
From: Neil Bothwick @ 2008-12-04 9:06 UTC
To: gentoo-user
On Thu, 4 Dec 2008 07:10:06 +0000, Mick wrote:
> Despite that the concatenated file often works
> (e.g. if it is a video file it'll play alright).
>
> Can you explain this? Should I be using a different check to verify
> the integrity of the ftp'd file?
An MD5 check fails if even a single bit has changed, but a change that small
will rarely affect the playback of a video file. Try it with a large
compressed tarball and you'll notice the difference.
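A quick way to see this for yourself is sketched below. It is only an illustration: it assumes GNU coreutils, uses 1 MiB of random data as a stand-in for the video file, and the file names and the offset are made up.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
orig="$workdir/orig.bin"
copy="$workdir/copy.bin"

# 1 MiB of random data stands in for the video file.
head -c 1048576 /dev/urandom > "$orig"
cp "$orig" "$copy"

# Read one byte at an arbitrary offset, flip its lowest bit, write it back.
offset=1000
byte=$(od -An -tu1 -j "$offset" -N 1 "$orig" | tr -d ' \n')
flipped=$((byte ^ 1))
printf '%b' "\\0$(printf '%03o' "$flipped")" |
    dd of="$copy" bs=1 seek="$offset" conv=notrunc 2>/dev/null

# Compare the checksums and count how many bytes actually differ.
sum_orig=$(md5sum < "$orig" | cut -d' ' -f1)
sum_copy=$(md5sum < "$copy" | cut -d' ' -f1)
diff_bytes=$(cmp -l "$orig" "$copy" | wc -l | tr -d ' ')

echo "original: $sum_orig"
echo "modified: $sum_copy"
echo "bytes that differ: $diff_bytes"
```

The two checksums come out completely different although only one byte (one bit of it) was touched, which is why a failed MD5 check says nothing about how badly a file is damaged.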
--
Neil Bothwick
--T-A+G-L-I+N-E--+M-E-A+S-U-R+I-N-G+--G-A+U-G-E--
* Re: [gentoo-user] checksumming files
2008-12-04 7:21 ` Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf)
@ 2008-12-05 18:48 ` Mick
2008-12-06 10:54 ` Dirk Heinrichs
0 siblings, 1 reply; 11+ messages in thread
From: Mick @ 2008-12-05 18:48 UTC
To: gentoo-user
On Thursday 04 December 2008, Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf)
wrote:
> Did you make sure the chunks are transferred in binary mode?
Aha!! Since the split chunks were part of a video file I assumed that it would
be binary - and I understand that the default type (for tnftp) is binary?
There's more to it:
I use tnftp because it has an unattended feature which suits me nicely. A
string like:
sleep 90m ; tnftp -u ftp://<username>:<passwd>@<server_address>/htdocs/path \
<files_to_upload>
will log in after 90 minutes and upload the file(s) I want (not sure if/how I
can do this with vanilla ftp).
> BTW, most
> modern FTP clients have a resume option, so there's no need to split.
Yes, tnftp has the 'reget' command but I can't find a 'reput', or 'resume'?
It also has 'restart':
==============================================================
restart marker
Restart the immediately following get or put at the indicated
marker. On UNIX systems, marker is usually a byte offset
into the file.
==============================================================
but I am not sure how this works exactly. Would anyone be clued up on the
intricacies of tnftp?
Anything else I could try?
--
Regards,
Mick
* Re: [gentoo-user] checksumming files
2008-12-04 7:10 [gentoo-user] checksumming files Mick
2008-12-04 7:21 ` Heinrichs, Dirk (EXT-Capgemini - DE/Dusseldorf)
2008-12-04 9:06 ` Neil Bothwick
@ 2008-12-05 19:57 ` Paul Hartman
2008-12-05 20:32 ` Albert Hopkins
3 siblings, 0 replies; 11+ messages in thread
From: Paul Hartman @ 2008-12-05 19:57 UTC
To: gentoo-user
On Thu, Dec 4, 2008 at 1:10 AM, Mick <michaelkintzios@gmail.com> wrote:
> Almost every time I split a large file >1G into say 200k chunks, then ftp it
> to a server and then:
>
> cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
>
> it fails. Checking the split files in turn, I often find one or two chunks that
> fail on their own md5 checks. Despite that the concatenated file often works
> (e.g. if it is a video file it'll play alright).
>
> Can you explain this? Should I be using a different check to verify the
> integrity of the ftp'd file?
Obviously something is going wrong... without knowing why, I
suggest you emerge par2cmdline and use it to create some recovery
blocks. That way you can repair/reassemble the pieces when they get to
the other side.
* Re: [gentoo-user] checksumming files
2008-12-04 7:10 [gentoo-user] checksumming files Mick
` (2 preceding siblings ...)
2008-12-05 19:57 ` Paul Hartman
@ 2008-12-05 20:32 ` Albert Hopkins
2008-12-07 15:39 ` Mick
3 siblings, 1 reply; 11+ messages in thread
From: Albert Hopkins @ 2008-12-05 20:32 UTC
To: gentoo-user
On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
> Almost every time I split a large file >1G into say 200k chunks, then ftp it
> to a server and then:
That's thousands of files! Have you gone mad?!
>
> cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> it fails. Checking the split files in turn, I often find one or two chunks that
> fail on their own md5 checks. Despite that the concatenated file often works
> (e.g. if it is a video file it'll play alright).
Let me understand this. Are [1..7] the split files or the checksums of
the split files? If the former then 'md5sum -c completefile' will fail
with "no properly formatted MD5 checksum lines found" or similar due to
the fact that "completefile" is not a list of checksums. If the latter,
then how are you generating [1..7]? If you are using the split(1)
command to split the files and are not passing at least "-a 3" to it
then your file is going to be truncated due to the fact that the suffix
length is too small to accommodate the thousands of files needed to
split a 1GB+ file into 200k chunks. You should get an error like "split:
Output file suffixes exhausted."
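To put numbers on the suffix problem, here is a rough sketch. It assumes a file of exactly 1 GiB, that "200k" means 200 KiB, and split's default two-letter suffixes (26 * 26 = 676 possible names); the sizes are illustrative, not taken from the actual transfer.

```shell
#!/bin/sh
# Hypothetical sizes: a 1 GiB file split into 200 KiB chunks (split's "200k").
file_size=$((1024 * 1024 * 1024))
chunk_size=$((200 * 1024))

# Ceiling division gives the number of chunks split(1) would have to write.
chunks=$(( (file_size + chunk_size - 1) / chunk_size ))

# Find the smallest suffix length whose 26^n names cover all the chunks.
suffix_len=1
capacity=26
while [ "$capacity" -lt "$chunks" ]; do
    suffix_len=$((suffix_len + 1))
    capacity=$((capacity * 26))
done

echo "chunks needed:    $chunks"
echo "minimum -a value: $suffix_len (room for $capacity names)"
```

With these assumptions the file needs 5243 chunks, far more than the 676 names two letters allow, so "-a 3" (17576 names) is the smallest suffix length that fits.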
Maybe if you give the exact commands used I might understand this
better.
I have a feeling that this is not the most efficient method of file
transfer.
* Re: [gentoo-user] checksumming files
2008-12-05 18:48 ` Mick
@ 2008-12-06 10:54 ` Dirk Heinrichs
0 siblings, 0 replies; 11+ messages in thread
From: Dirk Heinrichs @ 2008-12-06 10:54 UTC
To: gentoo-user
On Friday, 5 December 2008, 19:48:18, Mick wrote:
> On Thursday 04 December 2008, Heinrichs, Dirk (EXT-Capgemini -
> DE/Dusseldorf)
>
> wrote:
> > Did you make sure the chunks are transferred in binary mode?
>
> Aha!! Since the split chunks were part of a video file I assumed that it
> would be binary - and I understand that the default type (for tnftp) is
> binary?
>
> > BTW, most
> > modern FTP clients have a resume option, so there's no need to split.
>
> Yes, tnftp has the 'reget' command but I can't find a 'reput', or 'resume'?
> It also has 'restart':
> [...]
> but I am not sure how this works exactly. Would anyone be clued up on the
> intricacies of tnftp?
Unfortunately not, never heard of it before.
> Anything else I could try?
ncftp. This one also comes with ncftpget and ncftpput command line utilities.
They use binary transfer as default and have resume capabilities.
HTH...
Dirk
* Re: [gentoo-user] checksumming files
2008-12-05 20:32 ` Albert Hopkins
@ 2008-12-07 15:39 ` Mick
2008-12-07 17:28 ` Albert Hopkins
0 siblings, 1 reply; 11+ messages in thread
From: Mick @ 2008-12-07 15:39 UTC
To: gentoo-user
On Friday 05 December 2008, Albert Hopkins wrote:
> On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
> > Almost every time I split a large file >1G into say 200k chunks, then ftp
> > it to a server and then:
>
> That's thousands of files! Have you gone mad?!
Ha! small error in units . . . it is 200M (of course this is no disclaimer of
me going/gone mad . . .) I think the server drops the connection above 230M
file uploads or something like that, so I tried 200M files and it seems to
work.
> > cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> >
> > it fails. Checking the split files in turn, I often find one or two chunks
> > that fail on their own md5 checks. Despite that the concatenated file
> > often works (e.g. if it is a video file it'll play alright).
>
> Let me understand this. Are [1..7] the split files or the checksums of
> the split files?
They are the split files which I concatenate into the complete file.
> If the former then 'md5sum -c completefile' will fail
> with "no properly formatted MD5 checksum lines found" or similar due to
> the fact that "completefile" is not a list of checksums. If the latter,
> then how are you generating [1..7]? If you are using the split(1)
> command to split the files and are not passing at least "-a 3" to it
> then your file is going to be truncated due to the fact that the suffix
> length is too small to accommodate the thousands of files needed to
> split a 1GB+ file into 200k chunks. You should get an error like "split:
> Output file suffixes exhausted."
>
> Maybe if you give the exact commands used I might understand this
> better.
>
> I have a feeling that this is not the most efficient method of file
> transfer.
split --verbose -b 20000000 big_file
tnftp -r 45 -u \
ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa xab xac \
xad . . .
The above would fail after xaa was uploaded and about 1/3 or less of xab. So,
I split up the individual file upload:
tnftp -r 45 -u \
ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa ; \
sleep 1m ; tnftp -r 45 -u \
ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xab ; \
sleep ... ; etc.
Does this make sense?
--
Regards,
Mick
* Re: [gentoo-user] checksumming files
2008-12-07 15:39 ` Mick
@ 2008-12-07 17:28 ` Albert Hopkins
2008-12-07 17:56 ` Mick
0 siblings, 1 reply; 11+ messages in thread
From: Albert Hopkins @ 2008-12-07 17:28 UTC
To: gentoo-user
On Sun, 2008-12-07 at 15:39 +0000, Mick wrote:
> On Friday 05 December 2008, Albert Hopkins wrote:
> > On Thu, 2008-12-04 at 07:10 +0000, Mick wrote:
> > > Almost every time I split a large file >1G into say 200k chunks, then ftp
> > > it to a server and then:
> >
> > That's thousands of files! Have you gone mad?!
>
> Ha! small error in units . . . it is 200M (of course this is no disclaimer of
> me going/gone mad . . .) I think the server drops the connection above 230M
> file uploads or something like that, so I tried 200M files and it seems to
> work.
>
> > > cat 1 2 3 4 5 6 7 > completefile ; md5sum -c completefile
> > >
> > > it fails. Checking the split files in turn, I often find one or two chunks
> > > that fail on their own md5 checks. Despite that the concatenated file
> > > often works (e.g. if it is a video file it'll play alright).
> >
> > Let me understand this. Are [1..7] the split files or the checksums of
> > the split files?
>
> They are the split files which I concatenate into the complete file.
Well, unless you made another error in your OP, you are using md5sum
incorrectly. When you use "-c", md5sum expects a file that contains a list
of checksums and file names, not the data file itself. For example:
$ dd if=/dev/urandom of=bigfile bs=1M count=5
5+0 records in
5+0 records out
5242880 bytes (5.2 MB) copied, 2.29361 s, 2.3 MB/s
$ md5sum bigfile > checksum # create checksum file
$ split -b1M bigfile
$ rm bigfile
$ cat xa* > bigfile
$ # This is correct
$ md5sum -c checksum
bigfile: OK
$ # This is wrong!
$ md5sum -c bigfile
md5sum: bigfile: no properly formatted MD5 checksum lines found
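The same idea extends to the individual chunks, which makes it easy to see which piece needs uploading again. This is only a sketch: it assumes GNU coreutils, and the file names (bigfile, chunk.*, chunks.md5) and the simulated damage are made up for the example.

```shell
#!/bin/sh
# Throwaway directory; file names below are made up for the example.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sender: create a 5 MiB stand-in file, split it, and record a checksum
# for the whole file and for every chunk.
head -c 5242880 /dev/urandom > bigfile
md5sum bigfile > bigfile.md5
split -b 1M bigfile chunk.
md5sum chunk.* > chunks.md5

# Simulate a damaged upload by appending a stray byte to one chunk.
printf 'X' >> chunk.ac

# Receiver: check each chunk against the list and keep only the failures.
failed_list=$(md5sum -c chunks.md5 2>/dev/null | grep 'FAILED$' | cut -d: -f1)
echo "chunks to re-upload: $failed_list"

# The reassembled file fails its own check until that chunk is replaced.
cat chunk.* > bigfile
if md5sum -c bigfile.md5 >/dev/null 2>&1; then
    whole_file=OK
else
    whole_file=FAILED
fi
echo "bigfile: $whole_file"
```

In practice chunks.md5 and bigfile.md5 would be uploaded alongside the chunks, so the check can be run on the server before concatenating.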
[SNIP!]
> > Maybe if you give the exact commands used I might understand this
> > better.
> >
> > I have a feeling that this is not the most efficient method of file
> > transfer.
>
> split --verbose -b 20000000 big_file
>
> tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa xab xac
> xad . . .
>
> The above would fail after xaa was uploaded and about 1/3 or less of xab. So,
> I split up the individual file upload:
>
> tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xaa ; sleep
> 1m ; tnftp -r 45 -u
> ftp://<username>:<passwd>@<server_name>/htdocs/<directory_path>/ xab ;
> sleep ... ; etc.
>
> Does this make sense?
Yes, but if you are truly using "-c" then it would make sense that you
could get a checksum error but the file be ok.
Here's how I would do it. I'm not saying you should do it this way.
I'd use rsync. Rsync does the file transfer and has checksumming built in. You say
you split because you get disconnected, right? I'm not sure if rsync
handles re-connects, but you can write a loop so that if rsync fails you
continue where you left off:
status=30
until [ $status -eq 0 ] ;
do
rsync --append-verify big_file server_name:/htdocs/<directory_path>/
status=$?
done
No splitting/concatenating and no need to checksum.
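The loop above retries forever and without a pause. A variation that gives up after a fixed number of attempts might look like the sketch below. The retry helper and the flaky_transfer stand-in are hypothetical names invented for this example; in real use the rsync command would take the stand-in's place.

```shell
#!/bin/sh
# retry MAX DELAY CMD...: run CMD until it succeeds or MAX attempts are used.
# (Hypothetical helper, not part of rsync.)
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Stand-in for the real transfer: fails twice, then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky_transfer() {
    n=$(( $(cat "$counter_file") + 1 ))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

# Up to 5 attempts, no delay between them (a real transfer would want one).
retry 5 0 flaky_transfer
result=$?
tries=$(cat "$counter_file")
echo "exit status $result after $tries tries"

# In real use, something along the lines of:
#   retry 20 60 rsync --append-verify big_file server_name:/htdocs/<directory_path>/
```

The limit of 20 attempts and the 60 second delay in the closing comment are arbitrary; pick values that suit how often the connection drops.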
* Re: [gentoo-user] checksumming files
2008-12-07 17:28 ` Albert Hopkins
@ 2008-12-07 17:56 ` Mick
2008-12-07 19:59 ` Neil Bothwick
0 siblings, 1 reply; 11+ messages in thread
From: Mick @ 2008-12-07 17:56 UTC
To: gentoo-user
On Sunday 07 December 2008, Albert Hopkins wrote:
> On Sun, 2008-12-07 at 15:39 +0000, Mick wrote:
> > They are the split files which I concatenate into the complete file.
>
> Well, unless you made another error in your OP, you are using md5sum
> incorrectly. When you use "-c", md5sum expects a file that is a list of
> files/checksums. For example
[snip...]
> Yes, but if you are truly using "-c" then it would make sense that you
> could get a checksum error but the file be ok.
Sorry, yes I used the md5sum -c command correctly with the corresponding
checksum file for the big_file.
> Here's how I would do it. I'm not saying you should do it this way.
> I'd use rsync. Rsync does the file transfer and has checksumming built in. You say
> you split because you get disconnected, right? I'm not sure if rsync
> handles re-connects, but you can write a loop so that if rsync fails you
> continue where you left off:
>
> status=30
> until [ $status -eq 0 ] ;
> do
> rsync --append-verify big_file server_name:/htdocs/<directory_path>/
> status=$?
> done
>
> No splitting/concatenating and no need to checksum.
Wouldn't the server need to have rsyncd running to be able to do that? Can I
rsync to an ftp server? Also, how would I pass username/passwd on the
command line so that the upload can take place unattended?
Thank you for your replies.
--
Regards,
Mick
* Re: [gentoo-user] checksumming files
2008-12-07 17:56 ` Mick
@ 2008-12-07 19:59 ` Neil Bothwick
0 siblings, 0 replies; 11+ messages in thread
From: Neil Bothwick @ 2008-12-07 19:59 UTC
To: gentoo-user
On Sun, 7 Dec 2008 17:56:07 +0000, Mick wrote:
> > rsync --append-verify big_file server_name:/htdocs/<directory_path>/
> > status=$?
> Wouldn't the server need to have rsyncd running to be able to do that?
> Can I rsync to an ftp server? Also, how would I pass username/passwd
> on the command line so that the upload can take place unattended?
You only need rsyncd running to use host::module type connections. The
above command uses ssh to connect.
--
Neil Bothwick
Veni, vermini, vomui
I came, I got ratted, I threw up