public inbox for gentoo-user@lists.gentoo.org
* Re: [gentoo-user] I have 146,000 files in lost+found. How do I sort them?
From: Richard Fish @ 2006-09-26  4:03 UTC
  To: gentoo-user

On 9/25/06, Robert Persson <ireneshusband@gmail.com> wrote:
> If I can, how can I best sift through them? Is there a utility, or
> something I could drop into a simple bash script, that would look at the
> first few bytes of the file and, say, identify it as a jpeg or an xml
> file, so that it could be given an appropriate file extension, deleted
> or moved? Or is there one that could distinguish a text file from a
> binary?

sys-apps/file will give you the 'file' command, which does exactly this.
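
A minimal sketch of how file could drive such a script, assuming a file(1)
that supports --mime-type and a bash shell.  The function name sort_by_type
and the destination layout are hypothetical, not something from this thread.
It moves each regular file into a directory named after its detected MIME
type:

```shell
#!/usr/bin/env bash
# Sketch: sort recovered files into per-type directories using file(1).
# sort_by_type and its two arguments are illustrative names only.
sort_by_type() {
    local src=$1 dest=$2 f mime
    for f in "$src"/*; do
        [ -L "$f" ] && continue        # leave symlinks alone
        [ -f "$f" ] || continue        # skip directories, devices, etc.
        # --brief drops the file name; --mime-type prints e.g. image/jpeg
        mime=$(file --brief --mime-type -- "$f")
        mkdir -p -- "$dest/$mime"
        mv -- "$f" "$dest/$mime/"
    done
}
```

Running something like sort_by_type /lost+found /mnt/sorted (both paths are
placeholders) would leave JPEGs under /mnt/sorted/image/jpeg, plain text
under /mnt/sorted/text/plain, and so on.  Try it on a copy first.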

-Richard
-- 
gentoo-user@gentoo.org mailing list




* Re: [gentoo-user] I have 146,000 files in lost+found. How do I sort them?
From: Boyd Stephen Smith Jr. @ 2006-09-26 13:20 UTC
  To: gentoo-user


On Monday 25 September 2006 22:55, Robert Persson <ireneshusband@gmail.com> 
wrote about '[gentoo-user] I have 146,000 files in lost+found. How do I 
sort them?':
> Am I likely to find many usable files in that /lost+found directory?

Maybe.  I tried to recover a corrupted ext3 /boot recently and was unable to 
pull anything useful out of lost+found that was larger than a 
symlink. :(  If a number of files NOT in lost+found were corrupt, it's 
likely most of the files in lost+found are corrupt as well.

That said, /boot data is generally easy to replace, so I put no effort into 
recovering files that were corrupted.  If the data was valuable, it might 
be worth it to spend some time sorting those out.

> If I can, how can I best sift through them?

Carefully. :)

> Is there a utility, or 
> something I could drop into a simple bash script, that would look at the
> first few bytes of the file and, say, identify it as a jpeg or an xml
> file, so that it could be given an appropriate file extension, deleted
> or moved?

As the other poster mentioned, the file utility is useful for identifying 
the type of file.  Keep in mind, though, that it only looks at the first few 
bytes of the file; if there's corruption later on, file won't notice.

> Or is there one that could distinguish a text file from a 
> binary?

Of course, file does this to some extent.  A MIME type of text/* is 
generally text, while anything else is binary.  But, file's output (by 
default) isn't a simple "binary" or "text" string.
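
As a rough sketch of turning that into a yes/no answer, assuming a file(1)
with the --mime-type option: the helper name is_text is made up for this
example.  Note that some textual formats may be reported under
application/* rather than text/*, so treat the result as a first pass only.

```shell
#!/usr/bin/env bash
# is_text FILE: succeed when file(1) reports a text/* MIME type.
# The function name is illustrative; it is not part of file(1).
is_text() {
    case $(file --brief --mime-type -- "$1") in
        text/*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

It can then be used in a condition, for example
if is_text "$f"; then echo "$f looks like text"; fi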

Some of the GNU utilities that are meant for text files will complain 
before operating on a binary file, so you could use those for this task, 
possibly.  (I'm thinking of less and grep.)  In particular, 
grep '[^[:print:]]' should return true when run against a file that 
contains non-printable characters (like control characters or NUL, and, 
depending on locale, non-7-bit-clean characters).
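
A hedged sketch of that grep test, assuming GNU grep.  Three details here
are my own additions rather than part of the suggestion above: [:space:] is
added to the bracket expression because a tab is not in [:print:] and would
otherwise flag ordinary text; -a makes grep treat the input as text so that
binary-file handling does not change what matches; and LC_ALL=C makes every
byte above 0x7f count as non-printable.  The name has_nonprintable is
hypothetical.

```shell
#!/usr/bin/env bash
# has_nonprintable FILE: succeed if FILE contains a byte that is neither
# printable nor whitespace in the C locale (control characters, high bytes).
has_nonprintable() {
    LC_ALL=C grep -a -q '[^[:print:][:space:]]' -- "$1"
}
```

Because of LC_ALL=C, UTF-8 text with accented characters is also reported
as containing non-printable bytes; drop that assignment to let the current
locale decide.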

> Are there any other strategies I could use to sift through these files
> (assuming it would be worth doing)?

Well, before you write some sort of bash script around file to rename 
stuff, you'll probably want to remove anything that is clearly trash, like 
device nodes or 0-length files.  Something like:
find lost+found \( \! \( -type f -o -type d -o -type l \) -o -empty \) -delete
should work if you are using GNU find.
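
Since -delete is irreversible, one cautious way to use a command like this
is to list the candidates first and delete only once the list looks right.
The sketch below assumes GNU find; list_trash and purge_trash are
hypothetical names.  The outer parentheses matter: without them, -delete
would bind only to the test immediately before it.

```shell
#!/usr/bin/env bash
# Anything that is not a regular file, directory or symlink, plus
# anything empty, is treated as trash.  Function names are illustrative.

# list_trash DIR: print what would be removed, without touching it.
list_trash() {
    find "$1" \( \! \( -type f -o -type d -o -type l \) -o -empty \) -print
}

# purge_trash DIR: same tests, but actually delete the matches.
purge_trash() {
    find "$1" \( \! \( -type f -o -type d -o -type l \) -o -empty \) -delete
}
```

Run list_trash lost+found, check the output, and only then run
purge_trash lost+found.  Note that -empty also matches empty directories.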

-- 
"If there's one thing we've established over the years,
it's that the vast majority of our users don't have the slightest
clue what's best for them in terms of package stability."
-- Gentoo Developer Ciaran McCreesh



* Re: [gentoo-user] I have 146,000 files in lost+found. How do I sort them?
From: Robert Persson @ 2006-09-28  4:30 UTC
  To: gentoo-user

Thanks for the detailed advice. And thanks, Richard, for your advice too.

In the end (before I received your posts) I managed to move all the
files into enough smaller directories that I could browse them in
Nautilus. From what I saw it looked very much to me like most of the
files were ones that had been deleted by emerge before the big disaster.
I didn't look at every single one obviously, but it soon became obvious
that I wasn't going to find much of any use.
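
The message does not say how the files were split up, so the following is
only one possible sketch, assuming bash.  The function name
split_into_batches, the batch-NNNN directory naming and the default batch
size of 1000 are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# split_into_batches DIR [SIZE]: move the regular files in DIR into
# subdirectories batch-0000, batch-0001, ... holding SIZE files each.
split_into_batches() {
    local src=$1 size=${2:-1000} n=0 dir f
    for f in "$src"/*; do
        [ -f "$f" ] || continue                 # only move regular files
        dir=$(printf '%s/batch-%04d' "$src" $((n / size)))
        mkdir -p -- "$dir"
        mv -- "$f" "$dir/"
        n=$((n + 1))
    done
}
```

With 146,000 files and the default size, this would produce 146 directories
small enough for a graphical file manager to open comfortably.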

And thanks for giving a practical example of how to use find. I have
always found the man page rather heavy going, so this is the first time
I have felt I have half an idea how to use it.

Robert

