[gentoo-user] OT: extract an image from a .doc file?

public inbox for gentoo-user@lists.gentoo.org

* [gentoo-user] OT: extract an image from a .doc file?
@ 2009-12-13  8:46 Stroller
  2009-12-13 10:50 ` Mick
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Stroller @ 2009-12-13  8:46 UTC (permalink / raw
  To: gentoo-user

Hi all,

A .doc file contains an image. Is there any way to extract the image  
file in its original format, please?

This may seem like a bit of an odd request, so I'll explain. The .doc  
file is quite large, and it seems like the image it contains must be  
to blame. I would like to extract the original file of the image and  
examine it. I have tried in OpenOffice on Windows and Word for Mac. In
OpenOffice I can't see any way to save the image file; in Word for Mac
I can drag the file to the desktop, but it becomes a "Picture
clipping.pictClipping" and is clearly not the original format.

I tried running `photorec` on the .doc file, but that just "finds"  
the .doc file itself. I thought to use dd to zero over the first few  
bytes of the .doc - maybe this would make the .doc unrecognisable to  
photorec, and then photorec would maybe find the image file inside the  
corrupt document, but I haven't tried that yet. I'm not sure if it'd  
work, and so I thought I'd ask here to see if anyone knew of an easy  
way to do this first.

TIA for any suggestions,

Stroller.


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-13  8:46 [gentoo-user] OT: extract an image from a .doc file? Stroller
@ 2009-12-13 10:50 ` Mick
  2009-12-13 12:12   ` Stroller
  2009-12-13 14:57 ` felix
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 25+ messages in thread
From: Mick @ 2009-12-13 10:50 UTC (permalink / raw
  To: gentoo-user

On Sunday 13 December 2009 08:46:05 Stroller wrote:
> Hi all,
> 
> A .doc file contains an image. Is there any way to extract the image
> file in its original format, please?
> 
> This may seem like a bit of an odd request, so I'll explain. The .doc
> file is quite large, and it seems like the image it contains must be
> to blame. I would like to extract the original file of the image and
> examine it. I have tried in OpenOffice on Windows and Word for Mac. In
> OpenOffice I can't see any way to save the image file, 

I don't know about MSWindows, but in OOo-bin in Linux I can right-click on the 
image and select 'Save graphics' when the image is jpeg/png/etc.  Not sure if 
this works with MS embedded images/files from e.g. Powerpoint.
-- 
Regards,
Mick



* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-13 10:50 ` Mick
@ 2009-12-13 12:12   ` Stroller
  2009-12-13 12:50     ` Mick
  0 siblings, 1 reply; 25+ messages in thread
From: Stroller @ 2009-12-13 12:12 UTC (permalink / raw
  To: gentoo-user

On 13 Dec 2009, at 10:50, Mick wrote:
> On Sunday 13 December 2009 08:46:05 Stroller wrote:
>> A .doc file contains an image. Is there any way to extract the image
>> file in its original format, please?
>> .... I have tried in OpenOffice on Windows and Word for Mac. In
>> OpenOffice I can't see any way to save the image file, 
> 
> I don't know about MSWindows, but in OOo-bin in Linux I can right-click on the 
> image and select 'Save graphics' when the image is jpeg/png/etc.  Not sure if 
> this works with MS embedded images/files from e.g. Powerpoint.

This is strange. I get the same thing in Open Office (on Windows) if I create a new .doc and add a jpeg to it.

Right-clicking on the image gives me a menu of:  Arrange, Alignment, Anchor, Wrap, (separator), Picture..., Save Graphics..., Caption..., ImageMap, (separator), Cut, Copy, Paste.

If I open the file(s) I'm interested in, the first 4 entries in the context menu are the same, but after the first separator I get "Object" (which did not appear previously) and "Caption" instead. There is then another separator and, instead of Cut, Copy, Paste, I see only Cut & Copy.

This file was created by the software that a lettings agency uses to manage their properties. It runs on Windows and automatically generates letters (for overdue rent, inspections &c) in .doc format. One image in question is the boss' signature, so the letters appear like he actually signed them, but I think they also use company logos in other letters.

Apart from that, I don't see why this image is treated differently by OpenOffice.

Isn't there a program (command line?) for converting .doc into HTML? Maybe that would extract the image.

The reason I'd like to see this is because some of the .doc files are 2 meg in size (some others exactly 1meg, so cluster size may affect this) and there are thousands of them taking up space on the server. If the image is to blame then we would benefit many times from the size saving. I haven't yet spoken to the site about this, only discovering it yesterday, so I don't know if I can find the file by accessing the property management software.

Cheers,

Stroller.


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-13 12:12   ` Stroller
@ 2009-12-13 12:50     ` Mick
  0 siblings, 0 replies; 25+ messages in thread
From: Mick @ 2009-12-13 12:50 UTC (permalink / raw
  To: gentoo-user

On Sunday 13 December 2009 12:12:46 Stroller wrote:
> On 13 Dec 2009, at 10:50, Mick wrote:
> > On Sunday 13 December 2009 08:46:05 Stroller wrote:

> If I open the file(s) I have the interest in, the first 4 entries in the
>  context-menu are the same, but after the first separator I get instead
>  "Object" (which did not appear previously) and "Caption". There is then
>  another separator and instead of Cut, Copy, Paste, I see only Cut & Copy.

This indicates that the graphic in question is an embedded MSWindows file.  If 
you were able to double click on it in MSWindows it would read its metadata 
and launch the respective MSWindows application for editing it; e.g. MSPaint, 
PPt, Excel and what not.  With OOo this API linkage is not there I guess, so 
all you can do is cut/copy it.

> This file was created by the software that a lettings agency uses to manage
>  their properties. It runs on Windows and automatically generates letters
>  (for overdue rent, inspections &c) in .doc format. One image in question
>  is the boss' signature, so the letters appear like he actually signed
>  them, but I think they also use company logos in other letters.

I guess that whoever created this image did not save it as a 'conventional' 
image, e.g. jpeg, png, etc, and therefore OOo cannot deal with it as it would 
with a normal image.

> Apart from that, I don't see why this image is treated differently by
>  OpenOffice.

Because it is not an 'image' but an embedded MSWindows file in the MSWord 
document with loads of its own proprietary metadata.

> Isn't there a program (command line?) for converting .doc into HTML? Maybe
>  that would extract the image.

I think that MSWord has either a SaveAs or an export function which will 
convert the file into HTML.  Also OOo has File/Preview as HTML, which will 
convert the document into html and open it in a browser - if the graphics look 
correct then you could save it from within the browser.

> The reason I'd like to see this is because some of the .doc files are 2 meg
>  in size (some others exactly 1meg, so cluster size may affect this) and
>  there are thousands of them taking up space on the server. If the image is
>  to blame then we would benefit many times from the size saving. I haven't
>  yet spoken to the site about this, only discovering it yesterday, so I
>  don't know if I can find the file by accessing the property management
>  software.

Have you looked at what size you get with pdf'ing them?
-- 
Regards,
Mick



* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-13  8:46 [gentoo-user] OT: extract an image from a .doc file? Stroller
  2009-12-13 10:50 ` Mick
@ 2009-12-13 14:57 ` felix
  2009-12-13 15:01 ` Sebastian Beßler
  2009-12-14 19:23 ` Daniel da Veiga
  3 siblings, 0 replies; 25+ messages in thread
From: felix @ 2009-12-13 14:57 UTC (permalink / raw
  To: gentoo-user

On Sun, Dec 13, 2009 at 08:46:05AM +0000, Stroller wrote:

> A .doc file contains an image. Is there any way to extract the image  
> file in its original format, please?

My limited experience with OpenOffice is that in slideshows, right
click on an image brings up a context menu with a save image option.
I do not know if this applies to .doc files.

-- 
            ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
     Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-13  8:46 [gentoo-user] OT: extract an image from a .doc file? Stroller
  2009-12-13 10:50 ` Mick
  2009-12-13 14:57 ` felix
@ 2009-12-13 15:01 ` Sebastian Beßler
  2009-12-14  9:48   ` Stroller
  2009-12-14 19:23 ` Daniel da Veiga
  3 siblings, 1 reply; 25+ messages in thread
From: Sebastian Beßler @ 2009-12-13 15:01 UTC (permalink / raw
  To: gentoo-user

On 13.12.2009 09:46, Stroller wrote:
> Hi all,
> 
> A .doc file contains an image. Is there any way to extract the image
> file in its original format, please?

Open the doc file with OpenOffice and save it as an odt file.
The odt is a renamed zip archive that should contain the image in one of
its subfolders.
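
Because an .odt is an ordinary zip archive, the oversized member can also be spotted without renaming the file. The following is only a minimal Python sketch of that idea: the function name `largest_entries` is made up for illustration, and the path you pass in (for example `document.odt`) is an assumption about where the converted file was saved.

```python
import zipfile


def largest_entries(odt_path, top=5):
    """Return (name, uncompressed_size) pairs for the biggest archive members."""
    with zipfile.ZipFile(odt_path) as archive:
        # Sort the archive members by uncompressed size, largest first.
        members = sorted(archive.infolist(),
                         key=lambda info: info.file_size,
                         reverse=True)
        return [(info.filename, info.file_size) for info in members[:top]]
```

Calling `largest_entries("document.odt")` and printing the result should put whatever is inflating the document at the top of the list. A single member can then be written out with `zipfile.ZipFile.extract` for closer inspection.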

Greetings

Sebastian




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-13 15:01 ` Sebastian Beßler
@ 2009-12-14  9:48   ` Stroller
  2009-12-14 13:01     ` Renat Golubchyk
                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Stroller @ 2009-12-14  9:48 UTC (permalink / raw
  To: gentoo-user

On 13 Dec 2009, at 15:01, Sebastian Beßler wrote:
> On 13.12.2009 09:46, Stroller wrote:
>> Hi all,
>>
>> A .doc file contains an image. Is there any way to extract the image
>> file in its original format, please?
>
> Open the doc file with OpenOffice, save it as an odt file.
> The odt is a renamed zip archive that should contain the image in one
> of its subfolders.

Great idea, Sebastian.

The file which is responsible for the size of the .doc is immediately  
obvious when I rename this document.odt to document.zip.

It is a 2meg file, but unfortunately, as Mick appears to have  
predicted, it is called simply "Object 1" with no file extension.

Running `file` on it shows it to be a "Microsoft Office Document", but  
it's apparently not the kind you can open in Word.
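
A "Microsoft Office Document" verdict from `file` most likely means only that the data sits in an OLE2 compound-file container, which legacy Office formats and embedded objects share, so it does not have to be a Word document. The sketch below checks for that container's 8-byte signature. The helper names are hypothetical, and a match says nothing about what is stored inside the container.

```python
# Signature found at the start of OLE2 compound files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole2(data: bytes) -> bool:
    """True if the byte string starts with the OLE2 compound-file signature."""
    return data[:8] == OLE2_MAGIC


def file_is_ole2(path) -> bool:
    """Read the first 8 bytes of a file and test them for the signature."""
    with open(path, "rb") as handle:
        return is_ole2(handle.read(8))
```

If `file_is_ole2("Object 1")` is true, the image is probably one stream inside the container rather than the whole file, which would be consistent with Word and OpenOffice refusing to open it directly.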

I suspect this is going to prove a dead loss. Thanks for your help,  
though.

Stroller.


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14  9:48   ` Stroller
@ 2009-12-14 13:01     ` Renat Golubchyk
  2009-12-14 14:43       ` Willie Wong
  2009-12-14 15:43       ` Stroller
  2009-12-14 15:06     ` Arttu V.
  2009-12-14 16:46     ` Sebastian Beßler
  2 siblings, 2 replies; 25+ messages in thread
From: Renat Golubchyk @ 2009-12-14 13:01 UTC (permalink / raw
  To: gentoo-user

On Mon, 14 Dec 2009 09:48:57 +0000
Stroller <stroller@stellar.eclipse.co.uk> wrote:
> 
> On 13 Dec 2009, at 15:01, Sebastian Beßler wrote:
> > On 13.12.2009 09:46, Stroller wrote:
> >> Hi all,
> >>
> >> A .doc file contains an image. Is there any way to extract the
> >> image file in its original format, please?
> >
> > Open the doc file with OpenOffice, save it as an odt file.
> > The odt is a renamed zip archive that should contain the image in
> > one of its subfolders.
> 
> Great idea, Sebastian.
> 
> The file which is responsible for the size of the .doc is
> immediately obvious when I rename this document.odt to document.zip.
> 
> It is a 2meg file, but unfortunately, as Mick appears to have  
> predicted, it is called simply "Object 1" with no file extension.
> 
> Running `file` on it shows it to be a "Microsoft Office Document",
> but it's apparently not the kind you can open in Word.

Have you tried opening this "Object 1" file in OpenOffice and repeating
the steps above?


Cheers,
Renat

-- 
Problems can never be solved with the same way of thinking
that created them.
                                              (Einstein)



* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 13:01     ` Renat Golubchyk
@ 2009-12-14 14:43       ` Willie Wong
  2009-12-14 19:21         ` Stroller
  2009-12-14 15:43       ` Stroller
  1 sibling, 1 reply; 25+ messages in thread
From: Willie Wong @ 2009-12-14 14:43 UTC (permalink / raw
  To: gentoo-user

On Mon, Dec 14, 2009 at 02:01:50PM +0100, Penguin Lover Renat Golubchyk squawked:
> > It is a 2meg file, but unfortunately, as Mick appears to have  
> > predicted, it is called simply "Object 1" with no file extension.
> > 
> > Running `file` on it shows it to be a "Microsoft Office Document",
> > but it's apparently not the kind you can open in Word.
> 
> Have you tried opening this "Object 1" file in OpenOffice and repeat
> the steps above again?

It would be hilarious if it were "Object N" all the way down. 

I apologize if these have been covered before, but since I don't
remember seeing it:
 (a) Is it not possible to extract that image in Microsoft Word
 itself? (Opening the file in question in Microsoft Word and saving
 the image?) What happens if you save the file in Word's funny XML
 format? (Knowing MS, I wouldn't be too surprised if the image becomes
 some sort of funny base64 encoded string, but it is still worth a
 try.)
 (b) If the Big Wig is already happily letting the computer sign those
 documents for him, is it prohibitive to try the non-technological
 measure? E.g., ask the Big Wig to provide another image of his
 signature? 
 (c) If the image file is that big, it is probably because the
 original that got included in the doc file has a ridiculously high
 resolution (maybe they just scanned the signature in, cleaned it up a
 bit? My signature usually fits in a 1/2 inch by 2 inch block; if
 scanned at 24-bit color and 600 dpi, this makes roughly a 1M raw
 image). I hope that if the processing/storage/bandwidth tax is high
 enough, an "upstream" fix would not be ruled out directly.
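
The estimate in (c) can be checked with a few lines of arithmetic. This is just a sketch of the calculation for an uncompressed image, using the dimensions quoted above; the function name is made up for illustration.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan with the given dimensions."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# 2 inch by 1/2 inch signature block, 600 dpi, 24-bit colour
size = raw_image_bytes(2, 0.5, 600, 24)
print(size)                      # 1080000
print(round(size / 2**20, 2))    # 1.03
```

That is 1200 x 300 pixels at 3 bytes each, so a single uncompressed scan at these settings already weighs in at about 1 MiB before any container overhead.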

Also, I do recall that newer versions of MS Word have the capability to
compress included images, though it is not used by default. 

Cheers, 

W

-- 
(04:01:59) W: yep
(04:02:02) W: I love linux
(04:02:15) NJYWT: I love penguins
Sortir en Pantoufles: up 1102 days, 13:18


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14  9:48   ` Stroller
  2009-12-14 13:01     ` Renat Golubchyk
@ 2009-12-14 15:06     ` Arttu V.
  2009-12-14 15:18       ` Willie Wong
  2009-12-14 16:46     ` Sebastian Beßler
  2 siblings, 1 reply; 25+ messages in thread
From: Arttu V. @ 2009-12-14 15:06 UTC (permalink / raw
  To: gentoo-user

On 12/14/09, Stroller <stroller@stellar.eclipse.co.uk> wrote:
> It is a 2meg file, but unfortunately, as Mick appears to have
> predicted, it is called simply "Object 1" with no file extension.
>
> Running `file` on it shows it to be a "Microsoft Office Document", but
> it's apparently not the kind you can open in Word.
>
> I suspect this is going to prove a dead loss. Thanks for your help,
> though.

Throwing a wild guess here. Could it be a MODI object?

http://en.wikipedia.org/wiki/MODI

Then you have entered captive markets, might be hard to do much
without software from MS.

-- 
Arttu V.




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 15:06     ` Arttu V.
@ 2009-12-14 15:18       ` Willie Wong
  2009-12-14 16:25         ` Dale
  0 siblings, 1 reply; 25+ messages in thread
From: Willie Wong @ 2009-12-14 15:18 UTC (permalink / raw
  To: gentoo-user

On Mon, Dec 14, 2009 at 05:06:35PM +0200, Penguin Lover Arttu V. squawked:
> On 12/14/09, Stroller <stroller@stellar.eclipse.co.uk> wrote:
> > It is a 2meg file, but unfortunately, as Mick appears to have
> > predicted, it is called simply "Object 1" with no file extension.
> >
> > Running `file` on it shows it to be a "Microsoft Office Document", but
> > it's apparently not the kind you can open in Word.
> >
> > I suspect this is going to prove a dead loss. Thanks for your help,
> > though.
> 
> Throwing a wild guess here. Could it be a MODI object?
> 
> http://en.wikipedia.org/wiki/MODI
> 
> Then you have entered captive markets, might be hard to do much
> without software from MS.

Correct me if I am wrong, but isn't the original object an image of a
signature? If they used MODI (whose point I thought was so that you
have OCR on the scanned document) for an illegible scrawl, I think
this should be nominated for the DailyWTF....

Cheers, 

W
-- 
A cliche is a cliche is a cliche is a cliche is a cliche is a cliche.
Sortir en Pantoufles: up 1102 days, 14:07




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 13:01     ` Renat Golubchyk
  2009-12-14 14:43       ` Willie Wong
@ 2009-12-14 15:43       ` Stroller
  2009-12-14 16:44         ` Renat Golubchyk
  1 sibling, 1 reply; 25+ messages in thread
From: Stroller @ 2009-12-14 15:43 UTC (permalink / raw
  To: gentoo-user


On 14 Dec 2009, at 13:01, Renat Golubchyk wrote:
>> ...
>> The file which is responsible for the size of the .doc is
>> immediately obvious when I rename this document.odt to document.zip.
>>
>> It is a 2meg file, but unfortunately, as Mick appears to have
>> predicted, it is called simply "Object 1" with no file extension.
>>
>> Running `file` on it shows it to be a "Microsoft Office Document",
>> but it's apparently not the kind you can open in Word.
>
> Have you tried opening this "Object 1" file in OpenOffice and repeat
> the steps above again?

I don't seem to be able to open this file in Open Office. It doesn't  
recognise the format, and gives me a list of about 100 file types to  
try. Choosing (I think) Microsoft Word document doesn't work, and I  
can't see anything else in the list that looks more promising.

I tried running photorec on "Object 1" and it produces a  
recovered .doc file, but that doesn't open, either.

Stroller.





* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 15:18       ` Willie Wong
@ 2009-12-14 16:25         ` Dale
  2009-12-14 17:27           ` Willie Wong
  2009-12-14 19:45           ` Stroller
  0 siblings, 2 replies; 25+ messages in thread
From: Dale @ 2009-12-14 16:25 UTC (permalink / raw
  To: gentoo-user

Willie Wong wrote:
>
> Correct me if I am wrong, but isn't the original object a image of a
> signature? If they used MODI (whose point I thought was so that you
> have OCR on the scanned document) for an illegible scrawl, I think
> this should be nominated for the DailyWTF....
>
> Cheers, 
>
> W
>   

I'm somewhat clueless about this software issue but wonder about this 
way of seeing things.  Since it appears there is a signature, as in what 
is at the bottom of a letter or a bank check, wouldn't they want to make 
it so that it is not able to be extracted at all?  If I had a digital 
signature, I wouldn't want to put it somewhere that it could be used by 
someone that I wouldn't want it to be used by.

I'm not saying you would do this, because I don't think you would, but 
if you could grab that image, then someone who did have bad intentions 
could do the same thing.  This would not be good.  My reason for 
mentioning this, it may be done in such a way that is intended to 
prevent you from doing what you want to do.  Therefore, it may not be 
"doable" by design.

Just a thought.

Dale

:-)  :-) 


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 15:43       ` Stroller
@ 2009-12-14 16:44         ` Renat Golubchyk
  2009-12-15 14:22           ` Sebastian Beßler
  0 siblings, 1 reply; 25+ messages in thread
From: Renat Golubchyk @ 2009-12-14 16:44 UTC (permalink / raw
  To: gentoo-user

On Mon, 14 Dec 2009 15:43:23 +0000
Stroller <stroller@stellar.eclipse.co.uk> wrote:
> On 14 Dec 2009, at 13:01, Renat Golubchyk wrote:
> >> ...
> >> The file which is responsible for the size of the .doc is
> >> immediately obvious when I rename this document.odt to
> >> document.zip.
> >>
> >> It is a 2meg file, but unfortunately, as Mick appears to have
> >> predicted, it is called simply "Object 1" with no file extension.
> >>
> >> Running `file` on it shows it to be a "Microsoft Office Document",
> >> but it's apparently not the kind you can open in Word.
> >
> > Have you tried opening this "Object 1" file in OpenOffice and repeat
> > the steps above again?
> 
> I don't seem to be able to open this file in Open Office. It doesn't  
> recognise the format, and gives me a list of about 100 file types to  
> try. Choosing (I think) Microsoft Word document doesn't work, and I  
> can't see anything else in the list that looks more promising.
> 
> I tried running photorec on "Object 1" and it produces a  
> recovered .doc file, but that doesn't open, either.

Try checking it with ImageMagick's "identify".


Cheers,
Renat

-- 
Problems can never be solved with the same way of thinking
that created them.
                                              (Einstein)



* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14  9:48   ` Stroller
  2009-12-14 13:01     ` Renat Golubchyk
  2009-12-14 15:06     ` Arttu V.
@ 2009-12-14 16:46     ` Sebastian Beßler
  2 siblings, 0 replies; 25+ messages in thread
From: Sebastian Beßler @ 2009-12-14 16:46 UTC (permalink / raw
  To: gentoo-user

On Monday, 14 December 2009 10:48:57, Stroller wrote:

> I suspect this is going to prove a dead loss. Thanks for your help,
> though.
 
As mentioned here 
http://suppressingfire.org/~burner/evil-mods-tiff/
you could try to use http://foremost.sourceforge.net/ to recover the image 
from "Object 1".
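
Carving tools of this kind work, broadly, by scanning raw bytes for known file headers. The Python sketch below is a much simplified illustration of that idea and is not how foremost itself is implemented: the signature table covers only JPEG, PNG and GIF, and the function name is hypothetical.

```python
# (kind, header bytes) pairs for a few common image formats.
SIGNATURES = [
    ("jpeg", b"\xff\xd8\xff"),
    ("png", b"\x89PNG\r\n\x1a\n"),
    ("gif", b"GIF87a"),
    ("gif", b"GIF89a"),
]


def find_signatures(data: bytes):
    """Return a sorted list of (offset, kind) for every header found in data."""
    hits = []
    for kind, magic in SIGNATURES:
        start = 0
        while True:
            pos = data.find(magic, start)
            if pos == -1:
                break
            hits.append((pos, kind))
            start = pos + 1  # keep searching after this match
    return sorted(hits)
```

Reading "Object 1" in binary mode and passing its contents to `find_signatures` would show whether a recognisable image starts somewhere inside it. An empty result does not prove there is no image, since an embedded bitmap may be stored without a standalone file header.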

Greetings

Sebastian




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 16:25         ` Dale
@ 2009-12-14 17:27           ` Willie Wong
  2009-12-14 19:45           ` Stroller
  1 sibling, 0 replies; 25+ messages in thread
From: Willie Wong @ 2009-12-14 17:27 UTC (permalink / raw
  To: gentoo-user

On Mon, Dec 14, 2009 at 10:25:51AM -0600, Penguin Lover Dale squawked:
> I'm somewhat clueless about this software issue but wonder about this way 
> of seeing things.  Since it appears there is a signature, as in what is at 
> the bottom of a letter or a bank check, wouldn't they want to make it so 
> that is not able to be extracted at all?  If I had a digital signature, I 
> wouldn't want to put it somewhere that it could be used by someone that I 
> wouldn't want it to be used by.

What's to prevent me from printing it out and forging the signature by
hand? Or, god forbid, zooming in, taking a screen-cap, and using the
extracted image that way? If this design is *for security*, there are
many problems. 

(Incidentally, Donald Knuth stopped sending out real cheques for bug
discovery because people would scan them, put them online for bragging
rights, and Knuth's bank account would get targeted for fraudulent
transactions.)

W
-- 
A lot of money is tainted. It taint yours and it taint mine.
Sortir en Pantoufles: up 1102 days, 16:12




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 14:43       ` Willie Wong
@ 2009-12-14 19:21         ` Stroller
  2009-12-15 15:50           ` Marcus Wanner
  0 siblings, 1 reply; 25+ messages in thread
From: Stroller @ 2009-12-14 19:21 UTC (permalink / raw
  To: gentoo-user

On 14 Dec 2009, at 14:43, Willie Wong wrote:
> ...
> (b) If the Big Wig is already happily letting the computer sign those
> documents for him, is it prohibitive to try the non-technological
> measure? E.g., ask the Big Wig to provide another image of his
> signature?

Oh, for sure.

I just didn't expect it to be this complicated. I expected to be able  
to open the document and pretty much to be able to click on the file  
to ascertain its file size. I expected to be able to turn around  
quickly to the boss and say "you'd have saved all this file space if  
you used a 20kb version instead".

When I posted here I was kinda expecting someone to be able to suggest  
a 2- to 5-minute fix. I had no idea it would be this complicated, and  
now I'm mostly only interested because it has become an interesting  
problem.

> (c) If the image file is that big, it is probably because the
> original that got included in the doc file has a ridiculously high
> resolution (maybe they just scanned the signature in, cleaned it up a
> bit? My signature usually fits in a 1/2 inch by 2 inch block, if
> scanned at 24-bit color and 600 dpi, this makes almost a 1M raw
> image). I hope if the processing/storage/bandwidth tax is high
> enough, an "upstream" fix would not be ruled out directly.

Yeah, I think I have a copy of my signature here which was scanned at
about that kind of resolution, stored as a bitmap, and has a large
file size. When I discovered how badly it slowed Word down when I
tried to place it in a document, I replaced it with a much
smaller gif version. The resulting improvement in performance was,
to me, slightly unexpected - surely, whatever the original format,
both images must be stored in RAM in about the same way.

Stroller.


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-13  8:46 [gentoo-user] OT: extract an image from a .doc file? Stroller
                   ` (2 preceding siblings ...)
  2009-12-13 15:01 ` Sebastian Beßler
@ 2009-12-14 19:23 ` Daniel da Veiga
  2009-12-15 13:01   ` Stroller
  3 siblings, 1 reply; 25+ messages in thread
From: Daniel da Veiga @ 2009-12-14 19:23 UTC (permalink / raw
  To: gentoo-user

On Sun, Dec 13, 2009 at 06:46, Stroller <stroller@stellar.eclipse.co.uk> wrote:
> Hi all,
>
> A .doc file contains an image. Is there any way to extract the image file in
> its original format, please?
>
> This may seem like a bit of an odd request, so I'll explain. The .doc file
> is quite large, and it seems like the image it contains must be to blame. I
> would like to extract the original file of the image and examine it. I have
> tried in OpenOffice on Windows and Word for Mac. In OpenOffice I can't see
> any way to save the image file, in Word for Mac I can drag the file to the
> desktop but it becomes a "Picture clipping.pictClipping" and is clearly not
> the original format.
>
> I tried running `photorec` on the .doc file, but that just "finds" the .doc
> file itself. I thought to use dd to zero over the first few bytes of the
> .doc - maybe this would make the .doc unrecognisable to photorec, and then
> photorec would maybe find the image file inside the corrupt document, but I
> haven't tried that yet. I'm not sure if it'd work, and so I thought I'd ask
> here to see if anyone knew of an easy way to do this first.
>
> TIA for any suggestions,
>

When I want to extract an image from a doc I save it as HTML. It saves
images in a separate folder and links them into the HTML. I simply go
to the folder and check the image.

Hope it helps.

-- 
Daniel da Veiga




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 16:25         ` Dale
  2009-12-14 17:27           ` Willie Wong
@ 2009-12-14 19:45           ` Stroller
  1 sibling, 0 replies; 25+ messages in thread
From: Stroller @ 2009-12-14 19:45 UTC (permalink / raw
  To: gentoo-user

On 14 Dec 2009, at 16:25, Dale wrote:
> ...
> I'm somewhat clueless about this software issue but wonder about  
> this way of seeing things.  Since it appears there is a signature,  
> as in what is at the bottom of a letter or a bank check, wouldn't  
> they want to make it so that is not able to be extracted at all?  If  
> I had a digital signature, I wouldn't want to put it somewhere that  
> it could be used by someone that I wouldn't want it to be used by.

My customer, an agency which rents out apartments on behalf of  
landlords, has a server on which they store all their files. These  
letters are generated by their property management software, and are  
written to their landlords, tenants &c. Obviously they have to keep a  
big archive of copies of all the letters they've written in the past.

The letters are stored in .doc format, but when they're originated I  
think they're actually just printed out and posted. Storing the  
signature in the .doc is, I think, just a convenience to save the boss  
(or other members of staff) having to physically scribble a signature  
at the bottom of each letter.

I'm guessing the office might manage c 100 properties, and I imagine  
that they may generate some thousands of letters per year. So a 1meg  
image in a letter might well start to consume gigs of disk space.  
"Gigs of disk space" is only a little bit of a problem because when  
this server was manufactured, 4 years ago, decent RAID meant  
horrendously expensive SCSI disks (it is still under support contract  
for another year) and because I haven't got around to migrating them  
away from an offsite-backup provider who is somewhat overpriced.

Hope this clarifies,

Stroller.

PS: a couple of great suggestions have been made this afternoon. I'm
going to watch some of the fruits of my labours - my labours involving
get_iplayer earlier today - and will try these suggestions later. Many
thanks!


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 19:23 ` Daniel da Veiga
@ 2009-12-15 13:01   ` Stroller
  2009-12-15 14:00     ` Mick
  0 siblings, 1 reply; 25+ messages in thread
From: Stroller @ 2009-12-15 13:01 UTC (permalink / raw
  To: gentoo-user

On 14 Dec 2009, at 19:23, Daniel da Veiga wrote:
> ...
> When I want to extract an image from a doc I save it as HTML. It saves
> images in a separated folder and links it into the HTML. I simply go
> to the folder and check the image.

When I do this in Open Office the image in the resulting .html  
document is a .png. If I do it in Word for Mac it's a .gif, although  
there appears to be an option to use .png in the export options.

I think the image is a bitmap & it's being converted. There's nothing  
else in the document that would explain it being 2meg.

I'll try Renat's suggestion to use ImageMagick's `identify` command  
(emerging ImageMagick now), but will just mention it to the customer  
this afternoon.

Stroller.


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-15 13:01   ` Stroller
@ 2009-12-15 14:00     ` Mick
  2009-12-15 16:29       ` Stroller
  0 siblings, 1 reply; 25+ messages in thread
From: Mick @ 2009-12-15 14:00 UTC (permalink / raw
  To: gentoo-user

2009/12/15 Stroller <stroller@stellar.eclipse.co.uk>:
>
> On 14 Dec 2009, at 19:23, Daniel da Veiga wrote:
>>
>> ...
>> When I want to extract an image from a doc I save it as HTML. It saves
>> images in a separated folder and links it into the HTML. I simply go
>> to the folder and check the image.
>
> When I do this in Open Office the image in the resulting .html document is a
> .png. If I do it in Word for Mac it's a .gif, although there appears to be
> an option to use .png in the export options.
>
> I think the image is a bitmap & it's being converted. There's nothing else
> in the document that would explain it being 2meg.
>
> I'll try Renat's suggestion to use ImageMagick's `identify` command
> (emerging ImageMagick now), but will just mention it to the customer this
> afternoon.

I'm guessing that the OOo HTML converter will probably turn images
into PNGs.  If you want to see what the original format is, then open
the .doc file using OOo and Save As an ODF file - OOo's open document
format.  Then unzip it, and in the folder that is created you will find,
amongst others:

-Configurations/Images
-Pictures
-Thumbnails

assuming that OOo was successful in converting them to a
non-proprietary format.  However, if the signature file is a MSWindows
embedded metafile you may be out of luck.  In that case the only
solution is to ask the originators of these files to paste/embed these
signature images as a png/jpeg file.
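
The unzip step above can also be done without unpacking anything: an .odt file is an ordinary zip archive, so Python's standard zipfile module can list what sits under Pictures/ along with each file's size. This is only a sketch. The function name list_embedded_pictures is made up for illustration, and it assumes OOo stored the images under Pictures/ as described above.

```python
import zipfile


def list_embedded_pictures(odt_path):
    """Return (name, size_in_bytes) for each file under Pictures/ in an ODF archive."""
    with zipfile.ZipFile(odt_path) as archive:
        return [
            (info.filename, info.file_size)  # file_size is the uncompressed size
            for info in archive.infolist()
            if info.filename.startswith("Pictures/") and not info.is_dir()
        ]
```

Calling list_embedded_pictures("letter.odt") (a hypothetical filename) returns the stored image names and their uncompressed sizes; the extension and size together should hint at whether the signature is an uncompressed bitmap.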

Come to think of it, you may also be able to copy and paste the
image after you convert the file into pdf ... but I am not sure if
this is going to help with your problem.
-- 
Regards,
Mick


* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 16:44         ` Renat Golubchyk
@ 2009-12-15 14:22           ` Sebastian Beßler
  0 siblings, 0 replies; 25+ messages in thread
From: Sebastian Beßler @ 2009-12-15 14:22 UTC (permalink / raw
  To: gentoo-user

Am Montag, 14. Dezember 2009 17:44:01 schrieb Renat Golubchyk:

> Try checking it with ImageMagick's "identify".

app-forensic/foremost may be useful too
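
Carving tools such as foremost work by looking for known file headers inside a block of data. The Python below is a simplified illustration of that signature-scanning idea only, not foremost's actual implementation; the function name find_image_signatures and the short signature table are assumptions made for the example. BMP is left out because its two-byte "BM" signature produces too many false positives.

```python
# Well-known file signatures ("magic numbers") for a few image formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def find_image_signatures(data: bytes):
    """Return sorted (offset, format) pairs for every signature found in data."""
    hits = []
    for fmt, magic in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, fmt))
            # Continue searching just past the previous match.
            start = data.find(magic, start + 1)
    return sorted(hits)
```

Reading the .doc with open(path, "rb").read() and passing the bytes to this function gives candidate offsets only; a hit is a hint that an image starts there, not proof, and an image stored without its usual header will not be found at all.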

Greetings

Sebastian




* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-14 19:21         ` Stroller
@ 2009-12-15 15:50           ` Marcus Wanner
  0 siblings, 0 replies; 25+ messages in thread
From: Marcus Wanner @ 2009-12-15 15:50 UTC (permalink / raw
  To: gentoo-user

On 12/14/2009 2:21 PM, Stroller wrote:
> Yeah, I think I have a copy of my signature here which was scanned at 
> about that kinda resolution, stored as a bitmap & has a large 
> filesize. When I discovered how badly it slowed down Word when 
> actually trying to place it in a document it got replaced with a much 
> smaller gif version. The improvement in performance that this 
> eventuated was, to me, slightly unexpected - surely whatever the 
> original format, both images must be stored in RAM in about the same way.
>
> Stroller.
There's MS Office for you...





* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-15 14:00     ` Mick
@ 2009-12-15 16:29       ` Stroller
  2009-12-15 21:45         ` Mick
  0 siblings, 1 reply; 25+ messages in thread
From: Stroller @ 2009-12-15 16:29 UTC (permalink / raw
  To: gentoo-user


On 15 Dec 2009, at 14:00, Mick wrote:
> ...
> I'm guessing that the OOo HTML converter will probably turn images
> into PNGs.  If you want to see what the original format is then open
> the .doc file using OOo and Save As an ODF file - OOo's open document
> format.  Then unzip it and in the folder that is created amongst other
> you will find:
>
> -Configurations/Images
> -Pictures
> -Thumbnails

"Save As an ODF file" - do you mean .ODT?

<http://archives.gentoo.org/gentoo-user/msg_bd7e73e69d5212365418fc46d2626f26.xml>

I can't find ODF as an option on this version of Open Office (3.1.0).

Stroller.





* Re: [gentoo-user] OT: extract an image from a .doc file?
  2009-12-15 16:29       ` Stroller
@ 2009-12-15 21:45         ` Mick
  0 siblings, 0 replies; 25+ messages in thread
From: Mick @ 2009-12-15 21:45 UTC (permalink / raw
  To: gentoo-user


On Tuesday 15 December 2009 16:29:58 Stroller wrote:
> On 15 Dec 2009, at 14:00, Mick wrote:
> > ...
> > I'm guessing that the OOo HTML converter will probably turn images
> > into PNGs.  If you want to see what the original format is then open
> > the .doc file using OOo and Save As an ODF file - OOo's open document
> > format.  Then unzip it and in the folder that is created amongst other
> > you will find:
> >
> > -Configurations/Images
> > -Pictures
> > -Thumbnails
> 
> "Save As an ODF file" - do you mean .ODT?

Yes. The Open Document Format has an .odt file extension for text files.  The 
drop down when you select Save As (in OOo 3.1.1) says:

ODF Text Document (.odt)

The "t" in .odt stands for text.  For spreadsheets it's .ods, etc.
-- 
Regards,
Mick




Thread overview: 25+ messages
2009-12-13  8:46 [gentoo-user] OT: extract an image from a .doc file? Stroller
2009-12-13 10:50 ` Mick
2009-12-13 12:12   ` Stroller
2009-12-13 12:50     ` Mick
2009-12-13 14:57 ` felix
2009-12-13 15:01 ` Sebastian Beßler
2009-12-14  9:48   ` Stroller
2009-12-14 13:01     ` Renat Golubchyk
2009-12-14 14:43       ` Willie Wong
2009-12-14 19:21         ` Stroller
2009-12-15 15:50           ` Marcus Wanner
2009-12-14 15:43       ` Stroller
2009-12-14 16:44         ` Renat Golubchyk
2009-12-15 14:22           ` Sebastian Beßler
2009-12-14 15:06     ` Arttu V.
2009-12-14 15:18       ` Willie Wong
2009-12-14 16:25         ` Dale
2009-12-14 17:27           ` Willie Wong
2009-12-14 19:45           ` Stroller
2009-12-14 16:46     ` Sebastian Beßler
2009-12-14 19:23 ` Daniel da Veiga
2009-12-15 13:01   ` Stroller
2009-12-15 14:00     ` Mick
2009-12-15 16:29       ` Stroller
2009-12-15 21:45         ` Mick
