Posted

I think I must be thick, but I scanned a Word document with a view to adding text, and the scanner has automatically saved it as a JPEG!

I just can't get into it to edit.

Is it possible to change JPEG to Word? Then I think I could edit it with no problems.

Anyone know how? Like the best programme to open it with?

Thanks

begs

Guest Reimar
Posted

You'll need to use OCR software such as Omni-Page, Paperport or something similar to convert the picture back to text.

Simply put, the OCR software will "read" the picture and recognize the letters in it.

OCR software normally comes bundled with the scanner.

Cheers.

Posted

I don’t know, Begs. The format for an image and the format for a Word document are completely different. A computer generally stores an image by breaking it down into its smallest components (pixels) and assigning a color value to each pixel, often along with a transparency value as well. This is the basic bitmap format. For a 4-pixel image that is all red, you could imagine the following, expressed as Red, Green, Blue, Alpha (RGBA):

255,0,0,0 255,0,0,0

255,0,0,0 255,0,0,0

or

Red Red

Red Red

That would be 4 red pixels arranged in a square, where the red value is the maximum, 255, and the green, blue and alpha values are 0.

That could then be represented in binary, e.g. each 255,0,0,0 pixel could be expressed as 4 bytes (one double word), so the two rows of two pixels look like this:

11111111 00000000 00000000 00000000 11111111 00000000 00000000 00000000

11111111 00000000 00000000 00000000 11111111 00000000 00000000 00000000
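If it helps to see that in code, here is a minimal Python sketch of the same 2x2 red image. The names `RED`, `pixels`, `raw` and `to_bits` are just mine for illustration, and real image files also carry headers and other data that this ignores.

```python
# One pixel as Red, Green, Blue, Alpha, one byte (0-255) per channel.
RED = (255, 0, 0, 0)

# A 2x2 image stored row by row: four red pixels.
pixels = [RED, RED, RED, RED]

# Flatten the pixels into raw bytes: 4 pixels x 4 channels = 16 bytes.
raw = bytes(channel for pixel in pixels for channel in pixel)

def to_bits(data):
    """Format each byte as 8 binary digits, separated by spaces."""
    return " ".join(format(b, "08b") for b in data)

print(len(raw))          # number of bytes in the image data
print(to_bits(raw[:4]))  # the first pixel written out in binary
```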

Other formats such as JPEG and GIF will then use a compression algorithm to try to save space, since representing the same value 4 times would be better expressed as 'from 1 to 4 use color 255,0,0,0'.
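The 'from 1 to 4 use this color' idea is known as run-length encoding. A rough Python sketch of it follows, just to show the principle; the function names are my own, and JPEG and GIF actually use different and more elaborate schemes (GIF uses LZW, JPEG is lossy), so don't read this as how those formats work internally.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (count, value) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][1] == value:
            # Same as the previous value: extend the current run.
            runs[-1] = (runs[-1][0] + 1, value)
        else:
            # A new value: start a new run of length 1.
            runs.append((1, value))
    return runs

def run_length_decode(runs):
    """Expand (count, value) pairs back into the original list."""
    values = []
    for count, value in runs:
        values.extend([value] * count)
    return values

red = (255, 0, 0, 0)
image = [red, red, red, red]
encoded = run_length_encode(image)
print(encoded)  # one run covering all four pixels
```

For an all-red image the whole thing shrinks to a single run, which is where the space saving comes from. An image where every pixel is different would not shrink at all this way.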

A text document on the other hand is not the same. At the very basic level characters are simply stored as numbers.

My name, Mal, for example could be stored as the decimal numbers

77 97 108

(Mal)

which (being ASCII codes) can also be written in binary as

01001101 01100001 01101100
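You can check those numbers yourself with a few lines of Python. This is only a small sketch using the built-in `ord` and `format` functions; the variable names are mine.

```python
name = "Mal"

# ord() gives the numeric code of each character (ASCII for plain English letters).
codes = [ord(ch) for ch in name]

# Write each code as 8 binary digits.
bits = [format(code, "08b") for code in codes]

print(codes)
print(" ".join(bits))

# Going the other way: turn the numbers back into text.
restored = bytes(codes).decode("ascii")
print(restored)
```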

Everything is stored as binary; it's just how the computer interprets the binary that counts. When you open a file as an image, the computer will treat the binary in the file as values to draw pixels from. When you open a file as a text document, the computer will treat the content of the file as values to create characters from.
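To make that concrete, here is a tiny Python sketch that takes the same three bytes and reads them two different ways. It is only an illustration; a real image or document file has a lot more structure around the data.

```python
# The same three bytes, with no label saying what they "are".
data = bytes([77, 97, 108])

# Interpreted as text: each byte is an ASCII character code.
as_text = data.decode("ascii")

# Interpreted as a pixel: each byte is a Red, Green or Blue value.
as_pixel = tuple(data)

print(as_text)
print(as_pixel)
```

Nothing in the bytes themselves says which reading is right. It is the program opening the file that decides.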

Now, you've scanned a Word document. Before, the document was a series of binary values representing the characters of the document (there are other things as well apart from the actual character data in a Word doc, but for simplicity ...). When you scanned it, it was saved as an image, so the data in the scanned image file now represents the color values of the page, NOT the character data. This is why you can’t simply re-open the document and edit the text content.

To try to convert back from an image to a text document, as Reimar pointed out, you need some sort of software that will try to figure out what the original characters were based on the color of the pixels. This is called Optical Character Recognition (OCR).

There are a number of applications out there, none of which I have tested, but this one looks like it might be OK:
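As a very rough idea of what 'figuring out the characters from the pixels' means, here is a toy Python sketch that compares a tiny black-and-white glyph against a few known letter shapes and picks the closest one. The 3x3 templates and the function names are made up for illustration, and real OCR software is far more sophisticated than this.

```python
# Hypothetical 3x3 letter shapes: "1" is a dark pixel, "0" is a light one.
TEMPLATES = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}

def count_matching_pixels(glyph, template):
    """Count the positions where the two bitmaps have the same pixel."""
    return sum(
        pixel_a == pixel_b
        for row_a, row_b in zip(glyph, template)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )

def recognize(glyph):
    """Return the letter whose template agrees with the glyph on the most pixels."""
    return max(TEMPLATES, key=lambda ch: count_matching_pixels(glyph, TEMPLATES[ch]))

# A scanned "L" with one stray dark pixel in the middle.
noisy_l = ("100", "110", "111")
print(recognize(noisy_l))
```

Because it picks the closest match rather than demanding an exact one, a bit of scanner noise does not throw it off, but it can also guess wrong. That is why OCR output usually needs proofreading.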

http://www.simpleocr.com/Info.asp

I hope this makes sense.

Good luck

Posted

I love OneNote - it's a pity it isn't standard in all versions of Office. I use its OCR quite a lot to recover abstracts from 'old' scanned papers.
