Copyfish OCR is a free extension for the Chrome and Firefox browsers that does a pretty good job of converting the text in images and PDF files into real, editable text.

Update from the Copyfish team: Copyfish was hacked but is back under our control! A new, clean and safe version of Copyfish (reviewed by Google) is now available for download. The stolen and infected version has been disabled by Google (More info here).

Translators often complain about receiving screenshots or PDF files for translation. They have good reason to be a bit upset: images and PDFs that contain text for translation often cannot be processed by their CAT tools. This is not only an issue for translators. Maybe you have screenshots or protected PDFs (files you can open with software like Adobe Acrobat Reader) from which you would like to extract the text for some reason.

The Solution

The solution to this kind of problem is OCR software. OCR stands for Optical Character Recognition. OCR software can convert an image of text into real text so it can be translated. How successful the conversion is depends on the resolution of the image and some other factors. Even when the text has been converted, it might need a little editing before it can be processed efficiently with a CAT tool like OmegaT.
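As an illustration of the kind of light editing mentioned above, here is a minimal Python sketch that tidies raw OCR output before it goes into a CAT tool. It is not part of Copyfish or OmegaT. The function name `clean_ocr_text` and the cleanup rules (rejoining words split by a hyphen at a line end, merging the lines of a paragraph, collapsing extra spaces) are assumptions about what typical OCR output needs, so adjust them to your own texts.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Tidy raw OCR output so each paragraph becomes one line of text.

    Hypothetical helper for illustration; the rules are assumptions.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words hyphenated across a line break: "transla-\ntion" -> "translation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines so paragraph boundaries survive.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for paragraph in paragraphs:
        # Collapse remaining line breaks and runs of whitespace into single spaces.
        line = re.sub(r"\s+", " ", paragraph).strip()
        if line:
            cleaned.append(line)
    return "\n\n".join(cleaned)


sample = "Copyfish extracts text for transla-\ntion from  an image.\n\nSecond\nparagraph."
print(clean_ocr_text(sample))
```

Be aware that the hyphen rule also joins genuine compounds that happen to break at a line end ("well-" followed by "known" becomes "wellknown"), so a quick read-through of the result is still a good idea.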

Copyfish does a pretty good job, especially considering that it is completely free. It not only captures the text but can also translate it into a language of your choice (it uses Google Translate to do this).

How do you use Copyfish? Follow these steps:

  1. Install the extension and click the Copyfish icon in your Chrome or Firefox browser.
  2. Select the area with the text (image, video, HTML5, AJAX – Copyfish works with any input).
  3. Done! Copyfish extracts the text from the image, displays it, and translates it.

Additional Information:

How to use Copyfish