OCR

Copying Machine offers basic text recognition (OCR) for images, using the open source Tesseract library. Don't expect an error-free conversion to a text file; treat it as a way to extract (small) blocks of text from a scanned image.

How does it work?

  • Make a selection on the document.
  • Choose 'Recognize text' from the 'Page' menu or the 'Page' pane.
  • The OCR dialog will appear. Select the language of your document and press 'Start'.
  • The result box will show the recognized text. You can copy it to the clipboard.
  • A text block appears on the document. When you save the document as a PDF, all text blocks are saved into it, so the recognized text can then be copied from the PDF.
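
For readers who want to try the same kind of recognition outside Copying Machine, the steps above can be roughly sketched with the Tesseract command-line tool. This is only an illustration, not how Copying Machine itself calls the library. It assumes that the tesseract program is installed and on the PATH, that the selection has already been saved as an image file, and that the language is given as a Tesseract language code such as eng. The function names and the file name selection.png are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, language="eng", searchable_pdf=False):
    """Build the argument list for the tesseract command-line tool.

    output_base is the output file name without extension; the special
    value "stdout" makes tesseract print the recognized text instead.
    """
    command = ["tesseract", image_path, output_base, "-l", language]
    if searchable_pdf:
        # The "pdf" config makes tesseract write <output_base>.pdf
        # with the recognized text stored as a selectable layer.
        command.append("pdf")
    return command


def recognize_text(image_path, language="eng"):
    """Run tesseract on an image file and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    command = build_tesseract_command(image_path, "stdout", language)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling recognize_text("selection.png", "eng") would return the text found in that image, comparable to what the result box shows. Passing searchable_pdf=True to build_tesseract_command and running the resulting command gives a PDF from which the text can be copied, similar to saving the document as a PDF in the last step.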