A few weeks ago I needed to create an electronic book from scanned pages. To build the book I used Sigil, a great multi-platform WYSIWYG ebook editor.
The problem was which tool to use for OCR and proofreading. On Windows I usually use VietOCR. Unfortunately, in this case it was very slow at opening the scanned images. So I ran some quick tests and found that Lector was fast, but it lacked tools for proofreading.
So I started to play with its code. First I implemented “text operations”: removing line endings and changing the letter case of the selected text (Tesseract sometimes recognizes “V” instead of “v”, or “Z” instead of “z”). I was not satisfied with Python's standard “capitalize” string method, so I implemented a new algorithm that capitalizes not only the first letter of the selection, but also the first letter after each period within the selection. I also find it quite useful to be able to see whitespace characters.
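The two text operations can be sketched roughly as follows. This is only an illustration, not the code that went into Lector: the function names are made up, and it assumes that a sentence ends with a period followed by whitespace and that the rest of the selection is lower-cased, as the built-in method does.

```python
import re

# Hypothetical helpers for illustration; not Lector's actual code.

def capitalize_sentences(text):
    """Upper-case the first letter of the text and the first letter
    that follows each period plus whitespace; lower-case the rest."""
    # Group 1: start of text (with optional whitespace) or ". " separator.
    # Group 2: the first letter after it (any Unicode letter).
    pattern = re.compile(r"(^\s*|\.\s+)([^\W\d_])")
    return pattern.sub(lambda m: m.group(1) + m.group(2).upper(), text.lower())


def join_lines(text):
    """Replace each line break (and the spaces around it) with one space."""
    return re.sub(r"\s*\n\s*", " ", text).strip()


sample = "hello world. this is OCR. done"
print(sample.capitalize())           # built-in: only the first letter
print(capitalize_sentences(sample))  # first letter of every sentence
print(join_lines("first line\nsecond line\n"))
```

Abbreviations such as “e.g.” would need extra handling, because the letter after them is capitalized too.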
Then I realized that it is much easier for me to do text formatting in Lector than in Sigil, because the original image is visible next to the text. So the next code improvements focused on that.
The last feature was a spellchecker. Its basis comes from “code of John Nachtimwald”: http://john.nachtimwald.com/2009/08/22/qplaintextedit-with-in-line-spell-check/ (by the way, he is a contributor to Sigil). I also implemented an optional function that switches the personal word list (PWL) based on the selected dictionary (the standard behaviour is one PWL for all dictionaries).
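The idea of one PWL per dictionary can be sketched like this. It is a minimal illustration only: the function name, the file naming scheme and the directory are assumptions, not what Lector actually uses.

```python
import os

# Hypothetical helper for illustration; file names are assumptions.

def pwl_path(base_dir, lang_tag, per_dictionary=True):
    """Return the personal word list file for the given dictionary tag.

    With per_dictionary=False every dictionary shares a single file,
    which is the standard behaviour described above.
    """
    name = "pwl_{}.txt".format(lang_tag) if per_dictionary else "pwl.txt"
    return os.path.join(base_dir, name)


print(pwl_path("config", "en_US"))         # separate list for en_US
print(pwl_path("config", "sk_SK"))         # separate list for sk_SK
print(pwl_path("config", "sk_SK", False))  # one shared list
```

Assuming the spellchecker is built on pyenchant, the resulting path could then be handed to something like enchant.DictWithPWL(lang_tag, path) whenever the user picks another dictionary.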
After a discussion with the author, he made me a project owner. Because I did not find any problems on my computers (Windows, Linux), I released Lector 0.3.0. There is no installer, so just download it, unpack it and run it. If you find a problem, please report it. BTW: contributors and code reviews are welcome!