
I'm really intrigued. Working in the OCR field, this is something that we do a lot, using in-house libraries or even dedicated third-party solutions. I'll definitely give this a spin in the next couple of days.

I do wonder what happened to normal C# style here though. I had a hard time reading this, because 'suggestItem', 'editItem' etc. don't _look_ like types/classes. It's only a single uppercase/lowercase change, but I stumbled over it a couple of times.



This algorithm seems to be based on edit distance, so it should be a poor fit for the OCR field, since OCR rarely swaps letters.


I'm not quite sure what you're saying here. Swap letters as in transpositions? Yes, correct. Usually the errors are simple replaces or deletes/inserts (which arguably might include 'swapping' a 1 for an l).

But the broader field I'm working in doesn't just hand you raw OCR output and leave it at that. Most projects here include a way for typists to correct recognition mistakes or fill in the missing pieces of information on a document. For that case (a human typist, often backed by a database of valid/expected values for fields), transpositions aren't rare at all.
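
To make the distinction concrete, here is a minimal Python sketch. It is not the code from the linked project, and the function names and sample strings are made up for illustration. It compares plain Levenshtein distance with the "optimal string alignment" variant that also counts an adjacent transposition as a single edit. An OCR-style confusion (1 for l) costs one edit under both, while a typist-style swap costs two under plain Levenshtein and one once transpositions are allowed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with insert, delete and substitute only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped count as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# OCR-style confusion: a substitution, same cost either way.
print(levenshtein("bill", "bi1l"), osa_distance("bill", "bi1l"))
# Typist-style transposition: cheaper once swaps are a single edit.
print(levenshtein("invoice", "invioce"), osa_distance("invoice", "invioce"))
```

So with a fixed maximum edit distance of 1, a lookup against a list of expected values would still catch the typist's "invioce" only if transpositions are treated as one edit.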



