A friend in law school asked me whether I knew what the word “eautionary” meant. She couldn’t find a definition for it in the usual places, so I looked it up online in the Oxford English Dictionary. To my surprise, it wasn’t in there either.
It took several minutes to figure out, but imagine this scenario:
You have an old document and you want to reprint it. You could either type it in by hand or scan it and run optical character recognition. The problem with OCR is that sometimes it’s wrong, and it could easily have mistaken a stylized or script “c” for an “e”. Furthermore, I bet there are a lot of words in legal papers that aren’t in a standard spell checker, so the error could easily have been skipped by accident during proofing.
Voila: eautionary…a misrecognized version of “cautionary.”
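A better question is how you could prevent this from happening. I’d first run a spell check to see if the word is misspelled; if it is, I would compute the Hamming distance (the number of positions at which two equal-length strings differ) between it and dictionary words of the same length. “Eautionary” and “cautionary,” for example, differ in exactly one position. Then, based on that distance and the number of times the word appears within a specific region of text, I would flag it for further review.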
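Here is a rough sketch of that idea in Python, not a tested proofing tool. The function names, the tiny word list, and both thresholds (`max_distance`, `max_count`) are my own assumptions for illustration. It also assumes that a word which shows up only once or twice in a passage is more likely an OCR slip than a real term of art, so only rare unknown words get flagged.

```python
import re
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def flag_suspects(text, dictionary, max_distance=1, max_count=2):
    """Map each suspicious word in `text` to its near-miss dictionary words.

    A word is flagged when it is not in `dictionary`, appears at most
    `max_count` times in `text`, and is within `max_distance` of a
    same-length dictionary word. Both thresholds are illustrative guesses.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    flagged = {}
    for word, count in counts.items():
        if word in dictionary or count > max_count:
            continue  # known word, or frequent enough to look intentional
        near = sorted(
            candidate
            for candidate in dictionary
            if len(candidate) == len(word)
            and hamming(word, candidate) <= max_distance
        )
        if near:
            flagged[word] = near
    return flagged


# Hypothetical passage and word list, just to exercise the function.
passage = "The court issued an eautionary instruction to the jury."
known_words = {
    "the", "court", "issued", "an", "cautionary",
    "precautionary", "instruction", "to", "jury",
}
suspects = flag_suspects(passage, known_words)
```

One limitation worth knowing: Hamming distance only compares strings of the same length, so this catches substituted characters like “c” read as “e” but not dropped or doubled letters. An edit distance such as Levenshtein would cover those cases.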