Friday 1 February 2019

Mistakes and mangled verses

For 50 years I have been a programmer. One thing you learn as a programmer is how to control the mistakes that programmers make. Typical rates of error in a programming system have not changed because the humans who do the programming have not changed. When I began, I was told to expect a coding error in every 10 lines of code: a 10% error rate. Automated programming helps. It can reduce the error rate to zero. Of course it automates the unknown limitations of its source and process. But it is consistent.

In my reading project, I am fortunate that learning and translation are processes that require keeping one's detailed work. The detailed work is partially redundant with the final reading. So I have two independent tables in my data that both contain the full text of the English.

One is the word by word glossary. It has the Hebrew consonants, my derivation of the stem, the corresponding English phrase for each Hebrew word in the text. So roughly 305,000 entries. This I developed over time to try to discover the way stems work in the Hebrew tongue. I am still at the beginning of this discovery.

The second full text is the verse by verse version. This contains the English text by book, chapter, and verse. It is a bit frightening to read the English alone now. During the project, I developed the English in a variety of ways, sometimes first as a sentence, and then working out the glossary. But eventually I developed a technique to generate the sentence from the glossary and adjust it to try to find its sense. (If it indeed does make sense - some verses in the Leningrad Codex are missing words - but not many.)

But this process evolved, and adjustments happened over the 8-year span of this project for a variety of reasons. Recently, when reading the English, I noticed a fishy-looking verse and made a mental note (actually a PDF note on my phone), but before I remembered that I had made the note, I had corrected something else and overwritten the PDF. Bother! said Winnie-the-Pooh.

So I thought and thought - if only it was honey and not cheese. And I figured a way to give myself a headache by comparing the glossary with the sentences of the text. The odds are that if the letter count of the glossary is not equal to the letter count of the verse in its sentence form, there is some kind of problem.
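For the programmers reading, the comparison can be sketched roughly as below. This is only a minimal illustration, not the code I actually run: the function names, the shape of the two tables, and the sample rows are all assumptions made up for the example. The idea is simply to total the letters of the glossary phrases for each verse and flag any verse whose sentence has a different total.

```python
from collections import defaultdict


def letter_count(text):
    """Count alphabetic characters only, ignoring spaces, punctuation, and digits."""
    return sum(1 for ch in text if ch.isalpha())


def find_discrepancies(glossary_rows, verses):
    """Return (reference, glossary_letters, verse_letters) for each mismatched verse.

    glossary_rows: iterable of (book, chapter, verse, english_phrase) tuples
    verses: dict mapping (book, chapter, verse) to the English sentence
    Both layouts are hypothetical stand-ins for the real tables.
    """
    glossary_totals = defaultdict(int)
    for book, chapter, verse, phrase in glossary_rows:
        glossary_totals[(book, chapter, verse)] += letter_count(phrase)

    suspects = []
    for ref, sentence in verses.items():
        from_glossary = glossary_totals.get(ref, 0)
        from_sentence = letter_count(sentence)
        if from_glossary != from_sentence:
            suspects.append((ref, from_glossary, from_sentence))
    return suspects


# Invented sample data, for illustration only.
sample_glossary = [
    ("Ps", 1, 1, "happy"),
    ("Ps", 1, 1, "the person"),
    ("Ps", 1, 2, "in the instruction"),
]
sample_verses = {
    ("Ps", 1, 1): "Happy the person.",
    ("Ps", 1, 2): "in the the instruction",  # a duplicated word slipped in
}

for ref, g, s in find_discrepancies(sample_glossary, sample_verses):
    print(ref, "glossary letters:", g, "verse letters:", s)
```

Equal counts do not prove the two texts agree (two errors could cancel out), so a check like this only points at verses worth a second look.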

So if you were wondering, that is what I am doing these days - a two-week slogging fest to correct my estimated 2000 errors (the rate of discrepancy is about 10% - just like in programming). It is a precise focus with a well-defined end. It won't catch all errors, but it will catch a lot and allow the English to be read with slightly less fear.

I am astonished at some of the errors. Many are trivial - a mismatch of and's, or's, helping verbs, or prepositions. Some are spelling discrepancies in names. Some are duplicated prepositions arising particularly from expected construct forms. But some are words and phrases that appear to have come out of nowhere! These wild animals will be caught in my redundancy trap. The odds of making the same mistake twice are small, since English lemma forms in the glossary are tested against the Hebrew stems for consistency within a table of rules that has been developed over the project. Curiously enough, some of the errors are from pasting almost identical verses from parallel passages! (Some programming errors come from laziness. Some from tiredness.)
