Thursday, February 15, 2018

History of my word guesses

As part of this reading project, and of course to save time and drudgery, I wrote a very simple algorithm to collect sequences of 6, 5, 4, 3, and 2 Hebrew words, as well as single words, that I had already drafted, and to find those same sequences in parts that I had not yet drafted.
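For the curious, the idea can be sketched in a few lines of Python. This is not my actual program, only a minimal illustration under some assumptions: the text is already split into lists of words, the function names (collect_ngrams, label_undrafted) are made up for the example, and the longest matching sequence wins at each position. The placeholder tokens stand in for Hebrew words.

```python
def collect_ngrams(drafted, max_n=6):
    """Collect every word sequence of length 1..max_n seen in drafted text.

    `drafted` is a list of passages, each a list of words (an assumed layout).
    """
    seen = {n: set() for n in range(1, max_n + 1)}
    for passage in drafted:
        for n in range(1, max_n + 1):
            for i in range(len(passage) - n + 1):
                seen[n].add(tuple(passage[i:i + n]))
    return seen


def label_undrafted(words, seen, max_n=6):
    """Label each word of an undrafted passage with the length of the
    known sequence that covers it (0 means unknown).

    Assumption: a greedy scan, trying the longest sequence first.
    """
    labels = [0] * len(words)
    i = 0
    while i < len(words):
        for n in range(max_n, 0, -1):
            if i + n <= len(words) and tuple(words[i:i + n]) in seen[n]:
                labels[i:i + n] = [n] * n
                i += n
                break
        else:
            # No known sequence starts here, so the word stays unknown.
            i += 1
    return labels


# Placeholder tokens standing in for Hebrew words.
drafted = [["a", "b", "c"], ["d", "e"]]
undrafted = ["x", "a", "b", "c", "d", "y", "e"]
labels = label_undrafted(undrafted, collect_ngrams(drafted))
print(labels)
```

Counting how many words carry each label gives totals of the kind reported here: unknowns (0), single guesses (1), double guesses (2), and the longer sequences (3 to 6).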

It looks as if I wrote the algorithm in April and May of 2016. The data show that my output actually dropped in that period. To prove the new technique, I went backwards and removed some work in progress, whether by design or by accident I don't recall. What is left over from the guesses is, of course, the work that is done and the unknowns. Here is the chart based on a few data points - it was probably not quite so linear, but I did not keep a daily record.

The actual data points are: 2016-03-31, 2016-05-14, 2017-03-03, 2018-02-12. Chapters completed: 700.

This is a broad-brush view of how many duplicate passages of varying length I was able to program as pre-reading guesses. All the guess counts are decreasing. The 'done' count is now over 205,000 words. Unknowns are fewer than 15,000. Single guesses are just over 42,000 and double guesses just under 29,000 (these are often quite accurate). The sequences of 3 to 6 words are still numerous enough to be really useful: about 15,000 words in phrases that I 'know' even though they are in places that I have not yet read.