I am down to fewer than 50 puzzle pieces that I don't have an initial domain and subdomain for. But there are still over 18,000 pieces that have no guess at all. Many of these are probably person and location names. Some of the others, which I think are known through a semantic domain and even a single or double guess, are probably assigned to the wrong stem by the root derivation algorithm (it was less than 80% accurate). Several letter combinations can occur with different stems. E.g. שׁוב and ישׁב have many overlaps in their forms. On a word-by-word basis there is insufficient information to decide which stem is which.
I do not use complex analytical techniques to read these forms with the computer. No Markov chains or probabilities. I would need a team of expert programmers for such stuff. But my algorithms will correct a stem if the word is used in a word sequence longer than 2. (Every now and then I wonder if this section of the code is working as it should, but that's a programmer's lifestyle.)
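To make the idea of correcting a stem from its word sequence concrete, here is a minimal sketch in Python. It is not my actual code. The table `KNOWN_SEQUENCES`, the function `correct_stem`, and the sample phrase are all hypothetical, and the sketch assumes that "correcting" means: if an ambiguous word sits inside an already-resolved phrase of more than two words, take the stem recorded for that phrase; otherwise keep the guess from the root derivation. Forms are written without points to keep the strings simple.

```python
from typing import Dict, List, Tuple

# Hypothetical table of already-resolved phrases: each key is a sequence of
# surface forms, each value gives the stem for the word in the same position.
KNOWN_SEQUENCES: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("וישב", "יעקב", "בארץ"): ("ישב", "יעקב", "ארץ"),
}

# Only sequences longer than 2 words are trusted (an assumption of this sketch).
MIN_SEQUENCE_LENGTH = 3


def correct_stem(words: List[str], index: int, default_stem: str) -> str:
    """Return the stem for words[index].

    If the word lies inside a known sequence of at least MIN_SEQUENCE_LENGTH
    words, use the stem recorded for that sequence; otherwise return
    default_stem, i.e. the guess made by the root derivation.
    """
    for sequence, stems in KNOWN_SEQUENCES.items():
        n = len(sequence)
        if n < MIN_SEQUENCE_LENGTH:
            continue
        # Try every alignment of the sequence that covers words[index].
        first_start = max(0, index - n + 1)
        last_start = min(index, len(words) - n)
        for start in range(first_start, last_start + 1):
            if tuple(words[start:start + n]) == sequence:
                return stems[index - start]
    return default_stem


# The form וישב alone could belong to either שוב or ישב.
in_phrase = correct_stem(["וישב", "יעקב", "בארץ"], 0, "שוב")
alone = correct_stem(["וישב", "דוד"], 0, "שוב")
```

In this sketch the first call picks up the stem from the phrase, while the second has no phrase of sufficient length to lean on and keeps the default guess. No probabilities are involved, only exact matches against phrases already decided.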
It wasn't long ago (January 2017) that the unknown pieces were double what they are now, so we are making progress. Here's a rough manual graph.
Puzzle pieces not yet turned over for the first time - Hebrew Bible, more than 305,000 pieces
Heading towards zero - end of next year perhaps
X axis is number of days ago