Wednesday, December 31, 2014

Thank you, readers of Dust

Thank you, readers, for propelling Dust into the top 10 Biblioblogs. Dust is number 7 according to Peter Kirby's winter list. So let me tell you what I am doing at present.

I have drafted a routine that analyses the Hebrew consonantal text and tries to tell me the grammar. It has taken two months of refining ideas, and it will take another six months of refinement as I do some translation to see if it is helpful - I think it will be, eventually. The code is in Oracle PL/SQL, for those who might know what this means - about 1500 lines at the moment. The inputs are the word and the stem of the word. I have a system that gives me a stem even when there is no stem for a word - such as in a preposition+pronoun where both are enclitics. I also use some input from domain and subdomain. Some things about a language you just need to know, and some you can figure out from the word. I am still adding information where I need it and removing it where I don't. It is quite a study in linguistics, and I will report what I learn.

Initially I have divided the words into 4 groups based on prefixes and suffixes, while separating out grammatical words and names (peoples, places, persons). The routine then looks at the individual prefixes and suffixes to determine possible grammatical forms. Many words are unambiguous, but many will require a cleverer contextual analysis. I don't know if I can do this yet. Automating contextual choices in language processing is not my strong suit (if indeed I have such a thing). For now, if I can't tell from the word alone, I just note that there are choices.
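For readers who like to see such things in code, here is a minimal sketch of the grouping step. It is in Python rather than PL/SQL, and it is not my actual routine. It assumes that the 4 groups are no affix, prefix only, suffix only, and both, that words are written in a rough Latin transliteration of the consonants, and that the stem appears unbroken inside the word, which is often not the case in real Hebrew. The names `AffixGroup`, `split_affixes` and `classify` are made up for the illustration, and the separation of grammatical words and names is left out.

```python
from enum import Enum
from typing import Optional, Tuple


class AffixGroup(Enum):
    """Hypothetical names for the four prefix/suffix groups."""
    BARE = "no prefix, no suffix"
    PREFIX_ONLY = "prefix only"
    SUFFIX_ONLY = "suffix only"
    BOTH = "prefix and suffix"


def split_affixes(word: str, stem: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, suffix) around the first occurrence of stem in word.

    Returns None when the stem does not occur as an unbroken substring,
    which in this simplified sketch means "cannot tell from the word".
    """
    start = word.find(stem)
    if start < 0:
        return None
    return word[:start], word[start + len(stem):]


def classify(word: str, stem: str) -> Optional[AffixGroup]:
    """Assign a word to one of the four groups, or None if undecidable."""
    parts = split_affixes(word, stem)
    if parts is None:
        return None
    prefix, suffix = parts
    if prefix and suffix:
        return AffixGroup.BOTH
    if prefix:
        return AffixGroup.PREFIX_ONLY
    if suffix:
        return AffixGroup.SUFFIX_ONLY
    return AffixGroup.BARE


if __name__ == "__main__":
    # Transliterated examples on the stem mlk (king).
    for word in ("mlk", "hmlk", "mlkym", "wmlkyhm"):
        print(word, classify(word, "mlk"))
```

A real routine would then look at the particular prefix and suffix letters to list the possible grammatical forms, and would have to cope with stems that change shape inside the word.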

I am still working on making the automated features of my structural charts available on the web - perhaps in 6 months to a year I will have something done. These are relatively objective. I hope also to make the music available. At the moment you can see structural analysis in my book and on this blog, and the music here, but I would like to get to the point where you could choose any section of the Hebrew scripture and see structures and music, as interpreted by the programs, without intervention. I would even like to be able to support additional sets of rules for the music, but this is a longer-term job. It requires time and programming both to create the data in the right form and to externalize the rules so that differing cantillation schemes could be supported.

Over the next year I hope to review my translations of Job, Jonah, the Song, Ruth, Lamentations, Qohelet (Ecclesiastes), and the Psalms and to make them conform to the translation controls I have now imposed. Here's an image of the screen I have designed to do the main job. If you look closely you will see that my concordance and search buttons helped me to find and correct an error (visible on the page) that was caused by a mistaken stem. When it was discovered by the program, I was able to correct it and carry on with the translation. Roots also were derived by a PL/SQL routine I wrote several years ago - one that was a kind of bootstrap and only gets about 80% right. Over the next year as well I hope to extend translation of the first 50,000 words (about 1/7th of the Bible) that I have in my data: bits and pieces of almost all the books. I will use the experience to keep on translating while I have breath in me. This is such an impressive and mind-boggling body of text.


