Friday, June 26, 2015

Automation studies reported in JBL

This morning I read about a computerized source criticism of biblical texts (Journal of Biblical Literature, Vol. 134, No. 2, pp. 253-271). I have not yet read the article in toto, but I was surprised to see its dependence on Strong's Exhaustive Concordance of 1890. I am even more surprised that the authors used the KJV to establish their initial synonym sets. Should I take this seriously, or is their experiment really a study of the 17th century CE?

The Strong's numbering system, with its decisions as to what is or is not a word, is about as confusing a starting point for working with a language as I can imagine. I know some pastors who think that Hebrew is nothing but numbers. (Mind you, I would have loved this data as a starting point in 2006 if it had been available to me in a convertible form, i.e. one in which the whole concept could be redesigned.)

Strong's numbering is an early attempt at uniquely identifying words, made long before the idea of identity was explored and clearly defined in database practice. Strong's assigns a sequential number to each 'word', beginning at alef and ending with taf. By the time Strong's reaches mem and taf, not to mention other letters, it is clear that the sequence cannot mean anything, for we are in the midst of words derived from other words that have no sequential relationship to mem or taf. Such an identity is fine, but it needs to be hidden. By that I mean it must be used as a pointer only, and only by software, never by a human as if the number were meaningful. Identity should not be a property describing the object. (Alphabetic sequence is not a very useful property anyway, except for dictionaries.) Of course, the human was the only software when this was originally developed, and we have to scan by some identifier if we want to find something in a list. Nevertheless, I am concerned that any distinction of source, style, or author drawn from this data, with its implied assumptions about words, synonyms, and homonyms, will be compromised from the outset. All it will do is prove the starting assumptions.
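In database terms, what I am asking for is a surrogate key: an opaque identifier that software uses to link records, while everything meaningful (the written form, the word it derives from) is stored as an ordinary attribute. Here is a minimal Python sketch of the idea. The `Lexeme` and `Lexicon` names are hypothetical, and the two sample entries (the root שמר "keep, guard" and its mem-prefixed derivative משמרת "watch, charge") are illustrations only, not drawn from Strong's or any dataset.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lexeme:
    lemma: str                          # the written form a human looks up
    gloss: str                          # rough English gloss, for display only
    derived_from: Optional[str] = None  # opaque key of the source lexeme, if any
    # Identity is generated by software and never typed or read by a person;
    # it carries no information about spelling, order, or meaning.
    key: str = field(default_factory=lambda: uuid.uuid4().hex)


class Lexicon:
    def __init__(self) -> None:
        self._by_key: dict[str, Lexeme] = {}

    def add(self, lex: Lexeme) -> str:
        self._by_key[lex.key] = lex
        return lex.key

    def find(self, lemma: str) -> list[Lexeme]:
        # Humans search by the word itself, not by a number.
        return [lx for lx in self._by_key.values() if lx.lemma == lemma]

    def lineage(self, lemma: str) -> list[str]:
        # Follow derivation pointers back toward the root, returning lemmas.
        matches = self.find(lemma)
        lex = matches[0] if matches else None
        chain: list[str] = []
        while lex is not None:
            chain.append(lex.lemma)
            lex = self._by_key.get(lex.derived_from) if lex.derived_from else None
        return chain


lexicon = Lexicon()
root_key = lexicon.add(Lexeme("שמר", "keep, guard"))
lexicon.add(Lexeme("משמרת", "watch, charge", derived_from=root_key))
print(lexicon.lineage("משמרת"))  # ['משמרת', 'שמר']
```

The mem-initial word points to its root without any number implying where either word sits in the alphabet; if a dictionary ordering is wanted, it can be produced by sorting on `lemma` at display time.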

I admit it is convenient to have such data online for various queries about frequency of use and so on, but when I study a word, I search further. One of my sources, the Blue Letter Bible, has at its base the same concordance and the same undisciplined and unsubstantiated use of synonyms. But it at least, like hard-copy references such as Brown-Driver-Briggs and my קונקורדנציה לתנייך (Latinate concordance), allows one to drill down the derivative pointers and begin to see the sound patterns that may be implied for individual words. I also make use of, a serious ultra-literal and reasonably concordant interlinear that can sometimes help decompose the Hebrew. BLB remains an apt and clever forward-and-reverse lookup, and it has the merit of exposing the KJV's lack of concordance, in spite of the limited pre-modern supplementary literature in its data.

I am always aware of the need to question every assumption, especially my own, when I can see them. Many who work in this field know far more than I do. I am a fly on the wall, and hopefully not a fly in the ointment. I think I must make use of all clues, considering not only the words that carry significant semantic content but also the little words and word forms that make up the grammar of Hebrew: article, particle, preposition, and so on. It is in grammatical usage, as well as in random synonyms, that distinctive authorial styles will emerge, if indeed they can be seen 'objectively'. Hopefully there are many flies with lots of maggoty larvae working on the decomposition of this ancient body of text.
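Attention to the little words has a standard stylometric form: a function-word profile, in which each passage is reduced to how often it uses certain grammatical words per thousand tokens, and passages are then compared by the distance between their profiles. A minimal Python sketch, assuming already-tokenized unpointed text and a hypothetical, deliberately tiny list of free-standing function words. It also exposes the problem: the article ה, the prepositions ב, כ, ל and the conjunction ו are prefixed to the next word, so whitespace tokens miss them (ואת below does not count as את). A real study would need proper morphological segmentation first.

```python
from collections import Counter

# Hypothetical, deliberately small list of free-standing function words:
# object marker, relative particle, conjunction, prepositions, negation.
FUNCTION_WORDS = ["את", "אשר", "כי", "אל", "על", "לא", "מן"]


def profile(tokens: list[str], per: int = 1000) -> dict[str, float]:
    """Rate of each function word per `per` tokens of the passage."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty passage
    return {w: counts[w] * per / total for w in FUNCTION_WORDS}


def distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Manhattan distance between two profiles: smaller means more alike."""
    return sum(abs(a[w] - b[w]) for w in FUNCTION_WORDS)


# Unpointed Genesis 1:1 and 1:4, split on spaces and maqaf.
gen_1_1 = "בראשית ברא אלהים את השמים ואת הארץ".split()
gen_1_4 = "וירא אלהים את האור כי טוב ויבדל אלהים בין האור ובין החשך".split()

p1, p2 = profile(gen_1_1), profile(gen_1_4)
print(round(distance(p1, p2), 3))
```

Two verses prove nothing, of course; the point is only that the profile is built from grammar rather than from anyone's prior synonym sets.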

