Friday, 26 August 2016

An insight from the Raw Data of the Old Testament

What did I think I was faced with as the raw data of the writing that Christendom calls the Old Testament? I remember being told that ancient manuscripts of the Hebrew texts consisted of consonants only, without vowels or punctuation.

I was also told that there were no spaces and no lower-case letters. These statements are somewhat oversimplified. It is true that there are no lower-case letters. It is also true that there are no dots and dashes marking the vowels of the later tradition, but several consonants act like vowels in some situations.

Another thing I was told is that Hebrew has no verb for our common 'to be'. That is false. Still, there are many sentences in the text where the verb 'to be' is implied rather than written. The same is true in other Semitic languages like Arabic, where one says, for instance, 'My name Bob.'

Here is a bit of raw data: a verse, picked at random, from the Great Isaiah Scroll among the Dead Sea Scrolls. My first impression of this site is that it has some very creative technical work and is nicely curated. (I saw the scroll in 2010 but didn't have time to examine it closely.)

The Great Isaiah Scroll, Isaiah 1:28
Surprise! This manuscript clearly has spaces between words, and paragraph breaks as well.

This verse, promising the shattering of transgressions and sins, lies between the marks I made on the manuscript.

I find scrolls hard to read and hard to see, but they look familiar. I wonder what happened to the divine name in this verse; I must look further. (Does anyone know?)

I am more familiar with the Aleppo codex (roughly 1,000 years later than the Great Isaiah Scroll) and the Unicode text (c. 2010) of the Leningrad codex (itself about 80 years later than the Aleppo codex).
The Aleppo Codex, Isaiah 1:28

What we see in the Aleppo codex is that the spaces have disappeared, but in effect they are replaced by accents. One of my outstanding questions is where the accents came from. Whatever the answer, they must have been included early in this copying tradition, or the potential for confusing words would have been severe.

The verse is in the box. Can you find it? The clue to look for is the silluq (a short vertical line under the syllable), which marks both the end of the previous verse, ending בִּצְדָקָֽה, and the end of the verse we are looking for.

Here is the Leningrad text of Isaiah 1:28:
 וְשֶׁ֧בֶר פֹּשְׁעִ֛ים וְחַטָּאִ֖ים יַחְדָּ֑ו וְעֹזְבֵ֥י יְהוָ֖ה יִכְלֽוּ׃
Note that the 'big colon' at the end of this text, the sof pasuk, is not used in the Aleppo Codex or the Great Isaiah Scroll, but the short vertical stroke below the last syllable of a verse, the silluq (לֽ), is present in the Aleppo Codex to mark the final cadence. (This mark is also used for a completely different purpose in later manuscripts, where it is called a metheg. The metheg disturbs the musical line and is in no way equivalent to the silluq.)
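In the Unicode text of the Leningrad codex, silluq and metheg are not separate characters: both are written with U+05BD (HEBREW POINT METEG), while the sof pasuk is U+05C3. A query therefore has to tell them apart by position. Here is a minimal Python sketch under a simple assumption of my own: the last U+05BD in a verse counts as silluq only if it sits on the final word, and every other U+05BD counts as metheg. The function name `classify_meteg_marks` is hypothetical, and the heuristic is a simplification, not a rule of the text.

```python
METEG = "\u05BD"      # HEBREW POINT METEG: also used to encode silluq
SOF_PASUQ = "\u05C3"  # HEBREW PUNCTUATION SOF PASUQ, the 'big colon'


def classify_meteg_marks(verse: str) -> list[str]:
    """Label each U+05BD in a verse as 'silluq' or 'meteg', in text order.

    Assumed heuristic: the last U+05BD counts as silluq only if it sits on
    the final word of the verse; all other occurrences count as meteg.
    """
    # Drop surrounding whitespace and the sof pasuq so the last word is the final token.
    body = verse.strip().rstrip(SOF_PASUQ).rstrip()
    last_word_start = body.rfind(" ") + 1  # 0 if the verse is a single word
    positions = [i for i, ch in enumerate(body) if ch == METEG]
    labels = []
    for n, pos in enumerate(positions):
        is_last_mark = n == len(positions) - 1
        labels.append("silluq" if is_last_mark and pos >= last_word_start else "meteg")
    return labels


if __name__ == "__main__":
    # Final word of Isaiah 1:28: yod-hiriq, kaf-sheva, lamed-meteg, vav-dagesh, sof pasuq.
    final_word = "\u05D9\u05B4\u05DB\u05B0\u05DC\u05BD\u05D5\u05BC\u05C3"
    print(classify_meteg_marks(final_word))  # ['silluq']
```

A verse-final word that also carries a metheg on an earlier syllable would still come out right here, since only the last U+05BD is promoted to silluq.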

It would seem that the accents are required much more than the vowels once the spaces are removed from the text. Here is the music for this verse.
The interpretation of the accents, Isaiah 1:28
The text is printed over each change of note, showing how the music serves both word and sentence phrasing. The atnah (marked A in the music) divides the verse.

I sketched a query against my data. It is not conclusive, but I think I can say that at least one stressed syllable is marked in every 'word' of the text. Sometimes the accent is the same as on the previous word, to keep the music on the same reciting note. Some 'words' without accents are joined to another word by a hyphen (called a maqqep), effectively treating the combination as one word. The 1,900 or so verses that appear to have fewer accents than words all fall into line when I restore the maqqep that I had previously removed. Some words also carry more than one accent: about 60% of the verses have more accents than words.
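A comparison of this kind can be sketched in Python. This is a hypothetical illustration, not necessarily how my own query works. It assumes that accents are the Unicode cantillation marks U+0591 to U+05AE plus U+05BD (which encodes silluq but also metheg, so counts can run high), that words are separated by spaces, and that the maqqep is U+05BE. The function names are made up for the example.

```python
# Cantillation accents U+0591..U+05AE (assumed set; U+05AF, the masora circle, is excluded).
ACCENTS = {chr(cp) for cp in range(0x0591, 0x05AF)}
METEG = "\u05BD"      # encodes silluq, but also metheg, so counts may run high
MAQAF = "\u05BE"      # HEBREW PUNCTUATION MAQAF (the maqqep)
SOF_PASUQ = "\u05C3"


def count_words(verse: str, keep_maqaf: bool = True) -> int:
    """Count space-separated words; with keep_maqaf=False the maqqep splits words."""
    text = verse.replace(SOF_PASUQ, " ")
    if not keep_maqaf:
        text = text.replace(MAQAF, " ")
    return len(text.split())


def count_accents(verse: str) -> int:
    """Count accent characters, treating every U+05BD as an accent."""
    return sum(1 for ch in verse if ch in ACCENTS or ch == METEG)


def compare(verse: str, keep_maqaf: bool = True) -> str:
    """Return 'fewer', 'equal' or 'more' accents than words for one verse."""
    words, accents = count_words(verse, keep_maqaf), count_accents(verse)
    if accents < words:
        return "fewer"
    return "equal" if accents == words else "more"


if __name__ == "__main__":
    # Two consonantal words joined by a maqqep, with one accent (etnahta, U+0591).
    verse = "\u05DB\u05DC" + MAQAF + "\u05D4\u05D0\u0591\u05E8\u05E5"
    print(compare(verse, keep_maqaf=False))  # fewer
    print(compare(verse))                    # equal
```

Splitting on the maqqep makes a joined pair look like two words sharing one accent; keeping it joined brings the counts back into line, which is the pattern described above.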

So this is today's surprising insight: the accents, present on every word of the text, help identify words when the spaces have disappeared.

I started this post to 'explain' how my process of raw data management began 10 years ago. I was not expecting to clarify an important point for myself. As a result, I now think it is misleading to say that the accents have a role in punctuation. The accents are not punctuation at all: they mark syllable stress and musical phrasing. Secondarily, they help distinguish words in a manuscript that has no spaces.


What I first did 10 years ago was to code the consonants of the text, as in the Isaiah scroll, word by word in a Latin-letter transcription. With a mapping to Unicode, I could easily convert the consonants both ways, from Latin letters to Unicode and back again. In those early days, after my first full translation of the Psalms, I also experimented with automated transcription including vowels.
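The two-way conversion of consonants can be sketched as a pair of lookup tables in Python. The Latin letters below are a hypothetical, case-sensitive scheme chosen only for illustration; they are not a record of the letters I actually used. The final forms of kaf, mem, nun, pe and tsadi share a Latin letter with their base forms and are restored at the end of a word on the way back. Vowel points, accents and the sof pasuk are simply dropped, and the maqqep is carried as '-'.

```python
# Hypothetical one-letter-per-consonant scheme (case-sensitive), for illustration only.
BASE_LETTERS = ("\u05D0\u05D1\u05D2\u05D3\u05D4\u05D5\u05D6\u05D7\u05D8\u05D9\u05DB"
                "\u05DC\u05DE\u05E0\u05E1\u05E2\u05E4\u05E6\u05E7\u05E8\u05E9\u05EA")
LATIN = "abgdhwzxTyklmnsepcqrSt"  # alef ... tav, in alphabetical order
FINAL_FORMS = {"k": "\u05DA", "m": "\u05DD", "n": "\u05DF", "p": "\u05E3", "c": "\u05E5"}
MAQAF = "\u05BE"
assert len(BASE_LETTERS) == len(LATIN) == 22

TO_LATIN = dict(zip(BASE_LETTERS, LATIN))
TO_LATIN.update({final: latin for latin, final in FINAL_FORMS.items()})
FROM_LATIN = dict(zip(LATIN, BASE_LETTERS))


def to_latin(hebrew: str) -> str:
    """Transcribe consonants to Latin letters; points, accents and sof pasuq are dropped."""
    out = []
    for ch in hebrew:
        if ch in TO_LATIN:
            out.append(TO_LATIN[ch])
        elif ch == MAQAF:
            out.append("-")
        elif ch.isspace():
            out.append(ch)
    return "".join(out)


def to_hebrew(latin: str) -> str:
    """Map Latin letters back to Hebrew, using final forms at the end of a word."""
    out = []
    for i, ch in enumerate(latin):
        if ch in FROM_LATIN:
            next_ch = latin[i + 1] if i + 1 < len(latin) else " "
            at_word_end = next_ch not in FROM_LATIN
            out.append(FINAL_FORMS[ch] if at_word_end and ch in FINAL_FORMS else FROM_LATIN[ch])
        elif ch == "-":
            out.append(MAQAF)
        else:
            out.append(ch)  # spaces and anything unmapped pass through unchanged
    return "".join(out)


if __name__ == "__main__":
    # Consonants of the first two words of Isaiah 1:28.
    hebrew = "\u05D5\u05E9\u05D1\u05E8 \u05E4\u05E9\u05E2\u05D9\u05DD"
    latin = to_latin(hebrew)
    print(latin)                       # wSbr pSeym
    print(to_hebrew(latin) == hebrew)  # True
```

Because the consonants-only transcription is one letter per consonant, the round trip is lossless once the vowels and accents have been set aside.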

I began working with the Leningrad codex as far back as 2010, and I verified my text of the Psalms against it between 2010 and 2013. Eventually I obtained all the marks, vowels and cantillation, through the publicly available web service on that site. That gives me two views of the data: verse by verse in one table, and word by word (consonants only) in another. How I manipulate the data with queries and cross-tabs is another story. Some earlier posts describe the development environment I am using. This one is less than revealing, being full of code, but it gets to the essence. Most of the relevant posts are under the label 'reading project status'.
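Deriving a word-by-word, consonants-only table from the verse-by-verse table could look something like this Python sketch. The row layout (book, chapter, verse, text) and the names `VerseRow`, `WordRow` and `verse_rows_to_word_rows` are assumptions for illustration. Consonants are kept by Unicode general category: Hebrew letters are category Lo, vowel points and accents are combining marks (Mn), and the maqqep and sof pasuk are punctuation, so only the letters survive. Whether the maqqep separates words is left as a parameter.

```python
import unicodedata
from typing import Iterable, Iterator, NamedTuple

MAQAF = "\u05BE"


class VerseRow(NamedTuple):
    book: str
    chapter: int
    verse: int
    text: str  # fully pointed Unicode text


class WordRow(NamedTuple):
    book: str
    chapter: int
    verse: int
    position: int  # 1-based word number within the verse
    consonants: str


def consonants_only(word: str) -> str:
    """Keep Hebrew letters (category Lo); drop points, accents and punctuation."""
    return "".join(ch for ch in word if unicodedata.category(ch) == "Lo")


def verse_rows_to_word_rows(rows: Iterable[VerseRow], split_maqaf: bool = True) -> Iterator[WordRow]:
    """Yield one WordRow per word; with split_maqaf=False a maqqep pair stays one word."""
    for row in rows:
        text = row.text.replace(MAQAF, " ") if split_maqaf else row.text
        # Tokens made only of punctuation (e.g. a detached sof pasuq) become empty and are skipped.
        words = [w for w in (consonants_only(t) for t in text.split()) if w]
        for n, word in enumerate(words, start=1):
            yield WordRow(row.book, row.chapter, row.verse, n, word)


if __name__ == "__main__":
    row = VerseRow("Isa", 1, 28, "\u05D9\u05B4\u05DB\u05B0\u05DC\u05BD\u05D5\u05BC\u05C3")
    for word_row in verse_rows_to_word_rows([row]):
        print(word_row)
```

Generating the word rows from the pointed verse text, rather than keying them separately, keeps the two tables consistent with each other.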
