Saturday 28 April 2018

Constructing a word by word text

I am wondering (again) how to present the information in the text and how the pieces of each verse are used and interpreted in their context.
Ezekiel 31 Fn Min Max Syll
וַיְהִ֗י בְּאַחַ֤ת עֶשְׂרֵה֙ שָׁנָ֔ה בַּשְּׁלִישִׁ֖י בְּאֶחָ֣ד לַחֹ֑דֶשׁ
הָיָ֥ה דְבַר־יְהוָ֖ה אֵלַ֥י לֵאמֹֽר
1 And it was in the eleventh year on the first of the third of the month,
the word of Yahweh happened to me, saying,
3e 4C 18
10
ויהי and it was באחת -- עשׂרה in the eleventh שׁנה year בשׁלישׁי the third of באחד on the first לחדשׁ of the month היה happened דבר the word of יהוה Yahweh אלי to me לאמר saying
[Image: Ezekiel 31.1 music]
This is just an experiment with one verse. I have everything in the data including the music, but the preparation of the above is not a slam-dunk.
  1. Converting words to a sentence means there is a one-to-many relation in the join of the initial and final text to the individual words. Here I have joined the music in text form to produce the first row of the table (suppressing most of the music), and to this I have added the interlinear rows. I have used HTML features to reduce the interlinear to one table row.
  2. To include the music, I must use an image.
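
The one-to-many join in point 1 can be sketched in Python. This is only an illustration under assumptions: the tuple layout `(verse_ref, position, hebrew, gloss)`, the function name `interlinear_rows`, and the choice of `<span dir="rtl">` markup are all hypothetical and are not taken from the actual data or pages.

```python
from collections import defaultdict
from html import escape

# Hypothetical word-level rows: (verse_ref, position, hebrew, gloss).
# Only the first four words of the verse are shown.
words = [
    ("Ezekiel 31.1", 1, "ויהי", "and it was"),
    ("Ezekiel 31.1", 2, "באחת", "--"),
    ("Ezekiel 31.1", 3, "עשׂרה", "in the eleventh"),
    ("Ezekiel 31.1", 4, "שׁנה", "year"),
]

def interlinear_rows(word_rows):
    """Group word rows by verse (one verse, many words) and
    render each verse as a single HTML table row."""
    by_verse = defaultdict(list)
    for ref, pos, hebrew, gloss in word_rows:
        by_verse[ref].append((pos, hebrew, gloss))
    rows = {}
    for ref, items in by_verse.items():
        items.sort()  # restore word order within the verse
        cells = " ".join(
            f'<span dir="rtl">{escape(h)}</span> {escape(g)}'
            for _, h, g in items
        )
        rows[ref] = f"<tr><td>{cells}</td></tr>"
    return rows

rows = interlinear_rows(words)
print(rows["Ezekiel 31.1"])
```

Keeping the grouping separate from the rendering means the same grouped words could feed either the interlinear row or the sentence-level rows.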
My individual word data does not have the music by word. That was probably a mistake, but at the beginning of the project 12 years ago, I knew nothing of the music and I would have been buried - I almost was buried just by the consonants, let alone vowels or accents. I had to fight one struggle at a time.

While I can produce a single verse image in a minute or two, that is too slow for doing the whole bible (an estimated 47 8-hour days). All the music is available but not so easily seen in the context of a single verse. Only an image gives the relationship of accent to word. The unique accent sequence for Ezekiel 31.1 is e rev,C qad,z-q,g# B ^A f g# f e. It is useful for comparing music in an algorithm, but not for singing.
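
As a rough illustration of what comparing music in an algorithm might look like, here is a minimal Python sketch that treats an accent sequence as whitespace-separated tokens and scores two sequences with the standard library's `difflib.SequenceMatcher`. The tokenisation, the function names, and the second sequence are assumptions made up for the example; none of this is the method actually used in the project.

```python
from difflib import SequenceMatcher

def tokens(sequence):
    """Split an accent sequence into opaque tokens on whitespace."""
    return sequence.split()

def similarity(a, b):
    """Return a ratio from 0.0 to 1.0 for how alike two sequences are."""
    matcher = SequenceMatcher(None, tokens(a), tokens(b), autojunk=False)
    return matcher.ratio()

# The sequence quoted above for the verse.
verse_sequence = "e rev,C qad,z-q,g# B ^A f g# f e"
# Made-up variant for illustration only: the third token is dropped.
made_up_sequence = "e rev,C B ^A f g# f e"

print(similarity(verse_sequence, verse_sequence))
print(similarity(verse_sequence, made_up_sequence))
```

Comparing whole tokens rather than characters keeps an ornament cluster such as `qad,z-q,g#` as a single unit. Whether that is the right granularity is an open choice.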

In addition, it would be nice to have a list of stems used in the verse, and for each stem, where else one might want to look for its usage and what additional glosses are used. This is a lot of potential output, so it must be partitioned or reduced in some way. The 10 unique stems for the 12 words of Ezekiel 31:1, ignoring the same stem used in a different semantic domain, produce 111 rows of possible glosses used in 11,915 verses.

Every time I do this I discover errors and impossibilities. These are the 10 stems and the 111 rows.

אחד another (11) first (69) once (11) other (19) several (2) single (7) singly (1) specific (2) thing (3) unity (1) untranslated (5)
אל attention (1) concerning (7) even (1) lo (1) none (1) though (3) through (1) untranslated (144)
אמר hers say (1) indicate (6) looking say (1) mention (3) promise (92) pronounce (8) said (2,329) said... (2) say (1,412) saying (18) says (495) talk (20) tout (2) untranslated (4) uppermost offshoot (2) vocal (1) women said (1)
דבר -speech (1) address (1) annals (2) concern (1) conversation (1) even word (1) matter (3) ones speak (1) said word (1) speak (407) speak be (1) speaker (1) spoke (426) spoken (145) tale (2) through word (1) untranslated (3) word (885)
היה being (11) happen (401) though (1) untranslated (8)
חדשׁ both new moon (1) month (152) new (15) new moon (59) new thing (1)
יהוה LORD (1 - for an acrostic) Yahweh (6,637)
עשׂר eleven (13) eleventh (8) teen (1) ten (124) tens (3) tenth (29) twentieth (4) twenty (240) twenty- (8) untranslated (198)
שׁלשׁ and third (1) day yesterday (3) days gone (2) every three (1) great grand (1) third (91) third measure (1) third time (1) thirteen (8) thirteenth (10) thirtieth (1) thirty (129) thirty thing (1) three (338) three day (1) three year (3) three-ply (1) trisect (1) untranslated (12)
שׁנה adjust (6) college (2) diverse (1) feign (4) second (1) second type (1) substitute (1) two year (11) unmark (1) untranslated (55) variance (4) vary (5) year (626) year out (9) year span (41) yearling (55)
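
A listing of this shape can be produced by tallying glosses per stem. The following Python sketch assumes one record per word occurrence as `(stem, gloss, verse_ref)`. The record layout, the function name `gloss_summary`, and the sample records (with placeholder verse labels `v1` to `v3`) are hypothetical, so the counts it prints are not the real ones above.

```python
from collections import Counter, defaultdict

# Hypothetical sample: one record per word occurrence.
occurrences = [
    ("חדשׁ", "month", "v1"),
    ("חדשׁ", "new moon", "v2"),
    ("חדשׁ", "month", "v3"),
    ("שׁנה", "year", "v1"),
]

def gloss_summary(records):
    """Tally glosses per stem.

    Returns (lines, row_count, verse_count), where a 'row' is one
    distinct stem-gloss pair and each line looks like
    'stem gloss (n) gloss (n) ...'.
    """
    counts = defaultdict(Counter)
    verses = set()
    for stem, gloss, ref in records:
        counts[stem][gloss] += 1
        verses.add(ref)
    lines = []
    for stem in sorted(counts):
        glosses = " ".join(
            f"{gloss} ({n:,})" for gloss, n in sorted(counts[stem].items())
        )
        lines.append(f"{stem} {glosses}")
    row_count = sum(len(c) for c in counts.values())
    return lines, row_count, len(verses)

lines, row_count, verse_count = gloss_summary(occurrences)
for line in lines:
    print(line)
print(row_count, "rows in", verse_count, "verses")
```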

Note - eleven isn't quite that simple. [Nothing in natural language ever is.] Unlike English, Hebrew takes two words for eleven, and there are two distinct ways of saying it (אחת עשרה or עשׁתי עשרה). I have a number of routines that deal with consecutive words, but in this exercise I did not use them, and it turns out that I haven't been exactly consistent in my data coding. When I have two Hebrew stems for one gloss, one of the Hebrew words might be marked as untranslated, or I might have included a connector like 'and' even if it was part of the ignored stem - phooey! [Data coding is critical. High level errors like the failure to distinguish metheg from silluq in Unicode lead to the inability of a program to make some types of decision.]
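
One way to picture a consecutive-word routine is a pass that merges an adjacent pair of stems into a single unit before glosses are tallied. The Python sketch below is an assumption-laden illustration, not the routines mentioned above: the `COMPOUNDS` table, the placeholder glosses treated as empty, and the function name are all hypothetical.

```python
# Stems named once so the table and the sample use identical strings.
AHAD = "אחד"
ESER = "עשׂר"

# Hypothetical table of adjacent stem pairs that share one gloss.
COMPOUNDS = {(AHAD, ESER)}
# Hypothetical markers for a word that carries no gloss of its own.
EMPTY_GLOSSES = {"--", "untranslated"}

def merge_compounds(words):
    """words is a list of (stem, gloss) in verse order.
    Adjacent pairs listed in COMPOUNDS become one (stems, gloss) unit."""
    merged = []
    i = 0
    while i < len(words):
        pair = words[i:i + 2]
        if len(pair) == 2 and (pair[0][0], pair[1][0]) in COMPOUNDS:
            stems = f"{pair[0][0]} {pair[1][0]}"
            gloss = " ".join(g for _, g in pair if g not in EMPTY_GLOSSES)
            merged.append((stems, gloss))
            i += 2  # both words consumed
        else:
            merged.append(words[i])
            i += 1
    return merged

sample = [("היה", "and it was"), (AHAD, "--"),
          (ESER, "in the eleventh"), ("שׁנה", "year")]
for stems, gloss in merge_compounds(sample):
    print(stems, "|", gloss)
```

Merging first would keep the second word of a two-word number from being counted separately as untranslated.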

I have also had to ignore the domain grammar-preposition. If I don't remove prepositions from the glosses, I get 292 rows - and 111 was already too many. Perhaps omitting entirely words that are only used for the base Hebrew grammar would be a good idea. If I omit prepositions, the rows reduce to 103. If I omit names, they reduce to 100, and the verses reduce to 7,734. If I remove speaking and the very common היה, the rows reduce to 61 and the verses to 1,861. So the strategy will have to be to identify the very common words in the language and take them as given. This runs the risk of losing track of the glue... But there we are - we don't want to be stuck.
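
The reduction strategy can be sketched as a filter over the gloss records followed by a recount. In this Python illustration the record fields, the domain names other than grammar-preposition, the threshold, and the sample data are all hypothetical, so the numbers it prints are not the ones quoted above.

```python
HAYAH = "היה"
HODESH = "חדשׁ"

# Hypothetical sample records, one per word occurrence.
records = [
    {"stem": "אל", "gloss": "to", "domain": "grammar-preposition", "verse": "v1"},
    {"stem": HAYAH, "gloss": "happen", "domain": "being", "verse": "v1"},
    {"stem": HODESH, "gloss": "month", "domain": "time", "verse": "v2"},
    {"stem": HODESH, "gloss": "new moon", "domain": "time", "verse": "v3"},
]

def reduce_rows(rows, skip_domains=(), skip_stems=()):
    """Drop unwanted domains and stems, then count the remaining
    distinct stem-gloss pairs ('rows') and distinct verses."""
    kept = [r for r in rows
            if r["domain"] not in skip_domains and r["stem"] not in skip_stems]
    pairs = {(r["stem"], r["gloss"]) for r in kept}
    verses = {r["verse"] for r in kept}
    return len(pairs), len(verses)

def common_stems(rows, min_verses):
    """Stems found in at least min_verses distinct verses:
    candidates for being taken as given."""
    seen = {}
    for r in rows:
        seen.setdefault(r["stem"], set()).add(r["verse"])
    return {stem for stem, refs in seen.items() if len(refs) >= min_verses}

print(reduce_rows(records))
print(reduce_rows(records, skip_domains={"grammar-preposition"}))
print(reduce_rows(records, skip_domains={"grammar-preposition"},
                  skip_stems={HAYAH}))
print(common_stems(records, min_verses=2))
```

A frequency threshold like `min_verses` is only one possible way to decide which words count as very common. A hand-picked list would do the same job with more control over the glue words.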

