Saturday 23 May 2020

Automation and the Hebrew Language

Years ago I attempted a grammar routine that would take a stem and its word form and automatically tell me whether the form was a noun, a verb, and so on. This is a non-trivial problem.

These past few weeks I took on a subset of the problem: determining the morphology of a Hebrew word. This is the opposite of determining the stem from the word. I already have the stems, and they are mostly consistent. I have found and corrected some errors, and the Aramaic is tricky because I stored the equivalent Hebrew stem instead of the Aramaic one. There are, of course, real exceptions and transpositions of letters in the text, sometimes to avoid tongue twisters. One example is the fourth word of Isaiah 3:4, וְתַעֲלוּלִ֖ים, whose stem is עולל.

With SimHebrew now partially integrated into my translation system, transpositions of letters are easier both to see and to process by computer. The morphology of this word is vt/ylvl\im. Notice how the stem yvll is altered to make the word easier to say: the vl of the stem has become lv in the word form. I now have all words decoded automatically, using a simple recursive function implemented in the database. It was fascinating to do. Here's the verse in a few different forms:

Isaiah 3:4, pointed text:
וְנָתַתִּ֥י נְעָרִ֖ים שָׂרֵיהֶ֑ם
וְתַעֲלוּלִ֖ים יִמְשְׁלוּ־בָֽם

SimHebrew:
d vntti nyrim wrihm
vtylulim imwlu-bm

Malé text:
ד ונתתי נערים שריהם
ותעלולים ימשלו־בם

My translation:
4 And I will give youngsters as their nobility,
infants, and they will govern them.

I had no idea it was so politically appropriate.
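The SimHebrew and Malé lines above correspond letter for letter, so converting one into the other is mostly a table lookup plus Hebrew's word-final letter shapes. Here is a minimal Python sketch of that idea. The letter table holds only what can be read off this one verse (y for ayin, w for shin/sin, and both u and v for vav, for example); it is not the full SimHebrew scheme, and the function name to_male is hypothetical.

```python
# Partial, hypothetical letter table: only the letters that occur in Isaiah 3:4.
LETTERS = {
    "b": "ב", "d": "ד", "h": "ה", "i": "י", "l": "ל", "m": "מ", "n": "נ",
    "r": "ר", "t": "ת", "u": "ו", "v": "ו", "w": "ש", "y": "ע",
}
# Word-final shapes for the letters in the table that have one.
FINALS = {"m": "ם", "n": "ן"}
MAQAF = "\u05be"  # Hebrew hyphen joining two words


def to_male(sim: str) -> str:
    """Transliterate a SimHebrew line into unpointed Hebrew letters."""
    out = []
    for i, ch in enumerate(sim):
        if ch == "-":
            out.append(MAQAF)
        elif ch in LETTERS:
            # A letter is word-final if nothing alphabetic follows it.
            at_word_end = i + 1 == len(sim) or sim[i + 1] not in LETTERS
            out.append(FINALS[ch] if at_word_end and ch in FINALS else LETTERS[ch])
        else:
            out.append(ch)  # spaces and unmapped characters pass through
    return "".join(out)


print(to_male("d vntti nyrim wrihm"))
print(to_male("vtylulim imwlu-bm"))
```

The final-letter check is what turns the m of nyrim into ם while leaving the m inside imwlu as מ.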

And this is the verse in SimHebrew with each word decomposed.
vn/tt\i nyr\im wr\ihm, vt/ylvl\im i/mwl\v b\m.
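In the decomposed forms, a slash appears to mark the end of a prefix and a backslash the start of a suffix, with what survives of the stem in between. Assuming that reading (it is inferred from the examples rather than stated), a minimal Python sketch of splitting the line into its parts looks like this; Parts, parse_form and parse_verse are hypothetical names.

```python
from typing import List, NamedTuple


class Parts(NamedTuple):
    prefix: str  # letters before the slash, empty if none
    core: str    # what survives of the stem
    suffix: str  # letters after the backslash, empty if none


def parse_form(form: str) -> Parts:
    """Split a decomposed form such as vt/ylvl\\im into its three parts."""
    prefix, _, rest = form.rpartition("/")  # no slash -> prefix is ""
    core, _, suffix = rest.partition("\\")  # no backslash -> suffix is ""
    return Parts(prefix, core, suffix)


def parse_verse(line: str) -> List[Parts]:
    """Parse every word of a decomposed verse, ignoring trailing commas and periods."""
    return [parse_form(word.strip(",.")) for word in line.split()]


VERSE = r"vn/tt\i nyr\im wr\ihm, vt/ylvl\im i/mwl\v b\m."

for parts in parse_verse(VERSE):
    print(parts.prefix or "-", parts.core, parts.suffix or "-")
```

Rejoining the three pieces of vt/ylvl\im gives vtylvlim, whereas the SimHebrew line writes the vowel vav as u (vtylulim), so comparing the two notations would need to treat u and v as the same letter.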

I have about 50 words left to examine out of the roughly 305,000 in the canon. It is astounding what a recursive program can do, though one has to be careful that it does not loop infinitely.

The process is just string manipulation. If the stem appears in the word form exactly, it's easy, but that is true only for words whose stems are made of strong consonants. A large number of consonants are sometimes weak, and they move around depending on the word form. Also, of course, Hebrew is heavily affixed: prefixes and suffixes glom onto the stem to change its tense, aspect, subject, or object. When they do, parts of the stem disappear. Lots of letters disappear; sometimes only one letter of the stem is left. But there is a limit. Vav, as the example above shows, is notorious for moving around or changing its shape to yod, or whatever else you want to call it. So the program tries various manipulations in sequence and then calls itself again (recurses) with each variation.
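The recursive search described above can be sketched in Python as follows. This is not the database function itself: the edit set (transposing neighbouring letters, writing a doubled letter once, dropping a weak letter, turning vav into yod), the choice of v, i and h as weak letters, the two-edit budget, and the names edits, variants and decompose are all assumptions for illustration. The budget shrinks on every recursive call, which is what stops an edit like vl to lv and back again from looping forever.

```python
from typing import Dict, List, Optional

# Hypothetical choice of weak letters in SimHebrew: v (vav), i (yod), h (he).
WEAK = frozenset("vih")


def edits(shape: str) -> List[str]:
    """One-step manipulations of a stem shape (illustrative, not exhaustive)."""
    out = []
    for i in range(len(shape) - 1):
        # Transpose neighbouring letters, e.g. yvll -> ylvl.
        out.append(shape[:i] + shape[i + 1] + shape[i] + shape[i + 2:])
        # Write a doubled letter once, e.g. yvll -> yvl.
        if shape[i] == shape[i + 1]:
            out.append(shape[:i] + shape[i + 1:])
    for i, ch in enumerate(shape):
        if ch in WEAK:  # a weak letter drops out, e.g. yvll -> yll
            out.append(shape[:i] + shape[i + 1:])
        if ch == "v":  # vav surfaces as yod, e.g. yvll -> yill
            out.append(shape[:i] + "i" + shape[i + 1:])
    return out


def variants(shape: str, budget: int,
             best: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Map each shape reachable in at most `budget` edits to its largest leftover budget.

    Every recursive call spends one unit of budget, so the recursion always ends,
    even though some edits undo each other.
    """
    if best is None:
        best = {}
    if best.get(shape, -1) >= budget:
        return best  # already explored with at least this much budget left
    best[shape] = budget
    if budget > 0:
        for candidate in edits(shape):
            variants(candidate, budget - 1, best)
    return best


def decompose(word: str, stem: str, budget: int = 2) -> Optional[str]:
    """Write word as prefix/core\\suffix, where core is some variant of stem."""
    reach = variants(stem, budget)
    # Prefer longer shapes, then shapes needing fewer edits (more budget left).
    shapes = sorted((s for s in reach if s), key=lambda s: (-len(s), -reach[s], s))
    for shape in shapes:
        pos = word.find(shape)
        if pos != -1:
            prefix, suffix = word[:pos], word[pos + len(shape):]
            return (prefix + "/" if prefix else "") + shape + ("\\" + suffix if suffix else "")
    return None


print(decompose("vtylvlim", "yvll"))
```

Under these assumptions the sketch recovers vt/ylvl\im for vtylvlim and ylv\t for ylvt. Preferring the shape that needs the fewest edits matters: in lyvllihm both yvll and the two-edit variant lyvl occur, and only yvll gives l/yvll\ihm.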

Here is the full set of possibilities for yvll:
yll\i, yll\ihm, ylv\t, yvll, yvl\hm, yll\ih, v/yll\ihm, yvll\im, vt/ylvl\im, l/yvll\ihm, yvl\m, m/yvll, yvl, c/yll\im, yvl\h, yll\ic, yvll\ic, m/yll
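The list also shows how much of the stem survives in each form. A short Python sketch can tally the stem part of each entry and confirm that it uses only letters the stem yvll supplies, the transposed ylvl included. It reads the notation as before (prefix before the slash, suffix after the backslash), which is an inference from the examples; core_of and uses_only_stem_letters are hypothetical names.

```python
from collections import Counter

STEM = "yvll"
# The decomposed forms listed above, copied verbatim.
FORMS = (r"yll\i, yll\ihm, ylv\t, yvll, yvl\hm, yll\ih, v/yll\ihm, yvll\im, "
         r"vt/ylvl\im, l/yvll\ihm, yvl\m, m/yvll, yvl, c/yll\im, yvl\h, "
         r"yll\ic, yvll\ic, m/yll")


def core_of(form: str) -> str:
    """Drop any prefix (before the slash) and suffix (after the backslash)."""
    after_prefix = form.rpartition("/")[2]
    return after_prefix.partition("\\")[0]


def uses_only_stem_letters(core: str, stem: str) -> bool:
    """True if every letter of core is available in stem, counting repeats."""
    return not (Counter(core) - Counter(stem))


cores = Counter(core_of(f.strip()) for f in FORMS.split(","))
print(cores.most_common())
print(all(uses_only_stem_letters(c, STEM) for c in cores))
```

For this list it counts yll seven times, yvll five times, yvl four times, and ylv and ylvl once each, and every one of them draws only on the letters of yvll.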

Programming is a great escape from isolation.
