One of my pandemic projects is to map the differences between a pointed and an unpointed text of the Hebrew Bible. I am doing this using SimHebrew, a very memorable and capable left-to-right version of the square text.
I had 'my own version' of a left-to-right abbreviation, in capital letters with a few punctuation marks for the gutturals aleph and ayin, but I converted to the lower-case version developed by Jonathan Orr-Stav. (Such a Latin-letter code is an abbreviation because each letter takes one byte as opposed to 7 bytes in rendered Unicode, and one byte as opposed to 2 internally. Using a Unicode database is really awkward for me, and the technology was new 10 years ago, so it was a non-starter.)
My method of data capture is to take the unpointed Mechon-Mamre text, which can be downloaded from their site one book at a time, and run it through the SimHebrew converter. Then I manipulate the text in Notepad until I have legitimate insert statements for my database. This is somewhat prone to error, but eventually I get a clean script. (I use a combination of Word and Notepad for the necessary global changes.)
After this I create a temporary table that matches the following for each word: book; chapter; the alef-betic verse number converted to a real verse number; the Hebrew word in SimHebrew; the full pointed word from the Leningrad codex; the stem code, raw word form, and semantic domain from my database; the word number relative to the start of the verse; and the word id (assigned by an Oracle sequence and connecting to my word table).
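The verse-number conversion is the easiest step to show in code. Here is a minimal sketch in Python, not the routine behind my database: it assumes the verse label arrives as a plain string of Hebrew letters used as numerals, and the name `hebrew_numeral_to_int` is made up for the illustration.

```python
# Letters in numeric order; each group is one decimal scale.
UNITS = "אבגדהוזחט"     # alef..tet  -> 1..9
TENS = "יכלמנסעפצ"      # yod..tsade -> 10..90
HUNDREDS = "קרשת"       # qof..tav   -> 100..400

VALUES = {}
for scale, letters in ((1, UNITS), (10, TENS), (100, HUNDREDS)):
    for position, letter in enumerate(letters, start=1):
        VALUES[letter] = scale * position

# Final forms count the same as their regular forms in ordinary numbering.
VALUES.update({"ך": 20, "ם": 40, "ן": 50, "ף": 80, "ץ": 90})

# Punctuation sometimes attached to Hebrew numerals (geresh, gershayim, quotes).
IGNORED = {"\u05f3", "\u05f4", "'", '"'}


def hebrew_numeral_to_int(label: str) -> int:
    """Add up the letter values of an alef-betic number such as a verse label."""
    total = 0
    for ch in label.strip():
        if ch in IGNORED:
            continue
        if ch not in VALUES:
            raise ValueError(f"not a Hebrew numeral letter: {ch!r}")
        total += VALUES[ch]
    return total
```

Because the value is just a sum of letter values, the conventional spellings of 15 and 16 (tet-vav, tet-zayin) need no special case.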
I recently added 2 Kings to my data. At first I assigned the wrong chapter numbers - a little oversight. That resulted in over 5000 differences in my calculations. When I fixed the issue, the differences went down to about 230. I then discovered that M-M (Mechon-Mamre) treats דִּבְיוֹנִ֖ים (dbivn) as two words (db ivnh), which they render as dung of a dove - apparently a poor substitute for salt. Fixing those 4 instances dropped my mismatches by 100. Then, word by word, I work through the remainder until the differences are accounted for.
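Why does one split word, or one wrong chapter number, produce so many differences? A small sketch shows the effect. It assumes, as a simplification of my real tables, that words are matched on the key (book, chapter, verse, word number); the function names and the neighbouring words `w1` and `w2` are placeholders.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int, int, int]  # (book, chapter, verse, word number)


def index_words(book: str, chapter: int, verse: int, words: List[str]) -> Dict[Key, str]:
    """Key each word of a verse by its position, counting from 1."""
    return {(book, chapter, verse, n): w for n, w in enumerate(words, start=1)}


def mismatches(source: Dict[Key, str], calc: Dict[Key, str]) -> List[Key]:
    """Return every key where the two sides disagree or one side is missing."""
    keys = sorted(source.keys() | calc.keys())
    return [k for k in keys if source.get(k) != calc.get(k)]


# Source text splits one word in two; the calculated text keeps it whole.
source = index_words("2 Kings", 1, 1, ["w1", "db", "ivnh", "w2"])
calc = index_words("2 Kings", 1, 1, ["w1", "dbivn", "w2"])
split_diffs = mismatches(source, calc)

# Same words on both sides, but the source carries the wrong chapter number.
wrong_chapter = index_words("2 Kings", 2, 1, ["w1", "dbivn", "w2"])
chapter_diffs = mismatches(wrong_chapter, calc)
```

In the first case every word from the split onward is out of step, so a single split costs several mismatches. In the second case no key lines up at all, which is how a chapter-number slip runs into the thousands.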
My program that calculates the words also suggests a code snippet that will fix any discrepancy (as long as I put it in the right place in the code). The program as it stands now gets 99% of the words right on the first pass. So my predicted SimHebrew Bible is about 99% right. Sorry - for the Bible, that's not good enough.
About this time in the history of the project, I ask: is there a better way? I know I can finish the way I have begun, and my brute-force spelling changes are next to zero. I have over a third of the words in my test data now and over 75% of the stems represented. But the code is specific to prefix and suffix and sometimes to particular vowel combinations in the WLC. What are the real rules?
Native speakers who 'just know' the pronunciation - what are they really doing? Certainly they have retention in memory by word form, by context, and by stem. But can I get the program to discover the shortcuts that people use, and in some ways see what they are doing (and thereby discover the nature of the evolution of language usage)?
Here's an example: the 31 uses so far of עצר (yxr).
For yxr, the following rules are noted. Line numbers refer to the rows of the table, counted from the top.
- First you see that the holem is rendered as /o/ in lines 1 and 2 and others. This is the default, but some words (with some exceptions) do not render the o.
- Next, note that the qamats is not rendered as /o/ (or vav for rtl Hebrew) in line 3, but it is in line 4: the qamats under the second letter of the stem is rendered with the prefixes /i/ and /t/, in each case without a suffix.
- Tsere is rendered as /i/ in lines 6 and 12 - but only the tsere under the last letter of the prefix. (I would like to color code these but the new editor absolutely ruins rtl display with embedded color coding.)
- Notice that the final h is dropped in line 18.
- Lines 26 and 27 would have rendered the qamats in the closed syllable as /o/ if the suffix had not been /u/. That's a general rule - but there are exceptions here too, and exceptions to the exceptions.
| Line | Ref (book chap:vs(word within vs)) | Stem | Word Form | Morph | Sim Source | Sim Calc. | WLC Word | Domain | Rendering |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 Chronicles 14:10(30) | yxr | iyxr | i/yxr | -iyxvr | -iyxor | ־יַעְצֹר | BOUND | יעצר will coerce |
| 2 | 1 Chronicles 29:14(7) | yxr | nyxr | n/yxr | -nyxvr | -nyxor | ־נַעְצֹר | BOUND | נעצר contained of |
| 3 | 2 Chronicles 13:20(2) | yxr | yxr | yxr | -yxr | -yxr | ־עָצַר | BOUND | עצר did coerce |
| 4 | 2 Kings 4:24(9) | yxr | tyxr | t/yxr | -tyxvr | -tyxor | ־תַּעֲצָר | BOUND | תעצר do detain |
| 5 | 2 Chronicles 7:13(2) | yxr | ayxr | a/yxr | ayxvr | ayxor | אֶעֱצֹר | BOUND | אעצר I contain |
| 6 | 1 Kings 8:35(1) | yxr | bhyxr | bh/yxr | bhiyxr | bhiyxr | בְּהֵעָצֵר | BOUND | בהעצר when is contained |
| 7 | 2 Chronicles 6:26(1) | yxr | bhyxr | bh/yxr | bhiyxr | bhiyxr | בְּהֵעָצֵר | BOUND | בהעצר when are contained |
| 8 | Amos 5:21(6) | yxr | byxrticm | b/yxr\ticm | byxrvticm | byxroticm | בְּעַצְּרֹתֵיכֶם | COVENANT | בעצרתיכם in your conclaves |
| 9 | 2 Kings 9:8(9) | yxr | vyxvr | v/yxvr | vyxvr | vyxur | וְעָצוּר | BOUND | ועצור the contained |
| 10 | 1 Kings 21:21(11) | yxr | vyxvr | v/yxvr | vyxvr | vyxur | וְעָצוּר | BOUND | ועצור the contained |
| 11 | Proverbs 30:16(2) | yxr | vyxr | v/yxr | vyvxr | vyoxr | וְעֹצֶר | BOUND | ועצר and contained |
| 12 | 1 Chronicles 21:22(17) | yxr | vtyxr | vt/yxr | vtiyxr | vtiyxr | וְתֵעָצַר | BOUND | ותעצר that may be contained |
| 13 | 2 Kings 17:4(20) | yxr | viyxrhv | vi/yxr\hv | viyxrhv | viyxrhu | וַיַּעַצְרֵהוּ | BOUND | ויעצרהו so detained him |
| 14 | Job 4:2(5) | yxr | vyxr | v/yxr | vyxvr | vyxor | וַעְצֹר | BOUND | ועצר but contain |
| 15 | Psalms 106:30(4) | yxr | vtyxr | vt/yxr | vtiyxr | vtiyxr | וַתֵּעָצַר | BOUND | ותעצר and was contained |
| 16 | Job 12:15(2) | yxr | iyxr | i/yxr | iyxvr | iyxor | יַעְצֹר | BOUND | יעצר he contains |
| 17 | 2 Chronicles 2:5(2) | yxr | iyxr | i/yxr | iyxvr | iyxor | יַעֲצָר | BOUND | יעצר contains |
| 18 | 1 Kings 18:44(19) | yxr | iyxrch | i/yxr\ch | iyxvrç | iyxorc | יַעַצָרְכָה | BOUND | יעצרכה detain you |
| 19 | 2 Chronicles 22:9(27) | yxr | lyxr | l/yxr | lyxvr | lyxor | לַעְצֹר | BOUND | לעצר to coerce |
| 20 | Psalms 107:39(3) | yxr | myxr | m/yxr | myvxr | myoxr | מֵעֹצֶר | BOUND | מעצר through coercion of |
| 21 | Proverbs 25:28(8) | yxr | myxr | m/yxr | myxr | myxr | מַעְצָר | BOUND | מעצר containment |
| 22 | 2 Chronicles 7:9(4) | yxr | yxrt | yxr\t | yxrt | yxrt | עֲצָרֶת | COVENANT | עצרת a conclave |
| 23 | Joel 1:14(4) | yxr | yxrh | yxr\h | yxrh | yxrh | עֲצָרָה | COVENANT | עצרה a conclave |
| 24 | Joel 2:15(7) | yxr | yxrh | yxr\h | yxrh | yxrh | עֲצָרָה | COVENANT | עצרה a conclave |
| 25 | 2 Kings 10:20(4) | yxr | yxrh | yxr\h | yxrh | yxrh | עֲצָרָה | COVENANT | עצרה a conclave |
| 26 | Job 29:9(2) | yxr | yxrv | yxr\v | yxrv | yxru | עָצְרוּ | BOUND | עצרו contained |
| 27 | 2 Chronicles 20:37(19) | yxr | yxrv | yxr\v | yxrv | yxru | עָצְרוּ | BOUND | עצרו could be coerced |
| 28 | 2 Kings 14:26(10) | yxr | yxvr | yxvr | yxvr | yxur | עָצוּר | BOUND | עצור coercion |
| 29 | Jeremiah 36:5(7) | yxr | yxvr | yxvr | yxur | yxur | עָצוּר | BOUND | עצור am detained |
| 30 | 1 Chronicles 12:1(7) | yxr | yxvr | yxvr | yxvr | yxur | עָצוּר | BOUND | עצור he contained himself |
| 31 | 1 Kings 14:10(12) | yxr | yxvr | yxvr | yxvr | yxur | עָצוּר | BOUND | עצור those who are contained |
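The default holem rule, and only that rule, can be sketched as follows. This is an illustration in Python rather than my actual program: the consonant map covers only letters whose SimHebrew equivalents can be read off the table, the function name `render_holem_default` and its `render_o` switch are invented for the example, and qamats, tsere, shuruq and every other rule discussed here are deliberately left out.

```python
# Consonants attested in the examples of this post (not a full SimHebrew table).
CONSONANTS = {
    "\u05d0": "a",  # alef
    "\u05d1": "b",  # bet
    "\u05d4": "h",  # he
    "\u05d5": "v",  # vav
    "\u05d9": "i",  # yod
    "\u05db": "c",  # kaf
    "\u05dc": "l",  # lamed
    "\u05de": "m",  # mem
    "\u05dd": "m",  # final mem
    "\u05e0": "n",  # nun
    "\u05e2": "y",  # ayin
    "\u05e6": "x",  # tsade
    "\u05e8": "r",  # resh
    "\u05ea": "t",  # tav
}
HOLEM = "\u05b9"


def render_holem_default(pointed: str, render_o: bool = True) -> str:
    """Transliterate consonants and, by default, write each holem as 'o'.

    Pass render_o=False for a word on the exception list (hypothetical switch).
    All other points, accents and the maqaf are ignored in this sketch.
    """
    out = []
    for ch in pointed:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch == HOLEM and render_o:
            if out and out[-1] == "v":
                out[-1] = "o"  # holem carried by a vav: the vav itself becomes o
            else:
                out.append("o")
    return "".join(out)


# Pointed forms from lines 16 and 11 of the table, spelled out by code point.
line_16 = "\u05d9\u05b7\u05e2\u05b0\u05e6\u05b9\u05e8"
line_11 = "\u05d5\u05b0\u05e2\u05b9\u05e6\u05b6\u05e8"
```

On these two words the sketch reproduces the Sim Calc. column (iyxor and vyoxr). It would get line 4 wrong, since there the /o/ comes from a qamats, which is exactly why the real code has grown so many special cases.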
The rules for hireq are just as tangled:
- 200 distinct stems which do not contain a yod are allowed to render hireq as yod without exception.
- Another 281 allow hireq as yod on an exception basis.
- Only 7 stems containing a yod disallow hireq as yod.
- Another 46 disallow it on an exception basis.
- Hireq is usually ignored in a closed syllable - but there are exceptions, and exceptions by word form to the exceptions (only 8 stems).
- Hireq is rendered as /ii/ for several reasons: when the prefix is i, and occasionally for tsere, patah, and qamats. What! These are also rendered as o or v sometimes. Who can know?
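One way to hold allow and disallow lists like these is a default plus exception tables. The sketch below is in Python and rests on an assumption about the default: stems containing a yod (i in SimHebrew) render hireq as yod, and stems without one do not. The class name, field names and the stems `abc` and `aic` are placeholders, not entries from my database.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HireqRules:
    # Stems without a yod that always render hireq as yod.
    always_allow: Set[str] = field(default_factory=set)
    # Stems without a yod that allow it only for listed word forms.
    allow_forms: Dict[str, Set[str]] = field(default_factory=dict)
    # Stems with a yod that never render hireq as yod.
    always_deny: Set[str] = field(default_factory=set)
    # Stems with a yod that disallow it only for listed word forms.
    deny_forms: Dict[str, Set[str]] = field(default_factory=dict)

    def hireq_as_yod(self, stem: str, word_form: str) -> bool:
        """Decide whether a hireq in this word form is written as yod."""
        if "i" in stem:  # assumed default for stems with a yod: allow
            if stem in self.always_deny:
                return False
            return word_form not in self.deny_forms.get(stem, set())
        # assumed default for stems without a yod: disallow
        if stem in self.always_allow:
            return True
        return word_form in self.allow_forms.get(stem, set())


# Placeholder data: 'abc' has no yod, 'aic' has one.
rules = HireqRules(
    allow_forms={"abc": {"mabc"}},
    deny_forms={"aic": {"vaic"}},
)
```

The closed-syllable rule and the /ii/ cases would need the pointed word itself, not just stem and word form, so they fall outside this lookup.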