Sunday, August 9, 2020

Considering the maqaf

The maqaf is the Hebrew counterpart of the English hyphen. Here is an example: Amos 8:10
Amos 8
וְהָפַכְתִּ֨י חַגֵּיכֶ֜ם לְאֵ֗בֶל וְכָל־שִֽׁירֵיכֶם֙ לְקִינָ֔ה וְהַעֲלֵיתִ֤י עַל־כָּל־מָתְנַ֙יִם֙ שָׂ֔ק וְעַל־כָּל־רֹ֖אשׁ־קָרְחָ֑ה
וְשַׂמְתִּ֙יהָ֙ כְּאֵ֣בֶל יָחִ֔יד וְאַחֲרִיתָ֖הּ כְּי֥וֹם מָֽר
i vhpcti kgicm labl vcl-wiricm lqinh vhyliti yl-cl-motniim wq vyl-cl-raw-qrkh
vwmtih cabl ikid vakrith ciom mr
י והפכתי חגיכם לאבל וכל־שיריכם לקינה והעליתי על־כל־מותניים שק ועל־כל־ראש־קרחה
ושמתיה כאבל יחיד ואחריתה כיום מר
10 And I will change your festivities to lament and all your songs to a dirge and I will bring up over all endowments, sackcloth, and over every head, baldness.
And I will set it as a unique lament and following it, as a day of bitterness.

Lambdin writes that the maqaf (maqqep in his spelling) indicates that a preposition is proclitic, i.e. has no stress of its own.

Does it affect the music? The music determines stress much more than the maqaf. And it is clear from bars 2 to 4 that an avoidance of inner-word stress is not necessarily justified. (I am glad to see that my music program ignores the maqaf entirely. Note that in music, the hyphen joins syllables separated by note changes, whereas in Hebrew the maqaf joins separate words.)
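
To make concrete the point that the maqaf joins separate words, here is a minimal Python sketch. It assumes the maqaf is encoded as U+05BE in Hebrew text and that the ASCII hyphen stands in for it in the transliteration above. The function name maqaf_groups is hypothetical and is not part of the music program.

```python
MAQAF = "\u05be"  # HEBREW PUNCTUATION MAQAF


def maqaf_groups(line: str, joiner: str = MAQAF) -> list[list[str]]:
    """Split a line on whitespace, then split each unit on the joiner."""
    return [unit.split(joiner) for unit in line.split()]


# First line of Amos 8:10 in transliteration, where "-" stands for the maqaf.
line = "vhpcti kgicm labl vcl-wiricm lqinh vhyliti yl-cl-motniim wq vyl-cl-raw-qrkh"
groups = maqaf_groups(line, joiner="-")

# Keep only the units in which the maqaf joins two or more words.
joined = [g for g in groups if len(g) > 1]
for g in joined:
    print(len(g), "words joined:", " + ".join(g))
```

For Hebrew text, the same call without the joiner argument splits on U+05BE instead.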

Nonetheless, my attitude towards this jot may have been a little too dismissive in the past.

Sunday, August 2, 2020

Hireq, when and when not to render it in standard spelling

This post continues my exploration of the transformations of individual vowels from a pointed text to an unpointed one. 

Here we look at hireq: when it is ignored and when it appears. There was an overall strategy. I had several rules based on prefix to suppress hireq. Almost all of them disappeared once I figured out the closed syllable rule.
  1. Suppress hireq
    1. when there is an /h/ in the prefix and the stem is in hlc, hll, clm, ixg, ixt, lvi, npl, pla, rah, rgy, wlc, wmy, wgh, wqh, xmt, yll.
    2. or the stem is in hlc, irw, nwa, rbb with conditions for some affixes.
  2. Suppress hireq when it is in a closed syllable, 
    1. if the last character of the prefix is /n/ 
      1. and stem is one of ptk, ckd, clm, csh, dmh, mlt', pla, pzr, rah, rpa, wbr, wck, wmy, and the first syllable of the word is closed with a schwa.
      2. or any stem with a weak first character and the syllable is closed with a hatef-patah (only one instance of this).
    2.  if the stem is not in /n/ and the stem is not in bqw, clm, csa, gll, irw, igy, lbb, nwa, rxp, qll, qvh (or it is qvh with the last character of the prefix a /t/).
      1. There is an exception for words from stems hlc, wal with a final schwa.
  3. Allow hireq to become 'i*i' (which will later translate to double i).
    1. for words beginning with v||schwa||i||hireq where the stem begins with i and the word is not the name of a person, location, or people (I do have this explicit semantic information in my data),
    2. for some stems formed with weak consonants (ibw, iry, ixr, n'ty, ndr, lvn, ain, ild, nwa) with conditions on the affixes,
    3. for the long list of stems (see this post) that allow i with prefixes i, vi, iv, hiv (happens with stem idy),
  4. Allow hireq to become i 
    1. for several lists of stems with conditions on the affixes.
    2. for stems with yod as part of the stem, excluding any stems, another long list, where hireq is explicitly prevented.
  5. Much later in the process, the last steps, 
    1. prevent double i where the stem does not allow it.
    2. prevent the weak hireq from forming a syllable with a common single or double prefix unless this is overridden for a stem.
Though the prevention logic has simplified considerably, this whole process seems too complex still. 
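
As an illustration only, rule 1.1 above can be read as a simple membership test. The sketch below assumes that prefix and stem are available as transliterated strings and that "an /h/ in the prefix" means the letter h occurs anywhere in the prefix. The names are hypothetical and the real program may well be organized differently.

```python
# Stems listed under rule 1.1 of this post.
SUPPRESS_HIREQ_WITH_H_PREFIX = frozenset(
    "hlc hll clm ixg ixt lvi npl pla rah rgy wlc wmy wgh wqh xmt yll".split()
)


def suppress_hireq_1_1(prefix: str, stem: str) -> bool:
    """Rule 1.1: suppress hireq when the prefix has an /h/ and the stem is listed."""
    return "h" in prefix and stem in SUPPRESS_HIREQ_WITH_H_PREFIX


print(suppress_hireq_1_1("h", "hlc"))   # prefix has /h/ and the stem is listed
print(suppress_hireq_1_1("v", "hlc"))   # no /h/ in the prefix
print(suppress_hireq_1_1("vh", "dbr"))  # stem is not in the list
```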

Summary of the rules - a point of stability

I am continuing to explain to myself the program I am writing to transform a pointed Hebrew text into one that conforms to standard Hebrew spelling.

Right now I have 143 explicit named 'rules' governing subsets of Hebrew stems and their various word forms. That means 143 distinct sets of transformations that apply to a set of stems.

Do they make sense? 

Here, for example, is the first entry below: gvi, irw, mdi each allow double i /ii/ in the calculated SimHebrew word, whatever its origin. That is the only 'rule' applied to these stems. I see that each of these stems has an /i/ in it, so this collection is not contradictory. Allow double i would be a silly rule for, say, the stem dbr, since dbr by default does not allow i. For dbr to appear with allow double i, there would have to be another rule attached to it that allowed i under some condition.

The next rule collection is 'allow double i, allow i', and the two stems that have it are ml't (written ml+ in the listing below) and yl. Neither allows i by default, so again the rule is self-consistent.

A little further down I saw rules I would question just as I noted above. How can words that don't allow i get to have a double i? Sure enough, when I removed those from the rule, it made no difference to the results.
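
The check described here can be sketched in a few lines. This is a minimal sketch under one stated assumption: a stem "allows i" either because it contains the letter i or because its rule combination includes some allow_i rule. That reading is inferred from the examples in this post, and the function name is hypothetical.

```python
def dbl_i_is_questionable(stem: str, rules: set[str]) -> bool:
    """Flag allow_dbl_i when nothing lets an i appear in the first place."""
    if "allow_dbl_i" not in rules:
        return False
    has_i_in_stem = "i" in stem
    # "allow_dbl_i" itself does not start with "allow_i", so it is not counted.
    has_allow_rule = any(r.startswith("allow_i") for r in rules)
    return not (has_i_in_stem or has_allow_rule)


print(dbl_i_is_questionable("gvi", {"allow_dbl_i"}))            # stem contains i
print(dbl_i_is_questionable("yl", {"allow_dbl_i", "allow_i"}))  # i is allowed by rule
print(dbl_i_is_questionable("dbr", {"allow_dbl_i"}))            # hypothetical combination
```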

A little further down, there was a rule that combined allow_i with allow_i_v. The rules that allow i as well as allowing i for prefix i are OK since i with prefix i translates into a forced double i. But to allow i with prefix v is a subset of allowing i, so the rule is redundant. Equally if i is allowed implicitly and not explicitly prevented, then rules that are subsets of allowing i can be removed. I removed about 11 such redundancies.
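
A redundancy check of this kind might look like the following sketch. It treats only allow_i_v as subsumed by allow_i, because that is the only case named in this post. Adding other prefix-specific rules to the set would be an assumption, and allow_i_i is deliberately left out because it forces a double i. The names are hypothetical.

```python
# Rules treated here as subsets of allow_i. Only allow_i_v is named in the post.
SUBSUMED_BY_ALLOW_I = {"allow_i_v"}


def redundant_rules(rules: set[str]) -> set[str]:
    """Return the rules that add nothing once allow_i is present."""
    if "allow_i" not in rules:
        return set()
    return rules & SUBSUMED_BY_ALLOW_I


print(redundant_rules({"allow_i", "allow_i_v"}))  # allow_i_v is redundant
print(redundant_rules({"allow_i", "allow_i_i"}))  # allow_i_i is kept
print(redundant_rules({"allow_i_v"}))             # nothing to subsume it
```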

Remember Occam's razor.

And so on. What I should do at some point is remove all the specific references to stems as rules from the code and put them into a database as I have in these now 143 cases. Then I could invent some additional language analysis. Maybe when it's easier to do that than just adding more rules I might do it. (Programmers take the route with the fewest obstacles.)
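
For what it is worth, moving the stem references into a database could start as small as this. It is a minimal sketch using an in-memory SQLite database. The table name stem_rule, its columns, and the two helper functions are hypothetical, and the rows are a handful copied from the listing below.

```python
import sqlite3

# A few (stem, rule) pairs taken from the listing in this post.
ROWS = [
    ("gvi", "allow_dbl_i"), ("irw", "allow_dbl_i"), ("mdi", "allow_dbl_i"),
    ("yl", "allow_dbl_i"), ("yl", "allow_i"),
    ("yvp", "allow_dbl_i"), ("yvp", "allow_i"), ("yvp", "hatef_qamats_vav"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stem_rule ("
    "stem TEXT NOT NULL, rule TEXT NOT NULL, PRIMARY KEY (stem, rule))"
)
conn.executemany("INSERT INTO stem_rule (stem, rule) VALUES (?, ?)", ROWS)


def stems_with(rule: str) -> list[str]:
    """All stems that carry the given rule, in alphabetical order."""
    cur = conn.execute(
        "SELECT stem FROM stem_rule WHERE rule = ? ORDER BY stem", (rule,)
    )
    return [row[0] for row in cur]


def rules_for(stem: str) -> list[str]:
    """All rules attached to the given stem, in alphabetical order."""
    cur = conn.execute(
        "SELECT rule FROM stem_rule WHERE stem = ? ORDER BY rule", (stem,)
    )
    return [row[0] for row in cur]


print(stems_with("allow_i"))
print(rules_for("yvp"))
```

With one row per stem and rule, a question such as "which stems carry this rule" becomes a query rather than a search through the code.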

A word about o. The default is to allow it, so if it isn't wanted, a rule must specifically prevent it. So there is a rule below entitled prevent_o. It applies to 9 stems in the test data. I have been torn over whether to name every rule. A single rule that has exceptions for every stem presents a problem. Too many rules! So I oscillate between rules I can easily remember and a search for specific combinations of letters and vowels.
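
The default-allow behaviour for o can be shown with a minimal sketch. The mapping below holds three stems from the listing in this post, and a stem missing from the mapping is assumed to have no rules at all. The names RULES_BY_STEM and o_allowed are hypothetical.

```python
# A small sample of stems and their rule combinations from the listing.
RULES_BY_STEM = {
    "abd": {"allow_i", "prevent_o"},
    "ihvh": {"prevent_i", "prevent_o"},
    "gvi": {"allow_dbl_i"},
}


def o_allowed(stem: str) -> bool:
    """o is allowed unless the stem carries an explicit prevent_o rule."""
    return "prevent_o" not in RULES_BY_STEM.get(stem, set())


print(o_allowed("abd"))  # prevented explicitly
print(o_allowed("gvi"))  # has rules, but none prevents o
print(o_allowed("qrn"))  # not in this sample mapping, so the default applies
```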

The following has been updated after the inclusion of another 8 chapters. 

Rule combination, followed on the next line by the stems it applies to

allow_dbl_i,

gvi, irw, mdi

allow_dbl_i, allow_i,

ml+, yl

allow_dbl_i, allow_i, hatef_qamats_vav,

yvp

allow_dbl_i, allow_i_c, qamats_1_vav,

zvh

allow_dbl_i, allow_i_h,

abh

allow_dbl_i, prevent_i, qamats_yod_ii,

hih

allow_dbl_i, prevent_i_h,

ixt

allow_i,

+vb, acr, amx, at, bqw, brk, btq, bwr, bzr, ckd, clh, cn, cnr, csa, csh, ctr, cys, dbh, dlh, dvd, gbr, gll, gn, hnm, k+h, kbl, kc, kdql, kmd, kmw, knm, kth, ktt, kx, lb, lpd, lvn, lyg, m+h, mdd, mgn, mvab, nax, nba, ngr, nkl, npx, nqb, nqh, nsh, nsy, ntc, nym, pnnh, psl, pss, ptk, ptt, pzr, q+r, qnm, qxx, rbb, rmvn, rpa, rxp, rxx, sll, snsn, tcn, tmr, w+k, wn, wqx, wvb, wvh, ww, wyy, xhr, xma, xmk, xmq, xnh, xnr, xph, y+r, yqb, ywr, yxm, yzz, zch, zmh, zmm, zmr

allow_i, allow_i_i,

bxr, glh, qvh, wbr, wck

allow_i, allow_i_i, closed_qamats,

nwq

allow_i, allow_i_i, hatef_qamats_vav,

dmh

allow_i, closed_qamats,

crm, lbb

allow_i, hatef_qamats_vav,

wbl, xpr

allow_i, hatef_qamats_vav, qamats_1_vav,

klh

allow_i, i_tsere_ii,

yxb

allow_i, i_tsere_ii, qamats_1_vav,

amn

allow_i, prevent_i_h,

hlc, hll, rgy, wgh, xmt, yll

allow_i, prevent_i_h, allow_i_i,

pla, wmy

allow_i, prevent_i_h, allow_i_i, qamats_u,

clm

allow_i, prevent_i_h, i_tsere_ii, prevent_o,

rah

allow_i, prevent_o,

abd, azn

allow_i, prevent_pfx_hi,

nxb, pgy

allow_i, qamats_1_vav,

rnn

allow_i, vavqamats_vv,

xvh

allow_i_0afx,

gdy, kzq, lmd, nar, qxr, slh, tqn, wcn

allow_i_0afx, allow_i_c, allow_i_l,

wcr

allow_i_0afx, allow_i_h,

psk

allow_i_0afx, allow_i_h, allow_i_l, allow_i_v,

awh

allow_i_0afx, allow_i_l, allow_i_v,

pnh

allow_i_0afx, allow_i_sfx, allow_i_h, allow_i_i, allow_i_t, allow_i_v, remove_h,

ntn

allow_i_0afx, allow_i_sfx, allow_i_i, allow_i_t,

mla

allow_i_0afx, allow_i_sfx, allow_i_t,

bly

allow_i_0afx, allow_i_sfx, allow_i_v,

ynh

allow_i_0afx, allow_i_sfx, i_tsere_ii, hatef_patah_vav,

avh

allow_i_0afx, allow_i_v,

wlk

allow_i_0afx, allow_i_v, closed_qamats,

rkm

allow_i_0afx, hatef_qamats_vav,

qdw

allow_i_0afx, hatef_segol_vav, vavtsere_vv,

yvh

allow_i_0afx, i_tsere_ii,

blh, yvr

allow_i_0pfx,

bxy, dca, gdp, yqw

allow_i_0pfx, allow_i_b, allow_i_h, allow_i_t,

kll

allow_i_0pfx, allow_i_b, prevent_pfx_hi, allow_i_t,

pll

allow_i_0pfx, allow_i_i, allow_i_t,

wkt

allow_i_a,

drw, wvm

allow_i_a, allow_i_i,

nxl, wp+

allow_i_b,

abib, mn, zq

allow_i_b, allow_i_i,

crt

allow_i_b, allow_i_i, allow_i_m, allow_i_t,

sbb

allow_i_b, allow_i_i, allow_i_t,

ngp

allow_i_b, allow_i_i, closed_qamats,

bra

allow_i_c,

wtl, xpkt

allow_i_c, allow_i_l, allow_i_v,

cpr

allow_i_h,

hgh, mss, nsc, nvk, nxx, rby, twy, wpc, zhr

allow_i_h, allow_i_i,

wby

allow_i_h, allow_i_v,

nqm

allow_i_h, prevent_pfx_hi,

rym

allow_i_h, prevent_pfx_hi, allow_i_v,

wal

allow_i_h, qamats_vav_iv,

wlv

allow_i_i,

arc, crh, cwl, gml, lpt, mas, mcc, mvl, mvr, n+r, nbl, npk, nsk, ntq, ntx, nvn, nwc, nzl, prd, qhl, qnh, qvx, rbh, scn, scr, svg, wkk, wrp, xmd, zry

allow_i_i, allow_i_l,

ndd

allow_i_i, allow_i_l, allow_i_t,

lkm

allow_i_i, allow_i_t,

bhl, ctb, dmm, lcd, mkh, mv+, n+w, ngw, ngy, nxr, tmm

allow_i_i, allow_i_t, allow_i_v,

tpw

allow_i_i, allow_i_v,

str

allow_i_i, hatef_qamats_vav,

krh

allow_i_i, i_patah_ii_exc1,

ild

allow_i_i, i_tsere_ii,

idy

allow_i_i, init_i_ii, i_patah_ii_exc1,

ikl

allow_i_l,

wyn

allow_i_m,

qrn

allow_i_sfx,

+pk, amt, aw, axl, b+k, bwl, dwn, kbq, kch, klq, mhr, mll, my+, ncr, qdm, qll, wcl, wdp, wvy, ym

allow_i_sfx, allow_i_b,

xl, yt

allow_i_sfx, allow_i_b, allow_i_c, prevent_pfx_hi,

gdl

allow_i_sfx, allow_i_b, allow_i_i, allow_i_v,

qbx

allow_i_sfx, allow_i_h, allow_i_i, allow_i_v,

nkm

allow_i_sfx, allow_i_i, allow_i_l,

mnh

allow_i_sfx, allow_i_i, prevent_o,

bnh

allow_i_sfx, allow_i_i, remove_h,

spr

allow_i_sfx, allow_i_l,

am

allow_i_sfx, allow_i_v,

rmh

allow_i_sfx, hatef_qamats_vav,

lq+

allow_i_sfx, i_tsere_ii,

klx

allow_i_sfx, prevent_i_h, allow_i_i, allow_i_t,

npl

allow_i_sfx, prevent_pfx_hi, qamats_u,

pqd

allow_i_t,

+ma, mxa, n+p, nvd, qrb, rhb, war

allow_i_t, allow_i_v,

ndr

allow_i_t, allow_i_v, qamats_vav_iv,

alm

allow_i_v,

+hr, +vl, al, awr, kqr, krw, lh+, ngd, nkt, nqp, nvy, nyr, rb, rxh, wbk, wcm, wkr, wlm, wqh, zqq, zvd

allow_i_v, closed_qamats,

kwc

allow_i_v, i_tsere_ii, remove_yod,

qvm

allow_i_v, qamats_u,

wlc

closed_qamats,

anknv, bkn, brc, hdr, krp, mhh, mlc, prx, scc, slk, wcb, wrt, yml

cmn_final_i_exc,

az, dvi, iwi, wi

cmn_final_i_exc, allow_dbl_i, prevent_i, i_tsere_ii, remove_h, qamats_yod_ii,

kih

cmn_masc_pl_exc,

abivn, ivnh, mim

cmn_masc_pl_exc, allow_i_a, allow_i_i, allow_i_t,

wmm

cmn_masc_pl_exc, i_patah_ii,

igy

cmn_sfx_exc,

id, ira, ixr, wna, wph

cmn_sfx_exc, allow_i,

byt

cmn_sfx_exc, create_final_ha, allow_i_i,

qra

create_final_ha,

kgr, qry

create_final_ha, allow_i,

kwb

create_final_ha, allow_i_i,

qrh

create_final_ha, allow_i_i, allow_i_t,

n+y

create_final_ha, allow_i_i, allow_i_t, allow_i_v,

lqk

create_final_ha, allow_i_i, allow_i_t, remove_h,

cvn

create_final_ha, allow_i_sfx,

zcr

create_final_ha, qamats_1_vav,

yzr

create_final_ha, remove_h,

ymd

create_final_ha, restore_a, allow_i_i, allow_i_v,

bva

create_final_ha, vavqamats_vv,

wvq

hatef_patah_vav,

gah, kvl, wkh

hatef_qamats_vav,

kmr, krb, qdqd, sbl, wkd, xri, ymr, ypr

hatef_qamats_vav, qamats_1_vav,

anih

hatef_segol_vav,

avil

i_patah_ii,

bliyl, din, iaw, ixb, yit

i_tsere_ii,

dah, hrg, hrs, iph, iqd, ird, irq, iry, iwb, kmm, kpz, rcc, rdh, rdm, rgn, rkq, ryy, spn, ydi, yip

i_tsere_ii, hatef_qamats_vav,

yni

i_tsere_ii, i_patah_ii, qamats_yod_ii, vavtsere_vv,

iin

i_tsere_ii, prevent_o,

amr, ywh

init_i_ii, i_patah_ii_exc1,

isr

init_i_ii, i_tsere_ii,

iqr

prevent_i,

bin, mmi

prevent_i, prevent_o,

ihvh

prevent_i_h,

ixg, lvi

prevent_o,

acl, aiph, aph, bch, bl, ch, dkh, iawih, la, mwh, nad, ph, pry, raw, wilh, wla, wlmh, wmal, wth, xan, zat, zh

prevent_o, hatef_qamats_vav,

akz

prevent_o, vavqamats_vv,

lvh

prevent_pfx_hi,

ink

prevent_pfx_hi, allow_i_m, allow_i_t, allow_i_v, qamats_1_vav,

knn

prevent_pfx_hi, allow_i_m, i_tsere_ii, remove_yod,

ixa

qamats_1_vav,

ark, avn, bvw, ctl, kq, wr, ybd

qamats_1_vav, closed_qamats,

ahl

qamats_u,

+by, ryl

qamats_vav_iv,

ikd

qub_a_exc,

+la, kla, wvah

qub_a_exc, allow_i_sfx,

cla

qub_a_exc, closed_qamats,

gal

remove_h,

bgd, ryw

remove_h, prevent_o,

pdh

remove_yod,

kbr

restore_a,

nva

restore_a, qub_a_exc, allow_i_h, allow_i_t, allow_i_v,

nwa

sp_exc, allow_dbl_i, allow_i_v,

cny

sp_exc, allow_i_0afx, allow_i_sfx, allow_i_m, allow_i_v, closed_qamats,

dbr

vavqamats_vv,

gvy, xvk, yrh

vavqamats_vv, vavpatah_vv,

rvh

vavqamats_vv, vavtsere_vv, vavpatah_vv,

wlh, ynv

vavtsere_vv,

kgv, mzv, ninvh, qxh

vavtsere_vv, vavpatah_vv,

nvh
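
The listing above alternates a line of rule names with a line of stems. A minimal sketch, assuming that alternation holds throughout and that the header line is left out, could load it into a mapping from each stem to its rules. The function name parse_listing is hypothetical.

```python
def parse_listing(text: str) -> dict[str, set[str]]:
    """Map each stem to its rule names from alternating rule and stem lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    by_stem: dict[str, set[str]] = {}
    # Non-blank lines alternate: a rule combination, then its stems.
    for rule_line, stem_line in zip(lines[0::2], lines[1::2]):
        # Rule lines end with a trailing comma, so drop empty pieces.
        rules = {r.strip() for r in rule_line.split(",") if r.strip()}
        for stem in (s.strip() for s in stem_line.split(",")):
            by_stem.setdefault(stem, set()).update(rules)
    return by_stem


SAMPLE = """
allow_dbl_i,

gvi, irw, mdi

allow_dbl_i, allow_i,

ml+, yl
"""

table = parse_listing(SAMPLE)
print(sorted(table["yl"]))
print(len(table))
```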