Sunday 2 August 2020

Summary of the rules - a point of stability

I am continuing to explain to myself the program I am writing to transform a pointed Hebrew text into one that conforms to standard Hebrew spelling.

Right now I have 143 explicitly named 'rules' governing subsets of Hebrew stems and their various word forms. That means 143 distinct sets of transformations, each applying to its own set of stems.

Do they make sense? 

Here, for example, is the first entry below: gvi, irw, and mdi each allow a double i /ii/ in the calculated SimHebrew word, whatever its origin. That is the only 'rule' applied to these stems. Each of these stems has an /i/ in it, so this collection is not contradictory. Allow double i would be a silly rule for, say, the stem dbr, since dbr by default does not allow i. For dbr to appear with allow double i, there would have to be another rule attached to it that allowed i under some condition.

The next rule collection is allow double i, allow i. The two stems that have this rule are ml+ and yl, neither of which allows i by default, so again the rule is self-consistent.

A little further down I saw rules that I would question for the reason noted above: how can words that don't allow i get to have a double i? Sure enough, when I removed those stems from the rule, it made no difference to the results.
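The consistency check described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual code. The function name dbl_i_is_consistent is hypothetical, and the sketch assumes two things: that a stem allows i by default exactly when the letter i appears in the stem, and that any rule named allow_i or allow_i_something counts as allowing i. The dbr entry here is the made-up 'silly' case from the text, not an entry from the table.

```python
def dbl_i_is_consistent(stem, rules):
    """Return True unless allow_dbl_i is present with no i available to double.

    Assumption: a stem allows i by default only if it contains the letter i.
    Assumption: allow_i and every allow_i_* rule count as allowing i.
    """
    if "allow_dbl_i" not in rules:
        return True
    stem_has_i = "i" in stem
    some_rule_allows_i = any(r.startswith("allow_i") for r in rules)
    return stem_has_i or some_rule_allows_i


examples = {
    "gvi": ["allow_dbl_i"],             # stem contains i
    "ml+": ["allow_dbl_i", "allow_i"],  # no i in the stem, but allow_i supplies it
    "dbr": ["allow_dbl_i"],             # hypothetical: nothing allows i
}

# Stems whose allow_dbl_i rule deserves a second look.
questionable = [s for s, r in examples.items() if not dbl_i_is_consistent(s, r)]
print(questionable)
```

Anything that lands in questionable is a candidate for the same experiment: remove the rule and see whether the results change.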

A little further down, there was a rule that combined allow_i with allow_i_v. Rules that allow i and also allow i for prefix i are fine, since i with prefix i translates into a forced double i. But allowing i with prefix v is a subset of allowing i, so that rule is redundant. Equally, if i is allowed implicitly and not explicitly prevented, then rules that are subsets of allowing i can be removed. I removed about 11 such redundancies.
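The pruning of redundant rules could be automated along these lines. This is a sketch under assumptions, not the method actually used. The function name prune_redundant is hypothetical, and it assumes that every allow_i_something rule other than allow_i_i is a subset of allow_i. The text confirms this only for allow_i_v, and confirms that allow_i_i must be kept because it forces a double i.

```python
def prune_redundant(rules):
    """Drop allow_i_* rules that are covered by a plain allow_i.

    allow_i_i is kept: with prefix i it produces a forced double i,
    so it is not merely a subset of allow_i.
    Assumption: all other allow_i_* rules are subsets of allow_i.
    """
    if "allow_i" not in rules:
        return list(rules)
    return [
        r for r in rules
        if not (r.startswith("allow_i_") and r != "allow_i_i")
    ]


print(prune_redundant(["allow_i", "allow_i_v"]))    # allow_i_v is dropped
print(prune_redundant(["allow_i", "allow_i_i"]))    # both kept
print(prune_redundant(["allow_i_h", "allow_i_v"]))  # no plain allow_i, nothing dropped
```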

Remember Occam's razor.

And so on. What I should do at some point is remove all the specific references to stems as rules from the code and put them into a database, as I have for these now 143 cases. Then I could invent some additional language analysis. When that becomes easier than just adding more rules, I might do it. (Programmers take the route with the fewest obstacles.)
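To picture what rules held as data rather than code might look like, here is a minimal sketch using Python's built-in sqlite3 module. The real database and its schema are not shown in this post, so the table name stem_rule, its columns, and the helper functions rules_for and stems_for are all assumptions made for illustration. The three combinations loaded are taken from the list below.

```python
import sqlite3

# In-memory database with one row per (stem, rule) pair. Assumed layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stem_rule (stem TEXT, rule TEXT, PRIMARY KEY (stem, rule))"
)

# A few rule combinations copied from the list in this post.
combos = [
    (["allow_dbl_i"], ["gvi", "irw", "mdi"]),
    (["allow_dbl_i", "allow_i"], ["ml+", "yl"]),
    (["allow_i", "prevent_o"], ["abd", "azn"]),
]
conn.executemany(
    "INSERT INTO stem_rule VALUES (?, ?)",
    [(stem, rule) for rules, stems in combos for stem in stems for rule in rules],
)


def rules_for(stem):
    """All rules attached to a stem, in alphabetical order."""
    rows = conn.execute(
        "SELECT rule FROM stem_rule WHERE stem = ? ORDER BY rule", (stem,)
    )
    return [row[0] for row in rows]


def stems_for(rule):
    """All stems that carry a given rule, in alphabetical order."""
    rows = conn.execute(
        "SELECT stem FROM stem_rule WHERE rule = ? ORDER BY stem", (rule,)
    )
    return [row[0] for row in rows]


print(rules_for("abd"))
print(stems_for("allow_dbl_i"))
```

With the stems out of the code, questions such as 'which stems carry this rule?' become a query instead of a search through source files.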

A word about o. The default is to allow it, so if it isn't wanted, a rule must specifically prevent it. So there is a rule below entitled prevent_o. It applies to 9 stems in the test data. I have been torn over whether to name every rule. A single rule that has exceptions for every stem presents a problem. Too many rules! So I oscillate between rules I can easily remember and a search for specific combinations of letters and vowels.
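The allow-by-default behaviour of o amounts to an opt-out flag. A minimal sketch, assuming a hypothetical lookup rules_by_stem and a hypothetical function o_allowed (neither is from the actual program); the three stems and their rules are copied from the list below, and qqq is a made-up stem with no rules at all:

```python
# Hypothetical lookup: stem -> rules attached to it (entries from the list below).
rules_by_stem = {
    "abd": ["allow_i", "prevent_o"],
    "zh": ["prevent_o"],
    "rnn": ["allow_i", "qamats_1_vav"],
}


def o_allowed(stem):
    """o is allowed unless the stem explicitly carries prevent_o."""
    return "prevent_o" not in rules_by_stem.get(stem, [])


print(o_allowed("rnn"))  # has rules, none of them prevents o
print(o_allowed("zh"))   # explicitly prevented
print(o_allowed("qqq"))  # made-up stem with no rules falls back to the default
```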

The following has been updated after the inclusion of another 8 chapters. 

Rule combination (each followed, on the next line, by the stems it applies to)

allow_dbl_i,

gvi, irw, mdi

allow_dbl_i, allow_i,

ml+, yl

allow_dbl_i, allow_i, hatef_qamats_vav,

yvp

allow_dbl_i, allow_i_c, qamats_1_vav,

zvh

allow_dbl_i, allow_i_h,

abh

allow_dbl_i, prevent_i, qamats_yod_ii,

hih

allow_dbl_i, prevent_i_h,

ixt

allow_i,

+vb, acr, amx, at, bqw, brk, btq, bwr, bzr, ckd, clh, cn, cnr, csa, csh, ctr, cys, dbh, dlh, dvd, gbr, gll, gn, hnm, k+h, kbl, kc, kdql, kmd, kmw, knm, kth, ktt, kx, lb, lpd, lvn, lyg, m+h, mdd, mgn, mvab, nax, nba, ngr, nkl, npx, nqb, nqh, nsh, nsy, ntc, nym, pnnh, psl, pss, ptk, ptt, pzr, q+r, qnm, qxx, rbb, rmvn, rpa, rxp, rxx, sll, snsn, tcn, tmr, w+k, wn, wqx, wvb, wvh, ww, wyy, xhr, xma, xmk, xmq, xnh, xnr, xph, y+r, yqb, ywr, yxm, yzz, zch, zmh, zmm, zmr

allow_i, allow_i_i,

bxr, glh, qvh, wbr, wck

allow_i, allow_i_i, closed_qamats,

nwq

allow_i, allow_i_i, hatef_qamats_vav,

dmh

allow_i, closed_qamats,

crm, lbb

allow_i, hatef_qamats_vav,

wbl, xpr

allow_i, hatef_qamats_vav, qamats_1_vav,

klh

allow_i, i_tsere_ii,

yxb

allow_i, i_tsere_ii, qamats_1_vav,

amn

allow_i, prevent_i_h,

hlc, hll, rgy, wgh, xmt, yll

allow_i, prevent_i_h, allow_i_i,

pla, wmy

allow_i, prevent_i_h, allow_i_i, qamats_u,

clm

allow_i, prevent_i_h, i_tsere_ii, prevent_o,

rah

allow_i, prevent_o,

abd, azn

allow_i, prevent_pfx_hi,

nxb, pgy

allow_i, qamats_1_vav,

rnn

allow_i, vavqamats_vv,

xvh

allow_i_0afx,

gdy, kzq, lmd, nar, qxr, slh, tqn, wcn

allow_i_0afx, allow_i_c, allow_i_l,

wcr

allow_i_0afx, allow_i_h,

psk

allow_i_0afx, allow_i_h, allow_i_l, allow_i_v,

awh

allow_i_0afx, allow_i_l, allow_i_v,

pnh

allow_i_0afx, allow_i_sfx, allow_i_h, allow_i_i, allow_i_t, allow_i_v, remove_h,

ntn

allow_i_0afx, allow_i_sfx, allow_i_i, allow_i_t,

mla

allow_i_0afx, allow_i_sfx, allow_i_t,

bly

allow_i_0afx, allow_i_sfx, allow_i_v,

ynh

allow_i_0afx, allow_i_sfx, i_tsere_ii, hatef_patah_vav,

avh

allow_i_0afx, allow_i_v,

wlk

allow_i_0afx, allow_i_v, closed_qamats,

rkm

allow_i_0afx, hatef_qamats_vav,

qdw

allow_i_0afx, hatef_segol_vav, vavtsere_vv,

yvh

allow_i_0afx, i_tsere_ii,

blh, yvr

allow_i_0pfx,

bxy, dca, gdp, yqw

allow_i_0pfx, allow_i_b, allow_i_h, allow_i_t,

kll

allow_i_0pfx, allow_i_b, prevent_pfx_hi, allow_i_t,

pll

allow_i_0pfx, allow_i_i, allow_i_t,

wkt

allow_i_a,

drw, wvm

allow_i_a, allow_i_i,

nxl, wp+

allow_i_b,

abib, mn, zq

allow_i_b, allow_i_i,

crt

allow_i_b, allow_i_i, allow_i_m, allow_i_t,

sbb

allow_i_b, allow_i_i, allow_i_t,

ngp

allow_i_b, allow_i_i, closed_qamats,

bra

allow_i_c,

wtl, xpkt

allow_i_c, allow_i_l, allow_i_v,

cpr

allow_i_h,

hgh, mss, nsc, nvk, nxx, rby, twy, wpc, zhr

allow_i_h, allow_i_i,

wby

allow_i_h, allow_i_v,

nqm

allow_i_h, prevent_pfx_hi,

rym

allow_i_h, prevent_pfx_hi, allow_i_v,

wal

allow_i_h, qamats_vav_iv,

wlv

allow_i_i,

arc, crh, cwl, gml, lpt, mas, mcc, mvl, mvr, n+r, nbl, npk, nsk, ntq, ntx, nvn, nwc, nzl, prd, qhl, qnh, qvx, rbh, scn, scr, svg, wkk, wrp, xmd, zry

allow_i_i, allow_i_l,

ndd

allow_i_i, allow_i_l, allow_i_t,

lkm

allow_i_i, allow_i_t,

bhl, ctb, dmm, lcd, mkh, mv+, n+w, ngw, ngy, nxr, tmm

allow_i_i, allow_i_t, allow_i_v,

tpw

allow_i_i, allow_i_v,

str

allow_i_i, hatef_qamats_vav,

krh

allow_i_i, i_patah_ii_exc1,

ild

allow_i_i, i_tsere_ii,

idy

allow_i_i, init_i_ii, i_patah_ii_exc1,

ikl

allow_i_l,

wyn

allow_i_m,

qrn

allow_i_sfx,

+pk, amt, aw, axl, b+k, bwl, dwn, kbq, kch, klq, mhr, mll, my+, ncr, qdm, qll, wcl, wdp, wvy, ym

allow_i_sfx, allow_i_b,

xl, yt

allow_i_sfx, allow_i_b, allow_i_c, prevent_pfx_hi,

gdl

allow_i_sfx, allow_i_b, allow_i_i, allow_i_v,

qbx

allow_i_sfx, allow_i_h, allow_i_i, allow_i_v,

nkm

allow_i_sfx, allow_i_i, allow_i_l,

mnh

allow_i_sfx, allow_i_i, prevent_o,

bnh

allow_i_sfx, allow_i_i, remove_h,

spr

allow_i_sfx, allow_i_l,

am

allow_i_sfx, allow_i_v,

rmh

allow_i_sfx, hatef_qamats_vav,

lq+

allow_i_sfx, i_tsere_ii,

klx

allow_i_sfx, prevent_i_h, allow_i_i, allow_i_t,

npl

allow_i_sfx, prevent_pfx_hi, qamats_u,

pqd

allow_i_t,

+ma, mxa, n+p, nvd, qrb, rhb, war

allow_i_t, allow_i_v,

ndr

allow_i_t, allow_i_v, qamats_vav_iv,

alm

allow_i_v,

+hr, +vl, al, awr, kqr, krw, lh+, ngd, nkt, nqp, nvy, nyr, rb, rxh, wbk, wcm, wkr, wlm, wqh, zqq, zvd

allow_i_v, closed_qamats,

kwc

allow_i_v, i_tsere_ii, remove_yod,

qvm

allow_i_v, qamats_u,

wlc

closed_qamats,

anknv, bkn, brc, hdr, krp, mhh, mlc, prx, scc, slk, wcb, wrt, yml

cmn_final_i_exc,

az, dvi, iwi, wi

cmn_final_i_exc, allow_dbl_i, prevent_i, i_tsere_ii, remove_h, qamats_yod_ii,

kih

cmn_masc_pl_exc,

abivn, ivnh, mim

cmn_masc_pl_exc, allow_i_a, allow_i_i, allow_i_t,

wmm

cmn_masc_pl_exc, i_patah_ii,

igy

cmn_sfx_exc,

id, ira, ixr, wna, wph

cmn_sfx_exc, allow_i,

byt

cmn_sfx_exc, create_final_ha, allow_i_i,

qra

create_final_ha,

kgr, qry

create_final_ha, allow_i,

kwb

create_final_ha, allow_i_i,

qrh

create_final_ha, allow_i_i, allow_i_t,

n+y

create_final_ha, allow_i_i, allow_i_t, allow_i_v,

lqk

create_final_ha, allow_i_i, allow_i_t, remove_h,

cvn

create_final_ha, allow_i_sfx,

zcr

create_final_ha, qamats_1_vav,

yzr

create_final_ha, remove_h,

ymd

create_final_ha, restore_a, allow_i_i, allow_i_v,

bva

create_final_ha, vavqamats_vv,

wvq

hatef_patah_vav,

gah, kvl, wkh

hatef_qamats_vav,

kmr, krb, qdqd, sbl, wkd, xri, ymr, ypr

hatef_qamats_vav, qamats_1_vav,

anih

hatef_segol_vav,

avil

i_patah_ii,

bliyl, din, iaw, ixb, yit

i_tsere_ii,

dah, hrg, hrs, iph, iqd, ird, irq, iry, iwb, kmm, kpz, rcc, rdh, rdm, rgn, rkq, ryy, spn, ydi, yip

i_tsere_ii, hatef_qamats_vav,

yni

i_tsere_ii, i_patah_ii, qamats_yod_ii, vavtsere_vv,

iin

i_tsere_ii, prevent_o,

amr, ywh

init_i_ii, i_patah_ii_exc1,

isr

init_i_ii, i_tsere_ii,

iqr

prevent_i,

bin, mmi

prevent_i, prevent_o,

ihvh

prevent_i_h,

ixg, lvi

prevent_o,

acl, aiph, aph, bch, bl, ch, dkh, iawih, la, mwh, nad, ph, pry, raw, wilh, wla, wlmh, wmal, wth, xan, zat, zh

prevent_o, hatef_qamats_vav,

akz

prevent_o, vavqamats_vv,

lvh

prevent_pfx_hi,

ink

prevent_pfx_hi, allow_i_m, allow_i_t, allow_i_v, qamats_1_vav,

knn

prevent_pfx_hi, allow_i_m, i_tsere_ii, remove_yod,

ixa

qamats_1_vav,

ark, avn, bvw, ctl, kq, wr, ybd

qamats_1_vav, closed_qamats,

ahl

qamats_u,

+by, ryl

qamats_vav_iv,

ikd

qub_a_exc,

+la, kla, wvah

qub_a_exc, allow_i_sfx,

cla

qub_a_exc, closed_qamats,

gal

remove_h,

bgd, ryw

remove_h, prevent_o,

pdh

remove_yod,

kbr

restore_a,

nva

restore_a, qub_a_exc, allow_i_h, allow_i_t, allow_i_v,

nwa

sp_exc, allow_dbl_i, allow_i_v,

cny

sp_exc, allow_i_0afx, allow_i_sfx, allow_i_m, allow_i_v, closed_qamats,

dbr

vavqamats_vv,

gvy, xvk, yrh

vavqamats_vv, vavpatah_vv,

rvh

vavqamats_vv, vavtsere_vv, vavpatah_vv,

wlh, ynv

vavtsere_vv,

kgv, mzv, ninvh, qxh

vavtsere_vv, vavpatah_vv,

nvh

 



