Archives

12.5.16.23:59: ANCESTORS OF THE OFFSPRING OF WATERS

Last night I was frustrated by my inability to identify how Avestan "Napá" in Wikipedia was related to Apąm Napāt. I had never seen á in Avestan romanization before, and I assumed it was a substitute for long ā. It took me over three hours to figure out that it was actually a substitute for 𐬃 å [ɔː] from *-ās, the long counterpart of ō from *-as. I think the rounding reflects an earlier retroflex *-ẓ [ʐ]:

*-ās [aːs] > *-āẓ [aːʐ] > *[aːɰ] > *-āw [aːw] > -å [ɔː]

*-as [ɐs] > *-aẓ [ɐʐ] > *[ɐɰ] > *-aw [ɐw] > -ō [oː]
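The two chains above can be restated as ordered word-final rewrite rules. The following Python sketch only restates this hypothesis. The rule list and the name `derive` are illustrative and are not taken from any grammar. It applies each rule once, in order, to an IPA string and records every stage:

```python
import re

# Ordered word-final rules restating the proposed chain
# *-s > *-ẓ > *-ɰ > *-w > rounded vowel (a hypothesis, not established Avestan phonology).
RULES = [
    (r"s$", "ʐ"),     # final *s becomes retroflex *ẓ [ʐ]
    (r"ʐ$", "ɰ"),     # *ẓ loses its frication: [ɰ]
    (r"ɰ$", "w"),     # the approximant acquires lip rounding: [w]
    (r"aːw$", "ɔː"),  # long *āw monophthongizes to lower å [ɔː]
    (r"ɐw$", "oː"),   # short *aw monophthongizes to higher ō [oː]
]


def derive(form: str) -> list[str]:
    """Apply each rule once, in order, and record every stage that changes."""
    stages = [form]
    for pattern, replacement in RULES:
        new = re.sub(pattern, replacement, stages[-1])
        if new != stages[-1]:
            stages.append(new)
    return stages


for start in ("aːs", "ɐs"):
    print(" > ".join(derive(start)))
```

Running it prints aːs > aːʐ > aːɰ > aːw > ɔː and ɐs > ɐʐ > ɐɰ > ɐw > oː. The height difference between å and ō is encoded only by keeping the two monophthongization rules separate.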

The shift of retroflex [ʐ] to labial w has parallels in the development of secondary labials from earlier Chinese retroflexes: e.g.,

雙 Late Middle Chinese *ʂaŋ > Mandarin shuang [ʂwaŋ] 'double'

書 Early Mandarin *ʂu > Xi'an Mandarin fu 'book'

In this scenario, the different heights of -å [ɔː] and -ō [oː] reflect the different heights of long and short a. I am projecting Sanskrit's lower long ā / higher short a distinction back into Proto-Indo-Iranian. If Avestan short a was still higher than [a], it couldn't be as high as Avestan ə.

The Avestan letter 𐬃 <å> is a blend of 𐬆 <ə> and 𐬁 <ā> (Avestan is written from right to left). The schwa indicates that <å> [ɔː] was higher than <ā> [aː] (and reminds me of how Korean ㅓ <ŏ> has been transcribed as both [ə] and [ɔ]).

*-ās and *-as underwent somewhat different developments in Sanskrit before word-initial voiced segments:

*-ās [aːs] > *-āẓ [aːʐ] > *[aːɰ] > [aː]

*-as [ɐs] > *-aẓ [ɐʐ] > *[ɐɰ] > *-aw [ɐw] > -ō [oː] but -a before vowels other than a-

*-ẓ [ʐ] became -r after Sanskrit non-a vowels and before word-initial voiced segments other than r-*.
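The Sanskrit outcomes above, together with the avoidance of -r r- sequences noted at the end of this entry, can be sketched as a small Python function. This is a simplified sketch under several assumptions. Words are written in IAST with a final -s instead of visarga. Only following voiced segments are handled. The elision of a following a- (written naro 'tra in classical Sanskrit) is ignored. The function name and the example words are illustrative choices, not from the entry.

```python
# Simplified sketch of final -s outcomes before voiced initials (assumptions in the lead-in).
VOICELESS_INITIALS = set("kcṭtpśṣs")  # first letters of voiceless consonants, incl. aspirates
VOWEL_INITIALS = set("aāiīuūṛeo")     # ai and au also start with a
LENGTHENED = {"i": "ī", "u": "ū"}      # compensatory lengthening before r-


def final_s_sandhi(word: str, next_word: str) -> str:
    """Return the form of `word` (ending in -s) before `next_word`."""
    if not word.endswith("s") or next_word[0] in VOICELESS_INITIALS:
        return word  # voiceless contexts are outside this sketch
    stem, vowel = word[:-1], word[-2]
    short_a_next = next_word.startswith("a") and not next_word.startswith(("ai", "au"))
    if vowel == "ā":
        return stem  # -ās > -ā
    if vowel == "a":
        if next_word[0] in VOWEL_INITIALS and not short_a_next:
            return stem  # -as > -a before vowels other than a-
        return stem[:-1] + "o"  # -as > -o before voiced consonants and a-
    if next_word.startswith("r"):
        # -ir r- > -ī r- (lengthening); -īr r- > -ī r- (already long)
        return stem[:-1] + LENGTHENED.get(vowel, vowel)
    return stem + "r"  # -is, -us > -ir, -ur


for pair in [("devās", "gacchanti"), ("naras", "gacchati"), ("naras", "icchati"),
             ("agnis", "gacchati"), ("agnis", "rājā")]:
    print(*pair, "->", final_s_sandhi(*pair))
```

The order of the checks mirrors the entry: the vowel before -s is tested first, and the r- condition only matters for non-a vowels.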

Given all of the above, Avestan Napå in Apąm Napå is from *napās, which I presume is from *napāt-s with the masculine nominative singular suffix *-s.

(5.17.00:31: Its Latin cognate nepos < *nepot-s 'grandson, nephew, descendant' has a similar reduction of final *-t-s.)

Jackson (1892: 49) listed the formula

Orig. t + s = Skt. s (through intermediate ss §§185, 186).

but he gave no examples of final *-t-s and I still can't find the nominative singular napå (though the stem is on p. 58).

I think Avestan zam- 'earth' and ziiam- 'winter' also underwent *-āC-s > *-ās simplification (final forms from Skjærvø 2003: 54 with ii rewritten as y):

*zām-s > *zā-s > zå

(cognate to the second half of Russian Novaya Zemlya)

(Jackson 1892: 93 derived this from *zm̥̄-s with a long syllabic m.)

*zyām-s > *zyā-s > zyå

(I'm surprised by -ām- given its cognates Skt hima and Russian zima.)

Sanskrit has different simplifications:

*napāt-s > napāt 'grandson' (5.17.00:08: all 15 attestations in the Rigveda are listed here)

*kṣām-s > kṣās 'earth'

*praśām-s > *praśān-s > *praśān 'quieting'

(5.17.00:06: This word may not date back to pre-Sanskrit, but I think this change might have occurred to other *-ām-s words.)

Note, however, that not all final clusters were reduced to single consonants in Avestan: e.g.,

*āp-s > āf-š (not *ā-s or -å) 'water'

corresponding to a hypothetical Sanskrit *ap < *ap-s. Not all forms of Skt ap are attested. Whitney (1896: 148) listed only nine out of twenty-four possible forms:


              Singular      Dual          Plural
nominative    unattested?   unattested?   āp-as (later also ap-as)
accusative                                ap-as (later also āp-as)
instrumental  ap-ā                        ad-bhis
dative        unattested?                 ad-bhyas
ablative                                  ad-bhyas
genitive      ap-as                       ap-ām
locative      unattested?                 ap-su
vocative      unattested?

Note the dissimilation of -p to -d before -bh- in the instrumental and dative-ablative. No such dissimilation occurred in the Avestan dative plural aiβ < *ab-bhyas (Jackson 1892: 84). Nor does it occur in other Sanskrit -p stems: e.g.,

dharma-gub-bhyas (not *dharma-gud-bhyas) (dat.-abl. pl. of dharma-gup 'guardian of law')

The varying lengths of the first vowel of Sanskrit āp-/ap- are correlated with accentuation:

accented ā in the stem is long:

ā́pas (nom. pl.)

unaccented a in the stem is short:

apás (acc. pl.)

(5.17.00:18: But one should not conclude that accentuation and length are always correlated: e.g., apás (acc. pl.) and nápāt both have accented short a, even though the latter also has a long ā that one might think would 'attract' the accent.)

Although ap- should normally go back to Proto-Indo-European *H-p- (*H- = uncertain laryngeal), I wonder if it is from an irregular variant *ʕ-p- of the root *ʕ-kʷ- of Latin aqua.

*Sanskrit avoids -r r- sequences: e.g.,

-ir + r- > -ī r- (-r loss with compensatory lengthening)

-īr + r- > -ī r- (-r loss; ī already long so no compensatory lengthening needed)


12.5.15.23:59: NEPHEW NEPTUNE?

Until I found this 2009 article by Gordon Whittaker last night, I had never noticed the similarity between Sanskrit Apām Napāt / Avestan Apąm Napāt 'Offspring of Waters' and Neptune (< Latin Neptūnus). I just found that the connection was first proposed by Georges Dumézil:

Dumézil though remarked words deriving from root *nep- [i.e., cognates of English nephew] are not attested in IE languages other than Vedic and Avestan [but what about Old Irish in the next sentence?]. He proposed an etymology that brings together [Latin] Neptunus with Vedic and Avestan theonyms Apam Napat, Apam Napá [not Napát with -t?*] and Old Irish theonym Nechtan, all meaning descendant of the waters [does Nechtan have that meaning in OI?]. By using the comparative approach the Indo-Iranian, Avestan and Irish figures would show common features with the Roman historicised legends about Neptune. Dumézil thence proposed to derive the nouns from IE root *nepot-, descendant, sister's son**.

Wikipedia lists several other proposed etymologies for Neptūnus.

The sound correspondences seem to work. OI cht is from Proto-Indo-European *pt: e.g.,

secht 'seven' < PIE *septm̥.

However, I don't know if Latin -ūnus and OI -an can be accounted for.

I wouldn't go as far as Whittaker, who also links these words to a name in a completely non-Indo-European language: Sumerian Nudimmud*** and its variants, none of which have anything corresponding to -p-:

Nutemud (oldest attested form?)

Nutememud

Nudamud

Nadimmud ("artificially differentiated" - so is the a artificial?)

And even if the first half of Nudimmud were cognate to Napāt, where would the second half come from?

*5.16.1:12: As far as I know, Napāt shouldn't lose final -t in Avestan. I can't find any reference to such a loss in Jackson or Skjærvø's grammars. Nor would I expect one given that final -t is stable in the Sanskrit -t declension: there is no Sanskrit *Napā.

The Old Persian cognate of Avestan Napāt is napa without -t.

5.16.2:59: Ah, I see: the nominative singular of Avestan Napāt is Napå (Skjærvø 2003: 66):

The stem napāt- has the nom. from an h-stem napah-.

**5.16.1:32: Was *nepōt that specific? Both Beekes (1995) and Watkins (2000) defined it as 'nephew' which is ambiguous.

***5.16.00:55: Wikipedia has a native etymology for Nudimmud: nu 'likeness' + dim mud 'make bear'.


12.5.14.23:59: BUCKNELL'S SANSKRIT MANUAL 1: NEPHEWS OR GRANDSONS?

Today I got Roderick S. Bucknell's (1994) Sanskrit Manual in the mail. Although getting another Sanskrit grammar felt redundant at first, Bucknell's approach has many unique features that set it apart from the competition. I may highlight some of them in future entries.

For now, I only want to mention that the first thing I looked up was his treatment of -ṛ stems like mātṛ 'mother', which I blogged about yesterday. He described the masculine and feminine -ṛ stems on separate pages. I prefer to have both of them on the same page so that they can be learned simultaneously. Moreover, he referred to them by their nominative singulars as -ā nouns, which makes them easy to confuse with true -ā stems and with -an nouns, whose nominative singulars also end in -ā:

A partial comparison of -ṛ, -ā, and -an stems


                                                 mātṛ 'mother' (f.)   kanyā 'girl' (f.)   rājan 'king' (m.)
nominative singular (all share the same ending)  mātā < *-ēr          kanyā               rājā < *-ōn
accusative singular (different stems)            mātar-am             kanyā-m             rājān-am

Bucknell chose mātṛ 'mother' as the model for feminine -ṛ stems like duhitṛ 'daughter'; it is arguably more basic than svasṛ 'sister', which he treated as a variant. Similarly, he chose pitṛ 'father' as the model masculine -ṛ stem kinship term and treated naptṛ (which I glossed last night as 'grandson') as a variant. Questions:

1. Why did he gloss naptṛ as 'nephew' rather than 'grandson'? I have not found any Sanskrit dictionary defining it as 'nephew', though the word is certainly cognate to nephew, nepotism, etc.

2. Did Sanskrit lose the meaning 'nephew'? The word means both 'nephew' and 'grandson' in Old English and Latin, and Watkins (2000: 58) reconstructed its Indo-European source *nepōt with both meanings.

3. Can one generalize and say *nepōt originally meant 'secondary younger male relative' (with sons being primary younger male relatives) which then narrowed in semantic scope, becoming 'grandson' in later Sanskrit and 'nephew' in English?

(5.15.00:59: Wiktionary defined Old English nefa as 'stepson' in addition to 'nephew' and 'grandson'. 'Stepson' also fits my proposed generic meaning 'secondary younger male relative'.)

4. According to Monier-Williams, naptṛ has a variant stem napāt- "only in the strong cases and earlier lang." So is this how the paradigm changed through time?


                                  Stage 1: napāt-   Stage 2: mixed with naptṛ   Stage 3: naptṛ only?
strong case: nominative singular  napāt             napāt                       naptā
weak case: instrumental singular  napāt-ā           naptr-ā                     naptr-ā

(5.15.00:54: The Sanskrit Heritage site gives a full paradigm for napāt without any forms of naptṛ.)

5. Was the later naptṛ - the equivalent of a hypothetical English *nephther - formed by analogy with other kinship terms?

6. 5.15.1:00: Did that analogy occur at the Proto-Indo-Iranian level? Monier-Williams listed both types of forms in Avestan: napāt and naptar. So my stage 1 might have been pre-Proto-Indo-Iranian and both Sanskrit and Avestan might have inherited the stems from stage 2. My inability to find napāt in some dictionaries and grammars leads me to think that naptṛ was the sole survivor in Sanskrit by stage 3.


12.5.13.23:59: MĀTUR DAY

This photo of a "Mather's (sic) Day" cake made me think of Sanskrit mātṛ 'mother', which contains the syllabic ṛ I've been blogging about lately. mātṛ has a unique declension that is like a compromise between those of its fellow feminine svasṛ 'sister' and its opposite-sex counterpart pitṛ 'father':


                                        svasṛ 'sister'   mātṛ 'mother'       pitṛ 'father'
accusative singular                     svasār-am        mātar-am            pitar-am
dual nominative, accusative, vocative   svasār-au        mātar-au            pitar-au
plural nominative, vocative             svasār-as        mātar-as            pitar-as
plural accusative                       svasṝ-s          mātṝ-s > mātar-as   pitṝ-n > pitar-as
genitive singular                       svasur           mātur               pitur

In the first three rows, 'mother' has a short vowel like 'father'. In the plural accusative, 'mother' has -s like 'sister' but in later epic Sanskrit -as like 'father'. Otherwise all three are identical: e.g., the genitive singular.

Why does 'mother' decline like 'father' in the first three instances? At first I thought it was analogy: i.e., that 'mother' used to decline like feminine 'sister' but mostly switched to the 'father' declension since 'mother' and 'father' share the semantic category of 'parents'. However, now I think both 'mother' and 'sister' might be regular feminine -ṛ declension nouns. Judging from Sanskrit alone, the long vowel before r in 'sister' but not in 'mother' could reflect a laryngeal *H in 'sister' absent from 'mother':

*swesoHr- > svasār- (short vowel *o + laryngeal *H = long ā)

*meʕter- > mātar- (short vowel *e + laryngeal = long ā)

Beekes' (1995: 38) Proto-Indo-European *suésōr could be from *swesoHr-.
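The laryngeal account can be checked mechanically with three ordered rules. A short vowel plus a laryngeal becomes ā. The remaining *e and *o become a. *w becomes v. This Python sketch is deliberately minimal and handles only these two stems. The rule set is an assumption made for illustration and ignores other laryngeal reflexes (e.g., laryngeals between consonants):

```python
import re

# Hypothetical ordered rules, sufficient only for the two stems discussed here.
RULES = [
    (r"[eo][Hʕ]", "ā"),  # short vowel + laryngeal > long ā
    (r"[eo]", "a"),      # remaining *e and *o merge as a
    (r"w", "v"),         # *w > v
]


def to_sanskrit(pie: str) -> str:
    """Apply the rules in order to a reconstructed stem."""
    form = pie
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


for pie in ("swesoHr", "meʕter"):
    print(f"*{pie}- > {to_sanskrit(pie)}-")
```

It prints *swesoHr- > svasār- and *meʕter- > mātar-. The rule order matters: if *e and *o became a first, the laryngeal rule would no longer find its input.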

(5.14.1:28: Burrow 1955 regarded the long ā of 'sister' and -tṛ agent nouns as being "introduced from the analogy of the nom. sg." which ends in ā: e.g.,

svasā (nom. sg.) : svasār- (stem of acc. sg., dual. nom./acc./voc., nom. pl.)

Macdonell notes that the non-agent -tṛ noun naptṛ 'grandson' also has ā where svasṛ does: e.g., acc. sg. naptār-am. But the nom. sgs. of 'mother', 'father', and bhrātṛ 'brother' also end in -ā:

mātā (nom. sg.) but mātar-, not *mātār- (stem of acc. sg., dual. nom./acc./voc., nom. pl.)

pitā but pitar-, not *pitār-

bhrātā but bhrātar-, not *bhrātār-

So why were they immune from analogy? Higher frequency than 'sister', etc.?)

The different plural accusative endings (f. -ṝ-s, m. -ṝ-n) are expected since -n is unique to masculines.

-as is a common accusative plural ending for both masculine and feminine nouns, so it is not surprising that it later spread to 'mother' from 'father', making their endings identical.

The -u- in the shared genitive singular - though normal for the -ṛ declension - is unusual because it's not part of the normal gradation of syllabic -ṛ. Where did it come from? Beekes (1995: 176) reconstructed the Proto-Indo-European genitive singular of 'mother' as *mḗ-tr-s (presumably < *meʕtr-s), which I would expect to develop into Sanskrit *mā-tṛ-s, not mātur (< *mātur-s?).

(5.14.1:36: Burrow 1955 derived -ur from *-ṛš; cf. Avestan -ərəš, -arš.)

The -u- in mātur reminds me of the irregular -u- in the weak present stem of class VIII √kṛ 'do': e.g., kur-v-anti 'they do' instead of *kṛ-v-anti (cf. its earlier class V* form kṛ-ṇv-anti 'id.'). However, this kur- is from *kʷr-, with the labial quality of *kʷ having become u. (I thought I came up with this myself today, but now I see that I might have first read about this solution in Burrow 1955.) But the -sur and -tur genitive singulars cannot be from *sʷr or *tʷr, since Proto-Indo-European did not have any labialized alveolars or dentals.

*5.14.1:26: Class VIII verbs have -v- before the third person present indicative ending -anti whereas Class V verbs have -nv- (here, -ṇv- with a retroflex -ṇ- that assimilated to the preceding ṛ).


12.5.12.23:59: ANGLO-HMONG TONOLOGY

I would like to see statistics on Hmong tonal frequency. In the Romanized Popular Alphabet, the mid tone is unmarked, leading me to think that it is the most frequent tone, but I don't know if that is really the case.

Although Hmong and Chinese are unrelated, they do share similar tonal systems (albeit with different values for each tonal category). For a long time I assumed that the mid tone corresponded to the Chinese 'upper level' category which may be the most common Chinese tone. However, according to this table, the mid tone actually corresponds to the Chinese 'upper departing' category ("tone 5").

If White Hmong (WH; Hmoob Dawb) tones originated in the same way as Chinese tones, they should have the following sources:


                             'Level'                 'Rising'              'Departing'                     'Entering'
                             final sonorant          final glottal stop    final fricative                 final stop (non-glottal)
'Upper' (voiceless initial)  tone -b (high)          tone -v (mid rising)  unmarked tone (mid)             tone -s (low)
'Lower' (voiced initial)     tone -j (high falling)  tone -s (low)         tone -g (mid falling, breathy)  tone -m (low creaky) or -d (long low rising)

(5.13.00:11: The 'upper' tones are all nonlow in WH except for tone -s. The 'lower' tones are all falling [i.e., becoming low] and/or low in WH. Hence WH is a mostly voiced-low tonal language.

The table for Green Hmong [Moob Leeg] would be identical except that -g would be in the 'lower rising' as well as 'lower departing' categories. This merger is also common in Chinese.)
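The table above is two-dimensional: initial voicing selects the row, and the lost final selects the column. A small Python lookup makes this explicit. It assumes the White Hmong values in the table and the Green Hmong difference described in the note. The names (`FINAL_TO_CATEGORY`, `expected_tone`, `sources`) are illustrative only. It also shows that tone -s has two hypothetical sources.

```python
# Hypothetical encoding of the table: lost final -> tone category (Chinese-style tonogenesis).
FINAL_TO_CATEGORY = {
    "sonorant": "level",
    "glottal stop": "rising",
    "fricative": "departing",
    "stop": "entering",  # non-glottal
}

# White Hmong RPA tone letters by register of the initial and tone category.
WHITE_HMONG = {
    "upper": {"level": "-b", "rising": "-v", "departing": "unmarked", "entering": "-s"},
    "lower": {"level": "-j", "rising": "-s", "departing": "-g", "entering": "-m or -d"},
}


def expected_tone(voiced_initial: bool, final: str, green_hmong: bool = False) -> str:
    """Tone letter predicted for a syllable, assuming the tonogenesis table."""
    register = "lower" if voiced_initial else "upper"
    category = FINAL_TO_CATEGORY[final]
    if green_hmong and register == "lower" and category == "rising":
        return "-g"  # Green Hmong: lower rising merged with lower departing
    return WHITE_HMONG[register][category]


def sources(tone: str) -> list[tuple[str, str]]:
    """All (register, category) cells that yield a given White Hmong tone letter."""
    return [(reg, cat) for reg, row in WHITE_HMONG.items()
            for cat, letter in row.items() if letter == tone]


print(expected_tone(True, "glottal stop"))                    # White Hmong
print(expected_tone(True, "glottal stop", green_hmong=True))  # Green Hmong
print(sources("-s"))
```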

The breathiness of tone -g may be a trace of a lost final *-h.

In my last entry, I reported that Golston and Yang (2000) found low -s tones in nearly all White Hmong (WH) loans from toneless French. One might hypothesize:

Nearly all loans from nontonal languages have -s tones in WH.

And then one might predict that loans from toneless English should also mostly have -s tones. However, G&Y found that Anglo-Hmong loans have four different tones out of eight (RPA spellings are my guesses):

-m tone: final unstressed syllable

xov fam < sofa

-j tone: stressed syllable with long nucleus (a tense vowel, diphthong, or vowel-[ɹ] sequence)

Khes Maj < Kmart

-v tone: syllable ending in a voiceless consonant (the same environment that conditioned all tones other than -b and -j in Hmong?)

mbav < bus (WH has no initial b-; mb- is the closest substitute)

-s tone: syllable ending in a vowel or nasal (the same environment that conditioned -b, -j tones in Hmong?)

Khes Maj < Kmart

Do WH speakers hear more tones in English than in French?

The creaky -m tone surprised me because English has no phonemic creaky voice. Yet I can easily pronounce sofa with creaky voice in the last syllable.

One might expect Thai speakers to perceive English 'tones' in a similar manner, but they don't. According to Gandour (1979), Anglo-Thai loans are full of high and mid tones - the very tones that are absent from Anglo-Hmong loans! And Anglo-Thai falling and low tones do not correspond to Anglo-Hmong falling and low tones:


falling tone
  Anglo-Thai: final syllables of polysyllabic words ending in sonorants: wii (mid) saa (falling) 'visa'
  Anglo-Hmong: stressed syllables with a long nucleus: Khes Maj 'Kmart'
low tone
  Anglo-Thai: final syllables of polysyllabic words ending in voiceless stops (in Thai): hɔt (high) dɔɔk (low) 'hot dog'
  Anglo-Hmong: syllables ending in sonorants: Khes Maj 'Kmart'

I wish I could find examples of the same English word in both Thai and WH for tonal comparison.

5.13.00:54: According to Nacaskul (1979: 161),

The high tone seems to be the favourite tone for English loanwords [in Thai ...] It is also noticeable that the rising tone hardly ever occurs in English loanwords in Thai [with only three exceptions ending in diphthongs, whereas the Hmong rising tone corresponds to syllables ending in voiceless consonants].

5.13.1:10: Bickner (1986: 35) pointed out

that the route travelled by a particular word [i.e., borrowing through speech or writing] will influence its pronunciation in Thai. For words which entered the language through speech, it is important to consider [tonal] contour shape as well as several seemingly minor details of the phonology of the different Thai tones in order to understand the pattern of tone assignment [e.g., the "glottal constriction ... characteristic of the Thai high tone" that is absent in the Hmong high tone]

The Biblical Hmong words in my last post were probably borrowed through both speech and writing - possibly with a conscious attempt to maintain tonal uniformity - whereas the Anglo-Hmong loans were probably borrowed through speech.

5.13.1:21: Kenstowicz and Suchato (2004: 25) performed an experiment in which native speakers of Thai assigned high or mid tones to nonsense English monosyllables ending in nasals or nasal-stop combinations:


                                 High tone   Mid tone
Syllable ending in nasal         202         842
Syllable ending in nasal + stop  593         451

I wonder if the participants would have chosen other tones if they had been given more options.
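The counts are easier to compare as proportions. The short Python calculation below does nothing but arithmetic on the counts in the table above. It shows that both syllable types received the same number of responses (1044). It also shows that the high-tone share rose from roughly 19% for syllables ending in a nasal to roughly 57% for those ending in nasal + stop.

```python
# Response counts from the table: coda type -> (high-tone responses, mid-tone responses)
COUNTS = {
    "nasal": (202, 842),
    "nasal + stop": (593, 451),
}


def high_share(high: int, mid: int) -> float:
    """Fraction of all responses that were high tone."""
    return high / (high + mid)


for coda, (high, mid) in COUNTS.items():
    print(f"{coda}: {high + mid} responses, {high_share(high, mid):.1%} high tone")
```

This prints 19.3% for syllables ending in a nasal and 56.8% for those ending in nasal + stop.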

Next: The Segments of Anglo-Hmong


12.5.12.3:48: WHAT DO MARY, PETER, AND JOB HAVE IN COMMON?

White Hmong (WH; Hmoob Dawb) has eight tones but also has borrowings from toneless languages. Since all WH syllables must have a tone, foreign words that did not originally have tones acquire them. Golston and Yang's (2000) paper "White Hmong Loanword Phonology" examines tonal assignment in words such as

'Damascus': Fr Damas [dama] > WH Das Mas [da ma] (s = low tone)

'Sinai': Fr Sinaï [sinai] > WH Xis Nais [si nai] (initial [s] is written as WH x; WH s is [ʂ])

'Timothy': Fr Timothée [timɔte] > WH Tis Mos Tes [ti mɔ te]

'David': Fr David [david] > WH Das Vis [da vi]

'Sarah': Fr Sarah [saʁa] > WH Xas Las [sa la]

'Jacob': Fr Jacob [ʒakɔ] > WH Yas Kos [ja kɔ]

(The WH spellings are my guesses based on G&Y's phonetic transcription.)

Although one might think that the silent -s of French Damas was borrowed as WH -s for a low tone, nearly all French syllables were borrowed with -s tones regardless of their original spellings.

At least four exceptions have other tones:

-b (high) tone

'Mary': WH Mab Liab [ma lia] (not *Mas Lias; appears to be from Maria, not Fr Marie; not listed by G&Y as an exception)

'Peter': WH Pob Zeb [pɔ ʒe] (not *Pos Zes; does not appear to be from Pierre; WH z is [ʐ], not [z], just as WH s is [ʂ], not [s])

'Job': WH Yob [jɔ] (not *Yos)

-v (mid rising) tone

'Ruth': WH Luv [lu] (not *Lus; not listed by G&Y as an exception)

G&Y regard Yob as a possible case of spelling-driven tone assignment. Note, however, that Jacob also ends in a -b but was not borrowed as WH *Yas Kob.

'Mary' and 'Peter' may have anomalous tones because they could be loans from a language other than French. I cannot identify a source for 'Peter', as I don't know of any European equivalent of 'Peter' like Po(d)re. I wonder if there are two or more layers of Christian WH vocabulary: a French layer and an even older non-French layer. The Hmong might have heard about Mary (and Peter?) before the whole of the Bible was translated into Hmong.

The -v tone in 'Ruth' has parallels in Hmong borrowings from English that I'll look at next time.


12.5.11.2:16: SEELTERSK: FONETIK, FONOLOGIE, STAVERING

I thank Dwight Decker for introducing me to Sater(land) Frisian (Seeltersk; hereafter SF - initials he would like). Here are a few things about it that caught my eye:

Three degrees of vowel length

Most SF vowels are either short or long, but high vowels may be 'semilong'. Acute accents distinguish long high vowels from semilong high vowels in spelling (stavering).

Short    Semilong   Long
i [ɪ]    ie [iˑ]    íe [iː]
u [ʊ]    uu [uˑ]    úu [uː]
(but uu in uui [uːi]; there is no semilong [uˑi])

I wonder what conditioned the semilong vowels.

The only other language I can think of with three degrees of vowel length is Estonian which has 'overlong' rather than 'semilong' vowels. Unlike SF, Estonian lacks a three-way distinction in its orthography: long and overlong vowels are spelled identically.

Wikipedia lists a few other examples of languages with three degrees of vowel length:

One of the very few languages to have three lengths, independent of vowel quality or syllable structure, is Mixe. An example from Mixe is [poʃ] "guava", [poˑʃ] "spider", [poːʃ] "knot". Similar claims have been made for Yavapai and Wichita.

Could Tangut's rich vowel system have had such a distinction?

The nonhigh SF long vowels are generally written doubled without acute accents. Exceptions are

oa [ɔː] (not oo, which is for [oː]; there is no corresponding short *[o])

öä [œː] (not öö); abbreviated to ö in öi [œːi] since there is no short *[œi]

(2:24: Were these historically opening diphthongs *[oa] and *[œɛ] that monophthongized?)

This SF course implies that üü may also have three lengths; it says that ie, uu, üü (without acute accents!) "are sometimes pronunciated long, sometimes shorter."

The course also mentions that long vowels may be written as single vowels in open syllables (as in Dutch) in "Dr. Fort's spelling".

-u vs. -uw

SF has orthographic syllables ending in both -u and -uw: e.g.,

Dau 'dew'

häuw 'hit, thrust'

What is the phonetic difference between them, if any? w is [w] after /u/, so are they [u] and [uw]? Or is -w required after some diphthongs but not others? Are there minimal pairs of the same vowel or diphthong before zero and -w?

The aforementioned SF course teaches that -w is part of the diphthong: äuw [ew] (Wikipedia: [ɛu]). Is this use of -w arbitrary, or was äuw originally *[ɛ(u)v] or *[ɛ(u)ʋ]?

s vs. z

Why is /s/ spelled both s- and z- in initial position? The SF course states that "Initial s is always sharp like in English sister [s]". Does the z-spelling reflect a lost earlier *[z]? Is the absence of minimal pairs of s- and z-words accidental?


12.5.11.1:12: MORE MŪṢ-TERIES

I forgot to mention mūṣ-tery 7 last night - which is arguably really muṣ-tery 1: why is √muṣ listed as a variant of √maṣ 'hurt' in Monier-Williams? CuC roots do not alternate with CaC roots.

Mūṣ-tery 8: According to Monier-Williams, √maṣ 'hurt' was "prob. invented to serve as the source of the words" with maṣ- (= mash- in MW's romanization) 'powder; ink'. But why doesn't this artificial verb mean 'powderize' rather than 'hurt' (< 'crush' < 'crush into powder'?)? I would expect a more transparent relationship between an artificial verb and the words that inspired it.

Mūṣ-tery 3 revisited: I asked,

Is this verb [√mūṣ- 'steal'] attested, or was the root invented on the basis of the 'stealer' interpretation of mūṣ- 'mouse'?

Turner (entry 10222) derived Hindi मूसना mūs-nā 'to steal' from Skt mūṣ-a-ti 'steals'. (Obviously he meant that the two shared a root √mūṣ, not that Hindi -nā is from Skt -ati.) Since it's highly unlikely that the Hindi verb is based on an artificial root, I assume that √mūṣ was a real root - perhaps a colloquial variant of earlier √muṣ influenced by mūṣ- 'mouse'. How many other marginal Sanskrit words have firm descendants in later Indo-Aryan?


12.5.10.2:27: MŪṢ-TERIES

In "More Ra-ts", I mentioned that the Sanskrit root for 'rat, mouse' is mūṣ-, cognate to English mouse. I first learned a suffixed derivative mūṣ-ika- 'rat, mouse'. The word survives today in Hindi as mūs; other modern descendants are listed in entry 10258 of Turner.

The earliest attestation of the word I can find is in the Rigveda (i, 105, 8)

mūṣ-o na śiśn-ā vy-ad-anti

rat-NOM-PL as tail-INST-SG devour-3PL

Mūṣ-tery 1: What does this mean? Griffith translated this as 'as rats devour the weaver's threads' but I don't see any 'weaver's threads'. I do see the instrumental singular (not plural!) of 'tail'. 'As the rats devour with a tail?' That doesn't make sense.

'Tail' is the only part of the phrase that has no cognate in English.

na, literally/cognate to 'not', came to mean 'although not being' (Monier-Williams).

vy- 'apart' is cognate to vice (that which is apart - separated - from that which is correct?).

ad is cognate to eat; *e became a in Sanskrit.

Mūṣ-tery 2: Are any other forms of mūṣ- attested: e.g., is its nominative singular mūṭ? (-ṣ is not permissible before a pause.)

Mūṣ-tery 3: Monier-Williams glossed mūṣ- as 'stealer, thief'. Is the word cognate to √muṣ 'steal'? That seems unlikely as Sanskrit ū is from a Proto-Indo-European vowel-laryngeal sequence *uH whereas short u is from PIE *u without a laryngeal. My understanding is that laryngeals are integral parts of roots and can't be inserted: e.g.,

*mus > *mu-H-s!?

(2:45: The Dhātupāṭha listed √mūṣ with a long vowel as 'steal'. Is this verb attested, or was the root invented on the basis of the 'stealer' interpretation of mūṣ- 'mouse'?)

Mūṣ-tery 4: √muṣ 'steal' can be conjugated as a member of three different verb classes: e.g., 'steals' could be

I. (earliest attested class). moṣ-a-ti (not muṣ-a-ti which is class VI; see below)

IX. (second oldest but most common?) muṣ-ṇā-ti

VI. (newest) muṣ-a-ti

Class traits: o-grade moṣ- plus -a- in I, -ṇā- in IX, and unchanged muṣ- plus -a- in VI.

-ti '-s' is cognate to archaic English -th.

I always thought of VI as the easiest class: the stem is more stable in the present than in classes I or IX. So I'm not surprised it's the newest. What I don't understand is the function(s?) of the elements between the roots and the endings: -a- in I and VI and -ṇā- in IX.

It's interesting that the verb started out in huge class I, then moved to the smaller classes IX and VI.

(2:36: According to Whitney [1924: 263, 267], there are "less than twenty" class IX verbs "in use through the whole life of the language" as opposed to "over two hundred" class I verbs and roughly fifty class VI verbs during the same period.)

The first class IX verb I learned was √krī 'buy' (e.g., krī-ṇā-ti 'buys') - and to steal is to take without buying.

I hope to return to class IX when/if I write about Korean verbs again.

Mūṣ-tery 5: √muṣ 'steal' has no attested future (not counting grammatical texts). Was it really impossible to say 'will steal' with √muṣ as opposed to its synonym √cur?

Mūṣ-tery 6: Monier-Williams listed a derived noun muṣ 'theft' with the note "MW" where the abbreviation of an attestation is expected. "MW." is not in the printed list of abbreviations but the online edition says it is short for

Monier-Williams' Sanskrit-English Dictionary, 1st edition with marginal notes

Was Monier-Williams citing his own work, or was this abbreviation added by later editors?

Monier-Williams finished the new edition of his dictionary just days before he passed away in 1899.


12.5.9.23:27: MORE RA-TS

My entry on Vietnamese chuột < *juət 'rat' and related words reminded David Boxenhorn of "Chua the rat" from Kipling's The Jungle Book. That name is derived from the common New Indo-Aryan word for 'rat' and/or 'mouse':

cūha 4899 *cūha ʻ rat, mouseʼ.

S. cūho m. ʻ ratʼ; L. cūhā m. ʻratʼ, cūhī f. ʻmouseʼ; P. cūhā m., °hī f. ʻrat, mouseʼ; N. cuhā ʻmouseʼ; B. cuyā ʻrat, mouseʼ; Or. cūā ʻmouseʼ; H. cūhā, cūā m., cūhī, °hiyā f. ʻrat, mouseʼ, G. cuvɔ m.; M. ċuhā, ċuvā m. ʻsharp -- witted personʼ.  - Turner (1962-6: 267; key to abbreviations)

I don't know where this word came from. It has no Sanskrit cognate. The basic Sanskrit root for 'rat' is mūṣ-, cognate to English mouse. Sanskrit lexicons list caṇḍu- 'rat' and cikura-, cikka-, cuñcu-, cucundarī, and chucchundara- 'muskrat' (cf. Kipling's "Chuchundra, the musk-rat" and entry 2661 in Burrow and Emeneau's Dravidian Etymological Dictionary), but none look like good matches for *cūha, as I wouldn't expect ū to correspond to a or i or *h to correspond to k, kk, ñc, or various NT-type clusters*. The only Munda forms I can find (in Ho and Remo) apart from Santali cũnd 'muskrat' (from B&E 2661) don't resemble *cūha either.

Some other c/ts-words for 'rat' in (South)east Asia:

Thai ไจ้ <cai2> cai 'rat (calendrical)'

Lao ໄຈ້ <cai2> cai 'rat (calendrical)'

Old Chinese 子 *tsəʔ 'rat (calendrical)'

Korean 쥐 cwi 'rat'

(I recall that Hashimoto Mantarou proposed that this word was borrowed from Chinese 鼠 'rat' which generally has a fricative initial [e.g., Mandarin shu] but southern languages like Taiwanese tshi still have affricate initials.)

The Thai and Lao words may be borrowed from a southern Old Chinese variant of 子 'rat' with a presyllable (prefix?) that conditioned vowel lowering:

*Cʌ-tsəʔ > *Cʌ-tsʌɰʔ > *tsʌjʔ

Thai and Lao c- is the closest available equivalent of Chinese ts-.

Thai and Lao written tone 2 is from *-ʔ.

5.10.00:13: The Thai/Lao/OC words have nothing to do with the Korean or Indian words: a shared consonant type is not sufficient evidence for a connection.

*I would expect medial *h to be from a voiced aspirate (gh, jh, ḍh, dh, bh) or a fricative (ś, ṣ, s).


12.5.8.23:59: NƆ̣-T RELATED

Yesterday I mentioned Proto-Tai *hnuu 'rat' which vaguely resembles Tangut

3907 2nɔ̣ 'rat' (not the calendrical term which is 3859 1xwɨi)

One might think they are related not only to each other but also to

Proto-Kam-Sui

*kh-noC (Edmondson and Yang 1988 in Schuessler 2007: 471; tone C may be from *-ʔ)

*hnoC (Peiros)

*hnuC (Thurgood)

Proto-Mon-Khmer *kni (Shorto 2006) and its descendants: e.g., Mon ဂၞိ <gni> nɔeˀ (Shorto 1962)

Old Chinese *hnaʔ (Schuessler 2007: 471)

Japanese ne < *na(-)i or *ne (calendrical term; the regular word is nezumi)

However, the only solid set of cognates is probably Proto-Tai and Proto-Kam-Sui. The vowels of the other forms mostly don't match. If one proposed rules to explain the vocalic discrepancy, those rules should also apply to other cognates and/or loans.

The source of Jpn ne is uncertain; it is probably a truncation of the regular word nezumi, since -zumi cannot be interpreted as a suffix or suffix sequence.

Taiwanese tshi for 'rat' may point to Old Chinese *th rather than *hnaʔ.

And Tangut 2nɔ̣ may be from pre-Tangut *SnroH or *SnraŋH; both have a medial *-r- absent in the others and there is no evidence for a Tangut infix *-r-.* (A Tangut *-ŋ : Old Chinese zero correspondence may not be a problem: see Schuessler 1997: 76-77.)

*5.9.1:05: An unknown coronal obstruent *S- conditioned the tense vowel of 2nɔ̣. The subscript dot indicates tenseness.

Pre-Tangut *-r- conditioned the lowering of *o (possibly from *-aŋ) to ɔ.

There are no known alternations between Tangut words reconstructible with and without *-r-. For now I assume that Tangut medial *-r- is a root consonant.

An unknown final glottal *-H conditioned the second (i.e., 'rising') tone of 2nɔ̣.


12.5.7.23:49: HUNTING FOR RATS

Seeing

Vietnamese chuột < *juət 'rat (regular word)'

Thai ชวด <jawaɗa>* chuat < *juat 'rat (calendrical)'

at Andrew West's BabelPad page made me wonder how widespread this word was. It's also in Khmer:

ជូត <juuta> cuut < *juut 'rat (calendrical)'

The regular words for 'rat' outside Vietnamese are

Thai หนู <hnuu> nuu and Lao ໜູ  <hnuu> nuu < Proto-Tai *hnuu*

Khmer កណ្ដុរ <kaṇḍura> kɑndao ~ kɑndol < *kɔnɗur

-o ~ -l < *-r is odd; normally *-r becomes zero: e.g.,

Is Surin Khmer knʌr 'rat' related?

I was surprised to find a chuột-like word for 'rat' in only one other language in the SEAlang Mon-Khmer Languages Project dictionary: Thanh Hoa Muong cuot. Ferlus did not reconstruct it at the Proto-Vietic level: i.e., in the ancestor of Vietnamese and Muong. It's not in Proto-Tai'o-Matic either.

The fact that chuột is the regular word for 'rat' in Vietnamese makes me think that the word was originally Vietnamese and spread to Thai and Lao via Khmer.

But that scenario doesn't explain why Thai chuat and Lao suat don't have -uu- like Khmer cuut; their -ua- matches Vietnamese -uô- [uə].

And why would Vietnamese -uô- [uə] be borrowed as -uu- in Khmer when Khmer had a perfect phonetic match -uə-? I would expect the Khmer word for 'rat' to be a homophone of ជួត <jt> ct 'wrap around the head, wear a turban'.

Is there any other word with a similar distribution (Thai, Lao, Khmer, Vietnamese, Muong, but few/no smaller languages)?

*I keep changing my mind about how to transliterate Thai and Lao. I originally wrote <jwɗ> for both but decided to write the inherent vowels for maximum compatibility with Sanskrit and Pali. However, those vowels are often not meaningful for native words: e.g., native ชวด/ຊວດ <jawaɗa> < *juat 'rat' was never trisyllabic *jawaɗa, though borrowed นคร <nagara> nakhɔɔn 'city' is from trisyllabic Sanskrit/Pali nagara- 'id.'

Final -t in Thai and Lao is written as ด/ດ <ɗ>; cf. the -d of Classical Tibetan corresponding to Old Chinese *-t: e.g., CT brgyad : OC *pret 'eight'.


12.5.6.21:15: POGAN-ISM

Until this morning, I assumed that Białystok was a purely Polish name: 'white slope'. (Is there a slope in Białystok?) But then I saw this etymology in Wikipedia:

The linguist A. P. Nepokupnyj proposes that the language source for Białystok is Yotvingian. Names with the -stok suffix as a second element of a hydronym are localized in the basin of the upper Narew.

I looked up Yotvingian and found this sad story:

Until the 1970s, Yotvingian was chiefly known from toponyms and medieval Russian sources. But in the 1970s a monument with Yotvingian writing was discovered by accident. In Belarussia, a young man named Zinov, an amateur collector, bought a manuscript titled Pogańskie gwary z Narewu ("Pagan speeches of Narew") from a priest. It was written partly in Polish, and partly in an unknown, "pagan" language. Unfortunately, Zinov had an argument with his mother, who burned the priceless manuscript in a rage. However before the manuscript was destroyed, Zinov had made notes of it which he subsequently sent to the renowned Baltist Vladimir Toporov. Even though Zinov's notes were riddled with errors, it has been proven beyond doubt that the notes are indeed a copy of an authentic Yotvingian text. This short Yotvingian–Polish dictionary (of just 215 words), Pogańskie gwary z Narewu, appears to have been written by some Polish priest in order to preach to Yotvingians in their mother tongue.

What if Zinov hadn't taken notes? How much less would be known?

The title of the book puzzles me because Polish for 'pagan' is pogańskie (nominative plural) with o instead of the a of Latin pāgānus. The left side of the Wikipedia page on paganism lists other o-words for 'paganism' besides Polish pogaństwo:

West Slavic:

Czech pohanství (why not -stvo?*)

Slovak pohanstvo

South Slavic:

Croatian poganstvo (but Serbian паганизам without о and with the borrowed suffix -изам from Latin -ismus instead of the native suffix -ство - I find it ironic that Croatian seems to have more Slavic elements despite Croatia's greater affinity with the West)

Slovene poganstvo

Baltic:

Lithuanian pagonybė

Samogitian paguonībė

Uralic:

Hungarian pogányság

All of these words are based on a common Latin prototype pāgān(ism)us without o. Why do the borrowings have o in two different locations? Are there other words with non-Latin o corresponding to Latin a?

*22:49: If I understand pages 50 and 51 of Janda and Townsend's grammar correctly, Czech has a two-way opposition I haven't seen elsewhere in Slavic:

-stvo is for animate(-related?) collectives (hence svinstvo 'filthiness' - a condition reminiscent of svině 'swine'?)

-ství is for other abstractions

The declension of -ství is surprising: all singular cases are -ství except for instrumental singular -stvím, and the plural has less syncretism than the singular: -ství (nom./gen./acc./voc.), -stvím (dat.), -stvích (loc.), -stvími (inst.).
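To make the syncretism comparison concrete, here is a minimal Python sketch that counts distinct endings per number. The endings are only the ones listed above (as I read Janda and Townsend); the case labels and function names are my own, and this is not a model of Czech declension as a whole.

```python
# Czech -ství endings as summarized above; the case labels are my own shorthand.
SINGULAR = {
    "nom": "-ství", "gen": "-ství", "dat": "-ství", "acc": "-ství",
    "voc": "-ství", "loc": "-ství", "inst": "-stvím",
}
PLURAL = {
    "nom": "-ství", "gen": "-ství", "acc": "-ství", "voc": "-ství",
    "dat": "-stvím", "loc": "-stvích", "inst": "-stvími",
}


def distinct_endings(paradigm: dict) -> int:
    """Count distinct endings: the fewer there are, the more syncretism."""
    return len(set(paradigm.values()))


def syncretic_cases(paradigm: dict) -> dict:
    """Group case labels by the ending they share."""
    groups: dict = {}
    for case, ending in paradigm.items():
        groups.setdefault(ending, []).append(case)
    return groups


print(distinct_endings(SINGULAR), distinct_endings(PLURAL))
print(syncretic_cases(PLURAL)["-ství"])
```

The singular collapses six cases into one ending, while the plural keeps four distinct endings, which is the reverse of the usual expectation that the plural is the more syncretic number.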


12.5.5.23:59: SANSKRIT IN TANGUT?

Offhand, there are two Tangut things I'd like to see before I die:

- the missing rising tone volume of the Tangraphic Sea (文海) dictionary

- a corpus, preferably with analysis, of all Tangut transcriptions of Sanskrit

The first would provide definitions and graphic analyses of thousands of tangraphs (Tangut characters). Parts of the rising tone volume can be recovered from other texts, but not the whole thing.

The second would go far beyond what I have now:

- Nishida's (1964) references to rhymes in Tangut transcriptions of Sanskrit: e.g., some unspecified rhyme 1 tangraph was used to transcribe Skt pū. There are four different rhyme 1 tangraphs with p-readings:

I have no idea which one was used. I reconstruct the reading of all four as 1pəu; there is no 1pū in my reconstruction that would be a perfect match for Skt pū.

- Nishida's (1964) references to initials and homophone groups in Tangut transcriptions of Sanskrit: e.g., the Tangut initials in chapter I of Homophones were used to transcribe Skt p-, ph-, bh-, and m- in the Juyongguan inscription. (But what about Skt b-? Was it absent from the inscription? According to Whitney 1924: 26, Skt ph is 15 times less common than Skt b, so I'd be surprised if b- were missing.) He also noted that at least one of the ten characters in dental homophone group 18 (but which ones?) transcribed Sanskrit to.

- Grinstead's chart of Sanskrit transcription tangraphs and his tangraphic transcription index mixing modern Mandarin (! - in lieu of Tangut period northwestern Chinese) and Sanskrit together.

I don't remember ever seeing Sanskrit syllabic ṛ anywhere in Nishida 1964 and I don't see it in Grinstead 1972; the closest Sanskrit entries in the latter are mr and vr, which were both transcribed with

2057 2məəiʳ (= Gong Hwang-cherng's 2meer, Arakawa's 2mywor, and Li Fanwen 1986's 2buạ)

Why would a consonant cluster be transcribed with a long vowel? Why not use a short vowel to keep the break between consonants to a minimum? Are the long vowels in the first two reconstructions and the -yw- in the third dubious? And why transcribe v- with m-, or m- with Li Fanwen 1986's b-? I suspect 2057 was used to transcribe a Tangut-period northwestern Chinese syllable with *mb- that had itself been used to transcribe Skt vr-, rather than to transcribe Skt vr- directly. I wish I could see the word(s) in which Grinstead found 2057 so I could assess the probability of a Chinese intermediary.

If ṛ was ever transcribed in Tangut and if my reconstruction of Tangut were correct (note the subjunctive - I think I'm wrong, though less wrong than others), I would guess that ṛ by itself would be transcribed as rəʳ or rɨəʳ and that it would be transcribed after consonants as rhyme 90 -əʳ or rhyme 92 -ɨəʳ.

5.6.1:32: I went through Nishida 1964 and could not find a single example of Sanskrit syllabic ṛ. The closest thing I could find was Sanskrit m-r (why the hyphen?) on p. 50 which was transcribed with an unspecified rhyme 31 tangraph of his labial homophone group 17 which contains the rhymes

Rhyme Possible tones Nishida 1964 Li Fanwen 1986 Gong Hwang-cherng 1997 Arakawa 1999 This site
28 rising only -ʉɦ -ʉ -I
31 level or rising -i -jɨ -I: -iə
90 level only -ʉr -ur -ər -Ir -əʳ

I would have expected m-r to be transcribed with retroflex rhyme 90 rather than nonretroflex rhyme 31.

Li Fanwen and Gong reconstructed the rhyme 28 and 90 syllables with b- and the rhyme 31 words with m-. If they are correct, this group should be split in two: 17a for b- + rhymes 28/90 and 17b for m- + rhyme 31. Nishida split the group into three:

17a. 2mbʉɦ

17b. 1mbʉr

17c. 1mɨ, 2mɨ

His order reflects the order of the subgroups in Homophones which does not match the rhyme numbering based on the Tangraphic Sea.


12.5.4.23:59: VA-RI-ATION

My last post got me thinking about how Sanskrit is pronounced in languages other than Hindi.

Descriptions of ऋ <ṛ> in Marathi vary:

Burgess (1854: 5): "ri in ribald (nearly,) or ru in French rue": i.e., "nearly" [rɪ] or [ry]

Navalkar (1880: 3): "ri in rid": i.e., [rɪ]

Masica (1991: 115): [rɨ] (if I read him correctly)

Pandharipande (1997: xlvii) and Pandharipande (2003: 701): a "consonant" r̥: i.e., syllabic [r̩]?

Wikipedia (now): [ru]

[ry] is dubious since I don't know of any Indo-Aryan languages with front rounded vowels.

Do the others reflect social, regional, and/or temporal variation?

Thai ฤ <ṛ> is pronounced differently in different words (Gedney 1947: 89; rewritten in my notation):

ฤดู rɨduu ~ raduu 'season' < Skt ṛtu-

I have not seen the ra- pronunciation in other sources. Is it extinct?

ฤๅษี rɨɨsii (with long ɨɨ!) 'hermit' < Skt ṛṣi-

also ฤษี rɨsii with a short vowel in Haas (1956: 68)

หฤทัย harɨthay 'heart' < Skt hṛdaya-

สันสกฤต saŋsakrit 'Sanskrit' < Skt saṃskṛta-

ฤ <ṛ> is also in the non-Sanskrit loanword อังกฤษ aŋkrit 'English', presumably spelled by analogy with saŋsakrit.

I assume Thai got its Indic vocabulary as well as its script through Khmer rather than directly from some Indian language. As far as I know, in modern Khmer, ឫ <ṛ> is only [rɨ]*, so I would expect Thai ฤ <ṛ> to only be [rɨ]. Do the various Thai pronunciations of ฤ <ṛ> reflect borrowing from different strata of Indo-Khmer with different pronunciations of ឫ <ṛ> that are now mostly obsolete? Do modern Khmer dialects have non-[rɨ] pronunciations of ឫ <ṛ>? Is there any evidence for Indic vocabulary coming through non-Khmer sources: e.g., Mon and Burmese? (How was ၒ <ṛ> pronounced in Mon and Burmese?** The character is now obsolete.)

What became of Skt ṛ in Indonesian? Looking at this Wikipedia entry, I see three kinds of correspondences:

Regweda (presumably [rə-]) or Rigweda < Skt Ṛgveda

Smrti < Skt Smṛti 'a category of Hindu scriptures'

But which of these words are old and which are modern adaptations of Sanskrit?

5.5.00:30: Indonesian re is reminiscent of Javanese [rə] for <ṛ>.

5.5.1:01: I doubt Smrti is [smr̩ti], since I've never seen any mention of a syllabic r in descriptions of Indonesian.

5.5.1:54: The only correspondences I found in Doug Cooper's list of Sanskrit loans in Indonesian based on De Casparis (1997) and Mahdi (2000) are

In er : Skt ṛ:

amerta < Skt amṛta- 'nectar'

In ar : Skt ṛ:

kartika < Skt kṛttikā 'Pleiades'

In a : Skt ṛ (doubtful):

swasembada 'self-sufficient' < Skt sva- 'self' + saṃvṛddha 'thriving'

but sembada 'strongly built' by itself is derived from sambaddha 'joined', so swasembada is probably from swa- 'self' + sembada 'strongly built'.

*According to Pinnow (1980: 105), Maspero transcribed ឫ <ṛ> as rŭ with a symbol for a rounded vowel, but I wonder if ŭ is a typo for ư̆ [ɨ].

**5.5.00:43: Back in 1994, I interpreted Burmese <ui> [o] as formerly representing *[ɨ] or *[ə]. Could ၒ <ṛ> have been equivalent to ရို <rui> = *[rɨ] or *[rə]?

Wheatley (1987: 845-846) regarded <ui> as "a Mon invention for representing a mid front rounded vowel and it probably had the same value in Old Burmese". However, I think such vowels are unusual in Southeast Asian languages, so I would prefer to reconstruct an unrounded vowel.


12.5.3.23:59: A RI-L VOWEL IN HINDI?

The Sanskrit syllabic liquids ṛ, ṝ, ḷ, and the theoretical ḹ were traditionally regarded as vowels* and written like vowels**. But as far as I know, no modern Indo-Aryan languages have syllabic liquids anymore; they seem to have already disappeared in Middle Indo-Aryan. Nonetheless modern Indic scripts may still contain 'vowel' symbols for those extinct liquids.

Yesterday I mentioned how Khmer ឮ <ḹ> represented the native word lɨɨ 'to hear'; it never stood for a syllabic liquid in Khmer.

Devanagari has a symbol ऋ <ṛ> for Sanskrit syllabic ṛ, which is pronounced as [rɪ] in Sanskrit loanwords in Hindi. This [rɪ] is the basis for ri for ṛ in lay romanizations of Sanskrit: e.g., Rigveda for Ṛgveda and Amritas for Amṛtas. [rɪ] is a consonant-vowel syllable, not a syllabic liquid.

So I was somewhat surprised to see ऋ <ṛ> [rɪ] listed as a vowel in both Kachru (1993) and Shapiro's (2003) descriptions of Hindi. If I did not know that this syllable arose from a Sanskrit syllabic liquid, I would not understand why it is distinct from रि <ri>, the regular Devanagari spelling of [rɪ].

Shapiro transcribed the consonant of ऋ <ṛ> with an apical tap [ɾ] distinct from his [r] for र <r>. Can ऋ <ṛ> [ɾɪ] and रि <ri> [rɪ] be distinguished in Hindi speech? Does Hindi have a phoneme /ɾ/ that only appears before one vowel in loanwords?

Kellogg's Hindi grammar (1876: 1) also listed ऋ <ṛ> [rɪ] as a vowel, but I expect the partial conflation of script and phonology in earlier works. Kellogg (1876: 9) denied that ऋ <ṛ> [rɪ] and रि <ri> [rɪ] were phonetically distinct.

How should Hindi ऋ <ṛ> [rɪ] be described? As a vowel in accordance with the script and tradition, or as a special spelling of [rɪ] in some (but not all) Sanskrit loans with [rɪ]?

*The Sanskrit syllabic liquids ṛ, ṝ, ḷ have the same distribution as vowels and undergo strengthening processes similar to those of vowels: e.g.,

Basic grade | Guṇa grade | Vṛddhi grade
ṛ | ar | ār
ḷ | al | (theoretically *āl; unattested)
i | e < *ai | ai < *āi
u | o < *au | au < *āu

i and u in the basic grade can be thought of as syllabic variants of y and w.

5.4.00:11: Different inflected and derived forms require different grades in Sanskrit: e.g.,

Basic grade: kṛ-ta- 'done'

Guṇa grade: kar-o-ti 'does'

Vṛddhi grade: kār-a- 'doer'

3:21: Loanwords with all three grades are in Hindi, but I do not know how well Hindi speakers understand this system without learning Sanskrit.
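The grade table can be read as a lookup: find the last basic-grade vowel of a root and swap in its guṇa or vṛddhi replacement. The sketch below does only that, assuming IAST roots with a single gradable vowel; it ignores sandhi, suffixes, and the a-grade, so it yields kar- but not karoti. The function name is mine, and budh 'awaken' (guṇa bodh-, vṛddhi baudh-) is an extra example of my own.

```python
import unicodedata

# Guṇa and vṛddhi replacements for the basic-grade vowels in the table above.
# Vṛddhi of ḷ is unattested, so it is deliberately left out of VRDDHI.
GUNA = {"ṛ": "ar", "ḷ": "al", "i": "e", "u": "o"}
VRDDHI = {"ṛ": "ār", "i": "ai", "u": "au"}
GRADES = {"guna": GUNA, "vrddhi": VRDDHI}


def strengthen(root: str, grade: str) -> str:
    """Replace the last basic-grade vowel of `root` with its guṇa or vṛddhi form."""
    table = GRADES[grade]
    root = unicodedata.normalize("NFC", root)  # treat ṛ as one character
    for pos in range(len(root) - 1, -1, -1):
        vowel = root[pos]
        if vowel in GUNA:  # any basic-grade vowel or syllabic liquid
            if vowel not in table:
                raise ValueError(f"no attested {grade} grade of {vowel}")
            return root[:pos] + table[vowel] + root[pos + 1:]
    raise ValueError(f"no basic-grade vowel in {root!r}")


for root in ["kṛ", "budh"]:
    print(root, strengthen(root, "guna"), strengthen(root, "vrddhi"))
```

Asking for the vṛddhi grade of kḷp raises an error instead of inventing *kālp-, matching the "unattested" cell of the table.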

**In Indic scripts, vowels typically have special forms for word-initial position and are otherwise subordinated to consonant symbols in other positions: e.g., in Bengali,

ঋ <ṛ> (initial position)

কৃ <kṛ> (subordinate position beneath ক <k>)

An exception is Thai which has no special forms for word-initial vowels or postconsonantal vowels:

ฤ <ṛ> (initial position)

กฤ <kṛ> (immediately following ก <k> but not subordinated to it)

Next:  Short e and o in Hindi (delayed but improved)


12.5.2.23:30: VOWELS IN YATES AND WENGER'S BENGÁLÍ* GRAMMAR (1885)

Sorry, no Hindi yet. I'd like to wrap up Bengali first.

19th century public domain grammars at Google Books allow me to easily see how languages (might) have changed over the last century. "Might" in parentheses because I have no idea how literary or obsolete the old descriptions are. For instance, Yates and Wenger (hereafter YW) distinguish between short and long vowels: they wrote the latter with acute accents. (Kellogg 1876 has the same notation for Hindi long vowels. Were acute accents common in colonial period works on Indian languages?) Does this mean that at least one prestige variety of spoken Bengali still had such a distinction, or is it just an artifact of transliteration from a script that retains that distinction? YW's descriptions on p. 3 and 15 indicate that

short a (in their transcription) was [ʌ] "especially before certain compound consonants" and [ɔ] elsewhere; "for the sake of uniformity with the custom in other Indian languages it is written a"

á "is the above letter lengthened" which sounds like [ɔː], but "has the sound of a in father" rules that out, so I assume it was [aː].

<yā> was "like <e>" or "like the first a in the English affable": i.e., [æ].

the <a> of <bya> was "almost like <e>", implying that it was distinct from the <yā> of <byā>, which was "like <e>" without any qualification. Perhaps my reconstruction of long *æ̅ for <yā> and short *æ for <bya> was correct.

short i varied between [ɪ] and [iː] (so was it on its way to merging with long í?)

long í was [iː]

short u was [ʊ] without a long variant (so did short i and long í merge before short u and ú?)

long ú was [uː]

e was short [ɛ], halfway between the [æ] and [e] of modern Bengali. YW compared it to the short vowel of English there [ðɛə] rather than the long diphthong of English Dane [dejn]. I presume <eka> 'one' was [ɛk] rather than modern [æk].

o, despite the absence of an acute accent, was a long [oː]. (I am assuming the comparison with English note [nowt] was not exact.)

ai was [oj] but au was [aw], not [ow]. So much for my assumption of symmetry between the two diphthongs.

short lri**, despite the spelling with lr-, "is like li in little" and is transcribed later in the same line as li, so I think it was [lɪ]. As was the case with a [ɔ], the transcription is not a reliable guide to pronunciation.

long lrí "is the preceding lengthened, lí": i.e., [liː].

Yates and Wenger's Bengálí vowels (in my notation)

Height Short Long Short Long Short Long
High i ī
u ū
Upper mid e (none) (none) ō
Lower mid æ æ̅ (none)
Low
(none) *ā

Length is nonphonemic for e and o, contrary to my reconstruction.

YW transcribed final <ya> on p. 8 as -y, not -e. I reconstruct three stages:

Early 19th century Modern
<-ai> -oj -oj
<-aya> -ɔj e

Next: A Ri-l Vowel in Hindi?

*The acute accents for long vowels in "Bengálí" reflect the long vowels of its Hindi source बंगाली <baṃgālī>. Bengali for 'Bengali' is বাংলা <bāṃlā> Bangla.

**lri is for Sanskrit short syllabic ḷ, which only appears in forms of the root kḷp 'be well ordered'.

lrí is for Sanskrit long syllabic ḹ, which never appears in any real Sanskrit words. It was created "only for the sake of an artificial symmetry" with short syllabic ḷ (Whitney 1896: 11).

As far as I know, the letter for ḹ is only in common use in Khmer. ឮ <ḹ> represents the native word lɨɨ 'to hear'. This is not evidence for a long syllabic l in earlier Khmer; it indicates that Khmer lɨɨ was the closest possible approximation of Sanskrit ḹ.


12.5.1.23:59: RECONSTRUCTING VOWEL LENGTH IN BENGALI

On Monday, David Boxenhorn asked me about vowel frequencies in the modern descendants of Sanskrit. I've been trying in vain to find figures ever since. The closest I've gotten so far is this statement about Bengali that I found today, which has no numbers:

In terms of the frequency of occurrence also /æ/ and /ɔ/ have a lower rate among others, while /o/ and /a/ are the most frequent ones.

Oh wait, I did find Greenberg (2005: 18) yesterday:

Ferguson and Chowdhury report a short count on Bengali vowels in which the ratio of non-nasalized to nasalized vowels was 50:1. I counted the first thousand vowels in Stendhal's Le rouge et le noir and found 82.5% oral vowels to 17.5% nasal.

I wish I could see Ferguson and Chowdhury (1960) for more statistics.

I'm surprised /ɔ/, the default vowel of the Bengali script, isn't the most common vowel, as it corresponds to /a/, the most common Sanskrit vowel. I wonder if /o/ includes [o] written as both <a> and <ō>. Assuming that Bengali orthography is etymological, I suspect that Bengali once had vowel length distinctions in upper mid as well as high vowels before losing vowel length:

Earlier Bengali vowels: short vs. long with gaps in the system

Height Short Long Short Long Short Long
High *i
*u
Upper mid *e *o
Lower mid *æ̅ (none)
Low
(none) *ā

Modern Bengali vowels: no length

Height Front Central Back
High i
u
Upper mid e o
Lower mid æ ɔ
Low
a

Correspondences between Bengali spelling and earlier and modern Bengali vowels?

Bengali spelling Earlier Bengali? Modern Bengali
<a> short and short *o [ɔ] and [o]
<o> long [o]
<ai> *oy with short *o [oy]
<au> *ow with short *o [ow]
<ya> in word-final position short *e [e]
<e> long > short *e [æ] and [e]
<ya> after <b> short
<yā> long *æ̅ [æ]
<i> short *i [i]
<ī> long
<u> short *u [u]
<ū> long

This post and the previous one draw heavily upon Dasgupta (2003) and also rely on Bagchi (1996) and Klaiman (1993) to a lesser extent. I don't actually know any Bengali, so I've probably made a lot of mistakes.

I don't see any reason to posit a long or a new short *a (the original became short *ɔ).

I assume short *o developed as a variant of *ɔ. (I won't go into the environments where *ɔ raised to *o.) Short and long *o merged in speech as *o but remain distinct in writing.

Short *e had a very restricted distribution, so the functional load of the short-long *e distinction was low. Some long *ē lowered to long *æ̅ in certain environments (see below), but others merged with short *e.

The different treatment of <bya> and <byā> in modern Bengali leads me to believe that monophthongization might have preceded the loss of length. If *bya and *byā merged into *bya and then monophthongized to *bæ, then it would be impossible to explain why *bæ from *bya could raise to upper mid *be before high vowels unlike *bæ from *byā. I prefer to keep the two syllable types distinct even after monophthongization as *bæ and *bæ̅. The former could raise to upper mid be to harmonize with following high vowels, whereas the latter remained mid-low even after length was lost.

Short *bæ might have been higher than long *bæ̅, just as short *ɔ and its Sanskrit source a were higher than long *ā. A higher *æ would be more prone to merger with *e (already shortened?) than lower *æ̅. The height of the vowel resulting from the merger of short *æ and *e is determined by the height of the following vowel: [æ] before nonhigh vowels and [e] before high vowels.

What is so special about *by as opposed to other *Cy-clusters?

<bya> [bæ]

but <Cya> [Cɔ] if C ≠ <b>

Perhaps it is relevant that *by- comes from *vy-, and that *v- and *y- are the consonant counterparts of the high vowels u and i that condition vocalic phenomena. *by- is the only cluster that originated from a glide sequence.
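The two-way treatment of <bya> and <byā> proposed above can be stated as a tiny rule that depends on which spelling (and hence which earlier length) a syllable continues. This is a sketch of my reconstruction, not of Bengali phonology as a whole: it only covers <bya> and <byā>, treats i and u as the high vowels, and the function name is hypothetical.

```python
HIGH_VOWELS = {"i", "u"}


def realize(spelling: str, next_vowel: str) -> str:
    """Modern vowel of <bya>/<byā> under the hypothesis above (not a full rule of Bengali).

    <bya> is assumed to continue short *bæ, which raises to [be] before a high vowel.
    <byā> is assumed to continue long *bæ̅, which stays [bæ] everywhere.
    """
    if spelling == "byā":
        return "bæ"
    if spelling == "bya":
        return "be" if next_vowel in HIGH_VOWELS else "bæ"
    raise ValueError(f"not a <bya>/<byā> spelling: {spelling!r}")


# If length had been lost before monophthongization, both spellings would
# reach this rule as the same input, and it could not tell them apart.
for spelling in ("bya", "byā"):
    print(spelling, realize(spelling, "i"), realize(spelling, "a"))
```

The point of the sketch is the input: the rule only works if the short/long difference survives until the harmony applies.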

Next:  Short e and o in Hindi


12.4.30.23:59: THE A-E-O CYCLE

Sanskrit a is almost as common as all the other vowels combined in that language. Last night, I was thinking that post-Sanskrit vowel systems might be headed toward equilibrium: fewer a and more non-a vowels. However, tonight I realize that the origins of Sanskrit a indicate that equilibrium is not a universal destination. The merger of Proto-Indo-European *e, *o, *n̥, *m̥ into Skt a (and of long PIE *ē and *ō into Skt ā) went in the opposite direction, flooding the system with a-vowels. (But note that short *a-diphthongs did lose their a-quality.)

Late Proto-Indo-European | Sanskrit | Pali*
*a, *e, *o, *n̥, *m̥ | a | a
*ā, *ē, *ō | ā | ā
*āi, *ēi, *ōi | ai | e, i
*āu, *ēu, *ōu | au | o, u
*ai, *ei, *oi | ē (no short e in Sanskrit**) | e
*au, *eu, *ou | ō (no short o in Sanskrit**) | o
*i | i | i
*ī | ī | ī
*u | u | u
*ū | ū | ū
*r̥, *l̥ | ṛ, ḷ | a, i, u

But judging from spelling, Bengali might have gone full circle by developing new e and o-like vowels from Sanskrit a-vowels:

Bengali spelling Bengali phonetics
<ya> in word-final position [e]
<bya> [bæ]; [be] before a high vowel
<yā> [æ]
<a>, <ya> before any consonant other than <b> [ɔ] and [o]
<ai> [oj]
<au> [ow]

5.1.2:17: These new e and o-like vowels do not necessarily correspond to Proto-Indo-European *e and *o: e.g.,

'nine': Bengali <naya> nɔe < Skt nava < PIE *newn

'hundred': Bengali <śata> ʃɔt < Skt śatam < PIE *km̥tom

'mind': Bengali <mana> mɔn < Skt man-as < PIE *men-

*5.1.1:12: Pali /e/ and /o/ are mostly long [eː] and [oː] but have short allophones before geminate consonants.

**5.1.2:19: Sanskrit ē and ō are often written as e and o since they are always long.


12.4.29.23:59: PHONOSTATISTICS AND GRAPHOSTATISTICS

I never thought that frequencies of consonants and vowels could be counted until I encountered a chart of such frequencies in Whitney's Sanskrit Grammar almost twenty years ago.

Looking at conventional charts of Sanskrit vowels

Short vowels

i
u

a

Long vowels and diphthongs (e and o are always long, so they are not written with a macron)

ī
ū
e
o
ai
au

ā

one might think that the vowels are all roughly equally common, but that is not the case: the most common vowel (a) is

one out of every five segments in Sanskrit

1.07 times more common than the next four most common vowels combined (ā, i, e, u)

2.4 times more common than its long counterpart ā (which is 1.07 times more common than the next two most common vowels i and e combined)

27 times more common than the least common vowel ū or the most common syllabic consonant ṛ (a consonant appearing in the same positions as vowels)

110 times more common than the least common diphthong au (āu in Whitney's notation)

1978 times more common than the least common syllabic consonants ṝ and ḷ

(4.30.3:09: Here's a graph showing that nearly half the vowels in Sanskrit are a:

For a graph in a different format, see "Visualizing Sanskrit Vowel Frequency".)

The development of a system so heavily skewed toward a would be different from that of a system with no a at all like Beekes' Proto-Indo-European reconstruction. By observing trends in languages with different phonostatistical patterns, one might be able to make predictions about later changes or explain known (and sometimes baffling) changes.

I am not surprised that Beekes' PIE developed an a and its descendants all have a, because I can't remember ever hearing of a language without a (not counting claims of phonemically - but not phonetically! - 'vowelless' languages).

Conversely, it just occurred to me that the monophthongization of the diphthongs ai and au to e and o in Pali reduced the frequency of a (the first half of those diphthongs) - a step toward equilibrium? But then again, ṛ sometimes became a in Pali, increasing the frequency of a.

Years later, I wrote a PhD dissertation based on what I'd now call graphostatistics (only 20 hits in Google so far - this will be number 21!). I looked at spelling frequencies in Old Japanese texts to determine the most likely pronunciations of OJ consonants and vowels.

Whitney compiled his figures by hand in the 19th century, whereas I used a computer at the end of the 20th century. As more texts are digitized at Google Books and elsewhere, the opportunities for phonostatistical and graphostatistical studies are on the rise. But how many are taking advantage?
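For anyone tempted to try, a first pass at graphostatistics needs little more than a tokenizer. The Python sketch below counts vowel letters in IAST-romanized text by greedy longest match, so ai and au are not split into a + i and a + u. It is a toy under stated assumptions: it counts letters rather than sounds, ignores hiatus marked with a diaeresis, and would wrongly treat the Vedic consonant ḷ as a syllabic liquid. The function names are mine.

```python
import unicodedata
from collections import Counter

# IAST vowels and syllabic liquids, NFC-normalized and sorted longest first so
# that the diphthongs "ai" and "au" are matched before "a".
VOWELS = sorted(
    (unicodedata.normalize("NFC", v)
     for v in ["a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "ḹ", "e", "ai", "o", "au"]),
    key=len,
    reverse=True,
)


def vowel_counts(text: str) -> Counter:
    """Count vowel tokens in IAST-romanized text by greedy longest match."""
    text = unicodedata.normalize("NFC", text).lower()
    counts: Counter = Counter()
    pos = 0
    while pos < len(text):
        for v in VOWELS:
            if text.startswith(v, pos):
                counts[v] += 1
                pos += len(v)
                break
        else:  # consonant, space, or punctuation: skip one character
            pos += 1
    return counts


def share(counts: Counter, vowel: str) -> float:
    """Fraction of all counted vowel tokens that are `vowel`."""
    total = sum(counts.values())
    return counts[unicodedata.normalize("NFC", vowel)] / total if total else 0.0


sample = "kṛtaṃ karoti"  # a toy string, not a real text sample
counts = vowel_counts(sample)
print(counts.most_common())
print(f"share of a: {share(counts, 'a'):.0%}")
```

Normalizing to NFC first matters because ṛ and ṝ can be typed either as single characters or as r plus combining marks, and the two would otherwise be counted as different letters.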


12.4.28.23:48: MUL-LING OVER KOREAN L-STEMS

Last night I could have mentioned three more verbs that could be confused with the mut-verbs:

물-다  mul-ta 'bite'

물-다  mul-ta 'pay'

물-다  mūl-ta 'go bad, spoil, turn sour'

(The hangul spelling of all three verbs is identical despite differences in vowel length. Not all speakers distinguish long and short vowels.)

Vowel length aside, they have the same 'infinitive'* as 묻-다  mut-ta 'ask' but not 묻-다  mut-ta 'bury':

물-어  mur-ŏ 'bite/pay/spoil/ask'

물-어  mūr-ŏ 'spoil'

묻-어  mud-ŏ 'bury'

What would be a good test of confusion between the two? I Googled

사체(死體)-를 물-어

sachhe-rŭl mur-ŏ

'bury a corpse'

with the wrong infinitive and got the legitimate

사체(死體)-를 물-어-뜯-

sachhe-rŭl mur-ŏ-ttŭt-

'bite a corpse' (lit. '... bite-INF-tear apart')

since Google does not take Korean word spacing into account.

I had a hard time with Korean verbs with l/r as a student. Apparently even native Koreans do. Today I found this passage in Martin (1992: 238; I have changed his romanization to match mine and added emphasis and hyphens):

The l-doubling vowel stem has a shape which ends in vowel + rŭ-. When the infinitive (-ŏ/-a) or the past tense (-ŏss-/-ass-) is attached, the vowel ŭ drops, as expected [the vowel sequence ŭ-ŏ is absent from Korean verb conjugation], and the remaining l geminates [= doubles] - as not expected:

purŭ- > pull-ŏ 'calls'

morŭ- > mōll-a 'does not know' (the long ō in the infinitive [of 'does not know'] and forms derived from it is an irregularity).

Many Koreans regularize these verbs by doubling the l everywhere; they treat the stems as pullŭ-, mollŭ-, etc.

This error seems much more common than confusing the conjugation of 'bury' and 'ask':

Verb | Correct | Incorrect
부르-다 purŭ-da 'call' | 부른다 purŭ-nda 'calls' (16,100,000 results) | 불른다 pullŭ-nda 'calls' (91,400 results)
모르-다 morŭ-da 'not know' | 모른다 morŭ-nda 'does not know' (32,300,000 results) | 몰른다 mollŭ-nda 'does not know' (43,400 results)
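Raw hit counts are hard to compare across pairs, so a quick normalization helps: express the incorrect form as a share of all hits for the pair. The sketch below uses only the Google estimates quoted in this post and in last night's post; such estimates are notoriously unstable, so the percentages are rough at best. With these numbers, pullŭ-nda comes out at about 0.56% and mollŭ-nda at about 0.13%, while the 'bury'/'ask' confusions are effectively zero. The function name is mine.

```python
def error_share(correct: int, incorrect: int) -> float:
    """Incorrect hits as a fraction of all hits for a correct/incorrect pair."""
    total = correct + incorrect
    return incorrect / total if total else 0.0


# Google hit counts quoted in these posts: rough estimates, not corpus counts.
PAIRS = {
    "purŭ-nda vs. pullŭ-nda 'calls'": (16_100_000, 91_400),
    "morŭ-nda vs. mollŭ-nda 'does not know'": (32_300_000, 43_400),
    "mud-ŏtta vs. *mur-ŏtta 'buried (a corpse)'": (27_700, 0),
    "mur-ŏtta vs. *mud-ŏtta 'asked (a question)'": (2_600_000, 1),
}

for label, (correct, incorrect) in PAIRS.items():
    print(f"{label}: {error_share(correct, incorrect):.4%}")
```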

-ㄴ다 -nda is a present ending for vowel-final stems. The equivalent ending for consonant-final stems is -는다 -nŭnda which was in two examples from last night:

묻-는다  mun-nŭnda 'buries' (present; -t > -n before -n-)

묻-는다  mun-nŭnda 'asks' (present; -t > -n before -n-)

Going back to where I started from, -l stems like 물-  mul- lose their -l before certain endings and become vowel-final stems: e.g.,

문다 mu-nda (not *물-는다 *mul-lŭnda** with -l) 'bites'

문다 mu-nda (not *물-는다 *mul-lŭnda with -l) 'pays'

문다 mū-nda (not *물-는다 *mūl-lŭnda with -l) 'spoils'

I was surprised to see 4,510 Google results for the incorrect form *물는다 *mul-lŭnda 'bites' (and perhaps also 'pays'; 'spoils' is a stative verb that is not followed by the processive verb ending -lŭnda < -nŭnda).
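The present-tense rules described above fit in a few branches. A minimal sketch, assuming the romanization used here and covering only the stem types discussed (vowel-final, -l, and -t); it does not model verb classes or the nasal assimilation of other consonant-final stems, and the function names are mine. The second function models the nonstandard *mul-lŭnda type: keep the -l, add -nŭnda, then apply n > l after l.

```python
VOWELS = set("aeiouāēīōūŏŭ")


def present(stem: str) -> str:
    """Standard processive present, following the rules described above."""
    last = stem[-1]
    if last in VOWELS:  # vowel-final stems take -nda
        return f"{stem}-nda"
    if last == "l":  # -l stems drop the -l and take the vowel-stem ending
        return f"{stem[:-1]}-nda"
    if last == "t":  # -t > -n before -n- (mut- > mun-nŭnda)
        return f"{stem[:-1]}n-nŭnda"
    return f"{stem}-nŭnda"  # other consonant stems: assimilation not modeled


def overregularized_present(stem: str) -> str:
    """The nonstandard type: keep the -l, add -nŭnda, then n > l after l."""
    return f"{stem}-nŭnda".replace("l-n", "l-l")


for stem in ("mul", "mūl", "purŭ", "mut"):
    print(stem, present(stem))
print(overregularized_present("mul"))
```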

*The Korean 'infinitive' is not like a European infinitive. It can even be the sole finite verb in a 半말 panmal 'half-speech' style sentence (23:57: hence Martin's translations with '-s': 'calls'). So I am not comfortable with the term even though some linguists use it. I would rather not argue that Korean verbs have an infinitive and a homophonous finite form.

**Korean n- becomes l- after -l, though this is not reflected in hangul spelling (in < >):

물-는다 <mur-nŭnta> mul-nŭnda > mul-lŭnda

There is only one hangul letter ㄹ <r> for [r] and [l].


12.4.27.23:59: BURYING QUESTIONS, ASKING CORPSES

Two Korean verbs have identical forms in dictionaries:

묻-다  mut-ta 'bury'

묻-다  mut-ta 'ask'

They remain identical as long as consonant-initial suffixes are added: e.g.,

묻-고 mut-ko 'bury, and ...'

묻-고 mut-ko 'ask, and ...'

묻-는다  mun-nŭnda 'bury' (present; -t > -n before -n-)

묻-는다  mun-nŭnda 'ask' (present; -t > -n before -n-)

However, the -t of 'ask' (but not 'bury') becomes -r before vowel-initial suffixes: e.g.,

묻-었다 mud-ŏtta 'buried' (-t- > -d- between vowels)

물-었다 mur-ŏtta 'asked'

묻-으면 mud-ŭmyŏn 'if ... bury'

물-으면 mur-ŭmyŏn 'if ... ask'

(All vowel-initial suffixes are written with the null consonant letter ㅇ, which is shaped like a zero.)
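The difference between the two mut- verbs comes down to one branch: what happens to stem-final -t before a vowel. A minimal sketch, assuming the romanization used in this post (where the automatic t > d between vowels is written out) and only the suffix types shown above; the function name and the t_irregular flag are hypothetical labels of mine.

```python
VOWELS = set("aeiouŏŭ")


def attach(stem: str, suffix: str, t_irregular: bool) -> str:
    """Attach a suffix to a -t stem in the romanization used in this post.

    t_irregular=True models 'ask' (mut- ~ mur-); False models 'bury' (mut-).
    """
    if not stem.endswith("t"):
        raise ValueError("expected a -t stem")
    base = stem[:-1]
    if suffix[0] in VOWELS:
        # 'ask'-type stems switch to -r; 'bury'-type stems keep -t,
        # romanized -d because t > d between vowels is automatic.
        return f"{base}{'r' if t_irregular else 'd'}-{suffix}"
    if suffix[0] == "n":
        return f"{base}n-{suffix}"  # -t > -n before -n-
    return f"{stem}-{suffix}"


for suffix in ("ko", "nŭnda", "ŏtta", "ŭmyŏn"):
    print(attach("mut", suffix, False), attach("mut", suffix, True))
```

Only the vowel-initial suffixes produce different outputs for the two verbs, which is exactly the overlap that makes confusion conceivable.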

Given the substantial overlap between the paradigms of 'bury' and 'ask', I wondered if Koreans mix them up. I Googled the following phrases with correct and incorrect conjugation:

Verb | Correct | Incorrect
'bury' | 사체(死體)-를 묻-었다 sachhe-rŭl mud-ŏtta 'buried a corpse' (27,700 Google results) | *사체(死體)-를 물-었다 *sachhe-rŭl mur-ŏtta 'asked a corpse' (0 Google results)
'ask' | 질문(質問)-을 물-었다 chilmun-ŭl mur-ŏtta 'asked a question' (2.6 million Google results) | *질문(質問)-을 묻-었다 *chilmun-ŭl mud-ŏtta 'buried a question' (1 Google result)

Given the extreme rarity of errors, Korean learners have no excuse not to conjugate these two verbs correctly.

However, I was surprised to see 31,300 results for *듣-었다 *tŭd-ŏtta, an error for 들-었다 tŭr-ŏtta 'heard', the past form of 듣-다 tŭt-ta 'hear', which conjugates like 묻-다 mut-ta 'ask'. Why is the wrong form of 'heard' much more common than the wrong form of 'asked'? (I presume 'hear' and 'ask' have similar frequencies as common verbs of speech; one hears and asks more than one buries, so I would expect 'bury' to have a lower frequency.) There is no homophonous verb *듣-다 *tŭt-ta with -d before vowel-initial suffixes, so *듣-었다 *tŭd-ŏtta 'heard' must be by analogy to 듣-다 tŭt-ta and other -t-final forms of 'hear'. (The shift of t to d between vowels in *tŭd-ŏtta is automatic.)

In theory, Koreans could reanalyze -t stems as -n stems due to forms like 묻-는다  mun-nŭnda, but I can't find any examples of

*사체(死體)-를 문-었다 *sachhe-rŭl mun-ŏtta 'buried a corpse'

*질문(質問)-을 문-었다 *chilmun-ŭl mun-ŏtta 'asked a question'

with -n instead of the correct stem-final consonants. Perhaps this is because Korean verbs with nasal-final stems are rare*; there are not enough of them to serve as analogical models for conjugating the far more common verbs with -t stems.

*4.28.1:12: If Alexander Vovin (2010) is correct, this rarity is the result of a sound change:

- Old *-nt stems became -t stems:

*munt- > mut- 'bury'

There are no longer any -nt stems in Middle Korean or modern Korean.

- Old *-t stems became -r stems before vowels:

*mut- > mur- 'ask' (but still mut- before consonants)

- Old *-n stems almost always became vowel-final stems:

*on- > o- 'come'

but -n- remains in the imperative on-ŏra! 'come!'

whereas all other imperatives of modern vowel-final stems lack -n-: e.g.,

po-ara! 'see!', not *pon-ara! 'see!'

Exception 1: an- 'embrace' might be an archaism that retained its -n.

Exception 2: shin- 'put on one's feet' might have retained -n to continue to resemble its source, the noun shin 'shoe'.

These are the only two 'pure' modern -n stems. The others listed in Martin (1992: 364) are abbreviations of stems ending in vowels or clusters:

non- ~ nonŭ- 'distribute'

mun- ~ munŭ- 'demolish'

kkŭn- ~ kkŭnh- 'cut'
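Vovin's three changes can be sketched as ordered checks on a stem's final consonant(s). This is only a toy model: stems are romanized without the asterisk, and the exceptions (on-ŏra!, an-, shin-) and automatic t > d voicing are ignored. The *-nt check comes first because, presumably, *-t > -r had already applied before *-nt > -t; otherwise the new mut- 'bury' would also have become mur- before vowels. The function name modern_stem is hypothetical.

```python
def modern_stem(old_stem: str, before_vowel: bool) -> str:
    """Toy model of the stem-final changes attributed to Vovin (2010).

    old_stem is romanized without the asterisk, e.g. 'munt', 'mut', 'on'.
    Exceptions (imperative on-ŏra!, an- 'embrace', shin- 'put on one's feet')
    and automatic voicing of t to d are not modeled.
    """
    if old_stem.endswith("nt"):
        # *-nt > -t; checked first so that 'bury' never reaches the -r rule
        return old_stem[:-2] + "t"
    if old_stem.endswith("t"):
        # *-t > -r before vowels, -t elsewhere
        return old_stem[:-1] + ("r" if before_vowel else "t")
    if old_stem.endswith("n"):
        # *-n > zero: the stem becomes vowel-final
        return old_stem[:-1]
    return old_stem


for old, gloss in (("munt", "bury"), ("mut", "ask"), ("on", "come")):
    print(gloss, modern_stem(old, before_vowel=False), modern_stem(old, before_vowel=True))
# bury mut mut
# ask mut mur
# come o o
```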


12.4.26.23:59: PROTHESIS IN PRONOUNS: SWEDISH AND SLAVIC

Here's yet another Wikipedia passage that had me thinking, Is this the real deal?

[Swedish] ni ['you' (pl.)] is derived from an older pronoun I*, "ye", for which verbs were always conjugated with the ending -en. I became ni when this conjugation was dropped; thus the n was moved from the end of the verb to the beginning of the pronoun.

At first I thought, no way, Swedish verbs follow their subjects, so how could the -en be the source of an n- on the preceding pronoun?

I VERB-en > n-i VERB-?

But Swedish also has a verb-subject construction in which the -en would directly precede I:

VERB-en I > VERB n-i

Was this second order extremely common in older Swedish?

This change has a parallel in Old Norse.
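The point about word order can be made concrete with a toy model that moves the n of -en only onto an immediately following I. VERB and I are placeholders, not real Old Swedish forms, and the function name reanalyze is hypothetical.

```python
def reanalyze(tokens: list[str]) -> list[str]:
    """Toy model: move the n of a verb ending -en onto an immediately
    following pronoun I (VERB-en I > VERB n-i). Placeholder tokens only."""
    out = list(tokens)
    for i in range(len(out) - 1):
        if out[i].endswith("-en") and out[i + 1] == "I":
            out[i] = out[i][: -len("-en")]  # verb loses its ending
            out[i + 1] = "n-i"              # pronoun gains n-
    return out


print(reanalyze(["VERB-en", "I"]))  # verb-subject order: ['VERB', 'n-i']
print(reanalyze(["I", "VERB-en"]))  # subject-verb order: ['I', 'VERB-en']
```

In subject-verb order nothing happens, so n-i before a verb would presumably be a later generalization of the verb-subject n-i.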

Slavic languages also have a prothetic n- in pronouns. These selected forms give you some idea of the variation within Slavic:

Gloss | Proto-Slavic (Schenker 1993: 90): no n- | Serbo-Croatian: nj- in all four | Czech: n- in loc. only | Polish: n- sometimes in gen. and dat.; n- always in inst. and loc. | Russian: n- sometimes in gen., dat., and inst.; n- always in loc.
3rd sg. gen. m. | *jego | njega | jeho | jego ~ niego | jego ~ nego
3rd sg. dat. m. | *jemu | njemu | jemu | jemu ~ niemu | jemu ~ nemu
3rd sg. inst. m. | *jimь | njim | jím | nim | im ~ nim
3rd sg. loc. m. | *jemь | njemu | nemu | nim | njom

Polish and Russian n-forms are used after prepositions. The locative always has n- since it is always preceded by prepositions. One might then conclude that the n- is an old final consonant of one or more prepositions that came to be associated with the following pronoun. Yes, I guessed right for once! Bacz (2009: 168) wrote (emphasis mine):

According to the historical grammars of Polish (e.g. Kuraszkiewicz 1972: 130-31), the pronominal third person n'- [i.e., palatalized n-] forms replaced the original suppletive j-forms in the declensional paradigm of the pronouns on, ona, ono 'he, she, it'  when -n, the final consonant of the prototypical Slavic prepositions *vъn (modern w) 'in' and *sъn (modern z) 'with' shifted and mechanically attached itself to the locative and the instrumental j-forms of the following pronouns, respectively. The shift is illustrated by the examples in (4) taken from Doroszewski and Wieczorkiewicz (1972: 92).

(4) Forms before the shift : Forms after the shift

*vъn-jemь-LOC. 'in him' : w nim-LOC. 'in him'

*sъn-jimь-INST. 'with him' : z nim-INST. 'with him'

The initial n' of the prepositional pronominal forms in the locative and the instrumental after the prepositions w 'in' and s/z 'with' has, with time, generalized to the other prepositions used with these cases [which did not end in -n] (such as przy nim-L 'next to him', po nim-L 'after him', etc.) and to the other prepositional cases: genitive, dative and accusative (o niego-G 'to him', ku niemu-D 'toward him', przez niego-A 'because of him').
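Bacz's account boils down to a selection rule for modern Polish: use the n-form after any preposition. A minimal sketch using only the masculine forms in the Polish column of the table above; short clitic forms and the accusative are left out, and the function name polish_he is hypothetical.

```python
# Modern Polish 3rd sg. masculine forms from the Polish column of the table above
J_FORMS = {"gen": "jego", "dat": "jemu"}  # used without a preposition
N_FORMS = {"gen": "niego", "dat": "niemu", "inst": "nim", "loc": "nim"}


def polish_he(case: str, after_preposition: bool) -> str:
    """Hypothetical helper: choose the form of on 'he' for a case.

    After any preposition the n-form is used (the generalization Bacz
    describes); without one, gen. and dat. keep their j-forms. The
    instrumental and locative only have n-forms in the table.
    """
    if case not in N_FORMS:
        raise ValueError(f"case not covered by the table: {case}")
    if after_preposition or case not in J_FORMS:
        return N_FORMS[case]
    return J_FORMS[case]


print(polish_he("dat", after_preposition=False), polish_he("dat", after_preposition=True))
# jemu niemu
```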

4.27.0:11: Bacz introduced me to Polish preposition-pronoun contractions: e.g., weń < w niego 'in him'. In English, contractions may be considered nonliterary, but in Polish, these contractions are literary, which explains why I haven't seen them in grammars for learners.

*Danish still has I 'you' (pl.) without n-. Norwegian once had I but now has dere which was "only slowly breaking its way into literary language" in the early 20th century (Groth 1914: 71). Note that this capitalized I is a second person plural unlike its first person singular English homograph I. I assume northern Germanic I is cognate to

English ye, you

Dutch jij ~ je (coincidentally a homograph of French je 'I') 'you' (sg.) and jullie 'you' (pl.; < jij + lui 'people')

German ihr 'you' (pl.)

and even Sanskrit yūyam (nom.), yuṣma- (oblique stem) 'you' (pl.)

Icelandic has þið 'you' (pl.; originally dual; cf. Old Norse þit). The old plural is þér (same as Old Norse), presumably cognate to Norwegian dere and homophonous with the dative singular of Icelandic þú 'thou'.

The þ- of the Old Norse pronouns (and, by extension, the þ- of Icelandic þið and þér and the d- of Norwegian dere) is from a verb ending (emphasis mine):

The nominative forms are often suffixed to the verb, e.g. mæli-k 'I speak', má-k-at 'I cannot' (-at 'not', frequent in poetry). Similarly heyrðu and skaltu < skalt þú. Such occurrences with the dual and plural forms of the second person pronoun led to re-analysis on the part of the speakers: skuluð ér > skuluðér was subsequently interpreted as skulu þér. Hence the alternate forms þit and þér [for original dual it and plural ér] and the frequent use of the 3rd person plural [with þ-] in place of the 2nd person.

Old Norse ér 'you' looks like a cognate of German ihr 'you' (pl.).


12.4.25.23:59: DEFINITE ARTICLES IN COLOGNIAN: WHAT'S THÉ DIFFERENCE?

Last Tuesday, I returned to blogging after a long hiatus because I was driven to look into the dubious Wikipedia claim that there were Hmong in the Tangut Empire. And last Saturday I found an error in the description of Gan tones in the English Wikipedia (now corrected!). So as much as I love Wikipedia, I don't believe everything I read on it. There are errors there (and here too - sigh). Tonight I saw these two passages and initially wondered what they were describing (emphasis mine):

English Wikipedia: Colognian has two distinct sets of definite articles indicating focus and uniqueness

German Wikipedia: Die bestimmten Artikel des Kölschen haben jeweils zwei Ausprägungen, eine betonte und eine unbetonte, wovon die betonte Variante mit dem entsprechenden Demonstrativpronomen zusammenfällt. Sie wird vor allem benutzt, wenn auf einen bestimmten unter mehreren möglichen oder einen bereits bekannten Gegenstand Bezug genommen wird. „Es dat et Enkelche?“ Ist das Ihr Enkelchen / eines Ihrer Enkelchen? Aber: „Es et dat Enkelche?“ Ist es dieses Enkelchen?

'The definite articles of Colognian each have two forms, one stressed and one unstressed; the stressed variant coincides with the corresponding demonstrative pronoun. It [the unstressed variant?] is used especially when referring to a particular one among several possible objects of reference or to one that is already known. „Es dat et Enkelche?“ Is that your grandson/one of your grandsons? But: „Es et dat Enkelche?“ Is it this grandson?'

I was confused at first because I took "distinct" to mean 'segmentally distinct', which seemed to clash with 'stressed' vs. 'unstressed'. I was erroneously assuming that a difference in segments could not be accompanied by differences in stress.* Wikipedia didn't seem to be wrong - but I was! D'oh! Then a light clicked on ... the right one, I hope:

Es dat et Enkelche?

Is that the (unstressed) grandson (= your grandson or one of your grandsons)?

Es et dat Enkelche?

Is it the (stressed) grandson?

Et ('the'; neuter nominative singular) looks like it should be a shorter, unstressed derivative of the demonstrative dat (cf. Dutch het 'the' (neuter) and dat 'that') referring to a certain known object of reference (e.g., your grandson). Its stressed counterpart is dat 'the' (cf. standard German das 'the' (neuter) with -s < -t) which is identical to the demonstrative dat.

4.26.1:00: My attempt at a table of Colognian definite articles:

gender | stressed 'the' / 'this' | unstressed 'the'
masculine | dä (cf. Dutch de 'the') | der (cf. standard German der)
feminine | die (cf. standard German die 'the', Dutch die 'that') | de (cf. Dutch de)
neuter | dat (cf. standard German das 'the', Dutch dat 'that'); 'this' (but not stressed 'the'!) can also be dis or dit (cf. standard German dies, Dutch dit, English this) | et (cf. Dutch het)

I got dä and de from this standard German-Colognian translator site and from browsing the Colognian Wikipedia (which oddly does not seem to have an article on Colognian). Ah, I went to the site of the Akademie för uns Kölsche Sproch 'Academy for Our Colognian Language' and used its German-Colognian online dictionary to find the other nonneuter articles der and die, which happen to look like standard German. I was hoping for more exotic forms like d-less ä and e, which would be to dä and die what d-less et is to dat.

Note the asymmetry: stressed m. 'the' is like Dutch 'the', but the other stressed 'the' are like German 'the' and Dutch 'that'; conversely, unstressed m. 'the' is like German 'the', but the other unstressed 'the' are like Dutch 'the'. I would have predicted that German-like der was the stressed form, but it's actually the unstressed form even though it's longer than stressed dä.

*4.26.1:10: Stressed and unstressed forms can be segmentally quite different: e.g.,

English a can be stressed [ej] or unstressed [ə] (though this is not indicated in spelling).

The dative singular of the Polish second person pronoun (i.e., 'to thee') can be stressed tobie [tɔ́bʲe] or unstressed ci [tɕi] (< earlier ti?).

How could I have forgotten that?


12.4.24.23:54: SOURCE OF THE SUN?

I looked up the tonal term (and 'sun' among other things) 陽 yang (as in yin-yang) in Schuessler's ABC Etymological Dictionary of Old Chinese (2007: 558), which linked it to Siamese ปลั่ง <plaŋ1> 'shiny', following Unger in Hao-ku 33 (1986), a journal I've never seen. Let's suppose the Siamese word was borrowed from Chinese. Perhaps the earliest reconstructible form of the word in Chinese was

*pɯ-laŋ

Other Sino-Tibetan languages have -laŋ with or without various preceding syllables. Perhaps *laŋ was the root and preceding syllables are language-specific prefixes.

I reconstruct a generic high vowel (symbolized as *ɯ) in the presyllabic prefix to condition the partial raising of the root vowel:

*pɯ-laŋ > *pɯ-lɨaŋ > *lɨaŋ > *jɨaŋ > modern Mandarin yang

*l lenited to *j before higher vowels, whereas it became 'emphatic' (pharyngealized; indicated with underlining) before lower vowels and hardened: e.g., in

*laŋ > *l̲aŋ > *d̲aŋ > modern Mandarin tang 'sweets'

written with the same phonetic 昜 *laŋ plus semantic 𩙿 'eat'

(Is 糖 Md tang < *daŋ 'sugar' cognate?)

which might have lost a high vowel presyllable still in

𩛿 ~ 餳 *sɯ-laŋ > *sɯ-lɨaŋ > *slɨaŋ > *zlɨaŋ > *zɨaŋ > modern Mandarin xing 'dried sweets'

Note the variation in phonetics: 易 as well as 昜 with an extra stroke*.

None of the 'sweets' words have early attestations.
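The ordered changes behind yang and tang can be sketched as a toy rewrite procedure. It is only an illustration under the assumptions in its comments: it handles *pɯ-laŋ and *laŋ, not *sɯ-laŋ (whose presyllabic consonant survives in the cluster of *slɨaŋ), and it stops before the Middle Chinese and Mandarin stages. Emphatic consonants are written with a combining underline (U+0332), matching the underlining convention in the text. The function name apply_changes is hypothetical.

```python
def apply_changes(form: str) -> list[str]:
    """Return the stages of a reconstructed form such as 'pɯ-laŋ' or 'laŋ'.

    Toy model (assumption: only these four ordered changes apply):
      1. root a > ɨa after a presyllable containing high ɯ
      2. loss of the presyllable
      3. l > j before ɨ; otherwise l > emphatic l̲ (combining underline)
      4. emphatic l̲ hardens to d̲
    """
    stages = [form]

    def step(new: str) -> None:
        # Record a new stage only if the change actually did something
        if new != stages[-1]:
            stages.append(new)

    pre, _, root = form.rpartition("-")
    if "ɯ" in pre and "a" in root:
        root = root.replace("a", "ɨa", 1)  # 1. partial raising
        step(f"{pre}-{root}")
    step(root)                             # 2. presyllable loss
    if root.startswith("lɨ"):
        step("j" + root[1:])               # 3. lenition to j
    elif root.startswith("l"):
        step("l\u0332" + root[1:])         # 3. emphatic l̲
        step("d\u0332" + root[1:])         # 4. hardening to d̲
    return stages


for f in ("pɯ-laŋ", "laŋ"):
    print(" > ".join("*" + s for s in apply_changes(f)))
```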

Possible cognates:

昌 *kɯ-s-laŋ > *khl or *kɯ-t-laŋ > *tɕhɨaŋ > Md chang 'bright'

*s-laŋ-ʔ > *hlaŋ-ʔ with a glottal stop suffix might be the source of Vietnamese láng 'shiny'

炳/昺/邴 *T-pɯ-laŋ-ʔ > *rplɨaŋʔ > *prɨaŋʔ > *pɨaŋʔ > Md bing 'bright'

Old Chinese is often reconstructed with medial *-r- all over the place. This is odd. Could a lot of these *-r- be from earlier *t- or even *l-preinitials? I use *T- to represent an uncertain coronal preinitial (*t-, *l-, *r-).

None of the above words have *-s > *-h. The latter is the likely source of the old Thai tone 1 (now a low tone) in ปลั่ง <plaŋ1> 'shiny'. Perhaps it reflects a southern Old Chinese word *plaŋ-s or *plaŋ-h.

*Could a 日 drawing of the sun have been a phonetic *lVK in 易 *lek 'to change' as well as 昜 *laŋ 'south side'?

4.25.0:05?: No, going by Grammata serica recensa, early forms of 日 don't resemble what became 日 in the modern form of 易.

I also thought 日 could be a *lVK phonetic in 昌 (see above), but once again the early forms don't match.


Tangut fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2011 Amritavision