(Posted after expansion on 15.12.18.)

I thought it might be fun to try to work out my own reconstruction of Monic rhymes using the data in Diffloth (1984: 286) before seeing his solution. Real forms are in bold.

Correspondence 1 2 3 4 5 6
Proto-Monic *-eK *-eC *-iC *-iK *-ɛC *-ɛK
Pre-Nyah Kur *-iK *-iC *-iiC *-iiK *-ɛeC *-ɛeK
Nyah Kur -iC -iiC -iiK -iiC -eeC
Pre-Literary Mon *-eK *-iC *-iK *-eC *-eK
Literary Mon <-eK> <-iK> <-eK>

I assumed that Mon generally preserved vowel heights: <e> rhymes came from Proto-Monic nonhigh vowels, and the vowel of *-eC raised to assimilate to the following palatal.

My solution has a short *ɛ absent from the reconstruction in my last post. (But such a vowel was in an earlier draft of that post. Both versions of my reconstruction lacked length.)

Capital letters stand for classes of final consonants: C for palatals (-c, -ɲ) and K for velars (-k, -ŋ).

I posit a chain shift in Nyah Kur:

1. *i > ii (contra last night's post in which *e > *ei > Nyah Kur ii)

2. *e > i (and *-iK > *-iC - but why didn't -iiK become *-iiC?)

3. *ɛ > *ɛe

3a. *-ɛeC > *-eiC > -iiC

3b. *-ɛeK > *-eeK > *-eeC

In Written Mon, lower mid front vowels raised, and palatals backed:

1. *-eC > *-iC

2. *ɛ > <e>

3. *C > <K>

I finally got around to looking at Diffloth's solution two and a half weeks later. He used Old Mon and non-Monic languages to help him. I wonder what my solution would have looked like if I had checked that data.

Correspondence 1 2 3 4 5 6
Proto-Monic (this site) *-eK *-eC *-iC *-iK *-ɛC *-ɛK
Proto-Monic (Diffloth 1984: 288) *-iK *-iC *-iiK *-eeK *-iiC *-eeC
Pre-Nyah Kur (this site) *-iC *-iiC *-eeK *-iiC *-eeC
Nyah Kur -iC -iiC -iiK -iiC -eeC
Pre-Old Mon (this site) *-eC *-iC? *-iiK *-eeC
Old Mon (phonetic; this site) *-eiC *-iC? *-i(i)K *-eiC
Old Mon <iC> (once) ? <-iK>/<-īK> <-iK> <-iC> <-eC>/<-iC>
Literary Mon <-eK> <-iK> <-eK>

I prefer Diffloth's solution to my own not only because of its firm comparative grounding but also because

- it made use of existing contrasts (short vs. long, i vs. e) instead of eliminating length and introducing a third height (artifacts of the Pre-Proto-Monic reconstruction I didn't post).

- the phonetic shifts that I think it requires make more sense (and are much simpler for Nyah Kur though not for Mon):

Nyah Kur:

1. *-K > *-C after *i(i)

2. *ee > ii

Mon:


1. Neutralization of long vowel height before *-K and *-C:

*-eeK > *-iiK

*-iiC > *-eeC

2. Neutralization of vowel height before *-C:  *-iC > *-eC (to match *-eeC)

3. Diphthongization of *e(e) to assimilate to a following *-C:

*-e(e)C > *-eiC

There was no Indic vowel symbol for [ei], so *-eiC was written in Old Mon as both <eC> and <iC>.

4. Loss of vowel length: *-iiK > *-iK

I predict that the first attestation of <-īK> with a long vowel predates the first attestation of <-iK>. If that is not the case, then vowel length was already being lost at the time of the earliest known Mon texts, and <-īK> with a long vowel represented a conservative, waning pronunciation.

5. Backing of *-C to *-K after front vowels (dissimilation?)

*-(e)iC > *-(e)iK

6. Straightening of *ei to <e> before *-K (the *-i- was motivated by a following palatal that no longer exists)

*-eiK > <eK>

One could regard 5 and 6 as the same change on an abstract phonemic level:

*/eC/ > /eK/

since */e/ was *[ei] before */C/. (There was no contrast between *-eC and *-eiC.)

Note that some of the above changes are mine and not Diffloth's, though their starting point is his.
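The six numbered Mon changes above are ordered, so they can be sketched as a small pipeline. This is only an illustration under my own assumptions: rhymes are plain strings (C = palatal coda, K = velar coda, doubled vowels are long), each step is a lookup table over whole rhymes, and the names MON_STAGES and derive are hypothetical labels of mine. It applies the steps literally and is not a check against the correspondence table.

```python
# Each stage maps whole rhymes to their outcomes.
# A rhyme that is not listed in a stage passes through that stage unchanged.
MON_STAGES = [
    {"eeK": "iiK", "iiC": "eeC"},  # 1. long vowel height neutralized before *-K and *-C
    {"iC": "eC"},                  # 2. *-iC > *-eC
    {"eC": "eiC", "eeC": "eiC"},   # 3. *-e(e)C > *-eiC
    {"iiK": "iK"},                 # 4. loss of vowel length
    {"iC": "iK", "eiC": "eiK"},    # 5. *-(e)iC > *-(e)iK
    {"eiK": "eK"},                 # 6. *-eiK > <eK>
]


def derive(rhyme):
    """Return the input rhyme followed by each form it changes into, in order."""
    history = [rhyme]
    for stage in MON_STAGES:
        current = history[-1]
        if current in stage:
            history.append(stage[current])
    return history


# Print one derivation per starting rhyme mentioned in the steps.
for start in ["iC", "iiC", "eeC", "iiK", "eeK"]:
    print(" > ".join("-" + form for form in derive(start)))
```

Because the stages run in order, *-eeK first raises to *-iiK in step 1 and only then shortens in step 4, and every palatal-final rhyme is funneled through *-eiC before its coda backs.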

I'm embarrassed by how wrong I was, but I'm keeping my original solution in this post anyway as an example of what not to do: namely, fail to exploit existing resources. One approach to solving problems like this one is to calculate the maximum number of possibilities given existing variables and then see which possibilities make for the simplest 'story':

2 heights (i, e) * 2 lengths (short, long) * 2 coda classes (-C, -K) = 8

1. *-iC

2. *-iiC

3. *-iK

4. *-iiK

5. *-eC

6. *-eeC

7. *-eK

8. *-eeK

Although "2 lengths" may seem redundant, there are languages with three vowel lengths (more examples here). (Mon, however, is not one of them.)

*-eC and *-eK can be ruled out at the proto-Monic level because Nyah Kur, the only Monic language with vowel length, lacks short e.
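The enumeration and the elimination of short e can be sketched in a few lines of Python. This is just an illustration, assuming rhymes are written as plain strings with doubled vowels for length; the names HEIGHTS, LENGTHS, CODAS, and is_possible are my own hypothetical labels.

```python
from itertools import product

HEIGHTS = ["i", "e"]  # two vowel heights
LENGTHS = [1, 2]      # 1 = short, 2 = long (written with a doubled vowel)
CODAS = ["C", "K"]    # C = palatal coda, K = velar coda

# Every combination of height, length, and coda class: 2 * 2 * 2 = 8 rhymes.
candidates = [
    "*-" + vowel * length + coda
    for vowel, length, coda in product(HEIGHTS, LENGTHS, CODAS)
]


def is_possible(rhyme):
    """Rule out the short-e rhymes, since Nyah Kur lacks short e."""
    return rhyme not in ("*-eC", "*-eK")


survivors = [rhyme for rhyme in candidates if is_possible(rhyme)]

print(len(candidates), candidates)
print(len(survivors), survivors)
```

The six survivors are the same six rhymes that appear in Diffloth's Proto-Monic row in the comparison table.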

Another error I made was investing too much in wrong hypotheses: the absence of vowel length and the presence of a third front vowel height. If I had worked from the eight possibilities above, I might have seen how a more Nyah Kur-style solution was superior.

Lastly, the improbability of Pre-Nyah Kur *-iiK not becoming *-iiC given *-iK > -iC in my reconstruction should have been a warning sign. It would make phonetic sense for Pre-Nyah Kur *-iiK to become *-iɨK (i.e., have its second half lose its palatality to assimilate to the following velar) which would then be immune to the change *-iK > *-iC. Then *-iɨK would simplify to Nyah Kur *-iiK. However, such a complex intermediate stage is obviously an attempt to salvage a flawed reconstruction; it adds unnecessary complexity. Diffloth's solution is much simpler: *-eeK > -iiK.

PROTO-MONIC VOWELS IN DIFFLOTH (1984)

(Posted after expansion on 15.12.17-18.)

I have been reading Gérald Diffloth's The Dvaravati Old Mon Language and Nyah Kur (1984) to understand Monic history and to get a better grasp of vowel warping, which also occurred in Chinese and possibly in Tangut. The complex diphthongs of modern Monic languages come from much simpler Proto-Monic vowels: e.g.,

Singu ʔəsʌe̲i̯a < Proto-Monic *k[r]sw 'to whisper'

The underline and subscript diaeresis indicate the most prominent vowel with clear (underline) or breathy (diaeresis) phonation. Conversely, the subscript inverted breve indicates the least prominent vowel. Proto-Monic did not have phonemic phonations (though perhaps the two types of phonation already existed on a subphonemic level).

I have not found a diagram of Proto-Monic vowels (as opposed to rhymes) in Diffloth's book, so I made one myself:

*i/*ii: length only contrastive before final palatals and velars | *ɯ/*ɯɯ: only two instances of *ɯɯ known | *u/*uu: *uu nearly in complementary distribution with *oo
*iə: nearly in complementary distribution with *ɛɛ | (no *ɯə!) | (*uə as alternative to *ɔɔ)
(no *e or *ee!) | /*əə: only two instances of *əə known | -/*oo: nearly in complementary distribution with *uu
-/*ɛɛ: nearly in complementary distribution with *iə | *a/*aa | /*ɔɔ: length only contrastive before final velars; could *ɔɔ be *uə?

The five most common vowels according to Diffloth (1984: 284) are in large type.

Although *ɯ(ɯ) is a back vowel, I have followed Diffloth in grouping it with the central vowels.

I moved the low vowels *a/*aa into the lower mid row for a more compact chart.

If I ignore marginal contrasts, I could reconstruct a more symmetrical Pre-Proto-Monic system with a phonemic length distinction for only one vowel:

*i *u
*iə > *ɛɛ (no *ɯə!) *uə > *ɔɔ
*e > *ii  *o > *oo and *uu
(no *ɛ) *a/*aa

11.30.0:10: I could even eliminate that one last remaining length distinction by positing a chain shift: *ɯə > > *a > *aa. But I would rather reconstruct a high-frequency vowel as *aa instead of *ɯə.

Here's how I think front vowels could have developed between Pre-Proto-Monic and Proto-Monic:

Stage 1 Stage 2 Stage 3
*i *i *i/*ii (< *ei < *e)
*iə *iə *iə
*e *e (no *e)
(no *ɛ) *ɛɛ (from *eɛ < *ie < *iə before glottal stops) *ɛɛ

For comparison, here's a table of the development of back vowels:

Stage 1 Stage 2 Stage 3
*u *u *u/*uu (< *ou <  *o before nongrave consonants)
*uə (no *uə) (no *uə)
*o *o *oo (< *ou < *o before grave consonants)
*ɔ/*ɔɔ (from *oɔ < *uɔ < *uə)  *ɔ/*ɔɔ

In both systems, diphthongs (*iə, *uə) monophthongized into long vowels, and mid vowels (*e, *o) became yet more long vowels.

It is not surprising that there is only a three-way contrast between front vowels since front vowels are less frequent than central or back vowels in Proto-Monic. Do other Austroasiatic languages have relatively low frequencies of front vowels? The imbalance of front and back vowels in Pre-Proto-Monic may go back to Proto-Austroasiatic. Shorto's 2006 reconstruction of Proto-Austroasiatic has no front vowel corresponding to *ɔ.

12.18.7:28: I am not happy with the above proposals since they require unmotivated splits:

- *iə > *iə except

- before glottal stops: *iə > *ɛɛ

- in two problematic words without glottal stops (Diffloth 1984: 282-283):

*t[l]m[ɛɛ/aa]t 'flattened' (Nyah Kur points to *ɛɛ, but Mon points to *aa)

*k[ ]ʔɛɛm 'to clear one's throat' (a sound-symbolic exception to sound change?) 

- *o > *oo or *uu before glottal stops (the height is otherwise predictable; see Diffloth 1984: 276-278, 380-381)

One might be able to write off the two unexpected cases of *ɛɛ, but there are ten instances of both *oo and *uu before glottal stops (Diffloth 1984: 377-378). Twenty words are too many to be ignored. If Pre-Proto-Monic had length distinctions for *a and *u (and other central vowels?), perhaps it had one for *i as well:

*i/ii *ɯ/ɯɯ? *u/uu
(no *e/*ee) *ə/əə? (no *o)/*oo
(no *ɛ)/*ɛɛ *a/*aa *ɔ/*ɔɔ

That inventory is almost identical to Diffloth's. We have come almost full circle.

The distribution of distinctive long vowels is identical to that of Nyah Kur (Diffloth 1984: 52).

HOW DID KUMĀRA BECOME KAUR?

(Posted after expansion on

The Sikh surname ਕੌਰ Kaur (Punjabi 'princess') is from Sanskrit kumāra 'boy, prince' (f. kumārī) which has the following Punjabi descendants listed in Turner's dictionary:

kavār, kãvārā, kavārā, kuārā, kamārā m. 'bachelor'

kãvārī, kavārī, kuārī, kamārī, f. 'virgin'

kãvar m. 'prince'

kaür, kaur m. 'boy, prince'

Those forms share the following changes:

The first vowel of kumāra was (almost) always delabialized: u > a.

Is the u of kuār- from u or from m?

kumār- > kuār-?

kumār- > kamār- > kãvār- > kavār- > kuār-?

Is the u of kaür/kaur from u or from m?

kumār- > kuār- > kaur?

kumār- > *kumar- > kuar- > kaur?

kumār- > kamār- > *kamr- > kaur?

The last scenario seems unlikely since it requires a short vowel to remain while a long vowel disappears.

m lenited to v (sometimes nasalizing the previous vowel) or was lost (except in kamārā).

The original final vowel was lost, and new vowel suffixes were added:

Masculine -ā is a Punjabi suffix. (Sanskrit -ā is feminine.)

Is feminine -ī a direct retention of Sanskrit -ī?

What I don't understand is:

- how a single form could develop in six or seven different ways (is the difference between the last two purely orthographic?):

kavār, kãvārā, kavārā, kuārā, kamārā, kaür, kaur

Are those forms taken from different dialects and/or different time periods?

- how kaür ~ kaur lost long ā (unless a is from ā)

QʷʰEST FOR FIRE

(Posted after expansion by

Today I found a 110-entry Tujia wordlist. The forms for 'fire'

Tasha (Qixin) Tujia mi₂₁

Duogu and Boluo (Luxi) Tujia mi₅₃

Dianfang and Tanxi Tujia mi₅₅

made me want to check Baxter and Sagart (2014) to see if they reconstructed Old Chinese 火 'fire' as a potential cognate *m̥ˁəjʔ corresponding to Sagart's (1999: 159) *ahmɨjʔ. Sagart (1999: 159) wrote,

The Middle Chinese initial [of 'fire'] may reflect OC [Old Chinese] *hm-, *hw- or *hŋʷ-. Evidence that the Old Chinese initial was *hm- comes from a variant form: [Mandarin] hui₃ 𤈦 MC xwjïjX, defined by the Shuo Wen Jie Zi as 'fire' 火也. Since hui₃ 𤈦 includes the phonetic wei₃ 尾 *bmɨjʔ > mjïjX 'tail', it must reflect OC *bhmɨjʔ, and [Mandarin] huo₃ 火 itself must be *ahm[ɨ]jʔ > xwaX (see Li 1971).

If the two words for 'fire' are related, I could trace them back to a common prototype:


[Table: Early Old Chinese; presyllabic vowel loss 1; emphasis (phonetic); emphasis (phonemic); presyllabic vowel loss 2]

Sagart compared his *ahmɨjʔ to Proto-Austronesian *DamaR 'torch'. If there is a relationship between the two words, ideally my sesquisyllabic Early Old Chinese form should resemble the Proto-Austronesian form. My assumption is that Proto-Austronesian preserves first syllables better than Chinese did. However, the voiceless *s- that devoices voiced *m- in my scenario is unlike the voiced retroflex *D of Proto-Austronesian. And the second vowels don't match either. I could work around that by positing *Dam̥ʌR as the source of both words. The *D- would somehow become Early Old Chinese *s- (via *z-?; but cf. a different proposal below), and lower mid *ʌ would lower to Proto-Austronesian *a and raise to Old Chinese *ə. Such proposed sound changes would need to be confirmed in other etyma shared by Proto-Austronesian and Old Chinese.

12.12.12:36: I don't know where Sagart (1999: 159) got Proto-Austronesian *DamaR 'torch' from. In Blust and Trussel's Austronesian Comparative Dictionary, the Proto-Austronesian form is listed as *damaR with an alveolar *d and the gloss 'tree resin used in torches (?)'. Its Kavalan reflex zamaR means 'fire'; other reflexes have meanings involving tree resin, torches, and light.

In modern Mon, "[a]ll voiced and aspirated stops, together with s were reduced to h" in presyllables (Jenny 2005: 27): e.g., Old Mon <dirmoṅ> and Middle Mon <damaṅ> 'dwelling place' correspond to modern spoken Mon [həmo̤ŋ] (Shorto 1971: 194, Diffloth 1984: 176). Could voiced stops like *d- be a source of sonorant devoicing in Old Chinese: e.g.,

*dam- > *dʌm- > *ðʌm- > *θʌm- > *hʌm- > *həm-ʔ > *hm- > *m̥-?

There is no evidence for fricatives as an intermediate stage of *d- to h-lenition in Mon, though it is possible that inscriptional <d> represented [ð] at some point.

The final rhotic in 'fire' normally shifted to *-j but could have shifted to *-n in the dialect of Old Chinese ancestral to the source of the Amoy literary reading hoN³ for 火. (-N indicates vowel nasalization.) Bodman (1985: 12) mentioned that form and others with unexpected nasalized vowels in Southern Min: e.g.,

指 "most of Southern Min" caiN³ 'finger' < *mə.kinʔ? < *-rʔ??

In Eastern Min, Fuzhou has tsieŋ³ as well as tsai³ and tsi³. But it is an outlier among non-Southern Min languages.

cf. Northern Min: 建甌 Jian'ou literary and colloquial ki³ ~ tsi³ < OC *mə.kijʔ? < *-rʔ? without nasal

I presume Northern Min initial zero/h/ɦ are due to postpresyllabic lenition:

*mə.k- > *mə.g- > *mə.ɣ- > *ɣ- > *h- > Jian'ou zero, 松溪 Songxi h-, and 石陂 Shibei ɦ-

椅 Chaozhou iN³ 'chair' < *Cə.q(r)anʔ? < *-rʔ??

cf. Amoy < OC *Cə.q(r)ajʔ? < *-r? without nasal vowel

睇 Chaozhou thoiN³ 'to look at' < *l̥ˤ[ə]nʔ? < *-rʔ??

cf. Cantonese thai³ < OC *l̥ˤ[ə]jʔ < *-r? without nasal (Cantonese has no nasal vowels)

鼻 "Southern Min" phiN⁵ 'nose, to smell' < OC *m-bi[t]-s

好 Amoy hoN⁵ 'to like' (lit.) < OC *qʰˤuʔ-s

潮 Chaozhou tioN² 'tide' < OC *[N]-t<r>aw

'Fire', 'finger', 'to look at', and 'chair' might have ended in *-rʔ (is it a coincidence that they all end in glottal stops?), and it is remotely possible that OC *m-bi[t]-s 'nose' had a nasal variant *m-bin-s (though homorganic alternations are generally between velars; I can't find any examples of *-t ~ *-n alternations in Gong 1994).

However, nothing outside Min points to final nasals (or a final *r that could become a nasal) in the last two words. Could the nasalization of 'to like' and 'tide' be by analogy with other words?

AN AUSTROASIATIC ADOPTEE IN CHINESE

(12.9.3:29: I wrote the first draft of this post nearly two weeks ago but did not post it until I finished revising and expanding it.)

I found Zheng Rongbin's "The Zhongxian (中仙) Min Dialect: A Preliminary Study of Language Contact and Stratum-Formation" when Googling for 囝, the graph for the Min word for 'son', while writing my entry on a similar-looking Khitan small script character:

309 <ghó> 'he' (?)

The Zhongxian word for 'son' is kɯŋ with a nonlabial vowel. Most Min forms for 'son' also have nonlabial vocalism: e.g.,


Taiwanese kiáⁿ

Chaozhou ʿkán (Goddard 1883: 65), kĩã 42 (Hanyu fangyan cidian, vol. II, p. 1977; hereafter HFC)

海康 Haikang kia 31

中山隆都 Zhongshan Longdu kiɛn 24


Fuzhou kiaŋ 31 (HFC)

泰順泗溪 Taishun Sixi kiẽ 44 (

Hainanese kjʌ (

I wonder what Northern Min forms are like.

Norman and Mei (1976: 297) reconstructed Proto-Min *kian.

I only recently discovered two Min forms with labial vowels in HFC:

Puxian: 仙游 Xianyou kyã 53

Central: 三明 Sanming kyaiŋ 21

The word also appears without labial vocalism in non-Min languages (all data from HFC):

In Min-dominated Fujian:

南平 Nanping Mandarin ʿkiŋ

浦城忠信 Pucheng Zhongxin Wu kiãi 44

明溪 Mingxi Hakka kieŋ 5

In Jiangxi next to Fujian:

萬安 Wan'an Hakka kieŋ 21

The nonlabial vocalism of that word for 'son' in Chinese has bothered me for a long time because the word is supposed to be a substratal borrowing from Austroasiatic, and nearly all Austroasiatic forms I have seen have a labial vowel or at least a labial medial:

Aslian: Semnam kuoːn (but Jahai kɛn!)

Proto-Bahnaric *kɔːn (Sidwell 2011)

Proto-Katuic *kɔːn (Sidwell 2005)

Proto-Khasic *kʰɔn (Sidwell 2012)

Khmeric: Pre-Angkorian <kon> (the most frequent spelling; other Pre-Angkorian and Angkorian spellings also have labial vowels and/or <v> with the sole exception of one occurrence of <kven> in 1139)

Proto-Khmuic *kɔːn  (Sidwell 2013)

Pakanic or Mangic: Do Bolyu qɔ³³je⁵³ and Mang van⁶ belong in this cognate set?

Monic: Dvaravati Old Mon *kɔːn (Diffloth 1984)

Nicobaric: koːən ~ kón ~ kúan (Shorto 2006)

Proto-Palaungic *kɔːn (Sidwell 2010)

Proto-Pearic *kn (! - Headley 1985; why does SEAlang have parentheses around the vowel?)

Vietic: Vietnamese con [kɔn], Ruc kɔːn, perhaps Thavung kun (in words for 'man', 'woman', and 'children')

Shorto (2006) reconstructed Proto-Mon-Khmer *kuun ~ *kuən.

The reading of 囝 in the Middle Chinese dictionary tradition is *kɨənˀ which is close to Proto-Min *kian; both imply an Old Chinese *kanʔ. 

Was Old Chinese *a an approximation of an Austroasiatic *ɔ, a vowel which did not exist in Early and Middle Old Chinese?

Is the -y- in Xianyou and Sanming from *-ɨ- with tertiary labialization transferred from the secondary labialization of the following vowel?

*kian > *kion > *kyon > Xianyou kyã 53, Sanming kyaiŋ 21

This hypothesis predicts that Xianyou and Sanming should have reflexes of Middle Chinese 建 *kɨənʰ 'establish' with -y-, and Xiaoxuetang lists such reflexes: Xianyou kyøŋ 52 and Sanming kyaiŋ 33. 建 is from Old Chinese *kan and never had a labial vowel. Its Northern Min readings have secondary or tertiary labialization:

松溪 Songxi kyŋ 33

建甌 Jian'ou kuɪŋ 33 ~ kyɪŋ 33

If the Northern Min word for 'child' is a homophone of 建 (disregarding tones), then it too should have a nonoriginal -y-.

Only now do I realize that the final glottal stop corresponds to nothing in Austroasiatic. Why the mismatch?

SAME PARENTS, DIFFERENT PRONUNCIATION: SORORAL VARIATION IN MAAP KRAAT NYAH KUR

One may find the synchronic variation that I reconstructed for Old Chinese to be implausible. However, I found an even greater degree of synchronic variation in มาบกราด Maap Kraat Nyah Kur presyllables among the three elderly informants in Diffloth (1984). Informants MKle and MKlu have different forms even though they are sisters! Colored cells indicate innovative forms. The most prominent vowel is indicated with a subscript symbol specifying register: an underline for clear voice (< *voiceless initials) and a diaeresis for breathy voice (< *voiced initials).

Proto-Monic Gloss MKle MKlu MKp
*jliiŋ long chəmi̤in khəmi̤in chəmi̤in
*j-m-lɛɛʔ short chəmlɛ̤ɛʔ chəmlɛ̤ɛʔ khəmlɛ̤ɛʔ
*(cŋ)kaam husk cəŋka̱am cəŋka̱am təŋka̱am
*c-ŋ-kiəm handful cəŋki̲am təŋki̱əm təŋki̱əm
*cŋkɔɔr bark cəŋku̱ay kəŋku̱ay ~
*cŋkiər unpleasant cəŋki̱əy təŋki̱əy cəŋkɯ̱əy
*smpɔɔt to wipe səmpu̱at kəmpu̱at kəmpu̱at ~
*tmpɔh seven cəmpɔ̱h kəmpɔ̱h kəmpɔ̱h ~
*p-m-tɯl sand pəmtɯ̱y kəmtɯ̱y kəmtɯ̱y
*cmpiir pumpkin chəmpi̱i ~ khəmpi̱iy səmpɨ̱ɨy

It would be difficult to reconstruct Monic presyllabic consonants with confidence without premodern and comparative data. In one case ('seven'), none of the three informants preserved the original presyllabic initial.

And without more information on cross-generational phonology, I cannot understand how

[m]any of these differences probably represent unsystematic and limited adaptation, on the part of very old persons, to the rapidly changing speech of younger speakers, many of whom, including their own children, have now abandoned the language and speak Thai. We simply arrived twenty years too late to record sounds which have been preserved for more than fourteen centuries. (Diffloth 1984: 320)

With the exception of a monosyllabic form of 'seven', I don't see how any of the innovative forms are more Thai-like than the more conservative forms. (Thai favors monosyllables.)

Is the variation in Phan Rang Cham due to such adaptation?

11.26.15:36: And what is one to make of the variation in Hmongic forms for 'dog' (Ratliff 2010: 206)?

Proto-Hmong-Mien *qluwX


Pa-Hng (why a final nasalized vowel?)

Baiyun ta 1 ljɔ̃ 7

Tan Trinh ta 1 ljɔ̃ 7 ~ Baiyun ka 1 ljɔ̃ 7

West Hmongic:

Xianjin (Hmong) tl̥e 3

Fuyuan (Hmjo) qlei B

North Hmongic: Jiwei (Qo Xiong) qwɯ 3

Mienic: Liangzi Mun klu 3

If not for the Pa-Hng forms, I would say that *ql- simplified to q- or kl- with assimilation in tl̥- (q- became dental like l which became voiceless like the preceding stop). Are the Pa-Hng forms stretched monosyllables - expansions of *klV and *tlV? Or do they contain *l-roots preceded by different prefixes? Or do they preserve the original disyllabic structure of their Proto-Hmong-Mien ancestor which might have been something like *qa.luwX? Would *qa- have changed to *ta- by analogy with some other word or even a compressed but now extinct form with an assimilated cluster like *tljɔ̃ 7?

11.26.20:45: If the root of 'dog' is *luwX, then Old Chinese *Cə.kˁroʔ (which has no internal etymology) could be a borrowing of a prefixed Proto-Hmong-Mien word that displaced the native word 犬 *[k]ʷʰˁ[e][n]ʔ.

What if the Old Chinese form was *təkˁroʔ from a Proto-Hmong-Mien *tV-qV-luwX with a doubly prefixed *l-root or *tV-qluwX with a singly prefixed *ql-root? Could Pa-Hng ta-l- be a simplification of an earlier *tV-ql-?

IS 'HE' A MAN IN A BOX?

For a long time I thought Khitan pronouns were unknown. Then this year I got ahold of Wu and Janhunen (2010). The authors regard the Khitan small script character

309 <ghó>

as a word possibly meaning 'he' when by itself*. Unfortunately I cannot find a full argument for that interpretation in their book.

309 resembles Chinese 士 'officer, gentleman' with an added enclosure 囗. I am reminded of 囝, a graph for the Min word for 'son' composed of 子 'son' inside 囗. (There was no contact between the Khitan in the north and the Min in the south, so 309 did not influence 囝 or vice versa.)

Other potential forms of 'he' are

309-140 <ghó.en> 'he.GEN' = 'his'? (蕭仲恭 41.1, 46.6, 耶律詳穩 35.34)

11.25.0:06: Why isn't this <ghó.on> with vowel harmony?

309-205 <ghó.de> 'he.DAT' = '(to) him'? (興宗 21.15, 24.26, 許王 39.20, 蕭仲恭 5.58, 蕭敵魯 45.11)

Why isn't this <ghó.do> with vowel harmony?

309-254 <ghó.d> 'he.PL?' = 'they'? (耶律詳穩 32.6)

Wu and Janhunen (2010: 200) identified this as a possible dative which makes sense in the context of 耶律詳穩 32. Was *-de shortened to -d after gho?

Added 11.25.23:33:

309-341 <ghó.er> 'he.ACC/INST' = '(by) him'? (道宗 17.23)

Why doesn't this suffix have harmonic variants: e.g., *<or>?

309-339 <ghó.i> 'he.GEN' = 'his'? (蕭仲恭 48.10)

<i> does not harmonize; cf.

<b.qo.i> 'son.GEN'

Did <ghó.i> and <ghó.en> both mean 'he.GEN'?

If those blocks containing 309 plus characters for common case suffixes are forms of 'he', why are they so rare?

*11.25.23:40: 309 <ghó> also occurs as a phonogram in blocks such as

309-261-261-112-341 <ghó> (道宗 22.20)

Once again <ghó> appears next to e-graphs even though o and e typically do not mix in Altaic-type languages.

11.26.0:52: The transliteration of <ghó> seems to be based on the Chinese transcription 訛 *(ng)()o (Kane 2009: 72). It does not resemble the third person pronouns *i (singular) and *a (plural) that Janhunen (2003: 18) reconstructed for Proto-Mongolic. It is not currently possible to determine whether the Khitan and/or Proto-Mongolic pronouns are innovations. I doubt that the common ancestor of Khitan and Proto-Mongolic can ever be reconstructed in detail.

CAN ANYONE EXPLAIN THE EXTRA X IN AVESTAN AND OLD PERSIAN?

Jackson (1892: 29) wrote (converting his transliteration to Hoffmann's),

In Av. [= Avestan], we sometimes find x prefixed to ṣ̌, initial or internal, apparently without etymological value: e.g., ā-xṣ̌nuš 'up to knee', cf. Skt. abhi-jñu.

Another example in Jackson (1892: 136, 193) is the desiderative present participle zixṣ̌nā̊ŋhəmna- 'wanting to know' (cf. Skt jijñāsamāna-; Av xṣ̌n : Skt jñ < Proto-Indo-European *gn).

Was this x- added to ṣ̌ by analogy with words with 'true' x-clusters corresponding to Sanskrit kṣ-: e.g., Avestan xṣ̌aϑrəm 'rule' (cf. Skt kṣatram)?

Old Persian is not descended from Avestan, but it also had this extra x in /xšnā-/ 'know' (inchoative; see Cheung 2007: 466 for examples).

Zoroastrian Middle Persian /šnās-/ 'recognize' (inchoative) lost it. (But is the `ayin in Manichaean Middle Persian <ʕšnʔs> a reflection of the extra x?)

Avestan and Persian belong to different branches of Iranian. Did the extra x independently 'grow' twice in Iranian, once in the east (Avestan?) and again in the west (Persian)?

11.24.23:27: I placed a question mark after "Avestan" since its classification as eastern is disputed. In any case, Avestan and Old Persian do not subgroup together. They are in northeastern and southwestern branches in this tree.

WHY DID KOREANS BORROW LATE MIDDLE CHINESE GRADE II VELAR-FINAL SYLLABLES IN TWO DIFFERENT WAYS? (PART 2)

(I originally meant to post this entry last night but noticed I had overlooked something essential and decided to upload a revised version. The title should be more specific since these posts are about *K- + nonhigh, nonlabial vowel + *K syllables, but I've retained the title for the sake of continuity.)

For over twenty years I had been assuming that Middle Old Chinese *ˀraK/ˀreK/ˀrəK-rhymes all merged into Late Middle Chinese (LMC) Grade II *æK which I recently revised as *ʌ̆eK. But I had known the evidence against such a merger all along!

In modern Sino-Korean, reflexes of Middle Old Chinese (MOC) *KˀreK/KˀrəK always end in -jəK (< *eK), whereas reflexes of MOC *KˀraK end in either -jəK (< *eK) or -aŋ (< *eŋ). I conclude that the northeastern LMC (NELMC) source dialect of 8th century Sino-Korean at least partly distinguished between reflexes of those two types of syllables unlike other LMC dialects. NELMC may have undergone a chain shift not found in other dialects (*ʌ̆a > *ʌ̆e > *e) without the stages in the Late Old Chinese and Early Middle Chinese columns. Hence NELMC is not descended from the prestige dialects in those columns. 

Sinograph Gloss Middle Old Chinese Vocalization Late Old Chinese Early Middle Chinese Non-NE LMC NELMC 8th c. SK Prescriptive SK Premodern SK Modern SK
change *kˀraŋ *kʌ̆aŋ *keaŋ *kæŋ *kæŋ *kʌ̆eŋ *kʌjŋ kʌjŋ kʌjŋ kɛŋ
seventh Heavenly Stem *keŋ kjəŋ
to plow *kˀreŋ *kʌ̆eŋ *kaeŋ *kɛŋ *keŋ kjəŋ
guest *kʰˀrak *kʌ̆ak *kʰeak *kʰæk *kʰæk *kʰʌ̆ek *kʌjk kʰʌjk kʌjk kɛk
go to *kˀrak *kʌ̆ak *keak *kæk *kæk *kʌ̆ek *kʌjk kʌjk kjək
obstruct *kˀrek *kʌ̆ek *kaek *kɛk *kek *kek kjək
hide, skin; change *kˀrək *kʌ̆ək *xʱek? *hek hjək (irregular initial*)
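The pattern can be checked mechanically with a small tabulation. This is a minimal sketch using the seven words in the table with their premodern Sino-Korean readings; the helper names moc_vowel and sk_rhyme_type are hypothetical, and the slicing assumes every form here has a one-symbol initial in SK and ends in a single velar.

```python
from collections import defaultdict

# (gloss, Middle Old Chinese, premodern Sino-Korean)
READINGS = [
    ("change", "kˀraŋ", "kʌjŋ"),
    ("seventh Heavenly Stem", "kˀraŋ", "kjəŋ"),
    ("to plow", "kˀreŋ", "kjəŋ"),
    ("guest", "kʰˀrak", "kʌjk"),
    ("go to", "kˀrak", "kjək"),
    ("obstruct", "kˀrek", "kjək"),
    ("hide, skin; change", "kˀrək", "hjək"),
]


def moc_vowel(form):
    """The MOC main vowel is the symbol just before the final velar."""
    return form[-2]


def sk_rhyme_type(form):
    """Everything between the SK initial and the final velar: 'ʌj' or 'jə'."""
    return form[1:-1]


# Collect the set of SK rhyme types attested for each MOC vowel.
by_vowel = defaultdict(set)
for gloss, moc, sk in READINGS:
    by_vowel[moc_vowel(moc)].add(sk_rhyme_type(sk))

for vowel, types in sorted(by_vowel.items()):
    print("*" + vowel, sorted(types))
```

With this small sample, only MOC *a is paired with both rhyme types, while *e and *ə are paired with jə alone.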

In the prescriptive SK of 東國正韻 Tongguk chŏngun (1446), the reflexes of MOC *KˀreK/KˀrəK (KjəK) are always distinct from reflexes of MOC *KˀraK (KʌjK). That suggests the KjəK < *KeK reflexes of MOC *KˀraK were less prestigious and hence not worthy of inclusion in Tongguk chŏngun.

Were those KjəK-readings borrowed from a LMC dialect which had *KɛK instead of *Kʌ̆eK from MOC *KˀraK?

Were those KjəK-readings considered incorrect because Old Korean *e was a poorer match for the NELMC diphthong? Perhaps *ʌ̆e (short-long) had become *ʌĕ (long-short) which would have been better approximated by Old Korean *ʌj than by *e. But at least some *e-forms persisted and their reflexes are standard today.

Were those KjəK-readings considered incorrect because of a desire to keep a clear distinction between the Early Middle Chinese 庚陌 *-æŋ/*-æk and 耕麥 *-ɛŋ/*-ɛk categories that was muddied in the actual borrowings whose descendants are in use today? Were readings like kʌjŋ for 庚 artificial creations in Tongguk chŏngun?

*11.23.23:39: There is no Korean-internal reason to borrow a *k-word like 革 with h-. I suspect the h- reflects a NELMC initial *xʱ-. My theory of emphatic origins requires a lower vowel presyllable to condition emphasis in this word:

Early Old Chinese *Cʌ-krək > MOC *kˀrək

If that presyllable were *Nʌ-, it could have dropped in mainstream Chinese after emphasis whereas it fused with the *k- in the ancestor of NELMC:

Early Old Chinese *Nʌ-krək > *Nʌ-kˁrək > *ŋkˁrək > *ŋgˁrək > *gˁrək > *gʌ̆ək > *gʌ̆ek > *ɣʌ̆ek > NELMC *xʱek

The Tongguk chŏngun reading kjək may have an artificial k- based on the prestige dialectal base of the Chinese phonological tradition.

The Tongguk chŏngun readings as a whole may be an artificial compromise between that tradition and actual NELMC-based readings already in use in Korea since the 8th century AD.

WHY DID KOREANS BORROW LATE MIDDLE CHINESE GRADE II VELAR-FINAL SYLLABLES IN TWO DIFFERENT WAYS? (PART 1)

In "Chinese Grade II, Version 2015.11.19", I wrote,

The stages [of Chinese presented here] are 'generic'; as I will demonstrate later, actual dialects could differ from this model.

Here's such a brief demonstration. What I reconstructed as Late Middle Chinese (LMC) *æ normally corresponds to Sino-Korean (SK) a. But that *æ also has other correspondences:

更 'change': MOC *kˀraŋ > LMC *kæŋ : SK kʌjŋ (not *kaŋ)

羹 'soup': MOC *kˀraŋ > LMC *kæŋ : SK kʌjŋ

庚 'seventh Heavenly Stem': MOC *kˀraŋ > LMC *kæŋ : SK kjəŋ < *keŋ (not *kaŋ or *kʌjŋ)

耕 'to plow': MOC *kˀreŋ > LMC *kæŋ : SK kjəŋ < *keŋ

客 'guest': MOC *kʰˀrak > LMC *kʰæk : SK kʌjk (no aspiration; borrowed before Korean developed a phonemic aspirated /kʰ/;  not *kak)

格 'go to': MOC *kˀrak > LMC *kæk : SK kjək < *kek (not *kak or *kʌjk)

隔 'obstruct': MOC *kˀrek > LMC *kæk : SK kjək < *kek

革 'hide, skin; change': MOC *kˀrək > LMC *kæk : SK hjək < *hek (irregular initial)

(The SK forms are premodern and in IPA to facilitate comparison. *-k forms added 11.22.23:34.)

I think ʌj and jə (< Old Korean *e) reflect two different approaches to borrowing a northeastern LMC diphthong *ʌe corresponding to *æ before velars in other Middle Chinese dialects. ʌj matched the first vowel of *ʌe and approximated the second with a glide. Old Korean *e matched the second, more prominent half of *ʌe (which I could more precisely write as *ʌ̆e).

11.23.13:34: If you look carefully, you can see a pattern among the SK readings that I missed when I wrote this entry. I'll reveal that pattern next time.

HOW DID MEOW BECOME ME-NG-OW?

While looking through Thurgood's From Ancient Cham to Modern Dialects (1999) for examples of diphthongs resulting from vocalic splits*, I found two unusual-looking forms:

Western Cham maŋiau 'cat' from Proto-Chamic *miaw (p. 159)

Phan Rang Cham pimaw 'mushroom' from Proto-Chamic *bɔh maw (p. 158)

'Cat' has a -ŋ- that 'grew' in the middle of *miaw. Moreover, there is an extra -a- between m- and this -ŋ-. Why was this originally monosyllabic word stretched into two syllables when the general tendency was to compress? Examples from Thurgood (1999: 112):

Proto-Chamic *bara > Western Cham pra 'shoulder'

Proto-Chamic *bulan > Western Cham ea plan 'moon, month' (What does ea mean?)

Proto-Chamic *bulu > Western Cham plau 'body hair'

'Mushroom' disturbs me because i and ɔ are very different vowels. I fear that Old Chinese presyllabic vowels might have undergone nonrecoverable changes similar to those that occurred in this word.

11.21.23:46: I am also disturbed by Phan Rang Cham mɨyau 'cat' (p. 159). Could this word be like a pre-Western Cham *mayiau, a stretched form whose intrusive *-y- nasalized to *-ɲ- under the influence of the preceding *m-? But why would  *-ɲ- back to -ŋ-?

11.22.0:19: Western Cham does have ɲ [Thurgood 1999: 274], though I do not know if word-medial -ɲi- is possible. Did *-ɲi- become -ŋi-?

CHINESE GRADE II, VERSION 2015.11.19

(More like version 2015.11.21, since I have revised this entry over the past two days.)

In my last entry, I reconstructed the Late Old Chinese reading of the Grade II word 講 'discuss, explain' (later 'speak') as *kəoŋʔ. I think I've reconstructed Grade II words with similar diphthongs before. In any case, here's how I think *-ˁr- (> Middle Chinese Grade II) and *-r- (> Middle Chinese Grade III) syllables developed between Middle Old Chinese (after emphasis had developed) and Late Middle Chinese:

Middle Old Chinese Vowel bending Vocalization Late Old Chinese Early Middle Chinese Late Middle Chinese
*-ˁre *-re *-ʌe *-ae *-ɛ(j) *-æ(j)
*-re *-rie *-ɨie *-ɨe *-ɨi
*-ˁra *-ra *-ʌa *-ea *-ɛ *-(j)æ
*-ra *-rɨa *-ɨa *-ɨə *-iə > *-ø > *-y
*-ˁroh *-ro *-ʌo *-əw *-əw
*-ro *-ruo *-ɨuo *-uo *-u
*-ˁri *-rei *-ʌej *-aej *-ɛj *-æj
*-ri *-ri *-ɨi
*-ˁrə *-rəɨ *-ʌəɰ *-aej *-ɛj *-æj
*-rə *-rɨə *-ɨə *-ɨ *-ɨi
*-ˁru *-rou *-ʌow > *-ʌew *-aew *-ɛw *-æw
*-ru *-ru *-ɨu *-u *-ɨw

The stages are 'generic'; as I will demonstrate later, actual dialects could differ from this model.

I wrote *-ˁ- before *-r- since I follow Baxter and Sagart in writing emphasis before the first consonant of a cluster: e.g., *pˁr-. But I think emphasis was a feature of all consonants in a cluster: e.g., /pˁr/ was phonetically [pˁrˁ].

Between Middle and Late Old Chinese, nonemphatic *-r- became a high central *-ɨ- that fused with nonfront high vowels, but emphatic *-ˁr- became lower-mid back *-ʌ- before lower vowels. (I decided to write the reflex of *-ˁr- as *-ʌ- because emphasis is associated with backing and because *-ˁrə did not become long *-əː.)

Perhaps vocalization predated vowel bending: e.g., *-re > *-ɨe > *-ɨie.

In Late Old Chinese, lower series vowels dissimilated:

*ʌa (achromatic-achromatic) > *ea (palatal-achromatic) : ae (!) in Baxter's Middle Chinese notation

*ʌə (achromatic-achromatic) > e > *ae (achromatic-palatal) : ea (!) in Baxter's Middle Chinese notation

*-ʌow (achromatic + labial + labial) > *-ʌew > *-aew (achromatic + palatal + labial) : aew in Baxter's Middle Chinese notation

In Early Middle Chinese, lower series diphthongs monophthongized

*ea >

*ae >

In Late Middle Chinese, merged into *æ.

Toward the end of the Late Middle Chinese period, a *-j- developed between velars and *æ in at least some dialects: e.g., the ancestor of Mandarin and the source of Sino-Vietnamese.

The first vowel in the achromatic-achromatic diphthong *ɨə dissimilated. Then the resulting *iə fused into a front mid labial vowel that raised:

*-ɨə > *-iə > *-ø > *-y

Go-on -o < *-ɨə, Kan-on -yo < *-iə, Sino-Korean < *-ø, and Sino-Vietnamese < *-y reflect these four stages. (However, those four types of Sinoxenic were borrowed from four different Middle Chinese dialects at four different times, so it is possible that they reflect different dialect developments: e.g., Sino-Korean could be from a conservative *-ɨə or even *-ə and Sino-Vietnamese -ư, an unrounded vowel, may directly reflect an *-ɨ from *-ɨə.)

I am not happy with the diphthongs I reconstructed. I should examine the diphthongs of Mon-Khmer languages which have undergone vocalic splits and diphthongization to get a feel for which vowel sequences are plausible. (The only such Mon-Khmer language I am familiar with is Khmer which does not have ʌa, etc.)

11.21.22:42: I originally intended to include onset development in the table above but omitted it to focus on the vowels. Here are a few examples of how *r-clusters changed over time:

Sinograph Gloss Middle Old Chinese Vowel bending Vocalization Late Old Chinese Early Middle Chinese Late Middle Chinese
snake *pˁra *pra *pʌa *pea *pæ
skin *pra *prɨa *pɨa *pua *puo > *pu *fu
to crouch *dˁreʔ *ɖreʔ *ɖʌeʔ *ɖaeʔ *ɖɛ(j)ˀ *ʈɦǽ(j)
bug *dreʔ *ɖrieʔ *ɖɨieʔ *ɖɨeʔ *ɖɨeˀ *ʈɦɨí
household *kˁra *kra *kʌa *kea *kæ *kjæ
plant, place name *kra *krɨa *kɨa *kɨə *kiə > *kø > *ky

*ɨa became *ua after labials in syllables with zero or glottal codas: *Pɨa(H) > *Pua(H).

In late Early Middle Chinese, final mid vowels in diphthongs raised (and merged with the preceding vowel if it was identical):

*uo > *uu > *u

*ɨe > *ɨi

*ie > *ii > *i (no examples in this post because *ie was only in syllables that did not have medial *-r-)
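
The raising-and-merger rule above is mechanical enough to sketch in Python. This is only an illustration under my own assumptions: the function name raise_final_mid is made up, diphthongs are written as two-character strings, and only final *o and *e are covered because those are the only final mid vowels in the three examples.

```python
# Hypothetical illustration of the late Early Middle Chinese rule:
# a diphthong-final mid vowel raises, and the result merges with the
# preceding vowel if the two are now identical.
RAISED = {"o": "u", "e": "i"}  # assumption: only *o and *e are modeled


def raise_final_mid(diphthong: str) -> str:
    """Raise a final mid vowel; collapse the output if both vowels match."""
    if len(diphthong) != 2 or diphthong[1] not in RAISED:
        return diphthong  # the rule does not apply
    first, second = diphthong[0], RAISED[diphthong[1]]
    return first if first == second else first + second


for source in ("uo", "ɨe", "ie"):
    print(f"*{source} > *{raise_final_mid(source)}")
```

The intermediate *uu and *ii stages are skipped in the sketch; it goes straight from the input diphthong to the merged output.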

Coronals developed retroflex allophones before rhotics:

*t(ʰ)r > *ʈ(ʰ)r > *ʈ(ʰ)

*dr > *ɖr > *ɖ

*nr > *ɳr > *ɳ

*(t)s(ʰ)r > *(t)ʂ(ʰ)r > *(t)ʂ(ʰ)

*(d)zr > *(d)ʐr > *(d)ʐ

Those allophones became phonemic after the rhotics were lost.
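
Here is a rough Python sketch of that two-step development (allophonic retroflexion, then loss of the rhotic). The function names are hypothetical, onsets are plain strings, and I assume that *dr and *nr end up as plain retroflex *ɖ and *ɳ in parallel with the other rows. In affricates only the sibilant is rewritten, following the *(t)ʂ and *(d)ʐ notation above.

```python
# Hypothetical sketch: coronal + *r onsets in two ordered steps.
STOPS_AND_NASAL = {"t": "ʈ", "d": "ɖ", "n": "ɳ"}
RETROFLEXES = set("ʈɖɳʂʐ")


def retroflex_before_r(onset: str) -> str:
    """Step 1: coronals become retroflex allophones before *r (the *r stays)."""
    if not onset.endswith("r"):
        return onset
    body = onset[:-1]
    if "s" in body or "z" in body:
        # sibilants and affricates: only the sibilant is written as retroflex
        body = body.replace("s", "ʂ").replace("z", "ʐ")
    else:
        body = "".join(STOPS_AND_NASAL.get(ch, ch) for ch in body)
    return body + "r"


def lose_r(onset: str) -> str:
    """Step 2: *r drops after a retroflex, so retroflexion becomes phonemic."""
    if onset.endswith("r") and any(ch in RETROFLEXES for ch in onset):
        return onset[:-1]
    return onset


for onset in ("tr", "tʰr", "dr", "nr", "tsʰr", "dzr", "kr"):
    step1 = retroflex_before_r(onset)
    print(f"*{onset} > *{step1} > *{lose_r(step1)}")
```

Noncoronal clusters such as *kr pass through both steps unchanged, since their *r vocalized rather than conditioning retroflexion.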

Final glottal stops conditioned glottalization in Early Middle Chinese which in turn led to a tone in Late Middle Chinese. (I think it might be better to translate 聲 as 'phonation' rather than as 'tone' for Early Middle Chinese.)

Labials weakened to dentilabial fricatives before *u: *pu > *fu.

Voiced obstruents became voiceless-obstruent-*ɦ clusters in Late Middle Chinese (Pulleyblank 1984).

IS SANYA MANDARIN MANDARIN?

Today I found Thurgood et al.'s A Grammatical Sketch of Hainan Cham and was astonished by Appendix C on 三亞 Sanya Mandarin. An informant read a Mandarin newspaper out loud with surprising results:

- the codas included -k and -t unlike any Mandarin variety I have ever seen but like southern Chinese languages: e.g.,

kok²⁴ 'country' (cf. LOC *kwək, standard Mandarin guó)

siet²⁴ 'snow' (cf. LOC *swɨat, standard Mandarin xuě)

but LOC *-p corresponds to *-t:

zet²⁴ 'leaf' (cf. LOC *jɨap, standard Mandarin yè)

sit²⁴ 'ten' (cf. LOC *dʑɨəp, standard Mandarin shí)

- the codas included glottal stops and nasal-glottal stop clusters generally where they would be reconstructed in Old Chinese (!): e.g.,

kiuʔ⁴ ³ 'nine' (cf. LOC *kuʔ, standard Mandarin jiǔ)

kiaŋʔ⁴ ³ 'speak' (cf. LOC *kəoŋʔ, standard Mandarin jiǎng)

but see the exceptions below!

- "the numbers are all read in Cham except dates": e.g.,

六十年 as Cham naːnʔ³³ piu⁵⁵ tʰun³³ instead of Chinese *lok²⁴ sit²⁴ nien²¹

How much does this reading pronunciation correspond to the informant's spoken Mandarin? Is it possible that this reading pronunciation is derived from a conservative southern Chinese language? There seem to be at least four strata in the reading pronunciations.

The first two strata (a conservative southern Chinese language and Cham) are listed above.

The third stratum looks like Mandarin and must be recent. It is characterized by an absence of final stops (and tonal differences I will explore later):

社會主義 se³³ huj³³ tsu²¹ zi³³ 'socialism' (cf. the LOC readings *dʑiaʔ, gwas, tɕuoʔ, and ŋɨajh - of course there was no LOC word 'socialism' - and standard Mandarin shèhuì zhǔyì)

but 主席 tsiuʔ⁴ ³ si³³ 'chairman' (cf. the LOC readings *tɕuoʔ and *ziak) preserves the final glottal stop in the morpheme 主 'master' - though the following morpheme lacks *-k!

tə²¹ 'get' (cf. LOC *tək and standard Mandarin dé)

s(i)o²¹ 'speak' (cf. LOC *ɕwɨat and standard Mandarin shuō)

The word 學習 sioʔ²⁴ sit²⁴ 'to study' is a combination of forms from different strata like 'chairman'. However, in 'chairman' the first syllable was more archaic, whereas in 'to study' the second syllable sit²⁴ is more archaic (cf. LOC *zɨəp and standard Mandarin xí without a stop coda). sioʔ²⁴ (< *xioʔ < LOC *gəuk) is like 合肥 Hefei Mandarin ɕyɐʔ with a glottal stop that is a trace of the *-k preserved in the oldest Chinese stratum. Hence I think sioʔ²⁴ is from a fourth stratum that is slightly more archaic than standard Mandarin xué which has no final stop.

IS 'WING' FROM 'BRANCH' IN CHINESE?

In my previous entry, I reconstructed Old Chinese

*Cɯ.ke > *ke 'branch' (spelled 支 or 枝 with 木 'tree'), 'limb' (spelled 肢 or 胑 with 肉 'flesh')

Could *C have been *s- if 'limb' is related to 翅 *sɯ.ke-s 'wing'?

Baxter and Sagart (2014: 140) reconstructed 翅 as Old Chinese *s-kʰe-s and *kʰe-s with an aspirated *kʰ- absent from their reconstruction of 'branch' as *ke.

Below I reconstruct a single 翅 *sɯ.ke-s 'wing' that underwent three different paths of reduction. I have included 'branch' and a possible cognate 咫 'foot (8 inches)' (< 'length of a branch'?) for comparison.

Sinograph Early Old Chinese Presyllabic vowel neutralization;
emphasis phonemic
Early *s.k-reduction Late *s.k-reduction Later reflexes
Phase 1: syncope Phase 2: cluster to aspirate Phase 1: syncope Phase 2:  cluster to fricative
*sɯ.ke-s *sə.ke-s *sə.ke-s *sə.ke-s * *ɕieh Middle Chinese *ɕieʰ; perhaps also a few modern forms like 漳浦 Zhangpu Min si unless their s- is from *tɕʰ-
* *kʰe-s *kʰe-s *tɕʰieh Most Chinese varieties: e.g., Mandarin chi
*ke-s *kieh Northwestern Middle Chinese *keʰ implied by 翅 in transcriptions of Indic ke-like syllables; no modern descendants
支枝肢胑 *sɯ.ke *sə.ke *ke *tɕie Middle Chinese *tɕie; most Chinese varieties
*kie Min forms with k-
*sɯ.ke-ʔ *sə.ke-ʔ *keʔ *tɕieʔ Middle Chinese *tɕieˀ

Early *s.k-reduction is part of the same wave of changes as *C.l-reductions 2-3 in this post. Similarly, late *s.k-reduction is part of the same wave of changes as *C.l-reductions 4-5.

If *sɯ- is part of a root 'branch', then *sɯ- was lost in 'limb, branch' prior to late *s.k-reduction whereas it was never dropped in the derived word 'wing'.

A wild possibility is that *sɯ- was lost in 'limb, branch' very early on - even before emphasis - but the resulting *ke did not become emphatic due to analogy with 'wing'. However, I would expect 'wing' to be remodeled after 'branch' rather than the other way around. In English, 'branch' is more common than 'wing'. Was the frequency the other way around in early Chinese?

WHY DOES ARABIC HAVE EMPHATICS IN LOANWORDS FROM LANGUAGES WITHOUT EMPHATICS?

Normally if a language with sound X borrows from a language without sound X, I wouldn't expect sound X to be in borrowings. So for instance Hindi-Urdu has voiced aspirates and English doesn't. Hence I wouldn't expect voiced aspirates in Hindi-Urdu loanwords from English. (If there are such loanwords with gh, etc., I'd like to know about them.)

However, Hindi-Urdu loanwords from English do have retroflex stops even though English doesn't: e.g., ḍākṭar from doctor.

The reason is that Hindi-Urdu lacks alveolar stops, and to Hindi-Urdu speakers, English alveolar stops are perceived as being closer to Hindi-Urdu retroflex stops than to Hindi-Urdu dental stops.

I have long known that Arabic has emphatics (in bold) in loanwords even from languages without emphatics: e.g.,

Latin strata > Greek strata > Aramaic ʕsr > Arabic iraː 'way'

(11.17.19:59: I presume the Aramaic form had a vowel added before s to break up the initial cluster: str- > ʔVs.tˁV.r-. Did s really simplify to  in Arabic? Could the Arabic word be from Middle Persian <slt> /srat/ 'street'?)

French bicyclette > Moroccan Arabic bəqʃliːa

Spanish falta > Moroccan Arabic faːla 'error, offense'

French automobile > Moroccan Arabic tˁuːmubiːl

French déserter > Algerian Arabic zarˁtˁa

French exercice > Algerian Arabic garˁsˁ (with French /gz/ simplified to /g/)

Italian gelati > Tunisian Arabic ʒiːlaː 'ice cream'

Turkish abla > Egyptian Arabic ʔala 'older sister'

French tante > Egyptian Arabic an

Those examples are from Kossmann's chapter on borrowings in Owens (2013). I didn't see any explanation for those emphatics there, but I guess that they might have something to do with approximating foreign vowel qualities: e.g., it would make sense to borrow automobile as tˁuːmubiːl if /u/ had a lower allophone [ʊ] after emphatic /tˁ/ that was closer to foreign o than the higher allophone of /u/ after nonemphatic /t/.

Perhaps I am on the right track - at least for Moroccan Arabic (MA). Kenstowicz and Louriz wrote in their abstract:

MA has three vowel phonemes /a/ /i/ /u/ (as well as an epenthetic schwa). They take lowered and retracted allophones [ɑ], [e] and [o] respectively, when tautosyllabic with an emphatic consonant. The latter are redundant and predictable variant of the corresponding phonemes. This would lead one to predict that they should play no role in loanword adaptation. Also, since French lacks emphatic consonants, we expect that the above mentioned allophones should be absent completely from French loanwords in MA. However, consideration of French loanwords in MA shows that French /ɑ/, /e/ and /o/ can be identified with the MA allophones that appear in emphatic contexts in the native phonology.

I think this is the full article. I haven't had time to read it yet.

Note that the three vowels match the three 'lower series' vowels *a, *e, *o of my Old Chinese reconstruction which condition emphasis unless preceded by the 'higher series' vowels: e.g.,

*ke > *kˁe 'chicken' (source of White Hmong qaib)

but *Cɯ.ke > *ke 'branch' (spelled 支 or 枝 with 木 'tree'), 'limb' (spelled 肢 or 胑 with 肉 'flesh')

An understanding of the apparent emphatic-nonemphatic mismatches in Arabic loanwords and their sources may help understand similar apparent mismatches in Old Chinese words and related forms in neighboring languages: e.g., between Old Chinese *kˁ- and White Hmong q- in 'chicken' (thought to be a Chinese > Hmong-Mien loan) and 'dog' (possibly a Hmong-Mien > Chinese loan?).

HOW DID HMONGIC AND MIENIC GET THEIR WORDS FOR 'IRON'? (PART 1)

The short answer is "from Chinese". Here's a longer answer. Thanks to Mark Alves for drawing my attention to this issue.

Baxter and Sagart's (2014: 160) reconstruction of the Old Chinese word 鐵 for 'iron' does not quite match the Proto-Hmongic and Proto-Mienic forms reconstructed by Ratliff (2010: 258):

Language Initial Vowel Coda 'Tone'
Old Chinese l̥ˁ- -i- -k D
Proto-Hmongic l̥- -u-! -w C
Proto-Mienic r̥-! -ɛ- -k D

I have rewritten Ratliff's hl- and hr- in IPA to facilitate comparison.

'Tone' is in quotation marks since Old Chinese did not have tones and it is not clear whether the other two proto-languages had them. Nonetheless these 'tonal' categories definitely have tonal reflexes in daughter languages.

The Proto-Hmongic form has a labial vowel *u where I would expect a palatal vowel.

Similarly, Proto-Hmongic has *u in *ʔjuw C 'small, young' borrowed from some reflex of Old Chinese 幼 *[ʔ](r)iw-s. I would have expected Proto-Hmongic *ʔjiw C, but there is no Proto-Hmongic rhyme *-iw in Ratliff's reconstruction.

Here's my attempt to (unconvincingly, I'll admit) bridge the phonetic gap between the Old Chinese and Proto-Hmongic forms: 

Proto-Hmong-Mien *-k words developed Proto-Hmongic tone C unlike Proto-Hmong-Mien *-t and *-p words which developed Proto-Hmongic tone D (Ratliff 2010: 31). I suspect Proto-Hmong-Mien *-k became pre-Proto-Hmongic *-x which merged with *-h, the source of Proto-Hmongic tone C:

Proto-Hmong-Mien Pre-Proto-Hmongic Proto-Hmongic
*-h *-h Tone C (accompanied by breathiness [ʰ]?)
*-k *-x
*-t *-ʔ Tone D (accompanied by a final [ʔ]?)

(11.16.1:19: This merger has a parallel in Chinese:

Early Old Chinese Middle Old Chinese Late Old Chinese Middle Chinese Modern Chinese
*-s *-h *-h *-ʰ Tone C
*-ks *-x
*-ts *-ts > *-s (phonetically [c]?) *-s (phonetically [ɕ]?) *-jʰ

Unlike the pre-Proto-Hmongic merger, the Chinese merger involved clusters and perhaps a chain shift if Middle Old Chinese *-s really was [s]: *-ts, *-ps > *-s > *-h.)

Old Chinese *l̥ˁik 'iron' was borrowed before the merger as pre-Proto-Hmongic *l̥ix after Proto-Hmong-Mien *-ik and *-ek had become pre-Proto-Hmongic *-ɨx.

Old Chinese *ʔiwh 'young' was borrowed as pre-Proto-Hmongic *ʔiwh.

The rhyme of 'iron' merged with the rhyme of 'young' in pre-Proto-Hmongic, and the vowel assimilated to the following glide in Proto-Hmongic:

*-ix > *-iɣ > *-iɰʰ > *-iwʰ > Proto-Hmongic *-uw C

(11.16.1:26: Was Proto-Hmongic *-uw phonetically [ʊw]? Modern reflexes include [o], [ɔ], [ə], and [aw] as well as [u]. See Ratliff 2010: 135-136.)

Perhaps *-x (from an earlier *-k) generally became pre-Proto-Hmongic *-w which was then lost after certain vowels: e.g., 

*-ɨx > *-ɨwʰ > *-ɨ C (there was no *-ɨw in Proto-Hmongic).

(11.16.0:16: I am reminded of pre-Tangut *-k which shifted to *-w which was then lost after certain vowels: e.g.,

*-ak > *-aw > -a [there was no -aw in Tangut]

However, this secondary -w in Tangut was not associated with a particular tone unlike my proposed secondary *-w in Proto-Hmongic.)

The Proto-Mienic form has *r̥- instead of *l̥-. A rhotic also underlies Vietic forms for 'iron': e.g., Vietnamese sắt < *kr-. Did Proto-Mienic and Vietic borrow 'iron' from Chinese dialects in which *l̥- became *r̥-?

Next: Chronological issues.

11.16.1:10: The Vietic borrowing may reflect an archaic *kr- cluster from an even earlier *kʌ.l-:


> mainstream Chinese *l̥ˁik > *l̥ˁit > *l̥ˁeit > *tʰet

> dialect A *l̥ˁik > *l̥ˁeik (source of Proto-Tai *l̥ek 'iron' [Pittayaporn 2009: 333] and Proto-Palaungic *l̥ek 'iron' [Sidwell 2010])

> dialect B *k.rˁik > *r̥ˁik > *r̥ˁeik (source of Proto-Mienic *r̥ˁɛk)

> dialect C *k.rˁik > *k.rˁeik > *k.rˁaik (source of Vietic forms; was -ik borrowed as a palatal stop *-c?)

(11.16.1:40: The shift *-ei > *-ai is in other southern reflexes of old emphatic syllables: e.g., 雞 *kˁe > *kai, the source of Proto-Tai *kaj B 'chicken' (the tone is unexplained and may reflect an *-h from an earlier *-s suffix in the source dialect).)

The Vietic borrowing may have displaced a native cognate of Proto-Katuic *taːʔ 'iron' (Sidwell 2005) if the Vieto-Katuic hypothesis (see Alves 2005) is correct.

Proto-Katuic *taːʔ 'iron' superficially resembles but is probably not cognate to Old Khmer <teka> /ɗɛːk/ 'iron' which Jenner compared to Siamese เหล็ก /lèk/ 'iron'. Could the Khmer form be from an Old Chinese dialect in which *k- dropped without conditioning the devoicing of the following liquid?

> dialect D *lˁik > *lˁeik > *dek 

But why was Old Chinese *d- borrowed as an Old Khmer implosive /ɗ/ instead of /d/? Could dialect D have had implosives?

> dialect D *ʔlˁik > *ʔlˁeik > *ɗek? 

Are there other instances of Old Chinese *lˁ corresponding to Khmer /ɗ/? I'd like to take a second look at Jenner and Pou's "Some Chinese Loanwords in Khmer" (1973).

FALLEN PREFIXES: GSR 0011

I don't have time to do what I said I would at the end of my last entry. A simple answer grew into a long post that I can't complete now. While researching that entry, I came across Grammata serica recensa series 0011 and thought it might be fun to apply my 'extended emphatic theory' to it taking Baxter and Sagart's reconstructions as a starting point.

First, a few words about the phonetic component of 0011: 阝+左. It represents syllables of the shape LOJ. (I use capital letters to reflect generic forms.) But - at least in the later script - it contains 左 representing syllables of the shape TSAJ/TSAR*. 左 TSAJ/TSAR is too different from 阝+左 LOJ to be a phonetic within a phonetic. The Sino-Korean reading 좌 chwa for 左 and 佐 could be from a Middle Chinese *tswa which in turn could be from an Old Chinese TSOJ/TSOR. Is the 左 in 阝+左 a partial phonetic reflecting a dialect in which 左 ended in -OJ rather than -AJ? Is there a way to reconcile L- and TS-?

Another case of a possible L-/TS- alternation is 酉 *luʔ (Baxter and Sagart: *N-ruʔ) 'wine' ~ 酒 *tsuʔ 'wine'. I proposed that 酉 and 酒 are members of an Old Chinese palatal series. But the initials of 0011 do not overlap with those of my proposed palatal series.

Maybe all of this is a nonissue if 阝+左 and 左 are in fact unrelated. My ignorance of Chinese paleography is showing.

Let's move on to something I think I understand better: the words written with 阝+左:

GSR Sinograph Gloss Early Old Chinese *C.l-reduction 1 (optional) Phonemic
*C.l-reduction 2 (optional) *C.l-reduction 3:
*sə.l(ˁ)- > *s.l(ˁ)- (optional)
after all
*s.l(ˁ)- > *l̥(ˁ)-
*C.l-reduction 4:
*sə.l- > *s.j-
after all *s.l-
> *s-
*C.l-reduction 5:
*s.l- > *z-
Middle Chinese
0011d long and narrow mountain *(CV-)loj *lojʔ *lˁojʔ *lˁojʔ *dojʔ *dwajʔ *dwaˀ
0011j hanging tuft of hair *rɯ-loj *rə-loj *r-loj *ɖuoj *ɖwɨaj *ɖwie
*tV-loj-ʔ *t-loj-ʔ *t-lˁoj-ʔ *tojʔ *twajʔ *twaˀ
*(CV-)loj-ʔ *loj-ʔ *lˁoj-ʔ *dojʔ *dwajʔ *dwaˀ
0011l lazy
0011b, 0011e 墮隋 to fall
0011a to destroy
0011e, 0011f 墮隳 *sɯ-loj *sə-loj *s-loj *l̥oj *xuoj (W. dialect) *xwɨaj *xwie
0011b to shred sacrificial meat *sɯ-loj-ʔ *s-loj-ʔ *s-lˁoj-ʔ *l̥ˁoj-ʔ *tʰojʔ *tʰwaʔ *tʰwaˀ
*sɯ-loj-s *sə-loj-s *s-loj-s *l̥oj-s *xuojh (W. dialect) *xwɨajh *xwieʰ
*sə-loj-s *s-loj-s *suojh *swɨajh *swieʰ
0011i slippery
0011k beautiful *sɯ.lojʔ-s *s.lojʔ-s *s.lˁojʔ-s *l̥ˁojʔ-s *tʰojʔ *tʰwajh *tʰwaʰ
*sɯ.lojʔ *lojʔ *lˁoj-ʔ *dojʔ *dwajʔ *dwaˀ
0011c oval *s.lojʔ *s.lˁojʔ *l̥ˁojʔ *tʰojʔ *tʰwajʔ *tʰwaˀ
0011h marrow *sɯ.lojʔ *sə.lˁojʔ *s-lojʔ *sojʔ *swɨajʔ *swieˀ
0011b (place name) *sɯ.loj *sə.loj *s.juoj *zwɨaj *zwie
0011g to follow

(Thanks to David Boxenhorn for fixing the table.)

Here is a simplified table including possibilities absent from the large table. Only one path of reduction is listed per Middle Chinese reading. There are others: e.g., Middle Chinese *sə.lˁoj could be from a *s(ə).lˁoj that reduced to *lˁoj (*C.l-reduction 2) as well as a *sʌ.loj that reduced to *loj (*C.l-reduction 1).

Early Old Chinese *C.l-reduction 1 (optional) Phonemic emphasis *C.l-reduction 2 (optional) *C.l-reduction 3:
*sə.l(ˁ)- > *s.l(ˁ)- (optional)
after all *s.l(ˁ)- > *l̥(ˁ)-
*C.l-reduction 4:
*sə.lˁ- > *s.d-
after all *s.lˁ- > *s-?
*sə.l- > *s.j-
after all *s.l- > *s-
*C.l-reduction 5:
*s.d- > *dz-?
*s.j- > *z-
Middle Chinese
*sʌ.loj *sʌ.loj *sə.lˁoj
*sə.lˁoj *s.doj? *dzwaj? *dzwa
*s.lˁoj *soj *swaj *swa
*s.loj *s.lˁoj *l̥ˁoj *tʰoj *tʰwaj *tʰwa
*loj *lˁoj *doj *dwaj *dwa
*sɯ.loj *sɯ.loj *sə.loj *sə.loj *sə.loj *s.joj *zwɨaj *zwie
*s.loj *soj *swɨaj *swie
*s.loj *l̥oj *xuoj ( W. dialect) *xwɨaj *xwie
*juoj *jwɨaj *jwie

And here is a text summary of what I think happened.

Originally, there were at least six roots:

*(CV-)loj 'long and narrow mountain'

*loj 'to fall' > 'hanging hair'; perhaps also 'lazy' (fallen?) and 'to destroy' > 'to shred sacrificial meat' and even 'slippery' (causing to fall?)

*sɯ.lojʔ 'beautiful' (and 'oval'?)

*sɯ.lojʔ 'marrow'

*sɯ.loj (place name; derived from one of the other roots?)

*sɯ.loj 'to follow'

The various *sɯ- may have had various earlier sources prior to Early Old Chinese: e.g., *si, *ɕə, *tsu, etc.

These words had variation in degrees of reduction: e.g., *sɯ.lojʔ (none) ~ *s.lojʔ (partial) ~ *lojʔ (full).

Sagart (1999) gave examples of such variation in modern languages: e.g., Phan Rang Cham cơ.lan ~ clan ~ lan 'road' (quoted from Alieva 1994; from disyllabic Proto-Austronesian *zalan).

Although I assume the degree of reduction was unpredictable, I also assume that sound changes regularly applied to consonant clusters and single initials, resulting in predictable outputs (though not inputs!): e.g., all *s.l- at any given time became the same thing. (However, *s.l- became three different things at different times: *l̥-, *s-, and *z-.) Once clusters fused into new initials, they left gaps to be filled by presyllable-initial sequences that reduced into new clusters: e.g., *s.l- > *l̥- followed by *sə.l- > *s.l-.

The Middle Chinese forms in the final column are reflexes of variants that not only happened to survive but were also considered worthy of inclusion in the lexicographic tradition. Yet other variants must have existed in speech but were not recorded.

The variation in the Middle Chinese column does not imply that any given Middle Chinese speaker had three ways to say 'to shred sacrificial meat'. Even if such a word were still in use, each speaker probably only had one way to say it, and three of those ways were regarded as sufficiently prestigious. The term Middle Chinese as used here does not refer to a single coherent language; rather, it is a set of approved forms of heterogeneous origin.

*11.15.13:55: I was wondering why Baxter and Sagart reconstructed *-r in 左 ~ 佐 *tsˁarʔ-s 'to aid, assist'.

I see that 左 'left' (not 'to aid, assist') rhymes with an *-r word in Shijing 1.V.5.3. Starostin (1989: 567) reconstructed the rhyme words as

左 : 瑳 : 儺

*tsaːjʔ 'left' : *sʰaːjʔ : *n̥aːr (or *naːr).

I have rewritten his notation into IPA to facilitate comparison with Baxter and Sagart's reconstructions:

*tsˁa[j]ʔ : *tsʰˁarʔ : *nˁarʔ.

I assume that 儺 is to be read as *nˁarʔ which would rhyme better with the other two words than its other reading *nˁar. Do commentaries point to one reading or the other?

The brackets in *[j] indicate that the coda is uncertain: it could be *-r as well as *-j. That rhyme sequence seems to indicate that 'left' ended in *-r: *tsˁarʔ. Since 'to aid, assist' was written with the same character as 'left', I think the two words probably both had *r.

PROTO-MIN AND SINO-VIETIC EVIDENCE FOR EROSION

To demonstrate my proposed stages of Old Chinese erosion, I present my reconstructions for the words with Proto-Min reflexes and/or borrowed forms in Vietic from Baxter and Sagart (2015: 71-72). I retained Baxter and Sagart's numbering of the examples.

Type A words

168. 節 stage 3 *Cʌ-tsik > *Cʌ-tsit > stage 4 *Cə-tˁsit > stage 5 *C-tˁsit > stage 6 *tsˁet 'joint'

> Proto-Min *ts-

> Vietnamese *ts- > Tết 'New Year festival'

Also cf.


pre-Tangut *Tʌ-tsik > 4739 1tsewr1 'id.'

whose retroflexion and lowered vowel reflect a lost coronal-initial presyllable.

Monosyllabic words with low series vowels automatically developed emphasis:

167. 斗 *toʔ > *tˁoʔ 'bushel; ladle'

> Proto-Min *t-

> Vietnamese *t- > đấu 'bushel'

169. 繭 *kenʔ > *kˁenʔ 'cocoon'

> Proto-Min *k-

> Vietnamese kén 'id.'

170. 芥 *krets > *krˁets 'mustard plant'

> Proto-Min *k-

> *kɛs or later *kɛjʰ > Vietnamese cải 'cabbage'

171. 點 *temʔ > *tˁemʔ 'black spot'

> Proto-Min *t-

> Vietnamese *t- > đốm 'spot' (irregular vowel)

172. 白 *brak > *bˁrak 'white'

> Proto-Min *b-

> Vietnamese bạc 'silver'

Conversely, monosyllabic words with high series vowels did not develop emphasis:

173. 而 *nə > *nə

cf. 乃 stage 3 *Cʌ-nəʔ > stage 4 *Cə-nˁəʔ > stage 5 *C-nˁəʔ > stage 6 *nˁəʔ

If those Middle Old Chinese monosyllabic words (167, 169-173) were sesquisyllabic or polysyllabic at an earlier stage, external comparison would be necessary to identify the phonemes preceding the surviving syllables. My theory predicts the lost vowels were originally low series: e.g.,

167. 斗 stage 3 *Cʌ-toʔ > stage 4 *Cətˁoʔ > stage 5 *C-tˁoʔ > stage 6 *tˁoʔ
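
The emphasis condition running through the Type A examples can be stated as a small Python predicate. This is only a sketch of my own rule, not anything from Baxter and Sagart: the function name develops_emphasis is hypothetical, and presyllabic vowels are written as *ʌ (low) and *ɯ (high) as elsewhere on this site.

```python
# Hypothetical sketch of the emphasis condition described above.
LOW_SERIES = {"a", "e", "o"}  # lower series main vowels


def develops_emphasis(main_vowel, presyllable_vowel=None):
    """Return True if the main syllable is predicted to become emphatic.

    With a presyllable, its vowel decides: low *ʌ conditions emphasis and
    high *ɯ blocks it. Without one, a low series main vowel conditions
    emphasis automatically.
    """
    if presyllable_vowel is not None:
        return presyllable_vowel == "ʌ"
    return main_vowel in LOW_SERIES


examples = [
    ("斗 *toʔ", "o", None),
    ("而 *nə", "ə", None),
    ("乃 *Cʌ-nəʔ", "ə", "ʌ"),
    ("節 *Cʌ-tsik", "i", "ʌ"),
    ("牀 *kɯ-dzraŋ", "a", "ɯ"),
]
for form, vowel, pre in examples:
    label = "emphatic" if develops_emphasis(vowel, pre) else "nonemphatic"
    print(f"{form}: {label}")
```

The predicate cannot tell whether a monosyllable with a low series vowel once had a lost low presyllable; as noted above, that takes external comparison.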

Type B words

Early Vietic presyllables could reflect stage 4 or 5 (if an epenthetic vowel was inserted to break up an initial cluster) in Chinese borrowings.

163. 牀 stage 3 *kɯ-dzraŋ > stage 4 *kə-dzraŋ > stage 5 *k-dzraŋ 'bed'

> Proto-Min *dzh-

> Vietic *kV-ɟ- > Rục /kciːŋ 2/, Vietnamese giường 'id.'

164. 種 stage 3 *kɯ-toŋʔ > stage 4 *kə-toŋʔ > stage 5 *k-toŋʔ > stage 6 *toŋʔ 'seed'

> Proto-Min *tš- may be from stage 5 or stage 6 since *t- and *k-t- merged into that Proto-Min initial

> Vietic *kV-C- > Rục /kcoːŋ 3/ 'id.', Vietnamese giống 'species, breed, strain, race, sex, gender'

165. 箴 stage 4 *tə-qəm > stage 5 *t-qəm > stage 6 *q- > *k- > *tɕ- (palatalization)

> Proto-Min *tš- (see 164)

> Vietnamese *tV-C- > găm 'bamboo or metal needle'

11.9.0:14: Later borrowings of the same word are *tV-C- > ghim and *k- > kim. The high vowels reflect the raising and fronting of *ə to *i that in turn conditioned the palatalization of *k. Kim is a borrowing of a stage 6 form *kim prior to palatalization.

the *t- remains in Lakkia /them 1/

166. 謝 stage 2 *si-lak-s > stage 3 *sɯ-ljak-s > stage 4 *sə-ljak-s 'decline, renounce'

> Proto-Min *-dzia C; the hyphen indicates a lost presyllable

> Vietnamese *CV-ɟ- > giã 'say goodbye'

I will bridge the gaps between *sə-lj-, Proto-Min *-dz-, and Vietnamese *CV-ɟ- next time.

COMEBACK

I had a good reason to not post for the past two days - and to have been emphasizing emphasis in recent posts.

On Thursday and Friday, I participated in an academic conference for the first time since 2003. Guillaume Jacques wrote a report about it. My contribution was "Old Chinese Type A/Type B in Areal Perspective".

Maybe I should have renamed my talk "Typological" instead of "Areal". But apart from one mention of Salish, I did stick to Eurasia and Egypt which is right next door.

You can download my PowerPoint presentation here. I want to supplement it with a table of my stages of (pre)syllabic erosion in Old Chinese:

Erosion stage 1 2 3 4 5 6
Number of vowels in (pre)syllables 6 4 2 1 0
Low series vowels *Ce- *Că (= *Cʌ-) *Cə- *C- *Ø-
High series vowels *Ci- *Ci- *Cɨ̆- ( = *Cɯ-)
*Cə- *Cə-
*Cu- *Cu-
Language stage Pre-Chinese Early Old Chinese Middle Old Chinese Late Old Chinese

Notes on the phases:

1. Roots were originally disyllabic with the same six vowels in both first and second syllables. Maybe either syllable could have been stressed at this point.

2. First syllables of disyllables which were unstressed (and/or lost stress?) became presyllables with less vocalic diversity than the stressed syllables that followed them. Six vowels were reduced to an Austronesian-like four-vowel system. Presyllabic *i may have left traces in some syllables that distinguish it from other high series presyllabic vowels. With the exception of those syllables, it is impossible to determine whether a high vowel syllable had *i, *ə, or *u without non-Chinese evidence. Hence I consider this stage to be pre-Chinese.

3. Low vowel *Că- presyllables (which I have been writing as *Cʌ- on my site) conditioned emphasis:

*Că-Ca > *Că-Cˁa

All high series presyllabic vowels merged into *ɨ̆ (which I have been writing as *ɯ on my site). No emphasis developed after *Cɨ̆-presyllables:

*Cɨ̆-Ca > *Cɨ̆-Ca (no change)

4. The two presyllabic vowels merged into schwa. Emphasis is no longer predictable and becomes phonemic.

*Că-Cˁa > *Cə-Cˁa /CəCˁa/

*Cɨ̆-Ca > *Cə-Ca /CəCa/

5. The presyllabic vowels are lost, and presyllables become preinitials.

*Cə-Cˁa > *CCˁa

*Cə-Ca > *CCa

6. Preinitials were lost:

*CCˁa > *Cˁa

*CCa > *Ca
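
The six stages can be walked through mechanically. Below is a minimal Python sketch under my own simplifying assumptions: the Form class and the stage functions are hypothetical names, every word is treated as a presyllable plus a main syllable, and each stage applies to every word at once, which ignores the variation in degree of reduction.

```python
from dataclasses import dataclass, replace

LOW_SERIES = {"e", "a", "o"}
SHORT_A = "a\u0306"       # *ă, which I write as *ʌ elsewhere
SHORT_I = "\u0268\u0306"  # *ɨ̆, which I write as *ɯ elsewhere


@dataclass(frozen=True)
class Form:
    pre_c: str              # presyllable consonant ("" once lost)
    pre_v: str              # presyllable vowel ("" once lost)
    onset: str
    rhyme: str
    emphatic: bool = False  # emphasis on the main syllable onset

    def __str__(self) -> str:
        pre = f"{self.pre_c}{self.pre_v}-" if self.pre_c else ""
        mark = "ˁ" if self.emphatic else ""
        return f"*{pre}{self.onset}{mark}{self.rhyme}"


def stage2(form):
    # low series presyllabic vowels merge into *ă; *i, *ə, *u remain
    return replace(form, pre_v=SHORT_A) if form.pre_v in LOW_SERIES else form


def stage3(form):
    # *ă conditions emphasis; the high series vowels merge into *ɨ̆
    if form.pre_v == SHORT_A:
        return replace(form, emphatic=True)
    return replace(form, pre_v=SHORT_I)


def stage4(form):
    # both presyllabic vowels merge into schwa; emphasis is now phonemic
    return replace(form, pre_v="ə")


def stage5(form):
    # the presyllabic vowel is lost, leaving a preinitial
    return replace(form, pre_v="")


def stage6(form):
    # the preinitial is lost
    return replace(form, pre_c="")


def erode(form):
    """Return the form at stages 1 through 6."""
    history = [form]
    for step in (stage2, stage3, stage4, stage5, stage6):
        history.append(step(history[-1]))
    return history


for start in (Form("C", "e", "C", "a"), Form("C", "u", "C", "a")):
    print(" > ".join(str(f) for f in erode(start)))
```

The sketch writes the stage 5 preinitial with a hyphen (*C-Cˁa) rather than as *CCˁa; that is purely a display choice.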

Presyllables could be in various degrees of reduction at any given time: e.g., in the earliest period, unstressed *Ce (stage 1) could have been optionally pronounced as *Ci (stage 2). In Middle Old Chinese, *Cə- and its reduction *C- coexisted side by side. This is analogous to the different degrees of reduction of unstressed vowels in English ranging from spelling-like pronunciations to schwa or even zero. In a few cases, unstressed syllables can disappear entirely: e.g., because [bɪˈkʌz] ~ [bɨˈkʌz] ~ [bəˈkʌz] ~ [bkʌz] ~ cause [kʌz].

11.8.3:09: Applying my stages to those forms of because:

Stage 1/2: [bɪˈkʌz]

Stage 3: [bɨˈkʌz] (frontness neutralization)

Stage 4: [bəˈkʌz] (height neutralization)

Stage 5: [bkʌz] (loss; > [pkʌz] with voicing assimilation?)

CONSONANTAL VS. VOCALIC THEORIES OF CHINESE EMPHASIS

David Boxenhorn asked me about the implications of consonantal and vocalic theories of Chinese emphasis.

First, let me define how I interpret 'consonantal' and 'vocalic' in this context.

I regard Baxter and Sagart's (2014) reconstruction as a consonantal theory. In my understanding of their system, the locus of emphasis is restricted to the initial consonants of core syllables; there are no emphatic preinitials or presyllable initials, no emphatic vowels, and no spreading of emphasis beyond the consonants.

I advocate what could be called a vocalic theory in the sense that emphasis was ultimately conditioned by low vowels in what I call Early Old Chinese. But in Middle Old Chinese, some of those low vowels were lost, and the locus shifted to the consonant (though emphasis was phonetically present in the following vowel if not the coda). Then in Late Old Chinese, emphasis was lost, and previously predictable vocalic allophones after emphatic and nonemphatic consonants became phonemic:

Early Old Chinese (no phonemic emphasis): /Cʌpi/ [Cʌpi] > [Cˁʌˁpˁiˁ]

Middle Old Chinese (phonemic emphatic consonants): /pˁi/ [pˁiˁ] > [pˁeˁiˁ]

Late Old Chinese (no phonemic emphasis): /pei/ [pei]

So my theory could also be called consonantal or even syllabic depending on which period one is looking at and whether one is looking at phonemes or allophones.

Now back to the question.

Baxter and Sagart (2014: 69) reconstruct 36 emphatic consonants. 35 of them have nonemphatic counterparts; the 36th, *ʔʷˁ, lacks a nonemphatic counterpart *ʔʷ. Conversely, there are no nonemphatic consonants lacking emphatic counterparts. The near-total symmetry between the emphatic and nonemphatic subsets of consonants is striking; it is reminiscent of the near-total symmetry between

- the emphatic and nonemphatic subsets of the phonetic (but not phonemic!) inventory of Cairene Arabic

- the palatalized and nonpalatalized consonants in Russian

(Norman 1994, the originator of the Chinese emphatic theory, regarded Russian nonpalatalized consonants as pharyngealized: i.e., what I call 'emphatic'; in any case, the palatalized consonants are not simply nonemphatic.)

If we knew nothing about Slavic language history, we might notice how other Slavic languages have smaller sets of palatalized consonants or even no palatalized consonants at all (e.g., Serbo-Croatian), conclude that Russian is conservative, and project the Russian system back into Proto-Slavic. But that would be a mistake, as we know that palatalization in Slavic was secondary and conditioned by front vowels. The short front vowel */ĭ/ was lost, and palatalized consonant allophones that had once been before it and other front vowels were reinterpreted as phonemes:

*/Cĭ/ [Cʲɪ] > /Cʲ/ [Cʲ] (after loss of short */ĭ/)

*/Ci/ [Cʲi] > /Cʲi/ [Cʲi] (nonshort */i/ retained)

The large phonetic inventory of Cairene Arabic emphatics is due to emphatic spread from five emphatic phonemes /tˁ dˁ sˁ zˁ rˁ/ and the vowel /ɑ/ (Youssef 2014); there is no need to assume that Cairene Arabic preserved a far larger inventory of emphatics than Classical Arabic. (Note, however, that emphatic /rˁ/ and a back /ɑ/ phoneme distinct from /a/ do not exist in Classical Arabic. The origins of these two phonemes are deserving of investigation. I do not assume that all non-Classical traits of modern Arabic varieties are innovations; some could be retentions of traits conserved in the nonstandard dialects of Arabic conquerors but lost in the standard.)

The precedents of Slavic and Cairene Arabic make me hesitant to project the gigantic Old Chinese inventory back into a higher node or even Proto-Sino-Tibetan. There is, to the best of my knowledge, no attested Sino-Tibetan language with such an inventory. Emphatics have not been reported in any variety of Chinese. (Perhaps they are waiting to be detected; sometimes we are blind to the unexpected.) Was the Old Chinese consonant system the last remnant of a huge proto-system that was simplified everywhere else?

I don't think so. I have never seen so many emphatics in any other proto-language. I already wrote about Afroasiatic emphatics at some length last week, so here I will merely state that Ehret's (1995) Proto-Afroasiatic has only seven voiceless emphatics which mostly form 'triads' with nonemphatic voiced and voiceless consonants:

*p' *t' *tl' *s' *c' *k' *kʷ' vs. *p *t (no *tl) *s *c *k *kʷ vs. *b *d *dl *z *j *g *gʷ

In Proto-Afroasiatic, emphatics were ejectives which generally later became pharyngealized in Arabic. (Mehri emphatics seem to be in transition.)

Johanna Nichols' (2003) Proto-Nakho-Dagestanian (Northeast Caucasian) also has such triads:

Ejectives *t' *c' *cc' *č' *čč' *ƛ' *ƛƛ' *k' *kk' *q' *qq'
Voiceless *t *c *cc *čč *ƛƛ *k *kk *q *qq
Voiced *d - *ǯ - - *g - *G -

There is a phonetic reason for triads instead of tetrads with voiceless as well as voiced ejectives: voiced ejectives do not and cannot exist.

The presence of voiced as well as voiceless emphatics in Old Chinese indicates that Old Chinese was not like Arabic - that its pharyngealized consonants were not from earlier *ejectives.

Interestingly, Nichols does not reconstruct pharyngealized consonants in Proto-Nakho-Dagestanian even though they are present in Archi and Rutul. I have not found pharyngealized consonants in Nichols' lists of Archi and Rutul reflexes. That suggests the pharyngealized consonants of those two languages are rare and possibly secondary.

I think Old Chinese pharyngealized consonants are also secondary. But why do I think low vowels conditioned pharyngealization? The low vowel a is the syllabic counterpart of the pharyngeal approximant ʕ (Pulleyblank 1997 and Operstein 2010: 177); it is to ʕ what i, ɨ, and u are to j, ɰ, and w. So I expect an Old Chinese *a-like low vowel to condition pharyngealization in neighboring segments - much as back /ɑ/ does in Cairene Arabic - particularly given that northern neighbors of Chinese and their neighbors have harmonic systems in which vowel and consonant qualities are intertwined to some extent. I write the unstressed low vowel triggering pharyngealization as *ʌ, borrowing the symbol for the conventional interpretation of arae a 'bottom a' (ㆍ), the minimal low vowel of Middle Korean. I could have written it as *ă, but I wanted a symbol that was easy to distinguish from *a and that reflected my hypothesis that Chinese once had height harmony like Middle Korean.

My vocalic emphatic theory predicts that all Middle Old Chinese words with emphatic consonants once had emphasis-triggering low vowels. I used to think that Old Chinese *e and *o belonged to the same height class as *a (as they do in my reconstruction of Old Korean) and also triggered emphasis, but I am less certain of that. Maybe *e and *o-syllables also needed a preceding true low vowel to become emphatic:

*(Cʌ)Ce > *Cˁe (no presyllable needed) or *CʌCe > *Cˁe but *Ce > *Ce?

I could reinterpret *e and *o as *aj and *aw or *ja and *wa with *a (cf. Pulleyblank's *ə/*a two-vowel reconstructions of Old Chinese), but that has costs: e.g., it forces me to reinterpret *-ew as *-aɥ, etc.

Mid vowels aside, my theory predicts that Middle Old Chinese words of the type emphatic consonant + higher vowel (*Cˁi / *Cˁə / *Cˁu) should be from earlier *CACI (*A = low vowel and *I = high vowel) sequences. If Chinese borrowed such words from a polysyllabic language (e.g., Austronesian) or vice versa, the polysyllabic sources/borrowings should have begun with low vowels at the time of borrowing. Some Austronesian words with possible Old Chinese relatives pose problems for my theory; I will deal with them next time.

Conversely, if we assume that Old Chinese emphatic consonants are not innovations, then Old Chinese borrowings may preserve emphasis that was once present in the donors. The trouble is that there is no independent or internal evidence to suggest that Austronesian, Kra-Dai, Hmong-Mien, etc. had emphatics. If Old Chinese 狗 *Cə.kˁro 'dog' is a borrowing from Proto-Hmong-Mien *qluwX 'id.', why does it have an emphatic? My vocalic theory could account for the emphasis as being from a low presyllabic vowel and/or the low series vowel *o. (That would be the case even if the direction of borrowing were reversed.)

Lastly for now, my theory predicts that vowel heights for Old Chinese prefixes can be recovered if correlations between prefixes and emphasis can be made. I have yet to test this prediction.

On the other hand, the consonantal theory predicts no correlation between prefixes and emphasis since prefixes are invariably reconstructed with nonemphatics and can occur before both emphatic and nonemphatic-initial roots.

A CEREBRAL COUNTEREXAMPLE? OLD CHINESE 首 'HEAD'

Last night, I wrote that there was

a specific word that made me question my old uvular theory - possibly even before I saw Baxter and Sagart's uvular proposal years ago.

That word is 首 'head' which Baxter and Sagart reconstructed as *l̥uʔ. I regard that form as Middle Old Chinese.

Last week, I wrote,

the [Early] Old Chinese initial of 'head' may be from *Kl- [...] if the word is related to Proto-Austronesian *quluh and/or Proto-Tai *krawC (Pittayaporn 2009: 323) / *kləwC (Li Fang-Kuei 1977). Proto-Hmong-Mien *kləuX 'road' (Ratliff 2010: 264) is a loan from Old Chinese 道 containing 首 as a phonetic. If 道 had an initial stop, perhaps 首 did too.

According to my old uvular theory, a *q- would be sufficient to trigger emphasis. If 首 'head' is from *quluh, its later reflexes should have stop initials and lowered vowels as traces of emphasis:

*quluh > *quluʔ > *qɯluʔ > *qluʔ > *l̥ˁuʔ > *l̥ˁouʔ > *tʰouʔ > *tʰauʔ

(The similarity to 頭 Mandarin [tʰou] and Cantonese [tʰau] 'head' is coincidental; 頭 and 首 are unrelated words.)

But the actual Late Old Chinese form of 首 was *ɕuʔ with a fricative from nonemphatic *l̥- and a high vowel that was never lowered by emphasis.

Maybe 首 has nothing to do with Proto-Austronesian *quluh and originally had a *k- as in the Proto-Tai word and the Proto-Hmong-Mien borrowing of its near-homophone 道 'road'.

Or maybe they are tied together.

Baxter and Sagart do not reconstruct *q- as a preinitial or in presyllables. In the dialect reconstructed by Baxter and Sagart, *q- is not automatically emphatic, and I speculate that nonemphatic preinitial/presyllabic *q- fused with nonemphatic *l into voiceless nonemphatic *l̥.

If Proto-Tai *krawC is a loan from Old Chinese, it could be from a dialect in which the presyllabic vowel lowered after a uvular that then fronted and became emphatic (due to lower vowel-emphatic harmony):

*qul- > *qɯl- > *qʌl- > *kˁʌl- > *kˁl- > *kˁr-

That emphasis conditioned the lowering of *-u to *-aw.

Proto-Hmong-Mien *kləuX 'road' may be from yet another dialect - one which did not shift *kˁl- to *kˁr-:

*qʌluʔ > *kˁʌluʔ > *kˁluʔ > *kˁlouʔ > *kləuʔ

The rhyme *-əuʔ may be an intermediate stage in bending between *-ouʔ and *-auʔ.

The ancestors of modern Chinese words for 'road' lost the presyllable after it conditioned emphasis:

*qʌluʔ > *kˁʌluʔ > *kˁʌlˁuʔ > *lˁuʔ > *douʔ > *dəuʔ (?) > *dauʔ
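The derivation above can be read as an ordered list of rewrite rules. What follows is a minimal Python sketch under that assumption. The rule list `ROAD_RULES`, the function name `derive`, and the way the changes are cut into six steps are illustrative simplifications, not part of any published reconstruction.

```python
# Hypothetical rule list: each pair is (old substring, new substring).
# The six steps simply mirror the derivation of 'road' given above.
ROAD_RULES = [
    ("q", "kˁ"),     # uvular fronts and becomes emphatic before low *ʌ
    ("ʌl", "ʌlˁ"),   # emphasis spreads from the presyllable to *l
    ("kˁʌ", ""),     # the presyllable is lost
    ("lˁu", "dou"),  # emphatic *lˁ hardens to *d; *u lowers to *ou
    ("ou", "əu"),    # possible intermediate stage (marked with ? above)
    ("əu", "au"),    # further lowering
]


def derive(form, rules):
    """Apply each rule once, in order, and return every intermediate form."""
    stages = [form]
    for old, new in rules:
        form = form.replace(old, new)
        stages.append(form)
    return stages


print(" > ".join("*" + stage for stage in derive("qʌluʔ", ROAD_RULES)))
```

Rule order matters in such a sketch: if the presyllable-loss rule is moved before the spreading rule, *lˁ never arises and the later steps leave *luʔ untouched, which is one way to see why the relative chronology of these changes is worth pinning down.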

Northern Min forms such as Jianyang lau have a secondary l- that is an intervocalically lenited *-d- and not a retention of an original lateral:

*kˁʌluʔ > *kˁʌlˁuʔ > *kʌduʔ > *kʌdouʔ > *kʌdəuʔ (?) > *kʌdauʔ > *kʌlauʔ > lau

(11.4.0:24: The chronology of lenition and presyllabic loss relative to vowel changes is unknown.

For convenience, I have retained the low presyllabic vowel throughout the derivation, though it may have merged with the high presyllabic vowel at some point before the presyllable was lost. Once emphasis became phonemic for the initials of core syllables, the height of presyllabic vowels could have been neutralized without any loss of information:

Stage: locus of distinction 1: presyllabic vowel height 2: core initial emphasis 3: core vowel height
Low vowel presyllable *kʌluʔ *kəlˁuʔ *kədouʔ
High vowel presyllable *kɯluʔ *kəluʔ *kəjuʔ

I wrote "vowel" at the top of the column for stage 3 rather than "consonant and vowel" because in many cases the distinctions in stages 1 and 2 leave no traces on the initial: e.g.,

*kʌpuʔ > *kəpˁuʔ > *kəpouʔ

*kɯpuʔ > *kəpuʔ > *kəpuʔ

Emphatic *pˁ- and nonemphatic *p- have merged into nonemphatic *p- in stage 3; only the vowels are distinct.

On the other hand, emphatic *lˁ- and nonemphatic *l- did not merge with each other. The former hardened and merged with emphatic *dˁ- as nonemphatic *d-, whereas the latter weakened to *j-.)

DID BACK CONSONANTS CONDITION EMPHASIS IN OLD CHINESE?

David Boxenhorn asked me if emphasis in Old Chinese (OC) could be conditioned by presyllabic back consonants instead of low vowels: e.g.,

*Q.C- > *Cˁ- rather than *Cʌ.C- > *Cˁ-

My old answer was that both could condition emphasis, though low vowels in presyllables and syllables proper were the primary sources. I thought that

- uvular (and pharyngeal?)-initial syllables

- and perhaps also uvular-initial presyllables

conditioned emphasis (indicated below with *ˁ).

Contrast the development of uvulars with velars in this earlier reconstruction:

Low vowels: emphasis and secondary uvulars

*qa > *qˁa vs. *ka > *qˁa

*qe > *qˁe vs. *ke > *qˁe

*qo > *qˁo vs. *ko > *qˁo

*Cʌ.CV > *CˁV

e.g., *Cʌ.kV > *qˁV

High vowels: emphasis with original uvulars but no emphasis with velars

*qə > *qˁə vs. *kə > *kə

*qi > *qˁi vs. *ki > *ki

*qu > *qˁu vs. *ku > *ku

*Cɯ.CV > *CV

except: *qɯ.CV > *CˁV

However, Baxter and Sagart's OC reconstruction contrasts emphatic and nonemphatic uvulars: e.g., *q vs. *qˁ, etc. Such a distinction is rare in the world's languages; it is currently in Archi and Rutul in the Caucasus, far from China. I would rather not reconstruct an exotic distinction - at least not at an early level - so I prefer to reconstruct primary nonemphatic uvulars and secondary emphatic uvulars:

Original nonemphatic uvulars and velars retained after higher vowels:

*qə > *qə; cf. *kə > *kə

*qi > *qi; cf. *ki > *ki

*qu > *qu; cf. *ku > *ku

*Cɯ.qV > *qV; e.g., *Cɯ.kV > *kV

Secondary emphatic uvulars and velars developed after lower vowels:

*qa > *qˁa; cf. *ka > *kˁa

*qe > *qˁe; cf. *ke > *kˁe

*qo > *qˁo; cf. *ko > *kˁo

*Cʌ.qV > *qˁV; e.g., *Cʌ.kV > *kˁV
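The difference between the earlier and the revised reconstruction can be put side by side for syllables without presyllables. This is only a sketch under stated assumptions: the function names `old_theory` and `revised_theory` are hypothetical, initials are limited to *q and *k, and the vowel classes are the low *a *e *o and high *ə *i *u sets used in the examples above.

```python
LOW_VOWELS = set("aeo")   # vowels that condition emphasis
HIGH_VOWELS = set("əiu")  # vowels that do not


def old_theory(initial, vowel):
    """Earlier view: low vowels back velars to emphatic uvulars,
    and original uvulars are emphatic even before high vowels."""
    if vowel in LOW_VOWELS or initial == "q":
        return "qˁ" + vowel
    return initial + vowel


def revised_theory(initial, vowel):
    """Revised view: emphasis depends on vowel height only;
    the uvular/velar place contrast is retained."""
    mark = "ˁ" if vowel in LOW_VOWELS else ""
    return initial + mark + vowel


for initial in "qk":
    for vowel in "ai":
        print(initial + vowel,
              "old:", old_theory(initial, vowel),
              "revised:", revised_theory(initial, vowel))
```

Under the revised rules *q and *k never merge, and nonemphatic *q survives before high vowels, which is the property needed to match a contrast like *q vs. *qˁ at the Middle Old Chinese stage.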

Such a complex system eventually broke down when most of the uvulars left gaps to be filled by emphatic velars which backed:

*q(ˁ)- > *ʔ-; new *q- from *kˁ-
*q(ʰˁ)- > *x- (usually*; phonetically [χ] before lower vowels?); new *qʰ- from *kʰˁ-

*ɢˁ- > *ɢ-; new *ɢ- from *gˁ-

*ɢ- > *ʁ- > *ɰ- > *j-; new *ɢ- from *gˁ-

The new system only had one series of uvulars and no phonemic emphasis (though emphasis may have persisted at the phonetic level):

Early Old Chinese

- Emphasis: nonphonemic
- Presyllables: present
- Uvulars: one series
- Velars: one series
- Vowels: one series without diphthongs: *a *e *o *ə *i *u

Middle Old Chinese

- Emphasis: phonemic
- Presyllables: lost to varying degrees in dialects; loss or merger of presyllabic vowels conditioning emphasis made emphasis phonemic
- Uvulars: two series: ... vs. *qˁ ...
- Velars: two series: ... vs. *kˁ ...

Late Old Chinese

- Emphasis: nonphonemic
- Uvulars: one series mostly from emphatic velars: *q- < *kˁ-; but *ɢ- is from both original uvular *ɢˁ- > *ɢ- and emphatic velar *gˁ-
- Velars: one series from nonemphatic velars
- Vowels: two series with diphthongs:
  lower *a *e *o *əɨ *ei *ou (< *emphatic consonant + *a *e *o *ə *i *u)
  higher *ɨa *ie *uo *ɨə *i *u (< *nonemphatic consonant + *a *e *o *ə *i *u)

(11.3.1:09: The three stages contain roughly the same amount of complexity distributed in different ways. The locus of a binary distinction traveled rightward over time:

Early Old Chinese: low vs. high vowel presyllables

*/Cʌ.pi/ [Cʌ.pi] > [Cˁʌˁ.pˁɪˁ]

*/Cɯ.pi/ [Cɯ.pi]

Middle Old Chinese: emphatic vs. nonemphatic core syllable initials; Baxter and Sagart's reconstruction corresponds to this stage

*/pˁi/ [pˁəˁɪˁ]

*/pi/ [pi]

Late Old Chinese: lower vs. higher vowels

*/pei/ [peɪ]

*/pi/ [pi]

I originally wanted to use back consonants in the examples above, but the uvular-velar shifts would complicate the three-way contrast.)
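The rightward movement of the contrast can be traced mechanically. The sketch below is a hypothetical illustration, not a full model: it assumes a single-character nonemphatic initial, a presyllable vowel that is either low *ʌ or high *ɯ, and the six main vowels *a *e *o *ə *i *u with the lower and higher Late Old Chinese reflexes listed above. The names `middle_oc` and `late_oc` are invented for the example.

```python
LOW_PRESYLLABLE = "ʌ"  # conditions emphasis on the core initial

# Late Old Chinese reflexes of the six main vowels.
LOWER = {"a": "a", "e": "e", "o": "o", "ə": "əɨ", "i": "ei", "u": "ou"}
HIGHER = {"a": "ɨa", "e": "ie", "o": "uo", "ə": "ɨə", "i": "i", "u": "u"}


def middle_oc(presyllable_vowel, initial, vowel):
    """Early > Middle: the presyllable is lost; a low presyllabic
    vowel leaves phonemic emphasis on the core initial."""
    emphatic = presyllable_vowel == LOW_PRESYLLABLE
    return initial + ("ˁ" if emphatic else "") + vowel


def late_oc(middle_form):
    """Middle > Late: emphasis is lost; vowel height carries the contrast."""
    if "ˁ" in middle_form:
        initial, vowel = middle_form.split("ˁ")
        return initial + LOWER[vowel]
    initial, vowel = middle_form[0], middle_form[1:]
    return initial + HIGHER[vowel]


for presyllable in ("ʌ", "ɯ"):
    middle = middle_oc(presyllable, "p", "i")
    print("*C" + presyllable + ".pi", ">", "*" + middle, ">", "*" + late_oc(middle))
```

At every stage the two inputs stay distinct, but the segment that distinguishes them changes: first the presyllabic vowel, then the initial, then the main vowel.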

That's the big picture. Tomorrow I'll look at a specific word that made me question my old uvular theory - possibly even before I saw Baxter and Sagart's uvular proposal years ago.

*11.3.0:45: See Baxter and Sagart (2014: 102-105) for less common reflexes of *q(ʰˁ)-:

- Middle Chinese *ɕ- via *x- before front vowels including secondary fronted *a

- Proto-Min *kʰ-

- Middle Chinese *ʈʰ- from *qʰr- in eastern dialects

For simplicity, I have only listed reflexes of *q(ʰˁ)- without preinitials or presyllables at the Middle Old Chinese level. Such preceding elements resulted in even more reflexes: e.g., Middle Old Chinese *t.qʰ- became Late Old Chinese *tɕʰ- (see Baxter and Sagart 2014: 160; their Middle Chinese initials are almost identical to my Late Old Chinese initials).

WHY DO OPPOSITES ATTRACT? THE MEHRI DEFINITE ARTICLE Ḥ(Ə)-

Last week I mentioned the Mehri definite article a- which only occurred before emphatics and voiced consonants.

Mehri has another definite article ḥ(ə)- which only occurs before glottal stops, f-, and voiced nonemphatics:

feature | ḥ(ə)- | ʔ-, f- | b-, d-, g-, l-, m-, n-, r-, s-, w-, y-
pharyngeal(ized) | + | - | -
voiced | - | - | +

Why does voiceless pharyngeal-initial ḥ(ə)- occur before voiced nonpharyngealized consonants? This distribution is what I call diachronic detritus: the synchronically inexplicable result of an earlier sound change. Rubin (2010: 71) wrote that

Many of the nouns with the definite article ḥ(ə)- have an etymological initial ʾ- [i.e., glottal stop], which is sometimes reflected in the long ā of the definite article ḥā-.

Exceptions have y- from *y- (e.g., yūm 'days') or are due to analogy.

So ḥ(ə)- may have originally only been before *ʔ- which was lost before voiced consonants and *y-. Maybe this was the earlier distribution of articles:

1. *ʔa-

> a- before emphatic and voiced consonants

a- remains in front of ʔ- from voiced *ʕ-

> *ʔ- > zero before voiceless consonants other than *ʔ-

2. *ḥə-

> before *ʔ- (> zero in *ʔ- + voiced consonant clusters) and *y-

Why did *y-words have both *ʔa- and *ḥə-?

Was *ḥə- lost before *ʔ- + emphatic/voiceless consonant clusters?

Why is ḥə- before voiceless nonglottal f-? (I presume Mehri f is from *p- since there is no p in Mehri; was the lenition of *p due to Arabic influence?)

ḥə-f- would not be surprising if that sequence were from *ḥə-ʔw- whose cluster fused into -f-, but such a word would have a nondefinite form with (ʔ)w-, and Rubin did not note any (ʔ)w- ~ ḥə-f- alternations.

That scheme omits the plural definite article hə- ~ ha- which only occurs

- before nouns with the voiceless fricatives s- and ɬ̠- and a CCōC pattern

- the high-frequency nouns həbɛ̄r 'the camels' and hərbāt 'the companions' (sg. ərbāt)

Are the last two forms relics of a period when h-articles were more common? They have nothing in common with the other h-article forms apart from ending in long vowel + consonant sequences.

DID *I CONDITION *A-FRONTING IN OLD CHINESE?

Two nights ago I reconstructed Old Chinese (OC) 射 with presyllabic *-i- without comment:

*mi-lak (*m-ljak?) 'to hit with an arrow' ~ *mi-lak-s (*m-ljak-s?) 'to shoot with a bow, archer'

I want to explain my reasoning here.

One mystery in OC phonology is why *a behaves in two different ways in syllables of the type *T(s)a(k/j): e.g.,

蛇 with two readings:

Early OC *mlaj 'snake' > Late OC *ʑjæj with a front vowel

Early OC *laj (second syllable of 'compliant') > Late OC *jɨaj with a nonfront vowel

Here are some approaches to the problem:

- Starostin (1989) reconstructed *a and *ia; the latter is equivalent to my *ja above.

- A few years ago, I reconstructed *a and *æ.

- Baxter and Sagart (2014) used the notation *a and *A, stating that "Our *A is not intended as a seventh vowel; it is an explicitly ad hoc notation that basically means 'a case of OC *-a which for as yet unexplained reasons becomes MC -jae instead of MC -jo [i.e., the regular reflex].' "

My fourth approach involves the frontness of *-i- moving into the following syllable:

Early OC *mi-laj > *mljaj > *nɮjaj > *ɮjaj > Late OC *ʑjæj 'snake'

Early OC *laj > *lɨaj > *ɮɨaj > *ʑɨaj > Late OC *jɨaj (second syllable of 'compliant')

There was a chain shift *nɮ- > *ɮ- > *ʑ- > *j-. Prenasalization shielded *nɮ- from weakening to *ʑ- like *ɮ-.
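A chain shift of this kind can be pictured as a mapping applied to all initials at once in each round, so that no two of them ever merge. The sketch below is a hypothetical illustration; the name `shift` and the choice of two rounds are assumptions read off the two derivations above, not a claim about absolute chronology.

```python
# One round of the chain *nɮ- > *ɮ- > *ʑ- > *j-.
# Every initial moves one step per round, so the contrasts are preserved.
CHAIN_STEP = {"nɮ": "ɮ", "ɮ": "ʑ", "ʑ": "j"}


def shift(initial, rounds=1):
    """Advance an initial along the chain by the given number of rounds."""
    for _ in range(rounds):
        initial = CHAIN_STEP.get(initial, initial)
    return initial


for start in ("nɮ", "ɮ"):
    print("*" + start + "-", ">", "*" + shift(start, rounds=2) + "-")
```

After two rounds the prenasalized initial of 'snake' has only reached *ʑ-, while the plain lateral fricative of 'compliant' has gone on to *j-, matching the Late OC forms given above.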

Normally *a-breaking is conditioned by a high-vowel presyllable. 'Compliant' may have had only one such syllable, and the remainder of the word harmonized with it:

委蛇 Early OC *Cɯ-q(r)oj-laj > Late OC *ʔuoj-jɨaj

I assume presyllables had short vowels that were reduced versions of the six in main syllables: *i, *ə, *u, *e, *a, *o. The last three merged into a low vowel I write as *ʌ. The first three merged into a high vowel I write as *ɯ except in a few cases where later fronting is a trace of *i. Perhaps at one point there was a triangular three-vowel system in presyllables somewhat like that of Pacoh: *i vs. a nonfront high vowel vs. *ʌ.

Unfortunately, I do not have any evidence for *-i- other than the fronting in later OC. Why was this fronting in such a specific environment (*coronal + *a + *-k or *-j)? Why didn't fronting occur in *-ŋ final syllables which usually develop like *-k syllables? *-ŋ has more in common with *-k than *-j, yet fronting only occurred before the latter two codas.

WHAT WAS THE RANGE OF OLD CHINESE ROOSTERS?

The Chinese character 酉 currently only represents the word for the tenth Earthly Branch conventionally translated as 'rooster' in English.

The character is a drawing of a wine vessel and is used as a semantic element in characters for words with alcoholic semantics. Moreover the words 酉 for 'tenth Earthly Branch' and 酒 'wine' rhyme. So it seems likely that 酉 was originally devised for a word 'wine vessel' that was cognate to 酒 'wine'. However, are there any texts in which 酉 means 'wine vessel'?

Premodern dictionaries list three other definitions for 酉:

1. 就 (Shuowen, c. 100 AD) which has many possible translations (e.g., 'to go to'); I don't know which one was intended

2. 飽 (Guangyun, 1008 AD) 'satiated'

3. 老 (Guangyun, 1008 AD) 'old'

How old are those definitions? Is there textual support for them?

All these definitions had *u in Early Old Chinese (EOC) like 酉 and 酒. Did 酉 represent five unrelated near-homophones that later greatly diverged in Late Old Chinese (LOC)?

1. 酉 'tenth Earthly Branch': EOC *N-ruʔ (Baxter and Sagart 2014: 372) > LOC *juʔ

2. 酉 'wine vessel', possibly cognate to 酒 EOC *tsuʔ (Baxter and Sagart 2014: 347) > LOC *tsuʔ

3. a cognate of 就 '?': EOC *[dz]u[k]s > LOC *dzuh

4. a cognate of 飽 'satiated': EOC *pʌ-ruʔ > *prˁuʔ > LOC *pɔuʔ > *pæuʔ

5. a cognate of 老 'old': EOC *Cʌ-ruʔ > *rˁuʔ > LOC *louʔ > *lauʔ

Why was 酒 EOC *tsuʔ 'wine' written with a phonetic 酉 EOC *N-ruʔ 'wine vessel'? Was a common rhyme and similar semantics sufficient to justify the choice of a phonetic with completely different initials? And what is Baxter and Sagart's justification for reconstructing a nasal prefix in 酉 EOC *N-ruʔ?

But what if my OC palatal hypothesis is correct, and 酉 'wine vessel' and 酒 were cognates with similar initials? (Also see these two follow-up posts.)

酉 'wine vessel': EOC *N-cuʔ > *ɟuʔ > LOC *juʔ

酉 'tenth Earthly Branch' might have had *N-c- or original *ɟ-

酒 EOC *cuʔ > LOC *tsuʔ

According to the palatal hypothesis, 就 might have been EOC *N-Cu(k)-s (*C = an unknown palatal) which would have been a good phonetic match for 酉 *ɟuʔ.

However, there is no reason to believe that 飽 and 老 ever had any palatals. Nor is there any reason to reconstruct palatal prefixes. So why would 'satiated' and 'old' be written with a palatal phonetic 酉? Did the use of 酉 for *r-words reflect a dialect or dialects in which *ɟ (already lenited to *j?) and *r had converged (or even merged?): e.g., to *ʑ and *ʐ? (Starostin reconstructed Eastern Han *ʑ- as the source of Postclassic and Middle Chinese *j-.) It would not be unreasonable to write forms like 飽 *pʐˁuʔ 'satiated' and 老 *ʐˁuʔ 'old' as 酉 *ʑuʔ. Such forms would not be in early texts in which *ɟ and *r were distinct. If 酉 represented 'satiated' and 'old' in early texts, I would have to abandon this explanation.


While researching this post, I rediscovered a 2008 post in which I compared the Common Tai ʔj- : Jiamao Hlai tsh- correspondence to the *j-*ts- alternations in Middle Chinese that I derive from Old Chinese palatal *ɟ-*c-. Last year I proposed an implosive palatal stop *ʄ- as a source of later Tai ʔj-. Norquest (2007: 13) regarded Jiamao Hlai as a "non-Hlai language which has been in close contact with Hlai", so Jiamao Hlai tsh- may reflect a Hlai obstruent at the time of borrowing. Could that early Hlai obstruent have been *ʄ-? Can *ʄ- be reconstructed at the Proto-Kra-Dai level?

According to Norquest (2007: 338), Jiamao tsh- is in borrowings that had Proto-Hlai *tɕh-; even earlier Jiamao borrowings of words with that initial phoneme have ts- from pre-Hlai *c-. Is pre-Hlai *c- a reflex of a Proto-Kra-Dai *ʄ-? Unfortunately, I can't find the only example of a Common Tai ʔj- : Jiamao Hlai tsh- correspondence that I have (CT ʔjuu B : J tshu 'to be') from Shintani (1991: 2) in Norquest (2007).

Norquest (2007: 348) reconstructed pre-Hlai *lj- as a source of Jiamao unaspirated ts- in loanwords. That reminds me of how I used to reconstruct 酉 as EOC *luʔ though its phonetic series had *ts- in Middle Chinese (e.g., 酒 Middle Chinese *tsuʔ 'wine'). The difference is that pre-Hlai *lj- was borrowed into pre-Jiamao as *lj- which hardened to *dʑ- and devoiced to ts-, whereas there is no reason to believe that the affricate of 酒 Middle Chinese *tsuʔ is the product of hardening and devoicing; those features must be projected back into the Old Chinese reading of 酒 though they conflict with all evidence pointing toward a voiced initial for its phonetic 酉.

WHY WAS OTTOMAN TURKISH ض <Ḍ> PRONOUNCED IN TWO DIFFERENT WAYS?

To answer that question, I looked at three old textbooks:

Hagopian (1907: 9): "It is generally pronounced as a hard [= emphatic] z, but sometimes as a hard d."

Barker (1854: 2): "d hard, and sometimes z"

Vaughan (1709: xxvi, 2): ’z without any reference to a stop (Oddly, ظ <ẓ> was romanized as an affricate ’dz!)

Was there a new layer of Arabic loans after 1709 in which Arabic ض <ḍ> was borrowed as d instead of z? I can't imagine why that would be the case, so my guess is that Vaughan left out the d-variant, and that the variation reflects two or more strata of borrowing from Arabic before 1709.

Looking at Embarki (2013: 25-26) to go back a millennium in Arabic itself, I see that Al-Khalīl (d. 786) grouped ض <ḍ> with ش <sh> and ج <j> as 'arched' (شجريه shajriyya) in Kitāb al-`ayn. That implies <ḍ> sounded something like sh and j. Could it have been a lateral fricative [ɮˤ] or affricate [dɮˤ]?

Sībawayh rejected the pronunciation of ض <ḍ> and ظ <ẓ> as [θ] as "bad" in Al-Kitāb (793). This theta is reminiscent of Proto-Semitic *θˁ which is the source of Arabic /ðˁ/ ~ /zˁ/ written as ظ <ẓ>. I assume <ẓ> *ðˤ and (a continuant pronunciation?) of <ḍ> first merged into *ðˤ, devoiced to become [θˤ], and merged with [θ].

Retsö (2013: 435, 439-440) collected data pointing toward an earlier lateral pronunciation of the consonant written as <ḍ>:

- "Traces of such an articulation are found in some modern dialects in the southern peninsula"

- The name of the pre-Islamic Arabic god Ruḍā appears in 7th c. BC cuneiform as <ru.ul.da.a.ú> with <...l.d...>

- Arabic al-qaḍī was borrowed into Spanish as alcalde with -ld-

- "In the Modern South Arabian languages we find a lateralized and glottalized apico-alveolar consonant that etymologically corresponds to Arabic /ḍ/": e.g., Mehri ź /ɬ̠ʼ/ (I presume).

Yet he concluded "there is no real evidence that the present-day [stop] realization of the ḍād is secondary". I do not understand why.

Ehret (1995: 481) reconstructed the Proto-Semitic source of this consonant as a stop *dˁ as well as a lateral ultimately going back to a Proto-Afroasiatic lateral *dl which was "probably" an affricate (i.e., [dɮ]?). I stated my objections to a stop interpretation here. Would Retsö regard Arabic /ḍ/ as a retention of a Proto-Semitic stop *dˁ?

All that reminds me of the Tangut initial that Tai Chung-pui (2008: 201) reconstructed as ld-. Tibetan transcriptions for that initial include zl- and even a single instance of c-. I suspect it was a lateral affricate [dɮ] like Proto-Afroasiatic *dl. I have yet to look into the origin of ld-.

One ld-word with external cognates is

5710 1ldiq3 'arrow' (transcribed in Tibetan as ldi(H), zliH, d-ya; see Tai Chung-pui 2008: 198)

which Guillaume Jacques (2014: 161) derived from pre-Tangut *S-lje and cognate to Japhug zdi < *l- 'arrow'. I used to think 5710 was cognate to Old Chinese 矢 *hliʔ < *sl-? 'arrow'. Now I think it might be cognate to Tibetan mda < *mla 'arrow' (via Bodman's law) and Old Chinese 射 *mi-lak (*m-ljak?) 'to hit with an arrow' ~ *mi-lak-s (*m-ljak-s?) 'to shoot with a bow, archer', as *-a often rose to -i in Tangut.

Could 矢 and 射 be the zero and *a-grades of a root *lj-K? But there is no external support for a *-j- in the root. Unlike Guillaume, I don't think there was a medial *-j- in the Tangut word for 'arrow'. No other cognates contain -j-.

Was there a consonant between *S- (which conditioned Tangut vowel tension that I write as -q) and *-l- in pre-Tangut that fused with *-l- to become an affricate [dɮ]?

Even if I believed in reconstructing *-j- in 'arrow', I could not derive [dɮ] from *-lj- before *S- as Guillaume's *S-ljo 'head' became

0124 2luq3 (transcribed in Tibetan as lu; see Tai Chung-pui 2008: 220)

with l-, not ld-.

I also wouldn't reconstruct *-j- in 'head'; I think the pre-Tangut form was *S-luH which is close to Old Chinese 首 *hluʔ < *sl-? 'head'.

However, the Old Chinese initial of 'head' may be from *Kl- rather than *sl- if the word is related to Proto-Austronesian *quluh and/or Proto-Tai *krawC (Pittayaporn 2009: 323) / *kləwC (Li Fang-Kuei 1977). Proto-Hmong-Mien *kləuX 'road' (Ratliff 2010: 264) is a loan from Old Chinese 道 containing 首 as a phonetic. If 道 had an initial stop, perhaps 首 did too.

WHAT IS THE ORIGIN OF THE VOICING CONTRAST IN ARABIC EMPHATICS?

Last night, I wrote (in haste as always),

In Afroasiatic (including Mehri), there is a three-way contrast between emphatics and voiceless and voiced nonemphatics

But as I've long known, that is certainly not accurate for Modern Standard Arabic which has a four-way contrast in its alveolar stops: /tˁ t dˁ d/. And some speakers also have a four-way contrast in their alveolar sibilants: /sˁ s zˁ z/. (Other speakers have /ðˁ/ instead of /zˁ/, but no one has a voiceless dental fricative /θˁ/.) However, there is no four-way contrast for nonalveolars.

According to Islam Youssef (2006: 13, 16), Cairene Arabic has voiced and voiceless emphatics as well as voiced and voiceless nonemphatics throughout its inventory of allophones, but has only five emphatic consonant phonemes which are all coronals: /dˤ tˤ sˤ zˤ rˤ/. The distinction between /r/ and /rˤ/ (Youssef 2006: 25) is absent from Modern Standard Arabic.

To be on the safe side, I've inserted "usually" in my statement since a voicing contrast in Afroasiatic emphatics is unusual in my extremely limited experience. If Arabic is not alone, I'd like to know.

In any case, Ehret (1995: 481-482) did not reconstruct a voicing contrast in Proto-Afroasiatic (PAA) emphatics. Here are his sources for the voiced and voiceless emphatic pairs in Arabic:

PAA *tʼ > Proto-Semitic *tˁ > A /tˁ/

PAA *dl  > Proto-Semitic ~*dˁ > A /dˁ/

Omotic has ejective reflexes. I wonder if Ehret did not reconstruct this consonant as an emphatic at the PAA level because he wanted to avoid having two lateral emphatics. His PAA reconstruction has either zero or one emphatic per consonant class.

I think the Arabic stop might be an innovation since other Semitic languages in the comparative table at David Boxenhorn's blog have fricatives. I would rather not have Proto-Semitic *dˁ lenite independently multiple times while remaining intact in Arabic.

PAA *tlʼ  > Proto-Semitic *sʼ ~ *sˁ > A /sˁ/

PAA *sʼ lost its emphasis and merged with *s in pre-Proto-Semitic and hence also in Arabic.

PAA *čʼ  > Proto-Semitic *θˁ (or *tʲʼ?) > A /ðˁ/ ~ /zˁ/

Ehret cited Omotic languages which have čʼ today.

Did A /ðˁ/ become /zˁ/ to be the emphatic counterpart of /z/ so that both voiced emphatics were alveolars?

10.29.14:41: This shift also reduced the markedness of the segment since alveolar fricatives are more common than dental fricatives.

The last three sound changes have no parallels in Old Chinese unless *lˤ- became *ɮ- before hardening to *d-. I have yet to see any Afroasiatic-Sinitic parallels in emphatic evolution despite the existence of emphatic-conditioned Semitic-Sinitic vocalic parallels. Voiceless obstruents never became voiced in Old Chinese (though the opposite often occurred).

HOW IS MEHRI LIKE PROTO-INDO-EUROPEAN (AND UNLIKE OLD CHINESE)?

I have long been bothered by the glottalic theory because I didn't know of any example of a language whose ejectives had become voiced stops. Then last night I discovered Mehri in which ejective (= Wikipedia's 'emphatic') fricatives have voiced allophones - and voiced consonants have ejective allophones (emphasis mine)!

Voiced obstruents, or at least voiced stops, devoice in pausa. In this position, both the voiced and emphatic stops are ejective, losing the three-way contrast (/kʼ/ is ejective in all positions). Elsewhere, the emphatic and (optionally) the voiced stops are pharyngealized. Emphatic (but not voiced) fricatives have a similar pattern, and in non-pre-pausal position they are partially voiced.

Rubin (2010: 14) wrote (emphasis mine),

As Johnstone also notes, it is not completely clear how the glottalic [= Wikipedia's 'emphatic'] consonants fit into the categories of voiced and voiceless. Johnstone (AAL [Afroasiatic Linguistics], p. 7) wrote that they are "perhaps best defined as partially voiced." What is certain is that the glottalic consonants pair with voiced consonants when it comes to certain morphological features, for example the appearance of the definite article (§ 4.4) and the prefix of the D/L-Stem (§6.2). 

Let's look at Rubin's coverage of those two features:

The definite article a- is found before the consonants b, d, ð, ð̣, g, ġ, j, ḳ, l, m, n, r, ṣ, ṣ̌, ṭ, w, y, z, and ź (voiced and glottalic consonants), though not all nouns beginning with those consonants take the article a-.


The definite article is also used with nouns beginning with ʾ, though only when the ʾ derives from etymological ʿ.


The definite article a- usually does not occur (or, one could say it has the shape Ø) before the consonants f, h, ḥ, k, s, ś, t, ṯ, and x (voiceless, non-glottalic consonants). (p. 69)

The prefix a- appears only in [D/L-Stem verbs] when the initial root letter is voiced or glottalic, similar (but not identical) to the distribution of the definite article (see § 4.4). (pp. 93-94) 
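The distribution Rubin describes for the definite article can be written out as a simple membership check. What follows is only a minimal sketch in Python: the function name `may_take_article_a` is hypothetical, the consonant list is copied from the quotation above in Rubin's transcription, and the check says nothing about which individual nouns actually take a-, since Rubin notes that not all nouns beginning with these consonants do. The special case of ʾ from etymological ʿ is left out because it depends on etymology rather than on the surface consonant.

```python
import unicodedata


def _nfc(text: str) -> str:
    """Normalize so that combining diacritics compare reliably."""
    return unicodedata.normalize("NFC", text)


# Initial consonants before which Rubin (2010: 69) reports the article a-
# (voiced and glottalic consonants), in Rubin's transcription.
A_INITIALS = {
    _nfc(c)
    for c in [
        "b", "d", "ð", "ð̣", "g", "ġ", "j", "ḳ", "l", "m",
        "n", "r", "ṣ", "ṣ̌", "ṭ", "w", "y", "z", "ź",
    ]
}


def may_take_article_a(initial: str) -> bool:
    """Return True if the initial consonant belongs to the class that can take a-.

    Hypothetical helper: True means only that a- is possible, not guaranteed.
    Any consonant outside the quoted list (including ʾ) returns False here.
    """
    return _nfc(initial) in A_INITIALS


print(may_take_article_a("ḳ"))  # glottalic consonant
print(may_take_article_a("k"))  # voiceless non-glottalic consonant
```

The point of the sketch is that glottalic ḳ falls on the same side of the split as voiced g, while plain k does not.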

Conversely, Rubin paraphrased Johnstone's observation about nonemphatics on p. 6 of Afroasiatic Linguistics:

Aspiration of most of the voiceless non-glottalic [= nonemphatic] consonants constitutes an important element in the distinction of glottalic/non-glottalic pairs. (p. 14)

Although emphatics conditioned similar vowel changes in Mehri and Old Chinese (OC), the resemblance between the two languages ends there:

- In Afroasiatic (including Mehri), there is usually a three-way contrast between emphatics and voiceless and voiced nonemphatics, but in OC, there was a six-way contrast defined by voicing, aspiration, and emphasis: e.g.,

Mehri: /tˁ t d/ : cf. Proto-Indo-European (and Archi [Northeast Caucasian] and Ubykh [Northwest Caucasian]) */tʼ t d/

OC: *tˁ *tʰˁ *dˁ *t *tʰ *d

- There is no pairing between emphatics and voiced nonemphatics in Old Chinese.

Cf. how ejectives and voiced nonejectives did merge in many Indo-European varieties.

- Aspiration plays no role in distinguishing between emphatics and voiceless nonemphatics in Old Chinese.

But it may have distinguished between ejectives and nonejectives in Proto-Indo-European: e.g., */tʼ t d/ could have been *[tʼ tʰ dʱ].
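The two inventories compared in the first point above can be generated mechanically, which makes the typological gap easier to see. This is a small illustrative sketch in Python using only the coronal stops cited above; the variable names are mine, and treating emphasis as a suffix that freely combines with every series is exactly the property that holds for OC but not for Mehri.

```python
from itertools import product

# Laryngeal series: voiceless, aspirated, voiced (coronal stops only).
SERIES = ["t", "tʰ", "d"]
# Emphasis either present (pharyngealization mark) or absent.
EMPHASIS = ["ˁ", ""]

# In OC, emphasis cross-cuts every series: 2 x 3 = 6 stops.
old_chinese = ["*" + stop + mark for mark, stop in product(EMPHASIS, SERIES)]

# In Mehri, the emphatic is a third series alongside voiceless and voiced.
mehri = ["tˁ", "t", "d"]

print(old_chinese)
print(len(old_chinese), len(mehri))
```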

These differences between Mehri and OC do not necessarily invalidate what I could call the 'Chinese Glottalic Theory'. But they do imply that there are limits to using Mehri (or any similar language) for predicting phonetic phenomena in OC.

I wish that OC did not have unique typology. Maybe it didn't. I would not be surprised if emphasis played a role in the development of Tangut vowels.

HOW IS MEHRI LIKE OLD CHINESE?

I have known for some time about the existence of the Modern South Arabian languages (not to be confused with Old South Arabian). However, I had never read anything about them. I couldn't even name one until tonight when I discovered Rubin's The Mehri Language of Oman (2010). I immediately went to the section on vowels and found synchronic shifts reminiscent of diachronic shifts in Old Chinese:

Mehri shift Glottalic Gutturals Liquids Cf. Old Chinese
/iː/ > [aj] *Cˁi > *Cej (> southern *Caj)
/uː/ > [aw] *Cˁu > *Cow > *Caw
/eː/ > [aa] X no parallel

I have converted Rubin's notation into IPA as used in the Wikipedia article on Mehri.

Lowering in Mehri is conditioned by three classes of consonants:

Glottalic: /tʼ θʼ ɬ̠ʼ sʼ kʼ/ (no known cases of /ʃʼ/ followed by /iː uː eː/)

Guttural: /χ ʁ ħ ʕ ʔ h/

Liquid: /r l/ if "there is normally a glottalic or guttural consonant elsewhere in the root" (Rubin 2010: 29; i.e., glottalic and guttural consonants' lowering effects can spread 'through' them: e.g., [məʁrajb] 'well-known' which I presume is phonemically /məʁriːb/)

10.27.11:52: This phenomenon is reminiscent of the 'transparency' of Thai sonorants (not just liquids) in tonal development: e.g., อร่อย ʔàrɔ̀ɔy 'delicious' should in theory have an r-conditioned falling tone on its second syllable, but it has a glottal stop-conditioned low tone as if the -r- didn't exist. However, in Thai, the conditioning consonant has to precede the 'transparent' sonorant, whereas in Mehri, the consonants conditioning lowering can be in the order liquid ... glottalic/guttural as well as the reverse: e.g., [məlawtəʁ] 'killed' (masc. pl.) which I presume is phonemically /məluːtəʁ/.
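The lowering conditions listed above can be sketched as a toy rule in Python. This is a rough model under assumptions that are not Rubin's: words are given as pre-segmented lists, the conditioning consonant is taken to be the one immediately before the long vowel, the whole word stands in for the root when checking for a glottalic or guttural "elsewhere", and all three long vowels are treated alike, which may overgeneralize. The function name `lower_long_vowels` is hypothetical, and the underlying forms are the presumed ones given above.

```python
GLOTTALIC = {"tʼ", "θʼ", "ɬ̠ʼ", "sʼ", "kʼ"}
GUTTURAL = {"χ", "ʁ", "ħ", "ʕ", "ʔ", "h"}
LIQUID = {"r", "l"}
TRIGGERS = GLOTTALIC | GUTTURAL

# Long vowels and their lowered allophones, as in the table above.
LOWERED = {"iː": "aj", "uː": "aw", "eː": "aa"}


def lower_long_vowels(segments):
    """Apply the toy lowering rule to a list of phonemic segments."""
    result = []
    for i, seg in enumerate(segments):
        if seg in LOWERED and i > 0:
            prev = segments[i - 1]
            # Every segment other than the vowel and its preceding consonant.
            elsewhere = segments[: i - 1] + segments[i + 1 :]
            direct = prev in TRIGGERS
            # A liquid is 'transparent' only if a trigger occurs elsewhere.
            via_liquid = prev in LIQUID and any(s in TRIGGERS for s in elsewhere)
            if direct or via_liquid:
                seg = LOWERED[seg]
        result.append(seg)
    return result


# Presumed /məʁriːb/: trigger ʁ before the liquid.
print("".join(lower_long_vowels(["m", "ə", "ʁ", "r", "iː", "b"])))
# Presumed /məluːtəʁ/: trigger ʁ after the liquid.
print("".join(lower_long_vowels(["m", "ə", "l", "uː", "t", "ə", "ʁ"])))
```

Because the check for a trigger "elsewhere" ignores linear order, the same rule covers both the trigger ... liquid order and the liquid ... trigger order, which is the difference from the Thai case.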

Wikipedia calls the glottalic consonants 'emphatic' and lists pharyngealized allophones for all but /kʼ/. Thus they may be comparable to the Old Chinese pharyngealized 'emphatic' consonants that conditioned the lowering of high vowels. However, Mehri has a limited set of mostly coronal emphatics, whereas Old Chinese had an emphatic counterpart of every single nonemphatic consonant and even *ʔʷˁ which had no nonemphatic counterpart (Baxter and Sagart 2014: 69).

Early Old Chinese as reconstructed by Baxter and Sagart did not have any uvular or pharyngeal fricatives. I have hypothesized pharyngeal fricatives for Old Chinese, but there is no strong evidence for them. I do, however, believe Late Old Chinese developed uvular fricatives accompanied by lowered vowels.

The Old Chinese counterparts of Mehri glottals and liquids did not condition lowering. I would not have expected Mehri /ʔ h/ to condition lowering since I have never heard of nonpharyngealized glottals having that effect in any other language.

HOW DID CAMISIA BECOME KAMĪZ?

I was surprised that Latin camisia 'shirt' was borrowed into Arabic as قميص qamīṣ with three features I wouldn't expect:

1. uvular q for [k]

2. a long ī for short [i] (to match an existing a ... ī vowel template? lengthening in some intermediary language?)

3. an emphatic for nonemphatic [s]

Why not borrow the word as *kamis? And why does Urdu have -z in qamīz? I assume kamīz and kamīj show different degrees of Indicization (i.e., avoidance of un-Indic q and z).

10.26.0:26: While I'm on this topic, a kamīz is half of a shalvār kamīz outfit. Why was Persian شلوار‎‎ shalvār borrowed into Arabic as سروال‎ sirwāl instead of شلوار‎‎ *shalwār? Was i ... ā an existing vowel template? Why not preserve the consonants instead of fronting sh to s and reversing l and r?

10.26.3:26: Is kamīz a direct borrowing from Portuguese camisa [kɐmizɐ]? Is qamīz a compromise between qamīṣ and kamīz?

WHY RECONSTRUCT BOTH *-J- AND *-I̯- IN PROTO-HMONG-MIEN?

After a two-day detour into Korean ... back to my favorite topic!

I've written a series of posts starting here about how I've been troubled by Ratliff's (2010) Proto-Hmong-Mien (PHM) distinction between *j *ʷ in onsets and i̯ u̯ at the beginning of rhymes. Why not simply abandon the distinction, reduce the four to two (*j *w), and rewrite *-ji̯- as *-j-? The short answer is because the distinctions indicated by her notation are real, even if that notation may not be phonetically precise. Let's look at *tshji̯əŋ 'new' (last seen here) again and break it up into an initial and a rhyme.

I have no doubt that PHM had a distinction between the initials that Ratliff reconstructed as *tsh- (3.2) and *tshj- (3.17) which have different reflexes. Initial 3.17 clearly has a palatal quality absent from initial 3.2. Ratliff already reconstructed an aspirated palatal stop *ch- (4.2).

PHM Hmongic Mienic
Yanghao Jiwei White Hmong Zongdi Fuyuan Jiongnai Pa-Hng Luoxiang Mien Mun Biao Min Zao Min
3.2. *tsh- sh- s- txh- [tsh] s- tsh- θ- ɕ- θ- tθ- s- h-
3.17. *tshj- ɕh- ɕ- tsh- [tʂh] ɕ- s- s-
4.2. *ch- tɕh- tɕh- ch- [ch] tɕ- tɕh- t- tɕh- s-/ȶ- ȶh- ts-/f-

(I have excluded the initials of 'thousand' from 3.2 since some forms look like they might be post-PHM borrowings from Chinese. I excluded the initials of 'new' from 3.17 [even though I chose 3.17 because 'new' was reconstructed with it!] for reasons I will discuss next time.)

Those PHM initials remained unchanged in Ratliff's reconstructions of Proto-Hmongic and Proto-Mienic.

If 3.17 were something other than *tshj-, it might have been *tʃh- or *tʂh- (as in White Hmong tsh [tʂh]) which are absent from Ratliff's reconstruction. A retroflex value may fit Pulleyblank's hypothesis of *ks-retroflexion.

I am hesitant to reconstruct a palatal affricate *tɕh- alongside the palatal stop *ch- since I know of no language contrasting palatal affricates with palatal stops.

Similarly, rhyme 18g (*-i̯əŋ) is clearly more palatal than rhyme 21d (*-əŋ):

PHM Hmongic Mienic
Yanghao Jiwei White Hmong Zongdi Fuyuan Jiongnai Pa-Hng Luoxiang Mien Mun Biao Min Zao Min
18g. *-i̯əŋ -i -ia -æin -en -eŋ -ĩ, -e -aŋ -(j)aŋ -aŋ/-in -jaŋ/-ɛŋ/-ɔŋ
21d. *-əŋ -aŋ -ɑŋ -o [ɔ] -oŋ ? -æ̃ ? -ɔŋ -ɔŋ

It does not help that Ratliff (2010: 161) gave only a single example of 21d, though it is a basic word (*hməŋH 'night'). I am surprised that *-əŋ is rarer than *-i̯əŋ; that may imply that 21d was more marked than 18g rather than the other way around. Or is the rarity of 21d simply random?

10.25.16:57: Are there reflexes of 18g and 21d with schwa? None of the sample of eleven languages have schwa. Of course there is no need for a proto-phoneme to be preserved intact in at least one descendant language, and a mid central vowel could easily lower to a or back to o.

Ratliff reconstructed the Proto-Hmongic reflex of 18g as *-in. Other PHM rhymes that merged with 18g in Proto-Hmongic are 18b *-im, 18c *-in, 18d *-iŋ, 18e *-i̯əm, and 18f *-i̯ən. I wonder if Proto-Hmongic *-in was phonetically something like *[iən] (cf. Southern American English diphthongization). White Hmong -ia more or less retained the original diphthong but lost the *-n, whereas Zongdi reversed the diphthong but kept the nasal: *-iən > -æin. Yanghao lost the schwa while Jiwei, Fuyuan, and Jiongnai fused *iə into a mid front vowel. Pa-Hng generally lost the schwa like Yanghao, but its word le³ for 'plum' has a front vowel. (I expected Jiwei, Fuyuan, and Jiongnai to also have front-vowel words for 'plum', but Pa-Hng is the only Hmongic language with 'plum' in Ratliff's language sample.)

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2015 Amritavision