Here are the remaining types of 白水村 Baishuicun (BSC) 'White Water Village' -ai forms that are not from non-*a vowels followed by nasals (see part 5).

4. 白 pai and : Middle Chinese *bæk

These look like loans from standard Mandarin bai [paj] and southwestern Mandarin pə. The 小學堂 Xiaoxuetang database does not list Hunan Mandarin forms, so the southwestern Mandarin forms in this post are from Guangxi to the west of Hunan.

5. 擲 tsai : Middle Chinese *ɖiek

I would expect *tsiə. A loanword? But the rhyme doesn't match southwestern Mandarin tsɿ. Maybe tsɿ was borrowed as *tsə whose schwa shifted to -ai. See the next class.

6. 而兒爾 ai : Middle Chinese  *ɲɨ, *ɲie, *ɲieˀ

All three are ə in southwestern Mandarin. Perhaps all three were borrowed as *ə which later broke to -ai. That shift is parallel to the shift of Old Chinese *-əˁ to Mandarin -ai (see part 6 for examples of Mandarin -ai forms borrowed into BSC).

7. 日入 ai : Middle Chinese *ɲit, *ɲip < Old Chinese *nit, *nip;  栗 lai : Middle Chinese *lit

These are i, y, and li in southwestern Mandarin. All three once had a final glottal stop. Were they borrowed into BSC as *iʔ, *iʔ, and *liʔ whose *i broke to ai before a now-lost glottal stop? Other BSC forms which may be loans of -i forms without glottal stops do not end in -ai.

The first two have alternate forms.

BSC 入 y (if I am reading Xiaoxuetang correctly) looks like a recent direct loan from southwestern Mandarin.

BSC 日 ɲi is close to Middle Chinese *ɲit and may be an old loan postdating the palatalization of Old Chinese *n-.

BSC 日 na and 入 na may be native. Could their n- be a retention of Old Chinese *n-?

8. 睡 fai could be a borrowing from a southwestern Mandarin form resembling Lingui suei and Luorong suɐi. f- may be a sporadic simplification of *sw-. I cannot find any other examples of f- from sibilants in BSC.

When I wrote part 6, I thought there were eight classes of exceptions, but now I see ten. (I broke up one class into classes 4, 5, and 7. Although 4 and 5 both had *-k, 7 is completely dissimilar and its inclusion was a mistake.)

9. BSC 甫 pai : Middle Chinese *puoˀ

I have no idea why this form has a -i.

10. BSC 些 s-, l- + -ou, -iə, -ai, -əi : Middle Chinese *sjæ

I don't know for sure which initials go with which finals. I suppose the first form is sou and the last one is ləi. BSC -iə regularly corresponds to Middle Chinese *-jæ, so I guess the second form is siə. If the third form is lai, it cannot be related to the first two since BSC l- is not from *s-.

A DIP INTO WHITE WATERS (PART 7): AI-XCEPTION CLASSES 2-3

In part 5 I proposed the following chain shift in 白水村 Baishuicun (BSC) 'White Water Village':

*-VN > *an > *-ai > *-oi > -o

*V was a non-*a vowel.

Most BSC -ai are from *-VN with the exception of eight types of forms.

The first type in part 6 was borrowed after the shifts took place.

There is only one example of the second type: 鏘 *tshɨaŋ 'jangling noise'. Onomatopoetic words may be exceptions to general developments.

There is also only one example of the third type: 黌 *ɣwæŋ 'school' whose coda may have later fronted to *-ɲ.

Types 2-3 may be borrowings postdating the shift of *-VN to *-an but predating the shift of *-an to *-ai:

Early pre-BSC
Late pre-BSC
(N/A since these are loans)
thai (tshai?)
*kæŋ > *kaɲ > *kan

The th of thai may be a typo for tshai in the 小學堂 Xiaoxuetang database, as it lists no other examples of BSC th- from *tsh-, and thai sounds even less like a jangling noise than tshai does.

鏘 and 黌 are low-frequency characters, so their BSC readings thai (tshai?) and xai may be literary borrowings without colloquial (i.e., native) equivalents *tshoi and *xoi.

<HACÑ(Ī)>

And now for a Thai-Islamic detour that will bring me back to 白水村 Baishuicun 'White Water Village' ...

Last night I found the Thai Wikipedia article for hajji, which is titled หัจญี <hacñī> [hàtjiː]. It lists another form, ฮัจญี <ɦacñī> [hátjiː]. Reportedly, the 1982 edition of the Royal Dictionary lists two more forms:

หะยี <haḥyī> [hàʔjiː]

หัจญี <hacñī> [hàtjiː]

Although Thailand has a large Muslim population which is two-thirds Malay, I never looked at any Thai terminology for Islam or Thai transcriptions of Malay until now. The above forms made me wonder:

1. Is Thai terminology for Islam based on Malay: e.g., are [hàtjiː] etc. loans from Malay haji rather than directly from Arabic ḥajjī?

2. Thai has no consonant like Malay and Arabic j [dʒ]. I am accustomed to seeing English [dʒ] rendered as [j] or [tɕ]: e.g.,

เอเจนท์ <ʔecend̽> [ʔeːtɕên]

เอเจนต์ <ʔecent̽> [ʔeːtɕên]

เอเยนต์ <ʔeyent̽> [ʔeːjên]

(The falling tone of the second syllable is unwritten in those spellings. I assume the tone is falling on the basis of alternate spellings with a first tonal marker*, though I would expect a high tone in a final syllable that originally ended in a nasal-stop cluster in English.)

But I have never seen a foreign [dʒ] rendered as Thai จญ <cñ> [tj] before. Another instance of <cñ> is

ฮัจญ์ <ɦacñ̽> [hát] 'hajj' (with a silencer over <ñ>; syllable-final <c> is [t])

ญ <ñ> normally represents [j] from an earlier *ɲ. Why is it in transcriptions for a nonnasal consonant?

3. What principles underlie the choice of tones for Islamic/Malay loans in Thai? The first closed syllable of 'hajji' has both high and low tones (the two most common possibilities for closed syllables with short vowels), and the second open syllable has an unmarked mid tone (like some but not all English loans).

4. Who devised these Thai spellings? Were they Thai-Malay bilinguals? Did they know the Jawi script for Malay (which I briefly mentioned here)?

The Malay spoken in the Thai-Malaysia border region has a number of interesting phonetic characteristics that are apparently not reflected in Jawi spelling which seems to be historical. The fronting of *aN to /ɛː/ reminds me of the *-an > *-ai shift from parts 2 and 5 of my series of Baishuicun; in both cases, a final nasal conditions the fronting of a preceding *a.

*The first tonal marker indicates a falling tone in a sonorant-final syllable when it is atop a *voiced consonant symbol such as ญ <ñ> or ย <y>. It indicates a low tone in such a syllable when it is atop an *implosive or *voiceless consonant symbol. The starred (i.e., reconstructed) qualities are not necessarily retained in modern Thai. *Voiced obstruents have devoiced, *voiceless sonorants have voiced, and the *implosives are no longer implosive: e.g.,

*ban¹ > [pʰân]

*an¹ > [màn]

*ɓan¹ > [bàn]

On the other hand, *voiced sonorants and *voiceless obstruents retain their original voicing qualities:

*man¹ > [mân]

*pan¹ > [pàn]

*an¹ > [àn]
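The marker rule in this footnote can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class labels, the function names `first_marker_tone` and `modern_reflex`, and the small table of initials are invented for the example, and the table covers only the initials whose modern reflexes are spelled out in this footnote.

```python
# Original (reconstructed) consonant classes named in the footnote.
# The class labels themselves are names chosen for this sketch.
FALLING = {"voiced obstruent", "voiced sonorant"}
LOW = {"implosive", "voiceless obstruent", "voiceless sonorant"}

# Modern reflexes of the initials that the footnote spells out.
MODERN_INITIAL = {"b": "pʰ", "d": "tʰ", "ɓ": "b", "m": "m", "p": "p"}


def first_marker_tone(original_class: str) -> str:
    """Tone of a sonorant-final syllable written with the first tonal marker."""
    if original_class in FALLING:
        return "falling"
    if original_class in LOW:
        return "low"
    raise ValueError(f"unknown consonant class: {original_class}")


def modern_reflex(initial: str, original_class: str) -> tuple[str, str]:
    """Return (modern initial, tone) for a reconstructed initial."""
    return MODERN_INITIAL[initial], first_marker_tone(original_class)


for initial, cls in [("b", "voiced obstruent"), ("ɓ", "implosive"),
                     ("m", "voiced sonorant"), ("p", "voiceless obstruent")]:
    print(initial, cls, modern_reflex(initial, cls))
```

The point of keeping tone keyed to the original class rather than to the modern initial is that modern [m] can carry either tone, depending on whether it goes back to a voiced or a voiceless sonorant.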

A tonal split (*¹ > falling/low) compensates for the loss of voiced obstruents and voiceless sonorants. The implosives have moved into the space vacated by original *b and *d (which have become [pʰ] and [tʰ]), but vowels following them still bear the tones associated with *implosives.

HAJI, HAZHE, HAZHI

Sorry, I'm on another Sino-Islamic detour.

The Chinese Wikipedia article for hajji has three types of transcriptions (readings here are in Mandarin unless stated otherwise and tones are not included):

1. *velar-initial second syllable:

哈吉 haji [xatɕi], 阿吉 aji

These must postdate the recent palatalization of *k in Mandarin.

2. affricate-initial* open second syllable:

哈只 hazhi [xatʂr̩] (the transcription in the article title), 哈芝 / 哈指 / 哈治 / 哈志 hazhi

The second syllables of these transcriptions have different tones:

'yin level': 芝

'rising': 只指

'departing': 治志

3. affricate-initial *closed second syllable:

哈哲 hazhe [xatʂɤ] (Cantonese haazit [haːtsiːt])

Mandarin 哲 zhe has lost the *-t retained in Cantonese and has a 'yang level' tone in the standard language.

I have several questions:

1. What are forms for 'hajji' in the Chinese varieties spoken by the 回 Hui people?

2. Is there a standard tone class for the second syllable in 回 Hui speech (which is not to be confused with 徽州 Huizhou Chinese)?

3. How is 'hajji' written in the toneless Xiao'erjing script?

4. Do the spoken and written forms in the Hui community match the transcriptions of the non-Hui Chinese world?

5. What is the oldest known Chinese character transcription of 'hajji'? My guess is that the earliest transcriptions were of the hazhi type.

6. Wikipedia states that 哈哲 hazhe (Cantonese haazit) is the transcription used in Hong Kong and Macao. If this transcription was devised by a Cantonese speaker, why does its Cantonese reading have a -t corresponding to nothing in Arabic? If I set the Chinese Wikipedia page to display in Hong Kong or Macao complex characters, the title of the article is still 哈只 hazhi (Cantonese haazi [haːtsiː]) which is a better phonetic match in Cantonese.

*These affricates are not original either, but their affrication predates Islam and is not relevant.

AFANTI

I'm going to take a northern detour away from 白水村 Baishuicun 'White Water Village' to look at Mandarin 阿凡提 afanti 'effendi' (< Arabic afandī or the like). I've long had the impression that such Islamic loanwords were borrowed into Mandarin in recent centuries. However, afanti has an aspirated -t- [tʰ] which is a weak match for foreign -d-. The t- [tʰ] of Mandarin 提 is from *d-. Was  afandī borrowed into a Chinese language that retained *d-? I doubt that for two reasons.

First, I would expect Islamic loans to be from the northwest, and Tangut transcription evidence indicates that *d- had become *tʰ- in the northwest by the early second millennium AD.

Second, a Chinese language retaining *d- in 提 would probably also have retained *v- in 凡. Was afandī borrowed as *avandi? I suppose one could try to evade this problem by proposing that this Chinese language devoiced *v- before *d-, so afandī was borrowed as *afandi. I have thought that the Chinese variety underlying Sino-Vietnamese (John Phan's 'Annamese Middle Chinese'; AMC) might have devoiced *v- before *d-*, but I am not sure. In any case, AMC could not have been the source of afanti for geographical and phonological reasons. 凡 had a final *-m in AMC that does not match the -n- of afandī.

By coincidence the ultimate Greek source of afandī is αὐθέντης <authéntēs> with a -t- corresponding to the  -t- [tʰ] of Mandarin afanti. Obviously afandī postdates several changes in Greek:

- the shift of au to aβ

- the shift of aspirates to fricatives (tʰ > θ)

- the devoicing of β to -f- before voiceless consonants like θ

- the simplification of -fθ- to -f-

- the voicing of -t- to -d- after -n-

- the raising of -ē- to -i-
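As a rough check, the listed changes can be run as ordered string rewrites. This sketch makes several assumptions that are mine rather than the post's: the changes are applied in the order listed (the list is not necessarily chronological), the first shift is taken to be au > aβ (implied by the later mention of β), the input is an unaccented phonemic rendering of <authéntēs>, and the name `apply_changes` is invented for the example.

```python
# Ordered (old, new) pairs mirroring the list of Greek changes.
GREEK_CHANGES = [
    ("au", "aβ"),   # au > aβ (assumed target of the first shift)
    ("tʰ", "θ"),    # aspirates become fricatives
    ("βθ", "fθ"),   # β devoices to f before voiceless θ
    ("fθ", "f"),    # -fθ- simplifies to -f-
    ("nt", "nd"),   # -t- voices to -d- after -n-
    ("ē", "i"),     # -ē- raises to -i-
]


def apply_changes(form: str, changes: list[tuple[str, str]]) -> list[str]:
    """Apply each rewrite in order and return every intermediate stage."""
    stages = [form]
    for old, new in changes:
        form = form.replace(old, new)
        stages.append(form)
    return stages


stages = apply_changes("autʰentēs", GREEK_CHANGES)
print(" > ".join(stages))
```

The last stage is the end point of the Greek changes only; it is not the Arabic form, and it says nothing about the Mandarin aspirate.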

So I'm back to where I started: why does afandī correspond to Mandarin afanti?

*My logic was that *v- patterned like *f- in Sino-Vietnamese (SV):

AMC SV stage 1 SV stage 2 SV stage 3 Modern SV
*pʰ- *pʰ- *pʰ- *pʰ- ph- [f]
*v- > *f-
*b- *b- *p- *ɓ- b- [ɓ]
*p- *p-

Early Vietnamese had no *f-, so *pʰ- was the closest equivalent of AMC *f-.

If AMC still had *v-, I would expect it to correspond to SV b- < *b-. (I assume early Vietnamese had no *v-, and that modern v- is from *w-. There are no cases of Chinese *v- corresponding to Vietnamese v-, which leads me to believe that Vietnamese had no *v- at the time of borrowing and that the shift of *w- to v- postdates borrowing.)

I now think SV reflects a stage of AMC in which all voiced obstruents had been devoiced:

AMC SV stage 1 SV stage 2 Modern SV
*pʰ- *pʰ- *pʰ- ph- [f]
*v- > *f-
*b- > *p- *p- *ɓ- b- [ɓ]

Vietnamese spelling reflects the *pʰ-/ɓ-stage of the 17th century.

A DIP INTO WHITE WATERS (PART 6): AI-XCEPTION CLASS 1

About one-seventh of 白水村 Baishuicun (BSC) 'White Water Village' -ai forms cannot be traced back to rhymes with non-*a-vowels plus nasals at the left end of this chain from part 5:

*-VN > *an > *-ai > *-oi > -o

I have classified those remaining forms into eight categories.

The first category consists of -ai forms from Old Chinese *-əˁ:

tai (borrowing layer 2), to (borrowing layer 1) < *dˁəˁ < *Cʌ-dəʔ or *Nʌ-təʔ

mai (borrowing layer 2) < *mˁəˁʔ < *Cʌ-məʔ

tai (borrowing layer 2), lo < *CV-tai (borrowing layer 1) < *Nʌ-tˁəˁsˁ < *Nʌ-təs

the tone of tai (but not lo!) indicates a voiced initial *d- which may be from *Nʌ-t-

pai (borrowing layer 2), (native) < *bˁəˁ < *Cʌ-bə or *Nʌ-pə

I think there are at least three layers in these forms.

is native and may directly reflect Old Chinese *-əˁ. BSC borrowed from prestige dialects whose *-əˁ developed a glide:

*-əˁ > *-əɰ > *-əj > *-aj

The first layer of borrowings predates the *-ai > -o shift in BSC and the second layer postdates it.

The first layer of borrowings also predates lenition and the loss of presyllables in BSC.

A DIP INTO WHITE WATERS (PART 5): A CH-*AI-N SHIFT?

The 小學堂 Xiaoxuetang database is back, so I can add a new link (in bold) to my 白水村 Baishuicun (BSC) 'White Water Village' chain shift from part 2:

*-VN > *an > *-ai > *-oi > -o

Here are some sample words with composites of prestigious Early and Late Middle Chinese cognates for comparison:

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*dəm *dam *tan tai
*len *lien *lan lai
*ʂɤan *ʂæn *san sai
*khwan *khwan *khan khai
*təŋ *təŋ *tan tai
*lɨəŋ *lɨəŋ *lan lai
*neŋ *nieŋ *nan lai ~ nai
*tshɨm *tshim *tshan tshai
*mun *vun *man mai ~ uai
*mon *mon *man mai
*kən *kən *kan kai
*sin *sin *san sai
*touŋ *toŋ *(CV-)tan lai
*luoŋ *lyoŋ *lan lai

Rhymes with non-*a-vowels plus nasals merged into *-an, which shifted to -ai after an earlier *-an shifted to *-oi and an earlier *-oi shifted to -o. Here is how that merger might have taken place:

Stage 1 Stage 2 Stage 3
*-in *-en *-an
*-ɨm *-en or *-on or *-ən
*-on *-on

In stage 1, pre-BSC had a vowel system resembling prestige EMC.

In stage 2, pre-BSC front vowels merged into *e before nasals and back vowels merged into *o before nasals. Central vowels could have merged into *e, *o, or *ə before nasals.

In stage 3, pre-BSC *-en and *-on merge into *-an. The *-n conditions an *-i- that remains after the nasal is lost:

*-an > *-ain > -ai
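The three stages and the final breaking can be sketched as a small Python function. Several choices here are assumptions of mine, made only so the code can run: central vowels are sent to *ə in stage 2 (the text leaves *e, *o, and *ə open), all nasal codas are merged into *-n in stage 2, and the vowel sets and the name `derive_ai` are invented for the example. Original *a is rejected because earlier *-an took a different path.

```python
FRONT = {"i", "e"}
BACK = {"u", "o"}
CENTRAL = {"ɨ", "ə"}
NASALS = {"m", "n", "ŋ"}


def derive_ai(onset: str, vowel: str, coda: str) -> list[str]:
    """Stages from a stage-1 non-*a vowel + nasal rhyme down to BSC -ai."""
    if coda not in NASALS:
        raise ValueError("this sketch only covers nasal codas")
    if vowel in FRONT:
        merged = "e"
    elif vowel in BACK:
        merged = "o"
    elif vowel in CENTRAL:
        merged = "ə"  # assumption: the text also allows *e or *o here
    else:
        raise ValueError("the vowel must be a non-*a vowel")
    return [
        onset + vowel + coda,  # stage 1
        onset + merged + "n",  # stage 2: vowel merger, codas merged to *-n
        onset + "an",          # stage 3: merger into *-an
        onset + "ain",         # *-n conditions *-i-
        onset + "ai",          # nasal lost
    ]


print(" > ".join(derive_ai("s", "i", "n")))
print(" > ".join(derive_ai("tsh", "ɨ", "m")))
```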

There are two forms in the first table that would not be in the second:

*san > sai

*khan > khai

I think those forms were borrowed after earlier *-an became *-ai which then became -oi. Hypothetical native cognates would be *soi and *khoi.

mai may be native, whereas 文 uai is a borrowing from some late Tang or newer form resembling Sino-Vietnamese văn. (It is geographically impossible for BSC to have borrowed from Sino-Vietnamese, but a local neighboring language could have had a similar form.) uai tells us that the *-an to -ai shift postdates the *m- > v- shift in the source of uai.

The l- of 冬 lai and 東 lai is due to lenition after a prefix *CV- that was lost at some unknown point.

I have kept 龍 lai⁴¹ separate from the homophones 冬 lai⁴⁴ and 東 lai⁴⁴ since they have different tones: a mid-high falling tone from a *voiced initial (*l-) and a mid-high level tone from a *voiceless initial (*-t-).

As has been the case so far, a single rule cannot account for all forms with the same rhyme. I will write about other sources of *-ai in part 6.

A DIP INTO WHITE WATERS (PART 4): LEFT-*O-VERS

One-fifth of 白水村 Baishuicun (BSC) 'White Water Village' dialect forms ending in -o cannot be explained using the sound laws I proposed in parts  2 and 3. I will try to explain these forms which fall into eight categories. I could have covered the first two categories in part 3, but they are more complicated than the majority of the *-at class.

1. 佛 fo < *fat (< *but) 'Buddha'

This appears to be a loan from a language in which *-ut became *-at as in Cantonese 佛 fat (though Cantonese is not spoken in Hunan and therefore is not the source of fo). An earlier loan is pu which is close to prestigious Early Middle Chinese *but, an abbreviation of 佛陀 *but da 'Buddha'.

Could the word simply be a very recent borrowing from Mandarin fo (whose rhyme is irregular)?

I am reluctant to guess glosses for BSC forms since the 小學堂 Xiaoxuetang database does not provide any, but in this case I'm pretty sure 佛 is 'Buddha' (though 佛 may have other meanings in BSC), and it would be odd to mention 佛陀 *but da without also mentioning its Indic source.

2. 蝨 so < *sat (< *ʂɨt < *ʂit < *srit < *srik)

This might be a loan from a language in which *-it became *-at as in Cantonese 蝨 sat (though I must note again that Cantonese cannot be the source of this particular word).

3. -o < *-ai

In part 2, I dealt with BSC -o-forms corresponding to -ai-type rhymes in other Chinese languages. The following words may have had pre-BSC *-ai even though they may not have -ai-like rhymes in other Chinese languages:

Sinograph Old Chinese Early Middle Chinese Late Middle Chinese Pre-BSC BSC
(*srəj > *ʂɨj?) *ʂi *ʂi *sai so
*Cɯ-baj or *Nɯ-paj *bɨe *bɨi *pai po
*ʔəj *ʔɨj *ʔi *ai o
*pəts > *pɨjh *pujʰ *fi *pai po
*məjʔ > *mɨjʔ *mujˀ *vi *mai mo
*pajʔ *paˀ *pa *pai po
*naj *na *na *nai no
*lats *da(j)ʰ *da(j) *tai to

Pre-BSC *-ai may have been a reflex of Old Chinese *-aj/-əj/-ɨj-type rhymes; it cannot have been borrowed from a prestigious EMC or LMC dialect. (BSC 衣 i is a loan from a prestige LMC-like form.)

篩 is not attested in Old Chinese. Pre-BSC 篩 *sai is reminiscent of Cantonese 篩 sai, though it cannot be from Cantonese. The word could be borrowed from a form like Mandarin 篩 shai (whose reading is from 籭; the hypothetical regular Mandarin reading would be *shi).

4. 麥 mo < *mai

This has an unexpected yang departing tone. Is this a very late loan from a form like Mandarin 麥 mai which lost its final stop and entering tone? If it were a native word or an early loan, it should have an entering tone as a trace of its original *-k.

5. -o < *-raw

ko is ultimately from Old Chinese *kraw but it could be a loan from *kaw or *ko in some later language. I can't look further into this (or anything else BSC-related) because the Xiaoxuetang site is down.

6. -o < (*-wa? <) *-ra

There are three forms in this category: 灑下拏. 咬 from category 5 may also belong to this category, as pre-BSC or a source language may have been like Taiwanese which has -a from both *-ra and *-raw:

咬 T ka (kau is a borrowing)

灑 T sa (borrowed?; se may be native; cf. 下 below)

下 T literary (i.e., borrowed) ha; the colloquial (i.e., native) form is e

拏 T na (borrowed?; displaced a native *ne?)

"Like" does not entail a close relationship. Taiwanese is both genealogically and geographically distant from BSC.

7. -o < *-ə(ŋ)

Is 扔 no from Old Chinese *nəŋ or an open-syllable variant *nə (cf. Japanese no < *nə for 乃, the phonetic of 扔)?

8. Unrelated synonyms

I initially thought -o of 久 no might be a reflex of Old Chinese *-ə in 久 *kʷəʔ, but I doubt n- is from *kʷ-. Moreover, the yang departing tone of no is not what I would expect for a descendant of 久 *kʷəʔ. I conclude that no is an unrelated synonym.

mo < *mat? has an initial and an entering tone (from an earlier final stop) that rule out any relationship with Old Chinese *Cɯ-ʔoj. Even if *C- were *m-, the tone would still be irregular.

no < *nat? has the same problems as 萎 mo; it cannot be related to EMC 拋 *phræw (the word is not attested in Old Chinese).

ŋo < *ŋat? is not related to Old Chinese 鷹 ʔəŋ.

ko superficially resembles Sino-Japanese but may be from *kat which cannot be related to EMC 硬 *ŋɤeŋʰ (the word is not attested in Old Chinese, and the SJ initial is irregular).

A DIP INTO WHITE WATERS (PART 3): FINAL ST-*AP-S

The chain shift I proposed in part 2 explains three-fifths of -o syllables in the 白水村 Baishuicun (BSC) 'White Water Village' dialect:

*-ai > *-oi > -o

Another fifth requires further sound laws:

*-ap merged with *-at (cf. the *-p > -t shift in Nanchang which is 500 km to the east and not closely related)

*-at > *-ait > *-oi(t) > *-o

I propose that *-t had a fronting effect on *a similar to the fronting effect of -d in Tibetan:

'eight': Proto-Sino-Tibetan ?*prjat >

Old Chinese *pret > pre-BSC *pat > *pait > *poi(t) > BSC po

Earlier Tibetan brgyad > Lhasa cɛʔ

Here are more examples to show the merger of multiple *-t and *-p rhymes. The Early and Late Middle Chinese forms here are composites based on prestige dialects and are not directly ancestral to BSC.

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*ləp *lap *lap > *lat lo
*ɣɤap *ɣæp *xap > *xat xo
*kɤep *kæp *kap > *kat ko
*ɣwet *xwiet *xwat fo
*xɤat *xæt *xat xo
*pɤet *pæt *pat po
*puot *fat *fat fo
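The path from the pre-BSC column to the BSC column can be traced mechanically. The sketch below is an illustration under my own assumptions: the *-oi(t) stage is written without its optional stop, the *xw- > f- change is placed last purely for convenience (its real timing is unknown), and the function name `derive_o` is invented for the example.

```python
def derive_o(pre_bsc: str) -> list[str]:
    """Stages from a pre-BSC *-ap/*-at syllable down to BSC -o."""
    stages = [pre_bsc]
    form = pre_bsc
    if form.endswith("ap"):
        # *-ap merges with *-at
        form = form[:-2] + "at"
        stages.append(form)
    if not form.endswith("at"):
        raise ValueError("this sketch only covers *-ap and *-at rhymes")
    onset = form[:-2]
    # *-at > *-ait > *-oi(t) > -o
    for rhyme in ("ait", "oi", "o"):
        stages.append(onset + rhyme)
    if onset == "xw":
        # *xw- > f-; placing it last is an arbitrary choice of this sketch
        stages.append("fo")
    return stages


# Pre-BSC and BSC forms as given in the table above.
TABLE = [("lap", "lo"), ("xap", "xo"), ("kap", "ko"), ("xwat", "fo"),
         ("xat", "xo"), ("pat", "po"), ("fat", "fo")]

for pre, bsc in TABLE:
    print(" > ".join(derive_o(pre)), "| attested:", bsc)
```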

Although BSC no longer has any final stops, they have conditioned tones absent from words without original final stops: high level if the initial consonant was *voiceless and mid level if the initial consonant was voiced. I don't know when the final consonants were lost after *a shifted to *ai before *-t.

發 must be a loanword because it has *f- instead of the expected *p- (see my discussion of 煩 xoi and 佛 pu ~ fo in part 1). Foreign *f- might have been borrowed as *f- after BSC developed its own *f- from *xw- in words such as 穴. Conversely, it is also possible that *xw- became *f- after loanwords introduced *f- into the BSC phonemic inventory.

A DIP INTO WHITE WATERS (PART 2): A CH-OI-N SHIFT?

In part 1, I proposed the following sound change for the 白水村 Baishuicun (BSC) 'White Water Village' dialect:

*-an > *-oi

I now propose a larger chain shift:

*-an > *-ai > *-oi > -o

BSC -o often corresponds to *-e/*-aj-type rhymes in prestigious Early and Late Middle Chinese dialects which were not its ancestors (but might have been sources of loans into BSC). I do not include tones in pre-BSC and BSC forms. I have not heard BSC, but I assume that its -i is [j] after vowels, so there is no real difference between MC *-j and (pre-)BSC (*)-i.

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*najʰ *nàj *nai > *noi no
*nəjʰ *nəj > *naj
*buojʰ *fàj *pai > *poi po
*bɤajʰ *bàj
*kɤe *kæj *kai > *koi ko

My proposal accounts for 59% (58/99) of the -o forms in the 小學堂 Xiaoxuetang database. I will deal with the others in part 3.
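The coverage figure is easy to verify. This is just a quick arithmetic check using the two counts cited above; the variable names are mine.

```python
explained, total = 58, 99

share = explained / total          # fraction of forms the proposal covers
remaining = total - explained      # forms left for later parts

print(f"{share:.1%} explained, {remaining} forms remaining")
```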

BSC p- corresponding to LMC *f- (e.g., in 吠) indicates a native word. A hypothetical early loan of 吠 would be *xo and a hypothetical late loan would be *fo (see my discussion of 煩 xoi and 佛 pu ~ fo in part 1).

A DIP INTO WHITE WATERS (PART 1)

Over the past couple of days I have been intrigued by the dialect of 湘南土話 Xiangnan Tuhua 'local speech of southern Hunan' spoken in 白水村 Baishuicun 'White Water Village' in 江永縣 Jiangyong County. In " 'More' Evidence", I found that 更 Old Chinese (OC) *kraŋ(s) / Middle Chinese (MC) *kɤaŋ(ʰ) 'watch of the night'/'more' corresponded to Baishuicun (BSC) koi. I hypothesized that -oi was from an *-aɲ like that of Sino-Vietnamese canh/cánh [kaɲ]. Looking at other BSC -oi forms, I can make a more general statement: *A/O-type vowels followed by nonback nasals (*ɲ, *n, *m) became -oi. The nasals probably merged into *-n before becoming -i.

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*buan *van *xan xoi
*kən *kən *kon or *kan? koi
*tan *tan *CV-tan loi
*kwɤan *kwæn *kwan koi
*ʂɤen *ʂæn *san soi
*kɤaŋ *kæŋ *kaɲ koi
*ŋɤem *ŋæm *ŋam ŋoi
*ʂɤam *ʂæm *sam soi
*bon *bon *pon > *pan? poi

Pre-BSC is a very rough guess bridging Late Middle Chinese (LMC)* and BSC. It looks more like a typical Chinese language than BSC does.

xoi is probably a loanword. I think the native BSC reflex of EMC *b- is p-: e.g., 佛 pu 'Buddha' (a later borrowed form is fo). BSC is not descended from generic LMC dialects which lenited labial stops to fricatives before *u. The borrowing of 煩 must predate the shift of *-an to *-oi and the borrowing of *f-. *x- was the closest pre-BSC equivalent of foreign *f-. I conclude there are at least two layers of borrowings that can be distinguished by their treatment of foreign labiodental fricatives: an older x-layer and a newer f-layer.

I think the l- of 單 is due to BSC-internal lenition.

The above scheme accounts for most but not all instances of -oi in BSC. Two requiring further investigation are

吾 OC *ŋa > EMC, LMC *ŋo : BSC ŋoi

崖 OC *ŋre > EMC *ŋɤe > LMC *ŋæj : BSC ŋoi

Neither had nasals in earlier Chinese. The normal BSC reflexes of OC *-a and *-re are -u and -o. 崖 ŋoi could be a borrowing from a dialect that had broken *-ɤe to *-æj or the like. But the -i in 吾 ŋoi remains a mystery.

The above scheme cannot account for cases in which -oi did not develop from *-an: e.g., 肝 BSC kaŋ (not *koi!) <  OC/MC *kan. It seems that velars somehow blocked the *-an to -oi shift.

Next: A ch-oi-n shift?

*9.11.23:03: The LMC reconstruction here is a composite of the prestige dialects underlying borrowed forms in Chinese and non-Chinese languages. It is not a direct ancestor of pre-BSC, though such an ancestor may have been similar and may have borrowed from an LMC prestige dialect.

'SECONDAR-Y' ROUNDING IN CANTONESE

The unexpected labiovelar /kʷ/ in  Cantonese 梗 /kʷaːŋ˧˥/ 'stem' (among many other meanings) from my last post brought to mind a Cantonese form that has puzzled me for many years: 乙 /jyːt˧/ 'second Heavenly Stem'. I used to think its rounded vowel /yː/ was unique, as it wasn't in any reconstruction or actual form that I had ever seen: e.g.,

Old Chinese *ʔi̯ɛt (Karlgren), *qrig (Zhengzhang), my *ʔrət or *ʔrit (I cannot find any rhyming evidence favoring one vowel over the other*)

Middle Chinese: *ʔi̯ĕt (Karlgren), *ʔɣiɪt (Zhengzhang), my *ʔɨit

Mandarin yi

Taiwanese it

Sino-Vietnamese ất [ʔət]

Sino-Korean ŭl [ɯl] < idealized ʔɯ́rʔ

Sino-Japanese otsu < *ət

However, I now see that rounded vowels are not only in Yue varieties like Cantonese but also in a few southern non-Yue varieties:

Yue: too many /y/-varieties to list; other rounded vowels (or glides?) are in

Xintian Fantian, Dapu Taiheng jɵk

Kaiping (Chikan) zuat

Taishan (Taicheng) zᵘɔt (? - there is no syllable like this in Stephen Li's Taishan syllabary)

Mengshan iut

Huaiji wut

Dongguan (Guancheng) zøt

Bao'an (Shajing) (j)iɔʔ

Ping: Nanning yt (loan from Cantonese?)

Hakka: Huizhou yat


Zhongshan (Gong'an) iuə

Shaozhou Tuhua:

Xingzi ɵy

西岸 Xi'an oi

Bao'an u

All the non-Yue varieties are within the Yue area, so their rounding may be due to Yue influence.

If rounding is a Yue innovation, why did it happen? Both 梗 and 乙 had medial *-r- in Old Chinese. Did that *-r- sporadically become *-w-? (Cf. Elmer Fudd's "wascally wabbit".) Are there other Old Chinese *-r- words with modern labial reflexes?

I would expect kw-reflexes of Old Chinese 甲 *qrap 'first Heavenly Stem', but the only remotely similar forms are

Yixian kɔɐ̆ʔ

Guilin (Chaoyang) kuo

Jiangyong Chengguan (Baishuicun) kuə

whose diphthongs might be breakings of an *o from *a (cf. o-forms like Lingchuan (Tanxia) ko). None of those three varieties have rounded vowels in 乙, though Baishuicun does have a rounded vowel in 梗.

*My Old Chinese *ʔrət and *ʔrit could both become Middle Chinese *ʔɨit:

OC *r-vocalization -breaking monophthongization
*-ət *-ɨət *-ɨt (> *-ut after labials)
*-rət *-ɨət *-ɨit
*-rit *-ɨit

I have included two other rhymes for comparison.

There was a chain shift:

*-ət > *-ɨət > *-ɨit

That could be interpreted as a push or pull chain:

Push: When *-ət broke to *-ɨət, it 'pushed' original *-ɨət into the 'space' of *-ɨit.

Pull: When original *-ɨət merged with *-ɨit, it left a gap to be filled by *-ət after it broke to *-ɨət.

I generally prefer pull chains, but mixed reflexes of *-ət and *-rət might point to a push chain.

'MORE' EVIDENCE FOR THE LIMITS OF THE MIDDLE CHINESE LEXICOGRAPHICAL TRADITION

Last night I saw this passage in the Wikipedia article on Cantonese phonology:

There are about 630 sounds [i.e., syllables disregarding tones?] in the Cantonese syllabary. Some of these, such as /ɛː˨/ and /ei˨/ (欸), /pʊŋ˨/ (埲), /kʷɪŋ˥/ (扃) are not common any more; some such as /kʷɪk˥/ and /kʷʰɪk˥/ (隙), or /kʷaːŋ˧˥/ and /kɐŋ˧˥/ (梗) which has traditionally had two equally correct pronunciations are beginning to be pronounced with only one particular way uniformly by its speakers (and this usually happens because the unused pronunciation is almost unique to that word alone) thus making the unused sounds effectively disappear from the language [...]

At first I was puzzled by 梗 /kʷaːŋ˧˥/ 'stem' (among many other meanings) which has a labiovelar initial even though it is written with a velar phonetic 更 'watch of the night'/'more'. I have never seen an Old Chinese reconstruction of 梗 with a labiovelar or labial. 梗 had no *-w- according to the Middle Chinese lexicographical tradition based on prestige varieties. But not all modern forms arise from those varieties. Forms in multiple branches of Chinese (written here without tones) may point to *-w-:

Yunhe kuɛ (see here for more Wu forms with -u-)

Nanchang ku (see here for more Gan forms with -u-; is Leping mu a typo for kuaŋ?)

Fuzhou literary (!) ku ~ colloquial keiŋ (see here for more Min forms with -u-)

Lechang kuɐn (see here for more Yue forms with -u-)

Lingui yɛn (the only Ping form with a labial; the aspiration is irregular and can be sporadically found in other Ping varieties and in Yue, Hakka and even Mandarin)

Meixian literary (!) ku ~ colloquial kɛn (see here for more Hakka forms with -u-)

Fengyang kua (see here for more Shaozhou Tuhua forms with -u-)

The -u- and -y- of those forms cannot be derived from Middle Chinese reconstructions for 梗 such as my *kɤaŋˀ or Old Chinese reconstructions derived in turn from those reconstructions: e.g., my *kraŋʔ. (I reconstruct its phonetic 更 as Middle Chinese *kɤaŋ(ʰ) from Old Chinese *kraŋ(s).)

梗 has no labials in Mandarin, Jin, or Xiang. Was labiality lost in the north, or is it a common retention of southern languages that do not form a subgroup? Fuzhou, Meixian, and perhaps other varieties may have borrowed from one or more southern literary Middle Chinese dialects with a labial absent from other prestige dialects.

For comparison, 更 does not have a labial with a few exceptions:

Shaxian and Sanming kɔ̃ (< *kaŋ?; Sanming also has kɛ̃; other Min forms here)

Yangshuo kyɛ̃ (but Lingui kəŋ; other Ping forms here)

Hezhou kɔ (< *kaŋ?)

Jiangyong Chengguan (Baishuicun) koi (< *kaɲ?; cf. Sino-Vietnamese canh 'watch of the night' ~ cánh 'more' [kaɲ])

The labials of most of these forms do not necessarily point to *-w-. The shifts I propose for Shaxian, Sanming, and Hezhou have parallels in northwestern Middle Chinese (in which *-aŋ became *-o; a similar shift occurred in neighboring Tangut and its relative Japhug rGyalrong).

*PI̵K A CODA

Here's something I don't see every day: a Chinese character (逼) whose readings have three different types of codas:

Velar/glottal: Cantonese bik [pɪk], Suzhou ʔ (source)

Alveolar/dental: Sino-Japanese hitsu < *pit

Labial: Sino-Korean phip

Its Middle Chinese reading was *pɨk. I can't explain this diversity.

9.8.0:48: There are Chinese readings with -t as well, but they are regular reflexes of *-k after front vowels: e.g.,

Toisanese pet < *pek (source)

Meixian Hakka pit < *pik (source)

That is not the case with Sino-Japanese hitsu with a -tsu instead of the expected -ki. There was no such fronting rule in Japanese or in the Chinese source dialects of Sino-Japanese.

Conversely, 匹 Middle Chinese *phit has two Sino-Japanese readings: hiki as well as the expected hitsu < *pit.

Sino-Korean is full of irregularly aspirated labial initials. In fact there is no *pa in Sino-Korean; all syllables that should be *pa are pha: e.g., 波 pha < Middle Chinese *pa. I have long wondered if this was the product of hypercorrection. (Korean never had f, so Chinese *f- was Koreanized as the stops p- and ph-, and words without *f- in Chinese such as 波 might have been read as if they had *f-.)

The idealized Sino-Korean readings of Tongguk chŏngun (1448) lack this excess aspiration of labials. (I would expect the Tongguk chŏngun reading of 逼 to be *pík, but I can't find 逼 in that dictionary. Although Martin 1992: 126 listed pík in a table of Tongguk chŏngun readings, I don't see it in the book itself.)

On the other hand, Sino-Korean is almost completely lacking in kh-readings, though Tongguk chŏngun has them where they are expected: e.g.,

可 Sino-Korean ka, Tongguk chŏngun khǎ < Middle Chinese *khaˀ

This may tell us something about the chronology of the development of Korean aspirates which are either borrowed or secondary.

THE M-ISSING COMMISSAR

Today I read about Genrikh Lyushkov (1900-1945?), whose title brought to mind a question that I've had for a long time: why was German Kommissar borrowed into Russian as комиссар komissar? Why is an m 'missing' from that word and команда komanda 'team' (< French commande)? Was there a rule to simplify sequences of identical consonants at prefix-root boundaries in spellings of loans?

Latin com-missarius > R komissar

Latin com-mendare > R komanda

(The Latin forms are for root identification only and do not necessarily match the later forms' parts of speech, etc.)

But what about

Latin com-mercium > F commerçant > R коммерсант kommersant (not *комерсант komersant) 'merchant'

and Latin com-mutator > R коммутатор kommutator (not *комутатор komutator) 'switchboard'?
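The proposed simplification rule can be checked mechanically against the four loans above. The following is only a minimal sketch: the Latin and romanized Russian forms are copied from this post, while the function name and the crude "starts with komm" test are assumptions made for illustration.

```python
# Latin etymon (hyphen marks the prefix-root boundary) paired with the
# romanized Russian loan, as listed in this post.
LOANS = [
    ("com-missarius", "komissar"),
    ("com-mendare", "komanda"),
    ("com-mercium", "kommersant"),
    ("com-mutator", "kommutator"),
]

def simplifies_boundary_geminate(latin: str, russian: str) -> bool:
    """True if a Latin m-m across the prefix-root boundary surfaces as single m."""
    prefix, root = latin.split("-", 1)
    has_boundary_geminate = prefix[-1] == root[0] == "m"
    # Hypothetical test: a retained geminate shows up as initial "komm".
    return has_boundary_geminate and not russian.startswith("komm")

results = {russian: simplifies_boundary_geminate(latin, russian)
           for latin, russian in LOANS}

for russian, simplified in results.items():
    verdict = "follows" if simplified else "violates"
    print(f"{russian}: {verdict} the proposed rule")
```

Two of the four loans follow the rule and two violate it, which is why the rule cannot be stated purely in terms of prefix-root boundaries.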

That rule obviously does not apply to native words with secondary sequences resulting from syncope: e.g., введение vvedenie 'introduction' < въведеніе vŭ-vedenie 'in-leading' ≠ ведение vedenie 'leadership'.

Sequences of identical consonants within a root word remained intact: e.g., the -ss- of missarius and the -mm- of communis (hence R коммунизм kommunizm 'Communism').

Elsewhere in East Slavic, although both Belarusian and Ukrainian have phonemic gemination, all of the above loanwords (presumably borrowed from Russian) lack geminates:

Gloss Russian Belarusian Ukrainian
commissar комиссар
team команда
merchant коммерсант
switchboard коммутатор
Communism коммунизм

R комитет komitet / B камітэт kamitet / U комітет komitet does not fall into this category since its French source comité (< English committee) already lacked the double consonants of Latin committere.

Finnish komissaari 'commissioner' looks like a borrowing from Russian komissar.

Why does Serbo-Croatian комесар komesar have an -e- instead of an -i-?

9.6.20:48: And why did Old Latin comoine(m) become Latin communis 'common' with -mm-?

LIANMA, LINGMO, LINYIN

Most Chinese character spellings of foreign place names in Japanese can be explained in terms of Chinese and/or Japanese readings.

One baffling exception is 布哇 Hawai 'Hawaii' which makes no obvious sense in either Chinese or Japanese. I have written about it thrice (2008, 2010, 2012). I can't think of a better explanation than what I proposed in 2012.

I discovered what initially appeared to be another exception tonight in Yamamoto (2009: 81): 嗹馬 Denmāku 'Denmark' which would be read as Lianma in Mandarin and as *Renba in normal Sino-Japanese. I didn't think it could have been created by a Japanese speaker because Japanese not only has [d] but also has characters pronounced [den]. Mandarin, on the other hand, has no [d] (what is romanized d is actually an unaspirated [t]). So was voiced l intended to be a substitute for voiced d? Apparently it was, as it and similar Chinese names for Denmark turn up in the 1852 edition of the 海國圖志 Illustrated Treatise on the Maritime Kingdoms by 魏源 Wei Yuan:

嗹國 Lianguo (guo is 'country')

領墨 Lingmo (-ng would be an acceptable substitute for -n to a speaker of a Chinese variety like Shanghai without an -n : -ng distinction)

吝因 Linyin (I have no idea what -yin is doing)

9.6.0:36: The Japanese may have taken the spelling 嗹馬 from the Treatise given its influence in Japan:

Wei's work was also to have a later impact on Japanese foreign policy. In 1862, samurai Takasugi Shinsaku, from the ruling Japanese Tokugawa shogunate, visited Shanghai on board the trade ship Senzaimaru. Japan had been forced open by US Commodore Matthew C. Perry less than a decade earlier and the purpose of the mission was to establish how China had fared following the country's defeat in the Second Opium War (1856–1860). Takasugi was aware of the forward thinking exhibited by those such as Wei on the new threats posed by Western "barbarians" [...] Sinologist Joshua Fogel concludes that when Takasugi found out "that the writings of Wei Yuan were out of print in China and that the Chinese were not forcefully preparing to drive the foreigners out of their country, rather than derive from this a long analysis of the failures of the Chinese people, he extracted lessons for the future of Japan". Similarly, after reading the Treatise, scholar and political reformer Yokoi Shōnan became convinced that Japan should embark on a "cautious, gradual and realistic opening of its borders to the Western world" and thereby avoid the mistake China had made in engaging in the First Opium War. Takasugi would later emerge as a leader of the 1868 Meiji Restoration which presaged the emergence of Japan as a modernised nation at the beginning of the 20th century. Yoshida Shōin, influential Japanese intellectual and Meiji reformer, said Wei's Treatise had "made a big impact in our country".

*BAKUSHIKO AND *BAKUSHIK(W)A

I almost added this to my last post, but I ran out of time, and the topic is somewhat different, n  ...

I have long been puzzled by Chinese character spellings of foreign place names in Japanese. Some seem to be hybrids of Chinese and Japanese readings.

For instance, when I Googled for モスコウ Mosukou and 1939 last night, I found 北京より 莫斯古へ Pekin yori Mosukou e, 高山洋吉 Takayama Yōkichi's 1939 translation of Sven Hedin's Von Peking nach Moskau. Although the Kobe University City Library has it catalogued as Pekin yori Mosukuwa e (with the currently dominant Japanese name of the city - the most likely term to be searched), I assume 莫斯古 was meant to be read as Mosukou since the name appears as モスコウ Mosukou in the title of chapter 11. 莫斯古 Mosukou would be read as *Mosigu [mwɔ sz̩ ku] in Mandarin and *Bakushiko in normal Sino-Japanese. Whoever created that spelling seemed to be thinking of Mandarin 莫斯 [mwɔ sz̩] followed by Sino-Japanese 古 ko. (There is no [mɔ] or [mo] in standard Mandarin.)

I used to think another Japanese spelling 莫斯科 was a direct loan from Mandarin Mosike [mwɔ sz̩ kʰɤ] (ke was once [kʰɔ] and is still [kʰɔ] or the like in many other varieties of Chinese today). But could it be a blend of Mandarin 莫斯 [mwɔ sz̩] followed by Sino-Japanese 科 kwa (pronounced [ka])? Was 莫斯科 first attested in Chinese or Japanese? In any case, it cannot be based on its hypothetical normal Sino-Japanese reading *Bakushikwa.

Tonight I discovered a third Japanese spelling 莫斯哥 in Yamamoto (2009: 78). This looks like a direct loan from the less common Mandarin name Mosige [mwɔ sz̩ kɤ] (ge was once [kɔ] and is still [kɔ] or the like in many other varieties of Chinese today). In normal Sino-Japanese, it would be read *Bakushika which sounds nothing like Moskva or Moscow.

In the Kobe University Library Newspaper Clippings Collection, 莫斯科 appears 815 times between 1912 and 1941, 莫斯哥 appears only once in 1916, and 莫斯古 does not appear at all. The three katakana spellings combined outnumber the kanji spellings by nearly two to one (1564 : 816).

MOSUKUWA VS. MOSUKŌ (AND MOSUKOU)

While looking up ワルシャワ Warushawa and ワルソー Warusō in various editions of Kenkyusha's New Japanese-English Dictionary, I noticed that their distribution paralleled that of モスクワ Mosukuwa (< Russian Moskva) and モスコー Mosukō (< English Moscow). Was there a shift toward more Slavic-flavored Japanizations by 1974?

Here is the distribution of both terms and a third term in the Kobe University Library Newspaper Clippings Collection:

Mosukuwa: 866 results, 1912-1942

Mosukō: 568 results, 1912-1942

モスコウ Mosukou: 130 results, 1915-1936

I was expecting Mosukuwa to be less common than Mosukō by analogy with Warushawa and Warusō, but the reverse is true. I wish I had postwar statistics. Here are current Google statistics showing the gaps between the three have widened considerably:

Mosukuwa: 1.36 million

Mosukō: 119,000 results including non-Russian Moscows (cf. the use of Warusō for non-Polish Warsaws)

Mosukou: 10,600 results including モスコウイッツ Mosukowittsu 'Moskowitz'
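To make "widened considerably" concrete, the two sets of counts can be compared as ratios. This is a rough sketch using only the figures quoted in this post and the one above; the variable and function names are hypothetical, and the Google figures are the approximate hit counts as reported, not exact frequencies.

```python
# Counts as quoted in this post.
newspaper = {"Mosukuwa": 866, "Mosukō": 568, "Mosukou": 130}            # 1912-1942
google = {"Mosukuwa": 1_360_000, "Mosukō": 119_000, "Mosukou": 10_600}  # approximate hits

def ratios_to_rarest(counts):
    """Express every count as a multiple of the rarest spelling, to one decimal."""
    smallest = min(counts.values())
    return {name: round(n / smallest, 1) for name, n in counts.items()}

newspaper_ratios = ratios_to_rarest(newspaper)
google_ratios = ratios_to_rarest(google)

# The katakana total behind the 1564 : 816 comparison with the kanji spellings.
katakana_total = sum(newspaper.values())

print(newspaper_ratios)
print(google_ratios)
print(katakana_total)
```

In the newspaper collection Mosukuwa is only about one and a half times as common as Mosukō, whereas in the Google counts the gap is more than elevenfold.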

I've never heard Moscow rhyme with Mexico. Is that pronunciation still current, and if so, where?

WARUSHAWA VS. WARUSŌ

The influence of English on Japanese has only grown over time, while the influence of other European languages has waned: e.g., German-based dēdētē 'DDT' (in this dictionary of extinct Japanese words) has been replaced by English-based dīdītī. (What was the last major German or French loanword in Japanese?) So when I see a continental European loanword, I assume it is pre-1945: e.g., ワルシャワ Warushawa 'Warsaw' which sounds like Polish Warszawa (though Polish w is [v]).

That was why I was surprised to see an English-like ワルソー Warusō for 'Warsaw' in the September 2, 1939, Asahi shinbun. (Yes, the 75th anniversary of the beginning of WWII is still on my mind.) How far back do Warushawa and Warusō go? I wish Google Ngram Viewer worked with Japanese.

I quickly found various attestations of Warusō from the period:

- the September 18, 1939, entry of the diary of 馬淵良三 Mabuchi Ryōzō

- the October 6, 1939, 大陸日報 Continental Daily News published in Vancouver, BC

- 宮本百合子 Miyamoto Yuriko, "The Flames of the Life of Mrs. Curie" (December 1939)

- the Privy Council's "Abolishing an Imperial Embassy in Poland" (October 1, 1941)

Was Warusō the standard Japanese name for Warsaw at the time? Judging from Wikipedia, today it seems to linger only in a few contexts such as ワルソー条約 Warusō jōyaku 'Warsaw Convention' (1929; cf. ワルシャワ条約 Warushawa jōyaku 'Warsaw Pact' with the same jōyaku) and ワルソー・コンチェルト Warusō koncheruto 'Warsaw Concerto' (1941). The Japanese Wikipedia entry for Warsaw doesn't mention Warusō as an alternative of Warushawa. Looking in various editions of Kenkyusha's New Japanese-English Dictionary, I discovered that editions prior to 1974 only listed Warusō. The 1974 edition listed both Warusō and Warushawa for the first time.

The 1975 edition of Sanseido's New Concise Japanese-English Dictionary that I have been using for over thirty years lists only Warushawa in its appendix of place names.

It would be interesting to see when, say, Asahi shinbun shifted from Warusō to Warushawa. A couple more data points: I just found Warusō in the December 20, 1919 Ōsaka asahi shinbun (image / HTML) and Richard Austin Freeman's The Case of Oscar Brodski, translated into Japanese by 妹尾韶夫 Seno Akio in 1957.

Okay, a few more: Warushawa first appears in the Kobe University Library Newspaper Clippings Collection in the May 30, 1913, Jiji shinpō (image / HTML), and appears in papers up through 1939 (image / HTML). Warusō first appears in that collection in 1915 (image / HTML), and last appears in 1941 (image / HTML). Warusō outnumbers Warushawa by a ratio of roughly seven to one (138 : 21). Today in Google, Warusō is vastly outnumbered by Warushawa (20,300 : 515,000). Warusō is the only Japanization of Warsaws outside Poland, but I presume those other Warsaws aren't mentioned enough to give Warushawa serious competition.

GYDDANYZC

I have long been interested in Slavic partly because it underwent massive vowel loss paralleling the massive vowel losses I reconstruct for Chinese and Tangut.

My favorite example is monosyllabic Gdańsk from *Gŭdanĭskŭ* with four syllables (Comrie 1987: 326). That city has been on my mind lately because today is the 75th anniversary of the German invasion of Poland.

On Saturday I learned that Gdańsk is first attested as Gyddanyzc sometime after 997 AD. Is that spelling evidence for

- the retention of medial *ŭ and *ĭ and

- the loss of final *ŭ

circa 1000 AD?

Why were the vowels that were later lost both spelled y? Had they merged into [ɨ] in the version of the name that was transcribed? They could not have merged in the ancestor of the modern name Gdańsk, as ń is from *nĭ and still reflects the palatal quality of the lost vowel *ĭ. Or is ny a transcription of [ɲ]?

Is the doubling of d significant?

Why was the sibilant before c [k] written as z instead of s? The z also appears in the later spellings Kdanzk (1148), Gdanzc (1188), and Danzc (1263). I assume the z of the spellings Danczk (1311), Danczik (1399), and Danczig (1414) is half of a digraph cz [tʂ] and is not evidence for [z].

*This matches Old Church Slavonic Гъданьскъ <Gŭdanĭskŭ>. Is that form of the name attested in ancient texts, or is it a retroactive creation?

I assume the ISO 639-1 code cu for OCS is from c(h)u(rch). cu makes me think of Cuman which has no ISO 639-1 code; its three-letter ISO 639-3 code is qwm. qum and cum were already taken for Sipakapense in Guatemala and Cumeral in Colombia.

EAT-YMOLOGY 3: *NZ- > *NDZ-?

I just realized that 'eat' from my last post wasn't the best example of a word that had undergone brightening without lenition. What if lenition were followed by fortition (in bold) after a nasal?

stem 1: *NI-dza > *NI-dzja > *NI-z- > *Nz- > *ndz- > dzi 1.11

stem 2: *NI-dza-w > *NI-dzjaw > *NI-z- > *Nz- > *ndz- > dzio 1.51

Perhaps these are better examples of brightening without lenition:

Tangut 0749 phi 1.11 'to order' (stem 1), 4568 phio 2.44  'to order' (stem 2)  : Japhug kɤ-ɣɤ-xpra 'to order'

also cf. Somang ka-wa-kprá 'to order' preserving Proto-rGyalrong *kpr-

Pre-Tangut *CI-Kpra(-w-H) > *CI-Kprja(w-H) > *Kpr- > phi(o) 1.11/2.44

or Pre-Tangut *KI-pra(-w-H) > *KI-prja(w-H) > *Kpr- > phi(o) 1.11/2.44

Tangut 5449 1tị 'to put' 1.67 (stem 1), 5633 1tiọ 'to put' 1.72 (stem 2)  : Japhug kɤ-ta 'to put'

Pre-Tangut *CI-S-ta(-w) > *CI-Stja(w) > *tt- > ti ~ tiọ 1.67/1.72

or Pre-Tangut *SI-ta(-w) > *SI-tja(w) > *tt- > ti ~ tiọ 1.67/1.72

If lenition had preceded brightening, ph- and t- would have lenited to *v- and *l- in those words.

I do not know if the *I of the brightening presyllable followed *K- that conditioned the aspiration of ph- and/or the *S- that conditioned the tension of rhymes 1.67 and 1.72 (indicated by a subscript dot). *I could have been in a presyllable preceding one or both of those consonants.

Pre-Tangut *K(I-)pr- nicely matches Proto-rGyalrong *kpr-, but pre-Tangut *S(I)-t- does not match Proto-rGyalrong *t-. Perhaps the aspirated th- of Written Burmese thāḥ 'to put' is from *St-.

Old Chinese 置 *trək-s 'to place' may be an unrelated lookalike even if it is from *r-tək-s, as it has a *-k absent in  the other languages.

I cannot explain why stem 2 of 'to order' has a second ('rising') tone from an *-H absent in stem 1.

EAT-YMOLOGY 2: BRIGHTENING BEFORE LENITION?

The second item in Guillaume Jacques' 2006 list of Tangut-Japhug rGyalrong comparisons is

Tangut 5113 1wji 'to do' (stem 1), 3621 1wjo 'to do' (stem 2)  : Japhug kɤ-pa 'to close'

also cf. other rGyalrong forms: e.g., Somang ka-pa 'to do'

(more in #1133-1135 at Nagano and Prins' database)

See my first "Eat-ymology" post for an explanation of stems 1 and 2.

That post was about a parallel pair of stems:

Tangut 4517 1dzji 'to eat' (stem 1), 4547 1dzjo 'to eat' (stem 2)  : Japhug kɤ-ndza 'to eat'

The parallelism is not as apparent if one looks at rhyme numbers and/or my reconstructions:

Stem 1 2
'to do' vɨi 1.10 vɨo 1.51
'to eat' dzi 1.11 dzio 1.51

Guillaume uses Gong Hwang-cherng's reconstruction in which rhymes 1.10 and 1.11 are both -ji. I reconstruct them differently. I also reconstruct different allophones of 1.51 after different initials. The class II initial v- is followed by Grade III -ɨ- but not Grade IV -i-. It was somehow antipalatal in a way that medial -w- is not. Gong reconstructed w in both initial and medial position and did not reconstruct a Grade IV distinct from Grade III.

I can see why Gong reconstructed rhymes in those two grades identically. There are a few minimal pairs involving them: e.g.,


0932 ʔɨi 1.10 'many' (only with that meaning in dictionaries?) : 3119 ʔi 1.11 'many'

Those pairs were probably the reason why the Tangut split rhymes (e.g., into 1.10 -ɨi and 1.11 -i). When no such pairs were present, the Grade III/IV distinction was subphonemic, and there was no split. Hence there was only one rhyme 1.51 -ɨo/-io. The frontness of the first half of the diphthong was predictable.

I think the Grade III/IV distinction was absent from pre-Tangut. I reconstruct the pre-Tangut sources of 'to do' as

stem 1: *CI-pa > *CI-pja > *CI-β- > *vi > vɨi 1.10

stem 2: *CI-pa-w > *CI-pjaw > *CI-βj- > *vjo > vɨo 1.51

I do not know if *-ja and *-jaw became *-i and *-jo before or after lenition. My guess is that such shifts predated the loss of stop codas that might have blocked the raising of *-ja to *-i:

Brightening stage 1 *-ja *-jaw *-jaC
Brightening stage 2 *-i *-jo *-jaC
Final coda loss *-i *-jo *-ja
GV > VV diphthong reanalysis -i -io -ia

Some of the changes above could be viewed in terms of a drag chain:

*-jaC > *-ja > *-i

Tangut syllables with lenited initials must have once had presyllables conditioning intervocalic lenition:

*presyllable + labial > *β- > v-
*presyllable + dental > l-

*presyllable + alveolar > *z- > ɮ-

*presyllable + palatal > - > ʐ-

*presyllable + velar > ɣ-

Brightening must have preceded lenition because there are words such as 'to eat' with brightening but without lenition. 'To eat' must have lost its brightening presyllable before 'to do':

Gloss 'to eat' 'to do'
Pre-Tangut *NI-dza *CI-pa
Brightening *NI-dzja *CI-pja
Presyllable to prenasalization *Ndzja *CI-pja
Lenition *Ndzja *CI-β-
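The ordering argument in the table above can be sketched as a toy rule cascade. This is only an illustration under simplifying assumptions: the three functions are hypothetical string rewrites of my stage labels, they handle stem 1 only, and the lenited outputs (*z, *β) stand for the intermediate stages rather than the attested Tangut initials.

```python
def brighten(form):
    """*a > *ja in the root while a high-vowel presyllable (*CI-, *NI-) is present."""
    pre, sep, root = form.rpartition("-")
    if sep and pre.endswith("I") and root.endswith("a") and not root.endswith("ja"):
        return f"{pre}-{root[:-1]}ja"
    return form

def fuse_nasal(form):
    """*NI- loses its vowel and becomes prenasalization (*NI-dz- > *Ndz-)."""
    return "N" + form[len("NI-"):] if form.startswith("NI-") else form

# Root-initial obstruents and their lenited counterparts (dz before p is arbitrary here).
LENITION = (("dz", "z"), ("p", "β"))

def lenite(form):
    """Lenite a root-initial obstruent only if a presyllable still precedes it."""
    pre, sep, root = form.rpartition("-")
    if not sep:  # no presyllable, hence no intervocalic environment
        return form
    for plain, weak in LENITION:
        if root.startswith(plain):
            return f"{pre}-{weak}{root[len(plain):]}"
    return form

def derive(form, rules):
    for rule in rules:
        form = rule(form)
    return form

TABLE_ORDER = (brighten, fuse_nasal, lenite)     # the order in the table above
LENITION_FIRST = (lenite, brighten, fuse_nasal)  # the counterfactual order

for pre_tangut in ("NI-dza", "CI-pa"):
    print(pre_tangut, derive(pre_tangut, TABLE_ORDER), derive(pre_tangut, LENITION_FIRST))
```

With the table's order, 'to eat' escapes lenition because its presyllable has already fused into prenasalization; with lenition first, its *dz- is wrongly lenited.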

If brightening had followed lenition, 'to eat' should have been *ɮi 1.11 (stem 1) / *ɮio 1.51 (stem 2) with *ɮ- from a lenited *-dz-.

EAT-YMOLOGY

Guillaume Jacques' 2006 list of Tangut-Japhug rGyalrong comparisons begins with

Tangut 4517 1dzji 'to eat' (stem 1), 4547 1dzjo 'to eat' (stem 2)  : Japhug kɤ-ndza 'to eat'

Guillaume used Gong's 1997 reconstruction of Tangut.

Those Tangut words in my reconstruction are 1dzi (without -j-) and 1dzio (which could be rewritten as dzjo).

Stem 2 (in bold below) was used before the first and second person singular suffixes when the object is in the third person. Otherwise stem 1 was used:

Subject \ object of 'eat' ... me ... us ... thee ... you ... him/her/it/them
I ... (no 'I eat me', etc.)


We ...


Thou ...

 (no 'You eat you', etc.)
You ...


He/she/it/they ...





The seventeen slots in that table have only six forms. I list my reconstructions when they differ from Gong's.

1. bare stem 1 (3rd person subject and object)

2. stem 1 + 2ŋa (first person singular object)

3. stem 1 + 2nja (= my 2nia; second person singular object)

4. stem 1 + 2nji (= my 2ni; nonthird person plural subject and/or object)

5. stem 2 + 2ŋa (first person singular subject + 3rd person object)

6. stem 2 + 2nja (= my 2nia; second person singular subject + 3rd person object)
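The slot count and the stem choice can be verified with a short enumeration. This is a hedged sketch: it assumes that the excluded cells are exactly those in which a first or second person subject has an object of the same person (as the "no 'I eat me', etc." notes suggest), and it models only the stem 1 / stem 2 choice, not the suffixes, whose precedence in mixed cases is not spelled out here. All names are hypothetical.

```python
from itertools import product

# Rows and columns of the table: (person, number); third person is one row/column.
ARGUMENTS = [("1", "sg"), ("1", "pl"), ("2", "sg"), ("2", "pl"), ("3", "any")]

def is_possible(subject, obj):
    """Assumption: 1st/2nd person subjects cannot take an object of the same person."""
    return subject[0] == "3" or subject[0] != obj[0]

def stem(subject, obj):
    """Stem 2 with a 1sg or 2sg subject and a 3rd person object; otherwise stem 1."""
    singular_speech_act_subject = subject[0] in ("1", "2") and subject[1] == "sg"
    return 2 if singular_speech_act_subject and obj[0] == "3" else 1

slots = [(s, o) for s, o in product(ARGUMENTS, ARGUMENTS) if is_possible(s, o)]
stem2_slots = [(s, o) for s, o in slots if stem(s, o) == 2]

print(len(slots))        # number of filled cells in the table
print(len(stem2_slots))  # cells that take stem 2
```

Under that assumption the enumeration yields the seventeen slots mentioned above, only two of which (forms 5 and 6) use stem 2.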

Reconstructing the history of the Tangut and Japhug words for 'to eat' involves dealing with issues 2 and 3 from my last post.

Cognates of the Tangut word such as Japhug kɤ-ndza, Written Tibetan za-ba, and Written Burmese cā generally have a. The high front vowel of Tangut -ji is assumed to be the product of 'brightening' (Matisoff 2004).

In 2009, Guillaume derived -ji in 'eat' from *-ja. I don't know whether he still does in his new book. This is phonetically plausible. However, it raises the question of where the *-j- in *-ja came from. How far back can it be projected? Did languages such as Japhug, Tibetan, and Burmese lose it? Or is it a Tangut-internal innovation?

Gong (1994: 42) thought Old Chinese and Tangut retained Proto-Sino-Tibetan *-j- whereas Tibetan and Burmese generally lost it. Perhaps he would have said Japhug had lost it in this word. (There are no *affricate-j clusters in Guillaume's (2004: 331-332) Proto-rGyalrong reconstruction. Did *ndzj- simplify to *ndz-?)

On the other hand, I did not reconstruct -j- in Tangut. I proposed that the brightening of *a to -i was due to a high-vowel presyllable:

*CI-dza > *CI-dzja > 1dzi

*CI-dza-w > *CI-dzjaw > 1dzio

The problem with this hypothesis is the absence of external evidence for *CI-. Could the Japhug reflex of *CI- be n-; i.e, was *CI- something like *[ni]? If so, perhaps the presyllable was absorbed into the initial in both Japhug and Tangut:

Pre-Proto-rGyalrong *ni-dza > Proto-rGyalrong *ndza >

Japhug -ndza

Somang -zá

Zbu -ndzeʔ, -ndziʔ (with brightening conditioned by the front vowel of *ni-?), -ndzʌʔ

Tangut: *ni-dza > *ni-dzja > *ndzi > 1dzi

I have followed Gong and Arakawa who reconstructed Tangut voiced obstruent initials without prenasalization, but others such as Nishida (1964) and Sofronov (1968) would disagree. The most recent scholar in favor of complex voiced obstruent initials is Tai (2008:

[...] there are regular use of prescripts in front of voiced obstruents [in the Tibetan transcription of Tangut], suggesting that there should be a pre-initial consonant [in Tangut], which is probably a weak nasal or glottal sound.

I followed Guillaume who reconstructed *ndz- at the Proto-Tangut (= my pre-Tangut) level in 2009.

GUILLAUME JACQUES' ESQUISSE DE PHONOLOGIE ET DE MORPHOLOGIE HISTORIQUE DU TANGOUTE NOW IN PRINT

That was the best news I'm likely to hear all week. This month too. Maybe even this year.

Unfortunately I haven't seen the book yet. Google Books has no preview for it. Nonetheless I am confident that I will be impressed. I have seen Guillaume's previous work on Tangut and rGyalrong and look forward to seeing how he has built upon it. I am particularly interested to see his treatment of the following topics:

1. What shared innovations distinguish his proposed Macro-rGyalrongic group from the rest of Qiangic or - if  Macro-rGyalrongic is his term for Qiangic - the rest of Sino-Tibetan?

Tonight I found the 2011 dissertation of Marielle Prins (whose rGyalrongic database I constantly use) which states  on p. 21 that there is "an absence of common innovations" in Qiangic. Prins proposed that

the similarities between the Qiangic languages may be caused by diffusion rather than be genetic in nature. [...] It is more likely that the shared features of these languages are the result of contact induced structural convergence, and that the Qiangic group should be considered an areal language group rather than a group of genetically related languages. (p. 22)

I wonder what Guillaume would say about that.

I am not sure whether Prins is denying that the Qiangic languages are related at all, or if she is just rejecting Qiangic as a subgroup. The latter position need not entail a complete absence of a genetic relationship: e.g., Qiangic could consist of languages from multiple Sino-Tibetan branches which have converged. Is Prins' Qiangic like my Altaic (completely unrelated languages that have converged) or like the Balkan languages (which are from different branches of Indo-European)?

2. I assume Guillaume is still using Gong's 1997 reconstruction of Tangut which has three grades of rhymes (his III corresponds to my III and IV):

Grade Gong Gong's source Arakawa This site
I -Ø- *-Ø- -Ø- -Ø- + lowering of high vowel
II -i- *-r- -j- -ɤ-
III -j- *-j- long vowel -ɯ-
IV -i-

Gong's Tangut -j- and Old Chinese *-j- were retentions from his Proto-Sino-Tibetan *-j-. On the other hand, Guillaume does not reconstruct Old Chinese *-j-. How does he account for Gong's Tangut *-j-?

3. Pre-Tangut *a was raised and fronted ('brightened') to various degrees. I have tried to explain the multiple reflexes of *a by reconstructing presyllables with front vowels:

*CI-Ca > Ci

*CE-Ca > Cie

More recently I have hypothesized that some 'brightening' might have been conditioned by a suffix *-j: e.g.,

1749 *kwa-j > 1kwe 'hoof'

cf. Ersu nkhuɑ⁵⁵ 'id.'; more cognates at STEDT

How does Guillaume explain 'brightening' in Tangut?

4. Gong reconstructed long vowels that do not correspond to long vowels in Tangut transcriptions of Sanskrit. I am now agnostic about those vowels and reconstruct them with an abstract symbol ' to differentiate them from their much more common '-less counterparts (short vowels in Gong's reconstruction). The zero-' distinction does not seem to correspond to anything in rGyalrong; both types of Tangut vowels correspond to the same Japhug rGyalrong vowels (Jacques 2006): e.g.,

Tangut rhyme Gong This site Japhug
37 -jij -ie -i, -e, -o
40 -jiij -ie'

What does Guillaume think is the source of vowel length in Gong's reconstruction? Does that length reflect a distinction lost in Japhug?

5. I reconstructed *-H as the source of the Tangut second ('rising') tone; syllables without *-H developed the Tangut first ('level') tone. This type of tonogenesis has parallels in Chinese, Tibetan, and Burmese, but not in Southern Qiang (Evans 2007). What is Guillaume's account of the origin of Tangut tones?

A *E(YE)-GRADE ROOT?

If Tangut 1new 'breast', 2niu 'to drink milk', and 2niụ 'to give milk' are from the root *√n-w that I proposed in my last post, there ideally should be other sets of -ew ~ -iu words. Unfortunately, I still haven't gotten around to looking for them, but today it did occur to me that Tangut

4684 1me 'eye'

and Old Chinese 目 *muk 'eye' might share a root *m-kʷ:

*m-e-kʷ (e-grade) > *mew > 1me

(See this series of posts on Tangut *labial-w syllables: 12.7.23 / 12.7.28 / 12.7.29).

*m-kʷ (zero-grade) > 目 *muk

This word is widespread in Sino-Tibetan (STEDT roots #33, 681, 682). It often has an -i-: e.g., Tibetan mig 'eye'. Was there an i-grade? (I have borrowed the terms e-grade and zero-grade from Indo-European studies. There is no such thing as an i-grade in Indo-European, but perhaps it existed in Sino-Tibetan.) Or was the root *mʲ-kʷ with an initial palatalized consonant that was vocalized as -i- in the zero grade in many languages?

One might want to resurrect an old-fashioned reconstruction *mjuk for Old Chinese 目 *muk and view its *mj- as a reflex of *mʲ-, but I have never seen any evidence for *-j- in modern Chinese languages, and there is no trace of *-j- in Sinoxenic:

Taiwanese bak (colloq.), bok (lit.)

Cantonese muk

Mandarin mu (in earlier reconstructions, *mj- > w-, not m-)

Sino-Vietnamese mục (not *dục < *mjuk or *mʲuk)

Sino-Korean mok

Sino-Japanese moku, boku

One might also be tempted to regard 覓 Middle Chinese *mek 'to seek' as being from an e-grade *m(ʲ)ekʷ like Tangut 1me 'eye', but the earliest attestation of the word that I can find is in Yupian (c. 543 AD), so I do not know if it should be reconstructed at the Old Chinese level. It could be a later unrelated innovation that has nothing to do with m-words for 'eye'.

A *N-W WORD FAMILY?

The Tangut word

4834 2niụ < *S-nuH 'to give milk'

from my last two posts is a causative derivative of

4614 2niu < *S-nu 'to drink milk'

and I think those two words are related to


2123 1new 'breast' =

left of 3588 1new 'radish' (phonetic) +

left and center of 5275 2nɪʳ 'breast' (semantic).

Tangut *-w can be from *-k or *-w. If 2123 1new 'breast' is from *new and not *nek*, then it and the niu-words may share a root *√n-w:

pre-Tangut prefix root consonant 1 vowel root consonant 2 suffix gloss
e-grade Ø n- -e- -w Ø breast
zero-grade Ø -H to drink milk
S- to give milk

The *-w of the zero-grade root would have been pronounced as a vowel [u].

Could the grade hypothesis account for the vocalic diversity of these cognates?

Old Chinese 乳 *Cɯ-noʔ 'nipple, milk' could be from an o-grade *n-o-w. (The prefix could be *pɯ- if 孚 *phu is phonetic.)

If the above scenario is correct, are there other cases of *Cew ~ *Cu alternations in Tangut, and what is the significance of the different grades?

Li (2008: 832) regarded 5275 2nɪʳ 'breast' as a loan from Chinese. The only similar Chinese word I know of is 奶 'breast, milk'. But I cannot find any attestations of 奶 before the Qing Dynasty. If 奶 had existed in northwestern Middle Chinese, it would have been pronounced *nəjˀ, and if Tangut speakers added a *T-prefix, the resulting *T-nəjˀ would have developed into 2nɪʳ. The Old Chinese source of *nəjˀ would be *Cʌ-nəʔ which might have come from an even earlier *Cʌ-nəw-ʔ: i.e., a schwa-grade form of *√n-w. Perhaps the pre-Tangut prefix directly reflects the Old Chinese prefix if the latter had survived in the colloquial speech of the northwest during the Middle Chinese period:

OC *Tʌ-nəw-ʔ > *Tʌ-nəʔ > MC *T(ʌ)-nəjˀ > pre-Tangut *T(ʌ)-nəjˀ > Tangut 2nɪʳ

However, all that is highly speculative.

2nɪʳ could be an unrelated lookalike from a pre-Tangut source such as *Cʌ-nirH.

In any case, I cannot think of a way to derive 2nɪʳ from *√n-w within Tangut. If it is ultimately from that root, it would have to be a Chinese loanword.

*One might be tempted to reconstruct *-k since Maru has nuk⁵⁵ 'breast, milk', but Maru -k is an innovation.

A BOVINE DYNASTY? (PART 2)

Guillaume Jacques (2010) equated the second half of ngo.snuHi, the Tibetan transcription of the name of the mythical first Tangut emperor, with Tangut

2niụ < *S-nuH 'to give milk'

In 2008, Guillaume rejected the temptation to go further and equate Tibetan s- with pre-Tangut *S- (his *s-):

Une hypothèse plus audacieuse pourrait être de voir dans ce s- une notation du préfixe causatif *s- qui doit se reconstruire pour ce verbe. En tangoute, nju.² [= 2niụ]  est dérivé de nju² [= 2niu]  (#4614) 'boire du lait'; le préfixe causatif *s- a disparu, laissant comme seule trace la 'voix tendue' notée par un point en dessous de la voyelle (Gong 1999). Cette hypothèse, toutefois, est très improbable dans la mesure où elle supposerait que soit conservée dans la graphie tibétaine une prononciation du tangoute plus ancienne que le système reconstruit à partir des dictionnaires du XIIème siècle, et donc antérieure d’au moins quatre cent ans aux textes tibétains eux-mêmes.

He regarded the Tibetan s- as "un simple artifice orthographique" since

dans le tibétain central du XIVème siècle, les consonnes préinitiales étaient déjà probablement confondues, voire amuies

but I wonder if ngo.snuHi reflects a nonstandard 14th century Tangut dialect which preserved pre-Tangut *S-. Tangut may have been internally diverse, and this dialect may have been to 12th century standard Tangut what modern Cantonese (which preserves final stops) is to Tangut period northwestern Chinese (which lost final stops) or what Ladakhi (which preserves some s-clusters: examples here and here) is to 14th century central Tibetan.

The -Hi may reflect a -j which was another trait of this 14th century Tangut dialect. Summing up the differences between the two types of Tangut and their common parent pre-Tangut:

Word cow to give milk
Pre-Tangut *ŋwə(-j)-H *S-nu(-j)-H
Standard Tangut 2ŋwɪ 2niụ
Later nonstandard Tangut ŋwə or ŋ(w)o snuj
Tibetan transcription ngo snuHi

The standard dialect had a *-j suffix in 'cow' absent from the nonstandard dialect. Conversely, the nonstandard dialect had a *-j suffix in 'to give milk' absent from the standard dialect.

It is remotely possible that the -iụ of standard 2niụ could be a metathesis of *-u-j rather than a breaking of *u. But even if that were true - and I don't think it is - many or even most -iu could not be from *-u-j, as -iu regularly corresponds to Japhug rGyalrong < Proto-rGyalrong *-u (Jacques 2004: 143, 2006: 16-17). Moreover, if such a metathesis had occurred in standard Tangut, I would expect Chinese *-uj or perhaps even *-wi to correspond to Tangut -iu in very early loans. No such loans have been identified.

In any case, *-j cannot go back very far because probable cognates lack it, and nothing else leads me to believe that Tangut preserved a *-j lost elsewhere.

Next: A *n-w word family?

A BOVINE DYNASTY? (PART 1)

Guillaume Jacques (2010) equated 2339, the first syllable of the Tangut imperial surname 2ŋwɪ 1mi, with its homophone (and near-homograph) 0395 2ŋwɪ 'cow':


The shared center and right components are phonetic. The surname tangraph has 'sage' on the left, whereas 'cow' has the center of 'bear' according to Precious Rhymes of the Tangraphic Sea:


0395 2ŋwɪ 'cow' = 5605 2riẽ 'bear' + 2139 2ŋwɪ 'a kind of bird'

Without looking outside Tangut, I could reconstruct the pre-Tangut source of 2ŋwɪ 'cow' as

*Cʌ-ŋwiH (if the -w- is original) or

*Pʌ-ŋiH (if the -w- is from a presyllable)

with a low presyllabic vowel to condition the lowering of *i. However, it is unlikely that the root vowel was once *i. Probable external cognates such as Old Chinese 牛 *ŋʷə 'cow' and Written Burmese nvāḥ (< *ŋwaH?*; many more here) point to a nonfront vowel. This word was borrowed into southwestern Tai as *ŋuaA 'ox'**.

I used to reconstruct the rhyme of 2ŋwɪ 'cow' as -əi. Could 2ŋwɪ or 2ŋwəi be from *ŋʷə-i-H?

The name of the first Tangut emperor was transcribed as ngo.snuHi in Tibetan. Guillaume Jacques identified that as Tangut

0395 4834 2ŋwɪ 2niụ 'the cow gives milk /  [someone] fed milk by the cow'

whose meaning was rendered in Tibetan as

ba-la Hthung-ba

cow-DAT milk drink-NMLZ

'he who drinks milk from the cow'

ngo might be a transcription of a nonstandard Tangut *ŋwə without my proposed suffix *-i. There is no character for schwa in the Tibetan script, so Tibetan o might represent a schwa. It is also possible that a pre-Tangut *ŋwə could have become *ŋo in that dialect (whereas *-wə did not fuse into -o in standard Tangut).

*Was *ŋw- > *nw- a regular change in Proto-Lolo-Burmese? I can't remember if my unpublished reconstruction from twenty years ago had either cluster. Matisoff's (1972) reconstruction has only one word with *ŋ(w)- which has a variant initial *mw- (not *nw-!).

**Do variants with w/v- and h- reflect different sources of borrowing? See Gedney's list of forms in Hudak 2008: 95. Unfortunately I could not find the word in Pittayaporn's 2009 dissertation on Proto-Tai. It may have been excluded because it could not be reconstructed at the Proto-Tai level.

WAS THE TANGUT IMPERIAL FAMILY THE MI OF WEI?

Two years ago I saw Guillaume Jacques' derivation of the Tangut imperial surname

2339 1903 2ŋwɪ 1mi

from a hypothetical homophonous phrase

0395 4542 2ŋwɪ 1mi 'the cow feeds [someone]' / 'fed by the cow'

At the end of last month, I saw another derivation but couldn't remember what it was. I found it last night in Nishida (2010: 233):

It is very probable that the second syllable, miɦ (level 11), of ŋʷwɪ-miɦ (level 11) meaning "imperial family" was one of the corresponding cases of the [Tangut autonym] Mi. Its meaning might have been the Mi of Wei 魏.

The Tangut imperial family claimed descent from the Tuoba clan of the Northern Wei. Although this etymology is initially appealing, it has phonological problems.

First, the Tangut called the Wei

4962 2vɪ or 5574 2vɨi

rather than 2ŋwɪ. v- in those transcriptions reflects the loss of *ŋ- in the Tangut period northwestern Chinese pronunciation of 魏. Perhaps the imperial surname contains an earlier borrowing of 魏 preserving its nasal initial.

Second, the Tangut autonym

2344 2mi < *miH

has a 'rising' tone, whereas the second syllable of the surname has a 'level' tone. This tonal difference does not necessarily rule out a connection between the two names. The 'rising' tone of the autonym may be a reflex of a final glottal suffix *-H absent from the 1mi < *mi of the surname. Both 2mi and 1mi may be cognate to Tibetan mi 'person'.

A SILKEN SOURCE FOR THE RED RADICAL?

I'm surprised I was able to account for all uses of the 'red' radical

(Boxenhorn code: qie; Nishida radical 226)

in a straightforward manner in my last post. It means 'red' and/or is phonetic in all but one case (E):

A. n-phonetic
B. 'red'
C. xŨ-phonetic in < B. 1xʊ̃ 'red'
D. -iã-phonetic in < B. 2ʔiã (1st syl of 'rouge')
E. 1tʂhɨĩ 'Chen' (a family on the land of the 2nie family?)

I am normally at a loss to explain the function of a component in one or more tangraphs containing it. For instance, I have no idea what

the right side of

1671 1nie 'red'

is doing. It is in 65 other tangraphs. I think it is phonetic in

1674 2nie (second syllable of 2mi 2nie 'younger sister')

1809 2nie (second syllable of 1ɣɤə 2nie 'few')

which are near-homophones of 1nie 'red'. But what is it doing in, say,

3528 2tho' 'to harm, endanger'

whose analysis is unknown? Did red signify danger?

Going back to the other half of 1671 'red', I think

might be derived from the seal form of the top half (幺) of the Chinese 'silk' radical 糸 on the left side of Chinese 紅 'red'. The vertical line at the top of the Chinese 'silk' radical corresponds to the horizontal line of the Tangut 'red' radical, and the two circles correspond to the two

of the Tangut 'red' radical. If the admittedly vague similarity between the two radicals is just pareidolia on my part, did the Tangut simply draw a random line pattern and declare it to be 'red' and/or nie?

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision