The Battle of Mang Yang Pass occurred sixty-five years ago today:

It was one of the bloodiest defeats of the French Union together with the Battle of Dien Bien Phu in 1954 and the Battle of Cao Bằng in 1950.


The ambush and destruction of GM 100 [Groupement Mobile No. 100] was considered the last significant battle of the First Indochina War. Three weeks later, on Jul. 20, 1954, a battlefield ceasefire was announced when the Geneva agreements were signed, and on Aug. 1, the armistice went into effect, sealing the end of the French Indochina and the partition of Vietnam along the 17th parallel. The last French troops left South Vietnam in April 1956, upon request from President Ngô Đình Diệm.

What kind of name is Mang Yang? Vietnamese syllables normally do not begin with Y-. Mang Yang is in Gia Lai Province, which has many obviously non-Vietnamese names. The one I recognize is Pleiku, which is un-Vietnamese in four ways:

  1. It begins with p-. ph- is permissible but not p- (because earlier Vietnamese *p- became b-).

  2. It begins with a consonant cluster containing -l-. All native Vietnamese *Cl-clusters became tr-. Wiktionary has a phonetic Vietnamese spelling pờ lây cu splitting the first syllable Plei in two. (Why is a huyền tone assigned to pờ?)

  3. The first syllable ends in -ei, a rhyme unknown in Vietnamese.

  4. The second syllable has a k- instead of c- for [k] before a back vowel. Vietnamese k- is normally written only before front vowels.
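The four spelling tests above are mechanical enough to check on a list of romanized syllables. The sketch below is only a rough orthographic approximation, under two assumptions: syllables are written without tone marks, and the function name `un_vietnamese_features` is hypothetical. It does not test the Y- initial of Yang.

```python
import re

# Vowels before which Vietnamese spelling normally writes c-, not k-.
# Tone-marked variants are deliberately omitted (a simplifying assumption).
BACK_VOWELS = set("aăâoôơuư")

def un_vietnamese_features(syllables):
    """Return which of the four non-native spelling features each syllable shows."""
    found = []
    for syl in syllables:
        s = syl.lower()
        # 1. p- without h (native *p- became b-)
        if s.startswith("p") and not s.startswith("ph"):
            found.append(f"{syl}: initial p-")
        # 2. consonant + l cluster (native *Cl- became tr-)
        if re.match(r"[bcdghkmnpqrstvx]l", s):
            found.append(f"{syl}: cluster with -l-")
        # 3. the rhyme -ei
        if s.endswith("ei"):
            found.append(f"{syl}: rhyme -ei")
        # 4. k- before a back vowel (kh- is a separate digraph, so it is not flagged)
        if s.startswith("k") and len(s) > 1 and s[1] in BACK_VOWELS:
            found.append(f"{syl}: k- before a back vowel")
    return found

print(un_vietnamese_features(["plei", "ku"]))  # four features, as in the list above
```

Native spellings such as ca, kim, and phai trigger none of the tests.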

Wikipedia and Wiktionary derive Pleiku from Jarai Plơi Kơdưr, lit. 'village north/above'.

The Vietnamese Wikipedia says Mang Yang is Bahnar for cổng trời, lit. 'gate sky': i.e., 'sky gate'. But the dictionary of the Plei Bong-Mang Yang Bahnar dialect by the Bankers and Mơ (1979) has no words like mang 'gate' or yang 'sky'. I cannot find a word for 'gate' in the dictionary's English-Bahnar index, and the only word for 'sky' I can find using that index is plĕnh on p. 99. (There is supposed to be another word for 'sky' on p. 110, but I don't see one.) There is a yang 'spirits, nonhuman beings that affect humans' on p. 145. Perhaps that is the Yang of Mang Yang.

WHY DON'T FINAL FRICATIVES DEVOICE IN TURKISH?

In my last post, I didn't comment on the final consonants of Arabic Muḥammad and Turkish Mehmet. Turkish stops and affricates devoice in word-final position: e.g., Arabic kitāb > Turkish kitap 'book' (but acc. sg. kitab-!). Note, however, that the etymological -d of Mehmet does not survive in the spelling of the accusative singular: Mehmet'i [mehmedi]. (The apostrophe separates a proper name from a suffix. The rule is to keep the spellings of proper names intact regardless of actual pronunciation.)

Note also that I spoke of final stops and affricates devoicing but not fricatives: /ʒ z v/ remain voiced in final position unlike their Russian counterparts.
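The asymmetry can be stated as a one-line rule over Turkish spellings. This is a toy sketch, assuming stems written in modern Turkish letters (c = [dʒ], j = [ʒ]); the function name is hypothetical, and lexical exceptions that keep a voiced final are not modeled.

```python
# Voiced stops and affricates and their voiceless counterparts in Turkish spelling.
# Fricatives (z, v, j) are absent from the table, so they stay voiced.
DEVOICE = {"b": "p", "d": "t", "c": "ç", "g": "k"}

def final_devoicing(stem):
    """Devoice a word-final stop or affricate; leave other finals unchanged."""
    if stem and stem[-1] in DEVOICE:
        return stem[:-1] + DEVOICE[stem[-1]]
    return stem

for stem in ["kitab", "mehmed", "mücahid", "garaj", "göz", "ev"]:
    print(stem, ">", final_devoicing(stem))
```

Only the first three stems change (kitap, mehmet, mücahit); garaj, göz, and ev keep their voiced fricatives.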

Tonight I realized that /z v/ are phonetically fricatives but behave like sonorants. Final /z/ in native words comes from Proto-Turkic *-r. It is a former sonorant that still behaves like one. /v/ acts as if it were /w/. I think /ʒ/ is only in borrowings like bej 'beige' and garaj 'garage'; it may retain its final voicing by analogy with /z/. And/or such borrowings postdate devoicing. (When did devoicing occur?)

(I am reminded of how traditional Tangut phonology groups z- and zh-sounds with liquids in consonant class IX rather than with s- in consonant class VI and sh- in consonant class VII.)

One problem with the above analysis is that /r/ devoices in word-final position. So if /z/ is really like /r/, why doesn't it devoice like /r/? And if I understand Kornfilt (2009: 524) correctly, speakers who devoice /r/ also devoice palatal /lʲ/ and may even devoice velar /ɫ/. (Göksel and Kerslake 2005: 8-9 do not mention the devoicing of laterals.)

HOW DID MUḤAMMAD BECOME MEHMET?

Originally this post was titled "Why Doesn't Muḥammed Have Ü?". But the answer to that question is simple: Arabic /u/ was borrowed as Ottoman u both before and after Arabic pharyngeals. I mistakenly thought vowels in Ottoman borrowings from Arabic were determined only by preceding Arabic consonants.

My new title question is more difficult. According to Wikipedia, "the most common Turkish form of the Arabic name Muhammad" is Mehmed (now Mehmet):

Originally the intermediary vowels in the Arabic Muhammad were completed with an e in adoption to Turkish phonotactics, which spelled Mehemed, and the name lost the central e over time. Final devoicing of d to t is a regular process in Turkish. The prophet himself is referred to in Turkish using the archaic version, Muhammed.

I thought Mehmet was a Turkish version of Arabic Maḥmūd, but they are only related because they share the same M-Ḥ-D root. The two names are distinct in Arabic spelling: Meḥemmed (now Mehmet) and Muḥammed are both محمد <mḥmd> like Arabic Muḥammad (Buğday 2009: 220), whereas I suppose Turkish Mahmut (Ottoman Maḥmūd?; the name is not in Buğday 2009) is محمود <mḥmwd> like Arabic Maḥmūd.

Turkish Mahmut < Arabic Maḥmūd has a for the same reason that Ottoman Muḥammed has u: a neighboring Arabic pharyngeal. (Contrast with Ottoman mühimmāt < Arabic muhimmāt 'important matters' in which /u/ has no pharyngeal neighbor.)

On the other hand, Turkish Mehmet < Ottoman Meḥemmed < Arabic Muḥammad has a first e where I would expect an u before an Arabic pharyngeal. And the second e of Ottoman Meḥemmed occurs where I would expect an a after an Arabic pharyngeal.

The key word is "Arabic". Turkish doesn't have pharyngeals. Here's what I think might have happened. Turks heard Arabic [muħammæd] and borrowed it in harmonized form ("in adoption to Turkish phonotactics" as Wikipedia put it) as *Mühemmed. (I assume the borrowing of Arabic /a/ as a in the presence of pharyngeals was a learned practice only possible for those who were literate: i.e., aware of a graphic if not a phonetic distinction between Arabic glottal ه <h> and pharyngeal ح <ḥ>, both borrowed as [h] in Turkish.) The first vowel was then irregularly assimilated to the other two e's: Mehemmed (written etymologically in Ottoman as <mḥmd>, transcribed here with vowels and unwritten gemination as Meḥemmed, a compromise between the pronunciation and the spelling).

The relationship between Mehmet and Muhammed is slightly like that between the Korean and Japanese words for 'Buddha' on the one hand and the Sino-Korean and Sino-Japanese morphemes for 'Buddha':

Word | Sinoxenic morpheme
Korean 부처 Puchhŏ < *put-ke < Late Old Chinese 佛 *but | Sino-Korean Pul < northern Late Middle Chinese *fur
Japanese Hotoke < *potə-ka-i < Paekche *? < Late Old Chinese 佛 *but | Sino-Japanese Butsu < Early Middle Chinese *but

The two columns represent two kinds of borrowing. All of the above forms are based on Chinese 佛 'Buddha' (itself a borrowing from Indic Buddha). But the forms in the first column cannot be mechanically derived from Chinese like those in the second column. The former were idiosyncratically borrowed as single items and not as part of an entire lexicon complete with systematic conventions of pronunciation. (Chinese is to Korean and Japanese what Arabic was to Ottoman.)

Adding to the idiosyncrasy are suffixes absent from Chinese. Early Korean *-ke and early Japanese *-ka- seem to be a Koreanic morpheme 'ruler' which may have continental origins: cf. Khitan qa 'khan'. Japanese *-i is a noun suffix.

I cannot explain the *o in Japanese. Perhaps there was a lowering of *u in Paekche, the likely donor language. But there is no other evidence of such lowering. The general tendency in early Japanese was toward raising, not lowering: pre-Old Japanese *o became Old Japanese u, not the other way around.

WHY DOES MÜHACIR HAVE Ü?

After the ethnic cleansing of Phocaea, muhacirs settled in what is now Foça.

The Turkish word muhacir [muhadʒir] 'migrant' is from Arabic muhājir. I was surprised that the Azerbaijani counterpart is mühacir with ü. I would understand fronting a foreign u to make a word conform to vowel harmony, but mühacir is even less harmonic than muhacir (which would be ˟mühecir or ˟muhacır if it were fully harmonic).

Form | first vowel | second vowel | third vowel
muhacir | back | back | front
mühacir | front | back | front
˟mühecir (hypothetical, all front vowels) | front | front | front
˟muhacır (hypothetical, all back vowels) | back | back | back

On the basis of these two words (dangerous!), I expected that Arabic u after nonemphatic consonants such as m would be borrowed into Turkish as back u and into Azerbaijani as front ü.

And I was wrong. Buğday's The Routledge Introduction to Literary Ottoman (2009: 11) explains:

The pronunciation of short vowels in Persian and Arabic words is generally governed by which consonants appear before and after the vowels. Arabic vowel graphs are as a rule interpreted as front vowels in Ottoman (üstün = e, kesre = i, ötre = ö, ü). There is nonetheless a group of consonants that cause front vowels in their environment to shift their point of articulation and become back vowels (a, ı, o, u).

Those consonants that shift vowels from front to back are: ح ḥ, خ ḫ, ص ṣ, ض ż, ط ṭ, ظ ẓ, ع `, غ ġ, ق ḳ. The remaining consonants retain the front articulation of the vowels:

ب b, پ p, ت t, ث s, ج c, چ ç, د d, ذ z, ر r, ز z, ژ j, س s

ش ş, ف f, ك k, ل l, م m, ن n, و v, ه h, ی y
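Buğday's rule is essentially a lookup: read a short-vowel sign as front unless a backing consonant stands next to it. The sketch below assumes romanized consonants in Buğday's transliteration and hypothetical names (`ottoman_vowel`, `BACKING`, `READINGS`); it also simplifies by reading ötre only as ü/u and ignoring the rarer ö/o readings.

```python
from typing import Optional

# Consonants that, per Buğday (2009: 11), shift neighboring vowels to the back series.
BACKING = {"ḥ", "ḫ", "ṣ", "ż", "ṭ", "ẓ", "`", "ġ", "ḳ"}

# (front, back) readings of the Arabic short-vowel signs.
# Simplification: ötre is read ü/u only; ö/o readings are ignored.
READINGS = {"üstün": ("e", "a"), "kesre": ("i", "ı"), "ötre": ("ü", "u")}

def ottoman_vowel(sign: str, before: Optional[str], after: Optional[str]) -> str:
    """Read a short vowel as back if either neighbor is a backing consonant."""
    front, back = READINGS[sign]
    if before in BACKING or after in BACKING:
        return back
    return front

print(ottoman_vowel("ötre", "m", "ḥ"))   # Muḥammed: u next to ḥ
print(ottoman_vowel("ötre", "m", "c"))   # mücahit: ü
print(ottoman_vowel("ötre", "m", "h"))   # muhacir: rule predicts ü
print(ottoman_vowel("üstün", "ḥ", "m"))  # second vowel of Muḥammed: a after ḥ
```

Note that the rule predicts ü in the first syllable of muhacir, since ه h is not a backing consonant. That matches Azerbaijani mühacir, not Turkish muhacir.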

I have long known about Turkish e for Arabic a, and that has never surprised me since [æ] is an allophone of Arabic /a/ and is the phonetic value of Persian short a.

Arabic [æ] > Persian [æ] > Turkish e

But neither Arabic nor Persian has front rounded vowels, so I didn't expect this shift:

Arabic [u] > early New Persian [u] > Turkish ü (less commonly ö and rarely o)

(Modern Persian has lowered [u] to [o].)

ö is particularly odd in `Ömer after `ayn which normally should favor a back vowel: e.g., in sā`at [saːʔat] 'clock'. (Turks could not pronounce `ayn [ʕ], but they did replicate the backness of /a/ after /ʕ/ in Arabic.) Did the first vowel front to match the frontness of the second vowel?

o in `osmān 'Uthman' is understandable since a mid [o] approximates the lowered allophone [ʊ] of /u/ after `ayn.

So although I initially thought that Turkish mücahit 'jihadi' (cf. Azerbaijani mücahid) < Arabic mujāhid was irregular, in fact it is regular, and the real question is: why isn't Turkish muhacir 'migrant' ˟mühacir with a front vowel?

Another question is: Why does the word 'jihadi' have u in Uzbek mujohid (cf. Tajik mujohid with the Tajik-internal shift o < ā) and modern Uyghur mujahit? Is there an east-west split in the way Arabic u is borrowed in Turkic? Do Uzbek and Uyghur reflect Chagatai borrowing practices? Did Chagatai and early Turkish speakers perceive Arabic /u/ in nonemphatic environments differently?

Turkish fronting of nonemphatic vowels interests me because it reminds me of the palatal reflexes of Middle Old Chinese nonemphatic vowels in Mandarin: e.g.,

Middle Old Chinese
Mandarin (sans tones)

to dwell

good fortune




3rd person poss. pron.

(Not all *k-nonemphatic vowel sequences have palatal reflexes in Mandarin. *k- that palatalized early became *tɕ- which in turn became [tʂ]: e.g., 支 *ke > *kie > *tɕie > *tɕi > [tʂɻ̩] 'branch'.)

Norman (1994) was the first to make the connection between Arabic emphasis/nonemphasis and what Pulleyblank called the type A/B contrast in Old Chinese (which Norman interpreted in terms of pharyngealization).

HOW DID PHOCAEA BECOME FOÇA?

Today is the centennial of the massacre at Φώκαια <Phṓkaia> /fokea/ [focea] 'Phocaea', now Turkish Foça /fotʃa/ [fotʃa]. (What would its Ottoman spelling have been? فوچا <fwčʔ>?).

I was surprised by the correspondence between Greek /kea/ [cea] and Turkish /tʃa/ [tʃa]. In theory Greek /fokea/ [focea] could have become Turkish ˟Fokea /fokea/ [focea]. But maybe the local Greek and Turkish versions of the name are closer: e.g., if the local Greek dialect had shifted *ea to [ja] and if the local Turkish dialect had merged [c] and [tʃ], etc. Or maybe I'm just seeing regular borrowing conventions at work reflecting an earlier time: e.g., if Greek /k/ had palatalized to [c] before /e/ before Turkish /k/ did, then the closest Turkish equivalent of Greek [c] at that time would be [tʃ].

Having spent so many years studying Sinoxenic - systematic Chinese borrowings in Vietnamese, Korean, and Japanese - I'm accustomed to regularity in borrowings. And unusual features are usually not random noise. They generally reflect lost features: e.g., dentals in Sino-Vietnamese reflecting old southern palatalized labials, the -l of Sino-Korean reflecting an old northern final liquid absent from any living Chinese language, etc.

Middle Chinese 必 *pit 'necessarily'
> Sino-Vietnamese tất [tət] < *sət < *psət < *ət in Annamese Middle Chinese

Ferlus (1992) reconstructed earlier Sino-Vietnamese *pz-, but I have never seen that cluster in any Mon-Khmer language

the schwa is an interesting deviation from the Chinese norm I'll explore later

> Sino-Korean phil < *pir in northern Middle Chinese

the Korean aspiration is irregular and may be due to hypercorrection

So I'd like to think there's some significance in the correspondence between Greek /kea/ [cea] and Turkish /tʃa/ [tʃa]. But maybe there isn't any. The elite of Vietnam, Korea, and Japan looked up to Chinese and wanted to closely emulate Chinese pronunciation, whereas Turks had no motivation to closely emulate the pronunciation of their Greek subjects. Greek εἰς τὴν Πόλιν [is tim bolin] 'to the city' became İstanbul, not ˟İstimbolin.

MATERNAL COMPRESSION

In "On the Origin of the Mainstream Hakka Word [oi1] 'Mother' ", W. South Coblin proposes that oi-type words for 'mother' in Hakka varieties originate from the compression of two syllables (amoi) into one (oi). Although amoi > oi at first looks like am-loss (i.e., the disappearance of the first half of the word), if the kinship prefix a- is analyzed as a zero consonant Ø- plus a rhyme -a, then oi is really Øoi with the initial of the kinship prefix Øa- and the rhyme of the root moi 'mother':

Øa + moi > Øoi = oi

That is an example of one of three types of compression in Chinese and other languages of the region:

1. disyllabic word > loss of first syllable without any trace in the second syllable

刀 Early Old Chinese *CVtaw > Late Old Chinese *taw 'knife' (if not for Vietnamese [zaːw], with lenition of *-t- conditioned by *CV-, no first syllable would be reconstructible)

2. disyllabic word > fusion of initial consonants of both syllables + rhyme of second syllable

抱 Early Old Chinese *mʌpuʔ > Late Old Chinese *bowʔ 'to carry in the arms'

*b is a fusion of *m- and *-p-.

My formulation needs to be tweaked because the vowel of the surviving syllable has changed under the influence of the lost vowel of the previous syllable: *u has lowered to *ow.

3. disyllabic word > initial of first syllable + rhyme of second syllable ... no, I'd better reformulate that.

Coblin gives a standard Mandarin example: 不用 bú yòng lit. 'not use' > 甭 béng 'no need to' (note the neat stacked composite character). My initial formulation doesn't work; it would predict a fusion ˟bòng or ˟bèng (the latter takes into account the impossibility of -ong after labials in standard Mandarin). But the actual form has the tone of the first syllable and a rhyme that is unlike either syllable. So how about

3'. disyllabic word > initial of first syllable + fusion of rhymes of both syllables

to account for 甭 béng?

And 3' can be reworded to account for 抱 *bowʔ:

2'. disyllabic word > fusion of initial consonants of both syllables + fusion of rhymes of both syllables

No, wait, *-ʌ- in *mʌpuʔ isn't a rhyme - it's a vowel in the middle of a word. And I can't think of a word to describe *CʌCu > *CʌCow > *Cow. 'Umlaut' isn't right. Vowel harmony is involved, but there's also diphthongization. I've used the term 'bending' and Schuessler uses the term 'warping', but neither term acknowledges the first vowel that triggers the process. 'Harmonic bending' or 'harmonic warping'?

In any case, I've been thinking that reduction is irregular. Fusion is a type of reduction. So I expect some difficulty in trying to ... reduce reduction to a set of simple categories. I'd still like to say something other than 'anything can happen', though. There are constraints on complexity.
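One way to respect that irregularity while still naming the templates is to treat each compression type as a function on (initial, rhyme) pairs whose fusion step is an explicit lookup table rather than a rule. This is a sketch under stated assumptions: the function and table names are hypothetical, tones are ignored, 不用/甭 are given in pinyin, and the *mʌ- of *mʌpuʔ is treated as having a "rhyme" *ʌ purely for convenience (it is really a word-medial vowel, as noted above).

```python
from typing import Dict, Tuple

Syllable = Tuple[str, str]          # (initial, rhyme); "Ø" marks a zero initial
Fusions = Dict[Tuple[str, str], str]

def type1(first: Syllable, second: Syllable) -> Syllable:
    """Loss of the first syllable without any trace in the second."""
    return second

def type2(first: Syllable, second: Syllable, initials: Fusions) -> Syllable:
    """Fusion of both initials + rhyme of the second syllable."""
    return (initials[(first[0], second[0])], second[1])

def type3(first: Syllable, second: Syllable) -> Syllable:
    """Initial of the first syllable + rhyme of the second (Hakka oi)."""
    return (first[0], second[1])

def type2_prime(first: Syllable, second: Syllable,
                initials: Fusions, rhymes: Fusions) -> Syllable:
    """Fusion of both initials + fusion of both rhymes."""
    return (initials[(first[0], second[0])], rhymes[(first[1], second[1])])

def type3_prime(first: Syllable, second: Syllable, rhymes: Fusions) -> Syllable:
    """Initial of the first syllable + fusion of both rhymes."""
    return (first[0], rhymes[(first[1], second[1])])

# The fusion tables list only the examples discussed above: lists, not rules.
INITIALS: Fusions = {("m", "p"): "b"}
RHYMES: Fusions = {("ʌ", "uʔ"): "owʔ", ("u", "ong"): "eng"}

print(type3(("Ø", "a"), ("m", "oi")))                         # Hakka oi
print(type2(("m", "ʌ"), ("p", "uʔ"), INITIALS))               # *buʔ: vowel unchanged
print(type2_prime(("m", "ʌ"), ("p", "uʔ"), INITIALS, RHYMES)) # *bowʔ
print(type3_prime(("b", "u"), ("y", "ong"), RHYMES))          # béng
```

Plain type 2 yields *buʔ, not *bowʔ. That is exactly the gap that made the formulation need tweaking.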

Let's zoom out from a single etymology toward the bigger picture of Hakka as described by Coblin. Let me try to translate his words into a tabular tree:

Early South Central Chinese
  Early Southern Highlands Chinese
    a subset of Tuhua/Pinghua
    Common Hakka-She

土話 Tuhua 'local speech' and 平話 Pinghua 'ordinary speech' are generic terms for a set of unclassified Chinese languages. Coblin proposes that some of them may be related to his Southern Highlands Chinese group of languages, which I could call 'Greater Hakka'. He reconstructs 'mother' in Early South Central Chinese as *mVi3/4, leaving aside the problem of daughter forms with tones 1 and 2 (e.g., the "[oi1]" in his title) for the time being.

TANGUT VOWELS V. 190509

Writing about the Tangut transcription of Sanskrit trailokya got me thinking about the phonetic values of Tangut vowels again. Here's my own take on the four grades influenced by Gong Xun's ideas. Only basic vowels are listed in Tangraphic Sea order, so there are no nasalized, tense, or retroflex vowels. I still have no idea what the distinction that I indicate as -' was.

Basic vowel
[ʊʶ] [u]
[ɪˁ] [ɪʶ] [ɰi]
[ɑˁ] [ɑʶ] [a]
[ɤˁ] [ɤʶ] [ɯ]
[ɛˁ] [ɛʶ] [ɰe]
[ɔˁ] [ɔʶ] [o]

I write the basic vowel /ə/ as an easy-to-type y in my abstract notation.

The grades:

I. Pharyngealized; lowered and/or backed

Pharyngealization is carried over from Jerry Norman's proposal for the Old Chinese source of Middle Chinese Grade I.

The lowered and backed allophones are similar to Arabic vowel allophones after 'emphatics' as described in Kaye (2009: 565).

Syllables with 'lower' series vowels (*a *e *o) automatically developed pharyngealization unless this was blocked by a preceding 'higher' series presyllabic vowel (*ɯ):

*Ca > *Cɑˁ but *CɯCa > *Ca

Conversely, a 'lower' series presyllable vowel (*ʌ) triggered pharyngealization in a following 'higher' series vowel (*ə *i *u):

*CʌCi > *Cɪˁ

Low /a/ cannot be lowered any further, so it is only backed.

Front /e/ is retracted to [ɛˁ]. The underlining indicates retraction. [ɪˁ] without underlining is already backer than front [i], so I do not underline it.

Back /u o/ cannot be backed any further, so they are only lowered.

II. Uvularized; lowered and/or backed

Medial /r/ in pre-Tangut pharyngealized syllables became uvular [ʁ]. This uvular medial was lost, but it colored the following vowel: e.g.,

pre-Tangut *pʰrat > *pʰʁɑˁt > pʰɑʶ = 2475 𗧑 1pha2 'to break in two'

Note that Gong Xun reconstructs uvularization in both Grades I and II:

Gong Xun

This site

In his system, a medial -ʕ- distinguishes Grade II from Grade I.

Gong has a single unmarked category corresponding to my Grade III and IV. Although it is true that Grades III and IV are in nearly complementary distribution -

Grade III: after v- (a labiovelar glide?), retroflexes, (velarized?) l-

Grade IV: elsewhere

- I still want to work out how they sounded to distinguish between the few minimal pairs that existed.

Syllables with 'higher' series vowels (*ə *i *u) automatically became Grade III or IV depending on the preceding initial unless there was a preceding 'lower' series presyllabic vowel (*ʌ):

*Ci > *Ci but *CʌCi > *Cɪˁ

Conversely, a 'higher' series presyllable vowel (*ɯ) triggered Grade III or IV in a following 'lower' series vowel (*a *e *o):

*CɯCa > *Cæ
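Taken together, the rules above amount to a small decision procedure: a presyllable vowel, if present, decides the series; otherwise the main vowel does; a lower series gives Grade I or II (II with a medial *r), and a higher series gives Grade III or IV depending on the initial. The sketch below encodes that procedure under stated assumptions: the names are hypothetical, and `grade_iii_initial` is a stand-in flag for v-, retroflexes, and l-. It predicts tendencies only; exceptions such as 1kwa3 are not modeled.

```python
from typing import Optional

# Vowel series as described above; ʌ and ɯ are the presyllable vowels.
SERIES = {"a": "lower", "e": "lower", "o": "lower", "ʌ": "lower",
          "ə": "higher", "i": "higher", "u": "higher", "ɯ": "higher"}

def predicted_grade(vowel: str, presyllable: Optional[str] = None,
                    medial_r: bool = False, grade_iii_initial: bool = False) -> int:
    """Predict a Tangut grade (1-4) for a pre-Tangut syllable."""
    series = SERIES[presyllable] if presyllable else SERIES[vowel]
    if series == "lower":
        # Pharyngealized: medial *r (> uvular) gives Grade II, otherwise Grade I.
        return 2 if medial_r else 1
    # Not pharyngealized: Grade III after v-, retroflexes, l-; Grade IV elsewhere.
    return 3 if grade_iii_initial else 4

print(predicted_grade("a", medial_r=True))     # *pʰrat: Grade II
print(predicted_grade("i", presyllable="ʌ"))   # *CʌCi: Grade I
print(predicted_grade("a", presyllable="ɯ"))   # *CɯCa: Grade IV
print(predicted_grade("i"))                    # *Ci: Grade IV
```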

III. Higher and centralized

Grade III was less palatal and more velar than IV. Its palatal vowels /ɰi ɰe/ had velar glides /ɰ/ to distinguish them from the pure palatal vowels [i e] of Grade IV. The sequence /wɰ/ surfaced as [w].

IV. Higher and fronted

Grade IV was more palatal than III. It had front vowels [æ y ø] corresponding to the central or back vowels of other grades.

An exception to that pattern is [ɨ] which, though not front, was still fronter than its back counterparts in other grades.

The Grade IV equivalent of labiovelar [w] in other grades was labiopalatal [ɥ].


Unattested syllables are in parentheses.

𗳛 0244
[qɑˁ] 𗉯 1689
[qɑʶ] -
𗡝 4620
𗩰 3687
1kwa1 [qwɑˁ] 𗬶 2307
1kwa2 [qwɑʶ] 𘐈 5758
𘟖 1031
𗴫 3006
𘎫 5157
([qwɪˁ]) 𗔤 4899
[qwɪʶ] -
(1kwi3) ([kwi]) 𘅧 0576
1kwi4 [kɥi]

I regard [q] as the Grade I and II allophone of /k/.

Are the gaps in the table random or systematic? Any theory of grades should be able to answer that question.

My hypotheses above regarding the origin of the grades predict that

- lower-vowel syllables should tend to have Grades I and II

- higher-vowel syllables should tend to have Grades III and IV

if *CV monosyllables outnumbered *CV̆CV sesquisyllables.

And above we see

- there is no 1ka3 or 1kwa4

- there is no 1k(w)i1

which fits my predictions.

The absence of 1kwi3 is also not surprising, since ki-syllables should tend to have Grade IV, not Grade III. k- does not belong to the subset of initials associated with Grade III: v-, retroflexes, and l-. There are only three known k-syllables with Grade III, and two of them happen to be in the table: 1kwa3 and 1ki3. The third is 1ka'3 which must have been something like [ka] plus whatever feature was represented by -'.

PITTAYAPORN'S PROTO-TAI *-ɲ

One of the innovations of Pittayawat Pittayaporn's (2009) PhD dissertation The Phonology of Proto-Tai is his reconstruction of a Proto-Tai final palatal nasal:

Since it has been established that PT [Proto-Tai] allows palatal consonants in the coda [i.e., *-c¹ and *-j], one would also expect to find the palatal nasal occurring in coda position. Although the reconstruction of PT *-c is unequivocal, there is rather little evidence for final *-ɲ. The only potential case I have identified so far is ‘to eat’, which is reflected as /kinA1/ in all SWT [Southwestern Tai] varieties but as /kɯnA1/ in NT dialects [Northern Tai] like Wuming and Yay. We can speculate that the PT form for ‘to eat’ was *kɯɲ A but the vowel was fronted so that the PSWT [Proto-Southwestern-Tai] form for this etyma was *kin A. Therefore, I tentaitively hypothesize that PT had both *-c and *-ɲ.

The reconstruction of *k- and tone category A for the Proto-Tai word 'to eat' is certain. The vowel and coda of that word are less certain.

Let's look at the 'eating' problem from a subgrouping perspective. Unlike Li Fang-Kuei whose classic model of the Tai family had only three branches (Northern, Central, and Southwestern), Pittayaporn (2009: 298) proposed four branches on the basis of clusters of innovations:

A. Most Tai languages

B. Ningming

C. Chongzuo and Shangsi

D. All of Li's Northern Tai languages (such as the displaced Saek in the southeast) plus some of his Central Tai languages

Wikipedia has a clickable version of Pittayaporn's tree.

What is 'to eat' in the four branches?

A. Siamese kin A1

B. Ningming ken A1 (not in Pittayaporn 2009; found in Hudak 2008: 121)

C. Shangsi kɤn A1

D. Yay kɯn A1 but Saek kin A1

There are two types of words for 'to eat': ones with front vowels (A, B, Saek) and ones with back vowels (C, Yay). All end in -n.

Given that -in words are found in both A and D (Saek), let's suppose those branches independently preserve a proto-rhyme *-in.

By analogy, any -in words in Siamese and Saek should end in -en in Ningming, -ɤn in Shangsi, and -ɯn in Yay unless complicated by other factors. But is this really the case? Compare the forms for 'to eat' with those for Pittayaporn's *lin A 'water pipe':

A. Siamese lin A2

B. Ningming (no cognate in Pittayaporn or Hudak)

C. Shangsi lin A2 (not ˟lɤn A2)

D. Yay and Saek lin A2 (not Yay ˟lɯn A2)
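If 'to eat' and 'water pipe' shared a proto-rhyme *-in, their rhymes should match language by language wherever both are attested. A quick check over the forms cited above makes the mismatches explicit. This sketch sets tones aside (they are discussed next), uses hypothetical names, and covers only two etyma, far fewer than a real correspondence analysis would need.

```python
# Rhymes of the two etyma as cited above (tones omitted); None = no cognate found.
TO_EAT = {"Siamese": "in", "Ningming": "en", "Shangsi": "ɤn", "Yay": "ɯn", "Saek": "in"}
WATER_PIPE = {"Siamese": "in", "Ningming": None, "Shangsi": "in", "Yay": "in", "Saek": "in"}

def rhyme_mismatches(a, b):
    """Languages where both etyma are attested but their rhymes differ."""
    return sorted(
        lang for lang in a
        if a[lang] is not None and b.get(lang) is not None and a[lang] != b[lang]
    )

print(rhyme_mismatches(TO_EAT, WATER_PIPE))  # ['Shangsi', 'Yay']
```

Ningming drops out because no cognate for 'water pipe' was found there.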

It is true that in the modern languages, 'to eat' and 'water pipe' belong to different tonal categories (A1 and A2) conditioned by the initials (*voiceless > 1, *voiced > 2). So one could try to salvage the *-in reconstruction of 'to eat' by claiming that *-i- changed before *-n in tone A1 syllables in Ningming, Shangsi, and Yay. But why would, for instance, tone A1 cause *-i- to lower and back to -ɤ- in Shangsi?

Might the original rhyme of 'to eat' be preserved in Shangsi - or Ningming or Yay? No, because the rhymes of 'to eat' in those languages do not otherwise correspond to -in in Siamese and Saek. Here are all the relevant correspondence sets, including those I already mentioned:

Pittayaporn's Proto-Tai
*-ɯɲ -in
-ɤn -ɯn -in
to eat
water pipe
-ɤn ?
*-ɤn -on
*-ɯn -ɯn ?
-ɯn -ɯn to ascend

Pittayaporn's solution is ingenious:

- It accounts for the front vowel of Siamese and Saek as the result of feature transfer: the palatality of *-ɲ shifted to the vowel *-ɯ-, causing it to independently front to -i- in two distant branches of Tai (assuming the Saek word isn't a loan).

- The shift of *-ɯɲ to -Vn in all branches fits a trend against -Vɲ rhymes in Southeast Asian languages. Khmer does have a high neutral vowel-palatal nasal sequence /ɨɲ/ (e.g., in <beñ> /pɨɲ/, the Penh of Phnom Penh), but it is exceptional. Burmese once had /-aɲ/ as its sole /-ɲ/ rhyme, and Vietnamese only has /-aɲ -eɲ -iɲ/.

There are, however, two problems with his *-ɲ:

First, it is only reconstructible in 'to eat'. Perhaps it had merged with *-n (and/or *-ŋ) after other vowels. Or 'to eat' is simply irregular, and *-c has no nasal counterpart, just as Old Chinese *-kʷ has no nasal counterpart.

Second, there is no external support for *-ɲ either within Kra-Dai or beyond it. Although Norquest (2015) reconstructs *-ɲ in Proto-Hlai, he does not reconstruct *-ɲ in Proto-Qi³ *kʰən (< my pre-Hlai *kən) 'to eat'. Blust's Proto-Austronesian *kaen [kaən] - somehow related to the Proto-Tai and Proto-Qi words - ends in *-n, not *-ɲ. The *k-word for 'to eat' probably goes back to Proto-Kra-Dai and is either inherited or borrowed from some Austronesian-type language⁴. Does Proto-Tai preserve a *-ɲ lost elsewhere?

¹The reconstruction of a Proto-Tai final palatal stop is another innovation of Pittayaporn (2009). Although no attested Tai language has /c/, reconstructing *-c accounts for correspondence set 2 in the following table:

Tai languages
Pittayaporn's Proto-Tai Saek


Sets 1-3 are from Pittayaporn (2009: 211-212). Set 4 is based on the forms for 'liver'.

Saek is an aberrant Tai language which "shows many peculiarities that cannot be reconciled within the conventional model of PT [Proto-Tai] phonology" (Pittayaporn 2009: 14).

The Be languages are generally thought to be close relatives of Tai. See Chen (2018: 18) for the placement of Be within four different proposed Kra-Dai language trees. Ostapirat has changed his mind over time; in 2000 he viewed Be as a sister of Tai but in 2015 he viewed Be as a primary branch of Kra-Dai, and as of 2017 he viewed Be as a sister of a Tai-Kam-Sui subgroup.

²Proto-Tai *ˀjen A 'tendon' has a different set of rhyme correspondences that may be conditioned by a palatal initial absent from Proto-Tai *ʰmen C 'porcupine'.

³Proto-Qi is my term for the common ancestor of the Qi subgroup of Hlai. Norquest reconstructs it but has no term for it. Other early Hlai languages had unrelated words for 'to eat'. As only the Proto-Qi word is cognate to the Proto-Tai word, it seems that pre-Hlai must have inherited the word from Proto-Kra-Dai, but only one dialect of Proto-Hlai (i.e., Proto-Qi) retained it whereas other dialects of Proto-Hlai replaced it with innovations of unknown origin: *C-ləːk in Proto-Run and *C-luːɦ elsewhere.

⁴I am deliberately vague here because I do not know if Proto-Kra-Dai is descended from Proto-Austronesian or is a sister to it (i.e., a descendant of Proto-Austro-Dai, if I may modernize Benedict's term 'Austro-Tai'). And if there is no genetic relationship between Kra-Dai and Austronesian, I do not know whether Proto-Kra-Dai borrowed from Proto-Austronesian, an ancestor of Proto-Austronesian, or a descendant of Proto-Austronesian.

SANSKRIT TRAILOKYA IN THE TANGUT INSCRIPTION AT JUYONGGUAN

Five years ago I rediscovered 村田治郎 Murata Jirō's 1957 book on the inscriptions of the Cloud Platform at 居庸關 Juyongguan¹ in the University of Hawaii library. I had last borrowed it around 1996. Of course my attention was drawn to the Tangut inscription. But, I confess, not for long. Soon after that I dove into the world of Tangut's distant relative Pyu. And I've been there for four years.

Then yesterday Andrew West reawakened my interest in the Juyongguan inscriptions.

Today I was looking at the Tangut inscription at Juyongguan, and the Tangut transcription

𘎤𗣀𗥹𗡝
5300 3639 2770 4620

1ty4 2rer4 2lo1 1ka4

of Sanskrit trailokya 'three worlds' jumped out at me. I've used Trailokya as part of my long pen name for maybe twenty-five years now.

A few words on the transcription characters:

𘎤 5300 1ty4: The only consonant clusters possible in native Tangut words had -w- as their second element. So one strategy for transcribing Sanskrit consonant clusters was to break them up into CyC-sequences. Tangut y was a neutral vowel, and in Grade IV (indicated by my -4) it was something like [ɨ] or [ɯ].

𗣀 3639 2rer4: Tangut had no [aj]. Guillaume Jacques (2014: 206) does not even reconstruct *-aj at the pre-Tangut level. I am guessing pre-pre-Tangut *-aj became pre-Tangut *-ej (which Guillaume does reconstruct) and then Tangut -e.

Here's a possible example:

𘞪 5356 1teq4 < *Sɯ-taj² 'single' could be cognate to Jingpho tāi and Boro otay, part of a cognate set that Matisoff (2003: 262) glosses as 'single/one/whole/only'³.

(I finished the rest of the entry on 5.4.15:39, added a footnote on 5.6.19:06, and then failed to save the finished page. What follows is a new second half from 5.6.19:39.)

𗥹 2770 2lo1: For a long time, I used to think that Tangut tones might actually be phonations: tone 1 was the default phonation and tone 2 was the marked (creaky or breathy?) phonation. But the phonation hypothesis predicts that Sanskrit would be transcribed solely using Tangut characters for syllables with tone 1. There would be no reason to transcribe Sanskrit with Tangut characters for syllables with tone 2: i.e., a phonation that did not exist in Sanskrit. However, most Sanskrit Co-syllables⁴ were transcribed with Tangut characters for syllables with tone 2 (Arakawa 1999: 111).

Tone 1
Tone 2
Both tones

co, jo

to, do

pho, bo, mo

yo, ro

śo, ho

Why was tone 2 favored for Sanskrit Co-syllables?

Conversely, why was ko transcribed with a Tangut character for a syllable with tone 1?

And was there a reason to transcribe the remaining Sanskrit syllables with Tangut characters for syllables with both tones? For instance, was there something about the -lo- of trailokya that necessitated tone 2, whereas the lo in some other word was somehow different to Tangut ears and required the tone 1 character 𗓽 4710 1lo1?

𗡝 4620 1ka4: This character transcribed both Sanskrit ka and kya. Why not transcribe Sanskrit kya as ky ya (cf. 1ty4 2rer4 for trai above) or as a fanqie character for kya combining part of a kV-character with part of a ya-character? Perhaps 1ka4 was something like [kja]. But if Grade IV (written here as -4) was characterized by [j], why could 1ka4 also represent Sanskrit ka? Was there no simple [ka] in Tangut? Were Grade I and II ka something other than [ka]: e.g., [qɑˁ] and [qɑʶ] like Middle Chinese *1ka1 and *1ka2? Why was there no Grade III ka?

Chinese and Tangut grades seem to be similar. So if the Middle Chinese transcription of Sanskrit ka was 迦 *1ka3, I would expect the Tangut transcription to be 1ka3 - a syllable that does not exist in Tangut!

To complicate matters, Grinstead (1972: 144) says 4620 could represent Sanskrit ke. 1ka4 must have sounded like Sanskrit ke as well as ka and kya. Maybe it had a front vowel: [kjæ]? 

¹This name was built into Windows 10's pinyin IME. It's interesting to see what's in and out of the IME.

Sometimes more annoying than interesting. For instance, the common character 家 jia 'house' isn't listed as a choice for jia. I've been typing 家族 jiazu 'family' and deleting the second character to type 家 jia.

At least 波 bo 'wave' is included as a choice for bo now. I recall having to type the wrong reading po to make it display in some older version of the Windows Mandarin IME. I just noticed that the bopomofo IME accepts both bo and po for 波 bo 'wave'.

²(Pre-)pre-Tangut *S- conditioned Tangut -q (my symbol for vowel tension), and (pre-)pre-Tangut *-ɯ- (perhaps a front or back high vowel like *-i- or *-u- in pre-pre-Tangut) conditioned Grade IV.

³Matisoff (2003: 262) does not gloss the Jingpho and Boro forms.

⁴Many Sanskrit Co-syllables are absent from Arakawa's data: e.g., kho, gho, cho, jho, ṭo, etc.

⁵Arakawa (1999: 111) accidentally omitted the rhyme and first tone of 𗓽 4710 1lo1, the other Tangut transcription character for Sanskrit lo in his table.

URN-ING MY PAY

1. Four years of studying Pyu are paying off. Prof. Janice Stargardt of Cambridge made me reexamine the Hpayahtaung urn inscription (PYU 20). After all my advances in Pyu phonology, grammar, and lexicography, I'm finally beginning to understand it. Just beginning. I imagine that the Khitan Small Script Research Group felt as I do now when they began to make progress in understanding Khitan in the late 70s. The decipherments of Pyu and Khitan both have a long way to go - neither is remotely as advanced as the decipherment of Tangut - but I am now beyond the level of mere isolated words and a handful of grammar rules.

I thought Pyu was totally hopeless when I first tried to wrestle with it in 2015. But I'm starting to see the light at the end of the tunnel now. I'll probably never reach the end of the tunnel, but I hope my work can help others get there.

2. I try not to have tunnel vision. Paradoxically, not focusing on Pyu is the key to understanding Pyu. It's my knowledge of other languages that has made a difference in my efforts to crack that extinct language. I don't have time to look into anything other than Pyu in depth anymore, but I can still glance at the world outside first-millennium Burma.

While Googling for spontaneous nasalization for last night's entry, I came across Rémy Viredaz' "Two unrecognized vowel phonemes in Proto-Slavic".

Even before I got to the mind-blowing part about new phonemes (p. 13), I was stunned by his phonetic interpretation of the traditional set of vowel phonemes as a symmetric system (p. 1). Imagine a Slavic conlang retaining those old phonetic values.

One of Viredaz' new phonemes accounts for the unusual -e of the Old Novgorod masculine o-declension corresponding to *-ъ in the Slavic mainstream.

Now I wonder how Magadhi got -e in the masculine a-declension corresponding to the Slavic masculine o-declension. Needless to say, an Indic version of Viredaz' solution won't work.

3. I haven't forgotten about northeast Asia. Last night I also saw Andrew Shimunek's "Phonological and literary characteristics of some pieces of Khamnigan oral folklore", which made me wonder if anyone has done a survey of what might be called phonoliterary techniques in the Altaic world. Both Khamnigan and Khitan use rhyme, which is alien to Korean and Japanese. Oddly, a couple of words that rhyme in Russian have Khamnigan forms that do not rhyme:

R zeljonka > Kh tʃilɔːɴqʰɔ 'green tobacco'

R kartofel' ~ kartoška > qʰɔrtʰapqʰa 'potatoes'

4. Alexander Vovin's "EOJ [Eastern Old Japanese] specific vocabulary and Ainu vocabulary from the Man'yōshū" is a handy reference that only an expert in both early Japonic and Ainu could write.

Now I'm curious about the Proto-(Mainland) Japanese and even Proto-Japonic forms underlying the EOJ and Western Old Japanese forms: e.g., what I presume would be *yuru for EOJ yuru and WOJ yuri < *yuru-i 'lily'.

INDO-BURMESE IRREGULARITIES

I'm going to retire the Jurchen day titles because they would all reappear after sixty days. And they made sense as umbrella titles under which I could write about multiple topics, but they make less sense if I'm only going to write about one thing.

John Okell and Anna Allott's Burmese/Myanmar Dictionary of Grammatical Forms - like John Okell's other works on Burmese - is a model that ought to be emulated by teachers of other languages. Saya John's Burmese learning materials are the best I have ever used for learning any language, so I can say without a doubt (and with great shame) that the poverty of my Burmese is all my own fault.

DGF - as Prof. Justin Watkins called it - is a pleasure to read. If only I could retain everything I had read in it. It's been nearly three years since I studied Burmese in Rangoon under Saya John, reading DGF for fun every morning at breakfast. (That's a redundant phrase - what other meal would I have in the morning?)

Today I looked up the organization ဒို့ အိမ် <dui. im> [do̰ ʔeĩ] Doh Eain 'Our Home' and wanted to see what DGF had to say about <dui.>, an abbreviation of ငါတုိ့ <ṅā tui.> 'I plural' = 'we'. While looking in that section, I saw an example sentence for <nāḥ> in the <n> section (<n> is after <d> and <dh> in the Burmese script) with the word

စကြဝဠာ <cakravaḷā> [sɛʔ tɕa̰ wə là] 'universe' (cakra is cognate to English wheel)

which looks like a mix of Sanskrit cakravāḍa (later cakravāla) and Pali cakkavāḷa (Pali ḷ is from ḍ, and Sanskrit l looks like a Classical Sanskrit substitution for ḷ, which isn't in the CS consonant system). What surprises me is that it is not †<cakravāḷa> [sɛʔ tɕa̰ wa la̰] with the penultimate and ultimate written vowel lengths/spoken tones reversed to match the Indic originals.

I looked for the word in John Okell's Burmese: An Introduction to the Script, whose "Mismatches" section (pp. 308-311) is still a great reference long after one has mastered all the regular patterns of the script. It wasn't there (though it is an example of the "Unwritten final consonant" type, since <cakra> should theoretically be †[sa̰ tɕa̰] with an open first syllable rather than [sɛʔ tɕa̰] from earlier cakra). But what was there was

ဇိဝှာ <jivhā> [zḛĩ ʍa] 'tongue' < Pali jivhā (cognate to English tongue)

as an example of "an unwritten creaky tone". If the word had a regular spelling, it would be †<jin.vhā> or †<jim.vhā> with the nasality (<n>/<m>) and creaky tone (<.>) indicated. If the word were a regular borrowing from Pali it would be [zḭ ʍa] without nasality or the mid vowel [e] that developed after an earlier *nasal.

In both 'universe' and 'tongue', it seems that an initial short syllable was filled out with a coda (*k and either *n or *m). This filling seems to have taken place after the pattern of borrowing Indic short vowels as *creaky vowels was set, so Pali jivhā was borrowed as *dʑḭN ʍa (*N = a nasal whose place of articulation I'm uncertain about) rather than as *dʑiN ʍa.

This filling has parallels in Thai: e.g., Sanskrit cakra corresponds to Indo-Thai จักร <cakra> [càk krà-] with an unwritten filler (echo) [k].

The rule for fillers seems to be that they echo following stops and are homorganic (?) nasals before following sonorants (so maybe the earlier Burmese word for 'tongue' was *dʑḭm ʍa).
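The rule as stated can be made explicit in a few lines. Below is a minimal Python sketch, assuming romanized Indic input split by hand into a first syllable and the rest of the word, and assuming (per the "(?)" above) that a nasal filler takes its place from the following sonorant, so labial v gets m and other sonorants get n. The sound classes and the name add_filler are illustrative inventions, not anything from Okell or Shorto.

```python
# Hypothetical sketch of the filler rule suggested above; the sound classes
# and the nasal-place assumption are mine, not an established analysis.

STOPS = set("kgcjtdpb")
SONORANTS = set("yrlvwmn")
# Assumption: labial sonorants take a filler [m]; other sonorants take [n].
LABIAL_SONORANTS = set("vwm")


def add_filler(first_syllable: str, rest: str) -> str:
    """Add a hypothetical filler coda to a short first syllable.

    first_syllable: romanized Indic first syllable, e.g. "ca"
    rest: the remainder of the word, e.g. "kra"
    """
    if not rest:
        return first_syllable
    onset = rest[0]
    if onset in STOPS:
        # Echo filler: copy the following stop (cakra -> cak.kra).
        return first_syllable + onset
    if onset in SONORANTS:
        # Nasal filler before a sonorant (jivhā -> jim.vhā).
        return first_syllable + ("m" if onset in LABIAL_SONORANTS else "n")
    return first_syllable


if __name__ == "__main__":
    for first, rest in [("ca", "kra"), ("ji", "vhā"), ("vi", "krama")]:
        print(first + rest, "->", add_filler(first, rest) + "." + rest)
```

On cakra and jivhā this yields cak- and jim-, in line with Burmese [sɛʔ] and the guessed earlier *dʑḭm ʍa. The nasal branch is the shakiest part and would need more Indo-Burmese examples to test.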

I can understand the motivation for fillers in Thai in which there is a constraint against syllables ending in short vowels. But there is no similar constraint in Burmese which has syllables ending in creaky (*short?) vowels: e.g., စ <ca> [sa̰] 'to start'. Why couldn't cakra, uh, start with †[sa̰]?

Might fillers in Indo-Burmese tell us something about how Indic vocabulary was acquired by Burmese speakers? In other words, are the fillers traces of a Mon intermediary? Old Mon did not allow open stressed syllables: e.g., <ca> 'to eat' was [caʔ] with an unwritten final [ʔ]. Unfortunately, Shorto's Old Mon dictionary doesn't have an entry for cakra- or any compounds with it; the closest entry to Burmese <cakravaḷā> is <cakkavāḷa> [cakkəwal] which matches the Pali. And Shorto has no entry for any Old Mon version of Pali jivhā 'tongue'.

Maybe echo fillers go back to Pyu: e.g., vikrama 'valor' appears in Pyu as vikkrama. But Pyu had no constraint against open syllables like those of Old Mon or Thai, so I suspect the Pyu echo fillers go back to Indic itself:

The first consonant of a group—whether interior, or initial after a vowel of a preceding word—is by the grammarians either allowed or required to be doubled. (Whitney, Sanskrit Grammar, §229)

But what of nasal fillers? They have no basis in Pyu or early Indic. I think 'spontaneous' nasalizations are a later phenomenon in Indic, and even if they were contemporary with Old Burmese, would they have affected the pronunciation of 'high' languages like Sanskrit or Pali at the time?

These issues should be covered in the definitive work on Indoxenic - systematic Indic borrowings outside India. Will such a book ever exist? Does anyone know enough both about Indic and Southeast Asian languages past and present to write it? There's no similar book for Sinoxenic yet. Long ago I had hoped to write a book on 'megaloan' systems covering both Indoxenic and Sinoxenic and even Arabic loans in the Islamic world. I had no shortage of ambition back then. Now I have a shortage of time ...

21:31: I forgot to ask: what is -vāḍa/-vāla/-vāḷa? None of the meanings I can find make any sense when combined with cakra- 'wheel' to form 'universe':

From Monier-Williams' Sanskrit dictionary:

vāḍa (no entry)

vāla (said to be a later form of vāra, but I suspect it's an l-dialect variant) 'hair of an animal's tail'

From the Pali Text Society dictionary:

vāḷa 'snake, beast of prey [< Skt vyāḍa 'id.']; music (?)'

This problem isn't Indo-Burmese; it's just Indic.

THE PHOENIX ON THE DAY OF THE RED DOG

Or, in Jurchen,

<RED.nggiyan DOG DAY> fulanggiyan indahūn inenggi

Today I accepted a request from Marijn van Putten to follow me on Twitter. Here are three samples of his work:

1. What was Sibawayh's pronunciation of Arabic ج <j>?

2. "Is Qur'ān a loanword from Syriac?"

All historical linguistics students would benefit from his explanation of how to evaluate loanword proposals.

3. "The Case for Proto-Semitic and Proto-Arabic Case: A Reply to Jonathan Owens"

I was not convinced by Owens' argument for case as an innovation in Arabic, but as a nonspecialist I didn't feel competent to judge. So I am happy to see Ahmad Al-Jallad and Van Putten's critical take.

THE DAY OF THE YELLOW DRAGON

Or, in Jurchen,

<YELLOW.nggiyan DRAGON DAY> songgiyan mudur inenggi

Today Chief Cabinet Secretary 菅 義偉 Yoshihide Suga revealed the new Japanese era name 令和 <GOOD HARMONY>¹ Reiwa that will begin a month from now. ('Now' is Alofi time so I can finish this on 1 April!)

So much has been said about that name in just one day. Hence I might not be the first to say this:

Could the Old Chinese word *Cɯreŋ 'good' written as 令 be cognate with the far more common Old Chinese word 良 *Cɯraŋ 'good'?

OK, I doubt anyone else would reconstruct the two words that way. But the two words are similar no matter which reconstruction one uses. They're even similar to this day in Mandarin: líng and liáng.

There are two problems with my claim:

First, what is the significance of the two different root vowels? I have no explanation. I would feel more justified in linking the words if I had other examples of *a ~ *e alternation with identifiable functions.

Second, did the words really have the same initial consonant in their presyllables? If not, what would have been the significance of different prefixes?

I need the high-vowel presyllables to account for the diphthongs that the words developed in Middle Chinese:

令 Old Chinese *Cɯreŋ > Middle Chinese *Cɯri

良 Old Chinese *Cɯraŋ > Middle Chinese *Cɯrɨ

¹Why is this name already in Windows 10's Japanese IME!? Was it added in an update?

THE DAY OF THE WHITE OX

Or, in Jurchen,

<ša.nggiyan OX DAY> šanggiyan wihan inenggi

1. I started looking at Alexander M. Ščerbak's "Reconstructing the Manchu-Tungusic Proto-language" (2012) tonight. It lists sample proto-forms divided into six semantic categories. I only have time to discuss the first, "Terms for Day, Night, Month, and Year", from a Jurchen/Manchu (J/M) perspective.

1a. *ineŋī 'day'

J/M and Oroqen are the only languages cited with -ŋg-. J/M have hardened intervocalic *-ŋ- to a prenasalized stop. This seems in line with the fortition of initial *ŋ- in Manchu gala 'hand' from Ščerbak's *ŋāla.

Jin Qizong reads Jurchen


as <> ngala.

However, the vocabulary of the Bureau of Interpreters has a Chinese transcription *xa la pointing to gala [ʁala] c. 1500.

If early written Jurchen lacked ŋ-fortition, then perhaps


should be transliterated <ngiyan ... DAY> ngiyan ... inengi [ŋʲan ... inəŋi].

1b. *dolbo 'night'

Why did Manchu dobori lose -l- if -l- was retained elsewhere in the same environment: e.g., in M golbon 'clothes rack'?

3.8.1:11: Ming Jurchen dialects may have retained -l-.


in the vocabulary of the Bureau of Translators was transcribed in Ming Mandarin as 多羅斡 *to lo wo which may have represented dolwo < *dolbo.

The corresponding form in the vocabulary of the Bureau of Interpreters was transcribed in Ming Mandarin as 多博力 *to po li which may have represented dolbori.

The Chinese transcriptions exemplify two approaches to a Jurchen coda -l absent in Chinese: insert a vowel after it or just ignore it.

1c. *bēga 'month' ~ 'season'

I'm surprised a *long vowel was reduced to mere palatalization in Jurchen

biya [pʲa] 'moon, month' (unchanged in Manchu except for the obsolescence of the character, of course).

1d. *anŋa 'year'

I suppose the palatality of the nasal of Jurchen

aniya [aɲa] (again, unchanged in Manchu)

was conditioned by *n which blocked *-ŋ-fortition: *-nŋ- > *-ɲŋ-*-jn- (as in Nanai ajŋani '') > *-nj- > [ɲ]. In the same volume, Janhunen (2012: 16) grouped Nanaic together with Jurchenic in Southern Tungusic. Might *-nŋ- > *-ɲŋ-*-jn- be a Southern Tungusic innovation?

Could a similar *-n- to *-j- shift have occurred in J/M 'gold'?

3.8.1:27: The earliest attested form of J/M 'gold' is Jurchen

<GOLD.un> alcun (or ancun?) 'gold' (originally spelled with a single character <GOLD>?)

Perhaps *alcun > *ancun > *aɲcun > *ajcun > *ajsyn > Manchu aisin [ajɕin]. But then why does Manchu have an unrelated word alcu 'the concave side of a toy made from an animal's ankle bone' with the -lcu sequence that became -isi- in 'gold'? Absolute regularity would demand that alcu is either a loanword or originated from something other than *alcu: i.e., Manchu developed a new -lcu after the old one became -isi. Rozycki (1983: 27) identified alcu as a loanword in Tungusic from Mongolic, so a workaround to explain -lcu in terms of sound laws is unnecessary in that case. But what of, say, Manchu kalcun 'spirit', which has no Mongolic source? Why didn't it become †kaisin?
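To make the regularity problem concrete, the pathway above can be written as ordered rewrite rules and then applied to other words. The Python sketch below is a toy: each rule is my own string-replacement paraphrase of one step in *alcun > *ancun > *aɲcun > *ajcun > *ajsyn > aisin, the transcriptions are simplified (y stands for [y], final i for Manchu orthographic i), and no claim is made that these are the actual conditioning environments.

```python
# Toy ordered rewrite rules paraphrasing the hypothesized pathway
# *alcun > *ancun > *aɲcun > *ajcun > *ajsyn > aisin.
# Each rule is a plain substring replacement applied in order.
RULES = [
    ("lc", "nc"),    # *l > *n before *c
    ("nc", "ɲc"),    # *n > *ɲ before *c
    ("ɲc", "jc"),    # *ɲ > *j before *c
    ("jcu", "jsy"),  # *c > *s and *u > *y after *j
    ("js", "is"),    # *j > i before s
    ("y", "i"),      # *y > i
]


def derive(form: str, trace: bool = False) -> str:
    """Apply RULES in order; optionally print each intermediate stage."""
    for old, new in RULES:
        changed = form.replace(old, new)
        if trace and changed != form:
            print(f"  {form} > {changed}")
        form = changed
    return form


if __name__ == "__main__":
    # 'gold' behaves as intended; kalcun and alcu are the problem cases.
    for word in ["alcun", "kalcun", "alcu", "gisun"]:
        print(word, "->", derive(word))
```

Run as is, it turns alcun into aisin but also kalcun into †kaisin and alcu into †aisi, while gisun is untouched. That is the regularity argument above in miniature: if the chain is real, kalcun and alcu must be loans or later creations.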

The fronting of the second vowel is a problem, as Manchu does have words with isu and aisu: e.g., gisun (not †gisin) 'word' and aisuri (not †aisiri) 'a kind of bird'. The 'missing link' form in Alchuka has a second vowel that is neither palatal nor labial: anʃïn.

The palatal c is also a problem, as Turkic and Mongolic have t, and the original vowel of the second syllable was not palatal, so this is not a case of *ti becoming ci (a change which didn't happen in Jurchen and wouldn't happen until Manchu).

In any case, Poppe's (1960: 52) derivation of aisin from *alʲsin < *alʲtin < *altin as reported in Rozycki (1983: 24) doesn't look likely.

I wonder if the word was borrowed independently by Turkic, Mongolic, and Tungusic from different varieties of some fourth type of language - perhaps Xiongnu or Rouran.

2. Today I found the Pyu phrase tiṁ priṅ·ḥ kdaṅ· 'LOC city ?' (27.6), which at first glance appears to have double case marking. kdaṅ· looks like the second half of ṅit·ṁ kdaṅ· 'with, including'. But 'with in the city' makes no sense. Moreover, ṅit·ṁ kdaṅ· precedes nouns: e.g., ṅit·ṁ kdaṅ· saḥ 'with sons' (16.4A). Maybe kdaṅ· does not modify priṅ·ḥ 'city'. Maybe kdaṅ· even has nothing to do with ṅit·ṁ kdaṅ·.

3. I am not sure how to write kdaṅ· in phonological notation. I used to take it at face value as /k.daŋ/ with a period indicating a potential schwa. But lately I think it might be ambiguous.

One possible interpretation of Pyu preinitial-initial sequences (using velar-dental stop-a sequences as examples):

No schwa
Schwa (with lenition of following nonaspirates)
/kəta/ [kəda]
/kəda/ [kəða]

It is unclear if *schwa or some other minimal vowel contrasted with zero after preinitials. The above scenario assumes such a contrast existed but was not indicated in the script (except indirectly if a following consonant was lenited).
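The scenario in the table can be stated as a single rule: a stop immediately after an unwritten schwa lenites unless it is aspirated. A minimal Python sketch, assuming phonemic input with an explicit ə and ʰ, and modeling only the dental pair from the table (/t/ > [d], /d/ > [ð]); what happens at other places of articulation is left open here, and the name surface is just a label.

```python
# Sketch of the schwa-conditioned lenition in the table above.
# Only the dental pair shown there is modeled; other stops are untouched.
LENITION = {"t": "d", "d": "ð"}


def surface(phonemic: str) -> str:
    """Lenite a nonaspirated dental stop that directly follows schwa."""
    out = []
    for i, ch in enumerate(phonemic):
        prev = phonemic[i - 1] if i > 0 else ""
        nxt = phonemic[i + 1] if i + 1 < len(phonemic) else ""
        if prev == "ə" and ch in LENITION and nxt != "ʰ":
            out.append(LENITION[ch])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for form in ["kta", "kda", "kəta", "kəda", "kətʰa"]:
        print(f"/{form}/ [{surface(form)}]")
```

The schwa rows reproduce /kəta/ [kəda] and /kəda/ [kəða], while the no-schwa and aspirated forms pass through unchanged. Since schwa itself goes unwritten, a spelling kd could then stand for either /kd/ or /kət/.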

kt-type voiceless-voiceless sequences are absent from the 12th century Kubyaukgyi text, suggesting that /kt/ may have merged with /kət/.

The sequence ktha is hypothetical; the only instance of kth- in the entire corpus is kthor·ḥ '?' (27.6).

Aspirates are rare in Pyu. Aspiration after stops may not be phonemic: e.g., kthor·ḥ might be /ktorH/ rather than /ktʰorH/ or /kətʰorH/. (It cannot be /kətorH/ because an intervocalic /t/ would voice to [d], and the word would have been spelled †kdor·ṃḥ [kədorH].)

/rH/ may have been voiceless [r̥] or [r] preceded by a vowel with phonation and/or a tone.

THE DAY OF THE WHITE RAT

Or, in Jurchen,

<ša.nggiyan RAT DAY> šanggiyan singge inenggi

1. Japanese 蝦蛄 shako 'Oratosquilla oratoria' is a strange word. It is the only Japanese word I know of with sh- corresponding to standard Mandarin x-. It looks like a recent borrowing from Mandarin 蝦蛄 xiāgū, itself an interesting word for reasons I won't go into here. Yet shako ends in -o like a Sino-Japanese borrowing from Middle Chinese rather than -u, though I doubt Middle Chinese is relevant here. In short, the word seems as if it mixes borrowing patterns:

Middle Chinese
Hypothetical borrowing from Mandarin
Actual borrowing
ga, ka
xiā [ɕja˥]
[ku˥ ]

How was this word borrowed? When was it first attested? I presume it must have displaced a Japanese word since shako live in Japanese waters.

3.7.9:45: I should have read the Japanese Wikipedia article on shako before asking those questions. Going by what it says - I have no other references on hand - it seems the resemblance to Mandarin 蝦蛄 xiāgū is fortuitous.

The Edo period name for shako was shakunage because when boiled, it turned purple like a shakunage flower (Rhododendron subg. Hymenanthes). Shakunage is spelled as 石楠花 <ROCK CAMPHOR FLOWER> or 石南花 <ROCK SOUTH FLOWER>. I suspect that even though the spellings could be taken as meaningful, they are actually phonogram sequences. Shakunage then got shortened to shaku or shako, and the latter was then respelled as Chinese 蝦蛄 'mantis shrimp'.

If 石楠花 ~ 石南花 shakunage and 蝦蛄 shako are actually native Japanese words in sinographic disguise, their sha is in need of explanation since sha is normally only in loanwords. A major exception is 喋る shaberu 'to chat', a modern, common colloquial word whose origin is unknown to me.

A shift of -u to -o is unusual in Japanese. I can't think of any examples. Normally the vowel shift goes the other way around: -o > -u. So I wonder if shako is actually a more conservative form and if the association with shakunage was the product of later confusion. Shaku would then be a clipped form of shakunage or from shako with vowel raising. However, normally o-raising is not in final position, so that might favor the clipping hypothesis. I don't have the dialectological background needed to solve this problem.

3.7.6:41: Wiktionary lists Sino-Japanese Go-on readings ge and ku for 蝦 and 蛄, but my policy is to regard Sino-Japanese readings as hypothetical unless they occur in attested words. So many readings in dictionaries are generated on the basis of fanqie and knowledge of the general patterns of the two major strata of Sino-Japanese, Go-on and Ka-on. I don't know of any words in which 蝦 and 蛄 are read as ge and ku, so I only list ga, ka, and ko here on the basis of 蝦蟇 gama 'toad', 魚蝦 gyoka 'fish and shrimp', and 蟪蛄 keiko 'a kind of cicada'.

2. I didn't know about Eskayan, a constructed language of the Philippines, and its gigantic syllabary.

3. I also didn't know about Gustav Heldt's new translation of the Kojiki.

4. This would upset J. Marshall Unger (via Joanne Jacobs whom I haven't linked to in ages; emphasis mine):

There is no single way a brain becomes “rewired,” explains Wolf, a cognitive neuroscientist and director of UCLA’s Center for Dyslexia, Diverse Learners, and Social Justice. The process happens differently, depending on how we read. Readers of Chinese (an ideographic language) rewire differently from those who read Spanish (a logographic one).

It upsets me for three reasons.

First, languages can't be characterized by their writing systems.

Second, the Chinese script isn't "ideographic"; no writing system is. It isn't really "logographic" either, though that term is less wrong than "ideographic", as there is a partial correlation between Chinese words and Chinese characters.

Third, Spanish orthography is not "logographic"; Spanish is written in an alphabet, not a script with thousands of characters for words or morphemes. Strictly speaking, no writing system is logographic either - there are too many words in any language (Toki Pona aside) for the one-character-per-word principle to be viable.

5. John Candy died 25 years ago today. I didn't know he had Ukrainian ancestry, though I'm not surprised since Canada has "the world's third-largest Ukrainian population behind Ukraine itself and Russia."

Today I learned Canada has its own Ukrainian dialect. I was surprised to see cash register borrowed as a spelling-based кеш реґистер (?) kesh régyster rather than as a pronunciation-based кеш реджистер †kesh rédzhyster.

6. Seeing the word має <maje> 'has' in the Wikipedia article on Canadian Ukrainian made me check to see what other Cyrillic alphabets have є <je>. I forgot about Rusyn! And I didn't know about the letter's various usages over time and in Church Slavonic.

7. Via Viacheslav Zaytsev: Kychanov's (1970) decipherment and translation of "Гимн священным предкам тангутов" (Hymn to the Sacred Ancestors of the Tangut). I had heard of the text but didn't know about this study from almost fifty years ago!

THE DAY OF THE YELLOW PIG

Or, in Jurchen,

<so.nggiyan PIG DAY> songgiyan uliyan inenggi

1. Not that it matters much, but when I tried to copy and paste 'pig' from the last day of the pig, I discovered that entry was missing from my index page! I've restored it; it'll eventually disappear after the entry preceding it does. As far as I know, I've never accidentally deleted an entry between entries like that before.

2. I'm stuck in Pirahã (P; getting tired of typing the tilde) mode now. It could be worse. I've only glanced at Daniel L. Everett's Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle (2008). I'll read the whole thing eventually - I have two books to finish. I don't want to get too involved with Pirahã. But a glimpse of any language can offer data for future use, and so here I jot my notes on the little I've seen based on the sketch on pp. xi-xii in his book.

2a. Vowels

Here are the allophones:

/i/ [ɪ] ~ [ɛ] ~ [i]
/o/ [u] ~ [o]
/a/ [ɑ]

Here are the allophones in order of apparent frequency:

Phoneme \ frequency
[ɛ] [i]
[u] [o]

And here is a chart of the allophones:

[i] [u]
[ɪ] (gap 2)
(gap 1)
[ɛ] (gap 4)
(gap 3)

/i o/ constitute a class. They are the only vowels with allophonic variation and the only vowels that condition consonantal allophony (see below).

I don't know why o was chosen to symbolize the nonlow back vowel if its most frequent allophone is [u].

I am surprised there are no allophones [e] and [ʊ] (gaps 1-2). Are the P careful to avoid those vowels, or is Everett's description simplified?

The absence of [æ] and [ɔ] (gaps 3-4) may be motivated by the need to preserve 'buffer space' between /i o/ and /a/; such vowels combine characteristics of /i o/ and /a/.

2b. Consonants

Here are the allophones of the two most interesting consonants:

between /i/ and /o/
[ɺ͡ɺ̼] or [g]

/s/ palatalizes to [ʃ] before /i/.

The remaining consonants do not have allophony in Everett's introductory account: /ʔ h k t p/. (I won't go into the issue of whether [k] is really an allophone of [h].)

But later on p. 182, he talks about variation in 'head':

xapapaí ~ kapapaí ~ papapaí ~ xaxaxaí ~ kakakaí

The variation only affects voiceless labial /p/ and back consonants /ʔ h k/, not alveolar /t/ and not voiced /g b/. Vowels and tones (acute = high; unmarked = low) are stable.

The two voiced consonants can be regarded as front and back. (I almost wrote labial and nonlabial, but 'nonlabial' /g/ in fact has a linguolabial allophone [ɺ͡ɺ̼]; the subscript 'seagull' indicates linguolabiality.)

The nonlow vowels condition nonstop allophones of the voiced consonants: the lateral flap [ɺ͡ɺ̼] and the trill [ʙ]. I wanted to say continuant allophones, but Wikipedia says,

Whether laterals, taps/flaps, or trills are continuant is not conclusive.

Are /g b/ the results of a merger of a larger set of earlier voiced consonants?

- Were there originally three voiced consonants */g d b/?

- /g/ could be a merger of */g/ and */d/

- an earlier initial nasal allophone [ŋ] of /g/ could have merged with [n].

- [ɺ͡ɺ̼] could have originally been the /i o/-allophone of */d/

- this lateral flap allophone may in turn be a merger of an original *liquid and a lenited allophone of */d/; cf. how Korean intervocalic /r/ is a blend of the liquids *r and *l and lenited *t

- [g] could have originally been the /i o/-allophone of /g/

- Were nasals */ŋ n m/ originally distinct from stops */g d b/?

Looking at what little remains of P's extinct relatives may help to answer these questions. The initial consonant of Yahahí ~ Jahahí is intriguing, as P has nothing like it (anymore?).

THE DAY OF THE YELLOW DOG

Or, in Jurchen,

<so.nggiyan DOG DAY> songgiyan indahūn inenggi

1. I've glanced at Pirahã phonology before but never noticed two things until today:

1a. Pirahã has no nasal vowels. So why does the exonym of the Hi'aiti'ihi 'Straight Ones' have a nasal vowel? Is it a Portuguese borrowing from some other indigenous language? (And what does the apostrophe represent? Is it another way to write the glottal stop which is written as x elsewhere?)

(3.3.20:05: No, the apostrophe indicates a high tone; it seems to be an easily typeable substitute for the acute accent that Everett uses. Vowels not followed by apostrophes have low tones.)

(3.3.23:58: And even if Pirahã has no nasal vowels now, maybe it once did. The name could date back to the first contact with Portuguese speakers.)

1b. Pirahã has three vowels


which are not in a 'top-heavy' classical 'triangle':


I have never seen a Pirahã-type 'left-heavy' triangle before. Do any languages have the other two hypothetically possible layouts?





3.4.0:33: Today (3.3) I thought it would be interesting to see what distributional phenomena and allophony would motivate such analyses. Here are some syllables in hypothetical languages with the latter two types of vowel systems:

'Right-heavy' with nine phonetic vowels


'Bottom-heavy' with nine phonetic vowels

/ɨ/ /æ/ /ɑ/
[kɨ] [kæ] [kɑ]

I've designed the allophones so each phonemic symbol matches one allophone. But what if they don't? What if the 'bottom-heavy' language had only two phonetic high vowels distributed like this?


Do those syllables share a single vowel phoneme? What if speakers rhymed [i] and [u]? Should that vowel phoneme be symbolized as /ɨ/ halfway between front [i] and [u] even though [ɨ] isn't actually in the language? Does it make sense for /t/ to palatalize before nonpalatal /ɨ/? That is, in fact, what I think happened in Late Old Chinese: e.g., 之 *tə > *tɨə > *tɕɨə 'genitive marker'.

1c. In Don't Sleep, There Are Snakes: Life and Language in the Amazonian Jungle (2008), Daniel L. Everett mentions Steve Sheldon's Pirahã neologism for the Christian God, Baíxi Hioóxio 'Up-high Father'.

3.4.0:32: And I forgot to mention why I mentioned that ... I went on to write item 2 without finishing 1c.

If [k] is an allophone of /hi/, then is it possible to pronounce Hioóxio as Koóxio? What would be the phonetic motivation for hardening /hi/ into [k]?

3.4.21:02: Answering my first question, no:

The sequences [hoa] and [hia] are said to be in free variation with [kʷa] and [ka], at least in some words.

But why wouldn't [ha] be in free variation with [ka]? I thought perhaps at one time pre-Pirahã had *[q] and *[k] with Mongolian-type distribution: *[qa] but *[ku] and *[ki]. *[qa] became [ha], whereas *[kua] and *[kia] became [hoa] ~ [kʷa] and [hia] ~ [ka]. However, if *high vowels conditioned *[k], why aren't [hi hu] in free variation with [ki ku]?

I'm surprised to see presumably disyllabic [hia] in free variation with monosyllabic [kʷa]. Does such variation only apply if /i/ and /a/ are both the same tone? Do /hía/ (high + low tone) and /hiá/ (low + high tone) exist? If they do, do they have monosyllabic free variants?

2. I have long been puzzled by the correspondences of the codas in the early written Sino-Tibetan languages. Today it finally occurred to me to see how much more confusion adding Evans' (2001) Proto-Southern Qiang tones would cause. In chronological order from left to right (except for Proto-Southern Qiang, which can't be dated; I've put it last since it alone has the innovation of losing most codas):

Old Chinese
Old Tibetan
Proto-Southern Qiang

-s (< *-ds?)
(*low tone)

*low tone

*low tone
(*high tone)
(*high tone)
(*low tone)

*low tone

The Tangut numerals from 'one' to 'nine' all have the 'level tone' (1- in my notation which I adopted from Arakawa Shintarō), whereas 'ten' has the 'rising tone' (2- in my notation) from a source I symbolize as *-H, possibly a glottal consonant. I doubted there would be any correlation between the two Tangut tones and the two tones of Proto-Southern Qiang, its closest relative among the languages above. And of course there was none.

3.4.0:34: Did tone 1 spread through the closed set of Tangut numerals 'one' through 'nine'?

3.4.21:14: Notes on individual numerals:

'One': Straightforward. Tangut and Proto-Southern Qiang do not preserve any final stops.

'Two': Pre-Burmese points to *-t, Old Chinese, Old Tibetan, and Tangut are ambiguous, and Pyu has an open syllable.

The function of the *-s in Old Chinese and Old Tibetan is unknown.

I have no idea what Tangut *X is; it is a dummy symbol for the source of the equally mysterious feature -' which distinguishes certain rhymes in Tangut. I have never found any correlation between *X/-' and any feature in any other language. It could be a Proto-Sino-Tibetan feature preserved only in Tangut, though I doubt that.

'Three': At first I was pleased to see -ḥ in both Pyu and pre-Burmese. But look at 'four', 'five', and 'nine' where pre-Burmese has a -ḥ absent from Pyu. Pre-Burmese -ḥ doesn't correlate with Old Chinese *-ʔ.

'Four': Might pre-Burmese -ḥ here be from *-s rather than *-ʔ? Why was this *-s added? There is no trace of it in Pyu (where *-s probably became -ḥ) or pre-Tangut (where *-s may have become *-H).

'Five': Pre-Burmese -ḥ corresponds to Old Chinese *-ʔ, but Pyu lacks the expected -ḥ.

'Six': See 'one'.

'Seven': Tibetan has a unique root for 'seven'.

'Eight': See 'one'.

'Nine': Pre-Burmese -ḥ corresponds to Old Chinese *-ʔ, but Pyu lacks the expected -ḥ.

'Ten': The languages do not share a common root. This is the only pre-Tangut word with *-H in the set, and that *-H / Tangut 'rising' tone corresponds to Proto-Southern Qiang low tone ... just like pre-Tangut *-Ø / Tangut 'level' tone which can also correspond to Proto-Southern Qiang high tone!

THE DAY OF THE RED CHICKEN

Or, in Jurchen,

<RED.nggiyan CHICKEN DAY> fulanggiyan tiko inenggi

1. It's actually still the day of the green sheep for me as I write this item, but it's already the day of the red chicken in what was once the Jurchen Empire.

Viacheslav Zaytsev linked to this video of the Jurchen text by the Arkhara River discovered by Prof. Andrey Zabiyako (h/t Andrew West, who has written the definitive article on the subject in English).

That site is not far from Birobidzhan. I just learned that Biro- is a reference to the Bira River - 'River River'. Bira is 'river' in Jurchen, Manchu, and other Tungusic languages; the word can be reconstructed for Proto-Tungusic. I wonder what specific language is the source of that name and of the name of the Bidzhan River. Wiktionary does not have etymologies for either name.

2. I had no idea Li Fang-Kuei's brother-in-law 徐道鄰 Hsu Dau-lin was once Chiang Ching-kuo's tutor upon the latter's return from the USSR.

3. While looking at Evans' (2001) reconstructions of Proto-Southern Qiang numerals, I realized why his PSQ *a (low tone) corresponds to Tangut 5981 𗈪 0a1 'one' rather than †i4 < *a. Brightening (*a > i) in Tangut might only have applied in word-final position, and *a 'one' only appeared before other words, so its vowel remained intact.

A wilder possibility is that 0a1 is from *ʕa with a pharyngeal *ʕ- that blocked brightening and conditioned Grade I, but there is no evidence for such a pharyngeal in pre-Tangut.

I reconstruct 𗈪 0a1 'one' with Grade I (hence -1 in my notation) because it was transcribed in late 12th century northwestern Chinese as 阿 1a1 in the Pearl in the Palm glossary.

The 0 indicates that I don't know the tone of 'one'. Maybe it literally had 'zero' tone in the sense that its tone may have been neutral.

4. Speaking of numerals, I was surprised to learn that Dmitri Mendeleev used Sanskrit numeral prefixes (eka- 'one', dvi- 'two', tri- 'three') in the periodic table he submitted for publication 150 years ago today. Why Sanskrit?

5. Looking at Alexander Vovin's (2017) reconstruction of Old Korean (OK) *-arari for a verbal suffix 下里 <BELOW.ri> that he seems to regard as cognate to Middle Korean (MK) àráj 'bottom' made me wonder how it lines up with John R. Bentley's 2000 reconstruction of *arUsI 'below, lower' for Paekche (P):

Paekche (Bentley 2000): *arUsI 'below, lower'
Old Korean (Vovin 2017): *-arari 下里 <BELOW.ri>
Middle Korean: àráj 'bottom'

The correspondence of P *rU and OK *ra may point to a Proto-Koreanic (or Proto-South Koreanic?) *ɔ.

The P and OK words may have different suffixes added to a shared root *arɔ. If the Old Korean liquid had been *l, I might propose a Proto-Koreanic voiceless *l̥ that became P *s and OK *l. But OK *l would not have lenited to zero in MK: OK †arali would have become MK †àrári.

6. What are the characters 𠡙, 𠧭, 烞, and U+2C1D1 (⿺气朴) for? I found them in the Wiktionary entry for 朴 when writing this addendum to "The Day of the White Hare". Of course I don't know a lot of characters. What makes those so special?

- I have never seen 朴 as a phonetic before

- 朴 was not a phonetic in Old Chinese, so its derivatives must postdate Old Chinese

- 力 'strength', 大 'big', 火 'fire', and 气 'air' are not normal left-hand components

烞 has a Wiktionary entry with a Mandarin reading but no meaning. defines 烞 as 'the sound of cracking from heat'. It has no definitions for the other three characters.

7. 加藤昌彦 Katō Atsuhiko (2009) reconstructs a ten-vowel system for Proto-Pwo Karen including two unrounded high nonfront vowels, *ɨ and *ɯ, on the basis of dialects preserving a contrast between them. I do not recall ever seeing a description of a living language with a /ɨ ɯ/ contrast before. There seems to be a common assumption that Proto-Sino-Tibetan had a small number of vowels. How such a small inventory expanded into the larger inventories of languages like Proto-Pwo Karen remains to be explained.

8. Today is the centennial of Korea's 三一運動 Samil undong, the March 1st Movement. Looking at the text of the Korean Declaration of Independence (image / English), I was surprised by how relatively modern it looks. It lacks the obsolete vowel symbol arae a (ㆍ), perhaps the most striking characteristic of old hangul orthography. It does have ᄯ <st> for modern ㄸ <tt> and instances of standalone ㅣ <i> instead of 이 <Øi>: e.g., ㅣ며 <i myŏ> as well as modern 이며 <Øi myŏ> for i-mye 'be-and' after vowel-final words.

9. The Jurchen word for 'honey' apparently only survives in Chinese transcription as 希粗 *xi tsʰu in the vocabulary of the Bureau of Interpreters (#1025). The corresponding Manchu word is hibsu [xipsu].

Does *tsʰ represent [tsʰ] < *ps in that Jurchen dialect (which could not be ancestral to Manchu which preserved *ps), or does the transcription conceal a Jurchen [ps]?

How would the Jurchen ancestor of hibsu have been written? Neither a phonogram <hip> nor a logogram <HONEY> has been found. Would the word have been written <>? We probably do not yet have a complete set of Jurchen characters. Parts of the Jurchen Character Book are missing; there are characters in inscriptions and the Sino-Jurchen vocabularies that are not in that presumably early catalog, and there may be characters that are not in any of those sources.

THE DAY OF THE GREEN SHEEP

Or, in Jurchen,

<nion.nggiyan SHEEP DAY> nionggiyan honi inenggi

1. Today is the 900th birthday of Emperor Xizong of the Jurchen Empire.

His Jurchen name, transcribed in Jin Chinese as 合剌 *xo la, was probably either *Hola or *Hora.

He has been credited with the apparently short-lived Jurchen small script. If Aisin-Gioro Ulhicun is right, these are the only two remaining blocks in the small script:

None of the components look like Khitan small script components except for the one at the bottom of the second block resembling 쇼 whose reading has yet to be identified.

쇼 also looks like the hangul spelling of Middle Korean syo 'cow', a word I have yet more to say about.

2. I was hoping Guillaume Jacques would give examples of Tangut *P-causatives in "The Labial Causative In Trans-Himalayan" (2019), and he did. Here are two more examples from Gong (1988: 45-46).

ghost, demon, devil
𘘏 0622
*Pɯ.ʔ[o/ə] to bring evil
𘔚 1671
𗽫 2765
to turn red

It may be significant that all five examples of causatives are Grade III/IV syllables (written by Guillaume with -j- following Li Fanwen and by me with -3/-4). I hypothesize that Grade III/IV was conditioned by *high-vowel presyllables. So the causative prefix may have been *Pɯ-. (*ɯ is my symbol for an unknown high vowel. Maybe I should just write *I or *Y.)

Gong also gives examples of zero ~ -w- alternations in Tangut without any obvious semantic function. Those pairs outnumber the causative pairs and need further investigation. Some may be doublets involving *P-preinitials or presyllables that had nothing to do with causative *Pɯ-: e.g., perhaps

𗪺 3354 1ghi2 'power'

𘏐 5307 1ghwi2 'power'

are two different reflexes of a pre-Tangut noun *Pʌ.gr[a/e] 'power'. (Note the nonhigh vowel in the presyllable needed to condition both lenition and Grade II.) One lost its presyllable before *P- could condition *-w-:

Stage 1: The earliest reconstructible form: *Pʌ.gr[a/e] (both words)
Stage 2: Grade II for syllables with *-r- and lower vowels (*ʌ, *a, *e) compensating for *-r-loss: *Pʌ.g[a/e]2 (both words)
Stage 3: *-g-lenition between sonorants: *Pʌ.ɣ[a/e]2 (both words)
Stage 4: *Pʌ-loss: 3354 *ɣ[a/e]2; 5307 *Pʌ.ɣ[a/e]2
Stage 5: *a/e-merger: 3354 *ɣe2; 5307 *Pʌ.ɣe2
Stage 6: Presyllabic vowel loss: 5307 *Pɣe2
Stage 7: Labial metathesis (*PC- > *Cw-): 5307 *ɣwe2
Stage 8: *e-raising: 3354 *ɣi2 (= 1ghi2); 5307 *ɣwi2 (= 1ghwi2)

(Tone 1 is automatically assigned to pre-Tangut syllables without *-H.)

The exact relative chronology of changes is unknown, though the following suborders are certain:

*-g- must lenite before presyllabic vowels are lost

*Pʌ- must be lost after its vowel conditioned lenition

It would be simpler but not necessary for *-r- to be lost before labial metathesis. Keeping *-ɣr- intact up until metathesis would require *P- to 'jump' over two consonants, not just one: *P.ɣr- > *ɣrw-.

It would be simpler but not necessary for *-r- to be lost before lenition. Keeping *-r- intact up until lenition would require lenition to occur between sonorants in general rather than just vowels.
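The relative chronology can be replayed as a toy computation. This is only an illustration under stated assumptions, not the method behind the reconstruction: it assumes a simplified transcription in which "[a/e]" is the unresolved vowel, a trailing 2 marks Grade II, and ɣ (spelled gh in Tangut transcription) is lenited *g. Each stage is a regex rewrite narrowed to the one context these two forms need, and *Pʌ-loss is treated as lexically restricted through a hypothetical `lose_presyllable` flag, since that loss is what splits the doublet. The names `derive`, `to_tangut`, and `BAD_ORDER` are likewise made up for this sketch.

```python
import re

# Toy replay of the ordered changes above (an illustration only).
# Simplified transcription assumed for this sketch:
#   "[a/e]"  unresolved vowel      trailing "2"  Grade II
#   "ʌ"      nonhigh presyllable vowel
#   "ɣ"      lenited *g, spelled gh in Tangut transcription
# Each rule is narrowed to the one context these two forms need.
STAGES = [
    ("Grade II, *-r- loss", r"gr(\[a/e\])", r"g\g<1>2"),
    ("*-g- lenition", r"ʌ\.g", "ʌ.ɣ"),          # after the presyllable vowel
    ("*Pʌ-loss", r"^Pʌ\.", ""),                 # lexical: one variant only
    ("*a/e merger", r"\[a/e\]", "e"),
    ("presyllabic vowel loss", r"^Pʌ\.", "P"),
    ("labial metathesis", r"^P(\w)", r"\1w"),   # *PC- > *Cw-
    ("*e-raising", r"e(?=2$)", "i"),
]


def derive(form, stages=STAGES, lose_presyllable=False):
    """Apply the stages in order; *Pʌ-loss applies only if lose_presyllable."""
    for name, pattern, replacement in stages:
        if name == "*Pʌ-loss" and not lose_presyllable:
            continue  # this variant keeps its presyllable
        form = re.sub(pattern, replacement, form)
    return form


def to_tangut(form):
    """Prefix tone 1 (no *-H) and respell ɣ as gh."""
    return "1" + form.replace("ɣ", "gh")


PROTO = "Pʌ.gr[a/e]"
print(to_tangut(derive(PROTO, lose_presyllable=True)))  # 1ghi2  (3354)
print(to_tangut(derive(PROTO)))                         # 1ghwi2 (5307)

# Counterfactual order: *Pʌ- is lost before it can condition lenition.
BAD_ORDER = [STAGES[0], STAGES[2], STAGES[1]] + STAGES[3:]
print(to_tangut(derive(PROTO, BAD_ORDER, lose_presyllable=True)))  # 1gi2
```

With the stated order the two variants come out as 1ghi2 and 1ghwi2. Moving *Pʌ-loss ahead of lenition leaves the *g- of the 3354 variant unlenited (1gi2), which is why *Pʌ- must be lost after its vowel conditioned lenition.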

I think Gong Xun may be right about Grade II being uvularization: i.e., i2 was phonetically [iʶ] conditioned by a medial *-r- that was perhaps uvular *[ʁ] in the vicinity of low vowels.

*r- next to a high vowel, like the *rɯ- of *rɯ.nej 'red', was not uvular and did not condition uvularization/Grade II; *rɯ.nej became Grade IV 1ne4.

3. I saw this in Theraphan Luangthongkum's "A View on Proto-Karen Phonology and Lexicon" (2019) and thought, 'no!'

PTB *b-r-gyat, *b-g-ryat > PK *grɔtD ‘eight’

Aside from the macroproblem of Proto-Tibeto-Burman (PTB) probably not existing, a microproblem is the proposed survival of 'PTB' *g- in Proto-Karen. PTB *b-r-gyat is a projection of Written Tibetan brgyad 'eight' back into the past, complete with a *-g- that is a Tibetan innovation - the product of Li Fang-Kuei's law. The consonant cluster †bry- does not exist in Written Tibetan.

I think Tibetan and PK have different prefixes attached to a common *r-root for 'eight'. (But what were those prefixes for?) Chinese has a labial prefix like Tibetan (八 *pret 'eight'), whereas Japhug kɯrcat 'eight' and Evans' (2001: 246) Proto-Southern Qiang *khr[a/e] 'eight' have velar prefixes. (Northern Qiang also has a velar prefix: e.g., Mawo khaʳ.) Tangut 𘉋 1ar4 < *rjat 'eight' may preserve the bare root. It might have had a presyllable *Cɯ-, but there is no internal evidence pointing to either *p- or *k-. If the Tangut form had a presyllable, I would guess it started with *k- since Tangut is more closely related to Japhug and Qiang than to Tibetan and Chinese.

The vowel of PK *grɔtD ‘eight’ is surprising because other languages lack rounded vowels in the word: e.g., Pyu, sometimes thought to be Karenic, has hrat·ṁ /r̥ät/ 'eight'. (Could Pyu /r̥/ be from *gr- via *ɣr-? There is no gr- in Pyu.)

You can see the diversity of forms for 'eight' in Sino-Tibetan at STEDT.

THE DAY OF THE WHITE HARE

Or, in Jurchen,

<šang.giyan HARE DAY> šanggiyan gulma? inenggi

1. My pareidolia glasses make me see Chinese 兔 ~ 兎 <HARE> in Jurchen <HARE>. Compare the Jurchen character with these cursive forms of the Chinese character.

Trying to see the Khitan large script character -

<tau.lia> 'hare'

- a fusion of two phonograms -

<tau> + <lia>

in Jurchen <HARE> is too much of a stretch even for me.

2. I just learned there's a lesser known 'Seoul' - no, not a town with the same name as the capital, but a homophonous unrelated Sino-Korean compound 暑鬱 서울 sŏul, a Chinese medical term that I could calque as 'thermopression'.

3. I wish Naver's Korean dictionary would switch to Unicode for premodern hangul. The image for ᄫᅳᆯ <βɯr> in the entry for 서울 Seoul is hard to read.

4. I've had the 'Microsoft Old Hangul' IME installed for over a year but never tried it out until now. It produces ... Latin letters? I Googled "microsoft old hangul" and the first result I get is

I purchased a brand new laptop and installed the korean keyboard however, it does not allow me to actually type in hangul which is the most frustrating thing.

The second result I get is

I have tried also tried installing the Microsoft Old Hangul keyboard but I can't get it to actually type in Hangul just in Latin letters.

There are only 119 results. I guess almost no English speakers care about this. I used BabelMap to type ᄫᅳᆯ <βɯr> above, but there's no way I'm going to type more than a few Middle Korean words that way.

Apparently Microsoft Old Hangul only works in Office. Great ... I have that IME installed on my laptop without Office.

5. The link above goes to an enthusiast of the グルジア Gurujia language. I should have guessed what that was. Other katakana names are the obvious ジョージア Jōjia and カルトリ Karutori (< ქართული Kartuli). Kanji short names are 具語 Gugo and 喬語 Kyōgo; -go is 'language', and Kyō is the Japanese reading of the first character of 喬治, the Chinese version of 'George'.

6. It just occurred to me that 白 <WHITE> in 白村 Hakusuki could be just as un-Chinese as 村 <VILLAGE> suki. Suki is not a Japanese word. According to Wikipedia, Kōjien regards it as an Old Korean word for 'village'. But I don't know of any similar Korean word. Could it be a cognate of Korean 시골 shigol 'village' in an extinct Koreanic language: namely, Paekche?

If <WHITE> - read as *bæk in Late Old and Early Middle Chinese - is actually a phonogram, could it represent a native Koreanic word - a cognate of Korean 박 pak 'gourd'? Then the Old Korean name underlying Hakusuki would be 'Gourd Village'.

(3.1.19:56: In 三國史記 Samguk sagi, 朴 [Late Old Chinese *pʰɔk] is a transcription of the surname of the founder of Shilla and is glossed as 'gourd'.

If that gloss is correct, what I wrote two entries ago could be wrong. It may not be necessary to regard Late Old Chinese 斯盧 for Old Korean *sela 'Shilla' from 三國志 Sanguozhi [Records of the Three Kingdoms, c. 280] as an early transcription *sie la predating the shift of *-a to *-ɔ in what I could call Very Late Old Chinese. Perhaps Old Korean speakers thought Very Late Old Chinese *-ɔ was similar to their *a [phonetically back [ɑ]?] and wrote 'Shilla' as Very Late Old Chinese 斯盧 *sie lɔ postdating the shift of *-a to *-ɔ. According to Coblin [1983: 103], the *-a to *-ɔ [his *-o] shift was complete by the Western Jin: i.e., the late 3rd century when the Sanguozhi was compiled. But there is no guarantee that 斯盧 was a transcription invented on the spot in the 280s; it could have been created prior to the raising and rounding of *a. In any case, reading 斯盧 as Sino-Korean saro < earlier Sino-Korean sʌro < 8th century Late Middle Chinese *sz̩ lo is anachronistic. Even a 6th century Early Middle Chinese reading like *si[ə] lo would be anachronistic.)

7. I saw this blurb for Tao Te Ching: An All-New Translation:

Renowned translator William Scott Wilson offers a fresh version of the Tao Te Ching that will resonate with the modern reader. While most translators have relied on the "new" text of 200 B.C., Wilson went back another 300 years to work from the original characters used during Lao Tzu's lifetime. By referring to these earlier characters, Wilson is able to offer a text that is more authentic in language and nuance, yet preserves all the beauty and poetry of the work.

The "original characters"? What does that mean? That earlier shapes of the characters somehow give more insight? Why not the 'original wording'? Because "characters" sound so much more exotic?

No, he really is referring to the shapes of the characters. In his own words, "the nuance and meaning of the original characters was lost"! (p. 11) My Exotik East alarm is ringing. Loudly. No one's going to invite me to a Japanophile conference. Sniff.

I don't see him using a special old-timey font or anything for the characters. Are their olde shapes a secret for his erudite eyes only? (And does it even occur to him that the Mandarin readings he uses are just as anachronistic as his modern font?)

It gets worse ... "Chinese, as a language based on ideographs" (p. 27) ... characters which wouldn't exist if there weren't a spoken language to begin with. Characters which the majority of Chinese through time barely knew or didn't know at all.

John Cikoski would have a fit:

The legend of an "ideographic language" is false; reading Chinese is not grokking images of a man standing by his words or a woman kneeling under a roof or a bear riding a skateboard through a dentist's office or whatever. (p. xii)

So would the late John DeFrancis. His The Chinese Language: Fact and Fantasy is still a favorite of mine after over thirty years.

I am reminded of Bernhard Karlgren's "Notes on Lao-Tse" (1932: 1):

Of all the documents of the pre-Han China, no one has attracted and interested Western readers so much as the short and exceedingly pensive treatise Tao te king [= Tao Te Ching] attributed to an unknown author around 400 BC. It has been translated several dozen times into Western languages. The majority of these "translations" merely reveal that their translators have had very little knowledge of the Chou-time language.


In most of these translations [the ones Karlgren regards as "[t]he most serious attempts" which makes one wonder what he thinks of the others] we find lines interspersed in the text, being explanatory speculations of the translators, for which additions the classical Chinese text has no corresponding passages.

It would be nice to see a Jurchen translation of the Tao Te Ching. Here's a Manchu version with the original Chinese and Karlgren's translation for comparison:


doro be doro oci ojo-ro-ngge,

'way ACC way TOP be-IPFV.PTCP.NMLZ'

BK: 'The Tao Way that can be (told of:) defined'


enteheme doro waka.

'constant way NEG'

BK: 'is not the constant Way,'


gebu be gebu oci ojo-ro-ngge,

'name ACC name TOP be-IPFV.PTCP.NMLZ'

BK: 'the names that can be named (used as terms)'


enteheme gebu waka.

'constant name NEG'

BK: 'are not constant names (terms).'

The Wikibooks translation does not match what I see:

There are ways but the way is uncharted;

There are names but not nature in words

I'm glad the ideographic myth hasn't taken root in Jurchenology or Khitanology. The enigmatic construction of many Tangut characters makes tangraphy fertile ground for the ideographic myth.

8. Is that supposed to be a Chinese character on the cover of Ideals of the Samurai?

9. Why wasn't I assigned Cikoski's Introduction to Classical Chinese (1976) at Berkeley? It was published there fourteen years before I took Classical Chinese.

Two days ago I found Brandt's Introduction to Literary Chinese, an example of the common sink-or-swim method. Blub blub.

10. I realized that Jurchen

<> senggige 'filial piety, relative' < senggi 'blood'

is parallel to Khitan

cišid- 'filial' (reconstruction by Shimunek [2017: 219]) < cis 'blood'

which was one of the first words in the small script that I ever saw. I don't think the parallel is coincidental.

Later, Manchu replaced senggige with hiyoo-šun, a borrowing of Ming Chinese 孝 *xjaw 'filial'.

11. I finally got around to rediscovering Blench and Post 2013. Too much to quote and comment on here. Maybe I can serialize my reaction upon rereading it.

12. I keep forgetting to mention my idea of Jurchen

<sin> [ɕiɴ]

in 'rat' possibly being a phonogram derived from or cognate to Chinese 剩, pronounced *ʂiŋ in Liao and Jin Chinese. The graph could go back to the Parhae script. In Parhae times, the eastern Late Middle Chinese reading of 剩 would have been something like *ɕɦɨŋ. (But why is the Sino-Korean reading from that period ing [iŋ] instead of †sŭng [sɯŋ]?)

13. I didn't know Canadian Aboriginal syllabics were influenced by devanagari and Pitman shorthand. And I should have guessed this:

Canadian syllabics would influence the Pollard script in China.

14. How did Russians come to borrow рисунок risunok 'picture' from Polish rysunek? Wiktionary has lists of Russian borrowings from Polish and Russian terms derived from Polish. The two lists overlap. Does the longer second list contain calques as well as direct borrowings?

15. 3.1.19:49: A topic I forgot to mention: Luce (1985: 24) mentions old spellings of names of Karen (Old Burmese <karyaṅ>) groups:

Pgho for Pwo "in the older [Western?] books"; possibly also Old Burmese <plav>, <plavʔ>, <plo>, <ploʔ>, <plov>, <pravʔ>

Bghai for Bwè "as the older books call them"

Old Burmese <cakrav> "provisionally" for Sgaw

Have historical studies of Karen integrated such data? Pgh- and Bgh- remind me of Burling's (1969: 29) Proto-Karen *pɣ-. More about that in my entry for 2.24.

THE DAY OF THE WHITE TIGER

Or, in Jurchen,

<šang.giyan TIGER DAY> šanggiyan tasha inenggi

I am out of time for tonight, so there are only three items.

1. I haven't been comfortable with transliterating the first character of the date as <šang> because it also seems to stand for sa- in

<sa.hai> sa-ha-i < *sa-qʰa-i 'know-PFV.PTCP-GEN' (Yongning Temple Stele line 8, 1413; the interpretation is from Jin Qizong [1984: 98])

and various other words where there is no nasal or trace of one. (A nasal would have blocked the lenition of *qʰ to h [χ]. *saɴ-qʰa-i would have become Jurchen †sakai.)

And reinterpreting the second character as <nggiyan> isn't going to work because

<RED.nggiyan?> 'red'

can't be fulnggiyan, which violates Jurchen phonotactics. And the phonotactics of any language I've ever seen. But what if 'red' was fulanggiyan with an a to break up the bizarre sequence -lngg-?

(2.25.21:11: I did not pick a at random to be a filler vowel; Janhunen (2003: 7) reconstructed Proto-Mongolic *xulaxan 'red'. That *x- is from an even earlier *p-. Is there any reason to suppose that Mongghul fulaan 'red' has f- from *x- < *p- as opposed to straight from *p-?)

The Jurchen and Khitan large script characters for 'tiger'

are probably related via a shared Parhae prototype distinct from Chinese 虎 <TIGER>. 

2.26.18:50: Curiously, that Khitan character is not in N4631, which has two near-lookalikes:

0335 and 0280

I do not know whether those are variants of <TIGER>. I have not seen 0280 in calendrical contexts (but perhaps its contexts involve physical tigers), and I have never seen 0335 in context. Here are four instances of 0280 that I have seen:


Epitaph for the 蕭袍魯 Great Prince of the North, line 3 (1041)


Epitaph for 蕭袍魯 Xiao Paolu, lines 4-5, 7 (1090)


Epitaph for 耶律褀 Yelü Qi, line 23 (1108)

I have no idea where word divisions are. I have provided the characters preceding and following 0280 without knowing whether they represented words or parts of words.

2. I just mentioned the South Korean writer 全光鏞 Chŏn Kwang-yong ... and he turned out to be one of S. Robert Ramsey's informants for the 咸鏡南道 South Hamgyŏng Province dialect of 北青郡 Pukchhŏng County in Accent and Morphology in Korean Dialects (1978).

Ramsey's other informant is a woman with the unusual (to me) name 趙五木禮 <cho o.mok.rye> Cho Omongnye. Are the characters of her trisyllabic personal name simply phonograms (there is a native word omok 'concave' - a strange morpheme for a name - and no native rye; nye 'yes' cannot possibly be relevant) or is the name really a meaningful sequence of three morphemes 五 'five', 木 'wood/tree', and 禮 'ceremony/decorum'?

I was hoping South Hamgyŏng would support my hypothesis of Proto-Korean *e, but ... I'll have to describe how my dream crumbled some other time.

3. I saw an online ad for Rocketman starring Taron Egerton, a graduate of Ysgol Penglais School, a name that is structurally like the equally redundant Mount Fuji-san in reverse: Ysgol at the beginning is Welsh for 'school', just as -san at the end is Sino-Japanese for 'mountain'.

4. 2.27.21:14: BONUS FOURTH ITEM: I forgot to mention a solution I had on the 22nd to this problem: How can Jurchen

<sol.go> 'Korea' (cf. Manchu solho 'id.')

and Middle Mongolian 莎郎合思 solangqa-s 'Koreans' with -o- be reconciled with the Late Old Chinese transcriptions 斯盧 ~ 斯羅 of Old Korean *sela with *-e-?

斯盧 and 斯羅 appear to be from two different strata of transcriptions reflecting different stages of Late Old Chinese:

斯盧 *sie la (in more precise notation, *sie lɑ) predates the shift of *-a to *-ɔ and the shift of *-aj to *-a. At this stage, 羅 was read *laj and was not yet appropriate for transcribing foreign la.

斯羅 *sie la postdates the shift of *-a to *-ɔ and the shift of *-aj to *-a. At this stage, 盧 was read *lɔ and was no longer appropriate for transcribing foreign la.
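The two strata depend on the order of the two shifts. Below is a minimal sketch, assuming the simplified readings given above (盧 *la and 羅 *laj before either shift) and assuming that *-aj > *-a did not feed *-a > *-ɔ, since otherwise 羅 would also have ended up as *lɔ. The helper names (`apply_shifts`, `transcribers`, and so on) are hypothetical; they apply the shifts in a chosen order and list which character could then stand for a foreign la.

```python
import re

# Readings before either shift, as given in the text (simplified notation).
EARLY = {"盧": "la", "羅": "laj"}


def a_to_open_o(reading):
    """*-a > *-ɔ (final *-a only; *-aj is untouched)."""
    return re.sub(r"a$", "ɔ", reading)


def aj_to_a(reading):
    """*-aj > *-a."""
    return re.sub(r"aj$", "a", reading)


def apply_shifts(readings, shifts):
    """Run each reading through the shifts in the order given."""
    result = {}
    for char, reading in readings.items():
        for shift in shifts:
            reading = shift(reading)
        result[char] = reading
    return result


def transcribers(foreign, readings):
    """Characters whose reading matches a foreign syllable exactly."""
    return [char for char, reading in readings.items() if reading == foreign]


late = apply_shifts(EARLY, [a_to_open_o, aj_to_a])
print(transcribers("la", EARLY))  # ['盧']  (stratum of 斯盧)
print(transcribers("la", late))   # ['羅']  (stratum of 斯羅)

# Opposite order: *-aj > *-a feeds *-a > *-ɔ, leaving no *la at all.
print(apply_shifts(EARLY, [aj_to_a, a_to_open_o]))  # {'盧': 'lɔ', '羅': 'lɔ'}
```

Only the counterfeeding order reproduces both strata; in the opposite order no character is left for a foreign la.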

At neither stage did Late Old Chinese have a syllable *sio. Sio, er, so what if 斯盧 ~ 斯羅 were attempts to write an Old Korean *sjola? Or - now it occurs to me - *søla? (But nothing else indicates Old Korean had front rounded vowels.) The Jurchen/Manchu and Mongolian names for Korea could be based on *sjola with the simplification of *sj- to s- to fit their phonotactics. Then later Old Korean shifted *jo (or *ø?) to *e.

That idea generates more problems, though.

First, how can Middle Korean sjó 'cow' exist if *jo became *e? sjó would have to come from something other than *sjo in Old Korean: e.g., *siro with an *-r- blocking the fusion of *i-o into *e. But there is no evidence for a disyllabic early word for 'cow'. The earliest attestation of a Koreanic word for 'cow' is as 首 in the sinographic spelling of a Koguryo toponym. 首 was read as *ɕuʔ in Late Old Chinese which lacked *sju or *sjo, so 首 might have been a viable phonogram for a North Koreanic *sjo.

Second, if the Koreanic word had *ø, that vowel should correspond to Mongolian ö, not the o in solangqa-s.

My guess is that the Jurchen/Manchu and Mongolian names for Korea are borrowings from a North Koreanic *sjola or the like which differed as much from Shilla *sela as Polish Lwów [lvuf] differs from Ukrainian Львів [lʲʋiw] 'Lvov'. (But are there any other cases of northern *jo : southern *e?) 'Old Korean' or 'early Koreanic' or whatever we call it must have been as diverse as Slavic or perhaps even Romance are today.

The same must have been true of the Chinese of the time; the reconstructions here are generic without the regional flavoring that must have existed. It would be great to see an update of Paul LM Serruys' 1959 study of the 方言 Fangyan 'Regional Words'.

THE DAY OF THE YELLOW OX

Or, in Jurchen,

<so.giyan DAY> sogiyan wihan inenggi

1. I've been meaning to post this since 2.7: I wonder if <so> originated as a Parhae script cognate of Chinese 牛 <COW>. What if that cognate were used to write a North Koreanic cognate of Middle Korean syó? This logogram for a North Koreanic word could then in turn have been recycled as a phonogram for Jurchen so. (Although Jin Qizong [1984: 185] glossed this graph as 'yellow', it appears in spellings for various unrelated so-words, so it may just be a phonogram.)

2. Anthony Burgess wrote and slept in a Dormobile. Nice portmanteau word. Is there a Chinese equivalent of portmanteau words? Imagine the possibilities in hangul or the Khitan small script.

3. LOL, best use of the button choice meme I've seen yet by noealz (via Jay Lim via Gerry Bevers). Knowing which words are Sino-Korean helps a lot in remembering which words are spelled with ㅐ ae and which ones have ㅔ e: there are hardly any Sino-Korean morphemes with -e: the only one that immediately comes to mind is 揭 ke. And knowing the etymologies of native words helps: e.g., 내- nae- 'to put out' is from 나 na- 'to come out' + the causative suffix -이- -i-. But that won't help with monomorphemic 개 kae 'dog' and 게 ke 'crab' which can't be broken down any further.

4. LOL 2:

"Today, a good working knowledge of Chinese characters is still important for anyone who wishes to study older texts (up to about the 1990s)"

When I first started learning Korean in 1987, I saw mixed-script texts and figured I'd better start learning Chinese character readings right away. I added Sino-Korean readings in pencil to my copy of Nelson's The Modern Reader's Japanese-English Character Dictionary (still in print after 57 years, and for good reason!). Now I hardly see Chinese characters in current Korean texts: e.g., on's front page I only see

中文 'Chinese writing' (top of page) and 中國語 'Chinese language' (bottom of page) for the Chinese-language edition; the latter is a Korean word Chunggugŏ which shouldn't be used to indicate a Chinese edition for Chinese readers

日文 'Japanese writing' (top of page) and 日本語 'Japanese language' (bottom of page) for the Japanese-language edition; the latter is a Chinese word which shouldn't be used to indicate a Japanese edition for Japanese readers

Those characters aren't for Korean readers; these nine are.

4 characters used as abbreviations of country names:

北 Puk for Pukhan 'North Korea'

美 Mi for Miguk 'America'

日 Il for Ilbon 'Japan'

獨 Tok for Togil (/tok/ + /il/) 'Germany'

3 characters for political abbreviations

青 Chhŏng for 青瓦臺 Chhŏngwadae 'Blue House'

野 ya for 野黨 yadang 'opposition'

文 Mun for 文在寅 Moon Jae-in

2 characters for disambiguation with homophones

母 mo 'mother'

前 chŏn 'previous': without the character, it could be interpreted as 'Commander Chŏn'.

5. TIL about the first Cherokee script (and first Native American-language) newspaper, the ᏣᎳᎩ ᏧᎴᎯᏌᏅᎯ <tsa.la.gi tsu.le.hi.sa.nv.hi> Cherokee Phoenix, which was first published 191 years ago today. It appropriately came back to life in modern times.

6. I found 朱震球 Patrick Chu's study of correspondences between Cantonese and Sino-Korean readings. I worked out the correspondences between Sino-Korean and Sino-Japanese on my own as I added Sino-Korean readings to my copy of Nelson's dictionary.

7. I guessed that 'railroad' in Manchu would be a calque of Chinese 鐵道 'iron road', and voila: sele-i jugūn 'iron-GEN road' for 鐵路 'railroad' (lit. 'iron road'; close enough).

8. It is tempting to try to link Manchu sele to Korean 쇠 soe < Middle Korean sóy 'iron', but

- the vowels are too different (e is higher class and nonlabial, whereas o is lower class and labial)

- if sóy were from a Proto-Korean disyllable, its Middle Korean form should have rising pitch rather than a high pitch: †sǒy < *sòrí with a low pitch syllable followed by a high pitch syllable

- if I understand Vovin (2017) correctly, if there ever were a lost liquid in 'iron', it would have to be *-r-, not *-l-, and Jurchen/Manchu retain an r/l-distinction lost in Korean, so sele cannot be from *sere

9. Wiktionary gives अङ्कसङ्गणक aṅkasaṅgaṇaka as a Sanskrit translation of 'laptop (computer)':

aṅka- 'lap'

-saṅgaṇaka- < sam- 'com-'¹ + gaṇaka- 'calculator' (< gaṇa- 'number')

That led me to the Hindi Wikipedia article

संगणक अभियान्त्रिकी saṁgaṇak abhiyāntrikī 'computer engineering'

The first word is just another spelling of -saṅgaṇaka- 'computer'. Hindi drops the final -a of Sanskrit-based forms. (I hesitate to say 'loanword' here, since I suspect the word was coined out of Sanskrit for Hindi before being used in Sanskrit. I can't imagine a Sanskrit neologism for 'computer' predating a Hindi term.)
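The -ṅ- of saṅgaṇaka is the prefix-final m of sam- assimilating in place of articulation to the velar g- of gaṇaka. Below is a simplified sketch of that rule in IAST. The function name `join_m_prefix` is hypothetical, the sketch handles only stops, and it always writes the homorganic nasal rather than anusvāra, so it reproduces the spelling saṅgaṇaka rather than the anusvāra spelling of Hindi saṁgaṇak.

```python
# Nasal agreeing in place with a following stop (IAST spelling).
PLACES = [
    ("ṅ", "k kh g gh"),   # velar
    ("ñ", "c ch j jh"),   # palatal
    ("ṇ", "ṭ ṭh ḍ ḍh"),   # retroflex
    ("n", "t th d dh"),   # dental
    ("m", "p ph b bh"),   # labial
]
HOMORGANIC = {stop: nasal for nasal, stops in PLACES for stop in stops.split()}


def join_m_prefix(prefix, stem):
    """Join a prefix ending in -m (e.g. sam-) to a stem, assimilating the
    nasal to a following stop. Simplified: no anusvara, no other sandhi."""
    if not prefix.endswith("m"):
        raise ValueError("prefix must end in -m")
    for length in (2, 1):  # try aspirates like 'gh' before plain 'g'
        onset = stem[:length]
        if onset in HOMORGANIC:
            return prefix[:-1] + HOMORGANIC[onset] + stem
    return prefix + stem  # before vowels etc. the m is left alone


print(join_m_prefix("sam", "gaṇaka"))  # saṅgaṇaka
```

Before a labial or a vowel the m surfaces unchanged: sam + bandha gives sambandha, and sam + āsa gives samāsa.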

The second word is puzzling. yāntrikī is the feminine of 'relating to instruments (yantra)'. But what is abhi- doing? It is hard to translate. Monier-Williams' definition:

(a prefix to verbs and nouns, expressing) to, towards, into, over, upon. (As a prefix to verbs of motion) it expresses the notion or going towards, approaching, &c (As a prefix to nouns not derived from verbs) it expresses superiority, intensity, &c

Does it correspond to the en- of engineering? At first I thought the word was derived from a verb abhi-yam in which abhi- was an idiomatic prefix, but there is no such verb.

10. Try Viacheslav Zaytsev's Tangut eye exam.

11. It's always good to see a new name in the tiny field of Tangutology: 橘堂晃一 Kitsudō Kōichi, who coauthored "Tangut Text Printed in the “Illustration of the Ten Realms of Mind Contemplation 観心十法界図” in the Collection of the State Hermitage Museum, Russia" with Arakawa Shintarō.

12. And here's a solo paper by Kitsudō on Khitan influence on Uyghur Buddhism and a paper by Kitsudō and Peter Zieme on an Old Uyghur text with Tangut and Khitan parallels. I happened to see them right after I was thinking about how someday I might wish I knew more about Turkic.

How I wish there were Buddhist texts in Khitan. Something other than funerary texts. But I fear written Khitan was never a vehicle for Buddhism. Spoken Khitan, however ... oh, to hear a conversation about Buddhism in Khitan!

13. Does the South Hamgyŏng dialect of Korean preserve pre-vowel harmony vocalism? E.g., is manjŏ 'ahead' (mixing the lower vowel a with the higher vowel ŏ) more conservative than Seoul mŏnjŏ? See Ramsey (1978: 61) for more examples of South Hamgyŏng a corresponding to Seoul ŏ.

14. Why do Jurchen

<sol.go> 'Korea' (cf. Manchu solho 'id.')

and Middle Mongolian 莎郎合思 solangqa-s 'Koreans' have -o- in the first syllable if they are based on Old Korean *sela (transcribed in Late Old Chinese as 斯盧 ~ 斯羅; later respelled as 新羅 <> - now read Shilla) with *-e-? The Jurchen/Manchu forms made me think the labiality of some suffix spread into the first syllable, but there is no labiality in the noninitial syllables of the Middle Mongolian form.

¹2.24.15:08: Although it's tempting to regard sam- and com- as cognates, Proto-Indo-European *ḱóm should have become Sanskrit śam, not sam.

THE DAY OF THE YELLOW RAT

Or, in Jurchen,

<YELLOW.giyan DAY> sogiyan singge inenggi

1. I would expect 'rat' to be †singger, since the Manchu word is singgeri and Jurchen mudur 'dragon' corresponds to Manchu muduri 'id.' But the second graph is <ge>, not <ger>.

The first looks like Chinese 利 'profit' which was read *li in Jin Chinese. But other versions of it look less like 利:

I don't know why Jin Qizong reconstructed its reading as ʃïn with a nonfront ï (IPA [ɨ] or [ɯ]?). Was he influenced by the nonfront vowel in the modern Mandarin pronunciation shen [ʂən] of the character 申 used to transcribe sing-?

If one believed that Jurchen had frontness harmony, the e in the second syllable should go with a front vowel i in the first syllable, not ï.

On the other hand, I think Jurchen had height harmony, and the higher series vowel i is what I expect to go with e [ə], the higher series counterpart of a. If the first syllable had had a lower series vowel, that vowel would have been ī [ɪ], which could not coexist with the higher series vowel e [ə] within a root.

Lastly, the *ʂ- of the Ming Chinese transcription 申 *ʂin reflects a Jurchen s- [ɕ] that is more likely to have palatalized before a high front vowel i than a nonfront ï or a less high ī [ɪ].

2.23.11:13: Jurchen s-, like Korean or Japanese /s/, palatalizes before /i/.

2. I had been wondering what it's like for Tibetan refugees to move to the West. A firsthand account by བསམ་གཏན་རྒྱལ་མཚན་མཁར་རྨེའུ། bSamgtan rGyalmtshan mKharrme'u (Samten Gyeltsen Karmay):

In September 1961 we all arrived in England, which somewhat reminded us of India. I recall one day David invited us to lunch at Claridge’s, where he was staying. He led us into the hotel garden and on the lawn beside the swimming pool gave us exercise books and pencils and began teaching us the Roman alphabet.

And from his late benefactor David Snellgrove's perspective:

Before starting the journey to the West, we spent a few weeks together in the frontier town of Kalimpong, in British times the beginning of the old route from India into Central Tibet, then easily reached by rail from Calcutta where we would start our air-journey to Europe. Here I started some lessons in English and in world-geography and bought them all European style clothes, which they wanted to have so as not to be so conspicuous in their new setting.

3. At last I see (but can't read, alas) Graham Thurgood's PhD dissertation The Origins of Burmese Creaky Tone and a much shorter book Notes on the Origins of Burmese Creaky Tone.

4. I should look into what caused Roger Blench to change his mind about the classification of Kman (Miju). Compare:

2011 (with my former student Mark Post):

Miju does have more Tibeto-Burman roots than some of the other languages considered here, so it is provisionally classified as an isolate within Sino-Tibetan.


Kman is usually considered a Tibeto-Burman language, part of the ‘North Assam’ group, a characterisation which goes back to Konow (1902). However, there is no published argument defending this classification and Blench & Post (2013) consider it equally likely to be a language isolate.

I need to review Blench & Post (2013).

5. I can't believe that until now I had never seen EG Pulleyblank's 1965 article interpreting the Middle Chinese transcription 突厥 *dot kut simply as 'Türk'.

6. Two different descriptions of Kman (Miju) consonants:

Wikipedia has labialized velar fricatives /xʷ ɣʷ/ without nonlabialized counterparts /x ɣ/.

They seem to correspond to Roger Blench's aspirated and unaspirated /hʰ h/. I've never seen an aspirated /hʰ/ before.

By "consonant prosodies", which "include labialisation, palatalisation, lateralisation and rhoticisation", does Blench mean clusters with [w j l r] after a consonant?

7. I noticed that the Tangut characters

𗄠 4524 2ngwu1 < *P.ŋoH or *Pʌ.ŋəH  'leader'

𗄟 4528 2ngwu1 < *P.ŋoH or *Pʌ.ŋəH  'official'

(the two characters probably represent the same word in two different contexts)

have the same element as the 'sorcerer' characters in my previous entries:

𗄞 4539 1vyq3 < *S.wi(p/t) 'wizard, witch, sorcerer'

𗄦 4527 2jeq2 < *S.NdreH or Sɯ.NdraŋH 'wizard'

𗄤 4536 2ror4 < *Cɯ.roH 'wizard, witch, sorcerer'

𗄥 4550 1lheq4 < *Sɯ-ɬe or *Sɯ-ɬaŋ 'wizard, sorcerer'

So is 𘠋  a semantic element for a person of authority? Not always - what is it doing in

𗄡 4529 2kyq4 < *S.kiH 'burnt'

whose analysis is unknown?

8. Only now did I discover Andrew West's BabelStoneHan font has hentaigana!

9. How is Nadsat translated into Russian?

10. I had no idea Anthony Burgess had such a rich linguistic background: e.g.,

Burgess attained fluency in Malay, spoken and written, achieving distinction in the examinations in the language set by the Colonial Office. He was rewarded with a salary increase for his proficiency in the language.


During his years in Malaya, and after he had mastered Jawi, the Arabic script adapted for Malay, Burgess taught himself the Persian language, after which he produced a translation of Eliot's The Waste Land into Persian (unpublished).

11. I wish Gerry Bevers wrote posts at Literary Chinese for Korean Learners. I've never seen anything on learning hanmun in English!

12. Bevers is not afraid to touch the radioactive Liancourt Rocks / Dokdo / Takeshima issue.

THE DAY OF THE RED PIG

Or, in Jurchen,

<RED.giyan PIG DAY> fulgiyan uliyan inenggi

1. The Jurchen logogram <PIG> is clearly cognate to the Khitan large script logogram


but neither seems to have any cognate Chinese character unless I put on my pareidolia glasses and see a resemblance to 亥 'pig (in the 12-animal cycle)'.

I have shown the late form of the character from the vocabulary of the Bureau of Translators (#162; early 1400s?). Interestingly, the earlier form of the character from the 進士 jinshi candidate monument (1224)

looks less like the Khitan form. Unfortunately, the character is not in what remains of the Jurchen Character Book thought to contain the earliest forms of characters.


2. Shimunek (2017: 45) reads the Old Mandarin transcription of a Khitan river name as *niawlaka. He regards the Chinese transcription of a Serbi river name as representing a cognate *ñawlag.

He rejects attempts to connect the river name to Khitan

<> 'gold';

the words are too dissimilar. Instead he sees a possible link to *ñaw 'lake'.

One problem is that *a had shifted to *o in Old Mandarin, so the transcription was read *niawloko. In an earlier period, those graphs would have been read as *niawlaka, but in that period, a final *-g would have been transcribed as *-k (as in the Serbi hydronym's transcription), whereas Old Mandarin lacked final stops, necessitating a whole syllable *-ko to transcribe foreign *-g.

I think that *-g may have been uvular *-ɢ or *-ʁ to harmonize with *a.

3. Shimunek (2017: 44) regards the transcriptions of an ethnonym that Pulleyblank (1983) reconstructed as *tägräg as "further evidence in support of Beckwith's (2007a) of dialectal variation between coda *g and *ŋ in northern frontier varieties of Old Chinese and Early Middle Chinese."

Shimunek (2017) | *tɛyŋ liayŋ | *tʰɛr (< Beckwith's *tʰêk) lək | *ṭʰik lək
This site | *teŋ leŋ | *tʰe(ik/t/r) lək | *ʈʰɨək lək | *dek lek | *dək lək

(2.20.1:15: The last two columns are my additions.)

I don't think such variation is necessary. *-k and *-ŋ are simply two different strategies to transcribe foreign *-g. There is no need to project *-g into Chinese.

Wikipedia avoids the issue of what the ethnonym was at the time by taking the easy (though anachronistic) option of reading 丁零 in standard Mandarin as Dingling, 鐵勒 as Tiele, etc.

4. Vovin (2003: 97) proposed that Cheju 굴레 kulle 'mouth' "is likely to be connected with Japonic *kutu- 'mouth'." He repeats this proposal on p. 24 in the section on a possible Japonic substratum of Cheju in his 2009 book.

Three apparent problems: If the Proto-Japonic word for 'mouth' was *kotu-i:

1. Cheju has -u- instead of -o-

(Japonic *o raised to *-u- in Pelagic Japonic but not Peninsular Japonic)

2. Cheju has -ll- instead of †-l- which is the expected reflex of intervocalic *-t-

3. Cheju has -e instead of -wi

But Cheju historical phonology seems like unexplored territory and my Proto-Japonic form could be wrong, so maybe the gaps can be bridged.

5. Speaking of Cheju, Wikipedia says,

This kingdom [on Cheju] is also sometimes known as Tangna (탕나), Seomna (섬나), and Tammora. All of these names mean "island country".


sŏm < *sema is 'island', a word shared with Japonic (e.g., pre-Old Japanese *sema 'id.').

But 'country' in Korean and, as far as I know, Cheju is 나라 nara, not 나 na: e.g., Cheju 여나라 Yŏnara ~ 예나라 Yenara 'Japan'.

I don't know of any 탕 thang 'island'.

Ah, I see now: somebody phonetically respelled two old names for Cheju, 涉羅 <sŏp.ra> [sʰɔmna] and 乇羅 <thak.ra> [tʰaŋna], in hangul. Neither 涉 Sŏp nor 乇 Thak means 'island'. Nara cannot be abbreviated to 羅 -ra. See more old names for Cheju here.

And Vovin (2009: 25) thinks 耽牟羅 Thammora has "a transparent Japonic etymology": it is either cognate to Japanese tani 'valley' + mura 'village' or Japanese tami 'folk' + mura 'village'.

Tham could reflect a reduction of *tani- to *tam- before *m-.

牟 was read as *mu 'moo' in mainstream Old and Middle Chinese. But the Sino-Korean reading 모 mo may indicate an eastern dialect with *mo for 'moo', as there was no *u to *o shift in Korean. 牟 represented a word for 'to moo' (as well as various homophones: see Karlgren [1957: 285] and Schuessler [2009: 184]), and such an onomatopoetic word might plausibly have vocalic variation. If 牟羅 was read *mora as in modern Sino-Korean, it could be evidence for a Proto-Japonic *mora 'village' whose *o raised to u in Japan but not on Cheju.

Again, no 'island' or 'country'.

6. Oddities in this Wikipedia entry on Cheju mythology:

6a. The translation gets off to a bad start, mentioning a "Ying Prefecture" not in the actual text. The Japanese translation has similar problems; it starts with 瀛州 'Ying Prefecture'. (Or Yŏng if one prefers to read it in Korean rather than Mandarin. Both readings are anachronistic.)

6b. Conversely, the translation ignores a lot before the mention of the first god 良乙那.

6c. It would be lazy and anachronistic to read 良乙那 with Sino-Korean readings as ryang + ŭl + na. 乙 is probably a phonogram for *r (Vovin ). 良 is a problem. Did it transcribe a syllable beginning with *r- which would be unusual in initial position in an Altaic language (but see here)? Or did it transcribe a syllable beginning with an *l- (cf. its Middle Chinese reading *l) which is possible in Altaic but unusual for Koreanic? 

7. I just ordered Robbins Burling's book Spellbound. What would he say about modern Lhasa Tibetan spelling: e.g.,

གཙང་པོ་ <gtsaṅ.po> ˉtsaṅko 'river'

གཟུགས་པོ་ <gzugs.po.> ˊsuku 'body'

དོ་པོ་ <do.po.> ˊthopo 'luggage'

(Examples from 星 実千代 Hoshi Michiyo, 現代チベット語文法(ラサ方言) Gendai Chibetto-go bunpō (Rasa hōgen) [A Grammar of Modern Tibetan (Lhasa Dialect)]. Transcriptions in italics are in her orthography.)

8. New words for today:

zilant (this neo-tamga for Kazan is a neat example)

wyvern (first encountered 29 years ago in Megadeth's "Five Magics" but I only looked it up today)

enosis (The agreements leading to the proclamation of independence of Cyprus from the United Kingdom were made sixty years ago today.)


I found that last word in Language and Culture in Northeast India and Beyond: In Honor of Robbins Burling co-edited by my former student Mark Post.

2.20.1:51: Burling, last seen here, gave me my first introduction to Lolo-Burmese via the data in his 1967 book which I used to write my own reconstruction. I just realized he used Robert B. Jones' Karen data in the same way for his book Proto-Karen: A Reanalysis (1969)!

THE DAY OF THE RED DOG

Or, in Jurchen,

<RED.giyan DOG DAY> fulgiyan indahūn inenggi

1. I've now been doing this Jurchen calendar shtick long enough to recycle the colors (red last came up on the 9th). Here's the whole cycle:

blue/green > red > yellow > white > black (and back to green again)

Soon I'll be recycling the animals and won't have to make the occasional new character image from Jason Glavy's font anymore. Yay! (I love his font; I just don't love the inconvenience of creating an image for every character I want to display.)
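The cycle can be sketched in Python, assuming the standard Chinese sexagenary scheme behind these day names: ten stems paired off two per color, running alongside the twelve animals. The English labels, the 0-based day index, and the function name `day_name` are illustrative choices, not Jurchen data.

```python
# Sketch of the 60-day cycle behind the day names, assuming the standard
# Chinese scheme: 10 stems (two consecutive stems per color) and 12 animals.
COLORS = ["blue/green", "red", "yellow", "white", "black"]
ANIMALS = ["rat", "ox", "tiger", "rabbit", "dragon", "snake",
           "horse", "sheep", "monkey", "chicken", "dog", "pig"]

def day_name(n: int) -> str:
    """Return the color + animal label for day n (0-based) of the 60-day cycle."""
    stem = n % 10    # position among the 10 stems
    branch = n % 12  # position among the 12 animals
    return f"{COLORS[stem // 2]} {ANIMALS[branch]}"

if __name__ == "__main__":
    # Days 20-24: green monkey through yellow rat.
    for n in range(20, 25):
        print(n, day_name(n))
```

Because each color covers one even and one odd stem, every one of the 5 × 12 = 60 color-animal pairs turns up exactly once per cycle, which is why 'red dog' and 'red pig' fall on consecutive days and then not again for sixty days.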

1a. Jin Qizong (1984: 235) derived <RED> from Chinese 金 <GOLD> (not <RED>!). 金 cannot be a phonetic loan, as it did not sound anything like fulgiyan; its Jin dynasty reading was *kim. (I don't agree with Shimunek [2017: 106-108] on the absence of *-m in Jin Chinese; I should go into why later.)

The Khitan large script character


looks nothing like the Jurchen character or Chinese 金 <GOLD>. I thought it might be related to Chinese 赤 <RED>, but that character has no similar variants. And to complicate matters further, Liu and Wang (2004: 23, #84) read this character as a transcription of Liao Chinese 金 *kim 'gold'!

A problem for the ex Khitanis hypothesis of the origin of the Jurchen script is why the Jurchen chose to copy the script of their "worst enemies" (as Janhunen [1994: 7] put it) in some instances but not others. As Janhunen asked, why didn't they just adopt the Chinese script or the simpler Khitan small script? Why seemingly modify the more complex Khitan large script at random? His view, and mine, is that they did not do that; rather, they adapted the Parhae script, which, as Vovin (2012) demonstrated, predates the Khitan scripts. According to this ex Parhis hypothesis, the Khitan and Jurchen large scripts are sister derivatives of the Parhae script rather than a random deformation of the Chinese script and a derivative of that deformation.

1b. Shimunek (2017: 227) reconstructs the Khitan equivalent of 'red dog' as lyawqu ñaq (Yelü Dilie 14.27-28, 1092) with an initial cluster ly-.

Going by what Kane (2009: 255) says, I think Nie (1988) may have been the first to suggest that Khitan had initial clusters.

Altaic languages generally avoid initial clusters. The big exception is Middle Korean, whose clusters were the short-lived products of the reduction of word-initial syllables: e.g., pstaj < *pVsVtaj 'time'. Did Khitan initial clusters have similar origins?

2. Shimunek (2017: 225) reads the Khitan small script character

as <qai> and translates it as 'a discourse deictic demonstrative' borrowed from and corresponding to (Jin) Chinese 該 *kaj (my reconstruction) in the bilingual Sino-Khitan Langjun inscription. But I don't see 該 in the Chinese text. That loan proposal is phonologically interesting for reasons I should go into later.

3. After I mentioned a blog post on the pronunciation of postvocalic r in "Please Please Me" on the day of the green monkey, David Boxenhorn made me listen to it again for the first time in years. Do you hear an r after a vowel, and if you do, in which words?

4. Almost thirty years ago I was talking to H. Mack Horton about 南總里見八犬傳 Nansō Satomi hakkenden (The Tale of the Eight Dogs of the Satomi of Nansō). I'm glad to see there's a specialist in it who just published a book on it and is translating it. (Here's an unrelated online translation in progress. It's not clear to me whether the online version is based on 曲亭馬琴 Kyokutei Bakin's original or on a modern translation.)

Glynne Walley's courses show a lot of breadth - I guessed correctly that manga would be one topic, but he's done much more spanning the last millennium, going beyond the written word into rakugo, noh, and kyōgen (the latter two in a course with the great title "Monkey Fun").

I once thought I was going to be a Japanese literature scholar, but as you can obviously tell from this blog, I took a big detour and never turned back.

5. I'd like to learn more about JD Wisgo who runs and who just published Two of Six: A Captain's Dilemma, a translation of an online SF novella with parallel Japanese and English text.

I love parallel texts; my favorite is the Korean-English edition of 全光鏞 Chŏn Kwang-yong's 꺼삐딴 리 Kkŏppittan Ri (Kapitan Ri, 1962) translated by the late Prof. Marshall R. Pihl¹ who was my Middle Korean teacher. I just bought the book on Kindle; it's one of the few stories I've read that has stayed with me for three decades. Disappointingly the Kindle version lacks the Korean text which is in the print edition. At least Prof. Pihl's biography appears in both Korean and English, as does editor Bruce Fulton's - but Chŏn's own biography is only in English!

Things could be worse:

My Amazon account was hacked over the past few days. Had to contact Amazon for intervention. My account ended up being wiped and terminated.

2.20.22:41: Here's a description of the story by Michael Kociuba who read the same edition I did about thirty years ago:

In the story "Kapitan Lee," by Chon Kwangyong, the struggle to improve one's fortune seems to have taken precedence over loyalty to family or nation. The protagonist -- Dr. Yi Inguk, alias Kapitan Lee -- constantly strives to amass wealth and protect himself even at the expense of his fellow countrymen. As he refuses to treat patients who are unlikely to pay his fees, most of his clients are Japanese before liberation and members of "the moneyed class" after 1945.

Dr. Yi is divided in his loyalties, and that would all depend on who is in control. He served the oppressor during Japanese rule, and when the U.S. is the overlord, he donates a national treasure to the consul's collection without the slightest sense of guilt. Editor [Peter H.] Lee compares the physician to a chameleon, changing his colors to match the world which surrounds him, no matter how servile his efforts are.

I think Peter H. Lee was the translator of that edition. I agree with Gerry Bevers; it's a shame Prof. Lee doesn't have a Wikipedia entry. In lieu of an entry, I recommend Bevers' page on him, including his own memories of the man. Is Prof. Lee still alive? I also recommend Bevers' entire site, Korean Language Notes.

After all these years I finally figured out that 꺼삐딴 Kkŏppittan in the title is based on the pronunciation rather than the spelling of Russian капитан <kapitan> [kəpʲɪˈtan]. Until now I had been expecting a transliteration-like rendering of the word as 까삐딴 Kkappittan. The Russian word has been transliterated in the English translations of Kkŏppittan Ri; if it weren't, the name would be something like †Cuppitan with -u- as an attempt to indicate [ə]. (See A Clockwork Orange for other examples of Russian in English 'phonetic' spelling: e.g., gulliver for голова <golova> [ɡəlɐˈva] 'head'. It just occurred to me that Chinese transcriptions of foreign names are like gulliver: attempts to approximate foreign names using preexisting elements - though in the case of gulliver, the preexisting element is the trisyllabic name Gulliver rather than a syllable.)

6. Here's an interesting name reading I found on Wisgo's site:

犬吠埼一介 Ikkai Inubōsaki

介 is normally read suke in final position in men's names.

The real surprise is the reading bō for 吠, which is normally read ho- as in hoeru < poyu.

2.19.1:09: I tried to come up with a derivation for the name, but it doesn't work:

*inu-nə poyu-ru saki > Inubōsaki

'dog-GEN bark-ATTR cape' = 'cape where a dog barks'

There are two problems:

First, although *nə-p > *Np > *Nb > b is possible, the genitive marker nə in a subordinate clause should not be reduced to N, at least not in Western Old Japanese. But maybe the name originates from a different dialect.

Second, neither premodern -oyuru nor modern -oeru can compress to -ō, unless one posits an ad hoc development in the source dialect.

I would expect bō to be from an earlier *nə-popu or *nə-papu. There was no verb †popu, but there was a papu, which became modern 這う hau 'to crawl'. It seems then that the name is from

*inu-nə pap-u saki > Inubōsaki

'dog-GEN crawl-ATTR cape' = 'cape where a dog crawls'

without any ad hoc compression (apart from the unexpected reduction of *nə). The name could theoretically be written as †犬這埼 'dog-crawl-', but 吠 'howl' is semantically preferable to 這 'crawl'.

2.19.0:38: I didn't realize 犬吠埼 Inubōsaki contains the animal for this entry until the start of the next day, the day of the red pig!

2.20.22:03: I also hadn't known Inubōsaki was a place name. Wikipedia's Japanese and English articles on Cape Inubō take the 'bark' character at face value.

7. Lastly, here's today's incremental addition to the Tangut sorcerer thread: characters without the 'grass' element for near-synonyms:

𗄞 4539 1vyq3 < *S-wi(p/t) 'wizard, witch, sorcerer'

𗄦 4527 2jeq2 < *S-NdreH or Sɯ-NdraŋH 'wizard'

2.20.23:05: The absence of 𘤃 'grass' (herbal medicine?) in those characters makes me wonder if 1vyq3 and 2jeq2 were not 'medicine men' unlike the other two words I've mentioned so far:

𗄤 4536 2ror4 < *Cɯ.roH 'wizard, witch, sorcerer'

𗄥 4550 1lheq4 < *Sɯ-ɬe or *Sɯ-ɬaŋ 'wizard, sorcerer'

Is the shared *S- in three out of the four words so far significant?

I wouldn't take the slight differences in the definitions from Li Fanwen (2008) too seriously. Ditto for the Chinese definitions I haven't quoted. I suspect neither the English nor the Tangut captures the true differences between the words. Which is not Li's fault - there is nothing to go on but the brief, circular definitions from the Tangut dictionary tradition which define them in terms of each other.

I'm glad these words have survived at all; a wealth of pre-Buddhist Khitan and Jurchen - and Pyu! - vocabulary has probably vanished without a trace. But who knows what lurks among the undefined words in extant Khitan and Pyu texts?

¹2.21.19:05: It's a shame that Prof. Pihl doesn't have a Wikipedia entry. Far Outliers honors him.

I wonder if the Marshall R. Pihl papers collection contains the materials from the 1994 Middle Korean class that I took.

THE DAY OF THE GREEN CHICKEN

Or, in Jurchen,

<nion.giyan CHICKEN DAY> niongiyan tiko inenggi

1. The Jurchen logogram <CHICKEN> might be related to the Khitan large script character


but the resemblance is vague at best.

The Jurchen and Khitan words may also be related somehow - the small script spelling of the Khitan word

tells us that 'chicken' was something like t-Qa, but there is no agreement on what was between the t- and -a. The latest reconstruction I've seen is Shimunek's (2017: 372) taqa <>.

The vocabularies of the Bureau of Translators and Interpreters have different transcriptions of the second syllable of 'chicken': 和 *xo (BoT #152) and 課 *kʰo (BoI #332, #424). The Chinese forms are only approximate, but there is no doubt that one had an initial fricative and the other had a stop.

Vovin (1997: 274) proposed that Jurchen/Manchu intervocalic *-k- became -h-. Other Tungusic forms for 'chicken' point to a medial stop. So it seems then that Jurchen tiqo [tɪqʰɔ] in the later Bureau of Interpreters vocabulary is from a conservative dialect that didn't lenite *-k-, whereas the earlier Bureau of Translators form tiho [tɪχɔ] is from an innovative dialect that did. There is no evidence for a nasal that would have blocked lenition: *-nk- > -k-.

Manchu coko [tʂʰɔqʰɔ] may be a borrowing from a conservative dialect preserving a medial stop. The first vowel of the Manchu form seems to have assimilated to the second vowel.

Wu and Janhunen (2010: 260) noted the similarity of Khitan small script character 39

with the modern simplified Chinese character 开 kai, which in turn also happens to resemble Jurchen <CHICKEN>. Since 雞 'chicken' in Middle Chinese was *kej (something like *kaj in the south - far from the Jurchen!), it is tempting to come up with a pseudoexplanation for the Jurchen graph: tiko was written as a variant of 开, which almost sounded like 雞 'chicken'. But that would be anachronistic.

As far as I know, no one has proposed a reading for 39. The diacritic <ˀ> in Kane's (2009: 301) <kải> indicates that it is a placeholder transliteration chosen purely for visual similarity with 开 kai; it is not meant to indicate that Kane thinks 39 was pronounced kai.

39 probably did not stand for a single segment. It is only attested twice in the corpus in Research on the Khitan Small Script (1985): once in the epitaph for Empress 宣懿 Xuanyi (18.10.1) and once in the epitaph for the 許王 Prince of Xu (39.9.2). It occurs just once in the epitaph for Xiao Dilu (45.4). It is in initial position before


in Xuanyi and Dilu and before


in Xu. Could its reading end in a consonant? Or in i if <as> is an error for <is>?

2. It took me thirty years to figure out that the Korean honorific nominative/ablative particle kkesŏ is an example of double indirectness as politeness. That explains why it is both nominative and ablative (not a combination I'm used to from an Indo-European perspective):

曾祖ᄭᅦ셔 나시면

tsɯŋtso-skəj-sjə na-si-mjən

great.grandparent-DAT.HON-ABL go.out-HON-if

'if the great-grandparents go out ...' (家禮諺解 Karye ŏnhae 2.2, 1632; example found in Lee and Ramsey 2011: 271-272)

아버지께서 온 便紙

abŏji-kke-sŏ o-n phyŏnji

father-DAT.HON-ABL come-REAL.ATTR letter

'a letter that has come from Father' (a modern example from Martin 1992: 637)

In the second phrase, the ablative refers to the source of a physical object, whereas in the first phrase, it refers to the metaphorical 'source' of an action (i.e., its performer).

3. The modern honorific dative particle 께 kke < skəj above is the result of layers of contraction:

- 께 kke is a compound of -s 'GEN' and kəj 'to that place'

- kəj is a contraction of kɯ 'that' + ŋəkɯj 'to that place'

- ŋəkɯj 'to that place' is "derived from" kəkɯj 'to that place' (Lee and Ramsey 2011: 190)

- kəkɯj 'to that place' contains the dative-allative marker -ɯj 'to', so presumably kək was once a noun 'place' - but how did the -ŋ- ~ -k- variation come about? Vovin (2003: 96, 2009: 96 [on the same page in two different publications!]) proposed that Middle Korean intervocalic -k- is from Proto-Korean *-nk-. Two possibilities:

- the demonstratives used to have a final *-n (related to the realis attributive -n?) that was reanalyzed as part of the following word: *kɯn + kəkɯj > kɯ + ŋəkɯj (with irregular fusion of *-nk- to ŋ- in that phrase but regular fusion to -k- in kəkɯj?)

- the original word for 'place' was disyllabic nVkək, reduced to ŋək ~ kək

Martin (1992: 577) analyzed Middle Korean iŋəkɯj 'to this place' as i-ŋək-ɯj. There is no doubt that i is 'this' and ɯj is 'to', but initial ŋ- is odd in a native word.

4. David Boxenhorn asked me about Altaic vowel harmony. I don't have time to say much, but I can type a few introductory remarks here.

Altaic can be thought of as a continuum of five families in contact from east to west:

West: front harmony
Central red zone: height harmony
East: no vowel harmony

Turkic has frontness harmony like Uralic languages to the west:

Languages in what I call the red zone (after their shared word for 'red') have height harmony:

I believe Old Chinese and possibly also Tangut went through a height harmony phase influenced by Altaic neighbors.

Japonic has no vowel harmony beyond Arisaka's law: a tendency against having *ə coexist with *a, *o, or *u within a root. See section of this file by Bjarke Frellesvig (who writes *ə as *o and *o as *wo). In Japonic, there are no sets of harmonizing affixes like those in other Altaic languages.

Wikipedia led me to Yoshida (2006) on i becoming e to assimilate to an e in the same word in modern Kyoto Japanese, but that is not like any other form of Altaic vowel harmony.

5. When discussing the problem of naming language groupings, David Boxenhorn suggested calling the South Arabian languages (which are not closely related to Arabic and not descended from Old South Arabian) Felician after Arabia Felix. That sounds better than my ideas:

- Mehric after the language with the most speakers

- Mehri-Soqotric, after the two languages with the most speakers

6. Robbins Burling in Proto-Karen: A Reanalysis (1969: 12) used phonostatistical arguments against Robert B. Jones' (1961: 100) reconstruction of twelve final nonglottal stops in Proto-Karen. (Compare with Proto-Karen's relative Old Burmese which only had four final stops: -k, -c, -t, -p; -c was ultimately secondary. Pyu had only three final stops: -k, -t, -p.) Each of Jones' twelve final stops appears only 1-3 times in his reconstruction, so all are suspicious.

When I encounter rarities in Pyu, I note them and file them away instead of immediately granting them phonemic status.
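That kind of phonostatistical check can be sketched as a frequency count over reconstructed forms. The forms below are hypothetical toy data, not Karen or Pyu, and the default threshold of three simply mirrors the 1-3 attestations mentioned above; as with the Pyu rarities, a flagged final is a candidate for scrutiny, not an automatic rejection.

```python
from collections import Counter

# Hypothetical toy reconstructions (NOT real Karen or Pyu data).
FORMS = ["pak", "tik", "sok", "muk", "lek",
         "mat", "kut", "sit", "pet",
         "lap", "rip", "sup", "kop",
         "tac"]

def rare_finals(forms, finals, threshold=3):
    """Return {final: count} for every final attested at most `threshold` times."""
    # Try longer finals first so that e.g. "ng" is not counted as "g".
    ordered = sorted(finals, key=len, reverse=True)
    counts = Counter()
    for form in forms:
        for final in ordered:
            if form.endswith(final):
                counts[final] += 1
                break
    # Finals that never occur count as 0 and are flagged as well.
    return {f: counts[f] for f in finals if counts[f] <= threshold}

if __name__ == "__main__":
    print(rare_finals(FORMS, ["k", "t", "p", "c"]))  # {'c': 1}
```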

Looking at Burling's (1969: 30-31) own reconstruction, I see asymmetries in his rhymes that I want to explore later.

7. Burling's (1969: 21) comments on Karen tones seem to apply to tone systems throughout the Sinosphere:

The tones fall readily into 6 major correspondence patterns. Little phonetic sense can be made of these correspondences. A high rising tone in one language may correspond regularly with a low falling tone in another, and in some cases even checked tones in one language correspond to smooth tones in others. Nevertheless, since the number of tones is small, and the number of examples of each is large, the correspondences hardly seem questionable.

My first encounter with this phenomenon was when I read about Cantonese in 1990. I was accustomed to standard Mandarin, whose tones correspond with those of Cantonese as follows in sonorant-final syllables (*stop-final 'checked' syllables are complicated):

Mandarin | Cantonese, *voiceless initial | Cantonese, *voiced initial
high level | high level or high falling | -
high rising | - | low falling
low falling-rising | high rising | low rising
high falling | mid level | low level

That was easy to learn. The Taiwanese correspondences were not:

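Initial class | Cantonese | Taiwanese
*voiceless | high level or high falling | high level
*voiceless | high rising | high falling
*voiceless | mid level | low falling
*voiceless | high or mid checked | low checked
*voiced | low falling | mid rising
*voiced | low rising | high falling (again)
*voiced | low level | mid level
*voiced | low checked | high checked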

8. I didn't know there was a living Old South Arabian language!

9. I've never seen a term like this for an unidentified language before.

10. Sort of answering my own question, I finally got around to hearing Rihanna's pronunciation of care at about :31 in "Work". It sounds like [kjɛɹ] to me. "Sort of" because I don't know how representative that pronunciation is.

Old Japanese ke might have been something like [kʲɛ].

11. What is the origin of Geronimo's English name which doesn't sound like his name [kòjàːɬɛ́] in Mescalero-Chiricahua?

12. No time to look into Tangut

𗄤 4536 2ror4 'wizard, witch, sorcerer'

tonight. I'll just say that it has a near-mirror image (near-?)synonym

𗄥 4550 1lheq4 'id.'

with 𘤃 'grass' (herbal medicine?) and 𘤧 'small' (referring to the size of the herbs?) in opposite places under  𘠋 '?' and stop there for now.

13. Shimunek (2017: 218) reconstructed Khitan

'was caused to serve' (Shimunek's translation)

as [r̩lgər] which is doubly un-Altaic: Altaic languages do not have native words with r- (Khitan may prove that to be a myth) or syllabic liquids. Typology aside, there is nothing phonetically implausible about his proposal. However, others would read that word very differently: e.g.,

[Table: readings of three Khitan small script characters by source; the character images and numbers did not survive.]

Chinggeltei 1979: ?
Jishi 1996: ər ?
Chinggeltei 2002: gə / ɣə wei
Kane 2009
Liu 2009: ku / tsh
Chinggeltei 2010: gə / ɣə ər / er
Wu and Janhunen 2010: ir
Takeuchi 2012
Liu 2014: ku / tsh ni
Shimunek 2017

(2.19.19:27: I expanded this list greatly using Andrew West, Viacheslav Zaytsev, and Michael Everson's wonderful compilation of readings. I'm surprised Jishi 1996 doesn't have a reading for 261 which is an extremely common character whose [l] can easily be verified by its presence in transcriptions of Chinese *l-syllables.)

Note that transliterations do not necessarily match pronunciations: e.g., compare Shimunek's <> with [r̩lgər].

THE DAY OF THE GREEN MONKEY

Or, in Jurchen,

<nion.giyan mo.nion DAY> niongiyan monion/bonion inenggi

1. I originally wrote 'green' and 'monkey' as nongiyan and monon more or less following Jin Qizong (1984), but then I realized that Ming Chinese 嫩 *nun in their transcriptions was the only possible way to write Jurchen [ɲɔn] in sinography since there are no characters for *ɲon, *ɲun, etc. The Manchu cognates niowanggiyan 'green' and monio/bonio 'monkey' with nio [ɲɔ] confirm a palatal nasal [ɲ]. It would be unlikely for n to become [ɲ] before a nonpalatal vowel [ɔ].

2.17.19:43: The vocabularies of the Bureau of Translators and Interpreters have different transcriptions of the first syllable of 'monkey': 卜 *pu (BoT #152) and 莫 *mo (BoI #332, #424). This parallels the b [p] ~ m variation in Manchu. Anna Dybo's Tungusic dictionary regards the m- as secondary. The m- may be due to assimilation with the following -n-: cf. the b- ~ m- alternation in the paradigm of Manchu 'I':

b- when no nasal follows: bi (nominative)

m- when a nasal follows: mini (genitive), minci (ablative), minde (dative), mimbe (accusative)

be 'we (exclusive)' has the same alternation: e.g., meni (genitive).
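The conditioning is simple enough to state as a single rule. Here is a minimal Python sketch. It assumes each form can be split into its initial plus a remainder (that segmentation is mine and purely illustrative), and it counts only n and m as nasals:

```python
import re

def pronoun_form(rest: str) -> str:
    """Prefix m- if a nasal (n or m) follows later in the form, otherwise b-."""
    return ("m" if re.search(r"[nm]", rest) else "b") + rest

# 'I': nominative, genitive, ablative, dative, accusative
print([pronoun_form(r) for r in ["i", "ini", "inci", "inde", "imbe"]])
# 'we (exclusive)': nominative, genitive
print([pronoun_form(r) for r in ["e", "eni"]])
```

This only restates the synchronic distribution. It does not decide whether b- or m- is older.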

2. In "The Day of the Black Horse", I proposed that pre-Tangut *-aw became *-a. I just found a potential example:

*kraw > *kraɰ > *kra > *kri > 𗠭 4533 1ki2 'to call out, to shout'

cf. Written Burmese ကြော် <krau> < *graw? (following Pulleyblank's 1963 analysis of <au>) 'to shout loudly'.

This example entails *-w loss before *a-brightening (i.e., raising to i).

2.17.21:21: But I don't know when *rV > V2 (i.e., Grade II V). Above I've placed that change after *a shifted to *i, but it could have predated that.
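The ordering argument can be sketched as ordered rewrite rules in Python. These rules are hypothetical simplifications built for this one example. In particular, I assume brightening applies only to word-final *a, which is just one way of modeling *a failing to brighten before *-w. The *rV > V2 change is left out because its date is unclear:

```python
import re

# Hypothetical rules for this single example only.
W_VOCALIZATION = (r"aw$", "aɰ")   # *-aw > *-aɰ
W_LOSS = (r"aɰ$", "a")            # *-aɰ > *-a
BRIGHTENING = (r"a$", "i")        # *a > *i (assumption: word-final *a only)

def derive(form, rules):
    """Apply rewrite rules in order, recording each stage that changes the form."""
    stages = [form]
    for pattern, replacement in rules:
        new = re.sub(pattern, replacement, stages[-1])
        if new != stages[-1]:
            stages.append(new)
    return stages

# *-w loss before brightening yields the attested-type outcome
print(" > ".join(derive("kraw", [W_VOCALIZATION, W_LOSS, BRIGHTENING])))
# brightening first leaves *a unraised under the assumption above
print(" > ".join(derive("kraw", [BRIGHTENING, W_VOCALIZATION, W_LOSS])))
```

Reversing the order stops at *kra, so under these assumptions the *kri outcome requires *-w loss to come first.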

3. "Talking tactics: Rihanna and the pop stars who change accent" (via Lisa Jansen) mentions an application of phonostatistics I never imagined:

Take the Beatles for example; a band who were masters in vocal shape-shifting, and picked up traits from their fans across the Atlantic during the height of Beatlemania in the US. In You Say Potato: A Book About Accents, authors David and Ben Crystal note the impact of the Beatles’ fluctuating tones. Citing a report by Peter Trudgill in 1980, which examined the way in which the Beatles sounded out the r after a vowel, something most American singers would do, they wrote:

"In 1963/64, in such songs as Please Please Me, almost 50% of the words containing this feature had the r sounded. By the time of the Sergeant Pepper album in 1967, this had fallen to less than 5%. Note that the use of the feature was never totally consistent. That’s normal. When singers copy Americans, they get the accent sometimes right, sometimes wrong. But over the years, the Beatles' singing voices show that they are leaving the mid-Atlantic way behind and starting to sound more consistently British."

That made me wonder if exceptions to sound changes are cases of incomplete imitation.

4. Andreas Hölzl's "Udi, Udihe, and the language(s) of the Kyakala" (2018: 136) mentioned an Alchuka form that looks like the missing link between Jurchen

<GOLD.un> ancun (or alcun?) 'gold' (originally spelled with a single character <GOLD>?)

and Manchu aisin 'id.': anʃïn!

5. Looking up 𗠭 4533 1ki2 'to call out, to shout' in Li Fanwen's 2008 Tangut dictionary, I stumbled on a nearby entry

𗄤 4536 2ror4 'wizard, witch, sorcerer'

Li only mentions attestations in dictionaries. So 2ror4 may be a so-called 'ritual language' word or, in my view, a non-Sino-Tibetan substratum word. The Mixed Categories volume of the Tangraphic Sea mentions several possible (near-)synonyms. I'll look at them tomorrow.

6. Looking at Shimunek's (2017: 218) reconstruction of un-'Altaic'-looking Khitan initial clusters (e.g., kʰtʃʰ- and tʰg-) made me think he could have cited Middle Korean initial clusters like pst- for areal/typological support.

Surprising even from a Middle Korean perspective is his initial [r̩l]. Middle Korean had no r-initial words. More on this tomorrow.

7. Looking at Shimunek's "Post-publication Addendum to Languages of Ancient Southern Mongolia and North China: A Revised Transcription of Middle Mongol in ’Phagspa Script", I wonder how he would reconstruct the initial consonant of ꡖꡞꡘ ꡂꡦ ꡋꡦ <ɦir gė nė> *ɦirgen-e 'person-DAT/LOC' at an earlier stage.

2.17.20:37: Two topics I forgot to mention:

8. I finally got to see text in the Mongolian Latin alphabet. Or to be more precise, two versions of it. I'm confused: Wikipedia says one system

was officially adopted in Mongolia in 1931. In 1939, the second version of the Latin alphabet was introduced but not used widely until it was replaced by the Cyrillic script in 1941.

citing Lenore A. Grenoble's Language Policy in the Soviet Union (2003: 49). But the 1931 date is for Kalmyk, not Khalkha Mongolian in Mongolia, and I don't see any mention of the other points.

On the other hand, the Mongolist György Kara (2005: 187) only mentions an "ephemeral attempt" at a Latin alphabet for Mongolia "launched by Choibalsan in 1940".

(2.18.13:40: No, wait, his timeline [p.197] says there was an experimental alphabet for Khalkha in the "early 1930s". No mention of the specific date 1931 or of a new alphabet in 1939. He gives 1945 as the date of the introduction of Cyrillic for Khalkha.)

9. More confusion: The Wikipedia article on hanja (Chinese characters in Korean) says,

South Korean primary schools abandoned the teaching of Hanja in 1971, although they are still taught as part of the mandatory curriculum in 6th grade. They are taught in separate courses in South Korean high schools, separately from the normal Korean-language curriculum. Formal Hanja education begins in grade 7 (junior high school) and continues until graduation from senior high school in grade 12.

So are hanja taught in sixth grade or not? The first sentence tells me 'yes'; the last sentence tells me 'no'.

I'd still love to see a list of hanja taught in North Korean schools.

THE DAY OF THE BLACK SHEEP

Or, in Jurchen,

<saha.liyan SHEEP DAY> sahaliyan honi inenggi

1a. The Jurchen character <saha> is only attested in the vocabulary of the Bureau of Interpreters (#481, #620), but its shape goes back centuries.

Jin Qizong (1984: 93) observed that there is an identical character in the Khitan large script from a remnant of a memorial from the mausoleum of Emperor Taizu of Liao (r. 916-926). Could that memorial date from the mid-to-late 920s: i.e., only a few years after the 'creation' (whatever that really meant) of the Khitan large script?

As the Khitan large script character for 'black'

is somewhat (though not entirely) different, my guess is that the Jurchen character may be a recycling of a Khitan large script character pronounced saqa (Shimunek [2017: 213] did not reconstruct x or h for Khitan). That character in turn might be derived from a Parhae prototype that was either pronounced similarly or represented an unrelated Parhae (North Koreanic?) morpheme with a meaning similar to whatever Khitan saqa might have meant.

Another possibility is that

were variants of <BLACK> in the Khitan large script. But they might be too different to be variants.

I am hesitant to transliterate

as a logogram <BLACK> because it is also attested in the verb stem

sahada- 'to hunt' (#481); cf. Manchu sahada- 'id.'.

Could that spelling be <HUNT.da> which at some earlier point ? Did the Jurchen originally write 'to hunt' as a single logogram <HUNT>? Was sahaliya then spelled <HUNT.liya> with <HUNT> used as a phonogram for saha-? Perhaps

represented a Khitan root 'to hunt' in the Khitan large script. If so, I cannot think of any plausible cognate Chinese character, though with pareidolia, one can see a 'covered cross' on the right side of 狩 'to hunt'.

1b. Jin Qizong (1984: 296) observed that <liyan> has a near-lookalike in the epitaph for Xiao Xiaozhong 蕭孝忠 (1089):

(shown here in Jerry You's font)

Was that character also read something like liyan? Might the character be from a Parhae graphic cognate of Chinese 亮 or the right side of Chinese 涼? Both 亮 and 涼 would have been pronounced something like *ljaŋ in the northeastern Chinese known to the Parhae (cf. their Sino-Korean reading 량 ryang).

1c. Jin Qizong (1984: 11, 12) found a different form of <SHEEP> in the Jurchen Character Book thought to date from the early Jin dynasty. I presume he identified its meaning on the basis of context (e.g., being surrounded by other animal date terms in sequence?) since the Book is monolingual. He writes this Jin form of <SHEEP> in three different ways in his dictionary:

As I do not have a clear copy of the Book, I do not know which form is attested in it. (Maybe two or more are if the character appears more than once.)

The last form is the closest to Khitan <SHEEP>, though the top elements (ヒ and ユ) are oriented in opposite directions:

Could some or all of these have originated as pictographs of sheep?

2. I was hoping to write a report on Larry Hyman's talk "Functions of Vowel Length in Language: Phonological, Grammatical, & Pragmatic Consequences", but no one was there. There wasn't even a sign indicating a new location or cancellation.

2a. In his abstract, Hyman mentions Bantu languages which

- "have added restrictions which shorten long vowels in pre-(ante-)penultimate word position and/or on head nouns and verbs that are not final in their XP"

- "have lost the [vowel length] contrast but have added phrase-level penultimate lengthening"

Why would vowels shorten in pre-(ante-)penultimate position? Or lengthen in penultimate position?

Those which have "new long vowels (e.g. from the loss of an intervocalic consonant flanked by identical vowels)" are like Mongolian: e.g., the city name Улаанбаатар Ulaanbaatar < *hulagan 'red' + *bagatur 'hero' (the 1924 collocation is obviously of Communist origin and hence cannot be reconstructed at the proto-level).
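The VCV > VV step can be sketched as a single regular-expression rule. This is a minimal Python sketch that assumes, for illustration only, that the deleted consonant is g/ɣ. The loss of *h- and the vowel change that turned *-tur into -tar are separate changes and are not modeled here:

```python
import re

# Identical vowels flanking g/ɣ (assumed deletable consonants) merge into a long vowel.
INTERVOCALIC_LOSS = re.compile(r"([aeiou])[gɣ]\1")

def lengthen(form: str) -> str:
    """Apply VCV > VV wherever the two vowels are identical."""
    return INTERVOCALIC_LOSS.sub(r"\1\1", form)

print(lengthen("hulagan"), lengthen("bagatur"))
```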

2b. I wonder what Hyman would say about Pulleyblank's (1962: 99) and Starostin's (1989) theories of vowel length and Chinese vocalic development in what Sagart (1999) called 'type A' and 'type B' syllables. Four proposals on type B syllables:

Pulleyblank: Old Chinese *Vː > Middle Chinese *jV.

Starostin, OTOH, had the reverse idea: Old Chinese short *V > Middle Chinese *jV. (This is a simplification.)

In the Baxter-Sagart system, Old Chinese *V before nonpharyngeal consonants > Middle Chinese jV (their j is a notational device).

In my system, (1) *high vowels not preceded by high vowels and (2) *low vowels preceded by high vowels > Middle Chinese high vowel-initial diphthongs.

The traditional (i.e., Karlgrenian) view is that Old Chinese *jV > Middle Chinese *jV.

2.16.22:45: A comparison of different views:

Type A syllables (all agree the Middle Chinese reflexes had no *-j- before *-e)

Old Chinese > Middle Chinese
Pulleyblank (1962): *Ce > *Cej
Starostin (1989): *Ceː > *Ciej
Baxter and Sagart (2014)
This site (my view since 2002)

Type B syllables (all agree the Middle Chinese reflexes had *-j- or *-i- before *-e)

Old Chinese > Middle Chinese
Pulleyblank (1962)
Starostin (1989)
Baxter and Sagart (2014)
This site (my view since 2002)

Baxter and Sagart's Middle Chinese notation is not starred since it is not phonetic. Their -ji- is a spelling device to indicate Grade IV chongniu status. I don't know how they think -jie was pronounced.

If I wrote Middle Chinese the way I write Tangut and Tangut period northwestern Chinese, I would write *Cie as Ce4 with 4 for Grade IV. I have considered writing such a notation for Middle Chinese to avoid getting bogged down in phonetic trivia.

3. Two things struck me as I was looking at Shimunek's (2017: 215-217) reconstruction of Middle Khitan vowels.

3a. His Middle Khitan vowel inventory is front-heavy unlike the Mongolic, Jurchen/Manchu, or early Korean systems:

Shimunek's Middle Khitan (3 front vowels)






He respectively places *ɛ and *ʊ higher and lower than I would expect. *ʊ is similarly high in the next table.

Shimunek's Common Serbi-Mongolic (2 front vowels)





Proto-Mongolic (1 front vowel)





Ming Jurchen in the Sino-Jurchen vocabularies (1 front vowel; note the similarity to the Middle Khitan inventory except for the front vowels)






Manchu (1 front vowel; descended from a Jurchen dialect that retained ʊ, unlike the dialects of the vocabularies)







Early Korean (1-2 front vowels; in a more phonetic notation than usual to facilitate comparison with Shimunek's systems)





So far nobody else believes in my *ɛ. I'll live.

(Tables added 2.16.0:16.)

3b. Another surprise from a Mongolic/Jurchen/Manchu/Korean perspective is that his Middle Khitan a and ə belong to the same vowel harmony category, whereas they are typically in opposing categories. Contrast:

his Middle Khitan nar-ən 'tomb-GEN' (instead of †nar-an)


Written Mongolian aqa-aca 'elder brother-ABL' vs. eke-ece 'mother-ABL' (e = [ə])

Jurchen ala-ha 'lose-PERF' vs. ete-he 'win-PERF' (e = [ə]; both from the Bureau of Translators vocabulary, #689, #794)

(2.16.0:24: I wonder if 阿剌 *a la- in Chinese transcription is an error for ana-; the Manchu cognate is ana-bu- 'to lose' with -n-, not -l-. See below for the Manchu verb ala- with -l-.)

Manchu ala-ha 'tell-PERF' vs. gene-he 'go-PERF' (e = [ə])

Korean 받아 pad-a 'receive-INF' vs. 벋어 pŏd-ŏ 'stretch-INF' (ŏ [ɔ] is from earlier ə.)

Vowel harmony is breaking down in the spoken Korean 'infinitive': pad-a may be pronounced (but never spelled!) pad-ŏ (which is heard "increasingly in Seoul today" [Lee and Ramsey 2011: 296]).
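The 'opposing categories' pattern in the pairs above can be sketched as suffix selection. This is a minimal Python sketch. It assumes a suffix written with a placeholder A that copies whichever of the two harmonic vowels last appears in the stem, and it ignores neutral vowels, o-harmony, and the other Manchu perfect allomorphs:

```python
import unicodedata

def harmonize(stem: str, suffix: str, a_vowel: str = "a", other_vowel: str = "e") -> str:
    """Attach suffix, replacing each placeholder 'A' with the stem's last harmonic vowel.

    Only a_vowel and other_vowel count as harmonic; all other letters are skipped.
    """
    stem, suffix = unicodedata.normalize("NFC", stem), unicodedata.normalize("NFC", suffix)
    harmonic = {unicodedata.normalize("NFC", a_vowel), unicodedata.normalize("NFC", other_vowel)}
    for ch in reversed(stem):
        if ch in harmonic:
            return stem + suffix.replace("A", ch)
    raise ValueError(f"no harmonic vowel in {stem!r}")

print(harmonize("aqa", "-AcA"), harmonize("eke", "-AcA"))  # Written Mongolian ablative
print(harmonize("ala", "-hA"), harmonize("gene", "-hA"))   # Manchu perfect
print(harmonize("pad", "-A", other_vowel="ŏ"), harmonize("pŏd", "-A", other_vowel="ŏ"))  # Korean infinitive
```

By this simplified rule, Khitan nar- would take †-an, which is why Shimunek's nar-ən stands out.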

I think nar-ən is also a case of vowel harmony breakdown, possibly facilitated by a lack of stress on suffixes. Kane (2009: 132) gives examples of a-nouns followed by a genitive written <an>. However, Kane does not give examples of the type ... aC-an; all the stems in his examples end in -a, so, for instance,

<> 'of the qaghan'

might have simply been [qaʁan] rather than [qaʁaːn]. Perhaps a-final nouns took -n and aC-final nouns took -ən.

THE DAY OF THE BLACK HORSE

Or, in Jurchen,

<saha.liyan HORSE DAY> sahaliyan morin inenggi

I can't believe I started the day thinking I'd never have enough to fill this entry.

1. I recall that Grinstead (1972) derived the Jurchen character <HORSE> from Chinese 保 'to protect', which would have been pronounced *paw (would Pulleyblank have reconstructed *pɔw?) in Jin Chinese. But why would the Jurchen write an m-word with a p-character?

Today I realized that <HORSE> might be derived from a Parhae script graphic cognate of 保 with a para-Japonic (!) reading cognate to Japanese mor- 'to protect'.

2. I discovered Lisa Jansen's blog Lisa Loves Linguistics. Excerpts from two posts:

2a. " 'He said me haffi work, work, work…' – Rihanna's multivocal identity":

the insertion of a palatal glide between [k] and [a] as in cyar instead of care which is also a more or less Pan-Caribbean feature

At first I thought of how English [kæ] is borrowed into Japanese as kya (e.g., cat as kyatto), but care doesn't have [æ]. Is care [kja] in the Caribbean?

2b. "The Sociolinguistics of 'Indie' Music: Kate Nash" (by Anika Gerfer)

Trudgill (1983) and Simpson (1999) discovered that a range of British artists of the mid-20th century switched to an ‘American accent’ in singing (Simpson labels this set of features associated with ‘American accents’ the “USA-5 model”).

That reminds me of the story behind the Kinks' "Come Dancing":

While recording "Come Dancing," Ray was asked to sing in an "American accent," a request he turned down.

Even the content was thought to be too English for the American market:

Although Arista Records founder Clive Davis had reservations about releasing the single in the United States due to the English subject matter of dance halls, the track saw an American single release in April 1983.

But the lyrics didn't bother me in Hawaii.

3. I finally realized that Sino-Korean 天動 chhŏndong 'thunder' became 'nativized' as 천둥 chhŏndung to harmonize the lower series vowel o with the preceding higher series vowel ŏ.

Korean vowel classes (added 2.16.0:41; ă is obsolete)


4. A Haiman Tetralogy

Quoting from a grammar that's actually fun to read!

4a. In the Khmer dialect described by Haiman (2011: 1), what he transcribes as av (ៅ <au> in Khmer script) is pronounced as [aɯ]. I suspect a similar shift of *-aw > *-aɰ occurred in Tangut. Eventually this *-aɰ simply became -a.

4b. Haiman (2011: 10):

Leaving this small number of words aside, it is still remarkable that in a language where almost every two-consonant cluster is attested word-initially, there are (virtually) no such (glottal stop + C) clusters.

I think "every" is too strong for Khmer which has many constraints on initial clusters: e.g., no clusters starting with implosives.

I'm reminded of how I thought anything could be in a Pyu consonant cluster after seeing sequences like kṭl- (from inscription 12) and tdl- (from inscription 16), until I actually collected all the clusters in the corpus and put aside marginal oddities. Then patterns emerged: e.g., what appeared to be three-consonant clusters were really sequences of preinitials followed by initials spelled with two consonants:

kṭl- /k.L̥/

tdl- /t.L/

/L̥ L/ may have been lateral affricates [tɬ dɮ].

2.16.20:11: Whether these mysterious laterals have anything to do with the laterals sometimes reconstructed for Tangut (e.g., Sofronov 1968 and Tai 2008's ld-) remains to be seen. I have not yet been able to identify any cognates of Pyu words with /L̥ L/ (or the similarly enigmatic /R̥ R/ written as ṭr and dr).
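The reanalysis of those apparent three-consonant clusters can be sketched as a parser that splits off the preinitial. This is a minimal Python sketch. The digraph table covers only the four spellings mentioned above and is my own illustration, not a full account of Pyu orthography:

```python
import unicodedata

def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)

# Two-consonant spellings of single initials (illustrative subset only).
DIGRAPHS = {nfc("ṭl"): "L̥", "dl": "L", nfc("ṭr"): "R̥", "dr": "R"}

def analyze_cluster(spelling: str) -> str:
    """Reanalyze an apparent CCC- spelling as /preinitial.initial/."""
    spelling = nfc(spelling)
    for digraph, initial in DIGRAPHS.items():
        if spelling.endswith(digraph) and len(spelling) > len(digraph):
            return f"/{spelling[:-len(digraph)]}.{initial}/"
    return f"/{spelling}/"  # no known digraph: leave the spelling as is

for cluster in ["kṭl", "tdl"]:
    print(cluster, "->", analyze_cluster(cluster))
```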

4c. Haiman (2011: 19):

Smith (2007: ii) declares the native orthography to be "the best [transcription of Khmer phonetics] on the planet" and heroically dispenses with any romanizations in even the initial chapters of his introductory textbook. No other scholar has followed him in either this bold assessment or in practice

I haven't seen Smith (2007), but it does seem "bold" to do so, given that I had to work through 148 pages of Huffman's Cambodian System of Writing (1970) to learn the script.

4d. Haiman (2011: 22):

Final <s> may be pronounced [s], in a hypercorrect reading style: thus nah, written as <nas> can be pronounced [nas] or [nah]. Otherwise, it is pronounced as [h]

This makes the Khmer borrowing of juif 'Jew' as ជ្វីស <jvīs> [cʋih] (hypercorrect [cʋis]) with <s> instead of <ḥ> even stranger; a nonsibilant [h] seems more like [f] to me than a sibilant [s].

5. Looking at Roland Emmerick's 2009 sketch of Khotanese, I wondered where balysa- /balza-/? 'Buddha' came from. (ys in Khotanese Brahmi stands for non-Indic /z/, a common sound in Iranian languages.)

6. Today's color is black, and yesterday I proposed that the Jurchen phonogram <he> was from a Parhae script counterpart of Chinese 黑 'black'. In Middle Chinese, 黑 was pronounced *xək (probably more like *xʌk), yet its Sino-Korean reading is hŭk [hɯk] with a high vowel. That oddity is not isolated; it is true of Sino-Korean readings corresponding to Middle Chinese *-ək/*-əŋ in general. What's going on? The borrowing of Middle Chinese *-ək/*-əŋ (*-ʌk/-ʌŋ?) as Sino-Korean [ɯk]/[ɯŋ] is even more puzzling considering that Korean once had [ʌk]/[ʌŋ]. The early ('Go-on') layer of Sino-Japanese, presumably borrowed via a Koreanic language (Paekche), has -oku/-ou < -ək/-əũ for those Middle Chinese rhymes. (That tells us a bit about how Sino-Paekche differed from Sino-Shilla, which became Sino-Korean.)

7. I was reluctant to propose that Ming Jurchen gulmahun 'hare' and Manchu gūlmahūn 'id.' had acquired their final syllables by analogy with Ming Jurchen indahun and Manchu indahūn 'dog', but now here I am mentioning it after seeing Shimunek's (2007: 353) similar proposal for Middle Mongol 'snake':

The ai /Ay/ element in the Middle Mongol form [moqai ~ moqoi] is probably the result of analogical change: cf. MMgl noqai 'dog', qaqai 'pig', taulai 'hare', etc. (Emphasis mine.)

Note that all four of those animals are part of the twelve-animal cycle.

8. Shimunek's 2018 article on Jurchen numerals is a good companion to Andrew West's article on the same topic.

9. I agree with Juha Janhunen (2012: 13) about

the assimilation model of linguistic expansion. According to this model, it is not populations that migrate but languages. When a speech community expands its territory to comprise areas where other languages are originally spoken, the principal process is that of linguistic replacement, or language shift, due to which the new language is, in most cases voluntarily, adopted by speakers of the former local languages. Empirical experience from different parts of the world tells us that language shift is by far the most important mechanism of linguistic expansion. This conclusion has only been confirmed by recent progress in human genetics.

That is why I like to speak of the coming of Burmese speakers into the Pyu lands rather than just 'the Burmese'; the latter could imply that the Pyu were completely replaced by 'the Burmese', whereas it is more likely that Pyu speakers switched to Burmese. The descendants of the Pyu are still here, though they don't speak Pyu or identify as Pyu anymore.

10. I disagree with Pevnov (2012: 17) about the term 'Tungusic':

which in my opinion is incorrect for the following reasons: first, it would at the very least be strange to consider Jurchen or Manchu to be Tungusic, and second, following such a logic of terminological simplification, it would analogically be possible to replace the term "Indo-European" with "European," "Finno-Ugric" with "Finnic" or "Ugric" and so forth, although it is unlikely that anyone would agree with such innovations.

The term Manchu-Tungusic could imply there are only two branches, Jurchen/Manchu and an 'everything else' branch (which is in fact Pevnov's view, one he shares with Sunik and Vasilevich). But that may not be the case: e.g., on the previous page, Janhunen (2012: 16) posits a different model in which Jurchenic (Jurchen and Manchu) is a subbranch of Southern Tungusic:


See Wikipedia for a model with the same basic structure (but different details below the second-level branches: e.g., Janhunen regards Kili as Nanaic, whereas Wikipedia lists Kili as Ewenic).

The term 'Sino-Tibetan' has similar problems: it could imply there are only two branches, Sinitic and Tibeto-Burman, which I do not think is the case. But at least Chinese and Tibetan are both well-known languages that could serve as representatives of the family. The layman has heard of Manchu but not of 'Tungusic'. Moreover, there is no language called 'Tungusic'.

Shimunek's term 'Serbi-Mongolic' also implies there are two (known) branches, Serbi and Mongolic, and that does seem to be the case. Serbi is not a well-known language, but at least it was a language (see Shimunek 2017: 121-168 for details on Middle Serbi).

2.16.21:30: For further reading on naming language families, I recommend Ostapirat (2000: 18):

We propose to call the whole language stock, to which Kra and other sister languages belong, Kra-Dai. The term follows the popular tradition of juxtaposing two big language members of the family, which sometimes are also linguistically distant enough from each other to give the feel of the whole family (cf. Sino-Tibetan, Tibeto-Burman, Mon-Khmer, etc). Such "dual" names appear to have proved practical; the longer names have seemed to be less successful in competition. For instance the term "Kam-Tai" which represents the Tai and Kam-Sui branches have quickly taken over the older names such as "Tai-Kam-Sui-Mak" (the last three members belong to the Kam-Sui branch).

Rereading that, I see the first line might give the impression that Kra is a language, though it is actually a group of languages.

Dai in Kra-Dai also refers to a group of languages; it "is the reconstructed form of autonyms of various Tai groups" such as the Thai. I like Dai as it avoids the homophonous confusion of Tai and Thai in English. Dai does have homophony problems of its own, but as a proto-word it is the shared heritage of all Tai peoples.

'Tungusic', on the other hand, is not based on a proto-autonym shared by most Tungusic languages (or even most non-Jurchenic Tungusic languages); it is a Turkic word for 'pig' that was an exonym of the Evenks. It has stuck in English, and I doubt it has any potential serious competitor other than Manchu-Tungusic: e.g., Eweno-Jurchenic.

