Until now I've been reading the character

2705 'to help; right side of character (i.e., assistant)'

in Tangut character analyses as 2beq4. But last night I discovered that it should be 2bir'4.

It and its homophone

2928 'to explain, note' (with 'speech' on its right side; probably a different spelling of a specialized usage of 'to help')

form a two-character homophone group in the first chapter of Homophones. All characters in that first chapter have readings with labial initials (p-, ph-, b-, m-). The Timely Pearl transcription of 2705 is *3me3 (32.2.8); the diacritic indicates b- rather than m-.

I have no idea how Nishida, Sofronov, and Gong determined the rhyme

N -ɛ̣ 2.54

S -ɪ̭e 2.? (no rhyme number listed on II: 307, but implicitly 2.54 since its transcription 命 is listed under 2.54 on II: , presumably a typo for -ɪ̭ẹ since -ɪ̭e 2.54 should either be 2.35 or 2.37 according to I: 137)

G -jịj 2.54

= my 2-eq4

before the rediscovery of the Precious Rhymes of the Tangraphic Sea which listed 2705 under rhyme 2.86 (-ir'). 2928 is not in PRTS, but its rhyme must be identical to that of 2705 since the two are homophones.

(8.29.0:59: Perhaps 2.54 was determined by a process of elimination. The Chinese transcription 命 *3me3 most likely reflected a Tangut reading like be4 (be3 would be anomalous in Tangut). 2705 could not have had the first tone because 2705 was not in the Tangraphic Sea's volume for first-tone characters. 2705 was thus thought to be in the lost second-tone volume [and later it indeed was found in the second-tone volume of the Precious Rhymes of the Tangraphic Sea]. But was it 2be2 [2.33], 2be'2 [2.35], 2beq4 [2.54], or 2ber4 [2.68]? 2705 was not in the homophone groups thought to be for 2be2 and 2be'2, so 2beq4 and 2ber4 were the remaining possibilities. I don't know why 2.54 was favored over 2.68. The actual rhyme turned out to be 2.86 - which was none of the above!)
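The elimination described in that note can be sketched as a simple filter. This is only an illustration of the reasoning as guessed at above, not a record of how Nishida, Sofronov, or Gong actually worked; the names `candidates`, `excluded`, and `remaining` are invented for the example.

```python
# Candidate second-tone readings considered for 2705, keyed by rhyme number
# (the four options listed in the note above).
candidates = {
    "2.33": "2be2",
    "2.35": "2be'2",
    "2.54": "2beq4",
    "2.68": "2ber4",
}

# Rhymes whose homophone groups were thought to be identified
# and which did not contain 2705.
excluded = {"2.33", "2.35"}


def remaining(candidates, excluded):
    """Return the candidates that survive the elimination."""
    return {rhyme: reading for rhyme, reading in candidates.items()
            if rhyme not in excluded}


survivors = remaining(candidates, excluded)
print(survivors)
# The rhyme later found in the Precious Rhymes was never among the candidates.
print("2.86" in candidates)
```

The filter leaves two survivors, so it cannot by itself explain the preference for 2.54 over 2.68.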

All this makes me wonder how many other readings have been revised in light of the Precious Rhymes of the Tangraphic Sea. And how many other readings are questionable.

FULL TETRATANGRAPHIC FORMULAE

You may have noticed in my last entry that I started quoting Tangraphic Sea four-character character analyses in full instead of converting them into A + B (+ C) formulae which are easier for me to type.  Perhaps paying attention to the exact wording of the analyses might help me formulate hypotheses about the script (which remains enigmatic to me after two decades).

David Boxenhorn saw the analysis of 'six' (which I will rewrite here in full) in "Tired Chapters of Accuracy"


3200 1chhiw4 = 3849 3130 1012 2705 1zhiw3 2mer4 1zeq4 2beq4

'(first syllable of 'sixth (month)') + palace + how-much right'

and my assessment of it as "implausible". He wrote that the implausibility

supports the theory that the analyses are mnemonic.

You know what else supports the mnemonic theory (I just thought of it!)? The analyses are all four characters. I can easily imagine a group of students reciting them. Even singing them.

And isn't there a Chinese tradition of four-character sayings?

[... I]s there any continuity between adjacent analyses, either semantic or phonetic? Something to suggest that they are not meant to be read in isolation?

I just realized that The Golden Guide consists of 5-character lines. Characters plus their 4-character analyses are also 5 characters long. They could both be recited in the same way, e.g. with the same tune!

You know, in English the most common meter is iambic pentameter, which would be great for every two lines of these works.

I should look at analyses in adjacent Tangraphic Sea entries in the near future.

I had been thinking that if the Golden Guide had a tune, it couldn't apply to the Tangraphic Sea analyses, but I had overlooked the possibility of including the analyzed character's reading in the tune.

As for meter, I have never seen a comparative study of the structures of poetry in (South)east Asian 'monosyllabic' languages. Does such a study exist? Or even, say, a study of poetic structures in what Guillaume Jacques might call the Macro-rGyalrongic world? Is there a Qiangic language today with poetry characterized by five-syllable lines? Is there a tonal pattern within and/or between lines of the Golden Guide and/or the Tangraphic Sea analyses?

In any case, I don't think the Tangraphic Sea analyses are truly etymological. In other words, I don't think they necessarily reflect the reasoning of the creator(s) of the script. Who would devise graphs for 'sixth (month)', 'palace', and 'how much', and then fuse them into 'six'?

'EIGHT' FROM 'SEVEN', 'SIXTH' FROM 'FIVE'?

I have been haunted by the Tangraphic Sea analysis of 4602 1ar4 'eight' for twenty years:


4602 1ar4 'eight' = 4778 1139 2750 1868 1shaq4 1e4 1ghu2 1teq4

'seven GEN head remove' = 'remove the top of seven'

If 'seven' and 'eight' are related characters, I would expect the more complex character ('seven') to be built up from a less complex character ('eight') rather than the other way around. But that scenario also has issues. Why would the character for 'eight' be designed before the character for 'seven'? And what does the top half of 'seven' do? I used to think it might be a phonetic symbol for sh-, so 'seven' was written as sh-yar = 1shaq3. However that configuration of strokes (Boxenhorn code: biozoxzox) appears nowhere else in the Tangut script. Why did 'seven' merit a unique 'head'?

The Tangraphic Sea analysis of 'seven' does not answer that question:


4778 1shaq4 'seven' = 4751 3916 4602 1602 1se1 2si4 1yar4 2ngorn1 

'clean (nominalizer) eight all' = '(the top of?) cleanliness all of eight'

If 'eight' is from 'seven' - at least in the Tangraphic Sea - it is not entirely surprising that the second syllable of

3849 5081 1zhiw3-1vi1 'sixth (month)'

is said to be from 'five':


5081 1vi1 = 5286 3936 1999 2705 1vi1 1pha1 1ngwy1 2beq4

'(second syllable of 1ten4-1vi1 'clever') left + five right'

5286 is phonetic. 1999 is not optimally semantic, though. Why not simply extract part of

3200 1chhiw4

the character for the regular word for 'six'?

8.27.22:39: Cf. how the character for 'fourth (son)' is derived from the regular character for 'four' (though the words are unrelated!):


4934 1ngwyr 'fourth (son)' = 4971 2750 2205 1602 1shwi3 1ghu2 1lyr'4 2ngorn1 

'age head (i.e., top) + four all'

The only other son-counting character with a similar structure is


1257 1ar4 'eighth (son)' = 0384 3936 4602 1602 1leq4 1pha1 1ar4 2ngorn1 

'son left + eight all'

which is simply a different spelling of 4602 1ar4, the regular word for 'eight'. Why did 'fourth (son)' incorporate 'four' unlike other characters for counting words unrelated to regular numerals?

DID COMMON AND 'RITUAL' TANGUT SHARE A ROOT FOR 'SIX'?

So far the last word on the subject of the so-called 'ritual language' of Tangut is Andrew West's 2011 article. I have yet to write a full response after over five years. My view in short is a blend of Nishida Tatsuo's and Andrew's; the 'ritual' language is a subset of substratal vocabulary used in glosses. I think these words were borrowed from a language of unknown affiliation - possibly an isolate spoken in Tangut territory. 

But if I am right, I would not expect the first syllable of 'sixth (month)'

3849 5081 1zhiw3-1vi1

to sound like the regular word for 'six':

3200 1chhiw4 < *K-truk.

Last night it occurred to me that 3849 might be 3200 with a root initial that lenited after the vowel of a lost presyllable:

*KV-truk > *KV-ch- > *KV-zh- > zh- (the relative timing of *-uk > -iw is unknown)

cf. 3200 in which *KV- conditioned aspiration: *KV-truk > *K-truk > 1chhiw4

But if that were the case, then what is 5081? It is not attested by itself except as a dictionary entry, and it does not occur as a suffix after other numerals.

I used to think that 3849 5081 1zhiw3-1vi1 was an unanalyzable disyllabic word, but 1vi1 is homophonous with

3649 1vi1 'sixth' (son).

Is 1zhiw3-1vi1 a redundant compound combining a variant of the basic Tangut word for 'six' with a substratal word? Or is 'sixth' before 'son' an abbreviation of a disyllabic substratal word 1zhiw3-1vi1?

8.26.14:31: Perhaps the resemblance of substratal 1zhiw3-1vi1 to 1chhiw4 is no more meaningful than that between Malay dua < *duSa 'two' and Sanskrit dva- ~ dvi- 'id.' (borrowed into Malay as dwi-). The zh- of 1zhiw3 does not have to be from a lenited consonant; it could be a direct borrowing of a voiced fricative from a substratal language. Similarly, the -iw of 1zhiw3 need not be from *-uk.

TIRED CHAPTERS OF ACCURACY

In my last post I mentioned the unusual -aq1 ~-iq1 alternation in a disyllabic Tangut verb 'not know':


5077 1817 1my4-1daq1 ~ 5077 5283 1my4-1diq1. 

The first syllable does not seem to occur by itself. Was it originally the same morpheme as

5643 1my4 'not'

which negates auxiliary verbs? Was there originally a d-verb for 'know' that was an auxiliary in a V + 1my4 + d-verb 'not know how to V' construction?

The second syllables occur as glosses for


3020 1ja'3 'accuracy' = left of 3543 1dzwy1 'chapter' + left of 1817 1daq1 'know'

in Mixed Categories. That implies they might also be standalone verbs, though I have not yet seen them used by themselves in texts.

3020 shares its right side 'speech' (derived from Chinese 言 'id.'?) with 1817. (The bottom half of 'speech' has different shapes depending on whether it is on the left or right. Generally the final stroke of a Tangut character cannot point in a northeast direction.)

Its left side

Boxenhorn code: jil

is only in three other characters. It is semantic in 3543 (see above) with 'hand' on the right (why?), semantic and phonetic in

3021 1ja'3 'chapter' (cf. 3543 'chapter' above)

and phonetic in 3523, half of the mirror-image disyllabic words


3523 1688 2ja'3-1gu1 ~ 1gu1-2ja'3 'tired' (cf. 3020 and 3021, both 1ja'3)

I have not seen 3021, 3523, or 1688 outside dictionaries.

(8.25.23:20: Li Fanwen 2008 and Kychanov and Arakawa have glosses for each half of those words, but I don't know how they determined which side meant what:

3523:  L 'skinny, wan and sallow'; K&A 'toil, exhausting labor'

1688: L 'toil', K&A 'toil'

K&A defined 2ja'3-1gu1 as 'toil'; they have no entry for 1gu1-2ja'3.)

I first encountered the right side of 3021 (Boxenhorn code: dim) in 3200 1chhiw4 'six' whose Tangraphic Sea analysis is implausible:


3200 1chhiw4 = 3849 1zhiw3 + 3130 2mer4 'palace' + 1012 1zeq4 'how much'

I doubt that the character for the regular word for 'six' was derived from 3849,  the character for the first half of 'sixth (month)' in the so-called 'ritual language':

3849 5081 1zhiw3 1vi1

And why write 'six' with part of 'palace'?

'How much' is not the worst source of a numeral character component, but its right half doesn't appear in any other numeral characters.

The right side of 3523 and 1688 is semantic:

4675 2rer4 'toil, hard work'.

Work makes one tired.

8.25.20:20: The vague similarity of 4675 to the right side of Chinese 作 '' may be coincidental.

The left side of 1688 from

0678 1gu1 'arise, build'

is phonetic and vaguely similar to the left of Tangut period northwestern Chinese 孤 *1ku1.

WHERE DID *-U GO IN TANGUT?

My last series of posts was about the Tangut verb 'eat' which has two stems:


4517 1dzi3 < *Nɯ-dza and 4547 1dzo4- < *Nɯ-dza-u

The latter was derived from the former via the addition of a third person object suffix *-u. Here is my version of Guillaume Jacques' (2014: 232) table of pre-Tangut *-Ø ~ *-u verb stem alternations and their Tangut reflexes. I have added the first two rows and reorganized the others.

Alternation type | Pre-Tangut | Tangut
u | *Cʌ...-ə | -u1
  | *-ə-u | -u3
i | *Cɯ-...-ej | -e3/4
  | *Cɯ-...-ej-u | -i3/4
y | *Cɯ-...-aC (Cŋ) | -a3/4
  | *Cɯ-...-aC-u | -y3/4
o2 | *(Cʌ)-...-ra | -i2
  | *(Cʌ)-...-ra-u | -o2
o3/4-i | *Cɯ-...-a | -i3/4
  | *Cɯ-...-a-u | -o3/4
o3/4-u | *Cɯ-...-o | -u3/4
  | *Cɯ-...-o-u | -o3/4
o3/4-e | *Cɯ-...-aŋ | -e3/4
  | *Cɯ-...-aŋ-u | -o3/4

Two oddities that jump out at me:

1. Why are there so few alternating verbs with Grade I rhymes? I only know of two such verbs, and the second cannot be reconstructed with *-u:


1338 1dzu1 ~ 4973 1dzu3 'love'


5077 1817 1my4-1daq1 ~ 5077 5283 1my4-1diq1 'not know' (but the first syllable is Grade IV)

2. Why are there no pre-Tangut verbs ending in *-i ~ *-i-u?

I presume pre-Tangut *-u ~ *-u-u stems merged into Tangut -u. Did *-u vanish after higher series vowels (*u, *i, *ə)? Or is the set above an incomplete remnant of what was once a much larger system involving more (or even all) verb stems that was increasingly regularized (i.e., lost its alternations) over time?

8.24.23:13: The table above does not include non-*u alternations such as the one in 'not know'.

I wonder if the i-type and y-type alternations involve ablaut rather than *-u. I can imagine how *-ej-u fused into -i

*-eju > *-iju > *-ju > *-y > -i

though I'm surprised it didn't merge with -ew.
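The chain *-eju > *-iju > *-ju > *-y > -i can be checked mechanically by treating each step as an ordered rewrite of the end of the rhyme. This is only a toy sketch of that one hypothetical chain; the names `STAGES` and `derive` are invented for the example, and the asterisks and hyphens of the notation are left out of the strings.

```python
# Each stage of the hypothetical chain, written as (input, output) pairs
# that are applied in order to the end of the rhyme.
STAGES = [("eju", "iju"), ("iju", "ju"), ("ju", "y"), ("y", "i")]


def derive(rhyme, stages=STAGES):
    """Apply each change in order and return every intermediate form."""
    forms = [rhyme]
    for old, new in stages:
        current = forms[-1]
        if current.endswith(old):
            forms.append(current[: len(current) - len(old)] + new)
    return forms


print(" > ".join(derive("eju")))
```

Because the rules only fire on their own inputs, a rhyme such as `ew` passes through unchanged, which is one way of stating the puzzle about why the two did not merge.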

However, I have more difficulty with deriving -y (phonetically some sort of nonlabial central vowel or diphthong) from *-aC-u. *-C- must have lenited to nothing, and *a and *u might have fused into a vowel that was nonlabial and central like *a but nonlow like *u.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 5)

In parts 3 and 4, I reconstructed the ancestors of the stems of the Tangut verb 'eat'

4517 1dzi3 and 4547 1dzo4-

with a presyllable *Nɯ- whose high vowel conditioned Grades III and IV.

But looking at the word's cognates in the rGyalrongic Languages Database (e.g., Brag bar ka-nə'-dza) made me wonder if the word had two presyllables - and if the high vowel conditioning Grades III and IV belonged to a presyllable preceding a nasal: Unfortunately, almost none of the presyllables preceding (n(V)-)dzV in rGyalrongic languages have high vowels:

a'- ka'-
kə- kwə- tə- nə-

The exceptions are Da tshang towu'nza and Japhug (variety A) tu'-nza; Guillaume Jacques' (2016: 143) Japhug simply lists ndza (the form given for variety B in the database) with the directional prefixes tɤ- and thɯ-. (How many of the presyllables above are directional prefixes?)

I doubt that a single presyllable before *NV- can be reconstructed at the Proto-rGyalrongic level; different varieties apparently added different prefixes to *NV-dza. In some cases, there were two prefixes:

*ka-tV- > Tsho bdun (variety A) kat˺'- (B has kə-)

*kV-pV- or *o-kə- > Brag steng 'Khyung-ri kwə-

*to-pV- > Da tshang towu'-

My hypothetical *pV- is the source of xxx ᴾ-. I don't know how ᴾ- differs from p-. I also don't know what the function of the apostrophe is. Is it a breve? I assume it is not a glottal stop which appears as ʔ in the site's transcription.

Brag bar ka-nə'-dza preserves the syllabicity of the nasal element of 'eat'. Without knowing anything about Brag bar phonology, I do not know whether *nə' can be derived from *Nɯ- with a high vowel.

bZhi lung kaˈ-ᵐtsok˺ points to *m as the specific value of *N. I am guessing that the small ᵐ indicates that ᵐts- is a unit phoneme, and that the presyllable is ka'- rather than *ka'm-.

8.23.23:08: Do I have to posit a high-vowel presyllable to account for the grades of the Tangut forms? Guillaume Jacques derives his Grade III (= my Grades III and IV) from *-j-, and such a medial is in some forms of 'eat' in the rGyalrongic database: e.g., dPa' dbang ndzja (identical to Guillaume's pre-Tangut *ndzja!) and Kha ra kyo ka'-zje. Is this attested -j- primary or secondary? In other words, did languages like Japhug lose *-j- (*ndzja > ndza), or did *a become ja (with or without further fronting and raising: ja > je > i) in some languages? I favor the latter scenario, but I am not absolutely certain, as I do think *-j- did exist in the ancestor of these languages. I just don't know if it existed in this particular word at an early stage.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 4)

As I revised what was supposed to be the last part of my trilogy on

4517 1dzi3 < *Nɯ-dza 'eat'

I realized I had overlooked something obvious. 4517 belongs to a small class of verbs with two stems. The other stem is

4547 1dzo4

with a Grade IV rhyme. Why don't the rhymes of both stems have the same grade? Did 4517 have some additional affix that conditioned Grade III instead of the usual Grade IV after dz-? This second stem never appears by itself; it is combined with person suffixes in 'I eat him/her/it/them' (1SG > 3), 'thou eatest thyself/him/her/it/them' (2SG > 2SG, 2SG > 3), and perhaps 'I eat myself' (1SG > 1SG; forms for unattested subject/object combinations are in parentheses).

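[Table: subject\object paradigm of 'eat' with object columns 1SG, 2SG, 1/2PL, and 3. The cell contents did not survive apart from three rows that each contain the unattested forms (*?) and (*1dzi3-2ni4?).]
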

The o-stem is derived from the i-stem plus a third person object suffix *-u prior to 'brightening' (*a > i):

*Nɯ-dza-u > 1dzo4

If *-u had been added after 'brightening', the o-stem might have been an iw-stem:

*Nɯ-dzi-u > *1dziw4

8.22.23:32: Note that *-u is not present in all verb forms for third person objects. In that respect Tangut differs from northern Qiang in which a cognate suffix -w is consistently in all verb forms for third person objects.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 3)

I conclude my trilogy on

4517 1dzi3 'eat'

by looking into the origin of its reading.

Although it is tempting to assume that Tangut dz- is a direct retention of an earlier voiced obstruent, dz- may have been [ndz] with prenasalization as in Muya ndzɯ³⁵, Japhug ndza, and Naxi ndzɯ³³. That prenasalization in turn may have been from an *m-. (8.21.23:19: See Matisoff 2003: 119 for the reasoning behind his reconstruction of *m.)

As for the rhyme, there is no doubt that it has been 'brightened' to use Matisoff's (2004) term: i.e., *a has raised to i. There are two problems regarding brightening that remain to be solved.

First, why did *a raise to i1 in some cases and i3/i4 in others? According to Guillaume Jacques (2014), the presence or absence of *-j- determines the type of raising:


*Cja > Ci3/Ci4 (Guillaume does not make a distinction between Grades III and IV; he uses Gong's reconstruction in which both grades are characterized by a -j- that is not supported by Tibetan transcriptions.)

But for years I have proposed that presyllabic vowel harmony conditioned the changes leading to Tangut's rich inventory of open-syllable rhymes. I have yet to fully integrate my views with Guillaume's, but here's a first attempt. *Ca by itself or with a lower vowel presyllable became Ci1, whereas *Ca with a higher vowel presyllable became Ci3 or Ci4 depending on the initial:

*(Cʌ-)Ca > Ci1

*Cɯ-Ca > Ci3/Ci4 (normally Pi4, Vi3, Ti4, Ki4, TSi4, CHi3, Qi4, li3, rir4, zi4, zhi3)

e.g., *Nɯ-dza > 1dzi3 (!; see below)
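The prediction made by the presyllable hypothesis can be written out as a small lookup, which makes the problem with 'eat' explicit. This is a sketch under the assumptions stated in the paragraph above; the names `NORMAL_GRADE` and `predicted_rhyme` are invented, and r- is left out because its rhyme is listed as -ir4 rather than plain -i.

```python
# Normal grade of -i after a higher-vowel presyllable, by initial type,
# copied from the list above (Pi4, Vi3, Ti4, ...). r- is omitted.
NORMAL_GRADE = {"P": 4, "V": 3, "T": 4, "K": 4, "TS": 4, "CH": 3,
                "Q": 4, "l": 3, "z": 4, "zh": 3}


def predicted_rhyme(initial_type, high_presyllable):
    """Predict the brightened reflex of *a under the presyllable hypothesis."""
    if not high_presyllable:
        return "i1"                          # *(Cʌ-)Ca > Ci1
    return f"i{NORMAL_GRADE[initial_type]}"  # *Cɯ-Ca > Ci3/Ci4


# 'eat': *Nɯ-dza has a dental sibilant (TS) initial
# and a high presyllable vowel.
predicted = predicted_rhyme("TS", high_presyllable=True)
attested = "i3"  # 4517 1dzi3
print(predicted, attested, predicted == attested)
```

The mismatch between the predicted and attested grades is the anomaly flagged with (!) above.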

Unfortunately I have not found the Sino-Tibetan equivalent of Hittite that preserves the presyllables predicted by my hypothesis.

Second, why did *a become i3 after dz- in 'eat' instead of i4? There is a related word

4513 2dzi4 < *Nɯ-dza-H 'eat, drink, food'


The second half of the fanqie spelling of the Tangut cognate of Wobzi dzí 'eat' is also puzzling:


4517 1dzi3 'eat' = 5051 1naq4 + 0932 1i3 (Mixed Categories of the Tangraphic Sea 4.252)

Normally Class VI initials like dz- only combine with Grade I and IV rhymes: e.g., 1dzi1 and 1dzi4. But 4517 and its homophones

0382 1dzi3 'equal' and 4912 1dzi3 'cut'

have a Grade III rhyme -i3!

There are only ten syllables combining Class VI initials with Grade III rhymes. The other seven are


0524 1dzu3 'admonish, instruct' 

4973 1dzu3 'love'

4977 1dzu3 'father-in-law, uncle' (only in dictionaries)

5121 1dzu3 'dream' (only in dictionaries)

3408 1tsa3 'broil, roast' (only in dictionaries)

3976 2tsha3 (transcription character; only in dictionaries)

3371 1dza'3 'hair worn in a bun or coil; peak'

(8.21.1:54: Is it significant that Class VI initials can only precede four different Grade III rhymes?)

Why is the syllable type TSV3 (Class VI initial + Grade III rhyme) so rare? Was there something 'antialveolar' about Grade III? There certainly was no such quality in Grades I and IV.

Gong demonstrated a strong (but not absolute) correlation between Tangut and Middle Chinese grades. In Middle Chinese, Grade III was less palatal than Grade IV: e.g., in Chinese borrowings in Korean, Grade III -i is sometimes -ŭi, but Grade IV -i is always -i. For years I reconstructed Tangut Grade III -i as [ɨi] and Grade IV -i as [i]. The velarity of the first half of [ɨi] was the 'antialveolar' quality that usually made it incompatible with Class VI initials. But now I write those rhymes algebraically as -i3 and -i4 because I am no longer certain about their phonetic values. If -i3 were really [ɨi], why was

0932 1i3 'many, more, much'

a transcription character for Sanskrit i? Sanskrit has no [ɨi] or even [ɨ]. I would have expected Sanskrit i to be transcribed as 1i4: i.e., as [i].

8.20.23:54: Why can't I just reverse my reconstructions of Tangut i3 and Tangut i4 on the basis of 0932 1i3 transcribing Sanskrit i? Because that goes against what is known about the Chinese grades and because i3 is less common than i4 (88 : 209). It would be strange if i3 [i] were less common than [ɨi].

8.21.1:59: Another solution would be to reconstruct i3 as simple [i] and i4 as [ji]. But i4 was transcribed in Tibetan as i and not as yi. And characters with i4 were used to transcribe Sanskrit syllables with i, not yi.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 1)

After mentioning the Tangut cognate of Wobzi dzí 'eat' in my last entry, I looked up its fanqie and was surprised:


4517 1dzi3 = 5051 1naq4 + 0932 1i3 (Mixed Categories of the Tangraphic Sea 4.252)

Shouldn't those initial and final spellers add up to 1ni3 instead of 1dzi3? I don't know of any Chinese or Tibetan transcriptions of 4517 or its homophones

0382 1dzi3 'equal' and 4912 1dzi3 'cut'

so its reading has to be determined using internal evidence. 4517 is in the section of Mixed Categories for class VI initials: i.e., dental sibilants (ts- tsh- dz- s-). n- is a class III initial. The closest class VI initial to n- is dz- which is voiced and possibly prenasalized: [ndz]. I'm not entirely satisfied with that resolution to the conflict.
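The conflict can be made concrete by adding the two spellers up mechanically. The sketch below assumes that a fanqie takes its initial from the first speller and its tone and rhyme from the second, as in Chinese practice; the `Reading` type and the `fanqie` function are invented for the example, and readings are entered already split into tone, initial, and rhyme to avoid guessing at how to parse the notation.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    tone: str     # "1" or "2"
    initial: str  # "" for a zero initial
    rhyme: str    # rhyme plus grade, e.g. "aq4"

    def __str__(self):
        return f"{self.tone}{self.initial}{self.rhyme}"


def fanqie(initial_speller, final_speller):
    """Combine the initial of the first speller with the tone and rhyme
    of the second (an assumption modelled on Chinese fanqie)."""
    return Reading(final_speller.tone, initial_speller.initial,
                   final_speller.rhyme)


naq = Reading("1", "n", "aq4")  # 5051, the initial speller
i3 = Reading("1", "", "i3")     # 0932, the final speller
print(fanqie(naq, i3))          # the naive sum of the two spellers
```

The naive sum has n-, so reading 4517 with dz- requires something beyond the fanqie itself, such as its placement among the class VI initials.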

Next: The rhyme of 4517.

8.19.22:57: Here is some indirect evidence for reading 4517 with an affricate initial. 4517 is the initial speller for


4855 2dze4 'one of a pair' = 4517 1dzi3 + 4517 2tse4

which is a transcription character for Sanskrit je. Palatals in standard Sanskrit pronunciation correspond to dental affricates in Tangut (as in Tibetan and Chinese). So the initial of 4855 - and its initial speller 4517 - must have been dz- (or something close like [ndz]).

Could someone explain why Sofronov (1968 II: 85) lists the initial speller of 4517 (5051) as part of class VI fanqie chain 12? The fanqie of 5051 has an initial speller which doesn't even have a class VI initial:


5051 1naq4 = 5700 2ni'4 + 5371 1taq4

It has a class III initial n- that can be confirmed by its Tibetan transcription ni (Tai 2008: 208).

RIGHTWARD REDUPLICATION IN WOBZI

In "Triply Awake in Sanskrit", I asked,

Does any language have rightward reduplication?

I should have been more specific and asked about rightward reduplication in verb inflection as opposed to other kinds of rightward reduplication that I'm used to like

Japanese hitobito 'people' (< hito < pitə 'person')

Burmese kàũgàũ 'well' (< kàu 'good')

Thanks to Guillaume Jacques who understood what I was really getting at, I now know that the answer is yes. Lai Yunfan wrote an entire paper about rightward reduplication (RDP) in Wobzi: e.g.,

14a. nú ǽ-ɕʰə-vɣí-n-vɣɑ (< vɣí 'full')

'thou DIR*-Q-full-2SG-RDP'

'Are you full or not? (I don't think you are.)'

14b. ŋó zɑmɑ̀ dz-ɑ́ŋ-dzu gædì. (< dzí 'eat', cognate to Tangut 4517 1dzi3 'id.')

'I meal eat-1SG-RDP after'

'(I will be) after I have finished eating my meal (but I don't really know if I'll finish it).'

Note that the person suffixes go between the root and its reduplication. I never expected that because I am accustomed to Sanskrit which has the structure

RDP-√-INFL


e.g., ād-a (< a- + √ad 'eat')


'I ate' (perfect)

I also asked,

Is there a tendency not to have a reduplicated root right before an inflectional affix?

In other words, is the structure √-RDP-INFL rare or nonexistent? I wasn't even thinking of RDP-INFL-√ when I wrote that.

8.18.22:09: There are six possible sequences of root, reduplication, and a single inflectional affix:


1. √-RDP-INFL

2. √-INFL-RDP (Wobzi)

3. RDP-√-INFL (Sanskrit)

4. RDP-INFL-√

5. INFL-√-RDP

6. INFL-RDP-√ (Sanskrit reduplicated aorists with the augment a- in the INFL slot; does any language have an agreement prefix in that position?)  
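The six orderings are just the permutations of three slots, so they can be enumerated mechanically. In the sketch below (the names `SLOTS` and `sequences` are invented for the example), listing the slots in the order root, reduplication, affix happens to give a numbering in which 2, 3, and 6 match the Wobzi and Sanskrit patterns named above.

```python
from itertools import permutations

# The three slots: root, reduplication, and a single inflectional affix.
SLOTS = ("√", "RDP", "INFL")


def sequences(slots=SLOTS):
    """Return every ordering of the slots as a hyphenated template."""
    return ["-".join(order) for order in permutations(slots)]


for number, template in enumerate(sequences(), start=1):
    print(number, template)
```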

Are 1, 4, 5, and 6 with an agreement prefix attested? Is the distribution of these sequences due to chance? What are the historical implications, if any, of these different orders? For instance, the first person singular and second person suffixes of Wobzi seem to be derived from pronouns.* (You can see both those suffixes and their possible source pronouns in the sentences above.) Did the grammaticalization of pronouns predate reduplication?

*√ > *√-INFL > √-INFL-RDP?

*On the other hand, the first person plural suffix -j does not appear to be related to ŋgə́ɟi 'we'. There is no third person suffix.

HOW FREQUENT ARE REINTERPRETED REDUPLICATIONS IN SANSKRIT?

The short answer: Not very.

For perspective, here are the frequencies of a few basic verbs in the Digital Corpus of Sanskrit:

kṛ 'do': 28926

as 'be': 20762

gam 'go': 13531

dā 'give': 8626

yā 'go' (extended version of the next root): 4166

i 'go': 1014

ad 'eat': 237 (Sanskrit speakers didn't write much about eating, but they must have talked about it. Written language is not a mirror of spoken language in terms of content as well as style.)

Now here are the frequencies of all the reduplicated verbs reinterpreted as simple verbs that I know of and their source verbs:

√jāgṛ 'awake': 155 (see "Triply Awake in Sanskrit" and §641 and §1020a of Whitney's Grammar)

√gṛ: 81

√jakṣ 'eat' (less common synonym of √ad): 69 (see "Tell Me Why They Speak Differently")

√ghas: 0 (the corpus is not comprehensive; see Monier-Williams for citations)

√cakas 'shine': 20 (see §641 and §677 of Whitney's Grammar)

√cakṣ 'tell' (< 'verbally shed light on'?): 12 (see "Tell Me Why They Speak Differently")

both of the above are related to √kāś 'shine' (which only appears once in the DCS without reduplication): 38

√dīdī 'shine': 3 (0 for its later variant √dīdhī; see §641 and §676 of Whitney's Grammar) 

√dī: 0

√daridrā 'run': 0 (see §641 and §1024a of Whitney's Grammar) 

√drā: 0

√vevī 'go': 0 (see §641, §676, and §1024a  of Whitney's Grammar) 

√vī: 284

√pīpī 'swell': 0 (see §676 of Whitney's Grammar) 

√pī: 0
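The counts above are easier to compare when each reinterpreted root is paired with its source. The following sketch only tabulates the figures already given; the names `PAIRS` and `outnumbering` are invented for the example, and the figure of 38 for √kāś is taken at face value even though its note is ambiguous.

```python
# (reinterpreted root, its DCS count, source root, source DCS count),
# copied from the list above.
PAIRS = [
    ("jāgṛ", 155, "gṛ", 81),
    ("jakṣ", 69, "ghas", 0),
    ("cakas", 20, "kāś", 38),
    ("cakṣ", 12, "kāś", 38),
    ("dīdī", 3, "dī", 0),
    ("daridrā", 0, "drā", 0),
    ("vevī", 0, "vī", 284),
    ("pīpī", 0, "pī", 0),
]


def outnumbering(pairs):
    """Return the reinterpreted roots more frequent than their sources."""
    return [new for new, n_new, _, n_old in pairs if n_new > n_old]


print(outnumbering(PAIRS))
# Total tokens of all reinterpreted roots in the corpus.
print(sum(n_new for _, n_new, _, _ in PAIRS))
```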

I now don't feel so bad about having either forgotten or never having learned these verbs because they're so rare. I suppose that if a verb is rare but had reduplicated forms in use (e.g., √kāś above), one might encounter the nonreduplicated forms so infrequently that one might regard the reduplicated forms as basic and build upon them: e.g., with rereduplications like the perfect 3rd sg jajāgāra 'awoke' < √jāgṛ < √gṛ.

8.17.23:40: One problem with that hypothesis is that the reduplicated forms of two of those verbs are rare in the DCS:

√gṛ: only one out of 81 instances is reduplicated

√vī: none of 284 instances is reduplicated

How could jāgr- and vevī- be reanalyzed as simple stems if they were hardly ever used? Or were they common in speech but not in writing - at least until their reanalysis was complete?

TRIPLY AWAKE IN SANSKRIT

I almost mentioned √jāgṛ 'awake' as an example of a reduplicated root reinterpreted as a simple root in "Tell Me Why They Speak Differently", but I decided not to because it has a complication absent from √cakṣ 'tell' and √jakṣ 'eat': a long first vowel in the reduplication. The original root is √gṛ, and if it conjugated like a regular class 3 ṛ-verb such as

√ṛ 'move' > 3rd sg i-yar-ti

√pṛ 'fill' (cognate to its translation) > 3rd sg pi-par-ti

√bhṛ 'bear' (cognate to its translation) > 3rd sg bi-bhar-ti

I would expect the reduplication in the present indicative to be ji-. But it is jā-: e.g., 3rd sg jā-gar-ti (not *ji-gar-ti). Is the length of ā a remnant of a lost laryngeal *H: *gʷeH-gʷorH > jā-gar-? Or has the intensive stem been generalized? (8.16.19:57: The latter. See §676 of Whitney's Grammar. But why generalize the stem from a low-frequency form? I don't see any intensive forms of √kṛ 'do' in the Digital Corpus of Sanskrit. Bucknell's Sanskrit Manual does not even list forms of this type of intensive. [It does list forms of a second type in footnotes.])

In any case, it is strange how a CVCV-verb was reinterpreted as a simple verb when most simple verb roots have the shape (C)V(C)(C). (On the other hand, it is not surprising that √cakṣ 'tell' and √jakṣ 'eat' were reinterpreted as simple verbs since their reduplicated forms do fit the CVCC template for simple verb roots like rakṣ 'protect'.)

The title refers to how the already reduplicated verb can be subject to reduplication: e.g., the perfect 3rd sg jajāgāra with two copies of √gṛ (the earlier perfect was jāgāra with only one copy).

8.16.21:18: All this brought to mind old and new questions:

Old: Why do fifty verbs (i.e., class 3) reduplicate in the present indicative? They do not have any common semantic feature: e.g. (examples from Burrow 2001: 322): jigharti 'sprinkles', piparti 'fills', bibharti 'bears', jigāti 'goes', mimāti 'bellows', śiśāti 'sharpens', siṣakti 'cleaves', dadāti 'gives', dadhāti 'places', jahāti 'leaves', babhasti 'eats', vavartti 'turns', sasasti 'sleeps', saścati 'they accompany'. 'Sharpens' might be considered an inherently repetitive action, but 'gives' does not necessarily entail giving again and again.

On the other hand, the motivation for reduplication is transparent in the intensive "which expresses intensification or repetition (emphasis mine) of the sense expressed by the root" (Burrow 2001: 355). And reduplication in past forms (the perfect and one type of aorist) makes me think the verb was partly repeated to confirm that something had happened. The perfect 3rd sg cakāra and the reduplicating aorist 3rd sg acīkarat 'did' remind me of English did do. (Of course, the weak parallel ends there, as did is used with all English verbs. Dadau 'gave' is not reminiscent of *gave give.)

New: Sanskrit verb reduplication is leftward. Does any language have rightward reduplication? Is there a tendency not to have a reduplicated root right before an inflectional affix? The only inflectional affix in Sanskrit that precedes reduplicated roots is the augment a- (e.g., in the reduplicating aorist 3rd sg acīkarat 'did' above); all other prefixes are derivational: e.g., in the reduplicating aorist 3rd pl ācakrire 'they drove near' < ā 'toward' + √kṛ 'make'.

A FISHY VOWEL IN JAPHUG

I initially thought it was unusual that Japhug only has the vowel /y/ in /qaɟy/ 'fish' and its derivatives. I might expect such an anomalous vowel to be in a loanword, but I don't know of any external source of the word, and I wonder if it is cognate to Tangut

2zhu3 'fish' < *CV-CuH.

Then I realized that I knew of another case of a language with a vowel only in one root and its derivatives: Sanskrit syllabic l̥ is only in √kl̥p 'arrange'; its external cognate Latin corpus has -r-.

And I also realized that Japhug /y/ may be a fusion of /wi/ that became a new phoneme for some speakers. 'Fish' for other speakers is /qaɟwi/ which I assume is the older form.

8.15.22:59: Unfortunately I do not know of any other cases of Japhug /wi/ corresponding to Tangut u. This is not surprising since /wi/ is an infrequent rhyme in Japhug.

Until now I would have been surprised if *wi became Tangut u, because

- Tangut wy is from *wi: e.g.,

1200 1khwy4 'dog', cognate to Japhug khɯna and Written Tibetan khyi < *kwi

- Tangut u is mostly from *o (with at least a couple of exceptions from *ə)

There is no Tangut syllable *zhwy3, so one could claim that *wi became u3 after zh, but that is unlikely since the syllable shwy3 does exist, and it is probable that vowels developed the same way after sh and zh which only differ in voicing.

Even if I don't understand the sound correspondences involved, I don't think it is a coincidence that Macro-rGyalrongic languages have z-words for 'fish'. The vowels of those words are all over the place. STEDT lists the Proto-Qiangic reconstruction *r-dzwa. I am skeptical because such a form might have become Tangut *tswar, not 2zhu3. But I am unable to propose an alternative.

TELL ME WHY THEY SPEAK DIFFERENTLY

I went through the list of 433 verbs in Bucknell's Sanskrit Manual and found three more cases of suppletion. (See my last post for √han ~ √vadh 'kill'.)

√cakṣ 'tell'

the future, desiderative, aorist, absolutive, and perfect participle are formed from √khyā

√paś 'see' (with a rare variant spaś which can mean 'spy' as a noun; cognate to spy)

all non-present indicative forms are formed from √dr̥ś which has no present indicative of its own

√vac 'speak' (cognate to vocal)

the present indicative 3rd pl vadanti is formed from √vad

I learned about √paś 'see' in my beginning Sanskrit class at Berkeley. But the other two are news to me.

I never heard of the root cakṣ - which is not surprising since it only occurs twelve times in the Digital Corpus of Sanskrit. I wouldn't expect suppletion in a word so infrequent that its irregular alternations would be hard to remember. But at least I can guess the reason for the suppletion: √cakṣ is a reduplication of √kāś that was reinterpreted as a separate verb*. It originally must have lacked forms without reduplication, and hence those gaps were filled by forms of khyā. However, some gaps were later filled by forms of √cakṣ: e.g., the causative 3rd sg cakṣayati (the causative of nonreduplicated √kāś is kāśayati).

The Sanskrit Grammarian does list the present indicative 3rd pl vacanti, but it's not in the DCS.

√vac is ten times more common than √vad in the DCS. Why replace vacanti with vadanti?

*8.14.23:10: √kāś seems to be to √cakṣ what

- cakar- is to kr̥ 'do' (c-reduplication of a k-root; the palatalization of *kʷ to c reflects an earlier lost front vowel: *kʷe-kʷ- > ca-k-)

- dad- is to dā 'give' (loss of ā in a root preceded by its reduplication)

A similar reduplication that became an independent verb is √jakṣ 'eat' from √ghas 'id.' j is to g(h) what c is to k. gh devoices and deaspirates before s (which becomes ṣ after k).

I just noticed that khyā has no early attestations. It looks like a borrowing from a Middle Indic descendant of √kśā 'tell' which looks like a zero-grade variant of √kāś plus an ā-suffix like yā 'go' < √i 'id.' According to Monier-Williams, "√kśā is mentioned as forming some tenses of √khyā and √cakṣ". Since roots are just artificial abstractions, one could reformulate that by saying there is just one verb with three types of stems: early ā-extended (√kśā), late ā-extended (√khyā), and reduplicated (√cakṣ). Or four if forms like kāśate (√kāś) are included.

WHAT IS SO APPEALING ABOUT SUPPLETION?

I don't know.

I was looking through Bucknell's Sanskrit Manual whose entry for √han 'kill' has a 3rd sg. aorist avadhīt from an unrelated root √vadh. It is a bit as if the simple past of kill were slew. Bucknell lists other past forms for √han: e.g., the imperfect ahan and the perfect jaghāna. Why not form an aorist from the same root? Whitney's Sanskrit Roots in fact does list four types of such aorists:

1. aghāni passive 3rd sg. ("prescribed or authorized by the Hindu grammarians" but otherwise unattested)

(no type 2 aorist ahanat 3rd sg. active in Whitney - but see below!)

3. ajīghanat 3rd sg. active (from the epics onward)

4. ghān? 3rd sg. active (in the Sūtras); ahasta 2nd pl. active (theoretical)

5. ahānīt 3rd (Jaiminīya-Brāhmaṇa)

(no type 6 aorist *haṃsīt 3rd sg. active)

(no type 7 aorist *haṃsat 3rd sg. active; this type is impossible with -an roots)

Conversely, Bucknell lists the 3rd sg. present active of √vadh as ... hanti from √han 'kill'! Yet Whitney lists present system forms for √vadh:

vadha? imperative 2

vadheyam optative 1 sg. active (Atharva-Veda), vadhet optative 3 sg. active (Vājasaneyi-Saṃhitā)

The key word is 'system'; the imperative and optative are part of that system, but the most basic part - the present indicative whose 3rd sg. should be vadhati - does not exist in Whitney. (But see below!)

Why were the paradigms of √han and √vadh intertwined? Many forms of linguistic change involve simplification. Suppletion, however, is complication (except from the point of view of counting the total number of forms). I can understand the appeal of analogy - making forms similar makes them easier to generate and recognize. But suppletion has the opposite effect. What is the payoff of replacing vadhati with hanti?

The Wikipedia article on suppletion does not mention a general theory of the origins of the phenomenon, though it is full of Indo-European examples (excluding Sanskrit!) and links to the Surrey Suppletion Database (with non-IE examples but still no Sanskrit). But it did lead me to Greville G. Corbett. I'll be looking at his works soon.

8.13.11:13: The Digital Corpus of Sanskrit lists type 2 aorist forms such as 3rd sg. active ahanat which occurs 23 times in the Mahābhārata alone.

And the DCS lists present indicative forms of √vadh of two different classes: e.g.,

1: vadhate 3rd sg. middle (once in Rāmāyaṇa)

4: vadhyati 3rd sg. active (twice in Mahābhārata)

Monier-Williams lists the class 1 3rd sg. active vadhati.

So there is textual evidence for fuller paradigms of both verbs. I don't fault Whitney for not including these forms. I cannot imagine the effort needed to make a compilation like his in 1885 without using electronic corpora. Whitney was aware that his book could not be comprehensive (p. vi):

As a matter of course, no such work as the present can pretend to completeness, especially at its first appearance. The only important texts of which we have exhaustive verbal indexes are the Rig-Veda and the Atharva-Veda [...] But I trust it will be found that the measure of completeness here attained is in general proportioned to the importance of the material: that it is the more indifferent forms and derivatives which, having being passed over by the Lexicon, have escaped my glossing also.

I hope to see an exhaustive verbal index of the Tangut corpus someday.

A similar index for the Khitan corpus would also be nice - and difficult since we aren't always certain what is and isn't a verb: e.g., I am not confident that

324-090-262 <yên.ó.ui> (蕭敵魯17.22)

from earlier this week is a verb. The final character

262 <ui>

is a known converb suffix, but not all final instances of it are converbs: e.g.,

334-262 <g.ui> < Liao Chinese 國 *kuj

is a noun 'country'.

Before I conclude, I should point out that the notions of 'root' and 'paradigm' are abstractions. Bucknell (1994: xv) reminds us that roots are "handy labels artificially derived from the actually occurring verb (and noun) forms." Sanskrit students are taught to derive forms from roots, but in reality grammarians derived roots from forms. Similarly, grammarians draw up paradigms based on forms. Sanskrit speakers did not have grids in their heads mixing hanti and avadhīt. How many English speakers are aware that is, was, and be have been grouped together into the same paradigm? On the one hand, discussions of suppletion can be said to be about fictions. On the other hand, it is a fact that unrelated forms can be used in semantically similar contexts - and can completely replace an expected related form. But why? Why do speakers create such complications for themselves? What's the payoff?

HOW DID THAI BORROW A WORD FOR 'SCORPION' FROM KHMER?

In my last entry, I mentioned Thai ขตอย [kʰà tɔːj] 'scorpion', a borrowing from Khmer. Normally I expect Khmer-Thai words to be spelled more or less as in Khmer. Hence I would predict that the Khmer word corresponding to ขตอย <khtʔy> is *​ខ្តយ <khtay> [kʰtɑːj] < *kʰtɔːj. But in fact the standard Khmer word for 'scorpion' is ខ្ទួយ <khduəy> [kʰtuəj] < *kʰduəj which should correspond to a hypothetical Thai *ขทวย <khdvy> [kʰà tʰuəj]. How do I account for this mismatch?

The mismatch of Thai [t] and Khmer [t] < *d (assuming the spelling is etymological*) is easy to explain. The Thai borrowing must postdate devoicing in Khmer:

Before devoicing in both languages, Khmer *d would have been borrowed as Thai *d which became Thai [tʰ].

If Thai devoiced before Khmer, Khmer *d would have been borrowed as Thai which became Thai [d].

If Khmer devoiced before Thai, Khmer *d would have become [t] which would have been borrowed as Thai [t].

After devoicing in both languages, Khmer *d would have become [t] which would have been borrowed as Thai [t].
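The four scenarios can be laid out in a few lines of Python. This is only a bookkeeping sketch: the scenario labels and the function name are my own, and the outcomes are just the ones stated above.

```python
# Thai reflex of Khmer *d under each relative chronology of devoicing,
# as stated in the four scenarios above (labels are illustrative only).
OUTCOMES = {
    "before devoicing in both languages": "tʰ",
    "Thai devoiced first": "d",
    "Khmer devoiced first": "t",
    "after devoicing in both languages": "t",
}


def compatible_scenarios(attested):
    """Return the scenarios whose predicted Thai reflex matches the attested one."""
    return [name for name, reflex in OUTCOMES.items() if reflex == attested]


# Thai 'scorpion' has [t], so only scenarios after Khmer devoicing survive.
print(compatible_scenarios("t"))
```

Only the last two scenarios predict Thai [t], which is why the borrowing must postdate devoicing in Khmer.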

On the other hand, the mismatch of Thai [ɔː] and Khmer [uə] is puzzling. Jenner and Pou (1980-81) regarded Khmer [uə] as phonemically unchanged from the Old Khmer period (when it was written <va>; the vowel symbol <uə> was created later). Does Thai reflect a pronunciation of /uə/ as something like [wɔ] in some premodern Khmer dialect?

Phonological issues aside, why would the Thai borrow a word for 'scorpion' from Khmer at a late (i.e., post-devoicing) period when Khmer was no longer a prestigious language?

*8.12.2:22: I can't find any word for 'scorpion' in Jenner's Old Khmer dictionary. So I don't know how old the spelling ខ្ទួយ <khduəy> is. In theory [kʰtuəj] could be from either *kʰduəj or *kʰtuəj. I can only find one ខ្ត- <kht-> [kʰt-] word in the SEAlang Khmer dictionary; ខ្ទួ- <khd-> is a far more common spelling of [kʰt-]. If 'scorpion' was from *kʰtuəj and was not written until after devoicing, was it written as ខ្ទួយ <khduəy> by analogy with the majority of [kʰt]-words?

TRANSPARENCY, OPACITY, AND HARMONY IN KHITAN

When I started to look through the small script block index in Wu and Janhunen (2010) to find all the instances of small script character 342

my eyes halted at the sight of these two blocks:

006-140 <MOUNTAIN.en> 'mountain*-GEN' (epitaph for 蕭敵魯 Xiao Dilu [1061-?]* 39.12, 1114, and epitaph for 耶律詳穩 Yelü Xiangwen [1010–1091] 37.2, 1091)

and 006-151 <MOUNTAIN.ghu> (耶律詳穩 4.4; perhaps a personal name [Wu & Janhunen 2010: 145])


140 <en> and 151 <ghu> belong to opposite harmonic categories; the former is yin and the latter is yang.

If Khitan had simple vowel harmony - and it obviously doesn't - 006 <MOUNTAIN> (reading unknown) would only combine with yin or yang characters, not both.

006 seems to mostly combine with yang characters, so it probably represented a yang word. Could that word have been a cognate of Written Mongolian aghulan? There's no way to tell. There is no Khitan-internal evidence for any reading aside from the harmonic hints provided by other characters in its blocks.

If 006 is yang, what is yin 140 <en> doing after it?

I think there are two or three types of Khitan suffixes:

- those that are invariable: e.g., accusative-instrumental <er> and perfective /lUn/

- those that sometimes harmonize with the preceding stem: e.g., genitive <en>

- those that always harmonize with the preceding stem

The third type may not really exist.

Kane (2009: 132-135) lists six types of /n/-genitive suffixes:

<an> <in> <on> <un> <n> <en>

The first four generally appear after stems sharing their vowels. The genitives of <e>-final stems end in <n> rather than <en>. <en> is for consonant-final stems without any regard for vowels: e.g.,

<ta.ang.en> 'Tang dynasty-GEN' (郎君 2.5)

combines a stem with a yang vowel with the yin suffix <en>.

The only (?) consonant stems that do not take <en> might be those ending in <ong> which take <on>: e.g.


071-154 <ong.on> 'prince-GEN'

What's going on here? In general, Khitan consonant codas are 'opaque' to harmonic assimilation with the exception of /ŋ/ which is 'transparent' to labial harmonization.
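The distribution just described can be restated as a rough Python sketch. It assumes (my reading, not something Kane states in these terms) that the first four suffixes attach to stems ending in the matching plain vowel; the function name and the representation of a stem as a list of transliterated characters are hypothetical, and diacritic vowels such as <ó> are not handled.

```python
def genitive_suffix(stem):
    """Pick an /n/-genitive for a stem given as a list of transliterated characters.

    A simplification of the description above; real spellings have exceptions.
    """
    last = stem[-1]
    if last.endswith("ong"):   # /ng/ after <o> lets labial harmony through
        return "on"
    final = last[-1]
    if final == "e":           # <e>-final stems take bare <n>
        return "n"
    if final in "aiou":        # assumed: vowel-final stems copy their vowel
        return final + "n"
    return "en"                # all other consonant-final stems


print(genitive_suffix(["ta", "ang"]))  # en
print(genitive_suffix(["ong"]))        # on
```

The only branch that looks inside the stem for a consonant-final word is the <ong> one, which is the 'transparency' of /ŋ/ in miniature.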

Was Khitan /ŋ/ like Japanese -n which despite its romanization may be perceived by non-Japanese as ŋ and even has vocalic allophones: e.g., 本を /hoN o/ 'book ACC' is [hõũo] (Vance 1987: 36)? If <ong.on> was [õũon], the vowel [ũ] would not be as opaque as a consonant; it would let the labiality of the preceding vowel pass 'through' into the suffix but not its height (which is why <ta.ang.en> above is not *<ta.ang.an> with a low-vowel stem and suffix).

Examples of 'opacity' and 'transparency' in Asian phonology:

In Sanskrit Kṛṣṇena 'by Krishna', retroflexion spreads from ṛ to the adjacent sibilant and nasal but not to the n of the final syllable. Paradoxically, although ṣṇ became retroflex, they are 'opaque' and prevent retroflexion from spreading to the n of the instrumental suffix.

On the other hand, m in Rāmeṇa 'by Rama' is 'transparent', and retroflexion spreads 'through it' and into the n of the instrumental suffix.

In Khmer, vowels developed differently after voiceless and voiced consonants. *kʰɛr would normally become [kʰae] and *mɛr would normally become [mɛ]. Yet *kʰmɛr 'Khmer' became ខ្មែរ [kʰmae] rather than *[kʰmɛ] because *m was 'transparent' and allowed *kʰ to condition the warping of the vowel.

On the other hand, *ɟ was 'opaque', so *kʰɟɛŋ 'move apart' became [kʰcɛːŋ] rather than *[kʰcaeŋ] with a warped vowel.

In Thai, tones developed differently after aspirated, unaspirated, and voiced consonants. *kʰɛːn would normally become [kʰɛ̌ːn] with a rising tone and *mɛːn would normally become [mɛːn] with a mid tone. Yet *kʰamɛːn 'Khmer' became เขมร [kʰà mɛ̌ːn] with a rising tone rather than *[kʰà mɛːn] with a mid tone because *m was 'transparent' and allowed *kʰ to condition the rising tone on the stressed second syllable. (Unstressed short syllables with voiceless initials developed low tones.)

On the other hand, [t] in ขตอย [kʰà tɔːj] 'scorpion' (also a loan from Khmer) is 'opaque', so the second syllable has a mid tone rather than a rising tone.
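The Sanskrit pair above can be reproduced with a small Python sketch of the n-retroflexion rule. It is a simplification under stated assumptions: lowercase IAST input, the whole word treated as one domain (no compound boundaries), and character-by-character processing. The names retroflex_n, TRIGGERS, TRANSPARENT, and FOLLOWERS are mine.

```python
import unicodedata


def _chars(s):
    """Set of precomposed (NFC) characters in s."""
    return set(unicodedata.normalize("NFC", s))


TRIGGERS = _chars("rṛṝṣ")
VOWELS = _chars("aāiīuūṛṝḷeo")
# Sounds that let retroflexion pass: vowels, velars, labials, y, v, h, ṃ.
TRANSPARENT = VOWELS | _chars("kgṅpbmyvhṃ")
# n only changes before a vowel or n, m, y, v.
FOLLOWERS = VOWELS | _chars("nmyv")


def retroflex_n(word):
    """Change n to ṇ after a trigger unless an opaque sound intervenes."""
    word = unicodedata.normalize("NFC", word)
    retro_n = unicodedata.normalize("NFC", "ṇ")
    out = []
    active = False
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "n" and active and nxt in FOLLOWERS:
            ch = retro_n
        if ch in TRIGGERS:
            active = True
        elif ch not in TRANSPARENT:
            active = False  # opaque sound (including a new ṇ) stops the spread
        out.append(ch)
    return "".join(out)


print(retroflex_n("rāmena"))   # rāmeṇa: m is transparent
print(retroflex_n("kṛṣnena"))  # kṛṣṇena: the first ṇ blocks the second n
```

Aspirates written as digraphs still behave correctly here because h is in the transparent set, so th is blocked by its t and kh is not blocked at all.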

*8.11.2:39: Kane (2009: 36) gives 'tomb' and 'tomb cut into a mountain' as alternative glosses for 006 <MOUNTAIN>.

KHITAN <UL.ÚN> (PART 1)

In my last post, I mentioned the Khitan verb

244-076-261-090-366-144 <s.gho.l.ó.ul.ún> (epitaph for 蕭敵魯 Xiao Dilu [1061-?]* 17.24, 1114)

containing a yang character <gho> and ending in the perfective suffix <ul.ún>. I proposed that <ul.ún> might be invariable: i.e., that it didn't have allomorphs like its rough equivalents in its sister Mongolian and Manchu, the descendant of its neighbor Jurchen:

Mo yabu-lugha 'has gone' (allomorph after yang stems)

Mo ükü-lüge 'has died' (allomorph after yin stems)

Janhunen (2003: 24) reconstructed this ending (his 'confirmative [praesens perfecti]') as Proto-Mongolic *-lUxA. Do this ending and the Khitan ending go back to a Proto-Khitan-Mongolic *lU with different suffixes? The Khitan suffix added to *lU might be the nominalizer or participle suffix (?) <ún> after the causative/passive <l.ge> in

254-257-261-349-144 <ún> 'awarded' (?) (蕭令公 14.1; see below for more forms of this verb)

Ma susa-ha 'died' (allomorph after yang non-o stems)

Ma gene-he 'went' (allomorph after yin stems)

Ma o-ho 'became' (allomorph after yang o-stems)
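For contrast with an invariable suffix, here is a toy Python sketch of the harmonizing Mongolian and Manchu endings. It covers only the five examples above; the vowel classes and function names are my simplifications, not full statements of either language's harmony.

```python
import unicodedata


def mongolian_confirmative(stem):
    """-lugha after stems with yang (back) vowels, -lüge otherwise (toy rule)."""
    stem = unicodedata.normalize("NFC", stem)
    suffix = "-lugha" if any(v in stem for v in "aou") else "-lüge"
    return unicodedata.normalize("NFC", stem + suffix)


def manchu_past(stem):
    """-he after yin stems, -ho after o-stems, -ha elsewhere (toy rule)."""
    vowels = [c for c in stem if c in "aeiou"]
    if "e" in vowels:
        return stem + "-he"
    if vowels and all(v == "o" for v in vowels):
        return stem + "-ho"
    return stem + "-ha"


for stem in ("yabu", "ükü"):
    print(mongolian_confirmative(stem))
for stem in ("susa", "gene", "o"):
    print(manchu_past(stem))
```

An invariable suffix would need no function at all: it would be a constant appended to every stem.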

I was certainly partly wrong - a price of writing in haste - as the suffix is actually /lUn/ (I use a capital letter to indicate uncertainty about the vowel), and the preceding vowel, if any, depends on the preceding stem, as I will demonstrate in a later part. But so far I do think <ún> is invariable. Here it is following the yin passive suffix <l.ge>:

247-257-261-349-261-144 <ún> (Kane 2009: 146; source not given) ~ 254-257-261-349-261-144 <ún> (蕭仲恭 8.16) 'was awarded'

It seems that Khitan vowel harmony might be obligatory within a root but is only obligatory in certain suffixes - and /lUn/ is not one of them. If Khitan had stress (a detail not indicated in its scripts), roots might have had primary and secondary (and even tertiary?) stress whereas suffixes were unstressed and sometimes may have contained neutral vowels that by convention were written with yin characters. (So far I haven't seen any invariable suffixes with yang characters.)

8.10.2:25: The invariable Manchu accusative suffix be [bə] is a merger of Jurchen


<ba>, <be>, and <bo>.

Invariable Khitan suffixes may be the results of similar mergers: e.g.,

<er> [ər] (accusative-instrumental) < *ar, *ər, *or?

<lún> (perfective) < *lʊn, *lun?

The merger of vowels into schwa is of course not limited to harmonic 'Altaic' languages. English is full of unstressed schwas with multiple origins.

WHAT IS KHITAN SMALL SCRIPT CHARACTER 342 DOING IN NATIVE WORDS? (PART 3)

Having examined all instances of 342 in Qidan xiaozi yanjiu, I moved on to Wu and Janhunen 2010 which has one word beginning with 324:

324-090-262 <yên.ó.ui> (epitaph for 蕭敵魯 Xiao Dilu [1061-?]* 17.22, 1114)

W&J (2010: 95) transliterated this as <üen.ó.ui> and identified it as being in the middle of a sentence, but did not go further than glossing the finite past tense (perfective in my view) suffix <ul.ún> in the sentence-final verb

244-076-261-090-366-144 <s.gho.l.ó.ul.ún>:

The overall meaning of this and the preceding section remains obscure.

W&J 2010 is full of variants of that sentence. At this point it is often simply impossible to do much more with Khitan than to spot finite verb endings and use them to divide unpunctuated text into sentences.

Using my simplistic yin/yang test and assuming that 076 <gho> is yang, I tentatively regard all other characters in the verb above to be either yang or neutral.

I hypothesize that <ul.ún> could be an invariable suffix like the accusative/instrumental suffix <er>.


I will provide supporting evidence in my next post.

There is no doubt that 244 <s> and 261 <l> are neutral since they combine with both yin and yang characters. Here they are with the yin characters <g> and <ge>:

244-144-334-261-349-144 <s.ún.g.l.ge.ún> (興宗 20.16, 蕭令公 11.18)

144 <ún> is presumably in the root <s.ún.g> of that verb (<l.ge> is a passive/causative suffix), so I guess that it might be inherently yin in roots (since <g> is yin), though it could appear with roots of all types as part of the suffix <ul.ún>.

Going back to the word with 324, perhaps it was pronounced something like [jɛnɔwi] or [jɛnɔ(ː)j]. I am not sure how to interpret the sequence <ó.ui>. The final [i] or [j] may be a converb suffix. If there is a [w], it may be from a lenited *-b-: cf.

<tau> 'five'

which is cognate to Proto-Mongolic *tabu/n.

8.9.0:26: Janhunen (2003: 6, 397) does not reconstruct *w for either Proto-Mongolic or pre-Proto-Mongolic. Although Khitan is a para-Mongolic language - a sister to the Mongolic languages - there is no guarantee that it too lacked an original [w]. Nonetheless for now I hypothesize that all instances of [w] in Khitan are either in loanwords or transcriptions such as

070-131 <w.u> < Liao Chinese 武 *wu 'martial'

or are secondary: e.g., from *b or in intervocalic hiatus.

*Not to be confused with an earlier, more famous 蕭敵魯 Xiao Dilu (879?-918) who died almost a century earlier.

WHAT IS KHITAN SMALL SCRIPT CHARACTER 342 DOING IN NATIVE WORDS? (PART 2)

I thought this series of posts might have five or so parts, but there may be as few as three. I went through all the instances of small script character 342

in the corpus in Qidan xiaozi yanjiu and only found two instances of it in non-Chinese (and hence possibly native) words that I didn't list in part 1. One is 342 by itself; another is

324-335-084 <yên.ya.ar> (or <yên.ya.ra>?)

which raises the following questions:

1. Did Khitan distinguish between /ɲa/ and /nja/? Was <yên.ya> phonemically /jeɲa/ or /jenja/? Could it also have been written with

222 <ń>?

2. It seems that at least some Khitan CV characters can also double as VC characters (cf. 𐰹 <oq/uq/qo/qu> and 𐰜 <ök/ük/kö/kü> in Old Turkic). Is the final character 084 <ar> or <ra>? If 084 was <ar>, how did it differ from

123 <ar>?

123 <ar> can be a perfective ending. If 084 is also <ar>, could it too be a perfective ending for a verb whose subject may be the immediately preceding phrase

085 131-236 133-118 <SIX m.qú> '? [of] the six divisions'?

8.8.0:33: Could 133-118 <m.qú> be a shorter spelling of

133-253-118 <m.o.qú> 'first' (itself a derivative of

133-186 <m.o> 'big, great' [m.])?

If so, then maybe <m.qú> is an adjective modifying a noun <yên.ya.ar>/<yên.ya.ra>, and the final <ar>/<ra> is not a perfective suffix.

WHAT IS KHITAN SMALL SCRIPT CHARACTER 342 DOING IN NATIVE WORDS? (PART 1)

Until two days ago, I assumed that Khitan small script character 342,

variously read or transcribed as

[en] (Chinggeltei et al. 1985: 99; Chinggeltei 2002: 29; Chinggeltei 2010: 421)

[ən] (Aisin Gioro 1999)

[æn] (Aisin Gioro 2004)

<iên> (Kane 2009: 63)

<üan> (Kane 2009: 305)

<üen> (Wu and Janhunen 2010: 268)

was only in Chinese borrowings.

But when I was looking for combinations of

033 <is> 'nine'

with 'yang' characters for my last post, I found this set of seven blocks combining 324 with the yang character 151 <ghu> which never appears in Chinese borrowings:

Block  Character numbers  Transliteration  Attestations
1  151-324  <ghu.?n>  興宗 34.13, 蕭仲恭 30.49
2  151-324-033-090  <ghu.?ó>  許王 21.2
3  151-324-033-090-097  <ghu.?ó.úr>  道宗 26.19, 蕭仲恭 40.31
4  151-324-033-261-051-122  <ghu.?>  道宗 6.6
5  151-324-033-261-051-189-098  <ghu.?>  道宗 24.18
6  151-324-033-261-051-290  <ghu.?án>  仁懿 18.11, 蕭令公 5.7
7  151-324-033-311  <ghu.?>  許王 18.31, 許王 22.10

If 324 can be in the same block as <ghu> and <gha>, it is either yang or neutral. Which is a surprise if it was <üan> or <üen> because the combination of gh plus ü is alien to Mongolian (and Turkic). I think its vowel may have been [ɛ] which is the same height as the yang vowel [ɔ]. Could <ó> in the second and third blocks be [ɔ] - and úr in the third block be [ʊr] with the yang vowel [ʊ]?

Was Khitan [ɛ] a yang vowel like the nonhigh front vowel *e, which I've regarded as the Old Korean yang counterpart of originally yin (later neutral) *i?

I already tentatively read

073 <ên>

as [ɛn], and 324 doesn't look like a variant of 073 (i.e., a character with an identical reading), so I think the reading of 324 must have been slightly different: e.g., [jɛn]. So perhaps the seven blocks above were read as something like

1. [ʁʊjɛn]

2. [ʁʊjɛnɪsɔ]

3. [ʁʊjɛnɪsɔʊr] or [ʁʊjɛnɪsɔːr]?

4. [ʁʊjɛnɪs(ɔ)lʁɑ(ː)j]

5. [ʁʊjɛnɪs(ɔ)lʁɑ(ː)l]

6. [ʁʊjɛnɪs(ɔ)lʁɑ(ː)n]

7. [ʁʊjɛnɪs(ɔ)b]

8.6.23:36: Do all those words share a root [ʁʊjɛn]? Are [ɪsɔ] and [b] heretofore unknown (derivational?) suffixes? Could 3 have a masculine perfective suffix /ɔr/? Are the [lʁɑ]-forms passives or causatives followed by converbs [ɑj], [ɑn], and the feminine perfective [ɑn]?

I have projected the vowel [ɔ] from 2 and 3 into what would otherwise be consonant clusters in 4-7.

I don't know if graphic double or triple vowels in 4-6 represented long vowels.

<IS> 'NINE' NEUTRAL?

In "Emperor Nine", I proposed that Khitan <i> was phonemically /i/ with a lowered allophone [ɪ] in words with 'lower' series vowels: [ɑɔʊ].

Since vowel series correlate with degrees of consonant backness (higher : velar and lower : uvular) and possibly pharyngealization (as in modern Khalkha [see Janhunen 2003] and my Old Chinese reconstruction, though this detail is difficult to reconstruct), I will speak of 'yin' and 'yang' syllables

Syllable type Pharyngealization Velar or uvular? Vowel height Examples
Yin No Velar Higher [tə kə gə xə]
Yang Yes? Uvular Lower [t(ˁ?)ɑ qɑ ʁɑ χɑ]

to emphasize that whole syllables - not just segments - belong to one class or the other. And as many Khitan small script characters represent syllables, I will speak of 'yin characters' (e.g., <k g ge>) and 'yang characters' (e.g., <qa gha xa>).


<is> 'nine'

Is this <is> 'nine' from "Emperor Nine" a yin character or a yang character? Its Proto-Mongolic cognate *yersü/n 'nine' has 'feminine' (i.e., yin) vowels, but that does not guarantee that it too is yin.

I propose a simple (and hence probably not foolproof) test of yin/yang classification:

- A character is probably yin if it appears in a block with the yin characters par excellence

<k g ge>

- A character is probably yang if it appears in a block with the yang characters par excellence

<qa gha>

- If a character appears in a block with both types of characters, it is neutral.
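The test is mechanical enough to sketch in Python. This assumes each block is available as a list of transliterated characters and that only native (non-Chinese) blocks are passed in; the function name classify and the "unknown" fallback are my additions. The sample blocks are 033-334-144 and 073-033-051-123 from this post, with character values taken from elsewhere on this page.

```python
YIN = {"k", "g", "ge"}   # yin characters par excellence
YANG = {"qa", "gha"}     # yang characters par excellence


def classify(char, blocks):
    """Classify a character by the company it keeps within blocks."""
    with_yin = False
    with_yang = False
    for block in blocks:
        if char not in block:
            continue
        others = set(block) - {char}
        with_yin = with_yin or bool(others & YIN)
        with_yang = with_yang or bool(others & YANG)
    if with_yin and with_yang:
        return "neutral"
    if with_yin:
        return "yin"
    if with_yang:
        return "yang"
    return "unknown"     # no diagnostic block available


blocks = [
    ["is", "g", "ún"],          # 033-334-144
    ["ên", "is", "gha", "ar"],  # 073-033-051-123
]
print(classify("is", blocks))   # neutral
```

With so few blocks the verdicts for the other characters are only as good as the sample: a character that looks yin or yang here may turn out to be neutral once more blocks are added.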

Note that cooccurrence with the invariable accusative/instrumental suffix <er> is not evidence for classifying a character as yin. The yang characters par excellence <qa> and <gha> appear before the suffix <er> in

<> 'qaghan-PL-ACC' (found in Kane 2009: 132 which does not specify the source; not in the texts in Qidan xiaozi yanjiu)

but that does not mean that <qa> and <gha> are yin or neutral because <er> has no allomorphs; it appears after syllables of all types. (Contrast that <er> with its homophone, the perfective suffix which has <er> after yin stems, <ar> after yang stems, and <or> after <o>-stems [yin/yang to be determined].)

Also note that this test only applies to words not borrowed from a language without harmony (e.g., Chinese). Chinese transcription combinations like

<>, a transcription of Chinese 開 *kʰaj 'open'

contain velar-lower vowel (i.e., yin consonant-yang vowel) sequences that I wouldn't expect in native Khitan words (or words borrowed from languages with harmony: e.g., <qa.gha> 'qaghan' which might be borrowed from Turkic).

I regard <is> as neutral because it appears with both yin and yang characters:

033-334-144 <is.g.ún> [isgun] (a title or name transcribed in Chinese as 乙室謹 *iʂikin; 蕭仲恭 1.10, 50.17); <g> is yin

The Chinese transcription may indicate that /s/ was palatalized to [ɕ] after [i].

The mismatch of 144 <ún> with Chinese *-in here and with -an in foreign renditions of <qid.ún> 'Khitan' is unexplained; the interchangeability of 311-144 <b.ún> with 288 <bun> (transcribed in Chinese as 本 *pun) points to an u-vowel (Kane 2009: 52).

073-033-051-123 <ên.is.gha.ar> [ɛnɪsɣɑr] '?' (興宗 31.14); <gha> is yang, and presumably <ên> is either yang or neutral

8.5.1:50: I am guessing that /s/ did not palatalize after [ɪ]. Cf. Ukrainian which has [sʲi] and [sɪ] but not *[sʲɪ].

I am now skeptical about pharyngealized consonants in Khitan. If yang syllables had pharyngealized segments, then <ên.is.gha.ar> would have been pronounced [ɛˁnˁɪˁsˁɣˁɑˁrˁ], and the character <is> would have stood for both [is] and [ɪˁsˁ]. Although Middle Old Chinese phonetic elements could represent both pharyngealized and plain syllables, that flexibility was due to sound changes that complicated an earlier, simpler system in Early Old Chinese: e.g., the phonetic series 壹 GSR 395 (Schuessler 29-13):

Sinograph Early Old Chinese: 壹 for *...ʔit(s) syllables Middle Old Chinese: 壹 for *ʔit(s) and *ʔˁiˁtˁ(sˁ) syllables
㦤懿饐 *ʔits *ʔits
曀㙪殪 *Cʌ-ʔits *ʔˁiˁtˁsˁ

The Khitan small script had no long history (though the possibility of changes in spelling over time should be explored), and so I assume the fit between sound and symbol was fairly close during its three centuries of use.

EMPEROR NINE

If my hypotheses about Khitan consonant/vowel harmony are correct, was the first syllable of Chinese *xɔŋti 'emperor', written as


in the Khitan large and small scripts, [χɔŋ] with a uvular and lower series vowel or [xoŋ] with a velar and a higher series vowel? Or was it [xɔŋ] as in Chinese with an un-Khitan velar plus lower series vowel sequence?

I don't know for sure, which is why my transliteration <xong> is agnostic: <x> could be velar or uvular, and transliteration vowel symbols generally do double duty for the upper and lower series (except for <e> and <a> which form an upper-lower series pair). But maybe looking at combinations of <xong> in the small script might point in one direction or the other.

Qidan xiaozi yanjiu lists two such combinations for what I assume to be native (or at least non-Chinese) words:

075-151 <hong.ghu> (仁懿 6.1)

I think <gh> was uvular [ʁ], so <u> had to be lower series [ʊ]. My harmony hypothesis would then require <hong> to be [χɔŋ] (or in very strict notation, [χɔɴ] with a uvular nasal).

I suspect Khitan had a neutral vowel /i/ with higher and lower series allophones: [i] and [ɪ] depending on the context. So

075-033 <hong.is> (許王 52.8)

might have been pronounced [χɔŋɪs].

(The post title refers to this character block which does not mean 'emperor nine', though its components do mean 'emperor' and 'nine'.)

Next: Was 'nine' really neutral?

DID KHITAN HAVE GH- IN BORROWINGS FROM CHINESE?


No.

But I think my first blog post in seven and a half months should be longer than that, so here's the reasoning behind my answer.

For some time I've revised Kane's (2009) transliteration of the Khitan word for 'emperor' (below in the large and small scripts) <hoŋ di> [ɣoŋ di]


as <xong di>: e.g., here. The reasoning for this reading is as follows:

1. The large script spelling looks exactly like Liao Chinese 皇帝 *xɔŋti 'emperor'. This fact by itself does not guarantee that Khitan 皇 帝 also meant 'emperor' or sounded like the Chinese word for 'emperor', as there are many large script characters which have meanings and/or readings unlike their Chinese lookalikes.

2. The context of these two spellings indicates that they indeed did represent a Khitan word for 'emperor'.

3. In the Langjun inscription, the first character of the small script spelling is also used to write the Chinese surname 黃 *xɔŋ, a homophone of the first syllable of 皇帝 *xɔŋti 'emperor'. Therefore it is likely that the Khitan word for 'emperor' was something like *xɔŋti: i.e., <xong di>, which may have phonetically been something like [χɔŋti]*.

4. Although it is true that the Chinese word for 'emperor' once had a voiced back fricative onset *ɣ/*ʁ/*ɦ, all other Chinese loans in Khitan postdate the devoicing of initial voiced obstruents: e.g., 唐 'Tang dynasty' in the Langjun inscription appears as

229-199-140 <ta.ang.en>

from *tʰaŋ plus the Khitan genitive ending <en>, not

*171-199-140 <da.ang.en>

from earlier *daŋ. There is no reason to assume that 'emperor' was borrowed before the rest of the Chinese vocabulary in Khitan. Therefore it is most likely that the initial of the Khitan word for 'emperor' was voiceless <x>, not voiced <gh> (= Kane's <h>).

Yesterday I realized that

'empress' (in the small script; what is the large script equivalent?)

in Khitan - also a loan from Chinese - must be <xeu> and not <gheu> (= Kane's <heu>) for the same reason that <xongdi> 'emperor' is not <ghongdi>. The correct reading of this character is crucial because it is part of the spelling for

<?.úr> 'spring'

which Kane transliterated as <heu.ur> (= <gheu.ur> in my system) and which I transliterate with a voiceless initial as <xeu.úr>. The Khitan word is cognate to Written Mongolian qabur:

Written Mongolian q a b u r
Khitan (Qinggeltei 2002) <ɣou> <ur>
Khitan (Kane) <heu> [ɣ-] <úr>
Khitan (this site) <xeu> [xəw] <úr>

Although my voiceless x is a closer match to WM voiceless q than Qinggeltei and Kane's voiced initials, it is still a fricative rather than the stop <q> that I would expect to correspond to WM q (as in

<qa.gha> : WM qaghan).

'Qaghan' is an areal political term that could have been borrowed independently into Khitan and Mongolian. Perhaps *q in the common ancestor of Khitan and Mongolian weakened to <x> in Khitan, and <qa.gha> was borrowed after that weakening. Are all <q>-words in Khitan late borrowings? No, because nothing indicates that the Khitan autonym


<qid.ún> (in the large and small scripts)

is of foreign origin. Did Khitan preserve a two-way contrast lost in Mongolian?

*q- > K <q>, M q: e.g., 'qaghan', 'Khitan'?

*x- > K <x>, M q: e.g., 'spring'

Proto-Mongolic *x is unrelated to Khitan <x>; it is from pre-Proto-Mongolic *p: e.g., PM *xon 'year' corresponds to Khitan


<po> ~ <p.o> 'time' (in the large and small scripts)

Moving on to the first vowel of 'spring', I would expect it to be <a> in Khitan, but the fact that the first syllable of 'spring' is homophonous with a borrowing of Chinese 后 *xəw 'empress' rules out <a>. I side with Kane and transliterate with <e> (for [ə]). Qinggeltei's <o> could be explained as a Khitan assimilation of the first vowel to the second (or as the absence of a dissimilation that occurred in Mongolic). However, I think <o> might be an anachronism since 后 did not have a rounded vowel in northern Chinese until recently; that syllable was borrowed into Manchu centuries later as heo [xəw].

Could Khitan have retained a *e lost in Mongolic?

*qebur > K <xeu.úr>, M qabur

Although *qe was impossible in Mongolic which permits qa and ke but not *qe or *ka, Mongolic might have developed harmonic restrictions under the influence of early Turkic (which also permitted qa and ke but not *qe or *ka). Khitan underwent a parallel development after *q shifted to <x>; was not possible in Khitan, and <ka> could only occur in loanwords and transcriptions: e.g.,

<>, a transcription of Chinese 開 *kʰaj 'open'

The alternate spelling


may represent a [χɑj] that was harmonic and hence less un-Khitan than <>. <x> does appear instead of <k> even in syllables where <k> would be harmonic: e.g.,

<x.i>, a transcription of Chinese 起 *kʰi 'rise' and 期 *kʰi 'period'

The lenition of *-b- in the Khitan word for 'spring' is also found in


<tau> 'five' (in the large and small scripts)

whose Written Mongolian cognate tabun retains the medial consonant.

*I hypothesize that Khitan had two series of vowels, 'higher' and 'lower', and that they appeared in complementary distribution after back consonants in native words:

Native Khitan vowels (excluding the vowels ɨ and y which are only in Chinese loanwords)

higher i (no [e]? merged with *i?) ə <e> o u
lower ɪ e [ɛ] <ê> (> initial [ja]?) a ɔ ʊ

Native Khitan KV-combinations

Consonant class 'higher' 'lower'
k-, g-, x- X
q-, ʁ-, χ- X

Velar-'lower' vowel combinations were possible in loanwords and transcriptions: e.g., <> above.
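The hypothesized distribution can be sketched as a small lookup. This is only an illustration under stated assumptions: the name `is_native_kv` is hypothetical, the vowel sets follow one possible reading of the vowel chart above (with [ɛ] standing for the lower front mid vowel), and the pairing of velars with 'higher' vowels and uvulars with 'lower' vowels is inferred from the statement that velar-'lower' combinations were limited to loanwords and transcriptions.

```python
# Hypothetical sketch of the proposed native Khitan KV restriction.
# Vowel sets are an assumed reading of the chart; they are not a settled inventory.
HIGHER_VOWELS = set("iəou")
LOWER_VOWELS = set("ɪɛaɔʊ")
VELARS = set("kgx")
UVULARS = set("qʁχ")


def is_native_kv(onset: str, vowel: str) -> bool:
    """Return True if onset + vowel fits the hypothesized native pattern."""
    if onset in VELARS:
        return vowel in HIGHER_VOWELS
    if onset in UVULARS:
        return vowel in LOWER_VOWELS
    raise ValueError(f"not a back consonant: {onset!r}")


print(is_native_kv("x", "ə"))  # first syllable of [xəw]
print(is_native_kv("q", "ɪ"))  # first syllable of [qɪdʊn] 'Khitan'
print(is_native_kv("k", "a"))  # <ka>, expected only in loans and transcriptions
```

A velar followed by a 'lower' vowel fails the check, which is the pattern the hypothesis confines to loanwords and transcriptions.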

Khitan had two series of obstruents that are transcribed by Kane as if they were voiceless and voiced. I follow Kane's practice, though I am not sure what the phonetic distinction was. Kane's voiced series usually but not always corresponds to the Chinese voiceless unaspirated series, so it may have been phonetically voiceless unaspirated: e.g., <d> = [t]. The possibility that Khitan had nonphonemic voicing like Korean should be explored: e.g., <qid.ún> 'Khitan' may have been /qitun/ = [qɪdʊn] with intervocalic voicing. The seemingly inconsistent transcriptions of Chinese syllable initials may show patterns in polysyllabic contexts: e.g., <t> initially and <d> in intervocalic position.

HOW NOT TO RECONSTRUCT MON-IC RHYMES

(Posted after expansion on 15.12.18.)

I thought it might be fun to try to work out my own reconstruction of Monic rhymes using the data in Diffloth (1984: 286) before seeing his solution. Real forms are in bold.

Correspondence 1 2 3 4 5 6
Proto-Monic *-eK *-eC *-iC *-iK *-ɛC *-ɛK
Pre-Nyah Kur *-iK *-iC *-iiC *-iiK *-ɛeC *-ɛeK
Nyah Kur -iC -iiC -iiK -iiC -eeC
Pre-Literary Mon *-eK *-iC *-iK *-eC *-eK
Literary Mon <-eK> <-iK> <-eK>

I assumed that Mon generally preserved vowel heights: <e> rhymes came from Proto-Monic nonhigh vowels, and the vowel of *-eC raised to assimilate to the following palatal.

My solution has a short *ɛ absent from the reconstruction in my last post. (But such a vowel was in an earlier draft of that post. Both versions of my reconstruction lacked length.)

Capital letters stand for palatals (-c, -ɲ) and velars (-k, -ŋ).

I posit a chain shift in Nyah Kur:

1. *i > ii (contra last night's post in which *e > *ei > Nyah Kur ii)

2. *e > i (and *-iK > *-iC - but why didn't -iiK become *-iiC?)

3. *ɛ > *ɛe

3a. *-ɛeC > *-eiC > -iiC

3b. *-ɛeK > *-eeK > *-eeC

In Written Mon, lower mid front vowels raised, and palatals backed:

1. *-eC > *-iC

2. *ɛ > <e>

3. *C > <K>

I finally got around to looking at Diffloth's solution two and a half weeks later. He used Old Mon and non-Monic languages to help him. I wonder what my solution would have looked like if I had checked that data.

Correspondence 1 2 3 4 5 6
Proto-Monic (this site) *-eK *-eC *-iC *-iK *-ɛC *-ɛK
Proto-Monic (Diffloth 1984: 288) *-iK *-iC *-iiK *-eeK *-iiC *-eeC
Pre-Nyah Kur (this site) *-iC *-iiC *-eeK *-iiC *-eeC
Nyah Kur -iC -iiC -iiK -iiC -eeC
Pre-Old Mon (this site) *-eC *-iC? *-iiK *-eeC
Old Mon (phonetic; this site) *-eiC *-iC? *-i(i)K *-eiC
Old Mon <iC> (once) ? <-iK>/<-īK> <-iK> <-iC> <-eC>/<-iC>
Literary Mon <-eK> <-iK> <-eK>

I prefer Diffloth's solution to my own not only because of its firm comparative grounding but also because

- it made use of existing contrasts (short vs. long, i vs. e) instead of eliminating length and introducing a third height (artifacts of the Pre-Proto-Monic reconstruction I didn't post).

- the phonetic shifts that I think it requires make more sense (and are much simpler for Nyah Kur though not for Mon):

Nyah Kur:

1. *-K > *-C after *i(i)

2. *ee > ii before *-K


Mon:

1. Neutralization of long vowel height before *-K and *-C:

*-eeK > *-iiK

*-iiC > *-eeC

2. Neutralization of vowel height before *-C: *-iC > *-eC (to match *-eeC)

3. Diphthongization of *e(e) to assimilate to a following *-C:

*-e(e)C > *-eiC

There was no Indic vowel symbol for [ei], so *-eiC was written in Old Mon as both <eC> and <iC>.

4. Loss of vowel length: *-iiK > *-iK

I predict that the first attestation of <-īK> with a long vowel predates the first attestation of <-iK>. If that is not the case, then vowel length was already being lost at the time of the earliest known Mon texts, and <-īK> with a long vowel represented a conservative, waning pronunciation.

5. Backing of *-C to *-K after front vowels (dissimilation?)

*-(e)iC > *-(e)iK

6. Straightening of *ei to <e> before *-K (the *-i- was motivated by a following palatal that no longer exists)

*-eiK > <eK>

One could regard 5 and 6 as the same change on an abstract phonemic level:

*/eC/ > /eK/

since */e/ was *[ei] before */C/. (There was no contrast between *-eC and *-eiC.)

Note that some of the above changes are mine and not Diffloth's, though their starting point is his.
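The two Nyah Kur changes above can be checked mechanically against the Proto-Monic rhymes in the Diffloth row of the table. The following is a minimal sketch, not a tool from Diffloth's book: the names `PROTO_MONIC`, `NYAH_KUR_RULES`, and `derive` are hypothetical, and the second change is assumed to apply only before velars (*-eeK > -iiK), since *-eeC is taken to survive as Nyah Kur -eeC.

```python
import re

# Diffloth's Proto-Monic rhymes for correspondences 1-6 (C = palatal, K = velar).
PROTO_MONIC = ["iK", "iC", "iiK", "eeK", "iiC", "eeC"]

# Ordered Nyah Kur changes as (regex pattern, replacement) pairs.
NYAH_KUR_RULES = [
    (r"(i+)K$", r"\1C"),  # 1. *-K > *-C after *i(i)
    (r"eeK$", "iiK"),     # 2. *ee > ii (assumption: only before *-K)
]


def derive(rhyme, rules):
    """Apply each sound change once, in the order given."""
    for pattern, replacement in rules:
        rhyme = re.sub(pattern, replacement, rhyme)
    return rhyme


for proto in PROTO_MONIC:
    print(f"*-{proto} > -{derive(proto, NYAH_KUR_RULES)}")

# Reversing the order feeds the new -iiK into the palatalization rule.
print(derive("eeK", NYAH_KUR_RULES[::-1]))
```

The order matters: palatalization has to precede the raising of *ee, or the -iiK from *-eeK would become -iiC as well.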

I'm embarrassed by how wrong I was, but I'm keeping my original solution in this post anyway as an example of what not to do: namely, fail to exploit existing resources. One approach to solving problems like this one is to calculate the maximum number of possibilities given existing variables and then see which possibilities make for the simplest 'story':

2 heights (i, e) * 2 lengths (short, long) * 2 coda classes (-C, -K) = 8

1. *-iC

2. *-iiC

3. *-iK

4. *-iiK

5. *-eC

6. *-eeC

7. *-eK

8. *-eeK

Although "2 lengths" may seem redundant, there are languages with three vowel lengths (more examples here). (Mon, however, is not one of them.)

*-eC and *-eK can be ruled out at the proto-Monic level because Nyah Kur, the only Monic language with vowel length, lacks short e.
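The enumeration and the elimination step can be sketched in a few lines of Python. The variable names are hypothetical, and the only filter applied is the one stated above (no short e); this illustrates the procedure and is not a reconstruction method in itself.

```python
from itertools import product

HEIGHTS = ["i", "e"]
CODAS = ["C", "K"]   # C = palatal, K = velar
LENGTHS = [1, 2]     # 1 = short, 2 = long

# All height x coda x length combinations, in the order listed above.
combos = list(product(HEIGHTS, CODAS, LENGTHS))
candidates = [vowel * length + coda for vowel, coda, length in combos]

# Rule out short e, since Nyah Kur lacks it.
survivors = [
    vowel * length + coda
    for vowel, coda, length in combos
    if not (vowel == "e" and length == 1)
]

# The six Proto-Monic rhymes in the Diffloth row of the comparison table.
DIFFLOTH = {"iK", "iC", "iiK", "eeK", "iiC", "eeC"}

print(candidates)
print(survivors)
print(set(survivors) == DIFFLOTH)
```

Under these assumptions the six surviving rhymes are the same six that appear in the Diffloth row of the table above.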

Another error I made was investing too much in wrong hypotheses: the absence of vowel length and the presence of a third front vowel height. If I had worked from the eight possibilities above, I might have seen how a more Nyah Kur-style solution was superior.

Lastly, the improbability of Pre-Nyah Kur *-iiK not becoming *-iiC given *-iK > -iC in my reconstruction should have been a warning sign. It would make phonetic sense for Pre-Nyah Kur *-iiK to become *-iɨK (i.e., have its second half lose its palatality to assimilate to the following velar) which would then be immune to the change *-iK > *-iC. Then *-iɨK would simplify into Nyah Kur -iiK. However, such a complex intermediate stage is obviously an attempt to salvage a flawed reconstruction; it adds unnecessary complexity. Diffloth's solution is much simpler: *-eeK > -iiK.

PROTO-MONIC VOWELS IN DIFFLOTH (1984)

(Posted after expansion on 15.12.17-18.)

I have been reading Gérald Diffloth's The Dvaravati Old Mon Language and Nyah Kur (1984) to understand Monic history and to get a better grasp of vowel warping which also occurred in Chinese and possibly also Tangut. The complex diphthongs of modern Monic languages come from much simpler Proto-Monic vowels: e.g.,

Singu ʔəsʌe̲i̯a < Proto-Monic *k[r]sw 'to whisper'

The underline and subscript diaeresis indicate the most prominent vowel with clear (underline) or breathy (diaeresis) phonation. Proto-Monic did not have phonemic phonations (though perhaps the two types of phonation already existed on a subphonemic level). Conversely, the subscript inverted breve indicates the least prominent vowel.

I have not found a diagram of Proto-Monic vowels (as opposed to rhymes) in Diffloth's book, so I made one myself:

*i/*ii length only contrastive before final palatals and velars *ɯ/*ɯɯ only two instances of *ɯɯ known *u/*uu *uu nearly in complementary distribution with *oo
*iə nearly in complementary distribution with *ɛɛ (no *ɯə!) (*uə as alternative to *ɔɔ)
(no *e or *ee!) /*əə only two instances of *əə known -/*oo nearly in complementary distribution with *uu
-/*ɛɛ nearly in complementary distribution with *iə *a/*aa /*ɔɔ length only contrastive before final velars could *ɔɔ be *uə?

The five most common vowels according to Diffloth (1984: 284) are in large type.

Although *ɯ(ɯ) is a back vowel, I have followed Diffloth in grouping it with the central vowels.

I moved the low vowels *a/*aa into the lower mid row for a more compact chart.

If I ignore marginal contrasts, I could reconstruct a more symmetrical Pre-Proto-Monic system with a phonemic length distinction for only one vowel:

*i *u
*iə > *ɛɛ (no *ɯə!) *uə > *ɔɔ
*e > *ii *o > *oo and *uu
(no *ɛ) *a/*aa

11.30.0:10: I could even eliminate that one last remaining length distinction by positing a chain shift: *ɯə > > *a > *aa. But I would rather reconstruct a high-frequency vowel as *aa instead of *ɯə.

Here's how I think front vowels could have developed between Pre-Proto-Monic and Proto-Monic:

Stage 1 Stage 2 Stage 3
*i *i *i/*ii (< *ei < *e)
*iə *iə *iə
*e *e (no *e)
(no *ɛ) *ɛɛ (from *eɛ < *ie < *iə before glottal stops) *ɛɛ

For comparison, here's a table of the development of back vowels:

Stage 1 Stage 2 Stage 3
*u *u *u/*uu (< *ou < *o before nongrave consonants)
*uə (no *uə) (no *uə)
*o *o *oo (< *ou < *o before grave consonants)
*ɔ/*ɔɔ (from *oɔ < *uɔ < *uə) *ɔ/*ɔɔ

In both systems, diphthongs (*iə, *uə) monophthongized into long vowels, and mid vowels (*e, *o) became yet more long vowels.

It is not surprising that there is only a three-way contrast between front vowels since front vowels are less frequent than central or back vowels in Proto-Monic. Do other Austroasiatic languages have relatively low frequencies of front vowels? The imbalance of front and back vowels in Pre-Proto-Monic may go back to Proto-Austroasiatic. Shorto's 2006 reconstruction of Proto-Austroasiatic has no corresponding to *ɔ.

12.18.7:28: I am not happy with the above proposals since they require unmotivated splits:

- *iə > *iə except

- before glottal stops: *iə > *ɛɛ

- in two problematic words without glottal stops (Diffloth 1984: 282-283):

*t[l]m[ɛɛ/aa]t 'flattened' (Nyah Kur points to *ɛɛ, but Mon points to *aa)

*k[ ]ʔɛɛm 'to clear one's throat' (a sound-symbolic exception to sound change?)

- *o > *oo or *uu before glottal stops (the height is otherwise predictable; see Diffloth 1984: 276-278, 380-381)

One might be able to write off the two unexpected cases of *ɛɛ, but there are ten instances each of *oo and *uu before glottal stops (Diffloth 1984: 377-378). Twenty words are too many to be ignored. If Pre-Proto-Monic had length distinctions for *a and *u (and other central vowels?), perhaps it had one for *i as well:

*i/ii *ɯ/ɯɯ? *u/uu
(no *e/*ee) *ə/əə? (no *o)/*oo
(no *ɛ)/*ɛɛ *a/*aa *ɔ/*ɔɔ

That inventory is almost identical to Diffloth's. We have come almost full circle.

The distribution of distinctive long vowels is identical to that of Nyah Kur (Diffloth 1984: 52).

HOW DID KUMĀRA BECOME KAUR?

(Posted after expansion on

The Sikh surname ਕੌਰ Kaur (Punjabi 'princess') is from Sanskrit kumāra 'boy, prince' (f. kumārī) which has the following Punjabi descendants listed in Turner's dictionary:

kavār, kãvārā, kavārā, kuārā, kamārā m. 'bachelor'

kãvārī, kavārī, kuārī, kamārī f. 'virgin'

kãvar m. 'prince'

kaür, kaur m. 'boy, prince'

Those forms share the following changes:

The first vowel of kumāra was (almost) always delabialized: u > a.

Is the u of kuār- from u or from m?

kumār- > kuār-?

kumār- > kamār- > kãvār- > kavār- > kuār-?

Is the u of kaür/kaur from u or from m?

kumār- > kuār- > kaur?

kumār- > *kumar- > kuar- > kaur?

kumār- > kamār- > *kamr- > kaur?

The latter scenario seems unlikely since it requires a short vowel to remain while a long vowel disappears.

m lenited to v (sometimes nasalizing the previous vowel) or was lost (except in kamārā).

The original final vowel was lost, and new vowel suffixes were added:

Masculine -ā is a Punjabi suffix. (Sanskrit -ā is feminine.)

Is feminine -ī a direct retention of Sanskrit -ī?

What I don't understand is:

- how a single form could develop in six or seven different ways (is the difference between the last two purely orthographic?):

kavār, kãvārā, kavārā, kuārā, kamārā, kaür, kaur

Are those forms taken from different dialects and/or different time periods?

- how kaür ~ kaur lost long ā (unless a is from ā)

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2015 Amritavision