TANGUT GRADE III -A('): RHYMES 19 AND 21 (PART 2)

I started what was meant to be a series almost three weeks ago. Then I got caught up correcting my own mistakes - the ones I noticed, that is: a wrong rhyme and a wrong fanqie speller. (There must be even more errors in my Tangut reconstruction that I haven't noticed yet!) I thank David Boxenhorn for reminding me of my plans to write about Tangut 'apostrophe' rhymes like 21 -ɨa'.

I got the apostrophe notation from Arakawa Shintarō. He uses apostrophes to indicate glottal stops in initial position, so maybe they indicate final glottal stops in his reconstruction. I, on the other hand, use apostrophes simply to mean 'different in some unknown way'.

In the Tangraphic Sea, nonapostrophe rhymes are followed by similar apostrophe rhymes in the first group of rhymes (1-60). Apostrophes and tenseness are always mutually exclusive, and apostrophes and nasality are almost mutually exclusive. Their coexistence in 59-60 should be investigated. There are also some anomalous combinations of nasality with tenseness (65 and 76) and retroflexion (97-98). I am fairly confident about the classification of rhymes up to 60; later rhymes, particularly 97 and up, are iffy, and others interpret them very differently (e.g., Arakawa only has two retroflex apostrophe rhymes: 88-89). The ordering pattern breaks down after 62: e.g., 63 is an e-type rhyme rather than an i-type rhyme. 104 and 105 look like last-minute additions.

[Table: rhyme numbers by type (plain, tense, retroflex; apostrophe, nasal). Most cells are not preserved. Surviving entries: 99, 101; 17-20, 105; 65, 76 (!); 97-98 (!); 59-60 (!).]

I do not rule out the possibility of a reinterpretation of the later rhymes in the future. For now, let's focus on 19 and 21.

In my reconstruction, both 19 and 21 are Grade III rhymes, so in theory they should have Grade III initials (in green). The reality is messier. Unexpected initial types are in pink, and initial types with minimal pairs are in red.

[Table: initial types by rhyme; the colour coding and most cells are not preserved. Surviving cells:]
19 -ɨa: h-, ɦ-!
20 -a:
21 -ɨa': d-, n-!; k-, kh-!; ʔ-, ɦ-!
24 -a':

I used to follow Gong and write glottal fricatives as if they were velar, but I thought it was odd to have x- and ɣ- under "Glottal", so I now follow Arakawa and write the voiceless fricative as h-. By analogy I write its voiced counterpart (absent in Arakawa's reconstruction) as ɦ-.

There are no labial or lateral fricative (ɬ-, ɮ-) initials; both types were generally incompatible with Grade III for some reason. The absence of lateral fricatives is due to a larger constraint against alveolar fricatives in Grade III (but see below!).

Arakawa reconstructed Grade III as vowel length and reconstructed 21 as the only Grade IV rhyme in his system with both vowel length and medial -y-. But I don't understand why those features would be incompatible with labials. If pya and pa: were possible (Arakawa 1997: 128), why not *pya:?

In my reconstruction, labials are rarely followed by Grade III -ɨ-.

Let's look at all the anomalies and see if they can be explained (or at least have notable features):

Li Fanwen 2008 number

bent, winding, crooked (only in dictionaries) No Grade IV rhyme 20 *kwa; regular reflex of pre-Tangut *Cɯ-kwa or *Pɯ-ka?
1tsɨa to broil, roast (only in dictionaries); cognate to Grade IV rhyme 20 1tsa 'hot'
Why isn't this Grade IV rhyme 20 1tsa?
Minimal pair with Grade IV rhyme 20 1tsa
first half of 1hɨa 1ʂɤe 'to condemn' (only in dictionaries)
No Grade IV rhyme 20 *ha; regular reflex of pre-Tangut *Cɯ-ha?
fast, rapid
2ɦɨa second half of 1dzəʳ 2ɦɨa 'fast, rapid'; cognate to 2521 with voiced initial conditioned by lost prefix and second tone conditioned by suffix *-H
No Grade IV rhyme 20 *ɦa; regular reflex of pre-Tangut *Cɯ-Ka (with lenition of intervocalic *-K-)?

cover, lid, to cover; borrowing from Late Middle Chinese 盒 *xɑ(p) 'box' or some related word with voiced initial and vowel bending conditioned by high-vowel prefix?
umbrella of a carriage (specialized usage of 3008 above)
to make a detailed inquiry
No Grade III rhyme 19 *lwɨa; regular reflex of pre-Tangut *Cɯ-lwa or *Pɯ-la?
second half of 1bạ 1lwa 'lower limbs, legs'

1dɨa' second half of 1ti 1dɨa' 'to drip' (only in dictionaries); < Tangut period northwestern Chinese 滴答 *ti tɑ (but the final vowels don't match - did the front vowel of 1ti condition breaking of an earlier vowel in the following syllable?) Minimal pair with Grade IV rhyme 24 1da' for transcribing Sanskrit ḍa
1nɨa' black
No Grade IV rhyme 24 *na'; regular reflex of pre-Tangut *Cɯ-naX?
2nɨa' to not be
dung, excrement
second half of 2mə 2nɨa' 'Tangut'; Tibetan minyag 'Tangut' may reflect an earlier or nonstandard form; may be derived from 0176 'black' plus a suffix *-H conditioning the second tone
1kɨa' transcription of Sanskrit ka and kā
No Grade IV rhyme 24 *ka'; regular reflex of pre-Tangut *Cɯ-kaX (except for transcription character, of course)?
foundation, basis, burden; transcription of Sanskrit ka and kā
pedestal, plinth (same word as 3985 above)
1khɨa' transcription of Sanskrit kha No Grade IV rhyme 24 *kha'
1ʔɨa' yes
No Grade IV rhyme 24 *ʔ(j)a'; regular reflex of pre-Tangut *Cɯ-ʔaX?

horn (only in dictionaries)
second half of 2vɪ 2ʔɨa' 'singing' (with 2vɪ 'to sing'; both halves only in dictionaries)
gold (less common synonym of 1kɤẹ)

Six anomalies in Gong's reconstruction (3456, 3502, 5584, 5763 in rhyme 20 and 0357, 0837 in rhyme 24) are not listed because they are no longer anomalies if they are reconstructed with ld- (following Tai 2008) instead of l-. ld- may have been a lateral affricate with the same pattern of distribution as the lateral fricatives ɬ- and ɮ-: i.e., in Grades I, II, and IV but not III.

Out of the remaining twenty-four anomalies, only two have corresponding Grade IV syllables (3408 and 2936). Those minimal pairs force me to reconstruct 19 and 20 differently (unlike Arakawa and Gong who seem to reconstruct them as homophones*).

All others are in complementary distribution, albeit not in the ideal pattern of complementary distribution. Did the Tangut dictionary tradition reflect a mixture of dialects with different sound changes: e.g.,

- in dialect A, *Cɯ-tsa became Grade III rhyme 19 1tsɨa

- in dialect B, *Cɯ-tsa became Grade IV rhyme 20 1tsa

and the dialect A form was chosen to be the standard form for 'to broil' while the dialect B form was chosen to be the standard form for 'hot'? I would rather not reconstruct different prefixes to account for the different vocalism of 'to broil' and 'hot', which probably share the same root *tsa.

Only two of the anomalies (3948 and 4823) are characters created for transcribing Sanskrit, and one of them (3948) is homophonous with a native word (3985). Why were Sanskrit ka, kā, and kha transcribed with the Tangut rhyme -ɨa' containing -ɨ- and the mysterious apostrophe feature absent from Sanskrit? (No Tangut transcription of Sanskrit khā is known.) Was that practice influenced by the Chinese transcriptions 迦 *kɨa and 佉 *khɨa for those syllables? (In earlier Chinese, *k(h)a was [q(ʰ)ɑ] with an un-Sanskrit uvular, so velar-initial syllables with medial *-ɨ- were regarded as closer matches.) The Tangut transcription of Sanskrit ka as

4620 1ka

without -ɨ- may reflect Sanskrit filtered through Tibetan or even Sanskrit itself. Are transcriptions with 4620 closer to (Tibetanized) Sanskrit? Conversely, are transcriptions with

3948 1kɨa', 3985 1kɨa', and 4823 1khɨa'

based on Sinified Sanskrit? Or were those two types of transcriptive characters randomly mixed up?

*Although both 19 and 20 are -a: in Arakawa's notation, Arakawa's (1997: 128) table appears to list two subtypes of each of those rhymes (not including subtypes with -w-).

'CROSSED NINE' IN THE KHITAN SMALL SCRIPT

Today I was looking at the Khitan small script fish tally in Bushell (1897: 18) which ends with character 089 resembling Chinese 九 with a bar across it:

Kane (2009: 45) wrote,

Aisin Gioro 2004: 51 notes that the title for a [Khitan] lady of high rank, 別胥 biexu [in modern standard Mandarin pronunciation] was normally written


<b.ɥ.dz.ü> ~ <p.ɥ.dz.ü>

but in Gu [i.e., 故耶律氏銘石 Gu Yelü shi mingshi, the epitaph of Mme. Yelü, 1115] it is written


suggesting that [089] is similar to [258] <dz>. [089] is only found in [native] Kitan words. In the rhymed sections of the Xingzong inscription, [089] rhymes with [131] <u>.

The evidence for the pronunciation of 089 points in contradictory directions:

1. 別胥 was something like *pje(ʔ)sy in Liao Chinese.

2. 089 was interchangeable with 258 <dz> (used for Chinese unaspirated *ts)

3. 089 may have fused with 289 <ü> to represent a syllable [Cy].

4. 089 rhymed with <u>.

I think 089 might have been <su> or <sy>:

1'. The Chinese transcription 胥 *sy suggests [s].

2'. Although 258 <dz> was created to transcribe Chinese *ts, an affricate absent from Khitan, the Khitan often spelled that foreign consonant with 244 <s>. Perhaps even those who spelled Chinese loanwords with 258 <dz> may have pronounced them with [s]. So interchangeability with 258 <dz> may indicate either [dz] or [s].

3'. Maybe there was a rule of assimilation: <su.ü> > [sy(ː)] or <ɥ.us> > [y(ː)s] (if <su> had an alternate reading [us] after consonant characters; 082 <ɥ> is usually a semivowel, though perhaps it was [ø] in the title transcribed as 別胥)

4'. It is simplest to assume that 089 ended in [u] if it rhymed with <u> [u], though the possibility of the rhyming of similar vowels ([y] and [u]) cannot be ruled out.

The Chinese loanword data in Kane (2009) lacks the syllables *su and *sy. Would such syllables have been transcribed as 089?

If 089 had an alternate reading [us], how would it have differed from


068 and 103 <us>?

Are there any instances of 089 alternating with those characters?

I also considered the possibility that the alternation between <089.ü> and <dz.ü> might have indicated a reading like [dʑu] or [dʑy] for 089, but if 089 were <ju>, it would be homophonous with


147 ~ 148 ~ 149 <ju>

and therefore redundant. And there are no known cases of Liao Chinese *tɕy transcribed as 089. (The Khitan consonants written as voiced obstruents corresponded to Chinese voiceless unaspirated obstruents.)

(10.29.0:29: The modern standard Mandarin pronunciation of 九 'nine' as [tɕjow] is not evidence for pronouncing 089 as [dʑu] or [dʑy]. In Liao Chinese, 九 was *kiw, and the Khitan borrowed it as


with a velar stop, not a palatal affricate.)

089 appears at the end of


<284.089> which "must refer to the emperor, the throne, or affairs of state" (Kane 2009: 69): i.e., 'imperial'

Are there any continental 'Altaic' terms for rulers ending in something like -su or -us? The first word that comes to mind is Mongolian ulus 'people, nation' which has already been proposed as a potential cognate of

<xu.177> (see the discussion of this mysterious word in Kane 2009: 162-165)

Could <284.089> have meant 'national'?

The vertically stacked variant of <284.089> is from the fish tally. (The handwritten copy of the fish tally on p. 623 of Qidan xiaozi yanjiu has the regular horizontal combination.) The significance of vertical stacks, if any, is unknown. Back in May I started to collect vertical stacks for a future post, but I never finished.

LOST WORD FAMILIES

Today I read Stephen Wootton Bushell's account of the end of the Tangut Empire. I looked through Li Fanwen's (2008) Tangut dictionary to translate 亡國 'lost country' and found six equivalents of 亡 'to be lost, die':

1. 0316 1xwɤa 'to lack, die, kill'

2. 0788 2me (second half of 1sə 2me 'death'; the first half is 'to die')

3. 1508 1bɛ 'to lose, fail'

4. 1839 1ɬø 'to lose, fail'

5. 2194 1me 'to not exist, not have'

6. 4007 1phɑ 'to damage, lose'

The m-words belong to an m-family of Tangut negatives related to *m-negatives in Old Chinese (e.g., 亡 *Cɯ-maŋ 'to be lost') and elsewhere in Sino-Tibetan. Only two (1918 and 2376) contain 'not' (Nishida radical 041):

0788 2me < *CE-ma-H or *Cɯ-ma-j-H

This bound morpheme is homophonous with 1064 'not yet', but may have a different pre-Tangut origin because 1064 precedes rather than follows the verb 'to die', and 'not yet die' for 'death' makes no sense.

0944 1mʌ̣ < *Sʌ-mə 'not'

1064 2me < *CE-ma-H or *Cɯ-ma-j-H 'not yet'

1918 1mi < *CI-ma 'not' (the most general negator)

2194 1me < *CE-ma or *Cɯ-ma-j 'to not exist, not have'

2376 2mẹ < *SE-ma-H or *Sɯ-ma-j-H 'nothing, not'

5643 1mə < *mə 'not' (for auxiliary verbs)
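The derivations listed above follow a fairly regular recipe, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vowel outcomes are read off the listed forms and nothing else, the *S- prefix is assumed to be what the underdot (tenseness) reflects, and *-H is assumed to condition the second tone. The alternative etymologies with *Cɯ-...-j are not modelled, and the function name `derive` is my own.

```python
import unicodedata

# Vowel outcomes read off the listed etymologies only; any (prefix vowel,
# root vowel) pair not attested in the list is left out rather than guessed.
VOWEL_OUTCOME = {
    ("E", "a"): "e",   # *CE-ma > me
    ("I", "a"): "i",   # *CI-ma > mi
    ("ʌ", "ə"): "ʌ",   # *Sʌ-mə > mʌ with underdot
    (None, "ə"): "ə",  # *mə > mə (no prefix)
}

def derive(prefix, root_initial, root_vowel, suffix_h=False):
    """Return a Tangut form from a pre-Tangut shape.

    prefix is None or a (consonant, vowel) pair such as ("S", "E").
    """
    prefix_c, prefix_v = prefix if prefix else (None, None)
    vowel = VOWEL_OUTCOME[(prefix_v, root_vowel)]
    if prefix_c == "S":
        vowel += "\u0323"  # assumption: *S- conditions the tense (underdot) vowel
    tone = "2" if suffix_h else "1"  # assumption: *-H conditions the second tone
    return unicodedata.normalize("NFC", tone + root_initial + vowel)
```

For example, `derive(("C", "I"), "m", "a")` gives 1mi, matching 1918, and `derive(("S", "E"), "m", "a", suffix_h=True)` gives the form of 2376.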

I have been tempted to include 1943 2nɨa' 'not' (before 'be') in that list, but I cannot prove that n- is from *mj-. Nor can I explain why an *m-family word would have a medial *-j-. My pre-Tangut reconstruction has no *-j-infix.

If 0316 1xwɤa 'to lack, die, kill' is from *P-xra, it might be related to the first syllable of

3913 4862 1xɤə 2lɨa' < *xrə (Cɯ-)laXH 'to leave' (only in dictionaries).

4862 (also written 4951 ) is 'frontier, border' by itself, but it would be odd to have a noun in second position if 3913 4862 was a verb-noun sequence. Is 3913 a prefix that derived a verb out of a noun? Or is 4862 a phonetic symbol for a syllable unrelated to 'border' in 3913 4862? Could 3913 4862 be a sequence of verbs: e.g., 'vacate leave'? Then the second half of 3913 4862 might be related to a lateral-initial family of 'loss' words

1068 1lɨə < * 'to fall, sink'

1839 1ɬø < *Kɯ-lo < *-əw < *-ə-k? 'to lose, fail'

3545 1ɬəʳ' < *R-K-lə 'to lose, fall'

related to Old Chinese 失 *l̥it 'to lose'.

Li Fanwen (2008: 252) regarded 1508 1bɛ (Grade I rhyme 34) as a loan from Chinese 敗 'to lose', but the two may be unrelated lookalikes, as I would expect Middle Chinese 敗 *bɤajʰ (Grade II) to correspond to Tangut *bɤe (Grade II rhyme 35). Gong (2002: 421) regarded 1508 as an irregular loan. See Gong (2002: 421) for examples of the regular correspondence between Middle Chinese *-ɤaj (his *-aj) and Tangut Grade II rhyme 35 in loanwords.

Li Fanwen (2008: 643) translated 4007 1phɑ 'to damage, lose' as Chinese 破 'to break, smash'. I agree with Gong (2002: 417) who regarded 4007 as a loan from Middle Chinese 破 *phɑ.

THE GOLDEN GUIDE: LINE 102: TANGRAPHS 506-510

102. Translating lists of Chinese surnames in tangraphy is relatively easy, but not as interesting as translating coherent text. No wonder I haven't been motivated to translate the Golden Guide since I got stuck in the surname section in 2010. If only I had more patience. The last surname is just four lines away!

Tangraph number | 506 | 507 | 508 | 509 | 510
Li Fanwen number | 4579 | 2736 | 4807 | 2177 | 2476
My reconstructed pronunciation | 2lɨu | 2bɤa' | 1khi | 1pʌ | 1xwɤa
Tangraph gloss | the surname element Lu | the surname element Ba | to lose (< Chn 棄 *khi) | big | flower (< Chn 華 *xwɤa)
Word | the surname 呂 Lü (*lɨu) | the surname 馬 Ma (*mbɤa) | the surname 杞 Qi (*khɨi) or 祁 Qi (*khɨi) | the surname 不 Bu (*pʌ) | the surname 華 Hua (*xwɤa)
Translation | Lü, Ma, Qi, Bu, Hua

506: The analysis of 4579 is unknown, but its structure is obviously inspired by its Chinese soundalike 呂 (also 吕) which looks like a stack of two 口 mouths. (It is actually a drawing of the spine.) The left side of 4579 consists of two Tangut mouth radicals. The right side has no known independent function and could be from 547 (!) other tangraphs. Some radicals can stand alone while others require it as an apparent filler: e.g.,

0764 1reʳ 'horse' (with a radical derived from Chinese 馬 'horse' on the left)

which brings us to the next tangraph.

507: The analysis of 2736 is also unknown, but it is obviously related to 0764 'horse' (above).


2736 2bɤa' sounds like Chinese 馬 *mbɤa 'horse', the translation equivalent of 0764 1reʳ. The mysterious phonetic feature that I write as an apostrophe must not have made 2bɤa' sound too different from Chinese *mbɤa. (Tangut b- might have been prenasalized [mb].)

It would be nice if the top left radical of 2736 were a diacritic indicating that a tangraph was to be read like its Chinese translation. However, I doubt that is the case. I don't have time to investigate all 42 tangraphs with that radical on the left at the moment, so for now I'll pick one at random which may not be representative:

2314 2ʔɨu 'death' (only in dictionaries?; analysis unknown)

does not sound like any Chinese word for 'death', and subtracting its left-hand radical results in


5156 1vɑ (name and transcription character; see 369) =

left of 5489 2ryʳ (surname element rur)

right of 1925 2bɨu (surname element -bu)

which has nothing to do with death and doesn't even sound like 2ʔɨu. I presume that a va-family had something to do with families whose names contained the syllables rur and bu.

Miscellaneous Tangraphs (27.7.11-12, #837) lists these last two Chinese surnames in the opposite order:

2736 2bɤa' 'Ma' and 4579 2lɨu 'Lü'

508: The other three tangraphs have surviving analyses; ironically one of them is 'to lose':


4807 1khi 'to lose' =

top of 4910 2ve (second half of 1ʂwo 2ve 'to clear away, clean up'; semantic) +

all of 3545 1ɬəʳ' 'to lose, fall' (cognate to Old Chinese 失 *l̥it 'to lose'?; semantic)

3545 has a circular analysis:


3545 1ɬəʳ' 'to lose, fall' =

bottom left of 1068 1lɨə 'to fall, sink' (cognate to 3545?; semantic) +

bottom right of 4807 1khi 'to lose' (semantic)

3545 looks like a semantic compound of 'die' (Nishida's radical 045) and 'hand':


509: 2177 is a semantophonetic compound:


2177 1pʌ 'big' =

left of 2892 2khwɛ 'big' (< Chn 魁 *khwɛ) (semantic) +

all of 2306 1pʌ (second half of 2tsoʳ 1pʌ 'small colt') (phonetic)

2306 has a dubious circular analysis:

2306 1pʌ  =

center of 2177 1pʌ (phonetic) +

right of 2132 2ʔjew 'achievement' (why?)

The analysis of 2132 also leads back to 2177:


2132 2ʔjew =

2477 2thọ (second half of 1dza 2thọ 'to grow up'; semantic) +

2177 1pʌ 'big' (semantic)

I think 2306 came first, followed by 2177 and then 2132.

510: I'm not surprised the tangraph for the loanword for 'flower' is derived from the tangraph for the native word, but what is 'head' doing?


2476 1xwɤa 'flower' (< Chn 華 *xwɤa) =

left and center of 2750 1ɣɤu 'head' (why?) +

right of 2467 1vạ 'flower' (semantic)

2467 1vạ superficially resembles Old Chinese 華 *wra, the source of *xwɤa, but it goes back to *Sɯ-wa which has no *-r-. If the medial *-r- of OC *wra is a metathesized prefix -

*T-wa > *r-wa > *wra

- then perhaps the Chinese and Tangut words for 'flower' are related. But if Baxter and Sagart (2014) are right, 華 was OC *qʷʰˁra, sharing nothing in common with Tangut *Sɯ-wa other than a vowel.

THE GOLDEN GUIDE: LINE 101: TANGRAPHS 501-505

101. I couldn't resist the opportunity to try out my newest Tangut vowel reconstruction in a continuation of where I left off last year (even if doing so entailed inconsistency with my reconstructions of lines 1-100).

Tangraph number | 501 | 502 | 503 | 504 | 505
Li Fanwen number | 4695 | 5087 | 2259 | 3951 | 2042
My reconstructed pronunciation | 1giw' | 1ʔjø̃ | 2mɤe | 1thu | 2kɤa
Tangraph gloss | the name Giw | the Chinese surnames Yang and Wang | the surname element Me | to talk, speak | duck
Word | the surname 牛 Niu (*ŋgɨiw) | the surname 酒 Yang (*jø̃) | the surname 孟 Meng (*mɤẽ) | the surname 杜 Du (*thu) | the surname 家 Jia (*kɤa)
Translation | Niu, Yang, Meng, Du, Jia

501: 4695 1giw' contains 1909 1guʳ 'ox, cattle' as a 'xenophonetic' (i.e., a phonetic element chosen for the pronunciation of its translation in another language: in this case, Tangut period northwestern Chinese *ŋgɨiw 'ox'). However, 1909 is not in its Tangraphic Sea analysis:


4695 1giw' 'the name Giw' =

top of 4940 2ʔjə 'the surname Y' (a family associated with the Giw?) +

bottom of 4107 1giw' (first syllable of 1giw' 1kie 'a kind of plant')

Nor is 1909 in the analysis of 4107 which takes us back to 4695:


4107 1giw' (first syllable of 1giw' 1kie 'a kind of plant') =

top of 4303 1kie (second syllable of 1giw' 1kie 'a kind of plant') +

4695 1giw' 'the name Giw'

The Tangraphic Sea analysis of 1909 is dubious; surely its 'sources' are actually its derivatives:


1909 1guʳ 'ox, cattle' =

part of the bottom of 4704 2rɛʳ 'ox, elephant' (i.e., large mammal?) +

part of the bottom of 0021 2bɨu 'ox, elephant' (synonym of 4704)

1909 is not a simple pictograph. It seems to contain 'not' (left), a Tangut derivative of 羊 'goat' (center), and a mysterious right-hand element whose function eludes me. 'Not' must be an abbreviation of some other tangraph.

Chinese 牛 *ŋgɨiw < *ŋʷəʔ and Tangut 1guʳ < *Nʌ-gur or *Tʌ-ŋgu are vaguely similar but difficult to relate. A zero grade Tangut derivative of the root *ŋʷʔ would be *2ŋu, not 1guʳ.

502: The analysis of this surname tangraph makes me wonder if there was a Yang family associated with sheep and birds.


5087 1ʔjø̃ 'Yang' =

center of 3452 2ʔje 'sheep' +

left of 2262 1dʐwɨõ 'bird' +

right of 2107 1tsɪʳ 'earth' 

I am not sure it is necessary to reconstruct a glottal stop before *j-. It is odd that Tangut had ʔj- but no simple j-, and I cannot account for the ʔ- in 1ʔjaʳ < *rjat 'eight' unless it is a remnant of a prefix.

5087 also transcribed the surname 王 *wɨõ 'Wang'. Another Tangut transcription was in 412:

0403 1võ (Chinese transcription character)

503: 2259 is a straightforward semantophonetic compound:


2259 2mɤe (the surname element Me) =

left of 2888 'surname' (semantic) +

center and right of 1966 1mɤe 'to call, greet' (phonetic)

2259 2mɤe did not have a nasal vowel like Chinese 孟 *mɤẽ 'Meng', but perhaps the Tangut thought it was appropriate to write a Chinese surname with a tangraph for a similar-sounding syllable in indigenous surnames such as

0493 2259 2sə 2mɤe 'Syme' and 2259 0714 2mɤe 1tʂɤew 'Mechew'.

504: 3951 is a phonosemantic compound used as a transcription of an unrelated Chinese name:


3951 1thu 'to talk' =

left of 3949 1thu (second syllable of 2kyʳ 1thu 'skill') (phonetic) +

right of 1045 2dạ 'speech' (semantic)

505: I could guess the analysis of this tangraph even before seeing it in Li Fanwen (2008: 339):


2042 2kɤa 'duck' =

left of 3058 2ɮəʳ' 'water' +

right of 2262 1dʐwɨõ 'bird'

Such transparent tangraphs are rare, which is why I continue to wonder how tangraphs were learned.

ALBANIAN 'SALT' FROM 'GROATS'?

Two months ago I was looking up the reflexes of Proto-Indo-European *seʕl- 'salt' and was surprised to see Albanian ngjelmët 'salty'. I've long been puzzled by how *s- became gj- (part 1 / part 2). Today I found Matasović's (2012: 14) reconstruction of the stages between *s- and gj- which are like mine from two years ago:

*s- > *ś- > *ź- > gj-

But where did the n- in ngj- [ɲɟ] come from? Orel (1998: 298) reconstructed Proto-Albanian *en-salma. What is the prefix *en- doing? Is it *en 'in' which is a verbal prefix (Orel 2000: 168)? Or is it another prefix? Wiktionary has a prefix *(a)n- without any attribution.

The unrelated Albanian noun kripa < *krūpā 'salt' (Orel 1998: 197) is a loan from ... Slavic 'groats' (e.g., Russian krupa)! What is the semantic bridge between 'salt' and 'groats'?

UMBROUS UMBRELLA

Umbrellas have been in the news lately. The Sino-Vietnamese (SV) reading of Cantonese 遮 ze 'umbrella' (< 'to obstruct') is già with an irregular huyền 'dark' tone. In Middle Chinese (MC), 遮 was *tɕja. Normally

MC *tɕ- corresponds to SV ch- [c]

MC *-ja corresponds to SV -a (which reminds me of my recent derivation of Tangut rhyme 20 -a from *-ia)

the MC 'yin level' tone corresponds to the SV ngang 'level' tone

so I would expect the SV reading of 遮 to be *cha with a ngang tone indicated by the absence of a tonal diacritic. However, the actual reading già on the surface not only has initial gi- [z] ~ [j] but also has a huyền tone implying a *voiced initial.
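The mismatch between the expected and actual readings can be spelled out mechanically. The following is a minimal Python sketch that encodes only the three correspondences stated above; the table and function names are mine, and the tables are deliberately tiny rather than a full model of Sino-Vietnamese.

```python
# Regular correspondences exactly as stated in the text; nothing else is assumed.
INITIAL = {"tɕ": "ch"}          # MC *tɕ- : SV ch- [c]
RHYME = {"ja": "a"}             # MC *-ja : SV -a
TONE = {"yin level": "ngang"}   # MC 'yin level' : SV ngang

def expected_sv(mc_initial, mc_rhyme, mc_tone):
    """Return the (syllable, tone name) predicted by the regular correspondences."""
    return INITIAL[mc_initial] + RHYME[mc_rhyme], TONE[mc_tone]

expected = expected_sv("tɕ", "ja", "yin level")  # the prediction for 遮
actual = ("già", "huyền")                        # the attested SV reading
mismatch = expected != actual
```

Here `expected` is `("cha", "ngang")`, so both the initial and the tone of the attested reading need an explanation.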

The initial gi- turns out to be regular. Annamese Middle Chinese (AMC)* *tɕ- was borrowed as Old Vietnamese (OV) *c- before all rhymes other than *-ja. (See further exceptions here.**) Both *k- and *c- voiced before *-j- in Old Vietnamese, merging into Middle Vietnamese (MV) [ɟ] and leniting to [z] or [j] in New Vietnamese (NV):

OV *kj- > *gj- > MV [ɟ] > NV [z] ~ [j]

OV *cj- > *ɟj- > MV [ɟ] > NV [z] ~ [j]

The spelling gi- reflects a *ɟ-like pronunciation in 17th century Middle Vietnamese.
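The two chains above can be read as three ordered changes applied to an onset. Here is a minimal Python sketch of that reading; the stage labels and the `trace` helper are my own names, and onsets that no rule mentions are simply passed through unchanged.

```python
# Ordered changes taken from the two chains above; each stage maps onsets
# to their outcome, and anything not listed is left alone.
RULES = [
    ("OV voicing before *-j-", {"kj": "gj", "cj": "ɟj"}),
    ("merger in MV", {"gj": "ɟ", "ɟj": "ɟ"}),
    ("lenition in NV", {"ɟ": "z~j"}),
]

def trace(onset):
    """Return the onset at each stage, starting from the OV input."""
    stages = [onset]
    for _name, mapping in RULES:
        onset = mapping.get(onset, onset)
        stages.append(onset)
    return stages
```

`trace("kj")` and `trace("cj")` end in the same value, which is the merger that the spelling gi- records, while `trace("c")` is unaffected because there is no *-j- to condition voicing.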

The voicing implied by Vietnamese tones reflects primary rather than secondary voicing: e.g.,

加 MC *kæ > AMC *kja > OV *kja > *gja > SV gia 'to add'

伽 MC *gɨa > AMC *kjà > OV *kjà > *gjà >  SV già 'transcription of Indic ga'

加 has a ngang tone reflecting its original *voiceless initial and not its secondary *voiced initial gj-.

Similarly, the huyền tone of 伽 reflects its original *g- rather than the new *g- that developed in OV.

I wondered if MC *tɕj- had become AMC *dʑj- with 'yang' tones, but MC *tɕj- non-'level' tone syllables have SV tones implying *voiceless initials:

者 MC *tɕjaˀ with 'rising' tone > AMC *tɕjả > OV *cjả > *ɟjả >  SV giả (not SV *giã) 'nominalizer'

蔗 MC *tɕjaʰ with 'departing' tone > AMC *tɕjá > OV *cjá > *ɟjá > SV giá (not SV *giạ) 'sugar cane'

If *tɕj- became AMC *dʑj- only in 'level' syllables, what would be the phonetic motivation for such a limited change? Why would 'oblique' (i.e., non-'level') tones be anti-voicing?

Original MC *dʑj- and MC *ʑj- apparently merged into AMC *tɕʰ- with 'yang' tones***: e.g.,

社 MC *dʑjaˀ > AMC *tɕʰjã > OV *cʰjã > MV [ɕã] > SV xã 'altar for the god of the soil'

蛇 MC *ʑja > AMC *tɕʰjà > OV *cʰjà > MV [ɕà] > SV xà 'snake'

cf. 車 MC *tɕʰja > AMC *tɕʰja > OV *cʰja > MV [ɕa] > SV xa 'cart' with an original *voiceless initial and ngang tone (i.e., a 'yin' tone)

The aspiration of OV *cʰ- might have blocked voicing before *-j-. Conversely, *-j- could have become voiceless after *cʰ-: *cʰj- > *cʰj̊-.

MC *dʑ- and MC *ʑ- merged into AMC *ɕ- with 'yang' tones when not followed by *-j-: e.g.,

臣 MC *dʑin > AMC *ɕə̀n > OV *sʰə̀n > MV [tʰə̀n] > SV thần 'minister'

神 MC *ʑin > AMC *ɕə̀n > OV *sʰə̀n > MV [tʰə̀n] > SV thần 'god'

cf. 申 MC *ɕin > AMC *ɕən > OV *sʰən > MV [tʰən] > SV thân 'ninth Earthly Branch' with an original *voiceless initial and ngang tone (i.e., a 'yin' tone)

Summing up the history of shibilants in SV (with some more details):

MC | AMC | OV | MV | SV
*kɤ- > *kɣ- > *kɰ- | *kj- | *kj- > *gj- | [ɟ] | gi-
*tɕj- | *tɕj- | *cj- > *ɟj- | [ɟ] | gi-
*tɕʰj- | *tɕʰj- | *cʰj- | [ɕ] | x-
*tɕ- | *tɕ- | *c- | [c] | ch-
*tɕʰ- | *tɕʰ- | *cʰ- | [ɕ] | x-
*dʑ- | *ɕ- | *sʰ- | [tʰ] | th-

I can't explain why there was a four-way merger of MC *tɕʰj-, *dʑj-, *ʑj-, and *ɕj- but only a three-way merger of MC *dʑ-, *ʑ-, and *ɕ-. Was there a three-way merger of *Cj-clusters in AMC and OV parallel to the other three-way merger?

MC | AMC | OV | MV | SV
*tɕʰj- | *tɕʰj- | *cʰj- | [ɕ] | x-
*dʑj- | *ɕj- | *sʰj- | [ɕ] | x-

I am reluctant to reconstruct aspirated fricatives in OV, but they allow me to formulate a single rule covering two changes:

OV *s(ʰ)- > MV [t(ʰ)]

Reconstructing palatal fricatives in OV forces me to formulate two rules:

OV *s- > MV [t]

OV *ɕ- > MV [tʰ]
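The difference between the two formulations is easy to see when each is written as a function. This is a rough Python sketch with function names of my own choosing; it treats onsets as plain strings and makes no claim about which reconstruction is right.

```python
def mv_single_rule(ov_onset):
    """One rule: OV *s(ʰ)- > MV [t(ʰ)]; any aspiration is carried over as is."""
    if ov_onset.startswith("s"):
        return "t" + ov_onset[1:]
    return ov_onset

# The alternative needs one entry per fricative because *ɕ- and [tʰ]
# share no symbol that a single pattern could carry over.
TWO_RULES = {"s": "t", "ɕ": "tʰ"}

def mv_two_rules(ov_onset):
    """Two rules: OV *s- > MV [t] and OV *ɕ- > MV [tʰ]."""
    return TWO_RULES.get(ov_onset, ov_onset)
```

Both versions yield [t] and [tʰ]; the first does so with one statement because the aspiration is already present in the input.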

I do not know of any modern Vietic language with [ɕ]. Then again, I do not know of any modern Vietic language with [sʰ] which is of course a rare sound in the world's languages.

I could reconstruct palatal stops instead of affricates in AMC or palatal affricates instead of stops in OV, but I presume that AMC had affricates like other Chinese dialects and OV had palatal stops like modern Vietnamese. There is no guarantee that was the case: e.g., AMC could have had palatal stops due to Vietnamese influence. (One could use a term like 'Annamese' to avoid the anachronism of 'Vietnamese' as a name for the early Vietic language of Annam.)

*AMC is the dialect of Middle Chinese that developed in Annam and later became extinct after the independence of Vietnam. See Phan (2013).

I write AMC tones using Vietnamese tone marks for convenience. I would not be surprised if the phonologies of the two languages had converged.

**炙 MC *tɕjaʰ 'to roast meat' corresponds to SV chả (< *cả) and chá (< *cá) with ch- [c] instead of gi-. The tone of SV chả indicates that it is an older loan borrowed before the convention of borrowing *tɕj- as *cj-. The tone of SV chá with a tonal reflex characteristic of newer loans may indicate that tonal borrowing patterns changed shortly before the convention of borrowing *tɕj- as *cj- with a *-j- that later conditioned the voicing of the preceding *c-.

***I am using 'yin' and 'yang' as shorthand for 'normally**** conditioned by voiceless initial' and 'normally conditioned by voiced initial'. Here are the six written***** Vietnamese tones and their 'yin/yang' status:

'yin' ngang sắc hỏi
'yang' huyền nặng ngã

The name of each tone contains its characteristic diacritic (or no diacritic in the case of unmarked ngang).
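Since each written tone is identified by its diacritic, the 'yin/yang' status of a Sino-Vietnamese reading can be looked up from its spelling. The Python sketch below does this with the standard `unicodedata` module by decomposing the syllable and searching for one of the five tone marks; the function names are mine, and vowel-quality marks such as the circumflex are ignored on purpose.

```python
import unicodedata

YIN = ("ngang", "sắc", "hỏi")
YANG = ("huyền", "nặng", "ngã")

# Combining marks that spell the five marked tones; ngang has no mark.
TONE_MARKS = {
    "\u0301": YIN[1],   # acute: sắc
    "\u0309": YIN[2],   # hook above: hỏi
    "\u0300": YANG[0],  # grave: huyền
    "\u0323": YANG[1],  # dot below: nặng
    "\u0303": YANG[2],  # tilde: ngã
}

def tone_of(syllable):
    """Return the tone name of a syllable written in Vietnamese orthography."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return YIN[0]  # no tone mark: ngang

def register_of(syllable):
    """Return 'yin' or 'yang' in the shorthand sense defined above."""
    return "yin" if tone_of(syllable) in YIN else "yang"
```

With this, `register_of("già")` is 'yang' while `register_of("giả")` and `register_of("giá")` are 'yin', which is the contrast the discussion of 遮, 者, and 蔗 turns on.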

****There are 'yang' tones in Chinese in syllables with *voiceless initials: e.g., standard Mandarin 國 guó < *kwək 'country' which has a 'yang level' tone even though it originally had a 'yin entering' tone.

*****Southern Vietnamese speakers merge the ngã tone with the hỏi tone, but that is not reflected in spelling which mostly reflects Middle Vietnamese.

AURAL DOUBLES (PART 2)

A recap of part 1: Tangut had two syllables with similar fanqie ('to hear' as an initial speller plus a rhyme 20 final speller):


3369 1mia 'transcription character for Skt ma, mā' and sixteen homophones = 5026 1mi 'to hear' + 3853 1tia 'topic marker'


5025 2mia 'transcription character for Skt mya' = top and bottom left of 5026 1mi 'to hear' + left of 5314 2ʔia 'transcription character for Sanskrit ya'

If both syllables were mia (disregarding tones), why was 3369 1mia used to transcribe Sanskrit ma and mā without -y-? And why create 5025 2mia as an 'aural double' of 3369 1mia etc. if 1mia was already a good match for Sanskrit mya?

The answer to both questions is the same: 3369 etc. were actually 1ma, not 1mia, so a special character had to be created to transcribe Sanskrit mya.
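The argument depends on how a fanqie spelling is read: the first speller supplies the initial and the second supplies the rest. A small Python sketch makes the consequence of revising rhyme 20 explicit. Syllables are represented here as (tone, initial, rhyme) tuples of my own devising, and I assume that the tone travels with the final speller, as the two examples above suggest.

```python
def fanqie(initial_speller, final_speller):
    """Combine the initial of the first speller with the tone and rhyme of the second."""
    _tone, initial, _rhyme = initial_speller
    tone, _initial, rhyme = final_speller
    return (tone, initial, rhyme)

hear = (1, "m", "i")  # 5026 1mi 'to hear', the initial speller

# Old reconstruction: rhyme 20 is -ia, so the final speller 3853 is 1tia.
old_3369 = fanqie(hear, (1, "t", "ia"))
# Revised reconstruction: rhyme 20 is -a, so the same speller is 1ta.
new_3369 = fanqie(hear, (1, "t", "a"))
```

`old_3369` comes out as 1mia and `new_3369` as 1ma; only the latter is a natural transcription of Sanskrit ma.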

But wait - if rhyme 20 was -a, then I can't reconstruct rhyme 17 as -a anymore. What was rhyme 17? To answer that question and the questions I asked at the end of part 1 -

Why did I reconstruct -i- in rhyme 20? Can this -i- be salvaged?

- I need to write about 'grades'. I've already covered the topic in "G-*r-adation in Chinese" (part 1 / part 2) and "G-*r-adation in Tangut" (part 1 / part 2), but I've changed my mind about a few things over the past day.

In the Yunjing rhyme tables for some unknown variety of Late Middle Chinese, a-type syllables were placed in four tables:

Grade \ Table | 27 | 28 | 29 | 30
I | *-ɑ | *-wɑ | |
II | | | *-ɤa | *-wɤa
III | | *-wɨɑ | *-ɨa |
IV | | | *-ia |

Vietnamese, Korean, and Japanese loans from Late Middle Chinese have /-(w)a/ for all of those rhymes, so their vowels must have been a-like. One could reconstruct a single Yunjing phoneme */a/ and compress the four tables into two:

Grade \ Table 27+29 28+30
I */-a/ */-wa/
II */-ɤa/ */-wɤa/
III */-ɨa/ */-wɨa/
IV */-ia/  

But why didn't the author of the Yunjing do that? I think it's because */a/ had two allophones, back *[ɑ] and central and/or front *[a]. The *[ɑ] rhymes were placed in tables 27 and 28, while the *[a] rhymes were placed in tables 29 and 30. I reconstruct these allophones on the basis of correspondences with standard Mandarin and Cantonese. (The latter two languages are probably not descendants of the Yunjing language, but their ancestors were probably similar to it.)
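The compression of four tables into two can be checked mechanically: if [ɑ] and [a] are treated as one phoneme and tables are paired by the absence or presence of -w-, no two rhymes should land in the same cell. The Python sketch below assumes the placement of rhymes described above ([ɑ] rhymes in tables 27 and 28, [a] rhymes in tables 29 and 30); the names `YUNJING` and `compress` are mine.

```python
# Yunjing a-type rhymes keyed by (grade, table): [ɑ] rhymes sit in
# tables 27-28 and [a] rhymes in tables 29-30.
YUNJING = {
    ("I", 27): "-ɑ", ("I", 28): "-wɑ",
    ("II", 29): "-ɤa", ("II", 30): "-wɤa",
    ("III", 28): "-wɨɑ", ("III", 29): "-ɨa",
    ("IV", 29): "-ia",
}

# Tables 27 and 29 hold the rhymes without -w-; 28 and 30 hold those with it.
MERGED_TABLE = {27: "27+29", 29: "27+29", 28: "28+30", 30: "28+30"}

def compress(tables):
    """Collapse [ɑ] and [a] into one phoneme /a/ and pair up the tables."""
    merged = {}
    for (grade, table), rhyme in tables.items():
        key = (grade, MERGED_TABLE[table])
        if key in merged:
            raise ValueError(f"two rhymes compete for {key}")
        merged[key] = rhyme.replace("ɑ", "a")
    return merged

merged = compress(YUNJING)
```

No collision is raised, so the two-table layout loses no contrast; the four-table layout therefore has to be recording something other than phonemic distinctions.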

Standard Mandarin

Grade \ Table 27+29 28+30
I [ɤ] after velars, [wɔ] elsewhere
II [ja] after *back initials, [a] elsewhere
III [ɤ] [ɥɛ]
IV [jɛ]  

Standard Cantonese

Grade \ Table 27+29 28+30
I [ɔː]
II [aː] [waː] after velars, [aː] elsewhere
III [ɛː] [œː]

The Cantonese pattern is quite clear:

- Grade I: back vowel

- Grade II: central vowel

- Grades III and IV: front vowels

The Mandarin pattern is complicated by these shifts:

*[ɔ] > [ɤ] after velars, [wɔ] elsewhere

*[wɨɑ] > *[wja] > *[ɥa] > *[ɥɛ]

*[ɛ] > [ɤ] after retroflexes

Sino-Vietnamese, Sino-Korean, and Sino-Japanese data for some non-a rhymes indicate that Grade IV was more palatal than Grade III (which may have been entirely nonpalatal in the source dialects of SK and the Go-on layer of SJ): e.g.,

Sino-Korean (premodern spelling)
Sino-Japanese (Go-on)
-ŏn after back initials; -yŏn elsewhere
-on < *-ən
-iên with palatalization of labial initials: *pʲ- > t-, etc.

Similarly, Mandarin Grade IV [jɛ] is more palatal than Grade III [ɤ].

All these diverse sources give us some idea of what the four grades in Chinese were like:

- I was backer than the others

- IV was more palatal than III

We do not know for sure that Tangut also had grades. I do not know of any Tangut term for 'grade'. However, patterns of correlation between Tangut rhymes and Chinese grades in transcriptions have been known for over half a century. Moreover, those patterns also correlate with Tangut initials.

Here is a new Tangut-internal definition of 'grades'. One could identify the grade of a Tangut rhyme by looking at which initials may precede it:

Grade \ Initial
dental stops

The table above is only a first approximation.

I classify rhymes which can be preceded by any initial as Grade III/IV. One could also consider such rhymes Grade V, though such a term would have no parallel in the Chinese tradition.

Compare that distribution of initials with the distribution of Chinese initials in Yunjing:

Grade \ Initial
*w- and labiodentals
dentals and


The two patterns are not identical, but there are similarities:

- Labiodentals and r- never appeared in Grade IV.

- Dentals and sibilants were in near-complementary distribution with shibilants.

- l- was infrequent in Grade IV.

I think these similarities were due to Chinese influence on Tangut. Of course, Tangut had its own history, which is why the parallels are not absolute: e.g.,

- Tangut had Grade I and II v- unlike Chinese (in which *w- became *ɣw- in Grades I and II - a change absent from Tangut).

- Tangut had Grade I r- unlike Chinese (in which *r- became *l-; Yunjing *r- is from *n-)

The distribution of initials in each grade tells us whether certain grades were 'friendly' or 'hostile' toward certain initials. Such 'attitudes' give us clues about the phonetic characteristics of both grades and initials. For example, the fact that shibilants never occur in Grade IV, the most palatal of the grades, tells us that they were not palatal in either the Yunjing language or Tangut. That is why I reconstruct retroflex shibilants. One can also make historical inferences from (near-)complementary distribution: e.g., Chinese shibilants derived from dentals and sibilants, and Tangut shibilants may have partly derived from dentals and/or sibilants.

Having established strong parallels between grades in the two languages, I used to think that Grade IV a was the same in Yunjing Chinese and Tangut: i.e., -ia. But I could not explain why Tangut rhyme 20 -ia

- transcribed Sanskrit -a and -ā (and nearly all rhyme 20 characters for Sanskrit -ya syllables were fanqie tangraphs combining part of an initial speller with the left side of the character transcribing Sanskrit ya: e.g., 5025)

- was transcribed as -a(H) in Tibetan

Now I think I have a solution:

Transcribed in Tibetan as
Transcribed Sanskrit
*-ɑ 17:
-a, -ā
*-ɤa 18: -ɤa
(no data)
*-ɨa 19: -ɨa
-a (rare; only after shibilants)
*-ia 20: -a
-a, -ā

If the Chinese dialect known to the Tangut was similar to the Yunjing language, it had four kinds of a-rhymes which were similar to Tangut rhymes 17-20.

The Grade IV a-type rhyme of late Tang Dynasty northwestern Chinese was transcribed in Tibetan as -ya, matching the *-ia I reconstructed for the Yunjing language. Maybe that rhyme was still *-ia in the eleventh century, and the Tangut thought its front (?) *a was like the front vowel of their rhyme 20.

The Tangut transcribed Sanskrit central a and ā - vowels absent from their language - with both back ɑ (rhyme 17) and front a (rhymes 18-20).

I suspect that rhyme 20 was once an *-ia that simplified to -a after all initials except glottal stop. Hence the rhyme 20 tangraphs

1767 1ʔia and 5314 2ʔia

transcribed Sanskrit ya and yā. There was no rhyme 20 *ʔa. The *i of pre-Tangut *-ia may have been conditioned by a preceding presyllable with a high vowel, as Japhug cognates identified by Guillaume Jacques (2006) lack i:

0335 1pha < *Cɯ-pha : J ɯ-phaʁ 'side'

1530 1ma < *Cɯ-ma : J smar 'river'

2098 2ŋa < *Cɯ-ŋa-H : J aʑo < *ŋa-jaŋ 'I' (also cf. Old Chinese 吾 *ŋa 'I')

4225 1sa < *Cɯ-sa : J kɤ-sat 'to kill' (also cf. Old Chinese 殺 *ksat 'to kill')

4459 2ba < *Nɯ-ba-H 'to cut': J kɤ-mbaʁ 'to be cut'

3926 and 4601 2na < *Cɯ-naH 'thou' and second person singular verb suffix

correspond to Old Chinese 汝 *Cɯ-naʔ 'thou'.

If Grade IV rhyme 20 lacked -i-, and Tangut Grade IV was characterized by frontness contrasting with the backness of Grade I, I can revise my vowel reconstructions as follows:

Vowel Front Central Back
Grade i e ə a u o
IV: fronter/higher i e < *ie
ə < *iə a < *ia
y < *iu
ø < *io
III: ɨ ɨi ɨe ɨə ɨa ɨu ɨo
II: ɤ ɤi ɤe ɤə ɤa ɤu ɤo
I: backer/lower ɪ ɛ ʌ ɑ u

That table is not as simple as its predecessor from four months ago, but it fits the Tibetan and Sanskrit transcription evidence better.

AURAL DOUBLES (PART 1)

I remain troubled by my reconstruction of Tangut rhyme 20 (1.20/2.17) as -ia. Let's look at the transcription evidence for (or should I say against?) the syllable 1mia from my last post:

1. In Pearl in the Palm,

0092 1mia 'mother'

was transcribed in 12th century northwestern Chinese as 麻 *mbɤa. Granted, there was no Chinese *mia, so this does not necessarily mean 1mia is wrong.

2. On the other hand, it is possible to write mya in the Tibetan script, and yet 0092 was transcribed eight times as ma. Moreover, all rhyme 20 syllables were consistently transcribed without -y-. The Tibetan evidence favors reconstructions of rhyme 20 like Arakawa's -a: and Sofronov's -a (see this table).

3. Moreover, rhyme 20 was often used to transcribe Sanskrit -a and -ā. That is another point in favor of Sofronov's -a. Sofronov did not reconstruct a length distinction in Tangut, whereas Arakawa did. I would expect Arakawa's length distinction to correspond to the length distinction of Sanskrit, but it doesn't: e.g., Arakawa's long -a: may correspond to Sanskrit short -a as well as long -ā, and vice versa. (10.23.1:09: Gong's length distinction that I used to carry over into my reconstruction also did not correspond to Sanskrit length:

Sanskrit Tangut rhyme Sofronov Arakawa Gong This site until recently This site now
a, ā 17 -a -a -a -a
(none) 18 -ɑ̂ -ya -ia -ɤa
(y)a 19 -jɑ -a: -ja -ɨa -ɨa
a, ā 20 -a -ia -ia
21 -â, -ä -ya: -jaa -ɨaa -ɨa'
(none) 22 -aˁ -a' -aa -aa -a'
23 -âˁ, -jaˁ, -äˁ -ya' -iaa -ææ -ɤa'
a, ā 24 -aɯ, -âɯ -a:' -jaa -iaa -ia'

Colors indicate length: pink = short, green = mixed, blue = long.

Rhyme 22 could not have been a simple -a or -aa, as it was never used to write Sanskrit. Rhymes 18 and 23 were also un-Sanskrit.)

If rhyme 20 were -ia, there would be no reason to create a special fanqie character


5025 2mia = top and bottom left of 5026 1mi 'to hear' + left of 5314 2ʔia 'transcription character for Sanskrit ya'

to transcribe Sanskrit mya, since one of the seventeen 1mia characters with the fanqie


5026 1mi 'to hear' + 3853 1tia 'topic marker'

would have been sufficient. However,

3369 1mia

actually transcribed Sanskrit ma and mā without -y-!

(10.23.0:33: One might think that 5025 was created for Sanskrit mya because the second tone was favored for Sanskrit words. But tones in Sanskrit transcription seem to be random: e.g.,

- ma was transcribed with both 3369 1mia and 4737 2ma

- mi was transcribed with 5026 1mi, the initial fanqie speller for 5025 and 3369

- Cya syllables were transcribed with both first and second tone tangraphs

I doubt tones in Tangut transcriptions of Sanskrit had anything to do with Vedic pitch accent which was absent from Buddhist Sanskrit.)

In Arakawa's (1997) Nishida-style reconstruction, the reason for 'aural doubles' - tangraphs with slightly different fanqie containing 5026 1mi 'to hear' - is clear: 5025 was 2myaɦ, whereas 3369 and its sixteen homophones were 1maɦ without -y-.

In Arakawa's own reconstruction, 5025 might be 2mya: contrasting with 3369 and sixteen other 1ma:. (Yet there is no 2mya: or 2ma: on pp.128-129 of Arakawa's 1997 syllabary, though there are seventeen 1ma:.)

Tangraph Li Fanwen number Sanskrit transcription value Nishida-style from Arakawa 1997 Arakawa 1997? Gong
This site

5025 mya 2myaɦ (2mya:?; not in his syllabary) 2mja

3369 ma, mā 1maɦ 1ma: (with long vowel for Skt ma!) 1mja
1mia (with -i- for -i-/-y-less Skt ma, mā!)

(10.23.1:46: I don't know how Sofronov would reconstruct 5025 and 3369 today. In 1968, he reconstructed them as 2ma and 1ma.)

At least everyone agrees that rhyme 20 was a-like, which is why I render it as -a in my lay transcription of Tangut.

Next: Why did I reconstruct -i- in rhyme 20? Can this -i- be salvaged?

WHY SO MIA-NY?

I have been writing about names of Kumārajīva lately (part 1 / part 2) such as Tangut

3948 3369 3284 2152 3284 (again!) 1kɨa' 1mia 2lɨa 1ʂɨi 2lɨa

The tangraph 3369 1mia in that name writes one of the rhyme 1.20 syllables in the Tangraphic Sea that I listed last week. Most were written with one or two tangraphs, but 1mia was written with seventeen! (For comparison I have also included the corresponding rising tone syllable 2mia with rhyme 2.17.)

Tangraph Li Fanwen number Reading Li Fanwen gloss Type (* = only in dictionaries)
0092 1mia mother (cf. 3334) free morpheme 1
0409 former times (only in dictionaries?; combines with regular word for 'day') bound morpheme 1*
1178 first half of 1mia 2nie 'end' (only in dictionaries; cf. 3369) free morpheme 1 in a compound 'end-tail'*
1215 first half of 1mia 2mɤe' 'to think of, to long for' (only in dictionaries) morpheme half 1*
1216 ten thousand (loan from Late Old Chinese 萬 *mɨanh 'id.'?) free morpheme 2
1458 second half of 2ni' 1mia 'salamander' (only in dictionaries) bound morpheme 2* after a Chinese loanword 鯢 'salamander'
1530 river free morpheme 3
1721 stirrup free morpheme 4
1803 first half of 1mia 1ɬiu' 'gray', name of an ancestor (only in dictionaries) morpheme half 2*, free morpheme 2*
2270 last syllable of (2mɪ) 2mɪ 1mia 'a kind of bird' (only in dictionaries) morpheme part 3*
2648 first half of 1mia 1khiu 'underground' (1khiu is 'under') bound morpheme 1
3334 female, woman (cf. 0092) free morpheme 1
3369 end, tail, east (only in dictionaries; cf. 1178); first syllable of 1mia 2ɬiụ 'plantain' and 1mia ?xa 'water buffalo'; transcription of Sanskrit ma, mā free morpheme 1*, morpheme half 1, morpheme half 2, (not in Tangut words)
3527 analogy; generally; doubt, fear (i.e., uncertain); and; few; should (i.e., to be time for), time; clothes free morphemes 5-11
3569 fishing hook free morpheme 12
3718 second half of 1ɣa 1mia 'doorframe' (1ɣa is 'door') bound morpheme 2
5118 second half of 1niu 1mia 'earring' (1niu is 'ear') bound morpheme 3
5025 2mia transcription of Sanskrit mya (not in Tangut words)

Why are there so many 1mia - and no native 2mia? The lower frequency of second tone syllables indicates that the source of the second tone must have been something extra which I reconstruct as a final glottal *-H by analogy with Chinese.

I reconstruct *Cɯ-ma(C) as the pre-Tangut source of 1mia. The high presyllabic vowel conditioned the breaking of the main vowel:

*C₁ɯ-ma(C₂) > *C₁ɯ-mɨa > *mɨa > 1mia

I don't know when the final consonant was lost relative to vowel breaking.

The various 1mia may have had different presyllabic and/or final consonants in pre-Tangut: e.g.,

*kɯ-map, *tɯ-mak, *pɯ-ma, etc.
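The chain above can be run as a small string rewriter. This is only a sketch in Python: the function name derive_1mia and the regular expression are hypothetical, the final consonant is restricted to p, t, k for the sake of the example, and the loss of that consonant is folded into the vowel-breaking step purely for convenience, since its real timing is unknown.

```python
import re

# Pre-Tangut shape assumed here: C1 + ɯ + "-" + m + a + optional final stop.
PRE_TANGUT = re.compile(r"(?P<pre>[^-]+ɯ)-ma(?P<coda>[ptk]?)")


def derive_1mia(source):
    """Return every stage from a pre-Tangut *Cɯ-ma(C) form down to 1mia."""
    match = PRE_TANGUT.fullmatch(source)
    if match is None:
        raise ValueError(f"not of the shape Cɯ-ma(C): {source}")
    # Stage 1: the high presyllabic vowel breaks a to ɨa. The final consonant
    # is dropped in the same step only for convenience (its timing is unknown).
    broken = match["pre"] + "-mɨa"
    # Stage 2: the presyllable is lost.
    bare = broken.split("-", 1)[1]
    # Stage 3: ɨa becomes ia; the result carries the first tone.
    return [source, broken, bare, "1" + bare.replace("ɨa", "ia")]


for form in ("kɯ-map", "tɯ-mak", "pɯ-ma"):
    print(" > ".join(derive_1mia(form)))
```

All three hypothetical sources end as the same 1mia, which is the point: distinct pre-Tangut words could have merged into one syllable.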

I count 24 types of 1mia:

17 in texts (not just dictionaries; pink):

12 free morphemes (0092 = 3334, 1216, 1530, 1721, 3527 [seven homophones!?], 3569)

3 bound morphemes (2648, 3718, 5118)

2 parts of polysyllabic morphemes (3369 [two homophones])

7 only in dictionaries (blue; possible 'ritual language' words and/or words that didn't happen to appear in Buddhist, Confucian, military, etc. texts: e.g., 'salamander'):

2 free morphemes (1178 = 3369, 1803)

2 bound morphemes (0409, 1458)

3 parts of polysyllabic morphemes (1215, 1803, 2270)

Green indicates a tangraph (3369) that represents one morpheme only in dictionaries and parts of words in texts.
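The tally is easy to get wrong by hand, so here is a short Python check. The grouping follows the breakdown above; the dictionary layout and the names TYPES and subtotal are hypothetical, chosen only for this illustration.

```python
# (where attested, morpheme type) -> {Li Fanwen number(s): number of types}
TYPES = {
    ("texts", "free"): {"0092 = 3334": 1, "1216": 1, "1530": 1,
                        "1721": 1, "3527": 7, "3569": 1},
    ("texts", "bound"): {"2648": 1, "3718": 1, "5118": 1},
    ("texts", "part"): {"3369": 2},
    ("dictionaries", "free"): {"1178 = 3369": 1, "1803": 1},
    ("dictionaries", "bound"): {"0409": 1, "1458": 1},
    ("dictionaries", "part"): {"1215": 1, "1803": 1, "2270": 1},
}


def subtotal(where):
    """Count the types attested in 'texts' or only in 'dictionaries'."""
    return sum(sum(counts.values())
               for (place, _), counts in TYPES.items() if place == where)


in_texts = subtotal("texts")
dictionary_only = subtotal("dictionaries")
total = in_texts + dictionary_only
print(in_texts, dictionary_only, total)
```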

Further analysis may be able to reduce the number of types of 1mia: e.g., the 1mia in 1458 2ni' 1mia 'salamander' may be 'river' and the 1mia in 4681 5118 1niu 1mia 'earring' may be 'hook'.

Although one could describe tangraphy as 'logography' (i.e., as a word-per-character writing system), 3527 might have represented up to seven unrelated words! Conversely, the word 1mia 'female' was written with two tangraphs (0092 and 3334) depending on whether it referred to mothers or females in general. And 1mia 'end' was written differently depending on whether it was an independent word (3369) or in the compound 1178 5734 1mia 2nie 'end-tail'.

10.22.1:54: A high degree of homophony is tolerable: e.g., English can can mean

1. to be able

2. a container

3. to place in a container

4. prison (if preceded by the?)

5. toilet (if preceded by the?)

6. to be ready for release (in the can)

7. to be released from employment (mostly passive: was/got canned?)

8. Canada (e.g., in Canwest)

and various other meanings I have never encountered. Context is sufficient to disambiguate these many uses.

None of those meanings are opposites. One might look up

1530 1mia and 2648 1mia

in Li Fanwen (2008) and think they are near-opposites ('river' and 'land'), but in fact the latter apparently only occurs in the disyllabic expression

2648 5399 1mia 1khiu 'underground'

and I suppose that is much more common than

1530 5399 1mia 1khiu 'under a river'

so there is little risk of ambiguity. (In Google, under a river has 8.74 million hits, which sounds like a lot, but underground has 335 million hits! And many references to under a river involve underwater construction that would have been unimaginable to the Tangut nearly a thousand years ago.)

'ZEN': A REMNANT OF TANGUT EMPIRE CHINESE?

KJ Solonin's article made me think about the Tangut name for Zen


3504 1ʂɨã =

all of 2833 2diẽ 'calm, quiet' (probably 'not' + top and bottom right of 'to move')

left of 5593 1bɤo' 'to look, watch, observe'

as well as the Tangut names of Kumārajīva (part 1 / part 2). 1ʂɨã is a borrowing from Tangut period northwestern Chinese 禪 *ʂɨã which in turn is from Late Old Chinese (LOC) *dʑian, a Sinified form of Pali jhāna- (< Sanskrit dhyāna 'meditation'). (Japanese Zen is from Middle Chinese *dʑien.) Coblin (1994: 323) reconstructed 禪 as *śan ~ *źan in the 9th and 10th centuries AD on the basis of these Tibetan transcriptions:

大乘中宗見解: shan, zhan

南天竺國菩提達摩禪師觀門: zhan, Hzhan

LOC *dʑ developed differently in premodern northwestern Chinese and in Mandarin in 'level' tone syllables:

Tone 'Level' 'Nonlevel'
Premodern northwestern Chinese > >
Mandarin ch [tʂʰ] sh [ʂ]

I don't understand the phonetic motivation for the split. Why were 'nonlevel' tones incompatible with a voiced affricate? (Voiceless affricates were possible before 'nonlevel' tones.)

Although modern northwestern Chinese generally has Mandarin-style reflexes of *dʑ, 禪 'Zen' still has a fricative initial in some varieties (Coblin 1994: 323):

Xining ʂã⁴⁴

Dunhuang ʂæ̃²⁴

Early 20th century Xi'an (as recorded by Karlgren): ʂæ̃ (tone unknown)

I thought these fricatives might be substratum retentions. I had either forgotten or overlooked this passage earlier in Coblin (1994: 101):

Occasional exceptions are found [to the Mandarin pattern of reflexes of *dʑ ...], e.g.[0678] (QYS źi̯än) "Zen Buddhism": [mid-Tang Chang'an] *dźan > *źan; CSZ [colloquial Suzhou] *śan (~ *źan?); XN [Xining]: ʂã⁴⁴; DH [Dunhuang]: ʂæ̃²⁴. These exceptional modern reflexes appear to derive directly from forms like those found in CSZ.

I looked for those "occasional exceptions" and found

蟬 LOC *dʑian 'cicada' is ʂæ̃²⁴ as well as tʂʰæ̃²⁴ (cf. standard Mandarin chan) in Xiaoxuetang's Xi'an data

辰 LOC *dʑin 'fifth Earthly Branch' is ʂɛ̃ (tone unknown) in Karlgren's Xi'an data (Coblin 1994: 361) and ʂẽ²⁴ as well as tʂʰẽ²⁴ (cf. standard Mandarin chen) in Xiaoxuetang's Xi'an data

This last graph has two Sino-Korean readings, chin (without aspiration!) and shin. The first reading may be an old borrowing from Early Middle Chinese *dʑin; the second is from Late Middle Chinese *ɕin.

The multiple Sino-Korean readings of 什 (in 鳩摩羅什 'Kumārajīva') may also be from different strata of borrowing: 집 chip from Early Middle Chinese *dʑip and 십 ship from Late Middle Chinese *ɕip. (집 chip becomes -jip with secondary voicing after a sonorant. That voicing is due to a Korean phonological rule and does not preserve the voicing of Early Middle Chinese *dʑip.)

A third Sino-Korean reading 습 sŭp is difficult to explain; it may be from a different Late Middle Chinese dialect in which *-ip became *-ɨp rather than vice versa.

The Xining reading of 禪 'Zen' also has an irregular 'yin level' tone (which would normally reflect an earlier *voiceless initial) instead of the expected 'yang level' tone (reflecting an earlier *voiced initial). I don't think the tone of 禪 'Zen' indicates that it had a voiceless initial in pre-Xining. I hypothesize that the original dialect of the region had a 'yang level' tone that sounded like the 'yin level' tone of the Mandarin dialect that displaced it.

If I am correct, then a study of irregular tones in Xining may reveal something about the substratal tone system. Unfortunately, it may not reveal the exact values of the tones at the time of borrowing because all tones - substratal and superstratal - may have changed since then. So I don't know if 44 was the 'yang level' tone contour in the substratum dialect.

It would be interesting if other modern northwestern dialects also have a seemingly 'yin level' tone for 禪 'Zen'.

Dunhuang only has one 'level' tone which may be a merger of earlier 'yin level' and 'yang level' tones.

I don't know the modern Xi'an reading of 禪 'Zen', but I do know that both the substratal fricative-initial and superstratal affricate-initial readings of 蟬 'cicada' and 辰 'fifth Earthly Branch' have 'yang level' tones in modern Xi'an. Were the tones of the substratal readings shifted to match the superstratal tones?

One last question: Why would northwestern Chinese retain an old word for 'Zen'? The answer probably has something to do with the religious history of the region.

I am reminded of how Japanese Buddhist terminology consists of Early Middle Chinese-based borrowings (呉音 Go-on) that were not displaced by Late Middle Chinese borrowings (漢音 Kan-on) during the Tang Dynasty: e.g., 禪 Zen was not replaced by a newer borrowing *Sen. (One might think that Zen Buddhism was practiced in Japan before the Tang Dynasty, but in fact it took root in the 12th century when 1ʂɨã 'Zen' was practiced in the Tangut Empire. An old reading Zen was used for a new school because of the strong association between Go-on and Buddhism in Japan.)

On the other hand, Korean Buddhist terminology generally consists of Late Middle Chinese borrowings: e.g., 禪/선 Sŏn 'Zen' probably replaced an earlier borrowing that would have become modern 전 *Chŏn. A rare exception is the 什 -jip in 鳩摩羅什/구마라집 Kumarajip. But that is not the most common reading of 鳩摩羅什. Here are Google frequencies for the three readings of the name:

구마라십 Kumaraship: 215,000

구마라집 Kumarajip: 21,900

구마라습 Kumarasŭp: 19,300

The newer reading 십 ship outnumbers the older reading 집 jip by nearly ten to one.
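For anyone who wants the arithmetic behind 'nearly ten to one', here is a minimal Python sketch. The hit counts are the ones listed above; the names HITS and ratio are hypothetical.

```python
# Google hit counts for the three readings of 鳩摩羅什, as listed above.
HITS = {
    "Kumaraship": 215_000,
    "Kumarajip": 21_900,
    "Kumarasŭp": 19_300,
}


def ratio(reading, baseline):
    """How many times more frequent one reading is than another."""
    return HITS[reading] / HITS[baseline]


ship_to_jip = ratio("Kumaraship", "Kumarajip")
print(round(ship_to_jip, 1))
```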

The older voiced affricate reading of 禪 'Zen' has left no trace in Sino-Vietnamese. The only Sino-Vietnamese reading of 禪 is Thiền from southern Late Middle Chinese *ʑien; there is no *Chiền from southern Early Middle Chinese *dʑien.

THE TANGUT NAMES OF KUMĀRAJĪVA (PART 2)

The third Tangut name of Kumārajīva shares no characters with the other two:

1429 4575 4710 4867 1kiew 2mo 1lo 1ʂɨəʳ

It is obviously based on Tangut period northwestern Chinese 鳩摩羅什 *kɨwmbɔlɔʂɨi from a 4th century *kumaladʑip.

As I mentioned yesterday, 1429 is also the transcription character for 鳩 in the Tangut translation of the Forest of Categories (Gong 2002: 438).

4575 and 4710 are also transcription characters for Sanskrit mo and lo (Arakawa 1997: 111).

4867 was also used to transcribe other Chinese characters pronounced *ʂɨi (十實失室) and 涉 *ʂɨa (Li 2008: 770). The retroflexion in Tangut may have reflected subphonemic vowel retroflexion in Chinese after retroflex affricates: /ʂi/ = [ʂɨiʳ] and /ʂia/ = [ʂɨaʳ].

In theory the name could have been borrowed in a more Sanskrit-like form as *kʊ ma raʳ dzi va via Tibetan kumaradziba [kumaradziwa] or directly from the variety of Sanskrit known to the Tangut which had [dz] for j. (My Tangut reconstruction has no rhyme -u. Retroflexion was almost always obligatory after r- in Tangut.)

I was curious to see how Kumārajīva was rendered in other languages. Judging from Wikipedia entry titles:

Czech Kumáradžíva preserves the long vowels.

Polish Kumaradżiwa [kumaradʐiva] has retroflex for Sanskrit palatal j [dʑ]. I would have expected *Kumaradziwa [kumaradʑiva] with palatal dz (pronounced like [dʑ] before i). The combination of retroflex and palatal i is unusual in Polish. I wonder if that i is pronounced [ɨ] as in the normal Polish combination ży [ʐɨ].

Ukrainian Кумараджива [kumaradʐɪva] has [ɪ] instead of [i]. I presume the spelling was taken from Russian Кумараджива [kumaradʐɨva].

Korean 쿠마라지바 [kʰumaradʑiba] has an un-Sanskrit (and English-influenced?) initial aspirate. I presume it is a modern term. Older names are 鳩摩羅什 Kumarasŭp/Kumaraship/Kumarajip (the last character is read three different ways) and 羅什 Nasŭp (with initial r- becoming n- before a-).

THE TANGUT NAMES OF KUMĀRAJĪVA (PART 1)

Having just written about Chinese transcriptions of Indic, I thought it was neat that I then stumbled upon KJ Solonin's tentative identification of

2152 3284 1ʂɨi 2lɨa

as a Tangut transcription of the name of Kumārajīva (1998: 411, 414 #80), translator of the Lotus Sutra and other Buddhist texts into Chinese. Kumārajīva's Chinese name was 鳩摩羅什, pronounced *kumaladʑip in the 4th century AD. In the Tangut period northwestern dialect of Chinese, it would have been read as *kɨwmbɔlɔʂɨi. If the two names are connected, the Tangut name might be an accidental inversion of

*3284 2152 2lɨa 1ʂɨi

corresponding to 羅什 *lɔʂɨi, an abbreviation of 鳩摩羅什 *kɨwmbɔlɔʂɨi. (This abbreviation was obviously created by a Chinese speaker, as a natural break in the Sanskrit would be between Kumāra 'boy, prince' and jīva 'life'.)

Unfortunately, the name 2lɨa 1ʂɨi only appears once in the text that Solonin translated. However, a transcription of the full name 鳩摩羅什 *kɨwmbɔlɔʂɨi does appear in the Hongchuan preface of the Lotus Sutra (Li 2008: 533; see Nishida 2004 on the Tangut Lotus Sutra):

3948 3369 3284 2152 3284 (again!) 1kɨa' 1mia 2lɨa 1ʂɨi 2lɨa

There are several things that are odd about this spelling.

First, 3948 1kɨa' is a poor match for Chinese 鳩 *kɨw. It is a transcription character for Sanskrit ka and kya (Arakawa 1997: 110, 116; Kychanov and Arakawa 2006: 692). In the Tangut translation of the Forest of Categories, *kɨw was transcribed as

1429 1kiew

which is a much better match (Gong 2002: 438). 1429 is also a transcription character for Sanskrit (?) kyu (Grinstead 1972: 111) and is the first character in a different transcription I'll examine tomorrow.

Second, 3369 1mia (rhyme 20) has an -i- that corresponds to zero in Chinese 摩 *mbɔ and Sanskrit and Tibetan ma (Arakawa 1997: 110, Kychanov and Arakawa 2006: 234).

Maybe I should follow Sofronov and Arakawa and stop reconstructing -i- in rhyme 20.

Third, 3284 2lɨa (rhyme 19) has an -ɨ- that corresponds to zero in Chinese 羅 *lɔ and Sanskrit la (Arakawa 1997: 110).

I have yet to see a fully satisfactory solution to the problem of reconstructed Tangut medials seemingly reflecting nothing in transcriptions of Chinese and Sanskrit.

Fourth, 3284 appears again, corresponding to zero in the four-syllable Chinese name. The first four syllables of this longer Tangut name are obviously based on Chinese (hence 2lɨa for 羅 *lɔ rather than *raʳ for Sanskrit ra). I would have expected a fifth syllable to be

2640 1pho

a transcription of Chinese 婆 *phɔ < *ba for Sanskrit va in longer Chinese names for Kumārajīva:

鳩摩羅什婆 *kɨwmbɔlɔʂɨiphɔ < *kumaladʑipba

鳩摩羅時婆 *kɨwmbɔlɔʂɨiphɔ < *kumaladʑɨba

鳩摩羅耆婆 *kɨwmbɔlɔtʂɨiphɔ < *kumalatɕiba

Having not seen the text where Li found this longer transcription, I don't know if this second 3284 is a typo (I doubt that, as even the Chinese translation has a doubled syllable: 鳩摩羅什羅) or in the original. Kychanov and Arakawa (2006: 692) do not list any words beginning with 3948. Maybe this longer name is a confused blend of *1kɨa' 1mia 2lɨa 1ʂɨi and the short inverted name 1ʂɨi 2lɨa.

At least 2152 1ʂɨi is a perfect match for Chinese 什 *ʂɨi, and is attested as a transcription of the last syllable of the name 李七什 *lɨi tshi ʂɨi (Li 2008: 356).

Next: Another Tangut name for Kumārajīva.

TESTING STAROSTIN'S 'LATE-RAL' SCENARIO

(I rhyme lateral [ˈlætəɹo] and scenario [səˈnæɹio]. 'Late-ral' is [ˈlejtəɹo] with a linking schwa to preserve the resemblance to [ˈlætəɹo].)

One of the biggest sound changes in Chinese was the loss of laterals:

Old Chinese *l- in type A syllables > Middle Chinese *d-

Old Chinese *hl- in type A syllables > Middle Chinese *th-

Old Chinese *l- in type B syllables > Middle Chinese *j-

Old Chinese *hl- in type B syllables > Middle Chinese *ɕ-

(The nature of the Old Chinese type A/B distinction is disputed, but the Middle Chinese initials are uncontroversial.)
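Those four correspondences amount to a lookup keyed on initial and syllable type. A minimal Python sketch follows; the names LATERAL_SHIFT and middle_chinese_initial are hypothetical, and the table covers only the four cases listed above.

```python
# (Old Chinese initial, syllable type) -> Middle Chinese initial
LATERAL_SHIFT = {
    ("l", "A"): "d",
    ("hl", "A"): "th",
    ("l", "B"): "j",
    ("hl", "B"): "ɕ",
}


def middle_chinese_initial(oc_initial, syllable_type):
    """Look up the Middle Chinese reflex of an Old Chinese lateral initial."""
    key = (oc_initial.strip("*-"), syllable_type)
    if key not in LATERAL_SHIFT:
        raise KeyError(f"no lateral shift listed for {oc_initial} in type {syllable_type}")
    return "*" + LATERAL_SHIFT[key] + "-"


for initial in ("*l-", "*hl-"):
    for syllable_type in ("A", "B"):
        print(initial, syllable_type, ">", middle_chinese_initial(initial, syllable_type))
```

The voiced and voiceless laterals behave in parallel: stops in type A, palatals in type B.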

In my last entry, I mentioned two conflicting chronologies for the lateral shift in Chinese. Schuessler (2009) reconstructed Middle Chinese-like initials (*j-, *ɕ-, *d-, *th-) in his Later Han Chinese (i.e., Eastern Han / Late Old Chinese), whereas Starostin mostly reconstructed transitional fricatives or laterals for that period:

Old Chinese syllable type Early Old Chinese Late Old Chinese Middle Chinese
Starostin Schuessler
A *l- (Starostin: *l- and dɮ-) *l- *d-
*hl- (Starostin: *tɬ-) *hl- *th-
A and B *r- *l-
B *l- *ʑ- *j-
*hl- (Starostin: *tɬ-) *ɕ-

(I use the same notation regardless of scholar for ease of comparison. I list Starostin's reflexes of his Early Old Chinese *tɬ- and *dɮ- because they correspond to *hl- and *l- in others' reconstructions. Starostin's EOC *hl- behaved differently from others' *hl-; it became Late Old Chinese and Middle Chinese *h- [= others' *x-]. For arguments against Starostin's lateral affricates, see Sagart 1999. I have included EOC *r- for comparison.)

To test Starostin and Schuessler's reconstructions of Late Old Chinese (LOC), let's look at Eastern Han transcriptions of Indic from Coblin (1983).

If Starostin is right:

- LOC *l- should transcribe Indic l

- LOC *hl- shouldn't be used in transcription because there was no Indic voiceless hl

- LOC *r- should transcribe Indic r

If Schuessler is right:

- LOC *d- from EOC *l- could transcribe Indic d

- LOC *th- from EOC *hl- could transcribe Indic th

- LOC *l- from EOC *r- should transcribe both Indic *l and *r (since LOC no longer had *r-)

Both would agree that LOC *ɕ- should transcribe Sanskrit ś [ɕ].

As I already noted last time, the correspondence of Starostin's *ʑ- / Schuessler's *j- to Indic y- [j] is ambiguous since Starostin would have said that *ʑ- was the closest available initial due to the absence of *j- in his LOC. Correspondences between this LOC initial and Sanskrit c-, j- [ɟ], ś- [ɕ], and s- suggest that it was "a fricative or affricate of some sort" (Coblin 1983: 63): e.g., Starostin's *ʑ-.

In the transcriptions of 安世高 An Shigao (mid-2nd c. AD) we find that:

- Indic d and even intervocalic -t- were transcribed with Starostin's LOC *l- / Schuessler's *d- (18, 19; the numbers are from Coblin 1983)

- Indic l was transcribed with Starostin's LOC *r- / Schuessler's *l- (13, 15, 28)

These patterns are not quirks of An Shigao; they can also be found in the transcriptions of 支婁迦讖 Zhi Loujiachen/Lokakṣema (mid-2nd c. AD; his name has 婁 Starostin's LOC *r- / Schuessler's *l- for Sanskrit l-) and 康孟詳 Kang Mengxiang (late 2nd-early 3rd c. AD). All three men were non-Chinese who settled in Luoyang, so their transcriptions probably represent the same dialect.

The only Indic th in An Shigao's transcriptions was transcribed with 替 whose EOC initial is ambiguous. It definitely had *th- in Middle Chinese and must have had *th- here. Starostin might have taken that as evidence for reconstructing  替 with *th- in EOC.

th is a low-frequency consonant, so it's not surprising that there are no instances of it transcribed with original or secondary *th-. (Oddly Lokakṣema transcribed it as the coda-onset sequence -t s- in 55.)

I conclude that the following chain shift had occurred in the Luoyang dialect of LOC by the mid-2nd century AD:

*r- > *l- (type A) > *d-

This is contrary to Starostin's 'late-ral' scenario in which the laterals hardened later.
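The comparison behind that conclusion can be reduced to a tally. The Python sketch below is a deliberate simplification: it keeps only the two An Shigao patterns noted above (Indic d and Indic l), ignores intervocalic -t- and the ambiguous th case, and uses hypothetical names (LOC_INITIAL, AN_SHIGAO, matches). It counts how often each scholar's LOC initial is identical to the Indic consonant it was used to transcribe.

```python
# LOC reflexes of two Early Old Chinese initials under each reconstruction.
LOC_INITIAL = {
    "Starostin": {"EOC *l- (type A)": "l", "EOC *r-": "r"},
    "Schuessler": {"EOC *l- (type A)": "d", "EOC *r-": "l"},
}

# (EOC initial of the transcription character, Indic consonant it renders)
AN_SHIGAO = [
    ("EOC *l- (type A)", "d"),
    ("EOC *r-", "l"),
]


def matches(scholar):
    """Count transcriptions where the LOC initial equals the Indic consonant."""
    table = LOC_INITIAL[scholar]
    return sum(table[eoc] == indic for eoc, indic in AN_SHIGAO)


for scholar in LOC_INITIAL:
    print(scholar, matches(scholar), "of", len(AN_SHIGAO))
```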

I also reconstruct a parallel change

*hl- (type A) > *th-

on the grounds that it would be odd if *hl- lagged behind its voiced counterpart *l-. Unfortunately there is no Indic transcription evidence for that.

Phonetic glosses such as

'聖 *hlieŋh (type B; > MC *ɕieŋʰ) is read like 通 *hloŋ (type A; > MC *thoŋ)' (Xu Shen 1063, b. in 召陵 Zhaoling 200 km SE of Luoyang, fl. c. 100 AD)

'天 *hlein (type A; > MC *then) read as 身 *hlin (type B; > MC *ɕin)' (Gao You 243, b. in 涿 Zhuo, fl. c. 200 AD)

indicate that *hl- did not harden in other LOC dialects during the early centuries of the first millennium AD. The glosses would not make sense if *hl- had already become *th-.

10.17.23:17: Some LOC glosses that seem bizarre might make more sense if we don't try to shove the words into the standard paradigm defined by the Chinese lexicographical tradition. For instance, perhaps Xu Shen pronounced 通 as something like *hliøŋ with a front diphthong similar to 聖 *hlieŋh. The expected Old Chinese reconstruction 通 *hloŋ is mechanically derived from Middle Chinese *thoŋ, whereas my hypothetical *hliøŋ would have vowel warping conditioned by a presyllable in an Old Chinese variant *Cɯ-hloŋ or *Hɯ-loŋ. Perhaps *Hɯ-loŋ was the earliest form which developed along two paths:

Early fusion: i.e., before conditioned vowel warping

*Hɯ-loŋ > *hloŋ > *thoŋ (Middle Chinese prestige form recorded in dictionaries)

Late fusion: i.e., after conditioned vowel warping

*Hɯ-loŋ > *Hɯ-luoŋ > *hluoŋ > *hlioŋ > *hliøŋ (> Middle Chinese *ɕyøŋ?; nonprestige and extinct?)

For more examples of variation between fused and unfused presyllables, see the discussion of Phan Rang Cham (Austronesian) and Ruc and Nha Heun (Austroasiatic) in Sagart (1999: 15-17).

TILTED TONGUE

On Monday, I wrote,

I wrote the pre-Tangut source of ld- as *L-. External evidence may help us identify what *L- was.

Last night I mentioned


3190 1ldwia 'tongue' = (4226 1ldwị + 0537 1pia) + 1223 2phɤo' (Mixed Categories of the Tangraphic Sea 11.122)

as one of the syllables with a fanqie including the mysterious additional character 1223.

1ldwia is probably related to the many l-words for 'tongue' in Sino-Tibetan: e.g.,

Old Chinese 舌 *mɯ-lat or *m(ɯ)-ljat (Baxter and Sagart 2014: *mə.lat)

also cf. 舐/舓/咶 *mɯ-leʔ or *m(ɯ)-ljeʔ (B&S 2014: *Cə.leʔ) 'to lick'

and perhaps 舔 *hlˁimʔ < *qlimʔ or *Hʌ-limʔ (? - I can't find any attestations before the 13th century AD; nonetheless it resembles lem-words for 'tongue' elsewhere in Sino-Tibetan and may be very old) 'to lick'

It is not possible to determine whether Middle Chinese *ʑ- in 'tongue' and 'lick' is from *mɯ-l- or *m(ɯ)-lj-. Coblin (1986) reconstructed medial *-i- for 'tongue' at the Proto-Sino-Tibetan level.

If the third word is related, and if the root was *√lj, then I can reconstruct

*m(ɯ)-lj-a-t (a-grade)

*m(ɯ)-lj-e-ʔ (e-grade)

It is tempting to reconstruct *m(ɯ)-lj-a-j-ʔ (a-grade), but the phonetics 氏 and 易 point to *e.

*qli-m-ʔ or *Hʌ-li-m-ʔ (zero grade; the *j of the root became *i if no vowel followed)

Classical Tibetan ljags /ldʑags/ < *n-ljaks (Jacques, "The laterals in Tibetan")

CT j is an affricate /dʑ/, whereas pre-Tibetan *j is a glide.

Although it would be nice if Tibetan had *m- like Chinese, *m-lj- would have developed into mj- /mdʑ/, not lj- /ldʑ/ (Jacques, "The laterals in Tibetan").

Written Burmese hlyā

I cannot explain the variation in final consonants (Old Chinese *-t and *-[m?]ʔ, pre-Tibetan *-ks, Written Burmese zero). I presume they are all suffixes.

The pre-Tangut source of 1ldwia must be a combination of the following elements:

ld- may be from a consonant prefix plus root *l-

-w- is from a labial prefix *P- (and that prefix might have combined with *l- to form ld-)

-i- is from *-j- and/or a presyllabic *-ɯ-

a final stop could have been lost without a trace

the tone indicates there was no final *-H

If the root was *√lj, that narrows down the possibilities.

The simplest reconstruction would be *m-lja whose *m- would combine with *l- to form ld- and condition the medial glide -w-.

A more complex reconstruction *P-N-lja would have separate sources of -d- and -w-.

Forms for 'tongue' in Horpa varieties seem to be from *P-lj-: fʑa, vɮɛ, etc. See STEDT and the rGyalrongic Languages Database (item #36).

According to Guillaume Jacques ("The laterals in Tibetan"), Li Fang-Kuei, Coblin, and Gong all reconstructed *n-l- as the source of Written Tibetan ld- (whereas Jacques reconstructed *d-l-, since his *n-l- became WT Hd- /nd/). Perhaps *N-l- similarly became ld- in Tangut. *N- may have been an *n- as in pre-Tibetan *n-ljaks 'tongue' or an *m- as in Chinese *m(ɯ)-ljat.

The only other word out of the eight I discussed yesterday that might have a cognate - with emphasis on might - is


0841 1ɬwiẹ 'oblique, slanting, inclined' = (2814 2ɬị + 3439 1piẹ) + 1223 2phɤo' (Mixed Categories of the Tangraphic Sea 12.122)

Before I go on to a possible cognate, I realize what 1223 is doing here and in various other cases. I think 1223 in such contexts means 'combine the initial of one syllable with a labial-initial syllable to form a syllable with medial -w-': e.g.,

1ɬiẹ + 1piẹ = 1ɬpiẹ > 1ɬwiẹ

Could this suggest that -w- was [v] or [β] and that Tangut labials lenited in coherent speech (as opposed to words pronounced in isolation): i.e., 1ɬiẹ 1piẹ was pronounced [ɬiẹ viẹ] or [ɬiẹ βiẹ]?

Another possibility is that labials were followed by a subphonemic glide [w]: e.g., 1piẹ /piẹ/ was [pwiẹ] and

1ɬiẹ + [1pwiẹ] = 1ɬwiẹ

There was no contrast between /P/ and /Pw/ in Tangut.

That does not explain the highly anomalous fanqie for 2417 (which does not have a labial-initial final speller; moreover, its final speller has a different rhyme with the wrong tone!):


2417 1ʂwɨọ 'to need, want' = (0245 2ʂwɨi + 1449 2tʂhwɨoʳ̣̣) + 1223 2phɤo' (Tangraphic Sea 55.222)

Moreover, 1223 is redundant in cases like the one above and


5679 1khwɤa 'remnants' = (2554 1khwɤe + 4314 1bɤa) + 1223 2phɤo' (Tangraphic Sea 26.211)

in which the initial speller has -w-. Perhaps this use of 1223 originated in fanqie for words like 0841 and was overextended.
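To see how far the 'labial-initial final speller supplies -w-' idea goes, here is a rough Python check over the eight fanqie containing 1223. The readings are my own (the 'this site' readings in the table of eight fanqie in "Bird Words"), the segmentation is crude string matching, and the names `FANQIE_1223`, `LABIALS`, and `classify` are hypothetical helpers for this sketch only.

```python
# Li Fanwen number -> (initial speller reading, final speller reading).
# Readings follow this site's reconstruction; the leading digit is the tone.
FANQIE_1223 = {
    "5679": ("1khwɤe", "1bɤa"),
    "1363": ("1swi", "1pia"),
    "2417": ("2ʂwɨi", "2tʂhwɨoʳ"),
    "1029": ("1kʊ̣", "1baʳ"),
    "0732": ("1ɬwi", "1piạ"),
    "3190": ("1ldwị", "1pia"),
    "2238": ("1ɬiə", "1pị"),
    "0841": ("2ɬị", "1piẹ"),
}

LABIALS = ("ph", "p", "b", "m")


def classify(initial_speller, final_speller):
    """Label one fanqie according to the hypothesis sketched above."""
    labial_final = final_speller[1:].startswith(LABIALS)  # skip the tone digit
    w_in_initial = "w" in initial_speller
    if not labial_final:
        return "anomalous"   # no labial to turn into -w-
    if w_in_initial:
        return "redundant"   # -w- is already in the initial speller
    return "needed"          # -w- can only come from the labial


results = {number: classify(*spellers) for number, spellers in FANQIE_1223.items()}
needed = [number for number, label in results.items() if label == "needed"]
anomalous = [number for number, label in results.items() if label == "anomalous"]
print(results)
```

On these readings, 1223 does real work in only three of the eight fanqie, which fits the guess that its use started in words like 0841 and was then overextended.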

Back to cognates: 0841 1ɬwiẹ could go back to *S-P-KE-la:

*S- conditioned the tense vowel

*P- conditioned -w-

*K- fused with *l- to form ɬ-

*-E- conditioned the raising and breaking of *a to ie

The root *la would be shared with Old Chinese 邪 'awry' *sla (spelled 斜 from the 2nd century BC onwards for 'slanted'). But it is not clear if 邪 had an *l-root.

First, other *l-less reconstructions of 邪 are possible: e.g.,

*sja (Schuessler 2009 and this site)

*sə.ɢA (B&S 2014, which reconstructs the left side 牙 of 邪 as *m-ɢˤ<r>a; Schuessler 2009 reconstructs *ŋrâ and I reconstruct *ŋra)

Second, the lateral phonetic 余 *la of the later spelling 斜 is not strong evidence for an *l-root if

- Baxter and Sagart's *sə.ɢa is correct

- *l- had shifted to *ʑ- by the 2nd century BC

- *sə.l-, *s-l-, *s-ɢ-, and *sə.ɢ- had merged into something like *sj- or *zj- (i.e., a *ʑ-like cluster) by the 2nd century BC

However, Starostin reconstructed a different chronology in which laterals remained lateral as late as the 2nd century BC (i.e., during the Western Han):

*lhia > 邪/斜 Western Han *lhia > Eastern Han *zhia

*dɮa > Western Han *la > Eastern Han *ʑa

Eastern Han transcriptions of Sanskrit y- are ambiguous. Starostin might have said that Chinese *ʑ- was used for Sanskrit y- because there was no *j-. On the other hand, Schuessler would say that Chinese *j- was used for Sanskrit y-.

BIRD WORDS

At the end of my last entry, I asked what 1223 was doing in this Tangut fanqie:


1363 1swia 'time' = (5323 1swi + 0537 1pia) + 1223 2phɤo' (Tangraphic Sea 29.132)

The analysis of 1223 2phɤo' 'gentle, harmonious, together, pair' is unknown, but it looks like 'bird' + 'word':


It is in eight fanqie in the first and third surviving volumes of the Tangraphic Sea. It might have been in the lost second volume as well.

Volume/Page/position | Li Fanwen number | Initial class | Rhyme | Reading (Nishida-style, Arakawa 1997) | Reading (this site) | Fanqie initial speller | Fanqie final speller | Gloss
1.26.211 | 5679 | V | 1.18 | 1khamba | 1khwɤa | 2554 1khwɤe | 4314 1bɤa | remnants (only in dictionaries?)
1.29.132 | 1363 | VI | 1.20 | 1špwaɦ | 1swia | 5323 1swi | 0537 1pia | time, transcription character for Chinese 宣 *swiã, *siu
1.55.222 | 2417 | VII | 1.48 | 1štšhor | 1ʂwɨo | 0245 2ʂwɨi | 1449 2tʂhwɨoʳ | to need, want
1.84.253 | 1029 | V | 1.80 | 1kwɑr | 1kwaʳ | 2503 1kʊ̣ | 5528 1baʳ | to cry, weep, sob
3.11.111 | 0732 | IX | 1.64 | 1hlwạ | 1ɬwiạ | 1770 1ɬwi | 5370 1piạ | ash, dust
3.11.122 | 3190 | IX | 1.20 | 1ɬwaɦ | 1ldwia | 4226 1ldwị | 0537 1pia | tongue
3.12.111 | 2238 | IX | 1.67 | 1hlwị | 1ɬwị | 0239 1ɬiə | 5212 1pị | the surname Lhwi
3.12.122 | 0841 | IX | 1.61 | 1lwɛ̣ | 1ɬwiẹ | 2814 2ɬị | 3439 1piẹ | oblique, slanting, inclined

What is the function of 1223? It can be translated into Chinese as 合 'together', the word used in Middle Chinese transcriptions of Sanskrit to indicate that two syllables were to be read as one: e.g.,

娑婆二合 *sa ba TWO TOGETHER for Sanskrit sva

One might expect 1223 to appear in fanqie for Sanskrit transcription characters, but it doesn't; in fact, one of the fanqie is for the basic word 3190 1ldwia 'tongue'. Why wasn't its fanqie simply

4226 1ldwị + 0537 1pia

without 1223? Fanqie are by definition combinations of initials and finals; wouldn't 1223 be redundant?

In any case, 1223 is not a carryover from the Chinese lexicographical tradition, since 合 does not appear in Chinese fanqie.

1223 is interpreted in at least three ways in Arakawa's Nishida-style reconstruction:

1. Read as a sequence of two syllables:

(1kĭɛ2 + 1mba) TOGETHER = 1khamba

This is the only disyllabic reading in Arakawa's Nishida-style reconstruction.

Why isn't the combination 1kĭɛ2mba or 1kamba (if the second rhyme is copied in the first syllable)?

2. Read as a combination of the initials of the two syllables and the rhyme of the second syllable:

(1sw + 1paɦ) TOGETHER = 1špwaɦ

(2ši + 2tšhɔr) TOGETHER = 1štšhor (not 2štšhɔr!)

3. Redundant in the other five instances which might as well be normal fanqie

The first two interpretations are highly unlikely. I don't know of any transcriptions of 5679. And I doubt Chinese 宣 *swiã and 修 *siu would have been transcribed with a very un-Chinese cluster špw-.

So that leaves the third interpretation, which is also unsatisfying. What, if anything, does 1223 indicate that distinguishes these eight syllables from all others in the Tangraphic Sea? I can't help but fear that the instances of 1223 in the lost second volume might not shed light on this mystery.

A PHONETIC KEY TO TANGRAPHIC SEA RHYME 1.20

Nearly fifty years have passed since the Russian translation of the Tangraphic Sea, and the Chinese translation of that dictionary turned thirty last year. An English translation would be nice but perhaps also redundant since Tangutologists should be able to read Russian and/or Chinese. Of course, English would be nice for many non-Tangutologists. What I would like to see (and make) is a Tangraphic Sea with reconstructed character readings. Since I have been writing about rhyme 20 syllables lately, here are the readings for the

rhyme 20 1sia 'to do (only in dictionaries?); transcription character for Chinese *sa, *sã and Sanskrit sa, sā'

entries in the first (level) tone* volume of the Tangraphic Sea. You can see the characters in Andrew West's online Tangraphic Sea. I have added the initial classes from Homophones. Groups are divided by circles in the original text.

Page/position | Initial class | Group | Reading | Fanqie initial speller | Fanqie final speller | Number of tangraphs
27.241-27.261 | I | 1 | 1pia | 2228 p- | 1216 | 6
27.262-28.111 | I | 2 | 1phia | 0797 ph- | 0618 | 4
28.112-28.211 | I | 3 | 1mia | 5026 m- | 3583 | 17
28.212-28.221 | III | 4 | 1tia | 5300 t- | 4620 | 2
28.222 | III | 5 | 1thia | 5671 th- | 4620 | 1
28.231 | III | 5 | 1nia | 0635 n- | 3179 | 1
28.232 | V | 5 | 1kia | 1484 k- | 3179 | 1
28.233-28.241 | V | 5 | 1gia | 2900 g- | 1693 | 2
28.251-28.262 | VI | 6 | 1tsia | 3031 ts-/dz- | 1693 | 5
28.271 | VI | 7 | 1tshia | 3278 tsh- | 1693 | 1
28.272-29.111 | VI | 7 | 1sia | 4250 s- | 1693 | 2
29.112-29.121 | VIII | 8 | 1ʔia | 5346 ʔ- | 4620 | 2
29.122 | IX | 9 | 1ldia1 | 0475 ld- | 3583 | 1
29.131 | IX | 9 | 1ldia2 | 4464 ld- | 2019 | 1
29.132 | VI | 9 | 1swia | 5323 sw- | 0537, followed by 1223 | 1
29.141-29.142 | VI | 9 | 1tshwia | 0311 tshw- | 1289 | 2
29.143 | IX | 10 | 1lwia | 2302 lw- | 1825 | 2

The initial classes are in nearly the same order as Homophones except that some class VI tangraphs break up a group of class IX tangraphs.

The absence of classes II (v-) and VII (retroflex shibilants) is a trait of Grade IV rhymes.

Class IV (ɲ-?) is rare.

Some groups divided by circles correlate with homophone groups (e.g., 1-4), but others don't: e.g., the fifth group is a mixture of class III and V syllables.

Fanqie initial speller 3031 is ambiguous (see "When Rhyme 21 Is Really Rhyme 20" and "When 1825 Is Really 1829"). I would not expect 3031 to represent dz- here, since dz-tangraphs were placed in the Mixed Categories volume of the Tangraphic Sea.

I see now that I mixed up the fanqie of 1829 and 1825 (as well as those characters themselves) last week. Great. For the record, the correct fanqie are


1829 'to heat up, burn' 1tshia = 3278 1tshi + 1693 1sia (Tangraphic Sea 28.271)


1825 1tshwia 'to roast, warm up' and 5041 1tshwia 'stove, furnace' =

0311 1tshwiə + 1289 1lwia (Tangraphic Sea 29.141-29.142)

1825 is from 1829 with a prefix *P- in addition to the *Kɯ- that conditioned aspiration and vowel breaking:

*Kɯ-tsa > 1829 tshia

*P-Kɯ-tsa > 1825 tshwia
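The two derivations can be restated as a toy function. This is only a sketch of the two prefix effects named above (*Kɯ- conditioning aspiration and breaking, *P- conditioning -w-), assuming a root that ends in a single vowel letter. The function name `derive` and the hyphen-splitting convention are my own shorthand, and nothing here generalizes beyond these two words.

```python
def derive(pre_tangut):
    """Apply the two prefix effects described above to a form like '*P-Kɯ-tsa'."""
    *prefixes, root = pre_tangut.lstrip("*").split("-")
    initial, vowel = root[:-1], root[-1]  # assumes one final vowel letter
    medial = ""
    if "Kɯ" in prefixes:
        initial += "h"        # *Kɯ- conditions aspiration ...
        vowel = "i" + vowel   # ... and breaking of *a to ia
    if "P" in prefixes:
        medial = "w"          # *P- conditions medial -w-
    return initial + medial + vowel


print(derive("*Kɯ-tsa"))
print(derive("*P-Kɯ-tsa"))
```

A bare root like *tsa passes through unchanged, which matches the unprefixed shape behind Tibetan tsha (leaving aside its secondary aspiration).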

(The bare root is in Tibetan tsha 'hot' whose initial aspiration is secondary. More cognates here.)

5041 is presumably an extended use of 1825 (i.e., 'where food is warmed up', 'device for heating').

In theory one might expect only one fanqie final speller for all rhyme 1.20 syllables or two (one for -ia and another for -wia), but in fact there are ten! That does not mean there were ten subtypes of rhyme 1.20 syllables. Nearly all of those ten can be linked in a complex fanqie tree:

3179 0618
3583 2019

Members of that tree are in pink in the first table. (I have colored 0537 somewhat differently since it is followed by 1223. I will write about 1223 in my next entry.)

I placed 1693 at the root since its fanqie final speller is ... itself! 1693 is the final speller of 3179 and 0618, 3179 is the final speller of 4620 which is the final speller of 3583 and 2019, etc.

The final spellers 1289 and 1825 for -wia form a closed circle. 1289 is the final speller of its final speller 1825 (see above for the fanqie of 1825).


1289 1lwia 'lower limbs, legs' = 2302 1lɨə + 1825 1tshwia (Tangraphic Sea 29.143)
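The tree and the closed circle can both be found mechanically by following each final speller to its own final speller. Here is a small Python sketch with only the links I spelled out in prose above; the remaining final spellers are left out because I did not state their links. `FINAL_SPELLER`, `chain`, and `loop` are names I made up for this illustration.

```python
# tangraph -> its fanqie final speller (Li Fanwen numbers), as stated above.
FINAL_SPELLER = {
    "1693": "1693",  # spells itself
    "3179": "1693",
    "0618": "1693",
    "4620": "3179",
    "3583": "4620",
    "2019": "4620",
    "1289": "1825",
    "1825": "1289",
}


def chain(tangraph):
    """Follow final spellers from a tangraph until one repeats."""
    seen = []
    while tangraph in FINAL_SPELLER and tangraph not in seen:
        seen.append(tangraph)
        tangraph = FINAL_SPELLER[tangraph]
    return seen


def loop(tangraph):
    """Return the self-spelling root or closed circle that a chain ends in.

    Assumes the tangraph is a key of FINAL_SPELLER.
    """
    path = chain(tangraph)
    return path[path.index(FINAL_SPELLER[path[-1]]):]


print(chain("3583"))
print(loop("2019"))
print(loop("1289"))
```

Every -ia speller listed here ends at 1693, while the two -wia spellers only ever reach each other.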

I don't know why 1363 swia wasn't spelled with either 1289 or 1825:


1363 1swia 'time' = 5323 1swi + 0537 1pia + 1223 2phɤo' (Tangraphic Sea 29.132)
Next: What is 1223 doing in that fanqie?

10.14.21:21: The numbers at the ends of

1ldia1 'to come' and 1ldia2 'to return, transport'

indicate that they were treated as nonhomophonous (heterophonous - why isn't that word used more in linguistics?) in the Tangraphic Sea (and in Homophones!) even though their fanqie seem to indicate they are homophones. Their final spellers belong to the same tree (see above), and the initial speller of 1ldia2 is derived from the initial speller of 1ldia1. See "Come Again?" for details.

"1.20" in the title of this post refers to tone one, rhyme 20.

The Tangraphic Sea volume for the second [rising] tone has been lost. Rhyme 2.17 is the rising tone counterpart of rhyme 1.20. The rhyme numbers do not match since not all level tone rhymes have rising tone counterparts and vice versa: e.g., 1.6, 1.13, and 1.16 lacked rising tone versions. Arakawa (1997) lists rhyme 1.20 and 2.17 tangraphs side by side.

THE COMING CLAN

Yesterday I reconstructed a Tangut word for 'come' with ld-. Other words for 'come' have the same fanqie initial speller (0475), so they can also be reconstructed with ld-:

3456 1ldia < *Cɯ-La 'to come'

*C- might be the *S- conditioning vowel tension (indicated with a subscript dot) in the words below. *Sɯ- could have been lost after the vowel conditioned breaking (see below) but before *S- could condition tension.

*ɯ normally conditions the breaking of *a to ɨa after *l-. Did *a break to ia after *L-?

4106 1ldɨə̣ < *S-Lə 'to come'

2373 1/2ldɨẹ < *Sɯ-La/ə-j(-H) 'to come'

The root vowel is ambiguous.

The Precious Rhymes of the Tangraphic Sea has two entries for this character, one in the level tone volume and the other in the rising tone volume. Although there are other characters with two readings, I don't know of any other case in which the two readings only differ in tone.

5727 1ldɨə̣ < *S-Lə 'to transport, come' (homophone of 4106; cf. how 3456 is nearly homophonous with 3502 'to transport', written as a mirror image of 5727 and derived from it)


I wrote the pre-Tangut source of ld- as *L-. External evidence may help us identify what *L- was. There are many Sino-Tibetan words for 'come' with l-; at least one (Mandarin 來 lai < *mʌ-rək) is not related to the others. Do the Tangut words belong to this clan of l-words? If so,

- do the other languages preserve a root-initial l- that gained a prefix in Tangut?

cf. how *d-l- became ld- in Tibetan (Jacques, "The Laterals in Tibetan")

- or does Tangut preserve a cluster reduced to l- in other languages?

- or are both Tangut ld- and non-Tangut l- from a third source in Proto-Sino-Tibetan?

COME AGAIN?

(23:09: The title refers to this idiom and to the fact that 3456 'come' is followed by 3502, another Tangut character containing it in Homophones.)

After two steps backward ... one step forward ... I hope.

In my last post, I mentioned


3456 1lia (Grade IV) 'to come' = 0475 1liu (Grade IV) + 3583 1tia (Grade IV)

which has no homophones: it is in the isolated liquid-initial section of Homophones (A edition, 55A54).

Right below it in Homophones (A edition, 55A55) is


3502 1lia (Grade IV) 'to return, transport' = 4464 1lɨə̣ (Grade III) + 2019 1thia (Grade IV)

which looks like 3456 'come' plus 'hand' and is derived from all of 'come' and the left side of 5727 1lɨə̣ 'transport, come' (also containing 'hand' and 'come' in reverse order) in Tangraphic Sea:


I have followed Gong who reconstructed 3456 and 3502 as homophones in spite of the fact that they are isolates. It would also be hard to distinguish them in context since both are motion verbs. But if they weren't homophones, what was the difference between them?

Could they have had different initials? Their initial spellers are of different grades (III and IV). So perhaps 3456 had Grade IV [l] whereas 4464 had Grade III velarized [ɫ]. If they had identical finals, I would have to posit a phonemic distinction between /l/ and velarized /ɫ/. Sofronov (1968 II: 308) reconstructed 3456 as 1la and 4464 as 1lda. But how could there be such a distinction if the two initial spellers were part of the same fanqie chain?


4464 1lɨə̣ (Grade III) = 0475 1liu (Grade IV) + 1493 siə̣ (Grade IV)

(There was no /ɨə̣/ : /iə̣/ distinction; the quality of the first vowel was dependent on the initial.)

Tai (2008: 201) reconstructed the initial of that chain as ld- since it was transcribed in Tibetan as ld- (11 times) and zl- (3 times), but never as a simple l- (Tai 2008: 198). That initial was transcribed in late 12th century northwestern Chinese as *l-, which is not necessarily evidence for reconstructing Tangut l-. Chinese *l- would have been the best available substitute for an un-Chinese *ld-. (There was no *d- in that Chinese dialect.) Hence there seem to have been two kinds of 1ldia.

I cannot reconstruct either 3456 or 3502 with -w- since the fanqie do not contain such a medial. The final spellers were transcribed in Tibetan without -w- (Tai 2008: 210):

3583: ta (37 times)

2019: tha (9 times)

3583 was also used to transcribe Sanskrit ṭa, ta, and without -v- (Sanskrit had no -w-).

The Chinese transcriptions 怛 *ta and 達 *tha for 3583 and 2019 lack *-w-.

None of the transcription evidence supports the -i- required by my Grade IV hypothesis or Gong's -j-. Sofronov's (2012) -a is much more likely for rhyme 20 which he regarded as Grade I, not IV. The l- from earlier in this post would be unusual before a Grade IV rhyme but normal before a Grade I rhyme. Sofronov (2012) sometimes reconstructed more than one value for a single Tangut rhyme, but rhyme 20 was not one of them. At this point I can only combine Tai's ld- with Sofronov's 1-a and be agnostic about the difference between the two 1lda-like syllables (3456 and 3502).

WHEN 1825 IS REALLY 1829

What's worse than having to publicly correct a mistake on a blog? Having to publicly correct that correction!

Andrew West pointed out that the correct fanqie for Tangut character 3371 (and its homophones 0596 and 1283) is

3371, 0596, 1283 1dzia = 3031 + 1829 (not 1825!)

I got the idée fixe that


was the final speller and didn't notice that 1829, with the same left-hand radical 'fire' and a similar right-hand radical, was the one in the fanqie of the handwritten copy of the Tangraphic Sea in Wenhai yanjiu (1983) or Arakawa's Seikago tsūin jiten (Tangut rhyme dictionary, 1997).

Notice that I have not supplied readings for 3031 or 1829.

I have already explained why 3031 is ambiguous, and I will add one more complication here:

- 3031 is the initial speller for 3371, 0596, and 1283 which are in the Mixed Categories of the Tangraphic Sea. For some reason, all dz-, dʐ-, and ɬ-syllables were placed in Mixed Categories along with a seemingly random smattering of other syllables. That suggests 3371, 0596, 1283 had dz-.

- On the other hand, 3031 is in the 'rising' tone volume of Precious Rhymes of the Tangraphic Sea instead of the Mixed Categories volume. That implies 3031 did not have dz-.

The fanqie for 1829 indicates -w- ... or does it? There is no transcription evidence for the -w- of 1829, its final speller 1289 1lwia or 0259 1lwia, the only homophone of 1289. -w- is an attempt to account for why 1289 1lwia is not in the same homophone group as

3456 1lia 'to come'

whose Chinese transcription 辢 *la has no *-w-. Then again, that transcription is not ironclad proof 3456 didn't have -w-, because the Chinese known to the Tangut had no syllable *lwa. (Nonetheless, a Tangut lwia could have been transcribed in Chinese as 辢 *laCLOSED with a small 合 'closed (mouth)' diacritic to indicate -w-.) 1289 and 3456 had the same initial (l-) and rhyme (1-ia), so they presumably had different medials (-w- and zero).

If 1289 was 1lwia, then 1829 was 1tshwia, and 3371, 0596, and 1283 were 1dzwia ... which conflicts with the use of 0596 as a transcription character for Sanskrit ja without -v- (there is no -w- in Sanskrit).

Let's suppose that 3371, 0596, and 1283 were 1dzia without -w- and that their fanqie final speller 1829 was 1tshia without -w-. 1829 and 1825 were in different homophone groups even though they had the same initial (tsh-) and rhyme (1-ia), so they presumably had different medials (zero and -w-). But if 1825 was 1tshwia, why was it transcribed in Tibetan as tsha instead of tshwa? Was the subscript -wa character accidentally omitted?

This is so frustrating. I want to end on a more positive note. Andrew West recently created an online Homophones lookup tool. You can input the Li Fanwen 2008 numbers I use for tangraphs to see that

- 3371, 0596, 1283 1dz?(w)ia are in the same homophone group (31A46-48; all Homophones numbers here are from the A edition; different editions have different numbers)

- 1829 1tsh(w)ia (the final speller for those three syllables) and 1825 1tsh(w)ia (which I confused with 1829) are in different homophone groups (31B36 [which has no homophones] and 33A13-14 [a set of two homophones: 5041 and 1825])

- 1289 1lwia (the final speller for 1829) and 3456 1lia are in different homophone groups (53B78-54A11 [a set of two homophones: 1289 and 0259] and 55A54 [which has no homophones])
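The three observations above amount to a simple lookup. The sketch below is not Andrew West's tool, just a hand-copied table of the Homophones (A edition) positions cited in this post, with hypothetical helper names `HOMOPHONES_A`, `same_group`, and `homophones_of`.

```python
# Li Fanwen number -> Homophones A-edition group, as cited in the list above.
HOMOPHONES_A = {
    "3371": "31A46-48",
    "0596": "31A46-48",
    "1283": "31A46-48",
    "1829": "31B36",
    "1825": "33A13-14",
    "5041": "33A13-14",
    "1289": "53B78-54A11",
    "0259": "53B78-54A11",
    "3456": "55A54",
}


def same_group(first, second):
    """True if two tangraphs are listed in the same homophone group."""
    return HOMOPHONES_A[first] == HOMOPHONES_A[second]


def homophones_of(tangraph):
    """Other tangraphs in the same group (empty for an isolate)."""
    group = HOMOPHONES_A[tangraph]
    return sorted(
        other
        for other, other_group in HOMOPHONES_A.items()
        if other_group == group and other != tangraph
    )


print(same_group("1829", "1825"))
print(homophones_of("1289"))
print(homophones_of("3456"))
```

Whatever the right reconstruction is, it has to keep 1829 and 1825 apart and keep 1289 apart from 3456, since this table says so independently of any fanqie.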

Alas, Homophones does not give any concrete information about the homophone groups beyond their initial classes: e.g., 3371, 0596, 1283, 1829, and 1825 belong to the sixth class (alveolars) and 1289 and 3456 belong to the ninth class (liquids). The Tangraphic Sea lists homophone groups organized by rhyme with fanqie, but fanqie for most 'rising' tone syllables are lost, and readings for fanqie spellers are dependent on a mixture of transcription evidence and educated guesswork (e.g., the reasoning for reconstructing -w- above).

WHEN RHYME 21 IS REALLY RHYME 20

(10.11.18:25: Formerly titled "Tangut Grade III -a('): Rhymes 19 and 21 (Part 2)", but I changed the title since this entry has nothing to do with either rhyme apart from my confusion of rhymes 20 and 21.)

If you don't want to constantly make a fool of yourself in public, don't blog about Tangut.

For the past couple of days, I've been reconstructing 3371 as 1dzɨa' with Grade III rhyme 21 which would be unusual after dz-, but its fanqie in the Mixed Categories of the Tangraphic Sea clearly indicates that it has Grade IV rhyme 20 which is normal after dz-:


3371 1dzia = 3031 2dzi + 1825 1tshwia (sic; should be 1829!)

Even this corrected (?) reading remains problematic for several reasons.

First, the initial might be ts-. The evidence is ambiguous:

1. There is no fanqie for 3031, the initial speller of 3371.

2. 3031 was used to transcribe

Chinese characters with *ts-readings

Sanskrit ci (pronounced [tsi] in the variety of Sanskrit known to the Tangut, probably via Tibetan which had [ts] for Sanskrit c).

3. 3031 was transcribed in Tibetan as both Hdza and Htsa. The phonetic value of H- is uncertain: it could have represented prenasalization or a voiced back fricative.

4. Another character

1290 2?-ew 'ordinal suffix, class, limitation'

with 3031 as a fanqie initial speller was transcribed in Tibetan as tsa, tsi(H), gtsiH, and gdzi(H).

5. 3371 was homophonous with

0596 'to grow'

a transcription character for Sanskrit ja (pronounced [dza] in the variety of Sanskrit known to the Tangut, probably via Tibetan which had [dz] for Sanskrit j).

Second, it would be odd for a -wia graph (1825; sic - should be 1829!) to be a fanqie final speller for -ia without -w-. But it would also be odd for Sanskrit ja [dza] to be transcribed as dzwia instead of dzia.

The Tibetan transcription of 1825 is tsha, not tshwa. So maybe 1825 lacked -w- after all. And maybe it lacked -i- as well. A Sofronov-style reconstruction of 1825 as 1tsha may be best. But then how can one explain the different fanqie for the other 1tsha (or 1tshia) in the Tangraphic Sea?


1829 'to heat up, burn' 1tshia = 0311 1tshwiə + 1289 1lwia

(10.14.20:00: This is actually the fanqie for 1825!)

Maybe 1829 had -w- and 1825 and its homophone

5041 'stove, furnace'

did not. Their fanqie has no -w- in either speller:


1825 and 5041 1tshia = 3278 1tshi + 1693 1sia (used to transcribe Sanskrit sa)

(10.14.20:00: This is actually the fanqie for 1829!)

I will revise my reconstructions accordingly:

Tangraph | Sofronov 1968 | Li Fanwen 1986 | Gong | Nishida-style reconstruction in Arakawa 1997 | This site
1829 | 1tsha | 1tsha | 1tshja | 1tshaɦ | 1tshwia (formerly 1tshia)
1825 | 1tshwa | 1tshɛ | 1tshjwa | 1tshaɦ² | 1tshia (formerly 1tshwia)

(10.14.20:04: No, judging from the corrected fanqie, Sofronov and Gong were right to reconstruct -w- in 1825 and 5041! Which means that the equation below is still 'broken' or 'unbalanced', depending on your preference in metaphors.)

Plugging that revised reconstruction of 1825 back into the fanqie at the beginning of this post results in a balanced equation:

3371 1dzia = 3031 2dzi + 1825 1tshia

The two homophones of 3371 listed in Mixed Categories of the Tangraphic Sea share that fanqie and should also be read as 1dzia:

0596 'to grow' and 1283 'stomach' (attested only in dictionaries)

This entry demonstrates how errors and their corrections can cause chain reactions in Tangut reconstructions.

I have eliminated one type of apparent anomaly in rhyme 21: the combination of an alveolar initial dz- with the Grade III medial -ɨ-. But other anomalies remain, and I will examine them in future entries.

TANGUT GRADE III -A('): RHYMES 19 AND 21 (PART 1)

Last night I mentioned the words (phrases?)

3371 0378 1dzɨa' 2ʔʊ 'curled hair' and 3371 1144 1dzɨa' 2dị 'bun (of hair)'

and noted that their first syllables had an anomalous initial-rhyme combination. (No, actually they don't!)

3371 has the Grade III rhyme 21 (= 1.21/level tone rhyme 21 and 2.18/rising tone rhyme 18). (10.10.20:01: The true rhyme of 3371 is 20.) Here are the latest reconstructions of that rhyme and its immediate neighbors in the first rhyme cycle:

Rhyme Tibetan transcription Gong 1997 Arakawa 1999 Sofronov 2012 This site
Grade Rhyme Grade Rhyme Grade Rhyme Grade Rhyme
17 -a(H) I -a I -a I I -a
18 (none) II -ia II -ya II -ɑ̂ II -ɤa
19 -a(H) III -ja IIIa -a: III -jɑ III -ɨa
20 IIIb I -a IV -ia
21 III -jaa IV -ya: II/IV -â/-ä III -ɨa'
22 -ang (sic!) I -aa I -a' I -aˁ I -a'
23 -ar II -iaa II -ya' II/III/IV -âˁ/-jaˁ/-äˁ II -ɤa'
24 -a(H) III -jaa III -a:' I/II -aɯ/-âɯ IV -ia'
25 -am I -ã I -an I -an I -ã
26 (none) II -iã II -yan II -ân II -ɤã
27 III -jã III -a:n III/IV -jan/-än III/IV -ɨã/-iã

In Gong's reconstruction, there is no Grade III/IV distinction, and many rhymes are redundant: e.g., rhymes 21 and 24. Hence Gong regarded

3371 1dzjaa (rhyme 21; = my 1dzɨa') 'hair worn in a bun; peak' and 4075 1dzjaa (rhyme 24; my 1dzia') 'thrifty'

(10.10.20:02: 3371 should be 1dzia with rhyme 20.)

as homophones in spite of their placement into different rhymes and homophone groups in the Tangraphic Sea. They are not homophonous in the other three reconstructions.

In Arakawa's reconstruction, rhyme 21 is the only Grade IV rhyme, and it has a combination of the -y- of his Grade II and the vowel length of his Grade III.

Sofronov's reconstruction is very different from all others: e.g., it has Grade II and Grade IV variants of rhyme 21. Sofronov reconstructs five subtypes of a-rhymes corresponding to three subtypes in the other reconstructions.

In my reconstruction, Grade III rhymes are characterized by medial -ɨ- and are distinct from Grade IV rhymes with -i-. Grade III and IV rhymes typically have different initials:

III: v- (= w- in most other reconstructions), shibilants (tʂ-, tʂh-, dʐ-, ʂ-, ʐ-), l- (cf. Grade II which occurs with shibilants but not sibilants or r-)

All of these initials are associated with Grade III in the Late Middle Chinese (LMC) of the rhyme table tradition. (So are many other LMC initials other than sibilants and *ɣ-.) In LMC, Grade III was nonpalatal and Grade IV was palatal. Assuming that the Tangut carried over that distinction into their analysis of their own language, Tangut Grade III initials must have been nonpalatal. Tangut l may have been velarized [ɫ].

IV: all other initials (cf. Grade I which occurs with all non-shibilants)

However, this correlation between grade and initial is not absolute: e.g., 1dzɨa' has a dz- that normally should precede a Grade IV rhyme. Hence the distinction between medial /ɨ/ and /i/ is phonemic as well as phonetic, and the Tangut created separate rhyme categories whenever the medial could not be predicted on the basis of the initial. Minimal pairs like 3371 and 4075 above necessitated the separation of rhymes 21 and 24. (10.10.20:05: 3371 1dzia [not 1dzɨa'!] and 4075 1dzia' actually differ in terms of the presence or absence of the mysterious feature that I write as -', not in terms of medials.)
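The default expectation can be written down as a lookup. This is only a sketch of the typical pattern in my reconstruction, not a rule the Tangut stated anywhere. `GRADE_III_INITIALS`, `expected_medial`, and `is_unexpected` are illustrative names.

```python
# Initials that typically precede Grade III rhymes in this reconstruction.
GRADE_III_INITIALS = {"v", "tʂ", "tʂh", "dʐ", "ʂ", "ʐ", "l"}


def expected_medial(initial):
    """Default medial: -ɨ- (Grade III) after the initials above, else -i- (Grade IV)."""
    return "ɨ" if initial in GRADE_III_INITIALS else "i"


def is_unexpected(initial, medial):
    """True if a reconstructed medial departs from the default for its initial."""
    return medial != expected_medial(initial)


print(expected_medial("ʂ"))
print(expected_medial("k"))
print(is_unexpected("dz", "ɨ"))
```

A syllable flagged by `is_unexpected` is exactly the kind of case that would force a separate rhyme category, because its medial cannot be read off the initial.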

On the other hand, I presume all medials in rhyme 27 were nondistinctive (and predictable?*) as suggested by the mixture of Grade III and IV in this rhyme 27 fanqie:


1ʂɨã (Grade III) = 2ʂɨu (Grade III) + 1kiã (Grade IV!)

Hence there was no need to create separate rhyme categories for -ɨã and -iã syllables.

I'll start looking at the unpredictable medials of rhymes 21 and its -'-less counterpart 19 this weekend.

*It is possible that -ɨ- and -i- were completely interchangeable in rhymes like 27: e.g.,

1ʂɨã ~ 1ʂiã (cf. Grade III rhyme 36 1ʂɨe; there is no Grade IV rhyme 37 *1ʂie)

1kɨã ~ 1kiã (cf. Grade IV rhyme 37 1kie; there is no Grade III rhyme 36 *1kɨe)

It is also possible that rhyme 27 had only one medial (-ɨ- or -i-) after all initials, so all rhyme 27 syllables were Grade III or IV.

It is not possible to choose between these alternatives at this point. It might be more accurate to write the medial of rhyme 27 with an algebraic symbol like -I-. However, I have already used that symbol to represent a lost unstressed presyllabic vowel conditioning the raising and fronting of pre-Tangut *a to i. I assign medials to rhyme 27 syllables following the general pattern: -ɨ- after shibilants (there are no v- or l-rhyme 27 syllables) and -i- after other initials.

WHIP = TSU + SHARP + ?

If 0219 2tseʳw 'whip' has three sources, the first two might be one of three tangraphs with a TSU-type reading and 3767 1reʳw 'sharp, pointed end':


What might be the third? There are nine tangraphs sharing a right side with 0219 that I didn't cover last Saturday:

LFW2008 | Reading | LFW2008 gloss | Class(es)
0054 | 1tswa | hair worn in a bun or coil | HAIR
0375 | 1ka | second syllable of 2phʊ 1ka 'boots worn in rain or snow' | HAIR (fur boots?)
0378 | 2ʔʊ | second syllable of 1dzɨa' 2ʔʊ 'curled hair' | HAIR
1144 | 2dị | second syllable of 1dzɨa' 2dị 'bun (of hair)' | HAIR
2279 | 1swa | second syllable of 2siọ 1swa 'a kind of grass' | SWA
4021 | 1swa | second syllable of 1niu 1swa 'ear ornament' | SWA
4045 | 1swa | hair | HAIR, SWA
4371 | 1dạ | second syllable of 2me 1dạ 'hair' | HAIR
5133 | 2rieʳ | wool, feather, fine hair | HAIR

All of the above characters either represent (parts of) words for hair or syllables homophonous with 1swa 'hair'. So 2tseʳw 'whip' is either 'TSU + hair' or 'TSU + sharp + hair'.

Two of the above characters (0378, 1144) are only attested after


3371 1dzɨa' 'hair worn in a bun or coil; peak (< like a bun of hair on the top of the head?)' = 2750 1ɣɤu 'head' + 1lwʊ̣ 'to mix, blend'

They may be adjectives modifying 1dzɨa'.

Both the structure and pronunciation of 3371 are odd to me. (10.10.20:15: because I reconstructed 3371 incorrectly! It should be 1dzia with a Grade IV rhyme, not 1dzɨa' with a Grade III rhyme.) I wouldn't describe a bun or coil as mixed and blended hair. And Grade III rhymes with -ɨ- normally don't follow alveolars. I will take a closer look at -ɨa' tomorrow.

WERE TANGUT WHIPS SHARP?

On Sunday I concluded that the left side of 0219 2tseʳw 'whip' might be an abbreviation of some tangraph with a TSU-type reading, though I admit the phonetic match is poor:


2tseʳw 'whip' < left of 1tshwiu, bottom left of 2dziu', or right of 2dʐwɨiw?

I also identified the rest of 0219 as being from

2061 2pɤẹ̃ 'hair'

as a whole. And on Saturday I used Google to demonstrate that whips are associated with hair in English, though of course there is no guarantee the Tangut also had such an association.

2061 of course consists of two components. Maybe each of those components in 0219 2tseʳw 'whip' is from a different source. Let's look at eleven possible sources of

the center of 0219:

LFW2008 | Reading | LFW2008 gloss | Class(es)
1146 | 2kạ | tattered | TATTER
1964 | ?ɬə | smooth | LHY, SMOOTH
2434 | 1bie | to mend, patch | BE, TATTER (i.e., to fix tatters)
2600 | 1miaʳ | hair | HAIR
3088 | 1bie | second syllable of 2bə 1bie 'dung beetle' | BE
3089 | 1tʂɨọ | ugly | UGLY
3090 | 2ɬọ | first syllable of 2ɬọ 2ɬwi 'ugly and old'; can it stand alone? | UGLY
3558 | 2pɤẹ̃ | first syllable of 2pɤẹ̃ 2ba 'flattery' | BE
3767 | 1reʳw | sharp, pointed end | SMOOTH (left and center from 1963 'smooth')
4330 | 1ʔị | ladle, scoop | I (bottom center and right from 3101 2ʔị 'to repeat')
4817 | ?ɬə | plane for carpentry | LHY

I have excluded five tangraphs containing 2061.

The classes can be grouped into three families:


SMOOTH > LHY > (UGLY if 2ɬọ had a ?ɬə tangraph as phonetic)


The last is an unusual case, as the shape of the bottom center component of 4330 1ʔị 'ladle' does not match its source 3101 2ʔị 'to repeat' in its Tangraphic Sea analysis:


The source of the top and bottom left of 4330 1ʔị 'ladle' is 4368 2dwʊ 'chopsticks'.

Among these characters, the best candidate for a source of 0219 'whip' is 3767 1reʳw 'sharp, pointed end'. I wish I knew more about Tangut material culture. Did Tangut whips have sharp ends?

THE APPEARANCE OF ANGER

Two of the Tangut words in yesterday's table

0924 2niạ 'anger, rage' and 0996 2mə 'appearance, spirit'

were borrowings from Chinese 惱 'angry' and 模 'pattern' according to Li Fanwen (2008: 156, 167).

The first etymology would work only if there was a pre-Tangut prefix *Sɯ- of unknown function (!) added to *nawʔ from Middle Chinese *nawˀ. The *S- of the prefix conditioned vowel tension (indicated by a subscript dot) and the high vowel of the prefix conditioned the -i- in the main syllable:

*Sɯ-nawʔ > *Sɯ-nɨawʔ > *Sɯ-nɨaɯʔ > *S-nɨaɯʔ > *nnɨaɯʔ > *ṇɨaɯʔ > *ṇɨ̣ạɯ̣ʔ > 2niạ

The relative chronology of changes is not entirely clear, though *a-breaking must have preceded *ɯ-loss and *S-tension.
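The ordering argument can be illustrated with a toy derivation. This is only a sketch: it models just two of the changes in the chain above (breaking of *a after the high-vowel prefix, and loss of the prefix vowel), and the function names and regular expression are hypothetical simplifications, not a full model of pre-Tangut.

```python
import re

def a_breaking(form):
    # *a > *ɨa, but only while the high prefix vowel *ɯ is still there
    # to condition it (assumed environment: *ɯ-n_).
    return re.sub(r"(ɯ-n)a", r"\1ɨa", form)

def prefix_vowel_loss(form):
    # *Sɯ- > *S-
    return form.replace("Sɯ-", "S-")

def derive(form, rules):
    # Apply the rules one after another, in the order given.
    for rule in rules:
        form = rule(form)
    return form

start = "*Sɯ-nawʔ"
breaking_first = derive(start, [a_breaking, prefix_vowel_loss])
loss_first = derive(start, [prefix_vowel_loss, a_breaking])

print(breaking_first)  # breaking applied before the prefix vowel was lost
print(loss_first)      # breaking never applies: its trigger is already gone
```

If the prefix vowel is lost first, nothing is left to trigger the breaking, so the medial never arises. That is why breaking has to be ordered before the loss.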

I once thought Tangut rhymes ending in the algebraic symbol -' (corresponding to what I used to reconstruct as long vowels) once had final consonants:

-V' (= -VV) < *-VC

If that were the case - and I don't think it was* - then the absence of -' in 0924 2niạ would not rule out an earlier final consonant (i.e., *-w) since -' could not occur with tense vowels. This complementary distribution is a clue to the identity of -' which had to have some phonetic characteristic that was incompatible with tense vowels.

The second etymology is highly improbable because Middle Chinese 模 *mo 'pattern' should correspond to Tangut *2mʊ, not 2mə. (See Gong 2002: 413 for examples of MC *-uo : Tangut -u which is equivalent to MC *-o : Tangut -ʊ in my reconstruction. I regret not including the raising of *-o to *-ʊ in pre-Tangut.)

There are isolated instances of the correspondences

Tangut : Japhug rGyalrong -u < *-o, -ɯ < *-u

in Jacques (2006), but the general pattern is clear:

Tangut (= Jacques' -u) : Japhug rGyalrong -u < *-o, -ɯ < *-u

2mə 'spirit' may be an unrelated homophone of 2mə 'pattern' that was written with the same character.

The Precious Rhymes of the Tangraphic Sea analyzed the graph 0996 for 2mə as being from


the top of 1365 and the bottom of 4744 2ʔiõ 'appearance' (a loan from Middle Chinese 樣 *jɨaŋʰ or Tangut period northwestern Chinese *jõ).

Li may have been tempted to derive 2mə from Middle Chinese 模 *mo 'pattern' since the word appears with the clarifying character 4744 in Homophones:

2ʔiõ 2mə

He translated that collocation as 模樣 'pattern' which would have been read as *mo jɨaŋʰ in Middle Chinese - a near-mirror image of 2ʔiõ 2mə! I think this resemblance is coincidental. In Tangut period northwestern Chinese, 模樣 was something like *mbʊ jõ which would have been borrowed into Tangut as *bʊ 2ʔiõ. (Tangut tones for Chinese loans are unpredictable, so I have not indicated the hypothetical tone of the first syllable.)

The analysis of 0924 2niạ 'anger, rage' is unknown. Perhaps it was from the top and bottom left of 0948 1na 'to steal' (phonetic) plus 'demon' (semantic) extracted from one of forty-nine different possible characters:


('Demon' has left-hand and right-hand forms which are interchangeable in tangraphic analyses.)

None of the other 'demon' characters mean 'anger', so none stand out as more likely sources than others.

*My old -V' < *-VC hypothesis would not predict Tangut-Japhug rGyalrong comparisons such as these from Jacques (2006):

'nose': 5700 2ni' (not *2ni) : J sna

'needle': 4935 1ɣa (not *1ɣa') : J ta-qaβ

Correlations between Tangut -' and Japhug final consonants in sets such as

'fruit': 2436 1mia' : J sɯ-mat

may be coincidental.

WHAT PLUS 'HAIR' EQUALS 'WHIP'?

If the center and right components of

0219 2tseʳw 'whip'

are from

2061 2pɤẹ̃ 'hair',

what is the source of the left-hand component


None of the 69 other characters with that component are a plausible semantic match for 2tseʳw 'whip' which may belong to the TSU phonetic class:

LFW2008 Tangraph Reading LFW2008 gloss Class(es) Class codes
0009 1ʂwɨo to appear; to raise (< 'cause to appear'?) APPEAR S1
0020 1tʂɨa road, way (literal and metaphorical: 'manner'); to lay bricks CHA, ROAD P1, S2
0029 2rɪʳ market RIR P2
0033 1tshị road, way ROAD S2
0051 1thaʳ' obvious APPEAR S1
0060 1dõ street ROAD S2
0130 2thʊ source, resources TU? P3?
0486 2paʳ horse with white trotters PAR P4
0503 1tʂɨa the surname Cha CHA P1
0745 2vɨe the surname Ve VE P5
0752 1tʂɨa ceremony, courtesy CEREMONY, CHA S3, P1
0753 2vɨe face VE P5
0760 2dʐɨe to judge, decide JE P6
0924 2niạ anger, rage NA P7
0948 1na to steal, rob NA P7
0996 2mə appearance, spirit APPEAR S1
1003 1lew full, filled, satisfied not HOLLOW?, LU? (but analysis has 1630 2dziẽ 'carve'!) S4
1026 1tʂwɨa the name Chwa; luck CHA P1
1071 2dziu' first half of 1071 1226 'to hide, conceal' HIDE, TSU? S5, P8?
1082 2riʳ second syllable of surnames ending in Rir RIR P2
1094 2ʐɨə to go without a burden GO S6
1226 ?T- second half of 1071 1226 'to hide, conceal' HIDE, TU? S5, P3?
1360 1va to hide, conceal HIDE S5
1364 1ŋa hollow, void HOLLOW, NGA S4, P9
1578 1swiə ear EAR S7
1588 1tʂɨa sheep guardian god CHA, SHEEP P1, S8
1630 2dziẽ to carve CARVE, JE S9, P6
1641 2dʐɨa lamb CHA?, SHEEP? (but analysis has 1043 1lew 'full') P1?, S8
1651 1tshwiu to salute CEREMONY, TSU? S3, P8?
2663 1kwiə̣ to kowtow, worship on bended knees CEREMONY S3
2755 2lwəʳ the surname Lwyr LWYR P10
2972 1ŋa to spread; Grinstead: 'empty' HOLLOW?, NGA S4?, P9
3049 1xwaʳ to melt, thaw; to confess (< 'melt down' and release information?) XA, SPEAK P11, S10
3575 2ni to listen, hear EAR S7
3579 2kie impressive and dignified, eminent APPEAR (i.e., prominent?), CEREMONY? S1, S3?
3689 1lʊ' to dig LU P12
3731 1khɤu' to milk KHY P13
3813 2vɨẹ to see someone off VE P5
3821 2lʊ to enjoin; to tell; to give a present CEREMONY?, LU, SPEAK S3?, P12, S10
3828 1tʂɨə to give a present; to enjoin; to tell; to know CEREMONY?, CHA, SPEAK? (but no CEREMONY, CHA, or SPEAK graph in analysis which has 3813 2vɨẹ 'to see someone off') S3?, P1?, S10?
3874 1ʔiə hunger HOLLOW (lacking food) S4
3920 1kiụ to bow, salute CEREMONY S3
4110 2paʳ awning, shed PAR P4
4153 2lɨiw to gather, assemble; transcription character LU? P12?
4170 1dza to chisel CARVE S9
4185 2ʂɨa musk SHA P14
4201 ?kha casket, small box XA? P11?
4469 2ʂɨi to go toward, depart GO S6
4475 ?xa to puff, blow; transcription character XA P11
4534 2dʐwɨiw hungry HOLLOW (lacking food; but analysis has 0130 'source')?, TSU? S4?, P8?
4677 2bə bull BA P15
4681 1niu ear EAR, NU S7, P16
4682 2khiə' chimney, window, hole, space HOLLOW, KHY S4, P13
4696 1bạ cymbals BA, CYMBAL P15, S11
4744 2ʔiõ appearance, shape; transcription character APPEAR S1
4758 1tsiə big cymbals CYMBAL S11
4761 1ʂwɨa to speak, say SHA, SPEAK P14, S10
4762 1tʂhɨe to go, walk GO S6
4766 2bə a kind of vegetable BA P15
4768 1ʂwɨa ambition, will SHA P14
4812 2rioʳ to brush, wipe, whisk RIR? P2?
4822 2dzwiə to go, walk GO S6
4849 1niu the surname Nu NU P16
4894 1mio to listen, hear EAR S7
5126 1lɨu' to carve, engrave CARVE, LU S9, P12
5412 2lwəʳ ceremony, rite; to get a haircut; transcription character CEREMONY, LWYR S3, P10
5693 1vɪʳ to listen, hear EAR, VE S7, P5
6010 1kiụ to bow, salute (= 3920) CEREMONY S3

I have numbered phonetic (P) classes by order of first occurrence in the table above. Class names are in my lay romanization for Tangut which ignores the four grades, vowel tension, and the unknown distinction indicated by -'. Y represents central nonlow vowels.

Phonetic classes organized by Homophones chapter

Chapter Initial type Phonetic class
I Labials P4. PAR, P15. BA
II Labiodentals P5. VE
III Dentals P3. TU, P7. NA, P16. NU
IV Retroflexes (none)
V Velars P9. NGA, P13. KHY
VI Alveolars (no pure VI classes) P6. JE, P8. TSU
VII Alveopalatals (actually retroflex shibilants?) P1. CHA, P14. SHA
VIII Glottals P11. XA
IX Liquids P2. RIR, P10. LWYR, P12. LU

Some of the phonetic classes could be combined (P4. PAR + P15. BA, P1. CHA + P14. SHA, P10. LWYR + P12. LU).

P6 and P8 might be split, as I am not certain that mixing class VI and VII initials was permissible in Tangut phonetic series.

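I have also numbered semantic (S) classes by order of occurrence:

S1. APPEAR
S2. ROAD
S3. CEREMONY
S4. HOLLOW
S5. HIDE
S6. GO
S7. EAR
S8. SHEEP
S9. CARVE
S10. SPEAK
S11. CYMBAL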

10.6.0:59: Some of those 27 classes could be combined into even bigger classes using ambiguous graphs as pivots: e.g., 0020 can either be CHA or ROAD, so CHA and ROAD graphs could be grouped together. Here is one particularly large group containing 18 classes:


That diagram is meant to be read from left to right: e.g.,


Two smaller groups are



Three classes cannot be combined with others: GO, NA, RIR.

Thus one could say there are six kinds of :



3. EAR

4. GO

5. NA

6. RIR
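The pivot-based grouping can be sketched as a union-find over class names. This is only an illustration under stated assumptions: every class listed for a graph in the table counts as a link, including those marked with a question mark; only graphs with more than one class are listed as pivots; and the function name `group_classes` and its `extra_merges` parameter are hypothetical. The PAR + BA merge is the phonetic combination suggested above rather than a pivot found in the table.

```python
from collections import defaultdict

# All 27 class names (16 phonetic, 11 semantic) from the table.
ALL_CLASSES = [
    "CHA", "RIR", "TU", "PAR", "VE", "JE", "NA", "TSU", "NGA", "LWYR",
    "XA", "LU", "KHY", "SHA", "BA", "NU",
    "APPEAR", "ROAD", "CEREMONY", "HOLLOW", "HIDE", "GO", "EAR",
    "SHEEP", "CARVE", "SPEAK", "CYMBAL",
]

# Graphs (LFW2008 numbers) assigned to more than one class in the table.
PIVOTS = {
    "0020": ["CHA", "ROAD"],       "0752": ["CEREMONY", "CHA"],
    "1003": ["HOLLOW", "LU"],      "1071": ["HIDE", "TSU"],
    "1226": ["HIDE", "TU"],        "1364": ["HOLLOW", "NGA"],
    "1588": ["CHA", "SHEEP"],      "1630": ["CARVE", "JE"],
    "1641": ["CHA", "SHEEP"],      "1651": ["CEREMONY", "TSU"],
    "2972": ["HOLLOW", "NGA"],     "3049": ["XA", "SPEAK"],
    "3579": ["APPEAR", "CEREMONY"],
    "3821": ["CEREMONY", "LU", "SPEAK"],
    "3828": ["CEREMONY", "CHA", "SPEAK"],
    "4534": ["HOLLOW", "TSU"],     "4681": ["EAR", "NU"],
    "4682": ["HOLLOW", "KHY"],     "4696": ["BA", "CYMBAL"],
    "4761": ["SHA", "SPEAK"],      "5126": ["CARVE", "LU"],
    "5412": ["CEREMONY", "LWYR"],  "5693": ["EAR", "VE"],
}

def group_classes(classes, pivots, extra_merges=()):
    """Merge classes that share a pivot graph; return groups, largest first."""
    parent = {c: c for c in classes}

    def find(c):
        # Follow parent links to the representative, compressing the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for members in pivots.values():
        for other in members[1:]:
            union(members[0], other)
    for a, b in extra_merges:
        union(a, b)

    groups = defaultdict(set)
    for c in classes:
        groups[find(c)].add(c)
    return sorted(groups.values(), key=len, reverse=True)

by_pivot = group_classes(ALL_CLASSES, PIVOTS)
with_par_ba = group_classes(ALL_CLASSES, PIVOTS, extra_merges=[("PAR", "BA")])
print(len(by_pivot[0]), len(by_pivot), len(with_par_ba))
```

With these inputs the largest group holds 18 classes. Pivots alone leave PAR on its own, and adding the PAR + BA merge brings the total down to six groups.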

But I doubt literate Tangut actually looked at, say, 0760 2dʐɨe 'to judge' and thought, 'its left side indicates that it has a JE-like reading like 1630 2dziẽ 'carve', derived from the right side of 5126 1lɨu' 'to carve', in turn derived from the bottom left of 3821 2lʊ 'to give a present', in turn derived from the center of 5412 2lwəʳ 'ceremony'':


How did the Tangut learn and perceive their own script?

ARE WHIPS LIKE HAIR?

It was fun to use tentative Unicode code points for Tangut characters and components in my last post, but now I'm going to use Li Fanwen 2008 numbers again.

I've been trying to figure out the graphic etymology of

0219 2tseʳw 'whip'

The left side is shared with 69 other characters which don't seem to have any phonetic or semantic similarity to 2tseʳw 'whip'. I'll look at them again and post a list tomorrow.

The center and right components appear in five other characters. I already mentioned the first yesterday:

LFW2008 Tangraph Reading LFW 2008 gloss Character structure


2pɤẹ̃ hair left of 'hair' + left of another graph for 'hair'


2mioʳ second syllable of 2177 0227 1pə 2mioʳ 'rude, coarse, careless' 'language' + 'hair': i.e., coarse words are rude


2phʊ boots worn in rain or snow 'boots' next to 'hair': i.e., furry boots


2giu silk, silkworm 'bug' atop 'hair' (i.e., silk thread)


2ɬɤi smooth, glossy 'not' next to 'hair'

If the right two-thirds of 0219 were taken as a unit, then 'hair' is the most likely source. Although a whip is not much like a hair, it is even less like 'rude', 'rain and snow boots', 'silk(worm)', or 'smooth'. Moreover, none of the five sound like 2tseʳw.

I'll break up that two-thirds and see if I can find more plausible graphic sources.

10.5.0:30: Are whips associated with hair on Google?

"whip like a hair": 0 results

"whips like hair": 2 results

"whips made of hair": 7 results

"hairs like whips": 229 results

"hairs whip": 374 results

"hair whips": 32,100 results

"hair like whips": 39,400 results

"whip hair": 62,200 results

"hair whip": 93,500 results

"hair like a whip": 273,000 results

Of course modern English usage is not the key to the ancient Tangut mind. Nonetheless, the whip-hair connection is stronger in the 21st century than I had thought.

UNICODE TANGUT COMING IN JUNE 2016

This has been an exciting week. First, Baxter and Sagart's new Old Chinese reconstruction, then the catalog of Khitan large script characters, and in less than two years, 6,126 Tangut characters plus the Tangut iteration mark  and 753 Tangut radicals. Andrew West has documented the long road his team has taken. Bravo!

Finding Tangut characters is easy in Unicode. For example, if I want the first character I mentioned on Wednesday, I can just search for its Li Fanwen 2008 number (0219) on this code chart, and voila!

U+17366 2tseʳw 'whip'
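A quick way to sanity-check code points like these is to test which block they fall in. The sketch below assumes the block ranges of the published Unicode standard (Tangut at U+17000 to U+187FF, Tangut Components at U+18800 to U+18AFF). The code points quoted in this post were tentative at the time of writing and may not match the final assignments. The helper names `classify` and `describe` are hypothetical.

```python
# Block ranges from the published Unicode standard (assumption: these are
# the final ranges, not necessarily those of the 2014 draft chart).
TANGUT = range(0x17000, 0x18800)             # Tangut
TANGUT_COMPONENTS = range(0x18800, 0x18B00)  # Tangut Components

def classify(code_point):
    """Name the block a code point belongs to, as far as this sketch knows."""
    if code_point in TANGUT:
        return "Tangut"
    if code_point in TANGUT_COMPONENTS:
        return "Tangut Components"
    return "other"

def describe(code_point):
    """Format a code point in U+ notation together with its block."""
    return f"U+{code_point:04X} ({classify(code_point)})"

# Code points as cited in this post (tentative at the time).
for cp in (0x17366, 0x1785F, 0x1892C, 0x18942, 0x18975):
    print(describe(cp))
```

Whether a font actually displays `chr(0x17366)` as the character for 'whip' depends on the final code chart and on the installed Tangut font, so the sketch stops at block membership.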

And I can find the second character I mentioned on Wednesday (Li Fanwen 2008 number 1877) by looking through the range of characters sharing its left-hand radical U+1896E (= Nishida 219, gloss unknown). Oddly the source graph for its left side according to the Combined Homophones and Tangraphic Sea has a different radical (U+18954 = Nishida 218 'dog/fox'):


U+1785F 2ʔiəʳ 'whip' =

left (!) of U+175EF 2khɤi 'yak'

all of U+18571 2phʊ 'tree'

Why does 'yak' plus 'tree' equal 'whip'?

The analysis of U+17366 2tseʳw 'whip' is unknown. There are 69 other characters containing the component

U+1892C (= Nishida 103, gloss unknown),

16 other characters with the middle component

U+18942 (= Nishida 275, gloss unknown),

14 other characters with the right-hand component

U+18975 (= Nishida 134, gloss unknown),

and five other characters containing the middle and right hand components: e.g.,

U+173F3 2pɤẹ̃ 'hair'.

Is a whip like a giant hair? Maybe. Or maybe there's a more likely source of the right two-thirds of U+17366 2tseʳw 'whip'. I'll look at the possibilities tomorrow.

THE KHITAN LARGE SCRIPT IN SRI LANKA

I never expected Khitan to be discussed in

Sri Lanka <ś.ri l.ang.k.a>

at WG2 meeting 63. To be more precise, it was the Khitan large script that came up, not the Khitan small script above. I'm much less confident about this attempt to write the name in the large script:

<ś(i) ri la ang ka>

Even if one or more of those characters turns out to be inappropriate for transcribing Sri Lanka, I'm certain that a large script spelling would take up more space than its small script equivalent since the former is not clustered into word blocks like the latter.

The first of the large script characters is identical to the Chinese character 已 pronounced i in Liao Chinese, the northeastern dialect known to the Khitan a millennium ago. Should Khitan large script characters be unified with Chinese characters in Unicode?

The unification was proposed to minimize the security issues caused by co‐existence of similar shaped characters in the CJK Unified Ideograph [i.e., Chinese character] block and Khitan Large Script block.

Not knowing what the security issues are, I oppose unification. Unifying Chinese characters and the Khitan large script would be like unifying Latin A, Greek Α, and Cyrillic А. Would Greek and Cyrillic lookalike letters (e.g., Γ and Г) be assigned to one or the other alphabet while letters unique to Greek or Cyrillic (e.g., Δ and Д) were assigned to separate alphabets? My mind reels.

I also don't think unifying Jurchen (large) script characters resembling Khitan large script characters is a good idea. To me, Chinese characters, Khitan large script characters, and Jurchen (large) script characters are like the Latin, Greek, and Cyrillic alphabets: related scripts that should be kept apart in spite of partial visual overlap.

Encoding issues aside, I've been excited to browse the longest list of Khitan large script characters I have ever seen:

Proposal on Encoding Khitan Large Script in UCS

Part 1: Characters 0001-0472

Part 2: Characters 0473-0963

Part 3: Characters 0964-1455

Part 4: Characters 1456-1930

Part 5: Characters 1931-2218

(This last file does not include 已 <ś(i)> attested in the epitaph for 多羅里本 Duoluoliben [a.k.a. 突呂不 Tulübu, 1081], though it does include 己 [#1938] and 巳 [#1941] which also look like Chinese characters.)

I especially appreciate the inclusion of images of original characters. (10.3.0:06: But I wish I understood the codes for their sources.) I wanted to continue my series on Baxter and Sagart's new Old Chinese reconstruction, but I had to interrupt it to mention this breakthrough in Khitanology.

Alas, that list does not include any characters that Viacheslav Zaytsev may have discovered in Nova N 176, the longest known Khitan text in either script. As much as I'd love to be able to type the Khitan large script in Unicode as soon as possible, I wonder if it might be a good idea to wait until the characters in that book have been catalogued. It might be odd to have a first Khitan large script encoding covering all texts but Nova N 176. Typing words from what may be the most important Khitan text in the far future might involve going back and forth between a primary Khitan large script block and a Khitan Extended-A block. Awkward.

10.3.1:18: ADDENDUM: The Khitan large script proposal lists several inscriptions that I have never heard of before:

1. 耶律大王墓誌 Epitaph for Prince Yelü (personal name not given; 1051)

2. 耶律準墓誌銘 Epitaph for Yelü Zhun (1068)

3. 耶律李家奴墓誌銘 Epitaph for Yelü Li Jianu (1081)

4. 留隱太師墓誌銘 Epitaph for Master Liuyin (1109)

I wish I could see them.

GSR 0000 IN BAXTER AND SAGART (2014): PART 1

I didn't know Baxter and Sagart's new book Old Chinese: A New Reconstruction came out until it was released in the US yesterday, almost two weeks after it was released in the UK on 18 September. I'm not surprised it's sold out in the UK. I've waited years for it. I'll have to wait even longer because I can't afford it. But for now at least I can look at the reconstructions which the authors have kindly shared with the public (alternate URL). All reconstructions in this post are Baxter and Sagart's unless I state otherwise.

Will these reconstructions ultimately displace those of Karlgren's Grammata serica recensa (GSR, 1957)? We shall see.

For years I would recommend Schuessler's Minimal Old Chinese (2009) reconstructions to nonspecialists, as they incorporate many post-GSR elements widely accepted among scholars today (e.g., a six-vowel system) while excluding more controversial proposals. (I also recommend his Later Han Chinese in the same book. By definition it's too early to be Middle Chinese, but it's close, and I prefer it to nearly all Middle Chinese reconstructions I've seen.)

I dream of publishing my own reconstruction, but I really should finish my Golden Guide translation first, among many other things. I'd also like to publish a complete list of my readings of Tangut characters and the pre-Tangut sources of those readings. Both my Chinese and Tangut reconstructions have only been available in scattered form on this site and a couple of publications.

Enough about me. Let's start looking at Baxter and Sagart's Old Chinese reconstructions organized by GSR numbers. (Alas, the characters in the PDF are not directly searchable, though one can indirectly find them by searching for their Unicode code points.) At the top of the list are characters without GSR codes. Baxter and Sagart assigned them the number 0000.

The first character is 𠓥 *pe[n] 'whip', an alternate spelling of 鞭.

鞭 is a semantic-phonetic compound ('leather' + *be[n]) whereas 𠓥 is a compound of 攴 'strike' (itself a semantic-phonetic compound of 卜 *pˁok atop 又, a drawing of a hand) beneath something looking like 入 'enter' with a short horizontal line inside it. Those top components don't look like a pictograph of a whip to me, but I presume they're semantic. Another variant 𠓠 simply has 入 'enter' on top. See more variants here.

The brackets around the coda indicate that Baxter and Sagart "are uncertain about its identity". In this case, the coda might have been *-r. We know for sure that 'whip' ended in *-n in Middle Chinese, but Middle Chinese *-n could be from Old Chinese *-r as well as *-n*.

*pe[n] turns out to be an uncontroversial reconstruction. Pan, Zhengzhang, and Schuessler all reconstruct it as *pen. I am the odd man out, as my system requires a high vowel presyllable to account for the vowel breaking (partial vowel height matching) in Middle Chinese (MC):

*Cɯ-pen > *Cɯ-pien > *pien (= pjien in Baxter's MC transcription "not intended as a reconstruction")

My Old Chinese *pen without a high vowel presyllable (e.g., 邊 'side') remained *pen in Middle Chinese. Baxter and Sagart reconstruct 'side' as *pˁe[n] with a pharyngealized *pˁ- that in my view blocked breaking. Such pharyngealized initials distinguish their reconstruction from most others. I only reconstruct pharyngealization in Middle Old Chinese; it developed in (pre)initial** consonants preceding 'lower' vowels (*ʌ *e *a *o) and spread through the syllable:

*pen > *pˁen > *pˁeˁnˁ

On the other hand, my Old Chinese *Cɯ-pen was not subject to pharyngealization because its preinitial preceded the 'higher' vowel *ɯ. Pharyngealization and its absence conditioned vowel allophones that became phonemic after the loss of pharyngealization in Late Old Chinese (OC):

Graph Baxter and Sagart's OC This site Baxter's MC
Early OC Middle OC Late OC, MC
𠓥/鞭 *pe[n] *Cɯ-pen *Cɯ-pien > *pien *pien pjien
邊 *pˁe[n] *pen *pˁen > *pˁeˁnˁ *pen pen
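The conditioning described above depends only on the first vowel of the (sesqui)syllable, which can be sketched in a few lines. This is a simplified illustration, not a full model. The 'lower' vowels are those listed above. Treating *i and *u as 'higher' alongside *ɯ is an assumption made for the example. The uvular hypothesis is deliberately left out, and the function names are hypothetical.

```python
LOWER_VOWELS = set("ʌeao")   # the 'lower' vowels listed in the text
HIGHER_VOWELS = set("ɯiu")   # assumption: *i and *u pattern with *ɯ

def first_vowel(form):
    """Return the first vowel symbol in a reconstructed form."""
    for ch in form:
        if ch in LOWER_VOWELS or ch in HIGHER_VOWELS:
            return ch
    raise ValueError(f"no vowel found in {form!r}")

def is_pharyngealized(form):
    """True if the vowel after the first consonant is a 'lower' vowel."""
    return first_vowel(form) in LOWER_VOWELS

for form in ("*pen", "*Cɯ-pen", "*Cʌ-ki", "*ki"):
    status = "pharyngealized" if is_pharyngealized(form) else "plain"
    print(form, status)
```

Under these assumptions *pen comes out pharyngealized, while *Cɯ-pen does not because its preinitial precedes *ɯ.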

*10.2.0:51: I reconstruct *-n unless (1) a phonetic series or word family also contains Middle Chinese *-j readings pointing to *-r and/or (2) external evidence points to *-r. Baxter and Sagart's policy of reconstructing *-[n] is safer since there is no guarantee that all Old Chinese *-r belonged to such phonetic series or word families and/or can be reconstructed on the basis of external evidence.

I have not found any true cognates of the Chinese word for 'whip'; all lookalikes in the region are borrowings.

The Tangut words for 'whip' are completely different:

0219 2tseʳw < *T-tse(k/w)H (common) and 1877 2ʔiəʳ < *T-ʔəH or *ʔərH (only in dictionaries?)

**10.2.1:02: Preinitials are onsets of unstressed presyllables whereas initials are onsets of stressed syllables. Hence the preinitial of *Cɯ-pen was *C- (an unknown consonant) and its initial was *p-. The height of the vowel after the first consonant in a (sesqui)syllable conditioned the presence or absence of pharyngealization in Middle Old Chinese.

I suspect that uvular initials always conditioned pharyngealization regardless of the following vowel unless preceded by a high vowel presyllable, but I have not yet investigated that hypothesis:

*qi > *qˁi (but *ki > *ki)

*Cʌ-qi > *Cˁʌˁ-qˁeˁiˁ > *kei (same outcome as *Cʌ-ki)

*Cɯ-qi > *Cɯ-ki > *ki (same outcome as *Cɯ-ki)

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision