Having mentioned the Early Middle Korean poem 悼二將歌 To i chang ka 'A Song Mourning Two Generals' (1120 AD) last night, I thought it might be interesting to examine its first line and demonstrate the problems involved in trying to decipher hyangga (early Korean poetry). The poem is in the hyangchhal script mixing semantograms (red) and phonograms (blue). Purple indicates words written as combinations of semantograms and phonograms. For simplicity I have transliterated all scholars' reconstructions in the same way except for Kim Wan-jin's which reflects the vowel shift hypothesis that I and others have rejected (e.g., Oh Sang-suk 1998 and Ko Seongyeon 2013).

Sinographs | 主 | 乙 | 完乎 | 白乎 | 心聞
Premodern Sino-Korean reading | tsyu | ɯr | wan ho | pʌyk ho | sim mun
Chinese gloss | lord | second Heavenly Stem | complete / question particle; preposition | white, to report (< make clear [i.e., white]) / question particle; preposition | heart / hear
Late Middle Korean translation equivalent | nim | - | o(ɣ)ʌro | hɯy- 'white', sʌrp- 'report' / - | mʌzʌm / tɯt-
Yang Chu-dong (1942) | *nim | *ɯr | *oʌrɣo | *sʌrβ-ɯn | *mʌzʌm-ʌn
Chi Hyŏn-yŏng (1948) | | | *orɣo | *sʌrp-on | *mʌzʌm-ɯn
Kim Wan-jin (1980) | *ni(li)m | *ər | *uɔrɣu | *sɔrβ-ən | *mɔzɔm-ɔn
Yu Chhang-gyun (1994) | *nim | *ɯr | *oʌrɣo | *sʌrβ-ɯn | *mʌzʌm-ɯn
This site | (*nilim?) | | *(oɣʌr?-ɣ/h)o | *(sʌr?)ɣo(n?) | *(mʌzʌ?)m-Vn
Gloss | lord | ACC | wholly | report-? | heart-TOP
As for the heart that reported all to my lord ...

1. 主

There is no way to be sure how this word was read. We know that in the Koreanic Paekche language, 'lord' was transcribed in sinographs as 爾林 *ɲi(e) lim which represented something like *n(y)elim or *nilim, cognate to Late Middle Korean nim. Medial -l- survived in nali 'stream' in the Koryŏ kayo 'songs of Koryŏ' (Kim Wan-jin 1980: 211). So it is likely - though not certain - that 主 was *nilim. (As a convention I write the lost liquid of Korean as l to differentiate it from r which was retained.) The Chinese-Korean glossary Jilin leishi only tells us that 主 'lord' was 主 'lord' in Korean, which doesn't help; the informant may have used the Sino-Korean word to try to impress his Chinese interlocutor.

2. 乙

This cannot be a semantogram because the calendrical term 'second Heavenly Stem' makes no sense here. It must be a phonogram for the accusative ending *-ɯr, which surprises me, since I would expect 'lord' to be the recipient of the report rather than the direct object.

3. 完乎

The first sinograph is a semantogram whose reading is uncertain.

I don't know why Chi did not reconstruct *ʌ; Late Middle Korean o(ɣ)ʌ did not arise from the breaking of o.

乎 could represent *o (cf. early Sino-Japanese wo from Sino-Paekche), *ɣo, or *ho. Could Late Middle Korean o(ɣ)ʌro be a contraction of Early Middle Korean *o(ɣ)ʌrɣo? Or is 完乎 some unrelated synonymous adverb ending in *-(ɣ/h)o?

4. 白乎

If 白 is a semantogram for *sʌrp- and if lenition took place (if ɣ was in the previous word - there was no ɣ before lenition), then perhaps *p lenited to ɣ before *-o in this dialect. (In standard Late Middle Korean, *p lenited to the β that Yang, Kim, and Yu projected back into Early Middle Korean.)

*o/ɣo/ho is a poor phonetic match for Yang and Yu's *-ɯn and Kim's *-ən. Those reconstructions were influenced by the Late Middle Korean adnominal suffix -ʌn which would be expected in this context. Maybe *-o is an Early Middle Korean ending without a Late Middle Korean descendant. Or Chi is right and the ending was *-on. Its *o could have been reduced to ʌ in Late Middle Korean, and the final *-n might not have been written because it assimilated with the initial m- of the next word: *-on m- > *-om m- (analyzed in writing as <-o.m->). Further examples of <-V.N-> for expected *-VN N-sequences could verify this hypothesis.

5. 心聞

心 is a semantogram for an *-m-final word for 'heart'. If the ɣ-s above are correct - i.e., if lenition took place - then the medial consonant of 'heart' probably already lenited to *-z- (unless this poem was composed after *ɣ-lenition but before *z-lenition).

One might be tempted to regard 聞 as a verb 'hear', but that's not possible since Korean sentences do not end in bare verb stems. I think it represents the final *-m of 'heart' followed by a topic suffix *-ɯn that split into Late Middle Korean -ʌn and -ɯn depending on the height of the vowels of the preceding noun. There was no character with a Sino-Korean reading *mɯn, so 聞 mun was the best available match.

Yang projected Late Middle Korean -ʌn back into Early Middle Korean even though a lower mid unrounded vowel is not a good match for the high rounded u of Sino-Korean mun. Kim's *-ən has the same problem.

The only way to make Yang and Kim's readings work is to suppose that the scribe had the early Sino-Korean reading *mən for 聞 (cf. Sino-Japanese mon < Sino-Paekche) in mind.

If Chi's -on became Late Middle Korean -ʌn, then perhaps there was a rounded allomorph *-un of the topic particle due to labial harmony after 'heart' (whose Late Middle Korean ʌ might be a reduction of an earlier *a or *o) that was reduced to -ʌn and -ɯn in Late Middle Korean:

*ma/ozom-un > *-ɯn > mʌzʌm-ʌn

(11.27.1:36: *mazam and *mozam would not trigger labial harmony since the vowel closest to the suffix would not be labial. There is no way to be certain about the vowels since Jilin leishi only tells us that 心 'heart' in Korean sounded like Chinese 心 'heart' pronounced like 尋 which sounded like 心 with a different tone.)

I am skeptical about vowel harmony in Old and Early Middle Korean. If it existed, it might have worked differently from that of Late Middle Korean: e.g., the potential labial harmony after 'heart'.

Oddly, Yang and Kim respectively reconstructed *-ʌn and *-ɔn in accordance with vowel harmony after 'heart' but violated vowel harmony by reconstructing *-ɯn and *-ən after 'report', which belonged to the same lower vowel class as 'heart'. On the other hand, Chi and Yu disregarded vowel harmony after 'heart'.

A NEW BOOK: KIAER'S THE OLD KOREAN POETRY

I didn't know about The Old Korean Poetry: Grammatical Analysis and Translation or its author Jieun Kiaer until today. I would like to see it.

The "The" in its title is unusual; I'm surprised it wasn't removed in the editing process.

I am also surprised that a syntactician wrote that book. I initially assumed she was a historian whom I had not heard of.

I wonder how she dealt with Old Korean, whose hyangchhal script is highly problematic. I have seen seven different complete decipherments, and I would like to see the decipherment in Ryu and Pak (2003). As Lee and Ramsey (2011: 57) wrote,

[... I]nterpretation of the hyangga retains a monumental task. We quite honestly do not know what some hyangga mean, much less what they sounded like.

I am curious to see if she has her own decipherment.

Moreover, since the description only mentions fourteen hyangga (Old Korean poems)*, I assume the other twenty poems covered in the book are in Middle Korean, as only twenty-six** hyangga have survived.

11.26.1:39: I am not sure what this line means:

This book provides linguistic explanations for each poem and essential vocabulary – both in Middle and Contemporary Korean.

What are the criteria for "essential" status? I assume all grammatical morphemes are included.

Does this vocabulary accompany each poem, or is it in an appendix?

There is no mention of Old Korean vocabulary, even though two-fifths of the book is about Old Korean poetry. Is that because there is no universally accepted decipherment of Old Korean?

Are the Contemporary Korean forms translations of the Middle Korean forms?

*11.26.2:24: I am guessing that these are the fourteen Shilla poems preserved in Samguk yusa (late 13th c. AD).

**11.26.2:46: I am counting the Koryŏ poem 悼二將歌 To i chang ka 'A Song Mourning Two Generals' (1120 AD) in the total. Lee and Ramsey (2011: 57) exclude it.

STUMPED BY THE SEA CAMEL

Last night, I rediscovered the Haitai (해태 Haethae) brand and noticed that the English Wikipedia lists the Chinese characters for it as 海陀 'sea hill', one of the many variants of the name of the xiezhi:

Late Middle Chinese
Late Old Chinese
Character 1 gloss
Character 2 gloss
獬豸 해태 haethae, 해치 haechi
*xɤaj ʈhɤaj ~ ʈhi

*ɣɤeʔ ɖɤɑjʔ ~ ɖɨɑjʔ first syllable of 'xiezhi'
worm; crawl like a feline beast or reptile; disperse
no other uses
獬廌 'xiezhi'
understand (< 'untie')
해태 haethae Daum
*xɤaj thəj *ɣɤeh dəɰʔ slack (cognate to 'untie')
normally 'laziness'
해타 haetha *ɣɤaj thwɑ
*ɣɤeh dwɑjʔ/h lazy
咳唾 *xəj thwɑ *ɣəɰ thwɑjh cough
normally 'spittle'
해태 haethae *xəj thəj

*xəɰʔ dəɰ sea

normally 'seaweed'
海陀 해타 haetha, 해태 haethae
*xəj thɑ *xəɰʔ dɑj
hill; usually a phonetic symbol
no other uses?

海駝 Daum

The inclusion of Middle and Late Old Chinese readings does not mean all these terms existed in Middle and/or Late Old Chinese. See below.

陀 is normally read 타 tha, not 태 thae, a reading which seems to simultaneously reflect Late Middle Chinese *thɑ (< Early Middle Chinese *dɑ) and Late Old Chinese *dɑj. How is such a mixture of new and old possible? Although *th- ~ *d- variation is possible**, I doubt that Sino-Korean thae reflects an Old Chinese variant reading *thɑj for 陀 in 海陀, as I cannot find any attestation of that word before the History of Liao (1344), centuries after the Middle Chinese period. (How old is the Chinese place name 海陀山?)

Here's what I think happened (revised and expanded 11.25.23:23):

- The word may be a late 1st millennium BC loan from some non-Chinese language, as it has no Chinese etymology and I cannot find any attestations prior to 獬廌 in the Records of the Grand Historian (c. 109 BC). Moreover, all spellings are either partly or completely phonetic.

- 'Xiezhi' developed an abbreviated monosyllabic form 廌 (unless 廌 is a loanword and 獬廌 is a Chinese-foreign hybrid compound 'understanding 廌').

- Some of the phonetic variation may indicate multiple borrowings of the same word from different dialects of the same language or a set of related languages.

- Some of the spellings appear to be puns, and some may be of Korean origin, as I have not seen the last six for 'xiezhi' in a Chinese text.

- The earliest forms had two syllables with voiced initial consonants. Forms that would have been read with one or two voiceless initial consonants may be later spellings coined after voiced initials had devoiced (and often aspirated) in Late Middle Chinese: e.g., it is unlikely that 咳唾 is an old spelling because it would have been pronounced *ɣəɰ thwɑjh with a voiceless aspirate in Late Old Chinese absent from the earliest attested forms.

- The 海駝 'sea camel' spelling could reflect a folk etymology.

- The pronunciation haethae spread to the spellings 海陀 and 海駝 in Korean.

11.25.23:30: The mismatch between the characters 海陀/海駝 and the pronunciation haethae in Korean is reminiscent of the mismatch between the spelling colonel and its pronunciation. Cummings (1988: 449) wrote,

Etymologists do not agree completely on colonel, but whatever the historical dynamics of the word, it is a clear case of mixed convergence, the pronunciation of one, apparently earlier form, coronel, having become attached to the spelling of another.

*11.25.1:51: Obviously Shuowen is not a Korean reference source, but any word in a Classical Chinese text has a Sino-Korean reading.

**11.25.23:33: Late Old Chinese 太 *thɑs 'greatest' and 大 *dɑs 'great' go back to Early Old Chinese *hlats and *lats and share a root *lats. The *hl- of 太 must be from a voiceless prefix plus root-initial *l-.

AVERAGING THAI SONG TONES

Two nights ago, I wrote,

I have no data on Thai Song, the third language written with Tai Viet, but I expect its *implosives to follow the same [tonal] pattern as Black Tai and White Tai.

In other words, I expected Thai Song tones to tend to be higher after reflexes of *voiced initials (other than *glottals including *implosives) and lower after reflexes of *voiceless and *glottal initials. Hence the heights of the tones would roughly match the names of the consonant letters they were associated with: i.e., HIGH and LOW.

Last night I found Somsonge Burusphat's 2012 compilation of Thai Song tones at twelve locations*. Her paper even includes tonal contours of individual speakers. Here are the average heights of each tone on a five-point scale (1 = lowest, 5 = highest) as described on page 37. I did not include varieties whose tones were only described in words in my calculations. As an example of how I calculated the averages, the Loei A tone is 241; 2 + 4 + 1 = 7, and 7 divided by 3 is 2.33. Then I added 2.33 to 3 (< 24, the average of Donyaihom), 3 (< 24, the average of Dontoom), 3.5 (< 34, the average of Suantaeng), etc. and divided that total by 11 (the number of varieties with numerical tone descriptions), resulting in 2.9.

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 2.9 3.6 2.3 3.6
'high' tone class: *voiced initial 3.8 3.2 3.2 3.6
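The averaging procedure described above can be sketched in Python. This is only an illustration of the arithmetic, assuming each tone is written as Chao-style digits (1 = lowest, 5 = highest): a tone's height is the mean of its digits, and each cell in the table is the mean of those heights across varieties. Only the four A-tone values quoted above are included; the other seven varieties in Burusphat (2012) are not reproduced here, so the partial average will not equal 2.9. The function names are my own.

```python
from statistics import mean


def tone_height(contour: str) -> float:
    """Mean height of a tone written as Chao digits, e.g. '241' -> 7/3."""
    return mean(int(digit) for digit in contour)


def average_height(contours: list[str]) -> float:
    """Mean of the per-variety heights for one tone category."""
    return mean(tone_height(c) for c in contours)


# Only the values quoted in the text; the remaining seven varieties
# are intentionally left out rather than guessed.
quoted_a_tones = {
    "Loei": "241",
    "Donyaihom": "24",
    "Dontoom": "24",
    "Suantaeng": "34",
}

for variety, contour in quoted_a_tones.items():
    print(f"{variety}: {contour} -> {tone_height(contour):.2f}")

partial = average_height(list(quoted_a_tones.values()))
print(f"average over these four varieties only: {partial:.2f}")
```

Reproducing the full 2.9 would require the remaining varieties' A-tone values from the paper.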

As in Black Tai and White Tai, the *voiced A and C tones are higher than their *voiceless/glottal counterparts, but there is little or no height difference between the *voiced and *voiceless/glottal B and D tones. The *voiced vs. voiceless/glottal distinction correlates with contours that are masked by single-number averages:

Thai Song: usually level (sometimes falling) vs. rising

Black Tai: level vs. rising

White Tai: falling vs. rising

11.24.23:16: Here are the average heights of Thai Song tones at three points followed by their average contours:

Average starting points

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 2.3 2.5 2.5 2.7
'high' tone class: *voiced initial 3.3 3.3 3.7 3.6

Average mid points

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 3 3.6 2.3 3.7
'high' tone class: *voiced initial 4.5 3.3 3.6 3.6

Average ending points

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 3.5 4.6 2.1 4.6
'high' tone class: *voiced initial 3.5 3.1 2.1 3.5

Average contour

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 24 35 32 35
'high' tone class: *voiced initial 354 33 42 44

The composite *voiced tones always start higher, though this is obscured in the average contour table since 2.5 is rounded up to 3 and 3.3 is rounded down to 3.
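The step from the three tables of average points to the contour table appears to follow two rules, which the sketch below makes explicit. This is my reading of the tables, not a procedure stated above: each point is rounded half up (2.5 becomes 3, 4.5 becomes 5), and the midpoint is written only when it is a peak or a dip; otherwise the contour is written with its start and end points. The names round_half_up and contour are hypothetical.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer with .5 always going up (2.5 -> 3).

    Python's built-in round() uses round-half-to-even, which would turn
    2.5 into 2 and break the match with the table above.
    """
    return math.floor(x + 0.5)


def contour(start: float, mid: float, end: float) -> str:
    """Convert three average heights into a contour string.

    Assumed convention: the midpoint is written only when it is a peak
    or a dip; otherwise only the rounded start and end points are written.
    """
    s, m, e = (round_half_up(x) for x in (start, mid, end))
    if (m > s and m > e) or (m < s and m < e):
        return f"{s}{m}{e}"
    return f"{s}{e}"


# (start, mid, end) averages copied from the three tables above.
AVERAGES = {
    ("low", "A"): (2.3, 3.0, 3.5),
    ("low", "B"): (2.5, 3.6, 4.6),
    ("low", "C"): (2.5, 2.3, 2.1),
    ("low", "D"): (2.7, 3.7, 4.6),
    ("high", "A"): (3.3, 4.5, 3.5),
    ("high", "B"): (3.3, 3.3, 3.1),
    ("high", "C"): (3.7, 3.6, 2.1),
    ("high", "D"): (3.6, 3.6, 3.5),
}

for (tone_class, tone), points in AVERAGES.items():
    print(tone_class, tone, contour(*points))
```

With these two rules, all eight cells of the average contour table come out as printed, including the 354 of the *voiced A tone, whose midpoint is a peak.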

I have not tried to average the presence or absence of glottalization in the C tone.

*11.24.23:24: I have excluded the Black Tai data from Vietnam, so all figures here are from eleven locations in Thailand. Although Black Tai and Thai Song lie on the eastern and western ends of a spectrum, I was only interested in the tones of Thai Song (i.e., the varieties spoken in Thailand). See this table for Black Tai tones.

BLACK AND WHITE EVIDENCE FOR VIETNAMESE PHONOLOGICAL HISTORY

Last night, I hypothesized that several unexpected letters in the Tai Viet script for Black Tai, White Tai, and Thai Song were devised to write consonants in loanwords with anomalous 'high' tones from Vietnamese:


(ꪒ U+AA92 TAI VIET LETTER LOW DO is for [d] + 'low' tones < implosive *ɗ)


(ꪖ U+AA96 TAI VIET LETTER LOW THO is for [tʰ] + 'low' tones < voiceless aspirated *th)


(ꪚ U+AA9A TAI VIET LETTER LOW BO is for [b] + 'low' tones < implosive *ɓ)


(ꪞ U+AA9E TAI VIET LETTER LOW PHO is for [pʰ] + 'low' tones < voiceless aspirated *ph)

Tonight I found even more letters of that type:


(does not resemble ꪂ U+AA82 TAI VIET LETTER LOW KHO for [kʰ] + 'low' tones < voiceless aspirated *kh; looks like a derivative of ꪅ U+AA85 TAI VIET LETTER HIGH KHHO presumably for White Tai* [x] < voiced *ɣ; the left side resembles HIGH PO which has no phonetic resemblance - could it be a graphic cognate of Khmer ឃ <gh>?)


(derived from ꪌ U+AA8C TAI VIET LETTER LOW CHO for [cʰ] + 'low' tones < voiceless aspirated *ch)


(derived from ꪬ U+AAAC TAI VIET LETTER LOW HO for [h] + 'low' tones < voiceless *h)


(derived from ꪮ U+AAAE TAI VIET LETTER LOW O for [ʔ] + 'low' tones < voiceless *ʔ)

I assume these letters could also be used to write native onomatopoeia and non-Vietnamese loanwords with anomalous 'high' tones.

By 'anomalous' I mean that a word has a tone not conditioned by the usual historical source(s) of its initial consonant: e.g., Black or White Tai could borrow Vietnamese thành 'to become' as, say, than 31, with a mid-low falling tone that normally developed after *voiced initials, not voiceless *th- which is the usual source of Black and White Tai th-. See these tables.
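The definition of 'anomalous' amounts to a lookup: find the class a tone belongs to and compare it with the class normally conditioned by the historical source of the initial. Below is a minimal sketch using the Black Tai tones tabulated further down (Gedney in Hudak 2008). The mapping of four modern initials to their usual class is a simplifying assumption of mine, and the names USUAL_CLASS, tone_class, and is_anomalous are hypothetical, not part of any existing tool.

```python
# Black Tai tones by tone class (Gedney's values in Hudak 2008);
# the B and D categories share a tone in each class.
BLACK_TAI_TONES = {
    "low": {"A": "22", "B/D": "45", "C": "21"},   # *voiceless and *implosive initials
    "high": {"A": "55", "B/D": "44", "C": "31"},  # *voiced initials
}

# Simplifying assumption for this sketch only: in native words these modern
# initials usually come from *voiceless aspirates (th, ph) or *implosives
# (d, b), so they are expected with 'low'-class tones.
USUAL_CLASS = {"th": "low", "ph": "low", "d": "low", "b": "low"}


def tone_class(tone: str) -> str:
    """Return the class ('low' or 'high') a Black Tai tone belongs to."""
    for cls, tones in BLACK_TAI_TONES.items():
        if tone in tones.values():
            return cls
    raise ValueError(f"{tone!r} is not a Black Tai tone")


def is_anomalous(initial: str, tone: str) -> bool:
    """True if the tone is not the kind normally conditioned by the initial's source."""
    return tone_class(tone) != USUAL_CLASS[initial]


# The hypothetical loan of Vietnamese thành as 'than 31' from the text.
print(is_anomalous("th", "31"))  # → True
```

A native th-word with the 'low'-class C tone 21, by contrast, would not be flagged.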

All of the above letters are for stops with the exception of HIGH HO for the fricative h followed by normally *voiced tones. It occurred to me that any Vietnamese loans with HIGH KHO, CHO, and PHO must have been borrowed before *kh *ch *ph became fricatives [x s f] in Vietnamese. In short, their stop quality dates them. (I am assuming that the Vietnamese dialects known to Black and White Tai speakers lost most aspirates like the major dialects did.) Such old loans can tell us that their tones in Vietnamese may have sounded like *voiced tones to Black and White Tai speakers at the time of borrowing. That resemblance may not have survived to the present day; Tai and/or Vietnamese might have changed their tones. Hence Vietnamese borrowings may be clues to tonal change (or its absence) in Black and White Tai and Vietnamese. There could be multiple strata of Tai borrowings from Vietnamese with different patterns of tonal correspondences: e.g.,

- suppose the Vietnamese ngang tone was once 44 (mid-high level)

- Black and White Tai speakers could borrow ngang as their tone 44

- then ngang lowered to 33 (mid level)

- Black and White Tai speakers borrowed ngang as their tone 22 (neither language has 33, so 22 is the closest match)

- so the same Vietnamese tone category (ngang) corresponds to two different tones in Black and White Tai: 44 in older loans and 22 in newer loans

Again, without any Vietnamese loan data on hand, I can't explore this idea any further.

*11.23.2:46: Is Black Tai [k] < *ɣ written etymologically with HIGH KHHO or with LOW KO as if it were from *g? I don't know what the Thai Song reflex of *ɣ is or how it is written.

D-OU-B-LED LETTERS IN TAI VIET

After asking about the Lao script last night, I thought it might be a good time to ask a question about the Tai Viet block of Unicode:

Are the 'extra' Tai Viet d and b letters for Vietnamese loanwords?

I downloaded an SIL Tai Viet font last December, but forgot about it until Wednesday when I needed to install a pre-Unicode SIL IPA93 Sophia font to view John Coleman's page on Shilha. I looked in my folder of SIL fonts and rediscovered their Tai Heritage font. Last night while looking through its character inventory in Andrew West's BabelMap, I was surprised to see two letters for d and b:





Although Thai has six letters for th and three letters for ph, it has only one letter each for d and b (< Proto-Tai *ɗ and *ɓ) in native words:

Initial type | *implosive | *voiceless aspirated | *voiced | *voiced aspirated in Indic loans (pronounced as *voiced)
retroflexes in Indic loans (pronounced as dentals in Thai) | ฎ <ɗ̣> *ɗ > d | ฐ <ṭh> *th > th | ฑ <ḍ> *d > th | ฒ <ḍh> *d > th
dental | ด <ɗ> *ɗ > d | ถ <th> *th > th | ท <d> *d > th | ธ <dh> *d > th
labial | บ <ɓ> *ɓ > b | ผ <ph> *ph > ph | พ <b> *b > ph | ภ <bh> *b > ph

(Although neither Sanskrit nor Pali had a retroflex implosive ɗ̣, some Indic retroflex stops correspond to d written as ฎ in Indo-Thai loans.)

Similarly, Lao has two letters for th and two letters for ph (without counterparts of the 'extra' letters in Thai for Sanskrit and Pali loanwords), but only one letter each for d and b:

Initial type | *implosive | *voiceless aspirated | *voiced
dental | ດ <ɗ> *ɗ > d | ຖ <th> *th > th | ທ <d> *d > th
labial | ບ <ɓ> *ɓ > b | ຜ <ph> *ph > ph | ພ <b> *b > ph

In Thai and Lao native words, d and b are associated only with tones that developed in *voiceless-initial syllables. (Proto-Tai implosives, though voiced, conditioned the same tones as *voiceless consonants in those languages.) The same is true for Black Tai and White Tai, two of the three languages written with Tai Viet (see Gedney's descriptions in Hudak 2008: 9, 12):

Black Tai tones

Proto-Tai tone A B and D C
'low' tone class: *voiceless and *implosive initial 22 45 21
'high' tone class: *voiced initial 55 44 31

White Tai tones

Proto-Tai tone A D B C
'low' tone class: *voiceless and *implosive initial 22 45 24
'high' tone class: *voiced initial 44 454 31

The 'high' and 'low' tone classes in the Unicode names roughly correspond to the heights of the *voiced- and *voiceless-initial tones: all *voiced-initial tones start at 3 or higher on a 5-point scale, whereas only *voiceless-initial tones may start as low as 2.
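The generalization about starting heights can be checked directly against the two tables. The minimal sketch below takes the first digit of each listed tone as its starting point. Because the White Tai table lists four Proto-Tai categories but only three values per class, it groups tones by class alone and does not try to assign them to A, B, C, or D. The names TONES and starting_points are illustrative.

```python
# Tone values copied from the Black Tai and White Tai tables above,
# grouped by class only (Proto-Tai categories are ignored here).
TONES = {
    "Black Tai": {"low": ["22", "45", "21"], "high": ["55", "44", "31"]},
    "White Tai": {"low": ["22", "45", "24"], "high": ["44", "454", "31"]},
}


def starting_points(tones: list[str]) -> list[int]:
    """First digit of each tone, i.e. its starting height on the 1-5 scale."""
    return [int(t[0]) for t in tones]


for language, classes in TONES.items():
    high_starts = starting_points(classes["high"])
    low_starts = starting_points(classes["low"])
    print(language, "lowest 'high'-class start:", min(high_starts),
          "lowest 'low'-class start:", min(low_starts))
```

In both languages the lowest 'high'-class start is 3 and the lowest 'low'-class start is 2, as the paragraph above states.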

I have no data on Thai Song, the third language written with Tai Viet, but I expect its *implosives to follow the same pattern as Black Tai and White Tai.

Hence I hypothesize that the 'extra' Tai Viet d and b letters are for borrowings of Vietnamese words with implosive initials đ- [ɗ] and b- [ɓ] and tones resembling native tones that developed in *voiced-initial syllables. Unfortunately I cannot test this hypothesis, because I have no Vietnamese borrowings in any script on hand.

11.22.4:04: The Tai Viet 'low' d and b letters are obviously related to the d and b letters of Thai and Lao.

Tai Viet ꪓ 'high' d looks like a ligature of ꪙ 'high' n and ꪒ 'low' d. I assume ꪙ high n was chosen to signify the tone class and is not a trace of earlier prenasalization: i.e., ꪓ 'high' d was never pronounced *[nd].

Tai Viet ꪛ 'high' b, on the other hand, looks like a ligature of ꪚ 'low' b and ꪝ 'high' p (< *b). Although one might think it once represented a cluster [bɓ], such a sequence is highly improbable.

Tai Viet ꪝ 'high' p in turn looks like a derivative of ꪜ 'low' p (< *p), a long-tailed derivative of ꪚ 'low' b, rather than a relative of Thai พ <b> ph < *b and Lao ພ <b> ph < *b which lack a right-hand tail.

Graphic cognates

Tai Viet | Thai | Lao
ꪒ 'low' d < *ɗ | ด <ɗ> d < *ɗ | ດ <ɗ> d < *ɗ
ꪓ 'high' d < Vietnamese? | no equivalent; น <n> + ด <ɗ> | no equivalent; ນ <n> + ດ <ɗ>
ꪔ 'low' t < *t | ต <t> t < *t | ຕ <t> t < *t
ꪕ 'high' t < *d and ꪗ 'high' th < Vietnamese? | ท <d> th < *d | ທ <d> th < *d
ꪖ 'low' th < *th | ถ <th> th < *th | ຖ <th> th < *th
ꪚ 'low' b < *ɓ | บ <ɓ> b < *ɓ | ບ <ɓ> b < *ɓ
ꪛ 'high' b < Vietnamese? | no equivalent; บ <ɓ> + พ <b> | no equivalent; ບ <ɓ> + ພ <b>
ꪜ 'low' p < *p and ꪝ 'high' p < *b | ป <p> p < *p | ປ <p> p < *p
ꪞ 'low' ph < *ph | (not cognate to ผ <ph> ph < *ph) | (not cognate to ຜ <ph> ph < *ph)
ꪟ 'high' ph < Vietnamese? | no equivalent | no equivalent
ꪠ 'low' f < *f | (not cognate to ฝ <f> f < *f) | (not cognate to ຝ <f> f < *f)
ꪡ 'high' f < *v | (not cognate to ฟ <f> f < *v) | (not cognate to ຟ <f> f < *v)

The last four Tai Viet letters in the table have no Thai or Lao graphic cognates.

Tai Viet ꪟ 'high' ph and ꪡ 'high' f look like derivatives of ꪝ 'high' p < *b in the Noto Sans Tai Viet font, but do not resemble 'high' p in N3220 or this Unicode chart.

I assume the 'extra' Tai Viet letters for 'high' th and ph are for Vietnamese loans with 'high' tones that would not normally follow native 'low' th and ph.

'LOST' LAO LETTERS?

I shouldn't interrupt my Churyumov-Gerasimenko series (which itself interrupted a series on Tangut rhyme 4), but I want to ask this before I forget:

Were Sanskrit and Pali loanwords ever written etymologically in premodern secular Lao writing?

Today, Sanskrit and Pali loanwords are written phonetically in Lao, whereas they are written etymologically in Thai: e.g.,

Lao ພາສາ <bāsā> phaasaa

Thai ภาษา <bhāṣā> phaasaa

from Skt bhāṣā or Pali bhāsā 'language'. In earlier Lao and Thai, the word was *baasaa; neither language ever had a *bh or *ṣ. Later, *b shifted to ph in both languages.

I count Lao <bāsā> as a 'phonetic' spelling even though the word is no longer [baːsaː] in modern Lao because <b> is always [pʰ] in modern pronunciation; it is an absolutely regular spelling without any regard for Sanskrit or Pali. (To simplify matters, I will not discuss the interaction between consonants and tones in Lao and Thai.)

Conversely, I count Thai <bhāṣā> as etymological because it retains special letters <bh> and <ṣ> for Indic sounds that never existed in Thai.

As far as I know, the usual pattern is for religiously motivated scripts to keep 'extra', etymological letters even in secular writing, unless there are later attempts to eliminate those letters in modern times: e.g.,

- Russian only lost the Greek-based letters Ѳ (< theta) and Ѵ (< upsilon) less than a century ago, and lost others (Ѯ < xi, Ѱ < psi, Ѡ < omega) a little over three centuries ago.

- Ottoman Turkish retained 'extra' letters for Arabic loans up to its demise. I know of no attempt to create equivalents in the Turkish Latin alphabet, though that would have been theoretically possible.

- Persian retains 'extra' letters for Arabic loans to this day despite proposals for reforms that would eliminate them (see Sprachman 2002: 54-77 for examples).

- Burmese and Khmer, like Thai, retain 'extra' letters for Indic loans: e.g., Burmese ဘ <bh> and Khmer ភ <bh>.

- There was a short-lived attempt to eliminate the 'extra' letters in Thai.

Lao seems to be an exception to this pattern if my understanding of Enfield (1999: 260) is correct. I used to think that Lao script had lost the 'extra' letters (e.g., this post), but according to Enfield, it apparently never had them:

When people argue on this basis for a "return to tradition" through incorporation of the remaining characters [needed for etymological spelling of Indic loans], they are in fact not arguing for restoration, but for the modern, and in many cases novel, fixture of orthographical devices in the language. The deeper historical questions regarding developments of "native" Lao/Thai orthography are complex ones, which I cannot pursue here. But it is important to understand in the present context that the standardized etymological basis of Thai orthography in its present form, being literally designed to handle faithful transcription of Pali and especially Sanskrit, does not represent something that Lao once had or, in particular, could ever "go back to."

Yet Maha Sila Viravong's Lao alphabet had most of those 'extra' letters. I once assumed they were retentions, but they weren't even resurrections: e.g., the description of his Lao <ṭ> says it is (emphasis mine)

[o]ne of the 14 additional Lao letters that were created to transcribe Pali consonants. The letters were originally created in the 1930's by Dr Maha Sila Viravong who was working for the Buddhist Academic Council which was presided over by Prince Phetsarath.

I don't think they were created ex nihilo, though. I assume the letters were derived from some variant of the tham 'dharmic' script that retained the 'extra' letters. (The forms of the 'extra' letters in the two scripts as presented on Wikipedia do not always match: e.g., Lao <jh> and tham <jh>. Are the Lao forms novel inventions or are they based on variant letter forms not in Wikipedia or my fonts?)

Putting aside whatever happened in the 1930s, would Lao - and Thai - of centuries past have spelled Indic loanwords as if they were native words: e.g., *baasaa as <bāsā>? Is Lao <bāsā> a spelling that has been unchanged since the word was first written in a secular context? On the other hand, is Thai <bhāṣā> a modern pseudoarchaism?

Although I know something about Tai historical phonology, I know nothing about Tai philology. Why is Tai spelling history not mentioned in English-language studies of Tai language history? It is not as if the Tai languages were never written until modern times. Is it because Tai linguistics is largely the domain of field workers? I fear a large body of data (especially in Zhuang, which is written in a Chinese-based script) has been overlooked.

11.21.2:41: Some romanizations of Lao names indicate knowledge of Indic etymology. Many examples are on this page: e.g., Bhuma for ພູມາ <būmā> [pʰuːmaː] from Sanskrit/Pali bhūma- 'earth' (with final lengthening). I used to think these were transliterations of Lao spellings prior to a reform that eliminated the 'extra' letters, but if Lao never had these 'extra' letters prior to Maha Sila Viravong's alphabet, what is the origin of these spellings? Are they just carryovers from the Indic style of transliterating Thai?

CHURYUMOV IN TANGRAPHY (PART 3)

Tangutizing the second syllable of Ukrainian Чурюмов [tʃuˈrʲumow] 'Churyumov' should be trivial. There is no doubt that Tangut had r- (transcribed in Tibetan as r-), and Gong Hwang-cherng (1997) and Arakawa (1997) both reconstruct two -ju rhymes*. Yet neither reconstructs a syllable rju. In Gong's reconstruction, r- can only precede retroflex vowels in rhymes 77-103 with the exception of rhyme 43 -jɨj. Similarly, in Arakawa's reconstruction, r- can only precede retroflex vowels in rhymes 77-103 with the exceptions of

rhyme 43 rjẽ2

rhyme 75 rjoŋ (rjọ̃ in his 1999 reconstruction**; Gong reconstructed ljọ with l-)

rhyme 77 rjek2 (rjẹ in his 1999 reconstruction; Gong reconstructed reʳj with a retroflex vowel)

rhyme 78 re'2 (rẹ' in his 1999 reconstruction; Gong reconstructed rieʳj with a retroflex vowel)

rhyme 79 rje'2 (rjẹ' in his 1999 reconstruction; Gong reconstructed rjiʳj with a retroflex vowel)

I reconstruct rhyme 43 as -ẽ, and I think rẽ was a simplification of an earlier *rẽʳ with an unusual nasalized retroflex vowel like those of Kalasha (Heegård and Mørch 2004: 67). rjoŋ/rjọ̃ with rhyme 75 may have a similar explanation if its initial was r- (as opposed to Gong's l-): it may be from an earlier *rjọ̃ʳ with a nasalized retroflex tense vowel. (Does any language have that triple combination? I doubt it, but then again, I would be skeptical of nasalized retroflex vowels if I didn't know about Kalasha.)

Therefore the ryu of Churyumov would have to be Tangutized with a retroflex vowel as something like rjuʳ. Gong reconstructed twenty rjuʳ-syllables, whereas Arakawa only reconstructed eighteen. Arakawa may have accidentally left out the two members of Homophones A group 9.77. The obvious choice is

2147 2rjuʳ 'broom'

which rhymes with its synonym, the first half of
2271 0109 2zjuʳ 2gjịj (Gong), 2zzjuʳ 2gẹ̃ (Arakawa) 'comet' (lit. 'broom star')

So can we finally move on to Tangutizing -mov yet?

Next: Y not?

*[ju] is -yu in Arakawa's notation. I have rewritten Arakawa's and Gong's reconstructions in an IPA-like system to facilitate comparison.

**Arakawa (1999: 41) reconstructed both rhymes 73 and 75 as -ọ, but I think 75 -ọ was a typo for 75 -jọ̃ since he regarded 75 as a Grade II rhyme, his Grade II has medial -j-, and he placed 75 across from 57 -jõ.

CHURYUMOV IN TANGRAPHY (PART 2)

In part 1 I decided to Tangutize the first syllable of Ukrainian Чурюмов [tʃuˈrʲumow] 'Churyumov' as 1013

which was pronounced something like chu.

Were Tangut shibilants palatal, alveopalatal, or retroflex?

I have been using the neutral notation ch to avoid answering that question up until now in this series.

There is no doubt that Tangut class VII initials were shibilants. They were most commonly transcribed in Tibetan as c-, ch-, j-, and sh- (ignoring preinitials; Tai 2008: 194). Moreover, one of the class IX initials was commonly transcribed in Tibetan as zh- (again ignoring preinitials; Tai 2008: 201). Although the Tibetan initials were probably palatal* [tɕ tɕʰ dʑ ɕ ʑ], that does not mean that the Tangut initials were necessarily palatal, as the Tibetan script had no characters for alveopalatal [tʃ tʃʰ dʒ ʃ ʒ] or retroflex [tʂ tʂʰ dʐ ʂ ʐ].

Middle Chinese had a distinction between palatals and retroflexes. Twelfth-century reflexes of both types of initials were used to transcribe Tangut shibilants in the Timely Pearl. That implies either that the distinction was lost (as in Phags-pa Chinese to the east from the following century) or that neither was a perfect match for the Tangut shibilants (which could have been alveopalatal).

Sanskrit also had a distinction between palatal ś [ɕ] and retroflex ṣ [ʂ]. Both were transcribed with Tangut sh-, though there are a few alveolar s-tangraphs for Sanskrit palatal ś-syllables and the s-tangraph 0493

could represent Sanskrit palatal ś, retroflex ṣ, and alveolar s (Arakawa 1997: 110-114). This admittedly small tendency to write Sanskrit palatal ś as Tangut alveolar s may suggest that Tangut sh was closer to Sanskrit retroflex ṣ. The correspondence of Sanskrit palatal ś to Tangut alveolar s is also reminiscent of the Russian transcription of Mandarin palatal x [ɕ] as palatalized alveolar [sʲ]: e.g., 西夏 Xixia 'Tangut' as Си Ся [sʲi sʲa].

However, the use of Tangut alveolar affricates (ts- tsh- dz-) to transcribe Sanskrit palatal stops (c ch j**) is not evidence against Tangut shibilant affricates being palatal because the variety of Sanskrit known to the Tangut had alveolar affricates instead of palatal stops.

Shibilants are one of the three types of 'vigilant' Grade III initials in Tangut. If Grade III was palatal as reconstructed by Gong, then I would expect the 'vigilant' initials to be palatals:

class II ɥ- (which I would not expect in a labiodental class)

class VII tɕ-, tɕh-, dʑ-, ɕ-

class IX λ-, ʑ-

I prefer to more or less follow 李新魁 Li Xinkui (1980)*** and reconstruct these initials as follows:

class II v-

class VII tʂ-, tʂh-, dʐ-, ʂ-

class IX l- [ɫ], ʐ-

These initials are all 'antipalatal': cf. how Russian retroflexes and nonpalatalized [v] and [ɫ] cannot precede [i]. Just as Russian /i/ retracts to [ɨ] after retroflexes, pre-Tangut *i became ɨi after retroflexes.

There is acoustical affinity between l and v: e.g., in Ukrainian, syllable-final *-l became /v/ (*volk > /vovk/ 'wolf').

Moreover, there is also acoustical affinity between retroflexes and labiodentals: e.g., in modern northwestern Chinese dialects (whose substrata if not ancestors were the dialects known to the Tangut 800-900 years ago), retroflexes became labiodentals before *w and *u (Coblin 1994: 97, 102):

*tʂ- > pf-

*tʂh- > pfh-

*ʂ- > f-

*ʐ- > v-

So I am not surprised that the 'vigilant' initials form a class in Tangut.

Lastly, if the Tangut grades were like Chinese grades, Grades II and III would have been less palatal than Grade IV. And those are precisely the two grades associated with shibilants****. Hence I think palatal initials are less likely in those two grades.

In any case, Tangut must have had retroflexes at some stage, as the shibilant in

3200 1tʂhɨiw < *K-truk 'six'

is from an *r-cluster: cf. Classical Tibetan drug, Written Burmese khrok, and many other Sino-Tibetan words for 'six'. This retroflex *tʂh- from *K-tr- could have shifted to alveopalatal *tʃh-, palatal *tɕh-, or even alveolar *tsh- in Tangut dialects. There are rare cases of Tibetan alveolar affricates (ts- dz-) transcribing Tangut shibilants (Tai 2008: 194). If those instances of ts- and dz- are not errors*****, they may reflect the beginnings of a shift from retroflexes to alveolars.

Next: On to the second syllable in part 3.

*They were palatal in Old and Classical Tibetan (Jacques 2012: 90). There is no guarantee they were also palatal in the dialect(s) underlying the Tibetan transcriptions of Tangut. Nonetheless there is also no evidence suggesting that they were not palatal in that dialect or dialects.

**I have not seen any Tangut transcriptions of the rare Sanskrit consonant jh.

***11.19.0:24: Li Xinkui (1980) was the first to reconstruct retroflexes in classes VII and IX. My reconstructions are identical to his (as listed in Li Fanwen 1986: 126-127) except for (1) dʐ- corresponding to his aspirated dʐh- and (2) [ɫ].

****Arakawa (1997: 135) proposed that rhyme 50, which only has shibilant and l-initials (i.e., two of the three types of 'vigilant' initials), was Grade I. Sofronov, Gong, and I regard 50 as Grade III.

*****The Tibetan characters for ts and dz are derived from the characters for c and j:

ts < c
dz < j

I would expect the extra stroke of ts and dz to be accidentally omitted rather than accidentally added. Thus I suspect the scribes intended to write ts and dz, though it is not clear whether they actually heard [ts] and [dz] or misheard shibilants as [ts] and [dz].

CHURYUMOV IN TANGRAPHY (PART 1)

Last night I wrote,

I still think each half is appropriate since Philae did cause us to see 67P/Churyumov-Gerasimenko. I'm not going to try to Tangutize all of that.

But tonight I decided to try anyway since Tangutizing Churyumov and Gerasimenko brings up some interesting issues in Tangut phonological reconstruction.


Before I deal with Tangut, I have a Ukrainian question. I assume that Чурюмов [tʃuˈrʲumow] 'Churyumov' was once Чурюмовъ with a final weak yer. Normally o fronted to i before a weak yer in Ukrainian: e.g., Харьковъ > Харків 'Kharkiv'. Yet the surname is not *Чурюмів. Nor is the genitive plural of мова 'language' *мів from мовъ; it's мов. None of the other forms of мова ever had a weak yer in the syllable after o. Was o restored by analogy in those words?

Tangutizing Chu-

Many scholars (Nishida, Sofronov, Huang, Li Fanwen, Gong Hwang-cherng, Arakawa, and most recently even myself) reconstruct Tangut rhyme 1 as -u. (Huang and Li also reconstructed -iu and -ü respectively.) Hashimoto reconstructed a long vowel -U [uː]. This rhyme was almost always transcribed in Tibetan as -u (Tai 2008: 204), and it was used to transcribe Sanskrit -u and -ū (Arakawa 1997: 110, 112).

The consensus is that rhyme 1 was Grade I. Grade I rhymes never follow class VII initials: i.e., shibilants such as ch-. Therefore there was no Grade I syllable chu in Tangut. Why were ch- and -u incompatible in Tangut? That question incorporates the assumption that rhyme 1 was -u. Perhaps the Grade III (Arakawa's Grade II) rhyme 2 syllable

that transcribed the Tangut period northwestern Chinese cognates of modern Mandarin

朱蛛猪諸 zhū [tʂu ˥]

竹竺 zhú [tʂu ˧˥]

zhǔ [tʂu ˨˩˦]

zhù [tʂu ˥˩]

zhōu [tʂow ˥]

zhǒu [tʂow ˨˩˦]

was chu (and therefore the best match for the Chu- of Churyumov), and the Grade I rhyme was something other than -u. Here are three possible scenarios:

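Scenario A: Grade I rhyme 1 = nonshibilant + -u; Grade III rhyme 2 = shibilant + X + -u

Scenario B: Grade I rhyme 1 = nonshibilant + X + -u; Grade III rhyme 2 = shibilant + -u

Scenario C: Grade I rhyme 1 = nonshibilant + X + -u; Grade III rhyme 2 = shibilant + Y + -u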

In scenario A, rhyme 1 was -u, but rhyme 2 had an extra quality X that differentiated it from simple -u: e.g., Gong Hwang-cherng's -j-, Arakawa's -y- (= [j]), and my -ɨ-.

In scenario B, it is rhyme 1 that had an extra quality X that differentiated it from simple -u. If Gong Xun (2014) is correct, that quality may have been pharyngealization or retracted tongue root. But this raises the question of why the Tangut would use rhyme 1 to transcribe Sanskrit -u and -ū, which had no pharyngealization or retracted tongue root. My short answer is that rhyme 1 was the best available match after many initials. I will elaborate on that answer in the future.

In scenario C, neither rhyme had a simple -u. For years until recently I reconstructed rhyme 1 as -əu and rhyme 2 as -ɨu.

Next: Tangutizing the rest of Churyumov.

PHILAE IN TANGRAPHY

Having just written about the Tangut word for 'comet', I wanted to come up with a Tangutized name of the Φιλαί Philae lander. A Tangutization could be based on at least three different pronunciations:

English [fajli]

Modern Greek [file]

Classical Greek [pʰilai]

Each poses at least one problem for Tangutization.

Did Tangut have f-?

As far as I know, only Nishida, Huang Zhenhua, and Arakawa reconstructed f-. I do not think it is completely impossible, but I am not yet fully convinced. It is a low-frequency initial in all three reconstructions:

- it appears before only 18 of Nishida's (1964: 85-86) 102 rhymes

- it appears before only 6 of Arakawa's (1997: 125-149) 105 rhymes

- it appears before only 12 of Huang's (1983: 128-134) 97 'level' tone rhymes

Only Arakawa and Huang (?) reconstruct f- before -i(:) in 3859 'rat', which has no homophones:

Arakawa 1fi:, Huang 1fi (?)

but Nishida 1wi, Sofronov 1968 1xi̭we, Gong 1xjwi

That tangraph was in the labiodental chapter of Homophones, but its initial fanqie speller - also in that chapter - had an initial fanqie speller in the glottal chapter which includes back (velar?) fricatives:


3850 1xwɨi (rhyme 10) = 0418 1xwɨə + 2228 pi (rhyme 11, not 10!)


0418 1xwɨə = 2504 1xu + 1760 1ʂwɨə

Unfortunately there is no Tibetan transcription data for 2504 or any other tangraph in its initial fanqie chain (VIII 10, Tai 2008: 196).

2504 is a transcription character for Sanskrit hu. That would be unlikely if its Tangut reading were fu - unless Tangut had no hu or xu. It transcribed both *x- and *f-initial Chinese syllables.

On the other hand, why would a xw-syllable be placed in the labiodental chapter of Homophones? Was [f] an allophone of /xw/? Were [f] and [x] in free variation before -u?

Moreover, 0418 means 'Buddha' and may be a loanword from Tangut period northwestern Chinese 佛 *fɨə. Was that word borrowed with f- and/or xw-?

In any case, I wouldn't want to Tangutize Philae with 'rat'.

Did Tangut have -ai?

Even if Tangut had f-, I am unaware of anyone reconstructing a Tangut syllable like fai. And it is doubtful that the extant recorded varieties of Tangut had -ai (though such a rhyme could have existed in unwritten dialects). Such a rhyme should have corresponded to -aHi in Tibetan transcription, but no such transcription exists.

ai is rare in Sanskrit, so it is not surprising that Arakawa (1997: 113) listed only four Tangut transcriptions of Sanskrit Cai-syllables:

4884 2ni (Grade IV rhyme 11) for Skt nai and ni; transcribed in Tibetan as niH

2563 2mɛ (Grade I rhyme 34) for Skt mai; rhyme mostly transcribed in Tibetan as -i and -e

4262 2be (Grade IV rhyme 37) for Skt vai; rhyme mostly transcribed in Tibetan as -e

5300 3639 1tə 2reʳ (Grade IV rhyme 79) for Skt trai; rhyme mostly transcribed in Tibetan as -e

If Tangut had a rhyme -ai, all Sanskrit Cai-syllables would have been transcribed with that rhyme (or with its retroflex variant -aiʳ after r-).

The use of both Tangut -i and -e-type rhymes indicates that Tangut had no exact match for -ai: -i imitated the second half of -ai, while mid front -e was a compromise between low a and high front i.

One might counter that the Tangut heard a foreign (e.g., Tibetan or Chinese) pronunciation of Sanskrit with a monophthong like e instead of ai, but if that were the case, the Tangut could have consistently transcribed that e as e.

Following the precedent of transcribing Sanskrit ai with -e-type rhymes, I would Tangutize Philae as

0749 0046 1phi 2le

which one might be tempted to 'translate' as 'cause to see', though that would be ungrammatical in Tangut since the causative 1phi follows verbs. I chose 1phi because it transcribed Tangut period northwestern Chinese *phi-syllables (霹鼻脾備琵) in the Timely Pearl. 2le 'to see' transcribed Sanskrit le. Even though 0749 0046 can't be a Tangut phrase, I still think each half is appropriate since Philae did cause us to see 67P/Churyumov-Gerasimenko. I'm not going to try to Tangutize all of that. I'll settle for

3200 1084 2205 0749 1tʂhɨiw 2ɣạ 1ʂɨạ 1phi 'six ten seven causative'

with a Tangutization of English [pʰi] 'P'.

ARE COMETS BEAUTIFUL BROOMS?

Since a comet has been in the news lately, this would be a good time to take up Andrew West's suggestion to write about the Tangut word for 'comet' which appears in Timely Pearl 074:

2271 0109 2ɮyʳ 2gẹ 'comet'

It corresponds morpheme by morpheme to its Chinese translation 掃星 'broom star'. 2271 is probably a derivative and special spelling of 3695 2ɮyʳ 'broom' used in astronomical contexts:


2271 = 'grass' (left of 3695) + an element Grinstead (1972: 28) glossed as 'finery' and 'ornament', perhaps from a character such as 0364 tsẽ 'beautiful' - were comets 'beautiful brooms' in the sky as opposed to those on the ground which could be held with hands (the right-hand element of 3695)?

Is 2ɮyʳ 2gẹ a calque of 掃星?

Although I do not know of any pre-Tang attestations of 掃星, 彗星 'comet', also literally 'broom star', goes at least as far back as the Han Dynasty, and Karlgren (1957: 143) glossed 彗 by itself as 'comet' in the pre-Han Zuo zhuan.

Unfortunately there is no way to determine how old the Tangut term is; the most I can say is that it certainly was not invented on the spot by Timely Pearl author Gule Maocai in 1190, as it also appears in other texts: 53A53 in the first edition of Homophones, which was written 65 years earlier, Newly Assembled Precious Dual Maxims (1187), and volume 6 of the Tangut translation of the Golden Light Sutra. And each half is in the Precious Rhymes of the Tangraphic Sea dating from sometime after 1069.

I do not know of any Tibetan term for 'comet' like 'broom star'. Are there languages outside the Sinosphere with 'broom star' for 'comet'? How likely is it that the Tangut coined the term independently?

2271 could also mean 'comet' in the compound

2814 2271 2ɬị 2ɮyʳ 'comet'

'moon comet'

from Timely Pearl 083. It too is a morpheme-for-morpheme match of Chinese 月孛 'moon comet'. Could Late Middle Chinese 孛 *pɦot be the source of Tibetan phod 'comet'?

In Homophones 53A52 and Timely Pearl 265, the regular tangraph for 'broom' is paired with a rhyming synonym also written with the 'grass' radical:

3695 2147 2ɮyʳ 2ryʳ 'broom (and?) broom'

Are those two words cognates? I would not expect lateral ɮ- and retroflex r- to be in the same word family. Is 2ɮyʳ from *2l-ryʳ? Are there other pairs of ɮ- and r-words with identical rhymes and similar semantics?

Nishida (1964: 213) has the English translation 'broom' for 3695 2147, implying that it was a redundant compound. Other possible redundant compounds are:

2147 4260 2ryʳ 2ɬø̃ 'broom (and?) broom' (Homophones A 51A37 and 48B32)

4260 2147 2ɬø̃ 2ryʳ 'broom (and?) broom' (Tangraphic Sea 1.55.211)

0094 4910 2147 1ʂwo 2vɛ 2ryʳ 'sweeping broom' (Tangraphic Sea 1.55.211)

0094 4910 4260 1ʂwo 2vɛ 2ɬø̃ 'sweeping broom' (Tangraphic Sea 1.81.252)

0094 4910 1ʂwo 2vɛ is a verb 'to sweep'. Although Li (2008: 16, 777) glossed it as a noun followed by a verb, the two halves seem inseparable, so I regard it as a disyllabic root rather than as a compound.

THE M-D-L-ED MYSTERY OF TANGUT RHYME 4 (PART 1)

The second syllable of
3721 5407 2bʌ 2dɤu 'stupa, pagoda'

has Tangut rhyme 4

0730 1mɤu 'protruding mouth; pestle' (name of the level tone variant of rhyme 4)

0310 2mɤu 'transcription of Sanskrit mu, mū; cord (< Chn 纆?); to wipe (< Chn 抹?); to connect' (name of the rising tone variant of rhyme 4)

which has mystified me for almost seven years.

Modern scholars have categorized Tangut rhymes in terms of 'grades'. I am not entirely comfortable with that because I don't know of any Tangut word for 'grade'. I have not seen any of the Tangut translation equivalents of Chinese 等 'grade' used in a phonological context:

0382 1dzɨi 'equal, even'

0424 2te 'equality (< Chn 等); to measure'

0724 2nə 'plural suffix'

1290 2tsew 'class'

1576 2kɑ̣ 'equality'

1737 1kɑ 'equal, even'

(Of course, there is the possibility that the Tangut used an entirely different word for 'grade' that has not yet been identified. Tangut phonologists were surely familiar with the concept from Chinese phonology; the issue is whether they applied it to their own language.)

Moreover, what I have seen of the Tangut rhyme tables (not enough!) was not arranged by grade, unlike the Chinese 韻鏡 Yunjing 'Rhyme Mirror' rhyme tables. Nonetheless, I am convinced by Gong's (1994) arguments in favor of Tangut grades, though I favor four grades instead of three. And I would add another argument: each grade is strongly correlated with a different set of initials:

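[Table: Grades I-IV and the initial types found before them, including a column for non-l liquids; the cells are not recoverable.]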

(I have omitted the controversial and rare class IV initials.

✓ means 'present'.

X means 'ideally absent'. Red means 'actually absent'; yellow means 'ideally absent but exceptions exist': e.g.,

labial b- and glottal ʔ- and x- before grade III rhyme 2 -ɨu

alveolar tsh- before grade II rhyme 8 -ɤi

velar k-, alveolar dz-, and glottal ʔ- before grade III rhyme 10 -ɨi

l- before grade IV rhymes 3 -y and 37 -e

Exceptions list added 11.15.1:20.)

Today I coined the term 'vigilant' to refer to Grade III. Vigil is a mnemonic for three initial types associated with Grade III: v- for labiodentals (class II), g- [dʒ] for shibilants (class VII + class IX ʐ-), and l.

Grade IV is nonvigilant: i.e., it generally follows initials other than those three types.

Grade II is 'hypervigilant'; it can have any initial - vigilant or nonvigilant - other than alveolars and r-. Gong derived Grade II from medial *-r- in a 1993 paper I have not yet seen:

*CrV > CV + Grade II

I think Gong was right because his hypothesis predicts no r- before Grade II unless pre-Tangut had a cluster *rr-. And *alveolar-r clusters may have become retroflexes as in Chinese: e.g., *sr- > ʂ-.

Grade I is shibilant-free. Maybe I could call it 'hypervalent', meaning that it may occur after v-, l-, and nonvigilant initials, but not shibilants.
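Taken together, the four grades amount to a small set of co-occurrence rules between grades and initial types. The Python sketch below encodes those 'ideal' rules as stated in this post. It is a minimal illustration under explicit assumptions: the INITIAL_TYPE table, the LISTED_EXCEPTIONS list, and the function ideally_allowed are hypothetical names; the classification covers only initials mentioned in this series; and 'vigilant' Grade III is read strictly as taking only v-, shibilants, and l-.

```python
# Hypothetical sketch of the 'ideal' grade/initial rules described above.
# INITIAL_TYPE is an illustrative assumption: it covers only initials
# mentioned in this series, not the full Tangut inventory.
INITIAL_TYPE = {
    "b": "labial", "m": "labial",
    "v": "labiodental",
    "ts": "alveolar", "tsh": "alveolar", "dz": "alveolar",
    "k": "velar",
    "ʔ": "glottal", "x": "glottal",
    "tʂ": "shibilant", "tʂh": "shibilant", "dʐ": "shibilant",
    "ʂ": "shibilant", "ʐ": "shibilant",  # class VII + class IX ʐ-
    "l": "l",
    "r": "r",
}

# The three 'vigilant' types: v-, shibilants, and l-.
VIGILANT = {"labiodental", "shibilant", "l"}


def ideally_allowed(initial: str, grade: str) -> bool:
    """Return True if the initial is ideally allowed before a rhyme of this grade."""
    kind = INITIAL_TYPE[initial]
    if grade == "I":    # shibilant-free
        return kind != "shibilant"
    if grade == "II":   # 'hypervigilant': any initial except alveolars and r-
        return kind not in {"alveolar", "r"}
    if grade == "III":  # 'vigilant' (assumed here: vigilant initials only)
        return kind in VIGILANT
    if grade == "IV":   # nonvigilant
        return kind not in VIGILANT
    raise ValueError(f"unknown grade: {grade}")


# Exceptions flagged in the list above: (initial, grade, rhyme number).
LISTED_EXCEPTIONS = [
    ("b", "III", 2), ("ʔ", "III", 2), ("x", "III", 2),
    ("tsh", "II", 8),
    ("k", "III", 10), ("dz", "III", 10), ("ʔ", "III", 10),
    ("l", "IV", 3), ("l", "IV", 37),
]

if __name__ == "__main__":
    for initial, grade, rhyme in LISTED_EXCEPTIONS:
        # Each listed exception violates the ideal rules, so this prints False.
        print(initial, grade, rhyme, ideally_allowed(initial, grade))
```

Every listed exception comes out False, which is exactly what marks it as an exception. The rules describe the pattern but say nothing about why such syllables exist.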

Rhymes 1-4 were all transcribed as -u in Tibetan. All scholars who reconstruct grade systems in Tangut agree that rhyme 1 was grade I, but disagree on the others:

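[Table: grades assigned to rhymes 1-4 by Hashimoto 1965, Gong 1997, Arakawa 1999, Sofronov 2012, and this site, and the initials found before each rhyme, including a column for non-l liquids. Surviving cell notes: '✓ but not m-!', '✓ but not d-!', 'b- only', 'ʔ-, x- only', 'm- only', 'd- only', 'ɬ- only'.]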

The initials of rhyme 4 are unlike any of the initial sets expected for the four grades. They are all back initials with the exceptions of m- and d- (as reconstructed by Gong) and ɬ- (= Gong's lh-). Why do m- and d- appear before rhyme 4 but not rhyme 1 in Gong's reconstruction?

Next: Were m- and d- really m- and d-?

STUMPED BY 'STUPA' (PART 4: ETYMOLOGY OF THE SECOND SYLLABLE)

Having covered homophones of the first syllable of

3721 5407 2bʌ 2dɤu 'stupa, pagoda'

in parts 2 and 3, and hypothesizing that

4908 1b(w)ʌ 'ceremony and propriety'

might be related to it, I am now going to look for a plausible cognate of the second syllable among its (near-)homophones:

A group
Homophones A page/location Homophones B/D group Homophones B/D page/location Tangraph Gloss Reading Tangraphic Sea rhyme Overall rhyme
III.5 12A48 (1) 13A41 to exist, have, place 1dɤu 1.4 4
12A51 13A42 peaceful
12A52 13A44 building
12A53 13A45 first half of 0979 0978 1dɤu 2da 'slow, obtuse, dazed'
III.6 12A54 13A43 anger, rage (< Chn 怒)
12A55 13A46 to ban, prohibit, resist; to sink, drown, trap (when reduplicated)
12A56 (2) 13A47 to measure (< Chn 度) 2dɤu 2.4
12A57 13A48 second syllable of the surname 4561 2284 2ba 2dɤu (almost homophonous with 3721 5407 2bʌ 2dɤu 'stupa'!)
12A58 13A51 second syllable of 4373 4281 2dɤu 2lɨi 'pear tree' (< Chn 杜梨)
12A62 13A53 second word of 2671 0712 2bʌ 2dɤu 'drawers and stomacher' (< Chn 肚), a homophone of 3721 5407 2bʌ 2dɤu 'stupa'
12A63 13A54 second syllable of 0691 0710 2bɤa 2dɤu 'large-collared gown' (not in Tangraphic Sea)

Homophones group numbering follows the conventions established in part 3.

Note how the different placement of and in the middle implies that they could have had different tones in different editions of Homophones. Perhaps the circle marking the end of group III.5 was improperly placed in Homophones A, and that error was corrected in later editions of Homophones.

The missing 12A61 (Homophones A)/13A52 (Homophones B and D) is of course 5407 2dɤu, the second half of 'stupa'. And its obvious source is 2829   1dɤu 'building':


4908 + 2829 = 3721 5407

1bʌ 'ceremony' + 1dɤu 'building' = 2bʌ 2dɤu 'stupa'?

Semantically that is fine, but the tones do not match. Why were 'level' tones changed to 'rising' tones? And when did that happen? Before or after a final glottal *-H conditioned rising tones - if Tangut tones are of segmental origin? (What if Tangut tones originated from pitch accent as in southern Qiang?)

Moreover, is it simply a coincidence that four out of seven disyllabic words with dɤu fit the pattern bA dɤu?

0691 0710 2bɤa 2dɤu 'large-collared gown'

2671 0712 2bʌ 2dɤu 'drawers and stomacher'

3721 5407 2bʌ 2dɤu 'stupa, pagoda'

4561 2284 2ba 2dɤu (a surname)

The homophones from yesterday's list of 1bʌ-syllables come to mind:

5035 4068 1bʌ 1mɛ 'to present a gift; to fete'

5042 4072 1bʌ 1mɛ 'soft'

Is such (near-)homophony the result of unconscious evolution, or is it the product of conscious design? The study of Tangut polysyllabic morphemes has barely begun.

STUMPED BY 'STUPA' (PART 3: NEAR-HOMOPHONES OF THE FIRST SYLLABLE)

In part 2, I looked at exact homophones of the first syllable of
3721 5407 2bʌ 2dɤu 'stupa, pagoda'

in search of possible cognates and found none. I forgot to look at near-homophones with the first tone:

Fanqie Tangraph Li Fanwen number Gloss
Initial Final


3149 first syllable of 3149 2811 2816 1bʌ 1lɨə 1lø 'round bone' (only in dictionaries)
5035 first syllable of 5035 4068 1bʌ 1mɛ 'to present a gift; to fete' (homophonous with 'soft' below)
5042 first syllable of 5042 4072 1bʌ 1mɛ 'soft' (homophonous with 'present a gift' above)


0022 resources
3594 first syllable of 3594 0620 1bʌ 2dɑ 'abrupt', 3594 0700 1bʌ 2dʐɨaʳ 'to throw' (only in dictionaries), and 3594 5586 1bʌ 2dʐwø 'to throw'
3692 first syllable of 3692 0342 1bʌ 1dzə 'to throw' (only in dictionaries); why weren't all three 'throw' verbs written with the same tangraph? Why change the bottom right corner?
4908 ceremony and propriety (only in dictionaries)
5031 second syllable of 2621 5031 2lɨə 1bʌ, name of an ancestor of the black-headed Tangut

Although these eight tangraphs have two different fanqie (which I will call A and B), they were placed in the same group as all but one of the 2bʌ tangraphs in Homophones A:

A group
Homophones A page/location Tangraphs Reading Tangraphic Sea rhyme Overall rhyme
I.16 03A55-03A61 1bʌ B 1.27 28
03A62-03A63 2bʌ 2.25
03A64 1bʌ A 1.27
03A65-03A67 2bʌ 2.25
03A68 1bʌ A 1.27
03A71-03A72 2bʌ 2.25
03A73 1bʌ' 1.31 32
03A74 1bʌ A 1.27 28
03A75-03B12 2bʌ 2.25
I.17 03B13
03B14 1bʌʳ 1.84 90

The numbering of Homophones A groups follows Li Fanwen 1986 and Arakawa 1997.

The two types of 1.27 were separated from each other and from 2.25 in Homophones B and D. Rhyme 28 tangraphs were no longer mixed with tangraphs of other rhymes. (rhyme 32) is 10A21 and (rhyme 90) is 10A57 in Homophones B and D.

B/D group
Homophones B/D page/location Tangraphs Reading Tangraphic Sea rhyme Overall rhyme
(1) 04A54-04A58 1bʌ B 1.27 28
(2) 04A61-04A62 1bʌ A
(3) 04A63-(04B01) 2bʌ 2.25

I have arbitrarily numbered the Homophones B and D groups.

The main character of 04B01 in Homophones B has not survived, but what remains of the clarifier beneath it matches 1765 in Homophones A and D. 1765 is not separated from other 2bʌ in B or D. (This section is missing from Homophones C.)

The (mis)matches between the different editions of Homophones and the Tangraphic Sea indicate that

- 1bʌ (both types) and 2bʌ were very close (or even homophonous in the Homophones A dialect: i.e., they all had the same tone - or no tone?)

- the distinction between the two types of 1bʌ may not have been an isolated quirk of the Tangraphic Sea, as the B type tangraphs are clustered at the beginning of Homophones A group I.16, and have their own group (1) in Homophones B and D

- the fanqie suggest the difference may have involved a medial -w-, though such a medial is not otherwise thought to be distinctive after labials:

A: 1bi + 1kʌ = 1bʌ

B: 1bu + 1lʌ = 1bwʌ? (whose -w- may be from a prefix *P- - but would *P-prefixed forms really have outnumbered prefixless forms five to three?)

- rhymes 32 and 90 were similar to rhyme 28

- 1bʌ' (rhyme 32) might have been 1bʌ A in the Homophones A dialect whose ancestor may have lacked the conditioning factor that became the 'apostrophe' feature in the Tangraphic Sea dialect

- 2bʌ (rhyme 28) 'dark green' might have been 1bʌʳ (or something close to it) in the Homophones A dialect, whose ancestor may have had a prefix *R- in 'dark green' that conditioned retroflexion absent from the Tangraphic Sea dialect
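The argument about the two fanqie can be made concrete with a minimal Python sketch of the general fanqie principle: the initial speller contributes its initial, and the final speller contributes its rhyme. The assumptions are not from the sources. They are that a reading is a tone digit followed by an initial and a rhyme, that the tone comes from the final speller as in Chinese fanqie, and that any -w- belongs to the rhyme. The INITIALS list and the functions split_reading and fanqie are hypothetical helpers covering only initials used in this series.

```python
# Hypothetical sketch of fanqie spelling in the notation used here:
# a reading is a tone digit + initial + rhyme (e.g. '1xwɨə').
# INITIALS is an illustrative subset, not a full Tangut inventory.
INITIALS = sorted(
    ["p", "ph", "b", "m", "v", "t", "th", "d", "n", "k", "kh", "g", "ŋ",
     "ts", "tsh", "dz", "s", "tʂ", "tʂh", "dʐ", "ʂ", "ʐ",
     "ʔ", "x", "ɣ", "l", "ɬ", "ɮ", "r"],
    key=len,
    reverse=True,  # match longer initials first: 'tʂh' before 'tʂ'
)


def split_reading(reading: str) -> tuple[str, str, str]:
    """Split a reading such as '1xwɨə' into (tone, initial, rhyme)."""
    tone, rest = reading[0], reading[1:]
    for initial in INITIALS:
        if rest.startswith(initial):
            return tone, initial, rest[len(initial):]
    return tone, "", rest  # no consonant initial found


def fanqie(initial_speller: str, final_speller: str) -> str:
    """Initial from the first speller; tone and rhyme (incl. any -w-) from the second."""
    _, initial, _ = split_reading(initial_speller)
    tone, _, rhyme = split_reading(final_speller)
    return f"{tone}{initial}{rhyme}"


print(fanqie("1bi", "1kʌ"))    # fanqie A -> 1bʌ
print(fanqie("1bu", "1lʌ"))    # fanqie B -> 1bʌ
print(fanqie("1xu", "1ʂwɨə"))  # 0418 'Buddha' -> 1xwɨə
```

Both A and B come out as 1bʌ under this naive rule, so whatever distinguished the two types is something the rule discards, such as a -w- contributed by the initial speller 1bu. Likewise, the naive rule cannot produce the -w- of 3850 1xwɨi, since its final speller 2228 pi has none.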

At the pre-Tangut level, the sources of 1bʌ and 2bʌ could have been identical except for the absence or presence of a final glottal *-H that conditioned the rising tone. (I am putting aside the question of whether type B 1bʌ had *P- at the pre-Tangut level.)

Out of the eight 1bʌ above, the only one with any potential semantic relevance to 3721 5407 2bʌ 2dɤu 'stupa, pagoda' is 4908 'ceremony and propriety'. If 4908 were suffixed with *-H or shifted to the rising tone after tonogenesis (but why?), it could have been added to something like 'mound' or 'building' to form 'stupa, pagoda'. But is there a 2dɤu with such a meaning?

Next: Homophones of 2dɤu 'stupa, pagoda'.

STUMPED BY 'STUPA' (PART 2: ETYMOLOGY OF THE FIRST SYLLABLE)

The second half of the Tangut word

3721 5407 2bʌ 2dɤu 'stupa, pagoda'

could be used on its own to mean 'stupa'. Was the first half 3721 2bʌ a prefix or modifier? Eight of the homophones of 3721 2bʌ are not free morphemes and cannot be modifiers. Nor does it seem likely that the noun 'stupa' shared a prefix with, say, the verb 'to swell'. The remaining five homophones do not look like probable modifiers: e.g., 'insect'.

Tangraph Li Fanwen number Gloss

first syllable of 0589 3530 (2008) 2bʌ 2dəʳ (1xõ) 'scabies' (only in dictionaries)
first syllable of 1386 2434 2bʌ 1be and 1386 1146 2bʌ 1kɑ̣, both 'old and shabby'
dark green (only in dictionaries)
first syllable of 2276 1972 2bʌ 2reʳ 'to swell' (the second half can occur by itself)
first syllable of 2280 0504 2bʌ 2lɨẽ 'spinach'
first word of 2671 0712 2bʌ 2dɤu 'drawers and stomacher', a homophone of 3721 5407 2bʌ 2dɤu 'stupa'; Li Fanwen (2008: 439) regarded 2671 as a loan from Chinese 襪 'drawers', but the latter was *va, which is a poor phonetic match.
first syllable of 2828 0865 2bʌ 1tʂɨe 'to bear a burden' (only in dictionaries?; the quotation from Nevsky 1960 II may be a quotation from the lost rising tone volume of the Tangraphic Sea)
first syllable of 2828 0090 2bʌ 1voʳ 'mandarin duck' (1voʳ is 'chicken')
first syllable of the place name 2828 5856 2bʌ 2ɣɑ
pellet; first word of phrases 3381 2290 2bʌ 2lõ and 3381 5900 2bʌ 2di, both 'pellet' (2lõ is 'round' and 2di is 'broken')
first syllable of 4766 1032 4789 2bʌ 1vʌ̣ 1ny 'a kind of vegetable' (only in dictionaries)

I presume that the homophony of 3721 5407 2bʌ 2dɤu 'stupa' and the phrase 2671 0712 2bʌ 2dɤu 'drawers and stomacher' is purely coincidental.

If none of the above are related to 3721 5407 2bʌ 2dɤu 'stupa', there are several other possibilities.

First, 2dɤu may be an abbreviation for a disyllabic root 2bʌ 2dɤu, just as Chinese 塔 ta 'stupa' is an abbreviation of 塔婆 tapo < *thəp-ba. Unlike the Chinese word, 2bʌ 2dɤu is not a borrowing from Indic, and I wonder what its original meaning was.

Second, 2bʌ- could be a fusion of *N- with *p(h)ʌ or even *Nʌ- which lowered the vowel of a following *p(h)ə or *bə, so its true cognates may have been pronounced p(h)ʌ, p(h)ə, or and/even (< *Cʌ-Pə) or və (< *Cə-Pə). Casting a wider net may eventually yield results.

Third, a cop-out: 2bʌ- is the last survivor of a morpheme that was lost elsewhere (cf. were- of werewolf, from an extinct wer 'man' [were- has only recently become productive in neologisms], and -groom of bridegroom, from an extinct guma 'man').

(11.12.8:25: Fourth, a cognate of 2bʌ- could have the first ['level'] tone. I'll look at 1bʌ tangraphs in part 3.)

STUMPED BY 'STUPA' (PART 1: TANGRAPHIC STRUCTURE)

Andrew West put up a page on Tangut text decorations including drawings of stupas. That got me thinking about the Tangut word

3721 5407 2bʌ 2dɤu 'stupa, pagoda'.

Each of the two tangraphs is the clarifier for the other in the various editions of Homophones:

2bʌ 2dɤu (a left-hand clarifier is read after the main tangraph; see scans)

2bʌ 2dɤu (a right-hand clarifier is read before the main tangraph; see scans)

The characters have nearly symmetrical structures. The analysis of the first tangraph is unknown, but I suspect it is similar to the analysis of the second:


3721 2bʌ (first half of 2bʌ 2dɤu 'stupa') =

'earth' < left of 3792 1lwy 'low' (only in dictionaries?) +

all of 5053 1tsəʳ' 'fifth' (used before 1448 2ʔew 'son'; see Andrew's article on Tangut filial ordinals)?


5407 2dɤu 'stupa' =

top of 5053 1tsəʳ' 'fifth' +

right of 1572 1phɤõ 'white' +

'earth' < left of 3792 1lwy 'low'

The 'earth' radical is not surprising, as the Chinese character 塔 for 'stupa' also contains an 土 'earth' radical. But why extract it from 'low' (a curious choice for a tall structure), and what are the functions of 'fifth' and 'white'?

I would have expected

1ŋwʌ 'five'

for the five elements symbolized by a stupa instead of 'fifth' (son).

'White' may refer to the color of a stupa. Why is 'white' in the analysis of 5407 but not 3721? The left-hand component of 5407 analyzed as a blend of 'fifth' and 'white' is unique to that tangraph:


It is that unusual radical that makes 3721 5407 unlike disyllabic words with symmetrical tangraphs: e.g.,

1721 5660 1ma ?kwi 'stirrup'


Kane's transcription of the Khitan large script in chapter five of his 2009 book has few of the diacritics that are common on vowels in his transliteration of the Khitan small script. The exceptions are ü and ï in Khitan large script spellings of Chinese loanwords and ê which may also be exclusive to Chinese loanwords:

[sêng un] 'commander' (transcribed in Chinese as 詳穩; < Chinese 將軍 'general'?)

[an] ~ [ên] for the transcription of Chinese 元 and 原

Liu and Wang (2004: 87) read the latter as [ɑn].

Does that mean the language underlying the Khitan large script had fewer vowels than the language underlying the Khitan small script? Not necessarily.

I think it is more likely that one phonology was written in two different ways. From a phonemic perspective, the Khitan large script may have underdifferentiated the Khitan vowel system whereas the Khitan small script overdifferentiated it by including characters for allophones. And overdifferentiation could have led to spelling problems as the vowel system changed over time and as the Jurchen came to write in Khitan.

I could be wrong. Future analysis may reveal that the Khitan large script had at least six characters corresponding to the six back (?) vowel characters of the Khitan small script, and all six might have been phonemic in the tenth century when both scripts were established. But current scholarship indicates a degree of vocalic flexibility in the large script absent in the small script: e.g.,

[un] ~ [ən] (Liu and Wang 2004: 81; Kane 2009: 179 only has one reading [un])

may correspond to up to three characters in the small script:

<un>, <ún>, and/or <en> (= [ən])

Kane only confirmed the [un] : <un> correspondence.

In any case, it doesn't correspond to

<én> (two variants)

whose large script counterpart is

according to Kane 2009: 174. Maybe é was front whereas u, ú, and e [ə] were nonfront.

A future transliteration of the Khitan large script might have capital letters for vowel classes: e.g., <Un> for

with <U> indicating a nonfront, nonlow vowel. Precise vocalism could only be determined via comparison with small script spellings (if any).

REDHOUSE'S TURKISH VOWELS

After struggling to identify Khitan vowels, I would expect the identification of Turkish vowels in James Redhouse's 1880 dictionary to be trivial. I thought Redhouse would have eight symbols corresponding to the eight vowels of modern Turkish (a e ı i o ö u ü), but in fact he has eleven: five in roman type, three in small capitals, and three in italics:

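1. A (small capital) as in wall

2. a (roman) as in far

3. a (italic) as in about

4. E (small capital) as in pan

5. e (roman) as in pen

6. i (roman) as in pin

7. i (italic) as in girl

8. o (roman) as in go

9. u (roman) as in French tu

10. U (small capital) as in full

11. u (italic) as in fun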

Users could ignore the distinctions indicated with small capitals and italics. Redhouse wrote in 1856 that

were the European character ever to be adopted in Turkey [which happened less than eighty years later!], for the purpose of writing the Ottoman language, there is no reason why the a, the e, the i, and the u should not bear several values as they do with us; whereas in printing, and, if necessary, even writing, the difference could be pointed out by one or two strokes under them, thereby leaving the upper part free for the introduction of special signs to distinguish the long from the short vowels, and the accentuated from the unaccentuated syllables.

He used macrons and acute and grave accents to indicate vowel length and accentuation (? - more on this below).

The five vowels in roman type are obvious:

- a, e, i, o, u = modern a, e, i, o, ü (not u!)

Clues to the other six (in bold) are in his list of the phonetic values of the hàrékÉ on pp. 11-12:

fètha = A, a, a, É, e

késsrÉ = i, i

dàmma ~ zàmma = o, u, U, u

Modern dotless ı, ö, and u must be equivalent to italic i, italic u, and small capital U:

Àlti = altı 'six'

dùrt = dört 'four'

òrdU = ordu 'camp, army' (the same word I wrote about two days ago)

The other three have modern equivalents overlapping with those of a and e:

Àda = ada 'island'

tèkÉ = teke 'shrimp'

On page 20, Redhouse differentiated between "hard" (= nonpalatal) small capital A and italic a and "soft" (palatal) roman a.

Only small capital Ā and roman ā can be long, and "there are scarcely any long vowels" in native Turkish words (pp. 13-14). (The original Turkic long vowels were long gone, and modern long vowels that resulted from the loss of /ɣ/ are absent from Redhouse's description*. The 'silent' letter ğ corresponds to Redhouse's gh,

a hard g, taking sometimes a gliding sound [...] sometimes softened down to the value of w when preceded or followed by o or U, and even by i; at other times it becomes almost imperceptible in the pronunciation. [p. 16])

After briefly browsing through Redhouse's dictionary, I tentatively conclude that

- the core (native) eight vowels were

nonpalatal/back: A (which could be long in Arabic and Persian loanwords), i, o, U [ɑ(ː) ɯ o u]

palatal/front: e, i, u, u [e i ø y]

- italic a was [ə] which was always short

The small capital A : italic a distinction corresponds to nothing in earlier Turkic or modern Turkish, and I wonder if Redhouse was hearing allophony.

Italic a seems to be in final syllables - but see àna below!

- roman a was a front or central [a] and could be long or short in Arabic and Persian loanwords

but it's also in native àna 'mother' (cf. modern ana ~ anne; the latter violates vowel harmony like Redhouse's form) and kara 'black' (after a back k)!

- small capital E in part corresponds to modern [ɛ], a word-final allophone of /e/

Although Redhouse wrote that small capital E was like the a [æ] of English pan, it does not always correspond to [æ], an allophone of /e/ before sonorant codas: e.g., Redhouse's ben (not bEn!) 'I' corresponds to modern ben [bæn].

On the other hand, kEtEn 'flax' corresponds to keten [ketæn], but it has a small capital E in nonword-final position!

I would have to look through Redhouse more carefully to refine my interpretation of his notation.
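The tentative correspondences above can be collected into a lookup table. This is only a sketch under stated assumptions: plain text cannot show Redhouse's typefaces, so it uses a hypothetical encoding in which an uppercase letter stands for a small capital and a letter followed by `_` stands for italic type, with accents and macrons left out. The names `REDHOUSE_TO_MODERN` and `to_modern` are illustrative. Italic a is deliberately left unmapped because its modern value is not settled above.

```python
# Sketch of the tentative Redhouse (1880) -> modern Turkish vowel
# correspondences. The input encoding is hypothetical:
#   uppercase letter      = small capital (A, E, U)
#   letter followed by _  = italic (i_, u_)
#   plain lowercase       = roman type
# Accents and macrons are omitted from the input.

REDHOUSE_TO_MODERN = {
    # roman type
    "a": "a", "e": "e", "i": "i", "o": "o", "u": "ü",
    # italic type (italic a is left out: its modern value is unsettled)
    "i_": "ı", "u_": "ö",
    # small capitals
    "A": "a", "E": "e", "U": "u",
}


def to_modern(word: str) -> str:
    """Convert a word in the hypothetical encoding to modern spelling."""
    out = []
    pos = 0
    while pos < len(word):
        if word.startswith("gh", pos):  # Redhouse gh = modern ğ
            out.append("ğ")
            pos += 2
            continue
        ch = word[pos]
        italic = word.startswith("_", pos + 1)
        key = ch + "_" if italic else ch
        if key in REDHOUSE_TO_MODERN:
            out.append(REDHOUSE_TO_MODERN[key])
        elif italic or ch.isupper():
            # No tentative equivalent (e.g. italic a): refuse to guess.
            raise ValueError(f"no tentative modern equivalent for {key!r}")
        else:
            out.append(ch)  # consonants pass through unchanged
        pos += 2 if italic else 1
    return "".join(out)


for redhouse in ["Alti_", "du_rt", "ordU", "kEtEn", "ben"]:
    print(redhouse, "->", to_modern(redhouse))
```

Raising an error for unmapped symbols keeps the gaps in the interpretation visible instead of silently picking a or e.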

I was hoping to see another interpretation of the Redhouse romanization in Yavuz Kartallıoğlu's "The Vowels of Turkish Language in Transcription Texts" (2010). Unfortunately he did not look at Redhouse's dictionary, though he did include Redhouse's earlier French-language grammar of Turkish in his study.

*According to Kerslake (1998: 184), /ɣ/ was already lost prior to the nineteenth century. Redhouse may have been transcribing a spelling pronunciation.

FREQ<U>ENCY IN XINGZONG AND ZHONGGONG

Looking at spellings of 'camp' in the Khitan small script such as


<ordu.u> (Renyi 11.24, etc.) ~ <ordu.ú> ~ <ord.ó> (both unknown; see Kane 2009: 77)

one might conclude that their final characters all represent the same vowel:


<u> = <ú> = <ó>

And if one looks at other alternations in Kane (2009),


<u> = <û> = <ó> = <ô> (p. 45)


<ó> = <o> (p. 65)

one could equate all back vowel characters:


<u> = <ú> = <û> = <ó> = <ô> = <o>
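The reasoning in the last step amounts to taking the transitive closure of pairwise alternations. A minimal union-find sketch makes the mechanics explicit: the pair list is transcribed from the alternations quoted above, while the function names are illustrative. It shows how a handful of pairwise alternations is enough to collapse all six back vowel characters into a single class, which is exactly the conclusion questioned below.

```python
# Pairwise alternations quoted above (spellings of 'camp'; Kane 2009).
ALTERNATIONS = [
    ("u", "ú"), ("ú", "ó"),              # <ordu.u> ~ <ordu.ú> ~ <ord.ó>
    ("u", "û"), ("û", "ó"), ("ó", "ô"),  # Kane 2009: 45
    ("ó", "o"),                          # Kane 2009: 65
]


def find(parent: dict, x: str) -> str:
    """Return the representative of x's class, compressing the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def equivalence_classes(pairs):
    """Naive transitive closure: merge every pair of alternating symbols."""
    parent = {}
    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    classes = {}
    for symbol in parent:
        classes.setdefault(find(parent, symbol), set()).add(symbol)
    return list(classes.values())


print(equivalence_classes(ALTERNATIONS))
```

Because the closure ignores when and where each spelling was written, a single late or erroneous spelling merges two classes for the whole corpus. Keying each symbol by text or period (e.g. `("Zhonggong", "ó")`) would be one way to keep strata apart.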

Using the same logic, one could spot the overlap (烏) between the various phonograms for Old Japanese u


and Old Japanese wo


from Igarashi (1969: 160, 163) and conclude they all represented the same syllable:

于汙宇紆羽禹有雲卯兎菟得 = 烏 = 乎呼袁遠怨惋越弘小少𠮧緒絃綬雄尾男麻嗚塢

But that was not in fact the case. Those phonograms were taken from at least three strata of Old Japanese writing, each with its own conventions. 烏 'crow' does not appear in the earliest of the three strata. 烏 stood for u in the middle stratum and for wo in the new stratum. The two sound values reflected two strata of borrowing from Chinese:

- Before the seventh century, Early Middle Chinese 烏 *ʔo was borrowed as pre-Old Japanese *o which raised to u in Old Japanese. Hence 烏 was used as a phonogram for Old Japanese u in the middle stratum.

- Sometime between the seventh and early eighth centuries, Late Middle Chinese 烏 *ʔo was borrowed as Old Japanese wo because Old Japanese no longer had o (which had raised to u). Hence 烏 was used as a phonogram for Old Japanese wo in the new stratum.

The new style of writing in Nihon shoki (720) did not catch on, so the middle style continued to be used in later texts.

The lesson to be learned here is that there was no homogeneous 'Old Japanese' orthography that lasted unchanged even over a brief two-century period.

Similarly, there is no reason to assume that Khitan orthography lasted unchanged from the invention of the small script in c. 925 until the script died out around 1200*. The earliest dated Khitan small script text is the epitaph of 耶律宗教 Yelü Zongjiao from 1053, over a century after the script's creation. Thus there is no guarantee that even that text preserves the phonology that the script was originally intended to represent. Perhaps the alternations of vowel symbols reflect spelling errors due to mergers. Another source of error may be influence from Jurchen in the Jin Dynasty texts. Jin Jurchen may have had a vowel system that was simpler than, or at least different from, that of Khitan. (The Jin Jurchen reconstruction in Jin Qizong's dictionary may have as few as five vowel phonemes depending on one's analysis.) Twelfth-century Khitan small script spelling may contain clues to what Khitan with a Jurchen accent sounded like. (Unfortunately at this point it's not clear what Jin Jurchen itself sounded like!)

Let's see if there are any differences in the frequencies of u/o-graphs in the earliest and latest Khitan small script texts I have on hand: the epitaphs for Emperor 興宗 Xingzong (1055) and 蕭仲恭 Xiao Zhonggong (1150):

Xingzong : Zhonggong ratio
ú 7/6%/5th
û 15/13%/4th

The relative rankings of vowels in the two texts are nearly identical except for

186 <o> and 245 <ú> which were only half as common in Zhonggong as in Xingzong

252 <ô> which was twice as common in Zhonggong as in Xingzong

The frequencies of 131 <u> and 372 <û> rose by 20% and 30%, respectively, in Zhonggong. Were some Xingzong <ú>-words spelled with <u> and <û> in Zhonggong?

And were some Xingzong <o>-words spelled with <ô> in Zhonggong?

Did the Zhonggong language have only one u-vowel corresponding to three in Khitan c. 925?

090 <ó> rarely appears in transcriptions of Chinese words, and its relative ranking did not change. Perhaps it was stable while <o> and <ô> merged in the Zhonggong language.

The mergers above would have resulted in a Zhonggong vowel system resembling that of Manchu:

e [ə]
ó [ʊ] (like Manchu ū)

The interpretation of 090 <ó> is from Qidan xiaozi yanjiu (1985: 152).

Confirmation of this hypothesis will require the close study of individual words in the two texts.

*The Khitan small script was abolished by the Jin emperor in c. 1191-1192, over two centuries after its invention. I do not know if the Khitan small script continued to be used in the Kara-Khitan Khanate until its demise or if Yelü Chucai (1190-1244) knew the small script.

WHAT'S TO THE RIGHT OF 'RICE'? (AND THE LEFT?)

Two of the Khitan small script spellings of the Khitan equivalents of the era names 重熙 Chongxi (1032-1055) and 大安 Da'an (1085-1094) included the small script character 355 resembling Chinese 米 'rice':

<HEAVEN ordu.l.ɣ> (Yelü Dilie 14)

<GREAT ordu.ó.o.ón> (Yelü Dilie 26)

They made me wonder about the other contexts in which 355 米 could be found:

355 <ordu>
Character QXY# Reading Source Character QXY# Reading Source
038 ?
Xingzong 14.7 080 ii Xu 22.11

Xu 53.6 090 ó Yelü Dilie 26
186 o
Daozong 5.20, Xu 25.21, Yelü Tabuye 13.7 131 u Xingzong 2.17, Renyi 11.24, Linggong 4.18, 17.3, Zhonggong 10.48, 13.46, 16.30, Yelü Tabuye 13.7
Xu 43.23 154 on Xu 58.13
270 êm Xu 58.4 189 a Daozong 24.27
245 ú Unknown (see Kane 2009: 77)
261 l Yelü Dilie 14
341 er Xu 58.4

The above table only lists characters immediately preceding or following 355 米 (which also occurs by itself in Xu 11.18); it is not a list of blocks or possible blocks: e.g., I know of no block 038/128-355-080 <?.ordu.ii>.
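The table is essentially a count of the characters immediately adjacent to 355 米 within a block. Here is a sketch of how such a list could be compiled automatically, assuming blocks are stored as lists of transliterated characters with 355 米 written as the hypothetical token `"355"`. The sample blocks are only spellings quoted in this post, not a full corpus, so the counts are illustrative.

```python
from collections import Counter

TARGET = "355"  # hypothetical token for small script character 355 米

# A small sample of blocks containing 355, using spellings from this post.
BLOCKS = [
    ["355", "u"],              # <ordu.u>
    ["355", "ú"],              # <ordu.ú>
    ["355", "ó"],              # <ord.ó>
    ["o", "355"],              # <o.ordu>
    ["o", "355", "u"],         # <o.ordu.u>
    ["355", "l", "ɣ"],         # <ordu.l.ɣ>
    ["355", "ó", "o", "ón"],   # <ordu.ó.o.ón>
    ["355", "on"],             # <ord.on>
    ["355", "a"],              # 355 followed by 189 <a>
    ["d", "êm", "355", "er"],  # 270 <êm> + 355 + 341 <er>
]


def neighbours(blocks, target):
    """Count characters immediately before and after target within blocks."""
    preceding, following = Counter(), Counter()
    for block in blocks:
        for i, char in enumerate(block):
            if char != target:
                continue
            if i > 0:
                preceding[block[i - 1]] += 1
            if i + 1 < len(block):
                following[block[i + 1]] += 1
    return preceding, following


before, after = neighbours(BLOCKS, TARGET)
print("preceding:", dict(before))
print("following:", dict(after))
```

Counting only within blocks means that <HEAVEN> and <GREAT> in the era names are not treated as neighbours of 355.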

Given that

1. all known readings of preceding characters end in -o (ó) or consonants (s, -m)

2. there is no consistent vowel in the following characters (a, e, ii, o, ó, ú)

3. the expected converbs after o- and u-final stems are

<oi> and <ui>

not <ii> (Kane 2009: 149-150; but

<p.o.ju> was followed by <ii>

in Langjun 3.15 and Zhonggong 20.28. Was

*<p.o.ju.ui> (not in any text in Qidan xiaozi yanjiu)

intended? Would the same mistake have been made in two different texts? Both date from the Jin Dynasty; do they reflect errors in Khitan as a second language?)

4. my hypothesis that Khitan generally avoided vowel sequences unless they contained identical vowels or ended in high vowels (Vi and Vu)

but that hypothesis cannot account for the verbal noun oduon 'nourished' transcribed in Chinese as 窩篤盌 *(ʔ)wo-tu(ʔ)-on (see Kane 2009: 159)
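Hypothesis 4 can be stated as a simple check over adjacent vowels. The sketch below assumes a plain romanization in which each vowel is a single letter from a hypothetical set `VOWELS`. It flags any sequence of two different vowels whose second member is not i or u, and it duly flags oduon, the counterexample just mentioned.

```python
VOWELS = set("aeiouə")  # assumed vowel letters in a plain romanization
HIGH = set("iu")        # sequences ending in these are allowed (Vi, Vu)


def disallowed_sequences(word: str) -> list[str]:
    """Return vowel pairs that the hypothesis predicts Khitan avoided."""
    return [
        first + second
        for first, second in zip(word, word[1:])
        if first in VOWELS and second in VOWELS
        and first != second and second not in HIGH
    ]


for word in ["ordu", "udulɣaar", "oduon"]:
    print(word, disallowed_sequences(word))
```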

I wonder if

1. 米 355 was originally a logograph <ordu> 'camp'.

2. <o> was added to <ordu> as a clarifier of the initial vowel:

<o.ordu> = ordu (Xu 25.21; more examples in 3 below)

3. <u> and <ú> were added to <ordu> as a clarifier and/or lengthener of the final vowel:


<ordu.u> (Renyi 11.24, etc.) ~ <ordu.ú> (unknown; see Kane 2009: 77) = ordu(u)


<o.ordu.u> (Yelü Tabuye 13.7) ~ <o.ordu.ú> (unknown; see Kane 2009: 77) = ordu(u)

Cf. early Turkic ordu: ~ ordo: with a final long vowel (Clauson 1972: 203).

4. later <ordu> was reinterpreted as a phonogram <U(r)d> and used for

4.1. ordo 'camp' (reflecting an o-final variant of the word; cf. the Turkic variation above):


<ord.ó> (unknown; see Kane 2009: 64, 77) ~ <ord.on> (genitive?; Xu 58.13)

4.2. a consonant-final verb U(r)d- taking the converb -ii:


4.3. an a-final verb (?) U(r)da- '?' taking the perfective suffix -(a)r:

<U(r)> (Daozong 24.27).

4.4. a noun or verb dêmU(r)d- '?' taking the accusative/instrumental or the perfective suffix -(e)r in

<d.êm.U(r)> (Xu 58.4; related to <d.em> 'to enfeoff'?)

I would expect the perfective suffix <or> after an o-final verb stem.

4.5. a verb udu- in the era names also written with <ú.dû>:

<HEAVEN ú.dû.l.ɣ> (Kane 2009: 159; not in Qidan xiaozi yanjiu; primary source unknown) ~

<HEAVEN ú.dû.l.ɣ> (Xingzong 1, 2, Xiao Linggong 17)

<GREAT ú.dû.ó.o.ón> (Yelü Tabuye 9, Xiao Zhonggong 6)

See the top of this post for the spellings with 355 米 which I would now interpret as

(heaven) udulɣa(a)r 'Chongxi'

(great) ud(u)o(o)n 'Da'an'

I also considered the possibility that 'camp' could have had a third form ord. Ord could have been borrowed from Turkic before apocope, whereas ordu(u) ~ ordo could have been later reborrowings from Turkic. Clauson (1972: 203) regarded the word as a loan in Turkic. Perhaps it is ultimately from Ruanruan or Xiongnu.

For an earlier take on 355 米 and 'camp', see what I wrote almost exactly three years ago.

11.7.15:28: Found and added line numbers for Yelü Dilie.

GREAT PEACE, GREAT DIVERSITY

One of the most complex characters in the Khitan large script


is probably the same character that appears in the more complex spellings of the Khitan large script equivalent of the Liao Dynasty Chinese era name 大安 Da'an (Liao Chinese *taj an) 'Great Peace' (1085-1094):


(Kane 2009: 182; source of each spelling unknown)

(epitaph of 蕭袍魯 Xiao Paolu, line 15, 1090)

(epitaph of 耶律褀 Yelü Qi, line 22, 1108)

All of the above begin with <GREAT> resembling Chinese 大 'great'.

Kane listed six two-character spellings, but two (the third and fifth) look like

to me, so I count them as one (the fourth in my list above).

I suspect the first two spellings in Kane


are missing a third character present in the epitaphs I quoted:


The tiny characters in Kane's book are hard to make out, so


look almost identical. I used the zoom function of a digital camera to try to make out the subtle differences between them. Neither is in N4631. I wish I could compare them to the forms in inscriptions.

My image

is based directly on the only inscriptional form I have seen (on page 74 of N4631).

The font in N4631 has yet another variant with a joined 干, a dot instead of a vertical line, and no hook on the right:


I will consider these four forms to be equivalent:

= = =
They seem to be combinations of




Those vertical ligatures are reminiscent of


<taulia> 'rabbit' = <tau> + <lia>

The bottom component of 0430 is abbreviated as 匚~氵 and placed on the left (not right, where 氵 would not be permitted in Chinese) in

~~~~= 匚~氵(< ~)+~

Another variant of that combination may be 1792

from N4631.

This horizontal ligature also appears in the Khitan large script equivalent of the Liao Dynasty Chinese era name 重熙 Chongxi (Liao Chinese *tʂhuŋ xi) 'Repeated Splendor' (1032-1055; as translated by Kane 2009: 6):

(epitaph of the 北大王 Grand Prince of the North, lines 13, 15, 19, 1041)

(epitaph of 多羅里本郎君 Court Attendant Duoluoliben, line 8, 1081)

(epitaph of 多羅里本郎君 Court Attendant Duoluoliben, line 10, 1081)

(epitaph of 耶律褀 Yelü Qi, line 8, 1108)

All of the above begin with <HEAVEN> resembling Chinese 天 'heaven' atop 土 'earth'.

Note how different spellings can coexist within the same epitaph:
~ (Duoluoliben)

~ (Yelü Qi)

I would expect to find more if I had a complete database of known Khitan large script texts.

As if all of the above weren't complex enough, the identical large script second halves of the Khitan equivalents of Chongxi and Da'an correspond to different second halves in the small script:



<HEAVEN ú.dû.l.ɣ> (Kane 2009: 159; not in Qidan xiaozi yanjiu; primary source unknown) ~

<HEAVEN ú.dû.l.ɣ> (Xingzong 1, 2, Xiao Linggong 17) ~

<HEAVEN ordu.l.ɣ> (Kane 2009: 77, 159; Yelü Dilie 14)



<GREAT ú.dû.ó.o.ón> (Yelü Tabuye 9, Xiao Zhonggong 6) ~

<GREAT ordu.ó.o.ón> (Kane 2009: 77; Yelü Dilie 26)

So did

~ (nonligatures)
~~~ (horizontal ligatures)

~ ~ ~ (vertical ligatures)

have two readings depending on context, U(r)dUoon and U(r)dUlɣaar? (Capital letters indicate uncertain vowels.)

Could the first part


be a verb stem U(r)du- 'nourish' (suggested gloss from Kane 2009: 159)? But it is hard to believe that the nominalizer -oon and the causative/passive-perfective sequence -lɣa-ar, which are neither nearly homophonous nor synonymous, would be written identically as

~ (and their variants in ligatures).

And would either of those two characters or some variant represent the verb written as

<ú.dû.l.ɣ> 'nourished'? (Daozong 13)

without a preceding <HEAVEN> in the small script?

John Tang's proposal of slight differences between the large and small script languages is attractive here. Perhaps names for the two eras had identical endings in the large script but different endings in the small script. Another possibility is that


represented a morpheme entirely different from -oon and -lɣa-ar, as it does not have the textual frequency I would expect for a verb ending. Does it ever appear in contexts other than era names? Moreover, if 'Chongxi' ended in the perfective ending -ar in the Khitan large script, I would expect


which is a frequent character (and which, unlike most large script characters, resembles its small script counterpart:


11.5.21:05: Added <HEAVEN ú.dû.l.ɣ>, <ú.dû.l.ɣ> as an independent verb, the comparison between large and small script characters for <an>, and sources for small script spellings of 'Chongxi' and 'Da'an'.

11.7.15:27: Found and added line numbers for Yelü Dilie.

BIG-MOUTHED MEDICINE

The case for y in the N4631 reconstructions of Khitan (as opposed to Khitan itself) is even weaker than I had thought. Andrew West suggested that yo for the Khitan large script character

0067 'medicine' < Liao Chinese 藥

may have been intended to be IPA [jo] rather than IPA [yo]. Although there are modern Mandarin dialects with yo for 'medicine', their y may have secondary rounding. I do not know of any Khitan small script spelling of the word, but the Liao Chinese form must have been intermediate between Sino-Korean 약 yak [jak] (reflecting eighth century northeastern Chinese) and Yuan Dynasty forms like

Phags-pa Chinese ꡭꡠꡓ <yew> *jɛw (as reconstructed by Coblin 2007: 155)

Zhongyuan yinyun *jaw ~ *jɔ (as reconstructed by Pulleyblank 1991: 363)

and it would be simplest to reconstruct *j instead of a transient *y (or *ɥ).

I have no idea why the shape 夻 (resembling 大 'big' atop 口 'mouth') is associated with 'medicine'. It looks nothing like any variant of 藥 or any other character that probably would have been pronounced *jo in Liao Chinese. It does look like the Chinese character hua 'big-mouthed fish', but it doesn't sound like it, and I don't know if it is attested anywhere before Zihui (1615).

Chinese 藥 'medicine' does have an exact lookalike in the Khitan large script (0344 in N4631). It is the most complex character in N4631. Here are statistics on characters in N4631 with more than ten strokes:

Number of strokes
Number of characters
N4631 numbers
Chinese lookalikes?
0344 = 藥
1606 (variant of 0275 below?) none
0275 (variant of 1606 above?), 0736
0430, 0724, 1921
1921 similar to 殿
0247, 0337, 1670, 1841, 1931, 2106
1670 similar to 焰
1841 similar to 尊
1931 = 殿; cf. 1921 above
0350, 0627, 1236, 1454, 1497, 1716, 1776
0350 similar to 爾
0627 = 黄
1716 = 道
2,198 characters is too many for me to look at right now!

The most frequent stroke count is 5, matching what Andrew found with a smaller sample.

I have very limited experience with the Khitan large script. Until tonight, I only knew of four characters with twelve or more strokes: 1716 道, 1921 resembling 殿, 0344 藥, and


without any Chinese twin. Based on that small sample, I assumed that most complex characters were Chinese near-lookalikes, but in fact the opposite seems to be true. I don't know of any lookalikes for 12/20 (60%) of the characters with twelve or more strokes. Perhaps I could find more lookalikes - particularly by digging for variants in Longkan shoujian (997).

1716 道 was read dau (phonetically [taw]?), a match for Liao Chinese 道 *taw. However, I do not know how the other complex Chinese lookalikes such as 0344 藥 were read. There is no guarantee that they had Chinese-based readings, as many less complex Chinese lookalikes have unpredictable readings of mostly unknown origin: e.g.,

Khitan large script reading : Liao Chinese reading of the Chinese lookalike

xa (transcription of the initial of Liao Chinese 行 *xaŋ) : *ʂaŋ 'above'

tau 'five' (transcription of Liao Chinese 討 *thaw) : *ŋu 'five'

iri 'name' (corresponding to Khitan small script <i.ri>) : *ŋu 'seventh Earthly Branch'

an (transcription of the rhyme of Liao Chinese 韓 *xan) : *tʂi 'to arrive'

bai (transcription of the initial of Liao Chinese 百 *paj) : *kaw 'high'

五 represents both the Khitan and Liao Chinese words for 'five', so perhaps the other readings in the Khitan column above are Khitan translation equivalents of the Liao Chinese readings: e.g., the Khitan word for 'high' was something like bai (which coincidentally sounds like Tangut

2be 'high'!).

However, this hypothesis has yet to be tested: e.g., does


correspond to 'high' in small script texts? That particular combination of characters is not in Qidan xiaozi yanjiu (1985). Is it in any texts that were found over the following three decades?

THE VOWELS OF KHITAN IN N4631

See my last post for the consonants.

I don't think the 175 reconstructed readings of Khitan large script characters in this Unicode proposal were intended to represent a single coherent system. Nonetheless I could not help but catalog their ten vowels with their frequencies. Dubious vowels are in red.

i: 70
y: 2
u: 58
e: 18
ə: 14
o: 21
ɛ: 1
æ: 1
a: 36
ɑ: 13
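A quick tally puts the list in proportion. The counts are copied from the list above; treating them as vowel occurrences summed over all readings is my assumption. The last lines anticipate point 6 below, which subtracts ɛ, æ, ɑ, and y, and point 5, which would fold ɑ into a.

```python
# Vowel counts in the N4631 reconstructions, as listed above.
N4631_VOWELS = {
    "i": 70, "y": 2, "u": 58, "e": 18, "ə": 14,
    "o": 21, "ɛ": 1, "æ": 1, "a": 36, "ɑ": 13,
}

total = sum(N4631_VOWELS.values())
# Percentage share of each vowel, rounded to one decimal place.
shares = {v: round(100 * n / total, 1) for v, n in N4631_VOWELS.items()}

# Point 6: subtract ɛ, æ, ɑ and the loanword vowel y from the inventory.
reduced = sorted(set(N4631_VOWELS) - {"ɛ", "æ", "ɑ", "y"})
# Point 5: merging ɑ into a.
a_merged = N4631_VOWELS["a"] + N4631_VOWELS["ɑ"]

print(total, shares)
print(reduced, a_merged)
```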

Khitan had some sort of vowel harmony, so I would expect pairs or sets of vowels: e.g.,

'higher' series
ɛ y
ə (ɔ)
'lower' series
æ (ø)
ɑ o

I have filled in a couple of holes.

Back to the inventory from N4631:

1. i and u are more secure than y, which is in only two readings:

tɕyr 'two' and yo < Liao Chinese 藥 *yo 'medicine'

If not for the small script characters

<y> and <ü>

corresponding to Liao Chinese and *y, I would be hesitant to reconstruct y in Khitan. I think Khitan had y, though I don't know if it appeared in native words such as 'two'.

The Proto-Mongolic cognate of 'two', *jiri-n (Janhunen 2003: 16), has a neutral vowel i that could be from front *i or nonfront *ï. So it does not necessarily support the front-vowel reconstruction tɕyr.

2. Although there are many examples of ə, only two have transcriptional support:

ən, transcribed in Liao Chinese as 恩 *ən and 隱 *in (or *ɨn, if it was still like the 8th century form underlying Sino-Korean ŭn)

nəzəi 'dog', transcribed in Chinese as 捏褐 *njexo

In the latter case, ə appears to be a guess intended to bridge the gap between *e in the Chinese transcription and o in Classical Mongolian noqai 'dog'.

I would still expect ə in Khitan since it is in Manchu and Korean. But expectations should be backed by evidence.

3. The only example of ɛ is in the mysterious title

transcribed in Chinese as 詳穩 *sjaŋwən. I think it may simply be a variant of Khitan

sianggün 'general'

from Chinese 將軍. I don't see any reason to reconstruct a special vowel for this word.

4. The only example of æ is in


transcribed in Chinese as 八 *pa(ʔ). The reading pæt is probably based on Middle Chinese 八 *pæt which predated the creation of the large script by three centuries. By the Liao Dynasty, northeastern Chinese final stops had either been reduced to glottal stops or lost entirely, and *æ had shifted backward toward the space vacated by *a after it raised to *o:

*æ > *a > *o (or in phonetic notation, *[æ] > *[ä], *[ɑ] > *[ɔ]; the new /a/ was not as back as the old /a/)

If Khitan had an æ distinct from a, there would have been no way to unambiguously indicate that in Chinese transcription. Moreover, there do not seem to be multiple types of a-graphs in the small script.

5. That is one reason I am skeptical about an a vs. ɑ distinction in Khitan.

Another is that there is no Chinese transcription evidence for such a distinction:

The Liao Chinese homophones 上 and 尚 *ʂaŋ ~ *ʂɑŋ (the exact pronunciation of the *a-type vowel is unknown) were borrowed as

ʃaŋ and ʃɑŋ

which have different vowels in N4631.


is a transcription of Liao Chinese 化 *xwa with a nonback vowel (not *xwɑ with a back vowel!).

I would simply reconstruct *a instead of *ɑ. *a may have been phonetically [ɑ] in some or even all environments, but I cannot reconstruct that level of detail.

6. Subtracting ɛ, æ, ɑ, and the loanword vowel y from the vowel inventory of N4631 leaves a six-vowel system which happens to be like the ones I reconstruct for pre-Tangut and Old Chinese:

'higher' series
ə u
'lower' series

That is too small to be the Khitan vowel system because the small script has a wealth of symbols for back and/or rounded vowels (Kane 2009: 29):

<o ó ô u ú û>

The diacritics in the transliteration are purely for differentiation purposes and have no phonetic value. Kane expected [y] and [ø], but I think [ʊ] and [ɔ] are also possibilities: cf. Manchu, which has u [u] and ū [ʊ].

THE CONSONANTS OF KHITAN IN N4631

175 (7.9%) of the 2,218 Khitan large script characters in this Unicode proposal have reconstructed readings with twenty-six consonants. Dubious consonants are in red.

k   ŋ x ɣ  
tʃʰ   ʃ  
ts   s z  
t d n   r, l
p b m w

1. There is only one character with h, transcribed as Liao Chinese 海里 *xɑjli and 解里 *xjajli:

hɑili (a name)

Nearly all other Liao Chinese *x correspond to x. I suspect that h could be rewritten as x.

2. The voiced counterpart of q may have been ɣ which could have been uvular [ʁ].

ɣ is, however, doubtful in

ɣa and ɣuaŋ

if those characters were read like Liao Chinese 何 *xɔ and 皇 *xɔŋ rather than Middle Chinese 何 *ɣɑ and 皇 *ɣwɑŋ.

3. The unaspirated-aspirated distinction in N4631 corresponds to the voiced-voiceless distinction in my Khitan transliterations and transcriptions: e.g., N4631 k, kʰ : my g, k. I do not think there was a three-way distinction between unaspirated voiceless, aspirated voiceless, and voiced. More in 8 and 10 below.

4. tɕ is in only one reading:

tɕyr 'two'

I suspect it could be rewritten as tʃ. If it was meant to be an allophone of /tʃ/ before front vowels, it should also appear before i.

5. ts was probably only in Chinese loanwords. I do not know why the first character of

ts.(u) 都統 'commander-in-chief'

was reconstructed as ts. (The second character has no reading in N4631; Liu and Wang 2004 reconstruct that character sequence as tʂʻ.u on the basis of small script evidence that I have not yet seen.)

6. z is only in

nəzəi 'dog'

which was transcribed in Chinese as 捏褐 *njexo which has no sibilant. z may be a typo for x.

Liu and Wang (2004: 26) regard that character as a variant of


7. I treat th which only appears once in

then 'heaven'

as equivalent to tʰ. I do not know how the reading of that character was determined. Without any transcription or alternation evidence, there is no reason to believe that it sounded like Liao Chinese 天 *tʰjen 'heaven.' Liu and Wang (2004: )

8. d could be rewritten as t, as it corresponds to Liao Chinese *t (there was no Liao Chinese *d) and Mongolian and Jurchen/Manchu d [t]:

dan, transcribed as Liao Chinese 丹 *tan (Liao Chinese had no *d, so there would be no better transcription of a foreign d)

dol : Mongolian doloqan 'seven'

dor : Jurchen/Manchu doron 'seal', Manchu doro 'ritual'

dun < borrowing of Liao Chinese 屯 *tun

9. I treat ph which only appears once in

pho 'time'

as equivalent to pʰ.

10. I don't think there was a b distinct from p. b is in only two readings:

bur 'Buddha/all' and bun.

The first reading is shaky (part 1 / part 2 / part 3).

The second character is a transcription of Liao Chinese 汾 *fun (not *bun; Liao Chinese had no *b-).

11. I would rewrite the table without [h tɕ z d b] as

q [qʰ] ɣ [ʁ]  
k [kʰ] g [k] ng [ŋ] h [x]  
c [tʃʰ] j [tʃ]   š [ʃ] y [j]
  dz [ts]   s  
t [tʰ] d [t] n   r, l
p [pʰ] b [p] m w

with a non-IPA notation for compatibility with Mongolian and Manchu romanization. This is an attempt to improve upon the table without changing it too much; it does not represent my reconstruction of Khitan consonants.
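The rewrite in point 11 can be read as a mapping from N4631 symbols to the non-IPA romanization, with the replacements from points 1, 4, 6, 7, 8, 9, and 10 folded in. The sketch below is illustrative only: the names are mine, treating z as x follows the tentative typo suggestion in point 6, and symbols not in the table (all vowels, n, m, w, r, l, and anything whose N4631 counterpart is not spelled out above) pass through unchanged. Greedy longest-match tokenization ensures that tʃʰ is read before t.

```python
# N4631 symbol -> romanization of the revised table in point 11.
N4631_TO_ROMAN = {
    "kʰ": "k", "k": "g", "ŋ": "ng", "x": "h",
    "h": "h",                # point 1: h rewritten as x
    "ɣ": "ɣ",
    "tʃʰ": "c", "tʃ": "j",
    "tɕ": "j",               # point 4: tɕ rewritten as tʃ
    "ʃ": "š", "j": "y",
    "ts": "dz", "s": "s",
    "z": "h",                # point 6: tentative, z as a typo for x
    "tʰ": "t", "th": "t",    # point 7: th = tʰ
    "t": "d",
    "d": "d",                # point 8: d rewritten as t
    "pʰ": "p", "ph": "p",    # point 9: ph = pʰ
    "p": "b",
    "b": "b",                # point 10: b rewritten as p
}

# Try longer symbols first so that tʃʰ is not read as t + ʃ + ʰ.
SYMBOLS = sorted(N4631_TO_ROMAN, key=len, reverse=True)


def romanize(reading: str) -> str:
    """Rewrite an N4631 reading; unknown symbols and vowels pass through."""
    out = []
    pos = 0
    while pos < len(reading):
        for symbol in SYMBOLS:
            if reading.startswith(symbol, pos):
                out.append(N4631_TO_ROMAN[symbol])
                pos += len(symbol)
                break
        else:
            out.append(reading[pos])
            pos += 1
    return "".join(out)


for reading in ["dan", "then", "pho", "tɕyr", "ʃaŋ", "nəzəi"]:
    print(reading, "->", romanize(reading))
```

Note that the vowel y in tɕyr survives unchanged, so the output y is ambiguous with the consonant y [j]; a full conversion would also need a vowel notation.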


I thank Andrew West for pointing out that the Khitan large script title from my last post

<RED ? tai siang gün>

actually corresponds to Liao Chinese 金吾大將軍 *kim ŋu taj tsiaŋ kyn, not 金五大將軍 *kim ŋu taj tsiaŋ kyn. Kane (2009) has the correct spelling 金吾 on pages 18, 42, and 99 along with its Khitan small script transcription:

<g.m ng.u>

I was wondering how Kane was able to translate 金五 as 'imperial insignia'. 金吾 - 'gold I' at first glance - also doesn't look like 'imperial insignia'. Hucker (1988: 168) wrote that the literal meaning of 金吾 was

[...] not wholly clear; probably used interchangeably from Chou into Han times with a homophonous term for prison, but traditionally interpreted as a special weapon, or a gold-tipped baton, or the image of a bird called chin-wu that was believed to frighten away evil. From Han on, commonly used in reference to imperial insignia, as in chih chin-wu (Chamberlain for the Imperial Insignia).

What was the "homophonous term for prison"? The closest term I can think of is Old Chinese 圄 Cɯ-ŋaʔ whose stressed syllable was close to 吾 *ŋa.

Andrew also reminded me that 吾 was a Khitan small script character too and noted that

<RED ngu>

corresponds to 金吾 in line 7 of the epitaph for 蕭袍魯 Xiao Paolu. This raises the question of whether <?> is a homophone of <ngu> rather than a variant of <sung> transcribing Liao Chinese 宗 *tsung 'ancestor':


Also, if <RED> could do double duty for native liauqu ~ liauqú 'red' and Sino-Khitan gim 'gold', could <?> do double duty for the unknown Khitan first person pronoun 'I' as well as Sino-Khitan ngu?

GOLD FIVE OR RED ANCESTOR?

Having just discussed multiple readings of Tangut characters, I think it's a good time to discuss the possibility of multiple readings of Khitan large script characters.

On Monday, two titles jumped out at me when I was rereading Kane's overview of the history of Khitan large script decipherment in The Kitan Language and Script (2009):

<RED ? tai siang gün> =

Liao Chinese 金五大將軍 *kim ŋu taj tsiaŋ kyn

'senior general of the imperial insignia guard' (translation from Kane 2009: 170)

(lit. 'gold five great take* army')

<RED si chung lu tai fu giem giau tai ui> =

Liao Chinese 金紫崇祿大夫檢校太尉 *kim tsɨ tʂhuŋ luʔ taj fu kiem kiaw thaj uj

'lord of the golden seal and the purple ribbon, grand master of the court of imperial entertainments, inspector, and defender-in-chief' (translation with help from Hucker 1988)

(lit. 'gold purple lofty blessing great man inspect examine great official')

There are several odd things about these titles:

1. They seem to be transcriptions of Liao Chinese with the possible exceptions of <RED> and/or <?>. More on this point below.

2. <RED> corresponds to Liao Chinese 金 *kim 'gold'. I expected



or some phonogram for <gim> corresponding to


the small script transcription of 金.

Liao Chinese 五 *ŋu 'five' corresponds to


which this Khitan large script Unicode proposal regards as equivalent to

<sung> ~ <dzung> (hereafter 1190, its number in the proposal)

a transcription of Liao Chinese 宗 *tsung 'ancestor' which in turn is equivalent to the small script transcriptions


<sung> ~ <dzung>

Was <?> a phonogram for *ŋu that should be transliterated as <ngu>?

And was it a variant of 1190 which had at least two kinds of readings, <ngu> and <sung> ~ <dzung>? Or were <ngu> and 1190 <sung> ~ <dzung> distinct characters?

One might expect the large script character

<tau> 'five'

to correspond to its Liao Chinese lookalike and translation equivalent 五 *ŋu 'five'. But the Khitan seemed to prefer borrowing Chinese titles in toto rather than partly translating them, which is why the initial <RED> is unusual.

The Unicode proposal lists the reconstruction *kim (<gim> in my system) for

(hereafter 1651, its number in the proposal)

and glosses it as Chinese 金 'gold'. Reading the titles as

<gim ngu tai siang gün>

<gim si chung lu tai fu giem giau tai ui>

seems straightforward, but then raises the issue of why 1651 meant 'gold' in titles but 'red' in the calendrical system (see table 2.1 in Andrew West's post). Did 1651 and the Khitan color term written as


<l.iau.qu> ~ <l.iau.qú> (does this variation reflect grammatical gender?)

in the small script have a broad semantic range corresponding to 'red' through 'deep yellow'? Was the calendrical term for 'yellow' written as


<GOLD> ~ <GOLD♂>

in the small script more precisely 'pale yellow'?

Then again, the small script equivalent of 金 *kim 'gold', the Chinese name of the Jurchen state, was none other than 山 <GOLD>! What was the Khitan large script spelling of the name of the Jurchen state,

or ~?

Did 1651 and its variants have two kinds of readings, one Khitan and another Sino-Khitan?

Khitan Large script Small script
liauqu (native)


(added last variant 11.1.22:29)

liauqú (native)

It is also possible that each of the large script characters above had a different reading.

John Tang suggested that the language written with the large script was not quite the same as the language written with the small script. Perhaps large script writers translated Chinese 金 'gold' in noncalendrical contexts as 'red' whereas small script writers translated it as 'gold'. However, as Andrew West pointed out,

"the Khitan scripts do not show any significant chronological variation"

"there is no obvious geographical distinction between the two scripts"

"both scripts were commonly used for exactly the same function (writing memorials for the dead)"

"both scripts are used to write memorials for both men and women"

"there are memorials to princes and princesses in both scripts, although the only memorials to emperors and empresses found so far are in the small script"

"both scripts were used to write memorials for members of the Yelü 耶律 clan"

Nearly all of those factors make a linguistic (as opposed to a merely orthographic) distinction between the large and small scripts unlikely, unless scribes were led in different directions by schools that did not communicate with each other, each school accumulating the idiosyncrasies of its teachers without noticing what the other school was doing. That hypothesis would predict that the earliest texts in the two scripts should be more similar than the latest texts.

If John Tang is right, maybe the language underlying the small script was considered to be of slightly higher status, which would explain why it was used in imperial memorials. On the other hand, there must be memorials we have not yet seen, and the correlation between the small script and emperors and empresses may break down after further discoveries.

My frustration brings to mind the quotation by Nishida Tatsuo at the beginning of Kane's book:

To tell you the truth, the Kitan script is becoming more and more incomprehensible. Things which we were not able to understand before we are even less able to understand now.

*I have long been puzzled by the tone of 將 in 將軍 'general'. 將 has two tones:

'level' for 'to take'

'departing' for 'to lead an army', 'general' (as a monosyllabic word and in compounds other than 將軍 'general')

One would expect 將 to have a 'departing' tone in 將軍 'general', which looks like '將 lead an 軍 army'. But in fact the 將 of 將軍 'general' has a 'level' tone. So is 將軍 'general' literally 'take[r of] an army'? Is the 'level' tone due to irregular tone sandhi (軍 'army' has a 'level' tone)? The influence of the verb-object phrase 將軍 'take army' for 'checkmate' in chess (and in a broader sense, putting someone on the spot)? Is there some obvious explanation that has eluded me for years? Is there any Chinese language in which 將 has a 'departing' tone in 將軍 'general'?

THE SOUND OF THE DOUBLE-SKINNED MOUTH

Last night I mentioned


4620 1ka 'how' =

left of 2247 1tu (first half of 1tu 1muʳ 'stupid'; arbitrary source for the 'mouth' radical?) +

all of 1326 1kə (perfective prefix; phonetic)

as one of three Tangut transcriptions of Sanskrit ka. It also transcribed Sanskrit krā, ga, and kiṃ (Nevsky 1960 I: 574).

The title refers to its structure: 'mouth' on the left and what appears to be 1dʐə 'skin' doubled on the right. I have no idea why one would write a perfective prefix with 'skin'.

Nor do I have much of an idea of why 4620 transcribed Chinese syllables other than *ka (Li Fanwen 2008: 733):

Tangut text Sinograph Middle Chinese Tibetan transcriptions of Tang NW Chinese Tangut period NW Chinese Liao Chinese Phags-pa Chinese
Forest of Categories *kit kyir *ki *kiʔ ꡂꡦꡞ gÿi [kji]
Forest of Categories, Sunzi *kɨanʰ (Used by Amoghavajra to transcribe Skt -kaṇ-, kañ-) *kɨã *kien ꡂꡠꡋ gen [kɛn]
Forest of Categories *kɨanˀ, *kɨenˀ (Amoghavajra used a homophone 謇 to transcribe Skt -khaṇ-, -kan-)
Forest of Categories (see Nevsky 1960 II: 83) *ken kyan, kyen *kiã ꡂꡦꡋ gÿan [kjɛn]

Although my Tangut period NW Chinese reconstruction is based on Gong's, which in turn is based on Tangut evidence, none of the readings of those characters matches 1ka.

I have included Liao and Phags-pa Chinese (with Coblin 2007's transliterations and phonetic reconstructions) for reference. Neither the reconstruction of Liao Chinese nor the attested forms of Phags-pa Chinese depend on the reconstruction of Tangut. Both varieties were spoken east of the northwestern Chinese area, and of course Phags-pa Chinese postdates the fall of the Tangut Empire.

No single reconstruction of 4620 can account for all of its uses*:

Source Reconstruction Sanskrit ka Sanskrit krā Sanskrit ga Sanskrit kiṃ Chinese *ki Chinese *kia-type syllables
Nishida 1966 1kǐɑ partial match full match except for nasality
Sofronov 1968
Li Fanwen 1986
This site now
1ka full match partial match weak match partial match
Arakawa 1997 (Nishida-style) 1kaɦ partial match weak match partial match
Gong Hwang-cherng 1997 1kja partial match full match except for nasality
This site 2008-2014 1kia partial match full match except for nasality
Kotaka 2012 (Arakawa-style) 1ka: full match except for vowel length partial match weak match partial match

How do I explain those mismatches?

1. Tangut knowledge of Sanskrit was probably limited, so some inaccuracy was inevitable.

1a. Tangut probably did not have contrastive vowel length (contra Arakawa and Gong), which explains why transcription characters such as 4620 did double duty for short- and long-vowel syllables.

1b. Tangut had no Cr-consonant clusters. Sanskrit kr could be misheard as k.

1c. If Tangut g was prenasalized [ŋg], then Tangut k could have been an acceptable approximation of Sanskrit g.

2. Educated Tangut were well versed in Chinese. Hence it is inconceivable that they repeatedly made glaring errors in transcription.

2a. Tangraphs may have had multiple readings, and the lexicographical tradition only listed basic readings unless the readings were very different from each other: e.g.,

4456 2tha / 4457 2lẹ 'big' (the tangraph is listed twice in Li Fanwen 2008)

Readers could supply nonbasic readings from context. Nonbasic readings of 4620 may have been closer to Chinese *ki and *kia.

2b. The readings of tangraphs underlying transcriptions may have been from nonstandard dialects: e.g., a dialect in which *kia had not simplified to ka or a dialect in which *kia had simplified to ki. The situation may have been comparable to the use of non-Mandarin-based transcriptions in written Mandarin today.

2c. The readings of the sinographs being transcribed may have been colloquial forms that were irregular from the viewpoint of the Chinese lexicographical tradition: e.g., 吉 may have had a reading like *ka (cf. Sino-Vietnamese cát [kaːt] instead of the expected regular *cất [kət]) or *kia in addition to *ki. I cannot explain the *a of my speculative *ka. (Is SV cát the product of taboo deformation?) *ia may have been conditioned by a low-vowel prefix:

Standard reading: *klit > *kit > *kir > *ki
Alternate reading: *Cʌ-klit > *Cʌ-kleit > *ket > *kiet > *kier > *kia?

(That in fact is the evolution of 結 which was transcribed in Tangut by 2219 1ke which was also the transcription character for Sanskrit ke.)
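The two derivations above are chains of ordered sound changes, each applying to the output of the previous one. As a minimal sketch (not the author's method), they can be restated as ordered rewrite rules in Python. The rule patterns and the helper name `derive` are hypothetical formalizations chosen only to reproduce the stages listed above; the comments describe each step, not a claim about its phonetic mechanism, and the final *-ier > *-ia step is as speculative as the "?" in the original.

```python
import re

def derive(form, rules):
    """Apply ordered (pattern, replacement) rules; record each stage that changes."""
    stages = [form]
    for pattern, replacement in rules:
        new = re.sub(pattern, replacement, stages[-1])
        if new != stages[-1]:
            stages.append(new)
    return stages

# Hypothetical rules restating the standard reading's stages
STANDARD = [
    (r"kl", "k"),   # *klit > *kit: medial *-l- lost
    (r"t$", "r"),   # *kit > *kir: final *-t > *-r
    (r"r$", ""),    # *kir > *ki: final *-r lost
]

# Hypothetical rules restating the alternate reading (C = any prefix consonant)
ALTERNATE = [
    (r"ʌ-kli", "ʌ-klei"),  # *Cʌ-klit > *Cʌ-kleit: *i > *ei after the low-vowel prefix
    (r"^Cʌ-klei", "ke"),   # *Cʌ-kleit > *ket: prefix and *-l- lost, *ei > *e
    (r"ke", "kie"),        # *ket > *kiet: *e > *ie
    (r"t$", "r"),          # *kiet > *kier: final *-t > *-r
    (r"ier$", "ia"),       # *kier > *kia (speculative)
]

print(" > ".join("*" + s for s in derive("klit", STANDARD)))
# *klit > *kit > *kir > *ki
print(" > ".join("*" + s for s in derive("Cʌ-klit", ALTERNATE)))
# *Cʌ-klit > *Cʌ-kleit > *ket > *kiet > *kier > *kia
```

Rule order matters here: applying the final *-t > *-r rule before the vowel changes in the alternate chain would still yield the same endpoint, but the recorded intermediate stages would no longer match the sequence given above.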

Amoghavajra's transcriptions with 建 and 謇 (a homophone of 蹇) may suggest that they had alternate *kan-like readings in eighth century northwestern Chinese, though it is more likely that he used *kɨan-type characters because *kan was *[qɑn] with an un-Sanskrit uvular initial whereas *kɨan was *[kɨan] with a velar initial.

10.31.0:45: The problem with 2c is that it is unlikely that colloquial readings would be used to pronounce the Classical Chinese texts translated by the Tangut. And surely Tangut who could not only speak colloquial Chinese but also read Classical Chinese would know better than to mix informal and formal pronunciations of words. Then again, the line between colloquial and literary is not absolute: e.g.,

However, some dialects of Hokkien, such as Penang Hokkien as well as Philippine Hokkien (Lan-lang-oe) overwhelmingly favor colloquial readings. For example, in both Penang Hokkien and Philippine Hokkien, the characters for 'university,' 大學, are pronounced toā-ȯh (colloquial readings for both characters), instead of the literary reading tāi-hȧk, which is common in Taiwanese and Mainland Chinese [Hokkien] dialects.

10.31.0:49: Grinstead's dictionary (1972: 144) defined 4620 as 'Skt. ke' but I think ke may be a typo for ka. His table of dhāraṇī transcription tangraphs on p. 184 does not list 4620 as ke, his list of Tangut phonetics on p. 190 equates 4620 with ka, and no other scholar has ever equated 4620 with Sanskrit ke.

TANGUT GRADE III -A('): RHYMES 19 AND 21 (PART 2)

I started what was meant to be a series almost three weeks ago. Then I got caught up correcting my own mistakes (the ones I noticed, that is): a wrong rhyme and a wrong fanqie speller. (There must be even more errors in my Tangut reconstruction that I haven't noticed yet!) I thank David Boxenhorn for reminding me of my plans to write about Tangut 'apostrophe' rhymes like 21 -ɨa'.

I got the apostrophe notation from Arakawa Shintarō. He uses apostrophes to indicate glottal stops in initial position, so maybe they indicate final glottal stops in his reconstruction. I, on the other hand, use apostrophes simply to mean 'different in some unknown way'.

In the Tangraphic Sea, nonapostrophe rhymes are followed by similar apostrophe rhymes in the first group of rhymes (1-60). Apostrophes and tenseness are always mutually exclusive, and apostrophes and nasality are almost mutually exclusive. Their coexistence in 59-60 should be investigated. There are also some anomalous combinations of nasality with tenseness (65 and 76) and retroflexion (97-98). I am fairly confident about the classification of rhymes up to 60; later rhymes, particularly 97 and up, are iffy, and others interpret them very differently (e.g., Arakawa only has two retroflex apostrophe rhymes: 88-89). The ordering pattern breaks down after 62: e.g., 63 is an e-type rhyme rather than an i-type rhyme. 104 and 105 look like last-minute additions.

Rhyme type
Plain rhymes
Tense rhymes
Retroflex rhymes
Apostrophe Nasal
99, 101
17-20, 105
65, 76 (!)
97-98 (!)
59-60 (!)

I do not rule out the possibility of a reinterpretation of the later rhymes in the future. For now, let's focus on 19 and 21.

In my reconstruction, both 19 and 21 are Grade III rhymes, so in theory they should have Grade III initials (in green). The reality is messier. Unexpected initial types are in pink, and initial types with minimal pairs are in red.

Other laterals
19 -ɨa


h-, ɦ-!
20 -a
21 -ɨa'
d-, n-!
k-, kh-!
ʔ-, ɦ-!
24 -a'

I used to follow Gong and write glottal fricatives as if they were velar, but I thought it was odd to have x- and ɣ- under "Glottal", so I now follow Arakawa and write the voiceless fricative as h-. By analogy I write its voiced counterpart (absent in Arakawa's reconstruction) as ɦ-.

There are no labial or lateral fricative (ɬ-, ɮ-) initials; both types were somehow generally incompatible with Grade III. The absence of lateral fricatives is due to a larger constraint against alveolar fricatives in Grade III (but see below!).

Arakawa reconstructed Grade III as vowel length and reconstructed 21 as the only Grade IV rhyme in his system with both vowel length and medial -y-. But I don't understand why those features would be incompatible with labials. If pya and pa: were possible (Arakawa 1997: 128), why not *pya:?

In my reconstruction, labials are rarely followed by Grade III -ɨ-.

Let's look at all the anomalies and see if they can be explained (or at least have notable features):

Li Fanwen 2008 number

bent, winding, crooked (only in dictionaries) No Grade IV rhyme 20 *kwa; regular reflex of pre-Tangut *Cɯ-kwa or *Pɯ-ka?
1tsɨa to broil, roast (only in dictionaries); cognate to Grade IV rhyme 20 1tsa 'hot'
Why isn't this Grade IV rhyme 20 1tsa?
Minimal pair with Grade IV rhyme 20 1tsa
first half of 1hɨa 1ʂɤe 'to condemn' (only in dictionaries)
No Grade IV rhyme 20 *ha; regular reflex of pre-Tangut *Cɯ-ha?
fast, rapid
2ɦɨa second half of 1dzəʳ 2ɦɨa 'fast, rapid'; cognate to 2521 with voiced initial conditioned by lost prefix and second tone conditioned by suffix *-H
No Grade IV rhyme 20 *ɦa; regular reflex of pre-Tangut *Cɯ-Ka (with lenition of intervocalic *-K-)?

cover, lid, to cover; borrowing from Late Middle Chinese 盒 *xɑ(p) 'box' or some related word with voiced initial and vowel bending conditioned by high-vowel prefix?
umbrella of a carriage (specialized usage of 3008 above)
to make a detailed inquiry
No Grade III rhyme 19 *lwɨa; regular reflex of pre-Tangut *Cɯ-lwa or *Pɯ-la?
second half of 1bạ 1lwa 'lower limbs, legs'

1dɨa' second half of 1ti 1dɨa' 'to drip' (only in dictionaries); < Tangut period northwestern Chinese 滴答 *ti tɑ (but the final vowels don't match: did the front vowel of 1ti condition breaking of an earlier *a in the following syllable?) Minimal pair with Grade IV rhyme 24 1da' for transcribing Sanskrit ḍa
1nɨa' black
No Grade IV rhyme 24 *na'; regular reflex of pre-Tangut *Cɯ-naX?
2nɨa' to not be
dung, excrement
second half of 2mə 2nɨa' 'Tangut'; Tibetan minyag 'Tangut' may reflect an earlier or nonstandard form; may be derived from 0176 'black' plus a suffix *-H conditioning the second tone
1kɨa' transcription of Sanskrit ka and
No Grade IV rhyme 24 *ka'; regular reflex of pre-Tangut *Cɯ-kaX (except for transcription character, of course)?
foundation, basis, burden; transcription of Sanskrit ka and
pedestal, plinth (same word as 3985 above)
1khɨa' transcription of Sanskrit kha No Grade IV rhyme 24 *kha'
1ʔɨa' yes
No Grade IV rhyme 24 *ʔ(j)a'; regular reflex of pre-Tangut *Cɯ-ʔaX?

horn (only in dictionaries)
second half of 2vɪ 2ʔɨa' 'singing' (with 2vɪ 'to sing'; both halves only in dictionaries)
gold (less common synonym of 1kɤẹ)

Six anomalies in Gong's reconstruction (3456, 3502, 5584, 5763 in rhyme 20 and 0357, 0837 in rhyme 24) are not listed because they are no longer anomalies if they are reconstructed with ld- (following Tai 2008) instead of l-. ld- may have been a lateral affricate with the same pattern of distribution as the lateral fricatives ɬ- and ɮ-: i.e., in Grades I, II, and IV but not III.

Out of the remaining twenty-four anomalies, only two have corresponding Grade IV syllables (3408 and 2936). Those minimal pairs force me to reconstruct 19 and 20 differently (unlike Arakawa and Gong who seem to reconstruct them as homophones*).

All others are in complementary distribution, albeit not in the ideal pattern of complementary distribution. Did the Tangut dictionary tradition reflect a mixture of dialects with different sound changes: e.g.,

- in dialect A, *Cɯ-tsa became Grade III rhyme 19 1tsɨa

- in dialect B, *Cɯ-tsa became Grade IV rhyme 20 1tsa

and the dialect A form was chosen to be the standard form for 'to broil' while the dialect B form was chosen to be the standard form for 'hot'. I would rather not reconstruct different prefixes to account for the different vocalism of 'to broil' and 'hot' which probably share the same root *tsa.
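The difference between a minimal pair and complementary distribution can be made concrete with a small check: for each Grade III rhyme and its Grade IV counterpart, list the initials attested with both. As a sketch under explicit assumptions, the data below is a toy subset of (initial, rhyme) pairs taken only from the anomaly table above; the rhyme 20 and 24 entries include just the syllables mentioned there, not the full inventories of those rhymes, and the names `SYLLABLES`, `PAIRS`, and `minimal_pairs` are hypothetical.

```python
from collections import defaultdict

# Toy subset of (initial, rhyme) pairs from the anomaly table above;
# rhymes 20 and 24 include only the syllables mentioned in that table.
SYLLABLES = [
    ("ts", 19), ("h", 19), ("ɦ", 19),
    ("ts", 20),
    ("d", 21), ("n", 21), ("k", 21), ("kh", 21), ("ʔ", 21),
    ("d", 24),
]

# Grade III rhyme -> Grade IV counterpart, as paired in the text
PAIRS = {19: 20, 21: 24}

def minimal_pairs(syllables, pairs):
    """Return, for each rhyme pair, the initials attested with both rhymes.

    Initials found with only one member of a pair are in complementary
    distribution for that pair (within this toy sample).
    """
    initials_by_rhyme = defaultdict(set)
    for initial, rhyme in syllables:
        initials_by_rhyme[rhyme].add(initial)
    return {
        (g3, g4): sorted(initials_by_rhyme[g3] & initials_by_rhyme[g4])
        for g3, g4 in pairs.items()
    }

print(minimal_pairs(SYLLABLES, PAIRS))
# {(19, 20): ['ts'], (21, 24): ['d']}
```

On this sample the only overlaps are 1tsɨa/1tsa and 1dɨa'/1da', which is why those two pairs force distinct reconstructions while the other anomalies can be treated as complementary variants.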

Only two of the anomalies (3948 and 4823) are characters created for transcribing Sanskrit, and one of them (3948) is homophonous with a native word (3985). Why were Sanskrit ka, kā, and kha transcribed with the Tangut rhyme -ɨa' containing -ɨ- and the mysterious apostrophe feature absent from Sanskrit? (No Tangut transcription of Sanskrit khā is known.) Was that practice influenced by the Chinese transcriptions 迦 *kɨa and 佉 *khɨa for those syllables? (In earlier Chinese, *k(h)a was [q(ʰ)ɑ] with an un-Sanskrit uvular, so velar-initial syllables with medial *-ɨ- were regarded as closer matches.) The Tangut transcription of Sanskrit ka as

4620 1ka

without -ɨ- may reflect Sanskrit filtered through Tibetan or even Sanskrit itself. Are transcriptions with 4620 closer to (Tibetanized) Sanskrit? Conversely, are transcriptions with

3948 1kɨa', 3985 1kɨa', and 4823 1khɨa'

based on Sinified Sanskrit? Or were those two types of transcriptive characters randomly mixed up?

*Although both 19 and 20 are -a: in Arakawa's notation, Arakawa's (1997: 128) table appears to list two subtypes of each of those rhymes (not including subtypes with -w-).

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision