I have updated my database of Tangut readings (download version 1.3.1 here) with the following changes:

- the anomalous ren-readings combining r- with a nonretroflex rhyme -en have been replaced by len. See Jacques (2014: 184-185).

- corrected readings for

L2164 and L3965 which had a nonexistent -an' rhyme instead of -y'

Thanks to Andrew West for spotting the wrong reading of L2164.

L5566 which belongs to rhyme 70 (1.67), not rhyme 67 (1.64)

L6027 which had r in the "Nasal/w" column instead of the "Cycle" column

The first change is in version 1.3 which I did not upload. All other changes are new in version 1.3.1.

THREE TANGUT MEATS

I used to think Tangut

5865 1soq1 'three' (Tibetan transcriptions gsoH x 14, gso x 4, so x 2)

was from *k-sum (cf. Japhug rGyalrong χsɯm and Written Tibetan gsum) with a *k- that weakened to *x- and assimilated to the following *s- which conditioned vowel tension (written as -q):

*k-s- > *xs- > *ss- > s-q

The Tibetan transcriptions with g- may reflect a dialect retaining preinitial *k-.
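The chain above can be sketched as an ordered series of string rewrites. This is only an illustration under stated assumptions: the names `RULES` and `apply_chain` are hypothetical, and the rhyme is passed through untouched, so the development of the vowel (*-um to -o) is not modeled.

```python
# Ordered rewrites for the hypothesized chain *k-s- > *xs- > *ss- > s-q.
# RULES and apply_chain are illustrative names, not an established formalism.
RULES = [
    ("k-s", "xs"),  # *k- weakens to *x- before *s-
    ("xs", "ss"),   # *x- assimilates to the following *s-
]


def apply_chain(onset: str, rhyme: str) -> list[str]:
    """Return every stage from the pre-Tangut onset to the Tangut syllable."""
    stages = [onset + rhyme]
    for old, new in RULES:
        if onset.startswith(old):
            onset = new + onset[len(old):]
            stages.append(onset + rhyme)
    # Last step: *ss- simplifies to s- and leaves vowel tension, written -q.
    if onset.startswith("ss"):
        onset = onset[1:]
        rhyme = rhyme + "q"
        stages.append(onset + rhyme)
    return stages


print(" > ".join(apply_chain("k-s", "o")))
```

An onset that does not begin with *k-s- passes through unchanged, so the sketch says nothing about other *k-obstruent sequences.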

Guillaume Jacques (2014: 197) proposed a pre-Tangut form *sə-svm (whose *v = any vowel but *i) with reduplication like Tagalog ta-tlo < Proto-Austronesian *telu.

(Might the retroflexion of Tangut

2005 21lyr' < *rliXH 'four'

be from a reduplicated *l- that merged with the preinitial *r- that conditioned vowel retroflexion written as -r? Retroflex vowels are so common in Tangut that I suspect they had sources other than *r-.)

But if *k-s- became s- + tension - and *k-obstruent sequences which became aspirates - then how can I account for the aspirate in

3465 1chhi3 'meat' (Tibetan transcription: chi)

whose root had initial *sj- (cf. Written Tibetan sha, Written Burmese sāḥ)?

Maybe *k-s- and *k-sj- had different reflexes:

*k-s- > *x-s- > *s-s- > s-q

*k-sj- > *kʂʰ- > chh- [tʂʰ-]

But I doubt that was the case. There is no external support for *k- in 'meat'.

Perhaps the preinitial of 'meat' was *t- (cf. Japhug tɯ-ɕa), and *t-sj- fused into aspirated chh-. That proposal is not without problems, as we'll see next time.

CORDIAL COMPASSION?

Almost two weeks ago, I realized that

1483 2ne4 'compassion'

in the Tangut text

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

sounds almost exactly like

2518 2ne'4 'heart' (Tibetan transcriptions from Tai 2008: 215: gne x 4, ne x 1, nye x 1, gnyeH x 1)

The only difference between the two is the presence of the unknown phonetic quality 'prime' (transcribed as -') in 'heart'.

Are the two words related? In other words, did they have similar forms in pre-Tangut?

Before I can answer those questions, I should survey the phonetic details of 'heart' in Tangut:

- According to Arakawa's hypothesis, Tibetan preinitial g- indicates tone 1, but 'heart' has tone 2

- For twenty years I have suspected, contra everyone else, that the Tibetan preinitials might be taken literally rather than as orthographic devices for tones. Could the transcribed dialect preserve a preinitial *k- (written as g- following Tibetan spelling conventions; kn- is un-Tibetan) lost in standard Tangut? Perhaps preinitials in the transcribed dialect normally corresponded to tone 1 in standard Tangut, but 'heart' developed tone 2 in standard Tangut because it had lost its preinitial before tonogenesis.

- If Tangut grades were like Chinese grades as I interpret them, Grade IV was the most palatal. But exactly how this palatality was expressed is unclear. Did Tibetan nye ~ ne transcribe [ɲe], [nʲe], [nie], etc.?

- What Tangut sound did Tibetan final -H transcribe? The mysterious 'prime'?

On to pre-Tangut:

-e'4 with 'prime' has six sources:

*Cɯ-...-aŋX, *Cɯ-...-eŋX, *Cɯ-...-enX

*(Cɯ-)...-jaŋX, *(Cɯ-)...-jeŋX, *(Cɯ-)...-jenX

-e4 without 'prime' has only two sources (and yet is more common!):

*Cɯ-...-aŋ and *(Cɯ-)...-jaŋ

I no longer think *Cɯ-...-an(X) is a source of -e(')4.

- Exterior cognates of 'heart' point to a front vowel and *-ŋ: e.g., Tibetan snying.

- But they also point to *s- and not *k-.

- STEDT's Proto-Tibeto-Burman roots #251, #689, and #1385 have *s/k-, but the data on the site don't seem to support *k-.

- And if pre-Tangut had *s- in 'heart', that consonant would condition tension absent from 2ne'4 (i.e., 'heart' would be *2neq4).

Taking all of the above into account, the pre-Tangut word for 'heart' was

*kɯ-neŋXH or *k(ɯ)-njeŋXH

with a front vowel like Tibetan snying. (*-H conditioned tone 2.)

But 'compassion' could not be

*kɯ-neŋH or *k(ɯ)-njeŋH

because those forms would have developed into *2ni4, not 2ne4. (Whatever *X was blocked the raising of *e in *eŋX.)

Moreover, it is improbable that a nonbasic word 'compassion' would be derived from a basic word 'heart' via subtraction.

It is more probable that 'compassion' is an unrelated word with a different rhyme

*Cɯ-naŋX or *(Cɯ)-njaŋX

that came to sound like 'heart'.

A VEXING VOLUME

At the beginning of "An Interesting Reading", I mentioned the Tangut text

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

whose final character has an unknown reading. 0- indicates an unknown tone. L- indicates an unknown Class IX initial (l- lh- ld- r- z- zh-). ? indicates an unknown rhyme. I am going to start using -0 for an unknown grade.
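The notation just described (tone digit, initial, rhyme, grade digit, with 0, capitals, and ? for unknowns) is regular enough to parse mechanically. Here is a minimal sketch; the names `Reading` and `parse_reading` and the regular expression are my own assumptions about how to split a reading, not part of any published tool.

```python
import re
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    tone: Optional[int]    # None when written 0 (unknown)
    initial: str           # "" for a zero initial; capitals = unknown member of a class
    rhyme: Optional[str]   # None when written ? (unknown)
    grade: Optional[int]   # None when written 0 or left out (unknown)


# tone digit, then everything up to the first vowel or ?, then the rhyme, then the grade
PATTERN = re.compile(r"^([012])([^uiayeo?\d]*)(\?|[uiayeo][a-z']*)([0-4])?$")


def parse_reading(s: str) -> Reading:
    m = PATTERN.match(s)
    if m is None:
        raise ValueError(f"not a reading in this notation: {s!r}")
    tone, initial, rhyme, grade = m.groups()
    return Reading(
        tone=int(tone) or None,
        initial=initial,
        rhyme=None if rhyme == "?" else rhyme,
        grade=(int(grade) or None) if grade else None,
    )


for s in ["1siw4", "2ne'4", "0L?", "0L?0"]:
    print(s, parse_reading(s))
```

Both 0L? and 0L?0 parse to the same value, which is the point of adding -0: it makes the unknown grade explicit without changing what is known.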

This character appears as an initial speller in this fanqie chain:


1165 1luq3 'to rub, knead' = 5302  0L?0 + 0500 1tsuq4

(There is no phonemic distinction between Grades III and IV in rhyme 62 -uq3/4; the grade is automatically determined by the initial.)


4550 1lheq4 'sorcerer' = 1165 1luq3 + 3318 1cheq3

(There is no phonemic distinction between Grades III and IV in rhyme 64 -eq3/4; the grade is automatically determined by the initial.)

I have converted the readings from Gong's reconstruction from Li Fanwen (2008) into my system. I am unaware of any transcriptive evidence for 1165 and 4550 - or even any attestations of either character outside dictionaries. How can 1165 have a known initial if the initial of its initial speller is unknown? Why not reconstruct 1165 as 1Luq3/4? (l- r- zh- would be followed by Grade III and ld- lh- z- by Grade IV.) And why does the initial of 4550 (lh-) not match the initial of its initial speller 1165 (l-)? Shouldn't 4550 be reconstructed as 0Leq3/4?
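The two questions about initials can be phrased as a consistency check over the fanqie chain. The sketch below is hypothetical throughout: the dictionaries simply restate the readings quoted above, and `check_initial` is an illustrative name. It only compares initials and ignores the grade, which the initial determines automatically in these rhymes.

```python
# (tone, initial, rhyme + grade) in my notation; "L" = unknown Class IX initial.
READINGS = {
    "5302": ("0", "L", "?0"),
    "0500": ("1", "ts", "uq4"),
    "1165": ("1", "l", "uq3"),
    "3318": ("1", "ch", "eq3"),
    "4550": ("1", "lh", "eq4"),
}

# target character: (initial speller, final speller)
FANQIE = {
    "1165": ("5302", "0500"),
    "4550": ("1165", "3318"),
}


def check_initial(target: str) -> str:
    """Compare a character's reconstructed initial with that of its initial speller."""
    initial_speller, _final_speller = FANQIE[target]
    claimed = READINGS[target][1]
    source = READINGS[initial_speller][1]
    if source == "L":
        # The speller only narrows the initial down to Class IX.
        return "ok" if claimed == "L" else "unsupported"
    return "ok" if claimed == source else "mismatch"


for target in FANQIE:
    print(target, check_initial(target))
```

Run over the two fanqie above, the check reports 1165 as unsupported (a specific l- from an unknown L-) and 4550 as a mismatch (lh- from l-).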

How many other Tangut character readings are shaky?

AN INTERESTING READING

I've been filling holes in my Tangut character folder lately. So far I have images for 3,634 out of the 6,125 Tangut characters in Unicode 9.0: i.e., about 59% of the total. The fact that I haven't needed images for two out of five characters even after nearly eleven years of blogging about Tangut indicates how skewed the distribution of characters is. I estimate the number of distinct characters in Guillaume Jacques' index to the

3457 0478 1483 2323 5404 4625 5302 1siw4 1sho'3 2ne4 1vy1 1la1 2me'4 0L?

'new collect compassion piety record final volume'

to be about a thousand.

The reading of 5302 is unknown. It is in the section for characters without homophones in the ninth chapter of Homophones, so it must have an L-type initial (l- lh- ld- r- z- zh-). Beyond that nothing else can be said. I know of no transcriptions of it.

The reading of the first half of

4006 5383 0TS? 2se4 'interest' (in the financial sense)

should be unknown. Yet Li Fanwen (2008: 643) lists Gong's reconstructed reading as 2tswər. This reading was not in Li Fanwen (1997: 740). Kychanov and Arakawa (2006: 367), on the other hand, list the Sofronov-style reconstruction 2?ə̣.

How does anyone know what the rhyme of 4006 is? The character does not appear as a final speller in any fanqie. I do not know of any transcriptions of it that could even give us a vague idea of what the rhyme might have been. And transcriptions would not indicate the tone which I have written as 0- for unknown.

I write the initial as capital TS- to indicate that it belonged to class VI (alveolar sibilants other than z- which I suspect might have been lateral [ɮ]). But I don't know which class VI initial it had: ts-, tsh-, dz-, or s-.

Lastly, how does anyone know what 4006 5383 means? Neither edition of Li cites any attestations outside dictionaries, and Li (1997: 740, 974) lists no definitions for either 4006 or 5383. Have Kychanov and/or Arakawa found such attestations and identified the meaning from context?

SINO-TANGUT PHONOLOGICAL PARALLELS (PART 1)

At a glance, Tangut and Tangut period northwestern Chinese (hereafter simply 'Chinese') phonology appear to be similar: 

- They had largely overlapping consonant inventories with a three-way distinction between voiceless unaspirated, voiceless aspirated, and prenasalized voiced: e.g., p- : ph- : b- [mb].

Tangut, however, had more consonants: gh-, lh-, ld-, r-, z- [ɮ].

And Chinese had an f- absent in most Tangut reconstructions (the exceptions being Nishida's and Arakawa's).

- They had six basic vowel types: u, i, a, y, e, o.

- These vowels had four types of variations ('grades').

Tangut, however, had further variations absent from Chinese: tension, retroflexion, and the mysterious quality that I write with -' and call 'prime'.

- They contrasted oral and nasal vowels.

- Their syllables had the structure C(w)V(G); they only permitted -w and perhaps -j in coda position.

Despite many common features, it would be an exaggeration to say that the two languages share a common phonology. Notice that I have not mentioned tones. There does not seem to be any correlation between the two 'tones' of Tangut and Chinese tonal categories: e.g., Chinese 龍 *2lon3 'dragon' was borrowed twice with both tones:

4897 1lon3 and 4203 2lon3

This could imply that Tangut and Chinese tones sounded very different, making one-to-one mapping between them impossible.

Or perhaps Tangut had phonations (plain vs. breathy?) instead of tones despite the use of 'tone' in the Tangut phonological tradition. The Tangut couldn't hear tones because they didn't have any. (I am now skeptical of the phonation hypothesis that I came up with in the late 90s. If Tangut had phonation and Chinese didn't, why didn't the Tangut simply borrow and transcribe all Chinese tones with Tangut clear phonation?)

One last possibility - as yet unexplored - is that the Tangut were sensitive to sandhi variants of tones. Suppose, for instance, that Tangut and Chinese tones 1 and 2 were similar, and that Chinese tone 1 became tone 2 before tone 4: 龍栢 */1lon3 4pe2/ > [2lon3 4pe2] 'dragon cypress'. Then it would make sense to borrow that disyllabic word as

4203 4119 2lon3 1pi2

with the second tone while borrowing monosyllabic 龍 /1lon3/ = [1lun3] 'dragon' as

4897 1lon3

with the first tone. But why, then, was Chinese 龍栢 */1lon3 4pe2/ 'dragon cypress' transcribed (as opposed to borrowed) in the Timely Pearl as

4897 5970 1lon3 1pi2

with the first tone rather than the second? Here are five explanations:

1. The most boring, namely, that this is a random error.

2. Hypercorrection: the transcriber knew that the Chinese word for 'dragon' had tone 1 and might have assumed that tone 2 in the Tangut loan deviated from the Chinese (when in fact it reflected Chinese tone sandhi).

3. The transcription reflects a careful Chinese reading pronunciation "1lon3 ... 4pe2" without tone sandhi.

4. The transcription reflects a variant Chinese pronunciation without tone sandhi - perhaps from a dialect slightly different from the source of the Tangut borrowing.

5. The borrowing reflects a slightly earlier stage of Chinese with tone sandhi and the transcription reflects a slightly later stage without tone sandhi (and with the original first tone restored by analogy with 'dragon' in isolation?).
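The sandhi supposition behind these explanations is simple enough to state as code. This is a sketch of the hypothetical rule only (Chinese tone 1 surfacing as tone 2 before tone 4); `apply_sandhi` is an illustrative name, and nothing here is attested independently of the supposition itself.

```python
def apply_sandhi(tones: list[int]) -> list[int]:
    """Hypothetical rule: underlying tone 1 surfaces as tone 2 before tone 4."""
    surface = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 1 and tones[i + 1] == 4:
            surface[i] = 2
    return surface


print(apply_sandhi([1, 4]))  # 'dragon cypress': first syllable surfaces with tone 2
print(apply_sandhi([1]))     # 'dragon' in isolation keeps tone 1
```

Under the rule, a borrowing made from the surface form would have tone 2 in the first syllable, while a transcription with tone 1 matches the underlying form, which is what explanations 2 through 5 each try to motivate.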

The tones are not the only differences between Tangut 2lon3 1pi2 'dragon cypress' and its Chinese source lon3 4pe2. I'll explore the others in part 2.

DISSECTING A TANGUT MARRIAGE (PART 5)

If 5051 (second half of 1y4 1naq4 'marriage'; Boxenhorn code: biogeodex) could be abbreviated to resemble 2544 'sage'  (Boxenhorn code: geo) in 0532 2ge4 'to marry' (Boxenhorn code: hosgeo),


why wasn't it abbreviated that way in other derivatives?

3657 1y4 (first half of 1y4 1naq4 'marriage'; Boxenhorn code: giibiogeo)

1625 2tuq4 'to mate, marry' (Boxenhorn code: fosbiogeo)

5975 1naq4 'parallel, weft' (Boxenhorn code: palbiogeo)

In other words, why do those three characters have a 'hat' (bio) absent in 0532?

I think 3657 needed a 'hat' (bio) to distinguish it from an existing character without it:

2449 2bi1 'sun' (Boxenhorn code: giigeo)

2449 must precede 3657 in the chronology of tangraphic creation.

But there are no characters with the structures





so in theory the 'hats' (bio) are redundant, though their presence does make the connection of 1625 and 5975 to 5051 more transparent.

I am reminded of the inconsistency of simplification in the postwar Japanese script:

- 獨 'alone' was simplified to 独 (with the phonetic 蜀 'the state of Shu' reduced to 虫 'bug')

- but 濁 'muddy' was not simplified to 浊 even though no such character already exists (and years later, 濁 was simplified to 浊 in the PRC).

There is no deep meaning behind the inconsistency of 独 and 濁. Perhaps there is none behind the inconsistency of

0532 without a 'hat' (bio)

on the one hand and

1625 and 5975 with 'hats' (bio)

on the other.


DISSECTING A TANGUT MARRIAGE (PART 4)

Many Tangut marital characters from the previous parts contain

2544 2shen4 'sage' < Chinese *3shen3

and if one had never known about 2544, one might guess that it was a semantic component 'marry'. But it acquired that secondary function as an abbreviation of 5051:


5051 1naq4 = 3657 2705 2546 2705 1y4 2ber'4 1naq4 2ber'4

(first half of 1y4 1naq4 'marriage') right + 'god' right

2544 'sage' is semantic in 2546 'god', the phonetic of 5051. I have no doubt about the first half of the Tangraphic Sea analysis of 2546:


2546 1naq4 = 2544 1602 0149 0737 1naq4 2ngorn1 2wer1 1chhen3

'sage' all + 'protect' bottom

But I have doubts about the second half. 0149 must be derived from 2546 rather than the other way around. The 'person' on the right of 2546 is either simply 'person' (but why would 'god' have 'person'?) or an abbreviation of one of the 1,186 (!) tangraphs containing 'person'.

Someone (I?) should try to reconstruct a chronology of the derivation of tangraphs based on the Tangraphic Sea derivations plus common sense. Here's a sliver of that chronology:

In words: 2544 begat 2546, which in turn begat 5051 and 0149.

5051 begat 3657, 1625, 5975, and 0532 (but why does 0532 lack the 'horned hat' of the others?).

5138 begat 5138 1gu'1, first syllable of 1gu'1 1chhiw4, the name of a Tangut god (1chhiw4 is 'six').

Next: Why don't all married sages wear hats?

DISSECTING A TANGUT MARRIAGE (PART 3)

As I wrote in part 2, I thought that 5051 1naq4 was simply phonetic in its homophone 5975:


5975 1naq4 'parallel, weft' = 5938 3936 5051 3936 2ge4 1pha1 1naq4 1pha1

'classical text, warp' left + (second half of 1y4 1naq4 'marriage') left

But then I discovered that 5938, listed as the source of the left side of 5975, had a homophone

0532 2ge4 'to marry'

which the Tangraphic Sea lists as a definition for the first half of

3657 5051 1y4 1naq4 'marriage'.

Is 0532 'to marry' a metaphorical extension of 5938 'warp' (in the sense of weaving)? Li (1997: 104) defined 0532 as 'weave, marry' - to which STEDT added '(join in marriage)' - but the revision of the entry for 0532 in Li (2008) has the definition 'to marry, to unite in marriage' without any reference to weaving.

If 0532 is originally a weaving term, then could 5051 1naq4 of 3657 5051 1y4 1naq4 'marriage' also originally be a weaving term - specifically, an extended usage of 5975 1naq4 'parallel, weft'?

3657 1y4 is attested as an independent word 'marriage, matchmaker, relatives by marriage'. 3657 5051 1y4 1naq4 'marriage' is thus originally 'marriage weft' with the first half clarifying the metaphorical use of the second half which does not occur on its own in the sense of 'marriage'.

Do 5938 2ge4 < *Nɯ-Kan/ŋ ~ *Cɯ-ŋgan/ŋ 'warp'* and 5975 1naq4 < *Sɯ-naC 'weft' have cognates outside Tangut? Unfortunately, neither 'warp' nor 'weft' is in the rGyalrongic Languages Database. Both are at STEDT, but I can't find any cognates there or in Guillaume Jacques' Japhug dictionary which lists tɤ-ʁjar 'warp' and tɯ-jlɤβ 'weft'.

*I reconstruct a presyllable with *ɯ to condition Grade IV after a velar. However, I do not know whether that presyllable had a nasal initial *N- or preceded a nasal. I also do not know if the velar stop after the nasal was originally voiced or not. In any case, Tangut g- is from *ŋg- which may in turn have more complex origins.

DISSECTING A TANGUT MARRIAGE (PART 2)

The character for the second half of

3657 5051 1y4 1naq4 'marriage'

has two probable derivatives besides the first character:


1625 2tuq4 'to mate, marry' = *0482 3936 5051 3936 2dzen4 1pha1 1naq4 1pha1

*'to copulate' left + (second half of 1y4 1naq4 'marriage') left?


5975 1naq4 'parallel, weft' = 5938 3936 5051 3936 2ge4 1pha1 1naq4 1pha1

'classical text, warp' left + (second half of 1y4 1naq4 'marriage') left

The analysis of 1625 is my guess since it is one of the many characters whose analysis was in the lost second tone volume of the Tangraphic Sea.

0482 is the clarifier of 1625 in Homophones, so it is certain that the Tangut considered the two to be semantically related even if 0482 was not actually in the analysis of 1625.

1625 2tuq4 should go back to pre-Tangut *Sɯ-to-H:

*S- conditioned the tension of the vowel transcribed as -q.

*-ɯ- conditioned Grade IV in lower vowels (*a, *e, *o) after dentals

I am assuming that the raising of *o to *u predated the conditioning of Grade IV.

I could be wrong. Maybe Grade IV was conditioned by a raised *o after a dental:

*S(ɯ)-to-H > *S(ɯ)-tu-H > 2tuq4

If so, maybe there was no *ɯ after *S-.

*-o raised to -u (Jacques 2014: 206); whether this occurred before or after Grade IV is uncertain.

*-H conditioned tone 2; it may ultimately be from *-ʔ or *-h (< *-s).
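The conditioning factors listed above can be combined into a small derivation function. This is a sketch of the first scenario only (the presyllable vowel conditions Grade IV), and several points are my own assumptions for the sake of a runnable example: the name `derive`, tone 1 as the default when there is no *-H, and "?" for the grade when there is no *ɯ.

```python
def derive(presyllable: str, initial: str, vowel: str, final_h: bool) -> str:
    """Derive a Tangut reading from a pre-Tangut form such as *Sɯ-to-H."""
    tone = "2" if final_h else "1"                        # *-H conditioned tone 2
    tension = "q" if presyllable.startswith("S") else ""  # *S- conditioned tension
    if vowel == "o":                                      # *-o raised to -u
        vowel = "u"
    grade = "4" if "ɯ" in presyllable else "?"            # *-ɯ- conditioned Grade IV
    return tone + initial + vowel + tension + grade


print(derive("Sɯ", "t", "o", True))
```

The alternative scenario, in which the raised vowel rather than *ɯ conditions Grade IV, would need the grade line to look at the vowel instead of the presyllable.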

*Sɯ-to-H might go back to an even earlier *Sɯ-ton-H if it is cognate to forms for 'to marry' like

Somang rGyalrong ston muŋ ka-pa

Daofu sto lmo və  (is v- a lenited *p preserved in Somang?)

Xinlong Queyu ste⁵⁵ rmu⁵⁵ vi¹³ (did *o front before *-n?; v- < *p-?)

and if *Cɯ-...-on merged with *Cɯ-...-o into -u3/4. That would be parallel with the merger of *Cɯ...-en and *Cɯ...-e into -i3/4, and one could propose a general rule:

*Cɯ-...mid vowel + -n > Grade III/IV high vowel

5051 must be semantic in 1625 since the two sound nothing alike. Conversely, 5051 must be phonetic in 5975. But could it be something more? I didn't think so at first. I'll explain why I changed my mind next time.

DISSECTING A TANGUT MARRIAGE (PART 1)

The two halves of

3657 5051 1y4 1naq4 'marriage'

are written similarly, so it's not surprising that they have circular derivations in the Tangraphic Sea:


3657 1y4 = 3436 2705 5051 3936 1sa'1 2ber'4 1naq4 1pha1

(second half of 1ne4 1sa'1 'close relative') right + (second half of 1y4 1naq4 'marriage') left


5051 1naq4 = 3657 2705 2546 2705 1y4 2ber'4 1naq4 2ber'4

(first half of 1y4 1naq4 'marriage') right + 'god' right

2546 is clearly phonetic in its homophone 5051. So I think the sequence of character creation was

2546 > 5051 > 3657

though I am surprised the character for a second syllable was devised before the character for a first syllable.

Why was 5051 abbreviated in 3657? Because it was no longer phonetic, so there was no longer any need to keep all of 2546 under the 'horned hat'? Because the right-hand 'person' component (Boxenhorn code: dex) is so common (it appears in one out of five Tangut characters) that it is almost expendable? In any case, 5051 doesn't appear in its entirety as a component of any character.

Next: Other instances of 'depersonalized' 5051.

EATING BEGINS WITH LOVE, NOT MARRIAGE

Thanks to Guillaume Jacques for catching my mistake. The correct fanqie for 'eat' from "The Past and Present Sound of Eating in Tangut" (Part 1 / Part 2) is


4517 1dzi3 'eat' = 4973 1dzu4 'love' + 0932 1i3 'many, more, much'

The correct initial speller 4973 is visually very similar to 5051, the erroneous initial speller that I posted which represents the second half of the disyllabic word

3657 5051 1y4 1naq4 'marriage'

I will take a closer look at this word starting tomorrow.

Alas, the enigma of the final speller remains. Why is it Grade III instead of Grade IV after dz-?

THE 'RIGHT' RHYME (PART 6) / TANGUT PHONETIC DATABASE VERSION 1.2

After this I think I'll really be done with the topic of Tangut rhyme 101 (1.93/2.86) for some time.

I have updated my database of Tangut readings (download version 1.2 here) to incorporate the changes I have proposed in this series:

- the reinterpretation of rhyme 101 as -er' instead of -ir' (see part 1)

- the reassignment of

2705 'to help; right side of character (i.e., assistant)' and 2928 'to explain, note' (with 'speech' on its right side; probably a different spelling of a specialized usage of 'to help')

from rhyme 2.54 to 2.86 (see "Explaining the 'Right' Reading")

I didn't upload version 1.1 in which I replaced a lot of symbols for nasalization with glides after mid vowels, bringing my transcription closer to Gong's reconstruction: e.g.,

-en > -ey (corresponding to Gong's -əj, -iəj, -jɨj)

-on > -ow (corresponding to Gong's -ow, -iow, -jow)

I have retained the nasalization from 1.0 in 1.2.

THE 'RIGHT' RHYME (PART 5)

I didn't expect to write a five-part series on this topic, but I forgot to mention the Tangut-internal evidence on the rhyme in part 4, so it's getting its own part.

"It" consists of alternations between rhyme 101 and other rhymes (70 -iq3/4 and 84 -ir3/4) in what superficially appear to be synonym pairs (Gong 2002: 103):


1. 5683 2er'4 ~ 5209 2iq4 'to stretch, lengthen'


2. 1928 1ler'3 ~ 5850 1liq3 'to rub'


3. 5742 1tser'4 ~ 3641 1tsir4 'to choose'

Gong reconstructed rhyme 101 as -iir which matched the i of the non-101 members of those pairs. However, the placement of rhyme 101 in the Tangraphic Sea (see part 1) and the Tibetan and Chinese transcription evidence (see parts 2 and 3) point to a nonhigh vowel. The merger of *-ir' with *-er' that I first proposed in part 1 can account for the vowel mismatch.

Those three pairs could be reconstructed in accordance with the proposals in part 4 as

1. 'to stretch, lengthen'

*r((ɯ)-s)ɯ-ʔa-X/*r((ɯ)-s)ɯ-ʔe(n/ŋ)-X ~


2. 'to rub'

*r((ɯ)-s)ɯ-la-X/*r((ɯ)-s)ɯ-le(n/ŋ)-X ~


3. 'to choose'

*rɯ-tsa-X/*rɯ-tse(n/ŋ)-X ~



*Cɯ-tsar-X/*Cɯ-tser-X ~


All three pairs require a presyllable with *ɯ to condition the raising of *a and/or *e to their Grade III/IV reflexes.

The rhyme 101 members of pairs 1 and 2 must have had a prefix *r(ɯ)- to condition retroflexion absent in their rhyme 70 counterparts ending in -iq. If their bases had *sɯ- prefixes, then there is no need to reconstruct *ɯ after *r- since the *ɯ of *sɯ- would be sufficient to condition Grades III/IV. But the possibility of *rɯ-sɯ- cannot be ruled out.

All three pairs involve the presence or absence of the mysterious factor *X that I have arbitrarily written at the ends of syllables.

Without external evidence, there is no way to narrow down the possibilities.

And without narrower definitions, there is no way to be sure about the functions of the various affixes. (*X may not have been an affix, though for convenience I write it as if it were a suffix.)

THE 'RIGHT' RHYME (PART 4)

I originally wanted to end this series with comparative evidence for Tangut rhyme 101 (1.93/2.86) words, but I don't know of any. Which is not surprising as there are only thirteen characters with a total of five different readings ending in rhyme 101 (Arakawa 1997: 91):

Homophones initial class    Tone 1     Tone 2
I                           (none)     2ber'4
V                           (none)     2ker'4
VI                          1tser'4    2tser'4
IX                          1ler'3     (none)

Grades III and IV are normally in complementary distribution. If there are no Grade III/IV minimal pairs in a rhyme, I assign Grade III to Class II and VII consonants and the Class IX consonant l-. The default grade for all other initials including the Class IX consonant lh- is IV. This assignment parallels the general pattern of distribution of initials in rhymes that have Grade III and IV minimal pairs. The different distribution of l- and lh- suggests that they did not simply differ in terms of voicing. I think l- was velar [ɫ] (as in Nishida's 1ɫĭə̣r corresponding to my 1ler'3) whereas lh- was a fricative [ɬ]. (Note that Nishida reconstructed l- and ɫ- as distinct initials, whereas I think /l/ was always [ɫ] except before Grade IV rhymes where it was [l].)
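The default grade assignment described above is a two-branch rule. Here is a minimal sketch that follows the wording of this paragraph exactly; the function name and the use of Roman-numeral strings for Homophones classes are illustrative assumptions.

```python
GRADE_III_CLASSES = {"II", "VII"}


def default_grade(initial_class: str, initial: str) -> int:
    """Default grade in a rhyme that has no Grade III/IV minimal pairs."""
    if initial_class in GRADE_III_CLASSES:
        return 3
    if initial_class == "IX" and initial == "l":
        return 3
    return 4  # every other initial, including Class IX lh-


# The five rhyme 101 readings: (class, initial) and the grade each one should receive.
for cls, ini in [("I", "b"), ("V", "k"), ("VI", "ts"), ("IX", "l")]:
    print(cls, ini, default_grade(cls, ini))
```

Applied to the rhyme 101 readings, the rule gives Grade IV for b-, k-, and ts- and Grade III for l-, matching 2ber'4, 2ker'4, 1tser'4/2tser'4, and 1ler'3.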

Nishida (1964: 67) was the first to identify rhyme 101 as retroflex, and that classification has been carried over into the reconstructions of Arakawa, Gong, and this site. (Sofronov does not reconstruct retroflexion in Tangut vowels in any of his three reconstructions.)

Vowel retroflexion has two sources in Tangut: (pre)initial *r- and final *-r. The r- of Tibetan transcriptions of Tangut rhyme 101 syllables may directly reflect a preinitial r- preserved in a nonstandard Tangut dialect (see part 2). Preinitial *r- may have been *rɯ- with a high vowel conditioning Grade III/IV.

Nishida (1964: 67) also reconstructed tension in rhyme 101. Gong reconstructed preinitial *s- as the source of tension in Tangut vowels, though he reconstructed length rather than tension as an extra nonretroflex feature in rhyme 101: -iir. If Nishida and Gong are both right, the sources of rhyme 101 syllables would have preinitial *s(-r)- or *(r-)s-.

I can't rule out Nishida's tension, but I do not think Gong's length was present in this rhyme or any other, at least not during the Tangut imperial period. Sanskrit has phonemic vowel length that does not correlate with Gong's reconstructed vowel length in the readings of Tangut transcription characters. For now I simply acknowledge that this rhyme was somehow different from regular -er3/4, and indicate that difference with an apostrophe that I call 'prime'. I arbitrarily indicate the unknown source of 'prime' as a final *-X in pre-Tangut. The position of *-X is simply carried over from *-'; the actual conditioning factor of -' could have been anywhere in the syllable. I have never seen -'/*-X correlate to anything in any other Sino-Tibetan language. It is remotely possible that Tangut preserves something lost in the rest of its gigantic family, but I am hesitant to make such an extreme claim until I look harder. Which won't be tonight. All I can say for now is that *-X seems to have blocked raising in rhyme 40, the nonretroflex counterpart of 101:

R10 -i3 < *Cɯ...-en, *Cɯ...-eŋ

R11 -i4 < *Cɯ...-en, *Cɯ...-eŋ

R40 -e'3/4 < *Cɯ...-enX, *Cɯ...-eŋX

and perhaps *Cɯ...-anX, *Cɯ...-aŋX?
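The blocking effect of *-X in the rows above can be stated as a single conditional. This sketch covers only the *e rows that are listed without a question mark; `late_reflex` is a hypothetical name, and R10 and R11 are collapsed into -i3/4 since the grade depends on the initial.

```python
def late_reflex(rhyme: str) -> str:
    """Reflex of a pre-Tangut *e rhyme after a *Cɯ presyllable.

    `rhyme` is "en", "eŋ", "enX" or "eŋX"; anything else is out of scope here.
    """
    blocked = rhyme.endswith("X")  # *-X blocks raising
    core = rhyme[:-1] if blocked else rhyme
    if core not in ("en", "eŋ"):
        raise ValueError(f"not covered by this sketch: {rhyme!r}")
    # Blocked: the mid vowel survives with 'prime'. Otherwise *e raises to i.
    return "-e'3/4" if blocked else "-i3/4"


for r in ("en", "eŋ", "enX", "eŋX"):
    print(f"*Cɯ...-{r} > {late_reflex(r)}")
```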

Integrating the above proposals (other than *s-) with Guillaume Jacques' sources for -i/e3/4 (in my notation; -ji(j) in his) and my hypotheses of mergers from part 1 and part 3, I have come up with up to twelve possible sources of -er'3/4:

Early pre-Tangut    Late pre-Tangut    Standard Tangut
*rɯ-CaX             *Cir'3/4           Cer'3/4
*r(ɯ)-CukX          *Ciwr'3/4          Cer'3/4
*rɯ-CekX            *Cewr'3/4          Cer'3/4
*rɯ-CanX            *Cer'3/4           Cer'3/4

Obviously not all twelve had to exist in pre-Tangut.

THE 'RIGHT' RHYME (PART 2)

In part 1, I proposed reinterpreting Tangut rhyme 101 (1.93/2.86) as -er' instead of -ir'. A mid vowel e fit the Chinese transcription *3me3 (Timely Pearl 32.2.8) for

2705 'to help; right side of character (i.e., assistant)'

better. And if rhyme 101 had e, that would be the vowel expected after y in rhyme 100 following the usual order of Tangut vowels.

In this part, I will start to look at the transcriptional evidence for rhyme 101.

1. Sanskrit transcription evidence

There isn't any. This tells us that 101 probably didn't sound like anything in Sanskrit. V'-rhymes are rare in Tangut transcriptions of Sanskrit and Vr'-rhymes seem to be nonexistent. That tells me that the unknown quality that I write with a prime symbol was absent from Sanskrit.

2. Tibetan transcription evidence

Tai (2008: 229) lists 22 transcriptions of two tangraphs with rhyme 101:

0467 1tser' 'method, art, skill, dharma'

transcribed as rtsi (x 1), rtse (x 5), rdze (x 1), rc? (x 1)

2698 2tser' 'nature, character'

transcribed as rtse (x 12), ?e (x 1)

Out of 22 transcriptions, 20 end in -e, 1 ends in -i, and 1 ends in an unknown vowel. The obvious conclusion is that the vowel of rhyme 101 was something like Tibetan e. (Why did Gong reconstruct long i instead of long e for rhyme 101?)

The preinitial r- of the transcriptions may either indicate the retroflexion of the following vowel or reflect an actual preinitial r- in the transcribed Tangut dialect corresponding to retroflexion in the standard dialect described in the Tangut phonological tradition:

Tangut dialect transcribed in Tibetan    Standard Tangut
CV (plain vowel)                         CV (plain vowel)
rCV (r- + plain vowel)                   CVr (retroflex vowel)

If the table above is correct, the dialect transcribed in Tibetan had fewer vowels than the standard dialect; the latter had retroflex vowels absent in the former.

I would expect the Chinese transcriptions of rhyme 101 characters other than 2705 to also contain *e, but we will see that is not the case in part 3.

THE 'RIGHT' RHYME (PART 1)

I wouldn't have guessed that

2705 2bir'4 'to help; right side of character (i.e., assistant)'

was transcribed in the Timely Pearl as *3me3 (32.2.8) with -e rather than -i. Why not transcribe it as, say, 彌 *1mbi4 with *-i?

It is a fact that 2705 is listed under rhyme 2.86 in the Precious Rhymes of the Tangraphic Sea.

On the other hand, it is merely a hypothesis that rhyme 2.86 was 2-ir'4: i.e., second tone Grade IV i with retroflexion and some unknown quality marked as -'. Should -ir' be -er' with a mid vowel like the transcription *3me3 of 2705?

In my system, there is no -er' or -ur', though all other Tangut vowel types are represented in -Vr' rhymes: -ir', -ar', -yr', -or'. Did *-ir' and *-er' merge into one rhyme while *-ur' and *-or' merged into another? It would be neat if the merged rhymes were of the same height: e.g., mid -er' and -or' or high -ir' and -ur'.

The usual order of vowels in the Tangraphic Sea is u-i-a-y-e-o (with iw/ew between e and o). This order is not followed in the -Vr' rhymes. Reinterpreting rhyme 101 (1.93/2.86) as -er' (cf. Arakawa's -yer2) would make the -Vr' rhyme order closer to the norm:

88. 1.83 -ar'1

89. 2.75 -ar'3/4

99. 2.84 -ir'1 (a merger of *-ir'1 and *-er'1?)

100. 1.92/2.85 -yr'3/4

101. 1.93/2.86 -er'3/4 (formerly written as -ir'3/4; a merger of *-ir'3/4 and *-er'3/4?)

102. 1.94 -or'1 (a merger of *-ur'1 and *-or'1?)

103. 1.95 -or'3/4 (a merger of *-ur'3/4 and *-or'3/4?)

(The absence of -Vr'2 rhymes may tell us that the phonetic quality of -' was incompatible with Grade II which was at least partly from *-r-. Retroflex vowels were conditioned by [pre]initial *r- or final *-r but not *-r-.)
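The claim that -er' brings the -Vr' rhymes closer to the usual order can be checked by counting how many adjacent rhymes run against u-i-a-y-e-o. This is a rough sketch: `descents` is a hypothetical helper, and each string lists one vowel per rhyme group in the order 88-89, 99, 100, 101, 102-103.

```python
CANONICAL = "uiayeo"  # usual order of vowels in the Tangraphic Sea


def descents(vowels: str) -> int:
    """Count adjacent pairs that run against the canonical vowel order."""
    ranks = [CANONICAL.index(v) for v in vowels]
    return sum(1 for a, b in zip(ranks, ranks[1:]) if b < a)


OLD = "aiyio"  # rhyme 101 read as -ir'
NEW = "aiyeo"  # rhyme 101 reinterpreted as -er'

print("old:", descents(OLD))
print("new:", descents(NEW))
```

With 101 as -ir' there are two places where the sequence steps backwards; with -er' only one remains, the step from -ar' to -ir'.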

The only remaining oddity in the order of -Vr' rhymes is the placement of 88-89 -ar' not only before 99 -ir' but also in the middle of the -Vr rhyme sequence:

77. -er1

78. -er2

79. -er3/4

80. 1.75/2.69 -ur1

81. 1.76/2.70 -ur4

82. 1.77/2.71 -ir1

83. 1.78 -ir2

84. 1.79/2.72 -ir3/4

85. 1.80/2.73 -ar1

86. 1.81 -ar2

87. 1.82/2.74 -ar3/4

88. 1.83 -ar'1

89. 2.75 -ar'3/4

90. 1.84/2.76 -yr1

91. 1.85 -yr2

92. 1.86/2.77 -yr3/4

93. 1.87/2.78 -ewr1

94. 1.88/2.79 -iwr4

95. 1.89/2.80 -or1

96/97. 1.90/2.81 -or2/3/4

The placement of the -er rhymes (77-79) before the -ur rhymes (80-81) instead of after the -yr rhymes (90-92) defies explanation.

Next: The transcriptive evidence for the 'right' rhyme.

EXPLAINING THE 'RIGHT' READING

Until now I've been reading the character

2705 'to help; right side of character (i.e., assistant)'

in Tangut character analyses as 2beq4 (a reading converted from Gong's reconstruction 2bjịj in Li Fanwen's 1997 dictionary). But last night I discovered that it should be 2bir'4 (a reading converted from Gong's reconstruction 2bjir in Li Fanwen's 2008 dictionary).

2705 and its homophone

2928 'to explain, note' (with 'speech' on its right side; probably a different spelling of a specialized usage of 'to help')

form a two-character homophone group in the first chapter of Homophones. All characters in that first chapter have readings with labial initials (p-, ph-, b-, m-). The Timely Pearl transcription of 2705 is *3me3 (32.2.8); the diacritic indicates b- rather than m-.

I have no idea how Nishida, Sofronov, and Gong determined the rhyme

N -ɛ̣ 2.54

S -ɪ̭e 2.? (no rhyme number listed on II: 307, but implicitly 2.54 since its transcription 命 is listed under 2.54 on II: , presumably a typo for -ɪ̭ẹ since -ɪ̭e 2.54 should either be 2.35 or 2.37 according to I: 137)

G -jịj 2.54

= my 2-eq4

before the rediscovery of the Precious Rhymes of the Tangraphic Sea which listed 2705 under rhyme 2.86 (-ir'). 2928 is not in PRTS, but its rhyme must be identical to that of 2705 since the two are homophones.

(8.29.0:59: Perhaps 2.54 was determined by a process of elimination. The Chinese transcription 命 *3me3 most likely reflected a Tangut reading like be4 (be3 would be anomalous in Tangut). 2705 could not have had the first tone because 2705 was not in the Tangraphic Sea's volume for first-tone characters. 2705 was thus thought to be in the lost second-tone volume [and later it indeed was found in the second-tone volume of the Precious Rhymes of the Tangraphic Sea]. But was it 2be2 [2.33], 2be'2 [2.35], 2beq4 [2.54], or 2ber4 [2.68]? 2705 was not in the homophone groups thought to be for 2be2 and 2be'2, so 2beq4 and 2ber4 were the remaining possibilities. I don't know why 2.54 was favored over 2.68. The actual rhyme turned out to be 2.86 - which was none of the above!)

All this makes me wonder how many other second tone readings have been revised in Li Fanwen's 2008 dictionary in accordance with the Precious Rhymes of the Tangraphic Sea.

FULL TETRATANGRAPHIC FORMULAE

You may have noticed in my last entry that I started quoting Tangraphic Sea four-character character analyses in full instead of converting them into A + B (+ C) formulae which are easier for me to type.  Perhaps paying attention to the exact wording of the analyses might help me formulate hypotheses about the script (which remains enigmatic to me after two decades).

David Boxenhorn saw the analysis of 'six' (which I will rewrite here in full) in "Tired Chapters of Accuracy"


3200 1chhiw4 = 3849 3130 1012 2705 1zhiw3 2mer4 1zeq4 2beq4

'(first syllable of 'sixth (month)') + palace + how-much right'

and my assessment of it as "implausible". He wrote that the implausibility

supports the theory that the analyses are mnemonic.

You know what else supports the mnemonic theory (I just thought of it!)? The analyses are all four characters. I can easily imagine a group of students reciting them. Even singing them.

And isn't there a Chinese tradition of four-character sayings?

[... I]s there any continuity between adjacent analyses, either semantic or phonetic? Something to suggest that they are not meant to be read in isolation?

I just realized that The Golden Guide consists of 5-character lines. Characters plus their 4-character analyses are also 5 characters long. They could both be recited in the same way, e.g. with the same tune!

You know, in English the most common meter is iambic pentameter, which would be great for every two lines of these works.

I should look at analyses in adjacent Tangraphic Sea entries in the near future.

I had been thinking that if the Golden Guide had a tune, it couldn't apply to the Tangraphic Sea analyses, but I had overlooked the possibility of including the analyzed character's reading in the tune.

As for meter, I have never seen a comparative study of the structures of poetry in (South)east Asian 'monosyllabic' languages. Does such a study exist? Or even, say, a study of poetic structures in what Guillaume Jacques might call the Macro-rGyalrongic world? Is there a Qiangic language today with poetry characterized by five-syllable lines? Is there a tonal pattern within and/or between lines of the Golden Guide and/or the Tangraphic Sea analyses?

In any case, I don't think the Tangraphic Sea analyses are truly etymological. In other words, I don't think they necessarily reflect the reasoning of the creator(s) of the script. Who would devise graphs for 'sixth (month)', 'palace', and 'how much', and then fuse them into 'six'?

'EIGHT' FROM 'SEVEN', 'SIXTH' FROM 'FIVE'?

I have been haunted by the Tangraphic Sea analysis of 4602 1ar4 'eight' for twenty years:


4602 1ar4 'eight' = 4778 1139 2750 1868 1shaq4 1e4 1ghu2 1teq4

'seven GEN head remove' = 'remove the top of seven'

If 'seven' and 'eight' are related characters, I would expect the more complex character ('seven') to be built up from a less complex character ('eight') rather than the other way around. But that scenario also has issues. Why would the character for 'eight' be designed before the character for 'seven'? And what does the top half of 'seven' do? I used to think it might be a phonetic symbol for sh-, so 'seven' was written as sh-yar = 1shaq3. However, that configuration of strokes (Boxenhorn code: biozoxzox) appears nowhere else in the Tangut script. Why did 'seven' merit a unique 'head'?

The Tangraphic Sea analysis of 'seven' does not answer that question:


4778 1shaq4 'seven' = 4751 3916 4602 1602 1se1 2si4 1yar4 2ngorn1 

'clean (nominalizer) eight all' = '(the top of?) cleanliness all of eight'

If 'eight' is from 'seven' - at least in the Tangraphic Sea - it is not entirely surprising that the second syllable of

3849 5081 1zhiw3-1vi1 'sixth (month)'

is said to be from 'five':


5081 1vi1 = 5286 3936 1999 2705 1vi1 1pha1 1ngwy1 2beq4

'(second syllable of 1ten4-1vi1 'clever') left + five right'

5286 is phonetic. 1999 is not optimally semantic, though. Why not simply extract part of

3200 1chhiw4

the character for the regular word for 'six'?

8.27.22:39: Cf. how the character for 'fourth (son)' is derived from the regular character for 'four' (though the words are unrelated!):


4934 1ngwyr 'fourth (son)' = 4971 2750 2205 1602 1shwi3 1ghu2 1lyr'4 2ngorn1 

'age head (i.e., top) + four all'

The only other son-counting character with a similar structure is


1257 1ar4 'eighth (son)' = 0384 3936 4602 1602 1leq4 1pha1 1ar4 2ngorn1 

'son left + eight all'

which is simply a different spelling of 4602 1ar4, the regular word for 'eight'. Why did 'fourth (son)' incorporate 'four' unlike other characters for counting words unrelated to regular numerals?

DID COMMON AND 'RITUAL' TANGUT SHARE A ROOT FOR 'SIX'?

So far the last word on the subject of the so-called 'ritual language' of Tangut is Andrew West's 2011 article. I have yet to write a full response after over five years. My view in short is a blend of Nishida Tatsuo's and Andrew's; the 'ritual' language is a subset of substratal vocabulary used in glosses. I think these words were borrowed from a language of unknown affiliation - possibly an isolate spoken in Tangut territory. 

But if I am right, I would not expect the first syllable of 'sixth (month)'

3849 5081 1zhiw3-1vi1

to sound like the regular word for 'six':

3200 1chhiw4 < *K-truk.

Last night it occurred to me that 3849 might be 3200 with a root initial that lenited after the vowel of a lost presyllable:

*KV-truk > *KV-ch- > *KV-zh- > zh- (the relative timing of *-uk > -iw is unknown)

cf. 3200 in which *KV- conditioned aspiration: *KV-truk > *K-truk > 1chhiw4

But if that were the case, then what is 5081? It is not attested by itself except as a dictionary entry, and it does not occur as a suffix after other numerals.

I used to think that 3849 5081 1zhiw3-1vi1 was an unanalyzable disyllabic word, but 1vi1 is homophonous with

3649 1vi1 'sixth' (son).

Is 1zhiw3-1vi1 a redundant compound combining a variant of the basic Tangut word for 'six' with a substratal word? Or is 'sixth' before 'son' an abbreviation of a disyllabic substratal word 1zhiw3-1vi1?

8.26.14:31: Perhaps the resemblance of substratal 1zhiw3-1vi1 to 1chhiw4 is no more meaningful than that between Malay dua < *duSa 'two' and Sanskrit dva- ~ dvi- 'id.' (borrowed into Malay as dwi-). The zh- of 1zhiw3 does not have to be from a lenited consonant; it could be a direct borrowing of a voiced fricative from a substratal language. Similarly, the -iw of 1zhiw3 need not be from *-uk.

TIRED CHAPTERS OF ACCURACY

In my last post I mentioned the unusual -aq1 ~ -iq1 alternation in a disyllabic Tangut verb 'not know':


5077 1817 1my4-1daq1 ~ 5077 5283 1my4-1diq1. 

The first syllable does not seem to occur by itself. Was it originally the same morpheme as

5643 1my4 'not'

which negates auxiliary verbs? Was there originally a d-verb for 'know' that was an auxiliary in a V + 1my4 + d-verb 'not know how to V' construction?

The second syllables occur as glosses for


3020 1ja'3 'accuracy' = left of 3543 1dzwy1 'chapter' + left of 1817 1daq1 'know'

in Mixed Categories. That implies they might also be standalone verbs, though I have not yet seen them used by themselves in texts.

3020 shares its right side 'speech' (derived from Chinese 言 'id.'?) with 1817. (The bottom half of 'speech' has different shapes depending on whether it is on the left or right. Generally the final stroke of a Tangut character cannot point in a northeast direction.)

Its left side

Boxenhorn code: jil

is only in three other characters. It is semantic in 3543 (see above) with 'hand' on the right (why?), semantic and phonetic in

3021 1ja'3 'chapter' (cf. 3543 'chapter' above)

and phonetic in 3523, half of the mirror-image disyllabic words


3523 1688 2ja'3-1gu1 ~ 1gu1-2ja'3 'tired' (cf. 3020 and 3021, both 1ja'3)

I have not seen 3021, 3523, or 1688 outside dictionaries.

(8.25.23:20: Li Fanwen 2008 and Kychanov and Arakawa have glosses for each half of those words, but I don't know how they determined which side meant what:

3523:  L 'skinny, wan and sallow'; K&A 'toil, exhausting labor'

1688: L 'toil', K&A 'toil'

K&A defined 2ja'3-1gu1 as 'toil'; they have no entry for 1gu1-2ja'3.)

I first encountered the right side of 3021 (Boxenhorn code: dim) in 3200 1chhiw4 'six' whose Tangraphic Sea analysis is implausible:


3200 1chhiw4 = 3849 1zhiw3 + 3130 2mer4 'palace' + 1012 1zeq4 'how much'

I doubt that the character for the regular word for 'six' was derived from 3849,  the character for the first half of 'sixth (month)' in the so-called 'ritual language':

3849 5081 1zhiw3 1vi1

And why write 'six' with part of 'palace'?

'How much' is not the worst source of a numeral character component, but its right half doesn't appear in any other numeral characters.

The right side of 3523 and 1688 is semantic:

4675 2rer4 'toil, hard work'.

Work makes one tired.

8.25.20:20: The vague similarity of 4675 to the right side of Chinese 作 may be coincidental.

The left side of 1688 from

0678 1gu1 'arise, build'

is phonetic and vaguely similar to the left of Tangut period northwestern Chinese 孤 *1ku1.

WHERE DID *-U GO IN TANGUT?

My last series of posts was about the Tangut verb 'eat' which has two stems:


4517 1dzi3 < *Nɯ-dza and 4547 1dzo4- < *Nɯ-dza-u

The latter was derived from the former via the addition of a third person object suffix *-u. Here is my version of Guillaume Jacques' (2014: 232) table of pre-Tangut *-Ø ~ *-u verb stem alternations and their Tangut reflexes. I have added the first two rows and reorganized the others.

Alternation type   Pre-Tangut             Tangut
u                  *Cʌ...-ə               -u1
                   *-ə-u                  -u3
i                  *Cɯ-...-ej             -e3/4
                   *Cɯ-...-ej-u           -i3/4
y                  *Cɯ-...-aC (C ≠ ŋ)     -a3/4
                   *Cɯ-...-aC-u           -y3/4
o2                 *(Cʌ)-...-ra           -i2
                   *(Cʌ)-...-ra-u         -o2
o3/4-i             *Cɯ-...-a              -i3/4
                   *Cɯ-...-a-u            -o3/4
o3/4-u             *Cɯ-...-o              -u3/4
                   *Cɯ-...-o-u            -o3/4
o3/4-e             *Cɯ-...-aŋ             -e3/4
                   *Cɯ-...-aŋ-u           -o3/4
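One thing the table implies is that several base rhymes end up with the same *-u stem rhyme. The following is a rough Python sketch of that grouping, using only the Tangut columns of the table; the names ALTERNATIONS and base_rhymes_by_u_stem are hypothetical.

```python
from collections import defaultdict

# (alternation type, Tangut base-stem rhyme, Tangut *-u stem rhyme),
# copied from the Tangut column of the table.
ALTERNATIONS = [
    ("u", "-u1", "-u3"),
    ("i", "-e3/4", "-i3/4"),
    ("y", "-a3/4", "-y3/4"),
    ("o2", "-i2", "-o2"),
    ("o3/4-i", "-i3/4", "-o3/4"),
    ("o3/4-u", "-u3/4", "-o3/4"),
    ("o3/4-e", "-e3/4", "-o3/4"),
]


def base_rhymes_by_u_stem(rows):
    """Group base-stem rhymes under the *-u stem rhyme they alternate with."""
    grouped = defaultdict(list)
    for _alternation_type, base, u_stem in rows:
        grouped[u_stem].append(base)
    return dict(grouped)


grouped = base_rhymes_by_u_stem(ALTERNATIONS)
# *-u stem rhymes that correspond to more than one base rhyme.
ambiguous = {u: bases for u, bases in grouped.items() if len(bases) > 1}
print(ambiguous)  # {'-o3/4': ['-i3/4', '-u3/4', '-e3/4']}
```

So, going by the table alone, an -o3/4 stem does not by itself reveal whether its partner stem has -i3/4, -u3/4, or -e3/4.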

Two oddities that jump out at me:

1. Why are there so few alternating verbs with Grade I rhymes? I only know of two such verbs, and the second cannot be reconstructed with *-u:


1338 1dzu1 ~ 4973 1dzu3 'love'


5077 1817 1my4-1daq1 ~ 5077 5283 1my4-1diq1 'not know' (but the first syllable is Grade IV)

2. Why are there no pre-Tangut verbs ending in *-i ~ *-i-u?

I presume pre-Tangut *-u ~ *-u-u stems merged into Tangut -u. Did *-u vanish after higher series vowels (*u, *i, *ə)? Or is the set above an incomplete remnant of what was once a much larger system involving more (or even all) verb stems that was increasingly regularized (i.e., lost its alternations) over time?

8.24.23:13: The table above does not include non-*u alternations such as the one in 'not know'.

I wonder if the i-type and y-type alternations involve ablaut rather than *-u. I can imagine how *-ej-u fused into -i

*-eju > *-iju > *-ju > *-y > -i

though I'm surprised it didn't merge with -ew.
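The chain above can be written out as an ordered list of replacements. This is just an illustrative Python sketch, assuming each stage is a literal string replacement applied once; STEPS and derive are hypothetical names, and nothing here models the phonetics.

```python
# Each step is (earlier form, later form), in the order given in the chain.
STEPS = [("eju", "iju"), ("iju", "ju"), ("ju", "y"), ("y", "i")]


def derive(form, steps):
    """Apply each replacement once, in order, recording every stage."""
    stages = [form]
    for old, new in steps:
        form = form.replace(old, new, 1)
        stages.append(form)
    return stages


chain = derive("eju", STEPS)
print(" > ".join(chain))  # eju > iju > ju > y > i
```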

However, I have more difficulty with deriving -y (phonetically some sort of nonlabial central vowel or diphthong) from *-aC-u. *-C- must have lenited to nothing, and *a and *u might have fused into a vowel that was nonlabial and central like *a but nonlow like *u.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 5)

In parts 3 and 4, I reconstructed the ancestors of the stems of the Tangut verb 'eat'

4517 1dzi3 and 4547 1dzo4-

with a presyllable *Nɯ- whose high vowel conditioned Grades III and IV.

But looking at the word's cognates in the rGyalrongic Languages Database (e.g., Brag bar ka-nə'-dza) made me wonder if the word had two presyllables - and if the high vowel conditioning Grades III and IV belonged to a presyllable preceding a nasal.

Unfortunately, almost none of the presyllables preceding (n(V)-)dzV in rGyalrongic languages have high vowels:

a'- ka'-
kə- kwə- tə- nə-

The exceptions are Da tshang towu'nza and Japhug (variety A) tu'-nza; Guillaume Jacques' (2016: 143) Japhug simply lists ndza (the form given for variety B in the database) with the directional prefixes tɤ- and thɯ-. (How many of the presyllables above are directional prefixes?)

I doubt that a single presyllable before *NV- can be reconstructed at the Proto-rGyalrongic level; different varieties apparently added different prefixes to *NV-dza. In some cases, there were two prefixes:

*ka-tV- > Tsho bdun (variety A) kat˺'- (B has kə-)

*kV-pV- or *o-kə- > Brag steng 'Khyung-ri kwə-

*to-pV- > Da tshang towu'-

My hypothetical *pV- is the source of xxx ᴾ-. I don't know how ᴾ- differs from p-. I also don't know what the function of the apostrophe is. Is it a breve? I assume it is not a glottal stop which appears as ʔ in the site's transcription.

Brag bar ka-nə'-dza preserves the syllabicity of the nasal element of 'eat'. Without knowing anything about Brag bar phonology, I do not know whether *nə' can be derived from *Nɯ- with a high vowel.

bZhi lung kaˈ-ᵐtsok˺ points to *m as the specific value of *N. I am guessing that the small ᵐ indicates that ᵐts- is a unit phoneme, and that the presyllable is ka'- rather than *ka'm-.

8.23.23:08: Do I have to posit a high-vowel presyllable to account for the grades of the Tangut forms? Guillaume Jacques derives his Grade III (= my Grades III and IV) from *-j-, and such a medial is in some forms of 'eat' in the rGyalrongic database: e.g., dPa' dbang ndzja (identical to Guillaume's pre-Tangut *ndzja!) and Kha ra kyo ka'-zje. Is this attested -j- primary or secondary? In other words, did languages like Japhug lose *-j- (*ndzja > ndza), or did *a become ja (with or without further fronting and raising: ja > je > i) in some languages? I favor the latter scenario, but I am not absolutely certain, as I do think *-j- did exist in the ancestor of these languages. I just don't know if it existed in this particular word at an early stage.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 4)

As I revised what was supposed to be the last part of my trilogy on

4517 1dzi3 < *Nɯ-dza 'eat'

I realized I had overlooked something obvious. 4517 belongs to a small class of verbs with two stems. The other stem is

4547 1dzo4

with a Grade IV rhyme. Why don't the rhymes of both stems have the same grade? Did 4517 have some additional affix that conditioned Grade III instead of the usual Grade IV after dz-? This second stem never appears by itself; it is combined with person suffixes in 'I eat him/her/it/them' (1SG > 3), 'thou eatest thyself/him/her/it/them' (2SG > 2SG, 2SG > 3), and perhaps 'I eat myself' (1SG > 1SG; forms for unattested subject/object combinations are in parentheses).

subject\object 1SG 2SG 1/2PL 3
(*?) (*1dzi3-2ni4?)
(*?) (*1dzi3-2ni4?)
(*?) (*1dzi3-2ni4?)


The o-stem is derived from the i-stem plus a third person object suffix *-u prior to 'brightening' (*a > i):

*Nɯ-dza-u > 1dzo4

If *-u had been added after 'brightening', the o-stem might have been an iw-stem:

*Nɯ-dzi-u > *1dziw4
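The point about relative timing can be made concrete by applying the two changes in both orders. This is a minimal Python sketch under loud assumptions: it works on the bare root dza without presyllable, tone, or grade, and the functions brighten and add_u are hypothetical simplifications (*-a-u > -o as in the attested o-stem, *-i-u > -iw as in the hypothetical iw-stem).

```python
def brighten(stem):
    """'Brightening': stem-final *a raises to i; other stems are unchanged."""
    return stem[:-1] + "i" if stem.endswith("a") else stem


def add_u(stem):
    """Add the third person object suffix *-u and fuse it with the vowel."""
    if stem.endswith("a"):
        return stem[:-1] + "o"  # *-a-u > -o
    if stem.endswith("i"):
        return stem + "w"       # *-i-u > -iw (hypothetical outcome)
    return stem + "u"           # no fusion modelled for other vowels


i_stem = brighten("dza")                 # plain stem
suffix_first = brighten(add_u("dza"))    # suffix added before brightening
suffix_last = add_u(brighten("dza"))     # suffix added after brightening

print(i_stem)        # dzi
print(suffix_first)  # dzo
print(suffix_last)   # dziw
```

Only the suffix-first order yields the o-stem that is actually attested.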

8.22.23:32: Note that *-u is not present in all verb forms for third person objects. In that respect Tangut differs from northern Qiang in which a cognate suffix -w is consistently in all verb forms for third person objects.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 3)

I conclude my trilogy on

4517 1dzi3 'eat'

by looking into the origin of its reading.

Although it is tempting to assume that Tangut dz- is a direct retention of an earlier voiced obstruent, dz- may have been [ndz] with prenasalization as in Muya ndzɯ³⁵, Japhug ndza, and Naxi ndzɯ³³. That prenasalization in turn may have been from an *m-. (8.21.23:19: See Matisoff 2003: 119 for the reasoning behind his reconstruction of *m.)

As for the rhyme, there is no doubt that it has been 'brightened' to use Matisoff's (2004) term: i.e., *a has raised to i. There are two problems regarding brightening that remain to be solved.

First, why did *a raise to i1 in some cases and i3/i4 in others? According to Guillaume Jacques (2014), the presence or absence of *-j- determines the type of raising:


*Cja > Ci3/Ci4 (Guillaume does not make a distinction between Grades III and IV; he uses Gong's reconstruction in which both grades are characterized by a -j- that is not supported by Tibetan transcriptions.)

But for years I have proposed that presyllabic vowel harmony conditioned the changes leading to Tangut's rich inventory of open-syllable rhymes. I have yet to fully integrate my views with Guillaume's, but here's a first attempt. *Ca by itself or with a lower vowel presyllable became Ci1, whereas *Ca with a higher vowel presyllable became Ci3 or Ci4 depending on the initial:

*(Cʌ-)Ca > Ci1

*Cɯ-Ca > Ci3/Ci4 (normally Pi4, Vi3, Ti4, Ki4, TSi4, CHi3, Qi4, li3, rir4, zi4, zhi3)

e.g., *Nɯ-dza > 1dzi3 (!; see below)
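The reason for the exclamation mark can be spelled out as a lookup. The following Python sketch is only a restatement of the two rules above; NORMAL_GRADE and brightened_grade are hypothetical names, and the retroflex outcome rir4 is entered simply as grade 4.

```python
# Normal grade of brightened -i after a higher vowel presyllable (*Cɯ-),
# by initial class, read off the list Pi4, Vi3, Ti4, ... in the text.
NORMAL_GRADE = {
    "P": 4, "V": 3, "T": 4, "K": 4, "TS": 4, "CH": 3,
    "Q": 4, "l": 3, "r": 4, "z": 4, "zh": 3,
}


def brightened_grade(initial_class, presyllable_vowel=None):
    """Predicted grade of -i < *a for a given initial class and presyllable."""
    if presyllable_vowel in (None, "ʌ"):  # no presyllable, or a lower vowel
        return 1
    if presyllable_vowel == "ɯ":          # higher vowel presyllable
        return NORMAL_GRADE[initial_class]
    raise ValueError("only ʌ and ɯ presyllables are covered by the rules")


expected = brightened_grade("TS", "ɯ")  # dz- is a TS (Class VI) initial
attested = 3                             # 1dzi3 'eat'
print(expected, attested)  # 4 3
```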

Unfortunately I have not found the Sino-Tibetan equivalent of Hittite that preserves the presyllables predicted by my hypothesis.

Second, why did *a become i3 after dz- in 'eat' instead of i4? There is a related word

4513 2dzi4 < *Nɯ-dza-H 'eat, drink, food'


The second half of the fanqie spelling of the Tangut cognate of Wobzi dzí 'eat' is also puzzling:


4517 1dzi3 'eat' = 5051 1naq4 + 0932 1i3 (Mixed Categories of the Tangraphic Sea 4.252)

Normally Class VI initials like dz- only combine with Grade I and IV rhymes: e.g., 1dzi1 and 1dzi4. But 4517 and its homophones

0382 1dzi3 'equal' and 4912 1dzi3 'cut'

have a Grade III rhyme -i3!

There are only ten syllables combining Class VI initials with Grade III rhymes. The other seven are


0524 1dzu3 'admonish, instruct' 

4973 1dzu3 'love'

4977 1dzu3 'father-in-law, uncle' (only in dictionaries)

5121 1dzu3 'dream' (only in dictionaries)

3408 1tsa3 'broil, roast' (only in dictionaries)

3976 2tsha3 (transcription character; only in dictionaries)

3371 1dza'3 'hair worn in a bun or coil; peak'

(8.21.1:54: Is it significant that Class VI initials can only precede four different Grade III rhymes?)
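The counts of ten syllables and four rhymes can be verified from the readings listed above. This is a small Python sketch; it assumes that a reading is a tone digit followed by an initial and a rhyme, and that the rhyme starts at the first of the letters a, e, i, o, u, y, which is a simplification of the notation. READINGS, VOWELS, and rhyme are hypothetical names.

```python
# Character number -> reading, for the Class VI + Grade III syllables above.
READINGS = {
    "4517": "1dzi3", "0382": "1dzi3", "4912": "1dzi3",
    "0524": "1dzu3", "4973": "1dzu3", "4977": "1dzu3", "5121": "1dzu3",
    "3408": "1tsa3", "3976": "2tsha3", "3371": "1dza'3",
}
VOWELS = "aeiouy"


def rhyme(reading):
    """Return the rhyme (vowel onward, including grade) of a reading."""
    body = reading[1:]  # drop the leading tone digit
    first_vowel = next(i for i, ch in enumerate(body) if ch in VOWELS)
    return body[first_vowel:]


rhymes = sorted({rhyme(reading) for reading in READINGS.values()})
print(len(READINGS), len(rhymes), rhymes)
```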

Why is the syllable type TSV3 (Class VI initial + Grade III rhyme) so rare? Was there something 'antialveolar' about Grade III? There certainly was no such quality in Grades I and IV.

Gong demonstrated a strong (but not absolute) correlation between Tangut and Middle Chinese grades. In Middle Chinese, Grade III was less palatal than Grade IV: e.g., in Chinese borrowings in Korean, Grade III -i is sometimes -ŭi, but Grade IV -i is always -i. For years I reconstructed Tangut Grade III -i as [ɨi] and Grade IV -i as [i]. The velarity of the first half of [ɨi] was the 'antialveolar' quality that usually made it incompatible with Class VI initials. But now I write those rhymes algebraically as -i3 and -i4 because I am no longer certain about their phonetic values. If -i3 were really [ɨi], why was

0932 1i3 'many, more, much'

a transcription character for Sanskrit i? Sanskrit has no [ɨi] or even [ɨ]. I would have expected Sanskrit i to be transcribed as 1i4: i.e., as [i].

8.20.23:54: Why can't I just reverse my reconstructions of Tangut i3 and Tangut i4 on the basis of 0932 1i3 transcribing Sanskrit i? Because that goes against what is known about the Chinese grades and because i3 is less common than i4 (88 : 209). It would be strange if i3 [i] were less common than [ɨi].

8.21.1:59: Another solution would be to reconstruct i3 as simple [i] and i4 as [ji]. But i4 was transcribed in Tibetan as i and not as yi. And characters with i4 were used to transcribe Sanskrit syllables with i, not yi.

THE PAST AND PRESENT SOUND OF EATING IN TANGUT (PART 1)

After mentioning the Tangut cognate of Wobzi dzí 'eat' in my last entry, I looked up its fanqie and was surprised:


4517 1dzi3 = 5051 1naq4 + 0932 1i3 (Mixed Categories of the Tangraphic Sea 4.252)

Shouldn't those initial and final spellers add up to 1ni3 instead of 1dzi3? I don't know of any Chinese or Tibetan transcriptions of 4517 or its homophones

0382 1dzi3 'equal' and 4912 1dzi3 'cut'

so its reading has to be determined using internal evidence. 4517 is in the section of Mixed Categories for class VI initials: i.e., dental sibilants (ts- tsh- dz- s-). n- is a class III initial. The closest class VI initial to n- is dz- which is voiced and possibly prenasalized: [ndz]. I'm not entirely satisfied with that resolution to the conflict.

Next: The rhyme of 4517.

8.19.22:57: Here is some indirect evidence for reading 4517 with an affricate initial. 4517 is the initial speller for


4855 2dze4 'one of a pair' = 4517 1dzi3 + 4517 2tse4

which is a transcription character for Sanskrit je. Palatals in standard Sanskrit pronunciation correspond to dental affricates in Tangut (as in Tibetan and Chinese). So the initial of 4855 - and its initial speller 4517 - must have been dz- (or something close like [ndz]).

Could someone explain why Sofronov (1968 II: 85) lists the initial speller of 4517 (5051) as part of class VI fanqie chain 12? The fanqie of 5051 has an initial speller which doesn't even have a class VI initial:


5051 1naq4 = 5700 2ni'4 + 5371 1taq4

It has a class III initial n- that can be confirmed by its Tibetan transcription ni (Tai 2008: 208).
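The mismatch can be shown by combining the spellers mechanically. This is a minimal Python sketch, assuming that the initial comes from the first speller and the tone and rhyme from the second, and that a reading is a tone digit plus initial plus rhyme with the rhyme starting at the first of a, e, i, o, u, y. The names split_reading and fanqie are hypothetical.

```python
VOWELS = "aeiouy"


def split_reading(reading):
    """Split a reading such as '1naq4' into (tone, initial, rhyme)."""
    tone, body = reading[0], reading[1:]
    first_vowel = next(i for i, ch in enumerate(body) if ch in VOWELS)
    return tone, body[:first_vowel], body[first_vowel:]


def fanqie(initial_speller, final_speller):
    """Combine the initial of one reading with the tone and rhyme of another."""
    _, initial, _ = split_reading(initial_speller)
    tone, _, rhyme = split_reading(final_speller)
    return tone + initial + rhyme


print(fanqie("1naq4", "1i3"))    # 1ni3, not the 1dzi3 read for 4517
print(fanqie("2ni'4", "1taq4"))  # 1naq4, matching the reading of 5051
```

Under these assumptions the fanqie of 5051 is regular, so the problem lies in the spelling of 4517 itself.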

All other content copyright © 2002-2015 Amritavision