I ran out of time last night to write a substantial entry, and I don't have time to write part 2. I would, however, like to transition toward tonight's diversion by noting one major difference between Old Persian cuneiform and the Khitan small script: the former had a word divider character (𐏐), whereas characters of the latter were clustered into word blocks.

The hangul alphabet invented five centuries after the Khitan small script also has character clusters, though Korean clusters consist of letters for single syllables.

I've been able to type modern Korean clusters on my computers for 18 years. Stacking is automatic: e.g., if I type ㄸ tt and ㅐ ae, they combine into 때 ttae 'time'.

However, I wasn't able to type premodern Korean letters until the last decade or so, and I've never been able to combine them. But now I can. On Saturday I installed Google's free Noto Sans Korean font and tonight I discovered it has full support for premodern hangul. Now I can type ᄣᅢ pstay, the Middle Korean ancestor of modern 때 ttae 'time'. (You probably won't see a cluster for pstay without Noto Sans Korean or another font with similar capabilities.) Clustering is not yet automated. I have to build obsolete hangul clusters piece by piece from the Hangul Jamo Unicode block using BabelMap: e.g., ᄣᅢ consists of initial ᄣ- pst- (U+1123) followed by medial -ᅢ -ay (= modern ae, U+1162). (That syllable does not have a final component.) It will be awkward to stop to construct a cluster in the middle of touch typing: e.g., the title of the 1446 document introducing hangul was 훈민 hunmin (with clusters still used today) followed by the archaic clusters 져ᇰ tsyəng* and  ᅙᅳᆷ ʔɯm. Nonetheless, I'm excited to be able to type archaic clusters at all. Perhaps some future edition of BabelPad will enable me to type, say, Yale romanization for Middle Korean and convert that into premodern hangul clusters.
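The difference between automatic stacking and piece-by-piece construction can be sketched in code. This is a minimal illustration, not a description of what BabelMap or any font does internally. It assumes the conjoining jamo U+1104 and U+1162 as stand-ins for the typed ㄸ and ㅐ, and the tiny romanization table is hypothetical, covering only the two pieces named above.

```python
import unicodedata

# Conjoining jamo from the Hangul Jamo block (U+1100..U+11FF).
MODERN = "\u1104\u1162"   # initial tt (U+1104) + medial ae (U+1162)
ARCHAIC = "\u1123\u1162"  # initial pst (U+1123) + medial ay (U+1162)


def compose_modern(initial: str, medial: str, final: str = "") -> str:
    """Compose a modern syllable arithmetically from conjoining jamo."""
    l_index = ord(initial) - 0x1100             # 19 modern initials
    v_index = ord(medial) - 0x1161              # 21 modern medials
    t_index = ord(final) - 0x11A7 if final else 0  # 0 means no final
    return chr(0xAC00 + (l_index * 21 + v_index) * 28 + t_index)


# Hypothetical table: only the pieces mentioned in the text are included.
YALE_TO_JAMO = {"pst": "\u1123", "ay": "\u1162"}


def yale_to_cluster(pieces):
    """Join romanized pieces into a jamo sequence; the font draws the cluster."""
    return "".join(YALE_TO_JAMO[piece] for piece in pieces)


# NFC normalization precomposes the modern pair into one code point (U+B54C),
# but leaves the archaic pair as two jamo because no precomposed form exists.
ttae = unicodedata.normalize("NFC", MODERN)
pstay = unicodedata.normalize("NFC", ARCHAIC)
```

The archaic cluster therefore remains a two-code-point sequence, and whether it displays as a single block depends entirely on the font.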

The "yeys" in the title of this post is the Yale romanization of modern Korean yet 'old', spelled 옛 <Øyəys>. In Middle Korean, 'old' was 녯 nyəys.

*In modern hangul, ㅇ does double duty for zero and ng, but originally ㅇ Ø ~ ɣ and ㆁ ng were distinct letters.

WAS THE KHITAN SMALL SCRIPT LIKE OLD PERSIAN? (PART 1)

Last night I was reading the Encyclopædia Iranica biography of Karl Hoffmann, who worked on the Old Persian script. What would an Iranist like him have thought of the Khitan small script? The two scripts superficially resembled their predecessors (cuneiform and sinography) and had characters for syllables, vowels, and a small number of words. Unlike Old Persian, the Khitan small script also had characters for diphthongs, vowel-consonant sequences (which may have also been read as consonant-vowel sequences), and consonants.

Old Persian character types | Khitan small script character types
syllable: 𐎣 <ka> | syllable: <qa>
(none) | consonant: <dz>
vowel: 𐎠 <a> | vowel: <a>
(none) | diphthong: <ai>
(none) | vowel-consonant (~consonant-vowel): <al> (~ <la>?)
word: 𐏈 <AURAMAZDĀ> | word: <HEAVEN>

Next: Gaps in the Old Persian and Khitan small script syllabaries.

AVESTAN ALPHABETICAL ORDER

Last night I installed Google's Noto fonts. The first and so far only one I've used is Noto Sans Avestan. Over a decade ago I struggled with non-Unicode Avestan fonts, but now I can use BabelMap to input it.

The Unicode order of Avestan letters is close to the standard Indic order with several differences:

1. The e and o-vowels are before i and u instead of after.

2. Fricatives absent from Indic are where aspirated stops absent from Avestan are in Indic: e.g., 𐬑 x is where kh would be expected.

3. All nasals are grouped together after labial obstruents instead of being grouped with obstruents at the same point of articulation.

4. 𐬭 r follows 𐬫 y and 𐬬 v instead of being between them. (The Pazend non-Avestan letter 𐬮 l follows 𐬭 r.)
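The Unicode order described above can be inspected directly. The following is a minimal sketch that lists the Avestan letters in code point order using Python's standard `unicodedata` module; the function names are my own, and the output depends on the Unicode version bundled with the Python build.

```python
import unicodedata


def avestan_letters():
    """Return (code point, name) pairs for Avestan letters in code point order."""
    letters = []
    for cp in range(0x10B00, 0x10B40):  # the Avestan block
        name = unicodedata.name(chr(cp), "")
        # Skip unassigned code points and the punctuation at the end of the block.
        if name.startswith("AVESTAN LETTER "):
            letters.append((cp, name))
    return letters


def format_row(cp: int, name: str) -> str:
    """Format one letter as 'U+XXXXX NAME' for a printable listing."""
    return f"U+{cp:05X} {name}"


letters = avestan_letters()
listing = [format_row(cp, name) for cp, name in letters]
```

Printing `listing` line by line gives the order that can then be compared with the Indic order and with the orders in the grammars mentioned below.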

What is the origin of this order? I see it in Bartholomae's Altiranisches Wörterbuch (1904). My first introduction to Avestan over twenty years ago was in Jackson's Avesta Grammar (1892) which had a more Indic-like order except for the placement of nasals. Later I saw Kanga's Practical Grammar of the Avesta Language (1891) whose order was as Indic as possible. And last year I finally got ahold of Beekes' Grammar of Gatha-Avestan (1988) which had an ABC-based order like Skjærvø's Old Avestan Glossary (2006).

Beekes (1988: 13) mentioned letters I didn't see in Jackson or Kanga:

𐬕 ġ was "of unknown value"

𐬜 δ̣ was a "graphic variant of δ?" (The letter Beekes transliterated as δ does not have its own code point; similarly, there is only one code point for the two versions of t̰, which Skjærvø transliterated as t̰ and t̰2.)

𐬂 å "only occurs in one manuscript"

𐬅 ą̇ was "of unknown use"

𐬪 was "a variant of y"

According to Skjærvø (2003: 10), å was for short a before ŋ (I am reminded of the *-aŋ > -o shift in Tangut and northwestern Tangut period Chinese), and ą̇ was originally for a nasalized schwa: *ə̨. Do others agree with these interpretations?

Skjærvø (2003: 1) seemed to regard ġ, δ̣ (his δ2), t̰2, and 𐬪 (his Y) as graphic variants of g, δ, t̰, and y. Are they truly interchangeable? Why did ġ and 𐬪 get their own codepoints while the other forms of δ and t̰ did not? Will variation selectors permit distinctions between the two kinds of δ and t̰ in the future?

AN ESEN-TIAL SEMANTIC SPECTRUM

Here is my attempt to diagram the semantic overlap between the words in the last two entries and the Japanese root yasu as an unrelated external 'control' for comparison:

Khitan esen | Mongolian esen (Lessing 1960: 333) | Turkic esen (Clauson 1972: 248) | Middle Persian āsān (MacKenzie 1986: 12) | Arabic ḥasan | Japanese yasu
 | | | | | 安い・廉い yasu-i 'cheap'
 | | | easy | | 易い yasu-i 'easy'
 | | | at rest | | 休む yasu-m-u 'to rest'
 | calm, quiet | | peaceful | | 安らか yasu-raka 'peaceful'
 | | sound, safe | | | 安らか yasu-raka 'sound, safe' (obsolete)
 | healthy, good health | in good health | | | (康 yasu 'health')

I have tried to arrange the various meanings on a color-coded rainbow spectrum. Of course, there is no reason for semantics to be easily described in terms of a single dimension (or even two!), and other arrangements are possible.

'Good' is so broad that I assigned no color to it. Unlike John Tang, I don't think ḥasan 'good' has anything to do with the s-n words to the east.

I wonder if Japanese yasu- 'cheap' originated as 'easy to obtain'. The semantic range of yasu- and its derivatives in modern times is between 'cheap' and 'peaceful', but in earlier Japanese bodies could be yasu-raka 'safe and sound' (arguably even 'healthy'), and even in modern Japanese the name element yasu can be written with the Chinese character 康 'health' (but not 壽/寿 'longevity'!). The entire range (apart from 'good') could be defined in terms of the absence of ...

cost, which is a kind of ...

difficulty, which is a common characteristic of ...

work, which may lead to (is there a better transition?) ...

conflict, which can involve ...

danger, which may result in ...

injury, which is like ...

disease, which also harms the body, and may lead to ...

death, which is the ultimate harm.

The Persian and Turkic meanings do not intersect, though that by itself is not sufficient to argue that āsān and esen are unrelated, since 'peaceful' could shift to 'safe'. However, the semantic difference coupled with the vocalic mismatch makes me skeptical about esen as a borrowing of āsān.

Mongolian shares 'health' with Turkic but not 'calm, quiet'. What is the earliest attested meaning? If it is 'healthy', then the Turkic word was borrowed in a narrow sense and the range of the Mongolian word later expanded into the 'green zone'.

I assume 'longevity' is a Khitan-internal extension of 'health' which might have been the original meaning of esen in the common ancestor of Khitan and Mongolic that borrowed the word from Turkic. Or did the borrowing go in the other direction? A third possibility is that both borrowed the word from Xiongnu, but of course that hypothesis cannot be tested because the Xiongnu word for 'health' is unknown (along with 99.9% of the Xiongnu lexicon). Is there anything like esen 'health' in the Yeniseian languages which may be related to Xiongnu?

PEACEFUL BUT NOT ESEN-TIAL

On Sunday I asked,

Does [Middle Persian] āsān have an Indo-European or at least Iranian etymology?

On Monday I found the entry for that word in Nyberg's A Manual of Pahlavi (1964-74 II: 30) which derived it from āsāy- 'to rest' and led me to Bailey (1930: 16) who derived it from an Iranian root *sam, equivalent to *samH in Cheung's (2007: 330) modern Proto-Iranian reconstruction.

The Sanskrit cognate of *samH is śamⁱ. (I like Cheung's use of superscript i to indicate the Sanskrit i that is a reflex of *H.)

Proto-Iranian *samH and Sanskrit śamⁱ in turn come from Proto-Indo-European *kemʕ. The only English word with that root in Watkins' (2011: 41) American Heritage Dictionary of Indo-European Roots is nosocomial. I never even saw the word until yesterday.

There is one problem with that internal etymology, though. I don't understand how Bailey got from *sam to -sāy-. (Ā- is an Indo-Iranian prefix.)

Cheung (2007: 328) avoided that problem by deriving Manichaean Middle Persian sy-* (cognate to Pahlavi [Cheung's "Zoroastrian Middle Persian"] -sāy-) and Pahlavi āsān (a present participle) from Proto-Iranian *saiH 'to lie down, go to sleep'. The Sanskrit cognate of that root is śayⁱ, and the Proto-Indo-European source of that root is *keiʔ with the following descendants in English (Watkins 2011: 39):

- Native: hind 'British farm assistant', hide 'unit of land area' (both as unknown to me as nosocomial!)

- Borrowed:

From Irish: ceilidh

From Latin (directly or otherwise): city, civic, civil; incunabulum (yet another new word for me - I'm referring to the last one, of course!)

From Greek: cemetery

From Sanskrit: Shiva

Given that internal etymology of āsān, there is no need to derive it from Arabic ḥasan.

I am still not entirely convinced that Khitan/Mongolic/Turkic esen is from Middle Persian because of the vowels. Are there any other cases of Turkic e corresponding to Middle Persian ā? Clauson (1972: 248) wrote that "The spelling asan, which is common in Uyğ. is prob. an aberration." What if asan is an Uyghur borrowing from Middle Persian while esen is an unrelated native soundalike in Turkic?

*The Manichaean script is an abjad, so I guess sy- was pronounced [saːj], but I don't really know.

AN ESEN-TIAL ETYMOLOGY

On Friday I found John Tang's paper "On the Terms Concerning Longevity in Khitan and Jurchen Languages". He proposed that the Khitan word


<es.en> (in the large and small scripts) 'longevity'


a probable etymological explanation: [Mongolian] esen 'healthy, good health; calm, quiet' [see Lessing 1960: 333] < Uig. äsän 'in good health, sound, safe' [see Clauson 1972: 248] < Pahl. 's'n [âsân] 'at rest, easy, peaceful' [see MacKenzie 1986: 12]. It is related to the personal name Ḥasan, which is common in Arabic and Persian languages (Rybatzki 2006: 176-177). This connection suggests that Khitan, the so-called “Para-Mongolic” (Janhunen 2003: 391-402), had been partly influenced by Near Eastern culture, as well as by the significant Uighur compacts. (p. 483)

I am not sure all of that holds up.

I do think the Khitan/Mongolic and Turkic terms are connected. Perhaps esen is an early loan from Turkic into the common ancestor of Khitan and Mongolic.

But is esen native to Turkic? Clauson (1972: 248) wrote that Turkic esen was "[n]ot to be confused with Pe[rsian] āsān 'easy' ": i.e., the descendant of Pahlavi/Middle Persian āsān. I suppose he did not regard the Turkic word as a borrowing from Persian. I am hesitant to do so, as I don't understand why Persian ā would have been borrowed as e instead of a in Turkic.

Then again, Clauson also wrote "but see Doerfer II 478": i.e., Türkische und mongolische Elemente im Neupersischen. According to Doerfer, the Middle Persian word was not only borrowed into Turkic but also borrowed from Turkic back into New Persian as اسن esän (not isan in his Persian romanization)?

In any case, I doubt Arabic ḥasan has anything to do with the above words for four reasons (revised 7.21.1:03):

1. Ḥasan means 'good', not 'healthy' or 'safe'. A semantic shift of 'good' to 'healthy' is not impossible (e.g., English feel good), but the semantic fit is nonetheless loose.

2. MacKenzie (1986: 12) did not list Middle Persian āsān as a borrowing from Arabic. (Does āsān have an Indo-European or at least Iranian etymology?)

3. I would expect Arabic ḥ- to correspond to Middle Persian h- rather than zero. (Cf. how Arabic ḥ- is pronounced [h] in New Persian.)

4. Esen is widespread within Turkic, so it might be reconstructible at the Proto-Turkic level. And esen might also be reconstructible for the common ancestor of Khitan and Mongolic (unless it was borrowed from Turkic by Khitan and Mongolic after their breakup). If Persian āsān were a borrowing from Arabic, it would have to date from the Islamic conquest in the seventh century AD or later: i.e., long after the breakup of Proto-Turkic and the split of the common ancestor of Khitan and Mongolic. (7.21.7:24: Is āsān attested in Persian before the seventh century?)

I conclude that ḥasan and āsān are unrelated lookalikes.

WHAT WERE THE LIFESPANS OF KHITAN SMALL SCRIPT CHARACTERS?

When one hears that the Khitan small script was created around 925 and was abolished around 1192, one might look at the set of 369 characters at the back of Kane (2009) or the set of 448 characters in Sun et al. (2010) and assume they all rose and fell at the same time. But that is unlikely. A few may even be modern copyists' errors: e.g.,

121 for 063

in the handwritten transcription of the 仁懿 inscription which has been reburied and lost (so there is currently no way to verify the original). The remaining real characters may have come and gone at different points over three centuries: e.g.,


114/352 <ɨi>

may have become obsolete after *-ɨi and *-i had merged into *-i in Chinese. I wrote "after" rather than "when" because 114/352 may have already been an archaism by the time it was used in the 宣懿 and 道宗 inscriptions of 1101. Was it used in any other inscriptions? My guess is that it may have been obsolete when the 郎君 inscription was composed in 1134 after the fall of the Khitan Empire, as Chinese 期 *khɨi (later *khi) was transcribed as 339 <i> in

340-339 <x.i>.

The earliest known Khitan small script inscription is 耶律宗教 from 1053, over a century after the script was created circa 925. It is possible that there are small script characters that only appeared in the earliest, as yet undiscovered texts. And could new characters have been created after 925?
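The question of when individual characters came and went could be approached by tabulating dated attestations. Here is a minimal sketch that records only the attestations and dates mentioned in this post. The record layout and function name are my own assumptions, and a real tally would need the full corpus.

```python
# Each record: (character number, inscription, year). Only data stated in the
# post are included; this is far from a complete list of attestations.
ATTESTATIONS = [
    ("114/352", "宣懿", 1101),
    ("114/352", "道宗", 1101),
    ("340", "郎君", 1134),
    ("339", "郎君", 1134),
]


def lifespans(records):
    """Map each character to its (earliest, latest) attested year."""
    spans = {}
    for char, _inscription, year in records:
        first, last = spans.get(char, (year, year))
        spans[char] = (min(first, year), max(last, year))
    return spans


spans = lifespans(ATTESTATIONS)
```

An attested span is only a lower bound on a character's real lifespan, since a character may have been in use before or after its known inscriptions.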

It would be interesting to keep track of the attestations of each Khitan small script character in dated texts.

DID KHITAN SMALL SCRIPT CHARACTER 114/352 REPRESENT A DIPHTHONG?

Kane (2009: 48) interpreted Khitan small script character 114

as a long vowel [iː]. He transliterated it and its variant 352

as <î> with a circumflex reminiscent of the circumflex some use to write Japanese vowel length. (That practice seems to be in decline now that support for vowels with macrons is widespread.)

I think he concluded that 114/352 was a long vowel because it appears in transcriptions of Chinese 騎 which was also transcribed with <i.i>:


334-114 <g.î> (宣懿 2.28) :  334-352 <g.î> (道宗 2.24): 334-339-339 <g.i.i> (蕭仲恭 19.34)

However, there are only three instances of 114/352 in the entire corpus in Qidan xiaozi yanjiu (1985). Besides the two instances of Chinese transcriptions above there is also one native (or at least non-Chinese) word:

352-235-359 <î.ri.?> '?' (道宗 7-21)

Was [iː] really only in one Chinese word and one native word? I think 114/352 represented something a bit more exotic than [iː].

Sino-Korean was borrowed from an 8th century dialect of eastern Middle Chinese that was either ancestral or closely related to Liao Chinese. In premodern Sino-Korean, 騎 was 긔 kɯi, implying eastern Middle Chinese *kɨi. Could the diphthong of *kɨi have survived in Liao Chinese? If so, then 114/352 might have stood for [ɨi] or [ɯi], and 339-339 <i.i> could have represented a more Khitanized pronunciation [iː] of an un-Khitan [ɨi] or [ɯi].

But if [ɨi] or [ɯi] were un-Khitan, why would 352 be in a non-Chinese word 352-235-359? Maybe that instance of 352 was an error for some other character with the top element 火~大. Unfortunately, there are no other words of the type x-235-359 in Qidan xiaozi yanjiu, so I cannot determine what that other character might have been.

KHITAN SMALL SCRIPT VOWEL CHARACTER FREQUENCY IN QIDAN XIAOZI YANJIU

Kane (2009) identified nineteen vowel characters in the Khitan small script. I only looked at ten core characters in my previous post. Here is a list of all nineteen in order of frequency in the texts indexed in Qidan xiaozi yanjiu (1985). Colors indicate whether a character is only in native (or at least non-Chinese) Khitan words (red), only in Chinese loanwords (blue), or in both (green).

Rank | Kane's (2009) transliteration | Qidan xiaozi yanjiu number | Qidan xiaozi yanjiu frequency | Notes
1 | a | 189 | 618 | Khitan and Chinese
2 | i | 339 | 520 | Khitan and Chinese
3 | u | 131 | 490 | Khitan and Chinese
4 | o | 186 | 257 | Khitan and Chinese
5 | û | 372 | 256 | Khitan and Chinese
6 | ii | 080 | 248 | Khitan only
7 | ó | 090 | 194 | Khitan only
8 | e | 348 | 130 | Khitan only
9 | e | 109 | 76 | Khitan only; 109 is a variant of 348
10 | ú | 245 | 75 | Khitan and Chinese
11 | ô | 252 | 36 | Khitan only; variant of 186?
12 | y | 082 | 24 | Chinese only; medial [ɥ]
13 | ï | 353 | 15 | Chinese only
14 | ü | 226 | 10 | Chinese only
15 | ï | 113 | 8 | Chinese only; 113 is a variant of 353
16 | ô | 253 | 5 | Khitan only; = 252; variant of 186?
17 | î | 352 | 2 | Khitan and Chinese
18 | î | 114 | 1 | Chinese only; variant of 352 miswritten as 339 <i> in Kane's index
19 | i'i | 338 | 1 | Chinese only; = 懿 / 339-339 <i.i>

I am not surprised that the top four are <a i u o>.

However, I was surprised to see <û>, <ii>, and <ó> above <e> which was probably /ə/, the higher/yin counterpart of lower/yang /a/. Then again, if the phoneme /ə/ is an inherent vowel of CV characters, the frequency of the character <e> should be low. Vowel phoneme frequency is distinct from vowel character frequency.
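Since several of the nineteen characters are variants of one another, the ranking shifts somewhat if variants are counted together. The sketch below redoes the tally with the frequencies from the table. The variant pairings are the ones given in the table notes, and the code and its names are merely an illustrative assumption of how one might do the arithmetic.

```python
# Frequencies from the table, keyed by Qidan xiaozi yanjiu number.
FREQ = {
    "189": 618, "339": 520, "131": 490, "186": 257, "372": 256,
    "080": 248, "090": 194, "348": 130, "109": 76, "245": 75,
    "252": 36, "082": 24, "353": 15, "226": 10, "113": 8,
    "253": 5, "352": 2, "114": 1, "338": 1,
}

# Variant -> base character, following the table notes.
VARIANTS = {"109": "348", "113": "353", "114": "352", "253": "252"}


def merge_variants(freq, variants):
    """Add each variant's count to its base character's count."""
    merged = {}
    for number, count in freq.items():
        base = variants.get(number, number)
        merged[base] = merged.get(base, 0) + count
    return merged


merged = merge_variants(FREQ, VARIANTS)
total = sum(FREQ.values())
ranking = sorted(merged, key=lambda number: merged[number], reverse=True)
top_four_share = sum(merged[n] for n in ranking[:4]) / total
```

With variants merged, 348 ~ 109 <e> reaches 206 and moves above 090 <ó>, though it still trails <û> and <ii>.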

Nonlow achromatic vowels share the top element 火~大 in common:


348 ~ 109 <e> /ə/


353 ~ 113 <ï> /ɨ/ or /ɯ/


352 ~ 114 <î> which may have been /ɨi/ or /ɯi/

I am now wondering if 080 <ii> was not simply long /iː/ which might have been the value of 339-339 <i.i> and 338 <i'i> in Chinese loans. 080 might have stood for an i-like vowel or diphthong absent from Chinese: e.g., <ɪ> given that it followed 323 which might have had a uvular initial in

323-080-222 <qi.ii.ń> '奚 Xi' (許王 52.4)

Aisin Gioro (2011) read 080 <ii> as <əl> which not only results in an odd reading of the name of the Xi (something like <tə.ə> in her reconstruction which sounds nothing like Middle Chinese 奚 *ɣej 'Xi'), but also is unlike its fellow converb suffixes which all end in <i> (Kane 2009: 149-151). Perhaps Aisin Gioro regards 080 as the yin counterpart of the yang converb suffix

098 <al>.

<u>, <ú>, and <û> were all used to transcribe Liao Chinese *u, so I can't figure out how they were different.

<o> and <ô> may be the same vowel since they alternate in the same words (Aisin Gioro 2004: 11 cited in Kane 2009: 65). I doubt that these graphic alternations correspond to phonetic alternations due to ablaut, since ablaut is not an 'Altaic' areal trait. (Then again, neither is grammatical gender, an aspect of Khitan that remains largely uninvestigated.)

Liu Fengzhu et al. (2009) interpreted 090 <ó> as <ʊ> - the back counterpart of the <ɪ> that I proposed for 080.

AN <II>-SOLATED EXAMPLE?

The Khitan small script spelling

323-080-222 <qi.ii.ń>

for '奚 Xi' (許王 52.4) could also be transliterated <>. (See my last post for other interpretations of 323.) Those transliterations brought the following questions to mind:

1. Did Khitan have phonemic vowel length?

2. If it did, how did it correlate with what appear to be sequences of identical vowels in the Khitan small script? Long vowels need not be consistently written. They were not distinguished from short vowels in early Attic, and in later Greek which had special letters η and ω for long lower mid vowels and digraphs ει and ου for long upper mid vowels, "the remaining vowel letters α, ι and υ continued to be ambiguous between long and short phonemes." Estonian today has no written distinction between long and overlong vowels.

(Incidentally, Estonian has ɤV-diphthongs like those I reconstruct for Grade II in my current reconstructions of Tangut and Middle Chinese.)

3. How many degrees of vowel length did Khitan have? Did <> have an overlong vowel like Estonian, or was its vowel simply long [iː]?

4. Was 080 <ii> disyllabic [iʔi] or even [iji]? <ii> is one of a set of converb suffixes (Kane 2009: 149): <ai>, <ei>, <i>, <oi>, <ui>. Were the suffixes other than <i> pronounced as diphthongs or disyllables or simply as [i]: e.g., was

186-107 <o.oi> 'become-then' = 'became ... and then ...' (context in Kane 2009: 150)

pronounced [oj] or [oji] or [oːj] or [oʔoj] or ...?

5. Why is <ii> the only known double/long vowel symbol in Khitan - the "<ii>-solated example" of my title?

Kane (2009: 29) identified a large number of symbols for back vowels (and front rounded vowels?). This table is based on his table 1.32 with the addition of <ii>. It does not include variants.

189 <a>

348 <e>

339 <i>

186 <o>

131 <u>
090 <ó>

245 <ú>

252 <ô>

372 <û>

080 <ii>

6. Could one or more of the lower frequency <o> and/or <u>-type symbols have stood for long vowels? In other words, should <ó>, <ú>, <ô>, and/or <û> be moved to the bottom row (assuming <ii> was a long vowel [iː])?

7. Why are there no other <a> and <e>-type symbols? Were <a> and <e> ambiguous in terms of length? Could long [aː] and [əː] only be specified in writing with graph sequences like <Ca.a>, <e.eC>, etc.? Was, for instance,

053-051-011 <qa.ɣ> 'qaghan(-GEN?)' = '(of a?) qaghan'

read as [qɑʁɑːn] with a long vowel? Or was <an> simply a way of writing [n] before a stem ending in [ɑ]?

8. Was the distribution of long vowels in Khitan unbalanced? In Japanese, aa and ee are rare in native words, ii is usually at the ends of adjectives where it is from -iki and -isi, etc. Did ii get its own small script character because it was the most common long vowel in Khitan?

9. Could some Khitan small script CV symbols really be symbols for syllables with long vowels?

10. Did Liao Chinese have vowel length or at least what the Khitan perceived as vowel length? Many Khitan small script spellings of Chinese words have seemingly redundant vowel marking: e.g.,

225-303 <> for 兵 *piŋ (instead of *<>)

Some Chinese words were spelled with one or two vowels: e.g.,

311-131 <b.u> or 311-372-131 <b.û.u> for 部 *pu

See Kane (2009: 244-251) for more examples.

THE QI TO THAT BODY?

I was puzzled by the Khitan large script characters


for qi 'that' almost a year ago. But now I think I've figured out their origin. Tonight I learned that the Chinese character 厥 'his, her, its, their' has a variant 𨈐 dating back to Yupian (c. 543). (TLS lists 43 more variants from 異體字字典 and seven more from 漢語大字典: 𠪏𣅲𣐍橛橜瘚𦪘.) Khitan qi may have been like Japanese sono which can be translated as both a third person possessive and 'that'.

I do not know how the shapes of 厥 and 𨈐 can be related. It may be more accurate to speak of unrelated characters for the same word as 'equivalents' instead of 'variants'.

I also do not know the origin of the Khitan small script character 323

for qi. Nor do I understand why 'that' is

323-151 <qi.ɣ>

with a final consonant in 蕭仲恭. The latter cannot be plural: e.g., both <qi ai> and <qi.ɣ ai> mean 'that [specific] year' in the examples in Kane (2009: 121). (How I wish he included citations for all his examples!) Kane (2009: 73) reported that <qi.ɣ> "is also found elsewhere", though it is not in the texts that I have on hand.

I briefly thought 'that' might have been qiɣ, and <qi.ɣ> was really <qiɣ.ɣ> with a redundant 151 <ɣ>, but the Khitan small script spelling

323-080-222 <qi.ii.ń>

for '奚 Xi' (許王 52.4) makes more sense if 323 represented an open syllable ending in ii.

The reading <qi> is from Kane (2009) and is close to Liao Chinese 奚 *xi. Kane did not explain why he chose q- instead of x- (cf. Aisin Gioro's 2004 reading hi). Perhaps he thought 奚 *xi stood for a foreign name *qi. However, 奚 was read as *ɣej centuries earlier when it first appeared as part of the name 庫莫奚 *khoʰmak ɣej Kumo Xi in the Book of Wei. So perhaps 'that' in Khitan was closer to ɣe or ɣi*, and the Khitan name for the Xi might have been something like ɣei(n) or ɣii(n).

The final <ń> in 許王 52.4 may be a genitive suffix; <qi.ii.ń> appears before <qa.ɣ go.e.en>, and the phrase

323-080-222 053-051-011 319-348-140 <qi.ii.ń>

may mean 'of the goe of the qaghan of the Xi'. The meaning of goe is unknown.

Aisin Gioro (2009) changed her reading of 323 from hi to tə (cf. Mongolian and Manchu tere 'that'). However, 323-080-222 would then be <tə.ii.ń> which does not sound like 奚. Did she regard 323-080-222 as something other than 'Xi' or did she think the Khitan had an exonym <tə.ii.ń> for the Xi?

323 looks like Chinese 口 'mouth'. Was the Khitan word for 'mouth' (and all other body parts except for 'head'?) qi or ɣi or the like (which would be very different from Mongolian aman 'mouth')? I doubt it, as no other Khitan small script characters are obvious pictographs.

*There was no syllable *ɣi in the Chinese of that period, so 奚 *ɣej could have been an approximation of a foreign *ɣe or *ɣi.

RISING SAGES IN KHITAN - AND TANGUT?

For years I have been puzzled by the Khitan large script character

used to transcribe Liao Chinese 聖 *3ʂiŋ 'sage'. Why would 'sage' have been written as an apparent combination of 夕 *4siʔ 'evening' and the name 卞 *3pian?

Tonight I found that an identical-looking Chinese character 𫝢 has been in Unicode since version 6.0 in 2010. 𫝢 turns out to be quite different from the sum of its apparent parts; it is a variant of 升 *1ʂiŋ 'rise', a near-homophone* of 聖 *3ʂiŋ 'sage' that is attested as far west as Dunhuang (Huang Zheng 2005: 361), though it is not in Longkan shoujian (997) compiled in the Khitan Empire. A dotless variant 㚈 is even more similar to 升. TLS lists even more variants from 異體字字典 including 𢦑 which reminds me of Tangut

2544 2ʂɨẽ 'sage' < northwestern Chinese 聖 *3ʂɨẽ 'sage'

Although I thought the Tangut character might have been derived from Khitan 𫝢, I now wonder if its shape combined the parallel diagonal lines of a 𢦑-type variant with the right-hand vertical line of a 㚈-type variant. (7.13.0:13: The earliest attestation I can find for 𢦑 is in 四聲篇海 from the Jin Dynasty. However, Shuowen from 100 AD has a similar form.)

Note that in the dialect of Chinese known to the Tangut, 升 *1ʂɨĩ 'rise' and 聖 *3ʂɨẽ had different vowels, whereas the two words were homophonous except for their tones in the dialect of Chinese known to the Khitan: *1ʂiŋ and *3ʂiŋ. Moreover, the merger of their rhyme categories in the east dates from the Liao Dynasty (see the table in Kane 2010: 242).

Hence I conclude that the idea of writing 聖 as 𫝢 probably originated among the Khitan in Liao times and may not have been a retention from the elusive, hypothetical Parhae script. (If Chinese pronunciation in Parhae were like Sino-Korean which was based on an eighth century eastern dialect, 聖 and 升 were read something like *3sjəŋ and *1sɯŋ.)

The Tangut might not have thought of writing 'sage' as a 𢦑/𫝢-like character on their own because 聖 *3ʂɨẽ did not sound like 升 *1ʂɨĩ 'rise' in the Chinese dialect they knew. So I think they got the idea from the Khitan.

How many other Tangut characters reflect Khitan influence? Probably not many, but 2544 might not be an isolated instance.

Also, how many other Khitan large script characters are based on unfamiliar variants of Chinese characters?

*Numbers in front of syllables indicate tones in Chinese as well as Tangut. 升 *1ʂiŋ 'rise' had a 'level' tone whereas 聖 *3ʂiŋ 'sage' had a departing tone. The contours and voice qualities of these tones are unknown. Using the same notation for Chinese and Tangut facilitates comparisons, though it also may erroneously imply that Chinese and Tangut had identical or similar tones. The lack of systematic treatment of Chinese tones in Tangut transcriptions of Chinese and Tangut borrowings from Chinese may indicate that they sounded very different.

WAS TANGUT RHYME 50 GRADE I?

Rhymes 50 and 51 were similar in the first published full-scale Tangut reconstruction known to me (Kychanov and Sofronov 1963) but not in most subsequent reconstructions: e.g., Sofronov (2012: 428) reconstructed rhyme 50 as Grade III -joˁ (identical to part of his rhyme 55!) and rhyme 51 as Grade I -o. Arakawa (1999) is an exception in which rhymes 50 and 51 are both Grade I -o.

Given that Tangut rhymes are generally grouped into sets with ascending grades (I-III or I-IV depending on the reconstruction), treating 50 as Grade I avoids the problem of an odd grade sequence found in other reconstructions (e.g., III-I-II-III/IV for o-rhymes in Sofronov 2012), though it also raises the question of why there are two Grade I rhymes in a row - something not found elsewhere in Arakawa's reconstruction. Moreover, rhyme 50 is almost always preceded by class VII initials which otherwise only precede Grade II and III rhymes.

Reconstructions of class VII initials preceding rhyme 50

Nishida 1964 tš- tšh- ndž- š-
Sofronov 1968 tś- tśh- ndź- ś-
Li Xinkui 1980 tʂ- tʂh- dʐ- ʂ-
Huang Zhenhua 1983 tś- tśh- ȵtś- ś-
Li Fanwen 1986 tɕ- tɕh- dʑ- ɕ-
Gong Hwang-cherng 1997 tś- tśh- dź- ś-
Arakawa 1999 c- ch- j- sh-
This site tʂ- tʂh- dʐ- ʂ-

Let me try to explain those anomalies.

In my reconstruction, the four grades are differentiated by their medials:

Grade | I | II | III | IV
Medial | zero | -ɤ- | -ɨ- | -i-

Grade I and perhaps Grade II vowels are lower and perhaps also backer than their Grade III and IV counterparts: e.g., Grade I (rhyme 8), Grade II -ɤi (or -ɤɪ?; rhyme 9), Grade III -ɨi (rhyme 10), and Grade IV -i (rhyme 11).

Why did the shibilants of class VII almost always precede the Grade II and III medials? Here's what I think happened:

1. Early pre-Tangut had palatals: tɕ-, tɕh-, dʑ-, ɕ-.

The first three could also have been palatal stops: c-, ch-, ɟ-.

2. These palatals were followed by medial -i-.

3. Pre-Tangut developed retroflexes from *dental-r-clusters: e.g., *k-tr- became *tʂh- in 'six' (cf. Tibetan drug 'six').

4. The local dialect of Chinese merged its palatals and retroflexes. Modern northwestern Chinese dialects later developed new palatals from old dentals and velars before *i:

Middle Chinese | Tangut period Chinese | Modern northwestern Chinese
*palatals | *retroflexes | retroflexes (and labiodentals before *u)
*alveolars | *alveolars | alveolars and palatals
*velars | *velars | velars and palatals

The table above is simplified: e.g., it does not account for northwestern dialects like Xi'an which have different reflexes for Middle Chinese palatals and retroflexes. See Coblin (1994: 97-105) for a more detailed overview of the development of palatals, retroflexes, alveolars, and velars in northwestern dialects.

Moreover, it is not clear whether modern northwestern Chinese dialects are descended from earlier northwestern Chinese dialects or if the latter were substrata for the former which were newcomers.

5. Pre-Tangut merged its palatals and retroflexes:

Pre-Tangut | Tangut
*palatals | retroflexes
*dental-r sequences | retroflexes

6. The -i- that followed palatals became -ɨ-.

7. This -ɨ- spread to retroflexes that did not come from palatals (i.e., those from *Tr-sequences).

Another possibility is that *Tr-sequences became *Tɨ-sequences which affricated into *Tʂɨ-sequences. No such affrication occurred when nondentals were followed by *-r-: e.g., *kr- > *kɨ-.

8. This -ɨ- lowered to *-ɤ- for height harmony if it was

- preceded by presyllabic

- not preceded by presyllabic and followed by a 'low'* vowel (*a *e *o):

Vowels (before) | Heights | Vowels (after) | Heights
*-ɨu | high-high | *-ɨu | high-high
*-ʌ-ɨu | low-high-high | *-ʌ-ɤu | low-low-high
*-ɯ-ɨu | high-high-high | *-ɯ-ɨu | high-high-high
*-ɨi | high-high | *-ɨi | high-high
*-ʌ-ɨi | low-high-high | *-ʌ-ɤi | low-low-high
*-ɯ-ɨi | high-high-high | *-ɯ-ɨi | high-high-high
*-ɨa | high-low | *-ɤa | low-low
*-ʌ-ɨa | low-high-low | *-ʌ-ɤa | low-low-low
*-ɯ-ɨa | high-high-low | *-ɯ-ɨa | high-high-low
*-ɨə | high-high | *-ɨə | high-high
*-ʌ-ɨə | low-high-high | *-ʌ-ɤə | low-low-high
*-ɯ-ɨə | high-high-high | *-ɯ-ɨə | high-high-high
*-ɨe | high-low | *-ɤe | low-low
*-ʌ-ɨe | low-high-low | *-ʌ-ɤe | low-low-low
*-ɯ-ɨe | high-high-low | *-ɯ-ɨe | high-high-low
*-ɨo | high-low | *-ɤo | low-low
*-ʌ-ɨo | low-high-low | *-ʌ-ɤo | low-low-low
*-ɯ-ɨo | high-high-low | *-ɯ-ɨo | high-high-low
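
The rule in step 8 can be restated as a small function. This is only a sketch of the rule as I read it off the table: I assume that the lowering presyllable is *ʌ and the non-lowering one is *ɯ, and the names LOW_NUCLEI, medial_after_harmony, and rhyme_after_harmony are my own labels, not part of any published reconstruction.

```python
# Sketch of step 8 (height harmony). All names here are illustrative assumptions.
LOW_NUCLEI = {"a", "e", "o"}   # phonologically 'low' vowels
LOW_PRESYLLABLE = "ʌ"          # assumed from the table rows *-ʌ-ɨV
HIGH_PRESYLLABLE = "ɯ"         # assumed from the table rows *-ɯ-ɨV


def medial_after_harmony(presyllable, nucleus):
    """Return the medial (ɨ or ɤ) after height harmony.

    presyllable: "ʌ", "ɯ", or None if there is no presyllable.
    nucleus: the vowel that follows the medial, e.g. "u" or "a".
    """
    if presyllable == LOW_PRESYLLABLE:
        return "ɤ"  # a low presyllable always lowers the medial
    if presyllable != HIGH_PRESYLLABLE and nucleus in LOW_NUCLEI:
        return "ɤ"  # no high presyllable and a low nucleus: lower
    return "ɨ"      # otherwise the medial stays high


def rhyme_after_harmony(presyllable, nucleus):
    """Format the result the way the table does, e.g. '*-ʌ-ɤu'."""
    prefix = f"-{presyllable}" if presyllable else ""
    return f"*{prefix}-{medial_after_harmony(presyllable, nucleus)}{nucleus}"


for pre in (None, "ʌ", "ɯ"):
    for nucleus in ("u", "i", "a", "ə", "e", "o"):
        print(rhyme_after_harmony(pre, nucleus))
```

The loop prints the 'after' forms for the same eighteen inputs as the table, so the two can be compared row by row.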

9. The presyllables were lost, so ɤ was no longer a predictable allophone of /ɨ/.

10. -ɨ- generally merged with -i- after initials other than retroflexes, v-, and l- (which may have been velar [ɫ]).

Grade II -ɤV and Grade III -ɨV syllables were assigned to different rhymes except perhaps in the case of rhyme 55 which I should investigate.

Reconstruction Grades of rhyme 55
Hashimoto 1965 III
Sofronov 1968
Gong 1997 II III
Arakawa 1999 II
Sofronov 2012 II III IV

What if -ɤ- and/or -ɨ- were occasionally lost after retroflexes? Then those retroflex-initial syllables would be Grade I: i.e., without medials.

Grade II *dʐɤo or Grade III *dʐɨo > 1955 Grade I R50 1.48 *dʐo?

I am using Arakawa's reconstruction of rhyme 50 here.

This solution has several problems.

First, why weren't these -o syllables assigned to Grade I rhyme 51 -o?

Second, was the loss of medial -ɤ- and/or -ɨ- after retroflexes sporadic within the Tangut prestige dialect, or were rhyme 50 syllables borrowings from one or more dialects that had regularly lost medials after retroflexes?

Third, it cannot explain why rhyme 50 has liquid-initial syllables such as

2912 Grade I R50 1.48 lo (in Arakawa's reconstruction).

Why isn't 2912 in the same homophone group as

4710 Grade I R51 1.49 lo (in Arakawa's reconstruction)

in the Tangraphic Sea and the Homophones?

Tonight another solution occurred to me. Rhymes 44-49 had front vowels followed by -w. What if rhyme 50 was -ow with an -o like rhyme 51 but a -w like rhyme 49?

49. -iw'

50. -ow

51. -o

Tangut -w is from earlier *-w and *-k. If this solution was correct, I would expect rhyme 50 words to have cognates ending in *-w and/or *-k. Unfortunately, I do not know of any cognates for rhyme 50 words.

7.12.0:54: If rhyme 50 was -ow, why would it be almost exclusively preceded by shibilants? What would make -ow more shibilant-friendly than rhyme 51 -o?

Why would medial -ɤ- and/or -ɨ- be lost before -ow but not other -w rhymes?

Conversely, if rhyme 50 was -ɤow and/or -ɨow, why was there no simple -ow?

*Phonologically, *ə was a 'high' vowel and *e and *o were 'low' vowels, though they were phonetically all mid vowels. I could also call *i *ə *u 'higher' vowels to distinguish them from the 'lower' vowels *e *a *o.

DID TANGUT RHYME 51 HAVE A LONG VOWEL?

Last night I added the first published full-scale Tangut reconstruction that I know of (Kychanov and Sofronov 1963) to my database (Excel / HTML). Kychanov and Sofronov's reconstruction has three rhymes with macrons. The first of these is rhyme 51 following 50 which lacks a macron:

Rhyme KS Nishida 1964 Hashimoto 1965 Sofronov 1968 Huang 1983 Li 1986 Gong 1997 Arakawa 1999 Sofronov 2012 This site
50: 1.48 -oɦ -jəw -i̭o -iən -ǐo̭/-ǐo -jwo -o -joˁ -wɨo
51: 1.49/2.42 -ʌ̄ -ɔɦ -ɔwN -o -uẽ, -uɐ̃ -ǐəu/-ǐuo -(w)o -o -(w)o
52: 1.50/2.43 -ʌ' -ǐow -owN -uõ -ǐo̭/-ǐo/-ɪo̭ -io -yo -ɤo
53: 1.51/2.44 -i̭ʌ -ǐɔɦ -jowN -i̭o -iõ, -ïõ, -iɔ̃ -ǐou -j(w)o -o: -jo, -ö -(w)ɨo, -(w)io

I list two reconstructions for rhymes 50 and 51 in the column for Li (1986). The first is from pages 165-166 and the second is in the rest of the book. On page 165, Li wrote rhyme 51 as -ǐəo which is also his reconstruction for rhyme 49 on the previous page. I assume that -ǐəo is supposed to be -ǐəu.

I have included rhyme 53 which some consider to be similar to rhyme 50.

I have also included rhyme 52 to complete the set of -o-rhymes. On page 188, Li wrote rhyme 52 as -ɪo̭, but elsewhere he wrote it as -ǐo̭ or -ǐo like rhyme 50.

Kychanov and Sofronov observed that 50 and 51 were in complementary distribution: 50 appeared after shibilants and what they reconstructed as r, whereas 51 appeared after all other types of initials. This still mostly holds true in my reconstruction:

Grade Rhyme Shibilants l- ɬ- v- Other initials
III 50: -wɨo X X X
I 51: -o X X X
51': -wo X X X
II 52: -ɤo X X
III 53a: -ɨo X X
53a': -wɨo X X X X
IV 53b: -io X X X
53b': -wio X X X X

Rhymes 50 and 51 are not in strict complementary distribution, as both can occur after l-:


R50 1.48: 2732 lwɨo : R51 1.49: 1018 and 1595 lwo

Unlike Kychanov and Sofronov (1963), I do not reconstruct r- before rhyme 50 (or any rhyme in this o-set).

Rhymes 50 and 53 are in complementary distribution only if tones are taken into consideration. The following syllables would be homophonous if tones were ignored:


R50 1.48 1955 dʐwɨo, 2784 dʐwɨo : R53 2.44 2207 dʐwɨo, 5586 dʐwɨo

Why did the Tangut regard first tone -wɨo as a rhyme category distinct from first tone  -ɨo while placing second tone -wɨo in the same category as second tone -ɨo?

Tone\rhyme | -ɨo | -wɨo
1 | R53 1.51 | R50 1.48
2 | R53 2.44 | R53 2.44
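
The comparison of rhymes 50, 51, and 53 can be checked mechanically. The sketch below uses only the syllables cited in this post, in my own reconstruction. The tuple layout and the function name cross_rhyme_homophones are assumptions made for illustration. It groups syllables by segments, with or without tone, and reports any form that turns up in more than one rhyme.

```python
from collections import defaultdict

# (Li Fanwen number, rhyme, tone.rhyme label, reconstruction)
# Only the syllables cited in this post; not a full database.
SYLLABLES = [
    ("1955", 50, "1.48", "dʐwɨo"),
    ("2784", 50, "1.48", "dʐwɨo"),
    ("2207", 53, "2.44", "dʐwɨo"),
    ("5586", 53, "2.44", "dʐwɨo"),
    ("2732", 50, "1.48", "lwɨo"),
    ("1018", 51, "1.49", "lwo"),
    ("1595", 51, "1.49", "lwo"),
]


def cross_rhyme_homophones(syllables, ignore_tone):
    """Map each form to the rhymes it occurs in, keeping only forms
    that occur in more than one rhyme."""
    rhymes_by_form = defaultdict(set)
    for _number, rhyme, label, form in syllables:
        tone = label.split(".")[0]          # "1.48" -> "1"
        key = form if ignore_tone else (tone, form)
        rhymes_by_form[key].add(rhyme)
    return {k: sorted(v) for k, v in rhymes_by_form.items() if len(v) > 1}


print(cross_rhyme_homophones(SYLLABLES, ignore_tone=True))
print(cross_rhyme_homophones(SYLLABLES, ignore_tone=False))
```

With tones ignored, dʐwɨo is shared by rhymes 50 and 53. With tones kept, nothing in this small sample overlaps, and the l-syllables of rhymes 50 and 51 differ only in their medials.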

Tonight I noticed that the Grade II rhyme 52 never has -w-, whereas rhyme 50 always has -w-. Moreover, all the initials of rhyme 50 can also occur in Grade II. Could rhyme 50 be the -w-version of rhyme 52: i.e., could I reconstruct rhyme 50 as -wɤo?

Tone\rhyme | -ɤo | -wɤo
1 | R52 1.50 | R50 1.48
2 | R52 2.43 | (none)

Different grades for rhyme 50 imply different origins:

Grade III -wɨo < *Pɯ-o and/or *Cɯ-wo

Grade II -wɤo < *P-o and/or *-wo (but lwɤo would be from the improbable *P-lro or, worse yet, *lwro, so I am inclined not to regard rhyme 50 as Grade II)

In either case, I do not understand why the non-Grade I rhyme 50/1.48 precedes the Grade I rhyme 51/1.49. Arakawa avoided that problem by reconstructing both rhymes 50 and 51 as -o, though I am not sure how he accounted for the difference between


R50 1.48: 2732 lwɨo : R51 1.49: 1018 and 1595 lwo

I wish there were a publicly available complete list of Arakawa's reconstructions. Kotaka's partial list of Arakawa's reconstructions (now offline) has 1ldwo for 1595 (and presumably its homophone 1018 would also be 1ldwo)*. I think 2732 would be 1lo in Arakawa's reconstruction. So he might say they had different initials and medials. However, Arakawa (1997: 134, 135) reconstructed the syllable lo in both rhyme 50 and rhyme 51. What determined whether a given lo was assigned to rhyme 50 or 51?

Any solution must explain why rhymes 50 and 51 have largely nonoverlapping initials. One might interpret Kychanov and Sofronov's -ʌ̄ as a long vowel, though I don't understand why a long vowel would not follow shibilants and their r-. Moreover, I don't think they intended -ʌ̄ to be a long vowel:

Поэтому, чтобы подчеркнуть связь между этими двумя гласными, обозначим его как ʌ̄.

'Therefore, to emphasize the connection between these two vowels [of rhymes 50 and 51], we will denote it [rhyme 51] as ʌ̄.'

I suppose their macron is a bit like the (over)bar of mathematics:

A bar (also called an overbar) is a horizontal line written above a mathematical symbol to give it some special meaning.

In this case, ʌ̄ might mean 'special variant of rhyme 50 after shibilants and r'. They did not specify if or how rhyme 51 phonetically differed from rhyme 50.

*7.11.0:41: I have followed Gong in reconstructing only a small number of liquids (l-, ɬ-, ɮ-, ʐ-, r- = Gong's l, lh-, z-, ź-, r-). However, Tai (2008: 201) has made a case for a larger set of liquids, and I have yet to integrate his ideas into my own reconstruction. If Tai is correct, then


R50 1.48: 2732 : R51 1.49: 1018 and 1595

is not a true minimal set. 2732 had an initial belonging to liquid fanqie chain 1 (generally transcribed in Tibetan as l- with or without preinitials), whereas the other two had an initial belonging to liquid fanqie chain 4 (generally transcribed in Tibetan as ld- or zl-). Tai reconstructed the initials of liquid fanqie chains 1 and 4 as l- and ld-.

7.11.1:21: Sanskrit o is always long [oː]. If the version of Sanskrit heard by the Tangut preserved long [oː] and if rhyme 50 were short and 51 were long - not that anyone said they were - I would not expect rhyme 50 in transcriptions of Sanskrit -o-syllables. Nonetheless

R50 1.48 0009 ʂwɨo

transcribed Sanskrit śo [ɕoː]. (I do not reconstruct [ɕ] in Tangut.) Moreover, its -w- corresponds to zero in Sanskrit. (Its -ɨ- is not a problem, as ʂo is not possible in my Tangut reconstruction.) However, this usage of a rhyme 50 -w-character is an isolated instance and could be regarded as an error.

Most instances of Sanskrit -o were transcribed with rhyme 51 characters, but that does not necessarily mean that rhyme 51 was long. Exceptions mostly had initials absent from rhyme 51: shibilants and r-. Shibilants were not possible before Grade I rhymes, and r- was not possible in first cycle rhymes other than rhyme 43.

TANGUT RHYME DATABASE: 9 JULY 2014 EDITION

I updated my Tangut rhyme database (Excel / HTML) for the first time since September to include

- my latest reconstruction of the Tangut rhyme system (in the rightmost column named "*new")

See "G-*r-adation in Tangut (Part 2)" for an explanation of the vowels.

- corrections in my previous reconstruction which I used (with variations) between 2008 and this summer (in the column named "old")

- Kychanov and Sofronov's 1963 reconstruction which is the first published full-scale Tangut reconstruction to the best of my knowledge; I started that column last September and regret not completing it in time for the fiftieth anniversary of that reconstruction's publication.

I do not know how to best represent their diacritics in Unicode and have used hangul letters as similar-looking placeholders for the time being.

Eventually I will include Sofronov's 2012 reconstruction. I may also add Huang Zhenhua's 1983 reconstruction and Li Fanwen's 1986 reconstruction.

MARRIAGE, TANGUT STYLE (PART 2)

The character for the first syllable of the Tangut word for 'to get married'

0225 1ɣɨə

is interesting because it has a right half (Boxenhorn code wuu) that is in no other character, whereas its left half is in 53 other characters.

In the Tangraphic Sea, 0225 is analyzed as


0225 1ɣɨə = left of 1085 1ɮi 'man' + top and bottom right of 0050 1ni'*

Why wasn't all of 0050 used as the right half of 0225? Because it would have been too complex? The omitted bottom left component is

'not' (Nishida radical 041 / Boxenhorn code cia).

Li Fanwen (2008: 9) defined 0050 as 'to marry'. 0050 is not in either Kychanov or Nishida's dictionaries. Its Tangraphic Sea analysis and definition are

2ʂɨe 1ɣəu 1dziẽ 2ŋõʳ

'request head relation-by-marriage whole'

'0050 is composed of the top half of 0147 'request' over all of 1965 'relation by marriage'.'

1ni' 1tia 1ɮi 1ni' 1lɨə

'0050 TOPIC man 0050 is'

'0050 is as in 'to 0050 a man' [?] and'

1ɣɨə 1ɮwị 1vɨi 1ʔie 1ʔiə

'wedding do GEN say'

'how one says to wed'

The D version of Homophones has the note

1ni' - 1ɣɨə 1ɮwị 1ɮi

'0050 - get married man' = a verb-noun sequence 'married man'?

0050 might mean something like 'to marry a man'. Li lists no examples of 0050 outside dictionaries, so I cannot speculate any further about its semantics. It is obviously semantically relevant to 0225, whose structure can be interpreted as an abbreviation of an object-verb sequence 'man + marry'.

0050 in turn is another semantic compound: a noun-verb sequence 0147 + 1965 'requested relation by marriage'.

The analysis of 0147 (Boxenhorn code wus) is unknown. It occurs in only one other character:


5150 1thʊʊ 'to request' = center of 2364 1sew 'to survey' + all of 0147 2ʂɨe 'to request'

I'll look at the analysis of 1965 in part 3.

*I am no longer comfortable with writing long vowels in Tangut readings since I no longer believe Tangut had any such vowels. If Tangut had, for instance, a distinction between -i (rhyme 11) and -ii (my old reconstruction of rhyme 14), I would expect Sanskrit syllables ending in short -i and long -ī to be respectively transcribed with rhyme 11 and 14 characters. However, rhyme 14 occasionally appears in transcriptions of Sanskrit -i but never appears in transcriptions of Sanskrit -ī (Arakawa 1999: 111-112):

Rhyme 10 11 14 30 31 37 46 84
Sanskrit \ Tangut -ɨi -i -ii -ɨə -iə -ie -iew -iʳ
-i 5 22 4! 1 1 2 1 2
-ī 2 3 0!   1

That suggests rhyme 14 was inappropriate for both -i and -ī. Nonetheless, it must have been -i-like, as it was transcribed as -i in Tibetan. Hence I will write rhyme 14 as -i', with an apostrophe as a typographical substitute for a prime symbol indicating a rhyme that was somehow different from its counterpart without an apostrophe.

MARRIAGE, TANGUT STYLE (PART 1)

The Tangut word for 爲婚 'to get married' in the Tangut-Chinese handbook The Timely Pearl in the Palm (1190) is

0225 1851 1ɣɨə 1ɮwị (page 34, column 3, characters 3-4)

Although Li Fanwen (2008: 39, 308), Kychanov and Arakawa (2006: 517, 142), and Grinstead (1972: 130, 78) all list each half as an independent verb, all textual examples in Li (2008) contain the two halves together. That leads me to conclude that this is a disyllabic word rather than a sequence of two words (though it may have originated as a sequence of two words).

In an earlier stage of Tangut, this word may have been *Cɯ-K(r)ə *S-P-zi:

- The vowel of *Cɯ- conditioned the lenition of *K- to ɣ- before being lost.

- *K is an unknown back stop; it could have been velar (*k, *kh, *g) or even uvular (*q, *qh, *ɢ).

(7.8.0:00: I am assuming ɣ- is derived rather than original.)

- Medial *-r- lenited to -ɨ-; if there was no medial *-r-, *ə might have broken to *ɨə after ɣ if it was uvular *[ʁ].

- Preinitial *S- conditioned tension (indicated by a subscript dot) and preinitial *P- conditioned medial -w-:

*S-P-ɮi > *zbɮi > *zɮbi > *zɮβi > *zɮwi > *ɮ̣ẉị > ɮwị

The root of the second half may be

1085 1ɮi 'man'

whose character in fact is the source of the left half of the first half according to the Tangraphic Sea (more on this in part 2).

Was *S-P-zi a derived verb meaning something like 'obtain a man'?

7.8.1:40: *S- derived causative verbs in some cases: e.g.,


4906 2gwi (< *Ni-gwa-H) 'to wear, to put on (clothes)' : 3146 1gwị (< *S-Ni-gwa) 'to make to wear, to clothe (v.t.)' (both stem 1 which is used in all environments but two; see below)


3686 2gwio (< *Ni-gwa-w-H) 'to wear, to put on (clothes)' : 0539 1gwiọ (< *S-Ni-gwa-w) 'to make to wear, to clothe (v.t.)' (both stem 2 which is used when the subject is first or second person singular and the object is third person)

See Gong (2002: 51-54) for many more examples.

*P- derived verbs from nouns in at least two cases (pairs and glosses from Gong 2002: 45-46; the Tangut and pre-Tangut reconstructions are mine):


3003 1ʔɨu (< *ʔru) 'ghost, demon, devil' : 0622 1ʔwɨu (< *P-ʔru) 'bring an evil'


3259 1dzi 'a state of abstraction, meditation, without anxiety and hinderance' : 3411 1dzwi (< *P-dzi) 'cause the mind to be in the state of abstraction'

I have not included the pair 3943 'sole of shoe' and 3961 'to sole' since I follow Gong's later reconstruction of 3943 with -w-.

So could *S-P-ɮi be more precisely glossed as 'to cause to get a man'?

UNAMI GEMINATES

I wonder if Unami geminate obstruents (in bold) sounded like Korean tense obstruents: e.g.,

kkə́ntkaan 's 'then you danced' vs. ná kə́ntkaan 'then there was dancing'

ppɔ́ɔm 'his thigh' vs. ní pɔ́ɔm 'the ham'

nsaassaakkənə́mən 'I stuck it out repeatedly' vs. nsaasaakkənə́mən 'I stuck it out slowly'

(I have replaced the hard-to-see dot · for length with a doubling of the previous letter.)

Unami has a geminate xx reminiscent of Middle Korean ㆅ hh (which was lost sometime after the seventeenth century). This xx is distinct from /xh/ which is pronounced [xk] in medial position: e.g., /màxhee/ 'to be red' is [màxkee]. I wonder how initial /xh/ in /xhook/ 'snake' is pronounced. (The word appears in this 1889 dictionary as achgook. Did initial /xh/ result from apheresis: axg- > /xh/?)

Korean tense consonants generally originated from earlier obstruent sequences: e.g.,

ttae < pstay 'time'

There are, however, cases of tense consonants in words that originally never had obstruent sequences: e.g.,

kkot < 곶 kos 'flower'

ssang < 솽 swang 'double' (< Chinese 雙; sw- is an obstruent-sonorant cluster)

sshi < 시 si 'courtesy title, family' (< Chinese 氏)

cf. native 씨 sshi < psi 'seed'

The tense consonant of 氏 might incorporate genitive -s-: e.g., I-sshi 'the Lee family' could be a reanalysis of *Ri-s-si 'Lee-GEN-family'.

I suspect that Tangut at one time also had tense consonants from obstruent sequences. Their tenseness was lost after it spread into the following vowel: e.g.,

0359 *Sʌ-tuŋ > *stʊ > *ttʊ > *ttʊ̣ > 1tʊ̣ 'thousand' (cf. Written Tibetan stong 'id.')

Did Unami geminates originate from earlier clusters? (7.6.21:18: Here is a list of Proto-Algonquian clusters.)

VALENTINA

Last night I finished reading this biography of the mother of the professor who introduced me to Tangut twice - once on a plane to Japan in 1988 and again in a classroom in 1994. Many or even most of the posts on this blog would not exist if I had not met the son of Valentina Valerianovna Lyovina. I thank her for her indirect yet enormous impact on my life. The story of her life inspired me this week when I needed the strength to go on. And go on she did - from Russia to Yugoslavia, Canada, and ultimately America. In her honor I shall transcribe her name in the script that I studied thanks to her son:

5156 1va (transcription character)

5267 1lɨẽ (transcription character)

0648 2ti 'to remain'

3274 1na (surname character)

Three of those characters were used to transcribe the Sanskrit syllables va, ti, and na; the exception is the second which was used to transcribe Chinese l-syllables with front nasal vowels: 靈林凌陵菱䔖綾令伶零領連蓮廉鐮.

The second syllable could also be transcribed as

3421 2lɛ̃ (syllable of 2lɛ̃ 2lɛ̃ 'medium'?*) or 1661 1lɨĩ (transcription character).

I am not confident about the nasalization of the vowel of 3421.

1661 may be closest to the Russian and English pronunciation of unstressed -len-.

Tangut l- may have been velar [ɫ] as it usually could be followed by the Grade III medial -ɨ- but not the Grade IV -i-.

Tangut had no syllables ending in -n; foreign nasal-final syllables were transcribed with characters for Tangut syllables ending in nasal vowels.

*7.6.2:31: The Tangraphic Sea definition of 3421 is

'2lɛ̃ TOPIC 1140 1413 is' (what I gloss as 'is' is the Tangut equivalent of the Classical Chinese copula 也; I discussed both in parts 1 and 2 of "A Fami-*l-j-l Resemblance")

'5254 5254 is'

'5963 5963 is'

'not small big GENITIVE say': i.e., 'how one says neither small nor big'; the genitive indicates that 'not small big' was nominalized, so a very literal translation would be 'the saying of not small [or] big'

I have not glossed some of the words in the definition because their definitions are circular:

1140 is defined as 0669 1140 'short', 5254 5254, 3421, and 'a body that is small but not big'

1413 is defined as 5963 5963, 1140 1413, 5254 5254, and 3421 3421

5254 is defined as 5963 5963, 1140 1413, 3421 3421, and 'neither big nor small'

5963 is defined as 1140 1413, 5254, 4417 '?', and 'neither big nor small'

'Neither big nor small' implies that 3421 is 'medium', but 3421 and its definition 5254 5254 also define 1140 whose other glosses include 'short' and 'a body that is small but not big'. How can 3421 and 5254 5254 be 'small' and 'not small' at the same time?

Li Fanwen (2008: 554) translated 3421 as an adjective 'equal, even, moderate'. I can understand how he got 'moderate' from 'neither big nor small', but not 'equal' or 'even'.

Kychanov and Arakawa (2006: 499) translated 3421 as an adjective 'middle-sized' but 3421 3421 as an adverb 'equally, just right'. Is there any text with 3421 3421 as an adverb? All examples of 3421 and 3421 3421 that I have seen in Li Fanwen (2008) are only in dictionaries.

DID THE TANGUT KNIT? (PART 2)

When I looked up 'knit' in the English index of Li Fanwen's 2008 Tangut dictionary last night, I only found

2958 2e 'knitted wool'

whose entry mentioned its near-synonym

5930 2kəu 'knitted wool, woollen blanket'

but later I found two characters that he defined as 'knit' in English on pages 247 and 734:


1481 1tʂhəõ 'to knit, weave' (defined in Chinese as 結 'to tie'; Kychanov and Arakawa 2006: 'tie, knot, clog, bind', путы 'fetters') =

top and bottom left of 1539 1tʂhwəẽ 'to tie, fasten' (semantic) +

bottom right of 2176 1tʂɨəʳ 'to tie' (only in dictionaries?; semantic)


4626 2dwa 'to knit' (defined in Chinese as 織 'to weave'; Kychanov and Arakawa 2006: 163: 'weave, knit') =

left of 4640 2dwa 'many, much' (only in dictionaries?; phonetic) +

left of 0635 1niə (first syllable of 0635 1424 1niə 1thiu 'blood relations'; semantic: 'blood ties' > 'tie'?)

I doubt either verb meant 'to knit' for the reason I gave at the start of my last post. I assume 1481 is 'to tie' and 4626 is 'to weave'.

1481 1tʂhəõ 'to tie' and 1539 1tʂhwəẽ 'to tie' are obviously cognates and even combine to form a disyllabic verb

1481 1539 1tʂhəõ 1tʂhwəẽ '[to] tie, clog, shackles, fetters' (Kychanov and Arakawa 2006: 661)

The two words go back to *choN and *P-cheN; the labial prefix of the latter conditioned -w-. Did *P-cheN originate as the second half of a reduplicative compound *choN-*cheN which came to be an independent word *cheN? See Gong (2003: 612) for other examples of o-e reduplicative compounds.

4626 2dwa may go back to *P-da-H, and its *da may be the root of

0630 1la < *Cʌ-Ta 'to weave' and 2497 2la < *Cʌ-Ta-H 'to weave'

(*T may be *t, *th, or *d; voicing and aspiration are neutralized when obstruents lenited in intervocalic position.)

A d-word for 'weave' is in some Qiangic languages (e.g., Ersu dɛ, Lyuzu de, Xumi dyi; the last two were also glossed as 'knit'!), but not in rGyalrong (e.g., Casmi ka-tia, Tshobdun ka-ta, Japhug kɤ-taʁ, Somang ka-tak, Zbu kɐ-tɐχ < *-q; did *-q become zero in Tangut instead of conditioning vowel 'length'?).

7.5.2:03: Perhaps prenasalized forms are the 'missing link' between the d- and t-words for 'to weave': e.g.,


Namuyi ndæ (also glossed 'knit'!), Guiqiong nthɑ, Zhaba a-ntha 'weave a basket'


Daofu nthɑ (also glossed 'knit'!)

Could 4626 2dwa go back to *P-N-t(h)a-H? If so, then there is no need to reconstruct *d- in the Tangut root for 'to weave', and the la-words for 'to weave' may go back to *Cʌ-t(h)a(-H). Perhaps *Cʌ- was *Nʌ-, and the three words were originally very similar:

*Nʌ-t(h)a > *Nʌ-la > 0630 1la

*Nʌ-t(h)a-H > *Nʌ-la-H > 2497 2la

*P-Nʌ-t(h)a-H > *P-Nda-H > 4626 2dwa
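
The three derivations can be written out as a branching function. This is a rough sketch under stated assumptions, not a general sound-change engine: the function name derive_weave is hypothetical, the stages are spelled out by hand, and I assume from the paired forms in these posts that *-H yields the second tone and that *P- surfaces as medial -w-.

```python
def derive_weave(has_p: bool, has_h: bool) -> list[str]:
    """Return the stages from a pre-Tangut 'weave' form to its Tangut reflex.

    has_p: whether the outer prefix *P- is present.
    has_h: whether the suffix *-H is present.
    """
    suffix = "-H" if has_h else ""
    tone = "2" if has_h else "1"  # assumption: *-H yields the second tone
    if has_p:
        # *Nʌ- loses its vowel and fuses with the root initial (> *Nd),
        # and *P- is assumed to surface as medial -w-.
        return [f"*P-Nʌ-t(h)a{suffix}", f"*P-Nda{suffix}", f"{tone}dwa"]
    # *t(h) lenites to *l after the presyllabic vowel; *Nʌ- is then lost.
    return [f"*Nʌ-t(h)a{suffix}", f"*Nʌ-la{suffix}", f"{tone}la"]


for has_p, has_h in ((False, False), (False, True), (True, True)):
    print(" > ".join(derive_weave(has_p, has_h)))
```

The fourth combination (with *P- but without *-H) would give an unattested 1dwa, so I have left it out of the loop.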

*Nʌ- was completely lost after *t(h) lenited to *-l- following its vowel in the first two words, but it lost its vowel in the third word and fused with the root initial. I wonder how many other *t-roots have l- and d-derivatives in Tangut.

DID THE TANGUT KNIT?

Although Li Fanwen's (2008: 484) dictionary defined Tangut

2958 2e and 5930 2kəu

as 'knitted wool', Kychanov and Arakawa (2006: 377) defined both as 'princess'**, and I doubt the Tangut knew what knitting was if knitting is only about a millennium old and initially spread westward from the Middle East. Perhaps the Tangut could have spoken about knitting by recycling a word for 'weave' such as


0630 1la 'to weave' = left of 0435 2kiʳw 'to weave' + center and right of 2374 2pʊ̣ 'to weave'

which has a rising tone variant


2497 2la 'to weave' = center and right of 2374 2pʊ̣ 'to weave' + right of 0630 1la 'to weave'

and which may be cognate to Old Chinese 織 *tək 'to weave' and tak-type words glossed as 'weave' and 'knit' (!) in other Sino-Tibetan languages.

The two verbs combine into

1la 2la 'to entwine, thread though; plait silk' (Kychanov and Arakawa 2006: 432).

The initial l- of 1la and 2la is a lenited *T-:

*Cʌ-Ta(-H) > 1la ~ 2la

The initial of the prefix *Cʌ- is unknown. The vowel of the prefix was low since it did not condition raising in the root before being lost.

The function of the suffix *-H is unknown.

The rhyme is problematic if I want to relate these la-words to OC *tək 'weave', Somang rGyalrong ka-tak 'to knit', etc.

- It would be simple if Old Chinese and pre-Tangut vowels usually matched, and OC *ə corresponded to pre-Tangut *ə, but Tangut a is from pre-Tangut *a, not *ə.

- I expect pre-Tangut *-k to be reflected as vowel 'length' in Tangut: *-Vk > -VV.

Compare with Tangut

3752 3296 2miə 2niaa 'Tangut'

whose pre-Tangut ancestor was borrowed as Tibetan Mi-nyag (implying *mə-njak).

(I put 'length' in quotation marks because I no longer think Tangut had distinctive vowel length. I do not know what the real difference between 'short' rhyme 17 -a and the less frequent 'long' rhyme 22 -aa was.)

Sino-Tibetan t-k words in turn resemble Thai ถัก thak 'to knit' and Lao ຖັກ thak 'id.' which have no cognates outside southwestern Tai unless they are related to tak 'knot' in Po-ai, a northern Tai language (cf. how English knit and knot are cognates). Pittayaporn (2009: 89) did not reconstruct *th- in Proto-Tai (PT) and regarded th-words "as post-PT lexical innovations, either borrowings or forms derived after the establishment of the contrastive aspiration."

If thak 'to knit' is a borrowing from 織 OC *tək, its aspiration is unexpected. Is that aspiration a trace of a prefix in a southern dialect of Chinese? Moreover, its vowel is also unexpected. OC *ə (later *ɨə) would have been borrowed into early Tai as *ɯ or *ɤ, which would have become ɯ and o* in Thai and Lao, not a. Maybe thak is from a southern Late Old Chinese */tʰək/ [tʰʌq] with aspiration and a lowered vowel due to an earlier prefix:

*sʌ-tək > *sˁʌˁ-ˁtˁʌˁqˁ > *sˁtˁʌˁqˁ > */tʰək/ [tʰʌq]

7.4.0:28: If there ever was such a form, it does not have any modern reflexes in southern Chinese languages and Sino-Vietnamese. Taiwanese tsit, Cantonese zik, and Sino-Vietnamese chức all point to Middle Chinese *tɕɨk from Old Chinese *tək. I cannot find any aspirated forms of 織 in any Chinese variety at 小學堂.

*7.4.0:37: In Pittayaporn (2009), *ɤ always rounded to o before *-k in Thai except in *lɤk 'deep' which became Thai ลึก lɯk instead of ลก *lok. I do not know why Thai ลึก was not marked with "-v" for vocalic irregularity.

**7.4.1:48: Kychanov and Arakawa (2006: 377-378) also defined the disyllabic word

2958 5930 2e 2kəu

as 'princess', whereas Li Fanwen (2008: 484) glossed that as 綫毯 'thread blanket' and 毛毯 'woolen blanket', presumably because each character had the note

1743 5760 2lɨu 1tshʊ  'thread rough***' = 'rough thread'

in the D version of Homophones.

Li glossed 2958 by itself as 毛綫 'wool' in Chinese and 5930 by itself as 毯 'blanket' in Chinese and as 'woollen blanket' in addition to 'knitting wool' in English. (His dictionary has bilingual glosses for individual characters but only Chinese glosses for polysyllabic words and phrases.)

As far as I know, neither 2958 nor 5930 occur by themselves or as a sequence outside dictionaries, so 2958 5930 may be a disyllabic 'ritual language' word, and 2958 and 5930 may not be monosyllabic words.

I do not know

- why Li defined 2958 5930 as something other than a redundant compound of 'rough thread', and how he was able to define 2958 and 5930 slightly differently as monosyllabic words.

- why Kychanov and Arakawa's definitions for 2958, 5930, and 2958 5930 are so different from Li Fanwen's.

***5760 1tshʊ 'rough' was borrowed from Chinese 粗 'id.'

A FAMI-*L-J-L RESEMBLANCE? (PART 2)

I forgot to mention in my previous post that I doubt 隹/維/惟 ever had *l- in Old Chinese. 維/惟 (but not 隹) definitely had some sort of palatal segment in Middle Chinese dialects, judging from Sino-Korean yu (borrowed from northeastern MC) and Sino-Vietnamese duy < *jwi (borrowed from southeastern MC). Although MC *j- generally comes from OC *l-, it is not necessary to reconstruct *l- in this case, as a simple OC *wi would also have become MC *jwi or *ɥi, etc. Moreover, the cluster *lw- would be sui generis in my reconstruction and nonexistent in Baxter, Sagart, and Schuessler's latest reconstructions. I just wanted to see how far I could take my *l-j-l hypothesis and fit 隹/維/惟 into it. Schuessler (1987: 632) once reconstructed 隹/維/惟 as *ljuəj but has since opted for *wi (see p. 37 of his 2009 book). *wi matches Thurgood's (1982) Proto-Sino-Tibetan copula *wəj and Matisoff's (1985) Proto-Tibeto-Burman copula *waj. (I have replaced Thurgood and Matisoff's *y with *j for ease of comparison.)

Lowes (2006) gathered copulas and similar verbs from 71 Sino-Tibetan languages and mapped them. Some have l- and w-; others don't. Although Gong's (2003) description of Tangut is in her bibliography, she didn't list the Tangut copula

0508 2ŋwʊ

which resembles Cogtse rGyalrong ŋos in her "Remaining" category. See #2156 'be, is' in Nagano and Prins' database for cognates in other rGyalrong varieties.

Lacking familiarity with the many languages in her paper, I will reserve further comments on it and focus instead on Chinese and Tangut. Could the apparent vowel alternations in the following forms be remnants of earlier verb paradigms? Chinese glosses are brief; see Schuessler (1987) for fuller glosses. Tangut glosses are based on Gong (2003) except for 1lɨə which is based on Sofronov (1968 I: 262).

Root Language Rhyme type -i < -(ə)j?
< -j(ə)l?
-e- -a
*w- Old Chinese 隹/維/惟
*wi < *wəj? 'to be' (impersonal)

*waj 'to act as, do'

*wets < *waj-t-s or *wej-t-s 'should'

*wəʔ 'there is, to have'
2vɨe < *Cɯ-wajH or *Cɯ-we(j)H 'there is'
*l-j-l? Old Chinese
*ʔlil < *ʔljəl? 'to be'

(objective; later copula)
*ləʔ 'should, indeed' (later perfective)
1lɨə < *lə  (intensive particle)
*m- Old Chinese
*məj < *-l? 'to not be'

*Cɯ-maj < *-l? 'there is no'

*met < *maj-t or *mej-t 'to not have'

*mə 'should not'

*Cɯ-ma 'there is no'
1mi 'not' (before nonauxiliary verbs)

1mie < *Cɯ-maj or *Cɯ-me(j) 'there is no', 2mie < *Cɯ-majH or *Cɯ-me(j)H 'not yet'

1mɨə < *mə 'not' (before auxiliary verbs)

2niaa < *mjaCH? 'no!', 'don't ...!'

There are even more Old Chinese negatives, but they either start with *p- (e.g., 不 *pə) or have rhymes that are variations of those in the table above (e.g., 莫 *mak). Notes on the five rhyme types that I did include:

1. -j(ə)j/l

隹/維/惟 OC *wi could be from *wəj or be the zero grade of *w-j. Although OC *-j may be partly from *-l, the *-j of *wəj could not be from *-l if it is descended from Thurgood's Proto-Sino-Tibetan *wəj.

It is interesting that Lowes (2006) did not list any probable modern reflexes of PST *wəj, though she devoted a section of her paper to the proto-form. STEDT only lists reflexes in Nungic and Loloish.

微 OC *məj may be the schwa grade of *m-j or *m-l.

Tangut 1mi may be the zero grade of *m-j or *m-l (if *l merged with *j in pre-Tangut) or from *Ci-ma after 'brightening'. See 5 for other ma-negatives. I would prefer to reconstruct a high-frequency function word as monosyllabic.

2. -aj/l

爲 OC *waj is the *a-grade of *w-j. See above on why I don't derive its *-j from *-l.

Tangut 2vɨe may also be from *w-j. See 4 below for a homophonous optative prefix absent from the table above.

也 OC *ljalʔ is an *a-grade form of *l-j-l.

靡 OC *maj and Tangut 1mie and 2mie may be *a-grade forms of *m-j or *m-l (if *l merged with *j in pre-Tangut).

3. -e

叀 OC *wets may be an *a-grade or *e-grade form of *w-j.

蔑 OC *met may be an *a-grade or *e-grade form of *m-j or *m-l.


These forms could have lost their root-final consonants because they were originally unstressed.

The perfective and optative prefixes

2vɨə- < *wəH and 2ve- < *weH or *wə-j-H?

are not cognate to 有 OC *wəʔ and 叀 OC *wets since they are originally directional prefixes meaning 'there, outside'.

5. -a

Earlier reconstructions of 無 OC *Cɯ-ma have a medial *-j- corresponding to the *-j- of my pre-Tangut *mjaCH: e.g., Gong Hwang-cherng's OC *mjag. However, recent reconstructions have abandoned OC *-j-, as it usually corresponded to nothing in other languages: e.g., Written Tibetan ma 'not' and Written Burmese ma 'not' (even though mya is possible in both Written Tibetan and Written Burmese).

I would normally derive Tangut 2niaa from *Cɯ-naH, but I wanted to see if I could make it part of the m-family. If Tangut ni- is partly from *mj-, perhaps all miV are from *Cɯ-mV:

Pre-Tangut Tangut
*Cɯ-mV miV
*(Cɯ-)mjV niV

Are there any other Tangut niV-words with m-cognates? The closest thing that comes to mind is

1nɨaa < *Cɯ-naC or *mjaC 'black': cf. 黑 OC *sʌ-mək 'id.'

However, 'black' is a problematic candidate:

1. it has -ɨ- and -a-, not -i- and -ə-. (Medial -ɨ- after a dental is unusual and should be investigated.)

2. there is no Chinese-internal reason for reconstructing *-j- in OC *sʌ-mək (which is not to say that *-j- is impossible; I suspect it was lost after the emphatic consonants conditioned by *Cʌ-prefixes).

3. it is a better phonetic match for Written Tibetan nag 'black'.

This next word also has phonetic problems:

2nieʳ < *Cɯ-nerH or *rɯ-ne(n)H or *mjerH or *rɯ-mje(n)H 'face': cf. 面 OC *mens 'id.'

Although it at least has -i- unlike 'black', once again, there is no Chinese-internal reason for reconstructing *-j-, and it is not certain Tangut ever had a final nasal in this word since *r-...-e and *r-...-eN merged as -eʳ.

A FAMI-*L-J-L RESEMBLANCE?

Last night when I mentioned the Chinese third person pronoun 伊 (now obsolete in standard Mandarin, though surviving in Taiwanese), I realized that a homophonous copula written with the same character* might share a root *l-j-l with the copula 也. Tonight I tried to see how far I could extend the *l-j-l family. Possible affixes of unknown function are in red (indicating that they cast doubt on this hypothesis - it is too easy to 'relate' unrelated words by dismissing all 'extra' segments and syllables as 'affixes').

Sinograph    My Old Chinese             Zhou Chinese gloss from Schuessler 1987
伊           *ʔlil                      personal equational copula
也           *ljajʔ < *ljalʔ?           marks statement as objective fact (later copula)
矣           *ləʔ?                      should, indeed (later perfective)
隹/維/惟     *lwi < *lwil? < *P-lil?    impersonal equational copula

The phonetic of 伊 Md yi < *ʔlil is 尹 Md yin < *lwirʔ (< *P-lirʔ) 'be straight, straighten, administrator (i.e., one who straightens)'. I reconstruct *-l and *-r for characters sharing a phonetic that are later read with -i/j and -n.

*ʔlil is a 'zero grade' form of the root *√l-j-l. The medial *-j- became *-i- since it was not followed by a vowel.

*ljalʔ is an *a-grade form of the root *√l-j-l. I am not certain that the consonant after *a was originally *l instead of *j. It is difficult to believe that a pause marker (another function of 也 according to Schuessler 1987) could have five segments instead of being a simple V or CV syllable. Maybe it had a shorter reading in that function - a reading like that of 矣.

The reconstruction of 矣 is elusive, as it belongs to a phonetic series with various initials without an obvious common denominator. The simple reconstruction *ləʔ could be an unstressed reduction of some pre-Chinese form of the verb *√l-j-l. But I see no reason why a deontic form would be more prone to reduction than a realis form.

The Tangut translation equivalent of both 也 and 矣 is

5285 1lɨə < *lə

and it is tempting to see a direct connection between pre-Tangut *lə and 矣 *ləʔ.

Maybe 矣 *ləʔ has nothing to do with *√l-j-l and is simply a variant spelling of 已 *ləʔ 'to finish' (and by extension, 'already'). 矣 and 已 might have also phonetically differed in some way that can no longer be reconstructed.

隹/維/惟 *P-lil is another 'zero grade' form of the root *√l-j-l. Although the shift of *PC- to *Cw- is phonetically plausible, there are not enough phonetic and word family alternations to fully justify reconstructing a prefix *P-. I am much more confident about reconstructing *P- in pre-Tangut to account for sets of words such as

0618 1tsia < *Cɯ-tsa (*Kɯ-tsa?) 'hot'

(The prefix *Kɯ- may have been lost after conditioning vowel breaking and before it could condition aspiration in the root initial *ts-. Tibetan tsha < *tsa preserves the bare root.)

1829 1tsha < *Kɯ-tsa 'hot'

1825 1tshwia < *P-Kɯ-tsa 'to heat'

See Gong (2002: 45-46) for more examples in Tangut. I interpret his Chinese example as involving vowel alternation rather than zero ~ *-w- alternation:

熱 Middle Chinese *ɲiet < Old Chinese *Cɯ-ŋet 'hot'

爇 Middle Chinese *ɲwiet < Old Chinese *Cɯ-ŋot 'to burn'

Then again, maybe there never was an *o in 'to burn':

*Pɯ-ŋet > *Pɯ-ŋiet > *Pŋiet > *ŋwiet > *ɲwiet

But I hesitate to reconstruct a Proto-Sino-Tibetan *P-causative prefix, much less relate it to the Proto-Austronesian causative prefix *pa- and/or Anderson's (2004: 162) Proto-Austroasiatic causative prefix *’B-. Lookalikes abound within as well as among languages: e.g., Vietnamese also has an l-copula (< Proto-Vietic *la 'to work'?) that is unrelated to any of the Chinese forms above. In any case, if 隹/維/惟 ever had a *P-prefix, it could not be causative, since 隹/維/惟 is 'be', not 'cause to be' (unless 'cause to be' was downgraded to 'be').
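The alternative derivation of 'to burn' given above (*Pɯ-ŋet > *Pɯ-ŋiet > *Pŋiet > *ŋwiet > *ɲwiet) can be replayed as an ordered list of string rewrites. What follows is only a minimal sketch: the rule labels are illustrative glosses added here and are not the post's terminology, the rules are written narrowly enough to cover this one word, and the function name `derive` is hypothetical.

```python
# Each rule is (label, old, new). The labels are assumptions made for
# illustration; only the forms themselves come from the derivation above.
RULES = [
    ("vowel breaking after the *ɯ-presyllable", "ɯ-ŋe", "ɯ-ŋie"),
    ("loss of presyllabic *ɯ", "Pɯ-ŋ", "Pŋ"),
    ("*PC- > *Cw-", "Pŋ", "ŋw"),
    ("palatalization of the initial", "ŋw", "ɲw"),
]


def derive(form, rules=RULES):
    """Apply the rules in order and return every stage, starting form included."""
    stages = [form]
    for _label, old, new in rules:
        form = form.replace(old, new)
        stages.append(form)
    return stages


# Print the chain with reconstruction asterisks restored.
print(" > ".join("*" + stage for stage in derive("Pɯ-ŋet")))
```

Because the rules are plain substring replacements, reordering them changes the result: for example, the third rule can only apply once the second has removed *ɯ.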

*The semantics of 伊 'he/she/it ~ is' are reminiscent of those of modern standard Mandarin 是 'is' which originated as 'this'. I thought 伊 might have undergone a similar shift from pronoun to verb, but Schuessler (1987: 742) listed the verbal meaning before the pronominal meaning (implying the verbal meaning is older?), and my *l-j-l hypothesis requires 伊 to have been a verb first. However, I cannot think of a case in which a verb became a pronoun, though I can make up a scenario of reinterpretation:

1. 'X is Y'

2. 'is Y' (X is dropped as an assumed subject)

3. 'he's Y' ('is' comes to refer to the subject)

4. 'he' ('he's' is used before verbs as well as nouns, becoming a pronoun)

If that occurred, then Taiwanese 伊是 i si 'he is' is etymologically 'is' + 'this'!

FALLING INTO EXHAUSTION: TANGUT RHYME 51

The Sanskrit syllable ho [ɦoː] was transcribed in Tangut with the common Grade I transcription characters

3118 1xʊ (rhyme 1) and 5595 2xwo (rhyme 51)

(Arakawa 1999: 111). 3118 makes more sense if R1 was -ʊ, which is closer to [oː] than the -u of other reconstructions. hu and ho have very distinctive vowel characters in Indic scripts, so I doubt a Tangut speaker misread Sanskrit ho and transcribed it using 3118 as if it were hu.

Perhaps the familiarity of those high-frequency transcription characters took priority over phonetic precision when 3118 and 5595 were chosen for Sanskrit ho.

A better phonetic match for Sanskrit ho would have been

5661 1xo 'third person singular pronoun'*

without -w-. Although 5661 is a rare character, transcriptions may contain characters that are hardly used in other contexts: cf. Mandarin 伊 yi 'third person pronoun (obsolete)' which is now mostly used for transcribing foreign i in names such as 伊拉克 Yilake 'Iraq'. Perhaps whoever chose 5595 had a dialect in which xwo had become xo.

The level and rising tone names for R51 in the Precious Rhymes of the Tangraphic Sea are

2326 1tho 'tired, weary' and 4290 2thwo 'to fall into'**

I can safely reconstruct R51 as -o since

- Sanskrit -o usually corresponds to R51 in Tangut transcriptions (Arakawa 1999: 111)

- R51 is almost always transcribed as Tibetan -o (Tai 2008: 218)

As I will explain in a later post, perhaps -o was more precisely [ɔ], a vowel absent from both Sanskrit and Tibetan. [ɔ] might have been the best match for Sanskrit -o if Tangut had no [o]. And of course Tibetan o would be the best match for a Tangut [ɔ]. But for now I will continue to use the simpler symbol -o.

*5661 is apparently only in dictionaries. It may be a pronoun of the so-called 'ritual language' (which I suspect was a substratal language a.k.a. 'Tangut B').

The normal Tangut third person singular pronoun is

0388 2thia

which may be derived from the demonstrative

2019 1thia 'that' (written as 0388 plus 'water' on the left)

plus a suffix *-H that conditioned the second (i.e., 'rising') tone. Kepping (1985: 61) regarded 2019 as a demonstrative, whereas Gong (2003: 607) translated it as both a pronoun and a demonstrative. I assume the pronominal uses of 2019 are secondary.

1thia is cognate to Ronghong Qiang the 'that' (LaPolla and Huang 1996: 52).

In turn, the Tangut and Qiang forms resemble the Mandarin third person pronoun ta [tʰa]. That similarity may be coincidental: cf. how those three words happen to sound vaguely like English that. The third person pronoun of the Chinese dialect known to the Tangut is unknown. In any case, the Tangut, Qiang, and Mandarin words cannot be from a Proto-Sino-Tibetan *tha or the like since Mandarin ta is from Old Chinese *hlaj with a lateral initial and a final glide.

**Unlike previous rising tone rhyme names, 4290 2thwo has a -w- absent from its level tone counterpart 1tho. There was no Tangut syllable *2tho without -w-, so 2thwo was the closest rising tone match for level tone 1tho.

I don't know the reasoning behind rhyme names. Why was 1tho, a word known only from dictionaries (and hence a possible 'ritual'/substratum language/Tangut B word), chosen instead of, say,

1292 1to 'the surname To'

which had an exact rising tone counterpart

4859 2to 'end'?

I think it's appropriate to end this post with that character!

A DIVINE PREFACE: TANGUT RHYME 1

Tangut rhyme 1 (R1) has two names in the Precious Rhymes of the Tangraphic Sea: one for its level tone version (1.1 = 1st tone, 1st rhyme) and another for its rising tone version (2.1 = 2nd tone, 1st rhyme).

5085 1bʊ 'preface' and 3224 2bʊ 'to divine, tell fortunes' (< Chn 卜?; see below; "Divine" in the title is an English play on words and is not meant to imply 3224 is an adjective 'divine')

R1 is almost always transcribed as -u in Tibetan with a few exceptions (Tai 2008: 217):

-uH (once)

-o (twice)

-i (once)

-iH (once)

Therefore R1 must have been u-like. (I cannot explain the i-transcriptions.)

Having reconstructed Grade I i / R8 as in my previous post, I would like to reconstruct Grade I u / R1 as its back counterpart -ʊ. The Tibetan -o transcriptions may imply a vowel like ʊ that was slightly lower than u.

However, R1 was almost always used to transcribe Sanskrit -u (Arakawa 1997: 110), implying that it may have simply been -u: i.e., an exact match of the Sanskrit vowel. Like other Grade I rhymes, -u lacks the vowels that characterize Grades II-IV: -ɤ-, -ɨ-, -i-.

But -u has three problems.

First, if pre-Tangut *i lowered to ɪ in Grade I, I would expect its back counterpart *u to lower to ʊ in Grade I. Of course there is no absolute rule of symmetry in vocalic development. Nonetheless, vowels shifting in unison are more probable than vowels each going their own way.

Second, if Tangut had -u but no -ʊ, it would have had an asymmetrical set of simple vowels:

i (R11 / Grade IV)   u? (R1 / Grade I [not IV!])
ɪ (R8 / Grade I)   (no Grade I -ʊ!)
e (R34 / Grade I) ə (R28 / Grade I) o (R51 / Grade I)
  a (R17 / Grade I)  

On the other hand, Ukrainian has a similar asymmetrical vowel system (without ə; Ukrainian ɪ is central front, reflecting its origin as a merger of central and front *i). (6.28.1:14: No such merger producing an isolated ɪ ever occurred in Tangut.)

Reconstructing -ʊ might result in another asymmetrical vowel system unless I reconstructed Grade III R2 or Grade IV R3 as -u, a possibility I have considered from time to time: e.g., in my June 20 and June 17 reconstructions.

Third, this Tangut loanword from Middle Chinese (MC) has R1 corresponding to EMC *-o:

3806 2bʊ < MC *bo 'cattail'

-ʊ is closer to MC *-o than -u. (Perhaps the ancestor of R51 was not *-o when this word was borrowed.)

Unfortunately there are no other examples of R1 loans that are unambiguously from MC. The name of the rising tone version of R1,

3224 2bʊ 'to divine, tell fortunes'

could be from Chinese 卜, but the initial is irregular (does it incorporate a Tangut voicing prefix: *b- < *N-p-?) and cannot be used to determine the age of the borrowing. (A voiced obstruent initial in a Sino-Tangut loanword indicates MC origin, since later loans have voiceless aspirates reflecting post-MC devoicing.) 2bʊ could be from a form anywhere on a spectrum from MC *pok to post-MC *pu.

Hence for now I seem to be alone in reconstructing R1 as -ʊ, though I remain open to the possibility that it was -u, the reconstruction favored by the majority of scholars.

AN INTENTION TO WHISTLE: TANGUT RHYME 8

At the end of part 2 of "G-*r-adation in Tangut", I wrote,

I will explain the reasoning behind the reconstruction of individual [Tangut] vowels and diphthongs in future posts.

I initially thought that I would start with Tangut rhyme 1, but I realized that my explanation for that rhyme was dependent on my explanation for rhyme 8. So I'm going to start with 8 and go back to 1.

Rhyme 8 (R8) has two names in the Precious Rhymes of the Tangraphic Sea: one for its level tone version (1.8 = 1st tone, 8th rhyme) and another for its rising tone version (2.7 = 2nd tone, 7th rhyme).

3100 1sɪ 'intention' and 1007 2sɪ '(to) whistle'

R8 is the Grade I member of a set of four i-type rhymes (R8-R11). I reconstruct Grade I with zero corresponding to the vowels that characterize Grades II-IV: -ɤ-, -ɨ-, -i-. I would prefer to reconstruct R8 as zero plus a simple vowel. But what was that vowel?

R8 was transcribed in Tibetan as -i (5 times), -iH (twice), -yi (once), and -ing (once) (Tai 2008: 206). This does not necessarily mean that R8 was -i, but it does tell us that R8 was something like -i.

R8 was never used to transcribe Sanskrit short -i or long -ī (Arakawa 1997: 110). Therefore R8 was -i/ī-like but not -i/ī itself, and I rule out Arakawa's reconstruction of -i for R8.

Sofronov (1968 and 2012) and Gong (1997) reconstructed R8 as short -e, a sound absent from Sanskrit. (Sanskrit e is always long [eː].) Their reconstructions are consistent with the absence of R8 in Tangut transcriptions of Sanskrit (Arakawa 1997), but not with the Tibetan i-transcriptions of R8. If R8 were -e, I would expect its Tibetan transcription to be *-e, not -i.

Given that

- Grade I in Chinese is associated with descendants of lower(ed) vowels (that were emphatic at an even earlier stage)

- the Tangut grade system was influenced by Chinese

- there was no Grade I i-type rhyme in Chinese, implying Tangut R8 was unlike anything in Chinese (and further indicating that a simple -i is improbable for R8 since Chinese certainly had -i, which was in Chinese Grade IV, not Chinese Grade I!)

I reconstruct R8 as -ɪ, a lowered version of pre-Tangut *i. -ɪ is like Tibetan -i while also being unlike anything in Sanskrit. Nishida (1964) and Li (1986) also reconstructed R8 as -ɪ.

Other proposed reconstructions of R8 are unlike Tibetan -i or anything in Sanskrit:

Hashimoto (1965): -ɛj [-eːj] (I would expect the Tibetan transcription *-e)

Huang Zhenhua (1983): -ɔi, -oi (I would expect the Tibetan transcription *-oHi)
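The elimination argument above can be sketched as a simple filter over the proposed reconstructions of R8. This is a minimal illustration, not a definitive model: the class and field names are hypothetical, the "expected Tibetan transcription" values are the ones stated in this post, and the attested Tibetan transcriptions (-i, -iH, -yi, -ing) are simplified to a single "-i" target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    value: str             # proposed reconstruction of R8
    proposers: str         # scholars named in the post
    expected_tibetan: str  # transcription the post would expect for this value
    in_sanskrit: bool      # does Sanskrit have this sound, per the post?


# All judgments below restate the post's reasoning; none are new data.
CANDIDATES = [
    Candidate("-i", "Arakawa", "-i", True),
    Candidate("-e", "Sofronov; Gong", "-e", False),
    Candidate("-ɪ", "Nishida; Li", "-i", False),
    Candidate("-ɛj", "Hashimoto", "-e", False),
    Candidate("-ɔi/-oi", "Huang Zhenhua", "-oHi", False),
]


def survivors(candidates):
    """Keep values that Tibetan would write as -i and that Sanskrit lacks."""
    return [
        c.value
        for c in candidates
        if c.expected_tibetan == "-i" and not c.in_sanskrit
    ]


print(survivors(CANDIDATES))
```

Under these two criteria only -ɪ remains. The Chinese grade evidence listed above is a third, independent argument and is not encoded here.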

I used to reconstruct R9 as -ɪ, but now I reconstruct it as -ɤi with the Grade II vowel -ɤ.

NAMES OF THE TANGUT CAPITAL (PART 2)

In part 1 I covered names containing 州 zhou, 府 fu, and/or 興 xing from Dunnell (1989). Here are the miscellaneous names from her article (with the exception of #4).

1. 衙頭 Yatou

衙 is 'government office' and 頭 is 'head'. Is this a Chinese term coined in the Tangut Empire and/or a translation or even a transcription of a Tangut term?

2. 牙帳 Yazhang

This superficially looks like 'tooth tent', but I wonder if 牙 is another spelling of its homophone 衙 'government office'. According to Dunnell, "In Tang and Liao usage, yazhang designated the imperial camp, or the emperor himself."

3. Erighaya/Egrigaia/Iriqai/Irigai

This is the most mysterious of all the names. It is presumably a transcription of a Tangut name. At first I thought Mongolian speakers had added the first syllable to an r-initial original because Mongolian did not permit initial r- (cf. Mongolian Orus for 'Russia'), but now I wonder if E-/I- is the mysterious E- in Etsina/Etzina whose latter two-thirds mean 'black water':

3058 0176 2ziəəʳ 1nɨaa

I have never been able to identify a plausible Tangut word corresponding to E-.

Kychanov identified Ir- as a Mongolian inversion of Tangut ri: i.e.,

4396 2riəʳ 'room, hall, main building'.

He also identified the final syllable as, in Dunnell's (1989: 58) words, one of "various Tangut words denoting fortified settlement". (Does Kychanov's original article specify those words?) However, I do not know of any Tangut word sounding like ghaya/gaia/gai/qai with such a meaning. And the words that do have such a meaning sound nothing like ghaya/gaia/gai/qai:

0289 1vɪ 'walled city'

1623 2vạ 'imperial city, imperial palace'

1869 1po 'fort' < Chn 堡

My current Tangut reconstruction does not even have the rhyme -ai. Nor have I ever seen a Tibetan transcription of Tangut indicating a rhyme -ai.

4. Calachan

Andrew West pointed out that Calachan is Marco Polo's name for the capital of the "province [not city!] called Egrigaia [...] belonging to Tangut" (The Travels of Marco Polo, p. 281). I can't see the note about Calachan in that edition, so I found it in another edition. Rashid al-Din wrote the name as Kalajān, and the name refers to Alashan (Mongolian Alasha) known in Tangut as

2xɪ̃ 1lã

I cannot explain the mismatch in the first vowels of 2xɪ̃ 1lã, Alasha (whose sha is from Chinese 山 'mountain'), and Mandarin 賀蘭 Helan < *xɔlan.

Palladius identified Calachan as "the summer residence of the Tangut kings [i.e., not the Tangut capital], which was 60 li from Ning-hia, at the foot of the Alashan Mountains. It was built by the famous Tangut king Yuen-hao, on a large scale, in the shape of a castle, in which were high terraces and magnificent buildings." Palladius stated that the Tangut name of Calachan was "apparently" Halachar. Is that name in Chinese transcription in 西夏書事 Xixia shushi? I cannot find any char-like Tangut syllable that would make sense after 2xɪ̃ 1lã.

5. 開封 Kaifeng (!)

Kaifeng was of course the capital of the Northern Song. Like Shi Jinbo, I don't think the Tangut ever used the name of the capital of their neighbor and rival. Dunnell could not find 'Kaifeng' in any Tangut sources. I don't know how that Chinese name would have been written in Tangut. A possible transcription might have been

4186 2635 1khe 2xiõ

with the character that appeared in names in part 1, bringing us full circle.

G-*R-ADATION IN TANGUT (PART 2)

While writing "Hilo in Tangraphy", I changed my mind about how to reconstruct the core Tangut vowel system. What follows is an outline of the history I reconstruct leading up to my newest diagram.

The earliest stage of pre-Tangut had only six vowels:

ə u

Basic pre-Tangut words had the structure

presyllable + syllable


with stress on the second vowel. The six unstressed first vowels of the presyllable may have merged into a smaller set. For now let's suppose there were only two unstressed first vowels: higher *ɯ and lower *ʌ.

Medial *-r- lenited to *-ɨ-.

Under the influence of the neighboring dialect of Chinese, palatals became retroflexes followed by *-ɨ-.

Vowel harmony required the first and second vowels of a word to have matching height classes. Nonhigh vowels bent upward after *ɯ and high vowels bent downward after *ʌ. *-ɨ- lowered and backed to *-ɤ- after *ʌ and before lower vowels not preceded by *ɯ. These changes produced a richer vowel system full of diphthongs that became unpredictable and hence phonemic after presyllables were lost:

original main vowel
*ə *a
no presyllable: no change
*ə *a
no presyllable + *-ɨ-
*ɨi *ɤe *ɨə *ɤa *ɨu *ɤo
presyllable with *i
*ɨa *u
presyllable with + *-ɨ- *ɨi *ɨe *ɨu *ɨo
presyllable with *ei
presyllable with + *-ɨ- *ɤi *ɤe *ɤə *ɤa
*ɤu *ɤo

*-ɨ- generally fronted to *-i- after initials other than retroflexes and *l- which may have been velar [ɫ]. *-ɨ- survives in a few words which may be archaisms and/or borrowings from dialects without *-ɨ-fronting: e.g.,

0785 1bɨu < *bru 'border'

3408 1tsɨa < *Cɯ-tsa 'to broil' (cf. Tibetan tsha < *tsa 'hot'; was the presyllable a causative prefix?)

*u merged with *iu.

*ei and *ou monophthongized as ɪ and ʊ: i.e., as compromises between upper-mid and high vowels.

The Tangut phonological tradition categorized these vowels and diphthongs into four grades. Three (II-IV) each had a characteristic vowel, whereas rhymes of the first grade did not begin with any of those vowels:

Vowel Front Central Back
Grade i e ə a u o
IV: i i ie
iu io
III: ɨ ɨi ɨe ɨə ɨa
II: ɤ ɤi
ɤe ɤə ɤa ɤu ɤo
I: Ø ɪ
e ə a ʊ

This latest system is close to the one I've been using for the last six years but has the following differences:

- All Grade II diphthongs now share a vowel; my current diphthongs transparently share more in common than the lowered vowels of my earlier reconstruction

- The Grade I i- and u-vowels are now monophthongs like the other Grade I vowels. David Boxenhorn pointed out that my system from last week had no simple vowels; all phonetically simple vowels were phonemic diphthongs. That is highly unlikely, so I have reinterpreted Grade I as the home of simple vowels.

I will explain the reasoning behind the reconstruction of individual vowels and diphthongs in future posts.

WHAT CAN A KUNG FU POSTER TEACH US ABOUT SOUTHEAST ASIAN PHONETIC HISTORY?

Today I saw a poster for หมัดนรก ฝ่ามือพญายม Mat narok famɯɯ phayaa Yom (Hell Fist and King Yama's Palm), the Thai version of 幽靈神功 Youling shengong (Phantom Kung Fu). (Many more Thai posters for Chinese movies are at Kung Fu Movie Posters.)

The word พญา <bañā> phayaa 'king' caught my eye because it didn't look like an Indic loan even though most polysyllabic Thai words are of Indic origin. Where could it come from? My guess was Khmer, and yes, there is a Khmer word ពញា <bañā> phɲiə. But the trail doesn't stop there. It goes back to another Tai language. The online version of Headley's 1977 Khmer dictionary derives it from Lao ພະຍາ <baḥyā> phaɲaa. So where did that come from?

Here's what I think happened. The root of the word is ultimately Khmer after all: Old Khmer vrah 'divine being' which later became premodern Khmer brah. This word was borrowed into early Thai and Lao (or their common ancestor?) as *bra and added to the native Tai word *yaa 'male' (surviving in Thai royal language). *brayaa developed regularly into modern Thai พระยา <braḥyā> phrayaa 'a rank of nobility'. Meanwhile in Lao it underwent *y-nasalization, becoming *braɲaa.

That Khmer-Lao hybrid form was then borrowed into Khmer as ព្រញា <brañā>. Then Lao lost medial *-r- and the resulting *baɲaa was borrowed into Khmer as ពញា <bañā> bɔɲaa.

This bɔɲaa (or bəɲaa with a reduced first vowel) in turn was borrowed into premodern Thai as *baɲaa, becoming modern Thai พญา <bañā> phayaa after devoicing of *b- and denasalization of *ɲ. Thai phayaa 'king' coexists alongside phrayaa 'a rank of nobility'. (Although one might prefer to simply derive the Thai form directly from Lao without a Khmer intermediary, Khmer is a more likely source than Lao since Khmer was a source of Thai court terminology.)

Finally, Khmer bəɲaa became modern phɲiə after further Khmer-internal changes: breaking of the stressed second vowel after a voiced consonant, devoicing of *b, and loss of the unstressed first vowel. The Khmer spelling ពញា <bañā> still reflects the word's disyllabic origin even though the word is now pronounced as a monosyllable.

All of the above implies the following relative chronology of changes:

1. Khmer vr > br (before Lao and Thai borrowed *bra from Khmer)

(Is it also possible that early Tai speakers borrowed Khmer vr- as *br- even before this change in Khmer?)

2. Lao *y > ɲ (before Khmer borrowed <brañā> from Lao)

3. Lao *-r- > Ø (before Khmer borrowed Lao *baɲaa as <bañā>)

4. Khmer aa > iə after voiced consonants (before voiced obstruents were devoiced)

5. Devoicing of *b (and other voiced obstruents) in Khmer, Lao, and Thai (after aa-breaking)

Devoicing in Khmer must have occurred after voicing conditioned aa-breaking, and it could have occurred in Khmer after it had already occurred in Lao and Thai.

The loss of the first vowel in Khmer must date after Lao *baɲaa was borrowed as a disyllable in Khmer. Thai *baɲaa could be from a Khmer monosyllable *bɲaa, though I cannot find any attestations of a monosyllabic Khmer spelling.
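The Khmer-internal part of the chronology above (aa-breaking, then devoicing, then loss of the first vowel) can be sketched as functions applied in sequence. This is a rough illustration under stated assumptions: the voiced-consonant inventory `VOICED` is a guess made for this toy model, devoicing is written as b > ph simply to follow the transcription phɲiə used above, and "baa" is a hypothetical toy input rather than an attested form.

```python
import re

# Assumed inventory of voiced consonants for this toy model (not from the post).
VOICED = "bdgmnɲŋlrwjv"
VOWELS = "aeiouəɔ"


def break_aa(form: str) -> str:
    """aa > iə immediately after a voiced consonant."""
    return re.sub(f"(?<=[{VOICED}])aa", "iə", form)


def devoice_b(form: str) -> str:
    """*b > ph, following the post's transcription of the devoiced outcome."""
    return form.replace("b", "ph")


def drop_first_vowel(form: str) -> str:
    """Delete a reduced ə that directly follows the word-initial consonant."""
    return re.sub(f"^(ph|[^{VOWELS}])ə", r"\1", form)


def apply_in_order(form: str, changes) -> str:
    """Run each sound change over the form, one after another."""
    for change in changes:
        form = change(form)
    return form


KHMER_ORDER = [break_aa, devoice_b, drop_first_vowel]

print(apply_in_order("bəɲaa", KHMER_ORDER))          # phɲiə
print(apply_in_order("baa", [break_aa, devoice_b]))  # toy input: phiə
print(apply_in_order("baa", [devoice_b, break_aa]))  # toy input: phaa
```

The toy input shows why breaking has to precede devoicing: once *b has been devoiced, nothing voiced is left to condition the change of aa to iə. In bəɲaa itself the conditioning consonant is ɲ, which devoicing does not affect, so that word alone would not reveal the ordering.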

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision