WHY SO MIA-NY?

I have been writing about names of Kumārajīva lately (part 1 / part 2) such as Tangut

3948 3369 3284 2152 3284 (again!) 1kɨa' 1mia 2lɨa 1ʂɨi 2lɨa

The tangraph transcribing mā (3369 1mia) was one of the rhyme 1.20 syllables in the Tangraphic Sea that I listed last week. Most were written with one or two tangraphs, but 1mia was written with seventeen! (For comparison I have also included the corresponding rising tone syllable 2mia with rhyme 2.17.)

Li Fanwen number | Reading | Li Fanwen gloss | Type (* = only in dictionaries)
0092 | 1mia | mother (cf. 3334) | free morpheme 1
0409 | 1mia | former times (only in dictionaries?; combines with regular word for 'day') | bound morpheme 1*
1178 | 1mia | first half of 1mia 2nie 'end' (only in dictionaries; cf. 3369) | free morpheme 1 in a compound 'end-tail'*
1215 | 1mia | first half of 1mia 2mɤe' 'to think of, to long for' (only in dictionaries) | morpheme half 1*
1216 | 1mia | ten thousand (loan from Late Old Chinese 萬 *mɨanh 'id.'?) | free morpheme 2
1458 | 1mia | second half of 2ni' 1mia 'salamander' (only in dictionaries) | bound morpheme 2* after a Chinese loanword 鯢 'salamander'
1530 | 1mia | river | free morpheme 3
1721 | 1mia | stirrup | free morpheme 4
1803 | 1mia | first half of 1mia 1ɬiu' 'gray', name of an ancestor (only in dictionaries) | morpheme half 2*, free morpheme 2*
2270 | 1mia | last syllable of (2mɪ) 2mɪ 1mia 'a kind of bird' (only in dictionaries) | morpheme part 3*
2648 | 1mia | first half of 1mia 1khiu 'underground' (1khiu is 'under') | bound morpheme 1
3334 | 1mia | female, woman (cf. 0092) | free morpheme 1
3369 | 1mia | end, tail, east (only in dictionaries; cf. 1178); first syllable of 1mia 2ɬiụ 'plantain' and 1mia ?xa 'water buffalo'; transcription of Sanskrit ma, mā | free morpheme 1*, morpheme half 1, morpheme half 2, (not in Tangut words)
3527 | 1mia | analogy; generally; doubt, fear (i.e., uncertain); and; few; should (i.e., to be time for), time; clothes | free morphemes 5-11
3569 | 1mia | fishing hook | free morpheme 12
3718 | 1mia | second half of 1ɣa 1mia 'doorframe' (1ɣa is 'door') | bound morpheme 2
5118 | 1mia | second half of 1niu 1mia 'earring' (1niu is 'ear') | bound morpheme 3
5025 | 2mia | transcription of Sanskrit mya | (not in Tangut words)

Why are there so many 1mia - and no native 2mia? The lower frequency of second tone syllables indicates that the source of the second tone must have been something extra which I reconstruct as a final glottal *-H by analogy with Chinese.

I reconstruct *Cɯ-ma(C) as the pre-Tangut source of 1mia. The high presyllabic vowel conditioned the breaking of the main vowel:

*C₁ɯ-ma(C₂) > *C₁ɯ-mɨa > *mɨa > 1mia

I don't know when the final consonant was lost relative to vowel breaking.

The various 1mia may have had different presyllabic and/or final consonants in pre-Tangut: e.g.,

*kɯ-map, *tɯ-mak, *pɯ-ma, etc.
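The derivation above can be written out mechanically. The following is only a minimal sketch: the function name `derive_1mia` is hypothetical, and because the timing of final consonant loss is unknown, the sketch simply drops the final at the breaking stage, as the chain notation above does.

```python
def derive_1mia(pre_tangut: str) -> list[str]:
    """Return the stages from a pre-Tangut *Cɯ-ma(C) form down to 1mia."""
    form = pre_tangut.lstrip("*")
    presyllable, main = form.split("-", 1)
    if not presyllable.endswith("ɯ") or not main.startswith("ma"):
        raise ValueError(f"not of the shape *Cɯ-ma(C): {pre_tangut}")
    stages = ["*" + form]
    # Stage 1: the high presyllabic vowel breaks *a to *ɨa. The optional
    # final consonant is dropped here only for convenience (assumption).
    stages.append(f"*{presyllable}-mɨa")
    # Stage 2: the presyllable is lost.
    stages.append("*mɨa")
    # Stage 3: the attested reading with the first (level) tone.
    stages.append("1mia")
    return stages


for source in ("*kɯ-map", "*tɯ-mak", "*pɯ-ma"):
    print(" > ".join(derive_1mia(source)))
```

All three hypothetical sources converge on the same output, which is the point: different presyllabic and final consonants would leave no trace in the attested syllable.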

I count 24 types of 1mia:

17 in texts (not just dictionaries; pink):

12 free morphemes (0092 = 3334, 1216, 1530, 1721, 3527 [seven homophones!?], 3569)

3 bound morphemes (2648, 3718, 5118)

2 parts of polysyllabic morphemes (3369 [two homophones])

7 only in dictionaries (blue; possible 'ritual language' words and/or words that didn't happen to appear in Buddhist, Confucian, military, etc. texts: e.g., 'salamander'):

2 free morphemes (1178 = 3369, 1803)

2 bound morphemes (0409, 1458)

3 parts of polysyllabic morphemes (1215, 1803, 2270)
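As a quick check on the arithmetic, the tallies above can be summed in a few lines. This is just a sketch; the dictionary names are my own labels, and the counts are copied from the lists above.

```python
# Free morphemes found in texts, keyed by Li Fanwen number.
# 0092 and 3334 count once (same word); 3527 may hide seven homophones.
FREE_IN_TEXTS = {"0092 = 3334": 1, "1216": 1, "1530": 1, "1721": 1, "3527": 7, "3569": 1}

IN_TEXTS = {"free": sum(FREE_IN_TEXTS.values()), "bound": 3, "parts": 2}
DICTIONARY_ONLY = {"free": 2, "bound": 2, "parts": 3}


def total_types() -> int:
    """Total number of 1mia types across texts and dictionaries."""
    return sum(IN_TEXTS.values()) + sum(DICTIONARY_ONLY.values())


print(IN_TEXTS["free"], sum(IN_TEXTS.values()), sum(DICTIONARY_ONLY.values()), total_types())
```

The subtotals come out as 12 free morphemes, 17 types in texts, 7 in dictionaries only, and 24 overall, matching the counts given above.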

Green indicates a tangraph (3369) that represents one morpheme only in dictionaries and parts of words in texts.

Further analysis may be able to reduce the number of types of 1mia: e.g., the 1mia in 1458 2ni' 1mia 'salamander' may be 'river' and the 1mia in 4681 5118 1niu 1mia 'earring' may be 'hook'.

Although one could describe tangraphy as 'logography' (i.e., as a word-per-character writing system), 3527 might have represented up to seven unrelated words! Conversely, the word 1mia 'female' was written with two tangraphs (0092 and 3334) depending on whether it referred to mothers or females in general. And 1mia 'end' was written differently depending on whether it was an independent word (3369) or in the compound 1178 5734 1mia 2nie 'end-tail'.

10.22.1:54: A high degree of homophony is tolerable: e.g., English 'can' can mean

1. to be able

2. a container

3. to place in a container

4. prison (if preceded by the?)

5. toilet (if preceded by the?)

6. to be ready for release (in the can)

7. to be released from employment (mostly passive: was/got canned?)

8. Canada (e.g., in Canwest)

and various other meanings I have never encountered. Context is sufficient to disambiguate these many uses.

None of those meanings are opposites. One might look up

1530 1mia and 2648 1mia

in Li Fanwen (2008) and think they are near-opposites ('river' and 'land'), but in fact the latter apparently only occurs in the disyllabic expression

2648 5399 1mia 1khiu 'underground'

and I suppose that is much more common than

1530 5399 1mia 1khiu 'under a river'

so there is little risk of ambiguity. (In Google, 'under a river' has 8.74 million hits, which sounds like a lot, but 'underground' has 335 million hits! And many references to 'under a river' involve underwater construction that would have been unimaginable to the Tangut nearly a thousand years ago.)

'ZEN': A REMNANT OF TANGUT EMPIRE CHINESE?

KJ Solonin's article made me think about the Tangut name for Zen


3504 1ʂɨã =

all of 2833 2diẽ 'calm, quiet' (probably 'not' + top and bottom right of 'to move')

left of 5593 1bɤo' 'to look, watch, observe'

as well as the Tangut names of Kumārajīva (part 1 / part 2). 1ʂɨã is a borrowing from Tangut period northwestern Chinese 禪 *ʂɨã which in turn is from Late Old Chinese (LOC) *dʑian, a Sinified form of Pali jhāna- (< Sanskrit dhyāna 'meditation'). (Japanese Zen is from Middle Chinese *dʑien.) Coblin (1994: 323) reconstructed 禪 as *śan ~ *źan in the 9th and 10th centuries AD on the basis of these Tibetan transcriptions:

大乘中宗見解: shan, zhan

南天竺國菩提達摩禪師觀門: zhan, Hzhan

LOC *dʑ developed differently in premodern northwestern Chinese and in Mandarin in 'level' tone syllables:

Tone 'Level' 'Nonlevel'
Premodern northwestern Chinese > >
Mandarin ch [tʂʰ] sh [ʂ]

I don't understand the phonetic motivation for the split. Why were 'nonlevel' tones incompatible with a voiced affricate? (Voiceless affricates were possible before 'nonlevel' tones.)

Although modern northwestern Chinese generally has Mandarin-style reflexes of *dʑ, 禪 'Zen' still has a fricative initial in some varieties (Coblin 1994: 323):

Xining ʂã⁴⁴

Dunhuang ʂæ̃²⁴

Early 20th century Xi'an (as recorded by Karlgren): ʂæ̃ (tone unknown)

I thought these fricatives might be substratum retentions. I had either forgotten or overlooked this passage earlier in Coblin (1994: 101):

Occasional exceptions are found [to the Mandarin pattern of reflexes of *dʑ ...], e.g.[0678] (QYS źi̯än) "Zen Buddhism": [mid-Tang Chang'an] *dźan > *źan; CSZ [colloquial Suzhou] *śan (~ *źan?); XN [Xining]: ʂã⁴⁴; DH [Dunhuang]: ʂæ̃²⁴. These exceptional modern reflexes appear to derive directly from forms like those found in CSZ.

I looked for those "occasional exceptions" and found

蟬 LOC *dʑian 'cicada' is ʂæ̃²⁴ as well as tʂʰæ̃²⁴ (cf. standard Mandarin chan) in Xiaoxuetang's Xi'an data

辰 LOC *dʑin 'fifth Earthly Branch' is ʂɛ̃ (tone unknown) in Karlgren's Xi'an data (Coblin 1994: 361) and ʂẽ²⁴ as well as tʂʰẽ²⁴ (cf. standard Mandarin chen) in Xiaoxuetang's Xi'an data

This last graph has two Sino-Korean readings, chin (without aspiration!) and shin. The first reading may be an old borrowing from Early Middle Chinese *dʑin; the second is from Late Middle Chinese *ɕin.

The multiple Sino-Korean readings of 什 (in 鳩摩羅什 'Kumārajīva') may also be from different strata of borrowing: 집 chip from Early Middle Chinese *dʑip and 십 ship from Late Middle Chinese *ɕip. (집 chip becomes -jip with secondary voicing after a sonorant. That voicing is due to a Korean phonological rule and does not preserve the voicing of Early Middle Chinese *dʑip.)

A third Sino-Korean reading 습 sŭp is difficult to explain; it may be from a different Late Middle Chinese dialect in which *-ip became *-ɨp rather than vice versa.

The Xining reading of 禪 'Zen' also has an irregular 'yin level' tone (which would normally reflect an earlier *voiceless initial) instead of the expected 'yang level' tone (reflecting an earlier *voiced initial). I don't think the tone of 禪 'Zen' indicates that it had a voiceless initial in pre-Xining. I hypothesize that the original dialect of the region had a 'yang level' tone that sounded like the 'yin level' tone of the Mandarin dialect that displaced it.

If I am correct, then a study of irregular tones in Xining may reveal something about the substratal tone system. Unfortunately, it may not reveal the exact values of the tones at the time of borrowing because all tones, substratal and superstratal, may have changed since then. So I don't know if 44 was the 'yang level' tone contour in the substratum dialect.

It would be interesting if other modern northwestern dialects also have a seemingly 'yin level' tone for 禪 'Zen'.

Dunhuang only has one 'level' tone which may be a merger of earlier 'yin level' and 'yang level' tones.

I don't know the modern Xi'an reading of 禪 'Zen', but I do know that both the substratal fricative-initial and superstratal affricate-initial readings of 蟬 'cicada' and 辰 'fifth Earthly Branch' have 'yang level' tones in modern Xi'an. Were the tones of the substratal readings shifted to match the superstratal tones?

One last question: Why would northwestern Chinese retain an old word for 'Zen'? The answer probably has something to do with the religious history of the region.

I am reminded of how Japanese Buddhist terminology consists of Early Middle Chinese-based borrowings (呉音 Go-on) that were not displaced by Late Middle Chinese borrowings (漢音 Kan-on) during the Tang Dynasty: e.g., 禪 Zen was not replaced by a newer borrowing *Sen. (One might think that Zen Buddhism was practiced in Japan before the Tang Dynasty, but in fact it took root in the 12th century when 1ʂɨã 'Zen' was practiced in the Tangut Empire. An old reading Zen was used for a new school because of the strong association between Go-on and Buddhism in Japan.)

On the other hand, Korean Buddhist terminology generally consists of Late Middle Chinese borrowings: e.g., 禪/선 Sŏn 'Zen' probably replaced an earlier borrowing that would have become modern 전 *Chŏn. A rare exception is the 什 -jip in 鳩摩羅什/구마라집 Kumarajip. But that is not the most common reading of 鳩摩羅什. Here are Google frequencies for the three readings of the name:

구마라십 Kumaraship: 215,000

구마라집 Kumarajip: 21,900

구마라습 Kumarasŭp: 19,300

The newer reading 십 ship outnumbers the older reading 집 jip by nearly ten to one.
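The ratio can be checked directly from the three hit counts above. A minimal sketch follows; `HITS` and `ratio` are names of my own choosing, and the counts are only the Google figures quoted above, not fresh data.

```python
# Google hit counts for the three Korean readings, as quoted above.
HITS = {"Kumaraship": 215_000, "Kumarajip": 21_900, "Kumarasŭp": 19_300}


def ratio(a: str, b: str) -> float:
    """How many hits reading `a` has for each hit of reading `b`."""
    return HITS[a] / HITS[b]


print(f"{ratio('Kumaraship', 'Kumarajip'):.1f}")
```

The result is about 9.8, hence 'nearly ten to one'.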

The older voiced affricate reading of 禪 'Zen' has left no trace in Sino-Vietnamese. The only Sino-Vietnamese reading of 禪 is Thiền from southern Late Middle Chinese *ʑien; there is no *Chiền from southern Early Middle Chinese *dʑien.

THE TANGUT NAMES OF KUMĀRAJĪVA (PART 2)

The third Tangut name of Kumārajīva shares no characters with the other two:

1429 4575 4710 4867 1kiew 2mo 1lo 1ʂɨəʳ

It is obviously based on Tangut period northwestern Chinese 鳩摩羅什 *kɨwmbɔlɔʂɨi from a 4th century *kumaladʑip.

As I mentioned yesterday, 1429 is also the transcription character for 鳩 in the Tangut translation of the Forest of Categories (Gong 2002: 438).

4575 and 4710 are also transcription characters for Sanskrit mo and lo (Arakawa 1997: 111).

4867 was also used to transcribe other Chinese characters pronounced *ʂɨi (十實失室) and 涉 *ʂɨa (Li 2008: 770). The retroflexion in Tangut may have reflected subphonemic vowel retroflexion in Chinese after retroflex affricates: /ʂi/ = [ʂɨiʳ] and /ʂia/ = [ʂɨaʳ].

In theory the name could have been borrowed in a more Sanskrit-like form as *kʊ ma raʳ dzi va via Tibetan kumaradziba [kumaradziwa] or directly from the variety of Sanskrit known to the Tangut which had [dz] for j. (My Tangut reconstruction has no rhyme -u. Retroflexion was almost always obligatory after r- in Tangut.)

I was curious to see how Kumārajīva was rendered in other languages. Judging from Wikipedia entry titles:

Czech Kumáradžíva preserves the long vowels.

Polish Kumaradżiwa [kumaradʐiva] has retroflex for Sanskrit palatal j [dʑ]. I would have expected *Kumaradziwa [kumaradʑiva] with palatal dz (pronounced like [dʑ] before i). The combination of retroflex and palatal i is unusual in Polish. I wonder if that i is pronounced [ɨ] as in the normal Polish combination ży [ʐɨ].

Ukrainian Кумараджива [kumaradʐɪva] has [ɪ] instead of [i]. I presume the spelling was taken from Russian Кумараджива [kumaradʐɨva].

Korean 쿠마라지바 [kʰumaradʑiba] has an un-Sanskrit (and English-influenced?) initial aspirate. I presume it is a modern term. Older names are 鳩摩羅什 Kumarasŭp/Kumaraship/Kumarajip (the last character is read three different ways) and 羅什 Nasŭp (with initial r- becoming n- before a-).

THE TANGUT NAMES OF KUMĀRAJĪVA (PART 1)

Having just written about Chinese transcriptions of Indic, I thought it was neat that I then stumbled upon KJ Solonin's tentative identification of

2152 3284 1ʂɨi 2lɨa

as a Tangut transcription of the name of Kumārajīva (1998: 411, 414 #80), translator of the Lotus Sutra and other Buddhist texts into Chinese. Kumārajīva's Chinese name was 鳩摩羅什, pronounced *kumaladʑip in the 4th century AD. In the Tangut period northwestern dialect of Chinese, it would have been read as *kɨwmbɔlɔʂɨi. If the two names are connected, the Tangut name might be an accidental inversion of

*3284 2152 2lɨa 1ʂɨi

corresponding to 羅什 *lɔʂɨi, an abbreviation of 鳩摩羅什 *kɨwmbɔlɔʂɨi. (This abbreviation was obviously created by a Chinese speaker, as a natural break in the Sanskrit would be between Kumāra 'boy, prince' and jīva 'life'.)

Unfortunately, the name 2lɨa 1ʂɨi only appears once in the text that Solonin translated. However, a transcription of the full name 鳩摩羅什 *kɨwmbɔlɔʂɨi does appear in the Hongchuan preface of the Lotus Sutra (Li 2008: 533; see Nishida 2004 on the Tangut Lotus Sutra):

3948 3369 3284 2152 3284 (again!) 1kɨa' 1mia 2lɨa 1ʂɨi 2lɨa

There are several things that are odd about this spelling.

First, 3948 1kɨa' is a poor match for Chinese 鳩 *kɨw. It is a transcription character for Sanskrit ka and kya (Arakawa 1997: 110, 116; Kychanov and Arakawa 2006: 692). In the Tangut translation of the Forest of Categories, *kɨw was transcribed as

1429 1kiew

which is a much better match (Gong 2002: 438). 1429 is also a transcription character for Sanskrit (?) kyu (Grinstead 1972: 111) and is the first character in a different transcription I'll examine tomorrow.

Second, 3369 1mia (rhyme 20) has an -i- that corresponds to zero in Chinese 摩 *mbɔ and Sanskrit and Tibetan ma (Arakawa 1997: 110, Kychanov and Arakawa 2006: 234).

Maybe I should follow Sofronov and Arakawa and stop reconstructing -i- in rhyme 20.

Third, 3284 2lɨa (rhyme 19) has an -ɨ- that corresponds to zero in Chinese 羅 *lɔ and Sanskrit la (Arakawa 1997: 110).

I have yet to see a fully satisfactory solution to the problem of reconstructed Tangut medials seemingly reflecting nothing in transcriptions of Chinese and Sanskrit.

Fourth, 3284 appears again, corresponding to zero in the four-syllable Chinese name. The first four syllables of this longer Tangut name are obviously based on Chinese (hence 2lɨa for 羅 *lɔ rather than *raʳ for Sanskrit ra). I would have expected a fifth syllable to be

2640 1pho

a transcription of Chinese 婆 *phɔ < *ba for Sanskrit va in longer Chinese names for Kumārajīva:

鳩摩羅什婆 *kɨwmbɔlɔʂɨiphɔ < *kumaladʑipba

鳩摩羅時婆 *kɨwmbɔlɔʂɨiphɔ < *kumaladʑɨba

鳩摩羅耆婆 *kɨwmbɔlɔtʂɨiphɔ < *kumalatɕiba

Having not seen the text where Li found this longer transcription, I don't know if this second 3284 is a typo (I doubt that, as even the Chinese translation has a doubled syllable: 鳩摩羅什羅) or in the original. Kychanov and Arakawa (2006: 692) do not list any words beginning with 3948. Maybe this longer name is a confused blend of *1kɨa' 1mia 2lɨa 1ʂɨi and the short inverted name 1ʂɨi 2lɨa.

At least 2152 1ʂɨi is a perfect match for Chinese 什 *ʂɨi, and is attested as a transcription of the last syllable of the name 李七什 *lɨi tshi ʂɨi (Li 2008: 356).

Next: Another Tangut name for Kumārajīva.

TESTING STAROSTIN'S 'LATE-RAL' SCENARIO

(I rhyme lateral [ˈlætəɹo] and scenario [səˈnæɹio]. 'Late-ral' is [ˈlejtəɹo] with a linking schwa to preserve the resemblance to [ˈlætəɹo].)

One of the biggest sound changes in Chinese was the loss of laterals:

Old Chinese *l- in type A syllables > Middle Chinese *d-

Old Chinese *hl- in type A syllables > Middle Chinese *th-

Old Chinese *l- in type B syllables > Middle Chinese *j-

Old Chinese *hl- in type B syllables > Middle Chinese *ɕ-

(The nature of the Old Chinese type A/B distinction is disputed, but the Middle Chinese initials are uncontroversial.)
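The four correspondences above amount to a small lookup table keyed by initial and syllable type. Here is a minimal sketch; the names `LATERAL_SHIFT` and `middle_chinese_initial` are hypothetical, and the table covers only the four cases listed above.

```python
# (Old Chinese initial, syllable type) -> Middle Chinese initial
LATERAL_SHIFT = {
    ("l", "A"): "d",
    ("hl", "A"): "th",
    ("l", "B"): "j",
    ("hl", "B"): "ɕ",
}


def middle_chinese_initial(oc_initial: str, syllable_type: str) -> str:
    """Look up the Middle Chinese reflex of an Old Chinese lateral initial."""
    try:
        return LATERAL_SHIFT[(oc_initial, syllable_type)]
    except KeyError:
        raise ValueError(f"no rule for *{oc_initial}- in type {syllable_type}") from None


print(middle_chinese_initial("hl", "A"))  # voiceless lateral in a type A syllable
print(middle_chinese_initial("hl", "B"))  # voiceless lateral in a type B syllable
```

The lookup makes plain that the type A/B distinction, whatever its phonetic nature, is what decides between a stop and a glide or fricative outcome.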

In my last entry, I mentioned two conflicting chronologies for the lateral shift in Chinese. Schuessler (2009) reconstructed Middle Chinese-like initials (*j-, *ɕ-, *d-, *th-) in his Later Han Chinese (i.e., Eastern Han / Late Old Chinese), whereas Starostin mostly reconstructed transitional fricatives or laterals for that period:

Old Chinese syllable type | Early Old Chinese | Late Old Chinese (Starostin) | Late Old Chinese (Schuessler) | Middle Chinese
A | *l- (Starostin: *l- and *dɮ-) | *l- | *d- | *d-
A | *hl- (Starostin: *tɬ-) | *hl- | *th- | *th-
A and B | *r- | *r- | *l- | *l-
B | *l- | *ʑ- | *j- | *j-
B | *hl- (Starostin: *tɬ-) | *ɕ- | *ɕ- | *ɕ-

(I use the same notation regardless of scholar for ease of comparison. I list Starostin's reflexes of his Early Old Chinese *tɬ- and *dɮ- because they correspond to *hl- and *l- in others' reconstructions. Starostin's EOC *hl- behaved differently from others' *hl-; it became Late Old Chinese and Middle Chinese *h- [= others' *x-]. For arguments against Starostin's lateral affricates, see Sagart 1999. I have included EOC *r- for comparison.)

To test Starostin and Schuessler's reconstructions of Late Old Chinese (LOC), let's look at Eastern Han transcriptions of Indic from Coblin (1983).

If Starostin is right:

- LOC *l- should transcribe Indic l

- LOC *hl- shouldn't be used in transcription because there was no Indic voiceless hl

- LOC *r- should transcribe Indic r

If Schuessler is right:

- LOC *d- from EOC *l- could transcribe Indic d

- LOC *th- from EOC *hl- could transcribe Indic th

- LOC *l- from EOC *r- should transcribe both Indic l and r (since LOC no longer had *r-)

Both would agree that LOC *ɕ- should transcribe Sanskrit ś [ɕ].

As I already noted last time, the correspondence of Starostin's *ʑ- / Schuessler's *j- to Indic y- [j] is ambiguous since Starostin would have said that *ʑ- was the closest available initial due to the absence of *j- in his LOC. Correspondences between this LOC initial and Sanskrit c-, j- [ɟ], ś- [ɕ], and s- suggest that it was "a fricative or affricate of some sort" (Coblin 1983: 63): e.g., Starostin's *ʑ-.

In the transcriptions of 安世高 An Shigao (mid-2nd c. AD) we find that:

- Indic d and even intervocalic -t- were transcribed with Starostin's LOC *l- / Schuessler's *d- (18, 19; the numbers are from Coblin 1983)

- Indic l was transcribed with Starostin's LOC *r- / Schuessler's *l- (13, 15, 28)

These patterns are not quirks of An Shigao; they can also be found in the transcriptions of 支婁迦讖 Zhi Loujiachen/Lokakṣema (mid-2nd c. AD; his name has 婁 Starostin's LOC *r- / Schuessler's *l- for Sanskrit l-) and 康孟詳 Kang Mengxiang (late 2nd-early 3rd c. AD). All three men were non-Chinese who settled in Luoyang, so their transcriptions probably represent the same dialect.

The only Indic th in An Shigao's transcriptions was transcribed with 替 whose EOC initial is ambiguous. It definitely had *th- in Middle Chinese and must have had *th- here. Starostin might have taken that as evidence for reconstructing  替 with *th- in EOC.

th is a low-frequency consonant, so it's not surprising that there are no instances of it transcribed with original or secondary *th-. (Oddly Lokakṣema transcribed it as the coda-onset sequence -t s- in 55.)

I conclude that the following chain shift had occurred in the Luoyang dialect of LOC by the mid-2nd century AD:

*r- > *l- (type A) > *d-

This is contrary to Starostin's 'late-ral' scenario in which the laterals hardened later.
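The test can be restated as a tiny scoring exercise. This is a rough sketch under my own simplifications: the two observations are the An Shigao patterns noted above (intervocalic -t- is left out), a prediction counts as confirmed only if the reconstructed LOC initial is identical to the Indic sound, and the names `LOC`, `OBSERVED`, and `score` are hypothetical.

```python
# LOC reflexes of two Early Old Chinese initials under each reconstruction.
LOC = {
    "Starostin": {"l": "l", "r": "r"},
    "Schuessler": {"l": "d", "r": "l"},
}

# (Indic sound, EOC initial of the graphs used to transcribe it)
OBSERVED = [
    ("d", "l"),  # Indic d written with graphs that had EOC type A *l-
    ("l", "r"),  # Indic l written with graphs that had EOC *r-
]


def score(scholar: str) -> int:
    """Count the observations in which the scholar's LOC initial equals the Indic sound."""
    return sum(LOC[scholar][eoc] == indic for indic, eoc in OBSERVED)


for scholar in LOC:
    print(scholar, score(scholar), "of", len(OBSERVED))
```

Under these assumptions Schuessler's initials match both patterns and Starostin's match neither, which is the basis for the chain shift proposed above.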

I also reconstruct a parallel change

*hl- (type A) > *th-

on the grounds that it would be odd if *hl- lagged behind its voiced counterpart *l-. Unfortunately there is no Indic transcription evidence for that.

Phonetic glosses such as

'聖 *hlieŋh (type B; > MC *ɕieŋʰ) is read like 通 *hloŋ (type A; > MC *thoŋ)' (Xu Shen 1063, b. in 召陵 Zhaoling 200 km SE of Luoyang, fl. c. 100 AD)

'天 *hlein (type A; > MC *then) read as 身 *hlin (type B; > MC *ɕin)' (Gao You 243, b. in 涿 Zhuo, fl. c. 200 AD)

indicate that *hl- did not harden in other LOC dialects during the early centuries of the first millennium AD. The glosses would not make sense if *hl- had already become *th-.

10.17.23:17: Some LOC glosses that seem bizarre might make more sense if we don't try to shove the words into the standard paradigm defined by the Chinese lexicographical tradition. For instance, perhaps Xu Shen pronounced 通 as something like *hliøŋ with a front diphthong similar to 聖 *hlieŋh. The expected Old Chinese reconstruction 通 *hloŋ is mechanically derived from Middle Chinese *thoŋ, whereas my hypothetical *hliøŋ would have vowel warping conditioned by a presyllable in an Old Chinese variant *Cɯ-hloŋ or *Hɯ-loŋ. Perhaps *Hɯ-loŋ was the earliest form which developed along two paths:

Early fusion: i.e., before conditioned vowel warping

*Hɯ-loŋ > *hloŋ > *thoŋ (Middle Chinese prestige form recorded in dictionaries)

Late fusion: i.e., after conditioned vowel warping

*Hɯ-loŋ > *Hɯ-luoŋ > *hluoŋ > *hlioŋ > *hliøŋ (> Middle Chinese *ɕyøŋ?; nonprestige and extinct?)

For more examples of variation between fused and unfused presyllables, see the discussion of Phan Rang Cham (Austronesian) and Ruc and Nha Heun (Austroasiatic) in Sagart (1999: 15-17).

TILTED TONGUE

On Monday, I wrote,

I wrote the pre-Tangut source of ld- as *L-. External evidence may help us identify what *L- was.

Last night I mentioned


3190 1ldwia 'tongue' = (4226 1ldwị + 0537 1pia) + 1223 2phɤo' (Mixed Categories of the Tangraphic Sea 11.122)

as one of the syllables with a fanqie including the mysterious additional character 1223.

1ldwia is probably related to the many l-words for 'tongue' in Sino-Tibetan: e.g.,

Old Chinese 舌 *mɯ-lat or *m(ɯ)-ljat (Baxter and Sagart 2014: *mə.lat)

also cf. 舐/舓/咶 *mɯ-leʔ or *m(ɯ)-ljeʔ (B&S 2014: *Cə.leʔ) 'to lick'

and perhaps 舔 *hlˁimʔ < *qlimʔ or *Hʌ-limʔ (? - I can't find any attestations before the 13th century AD; nonetheless it resembles lem-words for 'tongue' elsewhere in Sino-Tibetan and may be very old) 'to lick'

It is not possible to determine whether Middle Chinese *ʑ- in 'tongue' and 'lick' is from *mɯ-l- or *m(ɯ)-lj-. Coblin (1986) reconstructed medial *-i- for 'tongue' at the Proto-Sino-Tibetan level.

If the third word is related, and if the root was *√lj, then I can reconstruct

*m(ɯ)-lj-a-t (a-grade)

*m(ɯ)-lj-e-ʔ (e-grade)

It is tempting to reconstruct *m(ɯ)-lj-a-j-ʔ (a-grade), but the phonetics 氏 and 易 point to *e.

*qli-m-ʔ or *Hʌ-li-m-ʔ (zero grade; the *j of the root became *i if no vowel followed)

Classical Tibetan ljags /ldʑags/ < *n-ljaks (Jacques, "The laterals in Tibetan")

CT j is an affricate /dʑ/, whereas pre-Tibetan *j is a glide.

Although it would be nice if Tibetan had *m- like Chinese, *m-lj- would have developed into mj- /mdʑ/, not lj- /ldʑ/ (Jacques, "The laterals in Tibetan").

Written Burmese hlyā

I cannot explain the variation in final consonants (Old Chinese *-t and *-[m?]ʔ, pre-Tibetan *-ks, Written Burmese zero). I presume they are all suffixes.

The pre-Tangut source of 1ldwia must be a combination of the following elements:

ld- may be from a consonant prefix plus root *l-

-w- is from a labial prefix *P- (and that prefix might have combined with *l- to form ld-)

-i- is from *-j- and/or a presyllabic *-ɯ-

a final stop could have been lost without a trace

the tone indicates there was no final *-H

If the root was *√lj, that narrows down the possibilities.

The simplest reconstruction would be *m-lja whose *m- would combine with *l- to form ld- and condition the medial glide -w-.

A more complex reconstruction *P-N-lja would have separate sources of -d- and -w-.

Forms for 'tongue' in Horpa varieties seem to be from  *P-lj-: fʑa, vɮɛ, etc. See STEDT and the rGyalrongic Languages Database (item #36).

According to Guillaume Jacques ("The laterals in Tibetan"), Li Fang-Kuei, Coblin, and Gong all reconstructed *n-l- as the source of Written Tibetan ld- (whereas Jacques reconstructed *d-l- since his *n-l- became WT Hd- /nd/). Perhaps *N-l- similarly became ld- in Tangut. *N- may have been an *n- as in pre-Tibetan *n-ljaks 'tongue' or an *m- as in Chinese *m(ɯ)-ljat.

The only other word out of the eight I discussed yesterday that might have a cognate - with emphasis on might - is


0841 1ɬwiẹ 'oblique, slanting, inclined' = (2814 2ɬị + 3439 1piẹ) + 1223 2phɤo' (Mixed Categories of the Tangraphic Sea 12.122)

Before I go on to a possible cognate, I realize what 1223 is doing here and in various other cases. I think 1223 in such contexts means 'combine the initial of one syllable with a labial-initial syllable to form a syllable with medial -w-': e.g.,

1ɬiẹ + 1piẹ = 1ɬpiẹ > 1ɬwiẹ

Could this suggest that -w- was [v] or [β] and that Tangut labials lenited in coherent speech (as opposed to words pronounced in isolation): i.e., 1ɬiẹ 1piẹ was pronounced [ɬiẹ viẹ] or [ɬiẹ βiẹ]?

Another possibility is that labials were followed by a subphonemic glide [w]: e.g., 1piẹ /piẹ/ was [pwiẹ] and

1ɬiẹ + [1pwiẹ] = 1ɬwiẹ

There was no contrast between /P/ and /Pw/ in Tangut.
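The proposed reading of 1223 ('combine the initial of one syllable with a labial-initial syllable to form a syllable with medial -w-') can be sketched as follows. Everything here is an assumption for illustration: the function name, the pre-segmented initial passed in as a string, the short list of labials, and the choice to take the tone from the final speller.

```python
LABIALS = ("ph", "p", "b", "m")  # checked longest-first so 'ph' wins over 'p'


def combine_with_1223(initial: str, final_speller: str) -> str:
    """Combine an initial with a labial-initial final speller, yielding medial -w-."""
    tone, body = final_speller[0], final_speller[1:]
    for labial in LABIALS:
        if body.startswith(labial):
            rhyme = body[len(labial):]  # strip the labial onset
            break
    else:
        raise ValueError(f"final speller is not labial-initial: {final_speller}")
    # If the initial speller already has -w-, 1223 adds nothing new.
    base = initial[:-1] if initial.endswith("w") else initial
    return f"{tone}{base}w{rhyme}"


print(combine_with_1223("sw", "1pia"))    # 1363 'time'
print(combine_with_1223("ldw", "1pia"))   # 3190 'tongue'
print(combine_with_1223("khw", "1bɤa"))   # 5679 'remnants'
```

In these three calls the initial speller already carries -w-, so the labial of the final speller contributes nothing extra; only with a w-less initial (as in 0841) does the rule do any real work.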

That does not explain the highly anomalous fanqie for 2417 (which does not have a labial-initial final speller; moreover, its final speller has a different rhyme with the wrong tone!):


2417 1ʂwɨọ 'to need, want' = (0245 2ʂwɨi + 1449 2tʂhwɨoʳ̣̣) + 1223 2phɤo' (Tangraphic Sea 55.222)

Moreover, 1223 is redundant in cases like the one above and


5679 1khwɤa 'remnants' = (2554 1khwɤe + 4314 1bɤa) + 1223 2phɤo' (Tangraphic Sea 26.211)

in which the initial speller has -w-. Perhaps this use of 1223 originated in fanqie for words like 0841 and was overextended.

Back to cognates: 0841 1ɬwiẹ could go back to *S-P-KE-la:

*S- conditioned the tense vowel

*P- conditioned -w-

*K- fused with *l- to form ɬ-

*-E- conditioned the raising and breaking of *a to ie

The root *la would be shared with Old Chinese 邪 'awry' *sla (spelled 斜 from the 2nd century BC onwards for 'slanted'). But it is not clear if 邪 had an *l-root.

First, other *l-less reconstructions of 邪 are possible: e.g.,

*sja (Schuessler 2009 and this site)

*sə.ɢA (B&S 2014, which reconstructs the left side 牙 of 邪 as *m-ɢˤ<r>a; Schuessler 2009 reconstructs *ŋrâ and I reconstruct *ŋra)

Second, the lateral phonetic 余 *la of the later spelling 斜 is not strong evidence for an *l-root if

- Baxter and Sagart's *sə.ɢa is correct

- *l- had shifted to *ʑ- by the 2nd century BC

- *sə.l-, *s-l-, *s-ɢ-, and *sə.ɢ- had merged into something like *sj- or *zj- (i.e., a *ʑ-like cluster) by the 2nd century BC

However, Starostin reconstructed a different chronology in which laterals remained lateral as late as the 2nd century BC (i.e., during the Western Han):

*lhia > 邪/斜 Western Han *lhia > Eastern Han *zhia

*dɮa > Western Han *la > Eastern Han *ʑa

Eastern Han transcriptions of Sanskrit y- are ambiguous. Starostin might have said that Chinese *ʑ- was used for Sanskrit y- because there was no *j-. On the other hand, Schuessler would say that Chinese *j- was used for Sanskrit y-.

BIRD WORDS

At the end of my last entry, I asked what 1223 was doing in this Tangut fanqie:


1363 1swia 'time' = (5323 1swi + 0537 1pia) + 1223 2phɤo' (Tangraphic Sea 29.132)

The analysis of 1223 2phɤo' 'gentle, harmonious, together, pair'  is unknown, but it looks like 'bird' + 'word':


It is in eight fanqie in the first and third surviving volumes of the Tangraphic Sea. It might have been in the lost second volume as well.

Volume/Page/position | Li Fanwen number | Initial class | Rhyme | Reading (Nishida-style, Arakawa 1997) | Reading (this site) | Fanqie initial | Fanqie final | Gloss
1.26.211 | 5679 | V | 1.18 | 1khamba | 1khwɤa | 2554 1khwɤe | 4314 1bɤa | remnants (only in dictionaries?)
1.29.132 | 1363 | VI | 1.20 | 1špwaɦ | 1swia | 5323 1swi | 0537 1pia | time, transcription character for Chinese 宣 *swiã, 修 *siu
1.55.222 | 2417 | VII | 1.48 | 1štšhor | 1ʂwɨo | 0245 2ʂwɨi | 1449 2tʂhwɨoʳ | to need, want
1.84.253 | 1029 | V | 1.80 | 1kwɑr | 1kwaʳ | 2503 1kʊ̣ | 5528 1baʳ | to cry, weep, sob
3.11.111 | 0732 | IX | 1.64 | 1hlwạ | 1ɬwiạ | 1770 1ɬwi | 5370 1piạ | ash, dust
3.11.122 | 3190 | IX | 1.20 | 1ɬwaɦ | 1ldwia | 4226 1ldwị | 0537 1pia | tongue
3.12.111 | 2238 | IX | 1.67 | 1hlwị | 1ɬwị | 0239 1ɬiə | 5212 1pị | the surname Lhwi
3.12.122 | 0841 | IX | 1.61 | 1lwɛ̣ | 1ɬwiẹ | 2814 2ɬị | 3439 1piẹ | oblique, slanting, inclined

What is the function of 1223? It can be translated into Chinese as 合 'together', the word used in Middle Chinese transcriptions of Sanskrit to indicate that two syllables were to be read as one: e.g.,

娑婆二合 *sa ba TWO TOGETHER for Sanskrit sva

One might expect 1223 to appear in fanqie for Sanskrit transcription characters, but it doesn't; in fact, one of the fanqie is for the basic word 3190 1ldwia 'tongue'. Why wasn't its fanqie simply

4226 1ldwị + 0537 1pia

without 1223? Fanqie are by definition combinations of initials and finals; wouldn't 1223 be redundant?

In any case, 1223 is not a carryover from the Chinese lexicographical tradition, since 合 does not appear in Chinese fanqie.

1223 is interpreted in at least three ways in Arakawa's Nishida-style reconstruction:

1. Read as a sequence of two syllables:

(1kĭɛ2 + 1mba) TOGETHER = 1khamba

This is the only disyllabic reading in Arakawa's Nishida-style reconstruction.

Why isn't the combination 1kĭɛ2mba or 1kamba (if the second rhyme is copied in the first syllable)?

2. Read as a combination of the initials of the two syllables and the rhyme of the second syllable:

(1sw + 1paɦ) TOGETHER = 1špwaɦ

(2ši + 2tšhɔr) TOGETHER = 1štšhor (not 2štšhɔr!)

3. Redundant in the other five instances which might as well be normal fanqie

The first two interpretations are highly unlikely. I don't know of any transcriptions of 5679. And I doubt Chinese 宣 *swiã and 修 *siu would have been transcribed with a very un-Chinese cluster špw-.

So that leaves the third interpretation, which is also unsatisfying. What, if anything, does 1223 indicate that differentiates these eight syllables from all others in the Tangraphic Sea? I can't help but fear that the instances of 1223 in the lost second volume might not shed light on this mystery.

A PHONETIC KEY TO TANGRAPHIC SEA RHYME 1.20

Nearly fifty years have passed since the Russian translation of the Tangraphic Sea, and the Chinese translation of that dictionary turned thirty last year. An English translation would be nice but perhaps also redundant since Tangutologists should be able to read Russian and/or Chinese. Of course, English would be nice for many non-Tangutologists. What I would like to see (and make) is a Tangraphic Sea with reconstructed character readings. Since I have been writing about rhyme 20 syllables lately, here are the readings for the

rhyme 20 1sia 'to do (only in dictionaries?); transcription character for Chinese *sa, *sã and Sanskrit sa, sā'

entries in the first (level) tone* volume of the Tangraphic Sea. You can see the characters in Andrew West's online Tangraphic Sea. I have added the initial classes from Homophones. Groups are divided by circles in the original text.

Page/position Initial class Group Reading Fanqie Number of tangraphs
initial final
27.241-27.261 I 1 1pia 2228 p- 1216 6
27.262-28.111 2 1phia 0797 ph- 0618 4
28.112-28.211 3 1mia 5026 m- 3583 17
28.212-28.221 III 4 1tia 5300 t- 4620 2
28.222 5 1thia 5671 th- 1
28.231 1nia 0635 n- 3179 1
28.232 V 1kia 1484 k- 1
28.233-28.241 1gia 2900 g- 1693 2
28.251-28.262 VI 6 1tsia 3031 ts-/dz- 5
28.271 7 1tshia 3278 tsh- 1
28.272-29.111 1sia 4250 s- 2
29.112-29.121 VIII 8 1ʔia 5346 ʔ- 4620 2
29.122 IX 9 1ldia1 0475 ld- 3583 1
29.131 1ldia2 4464 ld- 2019 1
29.132 VI 1swia (5323 sw- + 0537) 1223 1
29.141-29.142 1tshwia 0311 tshw- 1289 2
29.143 IX 10 1lwia 2302 lw- 1825 2

The initial classes are in nearly the same order as Homophones except that some class VI tangraphs break up a group of class IX tangraphs.

The absence of classes II (v-) and VII (retroflex shibilants) is a trait of Grade IV rhymes.

Class IV (ɲ-?) is rare.

Some groups divided by circles correlate with homophone groups (e.g., 1-4), but others don't: e.g., the fifth group is a mixture of class III and V syllables.

Fanqie initial speller 3031 is ambiguous (see "When Rhyme 21 Is Really Rhyme 20" and "When 1825 Is Really 1829"). I would not expect 3031 to represent dz- here, since dz-tangraphs were placed in the Mixed Categories volume of the Tangraphic Sea.

I see now that I mixed up the fanqie of 1829 and 1825 (as well as those characters themselves) last week. Great. For the record, the correct fanqie are


1829 'to heat up, burn' 1tshia = 3278 1tshi + 1693 1sia (Tangraphic Sea 28.271)


1825 1tshwia 'to roast, warm up' and 5041 1tshwia 'stove, furnace' =

0311 1tshwiə + 1289 1lwia (Tangraphic Sea 29.141-29.142)
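The two corrected fanqie can be read as small equations: the onset comes from the initial speller and the rest comes from the final speller. Here is a minimal sketch of that reading. The names `Syllable`, `fanqie`, and `show` are hypothetical, the segmentation into onset and rhyme is my own (with -w- counted as part of the onset, as in the initial column of the table above), and taking the tone from the final speller is an assumption that these two examples cannot test since every syllable involved is level tone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    tone: int    # 1 = level, 2 = rising
    onset: str   # initial, including any -w- (cf. "tshw-", "lw-" in the table)
    rhyme: str   # everything after the onset


def fanqie(initial_speller: Syllable, final_speller: Syllable) -> Syllable:
    """Onset from the first speller; tone (assumed) and rhyme from the second."""
    return Syllable(final_speller.tone, initial_speller.onset, final_speller.rhyme)


def show(s: Syllable) -> str:
    """Render a syllable in the tone-onset-rhyme notation used on this site."""
    return f"{s.tone}{s.onset}{s.rhyme}"


# 1829 = 3278 1tshi + 1693 1sia
r1829 = fanqie(Syllable(1, "tsh", "i"), Syllable(1, "s", "ia"))
# 1825 = 0311 1tshwiə + 1289 1lwia
r1825 = fanqie(Syllable(1, "tshw", "iə"), Syllable(1, "lw", "ia"))

print(show(r1829))  # 1tshia
print(show(r1825))  # 1tshwia
```

A fanqie is 'balanced' in this sense when the computed reading matches the reading reconstructed for the character being spelled.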

1825 is from 1829 with a prefix *P- in addition to the *Kɯ- that conditioned aspiration and vowel breaking:

*Kɯ-tsa > 1829 tshia

*P-Kɯ-tsa > 1825 tshwia

(The bare root is in Tibetan tsha 'hot' whose initial aspiration is secondary. More cognates here.)

5041 is presumably an extended use of 1825 (i.e., 'where food is warmed up', 'device for heating').

In theory one might expect only one fanqie final speller for all rhyme 1.20 syllables or two (one for -ia and another for -wia), but in fact there are ten! That does not mean there were ten subtypes of rhyme 1.20 syllables.  Nearly all of those ten can be linked in a complex fanqie tree:

[Fanqie tree diagram: 1693 at the root, 3179 and 0618 directly under it, and 3583 and 2019 further down]

Members of that tree are in pink in the first table. (I have colored 0537 somewhat differently since it is followed by 1223. I will write about 1223 in my next entry.)

I placed 1693 at the root since its fanqie final speller is ... itself! 1693 is the final speller of 3179 and 0618, 3179 is the final speller of 4620 which is the final speller of 3583 and 2019, etc.

The final spellers 1289 and 1825 for -wia form a closed circle. 1289 is the final speller of its final speller 1825 (see above for the fanqie of 1825).
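The tree and the closed circle can be sketched as a lookup from each tangraph to its final speller. This is only an illustration built from the links stated in the two paragraphs above, not the full set of ten spellers, and the names `FINAL_SPELLER`, `chain`, and `terminal_loop` are hypothetical.

```python
# Tangraph (Li Fanwen number) -> its fanqie final speller, as stated in the text.
FINAL_SPELLER = {
    "1693": "1693",  # spelled with itself: the root of the tree
    "3179": "1693",
    "0618": "1693",
    "4620": "3179",
    "3583": "4620",
    "2019": "4620",
    "1289": "1825",  # the -wia spellers spell each other
    "1825": "1289",
}


def chain(start: str) -> list[str]:
    """Follow final spellers from `start` until one of them repeats."""
    seen: list[str] = []
    current = start
    while current not in seen:
        seen.append(current)
        current = FINAL_SPELLER[current]
    return seen


def terminal_loop(start: str) -> set[str]:
    """Return the self-spelling root or closed circle in which the chain ends."""
    path = chain(start)
    repeated = FINAL_SPELLER[path[-1]]
    return set(path[path.index(repeated):])


print(chain("3583"))          # ['3583', '4620', '3179', '1693']
print(terminal_loop("3583"))  # {'1693'}
print(terminal_loop("1289") == {"1289", "1825"})  # True
```

A chain that ends in a one-member loop has a root like 1693; a chain that ends in a larger loop is a closed circle like 1289 and 1825.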


1289 1lwia 'lower limbs, legs' = 2302 1lɨə + 1825 1tshwia (Tangraphic Sea 29.143)

I don't know why 1363 1swia wasn't spelled with either 1289 or 1825:


1363 1swia 'time' = 5323 1swi + 0537 1pia + 1223 2phɤo' (Tangraphic Sea 29.132)
Next: What is 1223 doing in that fanqie?

10.14.21:21: The numbers at the ends of

1ldia1 'to come' and 1ldia2 'to return, transport'

indicate that they were treated as nonhomophonous (heterophonous - why isn't that word used more in linguistics?) in the Tangraphic Sea (and in Homophones!) even though their fanqie seem to indicate they are homophones. Their final spellers belong to the same tree (see above), and the initial speller of 1ldia2 is derived from the initial speller of 1ldia1. See "Come Again?" for details.

"1.20" in the title of this post refers to tone one, rhyme 20.

The Tangraphic Sea volume for the second [rising] tone has been lost. Rhyme 2.17 is the rising tone counterpart of rhyme 1.20. The rhyme numbers do not match since not all level tone rhymes have rising tone counterparts and vice versa: e.g., 1.6, 1.13, and 1.16 lacked rising tone versions. Arakawa (1997) lists rhyme 1.20 and 2.17 tangraphs side by side.

THE COMING CLAN

Yesterday I reconstructed a Tangut word for 'come' with ld-. Other words for 'come' have the same fanqie initial speller (0475), so they can also be reconstructed with ld-:

3456 1ldia < *Cɯ-La 'to come'

*C- might be the *S- conditioning vowel tension (indicated with a subscript dot) in the words below. *Sɯ- could have been lost after the vowel conditioned breaking (see below) but before *S- could condition tension.

*ɯ normally conditions the breaking of *a to ɨa after *l-. Did *a break to ia after *L-?

4106 1ldɨə̣ < *S-Lə 'to come'

2373 1/2ldɨẹ < *Sɯ-La/ə-j(-H) 'to come'

The root vowel is ambiguous.

The Precious Rhymes of the Tangraphic Sea has two entries for this character, one in the level tone volume and the other in the rising tone volume. Although there are other characters with two readings, I don't know of any other case in which the two readings only differ in tone.

5727 1ldɨə̣ < *S-Lə 'to transport, come' (homophone of 4106; cf. how 3456 is nearly homophonous with 3502 'to transport', written as a mirror image of 5727 and derived from it)


I wrote the pre-Tangut source of ld- as *L-. External evidence may help us identify what *L- was. There are many Sino-Tibetan words for 'come' with l-; at least one (Mandarin 來 lai < *mʌ-rək) is not related to the others. Do the Tangut words belong to this clan of l-words? If so,

- do the other languages preserve a root-initial l- that gained a prefix in Tangut?

cf. how *d-l- became ld- in Tibetan (Jacques, "The Laterals in Tibetan")

- or does Tangut preserve a cluster reduced to l- in other languages?

- or are both Tangut ld- and non-Tangut l- from a third source in Proto-Sino-Tibetan?

COME AGAIN?

(23:09: The title refers to this idiom and to the fact that 3456 'come' is followed by 3502, another Tangut character containing it in Homophones.)

After two steps backward ... one step forward ... I hope.

In my last post, I mentioned


3456 1lia (Grade IV) 'to come' = 0475 1liu (Grade IV) + 3583 1tia (Grade IV)

which has no homophones: it is in the isolated liquid-initial section of Homophones (A edition, 55A54).

Right below it in Homophones (A edition, 55A55) is


3502 1lia (Grade IV) 'to return, transport' = 4464 1lɨə̣ (Grade III) + 2019 1thia (Grade IV)

which looks like 3456 'come' plus 'hand' and is derived from all of 'come' and the left side of 5727 1lɨə̣ 'transport, come' (also containing 'hand' and 'come' in reverse order) in Tangraphic Sea:


I have followed Gong who reconstructed 3456 and 3502 as homophones in spite of the fact that they are isolates. It would also be hard to distinguish them in context since both are motion verbs. But if they weren't homophones, what was the difference between them?

Could they have had different initials? Their initial spellers are of different grades (III and IV). So perhaps 3456 had Grade IV [l] whereas 4464 had Grade III velarized [ɫ]. If they had identical finals, I would have to posit a phonemic distinction between /l/ and velarized /ɫ/. Sofronov (1968 II: 308) reconstructed 3456 as 1la and 4464 as 1lda. But how could there be such a distinction if the two initial spellers were part of the same fanqie chain?


4464 1lɨə̣ (Grade III) = 0475 1liu (Grade IV) + 1493 siə̣ (Grade IV)

(There was no /ɨə̣/ : /iə̣/ distinction; the quality of the first vowel was dependent on the initial.)

Tai (2008: 201) reconstructed the initial of that chain as ld- since it was transcribed in Tibetan as ld- (11 times) and  zl- (3 times), but never as a simple l- (Tai 2008: 198). That initial was transcribed in late 12th century northwestern Chinese as *l- which is not necessarily evidence for reconstructing Tangut l-. Chinese *l- would have been the best available substitute for an un-Chinese *ld-. (There was no *d- in that Chinese dialect.) Hence there seem to have been two kinds of 1ldia.

I cannot reconstruct either 3456 or 3502 with -w- since the fanqie do not contain such a medial. The final spellers were transcribed in Tibetan without -w- (Tai 2008: 210):

3583: ta (37 times)

2019: tha (9 times)

3583 was also used to transcribe Sanskrit ṭa, ta, and without -v- (Sanskrit had no -w-).

The Chinese transcriptions 怛 *ta and 達 *tha for 3583 and 2019 lack *-w-.

None of the transcription evidence supports the -i- required by my Grade IV hypothesis or Gong's -j-. Sofronov's (2012) -a is much more likely for rhyme 20 which he regarded as Grade I, not IV. The l- from earlier in this post would be unusual before a Grade IV rhyme but normal before a Grade I rhyme. Sofronov (2012) sometimes reconstructed more than one value for a single Tangut rhyme, but rhyme 20 was not one of them. At this point I can only combine Tai's ld- with Sofronov's 1-a and be agnostic about the difference between the two 1lda-like syllables (3456 and 3502).

WHEN 1825 IS REALLY 1829

What's worse than having to publicly correct a mistake on a blog? Having to publicly correct that correction!

Andrew West pointed out that the correct fanqie for Tangut character 3371 (and its homophones 0596 and 1283) is

3371, 0596, 1283 1dzia = 3031 + 1829 (not 1825!)

I got the idée fixe that 1825 was the final speller and didn't notice that the character in the fanqie is actually 1829, with the same left-hand radical 'fire' and a similar right-hand radical, in the handwritten copy of the Tangraphic Sea in Wenhai yanjiu (1983) or Arakawa's Seikago tsūin jiten (Tangut rhyme dictionary, 1997).

Notice that I have not supplied readings for 3031 or 1829.

I have already explained why 3031 is ambiguous, and I will add one more complication here:

- 3031 is the initial speller for 3371, 0596, and 1283 which are in the Mixed Categories of the Tangraphic Sea. For some reason, all dz-, dʐ-, and ɬ-syllables were placed in Mixed Categories along with a seemingly random smattering of other syllables. That suggests 3371, 0596, and 1283 had dz-.

- On the other hand, 3031 is in the 'rising' tone volume of Precious Rhymes of the Tangraphic Sea instead of the Mixed Categories volume. That implies 3031 did not have dz-.

The fanqie for 1829 indicates -w- ... or does it? There is no transcription evidence for the -w- of 1829, its final speller 1289 1lwia or 0259 1lwia, the only homophone of 1289. -w- is an attempt to account for why 1289 1lwia is not in the same homophone group as

3456 1lia 'to come'

whose Chinese transcription 辢 *la has no *-w-. (Then again, that transcription is not ironclad proof that 3456 didn't have -w-, because the Chinese known to the Tangut had no syllable *lwa. Nonetheless a Tangut lwia could have been transcribed in Chinese as 辢 *la^CLOSED, with a small 合 'closed (mouth)' diacritic to indicate -w-.) 1289 and 3456 had the same initial (l-) and rhyme (1-ia), so they presumably had different medials (-w- and zero).

If 1289 was 1lwia, then 1829 was 1tshwia, and 3371, 0596, and 1283 were 1dzwia ... which conflicts with the use of 0596 as a transcription character for Sanskrit ja without -v- (there is no -w- in Sanskrit).

Let's suppose that 3371, 0596, and 1283 were 1dzia without -w- and that their fanqie final speller 1829 was 1tshia without -w-. 1829 and 1825 were in different homophone groups even though they had the same initial (tsh-) and rhyme (1-ia), so they presumably had different medials (zero and -w-). But if 1825 was 1tshwia, why was it transcribed in Tibetan as tsha instead of tshwa? Was the subscript -wa character accidentally omitted?

This is so frustrating. I want to end on a more positive note. Andrew West recently created an online Homophones lookup tool. You can input the Li Fanwen 2008 numbers I use for tangraphs to see that

- 3371, 0596, 1283 1dz?(w)ia are in the same homophone group (31A46-48; all Homophones numbers here are from the A edition; different editions have different numbers)

- 1829 1tsh(w)ia (the final speller for those three syllables) and 1825 1tsh(w)ia (which I confused with 1829) are in different homophone groups (31B36 [which has no homophones] and 33A13-14 [a set of two homophones: 5041 and 1825])

- 1289 1lwia (the final speller for 1829) and 3456 1lia are in different homophone groups (53B78-54A11 [a set of two homophones: 1289 and 0259] and 55A54 [which has no homophones])
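The three observations above amount to comparing Homophones positions. The sketch below hand-copies the A edition positions cited in this post into a dictionary. It does not call Andrew West's lookup tool, and the names `HOMOPHONES_A` and `same_group` are hypothetical.

```python
# Homophones (A edition) positions as cited above, keyed by Li Fanwen 2008 number.
HOMOPHONES_A = {
    "3371": "31A46-48",
    "0596": "31A46-48",
    "1283": "31A46-48",
    "1829": "31B36",
    "1825": "33A13-14",
    "5041": "33A13-14",
    "1289": "53B78-54A11",
    "0259": "53B78-54A11",
    "3456": "55A54",
}


def same_group(a: str, b: str) -> bool:
    """True if two tangraphs are listed in the same homophone group."""
    return HOMOPHONES_A[a] == HOMOPHONES_A[b]


print(same_group("3371", "0596"))  # True
print(same_group("1829", "1825"))  # False
print(same_group("1289", "3456"))  # False
```

When two tangraphs share an initial and a rhyme but `same_group` is false, something else must distinguish them, which is the reasoning behind reconstructing -w- in one member of each pair.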

Alas, Homophones does not give any concrete information about the homophone groups beyond their initial classes: e.g., 3371, 0596, 1283, 1829, and 1825 belong to the sixth class (alveolars) and 1289 and 3456 belong to the ninth class (liquids). The Tangraphic Sea lists homophone groups organized by rhyme with fanqie, but fanqie for most 'rising' tone syllables are lost, and readings for fanqie spellers are dependent on a mixture of transcription evidence and educated guesswork (e.g., the reasoning for reconstructing -w- above).

WHEN RHYME 21 IS REALLY RHYME 20

(10.11.18:25: Formerly titled "Tangut Grade III -a('): Rhymes 19 and 21 (Part 2)", but I changed the title since this entry has nothing to do with either rhyme apart from my confusion of rhymes 20 and 21.)

If you don't want to constantly make a fool of yourself in public, don't blog about Tangut.

For the past couple of days, I've been reconstructing 3371 as 1dzɨa' with Grade III rhyme 21 which would be unusual after dz-, but its fanqie in the Mixed Categories of the Tangraphic Sea clearly indicates that it has Grade IV rhyme 20 which is normal after dz-:


3371 1dzia = 3031 2dzi + 1825 1tshwia (sic; should be 1829!)

Even this corrected (?) reading remains problematic for several reasons.

First, the initial might be ts-. The evidence is ambiguous:

1. There is no fanqie for 3031, the initial speller of 3371.

2. 3031 was used to transcribe

Chinese characters with *ts-readings

Sanskrit ci (pronounced [tsi] in the variety of Sanskrit known to the Tangut, probably via Tibetan which had [ts] for Sanskrit c).

3. 3031 was transcribed in Tibetan as both Hdza and Htsa. The phonetic value of H- is uncertain: it could have represented prenasalization or a voiced back fricative.

4. Another character

1290 2?-ew 'ordinal suffix, class, limitation'

with 3031 as a fanqie initial speller was transcribed in Tibetan as tsa, tsi(H), gtsiH, and gdzi(H).

5. 3371 was homophonous with

0596 'to grow'

a transcription character for Sanskrit ja (pronounced [dza] in the variety of Sanskrit known to the Tangut, probably via Tibetan which had [dz] for Sanskrit j).

Second, it would be odd for a -wia graph (1825; sic - should be 1829!) to be a fanqie final speller for -ia without -w-. But it would also be odd for Sanskrit ja [dza] to be transcribed as dzwia instead of dzia.

The Tibetan transcription of 1825 is tsha, not tshwa. So maybe 1825 lacked -w- after all. And maybe it lacked -i- as well. A Sofronov-style reconstruction of 1825 as 1tsha may be best. But then how can one explain the different fanqie for the other 1tsha (or 1tshia) in the Tangraphic Sea?


1829 'to heat up, burn' 1tshia = 0311 1tshwiə + 1289 1lwia

(10.14.20:00: This is actually the fanqie for 1825!)

Maybe 1829 had -w- and 1825 and its homophone

5041 'stove, furnace'

did not. Their fanqie has no -w- in either speller:


1825 and 5041 1tshia = 3278 1tshi + 1693 1sia (used to transcribe Sanskrit sa)

(10.14.20:00: This is actually the fanqie for 1829!)

I will revise my reconstructions accordingly:

Tangraph Sofronov 1968 Li Fanwen 1986 Gong Nishida-style reconstruction in Arakawa 1997 This site
1829 1tsha 1tsha 1tshja 1tshaɦ 1tshwia (formerly 1tshia)
1825 1tshwa 1tshɛ 1tshjwa 1tshaɦ² 1tshia (formerly 1tshwia)

(10.14.20:04: No, judging from the corrected fanqie, Sofronov and Gong were right to reconstruct -w- in 1825 and 5041! Which means that the equation below is still 'broken' or 'unbalanced', depending on your preference in metaphors.)

Plugging that revised reconstruction of 1825 back into the fanqie at the beginning of this post results in a balanced equation:

3371 1dzia = 3031 2dzi + 1825 1tshia

The two homophones of 3371 listed in Mixed Categories of the Tangraphic Sea share that fanqie and should also be read as 1dzia:

0596 'to grow' and 1283 'stomach' (attested only in dictionaries)

This entry demonstrates how errors and their corrections can cause chain reactions in Tangut reconstructions.

I have eliminated one type of apparent anomaly in rhyme 21: the combination of an alveolar initial dz- with the Grade III medial -ɨ-. But other anomalies remain, and I will examine them in future entries.

TANGUT GRADE III -A('): RHYMES 19 AND 21 (PART 1)

Last night I mentioned the words (phrases?)

3371 0378 1dzɨa' 2ʔʊ 'curled hair' and 3371 1144 1dzɨa' 2dị 'bun (of hair)'

and noted that their first syllables had an anomalous initial-rhyme combination. (No, actually they don't!)

3371 has the Grade III rhyme 21 (= 1.21/level tone rhyme 21 and 2.18/rising tone rhyme 18). (10.10.20:01: The true rhyme of 3371 is 20.) Here are the latest reconstructions of that rhyme and its immediate neighbors in the first rhyme cycle:

Rhyme Tibetan transcription Gong 1997 Arakawa 1999 Sofronov 2012 This site
Grade Rhyme Grade Rhyme Grade Rhyme Grade Rhyme
17 -a(H) I -a I -a I I -a
18 (none) II -ia II -ya II -ɑ̂ II -ɤa
19 -a(H) III -ja IIIa -a: III -jɑ III -ɨa
20 IIIb I -a IV -ia
21 III -jaa IV -ya: II/IV -â/-ä III -ɨa'
22 -ang (sic!) I -aa I -a' I -aˁ I -a'
23 -ar II -iaa II -ya' II/III/IV -âˁ/-jaˁ/-äˁ II -ɤa'
24 -a(H) III -jaa III -a:' I/II -aɯ/-âɯ IV -ia'
25 -am I -ã I -an I -an I -ã
26 (none) II -iã II -yan II -ân II -ɤã
27 III -jã III -a:n III/IV -jan/-än III/IV -ɨã/-iã

In Gong's reconstruction, there is no Grade III/IV distinction, and many rhymes are redundant: e.g., rhymes 21 and 24. Hence Gong regarded

3371 1dzjaa (rhyme 21; = my 1dzɨa') 'hair worn in a bun; peak' and 4075 1dzjaa (rhyme 24; my 1dzia') 'thrifty'

(10.10.20:02: 3371 should be 1dzia with rhyme 20.)

as homophones in spite of their placement into different rhymes and homophone groups in the Tangraphic Sea. They are not homophonous in the other three reconstructions.

In Arakawa's reconstruction, rhyme 21 is the only Grade IV rhyme, and it has a combination of the -y- of his Grade II and the vowel length of his Grade III.

Sofronov's reconstruction is very different from all others: e.g., it has Grade II and Grade IV variants of rhyme 21. Sofronov reconstructs five subtypes of a-rhymes corresponding to three subtypes in the other reconstructions.

In my reconstruction, Grade III rhymes are characterized by medial -ɨ- and are distinct from Grade IV rhymes with -i-. Grade III and IV rhymes typically have different initials:

III: v- (= w- in most other reconstructions), shibilants (tʂ-, tʂh-, dʐ-, ʂ-, ʐ-), l- (cf. Grade II which occurs with shibilants but not sibilants or r-)

All of these initials are associated with Grade III in the Late Middle Chinese (LMC) of the rhyme table tradition. (So are many other LMC initials other than sibilants and *ɣ-.) In LMC, Grade III was nonpalatal and Grade IV was palatal. Assuming that the Tangut carried over that distinction into their analysis of their own language, Tangut Grade III initials must have been nonpalatal. Tangut l may have been velarized [ɫ].

IV: all other initials (cf. Grade I which occurs with all non-shibilants)

However, this correlation between grade and initial is not absolute: e.g., 1dzɨa' has a dz- that normally should precede a Grade IV rhyme. Hence the distinction between medial /ɨ/ and /i/ is phonemic as well as phonetic, and the Tangut created separate rhyme categories whenever the medial could not be predicted on the basis of the initial. Minimal pairs like 3371 and 4075 above necessitated the separation of rhymes 21 and 24. (10.10.20:05: 3371 1dzia [not 1dzɨa'!] and 4075 1dzia' actually differ in terms of the presence or absence of the mysterious feature that I write as -', not in terms of medials.)
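The typical pairing of initials and grades can be sketched as a default rule plus a check for exceptions. This covers only the Grade III/IV contrast described above. The names `expected_grade`, `expected_medial`, and `is_anomalous` are hypothetical, and the initials are plain strings in my notation.

```python
# Initials that typically take Grade III in my reconstruction.
SHIBILANTS = {"tʂ", "tʂh", "dʐ", "ʂ", "ʐ"}
GRADE_III_INITIALS = SHIBILANTS | {"v", "l"}


def expected_grade(initial: str) -> str:
    """Default grade for an initial: III for v-, shibilants, l-; IV otherwise."""
    return "III" if initial in GRADE_III_INITIALS else "IV"


def expected_medial(initial: str) -> str:
    """Grade III has medial -ɨ-; Grade IV has medial -i-."""
    return "ɨ" if expected_grade(initial) == "III" else "i"


def is_anomalous(initial: str, medial: str) -> bool:
    """True if the medial is not the one predicted by the initial."""
    return medial != expected_medial(initial)


print(expected_grade("dz"))     # IV
print(is_anomalous("dz", "ɨ"))  # True
print(is_anomalous("ʂ", "ɨ"))   # False
```

A syllable flagged by `is_anomalous` is either evidence for a phonemic medial contrast or a sign that the reconstruction needs rechecking.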

On the other hand, I presume all medials in rhyme 27 were nondistinctive (and predictable?*) as suggested by the mixture of Grade III and IV in this rhyme 27 fanqie:


1ʂɨã (Grade III) = 2ʂɨu (Grade III) + 1kiã (Grade IV!)

Hence there was no need to create separate rhyme categories for -ɨã and -iã syllables.

I'll start looking at the unpredictable medials of rhyme 21 and its -'-less counterpart 19 this weekend.

*It is possible that -ɨ- and -i- were completely interchangeable in rhymes like 27: e.g.,

1ʂɨã ~ 1ʂiã (cf. Grade III rhyme 36 1ʂɨe; there is no Grade IV rhyme 37 *1ʂie)

1kɨã ~ 1kiã (cf. Grade IV rhyme 37 1kie; there is no Grade III rhyme 36 *1kɨe)

It is also possible that rhyme 27 had only one medial (-ɨ- or -i-) after all initials, so all rhyme 27 syllables were Grade III or IV.

It is not possible to choose between these alternatives at this point. It might be more accurate to write the medial of rhyme 27 with an algebraic symbol like -I-. However, I have already used that symbol to represent a lost unstressed presyllabic vowel conditioning the raising and fronting of pre-Tangut *a to i. I assign medials to rhyme 27 syllables following the general pattern: -ɨ- after shibilants (there are no v- or l-rhyme 27 syllables) and -i- after other initials.

WHIP = TSU + SHARP + ?

If 0219 2tseʳw 'whip' has three sources, the first two might be one of three tangraphs with a TSU-type reading and 3767 1reʳw 'sharp, pointed end':


What might be the third? There are nine tangraphs sharing a right side with 0219 that I didn't cover last Saturday:

LFW2008 Tangraph Reading LFW2008 gloss Class(es)
0054 1tswa hair worn in a bun or coil HAIR
0375 1ka second syllable of 2phʊ 1ka 'boots worn in rain or snow' HAIR (fur boots?)
0378 2ʔʊ second syllable of 1dzɨa' 2ʔʊ 'curled hair' HAIR
1144 2dị second syllable of 1dzɨa' 2dị 'bun (of hair)' HAIR
2279 1swa second syllable of 2siọ 1swa 'a kind of grass' SWA
4021 1swa second syllable of 1niu 1swa 'ear ornament' SWA
4045 1swa hair HAIR, SWA
4371 1dạ second syllable of 2me 1dạ 'hair' HAIR
5133 2rieʳ wool, feather, fine hair HAIR

All of the above characters either represent (parts of) words for hair or syllables homophonous with 1swa 'hair'. So 2tseʳw 'whip' is either 'TSU + hair' or 'TSU + sharp + hair'.

Two of the above characters (0378, 1144) are only attested after


3371 1dzɨa' 'hair worn in a bun or coil; peak (< like a bun of hair on the top of the head?)' = 2750 1ɣɤu 'head' + 1lwʊ̣ 'to mix, blend'

They may be adjectives modifying 1dzɨa'.

Both the structure and pronunciation of 3371 are odd to me (10.10.20:15: because I reconstructed 3371 incorrectly! It should be 1dzia with a Grade IV rhyme, not 1dzɨa' with a Grade III rhyme.) I wouldn't describe a bun or coil as mixed and blended hair. And Grade III rhymes with -ɨ- normally don't follow alveolars. I will take a closer look at -ɨa' tomorrow.

WERE TANGUT WHIPS SHARP?

On Sunday I concluded that the left side of 0219 2tseʳw 'whip' might be an abbreviation of some tangraph with a TSU-type reading, though I admit the phonetic match is poor:


2tseʳw 'whip' < left of 1tshwiu, bottom left of 2dziu', or right of 2dʐwɨiw?

I also identified the rest of 0219 as being from

2061 2pɤẹ̃ 'hair'

as a whole. And on Saturday I used Google to demonstrate that whips are associated with hair in English, though of course there is no guarantee the Tangut also had such an association.

2061 of course consists of two components. Maybe each of those components in 0219 2tseʳw 'whip' is from a different source. Let's look at eleven possible sources of

the center of 0219:

LFW2008 Tangraph Reading LFW2008 gloss Class(es)
1146 2kạ tattered TATTER
1964 ?ɬə smooth LHY, SMOOTH
2434 1bie to mend, patch BE, TATTER (i.e., to fix tatters)
2600 1miaʳ hair HAIR
3088 1bie second syllable of 2bə 1bie 'dung beetle' BE
3089 1tʂɨọ ugly UGLY
3090 2ɬọ first syllable of 2ɬọ 2ɬwi 'ugly and old'; can it stand alone? UGLY
3558 2pɤẹ̃ first syllable of 2pɤẹ̃ 2ba 'flattery' BE
3767 1reʳw sharp, pointed end SMOOTH (left and center from 1963 'smooth')
4330 1ʔị ladle, scoop I (bottom center and right from 3101 2ʔị 'to repeat')
4817 ?ɬə plane for carpentry LHY

I have excluded five tangraphs containing 2061.

The classes can be grouped into three families:


SMOOTH > LHY > (UGLY if 2ɬọ had a ?ɬə tangraph as phonetic)


The last is an unusual case, as the shape of the bottom center component of 4330 1ʔị 'ladle' does not match its source 3101 2ʔị 'to repeat' in its Tangraphic Sea analysis:


The source of the top and bottom left of 4330 1ʔị 'ladle' is 4368 2dwʊ 'chopsticks'.

Among these characters, the best candidate for a source of 0219 'whip' is 3767 1reʳw 'sharp, pointed end'. I wish I knew more about Tangut material culture. Did Tangut whips have sharp ends?

THE APPEARANCE OF ANGER

Two of the Tangut words in yesterday's table

0924 2niạ 'anger, rage' and 0996 2mə 'appearance, spirit'

were borrowings from Chinese 惱 'angry' and 模 'pattern' according to Li Fanwen (2008: 156, 167).

The first etymology would work only if there was a pre-Tangut prefix *Sɯ- of unknown function (!) added to *nawʔ from Middle Chinese *nawˀ. The *S- of the prefix conditioned vowel tension (indicated by a subscript dot) and the high vowel of the prefix conditioned the -i- in the main syllable:

*Sɯ-nawʔ > *Sɯ-nɨawʔ > *Sɯ-nɨaɯʔ > *S-nɨaɯʔ > *nnɨaɯʔ > *ṇɨaɯʔ > *ṇɨ̣ạɯ̣ʔ > 2niạ

The relative chronology of changes is not entirely clear, though *a-breaking must have preceded *ɯ-loss and *S-tension.

I once thought Tangut rhymes ending in the algebraic symbol -' (corresponding to what I used to reconstruct as long vowels) once had final consonants:

-V' (= -VV) < *-VC

If that were the case - and I don't think it was* - then the absence of -' in 0924 2niạ would not rule out an earlier final consonant (i.e., *-w) since -' could not occur with tense vowels. This complementary distribution is a clue to the identity of -' which had to have some phonetic characteristic that was incompatible with tense vowels.

The second etymology is highly improbable because Middle Chinese 模 *mo 'pattern' should correspond to Tangut *2mʊ, not 2mə. (See Gong 2002: 413 for examples of MC *-uo : Tangut -u which is equivalent to MC *-o : Tangut -ʊ in my reconstruction. I regret not including the raising of *-o to *-ʊ in pre-Tangut.)

There are isolated instances of the correspondences

Tangut : Japhug rGyalrong -u < *-o, -ɯ < *-u

in Jacques (2006), but the general pattern is clear:

Tangut (= Jacques' -u) : Japhug rGyalrong -u < *-o, -ɯ < *-u

2mə 'spirit' may be an unrelated homophone of 2mə 'pattern' that was written with the same character.

The Precious Rhymes of the Tangraphic Sea analyzed the graph 0996 for 2mə as being from


the top of 1365 and the bottom of 4744 2ʔiõ 'appearance' (a loan from Middle Chinese 樣 *jɨaŋʰ or Tangut period northwestern Chinese *jõ).

Li may have been tempted to have derived 2mə from Middle Chinese 模 *mo 'pattern' since the word appears with the clarifying character 4744 in Homophones:

2ʔiõ 2mə

He translated that collocation as 模樣 'pattern' which would have been read as *mo jɨaŋʰ in Middle Chinese - a near-mirror image of 2ʔiõ 2mə! I think this resemblance is coincidental. In Tangut period northwestern Chinese, 模樣 was something like *mbʊ jõ which would have been borrowed into Tangut as *bʊ 2ʔiõ. (Tangut tones for Chinese loans are unpredictable, so I have not indicated the hypothetical tone of the first syllable.)

The analysis of 0924 2niạ 'anger, rage' is unknown. Perhaps it was from the top and bottom left of 0948 1na 'to steal' (phonetic) plus 'demon' (semantic) extracted from one of forty-nine different possible characters:


('Demon' has left-hand and right-hand forms which are interchangeable in tangraphic analyses.)

None of the other 'demon' characters mean 'anger', so none stand out as more likely sources than others.

*My old -V' < *-VC hypothesis would not predict Tangut-Japhug rGyalrong comparisons such as these from Jacques (2006):

'nose': 5700 2ni' (not *2ni) : J sna

'needle': 4935 1ɣa (not *1ɣa') : J ta-qaβ

Correlations between Tangut -' and Japhug final consonants in sets such as

'fruit': 2436 1mia' : J sɯ-mat

may be coincidental.

WHAT PLUS 'HAIR' EQUALS 'WHIP'?

If the center and right components of

0219 2tseʳw 'whip'

are from

2061 2pɤẹ̃ 'hair',

what is the source of the left-hand component


None of the 69 other characters with that component are a plausible semantic match for 2tseʳw 'whip' which may belong to the TSU phonetic class:

LFW2008 Tangraph Reading LFW2008 gloss Class(es) Class codes
0009 1ʂwɨo to appear; to raise (< 'cause to appear'?) APPEAR S1
0020 1tʂɨa road, way (literal and metaphorical: 'manner'); to lay bricks CHA, ROAD P1, S2
0029 2rɪʳ market RIR P2
0033 1tshị road, way ROAD S2
0051 1thaʳ' obvious APPEAR S1
0060 1dõ street ROAD S2
0130 2thʊ source, resources TU? P3?
0486 2paʳ horse with white trotters PAR P4
0503 1tʂɨa the surname Cha CHA P1
0745 2vɨe the surname Ve VE P5
0752 1tʂɨa ceremony, courtesy CEREMONY, CHA S3, P1
0753 2vɨe face VE P5
0760 2dʐɨe to judge, decide JE P6
0924 2niạ anger, rage NA P7
0948 1na to steal, rob NA P7
0996 2mə appearance, spirit APPEAR S1
1003 1lew full, filled, satisfied not HOLLOW?, LU? (but analysis has 1630 2dziẽ 'carve'!) S4
1026 1tʂwɨa the name Chwa; luck CHA P1
1071 2dziu' first half of 1071 1226 'to hide, conceal' HIDE, TSU? S5, P8?
1082 2riʳ second syllable of surnames ending in Rir RIR P2
1094 2ʐɨə to go without a burden GO S6
1226 ?T- second half of 1071 1226 'to hide, conceal' HIDE, TU? S5, P3?
1360 1va to hide, conceal HIDE S5
1364 1ŋa hollow, void HOLLOW, NGA S4, P9
1578 1swiə ear EAR S7
1588 1tʂɨa sheep guardian god CHA, SHEEP P1, S8
1630 2dziẽ to carve CARVE, JE S9, P6
1641 2dʐɨa lamb CHA?, SHEEP? (but analysis has 1043 1lew 'full') P1?, S8
1651 1tshwiu to salute CEREMONY, TSU? S3, P8?
2663 1kwiə̣ to kowtow, worship on bended knees CEREMONY S3
2755 2lwəʳ the surname Lwyr LWYR P10
2972 1ŋa to spread; Grinstead: 'empty' HOLLOW?, NGA S4?, P9
3049 1xwaʳ to melt, thaw; to confess (< 'melt down' and release information?) XA, SPEAK P11, S10
3575 2ni to listen, hear EAR S7
3579 2kie impressive and dignified, eminent APPEAR (i.e., prominent?), CEREMONY? S1, S3?
3689 1lʊ' to dig LU P12
3731 1khɤu' to milk KHY P13
3813 2vɨẹ to see someone off VE P5
3821 2lʊ to enjoin; to tell; to give a present CEREMONY?, LU, SPEAK S3?, P12, S10
3828 1tʂɨə to give a present; to enjoin; to tell; to know CEREMONY?, CHA, SPEAK? (but no CEREMONY, CHA, or SPEAK graph in analysis which has 3813 2vɨẹ 'to see someone off') S3?, P1?, S10?
3874 1ʔiə hunger HOLLOW (lacking food) S4
3920 1kiụ to bow, salute CEREMONY S3
4110 2paʳ awning, shed PAR P4
4153 2lɨiw to gather, assemble; transcription character LU? P12?
4170 1dza to chisel CARVE S9
4185 2ʂɨa musk SHA P14
4201 ?kha casket, small box XA? P11?
4469 2ʂɨi to go toward, depart GO S6
4475 ?xa to puff, blow; transcription character XA P11
4534 2dʐwɨiw hungry HOLLOW (lacking food; but analysis has 0130 'source')?, TSU? S4?, P8?
4677 2bə bull BA P15
4681 1niu ear EAR, NU S7, P16
4682 2khiə' chimney, window, hole, space HOLLOW, KHY S4, P13
4696 1bạ cymbals BA, CYMBAL P15, S11
4744 2ʔiõ appearance, shape; transcription character APPEAR S1
4758 1tsiə big cymbals CYMBAL S11
4761 1ʂwɨa to speak, say SHA, SPEAK P14, S10
4762 1tʂhɨe to go, walk GO S6
4766 2bə a kind of vegetable BA P15
4768 1ʂwɨa ambition, will SHA P14
4812 2rioʳ to brush, wipe, whisk RIR? P2
4822 2dzwiə to go, walk GO S6
4849 1niu the surname Nu NU P16
4894 1mio to listen, hear EAR S7
5126 1lɨu' to carve, engrave CARVE, LU S9, P12
5412 2lwəʳ ceremony, rite; to get a haircut; transcription character CEREMONY, LWYR S3, P10
5693 1vɪʳ to listen, hear EAR, VE S7, P5
6010 1kiụ to bow, salute (= 3920) CEREMONY S3

I have numbered phonetic (P) classes by order of first occurrence in the table above. Class names are in my lay romanization for Tangut which ignores the four grades, vowel tension, and the unknown distinction indicated by -'. Y represents central nonlow vowels.

Phonetic classes organized by Homophones chapter

Chapter Initial type Phonetic class
I Labials P4. PAR, P15. BA
II Labiodentals P5. VE
III Dentals P3. TU, P7. NA, P16. NU
IV Retroflexes (none)
V Velars P9. NGA, P13. KHY
VI Alveolars (no pure VI classes) P6. JE, P8. TSU
VII Alveopalatals (actually retroflex shibilants?) P1. CHA, P14. SHA
VIII Glottals P11. XA
IX Liquids P2. RIR, P10. LWYR, P12. LU

Some of the phonetic classes could be combined (P4. PAR + P15. BA, P1. CHA + P14. SHA, P10. LWYR + P12. LU).

P6 and P8 might be split, as I am not certain that mixing class VI and VII initials was permissible in Tangut phonetic series.

I have also numbered semantic (S) classes by order of occurrence:






S6. GO






10.6.0:59: Some of those 27 classes could be combined into even bigger classes using ambiguous graphs as pivots: e.g., 0020 can either be CHA or ROAD, so CHA and ROAD graphs could be grouped together. Here is one particularly large group containing 18 classes:


That diagram is meant to be read from left to right: e.g.,


Two smaller groups are



Three classes cannot be combined with others: GO, NA, RIR.
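The pivot-based grouping described above amounts to finding connected components: two classes fall into the same group if a chain of ambiguous graphs links them. Here is a minimal Python sketch of that idea. Only the CHA-ROAD link (via 0020) comes from this post; the other links and the short class list are placeholders chosen to show chaining, and `group_classes` is a hypothetical helper name.

```python
from collections import defaultdict

def group_classes(classes, pivot_links):
    """Merge classes into groups: two classes end up together if a chain
    of ambiguous (pivot) graphs connects them."""
    parent = {c: c for c in classes}

    def find(c):
        # Walk up to the group's representative, shortening the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in pivot_links:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for c in classes:
        groups[find(c)].add(c)
    # Largest groups first, then alphabetical, so the output is stable.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))

# CHA-ROAD is the link named in the post (graph 0020).
# The other two links are illustrative placeholders, not attested pivots.
classes = ["CHA", "ROAD", "SHA", "SPEAK", "GO", "NA", "RIR"]
links = [("CHA", "ROAD"), ("CHA", "SHA"), ("SHA", "SPEAK")]
groups = group_classes(classes, links)
for g in groups:
    print(g)
```

Classes with no pivot link (GO, NA, RIR here) come out as groups of one, which matches the idea of classes that "cannot be combined with others".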

Thus one could say there are six kinds of :



3. EAR

4. GO

5. NA

6. RIR

But I doubt literate Tangut actually looked at, say, 0760 2dʐɨe 'to judge' and thought, "Its left side indicates that it has a JE-like reading like 1630 2dziẽ 'carve', derived from the right side of 5126 1lɨu' 'to carve', in turn derived from the bottom left of 3821 2lʊ 'to give a present', in turn derived from the center of 5412 2lwəʳ 'ceremony'":


How did the Tangut learn and perceive their own script?

ARE WHIPS LIKE HAIR?

It was fun to use tentative Unicode code points for Tangut characters and components in my last post, but now I'm going to use Li Fanwen 2008 numbers again.

I've been trying to figure out the graphic etymology of

0219 2tseʳw 'whip'

The left side is shared with 69 other characters which don't seem to have any phonetic or semantic similarity to 2tseʳw 'whip'. I'll look at them again and post a list tomorrow.

The center and right components appear in five other characters. I already mentioned the first yesterday:

LFW2008 Tangraph Reading LFW 2008 gloss Character structure


2pɤẹ̃ hair left of 'hair' + left of another graph for 'hair'


2mioʳ second syllable of 2177 0227 1pə 2mioʳ 'rude, coarse, careless' 'language' + 'hair': i.e., coarse words are rude


2phʊ boots worn in rain or snow 'boots' next to 'hair': i.e., furry boots


2giu silk, silkworm 'bug' atop 'hair' (i.e., silk thread)


2ɬɤi smooth, glossy 'not' next to 'hair'

If the right two-thirds of 0219 were taken as a unit, then 'hair' is the most likely source. Although a whip is not much like a hair, it is even less like 'rude', 'rain and snow boots', 'silk(worm)', or 'smooth'. Moreover, none of the five sound like 2tseʳw.

I'll break up that two-thirds and see if I can find more plausible graphic sources.

10.5.0:30: Are whips associated with hair on Google?

"whip like a hair": 0 results

"whips like hair": 2 results

"whips made of hair": 7 results

"hairs like whips": 229 results

"hairs whip": 374 results

"hair whips" 32,100 results

"hair like whips": 39,400 results

"whip hair": 62,200 results

"hair whip": 93,500 results

"hair like a whip": 273,000 results

Of course modern English usage is not the key to the ancient Tangut mind. Nonetheless, the whip-hair connection is stronger in the 21st century than I had thought.

UNICODE TANGUT COMING IN JUNE 2016

This has been an exciting week. First, Baxter and Sagart's new Old Chinese reconstruction, then the catalog of Khitan large script characters, and in less than two years, 6,126 Tangut characters plus the Tangut iteration mark  and 753 Tangut radicals. Andrew West has documented the long road his team has taken. Bravo!

Finding Tangut characters is easy in Unicode. For example, if I want the first character I mentioned on Wednesday, I can just search for its Li Fanwen 2008 number (0219) on this code chart, and voila!

U+17366 2tseʳw 'whip'

And I can find the second character I mentioned on Wednesday (Li Fanwen 2008 number 1877) by looking through the range of characters sharing its left-hand radical U+1896E (= Nishida 219, gloss unknown). Oddly the source graph for its left side according to the Combined Homophones and Tangraphic Sea has a different radical (U+18954 = Nishida 218 'dog/fox'):


U+1785F 2ʔiəʳ 'whip' =

left (!) of U+175EF 2khɤi 'yak'

all of U+18571 2phʊ 'tree'

Why does 'yak' plus 'tree' equal 'whip'?

The analysis of U+17366 2tseʳw 'whip' is unknown. There are 69 other characters containing the component

U+1892C (= Nishida 103, gloss unknown),

16 other characters with the middle component

U+18942 (= Nishida 275, gloss unknown),

14 other characters with the right-hand component

U+18975 (= Nishida 134, gloss unknown),

and five other characters containing the middle and right hand components: e.g.,

U+173F3 2pɤẹ̃ 'hair'.

Is a whip like a giant hair? Maybe. Or maybe there's a more likely source of the right two-thirds of U+17366 2tseʳw 'whip'. I'll look at the possibilities tomorrow.

THE KHITAN LARGE SCRIPT IN SRI LANKA

I never expected Khitan to be discussed in

Sri Lanka <ś.ri l.ang.k.a>

at WG2 meeting 63. To be more precise, it was the Khitan large script that came up, not the Khitan small script above. I'm much less confident about this attempt to write the name in the large script:

<ś(i) ri la ang ka>

Even if one or more of those characters turns out to be inappropriate for transcribing Sri Lanka, I'm certain that a large script spelling would take up more space than its small script equivalent since the former is not clustered into word blocks like the latter.

The first of the large script characters is identical to the Chinese character 已 pronounced i in Liao Chinese, the northeastern dialect known to the Khitan a millennium ago. Should Khitan large script characters be unified with Chinese characters in Unicode?

The unification was proposed to minimize the security issues caused by co‐existence of similar shaped characters in the CJK Unified Ideograph [i.e., Chinese character] block and Khitan Large Script block.

Not knowing what the security issues are, I oppose unification. Unifying Chinese characters and the Khitan large script would be like unifying Latin A, Greek Α, and Cyrillic А. Would Greek and Cyrillic lookalike letters (e.g., Γ and Г) be assigned to one or the other alphabet while letters unique to Greek or Cyrillic (e.g., Δ and Д) were assigned to separate alphabets? My mind reels.

I also don't think unifying Jurchen (large) script characters resembling Khitan large script characters is a good idea. To me, Chinese characters, Khitan large script characters, and Jurchen (large) script characters are like the Latin, Greek, and Cyrillic alphabets: related scripts that should be kept apart in spite of partial visual overlap.

Encoding issues aside, I've been excited to browse the longest list of Khitan large script characters I have ever seen:

Proposal on Encoding Khitan Large Script in UCS

Part 1: Characters 0001-0472

Part 2: Characters 0473-0963

Part 3: Characters 0964-1455

Part 4: Characters 1456-1930

Part 5: Characters 1931-2218

(This last file does not include 已 <ś(i)> attested in the epitaph for 多羅里本 Duoluoliben [a.k.a. 突呂不 Tulübu, 1081], though it does include 己 [#1938] and 巳 [#1941] which also look like Chinese characters.)

I especially appreciate the inclusion of images of original characters. (10.3.0:06: But I wish I understood the codes for their sources.) I wanted to continue my series on Baxter and Sagart's new Old Chinese reconstruction, but I had to interrupt it to mention this breakthrough in Khitanology.

Alas, that list does not include any characters that Viacheslav Zaytsev may have discovered in Nova N 176, the longest known Khitan text in either script. As much as I'd love to be able to type the Khitan large script in Unicode as soon as possible, I wonder if it might be a good idea to wait until the characters in that book have been catalogued. It might be odd to have a first Khitan large script encoding covering all texts but Nova N 176. Typing words from what may be the most important Khitan text in the far future might involve going back and forth between a primary Khitan large script block and a Khitan Extended-A block. Awkward.

10.3.1:18: ADDENDUM: The Khitan large script proposal lists several inscriptions that I have never heard of before:

1. 耶律大王墓誌 Epitaph for Prince Yelü (personal name not given; 1051)

2. 耶律準墓誌銘 Epitaph for Yelü Zhun (1068)

3. 耶律李家奴墓誌銘 Epitaph for Yelü Li Jianu (1081)

4. 留隱太師墓誌銘 Epitaph for Master Liuyin (1109)

I wish I could see them.

GSR 0000 IN BAXTER AND SAGART (2014): PART 1

I didn't know Baxter and Sagart's new book Old Chinese: A New Reconstruction came out until it was released in the US yesterday, almost two weeks after it was released in the UK on 18 September. I'm not surprised it's sold out in the UK. I've waited years for it. I'll have to wait even longer because I can't afford it. But for now at least I can look at the reconstructions which the authors have kindly shared with the public (alternate URL). All reconstructions in this post are Baxter and Sagart's unless I state otherwise.

Will these reconstructions ultimately displace those of Karlgren's Grammata serica recensa (GSR, 1957)? We shall see.

For years I would recommend Schuessler's Minimal Old Chinese (2009) reconstructions to nonspecialists, as they incorporate many post-GSR elements widely accepted among scholars today (e.g., a six-vowel system) while excluding more controversial proposals. (I also recommend his Later Han Chinese in the same book. By definition it's too early to be Middle Chinese, but it's close, and I prefer it to nearly all Middle Chinese reconstructions I've seen.)

I dream of publishing my own reconstruction, but I really should finish my Golden Guide translation first, among many other things. I'd also like to publish a complete list of my readings of Tangut characters and the pre-Tangut sources of those readings. Both my Chinese and Tangut reconstructions have only been available in scattered form on this site and a couple of publications.

Enough about me. Let's start looking at Baxter and Sagart's Old Chinese reconstructions organized by GSR numbers. (Alas, the characters in the PDF are not directly searchable, though one can indirectly find them by searching for their Unicode code points.) At the top of the list are characters without GSR codes. Baxter and Sagart assigned them the number 0000.

The first character is 𠓥 *pe[n] 'whip', an alternate spelling of 鞭.

鞭 is a semantic-phonetic compound ('leather' + *be[n]) whereas 𠓥 is a compound of 攴 'strike' (itself a semantic-phonetic compound of 卜 *pˁok atop 又, a drawing of a hand) beneath something looking like 入 'enter' with a short horizontal line inside it. Those top components don't look like a pictograph of a whip to me, but I presume they're semantic. Another variant 𠓠 simply has 入 'enter' on top. See more variants here.

The brackets around the coda indicate that Baxter and Sagart "are uncertain about its identity". In this case, the coda might have been *-r. We know for sure that 'whip' ended in *-n in Middle Chinese, but Middle Chinese *-n could be from Old Chinese *-r as well as *-n*.

*pe[n] turns out to be an uncontroversial reconstruction. Pan, Zhengzhang, and Schuessler all reconstruct it as *pen. I am the odd man out, as my system requires a high vowel presyllable to account for the vowel breaking (partial vowel height matching) in Middle Chinese (MC):

*Cɯ-pen > *Cɯ-pien > *pien (= pjien in Baxter's MC transcription "not intended as a reconstruction")

My Old Chinese *pen without a high vowel presyllable (e.g., 邊 'side') remained *pen in Middle Chinese. Baxter and Sagart reconstruct 'side' as *pˁe[n] with a pharyngealized *pˁ- that in my view blocked breaking. Such pharyngealized initials distinguish their reconstruction from most others. I only reconstruct pharyngealization in Middle Old Chinese; it developed in (pre)initial** consonants preceding 'lower' vowels (*ʌ *e *a *o) and spread through the syllable:

*pen > *pˁen > *pˁeˁnˁ

On the other hand, my Old Chinese *Cɯ-pen was not subject to pharyngealization because its preinitial preceded the 'higher' vowel *ɯ. Pharyngealization and its absence conditioned vowel allophones that became phonemic after the loss of pharyngealization in Late Old Chinese (OC):

Graph | Baxter and Sagart's OC | This site: Early OC | This site: Middle OC | This site: Late OC, MC | Baxter's MC
𠓥/鞭 | *pe[n] | *Cɯ-pen | *Cɯ-pien > *pien | *pien | pjien
邊 | *pˁe[n] | *pen | *pˁen > *pˁeˁnˁ | *pen | pen
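The conditioning described above can be written as a very small rule. This is only a sketch under simplifying assumptions: a (sesqui)syllable is reduced to an optional presyllable vowel plus a main vowel, any vowel outside the four 'lower' vowels counts as 'higher', and the function name is made up for illustration.

```python
# The four 'lower' vowels named in the text; everything else is treated
# as 'higher' here, which is an assumption of this sketch.
LOWER_VOWELS = {"ʌ", "e", "a", "o"}

def is_pharyngealized(presyllable_vowel, main_vowel):
    """Return True if the (sesqui)syllable would be pharyngealized in
    Middle Old Chinese: the vowel after the first consonant decides.

    presyllable_vowel is None (or "") when there is no presyllable.
    """
    first_vowel = presyllable_vowel if presyllable_vowel else main_vowel
    return first_vowel in LOWER_VOWELS

# *pen 'side': no presyllable, lower vowel *e
print(is_pharyngealized(None, "e"))
# *Cɯ-pen 'whip': the preinitial precedes the higher vowel *ɯ
print(is_pharyngealized("ɯ", "e"))
```

The two calls correspond to the two rows of the table: 'side' is pharyngealized and escapes breaking, while 'whip' is not pharyngealized and breaks.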

*10.2.0:51: I reconstruct *-n unless (1) a phonetic series or word family also contains Middle Chinese *-j readings pointing to *-r and/or (2) external evidence points to *-r. Baxter and Sagart's policy of reconstructing *-[n] is safer since there is no guarantee that all Old Chinese *-r belonged to such phonetic series or word families and/or can be reconstructed on the basis of external evidence.

I have not found any true cognates of the Chinese word for 'whip'; all lookalikes in the region are borrowings.

The Tangut words for 'whip' are completely different:

0219 2tseʳw < *T-tse(k/w)H (common) and 1877 2ʔiəʳ < *T-ʔəH or *ʔərH (only in dictionaries?)

**10.2.1:02: Preinitials are onsets of unstressed presyllables whereas initials are onsets of stressed syllables. Hence the preinitial of *Cɯ-pen was *C- (an unknown consonant) and its initial was *p-. The height of the vowel after the first consonant in a (sesqui)syllable conditioned the presence or absence of pharyngealization in Middle Old Chinese.

I suspect that uvular initials always conditioned pharyngealization regardless of the following vowel unless preceded by a high vowel presyllable, but I have not yet investigated that hypothesis:

*qi > *qˁi (but *ki > *ki)

*Cʌ-qi > *Cˁʌˁ-qˁeˁiˁ > *kei (same outcome as *Cʌ-ki)

*Cɯ-qi > *Cɯ-ki > *ki (same outcome as *Cɯ-ki)

STILI IN OFFORD AND GOGOLITSYNA (2005)

Offord & Gogolitsyna (2005; hereafter OG) is the first book of Russian for foreign learners that I have ever seen with extensive coverage of variation in Russian. Although Japanese is well known for its complex speech levels, I was surprised tonight to find that McClure's (2000) book on Japanese in the same series covers the same topic in a 33-page chapter that is less than half as long as OG's two chapters combined (72 pages). I think it would be possible to write a full-length book on variation for learners of Japanese. OG identify three registers of Russian. I have grouped their lowest varieties into a fourth register ranked below their first register (R1):

R0: Subcolloquial

Demotic, youth slang, prison slang, thieves' cant, vulgar language

R1: Colloquial

Everyday spoken conversation. I would have extreme difficulty making out compressed forms such as monosyllabic [grʲu] for trisyllabic говорю 'I say' (p. 10). I have wondered how learners cope with compression in English.

R2: Neutral

"This is the norm of the educated speaker, the standard form of the language that is used for polite but not especially formal communication [...] It is the register that the foreign student as a rule first learns and which is most suitable for his or her first official or social contacts with native speakers. [...] This register is perhaps best defined in negative terms, as lacking the distinctive colloquial features of R1 and the bookish features of R3" (p. 14)

R3: Higher

a. Academic/scientific

Apart from textbook Russian, this is the style I am most acquainted with. OG noted the feature that stands out in my mind:

Various means are used to express a copula for which English would use some form of the verb to be, e.g. состои́т из [consists of], зaключáeтся в [concludes in], прeдстaвля́eт собо́й [presents itself as], all meaning is (4.2). (p. 15)

All three expressions for 'is' can be found in Zaytsev's (2011) paper on Khitan. (Can be found is itself a bookish synonym for is in English.)

b. Official/business

c. Journalism/political debate

Literary and online language can mix elements from across the above spectrum.

OG also discuss regional variation.

All that makes me ponder how little is known about Tangut, Jurchen, and Khitan. I suspect that Tangut words only appearing in odes and dictionaries are from a traditional, colloquial register whereas the bulk of surviving Tangut texts are in an elevated, Chinese-influenced register. Even less is known about Jurchen and Khitan. Surviving texts in those languages are largely in inscriptions representing a 'monumental' register. One huge possible exception is the Khitan book that Zaytsev (2011) identified; it may have been written in a different style.

NOT BEING THERE ANYMORE: RUSSIAN GERUND VARIANTS

Russian has several types of gerund suffixes. Six books for English-speaking learners include notes on when to use them:

Aspect Suffix Example Reiff (1883: 181) Forbes (1916: 171) Arant (1981: 119) Pul'kina & Zakhava-Nekrasova (1992: 371) Offord & Gogolitsyna (2005: 328) Wade (2011: 386, 389)
Imperfective (-shibilant) -я / (+shibilant) -а встречая 'meeting' "written tongue"  
(consonant +) -учи / (vowel +) -ючи встречаючи 'meeting' "familiar language" "peasants", "popular poetry" not mentioned; even будучи 'being' is absent "popular parlance"; "generally avoided in the modern literary language" with the sole exception of
будучи 'being'
only будучи 'being' only будучи 'being', three others*
Reflexive imperfective (-shibilant) -ясь / (+shibilant) -ась встречаясь 'meeting'     
Perfective (-shibilant) -я / (+shibilant) -а войдя 'having entered' "common with reflexive verbs"
(vowel +) -в встретив 'having met' "written tongue"   interchangeable   "preferred in written styles" to -я/-а
(vowel + ) -вши встретивши 'having met' "familiar language" "peasants", "popular poetry" "less frequently" used than -в "archaic flavour"; "may also occur" in "the colloquial register" or "demotic" not mentioned
(consonant +) -ши вошедши 'having entered'   "rarely used"  
Reflexive perfective (-shibilant) -ясь / (+shibilant) -ась разбредясь 'having wandered in different directions'  
(vowel + ) -вшись встретившись 'having met'
(consonant +) -шись ведшись 'having been in progress'
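The (-shibilant) -я / (+shibilant) -а split in the table is a spelling rule that is easy to state in code. Below is a minimal sketch, assuming the caller already supplies the right stem (finding that stem is the hard part and is not attempted here). The function name is hypothetical, and слыша 'hearing' is my own example of a shibilant-final stem, not one from the books cited.

```python
# Hushing sibilants ("shibilants") after which -я is written -а.
SHIBILANTS = set("жчшщ")

def imperfective_gerund(stem, reflexive=False):
    """Attach the imperfective gerund suffix to a ready-made stem:
    -а after a shibilant, -я otherwise, plus -сь for reflexives."""
    suffix = "а" if stem[-1] in SHIBILANTS else "я"
    return stem + suffix + ("сь" if reflexive else "")

print(imperfective_gerund("встреча"))                  # stem ends in a vowel
print(imperfective_gerund("встреча", reflexive=True))  # reflexive variant
print(imperfective_gerund("слыш"))                     # stem ends in a shibilant
```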


1. Is -учи /-ючи still in "popular parlance" today?

Google Ngrams has no data for встречаючи, so here are two more pairs of gerunds:

читая vs. читаючи 'reading' (the former is always more common)

делая vs. делаючи 'doing' (ditto)

The title refers to the most common surviving -учи /-ючи gerund in Будучи там, the Russian title of Being There (1979). Будучи 'being' has 4.99 million Google results. See below for the Google statistics of other -учи /-ючи gerunds mentioned in Wade (2011).

2. I am surprised that the perfective gerund suffix -а/-я is still around. It could be confused with the homophonous imperfective gerund suffix (though the latter attaches to many more stems). And yet войдя 'having entered' is much more common than its synonym вошедши in Google Ngram Viewer. (There is no risk of confusing inflected imperfective and perfective gerunds [i.e., stem-suffix sequences] as opposed to suffixes in isolation as long as each aspect has a different stem: e.g., the imperfective gerund corresponding to войдя/вошедши 'having entered' is входя 'entering' with a different stem вход-.)

3. I am also surprised that -вши is in decline (Wade does not even mention it!) though its reflexive counterpart -вшись is common.

встретивши was once more common than встретив, but their fortunes reversed shortly before the Revolution.

In short, I would expect the imperfective and perfective gerund suffixes to be maximally differentiated over time and internally consistent:

-я(сь)/-а(сь) vs. -(в)ши(сь)

But that's not the case!

9.30.1:36: Added a column for Offord & Gogolitsyna (2005: 328) and Google Ngrams links.

*The three are

едучи 'traveling' "is sometimes found in poetic or folk speech" (p. 386; 97,300 Google results)

жить припеваючи 'to live in clover' (p. 386; 124,000 Google results)

крадучись 'stealthily' (p. 394; 391,000 Google results)

TRANSCARPATHIAN RUSYN MASCULINE 'JA-NIMATES'

The Transcarpathian Rusyn (TR) and Prešov Rusyn (PR) masculine animate declension in Magocsi (1979: 83) and Magocsi (1979: 83) is straightforward in the singular: all endings are added to an invariable stem brat-:

Case Proto-Slavic TR PR Ukrainian Belarusian Russian Serbo-Croatian Polish Slovak Czech
nominative *bratrŭ brat bratr
genitive *bratra brata bratra
dative *bratru bratu, bratovy bratovi bratu, bratovi bratu bratovi bratru, bratrovi
accusative *bratrŭ brata bratra
instrumental *bratromŭ bratom bratam bratom bratem bratom bratrem
locative *bratrě bratu bratovi brati, bratovi bracie brate bratu bracie bratovi bratru, bratrovi
vocative *bratre brate bracie - brate bracie - bratře

However, the TR plural has an unexpected -j- in some forms:

Case Proto-Slavic TR PR Ukrainian Belarusian Russian Serbo-Croatian Polish Slovak Czech
nominative *bratri bratȳ braty braty brat'ja braća bracia bratia bratři
genitive *bratrŭ bratüv brativ brativ bratoŭ brat'jev braće braci bratov bratrů
dative *bratromŭ bratüm, bratjam bratom bratam brat'jam braći braciom bratom bratrům
accusative *bratry bratüv brativ brativ bratoŭ brat'jev braću braci bratov bratry
instrumental bratamy bratami brat'jami braćom braćmi bratmi
locative *bratrěchŭ bratjach bratoch bratach brat'jach braći braciach bratoch bratrech
vocative *bratri braty - braty - braćo bracia - bratři

The TR dative and locative plurals resemble the Russian plurals, but that must be a coincidence, as TR is not contiguous with Russian; it is spoken in the Transcarpathian Oblast' "which borders upon four countries: Poland, Slovakia, Hungary, and Romania." I wonder if those TR ja-plurals were influenced by Polish whose ci is from *tj. The TR nominative plural is unlike those of Polish or Slovak.

TR bratüm < *bratomŭ may be an older TR dative plural or a very old borrowing from Slovak predating *o-fronting and *-ŭ-loss.

Moreover, the Russian plural forms are based on an old feminine collective which must have replaced an earlier regular masculine plural *braty still preserved in the other East Slavic languages. On the other hand, all non-j TR forms are from brat- rather than the feminine collective *bratĭja.

The Serbo-Croatian 'plural' braća 'brothers' is still a feminine collective singular unlike Russian brat'ja which takes plural endings except in the old nominative singular (now reinterpreted as a plural). Hence none of its endings are cognate to those of the original masculine plurals.

Polish has a mixture of old singular and plural forms of that collective. I assume the old feminine accusative singular *bracię has been replaced by the old feminine genitive singular braci to conform to the genitive-as-accusative pattern of masculine animates. (23:30: The old feminine vocative singular would have been *bracio; it has been replaced by the old nominative singular since masculine plurals have identical vocatives and nominatives.)

Slovak combines that collective (reinterpreted as a masculine plural) in the nominative with forms of brat- in all other cases.

Notes on other forms

Stem: Only Czech preserves the second *-r-.

Nominative/accusative singular: Originally identical but differentiated later when the genitive was used as the accusative singular. See Schenker (1993: 108).

Dative/locative singular: Apparently partly merged in TR and Ukrainian. Fully merged in PR, Slovak, and Czech. Dative for locative reminds me of the dative after German prepositions.

What is the origin of -ovy/-ovi?

PR y normally does not correspond to Ukrainian i. Why does PR have -y instead of -i?

Instrumental singular: Did Polish and Czech generalize -em from other paradigms? Czech -em in this paradigm must postdate *r shifting to ř before *e (a change visible in the vocative).

Belarusian unstressed *o became a.

Nominative plural: In spite of my transliteration, TR/PR bratȳ [bratɨ] is homophonous with Belarusian braty [bratɨ] but not Ukrainian braty [bratɪ].

Genitive plural: Originally homophonous with nominative and accusative singular. How did *-ovŭ (the source of most forms above) and *oː (the source of Czech ů [uː]) develop?

The *o before *ŭ fronted to ü in TR and lost its rounding in PR and Ukrainian.

*-v became Belarusian -ŭ.

Russian -ev is an allomorph of -ov after -j-.

Dative plural: Is -a- instead of -o- in most of East Slavic other than TR and PR by analogy with the instrumental -ami?

Is PR bratom due to Slovak influence postdating the *o > i shift before *ŭ?

Is TR bratüm due to Slovak influence predating *o-fronting?

Accusative plural: Czech preserves the original homophony of accusative and instrumental plural. All other modern languages have accusative plurals from genitive plurals.

Instrumental plural: Schenker (1993: 89) could not explain the original ending *-y. It was replaced by -mi endings by analogy with other declensions.

-a- in East Slavic could be from the -ami of the -a-declension.

Locative plural: Is Czech the only language in the table with a reflex of *ě? Most of East Slavic seems to have generalized -a- from the instrumental and/or dative plural. Polish braciami has the -ami of an a-declension instrumental plural. Slovak may have generalized -o- from the genitive/accusative and/or dative plural. PR o must be from the dative plural since *o borrowed from the old genitive/accusative plural *-ovŭ would have fronted to *i.

9.28.23:57: I forgot to ask if the -j- in TR bratjam and bratjach is in all masculine consonant-final dative and locative plural forms or is only in a subset of those forms. I could answer my own question by looking for all masculine consonant-final dative and locative plural forms in Magocsi (1979), but my copy is not machine-searchable, and that would be time-consuming. My guess is that (1) TR brat belongs to a small class of masculine animate nouns which once had alternate plurals based on feminine singular collectives and (2) all other TR masculine animate nouns share the endings -am and -ach with masculine inanimates and neuters.

НЕСПРАВНІ СЛОВА

Magocsi (1979: 82) listed fifteen English loanwords in American Rusyn that he regarded as "incorrect" (несправні <nespravni>). They contain a number of surprises from an English speaker's perspective:

1. 'Displaced' stress

Verbs are borrowed with the stressed suffix -ва́ти <váty>. The English roots are unstressed: e.g.,

bother > бадерова́ти <baderováty> (not *báderovaty)

Is the stress in this word by analogy with other -ня <-nja> words?

grocer > ґросе́рня <grosérnja> (not *grósernja)

The stress in 'watch out!' is by analogy with its native equivalent:

watch > вачу́йте <vačújte> (not *váčujte) : мирку́йте <myrkújte>

Also see 'cookies' and 'cousin' below.

2. Assignment of monosyllabic consonant-final nouns to the feminine -a declension

yard > я́рда <járda> (not *jard)

car > ка́ра <kára> (not *kar)

mine > ма́йна <májna> (not *majn)


store > штор <štor> (not *štóra; the initial consonant is irregular)

Polysyllabic consonant-final nouns were assigned to the masculine consonant-final declension:

carpet > ка́рпет <kárpet>

closet > кла́зет <klázet>

3. Double plural

cookies > куке́сы <kukésȳ>

I suppose the Rusyn plural ending was added to kukés- because *kuki would end in an un-Rusyn -i- and could not be declined.

Is there a singular kukés?

I'm surprised the stem isn't *kúkiz-.

4. Spelling-based borrowings?

Rusyn y is [ɪ].

cousin > кузи́н <kuzýn> (not *kázyn)

picture > пі́кчер <píkčer> (not *pýkčer)

run (?) > рунова́ти <runováty> 'to drive' (not *ranováty)

The -e- in kukésy 'cookies' may also be influenced by spelling.

5. Vowel not matching spelling or pronunciation

drive > дрейвова́ти <drejvováty> (not *drájvovaty)

Oddities like this make me wonder about the dialect(s) and nonnative, non-Rusyn English that Rusyn speakers heard.

_DENT_F_KAC_JA

If someone asked me how to distinguish between modern written Russian, Belarusian, and Ukrainian without actually knowing the languages, I'd tell them to look for letters specific to each orthography:

ъ <''> is only in Russian

є <je> and ї <ji> are only in Ukrainian

ў <ŭ> is only in Belarusian

The problem with that approach is the low frequency of those letters:

ъ <''> is the rarest letter in Russian

є <je> and ї <ji> are eight-point letters in Ukrainian Scrabble

ў <ŭ> is the 12th least frequent letter* in the Narkamaŭka Belarusian orthography and the 11th least frequent letter in the Taraškievica orthography

Here is a different approach using higher-frequency letters:

- if a text contains і, it is either Ukrainian or Belarusian

- if a text contains і and и, it is Ukrainian

- if a text contains і and ы, it is Belarusian

- if a text contains и and ы, it is Russian

This table shows the distribution of the three letters:

Letter | Russian | Belarusian | Ukrainian
і | (not used) | /i/ | /i/
и | /i/ | (not used) | /ɪ/
ы | /ɨ/ | /ɨ/ | (not used)

Note that и has different phonemic values in Russian and Ukrainian.

і is the third most frequent letter in Belarusian and a one-point letter in Ukrainian Scrabble.

ы is the 5th most frequent letter in the Narkamaŭka Belarusian orthography and the 4th most frequent letter in the Taraškievica orthography, but is the 19th most frequent letter in Russian.

The Russian, Belarusian, and Ukrainian words for 'identification' exemplify the different distributions of those letters:

R идентификация <identifikacija>

B ідэнтыфікацыя <identyfikacyja>

U ідентифікація <identyfikacija>

The Russian word would be an even better example if it contained ы as well as и.
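The higher-frequency-letter approach can be sketched as a short Python function. This is a rough illustration of the four rules and nothing more: the function name and the fallback labels ("Ukrainian or Belarusian", "undetermined") are my own choices, and a text that mixes all three letters is simply reported as undetermined.

```python
def guess_language(text):
    """Guess Russian, Belarusian, or Ukrainian from the letters і, и, ы."""
    letters = set(text.lower())
    has_dotted_i = "\u0456" in letters  # і (Cyrillic, not Latin i)
    has_i = "\u0438" in letters         # и
    has_y = "\u044b" in letters         # ы

    if has_dotted_i:
        if has_i and not has_y:
            return "Ukrainian"
        if has_y and not has_i:
            return "Belarusian"
        if not has_i and not has_y:
            return "Ukrainian or Belarusian"
        return "undetermined"           # all three letters: rules conflict
    if has_i and has_y:
        return "Russian"
    return "undetermined"               # too little evidence either way

for word in ["идентификация", "ідэнтыфікацыя", "ідентифікація"]:
    print(word, "->", guess_language(word))
```

Under these strict rules the Russian word for 'identification' comes out as undetermined because it has и but no ы. A longer Russian text would almost certainly contain both.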

Belarusian has one difference absent from the table above: э where the others have е.

So far, so good. But then I finally got around to looking at the Rusyn alphabet this week. I've known about Rusyn for years without knowing that its alphabet was like a combination of the Russian and Ukrainian alphabets. It has

- ё, ы, ъ like Russian

- є, і, ї like Ukrainian

I don't know anything about Rusyn, much less its historical phonology. My guess is that Rusyn did not merge *y and *i unlike Ukrainian:

Proto-Slavic Russian Belarusian Ukrainian Rusyn?
*y ы и ы
*i и і, ы и
е е, я і і

Did Pannonian Rusyn merge all three vowels into и? If so, then it is like Ikavian Serbo-Croatian in that respect.

On Tuesday I discovered a Transcarpathian variant of the Rusyn alphabet with two more letters in Magocsi (1979): ӱ <ü> and ю̈ <jü>.

ӱ <ü> is from *o before a short high vowel:

*nočĭ 'night' >

Russian ночь <noč'>

Belarusian ноч <noč>

Transcarpathian Rusyn нӱч <nüč> (fronting) (p. 14)

Ukrainian ніч <nič> (fronting and loss of rounding)

I can't explain this correspondence:

*děvica 'girl' >

Russian девочка <dеvočkа>

Belarusian дзяўчына <dzjaŭčyna>

Transcarpathian Rusyn дӱвочку <düvočku> 'girl' (acc. sg., p. 23) (I would expect *divočku)

(9.27.0:05: I'm pretty sure the nom. sg. is düvočka. is  Did round before *o?)

Ukrainian дівчина <divčyna>

ю̈ <jü> is much rarer than ӱ <ü>. Here are two examples from *e before a short high vowel:

*medŭ 'honey' >

Russian and Belarusian мёд <mjod> [mʲot]

Transcarpathian Rusyn мню̈д <mnjüd> [mɲyd] (p. 37)

(9.27.0:30: Transcarpathian Rusyn [mɲ] is reminiscent of Czech [mɲ] from *mj-, though the two languages are not contiguous. Transcarpathian Rusyn's neighbor Slovak has [m] corresponding to Czech [mɲ].)

Ukrainian мед <med> [mɛd]

*anŭgelŭ 'angel' >

Russian ангел <angel>

Belarusian анёл <anjol>

(9.27.0:32: Coincidentally reminiscent of Slovak anjel. Did Belarusian simplify *ng to n?)

Transcarpathian Rusyn агню̈ль <ahnjül'>, ангел <anhel> (p. 52)

(The former has an irregular palatalized -l' and the latter looks like a later loan.)

Ukrainian ангел <anhel>

Another example is from *ju before a short high vowel:

*ključĭ 'key' >

Russian, Belarusian, and Ukrainian ключ <ključ>

Transcarpathian Rusyn клю̈ч <kljüč>

*The Belarusian frequency lists include Russian letters absent from Belarusian at the bottom: и, ъ, щ. I presume those letters appeared in Russian names and words in Belarusian texts. I have excluded those letters from my ranking.

MIENSK I MINSK

(The title is from Менск і Мінск 'Miensk and Minsk', the first song I ever heard in Belarusian.)

I was puzzled by this section of the English Wikipedia entry on Minsk:

The Old East Slavic name of the town was Мѣньскъ (i.e. Měnsk < Early Proto-Slavic or Late Indo-European Mēnĭskŭ), derived from a river name Měn (< Mēnŭ). The direct continuation of this name in Belarusian is Miensk (pronounced [mʲɛnsk]). The resulting form of the name, Minsk (spelled either Минскъ or Мѣнскъ), was taken over both in Russian (modern spelling: Минск) and Polish (Mińsk), and under the influence especially of Russian it also became official in Belarusian. However, some Belarusian-speakers continue to use Miensk (spelled Менск) as their preferred name for the city.

It does not explain where Minsk came from. The standard Belarusian reflex of Proto-Slavic *ě ('jat') is e (with palatalization of the preceding consonant indicated by -i- in Łacinka). Russian has the same reflex of jat. Among the East Slavic standard languages, only Ukrainian has i from jat. The Slavic root for 'white' in Беларусь Belarus' 'White Rus' ' has jat:

Proto-Slavic *běl-

Belarusian бел- bieł- [bʲɛl]

Russian бел- bel- [bʲɛl] (Łacinka disguises the fact that the Belarusian and Russian roots are homophonous)

Ukrainian біл- bil-

Polish biał- [bʲaw]

(More descendants here.)

One might think that Minsk is a borrowing from Ukrainian (in which the word is Мінськ Mins'k with the shift ĭs > s'), and in fact Vasmer credits Ukrainian influence rather than outright borrowing. The Belarusian Wikipedia in the current official orthography states that according to Aničenka (1987), the spelling Minsk adopted in 1939 incorporates the Ukrainian reflex of jat.

The Taraškievica Belarusian and Russian Wikipedias mention another explanation by Abremska-Jabłońska in Kramko and Štychaŭ 2001: the influence of the Polish name Mińsk (Mazowiecki) '(Masovian) Minsk'.

The Russian Wikipedia says the i-spelling in Latin dates from 1502 when Minsk was under Lithuanian rule. The Polish-Lithuanian Commonwealth was still 67 years in the future.

At first I thought it was likely that the Poles renamed Minsk after their own Mińsk, but why would non-Poles* alter the name to match a name in a foreign country? And centuries later, why would the BSSR adopt a Ukrainianized name for Minsk?

Here is an uninformed guess: Did the originators of the spelling Minsk perceive the local Belarusian reflex of jat to be i-like: i.e., an [e] or [ɪ] higher than Belarusian e [ɛ]? Such a high reflex could have later lowered and merged with [ɛ]. Or this hypothetical high-jat dialect could have been replaced by an [ɛ]-jat dialect.

*9.26.0:52: I don't know who wrote the Latin documents containing Minsk. They could have been Lithuanians or Belarusians. In any case, they did not have the option of writing a higher e with the dotted letter ė which was absent from the earliest Lithuanian alphabet of 1547. (In modern Lithuanian orthography, plain e is [ɛ] and dotted ė is [eː]. The Lithuanian Wikipedia article on the Lithuanian alphabet gives me the impression that dotted ė is only a little over a century old.)

BROTHER-IN-LAWS IGOR AND OLEG

I am barely a dilettante at Slavic, so I constantly fear that I am raising Comparative Slavic 101-level questions whenever I bring up the subject. Yesterday I asked why *e in *děverĭ 'brother-in-law' didn't raise in Ukrainian. Today I learned that the late George Shevelov himself (1979: 309) wasn't sure:

The reason for the appearance of e in [standard Ukrainian] díver 'husband's brother' is unclear. Could it be an influence of NU [northern Ukrainian] dialects where e is restored in unstressed syllables?

So maybe that wasn't such a bad question after all. I don't know about these next questions, though.

Another word from my last post, Russian Igor' / Ukrainian Ihor / Belarusian Ihar, is from Old East Slavic In(ŭ)gvarŭ which in turn is a loan from Old Norse Ingvarr. Let's go through this word from left to right:

According to Shevelov, nasal + consonant sequences did not exist at the time. Hence there were four options to deal with Old Norse Ing-:

1. Borrow as is in spite of native phonotactics: Ingvarŭ

2. Insert ŭ to break up the ng-cluster: Inŭgvarŭ

3. Drop the n to avoid the ng-cluster: the ancestor of Igor'

4. Replace In- with native nasalized Ę- to break up the ng-cluster.

All but the last option were exercised. A nasal vowel would have become *Ja- in modern forms like Russian *Jagor', etc.

G weakened to h in Ukrainian and Belarusian.

I have not seen the change *va > o anywhere in Slavic. Are there other examples? Was Old Norse va something like [wɒ] or [wɔ] which would have been close to Old East Slavic *o? Belarusian a in Ihar is from o and is not a direct retention of the Old Norse vowel.

Why does Russian have -r' < *-rĭ if the Old East Slavic word ended in *-ŭ?

Ukrainian final -r in theory could be from either *-rŭ or *-rĭ, but the -r of Ihor must be from *-rĭ since palatalized r appears before endings: Ihorja instead of *Ihora, etc.

Belarusian r is always unpalatalized, so the endings of Ihar do not reveal whether its -r was from *-rŭ or *-rĭ: e.g., Ihora, etc.

Another Norse name in East Slavic is Russian Oleg / Ukrainian Oleh / Belarusian Aleh from Old Norse Helgi via Old East Slavic Olĭgŭ.

Old East Slavic had no H-. (As already stated, the later h of Ukrainian and Belarusian is from g.) Old Norse -e- was borrowed as Old East Slavic Je- with a prothetic J-. This Je- then became Jo- and ultimately O-; cf. Proto-Slavic *ezero > Russian/Ukrainian ozero / Belarusian vozera 'lake'. Belarusian lowered unstressed O- to A-.

'Strong' ĭ before a 'weak' ŭ lowered to e in East Slavic. (See Wikipedia on the 'strong'/'weak' distinction.)
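The strong/weak alternation can be sketched in a few lines of Python. This is only the textbook version of Havlík's law, which I am assuming is what the Wikipedia account describes: counting leftward from the end of the word, or from a full vowel, odd-numbered yers are weak and drop, while even-numbered ones are strong and lower. The function name `resolve_yers` and the vowel inventory are my own hypothetical choices, and the sketch ignores palatalization, stress, and every other sound change.

```python
import unicodedata

FRONT_YER = "\u012d"  # ĭ
BACK_YER = "\u016d"   # ŭ
FULL_VOWELS = set("aeiouy\u011b")  # assumed inventory; \u011b is ě


def resolve_yers(word):
    """Drop weak yers and lower strong ones (ĭ > e, ŭ > o).

    Hypothetical helper: a bare-bones version of Havlík's law.
    """
    word = unicodedata.normalize("NFC", word)
    out = []
    next_yer_is_weak = True  # the yer nearest the end of the word is weak
    for ch in reversed(word):
        if ch in (FRONT_YER, BACK_YER):
            if next_yer_is_weak:
                next_yer_is_weak = False  # weak yer: deleted
            else:
                out.append("e" if ch == FRONT_YER else "o")  # strong yer
                next_yer_is_weak = True
        else:
            if ch.lower() in FULL_VOWELS:
                next_yer_is_weak = True  # counting restarts at a full vowel
            out.append(ch)
    return "".join(reversed(out))


print(resolve_yers("Olĭgŭ"))  # Oleg
print(resolve_yers("kotŭ"))   # kot
```

The output matches the Russian shape Oleg. The Ukrainian and Belarusian forms need the further changes described above (g > h, unstressed O- > A-).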

Why does the -i of Old Norse Helgi correspond to Old East Slavic -ŭ instead of -ĭ? I am reminded of how Russian third person verb endings end in -t from -tŭ instead of the expected -t' from -tĭ corresponding to Ukrainian -t', Belarusian -c', and - far outside Slavic - Sanskrit -ti.

BROTHER-IN-LAW IGOR THE EEL

One feature that distinguishes standard Ukrainian (hereafter simply 'Ukrainian') from the other major Slavic languages is i from *o before a consonant followed by *ĭ or *ŭ: e.g.,

ніч nich < *nochĭ 'night' (cf. Russian ночь noch')

кіт kit < *kotŭ 'cat' (cf. Russian кот kot)

Last Friday, it occurred to me that if Russian noch' corresponds to Ukrainian nich, then Russian Игорь Igor' should correspond to Ukrainian *Ігір *Ihir. (Russian -ь -' is a trace of *ĭ, *g weakened to h in Ukrainian, and Ukrainian palatalized r' lost its palatalization except before vowels.) But the actual Ukrainian name is Ігор Ihor with o.
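The prediction can be made mechanical with a toy Python sketch. It applies only three changes in a fixed order: *o > i before a consonant (or cluster) followed by *ĭ or *ŭ, *g > h, and loss of a word-final short high vowel. The function name `naive_ukrainian`, the vowel inventory, and the decision to treat any run of non-vowel letters as 'a consonant' are my own assumptions for illustration, not a real model of Ukrainian historical phonology.

```python
import re
import unicodedata

YERS = "\u012d\u016d"  # ĭ, ŭ
# Assumed vowel inventory: a e i o u y ě ę ǫ plus the two yers
VOWELS = "aeiouy\u011b\u0119\u01eb" + YERS


def naive_ukrainian(proto):
    """Apply three toy sound changes to a transliterated proto-form."""
    form = unicodedata.normalize("NFC", proto)
    # 1. *o > i before consonant(s) followed by a short high vowel
    form = re.sub(rf"o(?=[^{VOWELS}]+[{YERS}])", "i", form)
    # 2. *g > h
    form = form.replace("g", "h")
    # 3. loss of a word-final short high vowel
    form = re.sub(rf"[{YERS}]$", "", form)
    return form


print(naive_ukrainian("nochĭ"))  # nich
print(naive_ukrainian("kotŭ"))   # kit
print(naive_ukrainian("igorĭ"))  # ihir (predicted; the attested name is Ihor)
```

The first two outputs match the attested forms; the third is the unattested *Ihir.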

Similarly, the Ukrainian cognate of Russian угорь ugor' < *ǫgorĭ 'eel' is вугор vuhor, not *вугір *vuhir. (Prothetic v- is common before stressed *u in Ukrainian. I don't know why the stress moved to o after prothesis. Russian retains the original initial stress.)

Ukrainian i is also partly from *e before a consonant followed by *ĭ or *ŭ: e.g.,

сім sim < *sedmĭ 'seven' (cf. Russian семь sem')

обмін obmin < *obmenŭ 'exchange' (cf. Russian обмен obmen)

According to Shevelov 1979: 322 and 1993: 950, *e did not raise before *ŭ unless it received retracted stress:

*médŭ > мед med (not *мід *mid) 'honey' (cf. disyllabic forms with initial stress: médu, etc.)

*neslŭ́ > ніс nis 'carried' (cf. disyllabic forms with final stress: neslá, etc.; my assumption is that all disyllabic forms including the source of nis originally had final stress)

Could this be restated as *e raising before a stressed *ŭ? If so, why did *e raise in *obmenŭ? Russian obmén, obména, etc. has root stress whereas Ukrainian óbmin, óbminu etc. has prefixal stress. Is either stress original, or did *obmenŭ once have final stress?

However, the Ukrainian cognate of Russian деверь dever' < *děverĭ 'brother-in-law' is дівер diver, not *дівір *divir. (I is the regular Ukrainian reflex of *ě.)

Did *o and *e regularly fail to raise in Ukrainian before word-final *r and a short high vowel? *o did raise before word-medial *rĭ in гіркий hirkyj < *gorĭkij 'bitter' (cf. Russian горький gor'kij; y is the regular Ukrainian reflex of noninitial *i).

A DIP INTO WHITE WATERS (PART 10): XIANGNAN TUHUA PROTO-TONES

I am normally skeptical of attempts to reconstruct proto-tone contours (as opposed to proto-tone categories), but against my better judgment, I wanted to see what I could do with the two 湘南土話 Xiangnan Tuhua 'local speech of southern Hunan' tone systems available at the 小學堂 Xiaoxuetang database: one from 白水村 Baishuicun (BSC) 'White Water Village' and another from 道 Dao County.

The overall picture of tone category evolution is clear:

Old Chinese had no tones.

Middle Chinese could be defined as the first stage of tonal Chinese. It might be more accurate to describe very early Middle Chinese as having phonations (clear / creaky / breathy) than tones. These phonations became phonemic after the consonants that conditioned them were lost. They probably developed into tones at different rates in different dialects.

Middle Chinese had four tonal categories:

平 'level' vs. 上 'rising'

去 'departing' vs. 入 'entering'

The Middle Chinese names of the tones exemplify them: e.g., *bɨeŋ 'level' has a 'level' tone, etc.

The first two tones may have had level and rising contours in the dialect spoken by whoever coined those names which are first attested in the fifth century AD. That does not mean those categories were level and rising in other dialects of that period or later periods.

The names 'departing' and 'entering' may imply that those tones were perceived as opposites in some way but do not hint at contours. It is tempting to regard 'departing' as falling since the modern standard Mandarin reflexes of the first three tones after *voiceless initials are high level*, low rising**, and high falling, but there is no guarantee that the currently dominant Chinese language just so happens to preserve contours that are over 1,500 years old.

Later the four tones developed yin and yang allophones after different initial classes that became phonemic when initial distinctions were lost.

Modern reflexes of the four tones vary considerably: e.g., words that once had the 'departing' tone can have level tones (as in Cantonese) or rising tones (as in Shanghai). I use single quotation marks to distinguish between tone names and contours; the latter are written without quotation marks.

I have listed the tones of BSC and Dao in part 9. I reconstruct a seven-tone system for their common ancestor Proto-Xiangnan Tuhua (PXT):

Initial \ coda | 'level': *-sonorant | 'rising': *-ʔ | 'departing': *-s | 'entering': *-p/t/k/kʷ
'yin': *voiceless ('clear') | *high falling (54) | *high level (55) | *mid level (33) | ?*high rising (45) + no stop
'yang': *voiced ('muddy') | *low falling (31) | *low level (22) | ?*mid falling (43~42) | (+ no stop < *yang entering)

The merger of yang 'departing' and yang 'entering' may be an innovation distinguishing PXT from the rest of Chinese. If other PXT dialects retain a distinct yang 'entering' tone, an eighth tone will have to be reconstructed in the future.

As I wrote last night,

Yin/yang is correlated with height for the 'level' and 'rising' tones (yin : higher, yang : lower) but not for the 'departing' tones which have the opposite pattern (yin : lower, yang : higher).

So I did not hesitate to reconstruct higher yin and lower yang 'level' and 'rising' tones. The contours are more questionable. Dao has two falling 'level' tones and BSC has only one falling 'level' tone. It is simpler to assume that one tone ceased to be falling in BSC than to assume that two tones became falling in Dao.

If PXT 'level' tones were falling, then PXT 'rising' tones could not be falling (unless they were falling with a creaky phonation absent from 'level' tones). I project the level 'rising' tones of Dao back into PXT and regard the contours of the BSC tones as innovations.

BSC has merged the yang 'rising' and yin 'departing' tones into a single tone:

PXT *mid level + *low level > pre-BSC *nonhigh level > BSC low falling

Dao still has a distinct mid level yin 'departing' tone which I regard as a retention of PXT.

Up to this point, the PXT system is identical to the Dao system. Is Dao really that conservative?

I am most reluctant to reconstruct the last two tones. The BSC and Dao contours are so different:

Tone BSC Dao
Yin 'entering' 55 35
Yang 'departing/entering' 33 52

If I average the contours, I get *high rising (45) for yin 'entering' and *mid falling (43~42) for yang 'departing/entering'. This almost fits the general yin-higher/yang-lower pattern. (Yang 'departing/entering' starts slightly higher than the other two yang tones.) Averaging is an act of desperation, not a serious methodology. Hence I have placed question marks before those two tones in my first table.
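The averaging itself is just point-by-point arithmetic on the tone-letter digits. Here is a minimal Python sketch, with the hypothetical helper name `average_contour` and the assumption that each contour is written as two digits from 1 (lowest) to 5 (highest):

```python
def average_contour(a, b):
    """Average two two-digit tone contours point by point.

    Hypothetical helper; assumes both contours have the same length.
    """
    return tuple((int(x) + int(y)) / 2 for x, y in zip(a, b))


# Yin 'entering': BSC 55, Dao 35
print(average_contour("55", "35"))  # (4.0, 5.0), i.e. 45
# Yang 'departing/entering': BSC 33, Dao 52
print(average_contour("33", "52"))  # (4.0, 2.5), i.e. between 43 and 42
```

The endpoint 2.5 of the second contour falls between two tone levels, which is why that tone is written as 43~42 rather than with a single value.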

Final stops in entering tones could have been lost in pre-PXT, paving the way for the yang 'departing/entering' merger.

*High rising after *voiced initials.

**High falling after *voiced obstruent initials.

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision