Archives

18.2.17.23:59: HOW WAS OLD PERSIAN CUNEIFORM LIKE THE KHITAN SMALL SCRIPT?

It occurred to me today that both Old Persian cuneiform and the Khitan small script superficially resemble other scripts (Sumero-Akkadian cuneiform and Chinese characters) but operate on different principles.

Old Persian cuneiform was a syllabary with random gaps (transliterated below) plus a few logograms for 'Ahura Mazda', etc. not included in the table.

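a  | ka | ga | xa | ca | ja | ta | da | θa | pa | ba | fa | na | ma | ya | va | ra | la | sa | za | ša | ça | ha
i  | X  | X  | X  | X  | X  | X  | di | X  | X  | X  | X  | X  | mi | X  | vi | X  | X  | X  | X  | X  | X  | X
u  | ku | gu | X  | X  | X  | tu | du | X  | X  | X  | X  | nu | mu | X  | X  | ru | X  | X  | X  | X  | X  | X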

The gaps do not correspond to gaps in Old Persian phonology or phonetics: e.g., although there was no cuneiform character <ki>, Old Persian did have /ki/, and that was written as a sequence <ka.i>.
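The fallback spelling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sign inventory is copied from the table above, the function name spell is hypothetical, and I assume that every missing <Ci> or <Cu> sign is replaced by <Ca> plus a vowel sign in the way <ka.i> is, and that a syllable with a dedicated sign is written with that sign alone.

```python
# Consonants that have a dedicated <Ci> or <Cu> sign in the table above.
# Every consonant has a <Ca> sign, so no set is needed for that row.
I_SIGNS = {"d", "m", "v"}
U_SIGNS = {"k", "g", "t", "d", "n", "m", "r"}


def spell(consonant, vowel):
    """Return a transliterated sign sequence for consonant + vowel.

    Assumption: a gap in the table is always filled by <Ca> followed by
    the bare vowel sign, generalizing the <ka.i> example in the text.
    """
    if vowel not in ("a", "i", "u"):
        raise ValueError("the syllabary only distinguishes a, i, u")
    if vowel == "a":
        return f"<{consonant}a>"
    dedicated = I_SIGNS if vowel == "i" else U_SIGNS
    if consonant in dedicated:
        return f"<{consonant}{vowel}>"
    # No dedicated sign: fall back to the <Ca> sign plus the vowel sign.
    return f"<{consonant}a.{vowel}>"


print(spell("k", "i"))  # <ka.i>
print(spell("k", "u"))  # <ku>
```

The point of the sketch is that the gaps are a property of the sign inventory, not of the language: any CV syllable still gets a spelling.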

The Khitan small script (KSS) also seems to have been a syllabary with random gaps plus some possible consonant letters and a few logograms. The phonetic values of the c. 400 distinct syllabograms are not well understood and in some cases are unknown. But the picture that emerges from Kane's (2009) transliteration of the KSS is one of random gaps: e.g., unlike Old Persian, the KSS has <ki>, but no <ka> is known yet. That particular gap may reflect the absence of ka in Khitan if its phonology were like that of its surviving relative Mongolian. However, other gaps may have been random: e.g., there is no known syllabogram <ma>, though there was a syllable ma that had to be written as a letter sequence <m.a>.

The Khitan and Jurchen large scripts are mixtures of syllabograms and logograms. The Khitan large script is too poorly understood for me to say anything about gaps in it. The Jurchen large script, OTOH, is mostly readable, and Kane's (1989: 27) table of syllabograms shows a few gaps. I have transliterated their contexts below:

ma | me | mi | X  | mu
da | X  | di | do | du
na | ne | ni | X  | nu
ca | ce | ci | X  | cu
ša | še | ši | X  | šu
sa | se | X  | so | su
ka | ke | ki | X  | ku

The absence of <si> is not surprising since *si could have become ši or merged with ši. Jurchen could have been like Korean or Japanese, which lack a distinction between /ši/ and /si/.

On the other hand, it is striking that gaps cluster in the column of <Co>-syllabograms. There is no obvious phonetic motivation for the absence of <Co>-syllabograms with the initials m- n- c- š- k- which do not constitute a coherent class of consonants. Nor is there a clear reason why there is no <de> if <te> and <ne> exist with initials at the same point of articulation. Each of those gaps could be either random or illusory - the 'missing' syllabograms could simply be among the characters whose readings are currently unknown.
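The clustering can be checked mechanically. Below is a minimal Python sketch that tallies the gaps by vowel column. The grid is copied from the rows transliterated above (only the rows that contain a gap, not the full syllabary), and the names GRID, VOWELS, and gaps_by_vowel are my own labels for illustration.

```python
VOWELS = ["a", "e", "i", "o", "u"]

# One row per initial; None marks an X (no known syllabogram).
GRID = {
    "m": ["ma", "me", "mi", None, "mu"],
    "d": ["da", None, "di", "do", "du"],
    "n": ["na", "ne", "ni", None, "nu"],
    "c": ["ca", "ce", "ci", None, "cu"],
    "š": ["ša", "še", "ši", None, "šu"],
    "s": ["sa", "se", None, "so", "su"],
    "k": ["ka", "ke", "ki", None, "ku"],
}


def gaps_by_vowel(grid):
    """Map each vowel to the list of initials lacking a sign for it."""
    gaps = {vowel: [] for vowel in VOWELS}
    for initial, row in grid.items():
        for vowel, sign in zip(VOWELS, row):
            if sign is None:
                gaps[vowel].append(initial)
    return gaps


for vowel, initials in gaps_by_vowel(GRID).items():
    print(vowel, initials)
```

Five of the seven gaps fall in the o column, which is the clustering described above; the tally says nothing about whether those gaps are random or merely undeciphered.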

The Jurchen small script is all but unknown; the existing samples are too small for decipherment, much less the detection of gaps.

18.2.17.3:07: DID OLD PERSIAN HAVE UNWRITTEN FINAL CONSONANTS LIKE PYU?

It seems that Pyu sometimes had unwritten syllable-final consonants with the exception of /h/ which was always written on the line as a colon-like visarga. Some Pyu texts have subscript syllable-final consonant symbols and others don't. One Pyu text - the 'B' pillar of the Kubyaukgyi (a.k.a. Myazedi) inscription - has subscript consonants only in its first three lines and none in the remaining twenty-six. There is no obvious correlation between the presence or absence of subscript consonants and geography, date, or genre. The problem of why there were two styles of writing Pyu is reminiscent of the problem of why the Khitan had two scripts.

The Indic scripts of the Philippines originally had no means of indicating final consonants, and the Hanunó'o script is still generally written without the pamudpod vowel cancellation sign introduced in the 1950s.

Schmitt (2008: 84) suggests that Old Persian may have had a third type of situation in which some final consonants were written (/m r š/) and others were not, though they were perhaps still pronounced but in some manner phonetically reduced.

Note that original Proto-Iranian *-a is written as Old Persian <-a> (i.e., [-aː]), but original *-an or *-ad is written as -<Ca> (i.e., [-a]).

I see two possibilities here:

1. *-an and *-ad merged into final short [-a] distinct from *-a which became long [-aː].

2. *-an became nasalized short [-ã] and *-ad became short [-aʔ] with a final glottal stop.

I used to think that Pyu also had unwritten nasal vowels and glottal stops that were reduced from earlier nasals and oral stops that were once written, but even the earliest texts do not always have final written consonants, even when there is more than sufficient space for them.


18.2.15.23:48: DOES PERSIAN 'AND' HAVE A PROTO-INDO-EUROPEAN SOURCE?

My short answer is no.

My long answer:

Persian و <wa> [væ] ~ [o] 'and' looks like a loan from the identically spelled Arabic و wa, but is in fact a convergence of an Arabic loanword with a native word *u < Old Persian utā (cf. Avestan and Vedic Sanskrit uta 'and'; the final lengthening is secondary).

Wiktionary derives utā in turn from Proto-Indo-European (PIE) *éti 'and', the source of Latin et 'and'. There are at least three problems with that etymology:

1. PIE *e should become Old Persian a, not u.

2. PIE *i should become Old Persian i (word-final -iy), not -ā.

3. There is already an Old Persian ati- 'beyond' (cf. Avestan aiti-* and Sanskrit ati- 'id.') which looks like the regular reflex of PIE *éti.
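The first and third objections can be illustrated with a toy Python sketch that applies only the two vowel correspondences named in the list (PIE *e > a, PIE *i > i) and leaves everything else, including *t, untouched. That is an assumption made for illustration, not a full set of Iranian sound laws, and the names VOWEL_LAWS and expected_old_persian are hypothetical.

```python
import unicodedata

# Only the correspondences stated in the list above.
# All other segments are assumed to pass through unchanged.
VOWEL_LAWS = {"e": "a", "i": "i"}


def expected_old_persian(pie_form):
    """Apply the two vowel correspondences to a PIE form such as '*éti'.

    The accent mark is stripped first, since it plays no part in the
    correspondences used here. The word-final spelling -iy mentioned
    in point 2 is not modelled.
    """
    decomposed = unicodedata.normalize("NFD", pie_form.lstrip("*"))
    segments = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return "".join(VOWEL_LAWS.get(seg, seg) for seg in segments)


print(expected_old_persian("*éti"))  # ati
```

Under those assumptions the output coincides with Old Persian ati- 'beyond' rather than with utā.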

I think it's more likely that *uta was a Proto-Indo-Iranian innovation unless there are *uta-like forms elsewhere in Indo-European.

*2.16.15:21: The first -i- in Avestan aiti is epenthetic and conditioned by an i in the following syllable.


18.2.14.23:59: WHY WRITE 'WIND' AS 'PAGE NUMBER'?

The last of the fourteen spellings of Vietnamese gió 'wind' in the traditional Chữ Nôm script at nomfoundation.org is

𩖅 = số 'number' + 頁 hiệt 'page' (originally 'head')

số is phonetic. In Middle Vietnamese s- was retroflex [ʂ] and gi- was palatal [ɟ]*, but in modern Hanoi, they are respectively much closer as alveolar [s] and [z]. Does the spelling 𩖅 reflect a dialect like Hanoi? How far back does it go, and is it associated with a certain region? In theory the Chữ Nôm script could be a rich source of dialect history since scribes could invent characters for native Vietnamese words incorporating phonetic elements whose readings resembled those words in their dialect (but not necessarily in other dialects or even their own dialect at a different point in time).

The function of the right half of 𩖅 is obscure. The wind has nothing to do with pages or heads. But wait, I see at hvdic.thivien.net that 𩖅 could also write sỏ in đầu sỏ 'leader'. I think that word is a compound of the Chinese loan 頭 đầu 'head' and the native word sỏ 'head of a pig'. If so, then 𩖅 for gió 'wind' is a case of a Chữ Nôm character originally devised for one Vietnamese word being recycled to write another:

sỏ 'head of a pig' > written as 𩖅 'số-head' > 𩖅 recycled for gió 'wind'

What I still don't understand is how 頁 'head' came to represent 'page'. In Vietnamese as far as I can tell, hiệt  (ultimately going back to Old Chinese *get) means both 'head' and 'page', but in Chinese, 頁 has a second, unrelated reading for 'page' going back to Old Chinese *sɯ-lap 'leaf' (normally written 葉). In theory the 'page' reading of 頁 should exist in Vietnamese as *diệp (the reading of 葉 'leaf'), but no such reading seems to exist.

*2.15.0:20: De Rhodes (1651) said gi- "should be pronounced in the Italian manner" (translation from Gregerson 1969: 161). I interpret that to mean gi- was a palatal stop [ɟ] rather than an Italian palato-alveolar affricate [dʒ] since the former is more likely in Southeast Asia.

2.15.3:03: Added a high-vowel presyllable *sɯ- to Early Old Chinese *lap 'leaf' to account for the lack of emphasis which is normally conditioned by lower vowels such as low *a in Middle Old Chinese. The phonetic series of 葉 (Karlgren's GSR 339 + 633) points to *sɯ- in most cases.

Word | Stage 1 | Stage 2 | Stage 3 | Stage 4
世 'generation' (< 'leaf' + suffix) | *sɯ-lap-s | *sɯ-lap-s | *slap-s | *l̥ap-s
葉 'leaf' | *sɯ-lap | *sɯ-lap | *lap | *lap
韘 'archer's thimble' | *sɯ-lap | *sɯ-lap | *slap | *l̥ap
屧 'bottom inlay in shoe' | *sʌ-lep | *sʌ-lˁep | *sʌ-lˁep | *slˁep

In Stage 2, high-voweled *Cɯ- blocks emphasis in the following syllable, but low-voweled *Cʌ- conditions it.

In Stage 3, some *CV-presyllables are reduced to *C- whereas others are dropped entirely.

In Stage 4, *sl- has fused into *l̥-, whereas *sV-l- still intact at stage 3 became a new *sl-.

But note 蝴蝶 *galep 'butterfly' in which *sɯ- or even *s- cannot be reconstructed.


18.2.13.23:59: WELCOMING THIS WIND (PART 2)

How did a Chinese character 這 'to welcome' which should have been read as nghiện come to be read as giá (and hence qualify as a spelling of the native Vietnamese word gió 'wind')? I'll embed my answer in a longer discussion of the words written with 這 below.

Wiktionary regards 這 as

part of the 迎 (OC [= Old Chinese] *ŋaŋ, *ŋraŋs, “to face, to meet”) word family

and cites Zhengzhang's OC reconstruction *ŋrans.

I cannot immediately reject all that. Nonetheless, I am skeptical.

First, the earliest attestation of the word I can find is in an entry in the dictionary 玉篇 Yupian compiled in the 6th century AD: i.e., during the Middle Chinese (MC) period. Is there evidence for the word in Old Chinese, or was MC *ŋɨenʰ mechanically projected back into OC as *ŋrans? The word is not in Schuessler's 1987 dictionary of early Zhou Chinese. There is a common, unspoken, and dangerous assumption that almost any native Chinese word can be traced back to early Old Chinese. (I was going to say that obvious loanwords like 佛 Middle Chinese *but 'Buddha' are thankfully exempt and that no one would reconstruct an Old Chinese 'reading' of 佛, but Zhengzhang's site has such a reconstruction: *bɯd!)

Second, the 迎 word family has two types of forms: open syllables and velar-final syllables. (I disregard *-ʔ and *-s* which may be suffixes.) In the past I have proposed that *-a was from an earlier syllabic *-ŋ, the 'zero grade' of *-aŋ. I also proposed that *-a could be the zero grade of *-an. Below I provide examples with Sanskrit parallels (citing Sanskrit zero ~ -m alternations in lieu of Sanskrit zero ~ -ṅ [ŋ̍] alternations which don't exist since Proto-Indo-European had no *ŋ̍).

Old Chinese zero grade | Old Chinese a-grade | Sanskrit zero grade | Sanskrit a-grade
*wŋ̍ 'to go' | *waŋ-ʔ 'to go' | ga-tá- < *gʷm̩-tó- 'gone' | gám-a-ti < *gʷóm-e-ti 'goes' (Vedic)
*ŋn̩-ʔ 'to talk' | an 'speech' | ha-tá- < *gʷʰn̩-tó- 'slain' | hánti < *gʷʰénti 'slays'

My proposal explains why these word families don't seem to have forms with a mixture of final consonants: e.g., the 迎 *√ŋ-ŋ word family does not contain words with *-t, *-p, *-m, *-j, *-r, *-w, etc. The few *-k forms could reflect a lost denasalizing suffix.

If 這 belonged to that family, it would be the sole member with *-n.

My proposal has a number of problems: e.g., no support from the rest of Sino-Tibetan and no explanation for when zero grade occurs. (In the Sanskrit past participles above, one can see that unaccented roots take zero grade.)

In any case, the fact remains that -n words are anomalous in a zero ~ velar-final series, and that fact should be explained somehow - even if the zero-grade hypothesis is wrong.

It seems that at some point in the late first millennium AD, 這 came to be used to write an unrelated, nonhomophonous word 'this' (now zhè in Mandarin). The earliest attestation of 這 for 'this' that I can find is in the Jiu Tang shu 'Old Book of Tang' (945). How did that happen?

Here's what I've pieced together from Wiktionary (which should cite its sources) with my caveats.

The word 'this' was once written as 者 and was

[d]erived from 者 (OC *tjaːʔ, “one which”), around the Tang Dynasty.

者 (OC *tjaːʔ, “one which”) > 者 (MC t͡ɕiaX, “this (possessive case)”) > 者 (MC t͡ɕiaX, “this (general demonstrative)”) > Mandarin 這 (zhè).

There are three problems with that etymology:

1. 者 'one which', unlike 'this', does not precede nouns.

But perhaps X 者 Y 'one which X Y' was reinterpreted as 'X this Y', after which the X before 者 became unnecessary?

2. 者 'this' has no 'possessive case' - no word in Chinese does.

That is really an analytical and terminological error that doesn't affect the validity of deriving 'this' from 'one which'.

3. 者 'one which'/'this' had a 'rising tone' in MC but 這 'this' has a 'departing tone'.

Was 者 'one which' used to write an unrelated homophone 'this'? Is the 'departing tone' of 這 'this' due to a sandhi tone (a 'departing'-like allophone of the 'rising tone'?) reinterpreted as the default tone since 'this' must always precede something (i.e., is in a sandhi context)?

There was also a word for 'this' with a 'level tone' written phonetically with a character 遮 for 'block off'. Could the 'departing tone' of 這 'this' be the etymological and colloquial tone while 'level' and 'rising tone' readings were artificial spelling pronunciations based on 遮 for 'block off' and 者 'one which'?

Wiktionary then says there was a "confusion in medieval handwriting" between 遮 'this' and 這 'to welcome' which led to 這 becoming the dominant spelling for 'this'.

Although Sino-Vietnamese readings almost entirely reflect Chinese as it was spoken during the end of the third Chinese domination of Vietnam (602-938; i.e., right before the early attestation of 這 for 'this' in Jiu Tang shu), I briefly thought giá 'this' might be from the fourth Chinese domination (1407-1427). By that time 這 was firmly in place for 'this' in Chinese, and I could see the word entering Vietnamese via the Ming occupation. The trouble is that 這 would have been something like [tʂjɛ] in Ming Chinese* which would have been borrowed into Vietnamese as *tré with a retroflex initial and a mid vowel, not giá with a palatal initial and low vowel. So I think giá is from the last days of the third Chinese domination before *a rose to a mid vowel in Chinese.

*2.14.6:28: I don't have any Ming materials on hand, so I am projecting the Yuan dynasty Phags-pa reading ꡆꡦ <jee> (interpreted by Coblin 2007: 171 as [tʂjɛ]; needless to say, the script should be rotated 90 degrees clockwise) of 這 forward into the Ming dynasty.

The Ming reading of 者 (homophonous except for its rhyme) was used to transcribe Jurchen je /tšə/: e.g., 兀者 *[utʂjɛ] for uje 'heavy' (#67 in the Bureau of Interpreters' Sino-Jurchen vocabulary, Kane 1989: 49).

According to Coblin (2003: 349), Robert Morrison romanized the early 19th century Mandarin rhyme of 這 and 者 as -ay as in May, possibly [e] in his dialect of English. (The Mandarin rhyme is a back [ɤ] in the modern standard.)

Of course all those different varieties of northern Chinese were probably not in a linear relationship across half a millennium, but they all have a nonlow vowel in common unlike giá whose low vowel is characteristic of pre-second millennium pronunciation.

Moreover I don't know if the Vietnamese would have perceived the 'departing tone' of the Ming occupiers as a sắc tone which was the standard equivalent of that tone in late first millennium borrowings. Perhaps my hypothetical *tré would have had a different tone.


18.2.12.23:59: WELCOMING THIS WIND (PART 1)

nomfoundation.org listed fourteen spellings of Vietnamese gió 'wind' in the traditional Chữ Nôm script. In the previous post, I listed twelve spellings containing the phonetic 俞 du 'to consent' (or phonetics containing that phonetic: 逾 du 'to pass' and 愈 dũ 'more'). The remaining two spellings lack 俞 and could be said to belong to a Group D of miscellaneous spellings (or groups D and E with one character each):

13. 這 (see the entry for gió in Anthony Trần Văn Kiệm's Giúp đọc Nôm và Hán Việt 'Aid for Reading Nôm and Sino-Vietnamese')

14. 𩖅

I will discuss 14 later.

13 這 represented Middle Chinese *ŋɨenʰ 'to welcome'; its phonetic is 言 ngôn 'speech' atop 辶, the semantic element for motion. Sino-Vietnamese is almost entirely based on southern Late Middle Chinese, so the Sino-Vietnamese reading of 這 should have been *nghiện which is the SV reading of 這's Middle Chinese homophones such as 唁 'to offer condolences' and 彥 'handsome man'.

But the actual Sino-Vietnamese reading of 這 is giá which is obviously not far from gió 'wind'. It's understandable why gió 'wind' would have been written as 這 giá: the initial consonant gi- and the sắc tone (represented by an acute accent) match even though the vowels don't. (At least lower mid o [ɔ] is just one step above low a.) It's less understandable how a character that looks like it should have been read something like 言 ngôn came to be read as an open syllable giá without an initial nasal. However, I think I figured out what happened, and I'll post my solution in part 2. The title of this two-part microseries hints at the answer.

2.13.0:11: Giles' Chinese-English Dictionary (1892 I: 48) lists the Sino-Vietnamese readings of 這 as the expected nghiện (converted from its notation) as well as gia (no tone indicated).


18.2.11.23:59: CHÂN GIÒ NƯỚNG IN CHỮ NÔM

Last night I went to a Vietnamese restaurant in search of chân giò nướng 'grilled trotters'. They weren't on the menu, but I did try to look up how that dish would have been written in the traditional Chữ Nôm script. My guess is something like

蹎𨃝𤓢

The first and third characters are straightforward made-in-Vietnam semantophonetic compounds:

蹎 chân 'foot, leg' = 足 'foot' + 眞 chân 'true'

𤓢 nướng 'to grill' = 火 'fire' + 曩 nãng 'formerly'

Although 娘 nương and 孃 nương are better phonetic matches for nướng 'to grill', both already have a left-hand element 女, and it would be awkward to place another left-hand element 火 'fire' next to it. (I suppose placing 娘 or 孃 atop the bottom version 灬 of 'fire' would have been possible, but I haven't seen any made-in-Vietnam characters with 灬.) Stripping them of 女 and replacing that element with 火 'fire' would result in

烺 which already exists and is read lãng 'bright' with l- (not n-)

爙 which already exists and is read nhưỡng 'fiery appearance; Mars' (rare) with nh- [ɲ] (not n-)

whereas 曩 nãng does have n-.

One might conclude that matching initials were a high priority when selecting phonetic components of Chữ Nôm characters. But the second character of the dish I wanted does not have a phonetic with a matching initial:

𨃝 giò 'leg of an animal' = 足 'foot' + 徒 đồ 'disciple'

The trouble is that I don't think there is any Chinese character whose Sino-Vietnamese reading combines gi- with a rounded vowel. Although the only part of 徒 đồ that precisely matches giò 'leg of an animal' is the tone, everything else is close enough:

đ- [ɗ] is not gi- [z] (northern) ~ [j] (southern) < *ʑ, but at least it's neither labial nor velar; it's in the middle zone with gi-

ô [o] is back mid rounded like o [ɔ]

I just realized giò 'leg of an animal' could in theory have been written with 由 do (d- [z] (northern) ~ [j] (southern) < *j) as a phonetic. Cf. how gió 'wind' with a different tone was written with a d-phonetic:

Group A with phonetic 俞 du 'to consent'

1. with a Vietnamese abbreviation of 風 'wind' on top: ⿱風俞 (U+2CC82)

2. with a Vietnamese abbreviation of 風 'wind' on the right: 𫖾

3. with 雨 'rain' (symbolizing weather phenomena) on top: ⿱雨俞 (U+2CC05)

Group B with phonetic 逾 du 'to pass'

4. as 逾 without modification

5. with 風 'wind' on top: 𩙋

6. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on top: 𩙌

7. with 雨 'rain' (symbolizing weather phenomena) on top: 𫕲

Group C with phonetic 愈 dũ 'more'

8. with 風 'wind' on the left: 𩙍

9. with a Vietnamese abbreviation of 風 'wind' on the left: 𫗃

10. with 月 'moon/meat' as a substitute for the Vietnamese abbreviation of 風 'wind' on the left: ⿰月愈 (not in Unicode)

11. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on the right: ⿰(𠘨+二)愈

12. with a Vietnamese abbreviation of 風 'wind' (not in Unicode) on top: 𫗄

2.12.2:04: Added all d-phonetic characters for gió 'wind' (I couldn't stop at one).


18.2.10.23:56: THE ZANABAZAR SQUARE SCRIPT (PART 3)

I left the retroflex sibilant out of my discussion of Zanabazar Square script retroflex characters in part 2. Unlike the other retroflexes, <ṣa> is a mirror image of a nonretroflex character <śa> as in Tibetan and not as in other Brahmic scripts such as Brahmi or Devanagari where <ṣa> and <śa> are completely different.

Transliteration | Zanabazar | Tibetan | Brahmi | Devanagari
<śa> | 𑨮 | ཤ | 𑀰 | श
<ṣa> | 𑨯 | ཥ | 𑀱 | ष
<ka> | 𑨋 | ཀ | 𑀓 | क
<kṣa> | 𑨲 | ཀྵ | 𑀓𑁆𑀱 | क्ष

The table above includes <kṣa> which has a special character in the Zanabazar Square script which is clearly derived from <ka> though the altered lower left corner bears little resemblance to <ṣa>.

Tibetan has a transparent stack of <ka> over <ṣa>.

In Brahmi, <ka> and <ṣa> are fused into a transparent ligature.

Only now after twenty-six years do I finally see the logic in Devanagari <kṣa> which I learned as a special character. The top left loop is what's left of <ka> and the bottom of the left side is what's left of <ṣa>.

2.11.15:03: I wonder if Zanabazar's <kṣa> was influenced by Devanagari <kṣa>. Both have similar bottom left-hand corners.


18.2.9.23:59: THE ZANABAZAR SQUARE SCRIPT (PART 2)

Thanks to Andrew West for providing me with a WOFF version of his font for the Zanabazar Square script. I have tagged part 1 of this series to employ that font.

The Tibetan roots of the script are implied in its characters for retroflex consonants which are derived from the characters for dental consonants:

Retroflex (Zanabazar) | Retroflex (transliteration) | Dental (Zanabazar) | Dental (transliteration)
𑨔 | <ṭa> | 𑨙 | <ta>
𑨕 | <ṭha> | 𑨚 | <tha>
𑨖 | <ḍa> | 𑨛 | <da>
𑨘 | <ṇa> | 𑨝 | <na>

'Implied' because the retroflexes are derived from the dentals in several different ways rather than being simple mirror-image versions of the dentals as in Tibetan:

Retroflex (Tibetan) | Retroflex (transliteration) | Dental (Tibetan) | Dental (transliteration)
ཊ | <ṭa> | ཏ | <ta>
ཋ | <ṭha> | ཐ | <tha>
ཌ | <ḍa> | ད | <da>
ཎ | <ṇa> | ན | <na>

The Zanabazar retroflexes nonetheless are not as distinct from the dentals as they would have been if they had directly descended from Brahmic retroflex characters:

Retroflex (Brahmi) | Retroflex (transliteration) | Dental (Brahmi) | Dental (transliteration)
𑀝 | <ṭa> | 𑀢 | <ta>
𑀞 | <ṭha> | 𑀣 | <tha>
𑀟 | <ḍa> | 𑀤 | <da>
𑀡 | <ṇa> | 𑀦 | <na>

(If the Semitic hypothesis of the origin of Brahmi is correct, <tha da na> are original* and <ṭha ḍa ṇa> may be derived from them, but the relationships between them were no longer obvious in the Brahmic scripts of Zanabazar's time after centuries of graphic evolution: e.g., Devanagari <ṭha> and <tha>.)

The Tibetan script did not incorporate any descendants of the Brahmi retroflex characters. Here is the earliest extant account of the creation of the Tibetan script (translated by Sam van Schaik):

In India the script has 50 letters. Tönmi discarded the gha [voiced aspirate] group and the ṭa [retroflex] group, which do not appear in Tibetan speech.

The consequences of discarding the gha group are visible in part 1 where I explained how the Tibetan script and its Zanabazar derivative represented voiced aspirates without relying on descendants of the Brahmi voiced aspirate series.

Later the Tibetan script was extended for transcribing Sanskrit, and retroflex letters were created by mirror-imaging dental letters instead of Tibetanizing northern Indian retroflex letters that descended from Brahmi retroflex letters.

Was Zanabazar unaware of a Brahmic script with a retroflex series completely different from its dental series? If he was aware of one, he might have decided to somewhat follow the Tibetan precedent anyway because graphically related characters are easier to learn than graphically unrelated ones. (The same logic may underlie the voiced aspirates of the Zanabazar square script.)

*2.10.0:15: Salomon (1998: 25) compares Brahmi <tha da na> to Phoenician and Aramaic <tˁ d n>. I don't see much similarity between Brahmi <da na> and Semitic <d n>, but Brahmi 𑀣 <tha> certainly does look like Phoenician 𐤈 <tˁ> and its Greek derivative Θ theta. However, coincidental overlaps between simple shapes are expected.

If Brahmi was based on a Semitic script, its <ṭa> seems to have been created ex nihilo unless Bühler's (1895) view that <ṭa> was a reduction of <ṭha> is correct.


18.2.8.17:32: A GOOD AMOUNT OF VARIATION: THE ORIGIN AND ORTHOGRAPHY OF VIETNAMESE TỐT (PART 1)

Thompson (1976: 116) reconstructed Proto-Viet-Muong *tʰoc 'good (beautiful)' on the basis of

(I have rewritten Thompson's segments in IPA but have retained his tonal notation.)

The trouble with reconstructing *tʰ is that there is a different set of sound correspondences also pointing to *tʰ found in 'medicine':

Thompson solved this problem by reconstructing two kinds of proto-*tʰ: one that deaspirated and one that didn't. But why would one deaspirate?

Premodern Chinese loans into Vietnamese point to a simpler solution. They have the following correspondences:

A chain shift occurred in Vietnamese after borrowing from Chinese:

*s > *t >

That shift postdates the split of Vietnamese from the other Viet-Muong languages.

Proto-Viet-Muong *s became aspirated in the ancestor of Mường Khến rather than unaspirated t as in Vietnamese. Some other Muong varieties retain *s.

Thus I reconstruct Proto-Viet-Muong *soc 'good'.

Can that word be projected back into Proto-Vietic, or is it a Proto-Viet-Muong innovation? In other words, does it exist in any non-Viet-Muong Vietic languages, and if it does, is it native to those languages (rather than a borrowing from Viet-Muong)?

The only match I could find at SEAlang is Ruc tʰóːt 'good'. This looks like a loan postdating the fortition of Proto-Viet-Muong *s and the shift of Proto-Viet-Muong *oc to ôt in Vietnamese since

The aspirated initial rules out Vietnamese as a source. Is there a nearby Muong language with a word like tʰóːt for 'good'? Could the Ruc word be a composite of a Muong word with tʰ- and a Vietnamese word with [ot]? But there do not seem to be any Muong in Quảng Bình Province where the Ruc live. Puzzling.

I suppose Ruc haːj is the native word for 'good'.

Next: The many spellings of Vietnamese tốt in Chữ Nôm.


18.2.7.23:28: THE ONSET OF PROTO-TAI 'NEAR'

I know from experience that interrupting a series of posts leads me to drop the series midway and forget about it. (I do remember the Golden Guide series I never finished, though. That is too big to forget.) But on the other hand I also don't want to forget topics that come up in the middle of a series. This is one such topic.

In Siamese,

(I use Tai tone terminology [A1, C1] in lieu of IPA tone letters to facilitate comparison between Tai languages.)

are a minimal pair distinguished only by tone in pronunciation. Their different vowel symbols imply an earlier segmental distinction that was lost - and that can be confirmed by other Tai languages which preserve different rhymes: e.g., Yay caj A1 'far' and caɰ C1 'near'. (All non-Lao Tai data in this post is from Pittayaporn 2009.)

The Lao cognates of 'far' and 'near' are like those of Siamese apart from lacking a medial [l]:

So far, Siamese, Lao, and Yay seem to indicate that 'far' and 'near' should be reconstructed with the same initial in their common ancestor Proto-Tai. However, other Tai languages have different initials in the two words: e.g.,

Gloss | Bao Yen | Lungchow
'far' | kwɤj A1 | kwaj A1
'near' | sɤɰ C1 | kʰjaɰ C1

Therefore the two words must have had different initials in the proto-language. Pittayaporn (2009: 345) reconstructed them as sesquisyllables ('one-and-a-half syllables') *k.laj A and *k.raɰ C with a presyllable (his 'degenerate syllable') k.-. The presyllable-onset sequences *k.l- and *k.r- were distinct from the true clusters *kl- and *kr- which had different reflexes:

Pittayaporn's Tai subgroup | Q | Q | N | P | F
Proto-Tai | Siamese | Lao | Yay | Bao Yen | Lungchow
*k.l- (only in 'far') | kl- | k- | c- | kw- | kw-
*kl-: e.g., 'rice seedling' | kl- | k- | c- | c- | kj-
*k.r-: e.g., 'illness, fever' | kʰ- | kʰ- | c- | kʰ- | h-
*kr-: e.g., 'six' | h- | h- | r- | s- | h-

Notice that the initials of the non-Yay reflexes of *k.raɰ C 'near' do not match those of the similar-sounding word *k.raj A 'illness, fever' in the table above:

'Near' is the only instance of Siamese kl- from *k.r-. Could the common ancestor of Siamese and Lao have irregularly altered 'near' to match the *k.l- initial of 'far'? However, no such analogy would motivate the initials of 'near' in Bao Yen and Lungchow.
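The mismatch can be laid out as a small Python comparison. The expected initials are the *k.r- 'illness, fever' row of the correspondence table; the attested initials of the word reconstructed above as *k.raɰ C 'near' are read off the forms cited in this post, with the Siamese and Lao values (kl- and k-) inferred from the prose rather than from a cited form. The dictionary and function names are mine, for illustration only.

```python
# Regular reflexes of *k.r-, from the 'illness, fever' row of the table.
EXPECTED_KR = {
    "Siamese": "kʰ",
    "Lao": "kʰ",
    "Yay": "c",
    "Bao Yen": "kʰ",
    "Lungchow": "h",
}

# Initials actually found in 'near' (Yay caɰ, Bao Yen sɤɰ, Lungchow kʰjaɰ).
# Siamese kl- and Lao k- are inferred from the description in the text.
ATTESTED_NEAR = {
    "Siamese": "kl",
    "Lao": "k",
    "Yay": "c",
    "Bao Yen": "s",
    "Lungchow": "kʰj",
}


def mismatches(expected, attested):
    """Return the languages whose attested initial differs from the expected one."""
    return [lang for lang in expected if attested[lang] != expected[lang]]


print(mismatches(EXPECTED_KR, ATTESTED_NEAR))
```

Only Yay comes out regular, which is why the word needs a separate explanation in each of the other four languages.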

Bao Yen has both kʰ- and s- as reflexes of *k.r-. Might they be reflexes of different presyllables?

Bao Yen, as its Vietnamese name implies, is spoken in northwestern Vietnam. It may be no coincidence that the kʰ- and s- reflexes of *k.r- are like the Mường Khến and northern Vietnamese reflexes of Proto-Viet-Muong *kr-:  x- (< *kʰ-) and [s].

As for Lungchow, it has three reflexes of *k.r-:

Perhaps *k.r- generally simplified to *kr- in pre-Lungchow but not in 'hard' and 'near' where it developed into kʰ(j)-. The medial -j- of kʰjaɰ C1 'near' is reminiscent of the -j- that is a reflex of *-l- in *kl-. kʰjaɰ C1 'near' looks like a compromise between *k.r- and *kl-variants of 'near' in pre-Lungchow. Could such variation go back to Proto-Tai, with some languages like Thai and Lao reflecting a version of the word with *-l- instead of *-r-?

But I think the Lungchow words for 'near' and 'hard' may actually reflect different presyllabic vowels:

A palatal presyllabic vowel conditioned -j- in 'near', whereas 'hard' had no such vowel and therefore never developed -j-.

That hypothesis might explain other cases of unexpected -j- in Lungchow. But do such cases exist? And might the s- of Bao Yen sɤɰ C1 come from the *kIr- I proposed above (as opposed to Bao Yen kʰ- < *kVr- in which V is not palatal)?


18.2.6.23:59: THE ZANABAZAR SQUARE SCRIPT (PART 1)

Yesterday I learned of the Zanabazar Square script from Andrew West and downloaded his font for it. In short it is like an extended version of the Tibetan script with additional characters for Sanskrit and Mongolian. If you do not have a Zanabazar Square font, you can see the characters here.

The first thing that caught my eye was that it has characters for voiced aspirated initial <gha ḍha dha bha dzha> that are not simply ligatures with <ha> like Tibetan གྷ ཌྷ དྷ བྷ ཛྷ <g.ha ḍ.ha d.ha b.ha dz.ha>. They are also not derived from the Brahmi characters for 𑀖𑀠𑀥𑀪𑀛 <gha ḍha dha bha jha>:

There is no consistent graphic method of derivation. Moreover, the base characters are a mix of voiceless unaspirated initial characters (<ka ta>) and voiced initial characters (<ḍa ba dza>). Might that hint at how Mongolians perceived Tibetan pronunciations of Sanskrit voiced aspirates? Nonetheless, those derived characters are still easier to learn than hypothetical Brahmi-based characters for <gha ḍha dha bha dzha> whose shapes bore no relation to those of characters for phonetically similar consonants: e.g., 𑀖 <gha> looks nothing like 𑀓 𑀔 𑀕 <ka kha ga>, etc. You can see many more examples in Brahmi-descended scripts here.


18.2.5.23:59: RETURN TO THE SILVER RIVER (PART 4)

Given that the Tangut character for the Chinese loanword for 'river'

𗊧

1990 1chhwan3 'river' < Tangut period northwestern Chinese 川 1chhwan3 'river'

contains

𘠣

Unicode Tangut component 036 / Nishida radical 181 / Boxenhorn code cir

the left side of

𗋽

3058 2zyr'4 'water'

I would expect the Tangut character for the native word for 'river' to contain that component. But it doesn't. 1530 1ma4 'river' is analyzed in Tangraphic Sea as

𗲌=𗋽+𘇲𘢸

- top right (not left side!) of 3058 2zyr'4 'water'

- right side of 0632 1vi1 'ripe, cooked'

0632 has no phonetic or semantic similarity to 1530.

Of course, there is water in a river, so 3058 is not surprising, though its abbreviation as

𘢸

Unicode Tangut component 185 / Nishida radical 026 / Boxenhorn code fam

is. Nishida (1966: 242) regarded that component as 'stone', as it appears in

𗱸

1074 1luq1 'stone'

But this is a case where labels for components are misleading; it makes no sense to call the top of 1530 'stone'.

If the Tangut script reflects the phonetic structure of a second Tangut language - 'Tangut B' - the characters imply that the Tangut B readings of 3058, 1530, and 1074 all have a common element X, and that 'river' and 'ripe' are near-homophones:

3058 'water': X + ? + ? (the left-hand element might be semantic and have no reading)

1530 'river': X + the sounds of 0632 'ripe'

1074 'stone': X + ? + ?

Is there a language in the region in which 'river' sounds like 'ripe' preceded by a segment or syllable that would be the phonetic value of X?

2.6.1:09: I would also expect that language to have the same initial consonant (or syllable) in 'river' and 'stone'. I'm guessing no such language exists today. But did one exist in the past?


18.2.4.22:28: RETURN TO THE SILVER RIVER (PART 3)

3572 2ngwo1 'silver' was analyzed in Precious Rhymes of the Tangraphic Sea as

𘊟=𗵧+𘁾

- the bottom left of 0136 2de'4 'ingot' (< Middle Chinese 鋌 *deŋˀ 'id.') +

- the right of 5722 2ngwo1 (first half of 3360 5722 0nwy0 2ngwo1 'eloquence')

Clearly 0136 is semantic and 5722 is phonetic. Case closed? Not quite.

First, why pick the bottom left of 0136 instead of the 'metal' radical

𘨝

Unicode Tangut component 542 / Nishida radical 028 / Boxenhorn code tex

which is also absent from the character for another major metal in the analysis of 0136 2de'4 'ingot':

𗵧=𗵒+𘊟+𗣒

- top of 0152 1kiq2 'gold' +

- left of 3572 2ngwo1 'silver' +

- right of 2290 2lon1 'round' (but an ingot isn't round! is 0136 really 'ingot'?)

Why write the words for some metal objects with 'metal' but not others? That is quite different from the situation in Chinese where nearly all metals are written with 金 'metal' - one exception that comes to mind is 汞 gong 'mercury', a combination of 工 gong (phonetic) and 水 'water' (semantic).

Second,

𘨐

Unicode Tangut component 529 / Nishida radical - / Boxenhorn code tau

is also phonetic in

𘁉=𗵚+𘊟

5723 2ngwo1 'elephant' (only found in dictionaries) =

- bottom center of 0021 1bu2 'elephant, ox'? (the semantics of this word need closer examination)

- right of 3572 2ngwo1 'silver'

but the seven other characters containing that component are not read 2ngwo1. I would like to look into the other functions of that component after I wrap up this tetralogy on the Silver River.


18.2.3.23:59: RETURN TO THE SILVER RIVER (PART 2)

1990 1chhwan3 'river', the Tangut character I used to transcribe Chinese chuan 'river' in the name of the Tangut font I use, was analyzed in Tangraphic Sea as a combination of

𗊧=𗋽+𗦎+𗿀

1. the left side of 3058 2zyr'4 'water'

2. the center of 2474 2rar1 'to flow'

3. the right of 2107 1tsir1 'earth'

The left side of 'water' is no surprise.

𘠣

Unicode Tangut component 036 / Nishida radical 181 / Boxenhorn code cir

is the left-hand form of the Tangut radical for 'water' - one of the few elements in the script that has an indisputable single meaning - and its presence in 'river' is similar to the presence of the Chinese radical 氵 'water' in 江 jiang 'river' (as in 江泽民 Jiang Zemin), though not in 川 chuan, which is a drawing of a river. Grinstead (1972) regarded the Tangut radical as a derivative of the Chinese radical.

The right side of 江 jiang 'river' is phonetic - in Old Chinese, 江 was *kroŋ and its phonetic 工 was *koŋ - but the remaining two components of Tangut 1chhwan3 are not phonetic: they sound nothing like 1chhwan3. And unlike the water radical,

𘡤𘠴

Unicode Tangut components 101, 053 / Nishida radicals 104, - / Boxenhorn codes dai, cok

have no obvious fixed meanings.

Nishida (1966) attempted to assign meanings to as many components as possible but could not find one for his radical 104. It seems to be phonetic in twelve characters pronounced rar, but of course 1chhwan3 'river' is phonetically completely different, and sixty other characters containing it are also not pronounced rar. It could not mean 'flow' in, say:

𗙊

0205 1jan3

which is a meaningless transcription character.

And no one seems to have found any function for the small, closed set of exclusively right-hand components such as Unicode 053 - could they be phonetic symbols for Tangut B final consonants akin to 音 in Old Korean pam 'night', written 夜音 <NIGHT.m> or 乙 in the made-in-Korea character 乭 tol 'stone', a ligature of the Chinese characters 石 'stone' and 乙 ŭl? If the intent was to represent 'river' as 'water flowing through land', why not pick the 'earth' radical

𘤆

Unicode Tangut component 263 / Nishida radical 210 / Boxenhorn code ges

instead of a right-hand component also found in non-'earth' characters such as

𗅉𗅋

1906 1non'2 'and, also, again' and 1918 1mi4 'not'

which didn't even sound like 2107 1tsir1 'earth' (or each other)?

2.4.0:36: Ironically, the 'earth' radical isn't in

𗿀

2107 1tsir1 'earth'

whose similar-looking left-hand component (Unicode Tangut component 267 / Nishida radical 211 / Boxenhorn code gii) looks like

𗾆

3087 1dzew4 'waist'!

Andrew West looked at every single character containing an element resembling 3087:

This component is Nishida Tatsuo's Radical No. 211, which he calls the "sun radical" 日部 (see Seikago no kenkyū 西夏語の研究 [A Study of the Hsi-Hsia Language] page 244). However, very few characters with this component are in any way related to the sun, and so Nishida's radical name is a misnomer (by far the largest semantic group of characters with this component is the "Bird-related" group, but Nishida already has a "bird" radical). As we shall see below, unlike most Chinese radicals, Tangut radicals do not have a single fixed meaning, and so giving names to them (as Nishida and others have done) is at best not very useful, and at worst misleading.

I am one of those 'others', and I confess I give names partly out of convenience - it's easier for me to remember names than numbers. However, those names are only truly justified whenever there is a nearly one-to-one correspondence between a component and a function: e.g., 'water' usually is in water-related characters, though there are still puzzling exceptions I cannot explain like

𗋕𗋚

2019 1tha4 'third person pronoun' and 2590 2vy3 'outward motion; perfective prefix'

which have no obvious aquatic connection (unless 2vy3 once meant 'out of a river'?). More on those characters in my entry on line 25 of the Golden Guide, a series I have yet to finish.


18.2.2.22:18: RETURN TO THE SILVER RIVER (PART 1)

I haven't posted anything in over a year. In fact this is the first time I'm using KompoZer since the end of last February when I started a post I never finished. I have lots of those. It might be interesting to complete them knowing what I know now. But in the meantime, I thought I'd start a new wave of posts with the name of Prof. 景永时 Jing Yongshi's font that freed me from the need to make a GIF every time I wanted to display a Tangut character: Tangut Yinchuan. Which in Tangut might be

𗼎𗾧𗷲𗊧

3752 3296 1478 1990 2my4 2na'3 1gin4 1chhwan3

if I phonetically transcribe how Yinchuan 'Silver River' was pronounced in the Chinese dialect known to the Tangut a millennium ago. (1990 isn't just a transcription; it is a borrowing of that dialect's word for 'river'.) Otherwise I could render the name as

𗼎𗾧𘊟𗲌

3752 3296 3572 1530 2my4 2na'3 2ngwo1 1ma4

with the native words for 'silver' and 'river'.

I've written about the Tangut autonym 3752 3296 before, so I'm going to look at the four characters I've added to it above. In order to not be too ambitious, I'll focus on just one character per post.

The first is the transcription character 1478 analyzed in Tangraphic Sea as

𗷲=𗷭+𗰽

1478 1gin4 = left of 0830 1kin4 + right of 0405 1dzwyq4 'wall'

1478 1gin4 and 0830 1kin4 are nearly homophonous, so obviously

𘣟

Unicode Tangut component 224 / Nishida radical 123 'together' / Boxenhorn code fol

is supposed to tell us that 1478 sounds like 0830, though it is not obvious how a Tangut reader would know that 0830 was the source instead of any of the 96 other characters with 224/fol:

𗵧𘑿𘑿𘜃𘜃𘘭𗷮𗸈𗷶𗷺𗸃𗷵𗸍𗷯𗷴𗷻𗸓𗹏...

It is also not obvious why 1478 is said to have its right side taken from the left side of 0405 1dzwyq4 'wall', which sounds nothing like 1gin4. 1gin4 does not mean 'wall'. The Tangraphic Sea defines it as a tribal and place name, giving

𗷲𗉔

1478 0707 1gin4 1chew3 (a transcription of Tangut period Chinese 銀州 *1gin4 1chiw 'Silver Prefecture')

as an example. Was the Gin tribe or the Silver Prefecture associated with walls?

Some Tangut transcription characters are combinations of components of two characters: one character for the initial consonant and another for the rhyme. The last of those characters in Li Fanwen's 2008 dictionary is 6072:

𘌝=𘌜+𗋾

6072 2pu3 = 5970 1pi2 + 3057 2zhu3

In theory, 1gin4 could have been written as a combination of a component from a g-character and a component from a 1-in4 character (the initial 1- indicates the tone which belongs to the rhyme, though I write it first following Arakawa Shintarō's convention). And in fact, such a character exists:

𘝰=𘄎+𘃻

5622 1gin4 = 1638 1gi4 + 0494 1in4
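The initial-plus-rhyme principle behind these two analyses can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, readings are assumed to follow the pattern tone digit + initial + rhyme + grade digit, and a medial -w- is assumed to belong to the rhyme rather than to the initial.

```python
import re

# tone digit, initial consonant(s), rhyme (from a medial w or vowel onward), grade digit
READING = re.compile(r"^(\d)([^aeiouyw\d']*)(.+?)(\d)$")


def split_reading(reading):
    """Split a transliterated reading into (tone, initial, rhyme, grade)."""
    match = READING.match(reading)
    if match is None:
        raise ValueError(f"unrecognized reading: {reading!r}")
    return match.groups()


def combine(initial_source, rhyme_source):
    """Take the initial from one reading and the tone, rhyme, and grade from another."""
    _, initial, _, _ = split_reading(initial_source)
    tone, _, rhyme, grade = split_reading(rhyme_source)
    return f"{tone}{initial}{rhyme}{grade}"


print(combine("1pi2", "2zhu3"))  # 2pu3  (6072 = 5970 + 3057)
print(combine("1gi4", "1in4"))   # 1gin4 (5622 = 1638 + 0494)
```

The tone comes from the second source because, as noted above, the tone belongs to the rhyme even though it is written first.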

5622 even appeared in another transcription of Tangut period Chinese 銀州 1gin3 1chiw 'Silver Prefecture':

𘝰𗉔

5622 0707 1gin4 1chew3

So why were two characters

𗷲𘝰

1478 and 5622

created to write the syllable 1gin4? 5622 doesn't have an entry in Tangraphic Sea, but Homophones lists 5622 and 1478 as ... nonhomophones.

Ah, I see what happened now. My readings are based on those of Gong Hwang-cherng who thought 5622 and 1478 were homophones. But Homophones is right - the Tangraphic Sea fanqie for 1478 indicates that 1478 is Grade III, not Grade IV:

𗷲=𗤡+𗙃

1478 1gin4 = 3590 1gi'4 + 1661 1lin3

Hence from now on I will read 1478 as 1gin3 with a final -3 for Grade III.

I think I understand what happened. Normally Tangut only permits three grades in a syllable with g-: I, II, and IV. But the Chinese word for 'silver' was 1gin3 with Grade III. So the Tangut had two options: they could either write the Chinese word as 1478 1gin3 with an un-Tangut combination of g- and Grade III, or they could write it as 5622 1gin4 with a slightly Tangutized pronunciation. I wouldn't be surprised if most Tangut called the Silver Prefecture 1gin4 1chew3 and if the literate among them often 'misread' 1478 as 1gin4 with Grade IV to avoid the un-Tangut combination of g- and Grade III.

What were the Grades, exactly? I still don't know, but for now I believe III was nonpalatal and IV was palatal. Tangut was like Russian which normally favors 'Grade IV' [i] over 'Grade III' [ɨ] after velars: e.g., the plural of pirog is pirogi [pʲirɐˈɡʲi] rather than *[pʲirɐˈɡɨ] with the regular plural ending [ɨ]. "Normally" but not always, because the late, great Prof. Kychanov's name contained a velar [k] followed by 'Grade III' [ɨ]. And because the Tangut could pronounce velars with Grade III if they really wanted to closely imitate Chinese which had no restrictions on velars and Grade III.


17.1.17.23:57: BEGINNINGS OF THE END

Tangut

𘛵

4859 2to1 'to end'

is a relatively simple character that was supposedly derived from more complex characters according to Combined Homophones and Tangraphic Sea 6.231:

𗀬𗡼𘃪𗘡

0117 2705 5712 0737

2thew1 2ber'4 1jwa3 1chhen3

'(first half of 'finally'?) right to.end bottom'

But surely 0117 and 5712 were derived from 4859 rather than the other way around.

0117 is a particularly odd 'source' as

𗀬𗀭

0117.0048 2thew1 2thwu4 'finally' (?)

is a disyllabic word apparently only in dictionaries. Does it belong to the 'ritual language' (which I think was a substratal language)? It looks like a reduplicative form.

1.19: Li Fanwen (2008: 20) even phonetically glossed 0117.0048 in Chinese as 都都 dudu as if it were a perfect reduplication, though it wasn't; there is no doubt that its two syllables belong to different rhyme categories (2.38 and 2.3). If the word is of native origin or was borrowed very early, it could be mechanically derived from *tʰopH.Pɯ.tʰoH:

*-op > -ew1 (but -ew1 also has other sources; see below)

*-H > tone 2-

*Pɯ- > -w-...4

*-o > -u
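The four changes listed above can be applied mechanically. Here is a minimal Python sketch that hard-codes only those rules and nothing else. The function name derive is made up for illustration, and treating tone 1 as the default when *-H is absent is an assumption, not something the list states.

```python
def derive(syllable):
    """Apply the four listed changes to one pre-Tangut syllable (optionally with *Pɯ.)."""
    form = syllable.lstrip("*")
    presyllable, _, main = form.rpartition(".")  # presyllable is "" when there is no "."

    tone = "1"  # assumption: no *-H means tone 1
    if main.endswith("H"):
        tone = "2"  # *-H > tone 2
        main = main[:-1]

    # Everything before the first vowel is the initial; tʰ is transliterated th.
    first_vowel = next((i for i, ch in enumerate(main) if ch in "aeiou"), len(main))
    initial = main[:first_vowel].replace("ʰ", "h")
    rhyme = main[first_vowel:]

    if presyllable == "" and rhyme == "op":
        return f"{tone}{initial}ew1"  # *-op > -ew1
    if presyllable == "Pɯ" and rhyme == "o":
        return f"{tone}{initial}wu4"  # *Pɯ- > -w-...4 and *-o > -u
    raise ValueError(f"no listed rule covers {syllable!r}")


print(derive("*tʰopH"), derive("*Pɯ.tʰoH"))  # 2thew1 2thwu4
```

Anything outside the two syllable shapes in *tʰopH.Pɯ.tʰoH raises an error instead of guessing.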

Could 2thew1 2thwu4 be a borrowing of something like *tʰop(p)ɯtʰoH? Could a single medial *-p- be the source of both final -w and medial -w-? Could tone 2 have spread from the second syllable? Or was the original medial consonant an aspirated *-pʰ- that was the source of (1) final -w and tone 2 of the first syllable and (2) medial -w- of the second syllable?

One problem with the above scenario is that both halves of 2thew1 2thwu4 are attested apart from each other in the definition for 5712 (Mixed Categories of the Tangraphic Sea 7.133):

𘃪𗫂𘛵𘃪𗀬𘃞...𗀭𘃞

5712 3583 4859 5712 0117 5285 ... 0048 5285

1jwa3 1ta4 2to1 1jwa3 2thew1 1ly3 ... 2thwu4 1ly3

'5712 is [as in] 4859 5712 0117 ... is 0048.'

Li Fanwen (2008: 20) translated that definition as 畢者終、竟、畢也...終也 'finish is end, finally, finish, ... is end', interpreting

𘛵𘃪𗀬

4859 5712 0117

2to1 1jwa3 2thew1

as separate glosses. However, if that were the author's intent, he could have broken up the three syllables with the phrase-final particle 5285:

*𘃪𗫂𘛵𘃞𘃪𘃞𗀬𘃞

5712 3583 4859 5285 5712 5285 0117 5285

1jwa3 1ta4 2to1 1ly3 1jwa3 1ly3 2thew1 1ly3

'5712 is 4859, is 5712, is 0117.' = 畢者終也、竟也、畢也

Although 4859 5712 0117 could be a string of three words (there is no Tangut word for 'and'), I tentatively assume that

𘛵𘃪𗀬

4859 5712 0117

2to1 1jwa3 2thew1

is a trisyllabic word ending in a bound morpheme 0117. In any case, 0117 and 0048 are in two separate parts of the definition of 5712 and therefore are probably not borrowed from a single disyllabic word, though it is hypothetically possible for an original disyllabic word to be later reanalyzed as a sequence of two morphemes: cf. Late Old Chinese 獅子 *ʂitsəʔ 'lion' (from a form like Tocharian B ṣecake) later reanalyzed as 'lion' + noun suffix.

I thought 0117 2thew1 might be a reduplication of 4859 2to1 in an X Y X' pattern, but they can't be terribly close in pre-Tangut,

𘛵𘃪𗀬

4859 5712 0117

2to1 1jwa3 2thew1 < *tokH PɯNCaC KtopH?

and it would be weird for X' in such a pattern to then combine with an X'' in another word - namely,

𗀬𗀭

0117.0048 2thew1 2thwu4 < *KtopH Pɯ.KtoH?

Could 4859, 0117, and 0048 share a root *to? Here is a list of possible reconstructions for each morpheme:

4859 2to1 < *taŋH, *tokH, *tojH?

*taŋH resembles Old Chinese 終 *tuŋ 'end', but the vowels don't match.

0117 2thew1 < *tʰopH, *Cʌ.tʰukH, *Cʌ.tʰikH

or *KtopH, *Kʌ.tukH, *Kʌ.tikH?

The *top-like reconstructions resemble Proto-Kuki-Chin *toop 'end', but it's unlikely a o

I reconstruct lower-vowel presyllables *Cʌ- and *Kʌ- to condition Grade I in the higher-vowel rhymes *-ukH and *-ikH; without such presyllables, those rhymes would have retained vowel height and developed into Grade IV -iw rather than Grade I -ew.

If aspiration is not original, then it is from *K-.

0048 2thwu4 < *Pɯ.tʰoH, *PtʰəH, *Pɯ.KtoH, *PKtəH, *Kɯ.PtoH, *KPtəH?

I assume medial -w- is always secondary from *P-, but I could be wrong.

I reconstruct higher-vowel presyllables to condition Grade IV in the lower-vowel rhyme *-oH.

If aspiration is secondary, then it is from *K-, and the order of this *K- relative to *P- is uncertain.

Out of all the above possibilities, I could pick a set sharing *to as a common denominator and then regard the other elements as affixes:

4859 2to1 < *to-k-H, *to-j-H?

0117 2thew1 < *K-to-p-H?

0048 2thwu4 < *Pɯ-K-to-H, *Kɯ-P-to-H?

But what would those affixes mean? And are there any other alternations of the type -u ~ -ew justifying the reconstruction of an earlier alternation *-Ø ~ *-p?

Putting diachrony aside, the synchronic meanings of 0117 and 0048 are uncertain:

Li Fanwen number: 0117

- Clauson 2016: (none)
- Nishida 1966: 遇い終わる 'to finish meeting'
- Grinstead 1972: (none)
- Kychanov and Arakawa 2006: заканчивать, завершать 'finish, end'; 約束, 完結, 終
- Li Fanwen 2008: 'completely, finally'; 完 (adv.)

Li Fanwen number: 0048

- Nishida 1966: 會見を終わる 'to end a meeting'?
- Kychanov and Arakawa 2006: заканчивать 'finish'; 約束, 終
- Li Fanwen 2008: 'at last, in the end'; 終 (adv.)

Li Fanwen number: 0117-0048

- (no polysyllabic words)
- Kychanov and Arakawa 2006: заканчивать, завершать 'finish, end'; 約束, 完結, 終了, 做完
- Li Fanwen 2008: 完畢, 終畢

(1.29.1:27: Filled in Nishida column. 0117 does not have its own entry in Nishida 1966, but its meaning is given in the entry for 0048.)

I think the definitions in modern (i.e., post-Clauson) dictionaries are speculative. Not entirely groundless - the fact 0117, 0048, and 0117-0048 appear in definitions for 'end'-words indicates that they mean something like 'end'. But 'something like' is not the same thing as certainty that they are verbs (according to Kychanov and Arakawa) or adverbs (according to Li Fanwen). It is, however, more than a simple question mark indicating we have no idea what something means.

1.24.16:01: A future Tangut dictionary could distinguish between three categories of words:

1. words whose meanings can be confirmed from context

2. words whose general semantic domain can be determined from dictionary entries

3. words whose meanings are unknown

0117, 0048, and 0117-0048 fall into the middle category. Strictly speaking, 0117 may not even be a word; it may be a bound morpheme.

A distinction between bound and free morphemes would also be a useful feature of a future Tangut dictionary. Current dictionaries are character-based, and all characters are given definitions, even though not all characters represent free morphemes. Users unfamiliar with Tangut cannot easily determine whether a given nontranscription character represents a word (i.e., a free morpheme) or only part of a word. (Transcription characters are indicated as such and by definition represent sounds, not words.)
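As a rough sketch of what such entries might look like as data, here is one possible layout in Python. Every name in it (Evidence, Entry, the field names) is hypothetical, and the glosses are deliberately vague placeholders rather than definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    CONTEXT = 1      # meaning can be confirmed from context
    DOMAIN_ONLY = 2  # only the general semantic domain is known, from dictionary entries
    UNKNOWN = 3      # meaning is unknown


@dataclass
class Entry:
    li_fanwen: str                  # Li Fanwen number(s)
    reading: str                    # transliterated reading
    gloss: str                      # working gloss, as cautious as the evidence
    evidence: Evidence
    is_free: Optional[bool] = None  # True = free morpheme, False = bound, None = undetermined


entries = [
    Entry("0117", "2thew1", "something like 'end'", Evidence.DOMAIN_ONLY),
    Entry("0048", "2thwu4", "something like 'end'", Evidence.DOMAIN_ONLY),
    Entry("0117-0048", "2thew1 2thwu4", "something like 'end'", Evidence.DOMAIN_ONLY),
]

# Entries whose glosses should not be taken at face value.
uncertain = [e.li_fanwen for e in entries if e.evidence is not Evidence.CONTEXT]
print(uncertain)
```

Leaving is_free as None by default keeps 'not yet determined' distinct from 'bound', which matters for a case like 0117.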

To come: Is 0099 another member of the 'final' family?


17.1.15.23:55: WHAT'S SO MATERNAL ABOUT BROTHERS?

Could character structure elucidate the meaning of

𗵏𘕳

0012.5873 1bu3.2kuq1

which Li Fanwen (2008: 3, 926) defined as 'brothers'?

The first character 0012 has this analysis in Tangraphic Sea 1.7.131:

𗴺𗥦𗮀𘓳

0092 2750 5415 1602

1ma4 1ghu2 1bu3 2ngorn1

'mother head <bu> all' = top of 'mother' plus all of the homophonous phonetic <bu>

(I use < > to indicate that <bu> is a transliteration - only a loose phonetic approximation and not IPA.)

Of course brothers are born from mothers. But so are sisters. Why not abbreviate 'man' or, better yet, one of these more common characters to create 0012?

𗤾𘈎

2447 0605

2lo3 2toq4

'elder.brother younger.brother'

Could 1bu3.2kuq1 have referred to brothers sharing a mother?

1.18.19:33: But if that was the case, why isn't the top of 'mother' also in 5873? Disyllabic words written with characters sharing the same component are common in both Chinese and Tangut.

Combined Homophones and Tangraphic Sea A 7.203 analyzed 5873 as

𘕲𘊱𘏐𗡼

5876 3936 5307 2705

2kuq1 1pha1 1ghwi2 2ber'4

'<ku> left power right' = left of the phonetic <ku> + right of 'power'

5876 2kuq1 means 'to tie', so its meaning may also be relevant. Could 0012.5873 be interpreted as

𘣍𗮀𘨎𘤅

'mother' + <bu> + <ku>/'tie' + 'power'

i.e., powerful (people) with maternal ties called <bu.ku>? Why 'powerful'? Could the right-hand component just signify 'person': 'people with maternal ties called <bu.ku>'? That component appears in three of the characters for m-'people' words. However, I assumed it was self-promoting in the autonyms

𗼇 ~ 𗼎𗾧

2344 2mi4 ~ 3752 3296 2my4 2na4 'Tangut'

since it means 聖 'sage' by itself and corresponds to Sanskrit ārya 'noble' (Clauson 2016: 339). It doesn't seem to have such a function in the character

𘈑

for the presumably neutral word 0607 1myr4 'people, clan'. Maybe the common denominator of 5873, 2344, 3296, and 0607 is 'kinsman' which would explain why the component

𘢌

'person'

without implications of kinship was not used in 5873.

That component does, however, appear in

𗤾

2447 2lo3 'elder.brother'

but not

𘈎

0605 2toq4 'younger.brother'

which has a different component

𘡃

of unknown origin and function.


17.1.12.23:59: TOUCHY-FEELY TOOL HARMONY

Tonight I discovered the spelling 摸摸具和 'touch touch tool harmony' for Japanese momonga 'flying squirrel' in Wikipedia. To modern Japanese eyes it looks as if it should be read momoguwa, and it turns out that in the Edo period it was read as something like momongwa. Why not spell momongwa as <mo.mon.gu.wa> with a <mon>-graph? Was 具 still read with a prenasalized stop [ŋg] when the spelling 摸摸具和 was devised? Offhand I can't think of other cases of unwritten -n-. Or of CwV-syllables spelled as CV.CV. The two-character spelling 具和 <gu.wa> for gwa indicates that gwa in loans from Chinese (see a list here) had become ga and therefore gwa-characters (瓦畫) were no longer suitable for transcribing gwa.

1.18.12:51: According to Wikipedia, the word is first attested as momi in the Heian period; momo came later, and momongwa is from the Edo period. If -n- is short for genitive *-no-, then what is -gwa? Could it be an irregular reduction of something like *kupa? Could that reduction postdate the simplification of Sino-Japanese gwa to ga?

Gloss | Stage 1 | Stage 2 | Stage 3
(readings for 瓦畫) | gwa | ga | ga
flying squirrel | *mono-no-kupa | *momo-n-kuwa | momongwa

I don't know what the *kupa in *mono-no-kupa would be, but I doubt it's 鍬 kuwa < *kupa 'hoe' or 桑 kuwa < 具波 *kupa 'mulberry'. (Wish I had 上代仮名遣辞典 A Dictionary of Old Japanese Kana Usage by 五十嵐仁一 Igarashi Jin'ichi on hand to quickly find the Old Japanese phonogram spellings - if any - of those words and remove the asterisks. I did find the combining form 具波 gupa for 'mulberry' in Man'yōshū 3350.)

I wouldn't normally expect the syllable gwa or kwa in a native Japanese word, though such syllables are not impossible in native Japonic words: e.g., Okinawan kwain < *kura- 'eat', cognate to Japanese kurau.


17.1.11.23:59: CLAUSON 2016: THE FRATERNAL TEST

Two days ago, I got my copy of Sir Gerard Clauson's skeleton dictionary of Tangut over twenty years after I had first read about it in Analysis of the Tangut Script.

One of the many things I like about Clauson's dictionary is that it is free of the speculative definitions found in later dictionaries. For instance in my last entry, I used Li Fanwen's (2008: 3, 926) definition of 'brothers' for

𗵏𘕳

0012.5873 1bu3.2kuq1

That definition is presumably based solely on the Tangraphic Sea definitions of 0012 and 5873:

Tangraphic Sea 1.7.131:

𗵏𗫂𗵏𘕳𘃞

0012 3583 0012.5873 5285

1bu3 1ta4 1bu3.2kuq1 1ly3

'0012 TOP 0012.5873 AFF' = '0012 is [as in] 0012.5873'

𗤾𘈎𘃞

2447 0605 5285

2lo3 2toq4 1ly3

'elder.brother younger.brother AFF' = '[It means] elder [and] younger brother'

𗑝𗶚𘇫𗉚𗗙𘘥

4739 0213 0635.1424 1139 1279

1tsewr1 1ne4 1ny4.1thu4 1e4 1y4

'joint near relative GEN COMP' = '[It is what] closely related relatives [are] called'

Combined Homophones and Tangraphic Sea A 7.203:

𗤾𘈎𗑝𗶚𘇫𗉚𗗙𘘥

2447 0605 4739 0213 0635.1424 1139 1279

2lo3 2toq4 1tsewr1 1ne4 1ny4.1thu4 1e4 1y4

'elder.brother younger.brother joint near relative GEN COMP' = '[5873 is what] elder [and] younger brothers [and] closely related relatives [are] called'

The word 0012.5873 is apparently not attested outside the entries for its characters in dictionaries.

Last night I looked up both 0012 and 5873 in Clauson, and as I had hoped, he glossed both as '?' in entries 1069 and 3120. The question marks most likely reflect Clauson's lack of access to the Tangraphic Sea, but I think they are still appropriate to some degree today because there is no guarantee that the components of Tangraphic Sea entries are precise synonyms: e.g., 'elder and younger brothers' is certainly not the same thing as 'closely related relatives'. So could 0012.5873 have been 'sibling'?

1.14.11:14: I don't think 0012.5873 was 'sibling' because I would expect 'sibling' to appear in definitions for sororal words. Perhaps 'closely related relatives' is needed to specify biological brothers as opposed to brothers in a broader, nonbiological sense: e.g., males of the same age. Were Tangut 2lo3 'elder brother' and 2toq4 'younger brother' used as nonbiological terms of address like Burmese ကို ko 'elder brother' and မောင် maũ 'younger brother'?

Unfortunately the Tangraphic Sea definitions for 2lo3 'elder brother' and 2toq4 'younger brother' have been lost. I would not expect 0012.5873 to appear in them since I think 0012.5873 was a subset (biological) of 2lo3-2toq4 'brothers' in a broader sense.


17.1.8.23:58: ANTHROPOGENESIS IN TANGUT

One of the first Tangut words - and characters - that I learned over twenty years ago was

𘓐

2541 2dzwo4 'person'

which doesn't belong to the *m-'people' word family from my last post.

I never knew its etymology until I saw the Loloish words for 'person' in Burling (1967: 89):

Lisu tshō, Lahu chɔ̄, Akha tsɔ́hà

which are from Proto-Lolo-Burmese *tsaŋ.

Tangut -o is partly from *-a, so 2dzwo4 could be from *Pɯ-N-tsaŋH or *Nɯ-P-tsaŋH with

- *P- to condition medial -w-

- *-ɯ- to condition Grade IV

- *-N- to condition voicing of *-ts-

- *-H to condition tone 2 (the 'rising tone' - or was it really a phonation?)
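The conditioning factors in that list can be put together mechanically to show that both reconstructions give the same Tangut syllable. This is a minimal Python sketch of the proposal as first stated, not a general model of pre-Tangut. The function name and the one-entry voicing table are hypothetical, and *-aŋ > -o is hard-coded.

```python
VOICED = {"ts": "dz"}  # only the initial needed for this example


def derive(pre_tangut):
    """Build a Tangut reading from a hyphenated pre-Tangut form ending in *-aŋ(H)."""
    form = pre_tangut.lstrip("*")

    tone = "1"  # assumption: no *-H means tone 1
    if form.endswith("H"):
        tone = "2"  # *-H conditions tone 2
        form = form[:-1]

    *prefixes, root = form.split("-")
    material = "".join(prefixes)  # the order of the prefixes does not matter here

    if not root.endswith("aŋ"):
        raise ValueError("only the rhyme *-aŋ is handled")
    initial = root[:-2]

    if "N" in material:  # *N- conditions voicing
        if initial not in VOICED:
            raise ValueError(f"no voiced counterpart listed for {initial!r}")
        initial = VOICED[initial]
    medial = "w" if "P" in material else ""  # *P- conditions medial -w-
    if "ɯ" not in material:  # *-ɯ- conditions Grade IV
        raise ValueError("grade is not determined without a presyllabic vowel")

    return f"{tone}{initial}{medial}o4"


print(derive("*Pɯ-N-tsaŋH"), derive("*Nɯ-P-tsaŋH"))  # 2dzwo4 2dzwo4
```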

According to STEDT, this word is also found in Central Naga and Bai (see forms here), so it is not an innovation of Burmo-Qiangic (Jacques' [2014: 2] proposed Sino-Tibetan subgroup containing both Lolo-Burmese and Tangut [as part of Qiangic]).

1.10: I'm glad I didn't post this right away because I realized my proposal has a problem.

Jacques' (2014: 206) pre-Tangut *-jaŋ (= my *Cɯ- ... -aŋ) became Gong's Tangut -jij (= my -e3/4), not Gong's Tangut -jo (= my -o3/4). (The initial determines whether the rhyme has Grade III or IV.)

Therefore I would expect *-jwaŋ (= my *Pɯ- ... -aŋ or *Cɯ-P- ... -aŋ) to become Gong's Tangut -jwij (= my -we3/4), not Gong's Tangut -jwo (= my -wo3/4).

2dzwo4 ends in -wo4, not -we4, so it cannot be from *Pɯ-N-tsaŋH or *Nɯ-P-tsaŋH. Or can it?

I can't find any examples of *-jwaŋ (= my *Pɯ- ... -aŋ or *Cɯ-P- ... -aŋ) in Jacques (2014). I propose that such a sequence became -wo3/4:

*Pɯ-Caŋ > *Pɯ-Cɨaŋ > *P-Cɨaŋ > *Cwɨaŋ > *Cwo3/4 and/or

*Cɯ-P-Caŋ > *Cɯ-Cwaŋ > *Cɯ-Cwɨaŋ > *Cwɨaŋ > *Cwo3/4

The medial *-w- 'encouraged' the following vowel to retain its labiality, whereas labiality was lost without *-w-:

*Cɯ-Caŋ > *Cɯ-Cɨaŋ > *Cɨaŋ > *Ciaŋ > *Cö > *Ce3/4

The *Cɯ- above is not *Pɯ- which would have conditioned -w-.

Unfortunately I do not know of any Chinese loanword evidence for my proposed sound change. Middle Chinese *-waŋ3 corresponds to Sino-Tangut -on1 rather than -wo3 in the one case known to me (Gong 2002: 424):

旺 MC *3waŋ3 : ST 𗼤 2340 1von1 'prosperous'

I suspect the word was *3won3 with a nasalized vowel -on in Tangut period northwestern Chinese (TPNWC), and that this form was borrowed into Tangut with -on1, a nasalized vowel rhyme that originated from something like *-om, a merger of *Cʌ- ... -um, *-am, *-em, and *-om (but not rhymes ending in the velar nasal *-ŋ!). If my proposal is correct, an earlier borrowing of 旺 might have had -wo rather than -on in Tangut. Here is a possible relative chronology:

Stage | 1 | 2
Tangut | *-waŋ3/4 | -wo3/4
Tangut | *-om1 | -on1
TPNWC | *-waŋ3 | -won3

At stage 1, TPNWC *3waŋ3 is a better match for Tangut *vaŋ3 (I write initial *w- as v-) than Tangut *vom1. But at stage 2, TPNWC *3won3 is a better match for Tangut 1von1 (which was how *3won3 was actually borrowed) than Tangut *vo3.

Did the sound change *-aŋ > -o spread from Chinese to Tangut? Japhug underwent the same change (Jacques 2004: 143) even though it was not in contact with Chinese until recently and its ancestor separated from that of Tangut long ago. A case of drift? Or just coincidence? The fusion of au into o is common (e.g., Sanskrit*), though the shift *ŋ > *u that would precede it isn't.

Lastly, on Monday morning in the rGyalrongic Languages Database I found some forms for 'person' that have labial + affricate initials like my pre-Tangut *Pɯ-N-tsaŋH: e.g.,

mDaH mdo βdzi

Tag gsum vdzi̤

At first I thought -i might be an unusual reflex of *-waŋ. However, I suspect that -i is from a rhyme with a lost *-t given Ri ṣe wdzit̚.

Forms like Nye dgaH brgya gcig vdzɨmi look like redundant compounds of the Pdz-word for 'person' with the m-'people' word from my last post.

I was initially hopeful that a third type of 'person' word in the database might be related to Tangut 2dzwo4 < *Pɯ-N-tsaŋH:

Rong wam kə' mcu

Wobzi vɟú

Hbrong rzong βɟuʔ

But now I think their palatal stops are hardened from what might be a *-j- still more or less present in

Khog po kə' mbju

Tsho bdun A ke' ᵐbo

Tsho bdun B kə³³ rəᴺ⁴⁴ bjo⁵⁴

Khang sar kə' rbju

rDzong Hbur kə' rmbju

Those words are reminiscent of Gong's 1bjuu = my 1bu3, the first half of

𗵏𘕳

0012.5873 1bu3.2kuq1 'brothers' < *NPə.SkoH or *NɯPo.SkoH**?

a word only known from dictionaries. But I do not know of any examples of 1bu3 standing by itself, so I don't think there is any connection.

Go la thang nya lo ta' ʁap is a fourth type of rGyalrongic word for 'person' without any known Tangut cognate.

*Sanskrit au is from *āu. There was a chain shift: *āu > au > o.

**1.12.6:12: These reconstructions assume that the word is native or at least was borrowed before the sound changes that occurred between pre-Tangut and Tangut.

I suspect the word is from a non-Sino-Tibetan substratum that is the source of other unanalyzable disyllabic words in Tangut. Could it have simply meant 'brother' without any age distinction?

In Old Chinese, there was a strong tendency for both halves of disyllabic noncompound words to be of the same syllable type: AA or BB rather than AB or BA.

A-type syllables had low presyllabic vowels (*ʌ) or lower main vowels (*e *a *o) and developed Grades I or II in Middle Chinese.

B-type syllables had high presyllabic vowels (*ɯ) or higher main vowels (*i *ə *u) and developed Grades III or IV in Middle Chinese.

Tangut and Chinese seem to have undergone similar (though not identical) developments. I believe both languages underwent syllable-internal harmonization: i.e., the height of the main vowel harmonized with the height of the presyllable (if any). The presyllables were then lost, the harmonized vowels became phonemic, and the two languages developed a four-grade distinction.

Chinese disyllabic noncompound words usually had height harmony. I have never looked into whether Tangut disyllabic noncompound words also usually had height harmony. Tangut 1bu3.2kuq1 lacks height harmony; it combines a Grade III (type B) syllable with a Grade I (type A) syllable. If height harmony was the norm in Tangut between as well as within syllables, then 1bu3.2kuq1 was either a loanword from a language that lacked height harmony*** or a compound 1bu3-2kuq1****. (I use hyphens to indicate morphological boundaries and periods to indicate linked syllables without any certain morphological relationship between them.) I favor the former, as I have not found words like 1bu3 or 2kuq1 with meanings I would expect for the halves of 'brothers'. I also have not found a source for 1bu3.2kuq1. I suspect Tangut may be our only source of information on its substratum: i.e., we will never find external confirmation for a word like buku.

***Cf. Turkish kitap 'book' from Arabic kitāb. Kitap violates Turkish palatal vowel harmony because it contains a front vowel i followed by a nonfront vowel a.

****Cf. Finnish seinäkello 'wall clock', a compound word without palatal harmony across its halves: seinä 'wall' has a front vowel ä whereas kello 'clock' has a back vowel o. (The vowels e and i are neutral.)
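The height-harmony test applied to 1bu3.2kuq1 above can be sketched as a small check on the grade digits of a transliterated word. The function names are hypothetical. Mapping Tangut Grades I and II to type A and Grades III and IV to type B is an assumption that extends the Grade III = type B, Grade I = type A pairing stated above, by analogy with the Chinese description.

```python
import re


def grades(word):
    """Return the grade (final digit) of each syllable in a transliterated word."""
    syllables = [s for s in re.split(r"[.\-\s]+", word) if s]
    return [int(s[-1]) for s in syllables]


def syllable_type(grade):
    """Type A for Grades I-II, type B for Grades III-IV (assumed mapping)."""
    if grade in (1, 2):
        return "A"
    if grade in (3, 4):
        return "B"
    raise ValueError(f"unexpected grade: {grade}")


def has_height_harmony(word):
    """True if every syllable of the word belongs to the same type."""
    return len({syllable_type(g) for g in grades(word)}) == 1


print(has_height_harmony("1bu3.2kuq1"))  # False: Grade III (B) + Grade I (A)
```

Run over a list of disyllabic noncompound words, a check like this would show whether height harmony really was the norm in Tangut, which is the open question here.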


17.1.7.23:51: THE TANGUT *M-'PEOPLE' WORD FAMILY

For a long time I have assumed that

𗼇

2344 2mi4 'Tangut' (see my last post)

and the first syllable of

𗼎𗾧

3752 2my4 2na'4 < *-k 'Tangut' (borrowed into Tibetan as mi-nyag)

was from *mi, a cognate to Tibetan mi 'person', and that 2na'4 was from *Cɯ-nak-XH, cognate to Tibetan nag 'black' and almost homophonous with

𗰞 0176 1na4 'black'.

In short, I thought that the Tangut called themselves the '(Black) People'. I thought that 2my4 was phonetically something like [mjə], an unstressed, reduced form of the independent monosyllabic form 2mi4.

Although I still think 2my4 had some sort of nonlow nonpalatal vowel, last night I realized that 2mi4 could not go back to *mi because *i backed to Tangut -y. Tonight I think 2mi4 is from *Cɯ-meH with a mid vowel like the main vowel of Japhug tɯr-me 'person'. Cf.

𗶹

4469 2shi3 < *Cɯ-sheH 'to go' : Japhug ɕe 'id.'

The final *-H symbolizes the glottal source of the second 'tone' 2- (which may have been phonation rather than a pitch). The presyllable could not have ended with *-r like Japhug tɯr- at the time Tangut developed retroflex vowels because *Cɯr-meH would have become Tangut *2mir4 with a retroflex vowel -ir, not 2mi4 with a nonretroflex vowel -i. Jacques (2014: 24) identified

𗇋

3818 2mer4 < *Cɯr-mejH 'person; nominalizer'

as a cognate of Japhug tɯr-me with the expected retroflex vowel. (But what is the *-j needed to block the raising of *-e to -i? A suffix? Is *-mejH from *-meH + *-j?)

The alternation of 4469 2shi3 with

𗶷

4481 1shy3 < *Cɯ-sheH 'to go' : Japhug ɕe 'id.'

is reminiscent of the -i ~ -y alternation of

𗼇 ~ 𗼎𗾧

2344 2mi4 ~ 3752 3296 2my4 2na'4 'Tangut'

though the former is not part of a disyllabic word.

A monosyllabic member of the m-'people' word family with -y is

𘉑

4574 1my4 < *mi 'other person'

which Jacques (2014: 145) also identified as a cognate of Japhug tɯr-me. Could this be the true direct cognate of Tibetan mi?

Another such member is

𘈑

0607 1myr4 < *r-mi 'people, tribe'

I am now inclined to think there was a *-e ~ *-i alternation in the pre-Tangut *m-'people' word family:

*-e-words:
𗼇 2344 2mi4 < *Cɯ-me-H 'Tangut'
𗇋 3818 2mer4 < *Cɯ-r-me-j-H 'person; nominalizer'

*-i-words:
𘈑 0607 1myr4 < *r-mi 'people, tribe'
𗼎 3752 2my4 < *mi-H- (first syllable of 'Tangut')
𘉑 4574 1my4 < *mi 'other person'

Did that alternation originate as a distinction between, say, a schwa-grade *-əj and a zero-grade *-i? The Sanskrit alternation between guṇa-grade nara- and zero-grade n-, both 'man', comes to mind.

Next: What is the etymology of the most common word for 'person' in Tangut?


17.1.6.23:59: INSTALLATION-FREE TANGUT

After my initial post using the Tangut Yinchuan font, I was worried about how to tell readers they'd need that font to see Tangut text in subsequent posts. Thanks to Andrew West and David Boxenhorn, you may now be able to see

𗼇𘝞𗏇

2344 4797 2403 2mi4 1wyr4 2di4 'Tangut script'

on this blog without installing a font on any device. I've been using images for the past decade to be able to read Tangut on my phone, but now I only need them for Khitan and Jurchen until they're added to Unicode.

17.1.17.1:30: Unfortunately I can't view the characters online in Chrome even though they're visible in my local copy in Chrome. But they are visible online in Firefox and on my iPhone. I don't know about visibility elsewhere. I don't have time to fix this issue right now. Maybe next week.


17.1.5.23:55: MIYAKO IN TANGUT

Japanese names are Sinified by reading their Chinese characters (if any) in a Chinese language: e.g., 宮古 Miyako is Mandarin Gonggu, Cantonese Gunggu, etc.

How would Japanese names have been Tangutized? The Tangut only knew of Japan through Chinese written records, so they wouldn't have known how Japanese names were pronounced in Japanese, much less other Japonic languages like Miyako. Thus the Tangut would have phonetically transcribed the Tangut period northwestern Chinese readings for the characters of Japanese names: e.g., 宮古 TPNWC *1kun3 2ku1 would have been Tangutized as

𗌵𘝻

1306 1034 1kon4 1kwo1

using the transcriptions of 宮 and 古 in the Forest of Categories.

Tangut had no rhyme -un3 and generally did not permit Grade III rhymes after velars, so -on4 was the best available match for TPNWC *-un3.

Although Tangut had a rhyme -u1 whose romanization on this site happens to match TPNWC *-u1, my notation is not IPA, and perhaps Tangut -wo was an attempt to approximate a TPNWC final like [ʊ], a vowel partway between [u], the vocalic counterpart of the glide w, and mid o.

There are other hypothetical and less likely approaches to Tangutizing Miyako.

One is to translate the Chinese characters 宮古 'palace ancient' into their Tangut equivalents: e.g.,

𘓹𗖏

1623 0429 2vaq1 2nwo4 'palace ancient' (which happens to have Tangut noun-adjective order!).

(Li Fanwen's Chinese-Tangut index has a typo; it lists 0428 as the equivalent of 古 'ancient'.)

Another - the least likely of all - is to phonetically transcribe the Japanese name:

𗓁𘁂𗁀

5026 5314 2946 1mi4 2a4 1ko1

I have used the Tangut characters for transcribing the Sanskrit syllables mi, ya, and ko. 5314 was probably phonetically something like [ja].

But how would the Tangut have known that 宮古 was read Miyako?

All of the above assumes the spelling 宮古 existed during the heyday of the Tangut. But I don't know how old it is.


17.1.4.23:59: JAROSZ ON NEVSKY ON MIYAKO

I was planning to write a follow-up to this post using the Tangut Yinchuan font. But I ran out of time, so I'll merely link to Aleksandra Jarosz' 2015 PhD dissertation Nikolay Nevskiy's Miyakoan Dictionary reconstruction from the manuscript and its ethnolinguistic analysis: Studies on the manuscript (via Bitxəšï-史). Although it is obviously about 宮古 Miyako, its profile of Nevsky is still of interest to Tangutologists. It is no wonder that he "succeeded in deciphering the highly complicated, Chinese-character-inspired and by then largely unintelligible script of the medieval Xixia kingdom, the homeland of Tangut speakers" given that he

... was a very prolific and dedicated scholar, remembered by his colleagues and informants alike as one truly open-minded and able to grasp the cultures and languages of the subjects of his study almost intuitively. He was also a brilliant multilingual speaker, reportedly having mastered as many as sixteen Asiatic languages (apart from Japanese including Tibetan, Mongolian, Manchu, Pali, Korean and Giliak), as well as English, German, French and Latin (Kanna 2008:167). He acquired his first Orient languages as early as in the times of his Rybinsk gymnasium (post-1900), when he learned Tatar from a local family of native speakers, as well as mastered Arabic alphabet through self-study (Katō 2011:18). (p. 19)

And the Tangutologist and Khitanologist Viacheslav Zaytsev appears on page 8 and in the acknowledgements!

Next: How to write 'Miyako' in Tangut.


17.1.3.18:36: TANGUT AVIAN ANATOMY

About twenty-five years ago I learned the following method to convert base-10 numerals from 1 to 60 into their Chinese sexagenary equivalents. The coming Chinese New Year is the 34th in the 60-year cycle. The first character of the sexagenary term is the Heavenly Stem for the second digit: i.e., 丁 'fourth Heavenly Stem'. The second character is the Earthly Branch for X, a number between 1 and 12:

(X + (Y * 12)) = 34

X turns out to be 10 (and Y is 2), so the second character of the sexagenary term is 酉 'tenth Earthly Branch': i.e., 'rooster'.
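
The method above can be sketched in Python. This is only a minimal illustration: the function name sexagenary and the modular arithmetic are my own way of expressing "take the second digit" and "solve for X", and I am assuming that a second digit of 0 stands for the tenth Heavenly Stem.

```python
# Heavenly Stems and Earthly Branches in their traditional order.
STEMS = "甲乙丙丁戊己庚辛壬癸"
BRANCHES = "子丑寅卯辰巳午未申酉戌亥"

def sexagenary(n):
    """Return (stem number, branch number, two-character term) for n in 1..60."""
    if not 1 <= n <= 60:
        raise ValueError("n must be between 1 and 60")
    # The stem is the second digit of n; a final 0 is assumed to mean the tenth stem.
    stem = (n - 1) % 10 + 1
    # The branch is the X in X + (Y * 12) = n, with X between 1 and 12.
    branch = (n - 1) % 12 + 1
    return stem, branch, STEMS[stem - 1] + BRANCHES[branch - 1]

print(sexagenary(34))  # (4, 10, '丁酉')
```

For 34 this gives the fourth Heavenly Stem 丁 and the tenth Earthly Branch 酉, matching the result worked out by hand.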

The Tangut adopted the sexagenary cycle and somehow found Tangut equivalents for the Heavenly Stems. Last time I wrote about

𗸃

0410 1vi1 'fourth Heavenly Stem'.

The reasoning behind choosing 1vi1 eludes me. (I'm assuming the Tangut terms were repurposed existing words rather than just made up.)

On the other hand, the logic behind the Tangut equivalents of the Earthly Branches is transparent: e.g., 酉 'rooster' was simply translated as

𗿼

2262 1jwon3 'bird'.

(The Japanese did the same thing; they read 酉 as the native word tori 'bird'.)

2262 has three components:

𘤊𘤏𘪣.

Andrew West has written about the first at length here; it appears in other characters for words for birds (see below) but also has other associations.

The first and second components

𘤊𘤏

appear in the second entry in the Tangraphic Sea:

𘀑 = 𘀏 + 𘤊 (< 𗿼)

3911 1pu1 'a kind of bird' = left and top of 3909 1pu1 'the name Pu' (phonetic) + left of 2262 1jwon3 'bird'.

3911 in turn is part of the analysis of 3909, the first entry in the Tangraphic Sea:

𘀏 = 𘀑 + 𘦑 (< 𗩝)

3909 1pu1 'the name Pu' = left of 3911 1pu1 'a kind of bird' (phonetic) + right of 2653 1penq 'horn'

You can see those two characters in context at Andrew West's site.

𘤏 might also be semantic for 'bird' in 3911 and even 3909 if the Pu were associated with birds and horns.

The third component 𘪣 means 'bird', but like most Tangut semantic components (and unlike its possible inspiration, Chinese 鳥 'bird'), it cannot stand by itself. I don't know why some elements can be independent and others can't. In the case of 2262, the elements appended to 𘪣 'bird' are phonetic:

𗿼 = 𘤊𘤏 (< 𗿤) + 𘪣 (< 𘝋)

2262 1jwon3 'bird' = left and center of 2260 1jwon3 'breeding' + left of 1242 2dzwy4 'wing' (with a slight modification of the top element's bottom right corner)

2262 1jwon3 'bird' sounds like 2260 1jwon3 'breeding' and has 1242 2dzwy4 'wings'.

2260 has a circular derivation:

𗿤 = 𘤊𘤏 (< 𗿼) + 𘣑 (< 𘟢)

2260 1jwon3 'breeding' = left and center of 2262 1jwon3 'bird' (phonetic) + right of 0373 2vi1 'to copulate, mate' (semantic)

So does 1242:

𘝋 = 𘪢 (< 𘝁) + 𗟎

1242 2dzwy4 'wing' = left of 0673 2thy1 'wing' (semantic) + bottom right of 4289 2dzwy2 'winding corridor' (phonetic; is that component [Boxenhorn code: caigie] in Unicode?)

4289 is obviously from 1242 as a phonetic plus the semantic component 𘡩 'wood' (a corridor can be a wooden structure). The Tangraphic Sea confirms my guess:

𗟎 = 𘡩 (< 𗞵) + 𘝋

4289 2dzwy2 'winding corridor' = top of 4364 1rur4 'wooden framework' (semantic) + all of 1242 2dzwy4 'wing' (phonetic)

(The Boxenhorn code for the bottom right of 4289 is tok, not caigie, but the two components look alike to me.)
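
The circular derivations in this post can be listed mechanically. Here is a rough Python sketch, assuming that each analysis is reduced to the Li Fanwen numbers of its two source characters; the SOURCES table and the function circular_pairs are hypothetical names of my own, not part of any existing tool or data file.

```python
# Source characters named in each Tangraphic Sea analysis quoted in this post,
# keyed by Li Fanwen number (kept as strings to preserve leading zeros).
SOURCES = {
    "3911": ["3909", "2262"],
    "3909": ["3911", "2653"],
    "2262": ["2260", "1242"],
    "2260": ["2262", "0373"],
    "1242": ["0673", "4289"],
    "4289": ["4364", "1242"],
}

def circular_pairs(sources):
    """Return sorted pairs of characters that are each derived from the other."""
    pairs = set()
    for char, parts in sources.items():
        for part in parts:
            # A pair is circular if the source character's own analysis names char.
            if char in sources.get(part, []):
                pairs.add(tuple(sorted((char, part))))
    return sorted(pairs)

print(circular_pairs(SOURCES))
# [('1242', '4289'), ('2260', '2262'), ('3909', '3911')]
```

All six characters analyzed here turn out to belong to a mutually derived pair, which is why the analyses alone cannot tell us which member of a pair came first.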

𗟎 4289 must postdate the less complex 𘝋 1242 'wing'. But does 𘝋 1242 'wing' postdate 𗿼 2262 'bird'? And what is the function of the right side of 1242 (stroke code EACCQBE not in N4636) which is unique to that character? If it is derived from two other characters, why weren't those characters mentioned in the Precious Rhymes of the Tangraphic Sea (the corresponding volume of the Tangraphic Sea has been lost)?


17.1.2.23:50: MY FIRST POST IN TANGUT YINCHUAN

I just used Andrew West's file to convert Li Fanwen numbers for Tangut characters into Unicode for the first time to type the Tangraphic Sea analysis of 1vi1 'fourth Heavenly Stem', the first half of the sexagenary term for the Tangut year beginning on 28 January 2017:

𗸃 = 𘣟 (< 𗷰) + 𘧦 (< 𘔁)

0410 1vi1 'fourth Heavenly Stem' = left of 0613 2t-? 'to refuse, remove' + 'fire', left of 4661 1bi4 'third Heavenly Stem'

If you can't see the characters, please install Prof. 景永时 Jing Yongshi's free Tangut font at BabelStone.

It's not surprising that 0410 shares 𘧦 'fire' with 4661 since both the third and fourth Heavenly Stems are associated with fire and were hence called 'red' in Khitan and Jurchen (and ᡠᠯᡤᡳᠶᠠᠨ fulgiyan 'red' and ᡠᠯᠠᡥᡡᠨ fulahūn 'reddish' in Manchu).

But why was 𘧦 'fire' combined with 𘣟 from 2t-? 'to refuse, remove' which is neither (nearly) homophonous with 1vi1 'fourth Heavenly Stem' nor obviously semantically relevant to it? 𘣟 is not among the character components that Nishida (1966) was able to gloss.

𗷰 0613 2t-? 'to refuse, remove' is listed in the Tangraphic Sea as a component in at least two more characters:

𗅯 = 𘠐 (< 𗅉) + 𘣟 (< 𗷰)

2377 1ky4 'to prohibit' = 'not', left of 1906 1non2 (conjunction) (semantic) + left of 0613 2t-? 'to refuse, remove' (semantic)

𘒐 = 𘧉 (< 𘒖) + 𘣟 (< 𗷰)

1462 1lo'1 'cooperation' = 1535 1lo'1 'to gather, assemble' (semantic/phonetic) + all of 0613 2t-? 'to refuse, remove'

There may have been other derivatives of 0613 in the lost 'rising tone' volume of the Tangraphic Sea.

I can understand why 'to refuse' is in 'to prohibit', but what's it doing in 'cooperation'?

And I might expect 'not' + 'to refuse' to represent a word for 'not refuse', but I presume the character is like a double negative.

Lastly, I have no idea what the etymologies for 𘔁 1bi4 'third Heavenly Stem' and 𗸃 1vi1 'fourth Heavenly Stem' are.

Next: Tangut avian anatomy.


Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2017 Amritavision