Or, in Jurchen,

<šang.giyan SNAKE.he DAY> šanggiyan meihe inenggi

1. Related or abbreviated? For years I thought of <SNAKE> as resembling 厄 'adversity', but today I finally realized it's related to the right side of 蛇 Chinese 'snake'. The left side 虫 'bug' is a later addition; the right side 它 was originally a standalone drawing of a snake. <SNAKE> may be from a northeastern version of 它 that became part of the Parhae script. I then saw that Jin Qizong (1984: 35) thought <SNAKE> is an abbreviation of 蛇.

2. Today I also wondered if <he> could somehow be related to the graph for Jin Chinese 黑 *xə 'black'. If <he> goes back to the Parhae script or even earlier, then its original phonetic value may have been *xək like the Middle Chinese reading of 黑. Native Jurchen words can only end in -n, so it would be understandable if the Jurchen took a Parhae graph for *xək and used it to write their he [xə].

3. Two days ago I was reading Jonathan Evans' Introduction to Qiang Phonology and Lexicon (2001: 182) on the "weak role of tone in [Qiang] tonal dialects". He got different tones for the morpheme 'finger' in the names of the five fingers in two different recording sessions:

session 1: low (4×), high (1×)

session 2: high (5×)

Was Tangut like its modern living Qiang relatives? Were its tones as unstable? Or as unstable at some earlier point in its history before they 'settled down' to the point where a rhyme dictionary organized by tone (the Tangraphic Sea) made sense? Is my assumption that the 'rising tone' originated from a final glottal *-H misguided? I fear the history of Tangut tones is complex.

I should have written all that in "The Day of the Yellow Hare", but I forgot until I stumbled across that page again today.

4. Sergey Dmitriev's 2018 article on Tangut tree names shows how much can be extracted from just a few entries of the Sino-Tangut glossary Pearl in the Palm. I hope other semantic categories in that booklet are subjected to similarly intense analyses.

The dedication is to Elena N. Nevskaja, the late daughter of NA Nevsky, the greatest Tangutologist of all. I am saddened to learn she is gone.

5. Going back to Evans, I was looking at his reconstructions of Proto-Southern Qiang (PSQ) initial clusters (2001: 165-166). Looking at *KC-clusters, it is tempting to phonologize them all with a preinitial /k/ whose aspiration and voicing are conditioned by the following initial:

PSQ (phonological)
tsh- before e, tɕh- elsewhere
ɕ- s-
*khɕ- */kɕ-/ ɕ-, tsh-
s-, ɕ- khɕ-
dʑ- ʑ- gʑ-
gɹ-, dz-
g-, dʐ-
gʑ- before y, gʐ- elsewhere

Such assimilation has a modern parallel in Taoping in which preinitial /χ/ is [ʁ] before voiced initials.
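As a rough sketch of that conditioning (not Evans' analysis), here is a toy Python function that spells out a phonological preinitial /k/ according to the voicing of the following initial. The two sets of initials are illustrative assumptions rather than a full PSQ inventory, and the name realize_k is made up for this example.

```python
# Toy inventories (assumptions for illustration only, not Evans' full PSQ system)
VOICELESS_INITIALS = {"s", "ɕ", "r̥"}
VOICED_INITIALS = {"z", "ʑ", "ɹ"}

def realize_k(initial: str) -> str:
    """Spell out phonological /k/ + initial under the assumed assimilation rule."""
    if initial in VOICELESS_INITIALS:
        return "kh" + initial   # aspirated before a voiceless initial
    if initial in VOICED_INITIALS:
        return "g" + initial    # voiced before a voiced initial
    return "k" + initial        # initials outside the toy sets are left alone

for initial in ["ɕ", "ʑ", "ɹ"]:
    print(f"/k{initial}-/ -> *{realize_k(initial)}-")
```

The need to list a voiceless r̥ among the conditioning initials is exactly the weak point of the analysis.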

But that analysis requires a voiceless /r̥/, a consonant not reconstructed elsewhere in PSQ. Moreover, it doesn't work for *PC-clusters:

PSQ (phonological)

Or does it? What if Evans' *pz-, *pr-, and *phr- are */ps- pr̥- pʂ-/?

2.14.17:14: I could also reinterpret *khr- as /kʂ-/ to parallel *phr- /pʂ-/. All voiceless sibilants would then condition aspiration of the preinitial: */CS̥/ = *ChS-. Nonsibilant */r̥/ would not: */pr̥/ = *pr- (not *phr-). No, not all - */ps/ isn't *phs-, it's ... *pz-! /voiceless/ + /voiceless/ = /voiced/? I think not, though maybe I could just rewrite *pz- and *bz- as *ps- and *pz- (i.e., regard the phonological and phonetic forms as identical) and have Taoping undergo a chain shift:

*/ps-/ > */pz-/ > bz-.

Still, there seems to be strong if not perfect complementary distribution - there is a tendency against voicing mismatches: e.g., no *kz- or *bs-. Perhaps a neater earlier system was complicated by

- borrowings from languages with different phonotactics

- and/or by new preinitials from earlier syllables that lost their vowels after the voicing assimilation rule ceased to operate: e.g.,

*pz- > Taoping bz-

*pVz- > Taoping pz-
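The ordering argument can be sketched in Python. This is a toy in which V stands for any lost vowel, and the function names are hypothetical; it only shows that vowel loss applying after assimilation has stopped keeps the two sources apart.

```python
import re

def assimilate(form: str) -> str:
    """Voice a preinitial p before z (the rule that later ceased to operate)."""
    return re.sub(r"^pz", "bz", form)

def lose_vowel(form: str) -> str:
    """Drop the vowel V of a presyllable, creating a new preinitial."""
    return re.sub(r"^pVz", "pz", form)

def taoping(form: str) -> str:
    """Assimilation first, vowel loss second."""
    return lose_vowel(assimilate(form))

def reversed_order(form: str) -> str:
    """Vowel loss first: the new pz- would then feed assimilation."""
    return assimilate(lose_vowel(form))

for source in ["pz-", "pVz-"]:
    print(source, ">", taoping(source), "| reversed order:", reversed_order(source))
```

With the reversed order both sources would merge as bz-, so a surviving Taoping pz- would have to postdate the assimilation rule.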

The reanalysis above is motivated by a hypothesis that Proto-Sino-Tibetan had fewer preinitials than initials: e.g., one preinitial velar stop *k- but three initial velar stops *k- *kʰ- *g-. But in theory Qiang could have preserved preinitials lost in Old Chinese, Old Tibetan, pre-Tangut, Pyu, etc.

6. Today I learned that 'Jewish' in Khmer is ជ្វីស <jvīs> [cʋih], a borrowing from French juif [ʒɥif]. Why is it spelled with s and not <ḥ>?

7. Looking at Vovin (2017) again while writing footnote 2 of "The Day of the White Dragon", I noticed he reconstructed Old Korean 日尸 <SUN.l> 'sun' (普皆廻向歌 Pogaehoehyangga, line 5, mid-960s) as *nal. That would seem to rule out a connection with Serbi-Mongolic forms like Khitan ñayr 'day' (as reconstructed by Shimunek 2017: 358) and Middle Mongolian naran 'sun'.

2.14.15:51: The only way around this would be to reconstruct a third liquid or a liquid cluster in the source language of 'day/sun' that became *r in Serbi-Mongolic but *l in Koreanic.

8. 2.14.17:57: I forgot to mention this passage I saw yesterday:

The Shakya clan of India, to which Gautama Buddha, called Śākyamuni "Sage of the Shakyas", belonged, were also likely Sakas as Michael Witzel and Christopher I. Beckwith have demonstrated.

I hope there is more to the argument than the similarity between Śākya and Saka. As Attwood (2012: 58) wrote,

The similarity in names is not enough to identify the Śākyas with the Iranian Sakas.

Attwood evaluates and expands upon Witzel's 2010 proposal. I am unable to evaluate it or Beckwith's 2015 book.

THE DAY OF THE WHITE DRAGON

Or, in Jurchen,

<šang.giyan DRAGON.r DAY> šanggiyan mudur inenggi

I thought I had lost my list of topics for yesterday's entry, but I found the former as I was about to post the latter. The list was in index.htm before I was about to paste "The Day of the Yellow Hare" onto the top.

1. In my discussion of the Jurchen word for 'red', I forgot to mention modern Sanjiazi Manchu fulxajn 'red' (Kim 2008: 144) corresponding to standard written Manchu fulgiyan (which I presume to have been [fʊlɢʲaʜ]). x seems to be from an earlier fricative *[ʁ] rather than a stop *[ɢ]. But why is it devoiced between voiced segments [l] and [a]? Is it from an earlier unaspirated stop *[q]? (Voiced stop symbols in my Jurchen/Manchu notation may have been either voiceless unaspirated or voiced in medial position.)

I'd like to find more instances of the g : x correspondence.

I'd also like to find more examples of palatality moving to the end: C₁iyVC₂ > C₁VyC₂. Having just mentioned Jurchen šanggiyan 'white' (the standard written Manchu word is the same), I would expect a Sanjiazi form ending in -ajn, but the actual form is ɕaŋŋən without -j- (Kim 2008: 94). Could -ən be a reduction of *-ajn?
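To hunt for more examples, the C₁iyVC₂ > C₁VyC₂ pattern can be applied mechanically to written Manchu spellings. A minimal sketch, assuming Möllendorff-style spellings and a made-up function name; it says nothing about the other changes (e.g., g > x) that a real Sanjiazi form would also undergo.

```python
import re

# -iy- followed by a vowel and then a consonant:
# delete the i, and move the y to the position after the vowel
PATTERN = re.compile(r"iy([aeouū])(?=[^aeiouū])")

def move_palatality(word: str) -> str:
    """Apply C1iyVC2 > C1VyC2 to a romanized word."""
    return PATTERN.sub(r"\1y", word)

for word in ["fulgiyan", "šanggiyan"]:
    print(word, ">", move_palatality(word))
```

The output for šanggiyan is the -ayn form that Sanjiazi does not actually have, which is the puzzle noted above.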

2. Middle Korean 븕- pɯrk- 'red' is somehow related to Sanjiazi fulxajn. When looking for what Alexander Vovin (2009: 73) had to say about 븕- pɯrk-, I found his proposal that the attributive suffix of Old Korean

明期 <BRIGHT.kɯj> *pʌlk-kɯj 'bright¹-ATTR²' (處容歌 Chŏyong-ga, line 1, mid-700s)

is the source of the Proto-Japanese³ attributive suffix *-ke. I suspect the Koreanic source of loans in Proto-Japanese was not a direct ancestor of Old Korean. So maybe the source language had an attributive suffix *-ke, possibly from a Proto-Koreanic *-kɯj. Otherwise I would expect Old Korean *-kɯj to be borrowed into Proto-Japanese as *-kəj or even *-kɨj if Frellesvig and Whitman's proposal of a seventh Proto-Japonic (and by extension, Proto-Japanese) vowel is correct.

An apparent paradox just occurred to me: Old Korean has -Vj where the Koreanic source of Proto-Japanese loanwords has *-e and vice versa:

'ATTR': OK -kɯj : PJN *-ke

'Buddha': OK *putke (cf. pre-Jurchen *putiki; pre-Jurchen had no front vowel *[e]⁴) : PJN *pətəkaj

Adding yet another layer of complication:

'temple': OK *tjara (cf. Jurchen taira(n); ty- [tj] is not possible in Jurchen⁵) : PJN *tera (< *tjara?)

Or was Jurchen ai an attempt to approximate a Koreanic *[e]? a is nonhigh like *[e] and i is palatal like *[e].

Maybe this can be resolved at the Proto-Koreanic level. And/or maybe there was more than one Koreanic source of Proto-Japanese loanwords: e.g., one language at two different periods or two languages/dialects at once.

3. A sequel to my proposal of *rjaC > rar4 in Tangut: I looked up all three rar4 words with etymologies in Jacques (2014), and none have cognates with *-j-:

𘗶 0803 2rar4 'horse' < *-k-H?, suffixed stop-final variant of 𘆝 0764 1rer4 < *-ŋ 'id.' : Japhug mbro < *-ŋ 'id.', Written Burmese mraṅḥ 'id.'

𘅤 1715 1rar4 'to write' : Japhug rɤt 'id.'

𘃜 5523 1rar4 'must' : Japhug ra 'id.', Written Burmese 'id.'

If Gong Xun is right, all three had a simple initial *r- in pre-Tangut like the cognates for the last two words:

2rar4 < *rak-H? 'horse'

1rar4 < *rat 'write'

1rar4 < *raC 'must' (but why does Tangut have a final consonant corresponding to zero in Japhug and Written Burmese?)

That's simpler than my scenario in which Grade IV lower vowels are 'bent up' by preceding high vowels in presyllables that were lost:

rar4 < *.raC

On the basis of Japhug and Written Burmese, I could propose *mɯ.rak-H as the source of 2rar4. But there is no external evidence for presyllables for 'to write' and 'must'; at this point they are merely constructs necessitated by my theory.

The relative simplicity of Gong's theory and mine is reversed with Grade I (there are no Grade II or III syllables with r-):

rar1 < *raʶ (Gong) but *(Cʌ.)ra (this site)

The advantage of my theory is that it requires no exotic segments like uvularized *aʶ. (But nonexotic segments not supported by external evidence are not to be embraced.)

The ratio of rar1 to rar4 in my database of Tangut character readings (≠ morphemes or words!)  At a glance that may suggest Gong's *aʶ (> a1) was almost as common as his *a (> a4), which seems implausible. However, a count of types is not a count of tokens. Phonemic frequency analysis of Tangut texts remains to be done. 
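The difference between the two kinds of count is easy to sketch in Python. Everything below is invented for illustration: the character labels, their readings, and the 'text' are placeholders, not data from my database or from any Tangut text.

```python
from collections import Counter

# Hypothetical dictionary: character label -> reading
readings = {"A": "rar1", "B": "rar1", "C": "rar4", "D": "rar4"}

# Hypothetical running text as a sequence of character labels
text = ["C", "D", "C", "A", "C", "D"]

# Types: how many distinct characters have each reading
type_counts = Counter(readings.values())

# Tokens: how often each reading actually occurs in the text
token_counts = Counter(readings[ch] for ch in text)

print("types: ", dict(type_counts))
print("tokens:", dict(token_counts))
```

In this made-up case the two readings are tied by type but far apart by token, which is the kind of gap a text-based frequency analysis could reveal.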

4. Yesterday I learned who founded Taitō and why it has its name - the kanji are 太東, and 太 is short for 猶太 'Jewish', a reference to Michael Kogan's background. 猶太, pronounced [jowtʰaj] in standard Mandarin, is a Chinese phonetic transcription of a form like Judaea.

After so many years, I finally wondered - why does the d of Judaea correspond to an aspirated [tʰ] in most Chinese readings of 太⁶: e.g., standard Mandarin [tʰaj]? Was 太 'great' chosen more for its meaning than its phonetic value? But then why not transcribe dae as 大 'great' without a dot and with either an unaspirated [t] or a voiced [d] depending on Chinese variety?

(2.13.22:48: Answering my own question, I learned that 猶大 without a dot already exists as the Chinese borrowing of Judah [son of Jacob] and, in Protestantism, Judas and Jude. But I would imagine 猶太 predates 猶大, so it wouldn't be as if 猶大 were already taken. I could be wrong, though. I don't have time to track down those words.)

I don't know what the earliest Chinese term for 'Jewish' was. The English and Chinese Wikipedia mention Yuan dynasty terms 竹忽 *tʂu xu and 朱乎得 *tʂu xu tə as terms for Jews, but I can't find any attestations at Scripta Sinica. The initial *tʂ- is odd since I would expect a glide *j-. What language with an affricate-initial word for 'Jew' would be a plausible source for those borrowings?

5. Last month I proposed that the Jurchen word for 'sword'


might be halmar corresponding to Manchu halmari 'a sword used by shamans'. I then realized that Jurchen


mudur 'dragon' : Manchu muduri 'id.'

is another example of that correspondence, but forgot to blog about it until today, a dragon day. Is Manchu -ri in part from earlier -r, or is this another case where Manchu is more conservative?

6. Back in 2011 I proposed that the Jurchen phonogram


as in

<> wihan 'ox'

had what looked like Jin Chinese 不 *pu 'not' on the bottom because it was originally intended to write a Koreanic word *an 'not'. That was an extremely stupid idea, even 'wronger' than usual for this site, because an isn't attested until the late 1800s (Martin 1992: 419); the earlier Korean form was disyllabic ani.

But maybe that idea can be salvaged minus the anachronistic reference to an. Today I saw Alexander Vovin's "Two Tungusic Etymologies" (2018) in which he reads Late Old Korean 不知 <NOT.ti> as anti 'not'. So 不 was read an-, though that was still centuries before there was a standalone word an 'not'. He then proposes that Proto-Korean *an-negatives are the sources of Tungusic negatives. That borrowing must have occurred very long ago - long before the rise (and fall) of Parhae in the second half of the first millennium CE. Still, the idea of Jurchen speakers knowing of Koreanic an-negatives now seems a bit more plausible.

7. Some new terms for convenience:

North Koreanic: hypothetical prestige language of Parhae underlying the Parhae script. Inexplicable sound-symbol matches in the Khitan and Jurchen large scripts (e.g., why write Jurchen an with a character 不 read *pu in Jin Chinese?) might involve North Koreanic readings.

Late Koreanic loans in Jurchen (e.g., taira(n) 'temple') are from North Koreanic.

Earlier Koreanic loans in Tungusic (or vice versa - e.g., 'red'?) could predate the North/South Koreanic split.

South Koreanic: source language(s) of Koreanic loans in Proto-Japanese and Old Japanese. The inconsistent correspondences between Old Japanese and Old Korean *e may reflect borrowings from two varieties of South Koreanic, 'A' (from Paekche?) and 'B' (from Kaya?). The Old Korean of Shilla may be a third variety, 'C'.

Japanese tera 'temple' is a loan from South Koreanic, so there is no need to come up with a single early Koreanic form underlying both Jurchen taira(n) and Japanese tera; the vowel of the first syllable could have developed differently in North and South Koreanic.

I could just use terms like 'Koguryo' for North Koreanic and 'Paekche' for South Koreanic, but I want to avoid conflating languages with states, particularly given the presence of a Japonic and perhaps even Tungusic substratum on the peninsula.

¹2.13.23:02: Korean 'red' and 'bright' are thought to be related via ablaut (Vovin 2009: 7). In Middle Korean, they both have -r-, but I reconstruct -l- for Old Korean for both words, given that

1. Old Korean had an r/l contrast lost in Middle Korean (Vovin 2017)

2. Tungusic and Mongolic have an r/l contrast

3. Tungusic and Mongolic have l in 'red'

4. So Old Korean had *l in 'red'

5. And if 'red' and 'bright' have the same root

6. Then 'bright' had *l too

'red': Old Korean *pɯlk- : Middle Korean 븕 pŭrk- : Modern Korean 붉- pulk-

'bright': Old Korean *pʌlk- : Middle Korean ᄇᆞᆰ părk- : Modern Korean 밝- palk-

²2.13.23:24: The sequence *ʌ ... ɯ looks bizarre from a Middle Korean perspective because it combines a lower vowel stem with a higher vowel suffix, but Old Korean did not have vowel harmony (Vovin 2009: 11). I suspect that vowel harmony was introduced into Korean by Tungusic speakers in the northern half of the peninsula. But is there any evidence for more vowel harmony in northern Korean than in southern Korean?

The sequence *pʌlk-kɯj is bizarre in another way: the normal Korean attributive suffix is -ɯn: cf. 去隱 <LEAVE.ɯn> for *?-ɯn 'left' in 慕竹旨郞歌 Mojukchirangga (c. 700).

Stranger still, it is also possible to interpret 明期 <BRIGHT.kɯj> as *pʌlk-ɯj 'bright-GEN'. Strange because pʌlk- is a verbal root that should not be followed by a genitive suffix. Could *pʌlk-ɯj be a remnant of a time when the verb/noun distinction was not as strict?

³2.13.13:33: Proto-Japanese is distinct from Proto-Japonic:

Japanese dialects
Ryukyuan languages

Proto-Japonic is the ancestor of the entire family. Proto-Japanese is the ancestor of the dialects of mainland Japan.

⁴2.13.13:01: E in my Möllendorff-style notation for (pre-)Jurchen represents [ə], not [e]. [i] was the only front vowel in (pre-)Jurchen.

⁵2.13.13:43: Alexander Vovin (2007: 77) proposed Old Korean *tiara 'temple' and metathesis in Jurchen (*ia > ai) to work around the impossibility of the initial cluster tj- in Jurchen.

⁶2.14.0:31: The major exception is Toisanese in which [tʰ] became [h], so 猶太 theoretically would be read [ziwhaj]. (*j- became [z] in Toisanese - a sound change shared by Vietnamese.) But I have no idea if [ziwhaj] is the actual Toisanese word for 'Jewish'. I don't know how far 'syllabic conversion' goes in nonstandard Chinese varieties. Have Toisanese speakers simply borrowed Cantonese 猶太 [jɐwtʰaːj]?

I suspect nonstandard Chinese varieties have a lot of borrowings from prestige languages: e.g., why read 妮妲莉寶雯 'Natalie Portman' in Toisanese instead of borrowing Cantonese [nejtaːtlej powmɐn]?

I hope I read that correctly. 妲 can also be read [tʰaːn]. That might be a recent modern reading by analogy with 袒 and 坦, both [tʰaːn]. Jiyun (1037) lists two fanqie for 妲:

- 當割切 for *tat, corresponding to Cantonese [taːt]

- 得案切 for *tan which should correspond to Cantonese †[taːn] with unaspirated [t]

The only Mandarin reading I know of is da [ta] from *tat, so I guessed that 妲 was [taːt] in 妮妲莉 'Natalie' (even though I'd expect an aspirated [tʰ] corresponding to written -t-). Is the Cantonese name based on an American pronunciation [næɾəli] with a voiced alveolar flap [ɾ]? If so, then unaspirated [t] would be a better match for [ɾ] than aspirated [tʰ].

THE DAY OF THE YELLOW HARE

Or, in Jurchen,

<YELLOW.giyan HARE DAY> sogiyan gulma? inenggi

I don't have time to even make a list like last night¹. And I don't want to wait another twelve days to say this, so ...

It's not clear how the Ming Jurchen would have written 'hare' in their script. The Bureau of Translators vocabulary (early 1400s?) has the Ming Mandarin transcription

古魯麻孩 *ku lu ma xaj (#150)

for a two-character spelling ending in a phonogram

<HARE.hai> gulmahai

whereas the Bureau of Interpreters vocabulary (c. 1500?) without Jurchen script has the Ming Mandarin transcription

姑麻洪 *ku ma xuŋ (#1100)

for gu[l]mahun (Kane 1989: 218) which reflects a different suffix found in Manchu gūlmahūn².

Kiyose (1977: 105) suggested that the Bureau of Translators form gulmahai is genitive, implying that the word for 'hare' without the genitive case marker -i was *gulmaha³. But if that was the case, how would -ha have been written? N3696 lists eight different Jurchen characters read xa (= my ha [χa]).

Here I've followed Andrew West who regards 'hare' as simply gulma sans suffixes, but at present I cannot confirm that shorter reading because the only phonetic evidence for the word I have on hand is the two transcriptions above. I do not know of any Jin dynasty attestations of the word. I suspect that the original spelling was a single logogram *<HARE>. However, I cannot say whether *<HARE> would have been read as *gūlma, *gūlmaha, *gūlmahūn, or something else.

¹2.12.0:49: I did make notes for a list to appear in this entry, but I lost it due to computer problems. I should reconstruct it later today before I forget.

²2.12.10:39: -hūn is probably the same suffix found in Manchu indahūn 'dog' and Ming Jurchen

<DOG.hun> indahun 'dog'

(from the Bureau of Translators vocabulary, transcribed 引荅洪 *in ta xuŋ [#147]; the Bureau of Interpreters vocabulary has indahu, transcribed 因荅忽 *in ta xu [#413]).

Other Tungusic languages have a bare stem (e.g., Orok ŋinda) or a different suffix (e.g., Oroch inaki).

It's not possible to tell whether the one-character spelling


from the Jurchen Character Book manuscript thought to be an early catalog of characters represented indahūn⁴, the bare root inda, or even inda with a different suffix. It's even possible that Proto-Tungusic *ŋ- (cf. the Orok form above) was still present in the Jin Jurchen word for 'dog'.

The function of -hūn is unclear to me. It does not seem to be the -hūn that Gorelova (2002: 148-150) regards as a suffix for Manchu quality nouns: e.g., aibishūn 'swollen, swelling (n.)' (cf. aibimbi 'to swell').

³2.12.9:39: See Gorelova (2002: 114) for examples of the Manchu noun suffix -ha. It is unclear to me how she distinguishes between nouns with -ha suffixes and nouns with unsuffixed roots ending in -ha (assuming the latter type of noun exists in her view).

⁴2.12.10:28: Jin Jurchen probably had a Manchu-like u/ū [u/ʊ] distinction lost in the dialect recorded by the Bureau of Translators. See Kiyose (1977: 45-46) on how Ming Jurchen spellings indicate the loss of that distinction.

It is unclear whether the Bureau of Interpreters dialect retained the distinction because there would be no clear way to indicate it in Ming Mandarin transcriptions: e.g., *ku ma xuŋ could represent either gu[l]mahun as Kane thought or gulmahūn.

THE DAY OF THE YELLOW TIGER

Or, in Jurchen,

<YELLOW.giyan TIGER DAY> sogiyan tasha inenggi

I'm going to try something new. I have too many topics on my mind and not enough time to cover any of them properly. Yet I don't want them to slip away forgotten or remain as unfinished stub entries, never to be completed. So I'll just make a quick list of topics I might return to later. Might.

1. In "The Day of the Red Ox", I didn't mention Middle Korean 븕 pŭrk- 'red' which is somehow related to the Mongolic/Tungusic word for 'red'. Was -ŭ- [ɯ] an attempt to imitate a foreign [ʊ]? Here's a modern Korean book in which English took [tʰʊk] is phonetically rendered in hangul as 특 thŭk [tʰɯk].

2. Looking at the cover of Jacques (2014) with examples of Tangut ar4 words, I realized that maybe I was wrong about pre-Tangut *rjaC becoming Tangut ar4. Maybe *rjaC became rar4, whereas *CV-rjaC became ar4: i.e., *-rj- lenited to zero between a presyllable and the main vowel. (Actually, I think ar4 was phonetically something like [jæʳ], so maybe *-rj- was reduced to *-j-.)

3. I wish this page on Tungusic from 1998 were rewritten in Unicode. Maybe it'd be legible if I dug up an old pre-Unicode SIL phonetic font.

4. I was looking at Nedjalkov's (1997: 311, 314-315) description of Evenki vowels and vowel harmony. Two eye-catching things off the top of my head:

1. no true high vowels [i u]

2. long [ɛː] patterning with [a] rather than [ɛ] in vowel harmony.

Could [ɛː] be from *aj (cf. Korean 애 [ɛ] < Middle Korean [aj], [ʌj])?

2.11.1:11: Wikipedia doesn't even try to describe Evenki vowel harmony rules:

Knowledge of the rules of vowel harmony is fading, as vowel harmony is a complex topic for elementary speakers to grasp, the language is severely endangered (Janhunen), and many speakers are multilingual.

5. For three years I've agreed with Beckwith (2002) who was the first to propose that Pyu aṁ was [ɛ]. I've been assuming that aṁ [ɛ] < *e. Today I realized that maybe it could in part come directly from *ja: e.g.,

*ja > *jæ  > *jɛ > [ɛ]

in hrat·ṁ [r̥ɛt] 'eight' (cf. Old Tibetan brgyad 'eight').

6. For years I've wanted to convert transcriptions of Rouran names into Middle Chinese and see if anything interesting emerges. Here's an example: 郁久閭社崙 ʔuk kuʔ lɨə dʑiæʔ lon for 'Yujiulü Shelun' in modern standard Mandarin.

THE DAY OF THE RED OX

Today is a

<RED¹.giyan OX².an> ful(a)giyan wihan inenggi 'red ox day'

in the Jurchen calendar.

It is hard at a glance to tell whether 'red' was disyllabic [fʊlɢʲaʜ]³ (i.e., identical to later standard Manchu fulgiyan⁴) or trisyllabic [fʊlaɢʲaʜ]. The Bureau of Translators vocabulary (early 1400s?) has the trisyllabic transcription

弗剌江 *fu la kjaŋ (#617)

whereas the Bureau of Interpreters vocabulary (c. 1500?) has the disyllabic transcription

伏良 *fu ljaŋ (#1100)

The obvious solution would be to posit [a]-loss: earlier trisyllabic [fʊlaɢʲaʜ] became later disyllabic [fʊlɢʲaʜ]. But it is not clear that the varieties of Jurchen within the two vocabularies are two snapshots of the same dialect at two different points in time. It is not even clear that each vocabulary is homogeneous: i.e., reflecting only a single dialect rather than a mix learned from various informants who may not even have been contemporaries. Lastly, it is possible that Ming Mandarin *la was merely a device to write a simple Jurchen [l]. There was no Ming Mandarin syllable *ful (and hence no character for such a syllable), so [fʊl] might have been transcribed as 弗剌 *fu la. On the other hand, other Tungusic languages do have an a after l in 'red', and the undoubtedly related Proto-Mongolic word for 'red' does have an *a between *l and *g: *hulagan, suggesting that the *a at least dates back to when Tungusic borrowed the word from Mongolic (or vice versa?). I should look into this more.

As for 'ox', the Bureau of Translators vocabulary has the transcription

委罕 *wej xan (#143)

whereas the Bureau of Interpreters vocabulary has the transcription

亦哈 *i xa (#411)

Jin (1984: 128) takes the transcription *wej xan at face value, reconstructing Jurchen weixan (= weihan in my notation) which violates vowel harmony (e and a belong to opposing vowel classes and should normally not be in the same root). Kiyose (1977: 105), on the other hand, disregards the *w- without explanation and reconstructs Jurchen ihan which matches Manchu ihan [ɪχaʜ] 'ox'. Kane's (1989: 216) reconstruction of iha is straightforward.⁵

Once again, the obvious solution is to posit loss over time: earlier wi- became later i-. The *-e- of the transcription simply reflects the fact that Ming Mandarin had no syllable *wi; *wej was the closest match for Jurchen [wɪ]. Manchu has no wi, so all early Jurchen wi became later Jurchen/Manchu i. Japanese had the same wi > i change, which is why the kana ゐ/ヰ for <wi> are now obsolete.

The trouble is that there is no support elsewhere in Tungusic for an initial w- in 'ox'; all the non-Jurchen forms in Cincius (1975: 299) start with i-type vowels. (Oddly I cannot find the 'ox' cognate set at starling.)

Is it possible that Jurchen once preserved a Proto-Tungusic *w- that all other languages lost before *i⁶, even though Jurchen/Manchu is considered innovative? There is no a priori reason to reject that possibility; a language that is innovative in many ways can still be conservative in at least one way. Ideally I would like to find other cases of Jurchen wi- corresponding to i- elsewhere in Tungusic.

¹2.10.13:45: I have only seen this character followed by <giyan> in the Bureau of Translators vocabulary. Nonetheless I don't think it was a Ming dynasty addition to the Jurchen character set. It is not attested in words other than 'red'. So I suspect that it was originally a standalone logogram <RED> and that <giyan> was added later to represent its final syllable.

²2.10.13:55: This character appears by itself in the Jurchen Character Book manuscript thought to be an early catalog of characters. That suggests it was originally a standalone logogram <OX> and that the <an> in the Bureau of Translators vocabulary is a later addition.

³2.10.23:29: Or perhaps [fʊlʁʲaʜ] with [ʁ]. The Bureau of Interpreters transcription without *k might indicate that the uvular stop had lenited to the point where it was hard to perceive.

⁴2.10.17:17: The Manchu spelling fulgiyan appears trisyllabic, but -iy- is just a means to write palatalization.

⁵2.10.14:19: The absence of an -n present in Manchu is a common trait of the Bureau of Interpreters transcriptions. See Kane (1989: 112) for other cases of a Jurchen zero : Manchu -n correspondence.

⁶2.10.23:27: Cincius' enormous Tungusic dictionary (1975) only has six pages of entries for в- <v> and only two entries for ви- <vi>, both for Evenki words without cognates elsewhere. So it does not appear there is any obvious modern (i.e., non-Jurchen) evidence for reconstructing Proto-Tungusic *wi-. I suspect *w- was once far more frequent and lost in most environments (e.g., in Manchu w is only possible before a and e). The only *w-word I could find in starling's Proto-Tungusic is *wa- 'kill' which is solidly attested throughout the family.

As tempting as it may be to reject wi- in Jurchen (and, by extension, earlier Tungusic), the Chinese transcription 委 *wej is difficult to explain away since (1) the Chinese could have easily chosen an *i-character to write a Jurchen i- and (2) I cannot think of any *i-character that might be miswritten as 委 *wej.

WAS G'AG'AI KOREAN?

Tonight it occurred to me that the Manchu script was ironically credited to two men with non-Manchu names, Erdeni and G'ag'ai.

Erdeni is the Mongolian borrowing of Sanskrit ratna- 'jewel' with an initial vowel added to avoid an initial r- forbidden by Mongolian phonotactics.

Crossley (2000: 185) wrote,

Like those of many leaders of the Nurgaci period, Erdeni's origins are difficult to characterize. He had a Mongol name and certainly could write Mongolian [the written language used by the late Ming Jurchen, just as the Jin Jurchen before them had used Khitan]; he may have been a native of a Mongolian-speaking region. But the early Manchu records suggest that he was also expert in Chinese, and that in his contemporary frame he functioned as a Nikan.

Nikan is Manchu for 'Chinese', and in Crossley's view, the term does not simply mean 'of Chinese descent'; it refers to "those who behaved as Chinese" (2000: 55). One could be ethnically Mongol - or Jurchen or Korean - and function as Nikan.

Crossley (2000: 188) speculates that G'ag'ai might have been of Nikan "background" (ethnicity) "but in fact it [his heritage?] was irrelevant" since his "responsibility for literate acts under the [Jurchen] state" made him live as a Nikan.

So was G'ag'ai [kakaj] a Nikan - er, Chinese - name? It has the velar-a sequence absent from native Jurchen/Manchu (and Mongolian) words¹. However, I don't know of any plausible Ming (or even modern standard) Mandarin name element pronounced [ka]². Here's a wild guess - might the name be Korean: i.e., something like 가개 Kagae ([kakaj] or [kagaj]³ in the 16th century)? Googling for 김가개 Kim Kagae, I found this 2009 article by 최범영 Chhoe Pŏm-yŏng which not only mentions an attestation of the name in 1404 but independently speculates that Korean Kagae is the source of G'ag'ai's name.

2.9.0:59: The name Kim Kagae appears as 金加介 in the entry for day 29, month 8 of the 12th year of King Sejong's reign (1430) in 世宗實錄 Sejong shillok 'Veritable Records of [King] Sejong'. I can't find any mention of a Kim Kagae in the entry for 1404 in 太宗實錄 Thaejong shillok 'Veritable Records of [King] Thaejong', so I don't know if the Kagae of 1404 is also spelled 金加介. It may be a native Korean name with varying Chinese character spellings.

¹2.9.0:05: The situation with g'a [ka] and ga [qa] in Jurchen/Manchu is similar to that for k'a [kʰa] and ka [qʰa].

²2.9.1:21: There are, in fact, no [ka] syllables in 'Phags-pa transcription a few centuries earlier, and [ka] only has a marginal status in modern standard Mandarin⁴. Windows 10's Pinyin IME's first suggestion for Pinyin ga [ka] is the transcription character 噶 for foreign ga: e.g., 喀什噶爾 Kāshígá'ěr 'Kashgar' and ... 噶蓋 Gágài, the Chinese transcription of 'G'ag'ai'. The other ga-suggestions are 嘎尕尬旮呷軋釓尜伽咖戛夾胳嘠錷玍魀, none of which I've ever seen in a name.

³2.9.1:01: I don't know if intervocalic voicing already existed in 16th century Korean. [aj] did not monophthongize to [ɛ] until the "end of the eighteenth century" (Lee & Ramsey 2011: 264).

⁴2.9.1:12: Old Chinese was full of *ka (= Baxter and Sagart's *kˁa) which became Middle Chinese *ko which in turn became modern standard Mandarin [ku].

Middle Chinese gained a new *ka from Old Chinese *kaj. (The final *-j shielded *-a from raising before being lost.) This too was lost in modern standard Mandarin: the new *ka became *ko and then [kɤ].

You can see part of a vowel shift chain: *aj > *a > *o > *u.
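Here is a minimal Python sketch of that chain as a series of stages, each applied to all rhymes at once so that an earlier vowel moves out of the way before a later one takes its place. The grouping into three stages, and in particular placing *o > *u in the second stage, is my assumption for the sake of the example; the function name is made up.

```python
# Each stage maps rhymes simultaneously (assumed staging, for illustration only)
STAGES = [
    ("Old Chinese > Middle Chinese", {"aj": "a", "a": "o"}),
    ("Middle Chinese > early Mandarin", {"a": "o", "o": "u"}),
    ("early Mandarin > modern standard Mandarin (after velars)", {"o": "ɤ"}),
]

def develop(rhyme: str) -> list[str]:
    """Return the rhyme at the start and after each stage."""
    history = [rhyme]
    for _label, mapping in STAGES:
        rhyme = mapping.get(rhyme, rhyme)  # unlisted rhymes stay unchanged
        history.append(rhyme)
    return history

for old in ["a", "aj"]:
    print(" > ".join(develop(old)))
```

Because each stage is applied simultaneously, the new *a from *aj never catches up with the old *a, which is what keeps [ku] and [kɤ] apart.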

In tabular form:

Old Chinese
Middle Chinese
Early Mandarin
Modern standard Mandarin
[ɤ] after velars

I have excluded reflexes of early Mandarin *o after other initials.

FUK'ANGGAN I GEBU (THE NAME OF FUK'ANGGAN)

The 乾隆 Qianlong emperor died 220 years ago today. He appointed 福康安 Fuk'anggan to lead the troops in the Sino-Nepalese War.

Fuk'anggan has an interesting name for two reasons:

1. It has the typical Chinese three-syllable pattern, it contains the Chinese syllable k'ang with a velar-a combination [kʰa] absent from native Manchu words¹, and it even has a meaningful, positive Chinese character spelling: 'good-fortune health peace'. Yet it is romanized as a trisyllabic single name because it is not a Chinese name - 福 Fu is not his surname, though it may have been influenced by his clan name Fuca (spelled with a different fu, 富 'rich', in Chinese: 富察). And his personal name was not 康安 K'anggan; it was Fuk'anggan. At most I could say that Fuk'anggan is a Sino-Manchu hybrid; it wouldn't have been a Jurchen name many centuries ago.

2. I would not expect 安 to correspond to Manchu gan; it was read an in Beijing.

Does gan for 安 reflect the influence of a Mandarin dialect in which 安 was read ŋan? (2.8.0:04: There are many such dialects today.) [ŋ] was not a possible syllable-initial consonant in Manchu, so [ŋŋ] was not possible word-internally in Manchu, and [fukʰaŋɢan] would be the closest Manchu approximation of a Mandarin *fu kʰaŋ ŋan.
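The repair proposed above can be sketched in Python. This is only an illustration under one assumption: the sole adaptation is replacing a syllable-initial [ŋ] with [ɢ]. The function name `manchu_approximation` is hypothetical.

```python
def manchu_approximation(syllables):
    """Join Mandarin syllables into a single Manchu-style word.

    Assumption for illustration: a syllable-initial [ŋ], which Manchu does
    not allow, is replaced by the stop [ɢ]; nothing else is adapted.
    """
    adapted = []
    for syllable in syllables:
        if syllable.startswith("ŋ"):
            syllable = "ɢ" + syllable[1:]
        adapted.append(syllable)
    return "".join(adapted)


# Hypothetical dialectal Mandarin source *fu kʰaŋ ŋan
print(manchu_approximation(["fu", "kʰaŋ", "ŋan"]))
```

Syllable-final [ŋ] is left alone, so the [ŋɢ] sequence in the middle of the word falls out of the rule without any extra step.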

¹2.8.1:08: ka in the Möllendorff romanization of Manchu that I use represents [qʰa] with a uvular [qʰ]. [qʰa] is more common in Manchu than the loan sequence [kʰa], so it makes sense to use ka for the more frequent syllable and k'a for the less frequent syllable.  The apostrophe after velar letters corresponds to velarity, not aspiration as in the Wade-Giles romanization of Mandarin.

Möllendorff did, however, use the apostrophe for aspiration to romanize other Manchu letters for Chinese transcription: ts' [tsʰ] and c' [tʂʰ] (the latter only before y in his romanization). I favor Norman's decision to drop the apostrophe in those cases since there is no native [tsʰ] that contrasts with ts'. Nor is there a native cy that contrasts with c'y.

HAVE AN ICE DAY

Today is the first day of the new year. The first of the month - ice inenggi 'new day' in Jurchen (see Andrew West's online Jurchen calendar):

Jin Qizong derived the character for ice 'new' (pronounced with two syllables: [itɕə]) from the left side 亲 of Chinese 新 'new'. I couldn't quite buy that because of the asymmetry of the Jurchen character and the symmetry of 亲. But I just found the asymmetrical Chinese variant 𢀝 from the Jin (!) dynasty dictionary 四聲篇海 Si sheng pian hai 'Sea [of Writings] Arranged by the Four Tones'. (I got the title translation from Imre Galambos.)

As an adherent of Janhunen's ex Parhis² hypothesis, I don't think the Jurchen script was Chinese mutilated on the spot by Wanyan Xiyin in 1119. Rather, I think 完顏希尹 Wanyan Xiyin adapted an existing Parhae script that was a local (i.e., Manchurian) variant of the Chinese script. And the character for 'new' in the Parhae script might have been that variant 𢀝 or something close to it - possibly even identical to the Jurchen character.

CIKOSKI'S NOTES FOR A LEXICON OF CLASSICAL CHINESE

Today I discovered John Cikoski's Notes for a Lexicon of Classical Chinese, Volume I (2011) while looking for Bernhard Karlgren's (1954) quotation about the excesses of phonemics. The book would have strongly appealed to me if I were still a Karlgrenian.

In the early 90s I borrowed every book of Karlgren's I could find. My favorite remains his 1954 Compendium of Phonetics in Ancient Chinese and Archaic Chinese which walked me through the reasoning behind his reconstructions. I no longer agree with him on many matters, but at least I know why he did what he did. A scientist must ensure that his results are replicable and not seemingly pulled out of a hat.

When I first saw Pulleyblank's Middle Chinese (1984) in 1992, my gut reaction was disbelief. Chinese couldn't have looked like that! Too bizarre! It would be another year before a second look at Pulleyblank persuaded me.

If I had never become a Pulleyblank fan, I would enjoy Cikoski's book more. Cikoski picks up where Karlgren left off and builds upon the master's reconstruction while still avoiding what he perceives as the pitfalls of modern approaches. Details later.

2.5.21:17: But in the meantime I found the other volumes of his Lexicon with a copyright notice, covers, and a non-Unicode Grammata Serica font with a key here.

THE FATHER OF JURCHEN LANGUAGE STUDIES

Today I realized that's who Wilhelm Grube was when I read his Wikipedia entry. I've known about him since the mid-90s. I have no idea why it took me so long to see the obvious.

I also saw Andrew West's scan of Grube's seal (葛祿博藏書印 'Seal of the Library of Ge Lubo', read from top to bottom, right to left):




cáng 'to store'

I didn't recognize the seal form of 藏 'to store'; it's so much simpler than the regular print form 藏. The closest Unicode match is in CJK Unified Ideographs Extension B: 𤖋. I'm surprised 𤖋 is not in this list of 28 variants of 藏.

As simple as 𤖋 is, it's not as simple as the proposed second-round simplified character 䒙 - one of the lucky ones in Unicode (CJK Unified Ideographs Extension A, to be exact). Some second-round characters still aren't in Unicode (and are marked in red on Andrew West's page). It's incredible ... we can type Tangut in Unicode but not "newspapers, books, and publications of all kinds" written in second-round simplified characters in 1978.

藏/𤖋/䒙 has two standard Mandarin readings, cáng and zàng. Neither quite matches the reading of the phonetic of 䒙, 上 shàng. However, 上 is a very transparent phonetic for 䒙 in Wu varieties like Suzhou in which both 藏/𤖋/䒙 and 上 can be [zɒŋ]¹ (ignoring tonal differences; compare the readings here and here).

¹2.4.1:10: 上 also has a colloquial Suzhou reading [zaŋ] with a different vowel.

THE 75TH ANNIVERSARY OF THE BATTLE OF KWAJALEIN

got me thinking about Marshallese vowels and the perpetual mystery of Tangut rhymes again for the first time since 2014. The very first time I thought of comparing Marshallese with Tangut was in 2010. And nearly a decade later, it was the sight of Kwajalein in IPA that got me on that track again:


The complex vowels of Marshallese are analyzed as just four basic vowel phonemes /a ɜ ɘ ɨ/ that 'warp' under the influence of consonants with various qualities.

Similarly, the complex vowels of Tangut could have been just six basic vowels (u i a y e o) that 'warped' under the influence of consonants with various qualities (pharyngealized, uvularized, and plain from a Xun Gong-type perspective).

The 'grades' of Tangut correspond to those qualities. I write grades as numbers after basic vowels: e.g., ka1 is grade I ka.

I could write Marshallese using a similar notation: e.g., /kʷɨwatʲlʲɜjɜnʲ/ (?) 'Kwajalein' as k1ɨw1at3l3ɜjɜn3. I can't place the 'grade' numbers after the vowels since vowels are influenced by consonants on either side, and not all consonants are followed by phonemic vowels. In my Marshallese 'grade' system, 1 is labial(ized) and 3 is palatal(ized); 2 - not in 'Kwajalein' - is velar(ized).
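The conversion to that 'grade' notation can be sketched in Python. This is a rough illustration, not a full treatment of Marshallese. The names `GRADE_OF_MARK` and `to_grade_notation` are hypothetical, and two details are assumptions: velarization is taken to be written with ˠ, and plain /w/ is treated as inherently grade 1 while /j/ is left unnumbered, following the example above.

```python
# Secondary-articulation marks and the 'grade' numbers that replace them.
# Assumption: velarization is written with ˠ in the phonemic input.
GRADE_OF_MARK = {"ʷ": "1", "ˠ": "2", "ʲ": "3"}


def to_grade_notation(phonemic):
    """Rewrite a phonemic string such as /kʷ.../ with grade numbers."""
    out = []
    for ch in phonemic.strip("/"):
        if ch in GRADE_OF_MARK:
            out.append(GRADE_OF_MARK[ch])
        elif ch == "w":
            # Assumption: plain /w/ counts as labial (grade 1) by itself.
            out.append("w1")
        else:
            out.append(ch)
    return "".join(out)


print(to_grade_notation("/kʷɨwatʲlʲɜjɜnʲ/"))
```

The numbers attach to consonants rather than vowels, which is the point of difference from the Tangut notation.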

2.4.10:30: Here's my (mis?)understanding of how /kʷɨwatʲlʲɜjɜnʲ/ (?) surfaces as [kʷuɒ͡æzʲ(æ)lʲɛːnʲ]:

1. /ɨw/ becomes [u] after /kʷ/.

2. /a/ becomes [ɒ͡æ] (starting labial like /w/ and ending palatal like /tʲ/) between /w/ and /tʲ/.

3. Palatal [æ] is inserted to break up palatalized /tʲ/ and /lʲ/.

4. /tʲ/ voices to [zʲ] between vowels.

5. /ɜ/ becomes palatal [ɛ] between palatal(ized) consonants (/lʲ/ and /j/; /j/ and /nʲ/).

6. /VjV/ contracts to a long vowel [ɛː].
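The six steps above can be replayed as ordered string rewrites. This Python sketch is tied to this one word: each rule is a literal context taken from the steps, not a general rule of Marshallese phonology, and the names `RULES` and `surface` are hypothetical. Step 2 is applied before step 1 here because step 1 removes the /w/ that step 2 needs as its left-hand context.

```python
# (step label, context to find, replacement), applied in this order.
RULES = [
    ("step 2", "watʲ", "wɒ͡ætʲ"),       # /a/ warps between /w/ and /tʲ/
    ("step 1", "kʷɨw", "kʷu"),          # /ɨw/ becomes [u] after /kʷ/
    ("step 3", "tʲlʲ", "tʲ(æ)lʲ"),      # optional [æ] breaks up the cluster
    ("step 4", "ætʲ(æ)", "æzʲ(æ)"),     # /tʲ/ voices between vowels
    ("step 5", "lʲɜj", "lʲɛj"),         # /ɜ/ fronts between /lʲ/ and /j/
    ("step 5", "jɜnʲ", "jɛnʲ"),         # /ɜ/ fronts between /j/ and /nʲ/
    ("step 6", "ɛjɛ", "ɛː"),            # /VjV/ contracts to a long vowel
]


def surface(phonemic, verbose=False):
    """Apply the rewrites in order and return the resulting surface form."""
    form = phonemic
    for step, old, new in RULES:
        form = form.replace(old, new)
        if verbose:
            print(f"{step}: {form}")
    return form


print(surface("kʷɨwatʲlʲɜjɜnʲ", verbose=True))
```

Printing the intermediate forms makes it easy to see how many rewrites separate the phonemic string from the phonetic one.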

I have doubts about whether abstract phonemic forms like /kʷɨwatʲlʲɜjɜnʲ/ represent what speakers are thinking. The phonemic-phonetic gap seems enormous:

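[Table residue: a segment-by-segment comparison of the phonemic form, the phonetic form, the Marshallese spelling, and the English spelling of 'Kwajalein'. Surviving cells: ɜ j; ɒ͡æ æ ɛː; Ø l.]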

I am reminded of Bernhard Karlgren's (1954: 366) criticism of an

intellectual sport - to write a given language with as few simple letters as possible, preferably no other than those to be found on an American typewriter.

/ʷ/, /ʲ/, and /ɜ/ obviously aren't found on an American typewriter (or any typewriter unless it's been customized, I imagine), but the problem remains: how far should a phonemic analysis go before it no longer corresponds to reality?

RYUMUNADESU

Thirty years ago tonight, リュムナデスのカーサ Ryumunadesu no Kāsa 'Limnades Caça' made his animated debut on Saint Seiya. I had first seen him in the manga some months before that. That was my first exposure to the name of a kind of naiad. I had assumed the Greek name was Lymnades since Japanese borrows Greek y as yu. But in fact the closest Greek name is Λιμνάδες Limnádes with i, not y.

Could mangaka Kurumada Masami have arbitrarily changed リムナデス Rimunadesu to リュムナデス Ryumunadesu? I have doubts because I don't remember him altering any other foreign mythological names. This page lists many of those names as spelled in his manga/the anime: e.g., スキュラ Skyura 'Σκύλλα Scylla' (with the expected yu : Greek y correspondence).

The same katakana spelling appears in 門あさ美 Kado Asami's song title リュムナデス Ryumunadesu from 1985 - three years before the Ryumunadesu in the Saint Seiya manga. Did Kurumada get his spelling from the song, or do both attestations of Ryumunadesu independently derive from a common source?

The fact that this entry in 幻想世界神話辞典 Gensō sekai shinwa jiten ('Fantasy and World Mythology Dictionary') is titled リュムナデス Ryumunadesu and cites two sources

ギリシア神話小事典 Girisha shinwa shōjiten (A Small Encyclopedia of Greek Mythology), a 1979 translation of Bernard Evslin's Gods, Demigods, and Demons: An Encyclopedia of Greek Mythology (1975)

世界の妖精・妖怪事典 Sekai no yōsei·yōkai jiten (An Encyclopedia of the World's Fairies and Mythical Creatures), a 2003 translation of Carol Rose's Spirits, Fairies, Leprechauns, and Goblins: An Encyclopedia (1996)

suggests that the リュムナデス Ryumunadesu spelling has a life beyond and a history predating the Seiya character and the song title.

Might リュムナデス Ryumunadesu have originated as an error by some Meiji period translator who confused Greek i with y? I'm guessing the spelling might go as far back as Meiji since I can't imagine the Japanese only learning about the Limnades during the last century. Unfortunately Google Books Ngram Viewer doesn't do Japanese yet, so I can't see any attestations of the spelling in old books.

JURCHEN 1284: MAHILA 'HAT'

If I had more time, I'd write an English dictionary of Jurchen characters, building upon the foundation that Jin Qizong laid in his 1984 女真文辞典 Nüzhenwen cidian 'Jurchen dictionary'. Ideally it'd be online so I could continually update it. But in reality ... you'll get random blog entries like this one about this character or that.

Tonight's character is numbered 1284 in N3788¹. It is only attested as the first half of mahila 'hat' in the Sino-Jurchen vocabulary of the Bureau of Translators (Kiyose #547):

1284 0176 <HAT la>

Although 1284 does not appear in what seems to be the earliest surviving list of Jurchen characters, I suspect that it was originally a standalone character for mahila 'hat' in the early 12th century, and that <la> was later added to it as a phonetic clarifier at some point prior to the compilation of the Sino-Jurchen vocabulary in the 15th century. I agree with Jin Qizong who regards it as a pictograph.

The second character 0176 is a common phonogram for la. See Kiyose (1977: 70) for a list of its other occurrences within the vocabulary and Jin Qizong (1984: 36-37) for examples in other texts. It is apparently the sole Jurchen character pronounced la.

I think of 0176 as Chinese 友 'friend' with an extra dot, but the first stroke of the part of 0176 resembling the 又 component (originally a drawing of a hand, though it does not represent a word for 'hand' in Chinese) stretches further leftward, crossing over the 丿 stroke (part of 𠂇, a drawing of another hand). How many Chinese students of Jurchen miswrote 0176 as 友 plus a dot?

Speaking of hands, the Tangut word for 'hand' is 𗁅 3485 1laq1 < *S-lak. 1laq1 and other Tibeto-Burman (i.e., non-Chinese Sino-Tibetan) words for 'hand' sound like 0176 la. Is 0176 a repurposed character originally intended to write 'hand' in some Tibeto-Burman language²? That hypothesis makes no geographic sense, as there were no Tibeto-Burman languages spoken in Manchuria³. I regard the correspondence between the 又 hand shape and Tibeto-Burman lak-words for 'hand' as a coincidence.

¹If I use N4631 numbers for Khitan large script, I might as well use N3788 numbers for the Jurchen (large) script.

²1.28.20:35: The Tangut script is a rich source of pareidolic stimuli. After 23 years, I suddenly 'saw' the hand-shape in the right-hand component 𘦳 of 𗁅 3485. (I still don't know why that component, often regarded as 'hand', cannot stand alone and needed a vertical stroke to be a standalone character.) If one pulls apart 又 into its component strokes フ and 乀, inserts two more 丿 between them, and adds two strokes 丷 on top, the result is  𘦳.

One could also subtract what I've called the 𘡊 'horned hat' and see the remaining 𘢌 as Chinese 手 'hand' tilted 45 degrees, but I think the resemblance between the two elements is coincidental. 𘢌 is often (but not always!) 'person', and Grinstead (1972) has derived it from a variant of Chinese 人 'person' with two extra intersecting strokes on the bottom right. (Alas, that variant is not yet in Unicode. Here is a similar variant with three nonintersecting strokes.)

³1.28.21:14 (expanded 22.33): The Jurchen script is an offshoot of the Parhae script of Manchuria. But even if the roots of that script go back westward to the lost 'Serbi script' (to use the term from Shimunek 2017: 121), that script was for Serbi (Xianbei), not a Tibeto-Burman language.

Thirty years ago, Kwanten (1989: 19) wrote,

I have recently come in possession of a number of early T'ang documents written in a script that bears very close similarity with Tangut. These documents will be the subject of a later communication, but they appear to solve the mystery [of the origin of the Tangut script] discussed above. I wish to thank Prof. Edward S.I. Wang of the Chinese Culture University in Taipei for having drawn my attention to these documents.

Unfortunately, to the best of my knowledge Kwanten never wrote about those documents or about Tangut again.

If I assume that those documents (which I have never seen) indeed contained a Tangut-like script from the early Tang, and if I take into account the fact that the Tangut ruling house claimed descent from the Tuoba of the Northern Wei (see Dunnell 1994: 157-158 for a discussion of interpretations of that claim), I can come up with this highly speculative and almost certainly wrong scenario:

- The Tuoba rulers spoke both Serbi and a Tibeto-Burman language (pre-Tangut?)

- The lost Serbi script was an offshoot of the Chinese script designed to write both languages (cf. Pahawh Hmong which was intended to write both Hmong and Khmu, though no examples of Khmu in Pahawh Hmong have survived)

- The Tangut script is a western descendant of this script, and the  Parhae script is an eastern descendant. Khitan and Jurchen large scripts both descend from the Parhae script.

One huge problem with this is that I am unaware of any evidence for any Tibeto-Burman language in the Northern Wei. The Chinese transcriptions of Middle Serbi analyzed by Shimunek (2017: 125-163) are Mongolic-like (Janhunen's Para-Mongolic, a term Shimunek rejects), not Tibeto-Burman.

Another huge problem is that there is no resemblance between the Tangut script on the one hand and the Parhae/Jurchen/Khitan (PJK?) scripts on the other beyond a shared set of Chinese stroke types. No one is going to confuse Jurchen

0176 la

with the Tangut element (not character) 𘦳 'hand', much less the actual Tangut character for 'hand', 𗁅 3485 1laq1.

FIELD OF FORTUNE

No one is going to give me an award for awareness. Obliviousness, maybe.

I don't know how I missed Andrew West's latest Khitan post from last month. At least I'm only a month late.

He deals with two inscriptions in the Khitan large script. The last graph in the first inscription is

0819 (I'm going to follow Andrew's lead and start using N4631 numbers.)

which looks exactly like Chinese 田 'field'.

Andrew wrote,

Liu Fengzhu and Wang Yunlong 2004 propose the reading [ku].

I am confused. I have not been able to find 0819 in 劉鳳翥 Liu Fengzhu and 王雲龍 Wang Yunlong's 契丹大字《耶律昌允墓誌銘》之研究 (2004) or in Andrew's index to their appendix of Khitan large script characters and readings. This is the first time I have seen the reading [ku].

For many years I have assumed 0819 was read [ʊʁ] (ugh in the loose transcription style I've been using on this site) on the basis of two readings in Kane (2009: 183):

0729 0819 Nirug (Kane; 耶律褀墓誌 17; 23:36: corresponding to the name of a 耶律 Yelü clan member transcribed as 涅魯古 *nje lu ku in 遼史 History of the Liao Dynasty? related to Written Mongolian nirughun 'back, spine, mountain range'?)

1254 0819 Qudug (Kane; name of a general in 多蘿里本郎君墓誌銘 14, name of someone's son in 耶律褀墓誌 14 and perhaps the same person again in line 16 of the same inscription)

Kane does not cite sources for either of these forms (or many others in his book), so I have supplied attestations that I have seen. (I can't say I've seen many Khitan large script texts.)

The large script name Qudug seems to correspond to Kane's (2009: 81) reading qudug 'happiness, good fortune' for the unusually complex small script character 380 (Kane's number)

that "Liu, Chinggeltei, Aisin Gioro and others identify [...] with" the northern Chinese transcription 胡覩古 *xu tu ku¹. Normally I expect single logographs in the large script to correspond to two-character blocks in the small script, but this is the only case of the reverse that I can think of.

How can the [ku] reading of 0819 be reconciled with Kane's -ug / my [ʊʁ]? Here are two solutions:

1. Reversible readings

0819 was like Old Turkic 𐰸 which could be read as qu ~ qo ~ uq ~ oq ~ q depending on context (Tekin 1968: 24).

For years I have assumed that Khitan characters of this type were read as CV after vowels and VC after consonants. So Nirug and Qudug in the large script were <nir.ʊʁ> and <qʊd.ʊʁ>.

I would expect the [ku] reading (my [ʁʊ]) to be after a vowel, but I don't know what the context was and can't test my guess.
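The CV-after-vowel, VC-after-consonant assumption can be stated as a small Python function. This is only a sketch of solution 1: the vowel inventory in `VOWELS` and the function name `read_reversible` are assumptions made for illustration, and word-initial position is left undefined because the text above does not cover it.

```python
# Assumed vowel symbols for deciding what the preceding sound is.
VOWELS = set("aeiouʊəɛɔ")


def read_reversible(preceding, consonant="ʁ", vowel="ʊ"):
    """Pick the reading of a reversible graph from the preceding sounds.

    CV after a vowel, VC after a consonant.
    """
    if not preceding:
        raise ValueError("word-initial position is not covered by the rule")
    if preceding[-1] in VOWELS:
        return consonant + vowel
    return vowel + consonant


for first_graph in ("nir", "qʊd"):
    print(first_graph + read_reversible(first_graph))
```

Under this rule a [ʁʊ]-type reading for 0819 would only be expected when the preceding graph ends in a vowel, which is why the context of the proposed [ku] matters.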

2. Only one reading

What if the northern Chinese transcription 胡覩古 *xu tu ku represented a Khitan [qʰʊdʊʁʊ]? Then 0819 could have been [ʁʊ] everywhere.

The trouble is the alternate transcription 胡都 *xu tu reflecting another strategy to deal with final consonants absent in northern Chinese: namely, ignore them. This zero ~ *ku alternation implies a Khitan final [k]-like consonant. The word has a uvular initial in later languages, and in this region uvulars generally forbid following velars. So the final consonant has to be uvular [qʰ] or [ʁ], not velar [kʰ] or [g]. And that final consonant has to be [ʁ], since Chinese unaspirated obstruents were used to approximate Khitan voiced obstruents.

For now I think solution 1 is probably right. However, to be sure I would need to see the context for which the [ku] reading was proposed.

¹Why not interpret the underlying Khitan word as [xutuku]? The limited northern Chinese syllabary was unable to cope with Khitan phonetics:

1. There was no northern Chinese [qʰ]. Chinese *x- (possibly [χ]) was the closest equivalent.

2. There was no northern Chinese [ʊ], at least in open syllables.

3. There was no northern Chinese [d].

4. There was no northern Chinese [ʁ].

5. There were no final stops in northern Chinese, so foreign final consonants were either rendered with CV-syllables or ignored (as in an alternate transcription of the Khitan word as 胡都 *xu tu).
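The five points above amount to a nearest-equivalent mapping plus two ways of handling a final consonant. A minimal Python sketch follows, with hypothetical names (`NEAREST`, `transcribe`). It handles only simple CV(C) sequences like the word under discussion, and the choice of an echo vowel copied from the previous syllable for the CV strategy is an assumption.

```python
# Nearest northern Chinese equivalents for the Khitan sounds listed above.
NEAREST = {"qʰ": "x", "ʊ": "u", "d": "t", "ʁ": "k"}
CHINESE_VOWELS = {"u"}


def transcribe(segments, final="cv"):
    """Turn Khitan segments into northern Chinese-style syllables.

    final="cv": a word-final consonant gets its own CV syllable.
    final="drop": a word-final consonant is ignored.
    """
    if final not in ("cv", "drop"):
        raise ValueError("final must be 'cv' or 'drop'")
    syllables = []
    onset = ""
    for sound in (NEAREST.get(s, s) for s in segments):
        if sound in CHINESE_VOWELS:
            syllables.append(onset + sound)
            onset = ""
        else:
            onset = sound
    if onset and final == "cv" and syllables:
        # Assumption: the added vowel echoes the vowel of the last syllable.
        syllables.append(onset + syllables[-1][-1])
    return " ".join(syllables)


khitan = ["qʰ", "ʊ", "d", "ʊ", "ʁ"]
print(transcribe(khitan, final="cv"))
print(transcribe(khitan, final="drop"))
```

Both attested transcriptions come out of the same hypothetical input, which is the reason for not taking *xu tu ku at face value.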

I will discuss the Turkic, Mongolian, Jurchen, and Manchu evidence for this word in a separate post. Without that evidence, it would not be unreasonable to reconstruct *[xudug] without any uvulars or [ʊ].

I HAVE SHIMUNEK'S BOOK!

I thought I'd never see a copy of Andrew Shimunek's Languages of Ancient Southern Mongolia and North China (2017). I thank Prof. Victor Mair for reminding me about it. I then finally realized I could borrow it from the SOAS library. Duh. It wasn't on the shelves, so I had to order it from offsite. I picked it up today. Here's the photographic proof:

Andrew Shimunek, Languages of Ancient Southern Mongolia and North China

It is HUGE. 517 pages - more than two hundred pages longer than Daniel Kane's The Kitan Language and Script (2009) which has almost always been at my side since 2011. (I didn't take it with me when I studied in Thailand and Burma. Shame on me?)

I'm running out of time tonight, so I just want to say one thing about the book. (If I had all the time in the world, I'd write a book about the book.) Since 2019 is the 900th anniversary of the Jurchen large script, I went to the index in search of the Jurchen script. Seven pages are listed (xxv, 99, 105-108, 362), but flipping through the book, I've seen more Jurchen than that. I should write a Jurchen index for the book which has indexes for language names and subjects but not for specific words. I'll post the index here when I'm done.

1.26.3:13: Of course I'd like to write other indexes for the book as well.

DO PYU AND PA-O SHARE A WORD FOR 'TO POUR'?

One of the many frustrating things about Pyu is that there are two styles of writing it: full and abbreviated. And no one has yet figured out why there were two styles¹, much less why they were mixed in one text (PYU 8). The problem is reminiscent of the mystery of the Khitan large and small scripts. In 2010, Andrew West wrote,

Having looked at and discounted the various possibilities outlined above, we seem to be none the wiser about why there were two completely different ways of writing the Khitan language. Both scripts are complex enough to require a considerable investment of time and effort to learn to read and write, so how is it possible that both scripts managed to coexist and flourish for so long ? Did the Khitan education system require students to learn both scripts, or were Khitan scholars only able to read and write one or other of the two scripts ? It makes no sense to me ...

One major difference - besides the fact that the Pyu writing styles involve only a single script - is that I presume both Khitan scripts provide more or less the same amount of phonological information in their non-logographic characters. That is not true of the two Pyu styles.

The abbreviated style omits all subscript consonants representing codas²: e.g., in PYU 8, the one text mixing the styles³, 'to be named' appears as rmiṅ·⁴ on line 3 but as rmi without subscript ṅ· on line 4. (All lines of PYU 8 after 3 are in the abbreviated style.) Until Arlo Griffiths' recent identification of the subscript consonants, Pyu was thought to be an exclusively open syllable language like Tangut⁵. Arlo also identified the r- atop rmiṅ·. So until he came along, the word was thought to be /mi/. Now I interpret it as /r.miŋ/.

If a Pyu text has no subscript consonants, it is most likely in the abbreviated style (though it is also possible that the text happened to have no closed syllables requiring subscript consonants⁶, particularly if it is very short).
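In terms of the transliteration used here, where a middle dot marks the preceding consonant as a subscript, the abbreviated spelling is the full spelling minus every dotted consonant. The Python sketch below works on the romanization only, not on the script itself, and the function name `abbreviate` is hypothetical.

```python
import re
import unicodedata

# One letter (plus any combining marks) followed by the middle dot U+00B7.
SUBSCRIPT_CONSONANT = re.compile("\\w[\u0300-\u036f]*\u00b7")


def abbreviate(full_spelling):
    """Drop every subscript (dot-marked) consonant from a transliteration."""
    # NFC keeps letters such as ṅ as single code points where possible.
    text = unicodedata.normalize("NFC", full_spelling)
    return SUBSCRIPT_CONSONANT.sub("", text)


for word in ("rmiṅ·", "ḅay·ṁḥ"):
    print(word, "->", abbreviate(word))
```

The mapping only runs one way: the full spelling cannot be recovered from the abbreviated one, which is why a text without subscripts says nothing certain about its codas.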

The word cha 'to pour' appears in PYU 7.18 and 8.18. (PYU 7 is nearly identical to 8; one major difference is that PYU 7 is completely in the abbreviated style.) If the word had only appeared here, it would not be possible to determine if cha had a coda or not. However, there is a word chai 'to pour' in PYU 7.22 and 8.23. Scholars disagree on whether cha and chai mean the same thing. I belong to the school of thought that regards them as semantically identical. I go further and also regard them as phonemically identical: two different abbreviations of  a hypothetical full spelling for /cʰaj/.

If I am correct, then /cʰaj/ cannot be compared to Written Burmese ဆမ်း chamḥ 'to pour on food'; the codas cannot be reconciled.

But could /cʰaj/ be compared to Pa-O chjā 'to pour' which I found in Solnit's 1989 wordlist today? Pa-O is a Karenic language, and both Katō (2005) and Krech (2012) have proposed that Pyu is Karenic. So this is not a case of me finding a potential cognate in some random Sino-Tibetan language far from Pyu. If the distribution of Karenic languages in the past were like their distribution today, Pyu may have had Karenic neighbors. Or should I say relatives?

For now I continue to regard Pyu as an isolate within Sino-Tibetan - the family's equivalent of Albanian - or if an equally extinct parallel is desired, Tocharian. However, that doesn't mean I am not on the lookout for any lexical parallels which could be inherited or borrowed.

I don't think Pa-O chjā 'to pour' is one of those parallels. The word appears to be isolated within Pa-O, unless it is somehow related to other Karenic words with a *stopped tone⁷. Worse yet, Luangthongkum (2014: 9) regards Proto-Karenic *-e as the reflex of Proto-Tibeto-Burman *-a(ː)j. (But then where does her Proto-Karenic *-aj on p. 5 come from?) And her *-e apparently remained -e in northern Pa-O but warped to -ei in southern Pa-O (Shintani 2012: 31):

Proto-Karenic *ble A 'tongue' >

Northern Pa-O phre 33

Southern Pa-O plei 53

(It's not clear to me what sort of Pa-O is in Solnit 1989. I'm guessing northern since 'tongue' in his Pa-O is phrē.)

I don't believe in Proto-Tibeto-Burman (i.e., a common ancestor of all non-Chinese Sino-Tibetan languages), but if I assume that Pyu /aj/ is a retention from Proto-Sino-Tibetan, then I would expect it to correspond to Pa-O -e(i), not -a. And I would expect Pa-O a to correspond to Pyu a; cf. 'moon':

Proto-Karen (Luangthongkum 2013) *ʔla A >

Northern Pa-O la 21

Southern Pa-O la 42

Pyu rla / (PYU 4-6)

The initial ch- of 'pour' may also be a red flag, as it is extremely rare in Pyu, appearing only in three late texts (PYU 7, 8, and 39) and a single undated molded tablet (PYU 86). It  almost always occurs in grammatical morphemes with variant forms in c. I suspect the ch-forms are sandhi variants. That leaves cha(i) as the only content word with ch-. Might it be a loanword? Perhaps it was borrowed from some Karenic language that broke *-e to *-aj. Or that ch- is from some rare Old Pyu cluster that fused into a simple onset in Late Pyu. (1.26.1:46: But no other such fusions are known to have occurred. However, Pyu spelling may be conservative. Could the ch- of 'to pour' be an 'error' revealing the 12th century pronunciation of an earlier cluster?)

¹1.26.1:11: All known molded tablets are in the abbreviated style. I have suggested that the abbreviated style was used due to the small amount of space on the tablets. However, there are also inscriptions with the abbreviated style despite ample space for the full style (e.g., PYU 2-6). And there are inscriptions in the full style squeezed onto small surfaces (e.g., the bottom edge of PYU 24). So space was not always a factor in the choice of style.

²1.26.2:00: All subscript consonants represent codas, but the reverse may not have been true if the nonsubscript character -ḥ represented a glottal coda /h/ (or /ʔ/?). I also think ḥ indicated the voicelessness of sonorant codas: e.g., the honorific ḅay·ṁḥ was /ɓäj̊/ (and was abbreviated as ḅaṁḥ with the ḥ indicating a voiceless but unwritten /j/).

³1.26.1:11: On the other hand, the two Khitan scripts are never mixed.

⁴The middle dot indicates that the preceding consonant letter is written as a subscript character.

⁵My Tangut notation does not make this clear since I use -n, -q, -r, and -' after vowels to indicate nasalization, tenseness, retroflexion, and an unknown quality of vowels that are not followed by codas.

⁶1.26.2:06: In other words, the only closed syllables in the text were those ending in /h/ and in voiceless sonorants (whose voicelessness was written as ḥ). If ḥ turns out to have been a marker of phonation or tone, then the text would only have open syllables.

⁷1.26.2:13: Katō (2005: 5) proposes that cha 'to pour' is cognate to Eastern Pwo chè, Western Pwo sheʔ, and Sgaw chɛ̄ʔ, all in the H3 (*high stopped) tone category. I don't think any of those Karenic forms are cognate to cha because I would expect a Proto-Karenic final stop to correspond to a Pyu final stop or /h/ (which might have really been /ʔ/).

DOES PYU ṄA 'TO SPEAK' HAVE A TANGUT COGNATE?

Today I realized that Pyu ṅa /ŋa(C)/, a verb of saying (PYU 7.14, 8.14) might be cognate to Written Tibetan ngag 'speech' and Old Chinese 語 *ŋ(r)aʔ 'to speak' and 言 *ŋa[n] 'speech'. Might ṅa also be cognate to the Tangut ngwu'-word family?

Li Fanwen number
language, speech
𗟲 1ngwu'1

speech, word (cf. 4902 below)
𗖸 to say, to eulogize
𗑾 speech, word (how is this different from 1014?)

One problem is the vowel. Tangut -u'1 is either from pre-Tangut *-oX or *-əX¹. The late EG Pulleyblank might suggest *a/*ə-ablaut: Pyu, Tibetan, and Chinese had *a whereas pre-Tangut had *ə. Perhaps comparative work with closer relatives of Tangut will point to one vowel or the other.

Another problem is the medial *-w- which is from pre-Tangut *P-. None of the other languages have preinitial or presyllabic p-. Contrast with Tangut 𗏁 1ngwy1 < pre-Tangut *P-ŋa² 'five' whose *P- corresponds to p- in Pyu piṁṅa /pïŋa/ 'id.'

¹1.25.0:52: -' represents an unknown Tangut phonetic quality and *-X represents its equally unknown pre-Tangut source.

²1.25.1:05: The vowel of 'five' irregularly changed to match that of the adjacent numeral 𗥃 1lyr'3 'four', presumably at a stage when 'four' was something like *R-ly' before *R- conditioned a retroflex vowel and was lost. If not for that change, 'five' would have been †1ngwi1.

THE JURCHEN NAME OF EMPEROR SHIZONG (PART 4)

In parts 2 and 3 I covered the possibility of

as a Jurchen single-character spelling for the Jurchen name of 金世宗 Emperor Shizong (r. 1161-1189) which is only known to me as a Chinese transcription 烏祿 *u lu.

I don't know of any other single-character candidates for the spelling, so I'm going to move on to potential parts of a two-character spelling.

N3696 has a handy index of Jurchen characters organized by Jin Qizong's readings. It lists five types of u-characters (variants not shown here):

Why would Jurchen need five u when it could have done with one? Maybe because it didn't have just one? If Jin Jurchen had a /ʊ/ : /u/ distinction and a vowel length distinction, then four of the five could stand for /ʊ ʊː u uː/. And the fifth might not have stood for u in the Jin dynasty; it might have been a logogram for a word beginning with u- that was later spelled with it plus one or more phonograms, leading to its reanalysis as an u-character.

Another possibility is that the Jurchen script inherited a set of characters from the Parhae script that somehow made more sense for the language it originally represented but became redundant for Jurchen.

I'll keep those scenarios in mind as I examine the u-characters in depth from part 5 onward.

NATIONAL HANDWRITING DAY: JURCHEN EDITION

Today is National Handwriting Day.

This year is the 900th anniversary of the Jurchen large script.

Intersect the two, and you get me writing gurun ni ngala herge inenggi, my attempt to translate 'national handwriting day' in Jurchen.

National Handwriting Day
gurun nation
ngala hand
herge shape, possibly script like Manchu hergen?

I'd like to comment on those eight graphs, but I'm out of time, and I want to get back to the Emperor Shizong series tomorrow. And I still have to write the later parts of "The Jurchen Script: Innovation or Derivation?".

THE JURCHEN NAME OF EMPEROR SHIZONG (PART 3)

Moving on to the first half of

<? ? fushe den>

from the end of part 2, the antepenultimate character appears in two entries in the Bureau of Translators vocabulary: 强盛 'strong and prosperous' above, and 'sword' which was translated as

hanma (Kiyose and Jin Qizong's reading; I have converted Jin's notation into mine which in this case is identical to Kiyose's)

and transcribed into Ming Mandarin as 罕麻 *xan ma (#217). Hanma doesn't look like Manchu loho 'sword', but it does resemble Manchu halmari 'a sword used by shamans'. There was no Ming Mandarin syllable *xal, so *xan may correspond to a Jurchen hal-. (See Kane 1989: 130 for other cases of this type of correspondence.)

The -ri of halmari may be a noun suffix of unknown function. See Gorelova (2000: 114) for other examples of -ri. The Bureau of Translators vocabulary dialect may have preserved the bare stem halma without a suffix. See Kane (1989: 116) for other instances of zero in this dialect corresponding to Manchu -ri.

On the other hand, the character

only appears at the ends of the aforementioned two words, 'sword' and 'strong'. If it were simply read ma, it should be more common as a phonogram. Could it have represented maa with a long vowel or even mar? Kane (1989: 130) noted that Jurchen syllable-final -r was sometimes omitted in Chinese transcription but does not give any word-final examples.

Could that character (for mar?) be derived from the Chinese character 犮 which would have been read as *pɦar in northern Late Middle Chinese?

Jin Qizong (1984: 202) proposed mam as an alternate reading of that character but did not give any context for that alternate reading. (Norman 1978 lists only three Manchu words ending in -m, so I am skeptical of -m as a final in Jurchen.)

In any case, that character's reading contained a, so vowel harmony dictates that there was a break between


the a-word <? ma(a/r?)> 'strong' and the e-word <fushe den> 'prosperous'.
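The vowel harmony argument can be sketched in a few lines of Python. This is only a toy illustration under my own simplifying assumptions: each list item is the romanized reading of one Jurchen character, only a and e count as harmonic, and every other vowel (including u) is treated as neutral. The function names are hypothetical.

```python
def harmony_class(unit):
    """Return 'a', 'e', or None (neutral) for one romanized character reading."""
    has_a = "a" in unit
    has_e = "e" in unit
    if has_a and has_e:
        raise ValueError(f"{unit!r} mixes a and e within a single reading")
    if has_a:
        return "a"
    if has_e:
        return "e"
    return None


def forced_breaks(units):
    """Return (i, j) pairs: a word break must fall after unit i and before unit j.

    A break is forced wherever an a-reading is followed (possibly across
    neutral readings) by an e-reading, or vice versa.
    """
    breaks = []
    last_class = None
    last_index = None
    for index, unit in enumerate(units):
        cls = harmony_class(unit)
        if cls is None:
            continue  # neutral readings never force a break on their own
        if last_class is not None and cls != last_class:
            breaks.append((last_index, index))
        last_class, last_index = cls, index
    return breaks


# Kiyose and Jin's reading of the first character is used as a placeholder.
print(forced_breaks(["ulu", "ma", "fushe", "den"]))
```

With those four readings the only forced break falls between ma and fushe, which is the division assumed above. The sketch says nothing about the first character, whose reading has no a or e.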

Let's go back to the Ming Mandarin transcription for that phrase:

兀魯麻弗塞登 *u lu ma fu sə təŋ

The first character

must correspond to 兀魯 *u lu. Or so Kiyose and Jin thought; both glossed it as 'strong' (which then raises the question of what the -ma after it was).

But wait. It just occurred to me that

might be -lma. If so, then the two words with it could be interpreted as

<SWORD.lma> = halma, transcribed as Ming Mandarin 罕麻 *xan ma

<STRONG.lma> = ulma, transcribed as Ming Mandarin 兀魯麻 *u lu ma

Compare the shape of <STRONG> to the right side 𧈧 of the Chinese character 強 'strong'.

I think the first characters of those words were originally standalone logograms. Those Ming spellings have a final phonogram that might not have been present when the script was originally developed in the early 12th century. If the final phonogram was <lma>, then my attempt to link it to 犮 *pɦar will have to be abandoned.

If <STRONG> was a standalone logogram for ulma (or ulumaa, urumar, etc.) then the early 13th century name

Aotun Ulu (Jin Qizong's reading)

may have been Aotun Ulma (or urumaa, ulumar, etc. - notice I haven't repeated the possible permutations). Unfortunately I don't know of any Tungusic cognates of the Jurchen u-word for 'strong' that could narrow down the possibilities. Could the word be non-Tungusic: i.e., Khitanic? (Not necessarily from literary Khitan but either some nonstandard dialect of Khitan or a related, unwritten language - the source of the Jurchen numerals 'eleven' through 'nineteen' which are para-Mongolic but not literary Khitan.)
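To make the permutations concrete, here is a rough enumeration in Python. The three variables are my own simplification of the uncertainties discussed so far: the liquid (l or r, which Chinese transcription cannot distinguish), whether a vowel u followed the liquid, and the shape of the final syllable (ma, maa, or mar). The names are hypothetical, and nothing guarantees that the real reading is in the list.

```python
from itertools import product

LIQUIDS = ("l", "r")            # Chinese *l could stand for Jurchen l or r
MEDIAL_VOWELS = ("", "u")       # ulma-type vs. uluma-type readings
FINALS = ("ma", "maa", "mar")   # plain, long-vowel, and r-final possibilities


def candidate_readings():
    """List every combination of the three uncertain features."""
    return [
        "u" + liquid + vowel + final
        for liquid, vowel, final in product(LIQUIDS, MEDIAL_VOWELS, FINALS)
    ]


for reading in candidate_readings():
    print(reading)
```

That gives twelve candidates even before the question of /ʊ/ versus /u/ comes up.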

It's also possible that <STRONG> is functioning as a phonogram for ulu or uru in that name. The character may give connotations of 'strong', but ulu or uru by itself may not mean 'strong'; that disyllable could be an unrelated partial homophone of 'strong'.

Even if <STRONG> in that name was ulu or uru, that still doesn't mean that was the name of 金世宗 Emperor Shizong (r. 1161-1189) - the character reading could have had l whereas the emperor's name could have had r - or vice versa.

And even if the two names have the same liquid, they might not have had the same vowels! Manchu had two allophones of /u/, [ʊ] and [u], in accordance with the rules of vowel harmony. Kiyose (1977: 45-47) argues on the basis of Ming Jurchen spelling that the Bureau of Translators dialect had a single /u/. Kane (1989) similarly posits a single /u/ for the Bureau of Interpreters dialect solely on the basis of Chinese transcriptions (since the Interpreters' dialect was not also recorded in Jurchen spelling). However, Kiyose (1977) believes Jin Jurchen had a more complex vowel system than Ming Jurchen. My guess is that this system had seven vowels: three 'feminine', three 'masculine', and one 'neutral' (but who knows, maybe there was a 'masculine' ī /ɪ/ too):

'feminine' vowels: i /i/ | e /ə/ | o /o/ | u /u/
'masculine' vowels: (ī /ɪ/?) | a /a/ | ō /ɔ/ | ū /ʊ/

(The macron does not signify length. Möllendorff used a macron to transliterate Manchu <ū> [ʊ], and I have used it for other 'masculine' vowels except for a - ā would be redundant.)

In theory Jin Jurchen might have had a phonemic distinction between /ʊ/ and /u/. If so, then perhaps the emperor's name was Ulu /ulu/ with feminine vowels and the name of the successful 進士 jinshi candidate was Aūtūn Ūlū /aʊtʊn ʊlʊ/ with masculine vowels. (I think the u-vowel of <STRONG> was [ʊ] to harmonize with the masculine a-vowel.)

As we will see in the next parts, the Jurchen script has multiple characters for what seem to be u- and lu-syllables from a Ming Jurchen perspective. Such apparent redundancy may reflect Jin Jurchen distinctions between /ʊ/ and /u/ on the one hand and /lʊ/ and /lu/ on the other. But does the evidence support that hypothesis? We shall see.

THE JURCHEN NAME OF EMPEROR SHIZONG (PART 2)

The most obvious candidate for the Jurchen spelling of the name of 金世宗 Emperor Shizong (r. 1161-1189) who died 830 years ago yesterday is

ulu (Jin Qizong's reading)

which appears in the name

Aotun Ulu (Jin Qizong's reading)

from a 1224 list of successful candidates for the degree of 進士 jinshi in the imperial examinations. (Alas, as of this writing the article does not cover examinations in the Jurchen Empire.)

Problem solved? No, not quite.

In part 1 I already mentioned the problem of whether 烏祿 *u lu, the Chinese transcription of Emperor Shizong's name, represented Jurchen Ulu or Uru. How do we know that his name was Ulu and not Uru? We don't.

But let's suppose it was Ulu. Did Ulu have to be written with the character


Perhaps not.

Let's look at how that character's reading was reconstructed.

I am unaware of any Jin dynasty Chinese transcriptions of the character. The only transcription I know of is from the Bureau of Translators vocabulary in which 强盛 'strong and prosperous' was translated as

uluma (or uruma) fusheden (Kiyose)

uluma fuseden (Jin Qizong)

and transcribed into Ming Mandarin as 兀魯麻弗塞登 *u lu ma fu sə təŋ (#761).

There is no word spacing in the Jurchen script. How did Kiyose and Jin decide where to make a break between 'strong' and 'prosperous'? My guess is vowel harmony. Normally in Jurchen, a and e do not coexist within a root.

But how do we know that there were different vowels a and e in the two roots? Let's work backwards from the last character which also appears in

geden 'leave'

and transcribed into Ming Mandarin as 革登 *kə təŋ (#862). Both transcriptions have 登 *təŋ in common, so

must have sounded like 登 *təŋ: i.e., it was den [təɴ] in the transcription system I use on this site.

One character down, three to go:

<? ? ? den>

The penultimate character also appears in

fushegu 'fan' (cf. Manchu fusheku 'id.'; I can't explain the g : k mismatch)

transcribed into Ming Mandarin as 伏塞古 *fu sə ku (#221). Both transcriptions have 弗塞/伏塞 *fu sə in common, so

must have sounded like 弗塞/伏塞 *fu sə: i.e., it was fushe [fusxə] in the transcription system I use on this site.

(I follow Kiyose in supplying an h [x] after s on the basis of Manchu fusheku. It is possible that the Ming Jurchen dialect of the vocabulary lost the h that standard Manchu retained from another Ming Jurchen dialect. In any case, there were no Ming Mandarin syllables *fus or *sxə, so 伏塞 *fu sə may or may not have stood for Jurchen fushe rather than fuse.)

Halfway there:

<? ? fushe den>

Fushe and den are in vocalic harmony (no a to conflict with e in either reading) and are likely to have been part of the same word. But was fusheden by itself 'prosperous', or did the second character represent one or more syllables at the beginning of 'prosperous'? (We can assume that at least the first character represented the Jurchen word for 'strong'.) Kiyose and Jin give away the answer above. However, if you want to learn the probable logic behind their answer, watch for part 3.

THE JURCHEN NAME OF EMPEROR SHIZONG (PART 1)

金世宗 Emperor Shizong (r. 1161-1189) of the Jin dynasty died 830 years ago today. He was a great advocate of the Jurchen language and culture. He had the Chinese classics translated into Jurchen. Unfortunately, none of those translations have been found. I know of only nine or ten dated Jurchen texts from his reign, the 大定 Dading 'Great Settlement' period:

# of Jurchen characters
海龍 Hailong rock inscriptions
~20 + ~80 = ~100 total
河頭胡論河 Hetouhulunhe 100-household seal
和拙海欒 Hezhuohailuan 100-household seal
夾渾山 Jiahunshan 100-household seal
可陳山 Kechenshan 100-household seal
迷里迭河 Milidiehe 100-household seal
移改達葛河 Yigaidagehe 100-household seal
Jin Victory Memorial Stele
Zhaoyong General Memorial

Further details are at Wikipedia.

The last dated Khitan large and small script texts are also from his period:

- the epitaph for 李爱郎君 Court Attendant Li Ai (1176; 470 large script characters)

- the epitaph for the 博州防禦使 Defense Commissioner of Bozhou (1171; 1,570 small script blocks)

Again, further details are at Wikipedia.

Emperor Shizong's successor 金章宗 Emperor Zhangzong (r. 1189-1208) abolished the Khitan scripts.

But back to Jurchen - I've been wondering what Emperor Shizong's name was in the Jurchen script. The History of the Jin Dynasty (Basic Annals 5 and 6) presents it as 烏祿 *ulu in the Chinese script. Chinese transcriptions of Jurchen do not differentiate between Jurchen l and r since Jin Chinese had no *r. So his name could have been Ulu or Uru in Jurchen. The ambiguities do not stop there. In theory there are many ways to spell both Ulu and Uru in the Jurchen script. I'll be examining the possibilities in the following parts: 2, 3, 4, 5 (link to be added).

WHAT IS THE ETYMOLOGY OF SPANISH CERDO 'PIG'?

The English Wikipedia derives Spanish cerdo 'pig' from Latin seta 'bristle' and the Spanish Wikipedia derives it from Latin setula, a diminutive of seta. Are those folk etymologies? I see several problems:

1. Is hair really the most prominent feature of a pig?

2. Latin s- should remain s- and not become c-.

3. Latin -t- should become -d-, not -rd-.

4. Latin -tul- should become -j- or -ld-, not -rd-: cf.

viejo 'old person' < vetulus 'little old'

espalda 'back' < spatula 'broad, flat piece'

Seda 'silk' looks like the regular Spanish reflex of Latin seta.
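Problems 3 and 4 can be checked mechanically. The following Python sketch is a toy, not a model of Latin-to-Spanish sound change: it encodes only the two consonant correspondences cited above (intervocalic -t- > -d-, and -tul- > -j- or -ld-) and leaves vowels, endings, and everything else untouched. The function name is hypothetical.

```python
import re

VOWEL = "[aeiou]"


def expected_reflexes(latin):
    """Apply only the two consonant correspondences cited above.

    -tul- > -j- or -ld- (cf. viejo, espalda); otherwise intervocalic -t- > -d-.
    Vowel changes and endings are deliberately ignored.
    """
    if "tul" in latin:
        return sorted({latin.replace("tul", "j"), latin.replace("tul", "ld")})
    return [re.sub(f"(?<={VOWEL})t(?={VOWEL})", "d", latin)]


for etymon in ("seta", "setula"):
    outcomes = expected_reflexes(etymon)
    # Neither etymon yields initial c- or the cluster -rd- under these rules.
    print(etymon, "->", outcomes, "| any -rd-?", any("rd" in o for o in outcomes))
```

Under these two rules seta gives seda, matching the word for 'silk', while setula gives forms with -j- or -ld-. No outcome contains -rd- or begins with c-.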

Steven Schwartzman derives cerda from Vulgar Latin *cirra 'a tuft of hair in an animal's mane'. But pigs don't have manes. And I would expect Spanish to retain *rr rather than shift it to rd: e.g., Latin carrus 'wagon' became Spanish carro, not cardo.

Might cerdo 'pig' have no Latin etymology? Might it be a borrowing from some substratal language?

WAS TANGUT 2WUQ1 'TO AID' BORROWED FROM CHINESE? (PART 2)

In my last post, I expressed doubts about


0645 2wuq1 'to aid'

being a borrowing from Tangut period northwestern Chinese (TPNWC) *3wu3 or an earlier form (e.g., Early Middle Chinese *wuʰ) on the basis of its initial: why would Chinese *w- be borrowed as Tangut w- [ʔw]? Tangut had no simple initial [w]; the two obvious choices for imitating Chinese *w- were v- and w- [ʔw]. Gong (2002) did not identify any instances of Tangut w- [ʔw] for what I reconstruct as Chinese *w- before -u-type rhymes, but I can't say that it would be impossible for a Tangut wu to be from a Chinese *wu.

I thought the tense vowel rhyme of 2wuq1 would even more strongly rule out a borrowing scenario. As Gong first proposed, Tangut tense vowels derive from an earlier preinitial which I write as *S.-: e.g.,


0359 1tuq1 < *S.toŋ 'thousand' (cf. Written Tibetan stong 'id.'; more cognates at STEDT)

I use capital *S.- to indicate the possibility of multiple sources of tenseness-triggering *S.-. In that particular case I think *S.- really was [s], but in the case of another numeral, I am not so sure:


0359 1soq1 < *S.sum 'three' (cf. Written Tibetan gsum 'id.'; more cognates at STEDT)

I suspect that *S.s- was originally *ks- which then merged with *ss- via a *xs-stage.

For now I reconstruct all Tangut tense vowel syllables as having *S.- in pre-Tangut. But perhaps I should reconsider given that Gong (2002: 425) identified six Chinese loanwords with tense vowels. Likely Chinese sources are in bold.

Li Fanwen #
Tangut period NW Chinese
Middle Chinese
𗐯 4719


to write
𗒨 4696


*mujʰ taste

*ɕiˀ arrow
𗄭 1941

to gather

Chinese had long ago lost *sC-clusters, so the tense vowels in the Tangut borrowings do not reflect a Chinese *s-.

At least two of those loans postdated Middle Chinese:

- 'to write' reflects the Chinese sound change *-jæ > *-e

- 'taste' reflects the Chinese sound change *mu- > *v-

('World' is ambiguous.) Am I to believe that a prefix *S.- was present as late as the turn of the millennium and added to those loans which almost immediately developed tense vowels? E.g.,

TPNWC *3ke2 > *2S.-ke2 > *2kke2 > *2kkeq2 > 2keq2

all in the space of about a century?

Three loans are early:

- 'cymbals' preserves¹ Middle Chinese *b-

- 'arrow' underwent the Tangut *-i > -y shift which seems to have postdated Middle Chinese; it may date from the late first millennium AD (see 'to gather' below)

- 'to gather' underwent that same shift and preserves¹ Middle Chinese *dz-. Compare with 'taste' which has a post-Middle Chinese initial but did not undergo the Tangut *-i > -y shift, a change that must have occurred before it was borrowed. The potential of using loanwords to date Tangut sound changes has yet to be fully explored.

But not so early that they would have had *sC-clusters that would become single consonants + tense vowels in Tangut.

I can think of five ways to deal with the problem of why those six loans have tense vowels.

1. They are unrelated native Tangut lookalikes that once had *S.-.

I'd buy this if I had internal etymologies for at least some of the six, but I don't.

2. They are the random byproducts of misperception.

But what in Chinese could sound like tense vowels to Tangut ears?

3. They are sporadic attempts to emulate Chinese phonetic features absent in Tangut.

It may not be a coincidence that all the loans are from Chinese words with tones 2-4 from final glottals or stops. The trouble is that the two late loans, 'to write' and 'taste', had no final glottals in Chinese by the time they were borrowed.

4. They acquired tense vowels by analogy with other words with tense vowels.

But which words would have been the models for analogy?

5. Perhaps 'taste' acquired a tense vowel by assimilating with


1079 2lenq3 'sweet' (this resembles lem-type words for 'sweet' in Sino-Tibetan, but a pre-Tangut *S.lem would have become lonq, not lenq.)

in the compounds

𘕉𗗘 𗗘𘕉

1viq3 2lenq3 and 2lenq3 1viq3, both 'sweet' (see Gong 2002: 352-353 for attestations).

That is, an earlier *1vi3 2lenq3/*2lenq3 1vi3 became 1viq3 2lenq3/2lenq3 1viq3, and 1viq3 retained a tense vowel even as an independent word.

I could then claim that 'world' acquired a tense vowel by assimilating with


0359 1soq1 'three'

in the phrase


1soq1 2keq2 'three worlds' (a calque of Chinese 三世 'three worlds' or Tibetan dus gsum 'three times': i.e., past, present, and future).

but I think that's pushing it. And I have no phrases to explain the tenseness in the other four loanwords.

Should 2wuq1 be added to that set of loanwords with anomalous tense vowels? Maybe.

¹It would be more precise to say "preserves the voicing of", since Middle Chinese voiced obstruents were oral, whereas they were borrowed into Tangut as prenasalized stops b- [mb] and dz- [ndz].

WAS TANGUT 2WUQ1 'TO AID' BORROWED FROM CHINESE? (PART 1)

In my last post, I remarked upon the similarity of Tangut


0645 2wuq1 < *Sʌ-ʔwə/oH 'to aid'

to the Sino-Korean reading 우 u for 祐 'to aid'. I considered and rejected the possibility that the Tangut and Chinese words were cognates: i.e., inherited from Proto-Sino-Tibetan.

But I didn't consider yet another possibility: could the Tangut word be a borrowing from Chinese? That would explain the similarity between 2wuq1 and Sino-Korean u: they were both borrowed from roughly contemporary varieties of Chinese. 2wuq1 looks like Edwin G. Pulleyblank's Early Middle Chinese 祐 *wuwʰ (= my *wuʰ) and Tangut period northwestern Chinese (TPNWC) *3wu3.

However, "looks" does not mean "sounds". My w- is [ʔw], not a true [w] like Pulleyblank's *w-. Middle Chinese *w- corresponds to Tangut v- ([v]? [ʋ]?), not w- [ʔw] in Gong's list of Chinese loans in Tangut (2002: 407-408):


0403 1von1 : 王 *wɨaŋ 'the surname Wang'


2340 1von1 : 旺 *wɨaŋʰ 'bright'

I wrote "corresponds" because 'Middle Chinese' is a Platonic entity distinct from whatever northwestern dialect the Tangut were in contact with.

On the other hand, Gong's list of Tangut transcriptions of Chinese in the Forest of Categories (2002: 436-437, 444-445) shows vacillation between v- and w- for Chinese *w-syllables (correspondence types A and E). That seems to imply that the Tangut lacked a simple initial [w]: they could only approximate a Chinese initial [w] with either v- ([v]? [ʋ]?) or w- [ʔw].

Homophones B chapter and homophone group
Li Fanwen number
Tangut reading
Tangut period NW Chinese
Early Middle Chinese
Correspondence type
𗍁  II 1

*wɨejʰ A: v- : w-
II 2


𘍵 II 3



II 9



𗍾  II 9



II 26


B: Ø- : w-
*1hun3 *wuŋ C: h- : w-
𗭴 VIII 5087
*wɨaŋ B: Ø- : w-
𗇝 VIII 4689

*wɨet D: yw- : w-
𗫖 VIII 2094

E: w- : w-
𗤭 VIII 3128


𗨂 VIII 3685

*wɨep B: Ø- : w-
VIII 3628
*wɨen F: gh- : w-
*2/3wen3 *wɨenˀ/ʰ

*wuŋ C: h- : w-

There are also four other types of correspondences:

B: Tangut Grade IV Ø-syllables may have begun with [j], and Chinese Grade III *w- may have become [ɥ], a glide absent from native Tangut words. (But see correspondence D below.)

C: Unique to transcriptions of 雄 *1hun3 (for †1wun3) which must have developed the same irregular fricative found through much of Chinese: e.g., Cantonese hoŋ and Mandarin xiong < *hjuŋ.

D: Tangut ywa [ɥa] is a special rhyme in the readings of only three characters:

𗇝 4689 1ywa4 'glittering'

𗇜 5014 1ywa4 'to go fast; quick' (only attested in the Tangraphic Sea dictionary)

𗮞 5099 1shywa3 'transcription character for Sanskrit śva'

The first two words may be borrowings from 'Tangut B', the non-Sino-Tibetan language that I think is the source of much Tangut vocabulary and possibly even reflected in the structure of the more obscure characters.

F: Tangut ghw [ɣw] might have been an attempt to approximate Chinese [w] without the initial stop of Tangut w- [ʔw]. ghw- is from Gong's reconstruction; it corresponds to w- in Sofronov and Nishida's reconstructions converted into my notation. If Sofronov and Nishida are right, the use of 3628 is simply another instance of correspondence E.

Given that TPNWC 右 *2wu3, the phonetic of TPNWC 祐 *3wu3, was transcribed in Tangut as both 1vi3 and 2ew4, I would expect TPNWC 祐 *3wu3 (or an earlier Early Middle Chinese *wuʰ) to have been borrowed as †1vi3 or †2ew4 with initial †v- or †Ø-,  not 2wuq1 with initial w-. However, the existence of correspondence pattern E (Tangut w- [ʔw] : Chinese *w-) weakens an initial-based argument against a borrowing scenario. Note, however, that pattern E is not attested with the rhyme type of 右 and 祐. That may suggest that w- [ʔw] was inappropriate for 右 and 祐 even though it was appropriate for TPNWC 雲 *1wun3 and 員 *1wen3. TPNWC *w- could have had different allophones before different rhymes.

As I will explain in part 2, I think the rhyme of


0645 2wuq1 'to aid'

may even more strongly rule out a borrowing scenario.

THE PREHISTORY OF TANGUT 2WUQ1 'TO AID'

When looking at Andrew West's post about a Tangut hand mirror with the character


0645 2wuq1

which he translated as 祐 'to aid', it occurred to me that 2wuq1 sounds like 우 u, the Sino-Korean reading of 祐. (The Yale romanization of the reading is visually even closer - wu!)

That makes the Tangut word easier to learn. I try to take advantage of soundalikes whenever I can. But are the two forms related? I don't think so, because 2wuq1 is from Pre-Tangut *Sʌ-ʔwə/oH, whereas u is ultimately from Old Chinese *wəʔ(-s).

1.13.13:49: Commentary on the reconstructions


T0. The only remotely similar words I know of are Old Chinese Pa-type words (my ignorance of the rest of Sino-Tibetan is showing):

扶 *Cɯ.P(r)a > *bɨa 'to help'

輔 *Cɯ.P(r)a-ʔ > *bɨaʔ 'to help'

*Cɯ.P- may have fused into *b-: *N-p- > *m-p- > *m-b- > *b-. Another possibility is that *-P- was *-b-, and that *C- has left no trace: *Cɯ.ba > *Cɯ.bɨa > *bɨa.

輔 may be a *ʔ-suffixed variant of 扶.

The presyllabic and main vowels don't match.

There is no guarantee that Tangut -w- is from a lenited stop *P.

A medial *-r- cannot be ruled out; if it existed, it corresponds to nothing in Tangut.

T1. Pre-Tangut *S- conditions Tangut vowel tenseness that I indicate in my notation as -q.

T2. Pre-Tangut *-ʌ- conditions the grade of the Tangut syllable (-1). The phonetic value of -u1 was (partly) lower than [u]: e.g., [ou]. *-u (< *-ə or *-o) lowered to harmonize with the height of unaccented *-ʌ- which was later lost.

T3. I have projected Tangut [ʔw] (w- in my notation) back into pre-Tangut. But I suspect that at the pre-Tangut stage there was a sequence *-CVP- that was compressed into Tangut [ʔw]. Pre-Tangut *-ʌ- could have been in that sequence: e.g., *S(ʌ).Cʌ.PəH.

T4. The pre-Tangut vowel could be either *ə or *o; both merged into -u1 (Jacques 2014: 206).

T5. Pre-Tangut *-H is a laryngeal that conditioned Tangut tone 2 which I write Arakawa-style at the beginning of my notation. *-H could correspond to Old Chinese *-ʔ-s. My assumption that Tangut tones originated Chinese-style from laryngeals could be wrong; they may preserve Proto-Sino-Tibetan tones or have some entirely different origin. But if Tangut and Chinese developed similar grade systems (possibly via contact), they might have developed tones in similar ways as well.

Old Chinese

C0. 祐 *wəʔ(-s) 'to assist' belongs to a large word family discussed at length in Schuessler (2007: 581-582). Schuessler reconstructs a Proto-Sino-Tibetan root *wəs. I don't know how he would account for the *-ʔ in the Old Chinese members of the family.

On the other hand, Matisoff (2003: 327, 591) relates 佑 (another spelling of 祐) to his Proto-Tibeto-Burman *grwak 'friend/assist'. The rhyme might work: Matisoff's Proto-Tibeto-Burman *a can be from Proto-Sino-Tibetan *ə. See Schuessler (2007: 31-32) on Tibeto-Burman -k corresponding to Old Chinese *-ʔ. I don't believe in 'Tibeto-Burman' except as a convenient term for 'non-Chinese Sino-Tibetan', so 'Tibeto-Burman' here is to be taken as the latter.

As for *gr-, see C1-C2 below.

C1. Baxter and Sagart reconstruct the 祐 word family with *[ɢ]ʷ-. The brackets indicate 'either *ɢʷ-, or something else that has the same Middle Chinese reflex as *ɢʷ-' (wording based on Baxter and Sagart 2014: 8): e.g., *N-qʷ- or *m-qʷ-. *ɢʷ- does look like Matisoff's Proto-Tibeto-Burman *g- (see C0 above). But I am suspicious of it - there is no Chinese-internal evidence that there was ever a stop in this word family. And Baxter and Sagart's system has no simple *w- which is what Schuessler and I reconstruct instead of *ɢʷ- in this word family.

C2. There could not have been an *-r- in this word because *wrəʔ-s (or Baxter and Sagart's *[ɢ]ʷrəʔ-s) would have become Middle Chinese †wiʰ (cf. 鮪 *wrəʔ / *[ɢ]ʷrəʔ > *wiˀ 'sturgeon'²), not *wuʰ (> Sino-Korean 우 u).

¹Samuel E. Martin designed the Yale romanization of Korean to be typeable on a standard US keyboard, so it has no diacritics or nonbasic Latin letters. w distinguishes labial wu [u] from nonlabial u [ɯ] (= ŭ in the modified McCune-Reischauer romanization of Korean on this site).

²The Sino-Korean reading of 鮪 should be †위 wi, but in fact it is 유 yu, presumably by analogy with 유 yu, the reading of the far more common character 有 'to exist'. There would be few opportunities to use 鮪 in Korean; the Korean word for 'sturgeon' is 鐵甲상어 chhŏlgapsangŏ 'iron armor shark'.

The suffix -ngŏ 'fish' is from Middle Chinese 魚 *ŋɨə, but in hangul it is written as < Ø.ŏ> across two syllables, so it is not associated with 魚 since character readings always only occupy single hangul blocks.

상어 sangŏ 'shark' is from Middle Chinese 鯊魚 *ʂæ ŋɨə, though it cannot be written as 鯊魚 in Korean since its parts do not correspond to syllable blocks/Sino-Korean readings:

Sinographs (aligned with Middle Chinese)
Middle Chinese
Korean (transliterated)

Sino-Korean (transliterated) s
Sinographs (aligned with Sino-Korean)

The Sino-Korean readings of 鯊 and 魚 are 사 sa and 어 ŏ, so 鯊魚 is read as saŏ. I suspect that sangŏ is an old borrowing from spoken Middle Chinese, whereas saŏ is a literary Korean creation combining the isolated readings sa and ŏ (< ngŏ).

A TR-OUBLING TR-ANSCRIPTION

蔡同榮 Chai Trong-rong passed away five years ago today. At first his name might look Vietnamese because of its tr, a letter combination not used in romanizations of the other major East Asian languages. However:

- Chai is not a Vietnamese surname. It is not even a possible Sino-Vietnamese syllable.

- Vietnamese names are usually made up of Sino-Vietnamese elements, and rồng 'dragon' is not one of them; it is a loan from Late Old Chinese 龍 *roŋ, but it is not Sino-Vietnamese in the strict sense: i.e., it is not the reading of 龍 which is long, a much later loan which may postdate rồng by a millennium.

- The Sino-Vietnamese reading of 蔡同榮 is Thái Đồng Vinh which is quite different from Chai Trong-rong.

- Nothing in Chai's background - beginning with his childhood in Japanese-ruled colonial Taiwan - points to a Vietnamese connection. (At first I thought he might be a Vietnamese immigrant to Taiwan. But he is ethnically Taiwanese.)

The tr doesn't match anything in the other forms of the name listed at Wikipedia:

Mandarin (IPA: [tsʰaj˥˩ tʰʊŋ˧˥ ɻʊŋ˧˥]):

Pinyin: Cài Tóngróng

Wade-Giles: Tsài Tóngróng (sic; the correct Wade-Giles is Tsʻai⁴ Tʻung²-jung²)

Tainan Taiwanese Hokkien (IPA: [tsʰwa˥˩ tʰɔŋ˧ ʔeŋ˨˦]):

Pe̍h-ōe-jī: Chhòa Tông-êng

Then it occurred to me that Trong has the same letters as torng, the Gwoyeu Romatzyh (GR) romanization of 同. The -r- represents a high rising tone, not a consonant [r]. Could Trong be a metathesis of torng? And was the reordering of o and r accidental or intentional?
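The 'same letters' observation is easy to verify. Here is a small Python check (the helper name is hypothetical) showing that trong is not just an anagram of torng but differs from it by a single swap of adjacent letters, which is what a metathesis would look like on paper.

```python
def adjacent_swaps(word):
    """Return every string obtained by swapping one pair of adjacent letters."""
    return {
        word[:i] + word[i + 1] + word[i] + word[i + 2:]
        for i in range(len(word) - 1)
    }


gr_spelling = "torng"   # Gwoyeu Romatzyh for 同
name_spelling = "trong"  # as in Chai Trong-rong, lowercased

print(sorted(gr_spelling) == sorted(name_spelling))  # same letters?
print(name_spelling in adjacent_swaps(gr_spelling))  # one adjacent swap apart?
```

Both checks succeed, but that only shows the metathesis is possible, not that it is what happened.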

Rong is the GR romanization of 榮, but Chai is not the GR romanization of 蔡 which is tsay with y signalling [j] preceding a high falling tone.

1.12.13:41: Later last night it occurred to me that someone unfamiliar with Chinese might have accidentally spread the r- of -rong to the preceding syllable: Tong-rong > Trong-rong. But why would Chai adopt someone else's error?

TODAY IN JURCHEN HISTORY

By coincidence, two major anniversaries today are exactly 23 years apart:

- the fall of the Northern Song capital of Bianjing (now Kaifeng) to the Jurchen:

Emperor Qinzong and his father, Emperor Huizong, were captured by the Jin army. The Northern Song dynasty came to an end.

- the assassination of Emperor Xizong in 1150


Emperor Xizong felt so depressed by the loss of his sons that he developed an addiction to alcohol and started neglecting state affairs. He also became more violent and ruthless, and started killing people indiscriminately. One of his victims was Ambaghai, a Mongol chieftain and great-granduncle of Genghis Khan.

Emperor Xizong was overthrown and murdered by his chancellor, Digunai, and other court officials in a coup d'état on 9 January 1150.

Xizong is of linguistic interest as the man credited with the creation of the mysterious Jurchen small script whose fate may have been intertwined with his:

During the 1970s a number of gold and silver paiza with the same inscription, apparently in the small Khitan script, were unearthed in northern China. Aisin-Gioro has analysed the inscription on these paiza, and although the structure of the characters is identical to the Khitan small script she concludes that the script is not actually the Khitan small script but is in fact the otherwise unattested Jurchen small script. She argues that this small script was only used briefly during the last five years of the reign of its creator, Emperor Xizong, and when he was murdered in a coup d'état the small script fell out of use as it was less convenient to use than the earlier large script.

If Aisin Gioro Ulhicun is right, the only two surviving samples of the script are those that she has identified. I have written about the first twice before; I should finally get around to writing about the second six years later.

THE JURCHEN SCRIPT: INNOVATION OR DERIVATION? (PART 1)

(Edited 1.10.0:41 before posting.)

川崎保 Kawasaki Tamotsu's 「渤海」文字資料からみた女真文字の起源に関する一考察 ('An Observation Concerning the Origin of the Jurchen Script as Seen from Parhae Script Materials', 2014), a response to Alexander Vovin's "Did Wanyan Xiyin Invent the Jurchen Script?" (2012), contrasts two views of the Jurchen large script: 発明 hatsumei 'invention' (the ex Khitanis¹ hypothesis) and 発展 hatten 'development' (the ex Parhis² hypothesis).

To try to parallel how both terms begin with the root 発 hatsu- (hat- before t-) 'go out', I have loosely rendered 発展 hatten as 'derivation' in the title so that it has the same ending as invention.

Kawasaki first summarizes Vovin's English-language paper in Japanese before presenting his own views.

Vovin read a Parhae stamped tile in Jurchen as

pe gorhon ni

'old thirteen GEN' = 'of Old Thirteen'

Kawasaki interpreted the first two characters as a single Jurchen character looking like Chinese 舍 'to set aside; lodging'.

I am not sure what to make of this stamp for several reasons. Here are the first two.

1. The first character on the stamp has 人 on the top rather than ス. There is no evidence that those two elements were interchangeable in the Jurchen large script, as no ス-graphs have 人-variants in Jin Qizong's dictionary (1984: 23-24). Nonetheless that does not refute Vovin's reading because 人 and ス could have been interchangeable in the earlier Parhae script. Alternately, the stamp may show an older Parhae form with 人 that was replaced by ス in Jurchen.

2. Vovin interprets

as Jurchen pe 'old'. At first this seems plausible given that (1) the Manchu word for 'old' is fe and (2)  Jin Jurchen p- corresponds to Ming Jurchen and Manchu f-. Parhae Jurchen predated Jin Jurchen and probably would also have had p-.

However, the word for 'old' is attested in the Jurchen large script as disyllabic

<pu (g)e> = pu(g)e (奧屯良弼餞飲碑 Aotun Liangbi picnic inscription 1)

and not as a monosyllabic pe. The contraction of pu(g)e to Manchu fe may have been a post-Jin innovation in some but not all varieties of Ming Jurchen. The Bureau of Translators vocabulary has disyllabic fuwe (transcribed as 弗厄 *fu ə and spelled in the Jurchen large script as in the Aotun Liangbi picnic inscription; #667) whereas the Bureau of Interpreters vocabulary has a monosyllabic transcription 佛 *fo, presumably for a form like fo in a Ming Jurchen dialect that compressed fuwe differently than the ancestor of standard Manchu. (o is labial like -uw- and mid like e [ə].)

It is unlikely that a monosyllabic Parhae Jurchen pe expanded into a Jin Jurchen pu(g)e and then recontracted into fo or fe. Parhae Jurchen pe may be an anachronism unless it is from a dialect not ancestral to the more conservative varieties of Jurchen with disyllabic pu(g)e/fuwe. Could standard Manchu fe be a descendant of the Parhae Jurchen pe-dialect (or another Parhae Jurchen dialect with the same type of *uge > -e compression)?

'Old' in Jurchen: a simplified possible family tree

(† = expected but not attested)

Proto-Jurchen *puge
Parhae Jurchen dialect 1: pe | Parhae Jurchen dialect 2: †pu(g)e
Jin Jurchen dialect 1: †pe | Jin Jurchen dialect 2: pu(g)e (Aotun 1)
Ming Jurchen dialect 1: †fe | Ming Jurchen dialect 2: fo (Bureau of Interpreters) | Ming Jurchen dialect 3: fuwe (Bureau of Translators)
Standard Manchu fe | (Did these dialects survive into the Manchu era?)

Maybe, though Manchu does have a single word with -uge: buge ~ buhe (rather than †be < *buge) 'gristle'. Are those loans from noncompressing dialects? How heterogeneous is standard Manchu?

And I would like confirmation of the sound value of


Ideally I'd like to see a polysyllabic word written with that character in the Parhae material that corresponds to a Jin Jurchen pe or Ming Jurchen/Manchu fe. Without such interlocking of both internal and external evidence, I have no way of knowing how

would have been read in Parhae. Graphic similarity does not entail phonetic similarity: e.g., Jurchen 日 'day' and 月 'month' look exactly like Jin or Ming Mandarin 日 and 月 but are pronounced completely differently: inenggi and biya instead of *ʐi and *ɥe. I am convinced that there is graphic continuity between the Parhae and Jurchen scripts. I am more agnostic about projecting Jin Jurchen values back onto an earlier script that may have been used to write other languages, related or otherwise. (In theory even Koreanic and para-Japonic speakers in Parhae could have used the Parhae script.)

I am not even sure there is graphic continuity between this particular Parhae script character and its Jurchen (near-)lookalike. I have already mentioned the problem of the different shapes of the top elements. Might the Parhae character be the source of

<?> '?' (N4631 #1355; font from

in the Khitan (not Jurchen!) large script?

To the best of my knowledge, the character

is not attested in Jin Jurchen; it is only known - at least to me - from the Ming Jurchen 永寧寺碑 Yongning Temple Stele (lines 3 and 4) where

<pe ing>

transcribes Ming Chinese 平 *pʰiŋ.

There is no evidence that

was read with final -e or any other vowel since it is only followed by -ing in the corpus. Could that character have been devised in Ming Jurchen to write Ming Chinese *pʰ, a consonant absent from native Jurchen words (in which p [pʰ] had shifted to f)? I would prefer to read that character simply as p.

Jin Qizong (1984: 24) suggests that

is derived from Jurchen

<FORTY> dehi 'forty'

or from the aforementioned Chinese character 平 *pʰiŋ.

I would add that there is an even closer match for the shape of Jurchen <p> in the Khitan large script:

<FORTY> (北大王墓誌 Epitaph for the Grand Prince of the North 5; font from

One other potentially related Khitan large script character

<?> '?' (N4631 #2026; font from

is a lookalike for simplified Chinese 圣 'sage'. Unfortunately nothing is known about the phonetic or semantic value of 圣 in Khitan.

Although <FORTY> in the Khitan and Jurchen large scripts and what I read as p certainly look similar, I can't understand why one would decide to write a phonogram <p> as a modification of a logogram <FORTY>. Jurchen dehi has no p in it, and the Khitan word for 'forty' probably sounded something like Written Mongolian döcin 'forty' which also lacks p. Why not add a dot to, say,

be (accusative marker)

to create a phonogram for <p(e)>?

In fact a dotted phonogram derived from <be> already exists - the aforementioned


which is only attested in noninitial position after sonorants (vowels and l; Jin Qizong 1984: 100).

Might the graphic similarity between

<be> and <(g)e>

indicate that *-g- might have lenited to [ɣ] ~ [β] - the latter perhaps just after u? - justifying the choice of <be> as a basis for <(g)e>? Cf. how Proto-Koreanic *-p- lenited to [ɣ] and [β] in different dialects:

Proto-Koreanic *tupur 'two' > southeastern Old Korean 二肸 <TWO.> tuɣur but western Middle Korean 두ᄫᅳᆯ tuβur

(This is a revision of Alexander Vovin's proposal that Proto-Koreanic *-b- became [ɣ] and [β] in different dialects. Now neither he nor I follow S. Robert Ramsey's proposal to reconstruct *b in earlier Korean.)

Medial lenition is common to both Jurchen/Manchu and Korean, and a detailed comparison of the process in the two might be interesting.

I'll get to more of the problems with 'Old Thirteen' in part 2.

¹I originally wrote ex nihilo, but that could be misinterpreted as a straw man, as no one has ever claimed that Wanyan Xiyin had invented the Jurchen large script without any influence from other scripts. 'From the Khitans' is better since the orthodox view is that Wanyan Xiyin took the existing Khitan large script and arbitrarily changed it.

On the other hand, I side with Janhunen who thinks the Jurchen script is a modification of the Parhae script which was a sister of the standard Chinese script. Contrasting the two views:

Ex Khitanis

Chinese script
Khitan large script
Jurchen large script

There is no Parhae script for the Jurchen script to derive from in the orthodox view which holds that the Parhae wrote exclusively in Chinese characters.

Ex Parhis (Janhunen)

Proto-Chinese script
Standard Chinese script
Nonstandard northeastern Chinese script
Parhae script
Khitan large script
Jurchen large script

Today it occurred to me that the lost Tabghach script might fit into the above schema as the ancestor of the Khitan large script:

Ex Parhis (this site)

Proto-Chinese script
Standard Chinese script
Nonstandard northern Chinese scripts
Tabghach script? Parhae script
Khitan large script
Jurchen large script

In the above scenario, one could speak of a para-Mongolic (or 'Xianbeic' = Shimunek's 'Serbic'?) line of scripts (Tabghach and Khitan) and a possibly 'Tungusic' line of scripts (Parhae and Jurchen). But this is extremely speculative, as we have no idea what the Tabghach script looked like; it might not have been what Janhunen called 'sinoform' (i.e., Chinese-like). It may have had no relationship to any of the other scripts in the diagram above.

Janhunen (1996: 153) wrote,

In view of the later ethnic situation in the border zone between Korea and Continental Manchuria, and in the absence of any contradicting evidence, the most natural assumption about the states of Koguryo and Bohai [= Parhae] is that they were dominated by people ethnically ancestral to the Jurchen. It is well known that the Bohai ruling elite was largely formed by descendants of Koguryo nobility [...] there are no indications that any significant number of people linguistically connected with the modern ethnic Koreans would, during this period, have been present outside of the United Shilla territory.

To some extent, the above conjecture about the possible Jurchen identity of the Bohai population is complicated by the fact that the Bohai people continued to be counted as a separate ethnopolitical entity even after the fall of the Bohai kingdom. Not only the [Khitan] Liao [dynasty] but also the [Jurchen] Jin [dynasty] system of ethnic administration registered the Bohai people as distinct from the Khitan and Jurchen populations.

Another complicating factor is the absence of Jurchenic elements in the Koguryo onomastic material which is split between Koreanic and Para-Japonic items.

My attempt to reconcile the above points:

- Koguryo was a multiethnic state with a Chinese-influenced Koreanic-speaking elite ruling over Tungusic (including Jurchenic), Koreanic, and Para-Japonic-speaking subjects.

- As there is no evidence for Para-Japonic in Parhae, the Para-Japonic language(s) of Koguryo may have become extinct by the time Parhae was established in 698. Or Para-Japonic speakers were in the part of Koguryo that Shilla conquered: i.e., the part that did not become part of Parhae.

- The majority population of Parhae was Tungusic-speaking; their languages were not prestigious and hence almost totally absent from written records except in the Parhae script at a local and unofficial level.

- 'Parhae' as an ethnonym could have been a cover term for various peoples speaking Jurchenic and/or para-Jurchenic languages: i.e., Tungusic languages more or less related to Jurchen.

- The Jurchen script, then, was an official version of the previously informal Parhae script originally used to record one or more relatives of Jurchen but possibly not Jurchen itself. Wanyan Xiyin may have introduced new usages or characters to adapt the script to Jurchen. This introduction may not have been by him alone; it could have paralleled what happened when the Mongolian script was adapted for Manchu five centuries later. Roth Li (2000: 13) wrote,

The process of modifying [the Mongolian] script [for Manchu] occurred over at least a decade and was not, as some Chinese sources made it appear, carried out singlehandedly by Dahai in 1632.

Dahai may have been the new Wanyan Xiyin: i.e., the man later credited with the result of a slow process involving multiple people. As far as I know, there is no contemporary documentation indicating that Wanyan Xiyin 'created' the Jurchen large script in 1119, as his biography in History of the Jin Dynasty 73 states; that 'fact' may be a later oversimplification. The later dates from other sources (1121 in the Wanyan Xiyin inscription and 1123 in the Record of the Great Jin State) could be reconciled by viewing the 'creation' of the Jurchen script as a process spanning years before and after 1120. Cf. Wikipedia:

The date of the creation of the script (1119 or 1120) varies in different sources. Franke (1994) says that "[t]he Jurchens developed ... [the large script] ... in 1119". Kane (1989) (p. 3) quotes the Jin Shi [History of the Jin Dynasty], which states that "[i]n the eighth month of the third year of the [天輔] Tianfu period (1120), the composition of the new script was finished". The two dates can be reconciled as one may imagine that the work started in 1119 and was completed in August–September (the eighth month of the Chinese calendar) of 1120.

In fact Tianfu 3 is 1119, so Kane's (1989: 3) 1120 may be an error; his 2009 book has 1119 on p. 3. (The date of the script is on page 3 of both books!)
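The arithmetic behind that correction is simple enough to spell out. The sketch below assumes that Tianfu 1 corresponds to 1117 and ignores the slight offset between the Chinese lunar year and the Western calendar year; the function name is hypothetical.

```python
TIANFU_FIRST_YEAR = 1117  # assumption: Tianfu 1 = 1117 CE

def era_year_to_ce(first_year: int, era_year: int) -> int:
    """Convert a regnal-era year to a CE year; era years count from 1, not 0."""
    if era_year < 1:
        raise ValueError("era years start at 1")
    return first_year + era_year - 1

print(era_year_to_ce(TIANFU_FIRST_YEAR, 3))  # 1119
```

Counting the first year of the era as year 0 instead of year 1 would give 1120, which is one way such an off-by-one date could arise.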

²My guess at the Latin ablative of Parhae. I declined Parhae like Thebae 'Thebes' and Sinae 'China' which are pluralia tantum (though of course no such concept exists in Korean or Chinese [the Korean name is the Sino-Korean reading of the Chinese place name 渤海]).

WHAT IF THE LANGJUN INSCRIPTION WERE IN JURCHEN?

It just occurred to me that the Jurchen large script was fifteen years old when the 郎君 Langjun inscription was written in Khitan in 1134. The Arkhara inscription of 1127 demonstrates that the Jurchen large script was already in use in the years between its 'creation'¹ and the Langjun inscription. So why wasn't the Langjun inscription about a Jurchen aristocrat - none other than the emperor's brother - written in his language?

It would be interesting to try to construct a Jurchen version of the inscription. At least I can guess that they would have written 'Tang dynasty' as

<ta ang> (attested in 1185 in 大金得勝陀頌碑 26)

which partly parallels the structure of the Khitan small script spelling


from the Langjun inscription, though of course the Khitan small script fuses two characters (<ta> and <ang>) into a single block unlike the Jurchen large script characters which remain full-sized.

All that makes me wonder about the influence of Khitan writing practices on the Jurchen script - and if they can be differentiated from Parhae writing practices. Were the Parhae the first to write CVC syllables as <CV VC> sequences?
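The <CV VC> convention can be stated as a small rule: the two graphs share a vowel, which is read only once. Here is a minimal Python sketch of that rule operating on romanized values. The function name, the five-vowel inventory, and the treatment of a bare consonant are illustrative assumptions, not a description of how any of these scripts was actually taught or read.

```python
VOWELS = set("aeiou")  # assumption: a simplified vowel inventory

def read_cv_vc(first: str, second: str):
    """Read a two-graph spelling as one syllable, or return None on a vowel mismatch."""
    if first[-1] not in VOWELS:
        # A bare consonant graph simply prefixes the VC graph.
        return first + second
    if first[-1] == second[0]:
        # The shared vowel is read once: <ta ang> -> tang.
        return first + second[1:]
    return None

print(read_cv_vc("ta", "ang"))  # tang
print(read_cv_vc("pe", "ing"))  # None
print(read_cv_vc("p", "ing"))   # ping
```

Under this rule <ta ang> is a regular spelling of tang, whereas <pe ing> for Ming Chinese 平 is not, unless the first graph is read as a bare p.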

¹Not the best word, as I agree with Juha Janhunen (1994) who first proposed that the Jurchen script is actually derived from a preexisting Parhae script rather than being invented on the spot. It was Alexander Vovin who introduced me to Janhunen's idea over twenty years ago. I just found 川崎保 Kawasaki Tamotsu's 「渤海」文字資料からみた女真文字の起源に関する一考察 (2014), a response to Vovin's "Did Wanyan Xiyin Invent the Jurchen Script?" (2012).

Next: My thoughts on Kawasaki's article.

