What is the etymology of Chinese 孔雀 'peacock'? It looks like 孔 'hole'* plus 雀 'sparrow'; the latter makes sense, but the former doesn't. My guess is that this 孔 is 'great', a homophone of 'hole' also found in 孔鳳 and 孔鸞, terms for mythical birds. Confucius' family name 孔 might also be 'great'.

In the bilingual glossary Pearl in the Palm (1190), 孔雀 'peacock' was translated into Tangut as

1voʳ 2ləi 'chicken ?'

The second half 2ləi doesn't seem to be an independent word, though Li Fanwen (2008: 14) glossed it as both 雀 'sparrow' and 雞 'chicken'. It also appears in the Tangraphic Sea analysis of 1voʳ


1voʳ 'chicken' = top, bottom left, and bottom right of 2ləi '?' + bottom right of 1zeʳw 'graceful, elegant, gorgeous, variegated' (not traits I normally associate with chickens)

and as the first half of

2ləi 2ʔiaaʳ

glossed as 雞 'chicken' in the Pearl in the Palm.

2ʔiaaʳ by itself can mean 'chicken'**; the difference, if any, between it and its synonym 1voʳ*** is unknown.

I don't know what 2ləi means. None of its homophones have appropriate semantics for either 'peacock' or 'chicken':

'wrestling', 'Li, the name of an ancestor', 'east', 'to boil', 'fog'

'gourd ladle', 'tiger', 'dark-complexioned', 'fear, dread'

I would expect 2ləi in 1voʳ 2ləi 'peacock' to be an adjective or a noun, but I wouldn't expect an adjective in first position of a generic noun like 2ləi 2ʔiaaʳ 'chicken' (as opposed to a term like

2liẹ 2lɨə 'the great earth').

Perhaps 2ləi in 'peacock' and 'chicken' is a once-independent word that only survives as a part of other words like English were- in werewolf from Old English wer 'man'. I can't think of a single meaning that could be in both

'chicken + X' = 'peacock'

'X + chicken' = 'chicken'

so there might have been two different *CV[-high]TiH (with different meanings, segments, and/or parts of speech?) that merged into a single 2ləi.

*孔 'hole' is probably derived from 空 'empty':

Old Chinese 空 *khoŋ 'empty' > 孔 *khoŋ-ʔ 'hole'

**10.28.00:50: 2ʔiaaʳ 'chicken' may be from *Cɯ-raH which resembles Matisoff's Proto-Tibeto-Burman *k-rak 'chicken' or *Cɯ-ʔar which resembles Matisoff's Proto-Tibeto-Burman *ʔaːr or haːr  'chicken'. The vowel lengths even match!

***10.28.00:56: 1voʳ 'chicken' resembles these Qiangic forms

Lyuzu ɣua³⁵  'chicken'

Mawo   'year of the chicken'

but I bet they are unrelated lookalikes.

VEXED BY VENUS

(I meant to write this last Friday, but didn't get around to it until tonight.)

Friday is the day of Venus. I am puzzled by Burmese


<sokrā> [θauʔcà] 'Friday, Venus' (< Sanskrit śukra- 'bright, Venus (m.)*'; also cf. Pali sukka- 'bright')

for several reasons:

1. Why is the first vowel ော <o> instead of ု <u> corresponding to Skt (and Pali) u**? The spelling <so> implies a pronunciation *[θɔ́], not [θauʔ].*

2. Why does the first syllable [θauʔ] end in a glottal stop (corresponding to zero in its spelling) as if the word were borrowed from an Indic *sokkrā with a geminate absent from Sanskrit? Was *-kkr- the result of confusion between Skt śukra- and Pali sukka?

3. Why does the word end in ာ <ā> even though 'Venus' is not feminine in Sanskrit? I would expect the word to be

<sukra> [θúca̰] or <sukka> [θouʔka̰]

with a final written short vowel corresponding to a spoken vowel with a creaky tone.

*Sanskrit śukra- has no inherent gender as an adjective but is masculine as a noun 'Venus'.

**I am puzzled by why Burmese [ɔ́] and [ɔ̀] are written as <o> and <au> instead of <oḥ> and <o>. [ɛ́] and [ɛ̀] have similarly unexpected spellings:

tone | high < *-h | low < *final nasal or long vowel
[ɔ] | ော <o> [ɔ́] (not <oḥ>) | ော် <au> [ɔ̀] (not <o>)
[ɛ] | ဲ <ai> [ɛ́] (not <aiḥ>) | -ယ် <ay> [ɛ̀] (not <ai>)
other vowels | <Vḥ> | <V>

It's been 18 years since I last looked at Old Burmese. Maybe it holds the key to this mystery.

MASCULINE FROM FEMININE

When I first learned about the use of dots for masculine numerals in the Khitan small script, I was surprised that masculine graphs were derived from feminine (or neuter?) graphs:



I even accidentally reversed the terms 'masculine' and 'feminine' when I first blogged about them.

I've been trying to think of examples of masculine words that are derived from their feminine counterparts and are longer than them.

The first word that came to mind was Esperanto fraŭlo 'bachelor', but as a back-formation, it's shorter than its source fraŭlino 'miss'. There is no Esperanto root *fraŭ, though fraŭlino is obviously from German Fräulein which in turn is from Frau.

German strong masculine nominative singular adjectives are longer than their feminine counterparts but not derived from them. If German were written in a Khitan-like script, the feminine -e form could be regarded as basic*, and the masculine -er form could be regarded as the feminine -e form plus -r: e.g.,

nominative singular | German | pseudo-Khitan script | gloss
feminine | gute Frau | <GOOD woman> | good woman
masculine | guter Mann | <GOOD-DOT man> | good man

If one extended this Khitan analogy further, the strong feminine genitive singular guter might also be written with a dot, and future scholars might initially assume that such forms are 'errors' for gute. Conversely, the weak masculine nominative singular is gute and would be written without a dot, so later linguists might misinterpret such forms as 'errors' for guter.
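The mismatch between dot and gender in this thought experiment can be spelled out mechanically. Here is a minimal sketch in Python, assuming a toy pseudo-Khitan rule (a dot is written whenever the adjective form ends in -er). The forms of gut are ordinary German; the dotting rule and the names FORMS, pseudo_khitan, and apparent_errors are hypothetical illustrations only.

```python
# Toy model of the thought experiment: the dot tracks the -er ending, not gender.
# The forms are standard German singular forms of gut; the dotting rule and
# all names here are hypothetical.
FORMS = {
    ("strong", "feminine", "nominative"): "gute",
    ("strong", "masculine", "nominative"): "guter",
    ("strong", "feminine", "genitive"): "guter",
    ("weak", "masculine", "nominative"): "gute",
}

def pseudo_khitan(form):
    """Write <GOOD-DOT> for the -er form and <GOOD> otherwise."""
    return "<GOOD-DOT>" if form.endswith("er") else "<GOOD>"

def apparent_errors(forms):
    """Return the cells where the dot disagrees with masculine gender."""
    errors = []
    for (declension, gender, case), form in forms.items():
        dotted = pseudo_khitan(form) == "<GOOD-DOT>"
        if dotted != (gender == "masculine"):
            errors.append((declension, gender, case))
    return errors

print(apparent_errors(FORMS))
```

The two cells it flags are the strong feminine genitive and the weak masculine nominative, i.e. exactly the forms that a later reader who equated the dot with masculine gender would take for 'errors'.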

Similarly, could apparent gender 'errors' in Khitan reflect syncretism rather than confusion? Perhaps the masculine forms are homophonous with the feminine forms in certain contexts.

Is grammatical gender in Khitan an innovation or a retention? Other 'Altaic' languages do not have grammatical gender. Was it a Pre-Proto-Mongolic trait that was lost through convergence with its neighbors?

*I am pretending that the -e-less attribute forms (e.g., gut) do not exist in this imaginary scenario.

DID KHITAN NUMERALS HAVE INVISIBLE (BUT AUDIBLE) SUFFIXES?

In "Gender in the Khitan Large Script", I mentioned that grammatical gender was indicated for verbs but not numerals in the large script, whereas it was indicated for both word classes in the small script:

word class | large script | small script
verbs | masculine vs. feminine (unknown if neuter forms also exist) | masculine vs. feminine (unknown if neuter forms also exist)
numerals | no distinction | masculine vs. nonmasculine (feminine and possibly also neuter)

The verb endings were written with <VC> graphs in the large script. Could numerals have had endings that were less than <VC>: e.g., (unstressed?) short vowels? Such endings might have been unwritten in the large script just as the Persian ezāfe is unwritten after consonants (and sometimes even after vowels):

فیلم خوب

<fylm xvb> = film-e xub 'good movie'

On the other hand, the longer Khitan <VC> numeral ending <un> was written: e.g.,

<TWO un> 'two-GEN'

A short final front vowel could have conditioned fronting of root vowels (before being lost?): e.g.,

xon-e > xön-e > xön 'ten' (f.?)

(Since e is considered a feminine vowel in Mongolian and Manchu, I assume that it would have been feminine in Khitan as well.) Hence I could reconstruct front vowels for dotless (i.e., nonmasculine) numerals and back vowels for dotted (masculine) numerals. However, the use of dotless <FIVE> in

<HUNDRED FIVE> = <jau tau>

as a transcription of Liao Chinese 招討 *jautau 'bandit suppression commissioner' implies that nonmasculine 'five' was *tau with a back vowel (and without a suffix) rather than *täu(e) with a front vowel (and a suffix).

Perhaps the dot in the small script indicated the addition of a masculine suffix that did not affect root vocalism because it was a neutral vowel or it belonged to the same vowel class as the root: e.g.,

# | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
♀ (and neuter?) | mem | cur | ɣur | dur | tau | nil | daro | ńo | is | xon
♂ | mem-e | cur-u | ɣur-u | dur-u | taw-a | nil-i | daro-o | ńo-o | is-i | xon-o

(The reconstructions of the roots are tentative. The reconstructions of the suffixes are wholly hypothetical; their vowel harmony is patterned after the vowel harmony of -Vn genitive suffixes.)
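The harmony pattern in the hypothetical masculine forms can be stated as a rule. The following is a rough Python sketch, assuming only two things read off the forms listed above: the suffix vowel copies the last vowel of the root, and a root-final au becomes aw before a copy of a. The function name masculine_form and the five-vowel inventory are my own simplifications, not claims about Khitan phonology.

```python
# Hypothetical rule reproducing the masculine forms listed above.
# Assumption: the suffix is a copy of the root's last vowel, except that
# root-final "au" surfaces as "aw" plus a copy of "a" (tau > taw-a).
VOWELS = set("aeiou")

ROOTS = ["mem", "cur", "ɣur", "dur", "tau", "nil", "daro", "ńo", "is", "xon"]

def masculine_form(root):
    """Attach a vowel-harmonic suffix to a nonmasculine numeral root."""
    if root.endswith("au"):
        return root[:-1] + "w-a"
    last_vowel = next(ch for ch in reversed(root) if ch in VOWELS)
    return f"{root}-{last_vowel}"

for root in ROOTS:
    print(root, masculine_form(root))
```

A rule this simple says nothing about whether the suffix really existed; it only shows that the forms above are internally consistent with a copy-vowel suffix.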

I do not know of any dotted numeral graphs serving as phonograms in the small script. Did they have suffixes or some other phonetic quality that made them unlikely phonograms: e.g., were there no words in Khitan other than the masculine form of 'five' with the phonetic sequence tawa?

Could the dots be a purely semantic device without any phonetic value like the use of the 女 'female' or 牛 'ox' radicals in Chinese 她 'she' and  牠 'it' (cf. 他 'he' with the 亻 'person' radical)?

Could the dots indicate phonetically distinct forms for some numerals but not others? The masculine and feminine forms for 'first' have different though presumably related stems <m.o> and <m.as.qú>. Maybe mo was the masculine form of 'one' and mas the feminine form. (<-qú> may be an adjective ending; it is also in <l.iau.> 'red' and <s.iau.> 'blue'.) However, other ordinal numerals have identical stems for both genders: e.g., <c.ur.er> 'second' (m.) and <c.ur.én> 'second' (f.). Perhaps the cardinal counterparts of such invariable ordinal stems were also invariable, but the use of the dot to indicate the masculine form of 'one' was extended by analogy to other numerals*. The absence of dots where they are expected in phrases like

<SIX po.od> 'six (non-m.) hours'; cf. <SEVEN-DOT po.od> 'seven (m.) hours'

may be due to the lack of a phonetically distinct masculine numeral. Do such errors ever occur with 'one'?

10.26.3:20: Although cardinal numerals past 'one' may not have had any gender distinctions, other words like ordinal numerals and past tense verbs would reflect the grammatical gender of nouns.

*10.26.4:30: In German, there are distinct forms of 'one' depending on gender:

ein Mann 'a/one man', eine Frau 'a/one woman'

But there is no distinction for numerals above 'one':

zwei Männer 'two men', zwei Frauen 'two women'

If German were written like the Khitan small script, those words would be written as

<ONE-DOT mann> <ONE frau>

<TWO-DOT männ.er> <TWO frau.en>

even though zwei does not have a distinct masculine form in speech. Since zwei is the same before Männer as well as Frauen in speech, it may be tempting to write zwei as <TWO> without a dot before both nouns.

GENDER IN THE KHITAN LARGE SCRIPT

The pairs of dotted and dotless graphs in the Khitan small script have no known counterparts in the large script: e.g.,



in the small script correspond to a single*


in the large script. If the dot indicated grammatical masculinity, there are two possible explanations for this mismatch:

1. The same language variety underlies both scripts. The gender of numerals was simply ignored in the large script. Readers of the large script knew from context when to pronounce the masculine or nonmasculine form of a numeral, just as Russians know when to read 2 as dva (masc./neut. nom.) or dve (fem. nom.), etc.

2. Different varieties of spoken Khitan underlie both scripts. This is hypothesis E in Andrew West's list of five possible reasons for two Khitan scripts.

Even if Khitan large script numeral graphs had no visible grammatical gender, both the large and small scripts had different graphs for masculine and feminine past tense endings (Kane 2009: 174):

past tense ending <-er> ♂ <-én> ♀
large script

small script

Why was gender marked in verbs but not numerals?

Next: The Sounds (and Sources) of Gender in Khitan

*10.25.00:50: There are multiple variants of the large script graphs for 'four', 'six', 'eight', and 'nine'. Although I have not yet investigated the distribution of these variants in depth, this glance at some variants of 'six' indicates that at least some variants were interchangeable before certain nouns.

My guess is that these variations do not reflect gender because it is unlikely that the variety of Khitan underlying the large script texts would have gender distinctions for those four numerals but not 'one', 'two', 'three', 'five', or 'seven'. I have never seen a language that has a gender distinction in 'four' without such a distinction in 'one' through 'three'.

DID KHITAN HAVE THREE GENDERS?

The Khitan small script has pairs of graphs like


with and without dots. Kane (2009: 27) wrote,

There have been many suggested explanations for these [dots]: that they indicate some sub-class of numerals or colours, or are used with nouns to whom respect is due (Chen Naixiong 1992). Recently Wu Yingzhe 2005a has proposed the dot indicates grammatical gender.

For the last three years I have assumed that Khitan had two grammatical genders, masculine and feminine. However, yesterday morning I wondered if it also had a neuter gender.

Khitan has several types of plurals (Kane 2009: 138-142). I suspect there is a correlation between plural endings and genders:

gender | plural ending of <qi> 'that' before noun | dot on numeral before noun | plural ending of noun | past tense | examples
masculine | <d>?, <s>? | yes | <(V)d>, <s> | <er> | <SEVEN-DOT ai.d> 'seven fathers'; <NINE-DOT x.iŋ.d> 'nine capitals'; <THREE-DOT MONTH.s> 'three months'; <NINE-DOT non.s.en> 'nine generations-GEN'
neuter | <t> | no | <(V)d>, <s> | <er>? | <FOUR us.g.d> 'four characters'; <EIGHT us.g.d> 'eight characters'; <TEN ki.ên.d> 'ten counties'; <FOUR REGION.a.ad.en> 'four regions-GEN'; <FIVE n.ai.d.l.ɣa.ad> 'five harmonies'; <qi.t ai.s.er> 'those years-ACC'; <EIGHT NINE ai.s> 'eight or nine years'; <TEN ui.s> 'ten matters'
feminine | <t>? | no? | <t> | <én> | <mo.t> 'mothers, wives, females'

Unfortunately I can't find any examples of <qi> before nonneuter plural nouns or <mo.t> preceded by a numeral.

I predict that neuter nouns pattern like masculine nouns as the subjects of past verbs. (Gender marking in the past tense is reminiscent of Slavic: e.g., Russian byl (m.) ~ byla (f.) ~ bylo (n.) 'was'.)

I initially thought that <s> was an inanimate ending, but it occurs in at least one animate plural: <qid.s> 'Khitan people' (Kane 2009: 141). I wonder if that word was neuter since the Khitan consist of both males and females. I have no way of determining its gender until I see the forms of numerals and 'those' that can precede it. <non.s> 'generations' is also arguably animate since it refers to people (at least in the context of Khitan funerary inscriptions).

Conversely, <d> cannot be an animate ending, as it occurs in 'capitals' (m.) and 'characters', 'counties', 'regions', and 'harmonies' (all n.).

<d> even occurs in the biologically feminine noun <aú.ui.d> 'royal ladies' (Kane 2009: 139) which might have been grammatically masculine like Sanskrit dārās 'wife, wives' or neuter like German Mädchen 'girl, girls'. If there are other feminine nouns in <d>, then I will have to merge my neuter and feminine categories back into a single feminine category.

<po.od> 'hours' appears with both dotted and dotless numerals (Kane 2009: 141). Perhaps the Khitan gender system was collapsing, and <po>, once a neuter noun, could optionally be preceded by masculine numerals. Would 'those hours' be <qi.t po.od> (n.) or <qi.d po.od> (m.)?

Was gender only optionally marked in numerals? If so, then dotted numerals must always precede masculine nouns, but not all masculine nouns are preceded by dotted numerals. Khitan dialects might have had different degrees of gender marking like Danish dialects which may have one, two, or three grammatical genders. Although I presume there was a prestige dialect of Khitan spoken by the imperial family and its consort clan, individual scribes might have left traces of their nonstandard dialects in writing.

Next: Gender in the Khitan Large Script

PARA-MONGOLIC TEENS

In "Hey Nineteen", I reconstructed the Para-Mongolic suffix '-teen' as *-Kun on the basis of Jurchen


<niyuhun> 'eighteen' (< 'eight-ten'; two spellings)

<oniyohon> 'nineteen' (< 'nine-ten').

I was uncertain about whether the first consonant was *k or *x because I don't know if the words were borrowed before or after the *k-to-h [x] shift in Jurchen. If they predate the shift, they could have had *k- or *x-; if they postdate the shift, they had *x-.

Janhunen (2003: 17) reconstructed *x- as the initial of Proto-Mongolic *xa-r-ba-n 'ten'. Could Para-Mongolic *-xun be an irregular contraction of a Pre-Proto-Mongolic *-xa-r-ba-n? The *-u- could be a remnant of *-b-:

*-xa-r-ba-n > *-xaβan > *-xawan > *-xawn > *-xun


'ten' (large and small script)

could be something like xun, given the Khitan tendency toward short forms corresponding to longer (yet newer!) Mongolic forms.
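The contraction chain *-xa-r-ba-n > *-xaβan > *-xawan > *-xawn > *-xun can be replayed as a list of ordered rewrite rules. This is only a sketch in Python: the intermediate stages are the ones given above, but the way each step is phrased as a string replacement (and the names RULES and derive) is my own assumption, and morpheme hyphens are dropped for simplicity.

```python
# Ordered rewrite rules reproducing the stages of the proposed contraction.
# The rule formulations are assumptions; only the stages come from the text.
RULES = [
    ("rb", "β"),    # loss of r with lenition of b: xarban > xaβan
    ("β", "w"),     # xaβan > xawan
    ("awa", "aw"),  # loss of the second a: xawan > xawn
    ("aw", "u"),    # contraction of aw: xawn > xun
]

def derive(form, rules=RULES):
    """Apply each rule once, in order, and return every stage."""
    stages = [form]
    for old, new in rules:
        form = form.replace(old, new)
        stages.append(form)
    return stages

print(" > ".join("*-" + stage for stage in derive("xarban")))
```

Because the rules are ordered, reshuffling them changes the outcome: applying aw > u before the loss of the second a would give xuan rather than xun, which is one way to see that the irregular contraction depends on syncope happening first.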

Could the difference between

dotless (feminine and neuter*?) and dotted (masculine?) 'ten'

in the small script have been vocalic: e.g., <xun> and <xon>?

Kiyose (1977: 133) derived Ming Jurchen -hun from *-hön. Perhaps the Khitan words for 'ten' were <xon> and <xön> with a mid vowel from *-a-r-b(a)-:

*-xa-r-ba-n > *-xaβan > *-xawan > *-xɔn > *-xon (with front vowel variant *-xön)

I assume *ö and *ü merged into Ming Jurchen u. Original *u also seems to have merged into Ming Jurchen u according to the Chinese transcriptions, but it's also possible that a Manchu-like /u/ : /ʊ/ distinction was ignored:

Jin Jurchen | Ming Jurchen | Manchu
ö [o]?* | u | u
ü [u]? | u | u
u [ʊ]? | u and/or ʊ? | u, ʊ (preserved only after velars)

Perhaps there is Jurchen script evidence for an /u/ : /ʊ/ distinction: e.g., multiple <Cu> characters may turn out to have been read with different vowels /ö/ /ü/ /u/ in Jin Jurchen.

*Jin Jurchen o might have been [ɔ] if the language had height-based vowel harmony:

height | front | central | back
higher | i [i] | e [ə] | ö [o], ü [u]
lower | i [i] < < *e | a [a] | o [ɔ], u [ʊ]

This system is similar to the one I reconstruct for Old Korean:

front central back
higher *i *u
lower *e *a *o

Were the two systems originally even more similar? (Chinese loans in Korean rule out the possibility that the Proto-Korean system was more like Jurchen, though the reverse might be true.) Was that similarity due to convergence, or did it also reflect divergence after convergence?

Higher-lower vowel distinctions are also crucial for my reconstructions of Old Chinese and Pre-Tangut which have higher and lower-vowel presyllables *Cɯ- and *Cʌ- that condition massive vowel changes.

*I realized today that Khitan might have had three genders. I'll explain why next time.

HEY NINETEEN

In "Return to the Roof: The Assis-ten-t", I mentioned Jurchen

<oniyohon> 'nineteen'

which might contain a Para-Mongolic (Khitan?) suffix *kon or *xon for '-teen'.

One might expect the Khitan word for 'nine' to resemble oniyo-, but it is

is or iš* (Kane 2009: 109; shown here in one large and small script spelling; see "Feminine Lines" for other large script spellings)

which is similar to Janhunen's (2003: 399) Proto-Mongolic *yersü-n 'nine'** which may be an innovation***.

Perhaps Pre-Proto-Mongolic *o-nay- (cf. Proto-Mongolic *nay- 'eight' in Janhunen 2003: 17) was the original root for 'nine' preserved in 'nineteen' in the Para-Mongolic language (Khitan?) that was the source of the Jurchen loanword but not in Mongolic:

Early Pre-Proto-Mongolic | *o-nay- 'nine' | *o-nay-Kun 'nineteen'
Late Pre-Proto-Mongolic | *o-nay- ~ *yer-sün 'nine' | *o-nay-Kun 'nineteen'
Proto-Mongolic | *yer-sün 'nine' | *xa-r-ba-n yer-sün 'nineteen' (lit. 'ten nine')
Khitan | is < *yer-sün 'nine' | but onyo-Kon < *o-nay-Kun 'nineteen' (see below)

PPM *o-nay- is implied by Jurchen


<niyuhun> 'eighteen'

another potential loanword from Para-Mongolic (Khitan?). Perhaps *nay 'eight' became *nya- and its *a raised and rounded to assimilate to other vowels:

'eighteen': *nay-Kun > nyaKun > *nyuKun

'nineteen': *o-nay-Kun > onyaKun > *onyoKon

I followed Kiyose who reconstructed Jurchen 'nineteen' and 'eighteen' with -iy-, but the Chinese transcriptions of the two words in 華夷譯語 (Kiyose 1977: 133) imply that Kiyose's niyV was a single syllable:

oniyohon 'nineteen': 斡女歡 *o xon for Jurchen IPA [oɲoχon]?

niyuhun 'eighteen': 女渾 * xun for Jurchen IPA [ɲuxun]?

Kiyose presumably reconstructed the syllable transcribed as 女 differently depending on the other vowels of the word: niyu before hun and niyo surrounded by o and hon in accordance with the rules of Manchu vowel harmony.

Norman's Manchu dictionary has nio ~ niyo doublets: e.g., niolmon ~ niyolmon 'moss'. The two are not always interchangeable: e.g., there is no *niyohe corresponding to niohe 'wolf' and there is no *nio corresponding to niyo 'swamp'. I presume that 'wolf' was IPA [ɲoxə]**** whereas 'swamp' was IPA [nijo]. Perhaps I should follow Jin (1984) and reconstruct the Jurchen numerals as oniohon and niuhun by analogy with Manchu niohe [ɲoxə] 'wolf'.

I am not sure those Jurchen '-teen' words are from Khitan because '-teen' words appear as 'ten' + 'nine', etc. in the Khitan large and small scripts:

<TEN NINE> (large script; 蕭袍魯墓誌銘 line 5)

<TEN NINE> (small script; bronze mirror found in Ulanqab)

Did the Khitan simply imitate the Chinese practice of writing 'ten' + 'nine' etc., just as Germans write 21 but say einundzwanzig rather than *zwanzigeins or *zweieins?

The Jurchen graph

<oniyohon> 'nineteen'

looks like a fusion of Chinese/Khitan 十 'ten' and Chinese (but not Khitan!) 九 'nine' or Jurchen


'ten' and 'nine'.

Jin (1984: 200) derived the Jurchen graphs



from Chinese 十 'ten' and 八 'eight', but they remind me more of Chinese 方 'direction' and 万 'ten thousand'.

The latter graph resembles



in the Khitan small script, possibly derived from Chinese 万 'ten thousand' because the Khitan word for 'ten thousand' was tumu, transcribed in Liao Chinese as 圖木 *thu mu (Kane 2009: 120) and written in the small script as

resembling Liao Chinese 及 *kiʔ 'to reach'.

*10.22.00:17: Janhunen regarded the reconstruction of is for Khitan 'nine' as "clearly anachronistic [i.e., too much like Mongolic?] and unlikely to be correct". I am not sure why he came to this conclusion. Perhaps it is because is has a final -s that he considered to be secondary***.

In the 2010 book he coauthored with Wu Yingzhe, 'nine' was reconstructed as is according to Aisin Gioro (2012: 7).

**10.22.0:11: I do not know if

Khitan i- is from *ye-

Proto-Mongolic *ye- is from *i-

both are from some third type of initial syllable

***10.22.4:28: Janhunen (2003: 17) reconstructed *yer without *-sün as the root for 'nine' (cf. *yere-n 'ninety'). It lacks a reflex of his Pre-Proto-Mongolic suffix *-pAn found in most other lower numerals, and he viewed lower numerals like 'nine' without that suffix "as somehow special and perhaps secondary."

Perhaps he thought is was "clearly anachronistic" because he did not expect a Para-Mongolic language like Khitan to have an -s from a suffix *-sün that was a Proto-Mongolic innovation. However, I think *-sün could have been a late Pre-Proto-Mongolic innovation inherited by both Proto-Mongolic and Khitan.

****10.22.4:40: Manchu niohe [ɲoxə] 'wolf' could be a loan from Para-Mongolic which preserved a *ny- lost in Mongolic (Janhunen 2003: 397): cf. Kane 2009's Khitan

<ńi.qo> 'dog'

and Janhunen 2003's Proto-Mongolic *noka-i 'dog'.

Tangut fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2012 Amritavision