VAN MEN

What is the story behind the irregular conjugations of Hungarian van 'is' and megy 'goes'?

Number/person Ending(s) 'to be' < Proto-Finno-Ugric* *wole- 'to go' < Proto-Uralic *mene-
1st singular -ok/-ek vagy-ok megy-ek
2nd singular -sz [s] vagy-Ø mé-sz ~ mégy-Ø
3rd singular van-Ø megy-Ø
1st plural -unk/-ünk vagy-unk megy-ünk
2nd plural -tok/-tek vagy-tok men-tek
3rd plural -nak/-nek van-nak men-nek

The list of endings is not exhaustive and only includes endings that would normally be expected for these two verbs.

1. Why do the two verbs have -gy [ɟ] even though their roots lack palatal consonants?

2. Why does that gy have different distributions in the paradigms of the two verbs: e.g., van and vagytok (not *vagy and *vantok) but megy and mentek (not *men and *megytek)?

3. Why do 'thou art' and one form of 'thou goest' have a zero ending?

4. Why do the forms of 'thou goest' have long vowels? Is length in mész compensating for a root-final consonant lost before -sz?

5. Why does 'to be' have a instead of o which is still in other forms like volt 'he/she/it was'?

6. Why does 'to be' have n instead of l which is still in other forms like volt 'he/she/it was'?

I could ask even more questions about the rest of the paradigms of those two verbs (e.g., why is the potential of 'to go' me-het with the stem reduced to an open syllable?), but I'll stop here.

*Although Proto-Finno-Ugric may not even exist (cf. Tibeto-Burman in Sino-Tibetan), I cite this form merely to indicate that the source of the Hungarian verb had *l which is still in some other forms of the verb (e.g., volt 'he/she/it was') as well as in related languages: Finnish olla and Estonian olema.

WHAT IS THE INDIC SOURCE OF THAI NATTA?

That question came to mind when I saw the name of this restaurant. I assume the Natta of Natta Thai is from the name [náttʰaː] which I've seen spelled ณัฏฐา <ṇaṭṭhā> and ณัฐฐา <ṇaṭhṭhā>. The letters ณ <ṇ>, ฏ <ṭ>, and ฐ <ṭh> are for retroflex consonants that were never in Thai and usually signal Indic origin. Yet I cannot find any Sanskrit or Pali words beginning with ṇa- other than Skt ṇakāra 'the sound ṇ' which is not relevant here. Is ณ <ṇ> a hypercorrection for น <n>? There is a Pali word naṭṭha ... but it means 'destroyed'!

Since I mentioned ณ <ṇ>, here is a question I've had for a long time: why is the Thai preposition [náʔ] spelled ณะ <ṇḥ>? Was that an attempt to dress up a native word in Indic-like guise? The use of a low-frequency letter also makes the word stand out. Was that intentional? How far back does the retroflex spelling go? Was the word ever spelled with dental น <n> as นะ <nḥ> like Lao ນະ <nḥ> [nāʔ]?

TONOGENETIC CLUES IN MIZO 'DECLENSION'?

I first heard of Mizo (as 'Lushai') back in the late 90s when I learned that Starostin had found a correlation between Mizo short vowels and Middle Chinese Grade III (going back to Old Chinese 'type B' syllables which he reconstructed with short vowels and which I reconstruct as nonemphatic). See Sagart's (1999: 42-43) summary of proposals concerning the origin of Grade III, which is often reconstructed as a medial *-j-.

I didn't look at Mizo again until tonight when I took a good look at its Wikipedia entry. Normally, I expect Asian tonal languages to be 'isolating' like Chinese, but Mizo nouns decline! Or is 'declension' an artifact of looking at Mizo through an Indo-Aryan lens and/or Mizo orthography? Would it be better to analyze the suffixed case forms as noun-postposition sequences as in DeLancey (2004)? In any case, the ergative and instrumental both end in -in but have different tones. Does that tonal alternation reflect one or more lost final consonants? Are the two -in from a single original suffix (or postposition) with or without a following glottal suffix that conditioned a different tone?

*-in > -in + tone

*-in-H > -in + a different tone

6.21.23:36: Segmental affixes may also be the source of tone changes in derived verbs (though some derivations may postdate tonogenesis and be by analogy with existing pairs of verbs).

6.21.23:57: How many tones does Mizo have? Wikipedia lists eight. But Khoi Lam Thang (2001: 40) listed five, and Lorrain (1940) in Namkung (1996: 234) listed only three! How can these different descriptions be reconciled? And where did these tones come from? Wikipedia makes it sound as if Mizo had Chinese-style tonogenesis:

Tone systems have developed independently in many of the daughter languages [daughters of which language?] largely through simplifications in the set of possible syllable-final and syllable-initial consonants. Typically, a distinction between voiceless and voiced initial consonants is replaced by a distinction between high and low tone, while falling and rising tones developed from syllable-final h and glottal stop, which themselves often reflect earlier consonants.

I hoped to see the details of this process in Khoi Lam Thang's (2001: 98) dissertation. Unfortunately, his reconstruction of Proto-Chin, the ancestor of Mizo and its sisters, lacked a tonal component.

This  analysis  shows  that  there  are  comparatively  clearer  tonal  correspondences between Tedim, Mizo and Hakha. However, tone in Mara, Khumi and Kaang are split within the Patterns [established by Gordon Luce for Chin languages such as Mizo], tremendously complicated and without predictable environments. Thus, while a reconstruction of proto Northern Chin may be proposed from this data, a reconstruction of Proto Chin tone is incomplete and cannot at present be proposed. Therefore this thesis will be limited to a segmental reconstruction for Proto Chin. A Chin tonal analysis is in progress by Dr. Fraser Bennett and Ajarn Noel Mann. Their initial findings seem much closer to Luce’s Tonal Patterns.

I wonder what their final findings were.

REPLICATING GRAINS OF GOLD

I like Andrew West's English title for the Tangut text that I have been calling the Golden Guide. His latest post is about manuscript copies of the Grains and practice pieces in which characters from the Grains were written repeatedly.

He used my notation to transcribe Tangut readings with a twist: he wrote tones as superscript numerals and grades as subscript numerals: e.g., he wrote the reading of

'moon, month'

as ²lhiq₄ = lhiq with tone 2 and Grade IV. I write it as 2lhiq4 because superscript and subscript numerals are difficult for me to type and to read.

He linked to my notes on the Grains whenever they were available. I still have 96 lines left to translate and annotate. (I stopped at line 104 in January.) Now I want to finish so Andrew can add more links to his entry.

DID KHITAN AND JURCHEN SHARE A WORD FOR 'GRANDSON' (PART 2)?

I forgot to make a few points about Khitan

191 'grandson'

in my last entry, and I've thought more about the topic since, so here's a follow-up I didn't plan.

Why does 191 mean 'grandson'?

I don't know. I haven't seen Liu Fengzhu and Chengel's (2003: 18) explanation for that gloss. If I can find it, I might write a part 3.

191 occurs four times in the epitaph of Field Marshal Yelü, but none of those four occurrences unambiguously means 'grandson' (Wu and Janhunen 2010: 159, 161, 190).

191 also functions as a phonogram: e.g., in the female name

191-236-372-361 <191.ur.û.en> (Xiao Dilu 26.26; see Wu and Janhunen 2010: 106-107).

How was 191 pronounced?

Lu Yinghong & Zhou Feng (2000: 49) read it as [mu] because they regarded it as a transcription of Liao Chinese 睦 *muʔ. However, the rest of what they regarded as a transcription of a Chinese phrase is not a good match (Wu and Janhunen 2010: 107). Given that other Chinese final glottal stops may have been Khitanized as (= -h in Kane 2009), perhaps 191 was <muɣ> (which resembles Written Mongolian omuɣ 'clan', though I am skeptical of apheresis; see below).

The fact that 191 is often followed by u-graphs (e.g., 236 <ur> above; see Qidan xiaozi yanjiu 312 and Wu and Janhunen 2010: 317 for others) suggests that its reading may have ended in -u. Perhaps the name above was something like Mu(u)ruen.

Kane (2009: 302) transcribed 191 as <mú>, but his entry for the character on p. 58 is blank, so I do not know his reasoning.

Wu and Janhunen (2010: 264) transcribed 191 as <mó>, presumably reflecting Wu Yingzhe's (2007: 46-47) analysis, which I haven't seen. Maybe by part 3 ...

Does the Khitan word written as 191 have external cognates?

If the reading of 191 began with an m-, I doubt it can be connected to Manchu omolo 'grandson', since I don't know of any cases of Khitan C- corresponding to VC- in other languages. Hence I don't think Khitan underwent apheresis. (Is there any language that lost all initial vowels?) A reading mu would make a link even more problematic since I would not expect Khitan u to correspond to Manchu o.

In part 1, I proposed that 191 may have been <om>. Such a short form - if valid - raises other issues. Manchu omolo has apparent cognates throughout Tungusic with the shape omol(g)V (Cincius 1975 2: 17-18). Therefore the word might be reconstructed at the Proto-Tungusic level. Is the word a loan from pre-Khitan (prior to monosyllabic reduction) into ((pre-)Proto-)Tungusic or vice versa? It cannot be a loan from Khitan into Jurchen or any other Tungusic language, since that scenario cannot account for final -l(g)V. Gorelova (2002: 114) analyzed Manchu omolo as omo-lo with a noun suffix -lo. That analysis seems to be synchronically correct since the plural of omolo is omosi, with the plural -si replacing -lo after the root. But is it diachronically correct? Was the Proto-Tungusic root *omo- rather than *omol(g)V, or was the word reanalyzed within Manchu? The Jurchen plural

<omo.lo.shi>  (Kyŏngwŏn inscription 3:2)

could be analyzed either as omo-lo-shi with double suffixes or as omolo-shi with a trisyllabic root that was later reanalyzed as a root-suffix sequence omo-lo by analogy with other -lo nouns in Manchu.

Starostin's online Altaic database treats Proto-Tungusic  *omu- (sic) 'offspring, descendant, grandchild' and *umu- 'to lay eggs' as one and the same root. I reject that identity for three reasons. First, the supposed initial vowel alternation looks like an ad hoc device to tie the two roots together. Second, all evidence points to *o as the second vowel of the om-root; *u is another bridging device to make the child root look like 'to lay eggs'. Finally, *umu- was apparently reconstructed solely on the basis of Evenki umū-. A form in a single language cannot be projected back to the proto-language.

All that effort enables Starostin to connect the Tungusic omo- (not omu-!) words to various um-words elsewhere in 'Altaic':

Old Turkic umay 'name of a goddess' < 'placenta'?

If I am reading Clauson 1972: 164-165 correctly, the word is first attested in the 8th century AD as the name of a goddess "whose particular function was to look after women and children, possibly because this object [the placenta] was supposed to have magic qualities". The first attestation of the meaning 'placenta' that I can see in his entry was in the 11th century AD. I assume 'placenta' is the earlier meaning even though it is actually found later.

Written Mongolian umai 'womb'

Korean um 'sprout'

Japanese um- 'to give birth'

The Turkic and Mongolian words must share a common source; one language probably loaned the word to the other.

The semantics of the Korean word are distant from 'womb'. Um may be an -m-suffixed nominalization of an extinct verb 'to sprout'.

The Japanese word may be a chance lookalike like English womb [wum]. Is English 'Altaic'?

Starostin reconstructed Proto-Altaic *úmu 'to give birth'. According to the rules in Etymological Dictionary of the Altaic Languages (2003 1: 18), the first vowel of the reflexes of a Proto-Altaic word with the vowel sequence *u-u should be *U in Proto-Tungusic. The cover symbol *U enabled Starostin et al. to regard both the improbable *umu- and the incorrect *omu- as descendants of *úmu.

DID KHITAN AND JURCHEN SHARE A WORD FOR 'GRANDSON'?

Last weekend, I opened Wu and Janhunen (2010) at random and saw this passage about the Khitan small script character


on p. 107:

Even so, assuming that the value [mu], here romanized as mó, is approximately correct, the Khitan item for 'grandson' may perhaps be compared with Manchu omolo id., suggesting that the actual pronunciation might also have been [omo] (Wu Yingzhe 2007f: 46-47).
I wonder if 191 was [om] ~ [mo] with a reversible reading like other Khitan small script characters such as

222 [iń] ~ [ńi].

The Jurchen word for 'grandson' was omolo as in Manchu. I suspect that the character variously written as

was originally a logogram for omolo (though it is not attested alone) which later acquired a following <lo> (see my posts from 6.1 and 6.3) in the attested spellings:


(Kyŏngwŏn inscription 3:2, mid-12th century; the spelling on the left is from Jin 1984: 205 and the spelling on the right is from Jin and Jin 1980: 336)

(Deshengtuo inscription 14, 1185; Jin 1984: 205 also reports this in Yongning 12, but Jin and Jin 1980: only list the second of the next two spellings in Yongning 12.)


(Yongning temple inscription 12, 1413; the spelling on the left is from Jin 1984: 205 and the spelling on the right is from Jin and Jin 1980: 364)

(Hua-Yi yiyu Berlin ms. people section 14, before c. 1500?)

I have not seen any of the originals, so I am not certain about the details.

I have not yet been able to find an exact match for Jurchen <omo> in the Khitan large script. Characters 0170, 0204, and 0205 in N4631 are vaguely similar, but until their readings and/or meanings are known, I cannot regard them as prototypes for Jurchen <omo>.

KHITAN SMALL SCRIPT CHARACTER 346 IN QIDAN XIAOZI YANJIU

Qidan xiaozi yanjiu (1985), the foundation of current studies on the Khitan small script, only lists four instances of 346 in the texts it covers:

244-346-273 <s.?.un> (道 14.11, 24.16, 仲 17.37) and 251-346-273 <n.?.un> (許 57.33)

Are those genitives of nouns, or is <un> part of the stem? If <un> is a genitive suffix, the vowel of 346 should be u according to the present understanding of Khitan vowel harmony. So perhaps that is partly why Kane (2009) transliterated it as <uŋ> and Wu and Janhunen (2010) transliterated it as <ung₂>. The final nasal reflects the assumption that 346 is a variant of single-dotted 345 <ung> from my last post:

345 is much more common. Qidan xiaozi yanjiu lists 72 occurrences of 345 which can appear by itself (on the murals where characters are often not grouped into blocks) and in first, third, and fourth position: e.g.,

345-041 <> (興 25.3), 334-019-345 <> for Liao Chinese 宮 *giung (or *güng?) (道 6.33), 048-092-261-345-341 <?> (許 61.2)

Is 346 simply a variant of 345 (Kane 2009: 77), or is it a distinct character? If it is the latter, was its reading similar to <ung> (e.g., <üng>) or was it something else with a u-vowel? 346 coexists with 345 in all three texts where it was found (道, 許, 仲). Was the number of dots on the bottom random like the dots in the three variants of Jurchen <lo>?

The fact that 346 only occurs in blocks of the type <C.346.un> suggests a deliberate choice, though it could also be an artifact of extremely limited data. Qidan xiaozi yanjiu does not list the blocks <s.ung.un> and <n.ung.un> with 345 instead of 346. Is this complementary distribution accidental or meaningful? Have any such blocks been found in the three decades following the publication of Qidan xiaozi yanjiu? The closest block with 345 is

244-345 <s.ung> 宋 'Song (dynasty)' (仁 8.13)

which might be the stem of

244-346-273 <s.?.un> (道 14.11, 24.16, 仲 17.37)

if 345 and 346 really are equivalent and if <un> is a genitive suffix.
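The distributional question above can be made concrete with a toy tabulation. The following is a hypothetical Python sketch, not an established tool: it encodes only the six blocks cited in this post (not the full set of 72 occurrences of 345 in Qidan xiaozi yanjiu), so it merely shows what the check looks like and cannot decide whether the gap is accidental.

```python
# Hypothetical sketch: only the Khitan small script blocks quoted in this post
# are encoded here, not the full Qidan xiaozi yanjiu corpus.
BLOCKS = [
    ("244", "346", "273"),                # <s.?.un>
    ("251", "346", "273"),                # <n.?.un>
    ("345", "041"),
    ("334", "019", "345"),                # for Liao Chinese 宮
    ("048", "092", "261", "345", "341"),
    ("244", "345"),                       # <s.ung> 'Song'
]


def contexts(blocks, target):
    """List (1-based position, next character or None) for each occurrence of target."""
    found = []
    for block in blocks:
        for i, char in enumerate(block):
            if char == target:
                following = block[i + 1] if i + 1 < len(block) else None
                found.append((i + 1, following))
    return found


for target in ("345", "346"):
    occurrences = contexts(BLOCKS, target)
    before_un = sum(1 for _, nxt in occurrences if nxt == "273")
    print(f"{target}: {occurrences}; followed by 273 <un>: {before_un}/{len(occurrences)}")
```

On this tiny sample, 346 always precedes 273 <un> and 345 never does; whether that pattern survives in a larger corpus is exactly the open question.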

If 244-346 is also 'Song', could 251-346 be a loan of a Liao Chinese word *nung?

AN 'ETERNAL' LINK BETWEEN THE KHITAN SMALL SCRIPT AND THE JURCHEN (LARGE) SCRIPT?

Tonight I noticed that the Jurchen (large) script character


for the transcription of Ming Chinese 永 *yüng 'eternal' resembles a cross between the Khitan small script characters

106 ~ 345 ~ 346

which are slightly different ways to transcribe Liao Chinese *-ung. (I assume 106 is an abbreviation of 345. The function, if any, of the extra dot in 346 is unknown.)

Was the Jurchen character derived from 106/345/346, or is the similarity a coincidence? Normally Jurchen characters are thought to be derivatives of Khitan large script characters or 'sisters' if not descendants of those characters. So I would expect Jurchen <üng> to be somehow related to Khitan large script characters such as these two (1692 and 0555 in N4631):

N4631 glossed 1692 as 'first' and listed the reading [tʰur] (= <tur> in my Khitan transcription). There is no semantic or phonetic resemblance to 永 *yüng 'eternal' or its (near-)homophones.

Nothing is known about 0555. Was it pronounced üng?

The Khitan small script character for Chinese transcription


that Kane (2009) transcribed as <iúng> may have been pronounced üng. It of course does not look anything like Jurchen <üng> unless one is imaginative. I doubt the Jurchen - who were literate in Khitan - overlooked it and chose a small script character with a somewhat different reading (106/345/346) as the basis for their <üng>.

6.3.1:06: Maybe I am wrong about 181 being üng. The Liao Chinese rhyme that it transcribes was also transcribed in the small script as 019-345 <iu.ung>: e.g.,

334-019-345 <g.iu.ung> for 宮

So was 181 <iung>? (I see no reason to add an acute accent, as there is no <iung> distinct from <iúng> in Kane 2009.)

Another possibility is that the Liao Chinese rhyme was -üng, and the Khitan had two strategies for writing it: a spelling reflecting a partially nativized -iung (if Khitan had no ü) and a spelling with a character specifically designed for -üng. The degree of phonetic mismatch between Liao Chinese and Khitan must have been considerable, though it eludes precise measurement.

LAST OF THE OLD JURCHEN SCREENCAPS

Yesterday I discovered a screencap of Jurchen characters that I made four years and two laptops ago. At the time I created images of all but two of them, which I didn't upload until tonight:

<i(r)> and <lo>

Those two might be the last all-new images of 72-point characters on this site. All future images will be of 48-point characters or be derivatives of existing 72-point character images.

Although the two characters look very similar, they have completely different phonetic values.

<i(r)> was transcribed as Ming Chinese* 一兒 *ir in the Sino-Jurchen glossary (Kiyose 1977: 91) and in turn transcribed the initial *y- of Ming Chinese 永 on the Yongning Temple Stele (lines 1, 6, 8, 10, and 13). A dotless variant

appears on line 9 of that inscription.

<lo> corresponds to the Ming Chinese phonetic transcription 洛 *lo of the name

<> = 充哥洛 *cunggolo (<ge> was also transcribed as Ming Chinese 革 *ge.)

in memorial XI (Kiyose 1977: 201) and has dotless and single-dotted variants:

The dotless variant from the Yongning Temple Stele looks like the Chinese character 早 *dzaw and the Khitan large script character

whose reading is unknown. Was 早 also read <lo> in Khitan?

The two appear together - ignoring variation - in the transcriptions 


<i.üng.lo> (lines 8 and 10) and <i.üng.lo> (line 13*)

of 永樂 *yünglo in the Yongning Temple Stele inscription. That word illustrates how the presence or absence of a left-hand bend in the central stroke is the key difference between the two graphs.

*6.2.1:08: Ming Chinese reconstructions are in the same non-IPA orthography that I use for Jurchen and Khitan to facilitate comparison. The use of identical letters in different languages does not necessarily entail exact phonetic matches: e.g., Ming dz [ts] was voiceless unlike Khitan

104~354 <dz>

which may have been voiced [dz].

**6.2.0:54: The two-character transcription


of 永樂 *yünglo in line 6 is presumably an error for



Thanks to Andrew West for solving yesterday's screen capture mystery. It turns out that if I went to Control Panel\Appearance and Personalisation\Display on my 1920 x 1080 Windows 8 laptops, "Change the size of all items" was set to "Medium" on one and "Larger" on the other. Hence screen captures were 25-50% larger than I expected. Changing the setting to "Smaller" makes screen captures the size I'm accustomed to - but makes text in programs even harder to read than it already is. So I now have both laptops set to "Larger" for maximum legibility, and instead of resetting them to "Smaller" every time I want to make new Tangut, Jurchen, or Khitan character images, I'll use 48-point characters instead of 72-point characters. I tested that new technique, and the results are almost indistinguishable from my old technique: e.g., Khitan small script <TWENTY>

is 87 x 84 pixels in 48 point now but 85 x 82 in 72 point on my old machines. Close enough!

WHY IS THE KHITAN SMALL SCRIPT SO LARGE ON MY NEW LAPTOPS?

For nine and a half years, I have been using 96 x 124-pixel images to represent the Tangut, Khitan, and Jurchen (TJK) scripts. These images are almost always screen captures of TJK characters in 72-point fonts in BabelPad. (In a few cases, they are screen captures of Chinese characters that I have modified by hand to represent Khitan large script characters that I don't have in a font.) This system has worked since January 2006 on five different laptops: two with Windows XP, one with Windows Vista, and two with Windows 7. However, it no longer works on my new Windows 8 machines. 72-point characters no longer fit in a 96 x 124-pixel space: e.g., the Khitan small script character for 'twenty'

was 85 x 82 in 72 point in Andrew West's font on previous machines but is now 107 x 104 on one Windows 8 machine and 129 x 125 on another. The screen resolution is 1920 x 1080 on both machines. Is there anything I can do to make characters appear at the same old size again? I am reluctant to post about TJK if future character images can't be consistent with previous ones.
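The growth factors implied by these measurements can be checked with a little arithmetic. This Python sketch uses only the pixel sizes given here; the machine labels are hypothetical, and linking the two machines to 125% and 150% display scaling is an assumption based on the "Medium" and "Larger" settings named in the follow-up post above, not something measured directly.

```python
# Pixel sizes (width, height) of the Khitan small script character 'twenty'
# as reported in this post; the machine labels are hypothetical.
OLD_72PT = (85, 82)
NEW_72PT = {"machine A": (107, 104), "machine B": (129, 125)}


def growth(new, old):
    """Return the width and height ratios of a new glyph size to an old one."""
    return tuple(n / o for n, o in zip(new, old))


for name, size in NEW_72PT.items():
    w, h = growth(size, OLD_72PT)
    print(f"{name}: width x{w:.2f}, height x{h:.2f}")

# Assumption: the two machines use 125% and 150% display scaling.
# At 150%, 48-point text would occupy the pixels of 72-point text at 100%.
print(48 * 1.50)
```

The ratios (about 1.26 and 1.52) match the 26% and 52% figures below and sit close to 125% and 150%. If those presets are right, 48-point characters at 150% should render at roughly the old 72-point size, which fits the 87 x 84 vs 85 x 82 comparison in the follow-up.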

Ah, I see now. Windows 8 has nothing to do with this problem. Resolution is the key. My previous machines had screens set to 1024 x 768, 1280 x 800, and 1366 x 768. If I reset my current machines' resolution to one of those smaller formats, 'twenty' is 85 x 82 in 72 point. Why is it 26% and 52% larger on two different 1920 x 1080 screens? Intel should know the answer; both machines have an Intel HD Graphics Control Panel.

IN-S-ERTED IN TIME

Having mentioned Middle Korean (MK) ᄣᅢ pstay 'time' in my last post, this might be a good, um, time for a short note on John Whitman's (2012: 32) etymology which I just discovered on Wednesday:

Proto-Koreo-Japonic (PKJ) *pə(n)tə >

Proto-Korean *pət-ay > MK pstay > modern Korean ttae 'time'

I thought Whitman's split of 'time' into two parts was ingenious. *-ay is a locative suffix reanalyzed as part of the stem, so modern Korean 때에 ttae-e 'time-LOC' has the same suffix twice: i.e., it is etymologically *'(time-LOC)-LOC'. Although the sequence *pət-ay is not harmonic according to Middle Korean rules, earlier Korean may not have had any harmony.

Proto-Japonic *pə(n)tə 'interval' > Japanese hodo

I see two phonetic problems with this etymology.

First, if the PKJ form had *-n-, that consonant corresponds to nothing in MK unless *pnt- became pst- (via *pzt-?). I know of no parallel for such an unusual change.

Second, if the J form did not have -n-, there is no source for MK -s-, unless *pt- became pst-, which is not only strange but also raises the issue of how MK could also have pt- (e.g., in ᄠᅢ ptay 'dirt'). I would rather not posit a chain shift with syncope in 'time' and 'dirt' occurring in different periods to explain why both pst- and pt- exist in MK:

Proto-Korean *pət-ay 'time' *pVtay 'dirt'
Early syncope *ptay *pVtay
*s-insertion *pstay *pVtay
Late syncope pstay ptay
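The chain in the table can be mimicked as ordered rewrite rules. This is a minimal Python sketch, not the author's formalism: "V" stands in for the unknown vowel of 'dirt', the morpheme boundary in *pət-ay is dropped, and each rule is a plain regex substitution applied once. It shows why the table needs two separate syncope stages.

```python
import re

# Hypothetical encoding of the stages in the table above.
# "V" = unknown vowel of 'dirt'; morpheme boundaries are omitted.
EARLY_SYNCOPE = ("early syncope", r"^pət", "pt")
S_INSERTION = ("s-insertion", r"^pt", "pst")
LATE_SYNCOPE = ("late syncope", r"^pVt", "pt")


def derive(form, rules):
    """Apply each (name, pattern, replacement) rule once, in the given order."""
    for _name, pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
    return form


TABLE_ORDER = [EARLY_SYNCOPE, S_INSERTION, LATE_SYNCOPE]
# Alternative: both syncopes happen before s-insertion.
SYNCOPES_FIRST = [EARLY_SYNCOPE, LATE_SYNCOPE, S_INSERTION]

for label, rules in [("table order", TABLE_ORDER), ("syncopes first", SYNCOPES_FIRST)]:
    print(label, derive("pətay", rules), derive("pVtay", rules))
```

With the table's ordering, 'time' ends as pstay and 'dirt' as ptay; if both syncopes preceded s-insertion, 'dirt' would wrongly come out as pstay too.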

I did not specify a syncopated vowel in 'dirt' because I have no evidence for its quality. One could hypothesize that *pət- became pst- whereas *pVt- (in which *V was a vowel other than *ə) became pt-, but I don't know why *ə would be more s-friendly (sigmaphilic?) than, say, *ʌ.

SICILIAN GEMINATION

What is the origin of initial geminates in Sicilian which may or may not be written? Has gemination from syntactic doubling been carried over into isolated forms: e.g.,

è bonu [ebˈboːnu] 'is good' (?) > bonu [bboːnu]?

(And is [ebb] in turn from *es b ...?)

In any case, I assume the phenomenon is a Sicilian innovation. Although nomu [nnomu] 'name' originally might have had a consonant cluster in Proto-Indo-European (*ʕʷnomn*), that cluster was gone in Latin, so no parallel with Korean 'tense' consonants from earlier clusters (e.g., pstay > ttae 'time') can be drawn.

Is gemination in words like mmàggini 'image' from earlier *VC-sequences (*imàggini > mmàggini), or does it postdate apheresis (*imàggini > *màggini > mmàggini)?

*I assume Greek o- in onoma 'name' is from PIE *ʕʷ- rather than a prothetic vowel as proposed by Cowgill and Beekes (1969). What would be the motivation for prothesis? Greek does not have a constraint against initial n-. (Does any language have such a constraint before o? In Korean, n- was lost before i and y: e.g., 李 Ri > Ni > I 'the surname Lee'.)

Another possible initial PIE cluster is *ʔn- which would normally become Greek en-. Greek o- could be due to *e- assimilating to a following *o.

SOLVING FOR X IN MALTESE

Why does x equal [ʃ] in Maltese? That usage surprises me since it is not in English, Italian, or Sicilian (whose alphabet does not include the letter x). Is it

- a remnant of a convention that once existed on the Italian peninsula (and perhaps still exists there, albeit in a language other than Italian or Sicilian)?

- influenced by the orthography of some European language which Maltese is not in direct contact with: e.g., Portuguese and Catalan?

- a Maltese-internal innovation possibly motivated by a one-sound-per-symbol principle (though the digraph from my last post is not compatible with that principle).

I was also going to ask why Maltese has j for [j] unlike English or Italian, but then I learned that Sicilian has the same usage.

MYSTIFIED BY MALTESE VOWEL BENDING

I was surprised to see this in the Wikipedia article on Maltese:

/ɐɪ ɛɪ/ represented by għi, and /ɐʊ ɛʊ ɪʊ ɔɪ ɔʊ/ written għu.

I only had a vague memory of għi standing for /ɛɪ/ and għu standing for /ɔʊ/. I was particularly surprised by the equation of għu with /ɔɪ/. So I went to the source (Borg and Azzopardi-Alexander 1997: 299) and found that the passage should be rewritten as follows:

/ɐɪ ɛɪ/ represented by għi, and /ɐʊ ɔʊ/ written għu.

/ɛʊ ɪʊ ɔɪ/ are written as ew, iw, and oj.

I have long assumed that the spellings with għ are historical and point to a time when there was an 'emphatic' consonant (a voiced uvular fricative corresponding to Arabic gh?) that conditioned the bending of the following vowel before disappearing. Cf. how 'emphasis' (pharyngealization) conditioned the bending of *i and *u in Old Chinese before disappearing:

*Cˁi > *Cˁei > *Cei

*Cˁu > *Cˁou > *Cau

I wish I could confirm my guess by consulting a work on Maltese historical phonology.

I also wish I knew why għi and għu each have two readings. Those readings can't be allophones because they are enclosed in slashes: i.e., they are phonemic. Do the spellings reflect a period before a phonemic split conditioned by a factor that has now been lost?

WHAT IS THE ORIGIN OF UVULARIZATION IN QIANG? (PART 2)

In part 1, I asked,

Did MLQ [Mawo and Luhua Qiang] merge its equivalents of 'Grade I' and 'Grade II': i.e., is MLQ QVʶ from *QV with a plain vowel and *QVʶ with a uvularized vowel?

I suspect that uvularization is secondary in at least some MLQ words with uvular initials: e.g., Mawo and Luhua Qiang qaʶ 'I' whose external cognates lack any trace of a medial *-r-. Other possible examples are 'afraid/fear', 'fish', 'Chinese', and 'chisel' below.

Luhua Qiang χuʶ 'tiger' looks like a loan from modern Mandarin 虎 hu 'tiger'. Could secondary uvularization be very recent in MLQ?

On the other hand, Luhua Qiang qʰaʶ 'bitter' corresponds to Tangut Grade II

4046 1khi2 'bitter'

Normally I reconstruct medial *-r- as the source of Grade II in Tangut, so it is initially tempting to regard uvularization in the Luhua form as having been conditioned by a lost *-r-. However, once again there is no external evidence for a medial *-r- (e.g., the cognate Tibetan root is kha, not *khra), so I think some Tangut velar-initial Grade II syllables originally had uvular initials with secondary uvularized vowels:

*qʰa > *qʰVʶ > 1khi2 [kʰiʶ]

(I do not specify the vowel in the intermediate stage since I don't know if raising in Tangut preceded or followed uvularization.)

Possible Tangut cognates of other uvularized MLQ words in Evans et al. (2015) are not Grade II: i.e., they lacked medial *-r-. (Syllable-final numbers indicate Tangut grades: e.g., 1khi2 above is Grade II.)

LFW number
Mawo Qiang
Luhua Qiang

younger brother

təʶ 'brother of a man'
təʶ 'brother of a man'

baʶ 'old (of objects)'
to fear

quʶ 'afraid'
quʶ 'fear'


diʶ 'thigh'
diʶ 'thigh'

suʶ 'hemp'
suʶ 'hemp'
to know

niʶ niʶ




six-year-old sheep

nuʶ 'ram'
nuʶ-tə 'ram'

*zar or *Rza

Some of those words may be unrelated: e.g., the Tangut word for 'chisel' is probably a loan from Middle Chinese 鑿 *dzak. The sound correspondences between MLQ and Tangut are not yet known, so I am not able to easily distinguish between true cognates and mere lookalikes.

Could a uvular affix absent in Tangut have conditioned MLQ uvularization?

5.26.2:34: Mawo Qiang nuʶ and Luhua nuʶ-tə 'ram' might be from MLQ nu 'sheep' plus a uvular affix: e.g., *ʁ-nu. However, Evans (2001: 298) listed the Mawo word for 'sheep' as ȵu with a palatal initial instead of n-. I wish I had Qiangyu jianzhi on hand to check the word.

5.26.2:39: Luhua Qiang suʶ 'ten' has no Tangut cognate, but it does resemble Pyu <sū> (Krech 2012's <sav>) ~ <sau> 'ten'. Pyu had initial <sr> in native words (e.g., <srūḥ> 'relative'), so the simple <s> of 'ten' cannot be from *sr- unless there was a chain shift: *Xr- > *sr- > s-.

(Pyu has <h> in <hoḥ> 'three' corresponding to s- elsewhere: e.g., Tangut 1soq1 'three'. That may imply Pyu <s> was once something else that filled the gap left by original *s- when it lenited to *h-.)

WHAT IS THE ORIGIN OF UVULARIZATION IN QIANG? (PART 1)

I forgot to make one point in my last entry. It seems that a lot of Chinese historical phonological studies are conducted in a vacuum without much reference to other Sino-Tibetan languages, let alone general phonological typology. Even Sinoxenic (Sino-Vietnamese, Sino-Korean, Sino-Japanese: i.e., systematic borrowings of Chinese) and transcriptive data are not getting as much attention as I think they deserve. A better (I dare not say 'true' or 'correct') reconstruction of the history of Chinese should take into account the bigger picture.

One of the reasons I like Norman's pharyngeal theory for Old Chinese (OC) is that it makes sense both areally and typologically; it makes OC like its 'Altaic' neighbors (see Norman's 1994 article for details) and it allows nongenetic parallels to be drawn between OC and Semitic. It was a chance look at Maltese that convinced me Norman was right; pharyngeals conditioned vowel lowering in both languages: e.g.,

Imġarri Maltese [anté͜ik] < *antˁk 'ancient' (loan from Italian; Camilleri & Vanhove 1994: 104)

MC *tek < *tejk < OC 弔 *tˁi 'arrive' (but *tˁekʷ is also possible; is there any rhyming evidence pointing to one or the other vowel? Baxter and Sagart 2014 regard the vowel as ambiguous. My guess is that the word was originally *tekʷ with pharyngealization developing before the lower series vowel *e. If the word was *tˁikʷ, its pharyngealization would reflect a lost presyllable with a lower vowel: *Cʌ-tikʷ > Cˁʌ-tˁikʷ > *tˁikʷ.)

The vowel changes in OC are also similar to those in Khmer, though the conditioning factor in Khmer was voicing rather than pharyngealization: e.g.,

Khmer [əj] < *iː after voiceless consonants

Late OC *ej < *i after pharyngealized consonants

Last night I proposed that uvularization was conditioned by pharyngealized (= 'emphatic') initials followed by uvular allophones of */r/ in OC and Tangut. Does uvularization in Qiang have a similar origin?

The distribution of uvularized vowels in Mawo and Luhua Qiang (MLQ) as described in Evans et al. (2015) suggests that exact parallels cannot be drawn between Qiang on the one hand and OC and Tangut on the other:

1. Chinese and Tangut contrasted plain and uvularized vowels ('Grade I' and 'Grade II') after reflexes of *uvulars, whereas only uvularized vowels can occur after uvulars in MLQ (Evans et al. 2015: 24). (5.25.1:50: Did MLQ merge its equivalents of 'Grade I' and 'Grade II': i.e., is MLQ QVʁ from *QV with a plain vowel and *QVʶ with a uvularized vowel?)

2. If I understand Evans et al. (2015) correctly, MLQ permits both plain and uvularized vowels to occur in uvular QC-clusters. In theory, both QRV and QRVʁ might exist in MLQ, though I cannot find any examples in Evans et al. (2015). On the other hand, Chinese and Tangut only had uvularized vowels after reflexes of *QR-clusters.

3. Some uvularization in MLQ is due to right-to-left spreading: e.g.,

Luhua kʰɹa 'eight' + suʶ 'ten' = kʰɹaʶ-suʶ 'eighty' (Evans et al. 2015: 29)

(5.25.1:51: I think uvularization in vowels after velars is exclusively secondary in MLQ: i.e., there are no isolated monosyllabic roots combining velar initials with uvularized vowels.)

This phenomenon has no known parallel in Chinese or Tangut. (I reconstruct left-to-right emphatic spreading in those languages.)

Although Tangut is more closely related to Qiang than to Chinese, Tangut areally aligns with Chinese at least as far as uvularization is concerned if my interpretation of Grade II is correct for both languages.

WAS MIDDLE CHINESE (AND TANGUT) GRADE II UVULARIZED?

Last week I finally started an entry that was more than just a link to another scholar's work. However, I ran into Internet problems and put off writing nearly all of the entry until tonight.

On the 14th I discovered Evans et al.'s "Uvular approximation as an articulatory vowel feature". Although the paper only discusses that feature in the Mawo and Luhua dialects of Northwestern Qiang, I wonder if that feature characterized Grade II in Middle Chinese (MC).

Old Chinese (OC) syllables with 'emphasis' (pharyngealization) became Grade II syllables in MC if they had a medial *-r-. Otherwise they became Grade I syllables:



In my reconstruction of OC, uvulars were only in 'emphatic' syllables. Medial *-r- had a uvular allophone *[ʀ] after 'emphatic' initials. This *[ʀ] weakened to a fricative *[ʁ] and uvularized the following vowel before disappearing in late OC:

OC *CˁʀV > *CˁʁV > *CˁʁVʶ > *CVʶ

Tangut Grade II may have had a similar origin: e.g.,

*pʰroH > *pʰˁʀoH > *pʰˁʁoH > *pʰˁʁoʶH > 0080 2pho2 [pʰoʶ²] 'snake'

There was no way to indicate uvularization in the Tibetan script, so Tangut Grades I and II were not distinguished in Tibetan transcription.

MC Grade II vowels were borrowed as nonuvularized vowels in Vietnamese, Korean, and Japanese, all of which lack uvularization:

MC | Vietnamese | Korean | Japanese
*ka (Grade I) 'song' | ca [kaː] | ka | ka

*k(ɰ/j)aʶ (Grade II) 'to add' | gia [zaː] < *kjaː | ka | ka

Grade II developed a glide after velars in the MC dialect underlying Sino-Vietnamese:

OC *KˁʀV > *KˁʁV > *KˁʁVʶ > *KɰVʶ > *KjVʶ
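The ordered changes above can be imitated with a small rule applier. This is a minimal Python sketch, assuming each step can be written as a regular-expression substitution over the post's notation. The rule list, the function name derive, and the toy inputs kˁra and pˁra are illustrative assumptions, not forms from a published reconstruction. kˁra stands for a velar-initial 'emphatic' syllable with medial *-r-, shaped like 家 MC *kɰaʶ; pˁra stands for a labial-initial one. The velar rule has to come before the general loss of *ˁʁ. Otherwise every Grade II syllable would end up as *CVʶ with no glide.

```python
import re

# Ordered rewrite rules (pattern, replacement) sketching the Grade II chain.
# Modeling each step as a regex substitution is an illustrative assumption.
GRADE_II_RULES = [
    (r"ˁr", "ˁʀ"),              # medial *-r- > uvular [ʀ] after an 'emphatic' initial
    (r"ʀ", "ʁ"),                # *ʀ weakens to the fricative *ʁ
    (r"ʁ([aeiouə])", r"ʁ\1ʶ"),  # *ʁ uvularizes the following vowel
    (r"([kg])ˁʁ", r"\1ɰ"),      # after velars: *Kˁʁ > *Kɰ (must precede the next rule)
    (r"ˁʁ", ""),                # elsewhere: *Cˁʁ > *C
]

# Later change in the MC dialect underlying Sino-Vietnamese.
SINO_VIETNAMESE_RULES = [(r"ɰ", "j")]


def derive(form, rules):
    """Apply the rules in order and return each distinct stage, oldest first."""
    stages = [form]
    for pattern, replacement in rules:
        new_form = re.sub(pattern, replacement, form)
        if new_form != form:
            stages.append(new_form)
            form = new_form
    return stages


if __name__ == "__main__":
    oc = derive("kˁra", GRADE_II_RULES)
    sv = derive(oc[-1], SINO_VIETNAMESE_RULES)
    print(" > ".join(oc + sv[1:]))
    print(" > ".join(derive("pˁra", GRADE_II_RULES)))
```

Run as a script, it prints kˁra > kˁʀa > kˁʁa > kˁʁaʶ > kɰaʶ > kjaʶ for the velar case and pˁra > pˁʀa > pˁʁa > pˁʁaʶ > paʶ for the labial case, which matches the *CVʶ outcome above.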

Sino-Korean is based on an eighth-century northeastern dialect in which that glide had not yet fronted to *-j-. Velar *-ɰ- has not left a trace in Sino-Korean. The -y- in a few Sino-Korean borrowings of Grade II syllables is due to the Korean-internal breaking of *e and does not reflect later NE MC *-j-: e.g.,

界 MC *kɰèʶj > Old Korean *kéy > Middle Korean *〮곙 *kyŏ́y > modern Korean 계 kye [ke]

(5.24.1:42: The MC 'departing' tone that I indicate with a grave accent corresponds to the Middle Korean high tone that I indicate with an acute accent. I have projected the high tone back into Old Korean, but it is possible that the OK source of the high tone had a different contour. In any case, the contours of the OK and the northeastern MC tones were probably similar.

The earliest attested MK reading for 界 is a prescriptive reading 〮갱 káy that is not ancestral to modern Korean 계 kye. The prescriptive reading is from 界 MC *kɰàʶj. I reconstructed MK *〮곙 *kyŏ́y to account for the modern form.)

Sino-Khitan is based on a later stage of that northeastern dialect in which *-ɰ- had fronted to *-j-: e.g.,

家 MC *kɰaʶ > Liao *kja(ʶ) > Khitan small script  <g.ia>

Uvularization may have been lost in Liao Chinese after plain *a raised to *o, leaving a gap to be filled by uvularized *aʶ:

*aʶ > *a > *o

If uvularization persisted in later stages, it must have been subphonemic. It has not been observed in any living Chinese language.

A WEB OF TANGUT CATALOGUES

Andrew West wrote,

Dozens of Tangut Buddhist manuscripts held at the Institute of Oriental Manuscripts (IOM) in Saint Petersburg have been digitized and made available online at the IDP [International Dunhuang Project] website, but there is no accompanying description or bibliographic information for any of them, and not even the title of the text is given. This makes it difficult for the handful of scholars in the world who can read Tangut to usefully browse the Tangut collection on IDP, and next to impossible for everybody else.

I am one of those scholars. Fortunately, Andrew has come to the rescue with his Web of Tangut Catalogues. Thank you, Andrew!

If only Khitan and Jurchen had as many texts as Tangut!

BIBLIOGRAPHY OF RGYALRONG STUDIES

Here's another list of recommended reading while I'm away. It was compiled by Guillaume Jacques, whose works taught me almost everything I know about rGyalrong (which is to say very little; mea culpa for not reading enough).

I would add these 1979 Linguistics of the Tibeto-Burman Area articles by Nagano Yasuhiko, who edited the online rGyalrongic languages database along with Marielle Prins:

A historical study of rGyarong initials and prefixes. 4.2: 44-68.

A historical study of rGyarong rhymes. 5.1: 37-47.

I haven't seen those articles since 2008. It would be interesting to compare his views of rGyalrong phonological history with Guillaume's.

GUILLAUME JACQUES' BLOGS - AND A BABELSTONE BONUS

I haven't written anything here for almost two weeks now. I may not blog much for the next several weeks.

If you are waiting for me to return, I recommend Guillaume Jacques' posts at these three sites:

Panchronica (in French)

Diversity Linguistics Comment (in English)

HimalCo (in English)

Oh, and Andrew West has a new post that I didn't see until just now! I see he used my simplified transcription of Tangut. Nice!

One reason I've been away is that I got a new laptop and haven't gotten around to setting up its connection with my server yet. Here goes ... if you can see this, I succeeded.

DISTRIBUTIONAL DICTIONARIES OF CHARACTERS

Traditional East Asian dictionaries do not explicitly state whether characters can only occur in combinations or not. At first glance, one might get the idea that both 麒 and 麟 are Chinese words, but in fact the first only occurs in the disyllabic word 麒麟 'qilin'*, whereas the second can be found as an independent word in Classical Chinese** and as a part of other words. A 'distributional dictionary' could make a three-way distinction between

- superbound (appearing solely as part of a single polysyllabic word): e.g., 麒

- bound (appearing as part of two or more polysyllabic words): e.g., 麟 in modern Mandarin

- free (able to appear as an independent word): e.g., 麟 in Classical Chinese

Even finer distinctions may be possible, but that's a start.
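The three-way scheme above could in principle be computed from a word list. This is a minimal Python sketch. It assumes a corpus is available as a list of attested words, with one-character entries counting as free uses. The function classify and the toy corpora are hypothetical, and a real distributional dictionary would need far more data than a handful of words.

```python
from collections import defaultdict


def classify(words):
    """Label each character as 'free', 'bound', or 'superbound'.

    `words` is a hypothetical list of attested words, each a string of
    characters. A one-character word counts as a free occurrence.
    """
    free = set()
    hosts = defaultdict(set)  # character -> polysyllabic words containing it
    for word in words:
        if len(word) == 1:
            free.add(word)
        else:
            for char in word:
                hosts[char].add(word)
    labels = {}
    for char in free | set(hosts):
        if char in free:
            labels[char] = "free"          # can stand alone as a word
        elif len(hosts[char]) == 1:
            labels[char] = "superbound"    # only in a single polysyllabic word
        else:
            labels[char] = "bound"         # in two or more polysyllabic words
    return labels
```

With the toy 'Classical' list ["麒麟", "麟"], 麒 comes out superbound and 麟 free. With ["麒麟", "鳳毛麟角"], 麟 becomes bound. The second list also labels 鳳, 毛, and 角 superbound, which only shows how heavily such labels depend on corpus size.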

Such distinctions could be carried over into a Tangut character dictionary since Tangut, like Chinese, has a large number of monosyllabic morphemes. However, the scheme might have to be altered somewhat for Khitan and Jurchen which have a large number of polysyllabic morphemes. Nonetheless, I still think it is important to know that, for example, as far as I know, Jurchen


may be superbound, as it only appears in

<> 'the name Jahudai'

whereas its homophone


has a far wider distribution: it can represent dai 'girdle' (< Chinese 帶) and the syllable dai in many words other than the name Jahudai. The two characters do not appear to be interchangeable. And even if they were interchangeable, it would be nice to know when that was the case: e.g., from the start or only from the Ming Dynasty onward.

Once we determine that two or more homophonous characters were not interchangeable, then we can try to determine why. In some cases the homophony may not turn out to be original: i.e., the two characters originally had different readings that merged over time, and the original functions of the characters blurred. Since <dai2> resembles Jin Chinese 大 *dai, I think it had always been read dai, whereas <dai1> may have originally stood for a rarer Jurchen syllable that later became dai.

*I am not counting the use of 麒 in definitions such as


'The male qilin is called the qi; the female is called the lin'

from the Book of Han. This explanation for the disyllabic word qilin is a folk etymology.

**In modern Mandarin, 麟 only occurs in morpheme combinations. I would be surprised if 麟 is a monosyllabic word in any modern Chinese language. It is possible that very early attestations of 麟 as an independent word were pronounced *grin, a contraction of 麒麟 *gərin.

THE LATE GREAT CHU

Today I downloaded the latest version of Andrew West's BabelStone Han PUA font containing 194 楚 Chu script transcription characters.

In 1127, 1350 years after the fall of the original Chu and less than a decade after the creation of the Jurchen (large) script, the Jurchen Empire established 大楚 Great Chu as a buffer between them and the Southern Song. This puppet state only lasted a month.

How would the Jurchen have written 大楚 *Dai Cu 'Great Chu' in their then-new script?

There were two different types of Jurchen graphs for dai.

Jin and Jin (1984: 81, 136) only list a single word-final example for one type:

<> 'the name Jahudai'

The other type was much more common and used to transcribe Jin Chinese 大 *dai 'great' as well as representing the syllable dai in the native Jurchen names

<> and <> (Jin and Jin 1984: 5)

What was the original reasoning behind having two graphs for the same syllable? Were they originally nonhomophonous? My guess is that the common <dai2> was read as dai from the start, whereas the rare <dai1> was originally for some other syllable that merged with dai: e.g., *daai.

Was there also a lost phonetic distinction between the two kinds of <cu>? Both could be used to write native words, and both even appeared side by side in

<cu1.cu2.wa.hai> 'according to'?

But only <cu2> appeared in Chinese transcriptions, so I conclude that *Dai Cu would have been written as


4.12.2:42: <cu2> could represent the monosyllabic auxiliary verb cu- 'to be able' (Jin and Jin 1984: 81, 259). Perhaps <cu2> was originally a logograph for that verb, whereas <cu1> may have been a phonogram from the beginning.

<dai2> resembles Chinese 大 *dai 'great' and could have initially been intended to write that word (and homophonous Chinese loanwords?), unlike <dai1> which might have been reserved for dai in native words.

PROTO-SINO-TIBETAN-AUSTRONESIAN *PONUQ 'BRAIN'?

Old Chinese (OC) 腦 *nuʔ 'brain' was a type A syllable* with vowel lowering. According to my theory, *u partly lowered to harmonize with a low unstressed vowel in a lost presyllable:

*Cʌ-nuʔ > *Cʌ-nouʔ > *nouʔ > *nauʔ > Mandarin nao

However, Laurent Sagart (2002: 5) regarded 腦 *nˁuʔ 'brain' as cognate to Proto-Austronesian (PAN) *punuq with a high first vowel *u. If OC had a high presyllabic vowel in 'brain', it would have matched the high main vowel, and there would have been no lowering:

*pu-nuʔ > *nuʔ > *ɲuʔ > Mandarin *rou

Can both Laurent and I be right? PAN had only four vowels (*a *e [= *ə] *i *u), whereas OC had six (*a *e *ə *i *o *u). Laurent (2002: 8) reconstructed seven vowels in Proto-Sino-Tibetan-Austronesian (PSTAN) to account for the following correspondences in main vowels:

PSTAN Environment OC PAN
*u before labials *u
elsewhere *u
*o before labials *a
elsewhere *o
*a before *y *i *a
elsewhere *a
(everywhere) *e
*e after grave consonants *e
elsewhere *i
*i in open syllables *i
in closed syllables *i

I only reconstruct two vowels in OC presyllables: high and low *ʌ**. I have long thought each resulted from the merger of various unstressed vowels. Let's suppose that those earlier vowels were identical to the seven vowels in PSTAN final syllables:

*i *i
*u *u
*o *u

Above I assume that PAN first vowels developed more or less like second vowels. A study of OC syllable types and PAN first vowels may reveal a different course of development.

My OC could be from PSTAN *o which raised to *u in PAN:

*ponuq > OC *pʌ-nuʔ and PAN *punuq 'brain'

4.11.1:10: If OC and PAN are not related, the word could be a borrowing from one into the other when the source language had *o as the first vowel.

4.11.1:35: Of course OC is not the only Sino-Tibetan language. STEDT lists nu-words for 'brain' in other languages. The Proto-Sino-Tibetan form may have ended in a *-q that

- was retained in Proto-rGyalrongic

- became *-k in some languages: e.g., Written Burmese ūḥnok

- became a glottal stop in OC

- was lost in Tangut

0118 and 0127 2no1 < *noH 'brain'

Was the mid vowel in some of these forms lowered before *-q? Jacques' (2004: 266) Proto-rGyalrongic reconstruction does not have *-uq. Maybe there was a chain shift: *-uq > *-oq > *-ɔq.
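In a chain shift like *-uq > *-oq > *-ɔq, the order of the steps is what keeps the finals apart. *-oq has to move to *-ɔq before *-uq > *-oq can feed it, or the two finals merge. This is a minimal Python sketch of that ordering. The finals nuq and noq are hypothetical placeholders, not attested or reconstructed forms.

```python
def apply_rules(forms, rules):
    """Apply suffix rewrites (old -> new) to each form, in the order given."""
    results = []
    for form in forms:
        for old, new in rules:
            if form.endswith(old):
                form = form[: -len(old)] + new
        results.append(form)
    return results


# Hypothetical finals; not attested forms.
FORMS = ["nuq", "noq"]

# Chain-shift order: *-oq > *-ɔq applies first, so the new *-oq from *-uq
# is not pushed further and the two finals stay distinct.
CHAIN_ORDER = [("oq", "ɔq"), ("uq", "oq")]

# The same two rules in feeding order: *-uq > *-oq feeds *-oq > *-ɔq,
# and the contrast is lost.
FEEDING_ORDER = [("uq", "oq"), ("oq", "ɔq")]

if __name__ == "__main__":
    print(apply_rules(FORMS, CHAIN_ORDER))    # ['noq', 'nɔq']
    print(apply_rules(FORMS, FEEDING_ORDER))  # ['nɔq', 'nɔq']
```

The chain order leaves a surface *-oq that comes only from earlier *-uq, which is why a reconstruction without *-uq could still be compatible with an original *-uq.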

4.11.2:17: I am agnostic about PSTAN. Currently I think Austronesian is more likely to be related to Kra-Dai than to Sino-Tibetan.

If the correspondences above are valid, they do not entail a genetic relationship. They may tell us about patterns of borrowing.

Conversely, if the correspondences are due to common ancestry, exceptional forms may have been borrowed after a split (cf. how the loanword paternal has p instead of the regular Germanic f from Proto-Indo-European p).

*4.11.1:56: Type A syllables were characterized by secondary pharyngealization (a.k.a. 'emphasis') at some point. I do not know of any other Sino-Tibetan language with pharyngealization. I suspect that pharyngealization was a Chinese innovation which may have been due to contact with a substratum or neighboring language. I have omitted pharyngealization in this discussion to focus on the vowels.

**4.11.2:15: I got the symbols and from my phonetic notation for Middle Korean which had a two-class height harmony system like my Old Chinese reconstruction. I chose them because they are visually distinct from the letters for my six vowels. Their actual phonetic values may have overlapped with two of the vowels: e.g., they could have been and *a. It is easier to type than a phrase like "unaccented presyllabic higher vowel" or *ə̆ with a breve.

DO AUSTRONESIAN AND SINO-TIBETAN SHARE A WORD FOR SETARIA ITALICA?

Today I saw Laurent Sagart's "Austronesian and Sino-Tibetan words for Setaria italica and Panicum miliaceum: any connection?" (2014) and was surprised to see him mention Khitan in a paper about prehistory (emphasis mine):

There is a complication with the semantics of this comparison: certain modern authors (Li 1983:29; Hu 1984; Chai et al. 1999:9) claim jì 稷 did not mean 'Setaria italica' in early Chinese but 'Panicum miliaceum'. This view, widespread among Chinese agronomists, is based on statements by various Chinese authors from c. 1000 CE down to modern times, to the effect that jì 稷 is the same plant as 穄 *[ts][a][t]-s > tsjejH > jì ‘Panicum miliaceum’. Thus Chai et al. (1999:9) observe that in the three provinces of Shandong, Henan and Hebei, (glutinous) Panicum miliaceum varieties are today usually referred to as jì 稷.

However, this is a confusion arising from the phonetic convergence of these two words after Middle Chinese (a standard reading pronunciation from the sixth century CE, known to us through the dictionary Qie Yun 切韻, prefaced in 601 CE, and its later editions). In Modern Standard Chinese, Middle Chinese (MC) 稷 tsik and 穄 tsjejH have both evolved, quite regularly, to jì [ʨi 51]. The merger had already occurred in northern Chinese during the Khitan or Liao dynasty, which occupied parts of north China, including Hebei, from 916 to 1125 CE. Phonetic transcriptions in Khitan small script of the 11th and 12th century Chinese show that while MC final -k was still represented by a glottal stop in poetry, it had disappeared in everyday speech (Kane 2009:252sq.). Thus in everyday Chinese of the Khitan period, 'Setaria italica', MC tsik, was probably [tsi]. At the same time, the character 祭, a MC homophone of 'Panicum miliaceum' on Middle Chinese (both MC tsjejH), and the phonetic element in 'panicum', was also [tsi] (Shen 2014:318). It is significant that there are no statements equating 稷 tsik and 穄 tsjejH from time periods preceding the phonetic merger of the two forms [i.e., from before c. 1000 CE]. Thus we can be satisfied that 稷 tsik and 穄 tsjejH were distinct cereals in early Chinese times, and that (since there is no question that jì 穄 meant ‘panicum’) jì 稷 tsik must be the name of Setaria italica.

I would like to add that Kane's argument is based on Chinese-internal data: the poetry in question is in Chinese, and the loss of the final glottal stop is implied by a passage in 沈括 Shen Gua's 夢溪筆談 Mengqi bitan 'Dream Pool Essays' (1088; Kane's translation):

Even now the Heshuo [= Hebei; i.e., north of the Yellow River] people pronounce 肉 [*zhiwʔ] as 揉 [*zhiw], and 贖 [*shu] as 樹 [*shu].

In the Khitan small script,

[g]enerally speaking there is no consistency in the use of the graphs used to transcribe syllables which ended in stops in MC and probably a glottal stop in Song Chinese. This does not prove that Liao Chinese did not have a glottal stop in such words, just that the Kitan [= Khitan] transcription does not indicate it. (Kane 2009: 254)

For instance, the Khitan small script character

339 <i>

was used to transcribe syllables whose MC readings ended in -i and -it (both corresponding to Song *-iʔ). The one instance of a word whose MC reading ended in -ik like 稷 tsik 'Setaria italica' was written as

087 <tz>

which also transcribed the open syllables 知 *ji (MC trje) and 旨 *ji (MC tsyijX).

The Sino-Tibetan forms for Setaria italica look like a good match for Proto-Austronesian *beCeŋ (*e = [ə]) with the exception of the coda:

Probable Tibeto-Burman cognates of the Chinese word [稷 Old Chinese *[ts]ək] are Trung tɕjaʔ55 ‘millet’, Lhokpu cək ‘Setaria italica’ (van Driem, p.c. to LS, June 24, 2004; not phonologized): if the shape and semantics of this last form are confirmed, the Proto-Sino-Tibetan word for 'Setaria italica' might sound something like #tsək (pre-reconstruction).

Both Proto-Sino-Tibetan (PST) and Proto-Austronesian (PAN) had *-k and *-ŋ. I would expect the following correspondences which are in Sagart (2002: 7):

OC (and probably also PST) *-k : PAN *-k

OC (and probably also PST) *-ŋ : PAN *-ŋ

Yet Sagart also found examples of the correspondence

OC (and probably also PST) *-k : PAN *-ŋ

which has Sino-Tibetan-internal parallels: e.g.,

Tangut 1siw4 < *sik, Written Burmese sac < *sik : OC 新 *sin < *siŋ? 'new'

I presume there is morphological variation within Sino-Tibetan. But if the Sino-Tibetan and PAN forms for Setaria italica are related, how can the different codas be explained? Are they different reductions of *-ŋk, a cluster lost in ST and PAN?

Genetic scenario:

Proto-ST-AN *-ŋk > PST *-k but PAN *-ŋ

Nongenetic scenario (i.e., borrowing):

pre-PAN *-ŋk > borrowed as *-k in PST but became *-ŋ in PAN

4.10.4:40: The first vowel of PAN *beCeŋ (*e = [ə]) is consistent with my theory that presyllables with higher vowels (*i, *ə, *u) conditioned type B syllables in Old Chinese such as 稷 *[ts]ək.

Sagart (2002: 8) found the following correspondences between OC syllable types and PAN segments:

OC type A : PAN penultimate syllable initial voiceless stop (except *q-) or zero (i.e., no penultimate syllable)

OC type B : other PAN penultimate syllable initials including *q-

If PAN preserved Proto-ST-AN penultimate syllable initials, I do not understand why bare syllables and syllables preceded by voiceless stops developed type A with pharyngealization. And why would *q- block pharyngealization, which was the default (!) development? (Normally pharyngealization is marked: i.e., nondefault.)

PSTAN *(tV)CV > OC *CˁV (type A)

PSTAN *qVCV, *sVCV, *nVCV > OC *CV (type B)

In Semitic terms, type A is 'emphatic', and Semitic q is an 'emphatic' consonant, so I would expect it to be associated with type A.


Last night I found these sections of the History of the Liao Dynasty translated in Wittfogel and Fêng (1949: 261):


On the day mou-ch'ên [of the eleventh month in the thirteenth year of T'ung-ho [= 995 AD*] ] Korea sent ten boys to study the [Ch'i-tan [= Khitan] ] national language.


On the day kêng-ch'ên [sic for kêng-hsü] [of the third month in the fourteenth year of T'ung-ho [= 996 AD] ] Korea again sent ten boys to study the [Ch'i-tan [= Khitan] ] national language.


[On the day chia-shên of the twelfth month in the first year of K'ai-t'ai [= 1012 AD] ], Kuei Prefecture reported that its inhabitants, who had originally been moved from Silla [= Korea], were illiterate, and that schools should be set up to educate them. This request was approved by imperial decree.

I wondered which Khitan script(s) those Koreans learned: the large script, the small script, or both.

David Boxenhorn suggested that those Koreans might have tried to write their own language in the small script. That would have been easy to do, since Korean a thousand years ago

- had *CV(C) syllables like Khitan without the consonant clusters of a few centuries later (and even such clusters could have been written with sequences of small script consonant symbols)

- had roughly the same consonants as Khitan minus the uvulars

- shared most of its vowels with Khitan (*i, *e [> later Korean yŏ], *ə, *a, *u, *o)

Only the apparent absence of the vowels and (> later Korean a/ŭ) in Khitan might be a problem. Existing CV, V, and VC characters could do double duty for those vowels: e.g.,

273 <un>

could represent both Korean *ɯn and *un. That would parallel the current use of the Roman letter u to transcribe both Korean [ɯ] and [u]: e.g., Kim Jong-un is [kimdʑəŋɯn].

Also, dots could be added to indicate non-Khitan uses of characters, just as the Khitan added a dot to  <pu> to write the Chinese syllable <fu>:


241 <pu> > 261 <fu>

4.9.3:13: David's scenario makes me wonder if the Jurchen used the small script to write their language.

When I saw this passage in Wittfogel and Fêng (1949: 253),

In 1150 a distinguished Jurchen statesman is said to have written a confidential political letter to his son in the small Ch'i-tan script; this interesting document, translated into vernacular Chinese, is preserved in the Chin Shih [= History of the Jin Dynasty] (CS [= Chin shih 76, 2a ff.; 84, 3a ff.).

I wondered if the statesman wrote in Khitan or in Jurchen using the Khitan small script. Wittfogel and Fêng raised the possibility of the latter:

Many Chin records describe the continued use of the Ch'i-tan script during the early and middle years of the Chin dynasty. Unfortunately, they do not make it clear whether this also involved the use of the Ch'i-tan language. There must have been a number of Jurchen who spoke Ch'i-tan, but the question still arises whether such knowledge was necessary to the use of the Ch'i-tan script. In the formative period of their power the Mongols wrote their documents in the Mongol language but in the alphabetic Uighur script (Browne 28 II, 441; cf. Barthold 28, 41). The Manchus until the year 1599 wrote their documents in Mongol and used the Mongol script (KHTSL 3, 2a-b). The Jurchen may have availed themselves of either method exclusively, or of both at different periods of time, first adopting an alien language and script and later using the alien script for transcribing their own language. In the latter case the smaller script would seem particularly appropriate, for as an alphabetic system of writing it could easily be adjusted to the needs of another language, especially if this language belonged to the same Altaic complex [as Korean does!]

*4.9.2:49: Although I suspect the eleventh month of Tonghe (= T'ung-ho) 13 falls at the start of 996 AD, Wittfogel and Fêng referred to 995 AD in their footnote (emphasis mine):

This record is confirmed by the Korean official history which relates that in 995 the Korean government sent ten boys to Liao [the Khitan Empire] to study the Ch'i-tan language (KRS [= Koryŏ-sa 'History of Koryo'] 3, 46). However, this effort seems to have produced very poor results. In 1010, when the Liao vanguard general sent a document written in Ch'i-tan to the Korean court, no one could read it (KRS 94, 86).

Did any of the ten boys return as men to serve the court, and if so, were any of them still at the court in 1010?

4.10.4:54: Andrew West pointed out that the eleventh month of Tonghe 13 is equivalent to 25 November-24 December 995, so Wittfogel and Fêng's date is correct.


I wanted to see 'on the tomb' from my last post in context, so I looked at the text on the lid of the epitaph for 蕭仲恭 Xiao Zhonggong as copied in Qidan xiaozi yanjiu (1985: 594):

1. 139-051-290-253 <na.gha.án.ô>

2. 188-169 <?.qó>

3. 081-140 <MONTH.en>

4. 081-348 <MONTH.e>

5. 334-262 <g.ui>

6. 071 <ong>

7. 076-020-361-140 <gho.y.én.en>

8. 251-084-205 <>

9. 052-334-361 <?.g.én>

Let's go through it block by block:

1. Kane (2009: 51, 106) translated

139-051 <na.gha> and 139-051-290 <na.gha.án>

as 'uncle' and 'maternal uncle' (cf. Written Mongolian naghachu 'maternal uncle'; Ji Shi 1982). Neither occurs alone in Qidan xiaozi yanjiu's index of Khitan small script words. Have they been found in isolation in the texts discovered in the three decades since the publication of that book?

Could 290 <án> be the plural suffix also in

311-151-290 <b.ugh.án> 'children' < 311-168 <b.qo> 'son, child'

which also has unexpected medial voicing in the plural? Is -gh- a contraction of *-qw- < *-qo-?

The final character

252 <ô>

is an error for

341 <er>

which Kane (2009: 106) regarded as the invariable (and in this case, nonharmonic) accusative-instrumental suffix ('via the maternal uncle'?). However, I would expect the genitive: 'junior tent of the maternal uncle'.

Could <er> be a plural suffix?


222-362-222-341 <ń.iau.ń.er> 'siblings' < 222-362 <ń.iau> 'sibling'?

is another plural ending in <er>, though the suffix may be <ń.er>. I don't know of any plural suffix <ń>, so I don't think <.ń.er> is a double plural suffix.

Could <er> be a plural genitive suffix if <na.gha.án> is a singular?

Could <na.gha.án.er> be a doubly marked plural like Japanese ko-domo-tachi, English child-r-en, and Dutch kind-er-en (cf. German Kind-er with only one suffix)?

2. <?.qo> is 'junior' (Kane 2009: 25). Kane interpreted this as an adjective modifying the previous noun ('junior maternal uncles'), though if that was the case, it would be in an un-'Altaic' position: i.e., following instead of preceding the noun.

Aisin Gioro read the first character

188 <?>

as <od> in 2004 and as <oji> in 2011. If it is <od>, how did it differ from


which Aisin Gioro read as <ad> ~ <od> and <od> ~ <do>?


3. 081 <MONTH>

is an error for

380 <TENT>.

Kane (2009: 25) translated blocks 1-3 as 'the tent of the junior maternal uncles'; I would add an 'of' before 'the' to correspond to the genitive suffix

140 <en>.


4. 081 <MONTH>

is an error for

082 <yw>

with a dot. Hence <yw.e> is a transcription of the Liao Chinese name 越 *Ywe.

5. Transcription of Liao Chinese 國 *gueiʔ 'state'*.

6. Transcription of Liao Chinese 王 *ong 'prince'.

Blocks 4-6 mean 越國王 'prince of the state of Yue'.

7. 076-020 <gho.y> may be a verb stem.

361 <én> could be a nominalizing suffix, though I would not expect <é> after <gho.y> if Khitan vowel harmony was like Mongolian or Manchu vowel harmony.

Is 140 <en> a genitive before 'tomb': 'on the tomb of ...'?

8. 'tomb-LOC': 'on the tomb'.

9. Kane transcribed 052 as <RECORD>, and stated that it "is only found in the word

[052-334] <> [= my <g>] 'record'

with various suffixes." However, it can occur in isolation and with characters other than 334, though it cannot occur in noninitial position (Qidan xiaozi yanjiu 1985: 201-202, 690-691). That suggests 052 is not a logogram. Aisin Gioro read it as <cu> in 2004 and <ce> in 2011.

361 <én> is a nominalizing suffix. Kane (2009: 155) translated 052-334-361 <?.g.én> as 'inscription'.

*4.8.3:48: Although the Khitan may have borrowed Liao Chinese 國 as gui [guj], I suspect the Liao Chinese pronunciation was *gueiʔ [kwəjʔ]. In Middle Chinese, 國 was *kwək, which has developed in at least two different ways in modern Mandarin dialects:

1. *kwək > *kwəɰk > *kwəɰʔ > *kwəjʔ > [kwej] (e.g., Jinan)

2. *kwək > *kwəʔ > [kwo] (e.g., Beijing)

Forms like Linquan [kwɛ] or 13th century Phags-pa Chinese ꡂꡟꡠ <gue> may be from either *kwəjʔ or *kwəʔ with fronting of the schwa.

The Khitan borrowed from a dialect with the first path of development.

Prescriptive 15th century Sino-Korean 귁 kuyk might be a conscious compromise between actual Sino-Korean 국 kuk and Ming Mandarin [kuj].


According to my harmonic unwritten vowel hypothesis,

251-084 <n.ra> 'tomb'

in the Khitan small script was read nara without the apparent harmonic violation of Kane's (2009: 123) nera. So far, so good. But the dative-locative suffix for 'tomb' is de, not *da:

251-084-205 <> 'tomb-LOC'

This is not an isolated spelling. It occurs seven times in four texts over a span of a century:

- twice in 蕭令公 (1.10, 26.14; 1057)

- once in 許王 (2.17; 1105)

- once in 耶律撻不也 (1.10; 1115)

- thrice in 蕭仲恭 (lid 3.2, 1.8, 44.38; 1150)

I wonder if there are even earlier occurrences. Did the harmonic form *nara-da ever exist: e.g., at the time of the invention of the small script c. 925?

Here are other examples of seemingly nonharmonic dative-locative -de:

051-251-205 <> '?-DAT/LOC'? (蕭令公 12.17) instead of *ghan-da (assuming ghan is the stem though it is not attested in isolation)

071-205 <> 'prince-DAT' (蕭仲恭 4.51) instead of 071-217 <> (quoted in Kane 2009: 137; source not specified)

076-189-099-205 <> '?-DAT/LOC' (耶律撻不也 21.1) instead of *ogha(a)d-da

141-205 <> 'seven-LOC' (蕭仲恭 8.12) instead of *dolo-do

But -de is expected if Aisin Gioro's (2004, 2005) reconstruction of 'seven' as dil is correct.

248-118-205 <jal.qú.de> '?-DAT/LOC' (許王 50.17) instead of *jalqu-du

The reading <jal> is from Aisin Gioro (2004).

Was nara-de a harbinger of the ultimate fate of the Khitan dative-locative? If Khitan had survived, would it have an invariable -de [də], just as the Jurchen dative-locative suffixes

<do> and <du> (= Kiyose's dö)

merged into Manchu de [də]? Could such an invariable -de already have existed in late colloquial Khitan, emerging occasionally in texts that otherwise reflected harmonic allomorphy lost in speech?

4.7.0:56: Khitan had an invariable accusative-instrumental suffix -er, though the homophonous perfective suffix had -ar and -or allomorphs (Kane 2009: 131, 145-146). Would as-yet-undiscovered 10th-century small script texts also have accusative-instrumental -ar and -or? Why did merger occur in the accusative-instrumental before the dative-locative? Was disambiguating the former from a homophonous verb suffix a factor?

Unlike Khitan, Jurchen had three allomorphs of the accusative suffix:


<ba> (written with two types of characters), <be>, <bo>

All three merged into Manchu be [bə].


On Friday I was looking for the name of Yelü Abaoji's father

244-084-051-099-222 <ń>

transcribed in Chinese as 撒剌汀 *saʔlaʔding or 撒剌的 *saʔlaʔdiʔ* in Kane (2009). Last night I found it on page 129. I also rediscovered my 2014 post on the name.

Last year I interpreted 084 as ar and read the name as Sargha(a)diń. But if 084 was ar, what was the difference, if any, between it and 123, which also represented ar?

Kane (2009: ) read 084 as ra and tentatively reconstructed an inherent vowel e in 244, so he read the name as Seraghadiń. The coexistence of e and a within a single word is unexpected in Mongolic or Jurchen/Manchu. There is no guarantee that Khitan vowel harmony worked like Mongolic or Jurchen/Manchu vowel harmony, but the limited evidence suggests some degree of similarity, so I am skeptical that the name contained an e. However, the other alternatives also have problems: e.g., Sargha(a)diń above. A zero-vowel interpretation of 244 yields Sragha(a)diń, with an un-'Altaic' (and hence unlikely) initial cluster. The Chinese transcriptions cannot help us, as Liao Chinese had no *se or *sr-, so 撒剌 *saʔlaʔ- could represent Khitan Sar-, Sera-, or Sra-.

A fourth possibility is that the name was Saragha(a)diń with an unwritten first vowel. Were Khitan small script readers able to supply unwritten vowels with the aid of vowel harmony rules? Perhaps 244 was read as s, sa, or se depending on context. In this case, it was read as sa because sr- would be an impermissible initial cluster and sera- would violate vowel harmony.


Likewise, 244-084-254 <s.ra.d> '?' and 251-084 <n.ra> 'tomb', which Kane read as serad and nera, would be read as sarad and nara according to my harmonic hypothesis.

In these cases, the reader would have to look ahead to determine whether the vowels of 244 <s> and 251 <n> would be a or e.
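The reading procedure I am proposing amounts to two filters on the candidate readings of a vowel-less sign. Here is a minimal Python sketch built on my own assumptions, not on established facts about Khitan. First, a, o, and u are back, e is front, and i is neutral. Second, no word may begin with two consonants. Third, a vowel-less sign can be read with zero, a, or e. The function name supply_vowel is hypothetical.

```python
# Hypothetical sketch of the look-ahead reading rule proposed above.
# Assumptions (mine, not established facts about Khitan): a/o/u are back,
# e is front, i is neutral; no word may begin with two consonants; a
# vowel-less sign can be read with zero, a, or e.

BACK, FRONT, NEUTRAL = set("aou"), set("e"), set("i")
VOWELS = BACK | FRONT | NEUTRAL

def harmony_class(vowel: str) -> str:
    return "back" if vowel in BACK else "front" if vowel in FRONT else "neutral"

def supply_vowel(consonant: str, rest: str) -> list[str]:
    """List the readings of consonant + rest that pass both filters.

    consonant is the value of the initial sign (e.g. "s" for 244);
    rest is the romanized remainder of the word (e.g. "ragha(a)diń").
    """
    # Look ahead for the first vowel that carries harmony (skip neutral i).
    cue = next((ch for ch in rest if ch in BACK | FRONT), None)
    readings = []
    for vowel in ("", "a", "e"):
        if vowel == "":
            # Zero vowel: reject if the word would start with a cluster like sr-.
            if not rest or rest[0] not in VOWELS:
                continue
        elif cue is not None and harmony_class(vowel) != harmony_class(cue):
            # Supplied vowel would clash with the following vowel (e.g. sera-).
            continue
        readings.append(consonant + vowel + rest)
    return readings

print(supply_vowel("s", "ragha(a)diń"))  # ['saragha(a)diń']
print(supply_vowel("s", "rad"))          # ['sarad']
print(supply_vowel("n", "ra"))           # ['nara']
```

Checking the preceding vowel instead of the following one would turn this look-ahead into the look-behind used by readers of the Mongolian script, described next.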

Conversely, readers of the traditional Mongolian script keep previous vowels in mind to disambiguate later vowel letters: e.g., the second vowel letter of <eja/en> 'lord' has to be read as e because the first vowel is e. Although a medial a looks exactly the same as a medial e, *ejan would violate vowel harmony.

*撒剌的 is from the History of the Liao Dynasty. I don't know where Kane (2009: 129) found 撒剌汀.

15.4.4:23:40: WHY <SA> MANY?: PART 1

I have already discussed

(~~) and ,

two of the eight types of Jurchen <sa>-graphs, at length in "Jurchen Polyphony 2", "That Yu-ni- Component", and "Un-<sa>rtain-'tea' ", so I will move on to the third, which is attested in only two names:

the surname <sa.hala>* (女真進士題名碑 21)

the personal name <>** (慶源郡女真國書碑 4:2)

Was this <sa> intended solely for use in names, or was it used to write other words absent from the few texts that we have on hand?

*4.5.1:31: Jin Guangping and Jin Qizong (1980: 311) and Jin Qizong (1984: 107) read the second character as xala = my hala. However, the entry for that character in Jin Qizong (1984: 129) lists gal as Jin Guangping's reading and does not include the surname as an example. To confuse matters further, the Chinese transcription of the name, 撒合烈 *saʔhoʔlieʔ, has different vocalism that is not harmonic. I would expect something like *saʔhoʔlaʔ.

**4.5.1:35: I cannot explain the nonharmonic sequence -ae. If u and i were neutral vowels, I would expect *udisaa or *udisee. Could the name be of non-'Altaic' origin: i.e., from a language without vowel harmony? But what language would that be? The name is too long to be Chinese.

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2015 Amritavision