The Khitan small script has two big advantages over the Khitan large script for modern scholars:

- a smaller number of characters: i.e., fewer variables - perhaps 400 small script characters excluding variants as opposed to perhaps 1,000 large script characters excluding variants

- clustering of characters in blocks corresponding to words (or at least morphemes) as opposed to no obvious word or morphemic division in the large script

The key word in the second point is "obvious". There are clues to word division in the large script if both scripts represent the same language.

Khitan is an 'Altaic'-type language (but see here!) with suffixes and vowel harmony. For example, there are four dative-locative suffixes in the small script:

Khitan small script (transliteration) | Generally after stems with
<iú> (<da>?*) | <a>
<de> | <e> or <i>
<do> | <o>
<dú> | <u>

As we'll see below, there are exceptions to these patterns.
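The table can be restated as a lookup. Here is a minimal Python sketch. Its assumptions are mine rather than the source's: diacritics are stripped (so <ú> counts as <u>), diphthongs get no special treatment, and the caller picks whether the first or last vowel of the stem decides the suffix. The names `DATIVE_LOCATIVE` and `expected_suffix` are hypothetical. For <jau tau>, neither choice predicts <de>, which is the kind of exception discussed below.

```python
import unicodedata
from typing import List, Optional

# Dative-locative suffixes from the table above, keyed by stem vowel.
# <iú> stands where <da> would be expected (see the note on <iú> below).
DATIVE_LOCATIVE = {"a": "iú", "e": "de", "i": "de", "o": "do", "u": "dú"}


def plain_vowels(stem: str) -> List[str]:
    """Return the vowel letters of a transliteration, ignoring diacritics (ú -> u)."""
    decomposed = unicodedata.normalize("NFD", stem)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [ch for ch in base if ch in DATIVE_LOCATIVE]


def expected_suffix(stem: str, deciding_vowel: str = "last") -> Optional[str]:
    """Predict the suffix from the stem's first or last vowel (an assumption)."""
    vowels = plain_vowels(stem)
    if not vowels:
        return None
    vowel = vowels[0] if deciding_vowel == "first" else vowels[-1]
    return DATIVE_LOCATIVE[vowel]


if __name__ == "__main__":
    # <jau tau> is attested with <de>, but neither rule predicts it.
    print(expected_suffix("jau tau", "first"), expected_suffix("jau tau", "last"))
```

The sketch prints `iú dú`: a first-vowel rule expects <iú>, a last-vowel rule expects <dú>, and only a hidden front vowel like the *i of Chinese 招討 would yield <de>.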

Since large script characters often represent single syllables, I would expect each of these small script characters to have a large script counterpart.

So far the only large script dative-locative suffix I know of is

<de> [tə]?

which may be graphically related to Old Chinese 時 *də. If the reading of a Khitan large script character resembles a pre-Liao Dynasty reading of a similar Chinese character, that may either be a coincidence or evidence for Janhunen's Parhae hypothesis (in which the Khitan large script was based on a Parhae script that was a sister of mainstream sinography rather than an early 10th century invention).

In theory, if we see <de> in a Khitan large script text, it may be

- a dative-locative suffix for a noun with <e> or <i>

- a phonogram for <de>

- at the end of a word

- in the middle of a word

- at the beginning of a word

- which is a monosyllabic word

If we see a string <> and see <X> with other known case suffixes, then <X> is probably a noun with <e> or <i>.

Or is it? <jau tau> 'bandit suppression commissioner' has neither <e> nor <i> but is followed by <de> in the large script:

<jau tau de> (Yongning 8; Kane 2009: 174)

Its Chinese source 招討 *tʃiaw tʰaw does have an *i. Was the Khitan word [tɕiaw tʰaw] with an [i] reflected in vowel harmony but not in its spellings with <jau>? I have not found any large or small script examples of <tau> 'five' plus <de>. That may indicate <iau ... au ... e> was possible but not <au ... e>.

Another case in which the first vowel of a word may determine vowel harmony is

<> 'on the tomb' (Kane 2009: 137)

This raises the question of how a disharmonic word


came to be. <ne.ra> is probably not a compound of <ne> and <ra> since it is unlikely that any Khitan word could begin with <r>.

Perhaps sporadic cases of <de> after non-<e>/<i> vowels indicate that the case markers were beginning to collapse into a single [tə] usable after any vowel. Such a suffix would have been like the Manchu dative-locative suffix -de [tə] which is a merger of Jin Jurchen

-do and -dö

(as reconstructed by Kiyose 1977: 42; Jin 1984 reconstructed -do and -du).

Is the Proto-Tungusic dative *-du (as reconstructed by Georg 2004) a borrowing from Khitan or a related language? Are its locative uses in Jurchen/Manchu due to the influence of Khitan which lacked a dative/locative distinction?

*<iú> is clearly something like <iú> in Khitan small script transcriptions of Chinese syllables with *-y, but appears where <da> would be expected after nouns. Did it have two unrelated readings?

DERUSU UZĀRA

Much of my career has involved the reconstruction of languages through scripts for other languages: e.g., Old Japanese and Old Korean through Chinese characters, Tangut through Chinese characters and the Tibetan script, etc. To understand how transcription worked in the past, it is useful to study transcriptions in the present.

Today I was surprised by how Russian Дерсу Узала Dersu Uzala [dʲɪrˈsu ʊzɐˈla] corresponded to Japanese デルス・ウザーラ Derusu Uzāra with an unexpected long vowel. Russian does not have phonemic vowel length, and the following languages, which do have phonemic vowel length, have only short vowels in their versions of the name:

Czech Děrsu Uzala (not *Uzála; the háček indicates palatalization of the preceding D)

Finnish Dersu Uzala (not *Uzaala)

Slovak Dersu Uzala (not *Uzála)

What was Dersu Uzala's original Nanai name? Nanai has long vowels, so I thought it might be Dərsu Uzāla, but then I discovered that the name is Дэрсүү Узаала Dersüü Uzaala as well as Дэрсү Узала Dersü Uzala in Mongolian. Was the Nanai name Dərsü̅ Uzāla with two long vowels?

Is the Japanese name based on Nanai, and if so, why was only one long vowel retained? Could Derusu Uzāra be based on Tuvan actor Maxim Munzuk's pronunciation of his character's name?

I assume the Czech, Finnish, and Slovak names are transliterations of Russian without any reference to Nanai.

The first long vowel in Mongolian Дэрсүү Узаала corresponds to a stressed Russian vowel, but the second doesn't. What is the reasoning behind final stress for both halves of the name in Russian?

Closer to home, I don't understand how stress is assigned to Hawaiian and Japanese names in English in Hawaii: e.g.,

[kʰʊˈhiːow] for Kūhiō [ˌkuːhiˈoː]

[tʰəˈnakə] for Tanaka

Stress in Hawaiian does not necessarily match stress in Anglicized Hawaiian names, and Japanese has no stress.

CELTIC HEROES AT DAWN

I was recently asked about the etymology of the English word Easter and whether it had a Celtic connection. The Proto-Indo-European root of Easter is *ʕews 'shine', so I would expect Proto-Celtic *aus according to the correspondences here. However, the University of Wales has a file with a very different reconstruction:

Proto-Indo-European *ʕe w s -
Hypothetical Proto-Celtic *a u s
Actual? Proto-Celtic *wā s ri-

The initial consonants of Middle Irish fair 'sunrise' and Welsh gwawr 'dawn' point to Proto-Celtic *w-. Did *au irregularly become *wā?

Simply reversing *a and *u would not account for the long vowel. Does that vowel reflect a Proto-Indo-European lengthened grade *ēws?

Is *-ri- a suffix?

Not far from *wāsri- 'dawn' in that list of Proto-Celtic reconstructions is *wāro- 'hero'. The latter superficially (and presumably only coincidentally) resembles Proto-Indo-European *wiHro- 'man', whose true Proto-Celtic reflex is *wiro- (> Irish fear and Welsh gŵr). What attested Celtic forms underlie the reconstruction of *wāro-?

DR. ZHIVOGO (SIC)

The Russian name Zhivago = Živago [ʐɨˈvagə] was borrowed from the Old Church Slavonic definite adjective živago '(of) the living' (masculine animate accusative-genitive singular), a contraction of živ-a 'living' plus -jego 'that which is known'. The native Russian equivalent of this adjective is živogo [ʐɨˈvovə] with a different second vowel.

Two weeks ago I wrote about the unexpected final -a of masculine animate accusative-genitive singular adjectives in Serbo-Croatian and Slovene. I was also puzzled by the penultimate vowel o in Serbo-Croatian, which is like that of East Slavic:

Branch | Language | '(the) living' | Type | Cf. 'him/his' | Cf. 'whom/whose'
(n/a) | Proto-Slavic | *živ-a-jego | aje | *jego | *kogo
South | Old Church Slavonic | živajego, živaago, živago | aje/aa/a | jego | kogo
East | Russian | živógo | o | jegó [jɪˈvo] | kogó [kɐˈvo]
East | Belarusian | živóha | o | jahó | kahó
East | Ukrainian | žyvóho | o | johó | kohó
South | Serbo-Croatian | živog(a) | o | njega | kog(a)
South | Slovene | živega | e | njega | koga
West | Polish | żywego | e | jego | kogo
West | Czech and Slovak | živého | e | jeho | koho

Most languages above have transparent compressions of *aje. Old Church Slavonic favored the first vowel (aje > aa > a), whereas Slovene and West Slavic favored the second (*aje > long é in Czech and Slovak, e in Slovene and Polish). However, East Slavic and Serbo-Croatian o does not look like a compression of *aje.

One could try to explain that o as having assimilated to the following *o which then became a in Belarusian and Serbo-Croatian (and even optionally lost in the latter).

However, my guess is that the o arose by analogy with the hard pronominal declension: e.g., *kogo 'whom/whose'. Ukrainian has taken this analogy further than the others, so even soft-stem adjectives have o, whereas the others have e or ja, which may be from *e:

Ukrainian -'oho (with palatalization of the stem-final consonant before o)

Belarusian -jaho < *-ego (or *-ogo before a palatalized stem-final consonant?)

Russian -ego

Serbo-Croatian -eg(a)

Moving on from morphology to semantics, is the surname Živago really a frozen genitive?

The same Old Church Slavonic word is an accusative in Luke 24:5: 'Why do you seek the Living One (živago) among the dead?' (See the OCS text with a Russian translation here.)

That line in turn reminded me of Dutch surnames that are frozen accusatives: e.g., Den Beste.

I would like to see studies of case frequency (e.g., how frequent are accusatives relative to nominatives?) and case freezing (e.g., how often do frozen forms originate from old accusatives as in Romance?). I don't expect a simple account of the latter since multiple factors can influence speakers' choices of forms to freeze. For instance, suppose two languages lost their case system. Language A originally had a single stem for all cases and a zero ending for the nominative. On the other hand, language B originally had one stem for the nominative and an oblique stem for all other cases.

Case | Language A | Language B
Nominative | ka | cəns
Accusative | ka-ti | cəs-e
Genitive | ka-pu | cəs-o
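As a toy model of this contrast, here is a sketch that deliberately uses a single factor: speakers freeze whichever stem appears in the most case forms. That rule is my assumption for illustration only, since, as noted above, real case freezing is shaped by multiple factors. The names `PARADIGMS` and `frozen_stem` are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

# The hypothetical paradigms from the table above: case -> (stem, ending).
Paradigm = Dict[str, Tuple[str, str]]

PARADIGMS: Dict[str, Paradigm] = {
    "Language A": {"nominative": ("ka", ""), "accusative": ("ka", "ti"), "genitive": ("ka", "pu")},
    "Language B": {"nominative": ("cəns", ""), "accusative": ("cəs", "e"), "genitive": ("cəs", "o")},
}


def frozen_stem(paradigm: Paradigm) -> str:
    """Single-factor assumption: the stem shared by the most case forms survives.

    Ties go to the stem encountered first (Counter.most_common keeps that order).
    """
    counts = Counter(stem for stem, _ending in paradigm.values())
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for language, paradigm in PARADIGMS.items():
        print(language, frozen_stem(paradigm))
```

This prints `ka` for Language A and `cəs` for Language B, because the oblique stem outnumbers the nominative stem two to one.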

I would expect language A to have nouns based on the nominative (e.g., ka) and language B to have nouns based on the oblique (e.g., cəs instead of *cəns).

ROMANIZZAZIONE DELLA LINGUA RUSSA

This morning I saw this Italian cover for Doctor Zhivago. The style of romanization caught my eye because of its use of accents and háčeks:

Borís Leonídovič Pasternàk


The accents correspond to Russian stress. I expected all stressed vowels to bear grave accents. Was an acute accent chosen for í because it is high and more like mid-high é [e] and ó [o] than mid-low è [ɛ], ò [ɔ], and low à [a]? Why doesn't "Živago" bear an accent? Because the accent is penultimate and predictable? The Italian Wikipedia has no accent in the title Il dottor Živago but has an accent on the character's name Jùrij Andrèevič Živàgo. (Stressed ù [u] has a grave accent even though it is high like í [i].)

Do most Italians understand the function of the háček? According to Wikipedia, Zivago and Boris Leonidovich Pasternak without háčeks are the usual Italianizations of those names. Similarly, I see Chernobyl at Corriere della sera, though the Italian Wikipedia has Černobyl' with a háček and an apostrophe for the soft sign. Should encyclopedias favor scientific transcriptions over lay transcriptions? Which is a user more likely to look up, Chernobyl or Černobyl'? Does it matter if one redirects to the other?

I'm surprised there is no article on Russian romanization or transliteration in the Italian Wikipedia.

THE CHARACTERS OF MINISTERS: MODIFIED-MODIFIER ORDER IN KHITAN

Let me take a brief diversion into Khitan syntax. (I say that as if any of you could stop me!)

While looking through Kane (2009) for the umpteenth time, I noticed something odd that should have caught my eye years ago. Khitan had three equivalents of Chinese 四字功臣 'four character meritorious official' (i.e., an official whose title is written with four Chinese characters):

1. <FOUR us.g.d g.ung> 'four characters meritorious official'

2. <g.ung FOUR us.g.d> 'meritorious official four characters'

3. <g.ung us.g.d FOUR> 'meritorious official characters four'

1 follows the modifier-modified order that is the norm in 'Altaic' languages and even in non-Altaic Chinese. 2 and 3, however, have un-'Altaic' order. 4-6 are structurally similar to 1-3:

4. <ONE us.g.en g.ung> 'one character-GEN meritorious official'

5. <g.ung EIGHT us.g.d> 'meritorious official eight characters'

6. <g.ung us.g.d SIX> 'meritorious official characters six'

<g.ung cin> is a Chinese loan; it is bimorphemic in Chinese but was probably monomorphemic in Khitan, so I would never expect to see

*<cin g.ung>

'official meritorious'. (Similar Chinese loans retain Chinese morpheme order in Vietnamese which has un-'Altaic' modified-modifier order.)

At first I thought 2 and 5 had mixed order -

modifier + modified: numeral + 'character'

modified + modifier: 'meritorious official' + (numeral + 'character')

- but then I realized 'four characters' in 2 and 'eight characters' in 5 could be analyzed as single syntactic units following nouns rather than as numeral-noun sequences.

How can the un-'Altaic' order in 2, 3, 5, and 6 be explained? Are there other Khitan phrases with modified-modifier order?

KHITAN SMALL SCRIPT CHARACTERS IN AISIN GIORO (2012) BUT NOT KANE (2009)

I have begun to compile a database of Khitan small script characters to facilitate my study of Aisin Gioro Ulhicun's Khitan reconstruction. So far it includes 473 characters in two numbering systems (Qidan xiaozi yanjiu/Kane's and Aisin Gioro's):

378 from Qidan xiaozi yanjiu

2 added to those 378 in Kane (2009)

90 in Andrew West's font that are not in Qidan xiaozi yanjiu or Kane (2009)

3 that are in Aisin Gioro (2012) but not in any of the above sources or N3820. (4.12.0:30: I can't see the Khitan characters in N3918R, but I assume they are the same as those in N3820, as the total number of characters has not changed.)

The latter three are her numbers 109, 234, and 293:

Unfortunately her reconstructions of their readings are in her 2011 book 契丹語諸形態の研究, which I haven't seen.

AISIN GIORO'S RECONSTRUCTIONS OF KHITAN VOWELS

After four posts in a row about consonants, it's time to look at vowels for a change.

Aisin Gioro Ulhicun has not yet publicly released a full description of her reconstruction of Khitan phonology, but I can attempt to reverse-engineer it from the fragments in this 2012 article, which has her latest reconstructions of many Khitan small script characters in the fourth column. (Some reconstructions are in Aisin Gioro's 2011 book 契丹語諸形態の研究, which I haven't seen.) Those reconstructions have eleven vowels:

i  ï u
ö ə o
æ a ã ɑ

Other vowels (e.g., ü, ɪ, e) may be in the 2011 reconstructions I have not yet seen.

The core vowels appear to be i, ə, a, u, and o. The others appear only in restricted environments to the best of my limited knowledge:

- æ is only in closed syllables (cf. English [æ] which cannot appear in word-final position)

- ã is only in 45 <qa> ~ <qã> 'khan' (= 051 <ha> in Kane 2009; see Andrew West's list for the glyphs corresponding to each of Kane's numbers which are all from Qidan xiaozi yanjiu except for the last two)

Do any known northeast Asian languages have nasal vowels?

If we didn't know about the word 'khan' in other languages, would it be possible to reconstruct a nasal vowel?

I suppose it is possible that 'khan' is the only word in Khitan with a nasal vowel because it was borrowed from a language in which *-an became *-ã, but I wouldn't bet on that.

- ɑ is only in the closed syllable 160 <tʃɑl> (= 183 <car> in Kane 2009)

Is <ɑ> a typo for <a>? If not, why not reconstruct <tʃal>? What is the evidence for a back allophone of */a/ before a final (velarized?) */l/?

- ö is only in 324 <u> ~ <ö> (= 372 <û> in Kane 2009); it transcribes Chinese *u and in native words corresponds to Mongol ö and perhaps u (see Kane 2009: 80, 99, and 105)

- ʊ is only in 313 <ʊŋ> and 320 <tʃʊŋ> (= 357 <úŋ> and 367 <źuŋ> in Kane 2009) for Chinese loanwords

I don't see why 313 (Kane 357) can't be reconstructed as <uŋ>. I don't know what the difference was - if any - between its rhyme and the rhyme of Kane's 106/345 <uŋ>; all three characters transcribed Chinese *-uŋ. Kane (2009: 77) regarded 346 as a variant of 345, though he gave no examples of their interchangeability. Kane 181 <iúŋ> for 龍 *ljuŋ or *lyŋ might have been <üŋ> (= Aisin Gioro's 158 <juŋ>).

320 (Kane 367) is probably <ywiŋ> since it transcribed Chinese 榮 *jwiŋ (which still rhymed with *-iŋ words during the Liao Dynasty; see Kane 2009: 249) and is clearly derived from 榮:

*jwiŋ shifted to *juŋ by the Yuan Dynasty and did not develop a *z-like initial until after the Yuan Dynasty, long after the fall of the Khitan. It never had an affricate initial in Chinese, so I do not know why Aisin Gioro reconstructed 320 with <tʃ->. (This section revised 4.10.23:27.)

- ï is presumably only for Chinese loans, though I wonder if it also existed in native Khitan words (Korean kŏran < *kətan 'Khitan' may imply a Khitan *qïtan, and Janhunen [2003: 5] reconstructed in pre-Proto-Mongolic.)

Only six vowels (the core five plus æ) appear in diphthongs:

Rising diphthongs: iV = /jV/?


I have included ju since it's not clear to me how iV and jV are different in Aisin Gioro's reconstruction.

Rising diphthongs: uV = /wV/?


I could have included ui if it was /wi/, but I suspect it was the high counterpart of oi /oj/ (see below) and the mirror image of ju (see above).

Falling diphthongs: Vi = /Vj/?

  əi oi
æi ai  

Falling diphthongs: Vu = /Vw/?

  əu, (jəu)  

Comparing the four tables above, a pattern emerges:

V in iV/Vi is never nonlow and front

V in uV/Vu is always nonhigh and central

Offhand I wonder if Aisin Gioro's core vowel system is compatible with a height harmony system:

high i ə u
low æ = /e/? a o

This is like the height harmony system of Middle Korean (and my reconstructions of Old Chinese and pre-Tangut). However, the limited distribution of æ makes me wonder if it was just an allophone of /a/ or /ə/.

I don't know where ö would fit. The conflicting clues for its pronunciation - back [u] or front [ø]? - remind me of Manchu ū [ʊ], which was written like Mongolian ü.

I suppose the Chinese loan vowels ï and ʊ would fall into the high and low series.

I doubt ã and ɑ (as a phoneme distinct from /æ/ and /a/ as opposed to an allophone of /a/) ever existed.

*C(.)R-USTERS IN BLACK TAI AND BAO YEN

I concluded "S-implification in Black Tai and Bao Yen" by writing,

Without looking at the development of other Proto-Tai *C(.)r-clusters in the two languages, I cannot be confident about these reconstructions.

I already gathered all the reflexes of Proto-Tai *C(.)r-clusters in Black Tai in that post. I list them again below in a more convenient tabular format along with the corresponding reflexes in Bao Yen from Pittayaporn (2009). Unlike Pittayaporn, I distinguish between *qr- and *q.r-, and I reconstruct *c.r- instead of *cr-. I have added reflexes of *ʰr- and *r- for comparison.

Proto-Tai | Black Tai | Bao Yen
*pr- | pʰ-/f- | pʰj-
*p.r- | t- | pʰ-
*br- | p- | pj-
*tr- | h- | h-
*r.t- | h- | h-
*ʰr- | h- | h-
*c.r- | tʰ- | tʰ-
*kr- | h- | kʰ-
*qr- | h- | kʰ-
*k.r- | s- | s-
*q.r- | s- | s-
*gr- | c- | c-
*voiced C.r- | h- | r-, (l-)
*r- | h- | r-

See "S-implification in Black Tai and Bao Yen" for more on the development of *K(.)r-clusters. Details on other types of *r-clusters follow.

Notes on Black Tai:

1. Pittayaporn (2009) has /pʰ/ corresponding to /f/ in Gedney's data in Hudak (2008). The former is more conservative:

*pr- > *pɣ- > *px- > /pʰ-/ > /f-/

2. *p.r- may have become *pr- and then *tr- after original *pr- and *tr- had been lost. This new *tr- then simplified to /t-/.

3. *br- lost all trace of its medial:

*br- > *bɣ- > *bɰ- > *bj- > *b- > /p-/

Compare with *gr- whose medial became *-j- and palatalized the preceding velar:

*gr- > *gɣ- > *gɰ- > *gj- > *kj- > /c-/

(The relative chronology of changes I do not discuss in detail is not intended to be exact: e.g., devoicing might have preceded palatalization.)

4. *r.t- merged with *tr- and perhaps became /h-/ via a dental fricative stage:

*r.t- > *tr- > *tɣ- > *tx- > *θ- > /h-/

Then again, if *kr- became a *kx- that simplified to *x-, then perhaps *tx- also simplified to *x-.

5. *voiced C.r- merged with *r-. *ʰr- and *r- may have become *x- and *ɣ- after original *x- and *ɣ- had become *kʰ- and *g- (now /kʰ/ and /k/). These new velar fricatives then backed and merged as /h/.

6. *c.r- (Pittayaporn's *cr-) may have become a third kind of *tr- dating between the other two (original *tr- and *tr- from *p.r-):

Proto-Tai | *r.t-/*tr- merger | *-r- > *-x- | sesquisyllabic compression; *Cx- > *x- | cluster assimilation | Black Tai
*p.r- | *p.r- | *p.r- | *pr- | *tr- | /t/
*tr- | *tr- | *tx- | *x- | - | /h/
*c.r- | *c.r- | *c.r- | *tr- | *tθ- | /tʰ/

Cluster assimilation required one part of a cluster to become more like the other:

*pr- > *tr- (labial to dental)

*tr- > *tθ- (voiced sonorant to voiceless obstruent)

That is my attempt to find commonality between two otherwise seemingly very different paths of change.
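The note 6 table depends on rule ordering: *p.r- and *c.r- only become *tr- after original *tr- has already become *tx-. Here is a minimal sketch that reproduces the three rows. It assumes that each stage applies all of its mappings simultaneously, one lookup per onset. A contrasting function applies a stage's rules one after another and shows that *pr- would then be dragged along to *tθ-. The stage labels come from the table. The function names are hypothetical, and asterisks and hyphens are dropped from the onsets.

```python
from typing import Dict, List, Tuple

# Stages from the note 6 table; asterisks and hyphens are dropped.
# Onsets not listed at a stage are unchanged at that stage.
Stage = Tuple[str, Dict[str, str]]

BLACK_TAI_STAGES: List[Stage] = [
    ("*r.t-/*tr- merger", {"r.t": "tr"}),
    ("*-r- > *-x-", {"tr": "tx"}),
    ("sesquisyllabic compression; *Cx- > *x-", {"p.r": "pr", "c.r": "tr", "tx": "x"}),
    ("cluster assimilation", {"pr": "tr", "tr": "tθ"}),
    ("Black Tai", {"tr": "t", "x": "h", "tθ": "tʰ"}),
]


def derive(onset: str, stages: List[Stage]) -> List[str]:
    """Apply each stage's mappings simultaneously: one lookup per onset per stage."""
    history = [onset]
    for _label, mapping in stages:
        onset = mapping.get(onset, onset)
        history.append(onset)
    return history


def apply_one_by_one(onset: str, mapping: Dict[str, str]) -> str:
    """Contrast: apply a stage's rules in sequence, so one rule can feed the next."""
    for source, target in mapping.items():
        if onset == source:
            onset = target
    return onset


if __name__ == "__main__":
    for proto in ["p.r", "tr", "c.r"]:
        print(" > ".join(derive(proto, BLACK_TAI_STAGES)))
    # Sequential cluster assimilation would merge *p.r- with *c.r- as *tθ-.
    print(apply_one_by_one("pr", {"pr": "tr", "tr": "tθ"}))
```

The first three printed lines match the rows of the table (ending in t, h, and tʰ). The last line shows the merger that simultaneous application avoids.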

Notes on Bao Yen:

1. *-r- became /(ʰ)j/ after labials as well as *g-. I don't understand why this didn't happen after *k- and *q-:

*pr- > /pʰj/

*br- > /pj/

*gr- > *kj- > /c/

but *kr-, *qr- > /kʰ/ (not */kʰj/)

Maybe there was a constraint against coronals + *-j-.

2. Pittayaporn's Proto-Tai *p.r- has two kinds of reflexes in Bao Yen:

/pʰj/ (like *pr-: e.g., 'shuttle of loom')

/pʰ/ (unlike *pr-: e.g., 'cucumber')

My guess is that some *p.r-words (e.g., 'shuttle of loom') compressed into monosyllables before others (e.g., 'cucumber') in pre-Bao Yen.

Here is how original *pr- and secondary *pr- might have developed:

*pr- > *pɣ- > *pʰɣ- > *pʰɰ- > /pʰj-/

*p.r- > *pr- > *pɣ- > *px- > /pʰ-/

3. Proto-Tai *ʰrwɯ:j A became Bao Yen /wi: A1/ rather than */hwi: A1/, presumably because Bao Yen does not have initial /hw/. I don't know that for a fact; I only know that /hw/ is not in any Bao Yen word in Pittayaporn's data.

Note that Proto-Tai *hw- sans *-r- became Bao Yen /pʰ/.

4. I didn't reconstruct *kx- (from *kr- and *qr-) simplifying to *x- in Bao Yen, so I won't reconstruct *tx- (from *tr- and *r.t-) simplifying to *x-. Instead, I'll have *tx- fuse into *θ- and back to /h-/ (cf. Black Tai note 4 above).

5. Unlike Black Tai, Bao Yen did not merge *ʰr- and *(voiced C.)r-. Only the former became /h/; the latter (generally?) remained *r- (Pittayaporn found a single case of /l/ < *voiced C.r- - perhaps a loanword?).

6. I'm not happy with how I bridged *c.r- and /tʰ/ in Black Tai (note 6 above), but I can't think of any better solution, and for now I recycle it for Bao Yen.

THE ORI--IN OF MOHAWK'S ONLY AFFRICATE

I rediscovered Mohawk when looking for a language lacking m and found it mentioned in Wikipedia's article on bilabial nasals. I last wrote about Mohawk five years ago after seeing it on a stop sign. Back then I didn't mention Mohawk's only affricate /dʒ/ which apparently is quite different from other obstruents judging from Wikipedia's description of Mohawk phonology:

- it is always voiced: [dz] ~ [dʒ] (depending on dialect)

- it patterns in clusters like a sonorant rather than an obstruent

One other unusual characteristic of /dʒ/ is its ability to combine with /j/ in both initial and medial position. Is /dʒj/ pronounced [dʒj] which would be very difficult to distinguish from [dʒ] without [j]? Or is /dʒj/ pronounced as palatal [dʑ] or [ɟ]?

I wonder if /dʒ/ was originally a voiced sonorant like *r which does not exist in modern Mohawk (though Proto-Iroquoian had *ɹ). I am not confident about that solution for three reasons:

1. I don't know what /dʒ/ corresponds to in other Iroquoian languages. Is it from Proto-Iroquoian *ts?

2. I don't know of any other language in which *r(j) hardened to /dʒ(j)/: *r > > *ʒ > /dʒ/ or *r > > *dʐ > /dʒ/.

3. If *r hardened to /dʒ/, I would expect other instances of fortition. Do such instances exist?

Like Mohawk, Proto-Iroquoian lacked labials other than *w. Did pre-Proto-Iroquoian undergo lenition: e.g., did *p and *m weaken to *w?

If Proto-Iroquoian had no *m, where did Cherokee /m/ come from? Loanwords?

S-IMPLIFICATION IN BLACK TAI AND BAO YEN

I've read the introduction to Ostapirat (2000) many times, but recently this passage on p. 19 jumped out at me, and not just because of the un-PC use of "inferior":

"Kra", the autonym which originally means 'human being' [...] Cf. the related form in Black Tai /saa C1/, which has been borrowed as Vietnamese /xá/ to designate various inferior ethnic groups in Vietnam

How did Kra come to have initial [s] in Black Tai and Vietnamese? (Vietnamese x is [s].)

Black Tai is spoken in northwestern Vietnam, and I initially thought that perhaps it had undergone the shift

*Cr- > *ʂ- > s-

which had also occurred in northern Vietnamese. (Southern Vietnamese still has [ʂ].) The resulting /saa C1/ was then borrowed as xá.

However, that was not the case:


*pr- > Black Tai /pʰ-/ (Pittayaporn 2009: 140) or /f-/ (Gedney 0287, 0300) but northern Vietnamese s [s]

*br- > Black Tai /p-/ (e.g., Gedney 0647)

*tr-, *kr- (and my *qr-; see below) > Black Tai /h-/ (Pittayaporn 2009: 141, 143; e.g., Gedney 0081 and 0082) but northern Vietnamese s [s]

*cr- > Black Tai /tʰ-/ (Pittayaporn 2009: 142; e.g., Gedney 0706) but northern Vietnamese s [s]?

*gr- > Black Tai /c-/ (e.g., Gedney 0160)

*qr- (= my *q.r-; see below) > Black Tai /s-/ (Pittayaporn 2009: 144; e.g., Gedney 0124)

*C.r-sequences (i.e., sesquisyllables beginning with *CVr-)

*p.r- > Black Tai /t-/ (e.g., Gedney 0345)

*k.r- > Black Tai /s-/ (e.g., Gedney 0120)

*Unknown voiced consonant + r- > Black Tai /h-/ (e.g., Gedney 0310)

Gedney numbers refer to cognates in Hudak 2008.

Pittayaporn's Proto-Tai has no *dr-, *ɟr-, or *ɢr-; these may be accidental gaps.

Pittayaporn (2009: 337) reconstructed Proto-Tai *kraː C 'slave' which should have become Black Tai */haa C1/ but is /saa C1/ as if it were from *qraː C or *k.raː C. My guess is that pre-Black Tai retained a sesquisyllabic *k.raː C that collapsed into a monosyllabic *kraː C in other early Tai varieties.

Bao Yen (Pittayaporn 2009), another Tai language in Vietnam, also seems to have retained *k.raː C. Compare its reflexes of *K(.)r- with those of Black Tai (Gedney in Hudak 2008). Exclamation marks indicate forms that would be irregular if they did not come from sesquisyllables.

Gloss | Proto-Tai (Pittayaporn 2009) | Bao Yen | Black Tai
slave | *kraː C (my pre-BY and pre-BT *k.raː C) | /saa C1/ (!) | /saa C1/ (!)
spider | *krwaːw A (my pre-BY and pre-BT *k.rwaːw A) | /saaw A1/ (!) | /saaw A1/ (!)
to imprison | *k.raŋ A | /saŋ A1/ | /saŋ A1/
six | *krok D (my pre-BY *k.rok D) | /sok DS1/ (!) | /hok DS1/
to seek | *kraː A (my pre-BY *k.raː A) | /saa A1/ (!) | /haa A1/
to sift | *qrɤŋ A (my pre-BY *q.rɤŋ A) | /sɤŋ A1/ (!) | /hɤŋ A1/
egg | *qraj A (my pre-BT *q.raj A) | /kʰaj B1/ | /saj B1/ (!)
mountain stream | *qrwɤj C | /kʰuəj C1/ | /huaj C1/
to laugh | *krɯəw A | /kʰuə A1/ | /hua A1/
fish net | *kreː A | /kʰɛː A1/ | /hɛ A1/
mortar | *grok D | /cok DS2/ | /cok DS2/

Unlike Pittayaporn, I distinguish between *q.r- and *qr-; his *qr- corresponds to my *q.r-, whose reflexes are like those of *k.r-, and my new *qr- has reflexes like those of *kr-.

Like 'slave', 'spider' was sesquisyllabic in the ancestors of Bao Yen and Black Tai.

Pre-Bao Yen apparently retained sesquisyllabic forms of 'six', 'seek', and 'sift' that became monosyllables in pre-Black Tai and other early Tai varieties.

Conversely, pre-Black Tai retained a sesquisyllabic form of 'egg' that became a monosyllable in pre-Bao Yen and other early Tai varieties.

Here is how *K(.)r- might have simplified:

Black Tai

Proto-Tai | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Today
*kr-, *qr- | *kr- | *kɣ- | *kx- | *x- | - | /h/
*k.r-, *q.r- | *k.r- | *kr- | *kʂ- | *ʂ- | - | /s/
*gr- | *gr- | *gɣ- | *gɰ- | *gj- | *kj- | /c/

I cannot reconstruct a *kʰ-stage between *kr-, *qr- and /h-/ because Black Tai has a /kʰ-/ distinct from /h-/.

Bao Yen

Proto-Tai | Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Today
*kr-, *qr- | *kr- | *kɣ- | *kx- | *kʰ- | - | /kʰ/
*k.r-, *q.r- | *k.r- | *kr- | *kʂ- | *ʂ- | - | /s/
*gr- | *gr- | *gɣ- | *gɰ- | *gj- | *kj- | /c/

The relative chronology is only approximate.
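The two stage tables can be read as ordered onset mappings. Here is a minimal sketch. It assumes that each stage applies its mappings simultaneously and that onsets not mentioned at a stage carry over unchanged, which is how it interprets the blank cells. As stated above, the chronology is only approximate. The names `BLACK_TAI`, `BAO_YEN`, and `reflex` are hypothetical.

```python
from typing import Dict, List

# Stage-by-stage onset mappings read off the two tables above (asterisks and
# hyphens dropped). Onsets not listed at a stage carry over unchanged,
# which is how the blank cells are interpreted here.
SHARED_STAGES: List[Dict[str, str]] = [
    {"qr": "kr", "q.r": "k.r"},               # Stage 1: *q- merges with *k-
    {"kr": "kɣ", "k.r": "kr", "gr": "gɣ"},    # Stage 2
    {"kɣ": "kx", "kr": "kʂ", "gɣ": "gɰ"},     # Stage 3
]
BLACK_TAI = SHARED_STAGES + [
    {"kx": "x", "kʂ": "ʂ", "gɰ": "gj"},       # Stage 4
    {"gj": "kj"},                             # Stage 5
    {"x": "h", "ʂ": "s", "kj": "c"},          # today
]
BAO_YEN = SHARED_STAGES + [
    {"kx": "kʰ", "kʂ": "ʂ", "gɰ": "gj"},      # Stage 4
    {"gj": "kj"},                             # Stage 5
    {"ʂ": "s", "kj": "c"},                    # today
]


def reflex(onset: str, stages: List[Dict[str, str]]) -> str:
    """Run an onset through the stages; each stage is one simultaneous lookup."""
    for mapping in stages:
        onset = mapping.get(onset, onset)
    return onset


if __name__ == "__main__":
    # 'six': pre-Bao Yen *k.rok D vs. pre-Black Tai *krok D
    print(reflex("k.r", BAO_YEN), reflex("kr", BLACK_TAI))
```

This prints `s h`, matching Bao Yen /sok DS1/ and Black Tai /hok DS1/ for 'six'. Because stage 2 is a single lookup, *k.r- becomes *kr- without being carried on to *kɣ- in the same step.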

Without looking at the development of other Proto-Tai *C(.)r-clusters in the two languages, I cannot be confident about these reconstructions.

MEETING AT NIGHT: GEMINATION IN UKRAINIAN AND BELARUSIAN DECLENSION

While looking for a free online Ukrainian grammar, I found Martin Dietze's "Ukrainian Grammar Short Reference" at his (what a domain name!) with the paradigm of зустріч 'meeting':

Case/number | Ukrainian 'meeting' | Ukrainian 'night' | Belarusian 'night' | Russian 'night'
Nominative/accusative singular | зустріч | ніч | ноч | ночь
Vocative singular* | зустріче | ноче | - | -
Genitive/dative/locative singular; nominative/vocative/accusative plural | зустрічі | ночі | ночы | ночи
Instrumental singular | зустріччю | ніччю | ноччу | ночью
Genitive plural | зустрічей | ночей | начэй | ночей
Dative plural | зустрічам | ночам | начам | ночам
Instrumental plural | зустрічами | ночами | начамі | ночами
Locative plural | зустрічах | ночах | начах | ночах

I've included the paradigm of ніч 'night' and the paradigm of its Belarusian and Russian cognates for comparison. (The Belarusian and Russian cognates of зустріч are сустрэча and встреча which belong to a different declension.)

Why do Ukrainian and Belarusian have stem-final gemination (-чч-) in the instrumental singular? My guess is that the consonant lengthened to compensate for the loss of the vowel, which is still preserved in Russian orthography:

*nočьju  > U [nʲitʃʲːu], B [notʂːu], R [notɕu]**

Compare how a vowel rather than a consonant lengthened in a similar environment in Japanese: *tiyu > *tyu > chū.

I am also reminded of Pali geminates from Sanskrit *-(C)Cy-: e.g.,

maccu- 'god of death' < mṛtyu- 'death'

maccha- < matsya- 'fish' (Does the aspiration of geminate cch indicate that Sanskrit s was aspirated [sʰ] like Korean s? See Jacques 2011 for more on aspirated fricatives.)

According to Wikipedia, there was no gemination if there were one or more consonants before *CьjV. Gemination would have resulted in a C1C2ː-cluster which doesn't exist in Ukrainian. See Wikipedia and Mayo (1993: 903) for further constraints on gemination in Ukrainian and Belarusian.

4.6.20:39: The Ukrainian verb лити [lɪtɪ] 'to pour' has similar stem-initial gemination: e.g.,

*lьju > ллю [lʲːu] 'I pour' (cf. Belarusian and Russian лью [lʲu])

However, the stem-initial gemination of Ukrainian ссати [sːɑtɪ] and Belarusian ссаць [sːatsʲ] 'to suck' appears to be from *sъ- with *ъ instead of *ь; cf. Russian сосать [sɐsatʲ].

*Dietze did not list a vocative singular for зустріч, so I supplied зустріче by analogy with ноче.

**Can Ukrainian, Belarusian, and Russian speakers tell each other apart by their pronunciations of ч /č/? I have never heard Belarusian and have relied on Wikipedia's IPA guides (Ukrainian, Belarusian, and Russian) for the phonetic forms here. To my poor ear, Ukrainian and Russian ч sound similar, and both sound completely different from Mandarin j [tɕ] even though Wikipedia's IPA for Mandarin j is identical to its IPA for Russian ч.

FROM FATHER TO UNCLE

I think Proto-Indo-European *pʕtḗr 'father' could have become something like *poti in Proto-Slavic*, but in fact Proto-Slavic had a different word *otьcь for 'father', and according to Schenker (1993: 113), the Indo-European word for 'father' became Proto-Slavic *strъjь 'paternal uncle' (presumably from *s- + zero grade *ptr- + -ъjь). I have three questions about *strъjь:

1. Is the initial *s- s-mobile?

2. What is the suffix *-ъjь?

3. What other languages shifted 'father' to 'uncle'? Or 'mother' to 'aunt'?

*Cf. Proto-Slavic *mati 'mother' from *méʕtēr.

ĐER(I)-ATION

I am puzzled by the Glagolitic letter Ⰼ for several reasons (not even including the derivation of its form!):

1. Why does it exist? Wikipedia lists its sound value as /dʑ/, though that phoneme did not exist in Old Church Slavonic.

2. Does it really correspond to modern Serbo-Croatian ћ ć, its voiced counterpart Serbo-Croatian ђ đ, and Macedonian ѓ ǵ, as Wikipedia implies? None of those three sounds existed in Old Church Slavonic.

3. Why does its name (đervь ~ ǵervь 'tree') have initial đ- (see the Croatian Wikipedia) or ǵ- (both sounds that did not exist in Old Church Slavonic!) if no Slavic word for 'tree' has similar initial consonants: e.g., Serbo-Croatian and Macedonian drvo (not SC *đrvo or M *ѓрво *ǵrvo)?

(4.5.0:05: Moreover, why does its name end in -ь if Slavic words for 'tree' end in -o?)

3'. And why does Old Church Slavonic have дрѣво drěvo with ě if Proto-Slavic had *dervo with e?

4. Cubberley (1993: 24) listed ћ (transliterated as ǵ/j, not ć) as the early Cyrillic equivalent of Ⰼ, and questions 1 and 2 apply to ћ as well as Ⰼ.

On the other hand, Wikipedia does not mention ћ in its article on the early Cyrillic alphabet, though its article "Tshe" (ћ) does mention that the later Cyrillic letter ћ ć was based on the earlier Cyrillic letter ћ ǵ/j. Oddly, the article on "Dje" (ђ), an obvious derivative of ћ, does not mention either ћ.

THE VÖWELS OF PREKMÜRŠČINA

Two days ago, I asked about Slovene surnames with ü, which I thought was un-Slovene. It's actually un-standard Slovene. I had forgotten that I read in December about how some Slovene dialects have front rounded vowels: e.g., that of Prekmurje, where Albina Nećak Lük is from (and where Danilo Türk's ancestors might be from; his native Maribor is "where significant immigrant communities from Prekmurje have settled"). I was reminded of that fact by the last line of the Lord's Prayer in Prekmurje Slovene on p. 273 of Francis Tapon's The Hidden Europe:

Prekmurje: nego odslobodi nas od hüdoga

Standard: temveč reši nas hudega

'but deliver us from evil'

I assume those front rounded vowels are due to Hungarian and/or German influence since Prekmurje borders Hungary and Austria. What I don't understand is how they developed in native words. In some cases, they seem to reflect nearby palatal segments: e.g.,

P odpüščamo 'we forgive' with ü before š (cf. standard odpuščamo)

P hüdoga < *hudega 'evil'? (assuming the standard form is more conservative)

Presumably ö is always conditioned (by some palatal segment?) since it is nonphonemic according to Wikipedia.

On the other hand, ü is presumably phonemic because it is unpredictable: e.g., it appears in the river name Müra (standard Mura), which contains no palatal segments other than ü.

The unusual vocalism of Prekmurje is not limited to front rounded vowels: e.g., it has au or ou from earlier *o and *ǫ: e.g.,

'God': Baug ~ Boug < *bogъ

'road': paut ~ pout < *pǫtь

I presume this shift postdated the merger of *o and *ǫ. But why did some *o break while others didn't? Is accent a factor?

THE SERBO-CROATIAN CASE -GA-ME

Vowel correspondences between Slavic languages are generally very straightforward, so I'm frustrated by how Serbo-Croatian and Slovene third person pronoun and adjective case endings don't quite line up with their equivalents elsewhere. Unexpected vowels are in red.

masculine/neuter singular | Proto-Slavic | Serbo-Croatian | Slovene | Polish | Russian
'he', 'it'
accusative/genitive | *jego | njega | njega | jego | jego
clitic* accusative/genitive | (*go) | ga | ga | go | n/a
locative | *jemь | njemu | njem | nim | nem
dative | *jemu | njemu | njemu | jemu | jemu
clitic dative | (*mu) | mu | mu | mu | n/a
adjective 'new'
accusative (masculine animate), genitive | *nova-jego | novog(a) | novega | nowego | novogo
short genitive | *nova | nova | n/a | n/a | n/a
locative | *nově-jemь | novom(e/u) | novem | nowym | novom
dative | *novu-jemu | novom(e/u) | novemu | nowemu | novomu
short locative | *nově | novu | n/a | n/a | n/a
short dative | *novu | novu | n/a | n/a | n/a

(I have left out masculine inanimate and neuter accusative forms for 'new' since they are regular.)

Today I realized that the unexpected Serbo-Croatian and Slovene accusative/genitive a might be by analogy with the genitive ending -a in masculine and neuter o-stems (e.g., mesto 'place', 'town') and short adjectives (e.g., *nova).

*nova-jego města > SC novog(a) m(j)esta 'of the new place', Sl novega mesta 'of the new town'

Did this analogy occur

- independently in Serbo-Croatian and Slovene?

- in a common ancestor of Serbo-Croatian and Slovene (Proto-Southwestern Slavic as opposed to Macedonian and Bulgarian)?

- in Proto-South Slavic (and if so, are a-forms attested in earlier Bulgarian and Macedonian)?

Could the unexpected Serbo-Croatian dative-locative e also be due to analogy? If so, what would be the model? Proto-Slavic masculine and neuter o-stems had a locative ending *-ě. Were forms like novome first created at a time when Serbo-Croatian masculine and neuter o-stems (e.g., mesto 'place') and short adjectives (e.g., *nově) still had an *-ě-like locative? (Those stems now have -u for both dative and locative.)

*u nově-jemь městě > *u novome m(j)est(j)e? > SC u novom(e) m(j)estu 'in the new place'

*4.3.19:00: Wiktionary does not reconstruct clitic forms for the third person pronoun, but I have done so because nearly identical clitics are attested in all three branches of Slavic. (No standard East Slavic language has them, but ho < *go and mu are in nonstandard Ukrainian [Shevelov 1993: 960].) It's possible that all three branches (or even languages within them) independently dropped the first syllables of the third person pronouns to form the clitics, but is it probable?

4.3.22:18: When I first saw Serbo-Croatian -ga, I thought of how unstressed Russian -го is pronounced [və] and assumed that Serbo-Croatian a was also the product of vowel reduction. But I later rejected that idea because as far as I knew, the reduction of *o was unique to Russian and Belarusian. (Belarusian has -га [ɣa] corresponding to Russian -го [və] and SC -ga.) However, *o-reduction is actually more widespread than I thought: it's also in Upper Carniolan Slovene and Smolyan Bulgarian. Wikipedia reports akan'e (vowel reduction to a) in Polissian Ukrainian, whose eastern variety is transitional with Russian, so I assume it has Russian-style *o-reduction (rather than Belarusian-style *o and *e-reduction). Nonetheless, that wider distribution doesn't necessarily mean my original guess was correct. As far as I know, akan'e is completely unknown in Serbo-Croatian.

LÜK-ING TÜRK-ISH

Tonight I rediscovered my copy of Language in the Former Yugoslav Lands. The name of the author of the chapter on Slovene caught my eye: Albina Nećak Lük. Neither ć nor ü are in Slovene. Ć is in Serbo-Croatian (and nećak is Serbo-Croatian for 'nephew' - a coincidence?), but ü is in neither Slovene nor Serbo-Croatian. Lük is from Prekmurje, where "[p]eople of different languages, Slovene, Hungarian, German, Romani, Yiddish, live in close contact for centuries". Is Lük a Hungarian or German name?

The umlaut in Lük reminded me of another name from Slovenia that puzzled me: Danilo Türk. I thought I might have already written about Türk, but Google tells me I haven't. In any case, these folks wrote about it years before I ever heard of him. The first post in that thread quotes this article.

I wonder how the average Slovene pronounces ć and ü. Are they consistently distinguished from native č and u? If not, who distinguishes them and who doesn't?

"A TIME FOR TRUTH-TELLING"

Joanne Jacobs wrote that

March 31 is my birthday, a time for truth-telling.

I tried to translate the latter phrase and came up with

1ʐɨəʳ 1tshiee 2ziọ 'truth speak time'

That got me thinking about two of the various Tangut words for 'time':

0705 1ziẹ and 4861 2ziọ

Although their characters are completely different, they are phonetically similar and exhibit an e ~ o alternation almost always otherwise found in verbs. (See below for the only other example with nouns that I know of.)

Jacques (2009) regarded that e ~ o alternation as the result of suffixation. (I have converted his reconstruction into mine; the basic principle is the same.)

0749 *CI-pha > 1phi 'to send, cause' (stem 1*; no suffix)

4568 *CI-pha-w-H > 2phio 'to send, cause' (stem 2; suffixed)

Jacques regarded *-w as a third person patient suffix with a cognate -w still in northern Qiang today.

(The function of *-H which conditioned the second tone is unknown. Not all second stems have the second tone.)

By analogy I could reconstruct the two words for 'time' as *SE-Sa and *SE-Sa-w-H.

*S- conditioned vowel tension (indicated by a subscript dot)

*-E- conditioned the raising of *-a to *-ia and later -ie; it and *-a conditioned the intervocalic lenition of an earlier sibilant *-S- to -z- (phonetically [ɮ]?)

But there are limits to analogy. The *-w in 'time' obviously cannot be a third person patient suffix. What is it? Could it be from an earlier *-k? Or is 'time' a true case of ablaut: i.e., primary vowel alternation as opposed to secondary vowel alternation?

(4.1.0:55: Yet another solution is to assume that the original root vowel was *o and the -e-form is from *-o plus a suffix *-j:

*Sɯ-So-H > 2ziọ

*Sɯ-So-j > 1ziẹ

However, that begs the question of what *-j is. I have reconstructed the presyllabic vowel conditioning later -i- as *ɯ, since the frontness of -e is due to *-j. Unlike *E, *ɯ did not cause stressed vowels to front.)

A pair of nouns with a similar alternation** is

1591 2nie 'language' and 1824 2nwio 'word'

Is the semantic difference between those nouns comparable to that between the two words for 'time' (e.g., time as a whole vs. a point in time)? Tangut has many apparent synonyms whose distinctions are not yet fully understood.

*Jacques (2009: 3) explained the difference between stems 1 and 2:

Stem 2 is used when the verb’s subject (that is, A for a transitive verb or S for an intransitive one) is 1Sg or 2Sg and the patient is third person (Gong 2001:26). Stem 1 occurs in all other cases, including those when a 1sg or 2sg agreement suffix appears but is coreferent with the patient of the verb (Gong 2001:32-34).
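The stem rule in this quotation is explicit enough to restate as a small decision procedure. Below is a minimal Python sketch of it; the names tangut_stem and Argument and the "sg"/"pl" labels are my own hypothetical choices, not anything from Jacques or Gong. It assumes a transitive verb with an explicit agent and patient, since the quotation does not spell out how the patient condition applies when an intransitive verb has no patient.

```python
from typing import NamedTuple


class Argument(NamedTuple):
    """A verbal argument: grammatical person and number (hypothetical labels)."""
    person: int   # 1, 2, or 3
    number: str   # "sg" or "pl"


def tangut_stem(agent: Argument, patient: Argument) -> int:
    """Return 1 or 2: the stem a transitive Tangut verb would take.

    Restates the rule quoted from Jacques (2009: 3), citing Gong (2001):
    stem 2 if the agent is 1sg or 2sg AND the patient is third person,
    stem 1 in all other cases. Intransitive verbs are not modeled here.
    """
    agent_is_1_or_2_sg = agent.person in (1, 2) and agent.number == "sg"
    patient_is_third_person = patient.person == 3
    return 2 if agent_is_1_or_2_sg and patient_is_third_person else 1


if __name__ == "__main__":
    # 1sg agent, third person patient: stem 2 (cf. 4568 2phio 'to send, cause')
    print(tangut_stem(Argument(1, "sg"), Argument(3, "sg")))
    # Third person agent, 1sg patient: stem 1 (cf. 0749 1phi), because any
    # 1sg agreement suffix here is coreferent with the patient
    print(tangut_stem(Argument(3, "sg"), Argument(1, "sg")))
    # 1pl agent: stem 1, since only 1sg and 2sg agents trigger stem 2
    print(tangut_stem(Argument(1, "pl"), Argument(3, "sg")))
```

Under these assumptions, something like 'I send him' selects stem 2 and 'he sends me' selects stem 1, matching the 1phi ~ 2phio pair above.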

**4.1.1:00: Unlike the words for 'time', 'language' and 'word' share the same tone, and 'word' has a medial -w- conditioned by a *P-prefix.

YERNAZ RAMAUTARSING

I first heard about Yernaz Ramautarsing today through Bosch Fawstin. I've been trying to find the derivation of his name.

Ramautarsing is a Hindi compound of three elements of Sanskrit origin:

राम Rām 'Rama'

औतार autār 'avatar'

सिंह Siṃh 'lion'

When I Googled Yernaz, the results not involving Ramautarsing were mostly ... Kazakh! According to, ер yer is 'hero'* and наз naz is a loan from Persian (presumably ناز nāz 'glory'). Did this Turco-Persian hybrid spread into India before arriving in Suriname where Ramautarsing was born?

*I'm surprised I can't find the Kazakh word in this entry for Proto-Turkic *ēr 'man'. (Also see Clauson 1972: 192.)

A KHIOOR-IOUS KHAN-UNDRUM

There are only three tangraphs with the element vai:


1018 1lwo 'moist' (vemvai) =

left of 0642 2lõ 'origin' (vemjolcon; phonetic) +

center of 3052 1niooʳ 'water (trigram ☵)' (cirvaigii; semantic)


3052 1niooʳ 'water (trigram ☵)' (cirvaigii) =

left of 3058 2ziəəʳ 'water' (cirzaa; semantic) +

right of 1018 1lwo 'moist' (vemvai; semantic) +

right of 5941 1diə̣ 'strip' (pargii; why?)


4754 1khiooʳ (Sanskrit transcription) (biobuxvai) =

top and left of 4807 1khi 'to lose' (biobuxpik; initial) +

center of 3052 1niooʳ 'water (trigram ☵)' (cirvaigii; rhyme)

As I wrote in my last entry, 1018 "is clearly a phonosemantic compound," though I don't understand why its semantic component is vai instead of the much more common radical cir 'water'.

3052 is probably a semantic compound. 5941 'strip' might be a reference to how trigrams look like strips, though its components par and gii are absent from the tangraphs for the seven other trigram names:

3950 1tshwiu 'heaven (☰)' (girdexgie)

3389 2ŋiõ 'mountain (☶)' (dexfei)

2777 1ŋeʳw 'thunder (☳)' (dexdukcin)

1995 2məi 'wind (☴)' (biidexdak)

4555 1pə 'fire (☲)' (qeucok)

3910 1phəu 'earth (☷)' (girges)

1976 2bie 'swamp (☱)' (baebeldexbel)

4754 is a fanqie character for Sanskrit transcription according to both the Tangraphic Sea and Homophones. I would expect it to transcribe a Sanskrit phoneme sequence khyor [kʰjoːr] corresponding to its reading 1khiooʳ. However, the only Sanskrit khyor I know of is the genitive and locative dual of sakhi- 'friend', its compounds, and a few rare -khi nouns before a voiced segment. Would the Tangut really create a special tangraph for such forms? Or am I overlooking a common verb form, dharani, or mantra with khyor?

Perhaps my slight rewriting of Gong's Tangut reconstruction is wrong. Arakawa (1997: 149) reconstructed 4754 as khya:n. Monier-Williams' Sanskrit dictionary has 156 nouns with -khyān-; 124 end in -khyāna-. However, according to Arakawa (1997: 117), 4754 transcribed Sanskrit khan and khyan, not khyān! That is hard to reconcile with the Chinese transcription 娘 *ndʐɨo for 3052, which rhymes with 4754. The only way out I can see is to reconstruct a -(y)an-like reading which shifted to an -o-like reading by 1190, when 3052 was transcribed in the Pearl.

VEM-ŚA

There are only three tangraphs with the left-hand element vem (in David Boxenhorn's code):

0200 2lõ 'relative'

0642 2lõ 'origin'

1018 1lwo 'moist'

I could call this trio of lo-characters a vem-śa (vaṃśa- being Sanskrit for 'family').

Only the third has a known Tangraphic Sea analysis:


1018 1lwo 'moist' = left of 0642 2lõ 'origin' (phonetic) + center of 1niooʳ 'water (trigram)' (semantic)

It is clearly a phonosemantic compound. I should look into how many tangraphs with oral vowel readings have nasal vowel phonetics and vice versa.* I am also interested in how many tangraphs with -w-readings have -w-less phonetics and vice versa. The looseness of the 'fit' of Tangut phonetics has yet to be measured.

I presume 'relative' is phonetic in 'origin' or the other way around. It is also possible that 'origin' is semantic in 'relative' (i.e., people sharing a common origin).

The right side of 0200 (Boxenhorn code: vemqii)

is semantic; it is shared with ten other tangraphs:

Tangraph (LFW2008) | Boxenhorn code | Reading | Gloss
0199 | fulqii | 2siə | first half of 2siə 2sa 'to connect' (i.e., to make near)
0213 | fioqii | 1nie | relative (people who are near to one)
0915 | qiitos | 1thaa | to haunt, make mischief (to be haunted is to have ghosts [symbolized by the right element tos] nearby, and ghosts make mischief)
1639 | qiibaehae | 1khwạ | far (opposite of near)
1951 | ciadexqii | 2sa | second half of 2siə 2sa 'to connect'
1957 | gaaqii | | variant of 0213
2217 | dexbelqii | 1ɣʌ | near; not attested outside dictionaries? 'ritual' word? resemblance to Middle Chinese 近 *gɨnˀ 'near' coincidental?
2223 | bilhasqii | 2rieʳ | to mend, sew (to seal holes in clothing is to connect threads - to make them near each other)
4506 | banqiiqex | 2ləu | to burn, ignite, light (the right element is qex 'fire')
5228 | haehasqii | ?lẽ | husband of sisters (relative and hence near)

Nearly all have glosses involving nearness (0199, 0213 = 1957, 0915, 1639, 1951, 2217, 2223, 5228). The sole exception is 4506 in which qii might be phonetic: 2ləu is vaguely like 2lõ, the reading of 0200.

Unfortunately the right side of 0642 (Boxenhorn code: vemjolcon)

is unique. It may be a compound of parts (jol and con) extracted from two tangraphs: one of 15 others with jol and one of 12 others with con.

*Gong reconstructed rhyme 2.47 as -ow which corresponds to my -õ. If Gong was correct, vem represented lo-like syllables with or without medial or final glides.

In any case, I don't know why only three lo-like syllables were written with vem while 41 others were not.

"V NANGOMU"

I am puzzled by Nangomu in the title of this article in a Slovene newspaper: "Trimesečna pomoč v bolnišnici v Nangomu" 'Three months' aid in a hospital in Nangoma'. -a nouns are typically feminine in Slavic. The preposition v 'in' takes the locative case*, and Nangomu is the locative singular of Nangoma as if it were ... masculine? However, the body of the article contains what I'd expect: "v Nangomi" with the locative singular of a feminine Nangoma.

Can foreign place names ending in -a be declined in Slovene as if they belonged to either gender? On .si sites**, Google has 523 results for "v Nangomi" (f.) but only 5 results for "v Nangomu" (m.). Similarly, Google has 357,000 results for "v Ameriki" (f.) but only 375 results for "v Ameriku" (m.), including Russian transliterations (so the real figure is smaller). It's not surprising that Wiktionary lists Slovene Amerika as a feminine noun. Should Nangoma also be solely feminine? Most Slovene -a nouns are feminine, not masculine. Why decline Nangoma according to a minority pattern? Is there a Nangoma-like masculine -a noun motivating an analogical declension? Compare how English dive has an irregular conjugation by analogy with the rhyming verb drive.
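To put those hit counts on the same scale, here is a small Python sketch that converts each pair into the percentage of feminine locatives. It is only rough arithmetic under the assumption that raw Google hit counts are comparable at all; the function name feminine_share is my own illustrative choice. Because the 375 "v Ameriku" hits include Russian transliterations, the real feminine share for Amerika would be even higher than what this prints.

```python
def feminine_share(feminine_hits: int, masculine_hits: int) -> float:
    """Percentage of hits using the feminine locative (-i) rather than
    the masculine locative (-u)."""
    total = feminine_hits + masculine_hits
    return 100 * feminine_hits / total


# Google hit counts as reported above: (feminine -i, masculine -u)
counts = {
    "Nangoma": (523, 5),        # "v Nangomi" vs "v Nangomu"
    "Amerika": (357_000, 375),  # "v Ameriki" vs "v Ameriku"
}

for name, (fem, masc) in counts.items():
    print(f"{name}: {feminine_share(fem, masc):.2f}% feminine")
```

Both names come out at roughly 99% feminine, so by this crude measure Nangoma behaves almost exactly like Amerika.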

Returning to the Slavic world, I have never understood how even recent loanwords in Russian came to have the less common -a ending in the plural: e.g., svitera as well as svitery 'sweaters' with the usual -y ending. Isn't this -a an old dual ending: e.g., in glaza 'eyes' (formerly 'two eyes')? I can understand why -a became the plural ending for paired items but not how it spread to nouns not strongly associated with 'two': e.g., professora 'professors'.

*V can also take the accusative case when it means 'into', but 'into' makes no sense in the title.

**The article I quoted is an .eu site. If I search through all Slovene sites, Google will not display the number of results. I can go to the end of the results and see a number, but that figure excludes similar entries and can't be compared to the results for .si sites, which do include similar entries. Including similar entries in a Slovene search results in no numbers at all - not even if I go to the last page.

<X.Ü.N M.IN TZ.ING I.M>

On Sunday I discovered a virtual 訓民正音 Hunmin chŏngŭm online and wished that equivalent documents could be found for the Tangut, Jurchen, and Khitan scripts. Although I agree with Juha Janhunen's (1994, 1996) hypothesis of the Khitan and Jurchen large scripts being the results of evolution rather than invention, the Khitan and Jurchen small scripts and the Tangut script were all conscious inventions, and in theory their creators could have written manifestos. But in reality ... who knows?

Although the Khitan small script is believed to have been invented around 925, the earliest known example of the script (耶律宗教 Yelü Zongjiao's epitaph) is from 1053. Compare that 128-year gap with the three-year gap between the invention of hangul in late 1443 or early 1444 and the publication of Hunmin chŏngŭm in October 1446. For fun I've written the Sino-Khitan pronunciation of 訓民正音 Hunmin chŏngŭm below in the Khitan small script:

<x.ü.n m.in tz.ing i.m>

The Jurchen small script is believed to have been invented in 1138. Only a handful of undeciphered potential examples (e.g., the block on the right) survive. Aisin Gioro Ulhicun theorized that the script died out following the assassination of its inventor Emperor Xizong.

The Tangut script is believed to have been invented around 1036. Modern scholars have tried to reverse-engineer the script with the aid of the Tangraphic Sea's analyses, but the latter are circular and unreliable. Hence the structure of tangraphy remains only partly understood. Although all agree that there is a large number of recurring components ('radicals'), no native source explicitly names these radicals, much less states what their functions are*. I would not go as far as Janhunen (1994: 10), who wrote,

The attempts made so far to analyze the Tangut 'characters' in terms of a 'radical' structure of the Chinese type are so unconvincing that they only corroborate the impression that the Tangut script cannot have been of the 'ideographic' type.

However, I admit that Tangut radicals are often seemingly arbitrary without any obvious semantic or phonetic function: e.g., why do nearly one out of five tangraphs** have Nishida's radical 204

which is often called 'person' though it often appears in tangraphs that have nothing to do with 2dzwio 'person'?

1nɨaa 'black'***, 1na 'dog', 1giəə 'nine'**** - three of the 1,187+ tangraphs with 'person'

A Tangut Hunmin chŏngŭm might clear this up - though experience has taught me that more data means more mysteries in Tangutology.

*The Tangraphic Sea describes tangraphs in terms of the left, right, etc. of other tangraphs; it only implies the existence of radicals without actually including those radicals in its text. Amazingly no dictionary in the vast native lexicographic tradition is organized by radical. Is that absence telling?

**This figure excludes tangraphs with radicals incorporating the shape of 'person': e.g.,

Nishida's radical 302 / 2to 'end'

This tangraph is <demavixe> in Downes' (2008) transliteration system; Nishida's radical 204 / 2dzwio 'person' is <xe>. Downes' (2008: 20) transliteration

is designed to act as a mnemonic to aid in remembering the structure of individual characters - it has no relation to how the language behind the Xixia script may once have sounded.

The graphic relationship between Nishida's radicals 302 and 204 is also visible in their stroke order codes in this Unicode Tangut proposal: DCGQCCCQ and CCCQ.

***The Tangraphic Sea analysis for 'black' is incomplete and circular:


1nɨaa 'black' = ? + right of 1kõ 'night' (which is obviously 'black' plus the mystery component 'person')

The missing first tangraph might be 1na 'dog' which would be phonetic.

1nɨaa 'black' is phonetic in 1na 'dog':


1na 'dog' = left of 1khwiə 'dog' + left and center of 1nɨaa 'black'

1khwiə has an unexpected central vowel; its cognates have front vowels: e.g., Written Tibetan khyi, Written Burmese khveḥ, and Middle Chinese 犬 *kʰwenˀ (whose Old Chinese source is unclear). Perhaps the original vowel was *ɨ.

One might think that the shared components of 1nɨaa 'black' and 1na 'dog' constitute a tangraph with a reading like na, but in fact

'how, where'

was pronounced 2lɨọ! (Is 'person' on its right due to the influence of the sinograph 何 'what' which has 亻 'person' on its left?)

****Is 'person' in the tangraph for 'nine' due to the influence of the sinograph 仇, which has 亻 'person' on its left and was almost homophonous with 九 'nine' in Tangut-period northwestern Chinese?

СИМЪ ПОБѢДИШИ!

I'm going to interrupt my series on Kensiu to ask a few questions that arose when I looked at the images in this Wikipedia entry.

First, what does the Old Church Slavonic phrase симъ побѣдиши on this coat of arms mean? The literal Russian translation is этим победишь (which is oddly absent from the Wikipedia article on that phrase), but I'm still lost.

The second word побѣдиши is 'thou wilt conquer'.

I would expect the first word симъ to be either accusative or instrumental ('with this' being a loose equivalent of the original Latin in hoc signo 'in this sign'). However, it seems to be dative plural according to this paradigm. (Its Russian translation этим can either be dative plural or instrumental singular; although the former corresponds to Old Church Slavonic симъ, the latter corresponds to Old Church Slavonic симь with a different final vowel.) Did Old Church Slavonic побѣдиши take dative objects, unlike modern Russian победишь, which takes accusative objects? That seems unlikely according to this description of the Old Church Slavonic dative. Or is симъ a later substitute for the instrumental singular симь 'with this'? (No substitution of jers is mentioned in Wikipedia's description of the Russian recension of Old Church Slavonic.)

Second, the Old Church Slavonic симь 'with this' that I expected has the instrumental ending -мь, the regular reflex of Proto-Slavic *-mĭ. But in Russian, the instrumental ending is hard -м as in Belarusian (which lost /mʲ/ in word-final position) and Ukrainian (which has no /mʲ/ unlike its two sisters). Was there irregular loss of *-ĭ in Russian? Russian 'seven' has the expected reflex of *-ĭ: i.e. palatalization of *m.

Proto-Slavic *sedmĭ > Russian семь, Belarusian сем, Ukrainian сім

The Ukrainian vowel і may look irregular at first glance, but it is the regular reflex of *e before *-ĭ.

Third, why does the -и of the Old Church Slavonic second person singular ending -ши in побѣдиши correspond to nothing* in Russian and other modern Slavic languages? Did they completely lose a vowel that Old Church Slavonic retained from Proto-Slavic *-ši?

Fourth - veering off-topic, I know - why does the -и of the Old Church Slavonic infinitive ending -ти (< Proto-Slavic *-ti; e.g., in побѣдити 'to conquer') have yet another set of correspondences?

A. Syllabic

Ukrainian -ти /ty/

Serbo-Croatian and Slovene -ti

B. Mixed: consonantal and syllabic

Russian unaccented** -ть as well as accented -ти

Belarusian -ць as well as -ці (in есці 'to eat') and -чы (in бегчы 'to run'); are all three always unaccented?

C. Consonantal

Czech -t



I was going to go even more off-topic and ask about the divergent developments of Proto-Slavic -ть (ranging from -ть to zero), but I rediscovered Schenker's (2003: 97) answer to that question.

Fifth, what does this standard say? What I see looks like

... И

... Е.С.ТДЪ

The second line looks like a very strange acronym. Are the periods not periods? All I'm really certain about is -ДЪ.

*The -ь in Russian -шь /š/ is currently silent, though it was once /ĭ/. It is obviously a reflex of Proto-Slavic *-i, though it still looks irregular to me.

**Of course a nonsyllabic consonant can't carry an accent.

KENSIU 3: NASAL VOWELS

Kensiu has a large vowel inventory. Nearly every oral vowel has a nasal counterpart with the exceptions of /ə/ and retroflex /ɚ/. That reminds me of Tangut which

- also has nasal counterparts of most of its oral plain (nontense, nonretroflex) vowels

- also lacks nasal schwa, though it has nasal diphthongs with schwa

- has only three nasal retroflex vowels

Tangut nasal vowels

Grade \ Cycle Plain Extra* Tense Retroflex
IV: high, more palatal iẽ iã iõ iõõ - iẹ̃ iõʳ
III: high, less palatal ɨĩ ɨẽ ɨã ɨõ ɨõõ - ɨẹ̃ ɨõʳ
I: mid əĩ - əũ - õʳ
II: low - ɛ̃ æ̃ ɔ̃ ɔ̃ɔ̃ - - -

Nasal retroflex vowels are very rare (I only know of Kalash; UPSID has no examples) and I have never seen nasal tense vowels anywhere else, so I suspect that Tangut briefly developed them only to lose nearly all of them: e.g.,

*Cɯ-reNH > *2riẽʳ > 2riẽ (see the eleven tangraphs with this reading here)

*S-soN > *1sọ̃ > 1sọ 'three' (cf. Classical Tibetan gsum 'id.')

The few remaining nasal retroflex and tense vowels at the time of the Tangraphic Sea (the 11th century) might have been long gone when the last known Tangut inscription was made.

I can think of three sources for nasal vowels:

Onsets: 毛 *maw > 大埔元洲仔 Tai Po Yuen Chau Tsai mõ 'hair'

Medials: Latin bonum > Old Portuguese bõo > Portuguese bom [bõ] 'good'

Portuguese orthographic final -m is not a retention of Latin -m

Codas: 名 *meŋ > 大埔元洲仔 Tai Po Yuen Chau Tsai miã 'name'

Given that "the incidence of final nasal consonants is very low" (Bishop 1996), I suspect that Kensiu nasal vowels are from earlier nasal codas: e.g.,

*ʔɛtiNʔ > ʔɛtĩʔ 'to disobey'

However, I have yet to test that hypothesis with comparative data: i.e., look for cognates that have nasal codas corresponding to Kensiu nasal vowels.

*3.19.0:54: Two diphthongs (rhymes 104 and 105) were placed in a fourth cycle that I call 'extra'. Rhyme 104 (əũ) was only in Chinese loanwords and rhyme 105 (ya) has baffled me for years.

Tangut nasalized u-vowels may have been the first to lose their nasality; other vowels may have followed over the centuries. Words with rhyme 104 were borrowed after native *əũ was lost.

KENSIU 2: AUTONYMS, ETHNONYMS, GLOSSONYMS

I'm going to carefully read Bishop (1996) to try to answer the questions I raised in my last post (which counts as "Kensiu 1"). I'll be posting my thoughts as I go along.

Let me start with the name of the language itself. Where does Kensiu (Kensiw) come from? It's not from its speakers' autonym Mani*. It doesn't seem to be either Thai** or Malay in origin. Is it a name from a third language: i.e., one spoken by some minority group near the Mani? Is it a Kensiu word, and if so, what does it mean?

*3.18.2:36: According to Bishop (1996: 233), the Kensiu word is maniʔ 'Negrito'. Hamilton (2003: 82) glossed it as 'human being'.

In Thai, the Mani are มันนิ <manni> manníʔ with an extra -n-. I am surprised the name was not borrowed as มานิ <māni> *maaníʔ or มะนิ <mḥni> máníʔ. (I have assigned default tones to all of the above strings based on their segments and syllable structure. Loanwords may have irregular tones.)

I do not know if Kensiu word maniʔ contains two major syllables or a minor syllable followed by a major syllable.

**3.18.2:40: Thai กันซิว <kanziw> kansiw 'Kensiu' has an unexpected first vowel. (The <z> is needed for a mid tone in the second syllable.) I expected เก็นซิว <kĕnziw> kensiw or เกินซิว <kənziw> kənsiw since I do not know if the e in Malay Kensiu 'Kensiu' is [e] or [ə].

In Malaysia, Kensiu speakers are called Kensiu. Perhaps Kensiu (Kensiu kensiw?) is the ethnic group's original autonym which was abandoned by members in Thailand who adopted maniʔ as their autonym and only use kensiw as the name of their language.

If there is a Kensiu word kensiw, I do not know if it contains two major syllables or a minor syllable followed by a major syllable.

