Archives

14.9.29.23:59: NOT BEING THERE ANYMORE: RUSSIAN GERUND VARIANTS

Russian has several types of gerund suffixes. Six books for English-speaking learners include notes on when to use them:

Aspect Suffix Example Reiff (1883: 181) Forbes (1916: 171) Arant (1981: 119) Pul'kina & Zakhava-Nekrasova (1992: 371) Offord & Gogolitsyna (2005: 328) Wade (2011: 386, 389)
Imperfective (-shibilant) -я / (+shibilant) -а встречая 'meeting' "written tongue"  
(consonant +) -учи / (vowel +) -ючи встречаючи 'meeting' "familiar language" "peasants", "popular poetry" not mentioned; even будучи 'being' is absent "popular parlance"; "generally avoided in the modern literary language" with the sole exception of
будучи 'being'
only будучи 'being' only будучи 'being', three others*
Reflexive imperfective (-shibilant) -ясь / (+shibilant) -ась встречаясь 'meeting'     
Perfective (-shibilant) -я / (+shibilant) -а войдя 'having entered' "common with reflexive verbs"
(vowel +) -в встретив 'having met' "written tongue"   interchangeable   "preferred in written styles" to -я/-а
(vowel +) -вши встретивши 'having met' "familiar language" "peasants", "popular poetry" "less frequently" used than -в "archaic flavour"; "may also occur" in "the colloquial register" or "demotic" not mentioned
(consonant +) -ши вошедши 'having entered'   "rarely used"  
Reflexive perfective (-shibilant) -ясь / (+shibilant) -ась разбредясь 'having wandered in different directions'  
(vowel +) -вшись встретившись 'having met'
(consonant +) -шись ведшись 'having been in progress'

Comments:

1. Is -учи /-ючи still in "popular parlance" today?

Google Ngrams has no data for встречаючи, so here are two more pairs of gerunds:

читая vs. читаючи 'reading' (the former is always more common)

делая vs. делаючи 'doing' (ditto)

The title refers to the most common surviving -учи /-ючи gerund in Будучи там, the Russian title of Being There (1979). Будучи 'being' has 4.99 million Google results. See below for the Google statistics of other -учи /-ючи gerunds mentioned in Wade (2011).

2. I am surprised that the perfective gerund suffix -я/-а is still around. It could be confused with the homophonous imperfective gerund suffix (though the latter attaches to many more stems). And yet войдя 'having entered' is much more common than its synonym вошедши in Google Ngram Viewer. (There is no risk of confusing inflected imperfective and perfective gerunds [i.e., stem-suffix sequences] as opposed to suffixes in isolation as long as each aspect has a different stem: e.g., the imperfective gerund corresponding to войдя/вошедши 'having entered' is входя 'entering' with a different stem вход-.)

3. I am also surprised that -вши is in decline (Wade does not even mention it!) though its reflexive counterpart -вшись is common.

встретивши was once more common than встретив, but their fortunes reversed shortly before the Revolution.

In short, I would expect the imperfective and perfective gerund suffixes to be maximally differentiated over time and internally consistent:

-я(сь)/-а(сь) vs. -(в)ши(сь)

But that's not the case!

9.30.1:36: Added a column for Offord & Gogolitsyna (2005: 328) and Google Ngrams links.

*The three are

едучи 'traveling' "is sometimes found in poetic or folk speech" (p. 386; 97,300 Google results)

жить припеваючи 'to live in clover' (p. 386; 124,000 Google results)

крадучись 'stealthily' (p. 394; 391,000 Google results)


14.9.28.23:14: TRANSCARPATHIAN RUSYN MASCULINE 'JA-NIMATES'

The Transcarpathian Rusyn (TR) and Prešov Rusyn (PR) masculine animate declension in Magocsi (1979: 83) and Magocsi (1979: 83) is straightforward in the singular: all endings are added to an invariable stem brat-:

Case Proto-Slavic TR PR Ukrainian Belarusian Russian Serbo-Croatian Polish Slovak Czech
nominative *bratrŭ brat bratr
genitive *bratra brata bratra
dative *bratru bratu, bratovy bratovi bratu, bratovi bratu bratovi bratru, bratrovi
accusative *bratrŭ brata bratra
instrumental *bratromŭ bratom bratam bratom bratem bratom bratrem
locative *bratrě bratu bratovi brati, bratovi bracie brate bratu bracie bratovi bratru, bratrovi
vocative *bratre brate bracie - brate bracie - bratře

However, the TR plural has an unexpected -j- in some forms:

Case Proto-Slavic TR PR Ukrainian Belarusian Russian Serbo-Croatian Polish Slovak Czech
nominative *bratri bratȳ braty braty brat'ja braća bracia bratia bratři
genitive *bratrŭ bratüv brativ brativ bratoŭ brat'jev braće braci bratov bratrů
dative *bratromŭ bratüm, bratjam bratom bratam brat'jam braći braciom bratom bratrům
accusative *bratry bratüv brativ brativ bratoŭ brat'jev braću braci bratov bratry
instrumental bratamy bratami brat'jami braćom braćmi bratmi
locative *bratrěchŭ bratjach bratoch bratach brat'jach braći braciach bratoch bratrech
vocative *bratri braty - braty - braćo bracia - bratři

The TR dative and locative plurals resemble the Russian plurals, but that must be a coincidence, as TR is not contiguous with Russian; it is spoken in the Transcarpathian Oblast' "which borders upon four countries: Poland, Slovakia, Hungary, and Romania." I wonder if those TR ja-plurals were influenced by Polish whose ci is from *tj. The TR nominative plural is unlike those of Polish or Slovak.

TR bratüm < *bratomŭ may be an older TR dative plural or a very old borrowing from Slovak predating *o-fronting and *-ŭ-loss.

Moreover, the Russian plural forms are based on an old feminine collective which must have replaced an earlier regular masculine plural *braty still preserved in the other East Slavic languages. On the other hand, all non-j TR forms are from brat- rather than the feminine collective *bratĭja.

The Serbo-Croatian 'plural' braća 'brothers' is still a feminine collective singular unlike Russian brat'ja which takes plural endings except in the old nominative singular (now reinterpreted as a plural). Hence none of its endings are cognate to those of the original masculine plurals.

Polish has a mixture of old singular and plural forms of that collective. I assume the old feminine accusative singular *bracię has been replaced by the old feminine genitive singular braci to conform to the genitive-as-accusative pattern of masculine animates. (23:30: The old feminine vocative singular would have been *bracio; it has been replaced by the old nominative singular since masculine plurals have identical vocatives and nominatives.)

Slovak combines that collective (reinterpreted as a masculine plural) in the nominative with forms of brat- in all other cases.

Notes on other forms

Stem: Only Czech preserves the second *-r-.

Nominative/accusative singular: Originally identical but differentiated later when the genitive came to be used as the accusative singular. See Schenker (1993: 108).

Dative/locative singular: Apparently partly merged in TR and Ukrainian. Fully merged in PR, Slovak, and Czech. Dative for locative reminds me of the dative after German prepositions.

What is the origin of -ovy/-ovi?

PR y normally does not correspond to Ukrainian i. Why does PR have -y instead of -i?

Instrumental singular: Did Polish and Czech generalize -em from other paradigms? Czech -em in this paradigm must postdate *r shifting to ř before *e (a change visible in the vocative).

Belarusian unstressed *o became a.

Nominative plural: In spite of my transliteration, TR/PR bratȳ [bratɨ] is homophonous with Belarusian braty [bratɨ] but not Ukrainian braty [bratɪ].

Genitive plural: Originally homophonous with nominative and accusative singular. How did *-ovŭ (the source of most forms above) and *oː (the source of Czech ů [uː]) develop?

The *o before fronted to ü in TR and lost its rounding in PR and Ukrainian.

*-v became Belarusian -ŭ.

Russian -ev is an allomorph of -ov after -j-.

Dative plural: Is -a- instead of -o- in most of East Slavic other than TR and PR by analogy with the instrumental -ami?

Is PR bratom due to Slovak influence postdating the *o > i shift before *ŭ?

Is TR bratüm due to Slovak influence predating *o-fronting?

Accusative plural: Czech preserves the original homophony of accusative and instrumental plural. All other modern languages have accusative plurals from genitive plurals.

Instrumental plural: Schenker (1993: 89) could not explain the original ending *-y. It was replaced by -mi endings by analogy with other declensions.

-a- in East Slavic could be from the -ami of the -a-declension.

Locative plural: Is Czech the only language in the table with a reflex of *ě? Most of East Slavic seems to have generalized -a- from the instrumental and/or dative plural. Polish braciami has the -ami of an a-declension instrumental plural. Slovak may have generalized -o- from the genitive/accusative and/or dative plural. PR o must be from the dative plural since *o borrowed from the old genitive/accusative plural *-ovŭ would have fronted to *i.

9.28.23:57: I forgot to ask if the -j- in TR bratjam and bratjach is in all masculine consonant-final dative and locative plural forms or is only in a subset of those forms. I could answer my own question by looking for all masculine consonant-final dative and locative plural forms in Magocsi (1979), but my copy is not machine-searchable, and that would be time-consuming. My guess is that (1) TR brat belongs to a small class of masculine animate nouns which once had alternate plurals based on feminine singular collectives and (2) all other TR masculine animate nouns share the endings -am and -ach with masculine inanimates and neuters.


14.9.27.19:03: НЕСПРАВНІ СЛОВА

Magocsi (1979: 82) listed fifteen English loanwords in American Rusyn that he regarded as "incorrect" (несправні <nespravni>). They contain a number of surprises from an English speaker's perspective:

1. 'Displaced' stress

Verbs are borrowed with the stressed suffix -ва́ти <váty>. The English roots are unstressed: e.g.,

bother > бадерова́ти <baderováty> (not *báderovaty)

Is the stress in this word by analogy with other -ня <-nja> words?

grocer > ґросе́рня <grosérnja> (not *grósernja)

The stress in 'watch out!' is by analogy with its native equivalent:

watch > вачу́йте <vačújte> (not *váčujte) : мирку́йте <myrkújte>

Also see 'cookies' and 'cousin' below.

2. Assignment of monosyllabic consonant-final nouns to the feminine -a declension

yard > я́рда <járda> (not *jard)

car > ка́ра <kára> (not *kar)

mine > ма́йна <májna> (not *majn)

but

store > штор <štor> (not *štóra; the initial consonant is irregular)

Polysyllabic consonant-final nouns were assigned to the masculine consonant-final declension:

carpet > ка́рпет <kárpet>

closet > кла́зет <klázet>

3. Double plural

cookies > куке́сы <kukésȳ>

I suppose the Rusyn plural ending was added to kukés- because *kuki would end in an un-Rusyn -i- and could not be declined.

Is there a singular kukés?

I'm surprised the stem isn't *kúkiz-.

4. Spelling-based borrowings?

Rusyn y is [ɪ].

cousin > кузи́н <kuzýn> (not *kázyn)

picture > пі́кчер <píkčer> (not *pýkčer)

run (?) > рунова́ти <runováty> 'to drive' (not *ranováty)

The -e- in kukésy 'cookies' may also be influenced by spelling.

5. Vowel not matching spelling or pronunciation

drive > дрейвова́ти <drejvováty> (not *drájvovaty)

Oddities like this make me wonder about the dialect(s) and nonnative, non-Rusyn English that Rusyn speakers heard.


14.9.26.23:59: _DENT_F_KAC_JA

If someone asked me how to distinguish between modern written Russian, Belarusian, and Ukrainian without actually knowing the languages, I'd tell them to look for letters specific to each orthography:

ъ <''> is only in Russian

є <je> and ї <ji> are only in Ukrainian

ў <ŭ> is only in Belarusian

The problem with that approach is the low frequency of those letters:

ъ <''> is the rarest letter in Russian

є <je> and ї <ji> are eight-point letters in Ukrainian Scrabble

ў <ŭ> is the 12th least frequent letter* in the Narkamaŭka Belarusian orthography and the 11th least frequent letter in the Taraškievica orthography

Here is a different approach using higher-frequency letters:

- if a text contains і, it is either Ukrainian or Belarusian

- if a text contains і and и, it is Ukrainian

- if a text contains і and ы, it is Belarusian

- if a text contains и and ы, it is Russian
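The letter test above can be written out as a few lines of Python. This is only a minimal sketch, not a real language identifier: the function name guess_east_slavic is hypothetical, and I have relaxed the last rule so that и without і counts as Russian even if ы happens to be absent from a short text.

```python
def guess_east_slavic(text: str) -> str:
    """Guess Russian / Belarusian / Ukrainian from the letters і, и, ы alone."""
    letters = set(text.lower())
    has_dotted_i = "\u0456" in letters   # і (Cyrillic, not Latin i)
    has_i = "\u0438" in letters          # и
    has_yery = "\u044b" in letters       # ы

    if has_dotted_i and has_i and not has_yery:
        return "Ukrainian"
    if has_dotted_i and has_yery and not has_i:
        return "Belarusian"
    if has_i and not has_dotted_i:
        # Assumption: и without і is enough for Russian, with or without ы.
        return "Russian"
    return "undetermined"


for word in ("идентификация", "ідэнтыфікацыя", "ідентифікація"):
    print(word, "->", guess_east_slavic(word))
```

A text containing all three letters, or none of them, falls through to "undetermined" rather than being forced into one of the three languages.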

This table shows the distribution of the three letters:

Letter | Russian | Belarusian | Ukrainian
і | (not used) | /i/ | /i/
и | /i/ | (not used) | /ɪ/
ы | /ɨ/ | /ɨ/ | (not used)

Note that и has different phonemic values in Russian and Ukrainian.

і is the third most frequent letter in Belarusian and a one-point letter in Ukrainian Scrabble.

ы is the 5th most frequent letter in the Narkamaŭka Belarusian orthography and the 4th most frequent letter in the Taraškievica orthography, but is the 19th most frequent letter in Russian.

The Russian, Belarusian, and Ukrainian words for 'identification' exemplify the different distributions of those letters:

R идентификация <identifikacija>

B ідэнтыфікацыя <identyfikacyja>

U ідентифікація <identyfikacija>

The Russian word would be an even better example if it contained ы as well as и.

Belarusian has one difference absent from the table above: э where the others have е.

So far, so good. But then I finally got around to looking at the Rusyn alphabet this week. I've known about Rusyn for years without knowing that its alphabet was like a combination of the Russian and Ukrainian alphabets. It has

- ё, ы, ъ like Russian

- є, і, ї like Ukrainian

I don't know anything about Rusyn, much less its historical phonology. My guess is that Rusyn did not merge *y and *i unlike Ukrainian:

Proto-Slavic | Russian | Belarusian | Ukrainian | Rusyn?
*y | ы | ы | и | ы
*i | и | і, ы | и | и
*ě | е | е, я | і | і

Did Pannonian Rusyn merge all three vowels into и? If so, then it is like Ikavian Serbo-Croatian in that respect.

On Tuesday I discovered a Transcarpathian Lemko variant of the Rusyn alphabet with two more letters in Magocsi (1979): ӱ <ü> and ю̈ <jü>.

ӱ <ü> is from *o before a short high vowel:

*nočĭ 'night' >

Russian ночь <noč'>

Belarusian ноч <noč>

Lemko нӱч <nüč> (fronting) (p. 14)

Ukrainian ніч <nič> (fronting and loss of rounding)

I can't explain this correspondence:

*děvica 'girl' >

Russian девочка <dеvočkа>

Belarusian дзяўчына <dzjaŭčyna>

Lemko дӱвочку <düvočku> 'girl' (acc. sg., p. 23) (I would expect *divočku)

(9.27.0:05: I'm pretty sure the nom. sg. is düvočka. Did *i round before *o?)

Ukrainian дівчина <divčyna>

ю̈ <jü> is much rarer than ӱ <ü>. Here are two examples from *e before a short high vowel:

*medŭ 'honey' >

Russian and Belarusian мёд <mjod> [mʲot]

Lemko мню̈д <mnjüd> [mɲyd] (p. 37)

(9.27.0:30: Lemko [mɲ] is reminiscent of Czech [mɲ] from *mj-, though Lemko and Czech are not contiguous. Lemko's neighbor Slovak has [m] corresponding to Czech [mɲ].)

Ukrainian мед <med> [mɛd]

*anŭgelŭ 'angel' >

Russian ангел <angel>

Belarusian анёл <anjol>

(9.27.0:32: Coincidentally reminiscent of Slovak anjel. Did Belarusian simplify *ng to n?)

Lemko агню̈ль <ahnjül'>, ангел <anhel> (p. 52)

(The former has an irregular palatalized -l' and the latter looks like a later loan.)

Ukrainian ангел <anhel>

Another example is from *ju before a short high vowel:

*ključĭ 'key' >

Russian, Belarusian, and Ukrainian ключ <ključ>

Lemko клю̈ч <kljüč>

*The Belarusian frequency lists include Russian letters absent from Belarusian at the bottom: и, ъ, щ. I presume those letters appeared in Russian names and words in Belarusian texts. I have excluded those letters from my ranking.


14.9.25.23:51: MIENSK I MINSK

(The title is from Менск і Мінск 'Miensk and Minsk', the first song I ever heard in Belarusian.)

I was puzzled by this section of the English Wikipedia entry on Minsk:

The Old East Slavic name of the town was Мѣньскъ (i.e. Měnsk < Early Proto-Slavic or Late Indo-European Mēnĭskŭ), derived from a river name Měn (< Mēnŭ). The direct continuation of this name in Belarusian is Miensk (pronounced [mʲɛnsk]). The resulting form of the name, Minsk (spelled either Минскъ or Мѣнскъ), was taken over both in Russian (modern spelling: Минск) and Polish (Mińsk), and under the influence especially of Russian it also became official in Belarusian. However, some Belarusian-speakers continue to use Miensk (spelled Менск) as their preferred name for the city.

It does not explain where Minsk came from. The standard Belarusian reflex of Proto-Slavic *ě ('jat') is e (with palatalization of the preceding consonant indicated by -i- in Łacinka). Russian has the same reflex of jat. Among the East Slavic standard languages, only Ukrainian has i from jat. The Slavic root for 'white' in Беларусь Belarus' 'White Rus' ' has jat:

Proto-Slavic *běl-

Belarusian бел- bieł- [bʲɛl]

Russian бел- bel- [bʲɛl] (Łacinka disguises the fact that the Belarusian and Russian roots are homophonous)

Ukrainian біл- bil-

Polish biał- [bʲaw]

(More descendants here.)

One might think that Minsk is a borrowing from Ukrainian (in which the word is Мінськ Mins'k with the shift ĭs > s'), and in fact Vasmer credits Ukrainian influence rather than outright borrowing. The Belarusian Wikipedia in the current official orthography states that according to Aničenka (1987), the spelling Minsk adopted in 1939 incorporates the Ukrainian reflex of jat.

The Taraškievica Belarusian and Russian Wikipedias mention another explanation by Abremska-Jabłońska in Kramko and Štychaŭ 2001: the influence of the Polish name Mińsk (Mazowiecki) '(Masovian) Minsk'.

The Russian Wikipedia says the i-spelling in Latin dates from 1502 when Minsk was under Lithuanian rule. The Polish-Lithuanian Commonwealth was still 67 years in the future.

At first I thought it was likely that the Poles renamed Minsk after their own Mińsk, but why would non-Poles* alter the name to match a name in a foreign country? And centuries later, why would the BSSR adopt a Ukrainianized name for Minsk?

Here is an uninformed guess: Did the originators of the spelling Minsk perceive the local Belarusian reflex of jat to be i-like: i.e., an [e] or [ɪ] higher than Belarusian e [ɛ]? Such a high reflex could have later lowered and merged with [ɛ]. Or this hypothetical high-jat dialect could have been replaced by an [ɛ]-jat dialect.

*9.26.0:52: I don't know who wrote the Latin documents containing Minsk. They could have been Lithuanians or Belarusians. In any case, they did not have the option of writing a higher e with the dotted letter ė which was absent from the earliest Lithuanian alphabet of 1547. (In modern Lithuanian orthography, plain e is [ɛ] and dotted ė is [eː]. The Lithuanian Wikipedia article on the Lithuanian alphabet gives me the impression that dotted ė is only a little over a century old.)


14.9.24.23:54: BROTHER-IN-LAWS IGOR AND OLEG

I am barely a dilettante at Slavic, so I constantly fear that I am raising Comparative Slavic 101-level questions whenever I bring up the subject. Yesterday I asked why *e in *děverĭ 'brother-in-law' didn't raise in Ukrainian. Today I learned that the late George Shevelov himself (1979: 309) wasn't sure:

The reason for the appearance of e in [standard Ukrainian] díver 'husband's brother' is unclear. Could it be an influence of NU [northern Ukrainian] dialects where e is restored in unstressed syllables?

So maybe that wasn't such a bad question after all. I don't know about these next questions, though.

Another word from my last post, Russian Igor' / Ukrainian Ihor / Belarusian Ihar, is from Old East Slavic In(ŭ)gvarŭ which in turn is a loan from Old Norse Ingvarr. Let's go through this word from left to right:

According to Shevelov, nasal + consonant sequences did not exist at the time. Hence there were four options to deal with Old Norse Ing-:

1. Borrow as is in spite of native phonotactics: Ingvarŭ

2. Insert ŭ to break up the ng-cluster: Inŭgvarŭ

3. Drop the n to avoid the ng-cluster: the ancestor of Igor'

4. Replace In- with native nasalized Ę- to break up the ng-cluster.

All but the last option were exercised. A nasal vowel would have become *Ja- in modern forms like Russian *Jagor', etc.

G weakened to h in Ukrainian and Belarusian.

I have not seen the change *va > o anywhere in Slavic. Are there other examples? Was Old Norse va something like [wɒ] or [wɔ] which would have been close to Old East Slavic *o? Belarusian a in Ihar is from o and is not a direct retention of the Old Norse vowel.

Why does Russian have -r' < *-rĭ if the Old East Slavic word ended in *-ŭ?

Ukrainian final -r in theory could be from either *-rŭ or *-rĭ, but the -r of Ihor must be from *-rĭ since palatalized r appears before endings: Ihorja instead of *Ihora, etc.

Belarusian r is always unpalatalized, so the endings of Ihar do not reveal whether its -r was from *-rŭ or *-rĭ: e.g., Ihora, etc.

Another Norse name in East Slavic is Russian Oleg / Ukrainian Oleh / Belarusian Aleh from Old Norse Helgi via Old East Slavic Olĭgŭ.

Old East Slavic had no H-. (As already stated, the later h of Ukrainian and Belarusian is from g.) Old Norse -e- was borrowed as Old East Slavic Je- with a prothetic J-. This Je- then became Jo- and ultimately O-; cf. Proto-Slavic *ezero > Russian/Ukrainian ozero / Belarusian vozera 'lake'. Belarusian lowered unstressed O- to A-.

'Strong' ĭ before a 'weak' ŭ lowered to e in East Slavic. (See Wikipedia on the 'strong'/'weak' distinction.)

Why does the -i of Old Norse Helgi correspond to Old East Slavic -ŭ instead of -ĭ? I am reminded of how Russian third person verb endings end in -t from -tŭ instead of the expected -t' from -tĭ corresponding to Ukrainian -t', Belarusian -c', and - far outside Slavic - Sanskrit -ti.


14.9.23.23:59: BROTHER-IN-LAW IGOR THE EEL

One feature that distinguishes standard Ukrainian (hereafter simply 'Ukrainian') from the other major Slavic languages is i from *o before a consonant followed by *ĭ or *ŭ: e.g.,

ніч nich < *nochĭ 'night' (cf. Russian ночь noch')

кіт kit < *kotŭ 'cat' (cf. Russian кот kot)

Last Friday, it occurred to me that if Russian noch' corresponds to Ukrainian nich, then Russian Игорь Igor' should correspond to Ukrainian *Ігір *Ihir. (Russian -ь -' is a trace of *ĭ, *g weakened to h in Ukrainian, and Ukrainian palatalized r' lost its palatalization except before vowels.) But the actual Ukrainian name is Ігор Ihor with o.
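The prediction in the paragraph above can be spelled out mechanically. Here is a minimal sketch of a deliberately naive rule: *o or *e becomes i before one or more consonants plus a word-final *ĭ or *ŭ, and the final vowel is then dropped. The function name naive_ukrainian_reflex and the pre-form igorĭ are my own assumptions for illustration, "consonant" just means any letter not in a short vowel list, and g > h is left out.

```python
import re
import unicodedata

JERS = "\u012d\u016d"                              # ĭ, ŭ
VOWELS = "aeiouy" + "\u011b\u01eb\u0119" + JERS    # plus ě, ǫ, ę

# *o or *e, then one or more non-vowels, then a word-final short high vowel.
RAISING = re.compile(rf"[oe](?=[^{VOWELS}]+[{JERS}]$)")
FINAL_JER = re.compile(rf"[{JERS}]$")


def naive_ukrainian_reflex(proto: str) -> str:
    """Apply the raising rule blindly, then drop the final short high vowel."""
    form = unicodedata.normalize("NFC", proto.lstrip("*"))
    return FINAL_JER.sub("", RAISING.sub("i", form))


for proto in ("*nochĭ", "*kotŭ", "*igorĭ"):
    print(proto, ">", naive_ukrainian_reflex(proto))
```

The rule gives nich and kit as expected, but it also gives igir where the actual name has o. That mismatch is the whole point of the sketch, not a bug in it.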

Similarly, the Ukrainian cognate of Russian угорь ugor' < *ǫgorĭ 'eel' is вугор vuhor, not *вугір *vuhir. (Prothetic v- is common before stressed *u in Ukrainian. I don't know why the stress moved to o after prothesis. Russian retains the original initial stress.)

Ukrainian i is also partly from *e before a consonant followed by *ĭ or *ŭ: e.g.,

сім sim < *sedmĭ 'seven' (cf. Russian семь sem')

обмін obmin < *obmenŭ 'exchange' (cf. Russian обмен obmen)

(9.24.22:49: According to Shevelov 1979: 322 and 1993: 950, *e did not raise before *ŭ unless it received retracted stress:

*médŭ > мед med (not *мід *mid) 'honey' (cf. disyllabic forms with initial stress: médu, etc.)

*neslŭ́ > ніс nis 'carried' (cf. disyllabic forms with final stress: neslá, etc.; my assumption is that all disyllabic forms including the source of nis originally had final stress)

Could this be restated as *e raising before a stressed *ŭ? If so, why did *e raise in *obmenŭ? Russian obmén, obména, etc. has root stress whereas Ukrainian óbmin, óbminu etc. has prefixal stress. Is either stress original, or did *obmenŭ once have final stress?)

However, the Ukrainian cognate of Russian деверь dever' < *děverĭ 'brother-in-law' is дівер diver, not *дівір *divir. (I is the regular Ukrainian reflex of *ě.)

Did *o and *e regularly fail to raise in Ukrainian before word-final *r and a short high vowel? *o did raise before word-medial *rĭ in гіркий hirkyj < *gorĭkij 'bitter' (cf. Russian горький gor'kij; y is the regular Ukrainian reflex of noninitial *i).


14.9.22.23:02: A DIP INTO WHITE WATERS (PART 10): XIANGNAN TUHUA PROTO-TONES

I am normally skeptical of attempts to reconstruct proto-tone contours (as opposed to proto-tone categories), but against my better judgment, I wanted to see what I could do with the two 湘南土話 Xiangnan Tuhua 'local speech of southern Hunan' tone systems available at the 小學堂 Xiaoxuetang database: one from 白水村 Baishuicun (BSC) 'White Water Village' and another from 道 Dao County.

The overall picture of tone category evolution is clear:

Old Chinese had no tones.

Middle Chinese could be defined as the first stage of tonal Chinese. It might be more accurate to describe very early Middle Chinese as having phonations (clear / creaky / breathy) than tones. These phonations became phonemic after the consonants that conditioned them were lost. They probably developed into tones at different rates in different dialects.

Middle Chinese had four tonal categories:

平 'level' vs. 上 'rising'

去 'departing' vs. 入 'entering'

The Middle Chinese names of the tones exemplify them: e.g., *bɨeŋ 'level' has a 'level' tone, etc.

The first two tones may have had level and rising contours in the dialect spoken by whoever coined those names which are first attested in the fifth century AD. That does not mean those categories were level and rising in other dialects of that period or later periods.

The names 'departing' and 'entering' may imply that those tones were perceived as opposites in some way but do not hint at contours. It is tempting to regard 'departing' as falling since the modern standard Mandarin reflexes of the first three tones after *voiceless initials are high level*, low rising**, and high falling, but there is no guarantee that the currently dominant Chinese language just so happens to preserve contours that are over 1,500 years old.

Later the four tones developed yin and yang allophones after different initial classes that became phonemic when initial distinctions were lost.

Modern reflexes of the four tones vary considerably: e.g., words that once had the 'departing' tone can have level tones (as in Cantonese) or rising tones (as in Shanghai). I use single quotation marks to distinguish between tone names and contours; the latter are written without quotation marks.

I have listed the tones of BSC and Dao in part 9. I reconstruct a seven-tone system for their common ancestor Proto-Xiangnan Tuhua (PXT):

Initial \ coda | 'level': *-sonorant | 'rising': *-ʔ | 'departing': *-s | 'entering': *-p/t/k/kʷ
'yin': *voiceless ('clear') | *high falling (54) | *high level (55) | *mid level (33) | ?*high rising (45) + no stop
'yang': *voiced ('muddy') | *low falling (31) | *low level (22) | ?*mid falling (43~42) | (+ no stop < *yang entering)

The merger of yang 'departing' and yang 'entering' may be an innovation distinguishing PXT from the rest of Chinese. If other PXT dialects retain a distinct yang 'entering' tone, an eighth tone will have to be reconstructed in the future.

As I wrote last night,

Yin/yang is correlated with height for the 'level' and 'rising' tones (yin : higher, yang : lower) but not for the 'departing' tones which have the opposite pattern (yin : lower, yang : higher).

So I did not hesitate to reconstruct higher yin and lower yang 'level' and 'rising' tones. The contours are more questionable. Dao has two falling 'level' tones and BSC has only one falling 'level' tone. It is simpler to assume that one tone became falling in BSC than to assume that two tones became falling in Dao.

If PXT 'level' tones were falling, then PXT 'rising' tones could not be falling (unless they were falling with a creaky phonation absent from 'level' tones). I project the level 'rising' tones of Dao back into PXT and regard the contours of the BSC tones as innovations.

BSC has merged the yang 'rising' and yin 'departing' tones into a single tone:

PXT *mid level + *low level > pre-BSC *nonhigh level > BSC low falling

Dao still has a distinct mid level yin 'departing' tone which I regard as a retention of PXT.

Up to this point, the PXT system is identical to the Dao system. Is Dao really that conservative?

I am most reluctant to reconstruct the last two tones. The BSC and Dao contours are so different:

Tone BSC Dao
Yin 'entering' 55 35
Yang 'departing/entering' 33 52

If I average the contours, I get *high rising (45) for yin 'entering' and *mid falling (43~42) for yang 'departing/entering'. This almost fits the general yin-higher/yang-lower pattern. (Yang 'departing/entering' starts slightly higher than the other two yang tones.) Averaging is an act of desperation, not a serious methodology. Hence I have placed question marks before those two tones in my first table.
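The averaging above is simple enough to write out. This is a minimal sketch, assuming each contour is a two-digit string on the five-point scale and that both dialects are weighted equally; average_contour is a hypothetical helper, and it is no more a serious methodology than the averaging itself.

```python
from fractions import Fraction


def average_contour(a: str, b: str) -> list[Fraction]:
    """Average two equal-length tone-digit strings point by point."""
    if len(a) != len(b):
        raise ValueError("contours must have the same number of points")
    return [Fraction(int(x) + int(y), 2) for x, y in zip(a, b)]


yin_entering = average_contour("55", "35")            # BSC 55, Dao 35
yang_departing_entering = average_contour("33", "52")  # BSC 33, Dao 52

print([str(p) for p in yin_entering])
print([str(p) for p in yang_departing_entering])
```

The first result is 4 then 5, i.e., 45. The second is 4 then 5/2, and that half-step endpoint is why I wrote the yang contour as 43~42.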

Final stops in entering tones could have been lost in pre-PXT, paving the way for the yang 'departing/entering' merger.

*High rising after *voiced initials.

**High falling after *voiced obstruent initials.


14.9.21.23:42: A DIP INTO WHITE WATERS (PART 9): DEPARTING A MUDDY ENTRANCE

I wanted to wrap up my series on 白水村 Baishuicun (BSC) 'White Water Village' last night, but I have one more thing to say before I move on.

I have written almost nothing about BSC tones. I have omitted them from all forms in this series. However, they are interesting. Here is the general pattern which has many exceptions:

Initial \ coda | 'level': *-sonorant | 'rising': *-ʔ | 'departing': *-s | 'entering': *-p/t/k/kʷ
'yin': *voiceless ('clear') | 44 | 35 | 21 | 55 + no stop
'yang': *voiced ('muddy') | 41 | 21 | 33 | (+ no stop < *yang entering)

For comparison, here are the tones of another dialect of 湘南土話 Xiangnan Tuhua 'local speech of southern Hunan' from 道 Dao County:

Initial \ coda | 'level': *-sonorant | 'rising': *-ʔ | 'departing': *-s | 'entering': *-p/t/k/kʷ
'yin': *voiceless ('clear') | 54 | 55 | 33 | 35 + no stop
'yang': *voiced ('muddy') | 31 | 22 | 52 | (+ no stop < *yang entering)

BSC and Dao are the only dialects of Xiangnan Tuhua in the 小學堂 Xiaoxuetang database. I don't know how much variation is in Xiangnan Tuhua.

What stands out in those two tables is how the yin entering tone has no yang counterpart. I don't remember ever seeing that elsewhere before. The three usual patterns are:

- no entering tone: e.g., standard Mandarin

- a single entering tone without a yin/yang distinction: e.g., Jin

- yin and yang entering tones: e.g., Cantonese and Taiwanese

Yin/yang is correlated with height for the 'level' and 'rising' tones (yin : higher, yang : lower) but not for the 'departing' tones which have the opposite pattern (yin : lower, yang : higher). This is unlike Cantonese which has consistently higher yin tones. The BSC yin entering tone is high and patterns with the 'level' and 'rising' tones, but the Dao yin entering tone starts lower than the yang departing tone that merged with the *yang entering tone.
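The yin/yang height pattern can be checked against the tone values for both dialects. This is a minimal sketch, assuming that "height" can be approximated by the mean of a contour's tone digits; that measure and the names mean_pitch and higher_register are my own choices for illustration. The entering tones are left out since there is no separate yang entering tone to compare.

```python
# (yin, yang) contours per tone category, copied from the BSC and Dao tables.
TONES = {
    "BSC": {"level": ("44", "41"), "rising": ("35", "21"), "departing": ("21", "33")},
    "Dao": {"level": ("54", "31"), "rising": ("55", "22"), "departing": ("33", "52")},
}


def mean_pitch(contour: str) -> float:
    """Mean of the tone digits, e.g. '35' -> 4.0 (an assumed measure of height)."""
    return sum(int(digit) for digit in contour) / len(contour)


def higher_register(yin: str, yang: str) -> str:
    """Say which of the two contours has the higher mean pitch."""
    yin_height, yang_height = mean_pitch(yin), mean_pitch(yang)
    if yin_height == yang_height:
        return "equal"
    return "yin" if yin_height > yang_height else "yang"


PATTERN = {
    dialect: {category: higher_register(*pair) for category, pair in categories.items()}
    for dialect, categories in TONES.items()
}

for dialect, result in PATTERN.items():
    print(dialect, result)
```

By this measure yin is higher for 'level' and 'rising' and yang is higher for 'departing' in both dialects. Comparing starting pitches instead would not separate BSC 44 from 41, which is why I used the mean.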


14.9.20.23:45: A DIP INTO WHITE WATERS (PART 8): -AI-XCEPTION CLASSES 4-10

Here are the remaining types of 白水村 Baishuicun (BSC) 'White Water Village' -ai forms that are not from non-*a vowels followed by nasals (see part 5).

4. 白 pai and : Middle Chinese *bæk

These look like loans from standard Mandarin bai [paj] and southwestern Mandarin pə. The 小學堂 Xiaoxuetang database does not list Hunan Mandarin forms, so the southwestern Mandarin forms in this post are from Guangxi to the west of Hunan.

5. 擲 tsai : Middle Chinese *ɖiek

I would expect *tsiə. A loanword? But the rhyme doesn't match southwestern Mandarin tsɿ. Maybe tsɿ was borrowed as *tsə whose schwa shifted to -ai. See the next class.

6. 而兒爾 ai : Middle Chinese *ɲɨ, *ɲie, *ɲieˀ

All three are ə in southwestern Mandarin. Perhaps all three were borrowed as *ə which later broke to -ai. That shift is parallel to the shift of Old Chinese *-əˁ to Mandarin -ai (see part 6 for examples of Mandarin -ai forms borrowed into BSC).

7. 日入 ai : Middle Chinese *ɲit, *ɲip < Old Chinese *nit, *nip; 栗 lai : Middle Chinese *lit

These are i, y, and li in southwestern Mandarin. All three once had a final glottal stop. Were they borrowed into BSC as *iʔ, *iʔ, and *liʔ whose *i broke to ai before a now-lost glottal stop? Other BSC forms which may be loans of -i forms without glottal stops do not end in -ai.

The first two have alternate forms.

BSC 入 y (if I am reading Xiaoxuetang correctly) looks like a recent direct loan from southwestern Mandarin.

BSC 日 ɲi is close to Middle Chinese *ɲit and may be an old loan postdating the palatalization of Old Chinese *n-.

BSC 日 na and 入 na may be native. Could their n- be a retention of Old Chinese *n-?

8. 睡 fai could be a borrowing from a southwestern Mandarin form resembling Lingui suei and Luorong suɐi. f- may be a sporadic simplification of *sw-. I cannot find any other examples of f- from sibilants in BSC.

When I wrote part 6, I thought there were eight classes of exceptions, but now I see ten. (I broke up one class into classes 4, 5, and 7. Although 4 and 5 both had *-k, 7 is completely dissimilar and its inclusion was a mistake.)

9. BSC 甫 pai : Middle Chinese *puoˀ

I have no idea why this form has a -i.

10. BSC 些 s-, l- + -ou, -iə, -ai, -əi : Middle Chinese *sjæ

I don't know for sure which initials go with which finals. I suppose the first form is sou and the last one is ləi. BSC -iə regularly corresponds to Middle Chinese *-jæ, so I guess the second form is siə. If the third form is lai, it cannot be related to the first two since BSC l- is not from *s-.


14.9.19.23:49: A DIP INTO WHITE WATERS (PART 7): AI-XCEPTION CLASSES 2-3

In part 5 I proposed the following chain shift in 白水村 Baishuicun (BSC) 'White Water Village':

*-VN > *-an > *-ai > *-oi > -o

*V was a non-*a vowel.

Most BSC -ai are from *-VN with the exception of eight types of forms.

The first type in part 6 was borrowed after the shifts took place.

There is only one example of the second type: 鏘 *tshɨaŋ 'jangling noise'. Onomatopoetic words may be exceptions to general developments.

There is also only one example of the third type: 黌 *ɣwæŋ 'school' whose coda may have later fronted to *-ɲ.

Types 2-3 may be borrowings postdating the shift of *-VN to *-an but predating the shift of *-an to *-ai:

Sinograph Early pre-BSC Late pre-BSC BSC
(N/A since these are loans) *tshan thai (tshai?)
(N/A since these are loans) *xan xai
*kən *kan kai
*kæŋ > *kaɲ > *kan *kai koi
*kai *koi ko
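
The timing argument in the table can be sketched in a few lines of Python. This is only an illustration under my reading of the table: every rhyme on the chain advances one step per round, there are two rounds (early to late pre-BSC, then late pre-BSC to BSC), and a loan undergoes only the rounds that follow its borrowing. The names STEP, advance, and reflex are hypothetical, and "ən" stands in for the whole *-VN class.

```python
# One step along the proposed chain *-VN > *-an > *-ai > *-oi > -o.
# "ən" is a stand-in for any non-*a vowel plus nasal (an assumption of this sketch).
STEP = {"ən": "an", "an": "ai", "ai": "oi", "oi": "o"}


def advance(form):
    """Move the rhyme of a toneless form one step along the chain."""
    for old, new in STEP.items():
        if form.endswith(old):
            return form[: -len(old)] + new
    return form  # rhymes outside the chain are left alone


def reflex(form, rounds):
    """Apply `rounds` successive rounds of the shift to a form."""
    for _ in range(rounds):
        form = advance(form)
    return form


# Native words go through both rounds.
print(reflex("kən", 2), reflex("kan", 2), reflex("kai", 2))
# Loans taken in at the late pre-BSC stage go through only the last round.
print(reflex("tshan", 1), reflex("xan", 1))
```

A loan taken in after both rounds would be called with rounds=0 and keep its rhyme, which is how I would model the second borrowing layer of part 6.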

The th of thai may be a typo for tshai in the 小學堂 Xiaoxuetang database, as it lists no other examples of BSC th- from *tsh-, and thai sounds even less like a jangling noise than tshai does.

鏘 and 黌 are low-frequency characters, so their BSC readings thai (tshai?) and xai may be literary borrowings without colloquial (i.e., native) equivalents *tshoi and *xoi.


14.9.18.23:18: <HACÑ(Ī)>

And now for a Thai-Islamic detour that will bring me back to 白水村 Baishuicun 'White Water Village' ...

Last night I found the Thai Wikipedia article for hajji which is titled หัจญี <hacñī> [hàtjiː]. It lists another form ฮัจญี <ɦacñī> [hátjiː]. According to thai-language.com, the 1982 edition of the Royal Dictionary lists two more forms:

หะยี <haḥyī> [hàʔjiː]

หัจญี <hacñī> [hàtjiː]

Although Thailand has a large Muslim population which is two-thirds Malay, I never looked at any Thai terminology for Islam or Thai transcriptions of Malay until now. The above forms made me wonder:

1. Is Thai terminology for Islam based on Malay: e.g., are [hàtjiː] etc. loans from Malay haji rather than directly from Arabic ḥajjī?

2. Thai has no consonant like Malay and Arabic j [dʒ]. I am accustomed to seeing English [dʒ] rendered as [j] or [tɕ]: e.g.,

เอเจนท์ <ʔecend̽> [ʔeːtɕên]

เอเจนต์ <ʔecent̽> [ʔeːtɕên]

เอเยนต์ <ʔeyent̽> [ʔeːjên]

(The falling tone of the second syllable is unwritten in those spellings. I assume the tone is falling on the basis of alternate spellings with a first tonal marker*, though I would expect a high tone in a final syllable that originally ended in a nasal-stop cluster in English.)

But I have never seen a foreign [dʒ] rendered as Thai จญ <cñ> [tj] before. Another instance of <cñ> is

ฮัจญ์ <ɦacñ̽> [hát] 'hajj' (with a silencer over <ñ>; syllable-final <c> is [t])

ญ <ñ> normally represents [j] from an earlier *ɲ. Why is it in transcriptions for a nonnasal consonant?

3. What principles underlie the choice of tones for Islamic/Malay loans in Thai? The first closed syllable of 'hajji' has both high and low tones (the two most common possibilities for closed syllables with short vowels), and the second open syllable has an unmarked mid tone (like some but not all English loans).

4. Who devised these Thai spellings? Were they Thai-Malay bilinguals? Did they know the Jawi script for Malay (which I briefly mentioned here)?

The Malay spoken in the Thai-Malaysia border region has a number of interesting phonetic characteristics that are apparently not reflected in Jawi spelling, which seems to be historical. The fronting of *aN to /ɛː/ reminds me of the *-an > *-ai shift from parts 2 and 5 of my series on Baishuicun; in both cases, a final nasal conditions the fronting of a preceding *a.

*The first tonal marker indicates a falling tone in a sonorant-final syllable when it is atop a *voiced consonant symbol such as ญ <ñ> or ย <y>. It indicates a low tone in such a syllable when it is atop an *implosive or *voiceless consonant symbol. The starred (i.e., reconstructed) qualities are not necessarily retained in modern Thai. *Voiced obstruents have devoiced, *voiceless sonorants have voiced, and the *implosives are no longer implosive: e.g.,

*ban¹ > [pʰân]

*m̥an¹ > [màn]

*ɓan¹ > [bàn]

On the other hand, *voiced sonorants and *voiceless obstruents retain their original voicing qualities:

*man¹ > [mân]

*pan¹ > [pàn]

*an¹ > [àn]

A tonal split (*¹ > falling/low) compensates for the loss of voiced obstruents and voiceless sonorants. The implosives *ɓ and *ɗ have moved into the space vacated by original *b and *d (which have become [pʰ] and [tʰ]), but vowels following implosives still bear tones associated with *implosives.
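
The split described in this note can be restated as a small lookup in Python. This is a rough sketch and not a full account of Thai tones: it covers only sonorant-final syllables written with the first tonal marker, the table INITIALS and the function tone1_reflex are hypothetical names, and the symbols "m̥" (a voiceless m) and "ɗ" are notational assumptions of mine.

```python
# Reconstructed initial -> (reconstructed class, modern initial).
# "m̥" (voiceless m) and "ɗ" are notational assumptions of this sketch.
INITIALS = {
    "b": ("voiced", "pʰ"),      # *voiced obstruents devoiced
    "d": ("voiced", "tʰ"),
    "m": ("voiced", "m"),       # *voiced sonorants are unchanged
    "p": ("voiceless", "p"),    # *voiceless obstruents are unchanged
    "m̥": ("voiceless", "m"),    # *voiceless sonorants voiced
    "ɓ": ("implosive", "b"),    # *implosives are no longer implosive
    "ɗ": ("implosive", "d"),
}


def tone1_reflex(initial):
    """Return (modern initial, modern tone) for a sonorant-final syllable
    written with the first tonal marker."""
    old_class, modern_initial = INITIALS[initial]
    # Falling after *voiced initials; low after *implosive and *voiceless ones.
    tone = "falling" if old_class == "voiced" else "low"
    return modern_initial, tone


for initial in ("b", "m̥", "ɓ", "m", "p"):
    print("*" + initial + "an¹ >", tone1_reflex(initial))
```

The tone is keyed to the reconstructed class and not to the modern initial, which is why modern [m] appears with both falling and low tones.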


14.9.17.23:51: HAJI, HAZHE, HAZHI

Sorry, I'm on another Sino-Islamic detour.

The Chinese Wikipedia article for hajji has three types of transcriptions (readings here are in Mandarin unless stated otherwise and tones are not included):

1. *velar-initial second syllable:

哈吉 haji [xatɕi], 阿吉 aji

These must postdate the recent palatalization of *k in Mandarin.

2. affricate-initial* open second syllable:

哈只 hazhi [xatʂr̩] (the transcription in the article title), 哈芝 / 哈指 / 哈治 / 哈志 hazhi

The second syllables of these transcriptions have different tones:

'yin level': 芝

'rising': 只指

'departing': 治志

3. affricate-initial *closed second syllable:

哈哲 hazhe [xatʂɤ] (Cantonese haazit [haːtsiːt])

Mandarin 哲 zhe has lost the *-t retained in Cantonese and has a 'yang level' tone in the standard language.

I have several questions:

1. What are forms for 'hajji' in the Chinese varieties spoken by the 回 Hui people?

2. Is there a standard tone class for the second syllable in 回 Hui speech (which is not to be confused with 徽州 Huizhou Chinese)?

3. How is 'hajji' written in the toneless Xiao'erjing script?

4. Do the spoken and written forms in the Hui community match the transcriptions of the non-Hui Chinese world?

5. What is the oldest known Chinese character transcription of 'hajji'? My guess is that the earliest transcriptions were of the hazhi type.

6. Wikipedia states that 哈哲 hazhe (Cantonese haazit) is the transcription used in Hong Kong and Macao. If this transcription was devised by a Cantonese speaker, why does its Cantonese reading have a -t corresponding to nothing in Arabic? If I set the Chinese Wikipedia page to display in Hong Kong or Macao complex characters, the title of the article is still 哈只 hazhi (Cantonese haazi [haːtsiː]) which is a better phonetic match in Cantonese.

*These affricates are not original either, but their affrication predates Islam and is not relevant.


14.9.16.23:36: AFANTI

I'm going to take a northern detour away from 白水村 Baishuicun 'White Water Village' to look at Mandarin 阿凡提 afanti 'effendi' (< Arabic afandī or the like). I've long had the impression that such Islamic loanwords were borrowed into Mandarin in recent centuries. However, afanti has an aspirated -t- [tʰ] which is a weak match for foreign -d-. The t- [tʰ] of Mandarin 提 is from *d-. Was  afandī borrowed into a Chinese language that retained *d-? I doubt that for two reasons.

First, I would expect Islamic loans to be from the northwest, and Tangut transcription evidence indicates that *d- had become *tʰ- in the northwest by the early second millennium AD.

Second, a Chinese language retaining *d- in 提 would probably also have retained *v- in 凡. Was afandī borrowed as *avandi? I suppose one could try to evade this problem by proposing that this Chinese language devoiced *v- before *d-, so afandī was borrowed as *afandi. I have thought that the Chinese variety underlying Sino-Vietnamese (John Phan's 'Annamese Middle Chinese'; AMC) might have devoiced *v- before *d-*, but I am not sure. In any case, AMC could not have been the source of afanti for geographical and phonological reasons. 凡 had a final *-m in AMC that does not match the -n- of afandī.

By coincidence the ultimate Greek source of afandī is αὐθέντης <authéntēs> with a -t- corresponding to the -t- [tʰ] of Mandarin afanti. Obviously afandī postdates several changes in Greek:

- the shift of au to aβ

- the shift of aspirates to fricatives (tʰ > θ)

- the devoicing of β to -f- before voiceless consonants like θ

- the simplification of -fθ- to -f-

- the voicing of -t- to -d- after -n-

- the raising of -ē- to -i-

So I'm back to where I started: why does afandī correspond to Mandarin afanti?

*My logic was that *v- patterned like *f- in Sino-Vietnamese (SV):

AMC SV stage 1 SV stage 2 SV stage 3 Modern SV
*pʰ- *pʰ- *pʰ- *pʰ- ph- [f]
*f-
*v- > *f-
*b- *b- *p- *ɓ- b- [ɓ]
*p- *p-

Early Vietnamese had no *f-, so *pʰ- was the closest equivalent of AMC *f-.

If AMC still had *v-, I would expect it to correspond to SV b- < *b-. (I assume early Vietnamese had no *v-, and that modern v- is from *w-. There are no cases of Chinese *v- corresponding to Vietnamese v-, which leads me to believe that Vietnamese had no *v- at the time of borrowing and that the shift of *w- to v- postdates borrowing.)

I now think SV reflects a stage of AMC in which all voiced obstruents had been devoiced:

AMC SV stage 1 SV stage 2 Modern SV
*pʰ- *pʰ- *pʰ- ph- [f]
*f-
*v- > *f-
*b- > *p- *p- *ɓ- b- [ɓ]
*p-

Vietnamese spelling reflects the *pʰ-/ɓ-stage of the 17th century.


14.9.15.23:40: A DIP INTO WHITE WATERS (PART 6): AI-XCEPTION CLASS 1

About one-seventh of 白水村 Baishuicun (BSC) 'White Water Village' -ai forms cannot be traced back to rhymes with non-*a-vowels plus nasals at the left end of this chain from part 5:

*-VN > *-an > *-ai > *-oi > -o

I have classified those remaining forms into eight categories.

The first category consists of -ai forms from Old Chinese *-əˁ:

tai (borrowing layer 2), to (borrowing layer 1) < *dˁəˁ < *Cʌ-dəʔ or *Nʌ-təʔ

mai (borrowing layer 2) < *mˁəˁʔ < *Cʌ-məʔ

tai (borrowing layer 2), lo < *CV-tai (borrowing layer 1) < *Nʌ-tˁəˁsˁ < *Nʌ-təs

the tone of tai (but not lo!) indicates a voiced initial *d- which may be from *Nʌ-t-

pai (borrowing layer 2), (native) < *bˁəˁ < *Cʌ-bə or *Nʌ-pə

I think there are at least three layers in these forms.

is native and may directly reflect Old Chinese *-əˁ. BSC borrowed from prestige dialects whose *-əˁ developed a glide:

*-əˁ > *-əɰ > *-əj > *-aj

The first layer of borrowings predates the *-ai > -o shift in BSC and the second layer postdates it.

The first layer of borrowings also predates lenition and the loss of presyllables in BSC.


14.9.14.23:04: A DIP INTO WHITE WATERS (PART 5): A CH-*AI-N SHIFT?

The 小學堂 Xiaoxuetang database is back, so I can add a new link (in bold) to my 白水村 Baishuicun (BSC) 'White Water Village' chain shift from part 2:

*-VN > *-an > *-ai > *-oi > -o

Here are some sample words with composites of prestigious Early and Late Middle Chinese cognates for comparison:

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*dəm *dam *tan tai
*len *lien *lan lai
*ʂɤan *ʂæn *san sai
*khwan *khwan *khan khai
*təŋ *təŋ *tan tai
*lɨəŋ *lɨəŋ *lan lai
*neŋ *nieŋ *nan lai ~ nai
*tshɨm *tshim *tshan tshai
*mun *vun *man mai ~ uai
*mon *mon *man mai
*kən *kən *kan kai
*sin *sin *san sai
*touŋ *toŋ *(CV-)tan lai
*toŋ
*luoŋ *lyoŋ *lan lai

Rhymes with non-*a-vowels plus nasals merged into *-an, which shifted to -ai after an earlier *-an shifted to *-oi and an earlier *-oi shifted to -o. Here is how that merger might have taken place:

Stage 1 Stage 2 Stage 3
*-in *-en *-an
*-em
*-eŋ
*-ɨm *-en or *-on or *-ən
*-əŋ
*-ən
*-on *-on
*-oŋ
*-un

In stage 1, pre-BSC had a vowel system resembling prestige EMC.

In stage 2, pre-BSC front vowels merged into *e before nasals and back vowels merged into *o before nasals. Central vowels could have merged into *e, *o, or *ə before nasals.

In stage 3, pre-BSC *-en and *-on merged into *-an. The *-n conditions an *-i- that remains after the nasal is lost:

*-an > *-ain > -ai
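
The three stages can be run mechanically over the stage 1 rhymes. The Python below is a minimal sketch under two assumptions of mine: the coda merger to *-n is folded into stage 2, and central vowels are sent to *ə (the text above leaves *e, *o, and *ə all open). The function names are hypothetical.

```python
FRONT = {"i", "e"}
BACK = {"o", "u"}


def stage2(rhyme):
    """Stage 1 -> stage 2: merge vowels before nasals; every nasal becomes -n."""
    vowel = rhyme[0]
    if vowel in FRONT:
        vowel = "e"
    elif vowel in BACK:
        vowel = "o"
    else:
        vowel = "ə"  # central vowels: *e, *o, or *ə are all possible; *ə is an arbitrary pick
    return vowel + "n"


def stage3(rhyme):
    """Stage 2 -> stage 3: the vowel before -n is replaced by *a."""
    return "a" + rhyme[1:]


def to_bsc(rhyme):
    """*-an > *-ain > -ai: the nasal conditions -i- and is then lost."""
    with_glide = rhyme[:-1] + "i" + rhyme[-1]  # an -> ain
    return with_glide[:-1]                     # ain -> ai


def derive(rhyme):
    """Return the rhyme at stage 1, stage 2, stage 3, and in BSC."""
    s2 = stage2(rhyme)
    s3 = stage3(s2)
    return [rhyme, s2, s3, to_bsc(s3)]


for r in ["in", "em", "eŋ", "ɨm", "əŋ", "ən", "on", "oŋ", "un"]:
    print(" > ".join(derive(r)))
```

Every input ends up as -ai, which is the point of the merger; only the stage 2 column differs between rows.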

There are two forms in the first table that would not be in the second:

*san > sai

*khan > khai

I think those forms were borrowed after earlier *-an became *-ai which then became -oi. Hypothetical native cognates would be *soi and *khoi.

mai may be native, whereas 文 uai is a borrowing from some late Tang or newer form resembling Sino-Vietnamese văn. (It is geographically impossible for BSC to have borrowed from Sino-Vietnamese, but a local neighboring language could have had a similar form.) uai tells us that the *-an to -ai shift postdates the *m- > v- shift in the source of uai.

The l- of 冬 lai and 東 lai is due to lenition after a prefix *CV- that was lost at some unknown point.

I have kept 龍 lai⁴¹ separate from the homophones 冬 lai⁴⁴ and 東 lai⁴⁴ since they have different tones: a mid-high falling tone from a *voiced initial (*l-) and a mid-high level tone from a *voiceless initial (*-t-).

As has been the case so far, a single rule cannot account for all forms with the same rhyme. I will write about other sources of *-ai in part 6.


14.9.13.22:40: A DIP INTO WHITE WATERS (PART 4): LEFT-*O-VERS

One-fifth of 白水村 Baishuicun (BSC) 'White Water Village' dialect forms ending in -o cannot be explained using the sound laws I proposed in parts  2 and 3. I will try to explain these forms which fall into eight categories. I could have covered the first two categories in part 3, but they are more complicated than the majority of the *-at class.

1. 佛 fo < *fat (< *but) 'Buddha'

This appears to be a loan from a language in which *-ut became *-at as in Cantonese 佛 fat (though Cantonese is not spoken in Hunan and therefore is not the source of fo). An earlier loan is pu which is close to prestigious Early Middle Chinese *but, an abbreviation of 佛陀 *but da 'Buddha'.

Could the word simply be a very recent borrowing from Mandarin fo (whose rhyme is irregular)?

I am reluctant to guess glosses for BSC forms since the 小學堂 Xiaoxuetang database does not provide any, but in this case I'm pretty sure 佛 is 'Buddha' (though 佛 may have other meanings in BSC), and it would be odd to mention 佛陀 *but da without also mentioning its Indic source.

2. 蝨 so < *sat (< *ʂɨt < *ʂit < *srit < *srik)

This might be a loan from a language in which *-it became *-at as in Cantonese 蝨 sat (though I must note again that Cantonese cannot be the source of this particular word).

3. -o < *-ai

In part 2, I dealt with BSC -o-forms corresponding to -ai-type rhymes in other Chinese languages. The following words may have had pre-BSC *-ai even though they may not have -ai-like rhymes in other Chinese languages:

Sinograph Old Chinese Early Middle Chinese Late Middle Chinese Pre-BSC BSC
(*srəj > *ʂɨj?) *ʂi *ʂi *sai so
*Cɯ-baj or *Nɯ-paj *bɨe *bɨi *pai po
*ʔəj *ʔɨj *ʔi *ai o
*pəts > *pɨjh *pujʰ *fi *pai po
*məjʔ > *mɨjʔ *mujˀ *vi *mai mo
*pajʔ *paˀ *pa *pai po
*naj *na *na *nai no
*lats *da(j)ʰ *da(j) *tai to

Pre-BSC *-ai may have been a reflex of Old Chinese *-aj/-əj/-ɨj-type rhymes; it cannot have been borrowed from a prestigious EMC or LMC dialect. (BSC 衣 i is a loan from a prestige LMC-like form.)

篩 is not attested in Old Chinese. Pre-BSC 篩 *sai is reminiscent of Cantonese 篩 sai, though it cannot be from Cantonese. The word could be borrowed from a form like Mandarin 篩 shai (whose reading is from 籭; the hypothetical regular Mandarin reading would be *shi).

4. 麥 mo < *mai

This has an unexpected yang departing tone. Is this a very late loan from a form like Mandarin 麥 mai which lost its final stop and entering tone? If it were a native word or an early loan, it should have an entering tone as a trace of its original *-k.

5. -o < *-raw

ko is ultimately from Old Chinese *kraw but it could be a loan from *kaw or *ko in some later language. I can't look further into this (or anything else BSC-related) because the Xiaoxuetang site is down.

6. -o < (*-wa? <) *-ra

There are three forms in this category: 灑下拏. 咬 from category 5 may also belong to this category, as pre-BSC or a source language may have been like Taiwanese which has -a from both *-ra and *-raw:

咬 T ka (kau is a borrowing)

灑 T sa (borrowed?; se may be native; cf. 下 below)

下 T literary (i.e., borrowed) ha; the colloquial (i.e., native) form is e

拏 T na (borrowed?; displaced a native *ne?)

"Like" does not entail a close relationship. Taiwanese is both genealogically and geographically distant from BSC.

7. -o < *-ə(ŋ)

Is 扔 no from Old Chinese *nəŋ or an open-syllable variant *nə (cf. Japanese no < *nə for 乃, the phonetic of 扔)?

8. Unrelated synonyms

I initially thought -o of 久 no might be a reflex of Old Chinese *-ə in 久 *kʷəʔ, but I doubt n- is from *kʷ-. Moreover, the yang departing tone of no is not what I would expect for a descendant of 久 *kʷəʔ. I conclude that no is an unrelated synonym.

mo < *mat? has an initial and an entering tone (from an earlier final stop) that rule out any relationship with Old Chinese *Cɯ-ʔoj. Even if *C- were *m-, the tone would still be irregular.

no < *nat? has the same problems as 萎 mo; it cannot be related to EMC 拋 *phræw (the word is not attested in Old Chinese).

ŋo < *ŋat? is not related to Old Chinese 鷹 *ʔəŋ.

ko superficially resembles Sino-Japanese but may be from *kat which cannot be related to EMC 硬 *ŋɤeŋʰ (the word is not attested in Old Chinese, and the SJ initial is irregular).


14.9.12.23:36: A DIP INTO WHITE WATERS (PART 3): FINAL ST-*AP-S

The chain shift I proposed in part 2 explains three-fifths of -o syllables in the 白水村 Baishuicun (BSC) 'White Water Village' dialect:

*-ai > *-oi > -o

Another fifth requires further sound laws:

*-ap merged with *-at (cf. the *-p > -t shift in Nanchang which is 500 km to the east and not closely related)

*-at > *-ait > *-oi(t) > -o

I propose that *-t had a fronting effect on *a similar to the fronting effect of -d in Tibetan:

'eight': Proto-Sino-Tibetan ?*prjat >

Old Chinese *pret > pre-BSC *pat > *pait > *poi(t) > BSC po

Earlier Tibetan brgyad > Lhasa cɛʔ

Here are more examples to show the merger of multiple *-t and *-p rhymes. The Early and Late Middle Chinese forms here are composites based on prestige dialects and are not directly ancestral to BSC.

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*ləp *lap *lap > *lat lo
*ɣɤap *ɣæp *xap > *xat xo
*kɤep *kæp *kap > *kat ko
*ɣwet *xwiet *xwat fo
*xɤat *xæt *xat xo
*pɤet *pæt *pat po
*puot *fat *fat fo
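
The pre-BSC and BSC columns above can be linked by a short list of ordered rewrite rules. The Python below is a sketch under my own assumptions: tones are ignored, the loss of the final stop is placed arbitrarily between *-oit and *-oi, and *xw- > f- is placed last, although the relative timing of those two changes is unknown. RULES and derive are hypothetical names.

```python
import re

# Ordered rewrite rules for toneless pre-BSC forms.
RULES = [
    (r"ap$", "at"),    # *-ap merges with *-at
    (r"at$", "ait"),   # *-t fronts the preceding *a
    (r"ait$", "oit"),  # *-ai- > *-oi-
    (r"oit$", "oi"),   # the final stop is lost (timing unknown; placed here arbitrarily)
    (r"oi$", "o"),     # *-oi > -o
    (r"^xw", "f"),     # *xw- > f-
]


def derive(form):
    """Apply every rule once, in order, to a pre-BSC form."""
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


for pre_bsc in ["lap", "xap", "kap", "xwat", "xat", "pat", "fat"]:
    print("*" + pre_bsc, ">", derive(pre_bsc))
```

Because the rules run once each in a fixed order, original *-ap and *-at forms fall together after the first rule and share every later step.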

Although BSC no longer has any final stops, they have conditioned tones absent from words without original final stops: high level if the initial consonant was *voiceless and mid level if the initial consonant was *voiced. I don't know when the final consonants were lost after *a shifted to *ai before *-t.

發 must be a loanword because it has *f- instead of the expected *p- (see my discussion of 煩 xoi and 佛 pu ~ fo in part 1). Foreign *f- might have been borrowed as *f- after BSC developed its own *f- from *xw- in words such as 穴. Conversely, it is also possible that *xw- became *f- after loanwords introduced *f- into the BSC phonemic inventory.


14.9.11.23:41: A DIP INTO WHITE WATERS (PART 2): A CH-OI-N SHIFT?

In part 1, I proposed the following sound change for the 白水村 Baishuicun (BSC) 'White Water Village' dialect:

*-an > *-oi

I now propose a larger chain shift:

*-an > *-ai > *-oi > -o

BSC -o often corresponds to *-e/*-aj-type rhymes in prestigious Early and Late Middle Chinese dialects which were not its ancestors (but might have been sources of loans into BSC). I do not include tones in pre-BSC and BSC forms. I have not heard BSC, but I assume that its -i is [j] after vowels, so there is no real difference between MC *-j and (pre-)BSC (*)-i.

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*najʰ *nàj *nai > *noi no
*nəjʰ *nəj > *naj
*buojʰ *fàj *pai > *poi po
*bɤajʰ *bàj
*kɤe *kæj *kai > *koi ko
*kɤej

My proposal accounts for 59% (58/99) of the -o forms in the 小學堂 Xiaoxuetang database. I will deal with the others in part 3.

BSC p- corresponding to LMC *f- (e.g., in 吠) indicates a native word. A hypothetical early loan of 吠 would be *xo and a hypothetical late loan would be *fo (see my discussion of 煩 xoi and 佛 pu ~ fo in part 1).


14.9.10.23:40: A DIP INTO WHITE WATERS (PART 1)

Over the past couple of days I have been intrigued by the dialect of 湘南土話 Xiangnan Tuhua 'local speech of southern Hunan' spoken in 白水村 Baishuicun 'White Water Village' in 江永縣 Jiangyong County. In " 'More' Evidence", I found that 更 Old Chinese (OC) *kraŋ(s) / Middle Chinese (MC) *kɤaŋ(ʰ) 'watch of the night'/'more' corresponded to Baishuicun (BSC) koi. I hypothesized that -oi was from an *-aɲ like that of Sino-Vietnamese canh/cánh [kaɲ]. Looking at other BSC -oi forms, I can make a more general statement: *A/O-type vowels followed by nonback nasals (*ɲ, *n, *m) became -oi. The nasals probably merged into *-n before becoming -i.

Sinograph Early Middle Chinese Late Middle Chinese Pre-BSC BSC
*buan *van *xan xoi
*kən *kən *kon or *kan? koi
*tan *tan *CV-tan loi
*kwɤan *kwæn *kwan koi
*ʂɤen *ʂæn *san soi
*kɤaŋ *kæŋ *kaɲ koi
*kɤeŋ
*ŋɤem *ŋæm *ŋam ŋoi
*ʂɤam *ʂæm *sam soi
*bon *bon *pon > *pan? poi

Pre-BSC is a very rough guess bridging Late Middle Chinese (LMC)* and BSC. It looks more like a typical Chinese language than BSC does.

xoi is probably a loanword. I think the native BSC reflex of EMC *b- is p-: e.g., 佛 pu 'Buddha' (a later borrowed form is fo). BSC is not descended from generic LMC dialects which lenited labial stops to fricatives before *u. The borrowing of 煩 must predate the shift of *-an to *-oi and the borrowing of *f-. *x- was the closest pre-BSC equivalent of foreign *f-. I conclude there are at least two layers of borrowings that can be distinguished by their treatment of foreign labiodental fricatives: an older x-layer and a newer f-layer.
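
The layering criterion in the paragraph above amounts to a three-way test on the BSC initial. The Python below is a minimal sketch that assumes the word in question had EMC *b- (or another labial stop) that became *f- in prestige LMC; it says nothing about BSC x- or f- from other sources such as *xw-. The function name labial_layer is hypothetical.

```python
def labial_layer(bsc_form):
    """Guess the stratum of a BSC reading whose prestige LMC cognate has *f-."""
    if bsc_form.startswith("p"):
        return "native"                 # labial stop retained
    if bsc_form.startswith("x"):
        return "older loan (x-layer)"   # foreign *f- replaced by *x-
    if bsc_form.startswith("f"):
        return "newer loan (f-layer)"   # foreign *f- kept as f-
    return "unclassified"


for graph, form in [("煩", "xoi"), ("佛", "pu"), ("佛", "fo")]:
    print(graph, form, labial_layer(form))
```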

I think the l- of 單 is due to BSC-internal lenition.

The above scheme accounts for most but not all instances of -oi in BSC. Two requiring further investigation are

吾 OC *ŋa > EMC, LMC *ŋo : BSC ŋoi

崖 OC *ŋre > EMC *ŋɤe > LMC *ŋæj : BSC ŋoi

Neither had nasals in earlier Chinese. The normal BSC reflexes of OC *-a and *-re are -u and -o. 崖 ŋoi could be a borrowing from a dialect that had broken *-ɤe to *-æj or the like. But the -i in 吾 ŋoi remains a mystery.

The above scheme cannot account for cases in which -oi did not develop from *-an: e.g., 肝 BSC kaŋ (not *koi!) < OC/MC *kan. It seems that velars somehow blocked the *-an to -oi shift.

Next: A ch-oi-n shift?

*9.11.23:03: The LMC reconstruction here is a composite of the prestige dialects underlying borrowed forms in Chinese and non-Chinese languages. It is not a direct ancestor of pre-BSC, though such an ancestor may have been similar and may have borrowed from an LMC prestige dialect.


14.9.9.23:27: 'SECONDAR-Y' ROUNDING IN CANTONESE

The unexpected labiovelar /kʷ/ in  Cantonese 梗 /kʷaːŋ˧˥/ 'stem' (among many other meanings) from my last post brought to mind a Cantonese form that has puzzled me for many years: 乙 /jyːt˧/ 'second Heavenly Stem'. I used to think its rounded vowel /yː/ was unique, as it wasn't in any reconstruction or actual form that I had ever seen: e.g.,

Old Chinese *ʔi̯ɛt (Karlgren), *qrig (Zhengzhang), my *ʔrət or *ʔrit (I cannot find any rhyming evidence favoring one vowel over the other*)

Middle Chinese: *ʔi̯ĕt (Karlgren), *ʔɣiɪt (Zhengzhang), my *ʔɨit

Mandarin yi

Taiwanese it

Sino-Vietnamese ất [ʔət]

Sino-Korean ŭl [ɯl] < idealized ʔɯ́rʔ

Sino-Japanese otsu < *ət

However, I now see that rounded vowels are not only in Yue varieties like Cantonese but also a few southern non-Yue varieties:

Yue: too many /y/-varieties to list; other rounded vowels (or glides?) are in

Xintian Fantian, Dapu Taiheng jɵk

Kaiping (Chikan) zuat

Taishan (Taicheng) zᵘɔt (? - there is no syllable like this in Stephen Li's Taishan syllabary)

Mengshan iut

Huaiji wut

Dongguan (Guancheng) zøt

Bao'an (Shajing) (j)iɔʔ

Ping: Nanning yt (loan from Cantonese?)

Hakka: Huizhou yat

Unclassified:

Zhongshan (Gong'an) iuə

Shaozhou Tuhua:

Xingzi ɵy

西岸 Xi'an oi

Bao'an u

All the non-Yue varieties are within the Yue area, so their rounding may be due to Yue influence.

If rounding is a Yue innovation, why did it happen? Both 梗 and 乙 had medial *-r- in Old Chinese. Did that *-r- sporadically become *-w-? (Cf. Elmer Fudd's "wascally wabbit".) Are there other Old Chinese *-r- words with modern labial reflexes?

I would expect kw-reflexes of Old Chinese 甲 *qrap 'first Heavenly Stem', but the only remotely similar forms are

Yixian kɔɐ̆ʔ

Guilin (Chaoyang) kuo

Jiangyong Chengguan (Baishuicun) kuə

whose diphthongs might be breakings of an *o from *a (cf. o-forms like Lingchuan (Tanxia) ko). None of those three varieties have rounded vowels in 乙, though Baishuicun does have a rounded vowel in 梗.

*My Old Chinese *ʔrət and *ʔrit could both become Middle Chinese *ʔɨit:

OC *r-vocalization -breaking monophthongization
*-ət *-ɨət *-ɨt (> *-ut after labials)
*-rət *-ɨət *-ɨit
*-rit *-ɨit
*-it

I have included two other rhymes for comparison.

There was a chain shift:

*-ət > *-ɨət > *-ɨit

That could be interpreted as a push or pull chain:

Push: When *-ət broke to *-ɨət, it 'pushed' original *-ɨət into the 'space' of *-ɨit.

Pull: When original *-ɨət merged with *-ɨit, it left a gap to be filled by *-ət after it broke to *-ɨət.

I generally prefer pull chains, but mixed reflexes of *-ət and *-rət might point to a push chain.


14.9.8.23:08: 'MORE' EVIDENCE FOR THE LIMITS OF THE MIDDLE CHINESE LEXICOGRAPHICAL TRADITION

Last night I saw this passage in the Wikipedia article on Cantonese phonology:

There are about 630 sounds [i.e., syllables disregarding tones?] in the Cantonese syllabary. Some of these, such as /ɛː˨/ and /ei˨/ (欸), /pʊŋ˨/ (埲), /kʷɪŋ˥/ (扃) are not common any more; some such as /kʷɪk˥/ and /kʷʰɪk˥/ (隙), or /kʷaːŋ˧˥/ and /kɐŋ˧˥/ (梗) which has traditionally had two equally correct pronunciations are beginning to be pronounced with only one particular way uniformly by its speakers (and this usually happens because the unused pronunciation is almost unique to that word alone) thus making the unused sounds effectively disappear from the language [...]

At first I was puzzled by 梗 /kʷaːŋ˧˥/ 'stem' (among many other meanings) which has a labiovelar initial even though it is written with a velar phonetic 更 'watch of the night'/'more'. I have never seen an Old Chinese reconstruction of 梗 with a labiovelar or labial. 梗 had no *-w- according to the Middle Chinese lexicographical tradition based on prestige varieties. But not all modern forms arise from those varieties. Forms in multiple branches of Chinese (written here without tones) may point to *-w-:

Yunhe kuɛ (see here for more Wu forms with -u-)

Nanchang ku (see here for more Gan forms with -u-; is Leping mu a typo for kuaŋ?)

Fuzhou literary (!) ku ~ colloquial keiŋ (see here for more Min forms with -u-)

Lechang kuɐn (see here for more Yue forms with -u-)

Lingui yɛn (the only Ping form with a labial; the aspiration is irregular and can be sporadically found in other Ping varieties and in Yue, Hakka and even Mandarin)

Meixian literary (!) ku ~ colloquial kɛn (see here for more Hakka forms with -u-)

Fengyang kua (see here for more Shaozhou Tuhua forms with -u-)

The -u- and -y- of those forms cannot be derived from Middle Chinese reconstructions for 梗 such as my *kɤaŋˀ or Old Chinese reconstructions derived in turn from those reconstructions: e.g., my *kraŋʔ. (I reconstruct its phonetic 更 as Middle Chinese *kɤaŋ(ʰ) from Old Chinese *kraŋ(s).)

梗 has no labials in Mandarin, Jin, or Xiang. Was labiality lost in the north, or is it a common retention of southern languages that do not form a subgroup? Fuzhou, Meixian, and perhaps other varieties may have borrowed from one or more southern literary Middle Chinese dialects with a labial absent from other prestige dialects.

For comparison, 更 does not have a labial with a few exceptions:

Shaxian and Sanming kɔ̃ (< *kaŋ?; Sanming also has kɛ̃; other Min forms here)

Yangshuo kyɛ̃ (but Lingui kəŋ; other Ping forms here)

Hezhou kɔ (< *kaŋ?)

Jiangyong Chengguan (Baishuicun) koi (< *kaɲ?; cf. Sino-Vietnamese canh 'watch of the night' ~ cánh 'more' [kaɲ])

The labials of most of these forms do not necessarily point to *-w-. The shifts I propose for Shaxian, Sanming, and Hezhou have parallels in northwestern Middle Chinese (in which *-aŋ became *-o; a similar shift occurred in neighboring Tangut and its relative Japhug rGyalrong).


14.9.7.23:57: *PI̵K A CODA

Here's something I don't see every day: a Chinese character (逼) whose readings have three different types of codas:

Velar/glottal: Cantonese bik [pɪk], Suzhou ʔ (source)

Alveolar/dental: Sino-Japanese hitsu < *pit

Labial: Sino-Korean phip

Its Middle Chinese reading was *pɨk. I can't explain this diversity.

9.8.0:48: There are Chinese readings with -t as well, but they are regular reflexes of *-k after front vowels: e.g.,

Toisanese pet < *pek (source)

Meixian Hakka pit < *pik (source)

That is not the case with Sino-Japanese hitsu with a -tsu instead of the expected -ki. There was no such fronting rule in Japanese or in the Chinese source dialects of Sino-Japanese.

Conversely, 匹 Middle Chinese *phit has two Sino-Japanese readings: hiki as well as the expected hitsu < *pit.

Sino-Korean is full of irregularly aspirated labial initials. In fact there is no *pa in Sino-Korean; all syllables that should be *pa are pha: e.g., 波 pha < Middle Chinese *pa. I have long wondered if this was the product of hypercorrection. (Korean never had f, so Chinese *f- was Koreanized as the stops p- and ph-, and words without *f- in Chinese such as 波 might have been read as if they had *f-.)

The idealized Sino-Korean readings of Tongguk chŏngun (1448) lack this excess aspiration of labials. (I would expect the Tongguk chŏngun reading of 逼 to be *pík, but I can't find 逼 in that dictionary. Although Martin 1992: 126 listed pík in a table of Tongguk chŏngun readings, I don't see it in the book itself.)

On the other hand, Sino-Korean is almost completely lacking in kh-readings, though Tongguk chŏngun has them where they are expected: e.g.,

可 Sino-Korean ka, Tongguk chŏngun khǎ < Middle Chinese *khaˀ

This may tell us something about the chronology of the development of Korean aspirates which are either borrowed or secondary.


14.9.6.20:39: THE M-ISSING COMMISSAR

Today I read about Genrikh Lyushkov (1900-1945?), whose title brought to mind a question that I've had for a long time: why was German Kommissar borrowed into Russian as комиссар komissar? Why is an m 'missing' from that word and команда komanda 'team' (< French commande)? Was there a rule to simplify sequences of identical consonants at prefix-root boundaries in spellings of loans?

Latin com-missarius > R komissar

Latin com-mendare > R komanda

(The Latin forms are for root identification only and do not necessarily match the later forms' parts of speech, etc.)

But what about

Latin com-mercium > F commerçant > R коммерсант kommersant (not *комерсант komersant) 'merchant'

and Latin com-mutator > R коммутатор kommutator (not *комутатор komutator) 'switchboard'?

That rule obviously does not apply to native words with secondary sequences resulting from syncope: e.g., введение vvedenie 'introduction' < въведеніе vŭ-vedenie 'in-leading' ≠ ведение vedenie 'leadership'.

Sequences of identical consonants within a root word remained intact: e.g., the -ss- of missarius and the -mm- of communis (hence R коммунизм kommunizm 'Communism').
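The tentative rule asked about above can be stated mechanically. The Python sketch below is only an illustration under assumptions: the hyphen-segmented romanized inputs and the function name simplify_boundary are hypothetical, not attested forms or anyone's established analysis. It drops one of two identical consonants only when they straddle a prefix-root boundary and leaves root-internal doubles such as -ss- alone.

```python
def simplify_boundary(form: str) -> str:
    """Apply the hypothetical rule: C-C > C at a prefix-root boundary.

    `form` is a romanized string with '-' marking the boundary
    (an illustrative notation, not an attested spelling).
    """
    prefix, sep, root = form.partition("-")
    if sep and prefix and root and prefix[-1] == root[0]:
        # Identical consonants meet at the boundary: keep only one.
        return prefix[:-1] + root
    # No boundary, or no identical pair across it: just drop the hyphen.
    return prefix + root


# Hypothetical segmented inputs paired with the actual Russian romanizations.
examples = {
    "kom-missar": "komissar",      # rule matches the attested form
    "kom-manda": "komanda",        # rule matches the attested form
    "kom-mersant": "kommersant",   # attested form keeps -mm-
    "kom-mutator": "kommutator",   # attested form keeps -mm-
}

for segmented, attested in examples.items():
    predicted = simplify_boundary(segmented)
    status = "matches" if predicted == attested else "does not match"
    print(f"{segmented} > {predicted} ({status} attested {attested})")
```

Run as is, the sketch gets komissar and komanda right but wrongly predicts *komersant and *komutator, which is exactly the problem: the rule cannot be exceptionless.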

Elsewhere in East Slavic, although both Belarusian and Ukrainian have phonemic gemination, all of the above loanwords (presumably borrowed from Russian) lack geminates:

Gloss | Russian | Belarusian | Ukrainian
commissar | комиссар komissar | камісар kamisar | комісар komisar
team | команда komanda | каманда kamanda | команда komanda
merchant | коммерсант kommersant | камерсант kamersant | комерсант komersant
switchboard | коммутатор kommutator | камутатар kamutatar | комутатор komutator
Communism | коммунизм kommunizm | камунізм kamunizm | комунізм komunizm
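
The pattern in the table can be checked mechanically. This Python sketch copies the romanizations from the table; the helper name has_double is mine, and it only tests for doubled letters in the spelling, which is an orthographic stand-in for gemination and not a phonetic claim.

```python
# Romanizations copied from the table above: (Russian, Belarusian, Ukrainian).
loans = {
    "commissar": ("komissar", "kamisar", "komisar"),
    "team": ("komanda", "kamanda", "komanda"),
    "merchant": ("kommersant", "kamersant", "komersant"),
    "switchboard": ("kommutator", "kamutatar", "komutator"),
    "Communism": ("kommunizm", "kamunizm", "komunizm"),
}


def has_double(word: str) -> bool:
    """Return True if any letter is immediately repeated (e.g. 'mm', 'ss')."""
    return any(a == b for a, b in zip(word, word[1:]))


for gloss, (ru, be, uk) in loans.items():
    print(f"{gloss}: R {has_double(ru)}, B {has_double(be)}, U {has_double(uk)}")
```

Every Russian form except komanda has a doubled letter, while none of the Belarusian or Ukrainian forms do.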

R комитет komitet / B камітэт kamitet / U комітет komitet does not fall into this category since its French source comité (< English committee) already lacked the double consonants of Latin committere.

Finnish komissaari 'commissioner' looks like a borrowing from Russian komissar.

Why does Serbo-Croatian комесар komesar have an -e- instead of an -i-?

9.6.20:48: And why did Old Latin comoine(m) become Latin communis 'common' with -mm-?


14.9.5.23:55: LIANMA, LINGMO, LINYIN

Most Chinese character spellings of foreign place names in Japanese can be explained in terms of Chinese and/or Japanese readings.

One baffling exception is 布哇 Hawai 'Hawaii' which makes no obvious sense in either Chinese or Japanese. I have written about it thrice (2008, 2010, 2012). I can't think of a better explanation than what I proposed in 2012.

I discovered what initially appeared to be another exception tonight in Yamamoto (2009: 81): 嗹馬 Denmāku 'Denmark' which would be read as Lianma in Mandarin and as *Renba in normal Sino-Japanese. I didn't think it could have been created by a Japanese speaker because Japanese not only has [d] but also has characters pronounced [den]. Mandarin, on the other hand, has no [d] (what is romanized d is actually an unaspirated [t]). So was voiced l intended to be a substitute for voiced d? Apparently it was, as it and similar Chinese names for Denmark turn up in the 1852 edition of the 海國圖志 Illustrated Treatise on the Maritime Kingdoms by 魏源 Wei Yuan:

嗹國 Lianguo (guo is 'country')

領墨 Lingmo (-ng would be an acceptable substitute for -n to a speaker of a Chinese variety like Shanghai without an -n : -ng distinction)

吝因 Linyin (I have no idea what -yin is doing)

9.6.0:36: The Japanese may have taken the spelling 嗹馬 from the Treatise given its influence in Japan:

Wei's work was also to have a later impact on Japanese foreign policy. In 1862, samurai Takasugi Shinsaku, from the ruling Japanese Tokugawa shogunate, visited Shanghai on board the trade ship Senzaimaru. Japan had been forced open by US Commodore Matthew C. Perry less than a decade earlier and the purpose of the mission was to establish how China had fared following the country's defeat in the Second Opium War (1856–1860). Takasugi was aware of the forward thinking exhibited by those such as Wei on the new threats posed by Western "barbarians" [...] Sinologist Joshua Fogel concludes that when Takasugi found out "that the writings of Wei Yuan were out of print in China and that the Chinese were not forcefully preparing to drive the foreigners out of their country, rather than derive from this a long analysis of the failures of the Chinese people, he extracted lessons for the future of Japan". Similarly, after reading the Treatise, scholar and political reformer Yokoi Shōnan became convinced that Japan should embark on a "cautious, gradual and realistic opening of its borders to the Western world" and thereby avoid the mistake China had made in engaging in the First Opium War. Takasugi would later emerge as a leader of the 1868 Meiji Restoration which presaged the emergence of Japan as a modernised nation at the beginning of the 20th century. Yoshida Shōin, influential Japanese intellectual and Meiji reformer, said Wei's Treatise had "made a big impact in our country".


14.9.4.22:45: *BAKUSHIKO AND *BAKUSHIK(W)A

I almost added this to my last post, but I ran out of time, and the topic is somewhat different, n  ...

I have long been puzzled by Chinese character spellings of foreign place names in Japanese. Some seem to be hybrids of Chinese and Japanese readings.

For instance, when I Googled for モスコウ Mosukou and 1939 last night, I found 北京より 莫斯古へ Pekin yori Mosukou e, 高山洋吉 Takayama Yōkichi's 1939 translation of Sven Hedin's Von Peking nach Moskau. Although the Kobe University City Library has it catalogued as Pekin yori Mosukuwa e (with the currently dominant Japanese name of the city - the most likely term to be searched), I assume 莫斯古 was meant to be read as Mosukou since the name appears as モスコウ Mosukou in the title of chapter 11. 莫斯古 Mosukou would be read as *Mosigu [mwɔ sz̩ ku] in Mandarin and *Bakushiko in normal Sino-Japanese. Whoever created that spelling seemed to be thinking of Mandarin 莫斯 [mwɔ sz̩] followed by Sino-Japanese 古 ko. (There is no [mɔ] or [mo] in standard Mandarin.)

I used to think another Japanese spelling 莫斯科 was a direct loan from Mandarin Mosike [mwɔ sz̩ kʰɤ] (ke was once [kʰɔ] and is still [kʰɔ] or the like in many other varieties of Chinese today). But could it be a blend of Mandarin 莫斯 [mwɔ sz̩] followed by Sino-Japanese 科 kwa (pronounced [ka])? Was 莫斯科 first attested in Chinese or Japanese? In any case, it cannot be based on its hypothetical normal Sino-Japanese reading *Bakushikwa.

Tonight I discovered a third Japanese spelling 莫斯哥 in Yamamoto (2009: 78). This looks like a direct loan from the less common Mandarin name Mosige [mwɔ sz̩ kɤ] (ge was once [kɔ] and is still [kɔ] or the like in many other varieties of Chinese today). In normal Sino-Japanese, it would be read *Bakushika which sounds nothing like Moskva or Moscow.

In the Kobe University Library Newspaper Clippings Collection, 莫斯科 appears 815 times between 1912 and 1941, 莫斯哥 appears only once in 1916, and 莫斯古 does not appear at all. The three katakana spellings combined outnumber the kanji spellings by nearly two to one (1564 : 816).
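
The totals behind that ratio can be checked with a few lines of arithmetic. This Python sketch assumes that the three katakana spellings are the ones counted in the 14.9.3 post (Mosukuwa 866, Mosukō 568, Mosukou 130); the variable names are mine.

```python
# Result counts from the Kobe University Library Newspaper Clippings Collection.
# Katakana counts are taken from the 14.9.3 post (assumed to be the same three spellings).
katakana = {"Mosukuwa": 866, "Mosukō": 568, "Mosukou": 130}
kanji = {"莫斯科": 815, "莫斯哥": 1, "莫斯古": 0}

katakana_total = sum(katakana.values())
kanji_total = sum(kanji.values())
ratio = katakana_total / kanji_total

print(katakana_total, kanji_total, round(ratio, 2))  # prints: 1564 816 1.92
```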


14.9.3.23:59: MOSUKUWA VS. MOSUKŌ (AND MOSUKOU)

While looking up ワルシャワ Warushawa and ワルソー Warusō in various editions of Kenkyusha's New Japanese-English Dictionary, I noticed that their distribution paralleled that of モスクワ Mosukuwa (< Russian Moskva) and モスコー Mosukō (< English Moscow). Was there a shift toward more Slavic-flavored Japanizations by 1974?

Here is the distribution of both terms and a third term in the Kobe University Library Newspaper Clippings Collection:

Mosukuwa: 866 results, 1912-1942

Mosukō: 568 results, 1912-1942

モスコウ Mosukou: 130 results, 1915-1936

I was expecting Mosukuwa to be less common than Mosukō by analogy with Warushawa and Warusō, but the reverse is true. I wish I had postwar statistics. Here are current Google statistics showing that the gaps between the three have widened considerably:

Mosukuwa: 1.36 million results

Mosukō: 119,000 results including non-Russian Moscows (cf. the use of Warusō for non-Polish Warsaws)

Mosukou: 10,600 results including モスコウイッツ Mosukowittsu 'Moskowitz'

I've never heard Moscow rhyme with Mexico. Is that pronunciation still current, and if so, where?


14.9.2.23:23: WARUSHAWA VS. WARUSŌ

The influence of English on Japanese has only grown over time, while the influence of other European languages has waned: e.g., German-based dēdētē 'DDT' (in this dictionary of extinct Japanese words) has been replaced by English-based dīdītī. (What was the last major German or French loanword in Japanese?) So when I see a continental European loanword, I assume it is pre-1945: e.g., ワルシャワ Warushawa 'Warsaw' which sounds like Polish Warszawa (though Polish w is [v]).

That was why I was surprised to see an English-like ワルソー Warusō for 'Warsaw' in the September 2, 1939, Asahi shinbun. (Yes, the 75th anniversary of the beginning of WWII is still on my mind.) How far back do Warushawa and Warusō go? I wish Google Ngram Viewer worked with Japanese.

I quickly found various attestations of Warusō from the period:

- the September 18, 1939, entry of the diary of 馬淵良三 Mabuchi Ryōzō

- the October 6, 1939, 大陸日報 Continental Daily News published in Vancouver, BC

- 宮本百合子 Miyamoto Yuriko, "The Flames of the Life of Mrs. Curie" (December 1939)

- the Privy Council's "Abolishing an Imperial Embassy in Poland" (October 1, 1941)

Was Warusō the standard Japanese name for Warsaw at the time? Judging from Wikipedia, today it seems to linger only in a few contexts such as ワルソー条約 Warusō jōyaku 'Warsaw Convention' (1929; cf. ワルシャワ条約 Warushawa jōyaku 'Warsaw Pact' with the same jōyaku) and ワルソー・コンチェルト Warusō koncheruto 'Warsaw Concerto' (1941). The Japanese Wikipedia entry for Warsaw doesn't mention Warusō as an alternative of Warushawa.

14.9.3.21:51: Looking in various editions of Kenkyusha's New Japanese-English Dictionary, I discovered that editions prior to 1974 only listed Warusō. The 1974 edition listed both Warusō and Warushawa for the first time.

The 1975 edition of Sanseido's New Concise Japanese-English Dictionary that I have been using for over thirty years lists only Warushawa in its appendix of place names.

It would be interesting to see when, say, Asahi shinbun shifted from Warusō to Warushawa.

14.9.3.22:36: A couple more data points: I just found Warusō in the December 20, 1919, Ōsaka asahi shinbun (image / HTML) and in Richard Austin Freeman's The Case of Oscar Brodski, translated into Japanese by 妹尾韶夫 Seno Akio in 1957.

Okay, a few more: Warushawa first appears in the Kobe University Library Newspaper Clippings Collection in the May 30, 1913, Jiji shinpō (image / HTML), and appears in papers up through 1939 (image / HTML). Warusō first appears in that collection in 1915 (image / HTML), and last appears in 1941 (image / HTML). Warusō outnumbers Warushawa by a ratio of roughly seven to one (138 : 21). Today in Google, Warusō is vastly outnumbered by Warushawa (20,300 : 515,000). Warusō is the only Japanization of Warsaws outside Poland, but I presume those other Warsaws aren't mentioned enough to give Warushawa serious competition.


14.9.1.22:09: GYDDANYZC

I have long been interested in Slavic partly because it underwent massive vowel loss paralleling the massive vowel losses I reconstruct for Chinese and Tangut.

My favorite example is monosyllabic Gdańsk from *Gŭdanĭskŭ* with four syllables (Comrie 1987: 326). That city has been on my mind lately because today is the 75th anniversary of the German invasion of Poland.

On Saturday I learned that Gdańsk is first attested as Gyddanyzc sometime after 997 AD. Is that spelling evidence for

- the retention of the medial reduced vowels (*ŭ and *ĭ) and

- the loss of the final reduced vowel (*-ŭ)

circa 1000 AD?

Why were the vowels that were later lost both spelled y? Had they merged into [ɨ] in the version of the name that was transcribed? They could not have merged in the ancestor of the modern name Gdańsk, as ń is from *nĭ and still reflects the palatal quality of the lost vowel *ĭ. Or is ny a transcription of [ɲ]?

Is the doubling of d significant?

Why was the sibilant before c [k] written as z instead of s? The z also appears in the later spellings Kdanzk (1148), Gdanzc (1188), and Danzc (1263). I assume the z of the spellings Danczk (1311), Danczik (1399), and Danczig (1414) is half of a digraph cz [tʂ] and is not evidence for [z].

*This matches Old Church Slavonic Гъданьскъ <Gŭdanĭskŭ>. Is that form of the name attested in ancient texts, or is it a retroactive creation?

I assume the ISO 639-1 code cu for OCS is from c(h)u(rch). cu makes me think of Cuman which has no ISO 639-1 code; its three-letter ISO 639-3 code is qwm. qum and cum were already taken for Sipakapense in Guatemala and Cumeral in Colombia.


14.8.31.1:31: EAT-YMOLOGY 3: *NZ- > *NDZ-?

I just realized that 'eat' from my last post wasn't the best example of a word that had undergone brightening without lenition. What if lenition were followed by fortition (*Nz- > *ndz-) after a nasal?

stem 1: *NI-dza > *NI-dzja > *NI-z- > *Nz- > *ndz- > dzi 1.11

stem 2: *NI-dza-w > *NI-dzjaw > *NI-z- > *Nz- > *ndz- > dzio 1.51

Perhaps these are better examples of brightening without lenition:

Tangut 0749 phi 1.11 'to order' (stem 1), 4568 phio 2.44 'to order' (stem 2) : Japhug kɤ-ɣɤ-xpra 'to order'

also cf. Somang ka-wa-kprá 'to order' preserving Proto-rGyalrong *kpr-

Pre-Tangut *CI-Kpra(-w-H) > *CI-Kprja(w-H) > *Kpr- > phi(o) 1.11/2.44

or Pre-Tangut *KI-pra(-w-H) > *KI-prja(w-H) > *Kpr- > phi(o) 1.11/2.44

Tangut 5449 tị 1.67 'to put' (stem 1), 5633 tiọ 1.72 'to put' (stem 2) : Japhug kɤ-ta 'to put'

Pre-Tangut *CI-S-ta(-w) > *CI-Stja(w) > *tt- > tị ~ tiọ 1.67/1.72

or Pre-Tangut *SI-ta(-w) > *SI-tja(w) > *tt- > tị ~ tiọ 1.67/1.72

If lenition had preceded brightening, ph- and t- would have lenited to *v- and *l- in those words.

I do not know if the *I of the brightening presyllable followed *K- that conditioned the aspiration of ph- and/or the *S- that conditioned the tension of rhymes 1.67 and 1.72 (indicated by a subscript dot). *I could have been in a presyllable preceding one or both of those consonants.

Pre-Tangut *K(I-)pr- nicely matches Proto-rGyalrong *kpr-, but pre-Tangut *S(I)-t- does not match Proto-rGyalrong *t-. Perhaps the aspirated th- of Written Burmese thāḥ 'to put' is from *St-.

Old Chinese 置 *trək-s 'to place' may be an unrelated lookalike even if it is from *r-tək-s, as it has a *-k absent in the other languages.

I cannot explain why stem 2 of 'to order' has a second ('rising') tone from an *-H absent in stem 1.


Tangut fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision