Archives

YAT AND ETA

Today I realized that my interpretation of the early Slavic vowel yat as [ɛː] (< *ai) sounded like the classical value of the Greek letter Η eta. Since Cyrillic is an offshoot of the Greek alphabet, one might expect yat to have been written with an eta-based Cyrillic letter. But of course eta was actually the model for the Cyrillic letter И <I> because eta had raised to [i] by the 4th century AD, long before Cyrillic was created in the late 9th century. [ɛː] was long gone in Greek, so a non-Greek letter was created for yat: Ѣ.

Ѣ looks like a derivative of the front yer letter Ь [ɪ] which in turn looks like a derivative of the Glagolitic front yer letter Ⱐ. But it is strange that a lower mid long vowel was written with a modified lower high short vowel rather than, say, with an additional stroke (like Czech ě which is nowadays used to transliterate yat). I don't see any resemblance between Ѣ and its Glagolitic counterpart Ⱑ.

5.14.23:24: According to Wikipedia, Schenker (1995) thought Ⱑ might be from Greek alpha Α. That makes a lot of sense if yat were [æ].

Modern reflexes of yat vary considerably in height from [ja] with a low vowel in eastern Bulgarian* to [i] in Ukrainian.

*Eastern Bulgarian has two reflexes of yat: [ja] and [ɛ]. The former is in stressed syllables not followed by front vowels. The latter occurs elsewhere.

PROTO-CELTIC VOICED ASPIRATES?

I've seen this Proto-Celtic word list before, but I didn't notice voiced aspirates in it until now:

*mori-steigh-(e/o-) 'sea'

*men-n-dh-e/o- (?) 'want'*

*ati-od-bher-to- (?) 'sacrifice'

Are those pre-Proto-Celtic forms? I thought Proto-Celtic lost aspiration in voiced consonants:

Proto-Indo-European *gh *dh *bh > Proto-Celtic *g *d *b
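To see what the forms above would look like if that rule had applied, here is a minimal Python sketch. It only does mechanical string replacement; the function name `deaspirate` is my own invention, and the outputs are not attested reconstructions, just the result of applying the rule as stated.

```python
# Hypothetical helper: mechanically applies the rule
# Proto-Indo-European *gh *dh *bh > Proto-Celtic *g *d *b.
DEASPIRATION = {"gh": "g", "dh": "d", "bh": "b"}


def deaspirate(form: str) -> str:
    """Replace each voiced aspirate digraph with its plain voiced stop."""
    for aspirate, plain in DEASPIRATION.items():
        form = form.replace(aspirate, plain)
    return form


# The three forms quoted from the word list above.
for form in ["*mori-steigh-(e/o-)", "*men-n-dh-e/o-", "*ati-od-bher-to-"]:
    print(form, ">", deaspirate(form))
```

If the listed forms were already Proto-Celtic, I would expect them to look like the right-hand side of each printed pair.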

*5.14.0:42: This reminds me of Avestan mazdā- 'wisdom' < *mn̥s-dheʔ 'mind-place', though the first root is in the e-grade in Celtic.

CHU AND KRA-DAI (PART 2)

Here's my attempt to reconstruct the Old Chinese (OC) phonetic series of 楚 (Schuessler 2009 series 1-62, Karlgren 1957's series 88 plus 90) to make it fit Chamberlain's (2016) hypothesis from part 1.

The series has five types of Early Middle Chinese (EMC) readings (ignoring final consonants):

I. *sɨə- < *Cɯ-sa- (*kɯ-sa-?) (胥湑稰諝糈壻婿)

II. *ʂɨə- < *kɯ-sa- (疋疏蔬梳糈)

III. *tʂʰɨə- < *kʂʰɨa- < *kɯ-sa- (楚 only)

IV. *ŋæ- < *ŋgʐa- < *N-k-sa- (alternate reading of 疋 only)

V. *sej < *se (alternate reading of 壻婿 only)

The high-vowel presyllables of types I-III conditioned medial *-ɨ- which in turn conditioned the raising of *a to *ʂɨə.

The high-vowel presyllables of type I were lost after conditioning medial *-ɨ-, but they fused with *s in types II and III. *kɯ-s- that fused early became EMC *ʂ- via *kʂ-; *kɯ-s- that fused late became EMC *tʂʰɨə- via *kʂʰ-.

Type III *kʂʰɨaʔ might have approximated an early Kra-Dai *kraʔ, especially if it were phonetically something like [kʁaʔ].

(5.12.0:56: Or if 'Kra' were [kʐaʔ]. Cf. Polish krz [kʂ] from *kʐ- < *krʲ-. Pittayaporn 2009: 99 reconstructed *ks- as a Proto-Tai source of Proto-Southwestern Tai [and hence Siamese] *kʰr-, though he does not list any examples of Proto-Tai *ks-, and he reconstructed the Proto-Tai cognate of 'Kra' as *kraː C 'slave' with *kr- rather than *ks-. Siamese kʰaː C1 'slave' lacks the -r- that would point to medial *-s-. If *ks- became Siamese kʰr-, perhaps *kz- became *kr- and then Siamese kʰ-.)

*N- fused with *k- to form the *ŋ- of type IV.

(5.12.0:11: OC *a fronted to *æ after retroflexes.)

The *-e rhyme of type V is anomalous and unique to 壻~婿 'son-in-law'; it cannot be reconciled with the *-a rhyme of the other types.

5.12.1:03: Added all examples of each type listed in (Schuessler 2009: 59) plus 疋 as the sole example of type IV which was not listed in Schuessler.

CHU AND KRA-DAI (PART 1)

Chamberlain (2016) proposed that the name of the state now known as 楚 Chǔ in Mandarin is the same name as Kra as in Kra-Dai. This is an ingenious idea. But does it really work?

The rhymes certainly match. 楚 ended in *-aʔ in Old Chinese, and 'Kra' in Proto-Kra-Dai was something like *kraʔ (cf. Ostapirat's Proto-Kra *kra C 'Kra' and Pittayaporn's Proto-Tai *kraː C 'slave'; I interpret the C tone category as *-ʔ like Norquest 2016).

The trouble is the initial. If 楚 had initial *kr- in Old Chinese, it would have become Early Middle Chinese †kæʔ and Mandarin †jiǎ. But instead it became Early Middle Chinese *tʂʰɨəʔ and Mandarin chǔ [tʂʰu] with aspirated retroflex initials.

Can those initials be reconciled?

Pulleyblank (1962: 129) proposed that Old Chinese *skʰ- might have become Early Middle Chinese *tʂʰ-. Later, Pulleyblank (1965: 206) proposed Old Chinese *kʰs- as a source of Early Middle Chinese *tʂʰ-. But there is no *s in Proto-Kra-Dai *kraʔ. *s- is likely to have been in the Old Chinese reading of 楚 since nearly all readings of characters in the 疋 phonetic series began with *ʂ- or *s- in Early Middle Chinese. There is no evidence on the Chinese side directly pointing to *k- in 楚 or any other member of the 疋 phonetic series, though 疋 does have another Early Middle Chinese reading *ŋæʔ which could mechanically be derived from an Old Chinese *ŋraʔ - close to *kraʔ but with a velar nasal rather than a stop.

Next: How can I make Chamberlain's idea work?

5.11.11:56: Added reference to Pulleyblank (1965) and link to Pulleyblank (1962).

ARMENIAN, KOREAN, AND BURMESE APPROACHES TO KHITAN OBSTRUENTS

In my last entry, I wrote,

the Khitan transcribed Liao Chinese *t as both <t> and <d>

There are similar inconsistencies with other obstruents and to a lesser extent even in the spelling of native Khitan words: e.g., 'second' is spelled with both 162 <c> and 104 <dz> (Kane 2009: 115).

I originally thought that Liao Chinese and Khitan had different obstruent systems: e.g., LC had an unaspirated : aspirated distinction whereas Khitan had a voicing distinction. But that wouldn't explain the inconsistency in Khitan native words.

Today it occurred to me that Khitan might have had Armenian-style variation:

The major phonetic difference between dialects is in the reflexes of Classical Armenian voice-onset time. The seven dialect types have the following correspondences, illustrated with the t–d series:

Correspondence in initial position

Indo-European                      *d    *dʰ    *t
Erevan                             t
Istanbul                           d
Kharpert, Middle Armenian          d
Malatya, SWA
Classical Armenian, Agulis, SEA    t
Van, Artsakh                       t

But of course Khitan had only two obstruent series, not three.

Might the use of certain spellings correlate with certain locations and/or time periods? They would then reflect the obstruent series of different regional/chronological varieties of Khitan. The unspoken assumption of Khitan studies is that the language was homogeneous over a wide area for a long period, but that is unlikely.

Another possibility is that Khitan was like modern Korean in which unaspirated obstruents have voiced and voiceless allophones conditioned by different environments: Sino-Korean 德 /tək/ appears as

[dək] after a sonorant

[tək] elsewhere

Could 254.020 <d.ei> ~ 247.020 <t.ei> transcribing Liao Chinese 德 (Kane 2009: 253) have had a similar distribution?
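The Korean-style rule can be sketched in a few lines of Python. This is a deliberately simplified illustration: the sonorant set, the function name `realize`, and the made-up preceding syllables "kon" and "kuk" are all my assumptions, and real Korean has further processes (such as tensification after obstruents) that are ignored here. Nothing in it is a claim about actual Khitan phonology.

```python
# Simplified assumption: a plain stop is voiced after any sonorant
# (vowel, nasal, or liquid) and voiceless elsewhere.
VOICED = {"p": "b", "t": "d", "k": "g"}
SONORANTS = set("aeiouəɨmnŋlr")


def realize(syllable: str, previous: str = "") -> str:
    """Return a broad phonetic form of `syllable` given the preceding material."""
    initial = syllable[0]
    if previous and previous[-1] in SONORANTS and initial in VOICED:
        return VOICED[initial] + syllable[1:]
    return syllable


print(realize("tək"))         # no preceding segment: stays voiceless
print(realize("tək", "kon"))  # hypothetical preceding syllable ending in a sonorant
print(realize("tək", "kuk"))  # hypothetical preceding syllable ending in a stop
```

If Khitan worked this way, one would expect the <d> spellings to cluster after sonorant-final syllables and the <t> spellings elsewhere, which is something that could be checked against the inscriptions.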

A final possibility is that Khitan was like Burmese in which etymological voiceless consonants may be voiced in close juncture. Wheatley (2009: 729) explains that in Burmese,

[c]lose juncture is characteristic of certain grammatical environments [...] But within compounds the degree of juncture between syllables is unpredictable; the constituents of disyllabic compound nouns (other than recent loanwords) tend to be closely linked, but compound verbs vary, some with open, some with close juncture.

The above possibilities are not mutually exclusive for Khitan.

THE KHITAN EMPEROR SHENGZONG IN UNICODE

Today I discovered that lookalikes for all four Khitan large script characters for 聖宗皇帝 'Emperor Shengzong' (r. 979-1031) exist in Unicode:

𫝢伋皇帝


Of course it's only the first two characters that are interesting; they are unknown to nearly everyone literate in Chinese. The last two are identical to Chinese 皇帝 'emperor'.

'Emperor Shengzong' exemplifies how the Khitan large script to a Chinese eye is a mix of familiar and alien elements. The first two characters combine familiar elements

夕 'evening' + 卞 'hat' = 𫝢

亻 'person' + 及 'to reach' = 伋

in unfamiliar ways.

𫝢 turns out to be a variant of 升 'to rise', which in turn was a homophone of 聖 *šiŋ 'sage' in Liao Chinese aside from its tone. 𫝢/升 and 聖 were not homophones until the late first millennium AD, so the use of 𫝢 for 'sage' may date from the Liao dynasty and is probably not a carryover from the pre-Liao Parhae script hypothesized by Janhunen. Why didn't the Khitan simply recycle 聖 'sage' the way they recycled 皇帝 'emperor'? Was 聖 'sage' too complex for the Khitan large script which favored a low number of strokes per character?

In Chinese, 伋 is a name character of no known meaning. (It is the birth name of Confucius' grandson 子思 Zisi.) It would have been pronounced *ki in Liao Chinese and not 宗 *tsuŋ like 'ancestor'. So the reasoning for 伋 as 'ancestor' is unclear (though at least the 亻 'person' radical makes sense). Might a Khitan or even a Parhae word for 'ancestor' have sounded something like *ki?

(5.9.9:39, revised 14:16: Was 伋 a semantic compound invented by someone who might not have known about the rare character 伋? But I know of no semantic compounds unique to the Khitan large script. The closest instance I can think of is


which consists of 天 'heaven' over 土 'earth'. It is not a true semantic compound because it does not represent a word for 'heaven and earth' or 'world' (the sum of 'heaven and earth'); 土 'earth' seems to disambiguate an unknown Khitan word for 'heaven' from 天 for <tên>, a borrowing from Liao Chinese. The semantic function, if any, of 及 'to reach' in 伋 'ancestor' is less clear.

The Dictionary of Chinese Character Variants has no 伋-like variants of 宗. What I will call Janhunen's Question remains unanswered: If the Khitan wanted a script to distinguish themselves from the Chinese, why did they keep or replace characters seemingly at random? I still think the only possible answer is that they didn't do that - rather, they adapted a sister script of Chinese [Janhunen's hypothetical Parhae script]. The situation is somewhat parallel to that of Cyrillic which is related to the Latin alphabet but not derived from it; they are 'cousins', not 'daughter' and 'mother'.)

Although the shapes of 皇帝 'emperor' are uninteresting, the question of how we know their readings is worth examining. Kane (2009) reads them as <hoŋ di> (= <ghong di> in the transcription system on this site).

However, I have not found any Khitan small script phonetic spelling of the first half of 皇帝 'emperor' or any of its homophones in Chinese. I would expect such a spelling to be 340.071 <> with voiceless 340 <h> rather than voiced-initial 076 <gho>. (There is no known small script character <gh> without a vowel, and devoiced to *x in Liao Chinese.) No spelling <> is in Qidan xiaozi yanjiu (1985: 460). Has such a spelling been found in the thirty-plus years since the publication of that book?

Kane (2009: 244) lists 247.339.339 <t.i.i> as a small script spelling of the second half of 皇帝 'emperor'. Unfortunately, he does not cite a source for this spelling, and it is not in Qidan xiaozi yanjiu (1985: 375). I presume <t.i.i> is from an inscription discovered after Qidan xiaozi yanjiu was written. The <t> of <t.i.i> does not necessarily invalidate Kane's reading di for 帝 since the Khitan transcribed Liao Chinese *t as both <t> and <d>, and they transcribed Liao Chinese *i as both <i> and <i.i>.

5.9.0:33: Why is the name character 伋 glossed in English as 'deceptive' at

5.9.0:49: Kane (2009: 181) also lists a second Khitan large script character ⿰歹卞 for 聖 'sage' with 歹 'bad' on the left instead of 夕 'evening' from Liu and Wang (2004: 27, character 150). That character has no Unicode lookalike; it is character 0177 in N4631 ("Proposal on Encoding Khitan Large Script in UCS") which does not seem to list 𫝢 from Kane (2009: 183). Where is 𫝢 attested? Regardless of whether 𫝢 is an error for ⿰歹卞 and hence not a real Khitan large script character, I have no doubt that ⿰歹卞 is a variant of the Chinese character 𫝢 and is a phonetic loan for 聖 'sage'.

I also think that 𫝢 / ⿰歹卞 <shing> may have been the inspiration for the vaguely similar Tangut character


2shen3 'sage'

whose Tangraphic Sea analysis has been lost.

5.9.22:31: Are Khitan large script characters

1054 (升 + a dot on the right)

1056 (1054 with the first stroke 丿 stretching over both vertical strokes of 廾 plus a dot on the right)

in N4631 further variants of 𫝢 / ⿰歹卞 <shing>?

5.10.1:49: Chinggeltei's 關於契丹文字的特點 (1997: 110) includes 𫝢 in its list of Khitan large script characters.

OBLIQUE AFFRICATES IN CHINESE

Today on Wikipedia I saw that standard Mandarin 斜 xie [ɕjɛ] 'oblique' corresponded to Lower Yangtze Mandarin

colloquial [tɕia]

literary [tɕiɪ]

with affricate initials. The colloquial reading preserves an earlier -a going all the way back to Old Chinese; the literary reading has an innovative raised vowel [ɪ].

The dictionary Middle Chinese initial is *z-. Other dialects of Middle Chinese might have had *dz-. In any case, the Old Chinese word began with *sɯ-, though what was between that *sɯ- and *-a is not clear: *sɯ.ɢa, *sɯ.ja, and *sɯ.la are all possible. There is no known external comparison that could narrow down the possibilities. The character 斜 has the phonetic 余 *Cɯ.la, but the character 斜 dates from Han times, and at that point *ɢ, *j, and *l might have already merged into *j. (邪 'slant' - a homophone of 斜 in Middle Chinese - may be a pre-Han spelling of the same word. But its phonetic 牙 has a velar nasal initial *ŋ-!)

My hypothetical Middle Chinese *dz- might be from *sɯ.ɢ- > *s.ɢ- > *zɢ- > *zd- > *dz-. But it's more likely that it results from a Late Old Chinese or Middle Chinese confusion of *z- with *dz-. Japanese merged *z- and *dz- into /z/ which is now [dz] initially, [z] medially, and [ddz] when geminated.

Xiaoxuetang reports affricate initials in 斜 in

Mandarin: 天長 Tianchang [tsʰ] (the sole Mandarin example on the site)

Wu: 丹陽 Danyang [dʑiɑ] ~ [dʑiɒ], etc.

(Hui: no data; NB: this 徽 Hui is not the Mandarin-speaking Muslim 回 Hui, whose name is pronounced with a different tone)

Gan: 湖口 Hukou [dʑia], etc.

Xiang: 雙峰 Shuangfeng [dʑio], etc.

Min: 廈門 Amoy [tsʰia] (colloquial; literary [sia]), etc.

Yue: Cantonese [tsʰɛ] (where long ago I first observed this affricate initial corresponding to Middle Chinese *z-; I didn't know such an initial was in Mandarin too)

Ping: 永福 Yongfu [tsʰiə], etc.

Hakka: 梅縣 Meixian [tsʰia] (colloquial; literary [sia]), etc.

The affricate initial is represented in nearly every branch. No Jin variety on that website has an affricate reading. But all but one of the unclassified varieties have an affricate initial.

It seems that literary varieties of Middle Chinese kept *z- (> modern [s]) apart from *dz- while colloquial varieties merged them to various extents.

5.8.13:40: For comparison, let's see if the above dialects also have affricates for Middle Chinese 徐 *zɨə 'to walk slowly; a surname':

Mandarin: 天長 Tianchang [tʃʰʮ], etc.

Wu: 丹陽 Danyang [dʑyz] (sic), etc.

Hui: 旌德 Jingde [tsʰʮ], etc.

Gan: 湖口 Hukou [dzi], etc.

Xiang: 雙峰 Shuangfeng [dy] (sic) ~ [dʑy], etc.

Min: 廈門 Amoy [tsʰi] (colloquial; literary [su]), etc.

Yue: Cantonese [tsʰœy], etc.

Ping: 永福 Yongfu [tsʰy], etc.

Hakka: 梅縣 Meixian [tsʰi], etc.

The only Jin variety with a reading is the most well-known: 太原 Taiyuan [ɕy]. 徐 is a common surname, so it must be in other Jin varieties. The absence of affricates in Jin readings of 斜 'oblique' makes me guess that 徐 also lacks affricates in the rest of Jin, but I don't know.

The unclassified varieties have a mix of initials: e.g.,

富川 Fuchuan [sy]

鍾山 Zhongshan [θy]

賀州 Hezhou [ty] (cf. the stop [d] in Shuangfeng above)

道縣 Daoxian [tso]

連州 Lianzhou [tsʰɛu]

To work out what's going on with them would require studies of their individual phonologies. It is a shame that Xiaoxuetang doesn't seem to have initial, rhyme, and tonal inventories online for each variety. In theory I could extract inventories from the data, but I don't have the time to do that right now.

HAVE A ČĪZBURGERU: ENGLISH BORROWINGS IN LATVIAN

After mentioning Latvian datums last time with its combination of a Latin neuter suffix -um and a Latvian masculine suffix -s, I was curious to see how Baltic languages dealt with a recent influx of English loans. Baltic languages and Greek are the only modern Indo-European languages I know of that still retain ancient -s suffixes in the nominative case.

I guessed that all Latvian borrowings of English consonant-final stems would be placed in the first masculine declension like datums. And it does seem that is generally the case. See these two lists. Even sibilant-final stems are assigned to that declension: e.g., bizness (which is biznes-s and not copying the -ss of the English spelling) and finišs (< finish + -s). I might have expected them to be assigned to the second declension with -is or the third declension with -us.

The exceptions I've seen so far end in -er in English:

adapteris < adapter

menedžeris < manager

peidžeris < pager

porteris < porter

taimeris < timer

Were they assigned to the second declension by analogy with some earlier wave of -eris loans?

Not all English -er words become -eris words in Latvian: cheeseburger has become čīzburgers (with an un-English pronunciation of burger with [u] - †čīzberger would have been closer to the English original). Maybe -burger is by analogy with hamburgers, perhaps in turn influenced by Russian <gamburger>, also with [u]? No, maybe -burger is simply based on a spelling pronunciation.

THE GENDER OF 'DATE' IN BALTO-SLAVIC AND ROMANCE

On the same Wiktionary page as Dutch datum 'date' (masculine despite its Latin neuter ending -um!) are

Czech datum (neuter); cf. Slovak dátum (masculine; why a long á that doesn't match Czech or Latin?; its neighbor Hungarian dátum also has a long vowel)

Serbo-Croatian and Slovene datum (masculine)

Macedonian <datum> is also masculine. The shift to masculine in Slavic is understandable since consonant-final nouns are generally masculine, and Latin -um is not a Slavic suffix and hence prone to reinterpretation as the ending of a stem.

Leaving Slavic, Latvian has no neuter, and its feminine stems generally end in vowels, so masculine datums is also understandable.

However, Latvian's sister Lithuanian has feminine data (which looks like the Latin plural!) rather than masculine †datumas (see Wikipedia on Lithuanian declension).

And going back to Slavic, Polish also has feminine data, and Bulgarian, Macedonian, Belarusian, Russian, and Ukrainian have feminine <data>. Romance languages have feminine data (French date and Romanian dată) too. Wiktionary derives the Romance forms from a Late Latin data. fdb explains:

Italian, Spanish, Portuguese (etc.) data, and French date (whence English date) are all taken from Mediaeval Latin data, the plural of classical Latin datum, but reinterpreted in these languages as a singular noun. German and Dutch use the classical singular form datum.

All of these are bookish borrowings from Mediaeval or Classical Latin (so-called cultisms) and not organic descendants of the Latin words.

[Someone asks what organic descendants would look like.]

In that case one would expect *dada in Spanish, Portuguese and Italian.

Are the -um forms in Slavic and Latvian borrowings from German Datum?

5.6.0:01: English date then got borrowed into German as das Date which is presumably neuter by analogy with Datum.

5.6.0:09: Added quotation from fdb.

5.6.0:28: Danish date from English has common gender (cf. German above).

5.6.0:32: Added Romanian dată.

THE GENDER OF DUTCH '-ISM'S AND 'DATE'

Not in time for May Day ...

French communisme is masculine, as is its Latinized German equivalent Kommunismus with a restored Latin masculine nominative singular ending -us. So why is Dutch communisme (and other -isme words like socialisme) neuter?

Conversely, datum has a Latin neuter nominative singular ending -um and is still neuter in German. So why is Dutch datum masculine unlike, say, neuter museum which is still neuter in Dutch?

Are the genders by analogy with semantically similar words? Was there ever a time when de communisme and het datum were acceptable?

5.5.0:33: Google Books has examples of het datum from the 18th and 19th centuries. But I can't find any examples of de communisme in Dutch (as opposed to French where that is a preposition-noun sequence rather than a definite article-noun sequence).

Treffers-Daller (1994: 140) discusses French-Dutch gender mismatches and mentions Van Marle's hypothesis that French borrowings are marked and may receive the marked gender: the less frequent neuter gender (only 25% of Dutch nouns are neuter according to Tuinman 1967).

She also writes,

According to Volland (1986), many French loans obtain neuter gender when borrowed into German. About 60 percent of the borrowings keep the original gender in German, and 40 percent are allocated another gender. In most cases it is the masculine nouns who become neuter in German. It is remarkable that the same tendency for masculine words to become neuter exists in German and in Dutch.

Obviously Kommunismus is not one of those masculine words (though its -us may have made it resistant to gender shift).

CZECH VOWEL ASYMMETRY AGAIN

Judging from the IPA for Czech at Wikipedia, Czech vowels are phonetically as well as distributionally asymmetrical:

/iː/ [iː]                              /u uː/ [u uː]
/i/ [ɪ]
                                       /o oː/ [o oː]
/e eː/ [ɛ ɛː]
                   /a aː/ [a aː]

The front part of the system 'tilts downward' with the exception of /iː/ which is high.

Short /i/ is lower than long /iː/ and has no back counterpart at the same height.

/e eː/ are lower than /o oː/.

How did this system come about? /i iː/ are from earlier front *i *iː and central *ɨ *ɨː.

Was there a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː? (Ukrainian has no phonemic vowel length, though.) The four front vowels in stage 2 then merged into an English-like subsystem with a higher long vowel and a lower short vowel in stage 3:

Stage 1

Stage 2

Stage 3

Unlike Czech, Slovak is next door to Ukrainian, and according to the IPA at Wikipedia it has no [ɪ]; its vowel system is truly symmetrical on the phonetic level if one ignores the increasingly marginal vowel [æ]:

[i iː]              [u uː]
[e eː]              [o oː]
          [a aː]

The Slovak phonology article at Wikipedia, however, paints a more complex picture: e.g., /e eː/ [e̞ e̞ː] may be phonetically higher than /o oː/ [ɔ̝ ɔ̝ː] - the reverse of Czech. (Did the presence of low [æ] - a vowel absent from Czech - incentivize speakers to raise /e eː/ for greater contrast during its heyday in the past?) Nonetheless it seems that length is not correlated with height differences unlike Czech where short and long /i/ have different heights.

Like Czech /i iː/, Slovak /i iː/ are from earlier front *i *iː and central *ɨ *ɨː. So I suspect Slovak also had a Ukrainian-like phase in which the central high vowels became *ɪ *ɪː.

But maybe at some earlier point Czech and/or Slovak had a Rusyn-like stage in which central *ɨ *ɨː coexisted with front *ɪ *ɪː. I still don't understand how Rusyn can have both central /ɨ/ and front /ɪ/ since I assume both are from *ɨ. Are they in complementary distribution? Is one native and one borrowed?

5.4.0:40: Are Czech /e eː/ lower mid because they merged with */ě/ *[ɛː]? */ě/ was historically long, but its reflexes in Czech are both long and short for reasons I don't understand:

*bělъjь > bílý /biːliː/ 'white'

*světъ > svět /svjet/ 'world'

The short reflex is /e/ which may be preceded by a secondary palatal consonant: e.g., /j/ in the case of /svjet/.

CZECH VOWEL ASYMMETRY

Having written about Slavic and vowels in my last two entries, I'm going to combine the two topics together.

The standard Czech vowel system appears symmetrical if one only looks at vowels in isolation. Each short vowel has a long counterpart:


And the diphthongs form a triangle:




But distribution tells a more complex story.

Original *uː became /ou/ except "chiefly in noun prefixes" (Short 1993: 456): e.g., úraz 'injury' but urazit 'to injure'. Why was the prefix *u lengthened to an *uː later preserved in nouns? I still don't understand the backstory of length in Slavic.

Original *oː became uo and then a new /uː/ written <ů> (which I think of as <o> atop <u>); cf. Polish <ó> /u/ and Slovak <ô> /uo/ from earlier *oː. (I'd like to see a chronology of *oː-shifts in West Slavic.)

Loanwords supplied a new /oː/ and /au eu/ to balance /ou/.

Those back vowel developments did not have exact front vowel parallels. *iː did not become †/ei/ (though Short 1993: 464 reports ý /ɨː/ > /ej/ in colloquial Czech), and *eː only sometimes became /iː/ (Short 1993: 464).

INDEPENDENT VOWEL SYMBOLS IN THE INDIC SCRIPTS OF THE PHILIPPINES

Indic scripts typically have two kinds of vowel symbols:

- dependent vowel symbols attached to/in 'orbit' around consonant symbols

- independent vowel symbols

Depending on the script, vowels may be written with dependent vowel symbols plus a carrier <°a>, independent vowel symbols, or a mix of the two.

The Indic scripts of the Philippines generally only have three independent vowel symbols each, and on closer observation, some of those symbols are derived from others:

Baybayin for Tagalog on central Luzon in the north: three truly independent symbols <°a °i °u>

Hanunoo on southern Mindoro in the center: independent <°a °u>; <°i> looks like <°a> plus a stroke on the bottom right (unlike either the dependent vowel <i> on the top or the dependent vowel <u> on the bottom)

Buhid on southern Mindoro in the center: independent <°a °u>; <°i> looks like <°a> plus a stroke on the bottom (like the dependent vowel <u> rather than the dependent vowel <i> on the top)

Tagbanwa on Palawan in the southwest: <°a °i> have the same basic shape with different extra strokes: one on the bottom for <°a> and another on top for <°i> (neither stroke matches the dependent vowel <u> on the bottom or the dependent vowel <i>); only <°u> is not derived from another symbol

Kulitan for Kapampangan on central Luzon in the north: independent <°i °u>; <°a> looks like <°u> plus an extra stroke on the bottom left (unlike the dependent vowel <u> on the bottom); <°e °o> look like <°a°i> and <°a°u>, reflecting their apparent origin as "monophthongized diphthongs".

Tagalog is the most conservative; it alone preserves three completely different vowel symbols that still resemble their Indic prototypes.

It is not surprising that the Mindoro scripts have the same innovation (replacing <°i> with a <°a>-derivative).

Tagbanwa and Kulitan seem to have each gone their own way. Tagbanwa is isolated by the sea, but Kulitan is next door to Baybayin.

WHAT HAPPENED TO UKRAINIAN NOMINATIVE PLURAL ADJECTIVES?

I almost 'corrected' Ukrainian <zorjani> 'stellar (nom. pl.)' to †<zoryany> with a <y> ending that I expected by analogy with Russian <ye> and Belarusian <yja> < *-ye after 'hard' (nonpalatalized) stems. But the nominative plural ending is <i> regardless of stem type. Compare:

stem type
'soft' (palatalized)
m. nom. sg.
nom. pl.
m. nom. sg.
nom. pl.

Did <i> spread by analogy through all adjective paradigms despite the fact that hard stems outnumber soft stems (which would have led me to guess that <y> would win out)? Did the higher frequency and lower markedness of <i> in Ukrainian help it to defeat its less palatal competitor <y>?

5.1.0:07: Added table.

5.1.22:22: Maybe Ukrainian shares an areal feature with Polish which has soft nowi 'new (m. pers. nom. pl.)' instead of †nowy. (But the non-m. pers. nom. pl. is still hard nowe rather than soft †nowie.)

Slovak, another neighbor of Ukrainian, has a mixed pattern like Polish: soft noví 'new (m. anim. nom. pl.)' ~ hard nové (other nom. pl.). A consistently hard paradigm would have †nový ~ nové and a consistently soft paradigm would have noví ~ †novie. (Both í and ý are /iː/, but in the past I assume ý was something like /ɨː/. No long /ieː/ exists.)

So does Czech: noví 'new (m. anim. nom. pl.)' instead of †nový. (As in Slovak, both í and ý are /iː/, but in the past I assume ý was something like /ɨː/.) Unlike any of the above languages, Czech has three types of nominative plurals:

1. soft noví 'new (m. anim. nom. pl.)'

2. hard nové 'new (m. inanim. + fem. nom. pl.)' instead of soft †noví < *-ie

3. hard nová 'new (neut. anim. nom. pl.)' instead of soft †noví < *-ie < *-a̋

Interslavic doesn't have a 'soft' e, so the non-m. anim. nom. pl. has to be hard:

soft novi 'new (m. anim. nom. pl.)'

hard nove 'new (other nom. pl.)'

This two-way distinction is hard for me to grasp since I'm accustomed to Russian having a single form for both categories.

STAR WARS IN SLAVIC

Having just linked to the Belarusian Wikipedia's entry on Star Wars, I was surprised by how Star was translated as <Zornyja> which isn't cognate to the 'star' word in most of the other Slavic titles for the movie:

South Slavic

Bosnian zvijezda 'star'

Croatian Zvjezdani 'stellar'

Serbian zvezda 'star'

Slovenian zvezd 'of the stars'

Bulgarian <Mežduzvezdni> 'interstellar'

Macedonian <zvezdite> 'the stars'

West Slavic

Polish Gwiezdne 'stellar'

Silesian Gwjezdne 'stellar' (did an author of this article translate the title?)

Slovak Hviezdne 'stellar'

East Slavic

Russian <zvëzdnye> 'stellar'

The exceptions are Ukrainian <Zorjani> 'stellar' and Czech Star (Wars) (untranslated!).

I was expecting a Belarusian adjective derived from <zvjazda> 'star' (the name of this newspaper that I've seen online) - something like Interslavic zvězdne. <Zvjazdnyja>?

Wiktionary derives Belarusian <zorka> 'star' from Proto-Slavic *zorja. But is the word attested outside East Slavic? The only cognate I know of is Ukrainian <zirka> 'star' whose <i> is unexpected; normally *o > <i> before *ь or *ъ, not *a. (The Ukrainian adjective <Zorjani> 'stellar' preserves *o.)

4.30.1:30: Filled out the list of equivalents of Star and added the final note about <Zorjani>.

4.30.21:21: I might as well survey the second half of the title in Slavic as well. I'm going to guess that it's some cognate of Belarusian <vojny> 'wars' almost everywhere: cf. Interslavic vojny 'wars'. I seem to recall an exception other than the untranslated Wars in Czech - ah, it was Serbo-Croatian!

South Slavic

Serbo-Croatian ratovi 'wars' (but would vojne be theoretically possible?)

Slovenian vojna 'war'

Bulgarian <vojni> 'wars'

Macedonian <vojna> 'war'

West Slavic

Polish and Silesian wojny 'wars'

Slovak vojny 'wars'

East Slavic

Ukrainian <vijny> 'wars' (nom. pl. of <vijna>; as with <zirka>, why did *o become <i> even without a following *ь or *ъ?)

4.30.23:23: Duh, the word was *vojьna in Proto-Slavic. And I suppose <zirka> 'star' is from an earlier *zorьka or *zorъka.

Russian <vojny> 'wars'

Serbo-Croatian rat turns out to be the cognate of Ancient Greek ἔρις éris 'strife' ... and English earnest!? I see the word is in East Slavic as well, but not West Slavic, so vojna is the best choice for Interslavic since it's understood across the entire family.

TABLES AND FALCONS: THE FATE OF FINAL *L IN SLAVIC

Polish kiełbasa /kʲewbasa/ from my last two entries is spelled with ł but is no longer pronounced with an [l].

Standard Polish once had three kinds of phonetic laterals, but only two survive today: a palatal allophone before /i/ and a dental allophone elsewhere.

Earlier phonetic
Current phonetic
Current phonemic
Example (from de Bray 1980: 261)
łapa 'paw'
lato 'summer'
list 'letter'

The reflexes of Polish laterals seem straightforward: old hard *l becomes /w/ and old soft *lʲ becomes /l/.

Hence *stolъ 'table' and *sokolъ 'falcon' became Polish stół /stuw/ and sokół /sokuw/.

(I can't predict when *o became ó /u/.)

What does not seem straightforward to me is the fate of syllable-final *l in Ukrainian, Belarusian, and Serbo-Croatian.

There is a tendency toward shifting syllable-final *-l to /w w o/ in those languages: e.g.,

Ukrainian /stojaw/ 'stood' (masc. sg.) < *-l

Belarusian /stajaw/ 'stood' (masc. sg.) < *-l

Serbo-Croatian /stajao/ 'stood' (masc. sg.) < *-l

The best-known example might be Serbo-Croatian /beograd/ (cf. English Belgrade reflecting earlier *l).

Nonetheless, 'table' and 'falcon' may retain *-l:

Ukrainian /stil/, /sokil/

Belarusian /stol/, /sokal/

Serbo-Croatian /sto/ (Serbia) ~ /stol/ (Croatia), /soko/ (Bosnia, Serbia) ~ /sokol/ (Croatia)

(Countries are from Wiktionary entries.)

In Belarusian, word-final *l remains except in the past tense masculine singular (Mayo 1993: 893). (Did it erode there due to high frequency?)

The situation in Ukrainian seems similar, though I know of one case of /w/ < *l that is not a past tense masculine singular: /piw/ < *polъ 'half'.

Could /l/ retention in Croatian stol 'table' be motivated by avoiding homophony with 'hundred' which is /sto/ across Slavic? That doesn't explain Croatian sokol 'falcon', though. Browne and Alt (n.d.: 20) write,

In adjectives and nouns it [*l > o] is widespread though some words avoid it: masculine singular nominative mio [< *mil] 'nice', feminine mila, but ohol 'haughty', feminine ohola.

I assume borrowings postdating *l-shifts retain final -l in Serbo-Croatian: e.g., hotel (not †hoteo).

Ukrainian and Belarusian seem to favor borrowing foreign -l-words with /lʲ/:

U /hotelʲ/, B /hatelʲ/ 'hotel'

U /alkoholʲ/, B /alkaholʲ/ 'alcohol'

but U <mark hemill> and B <mark hèmil>, both /mark hemil/ 'Mark Hamill'. (The B form is from the B Wikipedia entry for the original Star Wars [Зорныя войны. Эпізод IV: Новая надзея].)

4.29.21:57: Added Mayo on Belarusian, Ukrainian /piw/, Browne and Alt quotation, and everything after that.

IRREGULARITIES IN 'KIELBASA' REVISITED

Yesterday I discovered in de Bray's (1980: 258) book on West Slavic that Polish kiełbasa /kʲewbasa/ is in fact the regular reflex of an earlier *kl̩basa (cf. Slovak klbása ~ klobása). I assume his hard *l̩ goes back to Proto-Slavic *ъl.

But I still don't know how to account for the front vowels of

Ukrainian ківбаса <kivbasa> < *kilbasa

Belarusian кілбаса <kilbasa> ~ келбаса <kelbasa>

Are they borrowings of forms resembling Polish kiełbasa or pre-Polish (proto-West Slavic?) *kl̩basa? If they are from *kl̩basa, their front vowels could have been inserted to avoid /klb/-clusters that are not possible in East Slavic.

My guess is that Belarusian келбаса <kelbasa> is a borrowing from Polish kiełbasa, whereas Belarusian кілбаса <kilbasa> is an older form with an epenthetic vowel.

Ukrainian ківбаса <kivbasa> was presumably borrowed as *kilbasa before *l > <v> /w/. I don't think it's from Polish since

- the height of the first vowel doesn't match

- Polish ł apparently became [w] in the standard language only in the early twentieth century (Wikipedia); Morfill (1884: 1) says it is "a very strong l", not [w].

- Polish ł is still [ɫ̪] and not [w] in eastern dialects of Polish in contact with Ukrainian (Wikipedia)

A recent borrowing from the modern standard pronunciation of kiełbasa would be †<kevbasa> and a borrowing from a pre-20th century standard pronunciation or an eastern dialectal pronunciation would be †<kelbasa>.

IRREGULARITIES IN 'KIELBASA'

Wiktionary derives Polish kiełbasa /kʲewbasa/ and its relatives from a Proto-Slavic *kъlbasa, in turn borrowed from some Turkic word similar to modern Turkish külbastı 'roasted meat', lit. 'ash-pressed'. Irregularity within Slavic implies that the word was borrowed more than once.

The Polish word and nonstandard forms like

Ukrainian ківбаса <kivbasa>

Belarusian кілбаса <kilbasa> ~ келбаса <kelbasa>

have front vowels /i e/ that I would not expect from Proto-Slavic *ъ.

At first I thought that maybe the Polish and Belarusian forms were from an earlier Ukrainian

*külbasa < *kölbasa < *kolbasa (cf. standard Ukrainian ковбаса <kovbasa>) < *kъlbasa

but *o only raises to і in standard Ukrainian before a lost weak jer (*ъ or *ь) which wasn't in this word. Maybe the <kivbasa> dialect worked differently.

My current guess is that the /i e/ vowels in Polish, Ukrainian, and Belarusian reflect attempts to imitate Turkic ü and are not from *o or *ъ.

The Belarusian forms have /l/ instead of /w/ < *l corresponding to Ukrainian <v> /w/ < *l and Polish ł < *l. This suggests that the Belarusian borrowings postdate the shift of *l to /w/ in Belarusian. But maybe I misunderstand when *l becomes /w/ in Belarusian.

A SHARED *SHCH-IFT IN CHINESE AND RUSSIAN

Last Friday (yes, I'm behind), I saw

新商品 'new product', lit. 'new trade item'

on packaging.

In Old Chinese, 商 was

either *sɯ-taŋ (corresponding to Baxter and Sagart 2014's *s-taŋ)

or *sɯ-laŋ (corresponding to Schuessler 2009's *lhaŋ)

and in Middle Chinese, it was *ɕɨaŋ.

It occurred to me that the palatalization of *sɯ-t- to *ɕ-

*sɯ-t- > *sɯ-tɨ-  > *stɨ- > *stɕɨ- > *ɕtɕɨ- > *ɕːɨ- > *ɕɨ-

was like what I understand to be the palatalization of *stj- to [ɕː] in Russian:

*stj- > *stɕ- > *ɕtɕ- > щ [ɕː]

Above I presume there was an intermediate *ɕtɕɨ-stage at some point in Old Chinese resembling romanizations of Russian щ as šč or shch (e.g., Хрущёв Khrushchev), but without external evidence (e.g., Old Chinese transcriptions of a foreign word with šč-), it's impossible to say when that point was.
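The Russian chain above can be sketched as a list of ordered rewrite rules. This is only an illustration under my own assumptions: the rule list copies the stages given in this post, and the names RUSSIAN_STJ_CHAIN and apply_chain are hypothetical, not taken from any source.

```python
# Ordered rewrite rules copied from the stages proposed above for Russian.
# Each rule is (earlier form, later form); all names here are illustrative.
RUSSIAN_STJ_CHAIN = [
    ("stj", "stɕ"),  # *stj- > *stɕ-
    ("stɕ", "ɕtɕ"),  # *stɕ- > *ɕtɕ-
    ("ɕtɕ", "ɕː"),   # *ɕtɕ- > [ɕː]
]


def apply_chain(form, rules):
    """Apply each rule once, in order, and return every intermediate stage."""
    stages = [form]
    for old, new in rules:
        form = form.replace(old, new)
        stages.append(form)
    return stages


print(" > ".join(apply_chain("stj", RUSSIAN_STJ_CHAIN)))
```

The same rule list could be run on a whole reconstructed form such as vmestju, though real sound changes would of course need conditioning environments that plain string replacement ignores.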

3.14.11:45: I assume that Russian alternations such as

вместить 'to contain (perf.)' ~ вмещу 'I will contain'

can be internally reconstructed as

*vmestitĭ ~ *vmestju

to fit the pattern of

вменить 'to consider (perf.)' ~ вменю 'I will consider'

< *vmenitĭ ~ *vmenju

Ideally I'd like to find an example of initial щ- [ɕː] from *stj-, but I think initial щ [ɕː] is normally from *sk-. A possible exception I found in Preobrazhensky's Etymological Dictionary of the Russian Language is щегол 'goldfinch'; Duden says German Stieglitz 'goldfinch' is of Slavic origin.

Proto-Slavic *štjegŭlŭ? > *ščegŭlŭ

East Slavic:

Ukrainian щиголь <ščyhol'>, щоголь <ščohol'>, щоглих <ščohlix>

Belarusian щигель <ščihel'>, щиглик <ščiglik> (I have kept Preobrazhensky's spellings with щ and и instead of modern шч and і)

(why -ль as if from *-lĭ?)

(no South Slavic reflexes? I would expect Bulgarian initial щ- [št], Serbo-Croatian initial št-, and Slovene initial šč-)

West Slavic:

Czech stehlec, stehlík (with ste- rather than the regular ště- [ʃcɛ] - could this be a borrowing from some variety of German in which st- was [st] instead of [ʃt]?)

Polish szczygieł [ʂtʂɨɡʲɛw]

Upper Sorbian šćihlica [ʃtsʲihlitsa]

Lower Sorbian ščgeľc [ʂtʂgɛlts] (I have kept Preobrazhensky's spelling with ľ instead of modern l)

The reflexes of *stj- could have had parallels in Old Chinese at different stages and/or different places.

A LITTLE MISTAKE: ÍT ÓT TO BE THE PHONETIC

In my last post, I wrote that 乚 ất was the phonetic of the Vietnamese Chữ Nôm character 𡮒 ót 'a kind of fish'. After announcing that post on Twitter, I realized that the actual phonetic was 𠃝 which has two readings, ít 'little' and út 'youngest'. I didn't think of 𠃝 because 乙 appears as 乚  in 𡮒.

If the creator of 𡮒 had the reading út in mind for its phonetic 𠃝, the score of 𡮒 would be 2 + 3 + 2 + 2 = 9 - much higher than my original score of 6.

乙 is a 'Semitic phonetic': it can represent syllables with a wide range of vowels as long as those vowels are within the consonantal frame [ʔ-t]:

Neutral or achromatic vowels (neither palatal nor labial)

ướt [ʔɨət]

ất [ʔət]

ớt [ʔəːt]

𢖮 ắt [ʔat]

𢖮 át [ʔaːt]

Palatal vowels

𠃝 ít [ʔit]

𠮙 ét [ʔɛt]

Labial vowels

𠃝 út [ʔut]

𡮒 ót [ʔɔt]

All of those syllables have the sắc tone written with an acute accent. Syllables with initial glottal stops and final stops regularly develop that tone.

Such a range of vocalism for a phonetic is unusual in Chữ Nôm. In my 2003 book, I proposed that phonetics generally belong to three vowel classes: neutral, palatal, or labial.

'Semitic phonetics' are exceptions to that generalization: e.g., 曰 viết in

neutral: 曰 vất [vət], 抇 vớt [vəːt]

palatal: 𢪏 vít [vit], 𧿭 vết [vet], 𢪏 vét [vɛt]

labial: ⿰曰𡿨 vót [vɔt]

3.1.0:39: Compare the ranges of readings for 'Semitic phonetics' above with those for کت <kt> listed in Hayyim's  New Persian-English Dictionary:

neutral: kat

palatal: ket

labial: kot

(Of course, Persian is not a Semitic language, but it is written in a Semitic script.)

One difference is that all of those k-t readings have no tones, whereas all of the readings for Chữ Nôm characters with the two 'Semitic phonetics' above have the same tone. Perhaps the term 'Semitic phonetic' is a misnomer if the consonantal frames are actually consonant-and-tone frames.

甘 cam is a third 'Semitic phonetic' whose derivatives below have readings with three different tones (ngang, huyền, sắc) as well as three different vowel classes:

neutral: 坩柑泔 cam [kaːm], 紺 cám [kaːm], ⿰月甘 cằm [kam], 𩚵 [kəːm], 鉗 cườm [kɨəm]

palatal: 鉗 kìm [kim], ghìm [ɣim], kiềm [kiəm], kềm [kem], kèm [kɛm]

labial: 鉗 cùm [kum], 柑 cùm [kum]

Note, however, that all but one of the readings in that sample have either the ngang or huyền tones which are variants of the same proto-tone conditioned by voicing or its absence in proto-onsets. Also, only one of those characters is a made-in-Vietnam character (⿰月甘). 甘 was already a neutral and palatal phonetic in Middle Chinese because Old Chinese *a often had palatal reflexes after nonemphatic initials. An ideal example of a 'Semitic phonetic' would have many made-in-Vietnam derivatives with a wide range of vowels and tones. I should dig deeper to see if I can find one.

ÓT TO BE WRITTEN: FISHING FOR PHONETICS

The Vietnamese Chữ Nôm script represents Vietnamese syllables with existing and modified Chinese characters. The problem is that Vietnamese has many more syllables than Sino-Vietnamese, the subset of Vietnamese syllables that are Chinese character readings. For instance, Vietnamese has syllables ending in -ót, a rhyme absent from Sino-Vietnamese.

In my last two posts, I looked at Vietnamese solutions for writing the syllable lót.

I got curious about how other -ót syllables were written and found several strategies. My examples are not exhaustive, and I have omitted glosses in most cases since I am focusing on readings.

1. Overall match

⿰口脫 thót : 脫 thoát (score: 2 + 3 + 2 + 2 = 9; not a 10 only because the vowel heights don't match: o [ɔ] is higher than oa [wa], though I could be generous and say oa is like [o] + [a], and [ɔ] is between those two vowels in height)

2. Matching the onset and coda without much regard for the vowel

𡮒 ót 'a kind of fish' : 乚 ất (the unwritten onset is [ʔ]; score: 2 + 0 + 2 + 2 = 6)

mót : 蔑 miệt (score: 2 + 1 + 2 + 1 = 6; the only matching vowel quality is length*)

⿰曰𡿨 vót : 曰 viết (score: 2 + 1 + 2 + 2 = 7; the only matching vowel quality is length)

This is the consonantal skeleton or Semitic strategy. If English were written with such a strategy:

cat = drawing of a cat

Kate = <woman> + <cat>

kite = <wing> (representing flight) + <cat>

cut = <blade> + <cat>

coat = <clothes> + <cat>

coot = <bird> + <cat>

caught = <hand> + <cat>

Cf. the reverse Semitic strategy (5 below).

3. Matching the rhyme without much if any regard for the onset

3a. Glottal onset : nonglottal phonetic

𡁾 hót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 or 1 + 2 + 2 + 2 = 6 or 7, depending on whether the aspiration of th- [tʰ] < *sʰ-? counts as a partial match for h-)

3b. *Palatal onset : nonpalatal phonetic

chót with initial [c] : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)

giót < *ɟ- < *CV-c- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 埣 was created: *CV-c- is not far from *(t)s-, whereas modern gi- [z] ~ [j] is far from t-)

xót < *ɕ- < *cʰ- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on how close the initials were when 埣 was created: *cʰ- is not far from *(t)s-, whereas modern x- [s] is far from t-)

⿰律𡿨 xót < *ɕ- < *cʰ- : 律 luật (score: 0 + 2 + 2 + 1 = 5)

3c. *Retroflex onset : nonpalatal phonetic

sót < *ʂ- < *Cr- : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *sr- which isn't too far from *(t)s-; *(t)s- had hardened to t- by the time *Cr- fused into *ʂ-)

rót < *r- or *CV-s- (proto-onset unknown) : 卒 tốt < *(t)s- (score: 0 or 1 + 3 + 2 + 2 = 7 or 8, depending on whether the proto-onset was *CV-s-)

3d. Palatal nasal onset nh- [ɲ] : oral onset phonetic

nhót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)

𦝬 nhót : 突 đột with initial [ɗ] < *t- (score: 0 + 3 + 2 + 1 = 6)

𣑵 nhót : 聿 duật with initial [z] ~ [j] < *dʲ- < *j- (score: 0 or 1 + 2 + 2 + 1 = 5 or 6, depending on what the onset of 聿 was when 𣑵 was created)

3e. Lateral onset : nonlateral onset phonetic

⿰貝骨 lót : 骨 cốt (score: 0 + 3 + 2 + 2 = 7)

lót : 卒 tốt < *(t)s- (score: 1 + 3 + 2 + 2 = 8)

3f. Labial onset : nonlabial onset phonetic

𡁾 vót : 說 thuyết < *ɕ- (or *sʰ-?) (score: 0 + 2 + 2 + 2 = 6)

vót : 卒 tốt < *(t)s- (score: 0 + 3 + 2 + 2 = 7)

This character could belong to 2 or 3a depending on which part is phonetic:

⿰孛乙 ót 'back of brain' : 孛 bột 'comet' + 乙 ất 'second Heavenly Stem' (score: 0 + 3 + 2 + 1 = 6 if 孛 is phonetic or 2 + 0 + 2 + 2 = 6 if 乙 is phonetic)

Neither part is obviously semantic. The absence of any component meaning 'brain' or even 'head' is puzzling. Could this be a double phonetic compound with 孛 approximating the vowel and 乙 the rest?

4. Approximating the onset, vowel, and tone without regard for the coda

𠲿 thót : 束 thúc (score: 2 + 3 + 1 + 2 = 8)

I suspect 𠲿 was created by a speaker of a central or southern dialect in which *-t > [k]. If so, 𠲿 is really an example of strategy 1, and the score should be 9 (with a penalty solely for vowel height mismatch).

5. Approximating the vowel and tone without regard for the consonants

The reverse Semitic strategy (cf. 2 which is the Semitic strategy).

hót : 束 thúc (score: 0 + 3 + 1 + 2 = 6)

I suspect this usage of 束 started with a speaker of a central or southern dialect in which *-t > [k]. If so, 束 is really an example of strategy 3, and the score should be 7 (with penalties for the onset and vowel height mismatch). The score could be raised to 8 if the aspiration of th- [tʰ] counts as a partial match for h-.

No solution has a score of 4 for vowels simply because no phonetic has a Sino-Vietnamese reading with o [ɔ]. The maximum possible score for -ót syllables is 9 out of an ideal of 10 (= 2 + 4 + 2 + 2). The actual scores above range from 5 to 9. It is not possible to determine the median or the mode of scores for ót-characters from the data in this post because it is incomplete and only typologically rather than statistically representative: e.g., I omitted all but one strategy 1 character with a score of 9 because near-exact matches are boring.
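As a rough check on that range, here is a minimal sketch that re-adds the component scores listed above. It assumes the (onset, vowel, coda, tone) order used throughout this post and leaves out every entry whose onset was scored "0 or 1"; the variable names are my own.

```python
# (label, (onset, vowel, coda, tone)) copied from the unambiguous scores above.
# Entries scored "0 or 1" for the onset are deliberately omitted.
SCORES = [
    ("thót : 脫 thoát", (2, 3, 2, 2)),
    ("ót : 乚 ất", (2, 0, 2, 2)),
    ("mót : 蔑 miệt", (2, 1, 2, 1)),
    ("vót : 曰 viết", (2, 1, 2, 2)),
    ("chót : 卒 tốt", (1, 3, 2, 2)),
    ("xót : 律 luật", (0, 2, 2, 1)),
    ("nhót : 卒 tốt", (0, 3, 2, 2)),
    ("nhót : 突 đột", (0, 3, 2, 1)),
    ("lót : 骨 cốt", (0, 3, 2, 2)),
    ("lót : 卒 tốt", (1, 3, 2, 2)),
    ("vót : 說 thuyết", (0, 2, 2, 2)),
    ("vót : 卒 tốt", (0, 3, 2, 2)),
    ("thót : 束 thúc", (2, 3, 1, 2)),
    ("hót : 束 thúc", (0, 3, 1, 2)),
]

totals = [sum(parts) for _, parts in SCORES]
best_vowel = max(parts[1] for _, parts in SCORES)

print(min(totals), max(totals))  # lowest and highest total in this sample
print(best_vowel)                # highest vowel component in this sample
```

Because the sample is incomplete, the sketch only reports the range and the best vowel score, not a median or mode.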

Until now Chữ Nôm characters and readings have been treated as a uniform, timeless body. The next phase of Chữ Nôm studies should take space and time into account: where and when do certain spellings arise, and what can they tell us about Vietnamese phonetics in a given place and period?

*I consider all Vietnamese vowels and diphthongs to be the same length for scoring purposes with the exceptions of the short vowels ă [a] and â [ə] which cannot appear in syllable-final position because all Vietnamese syllables must be bimoraic. Hypothetical *Că and *Câ-syllables would be monomoraic and therefore not permissible.

A LÓT OF BRIBES OF BONES AND SHELLS

字典𡦂喃引解 Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists

⿰貝骨 (not in Unicode) lót 'bribe'

as a homophone of lót 'to add a layer beneath or inside' from yesterday. (I suspect the noun is an extension of the verb: a bribe is something one pockets - put inside.)

貝 bối 'shell' on the left is the monetary radical. It's not surprising.

What is surprising is 骨 cốt 'bone' on the right with initial [k] instead of [l]. Or is it?

Using yesterday's scoring system for phonetic fidelity, ⿰貝骨 is a 7:

- the initial consonant is a 0 - [k] and [l] have nothing in common

- the vowel is a 3 - o [ɔ] and ô [o] are both back rounded and of the same length; only their height differs

- the final consonant is a 2 - a perfect match

- the tone is a 2 - a perfect match
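The vowel score in that list can be sketched as a count of shared features. This is a minimal illustration, assuming one point for each of the four features named in the scoring system (frontness, height, rounding, length); the feature labels in VOWELS and the function name vowel_score are my own shorthand, not an established analysis.

```python
# Assumed feature values for the two vowels compared above.
# "full" length follows the post's footnote that only ă and â count as short.
VOWELS = {
    "o": {"frontness": "back", "height": "lower-mid", "rounding": "rounded", "length": "full"},
    "ô": {"frontness": "back", "height": "upper-mid", "rounding": "rounded", "length": "full"},
}


def vowel_score(a, b):
    """One point per shared feature, so 4 means a perfect match."""
    return sum(VOWELS[a][feature] == VOWELS[b][feature] for feature in VOWELS[a])


print(vowel_score("o", "ô"))  # 3: only the height differs
```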

Taberd lists a spelling of lót 'bribe' with a matching initial and an ironic original meaning:

律, originally for luật 'law' (bare phonetic)

I find his entry format confusing:

— 揬 | đút —, subornare

Why are the dashes in the Chữ Nôm and the Quốc Ngữ romanization on opposite sides? Why isn't the entry like this?

揬 — | đút —, subornare

đút, another word for 'bribe' (presumably an extended usage of đút 'to insert'), has two other spellings without the 扌 'hand' radical (the means of insertion):

⿰貝突 with the monetary radical plus the same phonetic 突 đột 'suddenly'

With this spelling, the syllables of the redundant compound ⿰貝突⿰貝骨 đút lót 'bribe' would have matching radicals: cf. Sino-Vietnamese 賄賂 hối lộ 'bribe' with double monetary radicals

賥 đút with the monetary radical plus the phonetic 卒 tốt 'to end'

Let's score those spellings:

揬 and ⿰貝突: initial 2, vowel 3, final 2, tone 1 = 8

賥: initial 1.5 (t- is closer to đ- than, say, l- which would be a 1), vowel 3, final 2, tone 2 = 8.5

Do scores correlate with textual frequency? Did writers tend to favor better phonetic matches? Probably not. I admit my scoring is arbitrary and for fun. And timely given that the


Thế vận hội Mùa đông

'World athletic meeting Season winter' = 'Olympic Winter Games'

are still going. Though not for long - they end tomorrow.

(I wanted to type a made-in-Vietnam character for mùa 'season', but my editor doesn't support CJK Unified Ideographs Extension E. And it probably never will since KompoZer's development has been frozen since 2010.)

A LÓT OF COMPROMISES: FITTING VIETNAMESE INTO A CHINESE SYLLABARY

Today I found out that one of the Chữ Nôm spellings of Vietnamese tốt 'good' (see parts 1 and 2 of my series)

䘹 = semantic 衤 y 'clothes' + phonetic 卒 tốt 'to end'

is also a Chinese character in the strict sense; it has been attested in Chinese since at least c. 2000 years ago in 楊雄 Yang Xiong's 方言 Fangyan 'Regional Speech' where it refers to *tsout 'underwear'. Did the Vietnamese recycle 䘹 for tốt 'good', or did they unintentionally recreate it? I suspect the latter, as 䘹 is a rare character; the fact that it was encoded in Unicode's Extension A block rather than the main CJK Unified Ideographs block tells me that it wasn't common enough to make it into the first wave of 20,902 characters.

字典𡦂喃引解* Tự Điển Chữ Nôm Dẫn Giải 'Character Dictionary of Chữ Nôm with Quotations and Explanations' lists a second reading for 䘹, lót 'add a layer beneath or inside', citing


lót trong áo cừu

'add.layer in coat'

from 嗣德聖製字學解義歌 Tự Đức thánh chế tự học giải nghĩa ca 'Tự Đức's Sagely Made Song for Character Study and Explaining Meanings' edited by Emperor Tự Đức sometime in the 19th century.

Normally the Vietnamese did not write l-syllables with t-characters. The other three spellings of lót in Tự Điển Chữ Nôm Dẫn Giải have l-phonetics:

1. 律 luật 'law' (bare phonetic)

2. 𢯰 =扌 'hand' + 律 luật

3. ⿰衤律 (not in Unicode) = 衤 'clothes' + 律 luật

It is not possible to write lót with Chinese characters for lót [lɔt] or even as lốt [lot] because no Chinese characters with those Sino-Vietnamese readings exist.

Chinese syllables with *l- ending in stops were borrowed with the nặng tone written as a subscript dot, not the sắc tone written with an acute accent. I can't think of any exceptions to this rule at the moment. So it seems a tonal match was impossible.

Tone aside, a perfect segmental match was also impossible.

lọt may not even have been a theoretically possible reading since *-ɔt does not seem to have been a rhyme in any variety of Chinese known to the Vietnamese during a millennium of Chinese rule.

The absence of lột, on the other hand, is partly accidental - there was no Chinese phonotactic rule forbidding it. lột would have ultimately come from an early Old Chinese *Cʌ-rut, with a *low vowel conditioning the lowering of *u:

*Cʌ-rut > *Cʌ-rout > *rout > *lout > *lot (> Sino-Vietnamese *lột)

luật 'law' comes from early Old Chinese *rut without a preceding *low vowel to lower its *u:

*rut > *lut > *lwit > *lwət (> Sino-Vietnamese luật)

Without any Chinese characters read as lọt or lột (or lót or lốt), the Vietnamese

- had to compromise on the tone if they were to use an l-phonetic

- had to compromise on the vowel if they were to use an l-phonetic

- had to compromise on the initial if they were to use a non-l -ốt phonetic

The four spellings of lót reflect two different kinds of compromises:

- the 律-spellings have l- at the expense of the tone and the vowel

- 䘹 has a perfectly matching rhyme at the expense of the initial

It seems the Vietnamese generally favored approximating the initial, but I would like to see statistics.

It would be fun to come up with a scoring system for how close a Chữ Nôm character reading matches the pronunciation of its Chinese phonetic component**. Off the top of my head:


Consonants (onset and coda scored separately)

0 points - nothing in common

1 point - shared point of articulation (l- and t- as in 䘹) or shared manner of articulation

2 points - both shared point and manner of articulation

Vowels

0 points - nothing in common

1 point each - shared frontness, height, roundness (or its absence), or length

4 points - perfect match

Tones

0 points - nothing in common

1 point - same register or *VQHC class

2 points - perfect match

Using that scoring system, out of a maximum score of 10 (2 for onset consonant, 4 for vowel, 2 for coda consonant, 2 for tone):

the 律-spellings have 7 points:

2 + 2 (length match; partial shared frontness and roundness) + 2 + 1 (same *VQHC class)

䘹 has 9 points:

1 (l- instead of t-) + 4 + 2 + 2

Yet spellings of the 䘹 type which compromise on the initial are less common despite their higher score. That makes me think I need a better scale for measuring onset fidelity.
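The totals above can be re-added with a small sketch. It assumes nothing beyond the four components and maxima given in this post; the class name Match and the variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Component scores for one spelling, using the maxima given above."""
    onset: float  # 0 to 2
    vowel: float  # 0 to 4
    coda: float   # 0 to 2
    tone: float   # 0 to 2

    def total(self):
        return self.onset + self.vowel + self.coda + self.tone


MAXIMUM = Match(onset=2, vowel=4, coda=2, tone=2)
luat_spellings = Match(onset=2, vowel=2, coda=2, tone=1)  # 律 luật for lót
tot_spelling = Match(onset=1, vowel=4, coda=2, tone=2)    # 䘹 (phonetic 卒 tốt) for lót

print(luat_spellings.total(), tot_spelling.total(), MAXIMUM.total())  # 7 9 10
```

Keeping the components separate rather than storing only the total makes it easy to try a different onset scale later without rescoring the vowels, codas, and tones.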

*2.23.14:10: Chữ 'character' can be spelled at least six different ways in Chữ Nôm:

1. 字

2. 𡦂 (字 doubled; this character formation strategy is rare)

3. 𡨸 (like 2 but with one 字 abbreviated as 宁)

4. ⿰ 字 + 宁 (reversal of 3)

5. ⿰ 字 + 文 'writing'

6. 𡨹 (like 4 but with 守 thủ 'to guard' instead of 字; 𡨹 is a Chữ Nôm character for giữ 'to guard' doing double duty for a phonetically similar word chữ; its phonetic 宁 is an abbreviation of 字 chữ)

I don't know which is the preferred spelling of the author (Nguyễn Quang Hồng). I picked 𡦂 to differentiate chữ from 字 tự which also means 'character'. Both words are borrowed from the same Chinese etymon at different periods.

**2.23.14:11: 𡗶 is a rare example of a Chữ Nôm character without a phonetic component: it is a compound of 天 'heaven' above 上 'above'. It is reminiscent of the Khitan large script character

for 'heaven' (with 土 'earth' on the bottom) which probably predates it. (The earliest surviving Chữ Nôm text is from 1209; the earliest known uses of Chữ Nôm from the late first millennium do not include 𡗶.)

WHAT'S SO BEAUTIFUL ABOUT A BUG'S BOTTOM?: THE ORIGIN AND ORTHOGRAPHY OF VIETNAMESE TỐT (PART 2)

I've abandoned a lot of series on this blog, but I haven't forgotten to continue what I started two weeks ago. and both list nine spellings of Vietnamese tốt 'good':

Group A with phonetic 卒 tốt 'to end'

A1: bare phonetic

1. 卒

A2: tốt đẹp: 'good' in the sense of 'good-looking'

2. 䘹 with 衣 'clothes' on the left

Was  䘹 used in reference to clothes, or could it be used more broadly?

3. 𬙼 with 美 'beautiful' on the left

4. 𩫛 with 高 'high' on the left (why isn't this in group A3?)

5. 𡄰 with 善 'good (opposite of evil)' on the left (why isn't this in group A4?)

A3: tốt (dáng cao): 'good' in the sense of 'high'

6. 崪 with 山 'mountain' on the left

7. 崒 with 山 'mountain' on top

A4: tốt xấu: 'good' as the opposite of 'bad'

8. 𡨧 with a 宀 roof on top (why? - it's not by analogy with a roof in xấu 'bad', since none of the spellings of xấu contain a roof: 丑瘦臭醜.)

Group B without any phonetic: tốt đẹp: 'good' in the sense of 'good-looking'

9. 𧍉 with 虫 trùng 'bug' plus 底 để 'bottom'

This last character is like so many Tangut characters: it does not seem to be the sum of its parts. Its components neither sound like tốt nor mean anything like 'good'. I wish I could find an example of 𧍉 in context to confirm this reading.

𧍉 has a second reading which makes phonetic and semantic sense: đỉa 'leech'.

HENTAI KITSUBUN?

In premodern Japan, there was a form of Japanized Chinese now known as 變體漢文 hentai kanbun 'modified Chinese prose'.

Just as the Japanese once wrote in Chinese, the Jurchen once wrote in Khitan. There was no Jurchen script until 1119. Nonetheless, as late as 1156, 18 years after the creation of the second Jurchen script,

... it was officially ordered that in the [Jurchen Empire's] examination for copyist in the Department of National Historiography the Jurchen copyists be able to translate Kitan [= Khitan] into Jurchen, and the Kitan copyists Chinese into Kitan. Even the Jin [= Jurchen] emperor Shizong commented, "The new Jurchen script cannot match it [Khitan]." The Chinese original was first written in the Kitan small script and then annotated in or translated into the Jurchen script. (Kane 2009: 3)

Last night as I was thinking about the last known dated Khitan small script text from the Jurchen era known as

<GREAT> and <HEAVEN> (1161-1189)

in the Khitan small script, I realized that Jurchenized Khitan might be called 變體契文 hentai kitsubun 'modified Khitan prose' - or in Sino-Jurchen, something like biyanti kiwen.

How might Khitan be Jurchenized? Khitan seemed to have grammatical gender (Kane 2009: 144):

In the past tense of verbs, one can also see this distinction between the suffix <er> for males and <én> for females.


With the numerals, Wu Yingzhe has noticed an important phenomenon: in most cases, the dotted form refers to a male, and the undotted form to a female, or is non-gender specific. Dotted and undotted forms also appear with inanimate objects, strongly suggesting grammatical gender in Kitan. The whole corpus needs to be reexamined with a view to pursuing these clues, but that research has not yet been done.

Jurchen, on the other hand, did not. In theory Jurchen speakers writing in Khitan might have omitted masculine dots or added them to nonmasculine forms. (Not just feminine because I suspect there might be a neuter gender with agreement patterns blending masculine and feminine characteristics; seemingly inconsistent nouns may have been neuter.) Do inconsistencies in gender marking cluster in Jurchen-period Khitan texts? Even if that were the case, that does not necessarily mean gender problems would have been unique to Jurchen speakers. Perhaps gender was on the decline in native speaker Khitan.

Other Jurchen errors from a Khitan native speaker's perspective could have been less dramatic: e.g., incorrect case marking akin to saying Japanese-style 'X DAT become' instead of 'X NOM become' for 'become X' in Korean.

Centuries later, the Jurchen used Khitan's sister language Mongolian in writing after they had forgotten their own scripts. Has there ever been a study of Mongolian as written by Manchu speakers?

So far, the Khitan corpus has generally been treated as a single entity. But Khitan was written across a wide area for three centuries. What may appear to be inconsistencies within the corpus may turn out to be innovations and/or hentai kitsubun features correlated with specific times and/or places.

*2.22.9:24: The hypothetical Khitan neuter might be like the Romanian neuter which has no unique features:

[...] in synchronic terms, Romanian neuter nouns can also be analysed as "ambigeneric", i.e. as being masculine in the singular and feminine in the plural

Or maybe there is no neuter. The dots may not have a simple one-to-one correlation with the two genders.

KHITAN DOROGHAM 'PEACE'?

Last night I mentioned one exception to 'heaven' and 'great' matching up in the Khitan large and small scripts in Andrew West's list of era names:

= =

The Chinese era name 大定 'great settlement' (1161-1189; Kane 2009's translation) corresponds to


in the large script and both


in the small script.

I assume the third large script character

is <gham> corresponding to


in the small script.

Did the Khitan have two era names recorded in the small script? This list mentions only one small script inscription from that era: the epitaph for the 博州防禦使 Bozhou defense commissioner (1171). I don't have access to the text. Do both small script era names appear in it, or is a single instance of the era name difficult to read? I can imagine a damaged 'great' looking like it could be 'heaven' in the small script or vice versa.

In any case, the second name element also appears in the Khitan large script equivalent of the Chinese era name  保寧 'protect tranquility' (969-979; again, Kane's translation):

<? SEAL gham>

The first character might represent a Khitan word for 'protect'. I cannot guess its reading because I don't know the Khitan small script equivalent of that era name.

I can, however, draw this equation:

定 'settlement' = 寧 'tranquility' = large <SEAL gham> = small <SEAL>

Thesaurus Linguae Sericae defines both 定 and 寧 as 'peaceful' in the synonym group 'delightful because orderly and lacking chaos'.

I conclude that <SEAL.gha(.a)m> (a joint transliteration of the large and small spellings) could have meant something like 'peace'. It may have had nothing to do with seals or rituals.

The Khitan word for 'seal' doubled as the word for 'ritual'. Jurchen doro(n), written as


<SEAL> ~ <SEAL> ~ <SEAL.un> (with clarifier)

with a character clearly related to


in the Khitan large script, also had that double meaning. Jurchen doro(n) may either be a borrowing from Khitan or be an unrelated word whose semantic scope was influenced by Khitan.

Khitan <SEAL> in <SEAL.gha(.a)m> may be a phonogram rather than a logogram. If the Khitan word <SEAL> was the source of Jurchen doro(n), it might have been read doro, and the word in question was dorogham.

Khitan dorogham looks vaguely like Written Mongolian words such as doru 'weak' and doru-ghsi 'downward' (cf. dege-gsi 'upward'*). Could they all share a root *dor 'down'?

> 'calmed down' > 'peaceful'

> 'pushed down onto paper' > 'seal'

> 'act performed to calm down', 'act of stamping a mark on circumstances' > 'ritual'

> 'strength down' > 'weak'

So perhaps <SEAL> wasn't just a phonogram; it might have been etymologically appropriate as well in <SEAL.gha(.a)m>. But what would Khitan -gham be if <SEAL> was the root?

*2.21.1:51: -gsi is an allomorph of the Mongolian '-ward' suffix after 'feminine' vowels like e.

THE RE-DISSECTION OF KHITAN 'SUCCESSION'

Last night I noticed something obvious that eluded me three years ago: the second character of the Khitan large script equivalent of the Chinese era name 統和 'uniting harmony'

looks like 統 'to unite, govern', the first character of the Chinese era name. Could the 統-like Khitan large script character have represented a Khitan borrowing of Liao Chinese 統 *tʰuŋ 'to unite, govern' or a native Khitan word meaning 'to unite' and/or 'to govern'? If so, then my earlier equation of that single character with two characters in the large script and with <s.bu.o.ɣo> in the small script would have to be changed to


'to unite/govern'? ≠ 'succession' = 'succession'

That then brings up the disturbing possibility that other Khitan large script era names may not be equivalents of Khitan small script names even if there are definite partial matches: e.g., 'heaven' and 'great' almost always match up in the two scripts in Andrew West's list of era names.

I will look at one exception to that generalization in my next post.

ECHOES OF THUNDER: CLARIFIERS IN THE JURCHEN LARGE SCRIPT

Today I saw Japanese dare 'who' spelled as 誰れ <WHO.re> in the title of the Gatchaman episode that first aired forty-five years ago today: 「総裁Xは誰れだ」. That is unusual because 誰 by itself is normally sufficient as a logogram for dare 'who'. The character has no other modern reading when it stands alone for a word. (The reading tare is archaic; no one would look at a cartoon titled 「総裁Xは誰だ」 and wonder whether 誰 should be read tare or dare.) Therefore there is no need to add a clarifier hiragana れ <re>.

The Jurchen script is full of such clarifiers: e.g., akdiyan 'thunder' appears as both


<THUNDER> and <THUNDER.an>.

If not for the word's Manchu reflex akjan and Chinese transcriptions like 阿玷 ~ 阿甸 *atjan*, we would not be able to reconstruct the probable reading of <THUNDER>.

Why did the Jurchen write akdiyan with a clarifier <an>? If Juha Janhunen is right, the Jurchen large script was not invented; rather, it was an adaptation of an existing variant of the Chinese script that was in use in the kingdom of Parhae that once ruled the Jurchen. If <THUNDER> was taken from that Parhae script, the Jurchen may have thought <THUNDER> could stand for either the lost Parhae word for 'thunder' or their own word. Adding <an> ensured that <THUNDER.an> would be read as their word akdiyan rather than as the Parhae word (which in this scenario wouldn't have ended in -an).

The Jurchen could have gotten the idea of clarifiers from their southern neighbors in Korea who used clarifiers to indicate that Chinese characters were to be read as Korean words rather than as Chinese words: e.g., in line 7 of the Old Korean poem 彗星歌 Hyeysŏngga 'Song of the Comet' (lit. 'Sweeping Star Song') by 融天師 Master Yungchhŏn in the 鄉札 hyangchhal script during the reign of Shilla's 眞平王 King Chinphyŏng (r. 579-632),

道尸 <ROAD.l> 'road' (cf. modern Korean kil)

掃尸 <SWEEP.l> 'sweep' (cf. modern Korean ssŭl)

星利 <STAR.li> 'star.?' (cf. modern Korean pyŏl.i**)

are written with the clarifiers 尸 <l> and 利 <li> to rule out Sino-Korean readings *to, *so, and *seŋ (later syŏng and now sŏng) for 道, 掃, and 星. We do not know for certain whether the modern Korean words above are the direct descendants of the Old Korean words, but they are the most likely reflexes even though in theory Old Korean could have had another word for 'road' ending in *-l that was not ancestral to modern kil, etc. We can be far less certain about the Old Korean pronunciations of <ROAD>, <SWEEP>, and <STAR> without internal sources (i.e., alternate phonogram spellings in hyangchhal) or external sources (e.g., Chinese transcriptions).
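The way a clarifier narrows down the reading of a logogram can be sketched in a few lines of Python. This is only a toy model under loud assumptions: the candidate lists hold just the Sino-Korean readings and modern Korean words cited above (the modern words standing in for the unknown Old Korean ones), and the matching rule (a reading survives if its end overlaps with the start of the clarifier's sound value) is a simplification made up for illustration, not a claim about how hyangchhal scribes actually reasoned. The names `overlap`, `readings_kept`, `CANDIDATES`, and `SPELLINGS` are hypothetical.

```python
def overlap(reading: str, clarifier: str) -> int:
    """Length of the longest suffix of `reading` that is also a prefix of `clarifier`."""
    for n in range(min(len(reading), len(clarifier)), 0, -1):
        if reading.endswith(clarifier[:n]):
            return n
    return 0


def readings_kept(candidates, clarifier):
    """Keep only the candidate readings whose ending agrees with the clarifier."""
    return [r for r in candidates if overlap(r, clarifier) > 0]


# Assumption: first item = Sino-Korean reading cited in the post,
# second item = modern Korean word used as a stand-in for the Old Korean word.
CANDIDATES = {
    "道": ["to", "kil"],
    "掃": ["so", "ssŭl"],
    "星": ["seŋ", "pyŏl"],
}

# Logogram plus the sound value of its clarifier (尸 <l>, 利 <li>).
SPELLINGS = [("道", "l"), ("掃", "l"), ("星", "li")]

for logogram, clarifier in SPELLINGS:
    print(logogram, clarifier, readings_kept(CANDIDATES[logogram], clarifier))
```

Under the same rule, Jurchen akdiyan overlaps with the clarifier <an> by two segments, whereas a word not ending in -an would be filtered out, which is the scenario proposed above for the lost Parhae word.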

*2.19.5:16: 阿玷 is from the Sino-Jurchen vocabulary of the Bureau of Translators (四夷館; entry 7) and 阿甸 is from the Sino-Jurchen vocabulary of the Bureau of Interpreters (會同館; entry 4).

**2.19.23:39: <STAR.li> at first appears to correspond to modern Korean 'star' followed by the nominative case marker i, but the context seems to require an accusative case marker, so either the usage of i may have changed over time or 'star' in fact was once a disyllabic word ending in *-li or *-ri with a final *-i reminiscent of Old Japanese posi 'star' which is sometimes thought to be a cognate. I need to look into this more.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2017 Amritavision