1. Yesterday, I mentioned that 銛 has three Mandarin readings:

(The meanings are from 廣韻 Guangyun [1008] and were not necessarily current even a thousand years ago.)

金 <METAL> on the left of 銛 is semantic. 

舌 on the right of 銛 is an unusual phonetic. Most phonetics represent one class of syllables, but 舌 represents three, so it has three different numbers in Karlgren's Grammata serica recensa (1957) and Schuessler's (2009) Minimal Old Chinese and Later Han Chinese: A Companion to Grammata Serica Recensa:

銛 is unusual because it simultaneously belongs to two series: GSR 302/S 22-01 and GSR 621/S 36-16.

The phonetic of GSR 302/S 22-01 is 𠯑 OC *kʷat 'to shut the mouth' which originally has nothing to do with 舌 <TONGUE> (apart from sharing 口 <MOUTH> on the bottom) and hence cannot be conflated with GSR 288/S 20-10. 𠯑 is only abbreviated as 舌 in combinations. There is also another abbreviation 𠮮 which is distinct from 舌 <TONGUE>. Does 銽, the full form of 銛, only have velar readings?

Another more complex variant of 銛 is 𨨱 with the phonetic

huó < MC *ɣwat < OC *gʷat

also belonging to GSR 302/S 22-01. Does 𨨱 also only have velar readings?

舌 in GSR 621/S 36-16 is an abbreviation of the semantic compound 甜 <TONGUE.SWEET> OC *lem 'sweet' as a phonetic. Such abbreviated phonetics are common in Tangut: e.g., 𗗘 1079 2lenq3 'sweet' may be an abbreviated phonetic 𘡔 in

Or one of those three could be phonetic in the others. No Tangraphic Sea analysis survives for any of the four.

𗗘 1079 2lenq3 < *Slim 'sweet' is cognate to 甜 OC *lem.

2. GEOGRAPHY IS NOT LINGUISTIC GENEALOGY: Just as not all languages of India and Europe are Indo-European, not all languages of Micronesia are Micronesian. According to Wikipedia, Ross, Pawley, and Osmond (2016) regard Yapese as a primary branch of Oceanic: a sister of Micronesian. And according to Wikipedia, Blust (1993) regards Chamorro and Palauan as primary branches of Malayo-Polynesian: sisters of Oceanic. Nukuoro and Kapingamarangi are Polynesian outliers in Micronesia.

Classification of the languages of Micronesia

Micronesian languages

Proto-Oceanic had a fairly simple phonology with one unusual feature: a distinction between two series of labials:

*p *ᵐb *m
*pʷ *ᵐbʷ *mʷ *w

Proto-Malayo-Polynesian only had *p *b *m *w. How did the *Pʷ-series develop?

Yapese doesn't have any labialized labials, but it does have some unusual features absent from Proto-Oceanic and most languages:

I've never heard of ejective fricatives before. More on them here with audible samples.

I have no idea how either series developed.

Yapese has sixteen vowels. How did they develop from the *five of Proto-Oceanic? I'm interested in how small vowel systems develop into larger ones because I am looking for insights into vocalic expansion in Tangut which went from *six vowels to dozens (without distinctive length!).

3. It's unfortunate that the peoples indigenous to the regions where the world's most famous cities are now located are not well known. Until today I didn't know about the Ohlone of "the coast from San Francisco Bay through Monterey Bay to the lower Salinas Valley". I spent years in Berkeley without ever learning of the Chochenyo. I've never been to Ohlone Park or the Ohlone Greenway.

4. I look forward to KJ Solonin's forthcoming book The Descendants of the White and High: The Tanguts in Asian History. I wonder who'll contribute to it. A corresponding book for Tangut's distant relative Pyu would be nice, though it might be far slimmer.

HANGUL AND IDEOGRAPHIC TONE MARKS

(Posted 19.10.16.)

1. I first found the Wikibooks index to 東國正韻 Correct Rhymes of the Eastern Country almost exactly six years ago and have been using it ever since. But I didn't notice until last night that it used Korean-specific Unicode combining characters for tones:

The only other possible tone (low) is unmarked.

Preceding those marks in Unicode order are

which I've never seen in any electronic text, though I've seen them in print since I got my first Sino-Japanese dictionary over thirty years ago. Let's see if they work with Tangut:

(I don't know how the Tangut translated Chinese 去 'departing' since the native Tangut phonological tradition only applies three Chinese tonal categories to Tangut.)

I see that the combining tone characters only work with my Tangut font if I copy the above text and paste it into BabelPad or BabelMap. The characters aren't in the Tangut font specified by the style for Tangut on this site.

2. I first heard the Cantonese term zuk1 sing1 for overseas Chinese in 1991. I was told it meant 'empty bamboo', but I could never find any word sing1 meaning 'empty' (and the noun-adjective order was odd). It's taken me 28 years to learn what sing1 is:

The original term is 竹杠 ['bamboo rod': i.e., hollow/empty bamboo]. But 杠 [gong3] is pronounced exactly like “fall”, which is considered as inauspicious. The very opposite of “fall” is “rise”. So 升, meaning “rise”, is chosen to replace 杠.

Strangely the term turns out to mean 'thick bamboo pole' as well as someone 'empty of Chinese culture and values like a hollow bamboo pole'.

3. Dept. of Etymology ≠ Semantics: I wouldn't have guessed this distinction, since both bi- and di- mean 'two':

By contrast, duotheism, bitheism or ditheism implies (at least) two gods. While bitheism implies harmony, ditheism implies rivalry and opposition, such as between good and evil, or light and dark, or summer and winter.

I'm trying to come up with a mnemonic for the distinction: ditheism entails discord, whereas bitheism is associated with b-something? Benevolence?

4. Does Manichaeism still exist?

In modern China, Manichaean groups are still active in southern provinces, especially in Quanzhou and around the Cao'an, the only Manichaean temple that has survived until today. There is a Chinese Manichaean Council with representatives in Tibet and Beijing.[citation needed]

Normally I edit out things like "[citation needed]" when quoting Wikipedia, but I think that instance has to stay.

The Wikipedia article on the 草庵 Cao'an does not mention any modern Manichaeans.

5. Today episode 2 of Super Robot Mach Baron turns forty-five years old. I never noticed the unusual spelling of the name of the series' robot designer until tonight:


That spelling only appears in the closing credits. Online it appears as 田中視一. Apparently 㐅 is an idiosyncratic simplification of 示 shimesu-hen.

田中 is Tanaka, but I don't know how to read (⿰㐅見)一 ~ 視一. The obvious Sino-Japanese reading Shiichi doesn't sound like a name. O'Neill's Japanese Names gives a native name reading nori for 視, and Wiktionary lists a native reading tomo, so perhaps (⿰㐅見)一 ~ 視一 is Norikazu or Tomokazu (kazu being the native name reading of 一; there is a strong tendency for both readings in a two-character name to be of the same origin).

6. While reading about Heil V1, one of the robots on Mach Baron, I encountered a kanji I didn't recognize: 銛 mori 'harpoon'.

Shipka stats, Kanken levels, and Jun Da's general Chinese ranks:

Jun Da


銛 has three Mandarin readings: xiān, tiǎn, and guā. More on them tomorrow.

?STEMI QAGHAN

(Posted 19.10.16.)

1. In Turkish and Mongolian Studies (1962, 73), Sir Gerard Clauson mentions

the great Türkü ruler of the second half of the sixth century, Eştemi Kağan (the exact pronunciation of his name is uncertain, the Byzantines called him Stembis Xagan, and the Chinese Shih-tieh-mi [Shidiemi in pinyin])

How did Clauson guess that the first vowel might be E-? There has to be an initial vowel because Turkic did not allow initial consonant clusters.

The Wikipedia entry for that qaghan is titled "Istämi" and gives his name in runes as 𐰃𐰾𐱅𐰢𐰃 <is₂t₂mi>. Is that an attested spelling, or someone's modern creation? The runic script is often ambiguous, but that spelling unambiguously represents [i] since <i> has to be [i] and not [ɯ] in a word with series 2 (front-vowel) consonants <s₂> and <t₂>. The e is not written since

[s]hort vowels, other than those enclosed in digraphs, should not be written except when they are the first vowel of a word, and then only if they are not a/e. (Clauson 1962: 81)
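The reasoning about <i> can be put as a tiny Python sketch. The sign names (`s2`, `t2`) are my ASCII stand-ins for the transliteration, and `resolve_i` is a hypothetical helper, not anything from Clauson. It encodes only one rule: series 2 consonants go with front vowels and series 1 consonants with back vowels.

```python
# Hypothetical helper: decide how <i> is read from the consonant series in the word.
def resolve_i(signs):
    """signs: transliterated signs such as ["i", "s2", "t2", "m", "i"]."""
    # Collect the series digit of every sign that carries one (s2 -> "2").
    series = {s[-1] for s in signs if len(s) > 1 and s[-1] in "12"}
    if series == {"2"}:
        return "i"       # front-vowel word
    if series == {"1"}:
        return "ɯ"       # back-vowel word
    return "i or ɯ"      # no series-marked consonant (or mixed): ambiguous

print(resolve_i(["i", "s2", "t2", "m", "i"]))
```

Signs like <m> that have no series distinction contribute nothing, which is why so many runic spellings stay ambiguous.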

Other versions of the name in Wikipedia:

Chinese transcriptions in Wikipedia:

I don't see the Shidiemi Clauson mentioned.

All the transcriptions point to ş rather than s, contrary to the runic spelling. (The s in the Byzantine spelling could represent Turkic ş since Greek had no letter for ş.)

2. The final *-t in the transcriptions of ?stemi above corresponds to zero in Turkic. Clauson (1962: 88) noted that Middle Chinese *-t corresponded to Turkic -ð, -l, -r, and even zero, but not -t which was transcribed as *tV. That seems to imply that northwestern Middle Chinese *-t had already shifted to *-r. Might cases of *-r corresponding to Turkic zero really involve a subphonemic Turkic [ʔ] after short vowels? Are there any cases of *-r corresponding to zero after Turkic long vowels?

3. Taishanese has palatal allophones [tɕ tɕʰ] of /ts tsʰ/ before the high vowels /i u/. This is reminiscent of the palatalization of stops (but not affricates!) before high vowels in Late Old Chinese:

Pulleyblank (1984: 179) opened my eyes to the phenomenon of palatalization and affrication without palatal vowels, though I should have figured that at least such affrication was possible since I had known even before reading his book that Japanese tsu [tsɯ] was from *tu.

It's been over twenty years since I wrote an unpublished study of Taishanese historical phonology. It would be interesting if I could find it, though it's so old that the file might not be readable with my current software and fonts. Unicode ensured that everything I've written since I switched to Windows XP in 2002 should be legible in the future.

4. James Evans' 1841 grid for his syllabics is completely full. It has some interesting characteristics.

4a. Onsets

sp- is the only possible initial cluster. It is now obsolete.

4b. Nuclei

There is no <u> or <ū> since Cree and Ojibwe only had a single type of labial vowel phoneme written as <o> and <ō>.

There is a graphic distinction between <e> and <ē> even though

Cree ê only occurs long [...] Not all writers then or now indicate length, or do not do so consistently; since there is no contrast, no one today writes ê as a long vowel.

4c. Codas

There is a single symbol for a cluster coda <hk> since -hk is "a common grammatical ending in Cree". In Ojibwe, the same symbol represents the common cluster coda -nk.

Murdoch (1981: 27, 60) includes a second cluster coda symbol <sk>.

I found Murdoch through this page via Wikipedia:

John wrote to Cree Literacy Network about it in 2017: "I researched the origins and evolution of syllabic characters for Cree, Inuit and Dene languages, producing a MEd thesis at the University of Manitoba in 1981. Although James Evans, the Wesleyan Methodist missionary played a part in the first printings in syllabics at Norway House, He was not the person who was the most instrumental in the writing systems conception and spread. During my research I visited archives as well as Aboriginal communities in the Boreal Forest as well as the Eastern Arctic. Missionaries George Barnley, John Horden, Jean-Nicolas Laverlochère, Edmund Peck and Jean Baptiste Thibeault all arrived to Cree, Inuit and Dene nations who were already able to read and write in the system."

Is Murdoch saying the script spread across the First Nations without Evans' (or any Euro-Canadian?) involvement? It's not clear whether he agrees with Wikipedia's take based on that online article:

there is strong evidence to suggest that the Cree people already knew the writing system and Evans simply adapted it for print.

But skimming Murdoch's thesis, I get the impression that in 1981 (maybe not anymore) he supported the conventional view of Evans as inventor. I'm confused.

This reminds me of the issue of whether the Khitan and Jurchen large scripts were invented or derived from a preexisting Parhae script.

5. What is the etymology of Yazidi?

Earlier scholars and many Yazidis derive it from Old Iranian yazata, Middle Persian yazad, divine being.

I wouldn't expect a to become i or î (the Yazidi autonym is Êzîdî ~ Êzidî). But I don't know Kurdish historical phonology.

*a > i is not impossible. It's common in Tangut: e.g.,

*tI-S-tsa > 𘅗 1321 1ziq4 'shoe'

Compare with Japhug tɯ-xtsa 'id.' (Jacques 2014: 90).

KHITAN CHICKENS, STABLES, AND ALTERNATORS

(Posted 19.10.16.)

1. I've long been bothered by the Khitan word for 'chicken', written as


<t.qo.a> (~ <CHICKEN>?)

I have put the large script character in parentheses, since I do not know of any evidence for its pronunciation. It is parsimonious to assume that it was pronounced like <t.qo.a> in the small script.

How was <t.qo.a> pronounced? That's half of my problem. The other half is how that pronunciation relates to other words for 'chicken' in continental 'Altaic' (which I regard as a language area, not a language family).

Kane (2009: 88) reads <t.qo.a> as teqoa with an inherent vowel e. This reading has a nonharmonic e-a sequence that is unusual for a continental 'Altaic' language.

Shimunek (2017: 372) reads the small script spelling as <>. It is certain that the second symbol stood for something absent from Chinese, though what that something was is uncertain.

Neither interpretation is a simple match for other words for 'chicken' in the area:

Here's an attempt to make sense out of most of that, taking Kane's interpretation as an endpoint in Khitan:

1. The earliest form of the word was something like *taqɯʁu. I don't know whether it originates from Turkic, Serbi-Mongolic, or a third language in contact with them (Ruanruan? Xiongnu?).

2. The vowels metathesized in pre-Khitan: *tɯqaʁu.

3. Medial *ʁ lenited to zero: *tɯqaʁu > *tɯqau.

4. *au metathesized: *tɯqau > *tɯqua. (I thought all au in Khitan were either from *aCu or in Chinese loanwords, but what of au in taulia 'hare' corresponding to Mongolian taulai? Maybe only root-final *au metathesized.)

5. In an unwritten eastern Khitan dialect, *tɯqau or *tɯqua became *tiqo.

6. That *tiqo was borrowed into pre-Jurchen which then lenited intervocalic *q to h: *tiqo > tiho. (It's also possible that pre-Jurchen borrowed Khitan *ɯ as *i, so maybe the Khitan source form retained *ɯ.)

(10.13.1:23: I briefly considered the possibility that *au monophthongized within Jurchen: *tiqau > tiho. But *au should correspond to Manchu oo which is not in Manchu coko. So I think the common Khitan source of the Jurchen and Manchu words already ended in -o.)

7. In the written Khitan dialect, *tɯqua became teqoa with high vowels lowering under the influence of a.

8. One Jurchen dialect shifted *tiqo to *tyoqo. *ty then palatalized to c, resulting in Manchu coko *[tɕʰɔqʰɔ] (later [tʂʰɔqʰɔ] with a Mandarin-like retroflex [tʂ]). (But I cannot explain why Manchu has -k- instead of -h-. Manchu intervocalic -k- should be from a *cluster, not a simple *-q-.)
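Steps 2, 3, 4, and 7 (the line leading to the written Khitan form) can be replayed mechanically. Here is a minimal Python sketch. The string replacements are my own simplification and are only meant to work for this one word, and `derive` is a hypothetical name.

```python
# A sketch of the proposed chain *taqɯʁu > teqoa (steps 2, 3, 4, 7 above).
def derive(form):
    stages = [form]
    # Step 2: the first two vowels metathesize.
    form = form.replace("aqɯ", "ɯqa")
    stages.append(form)
    # Step 3: medial ʁ lenites to zero.
    form = form.replace("ʁ", "")
    stages.append(form)
    # Step 4: root-final au metathesizes to ua.
    if form.endswith("au"):
        form = form[:-2] + "ua"
    stages.append(form)
    # Step 7: high vowels lower under the influence of a.
    form = form.translate(str.maketrans({"ɯ": "e", "u": "o"}))
    stages.append(form)
    return stages

print(" > ".join(derive("taqɯʁu")))
```

The eastern branch (steps 5, 6, and 8) is left out because I cannot state its changes as simple rules.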

Shimunek (2017: 372) reconstructs Proto-Serbi-Mongolic *tʰakʰɪɣa 'chicken'. Presumably *-ɪɣa was reduced to -a in his Khitan.

Shimunek (2017: 372) thinks Middle Korean ᄃᆞᆰ <tărk> [tʌrk] 'chicken' is also part of the same word family.

Old Japanese təri 'bird' has been linked to that Korean word (perhaps most recently by Francis-Ratte [2016: 211] who regards them as genetic cognates).

I suppose one could regard Korean and Japanese r as attempts to imitate a foreign *ʁ. I can't explain the Korean and Japanese vowels.

2. <t.qo.a> 'chicken' is an example of what I call a stable in Khitan. I propose grouping obstruent-initial words in Khitan into two categories, stables and alternators. Stables are always written consistently with the same type of consonant, whereas alternators alternate.

<ku> 'person'
<> ~ <> 'open' < 開 *kʰaj
<x.s> 'region'
<> 'tent'
<> 'blood'
<> ~ <> 'second'
<ju.un> 'summer'
<t.qo.a> 'chicken'
<> ~ <> 'fourth'
<da.lV?> 'seven'
<p.o(.o)> 'monkey'
<p.u> ~ <b.u> 'to be' (Kane 2009: 156)
<b.qo> 'son'

(Chinese loanwords seem to generally be stable, though there are exceptions like the syllable in the table above.)

I think the stables mostly have unaspirated-aspirated oppositions:


I've omitted x /x/ which doesn't fit into the above paradigm. The shift of k > x is also in Jurchen and Manchu (under Khitan influence?).

The alternators are a mystery:

3. I didn't see Andrew West's latest post until just now. Two points:


This sequence of three characters 没蜜施 does not make any sense as Chinese, but are here used to transcribe the Old Uighur word bolmïš "to have become" (from bol- "to be, to become" plus perfect participle -mïš) or bulmïš "to have received" (from bul- "to find, to get, to receive" plus perfect participle -mïš) which both occur in the titles of nine Uighur khans between 747 and 848, as recorded in the Old Book of Tang (舊唐書 Jiù Tángshū) and New Book of Tang (新唐書 Xīn Tángshū).

Why were bol- and bul- both written with the same Chinese character 没 (now read with m- in most Chinese varieties, not b-!)?

In the Tang prestige dialect, 没 was pronounced something like *mbor.

The Tang prestige dialect no longer had *b-, so *mb- was the closest available approximation of Old Uyghur b-.

The Tang prestige dialect had no *-l, so *-r was the closest available approximation of Old Uyghur -l.

*mbor is the best possible approximation of Old Uyghur bol- 'to be'. But why wasn't Old Uyghur bul- 'to find' transcribed as *mbur? Because the Tang prestige dialect lacked that syllable: earlier *mut had become *mvur instead of *mbur. *mb- could not occur before *u in the Tang prestige dialect. So 没 *mbor had to do double duty for Old Uyghur bol- 'to be' and Old Uyghur bul- 'to find'.
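The "double duty" argument is really a nearest-available-syllable lookup. Here is a minimal Python sketch of it. The mapping tables and the fallback from *u to *o are my assumptions, reduced to just the facts in the paragraphs above, and `transcribe` is a hypothetical name.

```python
# Closest Tang prestige-dialect equivalents for Old Uyghur segments (from the text).
ONSET = {"b": "mb"}           # no *b-, so *mb- stands in
CODA = {"l": "r"}             # no *-l, so *-r stands in
ATTESTED = {"mbor": "没"}      # *mbur is missing: *mb- could not precede *u
VOWEL_FALLBACK = {"u": "o"}   # assumption: *o is the nearest available vowel

def transcribe(syllable):
    """Map a CVC Old Uyghur syllable to (Tang syllable, character)."""
    onset, vowel, coda = syllable
    candidate = ONSET[onset] + vowel + CODA[coda]
    if candidate not in ATTESTED:
        # The ideal match does not exist, so fall back to the nearest vowel.
        candidate = ONSET[onset] + VOWEL_FALLBACK[vowel] + CODA[coda]
    return candidate, ATTESTED[candidate]

print(transcribe("bol"))
print(transcribe("bul"))
```

Both calls land on the same character, which is the whole point: the gap in the Tang syllable inventory, not the Uyghur words, causes the merger in transcription.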


There is some confusion over the two words bolmïš "to have become" and bulmïš "to have received" as they are both written the same in the Old Uighur and Old Turkic scripts, and are both transcribed as 没蜜施 or 没密施 mò mì shī in Chinese (没 is pronounced mut⁶ in Cantonese), which suggests to me that they should be the same word.

Clauson (1972: 332) says early Turkic bol- and bul- were "normally indistinguishable graphically". I assume the different vowels have been projected backward from the modern languages: e.g., modern Turkish has olmak 'to be' (with irregular b-loss) and bulmak 'to find'.

3c. Why did 蜜 *mbir ~ 密 *mbɨir transcribe an Old Uyghur open syllable mï? In theory, 麋糜靡 *mbɨi would have been better fits, but 蜜 and 密 are more frequent characters, so maybe they came first to the transcriber's mind. (No, in fact, 糜靡 are more frequent than 蜜 in Jun Da's premodern corpus!)

4. Japanese amefurashi 'sea hare' has a surprising spelling (in addition to boring ones: アメフラシ, 雨降らし, 雨降):


ame-fur-ash-i is literally 'rain-fall-CAUS-ing'. ame obviously corresponds to 雨 <RAIN>, but furashi has nothing to do with 虎 <TIGER>. (It is a coincidence that 虎 is pronounced fu2 in Cantonese.)

Are there any native Chinese, Korean, or Vietnamese words for 'sea hare'? Do the words I found in Wikipedia count, or are they loan translations of Latin lepus marinus (via sea hare or some similar European term)?

There are five nôm spellings for biển 'sea' at

phonetic reading
氵<WATER> biển ✓✓ ✓✓
𣷷 biến
𣷭 bỉ ✓✓ 3

The points indicate the quality of matches: i.e., the number of check marks.

㴜 is the most straightforward; the phonetic 扁 is a perfect match.

𣷷𤅶 have a phonetic with a mismatching tone in the same *register as the tone of biển.

汴 has a phonetic with a mismatching tone in a register differing from that of the tone of biển.

𣷭 has a phonetic with a matching tone, but the vowel of 彼 is a monophthong instead of a diphthong, and 彼 has no -n. Did 𣷭 originate as an error for 𣷷?

EYE OF THE ARAB

(Posted 19.10.16.)

That's what I thought عين العرب `Ayn al-`Arab meant at first. It's actually 'Spring of the Arab' - the water sort of spring. عين `ayn has multiple meanings. `Ayn al-`Arab is also known as Kobanî or Kobane:

Nobody disputes that the town [of Kobane] is a relatively new settlement. Before the 20th century, it was just a water meadow where even great commanders like Saladin used to feed the horses of his army. For a long time, it was referred to as Arab Punarı (“Arab Spring” in Turkish).

Muhsin Kızılkaya, a writer of Kurdish origin, told private Turkish broadcaster CNN Türk on Oct. 13 that Kobane was not even a small village at the turn of the century. “The Germans set a small station there while building the Baghdad Railway. A new settlement was developed around the construction and locals called it Kobane, in reference to the German ‘company’ that built a road in the area,” he said.

The rendering of “company” as “Kobane” seems logical at first glance, considering the fact that both Kurds and Arabs adapt many Western words by changing the letter “m” to “b.”

Really? Initially I assumed that the p of German Kompanie was Arabized as b since standard Arabic has no p. Kurdish has p, and both Kurdish and Arabic have m, so there would be no motivation for changing m to b.

(The correspondence of mp to b reminds me of how Japanese b is from *mp: e.g., in 旅人 tabibito < *tambimpitə 'traveller'. See yesterday's entry.)

The Hurriyet Daily News article points out another problem:

Historically, however, the “company theory” sounds weak, as Germans use the word “Gesellschaft” for business companies. “Kompanie,” on the other hand, refers to military units. [See Wiktionary.]

Others have suggested that the middle part of the name Kobane could come from the German word “bahn” (road). In fact, Anatolische Eisenbahn, a German company, built the landmark Baghdad Railway, which some historians see as one of the causes of the First World War.

But if -bane is from German Bahn, what is Ko-?

I'm amazed that such a recent name has no certain etymology.

TANGUT DATABASE 4.0

1. Version 4.0 of my Tangut database includes a new Unicode column and has corrected data for 11 entries thanks to Andrew West. Details in the changelog on sheet 2.

2. I never gave any thought to the d in Japanese 仲人 nakōdo 'matchmaker' (cf. naka 'middle' and hito 'person') and 狩人・猟人 karyūdo 'hunter' (cf. kari 'hunting' and hito 'person') until I learned today that 旅人 <TRIP PERSON> could be read as tabyūdo (just one of six possible readings!). Wiktionary gives the following derivation:

*/tapiputo/ → */tabibuto/ → /tabiudo/ → /tabjuːdo/

Here's my derivation:

Stage 1: *tambi nə pitə 'trip GEN person'

Stage 2: *tambimbitə

The earliest attested form from Man'yōshū (but written semantographically as 客人 <GUEST PERSON>; if not for the reading tradition, I would never have guessed that 客 was read tabi).

*nə p is compressed into *mb.

The regular reflex of this word survives in modern Japanese as tabibito, the most common reading of 旅人.

Stage 3: *tambimbuto

*ə rounds to *o.

Irregular assimilation of *i to the preceding prenasalized labial stop *mb.

Stage 4: *tambiũndo

The nasalization spreads from the vowel to the following stop.

Stage 5: *tabjuːdo

Voiced stops lose prenasalization.

*i is reduced to *j and its length is transferred to the following *u.

Stage 6: [tabjɯːdo]

*u loses its rounding.
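My six stages can be replayed as ordered rewrite rules. Here is a minimal Python sketch, taking the stage 3 form to be *tambimbuto. The replacements are simplifications that only fit this one word (stage 4 in particular lumps the loss of *mb together with the spread of nasalization), and `RULES` and `derive` are hypothetical names.

```python
# Ordered rules from stage 1 (*tambi nə pitə) to stage 6 ([tabjɯːdo]).
RULES = [
    ("Stage 2: *nə p > *mb",
     lambda f: f.replace(" nə p", "mb")),
    ("Stage 3: *i > *u after *mb; *ə > *o",
     lambda f: f.replace("mbit", "mbut").replace("ə", "o")),
    ("Stage 4: *mbut > *ũnd (nasalization moves onto the stop)",
     lambda f: f.replace("mbut", "ũnd")),
    ("Stage 5: prenasalization lost; *iu > *juː",
     lambda f: f.replace("mb", "b").replace("ũnd", "ud").replace("iu", "juː")),
    ("Stage 6: *u unrounds",
     lambda f: f.replace("u", "ɯ")),
]

def derive(form):
    forms = [form]
    for _label, rule in RULES:
        forms.append(rule(forms[-1]))
    return forms

for form in derive("tambi nə pitə"):
    print(form)
```

Stopping after the first rule and then only applying denasalization would give the regular reflex tabibito instead, apart from the final vowel.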

3. If Hawaiian mōʻī 'king' is "of recent origin", not in print until 1832, where did it come from? proposes a link to ʻī 'supreme'. I would expect mōʻī to be a noun-adjective phrase 'supreme mō'. But what is mō? The short form of moku 'district'?

It seems mōʻī got 'promoted' over time: 19th century attestations mean 'temple image', 'lord of images', and 'a rank of chiefs who could succeed to the government but who were of lower rank than chiefs descended from the god Kāne'. Cf. how the Xiongnu title 'crown prince' transcribed in Late Old Chinese as 護于 *ɣwah-wɨa (phonetically [ʁwɑχwɨa]¹?) may be the source of the later Altaic title qaghan 'supreme ruler' (see Vovin 2007 on its etymology).

¹I speculate that uvular [χ] is an allophone of final /h/ in 'type A' syllables which are characterized by lower, backer vowels (like [ɑ] for /a/) and uvulars. The use of *ɣ for what I think was [ʁ] is out of habit and in accordance with tradition.

It is interesting that the second syllable of 護于 *ɣwah-wɨa is a 'type B' syllable which is characterized by higher, less back vowels (like *ɨa < *a). That suggests the original Xiongnu word had a mix of vowel types and that the Xiongnu language did not have vowel harmony like Altaic (or, I think, Early Old Chinese). A very un-Altaic *ʁwɑχwa² with two different vowels may have been simplified to qaghan (i.e., two syllables with the same vowel) in Turkic and Serbi-Mongolic (e.g., Khitan qagha) via Ruanruan.

²Late Old Chinese had no syllable *wa, so 于 *wɨa might have been an approximation of a Xiongnu *wa.

4. 寄席 <GATHER SEAT> yose 'traditional Japanese verbal entertainment theater' is an interesting case of an abbreviation in speech but not in writing. Naver regards it as an abbreviation of 寄せ席 yoseseki 'gather-seat'. seki 'seat' is no longer pronounced in yose, but its character 席 remains in spelling.

KHITAN SMALL SCRIPT CHARACTERS 108, 110, AND 111

(Posted 19.10.11.)

Could the low-frequency Khitan small script characters

110 and 111

be variants of the high-frequency character

254 <d>?

If 110 is a variant of 254, then perhaps the low-frequency character


is a variant of

107 ~ 347 <oi>.

Let's see if any instances of 110 and 111 are in environments matching those of 254.

In 契丹小字研究 Research on the Khitan Small Script (1985), 110 appears only in second position, unlike 254, which has no such restriction. Is that significant? Could that imply that 110 is a vowel character that must follow an initial consonant character?
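Checking positions by eye gets tedious, so here is a minimal Python sketch of the test. The word list holds only spellings quoted in this post, written as dot-separated character numbers, and `positions` is a hypothetical helper.

```python
# Spellings quoted in this post, as dot-separated character numbers.
WORDS = ["021.110", "021.110.140", "028.110.339.100", "254.122"]

def positions(char, words):
    """Return the sorted set of 1-based positions in which char occurs."""
    found = set()
    for word in words:
        for index, current in enumerate(word.split("."), start=1):
            if current == char:
                found.add(index)
    return sorted(found)

print(positions("110", WORDS))
print(positions("254", WORDS))
```

A real comparison would need the full corpus; with a sample this small, a shared or different position proves nothing by itself.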

254 match?
Xuan 19.24, Zhong 6.6, 13.27, 41.26,  46.4 021.110
Zhong 41.12, Zhong 44.44 021.110.140
mo.110.en Dao 25.13

<mo.110.en> looks like a genitive of <mo.110>. If 110 is <d>, then the above words are mod and mod-en. Could mod in turn be a variant of 021.247 <mo.t> 'woman-PL'? Both -d and -t are attested as plural suffixes after vowel-final nouns (Kane 2009: 138-140).

254 match?
Xu 51.44
Gu 6.30
Zhong 28.9

111 is followed by 241 <pu> and resembles

374 <tai>.

Could 111 241 be <tai pu> for Liao Chinese 太傅 *tʰajfu 'Grand Tutor'?

254 appears by itself where 254.122 <> for Liao Chinese 大 *taj would be expected. I suspect 254 is an error for 254.122 and not a true standalone character like 374 (and 111?).

028.111.339.100 is the spelling on pp. 607 and 705 of Research on the Khitan Small Script, but p. 178 has 028.110.339.100.

108 is always in second or third position. Is it in any environments where 107 and/or 347 are attested?

107 match?
347 match?
Xing 4.13, Dao 7.24, 19.8, Xuan 6.20 131.108

Dao 28.11
Xuan 27.26
Gu 15.6, 15.8
Zhong 43.36

<u.108.d> could be a plural of <u.108>.

<> looks like a verb + -lge- causative/passive + converb -ei sequence. Interpreting 108 as <oi> would work nicely there: moilgei? But <oi> would result in awkward vowel sequences elsewhere: e.g., cieoien? Could 108 represent a CV syllable absent from Chinese loanwords?

ÖCALAN

(Posted 19.10.11.)

1. What is the etymology of the surname of Kurdish leader Abdullah Öcalan? It doesn't look Kurdish since it has the vowel ö absent from Kurdish. The vowel ö is characteristic of Turkish (and Wiktionary identifies the name as Turkish), but the vowel sequence ö-a violates Turkish vowel harmony. (Hypothetical harmonic names would be Ocalan and Öcelen.) Such violations are possible in compounds and loanwords, but I cannot find any Turkish words ö, öc, or calan that would enable me to analyze the name as ö-calan or öc-alan. (There is a Turkish word alan.)
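The harmony check is easy to state as code. Here is a minimal Python sketch covering front/back harmony only (not rounding harmony). `is_harmonic` is a hypothetical helper, and it ignores the dotted/dotless capital I problem, which doesn't arise in these names.

```python
# Turkish vowels by backness.
FRONT = set("eiöü")
BACK = set("aıou")

def is_harmonic(word):
    """True if all vowels in word are front or all are back."""
    vowels = [ch for ch in word.lower() if ch in FRONT or ch in BACK]
    return all(v in FRONT for v in vowels) or all(v in BACK for v in vowels)

for name in ["Öcalan", "Ocalan", "Öcelen"]:
    print(name, is_harmonic(name))
```

Only Öcalan fails, since ö is front and a is back.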

2. "Areal developments in the history of Iranic: West vs. East" (2018) by Martin Joachim Kümmel looks like a handy all-in-one-place reference for the big picture of Iranic historical phonology.

I'm going to start using 'Iranic' instead of 'Iranian' in linguistic contexts

[t]o avoid confusion with terms related to the country or territory of Iran (especially in recent geneticist papers speaking of prehistoric "Iranian" populations almost certainly not "Iranian" in the linguistic sense) (Kümmel 2018, slide 3)

Iranic is consistent with my use of Turkic to avoid confusion with Turkish referring to the country and dominant language of Turkey.

3. By analogy, maybe I could use Taic to avoid confusion with Thai referring to the country and dominant language of Thailand. Or perhaps better yet, Daic for consistency with Kra-Dai (it is odd that most speak of a 'Tai' rather than a 'Dai' branch of Kra-Dai, though there is a Kra branch).

4. Back to Kurdish: how did Sorani and Kurmanji develop h- in hesp 'horse' in this table? That seems to be an independent Kurdish innovation that has nothing to do with the equally mysterious h- in Greek ἵππος <híppos>. The Proto-Indo-European initial consonant of 'horse' was *ʔ-, not the *s- that became h- in Greek and Iranic.

*s-weakening occurred independently in those two branches, as it is absent from Indic and cannot be reconstructed at the Proto-Indo-Iranic level (see Kümmel 2018, slide 14):

Proto-Indo-European *s
Greek h
Sanskrit s
Proto-Iranic *h

Pyu also has *s-weakening: e.g., hi 'to die' (cf. Tangut 𗢏 3072 2si4 < *CIseH 'id.').

Typing "Proto-Indo-Iranic" and "Proto-Iranic" feels weird. But are such terms any worse than Indic instead of Indian?

5. In that same table are Sorani erz ~ erd, Kurmanji erd, and Zaza erd 'earth'. Is that word a northwestern Iranic innovation that has nothing to do with English earth? Wiktionary lists no Iranic reflexes of Proto-Indo-European *ʔer- 'earth'.

6. How did I never encounter the word glossonym before?

Among some Yazidis, the glossonym Ezdîkî is used for Kurmanji to signify an attempt to erase their affiliation to Kurds.

7. Another new word for today: Kurdification.

Kurdification is a cultural change in which non-ethnic Kurds or/and non-ethnic Kurdish area or/and non-Kurdish languages becomes Kurdish.

I don't see how languages can become Kurdish as opposed to being replaced by Kurdish.

8. I don't understand the phonetic logic behind this Kurdish rule:

After /ɫ/, /t/ is palatalized to [tʲ]. An example is the Central Kurdish word gâlta ('joke'), which is pronounced as [gɑːɫˈtʲæ].

Is /t/ dissimilating after velarized /ɫ/? I think of velarization and palatalization as being opposites. But palatals are between dentals and velars in terms of point of articulation, so maybe this rule is assimilatory.
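Whatever its motivation, the rule itself is a one-line rewrite. Here is a minimal Python sketch, assuming a phonemic input /gɑːɫˈtæ/ for gâlta (my guess at the underlying form, with the stress mark kept in place). `palatalize_t` is a hypothetical name.

```python
import re

def palatalize_t(ipa):
    """Rewrite t as tʲ after ɫ, allowing a stress mark in between."""
    # (?!ʲ) avoids palatalizing a t that is already marked as palatalized.
    return re.sub(r"(ɫˈ?)t(?!ʲ)", r"\1tʲ", ipa)

print(palatalize_t("gɑːɫˈtæ"))
```

The rule mentions only the left-hand environment, so a following front vowel like [æ] plays no part in it as written.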

9. I also don't get this Kurdish rule which involves palatalization next to a velar:

When preceding /ŋ/, /s, z/ are palatalized to /ʒ/.

I guess that happens because [ʒ] (I think phonetic brackets were intended) is closer to /ŋ/ than /s, z/.

Is there any language in which /s z/ become velar [x ɣ] next to a velar?

10. This Sorani allophony reminds me of how Middle Chinese *æ corresponds to Mandarin [ə] in 生 shēng [ʂəŋ] (I should continue my series on 生):

The vowel [æ] is sometimes pronounced as [ə] (the sound found in the first syllable of the English word "above"). This sound change takes place when [æ] directly precedes [w] or when it is followed by the sound [j] (like English "y") in the same syllable.

The environment is completely different, though. And I don't understand why glides are schwa-friendly, though I admit I find [əw əj] easier to pronounce than [æw æj].

10. I am baffled by Kümmel's use of [ʆ] as well as [ɕ] for Sanskrit in slide 14 of his 2018 PowerPoint. I have never seen anyone use [ʆ] for Sanskrit (or any language, really). Is [ʆ] an allophone of /ś/ before /r/?

11. I've never seen the letters ḧ ẍ before. They are optional symbols for [ħ ɣ] in the Hawar and universal extended alphabets for Kurdish. (The Hawar section of the Wikipedia article seems to have the Arabic equivalents of ḧ ẍ reversed. I assume the IPA at the bottom of the article is correct, and that the diaereses are added to indicate voiced versions of h [h] and x [x].)

12. Trying to relate Mandan in North Dakota to ... Welsh seems so random. I didn't know there were deep inland variants of the Welsh-in-America myth. Or that there were this many variants:

In all, at least thirteen real tribes, five unidentified tribes, and three unnamed tribes have been suggested as "Welsh Indians."

Chris Harvey/Languagegeek patiently proves "That Mandan Is Not Welsh".

Just seven years after Harvey wrote that article, Mandan became extinct when Edwin Benson died in 2016.

13. Backer is bigger in Mandan:

Mandan, like many other North American languages, has elements of sound symbolism in its vocabulary. A /s/ sound often denotes smallness/less intensity, /ʃ/ denotes medium-ness, /x/ denotes largeness/greater intensity

Is there a similar gradation for stops: /t/ 'less' vs. /k/ 'more'? (There is no /tʃ/.)

THE AMERICANIZATION OF MICHÁLKA

(Posted 19.10.11.)

1. It's hard to predict how non-English names are pronounced in American English. For instance, I wouldn't have guessed that Czech Michálka [ˈmɪxaːlka] would be pronounced as [miːˈʃɑːkə]. I'm not too surprised by the stress moving to the original long vowel (unstressed long vowels are difficult for English speakers). [miː] may reflect a Czech variety with [i] instead of [ɪ] (English has no nonfinal short [i], so [i] had to be lengthened to [iː]). But how did ch end up as [ʃ]? By analogy with French? Did <l> become silent by analogy with the silent <l> of words like <talk>?

2. Czech vowels look phonologically symmetrical, but that symmetry is lost in phonetic notation (source):

front | back
/iː/ [iː] | /uː/ [uː], /u/ [u]
/i/ [ɪ] |
| /oː/ [oː], /o/ [o]
/eː/ [ɛː], /e/ [ɛ] |
/aː/ [aː], /a/ [a]

Why are /iː i/ not phonetically the same height? (They are both equally high in Eastern Moravian Czech. The [iː] of [miːˈʃɑːkə] may reflect a Czech variety with /i/ [i] instead of /i/ [ɪ].)

What motivates /i eː e/ being lower than /u oː o/?

Slovak, on the other hand, at first glance seems to have a nearly symmetrical system apart from the nearly extinct vowel /æ/ which has no long counterpart (source):

front | back
/iː/ [iː], /i/ [i] | /uː/ [uː], /u/ [u]
/eː/ [ɛː], /e/ [ɛ] | /oː/ [oː], /o/ [o]
/æ/ [æ] | /aː/ [aː], /a/ [a]

But in closer notation, /eː e/ are higher than /oː o/ - the reverse of Czech!

front | back
/iː/ [i̞ː], /i/ [i̞] | /uː/ [u̞ː], /u/ [u̞]
/eː/ [e̞ː], /e/ [e̞] |
| /oː/ [ɔ̝ː], /o/ [ɔ̝]
/æ/ [æ] | /aː/ [aː], /a/ [a]

Can Czech and Slovak native speakers tell each other apart from the minor differences in their vowels?

3. Why is Japanese sazo(kashi) 'certainly' spelled 嘸(かし) with 嘸 <MOUTH.NOT>? 嘸 has represented several morphemes over time in Chinese. The earliest one known to me is Late Old Chinese *mɨaʔ 'surprised' in 漢書 Book of Han (111 AD). But none mean 'certainly'. Did someone in Japan think that 嘸 would be appropriate for sazo 'certainly' because 無 <NOT> would convey 'no (doubt)' or 'no (choice)': i.e., inevitability and hence certainty?

口 <MOUTH> is a common radical in Chinese grammatical words, though I can't think of any parallels in Japanese off the top of my head. (The one made-in-Japan kanji with 口 <MOUTH> that comes to mind is 噺 hanashi 'story' which isn't a grammatical morpheme or even an abstract one like sazo 'certainly'. I just found that Wiktionary has a list of made-in-Japan kanji - the only other one with 口 <MOUTH> is 囎, a phonetic symbol for the first syllable of the placename 囎唹 Soo.)

Shipka stats, Kanken levels, and Jun Da's general Chinese ranks:

Jun Da

無 = 无 56

嘸 = 呒

Windows 10's Japanese IME does not include 嘸 as a choice for sazo, but it does include 嘸かし for sazokashi after さぞかし.

Google frequencies:

さぞかし outnumbers 嘸かし by 289 to 1.

4. Wiktionary lists so as the on reading for 囎 even though by definition a made-in-Japan kanji cannot have borrowed Chinese readings. It can, however, have Chinese readings, as a Chinese reader could read it like a component that could be interpreted as a phonetic: e.g.,

in Mandarin.

Wiktionary also lists a kun reading shō. If 囎 is only for 囎唹 Soo, when is shō used?

THE EARLY LIFE OF 生 (PART 1)

(Posted 19.10.11.)

1. Continuing from "The Two Lives of 生" with a new title:

For a long time - all the way into the late Old Chinese period - 生 belonged to the 耕 rhyme category.

In 詩經 Shijing about three millennia ago, 友生 'friend' rhymed with 平 'peace', another 耕 rhyme word (translation by James Legge):


And shall a man,

Not seek to have his friends?


Spiritual beings will then hearken to him;

He shall have harmony and peace.

I reconstruct 生 as Old Chinese *sIreŋ and 平 as Old Chinese *CIPreŋ.

In modern Cantonese, 生 and 平 have pairs of readings that don't rhyme at all:


How did that happen? Stick with me for future parts.

2. I found the Hong Kong Characters section of the 漢語多功能字庫 Multi-function Chinese Character Database when looking for Cantonese 𢫏 kam2 from the previous post. I don't know why I never explored that database before. I was aware of it but just never clicked. I don't have time to look around, so I'm just going to look at the 33-stroke Hong Kong characters:

Is 𡤻 a character for girls' names?

3. When I tried to type 靈 líng 'spirit' (Jun Da frequency for simplified 灵: #730) for the previous topic, Windows 10's Microsoft Pinyin IME did not include it in its 94 choices for Mandarin ling (excluding the lin-graphs after those 94). I don't understand why some common characters don't appear in the first batch of options. The first four options are (with Jun Da rankings)

1. 零 líng 'zero' (#1498)

2. 令 lìng 'order' (#267)

3. 另 lìng 'other' (#620)

4. 霛 líng (variant of 靈; #8145)

I've never even seen 霛 before.
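For comparison, here is what the order of those candidates would be if it followed nothing but the Jun Da ranks quoted above. This is a sketch in Python, not a description of how Microsoft Pinyin IME actually ranks its candidates; the name `jun_da_rank` is made up for the example.

```python
# Jun Da ranks as quoted in this post (lower number = more frequent).
# 靈 is listed under the rank of its simplified form 灵.
jun_da_rank = {"零": 1498, "令": 267, "另": 620, "霛": 8145, "靈": 730}

# Sort the candidates from most to least frequent.
by_frequency = sorted(jun_da_rank, key=jun_da_rank.get)
print(by_frequency)
```

On a purely frequency-based ordering, 靈 would sit among the first few options and 霛 would come last.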

Another case like this is 家 jiā 'home' (Jun Da #55) which is not in Microsoft Pinyin IME's 89 choices for Mandarin jia (excluding the ji-graphs after those 89).

To type 靈, I typed <lingjing> (see topic 4), got 靈境, and deleted the second character 境.

To type 家, I typed <jiazu> 'family', got 家族, and deleted the second character 族.

I shouldn't have to do that.

4. I meant to type <jingling> for 精靈 jīnglíng 'spirit' and ended up discovering another word 靈境 língjìng 'spiritual territory' instead.

Taishūkan's 新漢和辞典 New Sino-Japanese Dictionary defines 靈境 <SPIRIT TERRITORY> as 靈地 <SPIRIT EARTH>: こうごうしい土地 kōgōshii tochi 'godly land'.

kōgōshii has a long ō common in Chinese loanwords like 皇后 kōgō 'empress', but if it were a Chinese loanword, it wouldn't be spelled in kana as こうごうしい. It has a partly semantographic spelling 神神しい <GOD GOD si i> revealing its true etymology. The Japanese root kamu- 'god' was compressed into kō:

*kamu > *kau > *kɔː > [koː]

The gō of kōgōshii is the same but with the voicing characteristic of the initial consonants of second elements of compounds; cf. 神神 <GOD GOD> kamigami 'gods' < kami < *kamu-i 'god'.
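The chain *kamu > *kau > *kɔː > [koː], plus the voicing of the second element, can be replayed as a short list of ordered rewrite rules. This is a hand-written illustration in Python, not a general model of Japanese sound change; the names `RULES`, `derive`, and `voice_initial` are invented for the sketch.

```python
# Ordered rewrite rules for the chain *kamu > *kau > *kɔː > [koː].
# Assumption: the rules are tailored to this one root for illustration.
RULES = [("amu", "au"), ("au", "ɔː"), ("ɔː", "oː")]

def derive(form: str) -> list[str]:
    """Return every stage from the input form to the final output."""
    stages = [form]
    for old, new in RULES:
        form = form.replace(old, new)
        stages.append(form)
    return stages

def voice_initial(form: str) -> str:
    """Voice an initial k to g, as in the second element of a compound."""
    return "g" + form[1:] if form.startswith("k") else form

stages = derive("kamu")
print(" > ".join(stages))
print(stages[-1] + voice_initial(stages[-1]))
```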

5. Korean 간직 kanjik 'storing away' sounds like a Chinese loanword but has no obvious Chinese etymology. Martin et al. (1967: 41) suggest a Sino-Korean source *看直 'look straight' which would be pronounced kanjik in Korean. It's a perfect phonetic match, but I don't see how it can semantically fit.

6. Today I learned that no get means 'does not have' in West African Pidgin English as well as in Pidgin (Hawaii Creole English/HCE):

25% of young pipo no get one single friend

I say pipo too.

HCE has about 600,000 speakers (Sakoda and Siegel 2003: 1), whereas West African Pidgin English has about 75 million! The largest French-based creole seems to be Haitian Creole with 12 million speakers.

Here's "The absolute beginners' guide to [West African] Pidgin" by speaker Kobby Ankomah-Graham:

Pidgin is defined by its practicality. Fluency will reduce how much you have to pay for cab fares or market tomatoes. [...] Advertising in Pidgin – once unthinkable – is now commonplace.

That's not yet the case for HCE in Hawaii, though we do have HCE greeting cards in stores.

TANGUT DATABASE 3.2

1. Version 3.2 of my Tangut database has nine corrected readings thanks to Andrew West. Details in the changelog on sheet 2. More soon.

2. Viacheslav Zaytsev compares the first and second printings of one of the most important books I have ever used: Eric Grinstead's Analysis of the Tangut Script. I borrowed the second printing from the University of Hawaii library in 1994 and got my own copy a few years later. I have never seen the first printing before:

My copy of the 1st printing has 3 inserts (2 of them are glued) with info in Danish. This allows us to know some more facts related to Grinstead’s biography: his PhD thesis was submitted to the Fac. of Philosophy at the U. of Copenhagen on 29 Nov 1971, and defended on 30 Jan 1973

That information has been incorporated into Grinstead's Wikipedia entry. I hope more facts about Grinstead emerge in the years to come.

It is a shame that no one in Copenhagen picked up his torch.

3. Could Tangut 𗼧 6037 1kew2 'to instruct' be a loan from Chinese 敎 'to instruct'? 1kew2 resembles keu, the Sino-Japanese reading of 敎 borrowed from southern (not northwestern!) Chinese c. the 5-6th centuries. A few modern Yue varieties have keu-like readings. (But Yue in the south is only distantly related to the northwestern Chinese known to the Tangut.)

There are two problems.

First, normally the rhyme of 敎 corresponds to Tangut -o2, not -ew2 (Gong 2002: 375).

Second, Li Fanwen's (2008: 949) Chinese glosses for 6037 do not match his English gloss 'to instruct':

誥 'admonition, to admonish'

詔 'decree, to decree'

The Combined Homophones-Tangraphic Sea entry for 6037 - with the only attestations of 6037 that I know of - may point to 'decree' since 6037 is something an emperor does.

MYŎNGDONG THEATER: LET'S MEET AT WALKERHILL

1. The photo I chose to illustrate my last post is of 明洞劇場 Myŏngdong Theater from 워_커힐에서 만납시다 Wŏ̄khŏhil esŏ mannapshida (Let's Meet at Walkerhill, 1966). (Walkerhill is one word; it's the name of this hotel.)

The entire movie is on the 韓國映像資料院 Korean Film Archive's YouTube channel with Korean and English subtitles - great for listening comprehension practice!

Let's look at the opening which is entirely in hanja with two exceptions.

0:55: The company name 株式會社東南亞映画公司 Chushik hoesa Tongnam A yŏnghwa kongsa 'Stock Company Southeast Asia Movie Company' is interesting in two ways. It combines the Japanese term 株式會社 'stock company' with the Chinese term 公司 'company'. And why is a Korean company called Southeast Asia Movie Company?

會 resembles 曾 with a closed 八 on top instead of 人 and an extra horizontal stroke joined to the central vertical stroke.

The left-hand vertical strokes of 映 and 司 are elongated so 日 resembles 阝 and 口 resembles a squarish P.

画 (a simplification of 畵) has a vertical line that goes all the way to the bottom horizontal line.

The ㇆ stroke of 司 resembles the first two strokes of 可.

1:01: 泰 appears as 𣳾.

式 has its dot moved to the top left.

社 has a hook on the bottom left.

給 has the same closed 八 on top instead of 人 as 會 in this frame and 0:55.

1:06: 워_커힐 Wŏ̄khŏhil 'Walkerhill' has an underscore vowel length marker. There is no such marker after khŏ, even though Japanese would perceive Walker as ウォーカー Wōkā with two long vowels.

1:12: 7-stroke variant of 㓰, a simplification of 劃.

1:27: 崔 has a long bottom horizontal stroke extending leftward.

李 has a 子 resembling the bottom of 雪.

1:35: I feel sorry for the people whose names are illegible here. I would write about their names if only I could make them out!

1:44: 玉 has a mirror-image dot.

后 is written entirely with straight lines. I didn't know it could be used in names.

2:00: The 重 of 勲 has a bottom horizontal line so long that it extends under the left half of 力.

梁 is missing its right dot, and the top right component looks like (⿹㇆ 㐅).

2:15: The 卜 of 朴 resembles hangul ㅏ.

2:33: I've never seen 変 in a name before. The left and right dots of the top element 亦 are mirror-imaged and the central strokes are straightened.

2:47: 永 is written as 水 with 二 on top, resembling Khitan small script 004

with an extra horizontal line.

Blink and you'll miss what I assume is the surname 徐 based on the Korean Film Archive's credits listing.

투위스트 Thuwisŭthŭ 'Twist' is the only other hangul word in the opening sequence.

2:57: 具 is missing one horizontal line, and two of the horizontal lines don't go all the way across to join the vertical lines. Those nonjoining lines are also in 貞 above, and 具, 貞, and 南 all have open left corners. In handwriting there can be a slight gap in that position, but in the stylized hand of these credits, that gap is bigger than usual.

The of 妊 looks like an angular ナ floating over an equally angular メ.

3:03: The 金 of 錫 has a bottom resembling 止 instead of Khitan small script 295 <p>:

The dot of 太 is moved leftward and overlaps with the lower left leg of 大.

The top left 丿 of 舞 has been reduced to a dot on top.

辰 is subtly different - the components under 厂 look like Γ atop Khitan small script 028 <sh>:

3:16: 星圃 Sŏng-pho 'Star-field' (if I'm reading correctly) is a nice name.

3:23: The top right of 監 looks like a horizontal line over ハ.

The ヰ of 韓 has a first horizontal stroke extending further to the left than the second one beneath it.

And skipping to the very end ...

1:36:34: 謝 has 扌 instead of 寸. I have never seen 扌 on the right of any Chinese character before; it is the left-hand variant of 手.

2. Surprise! The Dictionary of Chinese Character Variants has an entry for 媤 in its appendix of made-in-Korea characters even though it already has an entry for 媤 as a Chinese character.

3. Thirty years ago I believed I knew all the 常用漢字 jōyō kanji. But there were only 1,945 jōyō kanji then. 196 more were added (and 5 were subtracted) in 2010. I've seen all of the 196 before but one: 錮. Great, a Japanese ninth-grader knows that character but I don't.

I suspected 固 has been substituted for it, and I was right. Wiktionary explained that until 錮 became a jōyō kanji, the word 禁錮 kinko 'imprisonment without labor' was still written as 禁錮 in the criminal code, but in laws enacted (shortly?) after the announcement of the tōyō kanji (the predecessor of the jōyō kanji which excluded 錮), 禁錮 was written in mazegaki 'mixed writing' as 禁こ with hiragana こ <ko>. Newspapers, however, used 禁固 with kakikae: substituting the high-frequency tōyō and later jōyō kanji 固 for its homophone 錮.

大修館新漢和辞典 Taishūkan's New Sino-Japanese Dictionary gives only two words with 錮, 禁錮 and 錮疾, an alternate spelling of 痼疾 'chronic disease'. How common are these words? Google stats:

It's looking to me as if 錮 is required in schools solely for the word 禁錮 (which is more frequently written 禁固).

Frequency stats for 錮 from Dmitry Shipka:

錮 occurs just once in the entire Twitter corpus! I don't know how meaningful #4490 is, since there's no real sense in ranking it higher or lower than any other characters with only one instance in Twitter.

I think any kanji ranked lower than #2000 are not really worth learning for most people. 錮 is ranked way lower than #2000 except in the news, presumably due to reports about people imprisoned without labor. The sort of thing I don't read. So I don't feel too bad about not knowing it even existed until now.

I like Gakken's A New Dictionary of Kanji Usage (1982) which is frequency-based and includes frequent kanji regardless of whether they are jōyō or not. (It has an appendix of jōyō kanji that weren't frequent enough to make the cut for a main entry.) It doesn't include 錮. But would a new edition include it in the main section, or would it be in the appendix?

9.28.1:36: Just found that Windows 10's IME favors 禁固 over 禁錮, specifying that the latter is a legal term. And 禁こ is not in the list of potential spellings for kinko.

MYŎNGDONG THEATER (PART 1)

All topics from yesterday that I didn't have time for last night:

1. I'm afraid to look at this page of old 明洞劇場 Myŏngdong Theater-related images because it might be packed with variant hanja.

Before getting to those, let's look at the two maps at the top of the page:

(I can't figure out how to link to the maps, so I've copied their titles which can be searched for.)

The 1960 map is entirely in hanja except for the non-Chinese word 유네스코 Yunesŭkho 'UNESCO', which can't be written in hanja, and 극장 kŭkchang, which can be but isn't, perhaps because its hanja 劇場 would be almost illegible in a small space. The tiny characters are not well written and are hard to read: e.g., I initially misread 市公舘 shigonggwan 'city public hall' as 市25舘, which makes no sense.

The 1972 map, on the other hand, is entirely in hangul. It is dangerous to make big claims based on just two items, but I cannot help but think the difference between the two maps reflects the shift away from hanja (which was far from complete in 1972 - the 1972 movie ads toward the bottom of the page still have hanja).

It's also telling that the two following announcements about the theater from 1953 and 1958 have had to be completely transcribed in hangul - something that wouldn't be necessary if hanja-heavy text were still the norm in 21st century Korea.

I haven't actually gotten around to discussing the variant hanja on that page yet. Later.

2. When I started studying linguistics thirty years ago, I was put off by an exercise whose answer was that Korean /s/ voices to [z] intervocalically. No, it doesn't, which is why foreign z is borrowed as /c/ which voices to [dʑ] intervocalically: e.g.,

Or does it? T. Cho et al. (2002: 212) "in fact observed that about 46% of tokens of /s/ were fully voiced in this position". I have never heard [z] in Korean. Is this [z] a recent innovation?

Historically, *s did become [z] in intervocalic position, and that [z] then lenited to zero, which is why there are modern alternations such as

낫다 <nas ta> nat-ta get.better-FIN 'to get better'

나았다 <na Øat ta> naØ-at-ta get.better-PAST-FIN 'got better'

in which earlier *s survives as [t] before consonants but vanishes between vowels.
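The alternation can be mimicked with the standard Unicode arithmetic for precomposed Hangul syllables. This is a minimal sketch in Python under one loud assumption: the function `attach` (a made-up name) is only ever given s-irregular stems such as 낫-, since regular stems ending in ㅅ keep their ㅅ before vowels.

```python
# Standard Unicode arithmetic for precomposed Hangul syllables.
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
T_SIOS = 19   # index of final ㅅ among the 28 final slots
L_IEUNG = 11  # index of initial ㅇ, which marks a vowel-initial syllable

def attach(stem: str, suffix: str) -> str:
    """Attach a suffix to an s-irregular stem such as 낫-.

    Assumption: the caller passes only s-irregular stems; regular
    ㅅ-final stems are not handled here.
    """
    last = ord(stem[-1]) - S_BASE
    suffix_lead = (ord(suffix[0]) - S_BASE) // (V_COUNT * T_COUNT)
    if last % T_COUNT == T_SIOS and suffix_lead == L_IEUNG:
        # Drop the final ㅅ before a vowel-initial suffix.
        stem = stem[:-1] + chr(S_BASE + last - T_SIOS)
    return stem + suffix

print(attach("낫", "다"))
print(attach("낫", "았다"))
```

The spelling keeps ㅅ before the consonant-initial ending and loses it before the vowel-initial one, which is the orthographic side of the alternation shown above.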

Is history repeating itself?

I am very skeptical of Wikipedia's claim that intervocalic /s/ becomes long [sː]. I have never heard that /s/ sounding like Japanese ss.

3. I have no idea why the bound noun chabal in the expressions

자발(머리) 없다 chabal (mŏri) ŏpta (lit. 'X [head] lack')

자발 적다 chabal chŏkta (lit. 'X few')

both 'to be quick-tempered, impatient, restless' (Martin et al. 1967: 1379; is chabal 'patience'?)

has a variant 재발 chaebal. I'd expect a Cae-variant if the second syllable had i (e.g., 애기 aegi < 아기 agi 'baby'). ae is from *ai. But obviously there is no i in chabal. And a shift in the opposite direction (chaebal > chabal) has no precedent.

4. Martin et al.'s 1967 dictionary says 媤 <HUSBAND'S.FAMILY> is a "Korea-made character", but Wiktionary gives non-Korean readings for it: Mandarin sī, Cantonese si1, and Japanese shi. The earliest attestation I can find is in 集韻 Jiyun (1037) which lists a variant 㚸.

The Korean reading 시 shi is unusual because Mandarin sī should correspond to Korean sa. shi could theoretically be a borrowing of an Early Middle Chinese *si rather than a c. 8th century Late Middle Chinese *sz̩ that would have become the expected sa. However, I can't find any evidence of 媤 before the 11th century. And if 媤 had existed in the Early Middle Chinese period, it would have been *sɨ with a central vowel, not *si with a front vowel.

Moreover, I can't understand what would motivate reading 媤 as shi. Its phonetic is 思 sa, and no other characters with 思 are read shi. No, wait, no other common characters with 思 are read shi. There is a character 緦 <SACK.CLOTH> also read shi. 東國正韻 Tongguk chŏngun (Correct Rhymes of the Eastern Country, 1448) gives the prescriptive reading of 緦 as ᄉᆡ /sʌj/. Normally 15th century /ʌj/ becomes modern ae, but maybe in this instance it became i. 媤 is not in Tongguk chŏngun, but if it were, would its reading have been given as /sʌj/?

The correspondence of /ʌj/ to Mandarin i is unusual ... unless ...

In Old Chinese, 緦 was *sə. If 媤 had existed in Old Chinese, it too would have been *sə. That *sə should have developed into Late Middle Chinese *sz̩ which would be borrowed into Korean as /sʌ/ which would become modern sa.

But what if 緦 and 媤 had an Old Chinese variant *CAsə? (Perhaps such a sesquisyllable was actually original, and *sə is but a reduction.) The *A would condition the warping of *sə:

*CAsə > *CAsʌɰ > *sʌj

The end result matches the /sʌj/ for 緦 in Tongguk chŏngun. shi for 緦 and its homophone 媤 would then reflect variation within Chinese rather than a Korean-internal random change.

VARIANT HANJA C. 1963

No time to revise my Tangut database today, but Tangut turns out to be marginally relevant in a most unexpected place: a set of South Korean newspaper movie ads from c. 1963 (h/t ╹ω╹りなれはあとい@衛兵るた) with hanja variants that bring to mind Juha Janhunen's hypothetical Parhae script. (No, I'm not drawing a direct line between the two. Parhae characters did not survive into the 1960s. I'm merely pointing out two sets of Chinese character variants from the greater Korean cultural zone.)

1. This 人 + two strokes variant of 人 <PERSON> in an ad for 夫婦條約 Pubu choyak (The Husband-Wife Contract, 1963) reminds me of Eric Grinstead's (1972) hypothesis that the Tangut character component 𘢌 <PERSON> (in one out of five characters!) was derived from that variant.

Here's another example in an ad for 傷한 갈대를 꺾지 마라 Sanghan kaldaerŭl kkŏkkchi mara (Do Not Break a Damaged Reed, 1962).

A similar variant of 文 (Later Han example here) reminds me of Khitan small script character


which was pronounced something like [je] judging from its use in Chinese loans. Or was it? Kane (2009: 327) pointed out that in native words, 327 combines with a-graphs: e.g.,

327-123 <> (Xu 18.18).

327-261-051-189-123 <> (Xu 33.2)

[je] coexisting with [a] is very un-'Altaic'. Did 327 have two readings, one for Chinese loans and another for Khitan words? Or are we still far from understanding how Khitan vowel harmony worked?

Dotless 文 326 (reading unknown) combines with both a- and e-graphs - again, very un-Altaic: e.g.,

That last word looks like the feminine counterpart of masculine <>. So is one word misspelled? In other words, was that word originally spelled with 326 or 327? Did 326 and 327 originally represent a harmonic pair? Was one [ja] and the other [je]? Compare with how Manchu differentiated a and e [ə] with a dot for the latter centuries later. The use of the dot in the Khitan small script to indicate grammatical masculine gender should also be taken into consideration when interpreting dotted 327.

2. This simplification of 演 as ⿰氵⿱𡧇儿 in an ad for 자이안트 Chaianthŭ = Giant (1956) is new to me. (A clearer comparison of the two.) ⿰氵⿱𡧇儿 is not in the Dictionary of Chinese Character Variants entry for 演.

Notice how the vertical part of ㄴ <n> in the logo for 자이안트 <ch.a Ø.i Ø.a.n th.ŭ> is barely visible. But the letter has to be ㄴ <n> because ㅡ <ŭ> is not possible in that position.
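The point about what can occupy the bottom slot of a syllable block can be checked mechanically by decomposing each block into its initial, vowel, and final. The following is a sketch in Python using the standard Unicode arithmetic for precomposed Hangul; the function name `decompose` is invented, and the input is assumed to contain precomposed Hangul syllables only.

```python
S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
# The empty string stands for "no final consonant".
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(word: str) -> list[tuple[str, str, str]]:
    """Split each syllable block into (initial, vowel, final).

    Assumption: every character is a precomposed Hangul syllable.
    """
    parts = []
    for ch in word:
        lead, rest = divmod(ord(ch) - S_BASE, V_COUNT * T_COUNT)
        vowel, tail = divmod(rest, T_COUNT)
        parts.append((LEADS[lead], VOWELS[vowel], TAILS[tail]))
    return parts

for part in decompose("자이안트"):
    print(part)
```

The final slot is drawn from a list of consonants only, so a vowel letter such as ㅡ can never appear there.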

絕讚 chŏlchan 'highest praise' is written almost unrecognizably:

3. In the ad for Let No Man Write My Epitaph (1960) at the bottom left of this image, dots separate foreign names even though spaces would suffice. Such dots are commonly used to separate foreign names in Japanese, but I've never seen the practice in Korean before.

The character _ is used to indicate vowel length, much like Japanese ー (but note the different size and placement relative to the base line - and how it can go under the word divider dot):

9.26.0:48: Next to that ad is an all-text ad for 다니엘 Taniel (Daniel; Le Puits aux trois vérités, 1961) in Korean. It too has dots as name dividers. But its text is vertical, so its vowel length marker is also vertical 丨: e.g.

<k.ŭ ro-t.ŭ> Kurō [kurdɯ] 'Claude'

Nowadays foreign /l/ and /r/ are borrowed differently in Korean in medial position (e.g., Claude is now 클로드 Khŭllodŭ), but at this point both seem to be borrowed as Korean /r/ even when /r/ has a mismatching allophone: e.g., Morgan as 몰간 <m.o.r k.a.n> /morkan/ [molgan] (now 모간 <m.o k.a.n> is preferred). Another example is the aforementioned Shirley as 샤_리 <s.ya_r.i> /syaːri/ [ɕʰaːri], now 셜리 <s.yŏ.r r.i> /syŏrri/ [ɕʰɔlli].

4. 禮 is normally simplified as 礼, but this ad for 江華道令 Kanghwa Toryŏng (The Reluctant Prince, 1963) has 𥘇 with an extra stroke.

TANGUT DATABASE 3.1

1. Thanks yet again to Andrew West for submitting corrections for my Tangut database. Version 3.1 has

2. What is the etymology of Cantonese aa6 'ten' in '31'-'99' (excluding the tens: '40', '50', etc.)?

1saa is a contraction of the words for 三 'three' and 十 'ten' written as three 十 <TEN> fused together (into what almost looks like <FOUR> in the Khitan large script). I would have expected saam1 and sap6 to fuse into ˟saap. The loss of -p is irregular in Cantonese.

Tone 6 points to a *voiced initial, presumably *ŋ- (呀 has a *ŋ-phonetic 牙). I wonder if aa6 is a linking particle related to the aa3 (not aa6!) in this list of items cited on Wiktionary:


jan1.wai4 gwok3 aa3, kyun4 aa3, wing4 aa3, gaai1 hai6 nei5 jau5, zi3 dou3 sai3 sai3, sing4 sam1 so2 jyun6.

because kingdom aa3, power aa3, glory aa3, all is you have, reach to generation generation, sincere heart NMLZ wish.

'For thine is the kingdom, and the power, and the glory, for ever. Amen.' (1882 translation of the Gospel of Matthew)

But the tones are different: the linking particle is aa3, not aa6. Did tone 6 spread from 'ten' to the linking particle?

*saam1 sap6 aa3 > *saam1 sap6 aa6 > saa1 aa6?

3. I couldn't help but think Kage Baker was named after Japanese 影 kage 'shadow'. But her name is disyllabic:

Her unusual first name (pronounced like the word cage) is a combination of the names of her two grandmothers, Kate and Genevieve.

4. I'm surprised this concept came from the pen (or should I say word processor?) of a language teacher:

The cyborgs can recognize, understand, and speak any known human language instantly, including local variants and dialects. Their lingua franca is called "Cinema Standard", presumably the English spoken in 20th century movies, with which most Company operatives are obsessed.

The words are Wikipedia's and not Kage Baker's. Interestingly, their lingua franca wasn't Elizabethan English, which was her specialty. I suppose Cinema Standard was for the reader's convenience.

TANGUT DATABASE 3.0

Thanks again to Andrew West for submitting corrections for my Tangut database. Version 3 has

See the changelog on sheet 2 for details. Even more corrections soon.

TANGUT DATABASE 2.1

Thanks to Andrew West for submitting corrections for my Tangut database. Version 2.1 has corrected readings for ten characters (see the changelog on sheet 2). More corrections soon.

HIIRAGI ISN'T HOLLY

1. And these four Khmer characters aren't (just) for Pali:

Last night I noticed said,

The last 4 characters [i.e., <°ū °û ś ṣ> above] are used only for pali.

They are rare, but that doesn't mean they're for Pali.

2. I found this post by No-sword (Matt Treyvaud) while trying to find more information about Flora Best Harris:

Hiiragi is often pressed into service as a Japanese translation for "holly" (in the Christmassy sense), but in fact it's a different plant: Osmanthus heterophyllus, a.k.a. "false holly". Completely different order from actual holly.

I'll never think of 柊 hiiragi as simply 'holly' again. Though I was in good company - that's what Flora Best Harris called it in English. Who was the first to translate hiiragi as 'holly'?

9.9.9:39: Is 'holly' in fact a translation of another, earlier (Dutch?) translation of hiiragi? Now I'm curious about early Dutch-Japanese lexicography. Early Portuguese-Japanese lexicography gets more attention.

PRACTICAL SCHOLARSHIP: MASTER'S IN TYPEFACE DESIGN

1. Fonts are the blood of the digital world. We can't read on machines without them. And all fonts have typefaces. As ZURB puts it,

A font is a container of type.


A typeface is the design of a set of characters — letters, numbers and punctuation.

In other words, fonts can't be empty containers. There are no fonts without designs. And as the Department of Typography & Graphic Communication at the University of Reading explains, typeface designs

rely on a deep web of historical, cultural, and technical understanding, as well as plain-old form-making skills. From the impact of traditional forms of writing, the developments in the technologies of type-making and typesetting, the typeface designer needs to be aware of how texts are transmitted and shared in each society, and respond to the editorial practices and conventions of each market.

That university's MA Typeface Design (MATD) program trains students to tap into that "deep web", to be capable of producing scholarly knowledge as well as applying that knowledge to the practical task of creating beautiful texts in many scripts.

I've been slowly reading Zachary Quinn Scheuren's MATD dissertation "Khmer Printing Types and the Introduction of Print in Cambodia: 1877-1977".

I found that last night while trying to Google whether Franklin Huffman's coauthor Im Proum's surname was Im or Proum. Google Scholar treats Proum as the surname, but I still don't really know.

2. This morning I found this long list of samples of predigital Khmer script at whose front page is a list of samples of digital Khmer fonts.

And tonight I found khmerfonts' page on how to make a Khmer font. (But see my next entry!)

3. Not long ago I wrote a post showing my ignorance of Old Khmer script. I no longer have an excuse to be in the dark. This morning I found SEAlang's Old Khmer images page which allows me to see Old Khmer texts. But at the moment I can't figure out how to get these features to work:

Clicking an image will load either:

depending on which button in the upper right is blue.

But maybe I'm just a prisoner of my computer illiteracy (exemplified by the deliberately primitive design of this website - my philosophy is not to do anything I don't understand ... which doesn't explain the innumerable forays into the unknown [for me] on this blog ... so maybe that's not my philosophy).

Fortunately, I can check my readings of the texts using the Corpus of Khmer Inscriptions. Unfortunately, only inscriptions with Jenner's readings are up, so I'm out of luck with K.27 which isn't one of them. Looking at K.28, I see virāmas which look like superscript dashes. When did they fall out of favor in Khmer?

4. I've also been slowly reading Meredith McKinney's "Classical Prose" in The Routledge Handbook of Literary Translation (Kelly Washbourne and Ben Van Wyke, eds.). McKinney mentions Flora Best Harris, an early translator of Japanese into English. I wonder how Harris learned Japanese in those days.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 14)

Indic scripts generally have two types of vowel symbols:

The Khmer script has both types of symbols, though they are not quite used the way one might expect:

The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the independent vowel symbols for many (not all) vowels, even though it would be possible to write all English word-initial vowels as <ʔa> + vowel character combinations. Some of the phonemic assignments of independent vowel symbols surprised me:

independent vowel symbol
modern Khmer
dependent vowel symbol transliteration modern Khmer
after *voiceless
after *voiced

<ā> (not the inherent vowel <a>!)
<ʔa"> -

<ʔa'> -

<a'> -

ʔə, ʔəj, ʔɨ
<i> e, ə i, ɨ /ɪ/

<ī> əj


ʔo, ʔu, ʔao
<u> o

ʔou, ʔuː ុអ់ <u'> (not <°ū>!) -


ej, ə eː, ɨ /ɛ/
<°ai> ʔaj
<°aiʔa'> -
<aiʔa'> -




<°o₂ḥ> -



<°au> ʔaw
ឳអ់ <°auʔa'> -
<auʔa'> -
<yu> (not <yū>!) ju

អឹអ់ <ʔïʔa'> -
ឹអ់ <ïʔa'> -

<va> (not <va'>!) vɔː ុះ

(9.7.21:59: Added the next six paragraphs and greatly expanded the table above.)

I use hyphens to indicate that <'> and <"> have no sound values of their own:

Other hyphens indicate that a symbol combination is not used in Khmer as far as I know.

Note how the phonemic assignments of ASB independent vowels and their dependent counterparts do not always match: e.g., independent <°ǔ> corresponds to dependent <ū> since Khmer has no dependent <ǔ>.

Some ASB dependent vowels have no independent counterparts. I presume they are written as <ʔa> + vowel character combinations.

ASB takes advantage of the existence of two <°o>-characters to assign them to different vowels.

Once again (see part 11), ASB <ḥ> represents /j/.

ASB regards <yu> and <va> as independent vowel symbols.

¹9.7.21:49: Khmer has two homophonous independent vowel symbols for <°o>. I transliterate them as ឱ <°o₁> and ឲ <°o₂>. Their Unicode names are KHMER INDEPENDENT VOWEL QOO TYPE ONE and KHMER INDEPENDENT VOWEL QOO TYPE TWO. Huffman (1970: 118) says ឱ <°o₁> "is the more common of the two", so I'm not surprised by the numbers in the Unicode names.

²I was taught ឲ្យ <°oya> which is the main spelling in the online editions of Headley's dictionaries and the only spelling given in Huffman's 1970 textbook and Jacob's 1974 dictionary. Ehrman's grammatical sketch in Contemporary Cambodian has ឲយ <°oya> (with full-sized rather than subscript <ya>) as the only spelling. Has the regular spelling <ʔoya> become popular in recent years?

³9.7.12:54: I think /ɛɪ/ in the ASB key to independent vowel symbols should be /eɪ/ as in the ASB key to dependent vowel symbols.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 13)

The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script is based on a nonrhotic dialect. Thus it has symbols for vocalic sequences corresponding to /Vr/-sequences in rhotic dialects:

This site
example word
Khmer script transliteration Khmer script transliteration Khmer script transliteration
ីរ <īra>
ូរ? <ūra>? ុះ
<āyïra>? ៃអ់

យើរ <yīera> ឹអ់

Question marks indicate my guesses for sequences I couldn't find in Huffman and Proum (1983). (here is on p. 43 and cure is on p. 44 of H&P.) The spelling <īe> represents [əː] after *voiced consonants in Khmer (e.g., <y>). Does Huffman pronounce cure as [kʰjəːɹ]?

I'm surprised there's no ASB symbol for /ɛə/ as in square. Perhaps the ASB dialect has no /ɛə/. Did it shift /ɛə/ to /ɪə/? The ASB /ə/-vowel subsystem is almost symmetrical except for the lack of a /jɪə/:

/ɪə/ /ʊə/
/aɪə/ /aʊə/

<ḥ> has no consistent function in the ASB system; it corresponds to /ə/ above and to /j/ in <uaḥ> for /ɔj/.

I would have expected /ə/ to be <uʔa'> instead of <uḥ>. (<ua>  isn't available because ASB already assigned that to /ɔ/.)

Modern standard Khmer is also nonrhotic. However, unlike nonrhotic English varieties, *-r has been lost without a trace in modern standard Khmer: e.g.,

(Examples from Huffman 1970: 20.)

I used to think there were a few exceptions ending in <-ăra> and Sanskrit <-arCa>: e.g.,

(Examples from Huffman 1970: 50.)

I regarded the final [ə] as a trace of /r/, but it's not - [ɔə] is the regular reflex of short *a (via *ɔ) after *voiced consonants and before *nonvelar codas. Khmer words could not end in *short vowels. It seems that *ɔ-breaking occurred before *-r was (recently?) lost:

modern spelling: ជ័រ, ធម៌, មាន់
stage 1
stage 2 (*a-raising after *voiced consonants): *ɟɔr, *dhɔr, *mɔn
stage 3 (*ɔ-breaking): *Cɔər, *Chɔər, *mɔən
stage 4 (*r-loss): [cɔə], [tʰɔə], [mɔən]
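
The ordering of the stages above can be sketched as a chain of rewrite rules. This is only an illustration, not a full model of Khmer sound change: the stage 1 inputs (*ɟar, *dhar, *man) are my assumption, worked backward from the stage 2 forms, and obstruent devoicing is deliberately left out since its position in the sequence is unknown. All function names are made up for the sketch.

```python
# Initials of the three example words only; a fuller model would need
# the whole *voiced series.
VOICED_INITIALS = ("ɟ", "dh", "m")
VELAR_CODAS = ("k", "ŋ")


def a_raising(form):
    """Stage 2: *a > *ɔ after a *voiced initial."""
    for initial in VOICED_INITIALS:
        if form.startswith(initial + "a"):
            return initial + "ɔ" + form[len(initial) + 1:]
    return form


def o_breaking(form):
    """Stage 3: *ɔ > *ɔə, but only before a nonvelar coda."""
    i = form.find("ɔ")
    has_coda = i != -1 and i + 1 < len(form)
    if has_coda and form[i + 1] not in VELAR_CODAS:
        return form[:i] + "ɔə" + form[i + 1:]
    return form


def r_loss(form):
    """Stage 4: final *-r is dropped."""
    return form[:-1] if form.endswith("r") else form


def derive(stage1, rules=(a_raising, o_breaking, r_loss)):
    """Return the list of forms from stage 1 through the last rule."""
    stages = [stage1]
    for rule in rules:
        stages.append(rule(stages[-1]))
    return stages


for word in ("ɟar", "dhar", "man"):
    print(" > ".join(derive(word)))
```

Swapping the last two rules, as in derive("ɟar", rules=(a_raising, r_loss, o_breaking)), leaves an unbroken *ɟɔ, which is one way to see why *ɔ-breaking has to precede *r-loss.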

9.6.0:10: I have left the consonants for 'resin' and 'dharma' unspecified in stage 3 since I do not know whether obstruent devoicing preceded or followed stage 3.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 12)

1. Here are the last two vowel symbols¹ in the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script with their counterparts in Huffman and Proum (H&P; 1983) and my own preferences:

This site
example word
Khmer script transliteration Khmer script transliteration Khmer script transliteration




<ï> យូ

In modern Khmer, <o> is pronounced [ao] after *voiceless consonants and [oː] after *voiced consonants. H&P must have the first phonetic value in mind.

In modern Khmer, <au> is pronounced [aw] after *voiceless consonants and [ɨw] after *voiced consonants. H&P must think English /aw/ is closer to Khmer [ao] than Khmer [aw].

H&P and I have the historical sound values of Khmer symbols in mind. In earlier Khmer, there was no [ao], so <au> would have been the best choice for English /aw/.

H&P do not have a special symbol for /juː/, so I speculate they would write /juː/ with their symbols for /j/ and /uː/.

ASB uses the short neutral (i.e., nonpalatal and nonlabial) vowel symbol ឹ <ï> for the palatal-labial sequence /juː/ even though <ï> is pronounced [ə] after *voiceless consonants and [ɨ] after *voiced consonants in Khmer.

9.5.0:29: The logic here seems to be that a simple, common Khmer symbol is preferred to a symbol sequence for a common English phoneme sequence.

¹From a rhotic speaker's perspective. ASB is designed for nonrhotic English, as part 13 will make clear.

2. On Sunday I learned of three martial arts that originated in Hawaii. They all have interesting names that I could call 英制和語 <ENG MAKE JPN WORD> Eisei wago 'Japanese words made by English speakers' or 布制和語 <HI MAKE JPN WORD> Fusei wago 'Japanese words made in Hawaii²' - terms intended to sound like the actual term 和製英語 <JPN MAKE ENG WORD> Wasei eigo 'made-in-Japan English words':

2a. カジュケンボ Kajukenbo is from 空手 <EMPTY HAND> karate + 柔道 <SOFT WAY> jūdō + 拳法 <FIST METHOD> kenpō 'martial arts' (see 2b below) + boxing. Note how the long vowel of jū is absent from Kajukenbo. It could be spelled in kanji as 空柔拳法菩 'bo(dhisattva) of the empty and soft martial arts'.

2b. 唐法拳法 Kara-ho Kempo looks redundant in kanji:

Kara is the archaic Japanese word for continental Asia (China and Korea; the word is ultimately cognate to Korea). Here it is written as <TANG> (i.e., Tang dynasty) to specify that Kara refers to China rather than Korea.

法 <METHOD> is read as hō in most contexts (but see below). Kara-hō is presumably 'Chinese method'³.

拳 <FIST> ken (pronounced [kem] before p-) in Japanese is homophonous with 劍 <SWORD> ken, so 劍法 <SWORD METHOD> 'swordsmanship' (now spelled 剣法 in Japan) is also kenpō (or kempō if one prefers to romanize phonetically). That is not a case of 50/50 ambiguity, though. In Google, 拳法 kenpō 'martial art' outnumbers 剣法 kenpō 'swordsmanship' by a ratio of almost 32 : 1 (1.81 million to 57,000).

法 <METHOD> appears again at the end but is read as pō after ken. 法 was originally borrowed with initial p- in Japanese, but that p- was weakened to h- except in the clusters -np- and -pp-.

Tonight I was puzzled by "DIAN HSUHE" on the official Kara-Ho shield until I figured it referred to Mandarin 點穴  diǎn xué <POINT HOLE>, a.k.a. the 'touch of death'. "HSUHE" is from the Wade-Giles romanization hsüeh with the letters of eh reversed.

2c. 檀山流 Danzan-ryū 'Sandalwood Mountain School' contains a Japanization of Chinese 檀山 'Sandalwood Mountain' (Taansaan in the Cantonese spoken by most Chinese here), an archaic name for Hawaii unknown in Japanese.

I just realized that sandal- in sandalwood looks like an Anglicization of Sanskrit candana- 'sandalwood'. (Middle Chinese 檀 *dan is an abbreviation of 栴檀那 *tɕiendanna, a borrowing of candana-.) It's not - Wiktionary shows that the Europeanization of candana- occurred much earlier in Greek which borrowed the word as σάνδανον sándanon. (Latin in turn borrowed the Greek word as sandalum with an unexpected -l-. Perhaps the word was remodelled after the similar-sounding but unrelated word sandalium, the source of English sandal.)

²9.5.0:27: 布 fu is short for 布哇 Hawai 'Hawaii' which looks as if it should be read Fuai: i.e., the sum of its parts 布 fu and 哇 ai. I've never been able to explain how Hawai came to be spelled 布哇. Usually mysteries of this type can be solved by reading the kanji in Mandarin (i.e., the spelling is imported from Chinese), but 布哇 isn't in use in Chinese (the Chinese name for Hawaii is 夏威夷), and as far as  I know, 布 is not read ha in any language.

³唐法 Kara-hō is an invented 湯桶 yutō-style collocation unique to this proper noun. If I didn't already know that noun, I would read it as Tōhō Kenpō with the Sino-Japanese reading for 唐, since two-kanji words are mostly read with two Sino-Japanese readings, often even from the same stratum of borrowing.

3. I can't remember anymore if I ever wrote a guide to how I assign grades to Tangut syllables, so here goes:

In general, I follow Gong Hwang-cherng's grade assignments though I do not use his notation:

How do I determine whether Gong's -j- corresponds to my -3 or -4?

STEP 1: Is the j-rhyme listed twice in Gong's reconstruction? For instance, Gong reconstructs both rhyme 10 and rhyme 11 as -ji.

If the rhyme is listed twice (like rhyme 10/11 -ji), go to step 2. If not (like rhyme 62 -jụ), go to step 3.

STEP 2: If there are two j-rhymes that Gong reconstructs identically, I assign Grade III to the first rhyme and Grade IV to the second: e.g., Gong's rhyme 10 -ji is my -i3 and his rhyme 11 -ji is my -i4.

STEP 3: If Gong only reconstructs a j-rhyme once, I assign grades mechanically depending on the initial. I assign Grade III if the initial is

All other j-syllables with a nonduplicate j-rhyme have Grade IV.

That assignment is not arbitrary; it follows the general pattern of initials in syllables to which I assigned Grade III and IV according to the methodology in step 2.

That pattern seems to be phonetically motivated. Grade IV was apparently more palatal than Grade III, and the initials associated with Grade III may have been 'antipalatal': v-, l- (phonetically velar or velarized?), and the class VII initials and zh- (phonetically retroflex?).
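
The three steps can be sketched as a small decision procedure. This is a rough sketch under stated assumptions, not a complete implementation: the rhyme table holds only the two cases named above (rhymes 10/11 -ji and rhyme 62 -jụ), the Grade III initial set lists only v-, l- and zh- (the class VII initials would have to be added), and the initial k- in the examples is a hypothetical stand-in for any initial outside that set. The function and variable names are mine.

```python
# Gong's reconstructions for the rhymes mentioned in the text only.
GONG_RHYMES = {10: "ji", 11: "ji", 62: "jụ"}

# Partial set: the class VII initials are omitted here and would need
# to be filled in from a list of Tangut initials.
GRADE_III_INITIALS = {"v", "l", "zh"}


def assign_grade(rhyme_no, initial, rhymes=GONG_RHYMES,
                 grade3_initials=GRADE_III_INITIALS):
    """Return 3 or 4 for a syllable whose rhyme Gong writes with -j-."""
    recon = rhymes[rhyme_no]
    if not recon.startswith("j"):
        raise ValueError("not a j-rhyme in Gong's reconstruction")
    # Step 1: is this reconstruction listed twice?
    twins = sorted(n for n, r in rhymes.items() if r == recon)
    if len(twins) == 2:
        # Step 2: first of the pair is Grade III, second is Grade IV.
        return 3 if rhyme_no == twins[0] else 4
    # Step 3: a nonduplicate j-rhyme is graded by its initial.
    return 3 if initial in grade3_initials else 4


print(assign_grade(10, "k"), assign_grade(11, "k"))  # duplicate pair
print(assign_grade(62, "l"), assign_grade(62, "k"))  # graded by initial
```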

9.5.2:40: I am reminded of Polish which has retroflex consonants with Tangut parallels:





Polish nonpalatalized velarized [ɫ] became [w] in standard Polish (but is retained in some dialects). Tangut l- and v- could have been like Polish [ɫ] and [w].

The nonpalatalized [l] ~ [w] alternations of Ukrainian and Belarusian also come to mind:

The masculine forms originally ended in *-l.

In all of the above Slavic languages, a lateral and [w] originated from nonpalatalized *l, whereas in Tangut, l- and v- are distinct initial phonemes with distinct histories. I do not intend to draw any deep parallels between Slavic and Tangut. I cite Slavic merely to show how a lateral and [w] can be phonetically similar enough so that one can change into the other. l- and v- must have been phonetically similar in Tangut too.

As for why l- and v- behave like the retroflexes, I am reminded of the unetymological -w- after some Mandarin retroflexes: e.g., in 霜 shuāng [ʂwaŋ] 'frost' < Late Middle Chinese *ʂaŋ. And Wikipedia agrees with my perception of English /tʃ, dʒ, ʃ, ʒ/ as "often slightly labialized: [tʃʷ dʒʷ ʃʷ ʒʷ]." So the Grade III consonants are united by some sort of w-ish-ness.

HOW MANY SHORT VOWELS DID EARLIER KHMER HAVE?

My introduction to Khmer historical phonology back in 1994 was Pinnow (1980) which posited twelve long *vowels and four *short vowels:

Pinnow's long *vowels





Pinnow's short *vowels


Pinnow takes the long vowels as basic, so he indicates brevity rather than length.

Last year I discovered Jenner and Sidwell's (2010) reconstructed vowel system:

Jenner and Sidwell's long *vowels


The two *Vːə diphthongs are Angkorian innovations. Modern [ɯːə] (my [ɨə]) is an even more recent innovation; see Sakamoto (1977) who demonstrates that many [ɨə]-words are borrowings from Thai and, to a lesser extent, Vietnamese. (He notes one highly anomalous loan from Sanskrit: រឿណរង្គ <rï̄yeṇaraṅga> [rɨənrŭəŋ] < Skt raṇaraṅga 'battlefield'.)

Jenner and Sidwell's short *vowels





Jenner and Sidwell posit eight short vowels, twice as many as Pinnow's four. They only give one example of the 'extra' vowels (from a Pinnowian perspective):

Old Khmer
Modern Khmer
<radeḥ> ~ <rddeḥ>
*rədeh ~ *rɔdeh
<radeḥ> [rɔtih]

The form in the "Pinnow?" column is my guess according to my understanding of his system.

A sample of Old Khmer <-eḥ> words in Jenner's online dictionary all ended in *-eh with short *e:

<neḥ> *neh 'this'

<peḥ> *peh 'to pluck'

<ʔseḥ> *seh 'horse'

Like 'cart', all three of the above words are still spelled with <-eḥ> in modern Khmer.

Are there any Old Khmer <-eḥ> words that were pronounced with *-eːh? Or is short *e an allophone of *eː before *-h?

The problem is that the Khmer script has never had distinct symbols for short and long e, so in theory Old Khmer <e> could have represented either *e or *eː. I can only think of two ways to reconstruct such a length distinction in Old Khmer:

1. Internal evidence: The *e that Pinnow and I reconstruct on the basis of modern Khmer corresponds to two sets of spelling patterns in Old Khmer.

2. Backward projection: If Old Khmer <e> has two sets of correspondences in modern Khmer, then those sets might be reflexes of short *e and long *eː.

(The fact that *e has different reflexes depending on the *voicing of the preceding consonant and on what follows *e is not relevant if the goal is to reconstruct a phonemic length distinction. To do that, one would ideally find two Old Khmer words spelled with the same consonant plus <-eḥ> with different reflexes in modern Khmer. One would then conclude that the Old Khmer script was incapable of indicating the length difference in the vowels of those two words.)

In scenario 1, if some modern Khmer <-eḥ> corresponds to Old Khmer <-eḥ> *-eh, then some other modern Khmer <-eḥ> might correspond to Old Khmer <-X> *-eːh.

In scenario 2, some Old Khmer <-eḥ> *-eh became modern Khmer <-eḥ>, whereas other instances of Old Khmer <-eḥ> *-eːh became modern Khmer <-Y>.
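
Scenario 2 amounts to a search for one Old Khmer spelling with more than one modern outcome once the *voicing of the initial is held constant. Here is a minimal sketch of that search, using only the four <-eḥ> words cited above; the tuple layout and the function name are my own, and the classification of each initial as *voiced or *voiceless is read off the Old Khmer spelling.

```python
from collections import defaultdict

# (gloss, initial *voiced?, Old Khmer rime spelling, modern rime spelling)
SAMPLE = [
    ("cart", True, "eḥ", "eḥ"),       # <radeḥ>
    ("this", True, "eḥ", "eḥ"),       # <neḥ>
    ("to pluck", False, "eḥ", "eḥ"),  # <peḥ>
    ("horse", False, "eḥ", "eḥ"),     # <ʔseḥ>
]


def split_candidates(words):
    """Group by (voicing, Old Khmer rime); keep groups with 2+ modern rimes."""
    reflexes = defaultdict(set)
    for _gloss, voiced, old_rime, modern_rime in words:
        reflexes[(voiced, old_rime)].add(modern_rime)
    return {key: rimes for key, rimes in reflexes.items() if len(rimes) > 1}


print(split_candidates(SAMPLE))  # empty: no sign of a split in this sample
```

An empty result for so small a sample proves nothing, of course; it only shows that none of the four words supplies the evidence scenario 2 would need.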

I don't know enough about Old Khmer to guess which scenario is correct, much less come up with other scenarios.

For now I am inclined to go with my allophonic hypothesis. Pinnow's/my *e has different reflexes depending on whether it was followed by *-h:

before *-h
before palatals
after *voiceless initial
[ə] [ej]
after *voiced initial
[ɨ] [eː]

That suggests that *e was phonetically (not phonemically!) different (shorter?) before *-h. The reflexes of *-eh are identical to those for *-ĭh:

before *-h
before *-ʔ and in open syllables
before *-j
after *voiceless initial

after *voiced initial


*e in *-eh might have been short like in *-ĭh.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 7)

1. The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the  មូសិកទន្ត​ <mūsikadanta> 'mouse¹ tooth' diacritic ៉ <"> to represent English /æ/. That wouldn't have occurred to me since <"> in Khmer is not a vowel symbol. It has two functions:

- to indicate that a vowel after a voiced consonant symbol is pronounced as if it had been preceded by a *voiceless consonant: e.g.,

យ៉ាង <y"āṅa> [jaːŋ] 'kind'

which has the reflex of *aː normally after voiceless consonants as in

ខាង <khāṅa> [kʰaːŋ] 'side'

rather than the reflex of *aː normally after voiced consonants as in

យាង <yāṅa> [jiəŋ] 'to go (royal)'

- to indicate that ប <pa> stands for [p] rather than [ɓ], its normal value in modern Khmer: e.g.,

ប៉ី​ <p"ī> [pəj] 'flute'

cf. បី​ <pī> [ɓəj] 'three'

(Examples from Huffman 1970 added 8.30.14:48.)
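
The first function of <"> can be sketched as an override on the consonant's series. This is a simplified illustration: it covers only the reflexes of *aː cited in this post ([aː] after *voiceless, [iə] after *voiced), it does not model the second function (ប as [p] rather than [ɓ]), and the names are invented for the sketch.

```python
# Modern reflexes of *aː by the series of the preceding consonant symbol.
LONG_A_REFLEX = {"voiceless": "aː", "voiced": "iə"}


def effective_series(base_series, diacritic=None):
    """<"> makes a *voiced-series symbol behave as if it were *voiceless."""
    if diacritic == '"':
        return "voiceless"
    return base_series


def read_long_a(base_series, diacritic=None):
    """Return the modern vowel written <ā> after the given consonant."""
    return LONG_A_REFLEX[effective_series(base_series, diacritic)]


print(read_long_a("voiced"))        # as in <yāṅa>
print(read_long_a("voiced", '"'))   # as in <y"āṅa>
print(read_long_a("voiceless"))     # as in <khāṅa>
```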

The simplicity of <"> (two short strokes) is appropriate for the fifth most common vowel in English after /ə ɪ i ɛ/. Three of those vowels have one-stroke symbols in the ASB system:

/i/ has a two-stroke symbol ី <ī>.

ASB's <"> is easier to write than my choice of two-stroke <ĕ> (modern Khmer [ae] ~ [ɛː] < *ɛː) for English /æ/.

¹8.30.15:24: Sanskrit mūṣika- 'mouse' is cognate to English mouse.

2. Last night I saw that the English Wikipedia gives two different Khmer spellings of Lon Nol:

The first is the one used by the Khmer Wikipedia which doesn't mention the second. One might expect the two spellings to be homophonous, but I would read them (perhaps erroneously) with different vowels as

Was it really possible to pronounce his personal name Nol two different ways? In case you're wondering what's going on with the gap between spelling and pronunciation, this table may help (K = voiceless obstruent, G = voiced obstruent, Ṅ = voiced sonorant, P = labial, ฿ = nonlabial):

Earlier Khmer
Modern Khmer
*GɔːC <GaCa> [KɔːC]
*ṄɔːC <ṄaCa> [ṄɔːC]
*KɔC <KaCa'>
*GɔC <GaCa'> [Kŭə฿] ~ [KuP]
*ṄɔC <ṄaCa'> [Ṅŭə฿] ~ [ṄuP]
*KoːC <KoCa> [KaoC]
*GoːC <GoCa>
*ṄoːC <ṄoCa> [ṄoːC]
(no *CoC)
*KuːC <KūCa> [KouC]
*GuːC <GūCa>
*ṄuːC <ṄūCa> [ṄuːC]
*KuC <KuCa> [KoC]
*GuC <GuCa>
*ṄuC <ṄuCa> [ṄuC]

Earlier Khmer has both voiceless and voiced obstruents (*K and *G) which merge into voiceless [K] in modern Khmer.

Earlier Khmer has a simple short/long vowel system whose modern Khmer reflexes diverge depending on the *voicing value of the preceding consonant (and the labiality of the final consonant after *ɔ).

After *voiceless consonants, labial vowels are pushed up:

After *voiced consonants, labial vowels either remain the same or are pushed up:

The bending of Khmer vowels reminds me of the bending of Old Chinese vowels. In both Khmer and Old Chinese, *vowels split into two series, 'lower' and 'higher' (though the conditioning factors were different):

modern Khmer
Late Old Chinese
modern Khmer Late Old Chinese
*o > *əw
[ou] *ou > *aw

(Above I give Khmer reflexes for *long vowels in spite of the absence of length in the first column.)

I suspect Tangut also underwent a similar vowel split, though the details are unknown.

One might expect <ṇula> to behave like *ṄuC in my first table: i.e., it should be read [ɳul]. But in modern Khmer script, ណ <ṇa> indicates that the following vowel is read as if it had once been preceded by voiceless *n̥. Khmer never had a retroflex /ɳ/ phoneme, so ណ <ṇa> came to be used as the virtual *voiceless counterpart of ន <na> for /n/. I emphasize the word "virtual" - vowels after ណ <ṇa> are pronounced like vowels after, say, ត <ta>, but at no time was <ṇa> ever pronounced *n̥. For instance, <ṇāma> [naːm] 'water' is a borrowing from Thai น้ำ <nā2ṁ> [naːm˥] 'water' which never had *n̥. The word was borrowed after the shift of *aː to *iə after *n. Contrast with នាម <nāma> [niəm] 'name', borrowed from Indic before the shift of *aː to [iə] after voiced *n.

stage: <nāma> 'name'; <ṇāma> 'water'
Early Khmer: *naːm; not yet borrowed
*aː to [iə]: *niəm
Later Khmer: [niəm]; [naːm]
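
The way ណ <ṇa> and ន <na> select different vowel readings can be sketched as a table lookup keyed on the series of the initial symbol. The sketch uses only reflexes stated in this post (the labial vowel rows of the first table plus the two outcomes of *aː); the dictionaries and the function name are assumptions of mine, and only the consonants needed for the examples are listed.

```python
# Series that each consonant symbol selects for the following vowel.
# ណ <ṇ> is a "virtual" *voiceless symbol even though it sounds like [n].
SERIES = {"ṇ": "voiceless", "n": "voiced", "t": "voiceless"}
SOUND = {"ṇ": "n", "n": "n", "t": "t"}

# (series, transliterated vowel) -> modern vowel
REFLEX = {
    ("voiceless", "u"): "o",   ("voiced", "u"): "u",
    ("voiceless", "ū"): "ou",  ("voiced", "ū"): "uː",
    ("voiceless", "o"): "ao",  ("voiced", "o"): "oː",
    ("voiceless", "ā"): "aː",  ("voiced", "ā"): "iə",
}


def read(initial, vowel, final):
    """Read a transliterated <initial + vowel + final> syllable."""
    return SOUND[initial] + REFLEX[(SERIES[initial], vowel)] + final


print(read("ṇ", "u", "l"), read("n", "u", "l"))   # <ṇula> vs <nula>
print(read("ṇ", "ā", "m"), read("n", "ā", "m"))   # 'water' vs 'name'
```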

3. Last night I saw the Khmer spelling of Sisowath for the first time:

ស៊ីសុវតិ្ថ <sˌīsuvatthi> [siːsoʋat]

(I confess I've only read about Cambodian history in English for the last twenty-six years.)

<suvatthi> is from Pali suvatthi- 'well-being'. But what is <sˌī>? I was surprised by the ក្ផៀសក្រោម <kphiasa kroma> 'under dash'² ុ <ˌ> which I've never seen in an Indic loan before. It indicates that <ī> is to be read [iː] as if it had followed a *voiced consonant.

There is an identically spelled Khmer word <s^ī> [siː] 'to eat' with <trīsabda>. Why isn't it <sī> [səj] with the regular reflex of *iː after voiceless */s/? Is it a loanword like another Khmer <s^ī> [siː] 'color', a loanword from Thai สี  <sī> [siː˩˩˦] 'id.' borrowed after the shift of *iː to [əj] after *voiceless consonants was complete?

²8.30.16:07: The ុ <kphiasa kroma> identical to  ុ <u> in shape is a subscript virtual voicing reversal mark. It replaces the ៉ <mūsikadanta> and the ៊ <trīsabda> diacritics when a superscript symbol occupies their positions.

It's taken me twenty-four years to figure out that <trī> is 'three' and not 'fish' and that ៊ <trīsabda> gets its name from its resemblance to ៣ <3>.

4. Last night I saw the name Mohannad Mohanna. Are Mohannad and Mohanna unrelated? Wikipedia says Arabic مهند Muhannad (> Persian Mohannad) is  from Hind 'India' (with mu- + a CaCCaC pattern). And Wikipedia has three entries for places in Iran named  مهنا‎ Mohanna.

5. Wiktionary says Latin sidus 'star' is from Proto-Indo-European *sweyd- 'sweat'. How is that semantically possible?

6. Today I started copying the Tangut character textbook


0152 4009 5370 5449 4797

1kiq2 1dyq4 1paq 1tiq4¹ 2wyr4

'gold grain palm place writing'

a.k.a. The Golden Guide by hand.

The second character in the title (4009) is the only tangraph with component 157 (𘢜).

In Homophones A, 4009 appears with component 160 (𘢟) (16B48).

In Homophones B and D (a.k.a. B2 and B5), 4009 appears with component 157 (𘢜) (17B22).

You can see scans of the Homophones pages on Andrew West's site.

157 seems to be an abbreviated form of 160 with the <WATER>-like portion (component 036 𘠣) written as a single stroke.

The Tangraphic Sea analysis of 4009 derives 157/160 from a right-hand element which I can't find in Unicode: ⿱𘠙𘠅. 006 𘠅 is the right-hand version of 036 𘠣 <WATER>. None of the characters with ⿱𘠙𘠅 have anything to do with water, though:

Tangraph and Li Fanwen number: gloss
𗾵 2615: second half of 𗉴𗾵 1687 2615 2chhy3 2khu4 'minced meat' (dictionary only)
𗚭 4142: to chop; bean jam?
𘚑 4446: to break, broken (dictionary only)
𗮝 5359: minced meat
5380: fragmentary, broken; < Chn 碎
𘂉 5900: second half of 𗨦𘂉 3381 5900 2by1 2di4 'fragment' (dictionary only)

Unlike many other tangraphic elements, ⿱𘠙𘠅 has a clear semantic function: almost all of the above involve pieces or making something into pieces.

I regard dictionary-only words as candidates for loans from a substratum. 1687 2615 might be a unanalyzable disyllabic substratum synonym of native 5359.

4142 may be a phonetic loan for 'bean jam'.

Unlike 1687 2615, 4446 is not disyllabic. Could it be a native monosyllable that just so happens not to have been found in any nonlexicographic texts yet?

5359 has a straightforward graphic structure <MEAT.BREAK> and is the basis for 2615 and 4142.

5380 has an unexpected -e that may indicate that the Chinese dialect known to the Tangut already had a pronunciation of 碎 closer to modern standard Mandarin [swej] than Middle Chinese *swiʰ.

5900 may be an adjective 'broken' after 3381 'pellet'. Without any attestations not preceded by 3381, I can't tell if it can stand by itself.

7. I wouldn't have guessed that Hagadone is an Americanization of Hagedorn.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 6)

1. I used the inherent vowel of the Khmer script to write English /ʌ/, but the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the long vowel symbol ា <ā> instead. That surprised me because /ʌ/ isn't long.

Another surprise is that in the ASB proposal, ា <ā> does double duty for English /a/. Wouldn't that lead to ambiguous spellings? Maybe not - many of my /a/-words are /ɔ/-words in the dialect ASB is based on (e.g., pot = my /pat/ but ASB's /pɔt/). Putting such words aside, the only minimal pair I can think of is calm /kam/ : cum /kʌm/; both would be written កាម <kāma> in ASB. I would distinguish them as កាម <kāma> and កម <kama>.
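
The ambiguity can be checked mechanically by spelling both words under each vowel assignment and looking for collisions. This is a toy sketch: it handles only CVC words with the two vowels at issue, it works with transliterations rather than Khmer script, and the dictionaries and function names are invented for the illustration.

```python
# Vowel phoneme -> transliterated vowel symbol in each system.
ASB_VOWELS = {"a": "ā", "ʌ": "ā"}
MY_VOWELS = {"a": "ā", "ʌ": "a"}   # inherent <a> stands for /ʌ/

WORDS = {"calm": ("k", "a", "m"), "cum": ("k", "ʌ", "m")}


def spell(onset, vowel, coda, system):
    """Transliterate a CVC word; the final consonant is written <Ca>."""
    return f"<{onset}{system[vowel]}{coda}a>"


def homographs(system):
    """Return spellings shared by more than one word in WORDS."""
    by_spelling = {}
    for word, (onset, vowel, coda) in WORDS.items():
        by_spelling.setdefault(spell(onset, vowel, coda, system), []).append(word)
    return {s: ws for s, ws in by_spelling.items() if len(ws) > 1}


print(homographs(ASB_VOWELS))  # calm and cum collide
print(homographs(MY_VOWELS))   # no collision
```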

ា <ā> does double duty in my system as well for final schwa: e.g., comma /kamə/ as កាមា <kāmā>. Although I use the inherent vowel for schwa in word-medial position, I can't do so in word-final position where <Ca> represents /C/.

2. Last night I saw an ad for a book by "LEUYEN PHAM" in all caps on my Kindle. It caught my eye because I had never seen the two syllables of a Vietnamese personal name run together before. The front (and only?) page of LeUyen Pham's site asks, "How Do You Pronounce LeUyen Pham?!?" but doesn't answer the question. In Vietnamese, Le Uyên (tones unknown) is pronounced [le ʔwiən] (north) ~ [le ʔwiəŋ] (south). However, I don't know how English speakers pronounce it.

3. I thought I had never seen T in the name of any Hawaiian group until last night when I saw a reference to Hui Aloha ‘Āina Tuahine in the Honolulu Star-Advertiser. Turns out I had first seen the name of that group on p. 363 of Albert J. Schütz' The Voices of Eden back in 1995.

In standard Hawaiian, t shifted to k. However, t persists in tūtū 'any relative or close friend of grandparent's generation' and Tuahine, defined by as

(More commonly Tuahine). Name of a misty rain famous in Mānoa, Oʻahu, named for Kuahine, who turned to rain after the murder of her daughter, Ka-hala-o-Puna; the rain is also in other localities.

I suspect t survived in tūtū and Tuahine because they were borrowed into English which has a /t/ : /k/ distinction absent in Hawaiian which only has three kinds of stops: labial /p/, glottal /ʔ/, and a third stop whose point of articulation varies by dialect.

Does Tuahine indicate that the Mānoa dialect had [t] for that third stop?

Today Mānoa is a center for education in standard Hawaiian with [k] for that third stop. When Hawaiian was repopularized¹, words with that third stop were pronounced with [k] following their standardized spellings with k.

¹I would rather not say "revived" since Hawaiian has never died. What has been lost is the original diversity of dialects. As far as I know, the only two varieties still spoken by large numbers of people are the Niʻihau dialect native to the population of Niʻihau and the standard language learned in schools.

4. Valdemar Knudsen (1819-1898) is said to have been able to speak "the 3 Hawaiian languages" fluently. What were the three? Hawaiian, English, and Pidgin Hawaiian? (Hawaiian Creole English, now 'Pidgin', had only begun to develop during Knudsen's last years.)

The Pidgin Hawaiian article at Wikipedia says a couple of surprising things:

Emerging in the mid-nineteenth century, it was spoken mainly by immigrants to Hawaii, and mostly died out in the early twentieth century, but is still spoken in some Hawaiian communities, especially on the Big Island.

It's still alive? I thought it was extinct. Has anyone done any modern fieldwork?

Like all pidgins, Pidgin Hawaiian was a fairly rudimentary language, used for immediate communicative purposes by people of diverse language backgrounds, but who were mainly from East and Southeast Asia.

Southeast Asia? As far as I know, mass Southeast Asian immigration to Hawaii postdates the Vietnam War.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 5)

1. The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script has no inherent vowels, so it has vowel symbols corresponding to my inherent vowel: <ā> for /ʌ/ and បន្តក់ <pantaka'> /bɑntɑʔ/ <'> for /ə/.

This site
Khmer script transliteration Khmer script transliteration Khmer script transliteration
ក៉ម្ប់ស <k"amp'asa>
កែម្បស <kĕmpasa>
ពាត់ន <bāt'ana> ពតន <batana>

(8.28.1:36: Added my guesses for H&P-style forms.)

One good reason to use <'> for schwa is that it is a simple, short stroke. It would be impractical to write the most frequent vowel in English with a complex shape.

It hadn't occurred to me to use <'> as a vowel character because in Khmer proper it functions as a breve for the inherent vowel and <ā>, not as a vowel character:

បន្តក់ <pantaka'> /bɑntɑʔ/

(A hypothetical †តក <taka>​ would be †/tɑːʔ/. In theory the name of the diacritic could be written with two <'> as ˟បន្ត់ក់​ <pan'taka'>, but unstressed initial <CaC> syllables always have short vowels, so a second <'> is redundant.)

កាត់ <kāta'> /kat/ 'cut'

(The resemblance to the English word is coincidental; compare កាត <kāta> /kaːt/ 'card' without <'>.)

In Khmer proper, <'> appears atop the symbol for a syllable-final consonant following the symbol with the vowel it shortens, whereas in ASB, <'> behaves like a vowel symbol, combining with the symbol for the consonant that immediately precedes a schwa.

(8.28.1:22: Khmer examples of <'> added. បន្តក់ <pantaka'> is unreadable in ASB since ASB has no inherent vowels - it looks like ASB [p-nt-kə].)

2. Khmer ថៃ <thai> [tʰaj] must have been borrowed after Thai *d- > tʰ-, and ថៃឡង់ដ៍ <thaiḷaṅ'aṭa˟> [tʰajlɑŋ] may be an even more recent borrowing from French Thaïlande [tajlɑ̃d] with Khmer [ɑŋ] approximating French [ɑ̃]. But surely the Khmer had a word for 'Thai' predating those borrowings. Did Khmer ever have a word like †ទៃ <dai>? The only premodern Khmer word I can find in Jenner is an undated សៀម <siama> 'Siam'.

3. When looking for 'Thai' in Philip Jenner's Old Khmer dictionary last night, the only entry that appeared was ស៊ង <s"aṅa> 'two', a borrowing from Thai /sɔːŋ/ 'id.' attested in a text from 1684. The <"> indicates that <sa> by itself could not represent /sɔː/. The split of /ɔː/ to

/ɑː/ after *voiceless consonants

/ɔː/ after *voiced consonants

must have already occurred. The addition of <"> indicates that the following vowel is one normally associated with a *voiced consonant: i.e., /ɔː/ in this case. សង <saṅa> without <"> was /sɑːŋ/ < */sɔːŋ/ 'to give back'.

That split occurred after the loss of voicing in obstruents.

Thai also devoiced its obstruents, but unlike Khmer, it aspirated them: e.g., *d- > tʰ-. So I was surprised to see Thai พัน <ban> *ban (now /pʰan/) 'thousand' borrowed as Khmer ពន <bana>. Is the Khmer spelling merely a mechanical copy of the Thai spelling, or does it indicate that Thai *b had not yet shifted to pʰ-?

4. What is the etymology of the name Odoacer?

5. I think I've only spoken of 'Turkish' borrowings into Balkan languages. Marek Stachowski (2019) writes, "it is much better to call Turkish loanwords in the Balkan languages just 'Turkish', which is sufficiently clear in English." Whew. But I confess that I used the term 'Turkish' without knowing his reasoning against the term 'Ottoman Turkish'.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 4)

1. Exactly how the Brahmi script - the parent of all Indic scripts - developed is not clear, but one thing is for sure: the principle of inherent vowels is ingenious. Brahmi was first used to write Middle Indo-Aryan whose most common vowel was short a. Making this short a inherent to consonant symbols saved a lot of effort and space.

This state of affairs reflects a Proto-Indo-Iranian innovation: the merger of Proto-Indo-European *e/*o into *a (there was no Proto-Indo-European *a according to Beekes¹):

*eH, *oH, *ē, *ō
*ei, *oi
*eu, *ou

Whitney (1896: 26) found that nearly 20% of segments in a Sanskrit text sample were short a.

Middle Indo-Aryan (e.g., Pali) inherited that abundance of a from Sanskrit (= Old Indo-Aryan) and gained a few more short a via the loss of other vowels: e.g.,

Skt mr̥ga- > Pali maga- (also miga-!) 'deer'

(See Masica 1991: 167-168 for more examples.)

And it was at that stage that Brahmi was developed to write a-filled Middle Indo-Aryan. (Sanskrit was first written after its descendants were!)

An inherent a in the Brahmi script is a good fit for Old and Middle Indo-Aryan. But is it a good fit for English? What vowel is most frequent in English? My guess was schwa, and I was right. So in my adaptation of Khmer script to English, the inherent vowel is schwa: e.g.,

campus [kʰæmpəs] > កែម្បស <kĕmpasa>

cf. ASB ក៉ម្ប់ស <k"amp'asa>

(Example added 8.27.0:39. ASB added 8.27.20:31. I write English voiceless obstruents as Khmer unaspirated voiceless obstruents regardless of their allophonic aspiration in English: e.g., English /k/ [k] ~ [kʰ] as ក <ka>.)

I also use the inherent vowel to write /ʌ/ and syllabic consonants: e.g.,

button [ˈbʌtn̩] > ពតន <batana>

cf. ASB ពាត់ន <bāt'ana>

(8.27.20:34: ASB added.)

(8.27.0:03: Like actual Khmer, I write final consonants with <Ca> symbols. I could use virāmas, but if I can live without them in Khmer and Thai without word spacing, I can live without them in English with word spacing.

In theory I could write syllabic consonants as subscript consonants: e.g.,

button [ˈbʌtn̩] > ពត្ន <batna>

but I prefer to reserve subscript consonants for clusters of nonsyllabic consonants.)

On the other hand, the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script has no inherent vowels because "it only adds an additional layer of complexity".
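
The trade-off between the two designs can be made concrete by counting how many explicit vowel signs each needs for the two example words. This is a rough sketch only: the word analyses are reduced to lists of syllable nuclei, "syllabic" is my label for a syllabic consonant, and the names are made up for the illustration.

```python
# Nuclei that need no written sign because the consonant symbol implies them.
MY_INHERENT = {"ə", "ʌ", "syllabic"}   # my adaptation
ASB_INHERENT = set()                    # ASB: no inherent vowel at all

# Syllable nuclei of the two examples above.
WORDS = {"campus": ["æ", "ə"], "button": ["ʌ", "syllabic"]}


def explicit_vowel_signs(nuclei, inherent):
    """Count the nuclei that must be written with their own sign."""
    return sum(1 for nucleus in nuclei if nucleus not in inherent)


for word, nuclei in WORDS.items():
    mine = explicit_vowel_signs(nuclei, MY_INHERENT)
    asb = explicit_vowel_signs(nuclei, ASB_INHERENT)
    print(word, "mine:", mine, "ASB:", asb)
```

The counts line up with the transliterations: <kĕmpasa> has one vowel sign against two in <k"amp'asa>, and <batana> has none against two in <bāt'ana>. What the count leaves out is the cost ASB is avoiding, namely that a reader of my system has to know when a bare consonant symbol carries a vowel and when it does not.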

¹Like Pulleyblank, I find this situation highly improbable, and I suspect that an original central *a was later polarized into the front and back vowels *e and *o.

2. The first proposed etymology of culvert at Wiktionary is almost meaningless: "a dialectal word". Almost, because at least that tells us the proposer thinks the word is a borrowing from another dialect. But which dialect - and what is its derivation there?

3. Today I realized I never learned how to write Burmese rotated subscripts which only appear in  Pali loanwords:

ဌ <ṭha> > ဏ္ဌ <ṇṭha>
ဍ <ḍa> > ဏ္ဍ <ṇḍa>

Are they written like their full-size counterparts but at a different angle: e.g., is sideways ဌ <ṭha> written counterclockwise from right to left rather than counterclockwise from top to bottom like its upright version?

(8.27.22:39: I found the answer on p. 402 of John Okell's comprehensive Burmese: An Introduction to the Script [429 pages!]: both rotated subscripts are written clockwise from left to right.)

Today I also realized that there is a logic to the rotation of subscript characters:

Burmese: An Introduction to the Script doesn't mention a subscript version of ဠ <ḷa>, and most of my Burmese fonts don't have such a subscript. However, the Myanmar Text font does: its subscript <ḷa> (written counterclockwise when full-sized) rotates clockwise (cf. ဌ <ṭha>) even under normal-width characters. Is subscript <ḷa> real or artificial? I've never seen any Cḷ-clusters in Sanskrit or Middle Indo-Aryan and wouldn't expect any since ḷ originates from a lenition of intervocalic ḍ: e.g.,

Skt ṣoḍaśa > Pali soḷasa 'sixteen'

Skt garuḍa- > Pali garuḷa- 'garuda'

(Examples added 8.27+.12:48. See Masica 1991: 170 for more.)

4. The Jurchen character 右 <mei> is a lookalike of Chinese 右 <RIGHT>.

Today it occurred to me that <mei> sounds a bit like Japanese 右 migi 'right'. A mostly dubious scenario:

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 3)

1. Indic scripts like Khmer tend to have a wealth of consonant characters. This is because Indic scripts were originally intended for Indic languages characterized by

Most of these oppositions do not exist in English.

On the other hand, in spite of that wealth of consonant characters, Indic scripts originally¹ lacked characters for several fricatives that exist in English: /ʒ z θ ð f/.

The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script assigns 'extra' characters from an English perspective to English fricatives:

(Table only partly recoverable: for each English fricative it gave the Khmer script and transliteration in H&P, in ASB, and on this site. Surviving entries: /ʒ/ ហ្យ <hya>; /ð/ ឌ?)

I've included Huffman and Proum's (1983: 31, 42; hereafter H&P) transcription and my own for comparison. English /w/ is not a fricative, but I have included it because Khmer has no /v/ : /w/ distinction.

/ɣ/: Did ASB include this for symmetry with /x/, which is extremely marginal in English? /ɣ/ reminds me of Sanskrit l̥̄ which was created to be parallel to the rare but real phoneme r̥̄. (In Khmer, the character ឮ <l̥̄> devised for that purely hypothetical Sanskrit phoneme is used for the real and really common word [lɨː] 'to hear'.)

/ʃ/: ASB's choice of ឆ <cha> reminds me of the Thai convention of borrowing English /ʃ/ as /tɕʰ/: e.g., shampoo as แชมพู <jeemabū> [tɕʰɛːmpʰuː].

Unlike H&P and ASB, I use the extinct Khmer character ឝ <śa>: cf. Hindi शैम्पू <śaimpū> 'shampoo'.

/ʒ/: I use the ត្រីសព្ទ <trīsabda> diacritic <^> to indicate that voiceless <śa> is pronounced with a voiced initial. In actual Khmer orthography, <trīsabda> over a <voiceless> consonant indicates that a following vowel is pronounced as if it had originally followed a *voiced consonant: e.g., ហ៊឵ន <h^ān> [hiən] 'to dare' is pronounced as if it had developed from a (nonexistent and impossible) *ɦaːn. I could transliterate <ha> + <trīsabda> as <ɦa> rather than as <h^a>, but to imply that Khmer once had [ɦ] would go against the general historical/etymological principle of my transliteration.

/z/: H&P's ឯ <°e> may be a case of arbitrarily using a Khmer character that would otherwise go unused.

I would have expected ASB to use ឍ <ḍha> or ធ <dha> for /z/ by analogy with the other voiced aspirates for voiced fricatives. <Z> on my Khmer NIDA keyboard layout is assigned to ឍ <ḍha>.

My ស៊ <s^a> is based on the same principle as my ឝ៊ <ś^a> (see above).
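The shared principle behind my /ʒ/ and /z/ spellings can be put as a tiny rule: take the voiceless character and add the trīsabda sign. The sketch below is illustrative only. The names `VOICELESS`, `VOICED_OF`, and `khmer_for` are hypothetical, the table covers only the four sibilants discussed here, and writing /s/ with plain ស <sa> is my assumption for the sake of the example.

```python
# Sketch of the "voiceless base + trīsabda = voiced fricative" principle.
# Hypothetical names and a deliberately tiny table; not a full system.

TRIISAP = "\u17ca"  # KHMER SIGN TRIISAP (trīsabda), transliterated <^> above

# Voiceless bases, keyed by English phoneme.
VOICELESS = {
    "ʃ": "\u179d",  # ឝ <śa>
    "s": "\u179f",  # ស <sa> (assumed here for plain /s/)
}

# Each voiced fricative is derived from its voiceless partner.
VOICED_OF = {"ʒ": "ʃ", "z": "s"}


def khmer_for(phoneme):
    """Return the Khmer spelling of one English sibilant under this rule."""
    if phoneme in VOICELESS:
        return VOICELESS[phoneme]
    if phoneme in VOICED_OF:
        return VOICELESS[VOICED_OF[phoneme]] + TRIISAP
    raise KeyError(f"no rule for {phoneme!r}")


for p in ("ʃ", "ʒ", "s", "z"):
    spelling = khmer_for(p)
    # Show the code points so the added diacritic is visible.
    print(p, spelling, " ".join(f"U+{ord(c):04X}" for c in spelling))
```

Deriving the voiced member by rule rather than listing it separately keeps the two series visibly parallel, which is the point of using the diacritic at all.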

/ð/: ASB's ឋ <ṭha> is surprising since this character originally did not represent a voiced consonant.

/f/: H&P's ហ្វ <hva> is carried over from an existing Khmer convention to write foreign /f/.

/w/: H&P use អ្វ <ʔva> because វ <va> is already taken for /v/. In modern Khmer, វ <va> is pronounced [ʋ] in initial position, but in earlier Khmer, it may have been pronounced [w].

¹"Originally" because characters were later devised for such fricatives: e.g., Devanagari ज़ <j.a> for [zə]. But such characters created in India postdate the spread of Indic script to Southeast Asia, so there is no Khmer analogue of Devanagari ज़ <j.a> for [zə].

²The actual printed character is ឧ <°u>, perhaps because Huffman and Proum (1983) was written on a typewriter without a ឌ <ḍa> key. Compromises were inevitable on typewriters:

The Keyboard layout for 120+ elements of Cambodian script and essential punctuation marks was a very difficult task because of the limitation to 46 keys and 96 positions of the standard typewriter.

2. Could Khitan large script character 2091

be a variant of 2050

<taulia> 'hare'?

(8.26.17:01: I found 2091 without context in N4631 which has no gloss for it. 2091 may either be distinct from 2050 [i.e., not occur in calendrical contexts where 'hare' is expected] or represent 'hare' - the actual animal - in a noncalendrical context that remains to be deciphered.)

3. Korean 표고 phyogo 'shiitake mushroom' caught my eye because it is one of a small number of native words with yo. (Perhaps the most important are 좋 choh- < 죻- /cyoh/ 'good' and 소 so < 쇼 /syo/ 'cow'.) The distribution of y is skewed in Korean. It most often precedes ŏ, and for twenty years, I have thought that many if not all native yŏ go back to an Old Korean *e (not to be confused with modern Korean ㅔ e < /əj/). But where does native yo come from?

Martin et al. (1967: 1758) suggest that phyogo may be Chinese but do not specify a Chinese source. -go /ko/ sounds like Middle Chinese 菇 *ko 'mushroom'. Wikipedia lists a number of Chinese words for 'mushroom' (given here in standard Mandarin pronunciation, though I do not know if they are all standard Mandarin words), but none are like the †piaogū [tone for first syllable unknown] that would theoretically correspond to Korean phyogo.

Here is the distribution of phy- in native words in Martin et al. (1967):

None are core words that would be likely retentions from Proto-Koreanic.

Since Korean ph- is from *kVpV- and *pVkV-, perhaps there was a constraint in some intermediate stage against clusters like *kpy- and *pky-. However, a three-consonant cluster constraint does not explain the paucity of native ky-words not beginning with kyŏ-. I'll look at those words later. Why k-? k- is the most common initial consonant letter in hangul (not counting the zero initial), whereas ㅍ ph- is the least common (not counting reinforced consonants like pp-).

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 2)

1. Unlike modern Khmer script which deviates from the one-symbol-per-sound ideal with two or more consonant letters per consonant phoneme and two or more readings per vowel symbol (see part 1), the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script has just one symbol per consonant (not including subscript variants) and one reading per vowel symbol. Compare Huffman and Proum's (1983: 31; hereafter H&P) transcription of English for modern Khmer speakers with the ASB system and my own system:

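(Table only partly recoverable: it gave H&P's Khmer script, ASB's Khmer script and transliteration, and this site's Khmer script and transliteration. Surviving entries, apparently H&P's:)

[pʰej] ផេ
[pʰiː] ភី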

ASB, my own English-in-Khmer system, and my Khmer transliterations are based on the old values of Khmer characters: e.g.,

A Khmer reader would pronounce

ASB and my system agree most of the time but not all of the time: e.g., the treatment of English [ej]. I'll start covering the differences in part 3.

2. 稚内 <YOUNG INSIDE> Wakkanai is an Ainu name in disguise. Ainu wakka 'water' is written as 稚 <YOUNG> since Japanese waka (not wakka!) is 'young'. And Ainu nai 'river' is written as 内 <INSIDE> since it sounds like Sino-Japanese nai 'inside'. This makes me wonder

- how many other cases of -VCCV- sequences (like wakka) are written with characters normally read with -VCV- sequences (like 稚 waka)

- if any Ainu words are written semantically in Japanese place names: e.g., is wakka 'water' ever written as 水 <WATER>? Is nai 'river' ever written as 川 <RIVER> or 河 <RIVER> or 江 <RIVER>?

Ainu nai 'river' coincidentally resembles Middle Korean nayh 'river'. The resemblance fades if the Middle Korean word is traced to Old Korean 川理 <RIVER ri> *(na?)ri (cf. its Paekche cognate 'stream', recorded in Japanese as 那禮 ~ ナレ nare ~ ナリ nari).

3. Today I was listening to 坪能克裕 Tsubonō Katsuhiro's score for Aura Battler Dunbine (1983-84). Tsubonō's name is spelled as an unusual combination of a native Japanese 坪 <TSUBO> tsubo 'unit of area' with a Sino-Japanese 能 <CAN> nō. Is the name a variant of 坪野 <TSUBO FIELD> Tsubono with a long final vowel?

4. Today I discovered 金芝河 Kim Chi-ha's respellings of the names of what he called 五賊 오적 Ojŏk 'Five Bandits':

Normal hanja | Kim's hanja | Hangul | Romanization | Gloss
財閥 | 狾䋢 | 재벌 | chaebŏl |
國會議員 | 𠣮獪狋猿 | 국회의원 | kukhoe ŭiwŏn | National Assembly member
高級公務員 | 跍礏功無獂 | 고급공무원 | kogŭp kongmuwŏn | high-ranking public official
將星 | 長猩 | 장성 | changsŏng |
長・次官 | 瞕𤠝矔 | 장차관 | changchhagwan | minister and vice-minister

I'm out of time, so I'll comment on Kim's respellings and Brother Anthony of Taizé's translations later.

5. Via Bitxəšï-史 today: Blažek et al.'s Altaic Languages: History of Research, Survey, Classification and a Sketch of Comparative Grammar (2019) can be freely downloaded here (click on "KE STAŽENÍ"). I use 'Altaic' on this site to refer to an areal grouping of languages, but that book treats it as a genetic language family. A quick look at the book leaves me unconvinced.

THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 1)

Last night I discovered the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script.

It reminded me of how I came up with a way to write English in hangul when I first learned that alphabet in 1987. Unaware of the wealth of obsolete hangul letters, I recall inventing a letter 巳 for /l/ based on ㄹ /r/. I might have made up other letters as well.

Khmer has so many characters that it's not necessary to invent new ones for English.

Obstruent devoicing and vowel warping conditioned by the *voicing of preceding consonants have resulted in many pairs of homophonous consonant characters on the one hand and vowel characters with double readings on the other: e.g.,

A Khmer script for English designed for maximum compatibility with modern Khmer script for Khmer would carry over those characteristics. Huffman and Proum's (1983) transcription of English for modern Khmer speakers has those characteristics: e.g.,

Note how English p [pʰ] has to be written differently depending on the following vowel.

ASB takes a simpler approach which I'll describe next time.

(8.24.0:12: Huffman and Proum probably have no transcriptions like ភេ <bhe> or ផី <phī> because the syllables [pʰeː] and [pʰəj] do not exist in American English.)

KHMER INDEPENDENT VOWEL LIGATURES

I didn't figure out that these Khmer independent vowel characters were ligatures until two days ago:

ឨ <°û> (= ឧក <°uka> /ʔok/) < ឧ <°u> + ក <ka> (<ˆ> symbolizes the upper 'hair' stroke of <ka>)

ឪ <°ǔ> (= ឩវ <°ūva> /ʔəw/) < ឧ <°u> (not ឩ <°ū>!) + វ  <va> (<ˇ> symbolizes the upper 'hair' stroke of <va>)
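The two ligature analyses above can be written down as a small lookup table. This is a sketch only: `LIGATURES` and `decompose` are hypothetical names, and the components listed are the graphic parts I proposed above (so ឪ is broken into ឧ <°u> + វ <va>, not into its spelled-out equivalent with ឩ <°ū>). The loop also prints the Unicode character names of each ligature and its parts for comparison.

```python
import unicodedata

# Graphic components of the two ligatures, following the analysis in this post.
LIGATURES = {
    "\u17a8": ("\u17a7", "\u1780"),  # ឨ <°û> = ឧ <°u> + ក <ka>
    "\u17aa": ("\u17a7", "\u179c"),  # ឪ <°ǔ> = ឧ <°u> + វ <va>
}


def decompose(ch):
    """Return the component string for a ligature, or ch itself otherwise."""
    return "".join(LIGATURES.get(ch, (ch,)))


for ligature, parts in LIGATURES.items():
    part_names = " + ".join(unicodedata.name(p) for p in parts)
    print(f"U+{ord(ligature):04X} {unicodedata.name(ligature)}")
    print(f"    = {part_names}")
```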

Duh. Two mysteries down, more to go:

Why was the now-obsolete ligature ឨ <°û> created? One might guess there was a high-frequency word /ʔok/ (< earlier /ʔuk/). There is no likely candidate for such a word in modern Khmer. Here are all the meanings of ឧក <°uka> ~ អុក <ʔuka> /ʔok/ in Headley's dictionary at SEAlang:

1. 'bellyband, cinch, girth (of a harness)'

2. 'to reproach, blame, censure; to scold; to abuse, to criticize severely'

3a. 'to slam something down; to fall down hard'

3b. 'check, checkmate'

4. 'kind of vulture'

None of those words would seem to be frequent enough to motivate the creation of a ligature for them.

However, there is an Old Khmer word /ʔuk/ 'also' (by coincidence resembling Dutch ook 'also'!) that Jenner transliterates as <ukk> ~ <uk> ~ <ukka>. So I'm guessing that's the word that was once represented by ឨ; once that word became obsolete, its ligature vanished along with it.

What I don't understand about Jenner's transliteration system for Old Khmer is when he chooses to write final <-a>. He and Sidwell do not explain this in their Old Khmer Grammar. I don't know how <ukk> differed from <ukka> in the original script. (I confess I have only seen Old Khmer in transliteration.) Jenner's <ukka> is presumably ឧក្ក <°ukka>, but what is Jenner's <ukk>? Is it ឧក្ក៑ <°ukk·> with a virāma? In Old Khmer Grammar, final <-a> is only in Indic loanwords with the exceptions of CV syllables (ka 'clause conjunction', ta 'subordinating conjunction', sa 'white'). Did Old Khmer scribes carefully write virāmas all over the place - a practice abandoned in modern Khmer?

(Wikipedia says the Khmer virāma is "mostly obsolete". Huffman [1970: 53] says it is "sometimes used in the transcription of Sanskrit words"; his exercises do not mention it at all. I do see it in វេយ្យាករណ៍សំស្ក្រឹត <veyyākaraṇa˟ saṁskrïta> 'Sanskrit Grammar' [1999].)

I have wondered if Jenner simply omits <-a> in transliterations of native Khmer words ending in consonants, but that does not explain cases like <ukk> ~ <ukka> or

<ʼāyatta> ~ <ʼāyatt> 'dependence' < Skt āyatta-

If I were right, <ukka> and <ʼāyatt> shouldn't exist, but they do. Is Jenner in fact consistently writing all word-final consonant symbols without any other dependent symbols as <Ca>? If so, then do transliterations like <ukk> and <ʼāyatt> reflect spellings of words before consonant initials of other words?

That new hypothesis predicts that the courtesy title <poñ> should always precede a consonant. And yet ... Old Khmer Grammar example 307 begins with

<poñ uy oy kñuṃ ...> (K.557/600N: 1, 612 AD)

poñ Uy give slave

'The poñ Uy has given slaves ...'

I would expect ˟<poña uya oy kñuṃ>. How were those words written in the original? As

(a) four akṣaras without virāmas


<po ñu yo ykñuṁ> (I prefer <ṁ> for anusvāra to <ṃ> which I reserve for Pyu subscript dots.)

(b) four akṣaras with virāmas <·>


<poñ· uy· oy· kñuṁ>

(c) something even more bizarre with subscript independent vowel symbols ឧ <°u> and ឱ <°o> instead of the dependent vowel symbols in (a)?
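Option (a) can be simulated mechanically: if there are no virāmas and no independent vowels, every consonant simply attaches to the next vowel. The sketch below does that to Jenner's transliteration. It rests on loud assumptions: the name `aksaras` is hypothetical, only the five short vowels a e i o u are treated as vowels (long vowels such as ā would need to be added), and ṁ is treated as an anusvāra that stays with the preceding vowel.

```python
import re
import unicodedata

VOWELS = "aeiou"      # assumption: short vowels only, enough for this phrase
ANUSVARA = "\u1e41"   # ṁ, kept with the vowel it follows

# One akṣara = any run of consonants + one vowel + an optional anusvāra.
AKSARA = re.compile(f"[^{VOWELS}{ANUSVARA}]*[{VOWELS}]{ANUSVARA}?")


def aksaras(translit):
    """Split a transliterated phrase into akṣaras, ignoring word spaces."""
    text = unicodedata.normalize("NFC", translit).replace(" ", "")
    # Any consonants left over after the last vowel are dropped by findall.
    return AKSARA.findall(text)


print(aksaras("po\u00f1 uy oy k\u00f1u\u1e41"))
# ['po', 'ñu', 'yo', 'ykñuṁ']
```

The result is the four-akṣara reading in (a), with word boundaries falling inside akṣaras, which is exactly why that option looks so odd.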

If not for Jenner's transliteration, I would have imagined something with seven akṣaras like


<po ña °u ya °o ya kñuṁ>

But Jenner's dictionary doesn't list a spelling <poña> for the courtesy title.

For comparison, a modern Khmer word-for-word translation of the phrase would be


<pa ṅa °u ya °oya¹ khñuṁ>

without any indication of which <a> are silent.

If I am right about ឪ <°ǔ> being from ឧ <°u> + វ  <va>, I might expect to find earlier spellings of ឪ-words with <°uva>. The most important ឪ-word might be <°ǔbuka> /ʔəwpuk/ 'father'. The earliest attestations of this word that I can find are <°ābbhūka> (1599) and <°ābuka> (1602) which are close to Sanskrit āvuka- 'father'. A regular reflex of <°ābuka>, អាពុក <°ābuka> /ʔaːpuk/, exists today alongside <°ǔbuka> /ʔəwpuk/. How did /aː/ change to /əw/? I don't know of any other instance of such a change.

I think /əw/ is the regular reflex of */əw/ (not */aː/!) after *voiceless consonants. (Is /ʔəwpuk/ < */ʔaːbbuk/ with reduction of */aː/ and lenition of */b/?)

*/əw/ raised to /ɨw/ after *voiced consonants. Compare:

Do the <ū> spellings reflect earlier vowel length? Conversely, the absence of the extra stroke for vowel length in ឪ <°ǔ> could be interpreted as indicating an absence of earlier vowel length, but I don't think ឪ <°ǔ> and <ūva> had distinct *rhymes. The extra stroke of ឩ <°ū> may have been dropped from the abbreviation ឪ <°ǔ> as redundant since there was no contrast between <uva> and <ūva>. Did a transitional character combining ឩ <°ū> with the 'hair' of វ <va> ever exist?

¹In modern Khmer, ឲ្យ /aoj/ 'to give' has an unusual spelling <°ȯya> (not ˟ឱយ <°o ya>!). My transliteration has no space to indicate that <-ya> is a subscript rather than an independent akṣara យ <ya>. That subscript is unusual because it represents a final glide /j/ rather than a medial glide /j/ (as in ខ្យល់ <khya la'> /kjɑl/ 'wind') or zero (as in ពាក្យ <bā kya> /piəʔ/ 'word' < Skt vākya-).

ឲ <°ȯ> is a rare character that is, as far as I know, unique to ឲ្យ <°ȯya>. I transliterate it as ឲ <°ȯ> to distinguish it from regular ឱ <°o>.

Could <°ȯya> have originated as an abbreviation of ឲយ្យ <°ȯyya>  (my guess for what Jenner's 17th century oyya represents)? Or is subscript <y> for final /j/ a remnant of this practice?

Subscripts were previously also used to write final consonants; in modern Khmer this may be done, optionally, in some words ending -ng or -y, such as ឲ្យ aôy  ("give").

I would like to know which other words have subscript <ṅ> and <y> for codas.

XIANGNAN TUHUA INTERLUDE: GITHUB JIANGYONG

1. Tonight I discovered sgalal's list of '江永 Jiangyong dialect' readings for 女書 Nüshu 'women's writing' characters on GitHub. This dialect, which I'll call GJY (GitHub Jiangyong), differs yet again from the ones I talked about last night:

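Gloss | Middle Chinese | OXT | Daoxian | Baishuicun | GJY | Mandarin
話 'speech' | *ɣwæjʰ | fuə³³ | xu⁵² | fə³³ | fwe⁴⁴ | xwa⁵¹
花 'flower' | *xwæ | fuə³³ | xu⁵⁴ | fə⁴⁴ | fwe⁴⁴ | xwa⁵⁵
會 'association' | *ɣwajʰ | fuə³³ | ui²² | uɯ³³ ~ fɯ³³ | vwe³³ | xwej⁵¹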

GJY is written without non-ASCII characters, so I am unsure if w is really ɯ, e is really ə, etc. In any case, GJY is distinct from OXT, Daoxian, and Baishuicun.

All three morphemes are written with one character in OXT Nüshu, but are written with two different characters in GJY Nüshu. Given the mismatches above, I expect many other dialectal differences in Nüshu spelling.

I found GJY via which also has two different hanzi-to-Nüshu converters.

2. Those converters use images to allow people without Nüshu fonts to see Nüshu characters. I'm one of those people. I went looking for a free Nüshu font and found a page about Chelsy Jiayi Wu's NVSHU SANS (V being a common substitution for Mandarin Ü). Some of the issues she mentions are relevant for Tangut, Jurchen, and Khitan as well as Nüshu: e.g.,

Every handwritten sample reflects the varying styles of its author. Without a history of standardization, it is difficult for me to identify what elements of each character are necessary for letterform identification. Where should a stroke begin and end? Which elements are ornamental and which are absolutely essential? Where should this dot be positioned relative to the stroke?

Chelsy Wu has an interesting background: "Born in Tokyo, raised in Shanghai", and a triple native speaker of Japanese, Mandarin, and English. (No Shanghainese?) She runs the site Explorations in Global Language Justice.

OMNIGLOT'S XIANGNAN TUHUA SAMPLE (PART 1: INTRODUCTION)

1. Omniglot has a sample of 女書 Nüshu 'women's writing' characters with readings in an unspecified variety of 湘南土話 Xiangnan Tuhua 'Southern Hunan local speech'.  This variety (hereafter 'OXT') is not the same as the 道縣 Daoxian and 白水村 Baishuicun¹ varieties at 小學堂 Xiaoxuetang. Compare their reflexes of Middle Chinese 話 *ɣwæjʰ 'speech':

Daoxian and Baishuicun are about 45 km apart. I have no idea how far they are from OXT.

Omniglot gives the local name of Xiangnan Tuhua as [tifɯə] without specifying tones. I suspect [tifɯə] is etymologically 地話 'earth speech', so [fɯə] may be from a fourth variety of Xiangnan Tuhua (OXT2?).

Unless I'm misreading the OXT sample, the OXT reflexes of Middle Chinese 話 *ɣwæjʰ 'speech', 花 *xwæ 'flower', and 會 *ɣwajʰ 'association' are homophonous. But that is not the case with Daoxian and Baishuicun:

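Gloss | Middle Chinese | OXT | Daoxian | Baishuicun | Mandarin
話 'speech' | *ɣwæjʰ | fuə³³ | xu⁵² | fə³³ | xwa⁵¹
花 'flower' | *xwæ | fuə³³ | xu⁵⁴ | fə⁴⁴ | xwa⁵⁵
會 'association' | *ɣwajʰ | fuə³³ | ui²² | uɯ³³ ~ fɯ³³ | xwej⁵¹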

會 in 會不會 lit. 'would or wouldn't' (hard to translate; more examples here) has another pronunciation in Daoxian: xo⁵². As association-會 and 會不會-會 are usually homophonous, I suspect that ui²² and xo⁵² belong to different strata of Daoxian: at least one may be borrowed, and if both are borrowed, one is newer than the other.

I added a Mandarin column to show how different Xiangnan Tuhua varieties are from it as well as from each other.
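The homophony comparison can be checked mechanically by grouping the three morphemes by reading within each variety. This is only a sketch under assumptions: `READINGS` and `homophone_groups` are hypothetical names, the assignment of readings to varieties follows the order OXT, Daoxian, Baishuicun, Mandarin, only the first Baishuicun variant of 會 (uɯ³³) is used, and Mandarin 會 is the standard reading xwej⁵¹.

```python
from collections import defaultdict

# Readings of 話 'speech', 花 'flower', and 會 'association' per variety.
# Assumption: column order OXT, Daoxian, Baishuicun, Mandarin.
READINGS = {
    "OXT":        {"話": "fuə³³", "花": "fuə³³", "會": "fuə³³"},
    "Daoxian":    {"話": "xu⁵²", "花": "xu⁵⁴", "會": "ui²²"},
    "Baishuicun": {"話": "fə³³", "花": "fə⁴⁴", "會": "uɯ³³"},
    "Mandarin":   {"話": "xwa⁵¹", "花": "xwa⁵⁵", "會": "xwej⁵¹"},
}


def homophone_groups(readings):
    """Group morphemes that share exactly the same reading (segments and tone)."""
    groups = defaultdict(list)
    for morpheme, reading in readings.items():
        groups[reading].append(morpheme)
    return sorted(groups.values())


for variety, readings in READINGS.items():
    groups = homophone_groups(readings)
    print(f"{variety}: {len(groups)} distinct reading(s) -> {groups}")
```

Only OXT collapses all three morphemes into one group; in the other varieties even 話 and 花 are kept apart by tone alone.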

That glimpse at Xiangnan Tuhua-internal variation makes me wonder how that variation maps onto Nüshu. I betray my ignorance of Nüshu here with some basic questions:

2. Last night - shortly after mentioning Wanzi Gelao - I was horrified to learn of this fake 'Gelao' manuscript. A fake is bad enough; a fake that is simply disguised Chinese is even worse. To pretend that the language that is replacing Gelao is 'ancient Gelao' is tasteless.

¹Five years ago, I wrote a ten-part series on Baishuicun: 1-4 / 5-8 / 9-10.

