22.214.171.124:30: METAL TONGUE
1. Yesterday, I mentioned that 銛 has
three Mandarin readings:
xiān < MC *siem < OC *sIlem
'shovel; harpoon; sharp'
tiǎn < MC *tʰemˀ < OC *l̥emʔ 'to
take (not the usual word); shovel'
guā < MC *kwat < OC *kʷat 'to cut'
(The meanings are from 廣韻 Guangyun 
and were not necessarily current even a thousand years ago.)
金 <METAL> on the left of 銛 is semantic.
舌 on the right of 銛 is an unusual phonetic. Most phonetics represent one class of syllables, but 舌 represents three, so it has three different numbers in Karlgren's Grammata serica recensa (1957) and Schuessler's (2009) Minimal Old Chinese and Later Han Chinese: A Companion to Grammata Serica Recensa:
GSR 288/S 20-10 舌 *mIlat 'tongue' (in theory could be phonetic for other *lat-syllables, but it isn't)
GSR 302/S 22-01 for *kʷat-syllables
It may be tempting to claim that this series is really for *kʷlat-syllables
and is just a *kʷ-branch of GSR 288/S 20-10, but this is
impossible. See below.
GSR 621/S 36-16 for *lem-syllables
銛 is unusual because it simultaneously belongs to two series: GSR 302/S 22-01 and GSR 621/S 36-16.
The phonetic of GSR 302/S 22-01 is 𠯑 OC *kʷat 'to shut the mouth' which originally had nothing to do with 舌 <TONGUE> (apart from sharing 口 <MOUTH> on the bottom) and hence cannot be conflated with GSR 288/S 20-10. 𠯑 is only abbreviated as 舌 in combinations. There is also another abbreviation 𠮮 which is distinct from 舌 <TONGUE>. Does 銽, the full form of 銛, only have velar readings?
Another more complex variant of 銛 is 𨨱 with the phonetic
活 huó < MC *ɣwat < OC *gʷat
also belonging to GSR 302/S 22-01. Does 𨨱 also only have velar readings?
舌 in GSR 621/S 36-16 is an abbreviation of the semantic
compound 甜 <TONGUE.SWEET> OC *lem 'sweet' as a phonetic.
Abbreviated phonetics are common in Tangut: e.g., 𗗘
1079 2lenq3 'sweet' may be an abbreviated phonetic 𘡔 in
𗗕 0207 first syllable of 𗗕𗃨 2lenq3 2o1 'shifting phantom'
𗗖 0504 second syllable of 𗺫𗗖 2by1 2lenq3 'spinach'
𗗡 0955 second syllable of 𘀛𗗡 2ly1 2lenq3 'dirty'
Or one of those three could be phonetic in the others. No Tangraphic Sea analysis survives for any of the four.
𗗘 1079 2lenq3 < *Slim 'sweet' is cognate to 甜 OC *lem.
2. GEOGRAPHY IS NOT LINGUISTIC GENEALOGY: Just as not all languages of India and Europe are Indo-European, not all languages of Micronesia are Micronesian.
According to Wikipedia, Ross, Pawley, and Osmond (2016) regard Yapese as a primary branch of Oceanic: a sister of Micronesian.
According to Wikipedia, Blust (1993) regards Chamorro and Palauan as primary branches of Malayo-Polynesian: sisters of Oceanic.
Nukuoro and Kapingamarangi are Polynesian outliers in Micronesia.
Classification of the languages of Micronesia
Proto-Oceanic had a fairly simple phonology with one unusual feature: a distinction between two series of labials:
only had *p *b *m *w. How did the *Pʷ-series develop?
Yapese doesn't have any labialized labials, but it does have some unusual features absent from Proto-Oceanic and most languages:
glottalized sonorants /ŋˀ nˀ mˀ jˀ lˀ wˀ/
ejectives /kʼ tʼ pʼ θʼ fʼ/
I've never heard of ejective fricatives before. More on them here with audible samples.
I have no idea how either series developed.
Yapese has sixteen vowels. How did they develop from the *five of
Proto-Oceanic? I'm interested in how small vowel systems develop into
larger ones because I am looking for insights into vocalic expansion in
Tangut which went from *six vowels to dozens (without distinctive
3. It's unfortunate that the peoples indigenous to the regions where the world's most famous cities are now located are not well known. Until today I didn't know about the Ohlone of "the coast from San Francisco Bay through Monterey Bay to the lower Salinas Valley". I spent years in Berkeley without ever learning of the Chochenyo. I've never been to Ohlone Park or the Ohlone Greenway.
4. I look forward to KJ Solonin's forthcoming book The
Descendants of the White and High: The Tanguts in Asian History.
I wonder who'll contribute to it. A corresponding book for Tangut's
distant relative Pyu would be nice, though it might be far slimmer.
126.96.36.199:07: HANGUL AND IDEOGRAPHIC TONE MARKS
1. I first found the Wikibooks index to 東國正韻 Correct Rhymes of the Eastern Country almost exactly six years ago and have been using it ever since. But I didn't notice until last night that it used Korean-specific Unicode combining characters for tones:
〮 U+302E HANGUL SINGLE DOT TONE MARK (high tone)
〯 U+302F HANGUL DOUBLE DOT TONE MARK (rising tone)
The only other possible tone (low) is unmarked.
Preceding those marks in Unicode order are
〪 U+302A IDEOGRAPHIC LEVEL TONE MARK
〫 U+302B IDEOGRAPHIC RISING TONE MARK
〬 U+302C IDEOGRAPHIC DEPARTING TONE MARK
〭 U+302D IDEOGRAPHIC ENTERING TONE MARK
which I've never seen in any electronic text, though I've seen them in print since I got my first Sino-Japanese dictionary over thirty years ago. Let's see if they work with Tangut:
𗗔〪 0218 1e'4 'level (tone)'
𗨁〫 2612 2phu4 'rising (tone)'
𘃽〭 1616 2o1 'entering (tone)'
(I don't know how the Tangut translated Chinese 去 'departing' since the native Tangut phonological tradition only applies three Chinese tonal categories to Tangut.)
I see that the combining tone characters only work with my Tangut
font if I copy and paste the above text and post it into BabelPad or
BabelMap. The characters aren't in the Tangut font specified by the
style for Tangut on this site.
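The names and code points of the six marks listed above can be double-checked with Python's standard unicodedata module. This is only a sketch: the helper names describe and attach_mark are made up for illustration, and whether a mark actually renders on a Tangut base still depends on the font, as noted above.

```python
import unicodedata

# The six combining tone marks discussed above: U+302A through U+302F.
TONE_MARKS = [chr(cp) for cp in range(0x302A, 0x3030)]

def describe(mark):
    """Return (code point label, Unicode name, canonical combining class)."""
    return (f"U+{ord(mark):04X}",
            unicodedata.name(mark),
            unicodedata.combining(mark))

def attach_mark(base, mark):
    """Hypothetical helper: a combining mark simply follows its base character."""
    return base + mark

if __name__ == "__main__":
    for mark in TONE_MARKS:
        print(*describe(mark))
    # Tangut 'level (tone)' followed by the ideographic level tone mark.
    print(attach_mark("𗗔", "\u302A"))
```

A nonzero combining class only says that the character is meant to attach to a preceding base; it says nothing about whether a given font positions it correctly.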
2. I first heard the Cantonese term zuk1 sing1 for overseas Chinese in 1991. I was told it meant 'empty bamboo', but I could never find any word sing1 meaning 'empty' (and the noun-adjective order was odd). It's taken me 28 years to learn what sing1 is:
The original term is 竹杠 ['bamboo rod': i.e., hollow/empty bamboo]. But 杠 [gong3] is pronounced exactly like 降 "fall”, which is considered as inauspicious. The very opposite of “fall” is “rise”. So 升, meaning “rise”, is chosen to replace 杠.
Strangely the term turns out to mean 'thick bamboo pole' as well as someone 'empty of Chinese culture and values like a hollow bamboo pole'.
3. Dept. of Etymology ≠ Semantics: I wouldn't have guessed this distinction, since both bi- and di- mean 'two':
By contrast, duotheism, bitheism or ditheism implies (at least) two gods. While bitheism implies harmony, ditheism implies rivalry and opposition, such as between good and evil, or light and dark, or summer and winter.
I'm trying to come up with a mnemonic for the distinction: ditheism entails discord, whereas bitheism is associated with b-something? Benevolence?
4. Does Manichaeism still exist?
In modern China, Manichaean groups are still active in southern provinces, especially in Quanzhou and around the Cao'an, the only Manichaean temple that has survived until today. There is a Chinese Manichaean Council with representatives in Tibet and Beijing.
Normally I edit out things like "" when quoting Wikipedia, but I think that instance has to stay.
The Wikipedia article on the 草庵 Cao'an does not mention any modern Manichaeans.
5. Today episode 2 of Super Robot Mach Baron turns forty-five years old. I never noticed the unusual spelling of the name of the series' robot designer until tonight:
That spelling only appears in the closing credits. Online it appears
as 田中視一. Apparently 㐅 is an idiosyncratic simplification of 示 shimesu-hen.
田中 is Tanaka, but I don't know how to read (⿰㐅見)一 ~ 視一. The obvious Sino-Japanese reading Shiichi doesn't sound like a name. O'Neill's Japanese Names gives a native name reading nori for 視, and Wiktionary lists a native reading tomo, so perhaps (⿰㐅見)一 ~ 視一 is Norikazu or Tomokazu (kazu being the native name reading of 一; there is a strong tendency for both readings in a two-character name to be of the same origin).
6. While reading about Heil V1, one of the
robots on Mach Baron,
I encountered a kanji I didn't recognize: 銛 mori 'harpoon'.
Da's general Chinese ranks:
銛 has three Mandarin readings: xiān, tiǎn, and guā.
More on them tomorrow.
188.8.131.52:54: ?STEMI QAGHAN
1. In Turkish and Mongolian Studies (1962, 73), Sir Gerard Clauson mentions
the great Türkü ruler of the second half of the sixth century, Eştemi Kağan (the exact pronunciation of his name is uncertain, the Byzantines called him Stembis Xagan, and the Chinese Shih-tieh-mi [Shidiemi in pinyin])
How did Clauson guess that the first vowel might be E-? There has to be an initial vowel because Turkic did not allow initial consonant clusters.
The Wikipedia entry for that qaghan is titled "Istämi" and gives
his name in runes
as 𐰃𐰾𐱅𐰢𐰃 <is₂t₂mi>. Is that an attested spelling, or
someone's modern creation? The runic script is often ambiguous, but
that spelling unambiguously represents [i] since <i> has to be
[i] and not [ɯ] in a word with series 2 (front-vowel) consonants
<s₂> and <t₂>. The e is not written since
[s]hort vowels, other than those enclosed in digraphs, should not be written except when they are the first vowel of a word, and then only if they are not a/e. (Clauson 1962: 81)
Other versions of the name in Wikipedia:
İstemi (Turkish spelling of Istemi)
Ishtemi (= İştemi in Turkish spelling)
Chinese transcriptions in Wikipedia:
室點密 Shidianmi < *ɕit tém mɨit
室點蜜 Shidianmi < *ɕit tém mit
瑟帝米 Sedimi < *ʂit tèj méj
I don't see the Shidiemi Clauson mentioned.
All the transcriptions point to ş rather than s, contrary to the runic spelling. (The s in the Byzantine spelling could represent Turkic ş since Greek had no letter for ş.)
2. The final *-t in the transcriptions of ?stemi
above corresponds to zero in Turkic. Clauson (1962: 88) noted that
Middle Chinese *-t corresponded to Turkic -ð, -l, -r,
and even zero, but not -t which was transcribed as *tV.
That seems to imply that northwestern Middle Chinese *-t had
already shifted to *-r. Might cases of *-r
corresponding to Turkic zero really involve a subphonemic Turkic [ʔ]
after short vowels? Are there any cases of *-r corresponding to
zero after Turkic long vowels?
3. Taishanese has palatal allophones [tɕ tɕʰ] of /ts tsʰ/ before the high vowels /i u/. This is reminiscent of the palatalization of stops (but not affricates!) before high vowels in Late Old Chinese:
*t(ʰ)i > *tɕ(ʰ)i
*t(ʰ)ɨ > *tɕ(ʰ)ɨ
*t(ʰ)u > *tɕ(ʰ)u
Pulleyblank (1984: 179) opened my eyes to the phenomenon of
palatalization and affrication without palatal vowels, though I should
have figured that at least such affrication was possible since I had
known even before reading his book that Japanese tsu [tsɯ] was
It's been over twenty years since I wrote an unpublished study of
Taishanese historical phonology. It would be interesting if I could
find it, though it's so old that the file might not be readable with my
current software and fonts. Unicode ensured that everything I've
written since I switched to Windows XP in 2002 should be legible in the
4. James Evans' 1841 grid for his syllabics is completely full. It has some interesting characteristics.
sp- is the only possible initial cluster. It is now obsolete.
There is no <u> or <ū> since Cree
only had a single type of labial vowel phoneme written as <o> and
There is a graphic distinction between <e> and <ē> even though
Cree ê only occurs long [...] Not all writers then or now indicate length, or do not do so consistently; since there is no contrast, no one today writes ê as a long vowel.
There is a single symbol for a cluster coda <hk> since -hk
is "a common grammatical ending in Cree". In Ojibwe, the same symbol
represents the common cluster coda -nk.
Murdoch (1981: 27, 60) includes a second cluster coda symbol <sk>.
I found Murdoch through this page via Wikipedia:
John wrote to Cree Literacy Network about it in 2017: "I researched the origins and evolution of syllabic characters for Cree, Inuit and Dene languages, producing a MEd thesis at the University of Manitoba in 1981. Although James Evans, the Wesleyan Methodist missionary played a part in the first printings in syllabics at Norway House, He was not the person who was the most instrumental in the writing systems conception and spread. During my research I visited archives as well as Aboriginal communities in the Boreal Forest as well as the Eastern Arctic. Missionaries George Barnley, John Horden, Jean-Nicolas Laverlochère, Edmund Peck and Jean Baptiste Thibeault all arrived to Cree, Inuit and Dene nations who were already able to read and write in the system."
Is Murdoch saying the script spread across the First Nations without
Evans' (or any Euro-Canadian?) involvement? It's not clear whether he
agrees with Wikipedia's take based on that online article:
there is strong evidence to suggest that the Cree people already knew the writing system and Evans simply adapted it for print.
But skimming Murdoch's thesis, I get the impression that in 1981 (maybe not anymore) he supported the conventional view of Evans as inventor. I'm confused.
This reminds me of the issue of whether the Khitan and Jurchen large
scripts were invented or derived from a preexisting Parhae script.
5. What is the etymology of Yazidi?
Earlier scholars and many Yazidis derive it from Old Iranian yazata, Middle Persian yazad, divine being.
I wouldn't expect a to become i or î (the Yazidi autonym is Êzîdî ~ Êzidî). But I don't know Kurdish historical phonology.
*a > i is not impossible. It's common in Tangut: e.g.,
*tI-S-tsa > 𘅗 1321 1ziq4 'shoe'
Compare with Japhug tɯ-xtsa 'id.' (Jacques 2014: 90).
KHITAN CHICKENS, STABLES, AND ALTERNATORS
1. I've long been bothered by the Khitan word for 'chicken', written as
<t.qo.a> (~ <CHICKEN>?)
I have put the large script character in parentheses, since I do not know of any evidence for its pronunciation. It is parsimonious to assume that it was pronounced like <t.qo.a> in the small script.
How was <t.qo.a> pronounced? That's half of my problem. The other half is how that pronunciation relates to other words for 'chicken' in continental 'Altaic' (which I regard as a language area, not a language family).
Kane (2009: 88) reads <t.qo.a> as teqoa with an inherent vowel e. This reading has a nonharmonic e-a sequence that is unusual for a continental 'Altaic' language.
Shimunek (2017: 372) reads the small script spelling as <t.aq.a>. It is certain that the second symbol stood for something absent from Chinese, though what that something was is uncertain.
Neither interpretation is a simple match for other words for 'chicken' in the area:
Old Uyghur takığuː (from Clauson [1972: 468]; see that
entry for other Turkic forms)
Middle Mongolian takiya (from Shimunek [2017: 372]; see that entry for other Mongolic forms)
Here's an attempt to make sense out of most of that, taking Kane's interpretation as an endpoint in Khitan:
1. The earliest form of the word was something like *taqɯʁu.
I don't know whether it originates from Turkic, Serbi-Mongolic, or a
third language in contact with them (Ruanruan? Xiongnu?).
2. The vowels metathesized in pre-Khitan: *tɯqaʁu.
3. Medial *ʁ lenited to zero: *tɯqaʁu > *tɯqau.
4. *au metathesized: *tɯqau > *tɯqua. (I thought all au in Khitan were either from *aCu or in Chinese loanwords, but what of au in taulia 'hare' corresponding to Mongolian taulai? Maybe only root-final *au metathesized.)
5. In an unwritten eastern Khitan dialect, *tɯqau or *tɯqua became *tiqo.
6. That *tiqo was borrowed into pre-Jurchen which then lenited intervocalic *q to h: *tiqo > tiho. (It's also possible that pre-Jurchen borrowed Khitan *ɯ as *i, so maybe the Khitan source form retained *ɯ.)
(10.13.1:23: I briefly considered the possibility that *au monophthongized within Jurchen: *tiqau > tiho. But *au should correspond to Manchu oo which is not in Manchu coko. So I think the common Khitan source of the Jurchen and Manchu words already ended in -o.)
7. In the written Khitan dialect, *tɯqua
became teqoa with high vowels lowering under the
influence of a.
8. One Jurchen dialect shifted *tiqo to *tyoqo.
*ty then palatalized to c, resulting in Manchu coko
*[tɕʰɔqʰɔ] (later [tʂʰɔqʰɔ] with a Mandarin-like retroflex
(But I cannot explain why Manchu has -k- instead of -h-.
Manchu intervocalic -k- should be from a *cluster, not a simple
Shimunek (2017: 372) reconstructs Proto-Serbi-Mongolic *tʰakʰɪɣa
'chicken'. Presumably *-ɪɣa was reduced to -a in his
Shimunek (2017: 372) thinks Middle Korean ᄃᆞᆰ <tărk> [tʌrk] 'chicken' is also part of the same word family.
Old Japanese təri 'bird' has been linked to that Korean word
(perhaps most recently by Francis-Ratte [2016: 211] who regards them as
I suppose one could regard Korean and Japanese r as attempts
to imitate a foreign *ʁ. I can't explain the Korean and
2. <t.qo.a> 'chicken' is an example of what I call a stable
in Khitan. I propose grouping obstruent-initial words in Khitan
into two categories, stables and alternators. Stables
are always written consistently with the same type of consonant,
whereas alternators alternate.
<x.ai> 'open' < 開 *kʰaj
<p.u> ~ <b.u> 'to be' (Kane 2009: 156)
(184.108.40.206:57: Chinese loanwords seem to generally be stable,
though there are exceptions like the syllable in the table above.)
I think the stables mostly have unaspirated-aspirated oppositions:
I've omitted x /x/ which doesn't fit into the above
paradigm. The shift of k > x is also in Jurchen and
Manchu (under Khitan influence?).
Are the alternations specific to certain environments: e.g., do
/aspirates/ deaspirate between voiced segments? (But why don't all
/aspirates/ deaspirate?). 'Second', 'fourth', and 'to be' are unlikely
to have /aspirates/ since their Mongolian cognates begin with
unaspirated j-, d-, and b-. But ... why would
/nonaspirates/ surface as [aspirates]? Was Khitan like English with
Could the alternators have a third obstruent series
without any unique spellings? The small script was made under Uyghur
Old Uyghur script had only two consonant series (though its Aramaic
source had three). Are the two series of the script an artifact of
Uyghur? Might Mongolic have reduced three series of consonants to two?
3. I didn't see Andrew West's latest post until just now. Two points:
This sequence of three characters 没蜜施 does not make any sense as Chinese, but are here used to transcribe the Old Uighur word bolmïš "to have become" (from bol- "to be, to become" plus perfect participle -mïš) or bulmïš "to have received" (from bul- "to find, to get, to receive" plus perfect participle -mïš) which both occur in the titles of nine Uighur khans between 747 and 848, as recorded in the Old Book of Tang (舊唐書 Jiù Tángshū) and New Book of Tang (新唐書 Xīn Tángshū).
Why were bol- and bul- both written with the same Chinese character 没 (now read with m- in most Chinese varieties, not b-!)?
In the Tang prestige dialect, 没 was pronounced something like *mbor.
Idealized Middle Sino-Korean 모ᇙ〮 <morʔ·> preserves *-r.
Kan-on botsu < Old Japanese Kan-on *mbot preserves the stop part of the initial, though it lacks a final liquid because it was borrowed before *-t > *-r.
Old Uyghur transcriptions of Chinese render Tang prestige Chinese *mb- as <m> or, once in the case of 穆, <p>. (Unfortunately I do not know of any Uyghur transcription of 没.)
The Tang prestige dialect no longer had *b-, so *mb- was the closest available approximation of Old Uyghur b-.
The Tang prestige dialect had no *-l, so *-r was the closest available approximation of Old Uyghur -l.
没 *mbor is the best possible approximation of Old Uyghur bol- 'to be'. But why wasn't Old Uyghur bul- 'to find' transcribed as *mbur? Because the Tang prestige dialect lacked that syllable: earlier *mut had become *mvur instead of *mbur. *mb- could not occur before *u in the Tang prestige dialect. So 没 *mbor had to do double duty for Old Uyghur bol- 'to be' and Old Uyghur bul- 'to find'.
There is some confusion over the two words bolmïš "to have become" and bulmïš "to have received" as they are both written the same in the Old Uighur and Old Turkic scripts, and are both transcribed as 没蜜施 or 没密施 mò mì shī in Chinese (没 is pronounced mut⁶ in Cantonese), which suggests to me that they should be the same word.
Clauson (1972: 332) says early Turkic bol- and bul- were
"normally indistinguishable graphically". I assume the different vowels
have been projected backward from the modern languages: e.g., modern
Turkish has olmak 'to be' (with irregular b-loss) and bulmak 'to find'.
3c. Why did 蜜 *mbir ~ 密 *mbɨir transcribe an Old Uyghur open syllable mï? In theory, 麋糜靡 *mbɨi would have been better fits, but 蜜 and 密 are more frequent characters, so maybe they came first to the transcriber's mind. (No, in fact, 糜靡 are more frequent than 蜜 in Jun Da's premodern corpus!)
4. Japanese amefurashi 'sea hare' has a surprising spelling (in addition to boring ones: アメフラシ, 雨降らし, 雨降):
雨虎 <RAIN TIGER>
ame-fur-ash-i is literally 'rain-fall-CAUS-ing'. ame
obviously corresponds to 雨 <RAIN>, but furashi has
nothing to do with 虎 <TIGER>. (It is a coincidence that 虎 is
pronounced fu2 in Cantonese.)
Are there any native Chinese, Korean, or Vietnamese words for 'sea
hare'? Do the words I found in Wikipedia count, or are they loan
translations of Latin lepus marinus (via sea hare or some
similar European term)?
C 海兔 'sea hare'
K 바다 토끼 pada thokki 'sea hare'
V thỏ biển, lit. 'hare sea' with Vietnamese modified-modifier order
There are five nôm spellings for biển 'sea' at nomfoundation.org:
The points indicate the quality of matches: i.e., the number of
㴜 is the most straightforward; the phonetic 扁 is a perfect match.
𣷷𤅶 have a phonetic with a mismatching tone in the same *register as the tone of biển.
汴 has a phonetic with a mismatching tone in a register differing from that of the tone of biển.
𣷭 has a phonetic with a matching tone, but the vowel of 彼 is a monophthong instead of a diphthong, and 彼 has no -n. Did 𣷭 originate as an error for 𣷷?
220.127.116.11:55: EYE OF THE ARAB
That's what I thought عين العرب `Ayn al-`Arab meant at first. It's actually 'Spring of the Arab' - the water sort of spring. عين `ayn has multiple meanings. `Ayn al-`Arab is also known as Kobanî or Kobane:
Nobody disputes that the town [of Kobane] is a relatively new settlement. Before the 20th century, it was just a water meadow where even great commanders like Saladin used to feed the horses of his army. For a long time, it was referred to as Arab Punarı (“Arab Spring” in Turkish).
Muhsin Kızılkaya, a writer of Kurdish origin, told private Turkish broadcaster CNN Türk on Oct. 13 that Kobane was not even a small village at the turn of the century. “The Germans set a small station there while building the Baghdad Railway. A new settlement was developed around the construction and locals called it Kobane, in reference to the German ‘company’ that built a road in the area,” he said.
The rendering of “company” as “Kobane” seems logical at first glance, considering the fact that both Kurds and Arabs adapt many Western words by changing the letter “m” to “b.”
Really? Initially I assumed that the p of German Kompanie was Arabized as b since standard Arabic has no p. Kurdish has p, and both Kurdish and Arabic have m, so there would be no motivation for changing m to b.
(The correspondence of mp to b reminds me of how
Japanese b is from *mp: e.g., in 旅人 tabibito
< *tambimpitə 'traveller'. See yesterday's entry.)
The Hurriyet Daily News article points out another problem:
Historically, however, the “company theory” sounds weak, as Germans use the word “Gesellschaft” for business companies. “Kompanie,” on the other hand, refers to military units. [See Wiktionary.]
Others have suggested that the middle part of the name Kobane could come from the German word “bahn” (road). In fact, Anatolische Eisenbahn, a German company, built the landmark Baghdad Railway, which some historians see as one of the causes of the First World War.
But if -bane is from German Bahn, what is Ko-?
I'm amazed that such a recent name has no certain etymology.
TANGUT DATABASE 4.0
1. Version 4.0 of my Tangut
database includes a new Unicode column and has corrected data for 11
entries thanks to Andrew West.
Details in the
changelog on sheet 2.
2. I never gave any thought to the d in Japanese 仲人 nakōdo 'matchmaker' (cf. naka 'middle' and hito 'person') and 狩人・猟人 karyūdo 'hunter' (cf. kari 'hunting' and hito 'person') until I learned today that 旅人 <TRIP PERSON> could be read as tabyūdo (just one of six possible readings!). Wiktionary gives the following derivation:
*/tapiputo/ → */tabibuto/ → /tabiudo/ → /tabjuːdo/
Here's my derivation:
Stage 1: *tambi nə pitə 'trip GEN person'
Stage 2: *tambimbitə
The earliest attested form from Man'yōshū (but written semantographically as 客人 <GUEST PERSON>; if not for the reading tradition, I would never have guessed that 客 was read tabi).
*nə p is compressed into *mb.
The regular reflex of this word survives in modern Japanese as tabibito, the most common reading of 旅人.
Stage 3: *tambimbũto
*ə rounds to *o.
Irregular assimilation of *i to the preceding prenasalized labial stop *mb.
Stage 4: *tambiũndo
The nasalization spreads from the vowel to the following stop.
Stage 5: *tabjuːdo
Voiced stops lose prenasalization.
*i is reduced to *j and its length is transferred to the following *u.
Stage 6: [tabjɯːdo]
*u loses its rounding.
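The stages above can be replayed mechanically as ordered string replacements. The sketch below is only an illustration: every rule is tailored to this one word rather than stated as a general sound law, the names STAGES, REGULAR, and derive are made up, and the Stage 4 rule assumes that the second *mb has dropped out by then, which the stage notes imply but do not spell out.

```python
import unicodedata

def nfc(text):
    """Normalize so precomposed and decomposed spellings of ũ compare equal."""
    return unicodedata.normalize("NFC", text)

# (label, [(old, new), ...]); replacements run in the order listed.
STAGES = [
    ("Stage 2", [(" nə p", "mb")]),                # *nə p is compressed into *mb
    ("Stage 3", [("ə", "o"), ("mbit", "mbũt")]),   # *ə rounds; irregular *i > *ũ after *mb
    ("Stage 4", [("mbũt", "ũnd")]),                # assumption: medial *mb is lost; *t > *nd
    ("Stage 5", [("mb", "b"), ("nd", "d"), ("iũ", "juː")]),  # prenasalization lost; *i > *j
    ("Stage 6", [("u", "ɯ")]),                     # *u loses its rounding
]

# The regular reflex branches off after Stage 2.
REGULAR = [("Regular", [("mb", "b"), ("ə", "o")])]

def derive(form, stages=STAGES):
    """Return the input form followed by the output of each stage."""
    form = nfc(form)
    forms = [form]
    for _label, rules in stages:
        for old, new in rules:
            form = form.replace(nfc(old), nfc(new))
        forms.append(form)
    return forms

if __name__ == "__main__":
    print(" > ".join(derive("tambi nə pitə")))
    print(" > ".join(derive("tambimbitə", REGULAR)))
```

Keeping the rules in an ordered list makes it easy to test an alternative ordering or to swap in the Wiktionary derivation for comparison.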
3. If Hawaiian mōʻī 'king' is "of recent origin", not in print until 1832, where did it come from? wehewehe.org proposes a link to ʻī 'supreme'. I would expect mōʻī to be a noun-adjective phrase 'supreme mō'. But what is mō? The short form of moku 'district'?
It seems mōʻī got 'promoted' over time: 19th century attestations mean 'temple image', 'lord of images', and 'a rank of chiefs who could succeed to the government but who were of lower rank than chiefs descended from the god Kāne'. Cf. how the Xiongnu title 'crown prince' transcribed in Late Old Chinese as 護于 *ɣwah-wɨa (phonetically [ʁwɑχwɨa]¹?) may be the source of the later Altaic title qaghan 'supreme ruler' (see Vovin 2007 on its etymology).
¹I speculate that uvular [χ] is an allophone of final /h/ in 'type A' syllables which are characterized by lower, backer vowels (like [ɑ] for /a/) and uvulars. The use of *ɣ for what I think was [ʁ] is out of habit and in accordance with tradition.
It is interesting that the second syllable of 護于 *ɣwah-wɨa is a 'type B' syllable which is characterized by higher, less back vowels (like *ɨa < *a). That suggests the original Xiongnu word had a mix of vowel types and that the Xiongnu language did not have vowel harmony like Altaic (or, I think, Early Old Chinese). A very un-Altaic *ʁwɑχwa² with two different vowels may have been simplified to qaghan (i.e., two syllables with the same vowel) in Turkic and Serbi-Mongolic (e.g., Khitan qagha) via Ruanruan.
²Late Old Chinese had no syllable *wa, so 于 *wɨa might have been an approximation of a Xiongnu *wa.
4. 寄席 <GATHER SEAT> yose 'traditional Japanese verbal entertainment theater' is an interesting case of an abbreviation in speech but not in writing. Naver regards it as an abbreviation of 寄せ席 yoseseki 'gather-seat'. seki 'seat' is no longer pronounced in yose, but its character 席 remains in spelling.
18.104.22.168:26: KHITAN SMALL SCRIPT CHARACTERS 108, 110, AND 111
Could the low-frequency Khitan small script characters 110 and 111 be variants of the high-frequency character 254 <d>?
If 110 is a variant of 254, then perhaps the low-frequency character 108 is a variant of 107 ~ 347 <oi>.
Let's see if any instances of 110 and 111 are in environments matching those of 254.
In 契丹小字研究 Research on the Khitan Small Script (1985), 110 only ever appears in second position, unlike 254, which has no such restriction. Is that significant? Could that imply that 110 is a vowel character that must follow an initial consonant character?
021.110: Xuan 19.24, Zhong 6.6, 13.27, 41.26, 46.4
021.110.140: Zhong 41.12, 44.44
<mo.110.en> looks like a genitive of <mo.110>. If 110 is
<d>, then the above words are mod and mod-en.
Could mod in turn be a variant of 021.247 <mo.t>
'woman-PL'? Both -d and -t are attested as plural
suffixes after vowel-final nouns (Kane 2009: 138-140).
111 is followed by 241 <pu> and resembles
Could 111 241 be <tai pu> for Liao Chinese 太傅 *tʰajfu 'Grand Tutor'?
254 appears by itself where 254.122 <d.ai> for Liao Chinese 大 *taj
would be expected. I suspect 254 is an error for 254.122 and not a true
standalone character like 374 (and 111?).
028.111.339.100 is the spelling on pp. 607 and 705 of Research on the Khitan Small Script, but p. 178 has 028.110.339.100.
108 is always in second or third position. Is it in any environments where 107 and/or 347 are attested?
131.108: Xing 4.13, Dao 7.24, 19.8, Xuan 6.20
Gu 15.6, 15.8
<u.108.d> could be a plural of <u.108>.
<mo.108.l.ge.ei> looks like a verb + -lge- causative/passive + converb -ei sequence. Interpreting 108 as <oi> would work nicely there: moilgei? But <oi> would result in awkward vowel sequences elsewhere: e.g., cieoien? Could 108 represent a CV syllable absent from Chinese loanwords?
1. What is the etymology of the surname of Kurdish leader Abdullah
Öcalan? It doesn't look Kurdish since it has the vowel ö absent from
Kurdish. The vowel ö is characteristic of Turkish (and Wiktionary
identifies the name as Turkish), but the vowel sequence ö-a
violates Turkish vowel harmony. (Hypothetical harmonic names
would be Ocalan and Öcelen.) Such
violations are possible in compounds and loanwords, but I cannot find
any Turkish words ö, öc, or calan that would enable me
to analyze the name as ö-calan or öc-alan. (There is a
Turkish word alan.)
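The harmony check on the three name shapes above is easy to mechanize. This is a rough sketch that tests front/back harmony only (rounding harmony is ignored); the function name is made up, and Python's lower() is not Turkish-aware (it maps I to dotted i), which does not matter for these examples.

```python
import unicodedata

FRONT = set("eiöü")   # Turkish front vowels
BACK = set("aıou")    # Turkish back vowels

def is_backness_harmonic(word):
    """True if every vowel in the word is front, or every vowel is back."""
    # NFC so that ö and ü are single code points before the lookup.
    word = unicodedata.normalize("NFC", word).lower()
    vowels = [ch for ch in word if ch in FRONT or ch in BACK]
    return all(v in FRONT for v in vowels) or all(v in BACK for v in vowels)

if __name__ == "__main__":
    for name in ("Öcalan", "Ocalan", "Öcelen", "alan"):
        print(name, is_backness_harmonic(name))
```

A failed check only flags a word as a possible compound or loan; it does not say where the boundary is.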
2. "Areal developments in the history of Iranic: West vs. East" (2018) by Martin Joachim Kümmel looks like a handy all-in-one-place reference for the big picture of Iranic historical phonology.
I'm going to start using 'Iranic' instead of 'Iranian' in linguistic contexts
[t]o avoid confusion with terms related to the country or territory of Iran (especially in recent geneticist papers speaking of prehistoric "Iranian" populations almost certainly not "Iranian" in the linguistic sense) (Kümmel 2018, slide 3)
Iranic is consistent with my use of Turkic to avoid confusion with Turkish referring to the country and dominant language of Turkey.
3. By analogy, maybe I could use Taic to avoid
confusion with Thai referring to the country and dominant
language of Thailand. Or perhaps better yet, Daic
for consistency with Kra-Dai (it is odd that most speak
of a 'Tai' rather than a 'Dai' branch of Kra-Dai, though there is a Kra
4. Back to Kurdish: how did Sorani and Kurmanji develop h-
in hesp 'horse' in this
table? That seems to be an independent Kurdish innovation that has
nothing to do with the equally mysterious h- in Greek ἵππος <híppos>.
The Proto-Indo-European initial consonant of 'horse' was *ʔ-,
not the *s- that became h- in Greek and Iranic.
*s-weakening occurred independently in those two branches, as
it is absent from Indic and cannot be reconstructed at the
Proto-Indo-Iranic level (see Kümmel
2018, slide 14):
Pyu also has *s-weakening: e.g., hi 'to die'
(cf. Tangut 𗢏 3072 2si4
< *CIseH 'id.').
Typing "Proto-Indo-Iranic" and "Proto-Iranic" feels
weird. But are such terms any worse than Indic instead
5. In that same table are Sorani erz ~ erd, Kurmanji erd, and Zaza erd 'earth'. Is that word a northwestern Iranic innovation that has nothing to do with English earth? Wiktionary lists no Iranic reflexes of Proto-Indo-European *ʔer- 'earth'.
6. How did I never encounter the word glossonym before?
Among some Yazidis, the glossonym Ezdîkî is used for Kurmanji to signify an attempt to erase their affiliation to Kurds.
7. Another new word for today: Kurdification.
Kurdification is a cultural change in which non-ethnic Kurds or/and non-ethnic Kurdish area or/and non-Kurdish languages becomes Kurdish.
I don't see how languages can become Kurdish as opposed to being replaced by Kurdish.
8. I don't understand the phonetic logic behind this rule:
After /ɫ/, /t/ is palatalized to [tʲ]. An example is the Central Kurdish word gâlta ('joke'), which is pronounced as [gɑːɫˈtʲæ].
Is /t/ dissimilating after velarized /ɫ/? I think of velarization
and palatalization as being opposites. But palatals are between dentals
and velars in terms of point of articulation, so maybe this rule is
9. I also don't get this
Kurdish rule which involves palatalization next to a velar:
When preceding /ŋ/, /s, z/ are palatalized to /ʒ/.
I guess that happens because [ʒ] (I think phonetic brackets were intended) is closer to /ŋ/ than /s, z/ are.
Is there any language in which /s z/ become velar [x ɣ] next to a
Sorani allophony reminds me of how Middle Chinese *æ
corresponds to Mandarin [ə] in 生 shēng [ʂəŋ] (I should
continue my series on 生):
The vowel [æ] is sometimes pronounced as [ə] (the sound found in the first syllable of the English word "above"). This sound change takes place when [æ] directly precedes [w] or when it is followed by the sound [j] (like English "y") in the same syllable.
The environment is completely different, though. And I don't
understand why glides are schwa-friendly, though I admit I find [əw əj]
easier to pronounce than [æw æj].
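The quoted rule can be sketched as a pair of context-sensitive rewrites. This is only a rough Python sketch under my own assumptions: syllables are separated by dots, the function name `sorani_schwa` is made up, and the strings fed to it are toy sequences, not attested Sorani words.

```python
import re

def sorani_schwa(ipa: str) -> str:
    """Apply the quoted [æ] -> [ə] rule to a dot-syllabified IPA string.

    Assumption: '.' marks syllable boundaries; inputs are toy transcriptions.
    """
    # [æ] directly before [w]: the quote does not mention syllables here,
    # so a boundary between the two sounds is allowed (an assumption).
    ipa = re.sub(r"æ(?=\.?w)", "ə", ipa)
    # [æ] followed by [j] in the same syllable: no boundary in between.
    ipa = re.sub(r"æ(?=j)", "ə", ipa)
    return ipa
```

With these assumptions, a toy syllable like `kæj` comes out with a schwa, whereas `kæ.jæ` is left alone because the glide belongs to the next syllable.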
10. I am baffled by Kümmel's use of [ʆ] as well as [ɕ] for Sanskrit
in slide 14 of his
2018 PowerPoint. I have never seen anyone use [ʆ] for Sanskrit (or
any language, really). Is [ʆ] an allophone of /ś/ before /r/?
11. I've never seen the letters ḧ ẍ before. They are optional symbols for [ħ ɣ] in the Hawar and universal extended alphabets for Kurdish. (The Hawar section of the Wikipedia article seems to have the Arabic equivalents of ḧ ẍ reversed. I assume the IPA at the bottom of the article is correct, and that the diaereses are added to indicate voiced versions of h [h] and x [x].)
12. Trying to relate Mandan in North Dakota to ... Welsh seems so random. I didn't know there were deep inland variants of the Welsh-in-America myth. Or that there were this many variants:
In all, at least thirteen real tribes, five unidentified tribes, and three unnamed tribes have been suggested as "Welsh Indians."
Chris Harvey/Languagegeek patiently proves "That Mandan Is Not Welsh".
Just seven years after Harvey wrote that article, Mandan became extinct when Edwin Benson died in 2016.
13. Backer is bigger in Mandan:
Mandan, like many other North American languages, has elements of sound symbolism in its vocabulary. A /s/ sound often denotes smallness/less intensity, /ʃ/ denotes medium-ness, /x/ denotes largeness/greater intensity
Is there a similar gradation for stops: /t/ 'less' vs. /k/ 'more'? (There is no /tʃ/.)
22.214.171.124:42: THE AMERICANIZATION OF MICHÁLKA
1. It's hard to predict how non-English names are pronounced in American English. For instance, I wouldn't have guessed that Czech Michálka [ˈmɪxaːlka] would be pronounced as [miːˈʃɑːkə]. I'm not too surprised by the stress moving to the original long vowel (unstressed long vowels are difficult for English speakers). [miː] may reflect a Czech variety with [i] instead of [ɪ] (English has no nonfinal short [i], so [i] had to be lengthened to [iː]). But how did ch end up as [ʃ]? By analogy with French? Did <l> become silent by analogy with the silent <l> of words like <talk>?
2. Czech vowels look phonologically symmetrical, but that symmetry is lost in phonetic notation (source):
Why are /iː i/ not phonetically the same height? (They are both
equally high in Eastern Moravian
Czech. The [miːˈʃɑːkə] may reflect a Czech variety with /i/ [i]
instead of /i/ [ɪ].)
What motivates /i eː e/ being lower than /u oː o/?
Slovak, on the other hand, at first glance seems to have a nearly symmetrical system apart from the nearly extinct vowel /æ/ which has no long counterpart (source):
||/i/ [i]||/uː/ [uː]
||/oː/ [oː]||/o/ [o]|
|/æ/ [æ]||/aː/ [aː]
But in closer notation, /eː e/ are higher than /ɔː ɔ/ - the reverse of Czech!
||/i/ [i̞]||/uː/ [u̞ː]
|/eː/ [e̞ː]||/e/ [e̞]|
|/oː/ [ɔ̝ː]||/o/ [ɔ̝]|
|/æ/ [æ]||/aː/ [aː]
Can Czech and Slovak native speakers tell each other apart from the
minor differences in their vowels?
3. Why is Japanese sazo(kashi) 'certainly' spelled 嘸(かし) with 嘸 <MOUTH.NOT>? 嘸 has represented several morphemes over time in Chinese. The earliest one known to me is Late Old Chinese *mɨaʔ 'surprised' in 漢書 Book of Han (111 AD). But none mean 'certainly'. Did someone in Japan think that 嘸 would be appropriate for sazo 'certainly' because 無 <NOT> would convey 'no (doubt)' or 'no (choice)': i.e., inevitability and hence certainty?
口 <MOUTH> is a common radical in Chinese grammatical words,
though I can't think of any parallels in Japanese off the top of my
head. (The one made-in-Japan kanji with 口 <MOUTH> that comes to
mind is 噺 hanashi 'story' which isn't a grammatical morpheme or
even an abstract one like sazo 'certainly'. I just found that Wiktionary
has a list of made-in-Japan kanji - the only other one with 口
<MOUTH> is 囎,
a phonetic symbol for the first syllable of the placename 囎唹 Soo).
Da's general Chinese ranks:
|無 = 无||56
|嘸 = 呒
Windows 10's Japanese IME does not include 嘸 as a choice for sazo,
but it does include 嘸かし for sazokashi after さぞかし.
さぞかし: 5.34 million
さぞかし outnumbers 嘸かし by 289 to 1.
4. Wiktionary lists so as the on reading for 囎 even though by definition a made-in-Japan kanji cannot have borrowed Chinese readings. It can, however, have Chinese readings, as a Chinese reader could read it like a component that could be interpreted as a phonetic: e.g.,
噺 hanashi as xīn like 新 xīn
囎 so as zèng like 贈 zèng
Wiktionary also lists a kun reading shō. If 囎 is only for 囎唹
Soo, when is shō used?
126.96.36.199:54: THE EARLY LIFE OF 生 (PART 1)
1. Continuing from "The Two Lives of 生" with a new title:
For a long time - all the way into the late Old Chinese period - 生 belonged to the 耕 rhyme category.
In 詩經 Shijing
about three millennia ago, 友生 'friend' rhymed with 平 'peace',
another 耕 rhyme word (translation by James Legge):
I reconstruct 生 as Old Chinese *sIreŋ and 平 as Old Chinese *CIPreŋ.
And shall a man,
Not seek to have his friends?
Spiritual beings will then hearken to him;
He shall have harmony and peace.
In modern Cantonese, 生 and 平 have pairs of readings that don't rhyme at all:
How did that happen? Stick with me for future parts.
2. I found the Hong Kong Characters section of the 漢語多功能字庫 Multi-function Chinese Character Database when looking for Cantonese 𢫏 kam2 from the previous post. I don't know why I never explored that database before. I was aware of it but just never clicked. I don't have time to look around, so I'm just going to look at the 33-stroke Hong Kong characters:
𡤻 lyun4 '?' < 女 <FEMALE> + 鸞 lyun4
鱻 sin1 < 魚 <FISH> x 3 'fresh' (I guessed this was a variant of 鮮 sin1 'fresh', and I was right)
騐 for 驗 yim6 'to test' (even though 念 nim6 has n-!)
亱 for 夜 ye6 'night' (even though 但 daan3 sounds nothing like ye6!)
Both appear to be graphic errors (facilitated by the rhyming of 驗 and 念).
龗 ling4 'dragon' < 霝 ling4 + 龍 <DRAGON>
Is ling4 an ablaut variant of 龍 lung4
'dragon', or is it merely a different spelling of 靈 ling4
'spirit' used for 'spirit' as a reference to dragons? (Although I cite
Cantonese readings here, the ablaut would be at the Old Chinese level
and involve an alternation of *e and *o, not i
Is 𡤻 a character for girls' names?
3. When I tried to type 靈 líng 'spirit' (Jun
for simplified 灵: #730) for the previous topic, Windows 10's Microsoft
Pinyin IME did not include it in its 94 choices for Mandarin ling
(excluding the lin-graphs
after those 94). I don't understand why some common characters don't
appear in the first batch of options. The first four options are (with
Jun Da rankings)
1. 零 líng 'zero' (#1498)
2. 令 lìng 'order' (#267)
3. 另 lìng 'other' (#620)
4. 霛 líng (variant of 靈; #8145)
I've never even seen 霛 before.
Another case like this is 家 jiā 'home' (Jun Da #55) which is not in Microsoft Pinyin IME's 89 choices for Mandarin jia (excluding the ji-graphs after those 89).
To type 靈, I typed <lingjing> (see topic 4), got 靈境, and deleted the second character 境.
To type 家, I typed <jiazu> 'family', got 家族, and deleted the second character 族.
I shouldn't have to do that.
4. I meant to type <jingling> for 精靈 jīnglíng 'spirit' and ended up discovering another word 靈境 língjìng 'spiritual territory' instead.
Taishūkan's 新漢和辞典 New Sino-Japanese Dictionary defines 靈境 <SPIRIT TERRITORY> as 靈地 <SPIRIT EARTH>: こうごうしい土地 kōgōshii tochi 'godly land'.
kōgōshii has a long ō common in Chinese loanwords like 皇后 kōgō 'empress', but if it were a Chinese loanword, it wouldn't be spelled in kana as こうごうしい. It has a partly semantographic spelling 神神しい <GOD GOD si i> revealing its true etymology. The Japanese root kamu- 'god' was compressed into kō:
*kamu > *kau > *kɔː > kō [koː]
The gō of kōgōshii is the same kō but with
the voicing characteristic of the initial consonants of second elements
of compounds; cf. 神神 <GOD GOD> kamigami 'gods'
< kami < *kamu-i 'god'.
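The compression of kamu- into kō can be replayed as an ordered list of rewrites. This is a rough Python sketch, not a serious model of Japanese historical phonology: treating each stage as a plain substring replacement is my simplification, and the names `KAMU_TO_KOO`, `derive`, and `voice_initial` are made up for illustration.

```python
# Ordered sound changes taken from the derivation above; the rule format
# (plain substring replacement) is an illustrative assumption.
KAMU_TO_KOO = [
    ("amu", "au"),  # *kamu > *kau (loss of m)
    ("au", "ɔː"),   # *kau > *kɔː (monophthongization)
    ("ɔː", "oː"),   # *kɔː > kō [koː] (raising)
]

def derive(form: str, rules=KAMU_TO_KOO) -> list[str]:
    """Return every stage of the derivation, starting with the input form."""
    stages = [form]
    for old, new in rules:
        form = form.replace(old, new)
        stages.append(form)
    return stages

def voice_initial(form: str) -> str:
    """Voice an initial k to g, as in the second element of a compound."""
    return "g" + form[1:] if form.startswith("k") else form
```

Calling `derive("kamu")` walks through the four stages, and `voice_initial` applied to the last stage gives the gō of kōgōshii.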
5. Korean 간직 kanjik 'storing away' sounds like a Chinese loanword but has no obvious Chinese etymology. Martin et al. (1967: 41) suggest a Sino-Korean source *看直 'look straight' which would be pronounced kanjik in Korean. It's a perfect phonetic match, but I don't see how it can semantically fit.
6. Today I learned that no get means 'does not have' in West African Pidgin English as well as in Pidgin (Hawaii Creole English/HCE):
25% of young pipo no get one single friend
I say pipo too.
HCE has about 600,000 speakers (Sakoda and Siegel 2003: 1), whereas West African Pidgin English has about 75 million! The largest French-based creole seems to be Haitian Creole with 12 million speakers.
Here's "The absolute beginners' guide to [West African] Pidgin" by speaker Kobby Ankomah-Graham:
Pidgin is defined by its practicality. Fluency will reduce how much you have to pay for cab fares or market tomatoes. [...] Advertising in Pidgin – once unthinkable – is now commonplace.
That's not yet the case for HCE in Hawaii, though we do have HCE greeting cards in stores.
188.8.131.52:58: TANGUT DATABASE 3.2
1. Version 3.2 of my Tangut
database has nine corrected readings thanks to Andrew West. Details in the
changelog on sheet 2. More soon.
2. Zaytsev compares the first and second printings of one of the most
important books I have ever used: Eric Grinstead's Analysis of the
I borrowed the second printing from the University of Hawaii library in
1994 and got my own copy a few years later. I have never seen the
first printing before:
My copy of the 1st printing has 3 inserts (2 of them are glued) with info in Danish. This allows us to know some more facts related to Grinstead’s biography: his PhD thesis was submitted to the Fac. of Philosophy at the U. of Copenhagen on 29 Nov 1971, and defended on 30 Jan 1973
That information has been incorporated into Grinstead's Wikipedia entry. I hope more facts about Grinstead emerge in the years to come.
It is a shame that no one in Copenhagen picked up his torch.
3. Could Tangut 𗼧 6037 1kew2 'to instruct' be a loan from Chinese 敎 'to instruct'? 1kew2 resembles keu, the Sino-Japanese reading of 敎 borrowed from southern (not northwestern!) Chinese c. the 5-6th centuries. A few modern Yue varieties have keu-like readings. (But Yue in the south is only distantly related to the northwestern Chinese known to the Tangut.)
There are two problems.
First, normally the rhyme of 敎 corresponds to Tangut -o2, not -ew2 (Gong 2002: 375).
Second, Li Fanwen's (2008: 949) Chinese glosses for 6037 do not match his English gloss 'to instruct':
誥 'admonition, to admonish'
詔 'decree, to decree'
The Combined Homophones-Tangraphic Sea entry for 6037 - with the only attestations of 6037 that I know of - may point to 'decree' since 6037 is something an emperor does.
184.108.40.206:57: MYŎNGDONG THEATER: LET'S MEET AT WALKERHILL
1. The photo I chose to illustrate my last post is of 明洞劇場 Myŏngdong Theater from 워_커힐에서 만납시다 Wŏ̄khŏhil esŏ mannapshida (Let's Meet at Walkerhill, 1966). (Walkerhill is one word; it's the name of this hotel.)
The movie is on the
韓國映像資料院 Korean Film Archive's YouTube channel with Korean and
English subtitles - great for listening comprehension practice!
Let's look at the opening which is entirely in hanja with two exceptions.
0:55: The company name 株式會社東南亞映画公司 Chushik hoesa Tongnam A yŏnghwa kongsa 'Stock Company Southeast Asia Movie Company' is interesting in two ways. It combines the Japanese term 株式會社 'stock company' with the Chinese term 公司 'company'. And why is a Korean company called Southeast Asia Movie Company?
會 resembles 曾 with a closed 八 on top instead of 人 and an extra
horizontal stroke joined to the central vertical stroke.
The left-hand vertical strokes of 映 and 司 are elongated so 日 resembles 阝 and 口 resembles a squarish P.
画 (a simplification of 畵) has a vertical line that goes all the way to the bottom horizontal line.
The ㇆ stroke of 司 resembles the first two strokes of 可.
1:01: 泰 appears as 𣳾.
式 has its dot moved to the top left.
社 has a hook on the bottom left.
給 has the same closed 八 on top instead of 人 as 會 in this frame and 0:55.
1:06: 워_커힐 Wŏ̄khŏhil 'Walkerhill' has an underscore vowel length marker. There is no such marker after khŏ, even though Japanese would perceive Walker as ウォーカー Wōkā with two long vowels.
1:12: 7-stroke variant of 㓰, a simplification of 劃.
1:27: 崔 has a long bottom horizontal stroke extending leftward.
李 has a 子 resembling the bottom of 雪.
1:35: I feel sorry for the people whose names are illegible here. I
would write about their names if only I could make them out!
1:44: 玉 has a mirror-image dot.
后 is written entirely with straight lines. I didn't know it could be used in names.
2:00: The 重 of 勲 has a bottom horizontal line so long that it extends under the left half of 力.
梁 is missing its right dot, and the top right component looks like
2:15: The 卜 of 朴 resembles hangul ㅏ.
2:33: I've never seen 変 in a name before. The left and right dots of the top element 亦 are mirror-imaged and the central strokes are straightened.
2:47: 永 is written as 水 with 二 on top, resembling Khitan small script 004
with an extra horizontal line.
Blink and you'll miss what I assume is the surname 徐 based on the Korean Film Archive's credits listing.
투위스트 Thuwisŭthŭ 'Twist' is the only other hangul word in the
2:57: 具 is missing one horizontal line, and two of the horizontal lines don't go all the way across to join the vertical lines. Those nonjoining lines are also in 貞 above, and 具, 貞, and 南 all have open left corners. In handwriting there can be a slight gap in that position, but in the stylized hand of these credits, that gap is bigger than usual.
The of 妊 looks like an angular ナ floating over an equally angular メ.
3:03: The 金 of 錫 has a bottom resembling 止 instead of Khitan small script 295 <p>:
The dot of 太 is moved leftward and overlaps with the lower left leg of 大.
The top left 丿 of 舞 has been reduced to a dot on top.
辰 is subtly different - the components under 厂 look like Γ atop Khitan small script 028 <sh>:
3:16: 星圃 Sŏng-pho 'Star-field' (if I'm reading correctly) is a nice name.
3:23: The top right of 監 looks like a horizontal line over ハ.
The ヰ of 韓 has a first horizontal stroke extending further to the left than the second one beneath it.
And skipping to the very end ...
1:36:34: 謝 has 扌 instead of 寸. I have never seen 扌 on the right of
any Chinese character before; it is the left-hand variant of 手.
2. Surprise! The
Dictionary of Chinese Character Variants has an entry for 媤 in its
appendix of made-in-Korea characters even though it already has an
entry for 媤 as a Chinese character.
3. Thirty years ago I believed I knew all the 常用漢字 jōyō kanji. But there were only 1,945 jōyō kanji then. 196 more were added (and 5 were subtracted) in 2010. I've seen all of the 196 before but one: 錮. Great, a Japanese ninth-grader knows that character but I don't.
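For the record, the arithmetic behind the post-2010 total is trivial. Here is a tiny Python check using only the three figures above; the variable names are my own.

```python
JOYO_BEFORE_2010 = 1945  # count before the 2010 revision (from the text)
ADDED_2010 = 196         # kanji added in 2010
REMOVED_2010 = 5         # kanji removed in 2010

joyo_after_2010 = JOYO_BEFORE_2010 + ADDED_2010 - REMOVED_2010
print(joyo_after_2010)  # 2136
```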
I suspected 固 had been substituted for it, and I was right. Wiktionary explains that until 錮 became a jōyō kanji, the word 禁錮 kinko 'imprisonment without labor' was still written as 禁錮 in the criminal code, but in laws enacted (shortly?) after the announcement of the tōyō kanji (the predecessor of the jōyō kanji which excluded 錮), 禁錮 was written in mazegaki 'mixed writing' as 禁こ with hiragana こ <ko>. Newspapers, however, used 禁固 with kakikae: substituting the high-frequency tōyō and later jōyō kanji 固 for its homophone 錮.
大修館新漢和辞典 Taishūkan's New Sino-Japanese Dictionary gives only two words with 錮, 禁錮 and 錮疾, an alternate spelling of 痼疾 'chronic disease'. How common are these words? Google site:.jp stats:
禁固 1.14 million
It's looking to me as if 錮 is required in schools solely for the word 禁錮 (which is more frequently written 禁固).
Frequency stats for 錮 from Dmitry Shipka:
錮 occurs just once in the entire Twitter corpus! I don't know how
meaningful #4490 is, since there's no real sense in ranking it higher
or lower than any other characters with only one instance in Twitter.
I think any kanji ranked lower than #2000 are not really worth
learning for most people. 錮 is ranked way lower than #2000 except in
the news, presumably due to reports about people imprisoned without
labor. The sort of thing I don't read. So I don't feel too bad about
not knowing it even existed until now.
I like Gakken's A New Dictionary of Kanji Usage (1982) which is frequency-based and includes frequent kanji regardless of whether they are jōyō or not. (It has an appendix of jōyō kanji that weren't frequent enough to make the cut for a main entry.) It doesn't include 錮. But would a new edition include it in the main section, or would it be in the appendix?
9.28.1:36: Just found that Windows 10's IME favors 禁固 over 禁錮, specifying that the latter is a legal term. And 禁こ is not in the list of potential spellings for kinko.
220.127.116.11:57: MYŎNGDONG THEATER (PART 1)
All topics from yesterday that I didn't have time for last night:
1. I'm afraid to look at this page of old 明洞劇場 Myŏngdong Theater-related images because it might be packed with variant hanja.
Before getting to those, let's look at the two maps at the top of the page:
1960년 명동극장 지도 1 (1960 Myŏngdong Theater Map 1)
1972년 명동극장 지도 2 (1972 Myŏngdong Theater Map 2)
(I can't figure out how to link to the maps, so I've copied their
titles which can be searched for.)
The 1960 map is entirely in hanja except for the non-Chinese word
유네스코 Yunesŭkho 'UNESCO' which can't be written in hanja and 극장 kŭkchang,
which can - but isn't, perhaps because its hanja 劇
場 would be almost illegible in a small space. The tiny characters are
not well-written and hard to read: e.g., I initially misread 市公舘 shigonggwan
'city public hall' as 市25舘 which makes no sense.
The 1972 map, on the other hand, is entirely in hangul. It is dangerous to make big claims based on just two items, but I cannot help but think the difference between the two maps reflects the shift away from hanja (which was far from complete in 1972 - the 1972 movie ads toward the bottom of the page still have hanja).
It's also telling that the two following announcements about the theater from 1953 and 1958 have had to be completely transcribed in hangul - something that wouldn't be necessary if hanja-heavy text were still the norm in 21st century Korea.
I haven't actually gotten around to discussing the variant hanja on
that page yet. Later.
2. When I started studying linguistics thirty years ago, I was put
off by an exercise whose answer was that Korean /s/ voices to [z]
intervocalically. No, it doesn't, which is why foreign z is
borrowed as /c/ which voices to [dʑ] intervocalically: e.g.,
브라질 <p.u r.a c.i.r> [puradʑil] 'Brazil'.
Or does it? T. Cho et al. (2002: 212) "in fact observed that about 46% of tokens of /s/ were fully voiced in this position". I have never heard [z] in Korean. Is this [z] a recent innovation?
Historically, *s did become [z] in intervocalic position, and that [z] then lenited to zero, which is why there are modern alternations such as
낫다 <nas ta> nat-ta get.better-FIN 'to get better'
나았다 <na Øat ta> naØ-at-ta get.better-PAST-FIN 'got better'
in which earlier *s survives as [t] before consonants but
vanishes between vowels.
Is history repeating itself?
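The alternation above can be written as a two-way rule on the stem-final consonant. This is a rough Python sketch in the romanization used above; the function name `attach` and the vowel inventory are my own assumptions, and it is only meant for stems of the 낫다 type (not every s-final verb stem in Korean loses its s before vowels).

```python
VOWELS = set("aeiouŏŭ")  # assumed vowel letters of the romanization

def attach(stem: str, suffix: str) -> str:
    """Join an s-final irregular stem and a suffix, hyphenating the boundary."""
    if not stem.endswith("s"):
        return f"{stem}-{suffix}"
    if suffix[0] in VOWELS:
        # Earlier *s > [z] > zero between vowels; Ø marks the lost consonant.
        return f"{stem[:-1]}Ø-{suffix}"
    # Before a consonant, earlier *s survives as [t].
    return f"{stem[:-1]}t-{suffix}"
```

`attach("nas", "ta")` gives nat-ta and `attach("nas", "at-ta")` gives naØ-at-ta, matching the two forms above.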
I am very skeptical of Wikipedia's claim that intervocalic /s/ becomes
long [sː].
I have never heard that /s/ sounding like Japanese ss.
3. I have no idea why the bound noun chabal in the expressions
자발(머리) 없다 chabal (mŏri) ŏpta (lit. 'X [head] lack')
자발 적다 chabal chŏkta (lit. 'X few')
both 'to be quick-tempered, impatient, restless' (Martin et al. 1967: 1379; is chabal 'patience'?)
has a variant 재발 chaebal. I'd expect a Cae-variant if the second syllable had i (e.g., 애기 aegi < 아기 agi 'baby'). ae is from *ai. But obviously there is no i in chabal. And a shift in the opposite direction (chaebal > chabal) has no precedent.
4. Martin et al.'s 1967 dictionary says 媤 <HUSBAND'S.FAMILY>
is a "Korea-made character", but Wiktionary gives
non-Korean readings for it: Mandarin sī, Cantonese si1,
and Japanese shi. The earliest attestation I can find is in 集韻 Jiyun (1037)
which lists a variant 㚸.
The Korean reading 시 shi is unusual because Mandarin sī should
correspond to Korean sa. shi could theoretically be a borrowing
of an Early Middle Chinese *si rather than a c. 8th century
Late Middle Chinese *sz̩ that would have become the expected sa.
However, I can't find any evidence of 媤 before the 11th century. And if
媤 had existed in the Early Middle Chinese period, it would have been *sɨ
with a central vowel, not *si with a front vowel.
Moreover, I can't understand what would motivate reading 媤 as shi. Its phonetic is 思 sa, and no other characters with 思 are read shi. No, wait, no other common characters with 思 are read shi. There is a character 緦 <SACK.CLOTH> also read shi. 東國正韻 Tongguk chŏngun (Correct Rhymes of the Eastern Country, 1448) gives the prescriptive reading of 緦 as ᄉᆡ /sʌj/. Normally 15th century /ʌj/ becomes modern ae, but maybe in this instance it became i. 媤 is not in Tongguk chŏngun, but if it were, would its reading have been given as /sʌj/?
The correspondence of /ʌj/ to Mandarin i is unusual ...
In Old Chinese, 緦 was *sə. If 媤 had existed in Old Chinese,
it too would have been *sə. That *sə should have
developed into Late Middle Chinese *sz̩ which would be borrowed
into Korean as /sʌ/ which would become modern sa.
But what if 緦 and 媤 had an Old Chinese variant *CAsə? (Perhaps such a sesquisyllable was actually original, and *sə is but a reduction.) The *A would condition the warping of *sə:
*CAsə > *CAsʌɰ > *sʌj
The end result matches the /sʌj/ for 緦 in Tongguk chŏngun. shi for 緦 and its homophone 媤 would then reflect variation within Chinese rather than a Korean-internal random change.
18.104.22.168:57: VARIANT HANJA C. 1963
No time to revise my Tangut database today, but Tangut turns out to be marginally relevant in a most unexpected place: a set of South Korean newspaper movie ads from c. 1963 (h/t ╹ω╹りなれはあとい@衛兵るた)
with hanja variants that bring to mind Juha Janhunen's hypothetical
Parhae script. (No, I'm not drawing a direct line between the two.
Parhae characters did not survive into the 1960s. I'm merely pointing
out two sets of Chinese character variants from the greater Korean
1. This 人 + two strokes variant of 人 <PERSON> in an ad for 夫婦條約 Pubu choyak (The Husband-Wife Contract, 1963) reminds me of Eric Grinstead's (1972) hypothesis that the Tangut character component 𘢌 <PERSON> (in one out of five characters!) was derived from that variant.
Here's another example in an ad for 傷한 갈대를 꺾지 마라 Sanghan kaldaerŭl kkŏkkchi mara (Do Not Break a Damaged Reed, 1962).
A similar variant of 文 (Later Han example here) reminds me of Khitan small script character
which was pronounced something like [je] judging from its use in Chinese loans. Or was it? Kane (2009: 327) pointed out that in native words, 327 combines with a-graphs: e.g.,
327-123 <327.ar> (Xu 18.18).
327-261-051-189-123 <327.l.gha.a.ar> (Xu 33.2)
[je] coexisting with [a] is very un-'Altaic'. Did 327 have two readings, one for Chinese loans and another for Khitan words? Or are we still far from understanding how Khitan vowel harmony worked?
Dotless 文 326 (reading unknown) combines with both a- and e-graphs - again, very un-Altaic: e.g.,
326-100 ~ 326-361 <326.en>
- 100 and 361 are variants
That last word looks like the feminine counterpart of masculine <327.l.gha.a.ar>. So is one word misspelled? In other words, was that word originally spelled with 326 or 327? Did 326 and 327 originally represent a harmonic pair? Was one [ja] and the other [je]? Compare with how Manchu differentiated a and e [ə] with a dot for the latter centuries later. The use of the dot in the Khitan small script to indicate grammatical masculine gender should also be taken into consideration when interpreting dotted 327.
2. This simplification of 演 as ⿰氵⿱𡧇儿 in an ad for 자이안트 Chaianthŭ = Giant (1956) is new to me. (A clearer comparison of the two.) ⿰氵⿱𡧇儿 is not in the Dictionary of Chinese Character Variants entry for 演.
Notice how the vertical part of ㄴ <n> in the logo for 자이안트 <ch.a Ø.i Ø.a.n th.ŭ> is barely visible. But the letter has to be ㄴ <n> because ㅡ <ŭ> is not possible in that position.
絕讚 chŏlchan 'highest praise' is written almost unrecognizably:
⿰⿱凵扌⿱一丷⿱(己) = 絕
⿱凵扌 doesn't look much like 糸
⿰言⿱(先)见 = 讚
the two 先 have been reduced to a single 先-like character with four strokes (no top left 丿, and the vertical line and bottom left 丿 are one stroke)
3. In the ad for Let No Man Write My Epitaph (1960) at the bottom left of this image, dots separate foreign names even though spaces would suffice. Such dots are commonly used to separate foreign names in Japanese, but I've never seen the practice in Korean before.
The character _ is used to indicate vowel length, much like Japanese ー (but note the different size and placement relative to the base line - and how it can go under the word divider dot):
진·세바그 <ch.i.n-·s.e p.a. k.ŭ> Jīn Sebagŭ [tɕiːn sʰebagɯ] 'Jean Seberg'
샤_리·윈타 <s.ya_r.i·Ø.u.i.n th.a> Shāri Wintha [ɕʰaːri wintʰa] 'Shelley Winters'
the name was Koreanized as if it were Shirley Winter; the use of a for English /ɚ/ here and in Sebagŭ is an older practice probably influenced by Japanese; modern Korean converts English /ɚ/ into ŏ, an option absent from Japanese
9.26.0:48: Next to that ad is an all-text ad for 다니엘 Taniel (Daniel; Le Puits aux trois vérités, 1961) in Korean. It too has dots as name dividers. But its text is vertical, so its vowel length marker is also vertical 丨: e.g.
<k.ŭ ro-t.ŭ> Kurōdŭ [kuroːdɯ] 'Claude'
Nowadays foreign /l/ and /r/ are borrowed differently in Korean in medial position (e.g., Claude is now 클로드 Khŭllodŭ), but at this point they seem to be both borrowed as Korean /r/ even when /r/ has a mismatching allophone: e.g., Morgan as 몰간 <m.o.r k.a.n> /morkan/ [molgan] (now 모간 <m.o k.a.n> is preferred). Another example is the aforementioned Shirley as 샤_리 <s.ya_r.i> /syaːri/ [ɕʰaːri], now 셜리 <s.yŏ.r r.i> /syŏrri/ [ɕʰɔlli].
4. 禮 is normally simplified as 礼, but this ad for 江華道令 Kanghwa Toryŏng (The Reluctant Prince, 1963) has 𥘇 with an extra stroke.
Kanghwa Toryŏng doesn't mean 'The Reluctant Prince'; it is another name for King 哲宗 Chŏlchong (r. 1849-1863).
22.214.171.124:56: TANGUT DATABASE 3.1
1. Thanks yet again to Andrew West for submitting corrections for my Tangut database. Version 3.1 has
a sort column to facilitate restoring the original row order
notes on cases when Li Fanwen's 1997 and 2008 dictionaries have
different numbers for the same character
a corrected reading for 𗂰 2li4
'west' (the Tangut translation of Andrew's name! - more to come).
2. What is the etymology of Cantonese aa6 'ten' in '31'-'99' (excluding the tens: '40', '50', etc.)?
三十 saam1 sap6 'three ten' = 'thirty'
卅呀 saa1 aa6 X 'thirty aa6 X' = 'thirty-X'
卅 saa1 is a contraction of the words for 三 'three' and 十
'ten' written as three 十 <TEN> fused together (into what
almost looks like <FOUR> in the Khitan large script). I
would have expected saam1 and sap6 to fuse into ˟saap.
The loss of -p is irregular in Cantonese.
Tone 6 points to a *voiced initial, presumably *ŋ- (呀 has a *ŋ-phonetic
牙). I wonder if aa6 is a linking particle related to the aa3
(not aa6!) in this list of items cited on Wiktionary:
jan1.wai4 gwok3 aa3, kyun4 aa3, wing4 aa3, gaai1 hai6 nei5 jau5, zi3 dou3 sai3 sai3, sing4 sam1 so2 jyun6.
because kingdom aa3, power aa3, glory aa3, all is you have, reach to generation generation, sincere heart NMLZ wish.
'For thine is the kingdom, and the power, and the glory, for ever. Amen.' (1882 translation of the Gospel of Matthew)
But the tones are different: the linking particle is aa3, not aa6. Did tone 6 spread from 'ten' to the linking particle?
*saam1 sap6 aa3 > *saam1 sap6 aa6 > saa1 aa6?
3. I couldn't help but think Kage Baker was named after Japanese 影 kage 'shadow'. But her name is disyllabic:
Her unusual first name (pronounced like the word cage) is a combination of the names of her two grandmothers, Kate and Genevieve.
4. I'm surprised this concept came from the pen (or should I say word processor?) of a language teacher:
The cyborgs can recognize, understand, and speak any known human language instantly, including local variants and dialects. Their lingua franca is called "Cinema Standard", presumably the English spoken in 20th century movies, with which most Company operatives are obsessed.
The words are Wikipedia's and not Kage Baker's. Interestingly, their lingua franca wasn't Elizabethan English, which was her specialty. I suppose Cinema Standard was for the reader's convenience.
TANGUT DATABASE 3.0
Thanks again to Andrew West for submitting corrections for my Tangut database. Version 3 has
a column for Li Fanwen 1997 numbers
corrected entries for characters 5995-6074
no more ghost entries for characters 5995a, 5996a, 5997a, 5998a, 5999a, and 6075-6080
See the changelog on sheet 2 for details. Even more corrections soon.
126.96.36.199:57: TANGUT DATABASE 2.1
Thanks to Andrew West for submitting corrections for my Tangut database. Version 2.1 has corrected readings for ten characters (see the changelog on sheet 2). More corrections soon.
HIIRAGI ISN'T HOLLY
1. And these four Khmer characters aren't (just) for Pali:
ឩ <°ū> is for Sanskrit as well as Pali.
ឨ <°û> is for native Khmer - I think.
19.9.17:35: I could be totally wrong. I imagine the late Philip Jenner thinking this site is ridiculous, though I also hope he might have appreciated interest in his beloved language. The entry on ឨ <°û> at the link asks the dumb question,
ឝ <ś> is for Sanskrit (there is no ś in Pali)
ឞ <ṣ> is for Sanskrit (there is no ṣ in Pali)
Last night I noticed khmerfonts.info said,
The last 4 characters [i.e., <°ū °û ś ṣ> above] are used only for pali.
They are rare, but that doesn't mean they're for Pali.
2. I found this post by No-sword (Matt Treyvaud) while trying to find more information about Flora Best Harris:
Hiiragi is often pressed into service as a Japanese translation for "holly" (in the Christmassy sense), but in fact it's a different plant: Osmanthus heterophyllus, a.k.a. "false holly". Completely different order from actual holly.
I'll never think of 柊 hiiragi as simply 'holly' again. Though I was in good company - that's what Flora Best Harris called it in English. Who was the first to translate hiiragi as 'holly'?
9.9.9:39: Is 'holly' in fact a translation of another, earlier (Dutch?) translation of hiiragi? Now I'm curious about early Dutch-Japanese lexicography. Early Portuguese-Japanese lexicography gets more attention.
PRACTICAL SCHOLARSHIP: MASTER'S IN TYPEFACE DESIGN
1. Fonts are the blood of the digital world. We can't read on
machines without them. And all fonts have typefaces. As ZURB puts it,
A font is a container of type.
A typeface is the design of a set of characters — letters, numbers and punctuation.
In other words, fonts can't be empty containers. There are no fonts
without designs. And as the Department of
Typography & Graphic Communication at the University of Reading
explains, typeface designs
rely on a deep web or of historical, cultural, and technical understanding, as well as plain-old form-making skills. From the impact of traditional forms of writing, the developments in the technologies of type-making and typesetting, the typeface designer needs to be aware of how texts are transmitted and shared in each society, and respond to the editorial practices and conventions of each market.
That university's MA Typeface Design (MATD) program trains students
to tap into that "deep web", to be capable of producing scholarly
knowledge as well as applying that knowledge to the practical task of
creating beautiful texts in many scripts.
I've been slowly reading Zachary Quinn Scheuren's MATD dissertation "Khmer Printing Types and the Introduction of Print in Cambodia: 1877-1977".
I found that last night while trying to Google whether Franklin Huffman's coauthor Im Proum's surname was Im or Proum. Google Scholar treats Proum as the surname, but I still don't really know.
2. This morning I found this long list of
samples of predigital Khmer script at khmerfonts.info whose front page is a list of samples of
digital Khmer fonts.
And tonight I found khmerfonts' page on how to make a
Khmer font. (But see my next entry!)
3. Not long ago I wrote a post showing my ignorance of Old Khmer script. I no longer have an excuse to be in the dark. This morning I found SEAlang's Old Khmer images page which allows me to see Old Khmer texts. But at the moment I can't figure out how to get these features to work:
Clicking an image will load either:
analysis of the inscription;
a map showing the original location of the inscription, or
a corpus-style listing of the text
depending on which button in the upper right is blue.
But maybe I'm just a prisoner of my computer illiteracy (exemplified by the deliberately primitive design of this website - my philosophy is not to do anything I don't understand ... which doesn't explain the innumerable forays into the unknown [for me] on this blog ... so maybe that's not my philosophy).
Fortunately, I can check my readings of the texts using the Corpus of Khmer Inscriptions. Unfortunately, only inscriptions with Jenner's readings are up, so I'm out of luck with K.27 which isn't one of them. Looking at K.28, I see virāmas which look like superscript dashes. When did they fall out of favor in Khmer?
4. I've also been slowly reading Meredith McKinney's "Classical Prose" in The Routledge Handbook of Literary Translation (Kelly Washbourne and Ben Van Wyke, eds.). McKinney mentions Flora Best Harris, an early translator of Japanese into English. I wonder how Harris learned Japanese in those days.
188.8.131.52:45: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 14)
Indic scripts generally have two types of vowel symbols:
independent symbols for word-initial vowels
dependent symbols for vowels after consonants
The Khmer script has both types of symbols, though they are not quite used the way one might expect:
Khmer has no initial vowels; all vowels must be preceded by a consonant.
So in theory Khmer should get by with only the dependent symbols.
But in reality Khmer has both symbols.
One might then predict that Khmer uses the independent symbols for initial vowels in Indic loanwords. Such words are pronounced with an initial glottal stop in Khmer, but they had zero initials in Sanskrit and Pali.
That prediction is true ... but Khmer also uses the independent symbols for some (not all) native [ʔ]-words: e.g., ឲ្យ <°o₂ya> [ʔaoj] 'to give' with independent ឲ <°o₂>¹ and a subscript ្យ <ya>. (I use <°> to indicate independent vowel symbols in transliteration.) [ʔaoj] can also be written with a dependent vowel ោ <o> and on-line យ <ya> as អោយ <ʔoya>. The two spellings coexist side by side²:
Lastly, there is the complication of the four independent
syllabic liquid symbols: <r̥ r̥̄ l̥ l̥̄>. Obviously consonants
are not vowels, but in Sanskrit they behave like vowels and
consequently have both independent and dependent symbols in some Indic
scripts: e.g., Devanagari (below):
ऋद्धि <°r̥ddhi> r̥ddhi- 'supernatural power'
अमृत <amr̥ta> amr̥ta- 'immortal'
Khmer has no syllabic consonants, so Sanskrit syllabic liquids were borrowed as liquid-vowel sequences. Initial Indic syllabic liquids are written with independent syllabic liquid symbols, whereas there is no special symbol for noninitial syllabic liquids so they are written as pronounced (i.e., as consonant-vowel sequences). Compare the Khmer spellings of these Sanskrit words with their Devanagari spellings.
ឫទ្ធិ <°r̥ddhi> ~ រឹទ្ធិ <rïddhi> [rɨt] 'power'
(optionally spelled phonetically with a consonant-vowel sequence)
អម្រឹត <amrïta> [ʔɑmrɨt] 'immortal'
The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the independent vowel symbols for many (not all) vowels, even though it would be possible to write all English word-initial vowels as <ʔa> + vowel character combinations. Some of the phonemic assignments of independent vowel symbols surprised me:
||dependent vowel symbol||transliteration||modern Khmer
||<ā> (not the inherent vowel <a>!)||aː
||ʔə, ʔəj, ʔɨ||ិ
||<i>||e, ə||i, ɨ||/ɪ/
||ʔo, ʔu, ʔao||ុ
||ʔou, ʔuː||ុអ់||<u'> (not <°ū>!)||-
||ej, ə||eː, ɨ||/ɛ/|
||<yu> (not <yū>!)||ju
||<va> (not <va'>!)||vɔː||ុះ
(9.7.21:59: Added the next six paragraphs and greatly expanded the table above.)
I use hyphens to indicate that <'> and <"> have no sound values of their own:
<'>: shortens a preceding <Ca> and <Cā>
<">: indicates that <pa> is to be read as [p] rather than as [ɓ], its current default value; also indicates that vowels after voiced sonorant symbols are to be read as if they were preceded by *voiceless consonants
Other hyphens indicate that a symbol combination is not used in
Khmer as far as I know.
Note how the phonemic assignments of ASB independent vowels and their dependent counterparts do not always match: e.g., independent <°ǔ> corresponds to dependent <ū> since Khmer has no dependent <ǔ>.
Some ASB dependent vowels have no independent counterparts. I presume they are written as <ʔa> + vowel character combinations.
ASB takes advantage of the existence of two <°o>-characters to
assign them to different vowels.
Once again (see part 11), ASB <ḥ> represents /j/.
ASB regards <yu> and <va> as independent vowel symbols.
¹9.7.21:49: Khmer has two homophonous independent
vowel symbols for <°o>. I transliterate them as ឱ <°o₁> and
ឲ <°o₂>. Their Unicode names are KHMER INDEPENDENT VOWEL QOO TYPE
ONE and KHMER INDEPENDENT VOWEL QOO TYPE TWO. Huffman (1970: 118) says ឱ
<°o₁> "is the more common of the two", so I'm not surprised by the
numbers in the Unicode names.
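The Unicode names in this footnote can be checked with Python's standard unicodedata module, which reports them with a Q- before the vowel (QOO). What follows is only a minimal sketch: the helper name classify is hypothetical, and the two spellings of [ʔaoj] are entered as escape sequences so that the code points are explicit.

```python
import unicodedata

def classify(ch):
    """Label a Khmer character by the category stated in its Unicode name."""
    name = unicodedata.name(ch)
    if name.startswith("KHMER INDEPENDENT VOWEL"):
        return "independent vowel"
    if name.startswith("KHMER VOWEL SIGN"):
        return "dependent vowel"
    if name.startswith("KHMER LETTER"):
        return "consonant"
    return "other"

# The two spellings of [ʔaoj] 'to give' discussed in the main text.
spelling_independent = "\u17b2\u17d2\u1799"  # <°o₂> + subscript marker + <ya>
spelling_dependent = "\u17a2\u17c4\u1799"    # <ʔa> + dependent <o> + <ya>

for spelling in (spelling_independent, spelling_dependent):
    for ch in spelling:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {classify(ch)}")
```

The first spelling contains an independent vowel and no dependent vowel; the second has the reverse, which is the contrast the main text describes.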
²I was taught ឲ្យ <°oya> which is the main spelling in the online editions of Headley's dictionaries and the only spelling given in Huffman's 1970 textbook and Jacob's 1974 dictionary. Ehrman's grammatical sketch in Contemporary Cambodian has ឲយ <°oya> (with full-sized rather than subscript <ya>) as the only spelling. Has the regular spelling <ʔoya> become popular in recent years?
³9.7.12:54: I think /ɛɪ/ in the ASB key to independent vowel symbols should be /eɪ/ as in the ASB key to dependent vowel symbols.
184.108.40.206:59: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 13)
The Alternate Script Bureau's (ASB) proposal for writing English in the
Khmer script is based on a nonrhotic dialect. Thus it has symbols
for vocalic sequences corresponding to /Vr/-sequences in rhotic
||Khmer script||transliteration||Khmer script||transliteration||Khmer script||transliteration|
Question marks indicate my guesses for sequences I couldn't find in
Huffman and Proum (1983). (here is on p. 43 and cure
is on p. 44 of H&P.) The spelling <īe> represents [əː] after
*voiced consonants in Khmer (e.g., <y>). Does Huffman pronounce cure
I'm surprised there's no ASB symbol for /ɛə/ as in square. Perhaps the ASB dialect has no /ɛə/. Did it shift /ɛə/ to /ɪə/? The ASB /ə/-vowel subsystem is almost symmetrical except for the lack of a /jɪə/:
<ḥ> has no consistent function in the ASB system; it
corresponds to /ə/ above and to /j/ in <uaḥ> for /ɔj/.
I would have expected /ə/ to be <uʔa'> instead of <uḥ>.
(<ua> isn't available because ASB already assigned that to
Modern standard Khmer is also nonrhotic. However, unlike nonrhotic English varieties, *-r has been lost without a trace in modern standard Khmer: e.g.,
ការ <kāra> /kaː/ 'work'
គូរ <gūra> /kuː/ 'to draw'
ពីរ <bīra> /piː/ 'two'
(Examples from Huffman 1970: 20.)
I used to think there were a few exceptions ending in <-ăra> and Sanskrit <-arCa>: e.g.,
ជ័រ <jăra> > [cɔə] 'resin'
ធម៌ <dharma> > [tʰɔə] 'dharma'
(Examples from Huffman 1970: 50.)
I regarded the final [ə] as a trace of /r/, but it's not - [ɔə] is the regular reflex of short *a (via *ɔ) after *voiced consonants and before *nonvelar codas. Khmer words could not end in *short vowels. It seems that *ɔ-breaking occurred before *-r was (recently?) lost:
|stage 2: *a-raising after *voiced consonants||*ɟɔr||*dhɔr||*mɔn|
|stage 3: *ɔ-breaking
|stage 4: *r-loss
9.6.0:10: I have left the consonants for 'resin' and 'dharma' unspecified in stage 3 since I do not know whether obstruent devoicing preceded or followed stage 3.
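The ordering argument can be illustrated with a toy rule cascade. This is a sketch under simplifying assumptions of my own: forms are plain strings starting from the stage 2 forms in the table, *ɔ-breaking is modelled as *ɔ > *ɔə before the codas r and n only, and the function names are hypothetical. Initial consonants are left as in stage 2, in line with the note above, and the output for *mɔn is just what the toy rule produces, not a form taken from the table.

```python
NONVELAR_CODAS = ("r", "n")  # assumption: only the codas found in the three examples

def break_o(form):
    """Stage 3: *ɔ > *ɔə when a nonvelar coda follows."""
    for coda in NONVELAR_CODAS:
        if form.endswith("ɔ" + coda):
            return form[:-len(coda)] + "ə" + coda
    return form

def lose_r(form):
    """Stage 4: loss of final *-r."""
    return form[:-1] if form.endswith("r") else form

stage2 = ["ɟɔr", "dhɔr", "mɔn"]

# Order proposed in the text: breaking first, then r-loss.
proposed = [lose_r(break_o(f)) for f in stage2]
# Reverse order: r-loss removes the coda, so breaking can no longer apply.
reverse = [break_o(lose_r(f)) for f in stage2]

print(proposed)
print(reverse)
```

With the proposed order the r-final words keep a final ɔə, matching the vowel of [cɔə] and [tʰɔə]. With the reverse order they would end in a short vowel, which the text says was not possible in Khmer.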
220.127.116.11:59: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 12)
1. Here are the last two vowel symbols¹ in the
Alternate Script Bureau's (ASB) proposal for writing English in the
Khmer script with their counterparts in Huffman and Proum (H&P;
1983) and my own preferences:
||Khmer script||transliteration||Khmer script||transliteration||Khmer script||transliteration|
In modern Khmer, <o> is pronounced [ao] after *voiceless consonants and [oː] after *voiced consonants. H&P must have the first phonetic value in mind.
In modern Khmer, <au> is pronounced [aw] after *voiceless consonants and [ɨw] after *voiced consonants. H&P must think English /aw/ is closer to Khmer [ao] than Khmer [aw].
H&P and I have the historical sound values of Khmer symbols in
mind. In earlier Khmer, there was no [ao], so <au> would have
been the best choice for English /aw/.
H&P do not have a special symbol for /juː/, so I speculate they would write /juː/ with their symbols for /j/ and /uː/.
ASB uses the short neutral (i.e., nonpalatal and nonlabial) vowel symbol ឹ <ï> for the palatal-labial sequence /juː/ even though <ï> is pronounced [ə] after *voiceless consonants and [ɨ] after *voiced consonants in Khmer.
9.5.0:29: The logic here seems to be that a simple, common Khmer
symbol is preferred to a symbol sequence for a common English phoneme
¹From a rhotic speaker's perspective. ASB is
designed for nonrhotic English, as part 13 will make clear.
2. On Sunday I learned of three martial arts that originated in
Hawaii. They all have interesting names that I could call 英制和語 <ENG
MAKE JPN WORD> Eisei wago 'Japanese words made by English
speakers' or 布制和語 <HI MAKE JPN WORD> Fusei wago 'Japanese
words made in Hawaii²' - terms
intended to sound
like the actual term 和製英語 <JPN MAKE ENG WORD> Wasei eigo
'made-in-Japan English words':
2a. カジュケンボ Kajukenbo
is from 空手 <EMPTY HAND> karate + 柔道
<SOFT WAY> jūdō + 拳法 <FIST METHOD>
kenpō 'martial arts' (see 2b below) + boxing.
Note how the long vowel of jū is absent from Kajukenbo.
It could be spelled in kanji as 空柔拳法菩 'bo(dhisattva) of the empty and
soft martial arts'.
2b. 唐法拳法 Kara-ho Kempo looks redundant in kanji:
唐 Kara is the archaic Japanese word for continental Asia
(China and Korea; the word is ultimately cognate to Korea).
Here it is written as <TANG> (i.e., Tang dynasty) to specify that
Kara refers to China rather than Korea.
法 <METHOD> is read as hō in most contexts (but see
below). Kara-hō is presumably 'Chinese
拳 <FIST> ken (pronounced [kem] before p-) in
Japanese is homophonous with 劍 <SWORD> ken, so 劍法
<SWORD METHOD> 'swordsmanship' (now spelled 剣法 in Japan) is also kenpō
(or kempō if one prefers to romanize phonetically). That
is not a case of 50/50 ambiguity, though. In Google, 拳法 kenpō
'martial art' outnumbers 剣法 kenpō 'swordsmanship' by a ratio of
almost 32 : 1 (1.81 million to 57,000).
法 <METHOD> appears again at the end but is read as pō
after ken. 法 was originally borrowed with initial p- in
Japanese, but that p- was weakened to h- except in the
clusters -np- and -pp-.
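The weakening rule just described can be written as a small string rewrite. This is a toy sketch, not a model of Japanese historical phonology as a whole: the function name weaken_p is hypothetical, and the inputs are schematic pre-weakening romanizations rather than attested forms. It covers only the p- > h- change and the two protecting clusters named above.

```python
def weaken_p(form):
    """Turn p into h unless it belongs to an -np- or -pp- cluster."""
    out = []
    for i, ch in enumerate(form):
        if ch != "p":
            out.append(ch)
            continue
        after_n_or_p = i > 0 and form[i - 1] in "np"
        before_p = i + 1 < len(form) and form[i + 1] == "p"
        # p survives only inside one of the two clusters.
        out.append("p" if after_n_or_p or before_p else "h")
    return "".join(out)

for schematic in ("pō", "kara-pō", "kenpō"):
    print(schematic, "->", weaken_p(schematic))
```

The same morpheme comes out as hō on its own or after kara- but stays pō after ken, which is the alternation seen in Kara-ho Kempo.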
Tonight I was puzzled by "DIAN HSUHE" on the official Kara-Ho shield until I
figured it referred to Mandarin 點穴 diǎn xué <POINT
HOLE>, a.k.a. the 'touch of death'.
"HSUHE" is from the Wade-Giles romanization hsüeh with the
letters of eh reversed.
2c. 檀山流 Danzan-ryū 'Sandalwood Mountain School' contains a Japanization of Chinese 檀山 'Sandalwood Mountain' (Taansaan in the Cantonese spoken by most Chinese here), an archaic name for Hawaii unknown in Japanese.
I just realized that sandal- in sandalwood looks
like an Anglicization of Sanskrit candana- 'sandalwood'.
(Middle Chinese 檀 *dan is an abbreviation of 栴檀那 *tɕiendanna,
a borrowing of candana-.) It's not - Wiktionary
shows that the Europeanization of candana- occurred much
earlier in Greek which borrowed the word as σάνδανον sándanon.
(Latin in turn borrowed the Greek word as sandalum with an
unexpected -l-. Perhaps the word was remodelled after the
similar-sounding but unrelated word sandalium, the source
²9.5.0:27: 布 fu is short for 布哇 Hawai
'Hawaii' which looks as if it should be read Fuai: i.e., the
sum of its parts 布 fu and 哇 ai. I've never been able to
explain how Hawai came to be spelled 布哇. Usually mysteries of
this type can be solved by reading the kanji in Mandarin (i.e., the
spelling is imported from Chinese), but 布哇 isn't in use in Chinese (the
Chinese name for Hawaii is 夏威夷), and as far as I know, 布 is not
read ha in any language.
³唐法 Kara-hō is an invented 湯桶 yutō-style collocation unique to this proper noun. If I didn't already know that noun, I would read it as Tōhō Kenpō with the Sino-Japanese reading Tō for 唐, since two-kanji words are mostly read with two Sino-Japanese readings, often even from the same stratum of borrowing.
3. I can't remember anymore if I ever wrote a guide to how I assign grades to Tangut syllables, so here goes:
In general, I follow Gong Hwang-cherng's grade assignments though I do not use his notation:
Gong's Grade I with zero marking : my -1
The exception to this rule is Gong's rhyme 4 -u
which I interpret as -u2 rather than -u1. Gong
reconstructed both rhymes 1 and 4 as Grade I -u, but I
differentiate them as -u1 and -u2. (9.5.1:54: There is
no Grade II -iu in Gong's reconstruction.)
Gong's Grade II with -i- : my -2
Gong's Grade III with -j- : my -3 or -4
How do I determine whether Gong's -j- corresponds to my -3
STEP 1: Is the j-rhyme listed twice in Gong's
reconstruction? For instance, Gong reconstructs both rhyme 10 and rhyme
11 as -ji.
If the rhyme is listed twice (like rhyme 10/11 -ji), go to
step 2. If not (like rhyme 62 -jụ), go to step 3.
STEP 2: If there are two j-rhymes that Gong reconstructs identically, I assign Grade III to the first rhyme and Grade IV to the second: e.g., Gong's rhyme 10 -ji is my -i3 and his rhyme 11 -ji is my -i4.
STEP 3: If Gong only reconstructs a j-rhyme once, I assign grades mechanically depending on the initial. I assign Grade III if the initial is
class II (v-)
class VII (ch-, chh-, j-, sh-)
class IX l- and zh- (but not lh-, z- or r-!)
All other j-syllables with a nonduplicate j-rhyme have Grade IV.
That assignment is not arbitrary; it follows the general pattern of initials in syllables to which I assigned Grade III and IV according to the methodology in step 2.
That pattern seems to be phonetically motivated. Grade IV was apparently more palatal than Grade III, and the initials associated with Grade III may have been 'antipalatal': v-, l- (phonetically velar or velarized?), and the class VII initials and zh- (phonetically retroflex?).
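The three steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: the function and table names are hypothetical, the table of duplicated rhymes contains only the one pair named in the text (10/11 -ji), and initials are passed in the romanization used in this post without the trailing hyphen.

```python
# Rhyme pairs that Gong reconstructs identically. Only the pair named in the
# text is listed; a complete table would have to come from Gong's reconstruction.
DUPLICATE_PAIRS = {10: 11}

# Initials that take Grade III with a nonduplicate j-rhyme (step 3):
# class II v-, class VII ch- chh- j- sh-, and class IX l- and zh- only.
GRADE_III_INITIALS = {"v", "ch", "chh", "j", "sh", "l", "zh"}

def my_grade(gong_grade, rhyme, initial):
    """Convert Gong's grade (1, 2, or 3) into the -1/-2/-3/-4 notation."""
    if gong_grade == 1:
        # Exception: rhyme 4 -u is interpreted as -u2 rather than -u1.
        return 2 if rhyme == 4 else 1
    if gong_grade == 2:
        return 2
    # Gong's Grade III (-j-). Steps 1 and 2: duplicated rhymes.
    if rhyme in DUPLICATE_PAIRS:
        return 3  # first member of the pair
    if rhyme in DUPLICATE_PAIRS.values():
        return 4  # second member of the pair
    # Step 3: decide mechanically by initial.
    return 3 if initial in GRADE_III_INITIALS else 4

print(my_grade(3, 10, "k"))   # rhyme 10, first of a duplicated pair
print(my_grade(3, 11, "k"))   # rhyme 11, second of the pair
print(my_grade(3, 62, "l"))   # nonduplicate rhyme, Grade III initial
print(my_grade(3, 62, "lh"))  # nonduplicate rhyme, other class IX initial
```

The initial k in the first two calls is only a placeholder, since the initial plays no role when the rhyme is one of a duplicated pair.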
9.5.2:40: I am reminded of Polish which has retroflex consonants with Tangut parallels:
Polish nonpalatalized velarized [ɫ] became [w] in standard Polish (but
is retained in some dialects). Tangut l- and v-
could have been like Polish [ɫ] and [w].
The nonpalatalized [l] ~ [w] alternations of Ukrainian and Belarusian also come to mind:
U думала [dumala] '(she) thought' ~ думав [dumaw] '(he) thought'
B думала [dumala] '(she) thought' ~ думаў [dumaw] '(he) thought'
The masculine forms originally ended in *-l.
In all of the above Slavic languages, a lateral and [w] originated from nonpalatalized *l, whereas in Tangut, l- and v- are distinct initial phonemes with distinct histories. I do not intend to draw any deep parallels between Slavic and Tangut. I cite Slavic merely to show how a lateral and [w] can be phonetically similar enough so that one can change into the other. l- and v- must have been phonetically similar in Tangut too.
As for why l- and v- behave like the retroflexes, I am reminded of the unetymological -w- after some Mandarin retroflexes: e.g., in 霜 shuāng [ʂwaŋ] 'frost' < Late Middle Chinese *ʂaŋ. And Wikipedia agrees with my perception of English /tʃ, dʒ, ʃ, ʒ/ as "often slightly labialized: [tʃʷ dʒʷ ʃʷ ʒʷ]." So the Grade III consonants are united by some sort of w-ish-ness.
18.104.22.168:01: HOW MANY SHORT VOWELS DID EARLIER KHMER HAVE?
My introduction to Khmer historical phonology back in 1994 was Pinnow (1980) which posited twelve long *vowels and four *short vowels:
Pinnow's long *vowels
Pinnow's short *vowels
Pinnow takes the long vowels as basic, so he indicates brevity rather than length.
Last year I discovered Jenner and Sidwell's (2010) reconstructed vowel system:
Jenner and Sidwell's long *vowels
The two *Vːə diphthongs are Angkorian innovations. Modern
[ɯːə] (my [ɨə]) is an even more recent innovation; see Sakamoto
(1977) who demonstrates that many [ɨə]-words are borrowings from
Thai and, to a lesser extent, Vietnamese. (He notes one highly
anomalous loan from Sanskrit: រឿណរង្គ <rï̄yeṇaraṅga> [rɨənrŭəŋ]
< Skt raṇaraṅga 'battlefield'.)
Jenner and Sidwell's short *vowels
Jenner and Sidwell posit eight short vowels, twice as many as
Pinnow's four. They only give one example of the 'extra' vowels (from a
|<radeḥ> ~ <rddeḥ>
||*rədeh ~ *rɔdeh
The form in the "Pinnow?" column is my guess according to my
understanding of his system.
A sample of Old Khmer <-eḥ> words in Jenner's online dictionary all ended in *-eh with short *e:
<neḥ> *neh 'this'
<peḥ> *ɓeh 'to pluck'
<ʔseḥ> *seh 'horse'
Like 'cart', all three of the above words are still spelled with
<-eḥ> in modern Khmer.
Are there any Old Khmer <-eḥ> words that were pronounced with *-eːh? Or is short *e an allophone of *eː before *-h?
The problem is that the Khmer script has never had distinct symbols for short and long e, so in theory Old Khmer <e> could have represented either *e or *eː. I can only think of two ways to reconstruct such a length distinction in Old Khmer:
1. Internal evidence: The *e that Pinnow and I
reconstruct on the basis of modern Khmer corresponds to two sets of
spelling patterns in Old Khmer.
2. Backward projection: If Old Khmer <e> has two sets
of correspondences in modern Khmer, then those sets might be reflexes
of short *e and long *eː.
(The fact that *e has different reflexes depending on the
*voicing of the preceding consonant and on what follows *e is
not relevant if the goal is to reconstruct a phonemic length
distinction. To do that, one would ideally find two Old Khmer words
spelled with the same consonant plus <-eḥ> with different
reflexes in modern Khmer. One would then conclude that the Old Khmer
script was incapable of indicating the length difference in the vowels
of those two words.)
In scenario 1, if some modern Khmer <-eḥ> corresponds to Old Khmer <-eḥ> *-eh, then some other modern Khmer <-eḥ> might correspond to Old Khmer <-X> *-eːh.
In scenario 2, some Old Khmer <-eḥ> *-eh became modern Khmer <-eḥ>, whereas other instances of Old Khmer <-eḥ> *-eːh became modern Khmer <-Y>.
I don't know enough about Old Khmer to guess which scenario is correct, much less come up with other scenarios.
For now I am inclined to go with my allophonic hypothesis. Pinnow's/my *e has different reflexes depending on whether it was followed by *-h:
|after *voiceless initial
|after *voiced initial
That suggests that *e was phonetically (not phonemically!) different (shorter?) before *-h. The reflexes of *-eh are identical to those for *-ĭh:
||before *-ʔ and in open syllables
|after *voiceless initial
|after *voiced initial
*e in *-eh might have been short like *ĭ in *-ĭh.
22.214.171.124:59: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 7)
1. The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the មូសិកទន្ត <mūsikadanta> 'mouse¹ tooth' diacritic ៉ <"> to represent English /æ/. That wouldn't have occurred to me since <"> in Khmer is not a vowel symbol. It has two functions:
- to indicate that a vowel after a voiced consonant symbol is pronounced as if it had been preceded by a *voiceless consonant: e.g.,
យ៉ាង <y"āṅa> [jaːŋ] 'kind'
which has the reflex of *aː normally after voiceless consonants as in
ខាង <khāṅa> [kʰaːŋ] 'side'
rather than the reflex of *aː normally after voiced consonants as in
យាង <yāṅa> [jiəŋ] 'to go (royal)'
- to indicate that ប <pa> stands for [p] rather than [ɓ], its normal value in modern Khmer: e.g.,
ប៉ី <p"ī> [pəj] 'flute'
cf. បី <pī> [ɓəj] 'three'
(Examples from Huffman 1970 added 8.30.14:48.)
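The two functions of <"> can be modelled as a lookup with an override. This is a minimal sketch covering only the examples above: the names are hypothetical, the consonant table lists just the symbols that occur in those examples, and the only vowel handled is *aː.

```python
# Whether a consonant symbol goes back to a *voiceless or a *voiced consonant.
DEFAULT_SERIES = {"kh": "voiceless", "y": "voiced"}

# Reflexes of *aː as shown in the three examples.
LONG_A_REFLEX = {"voiceless": "aː", "voiced": "iə"}

def effective_series(consonant, musikadanta=False):
    """Return the series that decides how the following vowel is read."""
    if musikadanta:
        return "voiceless"  # <"> overrides the consonant's own series
    return DEFAULT_SERIES[consonant]

def read_long_a(consonant, musikadanta=False):
    """Reading of <ā> after the given consonant symbol."""
    return LONG_A_REFLEX[effective_series(consonant, musikadanta)]

def read_pa(musikadanta=False):
    """Second function: <pa> is [ɓ] by default and [p] with <">."""
    return "p" if musikadanta else "ɓ"

print(read_long_a("y"), read_long_a("y", musikadanta=True), read_long_a("kh"))
print(read_pa(), read_pa(musikadanta=True))
```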
The simplicity of <"> (two short strokes) is appropriate for
the fifth most common vowel in English after /ə ɪ i ɛ/. Three of those
vowels have one-stroke symbols in the ASB system:
/ə/ : ់ <'> (which isn't a vowel symbol in Khmer)
/ɪ/ : ិ <i>
/ɛ/ : េ <e>
/i/ has a two-stroke symbol ី <ī>.
ASB's <"> is easier to write than my choice of two-stroke <ĕ> (modern Khmer [ae] ~ [ɛː] < *ɛː) for English /æ/.
¹8.30.15:24: Sanskrit mūṣika- 'mouse'
is cognate to English mouse.
2. Last night I saw that the English Wikipedia
gives two different Khmer spellings of Lon Nol:
លន់ នល់ <luna' nala'> (47,900 Google results)
លន់ ណុល <luna' ṇula> (8,470 Google results)
The first is the one used by the Khmer Wikipedia which doesn't mention the second. One might expect the two spellings to be homophonous, but I would read them (perhaps erroneously) with different vowels as
Was it really possible to pronounce his personal name Nol two different ways?
126.96.36.199:45: In case you're wondering what's going on with the gap between spelling and pronunciation, this table may help (K = voiceless obstruent, G = voiced obstruent, Ṅ = voiced sonorant, P = labial, ฿ = nonlabial):
|*GɔC||<GaCa'>||[Kŭə฿] ~ [KuP]|
|*ṄɔC||<ṄaCa'>||[Ṅŭə฿] ~ [ṄuP]|
Earlier Khmer has both voiceless and voiced obstruents (*K and *G) which merge into voiceless [K] in modern Khmer.
Earlier Khmer has a simple short/long vowel system whose modern Khmer reflexes diverge depending on the *voicing value of the preceding consonant (and the labiality of the final consonant after *ɔ).
After *voiceless consonants, labial vowels are pushed down:
*ɔ(ː) > [ɑ(ː)]
*oː > [ao]
*uː > [ou]
*u > [o] (filling the short *o-gap of earlier Khmer)
After *voiced consonants, labial vowels either remain the
same or are pushed up:
*ɔː > [ɔː]
*oː > [oː]
*uː > [uː]
*u > [u]
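The two lists of reflexes can be kept as a lookup table keyed by the *voicing of the preceding consonant. This is only a sketch of the correspondences listed above; the names are hypothetical, and other vowels and other conditioning factors (such as the labiality of the final consonant after short *ɔ) are left out.

```python
REFLEXES = {
    "voiceless": {"ɔ": "ɑ", "ɔː": "ɑː", "oː": "ao", "uː": "ou", "u": "o"},
    "voiced": {"ɔː": "ɔː", "oː": "oː", "uː": "uː", "u": "u"},
}

def modern_reflex(vowel, series):
    """Modern reflex of an earlier labial vowel after a *voiceless or *voiced consonant."""
    return REFLEXES[series][vowel]

# Vowels listed for both series, with the pair of reflexes each one split into.
split = {
    v: (REFLEXES["voiceless"][v], REFLEXES["voiced"][v])
    for v in REFLEXES["voiced"]
}
for vowel, (after_voiceless, after_voiced) in split.items():
    print(f"*{vowel} > [{after_voiceless}] / [{after_voiced}]")
```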
The bending of Khmer vowels reminds me of the bending of Old Chinese vowels. In both Khmer and Old Chinese, *vowels split into two series, 'lower' and 'higher' (though the conditioning factors were different):
||Late Old Chinese
||modern Khmer||Late Old Chinese|
||*o > *əw
||[ou]||*ou > *aw
(Above I give Khmer reflexes for *long vowels in spite of the absence of length in the first column.)
I suspect Tangut also underwent a similar vowel split, though the details are unknown.
One might expect <ṇula> to behave like *ṄuC in my first table: i.e., it should be read [ɳul]. But in modern Khmer script, ណ <ṇa> indicates that the following vowel is read as if it had once been preceded by voiceless *n̥. Khmer never had a retroflex /ɳ/ phoneme, so ណ <ṇa> came to be used as the virtual *voiceless counterpart of ន <na> for /n/. I emphasize the word "virtual" - vowels after ណ <ṇa> are pronounced like vowels after, say, ត <ta>, but at no time was <ṇa> ever pronounced *n̥. For instance, <ṇāma> [naːm] 'water' is a borrowing from Thai น้ำ <nā2ṁ> [naːm˥] 'water' which never had *n̥. The word was borrowed after the shift of *aː to *iə after *n. Contrast with នាម <nāma> [niəm] 'name', borrowed from Indic before the shift of *aː to [iə] after voiced *n.
|*aː to [iə]||*niəm|
3. Last night I saw the Khmer spelling of Sisowath for the first time:
ស៊ីសុវតិ្ថ <sˌīsuvatthi> [siːsoʋat]
(I confess I've only read about Cambodian history in English for the last twenty-six years.)
<suvatthi> is from Pali suvatthi- 'well-being'. But what is <sˌī>? I was surprised by the ក្ផៀសក្រោម <kphiasa kroma> 'under dash'² ុ <ˌ> which I've never seen in an Indic loan before. It indicates that <ī> is to be read [iː] as if it had followed a *voiced consonant.
There is an identically spelled Khmer word <s^ī> [siː] 'to eat' with <trīsabda>. Why isn't it <sī> [səj] with the regular reflex of *iː after voiceless */s/? Is it a loanword like another Khmer <s^ī> [siː] 'color', a loanword from Thai สี <sī> [siː˩˩˦] 'id.' borrowed after the shift of *iː to [əj] after *voiceless consonants was complete?
²8.30.16:07: The ុ <kphiasa kroma> identical to ុ <u> in shape is a subscript virtual voicing reversal mark. It replaces the ៉ <mūsikadanta> and the ៊ <trīsabda> diacritics when a superscript symbol occupies their positions.
It's taken me twenty-four years to figure out that <trī> is
'three' and not 'fish' and that ៊ <trīsabda> gets its name from
its resemblance to ៣ <3>.
4. Last night I saw the name Mohannad
Mohanna. Are Mohannad and Mohanna
says Arabic مهند Muhannad (> Persian Mohannad)
is from Hind 'India' (with mu- + a CaCCaC
pattern). And Wikipedia has three entries for places
named مهنا Mohanna.
says Latin sidus 'star' is from Proto-Indo-European *sweyd-
'sweat'. How is that semantically possible?
6. Today I started copying the Tangut character textbook
0152 4009 5370 5449 4797
1kiq2 1dyq4 1paq 1tiq4¹ 2wyr4
'gold grain palm place writing'
a.k.a. The Golden Guide by hand.
The second character in the title (4009) is the only tangraph with component 157 (𘢜).
In Homophones A, 4009 appears with component 160 (𘢟)
In Homophones B and D (a.k.a. B2 and B5), 4009 appears with component 157 (𘢜) (17B22).
You can see scans of the Homophones pages on Andrew
157 seems to be an abbreviated form of 160 with the <WATER>-like portion (component 036 𘠣) written as a single stroke.
The Tangraphic Sea analysis of 4009 derives 157/160 from a right-hand element which I can't find in Unicode: ⿱𘠙𘠅. 006 𘠅 is the right-hand version of 036 𘠣 <WATER>. None of the characters with ⿱𘠙𘠅 have anything to do with water, though:
||Li Fanwen number
||second half of 𗉴𗾵 1687 2615 2chhy3 2khu4 'minced meat' (dictionary only)|
||to chop; bean jam?
||to break, broken (dictionary only)|
||fragmentary, broken; < Chn 碎
||second half of 𗨦𘂉 3381 5900 2by1
2di4 'fragment' (dictionary only)
Unlike many other tangraphic elements, ⿱𘠙𘠅 has a clear semantic function: almost all of the above involve pieces or making something into pieces.
I regard dictionary-only words as candidates for loans from a
substratum. 1687 2615 might be a unanalyzable disyllabic substratum
synonym of native 5359.
4142 may be a phonetic loan for 'bean jam'.
Unlike 1687 2615, 4446 is not disyllabic. Could it be a native monosyllable that just so happens not to have been found in any nonlexicographic texts yet?
5359 has a straightforward graphic structure <MEAT.BREAK> and
is the basis for 2615 and 4142.
5380 has an unexpected -e that may indicate that the Chinese dialect known to the Tangut already had a pronunciation of 碎 closer to modern standard Mandarin [swej] than Middle Chinese *swiʰ.
5900 may be an adjective 'broken' after 3381 'pellet'. Without any
attestations not preceded by 3381, I can't tell if it can stand by
7. I wouldn't have guessed that Hagadone is an Americanization of Hagedorn.
188.8.131.52:59: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 6)
1. I used the inherent vowel of the Khmer script to write English /ʌ/, but the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script uses the long vowel symbol ា <ā> instead. That surprised me because /ʌ/ isn't long.
Another surprise is that in the ASB proposal, ា <ā> does double duty for English /a/. Wouldn't that lead to ambiguous spellings? Maybe not - many of my /a/-words are /ɔ/-words in the dialect ASB is based on (e.g., pot = my /pat/ but ASB's /pɔt/). Putting such words aside, the only minimal pair I can think of is calm /kam/ : cum /kʌm/; both would be written កាម <kāma> in ASB. I would distinguish them as កាម <kāma> and កម <kama>.
ា <ā> does double duty in my system as well for final schwa:
e.g., comma /kamə/ as កាមា <kāmā>. Although I use the
inherent vowel for schwa in word-medial position, I can't do so in
word-final position where <Ca> represents /C/.
2. Last night I saw an ad for a book by "LEUYEN PHAM" in all caps on
my Kindle. It caught my eye because I had never seen the two syllables
of a Vietnamese personal name run together before. The front (and
only?) page of LeUyen Pham's site
asks, "How Do You Pronounce LeUyen Pham?!?" but doesn't answer the
question. In Vietnamese, Le Uyên (tones unknown) is pronounced
[le ʔwiən] (north) ~ [le ʔwiəŋ] (south). However, I don't know how
English speakers pronounce it.
3. I thought I had never seen T in the name of any Hawaiian group until last night when I saw a reference to Hui Aloha ‘Āina Tuahine in the Honolulu Star-Advertiser. Turns out I had first seen the name of that group on p. 363 of Albert J. Schütz' The Voices of Eden back in 1995.
In standard Hawaiian, t shifted to k. However, t
persists in tūtū
'any relative or close friend of grandparent's generation' and Tuahine,
by wehewehe.org as
(More commonly Tuahine). Name of a misty rain famous in Mānoa, Oʻahu, named for Kuahine, who turned to rain after the murder of her daughter, Ka-hala-o-Puna; the rain is also in other localities.
I suspect t survived in tūtū and Tuahine because they were borrowed into English which has a /t/ : /k/ distinction absent in Hawaiian which only has three kinds of stops: labial /p/, glottal /ʔ/, and a third stop whose point of articulation varies by dialect.
Does Tuahine indicate that the Mānoa dialect had [t] for that third stop?
Today Mānoa is a center for education in standard Hawaiian with [k]
for that third stop. When Hawaiian was repopularized¹,
words with that
third stop were pronounced with [k] following their standardized
spellings with k.
¹I would rather not say "revived" since
Hawaiian has never died. What
has been lost is the original diversity of dialects. As far as I know,
the only two varieties still spoken by large numbers of people are
the Niʻihau dialect native to the population of Niʻihau and the
standard language learned in schools.
4. Valdemar Knudsen (1819-1898) is said to have been able to speak "the 3 Hawaiian languages" fluently. What were the three? Hawaiian, English, and Pidgin Hawaiian? (Hawaiian Creole English, now 'Pidgin', had only begun to develop during Knudsen's last years.)
The Pidgin Hawaiian article at Wikipedia says a couple of surprising things:
Emerging in the mid-nineteenth century, it was spoken mainly by immigrants to Hawaii, and mostly died out in the early twentieth century, but is still spoken in some Hawaiian communities, especially on the Big Island.
It's still alive? I thought it was extinct. Has anyone done any modern fieldwork?
Like all pidgins, Pidgin Hawaiian was a fairly rudimentary language, used for immediate communicative purposes by people of diverse language backgrounds, but who were mainly from East and Southeast Asia.
Southeast Asia? As far as I know, mass Southeast Asian immigration to Hawaii postdates the Vietnam War.
184.108.40.206:59: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 5)
The Alternate Script Bureau's (ASB) proposal for writing English in the
Khmer script has no inherent vowels, so it has vowel symbols
corresponding to my inherent vowel: <ā> for /ʌ/ and បន្តក់
<pantaka'> /bɑntɑʔ/ <'> for /ə/.
|Khmer script||transliteration||Khmer script||transliteration||Khmer script||transliteration|
(8.28.1:36: Added my guesses for H&P-style forms.)
One good reason to use <'> for schwa is that it is a simple, short stroke. It would be impractical to write the most frequent vowel in English with a complex shape.
It hadn't occurred to me to use <'> as a vowel character
because in Khmer proper it functions as a breve for the inherent vowel
and <ā>, not as a vowel character:
បន្តក់ <pantaka'> /bɑntɑʔ/
(A hypothetical †តក <taka> would be †/tɑːʔ/. In theory the name of the diacritic could be written with two <'> as ˟បន្ត់ក់ <pan'taka'>, but unstressed initial <CaC> syllables always have short vowels, so a second <'> is redundant.)
កាត់ <kāta'> /kat/ 'cut'
(The resemblance to the English word is coincidental; compare កាត <kāta> /kaːt/ 'card' without <'>.)
In Khmer proper, <'> appears atop the symbol for a syllable-final consonant following the symbol with the vowel it shortens, whereas in ASB, <'> behaves like a vowel symbol, combining with the symbol for the consonant that immediately precedes a schwa.
(8.28.1:22: Khmer examples of <'> added. បន្តក់
<pantaka'> is unreadable in ASB since ASB has no inherent vowels
- it looks like ASB [p-nt-kə].)
2. Khmer ថៃ <thai> [tʰaj] must have been borrowed after Thai *d-
> tʰ-, and ថៃឡង់ដ៍ <thaiḷaṅ'aṭa˟> [tʰajlɑŋ] may be an
even more recent borrowing from French Thaïlande
[tajlɑ̃d] with Khmer [ɑŋ] approximating French [ɑ̃]. But surely the
Khmer had a word for 'Thai' predating those borrowings. Did Khmer ever
have a word like †ទៃ <dai>? The only premodern Khmer word I can
find in Jenner is an
undated សៀម <siama> 'Siam'.
3. When looking for 'Thai' in Philip Jenner's Old Khmer dictionary last night, the only entry that appeared was ស៊ង <s"aṅa> 'two', a borrowing from Thai /sɔːŋ/ 'id.' attested in a text from 1684. The <"> indicates that <sa> by itself could not represent /sɔː/. The split of /ɔː/ to
/ɑː/ after *voiceless consonants
/ɔː/ after *voiced consonants
must have already occurred. The addition of <"> indicates that the following vowel is one normally associated with a *voiced consonant: i.e., /ɔː/ in this case. សង <saṅa> without <"> was /sɑːŋ/ < */sɔːŋ/ 'to give back'.
That split occurred after the loss of voicing in obstruents.
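The conditioning described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names `INHERENT_VOWEL`, `inherent_vowel`, and `read_syllable` are hypothetical, the table holds only the two readings quoted above, and the `shifted` flag stands in for the <"> diacritic.

```python
# Readings of the inherent vowel <a>, keyed by the *original* voicing of the
# preceding consonant. Only the two values quoted in the text are listed.
INHERENT_VOWEL = {"voiceless": "ɑː", "voiced": "ɔː"}


def inherent_vowel(series: str, shifted: bool = False) -> str:
    """Return the reading of <a> after a consonant of the given series.

    shifted=True models the <"> diacritic, which makes the vowel after a
    *voiceless consonant symbol read as if it followed a *voiced one.
    """
    if series not in INHERENT_VOWEL:
        raise ValueError(f"unknown series: {series!r}")
    if shifted and series == "voiceless":
        series = "voiced"
    return INHERENT_VOWEL[series]


def read_syllable(onset: str, series: str, coda: str, shifted: bool = False) -> str:
    """Assemble onset + conditioned vowel + coda (hypothetical helper)."""
    return onset + inherent_vowel(series, shifted) + coda


print(read_syllable("s", "voiceless", "ŋ"))                # <saṅa>  'to give back'
print(read_syllable("s", "voiceless", "ŋ", shifted=True))  # <s"aṅa> 'two'
```

The first call gives /sɑːŋ/ and the second /sɔːŋ/, which is the contrast the diacritic is there to mark.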
Thai also devoiced its obstruents, but unlike Khmer, it aspirated them: e.g., *d- > tʰ-. So I was surprised to see Thai พัน <ban> *ban (now /pʰan/) 'thousand' borrowed as Khmer ពទ <bana>. Is the Khmer spelling merely a mechanical copy of the Thai spelling, or does it indicate that Thai *b had not yet shifted to pʰ-?
4. What is the etymology of the name Odoacer?
5. I think I've only spoken of 'Turkish' borrowings into Balkan languages. Marek Stachowski (2019) writes, "it is much better to call Turkish loanwords in the Balkan languages just 'Turkish', which is sufficiently clear in English." Whew. But I confess that I used the term 'Turkish' without knowing his reasoning against the term 'Ottoman Turkish'.
220.127.116.11:59: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 4)
1. Exactly how the Brahmi script - the parent of all Indic scripts - developed is not clear, but one thing is for sure: the principle of inherent vowels is ingenious. Brahmi was first used to write Middle Indo-Aryan whose most common vowel was short a. Making this short a inherent to consonant symbols saved a lot of effort and space.
This state of affairs reflects a Proto-Indo-Iranian innovation: the merger of Proto-Indo-European *e/*o into *a (there was no Proto-Indo-European *a according to Beekes¹):
|*eH, *oH, *ē, *ō
Whitney (1896: 26) found that nearly 20% of segments in a Sanskrit
text sample were short a.
Middle Indo-Aryan (e.g., Pali) inherited that abundance of a from Sanskrit (= Old Indo-Aryan) and gained a few more short a via the loss of other vowels: e.g.,
Skt mr̥ga- > Pali maga- (also miga-!) 'deer'
(See Masica 1991: 167-168 for more examples.)
And it was at that stage that Brahmi was developed to write a-filled Middle Indo-Aryan. (Sanskrit was first written after its descendants were!)
An inherent a in the Brahmi script is a good fit for Old and Middle Indo-Aryan. But is it a good fit for English? What vowel is most frequent in English? My guess was schwa, and I was right. So in my adaptation of Khmer script to English, the inherent vowel is schwa: e.g.,
campus [kʰæmpəs] > កែម្បស <kĕmpasa>
cf. ASB ក៉ម្ប់ស <k"amp'asa>
(Example added 8.27.0:39. ASB added 8.27.20:31. I write English
voiceless obstruents as Khmer unaspirated voiceless obstruents
regardless of their allophonic aspiration in English: e.g., English /k/
[k] ~ [kʰ] as ក <ka>.)
I also use the inherent vowel to write /ʌ/ and syllabic consonants: e.g.,
button [ˈbʌtn̩] > ពតន <batana>
cf. ASB ពាត់ន <bāt'ana>
(8.27.20:34: ASB added.)
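As a rough illustration of how the inherent vowel works in my system, here is a small Python sketch that produces the transliterations of the two examples above. It operates on transliterations, not on Khmer characters, and everything in it is an assumption made for the example: the names `VOWEL_SIGNS`, `INHERENT`, and `transliterate` are hypothetical, the vowel table has a single entry, and the input is already split by hand into consonant(-cluster) symbols and vowels.

```python
# Explicit vowel signs, in transliteration. Hypothetical and incomplete:
# only the one sign needed for "campus" is listed.
VOWEL_SIGNS = {"æ": "ĕ"}

# Vowels (or the absence of a vowel) that are left unwritten, so that the
# consonant symbol appears with its inherent <a> in transliteration.
INHERENT = {"ə", "ʌ", None}


def transliterate(aksaras):
    """Transliterate a list of (consonant_or_cluster, vowel) pairs.

    A cluster such as "mp" stands for a consonant plus a subscript consonant,
    so only the cluster as a whole carries a vowel. vowel=None marks a final
    or syllabic consonant, written as a plain <Ca> symbol without a virāma.
    """
    out = []
    for cluster, vowel in aksaras:
        if vowel in INHERENT:
            out.append(cluster + "a")
        else:
            out.append(cluster + VOWEL_SIGNS[vowel])
    return "".join(out)


campus = [("k", "æ"), ("mp", "ə"), ("s", None)]   # [kʰæmpəs]
button = [("b", "ʌ"), ("t", None), ("n", None)]   # [ˈbʌtn̩]

print(transliterate(campus))  # <kĕmpasa>
print(transliterate(button))  # <batana>
```

Only one of the seven consonant symbols in these two words needs an explicit vowel sign, which is the saving that an inherent vowel is meant to buy.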
(8.27.0:03: Like actual Khmer, I write final consonants with <Ca> symbols. I could use virāmas, but if I can live without them in Khmer and Thai without word spacing, I can live without them in English with word spacing.
In theory I could write syllabic consonants as subscript consonants: e.g.,
button [ˈbʌtn̩] > ពត្ន <batna>
but I prefer to reserve subscript consonants for clusters of
On the other hand, the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script has no inherent vowels because "it only adds an additional layer of complexity".
¹Like Pulleyblank, I find this situation highly improbable, and I suspect original central *a and *ə later polarized to front and back vowels *e and *o.
2. The first proposed etymology of culvert at Wiktionary is almost
meaningless: "a dialectal word". Almost, because at least that tells us
the proposer thinks the word is a borrowing from another dialect. But
which dialect - and what is its derivation there?
3. Today I realized I never learned how to write Burmese rotated subscripts which only appear in Pali loanwords:
ဌ <ṭha> > ဏ္ဌ <ṇṭha>
ဍ <ḍa> > ဏ္ဍ <ṇḍa>
Are they written like their full-size counterparts but at a different angle: e.g., is sideways ဌ <ṭha> written counterclockwise from right to left rather than counterclockwise from top to bottom like its upright version?
(8.27.22:39: I found the answer on p. 402 of John Okell's comprehensive Burmese: An Introduction to the Script [429 pages!]: both rotated subscripts are written clockwise from left to right.)
Today I also realized that there is a logic to the rotation of subscript characters:
only characters with descenders (ဌ <ṭha>, ဍ <ḍa>) rotate under extra-wide ဏ <ṇa> which is wide enough to cover the full length of the rotated characters (8.27.12:39: But ဋ <ṭa> is abbreviated rather than rotated under ဏ <ṇa>: ဏ္ဋ <ṇṭa>. In the Myanmar Text font, ဋ <ṭa> does rotate under extra-wide characters other than ဏ <ṇa>: e.g., က္ဋ <kṭa>. However, in actual Burmese, the only <Cṭ>-clusters are <ṭṭ> and <ṇṭ>, so the behavior of subscript ဋ <ṭa> in other clusters is theoretical and up to the font designers. Other Burmese fonts I have abbreviate rather than rotate ဋ <ṭa> under က <ka>. Whether you see a rotated ဋ <ṭa> in က္ဋ <kṭa> depends on your browser's preferred Burmese font.)
the character written counterclockwise rotates clockwise: ဌ <ṭha> > ဏ္ဌ <ṇṭha>
conversely, the character written clockwise rotates counterclockwise: ဍ <ḍa> > ဏ္ဍ <ṇḍa>
Burmese: An Introduction to the Script doesn't mention a subscript version of ဠ <ḷa>, and most of my Burmese fonts don't have such a subscript. However, the Myanmar Text font does: its subscript <ḷa> (written counterclockwise when full-sized) rotates clockwise (cf. ဌ <ṭha>) even under normal-width characters. Is subscript <ḷa> real or artificial? I've never seen any Cḷ-clusters in Sanskrit or Middle Indo-Aryan and wouldn't expect any since ḷ originates from a lenition of intervocalic ḍ: e.g.,
Skt soḍaśa > Pali soḷasa 'sixteen'
Skt garuḍa- > Pali garuḷa- 'garuda'
(Examples added 8.27+.12:48. See Masica 1991: 170 for more.)
4. The Jurchen character 右 <mei> is a lookalike of Chinese 右
Today it occurred to me that <mei> sounds a bit like Japanese 右 migi 'right'. A mostly dubious scenario:
It is a fact that Japonic language(s?) were spoken on the Korean peninsula.
Suppose such a language was spoken in Parhae in the northern
part of the peninsula. (No evidence for this.)
Suppose that hypothetical 'Parhae Japonic' had a word for 'right' like *mei cognate to Japanese migi. (No evidence for this.)
Suppose that hypothetical Parhae Japonic word was written as a semantograph 右 <RIGHT>. (No evidence for this.)
Jurchen borrowed 右 <RIGHT> from Parhae Japonic as a phonogram for the Jurchen syllable mei. (8.27.0:27: And if there's no evidence for all but the first link in this chain, the chain falls apart. But ... if Japonic is irrelevant to the origin of using 右 <RIGHT> for mei, what is the origin? Is Jurchen 右 borrowed from an abbreviation of some other character used semantographically to write a morpheme mei in some language of Parhae: e.g., 佑祐 - or even 右-like 友有 and their derivatives?)
18.104.22.168:59: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 3)
1. Indic scripts like Khmer tend to have a wealth of consonant characters. This is because Indic scripts were originally intended for Indic languages characterized by
a four-way opposition of voiceless unaspirated, voiceless aspirated, voiced unaspirated, and voiced aspirated obstruents (k, kh, g, gh)
Most of these oppositions do not exist in English.
On the other hand, in spite of that wealth of consonant characters,
Indic scripts originally¹ lacked characters
for several fricatives that exist in English: /ʒ z θ ð f/.
The Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script assigns 'extra' characters from an English perspective to English fricatives:
|Khmer script||transliteration||Khmer script||transliteration||Khmer script||transliteration|
I've included Huffman and Proum's (1983: 31, 42; hereafter H&P) transcription and my own for comparison. English /w/ is not a fricative, but I have included it because Khmer has no /v/ : /w/ distinction.
/ɣ/: Did ASB include this for symmetry with /x/ which is extremely
marginal in English? /ɣ/ reminds me of Sanskrit l̥̄ which was
created to be parallel to the rare but real phoneme r̥̄.
(In Khmer, the character ឮ <l̥̄> devised for that purely
hypothetical Sanskrit phoneme is used for the real and really common
word [lɨː] 'to hear'.)
/ʃ/: ASB's choice of ឆ <cha> reminds me of the Thai convention of borrowing English /ʃ/ as /tɕʰ/: e.g., shampoo as แชมพู <jeemabū> [tɕʰɛːmpʰuː].
Unlike H&P and ASB, I use the extinct Khmer character ឝ
<śa>: cf. Hindi शैम्पू <śaimpū> 'shampoo'.
/ʒ/: I use the ត្រីសព្ទ <trīsabda> diacritic <^> to indicate that voiceless <śa> is pronounced with a voiced initial. In actual Khmer orthography, <trīsabda> over a <voiceless> consonant indicates that a following vowel is pronounced as if it had originally followed a *voiced consonant: e.g., ហ៊឵ន <h^ān> [hiən] 'to dare' is pronounced as if it had developed from a (nonexistent and impossible) *ɦaːn. I could transliterate <ha> + <trīsabda> as <ɦa> rather than as <h^a>, but to imply that Khmer once had [ɦ] would go against the general historical/etymological principle of my transliteration.
/z/: H&P's ឯ <°e> may be a case of arbitrarily using a Khmer character that would otherwise go unused.
I would have expected ASB to use ឍ <ḍha> or ធ <dha> for /z/ by analogy with the other voiced aspirates for voiced fricatives. <Z> on my Khmer NIDA keyboard layout is assigned to ឍ <ḍha>.
My ស៊ <s^a> is based on the same principle as my ឝ៊ <ś^a> (see above).
/ð/: ASB's ឋ <ṭha> is surprising since this character originally did not represent a voiced consonant.
/f/: H&P's ហ្វ <hva> is carried over from an existing
Khmer convention to write foreign /f/.
/w/: H&P use អ្វ <ʔva> because វ <va> is already
taken for /v/. In modern Khmer, វ <va> is pronounced [ʋ] in
initial position, but in earlier Khmer, it may have been pronounced [w].
¹"Originally" because characters were later devised for such fricatives: e.g., Devanagari ज़ <j.a> for [zə]. But such characters created in India postdate the spread of Indic script to Southeast Asia, so there is no Khmer analogue of Devanagari ज़ <j.a> for [zə].
²The actual printed character is ឧ <°u>,
perhaps because Huffman
and Proum's (1983) was written on a typewriter without a ឌ <ḍa>
were inevitable on typewriters:
The Keyboard layout for 120+ elements of Cambodian script and essential punctuation marks was a very difficult task because of the limitation to 46 keys and 96 positions of the standard typewriter.
2. Could Khitan large script character 2091
be a variant of 2050
(8.26.17:01: I found 2091 without context in N4631 which
gloss for it. 2091 may either be distinct from 2050 [i.e., not occur in
calendrical contexts where 'hare' is expected] or represent 'hare' -
the actual animal - in a noncalendrical context that remains to be
3. Korean 표고 phyogo 'shiitake mushroom' caught my eye because it is one of a small number of native words with yo. (Perhaps the most important are 좋 choh- < 죻- /cyoh/ 'good' and 소 so < 쇼 /syo/ 'cow'.) The distribution of y is skewed in Korean. It most often precedes ŏ, and for twenty years, I have thought that many if not all native yŏ go back to an Old Korean *e (not to be confused with modern Korean ㅔ e < /əj/). But where does native yo come from?
Martin et al. (1967: 1758) suggest that phyogo may be
Chinese but do not specify a Chinese source. -go /ko/ sounds
like Middle Chinese 菇 *ko
'mushroom'. Wikipedia lists a number of Chinese words for 'mushroom'
(given here in standard Mandarin pronunciation, though I do not know if
they are all standard Mandarin words), but none are like the †piaogū
[tone for first syllable unknown] that would theoretically correspond
to Korean phyogo.
香菇 xiānggū 'fragrant mushroom'
冬菇 dōnggū 'winter mushroom'
北菇 běigū 'northern mushroom'
厚菇 hòugū 'thick mushroom'
薄菇 báogū 'thin mushroom'
花菇 huāgū 'flower mushroom'
phya-: two adverbs (sound-symbolic?) only attested in
modern times: 퍅 phyak 'weakly' (variant of 팩 phaek ~ 픽 phik)
and 퍅퍅 phyakphyak 'firmly'
phyŏ- (excluding phyŏ in inflected and derived
forms of 피- phi- 'to spread out, bloom'): 편 phyŏn 'kind
of pastry' (< Chn 餅?), 편수 phyŏnsu 'head artisan' (? + Chn 手
'hand'?), 'Korean ravioli', 평지 phyŏngji 'kind of plant'
phyo-: only phyogo
None are core words that would be likely retentions from Proto-Koreanic
Since Korean ph- is from *kVpV- and *pVkV-, perhaps there was a constraint in some intermediate stage against clusters like *kpy- and *pky-. However, a three-consonant cluster constraint does not explain the paucity of native ky-words not beginning with kyŏ-. I'll look at those words later. Why k-? ㄱ k- is the most common initial consonant letter in hangul (not counting the zero initial), whereas ㅍ ph- is the least common (not counting reinforced consonants like pp-).
22.214.171.124:57: THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 2)
1. Unlike modern Khmer script which deviates from the one-symbol-per-sound ideal with two or more consonant letters per consonant phoneme and two or more readings per vowel symbol (see part 1), the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script has just one symbol per consonant (not including subscript variants) and one reading per vowel symbol. Compare Huffman and Proum's (1983: 31; hereafter H&P) transcription of English for modern Khmer speakers with the ASB system and my own system:
ASB, my own English-in-Khmer system, and my Khmer transliterations are based on the old values of Khmer characters: e.g.,
ប <p> represents English /p/ (initial [pʰ]) which is quite different from [ɓ], its primary phonetic value in Khmer.
ី <ī> represents English [iː] which is quite different from [əj], its phonetic value in Khmer after *voiceless obstruents.
A Khmer reader would pronounce
ASB បែ <pɛ> as [ɓae], not [pʰej]
my បេ <pe> as [ɓej], not [pʰej]
ASB and my បី <pī> as [ɓəj], not [pʰiː]
ASB and my system agree most of the time but not all of the time:
e.g., the treatment of English [ej]. I'll start covering the
differences in part 3.
2. 稚内 <YOUNG INSIDE> Wakkanai is an Ainu name in disguise. Ainu wakka 'water' is written as 稚 <YOUNG> since Japanese waka (not wakka!) is 'young'. And Ainu nai 'river' is written as 内 <INSIDE> since it sounds like Sino-Japanese nai 'inside'. This makes me wonder
- how many other cases of -VCCV- sequences (like wakka)
are written with characters normally read with -VCV- sequences (like 稚 waka)
- if any Ainu words are written semantically in Japanese place names: e.g., is wakka 'water' ever written as 水 <WATER>? Is nai 'river' ever written as 川 <RIVER> or 河 <RIVER> or 江 <RIVER>?
Ainu nai 'river' happens to coincidentally resemble Middle
Korean nayh 'river'. The resemblance fades if the Middle Korean
word is traced to Old Korean 川理 <RIVER ri> *(na?)ri (cf.
its Paekche cognate 'stream', recorded in Japanese as 那禮 ~ ナレ nare
~ ナリ nari).
3. Today I was listening to 坪能克裕 Tsubonō Katsuhiro's score for Aura
Battler Dunbine (1983-84). Tsubonō's name is spelled as a
combination of a native Japanese 坪 <TSUBO> tsubo 'unit of
area' with a Sino-Japanese 能 <CAN> nō. Is the name a
variant of 坪野 <TSUBO FIELD> Tsubono with a long final
4. Today I discovered 金芝河 Kim Chi-ha's respellings of the names of what he called 五賊 오적 Ojŏk 'Five Bandits':
||National Assembly member
||high-ranking public official
||minister and vice-minister
I'm out of time, so I'll comment on Kim's respellings and Brother
Anthony of Taizé's translations later.
5. Via Bitxəšï-史 today: Blažek et al.'s Altaic Languages: History of Research, Survey, Classification and a Sketch of Comparative Grammar (2019) can be freely downloaded here (click on "KE STAŽENÍ"). I use 'Altaic' on this site to refer to an areal grouping of languages, but that book treats it as a genetic language family. A quick look at the book leaves me unconvinced.
THE ALTERNATE SCRIPT BUREAU'S KHMER SCRIPT FOR ENGLISH (PART 1)
Last night I discovered the Alternate Script Bureau's (ASB) proposal for writing English in the Khmer script.
It reminded me of how I came up with a way to write English in hangul when I first learned that alphabet in 1987. Unaware of the wealth of obsolete hangul letters, I recall inventing a letter 巳 for /l/ based on ㄹ /r/. I might have made up other letters as well.
Khmer has so many characters that it's not necessary to invent new ones for English.
Obstruent devoicing and vowel warping conditioned by the *voicing of preceding consonants have resulted in many pairs of homophonous consonant characters on the one hand and vowel characters with double readings on the other: e.g.,
ផ <ph> *ph > [pʰ]
ភ <bh> *bh > [pʰ]
ី <ī> *iː > [əj] after *voiceless consonants, [iː] after *voiced consonants
A Khmer script for English designed for maximum compatibility with
modern Khmer script for Khmer would carry over those characteristics.
Huffman and Proum's (1983) transcription of English for modern Khmer
speakers has those characteristics: e.g.,
ផេ <phe> for pay [pʰej]
ភី <bhī> for pea [pʰiː]
Note how English p [pʰ] has to be written differently
depending on the following vowel.
ASB takes a simpler approach which I'll describe next time.
(8.24.0:12: Huffman and Proum probably have no transcriptions like ភេ <bhe> or ផី <phī> because the syllables [pʰeː] and [pʰəj] do not exist in American English.)
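The way a compatibility-minded transcription is forced to pick a consonant series according to the vowel it wants can be sketched in Python. This is a toy model, not Huffman and Proum's actual procedure: the names `READINGS`, `ASPIRATED_P`, and `spell_p_syllable` are hypothetical, and the table contains only the four readings mentioned above.

```python
# Modern readings of two vowel symbols, keyed by (symbol, *original* voicing
# of the preceding consonant). Only the values cited in the text are listed.
READINGS = {
    ("e", "voiceless"): "ej",
    ("e", "voiced"): "eː",
    ("ī", "voiceless"): "əj",
    ("ī", "voiced"): "iː",
}

# Both consonant symbols are read [pʰ] today; they differ only in the vowel
# readings they trigger.
ASPIRATED_P = {"voiceless": "ph", "voiced": "bh"}


def spell_p_syllable(target_rhyme: str) -> str:
    """Return a transliterated spelling of [pʰ] + target_rhyme.

    The consonant series is whichever one makes the vowel symbol read as the
    target. Raises KeyError if no listed symbol yields that reading.
    """
    for (vowel, series), reading in READINGS.items():
        if reading == target_rhyme:
            return ASPIRATED_P[series] + vowel
    raise KeyError(target_rhyme)


print(spell_p_syllable("ej"))  # pay [pʰej] -> <phe>
print(spell_p_syllable("iː"))  # pea [pʰiː] -> <bhī>
```

The same English [pʰ] comes out as <ph> in one word and <bh> in the other, purely because of the vowel that follows.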
126.96.36.199:57: KHMER INDEPENDENT VOWEL LIGATURES
I didn't figure out that these Khmer independent vowel characters were ligatures until two days ago:
ឨ <°û> (= ឧក <°uka> /ʔok/) < ឧ <°u> + ក <ka> (<ˆ> symbolizes the upper 'hair' stroke of <ka>)
ឪ <°ǔ> (= ឩវ <°ūva> /ʔəw/) < ឧ <°u> (not ឩ <°ū>!) + វ <va> (<ˇ> symbolizes the upper 'hair' stroke of <va>)
Duh. Two mysteries down, more to go:
Why was the now-obsolete ligature ឨ <°û> created? One might guess there was a high-frequency word /ʔok/ (< earlier /ʔuk/). There is no likely candidate for such a word in modern Khmer. Here are all the meanings of ឧក <°uka> ~ អុក <ʔuka> /ʔok/ in Headley's dictionary at SEAlang:
1. 'bellyband, cinch, girth (of a harness)'
2. 'to reproach, blame, censure; to scold; to abuse, to criticize severely'
3a. 'to slam something down; to fall down hard'
3b. 'check, checkmate'
4. 'kind of vulture'
None of those words would seem to be frequent enough to motivate the creation of a ligature for them.
However, there is an Old Khmer word /ʔuk/ 'also' (by coincidence resembling Dutch ook 'also'!) that Jenner transliterates as <ukk> ~ <uk> ~ <ukka>. So I'm guessing that's the word that was once represented by ឨ; once that word became obsolete, its ligature vanished along with it.
What I don't understand about Jenner's transliteration system for Old Khmer is when he chooses to write final <-a>. He and Sidwell do not explain this in their Old Khmer Grammar. I don't know how <ukk> differed from <ukka> in the original script. (I confess I have only seen Old Khmer in transliteration.) Jenner's <ukka> is presumably ឧក្ក <°ukka>, but what is Jenner's <ukk>? Is it ឧក្ក៑ <°ukk·> with a virāma? In Old Khmer Grammar, final <-a> is only in Indic loanwords with the exceptions of CV syllables (ka 'clause conjunction', ta 'subordinating conjunction', sa 'white'). Did Old Khmer scribes carefully write virāmas all over the place - a practice abandoned in modern Khmer?
(Wikipedia says the Khmer virāma is "mostly obsolete". Huffman [1970: 53] says it is "sometimes used in the transcription of Sanskrit words"; his exercises do not mention it at all. I do see it in វេយ្យាករណ៍សំស្ក្រឹត <veyyākaraṇa˟ saṁskrïta> 'Sanskrit Grammar'.)
I have wondered if Jenner simply omits <-a> in
transliterations of native Khmer words ending in consonants, but that
does not explain cases like <ukk> ~ <ukka> or
<ʼāyatta> ~ <ʼāyatt> 'dependence' < Skt āyatta-
If I were right, <ukka> and <ʼāyatt> shouldn't exist, but they do. Is Jenner in fact consistently writing all word-final consonant symbols without any other dependent symbols as <Ca>? If so, then do transliterations like <ukk> and <ʼāyatt> reflect spellings of words before consonant initials of other words?
That new hypothesis predicts that the courtesy title <poñ>
should always precede a consonant. And yet ... Old Khmer Grammar
example 307 begins with
<poñ uy oy kñuṃ ...> (K.557/600N: 1, 612 AD)
poñ Uy give slave
'The poñ Uy has given slaves ...'
I would expect ˟<poña uya oy kñuṃ>. How were those words written in the original? As
(a) four akṣaras without virāmas
<po ñu yo ykñuṁ> (I prefer <ṁ> for anusvāra to <ṃ> which I reserve for Pyu subscript dots.)
(b) four akṣaras with virāmas <·>
<poñ· uy· oy· kñuṁ>
(c) something even more bizarre with subscript independent vowel symbols ឧ <°u> and ឱ <°o> instead of the dependent vowel symbols in (a)?
If not for Jenner's transliteration, I would have imagined something with seven akṣaras like
<po ña °u ya °o ya kñuṁ>
But Jenner's dictionary doesn't list a spelling <poña> for the courtesy title.
For comparison, a modern Khmer word-for-word translation of the phrase would be
<pa ṅa °u ya °oya¹ khñuṁ>
without any indication of which <a> are silent.
If I am right about ឪ <°ǔ> being from ឧ <°u> + វ <va>, I might expect to find earlier spellings of ឪ-words with <°uva>. The most important ឪ-word might be <°ǔbuka> /ʔəwpuk/ 'father'. The earliest attestations of this word that I can find are <°ābbhūka> (1599) and <°ābuka> (1602) which are close to Sanskrit āvuka- 'father'. A regular reflex of <°ābuka>, អាពុក <°ābuka> /ʔaːpuk/, exists today alongside <°ǔbuka> /ʔəwpuk/. How did /aː/ change to /əw/? I don't know of any other instance of such a change.
I think /əw/ is the regular reflex of */əw/ (not */aː/!) after
*voiceless consonants. (Is /ʔəwpuk/ < */ʔaːbbuk/ with reduction of
and lenition of */b/?)
*/əw/ raised to /ɨw/ after *voiced consonants. Compare:
ត្រូវ <trūva> /trəw/ < *trəw 'correct'
នូវ <nūva> /nɨw/ < *nəw 'with'
Do the <ū> spellings reflect earlier vowel length? Conversely, the absence of the extra stroke for vowel length in ឪ <°ǔ> could be interpreted as indicating an absence of earlier vowel length, but I don't think ឪ <°ǔ> and <ūva> had distinct *rhymes. The extra stroke of ឩ <°ū> may have been dropped from the abbreviation ឪ <°ǔ> as redundant since there was no contrast between <uva> and <ūva>. Did a transitional character combining ឩ <°ū> with the 'hair' of វ <va> ever exist?
¹In modern Khmer, ឲ្យ /aoj/ 'to give' has an
unusual spelling <°ȯya> (not ˟ឱយ <°o ya>!). My
transliteration has no space to indicate that <-ya> is a
subscript rather than an independent akṣara យ <ya>. That
subscript is unusual because it represents a final glide /j/ rather
than a medial glide /j/ (as in ខ្យល់ <khya la'> /kjɑl/ 'wind') or
zero (as in ពាក្យ <bā kya> /piəʔ/ 'word' < Skt vākya-).
ឲ <°ȯ> is a rare character that is, as far as I know, unique to ឲ្យ <°ȯya>. I transliterate it as ឲ <°ȯ> to distinguish it from regular ឱ <°o>.
Could <°ȯya> have originated as an abbreviation of ឲយ្យ
<°ȯyya> (my guess for what Jenner's 17th century oyya
represents)? Or is subscript <y> for final /j/ a remnant of this
Subscripts were previously also used to write final consonants; in modern Khmer this may be done, optionally, in some words ending -ng or -y, such as ឲ្យ aôy ("give").
I would like to know which other words have subscript <ṅ> and <y> for codas.
XIANGNAN TUHUA INTERLUDE: GITHUB JIANGYONG
1. Tonight I discovered sgalal's list of '江永 Jiangyong dialect' readings for 女書 Nüshu 'women's writing' characters on GitHub. This dialect which I'll call GJY (Github Jiangyong) differs yet again from the ones I talked about last night:
||uɯ³³ ~ fɯ³³
GJY is written without non-ASCII characters, so I am unsure if w
is really ɯ, e is really ə, etc. In any case, GJY is
distinct from OXT, Daoxian, and Baishuicun.
All three morphemes are written with one character in OXT Nüshu, but
are written with two different characters in GJY Nüshu. Given the
mismatches above, I expect many other dialectal differences in Nüshu
I found GJY via nushuscript.org
which also has two different hanzi-to-Nüshu converters.
2. Those converters use images to allow people without Nüshu fonts to see Nüshu characters. I'm one of those people. I went looking for a free Nüshu font and found a page about Chelsy Jiayi Wu's NVSHU SANS (V being a common substitution for Mandarin Ü). Some of the issues she mentions are relevant for Tangut, Jurchen, and Khitan as well as Nüshu: e.g.,
Every handwritten sample reflects the varying styles of its author. Without a history of standardization, it is difficult for me to identify what elements of each character are necessary for letterform identification. Where should a stroke begin and end? Which elements are ornamental and which are absolutely essential? Where should this dot be positioned relative to the stroke?
Chelsy Wu has an interesting background: "Born in Tokyo, raised in Shanghai", and a triple native speaker of Japanese, Mandarin, and English. (No Shanghainese?) She runs the site Explorations in Global Language Justice.
188.8.131.52:35: OMNIGLOT'S XIANGNAN TUHUA SAMPLE (PART 1: INTRODUCTION)
sample of 女書 Nüshu 'women's writing' characters with readings in an
unspecified variety of 湘南土話 Xiangnan Tuhua
'Southern Hunan local speech'. This variety (hereafter 'OXT') is
not the same as the 道縣 Daoxian
and 白水村 Baishuicun¹ varieties at
their reflexes of Middle Chinese 話 *ɣwæjʰ 'speech':
Daoxian and Baishuicun are about 45 km apart. I have no idea how far they are from OXT.
Omniglot gives the local name of Xiangnan Tuhua as [tifɯə] without specifying tones. I suspect [tifɯə] is etymologically 地話 'earth speech', so [fɯə] may be from a fourth variety of Xiangnan Tuhua (OXT2?).
Unless I'm misreading the OXT sample, the OXT reflexes of Middle Chinese 話 *ɣwæjʰ 'speech', 花 *xwæ 'flower', and 會 *ɣwajʰ 'association' are homophonous. But that is not the case with Daoxian and Baishuicun:
||uɯ³³ ~ fɯ³³
會 in 會不會 lit. 'would or wouldn't' (hard to translate; more examples here) has another pronunciation in Daoxian: xo⁵². As association-會 and 會不會-會 are usually homophonous, I suspect that ui²² and xo⁵² belong to different strata of Daoxian: at least one may be borrowed, and if both are borrowed, one is newer than the other.
I added a Mandarin column to show how different Xiangnan Tuhua
varieties are from it as well as from each other.
That glimpse at Xiangnan Tuhua-internal variation makes me wonder how that variation maps onto Nüshu. I betray my ignorance of Nüshu here with some basic questions:
How diverse are the varieties of Xiangnan Tuhua represented in Nüshu?
Is it possible to determine that Nüshu was originally developed
to represent a specific variant? E.g., if Nüshu has a single character
to represent 'speech', 'flower', and 'association', that could suggest
it was originally developed to represent OXT or a dialect like OXT
rather than varieties like Daoxian or Baishuicun in which the three
morphemes are not homophones.
Is there internal evidence within Nüshu suggestive of sound
changes that occurred since it was first developed? For example, does
Nüshu have two homophonous characters that could have represented two
different syllables that later merged? Conversely, have there been
recent phonemic splits not reflected in Nüshu?
2. Last night - shortly after mentioning
Wanzi Gelao - I was horrified to learn of this fake 'Gelao' manuscript.
A fake is bad enough; a fake that is simply disguised Chinese is even
worse. To pretend that the language that is replacing Gelao is 'ancient
Gelao' is tasteless.
¹Five years ago, I wrote a ten-part series on Baishuicun: 1-4 / 5-8 / 9-10.