220.127.116.11:57: TANGUT DATABASE 2.0
Thanks to Andrew West for giving me the push I needed to update my Tangut database. Version 2.0 has the following improvements:
1. Unicode Tangut characters for each entry.
2. The addition of B variants for six characters:
1267A/B 𗈕𗈊 2zhi 'to boil in a covered pot over a slow fire; to braise'
2799A/B 𗽍𗽎 1dwy'1 'protruding'
2941A/B 𗁙𗁚 2kyr'4
3007A/B 𗌕𗌖 2?ar? 'net'
3286A/B 𗧧𗨢 2li3 'to spoil/dote on a child'
5150A/B 𗮐𗮑 1thu'1 'to beg, request, demand'
Can you spot the differences between the variants?
3. Revised entries for
0671/2572 𘈥𗂔 1sa1 'to swell; to choke' (another pair of variants; the meanings seem like opposites)
5457 𘏠 2tsoq4 'to penetrate, pass through' (I previously misread this as 2tse'4 since Li Fanwen [1997: 988] gave its rhyme as 2.35; the actual rhyme is 2.64)
The file still has vast room for improvement and expansion.
18.104.22.168:36: THE SECRET OF JURCHEN BI
The Jurchen (large) script remains opaque even though most of its characters can be more or less read. The problem is that the readings do not seem to correlate with the structure of the characters. Characters with similar components generally do not have similar readings: e.g.,
giru has a variant
su (phonogram; perhaps a simplification of Chinese 蘇, pronounced *su during the Jin dynasty)
giru (first two syllables of girucu 'shame'; originally a standalone logogram girucu?)
and if one alters the right side a bit, the result is
bi (phonogram), which has two further variants
The right hand element of bi is not found in any other characters. What would possess the 'creator' of the Jurchen script to take su and add three strokes to it to write bi?
Those three strokes vaguely resemble the Chinese character 匕 'spoon'
(or the right side of the more common homophonous character 比 'to
compare') whose Jin dynasty reading *pi would have sounded like
bi [pi] to Jurchen ears. So is Jurchen bi a phonetic
compound of su and *pi?
But what is su doing in bi? Su has no known meaning
and doesn't sound like bi.
Today I came up with a solution that explains both halves. I think Jurchen bi might be somehow related to the Chinese character 秘 ~ 祕 'secret', whose Jin dynasty reading was also *pi. The su-like part on the left corresponds to 禾 ~ 示 (also written 礻) and the three strokes on the right correspond to 必.
22.214.171.124:50: BATAVO-TAIWANESE ACRES
While looking in Endymion Wilkinson's Chinese History: A Manual (2000 edition - I'm three editions behind)
for an English equivalent of the Chinese (and Khitan) title 開府儀同三司 for my last entry, I stumbled upon
the Taiwanese word 甲 kah [kaʔ˧˨], a measurement of land, on p. 243.
I was surprised to learn that it was a borrowing of Dutch akker (cognate
to English acre - though a kah is actually about 2.1 acres).
I had always assumed that the Dutch had never left any linguistic traces in Taiwan. Wrong!
How many other Batavo-Taiwanese
words are there? The Wikipedia
entry on Taiwanese doesn't mention the existence of Dutch loans.
I just found that Wiktionary has a long English entry for 開府儀同三司. Nice.
VEXED BY FIREFOX (PART 1)
I've almost always been able to use Firefox when Chrome failed me. Until now.
What would Firefox be called in Tangut? How about
4408 1870 1my'1 1jy2 'fire fox'?
The first half (4408) has bothered me for years for two reasons.
First, why does 4408 contain what looks like a <WOOD> radical
(𘡩) atop what have been thought to be two
<FIRE> radicals (𘠠 and 𘧦)? Compare 13+-stroke 4408 to the 4-stroke
simplicity of Chinese 火 'fire'. The Tangraphic Sea analysis is
improbable: would the graph for the basic word for 'fire' really be
derived from the graph for half of an apparently nonbasic word for 'fire'?
4408 1my'1 'fire' =
top and left of 4413 2pu4 'to burn, ignite' (semantic) +
right of 5082 1vi1 (second syllable of 𘓼𘍽 4555 5082 1py1 1vi1 'fire', only attested in dictionaries; could the first syllable, attested as the name of the trigram for 'fire', be cognate to 4413?; semantic)
The derivation for 4413 is unknown.
The derivation for 5082 is circular:
5082 1vi1 (second syllable of 1py1 1vi1 'fire') =
left of 5286 (second syllable of 𘄦𘍵 1772 5286 1ten4 1vi1 'intelligent'; phonetic) +
right of 4408 1my'1 'fire' (semantic)
Surely 4408 was devised before 5286.
Second, 𗜐 4408 1my'1 < *miX 'fire' has the mysterious phonetic characteristic that I call 'prime' and represent as an apostrophe which is easier to type than a true prime symbol. I represent its pre-Tangut source as *X (though I could just as easily carry over the prime notation, since I have no idea what *X was). A mi-word for 'fire' is widespread in Sino-Tibetan, but none of the cognates of pre-Tangut *miX contain any obvious segment or tone that plausibly correlates with *X. Suppose, for instance, that I proposed that pre-Tangut *X corresponds to Written Burmese -ḥ. That correspondence works for 'fire' and 'nine' but not for 'two' and 'five'. Written Burmese 'two' lacks -ḥ, and (pre-)Tangut 'five' lacks *-X/-'.
7.11.21:55: A table of the above words and more:
||Li Fanwen number
||nhac < *n̥ik||kni
||*pŋi < *pŋa?||*ʁuɑ L
||*ŋgiX < *ŋgiwX?||*χguə
Proto-Southern Qiang reconstructions are from Evans (2001). Key to his tone symbols:
parentheses: one counterexample exists in the data
dash: data are equivocal
Evans did not reconstruct a tone for 'nine'. Using his notation, I
would reconstruct *(H): Longxi and Mianchi have high tones, but Taoping
has a mid tone which normally points to *L.
My near-total ignorance of Pyu basic vocabulary (e.g., 'fire') does
raise the troubling possibility that Pyu is a non-Sino-Tibetan language
with loans from Sino-Tibetan. Tai has borrowed nearly all its lower
numerals (with the exception of 'one') from Chinese.
My reconstruction of pre-Tangut 'nine' implies a chain shift:
*-k > *-w > Ø
Pre-Tangut *-w was lost, and Tangut gained a new -w from the lenition of pre-Tangut *-k: e.g., in
𘈩 0100 *kʌtik > *lew 'one'
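What makes this a chain shift rather than a merger is rule ordering: *-w had to be lost before *-k lenited, or else the new -w would have been lost too. Here is a toy Python sketch of that ordering. It uses hypothetical finals (taw, tak), not attested pre-Tangut forms, and treats each change as a simple rewrite of the word-final segment:

```python
def lose_w(form: str) -> str:
    """Earlier change: word-final *-w is lost."""
    return form[:-1] if form.endswith("w") else form


def lenite_k(form: str) -> str:
    """Later change: word-final *-k lenites to -w."""
    return form[:-1] + "w" if form.endswith("k") else form


def apply_in_order(form: str, rules) -> str:
    """Apply each sound change to the output of the previous one."""
    for rule in rules:
        form = rule(form)
    return form


# Hypothetical forms for illustration only.
CHAIN_SHIFT = [lose_w, lenite_k]  # *-w > Ø first, then *-k > -w
MERGER = [lenite_k, lose_w]       # opposite order: *-k > -w feeds loss of -w

for proto in ("taw", "tak"):
    print(proto, apply_in_order(proto, CHAIN_SHIFT), apply_in_order(proto, MERGER))
```

In the chain-shift order, tak surfaces with -w and taw loses its final. In the opposite order, both inputs collapse to ta, which would leave no source for the new Tangut -w.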
'Three', 'four', and 'five' all had the same tone (or, more likely, segmental source of a tone) in Proto-Lolo-Burmese, and I suspect that tone source spread from one numeral to the others. (Cf. how *-i spread from 'four' to 'five' in pre-Tangut.) If that tone corresponded to Pyu -h, then that tone source spread from 'three' to 'four' and 'five' in Proto-Lolo-Burmese (or some ancestor of PLB). But that scenario assumes Pyu is conservative, which I don't think it is.
A huge problem is that the final segments (or quasi-segments in the case of [pre-]Tangut *-X/-') line up poorly. Ideally I'd like to see a pattern like
Tangut tone 2 : Written Burmese -ḥ : Pyu -h : Written Tibetan -s : Old Chinese *-s
among the oldest languages (Proto-Southern Qiang tones are of recent origin), but there are no instances of that above. And the thought of languages adding a final *-s or *-h to some random numerals but not others bothers me.
Also disturbing is the possibility that (pre-)Tangut *-X/-' corresponds to nothing in any other language because it is a reflex of a Proto-Sino-Tibetan phonetic feature completely lost elsewhere. I'd like to think that maybe some Qiangic language (i.e., a relatively close living relative of Tangut) has something corresponding to (pre-)Tangut *-X/-'. Proto-Southern Qiang apparently isn't that language.
One more possibility is that *-X/-' is unique to Tangut because it reflects a substratum language which had it. But that hypothesis cannot be tested since we know nothing about such a substratum language (unless its traces are in the so-called 'ritual' language [see Andrew West's skeptical take], and -' does not seem to be any more prominent in that subset of the Tangut vocabulary - in fact, -' is even less frequent in the 'ritual' numerals than in the regular ones!). And if a substratum language had -', why would its speakers impose that feature onto a language that didn't have it? I don't know anything about the English of Hmong native speakers, but I imagine that their English does not have any uvular phonemes (a feature present in Hmong but absent in English).
However, I can imagine a situation in which a speaker of a continental Altaic-type language would introduce uvulars into English because uvulars and velars are in complementary distribution in their own language (i.e., nonphonemic): e.g.,
native, English /ki/ > [ki]
native, English /ka/ > [qɑ]
(For convenience I use the symbol /k/ to represent the Altaic back
consonant. One might argue that ideally I should use a symbol other
than /k/ or /q/ to avoid implying that one allophone is more like the
Platonic form of the phoneme than the other.)
But ... when Khitan and Manchu actually did encounter [ka]-type combinations violating their phonotactics in Chinese, they borrowed them as /ka/: e.g.,
Liao Chinese 開 *kʰaj 'to open' > Khitan small script <k.ai> (not <q.ai>; element in the title <k.ai fu ng.i t.ung s.a.am sï> < 開府儀同三司 *kʰaj fu ŋi tʰuŋ sam sz̩, lit. 'open government ceremony same three official' and not a monosyllabic verb 'to open')
Mandarin gang [kaŋ] 'steel' > Manchu g'an [kaɴ] (not [qaɴ])
As a result, the velar-uvular distinction became phonemic as well
as phonetic: e.g., these newly imported /ka/ contrasted with native /qa/.
Then again, I am citing written Khitan and Manchu which may have reflected an elite, idealized pronunciation. Some Khitan and Manchu speakers learning Chinese might have pronounced uvulars before /a/. If they did, at least they had a phonotactic motivation for doing so. The phonotactic motivation, if any, for pronouncing whatever -' was in Tangut is unknown. Minimal pairs such as
3513 1my1 'sky' : 4408 1my'1 'fire'
seem to rule out a phonotactic motivation.
Could the fact that 'sky' and 'fire' had different vowels in
pre-Tangut be relevant? Could I abandon *X and instead propose
pre-Tangut *-u > -y (e.g., 'sky'; cf. Written
Burmese muiḥ < *məwh 'sky')
but pre-Tangut *-i > -y'?
No, because there are cases of -y' from pre-Tangut *-uX and -y from pre-Tangut *-i: e.g.,
𗡡 0320 1vy'1 < *NApuX or *CANpuX 'soft, weak' (cf. Japhug mpɯ < *-u 'soft')
4880 2ryr1 < *riH 'copper' (cf. Written Tibetan gri 'knife'?)
(The -r of 2ryr1 is vowel retroflexion conditioned by *r-. As 1lyr'3 'four' above demonstrates, there is no phonotactic constraint against ' coexisting with retroflexion, so I cannot claim that 2ryr1 would have ended in -y' if not for retroflexion.)
There is even a doublet for 'worm'
1888 2by1 < *mbuH and 5270 1by'1 < *mbuX
which is cognate to Written Tibetan Hbu [mbu] 'id.' See Gong Hwang-cherng's "A Hypothesis of Three Grades and Vowel Length Distinction in Tangut" (1995) for more examples. (Gong's 'long vowels' correspond to my V' 'vowel-prime' sequences.) The correct explanation for -' would have to account for such doublets.
126.96.36.199:59: TWENTY BLADES OF CHINESE GRASS
At the end of "An-derused",
I was surprised to see 漢 <CHINESE> 한 Han with 艹 instead of 廿 on the top right on the cover of 最新版常用學習三千漢字 Chhoeshinphan sangyong haksŭp samchŏn hancha (Three Thousand Hanja for Everyday Study: New Edition).
I was even more surprised to look inside and see the entry for 漢 <CHINESE> 한 Han on p. 47. Each of the three thousand hanja in the book has a large entry character atop a chart showing how to write it in seven steps and one or more example words containing it. The large entry character is 漢 with 艹 (resembling the character component <GRASS> though actually having nothing to do with grass) on the top right. However,
the hanja is listed as "水 radical [i.e., 氵] 11 strokes" (the 11-stroke figure only makes sense if the hanja has 4-stroke 廿 [resembling the character <TWENTY> though having nothing to do with twenty] rather than 3-stroke 艹)
the seven-step writing diagram has 艹 in step 2 and 廿 in step 3
the example words 怪漢 koehan 'suspiciously behaving man' and 漢方 Hanbang 'Chinese medicine' have 漢 with 廿 on the top right.
That must be confusing to someone who does not know how to write the
shows both ways to write <CHINESE>. If I were to write a book
on hanja, I'd bring up the variation of <CHINESE>.
I can describe that variation in terms of Unicode:
the 廿-version is U+FA47
the 艹-version is U+FA9A
So why don't I just type U+FA47 and U+FA9A instead of resorting to phrases like "漢 with 廿 on the top right"? Because I don't think most people have fonts that support the distinction between the two forms. The <CHINESE> hanja that you see here in fact has a third Unicode codepoint: U+6F22. Why are there three codepoints for two forms of <CHINESE>¹?
The 廿-version (U+FA47)
was added later in Unicode
for compatibility with the Japanese non-Unicode JIS standard which has
a separate codepoint for the 廿-version; U+6F22 corresponds to the
艹-version in Japanese fonts. (The Unihan database gives "J3" as a
source, but J3
is a code for JIS X 0213:2004 level-3 which I presume didn't exist
The 艹-version (U+FA9A)
was added even later in Unicode
for compatibility with the North Korean non-Unicode KPS 10721-2000
standard which has a separate codepoint for the 艹-version; U+6F22
presumably corresponds to the 廿-version in North Korean fonts. (I can't
find a copy of the North Korean standard online to confirm my guess.)
This table is my attempt to show the relationships between a few encodings and forms of <CHINESE>:
||North Korean equivalent
||South Korean equivalent in KS
The duplicate codepoints in Unicode are a byproduct of the different
versions of <CHINESE> corresponding to U+6F22 in Japanese and
North Korean encodings. In an 'ideal' Unicode without regard for
non-Unicode encodings, there would either be two codepoints for the two
versions (following a maximalist philosophy of one codepoint per form)
or just one (following a minimalist philosophy of one codepoint per platonic
character) but not three.
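One practical consequence of the duplicate codepoints: U+FA47 and U+FA9A are CJK compatibility ideographs with singleton canonical decompositions to U+6F22. As a result, every Unicode normalization form (NFC, NFD, NFKC, NFKD) silently folds them back into U+6F22. Here is a short sketch using Python's standard unicodedata module to check this. The helper normalized_codepoint is hypothetical, written just for this illustration:

```python
import unicodedata

# The three codepoints for <CHINESE> discussed above.
FORMS = {"U+6F22": "\u6f22", "U+FA47": "\ufa47", "U+FA9A": "\ufa9a"}


def normalized_codepoint(ch: str, form: str = "NFC") -> str:
    """Codepoint label(s) of ch after Unicode normalization."""
    return ", ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize(form, ch))


for label, ch in FORMS.items():
    # decomposition() returns "" for characters without a decomposition
    decomp = unicodedata.decomposition(ch) or "(none)"
    print(label, "decomposition:", decomp, "NFC ->", normalized_codepoint(ch))
```

Text that carefully distinguishes U+FA47 from U+FA9A can therefore lose the distinction as soon as it passes through software that normalizes its input.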
¹There are in fact at least 31 forms of <CHINESE>, but the 廿~艹 variants (and the simplified Chinese form 汉) are all that are needed for everyday purposes.
Of course right after I finished my previous post on 顏 U+984F~顔 U+9854 for Sino-Korean 안 an <FACE>, I realized I should have checked the very first Sino-Korean dictionary, 東國正韻 Tongguk chŏngun (1447), which is also one of the earliest hangul texts. Its entry for ᅌᅡᆫ ngan (the prescriptive 15th century reading of <FACE>) has the form 顏 U+984F. Needless to say, it is absurd to draw direct lines across three vastly different periods, but I'll do so anyway:
Tongguk (1447): 顏 — Gale (1897): 顏 — Sae chajŏn (1961): 顏 (all U+984F)
Those are the three earliest texts in my survey so far. I am certain of what I have seen in them. I am less certain about these search results from titles and authors in the National Library of Korea's database (via Cambridge's list of Korean studies resources), since it's possible someone typed one form instead of the other:
Sorting the results by date reveals some obvious typos: e.g., modern items like a book with 2018 in the title dated "201" instead of "201X". And the difference between "201" and "201X" is more obvious than that between 顏 U+984F and 顔 U+9854.
It is certainly not true that 顔 only appears in post-1961 books. The earliest result for 顔 is 史鉞 Sawŏl (The Axe of History, 1506). Although I don't have time to go through the online scan of the book (there is no search function), I can believe 顔 was in it, since the earliest attestation of that form that I can find is in the Chinese rhyme dictionary Guangyun (1008).
Conversely, it is also not true that 顏 U+984F is absent from recent publications, as ... oh no. The results include anything with an in the title or author's name regardless of whether it's spelled 顏, 顔, 晏 (the surname of the author of Sawŏl), in hangul as 안, etc. I suppose that makes sense in a time when few people may know what the proper hanja is. But why does searching for 顏 U+984F and 顔 U+9854 generate different results if all that matters is the presence of a syllable an regardless of written form? I don't know. I wonder if the site developer will ever address that question.
I'm going to look at the question of 顏 U+984F~顔 U+9854 from one last
angle. Here is a list of the frequency of the two forms in South Korean
national newspapers according
to Google. I have arranged the papers in order of circulation whenever
I could find figures. The figures were partly undated, so this table
cannot be interpreted as a true ranking. I just wanted a rough idea of
the popularity of the various papers.
||顏 U+984F||顔 U+9854||Notes
||顔 U+9854 figure includes instances in the paper's Japanese edition.|
||0 in spite of the fact that the paper does not
have a no-hanja policy like Hankyoreh (see below).
||顏 U+984F figure excludes instances in the
paper's Chinese edition.
顔 U+9854 figure includes instances in the paper's Japanese edition.
||The paper has a no-hanja policy in its Korean
edition, so the figures are for 顔 U+9854 in the paper's Japanese
edition.
The one instance of 顏 U+984F is in a comment in the Japanese edition and is probably a character selection error, as the rest of the comment is in postwar characters; the writer is not someone like me who insists on prewar orthography.
For comparison, in Asahi shinbun, 顏 U+984F appears 30 times and 顔 U+9854 appears 165,000 times. (Those figures include <FACE> for native kao as well as Sino-Japanese gan, whereas <FACE> in Korean only represents Sino-Korean an.) Kanji are alive and well in Japanese, whereas hanja are in decline in Korean. I took hanja seriously when I first started learning Korean in 1987. The newspapers were full of them then. But now Hankyoreh has zero in its Korean edition. An all-kana Japanese newspaper is unthinkable today, even though Japanese TV news reporters demonstrate it is possible to present the news orally without any kanji (not counting onscreen text).
What is missing from the figures above is a sense of proportion and the time dimension. What is the frequency of each form of <FACE> per million characters (counting hangul letter blocks as single characters) per year per publication? My guess is that the Japanese usage of both forms of <FACE> has remained constant since the postwar writing reform, whereas <FACE> in either form has become increasingly infrequent in Korean, though 顔 U+9854 has taken the lead due to
the inability to type 顏 U+984F in Windows' Korean IME
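The per-million figure asked for above is simple arithmetic once the text is in hand: occurrences of a form × 1,000,000 ÷ total characters, computed separately for each publication and year. Here is a minimal Python sketch, assuming the text is available as a string. per_million is a hypothetical helper, and it counts every code point (including spaces and punctuation) as a character. The sketch applies NFC normalization so that hangul stored as conjoining jamo is recombined into one precomposed syllable block per character. NFC also folds compatibility ideographs such as U+FA47 into their unified counterparts. That is harmless for 顏 U+984F and 顔 U+9854, since neither has a decomposition.

```python
import unicodedata


def per_million(text: str, form: str) -> float:
    """Occurrences of `form` per million characters of `text`."""
    # NFC recomposes conjoining jamo (e.g. U+110B U+1161 U+11AB) into a single
    # precomposed syllable (U+C548), so each hangul block counts once.
    text = unicodedata.normalize("NFC", text)
    if not text:
        return 0.0
    return text.count(form) * 1_000_000 / len(text)


# Toy sample, not real newspaper data: 顔面 안면 顏面 written with escapes
# so that 顔 U+9854 and 顏 U+984F cannot be confused.
sample = "\u9854\u9762 \uc548\uba74 \u984f\u9762"
print(per_million(sample, "\u9854"), per_million(sample, "\u984f"))
```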
日常 ilsang 'everyday, common': 0
日曜日 iryoil 'Sunday': 0
日記 ilgi 'diary': 0
일상 ilsang 'everyday, common': 194
일요일 iryoil 'Sunday': 425
일기 ilgi 'diary': 593
中央日報 JoongAng Ilbo 'Central Daily'
創刊号 chhangganho 'first issue'
創刊辭 chhanggansa 'first issue editorial' (i.e., something
written to introduce the first issue)
1965年9月2日 chhŏn'gubaengnyukshibo-nyŏn kuwŏl iil 'September 2, 1965'
第1號 che ir ho 'issue number 1'
日刊 ilgan 'daily'
<NUMBER> 호 ho appears on the front page as 號 U+865F and as 号 U+53F7, the simplified form also used in postwar Japan. I get the impression that official standards aside - 号 U+53F7 isn't supported by KSC encoding or in the 1,800 hanja taught in secondary schools - Korean typographers and even reference book writers are not purists when it comes to hanja forms. I was surprised to see 漢 <CHINESE> 한 Han with 艹 instead of 廿 on the top right on the cover of 最新版常用學習三千漢字 Chhoeshinphan sangyong haksŭp samchŏn hancha (Three Thousand Hanja for Everyday Study: New Edition).
188.8.131.52:59: J-AN-US: THE TWO <FACE>S OF NAVER
Why do I care so much about minute variations like 顏~顔 for Sino-Korean 안 an <FACE>?
In TJK¹ studies, subtly different graphs are often regarded by modern scholars as separate entities. Whether such differences also reflect linguistic differences requires study.
No such study is needed to know that 顏 and 顔 are the 'same'
character in one sense. But in Unicode, they are not: 顏 is U+984F and 顔
is U+9854. Unicode is not consistent about assigning variants to
different codepoints. That is not necessarily a flaw. Should the
VS17 and VS18 forms of 喩 U+55A9 have different codepoints? I
couldn't tell them apart without laying VS18 over VS17. On the other
hand, why doesn't VS19
of 囀󠄀 U+56C0 have its own codepoint? Andrew
West has much more on this issue.
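The contrast can be made concrete in Python. 顏 U+984F and 顔 U+9854 are separate unified ideographs, and no normalization form merges them. The VS17/VS18/VS19 glyphs, by contrast, are sequences of a base character plus an ideographic variation selector (VS17 is U+E0100, VS18 U+E0101, and so on), so they share one base codepoint. The helper names below are hypothetical, just for illustration:

```python
import unicodedata

FACE_A = "\u984f"  # 顏
FACE_B = "\u9854"  # 顔


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00-U+FE0F) and VS17-VS256 (U+E0100-U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def ivs(base: str, n: int) -> str:
    """Build base + VSn, where n runs from 17 (U+E0100) to 256 (U+E01EF)."""
    if not 17 <= n <= 256:
        raise ValueError("ideographic variation selectors are VS17-VS256")
    return base + chr(0xE0100 + n - 17)


def base_codepoints(s: str) -> list[str]:
    """Codepoint labels of s with any variation selectors stripped."""
    return [f"U+{ord(ch):04X}" for ch in s if not is_variation_selector(ch)]


# No normalization form merges the two <FACE> characters.
for nf in ("NFC", "NFD", "NFKC", "NFKD"):
    print(nf, unicodedata.normalize(nf, FACE_A) == unicodedata.normalize(nf, FACE_B))

# Two variants of 喩 U+55A9: distinct strings with the same base codepoint.
yu17, yu18 = ivs("\u55a9", 17), ivs("\u55a9", 18)
print(yu17 == yu18, base_codepoints(yu17), base_codepoints(yu18))
```

Whether a VS17 or VS18 glyph actually looks different on screen depends on the font supporting that sequence. The strings differ either way.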
Back to Korean: my interest lies in determining what the de facto standard form of 안 an <FACE> is or was at different points in time.
Last night I forgot to check Gale (1897), one of the first Korean dictionaries I ever used. Page 943 has 顏 U+984F.
Today I use naver.com. Its hanja dictionary treats 顏 U+984F as the 本字 ponja 'original character' of 顔 U+9854, but its entry for 顔 U+9854 is lengthier, including lists of 18 words and 5 phrases containing 顔 U+9854 without equivalents in the entry for 顏 U+984F. Clearly the dictionary regards 顔 as the principal form. Yet if I run a search on those characters throughout the entire dictionary (i.e., if I have 전체 chŏnchhe 'entire body' selected), I get
One might conclude that the 31 words containing 顏 U+984F can never be written with 顔 U+9854, that there are no phrases that can be written with 顏 U+984F, etc. But that isn't true: anything that can be written with one can be written with the other. And yet there may not be any overlap between those lists: e.g.,
伯顏 paegan 'Jurchen word for a rich man' has no alternate spelling 伯顔 with U+9854 listed
顔面 anmyŏn 'face' has no alternate spelling 顏面 with U+984F listed
有顏面 yuanmyŏn 'having a face'
隔歲顏面 kyŏkseanmyŏn 'a face one meets for the first time
in a year'
My impression is that in South Korea 顔 U+9854 has become dominant but has not yet fully eclipsed 顏 U+984F. Otherwise I would expect Naver to be like Japanese dictionaries which have a single main entry for 顔 U+9854 and list 顏 U+984F as a variant.
I predict that the domination of 顔 U+9854 will increase over time as
Koreans type fewer hanja and only use hanja that their IMEs provide for
them: e.g., 顔 U+9854 but not 顏 U+984F in the case of Windows.
¹Andrew West's term for Tangut/Jurchen/Khitan, a play on CJK for Chinese/Japanese/Korean.
THE GOOD FACE OF MILLET
Leftovers from yesterday:
1. Lenition was an unspoken theme of "Fanning Red Ears of Grain". Yesterday I realized that
Dutch goeie < goede 'good'
Japanese 良い yoi < yoki 'good'.
In both cases, the lenition is not regular. Not all intervocalic -d-
and -k- have disappeared from those languages. Goede
still exists as a formal form, and 良き yoki still exists
as an archaic form. Both unlenited broeder 'brother'
(religious) and lenited broer 'brother' (sibling) coexist in
formal Dutch. Japanese -k- almost never lenites in noninflected
forms: e.g., 時 toki 'time' has not become ˟toi. (The
one exception I can think of is 垣間 kaima 'gap in a fence' < kaki-ma-mi
'fence-space' in which the noun kaki 'fence' - never ˟kai
by itself - lenited.)
Lenition is mandatory in inflected forms apart from archaisms, and not
all hypothetical archaisms are possible: e.g., no one says ˟kakita
'wrote' instead of kaita < *kakitari. *tari
didn't regularly become ta
in Japanese; it is another example of reduction. All these examples
demonstrate how reduction is not necessarily regular like a 'sound
law'. I expect nonreductive sound changes to involve
exceptionless sound laws: e.g., *dh > *d in Germanic.
(Maybe not the best example since *dh > *d could be
regarded as reduction [aspiration loss], but I don't know of any
language in which deaspiration isn't regular. Was *dh really *ð?
See Phoenix's post.)
I got the broeder/broer example from Phoenix who has more on Dutch (and Irish) lenition.
For even more, including cases of hypercorrect -d-insertion
in Dutch, see de Vaan (2018: 64-65).
I assume goei
(formal goed) 'good' is a product of backformation from
lenited goeie, as I don't know of any other cases of
Dutch final -d [t] becoming -i.
2. I forgot to mention one other 'ao-ddity' in "Fanning Red Ears of Grain" - a Japanese name that is sui generis as far as I know:
*apa-pu 'millet-place.where.grow' > *ababu > *aβaβu > *awawu > *awau > *awɔː > 粟生 <ahafu> ao (not ˟aoː!)
I can't explain why the final vowel isn't long. Was *aoː
confused with the much more frequent word ao 'green'?
3. When I typed Japanese 顏 kao 'face' in "Fanning Red Ears of Grain", I initially used Windows 10's Korean IME since I thought the standard modern forms of hanja were identical to the prewar kanji I prefer. But to my surprise, the IME converted 안 an (the Sino-Korean reading of 顏) to 顔, which looks exactly like the postwar kanji for kao.
When I was studying Korean, my instructor pointed out I had written a postwar Japanese-style kanji instead of the proper hanja which was identical to the corresponding prewar kanji. Perhaps that was before I decided to embrace prewar Japanese orthography in formal writing. Ever since then I've been writing hanja and prewar kanji identically without any problems. Until now.
I did a quick survey of Korean books I could easily find to see what form of <FACE> was in them. Obviously Wiktionary isn't a book, but I've included it anyway:
||Author or publisher
||새字典 Sae chajŏn (New Dictionary of Characters)
||東亞出版社 Dong-A chhulphansa
||A Korean-English Dictionary (in entry for
顏面 anmyŏn 'face')
||Martin, Lee, and Chang
||賢學學習玉篇 Hyŏnhak haksŭp okphyŏn (Hyŏnhak Study
||A Guide to Korean Characters
||Bruce K. Grant
||Jacob Chang-ui Kim|
||最新版常用學習三千漢字 Chhoeshinphan sangyong haksŭp
samchŏn hancha (Three Thousand Hanja for Everyday Study: New Edition)
||弘新文化社 Hongshin munhwasa
||Pictorial Sino-Korean Characters||Jacob Chang-ui Kim
||동아現代活用玉篇 Dong-A hyŏndae hwaryong okphyŏn (Dong-A Modern Practical Jewel Book)||東亞出版社 Dong-A chhulphansa|
||List of 1,800 hanja taught in South Korean schools||Wiktionary
<FACE> is in the KSC standard as presented in 中日朝漢字字形對照 'Chinese-Japanese-Korean Chinese character form comparison' and Wiktionary. But there are multiple versions of the KSC standard, so maybe the form changed over the years.
I wonder what the standard form is in North Korea.
Grant (1982) and Kim (1984) were my introduction to hanja. I started learning readings from their indexes in 1987. I obviously wasn't paying attention to the form of <FACE> back then.
My copy of Hyŏnhak haksŭp okphyŏn has a cover attached upside down and a partly mirror-imaged page of publication information which do not inspire confidence. Is Hyŏnhaksa still in business? I can't Google a company site.
184.108.40.206:50: LA SCAR
425 years ago today, Portuguese and lascarins invaded Kandy. Lascarin is from Persian لشکر lashkar.
article derives Persian lashkar 'army' from Arabic العسكر al-`askar
'the army'. So is lashkar an article-incorporating word like algebra
or Haitian Creole lalin < la lune?
No, because the Persian word is attested in Middle Persian as <lškl> before the coming of Islam. So the direction of borrowing and analysis was the other way around: a Persian word without an article was reinterpreted as an Arabic article-noun sequence.
But why does the Arabic word have `ayn, a consonant absent from the Persian word (and Persian in general)?
Perhaps at the time of borrowing, the first vowel of Persian lashkar sounded more like Arabic /a/ after `ayn rather than Arabic /a/ after a glottal stop.
Another possibility - not necessarily exclusive - is that lashkar
could not have been interpreted as al-'askar because the
Persian form has no glottal stop. Persian la- sounded more like
Arabic l`a, a voiced sequence without a stop, than l'a,
a voiced sequence interrupted by a stop.
A question I can't answer is why the Arabic word has s instead of sh.
FANNING RED EARS OF GRAIN
Yesterday I saw part of 47 Ronin (2013). I looked up that movie, and as a result I finally learned how to spell Akō in Japanese¹: 赤穗・あかほ <RED EAR.OF.GRAIN>/<akaho>², which looks as if it should be pronounced Aka(h)o, i.e., as aka 'red' + ho 'ear of grain' with little or no sandhi. But in fact the second and third vowels have fused into a single long vowel:
*aka-po > *akabo > *akaβo > *akawo > *akao > *akɔː > akoː
I didn't expect that because the normal reflex of *apo is ao: e.g.,
*kapo > *kabo > *kaβo > *kawo > 顏・かほ <kaho> kao 'face'
I would expect ō to be from *a(p)u, not *a(p)o. Is *a(p)o > ō a sound change in the Akō dialect?
Standard Japanese has cases of the reverse that I can't explain:
*apuŋgu > 扇ぐ・あふぐ <afugu> aogu 'to fan' (cf. Okinawan ōjun 'id.')
Compare with this (mostly) regular word from the same root:
*apuki > 扇・あふぎ <afugi> ōgi 'fan (noun)' (cf. Okinawan ōji 'id.')
*k > g is irregular. Here's a doubly irregular word:
*ambure- > 溢れ・あふれ- <afure> 'to overflow' (cf. Okinawan andiin < anriin < *ambure- 'id.')
I don't know how *mb became f. The spelling <afure> should regularly be read as ōre.
¹I read John
Allyn's 47 Ronin in English around 1986 and incredibly
never encountered the name Akō in Japanese until now!
²I write all Japanese forms in prewar kanji and kana orthography. Prewar kana orthography is closer to earlier pronunciation than modern kana orthography.
220.127.116.11:48: HE HINDIKE EPOKHE?
243 years ago yesterday, John Adams predicted that
[t]he Second Day of July 1776, will be the most memorable Epocha, in the History of America.
His use of epocha retaining Latin -a got me wondering about the etymology of the word:
from Ancient Greek ἐποχή (epokhḗ, “a check, cessation, stop, pause, epoch of a star, i.e., the point at which it seems to halt after reaching the highest, and generally the place of a star; hence, a historical epoch”), from ἐπέχω (epékhō, “I hold in, check”), from ἐπι- (epi-, “upon”) + ἔχω (ékhō, “I have, hold”).
I then looked up the etymology of ékhō:
From Proto-Indo-European *seǵʰ-.
But wait - how can that be? PIE *s- becomes Greek h-, not zero. Oh, duh: Sihler (2008: 170) points out that Grassmann's Law applies to the secondary aspirate h- as well as the primary aspirates (kh th ph):
ἔχ ékh- < *hekʰ- < *seǵʰ-
(Here I assume the devoicing of the primary aspirates predates
Grassmann's Law. Does it? The
Proto-Greek Wikipedia page says Grassmann's Law may be
already had voiceless aspirates.)
Grassmann's Law does not apply to the future stem, presumably because the law must postdate deaspiration before *s:
ἕξ- héks- < *seǵʰ-s-.
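The relative chronology can be checked mechanically. Below is a toy Python sketch that assumes the order *s- > h-, devoicing of the voiced aspirates, deaspiration before *s, then Grassmann's Law. The segment lists and rule functions are illustrative assumptions. So is the simplification that Grassmann's Law looks for any later aspirate in the stem rather than working syllable by syllable:

```python
# Aspirate -> deaspirated counterpart; h "deaspirates" to nothing.
ASPIRATES = {"h": "", "pʰ": "p", "tʰ": "t", "kʰ": "k"}
DEVOICE = {"bʰ": "pʰ", "dʰ": "tʰ", "ǵʰ": "kʰ"}


def s_to_h(segs):
    """Word-initial *s- > h- (only initial position is modeled)."""
    return ["h"] + segs[1:] if segs and segs[0] == "s" else list(segs)


def devoice(segs):
    """Voiced aspirates become voiceless aspirates (ǵʰ > kʰ)."""
    return [DEVOICE.get(s, s) for s in segs]


def deaspirate_before_s(segs):
    """A stop aspirate loses its aspiration directly before s."""
    out = []
    for i, s in enumerate(segs):
        before_s = i + 1 < len(segs) and segs[i + 1] == "s"
        out.append(ASPIRATES[s] if before_s and s in ASPIRATES and s != "h" else s)
    return out


def grassmann(segs):
    """An aspirate deaspirates if another aspirate follows later in the stem."""
    out = []
    for i, s in enumerate(segs):
        if s in ASPIRATES and any(t in ASPIRATES for t in segs[i + 1:]):
            out.append(ASPIRATES[s])
        else:
            out.append(s)
    return [s for s in out if s]  # drop the empty string left by h > zero


def derive(segs, rules):
    """Run the sound changes in order and join the result."""
    for rule in rules:
        segs = rule(segs)
    return "".join(segs)


ORDER = [s_to_h, devoice, deaspirate_before_s, grassmann]
print(derive(["s", "e", "ǵʰ"], ORDER))       # present stem, cf. ἔχ- ékh-
print(derive(["s", "e", "ǵʰ", "s"], ORDER))  # future stem, cf. ἕξ- héks-
```

If the last two rules are swapped, the future comes out as *eks- without the h- that ἕξ- actually has. That is why deaspiration before *s has to come first.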
I really should have known better because the same is true in Sanskrit:
budhyate < *bhudh- 'wakes'
bhotsyate < *bhudh-sya- 'will wake'
Bucknell (1994: 179) lists a variant future form bodhiṣyati with an -i- blocking -s- from conditioning the deaspiration of the preceding dh. But I have not been able to confirm this form in Monier-Williams, Whitney, or the Digital Corpus of Sanskrit.
As tempting as it is to regard Grassmann's Law as a shared innovation of Greek and Indo-Iranian, that's not possible. Grassmann's Law must postdate *s- > h- in Greek, a change that never happened in Proto-Indo-Iranian. (*s > h did occur later in Iranian but not in Indic.) Wikipedia's Graeco-Aryan page suggests that
Rather, it is more likely that an areal feature spread across a then-contiguous Graeco-Aryan–speaking area. That would have occurred after early stages of Proto-Greek and Proto-Indo-Iranian had developed into separate dialects but before they ceased to be in geographic contact.
While I'm on the topic of Greek h ... today I was surprised to see Hindikē for 'Indian' in Wikipedia's "India (Herodotus)" article. Until now I thought that 'India' had initial I- in Greek because it was borrowed from Old Persian Hinduš 'Indus' (after *s > h in Old Persian; cf. Sanskrit Sindhus) after Greek had lost h-. But that Wikipedia article gives the Greek spelling Ἰνδική <Indikḗ> for Hindikē, not Ἱνδική <Hindikḗ>. Google gives only seven results for Ἱνδική. One is an OCR error for Ἰνδική. Two (1 2) are Armenian dictionaries with no Greek that I can see, three (1 2 3) appear to be copies of the same Armenian dictionary, and one is a Greek Facebook post. Hindikē looks like an error for the standard form Indikē.
THE ETYMOLOGY OF CANTONESE 1LAT
Today it occurred to me that Cantonese 甩 1lat 'to lose' may be cognate to 失 1sat 'to lose' (now a bound morpheme in Cantonese):
1lat < *l̥it
1sat < *l̥it
Also belonging to this word family is
逸 6jat < *lit 'to escape' (also now a bound morpheme in Cantonese)
What was the original root initial? Two scenarios with two subscenarios each:
A. *l- is original, and *l̥- is
A1. from a devoicing prefix + *l- or
A2. by analogy with some other voiceless/voiced sonorant-initial verb pair.
B. *l̥- is original, and *l- is
B1. from a voicing prefix + *l̥- or
B2. by analogy with some other voiceless/voiced sonorant-initial verb pair.
The B scenario seems less popular. I've never seen anyone propose
anything like it, probably because of a reluctance to posit a primary
voiceless lateral. Voiceless laterals are uncommon in the world's
languages, though they seem common in 'Tibeto-Burman' (i.e.,
Sino-Tibetan minus Chinese - and even within Chinese, Taishanese has [ɬ]).
But wait - if both 1lat and 1sat go back to *l̥it,
why do they have different initial consonants in Cantonese? There are two
possibilities:
A. 1lat is native Cantonese, whereas 1sat (with
cognates throughout Chinese) is borrowed. In other words,
In native Cantonese words, Proto-Chinese *l̥- merged with *l-.
In borrowings, Proto-Chinese *l̥- merged with *ɕ-.
But how many native Cantonese words have l- as a reflex of *l̥-?
There are many Cantonese words with s- from Proto-Chinese *l̥-.
Are they all borrowings?
B. l- and s- are the products of reduction at
different points in time. Three identical Proto-Chinese sequences could
undergo three different paths of reduction:
||reduction phase 1
||reduction phase 2
||reduction phase 3
||reduction phase 4
The trouble is that I cannot easily account for a fourth type of reduction also involving an *sl-type sequence that fuses into *z-. More on this problem tomorrow.
(18.104.22.168:57: It seems that every time I write that I'll continue tomorrow, I end up finding some other topic that eats up my time the next day. In this case I am finishing a July 4th-themed post that has to go up on July 4th. So this and other loose ends will have to wait - or, worse yet, be forgotten. I have no idea how many unfinished series there are on this blog after seventeen years.)
'BASIL' IN TANGUT
While researching the post I originally intended for today, I found this Tangut borrowing of Sanskrit arjaka 'basil' in Kychanov and Arakawa (2006: 361):
4541 0013 3985 1a? 1zar 1ka'3
I would expect the Sanskrit consonant cluster -rj- to be rendered as -ryr dz- with an epenthetic retroflex vowel -yr and dz, the usual Tangutization of Sanskrit j. (Tangut, like Tibetan and Late Middle Chinese, reflects a style of Sanskrit pronunciation with dental affricates instead of palatal stops.)
(126.96.36.199:54: Compare zar for rja with ryr ga for rka in
𗠝𘙇𘕜𗏵𗜫 4541 0795 5091 3369 4293 1a? 2ryr4 1ga4 1ma4 1si4 for Sanskrit Arkamasi [a name]
from Sun and Tai (2012: 359). I cannot explain the g for k.)
But instead of †ryr dza, the actual Tangut form has 1zar1 with z- and vowel retroflexion. Why?
My guess is that the Tangut reflects a rdza ka, the Tibetan version of the Sanskrit word for 'basil'. Here's what I think happened:
1. The Tangut borrowed Tibetan a rdza ka as *a rdza ka'. (I'm leaving out tones and grades for simplicity.)
2. *a rdza ka' became *a dzar ka' after *rCV became CVr (i.e., [CVʳ] with a retroflex vowel) in Tangut.
3. Medial *-dz- lenited to *-z-: *a dzar ka' > *a zar ka'.
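The three steps above can be replayed as ordered string rewrites. This is a rough Python sketch under my own simplifying assumptions:

- Syllables are space-separated.
- Tones and grades are left out, as in the steps.
- A final r marks vowel retroflexion, not an [r] coda.
- "Medial" is approximated as "after a vowel-final syllable".

The regex patterns are hypothetical approximations of the prose.

```python
import re

# Ordered rules: (label, regex pattern, replacement).
RULES = [
    # Step 2: *rCV > *CVr, where the final r marks a retroflex vowel.
    ("*rCV > *CVr", r"\br([^\saeiouy']+)([aeiouy])", r"\1\2r"),
    # Step 3: medial *dz lenites to *z after a vowel-final syllable.
    ("medial *dz > *z", r"(?<=[aeiouy] )dz", "z"),
]


def apply_rules(form, rules=RULES):
    """Return every stage, starting with the input form."""
    stages = [form]
    for _label, pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages


# Step 1: the borrowed Tibetan form, with the Tangut -' already added.
print(" > ".join("*" + stage for stage in apply_rules("a rdza ka'")))
# *a rdza ka' > *a dzar ka' > *a zar ka'
```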
126.96.36.199:57: CAN AI DECIPHER PYU?
tl;dr: I doubt it.
I ended my last entry with a teaser for what was supposed to be this entry. Today I did start writing part 6 of my 役/堤 series. Then I saw this on reddit:
Machine learning has been used to automatically translate long-lost languages - Some languages that have never been deciphered could be the next ones to get the machine translation treatment.
That took me to MIT Technology Review which links to the original paper "Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B". I haven't looked at it yet. I am not a computer science person, so I almost certainly wouldn't understand it. I do understand the MIT article, so I'll make a few comments here.
The big idea behind machine translation is the understanding that words are related to each other in similar ways, regardless of the language involved.
So the process begins by mapping out these relations for a specific language. This requires huge databases of text.
There is no huge database of Pyu text. My text file of all the Pyu text that I can 'read' (not understand - just transliterate in most cases) is 50 kb.
Such a database would be possible for Pyu's distant relative Tangut. A Khitan database, though far smaller than that for Tangut, would still be bigger than the Pyu database.
If only languages had one-to-one correspondences!
The key insight enabling machine translation is that words in different languages occupy the same points in their respective parameter spaces. That makes it possible to map an entire language onto another language with a one-to-one correspondence.
The idea is that any language can change in only certain ways—for example, the symbols in related languages appear with similar distributions, related words have the same order of characters, and so on.
The general idea that language change is constrained is correct.
With these rules constraining the machine, it becomes much easier to decipher a language, provided the progenitor language is known.
But we don't know the progenitor (ancestor) of Pyu. The reconstruction of Proto-Sino-Tibetan has barely begun. I don't even know where Pyu fits into the family.
Luo and co put the technique to the test with two lost languages, Linear B and Ugaritic. Linguists know that Linear B encodes an early version of ancient Greek and that Ugaritic, which was discovered in 1929, is an early form of Hebrew.
But Ugaritic is not an early form of Hebrew; it's an early relative. An aunt, not a mother. Mycenean Greek has a similar relationship to ancient Greek as we know it. No mention of progenitor languages like Proto-Semitic or Proto-Indo-European. It seems that the technique is actually dependent on better known relatives, not progenitors. And those relatives have to be close. Pyu has no known close relatives.
It would be interesting to test this technique on modern languages.
Spanish could be deciphered using Italian. But Italian wouldn't help
with, say, Albanian, Armenian, or Bengali. Indo-European has enormous
internal diversity, and so does Sino-Tibetan.
But the big advantage of machine-based approaches is that they can test one language after another quickly without becoming fatigued. So it’s quite possible that Luo and co might tackle Linear A with a brute-force approach—simply attempt to decipher it into every language for which machine translation already operates.
The hope is that Linear A will turn out to be a close relative of some "language for which machine translation already operates". But what if it isn't? What if it's an isolate?
Pyu does not seem to be an isolate in the sense of having zero relatives. But it does seem to be an isolate within Sino-Tibetan - an Asian Albanian without close relatives among its neighbors. So I doubt a brute-force approach using Burmese, Chin, Karen, etc. is going to pay off.
A WÉI-RD READING
One last branch of the tree that started with 役小角 En no Ozunu's name:
While checking the Wiktionary entry for 堤 from "Edachi Again", I was surprised when I saw its list of Mandarin readings for the character. As the Sesame Street song goes, "One of these things is not like the others":
dī 'dike; base of bottle'
tí 'dike; base of bottle'
tǐ (sic; an error for dǐ) 'to stop'
shí (first syllable of 堤封 shífēng, now normally tífēng 'totally')
wéi (in place names; the only example I could find is
premodern 洙堤郡 Zhūwéi Prefecture)
Normally multiple readings of a character have initial consonants at
similar places of articulation. t- and d- are both
dental and sh-, though not dental, is retroflex. w-,
however, is labial. I cannot think of any other T-character
with a w-reading.
I found 洙堤郡 Zhūwéi Prefecture in 集韻 Jiyun (1039). I did not find it in Scripta Sinica's text database, so I have no idea how old that place name is.
The Jiyun fanqie for 堤 in 洙堤郡 is
勻 *win + 規 *kwie
which adds up to a Middle Chinese reading *wie. But Middle
Chinese no longer even existed by 1039. And I could argue that 'Middle
Chinese' in the sense of 'the language of dictionaries and rhyme
tables' did not exist, at least not as a spoken language. Putting those
misgivings aside, I think an 11th-century reader might have pronounced 堤
in the prefecture name as something like *wi, whose initial is
still hard to reconcile with the others.
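Fanqie spelling is essentially a small algorithm: take the initial of the upper character and the final (rhyme) of the lower character. Here is a Python sketch under stated assumptions:

- The split of *win and *kwie into initial + final is my own hypothetical segmentation.
- Tones are ignored.
- A w- initial absorbs a following -w- medial, so that w + wie yields *wie rather than †wwie.

```python
from typing import NamedTuple


class Syllable(NamedTuple):
    initial: str
    final: str  # medial + rhyme; tone omitted in this sketch


def fanqie(upper: Syllable, lower: Syllable) -> str:
    """Initial of the upper speller + final of the lower speller."""
    initial, final = upper.initial, lower.final
    # Assumption: a w- initial merges with a -w- medial instead of doubling it.
    if initial.endswith("w") and final.startswith("w"):
        final = final[1:]
    return initial + final


YUN = Syllable("w", "in")   # 勻 *win
GUI = Syllable("k", "wie")  # 規 *kwie
print("*" + fanqie(YUN, GUI))  # *wie
```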
I'm not even sure how to read 洙 in the prefecture name. More on this problem next time.
188.8.131.52:14: GSR 130 AND 128
GSR 851a 役 from my last three posts
looks like a semantophonetic
compound of 彳 'to go' and a phonetic GSR 130a 殳, but 殳 is in fact a
semantic component 'baton' (Karlgren 1957: 226), 'a kind of lance'
(Schuessler 1987: 563).
The standard Mandarin reading of 殳 is shū with a high level tone normally pointing to a *voiceless initial. But other evidence points to a *voiced initial. GSR 128s 殊 'to cut off' > 'very', a homophone of 殳, transcribes Sanskrit ju in 文殊 for Mañju(śrī). And 殊 is also now shū in standard Mandarin. Why aren't 殳 and 殊 ˟shú with a high rising tone reflecting a *voiced initial?
Here's how I reconstruct the history of 殳 and 殊:
Scenario A: Primary *-d-
*CIdo > *CIduo > *duo > *dʑuo > *dʑu > *ʑu > shū
Scenario B: Secondary *d-
*NITo > *NITuo > *NTuo > *nduo > *dʑuo > *dʑu > *ʑu > shū
*N- is an unknown nasal. If Old Chinese was like Pyu, it had two possible nasal initials in presyllables: *n- and *m- (but probably not *ŋ-, unless ṅraḥ /ŋ.raH/ in PYU 20 is not an isolated oddity).
*T is an unknown dental stop: *t, *tʰ, or *d.
I can posit two parallel scenarios for 投 'to throw':
Scenario A: Primary *d-
*do > *dou > *du > *dəw > tóu
Scenario B: Secondary *d-
*NTo > *ndo > *do > *dou > *du > *dəw > tóu
Schuessler (2007: 500) links 投 to Written Tibetan Hdor-ba
'to throw away' and gtor-ba 'to throw', but I would expect
Written Tibetan -r to correspond to Old Chinese *-r,
殊 'to cut off' is cognate to 誅 'to punish, kill, reprove'. I assume both 殊 and 誅 had roots with an unaspirated voiceless initial, as there is no evidence for *tʰ or *d in 誅:
*RIto > *RItuo > *Rtuo > *truo > *ʈuo > *ʈu > zhū [tʂu]
*R- might be *r- or *l-. *R- is so common in Old Chinese that I suspect it cannot simply be *r-. Written Tibetan has preinitial l- as well as r-, so Old Chinese may also have had preinitial *l-.
184.108.40.206:59: EDACHI AGAIN: WHAT COUNTS AS OLD JAPANESE?
Continuing from my previous entry ...
岩波古語辞典 Iwanami kogo jiten (The Iwanami Dictionary of Old Words, 1990) gives an example of Old Japanese 役 edachi 'being forced to fight or work for the government' from 古事記 Kojiki (Record of Ancient Matters, 712):
Tsutsumi ike ni edachite, Kudara no ike wo tsukuriki.
'[They] were put to work on dikes and ponds, [and they] made the Pond of Paekche.'
Here is the context from 倉野憲司 Kurano's
(1991: 145) reading of the Kojiki¹:
Mata Shiragibito maiwatarikitsu. Koko wo mochite Takeuchi no sukune no mikoto hikiite, tsutsumi ike ni edachite, Kudara no ike wo tsukuriki.
'Again Shilla people came over [to Japan]. Therefore Takeuchi no sukune no mikoto led them, had them put to work on dikes and ponds, [and they] made the Pond of Paekche.'
I fear that someone might see that entry and conclude that edachi is an Old Japanese word.
Why "fear"? Notice I wrote "reading" and not "edition". Kurano's (1991: 276) edition of the Kojiki - the text upon which his reading is based - doesn't have a single hiragana, since of course hiragana did not yet exist in 712:
It has punctuation marks that almost certainly weren't in the original text. I don't know what this passage looks like in the oldest surviving manuscript (the Shinpuku Temple manuscript from 1371-72), but you can see there is no punctuation in this image of a page from that manuscript. The punctuation is hardly the biggest problem, though.
Let's look at another reading of the Kojiki by 武田祐吉 Takeda Yūkichi (1977: 137)²:
Mata Shiraki hito maiwatarikitsu. Koko wo mochite Takeuchi no sukune no mikoto, hikiite, watari no tsutsumi no ike to shite, Kudara no ike wo tsukuriki.
'Again Shilla people came over [to Japan]. Therefore Takeuchi no sukune no mikoto led them, [and they] made the Pond of Paekche as a pond of the dike of the people who crossed over [i.e., from the Korean peninsula].'
(6.29.22:12: 'pond of the dike' makes no sense to me. The original text has 堤池 <DIKE POND>.)
It has no form of the verb edachi or even the character 役 (which Takeda reads as 渡 watari '[person who] crossed over').
Those two readings are not the only possibilities. Jidaibetsu kokugo daijiten (1967: 141) mentions two more readings of 爲役 or 役:
etatase < *ye-tat-ase-Ø 'service-stand
etate < *ye-tate-Ø 'service-stand (v.t.)-INF'
and proposes a third:
Which of these, if any, is right? There is no evidence within the original text to know. All the readings - the 讀み下し文 yomikudashibun - are translations into a stylized Japanese that is archaic but may not be identical to Old Japanese. That's why I romanize the readings in modern pronunciation.
The key word is phonograms. Unless an Old Japanese word is attested in phonograms, its phonetic value is unknown. All the e-readings of 役 are simply educated guesses. We don't know how yomikudashi worked in 712. 役 might even have been read as something like Go-on wiyaku!
6.29.22:20: The bottom line for me is that only Old Japanese words in phonograms can be cited in phonetic transcription. Old Japanese words in semantograms should be cited without phonetic transcription: e.g., 役 as <SERVICE>, not educated guesses like edachi, etachi, etatase, etate, etashi, etc.
古事記をそのまま読む Reading the Kojiki as It Is has more on the problem of interpreting 役 (or 渡 in the manuscript that it reproduces; Takeda's reading seems to be based on a manuscript with 渡): e.g.,
しかし、「役之堤池」は、全く不可解な構文である。まず、"之"が「の」を表すとした場合、「役の堤と池」は意味をなさない。 「役」を動詞とした場合も、目的語「堤池」の前に"之"を挟むことは絶対にない。 「役」がもし正しいとすれば、考えられる唯一の可能性は「"堤池之役"の誤写」である。
However, 役之堤池 is a completely incomprehensible construction. First, if 之 represents no [a genitive marker], 'dikes and ponds of service' makes no sense. [I had considered that possibility.] Even if 役 is taken as a verb, 之 would absolutely not be between it and its object 堤池 'dikes and ponds'. If 役 is taken as correct, the only conceivable possibility is an erroneous copying of 堤池之役 'service of dikes and ponds'.
6.29.21:50: Added English translations of the readings and derivations of the Jidaibetsu readings.
¹6.29.22:23: I have converted Kurano's hiragana
back into the
original kanji whenever possible to facilitate comparison with the
original all-kanji text.
²6.29.22:24: I have converted Takeda's hiragana and postwar simplified kanji back into the original kanji whenever possible to facilitate comparison with the original all-kanji text.
Continuing from yesterday's post about the unusual Japanese name 役小角 En no Ozunu:
The English Wiktionary lists edachi as a kun (native Japanese) reading of the Chinese character 役. (時代別国語大辞典 Jidaibetsu kokugo daijiten [The Great Dictionary of the National Language Categorized by Era] favors etachi.) I think edachi/etachi is not fully native. I regard it as a Chinese-Japanese hybrid *ye-(nV-)tat- 'to be forced to fight or work for the government'. The verb may be identical or similar in structure to modern 役に立つ yaku ni tatsu ~ 役立つ yakudatsu 'to be useful', lit. 'role DAT/LOC stand'. Yaku is an even earlier borrowing of Chinese 役 than e.
Before I look more into edachi/etachi, here's my take on the
history of its first morpheme 役:
The earliest reconstructible form for 役 is *CI-waj-k 'to do service' (Schuessler 2007: 568 reconstructs 役 *wai-k from 爲 *wai 'to do' = my *CI-waj.) *CI- may have been a causative prefix *SI-. (Cf. Baxter and Sagart's [2014: 56] causative *s-prefix.) The unknown high vowel *I is needed to account for the later vocalism (see the appendix). *-w- could be *-ɢʷ- (after Baxter and Sagart 2014), but I prefer to avoid exotic solutions if I can. See below for hard evidence for *-w- (or at least labiality).
Fusion of *-aj- into *-e-: *Ciwajk > *Ciwek
Vowel harmony-driven warping: *-e- breaks to *-ie- after a high vowel: *CIwek > *CIwiek
Presyllable loss: *CIwiek > *wiek. Early Sino-Vietnamese việc was borrowed at this stage.
*wi-fusion: *wiek > *ɥek
*e > *a in palatal environments in some southern dialects (details unclear): *ɥek > *ɥak. Go-on yaku was borrowed from such a dialect before the 7th century. The earliest attestation of the Go-on reading that I know of is wiyau (sic; error for †wiyaku¹) in Ruiju myōgishō (c. 1100). It is remarkable that an un-Japanese [ɥ] or [jw] survived in spelling if not in pronunciation centuries after *ɥak was borrowed. There is no trace of such an initial in modern Go-on yaku.
Fronting of coda after a palatal vowel: *jek > *jejk or even *jec (if Hashimoto is right about Middle Chinese final palatals). Kan-on eki < 7th c. Kan-on *yeki was borrowed at this stage. But note that Sino-Korean 역 yŏk, which probably slightly postdates Kan-on eki, has nothing pointing toward a palatal component of the coda. The Sino-Korean reading isn't 옉 ˟yek < ˟yəyk < ˟(y)eyk. Maybe the coda had no palatal component in the source dialect of Sino-Korean. Or perhaps there was a phonotactic constraint against ˟-eyk in Old Korean. I don't know of any native Korean root with 옉 yek < yəyk < (y)eyk.
*e-raising in some dialects: *jek > *jik. Sino-Vietnamese dịch < *jic was borrowed at this stage.
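The first four changes all lead to *ɥek before the dialect split, and their ordering can be sketched as ordered regular-expression rewrites in Python. Some notes on the sketch:

- The patterns are my own approximations of the prose descriptions.
- 'C' stands for the unknown presyllable consonant.
- Lowercase i stands for the unknown high vowel *I.
- The later dialect-specific branches are left out.

```python
import re

# Trunk of the derivation: (label, pattern, replacement), applied in order.
# 'C' = unknown presyllable consonant; 'i' = unknown high vowel *I.
RULES = [
    ("fusion of *-aj- into *-e-", r"aj", "e"),
    ("breaking of *e to *ie after a high vowel", r"^(Ci)we", r"\1wie"),
    ("presyllable loss", r"^Ci", ""),
    ("*wi-fusion", r"^wi", "ɥ"),
]


def derive(form, rules=RULES):
    """Return every intermediate stage, starting with the input form."""
    stages = [form]
    for _label, pattern, replacement in rules:
        form = re.sub(pattern, replacement, form)
        stages.append(form)
    return stages


print(" > ".join("*" + stage for stage in derive("Ciwajk")))
# *Ciwajk > *Ciwek > *Ciwiek > *wiek > *ɥek
```

Moving presyllable loss ahead of breaking leaves *wek, because the high vowel that conditions breaking is already gone. This is one way to see why the order matters.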
Back to edachi and the unusual name 役 En: Most Old
Japanese speakers had no contact with Chinese speakers. The average
speaker had few Chinese borrowings in their vocabulary. The elite, on
the other hand, was more familiar with Chinese, and elite
pronunciations of Chinese words may have been on a continuum from
native speaker-like to heavily assimilated (i.e., Japanized). So the
Kan-on reading (singular) of 役 was really a set of readings in En no
Ozunu's time (the 7th century):
*yeyk - the most 'authentic'
*yek - slightly simplified while still maintaining an un-Old Japanese coda
*yeki - approach A to making the reading fit the Old Japanese open syllable template (add a vowel - namely an *i corresponding to the *-j- of Chinese *-jk)
*ye - approach B to making the reading fit the Old Japanese open syllable template (subtract the coda)
*ye is (questionably) attested in Old Japanese as a
word 'corvee'. (But I write it with an asterisk because I reconstructed
the *y-. The word is only known through the
reading tradition²; there is no phonogram spelling pointing to *y-.)
The name En may have originated as *yek whose coda
assimilated to the nasal of the following genitive marker nə
(cf. the Korean rule /k n/ > [ŋ n]).
*yek nə wonduno > *yeŋ nə wonduno > En no Ozunu (with a pseudoarchaic -nu based on an erroneous reading of a phonogram for Old Japanese no).
¹220.127.116.11:19: wiyaku is the Ruiju myōgishō Go-on reading of 疫 'epidemic', a homophone of 役 in Chinese. The graph 疫 is a combination of 疒 'disease' (semantic) and 役 (abbreviated phonetic, itself a semantic compound of 彳 'to go' and 殳 'baton, beat' [Karlgren 1957: 226]). The Kan-on reading is eki, and the word was Japanized as *ye (now pronounced e), (questionably) attested in
But I don't know for sure how 疫 was originally intended to be read
in those texts. In fact, 倉野憲司 Kurano Kenji's (1991: 255) edition of Kojiki
has 伇 (an archaic variant of 役, not 疫) which appears as 役 eyami
(! < *ye-yami, a hybrid of Chinese 'epidemic' and native
Japanese 'illness') in his 讀み下し文 yomikudashibun
on p. 101. 武田祐吉 Takeda Yūkichi's (1977: 96) yomikudashibun of Kojiki
has 伇 which he reads as e (< *ye).
²*ye appears in Man'yōshū 3847 in
the semantogram combination 課役 <IMPOSE SERVICE> which has been
read as edachi (< *yendati), etsuki (< *yentukɨ),
and mitsuki (< *mitukɨ) 'tax' (Ōno et al.
1990: 205). (6.28.21:57: All three possibilities fit the meter.)
Appendix: Evidence for labiality in 役
18.104.22.168:30: Karlgren (1957: 226) reconstructed Old Chinese 役 as *di̯ĕk without any labial segment. Thirty years later, Schuessler (1987: 743) reconstructed the word as ?*ljik without any labial segment. Standard Mandarin yi and Cantonese jik have no labial segment. However, both internal and external evidence point to a labial segment.
1. Internal evidence
武昌 Wuchang and 柳州 Liuzhou y
合肥 Hefei yəʔ
(Do any Jin varieties have a labial vowel in this morpheme? Xiaoxuetang only lists Taiyuan 太原 ieʔ.)
1.2. Most Wu varieties at Xiaoxuetang have labial vowels: y, u, or o; 莊村 Zhuangcun has ʯʔ [ʐ̩ʷʔ].
1.3. All Xiang varieties at Xiaoxuetang have y.
1.4. Some Gan varieties at Xiaoxuetang have y or u; 平江 Pingjiang has ʯɤt [ʐ̩ʷɤt].
1.5. A few Hakka varieties at Xiaoxuetang have labial vowels:
翁源 Wengyuan yt
武平 Wuping iɒuʔ (but why is the u closer to the coda than the onset?)
上猶 Shangyou ye (with level tone!)
1.6. Some Yue varieties at Xiaoxuetang have v-, y, or u.
1.7. Some Pinghua varieties at Xiaoxuetang have v-, ʋ-, y, u, or o.
1.8. Min varieties (list not exhaustive):
1.8.1. Southern: 揭陽 Jieyang uek
1.8.2. Pu-Xian: yʔ in both 莆田 Putian and 仙游 Xianyou
1.8.3. Eastern: 福安 Fuan peik with p-!
1.8.4. Northern: 石陂 Shibei ɦy (with level tone!)
Are the initial consonants of Fuan and Shibei evidence for a proto-obstruent like Baxter and Sagart *ɢʷ-?
1.8.5. Central: 明溪 Mingxi y (with departing tone!)
1.8.6. Other: 隆都 Longdu uɐk (with upper register!) and 將樂 Jiangle y
1.9. Some of Xiaoxuetang's unclassified varieties also have labial segments:
道縣 Daoxian y
豐陽 Fengyang iɔi
星子 Xingzi vɑi
2. External evidence
2.1. Early Sino-Vietnamese việc and Muong (variety unidentified) [wiək] (Pulleyblank 1994: 83, cited by Schuessler 2007: 563)
2.2. Ruiju myōgishō (c. 1100) Go-on wiyau (sic; error for †wiyaku)
2.3. Borrowings in Tai: Saek viak D2L 'work', Siamese wiek³ (Maspero 1912: 73, cited by Schuessler 2007: 563; thai-language.com regards เวียก wiak as Isan - i.e., not standard Siamese - and the example implies it means 'work')
22.214.171.124:32: Added the forms from Schuessler (2007: 563).
126.96.36.199:44: EN NO OZUNU
役小角 En no Ozunu, founder of 修驗道 Shugendō, was banished by the Japanese court 1320 years ago today. The spelling of his name is doubly interesting.
角 'horn' is normally read as tsuno. In a compound, I
would expect ts- to voice to z-: -zuno. But I
wouldn't expect a final -u. Iwanami kogo jiten (1990) says tsunu
is an Edo period error for tsuno based on the
misinterpretation of man'yōgana for -no as nu. So is Ozunu
an Edo period misreading of 小角? (The genitive marker no between
En and Ozunu is unwritten.) Or is Ozuno (another
reading of 小角) a regularization of an original Ozunu reflecting
a dialect in which *-o raised to -u? (Cf. forms like
Hitachi Old Japanese yu [Kupchik 2011: 374] corresponding to
Western Old Japanese and even modern standard Japanese yo.)
役 (Wiktionary) is normally read as yaku or eki. Both of those readings are Chinese loans. 役 has never had a nasal-final reading in Chinese. So why is 役 read En in this name? If the name is native, it shouldn't end in -n since all Japanese words in the 8th century ended in vowels. I wonder how 役 was read when he was alive.
I was surprised to learn last night that بشار الأسد Bashar al-Assad
is pronounced [baʃˈʃaːr elˈʔasad] in Levantine Arabic with a geminate [ʃʃ]
and a single [s]. Why isn't it written as Basshar al-Asad?
The Polish Wikipedia reflects the geminate [ʃʃ] and a single [s]: Baszszar
The Slovak Wikipedia even reflects the long [aː]: Baššár
al-Asad. (But the Czech Wikipedia lacks the geminate: Bašár
The Hungarian and Albanian Wikipedias have e for at least
one short [a]: Bassár
el-Aszad (but not ˟Eszed!)
el-Asad (but not ˟Esed!).
The Thai Wikipedia has บัชชาร อัลอะซัด <ɓăjjāra ʔălaʔaḥzăɗa>, which I assume is read as [bàtsaːn ʔan ʔasát]. Alas, I don't know of any entries for Assad in the Lao, Khmer, or Burmese Wikipedias.
The Tamil Wikipedia has another drastic localization: பசார்
அல்-அசத் <pacār al-acat> [pasaːr al asat]. (Tamil has
no initial [b] or final [d].)
I don't know how typical those renderings are. I wish I had time to investigate how Arabic names are localized in various languages.
The title is from my attempt to Tangutize Assad's name as
1637 5994 4541 2682 4541 1693 0804
2ba1 1shar3 1a? 2lu3 1a? 1sa4 2dy4
using conventions originally developed for Sanskrit.
I am unaware of any reasoning for choosing either tone 1 or tone 2 for transcribing Sanskrit, so I have not taken tones into consideration when choosing transcription characters from the set used for Sanskrit as compiled by Arakawa (1997).
1. 1637: Sanskrit -a is generally Tangutized as -a4,
though Sanskrit ba can be Tangutized with -a1. I should
look into exceptional cases of a1-transcription.
2. 5994: In theory I could have transcribed [ʃʃaː] as shy sha, but I have chosen an English-like solution with just one fricative.
Tangut has no syllables ending in [r]. The -r of my Tangut notation indicates vowel retroflexion, not an [r]-coda.
Tangut has no word spacing. Perhaps modern Tangut would have had a
dot here to separate foreign names.
3. 4541: Devised to write Sanskrit a. Sanskrit a after
consonants is normally transcribed as -a4, so I suspect 4541 is
1a4. But 1a1 is also possible since Sanskrit ba
can be transcribed as 1637 2ba1. I do not think 1a2
or 1a3 is likely, as neither is the known reading of any other
tangraph. 1a4 is attested as the reading of 𗅹
2375 'east, tail'. 1a1 is also not attested as the reading of
any other tangraph, but Sanskrit a was transcribed in Chinese
as *1a1, and Tangut -a1 can correspond to
Sanskrit -a. Hence 1a1 is not impossible as a reading of 4541.
I do not know why Li (2008: 721) does not list a tone for 4541, which appears in the level tone (i.e., tone 1) section of Mixed Categories of the Tangraphic Sea.
4. 2682: Isolated Sanskrit consonants are usually Tangutized as -y syllables. However, for some reason l is Tangutized as either 2682 ending in -u or 3284 𗥰 2la3. I have opted for 2682 2lu3 since it sounds like the Japanese solution for writing -l: ル -ru.
5. 4541: See above.
6. 1693: An example of an -a4 character used to Tangutize a Sanskrit syllable. Contrast with 1637 2ba1 for Sanskrit ba. Gong reconstructed -a4 as [ja], but there is no [j] in most Sanskrit syllables transcribed as -a4. (An obvious exception is 𘁂 5314 2a4 for Sanskrit ya.)
7. 0804: An example of a -y syllable used to Tangutize a Sanskrit consonant. Contrast with 2682 above, which ends in -u rather than the usual -y.
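Taken together, the seven choices above amount to a segment-to-tangraph table. Here is a minimal Python sketch of that table that regenerates the number and reading lines given earlier. It uses only the numbers and readings given in this post; the segmentation of the name into Sanskrit-like pieces is my own.

```python
# (segment of the Arabic name, tangraph number, Tangut reading), in order.
ASSAD = [
    ("ba", 1637, "2ba1"),
    ("shar", 5994, "1shar3"),
    ("a", 4541, "1a?"),
    ("l", 2682, "2lu3"),
    ("a", 4541, "1a?"),
    ("sa", 1693, "1sa4"),
    ("d", 804, "2dy4"),
]


def tangraph_numbers(entries):
    # Tangraph numbers are written with four digits in this post.
    return " ".join(f"{number:04d}" for _, number, _ in entries)


def readings(entries):
    return " ".join(reading for _, _, reading in entries)


print(tangraph_numbers(ASSAD))  # 1637 5994 4541 2682 4541 1693 0804
print(readings(ASSAD))          # 2ba1 1shar3 1a? 2lu3 1a? 1sa4 2dy4
```

A fuller version might record the alternative 3284 𗥰 2la3 for l alongside 2682.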
188.8.131.52:52: CIR: THE THREE-AXIS MODEL OF ORTHOGRAPHIC REFORM
Having recently finished reading Robbins Burling's Spellbound: Untangling English Spelling, I've been thinking about how to characterize different proposals for reforming English orthography. The CIR model has three axes: continuity, internationality, and regularity.
I could describe a proposal in terms of these three features using
this notation: [±lowercase letter of feature].
Continuity refers to whether a proposal incorporates an existing practice, either by leaving it alone or by expanding its domain.
To write all English [dʒ] as <j> is [+c] since some English [dʒ] are already written as <j>.
To write all English [ʃ] as <x> is [-c] since English [ʃ] is not written as <x>. (I am not counting foreign names like <Xi>.)
Continuity is of interest to both existing users of English (native
and nonnative) and learners who would want to access literature in the
current orthography.
Internationality refers to whether a proposal is compatible with non-English orthographic practices.
To write all English [i] as <i> instead of, say, <ee> is
[+i] since <i> represents [i] in most Latin-alphabet orthographies.
To write all English [ʃ] as <s> is [-i] since no Latin-alphabet orthography has <s> for [ʃ], with the major exception of Hungarian. (Also, <s> is [ʂ] for southern Vietnamese speakers - not [ʃ], but close.)
Internationality is of interest to learners who would benefit from
an orthography using conventions they are likely to already know.
To write all English [k] as <k> is [+r].
To write English [k] as <c> before nonfront vowels and <k> before front vowels is [-r]. (But still more regular than the current spellings of [k]!)
Regularity is of interest to learners who do not want to be burdened
Obviously those features are not really binary; there are degrees of
CIR. I don't want to assign arbitrary numerical values, so maybe I
could double plus and minus signs: e.g., to switch English to the Shavian alphabet
would be [--i] ('doubleminus international'? - cf. Orwell's
'doubleplusgood') since no other language is written in that script.
Shavian in terms of all three features is [-c -i +r]:
[-c]: no continuity with any previous English
orthography (except on a very superficial level: e.g., full-sized
on-line symbols for both consonants and vowels, left-to-right
direction, word spacing, etc.)
[-i]: see above; Shavian would make English orthography even
less like any other existing orthography
The existing orthography is [+c -i -r]. It is irregular with many language-specific eccentricities.
A 'perfect' orthography that is [+c +i +r] seems impossible. To maintain continuity to some nontrivial degree, a new orthography would have to abandon internationality: e.g., reject international <i> for English-specific <ee> as the spelling of [i].
A [+c -i +r] orthography would require learning a lot of English-specific conventions, but those conventions would be consistent: e.g., <mee> and <eet> instead of <me> and <eat> (cf. <eel> and <feet> which would remain unchanged).
A [-c +i +r] approach would require all four of those [i]-example words to have new spellings: <mii>, <iit>, <iil>, <fiit>.
Maybe I should split regularity into two features, 'regularity' and 'monophoneticity/monophonemicity' (?). It is possible to have regularity without absolute one-to-one correspondences: e.g., [i] could be <ii> in closed syllables but <i> in open syllables: <iit>, <iil>, <fiit> but <mi> (since there is no [mɪ] that would be written <mi> if <i> = [ɪ]). Another example of this type of 'split' (or environment-conscious) regularity is my proposal above for <c> and <k>, which I regarded as [-r].
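The 'split' but still regular rules described here can be written as a tiny spelling function. This is a hypothetical Python sketch that takes one syllable as a list of phonemes and applies two rules:

- [i] is spelled <ii> when a consonant follows in the same syllable and <i> otherwise.
- [k] is spelled <c> before nonfront vowels and <k> otherwise.

The vowel set and the default of <k> elsewhere (e.g., word-finally, which the proposal doesn't address) are my assumptions.

```python
# Assumed set of nonfront vowels; anything else before [k] gets <k>.
NONFRONT_VOWELS = {"u", "ʊ", "o", "ɔ", "ɑ", "ʌ", "ə"}


def spell_syllable(phonemes):
    """Spell one syllable with environment-conscious (split) rules."""
    letters = []
    for idx, phoneme in enumerate(phonemes):
        following = phonemes[idx + 1] if idx + 1 < len(phonemes) else None
        if phoneme == "i":
            # In a single syllable, anything after [i] is a coda consonant,
            # so the syllable is closed -> <ii>; open -> <i>.
            letters.append("ii" if following is not None else "i")
        elif phoneme == "k":
            # <c> before nonfront vowels, otherwise <k> (assumed default).
            letters.append("c" if following in NONFRONT_VOWELS else "k")
        else:
            letters.append(phoneme)
    return "".join(letters)


WORDS = [["m", "i"], ["i", "t"], ["i", "l"], ["f", "i", "t"], ["k", "i"], ["k", "u"]]
print(" ".join(spell_syllable(word) for word in WORDS))  # mi iit iil fiit ki cu
```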
Lastly, the CIR terminology or something like it could be used to describe any script. Invented scripts would be [-c]. Adaptations of existing scripts could be regarded as [-c] if they bear little or no relation to a previous script for a language. The modern Turkish alphabet is
[-c]: no continuity with the
previous Ottoman script
which is an abjad, not an alphabet, and written in the opposite
(right-to-left) direction. Also not a straightforward direct offshoot
of any particular existing orthography.
[+i]: most letters have internationally
recognized sound values, though there are a few [-i] surprises: e.g.,
<c> is [dʒ]. (Is there any other alphabet with a voiced value of
<c> predating the modern Turkish alphabet? I'm not counting the
use of <c> for [g] as well as [k] in early Latin.)
[+r]: regular in the sense that pronunciation is almost¹
completely predictable on the basis of spelling, but not in the
monophonetic or monophonemic sense: e.g., ğ
has several phonetic values, and the circumflex marks vowel length,
a palatal pronunciation of the preceding consonant, or both. So it is
definitely not [++r].
The Tangut script is
[-c]: by default since there is no previous Tangut script
and also in the sense that it is not a derivative of any existing
script (though obviously the look of the script is Chinese-inspired)
[-i]: almost nothing in the script works like
any other script, and the few parallels with Chinese are not obvious or
numerous enough to be of much help for learning tangraphy
[-r]: the script is not internally consistent; the same phonetic value or semantic element can be written in more than one way
[-c -i -r] scripts like Tangut are the hardest to learn because they are sui generis.
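To make the notation concrete, here is a small Python sketch that stores a script's CIR profile and prints it in the bracket notation used above. The class name and the four allowed values ('--', '-', '+', '++') are my assumptions. The doubled signs follow the 'doubleminus' idea rather than any numerical scale. The profiles are the ones given in this post, with Shavian's internationality written as --i.

```python
from dataclasses import dataclass

AXES = ("c", "i", "r")           # continuity, internationality, regularity
VALUES = ("--", "-", "+", "++")  # degrees, without arbitrary numbers


@dataclass(frozen=True)
class CIRProfile:
    c: str
    i: str
    r: str

    def __post_init__(self):
        # Reject anything outside the four sign values.
        for axis in AXES:
            if getattr(self, axis) not in VALUES:
                raise ValueError(f"{axis} must be one of {VALUES}")

    def __str__(self):
        return "[" + " ".join(getattr(self, axis) + axis for axis in AXES) + "]"


PROFILES = {
    "current English orthography": CIRProfile("+", "-", "-"),
    "Shavian": CIRProfile("-", "--", "+"),
    "modern Turkish alphabet": CIRProfile("-", "+", "+"),
    "Tangut script": CIRProfile("-", "-", "-"),
}

for name, profile in PROFILES.items():
    print(f"{name}: {profile}")
```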
¹6.26.17:26: I inserted "almost" because of ambiguous cases like gâvur [ɟaʋur] 'infidel' which in theory could also be read as ˟[ɟaːʋur] with a long vowel as well as a palatal consonant (cf. kâfir [caːfir] 'infidel') or ˟[gaːʋur] with a long vowel. Google Translate's TTS (?) 'knows' that gâvur and kâfir both have palatal consonants but that only kâfir has a long vowel.
I'm surprised that gâvur doesn't have a long vowel. It is borrowed from Persian گاور gāvur (before the u > o shift in modern standard Persian) which does have a long vowel. And kâfir (from Arabic via Persian) demonstrates that palatals can precede long vowels in Turkish.
I confess I was tempted to derive gāvur from kāfir, but there would be no reason for Persians to change k, f, and i to g, v, and r. The earlier form of gāvur is گبر gābr which is phonetically even further from kāfir - the second consonant b is a stop, not a fricative, and there is no second vowel. gābr is from Aramaic, not Arabic.
THE BATTLE OF MANG YANG PASS
The Battle of Mang Yang Pass occurred sixty-five years ago today:
It was one of the bloodiest defeats of the French Union together with the Battle of Dien Bien Phu in 1954 and the Battle of Cao Bằng in 1950.
The ambush and destruction of GM 100 [Groupement Mobile No. 100] was considered the last significant battle of the First Indochina War. Three weeks later, on Jul. 20, 1954, a battlefield ceasefire was announced when the Geneva agreements were signed, and on Aug. 1, the armistice went into effect, sealing the end of the French Indochina and the partition of Vietnam along the 17th parallel. The last French troops left South Vietnam in April 1956, upon request from President Ngô Đình Diệm.
What kind of name is Mang Yang? Vietnamese syllables normally do not begin with Y-. Mang Yang is in Gia Lai Province which has many obviously non-Vietnamese names. The one I recognize is Pleiku which is un-Vietnamese in four ways:
It begins with p-. ph- is permissible but not p- (because earlier Vietnamese *p- became b-).
It begins with a consonant cluster containing -l-. All
native Vietnamese *Cl-clusters became tr-. Wiktionary has a
phonetic Vietnamese spelling pờ lây cu splitting the first
syllable Plei in two. (Why is a huyền tone assigned to
The first syllable ends in -ei, a rhyme unknown in Vietnamese.
The second syllable has a k- instead of c- for [k] before a back vowel. Vietnamese k- is normally written only before front vowels.
Wikipedia and Wiktionary derive Pleiku from Jarai Plơi Kơdưr, lit. 'village north/above'.
The Vietnamese Wikipedia says Mang Yang is Bahnar for cổng trời, lit. 'gate sky': i.e., 'sky gate'. But the dictionary of the Plei Bong-Mang Yang Bahnar dialect by the Bankers and Mơ (1979) has no words like mang 'gate' or yang 'sky'. I cannot find a word for 'gate' in the dictionary's English-Bahnar index, and the only word for 'sky' I can find using that index is plĕnh on p. 99. (There is supposed to be another word for 'sky' on p. 110, but I don't see one.) There is a yang 'spirits, nonhuman beings that affect humans' on p. 145. Perhaps that is the Yang of Mang Yang.
184.108.40.206:50: WHY DON'T FINAL FRICATIVES DEVOICE IN TURKISH?
In my last post, I didn't comment
on the final consonants of Arabic Muḥammad
and Turkish Mehmet. Turkish stops and affricates
devoice in final position: e.g., Arabic kitāb > Turkish kitap
'book' (but acc. sg. kitab-!). Note, however, that the
etymological -d of Mehmet does not survive in the
spelling of the accusative singular: Mehmet'i [mehmedi].
(The apostrophe separates a proper name from a suffix. The rule is to
keep the spellings of proper names intact regardless of actual
pronunciation.)
Note also that I spoke of final stops and affricates devoicing but not fricatives: /ʒ z v/ remain voiced in final position unlike their Russian counterparts.
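The contrast can be stated as a toy rule over Latin-script spellings. This is a minimal sketch, not a description of Turkish orthography: the function names are hypothetical, only consonant-final stems are handled, and the accusative uses a simple four-way vowel harmony. It shows kitab and mehmed devoicing word-finally but keeping their voiced consonants before a vowel-initial suffix, while garaj, göz, and ev (ending in /ʒ z v/) are left untouched.

```python
# Toy model (hypothetical helper names) of Turkish word-final devoicing,
# working on Latin-script spellings of consonant-final stems.
# Stops and affricates devoice; the fricatives z, v, j (= /ʒ/) do not.
DEVOICE = {"b": "p", "d": "t", "g": "k", "c": "ç"}


def devoice_final(stem: str) -> str:
    """Devoice a word-final voiced stop or affricate; leave anything else alone."""
    if stem and stem[-1] in DEVOICE:
        return stem[:-1] + DEVOICE[stem[-1]]
    return stem


def accusative(stem: str) -> str:
    """Add the accusative suffix to a consonant-final stem.

    The voiced stem surfaces before the vowel-initial suffix, so no
    devoicing applies. The suffix vowel copies the backness and rounding
    of the last stem vowel (simple four-way harmony).
    """
    for ch in reversed(stem):
        if ch in "aı":
            return stem + "ı"
        if ch in "ei":
            return stem + "i"
        if ch in "ou":
            return stem + "u"
        if ch in "öü":
            return stem + "ü"
    return stem + "i"  # fallback for vowelless input


for stem in ["kitab", "mehmed", "garaj", "göz", "ev"]:
    print(f"{stem}: {devoice_final(stem)}, acc. {accusative(stem)}")
```

Real spellings have lexical exceptions the sketch ignores, such as Muhammed with its -d intact.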
Tonight I realized that /z v/ are phonetically fricatives but behave
like sonorants. Final /z/ in native words comes from Proto-Turkic *-r.
It is a former sonorant that still behaves like one. /v/ acts as if it
were /w/. I think /ʒ/ is only in borrowings like bej 'beige'
and garaj 'garage'; it may retain its final voicing by analogy
with /z/. And/or such borrowings postdate devoicing. (When did
(I am reminded of how traditional Tangut phonology groups z-
and zh-sounds with liquids in consonant class IX rather than
with s- in consonant class VI and sh- in consonant
One problem with the above analysis is that /r/ devoices in word-final position. So if /z/ is really like /r/, why doesn't it devoice like /r/? And if I understand Kornfilt (2009: 524) correctly, speakers who devoice /r/ also devoice palatal /lʲ/ and may even devoice velar /ɫ/. (Göksel and Kerslake 2005: 8-9 do not mention the devoicing of laterals.)
220.127.116.11:50: HOW DID MUḤAMMAD BECOME MEHMET?
Originally this post was titled "Why Doesn't Muḥammed Have Ü?".
But the answer to that question is simple: Arabic /u/ was borrowed as
u both before and after Arabic pharyngeals. I mistakenly thought vowels
in Ottoman borrowings from Arabic were determined only by preceding
consonants.
Originally the intermediary vowels in the Arabic Muhammad were completed with an e in adoption to Turkish phonotactics, which spelled Mehemed, and the name lost the central e over time. Final devoicing of d to t is a regular process in Turkish. The prophet himself is referred to in Turkish using the archaic version, Muhammed.
I thought Mehmet was a Turkish version of Arabic Maḥmūd, but they are related only in that they share the root Ḥ-M-D. The two names are distinct in Arabic-script spelling: Meḥemmed (now Mehmet) and Muḥammed are both محمد <mḥmd> like Arabic Muḥammad (Buğday 2009: 220), whereas I suppose Turkish Mahmut (Ottoman Maḥmūd?; the name is not in Buğday 2009) is محمود <mḥmwd> like Arabic Maḥmūd.
Turkish Mahmut < Arabic Maḥmūd has a
for the same reason that Ottoman Muḥammed has u: a
neighboring Arabic pharyngeal. (Contrast with Ottoman mühimmāt
< Arabic muhimmāt 'important matters' in which /u/
has no pharyngeal neighbor.)
On the other hand, Turkish Mehmet < Ottoman Meḥemmed < Arabic Muḥammad has a first e where I would expect a u before an Arabic pharyngeal. And the second e of Ottoman Meḥemmed occurs where I would expect an a after an Arabic pharyngeal.
The key word is "Arabic". Turkish doesn't have pharyngeals. Here's what I think might have happened. Turks heard Arabic [muħammæd] and borrowed it in harmonized form ("in adoption to Turkish phonotactics" as Wikipedia put it) as *Mühemmed. (I assume the borrowing of Arabic /a/ as a in the presence of pharyngeals was a learned practice only possible to those who were literate: i.e., aware of a graphic if not a phonetic distinction between Arabic glottal ه <h> and pharyngeal ح <ḥ>, both borrowed as [h] in Turkish.) The first vowel was then irregularly assimilated to the other two e: Mehemmed (written etymologically in Ottoman as <mḥmd>, transcribed here with vowels and unwritten gemination as Meḥemmed, a compromise between the pronunciation and the spelling).
The relationship between Mehmet and Muhammed is slightly like that between the Korean and Japanese words for 'Buddha' on the one hand and the Sino-Korean and Sino-Japanese morphemes for 'Buddha':
Korean 부처 Puchhŏ < *put-ke < Late Old Chinese 佛 *but | Sino-Korean 佛 Pul < northern Late Middle Chinese *fur
Japanese 佛 Hotoke < *potə-ka-i < Paekche *? < Late Old Chinese 佛 *but | Sino-Japanese 佛 Butsu < Early Middle Chinese *but
The two columns represent two kinds of borrowing. All of the above
forms are based on Chinese 佛 'Buddha' (itself a borrowing from
Indic Buddha). But the forms in the first column cannot be
mechanically derived from Chinese like those in the second column. The
former were idiosyncratically borrowed as single items and not as part
of an entire lexicon complete with systematic conventions of
pronunciation. (Chinese is to Korean and Japanese what Arabic was to
Ottoman Turkish.)
Adding to the idiosyncrasy are suffixes absent from Chinese. Early
Korean *-ke and early Japanese *-ka- seem to reflect a
Koreanic morpheme 'ruler' which may have continental origins: cf.
Khitan qa 'khan'. Japanese *-i is a noun suffix.
I cannot explain the *o in Japanese. Perhaps there was a lowering of *u in Paekche, the likely donor language. But there is no other evidence of such lowering. The general tendency in early Japanese was toward raising, not lowering: pre-Old Japanese *o became Old Japanese u, not the other way around.
18.104.22.168:13: WHY DOES MÜHACIR HAVE Ü?
After the ethnic cleansing of Phocaea, muhacirs settled in what is now Foça.
The Turkish word muhacir [muhadʒir] 'migrant' is from Arabic muhājir. I was surprised that the Azerbaijani counterpart is mühacir with ü. I would understand fronting a foreign u to make a word conform to vowel harmony, but mühacir is even less harmonic than muhacir (which would be ˟mühecir or ˟muhacır if it were fully harmonic).
˟mühecir | hypothetical (all front vowels)
˟muhacır | hypothetical (all back vowels)
On the basis of these two words (dangerous!), I expected that Arabic u after nonemphatic consonants such as m would be borrowed into Turkish as back u and into Azerbaijani as front ü.
And I was wrong. Buğday's The Routledge
Introduction to Literary
Ottoman (2009: 11) explains:
The pronunciation of short vowels in Persian and Arabic words is generally governed by which consonants appear before and after the vowels. Arabic vowel graphs are as a rule interpreted as front vowels in Ottoman (üstün = e, kesre = i, ötre = ö, ü). There is nonetheless a group of consonants that cause front vowels in their environment to shift their point of articulation and become back vowels (a, ı, o, u).
Those consonants that shift vowels from front to back are: ح ḥ, خ ḫ, ص ṣ, ض ż, ط ṭ, ظ ẓ, ع `, غ ġ, ق ḳ. The remaining consonants retain the front articulation of the vowels:
ب b, پ p, ت t, ث s, ج c, چ ç, د d, ذ z, ر r, ز z, ژ j, س s
ش ş, ف f, ك k, ل l, م m, ن n, و v, ه h, ی y
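The rule in the quotation is mechanical enough to sketch. This toy version assumes a single short-vowel sign with one consonant on each side, uses the transliterations in the list above, and ignores the rarer ö/o readings of ötre; the function and table names are hypothetical.

```python
# Sketch of the rule quoted above from Buğday (2009: 11), under simplifying
# assumptions: one short-vowel sign, its two neighbouring consonants, and
# only the common values of each sign. Names are hypothetical.
BACKING = {"ḥ", "ḫ", "ṣ", "ż", "ṭ", "ẓ", "`", "ġ", "ḳ"}

READINGS = {  # sign -> (front reading, back reading)
    "üstün": ("e", "a"),
    "kesre": ("i", "ı"),
    "ötre": ("ü", "u"),  # ö/o are less common values and are ignored here
}


def ottoman_vowel(sign: str, before: str, after: str) -> str:
    """Read a vowel sign as back if either neighbour is a backing consonant."""
    front, back = READINGS[sign]
    return back if before in BACKING or after in BACKING else front


# First vowels of three Arabic words (consonants in the transliteration above)
print(ottoman_vowel("ötre", "m", "c"))  # mujāhid  -> ü, cf. mücahit
print(ottoman_vowel("ötre", "m", "h"))  # muhājir  -> ü, cf. Azerbaijani mühacir
print(ottoman_vowel("ötre", "m", "ḥ"))  # muḥammad -> u, cf. Muḥammed
```

Because h (ه) is not in the backing set, the rule predicts ü for muhājir, which matches Azerbaijani mühacir and leaves Turkish muhacir as the exception.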
I have long known about Turkish e for Arabic a, and
that has never surprised me since [æ] is an allophone of Arabic /a/ and
is the phonetic value of Persian short a.
Arabic [æ] > Persian [æ] > Turkish e
But neither Arabic nor Persian has front rounded vowels, so I didn't expect this shift:
Arabic [u] > early New Persian [u] > Turkish ü (less commonly ö and rarely o)
(Modern Persian has lowered [u] to [o].)
ö is particularly odd in `Ömer after `ayn which normally should favor a back vowel: e.g., in sā`at [saːʔat] 'clock'. (Turks could not pronounce `ayn [ʕ], but they did replicate the backness of /a/ after /ʕ/ in Arabic.) Did the first vowel front to match the frontness of the second vowel?
o in `osmān 'Uthman' is understandable since a mid [o] approximates the lowered allophone [ʊ] of /u/ after `ayn.
So although I initially thought that Turkish mücahit 'jihadi' (cf. Azerbaijani mücahid) < Arabic mujāhid was irregular, in fact it is regular, and the real question is: why isn't Turkish muhacir 'migrant' ˟mühacir with a front vowel?
Another question is: Why does the word 'jihadi' have u in Uzbek mujohid (cf. Tajik mujohid with the Tajik-internal shift o < ā) and modern Uyghur mujahit? Is there an east-west split in the way Arabic u is borrowed in Turkic? Do Uzbek and Uyghur reflect Chagatai borrowing practices? Did Chagatai and early Turkish speakers perceive Arabic /u/ in nonemphatic environments differently?
Turkish fronting of nonemphatic vowels interests me because it
reminds me of the palatal reflexes in Mandarin of Middle Old Chinese
syllables with nonemphatic vowels: e.g.,
||Middle Old Chinese
||Mandarin (sans tones)
||3rd person poss. pron.
(Not all *k-nonemphatic vowel sequences have palatal
reflexes in Mandarin. *k- that palatalized early became *tɕ-
which in turn became [tʂ]: e.g., 支 *ke > *kie > *tɕie
> *tɕi > [tʂɻ̩] 'branch'.)
Norman (1994) was the first to make the connection between Arabic emphasis/nonemphasis and what Pulleyblank called the type A/B contrast in Old Chinese (which Norman interpreted in terms of pharyngealization).
HOW DID PHOCAEA BECOME FOÇA?
Today is the centennial of the massacre at Φώκαια <Phṓkaia> /fokea/ [focea] 'Phocaea', now Turkish Foça /fotʃa/ [fotʃa]. (What would its Ottoman spelling have been? فوچا <fwčʔ>?).
I was surprised by the correspondence between Greek /kea/ [cea] and Turkish /tʃa/ [tʃa]. In theory Greek /fokea/ [focea] could have become Turkish ˟Fokea /fokea/ [focea]. But maybe the local Greek and Turkish versions of the name are closer: e.g., if the local Greek dialect had shifted *ea to [ja] and if the local Turkish dialect had merged [c] and [tʃ], etc. Or maybe I'm just seeing regular borrowing conventions at work reflecting an earlier time: e.g., if Greek /k/ had palatalized to [c] before /e/ before Turkish /k/ did, then the closest Turkish equivalent of Greek [c] at that time would be [tʃ].
Having spent so many years studying Sinoxenic - systematic Chinese
borrowings in Vietnamese, Korean, and Japanese - I'm accustomed to
regularity in borrowings. And unusual features are usually not random
noise. They generally reflect lost features:
e.g., dentals in Sino-Vietnamese reflecting old southern
palatalized labials, the -l of Sino-Korean reflecting an old
liquid absent from any living Chinese language, etc.
Middle Chinese 必 *pit 'necessarily'
> Sino-Vietnamese tất [tət] < *sət < *psət < *pʲət in Annamese Middle Chinese
Ferlus (1992) reconstructed earlier Sino-Vietnamese *pz-, but I have never seen that cluster in any Mon-Khmer language
the schwa is an interesting deviation from the Chinese norm I'll explore later
> Sino-Korean phil < *pir in northern Middle Chinese
the Korean aspiration is irregular and may be due to hypercorrection
So I'd like to think there's some significance in the correspondence between Greek /kea/ [cea] and Turkish /tʃa/ [tʃa]. But maybe there isn't any. The elite of Vietnam, Korea, and Japan looked up to Chinese and wanted to closely emulate Chinese pronunciation, whereas Turks had no motivation to closely emulate the pronunciation of their Greek subjects. Greek εἰς τὴν Πόλιν [is tim bolin] 'to the city' became İstanbul, not ˟İstimbolin.
the Origin of the Mainstream Hakka Word [oi1] 'Mother' ", W. South
Coblin proposes that oi-type words for 'mother' in Hakka
varieties originate from the compression of two syllables (amoi)
into one (oi). Although amoi > oi at first
looks like am-loss (i.e., the disappearance of the first half
of the word), if the kinship prefix a- is analyzed as a zero
consonant Ø- plus a rhyme -a, then oi is really Øoi
with the initial of the kinship prefix Øa- and the rhyme
of the root moi 'mother':
That is an example of one of three types of compression in Chinese
and other languages of the region:
刀 Early Old Chinese *CVtaw > Late Old Chinese *taw 'knife' (If not for Vietnamese [zaːw] with lenition of *-t- conditioned by *CV-, no first syllable would be reconstructible)
2. disyllabic word > fusion of initial consonants of both syllables + rhyme of second syllable
抱 Early Old Chinese *mʌpuʔ > Late Old Chinese *bowʔ 'to carry in the arms'
*b is a fusion of *m- and *-p-.
My formulation needs to be tweaked because the vowel of the surviving syllable has changed under the influence of the lost vowel of the previous syllable: *u has lowered to *ow.
3. disyllabic word > initial of first syllable + rhyme of second
syllable ... no, I'd better reformulate that.
Coblin gives a standard Mandarin example: 不用 bú yòng lit. 'not use' > 甭 béng 'no need to' (note the neat stacked composite character). My initial formulation doesn't work; it would predict a fusion ˟bòng or ˟bèng (the latter takes into account the impossibility of -ong after labials in standard Mandarin). But the actual form has the tone of the first syllable and a rhyme that is unlike either syllable. So how about
3'. disyllabic word > initial of first syllable + fusion of rhymes of both syllables
to account for 甭 béng?
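Formulation 3 (initial of the first syllable + rhyme of the second) is simple enough to run as a toy. This sketch makes strong assumptions: tones are ignored, and a syllable's initial is naively taken to be everything before its first vowel letter, so pinyin y- counts as an initial. The helper names are hypothetical. It reproduces the Hakka case (a + moi > oi) and the failed prediction for 不用 (bu + yong > bong rather than béng).

```python
from typing import Tuple

# Hypothetical sketch of formulation 3: keep the initial of the first
# syllable and the rhyme of the second. Tones are ignored, and a syllable's
# "initial" is naively everything before its first vowel letter
# (an empty string stands for the zero initial Ø-).
VOWELS = set("aeiou")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a romanized syllable into (initial, rhyme)."""
    i = 0
    while i < len(syllable) and syllable[i] not in VOWELS:
        i += 1
    return syllable[:i], syllable[i:]


def fuse(first: str, second: str) -> str:
    """Initial of the first syllable + rhyme of the second."""
    initial, _ = split_syllable(first)
    _, rhyme = split_syllable(second)
    return initial + rhyme


print(fuse("a", "moi"))    # Hakka a- + moi: Ø- + -oi
print(fuse("bu", "yong"))  # Mandarin bú + yòng: predicts bong, not béng
```

The mismatch on the second example is exactly why formulation 3 has to give way to something like 3'.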
And 3' can be reworded to account for 抱 *bowʔ:
2'. disyllabic word > fusion of initial consonants of both syllables + fusion of rhymes of both syllables
No, wait, *-ʌ- in *mʌpuʔ isn't a rhyme - it's a
vowel in the middle of a word. And I can't think of a word to describe *CʌCu
> *CʌCow > *Cow. 'Umlaut' isn't right. Vowel
harmony is involved, but there's also diphthongization. I've used the
term 'bending' and Schuessler uses the term 'warping', but neither term
acknowledges the first vowel that triggers the process. 'Harmonic
bending' or 'harmonic warping'?
In any case, I've been thinking that reduction is irregular. Fusion
is a type of reduction. So I expect some difficulty in trying to ...
reduce reduction to a set of simple categories. I'd still like to say
something other than 'anything can happen', though. There are
constraints on complexity.
Let's zoom out from a single etymology toward the bigger picture of
Hakka as described by Coblin. Let me try to translate his words into a
South Central Chinese
|Early Southern Highlands
||a subset of Tuhua/Pinghua
土話 Tuhua 'local speech' and 平話 Pinghua 'ordinary speech' are generic terms for a set of unclassified Chinese languages. Coblin proposes that some of them may be related to his Southern Highlands Chinese group of languages which I could call 'Greater Hakka'. He reconstructs 'mother' in Early South Central Chinese as *mVi3/4, leaving aside the problem of daughter forms with tones 1 and 2 (e.g., the "[oi1]" in his title) for the time being.
TANGUT VOWELS V. 190509
Writing about the Tangut
transcription of Sanskrit trailokya
got me thinking about the phonetic values of Tangut vowels again.
Here's my own take on the four grades influenced by Gong
Xun's ideas. Only basic vowels are listed in Tangraphic Sea
order, so there are no nasalized, tense, or retroflex vowels. I still
have no idea what the distinction that I indicate as -' was.
I write the basic vowel /ə/ as an easy-to-type y in my
I. Pharyngealized; lowered and/or backed
Pharyngealization is carried over from Jerry Norman's proposal for the Old Chinese source of Middle Chinese Grade I.
The lowered and backed allophones are similar to Arabic vowel allophones after 'emphatics' as described in Kaye (2009: 565).
Syllables with 'lower' series vowels (*a *e *o)
automatically developed pharyngealization unless this was blocked by a
preceding 'higher' series presyllabic vowel (*ɯ):
*Ca > *Cɑˁ but *CɯCa > *Ca
Conversely, a 'lower' series presyllable vowel (*ʌ) triggered pharyngealization in a following 'higher' series vowel (*ə *i *u):
*CʌCi > *Cɪˁ
Low /a/ cannot be lowered any further, so it is only backed.
Front /e/ is retracted to [ɛˁ]. The underlining indicates
retraction. [ɪˁ] without underlining is already backer than front [i],
so I do not underline it.
Back /u o/ cannot be backed any further, so they are only lowered.
II. Uvularized; lowered and/or backed
Medial /r/ in pre-Tangut pharyngealized syllables became uvular [ʁ]. This uvular medial was lost, but it colored the following vowel: e.g.,
pre-Tangut *pʰrat > *pʰʁɑˁt > pʰɑʶ = 2475 𗧑 1pha2 'to break in two'
Note that Gong Xun reconstructs uvularization in both Grades I and II:
In his system, a medial -ʕ- distinguishes Grade II from Grade I.
Gong has a single unmarked category corresponding to my Grade III and IV. Although it is true that Grades III and IV are in nearly complementary distribution -
Grade III: after v- (a labiovelar glide?), retroflexes, (velarized?) l-
Grade IV: elsewhere
- I still want to work out how they sounded to distinguish between the few minimal pairs that existed.
Syllables with 'higher' series vowels (*ə *i *u)
automatically became Grade III or IV depending on the preceding initial
unless there was a preceding 'lower' series presyllabic vowel (*ʌ):
*Ci > *Ci but *CʌCi > *Cɪˁ
Conversely, a 'higher' series presyllable vowel (*ɯ) triggered Grade III or IV in a following 'lower' series vowel (*a *e *o):
*CɯCa > *Cæ
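The grade hypotheses can be restated as a small decision procedure. This is an illustrative sketch of the rules above, not an established model: the function name is hypothetical, Grade II is assigned whenever a pharyngealized syllable had a medial *r, and the retroflex symbols standing in for the Grade III initials (v-, retroflexes, l-) are assumptions.

```python
from typing import Optional

# Hypothetical encoding of the grade hypotheses above. Vowel series and
# presyllable vowels follow the text; the retroflex symbols in the set of
# Grade III initials are assumed stand-ins for "v-, retroflexes, l-".
LOWER = {"a", "e", "o"}
HIGHER = {"ə", "i", "u"}
GRADE_III_INITIALS = {"v", "l", "tʂ", "tʂʰ", "dʐ", "ʂ", "ʐ"}


def tangut_grade(initial: str, vowel: str, pre: Optional[str] = None,
                 medial_r: bool = False) -> str:
    """Predict the grade of a pre-Tangut *C(r)V syllable with optional presyllable vowel."""
    if vowel in LOWER:
        pharyngealized = pre != "ɯ"  # a 'higher' presyllable vowel blocks it
    elif vowel in HIGHER:
        pharyngealized = pre == "ʌ"  # a 'lower' presyllable vowel triggers it
    else:
        raise ValueError(f"unknown vowel: {vowel}")
    if pharyngealized:
        # medial *r became uvular in pharyngealized syllables: Grade II
        return "II" if medial_r else "I"
    return "III" if initial in GRADE_III_INITIALS else "IV"


print(tangut_grade("pʰ", "a", medial_r=True))  # *pʰrat
print(tangut_grade("k", "a", pre="ɯ"))         # *CɯCa
print(tangut_grade("k", "i", pre="ʌ"))         # *CʌCi
print(tangut_grade("l", "i"))                  # l- before a higher vowel
```

Under these assumptions tangut_grade("pʰ", "a", medial_r=True) returns "II", matching 1pha2 above, and a k-initial higher-vowel syllable without a presyllable returns "IV", in line with the rarity of Grade III k-syllables.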
III. Higher and centralized
Grade III was less palatal and more velar than IV. Its palatal vowels /ɰi ɰe/ had velar glides /ɰ/ to distinguish them from the pure palatal vowels [i e] of Grade IV. The sequence /wɰ/ surfaced as [w].
IV. Higher and fronted
Grade IV was more palatal than III. It had front vowels [æ y ø]
corresponding to the central or back vowels of other grades.
An exception to that pattern is [ɨ] which, though not front, was
still fronter than its back counterparts in other grades.
The Grade IV equivalent of labiovelar [w] in other grades was
Unattested syllables are in parentheses.
I regard [q] as the Grade I and II allophone of /k/.
Are the gaps in the table random or systematic? Any theory of grades should be able to answer that question.
My hypotheses above regarding the origin of the grades predict that
- lower-vowel syllables should tend to have Grades I and II
- higher-vowel syllables should tend to have Grades III and IV
if *CV monosyllables outnumbered *CV̆CV sesquisyllables.
And above we see
- there is no 1ka3 or 1kwa4
- there is no 1k(w)i1
which fits my predictions.
The absence of 1kwi3 is also not surprising, since ki-syllables should tend to have Grade IV, not Grade III. k- does not belong to the subset of initials associated with Grade III: v-, retroflexes, and l-. There are only three known k-syllables with Grade III, and two of them happen to be in the table: 1kwa3 and 1ki3. The third is 1ka'3 which must have been something like [ka] plus whatever feature was represented by -'.
PITTAYAPORN'S PROTO-TAI *-ɲ
One of the innovations of Pittayawat Pittayaporn's (2009) PhD
dissertation The Phonology of Proto-Tai is his reconstruction of a Proto-Tai final
*-ɲ.
The reconstruction of *k- and tone category A for the Proto-Tai word 'to eat' is certain. The vowel and coda of that word are less certain.
Since it has been established that PT [Proto-Tai] allows palatal consonants in the coda [i.e., *-c¹ and *-j], one would also expect to find the palatal nasal occurring in coda position. Although the reconstruction of PT *-c is unequivocal, there is rather little evidence for final *-ɲ. The only potential case I have identified so far is ‘to eat’, which is reflected as /kinA1/ in all SWT [Southwestern Tai] varieties but as /kɯnA1/ in NT dialects [Northern Tai] like Wuming and Yay. We can speculate that the PT form for ‘to eat’ was *kɯɲ A but the vowel was fronted so that the PSWT [Proto-Southwestern-Tai] form for this etyma was *kin A. Therefore, I tentaitively hypothesize that PT had both *-c and *-ɲ.
Let's look at the 'eating' problem from a subgrouping perspective. Unlike Li Fang-Kuei whose classic model of the Tai family had only three branches (Northern, Central, and Southwestern), Pittayaporn (2009: 298) proposed four branches on the basis of clusters of innovations:
A. Most Tai languages
C. Chongzuo and Shangsi
D. All of Li's Northern Tai languages (such as the displaced Saek in the southeast) plus some of his Central Tai languages
Wikipedia has a clickable version of Pittayaporn's tree.
What is 'to eat' in the four branches?
A. Siamese kin A1
B. Ningming ken A1 (not in Pittayaporn 2009; found in Hudak 2008: 121)
C. Shangsi kɤn A1
D. Yay kɯn A1 but Saek kin A1
There are two types of words for 'to eat': ones with front vowels (A, B, Saek) and ones with back vowels (C, Yay). All end in -n.
Given that -in words are found in both A and D (Saek), let's suppose those branches independently preserve a proto-rhyme *-in.
By analogy, any -in words in Siamese and Saek should have counterparts ending in -en in Ningming, -ɤn in Shangsi, and -ɯn in Yay unless complicated by other factors. But is this really the case? Compare the forms for 'to eat' with those for Pittayaporn's *lin A 'water pipe':
A. Siamese lin A2
B. Ningming (no cognate in Pittayaporn or Hudak)
C. Shangsi lin A2 (not ˟lɤn A2)
D. Yay and Saek lin A2 (not Yay ˟lɯn A2)
It is true that in the modern languages, 'to eat' and 'water pipe'
belong to different tonal categories (A1 and A2) conditioned by the
initials (*voiceless > 1, *voiced > 2). So one could try to
salvage the *-in reconstruction of 'to eat' by claiming that *-i-
changed before *-n in tone A1 syllables in Ningming, Shangsi,
and Yay. But why would, for instance, tone A1 cause *-i- to
lower and back to -ɤ- in Shangsi?
Might the original rhyme of 'to eat' be preserved in Shangsi - or
Ningming or Yay? No, because the rhymes of 'to eat' in those languages
do not otherwise correspond to -in in Siamese and Saek. Here
are all the relevant correspondence sets, including those I already
Pittayaporn's solution is ingenious:
- It accounts for the front vowel of Siamese and Saek as the result
of feature transfer: the palatality of *-ɲ shifted to the vowel
*-ɯ-, causing it to independently front to -i- in two
distant branches of Tai (assuming the Saek word isn't a loan).
- The shift of *-ɯɲ to -Vn in all branches fits a trend against -Vɲ rhymes in Southeast Asian languages. Khmer does have a high neutral vowel-palatal nasal sequence /ɨɲ/ (e.g., in <beñ> /pɨɲ/, the Penh of Phnom Penh), but it is exceptional. Burmese once had /-aɲ/ as its sole /-ɲ/ rhyme, and Vietnamese only has /-aɲ -eɲ -iɲ/.
There are, however, two problems with his *-ɲ: First, it is only reconstructible in 'to eat'. Perhaps it had merged with *-n (and/or *-ŋ) after other vowels. Or 'to eat' is simply irregular, and *-c has no nasal counterpart, just as Old Chinese *-kʷ has no nasal counterpart.
Second, there is no external support for *-ɲ either within Kra-Dai or beyond it. Although Norquest (2015) reconstructs *-ɲ in Proto-Hlai, he does not reconstruct *-ɲ in Proto-Qi³ *kʰən (< my pre-Hlai *kən) 'to eat'. Blust's Proto-Austronesian *kaen [kaən] - somehow related to the Proto-Tai and Proto-Qi words - ends in *-n, not *-ɲ. The *k-word for 'to eat' probably goes back to Proto-Kra-Dai and is either inherited or borrowed from some Austronesian-type language⁴. Does Proto-Tai preserve a *-ɲ lost elsewhere?
¹The reconstruction of a Proto-Tai final palatal stop is another innovation of Pittayaporn (2009). Although no attested Tai language has /c/, reconstructing *-c accounts for correspondence set 2 in the following table:
Sets 1-3 are from Pittayaporn (2009: 211-212). Set 4 is based on the forms for 'liver'.
Saek is an aberrant Tai language which "shows many peculiarities
that cannot be reconciled within the conventional model of PT
[Proto-Tai] phonology" (Pittayaporn 2009: 14).
The Be languages are generally thought to be close relatives of Tai. See Chen (2018: 18) for the placement of Be within four different proposed Kra-Dai language trees. Ostapirat has changed his mind over time; in 2000 he viewed Be as a sister of Tai but in 2015 he viewed Be as a primary branch of Kra-Dai, and as of 2017 he viewed Be as a sister of a Tai-Kam-Sui subgroup.
²Proto-Tai *ˀjen A 'tendon' has a different set of rhyme correspondences that may be conditioned by a palatal initial absent from Proto-Tai *ʰmen C 'porcupine'.
³Proto-Qi is my term for the common ancestor of the
Qi subgroup of Hlai. Norquest reconstructs it but has no term for it.
Other early Hlai languages had unrelated words for 'to eat'. As only
the Proto-Qi word is cognate to the Proto-Tai word, it seems that
pre-Hlai must have inherited the word from Proto-Kra-Dai, but only one
dialect of Proto-Hlai (i.e., Proto-Qi) retained it whereas other
dialects of Proto-Hlai replaced it with innovations of unknown origin: *C-ləːk
in Proto-Run and *C-luːɦ elsewhere.
⁴I am deliberately vague here because I do not know if Proto-Kra-Dai is descended from Proto-Austronesian or is a sister to it (i.e., a descendant of Proto-Austro-Dai, if I may modernize Benedict's term 'Austro-Tai'). Or if there is no genetic relationship between Kra-Dai and Austronesian, if Proto-Kra-Dai borrowed from Proto-Austronesian, an ancestor of Proto-Austronesian, or a descendant of Proto-Austronesian.
SANSKRIT TRAILOKYA IN THE TANGUT INSCRIPTION AT JUYONGGUAN
Five years ago I rediscovered 村田治郎 Murata Jirō's 1957 book on the inscriptions of the Cloud Platform at 居庸關 Juyongguan¹ in the University of Hawaii library. I had last borrowed it around 1996. Of course my attention was drawn to the Tangut inscription. But, I confess, not for long. Soon after that I dove into the world of Tangut's distant relative Pyu. And I've been there for four years.
Then yesterday Andrew West reawakened my interest in the Juyongguan inscriptions.
Today I was looking at the Tangut inscription at Juyongguan, and
the Tangut transcription
5300 3639 2770 4620
1ty4 2rer4 2lo1 1ka4
of Sanskrit trailokya 'three worlds' jumped out at me. I've used Trailokya as part of my long pen name for maybe twenty-five years now.
A few words on the transcription characters:
𘎤 5300 1ty4: The only consonant clusters possible in native Tangut words had -w- as their second element. So one strategy for transcribing Sanskrit consonant clusters was to break them up into CyC-sequences. Tangut y was a neutral vowel, and in Grade IV (indicated by my -4) it was something like [ɨ] or [ɯ].
𗣀 3639 2rer4: Tangut had no
[aj]. Guillaume Jacques (2014: 206) does not even reconstruct *-aj
at the pre-Tangut level. I am guessing pre-pre-Tangut *-aj
became pre-Tangut *-ej (which Guillaume does reconstruct) and
then Tangut -e.
Here's a possible example:
(I finished the rest of the entry on 5.4.15:39, added a footnote on 5.6.19:06, and then failed to save the finished page. What follows is a new second half from 5.6.19:39.)
𗥹 2770 2lo1: For a long time, I used to think that Tangut tones might actually be phonations: tone 1 was the default phonation and tone 2 was the marked (creaky or breathy?) phonation. But the phonation hypothesis predicts that Sanskrit would be transcribed solely using Tangut characters for syllables with tone 1. There would be no reason to transcribe Sanskrit with Tangut characters for syllables with tone 2: i.e., a phonation that did not exist in Sanskrit. However, most Sanskrit Co-syllables⁴ were transcribed with Tangut characters for syllables with tone 2 (Arakawa 1999: 111).
|pho, bo, mo
Why was tone 2 favored for Sanskrit Co-syllables?
Conversely, why was ko transcribed with a Tangut character for a syllable with tone 1?
And was there a reason to transcribe the remaining Sanskrit
syllables with Tangut characters for syllables with both tones? For
instance, was there something about the -lo- of trailokya
that necessitated tone 2, whereas the lo in some other word was
somehow different to Tangut ears and required the tone 1 character 𗓽 4710 1lo1?
𗡝 4620 1ka4: This character
transcribed both Sanskrit ka and kya. Why not
transcribe Sanskrit kya as ky ya (cf. 1ty4 2rer4
for trai above) or as a fanqie character for kya
combining part of a kV-character with part of a ya-character?
Perhaps 1ka4 was something like [kja]. But if Grade IV (written
here as -4) was characterized by [j], why could 1ka4
also represent Sanskrit ka? Was there no simple [ka] in Tangut?
Were Grade I and II ka something other than [ka]: e.g., [qɑˁ]
and [qɑʶ] like Middle Chinese *1ka1 and *1ka2? Why was
there no Grade III ka?
Chinese and Tangut grades seem to be similar. So if the Middle Chinese transcription of Sanskrit ka was 迦 *1ka3, I would expect the Tangut transcription to be 1ka3 - a syllable that does not exist in Tangut!
To complicate matters, Grinstead (1972: 144) says 4620 could
represent Sanskrit ke. 1ka4 must have sounded like Sanskrit ke
as well as ka and kya. Maybe it had a front vowel:
¹This name was built into Windows 10's pinyin IME. It's interesting to see what's in and out of the IME.
Sometimes more annoying than interesting. For instance, the common
character 家 jia 'house' isn't listed as a choice for jia. I've
been typing 家族 jiazu 'family' and deleting the second character
to type 家 jia.
At least 波 bo 'wave' is included as a choice for bo now. I recall having to type the wrong reading po to make it display in some older version of the Windows Mandarin IME. I just noticed that the bopomofo IME accepts both bo and po for 波 bo 'wave'.
²(Pre-)pre-Tangut *S- conditioned Tangut -q
(my symbol for vowel tension) and (pre-)pre-Tangut *-ɯ- (perhaps
a front or back high vowel like *-i- or *-u- in
pre-pre-Tangut) conditioned Grade IV.
³Matisoff (2003: 262) does not gloss the Jingpho and Boro forms.
⁴Many Sanskrit Co-syllables are absent from
Arakawa's data: e.g., kho, gho, cho, jho, ṭo, etc.
⁵Arakawa (1999: 111) accidentally omitted the rhyme and first tone of 𗓽 4710 1lo1, the other Tangut transcription character for Sanskrit lo in his table.
URN-ING MY PAY
1. Four years of studying Pyu are paying off. Prof. Janice Stargardt of Cambridge made me reexamine the Hpayahtaung urn inscription (PYU 20). After all my advances in Pyu phonology, grammar, and lexicography, I'm finally beginning to understand it now. Just beginning. I imagine that the Khitan Small Script Research Group felt like I did when they began to make progress in understanding Khitan in the late 70s. The decipherments of Pyu and Khitan both have a long way to go (neither is remotely as advanced as the decipherment of Tangut), but I am now beyond the level of mere isolated words and a handful of grammar rules.
I thought Pyu was totally hopeless when I first tried to wrestle with it in 2015. But I'm starting to see the light at the end of the tunnel now. I'll probably never reach the end of the tunnel, but I hope my work can help others get there.
2. I try not to have tunnel vision. Paradoxically, not focusing on Pyu is the key to understanding Pyu. It's my knowledge of other languages that has made a difference in my efforts to crack that extinct language. I don't have time to look into anything other than Pyu in depth anymore, but I can still glance at the world outside first-millennium Burma.
While Googling for spontaneous nasalization for last night's entry, I came across Rémy Viredaz' "Two unrecognized vowel phonemes in Proto-Slavic".
Even before I got to the mind-blowing part about new phonemes (p. 13), I was stunned by his phonetic interpretation of the traditional set of vowel phonemes as a symmetric system (p. 1). Imagine a Slavic conlang retaining those old phonetic values.
One of Viredaz' new phonemes accounts for the unusual -e of the Old Novgorod masculine o-declension corresponding to *-ъ in the Slavic mainstream.
Now I wonder how Magadhi got -e in the masculine a-declension corresponding to the Slavic masculine o-declension. Needless to say, an Indic version of Viredaz' solution won't work.
3. I haven't forgotten about northeast Asia. Last night I also saw Andrew Shimunek's "Phonological and literary characteristics of some pieces of Khamnigan oral folklore", which made me wonder if anyone has done a survey of what might be called phonoliterary techniques in the Altaic world. Both Khamnigan and Khitan use rhyme, which is alien to Korean and Japanese. Oddly, a couple of words that rhyme in Russian have Khamnigan forms that do not rhyme:
R zeljonka > Kh tʃilɔːɴqʰɔ 'green tobacco'
R kartofel' ~ kartoška > qʰɔrtʰapqʰa 'potatoes'
4. Alexander Vovin's "EOJ [Eastern Old Japanese] specific vocabulary and Ainu vocabulary from the Man'yōshū" is a handy reference that only an expert in both early Japonic and Ainu could write.
Now I'm curious about the Proto-(Mainland) Japanese and even Proto-Japonic forms underlying the EOJ and Western Old Japanese forms: e.g., what I presume would be *yuru for EOJ yuru and WOJ yuri < *yuru-i 'lily'.