Leftovers from yesterday:

1. Lenition was an unspoken theme of "Fanning Red Ears of Grain". Yesterday I realized that

Dutch goeie < goede 'good'

is like

Japanese 良い yoi < yoki 'good'.

In both cases, the lenition is not regular. Not all intervocalic -d- and -k- have disappeared from those languages. Goede still exists as a formal form, and 良き yoki still exists as an archaic form. Both unlenited broeder 'brother' (religious) and lenited broer 'brother' (sibling) coexist in formal Dutch. Japanese -k- almost never lenites in noninflected forms: e.g., 時 toki 'time' has not become ˟toi. (The one exception I can think of is 垣間 kaima 'gap in a fence' < kaki-ma-mi 'fence-space' in which the noun kaki 'fence' - never ˟kai by itself - lenited.) Lenition is mandatory in inflected forms apart from archaisms, and not all hypothetical archaisms are possible: e.g., no one says ˟kakita 'wrote' instead of kaita < *kakitari. *tari didn't regularly become ta in Japanese; it is another example of reduction. All these examples demonstrate how reduction is not necessarily regular like a 'sound law'. I expect nonreductive sound changes to involve exceptionless sound laws: e.g., *dh > *d in Germanic. (Maybe not the best example since *dh > *d could be regarded as reduction [aspiration loss], but I don't know of any language in which deaspiration isn't regular. Was *dh really *ð? See Phoenix's )

I got the broeder/broer example from Phoenix who has more on Dutch (and Irish) lenition.

For even more including cases of hypercorrect -d-insertion in Dutch, see de Vaan (2018: 64-65).

I assume goei (formal goed) 'good' is a product of backformation from lenited goeie, as I don't know of any other cases of Dutch final -d [t] becoming -i.

2. I forgot to mention one other 'ao-ddity' in "Fanning Red Ears of Grain" - a Japanese name that is sui generis as far as I know:

*apa-pu 'millet-place.where.grow' > *ababu > *aβaβu > *awawu > *awau > *awɔː > 粟生 <ahafu> ao (not ˟aō!)

I can't explain why the final vowel isn't long. Was *aoː confused with the much more frequent word ao 'green'?

3. When I typed Japanese 顏 kao 'face' in "Fanning Red Ears of Grain", I initially used Windows 10's Korean IME since I thought the standard modern forms of hanja were identical to the prewar kanji I prefer. But to my surprise, the IME converted 안 an (the Sino-Korean reading of 顏) to 顔 which looks exactly like the postwar kanji for kao.

When I was studying Korean, my instructor pointed out I had written a postwar Japanese-style kanji instead of the proper hanja which was identical to the corresponding prewar kanji. Perhaps that was before I decided to embrace prewar Japanese orthography in formal writing. Ever since then I've been writing hanja and prewar kanji identically without any problems. Until now.

I did a quick survey of Korean books I could easily find to see what form of <FACE> was in them. Obviously Wiktionary isn't a book, but I've included it anyway:

Title | Author or publisher
새字典 Sae chajŏn (New Dictionary of Characters) | 東亞出版社 Dong-A chhulphansa
A Korean-English Dictionary (in entry for 顏面 anmyŏn 'face') | Martin, Lee, and Chang
賢學學習玉篇 Hyŏnhak haksŭp okphyŏn (Hyŏnhak Study Jewel Book) | 賢學社 Hyŏnhaksa
A Guide to Korean Characters | Bruce K. Grant
最新版常用學習三千漢字 Chhoeshinphan sangyong haksŭp samchŏn hancha (Three Thousand Hanja for Everyday Study: New Edition) | 弘新文化社 Hongshin munhwasa
Pictorial Sino-Korean Characters | Jacob Chang-ui Kim
동아現代活用玉篇 Dong-A hyŏndae hwaryong okphyŏn (Dong-A Modern Practical Jewel Book) | 東亞出版社 Dong-A chhulphansa
List of 1,800 hanja taught in South Korean schools | Wiktionary

<FACE> is in the KSC standard as presented in 中日朝漢字字形對照 'Chinese-Japanese-Korean Chinese character form comparison' and Wiktionary. But there are multiple versions of the KSC standard, so maybe the form changed over the years.

I wonder what the standard form is in North Korea.

Grant (1982) and Kim (1984) were my introduction to hanja. I started learning readings from their indexes in 1987. I obviously wasn't paying attention to the form of <FACE> back then.

My copy of Hyŏnhak haksŭp okphyŏn has a cover attached upside down and a partly mirror-imaged page of publication information which do not inspire confidence. Is Hyŏnhaksa still in business? I can't Google a company site.

LA SCAR

425 years ago today, Portuguese and lascarins invaded Kandy. Lascarin is from Persian لشکر lashkar.

Wikipedia's lascar article derives Persian lashkar 'army' from Arabic العسكر al-`askar 'the army'. So is lashkar an article-incorporating word like algebra or Haitian Creole lalin < la lune?

No, because the Persian word is attested in Middle Persian as <lškl> before the coming of Islam. So the direction of borrowing and analysis was the other way around: a Persian word without an article was reinterpreted as an Arabic article-noun sequence.

But why does the Arabic word have `ayn, a consonant absent from the Persian word (and Persian in general)?

Perhaps at the time of borrowing, the first vowel of Persian lashkar sounded more like Arabic /a/ after `ayn rather than Arabic /a/ after a glottal stop.

Another possibility - not necessarily exclusive - is that lashkar could not have been interpreted as al-'askar because the Persian form has no glottal stop. Persian la- sounded more like Arabic l`a, a voiced sequence without a stop, than l'a, a voiced sequence interrupted by a stop.

A question I can't answer is why the Arabic word has s instead of sh.

FANNING RED EARS OF GRAIN

Yesterday I saw part of 47 Ronin (2013). I looked up that movie, and as a result I finally learned how to spell Akō in Japanese¹: 赤穗・あかほ <RED EAR.OF.GRAIN>/<akaho>², which looks as if it should be pronounced Aka(h)o, i.e., as aka 'red' + ho 'ear of grain' with little or no sandhi. But in fact the second and third vowels have fused into a single long vowel:

*aka-po > *akabo > *akaβo > *akawo > *akao > *akɔː > Akō

I didn't expect that because the normal reflex of *apo is ao: e.g.,

*kapo > *kabo > *kaβo > *kawo > 顏・かほ <kaho> kao 'face'

I would expect ō to be from *a(p)u, not *a(p)o. Is *a(p)o > ō a sound change in the Akō dialect?
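The two chains above share every step up to the point where the vowels are in hiatus. Here is a minimal sketch of that shared part as ordered rewrite rules. The rule list, the vowel class V, and the function name derive are my own illustrative assumptions read off the chains, not an established formalism. It stops at the -ao stage because the fusion of *ao into a long vowel is exactly the step that applies in Akō but not in kao.

```python
import re

# Assumed vowel inventory for the environments below (illustration only).
V = "aiueo"

# Ordered rules read off the derivation chains in the post.
# Each entry is (label, regex pattern, replacement).
RULES = [
    ("drop morpheme boundary", r"-", ""),
    ("*p > *b between vowels", rf"(?<=[{V}])p(?=[{V}])", "b"),
    ("*b > *β between vowels", rf"(?<=[{V}])b(?=[{V}])", "β"),
    ("*β > *w between vowels", rf"(?<=[{V}])β(?=[{V}])", "w"),
    ("*w > zero before o", r"w(?=o)", ""),
]


def derive(form):
    """Apply the rules in order and return every distinct stage."""
    stages = [form]
    for _label, pattern, repl in RULES:
        new = re.sub(pattern, repl, stages[-1])
        if new != stages[-1]:
            stages.append(new)
    return stages


if __name__ == "__main__":
    print(" > ".join(derive("kapo")))    # the 'face' chain
    print(" > ".join(derive("aka-po")))  # the place-name chain, up to *akao
```

Both inputs end in -ao under these rules, so whatever produced the long vowel in Akō has to be an extra change on top of them.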

Standard Japanese has cases of the reverse that I can't explain:

*apuŋgu > 扇ぐ・あふぐ <afugu> aogu 'to fan' (cf. Okinawan ōjun 'id.')

Compare with this (mostly) regular word from the same root:

*apuki > 扇・あふぎ <afugi> ōgi 'fan (noun)' (cf. Okinawan ōji 'id.')

*k > g is irregular. Here's a doubly irregular word:

*ambure- > 溢れ・あふれ- <afure> 'to overflow' (cf. Okinawan andiin < anriin < *ambure- 'id.')

I don't know how *mb became f. The spelling <afure> should regularly be read as ōre.

¹I read John Allyn's 47 Ronin in English around 1986 and incredibly never encountered the name Akō in Japanese until now!

²I write all Japanese forms in prewar kanji and kana orthography. Prewar kana orthography is closer to earlier pronunciation than modern kana orthography.

HE HINDIKE EPOKHE?

243 years ago yesterday, John Adams predicted that

[t]he Second Day of July 1776, will be the most memorable Epocha, in the History of America.

His use of epocha retaining Latin -a got me wondering about the etymology of the word:

from Ancient Greek ἐποχή (epokhḗ, “a check, cessation, stop, pause, epoch of a star, i.e., the point at which it seems to halt after reaching the highest, and generally the place of a star; hence, a historical epoch”), from ἐπέχω (epékhō, “I hold in, check”), from ἐπι- (epi-, “upon”) + ἔχω (ékhō, “I have, hold”).

I then looked up the etymology of ékhō:

From Proto-Indo-European *seǵʰ-.

But wait - how can that be? PIE *s- becomes Greek h-, not zero.

Oh, duh: Sihler (2008: 170) points out that Grassmann's Law applies to the secondary aspirate h- as well as the primary aspirates (kh th ph):

ἔχ- ékh- < *hekʰ- < *seǵʰ-

(Here I assume the devoicing of the primary aspirates predates Grassmann's Law. Does it? The Proto-Greek Wikipedia page says Grassmann's Law may be post-Mycenaean. Mycenaean already had voiceless aspirates.)

Grassmann's Law does not apply to the future stem, presumably because the law must postdate deaspiration before *s:

ἕξ- héks- < *seǵʰ-s-.
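The ordering argument can be checked mechanically. Below is a rough sketch, assuming a simplified transcription (plain g for PIE *ǵ, no accents) and four toy rules of my own devising; the function names and the two rule orders are illustrative assumptions. The first order follows the post's assumptions (devoicing of aspirates before Grassmann's Law, deaspiration before *s before Grassmann's Law); the second swaps the last two rules to show what would go wrong.

```python
import re


def s_to_h(form):
    # Word-initial *s > h before a vowel.
    return re.sub(r"^s(?=[aeiou])", "h", form)


def devoice_aspirates(form):
    # Voiced aspirates become voiceless aspirates.
    return form.replace("gʰ", "kʰ").replace("dʰ", "tʰ").replace("bʰ", "pʰ")


def deaspirate_before_s(form):
    # An aspirate loses its aspiration directly before s.
    return re.sub(r"ʰ(?=s)", "", form)


def grassmann(form):
    # An aspirate (including initial h) is deaspirated
    # if another aspirate follows later in the word.
    form = re.sub(r"ʰ(?=.*ʰ)", "", form)
    return re.sub(r"^h(?=.*ʰ)", "", form)


def apply_rules(form, rules):
    for rule in rules:
        form = rule(form)
    return form


# Order assumed in the post: deaspiration before s precedes Grassmann's Law.
ASSUMED_ORDER = [s_to_h, devoice_aspirates, deaspirate_before_s, grassmann]
# Counterfactual order, for comparison only.
SWAPPED_ORDER = [s_to_h, devoice_aspirates, grassmann, deaspirate_before_s]

if __name__ == "__main__":
    print(apply_rules("segʰ", ASSUMED_ORDER))   # present stem
    print(apply_rules("segʰs", ASSUMED_ORDER))  # future stem
    print(apply_rules("segʰs", SWAPPED_ORDER))  # what the wrong order predicts
```

With the assumed order the present stem comes out as ekʰ and the future stem as heks, matching ékh- and héks-. With the swapped order the future stem would lose its h- as well, which is not what Greek has.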

I really should have known better because the same is true in Sanskrit. Compare:

budhyate < *bhudh- 'wakes'

bhotsyate < *bhudh-sya- 'will wake'

Bucknell (1994: 179) lists a variant future form bodhiṣyati with an -i- blocking -s- from conditioning the deaspiration of the preceding dh. But I have not been able to confirm this form in Monier-Williams, Whitney, or the Digital Corpus of Sanskrit.

As tempting as it is to regard Grassmann's Law as a shared innovation of Greek and Indo-Iranian, that's not possible. Grassmann's Law must postdate *s- > h- in Greek, a change that never happened in Proto-Indo-Iranian. (*s > h did occur later in Iranian but not in Indic.) Wikipedia's Graeco-Aryan page suggests a different explanation:

Rather, it is more likely that an areal feature spread across a then-contiguous Graeco-Aryan–speaking area. That would have occurred after early stages of Proto-Greek and Proto-Indo-Iranian had developed into separate dialects but before they ceased to be in geographic contact.

While I'm on the topic of Greek h ... today I was surprised to see Hindikē for 'Indian' in Wikipedia's "India (Herodotus)" article. Until now I thought that 'India' had initial I- in Greek because it was borrowed from Old Persian Hinduš 'Indus' (after *s > h in Old Persian; cf. Sanskrit Sindhus) after Greek had lost h-. But that Wikipedia article gives the Greek spelling Ἰνδική <Indikḗ> for Hindikē, not Ἱνδική <Hindikḗ>. Google gives only seven results for Ἱνδική. One is an OCR error for Ἰνδική. Two are Armenian dictionaries with no Greek that I can see, three appear to be copies of the same Armenian dictionary, and one is a Greek Facebook post. Hindikē looks like an error for the standard form Indikē.

THE ETYMOLOGY OF CANTONESE 1LAT

Today it occurred to me that Cantonese 甩 1lat 'to lose' may be cognate to 失 1sat 'to lose' (now a bound morpheme):

1lat < *l̥it

1sat < *l̥it

Also belonging to this word family is

6jat < *lit 'to escape' (also now a bound morpheme in Cantonese)

What was the original root initial? Two scenarios with two subscenarios each:

A. *l- is original, and *l̥- is

A1. from a devoicing prefix + *l- or

A2. by analogy with some other voiceless/voiced sonorant-initial verb pair.

B. *l̥- is original, and *l- is

B1. from a voicing prefix + *l̥- or

B2. by analogy with some other voiceless/voiced sonorant-initial verb pair.

The B scenario seems less popular. I've never seen anyone propose anything like it, probably because of a reluctance to posit a primary voiceless lateral. Voiceless laterals are uncommon in the world's languages, though they seem common in 'Tibeto-Burman' (i.e., Sino-Tibetan minus Chinese - and even within Chinese, Taishanese has [ɬ]).

But wait - if both 1lat and 1sat go back to *l̥it, why do they have different initial consonants in Cantonese? Two scenarios:

A. 1lat is native Cantonese, whereas 1sat (with cognates throughout Chinese) is borrowed. In other words,

But how many native Cantonese words have l- as a reflex of *l̥-? There are many Cantonese words with s- from Proto-Chinese *l̥-. Are they all borrowings?

B. l- and s- are the products of reduction at different points in time. Three identical Proto-Chinese sequences could undergo three different paths of reduction:

reduction phase 1
reduction phase 2
reduction phase 3
reduction phase 4
*l̥- *l̥-

The trouble is that I cannot easily account for a fourth type of reduction also involving an *sl-type sequence that fuses into *z-. More on this problem tomorrow.

(It seems that every time I write that I'll continue tomorrow, I end up finding some other topic that eats up my time the next day. In this case I am finishing a July 4th-themed post that has to go up on July 4th. So this and other loose ends will have to wait - or, worse yet, be forgotten. I have no idea how many unfinished series there are on this blog after seventeen years.)

'BASIL' IN TANGUT

While researching the post I originally intended for today, I found this Tangut borrowing of Sanskrit arjaka 'basil' in Kychanov and Arakawa (2006: 361):


4541 0013 3985 1a? 1zar 1ka'3

I would expect the Sanskrit consonant cluster -rj- to be rendered as -ryr dz- with an epenthetic retroflex vowel -yr and dz, the usual Tangutization of Sanskrit j. (Tangut, like Tibetan and Late Middle Chinese, reflects a style of Sanskrit pronunciation with dental affricates instead of palatal stops.)

(Compare zar for rja with ryr ga for rka in


4541 0795 5091 3369 4293 1a? 2ryr4 1ga4 1ma4 1si4 for Sanskrit Arkamasi [a name]

from Sun and Tai 2012: 359. I cannot explain the g for k.)

But instead of †ryr dza, the actual Tangut form has 1zar1 with z- and vowel retroflexion. Why?

My guess is that the Tangut reflects a rdza ka, the Tibetan version of the Sanskrit word for 'basil'. Here's what I think happened:

1. The Tangut borrowed Tibetan a rdza ka as *a rdza ka'. (I'm leaving out tones and grades for simplicity.)

2. *a rdza ka' became *a dzar ka' after *rCV became CVr (i.e., [CVʳ] with a retroflex vowel) in Tangut.

3. Medial *-dz- lenited to *-z-: *a dzar ka' > *a zar ka'.

CAN AI DECIPHER PYU?

tl;dr: I doubt it.

I ended my last entry with a teaser for what was supposed to be this entry. Today I did start writing part 6 of my 役/堤 series. Then I saw this on reddit:

Machine learning has been used to automatically translate long-lost languages - Some languages that have never been deciphered could be the next ones to get the machine translation treatment.

That took me to MIT Technology Review which links to the original paper "Neural Decipherment via Minimum-Cost Flow: from Ugaritic to Linear B". I haven't looked at it yet. I am not a computer science person, so I almost certainly wouldn't understand it. I do understand the MIT article, so I'll make a few comments here.

The big idea behind machine translation is the understanding that words are related to each other in similar ways, regardless of the language involved.

Universal grammar?

So the process begins by mapping out these relations for a specific language. This requires huge databases of text.

There is no huge database of Pyu text. My text file of all the Pyu text that I can 'read' (not understand - just transliterate in most cases) is 50 kb.

Such a database would be possible for Pyu's distant relative Tangut. A Khitan database, though far smaller than that for Tangut, would still be bigger than the Pyu database.

The key insight enabling machine translation is that words in different languages occupy the same points in their respective parameter spaces. That makes it possible to map an entire language onto another language with a one-to-one correspondence.

If only languages had one-to-one correspondences!

The idea is that any language can change in only certain ways—for example, the symbols in related languages appear with similar distributions, related words have the same order of characters, and so on.

The general idea that language change is constrained is correct.

With these rules constraining the machine, it becomes much easier to decipher a language, provided the progenitor language is known. 

But we don't know the progenitor (ancestor) of Pyu. The reconstruction of Proto-Sino-Tibetan has barely begun. I don't even know where Pyu fits into the family.

Luo and co put the technique to the test with two lost languages, Linear B and Ugaritic. Linguists know that Linear B encodes an early version of ancient Greek and that Ugaritic, which was discovered in 1929, is an early form of Hebrew.

But Ugaritic is not an early form of Hebrew; it's an early relative. An aunt, not a mother. Mycenaean Greek has a similar relationship to ancient Greek as we know it. No mention of progenitor languages like Proto-Semitic or Proto-Indo-European. It seems that the technique is actually dependent on better known relatives, not progenitors. And those relatives have to be close. Pyu has no known close relatives.

It would be interesting to test this technique on modern languages. Spanish could be deciphered using Italian. But Italian wouldn't help with, say, Albanian, Armenian, or Bengali. Indo-European has enormous internal diversity, and so does Sino-Tibetan.

But the big advantage of machine-based approaches is that they can test one language after another quickly without becoming fatigued. So it’s quite possible that Luo and co might tackle Linear A with a brute-force approach—simply attempt to decipher it into every language for which machine translation already operates.

The hope is that Linear A will turn out to be a close relative of some "language for which machine translation already operates". But what if it isn't? What if it's an isolate?

Pyu does not seem to be an isolate in the sense of having zero relatives. But it does seem to be an isolate within Sino-Tibetan - an Asian Albanian without close relatives among its neighbors. So I doubt a brute-force approach using Burmese, Chin, Karen, etc. is going to pay off.

A WÉI-RD READING

One last branch of the tree that started with 役小角 En no Ozunu's name:

While checking the Wiktionary entry for 堤 from "Edachi Again", I was surprised when I saw its list of Mandarin readings for the character. As the Sesame Street song goes, "One of these things is not like the others":

  1. 'dike; base of bottle'

  2. 'dike; base of bottle'

  3. tǐ (sic; an error for dǐ) 'to stop'

  4. shí (first syllable of 堤封 shífēng, now normally tífēng 'totally')

  5. wéi (in place names; the only example I could find is premodern 洙堤郡 Zhūwéi Prefecture)

Normally multiple readings of a character have initial consonants at similar places of articulation. t- and d- are both dental and sh-, though not dental, is retroflex. w-, however, is labial. I cannot think of any other T-character with a w-reading.

I found 洙堤郡 Zhūwéi Prefecture in 集韻 Jiyun (1039). I did not find it in Scripta Sinica's text database, so I have no idea how old that place name is.

The Jiyun fanqie for 堤 in 洙堤郡 is

*win + 規 *kwie

which adds up to a Middle Chinese reading *wie. But Middle Chinese no longer even existed by 1039. And I could argue that 'Middle Chinese' in the sense of 'the language of dictionaries and rhyme tables' did not exist, at least not as a spoken language. Putting those misgivings aside, I think an 11th century reader might have pronounced 堤 in the prefecture name as something like *wi whose initial is still hard to reconcile with the others.

I'm not even sure how to read 洙 in the prefecture name. More on this problem next time.

Tangut Yinchuan font copyright © Prof. 景永时 Jing Yongshi
Tangut character image fonts by Mojikyo.org
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2019 Amritavision