While looking up 5505 'sheep' (next year's calendrical animal) in Li Fanwen's Tangut dictionary, I stumbled upon

5501 2gy 'tool, utensil'

a borrowing from Late Middle Chinese (LMC) 具 *gy. Until recently I agreed with Gong that Tangut had plain (i.e., nonprenasalized) voiced stops. So of course LMC *g- would be borrowed as Tangut g-. But now I'm not so sure.

The Sino-Japanese reading gu for 具 was once *ŋgu which in turn might be from an earlier *ŋgo, an approximation of Early Middle Chinese *guo. Early Japanese had no *g-, but it did have a prenasalized *ŋg- which was the best available match for Chinese *g-.

Was Tangut like early Japanese? Did it lack plain voiced obstruents? Were Chinese words with voiced obstruents borrowed with prenasalized obstruents?

Looking at Guillaume Jacques' 2006 Tangut-Japhug rGyalrong comparisons, I see that Tangut g- nearly always corresponds to a Japhug nasal-obstruent sequence: ɴq-, ŋg-, ŋgr-, ʑŋgr-, mg-, mgr-. (The reverse is not true.*)
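The pattern can be checked mechanically. What follows is only a rough sketch: the cluster lists are copied from this entry and its footnote, but the segment classes and the function name are assumptions made up for illustration, not a full Japhug inventory.

```python
# Illustrative only: these classes are assumptions chosen to cover
# the clusters cited in this entry, not a complete Japhug inventory.
NASALS = set("mnŋɴ")
OBSTRUENTS = set("kgqɢ")

def has_nasal_obstruent(cluster: str) -> bool:
    """True if a nasal is immediately followed by a velar/uvular stop."""
    return any(a in NASALS and b in OBSTRUENTS
               for a, b in zip(cluster, cluster[1:]))

# Japhug initials cited as corresponding to Tangut g- and to Tangut k-.
JAPHUG_FOR_TANGUT_G = ["ɴq", "ŋg", "ŋgr", "ʑŋgr", "mg", "mgr"]
JAPHUG_FOR_TANGUT_K = ["ɴɢ", "mq", "ŋk"]

all_g = all(has_nasal_obstruent(c) for c in JAPHUG_FOR_TANGUT_G)
all_k = all(has_nasal_obstruent(c) for c in JAPHUG_FOR_TANGUT_K)
print(all_g, all_k)
```

Both checks come out true, which is the sense in which the reverse does not hold: a Japhug nasal-stop sequence does not by itself predict Tangut g-.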

The one apparent exception involves words that may not be cognate:

2181 1ge 'valley' : Japhug co 'id.'

Guillaume Jacques (2004: 297) derived co from *twaŋ (cf. Written Burmese twaŋḥ 'well'). I can't imagine *tw- becoming a pre-Tangut *k- that would blend with a nasal prefix to become Tangut g-.

(If *tw- became pre-Tangut *k-, why is that cluster preserved in

0070 1thwə 'to open' : Japhug kɤ-cɯ < *-u 'id.'

with aspiration from preinitial *K-? I would not expect *tw- to back to *k- before the front vowel of 'valley'. But maybe the vowel of 'valley' was back when the backing took place.)

If there are any Japhug words with voiceless velar or uvular initials that correspond to Tangut g-, I would reconstruct a nasal prefix in pre-Tangut that was absent from Japhug:

Japhug k/q- : Tangut g- < *N-k/q-?

Guillaume's list of cognates had no Japhug words with a simple initial g-. The only instance of a Japhug initial cluster with g- not preceded by a nasal (tɯ-zgrɯ 'elbow') corresponded to Tangut k-:

1298 1kiʳw 'elbow'

(12.17.1:34: The retroflex vowel is normally from pre-Tangut preinitial *r- or final *-r. I would not expect it to correspond to Japhug medial *-r-. Could pre-Tangut medial *-r- condition vowel retroflexion in certain environments? Maybe the two words aren't related. Could the Tangut word be related to Proto-Kuki-Chin *ki(i)w and rGyalrongic rk-forms like northern Ergong r̥kəu⁵³ and Danba Ergong ʐkiau?)

Did pre-Tangut *g- devoice to k-? Did Tangut then develop a new g- from prenasalized *Ng-?

Next: Another possible fate for pre-Tangut *g-?

*12.17.1:17: Japhug nasal-stop sequences may correspond to Tangut oral stops:

Japhug ɴɢ-, mq-, ŋk- : Tangut k-

I presume that either the Japhug nasal is an innovation or that Tangut lost a nasal prefix.

This pair suggests that the pre-Tangut dental-velar cluster *ng- fused into ŋ- instead of g-:

2857 2ŋo 'illness' : Japhug kɤ-ngo 'to be ill'

"MATHEMATICALLY ELEGANT" BUT INACCURATE

Last night, I wrote,

I do not believe in glottochronology or other mathematical shortcuts in historical linguistics.

Today I found Martin W. Lewis'* stance on such shortcuts (2012; emphasis mine):

Mathematically intricate though it may be, the model employed by the authors [of this Science article] nonetheless churns out demonstrably false information. Failing the most basic tests of verification, the Bouckaert article typifies the kind of undue reductionism that sometimes gives scientific excursions into human history and behavior a bad name, based on the belief that a few key concepts linked to clever techniques can allow one to side-step complexity, promising mathematically elegant short-cuts to knowledge. While purporting to offer a truly scientific* approach, Bouckaert et al. actually forward an example of scientism, or the inappropriate and overweening application of specific scientific techniques to problems that lie beyond their own purview.

I haven't read the Bouckaert article, so I can't comment on it. However, I think Lewis' criticism could also apply to other studies. The temptation to "side-step complexity" is particularly strong when faced with huge language families like Indo-European, Austronesian, and Sino-Tibetan. Wouldn't it be so much easier to toss all that data into a digital vat, push a button, and get an answer? But is that answer meaningful?

I confess I've used probability in my 2008 paper on Old Chinese. And I've written in favor of phonostatistics:

By observing trends in languages with different phonostatistical [i.e., sound frequency] patterns, one might be able to make predictions about later changes or explain known (and sometimes baffling) changes.

So I wouldn't say all mathematics is irrelevant to historical linguistics. But there are limits to mathematical models. Languages are not like carbon-14 or lifeforms; they do not change at a fixed rate, and their changes are Lamarckian rather than Darwinian: i.e., acquired characteristics are 'inherited'. (Strictly speaking, no language is 'inherited'; learners 'recreate' languages by imitating the users around them. Biological metaphors are arguably inappropriate for linguistics, but we are stuck with them.) I predict that viable mathematical models for linguistics (1) will be developed by linguists and (2) will be sui generis instead of being inspired by, say, paleontology or biology.

Are there mathematical models that have churned out demonstrably true information in the social sciences? If so, could such models be modified for linguistics?

*Like me, Martin W. Lewis also prefers pixels to paper:

Rather than publishing my research findings in academic journals, I now post them on-line on the weblog.

A GENERATIONAL CONSTRAINT ON SOUND CHANGE?

This passage from the Wikipedia article on Verner's Law made me wonder if it was possible to formulate a generational constraint on sound change (emphasis mine):

Moreover, the combination of the above-mentioned traditional order (Grimm's before Verner's) and the dating of Grimm's law to the 1st century BC requires an unusually fast change of the late Common Germanic at the turn of the millennium: within only a few decades, the three dramatic changes mentioned below would have had to happen in quick succession. This would be the only way to explain that all Germanic languages show these changes. Such a rapid language change seems implausible. Strictly speaking, it would have caused a child to be unable to understand his own grandparents.

I do not believe in glottochronology or other mathematical shortcuts in historical linguistics. So I hesitate to state that X amount of years is needed for a sequence of changes. Nonetheless it is hard for me to believe that intelligibility can be lost in as few as two generations. I presume that sound changes must be gradual if the language is to be transmitted at all.*

*12.15.2:15: That last sentence rhymes, but is it true?

Here's a simple scenario to demonstrate what I meant by gradual. Suppose we know that a language lost final stops: e.g., -k > -Ø. Is there any documented instance of speakers losing them in a generation or two? I would imagine that there would be a transitional generation with -ʔ, and that -k and -ʔ (and later, -ʔ and -Ø) would coexist within the same community for at least a generation or two.

How would gradualism work with, say, the Vietnamese consonant shift?

*s > *t > [ɗ]

I suppose the two consonants could have been ts or θ and d at an intermediate stage:

*s > *ts/θ > [t]

*t > *d > [ɗ]
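As a rough sketch of why the intermediate stages matter, the two paths can be modeled as simultaneous substitutions. The staging below is hypothetical: the intermediate values follow my guesses above (with θ picked over ts), and lining the two paths up in time is an assumption made only for this illustration.

```python
# Hypothetical staging: intermediate values follow the guesses in this
# entry (θ chosen over ts); aligning the two paths in time is assumed.
STAGES = [
    {"s": "θ", "t": "d"},   # stage 1: *s > θ, *t > d
    {"θ": "t", "d": "ɗ"},   # stage 2: θ > t, d > ɗ
]

def shift(initial: str) -> list[str]:
    """Return the history of one initial, one entry per stage."""
    history = [initial]
    for stage in STAGES:
        history.append(stage.get(history[-1], history[-1]))
    return history

s_history = shift("s")   # ['s', 'θ', 't']
t_history = shift("t")   # ['t', 'd', 'ɗ']

# The contrast survives at every stage: the two series never coincide.
contrast_kept = all(a != b for a, b in zip(s_history, t_history))

# Direct rules applied one after the other (s > t, then t > ɗ) would
# instead merge both initials into ɗ.
naive = "s"
for rule in ({"s": "t"}, {"t": "ɗ"}):
    naive = rule.get(naive, naive)

print(s_history, t_history, contrast_kept, naive)
```

With intermediate stages the old *s and *t words stay apart throughout, whereas the naive ordering sends both to ɗ, which is not what Vietnamese shows.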

θ might have hardened into the aspirated stop th of some Muong varieties: e.g.,

'hair' (data from SEAlang)

Proto-Vietic *-suk

Son La and Thanh Hoa Muong sak⁷

Hoa Binh Muong thắc

Vietnamese tóc [tawk͡p]

(There is more to the chain. Palatal ɕ became a new s, and in the north, retroflex ʂ merged with that s. However, that happened long after the rest of the shift was completed, which is why I didn't star ɕ and ʂ. Both fricatives are attested in 17th century Vietnamese romanization as <x> and <s>. There was no alveolar s in the dialect that was the basis of that romanization.)

WHY SINO-TIBETAN RECONSTRUCTION IS NOT LIKE INDO-EUROPEAN RECONSTRUCTION (YET)

When I was walking across the University of Hawaii campus with Bob Blust around 1996, he asked me if I thought Sino-Tibetan reconstruction methodology was fundamentally different from the 'standard' reconstruction methodology of Indo-European or in his specialty, Austronesian. I answered no, and he seemed to like that answer. But now I'm not so sure.

Last week, David Boxenhorn told me that Proto-Indo-European (PIE) reconstructions

- are adequate to express what people want to say about language relationships

- "are largely reinterpretations of the same data"

- "have the advantage of being well-known"

I don't think the same can be said for Sino-Tibetan (ST) or even Chinese. At least not yet.

All language reconstructions symbolize language relationships. To reconstruct a proto-language is to assume a relationship between its descendants. Conversely, my unwillingness to reconstruct, say, 'Altaic' indicates I don't think there is a genetic relationship between the 'Altaic' languages.

The 'distance' between a reconstructed proto-language and its presumed descendants is a gauge of innovation. And shared innovations are my preferred basis of subgrouping.

So far I know of only one large-scale Proto-Sino-Tibetan (PST) reconstruction: the 494-word list of Coblin (1986). Superficially it looks like a mixture of Chinese and Tibetan, which is what a nonspecialist might expect given the name 'Sino-Tibetan'. (It was largely based on Chinese and Tibetan data; many other languages were cited only once or twice.) But is such a blend sufficient to represent the relationships between the many ST languages? (Ethnologue lists 460 ST languages; whatever number one uses, there is no doubt that the vast majority are not in the same branches of ST as Chinese and Tibetan.) The internal complexity within ST may be so great that ST might be comparable to 'Nostratic' (whose existence is debatable) rather than Indo-European.

Nearly a decade later, Gong (1994) published a shorter list of Proto-Sino-Tibetan reconstructions which, as Coblin (2011) noted, were

identical in most of its details with his Old Chinese [reconstructions]. This then points to a tacit conclusion about the nature of Sino-Tibetan and early Chinese, i.e., that Common Sino-Tibetan [i.e., Proto-Sino-Tibetan] was, phonologically at least, virtually the same language as Old Chinese.

There is no Indo-European or Austronesian reconstruction like Coblin's or Gong's PST which strongly resembles a daughter language. While it is true that the earliest version of Schleicher's fable looks like Sanskrit, no modern PIE reconstruction could be mistaken for Sanskrit at a glance. It is not a priori impossible that one branch of a language family could be extremely conservative: e.g., Baltic. Even so, the degree of conservatism (Sinocentrism?) in Gong's PST reconstruction is exceptional.

Why are PST reconstructions so Sino and so Tibetan (and not so much Kiranti)? The answer lies in another difference between IE and ST reconstruction.

There are several key languages for PIE reconstruction, and all are well-known - down to the most minute phonetic detail in the case of Sanskrit. Some are even of considerable time depth.

On the other hand, nearly all ST languages are attested only from the last millennium, with the obvious major exception of Chinese, whose script is somewhat opaque (more on that problem below).

The temptation to follow Coblin and Gong's examples and stick to languages with early written records (Chinese, Tibetan, Burmese, and in Gong's case, Tangut) for ST reconstruction is strong. However, the possibility that languages without such records could be of considerable value for reconstruction cannot be denied. Age does not entail archaism: e.g., Sanskrit long ago lost *e and *o which partly survive in Italian today:

'seven': PIE *septm > Skt sapta but It sette

'night': PIE *nokʷt- > Skt nakta- but It notte

(Of course, there's no need to use Italian for PIE reconstructions since we have Latin with septem and nox. Nonetheless, the point remains that older is not better in every way.)

If Austronesianists were unaware of the Formosan languages, their Proto-Austronesian would really be Proto-Malayo-Polynesian: i.e., Austronesian minus Formosan. Could there be Sino-Tibetan analogues of Formosan languages whose value for reconstruction has yet to be recognized?

It would be unthinkable to reconstruct Proto-Austronesian today without reference to Formosan which is part of a core body of data that any Proto-Austronesian reconstruction must account for. Proto-Austronesian reconstructions are, to use David's words, "largely reinterpretations of [this] same data". Similarly, PIE reconstructions are "largely reinterpretations" of a fixed set of languages: Sanskrit, Greek, Latin, etc.

Coblin and Gong's PST reconstructions are not based on the same data. Coblin cast a wider net than Gong, Gong included Tangut and Coblin did not, and even the Old Chinese that they use is different: Coblin used Li Fang-Kuei's reconstruction whereas Gong used his own reconstruction which is similar but not identical to Li's.

The reconstruction of Chinese is a problem unlike anything in Indo-European. Despite a wealth of evidence, there is no consensus on Old or Middle Chinese phonology; different scholars' reconstructions of Old Chinese can look like different languages, and conversion between reconstructions is not always easy.

Some reconstructions of 道 'road' (now dao in Mandarin):

Karlgren (1957): *d'ôg

Li Fang-Kuei: *dəgwx (from Schuessler 1987: 115)

Schuessler (1987): *gləwʔ

Starostin (1989): *lhūʔ

Gong (1994): *'ləmx

Baxter and Sagart (2014): *[kə.l]ˤuʔ

This site: *Cʌ-luʔ (*C- might be *q-)

Reconstructions of PIE, Proto-Austronesian, etc. are not constrained by a wealth of philological material, whereas the structure of Chinese characters, rhymes in poetry and dictionaries, and the rhyme tables combine into one gigantic phonetic algebra problem whose solutions vary depending on one's use of clues scattered throughout East Asia: modern Chinese languages, loanwords in non-Chinese languages, and transcriptions of Chinese in non-Chinese scripts and vice versa. Simply applying the comparative method is not enough; it cannot take us as far back as Old Chinese, just as applying the comparative method to modern Romance languages cannot restore all the details of Latin. One must also master traditional Chinese phonology and try to make sense of it in modern terms while also constantly looking outside the Chinese box for guidance.

The usual starting point for reconstruction is Middle Chinese. Like Pulleyblank (1994: 164), I believe in

[g]etting Middle Chinese right. What I mean by this, of course, is not that we should wait until we are one hundred percent certain about every detail of the phonological systems underlying the Qièyùn and the rhyme tables before looking at Old Chinese, simply that we should get it as right as we possibly can. That means, in my view, eliminating basic errors in Karlgren's system that inevitably get projected back and distort our views on the earlier stages of the language.

Those basic errors may not only be projected backward into Old Chinese but also outward into reconstructions of other languages recorded in Chinese characters during the Middle Chinese period. An error in PIE reconstruction has no similar chain reaction effects; Nostratic and the like aside, PIE is as far as Indo-Europeanists go, and the reconstruction of neighboring proto-languages is not dependent on PIE.

We are far from a PST reconstruction that has wide acceptance. Neither Coblin's nor Gong's reconstructions are frequently cited - unlike PIE reconstructions that have even made it into the American Heritage Dictionary. There will be no "well-known" PST reconstruction until the controversies in Chinese reconstruction settle down or Chinese ceases to play a crucial role. The second scenario is particularly unlikely.

In short, ST/Chinese reconstruction methodology is different from the IE/Austronesian 'norm', and in a way the stakes are higher, as reconstructed Chinese is also the key to the reconstruction of other languages. None of that means that the basic principles of linguistics (e.g., the regularity of sound change) fail to apply in the East. The differences are necessitated by the nature of the extant written evidence, not some inherent exotic quality of Sino-Tibetan speech. If early Indo-European had its own tradition of phonological analysis and were in a complex script, and if early Uralic and Basque and extinct languages like Etruscan were transcribed in that script, then what I wrote about Chinese would also apply to Indo-European.

WHY DO I RECONSTRUCT THE TANGUT LIGHT LABIAL AS V-?

In my last entry, I wrote that the lip-rounding of Tangut shibilants

was sufficient for them to behave like Class II v- and Class IX l- which may have been partway between [ɫ] and [w]

but did not explain why I reconstructed the Class II initial* as labiodental v- instead of bilabial w- which would be the obvious choice given the lip-rounding of the shibilants. Below I list three arguments for v-. In isolation, they may not be convincing, but together ... I'll admit I'm not 100% convinced myself, so I've followed them with three counterarguments.

1. It's Class II, not Class VIII

I'll start with the most obvious argument.

The Tangut had adopted the traditional Chinese classification of initial consonants. In the Chinese phonological tradition, *w- was considered to be a glottal initial (喉音). In Tangut, glottal initials are Class VIII. Like Gong, I reconstruct ʔw- but not w- in Class VIII. This fanqie shows how ʔw- was analyzed as ʔ- + w-:


2094 1ʔwĩ = 3003 2ʔɨu + 0209 1lwĩ
That fanqie would not make sense if the initial of 2094 were w- without a glottal stop.
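The logic of the fanqie can be sketched in a few lines. This is a minimal illustration, assuming the usual fanqie principle (initial from the first speller, tone and final from the second) and the notation used here, with a tone digit in front of each syllable. The tiny INITIALS list and the function names are hypothetical, and treating -w- as part of the final is the analysis under discussion, not an established fact.

```python
# Assumed notation: a tone digit followed by the syllable, as in this
# entry. INITIALS is a hypothetical, deliberately tiny inventory.
INITIALS = ["ʔ", "l", "k", "v"]

def split(syllable: str) -> tuple[str, str, str]:
    """Split a form such as '1lwĩ' into (tone, initial, final)."""
    tone, body = syllable[0], syllable[1:]
    for ini in INITIALS:
        if body.startswith(ini):
            return tone, ini, body[len(ini):]
    return tone, "", body  # no listed initial: treat as zero initial

def fanqie(upper: str, lower: str) -> str:
    """Initial from the upper speller; tone and final from the lower."""
    _, initial, _ = split(upper)
    tone, _, final = split(lower)
    return tone + initial + final

reading = fanqie("2ʔɨu", "1lwĩ")  # spellers 3003 and 0209
print(reading)
```

The upper speller contributes only ʔ-, and the -w- arrives with the final of the lower speller, so the result has the shape tone 1 + ʔ + wĩ, matching 2094.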

Gong reconstructed w- in Class II. However, I prefer to reconstruct v- because that class corresponds to 'light labials' (labiodentals) in the Chinese phonological tradition.

There is, however, no guarantee that all 'light labials' were labiodental in the Chinese dialect known to the Tangut. In fact, I think that one 'light labial' in that dialect was *w-, as I'll explain below. So an argument based on labels is the weakest of all.

2. Tibetan transcription

Tibetan has a letter for w- but no letter for v-. If Tangut had w-, it should have been transcribed as w-. However, there are many different Tibetan transcriptions of the Class II initial (Tai 2008: 177-178):

Tibetan transcription Frequency
d-w- 19
w- 16
yw- 5
b-w- 4
ww- 2
b-, b-?-, d-?-, ny-, H-?, H-bh-, wy-, yww- 1 each
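To put the counts in proportion, here is a quick tally. It is only a sketch: the figures are copied from the table above, the variable names are my own, and Nishida's wh- is left out because its frequency is unknown.

```python
# Counts copied from the table above (Tai 2008: 177-178).
counts = {"d-w-": 19, "w-": 16, "yw-": 5, "b-w-": 4, "ww-": 2}
# Eight transcriptions attested once each.
for t in ["b-", "b-?-", "d-?-", "ny-", "H-?", "H-bh-", "wy-", "yww-"]:
    counts[t] = 1

total = sum(counts.values())       # 54 tokens in all
plain_w = counts["w-"] / total     # share of bare w-
# Share of transcriptions that contain the letter w anywhere.
with_w = sum(n for t, n in counts.items() if "w" in t) / total

print(total, round(plain_w, 2), round(with_w, 2))
```

Bare w- accounts for 16 of the 54 tokens, while 48 of the 54 contain w somewhere, so the question is less whether the transcribers heard something w-like than why they so often added to it.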

Nishida (1964: 82) also found a transcription wh-. Unfortunately its frequency is unknown.

d- may indicate a tone.

See section 5 below for yw- and wy-.

b-C-sequences transcribed Tangut Cw-sequences, so b-w- may have been equivalent to ww-. The doubling of w- and the transcriptions with stops (b-, H-bh-) may indicate that the Class II initial had more friction than w-: i.e., that it was v-. (b- and H-bh- normally transcribed Tangut b- which may have been prenasalized [mb]. H- before an obstruent indicated prenasalization in Tibetan.)

All of the above assumes that Tibetan w- was [w] in the dialect(s) of the transcribers. It is possible that w- was [v] in those dialects and that ww-, etc. were attempts to write [w]! But such a scenario cannot account for the data in the next section.

3. Chinese transcription

Both the Class II initial and Class VIII ʔ- (followed by rhyme 1 -u) were used to transcribe the Chinese 'light labial' traditionally known as 微 (Gong 2002: 436-437, 444). I interpret this to mean that Tangut v- and ʔu were the closest available approximations of Tangut period northwestern Chinese *w-. v- wasn't bilabial, but at least it lacked a glottal stop. Conversely, ʔu was close to w- apart from the glottal stop.

Moreover, the Class II syllable

2467 1vɨạ 'flower'

was transcribed as Chinese *fɨa in the Pearl (with a diacritic indicating the Tangut initial wasn't simply f-) and conversely,

1360 1va 'to hide'

transcribed Chinese 發 *fɨa in the Forest of Categories (Gong 2002: 437). These transcriptions make more sense if the Class II initial was v-. If. Here's why I'm not so sure.

4. A problematic fanqie

This Tangraphic Sea fanqie may imply that the Class II initial was w-:


0072 1võ (Class II initial in Homophones) = 1097 2ʔu (Class VIII initial in Homophones!) + 0365 1kwõ

(12.8.0:22: 0072 'to wish' is probably a borrowing from Chinese 望 *wɨõ which never had a glottal stop.)

But perhaps 0072 had different initials in the Tangraphic Sea and Homophones dialects. The Tangraphic Sea initial ʔw- could contain a prefix plus root initial v- (/w/ which became a fricative in initial position). If 0072 had a Class II initial in both dialects, maybe that initial was a labiodental glide ʋ- which is more u-like (i.e., has less turbulence) than v-.

5. Tibetan transcription revisited

The yw-, wy-, and yww-transcriptions before front vowels make me wonder if the Class II initial was palatalized vʲ- or even labiopalatal ɥ- in at least some environments. Could the 'vigilant' initials associated with Grade III be palatal?

ɥ- tɕ- tɕh- dʑ- ɕ- ʑ- λ- (instead of v/ʋ- tʂ- tʂh- dʐ- ʂ- ʐ- ɫ-)

The problem with the palatal interpretation of the 'vigilant' initials is that it clashes with the fact that Grade III is nonpalatal. Maybe nonpalatals became palatal in the dialect(s) transcribed in Tibetan.

6. Sanskrit transcription

Sanskrit v- is transcribed with Tangut ph- and b- as well as v-. If Tangut had w- instead of v-, b- would be an understandable approximation of v-. But what about ph-?

I suspect that the b- and ph-transcriptions reflect Sanskrit borrowed through Chinese. Late Old Chinese and Early Middle Chinese did not have *v-, so Sanskrit v- was transcribed as *b- which later became *ph- in the Chinese dialect known to the Tangut.

Part of the variation may be in the Sanskrit itself, as Sanskrit v "is often confounded and interchanged with the labial consonant b" (Monier-Williams 1899: 910).

In any case, Tangut-internal variation (v-ariation?) may cloud our picture of the Class II initial(s?).

*There is no consensus on the number of Class II initials. Sofronov (1968 II: 69-70) identified five fanqie chains for Class II initials on the basis of the surviving two-thirds of the Tangraphic Sea. It is possible that further chains may be in the lost 'rising' tone volume. In any case, it is possible to reconstruct up to five different Class II initials, assuming that each fanqie chain has a unique initial. Nishida (1964: 84) reconstructed four Class II initials (f- v- ɱv- w-) whereas I only reconstruct v-. The focus of this post is on the phonetic quality of v-, not on the number of Class II initials. I should argue against reconstructing multiple Class II initials elsewhere.

WERE TANGUT RETROFLEX INITIALS PRONOUNCED WITH LIP-ROUNDING?

Tangut Grade III rhymes are normally preceded by 'vigilant' initial consonants. Although those consonants may seem to have nothing in common at first glance, I presented a case for their similarity here. Tonight I will add to that case by drawing parallels between Tangut and modern European languages. Flemming (2013: 55) wrote,

Non-anterior coronal fricatives are often produced with lip-rounding, for example post-alveolar fricatives in English and French and retroflex fricatives in Polish are all produced in this way (Ladefoged and Maddieson 1996: 148, Puppel, Nawrocka-Fisiak, and Krassowska 1977: 157). Again, there is no articulatory basis for this relationship. From an auditory point of view, however, we can see that rounding serves to enhance the distinctiveness of these sounds.

I now suspect that Tangut retroflex shibilants (Class VII tʂ-, tʂh-, dʐ-, ʂ- and Class IX ʐ-) were pronounced with lip-rounding like Polish retroflex cz dż sz rz/ż [tʂ dʐ ʂ ʐ]. This lip-rounding was sufficient for them to behave like Class II v- and Class IX l- which may have been partway between [ɫ] and [w] (cf. Polish ł [w] < *ɫ). The -w- of the irregular Tibetan transcription zhwa (as well as zha) for

4184 1ʂɨa 'to appear; a transcription character for Sanskrit śa and ca'

may reflect that lip-rounding. A medial -w- was phonemic after Tangut labialized retroflex shibilants: e.g.,

0372 wɨa 'to hold'

was distinct from 4184 1ʂɨa above, just as English schwa is distinct from Shah.

Although English [ɹ] may be pronounced with lip-rounding, I cannot tell if Tangut Class IX r- had such rounding because it only appeared in rhymes in which the Grade III/IV distinction was neutralized. If Tangut r- was retroflex, I would expect it to behave like the other retroflexes and have lip-rounding. But if Tangut r- was an alveolar tap or flap [ɾ], then it wouldn't have lip-rounding.

EDWIN G. PULLEYBLANK (1922-2013)

Edwin G. Pulleyblank, perhaps my greatest influence, passed away last year. I found the task of writing about his impact on me to be so daunting that I kept putting off this entry. But I thought it would be appropriate to finally type something after my entry on Roy Andrew Miller, another man who changed my life. Short and late is bad, but nonexistent is worse.

My first exposure to Chinese historical phonology was in my second year as a Japanese major at Berkeley. My Classical Chinese class used George Kennedy's ZH Guide: An Introduction to Sinology as a textbook. Kennedy used simplified forms of Bernhard Karlgren's reconstructions of Middle Chinese with Gwoyeu Romatzyh-influenced tonal spelling to romanize Chinese characters in the book: e.g., ZI HOJ for 辭海 'word sea' (the ZH in the title). In a Japanese language history class, I used the Yunjing rhyme tables - presumably with help from Karlgren's reconstruction - to try to figure out the eight 'vowels' of Old Japanese. I can't remember what I concluded. It was almost certainly wrong: i.e., not much like what I wrote in my PhD dissertation eight years later.

Then I went to Japan in the summer of 1991 and picked up The Gakken Great Sino-Japanese Dictionary with Tōdō Akiyasu's Old and Middle Chinese reconstructions which were similar to Karlgren's.  Tōdō's Middle Chinese reconstruction was the first that I became familiar with. It was easy to transition from it to Karlgren's own reconstruction in his Analytic Dictionary of Chinese and Sino-Japanese which I bought in the fall of 1992.

Two characteristics of Karlgrenian* reconstructions are

- -y- or -y-like medial consonants** in over half the syllables: e.g., in syllables such as kya, chyi, pyu, etc.

- a large number of vowels

Karlgren's reconstruction had fifteen, including six kinds of a. Ramsey (1987: 131) described his vowel inventory as "exceedingly complex" and regarded "this overabundance of vowels [as] the least satisfying part of Karlgren's reconstructions."

Apparently no attested language has that much -y-. But I didn't know that at the time. How could the great Karlgren be wrong?

So when I first glimpsed Pulleyblank's Middle Chinese at a bookstore shortly before graduating from Berkeley, I was shocked by his reconstruction. It was so unlike Karlgren's. It couldn't be right ... could it? And I wasn't going to pay $40 to give it a chance.

After returning to Hawaii and enrolling as a graduate student at the University of Hawaii, I spent much of 1993 borrowing every book of Karlgren's I could find in the library. I internalized Karlgren's Middle Chinese reconstruction and his arguments for it. (It is to Karlgren's credit that he could explain himself in a way that a beginner could understand.) I was a true believer.

In the fall, Leon Serafim lent me his copy of Middle Chinese. Now that I could read it at my leisure for free, I began to see the error of my ways.

I was disturbed by how Karlgren's -y- often corresponded to nothing in Sino-Vietnamese, Sino-Korean, and Sino-Japanese, even though those languages could have consistently reflected it - if it had really existed.

Moreover, I had begun my study of Chinese-like Southeast Asian languages, and neither they nor living Chinese languages had vowel inventories like Karlgren's.

Pulleyblank's reconstruction had looked so odd to me at first sight because it dealt with both these issues. Pulleyblank had reconstructed far fewer -y- and a smaller, more realistic vowel system.

Daniel Bryant provided the background of Pulleyblank's revolutionary reconstruction:

As EGP [Edwin G. Pulleyblank] himself has explained, it became clear to him, both from problems encountered in the use of Chinese transcriptions of foreign words in the writing of Background [The Background of the Rebellion of An Lu-shan] and in his attempts to bring historical phonology to bear on grammatical questions in Classical Chinese, that the reconstructions by Bernhard Karlgren of the premodern phonology of Chinese simply could not be made to agree with a broader range of evidence than Karlgren had used. Confronted with the monolith of Karlgren’s system and his reputation as the virtual creator and certainly the greatest master of modern studies of the historical phonology of Chinese, a lesser scholar might well have either suppressed the difficult evidence or, even more likely, have looked for a less problematical field in which to make a career. Nothing is more characteristic of EGP than his resolve to get to the bottom of the problem, whatever difficulties might arise in the form either of enormous toil or of injured sensibilities - Karlgren, for all his greatness as a philologist, regarded the reconstruction of premodern Chinese pronunciation as an accomplished task and, not incidentally, as one of his own accomplishing. The full force of his derision awaited anyone so incautious as to challenge the validity of his system, and all the more so in the case of EGP, whom the evidence soon compelled to move beyond the stage of simply amending Karlgren’s system in a few details to that of setting out to rethink the entire structure.

Pulleyblank made me "rethink the entire structure", but Karlgrenian reconstructions continued to be popular for half a century. I wish Pulleyblank had seen his reconstruction eclipse its Karlgrenian competitors during his lifetime. Alas, that was not to be.

I was never Pulleyblank's student, but I did see how his work did not receive the embrace that it deserved, and I learned the value of standing up for oneself and not giving in. Pulleyblank did his own thing by gathering unexpected evidence and challenging old assumptions. If only more were so bold.

I tried to follow Pulleyblank's example when reconstructing Tangut. My Tangut sound system has been de-Karlgrenized compared to its competitors. I trimmed excess -y- (= -i- in my notation) and simplified the huge vowel system as much as I could. All of my Tangut vowels are variants on a mere six vowels: u, i, a, ə, e, and o. I've gone outside the Sino-Tibetan box, looking to Korean and Khmer for inspiration when positing sound changes between Tangut and pre-Tangut. I don't see Tangut as a sui generis language; I see it as just another human language that might have been less exotic than people think.

All scholars should emulate Pulleyblank and, in Bryant's words, "think clearly about the effects of ignoring relevant material or of imposing arbitrary limits on the scope of one’s sources." Evidence is everywhere, and we mustn't hesitate to leave our comfort zones to find it - even if it forces us to abandon our initial conclusions or even reject the consensus. Unlike Karlgren who barely changed his Chinese reconstruction over the years, Pulleyblank constantly refined his reconstruction, and I myself consider my Tangut reconstruction to be a work in progress, subject to revision upon the discovery of new data. Stand your ground even when your peers reject you, but don't stand still for long. As this song I heard today put it, keep on moving! Pulleyblank never stopped. Nor will I. May his restless, indomitable spirit live on in us.

*12.2.4:33: I use the term 'Karlgrenian' to refer to Karlgren-like reconstructions such as Tōdō's as well as Karlgren's own reconstructions.

**I use y as a lay-friendly substitute for Karlgren's -i̯- and others' -j-.

ROY ANDREW MILLER (1924-2014)

Today I learned that Roy Andrew Miller had passed away in Honolulu last August.

My understanding was that he had been living here since retiring from the University of Washington in 1989. I heard he was using the University of Hawaii library while I was a graduate student in the 90s. But I didn't recognize him because I didn't know what he looked like until tonight. I might have unknowingly walked past the man whose books changed my life.

About twenty-five years ago, I read Miller's The Japanese Language. At the time I was taking introductory linguistics. It was one thing to learn generic principles; it was another to see Miller apply those principles to a language that had always been part of my life. He made abstractions concrete and relevant for me.

I walked away from Chomskyan linguistics after a year, but I became a fan of Miller's, and I moved on to his other books.

I had learned of the hypothesis of a shared ancestor of Korean and Japanese back in high school when I did a term project on Korea. Miller's Japanese and the Other Altaic Languages went even further, introducing me to the idea of 'Altaic', a huge family encompassing Turkic, Mongolic, and Tungusic as well as Korean and Japanese.

I went to graduate school intent on proving the existence of Altaic. However, I actually ended up rejecting the Altaic hypothesis once I studied Old Turkic, Mongolian, and Manchu. To this day I don't think the five 'branches' of Altaic are related; they are certainly part of the same linguistic area, but that does not entail shared ancestry. Neighbors need not be blood relatives.

Although I now disagree with Miller about Altaic, I still owe him a great debt for expanding my horizons beyond Korea and Japan. I knew of the Mongols and Manchus from Chinese history in high school, yet I had never thought about their languages until I read Miller's book on Altaic. Studying Mongolian and Manchu led me in turn to the study of Khitan and Jurchen, two lesser-known languages that fascinate me even now.

Miller's books not only got me interested in the history of Japanese but also inoculated me against the vast mythology that grew around the language in the 20th century. Japan's Modern Myth and Nihongo: In Defence of Japanese appealed to me as a fan of Fight Back! with David Horowitz, James Randi, and John DeFrancis' The Chinese Language: Fact and Fantasy. How I love to see bogus claims debunked!

Linguists shouldn't just talk to themselves; they should reach out to the general public. Miller showed me how. I could understand his books as a teenager without extensive linguistic training. As an adult I struggle* to set the record straight about languages even in casual conversations.

Thank you for everything, RA Miller. Requiescat in pace.

*11.30.4:11: I wrote "struggle" because the burden is on me to make myself understood. I must get my point across without jargon or oversimplification. And it's hard to find a balance between clarity and accuracy.

WHAT THE *-HƐːK IS GOING ON?

In languages with Chinese-type phonologies, tonal categories 'split' following the loss of a voicing contrast in initial consonants: e.g.,

Before the split: three tones

following *voiced and *voiceless consonants

After the split: six tones

following consonant that was once *voiceless
following consonant that was once *voiced


All nonimplosive obstruents became voiceless:

*pa A > pa A1

*ba A > pa A2

All sonorants became voiced:

*hma B > ma B1

*ma B > ma B2

Original implosives may have conditioned series 1 or 2 tones depending on the language:

*ɓa C > ba C1 (as in standard Thai)

*ɓa C > ba C2 (as in the Thai of Chiang Mai)

Although consonants may have changed, the series of the tone (1 or 2) gives away the original quality of the consonant.
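The split described above can be sketched in a few lines of Python. This is only an illustration using the toy syllables from the examples; the table `INITIALS` and the function `tone_split` are hypothetical names, and the `implosive_series` parameter stands in for the language-specific treatment of implosives.

```python
# Proto-initial -> (modern initial, tone series).
# None marks initials whose series depends on the language (implosives).
INITIALS = {
    "p": ("p", 1),    # *voiceless obstruent stays voiceless, series 1
    "b": ("p", 2),    # *voiced obstruent devoices, series 2
    "hm": ("m", 1),   # *voiceless sonorant becomes voiced, series 1
    "m": ("m", 2),    # *voiced sonorant stays voiced, series 2
    "ɓ": ("b", None), # implosive: series 1 or 2 depending on the language
}


def tone_split(initial, rime, tone, implosive_series=1):
    """Return the post-split form of a proto-syllable, e.g. 'pa A2'."""
    modern, series = INITIALS[initial]
    if series is None:
        series = implosive_series
    return f"{modern}{rime} {tone}{series}"


print(tone_split("b", "a", "A"))                      # pa A2
print(tone_split("ɓ", "a", "C", implosive_series=2))  # ba C2
```

Reading the output backwards is the point of the exercise: the series number after the tone letter is what betrays the lost voicing contrast.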

The six tones may later partly merge: e.g., Saigon Vietnamese has a five-tone system because tones B1 and B2 merged.

The six tones of Vietnamese

following consonant that was once *voiceless
A1: ngang
B1: hỏi
C1: sắc
following consonant that was once *voiced
A2: huyền
B2: ngã
C2: nặng

Certain consonant-tone combinations may be aberrant. Suppose there is a language which never had *voiceless sonorants. So all its sonorants should be followed by series 2 tones. Yet the language occasionally has series 1 tones after sonorants. How is that possible? It turns out the odd sonorant-1 sequences are only in recent loanwords and onomatopoeia that were created after the tonal split. All words, native or otherwise, predating the split follow the regular pattern.

Last week, I wrote about Tai Viet letters which I think were created to handle such anomalous consonant-tone combinations. I thought those combinations might have been in Vietnamese loanwords.

It turns out that Vietnamese itself also has such odd combinations. In Vietnamese, voiceless aspirates and fricatives normally precede series 1 tones: e.g.,

*kʰih > khỉ B1 'monkey' (*-h conditioned B tones)

*haːl > hai A1 'two'

(I will redundantly write the letter-number code for each tone even though the tone is indicated in Vietnamese orthography.)

Exceptions tend to be Chinese loanwords: e.g.,

*but > *fət > Phật C2 'Buddha'

*ɣɤek > *ɣɤac > hạch C2 'nucleus'
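Since the tone is recoverable from Vietnamese orthography, the letter-number codes and the exceptions can be found mechanically. The following Python sketch assumes the tone labels used in this post; the list of initials treated as voiceless aspirates and fricatives (ph, th, kh, h, x, s) is an assumption for illustration, and the function names are hypothetical.

```python
import unicodedata

# Combining tone marks -> the letter-number codes used in this post.
TONE_MARKS = {
    "\u0300": "A2",  # grave: huyền
    "\u0309": "B1",  # hook above: hỏi
    "\u0303": "B2",  # tilde: ngã
    "\u0301": "C1",  # acute: sắc
    "\u0323": "C2",  # dot below: nặng
}

# Assumed spellings of initials that normally precede series 1 tones.
SERIES_1_INITIALS = ("ph", "th", "kh", "h", "x", "s")


def tone_code(word):
    """Return the tone code of a Vietnamese syllable from its diacritics."""
    for ch in unicodedata.normalize("NFD", word):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return "A1"  # no tone mark: ngang


def is_anomalous(word):
    """True if an aspirate/fricative initial is followed by a series 2 tone."""
    w = word.lower()
    return w.startswith(SERIES_1_INITIALS) and tone_code(w).endswith("2")


for w in ["khỉ", "hai", "Phật", "hạch"]:
    print(w, tone_code(w), is_anomalous(w))
```

Only the last two words are flagged, matching the exceptions listed above.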

But what about exceptions that are not obviously Chinese: e.g., hạt C2 'seed'? One could mechanically reconstruct voiced aspirates and fricatives to account for them: e.g., *ɦat. In fact, that is what Thompson (1976: 1185) did; he reconstructed Proto-Viet-Muong *ɦot*. However, my impression is that other Mon-Khmer languages which did not lose a lot of initial consonants do not have voiced aspirates and fricatives. (I can't claim to have seen the phonemic inventory of every Austroasiatic language.) In fact the SEAlang Mon-Khmer Comparative Dictionary (MKCD)'s IPA input method doesn't even offer the option of typing voiced ɦ, implying that the consonant may not occur in the data. It's possible that Vietnamese preserved an *ɦ lost elsewhere, but I'd rather not go that far.

The MKCD doesn't list hạt C2 'seed', but it does list hạch C2 'seeds' which I can't find anywhere else. I thought hạch C2 meant 'nucleus' (as I glossed it above). That Chinese loanword must have entered Proto-Vietic, as the MKCD has Proto-Vietic *-hɛːk 'seed' which is close to Early Middle Chinese *ɣɤek (phonetically *[ɣɤɛk] with vowel height dissimilation that later became even more pronounced after the palatality moved to the coda in *ɣɤac). I suppose the hyphen indicates the presence of a lost prefix. Perhaps that prefix had a voiced initial and was present at the time of the tone split, conditioning a series 2 tone that remained after the prefix was lost.

Conversely, maybe words like

*CV-taːw > *CV-ðaːw > dao A1 'knife'

had prefixes with voiceless initials at the time of the tone split, conditioning a series 1 tone that remained after the prefix was lost.

Returning to hạt C2 'seed', it has no lookalikes outside Vietnamese in MKCD other than Tho hɒːt⁸ with rounding absent in Vietnamese. I assume the even number indicates a series 2 tone. (If I am correct about Thompson's o being a typo, perhaps his Muong Khen hot should be hat.) Hạt resembles hạch, but not as much as the spelling might imply, as hạt has a long vowel and hạch a short vowel. In southern Vietnamese, hạch is pronounced [hat]. Could hạt [hat] be a phonetic respelling of hạch [hat]? That would not account for the difference in vowel length. Are the two words unrelated lookalikes? Is Tho hɒːt⁸ a borrowing from Vietnamese?

*I don't know why he reconstructed *o. I think the o in his data and reconstruction is a typo for a.

CAN RAM'S HORNS ACCOUNT FOR SINO-KOREAN GRADE II?

Middle Chinese (MC) Grade II rhymes in premodern Sino-Korean usually have the vowel a: e.g.,

-ang in 江 kaŋ 'river'.

-aj in 解 haj 'untie, understand' (from "Stumped by the Sea Camel")

Most of the exceptions* have long puzzled me. They are unlike the corresponding rhymes in Sino-Japanese and Sino-Vietnamese and modern Chinese languages: e.g.,

Sinograph | Premodern Sino-Korean | Sino-Japanese (Kan-on) | Sino-Vietnamese | Cantonese | Mandarin
隔 | kjək | kaku | cách [kac] | gaak | ge [kɤ]
庚 | kjəŋ | kau | canh [kaɲ] | gang | geng [kɤŋ]
界 | kjəj | kai | giái [zaːj] | gaai | jie [tɕjɛ] < *kjaj
皆 | kʌj | kai | giai [zaːj] | gaai | jie [tɕjɛ] < *kjaj
更 | kʌjŋ | kau | canh, cánh [kaɲ] | gang | geng [kɤŋ]
客 | kʌjk | kaku | khách [xac] | haak | ke [kʰɤ]

Kan-on has a in all six readings.

Sino-Vietnamese has short a before palatals and long a elsewhere.

Cantonese has short a before -ng and long a elsewhere.

Mandarin has either [ɤ] or [ɛ] (both romanized as e).

MC reconstructions** (disregarding tones) don't match those Sino-Korean forms either:

Sinograph | Karlgren | Wang Li | Li Rong | Shao Rongfen | Zhengzhang Shangfang | Pan Wuyun | Pulleyblank | Baxter | This site until recently
隔 | *kɛk | *kæk | *kɛk | *kɐk | *kɣɛk | *kɯæk | *kjaːk | keak | *kɛk
庚 | *kɐŋ | *kɐŋ | *kɐŋ | *kaŋ | *kɣæŋ | *kɯaŋ | *kjaːjŋ | kaeng | *kæŋ
界, 皆 | *kăi | *kɐi | *kɛi | *kɐi | *kɣɛi | *kɯæi | *kjaːj | keaj | *kɛj
更 | *kɐŋ | *kɐŋ | *kɐŋ | *kaŋ | *kɣæŋ | *kɯaŋ | *kjaːjŋ | kaeng | *kæŋ
客 | *kʰɐk | *kʰɐk | *kʰɐk | *kʰak | *kʰɣæk | *kʰɯak | *kʰjaːjk | khaek | *kʰæk

In June, I proposed that MC Grade II was characterized by a medial -ɤ- ('ram's horns') from an earlier emphatic *-ʀˁ-. Ram's horns is the vocalic counterpart of Zhengzhang's medial consonant *-ɣ-.

I think premodern Sino-Korean had three different simplifications of MC *ɤV-clusters:

1. *ɤa > a (ignoring the first vowel; e.g., in 江 *kɤaŋ > SK kaŋ)

2. *ɤe > *e > jə (ignoring the first vowel; e.g., in 界 *kɤej > SK *kej > kjəj)

3. *ɤe > ʌ (ignoring the second vowel; e.g., in 皆 *kɤej > SK kʌj)

Either of the last two could apply to *ɤe.
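The three simplifications can be written as string rewrites. This Python sketch is only a toy model of the rules as stated above: the function name `simplify` is hypothetical, the outcome jə for rule 2 is taken from the 界 example, and nothing here predicts which of rules 2 and 3 a given syllable actually followed.

```python
def simplify(mc):
    """Return the possible premodern Sino-Korean outcomes of an MC syllable.

    mc is a reconstructed Middle Chinese syllable written without the
    asterisk, e.g. 'kɤaŋ'.
    """
    if "ɤa" in mc:
        # Rule 1: *ɤa > a (first vowel ignored).
        return [mc.replace("ɤa", "a")]
    if "ɤe" in mc:
        return [
            mc.replace("ɤe", "jə"),  # Rule 2: *ɤe > *e > jə (first vowel ignored)
            mc.replace("ɤe", "ʌ"),   # Rule 3: *ɤe > ʌ (second vowel ignored)
        ]
    return [mc]  # no *ɤV cluster: nothing to simplify


print(simplify("kɤaŋ"))  # 江
print(simplify("kɤej"))  # 界 and 皆
```

The second call returns both kjəj and kʌj, which is the sense in which either of the last two rules could apply to *ɤe.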

Two shifts occurred prior to those simplifications in the Chinese source dialect:

The second vowel of *ɤa sometimes assimilated to a following palatal:

aJ > eJ (e.g., in 庚 *kɤa > *kɤe > SK *keŋ > kjəŋ and 更 *kɤa > *kɤe > SK kʌjŋ; Tongguk chŏng'un has SK kʌjŋ for both)

Conversely, the second vowel of *ɤe sometimes dissimilated from a following palatal:

eJ > aJ (e.g., in 介 *kɤej > *kɤaj > SK kaj)

Notice how two originally homophonous Middle Chinese syllables (界 and 介) became distinct in the source dialect and remained distinct in Korean.

11.28.1:04: Standard Mandarin may preserve *ɤ in 隔, 庚, 更, and 客. The [jɛ] of 界 and 皆 is from *ɤej > *ɤaj > *eaj > *jaj.

*11.28.1:20: The one exception that isn't surprising is SK -jo from MC *-jaw < *-eaw < *-ɤaw. This is parallel to SK -o from MC *-ɑw. Korean does not have syllables ending in -Vw.

**11.28.1:08: Forms are from except for Baxter's, mine, and Karlgren's *kɛk (a correction from GSR).

Strictly speaking, Baxter's MC forms are transcriptions, not reconstructions. They are not meant to be taken literally: e.g., keaj is not [keaj].

A HEART THAT REPORTED ALL TO MY LORD

Having mentioned the Early Middle Korean poem 悼二將歌 To i chang ka 'A Song Mourning Two Generals' (1120 AD) last night, I thought it might be interesting to examine its first line and demonstrate the problems involved in trying to decipher hyangga (early Korean poetry). The poem is in the hyangchhal script mixing semantograms (red) and phonograms (blue). Purple indicates words written as combinations of semantograms and phonograms. For simplicity I have transliterated all scholars' reconstructions in the same way except for Kim Wan-jin's which reflects the vowel shift hypothesis that I and others have rejected (e.g., Oh Sang-suk 1998 and Ko Seongyeon 2013).

Premodern Sino-Korean reading tsyu ɯr wan ho pʌyk ho sim mun
Chinese gloss lord second Heavenly Stem complete question particle; preposition white, to report (< make clear [i.e., white]) question particle; preposition heart hear
Late Middle Korean translation equivalent nim - o(ɣ)ʌro hɯy-  'white', sʌrp- 'report' - mʌzʌm tɯt-
Yang Chu-dong (1942) *nim *ɯr *oʌrɣo *sʌrβ-ɯn *mʌzʌm-ʌn
Chi Hyŏn-yŏng (1948) *orɣo *sʌrp-on *mʌzʌm-ɯn
Kim Wan-jin (1980) *ni(li)m *ər *uɔrɣu *sɔrβ-ən *mɔzɔm-ɔn
Yu Chhang-gyun (1994) *nim *ɯr *oʌrɣo *sʌrβ-ɯn *mʌzʌm-ɯn
This site (*nilim?) *(oɣʌr?-ɣ/h)o *(sʌr?)ɣo(n?) *(mʌzʌ?)m-Vn
Gloss lord ACC wholly report-? heart-TOP
As for the heart that reported all to my lord ...

1. 主

There is no way to be sure how this word was read. We know that in the Koreanic Paekche language, 'lord' was transcribed in sinographs as 爾林 *ɲi(e) lim which represented something like *n(y)elim or *nilim, cognate to Late Middle Korean nim. Medial -l- survived in nali 'stream' in the Koryŏ kayo 'songs of Koryŏ' (Kim Wan-jin 1980: 211). So it is likely - though not certain - that 主 was *nilim. (As a convention I write the lost liquid of Korean as l to differentiate it from r which was retained.) The Chinese-Korean glossary Jilin leishi only tells us that 主 'lord' was 主 'lord' in Korean, which doesn't help; the informant may have used the Sino-Korean word to try to impress his Chinese interlocutor.

2. 乙

This cannot be a semantogram because the calendrical term 'second Heavenly Stem' makes no sense here. It must be a phonogram for the accusative ending *-ɯr, which surprises me since I would expect 'lord' to be the indirect object.

3. 完乎

The first sinograph is a semantogram whose reading is uncertain.

I don't know why Chi did not reconstruct *ʌ; Late Middle Korean o(ɣ)ʌ did not arise from the breaking of o.

乎 could represent *o (cf. early Sino-Japanese wo from Sino-Paekche), *ɣo, or *ho. Could Late Middle Korean o(ɣ)ʌro be a contraction of Early Middle Korean *o(ɣ)ʌrɣo? Or is 完乎 some unrelated synonymous adverb ending in *-(ɣ/h)o?

4. 白乎

If 白 is a semantogram for *sʌrp- and if lenition took place (if *ɣ was in the previous word - there was no *ɣ before lenition), then perhaps *p lenited to *ɣ before *-o in this dialect. (In standard Late Middle Korean, *p lenited to the β that Yang, Kim, and Yu projected back into Early Middle Korean.)

*o/ɣo/ho is a poor phonetic match for Yang and Yu's *-ɯn and Kim's *-ən. Those reconstructions were influenced by the Late Middle Korean adnominal suffix -ʌn which would be expected in this context. Maybe *-o is an Early Middle Korean ending without a Late Middle Korean descendant. Or Chi is right and the ending was *-on. Its *o could have been reduced to ʌ in Late Middle Korean, and the final *-n might not have been written because it assimilated with the initial m- of the next word: *-on m- > *-om m- (analyzed in writing as <-o.m->). Further examples of <-V.N-> for expected *-VN N-sequences could verify this hypothesis.

5. 心聞

心 is a semantogram for an *-m-final word for 'heart'. If the ɣ-s above are correct - i.e., if lenition took place - then the medial consonant of 'heart' probably already lenited to *-z- (unless this poem was composed after *ɣ-lenition but before *z-lenition).

One might be tempted to regard 聞 as a verb 'hear', but that's not possible since Korean sentences do not end in bare verb stems. I think it represents the final *-m of 'heart' followed by a topic suffix *-ɯn that split into Late Middle Korean -ʌn and -ɯn depending on the height of the vowels of the preceding noun. There was no character with a Sino-Korean reading *mɯn, so 聞 mun was the best available match.

Yang projected Late Middle Korean -ʌn back into Early Middle Korean even though lower mid unrounded ʌ is not a good match for the high rounded u of Sino-Korean mun. Kim's *-ən has the same problem.

The only way to make Yang and Kim's readings work is to suppose that the scribe had the early Sino-Korean reading *mən for 聞 (cf. Sino-Japanese mon < Sino-Paekche) in mind.

If Chi's -on became Late Middle Korean -ʌn, then perhaps there was a rounded allomorph *-un of the topic particle due to labial harmony after 'heart' (whose Late Middle Korean ʌ might be a reduction of an earlier *a or *o) that was reduced to -ʌn and -ɯn in Late Middle Korean:

*ma/ozom-un > *-ɯn > mʌzʌm-ʌn

(11.27.1:36: *mazam and *mozam would not trigger labial harmony since the vowel closest to the suffix would not be labial. There is no way to be certain about the vowels since Jilin leishi only tells us that 心 'heart' in Korean sounded like Chinese 心 'heart' pronounced like 尋 which sounded like 心 with a different tone.)

I am skeptical about vowel harmony in Old and Early Middle Korean. If it existed, it might have worked differently from that of Late Middle Korean: e.g., the potential labial harmony after 'heart'.

Oddly, Yang and Kim respectively reconstructed *-ʌn and *-ɔn in accordance with vowel harmony after 'heart' but violated vowel harmony by reconstructing *-ɯn and *-ən after 'report' which belonged to the same lower vowel class as 'heart'. On the other hand, Chi and Yu disregarded vowel harmony after 'heart'.

A NEW BOOK: KIAER'S THE OLD KOREAN POETRY

I didn't know about The Old Korean Poetry: Grammatical Analysis and Translation or its author Jieun Kiaer until today. I would like to see it.

The The in its title is unusual; I'm surprised it wasn't removed in the editing process.

I am also surprised that a syntactician wrote that book. I initially assumed she was a historian whom I had not heard of.

I wonder how she dealt with Old Korean whose hyangchhal script is highly problematic. I have seen seven different complete decipherments, and I would like to see the decipherment in Ryu and Pak (2003). As Lee and Ramsey (2011: 57) wrote,

[... I]nterpretation of the hyangga remains a monumental task. We quite honestly do not know what some hyangga mean, much less what they sounded like.

I am curious to see if she has her own decipherment.

Moreover, since the description only mentions fourteen hyangga (Old Korean poems)*, I assume the other twenty poems covered in the book are in Middle Korean, as only twenty-six** hyangga have survived.

11.26.1:39: I am not sure what this line means:

This book provides linguistic explanations for each poem and essential vocabulary – both in Middle and Contemporary Korean.

What are the criteria for "essential" status? I assume all grammatical morphemes are included.

Does this vocabulary accompany each poem, or is it in an appendix?

There is no mention of Old Korean vocabulary, even though two-fifths of the book is about Old Korean poetry. Is that because there is no universally accepted decipherment of Old Korean?

Are the Contemporary Korean forms translations of the Middle Korean forms?

*11.26.2:24: I am guessing that these are the fourteen Shilla poems preserved in Samguk yusa (late 13th c. AD).

**11.26.2:46: I am counting the Koryŏ poem 悼二將歌 To i chang ka 'A Song Mourning Two Generals' (1120 AD) in the total. Lee and Ramsey (2011: 57) exclude it.

STUMPED BY THE SEA CAMEL

Last night, I rediscovered the Haitai (해태 Haethae) brand and noticed that the English Wikipedia lists the Chinese characters for it as 海陀 'sea hill', one of the many variants of the name of the xiezhi:

Late Middle Chinese
Late Old Chinese
Character 1 gloss
Character 2 gloss
獬豸 해태 haethae, 해치 haechi
*xɤaj ʈhɤaj ~ ʈhi

*ɣɤeʔ ɖɤɑjʔ ~ ɖɨɑjʔ first syllable of 'xiezhi'
worm; crawl like a feline beast or reptile; disperse
no other uses
獬廌 'xiezhi'
understand (< 'untie')
해태 haethae Daum
*xɤaj thəj *ɣɤeh dəɰʔ slack (cognate to 'untie')
normally 'laziness'
해타 haetha *ɣɤaj thwɑ
*ɣɤeh dwɑjʔ/h lazy
咳唾 *xəj thwɑ *ɣəɰ thwɑjh cough
normally 'spittle'
해태 haethae *xəj thəj

*xəɰʔ dəɰ sea

normally 'seaweed'
海陀 해타 haetha, 해태 haethae
*xəj thɑ *xəɰʔ dɑj
hill; usually a phonetic symbol
no other uses?

海駝 Daum

The inclusion of Middle and Late Old Chinese readings does not mean all these terms existed in Middle and/or Late Old Chinese. See below.

陀 is normally read 타 tha, not 태 thae, a reading which seems to simultaneously reflect Late Middle Chinese *thɑ (< Early Middle Chinese *dɑ) and Late Old Chinese *dɑj. How is such a mixture of new and old possible? Although *th- ~ *d-variation is possible**, I doubt that Sino-Korean thae reflects an Old Chinese variant reading *thɑj for 陀 in 海陀, as I cannot find any attestation of that word before the History of Liao (1344), centuries after the Middle Chinese period. (How old is the Chinese place name 海陀山?)

Here's what I think happened (revised and expanded 11.25.23:23):

- The word may be a late 1st millennium BC loan from some non-Chinese language, as it has no Chinese etymology and I cannot find any attestations prior to 獬廌 in the Records of the Grand Historian (c. 109 BC). Moreover, all spellings are either partly or completely phonetic.

- 'Xiezhi' developed an abbreviated monosyllabic form 廌 (unless 廌 is a loanword and 獬廌 is a Chinese-foreign hybrid compound 'understanding 廌').

- Some of the phonetic variation may indicate multiple borrowings of the same word from different dialects of the same language or a set of related languages.

- Some of the spellings appear to be puns, and some may be of Korean origin, as I have not seen the last six for 'xiezhi' in a Chinese text.

- The earliest forms had two syllables with voiced initial consonants. Forms that would have been read with one or two voiceless initial consonants may be later spellings coined after voiced initials had devoiced (and often aspirated) in Late Middle Chinese: e.g., it is unlikely that 咳唾 is an old spelling because it would have been pronounced *ɣəɰ thwɑjh with a voiceless aspirate in Late Old Chinese absent from the earliest attested forms.

- The 海駝 'sea camel' spelling could reflect a folk etymology.

- The pronunciation haethae spread to the spellings 海陀 and 海駝 in Korean.

11.25.23:30: The mismatch between the characters 海陀/海駝 and the pronunciation haethae in Korean is reminiscent of the mismatch between the spelling colonel and its pronunciation. Cummings (1988: 449) wrote,

Etymologists do not agree completely on colonel, but whatever the historical dynamics of the word, it is a clear case of mixed convergence, the pronunciation of one, apparently earlier form, coronel, having become attached to the spelling of another.

*11.25.1:51: Obviously Shuowen is not a Korean reference source, but any word in a Classical Chinese text has a Sino-Korean reading.

**11.25.23:33: Late Old Chinese 太 *thɑs 'greatest' and 大 *dɑs 'great' go back to Early Old Chinese *hlats and *lats and share a root *lats. The *hl- of 太 must be from a voiceless prefix plus root-initial *l-.

AVERAGING THAI SONG TONES

Two nights ago, I wrote,

I have no data on Thai Song, the third language written with Tai Viet, but I expect its *implosives to follow the same [tonal] pattern as Black Tai and White Tai.

In other words, I expected Thai Song tones to tend to be higher after reflexes of *voiced initials (other than *glottals including *implosives) and lower after reflexes of *voiceless and *glottal initials. Hence the heights of the tones would roughly match the names of the consonant letters they were associated with: i.e., HIGH and LOW.

Last night I found Somsonge Burusphat's 2012 compilation of Thai Song tones at twelve locations*. Her paper even includes tonal contours of individual speakers. Here are the average heights of each tone on a five-point scale (1 = lowest, 5 = highest) as described on page 37. I did not include varieties whose tones were only described in words in my calculations. As an example of how I calculated the averages, the Loei A tone is 241, 2 + 4 + 1 = 7, and 7 divided by 3 is 2.33. Then I added 2.33 to 3 (< 24, the average of Donyaihom), 3 (< 24, the average of Dontoom), 3.5 (< 34, the average of Suantaeng), etc. and divided that total by 11 (the number of languages with numerical tone descriptions), resulting in 2.9.
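The averaging procedure is simple enough to state as code. This Python sketch uses only the four contours quoted above, not the full set of eleven, so it will not reproduce the 2.9 in the table; the function names are hypothetical. It also keeps full precision, whereas the worked example rounds 7/3 to 2.33 before summing.

```python
def tone_height(contour):
    """Average the digits of a tone contour on the five-point scale.

    '241' -> (2 + 4 + 1) / 3
    """
    digits = [int(d) for d in contour]
    return sum(digits) / len(digits)


def average_height(contours):
    """Average the per-variety heights of one tone category."""
    heights = [tone_height(c) for c in contours]
    return sum(heights) / len(heights)


# The A tone at Loei, Donyaihom, Dontoom, and Suantaeng, as quoted above.
quoted = ["241", "24", "24", "34"]
print([round(tone_height(c), 2) for c in quoted])
print(round(average_height(quoted), 2))
```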

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 2.9 3.6 2.3 3.6
'high' tone class: *voiced initial 3.8 3.2 3.2 3.6

As in Black Tai and White Tai, the *voiced A and C tones are higher than their *voiceless/glottal counterparts, but there is little or no height difference between the *voiced and *voiceless/glottal B and D tones. The *voiced vs. voiceless/glottal distinction correlates with contours that are masked by single-number averages:

Thai Song: usually level (sometimes falling) vs. rising

Black Tai: level vs. rising

White Tai: falling vs. rising

11.24.23:16: Here are the average heights of Thai Song tones at three points followed by their average contours:

Average starting points

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 2.3 2.5 2.5 2.7
'high' tone class: *voiced initial 3.3 3.3 3.7 3.6

Average mid points

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 3 3.6 2.3 3.7
'high' tone class: *voiced initial 4.5 3.3 3.6 3.6

Average ending points

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 3.5 4.6 2.1 4.6
'high' tone class: *voiced initial 3.5 3.1 2.1 3.5

Average contour

Proto-Tai tone A B C D
'low' tone class: *voiceless and *glottal initial 24 35 32 35
'high' tone class: *voiced initial 354 33 42 44

The composite *voiced tones always start higher, though this is obscured in the average contour table since 2.5 is rounded up to 3 and 3.3 is rounded down to 3.
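The average contour table can be regenerated from the three tables of averages before it. The Python sketch below is a guess at the procedure, inferred from the figures rather than stated anywhere: round each point half up, then drop the midpoint whenever the contour moves in one direction only (so 2-3-4 is written 24). The function names are hypothetical. Note that Python's built-in `round` rounds halves to the nearest even number, so it would turn 2.5 into 2 rather than 3.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, with .5 always going up."""
    return math.floor(x + 0.5)


def contour(start, mid, end):
    """Compose a contour string from average start, mid, and end heights."""
    s, m, e = (round_half_up(v) for v in (start, mid, end))
    if s <= m <= e or s >= m >= e:
        # Level, rising, or falling: the midpoint adds nothing.
        return f"{s}{e}"
    return f"{s}{m}{e}"


print(contour(2.3, 3, 3.5))    # 'low' A
print(contour(3.3, 4.5, 3.5))  # 'high' A
```

Under these assumptions all eight cells of the average contour table come out as given.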

I have not tried to average the presence or absence of glottalization in the C tone.

*11.24.23:24: I have excluded the Black Tai data from Vietnam, so all figures here are from eleven locations in Thailand. Although Black Tai and Thai Song lie on eastern and western ends of a spectrum, I was only interested in the tones of Thai Song (i.e., the varieties spoken in Thailand). See this table for Black Tai tones.

BLACK AND WHITE EVIDENCE FOR VIETNAMESE PHONOLOGICAL HISTORY

Last night, I hypothesized that several unexpected letters in the Tai Viet script for Black Tai, White Tai, and Thai Song were devised to write consonants in loanwords with anomalous 'high' tones from Vietnamese:


(ꪒ U+AA92 TAI VIET LETTER LOW DO is for [d] + 'low' tones < implosive *ɗ)


(ꪖ U+AA96 TAI VIET LETTER LOW THO is for [tʰ] + 'low' tones < voiceless aspirated *th)


(ꪚ U+AA9A TAI VIET LETTER LOW BO is for [b] + 'low' tones < implosive *ɓ)


(ꪞ U+AA9E TAI VIET LETTER LOW PHO is for [pʰ] + 'low' tones < voiceless aspirated *ph)

Tonight I found even more letters of that type:


(does not resemble ꪂ U+AA82 TAI VIET LETTER LOW KHO for [kʰ] + 'low' tones < voiceless aspirated *kh; looks like a derivative of ꪅ U+AA85 TAI VIET LETTER HIGH KHHO presumably for White Tai* [x] < voiced *ɣ; the left side resembles HIGH PO which has no phonetic resemblance - could it be a graphic cognate of Khmer ឃ <gh>?)


(derived from ꪌ U+AA8C TAI VIET LETTER LOW CHO for [cʰ] + 'low' tones < voiceless aspirated *ch)


(derived from ꪬ U+AAAC TAI VIET LETTER LOW HO for [h] + 'low' tones < voiceless *h)


(derived from ꪮ U+AAAE TAI VIET LETTER LOW O for [ʔ] + 'low' tones < voiceless *ʔ)

I assume these letters could also be used to write native onomatopoeia and non-Vietnamese loanwords with anomalous 'high' tones.
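The high/low pairing of these letters can be read straight off their Unicode character names. This Python sketch relies on the standard library's `unicodedata.name` and `unicodedata.lookup`; the helper names `tai_viet_class` and `counterpart` are hypothetical, and the sketch says nothing about how the letters are used in actual texts.

```python
import unicodedata

PREFIX = "TAI VIET LETTER "


def tai_viet_class(ch):
    """Return (tone class, consonant name), e.g. ('LOW', 'DO')."""
    name = unicodedata.name(ch)
    if not name.startswith(PREFIX):
        raise ValueError(f"not a Tai Viet consonant letter: {name}")
    tone_class, consonant = name[len(PREFIX):].split(" ", 1)
    return tone_class, consonant


def counterpart(ch):
    """Return the letter for the same consonant in the other tone class."""
    tone_class, consonant = tai_viet_class(ch)
    other = "HIGH" if tone_class == "LOW" else "LOW"
    return unicodedata.lookup(f"{PREFIX}{other} {consonant}")


for ch in "\uaa92\uaa96\uaa9a\uaa9e":
    print(f"U+{ord(ch):04X}", tai_viet_class(ch),
          "-> counterpart", f"U+{ord(counterpart(ch)):04X}")
```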

By 'anomalous' I mean that a word has a tone not conditioned by the usual historical source(s) of its initial consonant: e.g., Black or White Tai could borrow Vietnamese thành 'to become' as, say, than 31, with a mid-low falling tone that normally developed after *voiced initials, not voiceless *th- which is the usual source of Black and White Tai th-. See these tables.

All of the above letters are for stops with the exception of HIGH HO for the fricative h followed by normally *voiced tones. It occurred to me that any Vietnamese loans with HIGH KHO, CHO, and PHO must have been borrowed before *kh *ch *ph became fricatives [x s f] in Vietnamese. In short, their stop quality dates them. (I am assuming that the Vietnamese dialects known to Black and White Tai speakers lost most aspirates like the major dialects did.) Such old loans can tell us that their tones in Vietnamese may have sounded like *voiced tones to Black and White Tai speakers at the time of borrowing. That resemblance may not have survived to the present day; Tai and/or Vietnamese might have changed their tones. Hence Vietnamese borrowings may be clues to tonal change (or its absence) in Black and White Tai and Vietnamese. There could be multiple strata of Tai borrowings from Vietnamese with different patterns of tonal correspondences: e.g.,

- suppose the Vietnamese ngang tone was once 44 (mid-high level)

- Black and White Tai speakers could borrow ngang as their tone 44

- then ngang lowered to 33 (mid level)

- Black and White Tai speakers borrowed ngang as their tone 22 (neither language has 33, so 22 is the closest match)

- so the same Vietnamese tone category (ngang) corresponds to two different tones in Black and White Tai: 44 in older loans and 22 in newer loans

Again, without any Vietnamese loan data on hand, I can't explore this idea any further.

*11.23.2:46: Is Black Tai [k] < *ɣ written etymologically with HIGH KHHO or with LOW KO as if it were from *g? I don't know what the Thai Song reflex of *ɣ is or how it is written.

D-OU-B-LED LETTERS IN TAI VIET

After asking about the Lao script last night, I thought it might be a good time to ask a question about the Tai Viet block of Unicode:

Are the 'extra' Tai Viet d and b letters for Vietnamese loanwords?

I downloaded an SIL Tai Viet font last December, but forgot about it until Wednesday when I needed to install a pre-Unicode SIL IPA93 Sophia font to view John Coleman's page on Shilha. I looked in my folder of SIL fonts and rediscovered their Tai Heritage font. Last night while looking through its character inventory in Andrew West's BabelMap, I was surprised to see two letters for d and b:





Although Thai has six letters for th and three letters for ph, it has only one letter each for d and b (< Proto-Tai *ɗ and *ɓ) in native words:

Initial type | *implosive | *voiceless aspirated | *voiced | *voiced aspirated in Indic loans (pronounced as *voiced)
retroflexes in Indic loans (pronounced as dentals in Thai) | ฎ <ɗ̣> *ɗ > d | ฐ <ṭh> *th > th | ฑ <ḍ> *d > th | ฒ <ḍh> *d > th
dental | ด <ɗ> *ɗ > d | ถ <th> *th > th | ท <d> *d > th | ธ <dh> *d > th
labial | บ <ɓ> *ɓ > b | ผ <ph> *ph > ph | พ <b> *b > ph | ภ <bh> *b > ph

(Although neither Sanskrit nor Pali had a retroflex implosive ɗ̣, some Indic correspond to d written as ฎ in Indo-Thai loans.)

Similarly, Lao has two letters for th and two letters for ph (without counterparts of the 'extra' letters in Thai for Sanskrit and Pali loanwords), but only one letter each for d and b:

Initial type | *implosive | *voiceless aspirated | *voiced
dental | ດ <ɗ> *ɗ > d | ຖ <th> *th > th | ທ <d> *d > th
labial | ບ <ɓ> *ɓ > b | ຜ <ph> *ph > ph | ພ <b> *b > ph

In Thai and Lao native words, d and b are associated only with tones that developed in *voiceless-initial syllables. (Proto-Tai implosives, though voiced, conditioned the same tones as *voiceless consonants in those languages.) The same is true for Black Tai and White Tai, two of the three languages written with Tai Viet (see Gedney's descriptions in Hudak 2008: 9, 12):

Black Tai tones

Proto-Tai tone A B and D C
'low' tone class: *voiceless and *implosive initial 22 45 21
'high' tone class: *voiced initial 55 44 31

White Tai tones

Proto-Tai tone A D B C
'low' tone class: *voiceless and *implosive initial 22 45 24
'high' tone class: *voiced initial 44 454 31

The 'high' and 'low' tone classes in the Unicode names roughly correspond to the heights of the *voiced- and *voiceless-initial tones: all *voiced-initial tones start at 3 or higher on a 5-point scale, whereas only *voiceless-initial tones may start as low as 2.
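That generalization about starting points can be checked against the two tables above. The Python sketch below copies the tone values from the tables but leaves out the Proto-Tai category labels; the dictionary layout and function names are hypothetical.

```python
# Tone contours copied from the Black Tai and White Tai tables above.
BLACK_TAI = {"low": ["22", "45", "21"], "high": ["55", "44", "31"]}
WHITE_TAI = {"low": ["22", "45", "24"], "high": ["44", "454", "31"]}


def starting_points(tones):
    """Return the first digit of each contour as an integer."""
    return [int(t[0]) for t in tones]


def claim_holds(language):
    """High-class tones never start below 3; some low-class tone starts at 2 or lower."""
    high_ok = min(starting_points(language["high"])) >= 3
    low_ok = min(starting_points(language["low"])) <= 2
    return high_ok and low_ok


for name, language in [("Black Tai", BLACK_TAI), ("White Tai", WHITE_TAI)]:
    print(name, starting_points(language["low"]),
          starting_points(language["high"]), claim_holds(language))
```

The check says nothing about overall height: a 'low' class tone such as 45 starts higher than a 'high' class 31, which is why the correspondence is only rough.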

I have no data on Thai Song, the third language written with Tai Viet, but I expect its *implosives to follow the same pattern as Black Tai and White Tai.

Hence I hypothesize that the 'extra' Tai Viet d and b letters are for borrowings of Vietnamese words with implosive initials đ- [ɗ] and b- [ɓ] and tones resembling native tones that developed in *voiced-initial syllables. Unfortunately I cannot test this hypothesis, because I have no Vietnamese borrowings in any script on hand.

11.22.4:04: The Tai Viet 'low' d and b letters are obviously related to the d and b letters of Thai and Lao.

Tai Viet ꪓ 'high' d looks like a ligature of ꪙ 'high' n and ꪒ 'low' d. I assume ꪙ high n was chosen to signify the tone class and is not a trace of earlier prenasalization: i.e., ꪓ 'high' d was never pronounced *[nd].

Tai Viet ꪛ 'high' b, on the other hand, looks like a ligature of ꪚ 'low' b and ꪝ 'high' p (< *b). Although one might think it once represented a cluster [bɓ], such a sequence is highly improbable.

Tai Viet ꪝ 'high' p in turn looks like a derivative of ꪜ 'low' p (< *p), a long-tailed derivative of ꪚ 'low' b, rather than a relative of Thai พ <b> ph < *b and Lao ພ <b> ph < *b which lack a right-hand tail.

Graphic cognates

Tai Viet Thai Lao
ꪒ 'low' d < ด <ɗ> d < ດ <ɗ> d <
ꪓ 'high' d < Vietnamese? no equivalent; น <n> + ด <ɗ> no equivalent; ນ <n> + ດ <ɗ>
ꪔ 'low' t < *t ต <t> t < *t ຕ <t> t < *t
ꪕ 'high' t < *d and ꪗ 'high' th < Vietnamese? ท <d> th < *d ທ <d> th < *d
ꪖ 'low' th < *th ถ <th> th < *th ຖ <th> th < *th
ꪚ 'low' b < บ <ɓ> b < ບ <ɓ> b <
ꪛ 'high' b < Vietnamese? no equivalent; บ <ɓ> + พ <b> no equivalent; ບ <ɓ> + ພ <b>
ꪜ 'low' p < *p and ꪝ 'high' p < *b ป <p> p < *p ປ <p> p < *p
ꪞ 'low' ph < *ph (not cognate to ผ <ph> ph < *ph) (not cognate to ຜ <ph> ph < *ph)
ꪟ 'high' ph < Vietnamese? no equivalent no equivalent
ꪠ 'low' f < *f (not cognate to ฝ <f> f < *f) (not cognate to ຝ <f> f < *f)
ꪡ 'high' f < *v (not cognate to ฟ <f> f < *v) (not cognate to ຟ <f> f < *v)

The last four Tai Viet letters in the table have no Thai or Lao graphic cognates.

Tai Viet ꪟ 'high' ph and ꪡ 'high' f look like derivatives of ꪝ 'high' p < *b in the Noto Sans Tai Viet font, but do not resemble 'high' p in N3220 or this Unicode chart.

I assume the 'extra' Tai Viet letters for 'high' th and ph are for Vietnamese loans with 'high' tones that would not normally follow native 'low' th.

'LOST' LAO LETTERS?

I shouldn't interrupt my Churyumov-Gerasimenko series (which itself interrupted a series on Tangut rhyme 4) but I want to ask this before I forget:

Were Sanskrit and Pali loanwords ever written etymologically in premodern secular Lao writing?

Today, Sanskrit and Pali loanwords are written phonetically in Lao, whereas they are written etymologically in Thai: e.g.,

Lao ພາສາ <bāsā> phaasaa

Thai ภาษา <bhāṣā> phaasaa

from Skt bhāṣā or Pali bhāsā 'language'. In earlier Lao and Thai, the word was *baasaa; neither language ever had a *bh or *ṣ. Later, *b shifted to ph in both languages.

I count Lao <bāsā> as a 'phonetic' spelling even though the word is no longer [baːsaː] in modern Lao because <b> is always [pʰ] in modern pronunciation; it is an absolutely regular spelling without any regard for Sanskrit or Pali. (To simplify matters, I will not discuss the interaction between consonants and tones in Lao and Thai.)

Conversely, I count Thai <bhāṣā> as etymological because it retains special letters <bh> and <ṣ> for Indic sounds that never existed in Thai.

As far as I know, the usual pattern is for religiously motivated scripts to keep 'extra', etymological letters even in secular writing, unless there are later attempts to eliminate those letters in modern times: e.g.,

- Russian only lost the Greek-based letters Ѳ (< theta) and Ѵ (< upsilon) less than a century ago, and lost others (Ѯ < xi, Ѱ < psi, Ѡ < omega) a little over three centuries ago.

- Ottoman Turkish retained 'extra' letters for Arabic loans up to its demise. I know of no attempt to create equivalents in the Turkish Latin alphabet, though that would have been theoretically possible.

- Persian retains 'extra' letters for Arabic loans to this day despite proposals for reforms that would eliminate them (see Sprachman 2002: 54-77 for examples).

- Burmese and Khmer, like Thai, retain 'extra' letters for Indic loans: e.g., Burmese ဘ <bh> and Khmer ភ <bh>.

- There was a short-lived attempt to eliminate the 'extra' letters in Thai.

Lao seems to be an exception to this pattern if my understanding of Enfield (1999: 260) is correct. I used to think that Lao script had lost the 'extra' letters (e.g., this post), but according to Enfield, it apparently never had them:

When people argue on this basis for a "return to tradition" through incorporation of the remaining characters [needed for etymological spelling of Indic loans], they are in fact not arguing for restoration, but for the modern, and in many cases novel, fixture of orthographical devices in the language. The deeper historical questions regarding developments of "native" Lao/Thai orthography are complex ones, which I cannot pursue here. But it is important to understand in the present context that the standardized etymological basis of Thai orthography in its present form, being literally designed to handle faithful transcription of Pali and especially Sanskrit, does not represent something that Lao once had or, in particular, could ever "go back to."

Yet Maha Sila Viravong's Lao alphabet had most of those 'extra' letters. I once assumed they were retentions, but they weren't even resurrections: e.g., the description of his Lao <ṭ> says it is (emphasis mine)

[o]ne of the 14 additional Lao letters that were created to transcribe Pali consonants. The letters were originally created in the 1930's by Dr Maha Sila Viravong who was working for the Buddhist Academic Council which was presided over by Prince Phetsarath.

I don't think they were created ex nihilo, though. I assume the letters were derived from some variant of the tham 'dharmic' script that retained the 'extra' letters. (The forms of the 'extra' letters in the two scripts as presented on Wikipedia do not always match: e.g., Lao <jh> and tham <jh>. Are the Lao forms novel inventions or are they based on variant letter forms not in Wikipedia or my fonts?)

Putting aside whatever happened in the 1930s, would Lao - and Thai - of centuries past have spelled Indic loanwords as if they were native words: e.g., *baasaa as <bāsā>? Is Lao <bāsā> a spelling that has been unchanged since the word was first written in a secular context? On the other hand, is Thai <bhāṣā> a modern pseudoarchaism?

Although I know something about Tai historical phonology, I know nothing about Tai philology. Why is Tai spelling history not mentioned in English-language studies of Tai language history? It is not as if the Tai languages were never written until modern times. Is it because Tai linguistics is largely the domain of field workers? I fear a large body of data (especially in Zhuang which is written in a Chinese-based script) has been overlooked.

11.21.2:41: Some romanizations of Lao names indicate knowledge of Indic etymology. Many examples are on this page: e.g., Bhuma for ພູມາ <būmā> [pʰuːmaː] from Sanskrit/Pali bhūma- 'earth' (with final lengthening). I used to think these were transliterations of Lao spellings prior to a reform that eliminated the 'extra' letters, but if Lao never had these 'extra' letters prior to Maha Sila Viravong's alphabet, what is the origin of these spellings? Are they just carryovers from the Indic style of transliterating Thai?

CHURYUMOV IN TANGRAPHY (PART 3)

Tangutizing the second syllable of Ukrainian Чурюмов [tʃuˈrʲumow] 'Churyumov' should be trivial. There is no doubt that Tangut had r- (transcribed in Tibetan as r-), and Gong Hwang-cherng (1997) and Arakawa (1997) both reconstruct two -ju rhymes*. Yet neither reconstructs a syllable rju. In Gong's reconstruction, r- can only precede retroflex vowels in rhymes 77-103 with the exception of rhyme 43 -jɨj. Similarly, in Arakawa's reconstruction, r- can only precede retroflex vowels in rhymes 77-103 with the exceptions of

rhyme 43 rjẽ2

rhyme 75 rjoŋ (rjọ̃ in his 1999 reconstruction**; Gong reconstructed ljọ with l-)

rhyme 77 rjek2 (rjẹ̃ in his 1999 reconstruction; Gong reconstructed reʳj with a retroflex vowel)

rhyme 78 re'2 (re'̣ in his 1999 reconstruction; Gong reconstructed rieʳj with a retroflex vowel)

rhyme 79 rje'2 (rjẹ' in his 1999 reconstruction; Gong reconstructed rjiʳj with a retroflex vowel)

I reconstruct rhyme 43 as -ẽ, and I think rẽ was a simplification of an earlier *rẽʳ with an unusual nasalized retroflex vowel like those of Kalasha (Heegård and Mørch 2004: 67). rjoŋ/rjọ̃ with rhyme 75 may have a similar explanation if its initial was r- (as opposed to Gong's l-): it may be from an earlier *rjọ̃ʳ with a nasalized retroflex tense vowel. (Does any language have that triple combination? I doubt it, but then again, I would be skeptical of nasalized retroflex vowels if I didn't know about Kalasha.)

Therefore the ryu of Churyumov would have to be Tangutized with a retroflex vowel as something like rjuʳ. Gong reconstructed twenty rjuʳ-syllables, whereas Arakawa only reconstructed eighteen. Arakawa may have accidentally left out the two members of Homophones A group 9.77. The obvious choice is

2147 2rjuʳ 'broom'

which rhymes with its synonym, the first half of
2271 0109 2zjuʳ 2gjịj (Gong), 2zzjuʳ 2gẹ̃ (Arakawa) 'comet' (lit. 'broom star')

So can we finally move on to Tangutizing -mov yet?

Next: Y not?

*[ju] is -yu in Arakawa's notation. I have rewritten Arakawa's and Gong's reconstructions in an IPA-like system to facilitate comparison.

**Arakawa (1999: 41) reconstructed both rhymes 73 and 75 as -ọ, but I think 75 -ọ was a typo for 75 -jọ̃ since he regarded 75 as a Grade II rhyme, his Grade II has medial -j-, and he placed 75 across from 57 -jõ.

CHURYUMOV IN TANGRAPHY (PART 2)

In part 1 I decided to Tangutize the first syllable of Ukrainian Чурюмов [tʃuˈrʲumow] 'Churyumov' as 1013

which was pronounced something like chu.

Were Tangut shibilants palatal, alveopalatal, or retroflex?

I have been using the neutral notation ch to avoid answering that question up until now in this series.

There is no doubt that Tangut class VII initials were shibilants. They were most commonly transcribed in Tibetan as c-, ch-, j-, and sh- (ignoring preinitials; Tai 2008: 194). Moreover, one of the class IX initials was commonly transcribed in Tibetan as zh- (again ignoring preinitials; Tai 2008: 201). Although the Tibetan initials were probably palatal* [tɕ tɕʰ dʑ ɕ ʑ], that does not mean that the Tangut initials were necessarily palatal, as the Tibetan script had no characters for alveopalatal [tʃ tʃʰ dʒ ʃ ʒ] or retroflex [tʂ tʂʰ dʐ ʂ ʐ].

Middle Chinese had a distinction between palatals and retroflexes. Twelfth-century reflexes of both types of initials were used to transcribe Tangut shibilants in the Timely Pearl. That either implies that the distinction was lost (as in Phags-pa Chinese to the east from the following century) or that neither was a perfect match for the Tangut shibilants (which could have been alveopalatal).

Sanskrit also had a distinction between palatal ś [ɕ] and retroflex ṣ [ʂ]. Both were transcribed with Tangut sh-, though there are a few alveolar s-tangraphs for Sanskrit palatal ś-syllables and the s-tangraph 0493

could represent Sanskrit palatal ś, retroflex ṣ, and alveolar s (Arakawa 1997: 110-114). This admittedly small tendency to write Sanskrit palatal ś as Tangut alveolar s may suggest that Tangut sh was closer to Sanskrit retroflex ṣ. The correspondence of Sanskrit palatal ś to Tangut alveolar s is also reminiscent of the Russian transcription of Mandarin palatal x [ɕ] as palatalized alveolar [sʲ]: e.g., 西夏 Xixia 'Tangut' as Си Ся [sʲi sʲa].

However, the use of Tangut alveolar affricates (ts- tsh- dz-) to transcribe Sanskrit palatal stops (c ch j**) is not evidence against Tangut shibilant affricates being palatal because the variety of Sanskrit known to the Tangut had alveolar affricates instead of palatal stops.

Shibilants are one of the three types of 'vigilant' Grade III initials in Tangut. If Grade III was palatal as reconstructed by Gong, then I would expect the 'vigilant' initials to be palatals:

class II ɥ- (which I would not expect in a labiodental class)

class VII tɕ-, tɕh-, dʑ-, ɕ-

class IX λ-, ʑ-

I prefer to more or less follow 李新魁 Li Xinkui (1980)*** and reconstruct these initials as follows:

class II v-

class VII tʂ-, tʂh-, dʐ-, ʂ-

class IX l- [ɫ], ʐ-

These initials are all 'antipalatal': cf. how Russian retroflexes and nonpalatalized [v] and [ɫ] cannot precede [i]. Just as Russian /i/ retracts to [ɨ] after retroflexes, pre-Tangut *i became ɨi after retroflexes.

There is acoustical affinity between l and v: e.g., in Ukrainian, syllable-final *-l became /v/: e.g., *volk > /vovk/ 'wolf'.

Moreover, there is also acoustical affinity between retroflexes and labiodentals: e.g., in modern northwestern Chinese dialects (whose substrata if not ancestors were the dialects known to the Tangut 800-900 years ago), retroflexes became labiodentals before *w and *u (Coblin 1994: 97, 102)

*tʂ- > pf-

*tʂh- > pfh-

*ʂ- > f-

*ʐ- > v-

So I am not surprised that the 'vigilant' initials form a class in Tangut.

Lastly, if the Tangut grades were like Chinese grades, Grades II and III were less palatal than Grade IV. And those are precisely the two grades associated with shibilants****. Hence I think palatal initials are less likely in those two grades.

In any case, Tangut must have had retroflexes at some stage, as the shibilant in

3200 1tʂhɨiw < *K-truk 'six'

is from an *r-cluster: cf. Classical Tibetan drug, Written Burmese khrok, and many other Sino-Tibetan words for 'six'. This retroflex *tʂh- from *K-tr- could have shifted to alveopalatal *tʃh-, palatal *tɕh-, or even alveolar *tsh- in Tangut dialects. There are rare cases of Tibetan alveolar affricates (ts- dz-) transcribing Tangut shibilants (Tai 2008: 194). If those instances of ts- and dz- are not errors*****, they may reflect the beginnings of a shift from retroflexes to alveolars.

Next: On to the second syllable in part 3.

*They were palatal in Old and Classical Tibetan (Jacques 2012: 90). There is no guarantee they were also palatal in the dialect(s) underlying the Tibetan transcriptions of Tangut. Nonetheless there is also no evidence suggesting that they were not palatal in that dialect or dialects.

**I have not seen any Tangut transcriptions of the rare Sanskrit consonant jh.

***11.19.0:24: Li Xinkui (1980) was the first to reconstruct retroflexes in classes VII and IX. My reconstructions are identical to his (as listed in Li Fanwen 1986: 126-127) except for (1) dʐ- corresponding to his aspirated dʐh- and (2) [ɫ].

****Arakawa (1997: 135) proposed that rhyme 50 which only has shibilant and l-initials (i.e., two of the three types of 'vigilant' initials) was grade I. Sofronov, Gong, and I regard 50 as Grade III.

*****The Tibetan characters for ts and dz are derived from the characters for c and j:

ts < c
dz < j

I would expect the extra stroke of ts and dz to be accidentally omitted rather than accidentally added. Thus I suspect the scribes intended to write ts and dz, though it is not clear whether they actually heard [ts] and [dz] or if they misheard shibilants as [ts] and [dz].

CHURYUMOV IN TANGRAPHY (PART 1)

Last night I wrote,

I still think each half is appropriate since Philae did cause us to see 67P/Churyumov-Gerasimenko. I'm not going to try to Tangutize all of that.

But tonight I decided to try anyway since Tangutizing Churyumov and Gerasimenko brings up some interesting issues in Tangut phonological reconstruction.


Before I deal with Tangut, I have a Ukrainian question. I assume that Чурюмов [tʃuˈrʲumow] 'Churyumov' was once Чурюмовъ with a final weak yer. Normally o fronted to i before a weak yer in Ukrainian: e.g., Харьковъ > Харків 'Kharkiv'. Yet the surname is not *Чурюмів. Nor is the genitive plural of мова 'language' *мів from мовъ; it's мов. All other forms of мова never had weak yers in the syllable before o. Was o restored by analogy in those words?

Tangutizing Chu-

Many scholars (Nishida, Sofronov, Huang, Li Fanwen, Gong Hwang-cherng, Arakawa, and most recently even myself) reconstruct Tangut rhyme 1 as -u. (Huang and Li also respectively reconstructed -iu and -ü.) Hashimoto reconstructed a long vowel -U [uː]. This rhyme was almost always transcribed in Tibetan as -u (Tai 2008: 204), and it was used to transcribe Sanskrit -u and (Arakawa 1997: 110, 112).

The consensus is that rhyme 1 was Grade I. Grade I rhymes never follow class VII initials: i.e., shibilants such as ch-. Therefore there was no Grade I syllable chu in Tangut. Why were ch- and -u incompatible in Tangut? That question incorporates the assumption that rhyme 1 was -u. Perhaps the Grade III (Arakawa's Grade II) rhyme 2 syllable

that transcribed the Tangut period northwestern Chinese cognates of modern Mandarin

朱蛛猪諸 zhū [tʂu ˥]

竹竺 zhú [tʂu ˧˥]

zhǔ [tʂu ˨˩˦]

zhù [tʂu ˥˩]

zhōu [tʂow ˥]

zhǒu [tʂow ˨˩˦]

was chu (and therefore the best match for the Chu- of Churyumov), and the Grade I rhyme was something other than -u. Here are three possible scenarios:

Grade/rhyme A B C
I/1 nonshibilant + -u nonshibilant + X + -u nonshibilant + X + -u
III/2 shibilant + X + -u shibilant + -u shibilant + Y + -u

In scenario A, rhyme 1 was -u, but rhyme 2 had an extra quality X that differentiated it from simple -u: e.g., Gong Hwang-cherng's -j-, Arakawa's -y- (= [j]), and my -ɨ-.

In scenario B, it is rhyme 1 that had an extra quality X that differentiated it from simple -u. If Gong Xun (2014) is correct, that quality may have been pharyngealization or retracted tongue root. But this raises the question of why the Tangut would use rhyme 1 to transcribe Sanskrit -u and without pharyngealization or retracted tongue root. My short answer is that rhyme 1 was the best available match after many initials. I will elaborate on that answer in the future.

In scenario C, neither rhyme had a simple -u. For years until recently I reconstructed rhyme 1 as -əu and rhyme 2 as -ɨu.

Next: Tangutizing the rest of Churyumov.

PHILAE IN TANGRAPHY

Having just written about the Tangut word for 'comet', I wanted to come up with a Tangutized name of the Φιλαί Philae lander. A Tangutization could be based on at least three different pronunciations:

English [fajli]

Modern Greek [file]

Classical Greek [pʰilai]

Each poses at least one problem for Tangutization.

Did Tangut have f-?

As far as I know, only Nishida, Huang Zhenhua, and Arakawa reconstructed f-. I do not think it is completely impossible, but I am not yet fully convinced. It is a low-frequency initial in all three reconstructions:

- it appears before only 18 of Nishida's (1964: 85-86) 102 rhymes

- it appears before only 6 of Arakawa's (1997: 125-149) 105 rhymes

- it appears before only 12 of Huang's (1983: 128-134) 97 'level' tone rhymes
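
To compare the three counts on one scale, they can be converted to percentages. This is only arithmetic on the figures just quoted; note that Huang's total covers 'level' tone rhymes only, so his figure is not strictly comparable with the other two. The dictionary keys and the function name are my own labels.

```python
# (rhymes with f-, total rhymes counted), as quoted in the list above.
F_COUNTS = {
    "Nishida 1964": (18, 102),
    "Arakawa 1997": (6, 105),
    "Huang 1983 (level tone only)": (12, 97),
}

def share(count, total):
    """Percentage of rhymes that occur with f-, rounded to one decimal place."""
    return round(100 * count / total, 1)

shares = {source: share(*counts) for source, counts in F_COUNTS.items()}
for source, percent in shares.items():
    print(f"{source}: {percent}%")
```

Even in the most generous reconstruction, f- occurs before fewer than a fifth of the rhymes.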

Only Arakawa and Huang (?) reconstruct f- before -i(:) in 3859 'rat' which has no homophones:

Arakawa 1fi:, Huang 1fi (?)

but Nishida 1wi, Sofronov 1968 1xi̭we, Gong 1xjwi

That tangraph was in the labiodental chapter of Homophones, but its initial fanqie speller - also in that chapter - had an initial fanqie speller in the glottal chapter which includes back (velar?) fricatives:


3850 1xwɨi (rhyme 10) = 0418 1xwɨə + 2228 pi (rhyme 11, not 10!)


0418 1xwɨə = 2504 1xu + 1760 1ʂwɨə

Unfortunately there is no Tibetan transcription data for 2504 or any other tangraph in its initial fanqie chain (VIII 10, Tai 2008: 196).

2504 is a transcription character for Sanskrit hu. That would be unlikely if its Tangut reading were fu - unless Tangut had no hu or xu. It transcribed both *x- and *f-initial Chinese syllables.

On the other hand, why would a xw-syllable be placed in the labiodental chapter of Homophones? Was [f] an allophone of /xw/? Were [f] and [x] in free variation before -u?

Moreover, 0418 means 'Buddha' and may be a loanword from Tangut period northwestern Chinese 佛 *fɨə. Was that word borrowed with f- and/or xw-?

In any case, I wouldn't want to Tangutize Philae with 'rat'.

Did Tangut have -ai?

Even if Tangut had f-, I am unaware of anyone reconstructing a Tangut syllable like fai. And it is doubtful that the extant recorded varieties of Tangut had -ai (though such a rhyme could have existed in unwritten dialects). Such a rhyme should have corresponded to -aHi in Tibetan transcription, but no such transcription exists.

ai is rare in Sanskrit, so it is not surprising that Arakawa (1997: 113) listed only four Tangut transcriptions of Sanskrit Cai-syllables:

4884 2ni (Grade IV rhyme 11) for Skt nai and ni; transcribed in Tibetan as niH

2563 2mɛ (Grade I rhyme 34) for Skt mai; rhyme mostly transcribed in Tibetan as -i and -e

4262 2be (Grade IV rhyme 37) for Skt vai; rhyme mostly transcribed in Tibetan as -e

5300 3639 1tə 2reʳ (Grade IV rhyme 79) for Skt trai; rhyme mostly transcribed in Tibetan as -e

If Tangut had a rhyme -ai, all Sanskrit Cai-syllables would have been transcribed with that rhyme and its retroflex variant -aiʳ after r-.

The use of both Tangut -i and -e-type rhymes indicates that Tangut had no exact match for -ai. -i imitated the second half of -ai while mid front -e was a compromise between low a and high front i.

One might counter that the Tangut heard a foreign (e.g., Tibetan or Chinese) pronunciation of Sanskrit with a monophthong like e instead of ai, but if that were the case, the Tangut could have consistently transcribed that e as e.
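
The inconsistency is easy to see if the four transcriptions listed above are tallied by the vowel of the syllable that stands for Sanskrit -ai. This is a rough sketch: treating ɛ as an 'e-type' vowel alongside e is my own grouping for the purposes of the tally, and the helper name is hypothetical.

```python
from collections import Counter

# (Li Fanwen number(s), Tangut reading, Sanskrit syllable), from the list above.
AI_TRANSCRIPTIONS = [
    ("4884", "2ni", "nai"),
    ("2563", "2mɛ", "mai"),
    ("4262", "2be", "vai"),
    ("5300 3639", "1tə 2reʳ", "trai"),
]

# Assumption for this tally: ɛ counts as an e-type nucleus.
E_TYPE = {"e", "ɛ"}

def rhyme_type(reading):
    """Classify the vowel of the last syllable, ignoring tone digit and retroflexion."""
    last_syllable = reading.split()[-1]
    nucleus = last_syllable.lstrip("12").rstrip("ʳ")[-1]
    return "e-type" if nucleus in E_TYPE else nucleus

tally = Counter(rhyme_type(reading) for _number, reading, _skt in AI_TRANSCRIPTIONS)
print(tally)
```

Three of the four use an e-type rhyme and one uses -i, so no single Tangut rhyme served as the equivalent of -ai.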

Following the precedent of transcribing Sanskrit ai with -e-type rhymes, I would Tangutize Philae as

0749 0046 1phi 2le

which one might be tempted to 'translate' as 'cause to see', though that would be ungrammatical in Tangut since the causative 1phi follows verbs. I chose 1phi because it transcribed Tangut period northwestern Chinese *phi-syllables (霹鼻脾備琵) in the Timely Pearl. 2le 'to see' transcribed Sanskrit le. Even though 0749 0046 can't be a Tangut phrase, I still think each half is appropriate since Philae did cause us to see 67P/Churyumov-Gerasimenko. I'm not going to try to Tangutize all of that. I'll settle for

3200 1084 2205 0749 1tʂhɨiw 2ɣạ 1ʂɨạ 1phi  'six ten seven causative'

with a Tangutization of English [pʰi] 'P'.

ARE COMETS BEAUTIFUL BROOMS?

Since a comet has been in the news lately, this would be a good time to take up Andrew West's suggestion to write about the Tangut word for 'comet' which appears in Timely Pearl 074:

2271 0109 2ɮyʳ 2gẹ 'comet'

It corresponds morpheme by morpheme to its Chinese translation 掃星 'broom star'. 2271 is probably a derivative and special spelling of 3695 2ɮyʳ 'broom' used in astronomical contexts:


2271 = 'grass' (left of 3695) + an element Grinstead (1972: 28) glossed as 'finery' and 'ornament', perhaps from a character such as 0364 tsẽ 'beautiful' - were comets 'beautiful brooms' in the sky as opposed to those on the ground which could be held with hands (the right-hand element of 3695)?

Is 2ɮyʳ 2gẹ a calque of 掃星?

Although I do not know of any pre-Tang attestations of 掃星, 彗星 'comet', also literally 'broom star', goes at least as far back as the Han Dynasty, and Karlgren (1957: 143) glossed 彗 by itself as 'comet' in the pre-Han Zuo zhuan.

Unfortunately there is no way to determine how old the Tangut term is; the most I can say is that it certainly was not invented on the spot by Timely Pearl author Gule Maocai in 1190, as it also appears in other texts: 53A53 in the first edition of Homophones which was written 65 years earlier, Newly Assembled Precious Dual Maxims (1187), and volume 6 of the Tangut translation of the Golden Light Sutra. And each half is in the Precious Rhymes of the Tangraphic Sea dating from sometime after 1069.

I do not know of any Tibetan term for 'comet' like 'broom star'. Are there languages outside the Sinosphere with 'broom star' for 'comet'? How likely is it that the Tangut coined the term independently?

2271 could also mean 'comet' in the compound

2814 2271 2ɬị 2ɮyʳ 'comet'

'moon comet'

from Timely Pearl 083. It too is a morpheme-for-morpheme match of Chinese 月孛 'moon comet'. Could Late Middle Chinese 孛 *pɦot be the source of Tibetan phod 'comet'?

In Homophones 53A52 and Timely Pearl 265, the regular tangraph for 'broom' is paired with a rhyming synonym also written with the 'grass' radical:

3695 2147 2ɮyʳ 2ryʳ 'broom (and?) broom'

Are those two words cognates? I would not expect lateral ɮ- and retroflex r- to be in the same word family. Is 2ɮyʳ from *2l-ryʳ? Are there other pairs of ɮ- and r-words with identical rhymes and similar semantics?

Nishida (1964: 213) has the English translation 'broom' for 3695 2147, implying that it was a redundant compound. Other possible redundant compounds are:

2147 4260 2ryʳ 2ɬø̃ 'broom (and?) broom' (Homophones A 51A37 and 48B32)

4260 2147 2ɬø̃ 2ryʳ 'broom (and?) broom' (Tangraphic Sea 1.55.211)

0094 4910 2147 1ʂwo 2vɛ 2ryʳ 'sweeping broom' (Tangraphic Sea 1.55.211)

0094 4910 2147 1ʂwo 2vɛ 2ɬø̃ 'sweeping broom' (Tangraphic Sea 1.81.252)

0094 4910 1ʂwo 2vɛ is a verb 'to sweep'. Although Li (2008: 16, 777) glossed it as a noun followed by a verb, the two halves seem inseparable, so I regard it as a disyllabic root rather than as a compound.

THE M-D-L-ED MYSTERY OF TANGUT RHYME 4 (PART 1)

The second syllable of
3721 5407 2bʌ 2dɤu 'stupa, pagoda'

has Tangut rhyme 4

0730 1mɤu 'protruding mouth; pestle' (name of the level tone variant of rhyme 4)

0310 2mɤu 'transcription of Sanskrit mu, mū; cord (< Chn 纆?); to wipe (< Chn 抹?); to connect' (name of the rising tone variant of rhyme 4)

which has mystified me for almost seven years.

Modern scholars have categorized Tangut rhymes in terms of 'grades'. I am not entirely comfortable with that because I don't know of any Tangut word for 'grade'. I have not seen any of the Tangut translation equivalents of Chinese 等 'grade' used in a phonological context:

0382 1dzɨi 'equal, even'

0424 2te 'equality (< Chn 等); to measure'

0724 2nə 'plural suffix'

1290 2tsew 'class'

1576 2kɑ̣ 'equality'

1737 1kɑ 'equal, even'

(Of course, there is the possibility that the Tangut used an entirely different word for 'grade' that has not yet been identified. Tangut phonologists were surely familiar with the concept from Chinese phonology; the issue is whether they applied it to their own language.)

Moreover, what I have seen of the Tangut rhyme tables (not enough!) was not arranged by grade unlike the Chinese 韻鏡 Yunjing 'Rhyme Mirror' rhyme tables. Nonetheless, I am convinced by Gong's (1994) arguments in favor of Tangut grades, though I favor four grades instead of three. And I would add another argument: each grade is strongly correlated with a different set of initials:

[Table of grades against classes of initials not preserved; the only surviving label is 'Non-l liquids'.]

(I have omitted the controversial and rare class IV initials.

✓ means 'present'.

X means 'ideally absent'. Red means 'actually absent'; yellow means 'ideally absent but exceptions exist': e.g.,

labial b- and glottal ʔ- and x- before grade III rhyme 2 -ɨu

alveolar tsh- before grade II rhyme 8 -ɤi

velar k-, alveolar dz-, and glottal ʔ- before grade III rhyme 10 -ɨi

l- before grade IV rhymes 3 -y and 37 -e

Exceptions list added 11.15.1:20.)

Today I coined the term 'vigilant' to refer to Grade III. Vigil is a mnemonic for three initial types associated with Grade III: v- for labiodentals (class II), g- [dʒ] for shibilants (class VII + class IX ʐ-), and l.

Grade IV is nonvigilant: i.e., it generally follows initials other than those three types.

Grade II is 'hypervigilant'; it can have any initial - vigilant or nonvigilant - other than alveolars and r-. Gong derived Grade II from medial *-r- in a 1993 paper I have not yet seen:

*CrV > CV + Grade II

I think Gong was right because his hypothesis predicts no r- before Grade II unless pre-Tangut had a cluster *rr-. And *alveolar-r clusters may have become retroflexes as in Chinese: e.g., *sr- > ʂ-.

Grade I is shibilant-free. Maybe I could call it 'hypervalent', meaning that it may occur after v-, l-, and nonvigilant initials, but not shibilants.
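
The four generalizations can be restated as a small lookup. This is a sketch of the 'ideal' distribution only: it ignores the exceptions listed earlier, and the type labels and the function name are my own shorthand rather than established terminology.

```python
# The three 'vigilant' initial types associated with Grade III.
VIGILANT = {"labiodental", "shibilant", "l"}

def ideal_grades(initial_type):
    """Return the grades an initial type is ideally compatible with."""
    grades = set()
    if initial_type != "shibilant":
        grades.add("I")    # Grade I is shibilant-free
    if initial_type not in {"alveolar", "r"}:
        grades.add("II")   # Grade II takes anything but alveolars and r-
    if initial_type in VIGILANT:
        grades.add("III")  # Grade III is 'vigilant'
    else:
        grades.add("IV")   # Grade IV is nonvigilant
    return grades

for initial_type in ["shibilant", "alveolar", "r", "l", "velar"]:
    print(initial_type, sorted(ideal_grades(initial_type)))
```

Under these rules shibilants are confined to Grades II and III, while alveolars and r- are confined to Grades I and IV.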

Rhymes 1-4 were all transcribed as -u in Tibetan. All scholars who reconstruct grade systems in Tangut agree that rhyme 1 was grade I, but disagree on the others:

Non-l liquids
Hashimoto 1965
Gong 1997
Arakawa 1999
Sofronov 2012
This site
✓but not m-!

✓but not d-!


b- only
ʔ-, x- only
m- only
d- only
ɬ- only

The initials of rhyme 4 are unlike any of the initial sets expected for the four grades. They are all back initials with the exceptions of m- and d- (as reconstructed by Gong) and ɬ- (= Gong's lh-). Why do m- and d- appear before rhyme 4 but not rhyme 1 in Gong's reconstruction?

Next: Were m- and d- really m- and d-?

STUMPED BY 'STUPA' (PART 4: ETYMOLOGY OF THE SECOND SYLLABLE)

Having covered homophones of the first syllable of

3721 5407 2bʌ 2dɤu 'stupa, pagoda'

in parts 2 and 3, and hypothesizing that

4908 1b(w)ʌ 'ceremony and propriety'

might be related to it, I am now going to look for a plausible cognate of the second syllable among its (near-)homophones:

A group
Homophones A page/location Homophones B/D group Homophones B/D page/location Tangraph Gloss Reading Tangraphic Sea rhyme Overall rhyme
III.5 12A48 (1) 13A41 to exist, have, place 1dɤu 1.4 4
12A51 13A42 peaceful
12A52 13A44 building
12A53 13A45 first half of 0979 0978 1dɤu 2da 'slow, obtuse, dazed'
III.6 12A54 13A43 anger, rage (< Chn 怒)
12A55 13A46 to ban, prohibit, resist; to sink, drown, trap (when reduplicated)
12A56 (2) 13A47 to measure (< Chn 度) 2dɤu 2.4
12A57 13A48 second syllable of the surname 4561 2284 2ba 2dɤu (almost homophonous with 3721 5407 2bʌ 2dɤu 'stupa'!)
12A58 13A51 second syllable of 4373 4281 2dɤu 2lɨi 'pear tree' (< Chn 杜梨)
12A62 13A53 second word of 2671 0712 2bʌ 2dɤu 'drawers and stomacher' (< Chn 肚), a homophone of 3721 5407 2bʌ 2dɤu 'stupa'
12A63 13A54 second syllable of 0691 0710 2bɤa 2dɤu 'large-collared gown' (not in Tangraphic Sea)

Homophones group numbering follows the conventions established in part 3.

Note how the different placement of and in the middle implies that they could have had different tones in different editions of Homophones. Perhaps the circle marking the end of group III.5 was improperly placed in Homophones A, and that error was corrected in later editions of Homophones.

The missing 12A61 (Homophones A)/13A52 (Homophones B and D) is of course 5407 2dɤu, the second half of 'stupa'. And its obvious source is 2829   1dɤu 'building':


4908 + 2829 = 3721 5407

1bʌ 'ceremony' + 1dɤu 'building' = 2bʌ 2dɤu 'stupa'?

Semantically that is fine, but the tones do not match. Why were 'level' tones changed to 'rising' tones? And when did that happen? Before or after a final glottal *-H conditioned rising tones - if Tangut tones are of segmental origin? (What if Tangut tones originated from pitch accent as in southern Qiang?)

Moreover, is it simply a coincidence that four out of seven disyllabic words with dɤu fit the pattern bA dɤu?

0691 0710 2bɤa 2dɤu 'large-collared gown'

2671 0712 2bʌ 2dɤu 'drawers and stomacher'

3721 5407 2bʌ 2dɤu 'stupa, pagoda'

4561 2284 2ba 2dɤu (a surname)

The homophones from yesterday's list of 1bʌ-syllables come to mind:

5035 4068 1bʌ 1mɛ 'to present a gift; to fete'

5042 4072 1bʌ 1mɛ 'soft'

Is such (near-)homophony the result of unconscious evolution, or is it the product of conscious design? The study of Tangut polysyllabic morphemes has barely begun.

STUMPED BY 'STUPA' (PART 3: NEAR-HOMOPHONES OF THE FIRST SYLLABLE)

In part 2, I looked at exact homophones of the first syllable of
3721 5407 2bʌ 2dɤu 'stupa, pagoda'

in search of possible cognates and found none. I forgot to look at near-homophones with the first tone:

Fanqie Tangraph Li Fanwen number Gloss
Initial Final


3149 first syllable of 3149 2811 2816 1bʌ 1lɨə 1lø 'round bone' (only in dictionaries)
5035 first syllable of 5035 4068 1bʌ 1mɛ 'to present a gift; to fete' (homophonous with 'soft' below)
5042 first syllable of 5042 4072 1bʌ 1mɛ 'soft' (homophonous with 'present a gift' above)


0022 resources
3594 first syllable of 3594 0620 1bʌ 2dɑ 'abrupt', 3594 0700 1bʌ 2dʐɨaʳ 'to throw' (only in dictionaries), and 3594 5586 1bʌ 2dʐwø 'to throw'
3692 first syllable of 3692 0342 1bʌ 1dzə 'to throw' (only in dictionaries); why weren't all three 'throw' verbs written with the same tangraph? Why change the bottom right corner?
4908 ceremony and propriety (only in dictionaries)
5031 second syllable of 2621 5031 2lɨə 1bʌ, name of an ancestor of the black-headed Tangut

Although these eight tangraphs have two different fanqie (which I will call A and B), they were placed in the same group as all but one of the 2bʌ tangraphs in Homophones A:

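| A group | Homophones A page/location | Reading | Tangraphic Sea rhyme | Overall rhyme |
| --- | --- | --- | --- | --- |
| I.16 | 03A55-03A61 | 1bʌ B | 1.27 | 28 |
| I.16 | 03A62-03A63 | 2bʌ | 2.25 | 28 |
| I.16 | 03A64 | 1bʌ A | 1.27 | 28 |
| I.16 | 03A65-03A67 | 2bʌ | 2.25 | 28 |
| I.16 | 03A68 | 1bʌ A | 1.27 | 28 |
| I.16 | 03A71-03A72 | 2bʌ | 2.25 | 28 |
| I.16 | 03A73 | 1bʌ' | 1.31 | 32 |
| I.16 | 03A74 | 1bʌ A | 1.27 | 28 |
| I.16 | 03A75-03B12 | 2bʌ | 2.25 | 28 |
| I.17 | 03B13 | 2bʌ | 2.25 | 28 |
| I.17 | 03B14 | 1bʌʳ | 1.84 | 90 |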

The numbering of Homophones A groups follows Li Fanwen 1986 and Arakawa 1997.

The two types of 1.27 were separated from each other and from 2.25 in Homophones B and D. Rhyme 28 tangraphs were no longer mixed with tangraphs of other rhymes. 1bʌ' (rhyme 32) is 10A21 and 1bʌʳ (rhyme 90) is 10A57 in Homophones B and D.

| B/D group | Homophones B/D page/location | Reading | Tangraphic Sea rhyme | Overall rhyme |
| --- | --- | --- | --- | --- |
| (1) | 04A54-04A58 | 1bʌ B | 1.27 | 28 |
| (2) | 04A61-04A62 | 1bʌ A | 1.27 | 28 |
| (3) | 04A63-(04B01) | 2bʌ | 2.25 | 28 |

I have arbitrarily numbered the Homophones B and D groups.

The main character of 04B01 in Homophones B has not survived, but what remains of the clarifier beneath it matches 1765 in Homophones A and D. 1765 is not separated from other 2bʌ in B or D. (This section is missing from Homophones C.)

The (mis)matches between the different editions of Homophones and the Tangraphic Sea indicate that

- 1bʌ (both types) and 2bʌ were very close (or even homophonous in the Homophones A dialect: i.e., they all had the same tone - or no tone?)

- the distinction between the two types of 1bʌ may not have been an isolated quirk of the Tangraphic Sea, as the B type tangraphs are clustered at the beginning of Homophones A group I.16, and have their own group (1) in Homophones B and D

- the fanqie suggest the difference may have involved a medial -w-, though such a medial is not otherwise thought to be distinctive after labials:

A: 1bi + 1kʌ = 1bʌ

B: 1bu + 1lʌ = 1bwʌ? (whose -w- may be from a prefix *P- - but would *P-prefixed forms really have outnumbered prefixless forms five to three?)

- rhymes 32 and 90 were similar to rhyme 28

- 1bʌ' (rhyme 32) might have been 1bʌ A in the Homophones A dialect whose ancestor may have lacked the conditioning factor that became the 'apostrophe' feature in the Tangraphic Sea dialect

- 2bʌ (rhyme 28) 'dark green' might have been 1bʌʳ, like 03B14, in the Homophones A dialect whose ancestor may have had a prefix *R- in 'dark green' that conditioned retroflexion absent from the Tangraphic Sea dialect
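The fanqie arithmetic in the list above can be sketched mechanically: the initial comes from the first speller, and the tone and final come from the second. The Python below is only an illustration. The function names, the vowel inventory used to find where an initial ends, and the one-digit tone prefix are assumptions made for this sketch, not part of any Tangut source.

```python
# Vowel symbols used to find where an initial ends.
# This inventory is an assumption that covers only the examples here.
VOWELS = set("aeiouyøɑʌəɛɨɤ")

def split_syllable(syllable):
    """Split a romanized syllable such as '1bi' into (tone, initial, final)."""
    tone, body = syllable[0], syllable[1:]
    for i, ch in enumerate(body):
        if ch in VOWELS:
            return tone, body[:i], body[i:]
    raise ValueError(f"no vowel found in {syllable!r}")

def fanqie(initial_speller, final_speller):
    """Combine the initial of the first speller with the tone and final of the second."""
    _, initial, _ = split_syllable(initial_speller)
    tone, _, final = split_syllable(final_speller)
    return tone + initial + final

type_a = fanqie("1bi", "1kʌ")  # fanqie A
type_b = fanqie("1bu", "1lʌ")  # fanqie B
print(type_a, type_b)
```

Read this mechanically, both spellings yield the same syllable. A medial -w- in type B is therefore an inference from the rounded vowel of its initial speller (1bu as opposed to 1bi), not something the formula itself produces.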

At the pre-Tangut level, the sources of 1bʌ and 2bʌ could have been identical except for the absence or presence of a final glottal *-H that conditioned the rising tone. (I am putting aside the question of whether type B 1bʌ had *P- at the pre-Tangut level.)

Out of the eight 1bʌ above, the only one with any potential semantic relevance to 3721 5407 2bʌ 2dɤu 'stupa, pagoda' is 4908 'ceremony and propriety'. If 4908 were suffixed with *-H or shifted to the rising tone after tonogenesis (but why?), it could have been added to something like 'mound' or 'building' to form 'stupa, pagoda'. But is there a 2dɤu with such a meaning?

Next: Homophones of 2dɤu 'stupa, pagoda'.

STUMPED BY 'STUPA' (PART 2: ETYMOLOGY OF THE FIRST SYLLABLE)

The second half of the Tangut word

3721 5407 2bʌ 2dɤu 'stupa, pagoda'

could be used on its own to mean 'stupa'. Was the first half 3721 2bʌ a prefix or modifier? Eight of the homophones of 3721 2bʌ are not free morphemes and cannot be modifiers. Nor does it seem likely that the noun 'stupa' shared a prefix with, say, the verb 'to swell'. The remaining five homophones do not look like probable modifiers: e.g., 'insect'.

Tangraph Li Fanwen number Gloss

first syllable of 0589 3530 (2008) 2bʌ 2dəʳ (1xõ) 'scabies' (only in dictionaries)
first syllable of 1386 2434 2bʌ 1be and 1386 1146 2bʌ 1kɑ̣, both 'old and shabby'
dark green (only in dictionaries)
first syllable of 2276 1972 2bʌ 2reʳ 'to swell' (the second half can occur by itself)
first syllable of 2280 0504 2bʌ 2lɨẽ 'spinach'
first word of 2671 0712 2bʌ 2dɤu 'drawers and stomacher', a homophone of 3721 5407 2bʌ 2dɤu 'stupa'; Li Fanwen (2008: 439) regarded 2671 as a loan from Chinese 襪 'drawers', but the latter was *va which is a poor phonetic match.
first syllable of 2828 0865 2bʌ 1tʂɨe 'to bear a burden' (only in dictionaries?; the quotation from Nevsky 1960 II may be a quotation from the lost rising tone volume of the Tangraphic Sea)
first syllable of 2828 0090 2bʌ 1voʳ 'mandarin duck' (1voʳ is 'chicken')
first syllable of the place name 2828 5856 2bʌ 2ɣɑ
pellet; first word of phrases 3381 2290 2bʌ 2lõ and 3381 5900 2bʌ 2di, both 'pellet' (2lõ is 'round' and 2di is 'broken')
first syllable of 4766 1032 4789 2bʌ 1vʌ̣ 1ny 'a kind of vegetable' (only in dictionaries)

I presume that the homophony of 3721 5407 2bʌ 2dɤu 'stupa' and the phrase 2671 0712 2bʌ 2dɤu 'drawers and stomacher' is purely coincidental.

If none of the above are related to 3721 5407 2bʌ 2dɤu 'stupa', there are several other possibilities.

First, 2dɤu may be an abbreviation for a disyllabic root 2bʌ 2dɤu, just as Chinese 塔 ta 'stupa' is an abbreviation of 塔婆 tapo < *thəp-ba. Unlike the Chinese word, 2bʌ 2dɤu is not a borrowing from Indic, and I wonder what its original meaning was.

Second, 2bʌ- could be a fusion of *N- with *p(h)ʌ or even *Nʌ- which lowered the vowel of a following *p(h)ə or *bə, so its true cognates may have been pronounced p(h)ʌ, p(h)ə, or even vʌ (< *Cʌ-Pə) or və (< *Cə-Pə). Casting a wider net may eventually yield results.

The third is a copout: 2bʌ- is the last survival of a morpheme that was lost elsewhere: cf. were- of werewolf (from an extinct wer 'man'; were- has only recently become productive in neologisms) and -groom of bridegroom (from an extinct guma 'man').

(11.12.8:25: Fourth, a cognate of 2bʌ- could have the first ['level'] tone. I'll look at 1bʌ tangraphs in part 3.)

STUMPED BY 'STUPA' (PART 1: TANGRAPHIC STRUCTURE)

Andrew West put up a page on Tangut text decorations including drawings of stupas. That got me thinking about the Tangut word

3721 5407 2bʌ 2dɤu 'stupa, pagoda'.

Each of the two tangraphs is the clarifier for the other in the various editions of Homophones:

2bʌ 2dɤu (a left-hand clarifier is read after the main tangraph; see scans)

2bʌ 2dɤu (a right-hand clarifier is read before the main tangraph; see scans)

The characters have nearly symmetrical structures. The analysis of the first tangraph is unknown, but I suspect it is similar to the analysis of the second:


3721 2bʌ (first half of 2bʌ 2dɤu 'stupa') =

'earth' < left of 3792 1lwy 'low' (only in dictionaries?) +

all of 5053 1tsəʳ' 'fifth' (used before 1448 2ʔew 'son'; see Andrew's article on Tangut filial ordinals)?


5407 2dɤu 'stupa' =

top of 5053 1tsəʳ' 'fifth' +

right of 1572 1phɤõ 'white' +

'earth' < left of 3792 1lwy 'low'

The 'earth' radical is not surprising, as the Chinese character 塔 for 'stupa' also contains an 土 'earth' radical. But why extract it from 'low' (a curious choice for a tall structure), and what are the functions of 'fifth' and 'white'?

I would have expected

1ŋwʌ 'five'

for the five elements symbolized by a stupa instead of 'fifth' (son).

'White' may refer to the color of a stupa. Why is 'white' in the analysis of 5407 but not 3721? The left-hand component of 5407 analyzed as a blend of 'fifth' and 'white' is unique to that tangraph:


It is that unusual radical that makes 3721 5407 unlike disyllabic words with symmetrical tangraphs: e.g.,

1721 5660 1ma ?kwi 'stirrup'


Kane's transcription of the Khitan large script in chapter five of his 2009 book has few of the diacritics that are common on vowels in his transliteration of the Khitan small script. The exceptions are ü and ï in Khitan large script spellings of Chinese loanwords and ê which may also be exclusive to Chinese loanwords:

[sêng un] 'commander' (transcribed in Chinese as 詳穩; < Chinese 將軍 'general'?)

[an] ~ [ên] for the transcription of Chinese 元 and 原

Liu and Wang (2004: 87) read the latter as [ɑn].

Does that mean the language underlying the Khitan large script had fewer vowels than the language underlying the Khitan small script? Not necessarily.

I think it is more likely that one phonology was written in two different ways. From a phonemic perspective, the Khitan large script may have underdifferentiated the Khitan vowel system whereas the Khitan small script overdifferentiated it by including characters for allophones. And overdifferentiation could have led to spelling problems as the vowel system changed over time and as Jurchen came to write in Khitan.

I could be wrong. Future analysis may reveal that the Khitan large script had at least six characters corresponding to the six back (?) vowel characters of the Khitan small script, and all six might have been phonemic in the tenth century when both scripts were established. But current scholarship indicates a degree of vocalic flexibility in the large script absent in the small script: e.g.,

[un] ~ [ən] (Liu and Wang 2004: 81; Kane 2009: 179 only has one reading [un])

may correspond to up to three characters in the small script:

<un>, <ún>, and/or <en> (= [ən])

Kane only confirmed the [un] : <un> correspondence.

In any case, it doesn't correspond to

<én> (two variants)

whose large script counterpart is

according to Kane 2009: 174. Maybe é was front whereas u, ú, and e [ə] were nonfront.

A future transliteration of the Khitan large script might have capital letters for vowel classes: e.g., <Un> for

with an <u> indicating a nonfront, nonlow vowel. Precise vocalism could only be determined via comparison with small script spellings (if any).

REDHOUSE'S TURKISH VOWELS

After struggling to identify Khitan vowels, I would expect the identification of Turkish vowels in James Redhouse's 1880 dictionary to be trivial. I thought Redhouse would have eight symbols corresponding to the eight vowels of modern Turkish (a e ı i o ö u ü), but in fact he has eleven: five in roman type, three in small capitals, and three in italics:

1. small capital A as in wall

2. roman a as in far

3. italic a as in about

4. small capital E as in pan

5. roman e as in pen

6. roman i as in pin

7. italic i as in girl

8. roman o as in go

9. roman u as in French tu

10. small capital U as in full

11. italic u as in fun

Users could ignore the distinctions indicated with small capitals and italics. Redhouse wrote in 1856 that

were the European character ever to be adopted in Turkey [which happened less than eighty years later!], for the purpose of writing the Ottoman language, there is no reason why the a, the e, the i, and the u should not bear several values as they do with us; whereas in printing, and, if necessary, even writing, the difference could be pointed out by one or two strokes under them, thereby leaving the upper part free for the introduction of special signs to distinguish the long from the short vowels, and the accentuated from the unaccentuated syllables.

He used macrons and acute and grave accents to indicate vowel length and accentuation (? - more on this below).

The five vowels in roman type are obvious:

- a, e, i, o, u = modern a, e, i, o, ü (not u!)

Clues to the other six (in bold) are in his list of the phonetic values of the hàrékÉ on pp. 11-12:

fètha = A, a, a, É, e

késsrÉ = i, i

dàmma ~ zàmma = o, u, U, u

Modern dotless ı, ö, and u must be equivalent to italic i, italic u, and small capital U:

Àlti = altı 'six'

dùrt = dört 'four'

òrdU = ordu 'camp, army' (the same word I wrote about two days ago)

The other three have modern equivalents overlapping with those of a and e:

Àda = ada 'island'

tèkÉ = teke 'shrimp'
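The correspondences above amount to a lookup from Redhouse's typeface-plus-letter symbols to modern Turkish letters. Here is a minimal Python sketch of that lookup. Since plain text cannot show small capitals or italics, each letter is tagged with its typeface by hand; the tag names and the function name are assumptions of this sketch, and Redhouse's accent marks are left out.

```python
# (typeface, letter) -> modern Turkish letter, following the equations in the text.
# The typeface tags ("roman", "italic", "smallcap") are labels invented for this sketch.
REDHOUSE_TO_MODERN = {
    ("roman", "a"): "a", ("roman", "e"): "e", ("roman", "i"): "i",
    ("roman", "o"): "o", ("roman", "u"): "ü",
    ("italic", "i"): "ı", ("italic", "u"): "ö", ("smallcap", "u"): "u",
    ("smallcap", "a"): "a", ("italic", "a"): "a", ("smallcap", "e"): "e",
}

def to_modern(tagged_letters):
    """Convert a list of (typeface, letter) pairs; consonants pass through unchanged."""
    return "".join(
        REDHOUSE_TO_MODERN.get((style, letter), letter)
        for style, letter in tagged_letters
    )

six = [("smallcap", "a"), ("roman", "l"), ("roman", "t"), ("italic", "i")]
four = [("roman", "d"), ("italic", "u"), ("roman", "r"), ("roman", "t")]
army = [("roman", "o"), ("roman", "r"), ("roman", "d"), ("smallcap", "u")]
print(to_modern(six), to_modern(four), to_modern(army))
```

The table is many-to-one: three of Redhouse's a-symbols collapse into modern a, and two e-symbols into modern e, so the conversion cannot be reversed without extra information.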

On page 20, Redhouse differentiated between "hard" (= nonpalatal) small capital A and italic a and "soft" (palatal) roman a.

Only small capital Ā and roman ā can be long, and "there are scarcely any long vowels" in native Turkish words (pp. 13-14). (The original Turkic long vowels were long gone, and modern long vowels that resulted from the loss of /ɣ/ are absent from Redhouse's description*. The 'silent' letter ğ corresponds to Redhouse's gh,

a hard g, taking sometimes a gliding sound [...] sometimes softened down to the value of w when preceded or followed by o or U, and even by i; at other times it becomes almost imperceptible in the pronunciation. [p. 16])

After briefly browsing through Redhouse's dictionary, I tentatively conclude that

- the core (native) eight vowels were

nonpalatal/back: small capital A (which could be long in Arabic and Persian loanwords), italic i, o, small capital U [ɑ(ː) ɯ o u]

palatal/front: e, roman i, italic u, roman u [e i ø y]

- italic a was [ə] which was always short

The small capital A : italic a distinction corresponds to nothing in earlier Turkic or modern Turkish, and I wonder if Redhouse was hearing allophony.

Italic a seems to be in final syllables - but see àna below!

- roman a was a front or central [a] and could be long or short in Arabic and Persian loanwords

but it's also in native àna 'mother' (cf. modern ana ~ anne; the latter violates vowel harmony like Redhouse's form) and kara 'black' (after a back k)!

- small capital E in part corresponds to modern [ɛ], a word-final allophone of /e/

Although Redhouse wrote that small capital E was like the a [æ] of English pan, it does not always correspond to [æ], an allophone of /e/ before sonorant codas: e.g., Redhouse's ben (not bEn!) 'I' corresponds to modern ben [bæn].

On the other hand, kEtEn 'flax' corresponds to keten [ketæn], but it has a small capital E in nonword-final position!

I would have to look through Redhouse more carefully to refine my interpretation of his notation.

I was hoping to see another interpretation of the Redhouse romanization in Yavuz Kartallıoğlu's "The Vowels of Turkish Language in Transcription Texts" (2010). Unfortunately he did not look at Redhouse's dictionary, though he did include Redhouse's earlier French-language grammar of Turkish in his study.

*According to Kerslake (1998: 184), /ɣ/ was already lost prior to the nineteenth century. Redhouse may have been transcribing a spelling pronunciation.

Tangut fonts by
Tangut radical and Khitan fonts by Andrew West
Jurchen font by Jason Glavy
All other content copyright © 2002-2014 Amritavision