I am interested in languages with unusual vowel contrasts because I am looking for plausible interpretations of what seems to have been a huge vowel system in Tangut.

Nishida (1964) proposed that Tangut had retroflex vowels which "are rare, occurring in less than one percent of the languages of the world" (but North American English and Mandarin are part of that one percent!). Although I was initially skeptical, Tibetan transcriptions with r and a near-total correlation between initial r- and the rhymes that Nishida reconstructed with retroflexion* convinced me that he was right.

I propose three sources of retroflexion in Tangut:

- Preinitial *r- (which might be from earlier *l- and/or *t- as well as *r-)

- Initial *r-

- Coda *-r

Oddly, medial *-r- is not one of them; it conditions Grade II vowels.

On Wednesday, I learned that the Dravidian Badaga language had two degrees of retroflexion in its vowels: e.g.,

beː 'mouth' : beːʳ 'bangle' : beːʳʳ 'crop'

(I have rewritten these examples from UCLA using single and double ʳ to represent half and full retroflexion.)

Where does this retroflexion come from? Is it a remnant of earlier retroflex consonants (*R) that lost their retroflexion and/or disappeared? Did more retroflex consonants (i.e., geminates and clusters) condition more retroflexion?

*VR > Vʳ(C): e.g., *aʈ > *aʳ(t)

*VRR > Vʳʳ(CC): e.g., *aʈʈ > *aʳʳ(tt)
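The two rules above can be sketched as a rewrite over proto-forms. This is only an illustration of the hypothesis, not an account of actual Badaga history: the function name `apply_hypothesis`, the `keep_consonants` switch (modeling the optional "(C)" in the rules), and the small inventory of retroflex consonants are all assumptions made for the example.

```python
import re

# Assumed inventory: each retroflex consonant mapped to a plain counterpart.
# Illustrative only; not taken from Badaga data.
DERETROFLEX = {"ʈ": "t", "ɖ": "d", "ɳ": "n", "ɭ": "l", "ɽ": "r"}

# A vowel (optionally long) followed by a run of one or more retroflex consonants.
RETROFLEX_RUN = re.compile(r"([aeiou]ː?)([" + "".join(DERETROFLEX) + r"]+)")

def apply_hypothesis(proto_form, keep_consonants=True):
    """*VR > Vʳ(C) and *VRR > Vʳʳ(CC): more retroflex consonants, more retroflexion."""
    def rewrite(match):
        vowel, run = match.group(1), match.group(2)
        # One consonant conditions half retroflexion; two or more condition full.
        degree = "ʳ" if len(run) == 1 else "ʳʳ"
        # The consonants either survive without retroflexion or disappear.
        residue = "".join(DERETROFLEX[c] for c in run) if keep_consonants else ""
        return vowel + degree + residue
    return RETROFLEX_RUN.sub(rewrite, proto_form)

for form in ("aʈ", "aʈʈ", "at"):
    print(form, ">", apply_hypothesis(form), "or",
          apply_hypothesis(form, keep_consonants=False))
```

Under these assumptions *aʈ comes out as aʳt or aʳ, *aʈʈ as aʳʳtt or aʳʳ, and a form without retroflex consonants is left unchanged.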

Or did different rhotics (e.g., alveolar and retroflex) condition different degrees of retroflexion?

Does Badaga still have retroflex consonants? None of the UCLA data contain retroflex consonants. Although this proposed Badaga script has characters for retroflex consonants and no characters for retroflex vowels, perhaps the orthography is historical: e.g., [aʳt] might be written as <aʈ> with a retroflex consonant rather than a retroflex vowel. (But how would the contrast between 'mouth', 'bangle', and 'crop' be written? As <beː>, <beːr>, and <beːrr>?)

The Wikipedia article on Badaga and this UCLA wordlist imply that half-retroflex vowels have pharyngealized allophones. That reminds me of the pharyngeal vowels that Sofronov (2012) reconstructed for Tangut. What is the origin of pharyngealization in Badaga and Tangut?

11.17.4:39: The two degrees of retroflexion in Badaga remind me of the two degrees of vowel length in Estonian:

vere [vereˑ] 'blood (gen.sg.)' : veere [veːreˑ] 'edge (gen.sg.)' : veere [veːːre] 'roll (imp. 2nd sg.)'

Long and overlong vowels are not distinguished in spelling. Overlong vowels have "a distinctive tonal contour". Did this contour (or some ancestor of it) condition overlength, or does overlength have a segmental origin?

*The only exception is the syllable 2riẽ, which may be from an earlier *2riẽʳ with nasalization and retroflexion. All other Tangut syllables with initial r- have retroflex vowels.

AN IN-CREU-DIBLE CHANGE

I feel somewhat foolish blogging about matters that were probably resolved long ago by philologists: e.g., the correspondence of Sanskrit dental l to retroflex ḷ in modern Indo-Aryan and Dravidian languages. Nonetheless, another part of me does not mind the mental exercise of reinventing the wheel in the dark. When I see something unusual in a language I do not specialize in, I am compelled to try to explain it.

Yesterday, I learned about this unusual shift in Catalan "not found elsewhere" in Romance (emphasis mine):

Unusual development of early /(d)z/, resulting from merger of Proto-Western-Romance /ð/ (from intervocalic -d-) and /dz/ (from intervocalic -ty-, -c(e)-, -c(i)-); see note above about a similar merger in Occitan. In early Old Catalan, became /w/ finally or before a consonant, remained as /(d)z/ between vowels [...]

pedem 'foot' → peu

crucem 'cross' → creu, crēdit 'he believes' → (ell) creu

Verbs in second-person plural ending in -tis: mirātis 'you (pl.) look' → *miratz → mirau → mireu/mirau

How did this happen? d > u reminds me of the simple substitution ciphers I devised as a child; the two sounds have nothing in common beyond voicing. If Latin were unknown, I would not believe that Catalan -u could be from d. But almost any sound can become another given enough steps:

Latin d > *ð > *(d)z (after merger with original *dz) > *ð (again!) > *v > u

I think it's more likely that *ð and *(d)z became interchangeable for a while before *ð dominated and was replaced by *v, which is a more common consonant. (v is in one out of five languages in UPSID, but ð is only in one out of twenty.)

An English example of ð > v is bruvver for brother.

Last night, I realized that history had almost repeated itself. Latin might have undergone a similar change, albeit with a different outcome:

Proto-Indo-European *dh > *ð > *v > Latin f: e.g., PIE *dhuHmos > Latin fūmus > Catalan fum 'smoke' (cf. Sanskrit dhūmas 'id.')

Note that PIE *dh became *v in initial position, whereas the Latin d that became Catalan u was in noninitial position.

NA_A

Yesterday, I used Wikipedia to find various Indo-Aryan and Dravidian names for Diwali and Kali, and today I tried to think of some other Sanskrit name known throughout India containing -l-. Nala first came to mind, no doubt because I had read about him in Lanman's Sanskrit Reader almost two decades ago. Unfortunately, Wikipedia only had three articles in Indian languages about Nala. Nala is नल Nal in Hindi, as I would expect given the regular loss of the final vowel. But Tamil நளன் Naḷaṉ (why the underlining?*) and Malayalam നളൻ Naḷan both have a retroflex ḷ.

Did early Indo-Aryan /l/ have a colloquial retroflex allophone [ɭ] that sounded like retroflex ḷ to Dravidian speakers who distinguished between alveolar l and retroflex ḷ?

Could this retroflex allophone be preserved in modern Indo-Aryan languages like Gujarati, Marathi, and Oriya which have retroflex ḷ in 'Diwali'? (But Gujarati and Marathi have dental l in 'Kali'!)

Is retroflex ḷ corresponding to Sanskrit l an indication of inheritance** (in the case of IA languages) or nonlearned borrowing (in the case of Dravidian languages) whereas dental or lateral l indicates learned borrowing? For example, Marathi दिवाळी Divāḷī is inherited whereas दीपावलि Dīpāvali is borrowed without any change from Sanskrit and दिपवाळी Dipavāḷī and/or दिपावळी Dipāvaḷī are semilearned forms. (I found those words in Molesworth 1857.)

*11.15.1:04: Tamil ன் ṉ is alveolar whereas ந் n is dental and ண் ṇ is retroflex.

**11.15.1:08: Strictly speaking, inheritance of a form that had undergone the shift *l > ḷ. Could this shift be due to Dravidian speakers who adopted early Indo-Aryan and pronounced its /l/ as retroflex [ɭ]?

DIWA_I

Instead of dwelling on a disaster in the recent past, I want to look at the names of a festival being celebrated this week. Today was the start of Diwali in most of India. Its names go back to

Sanskrit dīpāvali- < dīpa 'light, lamp, lantern' + āvali- 'row'

Classical Sanskrit only has a dental l.* However, some modern Indian languages have a retroflex ḷ** in their words for 'Diwali':


Gujarati દિવાળી Divāḷī

Marathi दिवाळी Divāḷī

Oriya ଦୀପାବଳି Dīpābaḷi


Kannada ದೀಪಾವಳಿ Dīpāvaḷi

Tamil தீபாவளி Tīpāvaḷi

Telugu దీపావళి Dīpāvaḷi

but Malayalam ദീപാവലി Dīpāvali with l!

There is no rhotic consonant in Skt dīpāvali-, so that retroflex ḷ cannot be the result of assimilation to a nearby r. Was l > ḷ a regular change? Or was that an irregular change that occurred in one language whose word for 'Diwali' then spread to others?

11.14.8:09: For comparison, 'Kali' has dental l in

Gujarati કાલિ Kāli

Marathi काली Kālī

but retroflex ḷ in

Kannada ಕಾಳಿ Kāḷi

Malayalam (!) കാളി Kāḷi

Oriya କାଳୀ Kāḷī

Tamil காளி Kāḷi

Telugu కాళి Kāḷi

So retroflex ḷ for dental l in 'Diwali' does not necessarily predict retroflex ḷ for dental l in 'Kali'. And Malayalam, which had dental l for dental l in 'Diwali', has retroflex ḷ for dental l in 'Kali'.
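The mismatch can be tabulated mechanically. This is a minimal sketch that simply encodes the two lists above as retroflex (True) versus dental (False); the dictionary and function names are my own labels, and nothing here goes beyond the forms already cited.

```python
# True = retroflex ḷ, False = dental l, as in the two lists above.
DIWALI = {
    "Gujarati": True, "Marathi": True, "Oriya": True,
    "Kannada": True, "Tamil": True, "Telugu": True,
    "Malayalam": False,
}
KALI = {
    "Gujarati": False, "Marathi": False,
    "Kannada": True, "Malayalam": True, "Oriya": True,
    "Tamil": True, "Telugu": True,
}

def mismatches(word_a, word_b):
    """Languages whose reflex of Sanskrit l differs between the two words."""
    shared = word_a.keys() & word_b.keys()
    return sorted(lang for lang in shared if word_a[lang] != word_b[lang])

print(mismatches(DIWALI, KALI))
```

Three of the seven languages disagree between the two words: Gujarati and Marathi in one direction, Malayalam in the other.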

*11.14.8:14: Vedic Sanskrit also had a retroflex [ɭ], which appears to have been an intervocalic allophone of retroflex [ɖ] and has nothing to do with the Classical Sanskrit l in 'Diwali' and 'Kali'.

**This retroflex [ɭ] is not to be confused with the rare syllabic [l̩] of Sanskrit (which isn't even included in this table of the National Library at Kolkata romanization).

HORSEWIND

Another Tangut word for Hurricane Sandy might be

1ʂɛ̣ 'fierce wind'

At first I thought the character was a semantic compound of 'horse' and 'wind'


or evidence for a Tangut B compound noun 'horse wind' meaning 'fierce wind'. However, looking at the analyses of its (near-)homophones

1ʂɛ̣ 'to tie (horses)'

1ʂɛ̣ 'lameness'

2ʂɛ̣ 'evil, harm, calamity'

(conveniently grouped together in Homophones 39A11-14), I think that 'to tie (horses)' is an abbreviated phonetic in the other three*:


'to tie (horses)' in turn contains another abbreviated phonetic (but on the right instead of the left)


1ʂɛ̣ 'to tie (horses)' = left of 2lɨị 'to tie (horses)' + top and bottom right of 1ʂɨe 'to take, extract'

The others are also phonetic-semantic compounds (but with phonetics on the left):


1ʂɛ̣ 'fierce wind' = left of 1ʂɛ̣ 'to tie (horses)' + left of 1lɨə 'wind'


1ʂɛ̣ 'lameness' = left of 1ʂɛ̣ 'to tie (horses)' + right of 2dʐɛ̃ 'lame'


1ʂɛ̣ 'evil, harm, calamity' = left of 1ʂɛ̣ 'to tie (horses)' + all of 2ŋiạ 'scar, defect, drawback'

The actual Tangraphic Sea analyses of 'fierce wind' and 'lameness' are circular:


1ʂɛ̣ 'fierce wind' = left of 1ʂɛ̣ 'lameness' + left of 1lɨə 'wind'


1ʂɛ̣ 'lameness' = left of 1ʂɛ̣ 'fierce wind' + right of 2dʐɛ̃ 'lame'
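The circularity can be checked by treating each analysis as a set of edges from a graph to the graphs it borrows components from. This is a rough sketch: the glosses stand in for the tangraphs, positional details (left, right, all) are ignored, and the names `TANGRAPHIC_SEA`, `PROPOSED`, and `find_cycle` are labels invented for the example.

```python
# Each graph (identified by its gloss) maps to the graphs its analysis draws on.
TANGRAPHIC_SEA = {
    "fierce wind": ["lameness", "wind"],
    "lameness": ["fierce wind", "lame"],
}
# My proposed analyses, all built on 1ʂɛ̣ 'to tie (horses)' as a phonetic.
PROPOSED = {
    "to tie (horses)": ["2lɨị to tie (horses)", "to take, extract"],
    "fierce wind": ["to tie (horses)", "wind"],
    "lameness": ["to tie (horses)", "lame"],
    "evil, harm, calamity": ["to tie (horses)", "scar, defect, drawback"],
}

def find_cycle(analyses):
    """Return one circular chain of glosses, or None (depth-first search)."""
    state = {}  # gloss -> "active" while on the current path, "done" afterwards

    def visit(node, path):
        state[node] = "active"
        path.append(node)
        for source in analyses.get(node, []):
            if state.get(source) == "active":
                # Walked back onto the current path: report the loop.
                return path[path.index(source):] + [source]
            if source not in state:
                found = visit(source, path)
                if found:
                    return found
        path.pop()
        state[node] = "done"
        return None

    for node in analyses:
        if node not in state:
            found = visit(node, [])
            if found:
                return found
    return None

print(find_cycle(TANGRAPHIC_SEA))
print(find_cycle(PROPOSED))
```

The Tangraphic Sea pair loops from 'fierce wind' to 'lameness' and back, whereas the proposed analyses bottom out in 'to tie (horses)' and contain no loop.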

I suspect the three ʂɛ̣ not meaning 'tie' are related words. Sandy certainly was a calamity. And 'lameness' is a form of harm, so it's not surprising that the Combined Homophones and Tangraphic Sea analysis of 'harm' is


2ʂɛ̣ 'evil, harm, calamity' = left of 1ʂɛ̣ 'lameness' + all of 2ŋiạ 'scar, defect, drawback'

Sets like these make me want to believe that tangraphy is like sinography with more abbreviations. But there are still aspects of tangraphy that lack Chinese precedents: e.g., the mysterious graph-final elements like the right sides of

2dʐɛ 'wheel' and 1lɨə 'wind'

which I thought might reflect Tangut B suffixes added to Tangut B roots represented by the semantic elements

'wheel' and 'wind'

*11.13.7:52: I'd like to write the Tangut equivalent of Grammatica Serica Recensa: Grammatica Tangutica, a guide to phonetic series of tangraphs.

WHEELWIND

I was unable to blog for a week because I was without power until November 4 due to Hurricane Sandy, and I've spent the last week catching up on other matters.

The Tangut, an inland Central Asian people, probably had no word for hurricanes. They might have likened them to

1swiə 2gie 'whirlwind'

which appears to be an indivisible disyllabic word.

According to the Tangraphic Sea, the first character is a combination of 'wind' and 'wheel':


1swiə (first half of 'whirlwind') = left of 1lɨə 'wind' + left of 2dʐɛ 'wheel'

I suspect that the Tangut radical


is from Chinese 風 'wind' with the first two strokes squared off as ㄇ, the third stroke removed, and the remaining six strokes (虫) rearranged. The four top strokes correspond to 中 (but the horizontal lines no longer touch the vertical line and ㄱ has lost its bend), and the two strokes across ㄇ are horizontal instead of tilted.

The Tangut radical


may be from the right of Chinese 輪 minus the 人 on top. The Tangut script prefers X-like shapes to 十-like shapes inside enclosures. 十-like shapes may be surrounded on two sides but not three. Tangraphy seems to have a conscious aesthetic philosophy, unlike the Khitan or Jurchen scripts.

The analysis of the second character of 'whirlwind' is unknown*, but it is obviously a mirror image of the first character which was presumably created first:

I wonder how many Tangut miswrote 'whirlwind' with the elements reversed:

*Both Li Fanwen (2008: 41) and Shi et al. (2000: 257) provide analyses for the second character

Li Fanwen:



2gie (second half of 'whirlwind') = right of 1swiə (first half of 'whirlwind') + left of 1lɨə 'wind'

Shi et al.:


2gie (second half of 'whirlwind') = left of 2dʐɛ 'wheel' + left of 1lɨə 'wind'

but the part of the Precious Rhymes of the Tangraphic Sea containing its analysis (reproduced in Shi et al. 2000: 833) is damaged, so neither analysis can be confirmed.

