18.104.22.168:51: UNICODE 6.1: THE MEROITIC SCRIPT
Although the Meroitic scripts (3rd c. BC-4th c. AD) which are new in Unicode 6.1 resemble Egyptian hieroglyphics and Egyptian Demotic, they are structurally similar to the Old Lisu script invented 16 centuries later. Consonant letters represent syllables beginning with that consonant plus a unless a vowel symbol follows: e.g., the name Kaditareye was spelled
without any letters for a after k and t. Like Old Lisu, the Meroitic scripts had special letters for initial a.
The Meroitic hieroglyphic script looks like pictures but is really an alphabet with elaborate letter shapes which have simpler counterparts in the Meroitic cursive script.
Both scripts have 'extra letters' for a few syllables ending in -e and one syllable ending in -o: <ne se te to>. These four syllables were never written as <n> + <e>, <s> + <e>, <t> + <e>, and <t> + <o>. Why were they singled out for special treatment? Rowan's (2006) hypothesis as summarized by Wikipedia accounts for <ne se te>, but why is <to> the only o-syllable with a special character? Was to a high-frequency word in Meroitic? I think unusual spellings are fine for high-frequency words: e.g., the Tagalog grammatical markers [naŋ] and [maŋa] are spelled ng and mga.
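The spelling rules described above can be sketched in a few lines of Python. This is only a toy model working on romanized syllables, not on real Meroitic characters; the function name `spell` and the assumption that every consonant is a single Roman letter are mine.

```python
# Syllables that have their own 'extra letters' according to the post.
EXTRA_LETTERS = {"ne", "se", "te", "to"}

def spell(syllables):
    """Turn romanized syllables into a list of (romanized) Meroitic letters."""
    letters = []
    for syl in syllables:
        if syl in EXTRA_LETTERS:
            # One dedicated letter for the whole syllable.
            letters.append(syl)
        elif syl == "a":
            # Special letter for initial a.
            letters.append("a")
        else:
            # Assumption: one-letter consonant followed by the vowel.
            consonant, vowel = syl[0], syl[1:]
            letters.append(consonant)
            # The inherent vowel a is left unwritten.
            if vowel and vowel != "a":
                letters.append(vowel)
    return letters

print(spell(["ka", "di", "ta", "re", "ye"]))
```

For Kaditareye the sketch yields the letters k, d, i, t, r, e, y, e, with nothing written for a after k and t.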
To take us full circle, Tagalog was originally written with the Baybayin script. The inherent vowel of Baybayin consonant characters was ... a, as in Meroitic and Old Lisu.
22.214.171.124:17: UNICODE 5.2 (NOT 6.1!): THE OLD LISU SCRIPT: ANSWER KEY
See my previous post for the questions.
1. In the Old Lisu sample, a was the inherent vowel of consonants without following vowel letters: e.g.,
M by itself is ma
but MI is mi (not mai) and MU is mu (not mau)
The Old Lisu letter A represents ʔa with a glottal stop initial. Old Lisu MA doesn't exist since maʔa has two syllables and syllables are written apart in Old Lisu as in Vietnamese. (Hence OL and Vietnamese spacing do not necessarily correlate with word spacing. Both languages have polysyllabic words written with spaces between syllables.)
2. Punctuation marks indicate the tones of preceding syllables:
comma = mid rising tone
period = high tone
semicolon = low tense tone
colon = low tone
3. Since a comma is a tone mark in Old Lisu, dash + period functions as a comma.
4. The Old Lisu letter F represents ts, the unaspirated (h-less) counterpart of inverted F representing aspirated tsh.
The function of inversion can be inferred from the noninverted-inverted pair in the sample:
Old Lisu K = unaspirated k
Old Lisu inverted K = aspirated kh
5. Inverted P, T, C represent ph, th, ch, the aspirated counterparts of P, T, C representing unaspirated p, t, c.
6. Old Lisu syllables from the Bible sample in Daniels (1996: 582):
|inverted T + inverted A:||thæ||low|
|inverted K + U.,||khu||high + mid rising|
|T + inverted E,||tø||mid rising|
Inverted vowels represent phonetically similar vowels.
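Answers 1 and 2 can be turned into a small reader for Old Lisu syllables. The sketch below uses plain ASCII capitals as stand-ins for the Old Lisu letters and only works for letters whose values match their Latin lookalikes (M, K, I, U and the like); the function name `read_syllable` is made up, and inverted letters and the dash + period 'comma' are not handled.

```python
# Tone values of the punctuation-like marks, as listed in answer 2.
TONE_MARKS = {
    ",": "mid rising",
    ".": "high",
    ";": "low tense",
    ":": "low",
}

# Assumption: ASCII stand-ins for the (noninverted) vowel letters.
VOWEL_LETTERS = set("AEIOU")

def read_syllable(written):
    """Return (segments, tones) for one written syllable such as 'MI:'."""
    # Tone marks follow the letters of the syllable.
    body = written.rstrip(",.;:")
    marks = written[len(body):]
    tones = [TONE_MARKS[m] for m in marks]
    segments = body.lower()
    # A consonant with no vowel letter after it carries inherent a.
    if not any(ch in VOWEL_LETTERS for ch in body):
        segments += "a"
    return segments, tones

print(read_syllable("M"))     # inherent a
print(read_syllable("KU.,"))  # two tone marks in a row
```

A syllable with no marks comes back with an empty tone list, which matches the common practice of leaving tone unwritten.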
126.96.36.199:47: UNICODE 5.2 (NOT 6.1!): THE OLD LISU SCRIPT
I think I assumed that the Old Miao (a.k.a. Pollard) script was already in Unicode because the Old Lisu (a.k.a. Fraser*) script was in Unicode 5.2, released in 2009.
For a change, I'm going to let you figure out how the Old Lisu script works. A sample of Old Lisu text is at the top of this file by Michael Everson. Here is my attempt to phonetically transcribe the first few syllables:
tsho + ? tone
kha + mid rising tone
næ + high + low tense tone
si + ? tone
shya + ? tone
ma + ? tone
ja + ? tone
gu + ? tone
kwa + ? tone
myø + low tone
xo + low tone
su + ? tone
nya + comma
Questions:
1. What is the Old Lisu letter for a?
2. How are tones indicated in the Old Lisu script? (Tone is not indicated for most of the syllables in the sample.**)
3. What is the Old Lisu equivalent of a comma?
4. What does the Old Lisu letter F represent? (Do you need a hint?***)
5. If the Old Lisu letters P, T, and C represent p, t, c, what do inverted P, T, and C represent?
6. Write the following syllables (from the Bible sample in Daniels 1996: 582) in Old Lisu:
mi + low tone
thæ + low tone
khu + high + mid rising tone
tø + mid rising tone
nya + mid rising tone + comma
If you get stuck, you could look at the Wikipedia entry for the Old Lisu script and/or this WG2 document. (What is WG2?) The latter has lots of examples of Old Lisu in use - even in comics!
The New Lisu script developed by Chinese linguists is based on the Pinyin romanization of Mandarin. I mechanically converted the above sample into New Lisu according to the tables in Bradley (2003: 12):
co kaq nailr si xya ma jja ggu gwa myeit hot su nya,
Tones are represented as final consonants in New Lisu. This conversion probably doesn't have enough tone letters and may have too many tone letters in the syllable nailr, since I doubt tone letter sequences are permitted in New Lisu.
In the New Lisu table, ei represents both [e] and [ø]. Is that correct?
According to Bradley (2003: 6), New Lisu script "has been rejected by most literate Lisu people" and Old Lisu script "is what literate Lisu people prefer".
*The Old Lisu script was created by three missionaries: Ba Thaw of Burma, JO Fraser of Scotland, and JG Geis of the US. Although the script is commonly known as the Fraser script, that name erroneously implies that he was its sole creator.
Fraser's Handbook of the Lisu (Yawyin) language (1922) is at archive.org.
**David Bradley (2003: 6) has statistics on tone mark omission in Old Lisu:
This tendency to omit tone marking in the Fraser script shows a great deal of individual difference within Lisu; some people try to omit as much as possible, others try to write them all or nearly all. Even in highly formal written contexts such as published Scriptures, most of the tone marking can be omitted. For example, in the four most recent editions of the New Testament from 1978 onward, John 1: 1-5 has 88.9, 87.2, 84.4 and 82.1 per cent of tones omitted; in most cases the tones actually marked in each edition are slightly different. In private letters, depending on the writer, omission can be even greater. One letter which I received recently has 96.6 per cent of the tones omitted, but this is extreme. This omission of tone marking was normal from the very beginning: the 1921 catechism has no tones at all marked in its title!
***Hint: look at the Old Lisu letters K and inverted K in the sample.
188.8.131.52:59: UNICODE 6.1: THE OLD MIAO SCRIPT
Here's the second item in Andrew West's "What's new in Unicode 6.1" that caught my eye. I assumed that the script Samuel Pollard (British), Wang Mingji, John Zhang, James Yang (all Miao), and Stephen Lee (Chinese) created for the Miao was already in Unicode and was surprised to learn that it wasn't. The Old Miao script (a.k.a. the Pollard script) was influenced by James Evans' Cree syllabics but I don't see a resemblance.
In the Cree system, shapes represent consonants and rotations indicate vowels: e.g.,
V = pee
Λ = pi
> = poo
< = pa
Vowelless p was written as a small version of < pa in the newer eastern version of the syllabary <. (The original character for p was a small vertical bar ' unrelated to the V-shapes for p-syllables.)
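The 'one shape, four rotations' principle can be sketched as a lookup table. The keys below are the same ASCII and Greek stand-ins used above rather than real Cree syllabics characters, and the function name `read_p_syllable` is made up for illustration. Only the p-series from the examples is covered.

```python
# Orientation of the p-shape -> vowel, following the four examples above.
P_SERIES_VOWELS = {
    "V": "ee",   # point down
    "Λ": "i",    # point up
    ">": "oo",   # point right
    "<": "a",    # point left
}

def read_p_syllable(glyph):
    """Read one rotated p-shape as a syllable."""
    return "p" + P_SERIES_VOWELS[glyph]

print([read_p_syllable(g) for g in "VΛ><"])
```

Other consonant series would need their own base shape but could reuse the same four orientations.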
In the original Old Miao script, consonant characters do not rotate. However, vowel symbols are in 'orbit' around the consonant and may be in one of five positions:
|vowel position 1||
|consonant character||vowel position 2|
|vowel position 3|
|vowel position 4|
|vowel position 5||
Each position represented a different tone: e.g.,
ʌ̄ ʑa (high tone; - a in position 1) 'shall'
Ɔ|| mau (mid-high tone; || au in position 2) (last syllable of tu ʈau mau 'messenger')
Ɔ- ma (mid rising tone; - a in position 3) 'exist'
C) nü (low tone;) ü in position 4) 'he'
C_ na (low tone; - a in position 4) 'look'
(I don't know what tone position 5 signified. I'm guessing some sort of low tone distinct from the other low tone. I got the tone descriptions from Daniels 1996 which doesn't mention position 5.)
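Here is a rough sketch of how vowel position encodes tone in the original Old Miao script, using only the position-tone pairings in the examples above. Position 5 is left as unknown because I don't know its value; the function name `describe` and the romanized input are my own assumptions.

```python
# Tone signalled by the position of the vowel symbol around the consonant.
# Position 5 is deliberately absent: its tone is not given above.
POSITION_TONE = {
    1: "high",
    2: "mid-high",
    3: "mid rising",
    4: "low",
}

def describe(consonant, vowel, position):
    """Describe a romanized syllable and the tone implied by vowel position."""
    tone = POSITION_TONE.get(position, "unknown")
    return f"{consonant}{vowel} ({tone} tone)"

print(describe("m", "a", 3))   # 'exist'
print(describe("n", "ü", 4))   # 'he'
```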
Unicode 6.1 isn't out yet and I don't have any Old Miao font, so I can only approximate the script with the characters I have on hand.
Although the consonant symbols in the examples above don't have familiar sound values, others pose no problem to anyone who knows the Latin alphabet: T, S, V, L. R is [ʐ] and U is [w]. I don't understand why Latin consonant letters weren't recycled as is, adding new ones for Miao sounds like [tl] without single-letter basic Latin equivalents.
Latin U and O were recycled as vowel symbols but new symbols were created for other vowels and diphthongs. I can understand why A and E weren't recycled. They would be hard to write at reduced size. However, I am surprised that the shape I represents [ai] rather than [i] which is ∩. (6.16.00:16: Then again, perhaps 'long I' = [ai] of English was the inspiration of I for [ai].)
Can you read the following Miao words?
1. TU (first syllable of 'messenger' - hint!)
2. T∩ (first syllable of 'place')
3. Ɔ∩ 'you'
A later revision of the Old Miao script had all vowels in position 4 followed by a character for each of eight (not five!) tones (Wang Yangcai 2005):
|Tone description||high level||mid-high rising||high falling||low-mid rising||mid||mid-low falling||mid-low||low|
Note that the characters resembling upper and lower case T represent different tones.
See these documents for more details on the Pollard script.
6.16.1:04: Although the older version of the Old Miao script is harder to type, I like being able to visualize the pitch height of a syllable simply by looking at the position of its vowel characters.
184.108.40.206:07: UNICODE 6.1: KHMU IN LAO SCRIPT
Andrew West has posted a list of "What's new in Unicode 6.1". Two items caught my eye. I'll deal with the second tomorrow.
The first was the addition of two Lao letters for Khmu. Khmu and Lao have very different sound systems. Khmu has more consonants than Lao and no tones, whereas Lao has six tones. Lao tones are indicated by a combination of 'extra' consonants and tone markers: e.g., Lao has two letters for kh, ຂ and ຄ, each associated with somewhat different sets of tones when combined with tone markers (Diller 1996: 464):
||Open syllable||Closed syllable|
|Initial letter of syllable||No tone marker||Marker 1||Marker 2||Short vowel||Long vowel|
|ຂ kh- < *kh-||low rising||mid||low falling||high||low falling|
|ຄ kh- < *g-||high||high falling||mid||high falling|
Khmu in Lao script (KL) has no need for such consonant pairs: e.g., KL only has ຄ for kh. 'Extra' consonant letters can be used to write consonants unique to Khmu:
|ຽ||*-y-||part of letter sequence for -ia||j-|
The first Lao tone marker indicates preglottalization of sonorants in KL: e.g.,
Lao ນ່- <n1-> n- = Khmu ʔn-
Lao ນ- <n-> n- = Khmu n-
Two Khmu consonants absent in Lao, g- and final -ɲ, have KL letters of their own. I can't type them on this blog because Unicode 6.1 isn't released yet (and even when it is, most Lao fonts won't be able to handle them), but I can approximate them as
Lao ກ <k> k- + mid-letter may kan = Khmu g-
Lao ຍ <y> -y + mid-letter may kan = Khmu -ɲ
May kan is a Lao diacritic resembling a circle over the left edge of a breve. It represents a when written between the tops of two consonant letters and it also has this function in Khmu: e.g.,
ມັຫ <mah> mah 'to eat'
The absence or presence of a similar shape distinguishes the Thai letters
บ <ʔb> and ษ <ṣ>
which are not phonetically similar. (There should be a Lao equivalent of Thai ษ <ṣ>. What happened to it?*)
The Lao letter ຍ <y> used as the base of Khmu -ɲ represents ɲ- initially in Lao but represents -y in final position:
ຍາຍ <yaay> ɲaay < *yaay 'maternal grandmother' (cf. Thai ยาย yaay 'id.')
ຍ <y> has the same double function in Khmu:
ເຍຶອຣ <yɨar> ɲɨar 'squirrel'
ຽຣອຍ <ÿrʔy> jrɔɔy 'mortar'
This is why ຍ <y> could not be used for both initial ɲ- and final -ɲ in Khmu.
Summing up the usage of ຍ <y> in Lao and Khmu:
|ຍ||initial||final||final + mid-letter may kan|
*I assume Lao script once had letters for Indic-only sounds like Thai and Khmer: i.e.,
- voiced aspirates: <gh>, etc.
- retroflexes: <ṭ>, etc.
- syllabic <ṛ> and <ḷ>
- palatal <ś>
- retroflex <ḷ>
I also assume Lao once had letters for Tai-only sounds (<x>, <ɣ>, <z>) and a second <h> for syllables with anomalous tones. Thai still has such letters today, though the two letters for etymological velar fricatives are obsolete except in lists of the letters of the alphabet.
However, I have never seen letters for any of the above sounds in any modern description of the Lao script. Unicode has codepoints reserved for those letters: e.g., a Lao equivalent of Thai ษ <ṣ> for a Sanskrit retroflex sibilant would be assigned to 0EA9. Do those letters exist? Where can I see them online? Will those codepoints eventually be assigned?
11.6.13.00:23: THE CONTINUUM OF INDOXENIC TRANSLITERATION
In my posts on Thai and Khmer, I've been writing words from those languages in three ways:
- native script
- <transliteration in brackets>
- transcription in italics
Aside from spelling variations for some words, everyone can agree on how to spell a given word in Thai or Khmer. However, I still haven't settled on a single transliteration or transcription system for either language.
It seems that everyone has their own Khmer transcription. I created my own when I was studying Khmer in 1995. Pinnow (1980: 106-107) compared just some of the many transcriptions in use.
I use transliteration to facilitate comparison with Sanskrit and Pali and indicate Indic distinctions in the native scripts in terms of Roman letters. Right now I use a maximally Indic transliteration: e.g.,
Thai รถ <ratha> rot
Khmer រថ <ratha> rʊət
both from Skt ratha 'chariot' (cognate to Latin rota 'wheel')
But I could drop the final 'inherent' <a> which is rarely pronounced in final position in Thai and Khmer: <rath>.
In Thai, I could even drop the inherent vowel between consonants: รถ <rth> or <rtʰ> (distinct from a three-letter sequence รตห <r t h>) unless I wanted to transcribe the phinthu used in the Thai transcription of Indic*. But I couldn't do so in Khmer since Khmer uses consonant-subscript consonant combinations to indicate the lack of an inherent vowel between consonants:
រថ <rath> vs. រ្ថ <rtha>
Thai has no such combinations. Skt ratha and rtha in theory both correspond to
Thai รถ <ratha> ~ <rath> ~ <rth> ~ <rtʰ>
but in fact Skt rtha corresponds to Thai รรถ <raratha> ~ <rrth> (and transcriptions in between), pronounced at: e.g.,
อรรถ <ʔararatha> ~ <ʔrrth> at 'code' < Skt artha 'purpose'
One might think that Indic two-element consonant clusters are all indicated by doubling of the first consonant plus the second consonant, but that doesn't happen if the first consonant isn't <r>: e.g., Skt vidyaa 'knowledge' was borrowed as
วิทยา <vidayaa> ~ <vidyaa>
not วิทททยา <vidadayaa> ~ <viddyaa>
Out of context, a sequence of identical letters can be pronounced as C, C...C, CVC, two different consonants, or even V or VC: e.g.,
<CC> = C: ฟุตบอลล <vutbʔll> futbɔɔn 'football' (extinct spelling in Haas 1956: 55; now spelled with a single <l>: ฟุตบอล <vutbʔl>)
<CC> = CC: เทนนิส <ednnis> thennit 'tennis'
Here and below I strictly follow Thai letter order in my transliteration, placing <e> and <ɛ> before the consonants they are pronounced after.
<CC> = C...C: แหม่ม <ɛhm1m> mɛm 'Western woman' (< Eng ma'am?)
<CC> = C1C2: อกตัญญู <ʔktññu> akkatanyu < P akataññu 'ungrateful'
<CC> = CVC: กก <kk> kok 'reed'
<CC> = V: กรรม <krrm> kam < Skt karma
<CC> = VC: สวรรค์ <swrr(g)> sawan < Skt svarga 'heaven'
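The strict letter-order transliteration used in these examples is mechanical enough to sketch in code. The table below covers only the letters needed for a few of the words above and follows the values used in this list (e.g., <b> for บ and <v> for ฟ); the function name `transliterate` is made up, and anything outside the table raises a KeyError.

```python
# Thai letter -> Roman transliteration, limited to letters in the examples.
LETTERS = {
    "ก": "k", "ท": "d", "น": "n", "ส": "s", "ฟ": "v", "ต": "t",
    "บ": "b", "อ": "ʔ", "ล": "l", "ห": "h", "ม": "m", "ร": "r",
    "\u0e40": "e",   # SARA E, written before the consonant
    "\u0e41": "ɛ",   # SARA AE, written before the consonant
    "\u0e34": "i",   # SARA I, written above the consonant
    "\u0e38": "u",   # SARA U, written below the consonant
    "\u0e48": "1",   # MAI EK, first tone marker
}

def transliterate(word):
    """Transliterate letter by letter in stored (written) order."""
    return "<" + "".join(LETTERS[ch] for ch in word) + ">"

for word in ["กก", "กรรม", "เทนน\u0e34ส"]:
    print(word, transliterate(word))
```

Because nothing is reordered, <e> comes out before the consonant it is pronounced after, as in <ednnis>. Going from such a transliteration to a pronunciation is the hard part and is not attempted here.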
Conversely, single consonants can stand for double consonants and clusters of two different consonants:
<C> = CC: อกตัญญู <ʔktññu> akkatanyu < P akataññu 'ungrateful'
<C> = C1C2: ศิลป <ślp> sinlapa < Skt śilpa 'arts, crafts'
These unwritten clusters are absent from Indic sources of loanwords.
There are many rules to determine how to pronounce single and double written consonants in context, so it is not as if each word is sui generis. Nonetheless, Thai text-to-speech conversion is difficult due to the complex relationship between orthography and pronunciation.
*The phinthu indicates a Thai consonant letter without its inherent vowel: e.g., Skt artha is transcribed in Thai as อรฺถ <ʔartha> with a phinthu beneath ร <ra> to indicate <a> is not read. Note that the regular Thai spelling of the Thai borrowing of Skt artha is อรรถ <ʔarrtha>.
220.127.116.11:22: IN BAD STANDING
Thai กุฎฐัง <kuṭṭhaṅa> kutthaŋ 'leprosy' is spelled with the etymologically retroflex letters ฎ <ṭa> and ฐ <ṭha> usually indicating an Indic origin. The word resembles Pali kuṭṭha 'leprosy' except for the final <-ṅa> -ŋ which does not match any suffix in Pali, Sanskrit, Thai, or Khmer. Are there any other Indic loans with an unexpected <-ṅa> -ŋ? Could the word be a blend of a Pali borrowing with a similar-sounding native Thai word for 'leprosy' ending in -ŋ? None of the native Thai terms for 'leprosy' at SEAlang end in -ŋ:
|ขี้ทูด||<khii duuʔda>||khii thuut||khii is 'excrement'; thuut may be a native root for 'leprosy' that doesn't occur by itself|
|ขี้เรื้อน||<khii2 ria2na>||khii rɨan||rɨan may be another (!) native root for 'leprosy' that doesn't occur by itself|
|โรคเรื้อน||<roga rɨa2na>||rook rɨan||rook < Skt roga 'disease' < root ruj 'break'|
|หูหนาตาเล่อ||<huu hnaa taa ləə1>||huu naa taa ləə||lit. 'ear thick eye clumsy'; this is a four-syllable elaborate expression (more examples)|
I'm surprised that Thai has two native roots for 'leprosy' and would be surprised if it had a third that happened to sound like P kuṭṭha < Skt ku-ṣṭha < ku-stha, lit. 'bad-stand'.
sṭha is cognate to Eng stand. s- becomes retroflex ṣ- after u.
Does ku- have an Indo-European etymology? The Skt 'bad' prefix with an IE etymology is dus- (cf. Greek dys-): e.g., duḥ-stha 'unsteady' < 'bad-stand'. -s- becomes -ḥ- before another s-.
6.12.22:40: Khmer has a native root *glaŋ for 'leprosy' ending in -ŋ, but its vague resemblance to Thai kutthaŋ is coincidental:
ឃ្លង់ <ghla'ṅa> khlʊəŋ (bare root; <ghl> = */gl/)
គំលង់ <gaṃla'ṅa> k-um-lʊəŋ (infixed)
Khmer ចង <caṅa> cɑɑŋ 'mild leprosy' may be an extended usage of its homophone 'to be tied'.
Khmer ព្យាធិ៍ <byaadh(i)> pyiet 'leprosy' is a loan from Skt vyaadhi with the <b> : Skt v correspondence I discussed here.
I can't find a Khmerized version of P kuṭṭha or Skt kuṣṭha.
6.13.1:49: I just realized that the final nasal of Thai กุฎฐัง <kuṭṭhaṅa> kutthaŋ 'leprosy' corresponds to the neuter nominative/accusative singular ending -ṃ [ŋ] of Pali:
P kuṭṭhaṃ [kuʈʈhaŋ] > Thai kutthaŋ
Thai generally borrows Sanskrit and Pali stems without inflections but this is a rare exception. Are there other nouns with -ŋ < Pali -ṃ?
Chinese also generally borrows Indic stems sans inflexions: e.g.,
Middle Chinese 佛陀 *butda < Skt buddha (nom. sg. buddha-s)
The one exception I can think of is neuter like 'leprosy':
Middle Chinese 悉曇 *sitdəm (rather than *sitda) < Skt siddha-m 'accomplished'
which is the name of my favorite Indic script.
18.104.22.168:06: KEṆḌ-Y COATED CONUNDRUM
Thai เกณฑ์ <keṇ(ḍa)> keen 'criterion; to recruit' is spelled as if it were from an Indic keṇḍa. But there is no such Sanskrit or Pali word. There is a Khmer word កេណ្ឌ <keṇḍa> kaen ~ keen 'to recruit', which has an alternate spelling កែន <kɛna> without the Indic retroflex letters <ṇḍa> and with the Khmer-only vowel letter <ɛ>. I can't find any early attestations of this word in Jenner and Pou (1980-81: 20) or SEAlang's Corpus of Khmer Inscriptions. What's going on here? Here's my guess:
1. There was a native Khmer word *kɛɛn ~ *keen 'to recruit'.
2. This word was 'jazzed up' with a spelling <keṇḍa> indicating a nonexistent Indic origin. (I need a term for this kind of orthographic 'promotion'.) Orthographic word-final <nCa> and <n(Ca)> are both [n] in Khmer and Thai. (Parentheses indicate Khmer and Thai letters with a silencing mark above them. Not all Khmer and Thai silent letters bear such a mark: e.g., Khmer silent <ḍa> in <keṇḍa> is unmarked.)
3. The word was borrowed into Thai as <keṇ(ḍa)> keen. (Indic e is [ee] and Thai ee preserves its length.)
If เกณฑ์ <keṇ(ḍa)> keen were a genuine Indic loan, it would combine with other words as เกณฑ- <keṇḍa> keentha- without the silencer mark and with an extra bridge vowel -a-, but the Royal Institute of Thailand's dictionary lists no compounds with keentha-.
Problem: Why does keen also mean 'criterion' in Thai? 'Criterion' is the sort of word that is likely to be borrowed. Did this meaning also once exist in Khmer? Or is keen 'criterion' an unrelated native Thai homophone?
22.214.171.124:39: KOT IN A BIND
Where does Thai กฎ <kaʔḍa>* kot 'rule' come from? The Royal Institute of Thailand's dictionary lists no etymology. The spelling implies that it's from a Sanskrit or Pali kaṭa but there is no Sanskrit or Pali kaṭa meaning 'rule': see Monier-Williams, p. 243 and the Pali Text Society, p. 176. I doubt kot is a native Thai word since 'rule' is likely to be borrowed and its final -t is spelled with etymologically retroflex ฎ <ʔḍa> for Indic loanwords rather than the usual -ด <ʔda> for native Thai words. (Although Thai has never had retroflexes, Indic retroflex-dental distinctions are preserved in Thai spelling.)
Could kot be from Khmer? I can't find any Khmer word spelled like កត <kata> or កដ <kaṭa> with a similar meaning. The closest Khmer word is ក្រិត្យ <kritya> krət 'duty; law' < Skt kṛtya 'what is to be done' which corresponds to Thai กฤตย- <kṛtya> krittaya- 'to do' (archaic).
The resemblance between kot and English code is coincidental. Thai has already borrowed English code as โค้ด <goo2ʔda> khoot with a long vowel and an etymologically nonretroflex final letter. (The 2 transliterates the superscript tone marker indicating a falling tone after <g>. <g> kh- was chosen to approximate English aspirated c- [kh] without any regard for its earlier sound value.)
*I transliterate ฎ ด บ as <ʔḍ ʔd ʔb>. I could transliterate the last two as <ɗ ɓ> but there's no single-letter IPA symbol for a retroflex implosive.
126.96.36.199:41: NOP(E): (K)NOW (K)NEW NINE IN THAI (AND KHMER)
Indo-European words have spread all the way to East Asia via Sanskrit and Pali. For example, Proto-Indo-European *gnh3- 'know' is the root of Skt prajñaa 'wisdom' which in turn is the source of Pali paññaa and
Japanese 般若 hannya < *pannya 'mask for female demons in Noh' (why?)
via Middle Chinese 般若 *paɲɨaʔ which sounds more like Pali paññaa than Skt prajñaa
Khmer បញ្ញា <paññaa> paññaa 'wisdom' (via Pali)
also cf. ប្រាជ្ញា <praajñaa> praacñaa 'intelligence' < Skt praajñaa with a long first vowel
Thai ปัญญา <paññaa> panyaa 'wisdom' (via Pali)
The other PIE root for 'know', *uid (> Skt vid) doesn't appear in northeast Asia but is in Khmer and Thai as
K ពិទ- <bid-> pit- (in ពិទ្យាធរ <bidyaadhara> pityiethɔɔ 'learned person' < 'knowledge-bearer')
K វិទ- <vid-> vit- (in វិទ្យា <vidyaa> vityie 'knowledge')
T พิท- <bid-> phit(tha)- (in พิทยา <bidyaa> phitthayaa 'knowledge')
T วิท- <wid-> wit(tha)- (in วิทยา <vidyaa> witthayaa 'knowledge')
which by coincidence has developed into a soundalike of Eng wit
Thai ฟ <v> is never used in borrowings from Indic. Perhaps Khmer <v> was [w] or [β] or [ʋ] when the Thai used the Khmer alphabet as a basis for their own alphabet in the 13th century. Sanskrit and Pali v was [ʋ].
<b> ~ <w> variation is also in the Thai borrowings of 'new' and 'nine' (both nava- in Sanskrit and Pali):
|new||นพ <naba-> nop(pha)-||นว- <nawa-> nawa-|
|nine||นพ <naba-> nop(pha)-||นว- <nawa-> nawa-|
(นพ- <naba> 'new' is only in นพกะ <nabaka> nopphaka < P navaka 'novice'.)
Note how the <b> for Indic v forms have o instead of a for Indic a. Do these two features go together? There are words like
วงศ์ <waṅśa> woŋ < Skt vaṃśa 'family'
with <w> for Indic v plus o for Indic a, but I can't think of any cases of <b> for Indic v preceded by a for Indic a:
T <Cab-> < Indic Cav-?
Khmer has the interesting compromise spelling
នព្វ <nabva> nup(peaʔ-)
as well as
នព <naba> nɔɔp
នវ <nava> neaʔveaʔ-
for both 'new' and 'nine'.
I have long thought that these two features
Khmer/Thai <b> : Indic v
Khmer/Thai rounded vowels : Indic unrounded a
reflected the phonology of the eastern Indic speakers that the Khmer learned Sanskrit and Pali from.
Bengali - a modern eastern Indic language - has those two features:
বিদ্যা bidæ < vidyaa 'knowledge'
নব nɔbo < nava 'new'; 'nine'
(My transcriptions are guesses. Bengali spelling is not a reliable guide to pronunciation.)
(These words were borrowed rather than inherited from Sanskrit.)
But did such features already exist in eastern Indic when Old Khmer borrowed from Sanskrit and Pali?
To complicate matters further, Old Khmer v- may correspond to modern Khmer p- < *b-: e.g.,
OK vok : modern ពក <baka> pɔɔk 'swollen'
There are also <b> ~ <v> doublets in modern Khmer: e.g.,
-ពិល <bila> in ពពិល <babila> pɔpɨl : OK valvel 'utensil passed around to bless someone or something'
វិល <vila> vɨl : OK -vil 'to turn'
(The word families that Jenner and Pou [1980-81] imply such as the one above deserve investigation, as they are phonetically and semantically loose. I am not sure that 'utensil passed around' is really from 'turn', but then again who could have guessed that the name of a demon mask would be from 'knowledge'?)
I have not found any cases of modern Khmer <b> p corresponding to Old Khmer v-. Did OK v- sporadically harden to *b-?
|Proto-Khmer *v-||Old Khmer v-||Modern Khmer <v> v- ~ <b> p-|
If so, then perhaps
Indic v- > borrowed into OK as v- > later Khmer *b- > modern Khmer p-, borrowed into Thai as *b- > ph-
The SEAlang corpus of Khmer inscriptions has vidyaa and nava, but not bidyaa or naba.
Another possibility is that modern Khmer is not the descendant of OK, which sporadically lenited *b- to v-:
|Proto-Khmer *b-||Old Khmer b- ~ v-||(no descendant?)|
|*b- in ancestor of modern Khmer||Modern Khmer <b> p-|
In any case, I wonder if the OK v- : modern Khmer <b> p- mismatches are relevant to the problem of Khmer/Thai stops for Indic v.