Speech-sounds and their transcription

I have been thinking hard about the issue of transcription, all the more so after learning about PARSIL, developed by Mr. Shreyas Munshi.

To my mind, the need for transcription arises from the basic fact that commonly available keyboards have Roman characters, and Roman characters have poor linkage with phonetics or phonology. People around the world speak different languages, but when we say that a person is speaking, what he is doing is producing speech-sounds. So the problem is one of typing all speech-sounds of all languages around the world using commonly available keyboards with Roman characters. This is what is called transcription or transliteration.

That Roman characters have poor linkage with phonetics or phonology can be seen from the following:

  • The vowels ‘a’, ‘e’, ‘i’, ‘o’ and ‘u’ are spread out across the alphabet at the 1st, 5th, 9th, 15th and 21st positions. We also have ‘y’ in the 25th position, which also serves as a vowel sound.
  • The words ‘err’ and ‘sir’ use different vowel letters for the same vowel sound, as do ‘ton’ and ‘sun’. This may not be a problem with the Roman script as such, but a problem arising from English having used, in different words, different vowels for the same vowel sound.
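The positions mentioned in the first point can be checked with a small Python sketch. The variable names here are my own, chosen only for illustration.

```python
import string

# The 26 Roman letters in alphabetical order: 'abcdefghijklmnopqrstuvwxyz'
alphabet = string.ascii_lowercase

# 'y' is included because it also serves as a vowel sound.
vowels = "aeiouy"

# 1-based position of each vowel letter within the alphabet.
positions = {v: alphabet.index(v) + 1 for v in vowels}

print(positions)
# {'a': 1, 'e': 5, 'i': 9, 'o': 15, 'u': 21, 'y': 25}
```

The vowel letters are scattered among the consonants with no phonetic grouping, which is the point being made above.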

When thinking of typing all speech-sounds of all languages around the world, one should first of all compile a list of all those speech-sounds. Compiling such a list becomes easy if one takes a cue from the scripts of Indian languages, because those scripts have, for millennia, been methods of writing speech-sounds.

So one view can be this: instead of struggling to type all speech-sounds on commonly available keyboards with Roman characters, why not change keyboards to adopt a script that inherently has the capability to characterize speech-sounds?

But this suggestion of adopting ‘any such script’ raises a corollary question: “which script?” Since the scripts of all Indian languages will be candidates, this corollary question will certainly take on political colors and would be difficult to resolve.

Alongside, it comes to mind that two vowel-sounds are missing in Indian languages: (1) the sound of ‘a’ as in ‘cat’ and (2) that of ‘o’ as in ‘dog’.

Among Indian scripts, DevanagarI has one shortcoming compared to the scripts of South Indian languages: those scripts have distinct characters for the short and long vowel sounds, as in ‘get’ and ‘gate’ (or ‘gait’). The South Indian scripts also have distinct characters for the short and long vowel sounds as in ‘poke’ and ‘goat’.

Actually, Sanskrit grammar recognizes that each basic vowel sound can be pronounced in 18 different ways (2 × 3 × 3): (1) nasal अनुनासिक or not nasal अननुनासिक; (2) each of these as short ह्रस्व, long दीर्घ or extended प्लुत; (3) each of these further as stressed उदात्त, unstressed or low अनुदात्त, or plain or level स्वरित. One can of course appreciate that developing a script to characterize all such shades of a basic vowel sound would be an unrewarding effort. Yet in Sanskrit texts one does find that some thought has been given to this, in the form of pronunciation notations called स्वरांकन.
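The count of 18 is simply the product of the three classifications. A minimal Python sketch enumerates the combinations; the romanized labels are my own shorthand, not standard notation.

```python
from itertools import product

# The three independent classifications of a basic vowel sound.
nasality = ["anunaasika (nasal)", "ananunaasika (not nasal)"]
length = ["hrasva (short)", "diirgha (long)", "pluta (extended)"]
accent = ["udaatta", "anudaatta", "svarita"]

# Every combination of one choice from each classification.
variants = list(product(nasality, length, accent))

print(len(variants))  # 18
for n, l, a in variants[:3]:
    print(n, "|", l, "|", a)
```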

It is said that scriptures were transmitted through teacher-disciple lineages, with great emphasis on correct pronunciation, rather than through written texts. It seems to me that this was so not because people did not know writing, but because a particular mantra has its best effect only when pronounced correctly and properly. If the concept of writing was not there, why would Vyaasa-maharshi have sought Ganesh to write down the Mahaabhaaratam for him?

DevanagarI script excels in the characterization of conjunct consonants. Possibly the largest number of conjunct consonants occurs in the word “kaarstnyam” कार्स्त्न्यम्. As can be seen, as many as five consonants are conjunct here: ‘r-s-t-n-y’. In Kannada, this becomes very complicated.
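The count of five can be verified mechanically, since in Unicode a conjunct is stored as consonants joined by the virama sign (U+094D). The following is only a rough sketch: the function names are hypothetical, and it assumes that the code points U+0915 to U+0939 cover the Devanagari consonant letters, ignoring nuktaa forms and joiner characters.

```python
VIRAMA = "\u094D"  # Devanagari sign virama (halant)

def is_consonant(ch):
    # Assumption: consonant letters occupy U+0915 (ka) through U+0939 (ha).
    return "\u0915" <= ch <= "\u0939"

def longest_conjunct(word):
    """Return the size of the largest consonant cluster in a Devanagari word."""
    best = run = 0
    i = 0
    while i < len(word):
        if is_consonant(word[i]):
            run += 1
            best = max(best, run)
            # The cluster continues only if a virama links this consonant
            # to another consonant immediately after it.
            if (i + 2 < len(word)
                    and word[i + 1] == VIRAMA
                    and is_consonant(word[i + 2])):
                i += 2
                continue
            run = 0
        i += 1
    return best

print(longest_conjunct("कार्स्त्न्यम्"))  # 5
```

The final म् is not counted with the cluster, because the preceding य carries its inherent vowel and so ends the conjunct.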

The Tamil script is substantially a uni-level script, except that syllabic consonants need a dot above them. In being uni-level, Tamil excels even over Roman, because all its letters have almost the same height.

DevanagarI cannot be called uni-level.

  • A word like Truman ट्रूमन् has three levels below the line, and
  • a word like ‘sarvaiH’ सर्वैः has as many as three strokes above the line, all having a single point of coincidence on the line.

In a uni-level script such as Tamil, words become too long: कार्स्त्न्यम् = கார்ச்த்ன்யம். The longer the spread of a word, the more difficult it becomes to read.

One major problem with the Tamil script is that it has very few characters, so much so that one has to write ‘gangaa’ गङ्गा as கங்கா, whose literal pronunciation is ‘kankaa’ कङ्का. While on the one hand it has very few characters, on the other hand some speech-sounds have more than one character.
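The ‘gangaa’ example can be made visible by listing the Unicode names of the characters in both spellings. This is only a small sketch using Python's standard unicodedata module; the helper name is my own.

```python
import unicodedata

def letter_names(word):
    # Official Unicode name of every character in the word, in order.
    return [unicodedata.name(ch) for ch in word]

tamil = letter_names("கங்கா")
devanagari = letter_names("गङ्गा")

# Print the two spellings side by side, character by character.
for t, d in zip(tamil, devanagari):
    print(f"{t:24} {d}")
```

Where DevanagarI has its letter GA, the Tamil spelling has the letter KA in both places, which is why a literal reading gives ‘kankaa’.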

DevanagarI is used by three languages: Sanskrit संस्कृतम्, Hindi हिन्दी and MaraathI मराठी. From the point of view of phonetics and scripting, a few points are to be noted.

  • Hindi has no use for the character “L” ळ. This letter does not find specific mention in Sanskrit grammar either and is used much less, though it is there in the word अग्निमीळे in the very first mantra of the Rigveda.
  • Hindi has some consonants with a ‘nuktaa’, a dot under them, connoting a special intonation such as that of ‘z’ in ‘nazar’. Although I have spelt it with ‘z’, its pronunciation is somewhat like an accentuated ‘j’, and it is written as नज़र.
  • I would not consider the script used for Urdu to be an Indian script at all, for obvious reasons. It has nothing in common with any other Indian script.

Exploring this subject of speech-sounds and their transcription, I came across ISO 15919, Transliteration of Devanagari and related Indic scripts. One can read interesting information about it at http://en.wikipedia.org/wiki/ISO_15919

That raises a question in my mind: “Why are people putting in effort to develop transcription systems such as PARSIL, if ISO 15919 already exists?”

Another curiosity in my mind is about the need for developing transcription systems. The general thinking seems to be that the Roman alphabet has only 26 letters, whereas speech-sounds are many more. The thought that the Roman alphabet offers only 26 characters does not seem to be mathematically correct. If one considers that the capitals make an additional set of 26 characters, we have as many as 52 characters, and commonly available keyboards also have some special characters such as ‘ ` ~ which are not used for any speech-sounds. These keyboards also have the Control and Alt keys, which have diverse capabilities. If we use these as well and assign specific speech-sounds to them, I think we have enough characters right there on commonly available keyboards for transcription of any language.
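The arithmetic can be checked with a short Python sketch. It assumes that the special characters on a common keyboard correspond to the ASCII punctuation set, which holds for a US layout but may differ on other keyboards.

```python
import string

# Lowercase and capital letters together.
letters = string.ascii_lowercase + string.ascii_uppercase

# Assumption: the keyboard's special characters are the ASCII punctuation set.
specials = string.punctuation

print(len(letters))                  # 52
print(len(specials))                 # 32
print(len(letters) + len(specials))  # 84
```

This count leaves out the digits and any combinations with the Control and Alt keys, so the number of distinct inputs actually available is larger still.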

Actually a simple scheme of transliteration is detailed at http://www.aa.tufs.ac.jp/~tjun/sktdic/

transcription scheme 01