This shows you the differences between two versions of the page.
| wiki:orthography [2019/08/21 18:44] – [Orthography] chuck | wiki:orthography [2023/03/15 04:22] (current) – chuck | ||
|---|---|---|---|
| Line 10: | Line 10: | ||
| In order to filter out orthographic variants, regular expressions are used to normalize the text. Some of these filters may overlap; if a portion of text matches more than one filter, only one filter is applied. Filters are applied in the order that they appear. | In order to filter out orthographic variants, regular expressions are used to normalize the text. Some of these filters may overlap; if a portion of text matches more than one filter, only one filter is applied. Filters are applied in the order that they appear. | ||
| + | |||
| + | Some regular expressions may coincide with sandhi rules described in the // | ||
| Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized. | Examples are given to show how the text is normalized; counterexamples are exceptions which are not normalized. | ||
| + | |||
| + | Normalized spellings may not represent what is generally considered to be " | ||
| === geminated t === | === geminated t === | ||
| <code pcre>/ | <code pcre>/ | ||
| - | * replaces tt(h) after consonantal/ | + | * replaces |
| - | * examples: | + | |
| - | * counterexamples: | + | * __examples__ |
| + | * arttha => artha | ||
| + | * saṃskṛtta => saṃskṛta | ||
| + | * prākritta => prākrita | ||
| + | * tattvam => tatvam | ||
| + | * pattram => patram | ||
| + | * __counterexamples__ | ||
| + | * atty annam (source: [[http:// | ||
| === geminated consonants after r === | === geminated consonants after r === | ||
| - | <code pcre>/ | + | <code pcre>/ |
| - | * replaces doubled consonants (excluding t) after consonantal/ | + | * replaces doubled consonants (excluding t) after consonantal/ |
| - | * examples: | + | |
| + | * __examples__ | ||
| + | * arddha => ardha | ||
| + | * dharmma => dharma | ||
| + | * pṛcchati => pṛchati | ||
| === geminated aspirated consonants === | === geminated aspirated consonants === | ||
| <code pcre>/ | <code pcre>/ | ||
| - | * replaces jjh, tth, ṭṭh, and ddh with jh, th, ṭh, and dh respectively | + | * replaces |
| - | * examples: | + | |
| + | * __examples__ | ||
| + | * attha => atha | ||
| + | * daddhi => dadhi | ||
| === final nasal variants === | === final nasal variants === | ||
| <code pcre>/ | <code pcre>/ | ||
| - | * replaces -ṃl, -ṃś, -ṃs, and -nn with -n | + | * replaces |
| - | * examples: | + | |
| - | * counterexamples: aṃśa, annam | + | * __examples__ |
| + | * gacchaṃs tu => gacchan tu | ||
| + | * puruṣānn | ||
| + | |||
| + | === internal nasal variants === | ||
| + | <code pcre>/ | ||
| + | |||
| + | * replaces nasals preceding certain consonants with an anusvāra (this regular expression is the opposite of rule [[https:// | ||
| + | |||
| + | * __examples__ | ||
| + | * nandita => naṃdita | ||
| + | * yuñjati => yuṃjati | ||
| + | |||
| + | ==== Script/ | ||
| + | |||
| + | Some normalization filters require | ||
| + | |||
| + | === pṛṣṭhamātrā vowels === | ||
| + | <code pcre>/ | ||
| + | |||
| + | In transcriptions of Devanāgarī sources, pṛṣṭhamātrā vowels are transcribed as ê, aî, ô, and aû (see [[wiki: | ||
| + | |||
| + | * These filters require '' | ||
| + | |||
| + | === valapalagilaka === | ||
| + | <code pcre>/ | ||
| + | |||
| + | In transcriptions of Telugu sources, the valapalagilaka reph is transcribed as ṙ (see [[wiki: | ||
| + | |||
| + | * This filter requires '' | ||
| + | |||
| + | === ṭh written as ṭ === | ||
| + | <code pcre>/ | ||
| + | |||
| + | In some Devanāgarī manuscripts, | ||
| + | |||
| + | * This filter requires a ''< | ||
| + | |||
| + | === b written as v === | ||
| + | <code pcre>/ | ||
| + | |||
| + | In some scripts, b is not distinguished from v. | ||
| + | |||
| + | * This filter requires a ''< | ||
| + | |||
| + | === dbh written as bhd === | ||
| + | <code pcre>/ | ||
| + | |||
| + | In some Devanāgarī manuscripts, | ||
| + | |||
| + | * This filter requires a ''< | ||
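
The ordered application of filters described above can be sketched in Python. This is only an illustration: the page's actual PCRE expressions are not reproduced here, so every pattern below is an assumption inferred from the listed examples and counterexamples, and the names `FILTERS` and `normalize_text` are hypothetical. The sketch also simplifies the overlap rule: it runs each filter in turn over the whole text, relying on the order of the list rather than tracking which portion has already been matched.

```python
import re
import unicodedata


def _rx(pattern: str) -> re.Pattern:
    """Compile a pattern after NFC-normalizing it, so that letters
    such as ṛ and ṃ are single code points inside character classes."""
    return re.compile(unicodedata.normalize("NFC", pattern))


# (name, pattern, replacement), in the order the filters are applied.
# Every pattern here is an assumption reconstructed from the examples;
# none of them is the expression actually used on the wiki page.
FILTERS = [
    # arttha => artha, prākritta => prākrita, tattvam => tatvam
    ("geminated t", _rx(r"(?<=[rṛ])tt|(?<=ri)tt|tt(?=[rv])"), "t"),
    # arddha => ardha, dharmma => dharma, pṛcchati => pṛchati
    ("geminated consonants after r", _rx(r"(?<=[rṛ])([bcdgjklmnpvy])\1"), r"\1"),
    # attha => atha, daddhi => dadhi
    ("geminated aspirated consonants", _rx(r"([jtṭd])\1h"), r"\1h"),
    # gacchaṃs tu => gacchan tu (word-final only, so aṃśa and annam are untouched)
    ("final nasal variants", _rx(r"(?:ṃ[lśs]|nn)\b"), "n"),
    # nandita => naṃdita, yuñjati => yuṃjati
    ("internal nasal variants", _rx(r"[ṅñṇnm](?=[kgcjṭḍtdpb])"), "ṃ"),
]


def normalize_text(text: str) -> str:
    """Apply each filter in list order and return the normalized text."""
    text = unicodedata.normalize("NFC", text)
    for _name, pattern, replacement in FILTERS:
        text = pattern.sub(replacement, text)
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    for word in ["arttha", "dharmma", "daddhi", "gacchaṃs tu", "nandita", "atty annam"]:
        print(word, "=>", normalize_text(word))
```

Because the filters run in sequence, reordering `FILTERS` can change the result where two patterns overlap, which is why the order of the list mirrors the order of the sections on this page.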