Phonetic characters in digital editions
Tomaž Erjavec (1) & Matija Ogrin (2)
tomaz.erjavec@ijs.si, matija.ogrin@zrc-sazu.si
(1) Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana
(2) Institute of Slovenian Literature and Literary Sciences, Scientific Research Centre of the Slovenian Academy of Sciences and Arts, Ljubljana
SloFon, 21 April 2006
Overview of the talk
1. IPA
2. PUA
3. TEI
The problem
- provide standardised encoding (XML) and Web viewing (HTML) of complex digital editions
- in particular, the Freising manuscripts (e-BS)
- work in progress in the project "Scholarly Digital Editions of Slovenian Literature", http://nl.ijs.si/e-zrc/
Focus of the talk
- e-BS, a very complex document: facsimile, commentary, diplomatic and critical transcriptions, translations, dictionary, bibliography, name index, …
- but also:
  - phonetic transcription in IPA
  - (recording)
HTML representation of e-BS phonetic transcription
IPA
- International Phonetic Alphabet (International Phonetic Association)
- contains poorly supported characters, e.g. ɐ, ɕ, ɚ, ɷ
- heavy use of diacritics:
  - unusual diacritical marks: ˀ ˒ ˤ
  - more than one diacritic: ǡ
  - diacritics spanning digraphs
Computer representation of IPA
SAMPA (for HLT)
- transliteration to ASCII
- SAMPA for contemporary Slovenian:
  - http://www.phon.ucl.ac.uk/home/sampa/sloven-uni.htm
  - ZEMLJAK, Melita, KAČIČ, Zdravko, DOBRIŠEK, Simon, ŽGANEC GROS, Jerneja, WEISS, Peter. Računalniški simbolni fonetični zapis slovenskega govora. Slav. rev., apr.-jun. 2002, 50/2, 159-169.
UNICODE (for humans)
- universal character set, better and better supported
- contains "IPA Extensions", "Combining diacritical marks"
- various good Unicode IPA fonts available, e.g. Doulos SIL
- for non-standardised characters: Private Use Area (PUA)
- not to be used lightly!
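
The Unicode blocks named on this slide occupy fixed code point ranges, so a transcription can be checked for where its characters come from. The following is a minimal sketch in Python, assuming the ranges U+0250 to U+02AF (IPA Extensions), U+0300 to U+036F (Combining Diacritical Marks) and U+E000 to U+F8FF (the Private Use Area of the Basic Multilingual Plane). The names `block_of` and `pua_characters` are illustrative only and do not come from the talk.

```python
# Code point ranges of the blocks mentioned on the slide.
BLOCKS = [
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x0300, 0x036F, "Combining Diacritical Marks"),
    (0xE000, 0xF8FF, "Private Use Area"),
]


def block_of(ch: str) -> str:
    """Return the name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


def pua_characters(text: str) -> list:
    """List the PUA code points in text as 'U+XXXX', in order of first appearance."""
    seen = []
    for ch in text:
        label = f"U+{ord(ch):04X}"
        if block_of(ch) == "Private Use Area" and label not in seen:
            seen.append(label)
    return seen
```

Running `pua_characters` over an edition gives the inventory of private characters that a reader without the matching font would not be able to see.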
Unicode definitions
Unicode definitions
ZRCola
- developed at ZRC SAZU (Peter Weiss)
- Unicode input system for linguistic use in WinWord program:
  - decomposed and composed characters
  - keyboard input
  - font which covers historical characters as well as IPA & (now) some specifics of e-BS
- ideal for use in e-BS
ZRCola and PUA
Why PUA?
ZRCola font uses PUA mostly for
- defining new Slovene (related) historical characters
- composed characters with diacritics (+ digraphs), for better diacritic placement
- Unicode offers Combining diacritical marks, but complex stacks can cause problems for font rendering
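
Why precomposed characters are attractive can be seen from Unicode normalisation: canonical composition (NFC) only yields a single code point where the standard defines one, and many stacks used in phonetic transcription have none. A minimal sketch using Python's standard `unicodedata` module; the helper names `describe` and `compose` and the two sample sequences are chosen here for illustration.

```python
import unicodedata


def describe(seq: str) -> list:
    """Return the Unicode names of the code points in seq."""
    return [unicodedata.name(ch) for ch in seq]


def compose(seq: str) -> str:
    """Apply canonical composition (NFC) to a base + combining marks sequence."""
    return unicodedata.normalize("NFC", seq)


# a + ogonek + dot above: only "a with ogonek" exists precomposed,
# so the dot above remains a separate combining mark.
a_ogonek_dot = compose("a\u0328\u0307")

# r + grave + vertical line below: nothing can be composed; NFC only
# reorders the below-mark before the above-mark (canonical ordering).
r_grave_vline = compose("r\u0300\u0329")
```

In both cases the renderer still has to position at least one combining mark itself, which is where font support tends to break down.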
Some comparisons
- PUA E31B: ZRCola �; mapping to 0105+0307; Times NR ą, MS Tahoma ą, Doulos SIL ą
- PUA EB25: ZRCola �; mapping to r+0300+0329; Times NR r, MS Tahoma r, Doulos SIL r
- PUA E35E: ZRCola �; mapping to 00E6+0303+0300; Times NR æ, MS Tahoma æ, Doulos SIL æ
- PUA EEC8: ZRCola �; ~mapping to t+j+032E; Times NR tj, MS Tahoma tj, Doulos SIL tj
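
The four mappings on this slide can be read as a lookup table from a PUA code point to a standard Unicode sequence. A minimal sketch of such a fallback in Python; the table holds only the four rows shown here, and the names `PUA_TO_STANDARD` and `to_standard` are hypothetical rather than part of ZRCola.

```python
# PUA code point -> standard Unicode sequence, copied from the slide.
# The last entry is marked "~" on the slide, i.e. only an approximate mapping.
PUA_TO_STANDARD = {
    "\uE31B": "\u0105\u0307",        # a with ogonek + combining dot above
    "\uEB25": "r\u0300\u0329",       # r + combining grave + vertical line below
    "\uE35E": "\u00E6\u0303\u0300",  # ae + combining tilde + combining grave
    "\uEEC8": "tj\u032E",            # t + j + combining breve below (approximate)
}


def to_standard(text: str) -> str:
    """Replace every known PUA character with its standard-Unicode fallback."""
    return "".join(PUA_TO_STANDARD.get(ch, ch) for ch in text)
```

Characters missing from the table pass through unchanged, so an incomplete table degrades to the original PUA text rather than losing characters.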
Problem
- PUA = Private Use Area
- but e-ZRC = standardised & interchangeable
- How to retain the benefits of ZRCola, yet make e-BS interchangeable?
- How to enable reading e-BS on platforms without the ZRCola font?
Text Encoding Initiative
- e-ZRC editions encoded in XML
- using the Text Encoding Initiative Guidelines, TEI P4
- TEI P5 makes provisions for encoding PUA characters and glyphs
- in TEI P4 user extensions are necessary to achieve the same effect
PUA in TEI P5
- TEI P5 chapter 25, "Representation of non-standard characters and glyphs"
- markup in text to identify PUA characters or glyphs
- link these elements to their TEI header definition
- TEI header can give, for each new character:
  - a name (text description à la Unicode), e.g. LATIN SMALL LETTER A
  - mapping to standard Unicode
  - character properties
- rendering software (e.g. XSLT stylesheet for conversion to HTML) can then use the PUA version, or the standard version
Markup in the document
- text: b�:ʒɛ g�:spɔdi miłɔstíwi �:t�ɛ b�:ʒɛ tɛbǽ ispɔwǽdæ
- in XML:
    <line n="2" id="bs.PT.1.002">
      b<g corresp="zrcola.E656"/>:ʒɛ g<g corresp="zrcola.E656"/>:spɔdi miłɔstíwi <g corresp="zrcola.E656"/>:t<g corresp="zrcola.EECC"/>ɛ b<g corresp="zrcola.E656"/>:ʒɛ tɛbǽ ispɔwǽdæ
    </line>
Markup in the header
PUA characters are defined in teiHeader/encodingDesc:
    <charDesc>
      <desc>PUA characters as defined by <xref url="http://zrcola.zrc-sazu.si/">ZRCola</xref>.
        Character descriptions taken from and based on The Unicode Standard 4.1, U41M050317.lst</desc>
      <char id="zrcola.E31B">
        <charName>LATIN SMALL LETTER A WITH OGONEK AND DOT ABOVE</charName>
        <charProp><localName>font</localName><value>ZRCola</value></charProp>
        <charProp><localName>mapping</localName><value>exact</value></charProp>
        <mapping type="PUA">&#xE31B;</mapping>
        <mapping type="standard">&#x0105;<!--LATIN SMALL LETTER A WITH OGONEK-->&#x0307;<!--COMBINING DOT ABOVE--></mapping>
      </char>
      <!-- more chars -->
    </charDesc>
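
A header of this shape is straightforward to read by machine. The following is a minimal sketch in Python using the standard `xml.etree.ElementTree` module; the embedded `HEADER` string is a cut-down copy of the fragment on this slide (comments omitted), and `load_char_table` is a hypothetical helper, not part of the e-ZRC tooling.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the header fragment from the slide (comments omitted).
HEADER = """
<charDesc>
  <char id="zrcola.E31B">
    <charName>LATIN SMALL LETTER A WITH OGONEK AND DOT ABOVE</charName>
    <charProp><localName>font</localName><value>ZRCola</value></charProp>
    <charProp><localName>mapping</localName><value>exact</value></charProp>
    <mapping type="PUA">&#xE31B;</mapping>
    <mapping type="standard">&#x0105;&#x0307;</mapping>
  </char>
</charDesc>
"""


def load_char_table(xml_text: str) -> dict:
    """Map each char id to its name, properties and PUA/standard mappings."""
    table = {}
    for char in ET.fromstring(xml_text).iter("char"):
        props = {
            prop.findtext("localName"): prop.findtext("value")
            for prop in char.iter("charProp")
        }
        mappings = {
            m.get("type"): "".join(m.itertext()).strip()
            for m in char.iter("mapping")
        }
        table[char.get("id")] = {
            "name": char.findtext("charName"),
            "props": props,
            "mappings": mappings,
        }
    return table


CHARS = load_char_table(HEADER)
```

The resulting dictionary is keyed by the same identifiers that the `corresp` attributes in the text point to, so a renderer can look up either mapping for each glyph.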
Standardisation of ZRCola PUA
- ZRCola very well documented "visually", i.e. for humans
- but lacking machine-processable metadata:
  - Unicode-compliant name
  - mapping to standard Unicode (identity, similarity)
- we only implemented 50+ characters that actually appear in e-BS
- substantial work to describe all PUA characters in ZRCola distribution
- maybe better to abandon the precomposed PUA characters that can be expressed in standard Unicode?
PUA display with ZRCola
PUA display without ZRCola
Documentation
Mapping to Unicode, Doulos SIL font
TEI to HTML
    <xsl:template match="g">
      <xsl:variable name="glyph" select="id(@corresp)/mapping[@type=$ENCODING]"/>
      <SPAN>
        <xsl:if test="$ENCODING = 'standard'">
          <xsl:attribute name="class">
            <xsl:value-of select="id(@corresp)/charProp[localName='mapping']/value"/>
          </xsl:attribute>
        </xsl:if>
        <xsl:attribute name="title">
          <xsl:value-of select="id(@corresp)/charProp[localName='font']/value"/>
          <xsl:text>: </xsl:text>
          <xsl:value-of select="id(@corresp)/charName"/>
        </xsl:attribute>
        <xsl:value-of select="$glyph"/>
      </SPAN>
    </xsl:template>
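
For readers less familiar with XSLT, the logic of this template can be restated in Python. This is a sketch under stated assumptions, not the project's stylesheet: the `CHARS` dictionary is a hypothetical stand-in for the `id(@corresp)` lookup into the TEI header, `render_g` and `render_line` are illustrative names, and the span tag is written in lower case.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical in-memory character table standing in for the TEI header
# lookup that the stylesheet performs with id(@corresp).
CHARS = {
    "zrcola.E31B": {
        "name": "LATIN SMALL LETTER A WITH OGONEK AND DOT ABOVE",
        "font": "ZRCola",
        "mapping": "exact",
        "PUA": "\uE31B",
        "standard": "\u0105\u0307",
    },
}


def render_g(corresp: str, encoding: str) -> str:
    """Render one <g> element as an HTML span, mirroring the template's logic."""
    char = CHARS[corresp]
    attrs = ""
    if encoding == "standard":
        # The class records how faithful the fallback is (e.g. 'exact').
        attrs += f' class="{escape(char["mapping"])}"'
    attrs += f' title="{escape(char["font"])}: {escape(char["name"])}"'
    return f"<span{attrs}>{escape(char[encoding])}</span>"


def render_line(line_xml: str, encoding: str = "standard") -> str:
    """Flatten a <line> element to HTML, replacing each <g> by its span."""
    line = ET.fromstring(line_xml)
    parts = [escape(line.text or "")]
    for child in line:
        if child.tag == "g":
            parts.append(render_g(child.get("corresp"), encoding))
        # Text following the child element belongs to the line itself.
        parts.append(escape(child.tail or ""))
    return "".join(parts)
```

Calling `render_line(xml, "PUA")` keeps the private code point for readers who have the ZRCola font, while the default `"standard"` encoding emits the fallback sequence and marks its fidelity in the `class` attribute.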
Conclusions
- introduced IPA, PUA & TEI
- showed how PUA characters can be, via TEI, made
  - interchangeable
  - documented
  - flexibly presented
- this does require investment of time by the designers of PUA characters