Should I use TLG betacode or Unicode for polytonic classical Greek in my electronic publications?
Below are some practical considerations one hears cited on both sides of this debate. (Thanks, Ross. All comments/additions welcome.)
Arguments one hears for coding polytonic classical Greek with TLG Beta Code even today in new electronic publications:
- Unicode conflates the idea of "character" and "glyph", treating an alpha+acute as a different letter from an alpha+grave, and a terminal sigma as different from a medial sigma.
- Morpheus (Perseus morphological parser, aka cruncher) needs Beta Code input.
- There are symbols defined in Beta Code but not yet defined in Unicode, and symbols defined in both, but with no font support in Unicode (but this is a problem either way).
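The first point above can be made concrete with a small Python sketch. It uses only the standard `unicodedata` module; the helper names (`decompose`, `strip_diacritics`, `same_letters`) are made up for this illustration and do not come from any Greek-processing library. It shows that the precomposed alpha+acute and alpha+grave are separate code points, as are final and medial sigma, and that the distinction can be collapsed in software when needed.

```python
import unicodedata

# Precomposed polytonic characters are separate code points in Unicode.
ALPHA_ACUTE = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA
ALPHA_GRAVE = "\u1f70"   # GREEK SMALL LETTER ALPHA WITH VARIA
FINAL_SIGMA = "\u03c2"   # GREEK SMALL LETTER FINAL SIGMA
MEDIAL_SIGMA = "\u03c3"  # GREEK SMALL LETTER SIGMA


def decompose(text):
    """Split precomposed characters into base letter + combining marks (NFD)."""
    return unicodedata.normalize("NFD", text)


def strip_diacritics(text):
    """Drop all combining marks, keeping only the base letters."""
    return "".join(ch for ch in decompose(text) if not unicodedata.combining(ch))


def same_letters(a, b):
    """Compare two strings ignoring accents, breathings, case and sigma form."""
    return strip_diacritics(a).casefold() == strip_diacritics(b).casefold()


if __name__ == "__main__":
    print(ALPHA_ACUTE == ALPHA_GRAVE)                # distinct code points
    print(same_letters(ALPHA_ACUTE, ALPHA_GRAVE))    # same base letter
    print(same_letters(FINAL_SIGMA, MEDIAL_SIGMA))   # same letter after case folding
```

Whether this extra normalization step counts against Unicode or is just routine text processing is exactly the point under debate.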
Arguments one hears for coding polytonic classical Greek with Unicode in new electronic publications:
- Unicode is an international standard.
- It is a pain to have to implement a transcoder vel sim. in the already hairy process of setting up Tomcat/Cocoon or another on-the-fly publication framework.
- If you offer your XML source files for download, and the Greek is in TLG Beta Code, people can't easily read them without conversion.
- By virtue of the transcoder and other conversion methods out there, we can always go back to Beta Code, on the fly, when it is necessary.
- Beta Code uses punctuation marks in non-standard ways, so any tokenizer has to be rewritten (e.g. you can't assume that ")" comes only after the end of a word); this means extra programming in some instances.
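To illustrate the last two points, here is a minimal Python sketch of a Beta Code tokenizer and a Beta Code to Unicode conversion. It is not the Perseus transcoder or any other existing tool: the names `LETTERS`, `MARKS`, `WORD`, `tokenize` and `beta_to_unicode` are hypothetical, and the sketch assumes plain Beta Code words only (basic letters, breathings, accents, iota subscript, diaeresis and `*` capitals). It does not handle the numbered sigma variants (`S1`, `S2`, `S3`), digamma, editorial symbols or other escape codes.

```python
import re
import unicodedata

# Beta Code letter -> lowercase Greek letter (basic alphabet only).
LETTERS = dict(zip("ABGDEZHQIKLMNCOPRSTUFXYW",
                   "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritic -> Unicode combining mark.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}

# A Beta Code "word" includes characters a generic tokenizer treats as punctuation.
WORD = re.compile(r"[*A-Z()/\\=|+]+")


def tokenize(line):
    """Split a Beta Code line into words, keeping diacritic characters attached."""
    return WORD.findall(line.upper())


def beta_to_unicode(word):
    """Convert one Beta Code word to precomposed (NFC) Unicode Greek."""
    out, pending, capital = [], "", False
    for ch in word.upper():
        if ch == "*":
            capital = True                 # next letter is a capital
        elif ch in MARKS:
            if capital:
                pending += MARKS[ch]       # marks precede a capital letter
            else:
                out.append(MARKS[ch])      # marks follow a lowercase letter
        elif ch in LETTERS:
            letter = LETTERS[ch]
            if capital:
                letter = letter.upper() + pending
                capital, pending = False, ""
            out.append(letter)
        else:
            out.append(ch)                 # pass anything unknown through
    text = "".join(out)
    if text.endswith("σ"):                 # word-final sigma
        text = text[:-1] + "ς"
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    line = "MH=NIN A)/EIDE, QEA\\"
    print(re.findall(r"\w+", line))        # a generic tokenizer splits A)/EIDE in two
    print(" ".join(beta_to_unicode(w) for w in tokenize(line)))  # μῆνιν ἄειδε θεὰ
```

The generic `\w+` pattern breaks `A)/EIDE` into `A` and `EIDE`, which is the tokenizer problem described above, while the dedicated `WORD` pattern keeps the word whole. A reverse mapping along the same lines would take Unicode text back to Beta Code on demand.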