TLG Beta Code vs. Unicode FAQ
Revision as of 12:39, 6 November 2006
Should I use TLG Beta Code or Unicode for polytonic classical Greek in my electronic publications?
Below are some practical considerations often cited on both sides of this debate. (Thanks, Ross. All comments/additions welcome.)
Arguments one hears for coding polytonic classical Greek with TLG Beta Code even today in new e-pubs:
- Unicode conflates the ideas of "character" and "glyph", treating alpha+acute as a different letter from alpha+grave, and final sigma as a different letter from medial sigma.
- Morpheus (the Perseus morphological parser, a.k.a. "cruncher") needs Beta Code input.
- Some symbols are defined in Beta Code but not yet in Unicode, and others are defined in both but lack support in Unicode fonts (though this is a problem either way).
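To make the character/glyph point concrete, and to show roughly what handing Unicode text to a Beta Code tool such as Morpheus would involve, here is a minimal Python sketch that turns lowercase polytonic Unicode into Beta Code. It is an illustration under stated assumptions, not a complete converter: the letter table is deliberately partial, capitals (which Beta Code marks with `*`), the numbered sigma variants and most punctuation are ignored, Morpheus's exact input conventions are not checked, and `unicode_to_beta` is a hypothetical name. It relies on Unicode canonical decomposition (NFD), which splits each precomposed letter such as ἄ into a base letter plus combining marks that map one-to-one onto Beta Code's diacritic symbols.

```python
import unicodedata

# Illustrative, partial table: lowercase Greek letters -> TLG Beta Code.
# Medial and final sigma both become plain "S"; Beta Code infers the final
# form from position (its explicit numbered sigma variants are ignored here).
LETTERS = {
    "α": "A", "β": "B", "γ": "G", "δ": "D", "ε": "E", "ζ": "Z", "η": "H",
    "θ": "Q", "ι": "I", "κ": "K", "λ": "L", "μ": "M", "ν": "N", "ξ": "C",
    "ο": "O", "π": "P", "ρ": "R", "σ": "S", "ς": "S", "τ": "T", "υ": "U",
    "φ": "F", "χ": "X", "ψ": "Y", "ω": "W",
}

# Combining marks produced by NFD decomposition -> Beta Code symbols.
MARKS = {
    "\u0313": ")",   # smooth breathing (psili)
    "\u0314": "(",   # rough breathing (dasia)
    "\u0301": "/",   # acute (letters with oxia or tonos decompose to this)
    "\u0300": "\\",  # grave (varia)
    "\u0342": "=",   # circumflex (perispomeni)
    "\u0308": "+",   # diaeresis
    "\u0345": "|",   # iota subscript (ypogegrammeni)
}


def unicode_to_beta(text: str) -> str:
    """Hypothetical helper: lowercase polytonic Greek -> Beta Code (subset)."""
    out = []
    # NFD turns each precomposed letter (e.g. U+1F04, alpha with smooth
    # breathing and acute) into a base letter followed by combining marks
    # in canonical order, which for the marks handled here matches the
    # usual Beta Code order (breathing, accent, iota subscript).
    for ch in unicodedata.normalize("NFD", text):
        if ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in MARKS:
            out.append(MARKS[ch])
        else:
            out.append(ch)  # spaces, punctuation and unmapped characters
    return "".join(out)


if __name__ == "__main__":
    print(unicode_to_beta("ἄνθρωπος"))  # A)/NQRWPOS
    print(unicode_to_beta("τῷ λόγῳ"))   # TW=| LO/GW|
```

After decomposition, alpha+acute and alpha+grave share the same base letter and differ only in the combining mark, which is what lets a table this small cover the precomposed forms.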
Arguments one hears for coding polytonic classical Greek with Unicode in new e-pubs:
- Unicode is an international standard.
- It is a pain to have to implement a transcoder (or similar) within the already hairy process of setting up Tomcat/Cocoon or another on-the-fly publication framework.
- If you offer your XML source files for download and the Greek is in TLG Beta Code, people can't easily read them without converting them first.
- Thanks to the transcoder and other available conversion methods, we can always go back to Beta Code on the fly when necessary.
- Because Beta Code uses punctuation marks in non-standard ways, any tokenizer has to be rewritten (e.g. ")" marks a smooth breathing, so you can't count on it to follow the end of a word); this requires some extra programming in some instances.
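The transcoding and tokenizing points can both be illustrated with a minimal Python sketch of the opposite direction, Beta Code to Unicode. This is an assumption-laden illustration, not the transcoder mentioned above: it handles only non-capital letters with the common diacritics, resolves final sigma by position, ignores `*` capitals, the numbered sigma variants and Beta Code punctuation, and `beta_to_unicode` is a hypothetical name. The last lines show why a naive tokenizer fails on Beta Code but works on the converted text.

```python
import re
import unicodedata

# Illustrative, partial table: TLG Beta Code letters -> lowercase Greek.
LETTERS = {
    "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ", "H": "η",
    "Q": "θ", "I": "ι", "K": "κ", "L": "λ", "M": "μ", "N": "ν", "C": "ξ",
    "O": "ο", "P": "π", "R": "ρ", "S": "σ", "T": "τ", "U": "υ", "F": "φ",
    "X": "χ", "Y": "ψ", "W": "ω",
}

# Beta Code diacritics -> Unicode combining marks. Note that ")" and "("
# are breathings here, not parentheses.
MARKS = {
    ")": "\u0313", "(": "\u0314", "/": "\u0301", "\\": "\u0300",
    "=": "\u0342", "+": "\u0308", "|": "\u0345",
}


def beta_to_unicode(beta: str) -> str:
    """Hypothetical helper: Beta Code (no * capitals) -> NFC polytonic Greek."""
    out = []
    for ch in beta.upper():
        # Letters and diacritics are mapped; anything else passes through.
        out.append(LETTERS.get(ch, MARKS.get(ch, ch)))
    text = "".join(out)
    # A sigma not followed by another Greek letter is word-final.
    text = re.sub(r"σ(?![α-ω])", "ς", text)
    # NFC recombines base letters and marks into precomposed characters.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    word = "A)NH/R"
    # A naive tokenizer splits the Beta Code word at ")" and "/".
    print(re.findall(r"\w+", word))                   # ['A', 'NH', 'R']
    # The converted (NFC) text is a single run of word characters.
    print(re.findall(r"\w+", beta_to_unicode(word)))  # ['ἀνήρ']
```

The naive `\w+` pattern works on the output because precomposed Greek letters count as word characters in Python's `re`; the same pattern breaks on the Beta Code input because `)` and `/` do not.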