TLG Beta Code vs. Unicode FAQ

From The Digital Classicist Wiki

Revision as of 12:39, 6 November 2006

Should I use TLG Beta Code or Unicode for polytonic classical Greek in my electronic publications?

Here are some practical considerations one hears quoted on both sides of this debate. (Thanks, Ross. All comments/additions welcome.)

Arguments one hears for coding polytonic classical Greek with TLG Beta Code even today in new e-pubs:

  1. Unicode conflates the idea of "character" and "glyph", treating an alpha+acute as a different letter from an alpha+grave, and a terminal sigma as different from a medial sigma.
  2. Morpheus (Perseus morphological parser, aka cruncher) needs Beta Code input.
  3. There are symbols defined in Beta Code but not yet defined in Unicode, and symbols defined in both, but with no font support in Unicode (but this is a problem either way).
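To make point 1 concrete, here is a minimal Python sketch (standard library only) of what the character/glyph objection looks like in practice. The helper names `base_letters` and `fold_final_sigma` are made up for this illustration. It shows that alpha+grave and alpha+acute are distinct precomposed code points, that canonical decomposition (NFD) recovers the shared base letter, and that final and medial sigma have to be folded by hand.

```python
import unicodedata

ALPHA_GRAVE = "\u1f70"   # GREEK SMALL LETTER ALPHA WITH VARIA
ALPHA_ACUTE = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA
FINAL_SIGMA = "\u03c2"
MEDIAL_SIGMA = "\u03c3"


def base_letters(text):
    """Decompose to NFD and drop combining marks (accents, breathings, iota subscript)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_final_sigma(text):
    """Treat final sigma as the same letter as medial sigma (hypothetical helper)."""
    return text.replace(FINAL_SIGMA, MEDIAL_SIGMA)


# The two accented alphas are different code points...
print(ALPHA_GRAVE == ALPHA_ACUTE)
# ...but share one base letter once the marks are stripped.
print(base_letters(ALPHA_GRAVE) == base_letters(ALPHA_ACUTE))
# Normalization alone does not unify the two sigmas; that takes an explicit fold.
print(unicodedata.normalize("NFD", FINAL_SIGMA) == MEDIAL_SIGMA)
print(fold_final_sigma(FINAL_SIGMA) == MEDIAL_SIGMA)
```

Whether this extra normalization step is a real cost or a minor one depends on the application; searching and collation are where it tends to matter.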

Arguments one hears for coding polytonic classical Greek with Unicode in new e-pubs:

  1. Unicode is an international standard.
  2. It sucks to have to implement a transcoder vel sim. in the already hairy process of setting up Tomcat/Cocoon or another on-the-fly publication framework.
  3. If you offer your XML source files for download and the Greek is in TLG Beta Code, people can't easily read them without conversion.
  4. By virtue of the transcoder and other conversion methods out there, we can always go back to Beta Code, on the fly, when it is necessary.
  5. Beta Code uses punctuation marks in non-standard ways, so any tokenizer has to be rewritten (e.g. ")" marks a smooth breathing and can appear inside a word, so you can't count on it to follow the end of a word); this requires some extra programming in some instances.
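
Points 4 and 5 can be illustrated with a rough Python sketch. This is not the transcoder mentioned above, only a toy under stated assumptions: it handles lowercase Beta Code letters and the seven common diacritic signs, ignores capitals (the "*" prefix) and the rest of the TLG symbol set, and cannot tell a genuine parenthesis from a breathing mark. All function and variable names are invented for the example.

```python
import re
import unicodedata

# Beta Code letters in Greek alphabetical order, paired with alpha..omega
# (U+03B1..U+03C9, skipping final sigma U+03C2).
BETA_LETTERS = "abgdezhqiklmncoprstufxyw"
GREEK_LETTERS = [chr(c) for c in range(0x3B1, 0x3CA) if c != 0x3C2]
LETTERS = dict(zip(BETA_LETTERS, GREEK_LETTERS))

# Beta Code diacritic signs mapped to Unicode combining marks.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}
TO_BETA = {v: k for k, v in {**LETTERS, **MARKS}.items()}


def beta_tokens(text):
    """Tokenize so that diacritic signs stay attached to their word."""
    return re.findall(r"[a-z*()/\\=+|]+", text)


def beta_to_unicode(beta):
    """Convert lowercase Beta Code to precomposed (NFC) Unicode."""
    out = []
    for i, ch in enumerate(beta):
        if ch == "s" and not beta[i + 1:i + 2].isalpha():
            out.append("\u03c2")  # word-final sigma
        else:
            out.append(LETTERS.get(ch) or MARKS.get(ch) or ch)
    return unicodedata.normalize("NFC", "".join(out))


def unicode_to_beta(text):
    """Go back to Beta Code by decomposing (NFD) and mapping each code point."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        out.append("s" if ch == "\u03c2" else TO_BETA.get(ch, ch))
    return "".join(out)


line = "mh=nin a)/eide qea/"
print(re.findall(r"\w+", line))   # a generic tokenizer splits words at the diacritic signs
print(beta_tokens(line))          # the Beta-aware pattern keeps each word whole
print(beta_to_unicode(line))
print(unicode_to_beta(beta_to_unicode(line)) == line)
```

The round trip at the end is the sense in which one can "always go back": as long as the conversion table is one-to-one for the characters in use, storing Unicode loses nothing that a Beta Code consumer such as Morpheus needs.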