Collations for Ancient Languages in XSLT and XQuery

When writing XSLT stylesheets and XQuery queries, classicists often need to alphabetize their material in an order determined by the language or by other considerations. For example, scholars working with Latin may wish to conflate i with j and u with v, and those working with Greek may wish to have ϙ (qoppa) collated in the alphabet, or to include characters that lie outside the Greek and Coptic and the Greek Extended Unicode blocks.

The best solution is to use XSLT 3.0 (https://www.w3.org/TR/xslt-30/), XQuery 3.0 (https://www.w3.org/TR/xquery-30/), or XQuery 3.1 (https://www.w3.org/TR/xquery-31/), which all support the Unicode Collation Algorithm as specified in the Functions and Operators specification (https://www.w3.org/TR/xpath-functions-31/#uca-collations). Earlier W3C recommendations on XSLT (1.0, 2.0) and XQuery (1.0) provide for collations through attributes such as @collation, but they leave to individual transformation engines the decisions on how to construct and retrieve specific collations.
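As an illustration of the Unicode Collation Algorithm approach, here is a minimal sketch of an XSLT 3.0 stylesheet that sorts Greek words with a UCA collation URI. The input structure (a words element containing w elements) is hypothetical, and the choice of lang=el with strength=primary is an assumption made for the example; how closely a given processor honours these parameters should be checked against its documentation.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input: <words><w>λόγος</w><w>ἄνθρωπος</w>...</words> -->
  <xsl:template match="/words">
    <xsl:for-each select="w">
      <!-- UCA collation requested for Greek (lang=el).
           strength=primary compares base letters only, so differences
           of accent, breathing, and case do not affect the order. -->
      <xsl:sort select="."
          collation="http://www.w3.org/2013/collation/UCA?lang=el;strength=primary"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The same collation URI can be passed as the collation argument of functions such as fn:sort and fn:compare, or used in an XQuery order by clause.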

Syriac

 * Syriac Reference Portal: romanized transliteration scheme: definition and application -- intended to work with the Saxon engine.

Alternatives
 * It is possible to use the fn:translate function as a processor-independent collation method. This method maps each character to a chosen Unicode code point and relies upon default sorting by code point to alphabetize. Here is an example of how to sort, case-insensitively, a sequence of Greek words stored in the variable $gr (line breaks have been introduced into the two long strings to improve display on the screen; when using this code, remove all line breaks inside the select attribute):

<xsl:sort select="translate($gr,'ἀἁἂἃἄἅἆἇἈἉἊἋἌἍἎἏὰάᾀᾁᾂᾃᾄᾅᾆᾇᾈᾉᾊᾋᾌᾍᾎᾏᾰᾱᾲᾳᾴᾶᾷᾸᾹᾺΆᾼΆΑάαΒβϐΓγΔδἐἑἒἓἔ
ἕἘἙἚἛἜἝὲέῈΈΈΕέεϵ϶ΖζἠἡἢἣἤἥἦἧἨἩἪἫἬἭἮἯὴήᾐᾑᾒᾓᾔᾕᾖᾗᾘᾙᾚᾛᾜᾝᾞᾟῂῃῄῆῇῊΉῌͰͱΉΗήηΘθϑϴἰἱἲἳἴἵἶἷἸἹἺἻἼἽἾἿὶίῐῑ
ῒΐῖῗῘῙῚΊΊΐΙΪίιϊϳΚκϏϗϰΛλΜμΝνΞξὀὁὂὃὄὅὈὉὊὋὌὍὸόῸΌΌΟοόΠπϺϻῤῥῬΡρϱϼΣςσϲϹϽϾϿΤτὐὑὒὓὔὕὖὗὙὛὝὟὺύῠῡῢΰῦῧῨῩ
ῪΎΎΥΫΰυϋύϒϓϔΦφϕΧχΨψὠὡὢὣὤὥὦὧὨὩὪὫὬὭὮὯὼώᾠᾡᾢᾣᾤᾥᾦᾧᾨᾩᾪᾫᾬᾭᾮᾯῲῳῴῶῷῺΏῼΏΩωώϖϚϛϜϝϞϟϘϙͲͳϠϡϷϸϢϣϤϥϦϧϨϩϪϫϬϭ
Ϯϯ᾽ι᾿῀῁῍῎῏῝῞῟῭΅`´῾ʹ͵Ͷͷͺͻͼͽ;΄΅·',
'ααααααααααααααααααααααααααααααααααααααααααααααααααβββγγδδεεεεεεεεεεεεεεεεεεεεεεζζηηηηηηηηηη
ηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηηθθθθιιιιιιιιιιιιιιιιιιιιιιιιιιιιιιιιιιιικκκκκλλμμννξξο
οοοοοοοοοοοοοοοοοοοππϻϻρρρρρρρσσσσσσσσττυυυυυυυυυυυυυυυυυυυυυυυυυυυυυυυυυυφφφχχψψωωωωωωωωωωω
ωωωωωωωωωωωωωωωωωωωωωωωωωωωωωωωωωωωω
ϛϛϝϝϟϟϙϙϠϠϡϡϸϸϣϣϥϥϧϧϩϩϫϫϭϭϯϯ')"/>

 * A simpler expression, which should give much the same result, normalizes the string to decomposed Unicode (NFD), strips out the combining diacritics, and converts what remains to lower case:

lower-case(translate(normalize-unicode($gr,'NFD'), '&#x0300;&#x0301;&#x0308;&#x0313;&#x0314;&#x0342;&#x0345;',''))
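The same processor-independent technique can be adapted to Latin. The following is a minimal sketch, not a tested recipe: it assumes a hypothetical input document with a words element containing w elements, and it assumes that conflating j with i and v with u, and ignoring macrons and other combining marks, is the desired behaviour. The code point collation URI is the one defined in the Functions and Operators specification; naming it explicitly avoids depending on a processor's default collation.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input: <words><w>Iuppiter</w><w>juvenis</w>...</words> -->
  <xsl:template match="/words">
    <xsl:for-each select="w">
      <!-- Build the sort key in four steps:
           1. normalize-unicode(., 'NFD') decomposes letters and diacritics;
           2. replace(..., '\p{M}', '') removes the combining marks;
           3. lower-case(...) removes case distinctions;
           4. translate(..., 'jv', 'iu') conflates j with i and v with u.
           The key is then compared by Unicode code point. -->
      <xsl:sort
          select="translate(lower-case(replace(normalize-unicode(., 'NFD'), '\p{M}', '')), 'jv', 'iu')"
          collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
      <!-- The original spelling is output unchanged; only the sort key is altered. -->
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Because the key is computed only inside xsl:sort, the displayed text keeps its original orthography while the ordering treats the conflated letters as identical.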