Python Unicode Collation Algorithm

From The Digital Classicist Wiki



  • James Tauber


From the GitHub page (accessed 2020-10-06):

Python Unicode Collation Algorithm (PyUCA) is a Python implementation of the Unicode Collation Algorithm (UCA). It is used for sorting non-English strings properly. The core of the algorithm involves multi-level comparison. For example, café comes before caff because, at the primary level, the accent is ignored and the first word is treated as if it were cafe. The secondary level (which considers accents) then applies only to words that are equivalent at the primary level; for example, cafe comes before café.
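
The multi-level idea can be illustrated with a small sketch that uses only the Python standard library. This is a simplified illustration, not the UCA itself and not PyUCA's implementation: the function name two_level_key is hypothetical, and the sketch approximates the primary level by stripping combining marks and the secondary level by recording them, whereas the real algorithm looks up weights in a collation element table.

```python
import unicodedata


def two_level_key(word):
    """Toy two-level sort key (hypothetical helper, not part of PyUCA)."""
    # Decompose so that accented letters become base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    # Primary level: base characters only, with combining marks dropped.
    primary = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Secondary level: canonical combining class of every character
    # (0 for base letters, non-zero for accents). Python compares tuples
    # element by element, so this is consulted only when primaries are equal.
    secondary = tuple(unicodedata.combining(ch) for ch in decomposed)
    return (primary, secondary)


words = ["caff", "caf\u00e9", "cafe"]  # "caf\u00e9" is café

# Plain code point order puts café last, because é sorts after f.
by_code_point = sorted(words)

# The two-level key ignores the accent first, then uses it as a tie-breaker.
by_levels = sorted(words, key=two_level_key)
```

Here by_code_point is cafe, caff, café, while by_levels is cafe, café, caff, matching the ordering described above. For real-world sorting, PyUCA's README documents creating a Collator and passing its sort_key method as the key argument to sorted.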