6904, 6905, 6907 libc: locale/collation fixes

Review Request #402 - Created March 16, 2017 and submitted

Information
Yuri Pankov
illumos-gate
master
6904, 6905, 6907
7f632ed...
Reviewers
general

  • 6904 collation: Fix expansion substitutions -- fix from FreeBSD (originally from DragonFly BSD)

     collate: Fix expansion substitutions (broken upstream too)

     Through testing, the user noted that some Cyrillic characters were
     not sorting correctly, and this was confirmed.

     After extensive testing and review, the localedef tool was
     eliminated as the culprit.  The substitutions were encoded correctly
     in LC_COLLATE.

     The error was mainly in wcscoll, where character expansions were
     mishandled.  The main directive pass routines had to be rewritten to
     go back for a new collation value when the "state" variable was set.
     Previously the pointers were advanced first, so the second lookup
     was applied to the wrong character, etc.

     The "eat expansion codes" section in collate.c also had a bug. Later
     on, the "state" variable logic was changed to be set only if the next
     code was greater than zero (rather than >= 0).

     Some additional cleanups got captured from previous work:
     1) The previous commit moved the binary search comment from the
        correct location to a wrong location because it's wrong upstream
        in Illumos.  The comment has little value so I just removed it.
     2) Don't check if pointers are null before freeing; this is
        redundant, as free() handles null pointers.
     3) The two binary search trees were standardized with respect to
        initialization.
     4) On the binary search trees, a negative "high" exits rather than
        checking the table count again.
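
The expansion handling described above can be sketched as follows. This is a toy model, not the collate.c code: the table, the names `toy_lookup`, `next_weight` and `toy_coll`, and the choice of 'S' as a character that expands to the weights of "ss" are all assumptions made for illustration. The point it shows is that while "state" is set, the next lookup goes back to the same character instead of the input pointer being advanced first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy collation table (invented for this sketch): 'a'..'z' weigh 1..26,
 * any other character weighs 27, and 'S' expands to the two weights of
 * "ss".  *state is non-zero while an expansion still has weights pending.
 */
static int toy_lookup(char c, int *state)
{
    if (c == 'S') {
        /* First call: one more weight pending.  Second call: done. */
        *state = (*state == 0) ? 1 : 0;
        return 's' - 'a' + 1;
    }
    *state = 0;
    return (c >= 'a' && c <= 'z') ? c - 'a' + 1 : 27;
}

/*
 * Return the next weight from *sp, or 0 at end of string.  The pointer is
 * advanced only when no expansion is pending, so a second lookup is applied
 * to the same character rather than to the one that follows it.
 */
static int next_weight(const char **sp, int *state)
{
    if (**sp == '\0')
        return 0;
    int w = toy_lookup(**sp, state);
    if (*state == 0)
        (*sp)++;
    return w;
}

/* Compare two strings weight by weight; returns -1, 0 or 1. */
int toy_coll(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    int sa = 0, sb = 0;
    for (;;) {
        int wa = next_weight(&a, &sa);
        int wb = next_weight(&b, &sb);
        if (wa != wb)
            return (wa < wb) ? -1 : 1;
        if (wa == 0)
            return 0;
    }
}
```

With this table, "maSe" and "masse" produce the same weight sequence. Advancing the pointer before the second lookup would instead weigh the 'e' in place of the second 's', which is the kind of mismatch the commit message describes.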

  • 6905 locales: Fix eucJP sorting -- fix from FreeBSD (originally from DragonFly BSD)
     locales: Fix eucJP sorting (broken upstream?)

     Sorting eucJP text with "sort" resulted in an illegal sequence while
     "gsort" worked.  This was traced back to mbrtowc handling which was
     broken for eucJP (probably eucCN, eucKR, and eucTW as well).  This
     small fix took hours to figure out.  The OR operation to build the
     wide character requires an unsigned character to work correctly. The
     euc wcrtowc conversion is probably broken upstream in Illumos as
     well.
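
The signedness problem can be reproduced in isolation. The sketch below is not the libc decoder; `combine_signed` and `combine_unsigned` are hypothetical helpers that only accumulate bytes with shift and OR, and the byte pair 0xA4 0xA2 (hiragana "a" in eucJP) is used as sample input. It shows why the OR needs an unsigned character: a byte with the high bit set becomes negative when read as a signed char, and the sign extension sets every upper bit of the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy variant: each byte is treated as a signed char.  A byte such as
 * 0xA4 is then negative, and widening it sign-extends, so the OR sets all
 * of the high bits of the accumulated value.
 */
uint32_t combine_signed(const char *s, size_t n)
{
    assert(s != NULL);
    uint32_t wc = 0;
    for (size_t i = 0; i < n; i++)
        wc = (wc << 8) | (uint32_t)(int32_t)(signed char)s[i];
    return wc;
}

/*
 * Fixed variant: each byte is converted to unsigned char first, so only
 * its low eight bits take part in the OR.
 */
uint32_t combine_unsigned(const char *s, size_t n)
{
    assert(s != NULL);
    uint32_t wc = 0;
    for (size_t i = 0; i < n; i++)
        wc = (wc << 8) | (unsigned char)s[i];
    return wc;
}
```

For the two bytes 0xA4 0xA2, the unsigned variant yields 0xA4A2, while the signed variant yields 0xFFFFFFA2 on the usual two's-complement platforms. A value corrupted in this way would plausibly be rejected as an illegal sequence further down the line.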

  • 6907 strcoll() and strxfrm() don't seem to agree
     Rework the forward order case in wcscoll_l().
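
The property behind 6907 is that strcoll() on two strings and strcmp() on their strxfrm() transforms should order them the same way. A minimal check for one pair of strings, assuming the caller has already selected the locale with setlocale(), might look like this; `coll_xfrm_agree` is a hypothetical helper, not one of the test cases from the tickets.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Return 1 if strcoll() and strxfrm() agree on the ordering of a and b in
 * the current locale, 0 if they disagree, and -1 if memory ran out.
 */
int coll_xfrm_agree(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* With a size of 0, strxfrm() only reports the length it needs. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *xa = malloc(na);
    char *xb = malloc(nb);
    int ok = -1;

    if (xa != NULL && xb != NULL) {
        strxfrm(xa, a, na);
        strxfrm(xb, b, nb);
        ok = (sign(strcoll(a, b)) == sign(strcmp(xa, xb)));
    }

    /* free() accepts null pointers, so no checks are needed here. */
    free(xa);
    free(xb);
    return ok;
}
```

In the default "C" locale the two always agree, so the check is only interesting after switching to a locale with real collation rules, such as the ones named in the tickets.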

Tested using the test cases provided in the tickets.

Yuri Pankov
Yuri Pankov
Baptiste Daroussin
Yuri Pankov
Robert Mustacchi
Robert Mustacchi
Yuri Pankov
Review request changed

Status: Closed (submitted)
