6904, 6905, 6907 libc: locale/collation fixes

Review Request #402 — Created March 16, 2017 and submitted

yuripv
illumos-gate
master
6904, 6905, 6907
7f632ed...
general

6904 collation: Fix expansion substitutions -- fix from FreeBSD (originally from DragonFly BSD)

     collate: Fix expansion substitions (broken upstream too)

     Through testing, the user noted that some Cyrillic characters were
     not sorting correctly, and this was confirmed.

     After extensive testing and review, the localedef tool was
     eliminated as the culprit.  The sustitutions were encoded correctly
     in LC_COLLATE.

     The error was mainly in wcscoll where character expansions were
     mishandled.  The main directive pass routines had to be written to
     go back for a new collation value when the "state" variable was set.
     Before pointers were being advanced, the second lookup was gettting
     applied to the wrong character, etc.

     The "eat expansion codes" section on collate.c also had a bug. Later
     own, the "state" variable logic was changed to only set if next
     code was greater than zero (rather than >= 0).

     Some additional cleanups got captured from previous work:
     1) The previous commit moved the binary search comment from the
        correct location to a wrong location because it's wrong upstream
        in Illumos.  The comment has little value so I just removed it.
     2) Don't check if pointers are null before freeing, this is
        redundant as free() handles null pointers.
     3) The two binary search trees were standardized wrt initialization
     4) On the binary search trees, a negative "high" exits rather than
        checking the table count again.

  • 6905 locales: Fix eucJP sorting -- fix from FreeBSD (originally from DragonFly BSD)
     locales: Fix eucJP sorting (broken upstream?)

     Sorting eucJP text with "sort" resulted in an illegal sequence while
     "gsort" worked.  This was traced back to mbrtowc handling which was
     broken for eucJP (probably eucCN, eucKR, and eucTW as well).  This
     small fix took hours to figure out.  The OR operation to build the
     wide character requires an unsigned character to work correctly. The
     euc wcrtowc conversion is probably broken upstream in Illumos as
     well.

  • 6907 strcoll() and strxfrm() don't seem to agree
     Rework the forward order case in wcscoll_l().

Tested using the test cases provided in the tickets.

  • 0
  • 0
  • 2
  • 0
  • 2
Description From Last Updated
yuripv
yuripv
bapt
  1. Looks good to me, I have tested the 6907 fix on FreeBSD and it works for me

  2. 
      
yuripv
rm
  1. The changes genearlly look good as far as I could follow them. Thanks Yuri.

    1. Thanks, Robert. I've removed the test cases that were there originally as they don't really show full problem severity (and are thirdparty licensed), and added the test case for the strcoll() vs strxfrm() problem that I used while fixing the issue (under CDDL, per original's code author, quoting "I don't care what you do with that code, CDDL is fine").

  2. I think we may need some thirdpartylicense bits for these tests.

  3. While we depend on locale/ru for the libc-tests package, it doesn't include the ISO8859-5 locale. We may need another dep here or to use a locale that has it.

  4. 
      
yuripv
yuripv
rm
  1. What ended up happening to the older tests?

    1. I just removed them, given the locale dependencies and licensing - new one should cover what those did (and much more besides that).

    2. OK, sounds good. Thanks.
  2. 
      
yuripv
Review request changed

Status: Closed (submitted)

Loading...