11741 regexec: fix processing multibyte strings

Review Request #2343 — Created Sept. 21, 2019 and submitted

yuripv
illumos-gate
master
11741
2315
4ec9c83...
general

The matcher function incorrectly assumed that the moffset we get from
findmust is in bytes rather than characters. Fix this by introducing a
stepback function that takes a short path if MB_CUR_MAX is 1 and
otherwise goes back byte by byte, checking whether we have a legal
character sequence.

$ echo 'éa' | sed -ne '/.a/p'
$ echo 'éa' | LD_LIBRARY_PATH=~/ws/il11741/proto/root_i386/lib sed -ne '/.a/p'
éa
$ echo 'aéaa' | sed -ne '/a.aa/p'
$ echo 'aéaa' | LD_LIBRARY_PATH=~/ws/il11741/proto/root_i386/lib sed -ne '/a.aa/p'
aéaa
$ echo 'éaé' | sed -ne '/.a./p'
$ echo 'éaé' | LD_LIBRARY_PATH=~/ws/il11741/proto/root_i386/lib sed -ne '/.a./p'
éaé
yuripv
tsoome
  1. Ship It!
yuripv
Review request changed

Status: Closed (submitted)
