Bug #4580

/usr/bin/grep can't handle multibyte characters

Added by Alexander Pyhalov almost 5 years ago. Updated 10 months ago.

Status: Closed
Start date: 2014-02-06
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: -
Target version: -
Difficulty: Medium
Tags: needs-triage

Description

Grep doesn't work correctly with non-English text.
For example, I can't grep for the Cyrillic "а" character.

$ cat ~/tmp/test 
абвгдеёжзийклмнопрстуфхцчшщъыьэюя
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
$ /usr/bin/grep -i а ~/tmp/test
grep: RE error 67: Illegal byte sequence
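
The report does not say why the error occurs. As a rough illustration only (Python, not the grep source), the sketch below shows that the Cyrillic "а" takes two bytes in UTF-8 and that neither byte is a valid sequence on its own. A matcher that treats each byte as a character could plausibly trip over exactly this. The helper name is_valid_utf8 is hypothetical.

```python
# Illustration only: this is not how /usr/bin/grep is implemented.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

pattern = "\u0430"                 # Cyrillic small letter "а"
encoded = pattern.encode("utf-8")  # two bytes: 0xd0 0xb0

# Treating each byte as its own character yields two invalid fragments.
fragments = [encoded[i:i + 1] for i in range(len(encoded))]

print(encoded.hex())                          # d0b0
print([is_valid_utf8(f) for f in fragments])  # [False, False]
print(is_valid_utf8(encoded))                 # True
```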

History

#1 Updated by Yuri Pankov almost 5 years ago

Sounds like yet another reason to make /usr/xpg4/bin/grep the default one:

$ /usr/xpg4/bin/grep -i абвгде greptest
абвгдеёжзийклмнопрстуфхцчшщъыьэюя
АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ
$
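
The result above is what a character-aware, case-insensitive search should produce. A small Python sketch of the same comparison follows, assuming UTF-8 input. It contrasts matching on decoded characters with matching on raw bytes, where case folding only covers ASCII. It says nothing about how either grep binary is written.

```python
import re

lines = [
    "абвгдеёжзийклмнопрстуфхцчшщъыьэюя",
    "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ",
]

# Character-aware search: case folding understands Cyrillic letters.
char_hits = [ln for ln in lines if re.search("абвгде", ln, re.IGNORECASE)]

# Byte-oriented search: with a bytes pattern, IGNORECASE folds ASCII only,
# so the upper-case line is missed.
byte_pat = re.compile("абвгде".encode("utf-8"), re.IGNORECASE)
byte_hits = [ln for ln in lines if byte_pat.search(ln.encode("utf-8"))]

print(len(char_hits))  # 2
print(len(byte_hits))  # 1
```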

#2 Updated by Yuri Pankov over 4 years ago

  • Subject changed from grep doesn't work correctly with non-english text to /usr/bin/grep can't handle multibyte characters

#3 Updated by Garrett D'Amore over 4 years ago

Agreed. However, note that xpg4 grep uses extended regexes rather than basic ones, which may break some people's scripts. We should discuss.
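
To make the compatibility concern concrete: in POSIX basic regular expressions "+" is an ordinary character, while in extended regular expressions it is a repetition operator, so the same pattern can select different lines. The sketch below uses Python's re module as a stand-in for the extended reading and emulates the basic reading by escaping the "+". Python's re is not a POSIX implementation, and the pattern is a made-up example.

```python
import re

pattern = "a+b"  # hypothetical pattern a script might pass to grep

# Extended-style reading: "+" means "one or more of the preceding item".
extended = re.compile(pattern)

# Basic-style reading: "+" is a literal plus sign, emulated by escaping it.
basic = re.compile(pattern.replace("+", r"\+"))

print(bool(extended.search("aab")), bool(extended.search("a+b")))  # True False
print(bool(basic.search("aab")), bool(basic.search("a+b")))        # False True
```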

#4 Updated by Electric Monk 10 months ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

git commit d2d52addd50254d1b7c318c6784172d8d7de20c6

commit  d2d52addd50254d1b7c318c6784172d8d7de20c6
Author: Alexander Pyhalov <apyhalov@gmail.com>
Date:   2018-01-09T19:12:32.000Z

    8858 /usr/bin/grep doesn't support -E option
    4580 /usr/bin/grep can't handle multibyte characters
    8929 8868 tests are not delivered with system/test/utiltest
    8860 Example in grep(1) is incorrect
    Reviewed by: Peter Tribble <peter.tribble@gmail.com>
    Reviewed by: Toomas Soome <tsoome@me.com>
    Reviewed by: Yuri Pankov <yuripv@gmx.com>
    Approved by: Robert Mustacchi <rm@joyent.com>
