A good start is to read the “GNU coding standards” and the “Information for maintainers of GNU software” documents.
GNU grep's mailing lists are hosted on lists.gnu.org.
To report bugs, suggest features, ask questions, or help in the development of GNU grep, please consider joining the bug-grep mailing list. Bug fixes and patches are better posted using the Savannah tools described below, rather than attaching them in email messages sent to this mailing list. To subscribe to this mailing list, send an email message to bug-grep-request@gnu.org with "subscribe" (without the quotation marks) in the subject header field (or in the body) of the email message, or visit the web page of the mailing list. Its archives are also available.
The list messages can be filtered by matching the following header field:
X-BeenThere: bug-grep@gnu.org
The list also automatically receives messages from the Savannah trackers that can be filtered by matching the following additional header fields:
X-Savane-Project: grep
X-Savane-Tracker: bugs
or:
X-Savane-Project: grep
X-Savane-Tracker: patch
or:
X-Savane-Project: grep
X-Savane-Tracker: support
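For readers who filter their mail programmatically, the header matching described above can be sketched in C as follows. This is only an illustration: the function names `has_header` and `is_grep_tracker_message` are hypothetical, the headers are assumed to be already unfolded (one field per line), and `strncasecmp` is assumed to be available from POSIX `<strings.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Return true if HEADERS (unfolded header lines separated by '\n')
   contains a field called NAME whose value is exactly VALUE.
   Field names are compared case-insensitively.  */
bool
has_header (const char *headers, const char *name, const char *value)
{
  size_t name_len = strlen (name);
  size_t value_len = strlen (value);
  const char *line = headers;

  while (line && *line)
    {
      const char *eol = strchr (line, '\n');
      size_t line_len = eol ? (size_t) (eol - line) : strlen (line);

      if (line_len > name_len
          && strncasecmp (line, name, name_len) == 0
          && line[name_len] == ':')
        {
          const char *v = line + name_len + 1;
          const char *end = line + line_len;

          /* Skip blanks after the colon and trailing blanks or CR.  */
          while (v < end && (*v == ' ' || *v == '\t'))
            v++;
          while (end > v
                 && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r'))
            end--;

          if ((size_t) (end - v) == value_len
              && memcmp (v, value, value_len) == 0)
            return true;
        }
      line = eol ? eol + 1 : NULL;
    }
  return false;
}

/* True for bug-grep list messages that came from the given Savannah
   tracker ("bugs", "patch" or "support").  */
bool
is_grep_tracker_message (const char *headers, const char *tracker)
{
  return has_header (headers, "X-BeenThere", "bug-grep@gnu.org")
         && has_header (headers, "X-Savane-Project", "grep")
         && has_header (headers, "X-Savane-Tracker", tracker);
}
```

Calling `is_grep_tracker_message (headers, "patch")` would then select only the messages generated by the patch tracker.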
To follow development more closely, there is also the grep-commit mailing list. More details about what email messages are sent there can be found in the CVS repository section below. This is a read-only mailing list; subscribers cannot post directly to it. To subscribe to this mailing list, send an email message to grep-commit-request@gnu.org with "subscribe" (without the quotation marks) in the subject header field (or in the body) of the email message, or visit the web page of the mailing list. Its archives are also available.
The list messages can be filtered by matching the following header field:
X-BeenThere: grep-commit@gnu.org
Older GNU grep releases directed users to the bug-gnu-utils mailing list. As a consequence, some still post their bug reports and questions there. For this reason, it is a good idea for GNU grep developers to monitor this mailing list and follow up on related threads started there by redirecting them to the bug-grep mailing list. New threads about GNU grep must not be intentionally started there. To subscribe to this mailing list, send an email message to bug-gnu-utils-request@gnu.org with "subscribe" (without the quotation marks) in the subject header field (or in the body) of the email message, or visit the web page of the mailing list. Its archives are also available.
The list messages can be filtered by matching the following header field:
X-BeenThere: bug-gnu-utils@gnu.org
The Savannah project page for GNU grep features a bug report area, a patch submission area, and other development-related tools.
If you wish to post bug reports or patches on Savannah, it is preferable that you create an account for yourself there and that you log in before posting, so that other developers can know who you are and follow up on your posting with that in mind.
Before you contribute significant changes to GNU grep, the Free Software Foundation (FSF) requires that you sign copyright assignment papers. If you have not already done so and are not willing or able to, it may be better to just describe the bug or proposed feature rather than post actual code (or documentation), as that would have to be rewritten anyway.
Please keep these areas clean by posting only information that is directly related to the bug or patch at hand. Ask basic questions on the bug-grep mailing list.
The identity of the current maintainers is also available there.
Generic instructions can be found on GNU grep's Savannah web page about CVS.
The contents of GNU grep's source code are stored in the following CVS repository:
CVS_RSH=ssh cvs -z3 -d:ext:anoncvs@savannah.gnu.org:/cvsroot/grep co grep
This repository is also available from its web interface.
Each time a commit is made to this tree, a message is sent to the grep-commit mailing list which can be filtered by matching the following header field:
To: grep-cvs-logs@gnu.org
Additionally, each time a file is modified in this tree, a message is sent to the grep-commit mailing list which can be filtered by matching the following header field:
To: grep-cvs-diffs@gnu.org
Daily snapshots of GNU grep's source code CVS repository are made available by Tony Abou-Assaleh. Just like a regular release, they have the advantage of containing the files generated by the GNU Autotools, which are not found in CVS.
The contents of GNU grep's web site at http://www.gnu.org/software/grep/ are automatically extracted from the following CVS repository:
CVS_RSH=ssh cvs -z3 -d:ext:anoncvs@savannah.gnu.org:/webcvs/grep co grep
This repository is also available from its web interface.
Each time a commit is made to this tree, a message is sent to the grep-commit mailing list which can be filtered by matching the following header field:
To: grep-webcvs-logs@gnu.org
Additionally, each time a file is modified in this tree, a message is sent to the grep-commit mailing list which can be filtered by matching the following header field:
To: grep-webcvs-diffs@gnu.org
(The grep-commit mailing list functionality for this tree should now work thanks to Savannah sr #103962.)
Information about CVS itself is available from its web site. Information about SSH is available from the OpenSSH web site or from the lsh web site.
Developers with write access to the CVS trees will need to create an account on Savannah and upload their SSH public identity information there.
People who can't access a CVS repository through its usual interface (because they sit behind a prohibitive firewall) can download individual files from a CVS repository's web interface, when one is available. This latter process can be automated by using a client program such as CVSGrab.
The latest stable release of GNU grep is "2.5.1a".
The current roadmap for GNU grep has been laid out in a 2005-03-08 post by Stepan Kasal on the bug-grep mailing list entitled “Plan for grep”:
2.5.2
=====

Our main goal for grep 2.5.2 is to get sane performance with utf-8.
That can be achieved by the patches written by Tim Waugh for Red Hat.

Besides that, I can do some changes in the infrastructure, so that I can "breathe":

1) rewrite the configure.in script, perhaps also Makefile.am
2) set up for gnulib-tool --import
3) improve the test infrastructure

I'm afraid I have to do 1) myself, and it is closely tied with 2), so they probably have to be done together.

If someone likes awk and wanted to help with 3), it could help. In short, there should be only one awk script for .test-->.script rule. The header of each .test file should state some details, like which command to run, eg. "grep -E".

We also have to invent a way to collect the test cases for non-C locales; either by running the whole set twice, or by creating a separate .test files. The "make check" goal should run this, if the computer has a locale like en_US.utf8 installed.

After completing these, we can:

4) check in the patches for the sync of dfa.c with GNU awk
5) other small patches which wait for a test case
6) process the RedHat patches

After 6), I should repeat Tim's measurements and see whether the utf8 performance improved.

Independently, I'd like to see

7) some _minimal_ cleanup of the grep(), grepdir(), recursion (the "main loop") and fix --directories=read
8) mark the -P option clearly as "experimental";

Well, that'll be perhaps enough for a release.

2.5.3
=====

Fix the combinations:

* -i -o
* --colour -i
* -o -b
* -o and zero-width matches

Go through the bug list in my mailbox and fix fixable.
Fix bugs reported with 2.5.2.

2.6.x
=====

The following should go here:

- upgrade to current regex.c from glibc,
- new functionality,
- fixes for -P,
- heavy refactoring.
A number of tasks must be performed before every release.
Drop dfa.[ch] into a copy of gawk and run “make check”.
The grep.pot file must be sent to the Translation Project to get fresh po files.
The ABOUT-NLS file must be updated by getting a fresh copy from GNU gettext's CVS with
cvs -d:pserver:anoncvs@sources.redhat.com:/cvs/gettext co gettext/gettext-runtime/ABOUT-NLS
with password “anonymous” or the following line in $HOME/.cvspass:
/1 :pserver:anoncvs@sources.redhat.com:2401/cvs/gettext Ay=0=a%0bZ
(Shouldn't this be automated by “make dist” instead of keeping a redundant copy in GNU grep's CVS?)
The NEWS file must be updated to document significant new features in GNU grep.
Some regression tests may be known to fail for the impending release. These specific tests should either document in their output that their failure is known about and to be expected and ignored, or they should just be disabled in the release (but kept activated in CVS after that). This is to limit the number of redundant bug reports.
The source code for GNU grep includes a TODO file which contains various ideas and issues that may be worth exploring.
See this list of grep implementations.
Take a look at these and consider opportunities for merging or cloning:
In general, interesting things to check in POSIX/OpenGroup include:
For this issue, interesting things to check in POSIX include:
In particular, consider the following with POSIX' approach on case folding in mind. Assume a non-Turkic locale with a character repertoire reduced to the following various forms of “LATIN LETTER I”:
0049;LATIN CAPITAL LETTER I;Lu;0;L;;;;;N;;;;0069;
0069;LATIN SMALL LETTER I;Ll;0;L;;;;;N;;;0049;;0049
0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049
First note the differing UTF-8 octet lengths of U+0049 (0x49) and U+0069 (0x69) versus U+0130 (0xC4 0xB0) and U+0131 (0xC4 0xB1). This implies that whole UTF-8 strings cannot be case-converted in place, using the same memory buffer, and that the needed octet-size of the new buffer cannot merely be guessed.
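As a small illustration of the octet-length point, the following C sketch computes how many octets a code point, or a string of code points, occupies in UTF-8. The names `utf8_length` and `utf8_string_length` are hypothetical and are not taken from GNU grep's source code.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of octets needed to encode code point CP in UTF-8,
   or 0 if CP is a surrogate or lies beyond U+10FFFF.  */
size_t
utf8_length (uint32_t cp)
{
  if (cp < 0x80)
    return 1;
  if (cp < 0x800)
    return 2;
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;
  if (cp < 0x10000)
    return 3;
  if (cp < 0x110000)
    return 4;
  return 0;
}

/* Total number of octets needed to encode N code points.  */
size_t
utf8_string_length (const uint32_t *cps, size_t n)
{
  size_t total = 0;
  for (size_t i = 0; i < n; i++)
    total += utf8_length (cps[i]);
  return total;
}
```

With this, a string made of two U+0130 characters needs four octets, while its lower-case conversion (two U+0069 characters) needs only two, so the size of the destination buffer has to be computed in a separate pass rather than assumed equal to that of the source.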
We have
lc(I) = i,  uc(I) = I
lc(i) = i,  uc(i) = I
lc(İ) = i,  uc(İ) = İ
lc(ı) = ı,  uc(ı) = I
where lc() and uc() denote lower-case and upper-case conversions.
There are several candidate --ignore-case logics (including the one mandated by POSIX):
Using the
if (lc(input_wchar) == lc(pattern_wchar))
logic leads to the following matches:
   \in   I i İ ı
pat\   ----------
  "I" |  Y Y Y n
  "i" |  Y Y Y n
  "İ" |  Y Y Y n
  "ı" |  n n n Y
There is a lack of symmetry between CAPITAL and SMALL LETTERs with this.
Using the
if (uc(input_wchar) == uc(pattern_wchar))
logic leads to the following matches:
   \in   I i İ ı
pat\   ----------
  "I" |  Y Y n Y
  "i" |  Y Y n Y
  "İ" |  n n Y n
  "ı" |  Y Y n Y
There is a lack of symmetry between CAPITAL and SMALL LETTERs with this.
Using the
if (   lc(input_wchar) == lc(pattern_wchar)
    || uc(input_wchar) == uc(pattern_wchar))
logic leads to the following matches:
   \in   I i İ ı
pat\   ----------
  "I" |  Y Y Y Y
  "i" |  Y Y Y Y
  "İ" |  Y Y Y n
  "ı" |  Y Y n Y
There is some elegance and symmetry with this. But there are potentially two conversions to be made per input character. If the pattern is pre-converted, two copies of it need to be kept and used in a mutually coherent fashion.
Using the
if (   input_wchar == pattern_wchar
    || lc(input_wchar) == pattern_wchar
    || uc(input_wchar) == pattern_wchar)
logic (as mandated by POSIX) leads to the following matches:
   \in   I i İ ı
pat\   ----------
  "I" |  Y Y n Y
  "i" |  Y Y Y n
  "İ" |  n n Y n
  "ı" |  n n n Y
There is a different CAPITAL/SMALL symmetry with this. But there's also a loss of pattern/input symmetry that's unique to it. Also there are potentially two conversions to be made per input character.
Using the
if (lc(uc(input_wchar)) == lc(uc(pattern_wchar)))
logic leads to the following matches:
   \in   I i İ ı
pat\   ----------
  "I" |  Y Y Y Y
  "i" |  Y Y Y Y
  "İ" |  Y Y Y Y
  "ı" |  Y Y Y Y
This shows total symmetry and transitivity (at least in this example analysis). There are two conversions to be made per input character, but support could be added for having a single straight mapping performing a composition of the two conversions.
Any optimization in the implementation of each logic must not change its basic semantics.
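The five candidate logics can be compared side by side with a small C sketch. It assumes toy `lc()` and `uc()` functions that implement only the four-character repertoire used in this example (they are not the C library's `towlower()` and `towupper()`), and the names `match_lc`, `match_uc`, `match_lc_or_uc`, `match_posix`, `match_lc_uc`, and `match_row` are hypothetical.

```c
#include <stdbool.h>
#include <wchar.h>

/* Toy conversions limited to U+0049, U+0069, U+0130 and U+0131,
   following the non-Turkic mappings listed in this section.  */
wchar_t
lc (wchar_t c)
{
  return (c == 0x0049 || c == 0x0130) ? 0x0069 : c;
}

wchar_t
uc (wchar_t c)
{
  return (c == 0x0069 || c == 0x0131) ? 0x0049 : c;
}

/* One predicate per candidate --ignore-case logic.  */
bool
match_lc (wchar_t in, wchar_t pat)
{
  return lc (in) == lc (pat);
}

bool
match_uc (wchar_t in, wchar_t pat)
{
  return uc (in) == uc (pat);
}

bool
match_lc_or_uc (wchar_t in, wchar_t pat)
{
  return lc (in) == lc (pat) || uc (in) == uc (pat);
}

/* The logic described above as mandated by POSIX: only the input
   is converted, never the pattern.  */
bool
match_posix (wchar_t in, wchar_t pat)
{
  return in == pat || lc (in) == pat || uc (in) == pat;
}

bool
match_lc_uc (wchar_t in, wchar_t pat)
{
  return lc (uc (in)) == lc (uc (pat));
}

typedef bool (*match_fn) (wchar_t in, wchar_t pat);

/* The inputs, in the column order used by the tables: I i İ ı.  */
const wchar_t repertoire[4] = { 0x0049, 0x0069, 0x0130, 0x0131 };

/* Fill ROW with one table row ('Y' or 'n' per input) for PATTERN.  */
void
match_row (match_fn f, wchar_t pattern, char row[5])
{
  for (int i = 0; i < 4; i++)
    row[i] = f (repertoire[i], pattern) ? 'Y' : 'n';
  row[4] = '\0';
}
```

Calling `match_row` once per pattern character with each predicate regenerates the rows of the corresponding table, which makes it easy to check that an optimized implementation still yields the same matches.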
In general, interesting things to check in Unicode include:
For this issue, interesting things to check in Unicode include:
Unicode uses the
if (toCasefold(input_wchar_string) == toCasefold(pattern_wchar_string))
logic for caseless matching. Let's consider the “LATIN LETTER I” example mentioned above. In a non-Turkic locale, simple case folding yields
toCasefold_simple(U+0049) = U+0069
toCasefold_simple(U+0069) = U+0069
toCasefold_simple(U+0130) = U+0130
toCasefold_simple(U+0131) = U+0131
which leads to the following matches:
   \in   I i İ ı
pat\   ----------
  "I" |  Y Y n n
  "i" |  Y Y n n
  "İ" |  n n Y n
  "ı" |  n n n Y
This is different from anything so far!
In a non-Turkic locale, full case folding yields
toCasefold_full(U+0049) = U+0069
toCasefold_full(U+0069) = U+0069
toCasefold_full(U+0130) = <U+0069, U+0307>
toCasefold_full(U+0131) = U+0131
with
0307;COMBINING DOT ABOVE;Mn;230;NSM;;;;;N;NON-SPACING DOT ABOVE;;;;
which leads to the following matches:
   \in   I i İ ı
pat\   ----------
  "I" |  Y Y * n
  "i" |  Y Y * n
  "İ" |  n n Y n
  "ı" |  n n n Y
This is just sad!
Note that having toCasefold(U+0131), simple or full, map to itself instead of U+0069 is in contradiction with the rules of Section 5.18 of the Unicode Standard since toUpperCase(U+0131) is U+0049. Same thing for toCasefold_simple(U+0130) since toLowerCase(U+0130) is U+0069. The justification for the weird toCasefold_full(U+0130) mapping is unknown; it doesn't even make sense to add a dot (U+0307) to a letter that already has one (U+0069). It would have been so simple to put them all in the same equivalence class!
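The simple and full foldings discussed in this example can be sketched in C as follows. The fold table covers only the four “LATIN LETTER I” characters, with the non-Turkic mappings quoted in this section; `fold_char`, `fold_string`, `caseless_equal`, and the `MAX_FOLDED` limit are hypothetical names chosen for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FOLDED 64   /* arbitrary capacity for this sketch */

/* Fold one code point; write 1 or 2 code points to OUT and return
   how many were written.  Only the example repertoire is handled.  */
size_t
fold_char (uint32_t cp, bool full, uint32_t out[2])
{
  if (cp == 0x0049)
    {
      out[0] = 0x0069;
      return 1;
    }
  if (cp == 0x0130 && full)
    {
      out[0] = 0x0069;          /* LATIN SMALL LETTER I */
      out[1] = 0x0307;          /* COMBINING DOT ABOVE */
      return 2;
    }
  out[0] = cp;                  /* everything else folds to itself */
  return 1;
}

/* Fold N code points of S into OUT (capacity MAX_FOLDED).  Return the
   folded length, or (size_t) -1 if the result does not fit.  */
size_t
fold_string (const uint32_t *s, size_t n, bool full, uint32_t *out)
{
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t tmp[2];
      size_t k = fold_char (s[i], full, tmp);
      if (len + k > MAX_FOLDED)
        return (size_t) -1;
      for (size_t j = 0; j < k; j++)
        out[len++] = tmp[j];
    }
  return len;
}

/* toCasefold(a) == toCasefold(b), compared as whole strings.  */
bool
caseless_equal (const uint32_t *a, size_t na,
                const uint32_t *b, size_t nb, bool full)
{
  uint32_t fa[MAX_FOLDED], fb[MAX_FOLDED];
  size_t la = fold_string (a, na, full, fa);
  size_t lb = fold_string (b, nb, full, fb);

  if (la == (size_t) -1 || lb == (size_t) -1 || la != lb)
    return false;
  return memcmp (fa, fb, la * sizeof fa[0]) == 0;
}
```

Under full folding, U+0130 compares equal to the two-character string <U+0069, U+0307> but not to U+0069 alone; the latter only matches the first folded code point, which is presumably what the “*” entries in the full-folding table denote.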
Otherwise, also consider the following problem with Unicode's approach on case folding in mind. Assume that we want to perform
echo 'AßBC' | grep -i 'Sb'
which corresponds to
input:   U+0041 U+00DF U+0042 U+0043 U+000A
pattern: U+0053 U+0062
Following “CaseFolding-4.1.0.txt”, applying the toCasefold() transformation to these yields
input:   U+0061 U+0073 U+0073 U+0062 U+0063 U+000A
pattern: U+0073 U+0062
so, according to this approach, the input should match the pattern. As long as the original input line is to be reported to the user as a whole, there is no problem (from the user's point-of-view; implementation is complicated by this).
However, consider both these GNU extensions:
echo 'AßBC' | grep -i --only-matching 'Sb'
echo 'AßBC' | grep -i --color=always 'Sb'
What is to be reported in these cases, since the match begins in the middle of the original input character 'ß'?
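To make the problem concrete, here is a C sketch that folds the input while remembering which original character each folded code point came from, then reports whether a match begins or ends in the middle of an original character. It is an illustration only: `toy_fold` handles just ASCII capitals and U+00DF, and all names (`struct folded`, `struct match`, `fold_tracked`, `find_caseless`, `MAX_FOLDED`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FOLDED 64   /* arbitrary capacity for this sketch */

struct folded
{
  uint32_t cp[MAX_FOLDED];  /* folded code points */
  size_t src[MAX_FOLDED];   /* index of the original character */
  bool first[MAX_FOLDED];   /* first code point produced by src[i]? */
  size_t len;
};

struct match
{
  bool found;
  size_t src_begin;         /* original character holding the start */
  size_t src_end;           /* original character holding the end */
  bool begins_mid_char;     /* match starts inside an expansion */
  bool ends_mid_char;       /* match stops inside an expansion */
};

/* Toy folding: A-Z to a-z, U+00DF to <U+0073, U+0073>, else identity.  */
size_t
toy_fold (uint32_t cp, uint32_t out[2])
{
  if (cp >= 0x41 && cp <= 0x5A)
    {
      out[0] = cp + 0x20;
      return 1;
    }
  if (cp == 0x00DF)
    {
      out[0] = out[1] = 0x0073;
      return 2;
    }
  out[0] = cp;
  return 1;
}

bool
fold_tracked (const uint32_t *s, size_t n, struct folded *f)
{
  f->len = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t tmp[2];
      size_t k = toy_fold (s[i], tmp);
      if (f->len + k > MAX_FOLDED)
        return false;
      for (size_t j = 0; j < k; j++)
        {
          f->cp[f->len] = tmp[j];
          f->src[f->len] = i;
          f->first[f->len] = (j == 0);
          f->len++;
        }
    }
  return true;
}

/* Find the first occurrence of the folded pattern in the folded input.  */
struct match
find_caseless (const uint32_t *in, size_t n_in,
               const uint32_t *pat, size_t n_pat)
{
  struct match m = { false, 0, 0, false, false };
  struct folded fi, fp;

  if (!fold_tracked (in, n_in, &fi) || !fold_tracked (pat, n_pat, &fp)
      || fp.len == 0 || fp.len > fi.len)
    return m;

  for (size_t pos = 0; pos + fp.len <= fi.len; pos++)
    {
      size_t j = 0;
      while (j < fp.len && fi.cp[pos + j] == fp.cp[j])
        j++;
      if (j == fp.len)
        {
          size_t end = pos + fp.len;   /* one past the match */
          m.found = true;
          m.src_begin = fi.src[pos];
          m.src_end = fi.src[end - 1];
          m.begins_mid_char = !fi.first[pos];
          m.ends_mid_char = end < fi.len && !fi.first[end];
          return m;
        }
    }
  return m;
}
```

For the input and pattern above, the match is found with `src_begin` pointing at 'ß' and `begins_mid_char` set, which is exactly the situation for which --only-matching and --color have no obvious answer.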
Note that Unicode's toCasefold() cannot be implemented in terms of POSIX' towctrans() since that can only return a single wint_t value per input wint_t value.
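The limitation is visible in the interface itself, as this short C sketch suggests. The wrapper name `map_one` is hypothetical; `wctrans()` and `towctrans()` are the standard functions declared in `<wctype.h>`.

```c
#include <wchar.h>
#include <wctype.h>

/* Map one wide character through the named transformation
   ("tolower" or "toupper" are the names every locale must provide).
   WC is returned unchanged if the name is not known in the current
   locale.  */
wint_t
map_one (wint_t wc, const char *name)
{
  wctrans_t desc = wctrans (name);
  return desc ? towctrans (wc, desc) : wc;
}
```

Since the return type is a single `wint_t`, a one-to-many mapping such as U+00DF to <U+0073, U+0073> has no way of being returned through this interface.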
The purpose of this listing is to help GNU grep maintainers track down bug fixes and improvements made by distributors so they can be integrated back into the upstream releases from GNU, if appropriate.
Users should not use this listing to find a substitute destination for their bug reports. These are still best sent upstream, to the GNU grep team, through the bug-grep@gnu.org mailing list or the GNU grep project page on Savannah.
This listing is not exhaustive; priority is given to listing distributors who actually maintain patches to the upstream package from GNU.
Please keep this listing sorted by entry. Each field type may appear more than once if appropriate, the field order being significant.
Web site | http://www.debian.org/ |
Package database entry | Old stable http://packages.debian.org/oldstable/base/grep |
Maintainer | Robert van der Meulen <rvdm at debian.org> |
Package database entry | Stable http://packages.debian.org/stable/base/grep |
Maintainer | Ryan M. Golbeck <rmgolbeck at debian.org> |
Maintainer | Jeff Bailey <jbailey at nisa.net> |
Package database entry | Testing http://packages.debian.org/testing/base/grep |
Package database entry | Unstable http://packages.debian.org/unstable/base/grep |
Maintainer | Anibal Monsalve Salazar <anibal at debian.org> |
Maintainer | Santiago Ruano Rincon <santiago at unicauca.edu.co> |
Bug tracking | http://bugs.debian.org/grep |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-11-08 |
Web site | http://fedora.redhat.com/ |
Web site | http://www.redhat.com/ |
Maintainer | Tim Waugh <twaugh at redhat.com> |
Bug tracking | Red Hat Bugzilla http://bugzilla.redhat.com/ |
Managed repository | cvs -d:pserver:anonymous@cvs.fedora.redhat.com:/cvs/dist co devel/grep |
Managed repository | http://cvs.fedora.redhat.com/viewcvs/devel/grep/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.freebsd.org/ |
Bug tracking | http://www.freebsd.org/cgi/query-pr-summary.cgi?query |
Managed repository | CVS_RSH=ssh cvs -d:ext:freebsdanoncvs@anoncvs.FreeBSD.org:/home/ncvs co src/gnu/usr.bin/grep |
Managed repository | http://www.freebsd.org/cgi/cvsweb.cgi/src/gnu/usr.bin/grep/ |
Entry updated | 2005-05-05 |
Web site | http://www.gentoo.org/ |
Package database entry | http://packages.gentoo.org/packages/?category=sys-apps;name=grep |
Bug tracking | Gentoo Bugzilla http://bugs.gentoo.org/ |
Managed repository | http://www.gentoo.org/cgi-bin/viewcvs.cgi/sys-apps/grep/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.mandrivalinux.com/ |
Bug tracking | Mandriva Bugzilla http://qa.mandriva.com/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.netbsd.org/ |
Package database entry | ftp://ftp.netbsd.org/pub/NetBSD/packages/pkgsrc/textproc/grep/README.html |
Bug tracking | http://www.netbsd.org/Misc/query-pr.html |
Managed repository | cvs -d:pserver:anoncvs@anoncvs.NetBSD.org:/cvsroot co pkgsrc/textproc/grep |
Managed repository | http://cvsweb.netbsd.org/bsdweb.cgi/pkgsrc/textproc/grep/ |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-05-05 |
Web site | http://www.openbsd.org/ |
Package database entry | http://www.openbsd.org/3.8_packages/i386/ggrep-2.5.1p1.tgz-long.html |
Maintainer | Christian Weisgerber <naddy at openbsd.org> |
Bug tracking | http://www.openbsd.org/query-pr.html |
Managed repository | cvs -d:pserver:anoncvs@anoncvs1.ca.openbsd.org:/cvs co ports/sysutils/ggrep |
Managed repository | http://www.openbsd.org/cgi-bin/cvsweb/ports/sysutils/ggrep/ |
Source package name | ggrep |
Binary package name | ggrep |
Entry updated | 2005-11-08 |
Web site | http://www.openpkg.org/ |
Maintainer | Ralf S. Engelschall <rse at openpkg.org> |
Managed repository | cvs -d :pserver:anonymous@cvs.openpkg.org:/v/openpkg/cvs co openpkg-src/grep |
Managed repository | rsync -av rsync://rsync.openpkg.org/openpkg-cvs/openpkg-src/grep/ . |
Managed repository | http://cvs.openpkg.org/dir?d=openpkg-src/grep |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-06-19 |
Web site | http://www.novell.com/linux/suse/ |
Maintainer | Andreas Schwab <schwab at suse.de> |
Package database entry | Professional http://www.novell.com/products/linuxpackages/professional/grep.html |
Source package name | grep |
Binary package name | grep |
Entry updated | 2005-06-19 |
Copyright © 2005 Free Software Foundation, Inc.,
51 Franklin Street, Suite 330, Boston, MA 02110-1301, USA
Verbatim copying and distribution of this entire article
are permitted worldwide, without royalty, in any medium,
provided this notice and the copyright notice are preserved.
Updated: $Date: 2005/11/11 07:46:04 $ (UTC) by $Author: charles_levert $ (at savannah.gnu.org)