
Ideas list

STUDENTS - BEFORE YOU SUBMIT YOUR PROJECT PROPOSAL:
Please make sure that you have read the GNU Project's guidelines for Summer of Code projects. In particular, please make sure you include all the information we need.

Many GNU projects have more than one suggestion, so they're listed in alphabetical order by project.

Arch - Autoconf - Bison - Classpath - CSSC - DotGNU - Emacs IRC - Emacs Muse - Emacs Planner - fdisk - findutils - freetalk - GCC - Gimp - Gnash - Gnome - GNUnet - GNUstep - GnuTLS - GRUB - GSS - Hurd - libcdio - LibIDN - Parted - SASL - Shishi - Smalltalk - Texinfo - XaoS

About adding ideas to this page:


GNU Arch

  1. Tree inventory tool: a tree inventory examines the contents of a tree and distinguishes "important" files from "discardable" files. For example, if the tree is a C program, the ".c" and ".h" files are important but the ".o" files and Emacs back-up files ("*~") are "discardable".

    The inventory tool also assigns a logical ID to each file, and that ID is independent of the file name. If you rename "foo.c" to "bar.c", the inventory tool should say beforehand that "foo.c" has logical identity X and, afterwards, that "bar.c" now has logical identity X.

    There should be flexible ways for a user to assign logical identities to files.

    Directories, symbolic links, and special files should be able to have logical identities.

  2. Whole tree diff and patch: A traditional recursive diff compares files that have the same name in both trees and doesn't compare directories at all. An Arch recursive diff and patch should be based on the logical IDs of the inventory tool. If, in tree A, the file with ID X is called "foo.c" and in tree B the file with ID X is called "bar.c", Arch's whole tree diff should know to compare "foo.c" to "bar.c". If we apply the resulting patch to a third tree, in which the file with ID X is called "baz.c", Arch's whole tree patch should know to apply the differences to "baz.c". If comparison of two trees reveals that "foo.c" has been renamed to "bar.c", then applying that patch to a tree that still has "foo.c" should cause the file to be renamed "bar.c".
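
The two ideas above can be illustrated with a minimal Python sketch. It is not Arch's implementation: the pattern list, the function names, and the representation of an inventory as a dictionary from logical ID to path are all assumptions made for illustration.

```python
import fnmatch

# Hypothetical patterns for "discardable" files (object files, Emacs back-ups).
DISCARDABLE_PATTERNS = ["*.o", "*~"]


def classify(filename):
    """Return "discardable" or "important" for a file name."""
    for pattern in DISCARDABLE_PATTERNS:
        if fnmatch.fnmatch(filename, pattern):
            return "discardable"
    return "important"


def compare_by_id(tree_a, tree_b):
    """Pair up files across two inventories by logical ID, not by name.

    Each inventory maps logical ID -> current path. Returns (pairs, renames):
    pairs lists the (path in A, path in B) files to diff against each other,
    renames lists only those pairs whose path changed.
    """
    pairs = []
    renames = []
    for logical_id, path_a in sorted(tree_a.items()):
        path_b = tree_b.get(logical_id)
        if path_b is None:
            continue  # not present in tree B; a real tool would record a deletion
        pairs.append((path_a, path_b))
        if path_a != path_b:
            renames.append((path_a, path_b))
    return pairs, renames
```

With tree A as {"X": "foo.c"} and tree B as {"X": "bar.c"}, compare_by_id pairs "foo.c" with "bar.c" and reports the rename, which is the behaviour a name-based recursive diff cannot provide.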

Arch 1.x already had these features but there are problems with their implementation. In Arch 1.x, these features aren't available as separate tools -- you would have a tricky time mixing them cleanly with `git', for example. And in Arch 1.x, people don't much like the syntax and semantics of the various control files that are used. And in Arch 1.x, the implementation does not have the greatest performance. The problems in 1.x are hard to fix incrementally because of a need for backward compatibility.

There is an opportunity to implement these features cleanly -- from scratch. To not worry *too* much about backwards compatibility. Just to take the good ideas and implement them in a solid form. With a little guidance, a talented student could do this in a couple of months. The result will be useful for Arch 2.0 but should also be useful to users of `git', `Subversion', and other systems.

For a student, this is a good chance to get exposure to the fundamentals of coding to POSIX standards -- writing nicely portable code. It is a good chance to practice using basic system calls in a context that requires understanding them in depth. This is also a great chance for a student to have a hand in making the good ideas in Arch more widely adopted.


GNU Autoconf

  1. Make Autoconf more cross-compilation friendly by providing some mechanism, e.g., ssh-based, for AC_TRY_RUN to be executed on a remote system.

  2. Provide standard ways to look at files on the remote system.
    Some people write autoconf tests like
      if test -f /proc/self/environ; then
        AC_DEFINE(HAVE_PROC)
      fi
    

    This works fine when building with a native compiler. It completely fails when building with a cross-compiler: running test -f on the build system tells you nothing about the host system. The proposal is to add something like AC_FILE_EXISTS(/proc/self/environ) to see whether the file exists. In the normal native case this can be implemented using test -f; in the cross case it can be implemented via ssh.

    In general, a carefully written autoconf script will work when building with a cross-compiler. Unfortunately, not all scripts are written carefully. The goal is to make it easier for this to work correctly. The ultimate goal is to make it easier to cross-build a GNU/Linux distribution.
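
The native/cross split behind the proposed AC_FILE_EXISTS can be sketched as follows. This is only an illustration in Python, not Autoconf code: the function names and the remote_host parameter (an ssh destination standing in for the host system) are assumptions.

```python
import shlex
import subprocess


def file_exists_command(path, remote_host=None):
    """Build the command line that checks for a regular file.

    remote_host is a hypothetical ssh destination for the host system;
    None means a native build, where the check runs locally.
    """
    if remote_host is None:
        return ["test", "-f", path]
    # In the cross case the same test runs on the host system over ssh.
    return ["ssh", remote_host, "test -f " + shlex.quote(path)]


def file_exists(path, remote_host=None):
    """Run the check; exit status 0 means the file exists."""
    command = file_exists_command(path, remote_host)
    return subprocess.run(command).returncode == 0
```

The point of the split is that the caller never runs test -f on the build system when the answer is needed for the host system.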


GNU Bison

GNU Bison is the GNU project's parser generator. There are many possible tasks to be addressed.

  1. Extension to other languages.
    Currently Bison generates parsers in C/C++ for LALR(1) and GLR algorithms. More language support would be useful: Scheme, Java, C# to name just a few.
  2. Extension to other parsing techniques.
    Bison supports LALR(1) and extends it to G(LA)LR(1), but there are other interesting parsing techniques. For a start, one could consider issuing non-compressed parsers, as today the need for compressed tables is much less critical, and may even incur a slight loss of performance. A more ambitious task consists in implementing full LR(1) automata support. There are good algorithms that keep the full expressive power of LR(1) while retaining as much as possible of LALR's compactness. The Menhir parser generator (for the CAML programming language) is a perfect implementation of these ideas. Implementing them in Bison would unleash the algorithm to other programming languages.
  3. Extension of the front-end features.
    Many features are desirable in Bison, all of them very reachable, provided one has the time to address them:
  4. Burke-Fisher Error Correction
    YACC's error recovery scheme is quite poor: the author of the grammar has to clutter her grammar with special annotations specifying how to recover from the error. As emphasized, this is not even correction: most of the time, the intent is merely to recover from the error, i.e., continuing as far as we can, while completely ignoring the contents of the erroneous text.

    The Burke-Fisher Error Correction (or repair) algorithm tries to address these issues by trying, by itself, insertion, deletion and replacement of various tokens around the error spot. In addition, the user is provided with new directives to specify semantic values of tokens to create.

    This scheme is quite efficient, especially because :

    The work would consist of implementing Burke-Fisher in Bison, for C output only as a first stage. The main source of documentation is Andrew Appel's ``Modern Compiler Implementation''. Writing test cases in the current testing framework is a mandatory part of the project.
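
To give a feel for the repair step, here is a heavily simplified Python sketch. It is not Bison code and not the full Burke-Fisher algorithm, which works on the LR parser's stack and delays semantic actions; this version simply re-parses each candidate from scratch with a toy recursive-descent recognizer. The grammar, the token names, and the window size are assumptions chosen for illustration.

```python
VOCABULARY = ["n", "+", "(", ")"]  # token kinds of the toy grammar


class _ParseError(Exception):
    pass


def error_position(tokens):
    """Index of the first token the toy parser rejects, or None if accepted.

    Toy grammar (an assumption for illustration):
        expr := term ('+' term)*
        term := 'n' | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def term():
        nonlocal pos
        if peek() == "n":
            pos += 1
        elif peek() == "(":
            pos += 1
            expr()
            if peek() != ")":
                raise _ParseError
            pos += 1
        else:
            raise _ParseError

    def expr():
        nonlocal pos
        term()
        while peek() == "+":
            pos += 1
            term()

    try:
        expr()
    except _ParseError:
        return pos
    return None if pos == len(tokens) else pos


def candidate_repairs(tokens, error_at, window):
    """Yield every single-token insertion, deletion and replacement
    at positions from `window` tokens before the error spot up to it."""
    start = max(0, error_at - window)
    for i in range(start, error_at + 1):
        for tok in VOCABULARY:
            yield tokens[:i] + [tok] + tokens[i:]  # insertion
        if i < len(tokens):
            yield tokens[:i] + tokens[i + 1:]  # deletion
            for tok in VOCABULARY:
                if tok != tokens[i]:
                    yield tokens[:i] + [tok] + tokens[i + 1:]  # replacement


def repair(tokens, window=2):
    """Return the single-token repair that lets parsing get furthest."""
    error_at = error_position(tokens)
    if error_at is None:
        return tokens

    def progress(candidate):
        position = error_position(candidate)
        return float("inf") if position is None else position

    return max(candidate_repairs(tokens, error_at, window), key=progress)
```

For the input n + + n the recognizer reports an error at the second "+", and repair returns a token list that the toy grammar accepts. No grammar annotations are needed for this, which is the contrast with YACC-style recovery drawn above.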


GNU Classpath

See the Summer of Code Projects list for GNU Classpath.

GNU CSSC

  1. Fully implement ignored, included and excluded deltas.
  2. Implement sccs-comb

DotGNU

The DotGNU project is a Free Software implementation of the .NET and Mono platforms. The following is a list of items you might want to work on. Most of these tasks are considered at the moment to be 50% complete.

If you want to modify or extend these tasks or have your own ideas, please see the GNU Project guidelines for Summer of Code projects. If you want to discuss potential projects with the DotGNU team, we'd love to talk to you about it. You can contact us by email on the DotGNU mailing list, or by IRC on the #portable.net or #dotgnu channels on irc.freenode.net.

  1. Finish libJIT ELF writer (Complexity: medium-high)
    Read the libJIT rationale for an introduction to, and the rationale for, the DotGNU JIT Library (libJIT). The libJIT library contains routines that permit pre-compiling JIT'ed functions to an on-disk representation. This representation can be loaded at some future time, to avoid the overhead of compiling the functions at runtime. We use the ELF format for this purpose, which is a common binary format used by modern operating systems and compilers.

    GNU/Linux uses ELF. However, it isn't necessary for your operating system to be based on ELF natively. We use our own routines to read and write ELF binaries. We chose ELF because it has all of the features that we require, and reusing an existing format was better than inventing a completely new one.

  2. Port libJIT to a new architecture (Complexity: medium-high)
    You could port libJIT to a new architecture, for example OpenRISC, SPARC, MIPSEL and so on.

    For this project, you should be familiar with compiler implementation techniques and the particulars of the target CPU's instruction set. The libJIT manual describes the steps needed for porting libJIT to new architectures.

  3. Enhance the libJIT interpreter (Complexity: medium-high)
    LibJIT includes an interpreter for running code on platforms that don't have a native code generator yet. This reduces the need for programmers to write their own interpreters for such platforms. Essentially, this project means making the regression tests with 'make check' in the Portable .NET directory work with the interpreter.

  4. Finish the implementation of libJIT support for ARM or x86-64 (Complexity: medium)
    For this project, you should be familiar with compiler implementation techniques and the particulars of the target CPU's instruction set.

    The libJIT manual describes the steps needed for porting libJIT to new architectures. We can provide access to ARM and x86-64 machines, and indeed machines with other CPUs too.

  5. Enhance libJIT support for x86 (Complexity: medium)
    LibJIT includes a set of primitive code generators. However, the current implementation calls intrinsic functions for opcodes with long and float values. These need to be implemented as primitive code generators instead.

  6. Enhance libJIT optimization (Complexity: medium-high)
    For example, implement inlining, enhance constant propagation or dead-code elimination.
  7. Work on memory leaks, implement a special feature in GC. (Complexity: high)
    It would be beneficial to have a method in GC which can enumerate all objects which reference a specified object. The signature of this method might be, for example, object[] GC.GetReferences( object o ).

  8. Porting Application (Complexity: medium)
    There are a number of Free applications using .NET which currently do not run under DotGNU. Pick any non-trivial Free application and propose a Summer-of-Code project to make it work under DotGNU. The CodeProject contains many software projects that are interesting, but they are likely small.

    Ports should aim to create a helper class library to assist in the porting. Basically, every time a P/Invoke is found in one of these applications or a dependency exists on a third-party control or library, some stubs or primitive implementation should be exposed in this "helper" library.

    This includes Windows.Forms, XML, and Internet applications.

  9. Enhance Windows.Forms (Complexity: medium)
    The Portable .NET Windows.Forms library implements much of .NET 1.1, but many parts are still missing. None of the .NET 2.0-specific Windows.Forms functionality is implemented yet.

    This project would significantly enhance the completeness of the implementation of .NET 1.1 or .NET 2.0.

  10. Replacing CIL with native code. (Complexity: very high)
    DotGNU contains a code generator that can be used for Just-in-Time compilation at runtime. Code can also be compiled ahead of time to produce native code before it's needed.

    JIT compilation is more commonly used, but for some systems where memory is restricted or where startup time is important, pre-compiling the code can be a significant win.

    The goal is to modify the runtime and compilation so that the bytecodes can be safely removed from a program and a single image is shipped containing both metadata and native code.

  11. Implement generics or any other C# 2.0, 3.0 feature. (Complexity: very high)
    The Portable .NET C# compiler is based on the treecc tool.

  12. Object-oriented C# bindings for Allegro (Complexity: medium-high)
    This project would provide C# bindings for the Allegro library. This includes not only being able to call Allegro functions from C#, but also being able to do so in a way which 'feels natural' for the C# language. While the first part of the task is technically straightforward to define, the second part will require some thoughtful interface design.

    The Allegro library is a free video game software library, with functions for basic 2D graphics, image manipulation, text output, audio output, midi music, input and timers. It also includes additional routines for things like fixed-point and floating-point matrix arithmetic, unicode strings, file system access, file manipulation, data files, and (limited, software-only) 3D graphics.
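
As a point of comparison for the GC.GetReferences idea in item 7 above, CPython's garbage collector already exposes a similar query, gc.get_referrers. The Python sketch below only illustrates what such a method returns; it says nothing about how DotGNU's collector would implement it, and the wrapper name and the Node class are hypothetical.

```python
import gc


def get_references(target):
    """Return the objects that directly refer to `target`.

    Thin wrapper around CPython's gc.get_referrers; only objects tracked
    by the collector (containers such as lists and dicts) are reported.
    """
    return gc.get_referrers(target)


class Node:
    """Hypothetical object we suspect of being leaked."""


leaked = Node()
cache = [leaked]  # a container that keeps `leaked` alive

# The list shows up among the referrers, pointing at the source of the leak.
found_cache = any(referrer is cache for referrer in get_references(leaked))
```

When hunting a memory leak, the useful part is walking from the leaked object back to whatever container still holds it.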


Emacs IRC Client (ERC)

ERC is a powerful, modular, and extensible IRC client for Emacs.

  1. Come up with a good algorithm for detecting whether to automatically reconnect to a server if the connection fails at some point. Ideally, this would give up eventually so as not to freeze the Emacs process. Also, it should Do The Right Thing when the user initially tries to connect to an IRC server and the connection is refused.
  2. Improve idle detection by using the metric of when a key was last pressed or when a mouse action last occurred.
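
One possible shape for the reconnect policy in item 1 is exponential backoff with a cap and a bounded number of attempts. The Python sketch below is only an illustration, not ERC code: the parameter values are arbitrary, and treating a refused initial connection as "do not retry" is an assumption about what the right behaviour is.

```python
def reconnect_delays(base=2, factor=2, cap=60, max_attempts=8):
    """Yield the wait (in seconds) before each reconnect attempt.

    The delay grows geometrically up to `cap`, and the generator stops
    after `max_attempts`, so the client eventually gives up.
    """
    delay = base
    for _ in range(max_attempts):
        yield min(delay, cap)
        delay *= factor


def plan_reconnect(was_connected, refused):
    """Return the list of delays to use after a connection failure.

    Assumption: if the very first connection attempt is refused, retrying
    automatically is unhelpful, so the schedule is empty and the error
    should be reported to the user instead.
    """
    if refused and not was_connected:
        return []
    return list(reconnect_delays())
```

In Emacs the delays would presumably be scheduled with timers rather than blocking waits, so that the Emacs process stays responsive between attempts.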

Emacs Muse

Emacs Muse is an authoring and publishing environment for Emacs. It simplifies the process of writing documents and publishing them to various output formats. Muse uses a very simple Wiki-like format as input.

  1. Add support for publishing documents in one or more of the following markup syntaxes: Markdown, MoinMoin, Org Mode, reStructuredText.
  2. Implement conversion routines to turn the following formats into Muse-native markup: (X)HTML, Docbook, Groff, Texinfo.
  3. Implement add-on routines that make Muse a feasible option for documenting software, especially with respect to publishing documentation to the Texinfo format.

Emacs Planner

Planner is an organizer and day planner for Emacs. It helps you keep track of your pending and completed tasks, daily schedule, dates to remember, notes and inspirations.

  1. Permit category links to appear at the beginning rather than the end of a task description.
  2. Think of a good way to share tasks with Evolution and other day planners. This will involve exporting and importing tasks from these programs.

GNU fdisk

  1. Finish lfdisk, the Linux fdisk facsimile interface
  2. Create gfdisk, the full-featured fdisk-style interface to libparted
  3. Create cfdisk, the cfdisk-style interface to libparted
For more information, please see the full list of project suggestions for GNU fdisk.

Important notice: You do NOT need to be an expert in partitioning and file systems for these tasks.


GNU findutils

  1. slocate compatibility and other enhancements
  2. Be more NFS friendly
    Allow locate to pick up databases from mount points, so that many NFS clients can share the same locate database which we build on the server. Automatically adjust the path prefix of the results where required. Ideally this should work without any requirement for locate to read a separate external configuration file. If you change the interpretation of $LOCATE_PATH, do this in a backward-compatible way.
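
The prefix adjustment could look roughly like the Python sketch below. It is not findutils code; the function name and the example paths are hypothetical, and how locate would learn the server-side root and the client mount point is left open.

```python
import posixpath


def adjust_prefix(db_path, server_root, mount_point):
    """Map a path recorded on the NFS server to where the client sees it.

    Returns None when db_path does not lie under server_root.
    """
    server_root = server_root.rstrip("/") or "/"
    if db_path == server_root:
        return mount_point
    prefix = server_root if server_root.endswith("/") else server_root + "/"
    if db_path.startswith(prefix):
        # Keep the part below the exported directory, re-rooted at the mount.
        return posixpath.join(mount_point, db_path[len(prefix):])
    return None
```

For example, a database entry /export/home/alice/notes.txt built on a server exporting /export/home would be reported as /mnt/home/alice/notes.txt on a client that mounts the export at /mnt/home.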

GNU Freetalk

Freetalk is a console based Jabber client. It features a readline interface with completion of buddy names, commands, and even ordinary English words. Freetalk is extensible, configurable, and scriptable through a Guile interface.

  1. Replace loudmouth with libjingle as the XMPP library and enable voice support.
  2. Add SOCKS support for file transfer.
  3. Abstract the communication library from the messenger core, so that adding IRC as the communication medium is possible
  4. Add conferencing support.
  5. Provide curses/cdk bindings in the core (like readline) and provide a curses-based UI.

GNU Gnash

Most of the suggested projects involve implementing missing functionality to achieve compliance with the Flash v8 format. Some of these are little projects of a few days' work apiece (and therefore you would need to propose a 'bundle' of these for the Summer of Code); others are more substantial and perhaps could be done on their own. All of these tasks involve finding and building test cases and documentation as well as just writing the code.

GNU GSS - Generic Security Services

GNU GSS is a new implementation of the GSS-API framework, used to provide security services to applications. It is typically used for Kerberos, but can support other mechanisms too.

The following is a list of items you might want to work on. If you want to modify or extend these tasks or have your own ideas what to work on, please feel invited to contact us on the help-gss mailing list.

  1. Make the library modular, so that each mechanism is in a separate library that is dlopen()'d. Have it be able to multiplex GSS-API services between Shishi, MIT Kerberos, and Heimdal, as per a configuration file. This would make it possible for all applications that support GSS-API to depend on GNU GSS, and the actual Kerberos implementation can be selected by the administrator on a host. This is useful for Linux distributions which today ship separate packages for popular applications, e.g. OpenSSH, one without GSS-API support, one with GSS-API support for MIT Kerberos, and one with GSS-API support for Heimdal.
  2. Implement the Simple Public-Key GSS-API Mechanism, see RFC 2025.
  3. Implement the Simple and Protected GSS-API Negotiation Mechanism, see RFC 2478.

GNU Hurd

The GNU Hurd is the GNU project's replacement for the Unix kernel. The Hurd is a collection of servers that run on the Mach microkernel to implement file systems, network protocols, file access control, and other features that are implemented by the Unix kernel or similar kernels (such as Linux).

The following is a list of items you might want to work on. If you want to modify or extend these tasks or have your own ideas what to work on, please feel invited to contact us on the bug-hurd mailing list or the #hurd IRC channel.

  1. Make GNU Mach use more up to date device drivers.
  2. Work on GNU Mach's IPC / VM system.
  3. Design and implement a sound system.
  4. Transition the Hurd libraries and servers from cthreads to pthreads.
  5. Find and implement a reasonable way to make the Hurd servers use syslog.
  6. Design and implement libchannel, a library for streams.
  7. Rewrite pfinet, our interface to the IPv4 world.
  8. Implement and make the Hurd properly use extended attributes.
  9. Design / implement / enhance support for the...
    1. Andrew File System (AFS);
    2. NFS client and NFSd;
    3. EXT3 file system;
    4. Logical Volume Manager (LVM).

GnuTLS

GnuTLS is a free implementation of the SSL/TLS security protocol.

The following is a list of items you might want to work on. If you want to modify or extend these tasks or have your own ideas what to work on, please feel invited to contact us on the help-gnutls mailing list.

  1. Datagram TLS support. RFC 4347 describes a UDP version of TLS.
  2. Support for the elliptic curve ciphersuites as an alternative authentication method.
  3. Redesign and rewrite libtasn1 (ASN.1 parser library). The new implementation must be efficient and easy to extend with new types and encoding rules (say BER and DER).
  4. Write a crypto backend to perform symmetric and asymmetric encryption and decryption, hashing and MAC computation, key generation, and random number generation. It should be able to utilize libgcrypt and other free libraries, such as libtomcrypt, and should be extendable for hardware drivers.

GNUnet

GNUnet is a framework for secure peer-to-peer networking. The primary application implemented within the GNUnet framework is anonymous censorship-resistant file-sharing.

  1. Port GNUnet to Java ("Freeway", www.gnunet.org/freeway):
    The GNUnet reference implementation is written in pure C. Stephane Vallee ported the old 0.6.x tree to Java about two years ago, however the current 0.7.x version is dramatically different and the Java version has not been updated to reflect those changes. Having a (compatible) Java version would make GNUnet easier to install for some users. It would also help developers that are more skilled in Java than in C to prototype new protocols. Some users would consider Java to be an improvement in terms of security.

    The goal of this project is to port the current C code to Java, re-using some of the existing Freeway code. While the existing Java port does not work properly with free JVMs, it would be important that the result will run using only free software. We do not expect that the entire codebase of GNUnet is ported to Java; instead, the Java version should be able to use native calls to load existing C modules. This will limit the effort to porting the core (src/server/, about 8000 lines) and utilities (src/util/, about 17000 lines, largely unchanged from 0.6.x). After that, it should already be possible to write extensions in Java. Note that the 0.6.x Freeway port did not use native C calls to load existing modules, which is why it was not possible to keep up with the development of the reference implementation written in C. We will make sure that this does not happen this time.

  2. Create a web interface for GNUnet:
    GNUnet uses a client/server design: one or more clients connect to a single background process that implements all of the "GNUnet logic". Our current client tools are either command-line or GTK based. The goal of this project is to create a webinterface for the background process which enables the user to obtain status information, manage downloads, upload content etc. A webinterface would make deployment easier in situations where many client systems already have a browser available -- an administrator would then only have to install the webinterface and the GNUnet daemon on one central server. A C/C++ implementation with few dependencies would be preferred, but PHP, Java or similar solutions are also welcome as long as they are free and easy to set up.

  3. Port the GTK based user interface to Qt:
    GNUnet's graphical user interface is based on glade/GTK, which is not supported natively on OS X. Also, various embedded devices make use of Qt, and GTK drops support for Windows 9x in 2.8. The goal of this project is to port gnunet-gtk (manage downloads, uploads, searches, ...) to Trolltech's Qt toolkit. Testing on OS X would be appreciated. The existing GTK implementation uses a strict Model-View-Controller (MVC) design. The View is provided by glade, the model by the FSUI library. Using this design, a Qt port should only have to implement code for a Qt controller and use a tool like qtdesigner to design the interface. If time permits, a port of gnunet-setup (configuration of the background process and the client tools) from GTK to Qt and possibly integration with the general Qt interface could be attempted.

  4. Improve connection of NATted and firewalled peers:
    Connections in GNUnet are usually made over TCP or UDP (IPv4 and IPv6). An HTTP-like transport is also available, but it does not work with HTTP proxies. The goal of this project is to improve connectivity by exploring the use of additional techniques like UPnP, hole punching, "GNUnet over HTTP", "GNUnet over DNS" etc. This project requires access to various NAT boxes for the empirical evaluation (which we cannot provide).

GNUstep

Some principal ideas:

  1. Implement KVO support in the base library
  2. Improve printing support
  3. Write a PDFKit wrapper around the poppler library
  4. Clean up and finish implementing the text system
  5. Create an AJAX framework for GNUstepWeb

This separate page has more ideas and information for GNUstep.


GRUB

There is a list of project suggestions for GRUB.


GNU libcdio

There is a detailed list of libcdio project ideas but here are the headline items from that list:

  1. Add, finish or improve an OO API.
  2. Add or finish a CD-image format parser. Use it, say, to write a CD-image format converter.
  3. Finish UDF and/or ISO 9660 handling.
  4. Write a "tar" command for ISO 9660 and/or UDF images.
  5. Add EAC (exact audio copy) features, possibly on top of cd-paranoia. Fix possible cd-paranoia bugs.
  6. Modify libcdio to handle CD-ROM drive customization.
  7. Wide character support for CD-Text.
  8. Revise to use glib rather than home-grown routines.

LibIDN

GNU LibIDN is a library for string processing, and implements StringPrep, Punycode and IDNA. It can be used by applications to translate non-ASCII hostnames to names that can be found in the DNS. Libidn is part of libc and used by the getaddrinfo() API.

General guidelines for contributions are available.

The following is a list of items you might want to work on. If you want to modify or extend these tasks or have your own ideas what to work on, please feel invited to contact us on the help-libidn mailing list.

  1. Implement a feature that rejects invalid Unicode (UTF-8) data, and use it to validate all inputs to the library, as per RFC 3629.
  2. Optimize the library for speed and/or memory footprint.
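
To show what rejecting invalid UTF-8 means in practice, the Python sketch below relies on Python's strict UTF-8 decoder, which refuses overlong encodings, encoded surrogates, code points above U+10FFFF, and truncated sequences. It is not LibIDN code, and the function name is hypothetical; a C implementation would need to perform these checks itself.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8, False otherwise."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Inputs a validating library should refuse:
OVERLONG_SLASH = b"\xc0\xaf"          # overlong encoding of "/"
ENCODED_SURROGATE = b"\xed\xa0\x80"   # U+D800, not a valid scalar value
BEYOND_UNICODE = b"\xf4\x90\x80\x80"  # would decode above U+10FFFF
TRUNCATED = b"\xe2\x82"               # three-byte sequence cut short
```

Overlong forms matter for a library like this one because they let the same character arrive under several byte spellings, which defeats later string comparison.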

GNU Parted

The separate list of project suggestions for GNU Parted has more details on all of these.

  1. Real partition ID support
  2. Extending ext3 support
  3. Support for more filesystems
  4. Partition guessing
  5. Porting to other platforms, like Solaris or BSD
  6. Miscellaneous frontend enhancements

There is also a project suggestion for GRUB which integrates it with libparted.

Important notice: You do NOT need to be an expert in partitioning and file systems for these tasks.


GNU SASL

GNU SASL is an implementation of the Simple Authentication and Security Layer framework and a few common SASL mechanisms. SASL is used by network servers (e.g., IMAP, SMTP) to request authentication from clients, and in clients to authenticate against servers.

  1. Make the mechanisms modular, and loaded using dlopen(). This will reduce the size of the core libgsasl library, and also enable more modularity to drop-in new SASL mechanisms. Compare how Cyrus SASL does this.
  2. Specify and implement an AES-CCM security layer (or a similar encryption scheme that provides authenticated encryption) for DIGEST-MD5 and test interoperability with other implementations, see DIGEST-MD5 draft.
  3. Implement AES-CTR mode for DIGEST-MD5, see DIGEST-MD5 draft, and test interoperability with other implementations.
  4. Implement the One-Time-Password SASL mechanism, see RFC 2444, and test interoperability with other implementations.
  5. Provide standard callbacks to query user for passwords, one for terminals (ttys), for GNOME (similar to GnuPG's gpg-agent), and for KDE (similar to KDE Wallet?). The mechanisms should be re-usable for use in other projects, e.g. Shishi. Possibly unified in a separate library.

GNU Shishi

Shishi is a free Kerberos 5 implementation. The goal is to be compatible with MIT Kerberos, Heimdal and Windows, and most basic features work. It can support Kerberos authentication in SSH (OpenSSH and LSH) and SASL (Cyrus SASL and GNU SASL), and supports Kerberos rsh.

The following is a list of items you might want to work on. If you want to modify or extend these tasks or have your own ideas what to work on, please feel invited to contact us on the help-shishi mailing list.

  1. Implement the set/change password protocol, see draft-ietf-krb-wg-kerberos-set-passwd-04.txt. This would make it possible to change passwords remotely, through a standardized protocol.
  2. Implement Public-Key Cryptography for Initial Authentication in Kerberos, see draft-ietf-cat-kerberos-pk-init-34.txt. This is another way to support X.509 authentication in Kerberos, compared to the one which Shishi already supports through TLS.
  3. Implement cross-realm authentication logic.
  4. Implement functionality to read MIT/Heimdal configuration files and Kerberos ticket caches. This would enable drop-in use of Shishi where MIT/Heimdal is used today.
  5. Implement an LDAP backend for the Kerberos server.

GNU Smalltalk

Implement an alternative syntax for entering programs into GNU Smalltalk, one that is cleaner when GNU Smalltalk is used as a scripting language. The spec is being developed and will be made available to the student (but the complexity of the task doesn't really depend on the details of the spec). The project also includes documenting the new functionality and integrating the changes with existing tools such as the code browser and the automatic documentation generator.

Background needed: Knowledge of Smalltalk and of parsing techniques (recursive-descent parser).


GNU Texinfo

Generalize the font mechanism in texinfo.tex. Some knowledge of TeX, its fonts, and its macro system is needed, or it would take too long to learn all the background. E.g., you should know what a tfm file is. However, given knowledge of TeX, no deep knowledge of Texinfo (which is a pretty thin layer of TeX macros anyway) is needed.

One important result of doing this will be to allow direct Latin-1 input (which requires using different fonts than the default Computer Modern, e.g., Latin Modern). Alternatively, that could be the focus of the project.


GNU XaoS

The XaoS development page has more ideas.

  1. XaoS - formula evaluation: Implement formula evaluation for XaoS, using the formconv library.
  2. XaoS - GNOME and KDE integration: Integrate XaoS with Gnome and/or KDE. Currently only a native GUI is available.

Other GNU Projects

Some GNU projects are registered as separate projects in the Google Summer of Code. These include