The truth is that gawk was not designed for simple extensibility. The facilities for adding functions using shared libraries work, but are something of a “bag on the side.” Thus, this tour is brief and simplistic; would-be gawk hackers are encouraged to spend some time reading the source code before trying to write extensions based on the material presented here. Of particular note are the files awk.h, builtin.c, and eval.c. Reading awk.y in order to see how the parse tree is built would also be of use.
With the disclaimers out of the way, the following types, structure members, functions, and macros are declared in awk.h and are of use when writing extensions. The next section shows how they are used:
AWKNUM
An AWKNUM is the internal type of awk
floating-point numbers. Typically, it is a C double.
NODE
Just about everything is done using objects of type NODE.
These contain both strings and numbers, as well as variables and arrays.
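AWKNUM force_number(NODE *n)
This macro forces a value to be numeric. It returns the actual
numeric value contained in the node.
It may end up calling an internal gawk function.
void force_string(NODE *n)
This macro guarantees that a NODE's string value is current.
It may end up calling an internal gawk function.
It also guarantees that the string is zero-terminated.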
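size_t get_curfunc_arg_count(void)
This function returns the actual number of parameters passed to the
current function. Inside the code of an extension, this can be used to
determine the maximum index that is safe to use with stack_ptr. If this value is
greater than tree->param_cnt, the function was
called incorrectly from the awk program.
Caution: This function is new as of gawk 3.1.4.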
n->param_cnt
Inside an extension function, this is the maximum number of
expected parameters, as set by the make_builtin function.
n->stptr
n->stlen
The data and length of a NODE's string value, respectively.
The string is not guaranteed to be zero-terminated.
If you need to pass the string value to a C library function, save
the value in n->stptr[n->stlen], assign '\0' to it,
call the routine, and then restore the value.
n->type
The type of the NODE. This is a C enum. Values should
be either Node_var or Node_var_array for function
parameters.
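n->vname
The “variable name” of a node. This is not of much use inside
externally written extensions.
void assoc_clear(NODE *n)
Clears the associative array pointed to by n.
Make sure that n->type == Node_var_array first.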
NODE **assoc_lookup(NODE *symbol, NODE *subs, int reference)
Finds, and installs if necessary, array elements.
symbol is the array, subs is the subscript.
The subscript is usually a value created with tmp_string (see below).
reference should be TRUE if it is an error to use the
value before it is created. Typically, FALSE is the
correct value to use from extension functions.
NODE *make_string(char *s, size_t len)
Take a C string and turn it into a pointer to a NODE that
can be stored appropriately. This is permanent storage; understanding
of gawk memory management is helpful.
NODE *make_number(AWKNUM val)
Take an AWKNUM and turn it into a pointer to a NODE that
can be stored appropriately. This is permanent storage; understanding
of gawk memory management is helpful.
NODE *tmp_string(char *s, size_t len)
Take a C string and turn it into a pointer to a NODE that
can be stored appropriately. This is temporary storage; understanding
of gawk memory management is helpful.
NODE *tmp_number(AWKNUM val)
Take an AWKNUM and turn it into a pointer to a NODE that
can be stored appropriately. This is temporary storage;
understanding of gawk memory management is helpful.
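NODE *dupnode(NODE *n)
Duplicate a node. In most cases, this increments an internal
reference count instead of actually duplicating the entire NODE;
understanding of gawk memory management is helpful.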
void free_temp(NODE *n)
This macro releases the memory associated with a NODE
allocated with tmp_string or tmp_number.
Understanding of gawk memory management is helpful.
void make_builtin(char *name, NODE *(*func)(NODE *), int count)
Register a C function pointed to by func as the new built-in
function name. name is a regular C string. count
is the maximum number of arguments that the function takes.
The function should be written in the following manner:
/* do_xxx --- do xxx function for gawk */
NODE *
do_xxx(NODE *tree)
{
    ...
}
NODE *get_argument(NODE *tree, int i)
This function is called from within a C extension function to get
the i-th argument from the function call.
The first argument is argument zero.
NODE *get_actual_argument(NODE *tree, unsigned int i, int optional, int wantarray)
This function retrieves a particular argument i. wantarray is TRUE
if the argument should be an array, FALSE otherwise. If optional is
TRUE, the argument need not have been supplied. If it wasn't, the return
value is NULL. It is a fatal error if optional is FALSE but
the argument was not provided.
Caution: This function is new as of gawk 3.1.4.
get_scalar_argument(t, i, opt)
This is a convenience macro that calls get_actual_argument.
Caution: This macro is new as of gawk 3.1.4.
get_array_argument(t, i, opt)
This is a convenience macro that calls get_actual_argument.
Caution: This macro is new as of gawk 3.1.4.
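void set_value(NODE *tree)
This function is called from within a C extension function to set
the return value from the extension function. This value is
what the awk program sees as the return value from the
new awk function.
void update_ERRNO(void)
This function is called from within a C extension function to set
the value of gawk's ERRNO variable, based on the current
value of the C errno variable.
It is provided as a convenience.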
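The save, terminate, call, and restore sequence described for n->stptr and n->stlen in the list above can be sketched in isolation. This is only an illustration under stated assumptions: struct str_node and str_node_c_length are hypothetical names invented here, standing in for the two NODE members; the real NODE declared in awk.h has many more fields, and strlen stands in for whatever C library routine needs a terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two NODE members discussed above;
 * this is NOT the NODE type from awk.h. */
struct str_node {
    char  *stptr;   /* string data, not necessarily '\0'-terminated */
    size_t stlen;   /* length of the string value */
};

/*
 * str_node_c_length --- call a C library routine (strlen here) on a
 * string value that may lack a terminator, leaving the buffer unchanged.
 */
size_t
str_node_c_length(struct str_node *n)
{
    char save;
    size_t len;

    assert(n != NULL && n->stptr != NULL);

    save = n->stptr[n->stlen];    /* save the byte just past the value */
    n->stptr[n->stlen] = '\0';    /* terminate temporarily */
    len = strlen(n->stptr);       /* call the routine */
    n->stptr[n->stlen] = save;    /* restore the original byte */

    return len;
}
```

The sketch assumes, as the description above implies, that the byte at stptr[stlen] belongs to the buffer and may be written to temporarily.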
An argument that is supposed to be an array needs to be handled with some extra code, in case the array being passed in is actually from a function parameter.
In versions of gawk up to and including 3.1.2, the following boilerplate code shows how to do this:
NODE *the_arg;
the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */
/* if a parameter, get it off the stack */
if (the_arg->type == Node_param_list)
    the_arg = stack_ptr[the_arg->param_cnt];
/* parameter referenced an array, get it */
if (the_arg->type == Node_array_ref)
    the_arg = the_arg->orig_array;
/* check type */
if (the_arg->type != Node_var && the_arg->type != Node_var_array)
    fatal("newfunc: third argument is not an array");
/* force it to be an array, if necessary, clear it */
the_arg->type = Node_var_array;
assoc_clear(the_arg);
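To see why the boilerplate above dereferences twice, it may help to run the same steps against a toy model. Everything in the following sketch is an assumption made for illustration: the Toy_ enum values, struct arg_node, and toy_resolve_array are hypothetical stand-ins that only mirror the names in the boilerplate, the stack is passed explicitly instead of using gawk's stack_ptr, and NULL is returned where the real code calls fatal.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins that mirror the names used in the boilerplate above;
 * these are NOT the definitions from awk.h. */
enum toy_type { Toy_var, Toy_var_array, Toy_param_list, Toy_array_ref };

struct arg_node {
    enum toy_type    type;
    size_t           param_cnt;   /* Toy_param_list: index into the stack */
    struct arg_node *orig_array;  /* Toy_array_ref: the array referred to */
};

/*
 * toy_resolve_array --- follow the same steps as the boilerplate.
 * Returns the underlying array node, or NULL where the real code
 * would report a fatal error.
 */
struct arg_node *
toy_resolve_array(struct arg_node *arg, struct arg_node **stack)
{
    assert(arg != NULL);

    /* if a parameter, get it off the stack */
    if (arg->type == Toy_param_list) {
        assert(stack != NULL);
        arg = stack[arg->param_cnt];
    }

    /* parameter referenced an array, get it */
    if (arg->type == Toy_array_ref)
        arg = arg->orig_array;

    /* check type */
    if (arg->type != Toy_var && arg->type != Toy_var_array)
        return NULL;

    /* force it to be an array */
    arg->type = Toy_var_array;
    return arg;
}
```

In this model, a parameter node that points at a stack slot holding an array reference resolves to the referenced array, while a plain variable is simply converted in place.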
For versions 3.1.3 and later, the internals changed, and the interface became drastically simpler. The following boilerplate code now suffices:
NODE *the_arg;
the_arg = get_argument(tree, 2); /* assume need 3rd arg, 0-based */
/* force it to be an array: */
the_arg = get_array(the_arg);
/* if necessary, clear it: */
assoc_clear(the_arg);
As of version 3.1.4, the internals improved again, and became even simpler:
NODE *the_arg;
the_arg = get_array_argument(tree, 2, FALSE); /* assume need 3rd arg, 0-based */
Again, you should spend time studying the gawk internals; don't just blindly copy this code.