10.2 Two-Way Communications with Another Process

     
     From: brennan@whidbey.com (Mike Brennan)
     Newsgroups: comp.lang.awk
     Subject: Re: Learn the SECRET to Attract Women Easily
     Date: 4 Aug 1997 17:34:46 GMT
     Message-ID: <5s53rm$eca@news.whidbey.com>
     
     On 3 Aug 1997 13:17:43 GMT, Want More Dates???
     <tracy78@kilgrona.com> wrote:
     >Learn the SECRET to Attract Women Easily
     >
     >The SCENT(tm)  Pheromone Sex Attractant For Men to Attract Women
     
     The scent of awk programmers is a lot more attractive to women than
     the scent of perl programmers.
     --
     Mike Brennan
     

It is often useful to be able to send data to a separate program for processing and then read the result. This can always be done with temporary files:

     # write the data for processing
     tempfile = ("mydata." PROCINFO["pid"])
     while (not done with data)
         print data | ("subprogram > " tempfile)
     close("subprogram > " tempfile)
     
     # read the results, remove tempfile when done
     while ((getline newdata < tempfile) > 0)
         process newdata appropriately
     close(tempfile)
     system("rm " tempfile)

This works, but not elegantly. Among other things, it requires that the program be run in a directory that cannot be shared among users; for example, /tmp will not do, as another user might happen to be using a temporary file with the same name.
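
As a concrete instance of the temporary-file approach, the following sketch assumes `sort -n' as the subprogram and a few hard-coded numbers as the data. Both are illustrative choices, as is the variable name cmd:

     BEGIN {
         # illustrative temp-file name built from gawk's process ID
         tempfile = ("mydata." PROCINFO["pid"])
         cmd = ("sort -n > " tempfile)
     
         # write the data for processing
         n = split("30 4 17 8", nums, " ")
         for (i = 1; i <= n; i++)
             print nums[i] | cmd
         close(cmd)    # wait until sort has finished writing tempfile
     
         # read the results, remove tempfile when done
         while ((getline sorted < tempfile) > 0)
             print "sorted:", sorted
         close(tempfile)
         system("rm " tempfile)
     }

Saving the command string in a variable guarantees that print and close refer to exactly the same pipe.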

Starting with version 3.1 of gawk, it is possible to open a two-way pipe to another process. The second process is termed a coprocess, since it runs in parallel with gawk. The two-way connection is created using the new `|&' operator (borrowed from the Korn shell, ksh):[1]

     do {
         print data |& "subprogram"
         "subprogram" |& getline results
     } while (data left to process)
     close("subprogram")

The first time an I/O operation is executed using the `|&' operator, gawk creates a two-way pipeline to a child process that runs the other program. Output created with print or printf is written to the program's standard input, and output from the program's standard output can be read by the gawk program using getline. As is the case with processes started by `|', the subprogram can be any program, or pipeline of programs, that can be started by the shell.
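
The round trip described above can be tried with a small sketch. It assumes `cat -u' as a stand-in coprocess, chosen only because it copies each line back without buffering its output; the sample words and variable names are likewise illustrative:

     BEGIN {
         command = "cat -u"    # stand-in coprocess, echoes its input unbuffered
         n = split("alpha beta gamma", words, " ")
     
         for (i = 1; i <= n; i++) {
             # print sends the line to the coprocess's standard input
             print words[i] |& command
             # getline reads one line from the coprocess's standard output
             if ((command |& getline reply) > 0)
                 print "sent", words[i], "received", reply
         }
         close(command)
     }

A coprocess that buffers its output would not answer line by line like this, so the loop could stall waiting in getline.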

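There are some cautionary items to be aware of:

The coprocess's standard error goes to the same place as the parent gawk's standard error. It is not possible to read the child's standard error separately.

I/O buffering may be a problem. gawk automatically flushes all output down the pipe to the coprocess. However, if the coprocess does not flush its own output, gawk may hang when doing a getline to read the coprocess's results. This can lead to a situation known as deadlock, where each process is waiting for the other one to do something.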

It is possible to close just one end of the two-way pipe to a coprocess, by supplying either "to" or "from" as a second argument to the close function (see Close Files And Pipes). These strings tell gawk to close the end of the pipe that sends data to the process or the end that reads from it, respectively.

This is particularly necessary in order to use the system sort utility as part of a coprocess; sort must read all of its input data before it can produce any output. The sort program does not receive an end-of-file indication until gawk closes the write end of the pipe.

When you have finished writing data to the sort utility, you can close the "to" end of the pipe, and then start reading sorted data via getline. For example:

     BEGIN {
         command = "LC_ALL=C sort"
         n = split("abcdefghijklmnopqrstuvwxyz", a, "")
     
         for (i = n; i > 0; i--)
             print a[i] |& command
         close(command, "to")
     
         while ((command |& getline line) > 0)
             print "got", line
         close(command)
     }

This program writes the letters of the alphabet in reverse order, one per line, down the two-way pipe to sort. It then closes the write end of the pipe, so that sort receives an end-of-file indication. This causes sort to sort the data and write the sorted data back to the gawk program. Once all of the data has been read, gawk terminates the coprocess and exits.

As a side note, the assignment `LC_ALL=C' in the sort command ensures traditional Unix (ASCII) sorting from sort.

Beginning with gawk 3.1.2, you may use pseudo-ttys (ptys) for two-way communication instead of pipes, if your system supports them. This is done on a per-command basis, by setting a special element in the PROCINFO array (see Auto-set), like so:

     command = "sort -nr"           # command, saved in variable for convenience
     PROCINFO[command, "pty"] = 1   # update PROCINFO
     print ... |& command           # start two-way pipe
     ...

Using ptys avoids the buffer deadlock issues that can arise when a coprocess does not flush its output, at some loss in performance. If your system does not have ptys, or if all the system's ptys are in use, gawk automatically falls back to using regular pipes.


Footnotes

[1] This is very different from the same operator in the C shell, csh.