From owner-ntemacs-users@cs.washington.edu  Wed Feb 17 11:35:19 1999
Message-Id: <199902171909.TAA28859@gpo.cam.harlequin.co.uk>
In-reply-to: <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk> (F.J.Wright@qmw.ac.uk)
References: <5B9BE15FBECDD111A1820000F843B87C16C16F@bkmail1.bk.bosch.de> <199902161547.HAA28357@june.cs.washington.edu> <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk>
Precedence: bulk
X-FAQ: http://www.cs.washington.edu/homes/voelker/ntemacs.html
From: Andrew Innes <andrewi@harlequin.co.uk>
Sender: owner-ntemacs-users@cs.washington.edu
To: F.J.Wright@qmw.ac.uk
CC: mike.fabian@it-mannesmann.de, Rolf.Sandau@de.bosch.com,         ntemacs-users@cs.washington.edu, cygwin@sourceware.cygnus.com
Subject: Re: AW: how to use emacs in -batch mode from bash?
Date: Wed, 17 Feb 1999 19:09:28 GMT

[added cygwin@sourceware.cygnus.com]

On Tue, 16 Feb 1999 17:42:20 +0000, "Dr Francis J. Wright" <F.J.Wright@qmw.ac.uk> said:
>OK.  Putting the pieces together, this works and appears to do what you
>want:
>
>bash-2.02$ hi=HO; emacs -batch --eval "(message \\\"$hi\\\")"
>HO
>
>But that leaves the question: why does it work?
>
>bash-2.02$ set -x
>bash-2.02$ emacs -batch --eval "(message \\\"hi\\\")"
>+ emacs -batch --eval '(message \"hi\")'
>hi
>
>Hence, this is equivalent to my previous suggestion after variable
>interpolation.  But I agree with you, Mike, that so many \s should not
>be necessary.
>
>Could it be that NTEmacs is parsing its command line based on an
>assumption that is wrong when the shell is bash?  It's probably using
>libraries that assume the shell is COMMAND or CMD, which have different
>quoting rules.  Hence, when using bash it is necessary to quote in a way
>that makes no sense from a UNIX/bash perspective.

That's pretty much right on the nose (except that command.com/cmd.exe
don't really have quoting rules; they are too dumb for that).

This is the old "Microsoft vs Cygnus" quoting rules problem, but in
reverse this time.

The basic problem is that Windows applications normally rely on the C
library startup code to construct the argv[] array (list of command line
arguments) by parsing the command line.  (On DOS/Windows, the command
line is passed as a single string and it is entirely up to the
application how it interprets that string.  On Unix, applications
receive a list of argument strings exactly as provided by the parent.
The C libraries for Windows compilers provide startup code to
reconstruct the list of argument strings to emulate the Unix
environment.)

This technique of the startup code parsing the command line to construct
the argument list is perfectly reasonable, but Cygnus put a fly in the
ointment by using slightly incompatible rules for parsing the command
line.  The basic rule is the same for both: arguments are separated by
white space (which is discarded), so quotes must be put around arguments
that are intended to contain white space.  The rules diverge when
handling the case where a quote character itself appears in an argument
(an embedded quote), and must be escaped so it isn't misconstrued as the
end of the argument.

Now Emacs was made aware of the two quoting rules back in 19.34.6 days,
to solve the problem of constructing the command line for subprocesses
started from Emacs, so that the subprocess will "see" the list of
arguments that Emacs intends even if there are embedded quotes.  (Aside:
At the same time, I added some magic so that Emacs would detect
automatically which rules to use by looking at the application
executable, specifically to check whether it imports cygwin.dll.  That
has worked well, except that the magic broke with newer releases of the
Cygnus library when the dll name changed.  The next version of Emacs
will have better magic which works with all releases of the cygwin
library, and will hopefully continue to work with any future releases.)

However, we are now seeing the same problem occurring, this time on the
Cygnus side.  The Cygnus port of bash applies the normal shell
quoting rules to parse the command line typed by the user (or entered in
the shell script), to construct the list of arguments to pass to Emacs.
However, when bash invokes spawn() or exec() or some similar library
function to actually invoke Emacs, it has to flatten the argument list
into a single string.  Clearly, the library function that does that is
assuming the subprocess will use the Cygnus quoting rules to reconstruct
the list of arguments.  That fails when an argument contains an embedded
quote and the application doesn't use the Cygnus rules, which is the
situation here.
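
A minimal sketch of that flattening step, assuming the quote-doubling
convention this thread attributes to the Cygnus library.  The names
put_char and append_arg_doubled are hypothetical, and wrapping every
argument in quotes is a simplification; this is not the actual cygwin
spawn code:

```c
#include <stddef.h>
#include <string.h>

/* Append one character to OUT (capacity CAP), keeping it NUL-terminated.
   Returns 0 on success, -1 if there is no room left. */
static int put_char(char *out, size_t cap, size_t *n, char c)
{
    if (*n + 1 >= cap)
        return -1;
    out[(*n)++] = c;
    out[*n] = '\0';
    return 0;
}

/* Append ARG to the command line already in OUT (a NUL-terminated
   string, possibly empty).  The argument is wrapped in quotes and each
   embedded quote is doubled. */
int append_arg_doubled(char *out, size_t cap, const char *arg)
{
    size_t n = strlen(out);
    const char *p;

    /* Separate this argument from the previous one. */
    if (n > 0 && put_char(out, cap, &n, ' ') != 0)
        return -1;
    if (put_char(out, cap, &n, '"') != 0)
        return -1;
    for (p = arg; *p != '\0'; p++) {
        /* An embedded quote is written twice. */
        if (*p == '"' && put_char(out, cap, &n, '"') != 0)
            return -1;
        if (put_char(out, cap, &n, *p) != 0)
            return -1;
    }
    return put_char(out, cap, &n, '"');
}
```

Appending the argument (message "hi") this way produces
"(message ""hi"")" in the flattened string, which is the form a
non-cygwin program may then reconstruct differently.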

Note that this is a problem with bash that applies when it invokes any
application not compiled with the cygwin library, not just Emacs.

I see two possible solutions to this general problem:

 1. Change the cygwin spawn/exec/whatever library functions to use the
    Microsoft rules for escaping embedded quotes when running non-cygwin
    applications (I believe they already detect when they are spawning
    non-cygwin applications; if not, the method Emacs uses could be
    reused for this).

 2. Change the cygwin quoting rules to match the Microsoft ones.  This
    would apply to spawn/exec and the startup code, and would cause some
    breakage when mixing with applications compiled with old versions of
    cygwin.

Since cygwin-compiled applications tend to be recompiled when new
releases of the library come out, option (2) might actually be viable,
and would be the preferred solution since it would maximise the
interoperability between applications.  But even option (1) would be a
major improvement.

AndrewI

PS. There is a certain amount of irony in all this: the Microsoft
startup code looks like it was intended to support escaping embedded
quotes by doubling them (as Cygnus does), but the parsing code contains
a bug which prevents this from working.  If not for this bug, the
problem with bash invoking non-cygwin applications wouldn't arise.

From owner-ntemacs-users@cs.washington.edu  Wed Feb 17 13:51:47 1999
Message-ID: <19990217162423.A13997@cygnus.com>
References: <5B9BE15FBECDD111A1820000F843B87C16C16F@bkmail1.bk.bosch.de> <199902161547.HAA28357@june.cs.washington.edu> <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk> <199902171909.TAA28859@gpo.cam.harlequin.co.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93i
In-Reply-To: <199902171909.TAA28859@gpo.cam.harlequin.co.uk>; from Andrew Innes on Wed, Feb 17, 1999 at 07:09:28PM +0000
Precedence: bulk
X-FAQ: http://www.cs.washington.edu/homes/voelker/ntemacs.html
From: Christopher Faylor <cgf@cygnus.com>
Sender: owner-ntemacs-users@cs.washington.edu
To: Andrew Innes <andrewi@harlequin.co.uk>, F.J.Wright@qmw.ac.uk
Cc: mike.fabian@it-mannesmann.de, Rolf.Sandau@de.bosch.com,         ntemacs-users@cs.washington.edu, cygwin@sourceware.cygnus.com
Subject: Re: AW: how to use emacs in -batch mode from bash?
Date: Wed, 17 Feb 1999 16:24:23 -0500

On Wed, Feb 17, 1999 at 07:09:28PM +0000, Andrew Innes wrote:
>However, we are now seeing the same problem occurring, this time on the
>Cygnus side.  The Cygnus port of bash will be applying the normal shell
>quoting rules to parse the command line typed by the user (or entered in
>the shell script), to construct the list of arguments to pass to Emacs.
>However, when bash invokes spawn() or exec() or some similar library
>function to actually invoke Emacs, it has to flatten the argument list
>into a single string.  Clearly, the library function that does that is
>assuming the subprocess will use the Cygnus quoting rules to reconstruct
>the list of arguments.  That fails when an argument contains an embedded
>quote and the application doesn't use the Cygnus rules, which is the
>situation here.

As far as I know, the method used to "quote a quote" in cygwin is the
same as what is used in Visual C's libraries.  Here's a small program
that I just wrote to test this:

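#include <stdio.h>

/* Print each argument between slashes so that embedded spaces and
   quotes are visible. */
int main(int argc, char **argv)
{
    int i;
    for (i = 0; i < argc; i++)
        printf("arg %d: /%s/\n", i, argv[i]);
    return 0;
}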

And, here's the result:

c:\tmp>echoarg a b """"
arg 0: /echoarg/
arg 1: /a/
arg 2: /b/
arg 3: /"/

>Note that this is a problem with bash that applies when it invokes any
>application not compiled with the cygwin library, not just Emacs.
>
>I see two possible solutions to this general problem:
>
> 1. Change the cygwin spawn/exec/whatever library functions to use the
>    Microsoft rules for escaping embedded quotes when running non-cygwin
>    applications (I believe they already detect when they are spawning
>    non-cygwin applications; if not, the method Emacs uses could be
>    reused for this).

Cygwin does not know when it is running a non-cygwin application.  If it
did we wouldn't go through this quoting mess at all.

If Emacs is detecting this somehow, I'd love to hear how they do it.  I've
wanted to put more smarts into spawn for some time.

> 2. Change the cygwin quoting rules to match the Microsoft ones.  This
>    would apply to spawn/exec and the startup code, and would cause some
>    breakage when mixing with applications compiled with old versions of
>    cygwin.

See above.  As far as I can tell, cygwin is already compliant with Microsoft's
rules.  That was the intent in this whole scheme, actually.

>PS. There is a certain amount of irony in all this: the Microsoft
>startup code looks like it was intended to support escaping embedded
>quotes by doubling them (as Cygnus does), but the parsing code contains
>a bug which prevents this from working.  If not for this bug, the
>problem with bash invoking non-cygwin applications wouldn't arise.

I'm not sure why you're seeing this and I'm not, but with my version of
MSVC 5.0 this seems to be working correctly.

cgf

From owner-ntemacs-users@cs.washington.edu  Thu Feb 18 05:47:24 1999
Message-Id: <199902181322.NAA15032@gpo.cam.harlequin.co.uk>
In-reply-to: <19990217162423.A13997@cygnus.com> (message from Christopher 	Faylor on Wed, 17 Feb 1999 16:24:23 -0500)
References: <5B9BE15FBECDD111A1820000F843B87C16C16F@bkmail1.bk.bosch.de> <199902161547.HAA28357@june.cs.washington.edu> <36C9ADFC.ABD3ACE4@Maths.QMW.ac.uk> <199902171909.TAA28859@gpo.cam.harlequin.co.uk> <19990217162423.A13997@cygnus.com>
Precedence: bulk
X-FAQ: http://www.cs.washington.edu/homes/voelker/ntemacs.html
From: Andrew Innes <andrewi@harlequin.co.uk>
Sender: owner-ntemacs-users@cs.washington.edu
To: cgf@cygnus.com
CC: F.J.Wright@qmw.ac.uk, mike.fabian@it-mannesmann.de,         Rolf.Sandau@de.bosch.com, ntemacs-users@cs.washington.edu,         cygwin@sourceware.cygnus.com
Subject: Re: AW: how to use emacs in -batch mode from bash?
Date: Thu, 18 Feb 1999 13:22:07 GMT

On Wed, 17 Feb 1999 16:24:23 -0500, Christopher Faylor <cgf@cygnus.com> said:
>On Wed, Feb 17, 1999 at 07:09:28PM +0000, Andrew Innes wrote:
>>However, we are now seeing the same problem occurring, this time on the
>>Cygnus side.  The Cygnus port of bash will be applying the normal shell
>>quoting rules to parse the command line typed by the user (or entered in
>>the shell script), to construct the list of arguments to pass to Emacs.
>>However, when bash invokes spawn() or exec() or some similar library
>>function to actually invoke Emacs, it has to flatten the argument list
>>into a single string.  Clearly, the library function that does that is
>>assuming the subprocess will use the Cygnus quoting rules to reconstruct
>>the list of arguments.  That fails when an argument contains an embedded
>>quote and the application doesn't use the Cygnus rules, which is the
>>situation here.
>
>As far as I know, the method used to "quote a quote" in cygwin is the
>same as what is used in Visual C's libraries.  Here's a small program
>that I just wrote to test this:
>
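>#include <stdio.h>
>
>/* Print each argument between slashes so that embedded spaces and
>   quotes are visible. */
>int main(int argc, char **argv)
>{
>    int i;
>    for (i = 0; i < argc; i++)
>        printf("arg %d: /%s/\n", i, argv[i]);
>    return 0;
>}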
>
>And, here's the result:
>
>c:\tmp>echoarg a b """"
>arg 0: /echoarg/
>arg 1: /a/
>arg 2: /b/
>arg 3: /"/

This example doesn't show up the difference, because the MSVC startup
code _does_ handle repeated quotes, but not in quite the same way (see
crt/src/stdargv.c in the MSVC library source for the gory details).

Here is a more revealing example:

d:\users\andrewi>echoarg "test a" "test ""b""" "test ""c"" d"
arg 0: /echoarg/
arg 1: /test a/
arg 2: /test "b"/
arg 3: /test "c/
arg 4: /d/

Note that arg 2 comes out as expected (fortuitously, it turns out), but
arg 3 is split into two args by the MSVC code (which also drops a quote
in the process), whereas the Cygwin code keeps it as a single argument.
The reason is that MSVC sometimes treats a doubled quote as the end of
the argument.  To escape an embedded quote reliably (at least in the
absence of preceding backslashes), you have to triple it like so:

d:\users\andrewi>echoarg "test a" "test """b"""" "test """c""" d"
arg 0: /echoarg/
arg 1: /test a/
arg 2: /test "b"/
arg 3: /test "c" d/

In fairness, this might not be a bug in the MSVC code, but a deliberate
feature.  It enables the following, slightly strange, method of
constructing arguments with whitespace:

d:\users\andrewi>echoarg "a and b "together
arg 0: /echoarg/
arg 1: /a and b together/

I can imagine that someone requested this behaviour, as a way to enable
DOS batch files to do things they couldn't otherwise easily do.

Anyway, the upshot of this mess is that the only really reliable way to
escape an embedded quote is to put a backslash before it (and double all
literal backslashes immediately preceding the embedded quote).  This is
what I refer to as the Microsoft quoting rule.
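
The backslash rule can be sketched in C roughly as follows.  The names
put_ch and quote_arg_backslash are hypothetical and this is not the
NT-Emacs source.  Doubling the backslashes that end up directly in
front of the closing quote goes slightly beyond the rule as stated
above; it assumes the MSVC startup code treats backslashes before the
closing quote the same way as before an embedded one:

```c
#include <stddef.h>

/* Append one character to OUT (capacity CAP), keeping it NUL-terminated.
   Returns 0 on success, -1 if there is no room left. */
static int put_ch(char *out, size_t cap, size_t *n, char c)
{
    if (*n + 1 >= cap)
        return -1;
    out[(*n)++] = c;
    out[*n] = '\0';
    return 0;
}

/* Write ARG into OUT as one quoted argument, escaping embedded quotes
   with a backslash and doubling any backslashes that immediately
   precede a quote. */
int quote_arg_backslash(char *out, size_t cap, const char *arg)
{
    size_t n = 0, run = 0, i;   /* run = length of current backslash run */
    const char *p;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (put_ch(out, cap, &n, '"') != 0)
        return -1;
    for (p = arg; *p != '\0'; p++) {
        if (*p == '\\') {
            run++;
        } else {
            if (*p == '"') {
                /* The run has already been copied once: copy it again
                   to double it, plus one backslash for the quote. */
                for (i = 0; i < run + 1; i++)
                    if (put_ch(out, cap, &n, '\\') != 0)
                        return -1;
            }
            run = 0;
        }
        if (put_ch(out, cap, &n, *p) != 0)
            return -1;
    }
    /* Double a trailing backslash run so that it does not escape the
       closing quote. */
    for (i = 0; i < run; i++)
        if (put_ch(out, cap, &n, '\\') != 0)
            return -1;
    return put_ch(out, cap, &n, '"');
}
```

With this sketch the argument test "c" d is written as
"test \"c\" d", and backslashes that are not followed by a quote are
copied unchanged.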

>>Note that this is a problem with bash that applies when it invokes any
>>application not compiled with the cygwin library, not just Emacs.
>>
>>I see two possible solutions to this general problem:
>>
>>1. Change the cygwin spawn/exec/whatever library functions to use the
>>Microsoft rules for escaping embedded quotes when running non-cygwin
>>applications (I believe they already detect when they are spawning
>>non-cygwin applications; if not, the method Emacs uses could be
>>reused for this).
>
>Cygwin does not know when it is running a non-cygwin application.  If it
>did we wouldn't go through this quoting mess at all.
>
>If Emacs is detecting this somehow, I'd love to hear how they do it.  I've
>wanted to put more smarts into spawn for some time.

In NT-Emacs, we examine the header of an executable, and if it is in PE
format, we walk the import table to see whether it implicitly links to
"cygwin.dll". (In the next release, I just check whether there is a dll
whose name starts with "cygwin".)
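
A rough sketch of that kind of import-table check, working on an
executable image that has already been read into memory.  This is not
the NT-Emacs code: the function names are made up for illustration,
the offsets are those of the published PE/COFF layout, and only the
ordinary import directory is examined:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers for header fields. */
static uint32_t rd16(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}
static uint32_t rd32(const unsigned char *p)
{
    return rd16(p) | (rd16(p + 2) << 16);
}

/* Map an RVA to a file offset using the section table (40-byte
   entries).  Returns 0 if no section's raw data covers the RVA. */
static size_t rva_to_off(const unsigned char *sect, unsigned nsect,
                         uint32_t rva)
{
    unsigned i;
    for (i = 0; i < nsect; i++) {
        const unsigned char *s = sect + 40u * i;
        uint32_t va = rd32(s + 12), raw_size = rd32(s + 16);
        uint32_t raw = rd32(s + 20);
        if (rva >= va && rva - va < raw_size)
            return (size_t)raw + (rva - va);
    }
    return 0;
}

/* Case-insensitive test for a name beginning with "cygwin". */
static int has_cygwin_prefix(const unsigned char *s)
{
    static const char want[] = "cygwin";
    size_t i;
    for (i = 0; i < 6; i++)
        if (tolower(s[i]) != want[i])
            return 0;
    return 1;
}

/* Return 1 if the PE image imports a DLL whose name starts with
   "cygwin", 0 otherwise (including when IMG is not a PE image). */
int imports_cygwin_dll(const unsigned char *img, size_t len)
{
    size_t pe, opt, sect, dirs, d;
    unsigned nsect;
    uint32_t imp_rva;

    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return 0;
    pe = rd32(img + 0x3c);                  /* e_lfanew */
    if (pe > len || len - pe < 24 + 112 + 16)
        return 0;
    if (memcmp(img + pe, "PE\0\0", 4) != 0)
        return 0;
    nsect = rd16(img + pe + 6);             /* NumberOfSections */
    opt = pe + 24;                          /* optional header */
    sect = opt + rd16(img + pe + 20);       /* section table */
    /* Data directories start at 96 (PE32) or 112 (PE32+). */
    dirs = opt + (rd16(img + opt) == 0x20b ? 112 : 96);
    imp_rva = rd32(img + dirs + 8);         /* directory entry 1 */
    if (imp_rva == 0 || sect > len || (len - sect) / 40 < nsect)
        return 0;

    /* Walk the 20-byte import descriptors until the Name field is 0. */
    for (d = rva_to_off(img + sect, nsect, imp_rva);
         d != 0 && d <= len && len - d >= 20; d += 20) {
        uint32_t name_rva = rd32(img + d + 12);
        size_t name;
        if (name_rva == 0)
            break;
        name = rva_to_off(img + sect, nsect, name_rva);
        if (name != 0 && name <= len && len - name >= 6
            && has_cygwin_prefix(img + name))
            return 1;
    }
    return 0;
}
```

A caller would read the executable into a buffer and pass the buffer
and its length; a result of 1 suggests the program should be handed a
command line quoted with the Cygnus rules.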

>>2. Change the cygwin quoting rules to match the Microsoft ones.  This
>>would apply to spawn/exec and the startup code, and would cause some
>>breakage when mixing with applications compiled with old versions of
>>cygwin.
>
>See above.  As far as I can tell, cygwin is already compliant with Microsoft's
>rules.  That was the intent in this whole scheme, actually.

The Microsoft rules are unfortunately more complicated than they seem,
as shown above.  I believe the simplest rule to reliably escape embedded
quotes for MSVC-compiled programs is to use backslash, which is what
NT-Emacs does.

AndrewI

