From owner-ntemacs-users@cs.washington.edu  Mon Jan  4 05:43:00 1999
Message-Id: <199901041309.NAA28103@gpo.cam.harlequin.co.uk>
In-reply-to: <uu2y75jnr.fsf@prague.ixos.cz> (message from Trung Tran-Duc on 04 	Jan 1999 13:04:08 +0100)
References:  <uu2y75jnr.fsf@prague.ixos.cz>
Precedence: bulk
X-FAQ: http://www.cs.washington.edu/homes/voelker/ntemacs.html
From: Andrew Innes <andrewi@harlequin.co.uk>
Sender: owner-ntemacs-users@cs.washington.edu
To: trung.tranduc@prague.ixos.cz
CC: ntemacs-users@cs.washington.edu
Subject: Re: [20.3.3.1.1] crash in GC - how to debug?
Date: Mon, 4 Jan 1999 13:09:30 GMT

On 04 Jan 1999 13:04:08 +0100, Trung Tran-Duc <trung.tranduc@prague.ixos.cz> said:
>Hello Emacs hackers,
>
>I'm running the latest beta version of NTEmacs. I have an .emacs which causes
>NTEmacs to crash reliably each time during startup. Obviously pointers are
>corrupted somewhere inside GC data structures. Unfortunately the crash happens
>at GC phase, and you know it's too late.
>
>I am willing to debug it, but please, can someone tell me the strategy of
>hunting Emacs GC-related bug. Is there any tech doc about how GC is done in
>Emacs, etc.

The Elisp manual might have some notes about the Emacs internals;
otherwise, use the source.

GC in Emacs is mark-and-sweep, except that small strings are compacted.

If the bug is GC-related, there are several possibilities for how the
memory corruption could have occurred:

 - the only handle to an object is in a local variable in a C procedure,
   but it hasn't been GCPRO'd (referring to the mechanism by which local
   handles to lisp objects are "protected" against GC).

   -> if GC happens while this value is still live, the object will be
      collected and the handle now contains a dangling pointer

 - a local object handle points to a string, but it hasn't been GCPRO'd

   -> if GC happens, the string will be moved during the compaction
      phase, and the handle now contains a dangling pointer

 - a bug in one or more C procedures causes memory corruption
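
To make the first case concrete, here is a toy mark-and-sweep collector
in C.  It is only a sketch under my own assumptions, not Emacs code:
Obj, protect, unprotect, collect and handle_survives_gc are all made-up
names, and protect/unprotect merely stand in for what GCPRO1/UNGCPRO do
in the real sources.  Objects are flagged as freed rather than actually
released, so the dangling handle can be inspected safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy heap object; NOT Emacs's real Lisp_Object representation. */
typedef struct Obj {
    int value;
    int marked;        /* set during the mark phase */
    int freed;         /* set instead of calling free(), so the demo stays defined */
    struct Obj *next;  /* links every allocated object for the sweep phase */
} Obj;

static Obj *all_objects = NULL;

/* Hypothetical stand-in for the GCPRO chain: addresses of protected locals. */
#define MAX_ROOTS 16
static Obj **roots[MAX_ROOTS];
static int n_roots = 0;

Obj *make_obj(int value)
{
    Obj *o = calloc(1, sizeof *o);
    if (o == NULL)
        abort();
    o->value = value;
    o->next = all_objects;
    all_objects = o;
    return o;
}

/* Register / unregister the address of a local handle as a GC root. */
void protect(Obj **handle)
{
    assert(n_roots < MAX_ROOTS);
    roots[n_roots++] = handle;
}

void unprotect(void)
{
    assert(n_roots > 0);
    n_roots--;
}

/* Mark what the protected handles point to, then sweep everything else.
   Returns the number of objects reclaimed. */
int collect(void)
{
    int reclaimed = 0;
    for (int i = 0; i < n_roots; i++)
        if (*roots[i] != NULL)
            (*roots[i])->marked = 1;
    for (Obj *o = all_objects; o != NULL; o = o->next) {
        if (!o->marked && !o->freed) {
            o->freed = 1;
            reclaimed++;
        }
        o->marked = 0;  /* reset for the next collection */
    }
    return reclaimed;
}

/* Returns 1 if the local handle survived a GC, 0 if it was left dangling. */
int handle_survives_gc(int use_protection)
{
    Obj *local = make_obj(42);  /* the only handle to this object */
    if (use_protection)
        protect(&local);
    collect();                  /* stands in for a GC triggered by any allocating call */
    int alive = !local->freed;
    if (use_protection)
        unprotect();
    return alive;
}
```

With this sketch, handle_survives_gc(1) returns 1 and
handle_survives_gc(0) returns 0: the unprotected local is the only
reference, the mark phase never sees it, and the sweep reclaims the
object while the handle is still in use.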

If I were trying to track this down, I would put a breakpoint on
Fgarbage_collect (the C entry point for the GC), and count how many GCs
happen before the crash.  Then I would rerun Emacs and start tracing
through the execution following the penultimate GC (the last one that
completes without crashing, so the corruption most likely happens after
it) to see if I could spot where the corruption occurs.

You may find yourself tracing through lots of compiled lisp bytecode,
which is rather tedious (although you can watch certain key variables to
get a good sense of what is happening); if you can't seem to get
anywhere that way, try to work out where you are in your .emacs when the
penultimate GC occurs, and put in an explicit call to (debug) a bit
before it.  That way you can step through the lisp code in the lisp
debugger, which gives you a good chance of working out what is going on
leading up to the crash.

Ultimately, you will need to step through the C code once you've
narrowed down a likely location for the bug.  Good luck.

AndrewI

