|
5.11 Some nice stuff from the Smalltalk innardsJust like with everything else, you'd probably end up asking yourself: how's it done? So here's this chapter, just to wheten your appetite...
5.11.1 How Arrays WorkSmalltalk provides a very adequate selection of predefined classes from which to choose. Eventually, however, you will find the need to code a new basic data structure. Because Smalltalk's most fundamental storage allocation facilities are arrays, it is important that you understand how to use them to gain efficient access to this kind of storage.The Array Class. Our examples have already shown the Array class, and its use is fairly obvious. For many applications, it will fill all your needs--when you need an array in a new class, you keep an instance variable, allocate a new Array and assign it to the variable, and then send array accesses via the instance variable. This technique even works for string-like objects, although it is wasteful of storage. An Array object uses a Smalltalk pointer for each slot in the array; its exact size is transparent to the programmer, but you can generally guess that it'll be roughly the word size of your machine. (33) For storing an array of characters, therefore, an Array works but is inefficient. Arrays at a Lower Level. So let's step down to a lower level of data structure. A ByteArray is much like an Array, but each slot holds only an integer from 0 to 255-and each slot uses only a byte of storage. If you only needed to store small quantities in each array slot, this would therefore be a much more efficient choice than an Array. As you might guess, this is the type of array which a String uses. Aha! But when you go back to chapter 9 and look at the Smalltalk hierarchy, you notice that String does not inherit from ByteArray. To see why, we must delve down yet another level, and arrive at the basic methods for creating a class. For most example classes, we've used the message:
But when we implemented our CheckedArray example, we used
subclass:. This kind of class creation specifies the simplest Smalltalk object. The object consists only of the storage needed to hold the instance variables. In C, this would be a simple structure with zero or more scalar fields.(34).
variableSubclass:. All the other types of class are a superset of
a
variableByteSubclass:. This is a special case of
variableWordSubclass:. Once again, this is a special case of
Accessing These New Arrays. You already know how to access instance
variables--by name. But there doesn't seem to be a name for this new
storage. The way an object accesses it is to send itself
array-type messages like
The problem is when an object wants to add a new level
of interpretation to the at: and at:put: messages. Consider
a Dictionary--it is a
The answer is that Smalltalk has defined An Example. This can get pretty confusing in the abstract, so let's do an example to show how it's pretty simple in practice. Smalltalk arrays tend to start at 1; let's define an array type whose permissible range is arbitrary.
The code has two parts; an initialization, which simply records what index you wish the array to start with, and the at: messages, which adjust the requested index so that the underlying storage receives its 1-based index instead. We've included a range check; its utility will demonstrate itself in a moment:
Since 4 is below our base of 5, a range check error occurs. But this check can catch more than just our own misbehavior!
Our do: message handling is broken! The stack backtrace pretty much tells the story:
Our code received a do: message. We didn't define one, so we inherited the existing do: handling. We see that an Integer loop was constructed, that a code block was invoked, and that our own at: code was invoked. When we range checked, we trapped an illegal index. Just by coincidence, this version of our range checking code also dumps the index. We see that do: has assumed that all arrays start at 1.
But the issues start to run deep. If our parent class believed that it knew enough to assume a starting index of 1(36), why didn't it also assume that it could call basicAt:? The answer is that of the two choices, the designer of the parent class chose the one which was less likely to cause trouble; in fact all standard Smalltalk collections do have indices starting at 1, yet not all of them are implemented so that calling basicAt: would work.(37) Object-oriented methodology says that one object should be entirely opaque to another. But what sort of privacy should there be between a higher class and its subclasses? How many assumption can a subclass make about its superclass, and how many can the superclass make before it begins infringing on the sovereignty of its subclasses? Alas, there are rarely easy answers. Basic Allocation. In this chapter, we've seen the fundamental mechanisms used to allocate and index storage. When the storage need not be accessed with peak efficiency, you can use the existing array classes. When every access counts, having the storage be an integral part of your own object allows for the quickest access. When you move into this area of object development, inheritance and polymorphism become trickier; each level must coordinate its use of the underlying array with other levels. 5.11.2 Two flavors of equalityAs first seen in chapter two, Smalltalk keys its dictionary with things like #word, whereas we generally use 'word'. The former, as it turns out, is from class Symbol. The latter is from class String. What's the real difference between a Symbol and a String? To answer the question, we'll use an analogy from C.In C, if you have a function for comparing strings, you might try to write it:
But clearly this is wrong! The reason is that you can have two copies of a string, each with the same contents but each at its own address. A correct string compare must walk its way through the strings and compare each element. In Smalltalk, exactly the same issue exists, although the details of manipulating storage addresses are hidden. If we have two Smalltalk strings, both with the same contents, we don't necessarily know if they're at the same storage address. In Smalltalk terms, we don't know if they're the same object. The Smalltalk dictionary is searched frequently. To speed the search, it would be nice to not have to compare the characters of each element, but only compare the address itself. To do this, you need to have a guarantee that all strings with the same contents are the same object. The String class, created like:
does not satisfy this. Each time you execute this line, you may well get a new object. But a very similar class, Symbol, will always return the same object:
In general, you can use strings for almost all your tasks. If you ever get into a performance-critical function which looks up strings, you can switch to Symbol. It takes longer to create a Symbol, and the memory for a Symbol is never freed (since the class has to keep tabs on it indefinitely to guarantee it continues to return the same object). You can use it, but use it with care.
This tutorial has generally used the strcmp()-ish kind of
checks for equality. If you ever need to ask the question
"is this the same object?", you use the
Using C terms, 5.11.3 The truth about metaclasses
Everybody, sooner or later, looks for the implementation of the
Huh? Classes are objects?!? Let me explain.
Open up an image; type `gst -r' so that you have no run-time
statistics on the screen; type the text following the
Nothing new for now. Let's try something else:
You get it, that strange
You see, there is a sort of `parallel' hierarchy between classes and metaclasses. When you create a class, Smalltalk creates a metaclass; and just like a class describes how methods for its instances work, a metaclass describes how class methods for that same class work.
Now you would expect that
Uh?!? Try to read it aloud: the
Class is a subclass of other classes.
Evaluating Now, one final question and one final step: what are metaclasses instances of? The question makes sense: if everything has a class, should not metaclasses have one? Evaluate the following:
If you send
To understand the role of
The circle is closed. In the end, this mechanism implements a clean, elegant and (with some contemplation) understandable facility for self-definition of classes. In other words, it is what allows classes to talk about themselves, posing the foundation for the creation of browsers. 5.11.4 The truth of Smalltalk performanceEverybody says Smalltalk is slow, yet this is not completely true for at least three reasons. First, most of the time in graphical applications is spent waiting for the user to "do something", and most of the time in scripting applications (which GNU Smalltalk is particularly well versed in) is spent in disk I/O; implementing a travelling salesman problem in Smalltalk would indeed be slow, but for most real applications you can indeed exchange performance for Smalltalk's power and development speed. Second, Smalltalk's automatic memory management is faster than C's manual one. Most C programs are sped up if you relink them with one of the garbage collecting systems available for C or C++. Third, even though very few Smalltalk virtual machines are as optimized as, say, the Self environment (which reaches half the speed of optimized C!), they do perform some optimizations on Smalltalk code which make them run many times faster than a naive bytecode interpreter. Peter Deutsch, who among other things invented the idea of a just-in-time compiler like those you are used to seeing for Java(39), once observed that implementing a language like Smalltalk efficiently requires the implementor to cheat... but that's okay as long as you don't get caught. That is, as long as you don't break the language semantics. Let's look at some of these optimizations. For certain frequently used 'special selectors', the compiler emits a send-special-selector bytecode instead of a send-message bytecode. Special selectors have one of three behaviors:
No-lookup methods do contain a primitive number specification,
Other messages are open coded by the compiler. That is, there are no message sends for these messages--if the compiler sees blocks without temporaries and with the correct number of arguments at the right places, the compiler unwinds them using jump bytecodes, producing very efficient code. These are:
Other minor optimizations are done. Some are done by a peephole optimizer which is ran on the compiled bytecodes. Or, for example, when GST pushes a boolean value on the stack, it automatically checks whether the following bytecode is a jump (which is a common pattern resulting from most of the open-coded messages above) and combines the execution of the two bytecodes. All these snippets can be optimized this way:
That's all. If you want to know more, look at the virtual machine's source code in `libgst/interp-bc.inl' and at the compiler in `libgst/comp.c'. |