Memory issues

This chapter explains how Windows programs use memory and describes the internal data formats used in Object Pascal.

Windows memory management

This section explains how Delphi applications use memory.

Code segments

Each module (the main program or library and each unit) in a Delphi application or DLL has its own code segment. The size of a single code segment can't exceed 64K, but the total size of the code is limited only by the available memory.

Segment attributes

Each code segment has a set of attributes that determine the behavior of the code segment when it's loaded into memory.

MOVEABLE or FIXED

When a code segment is MOVEABLE, Windows can move the segment around in physical memory to satisfy other memory allocation requests. When a code segment is FIXED, it never moves in physical memory. The preferred attribute is MOVEABLE, and unless it's absolutely necessary to keep a code segment at the same address in physical memory (such as if it contains an interrupt handler), you should use the MOVEABLE attribute. When you do need a fixed code segment, keep that code segment as small as possible.

PRELOAD or DEMANDLOAD

A code segment that has the PRELOAD attribute is automatically loaded when the application or library is activated. The DEMANDLOAD attribute delays the loading of the segment until a routine in the segment is actually called. Although the first call to a demand-loaded segment takes longer, demand loading allows an application to execute in less space.

DISCARDABLE or PERMANENT

When a segment is DISCARDABLE, Windows can free the memory occupied by the segment when it needs to allocate additional memory. When a segment is PERMANENT, it's kept in memory at all times.

When an application makes a call to a DISCARDABLE segment that's not in memory, Windows first loads it from the .EXE file. This takes longer than if the segment were PERMANENT, but it allows an application to execute in less space.

Changing attributes

The default attributes of a code segment are MOVEABLE, DEMANDLOAD, and DISCARDABLE, but you can change this with a $C compiler directive. For example,

{$C MOVEABLE PRELOAD PERMANENT}

For details about the $C compiler directive, see Appendix B.

There is no need for a separate overlay manager. The Windows memory manager includes a full set of overlay management services, controlled through code segment attributes. These services are available to any Windows application.

The automatic data segment

Each application or library has one data segment called the "automatic data segment," which can be up to 64K in size. The automatic data segment is always pointed to by the data segment register (DS). It's divided into four sections: the local heap, the stack, the static data, and the task header.

[Figure 1: Layout of the automatic data segment, showing the task header, static data, stack, and local heap]

The first 16 bytes of the automatic data segment always contain the task header in which Windows stores various system information.

The static data area contains all global variables and typed constants declared by the application or library.

The stack is used to store local variables allocated by procedures and functions. On entry to an application, the stack segment register (SS) and the stack pointer (SP) are loaded so that SS:SP points to the first byte past the stack area in the automatic data segment. When procedures and functions are called, SP is moved down to allocate space for parameters, the return address, and local variables. When a routine returns, the process is reversed by incrementing SP to the value it had before the call. The default size of the stack area in the automatic data segment is 16K, but this can be changed with a $M compiler directive.

Unlike an application, a library has no stack area in its automatic data segment. When a call is made to a procedure or function in a DLL, the DS register points to the library's automatic data segment, but the SS:SP register pair isn't modified. Therefore, a library always uses the stack of the calling application.

The last section in the automatic data segment is the local heap. It contains all local dynamic data that was allocated using the LocalAlloc function in Windows. The default size of the local heap section is 8K, but this can be changed with a $M compiler directive.

The local heap is used by Windows for the temporary storage of things such as edit control and list box buffers. Never set the local heap to zero.
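As a minimal sketch, the directive below requests a larger stack and local heap than the defaults. The program name and both sizes are arbitrary example values, not recommendations; the stack, the local heap, and the static data must still fit together in the 64K automatic data segment.

```pascal
{ Example values only: request a 32K stack and a 16K local heap
  instead of the defaults of 16K and 8K. }
{$M 32768,16384}

program BigStack;  { hypothetical program name }

begin
end.
```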

The heap manager

Windows supports dynamic memory allocations on two different heaps: the global heap and the local heap.

The global heap is a pool of memory available to all applications. Although global memory blocks of any size can be allocated, the global heap is intended only for large memory blocks (256 bytes or more). Each global memory block carries an overhead of at least 20 bytes, and there is a system-wide limit of 8192 global memory blocks, only some of which are available to any given application.

The local heap is a pool of memory available only to your application or library. It exists in the upper part of an application's or library's data segment. The total size of local memory blocks that can be allocated on the local heap is 64K minus the size of the application's stack and static data. For this reason, the local heap is best suited for small memory blocks (256 bytes or less). The default size of the local heap is 8K, but you can change this with the $M compiler directive.

Delphi includes a heap manager which implements the New, Dispose, GetMem, and FreeMem standard procedures. The heap manager uses the global heap for all allocations. Because the global heap has a system-wide limit of 8192 memory blocks (which certainly is less than what some applications might require), Delphi's heap manager includes a segment sub-allocator algorithm to enhance performance and allow a substantially larger number of blocks to be allocated.

Note: To read more about using the heap manager in a DLL, see Chapter 12.

This is how the segment sub-allocator works: When allocating a large block, the heap manager simply allocates a global memory block using the Windows GlobalAlloc routine. When allocating a small block, the heap manager allocates a larger global memory block and then divides (sub-allocates) that block into smaller blocks as required. Allocations of small blocks reuse all available sub-allocation space before the heap manager allocates a new global memory block, which, in turn, is further sub-allocated.

The HeapLimit variable defines the threshold between small and large heap blocks. The default value is 1024 bytes. The HeapBlock variable defines the size the heap manager uses when allocating blocks to be assigned to the sub-allocator. The default value of HeapBlock is 8192 bytes. You should have no reason to change the values of HeapLimit and HeapBlock, but if you do, make sure that HeapBlock is at least four times the size of HeapLimit.
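The following sketch models the sub-allocation policy in a deliberately simplified way, to show how small requests share global blocks. It is not the heap manager's actual code. The constants HeapLimit and HeapBlock here are local stand-ins for the variables of the same names, the request sizes are made up, and the model assumes that a request of HeapLimit bytes or more counts as large, that nothing is ever freed, and that blocks carry no overhead.

```pascal
program SubAllocSketch;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

const
  HeapLimit = 1024;     { stand-in for the HeapLimit variable }
  HeapBlock = 8192;     { stand-in for the HeapBlock variable }
  RequestCount = 5;
  { Hypothetical allocation requests, in bytes. }
  Requests: array[1..RequestCount] of Word = (100, 2000, 500, 8000, 300);

{ Counts the global blocks this simplified model would allocate. }
function GlobalBlocksNeeded: Integer;
var
  I, Blocks: Integer;
  FreeInBlock: Word;    { bytes left in the current sub-allocation block }
begin
  Blocks := 0;
  FreeInBlock := 0;
  for I := 1 to RequestCount do
    if Requests[I] >= HeapLimit then
      Inc(Blocks)                     { large: one global block per request }
    else
    begin
      if Requests[I] > FreeInBlock then
      begin
        Inc(Blocks);                  { start a new sub-allocation block }
        FreeInBlock := HeapBlock;
      end;
      Dec(FreeInBlock, Requests[I]);  { carve the request out of the block }
    end;
  GlobalBlocksNeeded := Blocks;
end;

begin
  WriteLn('Global blocks used: ', GlobalBlocksNeeded);
end.
```

In this model the three small requests share a single 8192-byte block, and each of the two large requests gets a global block of its own.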

The HeapAllocFlags variable defines the attribute flags value passed to GlobalAlloc when the heap manager allocates global blocks. In a program, the default value is gmem_Moveable, and in a library the default value is gmem_Moveable + gmem_DDEShare.

Internal data formats

The next several pages discuss the internal data formats of Object Pascal.

Integer types

The format selected to represent an integer-type variable depends on its minimum and maximum bounds:

If both bounds are within the range -128..127 (Shortint), the variable is stored as a signed byte.

If both bounds are within the range 0..255 (Byte), the variable is stored as an unsigned byte.

If both bounds are within the range -32768..32767 (Smallint), the variable is stored as a signed word.

If both bounds are within the range 0..65535 (Word), the variable is stored as an unsigned word.

Otherwise, the variable is stored as a signed double word (Longint).
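The rules above can be checked with SizeOf. The subrange types in this sketch are hypothetical names chosen for illustration, one per rule.

```pascal
program IntSizes;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

type
  TTiny   = -100..100;        { within Shortint: signed byte }
  TCount  = 0..200;           { within Byte: unsigned byte }
  TOffset = -1000..1000;      { within Smallint: signed word }
  TPort   = 0..50000;         { within Word: unsigned word }
  TBig    = -100000..100000;  { otherwise: signed double word }

begin
  { Prints the storage size of each type in bytes. }
  WriteLn(SizeOf(TTiny), ' ', SizeOf(TCount), ' ', SizeOf(TOffset), ' ',
          SizeOf(TPort), ' ', SizeOf(TBig));
end.
```

Under the rules described here, the sizes reported should be 1, 1, 2, 2, and 4 bytes.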

Char types

A Char or a subrange of a Char type is stored as an unsigned byte.

Boolean types

A Boolean type is stored as a Byte, a ByteBool is stored as a Byte, a WordBool type is stored as a Word, and a LongBool is stored as a Longint.

A Boolean can assume the values 0 (False) and 1 (True). ByteBool, WordBool, and LongBool types can assume the value of 0 (False) or nonzero (True).

Enumerated types

An enumerated type is stored as an unsigned byte if the enumeration has no more than 256 values, and if the type was declared in the {$Z-} state (the default). If an enumerated type has more than 256 values, or if the type was declared in the {$Z+} state, it is stored as an unsigned word.

Floating-point types

The floating-point types (Real, Single, Double, Extended, and Comp) store the binary representations of a sign (+ or -), an exponent, and a significand. A represented number has the value

+/- significand * 2^exponent

where the significand has a single bit to the left of the binary point (that is, 0 <= significand < 2).

In the figures that follow, msb means most significant bit and lsb means least significant bit. The leftmost items are stored at the highest addresses. For example, for a real-type value, e is stored in the first byte, f in the following five bytes, and s in the most significant bit of the last byte.

The Real type

A 6-byte (48-bit) Real number is divided into three fields:

Field	s	f	e
Width in bits	1	39	8

Within the f and e fields, the msb is on the left and the lsb is on the right.

The value v of the number is determined by the following:

if 0 < e <= 255, then v = (-1)^s * 2^(e-129) * (1.f).

if e = 0, then v = 0.

The Real type can't store denormals, NaNs, and infinities. Denormals become zero when stored in a Real, and NaNs and infinities produce an overflow error if an attempt is made to store them in a Real.
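As an illustration of the Real layout, the following sketch decodes the six bytes by hand, applying the formula above. The type TReal48Bytes, the function DecodeReal48, and the two sample byte patterns are names introduced here for the example only; byte 0 is the byte at the lowest address.

```pascal
program Real48Decode;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

type
  TReal48Bytes = array[0..5] of Byte;  { byte 0 is at the lowest address }

const
  One: TReal48Bytes = ($81, 0, 0, 0, 0, 0);               { e = 129, f = 0 }
  MinusTwoAndHalf: TReal48Bytes = ($82, 0, 0, 0, 0, $A0); { s = 1, e = 130 }

{ Applies v = (-1)^s * 2^(e-129) * (1.f) to the raw bytes. }
function DecodeReal48(const B: TReal48Bytes): Double;
var
  I, E: Integer;
  Frac, V: Double;
begin
  E := B[0];                           { e occupies the first byte }
  if E = 0 then
  begin
    DecodeReal48 := 0.0;               { e = 0 always means zero }
    Exit;
  end;
  Frac := B[5] and $7F;                { top 7 bits of the 39-bit f field }
  for I := 4 downto 1 do
    Frac := Frac * 256.0 + B[I];       { append the remaining 32 bits }
  V := 1.0 + Frac / 549755813888.0;    { 1.f, dividing f by 2^39 }
  E := E - 129;                        { remove the exponent bias }
  while E > 0 do begin V := V * 2.0; Dec(E); end;
  while E < 0 do begin V := V / 2.0; Inc(E); end;
  if (B[5] and $80) <> 0 then
    V := -V;                           { s is the msb of the last byte }
  DecodeReal48 := V;
end;

begin
  WriteLn(DecodeReal48(One):0:2);
  WriteLn(DecodeReal48(MinusTwoAndHalf):0:2);
end.
```

The two sample patterns decode to 1.0 and -2.5 respectively.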

The Single type

A 4-byte (32-bit) Single number is divided into three fields:

Field	s	e	f
Width in bits	1	8	23

Within the e and f fields, the msb is on the left and the lsb is on the right.

The value v of the number is determined by the following:

if 0 < e < 255, then v = (-1)^s * 2^(e-127) * (1.f).

if e = 0 and f <> 0, then v = (-1)^s * 2^(-126) * (0.f).

if e = 0 and f = 0, then v = (-1)^s * 0.

if e = 255 and f = 0, then v = (-1)^s * Inf.

if e = 255 and f <> 0, then v is a NaN.
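The five cases above can be told apart by extracting the e and f fields from the 32-bit pattern. The sketch below does that for a Single passed as a Longint; the function name ClassifySingle and the labels it returns are hypothetical, chosen only for this example.

```pascal
program SingleClass;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

{ Classifies a 32-bit Single bit pattern by its e and f fields. }
function ClassifySingle(Bits: Longint): string;
var
  E, F: Longint;
begin
  E := (Bits shr 23) and $FF;   { 8-bit exponent field }
  F := Bits and $7FFFFF;        { 23-bit fraction field }
  if E = 255 then
  begin
    if F = 0 then
      ClassifySingle := 'infinity'
    else
      ClassifySingle := 'NaN';
  end
  else if E = 0 then
  begin
    if F = 0 then
      ClassifySingle := 'zero'
    else
      ClassifySingle := 'denormal';
  end
  else
    ClassifySingle := 'normal';
end;

begin
  WriteLn(ClassifySingle($3F800000));  { 1.0: e = 127, f = 0 }
  WriteLn(ClassifySingle($7F800000));  { e = 255, f = 0 }
  WriteLn(ClassifySingle($7FC00000));  { e = 255, f <> 0 }
  WriteLn(ClassifySingle($00000001));  { e = 0, f <> 0 }
  WriteLn(ClassifySingle(0));          { e = 0, f = 0 }
end.
```

The same approach applies to Double with an 11-bit exponent field and a 52-bit fraction field, although the bit pattern then no longer fits in a Longint.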

The Double type

An 8-byte (64-bit) Double number is divided into three fields:

Field	s	e	f
Width in bits	1	11	52

Within the e and f fields, the msb is on the left and the lsb is on the right.

The value v of the number is determined by the following:

if 0 < e < 2047, then v = (-1)^s * 2^(e-1023) * (1.f).

if e = 0 and f <> 0, then v = (-1)^s * 2^(-1022) * (0.f).

if e = 0 and f = 0, then v = (-1)^s * 0.

if e = 2047 and f = 0, then v = (-1)^s * Inf.

if e = 2047 and f <> 0, then v is a NaN.

The Extended type

A 10-byte (80-bit) Extended number is divided into four fields:

Field	s	e	i	f
Width in bits	1	15	1	63

Within the e and f fields, the msb is on the left and the lsb is on the right.

The value v of the number is determined by the following:

if 0 <= e < 32767, then v = (-1)^s * 2^(e-16383) * (i.f).

if e = 32767 and f = 0, then v = (-1)^s * Inf.

if e = 32767 and f <> 0, then v is a NaN.

The Comp type

An 8-byte (64-bit) Comp number is divided into two fields:

Field	s	d
Width in bits	1	63

Within the d field, the msb is on the left and the lsb is on the right.

The value v of the number is determined by the following:

if s = 1 and d = 0, then v is a NaN.

Otherwise, v is the two's complement 64-bit value.

Pointer types

A Pointer type is stored as two words (a double word), with the offset part in the low word and the segment part in the high word. The pointer value nil is stored as a double-word zero.

String types

A string occupies as many bytes as its maximum length plus one. The first byte contains the current dynamic length of the string, and the following bytes contain the characters of the string.

The length byte and the characters are considered unsigned values. Maximum string length is 255 characters plus a length byte (string[255]).
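A short sketch of this layout follows. It assumes that element 0 of a string variable gives access to the length byte; the variable S and its declared maximum length of 10 are arbitrary example choices.

```pascal
program StrLayout;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

var
  S: string[10];          { maximum length 10, so 11 bytes of storage }

begin
  S := 'Delphi';
  WriteLn(SizeOf(S));     { maximum length plus one length byte }
  WriteLn(Ord(S[0]));     { the length byte holds the current length }
  WriteLn(Length(S));     { same value, read through the standard function }
end.
```

The storage size stays at 11 bytes whatever the current length is; only the length byte changes as the string grows or shrinks.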

Set types

A set is a bit array where each bit indicates whether an element is in the set or not. The maximum number of elements in a set is 256, so a set never occupies more than 32 bytes. The number of bytes occupied by a particular set is calculated as

ByteSize = (Max div 8) - (Min div 8) + 1

where Min and Max are the lower and upper bounds of the base type of that set. The byte number of a specific element E is

ByteNumber = (E div 8) - (Min div 8)

and the bit number within that byte is

BitNumber = E mod 8

where E denotes the ordinal value of the element.
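The three formulas translate directly into code. The function names in this sketch are hypothetical; they only restate the formulas above and do not inspect an actual set variable.

```pascal
program SetLayout;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

{ Number of bytes occupied by a set whose base type is Min..Max. }
function SetByteSize(Min, Max: Integer): Integer;
begin
  SetByteSize := (Max div 8) - (Min div 8) + 1;
end;

{ Index of the byte that holds element E in such a set. }
function ByteNumber(E, Min: Integer): Integer;
begin
  ByteNumber := (E div 8) - (Min div 8);
end;

{ Bit position of element E within its byte. }
function BitNumber(E: Integer): Integer;
begin
  BitNumber := E mod 8;
end;

begin
  WriteLn(SetByteSize(0, 255));                     { 32 }
  WriteLn(SetByteSize(10, 20));                     { 2 }
  WriteLn(ByteNumber(17, 10), ' ', BitNumber(17));  { 1 1 }
end.
```

For a set of 10..20, for example, element 17 is held in bit 1 of the second byte (byte number 1).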

Array types

An array is stored as a contiguous sequence of variables of the component type of the array. The components with the lowest indexes are stored at the lowest memory addresses. A multidimensional array is stored with the rightmost dimension increasing first.
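To illustrate the storage order, this sketch overlays a two-dimensional array with a one-dimensional view of the same memory by means of an absolute clause. The variable names and the 3-by-4 dimensions are arbitrary example choices.

```pascal
program ArrayLayout;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

var
  A: array[1..3, 1..4] of Byte;
  Flat: array[0..11] of Byte absolute A;  { same memory, viewed linearly }
  K: Integer;

begin
  { Number the bytes in address order. }
  for K := 0 to 11 do
    Flat[K] := K;
  { Component A[I, J] sits at linear index (I - 1) * 4 + (J - 1). }
  WriteLn(A[1, 1], ' ', A[1, 2], ' ', A[2, 1], ' ', A[3, 4]);
end.
```

Stepping the rightmost index from A[1, 1] to A[1, 2] moves one byte forward, whereas stepping the leftmost index from A[1, 1] to A[2, 1] moves a whole row of four bytes forward.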

Record types

The fields of a record are stored as a contiguous sequence of variables. The first field is stored at the lowest memory address. If the record contains variant parts, then each variant starts at the same memory address.
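Because each variant starts at the same address, a variant record can be used to view the same bytes in two ways. The sketch below splits a Longint into its two words; the type and field names are hypothetical, and the result shown in the comment assumes the low-byte-first storage order of 80x86 processors.

```pascal
program VariantOverlay;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

type
  TSplitLong = record
    case Boolean of
      True:  (Whole: Longint);            { all four bytes as one value }
      False: (LowWord, HighWord: Word);   { the same four bytes as two words }
  end;

var
  V: TSplitLong;

begin
  V.Whole := $12345678;
  { On an 80x86 processor the low word comes first in memory, so this
    writes 22136 ($5678) followed by 4660 ($1234). }
  WriteLn(V.LowWord, ' ', V.HighWord);
end.
```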

File types

File types are represented as records. Typed files and untyped files occupy 128 bytes, which are laid out as follows:

type
  TFileRec = record
    Handle: Word;
    Mode: Word;
    RecSize: Word;
    Private: array [1..26] of Byte;
    UserData: array [1..16] of Byte;
    Name: array [0..79] of Char;
  end;

Text files occupy 256 bytes, which are laid out as follows:

type
  TTextBuf = array [0..127] of Char;
  TTextRec = record
    Handle: Word;
    Mode: Word;
    BufSize: Word;
    Private: Word;
    BufPos: Word;
    BufEnd: Word;
    BufPtr: ^TTextBuf;
    OpenFunc: Pointer;
    InOutFunc: Pointer;
    FlushFunc: Pointer;
    CloseFunc: Pointer;
    UserData: array [1..16] of Byte;
    Name: array [0..79] of Char;
    Buffer: TTextBuf;
  end;

Handle contains the file's handle (when the file is open).

The Mode field can assume one of the following values:

const
  fmClosed = $D7B0;
  fmInput = $D7B1;
  fmOutput = $D7B2;
  fmInOut = $D7B3;

fmClosed indicates that the file is closed. fmInput and fmOutput indicate that the file is a text file that has been reset (fmInput) or rewritten (fmOutput). fmInOut indicates that the file variable is a typed or an untyped file that has been reset or rewritten. Any other value indicates that the file variable hasn't been assigned (and thereby not initialized).

The UserData field is never accessed by Object Pascal and is free for user-written routines to store data in.

Name contains the file name, which is a sequence of characters terminated by a null character (#0).

For typed files and untyped files, RecSize contains the record length in bytes, and the Private field is unused but reserved.

For text files, BufPtr is a pointer to a buffer of BufSize bytes, BufPos is the index of the next character in the buffer to read or write, and BufEnd is a count of valid characters in the buffer. OpenFunc, InOutFunc, FlushFunc, and CloseFunc are pointers to the I/O routines that control the file. The section entitled "Text file device drivers" in Chapter 13 provides information on that subject.
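As a small sketch of how the Mode values might be interpreted, the function below maps a Mode word to a description. The function name DescribeMode and its wording are hypothetical; the constants are redeclared locally so that the example stands on its own.

```pascal
program FileModes;
{ Under 16-bit Delphi, add "uses WinCrt;" to get a window for WriteLn output. }

const
  fmClosed = $D7B0;
  fmInput  = $D7B1;
  fmOutput = $D7B2;
  fmInOut  = $D7B3;

{ Translates the Mode field of a file record into a description. }
function DescribeMode(Mode: Word): string;
begin
  case Mode of
    fmClosed: DescribeMode := 'closed';
    fmInput:  DescribeMode := 'text file, reset';
    fmOutput: DescribeMode := 'text file, rewritten';
    fmInOut:  DescribeMode := 'typed or untyped file, open';
  else
    DescribeMode := 'not assigned';   { any other value }
  end;
end;

begin
  WriteLn(DescribeMode($D7B1));
  WriteLn(DescribeMode(0));
end.
```

A program would obtain the Mode word by typecasting a file variable to the corresponding record type; where those record types are declared is not shown in this sketch.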

Procedural types

A global procedure pointer type is stored as a 32-bit pointer to the entry point of a procedure or function.

A method pointer type is stored as a 32-bit pointer to the entry point of a method, followed by a 32-bit pointer to an object.

Class types

A class type value is stored as a 32-bit pointer to an instance of the class. An instance of a class is also known as an object.

The internal data format of an object resembles that of a record. The fields of an object are stored in order of declaration as a contiguous sequence of variables. Any fields inherited from an ancestor class are stored before the new fields defined in the descendant class.

The first four-byte field of every object is a pointer to the virtual method table (VMT) of the class. There is only one VMT per class (not one per instance), but two distinct class types never share a VMT, no matter how identical they appear to be. VMTs are built automatically by the compiler, and are never directly manipulated by a program. Likewise, pointers to VMTs are automatically stored in class instances by the class type's constructor(s) and are never directly manipulated by a program.

The layout of a VMT is shown in the following table. At positive offsets, a VMT consists of a list of 32-bit method pointers, one per user-defined virtual method in the class type, in order of declaration. Each slot contains the address of the corresponding virtual method's entry point. This layout is compatible with a C++ v-table and with the OLE Object Model used by Windows Object Linking and Embedding. At negative offsets, a VMT contains a number of fields that are internal to Object Pascal's implementation. These fields are listed here for informational purposes only. An application should use the methods defined in TObject to query this information, since the layout is likely to change in future implementations of Object Pascal.

Table 16-1 Virtual Method Table layout

Offset	Type	Description
-32	Word	Near pointer to type information table (or nil).
-30	Word	Near pointer to field definition table (or nil).
-28	Word	Near pointer to method definition table (or nil).
-26	Word	Near pointer to dynamic method table (or nil).
-24	Word	Near pointer to string containing class name.
-22	Word	Instance size in bytes.
-20	Pointer	Pointer to ancestor class (or nil).
-16	Pointer	Entry point of DefaultHandler method.
-12	Pointer	Entry point of NewInstance method.
-8	Pointer	Entry point of FreeInstance method.
-4	Pointer	Entry point of Destroy destructor.
0	Pointer	Entry point of first user-defined virtual method.
4	Pointer	Entry point of second user-defined virtual method.
...	...	...

Class reference types

A class reference type value is stored as a 32-bit pointer to the virtual method table (VMT) of a class.

Direct memory access

Object Pascal implements three predefined arrays, Mem, MemW, and MemL, which are used to directly access memory. Each component of Mem is a byte, each component of MemW is a Word, and each component of MemL is a Longint.

The Mem arrays use a special syntax for indexes: two expressions of the integer type Word, separated by a colon, specify the segment base and offset of the memory location to access. Here are two examples:

Mem[Seg0040:$0049] := 7;

Data := MemW[Seg(V):Ofs(V)];

The first statement stores the value 7 in the byte at $0040:$0049. The second statement moves the Word value stored in the first 2 bytes of the variable V into the variable Data.

Direct port access

For access to the 80x86 CPU data ports, Object Pascal implements two predefined arrays, Port and PortW. Both are one-dimensional arrays, and each element represents a data port, whose port address corresponds to its index. The index type is the integer type Word. Components of the Port array are of type Byte and components of the PortW array are of type Word.

When a value is assigned to a component of Port or PortW, the value is output to the selected port. When a component of Port or PortW is referenced in an expression, its value is input from the selected port.

Use of the Port and PortW arrays is restricted to assignment and reference in expressions only; that is, components of Port and PortW can't be used as variable parameters. Also, references to the entire Port or PortW array (reference without index) aren't allowed.
