The built-in assembler

Delphi's built-in assembler allows you to write 8086/8087 and 80286/80287 assembler code directly inside your Object Pascal programs. Of course, you can still convert assembler instructions to machine code manually for use in inline statements, or link in .OBJ files that contain external procedures and functions when you want to mix Object Pascal and assembler.

The built-in assembler implements a large subset of the syntax supported by Turbo Assembler and Microsoft's Macro Assembler. The built-in assembler supports all 8086/8087 and 80286/80287 opcodes, and all but a few of Turbo Assembler's expression operators.

Except for DB, DW, and DD (define byte, word, and double word), none of Turbo Assembler's directives, such as EQU, PROC, STRUC, SEGMENT, and MACRO, are supported by the built-in assembler. Operations implemented through Turbo Assembler directives, however, are largely matched by corresponding Delphi constructs. For example, most EQU directives correspond to const, var, and type declarations in Delphi, the PROC directive corresponds to procedure and function declarations, and the STRUC directive corresponds to Delphi record types. In fact, Delphi's built-in assembler can be thought of as an assembler language compiler that uses Object Pascal syntax for all declarations.

The asm statement

The built-in assembler is accessed through asm statements. This is the syntax of an

asm statement:

asm AsmStatement [ Separator AsmStatement ] end

where AsmStatement is an assembler statement, and Separator is a semicolon, a new- line, or a Object Pascal comment.

Multiple assembler statements can be placed on one line if they are separated by semicolons. A semicolon isn't required between two assembler statements if the statements are on separate lines. A semicolon doesn't indicate that the rest of the line is a comment--comments must be written in Object Pascal style using { and } or **(***and *).

Register use

In general, the rules of register use in an asm statement are the same as those of an external procedure or function. An asm statement must preserve the BP, SP, SS, and DS registers, but can freely modify the AX, BX, CX, DX, SI, DI, ES, and Flags registers. On entry to an asm statement, BP points to the current stack frame, SP points to the top of the stack, SS contains the segment address of the stack segment, and DS contains the segment address of the data segment. Except for BP, SP, SS, and DS, an asm statement can assume nothing about register contents on entry to the statement.

Assembler statement syntax

This is the syntax of an assembler statement:

[ Label ":" ] < Prefix > [ Opcode [ Operand < "," Operand > ] ]

Label is a label identifier, Prefix is an assembler prefix opcode (operation code), Opcode is an assembler instruction opcode or directive, and Operand is an assembler expression.

Comments are allowed between assembler statements, but not within them. For example, this is allowed:

asm

MOV AX,1 {Initial value} MOV CX,100 {Count}

end;

but this is an error:

asm

MOV {Initial value} AX,1;

MOV CX, {Count} 100

end;

Labels

Labels are defined in assembler as they are in Object Pascal--by writing a label identifier and a colon before a statement. There is no limit to a label length, except only the first 32 characters of an identifier are significant in the built-in assembler. And as they are in Object Pascal, labels defined in assembler must be declared in a label declaration part in the block containing the asm statement. There is one exception to this rule: local labels.

Local labels are labels that start with an at-sign (@). Because an at-sign can't be part of a Object Pascal identifier, such local labels are automatically restricted to use within asm statements. A local label is known only within the asm statement that defines it (that is, the scope of a local label extends from the asm keyword to the end keyword of the asm statement that contains it).

Unlike a normal label, a local label doesn't have to be declared in a label declaration part before it's used.

The exact composition of a local label identifier is an at-sign (@) followed by one or more letters (A.. Z), digits (0..9), underscores ( _ ), or at-signs. As with all labels, the identifier is followed by a colon (:).

Instruction opcodes

The built-in assembler supports all 8086/8087 and 80286/80287 instruction opcodes. 8087 opcodes are available only in the {$N+} state (numeric processor enabled), 80286 opcodes are available only in the {$G+} state (80286 code generation enabled), and 80287 opcodes are available only in the {$G+,N+} state.

For a complete description of each instruction, refer to your 80x86 and 80x87 reference manuals.

RET instruction sizing

The RET instruction opcode generates a near return or a far return machine code instruction depending on the call model of the current procedure or function.

procedure NearProc; near; begin

asm

RET { Generates a near return }

end; end;

procedure FarProc; far; begin

asm

RET { Generates a far return }

end; end;

On the other hand, the RETN and RETF instructions always generate a near return and a far return, regardless of the call model of the current procedure or function.

Automatic jump sizing

Unless otherwise directed, the built-in assembler optimizes jump instructions by automatically selecting the shortest, and therefore most efficient form of a jump instruction. This automatic jump sizing applies to the unconditional jump instruction (JMP), and all conditional jump instructions, when the target is a label (not a procedure or function).

For an unconditional jump instruction (JMP), the built-in assembler generates a short jump (one byte opcode followed by a one byte displacement) if the distance to the target label is within -128 to 127 bytes; otherwise a near jump (one byte opcode followed by a two byte displacement) is generated.

For a conditional jump instruction, a short jump (1 byte opcode followed by a 1 byte displacement) is generated if the distance to the target label is within -128 to 127 bytes; otherwise, the built-in assembler generates a short jump with the inverse condition, which jumps over a near jump to the target label (5 bytes in total). For example, the assembler statement

JC Stop

where Stop isn't within reach of a short jump is converted to a machine code sequence that corresponds to this:

JNC Skip

JMP Stop Skip:

Jumps to the entry points of procedures and functions are always either near or far, but never short, and conditional jumps to procedures and functions are not allowed. You can force the built-in assembler to generate an unconditional near jump or far jump to a label by using a NEAR PTR or FAR PTR construct. For example, the assembler statements

JMP NEAR PTR Stop

JMP FAR PTR Stop

always generates a near jump and a far jump, respectively, even if Stop is a label within reach of a short jump.

Assembler directives

Delphi's built-in assembler supports three assembler directives: DB (define byte), DW (define word), and DD (define double word). They each generate data corresponding to the comma-separated operands that follow the directive.

The DB directive generates a sequence of bytes. Each operand can be a constant expression with a value between -128 and 255, or a character string of any length. Constant expressions generate one byte of code, and strings generate a sequence of bytes with values corresponding to the ASCII code of each character.

The DW directive generates a sequence of words. Each operand can be a constant expression with a value between -32,768 and 65,535, or an address expression. For an address expression, the built-in assembler generates a near pointer, that is, a word that contains the offset part of the address.

The DD directive generates a sequence of double words. Each operand can be a constant expression with a value between -2,147,483,648 and 4,294,967,295, or an address expression. For an address expression, the built-in assembler generates a far pointer, that is, a word that contains the offset part of the address, followed by a word that contains the segment part of the address.

The data generated by the DB, DW, and DD directives is always stored in the code segment, just like the code generated by other built-in assembler statements. To generate uninitialized or initialized data in the data segment, you should use Object Pascal var or const declarations.

Some examples of DB, DW, and DD directives follow:

asm

DB

DB DB

0FFH

0,99 'A'

{

{

{

One byte }

Two bytes } Ord('A') }

DB

'Hello world...',0DH,0AH

{

String followed by CR/LF }

DB

12,"Delphi"

{

Object Pascal style string }

DW

0FFFFH

{

One word }

DW

0,9999

{

Two words }

DW

'A'

{

Same as DB 'A',0 }

DW

'BA'

{

Same as DB 'A','B' }

DW

MyVar

{

Offset of MyVar }

DW

MyProc

{

Offset of MyProc }

DD

0FFFFFFFFH

{

One double-word }

DD

0,999999999

{

Two double-words }

DD

'A'

{

Same as DB 'A',0,0,0 }

DD

'DCBA'

{

Same as DB 'A','B','C','D' }

DD

MyVar

{

Pointer to MyVar }

DD

MyProc

{

Pointer to MyProc }

end;

In Turbo Assembler, when an identifier precedes a DB, DW, or DD directive, it causes the declaration of a byte, word, or double-word sized variable at the location of the directive. For example, Turbo Assembler allows the following:

ByteVar DB ?

WordVar DW ?

.

.

.

MOV AL,ByteVar

MOV BX,WordVar

The built-in assembler doesn't support such variable declarations. In Delphi, the only kind of symbol that can be defined in an built-in assembler statement is a label. All variables must be declared using Object Pascal syntax, and the preceding construct corresponds to this:

var

ByteVar: Byte;

WordVar: Word;

.

.

.

asm

MOV AL,ByteVar

MOV BX,WordVar

end;

Operands

Built-in assembler operands are expressions that consist of a combination of constants, registers, symbols, and operators. Although built-in assembler expressions are built using the same basic principles as Object Pascal expressions, there are a number of important differences, as will be explained later in this chapter.

Within operands, the following reserved words have a predefined meaning to the built-in assembler:

Table 19-1 Built-in assembler reserved words

AH

CS

LOW

SI

AL

CX

MOD

SP

AND

DH

NEAR

SS

AX

DI

NOT

ST

BH

DL

OFFSET

TBYTE

BL

DS

OR

TYPE

BP

DWORD

PTR

WORD

BX

DX

QWORD

XOR

BYTE

ES

SEG

CH

FAR

SHL

CL

HIGH

SHR

The reserved words always take precedence over user-defined identifiers. For example, the code fragment,

var

ch: Char;

.

.

.

asm

MOV CH, 1

end;

loads 1 into the CH register, not into the CH variable. To access a user-defined symbol with the same name as a reserved word, you must use the ampersand (&) identifier override operator:

asm

MOV &ch, 1

end;

It's strongly suggested that you avoid user-defined identifiers with the same names as built-in assembler reserved words, because such name confusion can easily lead to obscure and hard-to-find bugs.

Expressions

The built-in assembler evaluates all expressions as 32-bit integer values. It doesn't support floating-point and string values, except string constants.

Built-in assembler expressions are built from expression elements and operators, and each expression has an associated expression class and expression type. These concepts are explained in the following sections.

Differences between Object Pascal and Assembler expressions

The most important difference between Object Pascal expressions and built-in assembler expressions is that all built-in assembler expressions must resolve to a constant value, a value that can be computed at compile time. For example, given these declarations:

const

X = 10;

Y = 20;

var

Z: Integer;

the following is a valid built-in assembler statement:

asm

MOV Z,X+Y

end;

Because both X and Y are constants, the expression X + Y is merely a more convenient way of writing the constant 30, and the resulting instruction becomes a move immediate of the value 30 into the word-sized variable Z. But if you change X and Y to be variables,

var

X, Y: Integer;

the built-in assembler can no longer compute the value of X + Y at compile time. The correct built-in assembler construct to move the sum of X and Y into Z is this:

asm

MOV AX,X

ADD AX,Y

MOV Z,AX

end;

Another important difference between Object Pascal and built-in assembler expressions is the way variables are interpreted. In a Object Pascal expression, a reference to a variable is interpreted as the contents of the variable, but in an built-in assembler expression, a variable reference denotes the address of the variable. For example, in Object Pascal, the expression X + 4, where X is a variable, means the contents of X plus 4, whereas in the built-in assembler, it means the contents of the word at an address four bytes higher than the address of X. So, even though you're allowed to write

asm

MOV AX,X+4

end;

the code doesn't load the value of X plus 4 into AX, but it loads the value of a word stored four bytes beyond X instead. The correct way to add 4 to the contents of X is like this:

asm

MOV AX,X

ADD AX,4

end;

Expression elements

The basic elements of an expression are constants, registers, and symbols.

Constants

The built-in assembler supports two types of constants: numeric constants and string constants.

Numeric constants

Numeric constants must be integers, and their values must be between - 2,147,483,648 and 4,294,967,295.

By default, numeric constants use decimal (base 10) notation, but the built-in assembler supports binary (base 2), octal (base 8), and hexadecimal (base 16) notations as well. Binary notation is selected by writing a B after the number, octal notation is selected by writing a letter O after the number, and hexadecimal notation is selected by writing an H after the number or a $ before the number.

The B, O, and H suffixes aren't supported in Object Pascal expressions. Object Pascal expressions allow only decimal notation (the default) and hexadecimal notation (using a $ prefix).

Numeric constants must start with one of the digits 0 through 9 or a $ character; therefore when you write a hexadecimal constant using the H suffix, an extra zero in front of the number is required if the first significant digit is one of the hexadecimal digits A through F. For example, 0BAD4H and $BAD4 are hexadecimal constants, but BAD4H is an identifier because it starts with a letter and not a digit.

String constants

String constants must be enclosed in single or double quotes. Two consecutive quotes of the same type as the enclosing quotes count as only one character. Here are some examples of string constants:

'Z'

'Delphi'

"That's all folks"

'"That''s all folks," he said.'

'100' '"'

"'"

Notice in the fourth string the use of two consecutive single quotes to denote one single quote character.

String constants of any length are allowed in DB directives, and cause allocation of a sequence of bytes containing the ASCII values of the characters in the string. In all other cases, a string constant can be no longer than four characters, and denotes a numeric value which can participate in an expression. The numeric value of a string constant is calculated as

Ord(Ch1) + Ord(Ch2) shl 8 + Ord(Ch3) shl 16 + Ord(Ch4) shl 24

where Ch1 is the rightmost (last) character and Ch4 is the leftmost (first) character. If the string is shorter than four characters, the leftmost (first) character(s) are assumed to be zero. Here are some examples of string constants and their corresponding numeric values:

Table 19-2 String examples and their values

String Value

'a' 00000061H

'ba' 00006261H

'cba' 00636261H

'dcba' 64636261H

'a ' 00006120H

‘ a' 20202061H

'a' * 2 000000E2H

'a'-'A' 00000020H

not 'a' FFFFFF9EH

Registers

The following reserved symbols denote CPU registers:

Table 19-3 CPU registers

16-bit general purpose AX BX CX DX 8-bit low registers AL BL CL DL

8-bit high registers AH BH CH DH 16-bit pointer or index SP BP SI DI

16-bit segment registers CS DS SS ES 8087 register stack ST

When an operand consists solely of a register name, it's called a register operand. All registers can be used as register operands. In addition, some registers can be used in other contexts.

The base registers (BX and BP) and the index registers (SI and DI) can be written within square brackets to indicate indexing. Valid base/index register combinations are [BX], [BP], [SI], [DI], [BX+SI], [BX+DI], [BP+SI], and [BP+DI].

The segment registers (ES, CS, SS, and DS) can be used in conjunction with the colon (:) segment override operator to indicate a different segment than the one the processor selects by default.

The symbol ST denotes the topmost register on the 8087 floating-point register stack. Each of the eight floating-point registers can be referred to using ST(x), where x is a constant between 0 and 7 indicating the distance from the top of the register stack.

Symbols

The built-in assembler allows you to access almost all Object Pascal symbols in assembler expressions, including labels, constants, types, variables, procedures, and functions. In addition, the built-in assembler implements the following special symbols:

@Code @Data @Result

The @Code and @Data symbols represent the current code and data segments. They should only be used in conjunction with the SEG operator:

asm

MOV AX,SEG @Data

MOV DS,AX

end;

The @Result symbol represents the function result variable within the statement part of a function. For example, in this function,

function Sum(X, Y: Integer): Integer;

begin

Sum := X + Y;

end;

the statement that assigns a function result value to Sum uses the @Result variable if it is written in built-in assembler:

function Sum(X, Y: Integer): Integer;

begin asm

MOV AX,X

ADD AX,Y

MOV @Result,AX

end; end;

The following symbols can't be used in built-in assembler expressions:

  • Standard procedures and functions (for example, WriteLn, Chr)

  • The Mem, MemW, MemL, Port, and PortW special arrays

  • String, floating-point, and set constants

  • Procedures and functions declared with the inline directive

  • Labels that aren't declared in the current block

  • The @Result symbol outside a function

The following table summarizes the value, class, and type of the different kinds of symbols that can be used in built-in assembler expressions. (Expression classes and types are described in a following section.)

Table 19-4 Values, classes, and types of symbols

Symbol

Value

Class

Type

Label

Address of label

Memory

SHORT

Constant

Value of constant

Immediate

0

Type

0

Memory

Size of type

Field

Offset of field

Memory

Size of type

Variable

Address of variable

Memory

Size of type

Procedure

Address of procedure

Memory

NEAR or FAR

Function

Address of function

Memory

NEAR or FAR

Unit

0

Immediate

0

@Code

Code segment address

Memory

0FFF0H

@Data

Data segment address

Memory

0FFF0H

@Result

Result var offset

Memory

Size of type

Local variables (variables declared in procedures and functions) are always allocated on the stack and accessed relative to SS:BP, and the value of a local variable symbol is its signed offset from SS:BP. The assembler automatically adds [BP] in references to local variables. For example, given these declarations,

procedure Test;

var

Count: Integer;

the instruction

asm

MOV AX,Count

end;

assembles into MOV AX,[BP-2].

The built-in assembler always treats a var parameter as a 32-bit pointer, and the size of a var parameter is always 4 (the size of a 32-bit pointer). In Object Pascal, the syntax for accessing a var parameter and a value parameter is the same--this isn't the case in code you write for the built-in assembler. Because var parameters are really pointers, you have to treat them as such. So, to access the contents of a var parameter, you first have to load the 32-bit pointer and then access the location it points to. For example, if the X and Y parameters of the above function Sum were var parameters, the code would look like this:

function Sum(var X, Y: Integer): Integer;

begin asm

LES BX,X

MOV AX,ES:[BX]

LES BX,Y

ADD AX,ES:[BX]

MOV @Result,AX

end; end;

Some symbols, such as record types and variables, have a scope that can be accessed using the period (.) structure member selector operator. For example, given these declarations,

type

TPoint = record

X, Y: Integer;

end;

TRect = record

A, B: TPoint;

end; var

P: TPoint;

R: TRect;

the following constructs can be used to access fields in the P and R variables:

asm

MOV AX,P.X

MOV DX,P.Y

MOV CX,R.A.X

MOV BX,R.B.Y

end;

A type identifier can be used to construct variables on the fly. Each of the following instructions generates the same machine code, which loads the contents of ES:[DI+4] into AX:

asm

MOV AX,(TRect PTR ES:[DI]).B.X MOV AX,TRect(ES:[DI]).B.X

MOV AX,ES:TRect[DI].B.X MOV AX,TRect[ES:DI].B.X MOV AX,ES:[DI].TRect.B.X

end;

A scope is provided by type, field, and variable symbols of a record or object type. In addition, a unit identifier opens the scope of a particular unit, just like a fully qualified identifier in Object Pascal.

Expression classes

The built-in assembler divides expressions into three classes: registers, memory references, and immediate values.

An expression that consists solely of a register name is a register expression. Examples of register expressions are AX, CL, DI, and ES. Used as operands, register expressions direct the assembler to generate instructions that operate on the CPU registers.

Expressions that denote memory locations are memory references; Object Pascal's labels, variables, typed constants, procedures, and functions belong to this category.

Expressions that aren't registers and aren't associated with memory locations are immediate values; this group includes Object Pascal's untyped constants and type identifiers.

Immediate values and memory references cause different code to be generated when used as operands. For example,

const

Start = 10;

var

Count: Integer;

.

.

.

asm

MOV

AX,Start

{ MOV AX,xxxx }

MOV

BX,Count

{ MOV BX,[xxxx] }

MOV

CX,[Start]

{ MOV CX,[xxxx] }

MOV

DX,OFFSET Count

{ MOV DX,xxxx }

end;

Because Start is an immediate value, the first MOV is assembled into a move immediate instruction. The second MOV, however, is translated into a move memory instruction, as Count is a memory reference. In the third MOV, the square brackets operator is used to convert Start into a memory reference (in this case, the word at offset 10 in the data segment), and in the fourth MOV, the OFFSET operator is used to convert Count into an immediate value (the offset of Count in the data segment).

As you can see, the square brackets and the OFFSET operators complement each other. In terms of the resulting machine code, the following asm statement is identical to the first two lines of the previous asm statement:

asm

MOV AX,OFFSET [Start]

MOV BX,[OFFSET Count]

end;

Memory references and immediate values are further classified as either relocatable expressions or absolute expressions. A relocatable expression denotes a value that requires relocation at link time, and an absolute expression denotes a value that requires no such relocation. Typically, an expression that refers to a label, variable, procedure, or function is relocatable, and an expression that operates solely on constants is absolute.

Relocation is the process by which the linker assigns absolute addresses to symbols. At compile time, the compiler doesn't know the final address of a label, variable, procedure, or function; it doesn't become known until link time, when the linker assigns a specific absolute address to the symbol.

The built-in assembler allows you to carry out any operation on an absolute value, but it restricts operations on relocatable values to addition and subtraction of constants.

Expression types

Every built-in assembler expression has an associated type--or more correctly, an associated size, because the built-in assembler regards the type of an expression simply as the size of its memory location. For example, the type (size) of an Integer variable is two, because it occupies 2 bytes.

The built-in assembler performs type checking whenever possible, so in the instructions

var

QuitFlag: Boolean;

OutBufPtr: Word;

.

.

.

asm

MOV AL,QuitFlag

MOV BX,OutBufPtr

end;

the built-in assembler checks that the size of QuitFlag is one (a byte), and that the size of OutBufPtr is two (a word). An error results if the type check fails; for example, this isn't allowed:

asm

MOV DL,OutBufPtr

end;

The problem is DL is a byte-sized register and OutBufPtr is a word. The type of a memory reference can be changed through a typecast; these are correct ways of writing the previous instruction:

asm

MOV DL,BYTE PTR OutBufPtr

MOV DL,Byte(OutBufPtr)

MOV DL,OutBufPtr.Byte

end;

These MOV instructions all refer to the first (least significant) byte of the OutBufPtr

variable.

In some cases, a memory reference is untyped; that is, it has no associated type. One example is an immediate value enclosed in square brackets:

asm

MOV AL,[100H]

MOV BX,[100H]

end;

The built-in assembler permits both of these instructions, because the expression [100H] has no associated type--it just means "the contents of address 100H in the data segment," and the type can be determined from the first operand (byte for AL, word for BX). In cases where the type can't be determined from another operand, the built-in assembler requires an explicit typecast:

asm

INC BYTE PTR [100H] IMUL WORD PTR [100H]

end;

The following table summarizes the predefined type symbols that the built-in assembler provides in addition to any currently declared Object Pascal types.

Table 19-5 Predefined type symbols

Symbol Type

BYTE

1

WORD

2

DWORD

4

QWORD

8

TBYTE

10

NEAR

0FFFEH

FAR

0FFFFH

Notice in particular the NEAR and FAR pseudotypes, which are used by procedure and function symbols to indicate their call model. You can use NEAR and FAR in typecasts just like other symbols. For example, if FarProc is a FAR procedure,

procedure FarProc; far;

and if you are writing built-in assembler code in the same module as FarProc, you can use the more efficient NEAR call instruction to call it:

asm

PUSH CS

CALL NEAR PTR FarProc

end;

Expression operators

The built-in assembler provides a variety of operators, divided into 12 classes of precedence. The following table lists the built-in assembler's expression operators in decreasing order of precedence.

Built-in assembler operator precedence is different from Object Pascal. For example, in a built-in assembler expression, the AND operator has lower precedence than the plus (+) and minus (-) operators, whereas in a Object Pascal expression, it has higher precedence.

Table 19-6 Summary of built-in asssembler expression operators

& Identifier override operator

(), [],HIGH, LOW Structure member selector

+, - Unary operators

: Segment override operator

OFFSET, SEG, TYPE, PTR,

*, /, MOD, SHL, SHR,+, -

Binary addition/ subtraction operators

NOT, AND, OR, XOR Bitwise operators

Table 19-7 Definitions of built-in assembler expression operators

Operator Description

& Identifier override. The identifier immediately following the ampersand is treated as a user-defined symbol, even if the spelling is the same as a built-in assembler reserved symbol.

(...) Subexpression. Expressions within parentheses are evaluated completely prior to being treated as a single expression element. Another expression can optionally precede the expression within the parentheses; the result in this case becomes the sum of the values of the two expressions, with the type of the first expression.

[...] Memory reference. The expression within brackets is evaluated completely prior to being treated as a single expression element. The expression within brackets can be combined with the BX, BP, SI, or DI registers using the plus (+) operator, to indicate CPU register indexing. Another expression can optionally precede the expression within the brackets; the result in this case becomes the sum of the values of the two expressions, with the type of the first expression. The result is always a memory reference.

. Structure member selector. The result is the sum of the expression before the period and the expression after the period, with the type of the expression after the period. Symbols belonging to the scope identified by the expression before the period can be accessed in the expression after the period.

HIGH Returns the high-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.

LOW Returns the low-order 8 bits of the word-sized expression following the operator. The expression must be an absolute immediate value.

+ Unary plus. Returns the expression following the plus with no changes. The expression must be an absolute immediate value.

- Unary minus. Returns the negated value of the expression following the minus. The expression must be an absolute immediate value.

: Segment override. Instructs the assembler that the expression after the colon belongs to the segment given by the segment register name (CS, DS, SS, or ES) before the colon. The result is a memory reference with the value of the expression after the colon. When a segment override is used in an instruction operand, the instruction will be prefixed by an appropriate segment override prefix instruction to ensure that the indicated segment is selected.

OFFSET Returns the offset part (low-order word) of the expression following the operator. The result is an immediate value.

SEG Returns the segment part (high-order word) of the expression following the operator.

The result is an immediate value.

TYPE Returns the type (size in bytes) of the expression following the operator. The type of an immediate value is 0.

PTR Typecast operator. The result is a memory reference with the value of the expression following the operator and the type of the expression in front of the operator.

* Multiplication. Both expressions must be absolute immediate values, and the result is an absolute immediate value.

/ Integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.

MOD Remainder after integer division. Both expressions must be absolute immediate values, and the result is an absolute immediate value.

SHL Logical shift left. Both expressions must be absolute immediate values, and the result is an absolute immediate value.

SHR Logical shift right. Both expressions must be absolute immediate values, and the result

is an absolute immediate value.

+ Addition. The expressions can be immediate values or memory references, but only one of the expressions can be a relocatable value. If one of the expressions is a relocatable value, the result is also a relocatable value. If either of the expressions are memory references, the result is also a memory reference.

- Subtraction. The first expression can have any class, but the second expression must be an absolute immediate value. The result has the same class as the first expression.

NOT Bitwise negation. The expression must be an absolute immediate value, and the result is an absolute immediate value.

AND Bitwise AND. Both expressions must be absolute immediate values, and the result is an absolute immediate value.

OR Bitwise OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.

XOR Bitwise exclusive OR. Both expressions must be absolute immediate values, and the result is an absolute immediate value.

Assembler procedures and functions

So far, every asm... end construct you've seen has been a statement within a normal begin... end statement part. Delphi's assembler directive allows you to write complete procedures and functions in built-in assembler, without the need for a begin... end statement part. Here's an example of an assembler function:

function LongMul(X, Y: Integer): Longint; assembler; asm

MOV AX,X

IMUL Y

end;

The assembler directive causes Delphi to perform a number of code generation optimizations:

  • The compiler doesn't generate code to copy value parameters into

    local variables. This affects all string-type value parameters, and other value parameters whose size isn't 1, 2, or 4 bytes. Within the procedure or function, such parameters must be treated as if they were var parameters.

  • The compiler doesn't allocate a function result variable, and a

    reference to the @Result symbol is an error. String functions, however, are an exception to this rule--they always have a @Result pointer that is allocated by the caller.

  • The compiler generates no stack frame for procedures and functions

    that aren't nested and have no parameters and no local variables.

  • The automatically generated entry and exit code for an assembler

    procedure or function looks like this:

PUSH

MOV SUB

.

BP

BP,SP

SP,Locals

;Present if Locals <> 0 or Params <> 0

;Present if Locals <> 0 or Params <> 0

;Present if Locals <> 0

.
.

MOV

SP,BP

;Present if Locals <> 0

POP

BP

;Present if Locals <> 0 or Params <> 0

RET

Params

;Always present

  • Locals is the size of the local variables, and Params is the

    size of the parameters. If both Locals and Params are zero, there is no entry code, and the exit code consists simply of a RET instruction.

Functions using the assembler directive must return their results as follows:

  • Ordinal-type function results (integer, boolean, enumerated types,

    and Char) are returned in AL (8-bit values), AX (16-bit values), or DX:AX (32-bit values).

  • Real-type function results (type Real) are returned in DX:BX:AX.

  • 8087-type function results (type Single, Double, Extended, and

    Comp) are returned in ST(0) on the 8087 coprocessor's register stack.

  • Pointer-type function results are returned in DX:AX.

  • String-type function results are returned in the temporary location

    pointed to by the @Result function result symbol.

The assembler directive is comparable to the external directive, and assembler procedures and functions must obey the same rules as external procedures and functions. The following examples demonstrate some of the differences between asm statements in Object Pascal functions and assembler functions. The first example uses an asm statement in a Object Pascal function to convert a string to upper case. Notice that the value parameter Str in this case refers to a local variable, because the compiler automatically generates entry code that copies the actual parameter into local storage.

function UpperCase(Str: String): String;

begin asm

CLD

LEA SI,Str

LES DI,@Result

SEGSS LODSB STOSB

XOR AH,AH

XCHG AX,CX

JCXZ @3

@1:

SEGSS

LODSB

CMP

AL,'a'

JB

@2

CMP

AL,'z'

JA

@2

SUB

AL,20H

@2:

STOSB

LOOP @1

@3:

end; end;

The second example is an assembler version of the UpperCase function. In this case,

Str isn't copied into local storage, and the function must treat Str as a var parameter.

function UpperCase(Str: String): String; assembler; asm

PUSH DS CLD

LDS SI,Str

LES DI,@Result LODSB

STOSB

XOR AH,AH

XCHG AX,CX

JCXZ @3

@1:

LODSB

CMP AL,'a'

JB @2

CMP AL,'z'

JA @2

SUB AL,20H

@2:

STOSB LOOP @1

@3:

POP DS

end;

C h a p t e r