Readability

Let us first look at two different versions of the same Routine:

Version 1:

ShowEquate:
mov ebx, eax
mov edi ShwEquHexa
L0:
cmp esi edx
jae L1>
movsb
jmp L0<
L1:
mov eax ' ='
stosd
mov eax ' '
mov ecx 3
rep stosd
push edi
std
mov ecx, 9
L1:
cmp ecx 5
jne L2>
mov al '_'
stosb
L2:
cmp ecx 1
jne L2>
mov al '_'
stosb
L2:
mov al bl
and al 0F
add al, '0'
cmp al '9'
jbe L2>
add al 7
L2:
stosb
shr ebx, 4
loop L1<
cld
pop edi
inc edi
mov eax ' h'
stosd
mov al 0
stosb
push 01000
push ShwEquTitle
push ShwEquHexa
push 0
call 'USER32.MessageBoxA'
ret

Version 2:

ShowEquate:
    mov ebx eax
    mov edi ShowEquateHexa

    While esi < edx
        movsb
    End_While

    mov eax ' =' | stosd | mov eax ' ', ecx 3 | rep stosd

    push edi
        std
            mov ecx, 9
L1:         If ecx = 5
                mov al '_' | stosb
            End_If
            If ecx = 1
                mov al '_' | stosb
            End_If
            mov al bl | and al 0F | add al, '0' | On al a '9', add al 7
            stosb | shr ebx, 4 | loop L1<
        cld
    pop edi

    inc edi
    mov eax ' h' | stosd | mov al 0 | stosb

    call 'USER32.MessageBoxA' 0, ShowEquateHexa, ShowEquateTitle, &MB_SYSTEMMODAL
    ret

Indentations

Several x86 Instructions should be considered as 'paired', because using one without the other is particularly dangerous. This is especially the case for the PUSH / POP and the STD / CLD pairs. Indenting the Source between the two Instructions of a pair greatly eases maintenance and prevents many errors, such as jumping out of the indented chunk of Instructions.

Indenting HLL-like statements is, of course, an evident 'must-have' nowadays.

With a fixed 16/8 font, like the ones in the RosAsm Source Editor, the right indentation is 4 spaces. A 2-space indentation is too short to look good, and an 8-space indentation may run past the screen width when, for example, many levels of HLL Cases have to be indented.

This 4-space indentation also fits perfectly with the use of RosAsm Local Labels (L0:...), because they leave just one space between the colon and the first character of the Statement.
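As an illustration of the paired Instructions rule, here is a minimal sketch in RosAsm syntax. The Routine name 'FillBackward' and its register conventions (edi pointing to the last Byte to fill, ecx holding a non-zero maximum count, edx holding the lower limit) are assumptions made up for this example. The point is only that the early exit lands on a Label that is still inside the PUSH / POP pair, so CLD and POP are always executed:

```asm
; Hypothetical Routine, for illustration only.
; In: edi = last Byte to fill, ecx = max count (not zero), edx = lower limit.
FillBackward:
    push edi
        std
            mov al ' '
L1:         stosb
            cmp edi edx | je L8>    ; early exit, but still inside the pairs
            loop L1<
L8:     cld
    pop edi
    ret
```

With this layout, a jump whose target sits left of the 'push' column would immediately look suspicious, because it would leave the Stack and the Direction Flag in the wrong state.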

Blank Lines

You may consider blank lines as a kind of desirable vertical indentation of your Sources.

Having the whole Source of a given Routine visible on the screen makes reading so much easier that spending lines on blanks might seem costly. It really is not: blank lines help a lot in showing the overall organization, whereas losing a couple of screen lines is not a problem. Anyway, fitting a whole Routine on one single screen is not a good criterion for organizing a Source (only the logic of its construction is).

Multi-Instruction Lines

Writing several instructions on one single Line increases readability in two ways:

This feature allows having more Source Instructions on one screen, so it is much easier to take an overall look at what a Routine does, with less painful scrolling.

This feature decreases the readability of the flow of instructions grouped on the same line and, at the same time, increases the readability of the overall action(s). No! I am not joking: when you read a Source, you cannot, and need not, be fully aware of all the tiny details of each little instruction. Sometimes, a set of Asm instructions may be considered as a sentence (and the instructions inside it as the words of that sentence). For example, in:

mov eax ' h' | stosd | mov al 0 | stosb

We do not care about what exactly is going on inside this line. It just finishes the writing (it writes the ending 'h' char and the terminating zero). We could as well replace it with a Macro that we would name ''CloseHexa'', and the information retrieved at reading time would be about the same. So, decreasing the readability of instructions of little interest greatly increases the readability of the important statements and of the overall code organization.

There is no light when there is no shadow.
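The ''CloseHexa'' Macro mentioned above could be sketched as follows. This is only a sketch: it assumes the same bracketed Macro Declaration form as the other Macros shown in this page, and the Macro does not exist in any standard set.

```asm
; Hypothetical Macro: writes the ending ' h' and the terminating zero at edi.
[CloseHexa | mov eax ' h' | stosd | mov al 0 | stosb]

    ; ... and, where the four instructions used to be:
    CloseHexa
```

The reader gets the same information from the single word as from the four instructions, which is the whole point of the paragraph.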

HLL Macros

HLL Macros decrease the running speed of an Application very little, if at all. There is no reason to be afraid of not doing true Low Level Assembly when using them. Needless to say, they dramatically increase readability.

Multi-Instruction Macros

What I call a Multi-Instruction Macro is, for example:

[mov | mov #1 #2 | #+2]

mov eax D$Value, ecx 3, edx 0 | div ecx | mov D$Value eax

The details of the mov instructions are of zero interest to the reader. Every Asm programmer knows perfectly well how to make an integer division. So the mov edx 0, for example, falls under the same logic of desirable decrease of readability that I describe in the Multi-Instruction Lines paragraph above. This whole line is nothing more than a full English sentence that would say ''Divide this Value by 3''. You do not need to bring the low-level details to the surface.
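For comparison, this is what the single line stands for once the multi-mov Macro is unfolded, one instruction per line. Nothing is added here; the comments only recall the usual behaviour of the unsigned div instruction:

```asm
    mov eax D$Value
    mov ecx 3
    mov edx 0           ; div divides edx:eax, so edx must be cleared first
    div ecx             ; eax = quotient, edx = remainder
    mov D$Value eax
```

Five lines to say ''Divide this Value by 3'': the one-line form hides exactly the details the reader does not need.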

Jumps and Labels positions

In:

L1: If ecx = 5
        mov al '_' | stosb
    End_If
    If ecx = 1
        mov al '_' | stosb
    End_If
    mov al bl | and al 0F | add al, '0' | On al a '9', add al 7
    stosb | shr ebx, 4 | loop L1<

... having the jump instructions at the end of the lines makes it much easier to grasp the Low Level organization of the Code. Local Label Declarations should never be anywhere other than in the first column. The gain in readability here is very important, because bad jumps are an easy error to make in Asm, and such errors may be difficult to locate.

Standard Registers use

The x86 Registers are now a bit less specialized than they were in the good old days. Nevertheless, each one still has its own preferred usage:

eax : Scratch Register (most general purpose)

ebx : Base Register

ecx : Counter Register

... and so on.

Even when you could use any Register to perform a non-specific operation, you will greatly increase readability by using the standard one. For example, any count of Items should preferably be stored in ecx, even if it is not used together with a loop or rep Instruction. Later, when re-reading this ecx in your Source, you will more easily guess that it is a Counter than if the same count were stored in, say, edi. In the same manner, esi and edi should be preferred as Source and Destination Pointer Registers, even if you do not intend to perform any String Instruction, and so on.

This is of course not an obligation, but, if you can do it, why not do it the standard way?

Registers vs Variables

The only clear thing about when to use a Variable instead of a Register is that Variables produce much more readable Sources than Registers, even when the Registers are used in the most standard way. Conversely, Registers produce faster Code than Variables, but this is most often not a valid reason for overusing Registers. Keeping a Register alive across several Routines and Procedures may quickly become a true hell, whereas a nicely named Variable, at a rather low speed cost, contributes a lot to Source readability over long scopes, and is much easier to maintain and to develop.

The general rule I try to apply in my own writing is that the use of a Register should have a very short scope, that is, inside one Routine or Procedure. When this is not desirable (for example, with several Routines operating on the same Data Area(s)), I do my best to use the Registers in the standard way: usually esi and edi for Source and Destination, and ecx for Counter.

I say 'I' here because there is no conventional formulation of a fixed rule, and the choices depend on several things: the context, personal taste, experience. Nevertheless, abuse of Registers and abuse of Variables are evidently both wrong: the first from the readability point of view, the second from the Code efficiency point of view.

Namings

We should avoid C-like Names such as 'ptr', 'sz', and so on. Instead, take some time to write full talking names, which have only advantages:

Easier to read.

Easier to paste from Source to Source (far fewer naming conflicts).

Much less need for Comments. With full talking names, a Source may be considered Auto-Commented.

How to write full talking Names is not only a matter of personal taste. The typography used may contribute (or not) to readability:

mov ecx D$totalnumberoflines

... is not very readable, because the components of the talking name are not visible, and it is worse in upper case:

mov ecx D$TOTALNUMBEROFLINES

Much more readable is:

mov ecx D$TotalNumberOfLines

We could also think of separating the components of the full talking name with underscores:

mov ecx D$_total_number_of_lines

... But this way is bad too, because, at first sight, our eyes are unable to tell apart the different components of the whole instruction. However, underscores may be very useful for writing numbers:

mov eax 0_FFFF_0FFF

mov eax 1_645

mov eax 00_11010000_11111111_00011010_00000000

Do not use names that talk without saying anything:

Third_Routine: ; Yes !!! I actually saw this in a source!!!

Avoiding optimizations

Both Size and speed optimization tips and tricks badly hurt readability. As these tricks are most often of no use on current Computers, simply write what you mean. If you want to set ebx to zero, do not write XOR ebx ebx, this is ridiculous. Write mov ebx 0. Or, even better, if that zero is not really a value from the human point of view, prefer the full talking forms: mov ebx &FALSE | mov ebx &NULL...

In some (rare) cases, optimization tricks may be appropriate. For example, for dividing / multiplying by a 2^n Value, the Shift Instructions are much simpler to write, much faster to run, much shorter in Code Size, require no additional Register (a lot of good points for so simple a trick...), and are, finally, no more difficult to read.
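A minimal sketch of the power-of-two case, for unsigned values only (signed values would need sar, which does not round the same way as idiv for negative numbers). In each pair below, the two forms are alternatives, not a sequence to run one after the other:

```asm
; Multiplying eax by 8 (2^3):
    mov ecx 8 | mul ecx                 ; needs ecx, and overwrites edx
    shl eax 3                           ; same low 32 bits, no other Register touched

; Dividing eax by 4 (2^2):
    mov edx 0 | mov ecx 4 | div ecx     ; needs ecx and edx
    shr eax 2                           ; same quotient, no other Register touched
```

Note that the shift form does not give the remainder, which div leaves in edx.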

Comments

Do not comment the instructions. Instead, comment what the program is doing.

Writing comments of little use after instructions should be avoided. If you are not able to understand, when re-reading, what an instruction does in a given context, you can, of course, comment it, but, as a general rule, the better way is to write full comment lines at the top of the Routine, or at the top of a Chunk of Instructions. This is much easier to maintain, too.

It is most often better to comment a Block of Instructions than to comment a single Instruction, as instructions in the program flow are more like words in a sentence than stand-alone sentences. When we do not understand the action of one particular Instruction, most often, this is because we do not understand the overall purpose of the group of Instructions it belongs to.
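The difference between the two ways of commenting can be sketched like this. The Variable names 'RemainingWidth' and 'ColumnWidth' are hypothetical, chosen only for the example:

```asm
; Of little use: each comment only repeats its instruction.
    mov eax D$RemainingWidth    ; eax = RemainingWidth
    mov ecx 3                   ; ecx = 3
    mov edx 0                   ; edx = 0
    div ecx                     ; divide by ecx
    mov D$ColumnWidth eax       ; store eax

; More useful: one comment line, on top of the Chunk, says what it is for.

; Share the remaining width between the 3 columns:
    mov eax D$RemainingWidth | mov ecx 3 | mov edx 0 | div ecx
    mov D$ColumnWidth eax
```

The second form also survives small changes to the instructions without the comment becoming wrong.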

Comment your critical Symbols at the Declarations

Often times, when reading an old Source of yours, or someone else's Sources, you will not understand a Chunk of Code because you do not understand what the Symbols are or hold. What do you do in such cases? Simple: Right-Click on the mysterious Symbol... Unfortunately, there is no Comment at the Declaration! Too bad!... At the very least, each Table Declaration should have a clear Comment saying what it is supposed to hold and how the Data are organized inside.
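Here is a sketch of what such a commented Declaration might look like. Everything in it is hypothetical (names, sizes, record layout), and it assumes the usual bracketed Data Declaration form, with '#' taken as the repeat count:

```asm
; LinesTable: one record per Source line, 2 dWords each:
;   dWord 0: Pointer to the first char of the line
;   dWord 1: Length of the line, in Bytes
; The Table ends with a record holding two zeros.
; TotalNumberOfLines holds the count of records actually filled.

[LinesTable: D$ 0 #2000]
[TotalNumberOfLines: D$ 0]
```

With such a header, a Right-Click on 'LinesTable' anywhere in the Source answers both questions at once: what the Table holds and how it is organized.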

Re-read your Comments from time to time

Inside our own Sources, once they are running fine, we all tend not to re-read our Comments. Very often, after months of development and successive modifications, the comments are no longer accurate. This is particularly deadly for readers, and never forget that, one year later, you will be nothing more than an ordinary reader of your own writings... Worse than no comment at all: The wrong comment!!!
