Some of this section should be revision but it goes on to look at some issues you probably haven't considered before. It forms the foundation for the next sections.
By now you should be familiar with the ‘three box’ model of computers.
This module looks beyond that simple model, in which a single processor controls a bus to load and store data from/to the other ‘boxes’. This is an adequate model but it is not always the whole story. Before looking at alternative structures, here is a brief glimpse at some logical organisations.
First, a little revision. The memory lives in an address space. The size of the address space is set by the architecture and is supposed to be ‘large enough’ for purpose. With an N-bit address, there are 2^N addressable locations. This means that a 32-bit architecture can address 2^32 = 4,294,967,296 locations (4 Gi, or “four gig” for short) and a 64-bit architecture can address 2^64 = 18,446,744,073,709,551,616 locations (which is 16 Ei, about 18E18 or “a lot”).
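Those figures are easy to verify; a minimal sketch in C (2^64 will not fit in a 64-bit variable, so it is printed as a constant):

#include <stdio.h>

int main(void)
{
    /* An N-bit address gives 2^N addressable locations. */
    unsigned long long locs32 = 1ULL << 32;
    printf("32-bit: %llu locations\n", locs32);          /* 4294967296 */
    printf("64-bit: 18446744073709551616 locations\n");  /* 2^64       */
    return 0;
}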
Until fairly recently it was not practical to fill a 32-bit address space with actual memory in most cases. Now it has become normal and - because there is a demand - there has been a migration to 64-bit architectures, at least in many computers.
Why go to 64 bits rather than, say 40 bits?
That's a good question. There's no overwhelming reason and, historically, the number of address bits has not always been a power of 2. (For example, the 8086/8088 - the great-grandfather of most PC processors - had a 20-bit address bus which gave it a 1 MiB address space: adequate/affordable in its time. The first ARM processors, which appeared slightly later, had a 26-bit (64 MiB) address space.) However, there are some excuses for sticking to powers of two.
Puzzle
Why might there be a disadvantage in having (for example) a 48-bit word/address space (i.e. 6 bytes per word)?
Hint: consider an array of pointers ...
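To make the hint concrete, here is the address arithmetic involved, sketched in C (the function name is illustrative):

#include <stdio.h>
#include <stdint.h>

/* Byte address of element i of an array starting at 'base'. */
static uintptr_t element_addr(uintptr_t base, uintptr_t i, uintptr_t size)
{
    return base + i * size; /* size 8 (a power of two): just a shift, i << 3 */
}                           /* size 6: needs a genuine multiplication        */

int main(void)
{
    printf("%#lx\n", (unsigned long)element_addr(0x1000, 5, 8)); /* 0x1028 */
    printf("%#lx\n", (unsigned long)element_addr(0x1000, 5, 6)); /* 0x101e */
    return 0;
}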
What is addressed?
Almost (but not quite) always an 8-bit byte. (Those who have done the ‘Stump’ lab. will have seen this need not be the case.) Addressing each bit separately (it has been done) is not often useful and further limits the overall memory size. Traditionally a ‘character’ is a useful minimum unit size and (for most Europeans/Americans) 7 bits was plenty, hence codes like ASCII. 8 bits was later ‘tidier’ - and 8 is a power of 2. This is the ‘octet’ of networking terminology.
Nowadays the outlook is a bit more global but ‘byte addressing’ is
strongly established.
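In C the byte is both the unit of addressing and the unit of sizeof; a quick check:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_BIT is the number of bits in the addressable unit. */
    printf("bits per byte: %d\n", CHAR_BIT);     /* 8 on almost all machines */
    printf("sizeof(char): %zu\n", sizeof(char)); /* 1, by definition         */
    return 0;
}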
Speculation
If you could start again with a ‘clean sheet’ computer design but using contemporary technology, what word size would you choose, and why?
Note: there may be no ‘perfect’ answer to this.
Exploration: character encodings
Most numeric variables need more than the 2^8 = 256 values a byte provides. Thus they comprise several bytes. Being sensible, these bytes will be stored at adjacent addresses. Moving these one-at-a-time would be slow so it is usual to have a wider data bus to move the bytes in parallel.
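For example, a sketch in C (the addresses printed will vary from run to run, but they will always be consecutive):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 0x11223344;                /* a four-byte quantity */
    unsigned char *p = (unsigned char *)&x; /* view it byte by byte */

    /* The four bytes of x occupy four adjacent addresses. */
    for (int i = 0; i < 4; i++)
        printf("%p : 0x%02x\n", (void *)(p + i), p[i]);
    return 0;
}

(The order in which 0x11 ... 0x44 appear depends on the byte order, discussed below.)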
A word is the ‘natural’ size of data in a particular
architecture: word sizes vary between architectures. When addressing
a word one representative address is used: typically the lowest.
This organisation has implications for the way variable addresses are
assigned. Unless a correctly aligned address is used
there will be problems.
For example, think of attempting a 64-bit word access to address 0000_0000_0000_0013 in the figure above. What might happen? Possibilities include:

- the processor raises an exception (an ‘alignment fault’);
- the low-order address bits are ignored, so the aligned word at 0000_0000_0000_0010 is accessed instead;
- the hardware quietly splits the access into two bus transfers and reassembles the bytes.

All of these have happened in the past. Note that, even if the functionally expected answer is achieved, the performance penalty is severe.
The ‘bottom line’ is that variables should be aligned on addresses which are multiples of their size.
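Compilers enforce this automatically when laying out variables and structures; a small C11 sketch showing the alignment requirement of a type and the padding it causes:

#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

struct example {
    char     c;   /* 1 byte                                  */
                  /* 7 bytes of padding are inserted here... */
    uint64_t x;   /* ...so that x sits on an 8-byte boundary */
};

int main(void)
{
    printf("alignof(uint64_t):      %zu\n", alignof(uint64_t));      /* typically 8  */
    printf("sizeof(struct example): %zu\n", sizeof(struct example)); /* typically 16 */
    return 0;
}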
The applicability of alignment can legitimately be extended further. Although a processor may (conceptually) work with 64-bit words, the quantities ‘behind the scenes’ may be larger.

A clear example is a cache line [[link]], which may be (say) eight words in length. It makes sense to make best use of that cache line and - in particular - it makes sense to ‘start’ at the beginning of the cache line. Placing data structures (either at compile time or dynamically, with an aligned allocator) so that they begin on a cache line boundary can therefore pay off.
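A minimal sketch in C11, assuming a 64-byte cache line (the common size today, though it is architecture-specific):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Request 1 KiB starting exactly on a (presumed) 64-byte cache line.
       aligned_alloc (C11) requires the size to be a multiple of the alignment. */
    void *buf = aligned_alloc(64, 1024);
    if (buf == NULL)
        return 1;

    printf("buffer starts at %p\n", buf); /* the low 6 bits will be zero */
    free(buf);
    return 0;
}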
Code entry points (i.e. branch targets) may be aligned in a similar way. This allows more object code bytes/words to be fetched in one operation if the bus width [[link]] permits it.
The problem is even more acute with instruction sets - the x86 is a good example - which have variable-length instructions. Starting a section of code at an aligned address means the first fetch will capture the most useful bytes.
Sometimes code - particularly loop entry points - is padded out for speed. The saving over the execution time of a particular loop will usually more than make up for the NOPs, especially as NOPs with different-length op. codes can be used to reduce the instruction count.
In this (contrived) example, fetching two 8-byte words at a time, how many:
- bus operations are needed to cache the highlighted routine?
- complete instructions are fetched when the procedure is first entered?
- complete instructions are fetched when the loop branch is first taken?
- instruction fetch bus cycles are needed to execute the routine, if it loops 5 times?
In a superscalar processor, which can execute more than one instruction in parallel, being able to fetch several instructions quickly can enable increased performance.
Don't just take my word for it! Here is an x86-64 code fragment picked fairly much at random:
...
.text:00404654  83 f8 04                       cmp    $0x4,%eax
.text:00404657  0f 86 42 fe ff ff              jbe    0x0040449f
.text:0040465d  eb 96                          jmp    0x004045f5
.text:0040465f  90                             nop
.text:00404660  41 bb 1b 00 00 00              mov    $0x1b,%r11d
.text:00404666  66 2e 0f 1f 84 00 00 00 00 00  nopw   %cs:0x0(%rax,%rax,1)
.text:00404670  45 88 19                       mov    %r11b,(%r9)
.text:00404673  49 83 c2 01                    add    $0x1,%r10
.text:00404677  49 83 c1 01                    add    $0x1,%r9
.text:0040467b  31 c0                          xor    %eax,%eax
.text:0040467d  eb d1                          jmp    0x00404650
.text:0040467f  90                             nop
.text:00404680  41 bb 20 00 00 00              mov    $0x20,%r11d
.text:00404686  eb e8                          jmp    0x00404670
.text:00404688  0f 1f 84 00 00 00 00 00        nopl   0x0(%rax,%rax,1)
.text:00404690  b8 06 00 00 00                 mov    $0x6,%eax
.text:00404695  eb b9                          jmp    0x00404650
.text:00404697  66 0f 1f 84 00 00 00 00 00     nopw   0x0(%rax,%rax,1)
.text:004046a0  44 8d 58 d0                    lea    -0x30(%rax),%r11d
.text:004046a4  b8 02 00 00 00                 mov    $0x2,%eax
.text:004046a9  eb a5                          jmp    0x00404650
.text:004046ab  0f 1f 44 00 00                 nopl   0x0(%rax,%rax,1)
.text:004046b0  41 bb 7f 00 00 00              mov    $0x7f,%r11d
.text:004046b6  eb b8                          jmp    0x00404670
.text:004046b8  0f 1f 84 00 00 00 00 00        nopl   0x0(%rax,%rax,1)
...

A fragment of the Linux ‘ls’ utility, disassembled with disassembler.io.
Activity: you can try this yourself (fairly) easily with your own code sample(s).
If they're not already familiar, don't worry about the specific instruction mnemonics. Note:

- the various NOPs (nop, nopw, nopl) do nothing except occupy space;
- they come in several lengths (from 1 to 10 bytes here), so any gap can be filled with very few instructions;
- each one pads the code so that the instruction which follows it (at 0x00404660, 0x00404670, 0x00404680, 0x00404690, 0x004046a0, 0x004046b0) - a branch target - starts on a 16-byte boundary.
Nerdy puzzle
What is the effect of the instruction at 0040467b and why is it done that way?
How should bytes be numbered within a word?
Another debated question! This is known as ‘byte order’ or
‘endianness’.
Here's a very
short video explanation.
The earlier figure implied a little endian arrangement where
the least significant byte is at the (numerically) lowest address.
The arrangement is arbitrary. Most popular processors favour ‘little endian’ arrangements, although ‘big endian’ persists as the conventional byte order in networking.
Most of the time the byte order is irrelevant. It matters when a stored quantity is accessed with different-sized operations. For example, if a word is written as bytes to a file with one convention and read into a word on a machine with the other convention then the byte order will be swapped and confusion will result.
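You can observe a machine's byte order with a few lines of C:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t word = 0x01020304;
    unsigned char bytes[4];

    memcpy(bytes, &word, 4); /* view the word as individual bytes */

    /* Little endian stores the least significant byte (0x04) first. */
    printf("%s endian\n", bytes[0] == 0x04 ? "little" : "big");
    return 0;
}

Where byte order matters across systems - in files and network protocols - the usual remedy is to fix one convention and convert explicitly, e.g. with htonl()/ntohl() from <arpa/inet.h> on POSIX systems.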
Many modern machines allow the hardware endianness to be programmed in software.
Aside: note that European languages write words (and the letters in them) left-to-right, but numbers are, in effect, read right-to-left: in “1234” the significance of the ‘1’ cannot be determined until the other digits have been read.
They are usually called “Arabic numerals”. Actually the system was invented in India!
Next: Buses.