Some of this section should be revision but it goes on to look at some issues you probably haven't considered before. It forms the foundation for the next sections.
By now you should be familiar with the ‘three box’ model of computers.
This module looks beyond that simple model, in which a single processor controls a bus to load and store data from/to the other ‘boxes’. This is an adequate model but it is not always the whole story. Before looking at alternative structures, here is a brief glimpse at some logical organisations.
First, a little revision. The memory lives in an address space. The size of the address space is set by the architecture and is supposed to be ‘large enough’ for purpose. With an N-bit address, there are 2^N addressable locations. This means that a 32-bit architecture can address 2^32 = 4,294,967,296 locations (4 Gi, or “four gig” for short) and a 64-bit architecture can address 2^64 = 18,446,744,073,709,551,616 locations (which is 16 Ei, about 18E18 or “a lot”).
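Those figures are easy to verify; a minimal sketch in C (2^64 will not fit in a 64-bit variable, so it is printed as a constant):

#include <stdio.h>

int main(void)
{
    /* An N-bit address gives 2^N addressable locations. */
    unsigned long long locs32 = 1ULL << 32;
    printf("32-bit: %llu locations\n", locs32);          /* 4294967296 */
    printf("64-bit: 18446744073709551616 locations\n");  /* 2^64       */
    return 0;
}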
Until fairly recently it was not practical to fill a 32-bit address space with actual memory in most cases. Now it has become normal and - because there is a demand - there has been a migration to 64-bit architectures, at least in many computers.
Why go to 64 bits rather than, say 40 bits?
That's a good question. There's no overwhelming reason and, historically, the number of address bits has not always been a power of 2. (For example, the 8086/8088 - the great-grandfather of most PC processors - had a 20-bit address bus which gave it a 1 MiB address space: adequate/affordable in its time. The first ARM processors, which appeared slightly later, had a 26-bit (64 MiB) address space.) However, there are some excuses for sticking to powers of two.
Puzzle
Why might there be a disadvantage in having (for example) a 48-bit word/address space (i.e. 6 bytes per word)?
Hint: consider an array of pointers ...
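To make the hint concrete, here is the address arithmetic involved, sketched in C (the function name is illustrative):

#include <stdio.h>
#include <stdint.h>

/* Byte address of element i of an array starting at 'base'. */
static uintptr_t element_addr(uintptr_t base, uintptr_t i, uintptr_t size)
{
    return base + i * size; /* size 8 (a power of two): just a shift, i << 3 */
}                           /* size 6: needs a genuine multiplication        */

int main(void)
{
    printf("%#lx\n", (unsigned long)element_addr(0x1000, 5, 8)); /* 0x1028 */
    printf("%#lx\n", (unsigned long)element_addr(0x1000, 5, 6)); /* 0x101e */
    return 0;
}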
What is addressed?
Almost (but not quite) always an 8-bit byte. (Those who have done the ‘Stump’ lab. will have seen this need not be the case.) Addressing each bit separately (it has been done) is not often useful and further limits the overall memory size. Traditionally a ‘character’ is a useful minimum unit size and (for most Europeans/Americans) 7 bits was plenty, hence codes like ASCII. 8 bits was later ‘tidier’ - and 8 is a power of 2. This is the ‘octet’ of networking terminology.
Nowadays the outlook is a bit more global but ‘byte addressing’ is
strongly established.
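In C the byte is both the unit of addressing and the unit of sizeof; a quick check:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_BIT is the number of bits in the addressable unit. */
    printf("bits per byte: %d\n", CHAR_BIT);     /* 8 on almost all machines */
    printf("sizeof(char): %zu\n", sizeof(char)); /* 1, by definition         */
    return 0;
}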
Speculation
If you could start again with a ‘clean sheet’ computer design but using contemporary technology, what word size would you choose, and why?
Note: there may be no ‘perfect’ answer to this.
Exploration: character encodings
Most numeric variables need more than the 2^8 = 256 values a byte provides. Thus they comprise several bytes. Being sensible, these bytes will be stored at adjacent addresses. Moving these one-at-a-time would be slow so it is usual to have a wider data bus to move the bytes in parallel.
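For example, a sketch in C (the addresses printed will vary from run to run, but they will always be consecutive):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 0x11223344;                /* a four-byte quantity */
    unsigned char *p = (unsigned char *)&x; /* view it byte by byte */

    /* The four bytes of x occupy four adjacent addresses. */
    for (int i = 0; i < 4; i++)
        printf("%p : 0x%02x\n", (void *)(p + i), p[i]);
    return 0;
}

(The order in which 0x11 ... 0x44 appear depends on the byte order, discussed below.)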
A word is the ‘natural’ size of data in a particular
architecture: word sizes vary between architectures. When addressing
a word one representative address is used: typically the lowest.
This organisation has implications for the way variable addresses are
assigned. Unless a correctly aligned address is used
there will be problems.
For example, think of attempting a 64-bit word access to address 0000_0000_0000_0013 in the figure above. What might happen? Possibilities include:

- the processor raises an exception (an ‘alignment fault’);
- the low-order address bits are ignored, so the aligned word at 0000_0000_0000_0010 is accessed instead;
- the hardware quietly splits the access into two bus transfers and reassembles the bytes.

All of these have happened in the past. Note that, even if the functionally expected answer is achieved, the performance penalty is severe.
The ‘bottom line’ is that variables should be aligned on addresses which are multiples of their size.
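Compilers enforce this automatically when laying out variables and structures; a small C11 sketch showing the alignment requirement of a type and the padding it causes:

#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>

struct example {
    char     c;   /* 1 byte                                  */
                  /* 7 bytes of padding are inserted here... */
    uint64_t x;   /* ...so that x sits on an 8-byte boundary */
};

int main(void)
{
    printf("alignof(uint64_t):      %zu\n", alignof(uint64_t));      /* typically 8  */
    printf("sizeof(struct example): %zu\n", sizeof(struct example)); /* typically 16 */
    return 0;
}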
The applicability of alignment can legitimately be extended further. Although a processor may (conceptually) work with 64-bit words, the quantities ‘behind the scenes’ may be larger.

A clear example is a cache line [[link]], which may be (say) eight words in length. It makes sense to make best use of that cache line and - in particular - it makes sense to ‘start’ at the beginning of the cache line. Placing data structures (either at compile time or dynamically, with an aligned allocator) so that they begin on a cache line boundary can therefore pay off.
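A minimal sketch in C11, assuming a 64-byte cache line (the common size today, though it is architecture-specific):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Request 1 KiB starting exactly on a (presumed) 64-byte cache line.
       aligned_alloc (C11) requires the size to be a multiple of the alignment. */
    void *buf = aligned_alloc(64, 1024);
    if (buf == NULL)
        return 1;

    printf("buffer starts at %p\n", buf); /* the low 6 bits will be zero */
    free(buf);
    return 0;
}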
Code entry points (i.e. branch targets) may be aligned in a similar way. This allows more object code bytes/words to be fetched in one operation if the bus width [[link]] permits it.
The problem is even more acute with instruction sets - the x86 is a good example - which have variable-length instructions. Starting a section of code at an aligned address means the first fetch will capture the most useful bytes.
Sometimes code - particularly loop entry points - is padded out for speed. The saving over the execution time of a particular loop will usually more than make up for the NOPs, especially as NOPs with different-length op. codes can be used to reduce the instruction count.
In this (contrived) example, fetching two 8-byte words at a time, how many:
- bus operations are needed to cache the highlighted routine?
- complete instructions are fetched when the procedure is first entered?
- complete instructions are fetched when the loop branch is first taken?
- instruction fetch bus cycles are needed to execute the routine, if it loops 5 times?
In a superscalar processor, which can execute more than one instruction in parallel, being able to fetch several instructions quickly can enable increased performance.
Don't just take my word for it! Here is an x86-64 code fragment picked fairly much at random:
...
.text:00404654  83 f8 04                       cmp    $0x4,%eax
.text:00404657  0f 86 42 fe ff ff              jbe    0x0040449f
.text:0040465d  eb 96                          jmp    0x004045f5
.text:0040465f  90                             nop
.text:00404660  41 bb 1b 00 00 00              mov    $0x1b,%r11d
.text:00404666  66 2e 0f 1f 84 00 00 00 00 00  nopw   %cs:0x0(%rax,%rax,1)
.text:00404670  45 88 19                       mov    %r11b,(%r9)
.text:00404673  49 83 c2 01                    add    $0x1,%r10
.text:00404677  49 83 c1 01                    add    $0x1,%r9
.text:0040467b  31 c0                          xor    %eax,%eax
.text:0040467d  eb d1                          jmp    0x00404650
.text:0040467f  90                             nop
.text:00404680  41 bb 20 00 00 00              mov    $0x20,%r11d
.text:00404686  eb e8                          jmp    0x00404670
.text:00404688  0f 1f 84 00 00 00 00 00        nopl   0x0(%rax,%rax,1)
.text:00404690  b8 06 00 00 00                 mov    $0x6,%eax
.text:00404695  eb b9                          jmp    0x00404650
.text:00404697  66 0f 1f 84 00 00 00 00 00     nopw   0x0(%rax,%rax,1)
.text:004046a0  44 8d 58 d0                    lea    -0x30(%rax),%r11d
.text:004046a4  b8 02 00 00 00                 mov    $0x2,%eax
.text:004046a9  eb a5                          jmp    0x00404650
.text:004046ab  0f 1f 44 00 00                 nopl   0x0(%rax,%rax,1)
.text:004046b0  41 bb 7f 00 00 00              mov    $0x7f,%r11d
.text:004046b6  eb b8                          jmp    0x00404670
.text:004046b8  0f 1f 84 00 00 00 00 00        nopl   0x0(%rax,%rax,1)
...

A fragment of the Linux ‘ls’ utility, disassembled with disassembler.io.
Activity: you can try this yourself (fairly) easily with your own code sample(s).
If they're not already familiar, don't worry about the specific instruction mnemonics. Note:

- the various NOPs (nop, nopw, nopl) do nothing except occupy space;
- they come in several lengths (from 1 to 10 bytes here), so any gap can be filled with very few instructions;
- each one pads the code so that the instruction which follows it (at 0x00404660, 0x00404670, 0x00404680, 0x00404690, 0x004046a0, 0x004046b0) - a branch target - starts on a 16-byte boundary.
Nerdy puzzle
What is the effect of the instruction at 0040467b and why is it done that way?
How should bytes be numbered within a word?
Another debated question! This is known as ‘byte order’ or
‘endianness’.
Here's a very
short video explanation.
The earlier figure implied a little endian arrangement where
the least significant byte is at the (numerically) lowest address.
The arrangement is arbitrary. Most popular processors favour ‘little endian’ arrangements, although ‘big endian’ persists as the conventional byte order in networking.
Most of the time the byte order is irrelevant. It matters when a stored quantity is accessed with different-sized operations. For example, if a word is written as bytes to a file with one convention and read into a word on a machine with the other convention then the byte order will be swapped and confusion will result.
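You can observe a machine's byte order with a few lines of C:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    uint32_t word = 0x01020304;
    unsigned char bytes[4];

    memcpy(bytes, &word, 4); /* view the word as individual bytes */

    /* Little endian stores the least significant byte (0x04) first. */
    printf("%s endian\n", bytes[0] == 0x04 ? "little" : "big");
    return 0;
}

Where byte order matters across systems - in files and network protocols - the usual remedy is to fix one convention and convert explicitly, e.g. with htonl()/ntohl() from <arpa/inet.h> on POSIX systems.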
Many modern machines allow the hardware endianness to be programmed in software.
Aside: note that European languages write words (and the letters in them) left-to-right, but numbers are, in effect, read right-to-left: in “1234” the significance of the ‘1’ cannot be determined until the other digits have been read.
They are usually called “Arabic numerals”. Actually the system was invented in India!
Next: Buses.