COMP25212 Context
Timeline
The scope of this module is computer architecture. This
is the organisation of the various ‘building blocks’ of
computer systems. In particular the focus is on memory systems
(hierarchies, caches etc.) and more sophisticated processors
(pipelines, multiprocessing etc.), but these don't operate in isolation
and a number of other areas will be visited.
Not covered here are engineering details (no gates),
instruction sets or programming, although influences of these topics
(and more) will be visible in places. We also attempt to highlight
some recurring principles – such as cacheing*
and resolving dependencies – which also apply to many other areas of
computing.
* I prefer that spelling: you don't have to.
Contextual figure
This diagram shows the areas of the computer ‘stack’ which feature here – some more significantly than others. Hover and explore...
Principles
- Parallelism
-
Parallelism often gets somewhat overlooked by those with a purely
software background since (traditional) computer programming is a
serialisation process – one line after another. Parallelism is
implicit in hardware and is increasingly important in software
with multithreading needed to exploit multiprocessors.
It is also significant in extracting performance, including techniques
such as pipelining and vector and superscalar
processing.
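A minimal sketch of the software side of this: dividing one job into chunks that worker threads handle concurrently. (This is illustrative only – in CPython the global interpreter lock limits true CPU parallelism – but the structure is what matters.)

```python
# Illustrative sketch: splitting a summation across worker threads.
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Divide 'data' into chunks and sum the chunks concurrently."""
    chunk = (len(data) + workers - 1) // workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums one chunk; the partial sums are then combined.
        return sum(pool.map(sum, parts))

print(parallel_sum(list(range(100))))  # same result as sum(range(100)): 4950
```

Note the implicit serialisation at the end: combining the partial sums is itself a dependency on all the workers finishing.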
- Latency/bandwidth
-
Latency is a horrible problem. Write latency can be alleviated
by write buffering (‘fire-and-forget’) but there is
very little that can be done about read latency. It is
sometimes possible to use speculation. Guessing memory
addresses can be quite accurate and then prefetching data
values will alleviate latency; speculating on data values directly is
less reliable. Latency is typically reduced by cacheing.
Increasing bandwidth is easier (in principle) with wider and faster
data links moving blocks of data.
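A toy timing model (the numbers below are made up for illustration) shows why moving blocks helps: each transfer pays the fixed latency once, so a single large block amortises it better than many small transfers.

```python
# Toy model: total transfer time = fixed latency + size / bandwidth.
def transfer_time(bytes_moved, latency_s, bandwidth_bps):
    return latency_s + bytes_moved / bandwidth_bps

# Moving 4 KiB as one block vs. 64 separate 64-byte transfers,
# assuming (hypothetically) 100 ns latency and 10 GB/s bandwidth:
one_block = transfer_time(4096, 100e-9, 10e9)
many_small = 64 * transfer_time(64, 100e-9, 10e9)
print(one_block < many_small)  # True: one latency paid instead of 64
```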
- Cacheing
-
It is possible to describe cacheing as the key to computer
architecture. This is a bit cynical but it is a very important
concept which is used in numerous circumstances.
Effective cacheing relies on the statistical properties
and locality of behaviour in software.
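The reliance on locality can be seen in even the simplest cache model. The sketch below is a bare direct-mapped cache (not any real design): repeated accesses to the same addresses hit, which is exactly the behaviour locality provides.

```python
# Minimal direct-mapped cache model (illustrative only).
class DirectMappedCache:
    def __init__(self, lines):
        self.lines = lines
        self.tags = [None] * lines      # one stored tag per cache line

    def access(self, address):
        """Return True on a hit, False on a miss (which fills the line)."""
        index = address % self.lines    # which line this address maps to
        tag = address // self.lines     # identifies the block in that line
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag          # miss: fetch and replace
        return False

cache = DirectMappedCache(4)
# Temporal locality: revisiting recent addresses turns misses into hits.
hits = [cache.access(a) for a in [0, 0, 1, 1, 0]]
print(hits)  # [False, True, False, True, True]
```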
- Pipelining
-
This is a cheap way to introduce parallelism into systems. It is
mostly applied to hardware (micro)architectures although it is
applicable to multithreaded software. Care must be taken to
avoid hazards.
Because behaviour is not always straightforward – e.g. software
sometimes has (conditional) branches – there may be speculation
and various expedients may be used to make this more accurate.
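The cheapness of the speed-up can be put into (toy) numbers: with n stages and m items, a pipeline finishes in about n + m − 1 stage-times rather than n × m, assuming no hazards or stalls.

```python
# Toy timing model of pipeline throughput (assumes no stalls or hazards).
def serial_cycles(stages, items):
    # Each item passes through every stage before the next item starts.
    return stages * items

def pipelined_cycles(stages, items):
    # Items overlap: after the pipeline fills, one completes per cycle.
    return stages + items - 1

print(serial_cycles(5, 100))     # 500
print(pipelined_cycles(5, 100))  # 104
```

Hazards and stalls push the real figure back towards the serial one, which is why they must be avoided.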
- Speculation
- Speculation is guessing what may happen in the future. If a
processor prefetches instructions before completing the current
one(s) then it is speculating. This can improve performance but comes
with a cost. Whether it is beneficial depends on the gains when the
guess is right, the cost when the guess is wrong and the probability
of being correct. It may also depend on the application: is it worth
spending more power to go faster?
Some things are easier to predict than others: sometimes it's worth
adding features (e.g. branch prediction) to get more accurate
predictions.
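The trade-off described above can be written as simple arithmetic (the figures below are invented for illustration): the expected benefit weighs the gain when right against the penalty when wrong.

```python
# Expected benefit of a speculation, per guess (illustrative model).
def expected_benefit(p_correct, gain, penalty):
    # Win 'gain' with probability p, pay 'penalty' with probability 1 - p.
    return p_correct * gain - (1 - p_correct) * penalty

# e.g. a hypothetical branch predictor that is right 95% of the time,
# saving 3 cycles when right and costing 10 cycles when wrong:
print(expected_benefit(0.95, 3, 10))  # about 2.35 cycles saved per guess
```

Note that the same predictor with only 70% accuracy would lose cycles overall, which is why adding accuracy-improving features can pay for itself.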
- Synchronisation
- A dependency occurs when the order of operations is
important. However, waiting for something to finish can cause delays
and, sometimes, operations may be done out of order to save
time: doing something from the future now, whilst otherwise
waiting, because you (probably) can.
Note that this may also involve speculation that nothing will
go wrong.
Often this might be done safely. Sometimes it is important that
operations are done strictly in the order that the user
specified. In this latter case a barrier (hardware or
software) may be needed to suppress the speculation.
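A software barrier can be sketched directly with Python's `threading.Barrier`: no thread proceeds past the barrier until all have arrived, so ordering across the barrier is guaranteed.

```python
# Sketch: a barrier forces every thread to finish phase 1 before
# any thread starts phase 2 - ordering enforced, no overtaking.
import threading

barrier = threading.Barrier(3)
order = []
lock = threading.Lock()             # protects the shared 'order' list

def worker(name):
    with lock:
        order.append((name, "phase1"))
    barrier.wait()                  # nobody proceeds until all arrive
    with lock:
        order.append((name, "phase2"))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()

# Every phase-1 entry precedes every phase-2 entry:
print(all(p == "phase1" for _, p in order[:3]))  # True
```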
- Error detection/correction
- Not a major feature of this module but faults do occur and error
correction codes are used in memories in high reliability
systems, such as file- and compute-servers. These can extend to
register files. Error detection is routine in I/O systems – including
disks, USB etc. where operations may be retried.
These or similar principles are also used extensively in network
communications, broadcasting etc.
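The simplest of these codes is a parity bit, sketched below: it makes the count of 1s even, so any single flipped bit is detectable (though not correctable, and two flips cancel out).

```python
# Even parity: the simplest error-detection code.
def parity_bit(bits):
    return sum(bits) % 2             # 1 if the count of 1s is odd

def check(bits_with_parity):
    # A valid word has an even number of 1s overall.
    return sum(bits_with_parity) % 2 == 0

data = [1, 0, 1, 1]
word = data + [parity_bit(data)]     # transmit data plus parity bit
print(check(word))                   # True: no error detected

word[2] ^= 1                         # flip one bit 'in transit'
print(check(word))                   # False: error detected
```

Correcting (rather than merely detecting) errors needs more redundancy, e.g. Hamming codes as used in ECC memory.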
- Virtualisation
- Conforming to a defined interface and encapsulation of functions
enables flexibility. The most familiar technique is
possibly virtual memory although it is likely that processing
will be increasingly virtualised in future, too.
- Interleaving
- A mechanism where several slow-cycling units operate in parallel and
‘take turns’ to provide a service, giving a higher overall bandwidth.
[Picture two men with sledgehammers driving a single fencepost.] The
principle is used in memories, often where they are shared. One use
is in DRAM where multiprocessors may share the same space but
interleave accesses to different banks (see section on DRAM);
another use is to allow (something close to) dual-port access –
typically with SRAM.
Another application is to speed up reading from (or writing to)
a disk system, where it is referred to as
‘striping’.
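One common mapping (assumed here for illustration) is low-order interleaving: consecutive addresses go to different banks, so a sequential access stream spreads evenly and the banks can overlap their work.

```python
# Low-order interleaving: consecutive addresses map to successive banks.
def bank_of(address, n_banks=4):
    return address % n_banks

# A sequential stream of 8 addresses visits each of 4 banks in turn:
sequence = [bank_of(a) for a in range(8)]
print(sequence)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

While bank 0 is still cycling, banks 1-3 can already be servicing the next accesses - the two-sledgehammer picture in code.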
Exercise: identify at least one example (and, ideally, as many as you
can) to illustrate each of these principles.
You may need to keep coming back to this list throughout the module.
‘Asynchronous’ notes
To assist navigation, notes are written thus. They include
illustrations, some of which are interactive.
Small exercises are written thus. These are intended to
provoke thought and investigation and provide a basis for discussion
in ‘synchronous’ sessions.
Further reading is marked like this. This is intended for
interested parties. Full understanding is not expected for the module
assessment but all knowledge may come in useful and it's
intended to be enlightening!
Further reading
D.A. Patterson, J.L. Hennessy. Computer Organization and Design:
The Hardware/Software Interface, Morgan Kaufmann/Elsevier.
Various versions exist: the most relevant is probably the
ARM
Edition (2016)
This way to the notes.