COMP25212 Context


Timeline

The scope of this module is computer architecture: the organisation of the various ‘building blocks’ of computer systems. In particular the focus is on memory systems (hierarchies, caches etc.) and more sophisticated processors (pipelines, multiprocessing etc.) but these don't operate in isolation and a number of other areas will be visited.

Not covered here are engineering details (no gates), instruction sets or programming, although influences of these topics (and more) will be visible in places. We also attempt to highlight some recurring principles – such as cacheing* and resolving dependencies – which also apply to many other areas of computing.

* I prefer that spelling: you don't have to.

Contextual figure

This diagram shows the areas of the computer ‘stack’ which feature here – some more significantly than others. Hover and explore...

Principles

Parallelism
Parallelism often gets somewhat overlooked by those with a purely software background since (traditional) computer programming is a serialisation process – one line after another. Parallelism is implicit in hardware and is increasingly important in software with multithreading needed to exploit multiprocessors. It is also significant in extracting performance, including techniques such as pipelining and vector and superscalar processing.
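The serial-versus-parallel contrast above can be sketched in a few lines of Python. This is an illustrative sketch only (and in CPython the interpreter lock limits true CPU parallelism), but the structure – partition the work, run the parts concurrently, combine the results – is the same one multithreaded software uses to exploit multiprocessors.

```python
# A sketch of software parallelism: splitting a sum across threads.
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Divide 'data' into chunks and sum each chunk on its own thread."""
    chunk = (len(data) + workers - 1) // workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, parts))  # combine the partial sums

print(parallel_sum(list(range(1000))))  # same answer as sum(range(1000))
```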
Latency/bandwidth
Latency is a horrible problem. Write latency can be alleviated by write buffering (‘fire-and-forget’) but there is very little that can be done about read latency. It is sometimes possible to use speculation: guessing memory addresses can be quite accurate, and prefetching data from those addresses will alleviate latency; speculating on data values directly is much less reliable. Latency is typically reduced by cacheing.
Increasing bandwidth is easier (in principle) with wider and faster data links moving blocks of data.
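A toy simulation can show how much sequential prefetching helps with read latency. The 100-cycle memory latency and the ‘next address’ guess are illustrative assumptions, not real hardware figures:

```python
# Sketch: sequential prefetching hides read latency.
MEM_LATENCY = 100  # assumed cost (cycles) of a read that must go to memory

def stalls(addresses, prefetch=False):
    """Count stall cycles for a stream of reads; with prefetching, the
    next sequential address is fetched while the current data is used."""
    total = 0
    ready = set()              # addresses already fetched ahead of time
    for a in addresses:
        if a not in ready:
            total += MEM_LATENCY
        if prefetch:
            ready.add(a + 1)   # guess: the next access is sequential
    return total

seq = list(range(16))
print(stalls(seq), stalls(seq, prefetch=True))  # 1600 vs 100 cycles
```

For a sequential stream the guess is almost always right; for a random stream it would buy nothing – exactly the accuracy trade-off described above.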
Cacheing
It is possible to describe cacheing as the key to computer architecture. This is a bit cynical but it is a very important concept which is used in numerous circumstances. Effective cacheing relies on the statistical properties and locality of behaviour in software.
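The reliance on locality can be demonstrated with a toy direct-mapped cache (the 8-line size is an illustrative assumption): a stream that reuses a few addresses hits almost every time, while a stream with no reuse never hits.

```python
# A toy direct-mapped cache, to show why locality makes cacheing effective.
LINES = 8                               # assumed number of cache lines

def hit_rate(addresses):
    cache = [None] * LINES              # one tag stored per line
    hits = 0
    for a in addresses:
        index, tag = a % LINES, a // LINES
        if cache[index] == tag:
            hits += 1                   # hit: data already present
        else:
            cache[index] = tag          # miss: fetch the line
    return hits / len(addresses)

local = [0, 1, 2, 3] * 25               # good locality: few addresses, reused
scattered = list(range(100))            # no reuse at all
print(hit_rate(local), hit_rate(scattered))  # 0.96 vs 0.0
```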
Pipelining
This is a cheap way to introduce parallelism into systems. It is mostly applied to hardware (micro)architectures although it is applicable to multithreaded software. Care must be taken to avoid hazards.
Because behaviour is not always straightforward – e.g. software sometimes has (conditional) branches – there may be speculation and various expedients may be used to make this more accurate.
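The cheapness of the parallelism can be seen in the cycle counts. A sketch, assuming a classic 5-stage pipeline: once the pipeline is full, one instruction completes per cycle, and hazards simply add stall cycles.

```python
# Sketch: cycle counts for a 5-stage pipeline vs. unpipelined execution.
STAGES = 5                          # assumed pipeline depth

def unpipelined_cycles(n):
    """Each instruction occupies all stages before the next starts."""
    return n * STAGES

def pipelined_cycles(n, stalls=0):
    """After the pipeline fills, one instruction completes per cycle;
    hazards (data, control, ...) each add stall cycles."""
    return STAGES + (n - 1) + stalls

print(unpipelined_cycles(100), pipelined_cycles(100))  # 500 vs 104
```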
Speculation
Speculation is guessing what may happen in the future. If a processor prefetches instructions before completing the current one(s) then it is speculating. This can improve performance but comes with a cost. Whether it is beneficial depends on the gains when the guess is right, the cost when the guess is wrong and the probability of being correct. It may also depend on the application: is it worth spending more power to go faster?
Some things are easier to predict than others: sometimes it's worth adding features (e.g. branch prediction) to get more accurate predictions.
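One such feature is easy to sketch: the classic 2-bit saturating counter branch predictor. The starting state and the 9-in-10-taken loop branch are illustrative assumptions, but they show why this tiny amount of history gives high accuracy on loops.

```python
# A 2-bit saturating counter branch predictor, a classic speculation aid.
# States 0-1 predict 'not taken'; states 2-3 predict 'taken'.
def predict(outcomes):
    state, correct = 2, 0                # start weakly 'taken' (assumption)
    for taken in outcomes:
        if (state >= 2) == taken:
            correct += 1                 # the guess matched reality
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

loop = [True] * 9 + [False]              # a loop branch: taken 9 times in 10
print(predict(loop * 10))                # 0.9: only the loop exits mispredict
```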
Synchronisation
A dependency occurs when the order of operations is important. However, waiting for something to finish can cause delays so, sometimes, operations may be done out of order to save time: doing something from the future now, whilst otherwise waiting, because you (probably) can.
Note that this may also involve speculation that nothing will go wrong.
Often this might be done safely. Sometimes it is important that operations are done strictly in the order that the user specified. In this latter case a barrier (hardware or software) may be needed to suppress the speculation.
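A software barrier can be sketched directly with Python's `threading.Barrier`: each thread works in whatever order it likes before the barrier, but nothing after the barrier can happen until every thread has arrived.

```python
# Sketch: a barrier forces threads to synchronise before proceeding --
# the software analogue of the barriers mentioned above.
import threading

barrier = threading.Barrier(4)
results = []                            # appends are atomic in CPython

def worker(i):
    results.append(('before', i))       # phase 1: order among threads unknown
    barrier.wait()                      # nobody passes until all four arrive
    results.append(('after', i))        # phase 2: strictly after every 'before'

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# Every 'before' entry precedes every 'after' entry, whatever the scheduling.
print(all(tag == 'before' for tag, _ in results[:4]))  # True
```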
Error detection/correction
Not a major feature of this module, but faults do occur and error correction codes are used in memories in high-reliability systems, such as file- and compute-servers. These can extend to register files. Error detection is routine in I/O systems – including disks, USB etc. – where operations may be retried.
These or similar principles are also used extensively in network communications, broadcasting etc.
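The simplest of these codes, even parity, fits in a few lines: store one extra bit so the total number of 1s is even, and any single flipped bit makes a recomputed parity disagree with the stored one. (The 8-bit word below is just an example value.)

```python
# Sketch: even parity, the simplest error-detection code.
def parity(bits):
    return sum(bits) % 2            # 0 if the word has an even number of 1s

word = [1, 0, 1, 1, 0, 0, 1, 0]     # example data word
stored = parity(word)               # parity bit kept alongside the data
word[3] ^= 1                        # a single-bit fault occurs
print(parity(word) != stored)       # True: the error is detected
```

Note that parity detects a single-bit error but cannot locate it (so no correction) and misses double-bit errors; correcting codes such as Hamming codes add more check bits to do better.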
Virtualisation
Conforming to a defined interface and encapsulation of functions enables flexibility. The most familiar technique is possibly virtual memory although it is likely that processing will be increasingly virtualised in future, too.
Interleaving
A mechanism where several parallel, slow-cycling units ‘take turns’ to provide a service, giving a higher bandwidth. [Picture two men with sledgehammers driving a single fencepost.] The principle is used in memories, often where they are shared. One use is in DRAM, where multiprocessors may share the same space but interleave accesses to different banks (see section on DRAM); another use is to allow (something close to) dual-port access – typically with SRAM.
Another application is to speed up reading from (or writing to) a disk system, where it is referred to as ‘striping’.
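The bank-selection idea can be sketched as low-order interleaving (the 4 banks are an illustrative assumption): the bottom address bits choose the bank, so consecutive addresses land in different banks and sequential accesses can overlap.

```python
# Sketch: low-order interleaving across 4 memory banks.
BANKS = 4                               # assumed number of banks

def bank(addr):
    return addr % BANKS                 # bank number from low address bits

seq = list(range(8))
print([bank(a) for a in seq])           # [0, 1, 2, 3, 0, 1, 2, 3]
```

While bank 0 is still busy completing its access, banks 1-3 can start theirs – the sledgehammer picture above, in address arithmetic.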
Exercise: identify at least one example (and, ideally, as many as you can) to illustrate each of these principles.
You may need to keep coming back to this list throughout the module.

‘Asynchronous’ notes

To assist navigation, notes are written thus. They include illustrations, some of which are interactive.

Small exercises are written thus. These are intended to provoke thought and investigation and provide a basis for discussion in ‘synchronous’ sessions.
Further reading is marked like this. This is intended for interested parties. Full understanding is not expected for the module assessment but all knowledge may come in useful and it's intended to be enlightening!

Further reading

D.A. Patterson, J.L. Hennessy. Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann/Elsevier.
Various editions exist: the most relevant is probably the ARM Edition (2016)


This way to the notes.