Buses: Implementation

A simple view of a 64-bit processor bus might look something like this:
Here the address bus is 61 bits wide because the lowest 3 bits of a byte address only select a byte within a word, and there are 2³ = 8 bytes across the data buses.
There will also be some control signals: there needs to be an indication of the direction of a given transaction (write/store or read/load) and something to indicate if the bus is active or not. To facilitate write operations of different sizes there needs to be some indication of the transfer size so that the memory system does not modify bytes it should not.

(In a real bus there may be other control signals too, but they don't matter too much here. Some examples are mentioned later.)
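As an illustration of the transfer-size indication described above, here is a minimal sketch of how a byte address and a size might be turned into a 61-bit word address plus one 'enable' bit per byte lane. The function names, the lane numbering (lane 0 is the lowest-addressed byte) and the alignment rule are assumptions made for the example; real buses encode this in various ways.

```python
BYTES_PER_WORD = 8   # 64-bit data bus
OFFSET_BITS = 3      # 2**3 = 8 bytes per word


def split_address(byte_address):
    """Return (word_address, byte_offset) for a byte address."""
    return byte_address >> OFFSET_BITS, byte_address & (BYTES_PER_WORD - 1)


def byte_enables(byte_address, size):
    """Return an 8-bit mask with one bit set per byte lane to be written.

    size is the transfer size in bytes (1, 2, 4 or 8). The transfer is
    assumed to be naturally aligned (an assumption of this sketch).
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("unsupported transfer size")
    if byte_address % size != 0:
        raise ValueError("misaligned transfer")
    _, offset = split_address(byte_address)
    return ((1 << size) - 1) << offset
```

A 4-byte store to byte address 0x1004 would then enable only the upper four lanes of word 0x200, so the memory leaves the other four bytes of that word untouched.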

Imagine trying to fill the memory space with memories which are bought in 8 GiB units (arbitrary example size) - i.e. 1 GiW (2³⁰ 64-bit words) at a time. 30 address bits are needed to select a location within a unit. This leaves 61 - 30 = 31 address bits to consider, which is space for about two billion such units.
Some of these other bits are decoded to distinguish RAM from I/O etc. and some will be wanted to select one RAM or another. However it is unlikely that all of these need be used.
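The arithmetic above can be checked with a few lines of Python. The variable names are just for this sketch; the sizes are the ones used in the text.

```python
ADDRESS_BUS_BITS = 61        # word address width from the text
BYTES_PER_WORD = 8           # 64-bit data bus
UNIT_BYTES = 8 * 2**30       # one 8 GiB memory unit

words_per_unit = UNIT_BYTES // BYTES_PER_WORD           # 2**30 words (1 GiW)
bits_within_unit = words_per_unit.bit_length() - 1      # log2 of a power of two
bits_left_over = ADDRESS_BUS_BITS - bits_within_unit    # bits not used inside a unit
max_units = 2**bits_left_over                           # units that would fill the space

print(bits_within_unit, bits_left_over, max_units)
```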

The usual expedient is to ignore some address bits. This means their values don't matter and the (visible) effect is to cause a memory (or I/O) location to appear at repeated positions in the (physical) address space. There may also be (notionally) ‘empty’ areas or spaces ‘for future expansion’ designated.
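A minimal sketch of this 'partial decoding', assuming (purely for illustration) that the top address bit chooses between RAM and I/O and that only the lowest 30 bits are used inside a unit. Bits 30 to 59 are ignored, so every location appears at many addresses.

```python
UNIT_BITS = 30     # bits used inside one 1 GiW memory unit
SELECT_BIT = 60    # hypothetical: top bit chooses RAM (0) or I/O (1)


def decode(word_address):
    """Return (region, index); address bits 30..59 are ignored."""
    region = "io" if (word_address >> SELECT_BIT) & 1 else "ram"
    index = word_address & ((1 << UNIT_BITS) - 1)
    return region, index


def aliases(word_address, count=3):
    """List the first few addresses that select the same location."""
    region_bit = word_address & (1 << SELECT_BIT)
    index = word_address & ((1 << UNIT_BITS) - 1)
    return [region_bit | (n << UNIT_BITS) | index for n in range(count)]
```

Here `aliases(5)` gives addresses 5, 5 + 2³⁰ and 5 + 2³¹, all of which decode to the same RAM location.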

Here are a couple of address map illustrations from the Web.

Bus implementation

Reminder: two important metrics.

Latency: The delay from starting an operation to perceiving the result.
Bandwidth: The rate at which data can be moved.

Thus the read latency is the time between wanting some data (from a determined address) and receiving that data.

For a given transaction there is not much that can be done to reduce read latency, although architecturally there are mechanisms such as caching. Write latency can be alleviated by write buffering.
Bandwidth can be improved by cycling the bus faster - which may be difficult in practice - or making the bus wider. The latter approach is useful when it is known (or expected) that coherent blocks of data such as cache lines are being moved; it does not help with individual ‘random’ transfers.

Making buses wider is usually realisable. Note that it increases the number of parallel wires significantly and this can become a problem for pinout if it crosses a chip boundary.
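To put rough numbers on the effect of bus width, here is a small sketch which assumes an idealised bus that completes one transfer per clock cycle. The function names, the clock rate and the 64-byte cache line are hypothetical figures chosen for the example.

```python
import math


def peak_bandwidth(width_bits, clock_hz):
    """Peak bytes per second, assuming one transfer per clock cycle."""
    return width_bits // 8 * clock_hz


def cycles_to_move(n_bytes, width_bits):
    """Bus cycles needed to move n_bytes, one bus-width transfer per cycle."""
    return math.ceil(n_bytes / (width_bits // 8))


# A 64-byte cache line benefits from doubling the width...
line_narrow = cycles_to_move(64, 64)    # 64-bit bus
line_wide = cycles_to_move(64, 128)     # 128-bit bus
# ...but a single 4-byte 'random' transfer takes one cycle either way.
single_narrow = cycles_to_move(4, 64)
single_wide = cycles_to_move(4, 128)
```

Under these assumptions a 64-bit bus at 100 MHz peaks at 800 MB/s; doubling the width halves the cycles for the cache line but leaves the single transfer unchanged.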

On the other hand, sometimes it is desirable to have a ‘thinner’ data bus, usually in systems (or parts of systems) where speed doesn't matter as much as saving wires: typically I/O functions coming off a chip. In these instances it is normal to have hardware sequencing multiple cycles so the operation is ‘transparent’ to the user.
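The sequencing involved can be sketched as follows: a 64-bit value is split into eight 8-bit bus cycles and reassembled at the far end. The function names and the choice of sending the least significant byte first are assumptions of this example; in a real system the sequencing is done in hardware.

```python
def split_into_beats(value, total_bits=64, bus_bits=8):
    """Split a value into bus-width pieces, least significant first."""
    mask = (1 << bus_bits) - 1
    return [(value >> shift) & mask for shift in range(0, total_bits, bus_bits)]


def join_beats(beats, bus_bits=8):
    """Reassemble the original value from pieces in the same order."""
    result = 0
    for position, beat in enumerate(beats):
        result |= beat << (position * bus_bits)
    return result
```

The software on either side sees only the complete 64-bit value; the eight narrow cycles in between are invisible to it.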

It is common for systems to support different bus implementations to carry different parts of the address space. There may be a fast, wide bus connection to the RAM, etc. and a slower (simpler, easier to engineer) bus connecting devices which demand less bandwidth, particularly I/O devices. One bus implementation can be converted into another by what is usually called a bus bridge, hence terms such as Northbridge and Southbridge. (Names such as Sandy Bridge and Ivy Bridge, by contrast, are Intel processor codenames rather than bus bridges.) The field is mired in marketing jargon!

Explore further. There are plenty of examples of system diagrams on the Web, although many are not wholly clear. Here is an example System on Chip (SoC) which has some clear figures: see around p.17.
(There is another example address map on p.54.)

Referring to these figures, several features referred to here are visible.

Harvard architecture inside the processor: {“DCache” & “ICache”}.

Multiple buses with bridges.

Lower frequency operation further from the processor.

Bus bridged off chip (“External Bus Interface”).

Bus crossbar switch (“Matrix”).

A ‘crossbar’ is a switch which can make any set of non-conflicting connections in parallel. In this case it will allow any master access to any slave.
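What 'non-conflicting' means can be captured in a short check: a set of requested connections can all be made in the same cycle provided no master and no slave is involved in more than one of them. The function and the device names in the example are hypothetical.

```python
def non_conflicting(connections):
    """connections is a list of (master, slave) pairs requested together.

    Returns True if no master and no slave appears more than once,
    i.e. a crossbar could make all the connections in parallel.
    """
    masters = [master for master, _ in connections]
    slaves = [slave for _, slave in connections]
    return len(set(masters)) == len(masters) and len(set(slaves)) == len(slaves)
```

For example, a processor reading RAM while a DMA controller writes to a UART can proceed together, whereas two masters wanting the same RAM must be arbitrated.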

The implementation of the bus is not really of concern here. On-chip, and often on a PCB where it is feasible, it is normal to use parallel buses with many wires. For more distant expansion, serial buses (same information but sequenced in time) are currently preferred: a familiar example might be PCI Express.
Another familiar ‘bus’, USB, doesn't carry processor signals: it is for peripheral I/O. It's also more network-like than a traditional bus.

I/O spaces

Interfaces with the ‘outside world’ come in many forms. They are mediated by ‘peripherals’ - specialised devices which translate software operations into the appropriate sequences of signals.

Typically, peripherals will have a number of specialised registers which are used to monitor and control external quantities. For example, if controlling a motor there could be one register which is written to set the desired speed, and another from which the actual speed can be read.

Most often these registers are memory mapped, i.e. they have unique addresses in the overall address space. Most I/O peripherals only have a small number of associated registers (maybe only one, typically fewer than 32) so they only require a small part of the address space. They are therefore often placed ‘together’ in the address space. For economy, many address lines may be ignored so various registers/areas will appear as aliased repetitions at regular intervals.
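A toy model of the motor example may help. Everything here is hypothetical: the register layout, the base address and the `tick` method (which stands in for the outside world) are invented for illustration. Only address bit 0 is decoded, so the two registers repeat throughout the peripheral's area.

```python
class MotorController:
    """Hypothetical peripheral with two memory-mapped registers."""

    SPEED_SET = 0      # written to set the desired speed
    SPEED_ACTUAL = 1   # read to monitor the measured speed

    def __init__(self):
        self.desired = 0
        self.actual = 0

    def write(self, register, value):
        if register == self.SPEED_SET:
            self.desired = value
        # writes to the read-only SPEED_ACTUAL register are ignored

    def read(self, register):
        return self.desired if register == self.SPEED_SET else self.actual

    def tick(self):
        """Stand-in for the outside world: move one step towards the target."""
        if self.actual < self.desired:
            self.actual += 1
        elif self.actual > self.desired:
            self.actual -= 1


MOTOR_BASE = 0x1000    # hypothetical position in the address map
REGISTER_BITS = 1      # only address bit 0 selects a register


def bus_write(motor, address, value):
    motor.write(address & ((1 << REGISTER_BITS) - 1), value)


def bus_read(motor, address):
    return motor.read(address & ((1 << REGISTER_BITS) - 1))
```

After `bus_write(motor, MOTOR_BASE, 3)` and one `tick`, reading `MOTOR_BASE + 1` returns 1, and so does reading the alias at `MOTOR_BASE + 3`. Note that the value read changed without the processor writing to that register.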

In a simple computer system the memory is only written and read by a single processor. This means it cannot change unexpectedly (at least if the system is programmed correctly!). Once multiprocessors are introduced - including DMA, GPUs etc. - memory locations may be changed whilst the processor is ‘not looking’.

I/O - especially the ‘Input’ bit - is expected to change autonomously. The first significant issue here is to ensure areas of address space containing peripherals are not cached.
There may be other issues to note, such as the fact that changing one register may cause changes in others. This gets a bit detailed to delve into here though.

Aside: some processors have additional I/O address spaces, logically separate from the memory space. The most notable example may be the x86 architecture, where it is retained for legacy reasons. Here it is a 64 KiB space accessible by the in and out instructions (analogous to ‘load’ and ‘store’). It is redundant in modern systems.

There may be other (usually small) spaces. For example, ARM uses a coprocessor space for some control registers (such as the MMU) which can then be outside the memory mapped space.

Some other bus signals

Wait: Typically there is some timing signal to slow operations when the memory cannot respond immediately.
Fault/abort: Something went wrong. This is used primarily from an MMU for virtual memory page faults but may also signal I/O problems etc.
Locking information: For multiprocessor operations.
Sequence information: Hints about the relationship of successive addresses can help accelerate some memory accesses.
Privilege state: An indication of the processor mode - i.e. is it currently running an application, O.S. etc. Access to some addresses is only allowed in particular modes.
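As a sketch of how the privilege and fault/abort signals might interact, the following models a memory system which aborts unprivileged accesses to a protected address range. The `Transaction` fields, the range limits and the `respond` function are all assumptions made for this example rather than any particular bus standard.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    address: int
    is_write: bool
    privileged: bool   # state of the processor's privilege signal


# Hypothetical range only accessible in a privileged mode.
PROTECTED_START = 0xF000
PROTECTED_END = 0x10000


def respond(transaction):
    """Return 'abort' if the access is not permitted, otherwise 'ok'."""
    in_protected = PROTECTED_START <= transaction.address < PROTECTED_END
    if in_protected and not transaction.privileged:
        return "abort"
    return "ok"
```

An application-mode access to 0xF004 would be aborted, whereas the same access made by the operating system would complete normally.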