Why Communication Protocols Exist

An MCU needs to exchange data with external devices — displays, sensors, memory chips, other computers. Each device is a separate chip on the circuit board, connected to the MCU by copper traces (wires). A communication protocol defines the rules for how bits travel over those wires: who sends when, how the receiver knows when a bit starts and ends, and how to address a specific device when multiple are connected.

Three protocols dominate embedded systems: SPI, I²C, and UART. They represent different tradeoffs between speed, wire count, and complexity. Choosing among them comes down to two questions: how much data needs to move, and how many devices share the connection?

SPI (Serial Peripheral Interface)

The Master-Slave Model

SPI is built around an asymmetric relationship: one device is the master and all others are slaves. These are fixed roles determined by the circuit design, not negotiated at runtime.

In the setups described here, the master is the MCU, the device running your firmware. It initiates every transaction and generates the clock signal. A slave is any external device the MCU communicates with: an e-paper display, a flash memory chip, a temperature sensor, an ADC. The slave never speaks unless the master asks it to. The terminology reflects the slave’s lack of autonomy: it cannot initiate communication, and its timing is entirely controlled by the master’s clock. (Many MCUs can also be configured to act as an SPI slave, but that is a separate use case.)

Info

Some modern documentation uses “controller/peripheral” instead of “master/slave.” The technical meaning is identical.

The Bus: Shared Wires, Individual Selection

SPI uses a bus topology — multiple slaves share three wires (clock, data out, data in) but each slave gets its own dedicated chip select (CS) line. The CS line is how the master says “I’m talking to you specifically.”

Consider an MCU connected to three SPI devices — an e-paper display, a flash memory chip, and a temperature sensor:

                    ┌──────────────┐
              CS1 ──┤ E-paper      │
MCU ──SCK  ────────┤ display      │
    ──MOSI ────────┤              │
    ──MISO ────────┤              │
              │    └──────────────┘
              │    ┌──────────────┐
              CS2──┤ Flash memory │
              ─────┤              │
              ─────┤              │
              ─────┤              │
              │    └──────────────┘
              │    ┌──────────────┐
              CS3──┤ Temp sensor  │
              ─────┤              │
              ─────┤              │
              ─────┤              │
                   └──────────────┘

SCK, MOSI, MISO are shared (one set of wires)
CS1, CS2, CS3 are separate (one wire per device)

All three devices see every clock pulse and every data bit on the shared wires. But a device only responds when its CS line is pulled low. When CS is high (inactive), the device ignores everything on the bus and keeps its MISO output in a high-impedance state (electrically disconnected), so it doesn’t interfere with other devices.

Why CS Is Active-Low

Pulling CS low (0 V) rather than high to activate a device is an electrical convention called active-low signaling. The reason is partly historical and partly practical. In TTL logic, an unconnected input floats high, and outputs pull low more strongly than they drive high, so “asserted means low” became the norm for control signals. CMOS inputs do not float to a defined level, so modern designs get the same behavior from a pull-up resistor (internal or external) on each CS line. With pull-ups in place, all CS lines sit high at power-on, before the MCU’s firmware has initialized anything, and every slave starts out deselected. No device accidentally activates during startup. Without a defined idle level, an uninitialized MCU might select a device by accident and cause bus conflicts during boot.

The Four Signals

| Signal | Direction | Purpose |
| --- | --- | --- |
| SCK | Master → Slave | Clock: a square wave generated by the master that tells the slave when to sample each bit |
| MOSI | Master → Slave | Data flowing from MCU to device (commands, pixel data, write operations) |
| MISO | Slave → Master | Data flowing from device to MCU (sensor readings, memory contents, status bytes) |
| CS | Master → Slave | Chip select: pulled low to activate one specific slave (one CS line per slave device) |

A transaction works as follows: the master pulls one CS line low, then generates clock pulses on SCK. On each clock cycle, one bit is sent on MOSI (master to slave) and, at the same time, one bit is received on MISO (slave to master). This simultaneous two-way transfer is what full-duplex means. After the transfer, the master returns CS high.

CS   ‾‾‾‾\_______________________________________/‾‾‾‾
SCK  ‾‾‾‾‾\_/‾\_/‾\_/‾\_/‾\_/‾\_/‾\_/‾\_/‾‾‾‾‾‾‾‾
MOSI ‾‾‾‾‾‾<b7><b6><b5><b4><b3><b2><b1><b0>‾‾‾‾‾‾
           ← one byte, MSB first →
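
The full-duplex exchange can be pictured as two 8-bit shift registers, one in the master and one in the slave, trading one bit per clock cycle. The sketch below is a software model for illustration only, not driver code; the function name `spi_exchange` is hypothetical.

```rust
/// Simulates one 8-bit SPI transfer, MSB first.
/// Returns (byte the master received, byte the slave received).
fn spi_exchange(master_out: u8, slave_out: u8) -> (u8, u8) {
    let mut master_shift = master_out; // master's shift register
    let mut slave_shift = slave_out; // slave's shift register

    for _ in 0..8 {
        // Each side drives its current most significant bit onto its output line.
        let mosi = master_shift >> 7; // master -> slave
        let miso = slave_shift >> 7; // slave -> master

        // Each side shifts left by one and takes in the bit the other side sent.
        master_shift = (master_shift << 1) | miso;
        slave_shift = (slave_shift << 1) | mosi;
    }

    (master_shift, slave_shift)
}
```

After eight clock cycles the two registers have swapped contents, which is why an SPI master always receives a byte for every byte it sends, even when that byte is meaningless.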

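For devices like e-paper displays, the MCU only sends data (commands and pixels), so MISO is unused and often not even connected. For devices like flash memory, the MCU first sends a read command and address on MOSI, then keeps clocking while the chip returns the data on MISO. During the command phase the bytes arriving on MISO are ignored, and during the data phase the bytes sent on MOSI are don’t-care filler.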

Clock Polarity, Phase, and Why They Exist

A digital signal takes time to transition between low and high — it’s not instantaneous. During that transition, the voltage is briefly ambiguous. If the receiver tried to sample the data line during a transition, it might read a 0 or a 1 unpredictably.

The solution is to use the clock signal to tell the receiver exactly when to sample. But “when” has two degrees of freedom:

Clock polarity (CPOL) determines the idle state of the clock line. Some slave chips were designed to expect the clock to idle low (CPOL=0); others expect it to idle high (CPOL=1). This is purely a convention chosen by the chip’s designer — there is no technical superiority to either.

Clock phase (CPHA) determines which clock edge triggers sampling. CPHA=0 means sample on the first edge (the transition away from idle); CPHA=1 means sample on the second edge (the transition back to idle).

Together, CPOL and CPHA form four modes (0–3). Each slave device’s datasheet specifies which mode it requires. You configure the matching mode in the MCU’s SPI peripheral registers. If master and slave disagree, the slave samples data at the wrong moment and every byte is corrupted — this is one of the most common first-time debugging issues, where “everything is wired correctly but I get garbage.”
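
As a rough illustration, the mapping from CPOL and CPHA to the conventional mode number, and to the clock edge on which data is sampled, can be written out as below. The names `spi_mode`, `sampling_edge`, and `Edge` are hypothetical and not taken from any particular HAL.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Edge {
    Rising,
    Falling,
}

/// Conventional SPI mode number: CPOL is the high bit, CPHA is the low bit.
fn spi_mode(cpol: bool, cpha: bool) -> u8 {
    ((cpol as u8) << 1) | (cpha as u8)
}

/// The clock edge on which the receiver samples the data line.
fn sampling_edge(cpol: bool, cpha: bool) -> Edge {
    // Clock idles low: the first edge is rising. Clock idles high: the first edge is falling.
    let first_edge = if cpol { Edge::Falling } else { Edge::Rising };
    let second_edge = if cpol { Edge::Rising } else { Edge::Falling };
    // CPHA=0 samples on the first edge, CPHA=1 on the second.
    if cpha { second_edge } else { first_edge }
}
```

Under this mapping, modes 0 and 3 both sample on the rising edge, and modes 1 and 2 both sample on the falling edge; they differ only in where the clock rests between transfers.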

Tip

You never choose the SPI mode — the slave device dictates it. You look up the slave’s datasheet, find “CPOL=0, CPHA=0” (or equivalent), and configure the master to match.

SPI Clock Speed and the Prescaler

The master generates the SPI clock, and its frequency determines the data rate. But the MCU’s SPI peripheral cannot generate an arbitrary frequency — it derives the SPI clock by dividing down the peripheral bus clock (the internal clock that drives that region of the chip).

On many MCUs, including the STM32 family, the divider is a prescaler that supports only powers of two (2, 4, 8, 16, 32, 64, 128, 256). A power-of-two divider is trivially implemented in hardware as a chain of flip-flops, each of which halves the frequency. An arbitrary divider (say, divide-by-7) would require a counter with comparison logic, which takes more silicon area. Since SPI does not require an exact frequency (there’s no standard baud rate to hit), the power-of-two constraint is an acceptable tradeoff.

For example, if the STM32H743’s SPI1 peripheral bus clock runs at 100 MHz and you set the prescaler to 32, the SPI clock will be 100 / 32 = 3.125 MHz. If the e-paper display’s datasheet says “maximum SPI clock: 4 MHz,” this is fine — you need to be at or below the slave’s maximum, but you don’t need to match it exactly. Running slower simply means the data transfer takes longer.
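
The selection rule (take the smallest divider whose output stays at or below the slave’s maximum) is easy to express in code. This is a minimal sketch assuming the eight power-of-two dividers listed above; `pick_prescaler` is a hypothetical helper, not a HAL function.

```rust
/// Dividers assumed to be available, smallest first.
const PRESCALERS: [u32; 8] = [2, 4, 8, 16, 32, 64, 128, 256];

/// Returns the smallest divider (fastest SPI clock) that keeps the SPI clock
/// at or below `max_hz`, together with the resulting frequency in Hz.
/// Returns `None` if even the largest divider is still too fast.
fn pick_prescaler(bus_clock_hz: u32, max_hz: u32) -> Option<(u32, u32)> {
    PRESCALERS
        .iter()
        .copied()
        // Compare in u64 so that max_hz * div cannot overflow.
        .find(|&div| u64::from(bus_clock_hz) <= u64::from(max_hz) * u64::from(div))
        .map(|div| (div, bus_clock_hz / div))
}
```

With a 100 MHz bus clock and a 4 MHz limit, dividing by 16 would give 6.25 MHz (too fast), so the function settles on 32 and 3.125 MHz, matching the example above.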

Important

The MCU does not “ask” the slave what clock speed it wants. The slave’s datasheet specifies a maximum, and you configure the master’s prescaler to produce a frequency at or below that maximum. There is no negotiation.

Extra Pins on Display Modules

SPI displays add auxiliary signals beyond the four standard SPI lines, because the display controller chip (e.g., SSD1681 for e-paper, ILI9341 for TFT LCD) needs to distinguish commands from pixel data:

| Pin | Purpose |
| --- | --- |
| DC (or D/C, A0) | Data/Command select: low = the byte on MOSI is a command (e.g., “start refresh”), high = the byte is pixel data |
| RST | Hardware reset for the display controller; pulse low to reinitialize |
| BUSY | Output from the display: asserted while a refresh is in progress; the MCU must wait for it to deassert before sending more data |

These are ordinary GPIO pins on the MCU — not part of the SPI peripheral. Your firmware toggles them manually before and during SPI transactions. A typical sequence: pull DC low → send command byte over SPI → pull DC high → send pixel data bytes over SPI → wait for BUSY to deassert.
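
The typical sequence can be sketched against a small abstraction over the pins. Everything named here (`DisplayPort`, `send_command_with_data`, `RecordingPort`, `Event`) is hypothetical; a real project would use its HAL’s GPIO and SPI types instead, and chip-select handling is assumed to happen inside `spi_write`. The recording implementation exists only so the ordering can be checked without hardware.

```rust
/// What the recording port observed, in order.
#[derive(Debug, PartialEq)]
enum Event {
    Dc(bool),
    Spi(Vec<u8>),
}

/// Hypothetical abstraction over the DC pin, the SPI peripheral, and the BUSY pin.
trait DisplayPort {
    fn set_dc(&mut self, high: bool);
    fn spi_write(&mut self, bytes: &[u8]);
    fn is_busy(&mut self) -> bool;
}

/// Sends one command byte followed by its data bytes, then waits for BUSY to clear.
fn send_command_with_data<P: DisplayPort>(port: &mut P, command: u8, data: &[u8]) {
    port.set_dc(false); // DC low: the next byte is a command
    port.spi_write(&[command]);
    port.set_dc(true); // DC high: the following bytes are data
    port.spi_write(data);
    while port.is_busy() {} // poll until the display deasserts BUSY
}

/// Stand-in for real hardware: logs each action and reports busy for a few polls.
struct RecordingPort {
    log: Vec<Event>,
    busy_polls_left: u32,
}

impl DisplayPort for RecordingPort {
    fn set_dc(&mut self, high: bool) {
        self.log.push(Event::Dc(high));
    }

    fn spi_write(&mut self, bytes: &[u8]) {
        self.log.push(Event::Spi(bytes.to_vec()));
    }

    fn is_busy(&mut self) -> bool {
        if self.busy_polls_left > 0 {
            self.busy_polls_left -= 1;
            true
        } else {
            false
        }
    }
}
```

Keeping the pin handling behind a trait like this is one way to unit-test the ordering on a desktop machine; the command and data bytes passed in are placeholders, and the real values come from the display controller’s datasheet.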

I²C (Inter-Integrated Circuit)

A Different Tradeoff: Fewer Wires, More Complexity

Where SPI uses a dedicated CS wire per slave, I²C uses just two wires total — SDA (data) and SCL (clock) — regardless of how many devices are connected. Each device has a hardcoded 7-bit address (set by the chip manufacturer, sometimes with a few address bits configurable via pins). The master begins every transaction by sending the target address on the bus, and only the device with that address responds.
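
On the wire, the 7-bit address occupies the upper seven bits of the first byte, and the lowest bit tells the device whether the master wants to read (1) or write (0). That read/write bit is not described above; it comes from the I²C specification. A minimal sketch, with `i2c_address_byte` as a hypothetical helper name:

```rust
/// Builds the first byte of an I²C transaction:
/// the 7-bit address in bits 7..1 and the read/write flag in bit 0.
/// Returns `None` if the address does not fit in 7 bits.
fn i2c_address_byte(addr7: u8, read: bool) -> Option<u8> {
    if addr7 > 0x7F {
        return None;
    }
    Some((addr7 << 1) | (read as u8))
}
```

For an example address of 0x76, a write transaction starts with the byte 0xEC and a read transaction with 0xED. This is why some datasheets and tools quote “8-bit addresses” that are twice the 7-bit value.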

This eliminates the need for one CS wire per device, which matters when you have many small peripherals. The cost is bandwidth: I²C standard mode runs at 100 kHz, fast mode at 400 kHz, fast mode plus at 1 MHz — roughly 10–100× slower than SPI. I²C also has more protocol overhead (start conditions, address bytes, ACK/NACK bits per byte) that further reduces effective throughput.

I²C requires pull-up resistors on both SDA and SCL (typically 4.7 kΩ to 3.3 V). This is because I²C uses open-drain signaling: devices can only pull the line low, and the resistor pulls it back high when released. This allows multiple devices to share the same wire without electrical conflict: if two devices pull low simultaneously, the wire is simply low (no short circuit). SPI outputs are push-pull instead, actively driving both high and low, so SPI relies on the CS lines to ensure that only one slave drives MISO at a time.
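
The resulting line behavior is sometimes called wired-AND: the line is high only if no device is pulling it low. A one-function model, purely illustrative (`open_drain_level` is a hypothetical name):

```rust
/// Logic level of an open-drain line.
/// Each entry says whether that device is currently pulling the line low.
/// Returns true (high) only when every device has released the line,
/// in which case the pull-up resistor holds it high.
fn open_drain_level(pulling_low: &[bool]) -> bool {
    !pulling_low.iter().any(|&pulls| pulls)
}
```

Two devices pulling low at once give the same result as one, which is the property that makes sharing the wire electrically safe.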

What Uses I²C

I²C is the standard interface for small, low-bandwidth peripherals where wiring simplicity matters more than speed. Common examples: temperature/humidity sensors (BME280, SHT40), accelerometers and gyroscopes (MPU6050, LSM6DS3), real-time clock chips (DS3231), small OLED displays (SSD1306 128×64 — these screens have so few pixels that I²C’s low bandwidth is sufficient), port expanders, and EEPROM memory chips.

UART (Universal Asynchronous Receiver/Transmitter)

Point-to-Point, No Clock

UART connects exactly two devices — it’s not a bus. Two wires: TX (transmit) and RX (receive), crossed between the devices (MCU’s TX connects to the partner’s RX and vice versa). There is no clock wire and no master/slave relationship — either side can send at any time.

Because there’s no clock, both sides must agree on a baud rate (bits per second, e.g., 115200) in advance, configured independently on each end. Both devices’ internal clocks must be accurate enough that they don’t drift apart within a single byte frame (~10 bit periods). If one side is configured for 115200 and the other for 9600, the receiver gets garbage — another common first-time debugging issue.
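
Two back-of-the-envelope calculations follow from the frame length. The sketch below assumes the common framing of one start bit, eight data bits, no parity, and one stop bit (10 bit periods in total), and uses a simplified drift model in which the receiver resynchronizes on the start edge and samples each bit at its center. The function names are hypothetical.

```rust
/// Bit periods on the wire per transmitted byte, including the single start bit.
fn frame_bits(data_bits: u32, parity: bool, stop_bits: u32) -> u32 {
    1 + data_bits + (parity as u32) + stop_bits
}

/// Best-case payload throughput in bytes per second (frames sent back to back).
fn bytes_per_second(baud: u32, bits_per_frame: u32) -> u32 {
    baud / bits_per_frame
}

/// Largest combined clock mismatch (as a fraction, e.g. 0.05 = 5%) before the
/// sample point of the last bit drifts by half a bit period.
/// Simplified model: the last bit is sampled (bits_per_frame - 0.5) bit periods
/// after the start edge, and the accumulated error there must stay below 0.5.
fn max_clock_mismatch(bits_per_frame: u32) -> f64 {
    0.5 / (f64::from(bits_per_frame) - 0.5)
}
```

For a 10-bit frame this model gives 11,520 bytes per second at 115200 baud and a mismatch limit of roughly 5%, shared between both ends. Real receivers need more margin than that, but it shows why a 115200 versus 9600 mismatch cannot work.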

What Uses UART

UART is primarily used for debug output and human-readable console communication — the embedded equivalent of stdout. On the Nucleo board, the ST-LINK debug probe bridges a UART to USB, so connecting the board to your computer creates a virtual serial port. GPS modules output NMEA sentences over UART. Bluetooth modules (like the HC-05) communicate with the MCU over UART. Some inter-board links use UART for simple command/response traffic.

In embedded Rust, defmt over RTT (Real-Time Transfer) is a more efficient alternative to UART for debug logging — it uses the SWD debug link rather than dedicating a UART peripheral — but UART remains ubiquitous in the broader embedded world.

When to Use Which

The choice is usually dictated by the external device — its datasheet specifies which protocol it supports. But when you’re selecting components for a design, the tradeoffs are:

| Criterion | SPI | I²C | UART |
| --- | --- | --- | --- |
| Typical speed | 1–100 MHz | 100 kHz – 1 MHz | 9600–921600 baud |
| Wire count | 3 shared + 1 CS per device | 2 total (shared) | 2 (point-to-point) |
| Multiple devices | Yes, via CS lines | Yes, via addressing | No (two devices only) |
| Use case | High-bandwidth: displays, flash memory, high-speed ADCs/DACs | Low-bandwidth, many sensors on shared wires | Debug consoles, GPS, Bluetooth modules, inter-board links |
| Data direction | Full-duplex (simultaneous send/receive) | Half-duplex (one direction at a time) | Full-duplex |

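Displays, especially anything with more than a few thousand pixels, almost always use SPI because of the framebuffer size. An 800×480 e-paper display has 384,000 pixels; at 4 bits per pixel (16 gray levels) that is 192,000 bytes of pixel data pushed per refresh, and a 1-bit monochrome panel needs a quarter of that (48,000 bytes). At I²C fast mode (400 kHz), 192,000 bytes would take ~4 seconds for the transfer alone (1,536,000 bits at 400 kbit/s is 3.84 s, before counting ACK bits and addressing overhead). At SPI 4 MHz, it takes ~0.4 seconds.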
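
The estimate can be reproduced with a few lines of arithmetic. This sketch assumes 4 bits per pixel, which is the depth at which 800×480 comes to 192,000 bytes, and it ignores all command overhead. For I²C it counts 9 clocks per byte (8 data bits plus the ACK bit), so the result comes out slightly above the rounded figure in the text. The function names are hypothetical.

```rust
/// Size of a packed framebuffer in bytes.
fn framebuffer_bytes(width: u32, height: u32, bits_per_pixel: u32) -> u32 {
    width * height * bits_per_pixel / 8
}

/// Time to clock `bytes` over a serial link, ignoring command and addressing overhead.
/// `clocks_per_byte` is 8 for SPI and 9 for I²C (8 data bits plus ACK).
fn transfer_seconds(bytes: u32, clock_hz: u32, clocks_per_byte: u32) -> f64 {
    f64::from(bytes) * f64::from(clocks_per_byte) / f64::from(clock_hz)
}
```

With these inputs, SPI at 4 MHz needs about 0.38 s and I²C at 400 kHz about 4.3 s, an order of magnitude apart before any protocol overhead is added.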

Tip

On a Raspberry Pi, these same protocols appear as kernel device files — /dev/spidev0.0, /dev/i2c-1, /dev/ttyAMA0 — rather than as memory-mapped registers. The electrical signals on the wire are identical; the programming interface differs. See Microcontroller vs Single-Computer Board for the full comparison.