CAN Bus Bit Timing, Error Handling, and Fault Confinement

Document ID: CAN-03

Series: Telematics Tutorial Series

Target audience: Advanced. You should understand CAN frame formats (CAN-02), be comfortable with microcontroller register configuration, and have access to a CAN bus for testing.

Cross-references:
- For physical layer design (cabling, termination, transceivers), see CAN-01.
- For frame formats (standard, extended, CAN FD, remote, error), see CAN-02.
- For hardware interfaces (socketCAN, CAN adapters, higher-layer protocols, security), see CAN-04.

Acronyms Used in This Document

Acronym	Expansion	First introduced
BRP	Baud Rate Prescaler	Section 2.4
bxCAN	Basic Extended CAN (STM32 legacy CAN peripheral)	Section 2.4
CAN	Controller Area Network	Title
CRC	Cyclic Redundancy Check	Section 4.3
ECR	Error Counter Register (FDCAN/FlexCAN register name)	Section 6.4
ESI	Error State Indicator	Section 9.1
ESR	Error Status Register (bxCAN register name)	Section 6.4
FDCAN	Flexible Data-rate CAN (STM32 CAN FD peripheral)	Section 2.4
FlexCAN	Flexible Controller Area Network (NXP CAN peripheral IP)	Section 2.4
GPIO	General-Purpose Input/Output	Section 4.5
HAL	Hardware Abstraction Layer (STM32 driver library)	Section 6.4
ISO	International Organization for Standardization	Section 1.4
LEC	Last Error Code (bxCAN register field)	Section 6.4
MCAN	Modular CAN (Bosch CAN FD IP core)	Section 1.4
MCU	Microcontroller Unit	Section 8.2
NRZ	Non-Return-to-Zero (CAN bus encoding method)	Section 4.2
ppm	parts per million	Section 8.2
REC	Receive Error Counter	Section 6
SAE	Society of Automotive Engineers	Section 1.3
SJW	Synchronization Jump Width	Section 1.4
SSP	Secondary Sample Point (CAN FD TDC sample point)	Section 3.2
TDC	Transmitter Delay Compensation	Section 3.2
TEC	Transmit Error Counter	Section 6
TQ	Time Quantum	Section 1.1
TSEG1	Time Segment 1 (combined Prop_Seg + Phase_Seg1 in register)	Section 2.4
TSEG2	Time Segment 2 (Phase_Seg2 in register)	Section 2.4

Learning Objectives

By the end of this document, you will be able to:

- Calculate CAN bit timing parameters (time quanta, segment lengths, sample point) for a target bit rate
- Configure bit timing registers on STM32 (bxCAN/FDCAN), NXP (FlexCAN), and PIC32 (CAN module) microcontrollers
- Identify all five CAN error detection mechanisms and explain when each triggers
- Trace the Transmit Error Counter (TEC) and Receive Error Counter (REC) state machine through error-active, error-passive, and bus-off states
- Implement bus-off recovery strategies in firmware

1. Bit Timing Fundamentals

Every bit on a CAN bus occupies a fixed time period called the bit time. The bit time is divided into smaller segments, each measured in units called time quanta (TQ). Correct bit timing ensures that all nodes on the bus sample each bit at the same relative point — even when their oscillators are not perfectly matched.

1.1 The Time Quantum

A time quantum (TQ) is the smallest unit of time in the CAN bit timing model. It is derived from the controller’s input clock by a prescaler:

TQ = Prescaler / f_clock

Where:
- f_clock is the CAN controller’s input clock frequency (in Hz)
- Prescaler is an integer divider (typically 1 to 1024)
- TQ is the resulting time quantum duration (in seconds)

For example, with an 80 MHz clock and a prescaler of 10:

TQ = 10 / 80,000,000 = 125 ns
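The same arithmetic can be sketched in Python. The function name `time_quantum_ns` is illustrative and not part of any vendor library; it returns the time quantum in nanoseconds for convenience.

```python
def time_quantum_ns(f_clock_hz: int, prescaler: int) -> float:
    """Return the time quantum in nanoseconds: TQ = prescaler / f_clock."""
    if prescaler < 1:
        raise ValueError("prescaler must be a positive integer")
    return prescaler * 1e9 / f_clock_hz


# 80 MHz clock with a prescaler of 10, as in the example above
print(time_quantum_ns(80_000_000, 10))  # 125.0
```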

1.2 Bit Time Segments

Each bit time is divided into four segments:

packet-beta
  0: "Sync"
  1-6: "Prop_Seg (6 TQ)"
  7-13: "Phase_Seg1 (7 TQ)"
  14-15: "Phase_Seg2 (2 TQ)"

Figure: CAN03 01 bit timing segments

📝 Note: The diagrams in this document are written in Mermaid (for example, `packet-beta` for the bit-time layout above). Mermaid diagrams may render differently across GitHub, VS Code, and other Markdown viewers, so verify rendering in your target platform.

Sync_Seg (Synchronization Segment) — always 1 TQ

The synchronization segment is the first segment of every bit. The bus edge (transition between recessive and dominant, or vice versa) is expected to occur within this segment. All nodes use this edge to resynchronize their bit timing.

Prop_Seg (Propagation Segment) — 1 to 8 TQ

Compensates for the physical propagation delay of the signal across the bus. This segment must be long enough to cover the round-trip delay: the time for a signal to travel from the transmitting node to the farthest node and back, plus transceiver delays.

Prop_Seg ≥ 2 × (t_bus_propagation + t_transceiver_delay)
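As a rough sketch, the inequality above can be turned into a minimum Prop_Seg length in TQ. The helper name `min_prop_seg_tq` is hypothetical, and the cable delay of 5 ns/m is an assumed typical figure for twisted-pair cable; substitute the value from your cable's datasheet.

```python
import math

CABLE_DELAY_NS_PER_M = 5.0  # assumption: typical twisted-pair propagation delay


def min_prop_seg_tq(bus_length_m: float, xcvr_delay_ns: float, tq_ns: float) -> int:
    """Smallest Prop_Seg (in TQ) that covers the round-trip delay."""
    one_way_ns = bus_length_m * CABLE_DELAY_NS_PER_M + xcvr_delay_ns
    round_trip_ns = 2 * one_way_ns
    return math.ceil(round_trip_ns / tq_ns)


# 10 m bus, 200 ns transceiver delay, 125 ns time quantum
print(min_prop_seg_tq(10, 200, 125))  # 4
```

The result is only a lower bound; a longer Prop_Seg is often chosen to leave margin for stubs and connector delays.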

Phase_Seg1 (Phase Buffer Segment 1) — 1 to 8 TQ

A buffer segment that can be lengthened by resynchronization. Together with Phase_Seg2, it positions the sample point within the bit time.

Phase_Seg2 (Phase Buffer Segment 2) — 1 to 8 TQ

A buffer segment that can be shortened by resynchronization. Phase_Seg2 must be at least as long as the Information Processing Time (IPT), which is the time the CAN controller needs to calculate the bit value after sampling. IPT is typically 0–2 TQ.

1.3 The Sample Point

The sample point is the instant within the bit time when the bus level is read and interpreted as dominant or recessive. It occurs at the boundary between Phase_Seg1 and Phase_Seg2.

Sample Point (%) = (Sync_Seg + Prop_Seg + Phase_Seg1) / Total_TQ × 100

The sample point position is critical:
- Too early → the signal may not have settled after propagation delay (especially on long buses)
- Too late → insufficient time for Phase_Seg2, reducing the ability to resynchronize on subsequent edges
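A minimal sketch of the sample point formula, assuming segment lengths are given in TQ. The function name `sample_point_pct` is illustrative only.

```python
def sample_point_pct(prop_seg: int, phase_seg1: int, phase_seg2: int, sync_seg: int = 1) -> float:
    """Sample point as a percentage of the bit time."""
    total_tq = sync_seg + prop_seg + phase_seg1 + phase_seg2
    return 100.0 * (sync_seg + prop_seg + phase_seg1) / total_tq


# Segments from the bit-time figure: Prop_Seg 6, Phase_Seg1 7, Phase_Seg2 2
print(sample_point_pct(6, 7, 2))  # 87.5
```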

Recommended sample point positions:

Application	Recommended sample point	Rationale
CAN 2.0 (≤ 1 Mbit/s)	75%–87.5%	CAN in Automation (CiA) recommendation; balances propagation delay compensation with resynchronization margin
CAN FD arbitration phase	80%–87.5%	Same considerations as classical CAN
CAN FD data phase (2 Mbit/s)	70%–80%	Shorter bit time requires earlier sample to allow Phase_Seg2 resynchronization
CAN FD data phase (5 Mbit/s)	60%–70%	Very short bit time; earlier sample needed
SAE J1939 (250 kbit/s)	87.5%	SAE J1939-11 specifies 87.5% ± 2%

⚠️ Warning: All nodes on the same bus must use the same bit rate and a compatible sample point position. A mismatch of more than 5% between any two nodes' sample points can cause intermittent bit errors that are extremely difficult to diagnose because they correlate with specific data patterns.

1.4 Synchronization Jump Width (SJW)

The Synchronization Jump Width (SJW) defines the maximum number of TQ by which Phase_Seg1 or Phase_Seg2 can be lengthened or shortened during resynchronization. SJW allows the CAN controller to compensate for oscillator drift between nodes.

SJW range: 1 to min(4, Phase_Seg1, Phase_Seg2) TQ

A larger SJW provides more tolerance for oscillator frequency differences but reduces the effective noise margin at the sample point. Typical values:

SJW	Max oscillator tolerance (approx.)	When to use
1 TQ	±0.5%	High-quality crystal oscillators
2 TQ	±1.0%	Standard crystals
3 TQ	±1.5%	Ceramic resonators
4 TQ	±2.0%	Low-cost oscillators, long buses

💡 Tip: The CAN specification requires that all nodes' oscillator frequencies be within ±0.5% of the nominal bit rate for reliable operation at 1 Mbit/s. At lower bit rates, the tolerance is more relaxed. If you are using an internal **Resistor-Capacitor (RC)** oscillator on a microcontroller (typically ±2–5% accuracy), you will have problems at bit rates above 125 kbit/s. Use a crystal or crystal oscillator for any production CAN node.

📝 Note: While classical CAN limits SJW to 1–4 TQ (ISO 11898-1), CAN FD controllers often support larger SJW values; the exact maximum is controller-dependent. For example, the Bosch MCAN IP core (used in STM32 FDCAN, SAM E5x, and others) provides a 7-bit nominal SJW field (NSJW, up to 128 TQ) and a 4-bit data-phase field (DSJW, up to 16 TQ). A larger data-phase SJW can help meet the tighter oscillator tolerance requirements at high data rates (2–8 Mbit/s). Consult your controller's reference manual for the supported NSJW and DSJW ranges.
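The resynchronization rule described at the start of this section can be sketched as follows. This is a simplified model: the function name `resync_adjustment` and the sign convention for the phase error (positive when the edge arrives after Sync_Seg) are assumptions made for illustration, and hard synchronization at SOF is not modeled.

```python
def resync_adjustment(phase_error_tq: int, sjw: int) -> tuple:
    """Return (phase_seg1_delta, phase_seg2_delta) in TQ for one resynchronization.

    phase_error_tq > 0: edge arrived late, so Phase_Seg1 is lengthened.
    phase_error_tq < 0: edge arrived early, so Phase_Seg2 is shortened.
    The correction is never larger than SJW.
    """
    correction = min(abs(phase_error_tq), sjw)
    if phase_error_tq > 0:
        return (correction, 0)
    if phase_error_tq < 0:
        return (0, -correction)
    return (0, 0)


print(resync_adjustment(3, 2))   # (2, 0): late edge, correction capped at SJW
print(resync_adjustment(-1, 2))  # (0, -1): early edge, Phase_Seg2 shortened by 1 TQ
```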

2. Calculating Bit Timing Parameters

2.1 Step-by-Step Procedure

Follow these steps to calculate bit timing for any CAN controller:

1. Choose the target bit rate (e.g., 500 kbit/s).
2. Determine the total number of TQ per bit. The bit time must equal an integer number of TQ:

Total_TQ = f_clock / (Prescaler × Bit_Rate)

Choose a Prescaler that gives a Total_TQ between 8 and 25 (the ISO 11898-1 allowed range). Higher TQ counts give finer sample point resolution.

3. Set Sync_Seg = 1 TQ (always fixed).
4. Calculate Prop_Seg based on the bus propagation delay:

Prop_Seg (TQ) = ceil(2 × (t_bus + t_xcvr) / TQ_duration)

Where t_bus is the one-way cable delay and t_xcvr is the total transceiver loop delay.

5. Distribute the remaining TQ between Phase_Seg1 and Phase_Seg2 to achieve the desired sample point.
6. Set SJW to 1–4 TQ based on oscillator quality.
7. Verify that the sample point falls within the recommended range for your application.
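Step 2 of the procedure can be sketched as a small search over Total_TQ values. The helper name `timing_candidates` is hypothetical; it lists every (Prescaler, Total_TQ) pair in the 8 to 25 TQ range that reproduces the target bit rate exactly.

```python
def timing_candidates(f_clock_hz: int, bit_rate_bps: int, min_tq: int = 8, max_tq: int = 25) -> list:
    """List (prescaler, total_tq) pairs that hit the bit rate with no error."""
    candidates = []
    for total_tq in range(min_tq, max_tq + 1):
        prescaler, remainder = divmod(f_clock_hz, bit_rate_bps * total_tq)
        if remainder == 0 and prescaler >= 1:
            candidates.append((prescaler, total_tq))
    return candidates


# 80 MHz clock, 500 kbit/s
print(timing_candidates(80_000_000, 500_000))
# [(20, 8), (16, 10), (10, 16), (8, 20)]
```

An empty list means the clock cannot produce the bit rate exactly with an integer prescaler, in which case a different clock source or a small bit-rate error has to be accepted.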

2.2 Worked Example: 500 kbit/s on STM32

Given:
- Target bit rate: 500 kbit/s
- STM32 CAN clock: 80 MHz (APB1 clock, e.g., an STM32L4 running at full speed)
- Bus length: 10 m (cable delay ≈ 5 ns/m × 10 m = 50 ns one-way)
- Transceiver: MCP2562FD (loop delay ≈ 200 ns total)
- Target sample point: 87.5%

Step 1 — Choose Prescaler and Total TQ:

Bit time = 1 / 500,000 = 2,000 ns = 2 µs
TQ = Prescaler / 80,000,000

For Total_TQ = 16:  TQ = 2,000 / 16 = 125 ns → Prescaler = 80 MHz × 125 ns = 10
For Total_TQ = 20:  TQ = 2,000 / 20 = 100 ns → Prescaler = 80 MHz × 100 ns = 8

Choose Total_TQ = 16, Prescaler = 10 (good resolution, common choice).

Step 2 — Calculate Prop_Seg:

Round-trip delay = 2 × (50 ns + 200 ns) = 500 ns
Prop_Seg_min = ceil(500 / 125) = 4 TQ

We need Prop_Seg >= 4. We also need Phase_Seg1 to stay within the ISO 11898-1 range of 1–8 TQ, which limits how much of the remaining TQ budget can go to Phase_Seg1. Choose Prop_Seg = 6 TQ (provides extra propagation margin and keeps Phase_Seg1 within range).

Step 3 — Calculate Phase segments for 87.5% sample point:

Sample point at 87.5% of 16 TQ = 14 TQ
Sync_Seg + Prop_Seg + Phase_Seg1 = 14 TQ
1 + 6 + Phase_Seg1 = 14
Phase_Seg1 = 7 TQ
Phase_Seg2 = 16 - 14 = 2 TQ

Step 4 — Set SJW:

SJW = min(4, Phase_Seg1, Phase_Seg2) = min(4, 7, 2) = 2 TQ

Final configuration:

Parameter	Value
Prescaler	10
Total TQ per bit	16
Sync_Seg	1 TQ
Prop_Seg	6 TQ
Phase_Seg1 (BS1)	7 TQ
Phase_Seg2 (BS2)	2 TQ
SJW	2 TQ
Sample point	87.5%
Actual bit rate	500 kbit/s

def calc_can_timing(f_clock_hz, bit_rate_bps, total_tq, prop_seg_tq, sample_pct):
    """Calculate CAN bit timing parameters."""
    prescaler, remainder = divmod(f_clock_hz, bit_rate_bps * total_tq)
    if remainder:
        raise ValueError("f_clock / (bit_rate * total_tq) must be an integer prescaler")
    # Time quantum in nanoseconds: TQ = prescaler / f_clock
    tq_ns = prescaler * 1_000_000_000 / f_clock_hz
    sample_tq = round(total_tq * sample_pct / 100)
    phase_seg1 = sample_tq - 1 - prop_seg_tq  # subtract Sync_Seg and Prop_Seg
    phase_seg2 = total_tq - sample_tq
    sjw = min(4, phase_seg1, phase_seg2)

    print(f"Clock: {f_clock_hz/1e6:.0f} MHz, Bit rate: {bit_rate_bps/1000:.0f} kbit/s")
    print(f"Prescaler: {prescaler}")
    print(f"Total TQ: {total_tq}, TQ duration: {tq_ns:.0f} ns")
    print(f"Sync_Seg: 1, Prop_Seg: {prop_seg_tq}, Phase_Seg1: {phase_seg1}, Phase_Seg2: {phase_seg2}")
    print(f"SJW: {sjw}")
    print(f"Sample point: {sample_tq/total_tq*100:.1f}%")
    print(f"Actual bit rate: {f_clock_hz / (prescaler * total_tq):.0f} bit/s")

calc_can_timing(80_000_000, 500_000, 16, 6, 87.5)

# Output
Clock: 80 MHz, Bit rate: 500 kbit/s
Prescaler: 10
Total TQ: 16, TQ duration: 125 ns
Sync_Seg: 1, Prop_Seg: 6, Phase_Seg1: 7, Phase_Seg2: 2
SJW: 2
Sample point: 87.5%
Actual bit rate: 500000 bit/s

This Python function automates the bit timing calculation by taking the clock frequency, target bit rate, desired total TQ count, propagation segment length, and target sample point percentage as inputs. It computes the prescaler, distributes the remaining TQ among Phase_Seg1 and Phase_Seg2, and selects the largest valid SJW. You can reuse this function for any clock/bit-rate combination — change the arguments in the calc_can_timing() call at the bottom.

2.3 Additional Worked Examples

The following examples cover common clock frequencies and target bit rates. Each uses the same step-by-step procedure from Section 2.1.

Example 2: 250 kbit/s on 48 MHz Clock (STM32F0/L0)

Given: f_clock = 48 MHz, target = 250 kbit/s, target sample point = 87.5%, bus length = 25 m.

Bit time = 1 / 250,000 = 4,000 ns = 4 µs
For Total_TQ = 16:  TQ = 4,000 / 16 = 250 ns → Prescaler = 48 MHz × 250 ns = 12
Prop_Seg: Round-trip = 2 × (125 ns + 200 ns) = 650 ns → ceil(650 / 250) = 3 TQ
  Use Prop_Seg = 6 TQ (extra margin for 25 m bus)
Sample point at 87.5% of 16 TQ = 14 TQ
Phase_Seg1 = 14 - 1 - 6 = 7 TQ
Phase_Seg2 = 16 - 14 = 2 TQ
SJW = min(4, 7, 2) = 2 TQ

Parameter	Value
Prescaler	12
Total TQ	16
Sync_Seg	1 TQ
Prop_Seg	6 TQ
Phase_Seg1	7 TQ
Phase_Seg2	2 TQ
SJW	2 TQ
Sample point	87.5%
Actual bit rate	250 kbit/s

Example 3: 1 Mbit/s on 40 MHz Clock (NXP S32K)

Given: f_clock = 40 MHz, target = 1 Mbit/s, target sample point = 80%, bus length = 5 m (short in-ECU backbone).

Bit time = 1 / 1,000,000 = 1,000 ns = 1 µs
For Total_TQ = 20:  TQ = 1,000 / 20 = 50 ns → Prescaler = 40 MHz × 50 ns = 2
For Total_TQ = 10:  TQ = 1,000 / 10 = 100 ns → Prescaler = 40 MHz × 100 ns = 4

Choose Total_TQ = 20 (finer resolution).
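Prop_Seg: Round-trip = 2 × (25 ns + 200 ns) = 450 ns → ceil(450 / 50) = 9 TQ
  Use Prop_Seg = 9 TQ (above the 1–8 TQ range in Section 1.2, so this needs a controller whose
  Prop_Seg or combined TSEG1 field is wide enough; otherwise fall back to Total_TQ = 10)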
Sample point at 80% of 20 TQ = 16 TQ
Phase_Seg1 = 16 - 1 - 9 = 6 TQ
Phase_Seg2 = 20 - 16 = 4 TQ
SJW = min(4, 6, 4) = 4 TQ

Parameter	Value
Prescaler	2
Total TQ	20
Sync_Seg	1 TQ
Prop_Seg	9 TQ
Phase_Seg1	6 TQ
Phase_Seg2	4 TQ
SJW	4 TQ
Sample point	80.0%
Actual bit rate	1 Mbit/s

Example 4: 125 kbit/s on 16 MHz Clock (PIC32 / Arduino)

Given: f_clock = 16 MHz, target = 125 kbit/s, target sample point = 87.5%, bus length = 40 m (industrial network).

Bit time = 1 / 125,000 = 8,000 ns = 8 µs
For Total_TQ = 16:  TQ = 8,000 / 16 = 500 ns → Prescaler = 16 MHz × 500 ns = 8

Prop_Seg: Round-trip = 2 × (200 ns + 200 ns) = 800 ns → ceil(800 / 500) = 2 TQ
  Use Prop_Seg = 6 TQ (generous margin for 40 m bus with multiple stubs)
Sample point at 87.5% of 16 TQ = 14 TQ
Phase_Seg1 = 14 - 1 - 6 = 7 TQ
Phase_Seg2 = 16 - 14 = 2 TQ
SJW = min(4, 7, 2) = 2 TQ

Parameter	Value
Prescaler	8
Total TQ	16
Sync_Seg	1 TQ
Prop_Seg	6 TQ
Phase_Seg1	7 TQ
Phase_Seg2	2 TQ
SJW	2 TQ
Sample point	87.5%
Actual bit rate	125 kbit/s

Example 5: 250 kbit/s SAE J1939 on 80 MHz Clock

Given: f_clock = 80 MHz, target = 250 kbit/s, SAE J1939-11 requires 87.5% ± 2% sample point, bus length = 40 m (heavy truck backbone).

Bit time = 1 / 250,000 = 4,000 ns = 4 µs
For Total_TQ = 16:  TQ = 4,000 / 16 = 250 ns → Prescaler = 80 MHz × 250 ns = 20
For Total_TQ = 20:  TQ = 4,000 / 20 = 200 ns → Prescaler = 80 MHz × 200 ns = 16

Choose Total_TQ = 20 for finer resolution (J1939 requires tight sample point).
Prop_Seg: Round-trip = 2 × (200 ns + 250 ns) = 900 ns → ceil(900 / 200) = 5 TQ
  Use Prop_Seg = 7 TQ (J1939 trucks may have 40 m bus + long stub cables)
Sample point at 87.5% of 20 TQ = 17.5 → round to 18 TQ → 90.0%
  Adjust: for exactly 87.5%, use Total_TQ = 16: 14/16 = 87.5%.
  Revert to Total_TQ = 16, Prescaler = 20, TQ = 250 ns.
Prop_Seg = 6 TQ
Phase_Seg1 = 14 - 1 - 6 = 7 TQ
Phase_Seg2 = 16 - 14 = 2 TQ
SJW = min(4, 7, 2) = 2 TQ

Parameter	Value
Prescaler	20
Total TQ	16
Sync_Seg	1 TQ
Prop_Seg	6 TQ
Phase_Seg1	7 TQ
Phase_Seg2	2 TQ
SJW	2 TQ
Sample point	87.5% (within J1939-11 ±2% requirement)
Actual bit rate	250 kbit/s

📝 Note: Example 5 illustrates a common pitfall — when the desired sample point percentage does not divide evenly into the chosen Total_TQ, you must either adjust Total_TQ or accept a slightly different sample point. For J1939's strict 87.5% requirement, Total_TQ = 8, 16, or 24 all yield an exact 87.5% sample point.
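The divisibility check behind this note can be sketched with exact fractions. The helper name `exact_sample_point_tq` is illustrative; it returns every Total_TQ in the 8 to 25 range for which the requested sample point falls on a whole TQ.

```python
from fractions import Fraction


def exact_sample_point_tq(sample_pct: float, min_tq: int = 8, max_tq: int = 25) -> list:
    """Total_TQ values where sample_pct lands exactly on a TQ boundary."""
    target = Fraction(str(sample_pct)) / 100  # e.g. 87.5 -> 7/8
    return [tq for tq in range(min_tq, max_tq + 1) if (target * tq).denominator == 1]


print(exact_sample_point_tq(87.5))  # [8, 16, 24]
print(exact_sample_point_tq(80))    # [10, 15, 20, 25]
```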

2.4 Platform-Specific Register Configuration

STM32 bxCAN (STM32F1/F2/F4/L4)

The STM32 bxCAN peripheral uses a combined BS1 field that includes Prop_Seg + Phase_Seg1:

/* STM32 bxCAN bit timing for 500 kbit/s, 87.5% sample point */
/* APB1 clock = 80 MHz */
CAN1->BTR = (0 << 30)  |  /* Normal mode (not silent/loopback) */
            (1 << 24)  |  /* SJW = 2 TQ  (register value = SJW - 1) */
            (12 << 16) |  /* BS1 = 13 TQ (register value = BS1 - 1) */
                           /* BS1 = Prop_Seg + Phase_Seg1 = 6 + 7 = 13 */
            (1 << 20)  |  /* BS2 = 2 TQ  (register value = BS2 - 1) */
            (9 << 0);     /* Prescaler = 10 (register value = BRP - 1) */

The BS1 register field in STM32 bxCAN combines Prop_Seg and Phase_Seg1 into a single value. This is why you see BS1 = 13 (= Prop_Seg 6 + Phase_Seg1 7) rather than separate fields.
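To cross-check a hand-written register value, the packing can be sketched in Python. The bit positions follow the C snippet above (SJW at bit 24, BS2 at bit 20, BS1 at bit 16, prescaler at bit 0); the function name `encode_bxcan_btr` and the range limits are assumptions based on the bxCAN field widths, so confirm them against your device's reference manual.

```python
def encode_bxcan_btr(prescaler: int, bs1_tq: int, bs2_tq: int, sjw_tq: int) -> int:
    """Pack bit timing values into a bxCAN BTR word (normal mode, no loopback)."""
    # Assumed field ranges: BRP 10 bits, TS1 4 bits, TS2 3 bits, SJW 2 bits
    if not 1 <= prescaler <= 1024:
        raise ValueError("prescaler out of range")
    if not 1 <= bs1_tq <= 16:
        raise ValueError("BS1 (Prop_Seg + Phase_Seg1) out of range")
    if not 1 <= bs2_tq <= 8:
        raise ValueError("BS2 out of range")
    if not 1 <= sjw_tq <= 4:
        raise ValueError("SJW out of range")
    # Every field stores "actual value - 1"
    return ((sjw_tq - 1) << 24) | ((bs2_tq - 1) << 20) | ((bs1_tq - 1) << 16) | (prescaler - 1)


# 500 kbit/s at 80 MHz: prescaler 10, BS1 = 6 + 7, BS2 = 2, SJW = 2
print(hex(encode_bxcan_btr(10, 6 + 7, 2, 2)))  # 0x11c0009
```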

STM32 FDCAN (STM32G4/H7)

The FDCAN peripheral has separate nominal and data bit timing registers:

/* STM32 FDCAN nominal bit timing for 500 kbit/s, 87.5% sample point */
/* FDCAN kernel clock = 80 MHz */
FDCAN1->NBTP = (1 << 25)  |  /* NSJW = 2 TQ    (bits 31:25, register value = NSJW - 1) */
               (9 << 16)  |  /* NBRP = 10      (bits 24:16, register value = BRP - 1) */
               (12 << 8)  |  /* NTSEG1 = 13 TQ (bits 15:8, Prop + Phase1, reg = val - 1) */
               (1 << 0);     /* NTSEG2 = 2 TQ  (bits 6:0, register value = val - 1) */

/* Data bit timing for 2 Mbit/s, 75% sample point */
FDCAN1->DBTP = (1 << 0)   |  /* DSJW = 2 TQ  (register value = val - 1) */
               (13 << 8)  |  /* DTSEG1 = 14 TQ (Prop 7 + Phase1 7, reg = val - 1) */
               (4 << 4)   |  /* DTSEG2 = 5 TQ  (register value = val - 1) */
               (1 << 16);    /* DBRP = 2   (register value = val - 1) */

NXP FlexCAN (i.MX, S32K, Kinetis)

#include "fsl_flexcan.h"  /* NXP MCUXpresso SDK FlexCAN driver */

/* NXP FlexCAN bit timing for 500 kbit/s, 87.5% sample point */
/* CAN Protocol Engine clock = 80 MHz */
CAN0->CTRL1 = (9 << 24)  |  /* PRESDIV = 10   (bits 31:24, register value = prescaler - 1) */
              (1 << 22)  |  /* RJW = 2 TQ     (bits 23:22, register value = SJW - 1) */
              (6 << 19)  |  /* PSEG1 = 7 TQ   (bits 21:19, register value = val - 1) */
              (1 << 16)  |  /* PSEG2 = 2 TQ   (bits 18:16, register value = val - 1) */
              (5 << 0);     /* PROPSEG = 6 TQ (bits 2:0, register value = val - 1) */

FlexCAN keeps Prop_Seg and Phase_Seg1 as separate register fields, which maps directly to the ISO 11898-1 bit timing model.

PIC32 CAN Module (Microchip)

#include <xc.h>           /* Microchip XC32 compiler device header */
#include <sys/attribs.h>  /* Interrupt attribute macros */

/* PIC32 CAN bit timing for 500 kbit/s, 87.5% sample point */
/* PBCLK = 80 MHz; the PIC32 CAN module derives TQ = 2 × (BRP + 1) / f_clock */
C1CFG = (4 << 0)   |  /* BRP = 4: TQ = 2 × 5 / 80 MHz = 125 ns */
        (1 << 6)   |  /* SJW = 2 TQ    (bits 7:6, register value = SJW - 1) */
        (5 << 8)   |  /* PRSEG = 6 TQ  (bits 10:8, register value = val - 1) */
        (6 << 11)  |  /* SEG1PH = 7 TQ (bits 13:11, register value = val - 1) */
        (1 << 15)  |  /* SEG2PHTS = 1  (bit 15, freely programmable Phase_Seg2) */
        (1 << 16);    /* SEG2PH = 2 TQ (bits 18:16, register value = val - 1) */

📝 Note: All four platforms use "register value = actual value − 1" encoding for the segment and SJW fields. A register value of 0 means 1 TQ, a value of 1 means 2 TQ, and so on. This is the single most common source of off-by-one bit timing configuration errors. The prescaler fields follow the same rule on STM32 and FlexCAN, but the PIC32 BRP field includes a fixed divide-by-two, so an overall prescaler of 10 is written as BRP = 4.

2.5 Common Bit Timing Configurations

These pre-calculated configurations cover the most common CAN bit rates, grouped by CAN clock frequency.

📝 Note: These tables cover the most common clock frequencies. For other frequencies, use the Python calculator in Section 2.2 or a CAN bit-timing tool (e.g., Kvaser Bit Timing Calculator, http://www.bittiming.can-wiki.info/).

For an 80 MHz CAN clock:

Bit rate	Prescaler	Total TQ	Prop_Seg	Phase_Seg1	Phase_Seg2	SJW	Sample point
1 Mbit/s	5	16	6	7	2	2	87.5%
500 kbit/s	10	16	6	7	2	2	87.5%
250 kbit/s	20	16	6	7	2	2	87.5%
125 kbit/s	40	16	6	7	2	2	87.5%
100 kbit/s	50	16	6	7	2	2	87.5%
50 kbit/s	100	16	6	7	2	2	87.5%

For a 48 MHz CAN clock (common in STM32F0/L0):

Bit rate	Prescaler	Total TQ	Prop_Seg	Phase_Seg1	Phase_Seg2	SJW	Sample point
1 Mbit/s	3	16	6	7	2	2	87.5%
500 kbit/s	6	16	6	7	2	2	87.5%
250 kbit/s	12	16	6	7	2	2	87.5%
125 kbit/s	24	16	6	7	2	2	87.5%

For a 40 MHz CAN clock (common in NXP S32K, FlexCAN platforms):

Bit rate	Prescaler	Total TQ	Prop_Seg	Phase_Seg1	Phase_Seg2	SJW	Sample point
1 Mbit/s	5	8	4	2	1	1	87.5%
500 kbit/s	5	16	6	7	2	2	87.5%
250 kbit/s	10	16	6	7	2	2	87.5%
125 kbit/s	20	16	6	7	2	2	87.5%

📝 Note: The 40 MHz / 1 Mbit/s entry uses Total_TQ = 8 (not 16) because 40 MHz / (1 Mbit/s × 16) = 2.5, which is not an integer prescaler. With only 8 TQ per bit, Phase_Seg2 is limited to 1 TQ and SJW = 1, which reduces oscillator tolerance. Total_TQ = 20 with Prescaler = 2 (Example 3 in Section 2.3) gives finer resolution at the same bit rate if your controller's segment fields allow it. Where possible, reserve a 40 MHz clock for CAN FD data-phase rates rather than 1 Mbit/s classical CAN.

For a 16 MHz CAN clock (common in PIC32, Arduino, MCP2515):

Bit rate	Prescaler	Total TQ	Prop_Seg	Phase_Seg1	Phase_Seg2	SJW	Sample point
1 Mbit/s	1	16	6	7	2	2	87.5%
500 kbit/s	2	16	6	7	2	2	87.5%
250 kbit/s	4	16	6	7	2	2	87.5%
125 kbit/s	8	16	6	7	2	2	87.5%

💡 Tip: The sample point of 87.5% with 16 TQ is a near-universal choice for classical CAN at all standard bit rates. When in doubt, start with this configuration and adjust only if bus-specific requirements (very long bus, many nodes, ceramic resonators) demand it.

3. CAN FD Dual Bit Rate Configuration

CAN FD frames can use two different bit rates: a slower arbitration phase rate (up to 1 Mbit/s, same as classical CAN) and a faster data phase rate (up to 8 Mbit/s). The bit rate switches at the Bit Rate Switch (BRS) bit and switches back at the CRC delimiter.

3.1 Data Phase Timing Constraints

The data phase operates at higher speeds, which means:

Fewer TQ per bit (as few as 5–10 TQ at 5 Mbit/s)
Less room for resynchronization (Phase_Seg2 may be only 1–2 TQ)
Tighter transceiver symmetry requirements (≤ 5 ns per ISO 11898-2:2016)
Transmitter Delay Compensation (TDC) becomes necessary

3.2 Transmitter Delay Compensation

At high data-phase bit rates, the loop delay through the transceiver (TXD → CAN bus → RXD) may exceed one bit time. Without compensation, the transmitting node would sample its own transmitted bits incorrectly.

Transmitter Delay Compensation (TDC) solves this by measuring the actual transceiver loop delay and shifting the transmitter’s sample point accordingly. Most CAN FD controllers implement TDC automatically — you configure the TDC Offset (TDCO) register based on the transceiver’s typical loop delay.

TDCO ≈ transceiver_loop_delay / TQ_data

For a transceiver with 200 ns loop delay and a data-phase TQ of 25 ns:

TDCO = 200 / 25 = 8 TQ

/* STM32 FDCAN transmitter delay compensation */
FDCAN1->TDCR = (8 << 8);  /* TDCO = 8 TQ (bits 14:8; bits 6:0 hold the TDCF filter value) */

/* Enable TDC — required for data rates ≥ 2 Mbit/s */
FDCAN1->DBTP |= (1 << 23);  /* TDC enable bit */

This register configuration sets the TDC offset to 8 TQ and enables the TDC feature in the data-phase bit timing register. Without enabling TDC, the offset value is ignored.

Secondary Sample Point (SSP)

When TDC is active, the CAN FD controller uses a Secondary Sample Point (SSP) to sample transmitted bits during the data phase. The SSP is positioned at TDCO time quanta after the sample point of the transmitted bit, compensating for the round-trip delay through the transceiver. The controller compares the bit it transmitted against what it reads back at the SSP — if they differ, a bit error is flagged.

Normal sample point:    Used during arbitration phase (reads bus level at Phase_Seg1/Phase_Seg2 boundary)
Secondary sample point: Used during data phase by the TRANSMITTER only
                        SSP position = normal sample point + TDCO
                        Compensates for TXD → bus → RXD loop delay

📝 Note: Some controllers (e.g., Bosch MCAN / STM32 FDCAN) also provide a **TDC Filter (TDCF)** field that sets the minimum accepted SSP position. Dominant edges on RXD that would place the SSP earlier than TDCF are ignored during the delay measurement, so an early glitch does not cut the measurement short. On MCAN the filter is active only when TDCF is greater than TDCO. Consult your controller's reference manual for the units and valid range.

3.3 CAN FD Timing Example: 500 kbit/s Arbitration, 2 Mbit/s Data

Parameter	Arbitration phase	Data phase
Bit rate	500 kbit/s	2 Mbit/s
CAN clock	80 MHz	80 MHz
Prescaler	10	2
Total TQ	16	20
TQ duration	125 ns	25 ns
Sync_Seg	1	1
Prop_Seg	6	7
Phase_Seg1	7	7
Phase_Seg2	2	5
SJW	2	4
Sample point	87.5%	75.0%
TDCO	—	8 TQ
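The derived rows of this table (TQ duration, Total TQ, bit rate, sample point) can be cross-checked from the prescaler and segment values. The sketch below uses a hypothetical helper, `phase_summary`, and assumes Sync_Seg is always 1 TQ.

```python
def phase_summary(f_clock_hz: int, prescaler: int, prop_seg: int, phase_seg1: int, phase_seg2: int) -> dict:
    """Derive TQ duration, bit rate, and sample point for one bit-rate phase."""
    total_tq = 1 + prop_seg + phase_seg1 + phase_seg2  # Sync_Seg is fixed at 1 TQ
    return {
        "tq_ns": prescaler * 1e9 / f_clock_hz,
        "total_tq": total_tq,
        "bit_rate_bps": f_clock_hz / (prescaler * total_tq),
        "sample_point_pct": 100.0 * (1 + prop_seg + phase_seg1) / total_tq,
    }


arbitration = phase_summary(80_000_000, 10, 6, 7, 2)
data = phase_summary(80_000_000, 2, 7, 7, 5)
print(arbitration)  # 125 ns TQ, 16 TQ, 500 kbit/s, 87.5%
print(data)         # 25 ns TQ, 20 TQ, 2 Mbit/s, 75.0%
```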

graph LR
    subgraph ARB["Arbitration Phase (500 kbit/s)"]
        direction LR
        A_SOF["SOF"] --- A_ID["ID<br/>11b"] --- A_RRS["RRS"] --- A_IDE["IDE"] --- A_FDF["FDF"] --- A_RES["res"]
    end

    subgraph SWITCH1[" BRS=1"]
        direction LR
        BRS["BRS<br/>Rate<br/>Switch"]
    end

    subgraph DATAPH["Data Phase (2 Mbit/s), 4× faster"]
        direction LR
        D_ESI["ESI"] --- D_DLC["DLC"] --- D_DATA["Data<br/>(0-64B)"] --- D_CRC["CRC<br/>17/21b"]
    end

    subgraph SWITCH2[" Switch back"]
        direction LR
        CRCD["CRC<br/>Del"]
    end

    subgraph ARBACK["Arbitration Rate (500 kbit/s)"]
        direction LR
        B_ACK["ACK"] --- B_ACKD["ACK Del"] --- B_EOF["EOF<br/>7b"]
    end

    ARB --- SWITCH1 --- DATAPH --- SWITCH2 --- ARBACK

    style ARB fill:#E3F2FD,stroke:#1565C0
    style DATAPH fill:#FFF3E0,stroke:#E65100
    style ARBACK fill:#E3F2FD,stroke:#1565C0
    style SWITCH1 fill:#FFCDD2,stroke:#D32F2F
    style SWITCH2 fill:#FFCDD2,stroke:#D32F2F
    style BRS fill:#FF5722,stroke:#333,color:#fff
    style CRCD fill:#FF5722,stroke:#333,color:#fff

Figure: CAN03 04 canfd dual bitrate

3.4 CAN FD Data-Phase Bit Rate Selection Guide

Selecting the appropriate CAN FD data-phase bit rate involves balancing throughput needs against physical constraints:

Data Rate	Max Bus Length	When to Use	Transceiver Examples
2 Mbit/s	~20 m	Default choice for most automotive/industrial CAN FD. Sufficient for all standard DLC values. Relaxed timing requirements.	TJA1443, MCP2558FD, TCAN4550
5 Mbit/s	~8 m	High-throughput applications: ECU flashing, camera data, large diagnostic uploads. Requires quality cabling and careful PCB layout.	TJA1463, MCP2558FD (selected modes)
8 Mbit/s	~2 m	Lab/bench setups, point-to-point links. Not practical for vehicle-length bus runs.	TJA1463 (bench mode)

At 2 Mbit/s, standard automotive twisted-pair cabling (ISO 11898-2) works reliably up to ~20 meters with proper termination. This covers most in-vehicle bus segments.

At 5 Mbit/s, the transmitter delay compensation (TDC) feature of the CAN FD controller becomes critical. Enable TDC and configure the offset to match your transceiver’s loop delay (typically 100-250 ns).

Bus length limitation formula: max_length_m ≈ propagation_delay_budget_ns / (5 ns/m) where the propagation delay budget is derived from the data-phase bit time minus the configured segments.

4. Error Detection Mechanisms

CAN implements five independent error detection mechanisms. Together, they provide a Hamming distance of 6, meaning up to 5 randomly distributed bit errors in a frame are guaranteed to be detected. The CAN specification bounds the residual probability of an undetected corrupted message at less than 4.7 × 10⁻¹¹ times the message error rate.

4.1 Bit Error

A bit error occurs when a transmitting node reads back a bus value that differs from what it transmitted — outside of the arbitration field (where this is expected and constitutes lost arbitration rather than an error).

Detection mechanism: The transmitter compares each bit it sends with the actual bus level during the same bit time. If they differ (and the node is not in the arbitration field or the ACK slot), a bit error is signaled.

Example: A node transmits a dominant bit (0) but the bus reads recessive (1). This could indicate a hardware fault in the transmitting transceiver or a bus short that prevents the node from driving the bus.

What it looks like in practice: Bit errors are unique in that only the transmitting node detects them — other nodes on the bus see the error frame but not the original bit mismatch. On an oscilloscope, you will see the transmitter driving CAN_H and CAN_L to the expected differential voltage, but the received signal (at the RXD pin) disagrees. Common root causes include a damaged transceiver output stage, a CAN_H or CAN_L short to ground or V+, or a bus termination fault that causes reflections to corrupt the signal at the transmitter’s own RXD input.
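The transmitter's bit monitoring can be sketched as a small decision function. This is a simplified model: the name `monitor_transmitted_bit` and the string labels for frame fields are assumptions for illustration. It also refines the rule slightly: only a recessive bit overwritten by a dominant level counts as lost arbitration, while a dominant bit read back as recessive is a bit error in any field.

```python
DOMINANT, RECESSIVE = 0, 1


def monitor_transmitted_bit(sent: int, read_back: int, field: str = "data") -> str:
    """Classify one transmitted bit from the transmitter's point of view."""
    if field == "ack_slot":
        # The transmitter sends recessive here; dominant means a receiver acknowledged
        return "acknowledged" if read_back == DOMINANT else "ack_error"
    if sent == read_back:
        return "ok"
    if field == "arbitration" and sent == RECESSIVE:
        # Another node drove dominant: this node lost arbitration, no error
        return "arbitration_lost"
    return "bit_error"


print(monitor_transmitted_bit(DOMINANT, RECESSIVE, "data"))         # bit_error
print(monitor_transmitted_bit(RECESSIVE, DOMINANT, "arbitration"))  # arbitration_lost
```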

4.2 Stuff Error

A stuff error occurs when a receiver detects six consecutive bits of the same polarity in the bit-stuffed region of a frame (SOF through CRC). CAN uses Non-Return-to-Zero (NRZ) encoding, which means the signal level stays constant for consecutive identical bits — without transitions, receivers cannot resynchronize their bit timing. The bit stuffing rule compensates for this by requiring a stuff bit of opposite polarity after every five consecutive identical bits. Six identical bits in a row means a stuff bit is missing.

Detection mechanism: The receiver counts consecutive same-polarity bits. If the count reaches 6 in the stuffed region of the frame, a stuff error is signaled.

What it looks like in practice: Stuff errors often appear as bursts — when EMI corrupts a portion of a frame, it may flip several consecutive bits to the same value, violating the stuffing rule. On a logic analyzer, you will see a run of 6+ identical bits where a stuff bit should have appeared. Stuff errors are also the most common error type during clock drift problems: if the receiver’s sample point slowly drifts relative to the transmitter’s bit boundaries, it may eventually sample the wrong side of a stuff bit transition and “miss” the stuff bit entirely.
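The stuffing rule and the receiver-side check can be sketched on plain lists of bits (0 = dominant, 1 = recessive). The function names `stuff_bits` and `has_stuff_error` are illustrative; a real controller performs this in hardware on the SOF-through-CRC region only.

```python
def stuff_bits(bits: list) -> list:
    """Insert a stuff bit of opposite polarity after five identical bits."""
    out = []
    run_value, run_len = None, 0
    for bit in bits:
        out.append(bit)
        if bit == run_value:
            run_len += 1
        else:
            run_value, run_len = bit, 1
        if run_len == 5:
            stuffed = 1 - bit
            out.append(stuffed)
            run_value, run_len = stuffed, 1  # the stuff bit starts a new run
    return out


def has_stuff_error(bits: list) -> bool:
    """Return True if six consecutive identical bits appear."""
    run_value, run_len = None, 0
    for bit in bits:
        if bit == run_value:
            run_len += 1
        else:
            run_value, run_len = bit, 1
        if run_len >= 6:
            return True
    return False


print(stuff_bits([0, 0, 0, 0, 0, 0]))  # [0, 0, 0, 0, 0, 1, 0]
print(has_stuff_error([0] * 6))        # True
```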

4.3 CRC Error

A CRC error occurs when a receiver’s independently calculated Cyclic Redundancy Check (CRC) does not match the CRC value transmitted in the frame.

Detection mechanism: Each receiver computes the CRC over the received SOF through data field bits. After the CRC field is received, the computed CRC is compared to the transmitted CRC. If they differ, a CRC error is signaled.

What it looks like in practice: CRC errors indicate that at least one bit in the frame was corrupted during transmission, but the corruption was subtle enough to pass the stuff check and form checks. This is the “catch-all” error detector — if a single bit flip occurs anywhere in the frame content, and it does not happen to violate the stuffing rule or a fixed-form field, the CRC will detect it. On a bus analyzer, CRC errors appear as otherwise normal-looking frames that fail the checksum. If CRC errors are intermittent and affect multiple message IDs, suspect EMI or marginal bit timing. If they affect only one message ID, the transmitting node may have a software bug that corrupts its TX buffer.
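For classical CAN, the receiver's check can be sketched with a bitwise CRC-15. The generator polynomial 0x4599 is taken from the CAN specification rather than from this document, and the helper names are illustrative. The input is the destuffed bit sequence from SOF through the data field; CAN FD uses different 17-bit and 21-bit polynomials that are not shown here.

```python
CAN_CRC15_POLY = 0x4599  # x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1


def crc15(bits: list) -> int:
    """Bitwise CRC-15 over a list of 0/1 values, initial register value 0."""
    crc = 0
    for bit in bits:
        crc_next = bit ^ ((crc >> 14) & 1)
        crc = (crc << 1) & 0x7FFF
        if crc_next:
            crc ^= CAN_CRC15_POLY
    return crc


def crc_to_bits(crc: int) -> list:
    """Expand a 15-bit CRC into bits, most significant bit first."""
    return [(crc >> i) & 1 for i in range(14, -1, -1)]


frame = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]  # arbitrary example bits
transmitted_crc = crc15(frame)

corrupted = frame.copy()
corrupted[5] ^= 1  # flip a single bit in transit
print(crc15(corrupted) != transmitted_crc)  # True: the receiver signals a CRC error
```

Running the CRC over the frame bits followed by the transmitted CRC bits yields zero for an intact frame, which is an equivalent way to perform the comparison.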

4.4 Form Error

A form error occurs when a fixed-format field contains an illegal value. The fixed-format fields are:

Field	Required value
CRC delimiter	Recessive (1)
ACK delimiter	Recessive (1)
EOF (all 7 bits)	Recessive (1)

Detection mechanism: The receiver checks each fixed-format bit. If any of these bits is dominant when it must be recessive, a form error is signaled.

📝 Note: The SOF bit is also fixed (always dominant), but an incorrect SOF would prevent frame reception from starting in the first place — it is not classified as a form error.

What it looks like in practice: Form errors point to severe bus corruption — the disturbance is powerful enough to flip bits in the fixed-format fields at the end of the frame. On an oscilloscope, you may see a dominant glitch during the EOF field (which must be 7 recessive bits). Form errors during the CRC delimiter or ACK delimiter often indicate bus contention — two nodes attempting to transmit simultaneously outside the arbitration phase, which should not happen on a correctly configured bus and may indicate a node with faulty CAN controller logic or a bit timing mismatch so severe that nodes disagree on frame boundaries.
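The fixed-form check from the table above can be sketched as follows. The function name `has_form_error` is hypothetical, and the model is simplified: it treats all seven EOF bits alike, whereas real controllers apply additional rules to the last EOF bit.

```python
RECESSIVE = 1


def has_form_error(crc_delimiter: int, ack_delimiter: int, eof_bits: list) -> bool:
    """Return True if any fixed-form bit is dominant instead of recessive."""
    if len(eof_bits) != 7:
        raise ValueError("EOF must contain exactly 7 bits")
    fixed_form_bits = [crc_delimiter, ack_delimiter, *eof_bits]
    return any(bit != RECESSIVE for bit in fixed_form_bits)


print(has_form_error(1, 1, [1] * 7))                # False
print(has_form_error(1, 1, [1, 1, 0, 1, 1, 1, 1]))  # True: dominant glitch in EOF
```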

4.5 ACK Error

An ACK error occurs when the transmitter does not receive a dominant ACK bit during the ACK slot. This means no receiver acknowledged the frame.

Detection mechanism: The transmitter sends the ACK slot as recessive and reads the bus. If the bus is still recessive during the ACK slot, no receiver acknowledged — the transmitter signals an ACK error.

What it looks like in practice: ACK errors are the easiest to diagnose — they almost always mean the transmitting node is alone on the bus. Common scenarios: only one Microcontroller Unit (MCU) is powered on during development, a connector is unplugged, or all other nodes have entered bus-off. On a scope, you will see a complete, well-formed frame followed by a recessive ACK slot (no receiver pulled it dominant). If you see ACK errors on a bus with multiple powered nodes, check that the other nodes’ CAN controllers are actually initialized and out of reset — a common mistake is configuring the MCU’s General-Purpose Input/Output (GPIO) pins for CAN but forgetting to start the CAN peripheral.

4.6 Error Detection Summary

Error type	Detected by	Where in frame	What it indicates
Bit error	Transmitter	Anywhere except arbitration + ACK slot	Transmitter cannot drive the bus correctly
Stuff error	Receiver	SOF through CRC (stuffed region)	Data corruption during transmission
CRC error	Receiver	After CRC field	Data corruption detected by redundancy check
Form error	Receiver	CRC del, ACK del, EOF	Fixed-format fields corrupted
ACK error	Transmitter	ACK slot	No receiver acknowledged the frame

graph TB
    subgraph FRAME["CAN Frame Fields"]
        direction LR
        SOF["SOF"] --- ID["ID"] --- CTRL["Control"] --- DATA["Data"] --- CRC["CRC"] --- CRCD["CRC Del"] --- ACK["ACK Slot"] --- ACKD["ACK Del"] --- EOF_F["EOF"]
    end

    subgraph ERRORS["Error Detection Mechanisms"]
        direction TB
        BE[" Bit Error (Tx)
Entire frame except arbitration + ACK slot"]
        SE[" Stuff Error (Rx)
SOF through CRC sequence (stuffed region)"]
        CE[" CRC Error (Rx)
Comparison after CRC field"]
        FE[" Form Error (Rx)
CRC Del + ACK Del + EOF only"]
        AE[" ACK Error (Tx)
ACK slot only"]
    end

    FRAME --> ERRORS

    style BE fill:#FFCDD2,stroke:#D32F2F
    style SE fill:#FFE0B2,stroke:#F57C00
    style CE fill:#BBDEFB,stroke:#1976D2
    style FE fill:#E1BEE7,stroke:#7B1FA2
    style AE fill:#C8E6C9,stroke:#388E3C

Figure: CAN03 02 error detection

4.7 Practical Error Detection — What Errors Look Like in candump

When candump is given an error mask in its filter (with -e adding a human-readable decode), CAN errors appear as special frames with the error flag bit (0x20000000) set in the CAN ID. The remaining ID bits give the error class, and the data bytes carry the details. The frames below are illustrative and follow the constants in include/uapi/linux/can/error.h; exact contents vary by driver, and many drivers report per-frame protocol errors only when bus-error reporting (berr-reporting) is enabled on the interface.

# Log data frames plus every error class (#FFFFFFFF is the error mask)
$ candump -e can0,0:0,#FFFFFFFF

# Bit Error: transmitter read back a different level than it sent
  can0  20000008   [8]  00 00 81 00 00 00 00 00
  #     ^^^^^^^^              ^^
  #     Error frame           data[2] = 0x81: CAN_ERR_PROT_BIT (0x01)
  #     + PROT flag           plus CAN_ERR_PROT_TX (0x80, error on transmission)
  #                           Common cause: faulty transceiver or bus short

# Stuff Error: a node detected more than 5 consecutive same-value bits
  can0  20000008   [8]  00 00 04 00 00 00 00 00
  #     ^^^^^^^^              ^^
  #     Error frame           data[2] = 0x04: CAN_ERR_PROT_STUFF
  #     + PROT flag           Common cause: data corruption, clock drift

# CRC Error: computed CRC doesn't match received CRC
  can0  20000008   [8]  00 00 00 08 00 00 00 00
  #     ^^^^^^^^                 ^^
  #     Error frame              data[3] = 0x08: CAN_ERR_PROT_LOC_CRC_SEQ
  #     + PROT flag              (error.h has no CRC type bit; drivers flag the location)
  #                              Common cause: EMI, marginal bit timing

# Form Error: fixed-form bit field has wrong value (e.g., bad EOF)
  can0  20000008   [8]  00 00 02 1A 00 00 00 00
  #                           ^^ ^^
  #                           |  data[3] = 0x1A: CAN_ERR_PROT_LOC_EOF
  #                           data[2] = 0x02: CAN_ERR_PROT_FORM
  #                           Common cause: bus contention, dominant bit stuck

# ACK Error: transmitter didn't see dominant ACK bit
  can0  20000020   [8]  00 00 00 00 00 00 00 00
  #     ^^^^^^^^
  #     Error frame + ACK flag (CAN_ERR_ACK): no ACK received on transmission
  #     Usually means: only one node on bus, or all other nodes in bus-off

# Controller state change: TX error warning level reached
  can0  20000004   [8]  00 08 00 00 00 00 60 00
  #     ^^^^^^^^           ^^             ^^
  #     Error frame        |              data[6] = 0x60: TEC = 96
  #     + CRTL flag        data[1] = 0x08: CAN_ERR_CRTL_TX_WARNING

# Bus-Off Entry
  can0  20000040   [8]  00 00 00 00 00 00 00 00
  #     ^^^^^^^^
  #     Bus-off flag: this node has entered bus-off state
To decode error frames programmatically in Python, check if can_id & 0x20000000 is set, then inspect the data bytes per the Linux kernel CAN error frame format (include/uapi/linux/can/error.h).
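
A minimal decoder along those lines might look like the sketch below. The bit values are taken from the kernel header named above, but the dictionary names, the descriptive strings, and the function `decode_error_frame` are assumptions made for illustration; only the error class and the protocol error type (data[2]) are decoded here.

```python
CAN_ERR_FLAG = 0x20000000  # marks a frame as an error frame

# Error class bits in can_id (descriptions paraphrased from linux/can/error.h)
ERROR_CLASSES = {
    0x001: "TX timeout",
    0x002: "lost arbitration",
    0x004: "controller problem",
    0x008: "protocol violation",
    0x010: "transceiver status",
    0x020: "no ACK on transmission",
    0x040: "bus-off",
    0x080: "bus error",
    0x100: "controller restarted",
}

# Protocol error type bits in data[2], meaningful when class bit 0x008 is set
PROTOCOL_TYPES = {
    0x01: "single bit error",
    0x02: "form error",
    0x04: "stuff error",
    0x08: "unable to send dominant bit",
    0x10: "unable to send recessive bit",
    0x20: "bus overload",
    0x40: "active error announcement",
    0x80: "error occurred on transmission",
}

def decode_error_frame(can_id, data):
    """Return a dict describing an error frame, or None for a normal frame."""
    if not can_id & CAN_ERR_FLAG:
        return None
    classes = [name for bit, name in ERROR_CLASSES.items() if can_id & bit]
    protocol = []
    if can_id & 0x008:
        protocol = [name for bit, name in PROTOCOL_TYPES.items() if data[2] & bit]
    return {"classes": classes, "protocol": protocol}

print(decode_error_frame(0x20000008, bytes.fromhex("0000040000000000")))
print(decode_error_frame(0x20000040, bytes(8)))
```

Which classes and data bytes a given interface actually fills in depends on its driver, so treat missing fields as "not reported" rather than "no error".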

5. Error Signaling: Error Frames

When a node detects an error, it signals it by transmitting an error frame, starting at the very next bit (for a CRC error, at the bit after the ACK delimiter). The error frame destroys the current frame transmission, forcing the transmitter to retry.

5.1 Error Frame Structure

An error frame consists of two fields:

Error flag — 6 bits of the same polarity
Active error flag: 6 dominant bits (transmitted by error-active nodes)
Passive error flag: 6 recessive bits (transmitted by error-passive nodes)
Error delimiter — 8 recessive bits

Error Active:    | 0 0 0 0 0 0 | 1 1 1 1 1 1 1 1 |
                   Error Flag      Error Delimiter
                  (6 dominant)      (8 recessive)

Error Passive:   | 1 1 1 1 1 1 | 1 1 1 1 1 1 1 1 |
                   Error Flag      Error Delimiter
                  (6 recessive)    (8 recessive)

Why 6 dominant bits? The active error flag intentionally violates the bit stuffing rule (which inserts a stuff bit after 5 consecutive same-polarity bits). This violation guarantees that every other node on the bus also detects a stuff error — even if the original error was local to the detecting node. This is how CAN propagates error awareness across the entire network.

Why passive error flags are recessive: An error-passive node has accumulated many errors and may itself be the source of bus problems. By transmitting recessive error flags, it avoids disrupting valid communication from other nodes. A passive error flag is only visible if the bus is already idle (recessive) — it does not override ongoing dominant transmissions.
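
The stuff-rule violation caused by an active error flag can be illustrated with a short run-length check. This is a sketch only: the function name `first_stuff_violation` and the sample bit patterns are hypothetical, and bits are modeled as 0 (dominant) and 1 (recessive).

```python
def first_stuff_violation(bits, limit=5):
    """Return the index of the first bit that extends a run beyond `limit`, else None."""
    run_value, run_length = None, 0
    for index, bit in enumerate(bits):
        if bit == run_value:
            run_length += 1
        else:
            run_value, run_length = bit, 1
        if run_length > limit:
            return index
    return None

stuffed_traffic = [0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0]  # longest run is 5: legal
active_error_flag = [0] * 6                                 # 6 dominant bits

print(first_stuff_violation(stuffed_traffic))                  # no violation
print(first_stuff_violation([0, 1, 1] + active_error_flag))   # sixth dominant bit
```

The second call reports the position of the sixth consecutive dominant bit, which is the point at which every receiver on the bus would detect a stuff error.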

6. Fault Confinement: The TEC/REC State Machine

CAN’s fault confinement mechanism prevents a single malfunctioning node from permanently disrupting the bus. Each node maintains two counters:

TEC (Transmit Error Counter) — incremented when the node detects an error while transmitting
REC (Receive Error Counter) — incremented when the node detects an error while receiving

6.1 The Three Error States

stateDiagram-v2
    [*] --> ErrorActive

    state ErrorActive {
        note right of ErrorActive
            TEC ≤ 127 AND REC ≤ 127
            Sends active error flags (6 dominant)
            Normal operating state
        end note
    }

    state ErrorPassive {
        note right of ErrorPassive
            TEC > 127 OR REC > 127
            Sends passive error flags (6 recessive)
            Must wait 8 extra bits after IFS
        end note
    }

    state BusOff {
        note right of BusOff
            TEC > 255
            Cannot transmit or receive
            Recovery: 128 × 11 recessive bits
        end note
    }

    ErrorActive --> ErrorPassive : TEC > 127 OR REC > 127
    ErrorPassive --> ErrorActive : TEC ≤ 127 AND REC ≤ 127
    ErrorPassive --> BusOff : TEC > 255
    BusOff --> ErrorActive : 128 × 11 recessive bits\nTEC = 0, REC = 0

Figure: CAN03 03 error state machine

Error Active (initial state)
- Both TEC ≤ 127 and REC ≤ 127
- The node participates fully in bus communication
- On error detection, transmits an active error flag (6 dominant bits) to destroy the frame and force a retry
- This is the normal operating state for a healthy node

Error Passive
- TEC > 127 or REC > 127
- The node can still transmit and receive, but with restrictions:
  - Error flags are passive (6 recessive bits, which do not disrupt other nodes)
  - Must wait an additional 8-bit suspend transmission period after the interframe space before transmitting
- An error-passive node is penalized but not silenced: it can still communicate, just with lower priority

Bus Off
- TEC > 255
- The node is completely disconnected from the bus: it cannot transmit or receive
- Recovery requires detecting 128 occurrences of 11 consecutive recessive bits (= 1,408 recessive bit times)
- After recovery, TEC and REC are both reset to 0 and the node returns to error-active state

⚠️ Warning: A bus-off node does not simply disappear — its transceiver remains connected to the bus. However, the CAN controller disables its transmit driver, so the transceiver does not drive the bus. If the transceiver hardware itself is faulty (e.g., stuck dominant), bus-off will not help — you must physically disconnect the node.
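
The thresholds of the three states can be captured in a small helper, shown here as a sketch. The function name `classify_error_state` is hypothetical, and the counters are treated as plain integers (a real controller usually exposes bus-off through a status flag, because its TEC field is often only 8 bits wide).

```python
def classify_error_state(tec, rec):
    """Map TEC/REC values to the fault confinement state described above."""
    if tec > 255:
        return "bus-off"
    if tec > 127 or rec > 127:
        return "error-passive"
    return "error-active"

for tec, rec in [(0, 0), (127, 127), (128, 0), (0, 128), (256, 0)]:
    print(f"TEC={tec:3d} REC={rec:3d} -> {classify_error_state(tec, rec)}")
```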

Chronically Elevated TEC/REC: The “Silent Failure” Scenario

A particularly insidious failure mode occurs when a node’s TEC or REC is chronically elevated, hovering around the error-passive threshold (from the warning level of 96 to somewhat above 127), without ever reaching bus-off (TEC > 255). In this scenario:

The node remains in error-passive state (TEC > 127 or REC > 127), or sits just below the error-passive threshold and frequently crosses back and forth.
When error-passive, the node sends passive error flags (6 recessive bits) instead of active ones. Passive error flags do not disrupt other nodes’ communication, so the problem is not visible on the bus.
The node may silently fail to communicate: its transmissions are repeatedly interrupted by errors and retries, but because it is error-passive, it must wait the extra 8-bit suspend transmission period after each interframe space, effectively deprioritizing all its traffic.
Other nodes on the bus continue to operate normally — they may not detect that this node is struggling.

Why this is harder to detect than bus-off: A bus-off event is a clear, discrete failure that triggers interrupts, logs, and (typically) recovery logic. A chronically error-passive node, by contrast, produces no dramatic events — it simply becomes slow and unreliable. Application-layer timeouts may eventually fire, but the root cause is difficult to trace without monitoring the TEC/REC values directly.

Common root causes of chronically elevated counters:
- Intermittent wiring issues (loose connector pin, frayed cable, marginal termination)
- Marginal bit timing (sample point near the edge of the valid window for the bus length)
- A nearby EMI source that causes occasional bit errors without destroying every frame
- An oscillator that drifts with temperature, causing errors only at thermal extremes

Mitigation: Implement periodic TEC/REC monitoring in your application firmware. Log warnings when either counter exceeds a threshold (e.g., 96) — this provides early warning before the node reaches error-passive or bus-off. Many CAN controllers provide register access to the current TEC and REC values, and some (like STM32 FDCAN) can generate interrupts when error warning thresholds are crossed.
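
One possible shape for such monitoring logic is sketched below in Python for readability; firmware would implement the same idea in its own language. The class name `ElevatedCounterMonitor`, the window size, and the 50% fraction are assumptions chosen for illustration, not recommended values.

```python
from collections import deque

class ElevatedCounterMonitor:
    """Flags a node whose error counters stay near the warning level over many samples."""

    def __init__(self, warn_level=96, window=60, max_fraction=0.5):
        self.warn_level = warn_level
        self.max_fraction = max_fraction
        self.samples = deque(maxlen=window)   # True where a sample was elevated

    def add_sample(self, tec, rec):
        """Record one TEC/REC reading; return True once the window is chronically elevated."""
        self.samples.append(max(tec, rec) >= self.warn_level)
        if len(self.samples) < self.samples.maxlen:
            return False                      # not enough history yet
        return sum(self.samples) / len(self.samples) > self.max_fraction

monitor = ElevatedCounterMonitor(window=4)
readings = [(102, 0), (126, 0), (8, 0), (134, 0), (142, 0)]
alerts = [monitor.add_sample(tec, rec) for tec, rec in readings]
print(alerts)
```

Evaluating a window of samples rather than a single reading keeps one brief error burst from raising an alert while still catching a counter that never settles back toward zero.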

6.2 Counter Increment/Decrement Rules

The TEC and REC counters follow these rules per ISO 11898-1:

REC (Receive Error Counter) — ISO 11898-1 §12.1.4.3:

Event	Change
Receiver detects an error	REC += 1
Receiver detects a dominant bit as the first bit after sending an error flag	REC += 8
Receiver detects a bit error while sending an active error flag or an overload flag	REC += 8
Successful reception	REC -= 1 (minimum 0); if REC was above 127, it is set to a value between 119 and 127

📝 Note: The +1 rule does not apply to a bit error that a receiver detects while it is sending an active error flag or an overload flag (ISO 11898-1 §12.1.4.2); that case is counted as REC += 8 instead. The +8 cases single out a node that still sees errors around its own error flag, which suggests that the disturbance is local to that node rather than shared by the whole bus.

TEC (Transmit Error Counter) — ISO 11898-1 §12.1.4.2:

Event	Change
Transmitter sends an error flag	TEC += 8
Successful transmission	TEC -= 1 (minimum 0)

The asymmetry is deliberate: TEC increments by 8 on each error but only decrements by 1 on success. This means a node that is causing transmit errors will reach bus-off quickly (after approximately 32 consecutive transmit errors: 32 × 8 = 256), while a node experiencing occasional receive errors recovers gradually.
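
The arithmetic behind these thresholds can be checked with a short simulation. This is a simplified sketch that models only the basic rules (TX error +8, RX error +1, success -1 with a floor of 0) and ignores the special cases; the function names `apply_events` and `consecutive_errors_to_exceed` are hypothetical.

```python
def apply_events(events, tec=0, rec=0):
    """Apply simplified counter rules to a sequence of event names; return (tec, rec)."""
    for event in events:
        if event == "tx_error":
            tec += 8
        elif event == "tx_ok":
            tec = max(tec - 1, 0)
        elif event == "rx_error":
            rec += 1
        elif event == "rx_ok":
            rec = max(rec - 1, 0)
        else:
            raise ValueError(f"unknown event: {event}")
    return tec, rec

def consecutive_errors_to_exceed(threshold, increment):
    """Smallest number of back-to-back errors that pushes a counter from 0 above threshold."""
    return threshold // increment + 1

print("TX errors to error-passive:", consecutive_errors_to_exceed(127, 8))
print("TX errors to bus-off:      ", consecutive_errors_to_exceed(255, 8))
print("RX errors to error-passive:", consecutive_errors_to_exceed(127, 1))
print("TEC/REC after error, success, error:",
      apply_events(["tx_error", "tx_ok", "tx_error"]))
```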

6.3 Tracing the State Machine: Example

A node begins in error-active state (TEC = 0, REC = 0):

Step  Event                          TEC    REC    State
────  ─────                          ───    ───    ─────
  1   Initial state                    0      0    Error Active
  2   Successful TX                    0      0    Error Active (TEC already 0)
  3   Successful TX                    0      0    Error Active
  4   TX error → active error flag     8      0    Error Active
  5   Successful TX                    7      0    Error Active
  6   TX error → active error flag    15      0    Error Active
  7   15 × successful TX               0      0    Error Active
────  ── Fast failure sequence ──     ───    ───    ─────
  8   16 × consecutive TX errors     128      0    Error PASSIVE (TEC > 127)
  9   16 more TX errors              256      0    BUS OFF (TEC > 255)
────  ── Recovery ──                 ───    ───    ─────
 10   128 × 11 recessive bits         0      0    Error Active (recovered)

This trace shows the full lifecycle: a node accumulates transmit errors, transitions through error-passive into bus-off, then recovers. Note that 32 consecutive TX errors (steps 8 + 9 = 32 × 8 = 256) are sufficient to reach bus-off from error-active — this happens in under 100 ms at 500 kbit/s.

Scenario B: Receive Errors — Error-Active to Error-Passive and Back

This scenario models a receiver experiencing intermittent CRC errors due to marginal termination. Unlike transmit errors (TEC += 8), receive errors increment REC by only 1, so the transition to error-passive takes many more errors.

Step  Event                          TEC    REC    State
────  ─────                          ───    ───    ─────
  1   Initial state                    0      0    Error Active
  2   RX CRC error                     0      1    Error Active (REC += 1)
  3   50 × RX CRC errors              0     51    Error Active
  4   Successful RX                    0     50    Error Active (REC -= 1)
  5   50 more RX CRC errors           0    100    Error Active (approaching threshold)
  6   28 more RX CRC errors           0    128    Error PASSIVE (REC > 127)
       Node now sends passive error flags (6 recessive bits)
       Node must wait 8-bit suspend transmission before TX
  7   128 × successful RX             0      0    Error Active (recovered)
       Each successful reception decrements REC by 1 (minimum 0)

Key insight: a receiver needs 128 consecutive errors (with no successes) to reach error-passive, versus only 16 consecutive errors for a transmitter. This asymmetry is deliberate — a faulty transmitter is more dangerous to the network because it actively corrupts the bus, while a faulty receiver only affects itself.

Scenario C: Chronically Elevated Counters — The “Silent Failure”

This scenario shows a node with an intermittent wiring fault that causes periodic transmit errors but allows enough successful transmissions to prevent bus-off. The node oscillates between error-active and error-passive indefinitely.

Step  Event                          TEC    REC    State
────  ─────                          ───    ───    ─────
  1   Initial state                    0      0    Error Active
  2   Normal operation (100 TX OK)     0      0    Error Active
  3   Connector vibration → TX error   8      0    Error Active (TEC += 8)
  4   10 × successful TX               0      0    Error Active (TEC decremented to 0)
  ...  (pattern repeats for hours — intermittent errors, always recovers)
 50   Temperature rises → more errors
 51   5 × TX error (rapid)            40      0    Error Active
 52   4 × successful TX               36      0    Error Active
 53   12 × TX error (burst)          132      0    Error PASSIVE (TEC > 127)
       Passive error flags → other nodes don't see the disruption
 54   30 × successful TX             102      0    Error Active (TEC < 128)
 55   4 × TX error                   134      0    Error PASSIVE (back again)
 56   6 × successful TX              128      0    Error PASSIVE (still > 127!)
 57   2 × successful TX              126      0    Error Active (barely)
 58   2 × TX error                   142      0    Error PASSIVE (oscillating)
  ...  (continues indefinitely — never reaches bus-off, never fully recovers)

This is the most dangerous failure mode because the node never triggers a bus-off event (which would be logged and investigated). Instead, it silently degrades — its messages arrive late or not at all, and the application layer may eventually time out, but tracing the root cause to a marginal TEC requires monitoring the error counters directly. Implement the TEC/REC monitoring code in Section 6.4 and alert when TEC or REC exceeds 96 (the error-warning threshold).

6.4 Runtime TEC/REC Register Reading

Monitoring TEC and REC at runtime is essential for detecting the “silent failure” scenario described in Section 6.1. Most CAN controllers expose these counters in memory-mapped registers. Below are complete code examples for four common platforms.

STM32 bxCAN (STM32F1/F2/F4/L4)

#include <stdint.h>
#include <stdio.h>
#include "stm32f4xx_hal.h"

void read_can_error_counters(CAN_HandleTypeDef *hcan) {
    uint32_t esr = hcan->Instance->ESR;
    uint8_t tec = (esr >> 16) & 0xFF;  // Bits [23:16]
    uint8_t rec = (esr >> 24) & 0xFF;  // Bits [31:24]
    uint8_t lec = (esr >> 4) & 0x07;   // Bits [6:4] Last Error Code

    static const char *const lec_str[] = {
        "No error", "Stuff error", "Form error", "ACK error",
        "Bit recessive error", "Bit dominant error", "CRC error", "Set by software"
    };

    printf("TEC=%u REC=%u LEC=%s\n", tec, rec, lec_str[lec]);

    // Error state determination. TEC and REC are 8-bit fields, so bus-off
    // (TEC > 255) can never be seen in the counters; use the ESR status flags.
    if (esr & CAN_ESR_BOFF)            // Bus-off flag
        printf("State: BUS-OFF\n");
    else if (esr & CAN_ESR_EPVF)       // Error passive: TEC > 127 or REC > 127
        printf("State: ERROR-PASSIVE\n");
    else if (esr & CAN_ESR_EWGF)       // Error warning: TEC >= 96 or REC >= 96
        printf("State: ERROR-WARNING\n");
    else
        printf("State: ERROR-ACTIVE\n");
}

The bxCAN Error Status Register (ESR) packs TEC, REC, and the Last Error Code (LEC) into a single 32-bit register. The LEC field records the type of the most recent error, which is useful for identifying whether errors are consistently of one type (suggesting a specific root cause) or mixed (suggesting a general bus problem).

STM32 FDCAN (STM32G4/H7)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include "stm32g4xx_hal.h"

void read_fdcan_error_counters(FDCAN_HandleTypeDef *hfdcan) {
    uint32_t ecr = hfdcan->Instance->ECR;
    uint8_t tec = ecr & 0xFF;           // TEC: bits [7:0]
    uint8_t rec = (ecr >> 8) & 0x7F;    // REC: bits [14:8]
    bool rp = (ecr >> 15) & 1;          // RP, bit 15: receive error passive flag

    printf("TEC=%u REC=%u RX-Error-Passive=%s\n", tec, rec, rp ? "YES" : "NO");
}

The FDCAN Error Counter Register (ECR) uses a different bit layout from bxCAN. Notably, it includes a dedicated “receive error passive” flag (bit 15) that directly indicates whether the receive error counter has exceeded the error-passive threshold, without requiring the application to compare the REC value manually.

NXP FlexCAN (e.g., i.MX RT)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include "fsl_flexcan.h"

void read_flexcan_error_counters(CAN_Type *base) {
    uint32_t ecr = base->ECR;
    uint8_t tec = ecr & 0xFF;          // TXERRCNT: bits [7:0]
    uint8_t rec = (ecr >> 8) & 0xFF;   // RXERRCNT: bits [15:8]

    uint32_t esr1 = base->ESR1;
    bool bus_off = (esr1 >> 2) & 1;    // BOFFINT (bit 2): latched bus-off interrupt flag

    printf("TEC=%u REC=%u Bus-Off=%s\n", tec, rec, bus_off ? "YES" : "NO");
}

The FlexCAN ECR register places TEC in bits [7:0] and REC in bits [15:8]. The ESR1 register provides additional status flags including the bus-off interrupt flag (BOFFINT), which can also be configured to trigger an interrupt for immediate bus-off notification.

Linux socketCAN Monitoring Loop

import socket
import struct
import time

CAN_ERR_FLAG = 0x20000000    # set in can_id of every error frame
CAN_ERR_CRTL = 0x00000004    # controller state change, details in data[1]
CAN_ERR_BUSOFF = 0x00000040  # bus-off

def monitor_error_counters(interface='can0', interval=1.0):
    """Monitor CAN error state changes via socketCAN error frames."""
    s = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW)

    # Enable error frame reception. The filter takes error class bits only;
    # CAN_ERR_FLAG itself is not part of the mask.
    err_mask = CAN_ERR_CRTL | CAN_ERR_BUSOFF
    s.setsockopt(socket.SOL_CAN_RAW, socket.CAN_RAW_ERR_FILTER,
                 struct.pack('=I', err_mask))
    s.bind((interface,))
    s.settimeout(interval)

    print(f"Monitoring {interface} error state (Ctrl+C to stop)...")
    last_state = None

    while True:
        try:
            frame = s.recv(16)
        except socket.timeout:
            continue
        can_id, dlc = struct.unpack('=IB', frame[:5])
        data = frame[8:8+dlc]

        if not can_id & CAN_ERR_FLAG:
            continue  # Ordinary data frame
        if can_id & CAN_ERR_BUSOFF:
            print(f"[{time.strftime('%H:%M:%S')}] BUS-OFF!")
            last_state = "BUS-OFF"
        elif can_id & CAN_ERR_CRTL:
            # data[1]: 0x04/0x08 = RX/TX warning, 0x10/0x20 = RX/TX passive.
            # Check passive first because it is the more severe state.
            if data[1] & 0x30:
                state = "ERROR-PASSIVE"
            elif data[1] & 0x0C:
                state = "ERROR-WARNING"
            else:
                state = "ERROR-ACTIVE"
            if state != last_state:
                print(f"[{time.strftime('%H:%M:%S')}] State change: {state} "
                      f"(TEC={data[6]}, REC={data[7]})")
                last_state = state

if __name__ == '__main__':
    monitor_error_counters()

# Expected output (example from a node with a loose connector)
Monitoring can0 error state (Ctrl+C to stop)...
[14:23:07] State change: ERROR-WARNING (TEC=96, REC=0)
[14:23:08] State change: ERROR-PASSIVE (TEC=128, REC=0)
[14:23:09] State change: ERROR-WARNING (TEC=112, REC=0)
[14:23:11] State change: ERROR-PASSIVE (TEC=136, REC=0)
[14:23:15] BUS-OFF!

The Python monitoring script opens a raw CAN socket with error frame filtering enabled. When the Linux kernel’s socketCAN subsystem detects a controller state transition (error-active to error-warning, error-warning to error-passive, or error-passive to bus-off), it delivers an error frame to the socket. The script decodes the error frame’s data bytes to extract the current TEC and REC values (bytes 6 and 7) and prints only state transitions, not every individual error — this avoids flooding the terminal during high-error conditions.

7. Bus-Off Recovery Strategies

When a node enters bus-off, it must recover before it can participate in bus communication again. The ISO 11898-1 standard defines the recovery condition (128 × 11 recessive bits), but the recovery strategy — how and when to trigger recovery — is application-defined.

7.1 Automatic Recovery

The node automatically attempts recovery as soon as it detects the bus-off condition. This is the simplest approach:

#include "stm32f4xx_hal.h"

/* STM32 bxCAN: Enable automatic bus-off recovery */
void can_enable_auto_bus_off_recovery(void) {
    CAN1->MCR |= CAN_MCR_ABOM;  /* Automatic Bus-Off Management */
}

This single register bit enables the bxCAN peripheral to automatically initiate the ISO 11898-1 bus-off recovery sequence (128 occurrences of 11 recessive bits) as soon as bus-off is entered. No application intervention is required.

Pros: Fastest recovery, minimal downtime. Cons: If the root cause is still present (e.g., a wiring fault), the node will enter bus-off again immediately after recovery, creating a rapid on/off cycle that generates error frames and can degrade bus performance.
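
The minimum duration of the recovery sequence follows directly from the bit rate. The sketch below assumes an otherwise idle bus, so that all 128 sequences of 11 recessive bits arrive back to back; on a loaded bus the real time is longer. The function name `bus_off_recovery_time_ms` is hypothetical.

```python
def bus_off_recovery_time_ms(bitrate_bps, sequences=128, bits_per_sequence=11):
    """Shortest possible bus-off recovery time in milliseconds at the given bit rate."""
    return sequences * bits_per_sequence / bitrate_bps * 1000.0

for bitrate in (125_000, 250_000, 500_000, 1_000_000):
    print(f"{bitrate // 1000:>5} kbit/s: {bus_off_recovery_time_ms(bitrate):.3f} ms")
```

With automatic recovery enabled, a node with a persistent fault can therefore cycle through bus-off many times per second.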

7.2 Manual Recovery with Backoff

The application monitors the bus-off event and waits for a configurable period before triggering recovery:

#include "stm32f4xx_hal.h"

/* STM32 bxCAN: Manual bus-off recovery with exponential backoff */
static uint32_t backoff_ms = 100;  /* File-scope: shared between handler and reset */

void CAN_BusOff_Handler(void) {
    /* Disable automatic recovery */
    CAN1->MCR &= ~CAN_MCR_ABOM;

    /* Wait before attempting recovery */
    HAL_Delay(backoff_ms);

    /* Request re-initialization */
    CAN1->MCR |= CAN_MCR_INRQ;   /* Enter init mode */
    while (!(CAN1->MSR & CAN_MSR_INAK));  /* Wait for init */
    CAN1->MCR &= ~CAN_MCR_INRQ;  /* Leave init mode → recovery begins */

    /* Exponential backoff: double the wait time, cap at 10 seconds */
    backoff_ms = (backoff_ms <= 5000) ? backoff_ms * 2 : 10000;
}

void CAN_Recovery_Success(void) {
    /* Reset backoff on successful recovery */
    backoff_ms = 100;
}

This approach uses exponential backoff — each consecutive bus-off event doubles the wait time before recovery, up to a maximum. This prevents a broken node from repeatedly flooding the bus with error frames. The HAL_Delay() function is part of the STM32 Hardware Abstraction Layer (HAL) library and provides a millisecond-resolution blocking delay. In a production system, you would typically use a non-blocking timer or RTOS delay instead of HAL_Delay() to avoid stalling the main loop.

⚠️ Warning: The `while (!(CAN1->MSR & CAN_MSR_INAK))` loop has no timeout — in production firmware, add a timeout guard to prevent the MCU from hanging indefinitely if the CAN peripheral fails to enter initialization mode.
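
To see how the wait times grow, the schedule can be tabulated with a few lines of Python. This sketch assumes the same starting value (100 ms) and cap (10 s) as the handler above, with the delay clamped so it never exceeds the cap; the function name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(initial_ms=100, cap_ms=10_000, events=10):
    """Return the wait applied before each of `events` consecutive recovery attempts."""
    delays, delay = [], initial_ms
    for _ in range(events):
        delays.append(delay)
        delay = min(delay * 2, cap_ms)   # double, but never exceed the cap
    return delays

schedule = backoff_schedule()
print(schedule)
print(f"total wait over {len(schedule)} bus-off events: {sum(schedule) / 1000:.1f} s")
```

The cap is reached at the eighth consecutive bus-off event, after which the node retries once every 10 seconds until a recovery succeeds and the backoff is reset.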

7.3 Recovery Strategy Comparison

Strategy	Recovery time	Risk of error storm	Best for
Automatic (ABOM)	Fastest (~2.8 ms at 500 kbit/s)	High if fault persists	Non-critical nodes, development
Fixed delay	Predictable	Medium	Simple applications
Exponential backoff	Adaptive	Low	Production safety-critical systems
Manual only (operator reset)	Depends on operator	None	Safety-critical, requires investigation

💡 Tip: For production systems, implement bus-off recovery with exponential backoff and log every bus-off event with a timestamp, the TEC value at the time of bus-off, and the last few frames transmitted before the event. This data is invaluable for diagnosing intermittent wiring faults and connector problems.

8. Oscillator Tolerance and Clock Accuracy

CAN’s bit timing relies on all nodes having oscillators that are accurate enough to maintain synchronization between resynchronization edges. The maximum allowable oscillator deviation depends on the bit timing configuration.

8.1 Maximum Oscillator Tolerance

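The maximum oscillator tolerance (df) is the smaller of two limits: one set by how much a single resynchronization can correct (SJW), and one set by the drift that accumulates over 13 bit times without a resynchronization edge. The form below assumes Phase_Seg2 ≤ Phase_Seg1: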

df_max = min(
    SJW / (2 × 10 × Total_TQ),
    Phase_Seg2 / (2 × (13 × Total_TQ - Phase_Seg2))
)

For the 500 kbit/s example (SJW = 2, Total_TQ = 16, Phase_Seg2 = 2):

sjw = 2
total_tq = 16
phase_seg2 = 2

df1 = sjw / (2 * 10 * total_tq)
df2 = phase_seg2 / (2 * (13 * total_tq - phase_seg2))

df_max = min(df1, df2)
print(f"df1 = {df1*100:.3f}%")
print(f"df2 = {df2*100:.3f}%")
print(f"Max oscillator tolerance: ±{df_max*100:.3f}%")

# Output
df1 = 0.625%
df2 = 0.485%
Max oscillator tolerance: ±0.485%

8.2 Oscillator Types and Their Accuracy

Oscillator type	Typical accuracy	Suitable for
High-stability crystal (TCXO)	±10–50 parts per million (ppm) (0.001–0.005%)	Any CAN bit rate
Standard crystal	±20–100 ppm (0.002–0.01%)	Any CAN bit rate
Ceramic resonator	±0.1–0.5%	≤ 500 kbit/s (marginal at 1 Mbit/s)
Internal RC oscillator	±1–5%	Not suitable for CAN (except ≤ 10 kbit/s)

⚠️ Warning: Many low-cost MCU development boards use ceramic resonators or internal RC oscillators. These are acceptable for initial prototyping but will cause intermittent CAN errors in production. Always verify the oscillator type on every CAN node in your system — one node with a poor oscillator can disrupt the entire bus.

💡 Tip: In practice, standard crystal oscillators (±50 ppm) easily meet CAN's ±0.5% requirement. Ceramic resonators (±0.5%) are marginal and should be avoided for CAN FD data rates above 2 Mbit/s. MEMS oscillators are acceptable alternatives to crystals — they offer comparable accuracy (±20–50 ppm) with better vibration resistance, which can be advantageous in automotive and industrial environments.
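
A quick way to screen a candidate oscillator is to compare its rated accuracy against the tolerance computed in Section 8.1. The sketch below reuses that formula with the same 500 kbit/s example parameters; the function names and the ppm figures chosen for each oscillator type are illustrative assumptions, not datasheet values.

```python
def max_oscillator_tolerance(sjw, total_tq, phase_seg2):
    """Maximum oscillator deviation as a fraction, per the Section 8.1 formula."""
    df1 = sjw / (2 * 10 * total_tq)
    df2 = phase_seg2 / (2 * (13 * total_tq - phase_seg2))
    return min(df1, df2)

def oscillator_ok(accuracy_ppm, df_max):
    """True if an oscillator rated at +/- accuracy_ppm stays within df_max."""
    return accuracy_ppm / 1_000_000 <= df_max

df_max = max_oscillator_tolerance(sjw=2, total_tq=16, phase_seg2=2)
candidates = [("standard crystal", 100), ("ceramic resonator", 5000), ("internal RC", 20000)]
for name, ppm in candidates:
    verdict = "OK" if oscillator_ok(ppm, df_max) else "out of tolerance"
    print(f"{name} (±{ppm} ppm): {verdict}")
```

A real design should also budget for aging and temperature drift on top of the initial accuracy figure.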

9. CAN FD Error Handling Differences

CAN FD inherits the same five error detection mechanisms and the TEC/REC fault confinement model from classical CAN, with several additions:

9.1 Error State Indicator (ESI)

CAN FD frames include the Error State Indicator (ESI) bit, which broadcasts the transmitter’s error state to all receivers:

ESI = 0 (dominant) → transmitter is error-active
ESI = 1 (recessive) → transmitter is error-passive

This gives network management systems immediate visibility into the health of each transmitter — in classical CAN, you can only infer a node’s error state by observing its error flag behavior.
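
On Linux, a receiver can read this bit from the flags byte of a received CAN FD frame. The sketch below assumes a 72-byte socketCAN `canfd_frame` as delivered by a raw CAN socket with CAN FD frames enabled, in which flag value 0x02 marks ESI; the function name `transmitter_error_state` and the sample frames are hypothetical.

```python
import struct

CANFD_FRAME_FORMAT = "=IBB2x64s"   # can_id, len, flags, 2 reserved bytes, data[64]
CANFD_ESI = 0x02                   # flags bit set when the transmitter is error-passive

def transmitter_error_state(raw_frame):
    """Report the sender's error state from the ESI bit of a raw canfd_frame."""
    can_id, length, flags, data = struct.unpack(CANFD_FRAME_FORMAT, raw_frame)
    return "error-passive" if flags & CANFD_ESI else "error-active"

# Two hand-built frames standing in for data read from a socket
healthy = struct.pack(CANFD_FRAME_FORMAT, 0x123, 8, 0x01, bytes(8))
struggling = struct.pack(CANFD_FRAME_FORMAT, 0x456, 8, 0x01 | CANFD_ESI, bytes(8))
print(hex(0x123), transmitter_error_state(healthy))
print(hex(0x456), transmitter_error_state(struggling))
```

Logging the ESI state per CAN ID over time gives an inexpensive health view of every CAN FD transmitter on the bus.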

9.2 CRC Improvements

CAN FD uses longer CRCs (CRC-17 for ≤ 16 data bytes, CRC-21 for > 16 data bytes) and adds fixed stuff bits in the CRC field. These improvements increase the Hamming distance for large payloads and improve error detection compared to classical CAN’s CRC-15.
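
The selection rule can be written down directly. This is a sketch of the rule as stated above, with a hypothetical function name `canfd_crc_width`; it covers CAN FD frames only (classical CAN frames always use CRC-15).

```python
def canfd_crc_width(data_bytes):
    """Return the CRC length in bits used by a CAN FD frame with this payload size."""
    if not 0 <= data_bytes <= 64:
        raise ValueError("CAN FD payloads are 0 to 64 bytes")
    return 17 if data_bytes <= 16 else 21

for size in (0, 8, 16, 20, 64):
    print(f"{size:2d} data bytes -> CRC-{canfd_crc_width(size)}")
```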

9.3 Protocol Exception Handling

CAN FD introduces Protocol Exception Handling to gracefully handle the situation where a CAN FD frame is received by a node that does not support CAN FD. Without this feature, the classical CAN node would interpret the FDF bit as a form error and destroy every CAN FD frame with error frames.

With protocol exception handling enabled, the receiving controller treats the unsupported bit pattern as a protocol exception instead of an error: it transmits no error frame, stops decoding the frame, and waits for the bus to return to idle before rejoining. This allows mixed CAN 2.0/CAN FD networks to operate (with classical nodes ignoring FD frames) at the cost of reduced bus availability during FD frame transmissions.

📝 Note: Protocol exception handling is a controller configuration option, not a mandatory feature. Consult your CAN controller's reference manual for the specific register setting.

Troubleshooting

#	Symptom	Likely cause	Diagnostic step	Resolution
1	No communication despite correct wiring	Bit rate or sample point mismatch between nodes	Read the bit timing registers on each node; verify prescaler, BS1/BS2, and SJW match	Configure identical bit rate and compatible sample point (within 5%) on all nodes
2	Intermittent errors that correlate with specific data patterns	Sample point position too early or too late for the bus length	Calculate the required Prop_Seg for your bus length; verify sample point is in the recommended range	Adjust bit timing to move sample point later (for long buses) or earlier (for short buses with many nodes)
3	Error count steadily increases over hours	Oscillator drift due to temperature change	Monitor TEC/REC over time; measure oscillator frequency at temperature extremes	Replace RC oscillator or ceramic resonator with a crystal; increase SJW if possible
4	Node enters bus-off repeatedly	Persistent physical fault (wiring, termination, or transceiver failure) causing continuous transmit errors	Log TEC value and last TX frames before bus-off; capture oscilloscope waveform	Fix the physical fault (see CAN-01); implement exponential backoff recovery
5	CAN FD data phase errors but arbitration phase works	Data-phase bit timing incorrect, transceiver delay compensation wrong, or transceiver not CAN FD-rated	Verify data-phase bit timing parameters; check TDCO setting; verify transceiver part number	Recalculate data-phase timing; adjust TDCO to match actual transceiver loop delay; use CAN FD-rated transceiver
6	One node’s errors affect the entire bus	Error-active node is sending active error flags (6 dominant bits) that disrupt all traffic	Read TEC/REC on the problem node; determine if it is transmitting error frames	Fix the root cause on the problem node; if unfixable, the node will eventually reach bus-off and self-isolate
7	Bus-off recovery takes unexpectedly long	Recovery requires 128 × 11 recessive bits (1,408 bit times); high bus load leaves few idle periods	Calculate expected recovery time: 1,408 × bit_time	Reduce bus load temporarily during recovery; use manual recovery with a controlled bus-quiet period

References

ISO 11898-1:2015 — Road vehicles — Controller area network (CAN) — Part 1: Data link layer and physical signalling. International Organization for Standardization. Defines bit timing model, error detection, and fault confinement.
ISO 11898-2:2016 — Road vehicles — Controller area network (CAN) — Part 2: High-speed medium access unit. Specifies transceiver symmetry and delay requirements for CAN FD.
Robert Bosch GmbH (1991) — CAN Specification Version 2.0. Original bit timing model and error handling specification.
CiA (CAN in Automation) 601-1 — CAN FD Node and system design — Part 1: Bit timing. Recommendations for CAN FD arbitration and data phase bit timing.
SAE J1939-11:2022 — Physical Layer — 250 kbit/s, Twisted Shielded Pair. Specifies 87.5% sample point for J1939 networks.
STM32 Reference Manual RM0090 — bxCAN controller bit timing registers (BTR). STMicroelectronics.
STM32 Reference Manual RM0433 — FDCAN controller bit timing registers (NBTP, DBTP, TDCR). STMicroelectronics.
NXP FlexCAN Reference Manual — FlexCAN bit timing configuration (CTRL1, CBT, FDCBT registers). NXP Semiconductors.
Microchip PIC32 CAN Reference Manual — CAN module bit timing configuration. Microchip Technology.
Kvaser Bit Timing Calculator — Online tool for CAN bit timing parameter calculation.

Changelog

Version	Date	Author	Summary of changes
1.0	2026-03-16	Telematics Tutorial Series	Initial publication
