modm:platform:fault: ARM Cortex-M Fault Reporters

This module manages data storage for core dumps provided by the :crashcatcher module to investigate HardFault events via offline post-mortem debugging. The data is stored in the volatile memory designated for the heap.

This works as follows:

  1. A HardFault occurs and is intercepted by CrashCatcher.
  2. CrashCatcher calls into this module to store the core dump in the heap as defined by the linkerscript's .table.heap section, thus effectively overwriting the heap, then reboots the device.
  3. On reboot, only the remaining heap memory is initialized, leaving the core dump data intact.
  4. The application has no limitations other than a reduced total heap size! It may access the report data at any time and use all hardware to send out this report.
  5. After the application clears the report and reboots, the heap will once again be fully available.

Restrictions on HardFault Entry

A HardFault is a serious bug, and should it happen, your application is most likely compromised in some way. Here are some important points to take note of.

  1. The HardFault has a hardcoded priority of -1 and only the NMI and the Reset exceptions have a higher priority (-2 and -3). This means ALL device interrupts have a LOWER priority!
  2. The HardFault is a synchronous exception: it will NOT wait for anything to complete, especially not the currently executing interrupt (if any).
  3. There are many reasons for the HardFault exception to be raised (e.g. accessing invalid memory, executing undefined instructions, dividing by zero) making it very difficult to recover in a generic way. It is therefore reasonable to abandon execution (=> reboot) rather than resuming execution in an increasingly unstable application.

On HardFault entry, this module calls the function modm_hardfault_entry(), which can be overridden by the application to put the device's hardware in a safe mode. This can be as simple as disabling power to external components. However, its execution should be strictly time bound and must NOT depend on other interrupts completing: they won't, so waiting for them causes a deadlock.

void modm_hardfault_entry()
{
    Board::MotorDrivers::disable();
    // return from this function as fast as possible
}

After this function returns, this module will generate the coredump into the heap and reboot the device.

Reporting the Fault

In order to recover from the HardFault, the device is rebooted with a smaller heap. Once the main() function is reached, the application code should check FaultReporter::hasReport() and then initialize only the bare minimum of hardware needed to send this report to the developer.

To access the report, use the FaultReporter::begin() and FaultReporter::end() functions, which each return a const_iterator over the actual core dump data. This also allows FaultReporter to be used in a range-based for loop.

Remember to call FaultReporter::clearAndReboot() to clear the report, reboot the device and reclaim the full heap.

int main()
{
    if (FaultReporter::hasReport()) // Check first after boot
    {
        Application::partialInitialize(); // Initialize only the necessary
        reportBegin();
        for (const uint8_t data : FaultReporter::buildId())
            reportBuildId(data); // send each byte of Build ID
        for (const uint8_t data : FaultReporter())
            reportData(data); // send each byte of data
        reportEnd(); // end the report
        FaultReporter::clearAndReboot(); // clear the report and reboot
        // never reached
    }
    // Normal initialization
    Application::initialize();
}

The application is able to use the heap. However, depending on the report size (controllable via the report_level option), the heap may be much smaller than normal. Make sure your application can deal with that.

For complex applications which perhaps communicate asynchronously (CAN, Ethernet, Wireless) it may not be possible to send the report in one piece or at the same time. The report data remains available until you reboot, even after you've cleared the report.

int main()
{
    const bool faultReport{FaultReporter::hasReport()};
    FaultReporter::clear(); // only clear report but do not reboot
    Application::initialize();

    while (true)
    {
        doOtherStuff();
        if (faultReport and applicationReady)
        {
            // Still valid AFTER clear, but BEFORE reboot
            const auto id = FaultReporter::buildId();
            auto begin = FaultReporter::begin();
            auto end = FaultReporter::end();
            //
            Application::sendReport(id, begin, end);
            // reboot when report has been fully sent
        }
    }
}
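When the transport only accepts small frames, the report can be consumed piece by piece across several iterations of the main loop. The following is a minimal host-side sketch and not part of modm: nextFrame() and the frame length are hypothetical and stand in for whatever your protocol requires. It assumes only that the iterators yield bytes and support the operations a range-based for loop needs, as the ones returned by FaultReporter::begin() and FaultReporter::end() do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a modm API): copies at most `maxLength` bytes
// from [it, end) into a new frame and advances `it` past the copied bytes.
// An empty frame means the whole report has been consumed.
template <typename Iterator>
std::vector<uint8_t>
nextFrame(Iterator& it, Iterator end, std::size_t maxLength)
{
    assert(maxLength > 0);
    std::vector<uint8_t> frame;
    frame.reserve(maxLength);
    while (it != end and frame.size() < maxLength)
    {
        frame.push_back(*it);
        ++it;
    }
    return frame;
}
```

The caller keeps the iterator between loop iterations, sends one frame whenever the transport is ready, and reboots once an empty frame is returned.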

Using the Fault Report

The fault report contains a core dump generated by CrashCatcher and is meant to be used by CrashDebug to present the memory view to the GDB debugger. For this, you must use the ELF file that corresponds to the device's firmware and copy the coredump data, formatted as hexadecimal values, into a text file. Then call the debugger like this:

arm-none-eabi-gdb -tui executable.elf -ex "set target-charset ASCII" \
    -ex "target remote | CrashDebug --elf executable.elf --dump coredump.txt"
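Producing the hexadecimal text is a matter of printing every report byte as two hex digits. The sketch below shows one way to do this on a buffer of received bytes. The name toHexText() is hypothetical, and the assumption that CrashDebug accepts a plain run of two-digit hex values should be checked against the CrashDebug documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not a modm or CrashDebug API): formats every byte
// as two uppercase hexadecimal digits without separators.
std::string
toHexText(const std::vector<uint8_t>& bytes)
{
    static constexpr char digits[] = "0123456789ABCDEF";
    std::string text;
    text.reserve(bytes.size() * 2);
    for (const uint8_t byte : bytes)
    {
        text += digits[byte >> 4];    // high nibble
        text += digits[byte & 0x0F];  // low nibble
    }
    return text;
}
```

The returned string can then be written to coredump.txt.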

Note that the FaultReporter::buildId() contains the GNU Build ID, which can help you find the right ELF file:

arm-none-eabi-readelf -n executable.elf

Displaying notes found in: .build_id
  Owner                 Data size Description
  GNU                  0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 59f08f7a37a7340799d9dba6b0c092bc3c9515c5
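If you archive several firmware images, matching a reported Build ID against the readelf output can be automated on the host. This is a minimal sketch under the assumption that the reported ID arrives as raw bytes and the readelf value as a hex string. matchesBuildId() is a hypothetical name and not part of modm.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns true if the raw Build ID bytes equal the
// hexadecimal string printed by readelf (compared case-insensitively).
bool
matchesBuildId(const std::vector<uint8_t>& id, const std::string& hex)
{
    static constexpr char digits[] = "0123456789abcdef";
    if (hex.size() != id.size() * 2) return false;
    for (std::size_t i = 0; i < id.size(); ++i)
    {
        const int high = std::tolower(static_cast<unsigned char>(hex[2 * i]));
        const int low = std::tolower(static_cast<unsigned char>(hex[2 * i + 1]));
        if (high != digits[id[i] >> 4]) return false;
        if (low != digits[id[i] & 0x0F]) return false;
    }
    return true;
}
```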

Post-Mortem Debugging with SCons

The :build:scons module provides a few helper methods for working with fault reports. You still need to copy the coredump data manually, but the firmware selection is automated.

The SCons build system will automatically cache the ELF file for the build id for every firmware upload (using scons artifact). When a fault is reported, you can tell SCons the firmware build id and it will use the corresponding ELF file automatically.

# Copy data into coredump.txt
touch coredump.txt
# Start postmortem debugging of executable with this build id
scons postmortem firmware=59f08f7a37a7340799d9dba6b0c092bc3c9515c5

This module is only available for stm32.

Options

report_level

Fault Report Level

This module will try to store as much data as is available in the heap and any leftover data will be discarded. This means the application may not have any heap available after a reboot.

You can control how much data is generated by choosing the right report level:

  • core: Just dumps the core registers, which describe where the fault occurred and why. This is usually less than 250 Bytes.
  • stack: Dumps the main stack memory. This will get you a full backtrace, but may take a few kB of space.
  • data: Dumps all memory sections containing static data: .data, .fastdata, .bss. This allows you to see data that isn't related to your current fault location, however, this can take several tens of kB of data.

It is strongly recommended to choose a report level that generates less data than your heap size. The scons size output displays this very prominently: if the Data size is smaller than your Heap size, you're good to use the core+stack+data setting:

Data:      5.2 KiB (26.0% used) = 2285 B static (11.2%) + 3040 B stack (14.8%)
(.bss + .data + .fastdata + .noinit + .stack)

Heap:     14.8 KiB (74.0% available)
(.heap1)

If Heap is smaller than the Data, you may need to switch to using only the core+stack setting:

Data:     11.2 KiB (56.0% used) = 8429 B static (41.2%) + 3040 B stack (14.8%)
(.bss + .data + .fastdata + .noinit + .stack)

Heap:      8.8 KiB (44.0% available)
(.heap1)

Default: core+stack+data
Inputs: [core, core+stack, core+stack+data]
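The size comparison described above can be written down as a small decision function. This is only a sketch of the rule of thumb, not how modm selects anything: chooseReportLevel() and the ReportLevel enum are hypothetical, the fallback to core when even the stack does not fit is an assumption, and the small size of the core registers is ignored.

```cpp
#include <cstddef>

// Hypothetical names mirroring the three report_level inputs.
enum class ReportLevel { Core, CoreStack, CoreStackData };

// Picks the largest report level whose data fits into the heap,
// using the byte counts shown by the scons size output.
constexpr ReportLevel
chooseReportLevel(std::size_t staticBytes, std::size_t stackBytes, std::size_t heapBytes)
{
    // Data (static + stack) smaller than Heap: everything fits.
    if (staticBytes + stackBytes < heapBytes) return ReportLevel::CoreStackData;
    // Assumption: fall back level by level while the stack still fits.
    if (stackBytes < heapBytes) return ReportLevel::CoreStack;
    return ReportLevel::Core;
}
```

With the two outputs above, 2285 B static plus 3040 B stack against about 15155 B of heap selects core+stack+data, while 8429 B static plus 3040 B stack against about 9011 B of heap selects core+stack.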

Dependencies

modm:platform:fault depends on: modm:architecture:build_id, modm:cmsis:device, modm:crashcatcher

Limited availability: Check with 'lbuild discover' if this module is available for your target!