ARM Cortex-M Fault Reporters¶
lbuild module: modm:platform:fault
This module manages data storage for core dumps provided by the modm:crashcatcher
module to investigate HardFault events via offline post-mortem debugging.
The data is stored in the volatile memory designated for the heap.
This works as follows:
- A HardFault occurs and is intercepted by CrashCatcher.
- CrashCatcher calls into this module to store the core dump in the heap as defined by the linkerscript's .table.heap section, thus effectively overwriting the heap, then reboots the device.
- On reboot, only the remaining heap memory is initialized, leaving the core dump data intact.
- The application has no limitations other than a reduced total heap size! It may access the report data at any time and use all hardware to send out this report.
- After the application clears the report and reboots, the heap will once again be fully available.
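The steps above can be modelled on a host machine. The sketch below is not this module's implementation: HeapRegion, kMagic and the 8-byte header are names and values made up for illustration, and the real layout may differ. It only shows the idea that a marker at the start of the heap tells the startup code how many bytes to leave untouched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side model of the heap region, NOT modm's data layout.
struct HeapRegion
{
    static constexpr uint32_t kMagic = 0xC0DEDEADu; // assumed marker value
    static constexpr size_t kHeader = 8;            // assumed: marker + size

    std::vector<uint8_t> memory;

    explicit HeapRegion(size_t size) : memory(size, 0)
    {
        assert(size >= kHeader);
    }

    // Fault handler side: overwrite the start of the heap with the dump.
    // Data that does not fit into the region is discarded.
    void storeReport(const std::vector<uint8_t>& dump)
    {
        const uint32_t magic = kMagic;
        const uint32_t size = static_cast<uint32_t>(
            std::min(dump.size(), memory.size() - kHeader));
        std::memcpy(memory.data(), &magic, 4);
        std::memcpy(memory.data() + 4, &size, 4);
        if (size > 0)
            std::memcpy(memory.data() + kHeader, dump.data(), size);
    }

    bool hasReport() const
    {
        uint32_t magic = 0;
        std::memcpy(&magic, memory.data(), 4);
        return magic == kMagic;
    }

    // Startup side: bytes at the start of the heap that must stay untouched.
    size_t reservedBytes() const
    {
        if (!hasReport()) return 0;
        uint32_t size = 0;
        std::memcpy(&size, memory.data() + 4, 4);
        return kHeader + size;
    }

    // Bytes left for the allocator after the reboot.
    size_t usableHeap() const { return memory.size() - reservedBytes(); }

    // Invalidate only the marker; the dump bytes themselves stay in place.
    void clearReport() { std::memset(memory.data(), 0, 4); }
};
```

In this model a 100-byte dump in a 1024-byte region leaves 916 bytes for the allocator, and an oversized dump leaves none, which matches the reduced-heap behaviour described in the list.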
Restrictions on HardFault Entry¶
A HardFault is a serious bug, and should it happen, your application is most likely compromised in some way. Here are some important points to note.
- The HardFault has a hardcoded priority of -1 and only the NMI and the Reset exceptions have a higher priority (-2 and -3). This means ALL device interrupts have a LOWER priority!
- The HardFault is a synchronous exception: it will NOT wait for anything to complete, especially not the currently executing interrupt (if any).
- There are many reasons for the HardFault exception to be raised (e.g. accessing invalid memory, executing undefined instructions, dividing by zero) making it very difficult to recover in a generic way. It is therefore reasonable to abandon execution (=> reboot) rather than resuming execution in an increasingly unstable application.
On HardFault entry, this module calls the function modm_hardfault_entry(), which can be overridden by the application to put the device's hardware into a safe mode. This can be as simple as disabling power to external components. However, its execution should be strictly time-bound and must NOT depend on other interrupts completing: they won't, so waiting for them will cause a deadlock.
void modm_hardfault_entry()
{
    Board::MotorDrivers::disable();
    // return from this function as fast as possible
}
After this function returns, this module will generate the coredump into the heap and reboot the device.
Reporting the Fault¶
In order to recover from the HardFault, the device is rebooted with a smaller heap. Once the main() function is reached, the application code should check for FaultReporter::hasReport() and then only initialize the bare minimum of hardware needed to send this report to the developer.
To access the report, use the FaultReporter::begin() and FaultReporter::end() functions, which return a const_iterator over the actual core dump data, so that FaultReporter can be used in a range-based for loop.
Remember to call FaultReporter::clearAndReboot() to clear the report, reboot the device and reclaim the full heap.
int main()
{
    if (FaultReporter::hasReport()) // Check first after boot
    {
        Application::partialInitialize(); // Initialize only the necessary
        reportBegin();
        for (const uint8_t data : FaultReporter::buildId())
            reportBuildId(data); // send each byte of Build ID
        for (const uint8_t data : FaultReporter())
            reportData(data); // send each byte of data
        reportEnd(); // end the report
        FaultReporter::clearAndReboot(); // clear the report and reboot
        // never reached
    }
    // Normal initialization
    Application::initialize();
}
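The reportData() function in the example is left to the application. As a minimal sketch of what it could do, the class below collects bytes as two-digit hexadecimal text, which is the general form the coredump text file needs. HexReportWriter is a hypothetical name, the std::string stands in for a UART or log output, and the line width of 16 bytes is an arbitrary choice: check the CrashDebug documentation for the exact layout it accepts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sink that turns report bytes into hexadecimal text.
class HexReportWriter
{
public:
    // Append one byte as two lowercase hex digits, wrapping the line
    // after kBytesPerLine bytes.
    void write(uint8_t byte)
    {
        static constexpr char digits[] = "0123456789abcdef";
        text_ += digits[byte >> 4];
        text_ += digits[byte & 0x0F];
        if (++column_ == kBytesPerLine)
        {
            text_ += '\n';
            column_ = 0;
        }
    }

    // Terminate a partially filled last line.
    void finish()
    {
        if (column_ != 0)
        {
            text_ += '\n';
            column_ = 0;
        }
    }

    const std::string& text() const { return text_; }

private:
    static constexpr size_t kBytesPerLine = 16; // assumed line width
    std::string text_;
    size_t column_ = 0;
};
```

On a real target the two characters would be pushed to the output peripheral directly instead of being stored, so that no heap is needed while the report is sent.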
The application is able to use the heap, however, depending on the report size (controllable via the report_level option) the heap may be much smaller than normal. Make sure your application can deal with that.
For complex applications which perhaps communicate asynchronously (CAN, Ethernet, Wireless) it may not be possible to send the report in one piece or at the same time. The report data remains available until you reboot, even after you've cleared the report.
int main()
{
    const bool faultReport{FaultReporter::hasReport()};
    FaultReporter::clear(); // only clear report but do not reboot
    Application::initialize();
    while (true)
    {
        doOtherStuff();
        if (faultReport and applicationReady)
        {
            // Still valid AFTER clear, but BEFORE reboot
            const auto id = FaultReporter::buildId();
            auto begin = FaultReporter::begin();
            auto end = FaultReporter::end();
            //
            Application::sendReport(id, begin, end);
            // reboot when report has been fully sent
        }
    }
}
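How Application::sendReport() splits the data is up to the application. One possible approach, sketched here under the assumption that the transport accepts small fixed-size frames (a classic CAN frame carries up to 8 data bytes), is to copy a few bytes per main loop iteration and keep the iterator between calls. fillFrame is a hypothetical helper, not part of this module.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies up to `capacity` bytes from [it, end) into
// `frame` and advances `it`. Returns the number of bytes copied; a return
// value of 0 means the whole report has been consumed.
template <typename Iterator>
size_t fillFrame(Iterator& it, Iterator end, uint8_t* frame, size_t capacity)
{
    size_t count = 0;
    while (it != end && count < capacity)
        frame[count++] = *it++;
    return count;
}
```

Called once per loop iteration whenever the transport is ready, this sends the report in pieces without blocking the rest of the application. Once it returns 0 the report has been fully handed over and the device can be rebooted.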
Coredump via GDB¶
In case you encounter a HardFault while debugging and you did not include this module, or if you simply want to store the current system state for later analysis or to share with other developers, you can simply call the modm_coredump function inside GDB and it will generate a coredump.txt file.
Note that this coredump file contains all volatile memories including the heap, so this method is strongly recommended if you can attach a debugger.
Consult your chosen build system module for additional integrations.
Using the Fault Report¶
The fault report contains a core dump generated by CrashCatcher and is meant to be used with CrashDebug, which presents the memory view to the GDB debugger. For this, you must use the ELF file that corresponds to the device's firmware and copy the coredump data, formatted as hexadecimal values, into a text file. Then call the debugger like this:
arm-none-eabi-gdb -tui executable.elf -ex "set target-charset ASCII" \
-ex "target remote | CrashDebug --elf executable.elf --dump coredump.txt"
Note that FaultReporter::buildId() contains the GNU Build ID, which can help you find the right ELF file:
arm-none-eabi-readelf -n executable.elf
Displaying notes found in: .build_id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 59f08f7a37a7340799d9dba6b0c092bc3c9515c5
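If the report arrives as raw bytes, a host-side tool may need to compare them against the text printed by readelf. The following sketch parses the 40-digit Build ID text into 20 bytes (the 0x14 data size shown above). BuildId, hexValue and parseBuildId are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 20 raw bytes, matching the 0x14 data size of the NT_GNU_BUILD_ID note.
using BuildId = std::array<uint8_t, 20>;

// Returns the value of one hexadecimal digit, or -1 if it is not one.
int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses the text printed after "Build ID:" into raw bytes.
// Returns false if the length or any digit is invalid.
bool parseBuildId(const std::string& text, BuildId& id)
{
    if (text.size() != id.size() * 2) return false;
    for (size_t i = 0; i < id.size(); ++i)
    {
        const int hi = hexValue(text[2 * i]);
        const int lo = hexValue(text[2 * i + 1]);
        if (hi < 0 || lo < 0) return false;
        id[i] = static_cast<uint8_t>(hi * 16 + lo);
    }
    return true;
}
```

Two Build IDs parsed this way can be compared with operator== of std::array to decide whether an ELF file belongs to the reported firmware.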
Post-Mortem Debugging with SCons¶
The modm:build:scons module provides a few helper methods for working with fault reports. You still need to copy the coredump data manually, however, the firmware selection is automated.
The SCons build system will automatically cache the ELF file for the build id for every firmware upload (using scons artifact).
When a fault is reported, you can tell SCons the firmware build id and it will use the corresponding ELF file automatically.
# Copy data into coredump.txt
touch coredump.txt
# Start postmortem debugging of executable with this build id
scons debug-coredump firmware=59f08f7a37a7340799d9dba6b0c092bc3c9515c5
This module is only available for rp, sam, stm32.
Options¶
report_level¶
Fault Report Level
This module will try to store as much data as is available in the heap and any leftover data will be discarded. This means the application may not have any heap available after a reboot.
You can control how much data is generated by choosing the right report level:
- core: Just dumps the core registers, which describe where the fault occurred and why. This is usually less than 250 Bytes.
- stack: Dumps the main stack memory. This will get you a full backtrace, but may take a few kB of space.
- data: Dumps all memory sections containing static data: .data, .fastdata, .bss. This allows you to see data that isn't related to your current fault location, however, this can take several tens of kB of data.
It is strongly recommended to choose a report level that generates less data than your heap size. The scons size output displays this very prominently: if the Data size is smaller than your Heap size, you're good to use the core+stack+data setting:
Data: 5.2 KiB (26.0% used) = 2285 B static (11.2%) + 3040 B stack (14.8%)
(.bss + .data + .fastdata + .noinit + .stack)
Heap: 14.8 KiB (74.0% available)
(.heap1)
If Heap is smaller than the Data, you may need to switch to using only the core+stack setting:
Data: 11.2 KiB (56.0% used) = 8429 B static (41.2%) + 3040 B stack (14.8%)
(.bss + .data + .fastdata + .noinit + .stack)
Heap: 8.8 KiB (44.0% available)
(.heap1)
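The comparison in the two examples above can be written down as a small host-side check. This is only a rough sketch: suggestReportLevel is a hypothetical helper, the 250-byte figure is the upper estimate for the core level given above, and any bookkeeping overhead of the dump format is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Picks the largest report level whose estimated size is below the heap
// size. Sizes are in bytes, as reported by the size output.
std::string suggestReportLevel(size_t staticBytes, size_t stackBytes,
                               size_t heapBytes)
{
    const size_t coreBytes = 250; // estimate for the core registers
    if (coreBytes + stackBytes + staticBytes < heapBytes)
        return "core+stack+data";
    if (coreBytes + stackBytes < heapBytes)
        return "core+stack";
    return "core";
}
```

With the numbers from the first example (2285 B static, 3040 B stack, about 15155 B heap) this yields core+stack+data, and with the second (8429 B static, 3040 B stack, about 9011 B heap) it yields core+stack. A level that only just fits still leaves the application with almost no heap after the reboot, so some headroom is advisable.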
Default: core+stack+data
Inputs: [core, core+stack, core+stack+data]
Dependencies¶
Limited availability: Check with 'lbuild discover' if this module is available for your target!