
ARM Cortex-M Core

lbuild module: modm:platform:cortex-m

This module generates the startup code, vector table and linkerscript. It also initializes the heap, deals with assertions, and provides blocking delay functions, atomic and unaligned access and the GNU build ID.

Since this module only initializes the generic ARM Cortex-M parts, it delegates device-specific initialization to the modm:platform:core module. Please depend on that module directly instead of this one.

Startup

After reset, the ARM Cortex-M hardware jumps to the Reset_Handler(), which is implemented as follows:

  1. The main stack pointer (MSP) is initialized by hardware.
  2. Call __modm_initialize_platform() to initialize the device hardware.
  3. Copy data from internal flash to internal RAM.
  4. Zero sections in internal RAM.
  5. Initialize ARM Cortex-M core: enable FPU and relocate vector table.
  6. Execute shared hardware initialization functions.
  7. Copy data from internal flash to external RAM.
  8. Zero sections in external RAM.
  9. Initialize heap via __modm_initialize_memory() (provided by the modm:platform:cortex-m:allocator option).
  10. Call static constructors.
  11. Call main() application entry point.
  12. If main() returns, assert on core.main.exit (only in debug profile).
  13. Reboot if assertion returns.

Device Initialization

The __modm_initialize_platform() function is called directly after reset. Its purpose is to initialize the device-specific hardware, such as enabling internal memories or disabling the hardware watchdog timer.

It's important to understand that because the .data section has not yet been copied and the .bss section has not yet been zeroed, there exists no valid C environment yet in this function context! This means you cannot use any global variables, not even "local" static ones defined in your function, and depending on your hardware you may not even access read-only data (const variables, global OR local). In addition, if your linkerscript places the main stack pointer into a memory that is disabled on reset, you cannot even access the stack until you've enabled its backing memory. The Reset_Handler therefore calls this function in Assembly without accessing the stack.

It is strongly recommended to only read/write registers in this function, and perhaps even write this function in Assembly if deemed necessary. Do not initialize the device clock, leave the default clock undisturbed!

Additional Initialization

A few modules need to initialize additional hardware during booting. For example, your device may have external memories connected that you want to use for the heap. You can create a function that configures the peripherals for these external memories and place a pointer to it into a special linker section. The startup script then calls this function before heap initialization.

Since the hardware init functions are called after internal data initialization, you have a valid C environment and thus can access the device normally, but since the calls happen before external data and heap initialization you cannot use the heap in these functions!

You can give a relative global order to your init functions. Ordered init functions are called first, then unordered init functions are called in any order. Please note that order numbers 0 - 999 are reserved for use by modm or other libraries!

Unique init function names

Init function names need to be globally unique for linking. Unfortunately there is no simple way of stringifying C++ functions, so you have to provide a name manually for now.

void init_external_sdram()
{
    // configure the hardware here
}
// Startup script calls this function in any order, *after* prioritized functions!
MODM_HARDWARE_INIT(init_external_sdram);
// If you need to pass a namespaced C++ function, you need to provide a unique name manually
MODM_HARDWARE_INIT_NAME(init_function_name, namespace::init_function);

// If you need to initialize in a certain order use numbers >= 1000
MODM_HARDWARE_INIT_ORDER(init_before_sdram1, 1000);
// called after init_before_sdram1, since it has a higher order number
MODM_HARDWARE_INIT_NAME_ORDER(init_before_sdram2, namespace::function, 1001);
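
The resulting call order can be modeled on a host machine. The following is a minimal Python sketch and not modm's implementation: the `call_order` helper and the tuple layout are hypothetical, and unordered functions are simply kept in registration order here, although the startup script may call them in any order.

```python
def call_order(registrations):
    """Return init function names in a call order consistent with the rules above.

    registrations: list of (name, order) tuples, where order is an int
    or None for functions registered without an explicit order.
    """
    ordered = [(order, name) for name, order in registrations if order is not None]
    unordered = [name for name, order in registrations if order is None]
    # Ordered functions run first, lowest order number first.
    return [name for _, name in sorted(ordered)] + unordered


registrations = [
    ("init_external_sdram", None),   # MODM_HARDWARE_INIT
    ("init_before_sdram2", 1001),    # MODM_HARDWARE_INIT_NAME_ORDER
    ("init_before_sdram1", 1000),    # MODM_HARDWARE_INIT_ORDER
]
sequence = call_order(registrations)
```

With these registrations, both ordered functions precede init_external_sdram regardless of the order in which they were declared.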

Interrupt Vector Table

The Cortex-M vector table (VTOR) is target-specific and generated using data from modm-devices. The main stack pointer is allocated according to the linkerscript and the Reset_Handler is defined by the startup script.

All handlers are weakly aliased to Undefined_Handler, which is called if an IRQ is enabled but no handler is defined for it. This default handler determines the currently active IRQ, sets its priority to the lowest level, disables the IRQ so that it cannot fire again, and then asserts on core.nvic.undefined with the (signed) IRQ number as context.

The lowering of the priority is necessary, since the assertion handlers (see modm:architecture:assert) are called from within this active IRQ and its priority should not prevent logging functionality (which might require a UART interrupt to flush data out) from working correctly.
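
The IRQ number is signed because the Cortex-M system exceptions are numbered below the device interrupts. The sketch below assumes that the active IRQ is derived from the IPSR register, which this text does not state explicitly. The `active_irq_number` helper is hypothetical and only illustrates the arithmetic.

```python
def active_irq_number(ipsr):
    """Convert an IPSR register value into a signed CMSIS-style IRQ number."""
    exception_number = ipsr & 0x1FF  # IPSR[8:0] holds the active exception number
    return exception_number - 16     # device interrupts start at exception 16


hardfault = active_irq_number(3)          # system exception, negative number
systick = active_irq_number(15)           # last system exception
first_device_irq = active_irq_number(16)  # first device interrupt
```

System exceptions therefore map to negative numbers and device interrupts to numbers starting at zero.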

Linkerscript

This module provides building blocks for GNU ld linkerscripts in the form of Jinja macros that the modm:platform:core module assembles into a linkerscript, depending on the memory architecture of the target chosen.

The following macros are available:

  • copyright(): Copyright notice.
  • prefix(): Contains MEMORY sections, output format and entry symbol and stack size definitions

  • section_vector_rom(memory): place the read-only vector table at the beginning of ROM memory.

  • section_vector_ram(memory): place the volatile vector table into RAM memory. You must satisfy alignment requirements externally.

  • section(memory, name): place section .{name} into memory.

  • section_stack(memory, start=None): place the main stack into memory after moving the location counter to start.

  • section_heap(memory, name, section=None): Fill up remaining space in memory with heap section .{name} and add to section.

  • section_rom(memory): place all read-only sections (.text, .rodata etc) into memory.

  • section_ram(memory, rom): place all volatile sections (.data, .bss etc) into memory and load from rom.

  • section_table_zero(memory, sections=[]): place the zeroing table (.bss plus sections) into memory.

  • section_table_copy(memory, sections=[]): place the copying table (.data, .fastdata, .vector_ram plus sections) into memory.
  • section_table_extern(memory): place the zeroing and copying tables for external memories into memory.
  • section_table_heap(memory, sections): place heap tables containing sections into memory.

  • section_rom_start(memory): place at ROM start.

  • section_rom_end(memory): place at ROM end.
  • section_debug(): place debug sections at the very end.

Please consult the modm:platform:core documentation for the target-specific arrangement of these section macros and for potential limitations that the target's memory architecture poses.

Section .fastdata

For devices without a data cache, place the .fastdata section into the fastest RAM. Please note that the .fastdata section may need to be placed into RAM that is only accessible to the Cortex-M core, which can cause issues with DMA access. However, the .fastdata section is not required to be DMA-able, and in such a case the developer needs to place the data into the generic .data section or choose a device with a DMA-able fast RAM.

Section .fastcode

For devices without an instruction cache or without a fast RAM connected to the I-bus, place .fastcode into ROM, which usually has a device-specific ROM cache. Please note that using a device with a dedicated instruction cache RAM yields much more predictable performance than executing from ROM, even with a ROM cache.

From the Cortex-M3 Technical Reference Manual:

14.5 System Interface:

The system interface is a 32-bit AHB-Lite bus. Instruction and vector fetches, and data and debug accesses to the System memory space, 0x20000000 - 0xDFFFFFFF, 0xE0100000 - 0xFFFFFFFF, are performed over this bus.

14.5.6 Pipelined instruction fetches:

To provide a clean timing interface on the System bus, instruction and vector fetch requests to this bus are registered. This results in an additional cycle of latency because instructions fetched from the System bus take two cycles. This also means that back-to-back instruction fetches from the System bus are not possible.

Note: Instruction fetch requests to the ICode bus are not registered. Performance critical code must run from the ICode interface.

Adding Sections

The default linkerscripts only describe the internal memory, however, they can be extended for external memories using the linkerscript.* collectors of this module. For example, to add an external 16MB SDRAM to your device and place a static data section there that is copied from flash and use the remainder for heap access, these steps need to be performed:

Add the external SDRAM to the linkerscript's MEMORY statements in the project.xml configuration:

<library>
  <collectors>
    <collect name="modm:platform:cortex-m:linkerscript.memory">
       SDRAM (rwx) : ORIGIN = 0xC0000000, LENGTH = 16M
    </collect>
  </collectors>
</library>

You can also declare this as Python code in an lbuild module.lb file (useful for board support package modules, see modm:board):

env.collect(":platform:cortex-m:linkerscript.memory",
            "SDRAM (rwx) : ORIGIN = 0xC0000000, LENGTH = 16M")

Add a partition of the new memory to the linkerscript's SECTIONS statement. Since collector order is only preserved locally, make sure to add the sections that depend on this order in one value. Here the previous value of the SDRAM location counter is required to "fill up" the remaining memory with the external heap section:

linkerscript_sections = """
.sdramdata :
{
    __sdramdata_load = LOADADDR (.sdramdata);   /* address in FLASH */
    __sdramdata_start = .;                      /* address in RAM */

    KEEP(*(.sdramdata))

    . = ALIGN(4);
    __sdramdata_end = .;
} >SDRAM AT >FLASH

.heap_extern (NOLOAD) : ALIGN(4)
{
    __heap_extern_start = .;
    . = ORIGIN(SDRAM) + LENGTH(SDRAM);
    __heap_extern_end = .;
} >SDRAM
"""
env.collect(":platform:cortex-m:linkerscript.sections", linkerscript_sections)

Next, add the sections that need to be copied from ROM to RAM. Here, the contents of the .sdramdata section are stored in the internal FLASH memory and need to be copied into SDRAM during startup:

linkerscript_copy = """
LONG(__sdramdata_load)
LONG(__sdramdata_start)
LONG(__sdramdata_end)
"""
env.collect(":platform:cortex-m:linkerscript.table_extern.copy", linkerscript_copy)
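
How such a table entry could be consumed can be illustrated with a small host-side model. This is a sketch under assumptions, not the actual startup code: the flat `memory` array, the toy addresses and the `process_copy_table` helper are hypothetical, and only the load/start/end triple layout is taken from the collector value above.

```python
def process_copy_table(memory, table):
    """Copy each [start, end) range from its load address, like a startup copy loop."""
    for load, start, end in table:
        length = end - start
        memory[start:start + length] = memory[load:load + length]


# Hypothetical toy address space: "flash" begins at 0x00, "SDRAM" at 0x40.
memory = bytearray(0x80)
memory[0x10:0x18] = b"\x01\x02\x03\x04\x05\x06\x07\x08"  # initial values stored in flash

# One entry, mirroring LONG(__sdramdata_load), LONG(__sdramdata_start), LONG(__sdramdata_end).
copy_table = [(0x10, 0x40, 0x48)]
process_copy_table(memory, copy_table)
```

After the call, the eight bytes at the load address are also present at the start address, while memory beyond the end address is untouched.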

And finally, to register the remaining memory in SDRAM with the allocator, add the memory range to the heap table. Remember to use the correct memory traits for this memory, see modm:architecture:memory for the trait definitions:

linkerscript_heap = """
LONG(0x801f)
LONG(__heap_extern_start)
LONG(__heap_extern_end)
"""
env.collect(":platform:cortex-m:linkerscript.table_extern.heap", linkerscript_heap)
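
If several external regions need to be registered, the three-line entry can be generated instead of written by hand. The `heap_table_entry` helper below is hypothetical and not part of modm or lbuild. It only formats the text shown above, and the traits value is passed through without interpretation.

```python
def heap_table_entry(traits, start_symbol, end_symbol):
    """Format one heap table entry: traits word, then start and end symbols."""
    return "\n".join([
        "LONG(0x{:04x})".format(traits),
        "LONG({})".format(start_symbol),
        "LONG({})".format(end_symbol),
    ])


linkerscript_heap = heap_table_entry(0x801f, "__heap_extern_start", "__heap_extern_end")
```

The resulting string can be passed to env.collect() in the same way as the literal value above.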

Linkerscript collectors are plain text

The collectors here only strip the leading/trailing whitespace and newlines and paste the result as is into the linkerscripts. No input validation is performed, so if you receive linker errors with your additions, please check the GNU LD documentation first.

This module is only available for stm32.

Options

allocator

Dynamic memory allocation strategy

By default, the arm-none-eabi toolchain ships with the newlib libc, which uses dlmalloc as the underlying allocator algorithm and only requires the implementation of the void * sbrk(ptrdiff_t size) hook. However, this limits the allocator to use just one memory region, which must then also be of continuous extent, since sbrk can only grow and shrink, but not jump. Therefore, when using the newlib strategy, only the largest memory region is used as heap! Depending on the device memory architecture this can leave large memory regions unused.

For devices with very small memories, we recommend using the block allocator strategy, which uses a very light-weight and simple algorithm. This also only operates on one continuous memory region as heap.

Memory traits

Memories can have different traits, such as DMA-ability or access time. The default memory allocator functions (malloc, new, etc) only return DMA-able memories, ordered by fastest access time. Similarly the search for the largest memory region only considers DMA-able memory.

On multi-SRAM devices

For devices which contain separate memories laid out in a continuous way (often called SRAM1, SRAM2, etc.) the newlib and block strategies choose the largest continuous memory region, even though unaligned accesses across memory regions may not be supported in hardware and lead to a bus fault! Consider using the TLSF implementation, which does not suffer from this issue.

To use all non-statically allocated memory for heap, use the TLSF strategy, which natively supports multiple memory regions. Our implementation treats all internal memories as separate regions, so unaligned accesses across memory boundaries are not an issue. To request heap memory of different traits, see modm::MemoryTraits.
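
The selection of the largest continuous region by the newlib and block strategies can be sketched as follows. This is an illustrative model only: the `largest_continuous_region` helper and the memory layout are hypothetical, and the filtering by DMA-ability described above is left out.

```python
def largest_continuous_region(regions):
    """Merge adjacent regions and return (start, size) of the largest merged range.

    regions: list of (start_address, size_in_bytes) tuples.
    """
    merged = []
    for start, size in sorted(regions):
        if merged and merged[-1][0] + merged[-1][1] == start:
            # This region begins exactly where the previous one ends: extend it.
            merged[-1] = (merged[-1][0], merged[-1][1] + size)
        else:
            merged.append((start, size))
    return max(merged, key=lambda region: region[1])


# Hypothetical layout: SRAM1 and SRAM2 are adjacent, a third memory is separate.
regions = [
    (0x20000000, 112 * 1024),  # SRAM1
    (0x2001C000, 16 * 1024),   # SRAM2, starts right after SRAM1
    (0x10000000, 64 * 1024),   # separate memory
]
start, size = largest_continuous_region(regions)
```

In this model SRAM1 and SRAM2 are merged into one 128 kB heap that spans the boundary between them, which is the situation the note on multi-SRAM devices warns about. A strategy that keeps every region separate would skip the merge step.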

TLSF has static overhead

The TLSF implementation has a static overhead of about 1kB per memory trait group, however, these can then contain multiple non-continuous memory regions. The upside of this large static allocation is very fast allocation times of O(1), but we recommend using TLSF only for devices with multiple memory regions.

Default: newlib
Inputs: [block, newlib, tlsf]
Input Dependency: block -> modm:architecture:heap
Input Dependency: tlsf -> modm:tlsf

linkerscript.flash_offset

Offset of FLASH Section Origin

Add an offset to the default start address of the flash memory. This might be required for bootloaders located there.

Vector Table Relocation

Not all offsets are compatible with the vector table relocation.

Default: 0
Inputs: [0 ... 0x10000] stm32{f0,f1,f3,f4,f7,g0,g4,l1,l4}
Inputs: [0 ... 0x100000] stm32{f1,f2,f4,f7,l4}
Inputs: [0 ... 0x180000] stm32f4
Inputs: [0 ... 0x20000] stm32{f0,f1,f2,f3,f4,g0,g4,l1,l4}
Inputs: [0 ... 0x200000] stm32{f4,f7,l4}
Inputs: [0 ... 0x4000] stm32{f0,f1,f3,g0}
Inputs: [0 ... 0x40000] stm32{f0,f1,f2,f3,f4,f7,g4,l1,l4}
Inputs: [0 ... 0x60000] stm32{f1,f3,f4,l1}
Inputs: [0 ... 0x8000] stm32{f0,f1,f3,g0,g4,l1}
Inputs: [0 ... 0x80000] stm32{f1,f2,f3,f4,f7,g4,l1,l4}
Inputs: [0 ... 0xc0000] stm32{f1,f2}

main_stack_size

Minimum size of the application main stack

The ARM Cortex-M uses a descending stack mechanism which is placed so that it grows towards the beginning of RAM. In case of a stack overflow the hardware then attempts to stack into invalid memory which triggers a HardFault. A stack overflow will therefore never overwrite any static or heap memory and this protection works without the MPU and therefore also on ARM Cortex-M0 devices.

If the vector table is relocated into RAM, the start address needs to be aligned to the next highest power-of-two word depending on the total number of device interrupts. On devices where the table is relocated into the same memory as the main stack, an alignment buffer up to 1kB is added to the main stack.

|              ...                |
|---------------------------------|
|    Interrupt Vectors (in RAM)   |
|        (if re-mapped)           | <-- vector table origin
|---------------------------------| <-- main stack top
|           Main Stack            |
|       (grows downwards)         |
|               |                 |
|               v                 |
|---------------------------------|
|  Alignment buffer for vectors   |
|   (overwritten by main stack!)  |
'---------------------------------' <-- RAM origin
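
The power-of-two alignment can be computed from the number of table entries. The sketch below assumes 4-byte vector entries and ignores any architectural minimum alignment. Both helper names are hypothetical.

```python
def vector_table_alignment(total_vectors):
    """Return the byte alignment for a RAM vector table with 4-byte entries."""
    table_size = total_vectors * 4
    alignment = 1
    while alignment < table_size:
        alignment *= 2  # round up to the next power of two
    return alignment


def is_valid_table_origin(address, total_vectors):
    """Check whether address satisfies the alignment for the given table."""
    return address % vector_table_alignment(total_vectors) == 0


# Hypothetical device with 16 system vectors and 82 device interrupts.
alignment = vector_table_alignment(16 + 82)
```

A table with 256 entries yields an alignment of 1024 bytes in this model, which is consistent with the alignment buffer of up to 1kB mentioned above.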

Default: 3040
Inputs: [256 .. 3040 .. 65536]

vector_table_location

Vector table location in ROM or RAM

The vector table is always stored in ROM and copied to RAM by the startup script if required. You can modify the RAM vector table using the CMSIS NVIC functions:

  • void NVIC_SetVector(IRQn_Type IRQn, uint32_t vector)
  • uint32_t NVIC_GetVector(IRQn_Type IRQn)

For applications that do not modify the vector table at runtime, relocation to RAM is not necessary and can save a few hundred bytes of static memory.

By default, the fastest option is chosen depending on the target memory architecture. This does not always mean the table is copied into RAM, and therefore may not be modifiable with this option!

From the ARM Cortex-M4 Technical Reference Manual on exception handling:

  • Processor state is automatically stored to the stack on an exception, and automatically restored from the stack at the end of the Interrupt Service Routine.
  • The vector is fetched in parallel to the state saving, enabling efficient interrupt entry.

On Interrupt Latency

Placing main stack and vector table into the same memory can significantly increase interrupt latency, since both the I-Code and D-Code memory interfaces need to fetch from the same access port.

This option is only available for stm32{f1,f2,f3,f4,f7,g4,l1,l4}.

Default: ram stm32{f3,f7,g4}
Default: rom stm32{f1,f2,f3,f4,l1,l4}
Inputs: [ram, rom]

Collectors

linkerscript

Computes linkerscript properties (* post-build only):

  • process_stack_size: largest requested process stack size by any module
  • vector_table_location: ram or rom

Stripped and newline-joined collector values of:

  • linkerscript_memory
  • linkerscript_sections
  • linkerscript_extern_zero
  • linkerscript_extern_copy
  • linkerscript_extern_heap

Additional memory properties:

  • memories: unfiltered memory regions
  • regions: memory region names
  • ram_origin: Lowest SRAM origin address
  • ram_size: Total size of all SRAM regions

:returns: dictionary of linkerscript properties

linkerscript.memory

Additions to the linkerscript's 'MEMORY'

Inputs: [String]

linkerscript.process_stack_size

Maximum required size of the process stack

Inputs: [0 ... +Inf]

linkerscript.sections

Additions to the linkerscript's 'SECTIONS'

Inputs: [String]

linkerscript.table_extern.copy

Additions to the linkerscript's '.table.copy.extern' section

Inputs: [String]

linkerscript.table_extern.heap

Additions to the linkerscript's '.table.heap' section

Inputs: [String]

linkerscript.table_extern.zero

Additions to the linkerscript's '.table.zero.extern' section

Inputs: [String]

vector_table

Computes vector table properties:

  • vector_table: [position] = Full vector name (i.e. with _Handler or _IRQHandler suffix)
  • vector_table_location: rom or ram
  • highest_irq: highest IRQ number + 1
  • core: cortex-m{0,3,4,7}{,+,f,fd}

The system vectors start at -16, so you must add 16 to highest_irq to get the total number of vectors in the table!

:returns: a dictionary of vector table properties
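
As a short worked example of the note above, the total table size follows directly from highest_irq. The `properties` dictionary here is a hypothetical stand-in for the collector result and contains only the one key needed. The 4-byte entry size is an assumption based on 32-bit vector addresses.

```python
def vector_table_entries(highest_irq):
    """Total number of vector table entries, including the 16 system vectors."""
    return highest_irq + 16


# Hypothetical collector result for a device whose last IRQ number is 81.
properties = {"highest_irq": 82}
total = vector_table_entries(properties["highest_irq"])
table_bytes = total * 4  # each vector is one 32-bit address
```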

Dependencies

modm:platform:cortex-m depends on:

  • modm:architecture:assert
  • modm:architecture:heap
  • modm:architecture:interrupt
  • modm:architecture:memory
  • modm:cmsis:device
  • modm:platform:clock
  • modm:tlsf

Limited availability: Check with 'lbuild discover' if this module is available for your target!