Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32H7xx: Driver for internal flash memory partially uses a fixed flash program word size, which doesn't fit for all STM32H7xx SOCs (e.g. STM32H7A3, STM32H7B0, STM32H7B3) leading to potential flash data corruption #45568

Closed
chrihell opened this issue May 11, 2022 · 2 comments
Assignees
Labels
bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 priority: low Low impact/importance bug
Milestone

Comments

@chrihell
Copy link
Contributor

Describe the bug
When running an adapted version of the Zephyr littlefs example on a Nucelo-H7A3ZI-Q board it fails to create and format the littlefs filesystem placed in an internal flash partition.

The UART log output shows that littlefs cannot successfully create the filesystem and fails to create superblock 0 with error code -28
(see log output further down)
Unfortunately the meaning of the error code is a bit misleading here.

Please also mention any information which could help others to understand
the problem you're facing:

  • What target platform are you using?
    The Zephyr board name is nucleo_h7a3zi_q.
    It uses an STM32H7A3ZI-Q microcontroller (ARM Cortex-M7).

  • What have you tried to diagnose or workaround this issue?
    I tried various things to examine the issue, mainly log analysis and debugging with gdb.
    Actually I think that the issue originates from the stm32h7xx specific low-level flash driver write routine flash_stm32_write_range(). More on that further down.

To Reproduce
Steps to reproduce the behavior:

  1. Copy attached .conf and .overlay files to directory zephyr\samples\subsys\fs\littlefs\boards to add support for the nucleo_h7a3zi_q board to the littlefs example project. Remove the .txt file extension that was added because of github.
    (see section 'Additional context' for attached files)
  2. Set CONFIG_APP_WIPE_STORAGE=y in prj.conf of the example project.
    This ensures, that the flash partition is always getting erased and that the littlefs filesystem is recreated.
  3. Build example project:
    west build -p -b nucleo_h7a3zi_q zephyr\samples\subsys\fs\littlefs
  4. Connect board via USB cable and open serial console
  5. Flash the example project
    west flash
  6. See UART log ouput, as attached to this issue, showing the error

Expected behavior
The littlefs filesystem initialization shouldn't fail and the example program should be able to execute in the intended way and do it's file operations (e.g. incrementing and writing boot counter to file). No errors should be reported in the log output by littlefs, the file system or flash driver.
File Systems API calls should work as expected.

Impact
I'm rather new to Zephyr and it took me several hours to track down and to locally fix the issue.
Initially I expected that the example project builds and would work as-is out of the box for the used Nucleo board, as support was added for it and littlefs.
So it looked like a showstopper in the beginning, but was in the end luckily more of a learning experience in debugging and working with Zephyr and about filing this bug report.

The issue does probably not only affect littlefs write operations, but potentially affects all internal flash writes on STM32H7xx SOCs that have a flash program word size that is different from 256 bits.

Logs and console output
In the described error situation the UART log output is the following:

I: littlefs partition at /lfs1
*** Booting Zephyr OS build zephyr-v3.0.0-3700-g47850a301c41  ***
Sample program to r/w files on littlefs
Area 0 at 0x80000 on FLASH_CTRL for 524288 bytes
I: LittleFS version 2.4, disk version 2.0
I: FS at FLASH_CTRL:0x80000 is 64 0x2000-byte blocks with 512 cycle
I: sizes: rd 1 ; pr 16 ; ca 256 ; la 32
E: WEST_TOPDIR/modules/fs/littlefs/lfs.c:1077: Corrupted dir pair at {0x0, 0x1}
W: can't mount (LFS -84); formatting
W: WEST_TOPDIR/modules/fs/littlefs/lfs.c:1802: Superblock 0x0 has become unwritable
E: format failed (LFS -28)
E: fs mount error (-28)
FAIL: mount id 0 at /lfs1: -28

The log shows how littlefs tries to format the partition, but fails to write Superblock 0 correctly and then fails with error code -28.

Environment (please complete the following information):

  • OS: Windows 10
  • Toolchain:
    • GNU ARM Embedded Toolchain on Windows (gcc-arm-none-eabi-9-2020-q2-update-win32)
    • xpack-openocd-0.11.0-1-win32-x64
    • CMake 3.20.2
    • Ninja 1.10.2
    • Python 3.8.8 (AMD64 build)
  • Zephyr 3.0.99, Commit hash: 47850a3

Additional context
nucleo_h7a3zi_q.conf.txt
nucleo_h7a3zi_q.overlay.txt

Proposed fix
Change the code in zephyr\drivers\flash\flash_stm32h7x.c:376 to:

for (i = 0; i < len && i + nbytes <= len; i += nbytes, offset += nbytes) {

This essentially replaces the constant 32 by the already calculated number of bytes in nbytes.
The calculated number fits to the flash program word size of the used STM32H7xx SOC type, as it is based on a SOC type specific definition of the flash program word size.

Explanation
Original Code

...
int flash_stm32_write_range(const struct device *dev, unsigned int offset,
			    const void *data, unsigned int len)
{
	int rc = 0;
	int i, j;
	const uint8_t ndwords = FLASH_NB_32BITWORD_IN_FLASHWORD / 2;
	const uint8_t nbytes = FLASH_NB_32BITWORD_IN_FLASHWORD * 4;
	uint8_t unaligned_datas[nbytes];

	for (i = 0; i < len && i + nbytes <= len; i += nbytes, offset += nbytes) {
		rc = write_ndwords(dev, offset,
				   (const uint64_t *) data + (i >> 3),
				   ndwords);
		if (rc < 0) {
			return rc;
		}
	}
...
  • In the shown code part, with each execution of the for loop the number of dwords (64 bits / 8 bytes) is written to flash that is needed to match the flash program word size.

  • Most STM32H7xx SOCs seem to have 256 bits / 32 bytes flash program word size. This results in ndwords being equal to 32 / 8 = 4

  • That's the reason why the hard-coded constant 32 is used to increment the byte offset in the for loop too.

  • Unfortunately some (newer) STM32H7xx SOC types use only a flash program word size of 128 bits / 16 bytes (e.g. STM32H7A3, STM32H7B0, STM32H7B3 types).
    As ndwords and nbytes are calculated based on definitions from the SOC type specific header file they are correct and fitting.
    However the constant 32 in the for loop is not, as it would need to be set to 16 in this case.

  • The issue is caused by the fact, that only two dwords are written via the write_ndwords() call each loop, so 16 bytes, but the byte offset in the for loop is still incremented by 32 bytes.
    When checking the flash content, one can see that a block of 16 bytes of data is followed by an erased block of 16 bytes in an alternating fashion.

  • Proposed solution: The code can be fixed by replacing the constant 32 by the calculated constant nbytes, as this constant respects the definition from the SOC type specific header file from the modules\hal\stm32\stm32cube\stm32h7xx\soc directory.
    See the definition for FLASH_NB_32BITWORD_IN_FLASHWORD in those header files.

Log output via UART after applying the proposed solution

Littlefs detects a corrupted filesystem, as the flash partition got erased by the application.
But this time it succeeds with formatting and the freshly created filesystem can be mounted afterwards.
The file operations of the example application work without errors, too.

I: littlefs partition at /lfs1
*** Booting Zephyr OS build zephyr-v3.0.0-3700-g47850a301c41  ***
Sample program to r/w files on littlefs
Area 0 at 0x80000 on FLASH_CTRL for 524288 bytes
E: Erasing flash area ... 0
I: LittleFS version 2.4, disk version 2.0
I: FS at FLASH_CTRL:0x80000 is 64 0x2000-byte blocks with 512 cycle
I: sizes: rd 1 ; pr 16 ; ca 256 ; la 32
E: WEST_TOPDIR/modules/fs/littlefs/lfs.c:1077: Corrupted dir pair at {0x0, 0x1}
W: can't mount (LFS -84); formatting
I: /lfs1 mounted
/lfs1 mount: 0
/lfs1: bsize = 16 ; frsize = 8192 ; blocks = 64 ; bfree = 62

Listing dir /lfs1 ...
/lfs1/boot_count read count:0 (bytes: 0)
/lfs1/boot_count write new boot count 1: [wr:1]
I: Test file: /lfs1/pattern.bin not found, create one!
------ FILE: /lfs1/pattern.bin ------
01 55 55 55 55 55 55 55 02 55 55 55 55 55 55 55
03 55 55 55 55 55 55 55 04 55 55 55 55 55 55 55
05 55 55 55 55 55 55 55 06 55 55 55 55 55 55 55
07 55 55 55 55 55 55 55 08 55 55 55 55 55 55 55
09 55 55 55 55 55 55 55 0a 55 55 55 55 55 55 55
0b 55 55 55 55 55 55 55 0c 55 55 55 55 55 55 55
0d 55 55 55 55 55 55 55 0e 55 55 55 55 55 55 55
0f 55 55 55 55 55 55 55 10 55 55 55 55 55 55 55
11 55 55 55 55 55 55 55 12 55 55 55 55 55 55 55
13 55 55 55 55 55 55 55 14 55 55 55 55 55 55 55
15 55 55 55 55 55 55 55 16 55 55 55 55 55 55 55
17 55 55 55 55 55 55 55 18 55 55 55 55 55 55 55
19 55 55 55 55 55 55 55 1a 55 55 55 55 55 55 55
1b 55 55 55 55 55 55 55 1c 55 55 55 55 55 55 55
1d 55 55 55 55 55 55 55 1e 55 55 55 55 55 55 55
1f 55 55 55 55 55 55 55 20 55 55 55 55 55 55 55
21 55 55 55 55 55 55 55 22 55 55 55 55 55 55 55
23 55 55 55 55 55 55 55 24 55 55 55 55 55 55 55
25 55 55 55 55 55 55 55 26 55 55 55 55 55 55 55
27 55 55 55 55 55 55 55 28 55 55 55 55 55 55 55
29 55 55 55 55 55 55 55 2a 55 55 55 55 55 55 55
2b 55 55 55 55 55 55 55 2c 55 55 55 55 55 55 55
2d 55 55 55 55 55 55 55 2e 55 55 55 55 55 55 55
2f 55 55 55 55 55 55 55 30 55 55 55 55 55 55 55
31 55 55 55 55 55 55 55 32 55 55 55 55 55 55 55
33 55 55 55 55 55 55 55 34 55 55 55 55 55 55 55
35 55 55 55 55 55 55 55 36 55 55 55 55 55 55 55
37 55 55 55 55 55 55 55 38 55 55 55 55 55 55 55
39 55 55 55 55 55 55 55 3a 55 55 55 55 55 55 55
3b 55 55 55 55 55 55 55 3c 55 55 55 55 55 55 55
3d 55 55 55 55 55 55 55 3e 55 55 55 55 55 55 55
3f 55 55 55 55 55 55 55 40 55 55 55 55 55 55 55

41 55 55 55 55 55 55 55 42 55 55 55 55 55 55 55
43 55 55 55 55 55 55 55 44 55 55 55 55 55 55 55
45 55 aa
I: /lfs1 unmounted
/lfs1 unmount: 0
@chrihell chrihell added the bug The issue is a bug, or the PR is fixing a bug label May 11, 2022
@erwango erwango added priority: low Low impact/importance bug platform: STM32 ST Micro STM32 labels May 12, 2022
@ABOSTM
Copy link
Collaborator

ABOSTM commented May 12, 2022

@chrihell ,
Thanks for reporting this issue, and welcome to Zephyr community.
Thank you also for your analysis, which is very detailed, and further more exact,
And finally thanks for the fix you proposed, and the test you performed, it looks good.

So you made all the job, thus I propose you to continu the Zephyr experience, and push, on your own, the fix that you proposed in a Pull Request.

@chrihell
Copy link
Contributor Author

@ABOSTM
Thanks for the quick and welcoming reply. I'll try to do the PR then.
It can't be that hard to change a line of code, right? ;)

@erwango erwango added this to the v3.1.0 milestone May 16, 2022
chrihell added a commit to chrihell/zephyr that referenced this issue May 16, 2022
The flash_stm32_write_range() function of the STM32H7x flash
driver partially uses a wrong flash program word size for certain
SOC types when calculating the flash write offset. If the used
SOC is not having a flash program word size of 256 bits / 32 bytes
the written data might get corrupted, as the flash write offset
value does not match the number of bytes that were actually
written.

Fixes zephyrproject-rtos#45568

Signed-off-by: Christoph Heller <[email protected]>
laxiLang pushed a commit to laxiLang/zephyr that referenced this issue May 30, 2022
The flash_stm32_write_range() function of the STM32H7x flash
driver partially uses a wrong flash program word size for certain
SOC types when calculating the flash write offset. If the used
SOC is not having a flash program word size of 256 bits / 32 bytes
the written data might get corrupted, as the flash write offset
value does not match the number of bytes that were actually
written.

Fixes zephyrproject-rtos#45568

Signed-off-by: Christoph Heller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 priority: low Low impact/importance bug
Projects
None yet
Development

No branches or pull requests

3 participants