gptp does not work well on NXP rt series platform #33747

Closed
kevin137 opened this issue Mar 26, 2021 · 239 comments · Fixed by #35328
Labels: area: Networking, bug, platform: NXP, priority: low

@kevin137

Description of bug

Problem with the gptp_event_capture() function in the gPTP demo in ZephyrOS on the RT1020EVK: the second member of the net_ptp_time struct is updated correctly, but the nanosecond member is NOT. The same function in the same demo on the FRDM-K64F works correctly.

Background

We are trying to use gPTP (IEEE 802.1AS) on the i.MXRT platform to synchronize timestamps across devices in a sensor network. All the hardware documentation for the i.MXRT and App Notes such as AN12149 (https://www.nxp.com/docs/en/nxp/application-notes/AN12149.pdf) indicate that this should be possible. We are interested in using ZephyrOS because we feel it should work well with the way we develop embedded systems, which are at the moment all based on Linux and Yocto. Zephyr seems to have good network support, and gPTP support, at least on some platforms (https://docs.zephyrproject.org/latest/reference/networking/gptp.html). To kick-start this process, and to evaluate whether gPTP/ZephyrOS/i.MXRT is a good technical solution for synchronization, we asked a graduate student at Universitat Politècnica de Catalunya, Santi Prats, to help us.

Santi made significant progress: first installing the SDK and tools, then compiling Zephyr and applications for i.MXRT and Kinetis, and also getting PTP and gPTP synchronization working purely in Linux. Then he encountered a problem that stumped both him and us.

To Reproduce

On the FRDM-K64F board (https://docs.zephyrproject.org/latest/boards/arm/frdm_k64f/doc/index.html), he is able to extract timestamps from the gPTP system with nanosecond resolution. But on the RT1020EVK (https://docs.zephyrproject.org/latest/boards/arm/mimxrt1020_evk/doc/index.html), using the same code, with the same libraries and tools, the nanosecond member of the net_ptp_time struct is NOT being filled, and therefore we are only able to read the PTP Hardware Clock with a resolution of seconds. Reproducing the issue is as simple as adding a few extra LOG_INF() calls to the gptp demo (see below), compiling it for the RT1020-EVK, flashing it, and running it.

Lightly modified demo code

/* USER BEGIN INCLUDES */
#include <zephyr.h>
#include <logging/log.h>
#include <net/gptp.h>      /* gptp_event_capture() */
#include <net/ptp_time.h>  /* struct net_ptp_time */
#include <sys/printk.h>
#include <sys/util.h>
/* USER END INCLUDES */

/* LOG_MODULE_REGISTER(net_gptp_sample, ...) is already declared in the
 * sample's main.c, which is why the log lines carry that module name. */

/* USER BEGIN VARIABLES */
static struct net_ptp_time slave_time;
//struct gptp_clk_src_time_invokeparams src_time_invoke_parameters;
static bool gm_present;
static int status;
/* USER END VARIABLES */

void main(void)
{
  /* USER BEGIN MAIN.C */
  while (1) {
    /* 0 means no error */
    status = gptp_event_capture(&slave_time, &gm_present);

    LOG_INF(" ");
    LOG_INF(" ");
    LOG_INF("Standard info plot:");
    LOG_INF("gPTP event capture is %d", status);
    /* The logger may pass arguments as 32-bit words, so narrow the
     * 64-bit seconds field explicitly (fits until the year 2106). */
    LOG_INF("gPTP time second %u", (uint32_t)slave_time.second);
    LOG_INF(" ");
    LOG_INF("Plot slave time SECONDS:");
    LOG_INF("gPTP slave time second (u) %u", (uint32_t)slave_time.second);
    LOG_INF("gPTP slave time second (X) 0x%X", (uint32_t)slave_time.second);
    LOG_INF(" ");
    LOG_INF("Plot slave time NANOSECONDS:");
    LOG_INF("gPTP slave time nanosecond (u) %u", slave_time.nanosecond);
    LOG_INF("gPTP slave time nanosecond (X) 0x%X", slave_time.nanosecond);
    LOG_INF(" ");
    LOG_INF("slave_time.second address: %p", (void *)&slave_time.second);
    LOG_INF("slave_time.nanosecond address: %p", (void *)&slave_time.nanosecond);
    k_msleep(1000); /* sleep time in ms */
  }
  /* USER END MAIN.C */
}

Build

$ west build -b mimxrt1020_evk samples/net/gptp/
$ west flash

Run

[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: Standard info plot:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP event capture is 0
[00:04:59.182,000] <inf> net_gptp_sample: gPTP time second 1614051136
[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: Plot slave time SECONDS:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time second (u) 1614051136
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x60347740
[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 0
[00:04:59.182,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x0
[00:04:59.182,000] <inf> net_gptp_sample: 
[00:04:59.182,000] <inf> net_gptp_sample: slave_time.second address: 0x80001DC0
[00:04:59.182,000] <inf> net_gptp_sample: slave_time.second address: 0x80001DC8
uart:~$

Expected behavior

The expected behavior is what happens when running the same code on the FRDM-K64F:

[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: Standard info plot:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP event capture is 0
[00:05:27.167,000] <inf> net_gptp_sample: gPTP time second 1614072200
[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: Plot slave time SECONDS:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time second (u) 1614072200
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x6034C988
[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 906384492
[00:05:27.167,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x3605646C
[00:05:27.167,000] <inf> net_gptp_sample: 
[00:05:27.167,000] <inf> net_gptp_sample: slave_time.second address: 0x20001DC0
[00:05:27.167,000] <inf> net_gptp_sample: slave_time.second address: 0x20001DC8
uart:~$

Note that on the FRDM-K64F, the nanosecond member IS being updated.
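
For reference, a minimal sketch of what the two members add up to. It assumes a simplified, hypothetical mirror of struct net_ptp_time (ptp_time_sketch and ptp_time_to_ns are illustrative names, not Zephyr API) with the layout suggested by the logged addresses: a 64-bit second at offset 0 and a 32-bit nanosecond 8 bytes later. With the FRDM-K64F values the combined time has sub-second precision; with the RT1020 values it is always a whole number of seconds, which is the loss of resolution described above.

```c
#include <stdint.h>

/* Hypothetical, simplified mirror of Zephyr's struct net_ptp_time.
 * Assumed layout: 64-bit seconds first, 32-bit nanoseconds 8 bytes later,
 * matching the member addresses printed in the logs above. */
struct ptp_time_sketch {
	uint64_t second;     /* whole seconds */
	uint32_t nanosecond; /* fraction of the current second, 0..999999999 */
};

/* Illustrative helper (not a Zephyr API): total time in nanoseconds.
 * If `nanosecond` is never filled in, the result is always an exact
 * multiple of one second. */
static uint64_t ptp_time_to_ns(const struct ptp_time_sketch *t)
{
	return t->second * 1000000000ULL + (uint64_t)t->nanosecond;
}
```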

Impact
We are unable to proceed with the development of the time synchronization element of our new sensor solution because of this issue. Santi is attempting to make equivalent functionality work in FreeRTOS to continue with his academic work, but this is not a good solution for us, because other team members are developing other components of the sensor firmware in Zephyr. We are just getting started with Zephyr; we like what we see so far, but timestamp synchronization with gPTP is at the very heart of what we are trying to do with our sensors. If we cannot make it work, we will need to change our strategy. We reached out to NXP through our distributor EBV, and they confirmed that the gPTP stack and demo should work in Zephyr on the RT1020-EVK. They encouraged us to report the issue here.

GDB / console output

Using the gdb debugger, Santi narrowed the difference in how the call to gptp_event_capture() progresses on the FRDM-K64F and on the MIMXRT1020-EVK down to a single line of code. The relevant part of his (attached) document describing the problem follows:

GDB debugging

To investigate the differences between what is happening on the FRDM-K64F and the MIMXRT1020-EVK, the GDB debugger was used. The following was executed from the Zephyr directory:

$ west debug

This enters debug mode:

(gdb) layout src
(gdb) advance gptp_event_capture

Execution is stopped just before

status = gptp_event_capture(&slave_time, &gm_present);

At this point,

(gdb) print slave_time.second
(gdb) print slave_time.nanosecond

Both return 0. We now step through to see execution of gptp_event_capture()

(gdb) step

Stepping through with (gdb) step shows that, before the call returns, execution passes through the following files in this order:

/include/arch/arm/aarch32/asm_inline_gcc.h line 56
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 62
    #### [ The 1020EVK does NOT pass through line 62, unlike the FRDM-K64F ]
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 64
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 66
/subsys/net/l2/ethernet/ethernet.c line 1059
/include/net/net_if.h line 589
/subsys/net/l2/ethernet/ethernet.c line 1060
/subsys/net/l2/ethernet/ethernet.c line 1062
/include/net/net_if.h line 555
/subsys/net/l2/ethernet/ethernet.c line 1070
/drivers/ethernet/eth_mcux.c line 1083
/subsys/net/l2/ethernet/ethernet.c line 1074
/include/net/net_if.h line 1078
/include/net/net_if.h line 589
/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h line 453
/modules/hal/nxp/mcux/drivers/imx/fsl_common.h line 566
/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h line 209
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2857
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2826
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2828
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2833 - 2835
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2838
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2860 - 2862
/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c line 2866
/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h line 481
/drivers/ethernet/eth_mcux.c line 1439
/drivers/ethernet/eth_mcux.c line 1440
/drivers/ethernet/eth_mcux.c line 1441
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 69
/include/arch/arm/aarch32/asm_inline_gcc.h line 95
/subsys/net/l2/ethernet/gptp/gptp_user_api.c line 70
/samples/net/gptp/src/main.c

Finally,

(gdb) print slave_time.second
(gdb) print slave_time.nanosecond

Now slave_time.second shows a correct value, while slave_time.nanosecond still reads 0:
[Screenshot: debug_1020]

In conclusion, we see the desired behavior on the FRDM-K64F: we can read timestamps down to the nanosecond. However, the same application code, making the same API calls on the RT1020, does not return nanoseconds, and it all seems to come down to line 62 in gptp_user_api.c.

Environment:

  • OS: Linux
  • Toolchain: Zephyr SDK 0.12.2, West v0.9.0, cmake 3.16.3, pip 20.0.2, GDB
  • Commit SHA: 2857c2e

Attachment: Zephyr6.pdf

kevin137 added the bug label on Mar 26, 2021
@jukkar
Member

jukkar commented Mar 29, 2021

> /subsys/net/l2/ethernet/gptp/gptp_user_api.c line 62
> #### [ The 1020EVK does NOT pass through line 62, unlike the FRDM-K64F ]
> /subsys/net/l2/ethernet/gptp/gptp_user_api.c line 64

What does the "NOT pass" mean here?
Line 62 is this one:

	for (port = GPTP_PORT_START; port <= GPTP_PORT_END; port++) {

Do you mean that on the 1020EVK the for loop is not entered?
That would mean that GPTP_PORT_END, which is (gptp_domain.default_ds.nb_ports + 1), implies that the nb_ports variable is 0, which means that the system was not able to find any PTP clock. If this is really the case, that would indicate some issue in the mcux driver.
Edit: actually, because of how the for loop is written (<=), we should always enter the loop.
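
The loop-bound argument in the edit can be checked on a host with a minimal sketch. It assumes GPTP_PORT_START is 1 and uses the GPTP_PORT_END definition quoted above (nb_ports + 1); SKETCH_PORT_START and count_port_iterations are hypothetical names, not part of the gPTP code. Because the condition is <=, the body runs nb_ports + 1 times, so even with nb_ports == 0 it runs once.

```c
/* Assumed value of GPTP_PORT_START (hypothetical stand-in). */
#define SKETCH_PORT_START 1

/* Count how often a loop shaped like the one on line 62 executes,
 * with the end bound defined as nb_ports + 1 and a `<=` condition. */
static int count_port_iterations(int nb_ports)
{
	int port_end = nb_ports + 1; /* mirrors GPTP_PORT_END */
	int iterations = 0;

	for (int port = SKETCH_PORT_START; port <= port_end; port++) {
		iterations++;
	}
	return iterations;
}
```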

The gptp subsystem is not board specific so this indicates some issue in mcux driver and/or NXP HAL.

@kevin137
Author

My interpretation was that the GPTP_PORT_END condition was not allowing the loop to be entered even once, so the loop was being optimized out by the compiler. Could it be a HAL issue, then? We also have access to a couple of RT1050 EVKs, though they are in a different office. Would it be helpful to run the same test on a 1050?

@jukkar
Member

jukkar commented Mar 29, 2021

> Would it be helpful to run the same test on a 1050?

Yes, that would be useful information to know if there are issues in 1050 too.

@MaureenHelm
Member

> Could it be that is a HAL issue then?

i.MX RT and Kinetis use the same HAL driver (fsl_enet.c).

> Would it be helpful to run the same test on a 1050?

Yes, please.

@hakehuang can you have a look?

@hakehuang
Collaborator

hakehuang commented Mar 30, 2021

@kevin137 did you follow the gPTP sample documentation to set up the environment? I found an issue on frdm_k64f and can reproduce it on mimxrt1060_evk, see #29599.

Can you try the SDK ptp1588 example, which can be downloaded from kex.nxp.com? The log below shows that the board is OK.
Note: you need to connect a local loopback RJ45 cable to the board; see this diagram.

Also, did you add the board settings for mimxrt1020_evk to the gPTP sample, as below?

CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y

Get the 1-th time 3 second, 813707720 nanosecond
 Get the 2-th time 3 second, 891518500 nanosecond
 Get the 3-th time 3 second, 969332420 nanosecond
 Get the 4-th time 4 second, 47146660 nanosecond
 Get the 5-th time 4 second, 124916860 nanosecond
 Get the 6-th time 4 second, 202730800 nanosecond
 Get the 7-th time 4 second, 280544740 nanosecond
 Get the 8-th time 4 second, 358358700 nanosecond
 Get the 9-th time 4 second, 436172620 nanosecond
 Get the 10-th time 4 second, 513986560 nanosecond
The 1 frame transmitted success! the timestamp is 4 second, 591861900 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 591862760 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 2 frame transmitted success! the timestamp is 4 second, 602166460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 602167320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 3 frame transmitted success! the timestamp is 4 second, 612474060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 612474920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 4 frame transmitted success! the timestamp is 4 second, 622781860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 622782720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 5 frame transmitted success! the timestamp is 4 second, 633089660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 633090520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 6 frame transmitted success! the timestamp is 4 second, 643397260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 643398120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 7 frame transmitted success! the timestamp is 4 second, 653705060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 653705920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 8 frame transmitted success! the timestamp is 4 second, 664012660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 664013520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 9 frame transmitted success! the timestamp is 4 second, 674320460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 674321320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 10 frame transmitted success! the timestamp is 4 second, 684628260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 684629120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 11 frame transmitted success! the timestamp is 4 second, 694979260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 694980120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 12 frame transmitted success! the timestamp is 4 second, 705330460 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 705331320 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 13 frame transmitted success! the timestamp is 4 second, 715681660 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 715682520 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 14 frame transmitted success! the timestamp is 4 second, 726032860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 726033720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 15 frame transmitted success! the timestamp is 4 second, 736384060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 736384920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 16 frame transmitted success! the timestamp is 4 second, 746735060 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 746735920 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 17 frame transmitted success! the timestamp is 4 second, 757086260 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 757087120 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 18 frame transmitted success! the timestamp is 4 second, 767437500 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 767438360 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 19 frame transmitted success! the timestamp is 4 second, 777788700 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 777789560 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
The 20 frame transmitted success! the timestamp is 4 second, 788139860 nanosecond
 A frame received. the length 1000  the timestamp is 4 second, 788140720 nanosecond
 Dest Address 01:00:5e:01:01:01 Src Address d4:be:d9:45:22:60
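
The SDK log prints each timestamp as separate second and nanosecond values. A minimal sketch for comparing two of them, for example a frame's transmit and receive timestamps over the loopback cable; ts_diff_ns is a hypothetical helper, not an SDK function. For frame 1 the difference is 591862760 - 591861900 = 860 ns, and every frame in the log shows the same 860 ns gap, so the nanosecond part is clearly populated in this example.

```c
#include <stdint.h>

/* Hypothetical helper: signed difference b - a, in nanoseconds, between two
 * timestamps given as (seconds, nanoseconds) pairs as printed in the log.
 * Converting both to a single nanosecond count handles the borrow when the
 * two timestamps fall in different seconds. */
static int64_t ts_diff_ns(uint64_t sec_a, uint32_t ns_a,
			  uint64_t sec_b, uint32_t ns_b)
{
	int64_t a = (int64_t)(sec_a * 1000000000ULL + ns_a);
	int64_t b = (int64_t)(sec_b * 1000000000ULL + ns_b);

	return b - a;
}
```

The second-boundary case is visible in the first lines of the log: from "3 second, 969332420 nanosecond" to "4 second, 47146660 nanosecond" is 77814240 ns.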

@kevin137
Author

kevin137 commented Mar 30, 2021

> Would it be helpful to run the same test on a 1050?
>
> Yes, please.

Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@hakehuang
Collaborator

> Ok. We should be able to do that. There are some logistical issues because of the Easter vacation, but we should be able to perform the test on a 1050-EVK early next week and post the results here.

@kevin137 did you enable the config below for mimxrt1020_evk?

CONFIG_ETH_MCUX=y
CONFIG_PTP_CLOCK_MCUX=y

@santi681

> Can you try the SDK ptp1588 example, which can be downloaded from kex.nxp.com?

Yes, the SDK ptp1588 example works as you describe on both the MIMXRT1020 and the FRDM-K64F. However, shouldn't the PTP timestamp match the timestamp of the master node (a Linux ptp4l instance), as it does in Zephyr OS? Why does the timestamp start at 0.0 seconds? Is the MIMXRT1020 or the FRDM-K64F acting as the master node? Is it possible to change that?

@santi681

> @kevin137 did you enable the config below for mimxrt1020_evk?
>
> CONFIG_ETH_MCUX=y
> CONFIG_PTP_CLOCK_MCUX=y

Hi @hakehuang, the configuration file for frdm_k64f is easy to find at zephyrproject/zephyr/samples/net/gptp/board, but where is the corresponding file for mimxrt1020_evk? Where should we add these config parameters for mimxrt1020_evk? Thanks in advance!

@hakehuang
Collaborator

> Where should we add these config parameters for mimxrt1020_evk?

@santi681, you can look at the eth_mcux.c code and find CONFIG_PTP_CLOCK_MCUX; this is the option that enables PTP timestamps.

@hakehuang
Collaborator

> Why is the timestamp beginning at 0.0 seconds? Is the MIMXRT1020 or FRDM-k64f working as a master node? Is it possible to change that?

The SDK example is only meant to check whether there is a board issue. It just adds timestamps to UDP packets; it is not a PTP example. In the Zephyr gPTP case, according to the gPTP protocol, when you connect two PTP instances they send out Pdelay_Req messages and then negotiate which one becomes master and which becomes client; the master then starts sending Sync packets. I do not have a working Linux host to act as PTP master. I tried connecting two boards, but it looks buggy, as I reported in #29599.

@santi681

> The SDK example is only meant to check whether there is a board issue. It just adds timestamps to UDP packets; it is not a PTP example.

That's clear, thanks @hakehuang!

@santi681

> @santi681, you can look at the eth_mcux.c code and find CONFIG_PTP_CLOCK_MCUX; this is the option that enables PTP timestamps.

Thanks for your replies @hakehuang! I simply defined these parameters at the beginning of eth_mcux.c, if that is what I was supposed to do. Defining them makes no difference to the results. In fact, PTP seems to be working properly, because slave_time.second is correctly synchronized; the remaining problem is getting slave_time.nanosecond, which is always zero.

@jukkar
Member

jukkar commented Mar 31, 2021

> I simply defined these parameters at the beginning of the eth_mcux.c code if this is what I was supposed to do

You should not do that; instead, set the options in your application's prj.conf file. Other source files might use these options too, in which case they would not be set in all the needed places.
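
A minimal sketch of a compile-time check that could go in any application source file to confirm that the option actually reached it. In a Zephyr build, options set in prj.conf (or in a board-specific overlay such as boards/mimxrt1020_evk.conf in the application directory) are written to the generated autoconf.h, which the build system includes in every source file; a #define at the top of eth_mcux.c is visible only inside that one file. SKETCH_PTP_CLOCK_ENABLED and ptp_clock_option_visible are hypothetical names.

```c
/* Reports whether CONFIG_PTP_CLOCK_MCUX is visible in this translation unit.
 * In a Zephyr build with the option set in prj.conf this is 1 in every file;
 * in a plain host build (no autoconf.h) it is 0. */
#if defined(CONFIG_PTP_CLOCK_MCUX)
#define SKETCH_PTP_CLOCK_ENABLED 1
#else
#define SKETCH_PTP_CLOCK_ENABLED 0
#endif

static int ptp_clock_option_visible(void)
{
	return SKETCH_PTP_CLOCK_ENABLED;
}
```

In a real project the #else branch could use #error instead, so that a missing option fails the build early rather than silently producing second-only timestamps.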

@hakehuang
Collaborator

> Defining these config parameters doesn't make any difference to the results. In fact PTP seems to be working properly because slave_time.second are properly synchronized, the problem still remains on getting the slave_time.nanosecond, which is still zero at any time.

The main piece missing here is in the HAL driver; see
modules/hal/nxp/mcux/drivers/imx/CMakeLists.txt:

zephyr_compile_definitions_ifdef(
  CONFIG_PTP_CLOCK_MCUX
  ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
)

This enables the HAL driver to output PTP timestamps.
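
To illustrate why a compile definition matters here, a simplified and hypothetical receive buffer descriptor (rx_bd_sketch, rx_bd_has_timestamp and the field names are illustrative; the real definitions live in the NXP HAL's fsl_enet.h). In this sketch the hardware timestamp field only exists when ENET_ENHANCEDBUFFERDESCRIPTOR_MODE is defined, which, per the CMake snippet above, happens when CONFIG_PTP_CLOCK_MCUX is enabled; without it the descriptor has no room for a timestamp at all.

```c
#include <stdint.h>

/* Simplified, hypothetical descriptor layout (not the real HAL struct). */
struct rx_bd_sketch {
	uint16_t length;      /* received frame length */
	uint16_t control;     /* status/control bits */
	uint32_t buffer_addr; /* address of the data buffer */
#ifdef ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
	uint32_t timestamp;   /* hardware timestamp captured by the MAC */
#endif
};

/* Returns 1 if this build's descriptor layout carries a timestamp field. */
static int rx_bd_has_timestamp(void)
{
#ifdef ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
	return 1;
#else
	return 0;
#endif
}
```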

@santi681

santi681 commented Apr 6, 2021

> The main piece missing here is in the HAL driver; see
> modules/hal/nxp/mcux/drivers/imx/CMakeLists.txt
>
> This enables the HAL driver to output PTP timestamps.

CMakeLists.txt file:

#
# Copyright (c) 2018, NXP
#
# SPDX-License-Identifier: Apache-2.0
#

zephyr_include_directories(.)

zephyr_compile_definitions_ifdef(
  CONFIG_PTP_CLOCK_MCUX
  ENET_ENHANCEDBUFFERDESCRIPTOR_MODE
)

zephyr_library_compile_definitions_ifdef(
  CONFIG_HAS_MCUX_CACHE FSL_SDK_ENABLE_DRIVER_CACHE_CONTROL
)

zephyr_library_sources_ifdef(CONFIG_CAN_MCUX_FLEXCAN	fsl_flexcan.c)
zephyr_library_sources_ifdef(CONFIG_COUNTER_MCUX_GPT	fsl_gpt.c)
zephyr_library_sources_ifdef(CONFIG_COUNTER_MCUX_PIT	fsl_pit.c)
zephyr_library_sources_ifdef(CONFIG_DISPLAY_MCUX_ELCDIF	fsl_elcdif.c)
zephyr_library_sources_ifdef(CONFIG_DMA_MCUX_EDMA	fsl_edma.c)
zephyr_library_sources_ifdef(CONFIG_DMA_MCUX_EDMA	fsl_dmamux.c)
zephyr_library_sources_ifdef(CONFIG_ENTROPY_MCUX_TRNG	fsl_trng.c)
zephyr_library_sources_ifdef(CONFIG_ETH_MCUX		fsl_enet.c)
zephyr_library_sources_ifdef(CONFIG_FLASH_MCUX_FLEXSPI	fsl_flexspi.c)
zephyr_library_sources_ifdef(CONFIG_GPIO_MCUX_IGPIO	fsl_gpio.c)
zephyr_library_sources_ifdef(CONFIG_HAS_MCUX_CACHE	fsl_cache.c)
zephyr_library_sources_ifdef(CONFIG_I2C_MCUX_LPI2C	fsl_lpi2c.c)
zephyr_library_sources_ifdef(CONFIG_I2S_MCUX_SAI	fsl_sai.c)
zephyr_library_sources_ifdef(CONFIG_I2S_MCUX_SAI	fsl_sai_edma.c)
zephyr_library_sources_ifdef(CONFIG_PWM_MCUX		fsl_pwm.c)
zephyr_library_sources_ifdef(CONFIG_SPI_MCUX_LPSPI	fsl_lpspi.c)
zephyr_library_sources_ifdef(CONFIG_UART_MCUX_LPUART	fsl_lpuart.c)
zephyr_library_sources_ifdef(CONFIG_VIDEO_MCUX_CSI	fsl_csi.c)
zephyr_library_sources_ifdef(CONFIG_WDT_MCUX_IMX_WDOG    fsl_wdog.c)
zephyr_library_sources_ifdef(CONFIG_CAN_MCUX_FLEXCAN	fsl_flexcan.c)

if(NOT CONFIG_ASSERT OR CONFIG_FORCE_NO_ASSERT)
  zephyr_compile_definitions(NDEBUG) # squelch fsl_flexcan.c warning
endif()

I can see these config parameters are already defined. Does it make sense to be able to get the seconds but not the nanoseconds? Are the requirements for both fields the same?

@hakehuang
Collaborator

It depends on CONFIG_PTP_CLOCK_MCUX, so you need to define it in the board configuration just like frdm_k64f does; in the HAL driver, this option enables hardware PTP timestamping.
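One way to confirm that both switches actually reach the code you build is a small compile-time probe. This is a minimal sketch, not part of Zephyr or the NXP HAL: the function name ptp_hw_timestamp_build_ok is hypothetical, and it assumes ENET_ENHANCEDBUFFERDESCRIPTOR_MODE is added globally by zephyr_compile_definitions_ifdef() as in the HAL CMakeLists.txt quoted above.

```c
#include <stdbool.h>

/*
 * Hypothetical build sanity check: returns true only if both the Kconfig
 * option and the HAL define that selects enhanced buffer descriptors
 * (and thus hardware PTP timestamps) were visible when this file compiled.
 */
static bool ptp_hw_timestamp_build_ok(void)
{
#if defined(CONFIG_PTP_CLOCK_MCUX) && defined(ENET_ENHANCEDBUFFERDESCRIPTOR_MODE)
	return true;
#else
	return false;
#endif
}
```

Calling it early in the application and logging the result would make a silently missing definition visible at runtime instead of showing up only as zero nanoseconds.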

@kevin137
Author

Hello everyone. I wanted to give a quick update and confirm that we see the same problem on the RT1050EVK. I believe we are configuring prj.conf correctly and have also confirmed that the HAL driver is being referenced. We are preparing a complete package of code, working on the frdm_k64f, NOT working on 1020 and 1050, to be able to share here.

@hakehuang
Collaborator

hakehuang commented Apr 19, 2021

Thanks, Kevin. Please share your code.

Below is my debug log on the RT1060, which I think should behave the same:

Breakpoint 2, ENET_Ptp1588GetTimer (base=0x402d8000, handle=handle@entry=0x80000090 <eth0_context+16>, ptpTime=ptpTime@entry=0x8000aca0 <z_interrupt_stacks+1920>)
    at /home/shared/disk/zephyr_project/zephyr_rt1060/modules/hal/cmsis/CMSIS/Core/Include/cmsis_gcc.h:453
453       __ASM volatile ("MRS %0, primask" : "=r" (result) );
(gdb) n
2857        ENET_Ptp1588GetTimerNoIrqDisable(base, handle, ptpTime);
(gdb)
2860        if (0U != (base->EIR & (uint32_t)kENET_TsTimerInterrupt))
(gdb)
2866        EnableGlobalIRQ(primask);
(gdb)
eth_rx (context=0x80000080 <eth0_context>) at /home/shared/disk/zephyr_project/zephyr_rt1060/zephyr/drivers/ethernet/eth_mcux.c:799
799                     if (ptpTimeData.nanosecond < ts) {
(gdb) print ts
$6 = 2863311530
(gdb) print ptpTimeData.nanosecond
$7 = 2863311530
(gdb) n
803                     pkt->timestamp.nanosecond = ptpTimeData.nanosecond;
(gdb) print ptpTimeData.nanosecond
$8 = 2863311530
(gdb) n
804                     pkt->timestamp.second = ptpTimeData.second;
(gdb) p pkt->timestamp.nanosecond
$9 = 17367042
(gdb) p
$10 = 17367042
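Both ts and ptpTimeData.nanosecond print as 2863311530, which is 0xAAAAAAAA and far above the largest legal nanosecond value, 999999999. Below is a minimal sketch of a plausibility check for captured nanosecond fields; the helper names are hypothetical. The fill constant reflects an assumption: Zephyr paints stacks with 0xAA bytes when CONFIG_INIT_STACKS is enabled, and gdb places ptpTime on z_interrupt_stacks, so one possible (unconfirmed) reading is that the timer value was never written.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC_U32 1000000000U
/* Word made of 0xAA bytes; assumed here to be the stack fill pattern. */
#define STACK_FILL_WORD 0xAAAAAAAAU

/* A PTP nanosecond field must lie in [0, NSEC_PER_SEC). */
static bool ptp_nsec_in_range(uint32_t nsec)
{
	return nsec < NSEC_PER_SEC_U32;
}

/* True if the value looks like untouched, pattern-filled memory. */
static bool ptp_nsec_is_fill_pattern(uint32_t nsec)
{
	return nsec == STACK_FILL_WORD;
}
```

Applied to the values above, 2863311530 fails the range check and matches the fill word, while 17367042 is a legal nanosecond value.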

@aunsbjerg
Collaborator

I am seeing the same issue when running the gPTP sample on the mimxrt1064_evk board.

@kevin137
Author

We have some new information. Unfortunately, due to continuing logistical difficulties, we still don't have the FRDM-K64F at our location, so we are not able to run the code with the latest modifications on it. We ARE able to run the same code on the mimxrt1050_evk and mimxrt1020_evk, however, and we see the same behavior as described in all the previous posts. Beyond this, we have started looking at the guts of zephyr/modules/hal/nxp/mcux/drivers/imx/fsl_enet.c, and we think some things there aren't working as they should.

First, a couple of links to our code in github:
--The application (basically the samples/net/gptp demo with some LOG_INFs ):
https://github.com/ainguraXmarquiegui/zephyr.git
--The HAL (likewise, our intention is to just see values of registers):
https://github.com/ainguraXmarquiegui/hal_nxp.git

The general issue continues. When the EVK running Zephyr as a slave syncs with our PC running Avnu gptp, we see the seconds field pop into the correct values, but the nanoseconds field is stuck at 0.

Recently, we have started to extract data from what we believe to be the source of the nanosecond field, the ENET_ATVR register (see page 2155 of the i.MX RT1050 Processor Reference Manual, Rev. 4 12/2019). We do this in fsl_enet.c (see ainguraXmarquiegui/hal_nxp@02947be ), and print it in our application. The exact changes can be seen here:
zephyrproject-rtos/hal_nxp@master...ainguraXmarquiegui:feature/aingura_gptp_tests
and
master...ainguraXmarquiegui:feature/aingura_gptp_tests

With this code compiled and running, this is what we see on the console of both the 1020EVK and 1050EVK:

[00:12:21.351,000] <inf> net_gptp_sample: Plot slave time SECONDS:
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time second (u) 1619300568
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time second (X) 0x608490D8
[00:12:21.351,000] <inf> net_gptp_sample:
[00:12:21.351,000] <inf> net_gptp_sample: Plot slave time NANOSECONDS:
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time nanosecond (u) 0
[00:12:21.351,000] <inf> net_gptp_sample: gPTP slave time nanosecond (X) 0x0
[00:12:21.351,000] <inf> net_gptp_sample: ATCRval! (X) 0xA91
[00:12:21.351,000] <inf> net_gptp_sample: ATCRaddr! (X) 0x402D8400
[00:12:21.351,000] <inf> net_gptp_sample: ATVRval! (X) 0x0
[00:12:21.351,000] <inf> net_gptp_sample: ATVRaddr! (X) 0x402D8404

As you can see, the ATVR register address is correct. The ATVR register value, however, is always reading 0x0.

We believe this may be caused by the ATCR Reset Timer bit, which should be cleared automatically. We always read 0xA91, but think we should be seeing 0x891 (0x8 instead of 0xA in that digit).
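To make the register dump easier to read, here is a small decoder for ATCR values. It is a sketch: the bit positions are our reading of the ENET_ATCR_*_MASK definitions in NXP's device headers and should be checked against the reference manual before relying on them, and the helper names are hypothetical. With these masks, 0xA91 decodes to EN, PEREN, PINPER, RESTART and CAPTURE, so the 0xA versus 0x8 difference is exactly the RESTART (Reset Timer) bit.

```c
#include <stdint.h>
#include <stdio.h>

/* ENET_ATCR bit positions (assumed from NXP headers; verify in the RM). */
#define ATCR_EN      (1U << 0)
#define ATCR_OFFEN   (1U << 2)
#define ATCR_OFFRST  (1U << 3)
#define ATCR_PEREN   (1U << 4)
#define ATCR_PINPER  (1U << 7)
#define ATCR_RESTART (1U << 9)
#define ATCR_CAPTURE (1U << 11)
#define ATCR_SLAVE   (1U << 13)

static const struct {
	uint32_t mask;
	const char *name;
} atcr_fields[] = {
	{ ATCR_EN, "EN" },           { ATCR_OFFEN, "OFFEN" },
	{ ATCR_OFFRST, "OFFRST" },   { ATCR_PEREN, "PEREN" },
	{ ATCR_PINPER, "PINPER" },   { ATCR_RESTART, "RESTART" },
	{ ATCR_CAPTURE, "CAPTURE" }, { ATCR_SLAVE, "SLAVE" },
};

/* Print the name of every known ATCR bit set in val; return how many. */
static int atcr_decode(uint32_t val)
{
	int count = 0;

	for (size_t i = 0; i < sizeof(atcr_fields) / sizeof(atcr_fields[0]); i++) {
		if (val & atcr_fields[i].mask) {
			printf("%s ", atcr_fields[i].name);
			count++;
		}
	}
	printf("\n");
	return count;
}

/* RESTART is expected to self-clear; a read still showing it is suspicious. */
static int atcr_restart_stuck(uint32_t val)
{
	return (val & ATCR_RESTART) != 0;
}
```

For example, atcr_decode(0xA91) prints the five names listed above and returns 5, while 0x891 decodes to the same set minus RESTART.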

@hakehuang
Collaborator

I checked your https://github.com/ainguraXmarquiegui/zephyr/tree/feature/aingura_gptp_tests, but I did not see any change to west.yml to point to your HAL code. How do you do that? Do you run west update before your build?

@kevin137
Author

Yes. We ran west update initially, and we are not changing between platforms; it is always either RT1050 or RT1020. My colleague @ainguraXmarquiegui ran west update just before creating the fork.

@hakehuang
Collaborator

Can you share a picture of your actual board connections? And if possible, could you ship your Linux host board to me?

@kevin137
Author

Hello @hakehuang and everyone listening in,

To anyone trying our code, please do a fresh checkout of the latest commit at:

https://github.com/ainguraXmarquiegui/zephyr/tree/feature/aingura_gptp_tests
and
https://github.com/ainguraXmarquiegui/hal_nxp/tree/feature/aingura_gptp_tests

Thanks to our friends at EBV, we got our hands on a FRDM-K64F and were able to run the same code on all three boards, and have confirmed that the FRDM works as expected, and the i.MXRT boards do not--and we are quite sure that it is related to the Reset Timer bit in the ATCR register.

First, as requested, this is the test setup, including how the cables are hooked up:

FRDM-K64F, connected via USB to my PC (via the SDA USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

[Photo: FRDM-K64F test setup (photo_2021-04-28_17-10-56)]

MIMXRT1020-EVK, connected via USB to my PC (via the J13 USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

[Photo: MIMXRT1020-EVK test setup (photo_2021-04-28_17-11-08)]

MIMXRT1050-EVKB, connected via USB to my PC (via the J28 USB port for the console UART), and with an Ethernet connection directly from the board to the Ethernet interface of my PC:

[Photo: MIMXRT1050-EVKB test setup (photo_2021-04-28_17-11-21)]

My PC is nothing special. It does have an Intel I219-LM LOM Ethernet interface with hardware timestamping, which is very useful for gPTP, but most corporate PCs with NICs from Intel or Broadcom should have this capability:

kcook@i9coffee:~$ lshw 2>/dev/null | grep -B6 -A1 enp0s31f6 
        *-network
             description: Ethernet interface
             product: Ethernet Connection (7) I219-LM
             vendor: Intel Corporation
             physical id: 1f.6
             bus info: pci@0000:00:1f.6
             logical name: enp0s31f6
             version: 10
kcook@i9coffee:~$ sudo ethtool -T enp0s31f6 
Time stamping parameters for enp0s31f6:
Capabilities:
	hardware-transmit     (SOF_TIMESTAMPING_TX_HARDWARE)
	software-transmit     (SOF_TIMESTAMPING_TX_SOFTWARE)
	hardware-receive      (SOF_TIMESTAMPING_RX_HARDWARE)
	software-receive      (SOF_TIMESTAMPING_RX_SOFTWARE)
	software-system-clock (SOF_TIMESTAMPING_SOFTWARE)
	hardware-raw-clock    (SOF_TIMESTAMPING_RAW_HARDWARE)
PTP Hardware Clock: 0
Hardware Transmit Timestamp Modes:
	off                   (HWTSTAMP_TX_OFF)
	on                    (HWTSTAMP_TX_ON)
Hardware Receive Filter Modes:
	none                  (HWTSTAMP_FILTER_NONE)
	all                   (HWTSTAMP_FILTER_ALL)
	ptpv1-l4-sync         (HWTSTAMP_FILTER_PTP_V1_L4_SYNC)
	ptpv1-l4-delay-req    (HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ)
	ptpv2-l4-sync         (HWTSTAMP_FILTER_PTP_V2_L4_SYNC)
	ptpv2-l4-delay-req    (HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ)
	ptpv2-l2-sync         (HWTSTAMP_FILTER_PTP_V2_L2_SYNC)
	ptpv2-l2-delay-req    (HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ)
	ptpv2-event           (HWTSTAMP_FILTER_PTP_V2_EVENT)
	ptpv2-sync            (HWTSTAMP_FILTER_PTP_V2_SYNC)
	ptpv2-delay-req       (HWTSTAMP_FILTER_PTP_V2_DELAY_REQ)

As suggested in the README in the gptp sample application, I have installed OpenAvnu/gPTP: https://docs.zephyrproject.org/latest/samples/net/gptp/README.html#setting-up-linux-host

As specified I have increased the neighborPropDelayThresh in the gptp_cfg.ini file, in this case to 1000000, to absorb any imprecise adjustment in the PHY delay on the Zephyr side. Here is the gptp_cfg.ini file we are using: gptp_cfg.ini.txt

Here is what we are seeing, with the SAME code, running in the SAME way, on all three boards:

FRDM-K64F

Note how the Reset Timer bit in the ATCR register is 0, the nanoseconds field is getting filled, and the application is able to display nanoseconds:

[Image: FRDM-K64F results (FRDM64)]

MIMXRT1020-EVK

Note how the Reset Timer bit in the ATCR register is 1, the nanoseconds field is NOT getting filled, and the application is reading 0 nanoseconds:

[Image: MIMXRT1020-EVK results (RT1020)]

MIMXRT1050-EVKB

Note how the Reset Timer bit in the ATCR register is 1, the nanoseconds field is NOT getting filled, and the application is reading 0 nanoseconds:

[Image: MIMXRT1050-EVKB results (RT1050)]

It is not going to be practical for me to send you my principal PC. If it would be VERY helpful and DRASTICALLY reduce the time it will take to resolve this issue, I can find another host device with hardware timestamping support (for OpenAvnu gptp), and send it to you.

Let's get this thing solved! Thanks everybody!

@github-actions github-actions bot closed this as completed Aug 1, 2022
@kevin137
Author

kevin137 commented Aug 1, 2022

I just wanted to say that we will let the bot close this issue because neither we nor anyone else seems to have the bandwidth or the will to work on it anymore, but this problem has definitely NOT been solved. To have a stable and robust gPTP implementation on i.MXRT, someone needs to implement a servo to keep the PHC locked and keep the corrections smooth. We have "solved" this issue by giving up on Zephyr and going back to FreeRTOS. Much to my chagrin, because I really like the features and community Zephyr brings to high-performance microcontrollers.
We contacted NXP about buying support hours to have them complete the gPTP implementation in Zephyr, but the cost is too high for an organization like ours to pay and effectively donate to the Zephyr community, especially because they were not able (very understandably) to give an upper limit to the effort needed, just a lower limit. I would humbly suggest to NXP that gPTP is going to be a key building block of Ethernet-connected sensor nodes, and that Zephyr is the RTOS workhorse to bet on. NXP definitely has the knowledge in house to make fantastic embedded PTP/TSN solutions--Layerscape and OpenIL are great. Some happy day, I think SOMEONE will have a rock-solid gPTP solution running on Zephyr, and we will be seeing amazing applications taking advantage of nanosecond synchronization, without FPGAs or Linux (or FreeRTOS), but that day is not today.

@dleach02 dleach02 removed the Stale label Aug 1, 2022
@dleach02 dleach02 reopened this Aug 1, 2022
@dleach02
Member

dleach02 commented Aug 1, 2022

@kevin137, I've removed the stale and reopened the issue.

@github-actions github-actions bot added the Stale label Oct 1, 2022
@dleach02 dleach02 removed the Stale label Oct 3, 2022
@kevin137
Author

We've given up. We will let the bot close the issue.

@hakehuang
Collaborator

hakehuang commented Jul 20, 2023

@dleach02 I will follow up on this fix and retest gPTP performance as well as stability.

This fix looks good. gPTP with the NXP driver can now achieve the same results as on ST boards, and it can endure 24 hours of testing without losing sync.

However, the code below is still risky:

https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/ethernet/eth_mcux.c#L863

pkt->timestamp.nanosecond = ts;
pkt->timestamp.second = ptpTimeData.second;

If ts wraps at the 1-second boundary, e.g. 999999999 -> 000000001, then ptpTimeData.second may already belong to the next second, and we need to compensate for this gap.

The code below was recommended by Seb Laveze:
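A minimal sketch of one way to compensate, similar in spirit to the `if (ptpTimeData.nanosecond < ts)` check visible in the earlier gdb trace: if the timer's nanosecond counter, read after the capture, is smaller than the captured ts, the counter has rolled into a new second and the capture belongs to the previous one. struct ptp_ts and ptp_ts_from_capture are hypothetical names, and the sketch assumes less than one second passes between the capture and the timer read.

```c
#include <stdint.h>

/* Hypothetical stand-in for a PTP timestamp (seconds + nanoseconds). */
struct ptp_ts {
	uint64_t second;
	uint32_t nanosecond;
};

/*
 * Pair a captured nanosecond value ts with the seconds of a timer read
 * taken shortly afterwards (now_sec, now_nsec).
 */
static struct ptp_ts ptp_ts_from_capture(uint32_t ts, uint64_t now_sec,
					 uint32_t now_nsec)
{
	struct ptp_ts out = { .second = now_sec, .nanosecond = ts };

	/* Counter is now below the capture: it wrapped into a new second
	 * after ts was latched, so ts belongs to the previous second. */
	if (now_nsec < ts && now_sec > 0) {
		out.second = now_sec - 1;
	}

	return out;
}
```

With ts = 999999999 and a later read of 101 s + 1 ns, the result is 100 s + 999999999 ns rather than the 101 s that a naive pairing would give.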

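#include <stdint.h>

#ifndef NSEC_PER_SEC
#define NSEC_PER_SEC 1000000000ULL
#endif

/* Stand-alone illustration: in eth_mcux.c these would be assumed extra
 * fields of the existing struct eth_context. */
struct eth_context {
        uint64_t last_cycles;   /* counter value (ns within the second) at the last reference point */
        uint64_t ptp_cycles;    /* PTP time in ns at that reference point */
        int64_t offset;         /* correction in ns applied on top of ptp_cycles */
};

/* Ticks elapsed since last_cycles on a counter that wraps at NSEC_PER_SEC. */
static uint64_t delta_cycles(struct eth_context *context, uint64_t cycles)
{
        uint64_t delta;

        if (cycles >= context->last_cycles) {
                delta = cycles - context->last_cycles;
        } else {
                /* the counter wrapped past the one-second boundary */
                delta = NSEC_PER_SEC - context->last_cycles + cycles;
        }

        return delta;
}

/* Deltas of half a second or more are taken as negative, i.e. the sample predates last_cycles. */
static uint64_t hw_clock_cycles_to_time(struct eth_context *ctx, uint64_t cycles)
{
        int64_t delta;

        delta = (int64_t)delta_cycles(ctx, cycles);
        if (delta >= (int64_t)(NSEC_PER_SEC / 2)) {
                delta -= (int64_t)NSEC_PER_SEC;
        }

        return (uint64_t)((int64_t)ctx->ptp_cycles + ctx->offset + delta);
}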

  • net gptp sync report:
uart:~$ net gptp 1
Port id    : 1 (SLAVE)
Interface  : 0x80001b90 [1]
Clock id   : 02:04:9f:ff:fe:41:34:11
Version    : 2
AS capable : yes

Configuration:
Time synchronization and Best Master Selection enabled        : yes
The port is measuring the path delay                          : yes
One way propagation time on the link attached to this port    : 9096 ns
Propagation time threshold for the link attached to this port : 1000000 ns
Estimate of the ratio of the frequency with the peer          : 1
Asymmetry on the link relative to the grand master time base  : 0
Maximum interval between sync messages                        : 375000000
Maximum number of Path Delay Requests without a response      : 3
Current Sync sequence id for this port                        : 62497
Current Path Delay Request sequence id for this port          : 36673
Current Announce sequence id for this port                    : 40828
Current Signaling sequence id for this port                   : 36736
Whether neighborRateRatio needs to be computed for this port  : yes
Whether neighborPropDelay needs to be computed for this port  : yes
Initial Announce Interval as a Logarithm to base 2            : 0
Current Announce Interval as a Logarithm to base 2            : 0
Initial Sync Interval as a Logarithm to base 2                : -4
Current Sync Interval as a Logarithm to base 2                : -4
Initial Path Delay Request Interval as a Logarithm to base 2  : -3
Current Path Delay Request Interval as a Logarithm to base 2  : -3
Time without receiving announce messages before running BMCA  : 3000 ms (3)
Time without receiving sync messages before running BMCA      : 0 ms (3)
Sync event transmission interval for the port                 : 62 ms
Path Delay Request transmission interval for the port         : 125 ms
BMCA default priority1                                        : 248
BMCA default priority2                                        : 248

Runtime status:
Current global port state                                : SLAVE
Path Delay Request state machine variables:
        Current state                                    : WAIT_ITV_TIMER
        Initial Path Delay Response Peer Timestamp       : 1689846994943948360
        Initial Path Delay Response Ingress Timestamp    : 1689846994943958685
        Path Delay Response messages received            : 0
        Path Delay Follow Up messages received           : 1
        Number of lost Path Delay Responses              : 0
        Timer expired send a new Path Delay Request      : 0
        NeighborRateRatio has been computed successfully : 1
        Path Delay has already been computed after init  : 0
        Count consecutive reqs with multiple responses   : 0
Path Delay Response state machine variables:
        Current state                                    : INITIAL_WAIT_REQ
SyncReceive state machine variables:
        Current state                                    : WAIT_SYNC
        A Sync Message has been received                 : no
        A Follow Up Message has been received            : no
        A Follow Up Message timeout                      : no
        Time at which a Sync Message without Follow Up
                                     will be discarded   : 0
SyncSend state machine variables:
        Current state                                    : SEND_SYNC
        A MDSyncSend structure has been received         : no
        The timestamp for the sync msg has been received : no
PortSyncSyncReceive state machine variables:
        Current state                                    : RECEIVED_SYNC
        Grand Master / Local Clock frequency ratio       : %f
        A MDSyncReceive struct is ready to be processed  : no
        Expiry of SyncReceiptTimeoutTimer                : no
PortSyncSyncSend state machine variables:
        Current state                                    : SYNC_RECEIPT_TIMEOUT
        Follow Up Correction Field of last recv PSS      : 0
        Upstream Tx Time of the last recv PortSyncSync   : 1689846454877762355
        Rate Ratio of the last received PortSyncSync     : %f
        GM Freq Change of the last received PortSyncSync : %f
        GM Time Base Indicator of last recv PortSyncSync : 0
        Received Port Number of last recv PortSyncSync   : 0
        PortSyncSync structure is ready to be processed  : yes
        Flag when the half_sync_itv_timer has expired    : yes
        Has half_sync_itv_timer expired twice            : no
        Has syncReceiptTimeoutTime expired               : yes
PortAnnounceReceive state machine variables:
        Current state                                    : RECEIVE
        An announce message is ready to be processed     : no
PortAnnounceInformation state machine variables:
        Current state                                    : CURRENT
        Expired announce information                     : no
PortAnnounceTransmit state machine variables:
        Current state                                    : POST_IDLE
        Trigger announce information                     : no

Statistics:
Sync messages received                 : 4271
Follow Up messages received            : 8558
Path Delay Request messages received   : 537
Path Delay Response messages received  : 4286
Path Delay messages threshold exceeded : 1
Path Delay Follow Up messages received : 4286
Announce messages received             : 536
ptp messages discarded                 : 0
Sync reception timeout                 : 6
Announce reception timeout             : 0
Path Delay Requests without a response : 16
Sync messages sent                     : 25
Follow Up messages sent                : 25
Path Delay Request messages sent       : 4303
Path Delay Response messages sent      : 537
Path Delay Response FUP messages sent  : 536
Announce messages sent                 : 3

After 24 hours of testing:

uart:~$ net gptp 1
Port id    : 1 (SLAVE)
Interface  : 0x80001b90 [1]
Clock id   : 02:04:9f:ff:fe:41:34:11
Version    : 2
AS capable : yes

Configuration:
Time synchronization and Best Master Selection enabled        : yes
The port is measuring the path delay                          : yes
One way propagation time on the link attached to this port    : 12103 ns
Propagation time threshold for the link attached to this port : 1000000 ns
Estimate of the ratio of the frequency with the peer          : 0
Asymmetry on the link relative to the grand master time base  : 0
Maximum interval between sync messages                        : 375000000
Maximum number of Path Delay Requests without a response      : 3
Current Sync sequence id for this port                        : 62653
Current Path Delay Request sequence id for this port          : 56886
Current Announce sequence id for this port                    : 40846
Current Signaling sequence id for this port                   : 36736
Whether neighborRateRatio needs to be computed for this port  : yes
Whether neighborPropDelay needs to be computed for this port  : yes
Initial Announce Interval as a Logarithm to base 2            : 0
Current Announce Interval as a Logarithm to base 2            : 0
Initial Sync Interval as a Logarithm to base 2                : -4
Current Sync Interval as a Logarithm to base 2                : -4
Initial Path Delay Request Interval as a Logarithm to base 2  : -3
Current Path Delay Request Interval as a Logarithm to base 2  : -3
Time without receiving announce messages before running BMCA  : 3000 ms (3)
Time without receiving sync messages before running BMCA      : 0 ms (3)
Sync event transmission interval for the port                 : 62 ms
Path Delay Request transmission interval for the port         : 125 ms
BMCA default priority1                                        : 248
BMCA default priority2                                        : 248

Runtime status:
Current global port state                                : SLAVE
Path Delay Request state machine variables:
        Current state                                    : WAIT_ITV_TIMER
        Initial Path Delay Response Peer Timestamp       : 1689924334467500641
        Initial Path Delay Response Ingress Timestamp    : 1689924334467509700
        Path Delay Response messages received            : 1
        Path Delay Follow Up messages received           : 1
        Number of lost Path Delay Responses              : 0
        Timer expired send a new Path Delay Request      : 0
        NeighborRateRatio has been computed successfully : 1
        Path Delay has already been computed after init  : 0
        Count consecutive reqs with multiple responses   : 0
Path Delay Response state machine variables:
        Current state                                    : INITIAL_WAIT_REQ
SyncReceive state machine variables:
        Current state                                    : WAIT_SYNC
        A Sync Message has been received                 : no
        A Follow Up Message has been received            : no
        A Follow Up Message timeout                      : no
        Time at which a Sync Message without Follow Up
                                     will be discarded   : 0
SyncSend state machine variables:
        Current state                                    : SEND_SYNC
        A MDSyncSend structure has been received         : no
        The timestamp for the sync msg has been received : no
PortSyncSyncReceive state machine variables:
        Current state                                    : RECEIVED_SYNC
        Grand Master / Local Clock frequency ratio       : %f
        A MDSyncReceive struct is ready to be processed  : no
        Expiry of SyncReceiptTimeoutTimer                : no
PortSyncSyncSend state machine variables:
        Current state                                    : SYNC_RECEIPT_TIMEOUT
        Follow Up Correction Field of last recv PSS      : 0
        Upstream Tx Time of the last recv PortSyncSync   : 1689912984705873009
        Rate Ratio of the last received PortSyncSync     : %f
        GM Freq Change of the last received PortSyncSync : %f
        GM Time Base Indicator of last recv PortSyncSync : 0
        Received Port Number of last recv PortSyncSync   : 0
        PortSyncSync structure is ready to be processed  : yes
        Flag when the half_sync_itv_timer has expired    : yes
        Has half_sync_itv_timer expired twice            : no
        Has syncReceiptTimeoutTime expired               : yes
PortAnnounceReceive state machine variables:
        Current state                                    : RECEIVE
        An announce message is ready to be processed     : no
PortAnnounceInformation state machine variables:
        Current state                                    : CURRENT
        Expired announce information                     : no
PortAnnounceTransmit state machine variables:
        Current state                                    : POST_IDLE
        Trigger announce information                     : no
Statistics:
Sync messages received                 : 613643
Follow Up messages received            : 1227949
Path Delay Request messages received   : 76792
Path Delay Response messages received  : 614323
Path Delay messages threshold exceeded : 19
Path Delay Follow Up messages received : 614323
Announce messages received             : 76791
ptp messages discarded                 : 0
Sync reception timeout                 : 22
Announce reception timeout             : 0
Path Delay Requests without a response : 16
Sync messages sent                     : 181
Follow Up messages sent                : 181
Path Delay Request messages sent       : 614340
Path Delay Response messages sent      : 76792
Path Delay Response FUP messages sent  : 76791
Announce messages sent                 : 21
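The interval fields in the Configuration section are log2 values in seconds. That is why a Current Sync Interval of -4 appears as a 62 ms transmission interval (2^-4 s = 62.5 ms, truncated), -3 appears as 125 ms, and an Announce Interval of 0 with a receipt timeout of 3 gives 3000 ms. The one-way propagation time (12103 ns here) is checked against the 1000000 ns threshold; this comparison is presumably what the "Path Delay messages threshold exceeded" counter tracks. Below is a minimal C sketch of both calculations. It assumes the standard IEEE 802.1AS mean-link-delay formula, ((t4 - t1) * neighborRateRatio - (t3 - t2)) / 2. The helper names `log_itv_to_ns`, `mean_link_delay_ns` and `delay_exceeds_threshold` are hypothetical and are not Zephyr's gPTP API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert an IEEE 802.1AS log2 interval (in seconds)
 * to nanoseconds. Returns -1 for values outside the range this sketch
 * supports, so that the left shift cannot overflow int64_t. */
static int64_t log_itv_to_ns(int8_t log_itv)
{
    const int64_t one_sec_ns = 1000000000LL;

    if (log_itv > 31 || log_itv < -31) {
        return -1;
    }
    if (log_itv >= 0) {
        return one_sec_ns << log_itv;   /* 2^n seconds */
    }
    return one_sec_ns >> (-log_itv);    /* 2^-n seconds, truncated */
}

/* Hypothetical helper: mean link delay per the IEEE 802.1AS formula.
 * t1: Pdelay_Req egress time (local clock)
 * t2: Pdelay_Req ingress time (peer clock)
 * t3: Pdelay_Resp egress time (peer clock)
 * t4: Pdelay_Resp ingress time (local clock)
 * The differences are taken in int64_t first, so that full nanosecond
 * timestamps (~1.7e18) never pass through a double and lose precision. */
static double mean_link_delay_ns(int64_t t1, int64_t t2, int64_t t3,
                                 int64_t t4, double rate_ratio)
{
    int64_t round_trip = t4 - t1;   /* measured on the local clock */
    int64_t turnaround = t3 - t2;   /* measured on the peer's clock */

    return ((double)round_trip * rate_ratio - (double)turnaround) / 2.0;
}

/* Hypothetical helper: true when the computed delay is above the
 * configured propagation time threshold. */
static bool delay_exceeds_threshold(double delay_ns, double thresh_ns)
{
    return delay_ns > thresh_ns;
}
```

For example, `mean_link_delay_ns(0, 10000, 20000, 34206, 1.0)` returns 12103.0. This matches the propagation time reported above and is well under the 1000000 ns threshold.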

To summarize:

  1. Sync accuracy improved from the hundreds-of-milliseconds level to the tens-of-microseconds level.
  2. The system passes a 24-hour stress test, which means it can recover from lost sync.
    The "Path Delay messages threshold exceeded" counter reached 19 (up from 1 in the earlier snapshot), which means sync was lost 18 times during the 24-hour test, for an MTBF of about 1 hour 20 minutes. Applying the code suggested above should improve this considerably (still being tested).
  3. There is still some timing jitter, which makes the sync time drift within a range of 9096 ns to 12103 ns.
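The MTBF figure in item 2 is simple arithmetic over the counter delta. Below is a minimal C sketch. It assumes that the 24-hour window starts at the earlier snapshot (counter value 1). It also assumes that each increment of "Path Delay messages threshold exceeded" is one lost-sync event, which is how the comment above interprets it. `mtbf_minutes` is a hypothetical helper and is not part of Zephyr.

```c
#include <stdint.h>

/* Hypothetical helper: mean time between failures, in minutes, over an
 * observation window. count_start and count_end are the values of the
 * "Path Delay messages threshold exceeded" counter at the start and end
 * of the window; each increment is treated as one lost-sync event.
 * Returns -1.0 when no events were observed (or if the counter did not
 * increase), because the MTBF is undefined in that case. */
static double mtbf_minutes(double window_hours, uint32_t count_start,
                           uint32_t count_end)
{
    if (count_end <= count_start) {
        return -1.0;
    }
    uint32_t events = count_end - count_start;

    return window_hours * 60.0 / (double)events;
}
```

With the numbers above, `mtbf_minutes(24.0, 1, 19)` yields 80.0 minutes, which is the 1 hour 20 minutes quoted in item 2.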

@kevin137, @DerekSnell, @dleach02, @lomn, please take a chance to check this issue against the latest Zephyr code. With #60562 from chencaidy, I think we can close this issue.

@dleach02
Member

Closing the issue.
