Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple defects in "Multi Producer Single Consumer Packet Buffer" library #38268

Closed
palchak-google opened this issue Sep 2, 2021 · 12 comments · Fixed by #38777
Closed

Multiple defects in "Multi Producer Single Consumer Packet Buffer" library #38268

palchak-google opened this issue Sep 2, 2021 · 12 comments · Fixed by #38777
Assignees
Labels
area: Logging bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug Stale

Comments

@palchak-google
Copy link
Contributor

Describe the bug
The following defects exist in the mpsc_pbuf library:

  1. Buffer drops packets before reaching maximum capacity
  2. Buffer drops extra packets upon insertion when full
  3. Buffer API is not const correct

To Reproduce
Here is the a short test procedure that exposes these defects:

static uint32_t GetPacketLength(mpsc_pbuf_generic *) {
    return 1;
}

static void PacketDropped(mpsc_pbuf_buffer *, mpsc_pbuf_generic *packet) {
    LOG_INFO("Packet dropped: %c", (char)packet->hdr.data);
}

static uint32_t buffer_storage[4];

static mpsc_pbuf_buffer_config buffer_config = {
	.buf = buffer_storage,
	.size = 4,
	.notify_drop = &PacketDropped,
	.get_wlen = &GetPacketLength,
	.flags = MPSC_PBUF_SIZE_POW2 | MPSC_PBUF_MODE_OVERWRITE
};

static mpsc_pbuf_buffer pkt_buffer;

static void PushPacket(char c) {
    static int i=1;
    LOG_INFO("Push packet (#%i): %c", i++, c);
    mpsc_pbuf_hdr pkt = {
        .valid = 1,
        .busy = 0,
        .data = (uint32_t)c
    };
    mpsc_pbuf_put_word(&pkt_buffer, {.hdr = pkt});
}

static mpsc_pbuf_generic* ClaimPacket() {
    static int i=1;
    mpsc_pbuf_generic *claim = mpsc_pbuf_claim(&pkt_buffer);
    if (claim) {
        LOG_INFO("Claimed Packet (#%i): %c", i++, (char) claim->hdr.data);
        return claim;
    } else {
        LOG_ERROR("Failed to claim packet (#%i)", i);
        exit(1);
    }
}

static void FreePacket(mpsc_pbuf_generic *claim) {
    LOG_INFO("Free claimed packet: %c", (char) claim->hdr.data);
    mpsc_pbuf_free(&pkt_buffer, claim);
}

static void TestMpscBuffer() {
    LOG_INFO("Init (size = 4)");
    mpsc_pbuf_init(&pkt_buffer, &buffer_config);
    // Expect buffer = {}

    for (unsigned i=0; i<4; ++i) {
        PushPacket('a'+i);
    }
    // Expect buffer = {a, b, c, d}

    PushPacket('e');
    // Expect buffer = {b, c, d, e}

    mpsc_pbuf_generic *claim1 = ClaimPacket();
    // Expect claim1 = 'b'

    PushPacket('f');
    // Expect buffer = {b, d, e, f}

    FreePacket(claim1);
    // Expect buffer = {d, e, f}

    mpsc_pbuf_generic *claim2 = ClaimPacket();
    // Expect claim2 = 'd'

    FreePacket(claim2);
    // Expect buffer = {e, f}

    mpsc_pbuf_generic *claim3 = ClaimPacket();
    // Expect claim3 = 'e'

    FreePacket(claim3);
    // Expect buffer = {f}
}

Logs and console output
Here is the output from running this test code on a native_posix target:

[00:00:00.000,000] <inf> mpsc_pbuf_test: Init (size = 4)
[00:00:00.000,000] <inf> mpsc_pbuf_test: Push packet (#1): a
[00:00:00.000,000] <inf> mpsc_pbuf_test: Push packet (#2): b
[00:00:00.000,000] <inf> mpsc_pbuf_test: Push packet (#3): c
[00:00:00.000,000] <inf> mpsc_pbuf_test: Push packet (#4): d
[00:00:00.000,000] <inf> mpsc_pbuf_test: Packet dropped: a
[00:00:00.000,000] <inf> mpsc_pbuf_test: Push packet (#5): e
[00:00:00.000,000] <inf> mpsc_pbuf_test: Packet dropped: b
[00:00:00.000,000] <inf> mpsc_pbuf_test: Claimed Packet (#1): c
[00:00:00.000,000] <inf> mpsc_pbuf_test: Push packet (#6): f
[00:00:00.000,000] <inf> mpsc_pbuf_test: Packet dropped: d
[00:00:00.000,000] <inf> mpsc_pbuf_test: Packet dropped: e
[00:00:00.000,000] <inf> mpsc_pbuf_test: Free claimed packet: c
[00:00:00.000,000] <inf> mpsc_pbuf_test: Claimed Packet (#2): f
[00:00:00.000,000] <inf> mpsc_pbuf_test: Free claimed packet: f
[00:00:00.000,000] <err> mpsc_pbuf_test: Failed to claim packet (#3)

Expected behavior
In this test scenario, a buffer with capacity to store four packets is constructed. However, only three packets can be pushed into the buffer before the buffer starts dropping packets. A total of six packets are pushed into the buffer, but in the end only two packets can be successfully retrieved from the buffer.

Impact
These defects, in particular # 1 and # 1, render the mpsc_pbuf unsuitable for applications that requires defined behavior or reliable ordering and delivery of packets

Environment (please complete the following information):

  • OS: native_posix target (Linux)
  • Toolchain: gcc 10.2.1
  • Commit SHA or Version used: 2.6.0-rc1
@palchak-google palchak-google added the bug The issue is a bug, or the PR is fixing a bug label Sep 2, 2021
@cfriedt cfriedt added area: Logging priority: medium Medium impact/importance bug labels Sep 7, 2021
@nordic-krch
Copy link
Contributor

@palchak-google the case you described is not a bug. packet buffer has buf_size - 1 words capacity. This 1 word gap is used to distinguish between buffer empty and full state. Because of that buffer has 3 packets initially when 4th packet is being put then oldest one is dropped to maintain this one word gap between tail and head. Now, when oldest packet is claimed and new one is pushed then when searching for free space algorithm finds out that space following the "gap" is currently busy (claimed) so it skips it (and puts "padding" packet into this gap) then it needs to drop packets for requested space + 1 word gap. Because of that 2 packets are dropped (d and e).

I will look into improving the documentation to state that clearly.

@nordic-krch
Copy link
Contributor

See the description for allocation when packet is claimed: https://docs.zephyrproject.org/latest/reference/data_structures/mpsc_pbuf.html#allocation-with-overwrite

@nordic-krch
Copy link
Contributor

See #38380

@palchak-google
Copy link
Contributor Author

Let's address each of the three issues:

Issue #1: Packets are dropped prior to reaching maximum capacity:

Your claim is that a minimum of one word of data storage will always consumed by the packet buffer for internal bookkeeping, and the documentation will be changed to reflect this. If the implementation is always going to require at least one extra word be available to differentiate full from empty, why not just put that word into the mpsc_pbuf_buffer structure in the form of a flag indicating full (or not)? This requires the same amount of memory, but lets the user use the complete capacity that was allocated for storing data.

It's common for some types of circular buffer to require the reservation of an extra slot, typically those implemented in hardware or lock-free implementations that rely on atomic pointer operations. This implementation is neither of those, so consuming the user's data storage is unnecessary. Just put the additional bookkeeping data where the rest of the bookkeeping data is: in the mpsc_buf_buffer struct?

Issue #2: Excess packets are dropped when full:

Let's pretend that it's necessary for the implementation to consume at least one word of the data storage. Even if such a scenario, the implementation is still dropping more packets than necessary.

Here's a visualization of the state of the packet buffer for the exact sequence of operations posted previously.

Init (size = 4)

(4)[  _  ,  _  ,  _  ,  _    ]
(4)[ WwRr                    ]

Push packet (#1): a

(4)[  a  ,  _  ,  _  ,  _    ]
(4)[ Rr     Ww               ]

Push packet (#2): b

(4)[  a  ,  b  ,  _  ,  _    ]
(4)[ Rr           Ww         ]

Push packet (#3): c

(4)[  a  ,  b  ,  c  ,  _    ]
(4)[ Rr                 Ww   ]

Push packet (#4): d
Packet dropped: a

(4)[  a  ,  b  ,  c  ,  _    ]
(4)[        Rr          Ww   ]


(4)[  a  ,  b  ,  c  ,  d    ]
(4)[ Ww     Rr               ]

Push packet (#5): e
Packet dropped: b

(4)[  a  ,  b  ,  c  ,  d    ]
(4)[ Ww           Rr         ]


(4)[  e  ,  b  ,  c  ,  d    ]
(4)[        Ww    Rr         ]

Claimed Packet (#1): c

(4)[  e  ,  b  ,  C  ,  d    ]
(4)[        Ww    R     r    ]

Push packet (#6): f
Packet dropped: d

(4)[  e  ,  ~  ,  C  ,  d    ]
(4)[ Rr                 Ww   ]

Packet dropped: e

(4)[  e  ,  ~  ,  C  ,  d    ]
(4)[        Rr          Ww   ]


(4)[  e  ,  ~  ,  C  ,  f    ]
(4)[ Ww     Rr               ]

Push packet (#7): g

(4)[  g  ,  ~  ,  C  ,  f    ]
(4)[        Ww    Rr         ]

Free claimed packet: c

(4)[  g  ,  ~  ,  _  ,  f    ]
(4)[        Ww    r     R    ]

Failed to claim packet (#2)

In this visualization, the following characters are used:
First line indicates the contents of the packet:
[a-f] = valid, not busy (packet)
[A-F] = valid, busy (packet)
_ = invalid, not busy (empty space)
~ = invalid, busy (skip packet)

Second line indicates pointers:
R = read pointer
r = temporary read pointer
W = write pointer
w = temporary write pointer

Let's focus on the actions that follow the line Claimed Packet (#1): c. When packet c is claimed, both write pointers point to the slot that contains packet b. One would expect that the implementation would have marked this slot as invalid (empty), but that appears to not be the case. The read pointer points to the slot occupied by packet c, and the temporary ready pointer has been advanced to the slot occupied by packet d. The buffer is full.

Now packet f is pushed. What happens is that the empty slot previously containing packet b gets stuffed with a skip packet, then the next two valid packets, d and e, get dropped, and finally packet f gets inserted.

The correct behavior should be as follows:

Claimed Packet (#1): c

(4)[  e  ,  b  ,  C  ,  d    ]
(4)[        Ww    R     r    ]

Push packet (#6): f
Packet dropped: d

(4)[  e  ,  b  ,  C  ,  _    ]
(4)[ Rr     Ww               ]

(4)[  e  ,  f  ,  C  ,  _    ]
(4)[ Rr                 Ww   ]

Packet d should be dropped to free up a new empty slot as required by the implementation. Packet f should be inserted at the current write location, which previously contained packet b. Then the temporary read pointer should be advanced to point to packet e.

Incidentally, it's also not clear to me why both read pointers get moved when packet d gets dropped. I would expect that only the temporary read pointer should move, as the (true) read pointer should still point at packet c (or C, as its currently claimed).

Issue #3: Interface is not const correct:

This is still a defect. No documentation changes will fix this.

@palchak-google
Copy link
Contributor Author

Looking over the previous comment, I realized that the sequence of operations is not exactly the same as was posted in the first comment. In the most recent sequence, two packets (f and g) are pushed between the claiming and releasing of packet c. In the original sequence only one packet is pushed between claiming and releasing.

This additional push operation exposed a fourth defect in the implementation, as observed by the final state of the buffer. After packet c is released, the read pointer is advanced but the temporary read pointer is not. At this time there are no outstanding claims on any packets, so I would expect the two read pointers to have the same value.

Also, both packets f and g were pushed into the buffer while packet c was claimed. After packet c is released, there should be at least one (but really two) packets still in the buffer. However, attempting to claim another packet after releasing packet c fails, meaning that somehow packet f and packet g got lost without having been dropped.

I think this defect is related to the odd handling of the read pointer when packet d is dropped.

@cfriedt
Copy link
Member

cfriedt commented Sep 14, 2021

@palchak-google - are you able to submit a PR for this?

@nordic-krch
Copy link
Contributor

@palchak-google i will take a look into that, agree that there is a bug.

@nordic-krch
Copy link
Contributor

@palchak-google see #38777

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@nordic-krch
Copy link
Contributor

I would lower priority of this. Issue has been partially solved (const in API) and remaining issues are for long time in PRs without much reviewer interest. Issue found here does not impact logging since stress tests are passing in #42001.

@nordic-krch nordic-krch added priority: low Low impact/importance bug and removed priority: medium Medium impact/importance bug labels Feb 1, 2022
@github-actions
Copy link

github-actions bot commented Apr 3, 2022

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Logging bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug Stale
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants