[FX3] Commits on DMA MANUAL_OUT channel sometimes do not fire off CY_U3P_DMA_CB_CONS_EVENT callback | Cypress Semiconductor
Could anyone tell me what conditions might cause a DMA buffer committed on a MANUAL_OUT channel to get "stuck" and never fire off a CY_U3P_DMA_CB_CONS_EVENT callback?
Are there any ways I can fetch detailed information about what's going on with a committed DMA buffer, or determine what it is waiting on?
Here's a little more context on the situation...
The firmware I'm working on has an IN/OUT endpoint pair that uses DMA MANUAL_OUT/MANUAL_IN channels, respectively. This code is similar to the BulkSrcSink example in how it configures these endpoints and DMA channels, and in how it receives and sends data.
This has been working quite well for a while on some machines, but running on some faster hosts appears to bring the issue to light. When I moved from an older AMD Linux host using the NEC/Renesas chipset to an i7 Windows 7 machine using an Intel xHCI controller, I started seeing this occur within the first few transfers.
On my Linux box, I'm able to reproduce this by running a program that fires off many back-to-back requests. It's still a hunch that I'm working to verify, but I'm wondering whether the important difference here is the inter-transfer arrival time.
These endpoints implement a fairly straightforward control protocol using bulk transfers:
- The host sends a control message to the OUT EP, and then begins waiting for an ACK on the IN EP.
- The FX3 receives a CY_U3P_DMA_CB_PROD_EVENT, copies the DMA buffer data to a FIFO, and then calls CyU3PDmaChannelDiscardBuffer().
- A worker thread later dequeues the control request from the FIFO, processes it, and crafts an ACK message.
- The worker then sends the ACK back to the host via the IN EP (the MANUAL_OUT channel):
  - First, a buffer is obtained via CyU3PDmaChannelGetBuffer().
  - The ACK data is then copied into the buffer. The data length is guaranteed to be <= the DMA buffer size.
  - The buffer is committed via CyU3PDmaChannelCommitBuffer().
- The worker then waits a maximum of N seconds (e.g., 5) for the associated CY_U3P_DMA_CB_CONS_EVENT, or reports an error upon timeout.
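The request/ACK handshake above can be modeled off-target to check the firmware-side logic in isolation. The following is a minimal single-threaded sketch in plain C, assuming a simulated single-buffer channel. Every name in it (`sim_chan_t`, `sim_get_buffer()`, `sim_commit()`, `sim_on_cons_event()`, the FIFO helpers, and the ACK format) is hypothetical and stands in for CyU3PDmaChannelGetBuffer(), CyU3PDmaChannelCommitBuffer() and the CY_U3P_DMA_CB_CONS_EVENT callback; it is not FX3 SDK code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   64u  /* hypothetical DMA buffer size */
#define FIFO_DEPTH 8u   /* hypothetical request FIFO depth */

/* Control-request FIFO: filled from the PROD_EVENT path, drained by the worker. */
typedef struct {
    uint8_t  data[FIFO_DEPTH][BUF_SIZE];
    uint16_t len[FIFO_DEPTH];
    unsigned head, tail, count;
} req_fifo_t;

/* Simplified single-buffer model of the MANUAL_OUT channel (hypothetical). */
typedef struct {
    uint8_t  buf[BUF_SIZE];
    uint16_t committed_len;
    bool     committed;
    volatile bool cons_event_seen;  /* set by the modeled CONS_EVENT callback */
} sim_chan_t;

/* Modeled PROD_EVENT path: copy the request into the FIFO. The real handler
 * would then call CyU3PDmaChannelDiscardBuffer(). */
bool fifo_push(req_fifo_t *f, const uint8_t *src, uint16_t len)
{
    if (f->count == FIFO_DEPTH || len > BUF_SIZE)
        return false;  /* FIFO full or oversized request */
    memcpy(f->data[f->tail], src, len);
    f->len[f->tail] = len;
    f->tail = (f->tail + 1u) % FIFO_DEPTH;
    f->count++;
    return true;
}

bool fifo_pop(req_fifo_t *f, uint8_t *dst, uint16_t *len)
{
    if (f->count == 0u)
        return false;
    *len = f->len[f->head];
    memcpy(dst, f->data[f->head], *len);
    f->head = (f->head + 1u) % FIFO_DEPTH;
    f->count--;
    return true;
}

/* Returns NULL while the previous ACK is still committed (single-buffer model). */
uint8_t *sim_get_buffer(sim_chan_t *ch)
{
    return ch->committed ? NULL : ch->buf;
}

void sim_commit(sim_chan_t *ch, uint16_t len)
{
    assert(len <= BUF_SIZE);  /* the protocol guarantees this */
    ch->committed_len = len;
    ch->committed = true;
    ch->cons_event_seen = false;
}

/* Modeled CONS_EVENT callback: the consumer has drained the buffer. */
void sim_on_cons_event(sim_chan_t *ch)
{
    ch->committed = false;
    ch->cons_event_seen = true;
}

/* Worker step: get a free buffer first, then dequeue one request, build an ACK
 * (hypothetical format: first request byte with bit 7 set) and commit it. */
bool worker_send_ack(req_fifo_t *f, sim_chan_t *ch)
{
    uint8_t req[BUF_SIZE];
    uint16_t len;
    uint8_t *out = sim_get_buffer(ch);
    if (out == NULL)
        return false;  /* previous ACK still pending; request stays queued */
    if (!fifo_pop(f, req, &len) || len == 0u)
        return false;
    out[0] = (uint8_t)(req[0] | 0x80u);
    sim_commit(ch, 1u);
    return true;
}

/* Bounded wait standing in for "wait up to N seconds": poll the flag. */
bool wait_for_cons_event(const sim_chan_t *ch, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++)
        if (ch->cons_event_seen)
            return true;
    return false;  /* timeout: the caller reports an error */
}
```

In this model a second ACK cannot be committed until the first one's CONS event arrives, which mirrors the "stuck" symptom: if the event never comes, every later ACK stays queued in the FIFO.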
To confirm that the problem is the callback not firing, rather than a defect in the worker's "waiting", I've tried several approaches: adding a debug print to the callback, having the callback toggle a global variable, and setting a breakpoint in the callback (debugging with gdb and a J-Link).
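One limitation of a toggled global is that if the callback fires twice between checks, the flag ends up where it started and looks like no event at all. A minimal alternative, sketched here in plain C with hypothetical names (`dma_dbg_t`, `dbg_on_callback()`, `dbg_on_commit()`, `dbg_outstanding()`), is to count commits and CONS events separately; their difference is the number of committed buffers still waiting on a CONS_EVENT. In real firmware, `dbg_on_callback()` would be called from the channel's DMA callback with the event type mapped onto the local enum, and `dbg_on_commit()` right after CyU3PDmaChannelCommitBuffer() returns success.

```c
#include <stdint.h>

/* Hypothetical debug counters shared between the DMA callback and the worker.
 * Each counter has a single writer, so a reader sees at worst a slightly stale
 * value (assuming aligned 32-bit accesses are atomic on the target). */
typedef struct {
    volatile uint32_t commits;      /* bumped by the worker after each successful commit */
    volatile uint32_t cons_events;  /* bumped in the CONS_EVENT branch of the callback */
    volatile uint32_t prod_events;  /* bumped in the PROD_EVENT branch of the callback */
} dma_dbg_t;

/* Local stand-in for the SDK callback type (PROD_EVENT, CONS_EVENT, ...). */
typedef enum { DBG_EV_PROD, DBG_EV_CONS, DBG_EV_OTHER } dbg_event_t;

void dbg_on_callback(dma_dbg_t *d, dbg_event_t ev)
{
    if (ev == DBG_EV_PROD)
        d->prod_events++;
    else if (ev == DBG_EV_CONS)
        d->cons_events++;
}

void dbg_on_commit(dma_dbg_t *d)
{
    d->commits++;
}

/* Committed buffers whose CONS_EVENT has not been observed yet. Unsigned
 * subtraction keeps this correct across 32-bit wraparound. */
uint32_t dbg_outstanding(const dma_dbg_t *d)
{
    return d->commits - d->cons_events;
}
```

If `dbg_outstanding()` stays at 1 while `prod_events` keeps climbing, that would suggest requests are still arriving on the OUT endpoint while the ACK sits unconsumed on the IN side.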
I've ensured that the return values of all Cypress API calls are checked: all return success, both for the configuration calls and for the operations involved in committing the buffers.
When I get into this situation where a committed buffer becomes locked up, CyU3PDmaChannelGetStatus() reports that the channel is CY_U3P_DMA_ACTIVE, and the conXferCount is generally some reasonable value.
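A single CyU3PDmaChannelGetStatus() reading is hard to interpret on its own. One approach is to take two readings a short interval apart and compare them: if the channel is still active, a commit is outstanding, and the consumer count has not moved, the buffer is sitting unconsumed rather than draining slowly. The sketch below assumes hypothetical types (`chan_snapshot_t`, `chan_state_t` as a stand-in for the SDK's state enum) and a hypothetical `classify_stall()`; in firmware the fields would be filled from GetStatus()'s state and producer/consumer count outputs, and `outstanding_commits` from a commit counter kept by the worker. It does not depend on the units of the counts, only on whether they change.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the SDK channel state; only "active" matters here. */
typedef enum { CHAN_OTHER = 0, CHAN_ACTIVE = 1 } chan_state_t;

/* One reading of channel state plus producer/consumer counts. */
typedef struct {
    chan_state_t state;
    uint32_t     prod_count;
    uint32_t     cons_count;
} chan_snapshot_t;

typedef enum {
    STALL_NONE,           /* no commit outstanding, nothing to diagnose */
    STALL_PROGRESSING,    /* consumer count moved between the two readings */
    STALL_CONSUMER_IDLE,  /* active, commit outstanding, consumer count frozen */
    STALL_NOT_ACTIVE      /* channel has left the active state */
} stall_kind_t;

/* Compare two readings taken some interval apart (the interval is up to the caller). */
stall_kind_t classify_stall(const chan_snapshot_t *earlier,
                            const chan_snapshot_t *later,
                            uint32_t outstanding_commits)
{
    if (outstanding_commits == 0u)
        return STALL_NONE;
    if (later->state != CHAN_ACTIVE)
        return STALL_NOT_ACTIVE;
    if (later->cons_count != earlier->cons_count)
        return STALL_PROGRESSING;
    return STALL_CONSUMER_IDLE;
}
```

Logging the classification together with a timestamp at the moment the worker's timeout fires would show whether the consumer side was frozen for the whole wait or only at the end.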
I've tested this with both SDK 1.2.3 and 1.3; as expected, both behave identically.