This sort of problem arises when software that actually runs multiple threads has only been tested on a single-processor system.

With a single processor, only one thread runs at a time: the main thread is forced to pause whenever the CPU is released to another thread. Consequently, any changes to shared data areas tend to be picked up by the other threads before anything else touches them. On a multi-core or multi-processor system the threads genuinely run at the same time and the main thread rarely pauses, so if the software is poorly written, the changes made by one thread may be overwritten by another thread.
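
As a rough illustration of the usual remedy, here is a minimal Python sketch in which every read-modify-write of the shared data happens under a lock. The names `SharedCounter` and `run_workers` are made up for this example and have nothing to do with the software being discussed; the point is only that one thread cannot overwrite another's change while the lock is held.

```python
import threading


class SharedCounter:
    """Hypothetical shared data area protected by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The read, the add and the write-back happen as one unit,
        # so a second thread cannot slip in between them.
        with self._lock:
            self.value += 1


def run_workers(n_threads=4, n_increments=10000):
    """Start several threads that all update the same counter."""
    counter = SharedCounter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, `run_workers(4, 10000)` should come back as exactly 40000 no matter how many cores the threads are spread across.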

The result in this case seems to be long delays: the main thread is waiting for the subsidiary thread to signal an input, while at the same time overwriting the marker that tells it there has been an input. There are software mechanisms to prevent this (mutexes, semaphores, condition variables and the like), but it sounds like they haven't been implemented properly.
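
To make the "overwritten marker" problem concrete, here is a small Python sketch of one way to signal input safely. It is only my guess at the general shape of a fix, not what the software in question does: `InputSignal`, `notify_input` and `wait_for_input` are hypothetical names. The marker is a count rather than a simple flag, and it is only ever checked and changed while holding the same lock, so the waiting thread cannot wipe out a signal that arrived while it was busy.

```python
import threading


class InputSignal:
    """Hypothetical 'input pending' marker shared by two threads."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0  # a count, so repeated signals are not merged or lost

    def notify_input(self):
        # Called by the subsidiary thread each time an input arrives.
        with self._cond:
            self._pending += 1
            self._cond.notify()

    def wait_for_input(self, timeout=None):
        # Called by the main thread. Returns True if an input was consumed,
        # False if the timeout expired with nothing pending.
        with self._cond:
            if not self._cond.wait_for(lambda: self._pending > 0, timeout):
                return False
            # Consume exactly one signal while still holding the lock.
            self._pending -= 1
            return True
```

The main thread would loop on `wait_for_input()` and the subsidiary thread would call `notify_input()` once per input. Because the decrement happens under the lock, a signal that arrives while the main thread is still processing the previous one simply stays counted until the next wait.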

The problem is worse when you've got two distinct pieces of software, particularly if one is a driver. In this situation you can't lock them to the same core/processor. I recently had to overcome this problem with a Cherry keyboard driver (for a keyboard with a built-in credit card stripe reader). Fortunately Cherry had tested their software and allowed for a delay, so that you could actually process some data before they fired the next character into the buffer. The problem would have been resolved a lot more quickly if they had documented why they had allowed this delay, instead of leaving me to guess at the solution.