The Von Neumann Bottleneck’s Effect on Artificial Intelligence

By John Paul Mueller, Luca Massaron

The Von Neumann bottleneck is a natural result of using a bus to transfer data between the processor, memory, long-term storage, and peripheral devices. No matter how fast the bus performs its task, overwhelming it — that is, forming a bottleneck that reduces speed — is always possible. Over time, processor speeds continue to increase while memory and other device improvements focus on density — the capability to store more in less space. Consequently, the bottleneck becomes more of an issue with every improvement, causing the processor to spend a lot of time being idle.

Within reason, you can overcome some of the issues that surround the Von Neumann bottleneck and produce small, but noticeable, increases in application speed. Here are the most common solutions:

  • Caching: When it became evident that the Von Neumann Architecture couldn’t obtain data from memory fast enough, hardware vendors quickly responded by adding localized memory that didn’t require bus access. This memory is external to the processor but part of the processor package. High-speed cache is expensive, however, so cache sizes tend to be small.
  • Processor caching: Unfortunately, external caches still don’t provide enough speed. Even using the fastest RAM available and cutting out the bus access completely doesn’t meet the processing capacity needs of the processor. Consequently, vendors started adding internal memory — a cache smaller than the external cache, but with even faster access because it’s part of the processor.
  • Prefetching: The problem with caches is that they prove useful only when they contain the correct data. Unfortunately, cache hit rates prove low in applications that use a lot of data and perform a wide variety of tasks. The next step in making processors work faster is to guess which data the application will require next and load it into cache before the application requires it.
  • Using specialty RAM: You can get buried by RAM alphabet soup because there are more kinds of RAM than most people imagine. Each kind of RAM purports to solve at least part of the Von Neumann bottleneck problem, and each does work, within limits. In most cases, the improvements revolve around the idea of getting data from memory and onto the bus faster. Two major (and many minor) factors affect speed: memory speed (how fast the memory moves data) and latency (how long it takes to locate a particular piece of data).
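To make the caching and prefetching ideas above more concrete, here is a minimal Python sketch of a toy cache. Everything in it is an assumption made for illustration: the `LRUCache` class, the `simulate` function, the eight-entry capacity, and the "load the next address" prefetch rule are hypothetical and far simpler than real hardware. It compares hit rates for a program that walks through memory in order against one that jumps around.

```python
from collections import OrderedDict


class LRUCache:
    """A toy least-recently-used cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> True, oldest entry first
        self.hits = 0
        self.misses = 0

    def _load(self, address):
        """Put an address in the cache, evicting the oldest entries if full."""
        self.lines[address] = True
        self.lines.move_to_end(address)
        while len(self.lines) > self.capacity:
            self.lines.popitem(last=False)

    def access(self, address, prefetch=False):
        """Record one memory access; optionally prefetch the next address."""
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)
        else:
            self.misses += 1
            self._load(address)
        if prefetch:
            # Hypothetical prefetch rule: guess that the next address follows.
            self._load(address + 1)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(addresses, capacity, prefetch):
    """Run an access pattern through a fresh cache and return its hit rate."""
    cache = LRUCache(capacity)
    for address in addresses:
        cache.access(address, prefetch=prefetch)
    return cache.hit_rate()


# Assumed workloads: one walks memory in order, the other jumps by 37.
sequential = list(range(1000))
scattered = [(i * 37) % 1000 for i in range(1000)]

for name, pattern in (("sequential", sequential), ("scattered", scattered)):
    plain = simulate(pattern, capacity=8, prefetch=False)
    guessed = simulate(pattern, capacity=8, prefetch=True)
    print(f"{name}: no prefetch {plain:.3f}, with prefetch {guessed:.3f}")
```

In this toy model, the prefetch guess pays off only when it matches what the program actually does next. The in-order walk goes from no hits to almost all hits, while the scattered pattern gains nothing because the cache never holds the correct data.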

As with many other areas of technology, hype can become a problem. For example, multithreading, the act of breaking an application or other set of instructions into discrete execution units that the processor can handle one at a time, is often touted as a means to overcome the Von Neumann bottleneck. As far as the bottleneck itself is concerned, however, multithreading does nothing more than add overhead (making the problem worse). Multithreading is an answer to another problem: making the application more efficient. When an application adds latency issues to the Von Neumann bottleneck, the entire system slows. Multithreading ensures that the processor doesn’t waste yet more time waiting for the user or the application, but instead has something to do all the time. Application latency can occur with any processor architecture, not just the Von Neumann Architecture. Even so, anything that speeds the overall operation of an application is visible to the user and the system as a whole.
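The point about application latency can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `wait_for_input` is a hypothetical stand-in for a task stalled on a user or device (simulated here with `time.sleep`), and the 0.1-second delays are made up. The sketch shows threads overlapping waiting time; it does nothing to move data across the bus any faster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wait_for_input(delay_seconds):
    """Hypothetical task that stalls, as if waiting on a user or device."""
    time.sleep(delay_seconds)
    return delay_seconds


def run_sequentially(delays):
    """Handle each stalled task in turn; the waits add up."""
    start = time.perf_counter()
    results = [wait_for_input(d) for d in delays]
    return results, time.perf_counter() - start


def run_with_threads(delays):
    """Give each stalled task its own thread so the waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(wait_for_input, delays))
    return results, time.perf_counter() - start


delays = [0.1, 0.1, 0.1, 0.1]  # assumed wait times, in seconds
sequential_results, sequential_time = run_sequentially(delays)
threaded_results, threaded_time = run_with_threads(delays)

print(f"sequential: {sequential_time:.2f} s")
print(f"threaded:   {threaded_time:.2f} s")
```

Both runs do the same work and return the same results. The threaded run usually finishes sooner only because the waits happen at the same time, which is the application-latency benefit described above rather than a fix for the bottleneck.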