Higher CPU clock speeds, greater core counts, and more powerful GPUs have been the holy grail of computing advancement over the past few decades. We have even dabbled with low-power, energy-efficient architectures that revolutionized mobile computing. These innovations led to exponential gains in raw computational power, enabling breakthroughs in AI, scientific computing, and real-time data processing. We’re now at the cusp where raw compute performance is no longer the primary bottleneck. With the amount of available data, and the pace at which it’s being generated, the challenge lies in efficiently feeding data to those compute engines. Modern CPUs and GPUs are often underutilized because memory and storage systems struggle to keep up with their processing demands. This is why data locality – the principle of keeping data close to where it is processed – will be an emerging frontier in computer architecture.

The Memory Wall

While CPUs and GPUs can perform trillions of operations per second, they frequently wait for data to arrive from main memory, storage, or the network. The widening gap between compute speed and memory bandwidth, often referred to as the “memory wall,” means that processors spend a significant share of their cycles idle. Solutions like larger caches, faster DRAM, NVMe SSDs, and higher network bandwidth have improved data access speeds, but not enough to match the throughput required by modern parallel processors. To maximize efficiency, we need architectural changes that minimize data movement.
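
To make the gap concrete, the sketch below is a minimal microbenchmark (illustrative, not a rigorous measurement) that sums the same large array twice: once sequentially, where caches and hardware prefetchers can keep the core fed, and once with a large stride, where most accesses miss the cache and the core stalls on DRAM. The array size, stride, and timing approach are assumptions chosen for illustration.

/* Memory-wall sketch: same arithmetic, different access pattern.
 * Compile with e.g. cc -O2 stride_sum.c (POSIX clock_gettime assumed). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N ((size_t)1 << 26)   /* 64M ints (~256 MB): far larger than any cache */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Sum all N elements; the stride only changes the order of accesses. */
static long sum_with_stride(const int *a, size_t stride) {
    long sum = 0;
    for (size_t s = 0; s < stride; s++)
        for (size_t i = s; i < N; i += stride)
            sum += a[i];
    return sum;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = (int)(i & 0xff);

    double t0 = seconds();
    long s1 = sum_with_stride(a, 1);      /* sequential: cache-friendly */
    double t1 = seconds();
    long s2 = sum_with_stride(a, 4096);   /* 16 KB stride: mostly cache misses */
    double t2 = seconds();

    printf("sequential: %.3f s (sum %ld)\n", t1 - t0, s1);
    printf("strided:    %.3f s (sum %ld)\n", t2 - t1, s2);
    free(a);
    return 0;
}

On typical hardware the strided pass takes several times longer even though it performs exactly the same number of additions – the processor is simply waiting on memory.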

Data Locality as a Solution

Data locality optimizations aim to reduce the distance data must travel before being processed. Techniques such as cache-aware algorithms, near-memory computing (processing-in-memory), and smart data prefetching are gaining traction. For example, placing compute units closer to memory – as seen in High Bandwidth Memory stacks for GPUs – reduces latency. Similarly, computational storage moves processing into storage devices themselves, minimizing data transfers. Another approach is distributed computing frameworks that prioritize keeping computations near where data resides, reducing network bottlenecks in cloud environments.
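
As one small, concrete illustration of the cache-aware algorithms mentioned above, the sketch below restructures a matrix multiply into tiles so that each block of data is reused while it is still resident in cache, rather than being streamed from DRAM repeatedly. The tile size and the assumption that the matrix dimension divides evenly by it are simplifications for illustration.

/* Cache-aware (tiled/blocked) matrix multiply sketch: C += A * B, row-major.
 * BS is a tunable tile edge; in practice it is chosen so a few BS x BS tiles
 * fit in the target cache level. n is assumed to be a multiple of BS. */
#include <stddef.h>

#define BS 64   /* illustrative tile size; tune for the actual cache */

void matmul_tiled(size_t n, const double *A, const double *B, double *C) {
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                /* The data touched by these inner loops fits in cache,
                 * so each element loaded from memory is reused many times. */
                for (size_t i = ii; i < ii + BS; i++)
                    for (size_t k = kk; k < kk + BS; k++) {
                        double a_ik = A[i * n + k];
                        for (size_t j = jj; j < jj + BS; j++)
                            C[i * n + j] += a_ik * B[k * n + j];
                    }
}

Compared with the naive triple loop, the arithmetic is identical; only the order of memory accesses changes – the same data-movement principle that near-memory computing and computational storage apply in hardware.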

The benefits of data locality are evident in AI and big data workloads. Deep learning and large-scale analytics, in particular, require massive datasets, and frequent data transfers between storage, memory, and GPUs lead to inefficiencies. Similarly, in-memory databases and edge computing leverage data locality to accelerate real-time analytics. As datasets continue to grow, optimizing data movement will be crucial for scaling performance.
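
Even within a single node, the smart prefetching mentioned earlier helps hide this data movement. The sketch below uses the GCC/Clang __builtin_prefetch hint while streaming two large arrays, so future elements start moving toward the cache while the current ones are being processed; the look-ahead distance is an arbitrary value chosen for illustration.

/* Software prefetching sketch for a streaming reduction over large arrays. */
#include <stddef.h>

double dot(const double *x, const double *y, size_t n) {
    const size_t ahead = 16;   /* illustrative look-ahead distance; tune per machine */
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n) {
            /* Hint: start fetching future elements now, overlapping with compute. */
            __builtin_prefetch(&x[i + ahead], 0, 0);   /* read, low temporal locality */
            __builtin_prefetch(&y[i + ahead], 0, 0);
        }
        acc += x[i] * y[i];
    }
    return acc;
}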

Conclusion

The shift toward data locality marks a fundamental change in how we think about and design computing systems. Instead of solely focusing on faster processors, the next wave of innovation should prioritize minimizing data movement and maximizing proximity between compute and memory. Emerging technologies like CXL (Compute Express Link), processing-in-memory (PIM), and smarter memory hierarchies must be explored and applied to real-world systems. The future of computing lies in efficient data pipelines and architectures that keep processors fed with data rather than starved of it.