Abstract: Many modern applications demand large storage that the processor can access quickly. Software-programmed scratchpad memory is a promising on-chip alternative to cache memory. It is also much more efficient and simpler to implement than a cache.
Several architectures exploit data-level parallelism to improve performance: they fetch a single instruction that operates on multiple data elements. A classic example is the vector processor. A vector processor loads sets of data elements from memory into vector registers, operates on the data in those registers, and stores the results back to memory.
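The load, operate, and store pattern described above can be sketched in a few lines of Python. This is an illustrative model only, not the simulator or ISA used in this thesis: the register length VLEN, the function name vector_add, and the count of four vector instructions per strip are all assumptions made for the example.

```python
VLEN = 8  # hypothetical vector register length, in elements


def vector_add(a, b, vlen=VLEN):
    """Add two equal-length lists one vector-register-sized strip at a time.

    Returns the result and the number of modeled vector instructions
    (two loads, one add, one store per strip).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    out = [0] * len(a)
    instructions = 0
    for start in range(0, len(a), vlen):
        end = min(start + vlen, len(a))
        v1 = a[start:end]                     # vector load
        v2 = b[start:end]                     # vector load
        v3 = [x + y for x, y in zip(v1, v2)]  # one instruction, many elements
        out[start:end] = v3                   # vector store
        instructions += 4
    return out, instructions


result, count = vector_add(list(range(20)), [1] * 20)
```

In this model, data longer than VLEN must be split into several strips, each with its own loads and stores, so a smaller register length means more memory traffic for the same amount of arithmetic.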
However, vector processors are less efficient for many scientific computing and multimedia applications, which involve massive computation on large data sets, because the vector registers are limited in size. In addition, a complex cache hierarchy feeds the vector registers. To tackle these problems, we propose M-architecture, a novel CPU-scratchpad memory architecture that replaces the data cache, the L2 cache, and the vector registers. We also introduce a dedicated ISA for M-architecture that captures data-level parallelism in scientific computing, mainly in workloads involving 2D matrices. With this approach, the memory bottleneck caused by extensive loads and stores is reduced or eliminated.
We developed a simple simulator to verify the functionality of the M-architecture ISA and to estimate its performance in terms of execution time and the average number of operations per cycle. We evaluated the architecture on a wide range of benchmarks. The thesis demonstrates that the new architecture offers high performance, achieving an improvement of more than two orders of magnitude over scalar execution.