Cache Coherence in Computer Architecture
YASH PAL, March 7, 2026

Cache Coherence in Computer Architecture – In a multiprocessor system where many processors need a copy of the same memory block, maintaining consistency among these copies raises a problem referred to as the cache coherence problem. In general, it arises from three main causes:

- Sharing of writable data
- Process migration
- Inconsistency due to I/O

The Cache Coherence Problem

In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same level of the memory hierarchy. For example, the cache and the main memory may hold inconsistent copies of the same object. Because multiple processors operate in parallel and independently, multiple caches may hold different copies of the same memory block, which creates the cache coherence problem. Cache coherence schemes avoid this problem by maintaining a uniform state for each cached block of data.

Cache Coherence Operation

Let X be an element of shared data referenced by two processors, P1 and P2. Initially, the three copies of X (one in each processor's cache and one in shared memory) are consistent. If processor P1 writes a new value X1 into its cache under the write-through policy, the same value is immediately written into shared memory; the copy in P2's cache, however, becomes stale, so inconsistency arises between that cache and the main memory. Under a write-back policy, main memory is updated only when the modified data in the cache is replaced or invalidated, so the writing processor's cache and main memory can also be inconsistent in the meantime.

Solution for the Cache Coherence Problem

Various schemes have been proposed to solve the cache coherence problem.

Cache Write Policies

There are two main cache write policies:

- Write back: Write operations are usually made only to the cache. Main memory is updated only when the corresponding cache line is flushed from the cache.
- Write through: All write operations are made to main memory as well as to the cache, ensuring that main memory is always valid.
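The difference between the two write policies can be sketched with a toy single-cache model. The class and method names below are invented for illustration; this is a minimal sketch, not a model of a real memory system:

```python
# Toy model contrasting write-through and write-back for one cache.
# All names here are illustrative, not from any real hardware API.

class Cache:
    def __init__(self, memory, write_through):
        self.memory = memory          # backing "main memory" (a dict)
        self.lines = {}               # address -> value held in the cache
        self.dirty = set()            # addresses modified but not written back
        self.write_through = write_through

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value   # memory updated immediately
        else:
            self.dirty.add(addr)        # memory updated only on eviction

    def evict(self, addr):
        if addr in self.dirty:          # write-back: flush on replacement
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

mem_wt, mem_wb = {"X": 0}, {"X": 0}
wt = Cache(mem_wt, write_through=True)
wb = Cache(mem_wb, write_through=False)
wt.write("X", 1)
wb.write("X", 1)
print(mem_wt["X"], mem_wb["X"])   # 1 0 -> memory is stale under write-back
wb.evict("X")
print(mem_wb["X"])                # 1 -> consistent again after the flush
```

The sketch shows why write-through keeps main memory valid at all times, while write-back leaves a window in which only the cache holds the current value.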
Software Solution

In the software approach, detecting potential cache coherence problems is transferred from run time to compile time, and the design complexity is transferred from hardware to software. On the other hand, compile-time software approaches generally make conservative decisions, leading to inefficient cache utilization.

Compiler-based cache coherence mechanisms analyze the code to determine which data items may become unsafe for caching and mark those items accordingly; the operating system or hardware then prevents those items from being cached. The simplest approach is to prevent any shared data variables from being cached. This is too conservative, because a shared data structure may be used exclusively by one process during some periods and may be effectively read-only during others. It is only during periods when at least one process may update a variable and at least one other process may access it that cache coherence is an issue. More efficient approaches analyze the code to determine safe periods for shared variables; the compiler then inserts instructions into the generated code to enforce cache coherence during the critical periods.

Hardware Solutions

A hardware solution provides dynamic recognition, at run time, of potential inconsistency conditions. Because the problem is dealt with only when it actually arises, caches are used more effectively, leading to better performance than a software approach. Hardware schemes can be divided into two categories:

- Snoopy bus protocols
- Directory protocols

Snoopy Bus Protocols: Snoopy protocols achieve data consistency between the cache memories and the shared memory through a bus-based memory system. Write-invalidate and write-update policies are used for maintaining cache consistency.
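The write-invalidate policy can be sketched as a toy bus-snooping model. All names are invented for illustration; real snoopy protocols such as MESI track per-line states in hardware, which this sketch omits:

```python
# Minimal write-invalidate sketch: on a write, every other cache's copy
# of the block is invalidated over the shared "bus" (here, a plain list).

class SnoopyCache:
    def __init__(self, bus):
        self.valid = {}            # address -> value (valid copies only)
        self.bus = bus
        bus.append(self)           # register this cache on the shared bus

    def read(self, addr, memory):
        if addr not in self.valid:         # miss: fetch from shared memory
            self.valid[addr] = memory[addr]
        return self.valid[addr]

    def write(self, addr, value, memory):
        for cache in self.bus:             # snoop: invalidate other copies
            if cache is not self:
                cache.valid.pop(addr, None)
        self.valid[addr] = value
        memory[addr] = value               # write-through, for simplicity

memory = {"X": 0}
bus = []
p1, p2, p3 = SnoopyCache(bus), SnoopyCache(bus), SnoopyCache(bus)
for p in (p1, p2, p3):
    p.read("X", memory)        # all three caches now hold a copy of X
p1.write("X", 1, memory)       # P1 writes: P2 and P3 copies invalidated
print("X" in p2.valid, "X" in p3.valid)   # False False
print(p2.read("X", memory))               # 1 (re-fetched fresh copy)
```

After the invalidation, the other processors simply miss on their next access and fetch the current value, which is the essence of the write-invalidate scheme described above.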
Figure: Consistent copies of block X in shared memory and in three processor caches.

In the above figure, we have three processors, P1, P2, and P3, each holding a consistent copy of data element X in its local cache and in the shared memory.

Figure: After a write-invalidate operation by P1.

Processor P1 writes X1 into its cache using the write-invalidate protocol, so all other copies are invalidated via the bus, as marked in the figure above. Invalidated blocks are also known as dirty, i.e., they should not be used. The write-update protocol, by contrast, updates all the cache copies via the bus; with a write-back cache, the memory copy is also updated, as shown in the figure below.

Figure: After a write-update operation by P1.

Directory-Based Protocols: When a multistage network is used to build a large multiprocessor with hundreds of processors, snoopy cache protocols must be modified to suit the network's capabilities. Because broadcasting is very expensive in a multistage network, consistency commands are sent only to those caches that keep a copy of the block. This is the reason for the development of directory-based protocols for network-connected multiprocessors. In a directory-based system, the data to be shared is tracked in a common directory that maintains coherence among the caches. The directory acts as a filter through which processors must ask permission to load an entry from primary memory into their cache memory. If an entry is changed, the directory either updates it or invalidates the other caches holding that entry.
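The filtering role of the directory can be sketched like this. The names are invented for illustration, and a real directory protocol also tracks per-block states (e.g., shared vs. exclusive), which this minimal sketch leaves out:

```python
# Minimal directory-based coherence sketch: the directory records which
# caches hold each block and sends invalidations only to those sharers,
# instead of broadcasting to every cache.

class DirCache:
    def __init__(self):
        self.lines = {}                    # address -> cached value

class Directory:
    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}                  # address -> set of caches holding it

    def load(self, addr, cache):
        # A cache asks permission to load a block; the directory records it.
        self.sharers.setdefault(addr, set()).add(cache)
        cache.lines[addr] = self.memory[addr]
        return cache.lines[addr]

    def write(self, addr, value, writer):
        # Invalidate only the caches the directory knows hold this block.
        for cache in self.sharers.get(addr, set()):
            if cache is not writer:
                cache.lines.pop(addr, None)
        self.sharers[addr] = {writer}      # writer is now the sole holder
        self.memory[addr] = value
        writer.lines[addr] = value

memory = {"X": 0}
directory = Directory(memory)
p1, p2 = DirCache(), DirCache()
directory.load("X", p1)
directory.load("X", p2)
directory.write("X", 7, p1)        # only P2, a recorded sharer, is invalidated
print(p1.lines.get("X"), p2.lines.get("X"))   # 7 None
```

Because invalidations go only to recorded sharers, no broadcast is needed, which is exactly why this organization suits network-connected multiprocessors where broadcasting is expensive.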