[OSC] Day 6: Paging, Virtual Memory and Page Replacement

Memory Management


Paging


A page is a small block of contiguous memory in the logical address space, i.e. as seen by the process
A frame is a small contiguous block in physical memory
Pages and frames (usually) have the same size:
  • The size is usually a power of 2
  • Sizes range between 512 bytes and 1 GB

Relocation

Logical address (page number, offset within page) needs to be translated into a physical address (frame number, offset within frame)
Multiple “base registers” will be required: Each logical page needs a separate “base register” that specifies the start of the associated frame

Page table

The base registers are stored in the page table

The page table can be seen as a function that maps the page number of the logical address onto the frame number of the physical address: frameNumber = f(pageNumber)
Every process has its own page table containing its own “base registers”

Address translation

A logical (physical) address is relative to the start of the program (memory) and consists of two parts:
  • The rightmost m bits, which represent the offset within the page (frame)
  • The leftmost n bits, which represent the page (frame) number
The offset within the page and frame remains the same (they are the same size)
The page number to frame number mapping is held in the page table

Hardware implementation of address translation
  • The CPU’s memory management unit (MMU) intercepts logical addresses
  • MMU uses a page table as above
  • The resulting physical address is put on the memory bus
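
A minimal sketch of this translation logic, assuming (for illustration only) 4 KB pages, a 12-bit offset and a page table held in a plain Python dict; a real MMU does all of this in hardware:

```python
OFFSET_BITS = 12                    # m: the rightmost bits form the offset
PAGE_SIZE = 1 << OFFSET_BITS        # 4096 bytes

page_table = {0: 5, 1: 2, 2: 7}     # hypothetical mapping: page -> frame

def translate(logical_address: int) -> int:
    page_number = logical_address >> OFFSET_BITS  # leftmost n bits
    offset = logical_address & (PAGE_SIZE - 1)    # rightmost m bits
    frame_number = page_table[page_number]        # frameNumber = f(pageNumber)
    return (frame_number << OFFSET_BITS) | offset

print(hex(translate(0x1ABC)))  # page 1 -> frame 2, same offset: 0x2abc
```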

Virtual Memory

Principle of Locality

Code execution and data manipulation are usually restricted to a small subset (i.e. limited number of pages) at any point in time
That is, code and data references within a process are usually clustered ⇒ this is called the principle of locality
Not all pages have to be loaded in memory at the same time ⇒ virtual memory
  • Loading an entire set of pages for an entire program/data set into memory is wasteful
  • Desired blocks could be loaded on demand

Page faults

The resident set refers to the pages that are loaded in main memory
A page fault is generated if the processor accesses a page that is not in memory
  • A page fault results in an interrupt (process enters blocked state)
  • An I/O operation is started to bring the missing page into main memory
  • A context switch (may) take place
  • An interrupt signals that the I/O operation is complete (process enters ready state)

Benefits of Virtual Memory

Being able to maintain more processes in main memory through the use of virtual memory improves CPU utilisation
Individual processes take up less memory since they are only partially loaded
Virtual memory allows the logical address space (i.e. processes) to be larger than the physical address space (i.e. main memory)

Page tables revisited

The page table entries are now more than a simple page-number-to-frame-number translation; each entry also contains:

  • A “present/absent bit” that is set if the page is in memory. This bit is checked first on every page table lookup; if the page is absent, a page fault is generated (an interrupt)
  • A “modified bit” that is set if the page has been modified since it was loaded (only modified pages have to be written back to disk when they are evicted)
  • A “referenced bit” that is set if the page is in use
  • Protection and sharing bits: read, write and execute, or combinations thereof (RWX)
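
As a sketch of how such control bits might be packed into a page table entry (the exact layout below is an assumption; real layouts are architecture-specific):

```python
PRESENT    = 1 << 0   # page is in memory
MODIFIED   = 1 << 1   # page has been written to ("dirty")
REFERENCED = 1 << 2   # page has been used recently
READ, WRITE, EXEC = 1 << 3, 1 << 4, 1 << 5  # protection bits (RWX)

entry = PRESENT | READ | WRITE            # a resident, read/write page

if not (entry & PRESENT):
    raise RuntimeError("page fault")      # absent -> generate an interrupt
needs_writeback = bool(entry & MODIFIED)  # only dirty pages go back to disk
print(needs_writeback)                    # False: this page was never written
```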

Dealing with large page tables

On 32-bit and (especially) 64-bit machines the page tables become very large, so how do we deal with them?

Multi-level page tables (page the page table)



With a two-level page table, the first 10 bits of the logical address index the root page table, the next 10 bits index the second-level page table to obtain the frame number, and the remaining bits give the offset within the frame
Memory organisation of multi-level page tables:
  • the root page table is always maintained in memory
  • page tables themselves are maintained in virtual memory due to their size
Assume that a fetch from main memory takes T nanoseconds: with a single page table level, each access takes 2T; with two page table levels it takes 3T. More levels add further memory accesses and complexity.
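
A sketch of the two-level walk described above (10 + 10 + 12 bits of a 32-bit address); the nested dicts standing in for the tables are an assumption for illustration:

```python
root_table = {1: {1: 7}}   # first 10 bits -> second-level table -> frame

def translate_2level(vaddr: int) -> int:
    top    = (vaddr >> 22) & 0x3FF   # bits 31-22 index the root table
    second = (vaddr >> 12) & 0x3FF   # bits 21-12 index the 2nd-level table
    offset = vaddr & 0xFFF           # bits 11-0: offset within the frame
    frame = root_table[top][second]  # two table fetches before the data fetch
    return (frame << 12) | offset

print(hex(translate_2level(0x00401234)))  # -> 0x7234
```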

Page tables revisited: TLBs to speed up paging

Translation look-aside buffers (TLBs) are (usually) located inside the memory management unit (MMU)
  • They cache the most frequently used page table entries
  • They can be searched in parallel

Take care: a memory reference can result in a TLB hit, a TLB miss (which requires a page table walk), or a page fault; the sketch below shows the lookup order
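
Modelling the TLB as a dict is an assumption made purely for illustration; real hardware searches all TLB entries in parallel:

```python
tlb = {}                        # cache of recent page -> frame translations
page_table = {0: 5, 1: 2}       # hypothetical single-level page table

def lookup(page_number: int) -> int:
    if page_number in tlb:              # TLB hit: no page table access
        return tlb[page_number]
    if page_number not in page_table:   # entry absent: page fault
        raise RuntimeError("page fault")
    frame = page_table[page_number]     # TLB miss: walk the page table
    tlb[page_number] = frame            # cache the translation
    return frame
```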

Inverted Page Tables

A “normal” page table’s size is proportional to the number of pages in the virtual address space ⇒ this can be prohibitive for modern machines
An “inverted” page table’s size is proportional to the size of main memory
  • The inverted table contains one entry for every frame (not for every page)
  • The inverted table indexes entries by frame number, not by page number
  • To speed up the address translation, a hash function based on the page number is used to index the inverted page table

Advantages:
  • The OS maintains a single inverted page table for all processes
  • It saves a lot of space (especially when the virtual address space is much larger than the physical memory), because the table size depends on the physical memory, not the virtual address space
    For example, assume a 64-bit machine with 1 GB of RAM and a 4 KB page size:
    4 KB = 2^12 bytes, 1 GB = 2^30 bytes
    A traditional page table would need 2^(64-12) = 2^52 entries
    An inverted page table needs only 2^30 / 2^12 = 2^18 = 262,144 entries (one per frame), which is easily manageable

Disadvantages:
  • Virtual-to-physical translation becomes much harder. We need to use hash tables to avoid searching the whole inverted table (be aware of potential collisions).
  • More time-consuming
It is used in combination with TLBs to speed up the search
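
A sketch of a hashed inverted table with linear probing for collisions; the layout and the helper names are assumptions:

```python
NUM_FRAMES = 8
inverted = [None] * NUM_FRAMES   # slot i describes frame i: (pid, page)

def slot_of(pid: int, page: int) -> int:
    return hash((pid, page)) % NUM_FRAMES

def load(pid: int, page: int) -> int:
    """Place (pid, page) into a free frame, probing from its hash slot."""
    for probe in range(NUM_FRAMES):
        i = (slot_of(pid, page) + probe) % NUM_FRAMES
        if inverted[i] is None:
            inverted[i] = (pid, page)
            return i
    raise RuntimeError("no free frame: run page replacement")

def find_frame(pid: int, page: int) -> int:
    for probe in range(NUM_FRAMES):
        i = (slot_of(pid, page) + probe) % NUM_FRAMES
        if inverted[i] == (pid, page):
            return i                  # the frame number is the table index
    raise RuntimeError("page fault")  # page is not in any frame
```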

Demand paging (pages are only loaded when needed)

Demand paging starts the process with no pages in memory
  • The first instruction will immediately cause a page fault
  • More page faults will follow, but the fault rate stabilises over time, until the process moves on to the next locality
  • The set of pages that is currently being used is called its working set

Pre-paging (load pages in advance)

When the process is started, all pages expected to be used (i.e. the working set) could be brought into memory at once
  • This can reduce page fault rate
  • Retrieving multiple (contiguously stored) pages reduces transfer times (seek time, rotational latency, etc.)

Virtual memory implementation details

Avoiding unnecessary page loads and choosing the right pages to replace is important!
Assume the memory access time is ma,
the page fault rate is p,
and the page fault service time is pft.
What is the effective access time (ignoring TLBs)?
Answer: T = (1 - p) * ma + p * pft
With a single-level page table, a memory fetch takes ma = 100 ns but every access needs 2 fetches (page table + data), and pft = 8 ms, so:
T = (1 - p) * 200 + p * 8,000,000 (in ns)
Once page faults are taken into account, the effective access time grows linearly with the page fault rate
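
Plugging numbers into the formula above (the fault rates below are purely illustrative):

```python
def effective_access_ns(p: float, ma: float = 200, pft: float = 8_000_000) -> float:
    """T = (1 - p) * ma + p * pft, all times in nanoseconds."""
    return (1 - p) * ma + p * pft

print(effective_access_ns(0.0))    # 200.0 ns  (no page faults)
print(effective_access_ns(0.001))  # 8199.8 ns (1 fault per 1000 accesses!)
```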
So we need page replacement algorithms that keep the page fault rate low!
Which page to evict is decided by page replacement algorithms, which take into account:

  • When the page is last used/expected to be used again
  • Whether the page has been modified (only modified pages need to be written)
    Replacement choices have to be made intelligently (as opposed to randomly) to save time and avoid thrashing
Thrashing: if a process does not have access to a sufficient number of frames, the page fault rate becomes very high and the system spends its time rapidly exchanging pages between memory and disk, which is very bad for performance

Page replacement algorithms

Optimal page replacement (ideal, not practical)

Each page is labelled with the number of instructions that will be executed (or the length of time that will pass) before it is next used
The page that is not going to be referenced for the longest time is the optimal one to remove
The optimal approach cannot be implemented online, since the future reference string is unknown, but it can be simulated and used for comparison:
  • It can be used for post-execution analysis ⇒ what would have been the minimum number of page faults
  • It provides a lower bound on the number of page faults (used for comparison with other algorithms)
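
A sketch of how the optimal victim could be chosen in such a post-execution simulation, given the full reference string (names are illustrative):

```python
def opt_victim(resident, references, now):
    """Evict the resident page whose next use lies furthest in the future."""
    def next_use(page):
        try:
            return references.index(page, now + 1)
        except ValueError:
            return float("inf")   # never referenced again: ideal victim
    return max(resident, key=next_use)

refs = [1, 2, 3, 4, 1, 2, 5]
print(opt_victim({1, 2, 3}, refs, 2))  # 3 is never used again -> evict 3
```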

FIFO

FIFO maintains a linked list and new pages are added at the end of the list
The oldest page at the head of the list is evicted when a page fault occurs
Advantage: Easy to implement/understand
Disadvantage: it performs poorly ⇒ heavily used pages are just as likely to be evicted as lightly used ones
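
A sketch of FIFO replacement counting page faults over a reference string (the string below is the classic Belady example, used here just for illustration):

```python
from collections import deque

def fifo_faults(references, num_frames):
    queue, resident, faults = deque(), set(), 0
    for page in references:
        if page in resident:
            continue                      # hit: FIFO ignores usage entirely
        faults += 1                       # page fault
        if len(resident) == num_frames:   # memory full: evict the oldest page
            resident.discard(queue.popleft())
        queue.append(page)
        resident.add(page)
    return faults

print(fifo_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 9 faults
```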

Second chance FIFO

Second chance is a modification of FIFO:
  • If a page at the front of the list has not been referenced it is evicted
  • If the reference bit is set, the page is placed at the end of list and its reference bit reset
Advantage: works better than standard FIFO
Disadvantage: simple, but costly to implement because the list is constantly changing (pages have to be moved to the end of the list again and again)
It degrades to FIFO if all pages have their reference bit set
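
A sketch of the second-chance eviction step, keeping (page, referenced-bit) pairs in a FIFO list:

```python
from collections import deque

def evict(pages: deque):
    """pages holds [page, referenced_bit] pairs, oldest at the front."""
    while True:
        page, referenced = pages.popleft()
        if not referenced:
            return page             # old and unreferenced: evict it
        pages.append([page, 0])     # second chance: clear the bit, re-queue

frames = deque([["A", 1], ["B", 0], ["C", 1]])
print(evict(frames))  # "B": A was referenced, so it moved to the back
```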

The clock replacement algorithm

The second-chance implementation can be improved by maintaining the page list as a circle (this is the only difference)
  • A pointer points to the last “visited” page
  • In this form the algorithm is called clock
  • The time spent maintaining the list is reduced (pages stay in place, only the pointer moves), but the search can still be slow if the list is long
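
The same policy as a clock sketch: the pages sit in a fixed circular array and only the hand advances:

```python
def clock_evict(frames, ref_bits, hand):
    """Advance the hand past referenced pages (clearing their bits) and
    return (victim_index, new_hand_position)."""
    while True:
        if ref_bits[hand] == 0:
            return hand, (hand + 1) % len(frames)
        ref_bits[hand] = 0               # second chance, without list moves
        hand = (hand + 1) % len(frames)

frames, ref_bits = ["A", "B", "C", "D"], [1, 0, 1, 1]
victim, hand = clock_evict(frames, ref_bits, 0)
print(frames[victim])  # "B": the first page the hand finds with a clear bit
```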

Not recently used(NRU)

Referenced and modified bits are kept in the page table; referenced bits are set to 0 at the start and are reset periodically (e.g. on every system clock interrupt, or when searching the list)
Four different page “types” exist:
  • class 0: not referenced, not modified
  • class 1: not referenced, modified
  • class 2: referenced, not modified
  • class 3: referenced, modified
Page table entries are inspected upon every page fault ⇒ a page from the lowest numbered non-empty class is removed (can be implemented as a clock)
The NRU algorithm provides a reasonable performance and is easy to understand and implement
The performance of NRU can be improved by working with two buffers, containing a modified and free list
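
A sketch of NRU victim selection from the classes above (the tuple format is an assumption made for illustration):

```python
def nru_victim(pages):
    """pages: list of (name, referenced_bit, modified_bit) tuples."""
    def nru_class(page):
        _, referenced, modified = page
        return 2 * referenced + modified   # classes 0..3 as defined above
    return min(pages, key=nru_class)[0]    # lowest non-empty class wins

pages = [("A", 1, 1), ("B", 1, 0), ("C", 0, 1)]
print(nru_victim(pages))  # "C": class 1 beats class 2 (B) and class 3 (A)
```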

Least recently used(LRU)

Least recently used evicts the page that has not been used the longest
  • The OS must keep track of when a page was last used
  • Every page table entry contains a field for the counter
  • This is not cheap to implement, since we must maintain a list of pages sorted by the time of last use (or search for the least recently used page on every eviction)
The algorithm can be implemented in hardware using a counter that is incremented after each instruction; the page table entry's counter field records its value on every access
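
A sketch of that counter scheme: a logical clock is stamped into the entry on every access, and the smallest stamp identifies the victim (same reference string as the FIFO example, for comparison):

```python
def lru_faults(references, num_frames):
    last_used, faults = {}, 0
    for time, page in enumerate(references):  # time plays the counter role
        if page not in last_used:
            faults += 1
            if len(last_used) == num_frames:   # evict the least recently used
                victim = min(last_used, key=last_used.get)
                del last_used[victim]
        last_used[page] = time                 # stamp the access time
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 10 faults
```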

Summary


Exercise:
Q:
A:


Q: Assume a single-level page table, a 20 ns associative TLB lookup, a 100 ns memory access and no page faults. What are the TLB hit and TLB miss access times? Assuming an 80% hit rate, what is the estimated effective access time?
A: TLB hit: 20 + 100 = 120 ns
TLB miss: 20 + 100 + 100 = 220 ns
Effective access time: 120 * 0.8 + 220 * (1 - 0.8) = 140 ns (a 40% slowdown relative to direct physical addressing)


Q: Assume we have a system with eight logical pages and four physical frames (PFs)
A:
