
Aubade_Blog

Persisting will immortalize freedom.

COMP310_MemoryManagement_VM

COMP310 2023/3/5

Memory Management: Virtual Memory

The core concept in memory management is address-space mapping.
The difference between virtual and physical memory: the virtual address space must be contiguous, but the physical memory backing it can be fragmented.

Dreamy Memory

The OS tries to create the illusion of memory with the following desired properties: private, infinitely large, infinitely fast, nonvolatile, and cheap.

Memory Hierarchy

However, in the real world, memory follows a hierarchy. Access gets faster, and price per byte higher, toward the left:

| | Registers | Cache | Main Memory | Flash Disk | Traditional Disk |
|---|---|---|---|---|---|
| Latency | 1 cycle | ~10 cycles | ~100 cycles | ~1M cycles | ~10M cycles |
| Location | on CPU | on CPU | primary storage | secondary storage | secondary storage |
| Managed by the OS? | no | no | yes (OS memory management) | yes | yes |

A cycle here can be roughly interpreted as a nanosecond (a latency measure).

Goals of OS Memory Management

Every memory access a process makes uses a virtual address rather than a physical one, because the OS maps virtual addresses onto physical memory.

  1. Main memory allocation
    • location of the kernel in the address space
    • what memory to allocate to processes
    • how many processes to allow
  2. Protection (Segregation)
    • no corruption of the OS or other processes
    • Privacy: can’t read data from other processes
  3. Transparency
    • Illusion of private memory for each process

Protection

Protection prevents trespassing access to another's memory space.
The general idea is to add a check on every memory access.

The check is performed by a hardware device, the MMU (Memory Management Unit).

Transparency

The essence is to ensure that running processes do not conflict with one another: a process need not worry about where it actually runs in physical memory.

Main Memory Allocation

Where to allocate the kernel?

In low memory. The reason is that the interrupt vectors (the hardware-defined table of addresses used when transferring between user and kernel mode) reside in low memory.

Physical and Virtual Memory

In the early days, the OS ran only a single, trusted process at a time, so no virtual (logical) memory was needed; the process accessed physical memory directly.
However, if the process is malicious (trespassing into the kernel) or multiple processes run at the same time, that model no longer works. Thus, logical (virtual) memory is needed.

  • Virtual/logical address space = what the program thinks is its memory
  • Physical address space = what its memory actually is (where it physically resides)

CPU (process) --(generates virtual address)--> MMU --(translates to physical address)--> Memory

Mapping Scheme

Classification

  • Base and Bounds
  • Segmentation
  • Paging

For each scheme, we need to understand: physical address space, virtual address space, virtual address, MMU.

Base and Bounds
  • Virtual Address Space
    Linear address space: from 0 to MAX
  • Physical Address Space
    Linear address space: from BASE to BOUNDS = BASE + MAX
  • MMU
    The MMU checks a relocation register (which holds the base value) and a limit register (which holds the bounds value).
  • Free Space Distribution
    The OS allocates a new process into a hole (a free range of space).
    Free list: a list of the ranges of physical memory not in use.
    • Method
      • First fit: take the first hole bigger than requested
      • Best fit: take the smallest hole bigger than requested
      • Worst fit: take the largest hole -> tries to avoid leaving many unusably small holes
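
As a concrete sketch of the MMU's check-and-relocate logic (the register values here are illustrative assumptions, not from the lecture):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical MMU registers for the running process. */
uint32_t relocation_reg = 0x4000; /* BASE: where the process sits in physical memory */
uint32_t limit_reg      = 0x1000; /* MAX: size of its virtual address space */

/* Translate a virtual address; -1 models a trap to the OS. */
int64_t translate(uint32_t vaddr) {
    if (vaddr >= limit_reg)                  /* bounds check */
        return -1;                           /* protection fault */
    return (int64_t)relocation_reg + vaddr;  /* physical = BASE + virtual */
}

int main(void) {
    printf("0x0042 -> 0x%llx\n", (long long)translate(0x0042)); /* 0x4042 */
    printf("0x2000 -> %lld\n",   (long long)translate(0x2000)); /* -1: out of bounds */
    return 0;
}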

The problem with base and bounds is that a process may hold a huge amount of “free space” in memory (e.g., the unused gap between its heap and stack) that cannot be reallocated to anyone else (the internal fragmentation problem).
One solution is segmentation.

Segmentation

Instead of treating a program as one chunk of address space, we segment it into three parts: Code, Heap, and Stack, each of which can grow dynamically.

There are many ways to segment; the three-part split above is only one of them.

Segmentation can be treated as multiple base-and-bounds pairs.

  • Virtual Address Space
    A two-dimensional space with a set of segments; each segment i is linear from 0 to MAX(i)

  • Set of segments, each linear

    The order of the segments in physical space need not be the same as in the virtual address space.

  • Virtual Address (Structure)
    Check the lecture.
    Segment number + Offset
    -> number of segment bits + number of offset bits = number of virtual address bits

  • MMU

    • Segment table
      Indexed by segment number
      Each entry contains a (base, limit) pair
      • base: physical address of the segment in memory (where the segment starts)
      • limit: length of the segment
    • Pointer to the segment table in memory
    • Length of the segment table

      Notice the recursion here: the segment table used by the MMU is itself stored in memory.
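
A minimal sketch of the segment-table lookup, assuming (for illustration) a 14-bit virtual address split into 2 segment bits and 12 offset bits, and ignoring the stack's downward growth:

#include <stdint.h>
#include <stdio.h>

#define SEG_SHIFT   12   /* the offset occupies the low 12 bits */
#define OFFSET_MASK 0xFFF

struct seg_entry { uint32_t base, limit; };  /* (base, limit) pair */

/* Segment table indexed by segment number: 0 = code, 1 = heap, 2 = stack. */
struct seg_entry seg_table[] = {
    { 0x8000, 0x600 }, { 0xA000, 0x400 }, { 0xC000, 0x300 },
};

int64_t translate(uint32_t vaddr) {
    uint32_t seg    = vaddr >> SEG_SHIFT;    /* top bits select the segment */
    uint32_t offset = vaddr & OFFSET_MASK;   /* low bits are the offset within it */
    if (seg >= 3 || offset >= seg_table[seg].limit)
        return -1;                           /* segmentation fault: trap to the OS */
    return (int64_t)seg_table[seg].base + offset;
}

int main(void) {
    printf("0x1050 -> 0x%llx\n", (long long)translate(0x1050)); /* heap: 0xA050 */
    printf("0x1FFF -> %lld\n",   (long long)translate(0x1FFF)); /* -1: past heap limit */
    return 0;
}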

Advantage and Disadvantage
  • Disadvantage
    Segmentation requires the segment table (a data structure) to be maintained by the MMU, which adds complexity.
  • Advantage
    The segment table enables memory sharing between processes.
    Sharing is not possible with base & bounds but is possible with segmentation.
Memory Sharing

Memory sharing in segmentation can be achieved easily by having entries in different processes’ segment tables point to the same base address.

Protection Bits
To guarantee the protection property of OS memory allocation while memory is being shared, protection bits (with extra hardware support) are needed.
Protection bits are a few extra bits per segment indicating read, write, and execute permissions.


Compaction

Compaction is a way to solve fragmentation problems (holes) in physical memory. It can be used with both base & bounds and segmentation.
However, compaction is inefficient:

  • Processes need to be stopped for the rearranging
  • Data needs to be copied somewhere else
  • MMU tables must be updated

Moreover, if a segment needs to grow after compaction, compaction has to be run again.


NOTICE:
All the mapping information is stored in the MMU.
Internal fragmentation appears only in base & bounds, since in segmentation the virtual spaces are fully occupied; external fragmentation happens in both.

Internal Fragmentation: free space inside an allocated virtual address space that nobody else can use
External Fragmentation: holes (free space) scattered across physical memory


COMP310_SynchroP

COMP310 2023/3/4

Synchronization Primitives

Concurrency

Purpose

The motivation for introducing concurrency:

  • CPU trend: same speed, multiple cores <- the current direction of Moore’s Law
  • Goal: write applications that fully utilize many cores

Options:
  • Multiprocess
    • apps built from many communicating processes
    • communicate through message passing -> no shared memory
    • Pros: one process crashing does not affect the other processes
    • Cons: high communication overhead & expensive context switching
  • Multithread
    • multiple threads in one process
    • communicate through memory sharing (shared address space)
    • Pros: high efficiency
    • Cons: one thread crashing brings down the whole process

Non-Determinism:
  • Concurrency leads to non-deterministic results -> they depend on the CPU scheduler
    • Different results with the same inputs
    • Race conditions

Thread

Def

The smallest unit of execution that the OS can schedule. The essential property of threads is data (memory) sharing.

Difference to Process
  • Processes provide separation -> suitable for coarse-grained interaction
  • Threads do not provide separation -> suitable for tighter integration
Shared Data
  • Advantage: many threads can read/write it
  • Disadvantage: data races
Data Racing
  • Unwanted concurrent access to shared data
  • a result of the interleaving of thread executions
  • a program must be correct for all interleavings

Data racing is one of the major causes of the non-determinism of concurrency.

Synchronization Primitives:

implemented via hardware support in kernel mode

Software:
  • Monitors
  • Semaphores
  • Locks (mutex)
  • Condition Variables

Hardware:
  • Loads
  • Stores
  • Test&Set
  • Disable Interrupts
Atomicity (Mutual Exclusion)

The problems above lead to the following basic approach to multithreading;
the core is atomicity (mutual exclusion over critical sections).

  1. Divide work among multiple threads
  2. Shared data -> put shared-data accesses in critical sections (atomicity)
Purpose:

Prevents simultaneous access to a shared resource (memory region, …), solving the critical-section problem.

We need library support (pthreads) to achieve mutual exclusion.

POSIX Thread Libraries

  • Thread API for C/C++
  • User-level library:
    • #include<pthread.h>
    • Compile and link with -pthread
  • Support for thread creation, termination, synchronization

pthread_create(); pthread_exit(); pthread_join()

Pthread Locks:
## Notice that the lock itself is a shared variable
pthread_mutex_lock(mutex)
If the lock is held by another thread, block; otherwise, acquire the lock and proceed.
This means the lock’s status keeps being checked while waiting, so system resources are in use.
pthread_mutex_unlock(mutex)
Releases the lock.
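
A minimal, compilable example of the lock interface in use (build with -pthread); without the lock, the two threads would race on counter:

#include <pthread.h>
#include <stdio.h>

int counter = 0;                                  /* shared data */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* the lock is shared too */

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);   /* block if another thread holds it */
        counter++;                   /* critical section */
        pthread_mutex_unlock(&lock); /* release the lock */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter); /* always 200000 with the lock */
    return 0;
}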

Deadlocks

Threads are stuck waiting for resources held by one another, and no amount of retrying will help.

Condition Variables

A condition variable is a synchronization primitive that is a bit more complex than a lock.

  • Used when thread A needs to wait for an event performed by thread B
  • A waits until a certain condition is true
    1. Test the condition
    2. If the condition is not true, call
      pthread_cond_wait() -> causes A to block until the condition is true
  • At some point B makes the condition true
    1. Then B calls pthread_cond_signal() -> unblocks A

A condition variable is itself shared data, so a lock (mutex) is introduced alongside it.

Condition Variable Interface
pthread_cond_init(pthread_cond_t *cv, pthread_condattr_t *cattr)

  • Initialize the condition variable; cattr can be NULL

pthread_cond_wait(pthread_cond_t *cv, pthread_mutex_t *mutex)

  • Block the thread until the condition is signaled, automatically releasing the mutex while waiting

pthread_cond_signal(pthread_cond_t *cv)

  • Unblock one of the threads blocked on the condition variable (which one is unspecified)

pthread_cond_broadcast(pthread_cond_t *cv)

  • Unblock all threads blocked on the condition variable
Advantage

A waits (sleeps) until a certain condition is true; the CPU is not kept busy (few resources are used).
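
A minimal, compilable sketch of the wait/signal pattern above (build with -pthread). Note the while loop around the wait: the condition is re-tested after waking:

#include <pthread.h>
#include <stdio.h>

int done = 0;                        /* the shared condition */
pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

void *thread_b(void *arg) {
    pthread_mutex_lock(&m);
    done = 1;                        /* B makes the condition true... */
    pthread_cond_signal(&cv);        /* ...then unblocks A */
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void) {                     /* main plays the role of thread A */
    pthread_t b;
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_mutex_lock(&m);
    while (!done)                    /* 1. test the condition (in a loop) */
        pthread_cond_wait(&cv, &m);  /* 2. sleep; mutex released while waiting */
    pthread_mutex_unlock(&m);
    pthread_join(b, NULL);
    printf("A proceeds: the condition is true\n");
    return 0;
}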

Two Problems with the Naive Condition Variable, and the Solution

Check the lecture.

Semaphores

#include<semaphore.h>
A semaphore is a shared, non-negative counter with two primary operations: wait (attempts to decrement the counter; blocks while the counter is 0) and post/signal (increments the counter).

Notice that a semaphore’s increment and decrement are atomic.

The semaphore’s value can be initialized upon creation to a positive value or to 0.

Uses
  • Mutual exclusion:
    A semaphore with its counter initialized to 1 is equivalent to a lock.
  • Bound the concurrency:
    Initialize to a larger value K to allow at most K of the N threads to proceed.
  • Producer-Consumer problem
Interface

int sem_init(sem_t *sem, int pshared, unsigned value)
int sem_post(sem_t *sem)
int sem_wait(sem_t *sem)
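
A minimal sketch of the mutual-exclusion use: initialized to 1, the semaphore behaves like the lock from before (build with -pthread on Linux):

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

int counter = 0;  /* shared data */
sem_t mutex;      /* a semaphore initialized to 1 acts as a lock */

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        sem_wait(&mutex);   /* decrement; blocks while the counter is 0 */
        counter++;          /* critical section */
        sem_post(&mutex);   /* increment; wakes a blocked waiter, if any */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&mutex, 0, 1); /* pshared = 0: shared between threads, not processes */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);
    return 0;
}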


Hardware Implementation of Synchronization

Problem Specifications
  • Safety (no trespassing into protected space)
  • Liveness (the goal is eventually achieved)
Lock Implementation

Check lecture for detailed notes.

Store & Load

Treat a store as a signaling method indicating a thread’s running status; use loads to decide whether to proceed based on that status.
Disadvantage: busy waiting
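
To illustrate, here is a minimal spinlock sketch built on the Test&Set primitive from the table above, written with C11 atomics; the spin loop is exactly the busy waiting mentioned:

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(void) {
    /* Atomically set the flag and get its old value;
       keep looping (busy-waiting) while another thread holds it. */
    while (atomic_flag_test_and_set(&lock))
        ;  /* spin */
}

void spin_unlock(void) {
    atomic_flag_clear(&lock);  /* store 0: the lock is free again */
}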

Disable Interrupts

A thread uses a LOCK syscall to disable interrupts in the kernel, and re-enables them after finishing.
Advantage: compared to store & load, the thread can be put to sleep instead of busy waiting.


COMP310_multithreading

COMP310 2023/3/3

Multithreading

Purpose

Still using a web server as an example:

  1. To ensure a certain level of security (for example, to guarantee that code running on the server cannot disrupt anything else), we may use multiprocessing, since a process by itself provides segregation.
  2. If security is guaranteed, multithreading can be considered for its efficiency (via shared data).

In a server, static content tends to be handled by threads, dynamic content by processes.

Basic Approach to Multithreading

  1. Divide “work” among multiple threads (based on the nature of the work itself)
  2. Shared data
    • Only global variables and the heap can be shared.
    • Access shared data inside critical sections

Locking

For example, we may want a multithreaded program to compute the sum and the product of the element-wise products of two arrays:

int sum = 0, prod = 1   # shared globals: only globals and the heap can be shared

main(){
    int i
    for(i=0;i<MAX_THREADS;i++){Pthread_create(...)}
    for(i=0;i<MAX_THREADS;i++){Pthread_join(...)}
    printf(sum)
    printf(prod)
}

Threadcode(){
    int i,c
    for(i=my_min;i<my_max;i++){
        c = a[i] * b[i]
        sum += c      # data race on shared sum
        prod *= c     # data race on shared prod
    }
}

htop can be used to check the status of the resources(CPU,memory…)

We need to use locks in the thread code to prevent data races.
The naive way is to use one big lock:

Threadcode(){
    int i,c
    for(i=my_min;i<my_max;i++){
        c = a[i] * b[i]
        pthread_mutex_lock(biglock)
        sum += c
        prod *= c
        pthread_mutex_unlock(biglock)
    }
}

However, this reduces performance, since a single lock inhibits parallelism. Two approaches were invented to deal with it:

  1. Fine-grain locking:
    • multiple locks on individual pieces of shared data
  2. Privatization:
    • turn shared-data accesses into private-data accesses

Fine-Grain Locking

Threadcode(){
    int i,c
    for(i=my_min;i<my_max;i++){
        c = a[i] * b[i]
        pthread_mutex_lock(sumlock)
        sum += c
        pthread_mutex_unlock(sumlock)
        pthread_mutex_lock(prodlock)
        prod *= c
        pthread_mutex_unlock(prodlock)
    }
}
## 2 lock accesses per thread, per iteration

Privatization

Idea:

For each thread, define local variables representing that thread’s share of the global results; run the loop on the locals. Finally, use the locks only when merging the locals into the globals.

Threadcode(){
    int i,c
    local_s = 0
    local_p = 1
    for(i=my_min;i<my_max;i++){
        c = a[i] * b[i]
        local_s += c
        local_p *= c
    }
    pthread_mutex_lock(sumlock)
    sum += local_s
    pthread_mutex_unlock(sumlock)
    pthread_mutex_lock(prodlock)
    prod *= local_p
    pthread_mutex_unlock(prodlock)
}
## 2 lock accesses per thread, in total
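
For reference, here is a self-contained version of the privatization scheme that actually compiles and runs (gcc -pthread). The array size, its contents, and the my_min/my_max index computation are illustrative assumptions:

#include <pthread.h>
#include <stdio.h>

#define N 1000
#define MAX_THREADS 4

int a[N], b[N];
long sum = 0, prod = 1;  /* shared globals, protected by the locks below */
pthread_mutex_t sumlock  = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t prodlock = PTHREAD_MUTEX_INITIALIZER;

void *threadcode(void *arg) {
    int id = (int)(long)arg;              /* thread index 0..MAX_THREADS-1 */
    int my_min = id * (N / MAX_THREADS);
    int my_max = my_min + (N / MAX_THREADS);
    long local_s = 0, local_p = 1;        /* private partial results */
    for (int i = my_min; i < my_max; i++) {
        long c = (long)a[i] * b[i];
        local_s += c;
        local_p *= c;
    }
    pthread_mutex_lock(&sumlock);         /* 2 lock accesses per thread, total */
    sum += local_s;
    pthread_mutex_unlock(&sumlock);
    pthread_mutex_lock(&prodlock);
    prod *= local_p;
    pthread_mutex_unlock(&prodlock);
    return NULL;
}

int main(void) {
    pthread_t t[MAX_THREADS];
    for (int i = 0; i < N; i++) { a[i] = 1; b[i] = 1; }  /* dummy data */
    for (long i = 0; i < MAX_THREADS; i++)
        pthread_create(&t[i], NULL, threadcode, (void *)i);
    for (int i = 0; i < MAX_THREADS; i++)
        pthread_join(t[i], NULL);
    printf("sum = %ld, prod = %ld\n", sum, prod);  /* 1000 and 1 here */
    return 0;
}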

Producer/Consumer Problem

Arises when two or more threads communicate with each other.

Def:

One shared bounded buffer with N entries. Requirements:

  1. No production when all N entries are full (harder to handle, since information may be lost)
  2. No consumption when no entry is full (easier to handle)
  3. Access to the buffer is mutually exclusive (at any time, only one process may access the shared buffer and modify it)

Locks alone cannot solve this.

Solution: Semaphores

Check the lecture.

Single producer and consumer

Requires 2 semaphores:

  • emptyBuffer: initialized to N -> N empty buffers; the producer can run N times first
  • fullBuffer: initialized to 0 -> 0 full buffers; the consumer can run 0 times first
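
A minimal sketch of this single-producer/single-consumer scheme (the circular-buffer indices are illustrative assumptions). With multiple producers or consumers, a mutex around the buffer accesses is also needed, per requirement 3:

#include <semaphore.h>

#define N 8
int buffer[N];
int in = 0, out = 0;        /* circular-buffer indices */
sem_t emptyBuffer;          /* initialized to N: producer may run N times first */
sem_t fullBuffer;           /* initialized to 0: consumer may run 0 times first */

void init(void) {
    sem_init(&emptyBuffer, 0, N);
    sem_init(&fullBuffer, 0, 0);
}

void produce(int item) {    /* called by the single producer thread */
    sem_wait(&emptyBuffer); /* block while all N entries are full */
    buffer[in] = item;
    in = (in + 1) % N;
    sem_post(&fullBuffer);  /* one more entry available to consume */
}

int consume(void) {         /* called by the single consumer thread */
    sem_wait(&fullBuffer);  /* block while no entry is full */
    int item = buffer[out];
    out = (out + 1) % N;
    sem_post(&emptyBuffer); /* one more empty slot for the producer */
    return item;
}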
Multiple producers and consumers


COMP310_Multiprocess Communication

COMP310 2023/3/3

Multiprocess Communication

Some material is not included.

Purpose:

Most of the time, the programs (shell, compiler, …) we have seen so far represent a single process, with the following structure:
Code, Globals, Heap, Stack, Registers, PC (Program Counter)
However, in more complicated cases, one program may be composed of multiple processes (e.g., a web server). In a web server, the simplest model comprises two processes: one for fetching requests, the other for local processing.

Multiprocessing is a way to implement concurrency. The major purpose is still to boost efficiency in using the CPU.

Example of Web Server

  1. Simplistic Model (Single Process)
    with a single process, the scheduler is not exploited

    WebServerProcess{
     forever{
         wait for incoming request #func 1
         read file from disk #
         send file in response # func 2
     }
    }
    
  2. After Separation

    Listener{
     forever{
         wait for incoming request
         CreateProcess( worker, request ) # activate Worker
     }
    }
    
    WorkerProcess(request){
     read file from disk
     send response
     exit
    }
    

    The idea is to create one worker (process) per incoming request, so that the scheduler can be exploited.

    Amount of work on the server per request:

    • Receive network packet (request)
    • Run the listener process
    • Create a worker process -> which can be expensive
    • Read file from disk
    • Send response

    To avoid the creation cost, we can preprocess (create some (max) workers at initialization).

    Listener{
        for(i=0; i<MAX_PROCESSES; i++) process[i] = CreateProcess(worker)
        forever{
            wait for incoming request
            send(request, process[?]) # use Worker
        }
    }

    WorkerProcess[?]{
        forever{
            wait for message(&request)
            read file from disk
            send response
        }
    }
    

IPC (Inter-Process Communication)

Purpose:

To communicate information between processes.

Two ways to implement IPC:

  • message passing
    • by-value communication
    • never by reference -> the sender’s and receiver’s copies of the message are separate
  • RPC (remote procedure calls)

In the web server, IPC occurs in two places: internally (communication between worker and listener) and externally (communication between client and server).
Notice:
In the web server, messages are not passed by reference; a new copy is created instead (indicating that memory is not shared).
In the OS, message passing is implemented through the kernel.

Message Passing

Ways to implement message passing:

  • Symmetric Addressing
  • Asymmetric Addressing
  • Blocking Receive
  • Nonblocking Receive

Example of asymmetric addressing:

  Listener{
      for(i=0; i<MAX_PROCESSES; i++) process[i] = CreateProcess(worker)
      forever{
          client_pid = receive(msg) # receive from any client (always asymmetric)
          msg' = modified msg including client_pid
          send(msg', worker_process[?])
      }
  }

  WorkerProcess[?]{
      forever{
          receive(msg)
          read file from disk
          send(response, client_pid)
      }
  }

For more examples, please refer to the lecture.

RPC (Remote Procedure Call)

RPC is based on message passing.

Purpose

RPC causes a procedure (function) to execute in a different address space (commonly on another computer on a shared network).

RPC Interface

The RPC interface exposes a list of remotely callable procedures, with their signatures (arguments and return values).
Analogy:
RPC Server <-> exports a file system
RPC Client <-> imports a file system

Problem:
Message passing ≠ procedure call

So how do we bridge the gap?
Solution: the stub library

Stub Library

The client and server stubs are generated automatically, and they are part of the application.

Two message types:
  • Call message
    • from client to server
    • contains the arguments
  • Return message
    • from server to client
    • contains return values
Mechanism:

The client process communicates with the server process through the RPC interface (which implements the stub library; the stubs are generated automatically by the compiler).

  1. The client procedure calls the client stub
  2. The client stub traps into the kernel and passes the call message
  3. The server stub receives the message from the kernel
  4. The server unpacks the procedure call and executes it
Message Format:

A message needs to include not only the basic arguments (pid, arg0, …) but also which procedure is being called.
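
As a rough illustration, a call message might be laid out like the struct below; this is purely hypothetical, since real RPC systems marshal arguments into a standard wire format (e.g., XDR):

/* Hypothetical call-message layout (illustrative only). */
struct call_message {
    int client_pid;    /* whom to send the return message to */
    int procedure_id;  /* which remote procedure is being called */
    int arg0;          /* marshalled arguments follow */
    int arg1;
};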

Notice
The mechanism of RPC shows that it carries significant overhead. SO THINK CAREFULLY BEFORE IMPLEMENTING IT.

How does this work?

Automatic stub generation:

  • rpcbind
    a universal-addresses-to-RPC-program-numbers mapper
    sudo apt-get install rpcbind

  • rpcgen
    a tool that generates remote program interface modules

IPC/LPC/RPC

Reference: https://blog.csdn.net/Cmm_CSDN/article/details/86138330
https://cloud.tencent.com/developer/article/1690556

Process communication:

Each process has its own user address space, and the global variables of one process are invisible to any other. For processes to exchange data they must therefore go through the kernel: the kernel sets aside a buffer, process A copies data from its user space into the kernel buffer, and process B then reads the data out of the kernel buffer. This mechanism provided by the kernel is called inter-process communication.

Inter-Process Communication (IPC):
IPC refers to the mechanisms by which different processes in an operating system communicate and synchronize.
Specifically, it denotes the techniques and methods for transferring data or signals between at least two processes. A process is the smallest unit of resource allocation in the OS. Each process has its own independent portion of system resources, isolated from the others; IPC exists so that different processes can access each other’s resources and coordinate their work. These processes may run on the same computer or on different computers connected by a network. IPC techniques include message passing, synchronization, shared memory, and remote procedure calls. IPC is a standard Unix communication mechanism.

There are three main types of inter-process communication (IPC):

  1. Message passing
    Message passing is an IPC mechanism that allows processes to communicate and synchronize without sharing the same address space. It works by sending and receiving messages.
  2. Remote Procedure Call (RPC):
    An IPC method that hides the actual communication details of a procedure call. The client calls a local method, and this local method is responsible for transparently carrying out the inter-process communication with the remote server. The local method packs the relevant arguments, in order, into a message and sends that message to a method provided by the server; the server’s method unpacks the serialized arguments from the message, executes, and finally sends the method’s return value back to the client in the same way.
  3. Shared memory

RPC vs. message passing:
Essentially, RPC is derived from message passing. The core difference between the two is that RPC is aimed specifically at remote procedure or function calls; once RPC’s function/procedure-call role is removed, there is no essential difference between RPC and message passing.

LPC:
LPC (Local Procedure Call) is a message-passing mechanism in the Windows operating system, used for communication between two processes on the same computer. It is similar to the widely used standard Remote Procedure Call (RPC) mechanism, but optimized for Windows.

Differences between LPC and message passing:

  1. Scope: LPC is used only in the Windows operating system, whereas message passing is a general-purpose IPC mechanism.
  2. Optimization: LPC is optimized for Windows to improve the efficiency of communication between two processes on the same computer.
  3. Usage: LPC is not exposed directly through the Windows API; it is an internal mechanism reserved for Windows operating system components. Message passing, by contrast, can be used directly through APIs provided by many programming languages and platforms.
