[OSC] Day3: Concurrency

Concurrency 1

Context

Threads and processes execute concurrently or in parallel and can share resources (e.g., devices, variables, memory, data structures, etc.)
A process/thread can be interrupted at any point in time (I/O, timer)
The outcome of programs may become unpredictable
  • Sharing data can lead to inconsistencies (e.g. when interrupted whilst manipulating data)
  • I.e., the outcome of execution may depend on the order in which instructions are carried out

Example

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int counter = 0;  /* shared between both threads */

/* Increment the shared counter the requested number of times */
void *calc(void *number_of_increments) {
    int i;
    for (i = 0; i < *((int *) number_of_increments); i++)
        counter++;
    return NULL;
}

int main() {
    int iterations = 50000000;
    pthread_t tid1, tid2;
    /* pthread_create() returns 0 on success and an error number on failure */
    if (pthread_create(&tid1, NULL, calc, (void *) &iterations) != 0) {
        printf("unable to create thread");
        exit(0);
    }
    if (pthread_create(&tid2, NULL, calc, (void *) &iterations) != 0) {
        printf("unable to create thread");
        exit(0);
    }
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    printf("The value of counter is: %d\n", counter);
}
What would be the output?
If the threads were never interrupted, the output would be 100000000 (2 × 50000000); in practice the threads are preempted mid-increment, updates are lost, and the printed value varies between runs (and is usually smaller)

Therefore, concurrency becomes a problem
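
To see why, note that counter++ is not a single instruction: it compiles to roughly a load, an add, and a store. A sketch of one interleaving that loses an update (each thread uses its own registers):

// Thread 1: load counter into a register   (reads 0)
// Thread 2: load counter into a register   (reads 0)
// Thread 1: add 1 and store back           (counter = 1)
// Thread 2: add 1 and store back           (counter = 1 -- Thread 1's update is lost)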

Race condition

A race condition occurs when multiple threads/processes access shared data and the result is dependent on the order in which the instructions are interleaved
We will discuss mechanisms to provide controlled/synchronised access to data and avoid race conditions
Within the OS, processes share resources, including memory, files, processor time, printers, I/O devices, etc.
Therefore, the OS must:
  • Allocate and deallocate these resources safely (i.e. avoid interference, deadlocks and starvation)
  • Make sure that interactions within the OS do not result in race conditions
  • Provide locking mechanisms to implement/support mutual exclusion (and prevent starvation and deadlocks)

So what is mutual exclusion? What is a critical section?

A critical section is a set of instructions in which shared variables (between processes/threads) are changed
Mutual exclusion must be enforced for critical sections:
  • Only one process at a time should be in the critical section (mutual exclusion)
  • Processes have to get “permission” before entering their critical section
Any solution to the critical section problem must satisfy the following requirements:
  • Mutual exclusion: only one process can be in its critical section at any one point in time
  • Progress: any process must be able to enter its critical section at some point in time
  • Fairness/bounded waiting: processes cannot be made to wait indefinitely
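
The general structure of a process that uses a critical section looks as follows; the entry section is where the process asks “permission” and the exit section is where it signals that it has left (a sketch in the style of the examples below):

do {
    // entry section: request permission to enter
    ...
    // CRITICAL SECTION
    ...
    // exit section: announce that we have left
    ...
    // remainder section
} while (...);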
There are both software-based and hardware-based approaches to mutual exclusion:
Software based: Peterson’s solution
Hardware based: test_and_set(), compare_and_swap()

Concurrency 2

Peterson’s solution (software solution)

Peterson’s solution is a software-based solution that worked well on older machines
Two shared variables are used:
  • turn: indicates which process is next to enter its critical section
  • boolean flag[2]: indicates that a process is ready to enter its critical section
It is restricted to two processes that alternate execution between their critical sections and remainder sections
The code is as follows:
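
Since the code itself is not reproduced in these notes, here is a minimal sketch of the entry/exit protocol for process i (j = 1 − i is the other process); on modern hardware it would additionally need memory barriers, which are omitted here:

// Shared between the two processes
int turn;               // which process may enter next
int flag[2] = {0, 0};   // flag[i] == 1: process i wants to enter

void enter_region(int i) {
    int j = 1 - i;      // the other process
    flag[i] = 1;        // announce interest
    turn = j;           // give priority to the other process
    while (flag[j] && turn == j)
        ;               // busy wait while the other has priority
}

void leave_region(int i) {
    flag[i] = 0;        // no longer interested
}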


Mutual exclusion requirement: the variable turn can hold at most one value at a time:
  • Both flag[i] and flag[j] are true when both processes want to enter their critical sections
  • turn is a single variable that can store only one value
  • Hence, one of while (flag[i] && turn == i) and while (flag[j] && turn == j) must hold, so at most one process can enter its critical section (mutual exclusion)
Know this principle by heart and you will be able to explain any interleaving of the two processes step by step



Hardware Approaches

Disabling Interrupts

Disable interrupts whilst executing a critical section and prevent interruption (i.e., interrupts from timers, I/O devices, etc.)
Disabling interrupts “may” be appropriate on a single CPU machine
This is inefficient on modern multi-core/multi-processor machines
  • Disabling interrupts on all cores/CPUs takes time and causes delays
  • CPU capacity is lost on other cores

test_and_set()

// Test and set: atomically read the old lock value and set the lock to true
// (shown in C for illustration -- in reality this is a single atomic
// hardware instruction; bool requires #include <stdbool.h>)
bool test_and_set(bool *lock) {
    bool rv = *lock;
    *lock = true;
    return rv;
}
// Example of using test and set
do {
    // WHILE the lock is in use, apply busy waiting
    while (test_and_set(&lock)); // Lock was false, now true
    // CRITICAL SECTION
    ...
    lock = false;   // release the lock
    ...
    // remainder section
} while (...);
What does the code do?
test_and_set() reads the lock through a pointer, sets the lock to true, and returns the lock’s initial value
If the lock is initially false, test_and_set() changes it to true but returns false, so the caller skips the busy-wait loop and enters the critical section. If another thread then calls test_and_set(), the lock is already true, so the call returns true and that thread busy waits until the lock is released.
BUT test_and_set() must be atomic/un-interruptible

It can also lead to deadlocks!
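
C11 exposes a real atomic test-and-set via <stdatomic.h>; a minimal spinlock sketch built on it (assuming a C11 compiler) could look like this:

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void) {
    // atomically set the flag and return its previous value;
    // spin while it was already set by someone else
    while (atomic_flag_test_and_set(&lock))
        ;  // busy wait
}

void release(void) {
    atomic_flag_clear(&lock);  // lock back to false
}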

compare_and_swap()

// Compare and swap: if the lock holds the expected value, overwrite it
// with new_value; always return the original value (must be atomic)
int compare_and_swap(int *lock, int expected, int new_value) {
    int temp = *lock;
    if (*lock == expected)
        *lock = new_value;
    return temp;
}
// Example using compare and swap method
do {
    // While the lock is in use (i.e. == 1), apply busy waiting
    while (compare_and_swap(&lock, 0, 1) != 0); // Lock was false, now true
    // CRITICAL SECTION
    ...
    lock = 0;
    ...
    // remainder section
} while (...);
The principle is the same as with test_and_set()
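The C11 counterpart (again only a sketch, via <stdatomic.h>) is atomic_compare_exchange_strong():

#include <stdatomic.h>

atomic_int lock = 0;

void acquire(void) {
    int expected = 0;
    // atomically: if lock == expected, set lock = 1 and return true;
    // otherwise copy the current lock value into expected and return false
    while (!atomic_compare_exchange_strong(&lock, &expected, 1))
        expected = 0;  // a failed attempt overwrote expected; reset it
}

void release(void) {
    atomic_store(&lock, 0);
}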
Disadvantages:
  • test_and_set() and compare_and_swap() are hardware instructions and (usually) not directly accessible to the user
  • Busy waiting is used
  • Starvation is possible
  • Deadlock is possible
The OS uses the hardware instructions to implement higher level mechanisms/instructions for mutual exclusion, i.e. mutexes and semaphores

Concurrency 3

So what are mutexes and semaphores?

Mutex Locks

Mutexes are an approach for mutual exclusion provided by the operating system containing a boolean lock variable to indicate availability
The lock variable is set to true if the lock is available (process can enter critical section), false if not
Two atomic functions are used to manipulate the mutex:
  • acquire(): called before entering a critical section; the boolean is set to false
  • release(): called after exiting the critical section; the boolean is set to true again
    acquire() and release() must be atomic instructions
    No interrupts should occur between reading and setting the lock
    If an interrupt could occur between reading and setting the lock, two processes could both see the lock as available and enter their critical sections at the same time

Disadvantages: the key disadvantage of mutex locks is that calls to acquire() result in busy waiting (detrimental for performance on single-CPU systems)
Advantages:
  • No costly context switches (which take time) are required
  • Efficient on multi-core/multi-processor systems when locks are held for a short time only
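
As a concrete illustration, here is a minimal sketch (assuming POSIX threads) of how a mutex fixes the counter example from the start of these notes; note that a pthread mutex may block rather than spin, unlike the pure spinlock described above:

#include <pthread.h>

int counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void *calc(void *number_of_increments) {
    int n = *((int *) number_of_increments);
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&counter_lock);    // acquire()
        counter++;                            // critical section
        pthread_mutex_unlock(&counter_lock);  // release()
    }
    return NULL;
}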

Semaphores

Semaphores are an approach for mutual exclusion (and process synchronisation) provided by the operating system
Two atomic functions are used to manipulate semaphores (think of the counter++ example)
  • wait() is called when a resource is acquired; the counter is decremented
  • signal() is called when a resource is released; the counter is incremented

wait()

Calling wait() will block the process once the counter reaches zero
In this way, semaphores can avoid busy waiting
Every semaphore has a “blocked queue” associated with it:
  • The process joins the waiting queue
  • The process state is changed from running to blocked
  • Control is transferred to the process scheduler

signal()

Calling signal() removes a process from the blocked queue
  • The process state is changed from blocked to ready
  • Different queueing strategies can be employed to remove processes (e.g. FIFO, etc.)
If the semaphore’s value is negative, its magnitude is the number of processes waiting for the resource
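
Putting wait() and signal() together, a sketch in the style of textbook pseudocode (block(), wakeup(), enqueue(), dequeue(), and current_process() are placeholders for scheduler internals, not real POSIX functions):

typedef struct {
    int value;             // if negative, |value| = number of waiters
    struct process *list;  // queue of blocked processes
} semaphore;

void wait(semaphore *S) {
    S->value--;
    if (S->value < 0) {
        enqueue(&S->list, current_process());  // join the blocked queue
        block();                               // running -> blocked; invoke the scheduler
    }
}

void signal(semaphore *S) {
    S->value++;
    if (S->value <= 0) {
        struct process *P = dequeue(&S->list); // e.g. FIFO order
        wakeup(P);                             // blocked -> ready
    }
}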

Caveats

Starvation: poorly designed queueing approaches (e.g. LIFO) may result in fairness violations
Deadlocks: two or more processes are waiting indefinitely for an event that can be caused only by one of the waiting processes
  • I.e., every process in a set is waiting for an event that can only be caused by another process in the same set
Because we have already used semaphores in our coursework, we won’t discuss them at length here

Exercises

Q: Multiple jobs can run in parallel and finish faster than if they had run sequentially. Suppose that two jobs, each needing 10 minutes of CPU time, start simultaneously. How long will the last one take to complete if they run sequentially? How long if they run in parallel? Assume 50% I/O wait.
A: 50% I/O wait means that a process is not executing (i.e., the CPU sits idle as far as that process is concerned) for 50% of the total time it needs to complete. The time needed to complete a process that requires 10 minutes of CPU time is: CPU time required by the process / CPU utilization.
If each job has 50% I/O wait, it takes 20 minutes to complete in the absence of competition. If run sequentially, the second one finishes 40 minutes after the first one starts.
With two jobs, the approximate CPU utilization is 1 − 0.5² = 0.75. The CPU utilization available to each process is 0.75/2 = 0.375, so the time required is 10/0.375 ≈ 26.67 minutes.
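The arithmetic generalises: with n jobs each spending a fraction p of their time waiting on I/O, CPU utilization is roughly 1 − pⁿ. A throwaway C snippet (variable names are mine, not from the exercise) to check the numbers:

#include <stdio.h>
#include <math.h>

int main(void) {
    double p = 0.5;           // I/O wait fraction per job
    double cpu_need = 10.0;   // minutes of CPU time per job
    int n = 2;                // jobs running in parallel

    double util = 1.0 - pow(p, n);                 // 1 - 0.5^2 = 0.75
    double per_job = util / n;                     // 0.375 per job
    printf("%.2f minutes\n", cpu_need / per_job);  // prints 26.67
    return 0;
}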
Q: Can a thread ever be preempted by a clock interrupt? If so, under what circumstances? If not, why not?
A: User-level threads cannot be preempted by the clock unless the whole process’ quantum has been used up. Kernel-level threads can be preempted individually. In the latter case, if a thread runs too long, the clock will interrupt the current process and thus the current thread. The kernel is free to pick a different thread from the same process to run next if it so desires.
Q: In a system using user-level threads, is there one stack per thread or one stack per process?
What about when kernel-level threads are used? Why or why not?
A: Each thread calls procedures on its own, so it must have its own stack for the local variables, return addresses and so on. This is equally true for user-level threads as for kernel-level threads.
Q: Does Peterson’s solution to the mutual exclusion problem work when process scheduling is preemptive? How about when it is non-preemptive?
A: It certainly works with preemptive scheduling. In fact, it was designed for that case. When scheduling is non-preemptive, it might fail. Consider the case in which turn is initially 0 but process 1 runs first. It will just loop forever and never release the CPU.
Q: Five jobs are waiting to be run. Their expected run times are 9, 6, 3, 5, and X. In what order should they be run to minimize average response time? (Your answer will depend on X.)
A: Shortest job first is the way to minimize average response time.
0 < X ≤ 3: X, 3, 5, 6, 9.
3 < X ≤ 5: 3, X, 5, 6, 9.
5 < X ≤ 6: 3, 5, X, 6, 9.
6 < X ≤ 9: 3, 5, 6, X, 9.
X > 9: 3, 5, 6, 9, X.
Q: Five batch jobs A through E, arrive at a computer center at almost the same time. They have estimated running times of A:10, B:6, C:2, D:4, and E:8 minutes. Their (externally determined) priorities are A:3, B:5, C:2, D:1, and E:4, respectively, with 5 being the highest priority. For each of the following scheduling algorithms, determine the mean process turnaround time (average response time). Ignore context switching overhead.
A:
  • Round robin (assume the system is multiprogrammed and each job gets its fair share of the CPU):
During the first 10 minutes, each job gets 1/5 of the CPU. At the end of 10 minutes, C finishes. During the next 8 minutes, each job gets 1/4 of the CPU, after which time D finishes. Then each of the three remaining jobs gets 1/3 of the CPU for 6 minutes, until B finishes, and so on. The finishing times for the five jobs are 10, 18, 24, 28, and 30, for an average of 22 minutes.
  • Highest priority first:
B is run first. After 6 minutes, it is finished. The other jobs finish at 14, 24, 26, and 30, for an average of 20 minutes.
  • First-come, first-served (FCFS):
If the jobs run in the order A through E, they finish at 10, 16, 18, 22, and 30, for an average of 19.2 minutes.
  • Shortest job first:
Finish times are 2, 6, 12, 20, and 30, for an average of 14 minutes.
