OS kernel basics

Michal Sojka (based on exercises by Benjamin Engel from TU Dresden)
Czech Technical University in Prague,
Faculty of Electrical Engineering
Email: [email protected]
April 15, 2015
M. Sojka
A3B33OSD, task E2
April 15, 2015
1 / 19
Roadmap
Last week
- System calls from the user space perspective
This week
- Switching from kernel to user space
- Kernel side of system calls
- Virtual memory basics
NOVA microhypervisor
- Research project of TU Dresden (< 2012) and Intel Labs (≥ 2012).
- http://hypervisor.org/, x86, GPL.
- We will use a stripped-down version of the microhypervisor (kernel).
Getting started
    tar xf osd-e2.tar.gz
    cd osd-e2
    make        # Compile everything
    make run    # Run it in the QEMU emulator
- Standard output shows the serial line of the emulated machine.
- user/: user space code
- kern/: stripped-down NOVA kernel
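Assignment overview
[Figure: timeline of the assignment. After CPU reset, the BIOS and the kernel boot code run in privileged (kernel) mode; iret switches to usercode in user mode; sysenter enters syscall_handler in kernel mode; sysexit returns to usercode in user mode.]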
Booting and starting user space
Booting in a nutshell
1. CPU reset; the BIOS executes.
2. The bootloader loads the kernel binary into memory.
3. We use a small trick: the kernel image also contains the user space code (user/usercode.c).
4. The kernel starts executing and initializes the CPU ...
5. ... and paging (virtual memory).
6. The kernel prepares the user space mapping: the user space code expects to be loaded at address 0x2000.
7. The user code is started (first switch from kernel to user mode): YOUR TASK
[Figure: physical memory holds the kernel image kern/build/hypervisor, which contains both the kernel code/data and the user code/data. In the 4G virtual address space, user space occupies 0 to 3G (0xC0000000) and kernel space 3G to 4G; the kernel code/data is mapped into kernel space, while the user code/data is mapped at 0x2000 (where EIP points), with the user stack below it.]
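The split of the 4G virtual address space described above can be expressed as a tiny classifier. This is a host-side sketch only: it assumes the 3G/1G split (user space below 0xC0000000, kernel space above) and 4 KiB pages, and the names `kUserSpaceEnd`, `is_user_address` and `page_of` are hypothetical, not taken from the NOVA sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant: end of user space in a 3G/1G split (not a NOVA identifier).
constexpr std::uint32_t kUserSpaceEnd = 0xC0000000u;  // 3 GiB

// True if a 32-bit virtual address lies in the user part of the address space.
constexpr bool is_user_address(std::uint32_t vaddr) {
    return vaddr < kUserSpaceEnd;
}

// Page number of a virtual address, assuming 4 KiB (2^12 byte) pages.
constexpr std::uint32_t page_of(std::uint32_t vaddr) {
    return vaddr >> 12;
}

static_assert(is_user_address(0x2000), "user code at 0x2000 is in user space");
static_assert(!is_user_address(0xC0000000u), "3G and above belongs to the kernel");
static_assert(page_of(0x2000) == 2, "0x2000 is the start of page 2");
```

The same check is what makes the `USER` bit in page-table entries meaningful: user code may only touch pages below the split, while the kernel mapping above 3G stays present in every address space.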
Prerequisites
What do you need to know?
- NOVA is implemented in C++ (and assembler).
- Each user “program” is represented by an execution context data structure.
- The first executed program is called the root task.
- How is the user program mapped into virtual memory, i.e., what are the virtual addresses of its code, data and stack?
- Intel Instruction Set Reference
Prerequisites
Execution context
- In NOVA, an execution context (Ec) represents a thread of execution (similar to tasks in other OSes).
- Data stored in the execution context:
    class Ec {
        void      (*cont)();   // Continuation address
        Exc_regs  regs;        // Registers
        static Ec *current;    // Currently running Ec
    };
- Ec::regs stores user space registers (i.e., syscall parameters).
- Ec::current is a (global) pointer to the currently executing Ec.
- The first Ec is created in bootstrap() in init.cc:
    // Create a new Ec with Ec::root_invoke as entry point
    Ec::current = new Ec (Ec::root_invoke, addr);
    // Start executing the new "task" (in kernel space)
    Ec::current->make_current();
    UNREACHED; // This is never executed.
- Ec::root_invoke is responsible for steps 6 and 7 from the “booting” slide.
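The combination of a static `current` pointer and a continuation function can be modelled on a host machine. The sketch below is a simplified, hypothetical model (`MiniEc`, `root_invoke_model`, `trace`), not the NOVA `Ec` class: the real `make_current()` switches to the kernel stack and never returns, whereas here it simply calls the continuation so that the control flow is observable.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for NOVA's Ec (execution context).
class MiniEc {
public:
    explicit MiniEc(void (*cont)()) : cont_(cont) {}

    // In NOVA, make_current() does not return; this model only records the
    // new current context and then invokes its continuation.
    void make_current() {
        current = this;
        cont_();
    }

    static MiniEc *current;  // Currently running context

private:
    void (*cont_)();         // Continuation address
};

MiniEc *MiniEc::current = nullptr;

std::string trace;  // Records which continuation ran

// Plays the role of Ec::root_invoke in this model.
void root_invoke_model() {
    assert(MiniEc::current != nullptr);  // current is set before the continuation runs
    trace += "root_invoke";
}
```

Usage: `MiniEc root(root_invoke_model); root.make_current();` leaves `MiniEc::current` pointing at `root` and `trace` equal to `"root_invoke"`.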
Prerequisites
Root task
- The first user space “task” invoked by the kernel.
- Similar to the UNIX init process.
- Our user space code expects the following memory layout (see user/linker.ld):
    0x0000  not present (one page)
    0x1000  stack
    0x2000  text (code), entry point
    0x3000  data
- We also need a stack; let's put it in the page before the code page.
- The first page is left “not present” to catch NULL pointer dereference errors.
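The layout above can be written down as constants, which also makes one detail explicit: x86 stacks grow toward lower addresses, so for a stack occupying the page [0x1000, 0x2000) the initial stack pointer is 0x2000, the top of that page. The constant and function names below are hypothetical; the authoritative layout is in user/linker.ld.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPageSize  = 0x1000;         // 4 KiB pages
constexpr std::uint32_t kNullPage  = 0 * kPageSize;  // left not present
constexpr std::uint32_t kStackPage = 1 * kPageSize;  // user stack
constexpr std::uint32_t kTextPage  = 2 * kPageSize;  // code, entry point
constexpr std::uint32_t kDataPage  = 3 * kPageSize;  // data

// The stack grows downward, so the first push writes just below the initial
// ESP; starting at the end of the stack page keeps pushes inside that page.
constexpr std::uint32_t initial_user_esp() {
    return kStackPage + kPageSize;
}

// Would an access to vaddr hit the deliberately unmapped NULL page?
// (kNullPage is 0, so only the upper bound needs checking.)
constexpr bool hits_null_page(std::uint32_t vaddr) {
    return vaddr < kNullPage + kPageSize;
}

static_assert(initial_user_esp() == kTextPage, "stack top coincides with the code start");
```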
Prerequisites
Mapping of root task memory
- Symbol _binary_usercode_bin_start is the address where the linker puts our usercode.bin.
- Page table manipulation will be the topic of next week.
[Figure: root task memory layout: not-present page at 0, stack at 0x1000, text (code, entry point) at 0x2000, data at 0x3000]
void Ec::root_invoke() {
    // Allocate one page for stack
    void *stack = Kalloc::allocator.alloc_page(1);
    // Map the stack page at address 0x1000
    Ptab::insert_mapping(1 * PAGE_SIZE,
                         Kalloc::virt2phys(stack),
                         Ptab::PRESENT | Ptab::RW | Ptab::USER);
    // Map our user space code at 0x2000 (read-only: no Ptab::RW)
    Ptab::insert_mapping(2 * PAGE_SIZE,
                         Kalloc::virt2phys(&_binary_usercode_bin_start),
                         Ptab::PRESENT | Ptab::USER);
    // Map our user space data at 0x3000
    Ptab::insert_mapping(3 * PAGE_SIZE,
                         Kalloc::virt2phys(&_binary_usercode_bin_start + PAGE_SIZE),
                         Ptab::PRESENT | Ptab::RW | Ptab::USER);
    // ... switch to user space with iret (your task)
}
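To see what the three insert_mapping() calls produce, the following host-side sketch records mappings in a `std::map`. `FakePtab`, `map_root_task` and the physical addresses used with them are made up for illustration; the flag values are assumed to follow the x86 page-table entry bits P (1), R/W (2) and U/S (4). NOVA's real Ptab writes actual page-table entries, which is next week's topic.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical model of a page table: virtual page address -> (physical address, flags).
struct FakePtab {
    // Flag bits chosen to match the x86 page-table entry bits P, R/W and U/S.
    static constexpr std::uint32_t PRESENT = 1u << 0;
    static constexpr std::uint32_t RW      = 1u << 1;
    static constexpr std::uint32_t USER    = 1u << 2;

    struct Entry { std::uint32_t phys; std::uint32_t flags; };
    std::map<std::uint32_t, Entry> entries;

    void insert_mapping(std::uint32_t virt, std::uint32_t phys, std::uint32_t flags) {
        assert(virt % 0x1000 == 0 && phys % 0x1000 == 0);  // page-aligned only
        entries[virt] = Entry{phys, flags};
    }

    // Can user code write to the byte at virt?
    bool user_writable(std::uint32_t virt) const {
        auto it = entries.find(virt & ~0xFFFu);
        if (it == entries.end()) return false;  // not present
        const std::uint32_t need = PRESENT | RW | USER;
        return (it->second.flags & need) == need;
    }
};

// Mirrors the three calls in Ec::root_invoke with made-up physical addresses.
FakePtab map_root_task(std::uint32_t stack_phys, std::uint32_t image_phys) {
    FakePtab pt;
    pt.insert_mapping(0x1000, stack_phys, FakePtab::PRESENT | FakePtab::RW | FakePtab::USER);
    pt.insert_mapping(0x2000, image_phys, FakePtab::PRESENT | FakePtab::USER);
    pt.insert_mapping(0x3000, image_phys + 0x1000,
                      FakePtab::PRESENT | FakePtab::RW | FakePtab::USER);
    return pt;
}
```

In this model, writes to the stack and data pages are allowed, the code page is read-only for user mode, and page 0 is absent, which is exactly what turns a NULL-pointer access into a page fault.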
Switch to user space
First switch to user space (your task)
- After mapping the memory to the right place, we can start executing the code.
- Use the iret instruction to exit kernel mode and continue in user mode.
- iret takes its operands from the stack!
- Prepare an array with 5 elements:
  - 0x2000: user instruction pointer (EIP) to return to
  - SEL_USER_CODE: new CS (include/selectors.h)
  - 0x200: EFLAGS; just set the interrupt enable flag (IF)
  - 0x2000: new stack pointer (ESP)
  - SEL_USER_DATA: new SS (stack segment)
- Point ESP to the array and execute the iret instruction.
Stack layout expected by iret (byte offsets from ESP):
    16  SS
    12  ESP
     8  EFLAGS
     4  CS
     0  EIP   <- ESP
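The five-element array can be built as a plain C++ array whose index order matches the offsets in the table (EIP at the lowest address, SS at the highest). This is a sketch: `make_iret_frame` is a hypothetical helper, and the selector values are passed in as parameters because their real values come from include/selectors.h. The final kernel step cannot run on a host, so it appears only as a comment.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using IretFrame = std::array<std::uint32_t, 5>;

// Builds the frame that iret pops when returning to user mode:
// index 0 = EIP, 1 = CS, 2 = EFLAGS, 3 = ESP, 4 = SS (4 bytes each).
IretFrame make_iret_frame(std::uint32_t user_cs, std::uint32_t user_ss) {
    constexpr std::uint32_t kEflagsIf = 0x200;  // bit 9: interrupt enable flag
    return IretFrame{
        0x2000,     // EIP: user entry point
        user_cs,    // CS: SEL_USER_CODE in NOVA
        kEflagsIf,  // EFLAGS
        0x2000,     // ESP: top of the user stack page
        user_ss,    // SS: SEL_USER_DATA in NOVA
    };
}

// In the kernel (32-bit x86 only), the frame would then be used roughly like:
//   asm volatile ("mov %0, %%esp; iret" : : "r" (frame.data()) : "memory");
```

Because the CS selector in the frame has a lower privilege level than the current one, iret also pops ESP and SS and thus switches to the user stack.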
Switch to user space
In the user space
- After a successful exit to user space you should see:
    Ec::handle_exc Page Fault (eip=0x2000 cr2=0x42)
- This says that the instruction at address 0x2000 tried to access address 0x42, but no page was mapped there (page 0 is intentionally left not present).
- This is expected. See objdump -d user/usercode
- The output will be:
    00002000 <_start>:
        2000:  c6 05 42 00 00 00 12    movb   $0x12,0x42
        2007:  0f 0b                   ud2
        2009:  eb fe                   jmp    2009 <_start+0x9>
- Now we can look at NOVA's system calls.
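The value cr2=0x42 can be traced back to the instruction bytes. Opcode C6 with ModRM byte 05 encodes "move an 8-bit immediate to memory" with a 32-bit absolute displacement, followed by the 4-byte little-endian address and the 1-byte immediate. The sketch below decodes only this single encoding; `decode_movb_abs` and `MovbAbs` are hypothetical helpers, not a general x86 decoder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

struct MovbAbs {
    std::uint32_t address;  // destination memory address (disp32)
    std::uint8_t  value;    // 8-bit immediate stored there
};

// Decodes only "C6 05 <disp32> <imm8>", i.e. movb $imm8, disp32 in 32-bit mode.
std::optional<MovbAbs> decode_movb_abs(const std::uint8_t *code, std::size_t len) {
    if (len < 7 || code[0] != 0xC6 || code[1] != 0x05)
        return std::nullopt;
    std::uint32_t disp = 0;
    for (int i = 0; i < 4; ++i)  // displacement is little-endian
        disp |= std::uint32_t(code[2 + i]) << (8 * i);
    return MovbAbs{disp, code[6]};
}
```

Applied to the bytes `c6 05 42 00 00 00 12`, it yields address 0x42 and value 0x12, matching both the objdump output and the faulting address in CR2.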
Kernel side of system calls
- CPU initialization
- Kernel entry code
- Syscall handler
- Kernel exit code
Kernel side of system calls
CPU initialization
- Set Model-Specific Registers (MSRs) to tell the CPU what to do when user space invokes the sysenter instruction (see init() in init.cc):
    Msr::write<mword>(Msr::IA32_SYSENTER_CS,
                      SEL_KERN_CODE);
    Msr::write<mword>(Msr::IA32_SYSENTER_ESP,
                      reinterpret_cast<mword>(&Tss::run.sp0));
    Msr::write<mword>(Msr::IA32_SYSENTER_EIP,
                      reinterpret_cast<mword>(&entry_sysenter));
- The CS (code segment) register will be set to the kernel code segment.
  - Note that the code segment descriptor determines the privilege level of the executing code.
- ESP (stack pointer) will point to the sp0 member of the Tss::run global variable (see tss.h).
- EIP (instruction pointer) will be set to entry_sysenter (see entry.S).
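What the CPU does with these three MSRs on sysenter can be summarized as a small state transition. The sketch follows the Intel SDM description of SYSENTER in 32-bit protected mode: CS comes from IA32_SYSENTER_CS (with the RPL bits cleared), SS is that selector plus 8, ESP and EIP are loaded from the other two MSRs (architectural MSR numbers 0x174 to 0x176), and EFLAGS.IF is cleared. The struct and function names are hypothetical, and the sketch omits details such as the flat segment descriptors the CPU loads.

```cpp
#include <cassert>
#include <cstdint>

// Architectural MSR numbers (Intel SDM); shown for reference only.
constexpr std::uint32_t IA32_SYSENTER_CS_MSR  = 0x174;
constexpr std::uint32_t IA32_SYSENTER_ESP_MSR = 0x175;
constexpr std::uint32_t IA32_SYSENTER_EIP_MSR = 0x176;

// Hypothetical snapshot of the values the kernel wrote into those MSRs.
struct SysenterMsrs { std::uint32_t cs, esp, eip; };

// Hypothetical subset of CPU state affected by sysenter.
struct CpuState {
    std::uint16_t cs, ss;
    std::uint32_t esp, eip;
    bool interrupts_enabled;
};

// Simplified model of sysenter in 32-bit protected mode.
CpuState do_sysenter(const SysenterMsrs &msr, CpuState s) {
    s.cs  = static_cast<std::uint16_t>(msr.cs & 0xFFFC);  // RPL bits forced to 0
    s.ss  = static_cast<std::uint16_t>(s.cs + 8);          // SS is the next GDT entry
    s.esp = msr.esp;               // in NOVA: address of Tss::run.sp0
    s.eip = msr.eip;               // in NOVA: address of entry_sysenter
    s.interrupts_enabled = false;  // sysenter clears EFLAGS.IF
    return s;
}
```

The cleared IF flag explains why the exit path has to re-enable interrupts before returning to user space.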
Kernel side of system calls
Syscall entry
    entry_sysenter:
        cld
        pop     %esp
        lea     -44(%esp), %esp
        pusha
        mov     $(KSTCK_ADDR + PAGE_SIZE), %esp
        jmp     syscall_handler
- cld: clear the direction flag, which the compiled C++ code assumes to be clear.
- pop %esp: set ESP to the value stored in Tss::run.sp0, i.e., to the address just behind Ec::current->regs (see Ec::make_current() in ec.h).
- lea -44(%esp), %esp: decrease ESP to skip the 11 registers (11 × 4 = 44 bytes) that are used only during exception handling (Exc_regs).
- pusha: store the 8 general-purpose registers (syscall arguments) to Ec::current->regs.
- mov $(KSTCK_ADDR + PAGE_SIZE), %esp: set ESP to the top of the kernel stack.
- jmp syscall_handler: jump to Ec::syscall_handler.
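The ESP arithmetic of the entry code can be checked with a struct that mirrors the stack layout pusha produces: it pushes EAX, ECX, EDX, EBX, the original ESP, EBP, ESI and EDI in that order, so EDI ends up at the lowest address. `PushaRegs` and `ModelExcRegs` are hypothetical models, not NOVA's Exc_regs; they only assume, as stated above, that the 8 general-purpose registers are followed by 11 further 4-byte registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout produced by pusha, from the lowest address upward.
struct PushaRegs {
    std::uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
};

// Hypothetical model of Ec::regs: 8 GPRs followed by 11 exception-only words.
struct ModelExcRegs {
    PushaRegs     gpr;
    std::uint32_t exc_only[11];
};

static_assert(sizeof(PushaRegs) == 8 * 4, "pusha stores 32 bytes");
static_assert(sizeof(ModelExcRegs) == 19 * 4, "8 + 11 registers");

// Replays the ESP adjustments of entry_sysenter, starting from the address
// just past the register block (the value popped from Tss::run.sp0).
std::uintptr_t esp_after_entry(std::uintptr_t regs_end) {
    std::uintptr_t esp = regs_end;
    esp -= 11 * 4;  // lea -44(%esp), %esp: skip exception-only registers
    esp -= 8 * 4;   // pusha: store 8 general-purpose registers
    return esp;
}
```

After both steps, ESP lands exactly on the start of the register block, which is the address ret_user_sysexit later loads before popa.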
Kernel side of system calls
Syscall implementation
- Ec::syscall_handler: a C++ function implementing the system calls.
- Where do we get the number argument?
    void Ec::syscall_handler (uint8 number)
    {
        switch (number) {
        case 0: ...
        case 1: ...
        }
        ret_user_sysexit();
        UNREACHED; // Tell the compiler not to generate
                   // the function epilogue
    }
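A dispatcher only needs to read the syscall number and arguments from the saved register block and write the result back into the saved EAX, which popa later restores. The following host-side sketch assumes the register convention from the assignment ABI (AL = number, EDI/ESI = arguments) and, unlike the real handler, reads the number from the saved EAX rather than from a function parameter. `SavedRegs`, `dispatch` and `kBadSyscall` are hypothetical; the write call is left out because it is part of the assignment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical copy of the saved user registers (the part pusha stores).
struct SavedRegs {
    std::uint32_t edi, esi, ebp, esp, ebx, edx, ecx, eax;
};

constexpr std::uint32_t kBadSyscall = 0xFFFFFFFFu;  // hypothetical error code

// Dispatches on AL and stores the result in the saved EAX, which the
// kernel's popa restores before sysexit.
void dispatch(SavedRegs &r) {
    const std::uint8_t number = static_cast<std::uint8_t>(r.eax & 0xFF);  // AL
    switch (number) {
    case 2:  // add: EDI = a, ESI = b
        // Unsigned addition gives the same bit pattern as two's-complement int addition.
        r.eax = r.edi + r.esi;
        break;
    default:
        r.eax = kBadSyscall;
        break;
    }
}
```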
Kernel side of system calls
Returning to user space
    void Ec::ret_user_sysexit()
    {
        asm volatile ("lea %0, %%esp;"
                      "popa;"
                      "sti;"
                      "sysexit"
                      : : "m" (current->regs) : "memory");
        UNREACHED;
    }
- lea %0, %%esp: set ESP to point to Ec::current->regs.
- popa: restore the 8 general-purpose registers from there.
- sti: enable interrupts. sti takes effect only after the next instruction, so no interrupt can arrive before sysexit.
- sysexit: return to user space.
Kernel side of system calls
sysenter/sysexit
- A faster alternative to int 0x80 and iret.
- Does not use the stack to store the return address.
- sysexit sets EIP←EDX, ESP←ECX and decreases the privilege level.
- Therefore, the user space syscall wrapper must differ from the “int 0x80” variant:
    unsigned syscall1 (unsigned w0) {
        asm volatile (
            "   mov %%esp,%%ecx;"
            "   mov $1f,%%edx;"   // set edx to the addr. of label 1:
            "   sysenter;"
            "1:"                  // continue here after sysexit
            : "+a" (w0) : : "ecx", "edx", "memory");
        return w0;
    }
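Because sysexit takes the user EIP from EDX and the user ESP from ECX, the wrapper must load those two registers before sysenter, and the kernel must preserve them (via pusha/popa) until sysexit. The host-side model below walks through that round trip; `Gprs`, `UserResume` and `syscall_round_trip` are hypothetical names, and the "kernel" is just a function that computes the new value of the saved EAX.

```cpp
#include <cassert>
#include <cstdint>

struct Gprs { std::uint32_t eax, ecx, edx, esp, eip; };  // hypothetical subset

struct UserResume { std::uint32_t eip, esp, eax; };

// Models: user wrapper -> sysenter -> kernel (pusha ... popa) -> sysexit.
UserResume syscall_round_trip(Gprs user, std::uint32_t label_1_addr,
                              std::uint32_t (*kernel)(std::uint32_t)) {
    user.ecx = user.esp;            // mov %esp,%ecx
    user.edx = label_1_addr;        // mov $1f,%edx
    Gprs saved = user;              // entry_sysenter: pusha into Ec::current->regs
    saved.eax = kernel(saved.eax);  // handler writes the result into the saved EAX
    Gprs restored = saved;          // ret_user_sysexit: popa
    // sysexit: EIP <- EDX, ESP <- ECX, back in user mode
    return UserResume{restored.edx, restored.ecx, restored.eax};
}
```

With an `int 0x80` wrapper, none of this is needed because the CPU pushes the return EIP and ESP on the kernel stack; with sysenter, losing ECX or EDX inside the kernel would return the user program to a wrong address or stack.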
Assignment
1. In Ec::root_invoke, use iret to exit the kernel and run the user space code (usercode.c).
2. In the kernel, implement the “write” system call with the following prototype:
     void write(char *buf, int len)
   It sends len bytes pointed to by buf to the serial line (via the kernel's printf() function).
3. Implement the “add” system call, which adds two integer arguments, returns the result and prints it in ASCII to the serial line. Its prototype is:
     int add(int a, int b)
4. Invoke these two system calls from usercode.c. Use an if statement to check that the result of add is correct.
ABI: write: AL=1, EDI=buf, ESI=len; add: AL=2, EDI=a, ESI=b
[Figure: the same timeline as in the assignment overview: CPU reset, BIOS and kernel boot in privileged (kernel) mode; iret to usercode in user mode; sysenter to syscall_handler in kernel mode; sysexit back to usercode in user mode.]