Tuesday, January 22, 2013

Linux processes - Essentials - Part1


Introduction to Processes


To understand what an OS process is, we first need to understand what a computer program is. A CPU executes a sequence of instructions held in RAM, with the help of its registers and of collaborating units such as the IO and graphics hardware. Such a stored set of instructions is known as a program. A program is generally an executable file stored on disk. If you have ever used ls, that is a program. A process is a running instance of a program. If you run 10 ls commands in parallel, the system has 10 processes of the same program. Let us try to understand processes in Linux further in this post.
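To make the program versus process distinction concrete, here is a minimal sketch that starts the same program several times and collects the PID of each running instance. The function name spawn_instances is made up for illustration, and the Python interpreter itself stands in for a program such as ls.

```python
import subprocess
import sys


def spawn_instances(count):
    """Start `count` processes of the same program and return their PIDs."""
    # sys.executable (the Python interpreter) stands in for a program like ls.
    procs = [subprocess.Popen([sys.executable, "-c", "pass"]) for _ in range(count)]
    pids = [p.pid for p in procs]
    for p in procs:
        p.wait()  # collect every child so that none is left behind
    return pids


if __name__ == "__main__":
    print("one program, three processes:", spawn_instances(3))
```

Every instance gets its own PID, even though all of them run the same executable file.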

Process related information - Linux


Every process is started, or spawned, by some other process in the system (the only exception is the very first process, which the kernel starts at boot). When a process is spawned, it is allocated a unique ID, known as the PID. The ID of its parent process is known as the PPID. Run the command "ps -f" in your terminal and have a look at the columns. You should be able to see PID and PPID along with other fields. I am listing the results from my terminal below.

karthikeyan@karthikeyan:~$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
1000      6990  3032  0 14:49 pts/4    00:00:00 bash
1000      8800  6990  0 16:08 pts/4    00:00:00 nc -l 1234
1000      8803  6990  0 16:08 pts/4    00:00:00 ps -f
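The same PID and PPID relationship can be observed from inside a program. The following is a small sketch, assuming a Unix system where os.fork() is available. The helper name child_view_of_parent is hypothetical. It forks a child and lets the child report the PPID it sees.

```python
import os


def child_view_of_parent():
    """Fork a child and return (child_pid, ppid_seen_by_child)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the PPID it sees, then exit without running parent code.
        os.close(read_end)
        os.write(write_end, str(os.getppid()).encode())
        os._exit(0)
    # Parent: read the child's report and reap the child.
    os.close(write_end)
    reported = int(os.read(read_end, 32).decode())
    os.close(read_end)
    os.waitpid(pid, 0)
    return pid, reported


if __name__ == "__main__":
    child_pid, ppid_in_child = child_view_of_parent()
    print("my PID:", os.getpid(), "my PPID:", os.getppid())
    print("child PID:", child_pid, "PPID seen by child:", ppid_in_child)
```

The PPID seen by the child is the PID of the process that forked it, just as nc and ps list the PID of bash in the PPID column.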

Look at the bash process, which is the shell. It is the parent of all the processes run from this shell: both nc and ps have PPID 6990, the PID of bash. If you are curious about parent processes, you might want to check the parent of bash as well (PPID 3032). For that, run "ps -ef". Option 'e' shows every process in the system. The relevant lines from my terminal:

UID        PID  PPID  C STIME TTY          TIME CMD
1000      3032     1  0 11:38 ?        00:00:30 /usr/bin/gnome-terminal -x /bin/sh -c '/home/karthikeyan/Desktop/Link to idea.sh'
1000      3038  3032  0 11:38 ?        00:00:00 gnome-pty-helper
1000      3039  3032  0 11:38 pts/1    00:00:00 /bin/sh -c '/home/karthikeyan/Desktop/Link to idea.sh'
1000      5802  3032  0 13:33 pts/3    00:00:00 bash
1000      6990  3032  0 14:49 pts/4    00:00:00 bash

PID 3032, the parent of bash, is the gnome-terminal process. The parent of that process is init (PID 1), as shown below. Init itself has PPID 0, meaning it was started by the Linux kernel.

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:21 ?        00:00:00 /sbin/init
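Following PPIDs by hand, as done above, can also be automated. Here is a rough sketch that assumes a Linux system with /proc mounted and reads the PPID from /proc/<pid>/stat. The function names read_ppid and ancestors are made up for illustration.

```python
import os


def read_ppid(pid):
    """Return the PPID of `pid`, taken from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces, so
    # split only the text after the last ')': state, ppid, ...
    fields = data[data.rfind(")") + 2:].split()
    return int(fields[1])


def ancestors(pid):
    """Return [pid, parent, grandparent, ...] until a PPID of 0 is reached."""
    chain = [pid]
    while True:
        ppid = read_ppid(chain[-1])
        if ppid == 0:
            return chain
        chain.append(ppid)


if __name__ == "__main__":
    print(" -> ".join(str(p) for p in ancestors(os.getpid())))
```

On a typical desktop system the chain ends at PID 1. Inside a container it may end earlier, because the real parent can live outside the container and is then reported as PPID 0.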

Since every process is spawned by a parent, once a child exits it has to be cleaned up (reaped) by that parent. Until that happens, the child stays in the Zombie state. In the Zombie state, the resources of the process have been deallocated and only its entry in the process table remains. The entry is kept so that the parent can read the child's exit status using the wait() system call. Once the parent calls wait() on the zombie, the entry in the process table is deleted and the child is gone for good.
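The life of a zombie can be watched with a few lines of code. This is only a sketch, assuming Linux with /proc mounted. The helper names read_state and zombie_demo are hypothetical. The child exits at once, the parent sees it in state Z, and then reaps it with waitpid().

```python
import os
import time


def read_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data[data.rfind(")") + 2:].split()[0]


def zombie_demo(exit_code=7):
    """Return (state_before_wait, exit_status, entry_gone_after_wait)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately
    # Parent: poll until the exited child shows up as a zombie ('Z').
    deadline = time.time() + 5
    state = read_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state = read_state(pid)
    # waitpid() collects the exit status and removes the process-table entry.
    _, status = os.waitpid(pid, 0)
    gone = not os.path.exists(f"/proc/{pid}")
    return state, os.WEXITSTATUS(status), gone


if __name__ == "__main__":
    print(zombie_demo())
```

The exit status survives only because the zombie entry is kept until the parent asks for it.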

Zombie is the state of a terminated process. There are other states as well, as given below:

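Running
This is the state of the process when it is executing. When it gets preempted by the scheduler, it moves to the ready state and waits in the run queue to be scheduled again. Linux does not distinguish between the two in its reporting: ps shows both running and ready processes as R.

Suspended
If the process is waiting on an event such as IO or network data, it is in this state. Linux calls this sleeping: ps shows S for an interruptible sleep and D for an uninterruptible one (usually disk IO).

Stopped
A process in this state has been stopped by a signal such as SIGSTOP or SIGTSTP (Ctrl+Z in the shell), or is being traced by a debugger. ps shows it as T (or t while it is being traced). Once it is resumed with SIGCONT, it goes back to the ready state.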
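The states above can be observed on a live process. The following is a sketch under the assumption of a Linux system with /proc mounted. The names read_state, wait_for_state and state_demo are made up for illustration. A child sleeps, gets stopped with SIGSTOP, and is resumed with SIGCONT.

```python
import os
import signal
import time


def read_state(pid):
    """Return the one-letter state of `pid` from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return data[data.rfind(")") + 2:].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until `pid` reaches state `wanted`; return the last state seen."""
    deadline = time.time() + timeout
    state = read_state(pid)
    while state != wanted and time.time() < deadline:
        time.sleep(0.01)
        state = read_state(pid)
    return state


def state_demo():
    """Return the child's state while sleeping, stopped, and resumed."""
    pid = os.fork()
    if pid == 0:
        time.sleep(60)  # child: do nothing but sleep
        os._exit(0)
    try:
        sleeping = wait_for_state(pid, "S")
        os.kill(pid, signal.SIGSTOP)
        stopped = wait_for_state(pid, "T")
        os.kill(pid, signal.SIGCONT)
        resumed = wait_for_state(pid, "S")
    finally:
        os.kill(pid, signal.SIGKILL)  # clean up the child in every case
        os.waitpid(pid, 0)
    return sleeping, stopped, resumed


if __name__ == "__main__":
    print(state_demo())
```

Right after the fork the child may briefly show up as R, which is why the sketch polls instead of reading the state once.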

Handy commands


In this section, we discuss some important commands related to Linux processes.

The most important command to remember is "ps aux". This gives almost all the important information about the processes in the system.

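ps reports a snapshot of the current processes. Here a selects the processes of all users, u selects the user-oriented output format, and x also includes processes that have no controlling terminal.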

The output of "ps aux" is generally lengthy, so it is usually piped through grep or more, for example "ps aux | grep bash".

Other important options with ps are:

-e Every process
-f Full format
-p PID List, useful for filtering on PIDs

Have a look at the ps man page ("man ps") for other options.
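On Linux, ps gets its data from the /proc file system, so a stripped-down listing can be produced without ps at all. The following is a rough sketch, not a replacement for ps. The function names list_processes and filter_by_name are hypothetical, and the filter plays the role of grep.

```python
import os


def list_processes():
    """Return sorted (pid, ppid, state, name) tuples read from /proc."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        name = data[data.find("(") + 1:data.rfind(")")]
        state, ppid = data[data.rfind(")") + 2:].split()[:2]
        rows.append((int(entry), int(ppid), state, name))
    return sorted(rows)


def filter_by_name(rows, text):
    """Keep the rows whose command name contains `text`, like grep would."""
    return [row for row in rows if text in row[3]]


if __name__ == "__main__":
    for pid, ppid, state, name in filter_by_name(list_processes(), "python"):
        print(pid, ppid, state, name)
```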

Internals of Process


The structure task_struct, defined in linux/sched.h, represents a process. For every process (strictly, for every thread, since Linux schedules threads as tasks), the kernel allocates one task_struct to store information about it. So analyzing this structure gives us very useful information. Since this is a huge topic, let us take only a simple overview in this post. I will cover the other important stuff in the next post :)

Let us understand some important fields.

volatile long state;
This holds the state of the process. The states were discussed earlier in this post.

int prio, static_prio, normal_prio;
unsigned int rt_priority;
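These hold the priority of the process. static_prio is derived from the nice value, normal_prio is the priority that follows from static_prio and the scheduling policy, and prio is the effective priority the scheduler actually uses, which the kernel can boost temporarily. rt_priority is the priority of a real-time process.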
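The kernel fields themselves are not visible from user space, but /proc/<pid>/stat exposes a priority value and the nice value derived from them (fields 18 and 19 in the proc man page). Here is a minimal sketch, assuming Linux with /proc mounted. The function name read_priority_and_nice is made up for illustration.

```python
import os


def read_priority_and_nice(pid="self"):
    """Return (priority, nice) of `pid` as reported in /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    fields = data[data.rfind(")") + 2:].split()
    # fields[0] is field 3 (state) of the man page, so field N is fields[N - 3].
    return int(fields[18 - 3]), int(fields[19 - 3])


if __name__ == "__main__":
    priority, nice = read_priority_and_nice()
    print("priority:", priority, "nice:", nice)
    print("nice via getpriority():", os.getpriority(os.PRIO_PROCESS, 0))
```

For a process under the default scheduling policy, the reported priority is normally the nice value plus 20, so a process with nice 0 shows priority 20.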

struct mm_struct *mm, *active_mm;
This is a very important field: it holds the address space of the process. As you know, every process has its own address space, so that one process can't accidentally write into another process's memory.
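The separation of address spaces is easy to demonstrate. In this sketch, which assumes a Unix system with os.fork() and uses the hypothetical name isolation_demo, a child changes a variable after the fork and the parent checks that its own copy is untouched.

```python
import os


def isolation_demo():
    """Return (value_in_parent, value_reported_by_child) after a fork."""
    value = ["parent"]
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's own address space only.
        os.close(read_end)
        value[0] = "child"
        os.write(write_end, value[0].encode())
        os._exit(0)
    # Parent: read what the child saw, then look at our own copy.
    os.close(write_end)
    seen_by_child = os.read(read_end, 32).decode()
    os.close(read_end)
    os.waitpid(pid, 0)
    return value[0], seen_by_child


if __name__ == "__main__":
    print(isolation_demo())
```

The child needs a pipe to tell the parent anything at all, because a plain variable write never crosses the process boundary.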

struct thread_struct thread;
This stores the CPU-specific state of the process, which has to be saved and restored when the kernel switches between processes.

struct fs_struct *fs;
struct files_struct *files;

'files' points to the table of open file descriptors of the process. Other file-system related information, such as the root and current working directory, is in the struct fs_struct pointed to by 'fs'.
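Both pieces of information are visible from user space through /proc: the open descriptors under /proc/self/fd and the working directory as /proc/self/cwd. The following is a small sketch, assuming Linux with /proc mounted. The function names open_fds, fd_target and current_dir are made up for illustration.

```python
import os
import tempfile


def open_fds():
    """Return the file descriptor numbers currently open in this process."""
    # Listing the directory briefly opens one extra descriptor for the scan.
    return sorted(int(name) for name in os.listdir("/proc/self/fd"))


def fd_target(fd):
    """Return what descriptor `fd` of this process refers to."""
    return os.readlink(f"/proc/self/fd/{fd}")


def current_dir():
    """Return the current working directory as the kernel records it."""
    return os.readlink("/proc/self/cwd")


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        print("open descriptors:", open_fds())
        print("fd", tmp.fileno(), "->", fd_target(tmp.fileno()))
    print("cwd:", current_dir())
```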

int exit_state;
int exit_code, exit_signal;

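These record how the process ended. exit_state marks a process that has exited (for example, as a zombie), exit_code holds the exit status that the parent collects with wait(), and exit_signal is the signal sent to the parent when the process exits, normally SIGCHLD.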
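From the parent's side, this exit information arrives as the status value returned by wait(). Here is a sketch, assuming a Unix system with os.fork(), that decodes the status for a child that exits normally and for one that is killed by a signal. The names describe and run_child are hypothetical.

```python
import os
import signal
import time


def describe(status):
    """Decode a wait status into ('exited', code) or ('signaled', signal)."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


def run_child(exit_code=None, kill_with=None):
    """Fork a child that exits with `exit_code` or is killed by `kill_with`."""
    pid = os.fork()
    if pid == 0:
        if exit_code is not None:
            os._exit(exit_code)  # child: normal exit
        time.sleep(60)           # child: wait to be killed
        os._exit(0)
    if kill_with is not None:
        os.kill(pid, kill_with)
    _, status = os.waitpid(pid, 0)
    return describe(status)


if __name__ == "__main__":
    print(run_child(exit_code=3))
    print(run_child(kill_with=signal.SIGKILL))
```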

struct pid_link pids[PIDTYPE_MAX];

These link the task into the kernel's PID hash table, which speeds up finding the task_struct for a given PID. The following lists help in walking through processes.

struct list_head children;

This is the head of a doubly linked list of the child processes spawned by this process.

struct list_head sibling;

This links the process into its parent's children list, so following it reaches the siblings of the process.
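The kernel lists cannot be walked from user space, but the same parent, child and sibling relationships can be rebuilt from the PPIDs in /proc. This is only a user-space approximation under the assumption of Linux with /proc mounted, and all function names in it are made up for illustration.

```python
import os
import signal
import time


def read_ppid(pid):
    """Return the PPID of `pid`, taken from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    return int(data[data.rfind(")") + 2:].split()[1])


def children_of(parent):
    """Return the PIDs whose PPID is `parent` (siblings of each other)."""
    kids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if read_ppid(entry) == parent:
                kids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(kids)


def spawn_sleepers(count):
    """Fork `count` children that only sleep; return their PIDs."""
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            time.sleep(60)
            os._exit(0)
        pids.append(pid)
    return pids


def stop_all(pids):
    """Kill and reap the given children."""
    for pid in pids:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)


if __name__ == "__main__":
    kids = spawn_sleepers(3)
    try:
        print("children of", os.getpid(), ":", children_of(os.getpid()))
    finally:
        stop_all(kids)
```

Scanning every process is far slower than the kernel's linked lists, which is exactly why the kernel keeps those lists.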
