More on *nix
- The Unix operating system, on which Linux is based, was written in C (hence the close
ties between C and Unix)
- It is very much an OS written by programmers for programmers
- One of the goals was to produce an operating system which was
short and simple, with less emphasis on speed and sophistication
- Because the operating system itself is simple, a wide variety
of routines are provided in optional libraries to carry out more
complex tasks
- A command shell interprets the commands a user types in,
and runs each one as a process (and of course the command shell is yet
another process)
- There are a wide variety of shells available for use (Bourne shell, C shell,
Korn shell, etc.), each with slightly different capabilities
- Processes can spawn multiple other processes which can run in parallel
- Attempts have been made to treat I/O and disk files as similarly
as possible
Programming tools
There is a wide variety of tools available to aid the programmer;
we'll consider several of them here:
- Makefiles: for organising sequences of commands
- Debuggers: aid the programmer in bug hunting
g++ foo.cpp -g -o foo with gdb foo
- Program profilers: find out where your program spends most of its
execution time
g++ -pg foo.cpp, then run ./a.out, then gprof a.out gmon.out
- Assembly language: find out what the assembly language code for
your program would look like
g++ -S foo.cpp
- Finding things: with grep and find
Makefiles
Makefiles simplify life when you have a number of commands to run
in order, depending on whether certain files have been modified
- call your makefiles makefile or Makefile (surprise surprise)
- inside makefiles there are dependencies and commands
- if you type make foo then the system
looks in the makefile (in the current directory) for a line
beginning
foo:
- the rest of the line lists the files the foo target
depends on
- if any of those have been modified since foo was last built,
then make runs the command(s) listed below that line
foo: foo.o
g++ -o foo foo.o
foo.o: foo.cpp
g++ -c foo.cpp
The makefile above says (basically)
- If foo.cpp has changed since foo.o was built, execute the command
g++ -c foo.cpp
- If foo.o has changed since foo was built, execute the command
g++ -o foo foo.o
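A minimal sketch of make's timestamp logic in action (assumes GNU make is installed; the makedemo directory, foo.txt file, and cp rule are invented for illustration):

```shell
# Build a tiny makefile and watch make skip work that's already done
mkdir -p /tmp/makedemo
cd /tmp/makedemo
# Rule: foo depends on foo.txt; the command line must start with a TAB
printf 'foo: foo.txt\n\tcp foo.txt foo\n' > makefile
echo hello > foo.txt
make foo    # foo doesn't exist yet, so the cp command runs
make foo    # nothing has changed, so make reports foo is up to date
```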
GDB debugger
Debuggers allow programmers to analyse the run-time
behaviour of a program.
To try a debugger:
- Recompile the program for the debugger:
g++ foo.cpp -g -o foo
- Run the debugger on the program:
gdb foo
Options in the debugger include:
- help for online help
- break mysub sets a breakpoint at subroutine mysub
- step executes the program one source line at a time
- c resumes normal execution (runs until the end or the
next breakpoint)
- print myvar prints out the current value of variable
myvar
- help info and help show give more useful ways to
obtain info
about the program
- quit to quit the debugger
grep and find
Often we need to search our directories for a file
with a specific name, or a file which contains a specific word or series of
words.
To search the current directory (and all below it) to
find a file named foo, use the following command:
find . -name foo -print
To search the current directory (and all below it) to
find a file whose name begins with foo use:
find . -name 'foo*' -print
To search the current directory (and all below it) to
find a file whose name ends with .foo use:
find . -name '*.foo' -print
To search all files in the current directory for ones which
contain the phrase blahblahblah use the command:
grep blahblahblah *
Note: you can only run grep and find on files
and directories for which you have `read' permission
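The commands above can be tried on a throwaway directory (the searchdemo directory and file names are invented for the example):

```shell
# Set up a small tree, then search it by name and by content
mkdir -p /tmp/searchdemo/sub
echo "blahblahblah" > /tmp/searchdemo/notes.foo
echo "nothing here" > /tmp/searchdemo/plain.txt
echo "deeper down"  > /tmp/searchdemo/sub/other.foo
cd /tmp/searchdemo
find . -name '*.foo' -print           # finds ./notes.foo and ./sub/other.foo
grep blahblahblah notes.foo plain.txt # reports the match in notes.foo
```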
Command interpreters: shells
A shell is an interactive command interpreter,
with its own simple programming syntax (variables, loops, conditionals).
You can either type commands in directly at the keyboard,
or put the commands in a file (which we then call a shell script)
We are using the bash shell, and the first thing it does when
you log on is to run the commands in your .bashrc,
.bash_profile, and .bash_aliases files
Each individual command is run as a separate process
When you logout, the last thing done by the shell is to run the commands
in your .bash_logout file
You can learn more about the shell by entering man bash
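A minimal sketch of turning commands into a shell script (hello.sh is an invented name):

```shell
# Put a command in a file, mark it executable, and run it as its own process
cat > /tmp/hello.sh <<'EOF'
#!/bin/bash
echo "hello from a script"
EOF
chmod +x /tmp/hello.sh    # make the file executable
/tmp/hello.sh             # runs in a new shell process
```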
Other useful tools
There are many other useful programs available;
here is a subset of them:
- diff - for comparing two files and finding the differences
- sort - for sorting the contents of a file
- history - for easy repetition of commands
- system - for running commands from inside a C++ program
- gzip/gunzip - for compressing large files
- tar - for storage of directories
diff to compare files
- diff file1 file2 compares two files,
and prints out all the lines which are different
- The -r option can be used to also compare
common files in subdirectories (i.e. you have two similar sets of directories,
and want to see which files have changed between the two)
- The -w option causes diff to ignore differences in whitespace
(eg. "w x y z" is treated the same as "wxyz")
- The -i option causes diff to ignore differences in
upper/lower case (eg. "Foo" is treated the same as "foo")
- The -e option produces a list of commands for the
ed editor, which would change the first file into the second
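A sketch of diff and the -i option on two invented files:

```shell
# Two files that differ only in the case of one line
printf 'Foo\nbar\n' > /tmp/a.txt
printf 'foo\nbar\n' > /tmp/b.txt
diff /tmp/a.txt /tmp/b.txt || true   # reports the Foo/foo difference (diff exits non-zero when files differ)
diff -i /tmp/a.txt /tmp/b.txt        # -i: case is ignored, so no difference is reported
```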
sort to sort lines in a file
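A minimal sketch of sort in action (fruit.txt and its contents are invented):

```shell
# Sort lines alphabetically, in reverse, and back into the same file
printf 'pear\napple\nbanana\n' > /tmp/fruit.txt
sort /tmp/fruit.txt                   # prints apple, banana, pear
sort -r /tmp/fruit.txt                # -r reverses the order
sort -o /tmp/fruit.txt /tmp/fruit.txt # -o writes the sorted result back to the file
```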
history for repeating commands
Calling commands from C++ programs
Compressing large files: gzip, gunzip
- To save disk space, you can compress large files when
you aren't using them
- gzip filename compresses the file, and stores
it in a new file named filename.gz
- gzip -r directory compresses all the files in
the directory tree
- compress filename compresses the file (not as well as gzip)
and stores it in a new file named filename.Z
- gunzip filename.gz uncompresses the file again
(gunzip also works on .Z files)
- gzip is very effective on text files and postscript files,
but is less effective at compressing large binary files (eg. executables)
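A sketch of a gzip round trip (big.txt is an invented file; -f just forces overwriting if the example is re-run):

```shell
# Compress a text file, then get it back
echo "some compressible text" > /tmp/big.txt
gzip -f /tmp/big.txt          # replaces big.txt with big.txt.gz
ls /tmp/big.txt.gz            # the original file is gone, only the .gz remains
gunzip -f /tmp/big.txt.gz     # restores big.txt
```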
tar for directories
- if you have a large directory structure to move/copy
(eg. to floppy disk, ftp, email, etc) it is a tedious process to
handle each individual file
- tar -cvf filename.tar directory takes a directory named directory
and stores a copy of the directory structure and all its subdirectories
and files into a single file named filename.tar
- tar -xvf filename.tar reverses that process
(i.e. takes the tarred files, and recreates all the directories,
subdirectories etc)
- common practice is to apply tar to the directory,
then use gzip on the result (eg. foo.tar.gz)
- this is often done when you have a directory with many and/or large
files that you don't expect to use for a considerable time
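The tar-then-gzip pattern can be sketched end to end (the tardemo/project names are invented):

```shell
# Archive a directory, compress the archive, then restore it elsewhere
mkdir -p /tmp/tardemo/project/docs
echo "readme" > /tmp/tardemo/project/docs/README
cd /tmp/tardemo
tar -cvf project.tar project        # c = create, v = verbose, f = archive file name
gzip -f project.tar                 # yields project.tar.gz
mkdir -p restore
cd restore
gunzip -c ../project.tar.gz | tar -xvf -   # x = extract; - reads the archive from stdin
```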
Controlling processes and monitoring resources
- Some of the main functions of any operating system are
the control, allocation, and tracking of system resources
- The primary resources include:
- processes (programs) and CPU access
- secondary storage (disk space)
- main memory access
- device access (eg. printers)
- environment and administrative information
- In this lecture we'll examine commands
available to monitor or control processes
Monitoring processes
The ps -f command
Lists the processes you have running, and related information
- USER: your login id
- PID: a process id (unique integer for each process)
- PPID: parent process id (which process started this one)
- %CPU: what % of the system CPU time is this process using
- STARTED: when did this process start
- TTY: which connection is this process running through
- TIME: how long has this process been running
- COMMAND: what command started this process
Controlling processes
The kill command
There are two ways to kill a program: the kill command,
or ^C (control-C)
- ^C terminates the foreground process; this works
for almost all programs
- kill -9 0123 kills the process with process id 0123
(of course, this means you need to know the PID, remember the ps
command?)
The fg and bg commands
These move processes between foreground and background
- While ^C kills a job, ^Z suspends the job
(it stops running, but can be restarted from wherever it stopped)
- fg moves the most `recent' background or suspended job
into the foreground
- bg starts the most `recent' suspended job as a background job
- To run a job in the background, attach & to the end of
the command when you first start it
- If a background job needs input it is automatically suspended
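A sketch of background jobs and killing by PID (sleep 30 stands in for a long-running job):

```shell
# Start a job in the background, note its PID, then kill it
sleep 30 &                     # & runs the job in the background
echo "background PID is $!"    # $! holds the PID of the most recent background job
ps -p $!                       # it shows up in the process list
kill -9 $!                     # kill it by PID
```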
The nice command
Controls how big a share of the CPU time a job gets relative to other jobs
- Each job has a priority number from 1 to 19
- Large numbers mean low priority
- If you have a long number-crunching job, be nice to other users
and run it in the background with low priority
- You can slow your jobs down, but you can't speed them up
- nice ps runs the ps command at level 10
- The renice command can change (slow down)
the priority of a job that's already running
- renice -n 10 -p 1234 moves the running process with id 1234 to level 10
(renice takes a PID, not a command name)
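A sketch of starting a job at low priority and checking its nice value (sleep 60 stands in for a number-crunching job):

```shell
# Start a background job at the lowest priority and inspect it
nice -n 19 sleep 60 &      # -n 19 requests the lowest priority
ps -o pid,ni,comm -p $!    # the NI column should show 19
kill $!                    # clean up the example job
```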
Monitoring disk utilisation
- Disk space is frequently in short supply on multi-user systems.
- Many commands exist to help you measure current use
of available space
- The ls -l filename command allows you to see
how much space (bytes) is taken up by a specific file
- The du command allows you to measure
how much disk space is taken up by everything in/below
a given directory
- The df command allows you to see how
much disk space is available in different system areas
- The quota command allows you to see
the limits and use of disk space for your account
The du command
- du allows you to see how much space is taken up
by files in/below the current directory
- each directory is listed separately
- the grand total is given at the end
- totals are given in kilobytes
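A sketch of du (and df for comparison) on a small throwaway tree (the dudemo names are invented):

```shell
# Measure disk usage of a directory tree
mkdir -p /tmp/dudemo/sub
echo "some data" > /tmp/dudemo/sub/file.txt
du -k /tmp/dudemo     # one line per directory, sizes in kilobytes
du -sk /tmp/dudemo    # -s: just the grand total
df -k /tmp            # space used/available on the filesystem holding /tmp
```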
The df and quota commands
- df lists the system directories and space used/available in
each
- df directoryname lists the space used/available in that
directory
- quota displays your use and limits on disk space and
number of files
- quota is a soft limit you can temporarily exceed
- limit is the hard limit
- filesystem indicates which directory area your files are stored in
Monitoring users
While the sysadmin does most of the worrying about
who is doing what on the system, *nix assumes a co-operative
environment where anyone can get much of this
information
The w command
The w command lists who is on the system and what they're doing
- The first line lists:
- the time,
- how long the system has been up,
- how many users are on the system,
- what the load on the system has been over the last 1, 5, and 15 minutes
(high loads mean slow response time)
- the remaining lines list, for each user currently logged in:
- their login id
- the connection (tty) they're using and where from
- the time they logged in at
- idle time (since they last typed something)
- how much CPU their current job has used so far
- what that job is
The who, whoami, and users commands
These commands list who is currently on the system, in different formats
- who lists who is logged in:
- login id
- connection (tty)
- time they logged in at
- users provides the list of names only
- whoami just echoes your own login id
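The three commands side by side (the who and users output depends on who happens to be logged in):

```shell
# Three views of who is on the system
who       # one line per login session
users     # names only, on one line
whoami    # just your own login id
```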
Checking your environment
- A wide variety of conditions exist when
you are logged in to a system
- To a certain extent, it is possible to customise the conditions you
are working in
- The conditions and customisations are referred to as your
environment,
and the env command lists the conditions that currently prevail
- In addition, the hostname command can tell you the exact
machine you're on,
- and the uptime command can detail the current operating state
of that machine
The uptime and hostname commands
hostname gives the exact name of the machine you are logged into.
uptime gives:
- the current time (by the system clock)
- how long the system has been up for
- the number of users on the system
- the average load on the system over the last 1, 5, and 15 minutes
(low load means fast response time, high load means poor response time)
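A sketch of the three commands together (output varies by machine):

```shell
# Inspect your environment and the machine's state
env | head -5    # a few of the NAME=value settings currently in force
hostname         # the machine you are logged into
uptime           # time, uptime, user count, and load averages
```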
I/O streams, redirection, and command lines
There are four main techniques for getting information into
and out of programs
- Standard I/O: using cout and cin, user types data in and
output is displayed on monitor
- I/O redirection: using the <, >, >> and | operators
to re-route all output to go to a file or to take all input from a file
- Command line arguments: effectively adding parameters when
you call a program (eg. pine foo, foo is the file to be
opened/edited)
- File input/output: opening one or more files for reading and writing
from inside the program
Each of these options has advantages and disadvantages
I/O redirection
I/O redirection is carried out from the prompt,
rather than from inside the program
- command > filename re-routes all output
from the command program to overwrite the specified file
- command >> filename re-routes all output
from the command program to be appended to the specified file
- command < filename causes the command program to
take all its input from the specified file
- command1 | command2 causes all the
output from the command1
program to be used as input for the command2 program
Redirection examples
- ps > myfile sends the output of ps into myfile
- cat myfile scrolls the contents of myfile on the screen
- date >> myfile appends today's date onto myfile
- cat myfile scrolls the file contents again
- cat myfile | wc sends the contents of myfile into the wc program,
which counts the number of lines, words, and characters in the file
- ls | wc performs ls on the current directory
and uses that as the input to wc
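The examples above, collected into one runnable sequence (myfile is a scratch file; /tmp is used here just to avoid clobbering anything):

```shell
# Redirection and pipes, step by step
cd /tmp
ps > myfile          # overwrite myfile with the output of ps
date >> myfile       # append today's date
cat myfile | wc      # count lines, words, and characters via a pipe
wc < myfile          # the same counts, taking input straight from the file
```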