
	"Too little, too slow"

      An introduction to memory management
	and an overview of Linux 2.5 MM.

		Rik van Riel
            riel@conectiva.com.br


	1) Introduction to memory management.


To understand memory management, one needs to have a
good conceptual and quantitative overview of the memory
hierarchy, the hierarchy of progressively smaller, faster
and more expensive types of memory that populate every
computer today. Because of this (and because the talk is
scheduled early in the morning), we'll start the talk with
a 30 minute overview of the memory hierarchy and some of
the concepts involved in memory management.

If you're already knowledgeable about memory management,
you may want to skip the rest of this section and move
on to section 2).

The speed differences are quite big: memory is 25 to 100
times as slow as the CPU, and a hard disk is more than
100,000 times as slow as memory. Because of this, the memory
hierarchy poses some "interesting" performance problems
that the operating system has to deal with.

The speed difference between CPU and memory is mainly
masked by "cache". Cache is very fast memory, and using
it does not need support from the operating system or
the application. However, there are some tricks the OS can
perform to make it easier for the cache to do its job
well and to raise system performance.
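
To get a feeling for how much a cache can hide, here is a small
back-of-the-envelope sketch. It is not from the talk itself: the
function name and the cost numbers are assumptions picked for
illustration (a cache hit costs 1 unit, a trip to memory 100 units,
the upper end of the range mentioned above).

```python
def effective_access_time(hit_ratio, fast_cost, slow_cost):
    """Average cost of one access when a fraction hit_ratio of all
    accesses is served by the fast level and the rest by the slow one."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * fast_cost + (1.0 - hit_ratio) * slow_cost


# Assumed costs, in arbitrary units (not measurements).
CACHE_COST = 1
MEMORY_COST = 100

for ratio in (0.50, 0.90, 0.99):
    cost = effective_access_time(ratio, CACHE_COST, MEMORY_COST)
    print(f"hit ratio {ratio:.2f}: average cost {cost:.2f}")
```

With these assumed numbers a 90% hit ratio still leaves the average
access about 11 times as slow as a cache hit, and it takes a 99% hit
ratio to get down to about 2 times. The same arithmetic applies one
level down, between memory and disk.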

The speed difference between memory and hard disk is
truly enormous. Furthermore, data on disk is stored
permanently, so we need to store some of it in a way that
lets us find it again after the computer is rebooted.

This means that we have to store the data in a "filesystem".
I won't talk about how a filesystem works. The important
part is that a filesystem works like an index where you
have to look up where the data is.

The extra lookup makes the disk even slower in practice,
more than a million times as slow as the processor!  The
system still runs reasonably fast only because of some
memory management and filesystem tricks.


	2) A look at Linux 2.5 VM


In this part we'll present some new ideas for Linux memory
management. While current Linux memory management should be
able to cope with most "normal" system loads just fine, it
isn't as good as it could be and should be improved to
handle extreme situations better. The following ideas
will be presented.


[2.5 VM]

In Linux 2.5, virtual memory management will see some considerable
changes. One of the main problems with the current Linux memory
management is that sometimes we cannot make a proper distinction
between pages which are in use and pages which can be evicted from
memory to make room for new data.

In order to improve that situation and make the VM subsystem more
resilient against wildly variable VM loads, we will use ideas from
various other operating systems to improve Linux memory management.
The main page replacement routine will use the active, inactive and
scavenge (cache) lists as found in FreeBSD. This mechanism maintains
a balance between used and old memory pages so there will always be
"proper" pages around to swap out. In addition to this there will
probably be things like dynamic and administrator-settable RSS limits,
per-user memory accounting, and anti-hog code to prevent one user or
process from hogging the machine and slowing down the rest of the system.
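
To make the idea of separate page lists a bit more concrete, here is
a minimal sketch. It is not the FreeBSD or Linux code: the class
name, the parameters and the balancing rule are assumptions, and the
inactive and scavenge (cache) lists are collapsed into one inactive
list to keep it short.

```python
from collections import OrderedDict


class PageLists:
    """Toy active/inactive page replacement (hypothetical, simplified)."""

    def __init__(self, capacity, inactive_target):
        self.capacity = capacity                # total pages that fit in memory
        self.inactive_target = inactive_target  # pages we try to keep inactive
        self.active = OrderedDict()             # pages in use, oldest first
        self.inactive = OrderedDict()           # eviction candidates, oldest first
        self.evicted = []                       # eviction history, for illustration

    def touch(self, page):
        """Record a reference to page, faulting it in if necessary."""
        if page in self.active:
            self.active.move_to_end(page)       # refresh its position
        elif page in self.inactive:
            del self.inactive[page]             # referenced again: reactivate
            self.active[page] = True
        else:
            if len(self.active) + len(self.inactive) >= self.capacity:
                self._evict_one()
            self.active[page] = True
        self._rebalance()

    def _rebalance(self):
        # Age the oldest active pages until enough candidates are available.
        while len(self.inactive) < self.inactive_target and len(self.active) > 1:
            page, _ = self.active.popitem(last=False)
            self.inactive[page] = True

    def _evict_one(self):
        # Prefer the oldest inactive page; fall back to the active list.
        source = self.inactive if self.inactive else self.active
        page, _ = source.popitem(last=False)
        self.evicted.append(page)


lists = PageLists(capacity=4, inactive_target=2)
for page in (1, 2, 3, 4, 1, 5):
    lists.touch(page)
print(lists.evicted)  # [2]
```

Page 1 was the oldest page, but because it was referenced again while
on the inactive list it moved back to the active list, so page 2 is
evicted instead. That second chance is what lets the lists tell pages
in use apart from pages that are merely old.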


[anti-hog code]

The virtual memory management of most modern operating systems works
under the assumption that every page is of equal importance, applying
equal memory pressure to each page in the system. This can lead to the
situation where one memory hog is running happily and touching all its
pages all the time (since it is in memory it is fast) and the rest of
the system is thrashing (and will continue to do so since it is running
so slowly that it won't get a chance to use its memory before the next
pageout scan).

Since this is a very unfair situation that nobody wants to run into,
and one that can make system use very inefficient, we should abandon
the idea that every page is equally important. There are a number
of ideas that can improve this situation considerably. Two of these
will be presented in this lecture. One is the simple anti-hog code
that was experimented with in the 2.3 kernel and the other is
dynamic RSS limits.
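
As an illustration of what unequal memory pressure could look like,
here is a minimal sketch of victim selection with RSS limits. It is
not the 2.3 anti-hog code, and this abstract does not say how the
limits are computed: the function name, the data layout and the
"most over its limit goes first" rule are all assumptions.

```python
def pick_victim(processes):
    """Return the name of the process to take a page from, or None.

    processes maps a process name to (rss_pages, rss_limit_pages).
    Hypothetical policy: the process furthest over its RSS limit pays
    first; if nobody is over its limit, the largest resident set does.
    """
    if not processes:
        return None
    overshoot = {
        name: rss - limit
        for name, (rss, limit) in processes.items()
        if rss > limit
    }
    if overshoot:
        return max(overshoot, key=overshoot.get)
    return max(processes, key=lambda name: processes[name][0])


# Made-up numbers: one hog and two small interactive processes.
procs = {"hog": (900, 500), "editor": (40, 500), "shell": (10, 500)}
print(pick_victim(procs))  # hog
```

With a rule like this the hog loses pages before the editor and the
shell do, no matter how often it touches them.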


[process suspension]

When the memory load on the system is just too big (e.g. when the working
set of all running processes no longer fits in memory) paging is no longer
enough and something else needs to be done. The simplest solution is to
suspend a process for a while so that the sum of all working set
sizes is small enough to fit in memory.
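
The basic mechanism can be sketched in a few lines. This is only an
illustration under assumptions: the function name and the data layout
are made up, and suspending the largest working set first is an
arbitrary choice for the sketch, not the policy presented in the talk.

```python
def choose_suspended(working_sets, memory_pages):
    """Return the processes to suspend so the remainder fits in memory.

    working_sets maps a process name to its working set size in pages.
    Processes are suspended largest first (an arbitrary rule) until the
    sum of the remaining working sets is at most memory_pages.
    """
    running = dict(working_sets)
    suspended = []
    while running and sum(running.values()) > memory_pages:
        victim = max(running, key=running.get)
        suspended.append(victim)
        del running[victim]
    return suspended


# Made-up load: 1100 pages of working set on a 600 page machine.
print(choose_suspended({"a": 600, "b": 300, "c": 200}, 600))  # ['a']
```

After "a" is suspended, the remaining 500 pages of working set fit in
memory and the other processes can run without thrashing.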

The obvious questions arising with this solution are: which process(es) to
suspend?  For how long should they be suspended?  How do we ensure fairness?
How do we make sure that every process is able to get some work done?  How
do we make sure interactive performance isn't impacted too much?

The algorithm presented is a variation on the algorithm used by ITS (the
Incompatible Timesharing System), where the system attempts to measure,
for each process, the throughput achieved times the memory used, averaged
over time. Using this per-process number the system can estimate how badly
it is thrashing (that is, whether a process needs to be suspended) and
make sure all processes receive fair treatment.
------------------

Keywords: cache, read ahead, virtual memory, filesystem,
memory management, scheduling, memory hierarchy, performance.
