Tuesday, April 15, 2008

Memory overcommit with kvm

kvm supports (or rather, will support; this is work in progress) several ways of running guests with more memory than you have on the host:

Swapping
This is the classical way to support overcommit: the host picks some memory pages from one of the guests and writes them out to disk, freeing the memory for other uses. Should a guest need memory that has been swapped out, the host reads it back from disk.

Ballooning
With ballooning, the guest and host cooperate on which pages are evicted: the host asks a balloon driver in the guest to release memory, and it is the guest's responsibility to pick the pages and swap them out if necessary.

Page sharing
The hypervisor looks for memory pages that contain identical data; these pages are merged into a single page, which is marked read-only. If a guest writes to a shared page, the page is unshared (the guest gets its own copy) before the guest is granted write access.

Live migration
The hypervisor moves one or more guests to a different host, freeing the memory used by these guests.
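Page sharing is the most algorithmic of the four, so here is a toy model of it. This is a minimal Python sketch, not kvm's implementation: the class `SharedMemory` and its methods are hypothetical, pages are plain byte strings, and a SHA-256 digest stands in for whatever lookup structure a real implementation uses. A digest match is confirmed with a full comparison before two pages are merged, and a write to a shared page gives the writer a private copy (copy-on-write).

```python
import hashlib


class SharedMemory:
    """Toy model of content-based page sharing (hypothetical, not kvm code)."""

    def __init__(self):
        self.frames = {}     # frame id -> page contents (bytes)
        self.refcount = {}   # frame id -> number of guest pages mapped to it
        self.by_hash = {}    # content digest -> candidate frame id
        self.mapping = {}    # (guest, page_no) -> frame id
        self.next_frame = 0

    def _new_frame(self, data):
        fid = self.next_frame
        self.next_frame += 1
        self.frames[fid] = data
        self.refcount[fid] = 0
        return fid

    def map_page(self, guest, page_no, data):
        """Back a guest page, merging it with an identical frame if one exists."""
        digest = hashlib.sha256(data).digest()
        fid = self.by_hash.get(digest)
        # A digest match is only a hint; compare full contents before merging.
        if fid is None or self.frames[fid] != data:
            fid = self._new_frame(data)
            self.by_hash[digest] = fid
        self.refcount[fid] += 1
        self.mapping[(guest, page_no)] = fid

    def write_page(self, guest, page_no, data):
        """Copy-on-write: a shared page is unshared before the write lands."""
        fid = self.mapping[(guest, page_no)]
        if self.refcount[fid] > 1:
            self.refcount[fid] -= 1        # other sharers keep the old frame
            fid = self._new_frame(data)    # this guest gets a private copy
            self.refcount[fid] = 1
            self.mapping[(guest, page_no)] = fid
        else:
            self.frames[fid] = data        # already private: write in place

    def frames_in_use(self):
        return sum(1 for count in self.refcount.values() if count > 0)
```

For example, if three guests each map an all-zero page, only one frame backs all three; when one guest writes to it, a second frame appears for that guest alone, while the other two keep sharing.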


Why does kvm need four ways of overcommitting memory? Each method provides different reliability/performance tradeoffs.

Ballooning is fairly efficient since it relies on the guest to pick the memory to be evicted. Often the guest can simply shrink its cache in order to free memory, which has very little impact on the guest. The problem with ballooning is that it relies on guest cooperation, which reduces its reliability.
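The guest's side of ballooning can be modelled in a few lines. In this hypothetical sketch (the `Guest` class and its page counts are assumptions, not a real driver interface), the guest gives up memory in order of cost: free pages first, then page cache, and only then does it swap out application pages. A guest that is not cooperating returns nothing, which is the reliability problem described above.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    """Hypothetical guest memory accounting, in pages."""
    total: int
    used_by_apps: int          # anonymous memory the applications need
    cache: int                 # clean page cache, cheap to drop
    balloon: int = 0           # pages handed back to the host
    swapped: int = 0           # application pages the guest swapped out
    cooperative: bool = True   # False models a dead guest or unloaded driver

    @property
    def free(self):
        return self.total - self.used_by_apps - self.cache - self.balloon

    def inflate_balloon(self, pages):
        """Guest-side balloon driver: release up to `pages` pages to the host.

        The guest chooses the victims, cheapest first, and returns the
        number of pages actually released.
        """
        if not self.cooperative:
            return 0                                    # host cannot force this
        take_free = min(pages, self.free)               # 1. unused memory
        take_cache = min(pages - take_free, self.cache)  # 2. drop page cache
        # 3. as a last step, the guest swaps out application pages itself
        take_swap = min(pages - take_free - take_cache, self.used_by_apps)
        self.cache -= take_cache
        self.used_by_apps -= take_swap
        self.swapped += take_swap
        released = take_free + take_cache + take_swap
        self.balloon += released
        return released
```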

Swapping does not depend on the guest at all, so it is completely reliable from the host's point of view. However, the host knows less than the guest about how the guest uses its memory, so it may evict pages the guest will soon need; swapping therefore performs worse than ballooning.
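For contrast, here is a hypothetical sketch of the host's side. The host only sees which guest pages are touched, so it has to fall back on a generic rule; least-recently-used is assumed here (Linux's own reclaim approximates LRU, but the exact policy is not modelled). The class name and the dictionaries standing in for RAM and disk are illustrative only.

```python
from collections import OrderedDict


class HostSwapper:
    """Toy host-side swapping of guest pages, evicting least recently used."""

    def __init__(self, ram_frames):
        self.ram_frames = ram_frames  # number of guest pages that fit in RAM
        self.ram = OrderedDict()      # (guest, page) -> data, oldest first
        self.disk = {}                # swapped-out pages
        self.swap_ins = 0             # count of slow reads back from disk

    def access(self, guest, page, data=None):
        """Touch a guest page (optionally writing `data`) and return its contents."""
        key = (guest, page)
        if key in self.ram:
            self.ram.move_to_end(key)               # mark as recently used
        elif key in self.disk:
            self.swap_ins += 1                      # slow path: page fault to disk
            self._make_room()
            self.ram[key] = self.disk.pop(key)
        else:
            self._make_room()
            self.ram[key] = data
        if data is not None:
            self.ram[key] = data
        return self.ram[key]

    def _make_room(self):
        # The host cannot tell cache from critical data; it just evicts the oldest.
        while len(self.ram) >= self.ram_frames:
            victim, contents = self.ram.popitem(last=False)
            self.disk[victim] = contents
```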

Page sharing relies on the guest behavior indirectly. As long as guests run similar applications, the host will achieve a high share ratio. But if a guest starts running new applications, the share ratio will decrease and free memory in the host will drop.

Live migration does not depend on the guest, but instead on the availability of free memory on other hosts in the virtualization pool; if other hosts do not have free memory, you cannot migrate to them. In addition, live migration takes time, which the host may not have when facing a memory shortage.
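Both constraints can be made concrete. This hypothetical sketch (function names, the best-fit rule, and the units are assumptions) picks a target host with enough free memory and estimates how long copying the guest's memory takes. The estimate is only a lower bound, since pre-copy live migration resends pages the guest dirties while the copy is in progress.

```python
def pick_migration_target(guest_mem, hosts):
    """Return the host with the least free memory that still fits the guest.

    `hosts` maps host name -> free memory (same unit as guest_mem).
    Returns None when no host has room, i.e. migration is impossible.
    """
    candidates = [(free, name) for name, free in hosts.items() if free >= guest_mem]
    if not candidates:
        return None
    return min(candidates)[1]


def min_copy_seconds(guest_bytes, link_bits_per_sec):
    """Lower bound on migration time: every page must cross the link once."""
    return guest_bytes * 8 / link_bits_per_sec
```

For example, copying a 4 GiB guest over a 1 Gbit/s link needs at least about 34 seconds, which is a long time for a host that is already short of memory.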

So kvm uses a mixed strategy: page sharing and ballooning are the preferred methods for memory overcommit since they are efficient. Live migration is used for long-term balancing of memory requirements and resources. Swapping is used as a last resort in order to guarantee that services do not fail.
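The ordering can be written down as a small planning function. This is a hypothetical sketch: the function name and its inputs (how many pages each method could reclaim right now) are assumptions. Live migration is deliberately left out because it serves long-term balancing rather than an immediate shortage.

```python
def reclaim_plan(needed, sharable, balloonable, swappable):
    """Split a shortfall of `needed` pages across methods, cheapest first.

    Returns a dict of method -> pages; raises MemoryError if even swapping
    cannot cover the shortfall.
    """
    plan = {}
    for method, available in (("share", sharable),
                              ("balloon", balloonable),
                              ("swap", swappable)):   # swap is the last resort
        take = min(needed, available)
        if take:
            plan[method] = take
            needed -= take
    if needed:
        raise MemoryError(f"still short by {needed} pages")
    return plan
```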

17 comments:

Anonymous said...

Nice blog you have started. This blog topic is really interesting. Please provide more information about what kvm, or more likely the linux kernel, can do for virtualization.

There aren't any good resources about what kvm can do. It usually gets drowned in all the fuss about what the linux kernel can do.
For example, what storage types can a virtual machine image be stored on? Yes, I know: all the types the linux kernel supports. But normal people don't see this connection. And what about high availability with kvm?

Дамјан said...

>> And what about high availability with kvm? <<

What about it? What exactly are you looking for in a virtualization solution to help you with high availability?

Anonymous said...

Does Vista's ASLR affect KVM's ability to perform page sharing with Vista guests, or does it not matter?

Avi Kivity said...

Brian: I don't know exactly how ASLR is implemented. If it uses Position-Independent Code (PIC) then impact on sharing will be minimal.

Jack said...

Avi,

Which of these technologies is operational as of now?

Avi Kivity said...

Everything is now available and working, except for page sharing which has not been merged yet. Page sharing is likely to be included in 2.6.31.

Unknown said...

Very informative post!

New releases of qemu provide manual configuration of the balloon driver. What about an automatic approach, i.e. raising the balloon target when there is memory contention?

Avi Kivity said...

qemu only knows about the guest that it hosts; automatic ballooning needs to take into account all guests as well as host memory pressure. Therefore it is left for the management application, which can allocate memory to guests and adjust the balloon size using the qemu monitor.
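A management application might turn host-wide information into per-guest balloon sizes along these lines. The policy (guarantee each guest a minimum, then split the rest in proportion to each guest's headroom) and all names are hypothetical; the resulting sizes would be applied through the qemu monitor's `balloon` command, which takes the target guest memory size in megabytes.

```python
def balloon_targets(host_mb, guests):
    """Split host memory between guests; returns name -> target size in MB.

    `guests` maps name -> (min_mb, max_mb). min_mb is a working-set estimate,
    max_mb the memory the guest was started with (both hypothetical inputs).
    """
    base = sum(lo for lo, _ in guests.values())
    if base > host_mb:
        # Ballooning alone cannot cover the minimums; swapping must take over.
        raise MemoryError("guest minimums exceed host memory")
    spare = host_mb - base
    headroom = {name: hi - lo for name, (lo, hi) in guests.items()}
    total_headroom = sum(headroom.values())
    targets = {}
    for name, (lo, hi) in guests.items():
        # Share spare memory in proportion to how much more each guest could use.
        extra = spare * headroom[name] // total_headroom if total_headroom else 0
        targets[name] = min(hi, lo + extra)
    return targets


def monitor_commands(targets):
    """qemu monitor command text that would apply each target to its guest."""
    return {name: f"balloon {mb}" for name, mb in targets.items()}
```

Because the targets are computed from every guest at once, this logic lives naturally in the management layer, not in a single qemu process.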

Jack said...

Can you do a self-ballooning approach in KVM similar to what Dan Magenheimer has published for Xen, i.e. the guest knows its working set size and a script in the guest inflates the balloon to release the extra memory above its working set size?

Avi Kivity said...

It should be easy to implement (or port) this feature.
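The guest-side calculation for such a script could look like this. It is a hypothetical sketch: it uses the `Committed_AS` field of `/proc/meminfo` as a rough working-set estimate and adds a safety margin, both of which are assumptions; everything above the returned value would be a candidate for inflating the balloon.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ("Name:   1234 kB") into integer values."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def self_balloon_target_kb(meminfo_text, margin_percent=10):
    """Memory the guest should keep, in kB: working set plus a safety margin.

    Uses Committed_AS as the working-set estimate (an assumption) and never
    returns more than MemTotal.
    """
    info = parse_meminfo(meminfo_text)
    want = info["Committed_AS"] * (100 + margin_percent) // 100
    return min(info["MemTotal"], want)
```

On a real Linux guest the input would be `open("/proc/meminfo").read()`, and the balloon size would be `MemTotal` minus the returned target.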

Unknown said...

From the balloon definition it looks like you can't guarantee 100% that memory will be freed, only "try to free" it. For example, let's say that you define a "reservation" value of 1G. You allow the guest to grow to a maximum of 2G, but would like to be able to shrink it back to 1G at any time. What can guarantee that? From your experience, is there a threshold that would ensure it (like keeping a spare margin of 0.25%)?

Avi Kivity said...

Right, we cannot guarantee the guest will balloon itself down. The guest may have unloaded its balloon driver, or it may be dead. In that case, we can swap it out (we can selectively swap guests using control groups).

Unknown said...

«we can swap it out (we can selectively swap guests using control groups)»: could you provide more details on that? Is that a consequence of using the balloon, or another KVM command?

Avi Kivity said...

No, it's an unrelated Linux feature called control groups. See Documentation/controllers/memory.txt
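Following that pointer, a host could put each guest's qemu process in its own memory cgroup and cap it, so that when the guest exceeds the cap the kernel reclaims (and, for anonymous memory, swaps out) pages from that guest only. A minimal sketch, assuming the cgroup v1 memory controller is mounted at `cgroup_root` (for example with `mount -t cgroup none /cgroups -o memory`) and that it runs as root; the function name and the one-group-per-guest layout are hypothetical. Writing to `tasks` moves a single thread; newer kernels also accept writes to `cgroup.procs` to move a whole process.

```python
import os


def confine_guest(cgroup_root, name, qemu_pid, limit_bytes):
    """Place a guest's qemu process in its own memory cgroup with a hard limit.

    Returns the path of the cgroup directory.
    """
    group = os.path.join(cgroup_root, name)
    os.makedirs(group, exist_ok=True)   # on cgroupfs, mkdir creates the control files
    # Cap the memory charged to this group; beyond it the kernel reclaims from it.
    with open(os.path.join(group, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    # Move the qemu process into the group.
    with open(os.path.join(group, "tasks"), "w") as f:
        f.write(str(qemu_pid))
    return group
```

In practice the qemu process would be placed in the group before the guest starts allocating memory.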

Jo-Erlend Schinstad said...

«Page sharing is likely to be included in 2.6.31.»

Is it?

Avi Kivity said...

No, page sharing was deferred. I hope it will be included in 2.6.32.

Limbo.N said...

and what about this :
http://lwn.net/Articles/329123/
Is ksm not stable enough, or is it just that you want kernel integration?

NP.