What is this?

The golden age of Ryzen for gaming virtualization is upon us. If you are not familiar with this technology, it allows you to run a virtual machine with direct access to a secondary graphics card, which gives workloads running inside the VM full GPU acceleration. Outside of servers and the enterprise, the most common use cases are running legacy applications and games that require direct access to a GPU. If everything is working correctly, the performance hit of running a gaming VM versus “bare metal” is negligible.

On Linux, this type of workload has been problematic since AMD launched Ryzen. At first, the platform was not well suited for sharing PCIe resources between host and guest. That turned out to be a firmware oversight, and it was fixed within a couple of months of launch via an AGESA platform update from AMD. However, performance still lagged behind Intel platforms, and that came down to a bug in KVM, one of several hypervisors available on Linux that support this type of virtualization.

As of this writing, that bug has been fixed (mad props to Geoffrey and Paolo), but it will be some time before the patch lands in mainline Linux kernels.

This article will walk you through patching your kernel if you are using Ubuntu, Debian or Fedora. If you are on Arch, you probably already know what you are doing and all you need is the patch itself, which is linked below.

You should also know that this essentially fixes the issue completely on Ryzen CPUs, but Threadripper (being a bit newer) still has a couple of annoying issues that I am certain will be worked out with software fixes. In fact, one reason I am using Ubuntu as the basis for this article is that I am working with a board partner to nail down the specifics of these issues (at least on the hardware side).

Getting Started

You have a choice about which kernel you use. This patch will work on many different kernels, including the ones shipping with Debian, Ubuntu and Fedora. If you wish, you need not replace your entire running kernel: the patch only touches arch/x86/kvm/svm.c, which is compiled into the kvm-amd module (kvm-amd.ko, which 99.9% of the time is just a loadable kernel module on your distro).

If you wish to run a bleeding-edge kernel (4.14-rc6 as of this writing), clone the kernel source from a repository appropriate for your distribution.

Ubuntu (Bleeding Edge)

git clone git://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack v4.14-rc6

Ubuntu (Just modify what you have)

apt source linux-image-$(uname -r)
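`apt source` only works when source (`deb-src`) entries are enabled in your APT configuration; otherwise it stops with an error about missing 'source' URIs. Below is a small pre-check sketch. The function name `has_deb_src` is hypothetical, and it assumes the classic one-line `sources.list` format.

```shell
#!/bin/sh
# has_deb_src [FILE...]: hypothetical helper. Succeeds if any given APT list
# file contains an active (uncommented) deb-src line. Assumes the classic
# one-line sources.list format.
has_deb_src() {
    if [ "$#" -eq 0 ]; then
        # Default to the standard APT locations; an unmatched glob is harmless
        # because grep -s silences "No such file" errors.
        set -- /etc/apt/sources.list /etc/apt/sources.list.d/*.list
    fi
    grep -Eqs '^[[:space:]]*deb-src[[:space:]]' "$@"
}

# Example use before fetching the source of the running kernel:
# has_deb_src || echo "Uncomment the deb-src lines, then run: sudo apt update"
# apt source linux-image-$(uname -r)
```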

 

Next steps

Navigate to the kernel source folder on the CLI

Copy your current kernel’s config from /boot into the new, unconfigured source tree

Apply the patch

The commands might look something like this:

cd v4.14-rc6
cp /boot/config-$(uname -r) .config
patch -p1 < ryzen.patch

Here, ryzen.patch is the path to the patch file, for example /home/youruser/Downloads/ryzen.patch (or a copy of it placed in the source directory).
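If you would rather know whether the patch applies cleanly before it touches any files, GNU `patch` has a `--dry-run` option. Here is a minimal sketch wrapping both steps. The function name is hypothetical, and it assumes GNU patch plus a diff with `a/` and `b/` path prefixes (hence `-p1`). `-N` makes patch skip, rather than offer to reverse, a patch that looks already applied.

```shell
#!/bin/sh
# apply_patch_checked SRC_DIR PATCH_FILE: hypothetical helper.
# Does a dry run first and only applies the patch if every hunk would apply.
apply_patch_checked() {
    src_dir=$1
    patch_file=$2
    # Make the patch path absolute so it still resolves after the cd below.
    case $patch_file in
        /*) ;;
        *) patch_file=$(pwd)/$patch_file ;;
    esac
    (
        cd "$src_dir" || exit 1
        # -p1 strips the leading a/ and b/ from the paths in the diff;
        # -N refuses patches that look reversed or already applied.
        patch -p1 -N --dry-run < "$patch_file" || exit 1
        patch -p1 -N < "$patch_file"
    )
}

# Example: apply_patch_checked ~/src/v4.14-rc6 ~/Downloads/ryzen.patch
```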

https://patchwork.kernel.org/patch/10027525/

Just copy-paste this part into a ryzen.patch file, keeping the tab indentation intact (if tabs get converted to spaces, the patch may fail to apply):

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index af256b786a70..af09baa3d736 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3626,6 +3626,13 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	u32 ecx = msr->index;
 	u64 data = msr->data;
 	switch (ecx) {
+	case MSR_IA32_CR_PAT:
+		if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+			return 1;
+		vcpu->arch.pat = data;
+		svm->vmcb->save.g_pat = data;
+		mark_dirty(svm->vmcb, VMCB_NPT);
+		break;
 	case MSR_IA32_TSC:
 		kvm_write_tsc(vcpu, msr);
 		break;

The problem this fixes is quite funny: the AMD (SVM) side of KVM was discarding the guest's writes to the PAT (Page Attribute Table) MSR, so the memory caching types the guest asked for were never applied. The VM ran with the default PAT instead, which meant that memory the guest wanted mapped as write-combining (such as a GPU's framebuffer) got slow uncached accesses. So of course performance with KVM was not great. Some people, myself included, opted to use the Xen hypervisor in lieu of KVM, which did not have this bug (and performance there was pretty good). Xen has its own set of headaches, however, and I prefer KVM.
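To make this a bit more concrete: the PAT MSR (MSR_IA32_CR_PAT in the patch) packs eight one-byte entries, PA0 through PA7, and the low three bits of each select a memory type. The sketch below, using hypothetical helper names, decodes a PAT value with plain shell arithmetic. The architectural power-on default, 0x0007040600070406, contains no write-combining (WC) entry at all, while recent Linux guests, as I understand their PAT setup, program PA1 as WC (0x0407050600070106).

```shell
#!/bin/sh
# Hypothetical helpers: decode an IA32_PAT MSR value into its eight entries.
# Memory-type encodings per the x86 PAT definition (2 and 3 are reserved).
pat_type_name() {
    case $1 in
        0) echo UC ;;
        1) echo WC ;;
        4) echo WT ;;
        5) echo WP ;;
        6) echo WB ;;
        7) echo UC- ;;
        *) echo reserved ;;
    esac
}

decode_pat() {
    pat=$1
    i=0
    while [ "$i" -lt 8 ]; do
        # Entry i lives in byte i; only its low three bits are meaningful.
        t=$(( ($pat >> (8 * $i)) & 7 ))
        printf 'PA%d=%s\n' "$i" "$(pat_type_name "$t")"
        i=$((i + 1))
    done
}

# decode_pat 0x0007040600070406   # power-on default: WB, WT, UC-, UC, repeated
# decode_pat 0x0407050600070106   # typical Linux guest: PA1 is WC
```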

You should see output like “patching file arch/x86/kvm/svm.c” with no FAILED hunks. (It is OK if patch reports that a hunk succeeded at an offset.)

Now, if you intend to replace your running kernel with a new version, you must build the kernel, install it and reboot.

The commands I used on this Ubuntu install were:

yes '' | make oldconfig
make clean
make -j 16 deb-pkg LOCALVERSION=-custom

Note that -j 16 sets the number of parallel jobs make runs for the compile. I was on a 16-core/32-thread Threadripper, so 31 or 32 might have been more appropriate.
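Rather than hard-coding the job count, you can derive it from the machine. Here is a small sketch using a hypothetical `build_jobs` helper. It relies on coreutils `nproc`, which reports the processing units available to the current process, and falls back to `getconf _NPROCESSORS_ONLN`.

```shell
#!/bin/sh
# build_jobs: hypothetical helper that prints a make -j value equal to the
# number of processing units available (32 on a 1950X with SMT enabled).
build_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Same build as above, with the job count filled in automatically:
# yes '' | make oldconfig
# make clean
# make -j"$(build_jobs)" deb-pkg LOCALVERSION=-custom
```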

The compile will typically take between 4 and 45 minutes depending on your system. Did you get a segfault while compiling? You may need to update your UEFI or RMA your CPU.

Once the compile is done, you will have a number of .deb binary packages in the parent directory of your kernel source tree. Install them with a command such as

sudo dpkg -i linux-headers-4.14.0-rc6-custom_4.14.0-rc6-custom-1_amd64.deb linux-image-4.14.0-rc6-custom_4.14.0-rc6-custom-1_amd64.deb  linux-libc-dev_4.14.0-rc6-custom-1_amd64.deb
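Typing those long file names is error-prone. The sketch below collects them by the LOCALVERSION suffix instead. The helper name is hypothetical, and it assumes the naming pattern shown above. It also skips a `-dbg` debug-symbol package, which `make deb-pkg` may build when the config enables debug info and which is large and not needed to boot.

```shell
#!/bin/sh
# list_custom_debs DIR TAG: hypothetical helper. Prints the kernel .deb files
# in DIR whose names contain TAG (the LOCALVERSION), one per line, skipping
# any -dbg debug-symbol package.
list_custom_debs() {
    for f in "$1"/linux-*"$2"*.deb; do
        [ -e "$f" ] || continue   # the glob matched nothing
        case $f in
            *-dbg_*) continue ;;
        esac
        printf '%s\n' "$f"
    done
}

# From the kernel source directory, after make deb-pkg (file names contain
# no spaces, so the unquoted expansion is safe here):
# sudo dpkg -i $(list_custom_debs .. -custom)
```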

If your system was already configured for passthrough virtualization, then you can reboot and test out the new kernel. If not, please see one of our earlier guides on setting up passthrough virtualization.
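After rebooting, it is worth confirming that you actually booted the new kernel and that kvm_amd is loaded. Here is a minimal sketch with hypothetical helper names. It assumes the `-custom` LOCALVERSION used above and that kvm_amd is built as a module, so it shows up in /proc/modules.

```shell
#!/bin/sh
# is_custom_kernel SUFFIX [RELEASE]: hypothetical; succeeds if the kernel
# release (default: the running one) ends with the LOCALVERSION suffix.
is_custom_kernel() {
    suffix=$1
    release=${2:-$(uname -r)}
    case $release in
        *"$suffix") return 0 ;;
        *) return 1 ;;
    esac
}

# module_loaded NAME [MODULES_FILE]: hypothetical; succeeds if NAME is listed
# in /proc/modules (or in the given file, which uses the same format).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# After rebooting:
# is_custom_kernel -custom && module_loaded kvm_amd && echo "patched kernel and kvm_amd are active"
```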

Updating kvm-amd.ko only

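If you’d rather not replace your entire kernel, the steps are slightly different. Instead of installing the new packages with dpkg, simply copy the freshly built kvm-amd.ko module over your old one (be sure to save a backup if you want to undo). This only works if the source you compiled matches your running kernel, such as the tree from apt source above; a module built from a different kernel version will be refused at load time. A command such as: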

sudo cp arch/x86/kvm/kvm-amd.ko /lib/modules/$(uname -r)/kernel/arch/x86/kvm/

will overwrite the kvm-amd module under /lib/modules with the one you just compiled in your kernel source directory. From there, a simple

sudo rmmod kvm-amd
sudo modprobe kvm-amd

should unload and reload the module. Note that no VMs can be running; if you are unable to remove the module, simply reboot your machine to load the updated one. I am also assuming that npt=1 (nested page tables enabled) is the default on your system. If you have added module options in /etc/modprobe.d or equivalent to disable NPT (npt=0, a common workaround for this very bug), you will need to change that to npt=1 to re-enable nested page tables.
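The module-only route has three easy-to-miss checks: the module must have been built for the running kernel, the original should be backed up, and NPT should be on afterwards. Below is a sketch of those checks as hypothetical helpers. It assumes `modinfo -F vermagic` for reading the module's version magic (whose first word is the kernel release) and the `npt` parameter exposed under /sys/module/kvm_amd/parameters, which reads 1 (or Y on kernels where it is a boolean parameter).

```shell
#!/bin/sh
# Hypothetical helpers for the kvm-amd.ko-only route.

# vermagic_matches VERMAGIC [RELEASE]: the first word of a module's vermagic
# string is the kernel release it was built for; it must match the running
# kernel (default) or the module will be refused at load time.
vermagic_matches() {
    release=${2:-$(uname -r)}
    [ "${1%% *}" = "$release" ]
}

# install_with_backup NEW_KO DEST_DIR: copy NEW_KO into DEST_DIR, saving the
# file it replaces as NAME.orig the first time (later runs keep that backup).
install_with_backup() {
    name=$(basename "$1")
    if [ -e "$2/$name" ] && [ ! -e "$2/$name.orig" ]; then
        cp -p "$2/$name" "$2/$name.orig" || return 1
    fi
    cp "$1" "$2/$name"
}

# npt_enabled [PARAM_FILE]: succeeds if kvm_amd reports nested paging on.
npt_enabled() {
    param_file=${1:-/sys/module/kvm_amd/parameters/npt}
    [ -r "$param_file" ] || return 2
    case $(cat "$param_file") in
        1|Y) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible sequence, run as root from the kernel source directory:
# vermagic_matches "$(modinfo -F vermagic arch/x86/kvm/kvm-amd.ko)" || exit 1
# install_with_backup arch/x86/kvm/kvm-amd.ko /lib/modules/$(uname -r)/kernel/arch/x86/kvm
# depmod -a
# rmmod kvm_amd && modprobe kvm_amd
# npt_enabled && echo "NPT is on"
```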

Configuration Recommendations

I recommend adding

iommu=pt amd_iommu=on

to your kernel parameters (GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then regenerate your GRUB config, e.g. with update-grub on Ubuntu/Debian) on both Ryzen and Threadripper platforms. It may also be necessary to allow unsafe interrupts for the vfio_iommu_type1 kernel module. For the particulars and a step-by-step guide to setting up this type of virtualization, please see our earlier articles. Allowing unsafe interrupts by passing the option to the kernel module is likely the only thing you might have to do beyond those instructions.
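Both settings can be scripted. Here is a rough sketch with hypothetical helper names. It assumes GNU sed (`-i`), a `GRUB_CMDLINE_LINUX_DEFAULT="..."` line in /etc/default/grub, and parameters that contain no `/`. The `allow_unsafe_interrupts` option for vfio_iommu_type1 is written to a modprobe.d file whose name here is only an example.

```shell
#!/bin/sh
# add_grub_params FILE PARAM...: hypothetical helper. Appends each PARAM to
# GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE unless it is already present.
# Assumes GNU sed (-i) and parameters without '/' characters.
add_grub_params() {
    file=$1
    shift
    for p in "$@"; do
        # Already present as a whole space-separated token? Then skip it.
        if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?$p( [^\"]*)?\"" "$file"; then
            continue
        fi
        sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $p\"/" "$file"
    done
    # Drop the leading space left behind if the value started out empty.
    sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=" /GRUB_CMDLINE_LINUX_DEFAULT="/' "$file"
}

# write_vfio_conf FILE: hypothetical; writes the module option that lets
# vfio_iommu_type1 accept unsafe interrupts (only if you need it).
write_vfio_conf() {
    printf 'options vfio_iommu_type1 allow_unsafe_interrupts=1\n' > "$1"
}

# Possible use as root on Ubuntu/Debian:
# add_grub_params /etc/default/grub iommu=pt amd_iommu=on && update-grub
# write_vfio_conf /etc/modprobe.d/vfio-unsafe-interrupts.conf
# On Fedora, regenerate with grub2-mkconfig -o <path to grub.cfg> instead
# (the path differs between BIOS and UEFI installs).
```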

 

Where do we go from here?

Our tested hardware config is a 16-core Threadripper 1950X, Gigabyte Designare X399, G.Skill Trident Z DDR4-3200 B-Die memory, ASUS Strix Fury (host) and a Vega 56 (guest). We are using the Noctua 120mm TR4 Tower Cooler for cooling, and it does a marvelous job.

If you are bored, post your system specs and your before/after benchmarks or framerates when applying the patch. One can never have enough data!

We will be doing a full AMD guide/article once kernel 4.15 drops; it has a lot of patches and improvements for everything in the AMD ecosystem. If you have bought a Threadripper system and are having trouble getting your passed-through peripherals to work, I would encourage you to kindly let AMD know about your hardware configuration so it can be tested:

http://www.amdsurveys.com/se.ashx?s=5A1E27D24DB2311F

I have been working very hard behind the scenes on the issue with board partners, and progress is being made. Right now, Vega 56/64 works well as a guest GPU, although you do have to reboot your host machine (I believe I've seen a fix for this slated for inclusion in kernel 4.15, but I have not yet tested it).