THE RANT / THE SCHPLOG
Schmorp's POD Blog a.k.a. THE RANT
a.k.a. the blog that cannot decide on a name

This document was first published 2016-03-01 05:32:20, and last modified 2016-03-01 05:32:20.

Tidbits: Improving sequential read speed on your Crucial MX SSD

This just in - a single command improved the sequential read speed of my Crucial MX100 512GB SSD by 23%, from 436MiB/s to 534MiB/s on my machine, and similarly on my friend's machine.

The same trick didn't work on my Crucial MX100 256GB disk, though, where it decreased performance from 536MiB/s to 480MiB/s.

This might be due to Linux kernel differences, so experimenting is advised.

The trick

The trick is quite simple: decrease the maximum sequential read command length to 64KiB (here for /dev/sda):

echo 64 >/sys/block/sda/queue/max_sectors_kb

The default varies widely between kernel versions (and can be limited by your AHCI port) and is usually 512 or (much) higher.
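
To see what your kernel currently uses, and what your hardware would allow at most, you can read the corresponding sysfs files (max_hw_sectors_kb is the read-only hardware limit, max_sectors_kb is the tunable one):

cat /sys/block/sda/queue/max_sectors_kb      # current limit in KiB
cat /sys/block/sda/queue/max_hw_sectors_kb   # hardware maximum in KiB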

The above echo limits the amount of data that a single read command can transfer to 64KiB.

The command is harmless when applied to the wrong disk, or with wrong values - all of my rotational disks reach their maximum read speed with values as low as 16.
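
If you want to try it on all your disks anyway, a small loop does the job - the sd[a-z] pattern is only an example and needs adjusting to your device names - and a udev rule can make the setting survive reboots (the rule file name is just a suggestion):

for f in /sys/block/sd[a-z]/queue/max_sectors_kb; do
   echo 64 >"$f"
done

# e.g. /etc/udev/rules.d/60-max-sectors.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/max_sectors_kb}="64"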

Why it works

Well, I don't know why it works, but I can make an educated guess: if you benchmark the disk at various queue depths, you can see that it needs a queue depth of 3 or 4 to reach maximum sequential read performance, likely because the latency of the disk is not that good.
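
If you want to reproduce such a queue depth measurement, an fio run along these lines should do - the parameters are only an example (libaio and O_DIRECT assumed to be available), vary --iodepth and watch the reported bandwidth:

# sequential read at queue depth 4, --readonly so it cannot write to the disk
fio --name=seqread --filename=/dev/sda --readonly --direct=1 \
    --ioengine=libaio --rw=read --bs=1M --iodepth=4 \
    --runtime=30 --time_based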

Under normal circumstances, the kernel will not queue very many read commands, but by reducing the maximum I/O size per request we artificially generate more I/O requests, which apparently queue better.
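
You can watch that effect with iostat from the sysstat package - after lowering max_sectors_kb, the average request size (avgrq-sz in older sysstat versions, areq-sz in newer ones) drops while the requests/s and the average queue size go up:

iostat -x 1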

Why it works worse on my 256GB model is unknown - it's not uncommon for different sizes of an SSD to perform quite differently, but it could also be the kernel: the two 512GB models were used with Linux 3.16.7 and 4.1.18, while the 256GB disk is in a box with Linux 4.4.3, and Linux 4.4 shows very weird I/O speeds on the boxes I have tried it on (including reducing overall throughput on my backup system from about 2.2TB/s with Linux 4.3.5 to less than 500MB/s with 4.4.3).

Drawbacks

The most obvious drawback is that more I/O requests also mean more work for the kernel. At maximum throughput, my system generates roughly 9400 interrupts/s with a maximum size of 64, while it "only" generates about 3000 interrupts/s with a size of 8192.
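
If you want to see the interrupt rate on your own box while the read is running, vmstat prints it once per second, and /proc/interrupts has the raw counters (the SATA controller line is usually called ahci, but that depends on your driver):

vmstat 1                        # the "in" column is interrupts/s
grep -i ahci /proc/interrupts   # raw per-CPU counters for the controller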

Whether this offsets the 23% read speed increase is hard to say: the extra read speed will rarely make any difference in practice, but neither will the extra interrupts.

Random I/O should not be affected by this change in any way, so fortunately we don't have to choose between random and sequential I/O - random I/O would win out in that case.
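
If you want to verify that on your own drive, a random read fio run at both settings should give near identical numbers - again only a sketch with example parameters:

fio --name=randread --filename=/dev/sda --readonly --direct=1 \
    --ioengine=libaio --rw=randread --bs=4k --iodepth=32 \
    --runtime=30 --time_based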

Why only the Crucial MX?

Looking at benchmark numbers, it might work with other drives as well - the Samsung 850 Pro, for example, also needs a queue depth of around 3 to reach optimal performance, and the trick might help there as well, increasing the sequential read speed from 450MiB/s to around 580MiB/s. If you try it out with your model, feel free to drop me a note.
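
For a quick before/after comparison, a crude dd test with direct I/O is probably good enough - this reads the first 4GiB of the disk, and the 512 stands for whatever your previous default was:

echo 512 >/sys/block/sda/queue/max_sectors_kb
dd if=/dev/sda of=/dev/null bs=1M count=4096 iflag=direct
echo 64 >/sys/block/sda/queue/max_sectors_kb
dd if=/dev/sda of=/dev/null bs=1M count=4096 iflag=direct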

I tried this with various hardware raid controllers and rotational media as well, and it doesn't seem to do anything either way (as I would have expected), other than to influence the interrupt rate.

The default read size limit, btw., is 512 on most kernels. Only "recent" kernels (somewhere between 3.19 and 4.1) started to increase this to much higher values, triggering a bunch of firmware bugs at the same time :)