In my previous post, I described how to setup a highly available GusterFS server environment using terraform. I briefly mentioned the provisions I had taken to ensure high performance as well, but didn't go into details of performance validation. This is what I'll do here.
To recap, the expected maximum performance from the three Gluster servers was:
To measure this performance, we need a sufficient number of Gluster clients and a load generator that is capable of orchestrating a distributed filesystem load in a reliable and reproducible manner. I've been using Vdbench for this purpose for quite some time and it proved to be a reliable tool here once more. To have sufficient power, I used 9 VM.Standard2.24 systems, 3 in each AD. This gave me sufficient CPU and network bandwidth to be sure that the clients would not be the limiting factor in the performance testing. The general architecture is shown in the diagram from my first post.
Vdbench is a very flexible and powerfull load generator for file or disk IO. You can find the user guide on its download page. (It's usually appropriate to use the latest available version.) I will briefly discuss the configuration file I used for this setup:
hd=default,jvms=24 hd=one,system=10.43.9.5,user=opc,shell=ssh hd=two,system=10.43.9.6,user=opc,shell=ssh hd=three,system=10.43.9.7,user=opc,shell=ssh hd=four,system=10.43.9.8,user=opc,shell=ssh hd=five,system=10.43.9.9,user=opc,shell=ssh hd=six,system=10.43.9.19,user=opc,shell=ssh hd=seven,system=10.43.9.18,user=opc,shell=ssh hd=eight,system=10.43.9.17,user=opc,shell=ssh hd=nine,system=10.43.9.16,user=opc,shell=ssh fsd=fsd1,anchor=/gluster/fast/work,depth=1,width=4,files=50000,size=20M,shared=yes fwd=default,xfersize=4k,fileio=random,fileselect=random,threads=350,stopafter=5,rdpct=100 fwd=fwd1,fsd=fsd1,host=one fwd=fwd2,fsd=fsd1,host=two fwd=fwd3,fsd=fsd1,host=three fwd=fwd4,fsd=fsd1,host=four fwd=fwd5,fsd=fsd1,host=five fwd=fwd6,fsd=fsd1,host=six fwd=fwd7,fsd=fsd1,host=seven fwd=fwd8,fsd=fsd1,host=eight fwd=fwd9,fsd=fsd1,host=nine *rd=rd1,fwd=fwd*,fwdrate=max,format=(clean,only),interval=1 rd=rd3,fwd=fwd1,fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd4,fwd=(fwd1-fwd2),fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd5,fwd=(fwd1-fwd3),fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd6,fwd=(fwd1-fwd4),fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd7,fwd=(fwd1-fwd5),fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd8,fwd=(fwd1-fwd6),fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd9,fwd=(fwd1-fwd7),fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd10,fwd=(fwd1-fwd8),fwdrate=max,format=restart,elapsed=120,interval=1 rd=rd11,fwd=(fwd1-fwd9),fwdrate=max,format=restart,elapsed=120,interval=1
A few remarks to explain this configuration:
The read:write ratio for a full run is set by changing the value for "rdpct" in the defaults for fdw. With that, a full run will scale the number of clients from one to nine without modifying any other parameter.
These tests were run on this equipment:
Tests were run with several different workload combinations to show the most common performance characteristics:
The following graphs show the results:
As we can see, write performance reaches approx. 3GB/sec, close to the theoretical maximum estimated earlier. By increasing the read percentage in the workload, the overall throughput increases, as read traffic doesn't need to be replicated three times. The best result (with eight or nine clients) maxes out at just over 8GB/sec. While this doesn't reach the theoretical limit of around 9GB/sec, it is still a respectable result.
Network bandwidth is not the limiting factor for the second test with a small blocksize of just 4kB. In this case, we measure how many network packets can be transfered and filesystem operations can be serviced by this infrastructure. The impact of replication for the write workloads is still visible, but not as significant as with large transfers. The very linear increase per client flatens out slightly when going from eight to nine clients and reaches a maximum of around 70k IOPS for the read-only workload. Remember that this is testing on a distributed filesystem, so IOPS measured here can not be directly mapped (or compared) to raw disk IOPS possible with a raw block volume.
IO Size 4MB | write | IO Size 4MB | read/write | IO Size 4MB | read | |||||
No. of Clients | MB/s | FOPS | No. of Clients | MB/s | FOPS | No. of Clients | MB/s | FOPS | ||
1 | 968 | 242 | 1 | 1595 | 398 | 1 | 1027 | 256 | ||
2 | 1908 | 477 | 2 | 2953 | 738 | 2 | 2125 | 531 | ||
3 | 2850 | 712 | 3 | 4085 | 1021 | 3 | 3169 | 792 | ||
4 | 2925 | 731 | 4 | 5011 | 1252 | 4 | 4181 | 1045 | ||
5 | 2923 | 730 | 5 | 5790 | 1447 | 5 | 5248 | 1312 | ||
6 | 2914 | 728 | 6 | 5873 | 1468 | 6 | 6212 | 1553 | ||
7 | 2864 | 716 | 7 | 5789 | 1447 | 7 | 7279 | 1819 | ||
8 | 2791 | 697 | 8 | 5710 | 1427 | 8 | 8113 | 2028 | ||
9 | 2769 | 692 | 9 | 5593 | 1398 | 9 | 8119 | 2029 | ||
IO Size 4k | write | IO Size 4k | read/write | IO Size 4k | read | |||||
No. of Clients | MB/s | FOPS | No. of Clients | MB/s | FOPS | No. of Clients | MB/s | FOPS | ||
1 | 27 | 6958 | 1 | 28 | 7153 | 1 | 35 | 9036 | ||
2 | 53 | 13577 | 2 | 54 | 13909 | 2 | 65 | 16629 | ||
3 | 76 | 19609 | 3 | 82 | 21071 | 3 | 100 | 25658 | ||
4 | 105 | 26979 | 4 | 107 | 27590 | 4 | 134 | 34410 | ||
5 | 129 | 33197 | 5 | 133 | 34181 | 5 | 165 | 42264 | ||
6 | 152 | 39060 | 6 | 157 | 40273 | 6 | 199 | 51034 | ||
7 | 175 | 44862 | 7 | 182 | 46745 | 7 | 237 | 60702 | ||
8 | 198 | 50737 | 8 | 205 | 52699 | 8 | 266 | 68342 | ||
9 | 218 | 55821 | 9 | 229 | 58743 | 9 | 272 | 69646 |
With these tests, I could confirm the performance estimates that were based on physical capabilities of the used server equipment. They show several things:
Finally a word about price-performance. Based on the OCI cost estimator (prices as of 2019-12-09), the monthly cost for the tested server setup is:
This means you pay approx. $11,500 per month for a full HA configuration with 32TB of net storage capable of 8GBit/sec read throughput.
A similar configuration on AWS, consisting of 3 servers "m5.metal" and 3x 8x4TB volumes of EBS storage are estimated at around $21,642 by the AWS "Simple Monthly Calculator"... As for the expected performance, this would be up to your own testing.
After around 20 years working on SPARC and Solaris, I am now a member of A-Team, focusing on infrastructure on Oracle Cloud.
Previous Post