Benchmarking Amazon RDS Graviton2

How much performance will you get with Amazon RDS Graviton2?

In October 2020, Amazon Web Services (AWS) announced new Graviton2-based Amazon RDS instances. According to AWS, these provide up to 35% better performance and up to 52% better price/performance for RDS open source databases, depending on database engine, version, and workload. But what performance and value can you actually expect?

Graviton2

Graviton2 processors are based on 64-bit Arm Neoverse N1 cores. Current M5 instances are based on the Intel® Xeon® Platinum 8175M processor and R5 instances on the Intel® Xeon® Platinum 8175 processor. Besides the differences in architecture and features between the Intel and Arm cores, memory support also differs: the M5 and R5 instance processors support DDR4-2666, while the Arm Neoverse N1 supports DDR4-3200. You can find a more detailed analysis of the Graviton2 processors here.

Benchmark Setup

To get reference data, I ran HammerDB with the TPROC-C benchmark, which is derived from the TPC-C benchmark. As the database, I am using Amazon RDS for PostgreSQL with the Production template. This means 100 GB of storage with 3,000 Provisioned IOPS, but as a Single-AZ deployment. All settings under Additional configuration except encryption are disabled.

As the client, I am using a c5.2xlarge with 8 vCPUs, intended to avoid a client-side bottleneck. The benchmark uses 320 warehouses and up to 32 virtual users, stepping through 1, 2, 4, 8, 16, and 32 users. Each iteration starts with a 5-minute ramp-up, followed by a 15-minute run. As results, we collect the total NOPM and the Cumulative Total for each procedure. NOPM, or New Orders Per Minute, is closely related to the official tpmC metric but should not be compared with tpmC. I ran the benchmarks for M5 vs M6G and R5 vs R6G, from the large instances (2 vCPUs) up to the 8xlarge instances (32 vCPUs).
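To make the size of the test matrix concrete, here is a minimal sketch that enumerates the runs described above (5 sizes per family, Intel vs Graviton2, 6 virtual-user steps) and adds up ramp-up plus run time. It is only an illustration of the schedule. It is not the script used for this post, and it ignores schema build and instance provisioning time. The names `Run` and `build_matrix` are hypothetical.

```python
from dataclasses import dataclass

# Parameters taken from the benchmark setup described above.
WAREHOUSES = 320
VIRTUAL_USERS = [1, 2, 4, 8, 16, 32]
RAMPUP_MIN = 5
DURATION_MIN = 15
SIZES = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]
PAIRS = [("m5", "m6g"), ("r5", "r6g")]  # (Intel, Graviton2)


@dataclass(frozen=True)
class Run:
    """One benchmark iteration (hypothetical helper type)."""
    instance: str
    virtual_users: int
    rampup_min: int = RAMPUP_MIN
    duration_min: int = DURATION_MIN

    @property
    def total_min(self) -> int:
        # The ramp-up is not measured, but it still has to elapse.
        return self.rampup_min + self.duration_min


def build_matrix() -> list[Run]:
    """Enumerate every instance/virtual-user combination."""
    runs = []
    for intel, graviton in PAIRS:
        for size in SIZES:
            for family in (intel, graviton):
                for vu in VIRTUAL_USERS:
                    runs.append(Run(f"{family}.{size}", vu))
    return runs


if __name__ == "__main__":
    matrix = build_matrix()
    hours = sum(r.total_min for r in matrix) / 60
    print(f"{len(matrix)} runs, about {hours:.0f} hours of benchmark time")
```

With 120 runs of 20 minutes each, the measured part alone takes about 40 hours. This is one reason the sketch keeps the schedule as data, so that single instances or user counts can be re-run.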

Benchmark Comparison

While the results show that the M5 and R5 instances perform better in some cases, what matters is price per performance. For example, compare a fast graphics card with a score of 200 to a budget graphics card with a score of 100. If the fast card costs 3,000 USD and the budget card 100 USD, the budget card clearly gives you better value per dollar. The same applies to the RDS instances. The Graviton2 instances are cheaper than their Intel counterparts, so even where they achieve fewer NOPM than the M5 and R5 instances, they can still offer better price per performance. Therefore, we compare only NOPM per USD.
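The graphics card example works out as follows in a small sketch. The helper `score_per_usd` is hypothetical. For the RDS instances, this sketch assumes that NOPM per USD means NOPM divided by the hourly instance price. The post does not spell out the exact formula, so treat that as an assumption.

```python
def score_per_usd(score: float, price_usd: float) -> float:
    """Performance units per USD spent (hypothetical helper)."""
    if price_usd <= 0:
        raise ValueError("price must be positive")
    return score / price_usd


# Graphics card example from the text above.
fast = score_per_usd(200, 3_000)  # about 0.067 points per USD
budget = score_per_usd(100, 100)  # 1.0 point per USD

if __name__ == "__main__":
    print(f"fast card:   {fast:.3f} points per USD")
    print(f"budget card: {budget:.3f} points per USD")
    # For RDS, the same idea (assumed): NOPM / hourly instance price in USD.
```

The budget card delivers half the score but 15 times the score per USD. A cheaper instance can therefore win on NOPM per USD while losing on raw NOPM.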

TPROC-C Workload

Below you can see the TPROC-C schema. It models a company that fulfills customer orders for its products. In the TPROC-C schema, the number of rows in every table except the Item table scales with the number of warehouses you specify. Read more about the TPROC-C schema here.

TPROC-C Schema – Source: https://www.hammerdb.com/docs/ch03s05.html
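To show what "scales with the number of warehouses" means for the 320 warehouses used here, the sketch below multiplies the initial per-warehouse cardinalities from the TPC-C specification. It assumes HammerDB's TPROC-C follows these closely and uses the HammerDB table names. The order_line figure is an average, because each order has between 5 and 15 lines.

```python
# Initial rows per warehouse, following the TPC-C specification.
PER_WAREHOUSE = {
    "warehouse": 1,
    "district": 10,
    "customer": 30_000,
    "history": 30_000,
    "orders": 30_000,
    "new_order": 9_000,
    "order_line": 300_000,  # approximate: about 10 lines per order on average
    "stock": 100_000,
}
# The item table does not depend on the warehouse count.
FIXED = {"item": 100_000}


def row_counts(warehouses: int) -> dict[str, int]:
    """Approximate initial row count per table for a given warehouse count."""
    counts = {table: n * warehouses for table, n in PER_WAREHOUSE.items()}
    counts.update(FIXED)
    return counts


if __name__ == "__main__":
    for table, rows in row_counts(320).items():
        print(f"{table:<10} {rows:>13,}")
```

At 320 warehouses this gives 9.6 million customers, 32 million stock rows, and roughly 96 million order lines. The item table stays at 100,000 rows.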

The TPROC-C workload uses five different transactions. As you can see, the New-order and Payment transactions make up the largest share.

  • New-order (45%)
    • receive a new order from a customer
  • Payment (43%)
    • update the customer's balance to record a payment
  • Delivery (4%)
    • deliver orders asynchronously
  • Order-status (4%)
    • retrieve the status of a customer's most recent order
  • Stock-level (4%)
    • return the status of the warehouse's inventory
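The following sketch draws transactions at random with the weights from the list above, to show how the mix plays out over many transactions. This is a weighted random choice for illustration only. It is not how HammerDB's virtual users actually schedule transactions. The `MIX`, `next_transaction`, and `simulate` names are hypothetical.

```python
import random
from collections import Counter

# Transaction mix in percent, taken from the list above.
MIX = {
    "New-order": 45,
    "Payment": 43,
    "Delivery": 4,
    "Order-status": 4,
    "Stock-level": 4,
}


def next_transaction(rng: random.Random) -> str:
    """Pick one transaction type according to the weights in MIX."""
    return rng.choices(list(MIX), weights=list(MIX.values()), k=1)[0]


def simulate(n: int, seed: int = 42) -> Counter:
    """Count how often each transaction type is drawn in n picks."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(next_transaction(rng) for _ in range(n))


if __name__ == "__main__":
    n = 100_000
    counts = simulate(n)
    for name, count in counts.most_common():
        print(f"{name:<13} {100 * count / n:5.1f}%")
```

Because New-order and Payment together account for 88% of the draws, they dominate any per-procedure total. This is why the detailed results below focus on them.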

Results

Overall Comparison

Looking at the overall results for M5 vs M6G and R5 vs R6G, the Graviton2 instances clearly offer better price per performance.

Overall NOPM per USD for M5 vs M6G (large to 8xlarge)

On average, the M6G instances deliver up to 20% better price per performance. The NOPM per USD drops at 16 and 32 virtual users. This might be caused by an excessive load on the databases or by a client that is too weak, since the client (c5.2xlarge) has only 8 vCPUs.

Overall NOPM per USD for R5 vs R6G (large to 8xlarge)

On average, the R6G instances also deliver up to 20% better price per performance. They reach a lower maximum of 484,007 NOPM per USD at 8 virtual users, compared with a maximum of 656,160 NOPM per USD for the M6G. This is because the R instances cost more per hour. Comparing raw NOPM, the R5 and R6G instances perform better than the M5 and M6G instances, possibly due to their larger memory.
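The two peak values quoted above differ by roughly a third. The following sketch makes that gap explicit using only the numbers from this post. The helper `relative_gap_pct` is hypothetical, and the post attributes the gap to the higher hourly price of the R instances.

```python
def relative_gap_pct(a: float, b: float) -> float:
    """How much larger a is than b, in percent (hypothetical helper)."""
    return (a / b - 1.0) * 100.0


M6G_MAX = 656_160  # maximum NOPM per USD for M6G, from the results above
R6G_MAX = 484_007  # maximum NOPM per USD for R6G (8 virtual users)

if __name__ == "__main__":
    gap = relative_gap_pct(M6G_MAX, R6G_MAX)
    print(f"The M6G peak is {gap:.1f}% higher than the R6G peak")
```

The M6G peak comes out about 35.6% higher. If both families delivered similar raw NOPM, a gap of this size would correspond to a similar difference in hourly price.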

Results per Instance Type for M5 vs M6G

The direct M5 vs M6G comparisons show that the Graviton2 instances offer better overall value for price per performance. As mentioned before, the NOPM score drops toward 32 virtual users, which could be caused by excessive load or an underpowered client.

m5.large vs m6g.large
m5.xlarge vs m6g.xlarge
m5.2xlarge vs m6g.2xlarge
m5.4xlarge vs m6g.4xlarge
m5.8xlarge vs m6g.8xlarge

Results per Instance Type for R5 vs R6G

Most of the memory-optimised R6G instances also perform better than the R5 instances. However, the r6g.4xlarge and r6g.8xlarge show better results only in some cases. The r6g.4xlarge has 16 vCPUs and 128 GB of memory, while the r6g.8xlarge has 32 vCPUs and 256 GB of memory. Our client, the c5.2xlarge, has only 8 vCPUs and 16 GB of memory and might not be able to generate a realistic load. We will look at this inconsistency in more detail below.

r5.large vs r6g.large
r5.xlarge vs r6g.xlarge
r5.2xlarge vs r6g.2xlarge
r5.4xlarge vs r6g.4xlarge (read more below)
r5.8xlarge vs r6g.8xlarge (read more below)
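The concern that the c5.2xlarge client may be too small for the r6g.4xlarge and r6g.8xlarge can be put into numbers. The sketch below uses only the instance sizes stated in this post. The ratios it prints are illustrative rules of thumb, not HammerDB metrics, and the `Instance` and `load_ratios` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """Instance size as stated in the post (hypothetical helper type)."""
    name: str
    vcpus: int
    memory_gb: int


CLIENT = Instance("c5.2xlarge", 8, 16)
SERVERS = [Instance("r6g.4xlarge", 16, 128), Instance("r6g.8xlarge", 32, 256)]
MAX_VIRTUAL_USERS = 32


def load_ratios(server: Instance, client: Instance,
                virtual_users: int) -> tuple[float, float, float]:
    """Return (server/client vCPUs, server/client memory, VUs per client vCPU)."""
    return (
        server.vcpus / client.vcpus,
        server.memory_gb / client.memory_gb,
        virtual_users / client.vcpus,
    )


if __name__ == "__main__":
    for server in SERVERS:
        cpu, mem, vu = load_ratios(server, CLIENT, MAX_VIRTUAL_USERS)
        print(f"{server.name}: {cpu:.0f}x client vCPUs, "
              f"{mem:.0f}x client memory, {vu:.0f} VUs per client vCPU")
```

The r6g.8xlarge has four times the client's vCPUs. At 32 virtual users, each client vCPU has to drive four users, which makes a client-side limit plausible.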

Detailed results

Looking at the details, the m6g.large instance outperforms the m5.large especially for the transactions that make up most of the workload.

Cumulative Total per procedure – m5.large vs m6g.large

The chart scale for the r6g.large only goes up to 700 million, while the scale for the m6g.large reaches around 1,600 million NOPM per USD.

Cumulative Total per procedure – r5.large vs r6g.large

In raw NOPM, the R6G surpasses the M6G for the Payment transactions and is in a similar range for the New-Order transactions, except at 32 virtual users. Again, the R instances are more expensive than the M instances and therefore show a lower NOPM per USD score.

Detailed Results for r6g.4xlarge and r6g.8xlarge

Here, the higher the score, the larger the lead of the Graviton2 instance over the R5 instance. Since Payment and New-Order make up most of the TPROC-C transactions, they give a more thorough picture, and that picture shows the price per performance benefit of the Graviton2 instances. This contradicts the NOPM per USD result above.

Cumulative Total per procedure – r5.4xlarge vs r6g.4xlarge

For the Payment procedure, the r6g.8xlarge even achieves a higher score than the r5.8xlarge. The more transactions are executed, the more obvious it becomes that the Graviton2 instance delivers better price per performance. This also contradicts the NOPM per USD result above.

Cumulative Total per procedure – r5.8xlarge vs r6g.8xlarge

Summary

Overall, the Amazon RDS Graviton2 instances provide better price per performance. When switching from M5 or R5 instances to M6G and R6G, your performance may improve or degrade, but overall you get better value for what you pay. Based on the summed TPROC-C results of the Delivery, New-Order, and Payment procedures, the performance of M6G instances ranged from a decrease of 5.7% to an increase of 14%, while the performance of R6G instances ranged from a decrease of 5.6% to an increase of 15.6%. Keep in mind that your workload will differ from TPROC-C.
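The post does not state exactly how these percentages were calculated. One plausible reading, used here purely as an assumption, is to sum the per-USD cumulative totals of Delivery, New-Order, and Payment for each instance and take the relative change from the Intel instance to the Graviton2 instance. The sketch below implements that reading. The function name is hypothetical, and no benchmark data is included.

```python
# Procedures included in the summary figures above.
PROCEDURES = ("Delivery", "New-Order", "Payment")


def summed_change_pct(baseline: dict[str, float],
                      candidate: dict[str, float]) -> float:
    """Relative change (percent) of the summed per-procedure totals.

    Assumed method: sum the chosen procedures for the baseline (e.g. M5)
    and the candidate (e.g. M6G), then compare the two sums.
    Procedures not listed in PROCEDURES are ignored.
    """
    base = sum(baseline[p] for p in PROCEDURES)
    cand = sum(candidate[p] for p in PROCEDURES)
    return (cand / base - 1.0) * 100.0
```

Summing before comparing weights each procedure by its volume, so New-Order and Payment dominate the result. Averaging per-procedure percentages would instead give the small Delivery procedure equal weight.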

In conclusion, this benchmark shows that PostgreSQL on Graviton2 gives you better overall value for the price you pay per unit of database performance.

Run the Benchmark yourself

If you want to run the benchmark yourself, you can clone the repository sven-hammerdb-benchmark. It is meant to run on an AWS Cloud9 instance and installs all needed dependencies.

References