Amazon Web Services (AWS) announced the new Graviton2-based Amazon RDS instances in October 2020. These can provide up to 35% performance improvement and up to 52% price/performance improvement for RDS open source databases, depending on database engine, version, and workload. But what performance and value can you actually expect?
Graviton2 processors are based on 64-bit Arm Neoverse cores. Current M5 instances are based on Intel® Xeon® Platinum 8175M processors and R5 instances on Intel® Xeon® Platinum 8175 processors. While these Intel cores have different architectures and features, both the M5 and R5 instance processors support DDR4-2666, whereas Arm Neoverse N1 supports DDR4-3200. You can find a more detailed analysis of the Graviton2 processors here.
To get some reference data, I ran benchmarks with HammerDB using the TPROC-C workload, which is derived from the TPC-C benchmark. As the database I am using Amazon RDS with PostgreSQL in Production mode. This means 100 GB of storage with 3000 Provisioned IOPS, but as a Single-AZ deployment. All settings under Additional Configuration except Encryption are disabled.
As the client I am using a c5.2xlarge with 8 vCPUs to avoid a bottleneck. The benchmark uses 320 warehouses and up to 32 virtual users, stepping through 1, 2, 4, 8, 16, and 32. Each iteration starts with a 5-minute ramp-up and then runs for 15 minutes. As results we collect the total NOPM and the cumulative total for each procedure. NOPM, or New Orders per Minute, is closely related to the official tpmC statistic but should not be compared with tpmC. I ran the benchmarks for M5 vs M6G and R5 vs R6G, from the large instances (2 vCPUs) up to the 8xlarge instances (32 vCPUs).
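The run schedule described above can be sketched in a few lines of Python. The virtual-user steps, the ramp-up, and the run time come from the text; the function names are illustrative, and the assumption that the iterations run back to back without a pause is mine, not something the benchmark scripts confirm.

```python
# Values taken from the text; all names are illustrative.
RAMP_UP_MIN = 5                      # ramp-up per iteration, in minutes
RUN_MIN = 15                         # timed run per iteration, in minutes
VIRTUAL_USERS = [1, 2, 4, 8, 16, 32]


def run_plan(virtual_users, ramp_up_min, run_min):
    """Return (virtual_users, start_minute, end_minute) per iteration.

    Assumption: iterations run sequentially with no pause in between.
    """
    plan = []
    start = 0
    for vu in virtual_users:
        end = start + ramp_up_min + run_min
        plan.append((vu, start, end))
        start = end
    return plan


def total_minutes(plan):
    """Total wall-clock minutes covered by the plan."""
    return plan[-1][2] if plan else 0


if __name__ == "__main__":
    plan = run_plan(VIRTUAL_USERS, RAMP_UP_MIN, RUN_MIN)
    for vu, start, end in plan:
        print(f"{vu:>2} virtual users: minute {start} to {end}")
    print("total minutes per instance:", total_minutes(plan))
```

Under that assumption, six iterations of 20 minutes each add up to about two hours of benchmark time per instance, not counting schema creation.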
While the results show that the M5 and R5 instances perform better in some cases, what matters is the price per performance. For example, let's compare a fast graphics card with a score of 200 against a budget graphics card with a score of 100. If the fast graphics card costs 3,000 USD and the budget graphics card 100 USD, the budget card clearly gives you more performance for every dollar spent. The same goes for the RDS instances. The Graviton2 instances are cheaper than their Intel competitors. So even if Graviton2 instances deliver a lower NOPM than the M5 and R5 instances in some cases, they can still give you better value in terms of price per performance. Therefore, we are comparing only the NOPM per USD.
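The graphics card comparison can be worked through as a small Python sketch. The scores and prices are the ones from the example above; the function name is illustrative. For the RDS results, the same division would take NOPM as the score and the instance price as the cost; which price basis was used (for example the hourly on-demand price) is an assumption on my side and is not spelled out here.

```python
def score_per_usd(score: float, price_usd: float) -> float:
    """Return the performance score obtained per US dollar spent."""
    if price_usd <= 0:
        raise ValueError("price_usd must be positive")
    return score / price_usd


# Numbers from the graphics card example in the text.
fast_card = score_per_usd(score=200, price_usd=3000)
budget_card = score_per_usd(score=100, price_usd=100)

if __name__ == "__main__":
    print(f"fast card:   {fast_card:.3f} points per USD")
    print(f"budget card: {budget_card:.3f} points per USD")
    print("budget card is the better value:", budget_card > fast_card)
```

The fast card wins on raw score, but the budget card delivers about 15 times more score per dollar, which is the same reasoning applied to NOPM per USD for the instances.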
Below you can see the TPROC-C schema. It models a company that fulfills customer orders to supply products. In the TPROC-C schema, the number of rows in every table except the Item table scales with the number of warehouses you specify. Read more here at TPROC-C Schema.
The TPROC-C workload uses 5 different transactions. As you can see, the New-order and Payment transactions make up the highest percentage.
- New-order (45%)
  - receive a new order from a customer
- Payment (43%)
  - update the customer's balance to record a payment
- Delivery (4%)
  - deliver orders asynchronously
- Order-status (4%)
  - retrieve the status of the customer's most recent order
- Stock-level (4%)
  - return the status of the warehouse's inventory
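To get a feel for what this mix means in practice, here is a minimal Python sketch that draws transactions at random with the percentages listed above. It only illustrates the proportions and is not how HammerDB itself schedules transactions; the function name and the seed are hypothetical.

```python
import random
from collections import Counter

# Percentages from the list above; they add up to 100.
TRANSACTION_MIX = {
    "New-order": 45,
    "Payment": 43,
    "Delivery": 4,
    "Order-status": 4,
    "Stock-level": 4,
}


def sample_transactions(n: int, seed: int = 0) -> Counter:
    """Draw n transaction types at random, weighted by the mix above."""
    rng = random.Random(seed)  # fixed seed so repeated runs match
    names = list(TRANSACTION_MIX)
    weights = list(TRANSACTION_MIX.values())
    return Counter(rng.choices(names, weights=weights, k=n))


if __name__ == "__main__":
    counts = sample_transactions(10_000)
    for name, count in counts.most_common():
        print(f"{name:<13}{count:>6}  ({count / 10_000:.1%})")
```

With a large enough sample, New-order and Payment together account for roughly 88% of all transactions, which is why their results carry the most weight in the comparisons that follow.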
If we look at the overall results for M5 vs M6G and R5 vs R6G instances, it is clear that the Graviton2 instances offer a better price per performance.
On average, the M6G instances perform up to 20% better in terms of price per performance. We can see here that the NOPM per USD drops at 16 and 32 virtual users. This might be due to too high a workload on the databases or too weak a client: each client (c5.2xlarge) has only 8 vCPUs.
The R6G instances also perform up to 20% better on average for price per performance. They achieve a lower maximum score of 484007 NOPM per USD at a load of 8 virtual users, versus 656160 NOPM per USD for the M6G. This is because the R-instances cost more per hour. If we compare the NOPM directly, the R5 and R6G instances perform better than the M5 and M6G instances. This may be due to their larger memory.
Results per Instance Type for M5 vs M6G
The results of the direct M5 vs M6G comparisons show that the Graviton2 instances deliver better overall value in terms of price per performance. As I mentioned before, the NOPM score drops towards 32 virtual users, which could be due to too high a load or too weak a client.
Results per Instance Type for R5 vs R6G
The majority of the memory optimised R6G instances also perform better than the R5 instances. However, the r6g.4xlarge and r6g.8xlarge show better results only in some cases. The r6g.4xlarge has 16 vCPUs and 128 GB memory, while the r6g.8xlarge has 32 vCPUs and 256 GB memory. Our client, the c5.2xlarge, has only 8 vCPUs and 16 GB memory and might not be able to generate a realistic load. We will look at this inconsistency in more detail below.
In detail, we see that the m6g.large instance performs especially well for the workloads that have the most transactions.
The scale of the r6g.large instance only goes up to 700 million, while the scale of the m6g.large goes up to around 1,600 million NOPM per USD.
In terms of NOPM, the R6G surpasses the M6G for the Payment transactions but is in a similar range for the New-Order transactions, except at 32 virtual users. Again, the R-instances are more expensive than the M-instances and as a result show a lower NOPM per USD score.
Detailed Results for r6g.4xlarge and r6g.8xlarge
As we can see here, the greater the score gets, the greater the lead of the Graviton2 instance over the R5 instance. Also, since Payment and New-Order make up the largest share of the TPROC-C transactions, they give us a more thorough result that shows the price per performance benefit of the Graviton2 instances. This contradicts the NOPM per USD result above.
The r6g.8xlarge shows an even greater score for the Payment procedure compared to the r5.8xlarge. So the more transactions are executed, the more obvious it is that the Graviton2 delivers a better price per performance. This also contradicts the NOPM per USD result above.
Overall, the Amazon RDS Graviton2 instances provide a better price per performance. When switching from your M5 or R5 instances to M6G and R6G, your raw performance may go up or down, but overall you will get better value for what you pay. Based on the summed TPROC-C results of the Delivery, New-Order, and Payment workloads, performance on M6G instances can decrease by up to 5.7% but also increase by up to 14%, while performance on R6G instances can decrease by up to 5.6% and increase by up to 15.6%. Keep in mind that your workload will differ from TPROC-C.
In conclusion, as this benchmark showed, you will get overall a better value for the price you pay per database performance with PostgreSQL on Graviton2.
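If you collect your own numbers, the percentage ranges quoted above are plain relative changes against the Intel baseline. A minimal Python sketch follows, with made-up totals that are purely hypothetical and are not measurements from this benchmark.

```python
def relative_change_pct(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline.

    Positive means the candidate (e.g. a Graviton2 instance) scored higher.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical summed totals, for illustration only.
    intel_total = 1_000_000
    graviton_total = 1_140_000
    change = relative_change_pct(intel_total, graviton_total)
    print(f"change vs. baseline: {change:+.1f}%")
```

A negative value would correspond to the cases where the Graviton2 instance processed fewer transactions than its Intel counterpart.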
Run the Benchmark yourself
If you want to run the benchmark yourself, you can clone the repository sven-hammerdb-benchmark. It is meant to run on an AWS Cloud9 instance and will install all needed dependencies.
- Benchmark Scripts for HammerDB on Cloud9
- AWS Blog Post Amazon RDS Graviton2 performance
- Amazon’s Arm-based Graviton2 Against AMD and Intel: Comparing Cloud Compute
- Arm Neoverse N1
- AWS and Intel
- AWS re:invent 2019 – Graviton2 Cores
- Intel® Xeon® Platinum 8175M Processors
- Intel® Xeon® Platinum 8175 Processors
- HammerDB – About – Does HammerDB implement a real TPC-C or TPC-H benchmark?
- Set Warehouses to cores times 10
- Why both TPM and NOPM Performance Metrics?
- Comparing HammerDB results – NOPM
- Amazon RDS – Instance Types
- HammerDB – Understanding the TPROC-C workload derived from TPC-C