Overview
This article presents the following FIX Antenna Java performance improvements:
- FIX Antenna Java 2.17.0 adds the ability to pin a session's threads to a specific CPU core (affinity). The new affinity option is enabled in the optimized configuration.
- A new optimized.onload configuration, which runs with the Solarflare OpenOnload network stack activated, was added to the FIX Antenna Java performance tests.
How to measure FIXAJ performance using the FIXAJ package sample
Two Linux machines should be used for the test, one as the Sender Host (Client), the other as the Receiver Host (Server).
FIX Antenna Java performance can be measured by running the runClientRoundTripLatencyBM and runServerRoundTripLatencyBM scripts included in the FA Java package.
The engine.properties configuration file is located in the benchmarks\etc\benchmark\latency folder.
You can reproduce the measurement on your own hardware by following these steps:
- Go to the benchmarks folder.
- On the Receiver Host, run the runServerRoundTripLatencyBM script.
- On the Sender Host, open the runClientRoundTripLatencyBM script, change the IP address from localhost to the IP address of the Receiver Host, and run the script.
- If a Solarflare network interface controller is installed on the machine, you can activate Solarflare OpenOnload by running the scripts as follows instead:
  - onload --profile=latency ./runServerRoundTripLatencyBM on the Receiver Host
  - onload --profile=latency ./runClientRoundTripLatencyBM on the Sender Host
- After the test is completed, the latency.csv file with the raw data is created on the Sender Host in the same folder.
The out-of-the-box engine.properties file can be used for the optimized and optimized.onload configurations. For the balanced configuration, the engine.properties file should be empty.
The source code can be found in the RoundTripServer.java and RoundTripTester.java samples of the FA Java package (benchmarks\src\com\epam\benchmark\latency).
Environment
Sender Host:
- Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz (2 CPU Hyper-Threading Enabled, 24 Cores)
- RAM 128 GB, 2133 MHz
- NIC Solarflare Communications SFC9120 (Firmware-version: 4.2.2.1003 rx1 tx1)
- Linux (CentOS 7.0.1406 kernel 3.10.0-123.el7.x86_64)
- Solarflare driver version: 4.1.0.6734a
Receiver Host:
- Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (2 CPU Hyper-Threading Enabled, 20 Cores)
- RAM 128 GB, 2133 MHz
- NIC Solarflare Communications SFC9120 (Firmware-version: 4.2.2.1003 rx1 tx1)
- Linux (CentOS 7.0.1406 kernel 3.10.0-123.el7.x86_64)
- Solarflare driver version: 4.1.0.6734a
Test scenario
The test scenario is the following:
- Two test servers are connected via a 10 Gb link.
- An Initiator FIX session is established on the Sender Host, and an Acceptor FIX session is established on the Receiver Host.
- The Initiator sends a New Order Single (35=D) message to the Acceptor; the Acceptor receives, validates and parses the message and sends an Execution Report (35=8) message back to the Initiator.
Round-trip time (RTT) is measured during the test. The first timestamp, t1, is taken before the Initiator sends the message; the second, t2, is taken after the Initiator has parsed the received reply.
RTT = t2 - t1.
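The timestamp placement described above can be sketched in plain Java. This is only an illustration, not the code from RoundTripTester.java: the RttRecorder class and its method names are hypothetical, and System.nanoTime() is assumed as the clock.

```java
import java.util.Arrays;

/** Hypothetical helper: stores RTT = t2 - t1 for each request/reply pair. */
final class RttRecorder {
    private final long[] samplesNanos;
    private int count;
    private long t1;

    RttRecorder(int capacity) {
        // Preallocated so that no memory is allocated while measuring.
        samplesNanos = new long[capacity];
    }

    /** Call immediately before the Initiator sends the message (t1). */
    void beforeSend() {
        t1 = System.nanoTime();
    }

    /** Call right after the Initiator has parsed the reply (t2); stores t2 - t1. */
    void afterReplyParsed() {
        long t2 = System.nanoTime();
        if (count < samplesNanos.length) {
            samplesNanos[count++] = t2 - t1;
        }
    }

    /** Returns a copy of the recorded samples, in nanoseconds. */
    long[] samples() {
        return Arrays.copyOf(samplesNanos, count);
    }

    public static void main(String[] args) {
        RttRecorder recorder = new RttRecorder(3);
        for (int i = 0; i < 3; i++) {
            recorder.beforeSend();
            // Placeholder for: send 35=D, wait for 35=8, parse it.
            recorder.afterReplyParsed();
        }
        for (long rttNanos : recorder.samples()) {
            System.out.println("RTT, microseconds: " + rttNanos / 1000.0);
        }
    }
}
```

System.nanoTime() is used rather than wall-clock time because it is monotonic within one JVM, which is sufficient here since both timestamps are taken on the Sender Host.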
Test configurations
Properties | | Balanced | Optimized | Optimized.onload |
---|---|---|---|---|
Nagle's algorithm¹ | | ✔ | | |
Affinity² | | | ✔ | ✔ |
Message validation parameters | validateCheckSum | ✔ | | |
 | validateGarbledMessage | ✔ | | |
Storage type | | Persistent | In memory | In memory |
Queue type | | Persistent | In memory | In memory |
Solarflare OpenOnload³ | | | | ✔ |
¹ Nagle's algorithm: an algorithm that reduces the number of packets sent over the network by combining several small outgoing messages and sending them all at once.
² Affinity: session threads are pinned to a specific CPU core.
³ Solarflare OpenOnload: the kernel bypass technique by Solarflare that is activated within the test.
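For reference, plain Java sockets control Nagle's algorithm through the standard TCP_NODELAY option. The sketch below only illustrates that option with java.net.Socket; it does not show how FIX Antenna Java applies it internally, and the NagleDemo class name is made up for the example.

```java
import java.io.IOException;
import java.net.Socket;

final class NagleDemo {
    /** Creates an unconnected socket with Nagle's algorithm switched on or off. */
    static Socket newSocket(boolean nagleEnabled) throws IOException {
        Socket socket = new Socket();
        // TCP_NODELAY = true disables Nagle's algorithm, so small writes are sent immediately.
        socket.setTcpNoDelay(!nagleEnabled);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket lowLatency = newSocket(false)) {
            System.out.println("TCP_NODELAY: " + lowLatency.getTcpNoDelay());
        }
    }
}
```

Disabling Nagle's algorithm trades a higher packet count for lower per-message delay, which is why it is typically switched off in latency-oriented setups.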
The following descriptions will help you choose the most relevant FIX Antenna Java configuration:
- Balanced. A good starting point that balances performance and security. Basic validation options are enabled to prevent processing of corrupted messages.
- Optimized. Aimed at maximum performance. All validation options and message persistence are disabled, so it should be used only in a fully controlled environment. Affinity masks are also used in this configuration.
- Optimized.onload. The Optimized configuration run with Solarflare OpenOnload, a kernel bypass technique that optimizes the TCP stack. This configuration can be applied only if a Solarflare network interface controller is installed on the machine.
Results
FIX Antenna Java 2.17.0 configurations comparison
Configuration | Balanced | Optimized | Optimized.onload |
---|---|---|---|
RTT values (microseconds) | | | |
Min | 37.9 | 26.2 | 10.5 |
Max | 1894.5 | 1011.8 | 1928.0 |
Average | 46.0 | 28.6 | 11.3 |
RTT distribution (percentiles) | | | |
50% | 43.9 | 27.7 | 10.9 |
95% | 63.1 | 30.8 | 12.0 |
99% | 72.4 | 52.2 | 13.8 |
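Summary rows of this kind can be derived from raw RTT samples along the following lines. This is a minimal sketch: the sample values are made up, the RttStats class is hypothetical, and the nearest-rank percentile definition is an assumption, since the article does not state which percentile method the benchmark uses.

```java
import java.util.Arrays;

final class RttStats {
    /** Nearest-rank percentile (0 < p <= 100) of an array sorted in ascending order. */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up RTT samples in microseconds, for illustration only.
        double[] rtt = {5.0, 1.0, 4.0, 2.0, 3.0};
        double[] sorted = rtt.clone();
        Arrays.sort(sorted);

        double average = Arrays.stream(sorted).average().orElse(Double.NaN);
        System.out.printf("Min %.1f, Max %.1f, Average %.1f%n",
                sorted[0], sorted[sorted.length - 1], average);
        for (double p : new double[] {50, 95, 99}) {
            System.out.printf("%.0f%% -> %.1f%n", p, percentile(sorted, p));
        }
    }
}
```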
FIX Antenna Java 2.17.0 vs FIX Antenna Java 2.15.27
Complete results of performance testing for FIX Antenna Java 2.15.27 may be found on the FIX Antenna Java 2.15.27 Benchmarks page.
Balanced configurations haven't changed since the previous measurement.
The following comparison shows the effect of the new affinity option added to the FIX Antenna Java 2.17.0 optimized configuration.
Configuration | Optimized (2.17.0) | Optimized (2.15.27) |
---|---|---|
RTT values (microseconds) | | |
Min | 26.2 | 27.2 |
Max | 1011.8 | 1200.7 |
Average | 28.6 | 31.1 |
RTT distribution (percentiles) | | |
50% | 27.7 | 29.5 |
95% | 30.8 | 39.3 |
99% | 52.2 | 57.6 |
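To put the table in relative terms, the reduction from 2.15.27 to 2.17.0 can be computed as (old - new) / old. The short sketch below does this for the Average, 95% and 99% rows using the values from the table; the RttDelta class is only an illustrative helper.

```java
final class RttDelta {
    /** Relative reduction, in percent, when a value drops from before to after. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Values in microseconds, copied from the Optimized (2.15.27) and Optimized (2.17.0) columns.
        String[] rows = {"Average", "95%", "99%"};
        double[] v21527 = {31.1, 39.3, 57.6};
        double[] v2170 = {28.6, 30.8, 52.2};
        for (int i = 0; i < rows.length; i++) {
            System.out.printf("%s: %.1f%% lower%n",
                    rows[i], reductionPercent(v21527[i], v2170[i]));
        }
    }
}
```

With these inputs the reduction is roughly 8% for the average, 22% for the 95th percentile and 9% for the 99th percentile.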
FIX Antenna Java 2.17.0 vs QuickFIX Java 1.6.3
The benchmark code was ported to QuickFIXJ.
The results are compared in the following table:
Configuration | FIXAJ Balanced | FIXAJ Optimized | QuickFIXJ Default | QuickFIXJ Optimized |
---|---|---|---|---|
RTT values (microseconds) | | | | |
Min | 37.9 | 26.2 | 85.7 | 70.0 |
Max | 1894.5 | 1011.8 | 38917.9 | 43565.8 |
Average | 46.0 | 28.6 | 258.0 | 242.6 |
RTT distribution (percentiles) | | | | |
50% | 43.9 | 27.7 | 113.5 | 114.3 |
95% | 63.1 | 30.8 | 182.1 | 164.9 |
99% | 72.4 | 52.2 | 2107.0 | 1336.4 |
The default and optimized QuickFIXJ configurations differ in the storage type they use. The corresponding application argument is specified when the performance test is run for each case.
The QuickFIXJ Default configuration uses the persistent storage type. In the QuickFIXJ Optimized configuration, messages are not persisted, which forces QFJ to always send GapFills instead of resending messages.