Overview

This article presents the following FIX Antenna Java performance improvements:

  1. FIX Antenna Java 2.17.0 adds the ability to pin a session's thread to a specific CPU (affinity). The new affinity option is enabled in the optimized configuration.
  2. A new optimized.onload configuration, which activates the Solarflare OpenOnload network stack, was added to the FIX Antenna Java performance tests.

How to measure FIXAJ performance using the FIXAJ package sample

Two Linux machines should be used for the test, one as the Sender Host (Client), the other as the Receiver Host (Server).

FIX Antenna Java performance can be measured by running the runClientRoundTripLatencyBM and runServerRoundTripLatencyBM scripts contained in the FA Java package.

The engine.properties configuration file can be found in the benchmarks\etc\benchmark\latency folder.

You can reproduce the measurement on your own hardware using the following instructions:

  1. Go to the benchmarks folder.
  2. On the Receiver Host, execute the runServerRoundTripLatencyBM script.
  3. On the Sender Host, open the runClientRoundTripLatencyBM script, change the IP address from localhost to the IP address of the Receiver Host, and run the script.
  4. If a Solarflare network interface controller is installed on each machine, Solarflare OpenOnload can be activated. To do so, execute the scripts as follows instead:
    On the Receiver Host: onload --profile=latency ./runServerRoundTripLatencyBM
    On the Sender Host: onload --profile=latency ./runClientRoundTripLatencyBM
  5. After the test completes, the latency.csv file with the raw data is created on the Sender Host in the same folder.


The out-of-the-box engine.properties file may be used for the optimized and optimized.onload configurations. For the balanced configuration, the engine.properties file should be empty.

The source code can be found in the RoundTripServer.java and RoundTripTester.java samples of the FA Java package (benchmarks\src\com\epam\benchmark\latency).

Environment

Sender Host:

  • Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz (2 CPUs, Hyper-Threading enabled, 24 cores)
  • RAM 128 GB, 2133 MHz
  • NIC Solarflare Communications SFC9120 (firmware version: 4.2.2.1003 rx1 tx1)
  • Linux (CentOS 7.0.1406, kernel 3.10.0-123.el7.x86_64)
  • Solarflare driver version: 4.1.0.6734a

Receiver Host:

  • Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.10GHz (2 CPUs, Hyper-Threading enabled, 20 cores)
  • RAM 128 GB, 2133 MHz
  • NIC Solarflare Communications SFC9120 (firmware version: 4.2.2.1003 rx1 tx1)
  • Linux (CentOS 7.0.1406, kernel 3.10.0-123.el7.x86_64)
  • Solarflare driver version: 4.1.0.6734a


Test scenario

The test scenario is the following:

  • The two test servers are connected via a 10 Gb/s link.
  • An Initiator FIX session is established on the Sender Host, and an Acceptor FIX session is established on the Receiver Host.
  • The Initiator sends a New Order Single (35=D) message to the Acceptor; the Acceptor receives, validates and parses the message and sends an Execution Report (35=8) message back to the Initiator.

During the test, the round-trip time (RTT) is measured. The first timestamp, t1, is taken before the Initiator sends the message; the second, t2, is taken after the Initiator has parsed the received message.

RTT = t2 - t1.
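
The timing rule above can be sketched in plain Java. The example below is only an illustration under stated assumptions: it replaces the FIX sessions with a raw TCP echo over the loopback interface, and the class name RttSketch, the message count and the payload size are hypothetical. It shows where t1 and t2 are taken; it does not reproduce the FIX Antenna benchmark or its parsing step.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class RttSketch {

    /** Echoes `count` fixed-size messages back to the sender (stands in for the Acceptor). */
    private static void echo(ServerSocket server, int count, int size) {
        try (Socket peer = server.accept()) {
            DataInputStream in = new DataInputStream(peer.getInputStream());
            OutputStream out = peer.getOutputStream();
            byte[] buf = new byte[size];
            for (int i = 0; i < count; i++) {
                in.readFully(buf);
                out.write(buf);
                out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns one RTT sample (t2 - t1, in nanoseconds) per message sent over loopback. */
    static long[] measure(int count, int size) {
        long[] rtt = new long[count];
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread acceptor = new Thread(() -> echo(server, count, size));
            acceptor.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                DataInputStream in = new DataInputStream(client.getInputStream());
                OutputStream out = client.getOutputStream();
                byte[] msg = new byte[size];
                for (int i = 0; i < count; i++) {
                    long t1 = System.nanoTime(); // taken before the message is sent
                    out.write(msg);
                    out.flush();
                    in.readFully(msg);           // wait for the complete reply
                    long t2 = System.nanoTime(); // taken after the reply has been read
                    rtt[i] = t2 - t1;
                }
            }
            acceptor.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return rtt;
    }

    public static void main(String[] args) {
        long[] rtt = measure(10_000, 128);
        long sum = 0;
        for (long v : rtt) {
            sum += v;
        }
        System.out.printf("average RTT: %.1f microseconds%n", sum / (double) rtt.length / 1_000.0);
    }
}
```

System.nanoTime() is used because it is a monotonic clock intended for measuring elapsed time, which matches the timer used in the QuickFIXJ port shown later in this article.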

Test configurations

| Properties | Balanced | Optimized | Optimized.onload |
|---|---|---|---|
| Nagle's algorithm [1] | | | |
| Affinity [2] | | | |
| Message validation parameters: validateCheckSum | | | |
| Message validation parameters: validateGarbledMessage | ✔ | | |
| Storage type | Persistent | In memory | In memory |
| Queue type | Persistent | In memory | In memory |
| Solarflare OpenOnload [3] | | | |

[1] Nagle's algorithm: an algorithm that reduces the number of packets sent over the network by combining several small outgoing messages and sending them all at once.

[2] Affinity: session threads are pinned to a specific CPU core.

[3] Solarflare OpenOnload: the kernel bypass technique by Solarflare that is activated for the test.
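
For readers unfamiliar with how Nagle's algorithm is switched on or off, the following minimal sketch shows the standard Java socket option, TCP_NODELAY, that controls it. It is an assumption-free use of java.net.Socket, but it is not taken from FIX Antenna Java: the class name NagleSketch is hypothetical, and how the engine exposes this setting in engine.properties is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class NagleSketch {

    /** Opens a loopback connection, applies the requested TCP_NODELAY value and reads it back. */
    static boolean applyNoDelay(boolean noDelay) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            // true  -> Nagle's algorithm is off: small writes are sent immediately
            // false -> Nagle's algorithm is on: small writes may be combined
            client.setTcpNoDelay(noDelay);
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + applyNoDelay(true));
    }
}
```

Disabling Nagle's algorithm generally trades a higher packet count for lower per-message latency, which is why it matters for a round-trip benchmark with small messages.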


The following descriptions will help you choose the most suitable FIX Antenna Java configuration:

  • Balanced. A good starting point that balances performance and safety. Basic validation options are enabled to prevent corrupted messages from being processed.
  • Optimized. A configuration aimed at maximum performance. All validation options and message persistence are disabled, so it should be used only in a fully controlled environment. This configuration also uses affinity masks.
  • Optimized.onload. The Optimized configuration run with the Solarflare OpenOnload kernel bypass technique, which optimizes the TCP stack. This configuration can be applied only if a Solarflare network interface controller is installed on the machine.
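
To make concrete what a checksum validation option guards against, the sketch below applies the standard FIX CheckSum (tag 10) rule: the sum of all bytes up to and including the delimiter before the 10= field, modulo 256, written as three digits. This is an illustration of the protocol rule only; the class and method names are hypothetical, and it is not FIX Antenna Java's internal implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class FixChecksumSketch {
    private static final String SOH = "\u0001"; // FIX field delimiter

    /** Sum of all bytes modulo 256, rendered as three digits (the FIX CheckSum format). */
    static String checkSumOf(String text) {
        int sum = 0;
        for (byte b : text.getBytes(StandardCharsets.ISO_8859_1)) {
            sum += b & 0xFF;
        }
        return String.format(Locale.ROOT, "%03d", sum % 256);
    }

    /** Appends the CheckSum (10) trailer to a message that already ends with SOH. */
    static String appendCheckSum(String messageWithoutTrailer) {
        return messageWithoutTrailer + "10=" + checkSumOf(messageWithoutTrailer) + SOH;
    }

    /** Returns true if the declared CheckSum (10) matches the bytes that precede it. */
    static boolean hasValidCheckSum(String message) {
        int trailer = message.lastIndexOf(SOH + "10=");
        if (trailer < 0 || !message.endsWith(SOH)) {
            return false;
        }
        String covered = message.substring(0, trailer + 1); // up to and including the SOH
        String declared = message.substring(trailer + 4, message.length() - 1);
        return declared.equals(checkSumOf(covered));
    }

    public static void main(String[] args) {
        // Hypothetical minimal message; the header is shortened for readability.
        String message = appendCheckSum("8=FIX.4.4" + SOH + "9=5" + SOH + "35=0" + SOH);
        System.out.println(hasValidCheckSum(message));
        System.out.println(hasValidCheckSum(message.replace("35=0", "35=1")));
    }
}
```

Skipping this check saves a pass over every incoming message, which is the kind of saving the Optimized configuration accepts in exchange for requiring a fully controlled environment.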


Results

FIX Antenna Java 2.17.0 configurations comparison

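| Configuration | Balanced | Optimized | Optimized.onload |
|---|---|---|---|
| RTT values (microseconds) | | | |
| Min | 37.9 | 26.2 | 10.5 |
| Max | 1894.5 | 1011.8 | 1928.0 |
| Average | 46.0 | 28.6 | 11.3 |
| RTT distribution (percentiles) | | | |
| 50% | 43.9 | 27.7 | 10.9 |
| 95% | 63.1 | 30.8 | 12.0 |
| 99% | 72.4 | 52.2 | 13.8 |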
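
Statistics of this kind can be derived from a list of raw RTT samples. The sketch below is one possible way to do it and rests on assumptions: the article does not state how its percentiles were calculated, so the nearest-rank method is used here, and the sample values in main are hypothetical rather than measured data. The class name RttStatsSketch is also hypothetical.

```java
import java.util.Arrays;

final class RttStatsSketch {

    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static double percentile(double[] sortedAscending, double p) {
        int n = sortedAscending.length;
        if (n == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p * n / 100.0);
        return sortedAscending[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical RTT samples in microseconds, not measured data.
        double[] rtt = {12.0, 10.5, 11.3, 10.9, 13.8, 11.0, 10.7, 25.4, 11.1, 10.8};
        double[] sorted = rtt.clone();
        Arrays.sort(sorted);

        double average = Arrays.stream(sorted).average().orElse(Double.NaN);
        System.out.printf("min=%.1f max=%.1f avg=%.1f%n", sorted[0], sorted[sorted.length - 1], average);
        System.out.printf("50%%=%.1f 95%%=%.1f 99%%=%.1f%n",
                percentile(sorted, 50), percentile(sorted, 95), percentile(sorted, 99));
    }
}
```

Reporting percentiles next to the average is useful for latency data because a few large outliers, such as the Max values in the table, can pull the average away from what most messages experience.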

FIX Antenna Java 2.17.0 vs FIX Antenna Java 2.15.27

Complete results of performance testing for FIX Antenna Java 2.15.27 may be found on the FIX Antenna Java 2.15.27 Benchmarks page.

The balanced configuration has not changed since the previous measurement.

The following comparison shows the effect of the new affinity option added to the FIX Antenna Java 2.17.0 optimized configuration.

| Configuration | Optimized (2.17.0) | Optimized (2.15.27) |
|---|---|---|
| RTT values (microseconds) | | |
| Min | 26.2 | 27.2 |
| Max | 1011.8 | 1200.7 |
| Average | 28.6 | 31.1 |
| RTT distribution (percentiles) | | |
| 50% | 27.7 | 29.5 |
| 95% | 30.8 | 39.3 |
| 99% | 52.2 | 57.6 |

FIX Antenna Java 2.17.0 vs QuickFIX Java 1.6.3

The benchmark code was ported to QuickFIXJ:

Sender Host
import quickfix.ApplicationAdapter;
import quickfix.FieldNotFound;
import quickfix.IncorrectDataFormat;
import quickfix.IncorrectTagValue;
import quickfix.Message;
import quickfix.SessionID;
import quickfix.UnsupportedMessageType;
import quickfix.field.MsgSeqNum;
import quickfix.field.MsgType;

public final class RoundTripTester extends ApplicationAdapter {
    ...
    private void sendMessage() {
        workMessage.getHeader().setInt(MsgSeqNum.FIELD, workMessageSequence++);
        // t1: taken immediately before the message is handed to the session
        start = System.nanoTime();
        session.send(workMessage);
    }

    @Override
    public void fromApp(Message message, SessionID sessionId) throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType {
        // t2: taken once the parsed reply reaches the application callback
        long end = System.nanoTime();
        String type = message.getHeader().getString(MsgType.FIELD);
        if (workMessageType.equals(type)) {
            final long currentLatency = end - start;
            latencies[counter++] = currentLatency;
            if (limitEnabled) {
                // Rate limiting: spin (busy-wait) until the next scheduled send time
                long nextSendTime = testStart + (counter * sendIntervalMsec);
                while (true) {
                    long currentTime = System.currentTimeMillis();
                    long toSleep = nextSendTime - currentTime;
                    if (toSleep <= 0) {
                        break;
                    }
                }
            }

            if (counter == NO_OF_MESSAGES) {
                // print results
            } else {
                sendMessage();
            }
        } else if (warmUpMessageType.equals(type)) {
            // Warm-up replies are counted but not measured
            warmUpCounter++;
            if (warmUpCounter == WARM_UP_CYCLE_COUNT) {
                log.info("Warm up cycle has been done");
                testStart = System.currentTimeMillis();
                sendMessage();
            }
        }
    }
    ...
Receiver Host
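import quickfix.ApplicationAdapter;
import quickfix.Message;
import quickfix.Session;
import quickfix.SessionID;
import quickfix.field.ClientID;
import quickfix.field.ExecID;
import quickfix.field.MsgType;
import quickfix.field.OrderID;
import quickfix.field.OrderQty;
import quickfix.field.Price;
import quickfix.field.TransactTime;

public final class RoundTripServer extends ApplicationAdapter {
    ...
    @Override
    public void fromApp(Message message, SessionID sessionId) {
        try {
            String type = message.getHeader().getString(MsgType.FIELD);
            if ("D".equals(type)) {
                // New Order Single: copy the order fields into the reusable Execution Report
                executionReport.setString(ClientID.FIELD, message.getString(ClientID.FIELD));
                executionReport.setString(Price.FIELD, message.getString(Price.FIELD));
                executionReport.setString(OrderQty.FIELD, message.getString(OrderQty.FIELD));
                executionReport.setString(TransactTime.FIELD, message.getString(TransactTime.FIELD));
                executionReport.setInt(ExecID.FIELD, ++execID);
                executionReport.setInt(OrderID.FIELD, ++orderID);
                Session.sendToTarget(executionReport, sessionId);
            } else {
                // Any other message type (for example, warm-up messages) is echoed back unchanged
                Session.sendToTarget(message, sessionId);
            }
        } catch (Exception e) {
            log.error(e.getMessage(), e);
        }
    }
    ...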

Results can be compared in the following table:

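| Configuration | FIXAJ Balanced | FIXAJ Optimized | QuickFIXJ Default | QuickFIXJ Optimized |
|---|---|---|---|---|
| RTT values (microseconds) | | | | |
| Min | 37.9 | 26.2 | 85.7 | 70.0 |
| Max | 1894.5 | 1011.8 | 38917.9 | 43565.8 |
| Average | 46.0 | 28.6 | 258.0 | 242.6 |
| RTT distribution (percentiles) | | | | |
| 50% | 43.9 | 27.7 | 113.5 | 114.3 |
| 95% | 63.1 | 30.8 | 182.1 | 164.9 |
| 99% | 72.4 | 52.2 | 2107.0 | 1336.4 |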

The default and optimized QuickFIXJ configurations differ in the storage type used. The storage type is selected through an application argument specified when the performance test is run.

The QuickFIXJ Default configuration uses persistent storage. In the QuickFIXJ Optimized configuration, messages are not persisted, which forces QFJ to always send GapFills instead of resending the original messages.
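
As background on what sending a GapFill means at the protocol level: when a counterparty requests a resend and the original messages are not available, the engine can answer with a single Sequence Reset message (35=4) with GapFillFlag (123) set to Y, whose NewSeqNo (36) tells the peer which sequence number to expect next. The sketch below builds such a message as a raw string. It is an illustration of the FIX message layout only, not QuickFIXJ code; the class name, the CompIDs, the timestamp and the FIX.4.4 version are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class GapFillSketch {
    private static final String SOH = "\u0001"; // FIX field delimiter

    /** Builds a Sequence Reset (GapFill) covering sequence numbers beginSeqNo up to newSeqNo - 1. */
    static String gapFill(String sender, String target, int beginSeqNo, int newSeqNo, String sendingTime) {
        String body = "35=4" + SOH                 // MsgType = SequenceReset
                + "34=" + beginSeqNo + SOH         // MsgSeqNum = first sequence number of the gap
                + "43=Y" + SOH                     // PossDupFlag
                + "49=" + sender + SOH             // SenderCompID
                + "52=" + sendingTime + SOH        // SendingTime
                + "56=" + target + SOH             // TargetCompID
                + "122=" + sendingTime + SOH       // OrigSendingTime
                + "123=Y" + SOH                    // GapFillFlag
                + "36=" + newSeqNo + SOH;          // NewSeqNo = next sequence number to expect
        // BodyLength (9) counts everything after its own delimiter up to the CheckSum field.
        // All characters are ASCII here, so the character count equals the byte count.
        String message = "8=FIX.4.4" + SOH + "9=" + body.length() + SOH + body;
        int sum = 0;
        for (byte b : message.getBytes(StandardCharsets.ISO_8859_1)) {
            sum += b & 0xFF;
        }
        return message + "10=" + String.format(Locale.ROOT, "%03d", sum % 256) + SOH;
    }

    public static void main(String[] args) {
        // Replaces a resend of messages 5, 6 and 7; the peer should expect 8 next.
        String message = gapFill("SENDER", "TARGET", 5, 8, "20240101-00:00:00.000");
        System.out.println(message.replace(SOH, "|")); // '|' is used only for display
    }
}
```

Because the original application messages are never replayed, a counterparty that was disconnected does not recover them, which is the trade-off made by the non-persistent configuration.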


