...

There is no similar affinity setting for outgoing connections; they are always assigned to reading threads in a round-robin fashion.

It is recommended to allocate 2-3 shared cores for the TASK_EXECUTER, GENERIC, DISPATCHER_THREAD and Logger threads (these threads perform non-latency-critical tasks such as heartbeats, connects/disconnects and FIX resends) and to use dedicated cores for the TCP_READER thread(s), one core per thread.
The FILE_CACHE_SYNC thread works best when assigned to one dedicated core on a separate CPU (NUMA) node, while all the other threads reside on another node. In other words, if the machine has more than one processor socket, run the file sync thread on its own CPU node.

See this page for reference.

...

  1. Using Solarflare OpenOnload is recommended to achieve better latency. To enable it, a couple of settings need to be changed in the engine.properties file and in run.sh.

    If Onload is used and spinning is enabled in Onload, then TCPDispatcher.Timeout should be non-zero but less than the Onload spin time.
    E.g. if EF_POLL_USEC=600000 (600 ms), then TCPDispatcher.Timeout should be 500.

    TCPDispatcher.IncomingConnectionOnloadStackAffinity = true
    If true, all inbound connections that end up at a particular listen socket are served by the same thread.
    If false, sockets are spread in a round-robin fashion among the reading threads, as for outbound connections.
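Putting the two settings above together, the relevant engine.properties fragment might look like the following (values taken from the example above; EF_POLL_USEC itself is an Onload environment variable, so the run.sh change typically amounts to exporting it and starting the engine under the onload launcher):

```properties
# Onload spins for EF_POLL_USEC=600000 us (600 ms); the dispatcher
# timeout must be non-zero and shorter than that spin time.
TCPDispatcher.Timeout = 500

# Serve all inbound connections of a given listen socket from the
# same reading thread (keeps them on one Onload stack).
TCPDispatcher.IncomingConnectionOnloadStackAffinity = true
```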

  2. Setting the following parameters is recommended for better performance when the user wants to persist messages:

    a.     extraParams.disableTCPBuffer_ = true;
    b.     extraParams.useNewTCPDispatcher_ = true;
    c.     extraParams.socketPriority_ = Engine::EVEN_SOCKET_OP_PRIORITY;
    d.     extraParams.storageType_ = Engine::persistentCached_storageType;

    The engine.properties setting TCPDispatcher.Timeout = 0 (also available via the API) enables spinning in the socket-reading thread. This is recommended when OS sockets are used.

  3. If pre-populated messages are used for all sending, there are a couple of tips that promote faster handling of pre-populated messages so that they can be released to the wire as quickly as possible.

    1.     You can use zero-padded numbers to reserve space in the message buffer for numeric tags, e.g. msg->set(34, "000000000");
    2.     Alternatively, the msg->setTagLength(34, 9) call can be used; it does the same job as the call above;
    3.     Finally, when all tags are pre-populated, you can call msg.prepareContinuousBuffer(), which re-serializes the message buffer internally to make it ready to send;
    4.     You can still make some adjustments to the message before sending it out, such as changing tag 38 or other tags. Changing integer-value tags is the cheapest operation; changing string-value tags may alter the pre-populated tag's length and cause a couple of extra steps before sending.
