In this document, there is a detailed analysis of the reasons leading to missing/non-processed messages in your solution and troubleshooting recommendations.
The goal that is taken into account is to exclude messages loss or assist the user with message recovery in case of various failures.
|Assuming that the configuration is working and there have been no recent changes in the configuration that may lead to the failure.|
A detailed description of the failures are described below
The messages from the client are not delivered to FIXEdge. In this case, the messages are stored on the client's side.
After the reconnection, they recover automatically via standard resend request mechanism if the sender party keeps the sequences.
When counterparty resets or lowers the sequences then the undelivered messages will be lost.
FIXEdge have 2 options to handle these cases:
Users should apply the best strategy for their case.
If FIXEdge failure happens in the middle of routing the sequence number in FIX incoming logs is not updated so messages are considered as not received and will be recovered via FIX Standard resend request mechanisms after the restart. In some conditions, it may lead to duplicates. Missing messages are not expected in these cases.
The messages can be not delivered to the MQ in some cases. FIXEdge has a mechanism for handling these situations.
FIXEdge detects an event on the Business Logic layer that can be captured by a user. The user can implement the logic for error-handling these kinds of errors.
For example, it is possible to send reject back to the sender or to a special session for manual processing these messages later.
Error events types for the configuration above:
in most cases, these issues are related to the wrong or inefficient configuration.
FIXEdge saves a message to the producer queue with persistent storage before sending it.
In the case of FIXEdge failures or, network failures, or if the connection is dropped all non-delivered messages will remain as undelivered in the storage. Once the connection will be recovered, FIXEdge tries to finish the delivery and resend messages to IBM MQ Server.
In case if the IBM MQ server (WebSphere) is not available (e.g. stopped) there will be the next error in the logs.
ERROR [MQQueueWrite] <thread> MQ PUT Commit on queue '<src queue>' failed
Since the messages are saved on the IBM MQ server-side, they will be automatically recovered when IBM MQ Server becomes available from the FIXEdge server.
The recovery process may lead to duplicates if a transaction mode is enabled or missing messages if the transaction mode is disabled
FIXEdge requests the whole transaction if the network drop happens in the middle of the transaction so the messages that were processed before connectivity issues will be reprocessed/resent to the counterparty again
The message can be delivered to an error queue instead of FIX recipient.
MQ TA handles errors when sending messages to BL. If there is an error then the messages are sent to a special error queue so the messages can be processed later, e.g. when the root cause of the problem is resolved.
The only reason for the errors if there is no destination is configured for sending received messages from MQ Server. In most cases, it is related to wrong/invalid Business Logic configuration like there are no specific conditions fit a received message.
MQ Transport Adapter has a mechanism of sending messages as batches within a single transaction. This mechanism is used to detect message loss and recover missing messages.
The batch is considered as delivered if MQ Adapter gets confirmation about the delivery of all of the messages from the batch. Failure of sending even a single message makes the transaction unsuccessful and is resulted in all batch rollback.
If the transaction mechanism is enabled it may lead to message duplicates after recovery in some cases. In order to reduce the number of duplicates, it is recommended to use a transaction batch with a small size.
However, disabling the transaction mechanism may lead to missing messages in case of failures during receiving messages from MQ.
It is designed to increase the performance of the MQ TA.
MQ Client version 7 doesn't have the full support of batch sending so sending performance is not increased.
Additional information about persisting state of transactions can be found: Transactions in MQ
The messages for the error queue can be re-processed using dmpmqmsg tool. The tool can move messages across queues and should run on the server with IBM WebSphere.
Example of moving messages from the queue '
FIX.SESSION1.ERROR' to '
FIX.SESSION1.IN' with the content of '
dmpmqmsg -m TestMQ -IFIX.SESSION1.ERROR -oFIX.SESSION1.IN -sTEST_SENDERID
-m - QueueManagerName, see TransportLayer.MQAdaptor.MQ_MANAGER_NAME parameter
-I - Input queue name, e.g. name of Error Queue, see TransportLayer.MQAdaptor.Session.1.ErrorQueue parameter
-o - Output queue name, in most cases, should be consumer queue, see TransportLayer.MQAdaptor.Session.1.FromClientQueue parameter
-s - Message content, In the example above we use senderCompId for the target FIX session for filtering.
It is recommended to use different Error Queues per session for simplifying troubleshooting of the issues. If messages are related to a session it is easier to create filtering rules.
The queue per MQ session is specified with TransportLayer.MQAdaptor.Session.<N>.ErrorQueue parameter
If there was a FIXEdge failure when messages are in transit from MQ to FIX there is a chance to lose the messages. E.g. when FIXEdge marked the message as received from MQ but the message is not sent to the FIX recipient.
All such cases should be considered independently
The messages that came from transport adaptors must be routed somewhere. If the destination specified in the BL rule is not found then FIXEdge will raise a "Sending message to TL failed" error.
The user can use the Condition element for filtering messages to specifying the destinations. If there are no suitable conditions in the BL rule the message will not be delivered anywhere and the will be the next error.
2021-07-07 17:45:49,565 UTC DEBUG [BL_RoutingTable] 24288 No BL rules found for message with ClientID 'MQHub2', executing DefaultRule. 2021-07-07 17:45:49,565 UTC DEBUG [CC_Layer] 24288 BL has processed a message. Number of client IDs for delivery :0. Number or FIX sessions for delivery :0.. Number or sources identifiers for delivery :0. 2021-07-07 17:45:49,565 UTC ERROR [MQQueuesReader] 24288 Sending message to TL failed. Reason: TL has returned false. 2021-07-07 17:45:49,565 UTC DEBUG [MQQueuesReader_Debug] 24288 Sending to Error queue.
Conditions can work with regex
If the following BL rule results with the above error, that means that the incoming message does have ".*MATCHTHISTEXT.* in the tag 55
<?xml version="1.0" encoding="UTF-8"?> <FIXEdge> <BusinessLayer> <!-- Routing Rule for MQHub incoming messages --> <Rule> <Source Name="MQHub2" /> <Condition> <MatchField Field="55" Value=".*MATCHTHISTEXT.*"/> </Condition> <Action> <Send Name="FIXCLIENT" /> </Action> </Rule> <DefaultRule> <Action> <DoNothing/> </Action> </DefaultRule> </BusinessLayer> </FIXEdge>
To troubleshoot the user should check message content if message content fits if is not fitting apply additional conditions covering adding in the rules for another condition
When there it will be a destination for the message, the user will have the next output specifying that the message has a delivery point.
2021-07-22 20:56:45,325 UTC DEBUG [BL_RoutingTable] 8848 Found 1 suitable rules for message with ClientID 'MQHub2' 2021-07-22 20:56:45,325 UTC DEBUG [CC_Layer] 8848 BL has processed a message. Number of client IDs for delivery :0. Number or FIX sessions for delivery :0.. Number or sources identifiers for delivery :1. Source IDs: 1. 'FIXCLIENT' 2021-07-22 20:56:45,325 UTC DEBUG [FL_MsgTrace] 8848 Sending FIX message to session with ID 'FIXCLIENT'. Message: '8=FIX.4.49=15835=D49=FIXCLIENT56=FIXEDGE34=252=99990909-17:17:1711=Order#064=2021021921=3100=155=MATCHTHISTEXT54=160=20210219-04:11:46.76838=2000040=244=34.710=065'. 2021-07-22 20:56:45,437 UTC DEBUG [MQQueueRead] 8848 MQ GET Commit took 111.15ms, res: 1 2021-07-22 20:56:45,437 UTC DEBUG [MQQueuesReader_Debug] 8848 1 messages are removed form client Queue.
The messages from the FIXEdge are not delivered to the FIX Client if there are network issues. After the reconnection, the messages are recovered automatically via a standard resend request mechanism.
By default, the messages are stored on the FIXEdge side before sending it.
The messages can be lost when FIXEdge reset sequences by the schedule. All undelivered messages will be lost
If configuring proper schedule is not flexible enough and it is required to deliver missing messages the next day after sequence reset, it is recommended using the Managed Queue feature implementing a store-and-forward workflow.
|The Managed Queue feature was introduced in FIXEdge 6.9.0|
The message can be not delivered to the FIX Client in some cases.
It happens when FIXEdge tries to send a message from MQ but the session is offline and property for rejection fix messages in case of disconnection is set:
|FixLayer.FixEngine.Session.Session_Name.RejectMessageWhileNoConnection = true|
It is not recommended to use logic for sending messages in the same session again if the first attempt has failed and the OnUndeliveredMessageEvent event is called.
The list of user-defined Transport Adapter names. The several instances should be separated by a comma.
The following configuration will show configuration parameters for an adapter with the name TransportLayer.MQAdaptor, i.e.:
(deprecated) Defines a number of reconnection attempts if MQ Server not available.
The value of -1 means for unlimited reconnection attempts.
|Time interval in milliseconds between reconnection attempts||1000|
Enables/disables transactions mode:
|The maximum number of messages the batch for receiving from the MQ Server in a single transaction. Should be positive.|
The bigger value increases performance.
Required if UseTransactions is set to 'true'
|The maximum number of messages the batch for sending to the MQ Server in a single transaction. Should be positive.||Required if UseTransactions is set to 'true'||10|