2024-07-22

TOTSCO moving goal posts, again!

One of the big issues I had in initial coding was the use of correlationID on messages. The test cases showed the same value being reused across a sequence of messages, e.g. a Switch Order had a destination correlation which only made sense if it was a response to a Match Confirmation. I was wrong, but not for lack of reading the spec.

The API spec says this: "In a source element, the correlationID must always be provided, the format can be anything the originator chooses to support their messaging process but should be sufficiently unique to allow correlation of response with request over a reasonable period."

This makes the purpose of the correlationID clear: it matters to the sender, so they can correlate response with request. It also makes clear that the sender is the one who chooses the correlationID.

Now, for that purpose, a Match Request, a subsequent Switch Order, and a Switch Order Trigger could all have the same correlationID. Indeed, arguably, a sender could use the same correlationID on all Switch Order related messages, because the messages all carry a Switch Order Reference, which can be used to tie the response to a specific order. An obvious choice, and we nearly did this, was to use the actual switch order reference as the correlationID.

Also, there is nothing to stop an originator, when generating a reply, from using correlationIDs differently, as they don't expect a response to that reply, so there is no correlation of response with request to do. Again, an obvious choice for the various switch order messages would be the switch order reference, as this is the one thing missing from a MessageDeliveryFailure message, and would allow that error to be tied to a switch order.

TOTSCO Bulletin 66

TOTSCO just released bulletin 66, on handling messages received from the hub better, notably on response times and validation, but also on handling duplicate requests. It details a recommendation that recipients cache messages for a while, keyed on originating RCPID and source correlationID, and use that cache to spot a duplicate.
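To make that recommendation concrete, here is a minimal sketch of such a cache. It is only my reading of the idea, not code from the bulletin: the class name, the retention time, and the example RCPID values are all assumptions made for illustration.

```python
import time


class DuplicateCache:
    """Remember (rcpid, correlation_id) pairs for a limited time.

    Hypothetical sketch: the bulletin only says to cache "for a while",
    so the default ttl_seconds here is an arbitrary assumption.
    """

    def __init__(self, ttl_seconds=900.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # (rcpid, correlation_id) -> expiry time

    def is_duplicate(self, rcpid, correlation_id):
        now = self.clock()
        # Drop expired entries so the cache does not grow without bound.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        key = (rcpid, correlation_id)
        if key in self._seen:
            return True
        self._seen[key] = now + self.ttl
        return False


cache = DuplicateCache(ttl_seconds=60)
print(cache.is_duplicate("RAAA", "abc-123"))  # False: first time seen
print(cache.is_duplicate("RAAA", "abc-123"))  # True: same sender, same ID
print(cache.is_duplicate("RBBB", "abc-123"))  # False: different sender
```

Note that the key is only the sender and the correlationID. The message type and content play no part, which is exactly why the sender's choice of correlationID now matters to the recipient.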

If a sender chose to use the same correlationID for a Match Request and Switch Order, which is definitely sufficiently unique to allow correlation of response with request as per the spec, the recipient would see the Switch Order as a duplicate message and ignore it, maybe resending the Match Confirmation.

If the sender chose to use the SOR on switch order messages or replies, the recipient would see all messages after the first as duplicates, and ignore them.
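The effect can be shown with a small simulation. This is a sketch under assumptions: the RCPID and the shared ID are made up, and the recipient is modelled as a naive check on (RCPID, correlationID) with no expiry, which is a simplification of whatever a real CP would implement.

```python
import uuid


def count_dropped(messages):
    """Count messages a recipient would treat as duplicates when it
    checks only the (rcpid, correlation_id) pair."""
    seen = set()
    dropped = 0
    for rcpid, correlation_id, _message_type in messages:
        key = (rcpid, correlation_id)
        if key in seen:
            dropped += 1
        else:
            seen.add(key)
    return dropped


SENDER = "RAAA"             # hypothetical RCPID
SHARED_ID = "my-order-42"   # hypothetical ID reused across one switch
MESSAGE_TYPES = ["Match Request", "Switch Order", "Switch Order Trigger"]

# Strategy 1: reuse one correlationID for the whole sequence.
reused = [(SENDER, SHARED_ID, t) for t in MESSAGE_TYPES]

# Strategy 2: a fresh UUID on every message.
unique = [(SENDER, str(uuid.uuid4()), t) for t in MESSAGE_TYPES]

print(count_dropped(reused))  # 2: everything after the first is dropped
print(count_dropped(unique))  # 0
```

Both strategies are "sufficiently unique to allow correlation of response with request" from the sender's point of view, but only the second survives a recipient doing this kind of de-duplication.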

So now, in effect, based on just a bulletin, the specification mandates that every message sent (request or reply) has a unique correlationID, something not in the spec. In general this is a good idea, but the API spec should have stated that at the start! It now means the source correlationID matters to the recipient as well, not just the sender. And they have not changed the spec, as it is in a change freeze. Oh, and there is no size limit for a correlationID.

The bulletin does not even say the sender correlationID has to be unique; it basically assumes it is, and explains how recipients can rely on that for spotting duplicate messages!

Once again, a fiasco.

P.S. Our implementation does unique source correlationID already (uses a UUID).

Also, I have updated the NOTSCO test platform to warn of duplicates, and to generate a duplicate as well, to test CPs' handling of duplicates.

Just to add, the confusion caused by the poor specifications is real. It is not just that we were confused by the examples implying a way of working: I monitor the NOTSCO testing and see other CPs doing similar things, based on the specification, that are going to be problems. I'm just waiting for this new check to kick off and show a CP assuming they can pick source correlationIDs for their own purposes (this did happen later in the day). In fact, looking at logs today (we only keep them for a day) I already see duplicated correlationIDs that will break when sent to any CP following TOTSCO Bulletin 66.

This is a bigger issue than you realise!

We originally coded with a way of working with correlationIDs that would fall foul of any CP following bulletin 66. We changed later once TOTSCO confirmed that basically its test cases are wrong.

I am now seeing half of the CPs testing on NOTSCO hitting the duplicate test.

The whole way TOTSCO do testing is two random CPs testing against each other. That would NOT have picked this up at all. So the CPs carry on.

Then, wham, on 12th Sep, some OTS messaging breaks because one of the CPs followed the spec (which has NOT BEEN UPDATED) and one implements the de-duplication in bulletin 66.

The fact TOTSCO do ZERO formal testing against the spec is just a serious problem - that is just irresponsible. I'm amazed OFCOM allow it.

1 comment:

  1. I wish RevK had been put in charge of TOTSCo..
