This is hopefully going to help other small ISPs that will have the same challenges.
As I explained in my previous post, we have to work with TOTSCO to set up One Touch Switching. Well, we are doing that now that TOTSCO actually exists. The new deadline is September, but we want to ensure we are working well before that.
Specifications
The specifications are not too bad. They have a few inconsistencies, which I have fed back to them. But I was able to code the system reasonably quickly. I created my own test system to act like TOTSCO so I could test my code with messages in and out in advance.
The underlying system is, as I say, just a messaging process between telcos. It can use OAuth2, which is simple, and involves JSON messages each way, which is also simple. I use C and a load of long-standing in-house JSON libraries, but I am sure most people would use some other platform with standard JSON libraries. It should be pretty simple. Obviously the hard part is integrating with whatever back end systems and processes the ISP uses, oh, and making sure you have clean address data, including UPRNs, for matching services.
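For anyone on another platform, the transport side looks roughly like this. It is a minimal Python sketch, not the in-house C code, and it assumes the OAuth2 client credentials grant with HTTP Basic client authentication. The function names, URLs and credentials are made up for illustration and are not taken from the TOTSCO specification. Nothing is sent over the network; the functions only describe the two requests.

```python
import base64
import json
from urllib.parse import urlencode


def build_token_request(token_url, client_id, client_secret):
    """Describe an OAuth2 client credentials token request (nothing is sent)."""
    # Assumption: the client authenticates with HTTP Basic, as RFC 6749 allows.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return {
        "method": "POST",
        "url": token_url,
        "headers": {
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "body": urlencode({"grant_type": "client_credentials"}),
    }


def build_message_request(message_url, access_token, message):
    """Describe a JSON POST that carries the bearer token from the token step."""
    return {
        "method": "POST",
        "url": message_url,
        "headers": {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(message),
    }
```

The same two steps apply in both directions: each side issues tokens to the other and expects them back on every posted message.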
Simulator
TOTSCO have a simulator, which is good. It will allow testing against them. I finished coding it all two weeks ago and have only just got on to the simulator, but so far it is a mess.
- The token issuing URL had an invalid certificate (wildcard, but one level too high). I ignored that to get further testing.
- The directory URL did not work (404). This provides (or should provide) the list of ISPs, basically.
- The messaging URL simply said "Error connecting to the back end".
Well, that is not a good start, but chasing up, after several days they finally want me to check I am using the correct URLs. Good thing to check, but I was, as per the spec.
- They fixed the token certificate, good, but the reply did not say they had fixed it. The new cert now uses a different CA that libcurl does not know, or some such, which is fun. But at least it is valid.
- They told me to use the directory path but on the token issuing host, which makes no sense. Re-reading the documentation it certainly implies the directory URL is an "API" and so you would expect to use the API host. So that is weird. But it still did not work (404 Not Found). I eventually found it works if I add the optional parameter &identity=all. Well, it is meant to be optional, and is a GET form style argument, so how it was giving 404 is beyond me. Interestingly, with that, it works on token host and API host, so even weirder.
- They told me to use a path for the messaging that starts /testharness/ which is not as per the specification (which states /letterbox/). So basically the simulator does not follow the specification! Using testharness gets further, but gives a different error this time.
- Oh, and the directory I get has RCPIDs (Retail Communications Provider IDs) which don't meet the specs, so, of course, my code barfs trying to put them in the database which was set for 4 characters, as per the specification. So again, the simulator does not meet the specification.
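Rejecting badly formed RCPIDs before they reach the database can be sketched as below. This is a minimal Python illustration, not my actual code. The 4 character length is what the specification states; the uppercase alphanumeric alphabet and the "rcpid" key in each directory entry are assumptions made up for the example.

```python
import re

# Assumption: an RCPID is exactly 4 characters (per the specification);
# the uppercase-alphanumeric alphabet is a guess for illustration only.
RCPID_PATTERN = re.compile(r"[A-Z0-9]{4}")


def is_valid_rcpid(rcpid):
    """True if rcpid is a string matching the assumed 4 character syntax."""
    return isinstance(rcpid, str) and RCPID_PATTERN.fullmatch(rcpid) is not None


def split_directory(entries):
    """Separate directory entries into (accepted, rejected) by RCPID syntax."""
    accepted, rejected = [], []
    for entry in entries:
        # "rcpid" is a hypothetical key name for this sketch.
        if is_valid_rcpid(entry.get("rcpid")):
            accepted.append(entry)
        else:
            rejected.append(entry)
    return accepted, rejected
```

Logging the rejected entries rather than failing the whole import would let the valid part of the directory load while the dummy entries are reported back.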
Some progress
Well, surprisingly, we have a quick response now.
- They say that the duff RCPIDs are dummy entries. OK, but surely they should at least have correct syntax, as otherwise it is sensible for my end to reject them.
- They just say testharness should work, but I have to use specific RCPIDs for testing, good (would be nice if that was documented, maybe I missed something). But they really need to fix it to actually follow the spec and use letterbox.
- I got as far as testing a match request and them trying to send a reply. They get an OAuth2 bearer token, and then try to post a message, but the message they post does not use the same bearer token I issued to them, so is rejected.
- I can see what they tried to post and it does not have the right source and target RCPIDs or correlationIDs, so again I would reject them if they actually authenticated.
- Oddly, after more tests, they are using the right bearer now, but still the wrong IDs.
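The sort of check that rejects those replies can be sketched as follows. This is a minimal Python illustration under assumptions: the flat "source", "destination" and "correlationID" keys are hypothetical stand-ins, not the real message layout from the specification.

```python
def reply_problems(request, reply):
    """Return reasons to reject a reply; an empty list means it matches."""
    problems = []
    # A genuine reply comes from the RCPID we sent the request to...
    if reply.get("source") != request["destination"]:
        problems.append("reply source is not the RCPID the request was sent to")
    # ...is addressed back to us...
    if reply.get("destination") != request["source"]:
        problems.append("reply destination is not our RCPID")
    # ...and echoes the correlation ID of the request it answers.
    if reply.get("correlationID") != request["correlationID"]:
        problems.append("correlationID does not match the request")
    return problems
```

A fixed canned reply fails all three checks unless the request happens to use exactly the values baked into it.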
Next steps
I have come to the conclusion that the simulator is actually useless. It simulates neither the TOTSCO messaging platform (as it does not actually use the right URLs, or provide a sensible directory, or actually do OAuth2) nor actual end to end messaging (as it does not do source/target RCPID or correlationID correctly).
What really puzzles me is that we know we are not the first to do this, and we know some of the big telcos have done this. So how have other ISPs not ripped TOTSCO to pieces over this stupidity already?
Follow up call
We have had a call. They explain that the simulator is totally dumb, it cannot be told to initiate any messages, and all it does is send one of two fixed replies to a match request (depending on the RCPID to which it is sent). It is meant to test connectivity.
But they want to do more than just two match requests and replies, they want us to send the order, update, trigger, and cancel requests.
This makes no sense, as the match requests test connectivity both ways already. And, of course, my system will not do that as it has not received a valid switch order confirmation reply. The fixed text they send is not valid, as it has the wrong RCPID and correlationID, so we don't accept it and don't store the switch order reference. And as such my system does not see a switch order we can place or update or trigger or cancel.
I could fake such messages, but that is not testing my system.
They say that if I email explaining this, they will move us to the pre-production platform. This is the same as live, but with other CPs.
What they seem to lack is any sort of useful simulator that handles messages both ways as if to another CP. This would seem a sensible step before going to pre-production testing.
Pre-production testing
We have moved on. Yay!
But the simulator test is meant to test connectivity, and seriously, does no more than that.
So you would hope and expect it simulates the real system.
But no!
- The pre-production system has a stupidly big Bearer token, which breaks SQL tinytext. The simulator was way smaller, so not representative of the live system.
- The pre-production system can't talk to us, as it is not sending an Authorization header, WTF!?
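The token size problem comes down to column limits: in MySQL and MariaDB a TINYTEXT column holds at most 255 bytes, TEXT 65,535 and MEDIUMTEXT 16,777,215. A minimal Python sketch of checking a value against those limits before choosing a column type follows; the function name is made up for illustration.

```python
# Maximum sizes in bytes of the MySQL/MariaDB text column types,
# listed from smallest to largest.
TEXT_COLUMN_LIMITS = {
    "TINYTEXT": 255,
    "TEXT": 65_535,
    "MEDIUMTEXT": 16_777_215,
}


def smallest_text_column(value):
    """Return the smallest listed column type that holds value untruncated."""
    size = len(value.encode("utf-8"))  # limits are in bytes, not characters
    for name, limit in TEXT_COLUMN_LIMITS.items():
        if size <= limit:
            return name
    raise ValueError(f"{size} bytes does not fit any listed column type")
```

Running this over a real token from each environment, rather than only the simulator, would show the mismatch before the database does.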
I can confirm we have pre-production testing, and now we have to work with a buddy CP to test. They spent a week not finding one? So we suggested someone, and we are now ready to send and receive messages to complete the integration testing.
This whole process would be literally weeks quicker if they had something like my NOTSCO system.
More challenges
TOTSCO seem to see no issue with the fact they have not defined key data types, such as an RCPID. Well, they do, in one document, but they refuse to follow that spec and insist they have not specified it. How they can even start without specifying key data types is beyond me.
> The told me to use a path for the messaging that starts /testharness/ which is not as per the specification (which states /letterbox/). So basically the simulator does not follow the specification!
That's... awful.
There's very little point in having a test environment if it doesn't offer a reasonable level of fidelity. At some point, someone's implementation is going to fail to work because they'll have left a sandbox specific path in place.
Quite. You expect the whole point of the sandbox having a different hostname is that everything else can be identical.
"So how have other ISPs not ripped TOTSCO to pieces over this stupidity already?" I'd say the other ISPs (the larger ones at least) will use this right at the last minute and ask for another 6-12 months before launching OTS :)
When I worked in the banking industry, we had systems like this all the time, and they were painful to deal with. Of course the difference there was these were legacy systems which had been built up over years, to the influence of internationalised standards such as ISO 20022, so were tricky to change – it's much worse that this is a brand new development! And they say AI will improve this...
In fact, I remember one system from a particularly awkward supplier (and a very large one at that – most UK residents will almost certainly have interacted with their services without realising). We had a maximum latency to respond to their inbound messages to comply with the upstream payment processing requirements. They already caused problems on inbound requests as they started the timer, then ate 50 ms of our response time marshalling and transmitting XML to us! But we optimised around it and came in with a 10% buffer on the required response deadline. Only to discover that, in their test environment, they neglected to also put their hardened HSMs for doing the crypto stuff on the request path, opting for a software crypto option instead. Once we hit prod, all the timings blew up because we also had to wait for the additional HSM operation cost.
There are now various providers in the banking space who purport to be "gateway" service providers – they take the pain of interacting with these upstreams and present a sane, compliant, modern API of their own to their consumers, whom they are able to charge huge fees. It means people can be lazy and not understand the standards, but it also means someone only has to solve the problem once of figuring out all the edge cases of institution X not talking to institution Y because they misread the spec and now cannot change it. Perhaps you need to offer the same sort of gateway interface service for TOTSCO – for a fee, of course!
You mentioned "The pre-production system can't talk to us, not sending an Authorisation header, WTF!?"
Can I ask how you got this fixed as we are experiencing the same problem and TOTSCO are trying to blame our token generation.
I can't recall exactly which way it was. Which way are you seeing the issue? The main one we ran into was on the test system we made, where apparently some people send the user/pass in the query, not as an Authorization header, and the RFC allows both ways. Have you tried the NOTSCO test server at all?
It’s receiving from the hub. Same code was just used with the simulator without issue, just updated the credentials for pre-production. I can see we’re issuing the token to them but they just fail to include it when they call our letterbox endpoint.
I.e. “Authorization: Bearer” instead of “Authorization: Bearer hduutdhkcdyujxhjutd”
Service desk says we must have an issue with our token generation but our logs clearly show it’s providing them tokens.
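One way to pin down which case the receiving end is seeing is to classify the incoming header before validating the token, and log the result. A minimal Python sketch; the function name and the category strings are made up for illustration.

```python
def classify_authorization(header_value):
    """Classify an incoming Authorization header value for logging."""
    if header_value is None:
        return "missing header"
    parts = header_value.split()
    # The scheme name is case-insensitive in HTTP.
    if not parts or parts[0].lower() != "bearer":
        return "not a bearer scheme"
    if len(parts) == 1:
        return "bearer scheme with no token"
    if len(parts) > 2:
        return "malformed bearer value"
    return "bearer token present"
```

A log line showing "bearer scheme with no token" next to the token that was issued moments earlier separates a sender that drops the token from a token generation fault.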
We’re making extensive use of NOTSCO (thank you btw) and haven’t encountered this issue.
It may be worth my checking what we see from NOTSCO to see if I can see anything wrong, and if I can then flag it as an error. What company?
Email the Costco email address
LOL spell check - NOTSCO
Costco email probably more helpful !!