Mainly for a library where we want to add extra optional options in the future, so want almost C++ style optional and tagged arguments to a function, but in normal C (because C++ is just evil all by itself).
2020-08-27
Pseudo C++ using cpp (the RevK macro)
Mainly for a library where we want to add extra optional options in the future, so want almost C++ style optional and tagged arguments to a function, but in normal C (because C++ is just evil all by itself).
💩
I have been involved with SMS (i.e. text messaging) for a long time. I was even on the ETSI committees that designed GSM (not specifically SMS, sadly), and have been doing things with SMS for nearly 30 years in one way or another, including an SMS->fax/email gateway, and even the ETSI landline SMS module for asterisk. Now, at A&A, we have code to send and receive SMS via a variety of carriers and even a SIP a-law based ETSI landline SMS system.
The specification for SMS is a typical telecoms specification - very different to internet specifications where single bits packed in some small data header can subtly change the interpretation of some or all that follows. These specifications are normally very precise but absolutely horrid, in my view.
But where does the pile of poo come in, and how does it relate to a 30 year old specification for SMS? Well, you may be surprised, but SMS allows for 💩.
SMS are actually coded in the signalling used for calls, and so had limited space. There were actually only 140 bytes (or more correctly octets) of data for the text itself. As you may know SMS allow 160 characters, so this is achieved by packing a 7 bit alphabet in to the 140 bytes.
In fact SMS allows 4 ways the data can be coded, a 7 bit special alphabet, an 8 bit Latin-1 alphabet, and 16 bit unicode (allowing 70 characters). There are also ways to send one longer message in smaller parts. The SMS can also be raw data to be sent to a SIM rather than displayed. Had I written this I'd have used 2 bits to say which it is, but no, the specification uses a Data Coding Scheme which is complicated to say the least. Some times the coding is in 2 bits but others it is implied. It is not fun.
The 7 bit alphabet is sort of ASCII, but does allow some interesting characters - being a European spec it includes some accented characters and even some Greek letters.
Of course this also leaves out some key ASCII such as {, }, [ ], and does not even have € (which was added later). These are coded as two character sequences using ESC.
The 8 bit character set is just normal Latin 1, and the 16 bit is unicode. The unicode allows all unicode characters U+0000 to U+FFFF, but where is pile of poo? It is U+1F4A9 which is too big for 16 bits.
The way this is done is to use a little known trick called UTF-16. There are reserved 16 bit unicode characters U+D800 to U+DFFF. Using two such codes it is possible to encode U+10000 to U+10FFFF.
This means 💩 is actually coded as two 16 bit sequences, 0xD83D 0xDCA9 in SMS!
Why does this matter, I mean, who sends 💩 by SMS? As you can imagine, in the early 90's nobody had heard of 💩, and the best emojis we had were :-)
But we do care, honest, as we use it as a blue* M&M test for carriers we deal with. If they have enough attention to detail to handle a pile of poo they probably have the rest sewn up, technically. We are working with a new carrier for SMS messages, and I am pleased to say the unicode is working. They properly translate to/from UTF-8 coding in the messages we exchange (which is what we use internally). Unlike our previous carrier who could not cope. (* see comments)
We have seen a range of such failures, even the case where one carrier could not handle an @ symbol (presumably as it coded to 0x00 which is an end of string in languages like C). Thankfully that carrier was happy for us to send a raw hex TPDU for SMS, and hence allowing us to code any characters. Our SIP2SIM service has handled pile of poo since we launched it...
The end result is that, shortly, we will be handling a lot more SMS with unicode characters correctly, in most cases, both incoming and outgoing. Watch this space.
2020-08-15
Making a meme?
2020-08-01
A simple flat tyre - but this is 2020, so no...
I²S
I²S is, err, fun. What is I²S Well, first off, it is grammatically like I²C which is an acronym with two Is in it which people then treat an...
-
Broadband services are a wonderful innovation of our time, using multiple frequency bands (hence the name) to carry signals over wires (us...
-
For many years I used a small stand-alone air-conditioning unit in my study (the box room in the house) and I even had a hole in the wall fo...
-
It seems there is something of a standard test string for anti virus ( wikipedia has more on this). The idea is that systems that look fo...