2020-03-08

Big number maths

One of my first C programmes ever, at uni, was a calculator. I had done a lot of code in BASIC, Z80, 6502 and other languages before, but C I learned at uni.

It was a strange exercise for me as I was teamed with someone else, and basically gave up on him. To me a calculator, i.e. something that could parse 1+2*3 and get 7, and understand brackets, was inherently simple. The code was a couple of pages at most and I had added all sorts of extras over and above the basic +, -, *, / binary operators. It handle brackets and operator precedence, and even had a table of operators to allow easy expansion. My "partner" made code that was dozens of pages and every test that involved extra brackets or some unexpected sequence broke it and he had two add extra code for edge cases. Oddly I once encountered a compiler that must have been written in such a way as adding extra brackets, e.g. 1+(2*3), actually broke it and made wrong code. Scary that stuff like exists.

My code has a simple loop to process: prefixes or "("s, operand, postfixes or ")"s, operator, in a loop, ending after last operand. It stacked operands, and operators (after processing the stack for any higher operators before adding), and that was it. It allowed unary pre and post operators, and binary operators. Simple and easy to understand, IMHO.

Of course, it used the normal C code to parse and output numbers, storing internally as floats (I think).

However, having played with mechanical calculators, one of the things I have meant to code one day and had not got around to for over 30 years, is a decimal calculator library... That is until this weekend.

Basically, one tool I have written, and my friends use a lot, includes an "eval" function to evaluate a simple sum for them. It is used in loads of places (as it embeds maths in a back end for a web page). It used integer (long long) types (64 bit) and shifted by number of decimal places it saw, giving it around 18 significant digits. Sadly it broke if you went over that (my bad) and they had some silly case of some numbers plus a fraction which had rather too many decimal places. So they asked me to fix.

A simple fix was limit the size, and then limit size and apply bankers rounding at the limit.

But I decided this may be the time to finally do that decimal maths code. I googled, and there are a few decimal libraries. There are also some arbitrary precision binary libraries. To my surprise the latest GCC has decimal types, where data is stored and processed in decimal. I can only assume we have decimal maths in some processors even, these days. This is good for accounting where the rounding errors you can get are bad (essentially binary does not have a way to represent 0.1 or 0.01 without recurring digits). Sadly GCC may have it, but clib does not yet (scanf and printf), but will some time, and I can see that being useful. Even so, _Decimal128 does have limits on number of digits. There are libraries in other languages, but I wanted something for C.

So I made a C string decimal library, and put it here on GitHub for all to use.

It is a library and command line, and includes functions to add, subtract, multiple, divide, and compare, as well as a simple evaluation function that parses basic sums, just like my uni project calculator.

The key thing is that it works on C text strings for numbers. A simple sum (I learned at school, and now out of date) was 366.2564*86164.091=31558149.7789324. It was numbers from an encyclopaedia in the library. I remember because I had to do the maths manually as no calculator I had would do it to full precision. Even google calculator truncates it a bit. That was my first test, and it worked. Yes, I also learned π to 50 places (and now only remember 25).

So, for add, subtract, and multiple, it is simple to ensure you have as many digits as needed. Sadly division can go on for ever, so I had to include a limit of decimal places, and a rounding rule, and remainder option.

I included rounding on limit on division, and also a separate simple rounding function - obviously actually applied at whatever digit you set. Default is banker's rounding :-

  • Round towards 0 (i.e. -3.9 rounds to -3)
  • Round away from 0 (i.e. -3.9 rounds to -4)
  • Round towards -ve (i.e. -3.9 rounds to -4, 4.2 rounds to 4)
  • Round towards +ve (i.e -3.9 rounds to -3, 4.2 rounds to 5)
  • Simple rounding (i.e 2.4 rounds to 2, 2.5 rounds to 3)
  • Banker's rounding (i.e. 2.4 rounds to 2, 2.5 rounds to 2, 2.6 rounds to 3, 3.5 rounds to 4)
I went further though, and made the calculator work on rational numbers, that means all of the maths is done with numerator and denominator, and only finally at the end does the division get done and rounding applied. This means that 1000/7*7 is 1000, unlike bc which makes it 994. minor optimisations for adding/subtracting with same denominator, and anything with denominator of 1 or a power of 10, making it simpler. For anything without division the maths does not even need a final divide.

I had a few silly bugs, one that fooled me is that zero is a special case, which I store as a magnitude of 0 and no digits, which is fine, but comparing numbers I started by comparing magnitude before comparing digits, so 0 looked like a bigger number than 0.1 as 0 was 0*10^0 and 0.1 was 1^10-1 and magnitude 0 is greater than magnitude -1. I had to add checks for comparison with 0.

However, overall, I am quite chuffed. Heck, I even asked it to work out 1e1000000000+1 and it just worked, that is a billion significant digits!

This is pure coding fun though. But may be useful - feel free to use it.

P.S. Someone asked about Banker's rounding, and I was sure I had blogged once, but cannot find it. Unlike that which I was taught at school, and every Casio (and other) calculator I had, rounding is not as simple as you think. I used to assume that a residual of exactly 0.5, or above, rounded up, and below 0.5 was down. That is what my calculator did. However, this creates a bias. Ideally you want to round an exact 0.5 residual down half the time and up half the time to remove bias. This could be random, but that is bad as you get results which are not reproducible. One simple way, bankers rounding, is to round 0.5 residual to nearest even number. Hence 0.5 rounds to 0, 1.5 rounds to 2, 2.5 rounds to 2, 3.5 rounds to 4, and so on.

6 comments:

  1. I was going to say "python3 just does this" until I got to your final example:

    >>> 1e1000000000+1
    inf

    ReplyDelete
  2. I may be missing something, but why in bankers rounding does 2.5 round to 2 yet 3.5 rounds to 4?

    ReplyDelete
  3. I'm puzzled about the apparent bias in 'normal' rounding - and I've never heard of 'bankers rounding' before.

    Consider an input of numbers to 3 decimal places - so there are 1000 datapoints between integer X and (just below) integer X+1. These datapoints range from X.000 to X.999 .
    With normal rounding, 500 datapoints (X.000 to X.499) are rounded down to integer X. and 500 datapoints (X.500 to X.999) are rounded up to integer X+1. Doesn't seem any bias there.
    So, am I missing something ?

    ReplyDelete
  4. I thought the same, but no. The bias introduced is the amount added/subtracted as part of the rounding - you want that, on average, to be zero. Do it with 1 place to simply. You have 5 "down" and 5 "up", but the 5 "down" are -0.0, -0.1, -0.2, -0.3, -0.4 (average -0.2) and the "up" are +0.5, +0.4, +0.3, +0.2, +0.1 (average +0.3). So bias. Assuming random / even distribution of 0.0 to 0.9 you, on average, change the numbers by +0.05 if you always round up at 0.5.

    ReplyDelete
  5. Worth a look at:
    https://en.wikipedia.org/wiki/IEEE_754#Rounding_rules

    ReplyDelete

Comments are moderated purely to filter out obvious spam, but it means they may not show immediately.

Don't use UPS to ship to UK

I posted about shipping and importing and tax and duty - general info. But this is specific. DON'T USE UPS! I had assumed the UPS issue ...