What every computer scientist should know about floating-point arithmetic (1991) [pdf]

For anyone turned off by this document and its proofs, I recommend Numerical Methods for Scientists and Engineers (Hamming). Still a math text, but more approachable.

The five key ideas from that book, enumerated by the author:

(1) the purpose of computing is insight, not numbers

(2) study families and relationships of methods, not individual algorithms

(3) roundoff error

(4) truncation error

(5) instability

What Every Computer Scientist Should Know About Floating-Point Arithmetic (1991) - https://news.ycombinator.com/item?id=23665529 - June 2020 (85 comments)

What Every Computer Scientist Should Know About Floating-Point Arithmetic - https://news.ycombinator.com/item?id=3808168 - April 2012 (3 comments)

What Every Computer Scientist Should Know About Floating-Point Arithmetic - https://news.ycombinator.com/item?id=1982332 - Dec 2010 (14 comments)

What Every Computer Scientist Should Know About Floating-Point Arithmetic - https://news.ycombinator.com/item?id=1746797 - Oct 2010 (2 comments)

Weekend project: What Every Programmer Should Know About FP Arithmetic - https://news.ycombinator.com/item?id=1257610 - April 2010 (9 comments)

What every computer scientist should know about floating-point arithmetic - https://news.ycombinator.com/item?id=687604 - July 2009 (2 comments)

(1991). This article is also available in HTML: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...

    > 0.1 + 0.1 + 0.1 == 0.3
    
    False

I always tell my students that if they (might) have a float, and are using the `==` operator, they're doing something wrong.
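For anyone wondering what to use instead of `==`, a tolerance-based comparison is the usual alternative. Here is a minimal Python sketch using the standard library's `math.isclose`; the tolerance values are illustrative assumptions, not recommendations, and the right ones depend on the problem.

```python
import math

total = 0.1 + 0.1 + 0.1

# Exact comparison fails: the sum is not the double nearest to 0.3.
exact_equal = (total == 0.3)

# Relative-tolerance comparison; rel_tol=1e-9 is an illustrative choice.
close_equal = math.isclose(total, 0.3, rel_tol=1e-9)

# A relative tolerance alone never matches against 0.0,
# so comparisons near zero need an absolute tolerance as well.
near_zero = math.isclose(1e-12, 0.0, abs_tol=1e-9)

print(exact_equal, close_equal, near_zero)
```

Picking the tolerance is still a judgment call, which is arguably the real lesson for students.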

That has more to do with decimal <-> binary conversion than arithmetic/comparison. Using hex literals makes it clearer:

     0x1.999999999999ap-4 ("0.1")
    +0x1.999999999999ap-4 ("0.1")
    ---------------------
    =0x3.3333333333334p-4 ("0.2")
    +0x1.999999999999ap-4 ("0.1")
    ---------------------
    =0x4.cccccccccccd0p-4 ("0.30000000000000004")
    !=0x4.cccccccccccccp-4 ("0.3")
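The same values can be checked from Python, which exposes the hex form through `float.hex()`. A small sketch; note that Python normalizes the leading digit to 1, so the exponents read `p-2` rather than the `p-4` layout used above.

```python
a = 0.1
s = a + a + a

# Hex form of the stored doubles (normalized to a leading 1).
print(a.hex())       # 0x1.999999999999ap-4
print(s.hex())       # 0x1.3333333333334p-2
print((0.3).hex())   # 0x1.3333333333333p-2

# The sum and the literal 0.3 differ by one unit in the last place.
differs = s != 0.3
print(differs)
```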

Absolutely nobody will think this is 'clearer'. This is a leaky abstraction, and personally I think the OP is right: == in combination with floating-point constants should be limited to '0' and that's it.

I also like how a / b can result in infinity even if both a and b are strictly non-zero[1]. So be careful rewriting floating-point expressions.

[1]: https://www.cs.uaf.edu/2011/fall/cs301/lecture/11_09_weird_f... (division result matrix)

Anything that overflows the max float turns into infinity. You can multiply very large numbers, or divide large numbers by small ones.
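A quick Python sketch of both cases. The operand values are arbitrary choices for illustration; Python follows IEEE 754 here for `*` and `/` on floats, while division by an actual zero raises `ZeroDivisionError` instead.

```python
import math
import sys

big = sys.float_info.max   # largest finite double, about 1.8e308

# Multiplying past the largest finite double overflows to infinity.
overflow_by_mul = big * 2.0

# Both operands are finite and non-zero, yet the quotient overflows too.
overflow_by_div = 1e308 / 1e-10

print(overflow_by_mul, overflow_by_div)
print(math.isinf(overflow_by_mul), math.isinf(overflow_by_div))
```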

Sure, though division might be a tad more surprising since most people don't do it on an everyday basis. The specific case we had was when a colleague had rewritten

    (a / b) * (c / d) * (e / f)

into

    (a * c * e) / (b * d * f)

as a performance optimization. Each division in the original produced a result of roughly one because of how the variables were computed, but the rewritten form was sometimes unstable because the products could produce denormalized numbers.
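To make that concrete, here is a Python sketch with made-up inputs (the values and function names are hypothetical, not the ones from our code). Each ratio is about 1.5, but the triple products land in the denormalized range, where only a handful of significant bits survive. It assumes hardware with gradual underflow, i.e. no flush-to-zero mode.

```python
import sys

def product_of_ratios(a, b, c, d, e, f):
    # Original form: every intermediate result stays near 1.
    return (a / b) * (c / d) * (e / f)

def ratio_of_products(a, b, c, d, e, f):
    # "Optimized" form: the products can underflow before the division.
    return (a * c * e) / (b * d * f)

# Hypothetical inputs: tiny magnitudes, ratios of roughly 1.5.
a = c = e = 1.5e-107
b = d = f = 1.0e-107

stable = product_of_ratios(a, b, c, d, e, f)
unstable = ratio_of_products(a, b, c, d, e, f)

# The denominator is below the smallest normal double, so it is denormalized.
denominator_is_subnormal = 0.0 < b * d * f < sys.float_info.min

print(stable, unstable, denominator_is_subnormal)
```

The mathematically exact answer is 3.375. The first form gets it to within rounding error, while the second is visibly off because both products were rounded to a few hundred multiples of the smallest denormal.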

I would argue that

    double m_D{}; [...]

    if (m_D == 0) somethingNeedsInstantiation();

can avoid having to carry around, set and check some extra m_HasValueBeenSet booleans.

Of course, it might not be something you want to overload beginner programmers with.

.125 + .375 == .5

You should be using == for floats when they're actually equal. 0.1 just isn't an actual number.
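One way to check whether a decimal literal is stored exactly, rather than memorizing anything, is to compare it as an exact rational against the double it parses to. A minimal Python sketch; `is_exact_double` is a hypothetical helper name, not a standard function.

```python
from fractions import Fraction

def is_exact_double(literal: str) -> bool:
    """True if the decimal literal is exactly representable as a double."""
    # Fraction(str) is the exact decimal value;
    # Fraction(float) is the exact value of the stored double.
    return Fraction(literal) == Fraction(float(literal))

for text in ("0.125", "0.375", "0.5", "0.1", "0.3"):
    print(text, is_exact_double(text))

# Sums of exactly representable values with an exactly representable result
# compare equal with ==.
sum_is_equal = (0.125 + 0.375 == 0.5)
print(sum_is_equal)
```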

Are you saying that my students should memorize which numbers are actual floats and which are not?

    > 1.25 * 0.1
    
    0.1250000000000000069388939039
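The output above has more digits than a double prints by default, which suggests it came from exact decimal arithmetic on the stored binary values rather than from a double multiplication; that reading is an assumption. A short Python sketch comparing the two, using the standard `decimal` module:

```python
from decimal import Decimal

# Decimal(float) converts the stored double exactly, so the binary
# representation error of 0.1 is carried into the product, which is then
# rounded to the decimal context precision (28 digits by default).
exact_product = Decimal(1.25) * Decimal(0.1)

# The same multiplication in double arithmetic is rounded to 53 bits.
double_product = 1.25 * 0.1

print(exact_product)
print(double_product == 0.125)
```

In double arithmetic the tiny excess is rounded away and the product compares equal to 0.125, which only strengthens the point that nobody should be expected to predict these cases by hand.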

> 0.1 just isn't an actual number.

A finitist computer scientist only accepts those numbers as real that can be expressed exactly in finite base-two floating point?

One thing that really did it for me was programming something where you would normally use floats (audio/DSP) on a platform where floats were abysmally slow. This forced me to explore fixed-point options, which in turn forced me to explore how they differ from floats.
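For readers who haven't seen fixed point, here is a rough Python sketch of the Q15 format that is common in audio/DSP work: a signed 16-bit integer interpreted as a value in [-1, 1). The helper names are hypothetical, and the rounding and saturation choices are one reasonable option among several, not what any particular platform does.

```python
Q = 15
ONE = 1 << Q                     # 32768 represents 1.0 (just out of range)
Q15_MIN, Q15_MAX = -ONE, ONE - 1

def saturate(n: int) -> int:
    # Clamp to the signed 16-bit range instead of wrapping around.
    return max(Q15_MIN, min(Q15_MAX, n))

def to_q15(x: float) -> int:
    # Scale and round to the nearest representable step of 2**-15.
    return saturate(round(x * ONE))

def from_q15(n: int) -> float:
    return n / ONE

def q15_mul(a: int, b: int) -> int:
    # The full product has 30 fractional bits; add half a step for
    # rounding, then shift back down to 15 fractional bits.
    product = a * b + (1 << (Q - 1))
    return saturate(product >> Q)

half = to_q15(0.5)
quarter = q15_mul(half, half)
print(half, quarter, from_q15(quarter))
```

Unlike floats, the step size is the same everywhere in the range, and the programmer has to manage scaling and overflow explicitly.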

Fixed point gave rise to the old programmers' meme 'if you need floating point you don't understand your problem'. It's of course partially in jest, but there is a grain of truth in it as well.

Also heavily used in FPGA based DSP.